Re: Dataset and lambas

2015-12-07, Michael Armbrust
These specific JIRAs don't exist yet, but watch SPARK- as we'll make sure everything shows up there.

On Sun, Dec 6, 2015 at 10:06 AM, Koert Kuipers wrote:
> that's good news about plans to avoid unnecessary conversions, and allow
> access to more efficient internal types.

Re: Dataset and lambas

2015-12-07, Michael Armbrust
On Sat, Dec 5, 2015 at 3:27 PM, Deenar Toraskar wrote:
>
> On a similar note, what is involved in getting native support for some
> user defined functions, so that they are as efficient as native Spark SQL
> expressions? I had one particular one - an arraySum (element

Re: Dataset and lambas

2015-12-07, Koert Kuipers
great, thanks

On Mon, Dec 7, 2015 at 3:02 PM, Michael Armbrust wrote:
> These specific JIRAs don't exist yet, but watch SPARK- as we'll make
> sure everything shows up there.
>
> On Sun, Dec 6, 2015 at 10:06 AM, Koert Kuipers wrote:
>
>> that's

Re: Dataset and lambas

2015-12-07, Deenar Toraskar
Michael,

Having VectorUnionSumUDAF implemented would be great. This is quite generic: it does an element-wise sum of arrays and maps
https://github.com/klout/brickhouse/blob/master/src/main/java/brickhouse/udf/timeseries/VectorUnionSumUDAF.java
and would be a massive benefit for a lot of risk
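As a plain-Python sketch of the semantics described above (element-wise sum over arrays, and over maps by key), assuming that arrays of different lengths are summed as if the shorter ones were zero-padded and that map keys are unioned. Both behaviours are assumptions for illustration; the linked VectorUnionSumUDAF may treat mismatched lengths differently, and the function names are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence


def vector_union_sum_arrays(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise sum of arrays; shorter arrays act as if zero-padded (assumption)."""
    result: List[float] = []
    for vec in vectors:
        # Grow the accumulator when this vector is longer than anything seen so far.
        if len(vec) > len(result):
            result.extend([0.0] * (len(vec) - len(result)))
        for i, x in enumerate(vec):
            result[i] += x
    return result


def vector_union_sum_maps(maps: Sequence[Dict[Hashable, float]]) -> Dict[Hashable, float]:
    """Union of all keys; values that share a key are summed."""
    result: Dict[Hashable, float] = {}
    for m in maps:
        for key, value in m.items():
            result[key] = result.get(key, 0.0) + value
    return result
```

For example, `vector_union_sum_arrays([[1, 2, 3], [10, 20]])` returns `[11.0, 22.0, 3.0]`, and summing the maps `{"a": 1, "b": 2}` and `{"b": 3, "c": 4}` gives `{"a": 1.0, "b": 5.0, "c": 4.0}`.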

Re: Dataset and lambas

2015-12-06, Koert Kuipers
that's good news about plans to avoid unnecessary conversions, and allow access to more efficient internal types. could you point me to the jiras, if they exist already? i just tried to find them but had little luck.

best,
koert

On Sat, Dec 5, 2015 at 4:09 PM, Michael Armbrust

Dataset and lambas

2015-12-05, Koert Kuipers
hello all,

DataFrame internally uses a different encoding for values than what the user sees. i assume the same is true for Dataset? if so, does this mean that a function like Dataset.map needs to convert all the values twice (once to user format and then back to internal format)? or is it
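To make the "convert twice" question concrete, here is a toy model in plain Python (not Spark): rows stored in a hypothetical fixed-width binary layout are decoded into user objects, passed to an opaque lambda, and re-encoded. The layout, the `Row` type and the `toy_map` name are illustrative assumptions; Spark's actual Tungsten format and its generated encoders are not reproduced here.

```python
import struct
from typing import Callable, List, Tuple

# Hypothetical fixed-width "internal" layout: little-endian int64 id + float64 value.
# This is NOT Spark's Tungsten format; it only stands in for "some binary encoding".
ROW = struct.Struct("<qd")

Row = Tuple[int, float]


def encode(row: Row) -> bytes:
    """User object -> internal binary form."""
    return ROW.pack(*row)


def decode(buf: bytes) -> Row:
    """Internal binary form -> user object."""
    return ROW.unpack(buf)


def toy_map(rows: List[bytes], f: Callable[[Row], Row]) -> Tuple[List[bytes], int]:
    """Apply a user lambda to encoded rows and count the conversions performed."""
    out: List[bytes] = []
    conversions = 0
    for buf in rows:
        obj = decode(buf)            # conversion 1: internal -> user format
        result = f(obj)              # the opaque lambda only sees plain objects
        out.append(encode(result))   # conversion 2: user format -> internal
        conversions += 2
    return out, conversions
```

Because the lambda is opaque in this model, every row pays one decode and one encode, which is the per-value cost the question asks about.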

Re: Dataset and lambas

2015-12-05, Deenar Toraskar
Hi Michael,

On a similar note, what is involved in getting native support for some user defined functions, so that they are as efficient as native Spark SQL expressions? I had one particular one - an arraySum (element-wise sum) that is heavily used in a lot of risk analytics.

Deenar

On 5
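A sketch of what an arraySum aggregate has to do, written as a plain-Python class whose zero/reduce/merge/finish steps loosely mirror the shape of Spark's typed Aggregator. The class name `ArraySum` and the requirement that all arrays have equal length are assumptions for illustration; this is not Spark code and says nothing about how a native expression would be implemented.

```python
from typing import List, Optional, Sequence

Buffer = Optional[List[float]]


class ArraySum:
    """Hypothetical element-wise sum of equal-length numeric arrays."""

    def zero(self) -> Buffer:
        # None means "no input seen yet", so the array length is not fixed up front.
        return None

    def reduce(self, buf: Buffer, arr: Sequence[float]) -> Buffer:
        # Fold one input array into the running sum.
        if buf is None:
            return [float(x) for x in arr]
        if len(buf) != len(arr):
            raise ValueError("arraySum expects arrays of equal length")
        return [b + x for b, x in zip(buf, arr)]

    def merge(self, b1: Buffer, b2: Buffer) -> Buffer:
        # Combine two partial sums, e.g. computed on different partitions.
        if b1 is None:
            return b2
        if b2 is None:
            return b1
        return self.reduce(b1, b2)

    def finish(self, buf: Buffer) -> List[float]:
        # An aggregate over no rows yields an empty array in this sketch.
        return buf if buf is not None else []
```

With two partitions holding `[[1, 2], [3, 4]]` and `[[10, 20]]`, reducing each from `zero()` and merging the partial sums gives `[14.0, 26.0]`. The separate `merge` step is what lets partial sums be computed independently per partition before being combined.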

Re: Dataset and lambas

2015-12-05, Michael Armbrust
On Sat, Dec 5, 2015 at 9:42 AM, Koert Kuipers wrote:
> hello all,
> DataFrame internally uses a different encoding for values than what the
> user sees. i assume the same is true for Dataset?

This is true. We encode objects in the Tungsten binary format using code