Re: feedback on dataset api explode

2016-05-25 Thread Koert Kuipers
oh yes, this was by accident, it should have gone to dev
On Wed, May 25, 2016 at 4:20 PM, Reynold Xin wrote:
> Created JIRA ticket: https://issues.apache.org/jira/browse/SPARK-15533
>
> @Koert - Please keep API feedback coming. One thing - in the future, can
> you send api

Re: feedback on dataset api explode

2016-05-25 Thread Reynold Xin
Created JIRA ticket: https://issues.apache.org/jira/browse/SPARK-15533
@Koert - Please keep API feedback coming. One thing - in the future, can you send API feedback to the dev@ list instead of user@?
On Wed, May 25, 2016 at 1:05 PM, Cheng Lian wrote:
> Agree, since

Re: feedback on dataset api explode

2016-05-25 Thread Cheng Lian
Agree, since they can be easily replaced by .flatMap (to do explosion) and .select (to rename output columns)
Cheng
On 5/25/16 12:30 PM, Reynold Xin wrote:
Based on this discussion I'm thinking we should deprecate the two explode functions.
On Wednesday, May 25, 2016, Koert Kuipers
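The replacement Cheng describes, .flatMap for the explosion followed by .select to rename the output columns, could look roughly like the sketch below. It assumes Spark 2.x with a local SparkSession; the `Doc` case class, the `FlatMapExplodeSketch` object, the column names, and the sample data are hypothetical and only serve the illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical record type, used only for this illustration.
case class Doc(id: Int, words: Seq[String])

object FlatMapExplodeSketch {
  // Emit one (id, word) pair per element of `words`, then rename the
  // default tuple columns (_1, _2) with select.
  def explodeWords(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val docs = Seq(Doc(1, Seq("a", "b")), Doc(2, Seq("c"))).toDS()
    docs
      .flatMap(d => d.words.map(w => (d.id, w))) // Dataset[(Int, String)]
      .select($"_1".as("id"), $"_2".as("word"))  // rename output columns
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("flatMap-explode-sketch")
      .getOrCreate()
    try explodeWords(spark).show()
    finally spark.stop()
  }
}
```

Because flatMap works on typed objects rather than Row, the function passed in is checked at compile time, which is part of why the Row-based explode methods look redundant next to it.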

Re: feedback on dataset api explode

2016-05-25 Thread Reynold Xin
Based on this discussion I'm thinking we should deprecate the two explode functions.
On Wednesday, May 25, 2016, Koert Kuipers wrote:
> wenchen,
> that definition of explode seems identical to flatMap, so you don't need it
> either?
>
> michael,
> i didn't know about the

Re: feedback on dataset api explode

2016-05-25 Thread Koert Kuipers
wenchen,
that definition of explode seems identical to flatMap, so you don't need it either?
michael,
i didn't know about the column expression version of explode, that makes sense. i will experiment with that instead.
On Wed, May 25, 2016 at 3:03 PM, Wenchen Fan wrote:

Re: feedback on dataset api explode

2016-05-25 Thread Michael Armbrust
These APIs predate Datasets / encoders, so that is why they are Row instead of objects. We should probably rethink that. Honestly, I usually end up using the column expression version of explode now that it exists (i.e. explode($"arrayCol").as("Item")). It would be great to understand more why
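For reference, the column expression version Michael mentions can be tried with a sketch like the one below. It assumes Spark 2.x and a local SparkSession; the `ColumnExplodeSketch` object, the `id` column, and the sample data are hypothetical, while `arrayCol` and `Item` follow the names used in the message.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.explode

object ColumnExplodeSketch {
  // Produce one output row per element of arrayCol, exposing each element
  // in a column named "Item" alongside the original id.
  def explodeItems(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val df = Seq((1, Seq("a", "b")), (2, Seq("c"))).toDF("id", "arrayCol")
    df.select($"id", explode($"arrayCol").as("Item"))
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("column-explode-sketch")
      .getOrCreate()
    try explodeItems(spark).show()
    finally spark.stop()
  }
}
```

This form stays inside the column expression API, so no user function over Row is involved.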

feedback on dataset api explode

2016-05-25 Thread Koert Kuipers
we currently have 2 explode definitions in Dataset:
def explode[A <: Product : TypeTag](input: Column*)(f: Row => TraversableOnce[A]): DataFrame
def explode[A, B : TypeTag](inputColumn: String, outputColumn: String)(f: A => TraversableOnce[B]): DataFrame
1) the separation of the functions