Re: DataFrame versus Dataset creation and usage

2016-06-28 Thread Martin Serrano
Xinh, Thanks for the clarification. I'm new to Spark and trying to navigate the different APIs. I was just following some examples and retrofitting them, but I see now I should stick with plain RDDs until my schema is known (at the end of the data pipeline). Thanks again! On 06/24/2016
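A minimal sketch of the idea of deferring the schema until the end of the pipeline. This is plain Python, not Spark code: the `infer_schema` helper, the sample records, and the fall-back-to-string rule are all assumptions made for illustration and are not from the thread. In Spark, the resulting field list is the kind of information one would turn into a `StructType` once the field types are known.

```python
def infer_schema(records):
    """Scan untyped records and return a mapping of field name -> type name.

    Hypothetical helper: fields keep first-seen order, conflicting types
    fall back to "str", and fields that are only ever None default to "str".
    """
    schema = {}
    for record in records:
        for name, value in record.items():
            if value is None:
                # A null says nothing about the type; just remember the field.
                schema.setdefault(name, None)
                continue
            current = type(value).__name__
            seen = schema.get(name)
            if seen is None:
                schema[name] = current
            elif seen != current:
                # Conflicting types across records: widen to string.
                schema[name] = "str"
    return {name: (t or "str") for name, t in schema.items()}


# Records with a dynamic shape, as they might look mid-pipeline.
records = [
    {"id": 1, "name": "a"},
    {"id": 2, "score": 0.5},
    {"id": "3", "name": None},
]

schema = infer_schema(records)
for field, type_name in schema.items():
    print(field, type_name)
```

The records stay untyped while the pipeline runs, and the schema is computed once at the end, which mirrors the "plain RDDs until my schema is known" approach described above.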

Re: DataFrame versus Dataset creation and usage

2016-06-24 Thread Xinh Huynh
Hi Martin, since your schema is dynamic, how would you use Datasets? Would you know ahead of time the row type T in a Dataset[T]? One option is to start with DataFrames at the beginning of your data pipeline, figure out the field types, and then switch completely over to RDDs or Dataset in the

Re: DataFrame versus Dataset creation and usage

2016-06-24 Thread Martin Serrano
Indeed. But I'm dealing with 1.6 for now, unfortunately.
On 06/24/2016 02:30 PM, Ted Yu wrote:
> In Spark 2.0, Dataset and DataFrame are unified. Would this simplify your use case?
> On Fri, Jun 24, 2016 at 7:27 AM, Martin Serrano wrote:
>> Hi, I'm

Re: DataFrame versus Dataset creation and usage

2016-06-24 Thread Ted Yu
In Spark 2.0, Dataset and DataFrame are unified. Would this simplify your use case?
On Fri, Jun 24, 2016 at 7:27 AM, Martin Serrano wrote:
> Hi,
>
> I'm exposing a custom source to the Spark environment. I have a question
> about the best way to approach this problem.