Re: Difference between Data set and Data Frame in Spark 2

2016-09-01 Thread Mich Talebzadeh
Yes, I tested that. It sounds like RDD is faster. Having said that, I think there are advantages of DS over RDD. Will RDD be phased out? Thanks, Dr Mich Talebzadeh LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Re: Difference between Data set and Data Frame in Spark 2

2016-09-01 Thread Maciej Bryński
I think there could be a performance reason. RDDs can be faster than Datasets. For example, check the query plan for this code: spark.range(100).map(_ * 2).filter(_ < 100).map(_ * 2).collect() There are two serialize / deserialize pairs. Then compare with the RDD equivalent: sc.parallelize(1 to
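
The plan inspection suggested in the message above can be sketched roughly as follows. This is a minimal sketch, not the poster's actual session: it assumes a local SparkSession (in spark-shell, `spark` is already defined), and the names `PlanSketch` and `run` are hypothetical. The pipeline itself is the one quoted in the message. `explain(true)` prints the logical and physical plans, where serialization steps typically appear as `SerializeFromObject` and `DeserializeToObject` operators; the exact plan depends on the Spark version.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical wrapper object, used only for illustration.
object PlanSketch {
  // Builds the Dataset pipeline from the message, prints its plans,
  // and returns the collected result.
  def run(): Array[Long] = {
    // Assumption: a local session for experimentation.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("dataset-plan-sketch")
      .getOrCreate()
    import spark.implicits._ // supplies the Encoder[Long] required by map

    val ds = spark.range(100).map(_ * 2).filter(_ < 100).map(_ * 2)

    // extended = true prints the parsed, analyzed, optimized and physical plans
    ds.explain(true)

    val result = ds.collect()
    spark.stop()
    result
  }

  def main(args: Array[String]): Unit = {
    println(s"rows collected: ${run().length}")
  }
}
```

The lambdas passed to `map` and `filter` operate on JVM objects, which is why the plan has to convert between Spark's internal row format and objects around them.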

Re: Difference between Data set and Data Frame in Spark 2

2016-09-01 Thread Sean Owen
On Thu, Sep 1, 2016 at 4:56 PM, Mich Talebzadeh wrote:
> Data Frame built on top of RDD to create as tabular format that we all love to make the original build easily usable (say SQL like queries, column headings etc). The drawback is it restricts you with what you

Re: Difference between Data set and Data Frame in Spark 2

2016-09-01 Thread Mich Talebzadeh
Hi, This is my understanding of these three: RDD is the basic construct used to spread data across the nodes. It can hold data of any form and shape, structured or unstructured. It is the building block of Spark, if I may call it that. Data Frame is built on top of RDD to create a tabular format that we all love

Re: Difference between Data set and Data Frame in Spark 2

2016-09-01 Thread Ovidiu-Cristian MARCU
Thank you! The talk is indeed very good.
Best, Ovidiu
> On 01 Sep 2016, at 16:47, Jules Damji wrote:
>
> Sean put it succinctly the nuanced differences and the evolution of Datasets. Simply put, structure, to some extent, limits you—and that's what the DataFrames &

Re: Difference between Data set and Data Frame in Spark 2

2016-09-01 Thread Ovidiu-Cristian MARCU
Thank you, I like and agree with your point. RDDs evolved into Datasets by means of an optimizer. I just wonder what the use cases for RDDs are (other than the current version of GraphX leveraging RDDs)?
Best, Ovidiu
> On 01 Sep 2016, at 16:26, Sean Owen wrote:
>
> Here's my

Re: Difference between Data set and Data Frame in Spark 2

2016-09-01 Thread Sean Owen
Here's my paraphrase: Datasets are really the new RDDs. They have a similar nature (container of strongly-typed objects) but bring some optimizations via Encoders for common types. DataFrames are different from RDDs and Datasets and do not replace and are not replaced by them. They're

Difference between Data set and Data Frame in Spark 2

2016-09-01 Thread Ashok Kumar
Hi, What are the practical differences between the new Dataset in Spark 2 and the existing DataFrame? Has Dataset replaced DataFrame, and what advantages does it have if I use Dataset instead of DataFrame? Thanks