Yes, I tested that. It sounds like RDD is faster.
Having said that, I think Datasets have advantages over RDDs.
Will RDD be phased out?
Thanks
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
I think there could be a performance reason.
RDDs can be faster than Datasets.
For example, check the query plan for this code:
spark.range(100).map(_ * 2).filter(_ < 100).map(_ * 2).collect()
There are two serialize / deserialize pairs.
Then compare with the RDD equivalent:
sc.parallelize(1 to
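
A minimal sketch of how the comparison above could be run end to end. The Dataset line is copied from the message; the RDD line is only my assumption of an equivalent pipeline (the original is cut off above), and the object and method names are hypothetical. It assumes Spark 2.x on the classpath and a local master.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Dataset, SparkSession}

object PlanComparison {
  // Dataset pipeline, as quoted in the message above.
  def viaDataset(spark: SparkSession): Dataset[Long] = {
    import spark.implicits._ // supplies the Encoder[Long] that map needs
    spark.range(100).map(_ * 2).filter(_ < 100).map(_ * 2)
  }

  // Assumed RDD equivalent: same input range (0 to 99) and the same three steps.
  def viaRdd(spark: SparkSession): RDD[Long] =
    spark.sparkContext
      .parallelize(0L until 100L)
      .map(_ * 2)
      .filter(_ < 100)
      .map(_ * 2)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("plan-comparison")
      .master("local[*]") // assumption: local run, for inspection only
      .getOrCreate()

    // Prints the logical and physical plans; look for the
    // serialize / deserialize steps around each typed lambda.
    viaDataset(spark).explain(true)

    // RDDs have no query plan; the lineage is the closest thing to inspect.
    println(viaRdd(spark).toDebugString)

    spark.stop()
  }
}
```

The plan output depends on the Spark version, so treat what you see as indicative rather than definitive.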
On Thu, Sep 1, 2016 at 4:56 PM, Mich Talebzadeh
wrote:
> DataFrame is built on top of RDD to provide the tabular format that we all love
> to make the original build easily usable (say, SQL-like queries, column
> headings, etc.). The drawback is that it restricts you in what you
Hi,
This is my understanding of these three:
RDD is the basic construct for spreading data across the nodes, in any
form and any shape: structured, unstructured, etc. It is the building block
of Spark if I may call
DataFrame is built on top of RDD to provide the tabular format that we all love
Thank you!
The talk is indeed very good.
Best,
Ovidiu
> On 01 Sep 2016, at 16:47, Jules Damji wrote:
>
> Sean succinctly captured the nuanced differences and the evolution of Datasets.
> Simply put, structure, to some extent, limits you—and that's what the
> DataFrames &
Thank you, I like and agree with your point. RDDs evolved into Datasets by means
of an optimizer.
I just wonder what the use cases for RDDs are (other than the current version of
GraphX, which leverages RDDs)?
Best,
Ovidiu
> On 01 Sep 2016, at 16:26, Sean Owen wrote:
>
> Here's my
Here's my paraphrase:
Datasets are really the new RDDs. They have a similar nature
(container of strongly-typed objects) but bring some optimizations via
Encoders for common types.
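
To make the "container of strongly-typed objects" point concrete, here is a small sketch. The Person case class and the adults helper are hypothetical names of mine, not from this thread, and it assumes Spark 2.x with a local master.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical record type, used only for illustration.
case class Person(name: String, age: Int)

object TypedDatasetSketch {
  // The filter lambda is checked at compile time against Person's fields.
  // The Encoder[Person] comes from spark.implicits._
  def adults(spark: SparkSession, people: Seq[Person]): Dataset[Person] = {
    import spark.implicits._
    people.toDS().filter(_.age >= 18)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("typed-dataset-sketch")
      .master("local[*]") // assumption: local run
      .getOrCreate()

    val ds = adults(spark, Seq(Person("Ann", 34), Person("Bob", 12)))
    ds.show()

    // Dropping to the untyped view: columns are then addressed by name.
    ds.toDF().select("name").show()

    spark.stop()
  }
}
```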
DataFrames are different from RDDs and Datasets and do not replace and
are not replaced by them. They're
Hi,
What are the practical differences between the new Dataset in Spark 2 and the
existing DataFrame?
Has Dataset replaced DataFrame, and what advantages does it have if I use Dataset
instead of DataFrame?
Thanks