Re: Difference between Data set and Data Frame in Spark 2

Mich Talebzadeh Thu, 01 Sep 2016 08:57:40 -0700

Hi,

This is my understanding of these three:


An RDD is the basic construct for spreading data across the nodes. It can
hold data of any form and shape: structured, unstructured etc. It is the
building block of Spark, if I may call it that.
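
A minimal Scala sketch of the RDD point, assuming Spark 2.x is on the classpath and a local master is acceptable. The sample log lines and the names `RddSketch` and `errorWordCounts` are made up for illustration and do not come from this thread.

```scala
import org.apache.spark.sql.SparkSession

object RddSketch {
  // Counts the words found in lines that start with "error".
  def errorWordCounts(spark: SparkSession): Map[String, Int] = {
    // An RDD can hold arbitrary objects; here, free-form text with no schema.
    val lines = spark.sparkContext.parallelize(
      Seq("error disk full", "info started", "error timeout"))

    // Plain functional transformations on opaque objects: Spark has no
    // schema to reason about, so it runs the lambdas as written.
    lines
      .filter(_.startsWith("error"))
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .collect()
      .toMap
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, purely for trying the sketch out.
    val spark = SparkSession.builder().appName("rdd-sketch").master("local[*]").getOrCreate()
    errorWordCounts(spark).foreach(println)
    spark.stop()
  }
}
```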

A Data Frame is built on top of an RDD to give it the tabular format that we
all love, making the underlying data easy to use (SQL-like queries, column
headings etc). The drawback is that it restricts what you can do with the
data once you have done RDD.toDF.
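
As a rough illustration of RDD.toDF and the restriction it brings, here is a short Scala sketch assuming Spark 2.x in local mode. The `Person` case class, the `people` view name and the sample rows are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object DataFrameSketch {
  // Hypothetical record type, used only for this illustration.
  case class Person(name: String, age: Int)

  def adultNames(spark: SparkSession): Seq[String] = {
    import spark.implicits._ // brings toDF into scope

    val rdd = spark.sparkContext.parallelize(Seq(Person("Ann", 34), Person("Bob", 19)))

    // toDF attaches column names (taken from the case class fields),
    // which is what makes SQL-like queries possible.
    val df = rdd.toDF()
    df.createOrReplaceTempView("people")
    val adults = spark.sql("SELECT name FROM people WHERE age > 21")

    // The restriction: each element is now a generic Row, so a field is
    // fetched by name at run time instead of being checked by the compiler.
    adults.collect().map(row => row.getAs[String]("name")).toSeq
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, purely for trying the sketch out.
    val spark = SparkSession.builder().appName("df-sketch").master("local[*]").getOrCreate()
    println(adultNames(spark).mkString(", "))
    spark.stop()
  }
}
```

A misspelt column name in the SQL string or in `getAs` would only fail when the job runs, which is one concrete form of the trade-off described above.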

A DataSet is the new RDD, with improvements over the RDD. As I understand
from Sean's explanation, it adds some optimisation on top of the common RDD.
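
A small Scala sketch of a typed Dataset, assuming Spark 2.x in local mode. The `Person` case class and the sample rows are hypothetical. One fact worth noting alongside it: in the Spark 2 Scala API, `DataFrame` is a type alias for `Dataset[Row]`, so the two are views over the same machinery.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

object DatasetSketch {
  // Hypothetical record type, used only for this illustration.
  case class Person(name: String, age: Int)

  def adults(spark: SparkSession): Seq[Person] = {
    import spark.implicits._ // supplies the Encoder[Person] used below

    val people: Dataset[Person] = Seq(Person("Ann", 34), Person("Bob", 19)).toDS()

    // Typed, RDD-style lambda: p.age is checked at compile time.
    val over21: Dataset[Person] = people.filter((p: Person) => p.age > 21)

    // DataFrame is an alias for Dataset[Row], so a Dataset can be viewed
    // as a table and then turned back into typed objects.
    val asTable: DataFrame = over21.toDF()
    val typedAgain: Dataset[Person] = asTable.as[Person]

    typedAgain.collect().toSeq
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, purely for trying the sketch out.
    val spark = SparkSession.builder().appName("ds-sketch").master("local[*]").getOrCreate()
    adults(spark).foreach(println)
    spark.stop()
  }
}
```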

I guess Data Frames are as before.

Please correct me if I am wrong.

Cheers

Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 1 September 2016 at 16:33, Ovidiu-Cristian MARCU <
ovidiu-cristian.ma...@inria.fr> wrote:

> Thank you!
> The talk is indeed very good.
>
> Best,
> Ovidiu
>
> On 01 Sep 2016, at 16:47, Jules Damji <ju...@databricks.com> wrote:
>
> Sean succinctly captured the nuanced differences and the evolution of
> Datasets. Simply put, structure, to some extent, limits you, and that's what
> the DataFrames & Datasets, among other things, offer.
>
> When you want low-level control, or are dealing with unstructured data,
> blobs of text or images, then RDDs make sense.
>
> There's an illuminating talk by Michael Armbrust, "Structuring Spark:
> DataFrames & Datasets", where he makes an eloquent case for their merits &
> motivation, while also elaborating on RDDs.
>
> https://youtu.be/1a4pgYzeFwE
>
> Cheers
>
> Jules
>
> Sent from my iPhone
> Pardon the dumb thumb typos :)
>
> On Sep 1, 2016, at 7:35 AM, Ovidiu-Cristian MARCU <
> ovidiu-cristian.ma...@inria.fr> wrote:
>
> Thank you, I like and agree with your point. RDD evolved to Datasets by
> means of an optimizer.
> I just wonder what the use cases for RDDs are (other than the current
> version of GraphX leveraging RDDs)?
>
> Best,
> Ovidiu
>
> On 01 Sep 2016, at 16:26, Sean Owen <so...@cloudera.com> wrote:
>
>
> Here's my paraphrase:
>
> Datasets are really the new RDDs. They have a similar nature
> (container of strongly-typed objects) but bring some optimizations via
> Encoders for common types.
>
> DataFrames are different from RDDs and Datasets and do not replace and
> are not replaced by them. They're fundamentally for tabular data, not
> arbitrary objects, and thus support SQL-like operations that only
> make sense on tabular data.
>
>
> On Thu, Sep 1, 2016 at 3:17 PM, Ashok Kumar
> <ashok34...@yahoo.com.invalid> wrote:
>
> Hi,
>
> What are the practical differences between the new Data set in Spark 2 and
> the existing DataFrame?
>
> Has Dataset replaced Data Frame, and what advantages does it have if I use
> Dataset instead of Data Frame?
>
> Thanks
>
>
>
>
> ---------------------------------------------------------------------
>
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>
>
>
