Your understanding is correct: it is indeed slower due to the extra serialization. In some cases we can skip the serialization if the value is already deserialized.
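To make the contrast concrete, here is a minimal sketch (the names `Body` and `TypedAggDemo` are hypothetical, and it assumes Spark 2.1+ on the classpath in local mode). The typed `groupByKey` path has to deserialize each row back into a `Body` object so the Scala closures can run, whereas the untyped `groupBy($"color")` path operates on columns directly; calling `.explain()` on the typed plan shows the extra `DeserializeToObject`/`SerializeFromObject` nodes the question refers to.

```scala
import org.apache.spark.sql.SparkSession

object TypedAggDemo {
  case class Body(color: String, size: Int)

  // Type-safe aggregation: the closure passed to groupByKey runs on JVM
  // objects, so every row is deserialized from Tungsten's binary format
  // into a Body before the key is extracted.
  def typedSums(spark: SparkSession): Map[String, Int] = {
    import spark.implicits._
    val ds = Seq(Body("red", 1), Body("blue", 2), Body("red", 3)).toDS()
    val grouped = ds.groupByKey(_.color)   // deserializes rows to Body
      .mapValues(_.size)
      .reduceGroups(_ + _)
    grouped.explain()                      // plan contains DeserializeToObject
    grouped.collect().toMap
  }

  // Untyped equivalent for comparison: works on columns, no object
  // round-trip, so Catalyst can optimize the whole plan.
  def untypedSums(spark: SparkSession): Map[String, Long] = {
    import spark.implicits._
    val ds = Seq(Body("red", 1), Body("blue", 2), Body("red", 3)).toDS()
    val grouped = ds.groupBy($"color").sum("size")
    grouped.explain()                      // no serialization round-trip nodes
    grouped.collect().map(r => r.getString(0) -> r.getLong(1)).toMap
  }
}
```

Both paths return the same totals; the difference shows up in the physical plans, not the results.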
On Wed, Jan 4, 2017 at 7:19 AM, geoHeil <georg.kf.hei...@gmail.com> wrote:

> Hi, I would like to know more about typesafe aggregations in Spark.
>
> http://stackoverflow.com/questions/40596638/inquiries-about-spark-2-0-dataset/40602882?noredirect=1#comment70139481_40602882
>
> An example of these is
> https://blog.codecentric.de/en/2016/07/spark-2-0-datasets-case-classes/
>
>     ds.groupByKey(body => body.color)
>
> Does
>
> "myDataSet.map(foo.someVal) is type safe but as any Dataset operation uses
> RDD and compared to DataFrame operations there is a significant overhead.
> Let's take a look at a simple example:"
>
> hold true, i.e. will type-safe aggregation require the deserialisation of
> the full objects, as displayed for ds.map(_.foo).explain?
>
> Kind regards,
> Georg
>
> --
> View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Clarification-about-typesafe-aggregations-tp20459.html
> Sent from the Apache Spark Developers List mailing list archive at Nabble.com.