Your understanding is correct: it is indeed slower due to the extra serialization. In some cases we can skip the serialization if the value is already deserialized.
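To make the contrast concrete, here is a minimal sketch (the names `Body` and `TypedAggDemo` are hypothetical, and it assumes Spark 2.1+ on the classpath in local mode). The typed `groupByKey` path has to deserialize each row back into a `Body` object so the Scala closures can run, whereas the untyped `groupBy($"color")` path operates on columns directly; calling `.explain()` on the typed plan shows the extra `DeserializeToObject`/`SerializeFromObject` nodes the question refers to.

```scala
import org.apache.spark.sql.SparkSession

object TypedAggDemo {
  case class Body(color: String, size: Int)

  // Type-safe aggregation: the closure passed to groupByKey runs on JVM
  // objects, so every row is deserialized from Tungsten's binary format
  // into a Body before the key is extracted.
  def typedSums(spark: SparkSession): Map[String, Int] = {
    import spark.implicits._
    val ds = Seq(Body("red", 1), Body("blue", 2), Body("red", 3)).toDS()
    val grouped = ds.groupByKey(_.color)   // deserializes rows to Body
      .mapValues(_.size)
      .reduceGroups(_ + _)
    grouped.explain()                      // plan contains DeserializeToObject
    grouped.collect().toMap
  }

  // Untyped equivalent for comparison: works on columns, no object
  // round-trip, so Catalyst can optimize the whole plan.
  def untypedSums(spark: SparkSession): Map[String, Long] = {
    import spark.implicits._
    val ds = Seq(Body("red", 1), Body("blue", 2), Body("red", 3)).toDS()
    val grouped = ds.groupBy($"color").sum("size")
    grouped.explain()                      // no serialization round-trip nodes
    grouped.collect().map(r => r.getString(0) -> r.getLong(1)).toMap
  }
}
```

Both paths return the same totals; the difference shows up in the physical plans, not the results.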
On Wed, Jan 4, 2017 at 7:19 AM, geoHeil <georg.kf.hei...@gmail.com> wrote:

> Hi, I would like to know more about typesafe aggregations in Spark.
>
> http://stackoverflow.com/questions/40596638/inquiries-about-spark-2-0-dataset/40602882?noredirect=1#comment70139481_40602882
>
> An example of these is
> https://blog.codecentric.de/en/2016/07/spark-2-0-datasets-case-classes/
>
>     ds.groupByKey(body => body.color)
>
> Does
>
> "myDataSet.map(foo.someVal) is type safe but as any Dataset operation uses
> RDD and compared to DataFrame operations there is a significant overhead.
> Let's take a look at a simple example:"
>
> hold true, i.e. will type-safe aggregation require the deserialisation of
> the full objects, as displayed for ds.map(_.foo).explain?
>
> Kind regards,
> Georg
>
> --
> View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Clarification-about-typesafe-aggregations-tp20459.html
> Sent from the Apache Spark Developers List mailing list archive at Nabble.com.