All,

Thank you for the replies. It seems as though the Dataset API is still far behind the RDD API. This is unfortunate as the Dataset API potentially provides a number of performance benefits. I will move to using it in a more limited set of cases for the moment.
Thank you!

Bryan Jeffrey

On Tue, Jun 7, 2016 at 2:50 PM, Richard Marscher <rmarsc...@localytics.com> wrote:

> There certainly are some gaps between the richness of the RDD API and the
> Dataset API. I'm also migrating from RDD to Dataset and ran into
> reduceByKey and join scenarios.
>
> In the spark-dev list, one person was discussing reduceByKey being
> sub-optimal at the moment and it spawned this JIRA
> https://issues.apache.org/jira/browse/SPARK-15598. But you might be able
> to get by with groupBy().reduce() for now, check performance though.
>
> As for join, the approach would be using the joinWith function on Dataset.
> Although the API isn't as sugary as it was for RDD IMO, something which
> I've been discussing in a separate thread as well. I can't find a weblink
> for it but the thread subject is "Dataset Outer Join vs RDD Outer Join".
>
> On Tue, Jun 7, 2016 at 2:40 PM, Bryan Jeffrey <bryan.jeff...@gmail.com>
> wrote:
>
>> It would also be nice if there was a better example of joining two
>> Datasets. I am looking at the documentation here:
>> http://spark.apache.org/docs/latest/sql-programming-guide.html. It seems
>> a little bit sparse - is there a better documentation source?
>>
>> Regards,
>>
>> Bryan Jeffrey
>>
>> On Tue, Jun 7, 2016 at 2:32 PM, Bryan Jeffrey <bryan.jeff...@gmail.com>
>> wrote:
>>
>>> Hello.
>>>
>>> I am looking at the option of moving RDD based operations to Dataset
>>> based operations. We are calling 'reduceByKey' on some pair RDDs we have.
>>> What would the equivalent be in the Dataset interface - I do not see a
>>> simple reduceByKey replacement.
>>>
>>> Regards,
>>>
>>> Bryan Jeffrey
>>>
>
> --
> *Richard Marscher*
> Senior Software Engineer
> Localytics
> Localytics.com <http://localytics.com/> | Our Blog
> <http://localytics.com/blog> | Twitter <http://twitter.com/localytics> |
> Facebook <http://facebook.com/localytics> | LinkedIn
> <http://www.linkedin.com/company/1148792?trk=tyah>
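
For anyone finding this thread later, here is a rough sketch of the two suggestions quoted above: a group-then-reduce stand-in for reduceByKey, and joinWith for a typed outer join. It assumes the Spark 2.x names (groupByKey and reduceGroups; in 1.6 these were, as far as I can tell, groupBy(func) and reduce on the grouped Dataset). The Sale and Store case classes, the DatasetSketch object, and the sample data are hypothetical and only there to make the example self-contained. As Richard notes, check performance before relying on the reduce path.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical record types, used only for illustration.
case class Sale(store: String, amount: Double)
case class Store(id: String, city: String)

object DatasetSketch {

  // Stand-in for pairRdd.reduceByKey(_ + _): group by a key function,
  // then reduce the values within each group.
  def totalsByStore(sales: Dataset[Sale]): Dataset[(String, Double)] = {
    val spark = sales.sparkSession
    import spark.implicits._ // encoders for tuples and primitives

    sales
      .map(s => (s.store, s.amount))
      .groupByKey(_._1)
      // Parameter types are spelled out to keep overload resolution simple.
      .reduceGroups((x: (String, Double), y: (String, Double)) => (x._1, x._2 + y._2))
      // reduceGroups yields (key, reducedValue); keep just the reduced pair.
      .map(_._2)
  }

  // Typed left outer join. Each result row is a (Sale, Store) pair;
  // the Store side is null when no store matches.
  def salesWithStore(sales: Dataset[Sale], stores: Dataset[Store]): Dataset[(Sale, Store)] =
    sales.joinWith(stores, sales("store") === stores("id"), "left_outer")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // local mode, for experimentation only
      .appName("dataset-sketch")
      .getOrCreate()
    import spark.implicits._

    val sales = Seq(Sale("a", 1.0), Sale("a", 2.5), Sale("b", 4.0)).toDS()
    val stores = Seq(Store("a", "Boston"), Store("c", "Austin")).toDS()

    totalsByStore(sales).show()

    // Wrap the right side in Option on the driver to handle unmatched rows.
    salesWithStore(sales, stores).collect().foreach { case (sale, store) =>
      println(s"${sale.store} ${sale.amount} ${Option(store).map(_.city)}")
    }

    spark.stop()
  }
}
```

The join result keeps both sides as whole objects rather than flattening columns, which is the main difference from the RDD join on pair RDDs and part of why the API feels less "sugary".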