All,

Thank you for the replies.  It seems as though the Dataset API is still far
behind the RDD API.  This is unfortunate as the Dataset API potentially
provides a number of performance benefits.  I will move to using it in a
more limited set of cases for the moment.

Thank you!

Bryan Jeffrey

On Tue, Jun 7, 2016 at 2:50 PM, Richard Marscher <rmarsc...@localytics.com>
wrote:

> There certainly are some gaps between the richness of the RDD API and the
> Dataset API. I'm also migrating from RDD to Dataset and ran into
> reduceByKey and join scenarios.
>
> On the spark-dev list, one person was discussing reduceByKey being
> sub-optimal at the moment, and it spawned this JIRA:
> https://issues.apache.org/jira/browse/SPARK-15598. You might be able
> to get by with groupBy().reduce() for now, but check the performance.
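
For reference, a minimal sketch of the group-then-reduce approach mentioned above. It assumes Spark 2.x, where (as far as I know) the typed groupBy(func)/reduce pair from 1.6 is exposed as groupByKey/reduceGroups. The session setup, the sample data and the names `pairs` and `totals` are made up for illustration, and this says nothing about how it performs compared with reduceByKey on an RDD.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Assumption: Spark 2.x on the classpath; a local session is used only for illustration.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("reduce-by-key-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data: (key, count) pairs, as a pair RDD would hold them.
val pairs: Dataset[(String, Int)] = Seq(("a", 1), ("b", 2), ("a", 3)).toDS()

// Group on the key, then merge the elements within each group.
// reduceGroups yields (key, reducedElement), so map back to (key, total).
val totals: Dataset[(String, Int)] = pairs
  .groupByKey(_._1)
  .reduceGroups((x: (String, Int), y: (String, Int)) => (x._1, x._2 + y._2))
  .map(kv => (kv._1, kv._2._2))

totals.show()
```

The reduce function has to be associative and commutative, the same requirement reduceByKey places on it.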
>
> As for join, the approach would be to use the joinWith function on Dataset,
> although IMO the API isn't as sugary as it was for RDD. That is something
> I've been discussing in a separate thread as well. I can't find a weblink
> for it, but the thread subject is "Dataset Outer Join vs RDD Outer Join".
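
For reference, a small sketch of what a joinWith call can look like, again assuming Spark 2.x. The case classes `User` and `Order`, their fields and the sample rows are hypothetical; only joinWith itself comes from the suggestion above.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical record types. In a compiled application these would need to be
// defined at the top level (outside the method that uses them) so encoders resolve.
case class User(id: Int, name: String)
case class Order(userId: Int, amount: Double)

// Assumption: Spark 2.x on the classpath; a local session is used only for illustration.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("join-with-sketch")
  .getOrCreate()
import spark.implicits._

val users: Dataset[User] = Seq(User(1, "ann"), User(2, "bob")).toDS()
val orders: Dataset[Order] = Seq(Order(1, 9.5), Order(1, 3.0)).toDS()

// joinWith keeps both sides typed: each result element is a (User, Order) pair.
// The third argument is the join type, passed as a string.
val joined: Dataset[(User, Order)] =
  users.joinWith(orders, users("id") === orders("userId"), "inner")

joined.show()
```

Unlike an RDD join, there is no key extraction step here: the join condition is a Column expression over the two Datasets, and the result is a Dataset of pairs rather than (key, (left, right)).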
>
> On Tue, Jun 7, 2016 at 2:40 PM, Bryan Jeffrey <bryan.jeff...@gmail.com>
> wrote:
>
>> It would also be nice if there was a better example of joining two
>> Datasets. I am looking at the documentation here:
>> http://spark.apache.org/docs/latest/sql-programming-guide.html. It seems
>> a little bit sparse - is there a better documentation source?
>>
>> Regards,
>>
>> Bryan Jeffrey
>>
>> On Tue, Jun 7, 2016 at 2:32 PM, Bryan Jeffrey <bryan.jeff...@gmail.com>
>> wrote:
>>
>>> Hello.
>>>
>>> I am looking at the option of moving RDD-based operations to
>>> Dataset-based operations.  We are calling 'reduceByKey' on some pair RDDs
>>> we have.  What would the equivalent be in the Dataset interface?  I do not
>>> see a simple reduceByKey replacement.
>>>
>>> Regards,
>>>
>>> Bryan Jeffrey
>>>
>>>
>>
>
>
> --
> *Richard Marscher*
> Senior Software Engineer
> Localytics
> Localytics.com <http://localytics.com/> | Our Blog
> <http://localytics.com/blog> | Twitter <http://twitter.com/localytics> |
> Facebook <http://facebook.com/localytics> | LinkedIn
> <http://www.linkedin.com/company/1148792?trk=tyah>
>
