subject:"Performance improvements for sorted RDDs"

RE: Performance improvements for sorted RDDs

2016-03-21 Thread JOAQUIN GUANTER GONZALBEZ

[mailto:daniel.dara...@lynxanalytics.com] Sent: Monday, March 21, 2016 16:20 To: Ted Yu <yuzhih...@gmail.com> CC: JOAQUIN GUANTER GONZALBEZ <joaquin.guantergonzal...@telefonica.com>; dev@spark.apache.org Subject: Re: Performance improvements for sorted RDDs There is related discussi

Re: Performance improvements for sorted RDDs

2016-03-21 Thread Daniel Darabos

There is related discussion in https://issues.apache.org/jira/browse/SPARK-8836. It's not too hard to implement this without modifying Spark and we measured ~10x improvement over plain RDD joins. I haven't benchmarked against DataFrames -- maybe they also realize this performance advantage. On
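The snippet above does not spell out how the sorted case was implemented. As a rough illustration only, and assuming both inputs are already sorted by key, a cogroup can be done in a single merge pass with no hashing or re-sorting. The sketch below is plain Python rather than Spark's API, and the name sorted_cogroup is hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def sorted_cogroup(left, right):
    """Cogroup two iterables of (key, value) pairs already sorted by key.

    Yields (key, left_values, right_values) in key order using one merge
    pass. Assumption: both inputs are sorted ascending by key.
    """
    left_groups = groupby(left, key=itemgetter(0))
    right_groups = groupby(right, key=itemgetter(0))

    def next_group(groups):
        # Pull the next run of equal keys and materialise its values,
        # because groupby invalidates a group once the iterator advances.
        for key, pairs in groups:
            return key, [value for _, value in pairs]
        return None  # input exhausted

    l = next_group(left_groups)
    r = next_group(right_groups)
    while l is not None or r is not None:
        if r is None or (l is not None and l[0] < r[0]):
            # Key appears only on the left side.
            yield l[0], l[1], []
            l = next_group(left_groups)
        elif l is None or r[0] < l[0]:
            # Key appears only on the right side.
            yield r[0], [], r[1]
            r = next_group(right_groups)
        else:
            # Same key on both sides: emit both value lists together.
            yield l[0], l[1], r[1]
            l = next_group(left_groups)
            r = next_group(right_groups)


if __name__ == "__main__":
    a = [(1, "a"), (1, "b"), (3, "c")]
    b = [(1, "x"), (2, "y")]
    for row in sorted_cogroup(a, b):
        print(row)
```

In a Spark setting the same merge would presumably run once per pair of matching partitions, which additionally assumes both RDDs share a partitioner; that part is not shown here.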

Re: Performance improvements for sorted RDDs

2016-03-21 Thread Ted Yu

Do you have performance numbers to back up this proposal for the cogroup operation? Thanks On Mon, Mar 21, 2016 at 1:06 AM, JOAQUIN GUANTER GONZALBEZ < joaquin.guantergonzal...@telefonica.com> wrote: > Hello devs, > > > > I have found myself in a situation where Spark is doing sub-optimal >

Performance improvements for sorted RDDs

2016-03-21 Thread JOAQUIN GUANTER GONZALBEZ

Hello devs, I have found myself in a situation where Spark is doing sub-optimal computations for my RDDs, and I was wondering whether a patch to enable improved performance for this scenario would be a welcome addition to Spark or not. The scenario happens when trying to cogroup two RDDs that
