Re: Sort-merge join improvement

2018-05-22 Thread Petar Zecevic
Hi, we went through a round of reviews on this PR. The performance improvements can be substantial, and unit and performance tests are included. One remark was that the amount of changed code is large, but I don't see how to reduce it and still keep the performance improvements. Besides,

Re: Sort-merge join improvement

2018-05-15 Thread Petar Zecevic
Based on some reviews, I put additional effort into fixing the case when whole-stage codegen is turned off. Sort-merge join with additional range conditions is now 10x faster (it can be more or less, depending on the exact use case) in both cases, with whole-stage codegen turned off or on, compared to

Re: Sort-merge join improvement

2018-04-23 Thread Petar Zecevic
Hi, the PR tests completed successfully (https://github.com/apache/spark/pull/21109). Can you please review the patch and merge it upstream if you think it's OK? Thanks, Petar On 4/18/2018 at 4:52 PM, Petar Zecevic wrote: As instructed offline, I opened a JIRA for this:

Re: Sort-merge join improvement

2018-04-18 Thread Petar Zecevic
As instructed offline, I opened a JIRA for this: https://issues.apache.org/jira/browse/SPARK-24020 I will create a pull request soon. On 4/17/2018 at 6:21 PM, Petar Zecevic wrote: Hello everybody. We (at the University of Zagreb and the University of Washington) have implemented an optimization of

Sort-merge join improvement

2018-04-17 Thread Petar Zecevic
Hello everybody. We (at the University of Zagreb and the University of Washington) have implemented an optimization of Spark's sort-merge join (SMJ) which has improved the performance of our jobs considerably, and we would like to know if the Spark community thinks it would be useful to include this in the