GitHub user hvanhovell opened a pull request: https://github.com/apache/spark/pull/7942
[SPARK-9357][SQL] Remove JoinedRow/Introduce JoinedProjection [WIP]

```JoinedRow```s are used to join two rows together, and they are used in many of the most performance-critical sections of Spark. The problem with ```JoinedRow``` is that it adds an extra layer of indirection, and that its current implementation branches on every field access; both are serious performance bottlenecks.

This PR introduces ```JoinedProjection``` and replaces ```JoinedRow``` as the primary method of combining two rows. A ```JoinedProjection``` is a function that takes a left and a right row as its input and combines them using the given expressions.

```JoinedRow``` cannot be removed entirely, because it provides the only way to do interpreted joined projections (Expression ```eval``` takes only one row as its argument), and because the code generation fallback relies on it.

The current implementation supports both the interpreted and the code-generated paths, and has been applied to all aggregate operators in Spark SQL. Other operators that use ```JoinedRow``` (Joins, Generate and PythonUDF) can be converted in follow-up PRs.

cc @yhuai @rxin

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hvanhovell/spark SPARK-9357

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/7942.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #7942

----
commit 4afec8c5d22cc3483e8331193aa52f4f6302b31f
Author: Herman van Hovell <hvanhov...@questtec.nl>
Date:   2015-08-04T05:56:26Z

    WIP - Initial Commit. It compiles. Now make it work.

commit 0f1be99d3467d021829b7654d9efc986fc120fa7
Author: Herman van Hovell <hvanhov...@questtec.nl>
Date:   2015-08-04T15:51:51Z

    Clean-up. Replaced non-joined generate path to two paths. Factored out some more expression support.
commit e6c5f076fd4bdeed6cd0b75764b6ba127b9fb84c
Author: Herman van Hovell <hvanhov...@questtec.nl>
Date:   2015-08-04T16:43:06Z

    Removed Joined Row From Aggregate Operators.

commit 05914722bcd0a7508536470c86c1b61628674563
Author: Herman van Hovell <hvanhov...@questtec.nl>
Date:   2015-08-04T16:47:56Z

    Style Fixes.

commit 81f11325eb512aa4fc986d323e16a36a9db85185
Author: Herman van Hovell <hvanhov...@questtec.nl>
Date:   2015-08-04T17:02:38Z

    Non-Branching JoinedRow.

commit 296a073ede6c195ce6a08d9e1f84176d086dfd0c
Author: Herman van Hovell <hvanhov...@questtec.nl>
Date:   2015-08-04T20:30:42Z

    Fix CodeGenFallback path. Bugfixes.

----
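To make the difference between the two approaches concrete, here is a minimal, hypothetical Scala sketch. It is not Spark's actual code: rows are modeled as plain `Array[Any]`, and the names `JoinedSketch`, `JoinedRowView`, `ColRef`, `FromLeft`, `FromRight` and `JoinedProjectionSketch` are illustrative only. The view re-checks which input row an ordinal belongs to on every access, while the projection resolves each output column's side once, when it is built, and then copies values directly into a new row.

```scala
// Hypothetical sketch only: rows are plain Array[Any]; this is NOT Spark's API.
object JoinedSketch {
  type Row = Array[Any]

  // JoinedRow-style view: an extra wrapper object that branches on
  // which side an ordinal belongs to on every single field access.
  final class JoinedRowView(left: Row, right: Row) {
    def numFields: Int = left.length + right.length
    def get(i: Int): Any =
      if (i < left.length) left(i) else right(i - left.length)
  }

  // Hypothetical column reference that names its input side explicitly.
  sealed trait ColRef
  final case class FromLeft(ordinal: Int) extends ColRef
  final case class FromRight(ordinal: Int) extends ColRef

  // JoinedProjection-style function: the side of every output column is
  // resolved once, at construction, into a specialized accessor.
  final class JoinedProjectionSketch(refs: Seq[ColRef]) extends ((Row, Row) => Row) {
    private val accessors: Array[(Row, Row) => Any] = refs.map {
      case FromLeft(i)  => (l: Row, r: Row) => l(i)
      case FromRight(i) => (l: Row, r: Row) => r(i)
    }.toArray

    // Builds a new output row; no left/right ordinal check per access.
    def apply(left: Row, right: Row): Row = {
      val out = new Array[Any](accessors.length)
      var i = 0
      while (i < accessors.length) {
        out(i) = accessors(i)(left, right)
        i += 1
      }
      out
    }
  }
}
```

For example, `new JoinedProjectionSketch(Seq(FromRight(1), FromLeft(0)))` applied to `left = Array(1, "a")` and `right = Array(2.5, true)` yields `Array(true, 1)`. A code-generated projection could go further and emit straight-line field copies with no per-column function call; this closure-based version only illustrates moving the left/right decision out of the per-access path.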