GitHub user hvanhovell opened a pull request:

    https://github.com/apache/spark/pull/7942

    [SPARK-9357][SQL] Remove JoinedRow/Introduce JoinedProjection [WIP]

    ```JoinedRow```s are used to join two rows together, and appear in many of 
the most performance-critical sections of Spark. The problem with 
```JoinedRow``` is that it adds an extra layer of indirection, and that the 
current code contains branches; both are serious performance bottlenecks.
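
To make the indirection and the branch concrete, here is a minimal sketch in plain Java. The names `ToyRow`, `ArrayRow` and `ToyJoinedRow` are hypothetical and exist only for illustration; they are not Spark's actual classes, and the real `JoinedRow` is assumed to have many more accessors than the single `get` shown here.

```java
// Hypothetical toy row abstraction; not Spark's InternalRow.
interface ToyRow {
    int numFields();
    Object get(int i);
}

// A row backed directly by an array: one lookup per field access.
final class ArrayRow implements ToyRow {
    private final Object[] values;

    ArrayRow(Object... values) {
        this.values = values;
    }

    public int numFields() {
        return values.length;
    }

    public Object get(int i) {
        return values[i];
    }
}

// A wrapper that presents two rows as one. Every field access goes
// through the wrapper first (indirection) and must decide which side
// holds the field (branch) before delegating.
final class ToyJoinedRow implements ToyRow {
    private final ToyRow left;
    private final ToyRow right;

    ToyJoinedRow(ToyRow left, ToyRow right) {
        this.left = left;
        this.right = right;
    }

    public int numFields() {
        return left.numFields() + right.numFields();
    }

    public Object get(int i) {
        int leftSize = left.numFields();
        // The branch taken on every access in this sketch.
        return i < leftSize ? left.get(i) : right.get(i - leftSize);
    }
}
```

In this sketch a consumer reading field `i` pays for the wrapper call, the comparison, and the delegated call, which is the overhead the description above refers to.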
    
    This PR introduces ```JoinedProjection```, which replaces ```JoinedRow``` as 
the primary method of combining two rows. A ```JoinedProjection``` is a 
function that takes a left and a right row as its input, and combines these 
using the given expressions.
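
The idea of a projection over two input rows can be sketched as follows in plain Java. All names here (`ToyRow`, `ArrayRow`, `JoinedExpression`, `ToyJoinedProjection`) are hypothetical and chosen for illustration; this is not the API introduced by the PR, only an assumption about its general shape.

```java
// Hypothetical toy row abstraction; not Spark's InternalRow.
interface ToyRow {
    int numFields();
    Object get(int i);
}

// A row backed directly by an array.
final class ArrayRow implements ToyRow {
    private final Object[] values;

    ArrayRow(Object... values) {
        this.values = values;
    }

    public int numFields() {
        return values.length;
    }

    public Object get(int i) {
        return values[i];
    }
}

// An expression that reads from both inputs directly, so no
// intermediate joined row has to be built to evaluate it.
interface JoinedExpression {
    Object eval(ToyRow left, ToyRow right);
}

// Takes a left and a right row and produces one output row by
// evaluating each expression against the pair.
final class ToyJoinedProjection {
    private final JoinedExpression[] expressions;

    ToyJoinedProjection(JoinedExpression... expressions) {
        this.expressions = expressions;
    }

    ToyRow apply(ToyRow left, ToyRow right) {
        Object[] out = new Object[expressions.length];
        for (int i = 0; i < expressions.length; i++) {
            out[i] = expressions[i].eval(left, right);
        }
        return new ArrayRow(out);
    }
}
```

For example, a projection built from `(l, r) -> l.get(0)` and `(l, r) -> ((Integer) l.get(1)) + ((Integer) r.get(0))` copies a key from the left row and adds a left field to a right field, with each expression already knowing which side it reads from.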
    
    ```JoinedRow``` cannot be removed entirely, because it provides the only way to do 
interpreted joined projections (Expression ```eval``` only takes one row as its 
argument), and because the code generation fallback relies on it.
    
    The current implementation supports both the interpreted and the code-generated 
paths, and has been applied to all aggregate operators in Spark SQL. The other 
operators that use ```JoinedRow```, namely *Joins, Generate and PythonUDF, can be 
converted in follow-up PRs.
    
    cc @yhuai @rxin 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hvanhovell/spark SPARK-9357

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/7942.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #7942
    
----
commit 4afec8c5d22cc3483e8331193aa52f4f6302b31f
Author: Herman van Hovell <hvanhov...@questtec.nl>
Date:   2015-08-04T05:56:26Z

    WIP - Initial Commit. It compiles. Now make it work.

commit 0f1be99d3467d021829b7654d9efc986fc120fa7
Author: Herman van Hovell <hvanhov...@questtec.nl>
Date:   2015-08-04T15:51:51Z

    Clean-up. Replaced non-joined generate path to two paths. Factored out some 
more expression support.

commit e6c5f076fd4bdeed6cd0b75764b6ba127b9fb84c
Author: Herman van Hovell <hvanhov...@questtec.nl>
Date:   2015-08-04T16:43:06Z

    Removed Joined Row From Aggregate Operators.

commit 05914722bcd0a7508536470c86c1b61628674563
Author: Herman van Hovell <hvanhov...@questtec.nl>
Date:   2015-08-04T16:47:56Z

    Style Fixes.

commit 81f11325eb512aa4fc986d323e16a36a9db85185
Author: Herman van Hovell <hvanhov...@questtec.nl>
Date:   2015-08-04T17:02:38Z

    Non-Branching JoinedRow.

commit 296a073ede6c195ce6a08d9e1f84176d086dfd0c
Author: Herman van Hovell <hvanhov...@questtec.nl>
Date:   2015-08-04T20:30:42Z

    Fix CodeGenFallback path. Bugfixes.

----


