GitHub user davies opened a pull request:

    https://github.com/apache/spark/pull/11177

    [SPARK-13293] [SQL] generate Expand

    Expand suffers from creating the UnsafeRow from the same input multiple 
times. With codegen, it only needs to copy some of the columns.
    
    With this change, we see a 3X improvement (from 43 seconds to 13 seconds) 
on a TPCDS query (Q67) that has eight columns in Rollup.
    
    Ideally, we could mask some of the columns based on a bitmask. I'd leave 
that for the future, because Aggregation (50 ns) is currently much slower than 
just copying the variables (1-2 ns).
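
To illustrate why Expand matters for Rollup, here is a minimal sketch in plain Python. It is not Spark code: the function name `expand_for_rollup`, the row layout (grouping columns first, then the remaining values), and the grouping-id convention are assumptions made for illustration. Each input row is emitted once per rollup level, and the copies differ only in which grouping columns are nulled out.

```python
from typing import Iterator, Sequence, Tuple


def expand_for_rollup(row: Sequence, num_group_cols: int) -> Iterator[Tuple]:
    """Yield one output row per rollup level for a single input row.

    Hypothetical helper: the first `num_group_cols` entries of `row` are
    treated as the rollup columns; the rest are passed through unchanged.
    """
    group_cols = tuple(row[:num_group_cols])
    rest = tuple(row[num_group_cols:])
    # Go from the most detailed level (all columns kept) to the grand total.
    for kept in range(num_group_cols, -1, -1):
        nulled = num_group_cols - kept
        projected = group_cols[:kept] + (None,) * nulled
        # Assumed convention: one bit per nulled-out column, last column
        # in the lowest bit.
        grouping_id = (1 << nulled) - 1
        yield projected + rest + (grouping_id,)


if __name__ == "__main__":
    for out in expand_for_rollup(("US", "CA", 10), num_group_cols=2):
        print(out)
```

With eight rollup columns, this sketch yields nine output rows per input row, and only the nulled-out columns and the grouping id change between them. That is the repeated work the description refers to.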

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/davies/spark gen_expand

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11177.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11177
    
----
commit 22ceda9a82c050abbe0d885513a713e9c2dceb29
Author: Davies Liu <dav...@databricks.com>
Date:   2016-02-11T23:32:21Z

    generate Expand

----


