GitHub user davies opened a pull request: https://github.com/apache/spark/pull/11177
[SPARK-13293] [SQL] generate Expand

Expand suffers from creating the UnsafeRow from the same input multiple times; with codegen, it only needs to copy some of the columns. After this change, we see a 3X improvement (from 43 seconds to 13 seconds) on a TPC-DS query (Q67) that has eight columns in its Rollup.

Ideally, we could mask some of the columns based on a bitmask. I'd leave that for the future, because currently Aggregation (50 ns) is much slower than just copying the variables (1-2 ns).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/davies/spark gen_expand

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11177.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #11177

----
commit 22ceda9a82c050abbe0d885513a713e9c2dceb29
Author: Davies Liu <dav...@databricks.com>
Date:   2016-02-11T23:32:21Z

    generate Expand

----
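A Rollup over n columns makes Expand emit n + 1 rows per input row, so the eight-column Rollup in Q67 produces nine rows for every input row. The Python sketch below is one way to picture why copying only some columns is enough. It is not Spark's generated code. The function name expand_rollup is hypothetical, and so is the trailing grouping-id column, which is shown here as a simple bitmask chosen for illustration. Consecutive Rollup projections differ in exactly one column, so the sketch reuses a single buffer and nulls out one column per step instead of rebuilding each output row from the input.

```python
from typing import Any, Iterator, List, Optional, Tuple


def expand_rollup(row: Tuple[Any, ...]) -> Iterator[Tuple[Optional[Any], ...]]:
    """Hypothetical sketch: yield the n + 1 ROLLUP grouping-set rows for one input row.

    Each output row holds the input columns (rolled-up ones set to None) followed
    by an illustrative grouping id: bit (n - 1 - col) is set once column `col`
    has been rolled up.
    """
    n = len(row)
    current: List[Optional[Any]] = list(row)  # one buffer, reused across projections
    grouping_id = 0
    yield tuple(current) + (grouping_id,)     # full grouping set: nothing rolled up
    for col in range(n - 1, -1, -1):          # roll columns up from right to left
        current[col] = None                   # only this column changes at this step
        grouping_id |= 1 << (n - 1 - col)
        yield tuple(current) + (grouping_id,)


if __name__ == "__main__":
    for out in expand_rollup(("x", "y", "z")):
        print(out)
```

Running it prints ('x', 'y', 'z', 0), ('x', 'y', None, 1), ('x', None, None, 3) and (None, None, None, 7), one tuple per line. A real pipeline would hand each row straight to the aggregation instead of materializing tuples. Each step after the first touches a single column, while rebuilding a row touches all n columns.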