GitHub user sitalkedia opened a pull request: https://github.com/apache/spark/pull/15064
[SPARK-17509] When wrapping catalyst datatype to Hive data type avoid…

## What changes were proposed in this pull request?

When wrapping Catalyst data types to Hive data types, the `wrap` function performed an expensive pattern match on every call, which consumed around 11% of CPU time. This patch avoids the repeated pattern matching by obtaining the wrapper only once and reusing it.

## How was this patch tested?

Tested by running the job on a cluster, which showed a CPU improvement of around 8%.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sitalkedia/spark skedia/hive_wrapper

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15064.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #15064

----
commit 19a2d96c4be9af363c2f5deb54e4a83b541a03f3
Author: Sital Kedia <ske...@fb.com>
Date:   2016-09-12T18:57:53Z

    [SPARK-17509] When wrapping catalyst datatype to Hive data type avoid pattern matching

----
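
For readers unfamiliar with the technique described above, the following is a minimal sketch of the general idea: resolve the pattern match once per data type, return a function, and reuse that function for every value. It is not the code from this patch. The types `DataType`, `IntType`, `StrType`, `WrappedInt`, `WrappedString` and the object `WrapSketch` are hypothetical stand-ins invented for illustration and do not correspond to Spark's or Hive's actual classes.

```scala
// Hypothetical, simplified stand-ins for Catalyst data types (not Spark's classes).
sealed trait DataType
case object IntType extends DataType
case object StrType extends DataType

// Hypothetical stand-ins for the wrapped (Hive-side) values.
case class WrappedInt(value: Int)
case class WrappedString(value: String)

object WrapSketch {
  // Per-call approach: the match on the data type runs for every single value.
  def wrapPerCall(value: Any, dataType: DataType): Any = dataType match {
    case IntType => WrappedInt(value.asInstanceOf[Int])
    case StrType => WrappedString(value.asInstanceOf[String])
  }

  // Reusable approach: the match runs once and returns a wrapper function.
  def wrapperFor(dataType: DataType): Any => Any = dataType match {
    case IntType => (v: Any) => WrappedInt(v.asInstanceOf[Int])
    case StrType => (v: Any) => WrappedString(v.asInstanceOf[String])
  }

  // Resolve one wrapper per column up front, then apply them to every row.
  def wrapRows(schema: Seq[DataType], rows: Seq[Seq[Any]]): Seq[Seq[Any]] = {
    val wrappers = schema.map(wrapperFor)
    rows.map(row => row.zip(wrappers).map { case (v, w) => w(v) })
  }
}
```

In this sketch, `wrapRows` performs one match per column rather than one per value, so the cost of the match no longer scales with the number of rows. Whether the saving is significant in practice depends on the workload; the figures quoted in the description come from the author's own cluster job.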