GitHub user sitalkedia opened a pull request:

    https://github.com/apache/spark/pull/15064

    [SPARK-17509] When wrapping Catalyst data type to Hive data type avoid…

    ## What changes were proposed in this pull request?
    
    When wrapping Catalyst data types to Hive data types, the `wrap` function
    was doing expensive pattern matching that consumed around 11% of CPU time.
    Avoid the pattern matching by constructing the wrapper only once and
    reusing it.
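    Below is a minimal sketch of the idea in Java, with no Hive dependency. A
    type dispatch stands in for the pattern matching in `wrap`. It runs once
    per column to build a converter, and every row then reuses those
    converters. `ColType`, `WrapperCache`, `converterFor`, `wrapPerValue` and
    `wrapCached` are hypothetical names chosen for illustration, not Spark or
    Hive APIs. The conversions themselves are placeholders.

    ```java
    import java.util.ArrayList;
    import java.util.List;
    import java.util.function.Function;

    // Hypothetical column types standing in for Catalyst DataTypes.
    enum ColType { INT, STRING, BOOLEAN }

    final class WrapperCache {
        // Counts how often the (stand-in for expensive) type dispatch runs.
        static int dispatches = 0;

        // The only place the type dispatch happens: returns a converter for one type.
        static Function<Object, Object> converterFor(ColType t) {
            dispatches++;
            switch (t) {
                case INT:     return v -> Long.valueOf(((Integer) v).longValue()); // placeholder: widen to long
                case STRING:  return v -> v.toString();                           // placeholder: copy as String
                case BOOLEAN: return v -> ((Boolean) v) ? 1 : 0;                  // placeholder: encode as int
                default:      throw new IllegalArgumentException("unsupported: " + t);
            }
        }

        // Before: re-dispatch on the column type for every value of every row.
        static List<Object[]> wrapPerValue(List<Object[]> rows, ColType[] schema) {
            List<Object[]> out = new ArrayList<>();
            for (Object[] row : rows) {
                Object[] wrapped = new Object[row.length];
                for (int i = 0; i < row.length; i++) {
                    wrapped[i] = converterFor(schema[i]).apply(row[i]);
                }
                out.add(wrapped);
            }
            return out;
        }

        // After: dispatch once per column, then reuse the converters for all rows.
        static List<Object[]> wrapCached(List<Object[]> rows, ColType[] schema) {
            List<Function<Object, Object>> converters = new ArrayList<>();
            for (ColType t : schema) {
                converters.add(converterFor(t));
            }
            List<Object[]> out = new ArrayList<>();
            for (Object[] row : rows) {
                Object[] wrapped = new Object[row.length];
                for (int i = 0; i < row.length; i++) {
                    wrapped[i] = converters.get(i).apply(row[i]);
                }
                out.add(wrapped);
            }
            return out;
        }
    }
    ```

    With 1,000 rows and three columns, `wrapPerValue` runs the dispatch 3,000
    times and `wrapCached` runs it 3 times, and both produce the same output.
    How much this saves in practice depends on how costly the real per-value
    matching is.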
    
    ## How was this patch tested?
    
    Tested by running the job on a cluster and observed around an 8% CPU
    improvement.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sitalkedia/spark skedia/hive_wrapper

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15064.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #15064
    
----
commit 19a2d96c4be9af363c2f5deb54e4a83b541a03f3
Author: Sital Kedia <ske...@fb.com>
Date:   2016-09-12T18:57:53Z

    [SPARK-17509] When wrapping catalyst datatype to Hive data type avoid pattern matching

----

