GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/17933
[SPARK-20588][SQL] Cache TimeZone instances per thread.

## What changes were proposed in this pull request?

Because the method `TimeZone.getTimeZone(String ID)` is synchronized on the `TimeZone` class, concurrent calls to this method become a bottleneck. This happens in particular when casting a string value that contains time zone information to a timestamp value, which goes through `DateTimeUtils.stringToTimestamp()` and obtains a `TimeZone` instance on the spot.

This PR caches the generated `TimeZone` instances per thread to avoid the synchronization.

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-20588

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17933.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17933

----
commit de79e50779c0f2e17ea26301ac7d1216b37331c9
Author: Takuya UESHIN <ues...@databricks.com>
Date:   2017-05-10T05:55:53Z

    Cache TimeZone instances per thread.

----
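
For readers who want a concrete picture of the per-thread caching idea described in this pull request, here is a minimal sketch in plain Java. It is not the code from the patch: the class name `TimeZoneCache` and its method are hypothetical, and the sketch only assumes the standard `java.util.TimeZone` and `java.lang.ThreadLocal` APIs. Each thread keeps its own map from zone ID to `TimeZone`, so repeated lookups of the same ID on one thread skip `TimeZone.getTimeZone` and whatever locking it performs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TimeZone;

// Hypothetical helper for illustration; not the class added by the PR.
final class TimeZoneCache {
    // One map per thread, so reads and writes need no synchronization.
    private static final ThreadLocal<Map<String, TimeZone>> CACHE =
        ThreadLocal.withInitial(HashMap::new);

    private TimeZoneCache() {}

    // Returns the TimeZone for the given ID, calling TimeZone.getTimeZone
    // only the first time this thread sees that ID.
    static TimeZone getTimeZone(String id) {
        return CACHE.get().computeIfAbsent(id, TimeZone::getTimeZone);
    }
}
```

One design point to keep in mind with a cache like this: `TimeZone` objects are mutable, so callers should treat the returned instance as read-only. The cost is one small map per thread, which may matter in applications with many short-lived threads.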