GitHub user JkSelf opened a pull request: https://github.com/apache/spark/pull/23269
partial revert 21052 because of the performance degradation in TPC-DS

## What changes were proposed in this pull request?

We ran TPC-DS on Spark 2.3 with and without [L486](https://github.com/apache/spark/blob/1d3dd58d21400b5652b75af7e7e53aad85a31528/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala#L486) and [L487](https://github.com/apache/spark/blob/1d3dd58d21400b5652b75af7e7e53aad85a31528/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala#L487) on the cluster configuration below. The [TPC-DS results](https://docs.google.com/spreadsheets/d/18a5BdOlmm8euTaRodyeWum9yu92mbWWu6JbhGXtr7yE/edit#gid=0) show a performance degradation, so this PR partially reverts 21052.

**Cluster info:**

|   | Master Node | Worker Nodes |
| -- | -- | -- |
| Node | 1x | 4x |
| Processor | Intel(R) Xeon(R) Platinum 8170 CPU @ 2.10GHz | Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz |
| Memory | 192 GB | 384 GB |
| Storage Main | 8 x 960G SSD | 8 x 960G SSD |
| Network | 10GbE |   |
| Role | CM Management, NameNode, Secondary NameNode, Resource Manager, Hive Metastore Server | DataNode, NodeManager |
| OS Version | CentOS 7.2 | CentOS 7.2 |
| Hadoop | Apache Hadoop 2.7.5 | Apache Hadoop 2.7.5 |
| Hive | Apache Hive 2.2.0 |   |
| Spark | Apache Spark 2.1.0 & Apache Spark 2.3.0 |   |
| JDK version | 1.8.0_112 | 1.8.0_112 |

**Related parameters setting:**

| Component | Parameter | Value |
| -- | -- | -- |
| Yarn Resource Manager | yarn.scheduler.maximum-allocation-mb | 120GB |
|   | yarn.scheduler.minimum-allocation-mb | 1GB |
|   | yarn.scheduler.maximum-allocation-vcores | 121 |
|   | yarn.resourcemanager.scheduler.class | Fair Scheduler |
| Yarn Node Manager | yarn.nodemanager.resource.memory-mb | 120GB |
|   | yarn.nodemanager.resource.cpu-vcores | 121 |
| Spark | spark.executor.memory | 110GB |
|   | spark.executor.cores | 50 |

## How was this patch tested?

N/A

Please review http://spark.apache.org/contributing.html before opening a pull request.
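
For readers who want to reproduce the setup, the two Spark settings listed in the parameter table could be written as a `spark-defaults.conf` fragment like the one below. This is only a sketch: the PR does not say how the settings were applied (they may equally have been passed as `--conf` flags), and writing the table's "110GB" as `110g` is an assumption about the intended unit suffix.

```
# Assumed spark-defaults.conf equivalent of the Spark rows in the table above
spark.executor.memory    110g
spark.executor.cores     50
```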

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JkSelf/spark partial-revert-21052

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23269.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #23269

----
commit 03cfe2b7506f5c5421aaf2858f3f31f2153db8fb
Author: jiake <ke.a.jia@...>
Date:   2018-12-10T06:50:32Z

    partial revert 21052 because of the performance degradation in tpc-ds

----