[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"
Github user JkSelf commented on the issue: https://github.com/apache/spark/pull/23204 @cloud-fan the new ticket is in [here](https://github.com/apache/spark/pull/23269 ). I will close this ticket. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23204 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23204 **[Test build #99894 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99894/testReport)** for PR 23204 at commit [`9baf05e`](https://github.com/apache/spark/commit/9baf05e00f2839a1dfa5d3a23cf265e1fe21d2a6). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23204 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99894/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23204 can we follow https://github.com/apache/spark/pull/23204#issuecomment-445510026 and create a new ticket? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"
Github user LuciferYang commented on the issue: https://github.com/apache/spark/pull/23204 ok~ already close #23214 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"
Github user JkSelf commented on the issue: https://github.com/apache/spark/pull/23204 @cloud-fan @dongjoon-hyun update the patch, please help review if you have time. Thanks. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23204 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5909/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23204 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23204 **[Test build #99894 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99894/testReport)** for PR 23204 at commit [`9baf05e`](https://github.com/apache/spark/commit/9baf05e00f2839a1dfa5d3a23cf265e1fe21d2a6). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23204 If we can quickly finish #23214 (within several days), let's go for it. But if we can't, I'd suggest we do the partial revert first to fix the perf regression, and add back the metrics later. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"
Github user LuciferYang commented on the issue: https://github.com/apache/spark/pull/23204 @cloud-fan If we decide to partial revert SPARK-21052 and no need for #23214, I will close it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23204 +1 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/23204 @cloud-fan and @JkSelf . For the partial revert, we had better create a new Apache JIRA issue. That will be a more cleaner way to backport. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"
Github user JkSelf commented on the issue: https://github.com/apache/spark/pull/23204 @cloud-fan ok, i will revert as your comments later. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"
Github user JkSelf commented on the issue: https://github.com/apache/spark/pull/23204 The result of all queries in tpcds with 1TB data scale is in [tpcds result](https://docs.google.com/spreadsheets/d/18a5BdOlmm8euTaRodyeWum9yu92mbWWu6JbhGXtr7yE/edit#gid=0) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23204 according to https://github.com/apache/spark/pull/23214#issuecomment-443999282 , the hash join metrics is wrongly implemented. I think it's fine to revert it and re-implement it later. @JkSelf can you address the comments and only revert the hash join part? thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/23204 Hi, @LuciferYang . If we are not going to revert this, could you close this PR? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"
Github user LuciferYang commented on the issue: https://github.com/apache/spark/pull/23204 @cloud-fan @viirya #23214 maybe reslove this problem and we needn't revert this patch. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23204 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99615/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23204 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23204 **[Test build #99615 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99615/testReport)** for PR 23204 at commit [`7d5008d`](https://github.com/apache/spark/commit/7d5008d11e37086f4a8206276791b9424bcf60b7). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23204 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5669/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23204 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23204 **[Test build #99615 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99615/testReport)** for PR 23204 at commit [`7d5008d`](https://github.com/apache/spark/commit/7d5008d11e37086f4a8206276791b9424bcf60b7). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"
Github user viirya commented on the issue: https://github.com/apache/spark/pull/23204 Is this observable in general hash join query, except for TPC-DS Q19? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23204 I'm fine to revert it if it caused a significant performance regression, we should revisit it later, with different ideas, like updating the metrics for each batch instead of each record. cc @viirya @gatorsmile --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23204 ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"
Github user JkSelf commented on the issue: https://github.com/apache/spark/pull/23204 **Cluster info:**  | Master Node | Worker Nodes -- | -- | -- Node | 1x | 4x Processor | Intel(R) Xeon(R) Platinum 8170 CPU @ 2.10GHz | Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz Memory | 192 GB | 384 GB Storage Main | 8 x 960G SSD | 8 x 960G SSD Network | 10Gbe Role | CM Management NameNodeSecondary NameNodeResource ManagerHive Metastore Server | DataNodeNodeManager OS Version | CentOS 7.2 | CentOS 7.2 Hadoop | Apache Hadoop 2.7.5 | Apache Hadoop 2.7.5 Hive | Apache Hive 2.2.0 |  Spark | Apache Spark 2.1.0 & Apache Spark2.3.0 |  JDK version | 1.8.0_112 | 1.8.0_112 **Related parameters setting:** Component | Parameter | Value -- | -- | -- Yarn Resource Manager | yarn.scheduler.maximum-allocation-mb | 40GB  | yarn.scheduler.minimum-allocation-mb | 1GB  | yarn.scheduler.maximum-allocation-vcores | 121  | Yarn.resourcemanager.scheduler.class | Fair Scheduler Yarn Node Manager | yarn.nodemanager.resource.memory-mb | 40GB  | yarn.nodemanager.resource.cpu-vcores | 121 Spark | spark.executor.memory | 34GB  | spark.executor.cores | 40 In above test environment, we found a serious performance degradation issue in Spark2.3 when running TPC-DS on SKX 8180. We investigated this problem and figured out the root cause is in community patch SPARK-21052 which add metrics to hash join process. And the impact code is [L486](https://github.com/apache/spark/blob/1d3dd58d21400b5652b75af7e7e53aad85a31528/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala#L486) and [L487](https://github.com/apache/spark/blob/1d3dd58d21400b5652b75af7e7e53aad85a31528/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala#L487) . Following is the result of TPC-DS Q19 in spark2.1, spark2.3 remove L486&487, spark2.3 add L486&487 and spark2.4. spark2.1 | spark2.3 remove L486&487 | spark2.3 addL486&487 | spark2.4 -- | -- | -- | -- 49s | 47s | 307s | 270s --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23204 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23204 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23204 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org