[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-09 Thread LuciferYang
Github user LuciferYang commented on the issue: https://github.com/apache/spark/pull/23214 As @cloud-fan said `the hash join metrics is wrongly implemented`, we will partial revert # SPARK-21052, no longer need this patch, close it ~ ---

[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-04 Thread LuciferYang
Github user LuciferYang commented on the issue: https://github.com/apache/spark/pull/23214 On the other hand, if is only a `multi-thread problem`, may not affect performance because there is no synchronized code part ... ---

[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-04 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23214 I think there is a problem, but no one found out because it's only about metrics. --- - To unsubscribe, e-mail:

[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-04 Thread LuciferYang
Github user LuciferYang commented on the issue: https://github.com/apache/spark/pull/23214 ``` For broadcast hash join, we will copy the broadcasted hash relation to avoid multi-thread problem, via HashedRelation.asReadOnlyCopy. However, this is a shallow copy, the

[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-04 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/23214 Thanks for doing this. I think we are more close to the root cause. --- - To unsubscribe, e-mail:

[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23214 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99651/ Test FAILed. ---

[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23214 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-04 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23214 **[Test build #99651 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99651/testReport)** for PR 23214 at commit

[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23214 It's easy to track `numKeyLookups` at `HashedRelation`, but it's hard to track `numProbes`. One idea is, we pass a `MutableInt` to `LongToUnsafeRowMap.getValue` as a parameter, and in the method

[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23214 I might know the root cause: `LongToUnsafeRowMap` is acutally accessed by multiple threads. For broadcast hash join, we will copy the broadcasted hash relation to avoid multi-thread

[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23214 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23214 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23214 **[Test build #99651 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99651/testReport)** for PR 23214 at commit

[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23214 ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:

[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-03 Thread LuciferYang
Github user LuciferYang commented on the issue: https://github.com/apache/spark/pull/23214 @adrian-wang ok~ I will add some comments to explain the reason --- - To unsubscribe, e-mail:

[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-03 Thread LuciferYang
Github user LuciferYang commented on the issue: https://github.com/apache/spark/pull/23214 @JkSelf thx~ --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:

[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-03 Thread JkSelf
Github user JkSelf commented on the issue: https://github.com/apache/spark/pull/23214 @LuciferYang the patch is fine in my test environment. @adrian-wang I will run all the tpcds queries in spark2.3 and spark2.3 with this patch later. ---

[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-03 Thread adrian-wang
Github user adrian-wang commented on the issue: https://github.com/apache/spark/pull/23214 maybe add some detailed test result in description and explain the reason for this in code comment? --- - To unsubscribe,

[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-03 Thread LuciferYang
Github user LuciferYang commented on the issue: https://github.com/apache/spark/pull/23214 ping @viirya --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:

[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-03 Thread LuciferYang
Github user LuciferYang commented on the issue: https://github.com/apache/spark/pull/23214 cc @cloud-fan , help to review this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For

[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23214 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-03 Thread LuciferYang
Github user LuciferYang commented on the issue: https://github.com/apache/spark/pull/23214 cc @JkSelf help to check this patch. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23214 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...

2018-12-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23214 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional