[GitHub] spark issue #22371: [SPARK-25386][CORE] Don't need to synchronize the IndexS...
Github user ConeyLiu commented on the issue: https://github.com/apache/spark/pull/22371 OK, thanks everyone for the help. Close it --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22371: [SPARK-25386][CORE] Don't need to synchronize the IndexS...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22371 My opinion is, it's not worth to spend time on it. The lock is not likely to be a bottleneck and it's better to keep it simple even it's sub-optimal. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22371: [SPARK-25386][CORE] Don't need to synchronize the IndexS...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22371 @ConeyLiu we may have an executor lost and then come back, and may have 2 same tasks running on the same executor. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22371: [SPARK-25386][CORE] Don't need to synchronize the IndexS...
Github user ConeyLiu commented on the issue: https://github.com/apache/spark/pull/22371 @squito , thanks for the review. I intend to using `ConcurrentHashMap[Int, AtomicReferenceArray]` previously. After re-think the code, I can know the lock here is used to prevent the same task with different attempt to commit the shuffle writer result at the same time. The task has a different attempt can be caused by follows: 1. Failed task or stage. In this case, the previous task attempt should already finish(failed or killed) or the result is not used anymore. 2. `Speculative task`. In this case, the speculative task can't be scheduled to the same executor as other attempts. So, what's real value for the lock. Maybe I'm wrong, hopeful some answers. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22371: [SPARK-25386][CORE] Don't need to synchronize the IndexS...
Github user ConeyLiu commented on the issue: https://github.com/apache/spark/pull/22371 Thanks @felixcheung, @srowen, @cloud-fan for your time. There is only one instance of `IndexShuffleBlockResolver` per executor, and the synchronize is used to protect the modify safely when there are same tasks with different attempt update at the same time. The synchronize is unnecessary for most of the tasks, and the modify is very simple. I have tested locally, the results as follow. I admit that this change brings little improvement to complex tasks, but it does not cause performance degradation. `./spark-shell --master local[20] --driver-memory 40g` `spark.range(0, 1000, 1, 100).repartition(200).count()` before: map | reduce | --- 2s | 0.4s 0.8s | 0.2s 0.7s | 0.2s after: map | reduce | --- 0.8s | 0.2s 0.5s | 0.4s 0.5s | 0.2s --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22371: [SPARK-25386][CORE] Don't need to synchronize the IndexS...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22371 How much perf can we save here? I don't think shuffle writing will be bottlenecked by this lock. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22371: [SPARK-25386][CORE] Don't need to synchronize the IndexS...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/22371 + @srowen @squito @JoshRosen --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22371: [SPARK-25386][CORE] Don't need to synchronize the IndexS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22371 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22371: [SPARK-25386][CORE] Don't need to synchronize the IndexS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22371 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22371: [SPARK-25386][CORE] Don't need to synchronize the IndexS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22371 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22371: [SPARK-25386][CORE] Don't need to synchronize the IndexS...
Github user ConeyLiu commented on the issue: https://github.com/apache/spark/pull/22371 @cloud-fan @jiangxb1987 Could you help to review this? Thanks a lot. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org