[GitHub] spark issue #22371: [SPARK-25386][CORE] Don't need to synchronize the IndexS...

2018-09-11 Thread ConeyLiu
Github user ConeyLiu commented on the issue:

https://github.com/apache/spark/pull/22371
  
OK, thanks everyone for the help. Close it


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22371: [SPARK-25386][CORE] Don't need to synchronize the IndexS...

2018-09-11 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22371
  
My opinion is, it's not worth to spend time on it. The lock is not likely 
to be a bottleneck and it's better to keep it simple even it's sub-optimal.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22371: [SPARK-25386][CORE] Don't need to synchronize the IndexS...

2018-09-11 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22371
  
@ConeyLiu we may have an executor lost and then come back, and may have 2 
same tasks running on the same executor.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22371: [SPARK-25386][CORE] Don't need to synchronize the IndexS...

2018-09-10 Thread ConeyLiu
Github user ConeyLiu commented on the issue:

https://github.com/apache/spark/pull/22371
  
@squito , thanks for the review. I intend to using `ConcurrentHashMap[Int, 
AtomicReferenceArray]` previously.

After re-think the code, I can know the lock here is used to prevent the 
same task with different attempt to commit the shuffle writer result at the 
same time. The task has a different attempt can be caused by follows:

1. Failed task or stage. In this case, the previous task attempt should 
already finish(failed or killed) or the result is not used anymore.

2. `Speculative task`. In this case, the speculative task can't be 
scheduled to the same executor as other attempts.

So, what's real value for the lock. Maybe I'm wrong, hopeful some answers.




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22371: [SPARK-25386][CORE] Don't need to synchronize the IndexS...

2018-09-10 Thread ConeyLiu
Github user ConeyLiu commented on the issue:

https://github.com/apache/spark/pull/22371
  
Thanks @felixcheung, @srowen, @cloud-fan for your time. There is only one 
instance of `IndexShuffleBlockResolver` per executor, and the synchronize is 
used to protect the modify safely when there are same tasks with different 
attempt update at the same time. The synchronize is unnecessary for most of the 
tasks, and the modify is very simple.

I have tested locally, the results as follow. I admit that this change 
brings little improvement to complex tasks, but it does not cause performance 
degradation.

`./spark-shell --master local[20] --driver-memory 40g`
`spark.range(0, 1000, 1, 100).repartition(200).count()`

before: 

map | reduce
 | ---
2s | 0.4s
0.8s |  0.2s
0.7s |  0.2s

after:

map | reduce
 | ---
0.8s | 0.2s
0.5s |  0.4s
0.5s |  0.2s


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22371: [SPARK-25386][CORE] Don't need to synchronize the IndexS...

2018-09-10 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22371
  
How much perf can we save here? I don't think shuffle writing will be 
bottlenecked by this lock.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22371: [SPARK-25386][CORE] Don't need to synchronize the IndexS...

2018-09-09 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/22371
  
+ @srowen @squito @JoshRosen 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22371: [SPARK-25386][CORE] Don't need to synchronize the IndexS...

2018-09-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22371
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22371: [SPARK-25386][CORE] Don't need to synchronize the IndexS...

2018-09-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22371
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22371: [SPARK-25386][CORE] Don't need to synchronize the IndexS...

2018-09-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22371
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22371: [SPARK-25386][CORE] Don't need to synchronize the IndexS...

2018-09-09 Thread ConeyLiu
Github user ConeyLiu commented on the issue:

https://github.com/apache/spark/pull/22371
  
@cloud-fan @jiangxb1987 Could you help to review this? Thanks a lot.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org