[ 
https://issues.apache.org/jira/browse/SPARK-2638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071449#comment-14071449
 ] 

Stephen Boesch commented on SPARK-2638:
---------------------------------------


The external mini-project was used to verify the concurrency improvement of 
the "fine-grained" approach, which locks only on the individual shuffleId, 
over the coarse-grained approach currently used, which locks on the entire 
"fetching" collection. 

The concept was tested in the separate GitHub project; the results follow. 
To summarize: 20 accesses were made in both coarse-grained and fine-grained 
mode. The fine-grained test completed in 1/20 of the time (0.2 sec) taken by 
the coarse-grained one (4 sec).


Testing started at 1:29 PM ...
TestResult for FineGrainedRunner-0: result=TestResult for FineGrainedRunner-0: 
result=None duration=0.2 duration=0.2
TestResult for FineGrainedRunner-1: result=TestResult for FineGrainedRunner-1: 
result=None duration=0.2 duration=0.2
TestResult for FineGrainedRunner-2: result=TestResult for FineGrainedRunner-2: 
result=None duration=0.2 duration=0.2
TestResult for FineGrainedRunner-3: result=TestResult for FineGrainedRunner-3: 
result=None duration=0.2 duration=0.2
TestResult for FineGrainedRunner-4: result=TestResult for FineGrainedRunner-4: 
result=None duration=0.2 duration=0.2
TestResult for FineGrainedRunner-5: result=TestResult for FineGrainedRunner-5: 
result=None duration=0.2 duration=0.2
TestResult for FineGrainedRunner-6: result=TestResult for FineGrainedRunner-6: 
result=None duration=0.2 duration=0.2
TestResult for FineGrainedRunner-7: result=TestResult for FineGrainedRunner-7: 
result=None duration=0.2 duration=0.2
TestResult for FineGrainedRunner-8: result=TestResult for FineGrainedRunner-8: 
result=None duration=0.2 duration=0.2
TestResult for FineGrainedRunner-9: result=TestResult for FineGrainedRunner-9: 
result=None duration=0.2 duration=0.2
TestResult for FineGrainedRunner-10: result=TestResult for 
FineGrainedRunner-10: result=None duration=0.2 duration=0.2
TestResult for FineGrainedRunner-11: result=TestResult for 
FineGrainedRunner-11: result=None duration=0.2 duration=0.2
TestResult for FineGrainedRunner-12: result=TestResult for 
FineGrainedRunner-12: result=None duration=0.2 duration=0.2
TestResult for FineGrainedRunner-13: result=TestResult for 
FineGrainedRunner-13: result=None duration=0.2 duration=0.2
TestResult for FineGrainedRunner-14: result=TestResult for 
FineGrainedRunner-14: result=None duration=0.2 duration=0.2
TestResult for FineGrainedRunner-15: result=TestResult for 
FineGrainedRunner-15: result=None duration=0.2 duration=0.2
TestResult for FineGrainedRunner-16: result=TestResult for 
FineGrainedRunner-16: result=None duration=0.2 duration=0.2
TestResult for FineGrainedRunner-17: result=TestResult for 
FineGrainedRunner-17: result=None duration=0.2 duration=0.2
TestResult for FineGrainedRunner-18: result=TestResult for 
FineGrainedRunner-18: result=None duration=0.2 duration=0.2
TestResult for FineGrainedRunner-19: result=TestResult for 
FineGrainedRunner-19: result=None duration=0.2 duration=0.2
14/07/22 13:29:19 INFO TimedThreadedTestTestSuite: ** Completed 
FineGrainedTests duration=0.2**
TestResult for CoarseGrainedRunner-0: result=TestResult for 
CoarseGrainedRunner-0: result=None duration=0.2 duration=0.2
TestResult for CoarseGrainedRunner-1: result=TestResult for 
CoarseGrainedRunner-1: result=None duration=3.8 duration=3.8
TestResult for CoarseGrainedRunner-2: result=TestResult for 
CoarseGrainedRunner-2: result=None duration=3.6 duration=3.6
TestResult for CoarseGrainedRunner-3: result=TestResult for 
CoarseGrainedRunner-3: result=None duration=3.4 duration=3.4
TestResult for CoarseGrainedRunner-4: result=TestResult for 
CoarseGrainedRunner-4: result=None duration=3.2 duration=3.2
TestResult for CoarseGrainedRunner-5: result=TestResult for 
CoarseGrainedRunner-5: result=None duration=2.8 duration=2.8
TestResult for CoarseGrainedRunner-6: result=TestResult for 
CoarseGrainedRunner-6: result=None duration=3.0 duration=3.0
TestResult for CoarseGrainedRunner-7: result=TestResult for 
CoarseGrainedRunner-7: result=None duration=2.6 duration=2.6
TestResult for CoarseGrainedRunner-8: result=TestResult for 
CoarseGrainedRunner-8: result=None duration=2.4 duration=2.4
TestResult for CoarseGrainedRunner-9: result=TestResult for 
CoarseGrainedRunner-9: result=None duration=2.2 duration=2.2
TestResult for CoarseGrainedRunner-10: result=TestResult for 
CoarseGrainedRunner-10: result=None duration=1.8 duration=1.8
TestResult for CoarseGrainedRunner-11: result=TestResult for 
CoarseGrainedRunner-11: result=None duration=2.0 duration=2.0
TestResult for CoarseGrainedRunner-12: result=TestResult for 
CoarseGrainedRunner-12: result=None duration=1.6 duration=1.6
TestResult for CoarseGrainedRunner-13: result=TestResult for 
CoarseGrainedRunner-13: result=None duration=1.4 duration=1.4
TestResult for CoarseGrainedRunner-14: result=TestResult for 
CoarseGrainedRunner-14: result=None duration=1.2 duration=1.2
TestResult for CoarseGrainedRunner-15: result=TestResult for 
CoarseGrainedRunner-15: result=None duration=1.0 duration=1.0
14/07/22 13:29:23 INFO TimedThreadedTestTestSuite: ** Completed 
CoarseGrainedTests duration=4.0**
14/07/22 13:29:23 INFO TimedThreadedTestTestSuite: HEY we are done!
TestResult for CoarseGrainedRunner-16: result=TestResult for 
CoarseGrainedRunner-16: result=None duration=0.8 duration=0.8
TestResult for CoarseGrainedRunner-17: result=TestResult for 
CoarseGrainedRunner-17: result=None duration=0.6 duration=0.6
TestResult for CoarseGrainedRunner-18: result=TestResult for 
CoarseGrainedRunner-18: result=None duration=0.3 duration=0.4
TestResult for CoarseGrainedRunner-19: result=TestResult for 
CoarseGrainedRunner-19: result=None duration=4.0 duration=4.0
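
The timings above are what one would expect if each access holds its lock 
for about 0.2 sec: under a single shared lock the 20 accesses run one after 
another (20 x 0.2 = 4.0 sec), while under per-id locks they overlap. The 
following is only a minimal sketch of that comparison, not the code from the 
GitHub project; the names LockGranularityDemo and timeRun, and the use of 
Thread.sleep to stand in for a fetch, are assumptions made for illustration.

```scala
import java.util.concurrent.{Executors, TimeUnit}

// Hypothetical harness: each task holds the lock chosen by lockFor while it
// sleeps for workMillis, which stands in for one simulated fetch.
object LockGranularityDemo {

  // Runs numTasks tasks on numTasks threads and returns the wall time in ms.
  def timeRun(numTasks: Int, workMillis: Long)(lockFor: Int => AnyRef): Long = {
    val pool = Executors.newFixedThreadPool(numTasks)
    val start = System.nanoTime()
    (0 until numTasks).foreach { i =>
      pool.execute(new Runnable {
        def run(): Unit = lockFor(i).synchronized { Thread.sleep(workMillis) }
      })
    }
    pool.shutdown()
    pool.awaitTermination(1, TimeUnit.MINUTES)
    (System.nanoTime() - start) / 1000000
  }

  def main(args: Array[String]): Unit = {
    val shared = new Object                           // one lock for everything
    val perId = Array.fill[AnyRef](20)(new Object)    // one lock per id
    val coarse = timeRun(20, 200L)(_ => shared)       // tasks run serially
    val fine = timeRun(20, 200L)(i => perId(i))       // tasks overlap
    println(s"coarse=${coarse}ms fine=${fine}ms")
  }
}
```

Exact numbers will vary by machine; the point is only that the coarse-grained 
run grows with the number of accesses while the fine-grained run does not.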



> Improve concurrency of fetching Map outputs
> -------------------------------------------
>
>                 Key: SPARK-2638
>                 URL: https://issues.apache.org/jira/browse/SPARK-2638
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 1.0.0
>         Environment: All
>            Reporter: Stephen Boesch
>            Priority: Minor
>              Labels: MapOutput, concurrency
>             Fix For: 1.1.0
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> This issue was noticed while perusing the MapOutputTracker source code. 
> Notice that the synchronization is on the containing "fetching" collection, 
> which makes ALL fetches wait if any fetch is in progress.  
> The fix is to synchronize instead on the shuffleId (interned as a string to 
> ensure JVM-wide visibility).
>   def getServerStatuses(shuffleId: Int, reduceId: Int): Array[(BlockManagerId, Long)] = {
>     val statuses = mapStatuses.get(shuffleId).orNull
>     if (statuses == null) {
>       logInfo("Don't have map outputs for shuffle " + shuffleId + ", fetching them")
>       var fetchedStatuses: Array[MapStatus] = null
>       fetching.synchronized {   // This is existing code
>       // shuffleId.toString.intern.synchronized {  // New Code
>         if (fetching.contains(shuffleId)) {
>           // Someone else is fetching it; wait for them to be done
>           while (fetching.contains(shuffleId)) {
>             try {
>               fetching.wait()
>             } catch {
>               case e: InterruptedException =>
>             }
>           }
> This is only a small code change, but the test cases to prove (a) proper 
> functionality and (b) proper performance improvement are not so trivial.  
> For (b) it is not worthwhile to add a test case to the codebase. Instead I 
> have added a git project that demonstrates the concurrency/performance 
> improvement using the fine-grained approach. The GitHub project is at
> https://github.com/javadba/scalatesting.git . Simply run "sbt test". Note: 
> it is unclear how/where to include this ancillary testing/verification 
> information that will not be included in the git PR. I am open to any 
> suggestions, even as far as simply removing references to it.
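
For illustration, per-shuffle locking can also be sketched with one dedicated 
lock object per shuffleId instead of an interned string. This is a hedged, 
self-contained sketch and not the MapOutputTracker change itself: the class 
PerShuffleFetcher and the fetchRemote function are hypothetical names. One 
point worth keeping in mind with either variant is that wait() and 
notifyAll() must be called on the object whose monitor is held, so a call 
such as fetching.wait() would have to move to the per-shuffle lock as well. 
The sketch sidesteps that by letting later callers simply block on the 
per-shuffle monitor until the first fetch has finished.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical helper, not part of Spark: caches one value per shuffleId and
// serializes fetches only among callers asking for the same shuffleId.
class PerShuffleFetcher[V <: AnyRef](fetchRemote: Int => V) {
  private val locks = new ConcurrentHashMap[Int, AnyRef]()
  private val cache = new ConcurrentHashMap[Int, V]()

  // Returns the single lock object associated with this shuffleId.
  private def lockFor(shuffleId: Int): AnyRef = {
    val fresh = new Object
    val prior = locks.putIfAbsent(shuffleId, fresh)
    if (prior == null) fresh else prior
  }

  def get(shuffleId: Int): V = {
    val cached = cache.get(shuffleId)
    if (cached != null) cached
    else lockFor(shuffleId).synchronized {
      // Re-check: another thread may have fetched while we waited for the lock.
      val again = cache.get(shuffleId)
      if (again != null) again
      else {
        val fetched = fetchRemote(shuffleId)
        cache.put(shuffleId, fetched)
        fetched
      }
    }
  }
}
```

Callers asking for different shuffleIds never contend with each other here, 
and a second caller for the same shuffleId reuses the cached result rather 
than fetching again.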



--
This message was sent by Atlassian JIRA
(v6.2#6252)
