Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/1541#issuecomment-54098544
  
    Thanks for the reminder.
    
    @kayousterhout I looked over @zsxwing's example and I agree that there's a 
thread-safety issue here.    We can definitely have multiple concurrent block 
fetches that could race when accessing mapStatuses.
    
    There's a lot of other state in MapOutputTracker that's guarded with 
`synchronized`, which implies that this instances of MapOutputTracker will be 
accessed from multiple threads.  In fact, there's even a 
`statuses.synchronized` at the end of `getServerStatuses` that's guarding a 
`MapOutputTracker.convertMapStatuses` call, but for some reason the other 
branch of the `if` guards it using `fetchedStatuses.synchronized` (which 
doesn't even make sense, since `fetchedStatuses` is a local variable defined 
inside of `getServerStatuses`).
    
    Since the synchronization logic here seems kind of messy / confusing and 
mapStatuses is only accessed from MapOutputTracker, maybe it would be better to 
just add proper synchronization around reads/writes to mapStatuses rather than 
converting it to a ConcurrentHashMap.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to