Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1541#issuecomment-54098544 Thanks for the reminder. @kayousterhout I looked over @zsxwing's example and I agree that there's a thread-safety issue here. We can definitely have multiple concurrent block fetches that could race when accessing mapStatuses. There's a lot of other state in MapOutputTracker that's guarded with `synchronized`, which implies that this instances of MapOutputTracker will be accessed from multiple threads. In fact, there's even a `statuses.synchronized` at the end of `getServerStatuses` that's guarding a `MapOutputTracker.convertMapStatuses` call, but for some reason the other branch of the `if` guards it using `fetchedStatuses.synchronized` (which doesn't even make sense, since `fetchedStatuses` is a local variable defined inside of `getServerStatuses`). Since the synchronization logic here seems kind of messy / confusing and mapStatuses is only accessed from MapOutputTracker, maybe it would be better to just add proper synchronization around reads/writes to mapStatuses rather than converting it to a ConcurrentHashMap.
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org