GitHub user squito commented on the issue:

https://github.com/apache/spark/pull/16639

cc @kayousterhout @markhamstra @mateiz

This isn't just protecting against crazy user code: I've seen users hit this with Spark SQL (because of https://github.com/apache/spark/blob/278fa1eb305220a85c816c948932d6af8fa619aa/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala#L214), so it seems important to fix.

I attempted to write a larger integration test that reproduced the issue in a "local-cluster" setup, but got stuck. `ShuffleBlockFetcherIterator` does _some_ fetches on construction, before it's used as an iterator wrapped in user code. If the failure happens during that initialization, it was already handled correctly before this change. For the error to occur, the failure has to happen inside the call to `shuffleBlockFetcherIterator.next()` when it's called by the user's iterator.

I was eventually able to reproduce it with https://github.com/squito/spark/commit/c2d27d10f32edf70e78d849967f7b7bf51495c4e, but it involved hacking internals and didn't seem easy to turn into a test. I settled for a simpler unit test just on `Executor`, but I'm open to more suggestions.
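
To make the timing issue above concrete, here is a toy simulation in Python. It is not Spark code: `FetchFailed`, `ToyShuffleFetcher`, `user_code`, and `run_task` are all hypothetical names, and the sketch assumes the user code catches the fetch exception and rethrows a different one. It only illustrates why a failure during the eager fetches at construction stays visible, while a failure inside `next()` under user code may not.

```python
class FetchFailed(Exception):
    """Hypothetical stand-in for a shuffle fetch failure."""


class ToyShuffleFetcher:
    """Toy iterator: fetches `eager_count` blocks on construction,
    and the remaining blocks lazily inside __next__."""

    def __init__(self, blocks, eager_count, bad_block=None):
        self._pending = list(blocks)
        self._bad_block = bad_block
        self._ready = []
        # Eager fetches run before any user code wraps the iterator.
        for _ in range(min(eager_count, len(self._pending))):
            self._ready.append(self._fetch(self._pending.pop(0)))

    def _fetch(self, block):
        if block == self._bad_block:
            raise FetchFailed(block)
        return block

    def __iter__(self):
        return self

    def __next__(self):
        if self._ready:
            return self._ready.pop(0)
        if self._pending:
            # Lazy fetch: runs while user code is driving the iterator.
            return self._fetch(self._pending.pop(0))
        raise StopIteration


def user_code(records):
    """Assumed user behaviour: catch everything and rethrow a new error."""
    try:
        count = 0
        for _ in records:
            count += 1
        return count
    except Exception as exc:
        raise RuntimeError("task failed while processing rows") from exc


def run_task(blocks, eager_count, bad_block=None):
    """Report what the caller of the task gets to observe."""
    try:
        fetcher = ToyShuffleFetcher(blocks, eager_count, bad_block)
        user_code(fetcher)
        return "success"
    except FetchFailed:
        return "fetch failure visible"
    except RuntimeError:
        return "hidden by user code"
```

With blocks `["a", "b", "c"]` and one eager fetch, a bad block `"a"` fails during construction and surfaces as a fetch failure, whereas a bad block `"c"` fails inside `__next__` and reaches the caller only as the generic error raised by `user_code`.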