GitHub user squito commented on the issue:

    https://github.com/apache/spark/pull/16639
  
    cc @kayousterhout @markhamstra @mateiz 
    
    This isn't just protecting against crazy user code -- I've seen users hit 
this with Spark SQL (because of 
https://github.com/apache/spark/blob/278fa1eb305220a85c816c948932d6af8fa619aa/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala#L214),
 so it seems important to fix.
    
    I attempted to write a larger integration test that would reproduce the 
issue in a "local-cluster" setup, but got stuck.  ShuffleBlockFetcherIterator 
does _some_ fetches on construction, before it's used as an iterator wrapped in 
user code.  If the failure happens during that initialization, it is thrown 
outside the user code, so that case already worked fine before this change.  
For the error to occur, the failure has to happen inside the call to 
`shuffleBlockFetcherIterator.next()` when it's called by the user's iterator.  
I was eventually able to reproduce it with 
https://github.com/squito/spark/commit/c2d27d10f32edf70e78d849967f7b7bf51495c4e 
but that involved hacking internals and didn't seem easy to turn into a test.  
I settled for a simpler unit test just on `Executor`, but I'm open to more 
suggestions.
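
    To make the timing issue concrete, here is a rough sketch in plain Scala. It is not Spark code: `FetchFailed`, `EagerFirstFetchIterator`, and `FetchFailureSketch` are made-up stand-ins, and the "user code" is just an assumed example of a wrapper that catches everything and rethrows a generic exception. It only illustrates why a failure during construction stays visible, while a failure inside `next()` can get hidden.

```scala
// Hypothetical stand-in for a fetch failure; not a Spark class.
final class FetchFailed(msg: String) extends RuntimeException(msg)

// Toy iterator that fetches block 0 eagerly in its constructor and the
// remaining blocks lazily inside next(). `failAt` picks the block whose
// fetch fails (use -1 for "no failure").
final class EagerFirstFetchIterator(blocks: Vector[String], failAt: Int)
    extends Iterator[String] {

  private def fetch(i: Int): String =
    if (i == failAt) throw new FetchFailed(s"block $i unavailable") else blocks(i)

  // Eager fetch: a failure here is thrown before any user code is involved.
  private val first: Option[String] = if (blocks.nonEmpty) Some(fetch(0)) else None
  private var pos = 0

  override def hasNext: Boolean = pos < blocks.length

  override def next(): String = {
    if (!hasNext) throw new NoSuchElementException("no more blocks")
    // Lazy fetch: a failure here is thrown from inside the caller's code.
    val out = if (pos == 0) first.get else fetch(pos)
    pos += 1
    out
  }
}

object FetchFailureSketch {
  // Assumed user code: consumes the iterator and wraps any failure in a
  // generic exception, so the original exception type is no longer on top.
  def userCode(it: Iterator[String]): Int =
    try it.map(_.length).sum
    catch {
      case t: Throwable => throw new RuntimeException("task failed while writing rows", t)
    }

  // Reports what the caller (playing the role of the executor) gets to see.
  def outcome(blocks: Vector[String], failAt: Int): String =
    try {
      val it = new EagerFirstFetchIterator(blocks, failAt) // built outside user code
      userCode(it).toString
    } catch {
      case _: FetchFailed      => "FetchFailed" // still recognizable
      case _: RuntimeException => "wrapped"     // fetch failure is hidden
    }
}
```

    With three blocks, `FetchFailureSketch.outcome(blocks, 0)` yields `"FetchFailed"` because the failure happens during construction, while `outcome(blocks, 1)` yields `"wrapped"` because the failure happens inside `next()` and the wrapper masks it. The second case is the one a test has to hit.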


