Github user squito commented on the issue: https://github.com/apache/spark/pull/16639 > (1) Instead of this approach, did you consider walking through the exceptions (with getCause()) to see if there's a nested FetchFailure in there? That seems simpler, with the con of missing scenarios where the user discards the initial exception entirely. Not sure how likely that is? The current approach is definitely more defensive towards bad user code, but I'm hesitant about the amount of added complexity. yeah, I considered it but chose this for exactly the reason you suggest, to be more defensive. It just seems way too easy for a user to get this wrong. Even assuming the user always wraps the exception, what about bad error handling in a `finally`? That is pretty common, even in spark, eg.: ```scala try { ... } catch { case t: Throwable => throw new MySpecialAppException(t) } finally { someResource.close() // oops, this can throw an exception } ``` Given the special importance of this exception, it really seems like we should be handling it in some way that bad user code can't cover it up. > (2) For testing, does it help to pass in a much smaller maxBytesToFetch (spark.reduce.maxSizeInFlight) to ShuffleBlockFetcherIterator to limit the size of the initial fetches, to make it easier to wrap the FetchFailed when you want to? Do you mean for writing a larger integration test? that would probably help make the failure more likely, but I off the top my head, I don't think that will be enough to reliable trigger the real failure (which means if its ever broken, we'll just think its a flaky test, not broken functionality). IIRC the problem isn't just having more fetches than fit in the initial requests -- you *also* need the fetch failure to occur at a specific point in the processing. I'd need to futz around with it a while again to say for sure, though. I'd certainly like an integration test, but eventually decided the unit test I added was sufficient.
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org