Github user squito commented on the issue:

    https://github.com/apache/spark/pull/16639
  
    > (1) Instead of this approach, did you consider walking through the 
exceptions (with getCause()) to see if there's a nested FetchFailure in there? 
That seems simpler, with the con of missing scenarios where the user discards 
the initial exception entirely. Not sure how likely that is? The current 
approach is definitely more defensive towards bad user code, but I'm hesitant 
about the amount of added complexity.
    
    yeah, I considered it but chose this approach for exactly the reason you suggest: to 
be more defensive.  It just seems way too easy for a user to get this wrong.  
Even assuming the user always wraps the exception, what about bad error 
handling in a `finally`?  That is pretty common, even in Spark, e.g.:
    
    ```scala
    try { 
      ...
    } catch {
      case t: Throwable => throw new MySpecialAppException(t)
    } finally {
      someResource.close() // oops, this can throw an exception
    }
    ```
    
    Given the special importance of this exception, it really seems like we 
should be handling it in some way that bad user code can't cover it up.
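    
    To make the `finally` pitfall concrete, here is a minimal, self-contained sketch.  The 
exception classes are stand-ins I made up for illustration, not Spark's real 
`FetchFailedException`; the point is just the JVM semantics: an exception thrown from 
`finally` *replaces* the in-flight one, so even a correct `getCause()` walk never sees 
the fetch failure.
    
    ```scala
    // Stand-in classes (hypothetical, for illustration only):
    class FetchFailedLike(msg: String) extends Exception(msg)
    class MySpecialAppException(cause: Throwable) extends Exception(cause)
    class ResourceCloseException(msg: String) extends Exception(msg)
    
    def userCode(): Unit = {
      try {
        throw new FetchFailedLike("fetch failed")  // the exception Spark cares about
      } catch {
        // the user even wraps it correctly here...
        case t: Throwable => throw new MySpecialAppException(t)
      } finally {
        // ...but this exception replaces the wrapped one entirely
        throw new ResourceCloseException("close failed")
      }
    }
    
    // Walk the cause chain looking for the fetch failure.
    def findFetchFailure(t: Throwable): Option[FetchFailedLike] =
      Iterator.iterate(t)(_.getCause)
        .takeWhile(_ != null)
        .collectFirst { case f: FetchFailedLike => f }
    
    val caught: Option[Throwable] =
      try { userCode(); None } catch { case t: Throwable => Some(t) }
    
    // Only ResourceCloseException propagates; the wrapped FetchFailedLike
    // was discarded by the JVM when `finally` threw, so the walk finds nothing.
    assert(findFetchFailure(caught.get).isEmpty)
    ```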
    
    > (2) For testing, does it help to pass in a much smaller maxBytesToFetch 
(spark.reduce.maxSizeInFlight) to ShuffleBlockFetcherIterator to limit the size 
of the initial fetches, to make it easier to wrap the FetchFailed when you want 
to?
    
    Do you mean for writing a larger integration test?  That would probably 
help make the failure more likely, but off the top of my head, I don't think 
that will be enough to reliably trigger the real failure (which means if it's 
ever broken, we'll just think it's a flaky test, not broken functionality).  
IIRC the problem isn't just having more fetches than fit in the initial 
requests -- you *also* need the fetch failure to occur at a specific point in 
the processing. I'd need to futz around with it a while again to say for sure, 
though.
    
    I'd certainly like an integration test, but I eventually decided the unit 
test I added was sufficient.

