Github user vanzin commented on the issue: https://github.com/apache/spark/pull/17854

> Although looking at it, maybe I'm missing how it's supposed to handle network failure?

Spark has never really handled network failure. If the connection between the driver and the executor is cut, Spark sees that as the executor dying.

> I don't in general agree that we shouldn't retry... But those would be on a case-by-case basis.

Yes, code that wants to retry should do that explicitly. The old "retry" existed not because the code making the call needed it, but because Akka could lose messages. The new RPC layer doesn't lose messages (ignoring the TCP reset case), so that old-style retry is no longer needed. The connection itself dying is a bigger issue that needs to be handled in the RPC layer if it's really a problem, and the caller retrying isn't really the solution (IMO).
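For concreteness, a minimal sketch of what "callers retry explicitly" could look like. The helper name `retryAsk`, the fixed backoff, and the `endpointRef`/`SomeMessage` names in the usage comment are made up for illustration; this is not Spark's API, just one way a call site could own its own retry policy:

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

object ExplicitRetry {
  // Retry a by-name call up to `maxAttempts` times, sleeping briefly between
  // attempts. Which errors to retry and how to back off are decisions that
  // belong to the caller, which is the point being made above.
  @tailrec
  def retryAsk[T](maxAttempts: Int)(doAsk: => T): T =
    Try(doAsk) match {
      case Success(value) => value
      case Failure(_) if maxAttempts > 1 =>
        Thread.sleep(200L)  // naive fixed backoff; real code might grow this per attempt
        retryAsk(maxAttempts - 1)(doAsk)
      case Failure(e) => throw e
    }
}

// Hypothetical usage: wrap a synchronous ask to an endpoint.
// val ok = ExplicitRetry.retryAsk(3) { endpointRef.askSync[Boolean](SomeMessage) }
```

Note this still does nothing for a dead connection; per the comment above, that would have to be addressed in the RPC layer itself, not at the call site.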