Github user squito commented on the issue: https://github.com/apache/spark/pull/15249

(a) Right, this is a behavior change ... it seemed fair since the earlier behavior was undocumented, and I don't see a strong reason to keep exactly the same behavior as before. I think it's fair for us to change the behavior here, though we should try to support the general use cases (as I was discussing above). The timeout is not enforced at all in this PR (the only reason it appears here is that it was easier for me to pull those bits out of the full change along with the rest).

(b) Executor blacklisting is a somewhat odd middle ground, you're right. One motivating case comes from YARN's bad-disk detection -- it excludes the bad disk from future containers, but not from existing ones, so you can end up with one node that has some good containers and some bad ones. Admittedly this solution still isn't great in that case, since the default confs will push the entire node onto the blacklist after just 2 bad executors. I've also seen executors behave badly while others on the same node are fine, without any clear reason, so executor blacklisting is meant to handle these poorly understood cases. Admittedly, for the main goals, things would work fine if we only had blacklisting at the node level.

(c) Yup, those changes are already in the larger PR.
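To make the point about defaults concrete, here is a minimal sketch of the relevant settings, assuming the config keys that eventually shipped with Spark's blacklisting work; the exact key names and defaults used by this in-flight PR may differ.

```scala
import org.apache.spark.SparkConf

// Sketch only: these keys reflect the blacklisting configs as released in
// Spark 2.1+, not necessarily what this PR currently uses.
val conf = new SparkConf()
  .set("spark.blacklist.enabled", "true")
  // How long a blacklisted executor/node stays excluded before being retried.
  // (As noted above, the timeout is not yet enforced in this PR.)
  .set("spark.blacklist.timeout", "1h")
  // With the default of 2, two bad executors on the same node push the whole
  // node onto the blacklist -- the concern raised for the YARN bad-disk case.
  .set("spark.blacklist.application.maxFailedExecutorsPerNode", "2")
```

Raising maxFailedExecutorsPerNode would keep blacklisting at the executor level longer before escalating to the whole node, which is the trade-off discussed in (b).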