Github user squito commented on the issue:

    https://github.com/apache/spark/pull/15249
  
    (a) right, this is a behavior change ... it seemed fair since the earlier behavior 
was undocumented, and I don't see a strong reason to maintain exactly the same 
behavior as before.  I think it's fair for us to change behavior here, though we 
should try to support the general use cases (as I was discussing above).  The timeout 
is not enforced at all in this PR (the only reason it's here is that it was 
easier for me to pull those bits out of the full change as well).
    
    (b) executor blacklisting is a somewhat odd middle ground, you're right.  
One motivating case comes from YARN's bad disk detection -- it'll exclude the bad 
disk from future containers, but not from existing ones, so you can end up with one 
node that has some good containers and some bad ones.  Admittedly this solution still 
isn't great in that case, since the default confs will push the entire node into 
the blacklist after just 2 bad executors.  I've also seen executors behave badly 
while others on the same node are fine, without any clear reason, so executor 
blacklisting is meant to handle these poorly understood cases.  
That said, for the main goals, things would work fine if we only had blacklisting at 
the node level.
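
    A minimal sketch of the conf interplay described above, assuming the blacklist 
conf names that eventually shipped with this feature 
(spark.blacklist.enabled, spark.blacklist.application.maxFailedExecutorsPerNode, 
spark.blacklist.timeout); the exact keys and defaults in this PR may differ:

        import org.apache.spark.SparkConf

        // Illustrative only: these keys/defaults are assumptions based on the
        // blacklist confs that later shipped, not necessarily what this PR wires up.
        val conf = new SparkConf()
          .set("spark.blacklist.enabled", "true")
          // With a default of 2, two blacklisted executors on a node push the
          // whole node into the blacklist -- the case described above.
          .set("spark.blacklist.application.maxFailedExecutorsPerNode", "2")
          // Not enforced by this PR; shown only as the knob the timeout bits are for.
          .set("spark.blacklist.timeout", "1h")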
    
    (c) -- yup, those changes are already in the larger PR.

