yaooqinn commented on PR #43746:
URL: https://github.com/apache/spark/pull/43746#issuecomment-1811680290

   > Preemption on yarn shouldn't be going against the number of failed 
executors. If it is then something has changed and we should fix that.
   
   Yes, you are right.
   
   > This is a consequence of using a shared environment. Ideally Spark would 
isolate it from others, and other users wouldn't be affected, but that 
unfortunately isn't the case. I'm not sure about your environment, but ideally 
users test things before running in some production environment and breaking things.
   
   Yes, testing is necessary before going to production. But, as you said, 
that is the 'ideal' case. System robustness takes precedence over that.
   
   
   > If this feature doesn't really work or has issues on k8s then there should 
be a way to disable it, which seems like more what you want here right? You are 
essentially saying you don't want it to fail the application, thus turn the 
feature off and you should just do monitoring on your own to catch issues.
   
   
   Why do you always mention k8s when I give evidence on yarn? As for k8s, 
ExecutorFailureTracker works well during app initialization, where it fails 
fast on continuous pod failures. It does not work well for apps that already 
have sufficient pods and then encounter some pod failures.
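
   To make the two cases concrete, here is a rough toy model in Python. It is 
not Spark's code: the `FailureWindow` class, the `simulate` helper, and the 
numbers are all hypothetical, and it only assumes that failures are counted 
inside a time window and compared against a fixed limit, without looking at 
how many executors are currently healthy.

```python
from collections import deque


class FailureWindow:
    """Toy sliding-window failure counter (hypothetical, not Spark code)."""

    def __init__(self, max_failures, validity_interval_s):
        self.max_failures = max_failures
        self.validity_interval_s = validity_interval_s
        self._timestamps = deque()

    def record_failure(self, now_s):
        self._timestamps.append(now_s)

    def num_failures(self, now_s):
        # Drop failures that are older than the validity interval.
        while self._timestamps and now_s - self._timestamps[0] > self.validity_interval_s:
            self._timestamps.popleft()
        return len(self._timestamps)

    def should_fail_app(self, now_s):
        # Assumption: the decision ignores how many executors are healthy.
        return self.num_failures(now_s) > self.max_failures


def simulate(failure_times, max_failures=3, validity_interval_s=60):
    """Return the time at which the app would be failed, or None."""
    tracker = FailureWindow(max_failures, validity_interval_s)
    for t in failure_times:
        tracker.record_failure(t)
        if tracker.should_fail_app(t):
            return t
    return None


# Startup: pods fail back to back, so failing fast is what we want.
startup = simulate([1, 2, 3, 4])
# Steady state: the app already holds enough healthy pods, yet a short burst
# of pod failures trips the same fixed limit.
steady = simulate([1000, 1010, 1020, 1030])
# Failures spread out beyond the window never trip the limit.
spread = simulate([0, 100, 200, 300])
```

   In this sketch the startup case and the steady-state case are treated 
identically, which is the behavior I am objecting to for long-running apps.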
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

