mridulm commented on pull request #30876:
URL: https://github.com/apache/spark/pull/30876#issuecomment-751934621


   Specifically for this
[use case](https://github.com/apache/spark/pull/30876#issuecomment-750471287),
we don't need to make it a Spark default, right?
   If I understood correctly, the following conditions are all being met:
   a) The application uses RDDs with replication > 1 (a prerequisite).
   b) The application is fine with the cost of proactive replication, and is coded
such that out-of-scope RDDs are guaranteed to be GC'ed.
   c) k8s is (aggressively) configured such that executor decommissioning cannot
do a sufficiently good job on its own.
   d) The recomputation cost is high enough to justify the cost of proactive
replication during decommission.
   
   then sure, for those application/cluster-environment combinations, making
proactive replication an application-level default makes sense (a sketch of such a
per-application opt-in follows below).
   But this feels narrow enough not to require a global default, no? It looks more
like a deployment-scenario/application default than a platform-level default.
   
   In the scenario above though, how do we handle everything else?
   Shuffle data? RDD blocks where replication == 1?
   Perhaps better tuning for (c) would help more holistically? Something like the
decommission settings sketched below.
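   For (c), this is the kind of cluster-side tuning I'm thinking of (a sketch,
assuming the Spark 3.1+ decommission config keys; the fallback path is just a
placeholder):

```scala
// Deployment-level decommission tuning, expressed as conf pairs (these keys would
// normally live in spark-defaults.conf or be passed via spark-submit --conf).
// This migrates shuffle files and cached RDD blocks, including blocks with
// replication == 1, off decommissioning executors.
val decommissionTuning = Seq(
  "spark.decommission.enabled"                       -> "true",
  "spark.storage.decommission.enabled"               -> "true",
  "spark.storage.decommission.rddBlocks.enabled"     -> "true",  // covers replication == 1 RDD blocks
  "spark.storage.decommission.shuffleBlocks.enabled" -> "true",  // covers shuffle data
  // placeholder path; fallback storage holds migrated blocks when no peer executor is available
  "spark.storage.decommission.fallbackStorage.path"  -> "s3a://<bucket>/spark-fallback/"
)
```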

