mridulm commented on pull request #30876: URL: https://github.com/apache/spark/pull/30876#issuecomment-751926362
(Sigh, GitHub prematurely posted my previous comment; fleshing it out here.)

As I mentioned above, the flag helps applications that are fine with paying the overhead of proactive replication. I have sketched some cases where proactive replication does not help and others where it could be useful; these are examples, of course, but in the end it is specific to the application. Making it the default will impact all applications with replication > 1. Given this PR proposes to make it the default, I would like to know whether there was a motivating reason for the change.

If the cost of proactive replication is now close to zero (my experiments were from a while back), the discussion is of course moot. Do we have any results for this?

What is the ongoing cost when an application holds RDD references that are no longer in active use for the rest of the application (not all references can be cleared by GC)? That results in replication of blocks for an RDD which is legitimately never going to be used again.

Note that the above is orthogonal to DRA evicting an executor via the storage timeout configuration. That just exacerbates the problem, since a larger number of executors could be lost.
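For context, a sketch of the per-application opt-in this thread contrasts with a changed default: proactive replication is controlled by the existing `spark.storage.replication.proactive` flag, and it only matters for blocks persisted with a replicated storage level (replication > 1).

```properties
# spark-defaults.conf (or --conf on spark-submit): a per-application opt-in
# sketch, rather than changing the cluster-wide default as this PR proposes.
spark.storage.replication.proactive  true

# Note: proactive replication only applies to blocks stored with
# replication > 1, e.g. rdd.persist(StorageLevel.MEMORY_ONLY_2) in Scala.
```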