Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/22112

> I'm proposing an option 3:
>
> Retry all the tasks of all the succeeding stages if a stage with repartition/zip failed. All RDD actions should tell Spark whether they are "repeatable", which becomes a property of the result stage. When we retry a result stage that has several tasks finished, if the result stage is "repeatable" (e.g. collect), retry it. If the result stage is not "repeatable", fail the job with an error message asking users to checkpoint the RDD before the repartition/zip.

How does the user then tell Spark that the result stage becomes repeatable because they did the checkpoint? Add an option to the API? Or does Spark automatically try to figure that out? I'm still a bit hesitant about making our long-term solution that these operations aren't resilient, but as long as the user can make them resilient, perhaps it's ok.
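For readers skimming the thread, the decision rule in the quoted "option 3" can be sketched as plain logic. This is an illustrative simulation only, not Spark's actual `DAGScheduler` code; the function and parameter names (`handle_result_stage_retry`, `repeatable`, `finished_tasks`) are hypothetical:

```python
def handle_result_stage_retry(repeatable: bool, finished_tasks: int) -> str:
    """Sketch of the proposed option-3 behavior when a stage with
    repartition/zip fails and its result stage must be reconsidered.

    Returns "retry" or "fail" (hypothetical outcomes for illustration).
    """
    if finished_tasks == 0:
        # No tasks have produced results yet, so there is nothing
        # inconsistent to preserve: rerun all tasks of the stage.
        return "retry"
    if repeatable:
        # A repeatable action (e.g. collect) can safely rerun all its
        # tasks; the recomputed output replaces the partial results.
        return "retry"
    # A non-repeatable action with partial results already consumed:
    # fail the job and ask the user to checkpoint before repartition/zip.
    return "fail"
```

Under this sketch, the open question in the comment is exactly where `repeatable` would come from after a user checkpoints: an explicit API option, or something Spark infers automatically.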