itskals edited a comment on issue #26975: [SPARK-30325][CORE] Stage retry and executor crash cause app hung up forever URL: https://github.com/apache/spark/pull/26975#issuecomment-568664326

I am of the opinion that when a task has been started by a stage attempt and is still in progress, no other stage attempt should retry that partition unless the task's fate is known (it has succeeded or failed). To record that a partition is already assigned to some task, the MapStatus entry for the partition could represent this intermediate state. (Currently a MapStatus entry is either null or filled, essentially a boolean; I think we can add a third state.) This proposed model would also save compute resources, since there is no need to start a redundant computation when one stage attempt is already working on the partition. Speculation would still be allowed, because speculative copies run within the same stage attempt. Do let me know if there are any shortcomings in this thought process. @cloud-fan @seayoun @jiangxb1987
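A minimal sketch of the three-state idea, assuming a per-shuffle table with one entry per map partition. Everything here is hypothetical: `PartitionStateTracker`, `try_launch`, `on_success`, `on_failure` and the `State` values are illustrative names, not Spark's `MapOutputTracker` API, and Python is used only as a language-neutral way to show the state transitions (a plain string stands in for a real MapStatus).

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Dict, List, Optional


class State(Enum):
    MISSING = auto()    # today's "null" entry: no output, nobody working on it
    RUNNING = auto()    # proposed third state: claimed by a live stage attempt
    AVAILABLE = auto()  # today's "filled" entry: map output registered


@dataclass
class Entry:
    state: State = State.MISSING
    owner_attempt: Optional[int] = None  # stage attempt that claimed the partition
    map_status: Optional[str] = None     # hypothetical stand-in for a MapStatus


class PartitionStateTracker:
    """Hypothetical per-shuffle tracker; not Spark's MapOutputTracker API."""

    def __init__(self, num_partitions: int) -> None:
        self.entries: Dict[int, Entry] = {p: Entry() for p in range(num_partitions)}

    def try_launch(self, partition: int, stage_attempt: int) -> bool:
        """Return True if `stage_attempt` may launch a task for `partition`."""
        e = self.entries[partition]
        if e.state is State.MISSING:
            # Nobody owns it yet: this attempt claims the partition.
            e.state, e.owner_attempt = State.RUNNING, stage_attempt
            return True
        if e.state is State.RUNNING:
            # Speculative copies are allowed only within the owning attempt;
            # other attempts must wait until the running task's fate is known.
            return e.owner_attempt == stage_attempt
        return False  # AVAILABLE: output already exists, nothing to recompute

    def on_success(self, partition: int, map_status: str) -> None:
        """Register the finished output; the first successful copy wins."""
        e = self.entries[partition]
        e.state, e.owner_attempt, e.map_status = State.AVAILABLE, None, map_status

    def on_failure(self, partition: int, stage_attempt: int) -> None:
        """The task failed or its executor was lost: release the claim."""
        e = self.entries[partition]
        # Only the owning attempt can release a claim, so a stale report from
        # another attempt cannot free a partition that is still being computed.
        if e.state is State.RUNNING and e.owner_attempt == stage_attempt:
            e.state, e.owner_attempt = State.MISSING, None

    def missing_partitions(self) -> List[int]:
        """Partitions a new stage attempt would actually need to schedule."""
        return [p for p, e in self.entries.items() if e.state is State.MISSING]
```

In this sketch, a retry attempt that asks for a partition still owned by a live attempt is refused, while a speculative copy from the owning attempt is accepted; once `on_failure` records the task's fate, the partition becomes `MISSING` again and any attempt can pick it up. Whether Spark's scheduler could reliably deliver that failure signal (for example on executor loss) is exactly the kind of shortcoming the comment asks reviewers to check.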