[ https://issues.apache.org/jira/browse/SPARK-23253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16599551#comment-16599551 ]
Wenchen Fan commented on SPARK-23253:
-------------------------------------

This is dangerous: we can only skip shuffle writing if the data in the existing shuffle file is exactly the same as the data we are about to write, but the PR only checks the size. We could use a checksum to quickly check whether the data is the same.

This caused a problem in https://github.com/apache/spark/pull/22112, so I'm reverting it in my PR. We should revisit this optimization later.

> Only write shuffle temporary index file when there is not an existing one
> -------------------------------------------------------------------------
>
>                 Key: SPARK-23253
>                 URL: https://issues.apache.org/jira/browse/SPARK-23253
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle, Spark Core
>    Affects Versions: 2.2.1
>            Reporter: Kent Yao
>            Assignee: Kent Yao
>            Priority: Major
>             Fix For: 2.4.0
>
>
> The shuffle index temporary file is used to create the shuffle index file atomically.
> It is not needed when the index file already exists because another attempt of the
> same task has already written it.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
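
To illustrate the point about size versus content, here is a minimal Python sketch. It is not Spark code: `crc32_of_file` and `can_skip_write` are hypothetical helpers, and CRC32 is used only as an example of a cheap checksum. The idea is that a size match is just a first filter, and the write is skipped only when the checksum of the existing file also matches the data about to be written.

```python
import os
import tempfile
import zlib


def crc32_of_file(path, chunk_size=1 << 20):
    """Stream a file through CRC32 so a large file is never loaded whole."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def can_skip_write(existing_path, new_data):
    """Hypothetical helper: True only if the existing file matches new_data."""
    if not os.path.isfile(existing_path):
        return False
    # Size is a cheap first filter, but equal sizes do not imply equal content.
    if os.path.getsize(existing_path) != len(new_data):
        return False
    # Compare checksums before deciding that the write can be skipped.
    return crc32_of_file(existing_path) == (zlib.crc32(new_data) & 0xFFFFFFFF)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "existing.data")  # illustrative file name
        with open(path, "wb") as f:
            f.write(b"aaaa")
        # Same size and same bytes: skipping is safe.
        print(can_skip_write(path, b"aaaa"))
        # Same size but different bytes: a size-only check would wrongly skip.
        print(can_skip_write(path, b"bbbb"))
```

A checksum match is still probabilistic evidence, not proof of equality, so a stronger hash or a byte-by-byte comparison may be preferable where a collision would be costly.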