Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9610#discussion_r44715168
  
    --- Diff: core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala ---
    @@ -93,6 +93,29 @@ private[spark] class IndexShuffleBlockResolver(conf: SparkConf) extends ShuffleB
         } {
           out.close()
         }
    +
    +    val dataFile = getDataFile(shuffleId, mapId)
    +    synchronized {
    +      if (dataFile.exists() && indexFile.exists()) {
    +        if (dataTmp != null && dataTmp.exists()) {
    +          dataTmp.delete()
    +        }
    +        indexTmp.delete()
    +      } else {
    +        if (indexFile.exists()) {
    +          indexFile.delete()
    +        }
    +        if (!indexTmp.renameTo(indexFile)) {
    +          throw new IOException("fail to rename data file " + indexTmp + " 
to " + indexFile)
    +        }
    +        if (dataFile.exists()) {
    +          dataFile.delete()
    +        }
    +        if (dataTmp != null && dataTmp.exists() && 
!dataTmp.renameTo(dataFile)) {
    +          throw new IOException("fail to rename data file " + dataTmp + " 
to " + dataFile)
    --- End diff --
    
    I don't think there is a particular flaw here, but it's a bit hard to follow
    since it's a mix of first-attempt-wins and last-attempt-wins: first attempt
    wins if there are both a data file & an index file; last attempt wins if
    there is only an index file.  The problem w/ last-attempt-wins is that the
    delete will fail on Windows if the file is open for reading, I believe.  We
    can't avoid that entirely, since SPARK-4085 always requires us to delete
    some files that might be open; in that case we just hope we don't hit the
    race again on the next retry.  It would still be nice to minimize that case,
    though.  We'd be closer to first-attempt-wins if we always wrote a dataFile,
    even if it's empty when dataTmp == null; a rough sketch of what I mean is
    below.
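    
    Something like this (just a sketch; ensureDataTmp and the placeholder path
    are made up, and the real change would presumably use a proper temp-file
    helper):
    
        import java.io.File
    
        // Sketch: if the writer produced no data (dataTmp == null), commit a
        // zero-length placeholder instead, so a committed map output always
        // consists of (dataFile, indexFile) and the exists() check above only
        // ever takes the first-attempt-wins branch.
        def ensureDataTmp(dataTmp: File, dataFile: File): File = {
          if (dataTmp != null) {
            dataTmp
          } else {
            // hypothetical placeholder; real code should create this atomically
            val empty = new File(dataFile.getParentFile, dataFile.getName + ".empty.tmp")
            empty.createNewFile()
            empty
          }
        }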
    
    There is also an issue w/ MapStatus & non-deterministic data.  It might not
    matter which output you get, but the MapStatus should be consistent with the
    data that is actually read.  If attempt 1 writes non-empty outputs a, b, c,
    and attempt 2 writes non-empty outputs d, e, f (which are not committed),
    the reduce tasks might get the MapStatus for attempt 2, look for outputs
    d, e, f, and get nothing but empty blocks.  Matei had suggested writing the
    MapStatus to a file, so that subsequent attempts always return the MapStatus
    corresponding to the first successful attempt; a sketch of that is below.
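    
    Roughly (again just a sketch; commitOrLoadMapStatus and statusFile are
    hypothetical, and this assumes the MapStatus impls round-trip cleanly
    through plain java serialization):
    
        import java.io.{File, FileInputStream, FileOutputStream,
          ObjectInputStream, ObjectOutputStream}
        import org.apache.spark.scheduler.MapStatus
    
        // This would live on the resolver, so `synchronized` takes the same
        // lock as the commit block above.  The first successful attempt
        // writes its MapStatus next to the committed output; later attempts
        // read that back instead of returning their own, so the status the
        // reducers see always matches the data that was actually committed.
        def commitOrLoadMapStatus(statusFile: File, status: MapStatus): MapStatus =
          synchronized {
            if (statusFile.exists()) {
              // a previous attempt won the race; return its status
              val in = new ObjectInputStream(new FileInputStream(statusFile))
              try in.readObject().asInstanceOf[MapStatus] finally in.close()
            } else {
              val out = new ObjectOutputStream(new FileOutputStream(statusFile))
              try out.writeObject(status) finally out.close()
              status
            }
          }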

