Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9610#discussion_r44695273
  
    --- Diff: 
core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala ---
    @@ -93,6 +95,10 @@ private[spark] class IndexShuffleBlockResolver(conf: 
SparkConf) extends ShuffleB
         } {
           out.close()
         }
    +    indexFile.deleteOnExit()
    +    if (!tmp.renameTo(indexFile)) {
    +      throw new IOException(s"fail to rename index file $tmp to 
$indexFile")
    --- End diff --
    
    On Thu, Nov 12, 2015 at 8:30 AM, Matei Zaharia <notificati...@github.com>
    wrote:
    
    > In
    > 
core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala
    > <https://github.com/apache/spark/pull/9610#discussion_r44678796>:
    >
    > > @@ -93,6 +95,10 @@ private[spark] class IndexShuffleBlockResolver(conf: 
SparkConf) extends ShuffleB
    > >      } {
    > >        out.close()
    > >      }
    > > +    indexFile.deleteOnExit()
    > > +    if (!tmp.renameTo(indexFile)) {
    > > +      throw new IOException(s"fail to rename index file $tmp to 
$indexFile")
    >
    > Can you test for this? I think the worry was about different TaskSets
    > attempting the same map stage. Imagine that attempt 1 of the stage
    > successfully completes a task, and sends back a map output status, but 
that
    > status gets ignored because that stage attempt got cancelled. Attempt 2
    > might then fail to send a new status for it.
    >
    > There seem to be two ways to fix it if this problem can actually occur --
    > either add MapOutputStatuses even from failed task sets or mark this new
    > task as successful if a file exists.
    >
    After this PR, the second attempt of same task will return SUCCESS, with
    new MapOutputStatus, which could be different than the previous attempt
    (having different sizes of partitions), since we does not use the exact
    number of size (could be lossy compressed), I think it's fine.
    
    > —
    > Reply to this email directly or view it on GitHub
    > <https://github.com/apache/spark/pull/9610/files#r44678796>.
    >
    
    
    
    -- 
     - Davies



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to