Github user superbobry commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19458#discussion_r143586763
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala ---
    @@ -100,7 +102,9 @@ private[spark] class DiskBlockManager(conf: SparkConf, deleteFilesOnStop: Boolea
     
       /** List all the blocks currently stored on disk by the disk manager. */
       def getAllBlocks(): Seq[BlockId] = {
    -    getAllFiles().map(f => BlockId(f.getName))
    +    // SPARK-22227: the Try guards against temporary files written
    +    // during shuffle which do not correspond to valid block IDs.
    +    getAllFiles().flatMap(f => Try(BlockId(f.getName)).toOption)
    --- End diff --
    
    > This has the effect of swallowing a number of possible exceptions. 
    It does. Right now the only possible exception coming from `BlockId.apply` is `IllegalStateException`, but I agree that using `Try` makes the code fragile. My intention was to make an easy-to-read proof of concept and then adjust the patch according to the feedback.
    
    What do you think about adding a "safe" parsing method to `BlockId`? The method would return an `Option` instead of throwing, and would only be used in `DiskBlockManager` for now. `BlockId.apply` could then be expressed as `parse(s).getOrElse { throw ... }`.
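
    For concreteness, here is a rough sketch of what such a method could look like. The `parse` name and the regexes below are illustrative only; the real `BlockId` companion matches many more block name formats:

    ```scala
    object BlockId {
      // Illustrative subset of the real block name formats.
      private val RDD = "rdd_([0-9]+)_([0-9]+)".r
      private val SHUFFLE = "shuffle_([0-9]+)_([0-9]+)_([0-9]+)".r

      /** Non-throwing parse: returns None for names that do not
        * correspond to valid block IDs, e.g. temporary shuffle files. */
      def parse(name: String): Option[BlockId] = name match {
        case RDD(rddId, splitIndex) =>
          Some(RDDBlockId(rddId.toInt, splitIndex.toInt))
        case SHUFFLE(shuffleId, mapId, reduceId) =>
          Some(ShuffleBlockId(shuffleId.toInt, mapId.toInt, reduceId.toInt))
        case _ => None
      }

      /** The existing throwing entry point, expressed via parse. */
      def apply(name: String): BlockId = parse(name).getOrElse {
        throw new IllegalStateException(s"Unrecognized BlockId: $name")
      }
    }
    ```

    `getAllBlocks()` in `DiskBlockManager` would then become `getAllFiles().flatMap(f => BlockId.parse(f.getName))`, with no `Try` involved.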
    
    > But is there a more explicit way of excluding temp files? It seems like `getAllFiles` shouldn't return those either?
    
    I think this is tricky to do without leaking some abstractions, e.g. the ".UUID" naming scheme from `Utils.tempFileWith` and the `Temp*BlockId` naming scheme.

