Github user superbobry commented on a diff in the pull request:

https://github.com/apache/spark/pull/19458#discussion_r143586763

--- Diff: core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala ---
@@ -100,7 +102,9 @@ private[spark] class DiskBlockManager(conf: SparkConf, deleteFilesOnStop: Boolea
   /** List all the blocks currently stored on disk by the disk manager. */
   def getAllBlocks(): Seq[BlockId] = {
-    getAllFiles().map(f => BlockId(f.getName))
+    // SPARK-22227: the Try guards against temporary files written
+    // during shuffle which do not correspond to valid block IDs.
+    getAllFiles().flatMap(f => Try(BlockId(f.getName)).toOption)
--- End diff --

> This has the effect of swallowing a number of possible exceptions.

It does. Right now the only possible exception coming from `BlockId.apply` is `IllegalStateException`, but I agree that using `Try` makes the code fragile. My intention was to make an easy-to-read proof of concept and then adjust the patch according to the feedback.

What do you think about adding a "safe" parsing method to `BlockId`? The method would return an `Option` instead of throwing, and would only be used in `DiskBlockManager` for now. `BlockId.apply` could then be expressed as `parse(s).getOrElse { throw ... }`.

> But is there a more explicit way of excluding temp files? it seems like getAllFiles shouldn't return those either?

I think this is tricky to do without leaking some abstractions, e.g. the ".UUID" naming scheme in `Utils.tempFileWith` and the `Temp*BlockId` naming scheme.
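To make the proposal concrete, here is a minimal standalone sketch of what a "safe" parsing method could look like. The names (`BlockIdParser`, `parse`) and the two regexes are hypothetical simplifications for illustration; the real `BlockId` companion object in Spark recognizes many more block name patterns.

```scala
// Hypothetical sketch of the proposed "safe" parser; not the actual patch.
object BlockIdParser {
  sealed trait BlockId
  case class RDDBlockId(rddId: Int, splitIndex: Int) extends BlockId
  case class ShuffleBlockId(shuffleId: Int, mapId: Int, reduceId: Int) extends BlockId

  // Simplified name patterns, mirroring two of the cases in BlockId.apply.
  private val Rdd = "rdd_([0-9]+)_([0-9]+)".r
  private val Shuffle = "shuffle_([0-9]+)_([0-9]+)_([0-9]+)".r

  // Safe variant: returns None for unrecognized names, e.g. the
  // temporary ".UUID"-suffixed files written during shuffle.
  def parse(name: String): Option[BlockId] = name match {
    case Rdd(rddId, split)   => Some(RDDBlockId(rddId.toInt, split.toInt))
    case Shuffle(s, m, r)    => Some(ShuffleBlockId(s.toInt, m.toInt, r.toInt))
    case _                   => None
  }

  // The throwing apply can then be expressed in terms of parse.
  def apply(name: String): BlockId =
    parse(name).getOrElse {
      throw new IllegalStateException(s"Unrecognized BlockId: $name")
    }
}
```

With this in place, `getAllBlocks()` could use `getAllFiles().flatMap(f => BlockIdParser.parse(f.getName))`, skipping temp files without catching exceptions.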