Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19770#discussion_r154814962

    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -643,6 +633,44 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
           } finally {
             iterator.foreach(_.close())
           }
    +
    +      // Clean corrupt or empty files that may have accumulated.
    +      if (AGGRESSIVE_CLEANUP) {
    +        var untracked: Option[KVStoreIterator[LogInfo]] = None
    +        try {
    +          untracked = Some(listing.view(classOf[LogInfo])
    --- End diff --

    So I spent some time reading my own patch, and it covers a slightly different case. My patch covers deleting SHS state when files are deleted, while this covers deleting files that the SHS decides are broken. I still think that some code / state can be saved by handling both similarly - still playing with my code, though.
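For readers following along, here is a rough, self-contained sketch of the cleanup pattern the new block appears to implement: walk the listing's LogInfo view and drop entries (and the files behind them) that look empty or unreadable. The simplified LogInfo shape, the cleanupBrokenLogs name, and the deletion policy are assumptions for illustration only; the actual code in the PR differs and lives inline behind the AGGRESSIVE_CLEANUP guard shown in the diff.

    import scala.collection.JavaConverters._

    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.spark.util.kvstore.KVStore

    // Hypothetical, simplified stand-in for the LogInfo class defined inside
    // FsHistoryProvider; the real class carries more fields.
    case class LogInfo(logPath: String, fileSize: Long)

    // Sketch: remove listing entries (and the backing files) for logs that are
    // empty or no longer exist. Collect the candidates first, then delete, to
    // avoid mutating the store while iterating over it.
    def cleanupBrokenLogs(listing: KVStore, fs: FileSystem): Unit = {
      val iter = listing.view(classOf[LogInfo]).closeableIterator()
      val broken = try {
        iter.asScala
          .filter(info => info.fileSize == 0 || !fs.exists(new Path(info.logPath)))
          .toList
      } finally {
        iter.close()
      }
      broken.foreach { info =>
        // logPath is assumed to be the natural key of the LogInfo view here.
        listing.delete(classOf[LogInfo], info.logPath)
        if (fs.exists(new Path(info.logPath))) {
          fs.delete(new Path(info.logPath), false)
        }
      }
    }

This is only meant to show the KVStoreIterator / delete pattern under discussion, not where the logic would live in FsHistoryProvider.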