[ https://issues.apache.org/jira/browse/SPARK-28165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873614#comment-16873614 ]

Imran Rashid commented on SPARK-28165:
--------------------------------------

btw if anybody wants to investigate this more, here's a simple test case (though, as discussed above, we can't just use the modtime, as it's not totally trustworthy):

{code}
test("log cleaner for inprogress files before SHS startup") {
  val firstFileModifiedTime = TimeUnit.SECONDS.toMillis(10)
  val secondFileModifiedTime = TimeUnit.SECONDS.toMillis(100)
  val maxAge = TimeUnit.SECONDS.toMillis(40)
  val clock = new ManualClock(0)

  val log1 = newLogFile("inProgressApp1", None, inProgress = true)
  writeFile(log1, true, None,
    SparkListenerApplicationStart("inProgressApp1", Some("inProgressApp1"), 3L, "test",
      Some("attempt1"))
  )
  log1.setLastModified(firstFileModifiedTime)

  val log2 = newLogFile("inProgressApp2", None, inProgress = true)
  writeFile(log2, true, None,
    SparkListenerApplicationStart("inProgressApp2", Some("inProgressApp2"), 23L, "test2",
      Some("attempt2"))
  )
  log2.setLastModified(secondFileModifiedTime)

  // advance the clock so the first log is expired, but the second log is still recent
  clock.setTime(secondFileModifiedTime)
  assert(clock.getTimeMillis() > firstFileModifiedTime + maxAge)

  // start up the SHS
  val provider = new FsHistoryProvider(
    createTestConf().set("spark.history.fs.cleaner.maxAge", s"${maxAge}ms"), clock)
  provider.checkForLogs()

  // We should clean up one log immediately
  updateAndCheck(provider) { list =>
    assert(list.size === 1)
  }
  assert(!log1.exists())
  assert(log2.exists())
}
{code}

> SHS does not delete old inprogress files until cleaner.maxAge after SHS start time
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-28165
>                 URL: https://issues.apache.org/jira/browse/SPARK-28165
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.3.3, 2.4.3
>            Reporter: Imran Rashid
>            Priority: Major
>
> The SHS will not delete inprogress files until {{spark.history.fs.cleaner.maxAge}}
> time after it has started (7 days by default), regardless of when the file was last
> modified. This is particularly problematic if the SHS gets restarted regularly, as
> then you'll end up never deleting old files.
> There might not be much we can do about this -- we can't really trust the
> modification time of the file, as that isn't always updated reliably.
> We could take the time of the last event in the file, but then we'd have to turn off
> the optimization of SPARK-6951, to avoid reading the entire file just for the listing.
> *WORKAROUND*: have the SHS save state across restarts to local disk by specifying a
> path in {{spark.history.store.path}}. It'll still take 7 days from when you add that
> config for the cleaning to happen, but going forward the cleaning should happen
> reliably.
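
For anyone applying the workaround above, here is a minimal sketch of the relevant History Server settings in {{spark-defaults.conf}}; the store path is only a placeholder (use any local directory that survives SHS restarts), and {{7d}} simply spells out the default {{maxAge}}:

{code}
# Sketch of the workaround configuration for the History Server.
# /var/lib/spark/history-store is a placeholder path; point it at any local
# directory that persists across SHS restarts.
spark.history.fs.cleaner.enabled   true
spark.history.fs.cleaner.maxAge    7d
spark.history.store.path           /var/lib/spark/history-store
{code}

With {{spark.history.store.path}} set, the SHS persists its listing to local disk, so a restart no longer starts the cleaner's countdown over from scratch and, per the description above, cleaning of old inprogress files should happen reliably going forward.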