[ https://issues.apache.org/jira/browse/SPARK-28165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873614#comment-16873614 ]

Imran Rashid commented on SPARK-28165:
--------------------------------------

btw if anybody wants to investigate this more, here's a simple test case 
(though, as discussed above, we can't just use the modtime as it's not totally 
trustworthy):

{code}
  test("log cleaner for inprogress files before SHS startup") {
    val firstFileModifiedTime = TimeUnit.SECONDS.toMillis(10)
    val secondFileModifiedTime = TimeUnit.SECONDS.toMillis(100)
    val maxAge = TimeUnit.SECONDS.toMillis(40)
    val clock = new ManualClock(0)

    val log1 = newLogFile("inProgressApp1", None, inProgress = true)
    writeFile(log1, true, None,
      SparkListenerApplicationStart(
        "inProgressApp1", Some("inProgressApp1"), 3L, "test", Some("attempt1"))
    )
    log1.setLastModified(firstFileModifiedTime)

    val log2 = newLogFile("inProgressApp2", None, inProgress = true)
    writeFile(log2, true, None,
      SparkListenerApplicationStart(
        "inProgressApp2", Some("inProgressApp2"), 23L, "test2", 
Some("attempt2"))
    )
    log2.setLastModified(secondFileModifiedTime)

    // advance the clock so the first log is expired, but second log is still recent
    clock.setTime(secondFileModifiedTime)
    assert(clock.getTimeMillis() > firstFileModifiedTime + maxAge)

    // start up the SHS
    val provider = new FsHistoryProvider(
      createTestConf().set("spark.history.fs.cleaner.maxAge", s"${maxAge}ms"), clock)

    provider.checkForLogs()

    // We should cleanup one log immediately
    updateAndCheck(provider) { list =>
      assert(list.size === 1)
    }
    assert(!log1.exists())
    assert(log2.exists())
  }
{code}
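
To make the failure mode concrete, here are the test's numbers worked through 
by hand -- this is only a sketch of the reported behaviour, not the actual 
FsHistoryProvider cleanup code:

{code}
// All values in milliseconds, matching the test above.
val firstFileModifiedTime = 10 * 1000L   // log1's mtime
val maxAge = 40 * 1000L                  // spark.history.fs.cleaner.maxAge
val shsStartTime = 100 * 1000L           // clock time when the SHS first scans

// Expiry measured from the file's mtime: log1 is already stale at startup,
// so the test expects it to be deleted on the first checkForLogs() pass.
assert(shsStartTime > firstFileModifiedTime + maxAge)   // 100s > 10s + 40s

// Expiry measured from SHS start time (the behaviour this bug describes):
// log1 only becomes eligible at 140s, i.e. maxAge after the SHS came up,
// so the first scan leaves it in place.
assert(!(shsStartTime > shsStartTime + maxAge))
{code}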

> SHS does not delete old inprogress files until cleaner.maxAge after SHS start time
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-28165
>                 URL: https://issues.apache.org/jira/browse/SPARK-28165
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.3.3, 2.4.3
>            Reporter: Imran Rashid
>            Priority: Major
>
> The SHS will not delete inprogress files until 
> {{spark.history.fs.cleaner.maxAge}} time after it has started (7 days by 
> default), regardless of when the last modification to the file was.  This is 
> particularly problematic if the SHS gets restarted regularly, as then you'll 
> end up never deleting old files.
> There might not be much we can do about this -- we can't really trust the 
> modification time of the file, as that isn't always updated reliably.
> We could take the time of the last event in the file, but then we'd have to 
> turn off the SPARK-6951 optimization that avoids reading the entire file 
> just for the listing.
> *WORKAROUND*: have the SHS save state across restarts to local disk by 
> specifying a path in {{spark.history.store.path}} (see the config sketch 
> below).  It'll still take 7 days from when you add that config for the 
> cleaning to happen, but from then on the cleaning should happen reliably.
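
For anyone applying the workaround, a minimal history-server configuration 
sketch; the local store path is just an illustrative location, not a required 
value:

{code}
# spark-defaults.conf on the history server host
spark.history.fs.cleaner.enabled   true
spark.history.fs.cleaner.maxAge    7d
# Persist the application listing across SHS restarts so the cleanup
# bookkeeping survives a restart (illustrative path).
spark.history.store.path           /var/lib/spark/history-store
{code}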


