GitHub user uncleGen commented on the issue:

    https://github.com/apache/spark/pull/16142
  
    > The current scan code does not make one request to the NameNode per log 
file in the directory. Your code does. That should be avoided.
    
    Makes sense. The current implementation can definitely be optimized; that was my mistake.
    
    > If they come last, then you're first accounting for log sizes of apps 
that have already finished and might end up trying to delete logs from apps 
that are still running (!!!).
    
    I get what you mean. Currently, the order of the app logs depends on the time of each app's last attempt:
    
    ```
    private def compareAppInfo(
        i1: FsApplicationHistoryInfo,
        i2: FsApplicationHistoryInfo): Boolean = {
      val a1 = i1.attempts.head
      val a2 = i2.attempts.head
      if (a1.endTime != a2.endTime) a1.endTime >= a2.endTime else a1.startTime >= a2.startTime
    }
    ```
    
    So, if cleanup is needed, the logs of completed jobs will be cleaned first, followed by the older in-progress ones (if any exist).
    #16165 added support for deleting in-progress job logs that are too old, so I think it is OK in this case.
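
    To illustrate the ordering that the comparator quoted above produces, here is a minimal, self-contained sketch. `AttemptInfo`, `AppInfo` and the sample data are hypothetical stand-ins, not the real Spark classes, and it assumes an in-progress attempt carries `endTime = -1`:

    ```scala
    // Hypothetical stand-ins for FsApplicationHistoryInfo and its attempt type.
    final case class AttemptInfo(startTime: Long, endTime: Long)
    final case class AppInfo(id: String, attempts: List[AttemptInfo])

    object CompareAppInfoSketch {
      // Same comparison as in the snippet above: newest end time first,
      // ties broken by newest start time.
      def compareAppInfo(i1: AppInfo, i2: AppInfo): Boolean = {
        val a1 = i1.attempts.head
        val a2 = i2.attempts.head
        if (a1.endTime != a2.endTime) a1.endTime >= a2.endTime
        else a1.startTime >= a2.startTime
      }

      // Assumption: an in-progress attempt is marked with endTime = -1.
      val apps: List[AppInfo] = List(
        AppInfo("running-old", List(AttemptInfo(startTime = 100L, endTime = -1L))),
        AppInfo("done-old", List(AttemptInfo(startTime = 50L, endTime = 200L))),
        AppInfo("running-new", List(AttemptInfo(startTime = 300L, endTime = -1L))),
        AppInfo("done-new", List(AttemptInfo(startTime = 150L, endTime = 400L)))
      )

      // Sort the sample apps with the comparator and keep only their ids.
      val sortedIds: List[String] = apps.sortWith(compareAppInfo).map(_.id)

      def main(args: Array[String]): Unit = println(sortedIds.mkString(", "))
    }
    ```

    Under that assumption, the completed apps sort ahead of the in-progress ones, and within each group the most recent one comes first.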
    


