Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/16142
  
    > #16165 added support for deleting old in-progress job logs, so I think it
is OK in this case.
    
    It's *not* OK. That change uses a heuristic to decide when to delete
"inprogress" logs: if a log hasn't been written to for a long time, it assumes
the app is dead. That's an acceptable heuristic, since if an app hasn't written
a single byte to its log in several days, it's probably not running anymore.
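
    A minimal sketch of that heuristic, assuming a plain filesystem log
directory (illustrative only, not the actual history server code):

        import java.io.File

        // An ".inprogress" log that hasn't been modified within maxAgeMs is
        // assumed to belong to an application that is no longer running.
        def isProbablyDead(log: File, maxAgeMs: Long): Boolean =
          log.getName.endsWith(".inprogress") &&
            System.currentTimeMillis() - log.lastModified() > maxAgeMs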
    
    But your change may delete logs from *actively running* apps, simply
because their total size exceeds the threshold you defined. That's not
acceptable.
    
    As for your users, I'd first try compression plus a shorter max age and see
how that works out. If that doesn't help, try enabling event logs selectively,
just for the applications they care about. Otherwise there's not much sense in
saving the logs in the first place, since they may disappear at any time.
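
    For example, in spark-defaults.conf (the values are illustrative; pick
whatever fits your retention needs):

        spark.eventLog.enabled            true
        spark.eventLog.compress           true
        spark.history.fs.cleaner.enabled  true
        spark.history.fs.cleaner.maxAge   3d
        spark.history.fs.cleaner.interval 1d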
    
    If they have custom needs, you can always write a small daemon to do the
cleanup without having to modify Spark.
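
    Something like this would work (a hypothetical sketch, not part of Spark;
it skips ".inprogress" files so logs of running apps are never touched):

        import java.io.File

        // Delete completed event logs older than maxAgeMs from logDir.
        def cleanup(logDir: File, maxAgeMs: Long): Unit = {
          val cutoff = System.currentTimeMillis() - maxAgeMs
          Option(logDir.listFiles()).getOrElse(Array.empty[File])
            .filterNot(_.getName.endsWith(".inprogress"))
            .filter(_.lastModified() < cutoff)
            .foreach(_.delete())
        }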

