Github user vanzin commented on the issue: https://github.com/apache/spark/pull/16142

> #16165 has supported deleting too old in-progress job logs. So I think it is OK in this case.

It's *not* OK. That change uses a heuristic to decide when to delete "inprogress" logs: if they haven't been written to for a long time, it assumes the app is dead. That's an acceptable heuristic, since if an app hasn't written a single byte to the log in several days, it's probably not running anymore. But your change may delete logs from *actively running* apps, just because you're exceeding the threshold you defined. That's not acceptable.

As for your users, I'd first try compression + a shorter max age and see how that works out. If that doesn't help, try enabling logs selectively, just for the applications they care about. Otherwise there's not much sense in saving the logs in the first place, since they may disappear at any time. If they have a custom need, you can always write a small daemon to do the cleanup, without having to modify Spark.
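To illustrate the "small daemon" option mentioned above, here is a minimal sketch of an out-of-band cleanup script using the same mtime-based heuristic the reviewer describes (no writes for N days implies the app is dead). Everything here is hypothetical: the `log_dir` path, the `clean_event_logs` name, and the age threshold are illustrative, not part of Spark's API, and a real deployment would run this on a schedule (e.g. cron) against the configured event-log directory.

```python
import os
import time


def clean_event_logs(log_dir, max_age_days, dry_run=True):
    """Return (and optionally delete) event-log files whose last
    modification time is older than max_age_days.

    The mtime check mirrors the heuristic discussed above: a log that
    has not been written to in a long time is assumed to belong to a
    dead application, so actively running apps are never touched.
    """
    cutoff = time.time() - max_age_days * 86400
    removed = []
    for name in os.listdir(log_dir):
        path = os.path.join(log_dir, name)
        if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
            removed.append(path)
            if not dry_run:
                os.remove(path)
    return removed
```

Running with `dry_run=True` first lets you verify what would be deleted before pointing it at real history-server data.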