[GitHub] spark issue #16142: [SPARK-18716][CORE] Restrict the disk usage of spark eve...

vanzin Thu, 15 Dec 2016 20:38:59 -0800

Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/16142
  
    > I just add a new clean-up mode, but not add the cleaner itself. 
    
    But that's kinda the point. How many different ways of cleaning need to be 
added? Will this one be enough? Will people ask for archiving next? I'm wary of 
going down that path.
    
    > I think you may not get what I mean.
    
    I get what you mean. I just disagree with you that it's an important 
feature to have.
    
    > So, I do not think get the size of each log will hurt NameNode greatly.
    
    The current scan code does not make one request to the NameNode per log 
file in the directory. Your code does. That should be avoided.
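
    To make the request-count difference concrete, here is a minimal sketch. 
It assumes the directory listing already carries each file's length, as 
Hadoop's `FileSystem.listStatus` does through `FileStatus.getLen`. `FakeFs` and 
both helper functions are hypothetical stand-ins that only count requests; they 
are not the history server code.

```python
class FakeFs:
    """Hypothetical stand-in for a NameNode-backed file system; it only counts requests."""

    def __init__(self, sizes):
        self.sizes = dict(sizes)  # file name -> size in bytes
        self.rpc_count = 0

    def list_status(self, directory):
        # One request returns the status (including length) of every entry.
        self.rpc_count += 1
        return list(self.sizes.items())

    def get_file_status(self, name):
        # One request per file.
        self.rpc_count += 1
        return (name, self.sizes[name])


def total_size_per_file(fs, names):
    # Pattern to avoid: the number of requests grows with the number of logs.
    return sum(fs.get_file_status(name)[1] for name in names)


def total_size_from_listing(fs, directory):
    # Reuses the sizes from a single listing, so the cost is one request per scan.
    return sum(size for _, size in fs.list_status(directory))
```

    Both functions compute the same total; only the number of requests 
differs.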
    
    > Besides, the unit test has proved that the older file will be cleaned 
first.
    
    Your code doesn't do that, so if the unit test shows that, it's not by 
design. Your code is scanning the list of apps in the order they're kept in 
memory (descending end time). I don't remember whether in-progress apps come 
first or last. But if they come first, an old attempt of an in-progress app 
will have precedence over newer attempts of apps that have already finished. If 
they come last, then you're first accounting for log sizes of apps that have 
already finished and might end up trying to delete logs from apps that are 
still running (!!!).
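
    A minimal sketch of an ordering that avoids both problems, under stated 
assumptions: each attempt carries its own completion flag, end time, and log 
size. The `Attempt` type, its fields, and `select_logs_to_delete` are 
hypothetical names for illustration, not the existing cleaner code.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    app_id: str
    end_time: int    # for a running attempt, the last-updated time
    log_size: int    # bytes
    completed: bool


def select_logs_to_delete(attempts, max_bytes):
    """Pick the oldest completed attempts until total usage fits in max_bytes."""
    total = sum(a.log_size for a in attempts)
    # Running attempts still count toward usage but are never deletion candidates.
    candidates = sorted(
        (a for a in attempts if a.completed),
        key=lambda a: a.end_time,  # oldest first, independent of in-memory order
    )
    to_delete = []
    for attempt in candidates:
        if total <= max_bytes:
            break
        to_delete.append(attempt)
        total -= attempt.log_size
    return to_delete
```

    Sorting explicitly by end time removes the dependence on how the listing 
is kept in memory, and filtering on `completed` keeps logs of running attempts 
out of the candidate set even when the quota cannot be met.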
    
    The way the current time-based cleaner code works does not carry over if 
you're doing the `shouldClean` check solely based on space used. So this 
feature is not as trivial as your code makes it seem.



