Github user squito commented on the issue:

    https://github.com/apache/spark/pull/22444
  
    > history server startup needs to go through all these logs before being 
usable, so any server restart results in hours of downtime, just from scanning.
    
    I don't think this is true. The first scan may take a long time, but I 
think the SHS is usable even during that time.  As soon as a scan makes it 
through some file, that file is added to the listing.
    
    But if I understand correctly, the advantage here is that as more 
applications are run during that 2.5 hour scan, you will pick those up more 
quickly.
    
    > 1. would it make sense for the initial scans to go for the most recent 
logs first, because that 2.5 hour time to scan all files is still there.
    > 2. would you want the UI and rest api to indicate that the scan was still 
in progress, and not to worry if the listing was incomplete?
    
    I think both of these already happen.
    
    @jianjianjiao again, it's been a while since I've looked at this code -- does 
that sound correct?

