GitHub user squito commented on the issue:

https://github.com/apache/spark/pull/22444

> history server startup needs to go through all these logs before being usable, so any server restart results in hours of downtime, just from scanning.

I don't think this is true. The first scan may take a long time, but I think the SHS is usable even during that time. As soon as a scan makes it through a file, that file is added to the listing. If I understand correctly, though, the advantage of this change is that applications run during that 2.5-hour scan will be picked up more quickly.

> 1. would it make sense for the initial scans to go for the most recent logs first, because that 2.5 hour time to scan all files is still there.
> 2. would you want the UI and rest api to indicate that the scan was still in progress, and not to worry if the listing was incomplete?

I think both of these already happen. @jianjianjiao, again, it's been a while since I've looked at this code. Does that sound correct?