GitHub user vanzin commented on the pull request:

https://github.com/apache/spark/pull/4525#issuecomment-73993809

So, I'm all for the feature, but I'm not sold on the approach here. It makes the code a little confusing, since you're trying to keep the code working in two different modes: "lazy" for the startup check, and "synchronous" for the subsequent checks.

Instead, why not always do it lazily? Have a thread pool with a few worker threads, and have `checkForLogs` feed requests for parsing logs to that pool. `checkForLogs` will only list the files, regardless of when it's executed (startup vs. not).

That looks like it would be easier to understand, at least to me, would provide performance improvements for all subsequent checks (not just the initial one), and would simplify the code a lot (not having to deal with different types for lazy vs. not lazy app info, for one).

What do you think?
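
For illustration, the always-lazy design suggested above could look roughly like the sketch below. It is written in Python rather than the project's Scala, and everything in it is an assumption made for the example: `LogChecker`, `parse_log`, `app_infos`, and the line-counting "parse" are hypothetical stand-ins, not Spark code. It also skips details a real provider would need, such as re-parsing a log whose contents have changed.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


class LogChecker:
    """Hypothetical stand-in for the history provider discussed above."""

    def __init__(self, log_dir, num_workers=4):
        self.log_dir = log_dir
        # A small pool of worker threads that does all the parsing.
        self.pool = ThreadPoolExecutor(max_workers=num_workers)
        self.pending = {}  # path -> Future holding the parsed app info

    def parse_log(self, path):
        # Placeholder for the expensive step: here we only count lines.
        with open(path) as f:
            return {"path": path, "events": sum(1 for _ in f)}

    def check_for_logs(self):
        # Only list files; parsing is always handed to the pool,
        # whether this is the startup check or a later one.
        for name in sorted(os.listdir(self.log_dir)):
            path = os.path.join(self.log_dir, name)
            # Simplification: a path already submitted is never re-parsed.
            if os.path.isfile(path) and path not in self.pending:
                self.pending[path] = self.pool.submit(self.parse_log, path)

    def app_infos(self):
        # Block until every submitted parse has finished.
        return [future.result() for future in self.pending.values()]

    def close(self):
        self.pool.shutdown(wait=True)


def demo():
    # Create three small fake logs, run two checks, and collect the results.
    with tempfile.TemporaryDirectory() as log_dir:
        for i in range(3):
            with open(os.path.join(log_dir, f"app-{i}.log"), "w") as f:
                f.write("event\n" * (i + 1))
        checker = LogChecker(log_dir, num_workers=2)
        checker.check_for_logs()  # startup check
        checker.check_for_logs()  # later check: re-lists, submits nothing new
        infos = checker.app_infos()
        checker.close()
        return sorted(info["events"] for info in infos)
```

In this shape there is a single code path and a single result type: callers always get a future-backed entry, so the lazy vs. not lazy distinction disappears.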