Github user ScrapCodes commented on the pull request:

https://github.com/apache/spark/pull/4043#issuecomment-70216469

Thanks Patrick! I have two questions inline.

> I'm not sure this can be merged as-is. The state clean-up here is based on the assumption that every stage that is pending will at some later time be submitted. Is that definitely true? What happens if a stage is aborted or failed, won't its dependent stages remain pending indefinitely for the entire history of the application? I think at a minimum you need to make sure you remove all associated stages with a given job if the job ends. There may also be other corner cases I'm not thinking of.

If I understand this correctly, the requirement is: "all stages for a particular job ID should be cleaned up on the jobEnd event". I am certainly missing something; is this not already achieved by [this](https://github.com/apache/spark/pull/4043/files#diff-1f32bcb61f51133bd0959a4177a066a5R191)?

> A second assumption this makes is that the job start event will always occur before the stage is submitted. Is that definitely true? It would be good to dig through the reporting API and make sure that is a safe assumption. I think the guarantees of the listener around event ordering are pretty minimal.

This can lead to an anomaly where a stage appears in both the active stages section and the pending section. It will still be obvious that the stage is in progress, though. Is this safe to ignore, or is it a concern that should be addressed at the cost of a little added complexity?

> Finally, what about putting pending stages after active stages in the display page? My concern is you may have dozens or more pending stages in production jobs, I think people will be frustrated if they open the UI and they have to scroll way down every time they need to e.g. refresh the page. It's a bit odd to go (active -> pending -> completed), but I think better for usability.

This is thoughtful; I am going to incorporate this ASAP.
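
To make the two concerns quoted above concrete, here is a minimal sketch of the bookkeeping being discussed. It is written in Python for brevity and is not the code in the pull request: the class `PendingStageTracker`, its fields, and its method names are hypothetical and only mirror the listener events by name. It assumes that a job-start event carries the IDs of all stages belonging to the job.

```python
class PendingStageTracker:
    """Toy model of pending/active stage bookkeeping (hypothetical, not Spark code)."""

    def __init__(self):
        self.pending = {}      # stage_id -> set of job ids still waiting on it
        self.active = set()    # stage ids currently running
        self.job_stages = {}   # job_id -> set of stage ids announced at job start

    def on_job_start(self, job_id, stage_ids):
        self.job_stages[job_id] = set(stage_ids)
        for sid in stage_ids:
            # Tolerate a stage-submitted event that arrived before job start:
            # a stage that is already active must not also be listed as pending.
            if sid not in self.active:
                self.pending.setdefault(sid, set()).add(job_id)

    def on_stage_submitted(self, stage_id):
        self.pending.pop(stage_id, None)
        self.active.add(stage_id)

    def on_stage_completed(self, stage_id):
        self.active.discard(stage_id)

    def on_job_end(self, job_id):
        # Drop every stage of this job that never got submitted, e.g. because
        # an upstream stage failed and the job was aborted.
        for sid in self.job_stages.pop(job_id, set()):
            owners = self.pending.get(sid)
            if owners is not None:
                owners.discard(job_id)
                if not owners:
                    del self.pending[sid]


# Usage: stage 11 fails, so stage 12 is never submitted before the job ends.
tracker = PendingStageTracker()
tracker.on_job_start(1, [10, 11, 12])
tracker.on_stage_submitted(10)
tracker.on_stage_completed(10)
tracker.on_job_end(1)
```

The clean-up in `on_job_end` addresses the first concern without assuming that every pending stage is eventually submitted, and the `active` check in `on_job_start` avoids showing a stage in both sections when events arrive out of order. The sketch does not handle a stage that is both submitted and completed before its job-start event arrives; such a stage would be listed as pending until the job ends.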