GitHub user pwendell commented on the pull request: https://github.com/apache/spark/pull/4043#issuecomment-70214406

I'm not sure this can be merged as-is. The state clean-up here is based on the assumption that every pending stage will at some later time be submitted. Is that definitely true? What happens if a stage is aborted or fails? Won't its dependent stages remain pending indefinitely, for the entire history of the application? I think at a minimum you need to make sure you remove all stages associated with a given job when the job ends. There may also be other corner cases I'm not thinking of.

A second assumption this makes is that the job start event will always occur before the stage is submitted. Is that definitely true? It would be good to dig through the reporting API and make sure that is a safe assumption. I think the guarantees of the listener around event ordering are pretty minimal.

Finally, what about putting pending stages after active stages in the display page? My concern is that you may have dozens or more pending stages in production jobs, and I think people will be frustrated if they open the UI and have to scroll way down every time they need to, e.g., refresh the page. It's a bit odd to go (active -> pending -> completed), but I think it's better for usability.
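To make the first two concerns concrete, here is a rough sketch of the kind of bookkeeping I have in mind. It is a toy model in Python, not Spark's listener code: the class name, the method names, and the event shapes are all hypothetical and only loosely mirror job start, stage submitted, stage completed, and job end events. The point is that pending state is cleaned up when a job ends, whether or not its stages were ever submitted, and that a stage submitted event arriving before the job start event does not leave a stale pending entry.

```python
class PendingStageTracker:
    """Toy model of UI listener state. Hypothetical; not Spark's actual API."""

    def __init__(self):
        self.pending = {}      # stage_id -> set of job_ids still waiting on it
        self.active = set()    # stage_ids that have been submitted and not completed
        self.job_stages = {}   # job_id -> set of stage_ids that belong to the job

    def on_job_start(self, job_id, stage_ids):
        self.job_stages[job_id] = set(stage_ids)
        for stage_id in stage_ids:
            # Tolerate out-of-order delivery: a stage that is already active
            # must not be re-added as pending.
            if stage_id not in self.active:
                self.pending.setdefault(stage_id, set()).add(job_id)

    def on_stage_submitted(self, stage_id):
        self.pending.pop(stage_id, None)
        self.active.add(stage_id)

    def on_stage_completed(self, stage_id):
        self.active.discard(stage_id)
        self.pending.pop(stage_id, None)

    def on_job_end(self, job_id):
        # Drop every stage of the finished job from the pending map, even if
        # it was never submitted (e.g. an upstream stage failed or was aborted).
        for stage_id in self.job_stages.pop(job_id, ()):
            waiting_jobs = self.pending.get(stage_id)
            if waiting_jobs is not None:
                waiting_jobs.discard(job_id)
                if not waiting_jobs:
                    del self.pending[stage_id]
```

With this shape, a job whose first stage fails still leaves `pending` empty once the job end event is handled, so the map cannot grow for the lifetime of the application. Whether the real listener delivers events in a way that makes the out-of-order guard necessary is exactly what needs checking.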