GitHub user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/4043#issuecomment-70214406
  
    I'm not sure this can be merged as-is. The state clean-up here is based on
the assumption that every pending stage will eventually be submitted. Is that
definitely true? What happens if a stage is aborted or fails? Won't its
dependent stages then remain pending for the rest of the application's
lifetime? At a minimum, I think you need to make sure you remove all stages
associated with a given job when that job ends. There may also be other
corner cases I'm not thinking of.
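
    A minimal sketch of the suggested clean-up, assuming a hypothetical
tracker class rather than Spark's actual listener code: pending stages are
keyed by job ID, so anything still pending when the job ends is dropped,
whether the job succeeded, failed, or was cancelled. All class and method
names here are illustrative only.

```python
from collections import defaultdict


class PendingStageTracker:
    """Hypothetical listener state (not Spark's API): pending stages are
    grouped by job so they can be discarded when the job ends, even if
    they were never submitted."""

    def __init__(self):
        # job id -> stage ids of that job not yet seen as submitted
        self.pending_by_job = defaultdict(set)

    def on_job_start(self, job_id, stage_ids):
        # Every stage of a newly started job begins as pending.
        self.pending_by_job[job_id].update(stage_ids)

    def on_stage_submitted(self, stage_id):
        # A submitted stage is no longer pending in any job that includes it.
        for stages in self.pending_by_job.values():
            stages.discard(stage_id)

    def on_job_end(self, job_id):
        # Drop whatever is still pending for this job, so stages orphaned by
        # an aborted or failed upstream stage cannot linger forever.
        self.pending_by_job.pop(job_id, None)

    def pending_stages(self):
        # Union of pending stages across all jobs still being tracked.
        return set().union(*self.pending_by_job.values())
```

For example, if job 0 has stages 1, 2 and 3 and fails right after stage 1 is
submitted, stages 2 and 3 stay pending only until `on_job_end(0)` clears them.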
    
    A second assumption this makes is that the job start event will always 
occur before the stage is submitted. Is that definitely true? It would be good 
to dig through the reporting API and make sure that is a safe assumption. I 
think the guarantees of the listener around event ordering are pretty minimal.
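
    One way to avoid depending on event order, sketched here under the
assumption of a hypothetical view class (not Spark's listener API), is to
derive the pending set on demand as "stages of running jobs minus stages
already seen as submitted", instead of moving stages between lists. Then it
does not matter whether a stage's submission is observed before or after its
job's start event.

```python
class OrderTolerantPendingView:
    """Hypothetical sketch: pending stages are computed, not stored, so the
    relative order of job-start and stage-submitted events is irrelevant."""

    def __init__(self):
        self.stages_by_job = {}  # job id -> all stage ids of that job
        self.submitted = set()   # stage ids seen in submission events

    def on_job_start(self, job_id, stage_ids):
        self.stages_by_job[job_id] = set(stage_ids)

    def on_stage_submitted(self, stage_id):
        # Recorded even if the owning job's start event has not arrived yet.
        self.submitted.add(stage_id)

    def on_job_end(self, job_id):
        stages = self.stages_by_job.pop(job_id, set())
        # Forget submissions for stages no other running job still references.
        still_referenced = set().union(*self.stages_by_job.values())
        self.submitted -= stages - still_referenced

    def pending_stages(self):
        all_stages = set().union(*self.stages_by_job.values())
        return all_stages - self.submitted
```

Here a submission that arrives before its job's start event is simply never
reported as pending, and ending the job removes both its stage list and its
submission records.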
    
    Finally, what about putting pending stages after active stages on the
display page? My concern is that production jobs may have dozens or more
pending stages, and I think people will be frustrated if they open the UI and
have to scroll way down past them every time they, e.g., refresh the page.
Going active -> pending -> completed is a bit odd, but I think it's better for
usability.

