Github user ScrapCodes commented on the pull request:

    https://github.com/apache/spark/pull/4043#issuecomment-70216469
  
    Thanks, Patrick!
    I have two questions inline.
    
    > I'm not sure this can be merged as-is. The state clean-up here is based 
on the assumption that every stage that is pending will at some later time be 
submitted. Is that definitely true? What happens if a stage is aborted or 
failed, won't its dependent stages remain pending indefinitely for the entire 
history of the application? I think at a minimum you need to make sure you 
remove all associated stages with a given job if the job ends. There may also 
be other corner cases I'm not thinking of.
    
    If I understand correctly, the requirement is that all stages for a given 
job ID should be cleaned up on the jobEnd event. I may be missing something, but 
is this not already achieved by 
[this](https://github.com/apache/spark/pull/4043/files#diff-1f32bcb61f51133bd0959a4177a066a5R191)?
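
To make the cleanup question concrete, here is a minimal sketch of the idea under discussion. It is a toy model in Python, not the listener code from this PR: `PendingStageTracker` and its method names are hypothetical, and it assumes a stage may be referenced by more than one job.

```python
from collections import defaultdict


class PendingStageTracker:
    """Toy model of a listener; all names here are hypothetical, not Spark's API."""

    def __init__(self):
        self.pending = set()                    # stage ids not yet submitted
        self.job_to_stages = defaultdict(set)   # job id -> stage ids it owns
        self.stage_to_jobs = defaultdict(set)   # stage id -> job ids using it

    def on_job_start(self, job_id, stage_ids):
        # Every stage of a newly started job is pending until it is submitted.
        for stage_id in stage_ids:
            self.job_to_stages[job_id].add(stage_id)
            self.stage_to_jobs[stage_id].add(job_id)
            self.pending.add(stage_id)

    def on_stage_submitted(self, stage_id):
        self.pending.discard(stage_id)

    def on_job_end(self, job_id):
        # Drop every stage of the finished job, including stages that were
        # never submitted (for example because an upstream stage failed).
        for stage_id in self.job_to_stages.pop(job_id, set()):
            jobs = self.stage_to_jobs[stage_id]
            jobs.discard(job_id)
            if not jobs:  # no other job still references this stage
                del self.stage_to_jobs[stage_id]
                self.pending.discard(stage_id)
```

With this shape, a stage that is never submitted stops being pending as soon as its last owning job ends, so no state outlives the job.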
    
    >A second assumption this makes is that the job start event will always 
occur before the stage is submitted. Is that definitely true? It would be good 
to dig through the reporting API and make sure that is a safe assumption. I 
think the guarantees of the listener around event ordering are pretty minimal.
    
    If a stage is submitted before its job start event arrives, it could be 
listed in both the active stages section and the pending stages section. It 
would still be clear that the stage is in progress. Is this safe to ignore, or 
should it be addressed at the cost of a little extra complexity?
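
As a rough illustration of what the extra complexity might look like, the sketch below remembers which stages have already been submitted, so a late job start event does not put them back into the pending set. This is a toy Python model with hypothetical names (`OrderTolerantTracker`, `seen`), not the listener code from this PR.

```python
class OrderTolerantTracker:
    """Toy model; names are hypothetical and do not mirror Spark's listener API."""

    def __init__(self):
        self.pending = set()  # stages known from a job start, not yet submitted
        self.active = set()   # stages currently running
        self.seen = set()     # stages submitted at any point so far

    def on_stage_submitted(self, stage_id):
        self.seen.add(stage_id)
        self.pending.discard(stage_id)
        self.active.add(stage_id)

    def on_stage_completed(self, stage_id):
        self.active.discard(stage_id)

    def on_job_start(self, stage_ids):
        # Skip stages whose submission event already arrived, so a stage
        # is never listed as both pending and active.
        for stage_id in stage_ids:
            if stage_id not in self.seen:
                self.pending.add(stage_id)
```

In this sketch the `seen` set grows for the lifetime of the tracker, so a real version would also need to prune it, for example when the owning job ends.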
    
    >Finally, what about putting pending stages after active stages in the 
display page? My concern is you may have dozens or more pending stages in 
production jobs, I think people will be frustrated if they open the UI and they 
have to scroll way down every time they need to e.g. refresh the page. It's a 
bit odd to go (active -> pending -> completed), but I think better for 
usability.
    
    This is a thoughtful suggestion; I will incorporate it as soon as possible.

