Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/42#discussion_r10363343
  
    --- Diff: 
core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala ---
    @@ -30,16 +32,23 @@ import org.apache.spark.scheduler._
      * class, since the UI thread and the DAGScheduler event loop may otherwise
      * be reading/updating the internal data structures concurrently.
      */
    -private[spark] class JobProgressListener(val sc: SparkContext) extends 
SparkListener {
    +private[ui] class JobProgressListener(sc: SparkContext, live: Boolean)
    +  extends StorageStatusSparkListener {
    --- End diff --
    
    This seems a little confusing to me...it looks like this extends 
StorageStatusSparkListener because it needs access to the list of executors and 
their addresses through the storageStatusList member variable.  But for 
JobProgressListener, storageStatusList isn't really kept up to date correctly 
-- because onTaskEnd gets overridden by JobProgressListener, so the RDD info 
won't get correctly updated when the task ends.
    
    I'd propose simplifying this by having JobProgressListener just extend the 
usual SparkListener.  Then, you need a way of getting the executors for the 
ExecutorTable on the stage status page.  But you can get the list of executor 
Ids / hostnames from the list of tasks (which works because you only care about 
the executors that are being used by one of the tasks anyway).



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

Reply via email to