[ https://issues.apache.org/jira/browse/SPARK-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142772#comment-14142772 ]
Josh Rosen commented on SPARK-2321:
-----------------------------------

The scheduler has some data structures like StageInfo, TaskInfo, RDDInfo, etc. that expose some of the information that we might want in a user-facing progress API, but we can't expose these classes in their current form since they're marked @DeveloperApi and are full of public, mutable fields (the responses returned from our progress / status API need to be immutable).

Maybe we should stabilize these scheduler.*Info classes' public interfaces, make them immutable, and add a JobInfo class for capturing per-job information. We can then register a new, private SparkListener for maintaining a view of stage progress and add methods to SparkContext that provide stable, pull-based access to the snapshots of job/stage/task state.

> Design a proper progress reporting & event listener API
> -------------------------------------------------------
>
>                 Key: SPARK-2321
>                 URL: https://issues.apache.org/jira/browse/SPARK-2321
>             Project: Spark
>          Issue Type: Improvement
>          Components: Java API, Spark Core
>    Affects Versions: 1.0.0
>            Reporter: Reynold Xin
>            Assignee: Josh Rosen
>            Priority: Critical
>
> This is a ticket to track progress on redesigning the SparkListener and
> JobProgressListener API.
> There are multiple problems with the current design, including:
> 0. I'm not sure if the API is usable in Java (there are at least some enums
> we used in Scala and a bunch of case classes that might complicate things).
> 1. The whole API is marked as DeveloperApi, because we haven't paid a lot of
> attention to it yet. Something as important as progress reporting deserves a
> more stable API.
> 2. There is no easy way to connect jobs with stages. Similarly, there is no
> easy way to connect job groups with jobs / stages.
> 3. JobProgressListener itself has no encapsulation at all. States can be
> arbitrarily mutated by external programs. Variable names are sort of randomly
> decided and inconsistent.
> We should just revisit these and propose a new, concrete design.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
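
For illustration only, the pull-based approach proposed in the comment (a private listener that maintains state, with callers receiving immutable snapshots) might look roughly like the sketch below. It is plain Java with no Spark dependency, and every name in it (ProgressSketch, StageSnapshot, ProgressListener, StatusTracker, getStageInfo) is hypothetical rather than an existing or planned Spark API. Scheduler events are fed in by hand here; in Spark they would presumably arrive through a SparkListener.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

public final class ProgressSketch {

    /** Immutable snapshot of one stage's progress (hypothetical, not a Spark class). */
    public static final class StageSnapshot {
        private final int stageId;
        private final int numTasks;
        private final int completedTasks;

        StageSnapshot(int stageId, int numTasks, int completedTasks) {
            this.stageId = stageId;
            this.numTasks = numTasks;
            this.completedTasks = completedTasks;
        }

        public int stageId() { return stageId; }
        public int numTasks() { return numTasks; }
        public int completedTasks() { return completedTasks; }
    }

    /** Private listener: the only place where progress state is mutated. */
    private static final class ProgressListener {
        // stageId -> {numTasks, completedTasks}
        private final Map<Integer, int[]> stages = new LinkedHashMap<>();

        synchronized void onStageSubmitted(int stageId, int numTasks) {
            stages.put(stageId, new int[] {numTasks, 0});
        }

        synchronized void onTaskEnd(int stageId) {
            int[] state = stages.get(stageId);
            if (state != null && state[1] < state[0]) {
                state[1]++;
            }
        }

        /** Copies the current counters into a fresh immutable object. */
        synchronized Optional<StageSnapshot> snapshot(int stageId) {
            int[] state = stages.get(stageId);
            if (state == null) {
                return Optional.empty();
            }
            return Optional.of(new StageSnapshot(stageId, state[0], state[1]));
        }
    }

    /** Pull-based facade; stands in for the methods the comment suggests adding to SparkContext. */
    public static final class StatusTracker {
        private final ProgressListener listener = new ProgressListener();

        // Assumption: events are injected manually in this sketch.
        public void stageSubmitted(int stageId, int numTasks) {
            listener.onStageSubmitted(stageId, numTasks);
        }

        public void taskEnded(int stageId) {
            listener.onTaskEnd(stageId);
        }

        public Optional<StageSnapshot> getStageInfo(int stageId) {
            return listener.snapshot(stageId);
        }
    }

    public static void main(String[] args) {
        StatusTracker tracker = new StatusTracker();
        tracker.stageSubmitted(0, 4);
        tracker.taskEnded(0);
        StageSnapshot before = tracker.getStageInfo(0).get();
        tracker.taskEnded(0);
        StageSnapshot after = tracker.getStageInfo(0).get();
        // The earlier snapshot is unaffected by later events.
        System.out.println(before.completedTasks() + " -> " + after.completedTasks());
    }
}
```

Because callers only ever hold StageSnapshot objects with final fields, external code cannot mutate the tracker's state, which is the encapsulation problem raised in item 3 of the description. Plain classes and getters (rather than Scala case classes or enums) also sidestep the Java usability concern in item 0.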