[ https://issues.apache.org/jira/browse/SPARK-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142772#comment-14142772 ]

Josh Rosen commented on SPARK-2321:
-----------------------------------

The scheduler has some data structures like StageInfo, TaskInfo, RDDInfo, etc.
that expose some of the information that we might want in a user-facing
progress API, but we can't expose these classes in their current form since
they're marked @DeveloperApi and are full of public, mutable fields (the
responses returned from our progress / status API need to be immutable).

Maybe we should stabilize these scheduler.*Info classes' public interfaces,
make them immutable, and add a JobInfo class for capturing per-job information.
We can then register a new, private SparkListener for maintaining a view of
stage progress and add methods to SparkContext that provide stable, pull-based
access to snapshots of job/stage/task state.
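
A minimal Scala sketch of the pull-based design described above, assuming
hypothetical names throughout: StageSnapshot, JobSnapshot, ProgressTracker
and its onJobStart / onTaskEnd / *Snapshot / jobIdsForGroup methods are
illustrative only and are not existing Spark classes or SparkContext methods.
The tracker stands in for the private listener: it updates private state on
events, records job-to-stage and job-group-to-job links, and hands callers
immutable snapshots.

```scala
import scala.collection.mutable

// Hypothetical immutable snapshot types (illustrative only, not Spark classes).
final case class StageSnapshot(stageId: Int, numTasks: Int, numCompletedTasks: Int)
final case class JobSnapshot(jobId: Int, stageIds: Seq[Int], jobGroup: Option[String])

// Hypothetical stand-in for the private listener: it owns all mutable state
// and exposes only immutable snapshots through pull-based accessors.
final class ProgressTracker {
  private val jobs = mutable.Map.empty[Int, JobSnapshot]
  private val stages = mutable.Map.empty[Int, StageSnapshot]

  // Push side: event callbacks, analogous to listener hooks.
  // stageTaskCounts pairs each stage ID with that stage's task count.
  def onJobStart(jobId: Int, stageTaskCounts: Seq[(Int, Int)],
                 jobGroup: Option[String]): Unit = synchronized {
    jobs(jobId) = JobSnapshot(jobId, stageTaskCounts.map(_._1).toList, jobGroup)
    stageTaskCounts.foreach { case (id, n) => stages(id) = StageSnapshot(id, n, 0) }
  }

  def onTaskEnd(stageId: Int): Unit = synchronized {
    stages.get(stageId).foreach { s =>
      // Replace the entry with a new copy; snapshots already handed out stay unchanged.
      stages(stageId) = s.copy(numCompletedTasks = s.numCompletedTasks + 1)
    }
  }

  // Pull side: callers receive immutable values and cannot mutate tracker state.
  def jobSnapshot(jobId: Int): Option[JobSnapshot] = synchronized { jobs.get(jobId) }

  def stageSnapshot(stageId: Int): Option[StageSnapshot] = synchronized { stages.get(stageId) }

  // Connects a job group to its jobs, sorted by job ID.
  def jobIdsForGroup(group: String): Seq[Int] = synchronized {
    jobs.values.filter(_.jobGroup.contains(group)).map(_.jobId).toSeq.sorted
  }
}
```

Because every update replaces a map entry with a fresh copy, a snapshot that a
caller already holds never changes underneath it; the cost is one small
allocation per event, which a real implementation would need to weigh.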

> Design a proper progress reporting & event listener API
> -------------------------------------------------------
>
>                 Key: SPARK-2321
>                 URL: https://issues.apache.org/jira/browse/SPARK-2321
>             Project: Spark
>          Issue Type: Improvement
>          Components: Java API, Spark Core
>    Affects Versions: 1.0.0
>            Reporter: Reynold Xin
>            Assignee: Josh Rosen
>            Priority: Critical
>
> This is a ticket to track progress on redesigning the SparkListener and 
> JobProgressListener API.
> There are multiple problems with the current design, including:
> 0. I'm not sure if the API is usable in Java (there are at least some enums 
> we used in Scala and a bunch of case classes that might complicate things).
> 1. The whole API is marked as DeveloperApi, because we haven't paid a lot of 
> attention to it yet. Something as important as progress reporting deserves a 
> more stable API.
> 2. There is no easy way to connect jobs with stages. Similarly, there is no 
> easy way to connect job groups with jobs / stages.
> 3. JobProgressListener itself has no encapsulation at all. States can be 
> arbitrarily mutated by external programs. Variable names are sort of randomly 
> decided and inconsistent. 
> We should just revisit these and propose a new, concrete design. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)