Hey Mark,

The motivation was to separate internal DAGScheduler data structures, such as 
Stage, from the interface we'll present to SparkListener, which will be a 
semi-public API. (Semi-public in that it might still change if we make drastic 
changes to the scheduler, but we want people to be able to use it for 
monitoring with as little pain as possible.) We aren't following this 
consistently in all the SparkListener events yet, but the goal is to do so.
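
As a rough illustration of that separation, here is a minimal sketch. The classes below are hypothetical, simplified stand-ins and not the actual Spark definitions; they show the direction described above (listeners see only a small snapshot type), not what the current code does.

```scala
// Hypothetical, simplified stand-ins; NOT the actual Spark class definitions.

// Internal scheduler structure: mutable, free to change between releases.
class Stage(val id: Int, val numTasks: Int) {
  var pendingTaskIds: Set[Int] = (0 until numTasks).toSet
}

// Immutable snapshot exposed to listeners; carries only monitoring data.
case class StageInfo(stageId: Int, numTasks: Int)

object StageInfo {
  // Copy out the fields a monitoring tool needs, so no Stage reference leaks.
  def fromStage(stage: Stage): StageInfo = StageInfo(stage.id, stage.numTasks)
}

// Listener events reference StageInfo, never Stage.
case class StageCompleted(info: StageInfo)

trait Listener {
  def onStageCompleted(event: StageCompleted): Unit
}

object Demo {
  def main(args: Array[String]): Unit = {
    val stage = new Stage(id = 3, numTasks = 8)
    val listener = new Listener {
      def onStageCompleted(event: StageCompleted): Unit =
        println(s"stage ${event.info.stageId} finished, ${event.info.numTasks} tasks")
    }
    // The scheduler converts to the snapshot type at the API boundary.
    listener.onStageCompleted(StageCompleted(StageInfo.fromStage(stage)))
  }
}
```

With this shape, fields can be added to or removed from Stage without breaking code written against the listener interface, as long as StageInfo stays stable.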

Matei

On Jul 23, 2013, at 4:22 PM, Mark Hamstra <[email protected]> wrote:

> So I'm currently working in Spark's DAGScheduler and related UI code, and
> I'm finding myself wondering why there are StageInfos distinct from Stages.
> It seems like we go through some bookkeeping to make sure that we can get
> from a Stage to a StageInfo, which in turn is just a pairing of the Stage
> with a collection of (TaskInfo, TaskMetrics) pairs.  Why not avoid the
> bookkeeping and just put that collection of (TaskInfo, TaskMetrics) pairs
> right in the Stage itself?  I.e., directly change the Stage class to
> augment it with the collection instead of indirectly augmenting stages by
> going through the (potentially error-prone) mechanics of maintaining an
> association between a StageInfo distinct from the Stage.
> 
> Or am I missing something?
