Re: Stage vs. StageInfo

Mark Hamstra Tue, 23 Jul 2013 16:49:38 -0700

Ah, got it.  So Stage and TaskInfo are opaque outside Spark, while
TaskMetrics are visible.



On Tue, Jul 23, 2013 at 4:41 PM, Matei Zaharia <[email protected]> wrote:

> Hey Mark,
>
> The motivation was to separate internal DAGScheduler data structures, such
> as Stage, from the interface we'll present to SparkListener, which will be
> a semi-public API. (Semi-public in that it might still change if we make
> drastic changes to the scheduler, but we want people to be able to use it
> for monitoring with as little pain as possible). We aren't following this
> consistently in all the SparkListener events yet but the goal is to do so.
>
> Matei
>
> On Jul 23, 2013, at 4:22 PM, Mark Hamstra <[email protected]> wrote:
>
> > So I'm currently working in Spark's DAGScheduler and related UI code, and
> > I'm finding myself wondering why there are StageInfos distinct from
> > Stages.
> > It seems like we go through some bookkeeping to make sure that we can get
> > from a Stage to a StageInfo, which in turn is just a pairing of the Stage
> > with a collection of (TaskInfo, TaskMetrics) pairs.  Why not avoid the
> > bookkeeping and just put that collection of (TaskInfo, TaskMetrics) pairs
> > right in the Stage itself?  I.e., directly change the Stage class to
> > augment it with the collection instead of indirectly augmenting stages by
> > going through the (potentially error-prone) mechanics of maintaining an
> > association between a StageInfo distinct from the Stage.
> >
> > Or am I missing something?
>
>
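
The separation discussed in this thread (an internal Stage that the scheduler may change freely, and a StageInfo handed to listeners) can be illustrated with a small sketch. This is not Spark's code: it is written in Python for brevity, and every class, field, and method name below is a hypothetical stand-in. It assumes the stated goal of the design, where the listener-facing object carries only plain values and the (TaskInfo, TaskMetrics) pairs, with no reference back to the internal Stage.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


# All names here are hypothetical stand-ins, not Spark's real classes.
@dataclass(frozen=True)
class TaskInfo:
    task_id: int
    host: str


@dataclass(frozen=True)
class TaskMetrics:
    run_time_ms: int


class Stage:
    """Internal scheduler structure; its fields may change at any time."""

    def __init__(self, stage_id: int, num_partitions: int) -> None:
        self.stage_id = stage_id
        self.num_partitions = num_partitions
        # Scheduler-only bookkeeping that listeners should never see.
        self.pending_tasks: Set[int] = set(range(num_partitions))


@dataclass(frozen=True)
class StageInfo:
    """Listener-facing snapshot: immutable, no reference to Stage."""

    stage_id: int
    num_tasks: int
    task_metrics: Tuple[Tuple[TaskInfo, TaskMetrics], ...]


class Scheduler:
    def __init__(self) -> None:
        self._listeners: List[Callable[[StageInfo], None]] = []
        # The bookkeeping the thread mentions: stage id -> finished tasks.
        self._finished: Dict[int, List[Tuple[TaskInfo, TaskMetrics]]] = {}

    def add_listener(self, listener: Callable[[StageInfo], None]) -> None:
        self._listeners.append(listener)

    def task_ended(self, stage: Stage, info: TaskInfo, metrics: TaskMetrics) -> None:
        stage.pending_tasks.discard(info.task_id)
        self._finished.setdefault(stage.stage_id, []).append((info, metrics))

    def stage_completed(self, stage: Stage) -> None:
        # Copy what listeners need into a snapshot instead of exposing Stage.
        snapshot = StageInfo(
            stage_id=stage.stage_id,
            num_tasks=stage.num_partitions,
            task_metrics=tuple(self._finished.pop(stage.stage_id, [])),
        )
        for listener in self._listeners:
            listener(snapshot)


# Usage: a monitoring listener only ever receives StageInfo values.
received: List[StageInfo] = []
scheduler = Scheduler()
scheduler.add_listener(received.append)

stage = Stage(stage_id=0, num_partitions=2)
scheduler.task_ended(stage, TaskInfo(0, "host-a"), TaskMetrics(10))
scheduler.task_ended(stage, TaskInfo(1, "host-b"), TaskMetrics(20))
scheduler.stage_completed(stage)
```

The cost is the extra map from stage id to finished tasks, which is the bookkeeping questioned in the original message. The benefit is that Stage can gain or lose fields without breaking monitoring code written against StageInfo.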
