Ah, got it. So Stage and TaskInfo are opaque outside spark, while TaskMetrics are visible.

On Tue, Jul 23, 2013 at 4:41 PM, Matei Zaharia <[email protected]> wrote:

> Hey Mark,
>
> The motivation was to separate internal DAGScheduler data structures, such
> as Stage, from the interface we'll present to SparkListener, which will be
> a semi-public API. (Semi-public in that it might still change if we make
> drastic changes to the scheduler, but we want people to be able to use it
> for monitoring with as little pain as possible). We aren't following this
> consistently in all the SparkListener events yet but the goal is to do so.
>
> Matei
>
> On Jul 23, 2013, at 4:22 PM, Mark Hamstra <[email protected]> wrote:
>
> > So I'm currently working in Spark's DAGScheduler and related UI code, and
> > I'm finding myself wondering why there are StageInfos distinct from
> > Stages.
> > It seems like we go through some bookkeeping to make sure that we can get
> > from a Stage to a StageInfo, which in turn is just a pairing of the Stage
> > with a collection of (TaskInfo, TaskMetrics) pairs. Why not avoid the
> > bookkeeping and just put that collection of (TaskInfo, TaskMetrics) pairs
> > right in the Stage itself? I.e., directly change the Stage class to
> > augment it with the collection instead of indirectly augmenting stages by
> > going through the (potentially error-prone) mechanics of maintaining an
> > association between a StageInfo distinct from the Stage.
> >
> > Or am I missing something?
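
For anyone reading this thread later, here is a rough Scala sketch of the arrangement being discussed: an internal Stage, a separate StageInfo that pairs it with (TaskInfo, TaskMetrics) entries, and the map that ties the two together. Everything in it is a simplified, hypothetical stand-in. The field names, the SchedulerSketch class and its methods are invented for illustration and are not Spark's actual definitions.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-ins; the fields are invented for illustration.
final case class TaskInfo(taskId: Long, host: String)
final case class TaskMetrics(executorRunTimeMs: Long)

// Internal scheduler structure: free to change along with the scheduler.
final class Stage(val id: Int, val numPartitions: Int)

// Listener-facing view: wraps a Stage and carries the per-task pairs.
final class StageInfo(private val stage: Stage) {
  val taskInfos: mutable.Buffer[(TaskInfo, TaskMetrics)] =
    mutable.Buffer.empty[(TaskInfo, TaskMetrics)]

  // Expose only what a listener needs, not the Stage itself.
  def stageId: Int = stage.id
}

// The bookkeeping in question: an association from each Stage to its StageInfo.
final class SchedulerSketch {
  private val stageToInfos = mutable.HashMap.empty[Stage, StageInfo]

  // Create the StageInfo the first time a Stage is seen, reuse it afterwards.
  def submit(stage: Stage): StageInfo =
    stageToInfos.getOrElseUpdate(stage, new StageInfo(stage))

  // Record a finished task against the StageInfo, never against the Stage.
  // Assumes submit(stage) was called first; otherwise the lookup throws.
  def taskEnded(stage: Stage, info: TaskInfo, metrics: TaskMetrics): Unit =
    stageToInfos(stage).taskInfos += ((info, metrics))
}
```

The cost is the extra map and the risk of it drifting out of sync with the stages. The benefit, as Matei describes it, is that listener code only ever sees the StageInfo side, so Stage can change without breaking monitoring code.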
