I'm currently working in Spark's DAGScheduler and the related UI code, and I'm wondering why StageInfo exists as a class distinct from Stage. We go through some bookkeeping to make sure we can get from a Stage to its StageInfo, which in turn is just a pairing of the Stage with a collection of (TaskInfo, TaskMetrics) pairs. Why not avoid the bookkeeping and put that collection of (TaskInfo, TaskMetrics) pairs directly in the Stage? That is, change the Stage class itself to hold the collection, instead of augmenting stages indirectly through the (potentially error-prone) mechanics of maintaining an association between each Stage and a separate StageInfo.
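
To make the question concrete, here is a rough sketch of the two shapes I have in mind. Everything in it is a simplified, hypothetical stand-in (StageA, StageInfoA, SchedulerA, StageB, and the fields on TaskInfo and TaskMetrics are made up for illustration), not Spark's actual class definitions, which carry far more state.

```scala
import scala.collection.mutable

// Hypothetical, stripped-down stand-ins for the real classes.
final case class TaskInfo(taskId: Long)
final case class TaskMetrics(executorRunTimeMs: Long)

// Shape A: a separate info object, reached through a side table.
final class StageA(val id: Int)

final class StageInfoA(val stage: StageA) {
  val taskInfos = mutable.ArrayBuffer.empty[(TaskInfo, TaskMetrics)]
}

final class SchedulerA {
  // The bookkeeping in question: every Stage needs a matching entry here,
  // added when the stage is created and removed when it is cleaned up.
  private val stageToInfo = mutable.HashMap.empty[StageA, StageInfoA]

  def register(stage: StageA): Unit = {
    stageToInfo(stage) = new StageInfoA(stage)
  }

  def unregister(stage: StageA): Unit = {
    stageToInfo -= stage
  }

  // Throws NoSuchElementException if the stage was never registered
  // or has already been unregistered.
  def taskEnded(stage: StageA, info: TaskInfo, metrics: TaskMetrics): Unit = {
    stageToInfo(stage).taskInfos += ((info, metrics))
  }

  def infoFor(stage: StageA): Option[StageInfoA] = stageToInfo.get(stage)
}

// Shape B: the collection lives on the stage itself, so there is no
// side table to keep in sync.
final class StageB(val id: Int) {
  val taskInfos = mutable.ArrayBuffer.empty[(TaskInfo, TaskMetrics)]

  def taskEnded(info: TaskInfo, metrics: TaskMetrics): Unit = {
    taskInfos += ((info, metrics))
  }
}
```

In shape A, a missed register or an early unregister shows up as a lookup failure when a task finishes; in shape B that failure mode cannot occur, at the cost of the Stage carrying the UI-facing data itself.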
Or am I missing something?
