My understanding is that the committed window could be 30 as well, depending on whether the container manager has received heartbeats from the containers.
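One way to picture why the committed window depends on heartbeats is a small Python model. It assumes that the container manager only learns about a checkpoint once a heartbeat reports it, and that the committed window is the smallest checkpoint reported across all operators. That second assumption holds when every operator checkpoints on the same windows, as in the example in this thread. `OperatorState`, `heartbeat`, and `committed_window` are hypothetical names, not Apex APIs.

```python
from dataclasses import dataclass


@dataclass
class OperatorState:
    # Latest checkpoint window id the operator has actually written (hypothetical model).
    checkpointed: int
    # Latest checkpoint window id the container manager has learned of via a heartbeat.
    reported: int


def heartbeat(state: OperatorState) -> None:
    """Hypothetical heartbeat: delivers the operator's latest checkpoint to the manager."""
    state.reported = state.checkpointed


def committed_window(states: dict) -> int:
    """Assumed rule: the manager can only commit the oldest checkpoint it has heard about."""
    return min(s.reported for s in states.values())


# Every operator has written its window-30 checkpoint, but the manager last heard about 25.
states = {op: OperatorState(checkpointed=30, reported=25) for op in "ABCD"}
print(committed_window(states))  # 25

for op in "ABC":
    heartbeat(states[op])
print(committed_window(states))  # still 25: D's heartbeat has not arrived yet

heartbeat(states["D"])
print(committed_window(states))  # 30
```

Under these assumptions, the window-30 checkpoints being written but not yet reported gives Tim's answer of 25, and once every heartbeat has arrived the committed window moves to 30.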
And I guess the discussion assumes at_least_once semantics? :) at_most_once should have a different recovery window.

On Tue, Dec 15, 2015 at 12:01 PM, Timothy Farkas wrote:
> Hi Ashwin,
>
> In your example, if A fails the recovery windows would be
>
> D - 15
> C - 15
> B - 15
> A - 15
>
> If C fails the recovery windows would be
>
> D - 15
> C - 15
> B - 25
> A - 30
>
> If every operator just reached window 30 and checkpointed, the committed
> window would be 25, and all the checkpoints before window 30 would be
> purged, but the checkpoint for window 30 would not be purged.
>
> Thanks,
> Tim
>
> On Tue, Dec 15, 2015 at 11:41 AM, Ashwin Chandra Putta wrote:
>
> > Tim,
> >
> > Thanks, that is pretty much in line with what I was thinking. I have a
> > slightly different thought, though, about picking the checkpoint based on
> > the downstream operators. For A, isn't it going to be "the checkpoint with
> > the largest window id that is less than or equal to the checkpoint with
> > the largest common window id (instead of largest window id) among all the
> > operators downstream of A"?
> >
> > For example, say A -> B -> C -> D is the DAG, the checkpoint window count
> > is 5, and the largest checkpoints are as follows:
> >
> > A - 30
> > B - 25
> > C - 20
> > D - 15
> >
> > Does A recover at 25 (checkpoint with the largest window id) or 15
> > (checkpoint with the largest common window id)?
> >
> > Also, regarding recovering at the committed window id: isn't it possible in
> > the following scenario? All operators have checkpointed at 30 and got the
> > committed window callback, and then an operator fails before any operator
> > checkpoints further. In that case, the recovery window is 30, right?
> >
> > Regards,
> > Ashwin.
> >
> > On Mon, Dec 14, 2015 at 11:58 PM, Timothy Farkas wrote:
> >
> > > Hi Ashwin,
> > >
> > > The recovery checkpoint for operator A is computed by taking the
> > > checkpoint with the largest window id that is less than or equal to the
> > > checkpoint with the largest window id among all the operators downstream
> > > of A. The output operators in a DAG will always recover to their most
> > > recent checkpoint. The input operator of the DAG may recover to the
> > > earliest checkpoint. Operators between the input and output operators
> > > could recover to a window in between.
> > >
> > > I don't think you can ever recover to a committed window; the earliest I
> > > think you can recover to is the window after the committed window (I may
> > > be wrong on this).
> > >
> > > On Mon, Dec 14, 2015 at 11:05 PM, Ashwin Chandra Putta wrote:
> > >
> > > > In the Apex architecture there is the concept of checkpointing and the
> > > > concept of a committed window, reached when all operators have crossed
> > > > a common checkpoint.
> > > >
> > > > So, in which scenarios does a given operator recover at the last
> > > > checkpoint window vs. the last committed window vs. some other
> > > > checkpoint window in between?
> > > > --
> > > >
> > > > Regards,
> > > > Ashwin.
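Tim's rule and his numbers for Ashwin's A -> B -> C -> D example can be reproduced with a short Python sketch. It assumes that each operator recovers to its largest checkpoint that is not newer than the recovery checkpoint of every downstream operator (Ashwin's "largest common window id" reading, which matches Tim's answer of 15 for A), that the failed operator and everything downstream of it are redeployed while upstream operators keep running, and that the committed window is the smallest recovery checkpoint. The function names are hypothetical; this is not the Apex implementation.

```python
def recovery_checkpoints(checkpoints, downstream):
    """Recovery checkpoint per operator under the rule assumed above.

    checkpoints: operator -> list of checkpoint window ids it has written
    downstream:  operator -> list of operators it feeds ([] for output operators)
    """
    memo = {}

    def recover(op):
        if op not in memo:
            children = downstream.get(op, [])
            if not children:
                # Output operators recover to their most recent checkpoint.
                memo[op] = max(checkpoints[op])
            else:
                # Never newer than what every downstream operator can recover to.
                bound = min(recover(child) for child in children)
                # Assumes the operator has at least one checkpoint <= bound.
                memo[op] = max(c for c in checkpoints[op] if c <= bound)
        return memo[op]

    return {op: recover(op) for op in checkpoints}


def redeployed_after(failed, downstream):
    """The failed operator plus everything downstream of it (assumed to be redeployed)."""
    seen, stack = set(), [failed]
    while stack:
        op = stack.pop()
        if op not in seen:
            seen.add(op)
            stack.extend(downstream.get(op, []))
    return seen


def windows_after_failure(failed, checkpoints, downstream):
    recovery = recovery_checkpoints(checkpoints, downstream)
    redeployed = redeployed_after(failed, downstream)
    # Operators upstream of the failure keep running; list them at their latest checkpoint.
    return {
        op: recovery[op] if op in redeployed else max(checkpoints[op])
        for op in checkpoints
    }


# Ashwin's example: A -> B -> C -> D, with a checkpoint every 5 windows.
downstream = {"A": ["B"], "B": ["C"], "C": ["D"], "D": []}
latest = {"A": 30, "B": 25, "C": 20, "D": 15}
checkpoints = {op: list(range(5, w + 1, 5)) for op, w in latest.items()}

print(windows_after_failure("A", checkpoints, downstream))  # {'A': 15, 'B': 15, 'C': 15, 'D': 15}
print(windows_after_failure("C", checkpoints, downstream))  # {'A': 30, 'B': 25, 'C': 15, 'D': 15}
# Assumed committed window: the smallest recovery checkpoint across the DAG.
print(min(recovery_checkpoints(checkpoints, downstream).values()))  # 15
```

Because each operator's bound comes from its downstream operators' *recovery* checkpoints rather than their latest ones, the bound propagates upstream and A lands on 15, not 25.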
