Siyuan,

Yes, we are discussing at least once semantics.

Tim,

So it is indeed possible that we recover at committed window id in a case
where we just committed and there were no further checkpoints before
failure.

Regards,
Ashwin.

On Tue, Dec 15, 2015 at 1:54 PM, Timothy Farkas <[email protected]> wrote:

> Whoops my bad, that would never happen. There is a check that only allows
> purging of checkpoints for an operator if the operator has more than one
> checkpoint. :)
>
> On Tue, Dec 15, 2015 at 1:39 PM, Timothy Farkas <[email protected]>
> wrote:
>
> > Siyuan, then Ashwin may be right that there is an issue. Looking at the
> > code again I think this could happen:
> >
> > 1 - All operators reach checkpiont 30
> > 2 - Checkpoints are updated on heartbeat and committed window is now 25,
> > everything before window 30 is purged
> > 3 - no new checkpoint is reached for any operator
> > 4 - Checkpoints are updated on heartbeat again and committed window is
> now
> > 30, now window 30 is purged.
> >
> > May be missing something again though.
> >
> > On Tue, Dec 15, 2015 at 1:32 PM, Siyuan Hua <[email protected]>
> > wrote:
> >
> >> My understanding is the committed window could possibly be 30 as well,
> >> depends on whether container manager get heart beat from containers.
> >>
> >> And I guess the discussion is assuming at_least_once semantic? :)
> >> at_most_once should have different recovery window.
> >>
> >> On Tue, Dec 15, 2015 at 12:01 PM, Timothy Farkas <[email protected]>
> >> wrote:
> >>
> >> > Hi Ashwin,
> >> >
> >> > In your example, if A fails the recovery windows would be
> >> >
> >> > D - 15
> >> > C - 15
> >> > B - 15
> >> > A - 15
> >> >
> >> > If C fails the recovery windows would be
> >> >
> >> > D -15
> >> > C -15
> >> > B - 25
> >> > A - 30
> >> >
> >> > If every operator just reached window 30 and checkpointed, the
> committed
> >> > window would be 25, and all the checkpoints before window 30 would be
> >> > purged, but the checkpoint for window 30 would not be purged.
> >> >
> >> > Thanks,
> >> > Tim
> >> >
> >> > On Tue, Dec 15, 2015 at 11:41 AM, Ashwin Chandra Putta <
> >> > [email protected]> wrote:
> >> >
> >> > > Tim,
> >> > >
> >> > > Thanks, that is pretty much inline with what I was thinking. A
> little
> >> > > different thought though in terms of picking the checkpoint based on
> >> > > downstream operators. For A, is it not going to be "the checkpoint
> >> with
> >> > the
> >> > > largest window id that is less than or equal to the checkpoint with
> >> the
> >> > > largest common window id (instead of largest window id) among all
> the
> >> > > operators down stream to A"
> >> > >
> >> > > For example,
> >> > >
> >> > > If A -> B -> C -> D is the dag. And say, the checkpoint window count
> >> is 5
> >> > > and the largest checkpoints are as follows.
> >> > >
> >> > > A - 30
> >> > > B - 25
> >> > > C - 20
> >> > > D - 15
> >> > >
> >> > > Does A recover at 25 (checkpoint with largest window id) or 15
> >> > (checkpoint
> >> > > with largest common window id)?
> >> > >
> >> > > Also, regarding recovering at committed window id. Is it not
> possible
> >> in
> >> > > the following scenario where all operators have checkpointed at 30
> and
> >> > got
> >> > > the committed window call back. And then an operator fails before
> any
> >> > > operator checkpoints further. In that case, the recovery window is
> 30
> >> > > right?
> >> > >
> >> > > Regards,
> >> > > Ashwin.
> >> > >
> >> > > On Mon, Dec 14, 2015 at 11:58 PM, Timothy Farkas <
> [email protected]
> >> >
> >> > > wrote:
> >> > >
> >> > > > Hi Ashwin,
> >> > > >
> >> > > > The recovery checkpoint for operator A is computed by taking the
> >> > > checkpoint
> >> > > > with the largest window id that is less than or equal to the
> >> checkpoint
> >> > > > with the largest window id among all the operators down stream to
> A.
> >> > The
> >> > > > output operators in a dag will always recover to their most recent
> >> > > > checkpoint. The input operator of the dag may recover to the
> >> earliest
> >> > > > checkpoint. Operators between the input and ouput operators could
> >> > recover
> >> > > > to a window in between.
> >> > > >
> >> > > > I don't think you can ever recover to a committed window, the
> >> earliest
> >> > I
> >> > > > think you can recover to is the window after the committed window
> >> (may
> >> > be
> >> > > > wrong on this).
> >> > > >
> >> > > > On Mon, Dec 14, 2015 at 11:05 PM, Ashwin Chandra Putta <
> >> > > > [email protected]> wrote:
> >> > > >
> >> > > > > In the apex architecture there is concept of checkpointing and
> >> > concept
> >> > > of
> >> > > > > committed when all operator have crossed a common checkpoint.
> >> > > > >
> >> > > > > So, in which scenarios does a given operator recover at last
> >> > checkpoint
> >> > > > > window vs last committed window vs some other checkpoint window
> in
> >> > > > between?
> >> > > > > --
> >> > > > >
> >> > > > > Regards,
> >> > > > > Ashwin.
> >> > > > >
> >> > > >
> >> > >
> >> > >
> >> > >
> >> > > --
> >> > >
> >> > > Regards,
> >> > > Ashwin.
> >> > >
> >> >
> >>
> >
> >
>



-- 

Regards,
Ashwin.

Reply via email to