Whoops, my bad, the scenario I described below would never happen. There is a
check that only allows an operator's checkpoints to be purged if the operator
has more than one checkpoint. :)
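
A rough illustration of that guard, as a minimal sketch rather than the actual
Apex code. The function name and the exact purge condition (drop checkpoints at
or before the committed window) are assumptions made for the example:

```python
def purge_checkpoints(checkpoints, committed_window):
    """Return the checkpoints that survive one purge pass.

    Hypothetical rule: drop checkpoints at or before the committed window,
    oldest first, but never drop the operator's only remaining checkpoint.
    """
    remaining = sorted(checkpoints)
    while len(remaining) > 1 and remaining[0] <= committed_window:
        remaining.pop(0)
    return remaining


# Heartbeat 1: committed window is 25, so everything before 30 goes away.
after_first_heartbeat = purge_checkpoints([15, 20, 25, 30], committed_window=25)
# Heartbeat 2: committed window is 30, but 30 is the only checkpoint left.
after_second_heartbeat = purge_checkpoints(after_first_heartbeat, committed_window=30)
```

Under these assumptions the checkpoint for window 30 survives both passes, so
an operator always has something to recover from.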

On Tue, Dec 15, 2015 at 1:39 PM, Timothy Farkas <[email protected]> wrote:

> Siyuan, then Ashwin may be right that there is an issue. Looking at the
> code again I think this could happen:
>
> 1 - All operators reach checkpoint 30
> 2 - Checkpoints are updated on heartbeat and committed window is now 25,
> everything before window 30 is purged
> 3 - no new checkpoint is reached for any operator
> 4 - Checkpoints are updated on heartbeat again and committed window is now
> 30, now window 30 is purged.
>
> May be missing something again though.
>
> On Tue, Dec 15, 2015 at 1:32 PM, Siyuan Hua <[email protected]>
> wrote:
>
>> My understanding is that the committed window could possibly be 30 as well,
>> depending on whether the container manager has received heartbeats from the
>> containers.
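
To make the heartbeat dependence concrete, here is a small sketch. It is not
taken from the Apex source: the function name is made up, and it assumes every
operator checkpoints at the same window ids, so the minimum of the reported
values is a checkpoint all operators share.

```python
def committed_window(reported_checkpoints):
    """Largest window id that every operator is known to have checkpointed.

    `reported_checkpoints` maps operator name -> latest checkpoint window id
    that a (hypothetical) manager has learned about through heartbeats.
    """
    return min(reported_checkpoints.values())


# All operators have checkpointed window 30, but the heartbeat carrying
# D's newest checkpoint has not arrived yet, so the manager still sees 25.
stale_view = {"A": 30, "B": 30, "C": 30, "D": 25}
# Once D's heartbeat arrives, the manager sees 30 everywhere.
fresh_view = {"A": 30, "B": 30, "C": 30, "D": 30}

stale_committed = committed_window(stale_view)
fresh_committed = committed_window(fresh_view)
```

With the stale view the committed window stays at 25; with the fresh view it
moves to 30.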
>>
>> And I guess the discussion is assuming at_least_once semantics? :)
>> at_most_once should have a different recovery window.
>>
>> On Tue, Dec 15, 2015 at 12:01 PM, Timothy Farkas <[email protected]>
>> wrote:
>>
>> > Hi Ashwin,
>> >
>> > In your example, if A fails the recovery windows would be
>> >
>> > D - 15
>> > C - 15
>> > B - 15
>> > A - 15
>> >
>> > If C fails the recovery windows would be
>> >
>> > D - 15
>> > C - 15
>> > B - 25
>> > A - 30
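
The two lists above can be reproduced with a short sketch. This is only an
illustration for a linear chain, not Apex's implementation: the function name
is hypothetical, and it assumes that the failed operator and everything
downstream of it are restarted while upstream operators stay at their latest
checkpoint.

```python
def recovery_windows(chain, checkpoints, failed):
    """Map each operator in a linear chain to the window it resumes from.

    Assumptions (not taken from the Apex source): `chain` lists operators
    from input to output, the failed operator and everything downstream of
    it are restarted, and upstream operators stay at their latest checkpoint.
    """
    start = chain.index(failed)
    result = {op: max(checkpoints[op]) for op in chain[:start]}
    limit = float("inf")
    # Walk from the output operator back up to the failed one, so each
    # operator is capped by what its downstream neighbour recovers to.
    for op in reversed(chain[start:]):
        limit = max(w for w in checkpoints[op] if w <= limit)
        result[op] = limit
    return result


chain = ["A", "B", "C", "D"]
# Checkpoints every 5 windows, up to each operator's latest one.
checkpoints = {
    "A": list(range(5, 31, 5)),
    "B": list(range(5, 26, 5)),
    "C": list(range(5, 21, 5)),
    "D": list(range(5, 16, 5)),
}

if_a_fails = recovery_windows(chain, checkpoints, failed="A")
if_c_fails = recovery_windows(chain, checkpoints, failed="C")
```

Here `if_a_fails` puts every operator at 15, and `if_c_fails` leaves A at 30
and B at 25 while C and D go back to 15.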
>> >
>> > If every operator just reached window 30 and checkpointed, the committed
>> > window would be 25, and all the checkpoints before window 30 would be
>> > purged, but the checkpoint for window 30 would not be purged.
>> >
>> > Thanks,
>> > Tim
>> >
>> > On Tue, Dec 15, 2015 at 11:41 AM, Ashwin Chandra Putta <
>> > [email protected]> wrote:
>> >
>> > > Tim,
>> > >
>> > > Thanks, that is pretty much in line with what I was thinking. A slightly
>> > > different thought, though, on picking the checkpoint based on the
>> > > downstream operators. For A, is it not going to be "the checkpoint with
>> > > the largest window id that is less than or equal to the checkpoint with
>> > > the largest common window id (instead of largest window id) among all
>> > > the operators downstream of A"?
>> > >
>> > > For example,
>> > >
>> > > Say the DAG is A -> B -> C -> D, the checkpoint window count is 5, and
>> > > the largest checkpoints are as follows.
>> > >
>> > > A - 30
>> > > B - 25
>> > > C - 20
>> > > D - 15
>> > >
>> > > Does A recover at 25 (checkpoint with largest window id) or 15
>> > > (checkpoint with largest common window id)?
>> > >
>> > > Also, regarding recovering at the committed window id: is it not possible
>> > > in the following scenario? All operators have checkpointed at 30 and got
>> > > the committed window callback, and then an operator fails before any
>> > > operator checkpoints further. In that case, the recovery window is 30,
>> > > right?
>> > >
>> > > Regards,
>> > > Ashwin.
>> > >
>> > > On Mon, Dec 14, 2015 at 11:58 PM, Timothy Farkas <[email protected]>
>> > > wrote:
>> > >
>> > > > Hi Ashwin,
>> > > >
>> > > > The recovery checkpoint for operator A is computed by taking the
>> > > > checkpoint with the largest window id that is less than or equal to
>> > > > the checkpoint with the largest window id among all the operators
>> > > > downstream of A. The output operators in a DAG will always recover to
>> > > > their most recent checkpoint. The input operator of the DAG may
>> > > > recover to the earliest checkpoint. Operators between the input and
>> > > > output operators could recover to a window in between.
>> > > >
>> > > > I don't think you can ever recover to a committed window; the earliest
>> > > > I think you can recover to is the window after the committed window
>> > > > (I may be wrong on this).
>> > > >
>> > > > On Mon, Dec 14, 2015 at 11:05 PM, Ashwin Chandra Putta <
>> > > > [email protected]> wrote:
>> > > >
>> > > > > In the Apex architecture there is a concept of checkpointing and a
>> > > > > concept of a window being committed once all operators have crossed
>> > > > > a common checkpoint.
>> > > > >
>> > > > > So, in which scenarios does a given operator recover at the last
>> > > > > checkpoint window vs. the last committed window vs. some other
>> > > > > checkpoint window in between?
>> > > > > --
>> > > > >
>> > > > > Regards,
>> > > > > Ashwin.
>> > > > >
>> > > >
>> > >
>> > >
>> > >
>> > > --
>> > >
>> > > Regards,
>> > > Ashwin.
>> > >
>> >
>>
>
>
