Re: [Proposal] Named Checkpoints

Amol Kekre Thu, 04 Aug 2016 10:31:32 -0700

We had a user who wanted to roll back and restart for audit purposes. At
that time we did not have timed windows. Named checkpoints would have
helped a little.


Problem statement: Auditors ask for a rerun of yesterday's computations for
verification. Assume that these computations depend on previous state (i.e.,
data from the day before yesterday).

Solution
1. Take a named checkpoint every day at midnight (an input adapter
triggers it).
2. The app spools raw logs into HDFS along with window ids and event times.
3. The rerun is a separate app that starts from a named checkpoint (the
midnight checkpoint that marks the start of yesterday).

Technically the solution will not be that simple, and the "new audit app"
will need a lot of other checks (dedups, dropping events not in yesterday's
window, waiting for late arrivals, ...), but a named checkpoint helps.
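A minimal sketch of two of those extra checks (dropping events outside yesterday's window, and dedup), assuming the raw logs spooled in step 2 have already been read back into memory and that each record carries a window id, an event time, and a unique event id. SpooledEvent and events_for_audit_day are hypothetical names for illustration, not part of Apex.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class SpooledEvent:
    # Hypothetical layout of one spooled raw-log record (step 2).
    window_id: int
    event_time: datetime
    event_id: str  # assumed unique per logical event; used for dedup
    payload: str


def events_for_audit_day(events, day_start):
    """Return events whose event time falls in [day_start, day_start + 1 day),
    with duplicates removed, ordered by event time."""
    day_end = day_start + timedelta(days=1)
    seen = set()
    kept = []
    for ev in events:
        # Drop events that are not in the audited day's window.
        if not (day_start <= ev.event_time < day_end):
            continue
        # Drop duplicates, e.g. the same event spooled twice after a replay.
        if ev.event_id in seen:
            continue
        seen.add(ev.event_id)
        kept.append(ev)
    kept.sort(key=lambda e: e.event_time)
    return kept
```

Waiting for late arrivals is not modeled here; the audit app would still have to decide how long to wait before treating yesterday's data as complete.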

I do agree with Pramod that replay within the same running app is not
viable in a data-in-motion architecture, but it helps somewhat in a new
audit app. Named checkpoints help data-in-motion architectures handle batch
apps better. In the case above, the spooling in step #2 (with event
timestamps) plus the saved state suffices; the state part comes from the
named checkpoint.

Thks,
Amol




On Thu, Aug 4, 2016 at 10:12 AM, Sanjay Pujare <[email protected]>
wrote:

> I agree. A specific use case would help justify this feature. Also, the
> ability to replay from a named checkpoint will be limited by various
> factors, won't it?
>
> On 8/4/16, 9:00 AM, "Pramod Immaneni" <[email protected]> wrote:
>
>     There is a problem here: keeping old checkpoints and recovering from
>     them means preserving the old input data along with the state. This
>     goes beyond the mechanism of creating named checkpoints. It means
>     operators need the ability to move forward (i.e., commit and drop
>     committed state and buffered data) while still being able to replay
>     from that point from the input source, and there must be a way for
>     operators (at first look, input operators) to distinguish that case.
>     Why would someone need this with idempotent processing? Is there a
>     specific use case you are looking at? If we do go ahead with this, for
>     the mechanism I would be in favor of reusing the existing tuple.
>
>     On Thu, Aug 4, 2016 at 8:44 AM, Vlad Rozov <[email protected]> wrote:
>
>     > +1 for the feature. At first glance, I am more in favor of reusing the
>     > existing control tuple.
>     >
>     > Thank you,
>     >
>     > Vlad
>     >
>     >
>     > On 8/4/16 08:17, Sandesh Hegde wrote:
>     >
>     >> @Chinmay
>     >> We could enhance the existing checkpoint tuple, but that one is used
>     >> more frequently than this feature would be, so why burden the
>     >> checkpoint tuple with an extra field?
>     >>
>     >> @Aniruddha
>     >> It is better to leave the scheduling to the users; they can use any
>     >> tool that they are already familiar with.
>     >>
>     >> On Thu, Aug 4, 2016 at 7:40 AM Aniruddha Thombare <[email protected]> wrote:
>     >>
>     >>> +1 on the idea; it would be awesome to have.
>     >>>
>     >>> Question: Can we further develop this brilliant idea into scheduled
>     >>> checkpoints (saved as dynamically named checkpoints)? This would be
>     >>> along the lines of logrotate / general backup strategies.
>     >>>
>     >>>
>     >>> Thanks,
>     >>>
>     >>> A
>     >>>
>     >>> _____________________________________
>     >>> Sent with difficulty, I mean handheld ;)
>     >>> On 4 Aug 2016 8:03 pm, "Munagala Ramanath" <[email protected]> wrote:
>     >>>
>     >>>> +1
>     >>>>
>     >>>> Ram
>     >>>>
>     >>>> On Thu, Aug 4, 2016 at 12:10 AM, Sandesh Hegde <[email protected]> wrote:
>     >>>>
>     >>>>> Hello Team,
>     >>>>>
>     >>>>> This thread is to discuss the Named Checkpoint feature for Apex
>     >>>>> (https://issues.apache.org/jira/browse/APEXCORE-498).
>     >>>>>
>     >>>>> Named checkpoints allow the following workflow:
>     >>>>>
>     >>>>> 1. Users can trigger a checkpoint and give it a name.
>     >>>>> 2. Users can relaunch the application from the named checkpoint.
>     >>>>> 3. These checkpoints survive the "purge of old checkpoints".
>     >>>>>
>     >>>>> The current idea is to add a new control tuple, NamedCheckPointTuple,
>     >>>>> which contains the user-specified name. It traverses the DAG, and
>     >>>>> along the way the necessary actions are taken.
>     >>>>>
>     >>>>> Please let me know your thoughts on this.
>     >>>>>
>     >>>>> Thanks
>     >>>>>
>     >>>>>
>     >
>
>
>
>
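To make the workflow in the original proposal more concrete, here is a minimal sketch under stated assumptions; it is not the Apex API. NamedCheckPointTuple is borrowed from the proposal only as a name, while CheckpointStore and propagate are hypothetical, the store is in memory, and the DAG is reduced to a list of operator ids in topological order. When the tuple reaches an operator, that operator's state is saved under the user-given name, and purging old window-based checkpoints leaves named ones intact (point 3 of the proposal).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class NamedCheckPointTuple:
    # Hypothetical stand-in for the proposed control tuple: carries only the name.
    name: str


@dataclass
class CheckpointStore:
    # Hypothetical in-memory store.
    # by_window: (operator id, window id) -> state snapshot (regular checkpoints)
    # by_name:   (operator id, checkpoint name) -> state snapshot (named checkpoints)
    by_window: dict = field(default_factory=dict)
    by_name: dict = field(default_factory=dict)

    def save(self, op_id, window_id, state):
        self.by_window[(op_id, window_id)] = dict(state)

    def save_named(self, op_id, name, state):
        self.by_name[(op_id, name)] = dict(state)

    def purge_before(self, committed_window_id):
        # Purge only window-based checkpoints older than the committed window;
        # named checkpoints are never touched here.
        self.by_window = {
            k: v for k, v in self.by_window.items() if k[1] >= committed_window_id
        }

    def restore_named(self, op_id, name):
        return dict(self.by_name[(op_id, name)])


def propagate(dag_order, states, store, tup):
    # Visit operators in topological order, as the tuple would travel the DAG,
    # saving each operator's current state under the checkpoint name.
    for op_id in dag_order:
        store.save_named(op_id, tup.name, states[op_id])
```

Restoring operator state alone does not give a replay; as noted earlier in the thread, the input source must also be able to replay from that point, which this sketch does not model.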
