If the checkpoint interval is a multiple of windows, and end window tuples are already flowing and triggering end windows on the operators, is there additional knowledge gained by a checkpoint tuple? I can see one advantage: STRAM can force an ad hoc checkpoint throughout the system at any window it decides.
Chetan, can you give me an example of where an operator checkpointing at a multiple greater than the application checkpoint would be used? I would think that letting an operator set its own checkpoint interval as an absolute value, unrelated to any other checkpointing mechanism, would be more useful.

On Fri, Nov 13, 2015 at 9:03 AM, Amol Kekre <[email protected]> wrote:

> There is an additional impact of using the checkpoint tuple as opposed to each StramChild simply checkpointing at pre-known windows. This is the knowledge of checkpoint flow, as per Chetan's #1. Stram will know that the checkpoint tuple has passed through all upstream operators. In non-blocking checkpoints (the default) this may not be as critical, but for blocking checkpoints it may be important. Plus, the logic to re-construct/re-partition becomes a lot simpler with this knowledge.
>
> Getting my memory back after Chetan's email :) the trigger for moving to the checkpoint tuple was the ease of aligning checkpoints, i.e. getting a clear application-wide state, as Chetan stated. Technically, hard-coding these numbers in each StramChild (per operator) may work, but the checkpoint tuple made it easy, and Stram could then leverage this as knowledge. Another trip down memory lane: I was pushing for heartbeat control tuples wherever we can. These are tuples that flow through the dataflow and report back some content from which application condition/dataflow aspects can be derived. They are needed for a non-blocking master to function, and were a very critical part of operability in past attempts at distributed data-in-motion architectures. Control tuples served that purpose from the checkpointing-trigger point of view; windowId control tuples served it from the dataflow point of view.
>
> Thks,
> Amol
>
> On Thu, Nov 12, 2015 at 9:07 PM, Chetan Narsude (cnarsude) <[email protected]> wrote:
>
> > Pramod, the previous design was to checkpoint at random window ids. The issue with that was that repartitioning/recovery could be impossible in certain cases if all the partitions did not checkpoint at the same window. This is the new design with the control tuple, although CHECKPOINT_WINDOW_COUNT was added later to let operators delay their checkpoint to a window later than the one at which they would normally checkpoint with the control tuple. We did not want them to be able to checkpoint earlier than the scheduled one, as that decision is centrally controlled via the application. It is useful where the operator attributes are allowed to be configured independently of the application attributes. It's also documented with OperatorContext.CHECKPOINT_WINDOW_COUNT:
> >
> > /**
> >  * Attribute of the operator that hints at the optimal checkpoint boundary.
> >  * By default checkpointing happens after every predetermined streaming window. The application developer can override
> >  * this behavior by defining the following attribute. When this attribute is defined, checkpointing will be done after
> >  * completion of the later of the regular checkpointing window and the window whose serial number is divisible by the attribute
> >  * value. Typically the user would define this value to be the same as that of APPLICATION_WINDOW_COUNT so checkpointing
> >  * will be done at the application window boundary.
> >  */
> > Attribute<Integer> CHECKPOINT_WINDOW_COUNT = new Attribute<Integer>(1);
> >
> > Besides this, the design is based on these requirements:
> >
> > 1. The checkpoint tuple staggers the checkpoints among multiple stages. It does not trigger the checkpoint operation unless the upstream operator is done checkpointing. This often results in better resource utilization with different resources in different configurations.
> > 2. The checkpoint tuple helps with resetting the state of stateful stream codecs.
> >
> > Tim,
> >
> > The reason for the double checkpoint appears to be a bug where lastCheckpointWindowId is not set after the checkpoint in endWindow. The condition in the 'CHECKPOINT:' case was added to avoid double checkpoints. Can you confirm?
> >
> > --
> > Chetan
> >
> > On 11/12/15, 6:07 PM, "Amol Kekre" <[email protected]> wrote:
> >
> > > I am trying to recollect too. I do remember Chetan, Thomas, and I going deep on this choice. One issue was the efficiency of the then-current setup: only the input adapters had to insert the control tuple; all other operators were as is. I will try to recollect other details, or maybe Chetan or Thomas can comment.
> > >
> > > Thks,
> > > Amol
> > >
> > > On Thu, Nov 12, 2015 at 5:53 PM, Pramod Immaneni <[email protected]> wrote:
> > >
> > > > From what I am seeing so far (when implementing APEX-246) it is a leftover from an earlier implementation, but I am not completely sure yet.
> > > >
> > > > On Thu, Nov 12, 2015 at 5:43 PM, Timothy Farkas <[email protected]> wrote:
> > > >
> > > > > After stumbling on https://malhar.atlassian.net/browse/APEX-263 I am wondering what the purpose of the CHECKPOINT control tuple is. Why is it not sufficient to have each operator checkpoint after its checkpoint window has passed?
> > > > >
> > > > > Thanks,
> > > > > Tim
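[Editor's note: for reference, a minimal sketch of how the CHECKPOINT_WINDOW_COUNT attribute quoted above might be set so an operator checkpoints only on its application window boundary. The application and operator classes (CheckpointAlignedApp, NoOpOperator) are hypothetical placeholders; the DAG, StreamingApplication, and OperatorContext APIs are the com.datatorrent.api ones referenced in the thread.]

import com.datatorrent.api.Context.OperatorContext;
import com.datatorrent.api.DAG;
import com.datatorrent.api.StreamingApplication;
// BaseOperator's package may differ across Apex versions.
import com.datatorrent.common.util.BaseOperator;
import org.apache.hadoop.conf.Configuration;

public class CheckpointAlignedApp implements StreamingApplication {

  // Trivial placeholder operator; any operator would do here.
  public static class NoOpOperator extends BaseOperator {}

  @Override
  public void populateDAG(DAG dag, Configuration conf) {
    NoOpOperator op = dag.addOperator("op", new NoOpOperator());

    // Aggregate state over 10 streaming windows per application window.
    dag.setAttribute(op, OperatorContext.APPLICATION_WINDOW_COUNT, 10);

    // Per the Javadoc above, checkpointing then happens at the later of the
    // regular checkpoint window and a window whose serial number is divisible
    // by this value, aligning checkpoints with the application window boundary.
    dag.setAttribute(op, OperatorContext.CHECKPOINT_WINDOW_COUNT, 10);
  }
}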

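[Editor's note: on the double-checkpoint bug Chetan describes, the sketch below illustrates the suspected interaction between checkpointing in endWindow and the CHECKPOINT control tuple. All names here (CheckpointSketch, saveState, onCheckpointTuple) are hypothetical; the real logic lives in the Apex engine's node classes, not in this form.]

public class CheckpointSketch {
  private final long checkpointWindowCount;
  private long lastCheckpointWindowId = -1;

  public CheckpointSketch(long checkpointWindowCount) {
    this.checkpointWindowCount = checkpointWindowCount;
  }

  // Called at the end of every streaming window.
  public void endWindow(long windowId) {
    if (windowId % checkpointWindowCount == 0) {
      saveState(windowId);
      // The suspected bug: if this assignment is missing, the CHECKPOINT
      // control tuple for the same window triggers a second checkpoint.
      lastCheckpointWindowId = windowId;
    }
  }

  // Called when a CHECKPOINT control tuple arrives for a window.
  public void onCheckpointTuple(long windowId) {
    // The condition in the 'CHECKPOINT:' case: checkpoint only if endWindow
    // has not already checkpointed this window.
    if (lastCheckpointWindowId != windowId) {
      saveState(windowId);
      lastCheckpointWindowId = windowId;
    }
  }

  private void saveState(long windowId) {
    System.out.println("checkpointing state at window " + windowId);
  }
}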