Hi David,

Very thanks for the feedback and glad to see that we have the same opinions on 
a lot of points of the
iteration! :)

And for the checkpoint, basically to support the checkpoint for a job with 
feedback edges, we 
need to also include the records on the feedback edges into the checkpoint 
snapshot, as described in [1].
We do this by exploiting the reference count mechanism provided by the raw 
states so that the
asynchronous phase would wait until we finish writing all the feedback records 
into the raw states, 
which is also similar to the implementation in the statefun. 

Including the feedback records into snapshot is enough for the unbounded 
iteration, but for the bounded iteration,
we would also need
1. Checkpoint after tasks finished: since for an iteration job with bounded 
inputs, most time of the execution is spent
after all the sources are finished and the iteration body is executing, we 
would need to support checkpoints during
this period. Fortunately in 1.14 we have implemented the first version of this 
functionality.
2. Keep the notification of round increment exactly-once: for bounded iteration 
we would notify the round end for each
operator via onEpochWatermarkIncrement(), this is done by insert epoch 
watermarks at the end of each round. We would
like to keep the notification of onEpochWatermarkIncrement() exactly-once to 
simplify the algorithms' development. This
is done by ensuring that the epoch watermarks with the same epoch value and the 
barriers of the same checkpoint always
have the same order when transmitting in the iteration body. With this 
condition, after failover all the operators inside the
iteration body must have received the same amount of notifications, and we 
could start with the next one.  Also since the epoch 
watermarks might also be snapshot in the feedback edge snapshot, we disable the 
rescaling of the head / tail operators for the 
bounded iteration. 

Best,
Yun



[1] https://arxiv.org/abs/1506.08603



------------------------------------------------------------------
From:David Morávek <d...@apache.org>
Send Time:2021 Oct. 4 (Mon.) 14:05
To:dev <dev@flink.apache.org>; Yun Gao <yungao...@aliyun.com>
Subject:Re: [DISCUSS] FLIP-176: Unified Iteration to Support Algorithms (Flink 
ML)

Hi Yun,

I did a quick pass over the design doc and it addresses all of the problems
with the current iterations I'm aware of. It's great to see that you've
been able to workaround the need of vectorized watermarks by giving up
nested iterations (which IMO is more of an academic concept than something
with a solid use-case). I'll try to give it some more thoughts, but from a
first pass it looks great +1 ;)

One thing that I'm unsure about, how do you plan to implement exactly-once
checkpointing of the feedback edge?

Best,
D.

On Mon, Oct 4, 2021 at 4:42 AM Yun Gao <yungao...@aliyun.com.invalid> wrote:

> Hi all,
>
> If we do not have other concerns on this FLIP, we would like to start the
> voting by the end of oct 8th.
>
> Best,
> Yun.
>
>
> ------------------------------------------------------------------
> From:Yun Gao <yungao...@aliyun.com>
> Send Time:2021 Sep. 15 (Wed.) 20:47
> To:dev <dev@flink.apache.org>
> Subject:[DISCUSS] FLIP-176: Unified Iteration to Support Algorithms (Flink
> ML)
>
>
> Hi all,
>
> DongLin, ZhiPeng and I are opening this thread to propose designing and
> implementing a new iteration library inside the Flink-ML project, as
> described in
>  FLIP-176[1].
>
> Iteration serves as a fundamental functionality required to support the
> implementation
> of ML algorithms. Previously Flink supports bounded iteration on top of
> the
> DataSet API and unbounded iteration on top of the DataStream API. However,
> since we are going to deprecated the dataset API and the current unbounded
> iteration
> API on top of the DataStream API is not fully complete, thus we are
> proposing
> to add the new unified iteration library on top of DataStream API to
> support both
> unbounded and bounded iterations.
>
> Very thanks for your feedbacks!
>
> [1] https://cwiki.apache.org/confluence/x/hAEBCw
>
> Best,Yun



------------------------------------------------------------------
From:David Morávek <d...@apache.org>
Send Time:2021 Oct. 4 (Mon.) 14:05
To:dev <dev@flink.apache.org>; Yun Gao <yungao...@aliyun.com>
Subject:Re: [DISCUSS] FLIP-176: Unified Iteration to Support Algorithms (Flink 
ML)

Hi Yun,

I did a quick pass over the design doc and it addresses all of the problems
with the current iterations I'm aware of. It's great to see that you've
been able to workaround the need of vectorized watermarks by giving up
nested iterations (which IMO is more of an academic concept than something
with a solid use-case). I'll try to give it some more thoughts, but from a
first pass it looks great +1 ;)

One thing that I'm unsure about, how do you plan to implement exactly-once
checkpointing of the feedback edge?

Best,
D.

On Mon, Oct 4, 2021 at 4:42 AM Yun Gao <yungao...@aliyun.com.invalid> wrote:

> Hi all,
>
> If we do not have other concerns on this FLIP, we would like to start the
> voting by the end of oct 8th.
>
> Best,
> Yun.
>
>
> ------------------------------------------------------------------
> From:Yun Gao <yungao...@aliyun.com>
> Send Time:2021 Sep. 15 (Wed.) 20:47
> To:dev <dev@flink.apache.org>
> Subject:[DISCUSS] FLIP-176: Unified Iteration to Support Algorithms (Flink
> ML)
>
>
> Hi all,
>
> DongLin, ZhiPeng and I are opening this thread to propose designing and
> implementing a new iteration library inside the Flink-ML project, as
> described in
>  FLIP-176[1].
>
> Iteration serves as a fundamental functionality required to support the
> implementation
> of ML algorithms. Previously Flink supports bounded iteration on top of
> the
> DataSet API and unbounded iteration on top of the DataStream API. However,
> since we are going to deprecated the dataset API and the current unbounded
> iteration
> API on top of the DataStream API is not fully complete, thus we are
> proposing
> to add the new unified iteration library on top of DataStream API to
> support both
> unbounded and bounded iterations.
>
> Very thanks for your feedbacks!
>
> [1] https://cwiki.apache.org/confluence/x/hAEBCw
>
> Best,Yun

Reply via email to