Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

Arvid Heise Wed, 06 Jan 2021 08:52:21 -0800

Okay then at least you guys are in sync ;) (Although I'm also not too far
away)

I hope I'm not super derailing but could we reiterate why it's good to get
rid of finished tasks (note: I'm also mostly in favor of that):
1. We can free all acquired resources including buffer pools, state
backend(?), threads.
2. TM can forget about the subtask entirely.
3. We can subsequently downscale.
4. What more?

I'm assuming it's not needed to execute the application at all: The
application at one point had all subtasks running, so it's not a resource
issue per se (ignoring rescaling).

My idea is not to let the task live longer (except for final checkpoints
where we are all on the same page I guess). I'm just thinking out loud if
we can avoid 2. while still doing 1.+3.

So can TM retain some slim information about a finished task to still
process RPCs in a potentially different way?
Thus, without keeping the costly task thread and operator chains, could we
implement some RPC handler that knows this is a finished task and forward
the barrier to the next task/TM?
Can we store this slim information in a checkpoint as an operator subtask
state?
Could we transfer this slim information in case of (dynamic) downscaling?

If this somehow works, we would not need to change much in the checkpoint
coordinator. It would always inject into sources. We could also ignore the
race conditions as long as the TM lives. Checkpointing times are also no
worse than with the live task.
The clear downside (assuming feasibility) is that we would have two code
paths that deal with barriers. We would also need to keep more information
in the TM, but then again, at some point the complete subtask fit there.
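
To make the idea a bit more concrete, here is a rough sketch of what I have
in mind. All names in it (FinishedTaskStub, BarrierForwarder,
TaskManagerSketch, Path) are made up for illustration and are not existing
Flink classes; the stub contents (just the downstream subtask ids) are an
assumption as well. It only shows the two code paths: a live task handles
the trigger itself, while a finished task is represented by a slim stub
that forwards the barrier.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: none of these types exist in Flink.
final class FinishedTaskBarrierSketch {

    /** Which code path handled a trigger RPC. */
    enum Path { LIVE_TASK, FORWARDED_BY_STUB, UNKNOWN_SUBTASK }

    /** Slim record kept after a subtask finishes (assumed contents). */
    static final class FinishedTaskStub {
        final List<String> downstreamSubtasks;

        FinishedTaskStub(List<String> downstreamSubtasks) {
            this.downstreamSubtasks = new ArrayList<>(downstreamSubtasks);
        }
    }

    /** Stand-in for the transport that delivers a barrier to the next task/TM. */
    interface BarrierForwarder {
        void forward(long checkpointId, String targetSubtask);
    }

    /** Minimal TM-side bookkeeping for running and finished subtasks. */
    static final class TaskManagerSketch {
        private final Set<String> running = new HashSet<>();
        private final Map<String, FinishedTaskStub> finished = new HashMap<>();
        private final BarrierForwarder forwarder;

        TaskManagerSketch(BarrierForwarder forwarder) {
            this.forwarder = forwarder;
        }

        void start(String subtask) {
            running.add(subtask);
        }

        /** Drops the (imagined) thread and operator chain, keeps only the stub. */
        void finish(String subtask, List<String> downstreamSubtasks) {
            running.remove(subtask);
            finished.put(subtask, new FinishedTaskStub(downstreamSubtasks));
        }

        /** RPC entry point: the coordinator triggers the subtask as usual. */
        Path triggerCheckpoint(String subtask, long checkpointId) {
            if (running.contains(subtask)) {
                // The real task would snapshot and emit the barrier itself.
                return Path.LIVE_TASK;
            }
            FinishedTaskStub stub = finished.get(subtask);
            if (stub == null) {
                return Path.UNKNOWN_SUBTASK;
            }
            // Second code path: no task thread, just pass the barrier on.
            for (String target : stub.downstreamSubtasks) {
                forwarder.forward(checkpointId, target);
            }
            return Path.FORWARDED_BY_STUB;
        }
    }
}
```

The stub is deliberately tiny, so in principle it could also be written
into a checkpoint or handed over on downscaling, but the sketch does not
cover that part.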

On Wed, Jan 6, 2021 at 4:39 PM Aljoscha Krettek <aljos...@apache.org> wrote:

> On 2021/01/06 16:05, Arvid Heise wrote:
> >thanks for the detailed example. It feels like Aljoscha and you are also
> >not fully aligned yet. For me, it sounded as if Aljoscha would like to
> >avoid sending RPC to non-source subtasks.
>
> No, I think we need the triggering of intermediate operators.
>
> I was just thinking out loud about the potential scenarios where
> intermediate operators will in fact stay online, and how common they
> are.
>
> Also, I sent an explanation that is similar to Yun's. It seems we always
> write out mails in parallel and then send them before checking. :-) So
> you always get two explanations of roughly the same thing.
>
> Best,
> Aljoscha
>

-- 

Arvid Heise | Senior Java Developer

