Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

Arvid Heise Wed, 06 Jan 2021 02:31:11 -0800

>
> I was referring to the case where intermediate operators don't have any
> active upstream (input) operators. In that case, they basically become
> the "source" of that part of the graph. In your example, M1 is still
> connected to a "real" source.

I'm assuming that this is the normal case. In an A->B graph, as soon as A
finishes, B still has a couple of input buffers to process. If you add
backpressure or longer pipelines into the mix, it's quite likely that a
checkpoint may occur with B being the head.

> For faked finished tasks, I have some concerns that if the faked finished
> tasks reside in the JM side, there should still be the race condition
> between triggering and tasks get finished, and if the faked finished tasks
> reside in the TM side, we would have to keep consider these tasks in
> scheduler when failover happens.
> Besides, we would also need to keep the channels between the faked finished
> tasks and normal tasks to pass the checkpoint barriers, this would have
> some conflicts with the current tasks' lifecycle since we still need to
> keep channels open and send messages after EndOfPartitions are sent. If we
> have mixed jobs with both bounded and unbounded sources, the left network
> channels would not have a chance to get closed.
>

These are all valid concerns but I fear that if we don't find a solution to
them, we will not have a reliable system (cancelling checkpoints when
encountering this race condition with a higher DOP).

Let me clarify that faked finished tasks should reside on the TM where they
previously lived. Only through some kind of job stealing, which would be
necessary for dynamic rescaling, might they end up on the JM (that's all far
down the road).

I have not thought about channels, but I think you are right that channels
should be strictly bound to the life-cycle of a subtask. The question is
whether fake finished tasks need to use channels at all. We could also relay
RPC calls from TM to TM.

For all practical purposes of checkpoint barrier alignment, an EndOfPartition
should cause the channel to be excluded from alignment (the respective
channel has implicitly received all future barriers).
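
To make the alignment point concrete, here is a rough sketch of what I have
in mind. It is not Flink's actual barrier handler; the class
BarrierAlignmentSketch and its methods are made up for illustration, and
channels are just integer indices. A channel that has delivered
EndOfPartition counts as if it had already delivered every future barrier,
so alignment only waits for the channels that are still open.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of barrier alignment that skips finished channels. */
public class BarrierAlignmentSketch {
    private final int numChannels;
    // Channels that have delivered EndOfPartition.
    private final Set<Integer> finishedChannels = new HashSet<>();
    // Open channels that have delivered the barrier of the pending checkpoint.
    private final Set<Integer> alignedChannels = new HashSet<>();
    private long pendingCheckpointId = -1L;
    private boolean inProgress = false;

    public BarrierAlignmentSketch(int numChannels) {
        this.numChannels = numChannels;
    }

    /** Records a barrier; returns true if this completes the alignment. */
    public boolean onBarrier(int channel, long checkpointId) {
        // Ignore barriers from finished channels and stale checkpoints.
        if (finishedChannels.contains(channel) || checkpointId < pendingCheckpointId) {
            return false;
        }
        if (checkpointId > pendingCheckpointId) {
            // A newer checkpoint supersedes whatever alignment was pending.
            pendingCheckpointId = checkpointId;
            alignedChannels.clear();
            inProgress = true;
        }
        alignedChannels.add(channel);
        return completeIfAligned();
    }

    /** Marks a channel as finished; returns true if this completes the alignment. */
    public boolean onEndOfPartition(int channel) {
        finishedChannels.add(channel);
        // The channel no longer needs an explicit barrier.
        alignedChannels.remove(channel);
        return completeIfAligned();
    }

    private boolean completeIfAligned() {
        if (!inProgress) {
            return false;
        }
        // Aligned once every channel is either finished or has sent the barrier.
        if (alignedChannels.size() + finishedChannels.size() == numChannels) {
            inProgress = false;
            return true;
        }
        return false;
    }
}
```

With two input channels, a barrier on channel 0 followed by EndOfPartition
on channel 1 completes the alignment, and so does EndOfPartition on channel
0 followed by a barrier on channel 1. Neither order blocks the checkpoint.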

On Wed, Jan 6, 2021 at 10:46 AM Aljoscha Krettek <aljos...@apache.org>
wrote:

> On 2021/01/05 17:27, Arvid Heise wrote:
> >For your question: will there ever be intermediate operators that should be
> >running that are not connected to at least one source?
> >I think there are plenty of examples if you go beyond chained operators and
> >fully connected exchanges. Think of any fan-in, let's assume you have
> >source S1...S4, with S1+S2->M1, and S3+S4->M2. If S1 is finished, S2 and M1
> >are still running. Or I didn't get your question ;).
>
> I was referring to the case where intermediate operators don't have any
> active upstream (input) operators. In that case, they basically become
> the "source" of that part of the graph. In your example, M1 is still
> connected to a "real" source.
>
> Best,
> Aljoscha
>

-- 

Arvid Heise | Senior Java Developer

