Re: [DISCUSS] Shall we make SpillableSubpartition repeatedly readable to support fine grained recovery

Guowei Ma Thu, 24 Jan 2019 22:56:48 -0800

Thanks to zhijiang for a detailed explanation. I would do some supplements
Blink has indeed solved this particular problem. This problem can be
identified in Blink and the upstream will be restarted by Blink
thanks


zhijiang <wangzhijiang...@aliyun.com.invalid> 于2019年1月25日周五 下午12:04写道：

> Hi Bo,
>
> Your mentioned problems can be summaried into two issues:
>
> 1. Failover strategy should consider whether the upstream produced
> partition is still available when the downstream fails. If the produced
> partition is available, then only downstream region needs to restarted,
> otherwise the upstream region should also be restarted to re-produce the
> partition data.
> 2. The lifecycle of partition: Currently once the partition data is
> transfered via network completely, the partition and view would be released
> from producer side, no matter whether the data is actually processed by
> consumer or not. Even the TaskManager would be released earier when the
> partition data is not transfered yet.
>
> Both issues are already considered in my proposed pluggable shuffle
> manager architecutre which would introduce the ShuffleMaster componenet to
> manage partition globally on JobManager side, then it is natural to solve
> the above problems based on this architecuture. You can refer to the flip
> [1] if interested.
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-31%3A+Pluggable+Shuffle+Manager
>
> Best,
> Zhijiang
> ------------------------------------------------------------------
> From:Stephan Ewen <se...@apache.org>
> Send Time:2019年1月24日(星期四) 22:17
> To:dev <dev@flink.apache.org>; Kurt Young <k...@apache.org>
> Subject:Re: [DISCUSS] Shall we make SpillableSubpartition repeatedly
> readable to support fine grained recovery
>
> The SpillableSubpartition can also be used during the execution of bounded
> DataStreams programs. I think this is largely independent from deprecating
> the DataSet API.
>
> I am wondering if this particular issue is one that has been addressed in
> the Blink code already (we are looking to merge much of that functionality)
> - because the proposed extension is actually necessary for proper batch
> fault tolerance (independent of the DataSet or Query Processor stack).
>
> I am adding Kurt to this thread - maybe he help us find that out.
>
> On Thu, Jan 24, 2019 at 2:36 PM Piotr Nowojski <pi...@da-platform.com>
> wrote:
>
> > Hi,
> >
> > I’m not sure how much effort we will be willing to invest in the existing
> > batch stack. We are currently focusing on the support of bounded
> > DataStreams (already done in Blink and will be merged to Flink soon) and
> > unifing batch & stream under DataStream API.
> >
> > Piotrek
> >
> > > On 23 Jan 2019, at 04:45, Bo WANG <wbeaglewatc...@gmail.com> wrote:
> > >
> > > Hi all,
> > >
> > > When running the batch WordCount example,  I configured the job
> execution
> > > mode
> > > as BATCH_FORCED, and failover-strategy as region, I manually injected
> > some
> > > errors to let the execution fail in different phases. In some cases,
> the
> > > job could
> > > recovery from failover and became succeed, but in some cases, the job
> > > retried
> > > several times and failed.
> > >
> > > Example:
> > > - If the failure occurred before task read data, e.g., failed before
> > > invokable.invoke() in Task.java, failover could succeed.
> > > - If the failure occurred after task having read data, failover did not
> > > work.
> > >
> > > Problem diagnose:
> > > Running the example described before, each ExecutionVertex is defined
> as
> > > a restart region, and the ResultPartitionType between executions is
> > > BLOCKING.
> > > Thus, SpillableSubpartition and SpillableSubpartitionView are used to
> > > write/read
> > > shuffle data, and data blocks are described as BufferConsumers stored
> in
> > a
> > > list
> > > called buffers, when task requires input data from
> > > SpillableSubpartitionView,
> > > BufferConsumers are REMOVED from buffers. Thus, when failures occurred
> > > after having read data, some BufferConsumers have already released.
> > > Although tasks retried, the input data is incomplete.
> > >
> > > Fix Proposal:
> > > - BufferConsumer should not be removed from buffers until the consumed
> > > ExecutionVertex is terminal.
> > > - SpillableSubpartition should not be released until the consumed
> > > ExecutionVertex is terminal.
> > > - SpillableSubpartition could creates multi SpillableSubpartitionViews,
> > > each of which is corresponding to a ExecutionAttempt.
> > >
> > > Best,
> > > Bo
> >
> >
>
>

Re: [DISCUSS] Shall we make SpillableSubpartition repeatedly readable to support fine grained recovery

Reply via email to