Thanks to zhijiang for a detailed explanation. I would do some supplements Blink has indeed solved this particular problem. This problem can be identified in Blink and the upstream will be restarted by Blink thanks
zhijiang <wangzhijiang...@aliyun.com.invalid> 于2019年1月25日周五 下午12:04写道: > Hi Bo, > > Your mentioned problems can be summaried into two issues: > > 1. Failover strategy should consider whether the upstream produced > partition is still available when the downstream fails. If the produced > partition is available, then only downstream region needs to restarted, > otherwise the upstream region should also be restarted to re-produce the > partition data. > 2. The lifecycle of partition: Currently once the partition data is > transfered via network completely, the partition and view would be released > from producer side, no matter whether the data is actually processed by > consumer or not. Even the TaskManager would be released earier when the > partition data is not transfered yet. > > Both issues are already considered in my proposed pluggable shuffle > manager architecutre which would introduce the ShuffleMaster componenet to > manage partition globally on JobManager side, then it is natural to solve > the above problems based on this architecuture. You can refer to the flip > [1] if interested. > > [1] > https://cwiki.apache.org/confluence/display/FLINK/FLIP-31%3A+Pluggable+Shuffle+Manager > > Best, > Zhijiang > ------------------------------------------------------------------ > From:Stephan Ewen <se...@apache.org> > Send Time:2019年1月24日(星期四) 22:17 > To:dev <dev@flink.apache.org>; Kurt Young <k...@apache.org> > Subject:Re: [DISCUSS] Shall we make SpillableSubpartition repeatedly > readable to support fine grained recovery > > The SpillableSubpartition can also be used during the execution of bounded > DataStreams programs. I think this is largely independent from deprecating > the DataSet API. > > I am wondering if this particular issue is one that has been addressed in > the Blink code already (we are looking to merge much of that functionality) > - because the proposed extension is actually necessary for proper batch > fault tolerance (independent of the DataSet or Query Processor stack). > > I am adding Kurt to this thread - maybe he help us find that out. > > On Thu, Jan 24, 2019 at 2:36 PM Piotr Nowojski <pi...@da-platform.com> > wrote: > > > Hi, > > > > I’m not sure how much effort we will be willing to invest in the existing > > batch stack. We are currently focusing on the support of bounded > > DataStreams (already done in Blink and will be merged to Flink soon) and > > unifing batch & stream under DataStream API. > > > > Piotrek > > > > > On 23 Jan 2019, at 04:45, Bo WANG <wbeaglewatc...@gmail.com> wrote: > > > > > > Hi all, > > > > > > When running the batch WordCount example, I configured the job > execution > > > mode > > > as BATCH_FORCED, and failover-strategy as region, I manually injected > > some > > > errors to let the execution fail in different phases. In some cases, > the > > > job could > > > recovery from failover and became succeed, but in some cases, the job > > > retried > > > several times and failed. > > > > > > Example: > > > - If the failure occurred before task read data, e.g., failed before > > > invokable.invoke() in Task.java, failover could succeed. > > > - If the failure occurred after task having read data, failover did not > > > work. > > > > > > Problem diagnose: > > > Running the example described before, each ExecutionVertex is defined > as > > > a restart region, and the ResultPartitionType between executions is > > > BLOCKING. > > > Thus, SpillableSubpartition and SpillableSubpartitionView are used to > > > write/read > > > shuffle data, and data blocks are described as BufferConsumers stored > in > > a > > > list > > > called buffers, when task requires input data from > > > SpillableSubpartitionView, > > > BufferConsumers are REMOVED from buffers. Thus, when failures occurred > > > after having read data, some BufferConsumers have already released. > > > Although tasks retried, the input data is incomplete. > > > > > > Fix Proposal: > > > - BufferConsumer should not be removed from buffers until the consumed > > > ExecutionVertex is terminal. > > > - SpillableSubpartition should not be released until the consumed > > > ExecutionVertex is terminal. > > > - SpillableSubpartition could creates multi SpillableSubpartitionViews, > > > each of which is corresponding to a ExecutionAttempt. > > > > > > Best, > > > Bo > > > > > >