[jira] [Commented] (FLINK-3257) Add Exactly-Once Processing Guarantees in Iterative DataStream Jobs

ASF GitHub Bot (JIRA) Mon, 27 Mar 2017 07:11:14 -0700

    [ https://issues.apache.org/jira/browse/FLINK-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15943325#comment-15943325 ]


ASF GitHub Bot commented on FLINK-3257:
---------------------------------------

Github user StefanRRichter commented on the issue:

    https://github.com/apache/flink/pull/1668
  
    For raw operator state, override 
`AbstractStreamOperator::snapshotState(StateSnapshotContext context)` inside 
your operator. Your implementation should first call super; it can then obtain 
the raw stream via `context.getRawOperatorStateOutput()`. This stream works 
like a normal output stream, except that you can also call 
`stream.startNewPartition()`. This signals that a new partition starts and that 
all previous partitions are finalized and immutable. Partitions are the atomic 
units of state redistribution; think of them as the individual elements in a 
`ListCheckpointed` state.
    
    For restoring, override 
`AbstractStreamOperator::initializeState(StateInitializationContext context)`. 
After calling super, `context.getRawOperatorStateInputs()` provides an iterable 
with one input stream per partition that your operator should restore.
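
    To make the partition semantics concrete, here is a minimal, self-contained 
Java model of the round trip. It does not use any Flink classes: 
`PartitionedStateModel`, `PartitionedOutput`, `readPartitions`, `snapshot` and 
`restore` are hypothetical names that stand in for the raw operator state 
stream, its `startNewPartition()` call, and the per-partition input streams. In 
a real operator the writes would presumably go to the stream obtained from 
`context.getRawOperatorStateOutput()`, and the reads would come from 
`context.getRawOperatorStateInputs()`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of a partitioned raw state stream; not a Flink class. */
public class PartitionedStateModel {

    /** Hypothetical stand-in for the raw operator state output stream. */
    static final class PartitionedOutput extends ByteArrayOutputStream {
        private final List<Integer> partitionOffsets = new ArrayList<>();

        /** Marks the current write position as the start of a new partition. */
        void startNewPartition() {
            partitionOffsets.add(size());
        }

        /** Splits the written bytes into one input stream per partition. */
        List<InputStream> readPartitions() {
            byte[] all = toByteArray();
            List<InputStream> streams = new ArrayList<>();
            for (int i = 0; i < partitionOffsets.size(); i++) {
                int start = partitionOffsets.get(i);
                int end = (i + 1 < partitionOffsets.size())
                        ? partitionOffsets.get(i + 1)
                        : all.length;
                streams.add(new ByteArrayInputStream(all, start, end - start));
            }
            return streams;
        }
    }

    /** Snapshot side: write each element into its own partition. */
    static PartitionedOutput snapshot(List<Long> elements) throws IOException {
        PartitionedOutput out = new PartitionedOutput();
        DataOutputStream data = new DataOutputStream(out);
        for (long element : elements) {
            out.startNewPartition();
            data.writeLong(element);
            data.flush();
        }
        return out;
    }

    /** Restore side: read one element back from each partition stream. */
    static List<Long> restore(List<InputStream> partitions) throws IOException {
        List<Long> restored = new ArrayList<>();
        for (InputStream partition : partitions) {
            restored.add(new DataInputStream(partition).readLong());
        }
        return restored;
    }

    public static void main(String[] args) throws IOException {
        PartitionedOutput out = snapshot(Arrays.asList(1L, 2L, 3L));
        List<InputStream> partitions = out.readPartitions();
        // After rescaling, a subtask may be handed only some of the partitions.
        System.out.println(restore(partitions.subList(0, 2)));
    }
}
```

    Because each element sits in its own partition, any subset of partitions 
can be handed to a subtask and decoded on its own, which is the property that 
makes partitions usable as units of redistribution.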


> Add Exactly-Once Processing Guarantees in Iterative DataStream Jobs
> -------------------------------------------------------------------
>
>                 Key: FLINK-3257
>                 URL: https://issues.apache.org/jira/browse/FLINK-3257
>             Project: Flink
>          Issue Type: Improvement
>          Components: DataStream API
>            Reporter: Paris Carbone
>            Assignee: Paris Carbone
>
> The current snapshotting algorithm cannot support cycles in the execution 
> graph. An alternative scheme can potentially include records in-transit 
> through the back-edges of a cyclic execution graph (ABS [1]) to achieve the 
> same guarantees.
> One straightforward implementation of ABS for cyclic graphs can work along 
> the following lines:
> 1) Upon triggering a barrier in an IterationHead from the TaskManager, block 
> its output and start an upstream backup of all records forwarded from the 
> respective IterationSink.
> 2) The IterationSink should eventually forward the current snapshotting epoch 
> barrier to the IterationSource.
> 3) Upon receiving a barrier from the IterationSink, the IterationSource 
> should finalize the snapshot, unblock its output and emit all records 
> in-transit in FIFO order and continue the usual execution.
> --
> Upon restart the IterationSource should emit all records from the injected 
> snapshot first and then continue its usual execution.
> Several optimisations and slight variations are possible, but this can 
> serve as the initial implementation.
> [1] http://arxiv.org/abs/1506.08603
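
The steps in the issue description can be sketched as a small state machine. 
This is only a toy model, not Flink code: `LoopSnapshotModel`, 
`IterationHeadModel` and all of its method names are hypothetical, records are 
plain strings, and the sketch assumes that the IterationHead of step 1 and the 
IterationSource of steps 2 and 3 refer to the same task at the head of the loop.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the loop snapshot steps; not Flink code. */
public class LoopSnapshotModel {

    /** Hypothetical head of the iteration (IterationHead / IterationSource). */
    static final class IterationHeadModel {
        final List<String> emitted = new ArrayList<>();
        private final List<String> backup = new ArrayList<>();
        private boolean blocked = false;

        /** Step 1: a barrier is triggered, so block output and start the backup. */
        void onBarrierTriggered() {
            blocked = true;
            backup.clear();
        }

        /** A record arrives over the back-edge from the IterationSink. */
        void onFeedbackRecord(String record) {
            if (blocked) {
                backup.add(record);   // in-transit record, becomes part of the snapshot
            } else {
                emitted.add(record);  // usual execution
            }
        }

        /** Step 3: the barrier returns from the sink; finalize, unblock, replay in FIFO order. */
        List<String> onFeedbackBarrier() {
            List<String> snapshot = new ArrayList<>(backup);
            blocked = false;
            emitted.addAll(backup);
            backup.clear();
            return snapshot;
        }

        /** Restart: emit the records of the restored snapshot before anything else. */
        void restore(List<String> snapshot) {
            emitted.addAll(snapshot);
        }
    }

    public static void main(String[] args) {
        IterationHeadModel head = new IterationHeadModel();
        head.onFeedbackRecord("a");
        head.onBarrierTriggered();
        head.onFeedbackRecord("b");
        head.onFeedbackRecord("c");
        // Step 2 is modeled implicitly: the sink has forwarded the barrier.
        List<String> snapshot = head.onFeedbackBarrier();
        System.out.println("snapshot=" + snapshot + " emitted=" + head.emitted);

        IterationHeadModel restarted = new IterationHeadModel();
        restarted.restore(snapshot);
        restarted.onFeedbackRecord("d");
        System.out.println("after restart emitted=" + restarted.emitted);
    }
}
```

In this model the snapshot holds exactly the records that were in transit on 
the back-edge between the barrier being triggered and the barrier returning, 
and a restarted head replays them before processing new input.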



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
