[ 
https://issues.apache.org/jira/browse/FLINK-5056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15668324#comment-15668324
 ] 

ASF GitHub Bot commented on FLINK-5056:
---------------------------------------

Github user zentol commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2797#discussion_r88113221
  
    --- Diff: 
flink-streaming-connectors/flink-connector-filesystem/src/main/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.java
 ---
    @@ -341,23 +355,49 @@ public void setInputType(TypeInformation<?> type, 
ExecutionConfig executionConfi
        }
     
        @Override
    -   public void open(Configuration parameters) throws Exception {
    -           super.open(parameters);
    +   public void initializeState(FunctionInitializationContext context) 
throws Exception {
    +           Preconditions.checkArgument(this.restoredBucketStates == null, 
"The operator has already been initialized.");
     
    -           subtaskIndex = getRuntimeContext().getIndexOfThisSubtask();
    +           initFileSystem();
     
    -           state = new State<T>();
    --- End diff --
    
    undoing it was a bit of an overstatement; moving it into a separate commit 
with all the other formatting changes would do the trick as well. If you try 
that out you will see that the changes to open and initializeState no longer 
intersect so heavily.


> BucketingSink deletes valid data when checkpoint notification is slow.
> ----------------------------------------------------------------------
>
>                 Key: FLINK-5056
>                 URL: https://issues.apache.org/jira/browse/FLINK-5056
>             Project: Flink
>          Issue Type: Bug
>          Components: filesystem-connector
>    Affects Versions: 1.1.3
>            Reporter: Kostas Kloudas
>            Assignee: Kostas Kloudas
>             Fix For: 1.2.0
>
>
> Currently if BucketingSink receives no data after a checkpoint and then a 
> notification about a previous checkpoint arrives, it clears its state. This 
> can 
> lead to not committing valid data about intermediate checkpoints for whom
> a notification has not arrived yet. As a simple sequence that illustrates the 
> problem:
> -> input data 
> -> snapshot(0) 
> -> input data
> -> snapshot(1)
> -> no data
> -> notifyCheckpointComplete(0)
> the last will clear the state of the Sink without committing as final the 
> data 
> that arrived for checkpoint 1.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to