Piotr Nowojski created FLINK-17350:
--------------------------------------

             Summary: StreamTask should always fail immediately on failures in 
synchronous part of a checkpoint
                 Key: FLINK-17350
                 URL: https://issues.apache.org/jira/browse/FLINK-17350
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Checkpointing, Runtime / Task
    Affects Versions: 1.10.0, 1.9.2, 1.8.3, 1.7.2, 1.6.4
            Reporter: Piotr Nowojski


This bug also affects the 1.5.x branch.

As described in
https://issues.apache.org/jira/browse/FLINK-17327?focusedCommentId=17090576&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17090576

{{setTolerableCheckpointFailureNumber(...)}} and its deprecated 
{{setFailTaskOnCheckpointError(...)}} predecessor are implemented incorrectly. 
Since Flink 1.5 (https://issues.apache.org/jira/browse/FLINK-4809) they can 
cause operators (especially sinks with external state) to end up in an 
inconsistent state. This holds even when these options are not used, because of 
another issue: PLACEHOLDER

For details please check FLINK-17327.

The problem boils down to the fact that if an operator or user function throws 
an exception in the synchronous part of a checkpoint, the job should always 
fail. There is no recovery from this. In the case of {{FlinkKafkaProducer}}, 
ignoring such failures might mean that a whole transaction, with all of its 
records, will be lost forever.
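
To illustrate the reasoning above, here is a minimal toy model in plain Java. 
It does not use any Flink classes: {{ToySink}}, {{snapshotSync()}}, 
{{processAndCheckpoint(...)}} and {{run(...)}} are hypothetical names invented 
for this sketch, and the way the sink drops its open transaction on a failed 
flush is an assumption made for illustration, not a description of how 
{{FlinkKafkaProducer}} is implemented. It only contrasts tolerating a failure 
in the synchronous part of a checkpoint with failing immediately and replaying 
the input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model; none of these types are Flink classes.
final class SyncCheckpointFailureSketch {

    /** Stand-in for a transactional sink with external state. */
    static final class ToySink {
        final List<String> openTransaction = new ArrayList<>();
        final List<String> committed = new ArrayList<>();
        boolean failNextSnapshot;

        void write(String record) {
            openTransaction.add(record);
        }

        /** Synchronous part of a checkpoint: hand off the open transaction. */
        void snapshotSync() {
            // Assumption: the sink moves on to a new transaction before it
            // knows whether flushing the old one succeeded.
            List<String> pending = new ArrayList<>(openTransaction);
            openTransaction.clear();
            if (failNextSnapshot) {
                failNextSnapshot = false;
                throw new IllegalStateException("flush failed");
            }
            committed.addAll(pending);
        }
    }

    /** Writes the records, then runs the synchronous checkpoint part. */
    static void processAndCheckpoint(ToySink sink, List<String> records, boolean tolerate) {
        for (String record : records) {
            sink.write(record);
        }
        try {
            sink.snapshotSync();
        } catch (IllegalStateException e) {
            if (!tolerate) {
                throw e; // fail the task immediately
            }
            // Tolerated: the failure is swallowed and processing continues.
        }
    }

    /** Returns what ends up committed externally for input a, b, c. */
    static List<String> run(boolean tolerateSyncFailure) {
        List<String> input = Arrays.asList("a", "b", "c");
        ToySink sink = new ToySink();
        sink.failNextSnapshot = true; // first checkpoint fails in its sync part
        try {
            processAndCheckpoint(sink, input.subList(0, 2), tolerateSyncFailure);
            processAndCheckpoint(sink, input.subList(2, 3), tolerateSyncFailure);
            return sink.committed;
        } catch (IllegalStateException e) {
            // Fail fast: the job restarts from the last completed checkpoint
            // (here: the very beginning) and replays all input.
            ToySink restarted = new ToySink();
            processAndCheckpoint(restarted, input, false);
            return restarted.committed;
        }
    }

    public static void main(String[] args) {
        System.out.println("tolerated: " + run(true));  // prints "tolerated: [c]"
        System.out.println("fail fast: " + run(false)); // prints "fail fast: [a, b, c]"
    }
}
```

In this model the tolerated failure silently drops records a and b, while 
failing immediately lets the restart replay them, which is the behaviour this 
issue asks {{StreamTask}} to enforce.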



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
