[ https://issues.apache.org/jira/browse/FLINK-19300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17229780#comment-17229780 ]
Robert Metzger commented on FLINK-19300:
----------------------------------------

Thanks a lot for reporting this. Which Flink version are you using?

> Timer loss after restoring from savepoint
> -----------------------------------------
>
>                 Key: FLINK-19300
>                 URL: https://issues.apache.org/jira/browse/FLINK-19300
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / State Backends
>            Reporter: Xiang Gao
>            Priority: Critical
>
> While using heap-based timers, we are seeing occasional timer loss after
> restoring a program from a savepoint, especially when using remote savepoint
> storage (S3).
> After some investigation, the issue seems to be related to [this line in
> deserialization|https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/core/io/PostVersionedIOReadableWritable.java#L65].
> When checking the VERSIONED_IDENTIFIER, the input stream is not
> guaranteed to fill the byte array, which causes timers to be dropped for the
> affected key group.
> The code should keep reading until the expected number of bytes has actually
> been read or the end of the stream has been reached.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
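The fix suggested in the report ("keep reading until the expected number of bytes are actually read or the end of the stream has been reached") can be sketched in plain Java. This is a minimal illustration, not Flink's actual patch: `readUntilFullOrEof` and `OneByteAtATimeInputStream` are hypothetical names. The sketch relies only on the documented `InputStream.read(byte[], int, int)` contract, which allows a call to return fewer bytes than requested. The one-byte-per-call stream is an assumed stand-in for a remote (e.g. S3) stream that delivers data in small chunks.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadFullyDemo {

    /**
     * Reads from {@code in} until {@code buf} is full or the stream ends.
     * Returns the number of bytes actually read; this is less than
     * buf.length only if end of stream was reached first.
     */
    static int readUntilFullOrEof(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n == -1) {
                break; // end of stream reached before the buffer was filled
            }
            total += n;
        }
        return total;
    }

    /**
     * Hypothetical stream that returns at most one byte per read call,
     * mimicking a remote stream that does not fill the caller's buffer.
     */
    static final class OneByteAtATimeInputStream extends FilterInputStream {
        OneByteAtATimeInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            // Never hand back more than one byte, even if more are available.
            return super.read(b, off, Math.min(len, 1));
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 2, 3, 4, 5, 6, 7, 8};

        // A single read(byte[]) call, as in the reported code path.
        byte[] naive = new byte[4];
        int naiveRead = new OneByteAtATimeInputStream(new ByteArrayInputStream(data)).read(naive);

        // Looping until the buffer is full or the stream ends.
        byte[] full = new byte[4];
        int fullRead = readUntilFullOrEof(
                new OneByteAtATimeInputStream(new ByteArrayInputStream(data)), full);

        System.out.println(naiveRead + " " + fullRead); // prints "1 4"
    }
}
```

Running `main` prints `1 4`. The single `read(byte[])` call fills only one byte of the 4-byte buffer, so an identifier comparison on that buffer would see mostly zeros. The loop fills all four bytes. Since Java 9, `InputStream.readNBytes(byte[], int, int)` offers the same read-until-full-or-EOF behaviour in the standard library, and `DataInputStream.readFully` does the same but throws `EOFException` on a premature end of stream.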