[ 
https://issues.apache.org/jira/browse/FLINK-19300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17230306#comment-17230306
 ] 

Xiang Gao commented on FLINK-19300:
-----------------------------------

Created a PR. Seems like in case of non-versioned payload, we would push back 
those read bytes. We might not know the correct number of bytes to push back if 
we use DataInputView.readFully() while catching EOF. Did something similar to 
DataInputView.readFully(), but keep the number of bytes so that we can push 
back when necessary.

> Timer loss after restoring from savepoint
> -----------------------------------------
>
>                 Key: FLINK-19300
>                 URL: https://issues.apache.org/jira/browse/FLINK-19300
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / State Backends
>    Affects Versions: 1.8.0
>            Reporter: Xiang Gao
>            Assignee: Xiang Gao
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 1.12.0, 1.11.3
>
>
> While using heap-based timers, we are seeing occasional timer loss after 
> restoring program from savepoint, especially when using a remote savepoint 
> storage (s3). 
> After some investigation, the issue seems to be related to [this line in 
> deserialization|https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/core/io/PostVersionedIOReadableWritable.java#L65].
>  When trying to check the VERSIONED_IDENTIFIER, the input stream may not 
> guarantee filling the byte array, causing timers to be dropped for the 
> affected key group.
> Should keep reading until expected number of bytes are actually read or if 
> end of the stream has been reached. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to