[ 
https://issues.apache.org/jira/browse/BEAM-10927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17549012#comment-17549012
 ] 

Danny McCormick commented on BEAM-10927:
----------------------------------------

This issue has been migrated to https://github.com/apache/beam/issues/20622

> Beam Flink Runner 1.10 checkpoint failure
> -----------------------------------------
>
>                 Key: BEAM-10927
>                 URL: https://issues.apache.org/jira/browse/BEAM-10927
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-flink
>    Affects Versions: 2.23.0
>            Reporter: Omkar Deshpande
>            Priority: P3
>
> I recently upgraded from beam-runners-flink-1.9 v2.23.0 to
> beam-runners-flink-1.10 v2.23.0, and upgraded the Flink server from 1.9.3
> to 1.10.2.
> The Beam pipeline reads from KafkaIO and writes to KafkaIO, with an
> in-memory ParDo between PBegin and PDone. The application is configured to
> use S3 for checkpointing, and the state backend is RocksDB.
> This Beam pipeline worked as expected with beam-runners-flink-1.9. But
> after upgrading to beam-runners-flink-1.10, the checkpoints keep timing
> out, even after increasing the checkpoint timeout to several hours.
> There are no exceptions in the logs. Based on the logs, neither the
> synchronous nor the asynchronous phase of checkpointing is happening.
> Usually the "Trigger checkpoint" log statement is followed by "Confirm
> checkpoint" when the checkpoint succeeds. But with 1.10, I only see
> "Trigger checkpoint", with no confirmation of completion or even an
> indication of progress. There is enough CPU and memory available, and
> there is no deadlock.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
