kbendick opened a new pull request #3106: URL: https://github.com/apache/iceberg/pull/3106
We are occasionally seeing CI runs take 6 hours and then ultimately time out. After adding further logging, it seems that a Flink test is still trying to checkpoint after the job has entered the FINISHED state.

I'm not 100% sure that adding this config will help with that (the aborted checkpoint might not be considered a checkpoint failure), but it's worth a shot for further debugging. Ultimately, we should resolve the underlying issue, but for now I just want to see whether this helps.

Further details (and logs) can be found here: https://github.com/apache/iceberg/issues/3091

The relevant log message, which is spewed for hours until the timeout, is:

```
2021-09-13T08:19:47.7896411Z > Task :iceberg-flink:test
2021-09-13T08:19:47.7899950Z [Checkpoint Timer] INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint triggering task Source: rightCustomSource -> rightIcebergSink-rightIcebergSink -> rightIcebergSink-IcebergStreamWriter (1/1) of job 437e46445e777ca2231677f60f87496a is not in state RUNNING but FINISHED instead. Aborting checkpoint.
2021-09-13T08:19:47.7905489Z [Checkpoint Timer] INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint triggering task Source: rightCustomSource -> rightIcebergSink-rightIcebergSink -> rightIcebergSink-IcebergStreamWriter (1/1) of job 437e46445e777ca2231677f60f87496a is not in state RUNNING but FINISHED instead. Aborting checkpoint.
2021-09-13T08:19:47.7914766Z [Checkpoint Timer] INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint triggering task Source: rightCustomSource -> rightIcebergSink-rightIcebergSink -> rightIcebergSink-IcebergStreamWriter (1/1) of job 437e46445e777ca2231677f60f87496a is not in state RUNNING but FINISHED instead. Aborting checkpoint.
2021-09-13T08:19:47.7920502Z [Checkpoint Timer] INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint triggering task Source: rightCustomSource -> rightIcebergSink-rightIcebergSink -> rightIcebergSink-IcebergStreamWriter (1/1) of job 437e46445e777ca2231677f60f87496a is not in state RUNNING but FINISHED instead. Aborting checkpoint.
```

cc @nastra @openinx @rdblue @RussellSpitzer @stevenzwu in case you have any insight on how to resolve this.
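
For anyone digging through a long CI log, here is a minimal sketch (not part of this PR) for counting how often the aborted-checkpoint message quoted above appears, grouped by job ID and reported task state. The regular expression is inferred only from the log lines in this description, and the helper name `summarize_aborts` is hypothetical; other Flink versions may word the message differently.

```python
import re
from collections import Counter

# Pattern inferred from the CheckpointCoordinator lines quoted in this PR
# description (an assumption, not an official Flink log format).
ABORT_RE = re.compile(
    r"Checkpoint triggering task (?P<task>.+?) of job (?P<job>[0-9a-f]+) "
    r"is not in state RUNNING but (?P<state>[A-Z]+) instead\. "
    r"Aborting checkpoint\."
)


def summarize_aborts(lines):
    """Count aborted-checkpoint messages per (job id, task state)."""
    counts = Counter()
    for line in lines:
        match = ABORT_RE.search(line)
        if match:
            counts[(match.group("job"), match.group("state"))] += 1
    return counts
```

Passing an open log file (or any iterable of lines) to `summarize_aborts` returns a `Counter`; a single job ID with a very large count in the FINISHED state would be consistent with the behaviour described here.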
