kbendick opened a new pull request #3110:
URL: https://github.com/apache/iceberg/pull/3110


   This test has a race condition: one of the two disjoint DAGs can 
finish and close its tasks before the other has finished.
   
   When the tasks of the DAG that has already terminated are no longer 
present to participate in checkpointing, every checkpoint is aborted, which 
leads to an infinite loop of re-checkpoint attempts.
   
   Here are some of the logs (visible when passing `-i` to gradle for 
info-level logs):
   
   ```
   2021-09-13T08:19:47.7896411Z > Task :iceberg-flink:test
   2021-09-13T08:19:47.7899950Z     [Checkpoint Timer] INFO 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint 
triggering task Source: rightCustomSource -> rightIcebergSink-rightIcebergSink 
-> rightIcebergSink-IcebergStreamWriter (1/1) of job 
437e46445e777ca2231677f60f87496a is not in state RUNNING but FINISHED instead. 
Aborting checkpoint.
   2021-09-13T08:19:47.7905489Z     [Checkpoint Timer] INFO 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint 
triggering task Source: rightCustomSource -> rightIcebergSink-rightIcebergSink 
-> rightIcebergSink-IcebergStreamWriter (1/1) of job 
437e46445e777ca2231677f60f87496a is not in state RUNNING but FINISHED instead. 
Aborting checkpoint.
   2021-09-13T08:19:47.7914766Z     [Checkpoint Timer] INFO 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint 
triggering task Source: rightCustomSource -> rightIcebergSink-rightIcebergSink 
-> rightIcebergSink-IcebergStreamWriter (1/1) of job 
437e46445e777ca2231677f60f87496a is not in state RUNNING but FINISHED instead. 
Aborting checkpoint.
   2021-09-13T08:19:47.7920502Z     [Checkpoint Timer] INFO 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint 
triggering task Source: rightCustomSource -> rightIcebergSink-rightIcebergSink 
-> rightIcebergSink-IcebergStreamWriter (1/1) of job 
437e46445e777ca2231677f60f87496a is not in state RUNNING but FINISHED instead. 
Aborting checkpoint.
   ```
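
   The abort rule visible in the log above can be illustrated with a small, 
self-contained toy model. This is only a sketch: `CheckpointRaceSketch`, 
`TaskState`, `tryCheckpoint`, and `completedCheckpoints` are hypothetical names 
and are not Flink or Iceberg classes, and `leftCustomSource` is an assumed name 
for the source of the other DAG. The model assumes only what the log states: a 
checkpoint is aborted whenever a triggering task is not in state RUNNING.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: these types are hypothetical and are NOT Flink classes.
final class CheckpointRaceSketch {

  enum TaskState { RUNNING, FINISHED }

  // Mirrors the rule visible in the log: a checkpoint is aborted as soon as
  // any triggering task is not in state RUNNING.
  static boolean tryCheckpoint(Map<String, TaskState> tasks) {
    for (TaskState state : tasks.values()) {
      if (state != TaskState.RUNNING) {
        return false;
      }
    }
    return true;
  }

  // Counts how many of maxAttempts checkpoint attempts complete.
  static int completedCheckpoints(Map<String, TaskState> tasks, int maxAttempts) {
    int completed = 0;
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      if (tryCheckpoint(tasks)) {
        completed++;
      }
    }
    return completed;
  }

  public static void main(String[] args) {
    Map<String, TaskState> tasks = new LinkedHashMap<>();
    tasks.put("leftCustomSource", TaskState.RUNNING);   // assumed name
    tasks.put("rightCustomSource", TaskState.RUNNING);  // name taken from the log
    System.out.println(completedCheckpoints(tasks, 4)); // prints 4

    // One DAG finishes first: from now on every attempt is aborted.
    tasks.put("rightCustomSource", TaskState.FINISHED);
    System.out.println(completedCheckpoints(tasks, 4)); // prints 0
  }
}
```

   In this model, once one task reports FINISHED, no number of retries yields a 
completed checkpoint, so anything that waits on a completed checkpoint would 
wait indefinitely.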
   
   Another PR where I attempted to debug this, with some relevant 
discussion: https://github.com/apache/iceberg/pull/3106
   
   This (temporarily) closes 
https://github.com/apache/iceberg/issues/3091. We should still fix the 
`BoundedTestSource`, although this edge case might be fixed in Flink 1.14.
   
   More details and discussion in the issue (particularly the linked FLIP).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


