[GitHub] [iceberg] openinx commented on issue #2808: Flink write cdc to iceberg may have duplicate records when an CommitStateUnknownException occurs.

GitBox Mon, 06 Sep 2021 05:58:08 -0700


openinx commented on issue #2808:
URL: https://github.com/apache/iceberg/issues/2808#issuecomment-913629145

> I think the socket timeout Txn1 is committed successfully finally and then
the job restores before the Txn1 is committed, and the restored job commits
normally. Then there will be two same max-committed-checkpointid snapshots.

This could explain why there are two identical txn commits in the metadata. I
think the candidate ways to resolve this consistency issue are:

1. Just quit the flink streaming job when encountering
CommitStateUnknownException and let people check whether it's OK to restart
the flink job.
2. Catch the CommitStateUnknownException in
[commitOperation](https://github.com/apache/iceberg/blob/e20088449daec9ed431754044b520b3ac5fa3eaa/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergFilesCommitter.java#L308),
and retry checking the iceberg table to see whether the stale txn has been
committed. If the retries are exhausted and the check times out (I mean the
txn was not committed successfully in the end), then we start to failover. In
this way we will need to use an empirical timeout to decide when it's OK to
stop checking the hive-metastore and start the flink job failover.
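
A minimal sketch of option 2, written without any Iceberg or Flink dependencies so it can run on its own. All names here are hypothetical: `UnknownCommitStateException` stands in for `CommitStateUnknownException`, and the `LongSupplier` stands in for re-reading the table metadata and returning the max committed checkpoint id recorded there. The attempt count and backoff are the "empirical timeout" and would need tuning for a real metastore.

```java
import java.util.function.LongSupplier;

public class CommitRecoverySketch {

    /** Hypothetical stand-in for Iceberg's CommitStateUnknownException. */
    public static class UnknownCommitStateException extends RuntimeException {
        public UnknownCommitStateException(String message) {
            super(message);
        }
    }

    /**
     * Polls the table until the given checkpoint id shows up as committed,
     * or until the attempt budget is used up.
     * Returns true if the commit became visible, false otherwise.
     */
    public static boolean committedWithin(
            LongSupplier maxCommittedCheckpointId,
            long checkpointId,
            int maxAttempts,
            long backoffMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (maxCommittedCheckpointId.getAsLong() >= checkpointId) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(backoffMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt and give up checking.
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }

    /**
     * Runs the commit. If its outcome is unknown, checks the table before
     * deciding to fail over, so an already-applied txn is not committed twice.
     */
    public static void commitOrFailover(
            Runnable commit,
            LongSupplier maxCommittedCheckpointId,
            long checkpointId,
            int maxAttempts,
            long backoffMillis) {
        try {
            commit.run();
        } catch (UnknownCommitStateException e) {
            if (!committedWithin(maxCommittedCheckpointId, checkpointId, maxAttempts, backoffMillis)) {
                // Budget exhausted: surface the failure so the job fails over.
                throw new IllegalStateException(
                        "Commit state still unknown for checkpoint " + checkpointId, e);
            }
            // Otherwise the commit did land; continue without re-committing.
        }
    }

    public static void main(String[] args) {
        // Simulated table: the first read still shows checkpoint 4, the second shows 5.
        int[] reads = {0};
        LongSupplier table = () -> reads[0]++ == 0 ? 4L : 5L;

        commitOrFailover(
                () -> { throw new UnknownCommitStateException("socket timeout"); },
                table, 5L, 3, 10L);
        System.out.println("checkpoint 5 found in table, no failover needed");
    }
}
```

Note that a failed check only shows the txn was not visible within the budget. It could still land afterwards, which is why the timeout has to be chosen conservatively.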

