fqaiser94 commented on code in PR #10792:
URL: https://github.com/apache/iceberg/pull/10792#discussion_r1696032626
##########
kafka-connect/kafka-connect/src/main/java/org/apache/iceberg/connect/channel/CoordinatorThread.java:
##########
@@ -65,5 +66,15 @@ boolean isTerminated() {
void terminate() {
terminated = true;
+
+ try {
+ join(60_000);
Review Comment:
> Thinking out loud here, I'm considering using the consumer group
generation ID to fence "zombie" coordinators, similar to how Kafka implements
zombie fencing. For example, store the generation ID when we start the
coordinator and assert that generation ID matches the current before we commit.
I've considered this before (and several more complex variants of the idea),
but sadly there are still race conditions where you would end up with duplicate
file appends. The fundamental problem is that there is always a window after
you've checked that your generation-id matches the current-generation-id but
before you've committed to Iceberg, in which another zombie process can come
around and commit the same files. In that sense, it's really no different from
what we're doing today in the `Coordinator` with our table-committed-offsets
check 🤷
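To make that interleaving concrete, here is a minimal sketch that replays it step by step in a single thread. Everything in it is a hypothetical stand-in (the `CheckThenCommitRace` class, the list playing the role of the table, the integer generation ID); it does not use the Kafka or Iceberg APIs.

```java
import java.util.ArrayList;
import java.util.List;

class CheckThenCommitRace {
    // Replays one bad interleaving and returns the resulting "table" contents.
    static List<String> simulate() {
        List<String> tableFiles = new ArrayList<>(); // hypothetical stand-in for the table
        int currentGeneration = 1;                   // hypothetical consumer group generation ID

        // Coordinator A stored generation 1 at startup; its pre-commit check passes.
        int generationSeenByA = 1;
        boolean aCheckPassed = (generationSeenByA == currentGeneration);

        // A stalls here (GC pause, network hiccup, ...). Meanwhile another
        // coordinator B, whose own check also passed, commits the same file.
        tableFiles.add("data-0001.parquet");

        // A rebalance may bump the generation, but A never looks at it again.
        currentGeneration = 2;

        // A resumes and commits based on its stale check result.
        if (aCheckPassed) {
            tableFiles.add("data-0001.parquet");
        }
        return tableFiles;
    }

    public static void main(String[] args) {
        // Prints [data-0001.parquet, data-0001.parquet]: a duplicate append.
        System.out.println(simulate());
    }
}
```

The check and the commit are two separate steps, so no amount of checking beforehand rules out the duplicate.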
There are ways to close those race conditions, but they still require
conditional commit support from Iceberg to implement the fencing. And at that
point, we don't really need to compare consumer-group-generation-ids. All we
really need to do is assert **at commit time** that the committed table offsets
have not changed since we last checked them. Not before, not after, **at commit
time**.
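As a rough sketch of what an assertion at commit time could look like, the example below models the table as an `AtomicReference` and makes the check and the write a single compare-and-set. All names (`ConditionalCommitSketch`, `TableState`, `InMemoryTable`, `commitIfUnchanged`) are hypothetical and in-memory only; this is not an Iceberg API, and the reference comparison merely stands in for "the committed table offsets have not changed".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

class ConditionalCommitSketch {
    // Hypothetical immutable table state: committed offset plus files appended so far.
    static final class TableState {
        final long committedOffset;
        final List<String> files;

        TableState(long committedOffset, List<String> files) {
            this.committedOffset = committedOffset;
            this.files = List.copyOf(files);
        }
    }

    // Hypothetical in-memory table whose commit is conditional on the state last observed.
    static final class InMemoryTable {
        private final AtomicReference<TableState> state =
            new AtomicReference<>(new TableState(0L, List.of()));

        TableState current() {
            return state.get();
        }

        // Check and write happen atomically: the commit succeeds only if the
        // table is still in exactly the state the caller observed.
        boolean commitIfUnchanged(TableState expected, String file, long newOffset) {
            List<String> files = new ArrayList<>(expected.files);
            files.add(file);
            return state.compareAndSet(expected, new TableState(newOffset, files));
        }
    }

    // Two coordinators observe the same state and try to commit the same file.
    static List<String> simulate() {
        InMemoryTable table = new InMemoryTable();
        TableState seenByA = table.current();
        TableState seenByB = table.current();

        boolean bCommitted = table.commitIfUnchanged(seenByB, "data-0001.parquet", 100L);
        boolean aCommitted = table.commitIfUnchanged(seenByA, "data-0001.parquet", 100L);

        System.out.println("B committed: " + bCommitted + ", A committed: " + aCommitted);
        return table.current().files;
    }

    public static void main(String[] args) {
        System.out.println(simulate());
    }
}
```

Whichever coordinator commits second is rejected, because the state it observed is no longer the current one. It would then have to re-read the committed offsets and drop the files that are already covered.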