Justine Olshan created KAFKA-15984:
--------------------------------------

Summary: Client disconnections can cause hanging transactions on __consumer_offsets
Key: KAFKA-15984
URL: https://issues.apache.org/jira/browse/KAFKA-15984
Project: Kafka
Issue Type: Task
Reporter: Justine Olshan

When investigating frequent hanging transactions on __consumer_offsets partitions, we realized that many of them were caused by the same offset being committed more than once, with one of the duplicate requests carrying `"isDisconnectedClient":true`.

TxnOffsetCommits do not have sequence numbers and thus are not protected against duplicates in the same way idempotent produce requests are. Thus, when a client is disconnected (and flushes its requests), we may see the duplicate get appended to the log.

KIP-890 part 1 should protect against this, as the duplicate will not succeed verification. KIP-890 part 2 strengthens this further: duplicates from previous transactions cannot be added to a new transaction if the partition is re-added, since the epoch will be bumped.

Another possible solution is to do duplicate checking on the group coordinator side when the request comes in. This solution could be used instead of KIP-890 part 1 to prevent hanging transactions, but given that part 1 only has one open PR remaining, we may not need to do this. However, it can also prevent duplicates from being added to a new transaction, something that otherwise only part 2 will protect against.
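
To make the group-coordinator-side idea concrete, here is a minimal sketch of what such a duplicate check might look like. It is not taken from the Kafka code base: the class `TxnOffsetCommitDeduplicator`, its `CommitKey` type, and the `shouldAppend` method are hypothetical names, and the sketch assumes that a repeated commit of the same offset for the same producer id, producer epoch, group, and partition can be treated as a duplicate.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch only; none of these names exist in Apache Kafka.
class TxnOffsetCommitDeduplicator {

    // Identifies one committed partition for one producer incarnation.
    static final class CommitKey {
        final long producerId;
        final short producerEpoch;
        final String groupId;
        final String topic;
        final int partition;

        CommitKey(long producerId, short producerEpoch, String groupId, String topic, int partition) {
            this.producerId = producerId;
            this.producerEpoch = producerEpoch;
            this.groupId = groupId;
            this.topic = topic;
            this.partition = partition;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof CommitKey)) return false;
            CommitKey other = (CommitKey) o;
            return producerId == other.producerId
                && producerEpoch == other.producerEpoch
                && partition == other.partition
                && groupId.equals(other.groupId)
                && topic.equals(other.topic);
        }

        @Override
        public int hashCode() {
            return Objects.hash(producerId, producerEpoch, groupId, topic, partition);
        }
    }

    // Last offset accepted for each key. Entries are kept after the transaction
    // ends so that a late duplicate of an earlier request is still recognized.
    private final Map<CommitKey, Long> lastAccepted = new HashMap<>();

    // Returns true if the commit should be appended to the log, false if it
    // repeats the offset already accepted for this key (assumed duplicate).
    synchronized boolean shouldAppend(CommitKey key, long offset) {
        Long previous = lastAccepted.get(key);
        if (previous != null && previous.longValue() == offset) {
            return false;
        }
        lastAccepted.put(key, offset);
        return true;
    }

    public static void main(String[] args) {
        TxnOffsetCommitDeduplicator dedup = new TxnOffsetCommitDeduplicator();
        CommitKey key = new CommitKey(1000L, (short) 5, "my-group", "input-topic", 0);
        System.out.println(dedup.shouldAppend(key, 42L)); // true: first commit of offset 42
        System.out.println(dedup.shouldAppend(key, 42L)); // false: same offset repeated
        CommitKey bumped = new CommitKey(1000L, (short) 6, "my-group", "input-topic", 0);
        System.out.println(dedup.shouldAppend(bumped, 42L)); // true: different producer epoch
    }
}
```

A real implementation would also need to bound or expire the remembered entries and decide how to answer the duplicate request; both are left out of this sketch.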