RivenSun2 commented on a change in pull request #11340: URL: https://github.com/apache/kafka/pull/11340#discussion_r727884034
########## File path: clients/src/main/java/org/apache/kafka/clients/consumer/internals/ConsumerCoordinator.java ##########
@@ -994,11 +996,16 @@ public boolean commitOffsetsSync(Map<TopicPartition, OffsetAndMetadata> offsets,
         if (offsets.isEmpty())
             return true;

+        boolean shouldCleanUpConsumedOffsets = !checkConsumedOffsetsAreValid(offsets);

Review comment:

1. When re-joining the group via `maybeAutoCommitOffsetsSync`, the default timer is `time.timer(rebalanceTimeoutMs)`. Because our consumers' consumption logic is sometimes heavy, we followed the Kafka Streams guidance (https://kafka.apache.org/documentation/#upgrade_10201_notable) and set `max.poll.interval.ms` to `Integer.MAX_VALUE`. As a result, `maybeAutoCommitOffsetsSync` can loop retrying the offset commit for up to `Integer.MAX_VALUE` milliseconds. The impacts are: (1) the rebalance cannot complete, so the group's consumption progress stalls; (2) offsets are committed at high frequency, so CPU consumption on the broker side rises sharply.

2. Users never call `maybeAutoCommitOffsetsSync(Timer timer)` themselves. There are only two call sites: (1) `onJoinPrepare(...)` in `ConsumerCoordinator`; (2) `close(final Timer timer)` in `ConsumerCoordinator`.

3. For the `Consumer#commitSync` path: if we cannot tolerate silently cleaning up non-existent offsets, we still have to make sure the timer passed into `commitOffsetsSync(Map<TopicPartition, OffsetAndMetadata> offsets, Timer timer)` is a controllable value. `max.poll.interval.ms` is a configuration that users set themselves, yet here it is applied without being declared, which is clearly out of their control. Alternatively, we could run `checkConsumedOffsetsAreValid` and `cleanUpConsumedOffsets` inside `maybeAutoCommitOffsetsSync(Timer timer)` before it calls `commitOffsetsSync`.

########## File path: clients/src/main/java/org/apache/kafka/clients/consumer/internals/ConsumerCoordinator.java ##########
@@ -994,11 +996,16 @@ public boolean commitOffsetsSync(Map<TopicPartition, OffsetAndMetadata> offsets,
         if (offsets.isEmpty())
             return true;

+        boolean shouldCleanUpConsumedOffsets = !checkConsumedOffsetsAreValid(offsets);

Review comment:

> change its signature as ensureActiveGroup(timer, waitOnMetadata)

I think passing `waitOnMetadata` down to `ensureActiveGroup` would make things more complicated, because `waitOnMetadata ? timer : time.timer(0L)` would need to be evaluated multiple times, in both `ensureCoordinatorReady` and `joinGroupIfNeeded`.
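The endless-retry concern above can be made concrete with a minimal, self-contained sketch of a commit loop that only exits when its timer expires. `MockClock`, `SimpleTimer`, and `RETRY_BACKOFF_MS` are illustrative stand-ins, not the real Kafka client classes; only the loop shape mirrors the sync-commit retry path.

```java
// A mock clock so the loop can be simulated instantly instead of sleeping.
class MockClock {
    long nowMs = 0L;
    void sleep(long ms) { nowMs += ms; }
}

// Stand-in for org.apache.kafka.common.utils.Timer: expires at a fixed deadline.
class SimpleTimer {
    private final MockClock clock;
    private final long deadlineMs;
    SimpleTimer(MockClock clock, long timeoutMs) {
        this.clock = clock;
        this.deadlineMs = clock.nowMs + timeoutMs;
    }
    boolean isExpired() { return clock.nowMs >= deadlineMs; }
}

public class CommitRetrySketch {
    static final long RETRY_BACKOFF_MS = 100L; // stand-in for retry.backoff.ms

    // Counts how many failing commit attempts fit inside the timer: the loop
    // only exits when the timer expires, so a huge timeout means a huge
    // number of retries.
    static long commitUntilExpired(MockClock clock, SimpleTimer timer) {
        long attempts = 0;
        while (!timer.isExpired()) {
            attempts++;                    // commit fails, e.g. partition unknown
            clock.sleep(RETRY_BACKOFF_MS); // back off before the next attempt
        }
        return attempts;
    }

    public static void main(String[] args) {
        // Default rebalance timeout (max.poll.interval.ms = 5 minutes):
        MockClock clock = new MockClock();
        System.out.println(commitUntilExpired(clock, new SimpleTimer(clock, 300_000L))); // prints 3000
        // With max.poll.interval.ms = Integer.MAX_VALUE, the same loop would run
        // for roughly 21 million attempts (about 25 days of 100 ms backoffs).
    }
}
```

Even at the default 5-minute rebalance timeout the loop already allows thousands of attempts; scaling the timeout to `Integer.MAX_VALUE` is what turns it into the effectively endless loop the comment describes.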
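The suggested alternative of running `checkConsumedOffsetsAreValid` and `cleanUpConsumedOffsets` before the commit call can be sketched as follows. The types and names here are illustrative stand-ins (plain strings for partitions, a `Set` for cluster metadata), not the real `ConsumerCoordinator` API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class OffsetCleanupSketch {

    // Valid iff every consumed partition still exists in current metadata.
    static boolean checkConsumedOffsetsAreValid(Map<String, Long> consumed,
                                                Set<String> knownPartitions) {
        return knownPartitions.containsAll(consumed.keySet());
    }

    // Keep only offsets for partitions that metadata still knows about,
    // silently dropping the rest.
    static Map<String, Long> cleanUpConsumedOffsets(Map<String, Long> consumed,
                                                    Set<String> knownPartitions) {
        Map<String, Long> valid = new HashMap<>(consumed);
        valid.keySet().retainAll(knownPartitions);
        return valid;
    }

    public static void main(String[] args) {
        Map<String, Long> consumed = new HashMap<>();
        consumed.put("topicA-0", 42L);
        consumed.put("deleted-topic-0", 7L); // topic deleted mid-rebalance
        Set<String> known = Set.of("topicA-0");

        // The shape of the suggestion: validate and clean up in the auto-commit
        // path only, then hand the result to the (unchanged) sync commit.
        if (!checkConsumedOffsetsAreValid(consumed, known))
            consumed = cleanUpConsumedOffsets(consumed, known);

        System.out.println(consumed); // prints {topicA-0=42}
    }
}
```

Doing the cleanup in `maybeAutoCommitOffsetsSync` would confine the silent-drop behavior to the internal auto-commit callers, leaving the user-facing `Consumer#commitSync` semantics untouched.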
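The objection to `ensureActiveGroup(timer, waitOnMetadata)` can also be shown as a shape comparison: threading the boolean through means re-evaluating `waitOnMetadata ? timer : time.timer(0L)` at every blocking step, whereas deriving the effective timer once keeps the signature unchanged. All names below are illustrative, not the real coordinator code.

```java
public class TimerThreadingSketch {
    static int zeroTimersCreated = 0;

    // Stand-in for time.timer(0L): counts how often a fresh zero timer is built.
    static Object zeroTimer() {
        zeroTimersCreated++;
        return new Object();
    }

    static void ensureCoordinatorReady(Object timer) { /* would block on coordinator lookup */ }
    static void joinGroupIfNeeded(Object timer) { /* would block on the join round trip */ }

    // Shape A: the flag is threaded through, so the ternary repeats at every
    // blocking step inside the coordinator.
    static void ensureActiveGroupWithFlag(Object timer, boolean waitOnMetadata) {
        ensureCoordinatorReady(waitOnMetadata ? timer : zeroTimer());
        joinGroupIfNeeded(waitOnMetadata ? timer : zeroTimer());
    }

    // Shape B: the caller derives the effective timer once; the coordinator
    // keeps its single-timer signature.
    static void ensureActiveGroup(Object effectiveTimer) {
        ensureCoordinatorReady(effectiveTimer);
        joinGroupIfNeeded(effectiveTimer);
    }

    public static void main(String[] args) {
        ensureActiveGroupWithFlag(new Object(), false);
        System.out.println(zeroTimersCreated); // prints 2: one per blocking step

        zeroTimersCreated = 0;
        ensureActiveGroup(zeroTimer());        // the caller decides once
        System.out.println(zeroTimersCreated); // prints 1
    }
}
```

Shape B is the design the comment argues for: the `waitOnMetadata` decision stays at the call site and does not leak into `ensureCoordinatorReady` and `joinGroupIfNeeded`.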
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org