[ https://issues.apache.org/jira/browse/KAFKA-14171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Justinwins updated KAFKA-14171:
-------------------------------
    Description: 
- This looks like a small bug (or improvement), but it significantly impacts MM2 performance.
- When DistributedHerder starts, it calls startServices() --> this.worker.start() --> offsetBackingStore.start() --> offsetLog.start(), and finally, in the `KafkaBasedLog` class, `consumer.seekToBeginning(partitions)`. See `org.apache.kafka.connect.util.KafkaBasedLog#start` for details.
- The mm2-offsets topic is retained for 7 days (as defined by 'retention.ms'). If MM2 replicates many partitions, the mm2-offsets topic can grow quite large over those 7 days, and it may take a few minutes or more of polling until the consumer reaches the latest offset. This is very CPU-intensive and triggers CPU throttling in the k8s container.
- I think the mm2-offsets topic, or more specifically the topic behind KafkaBasedLog, is a special topic. At the very least, we can set a much shorter TTL for it to avoid this problem.
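To put rough numbers on the startup cost described above, here is a small back-of-the-envelope sketch in Java. It is not the actual `KafkaBasedLog` code and does not talk to Kafka: the class `ReplaySketch`, its two helper methods, the write rate of 100 records per second, and the batch size of 500 records per poll are all assumptions made up for illustration. It only models "seek to the beginning, then poll until the end" and shows how the number of polls scales with the retention window.

```java
public class ReplaySketch {

    // Hypothetical helper: how many records the topic holds if writes arrive
    // at a steady rate and nothing older than retentionMs is kept.
    static long retainedRecords(long recordsPerSecond, long retentionMs) {
        return recordsPerSecond * retentionMs / 1000;
    }

    // Hypothetical helper: models a consumer that starts at offset 0
    // (the effect of seekToBeginning) and polls fixed-size batches
    // until it reaches the end of the log. Returns the number of polls.
    static long pollsToReachEnd(long retainedRecords, int recordsPerPoll) {
        long position = 0;
        long polls = 0;
        while (position < retainedRecords) {
            position += Math.min(recordsPerPoll, retainedRecords - position);
            polls++;
        }
        return polls;
    }

    public static void main(String[] args) {
        long ratePerSecond = 100;                      // assumed write rate
        int recordsPerPoll = 500;                      // assumed batch size
        long sevenDaysMs = 7L * 24 * 60 * 60 * 1000;   // 604,800,000 ms
        long oneHourMs = 60L * 60 * 1000;              // 3,600,000 ms

        long weekRecords = retainedRecords(ratePerSecond, sevenDaysMs);
        long hourRecords = retainedRecords(ratePerSecond, oneHourMs);

        System.out.println("7 days: " + weekRecords + " records, "
                + pollsToReachEnd(weekRecords, recordsPerPoll) + " polls");
        System.out.println("1 hour: " + hourRecords + " records, "
                + pollsToReachEnd(hourRecords, recordsPerPoll) + " polls");
    }
}
```

Under these assumed numbers, a 7-day window means replaying 168 times as many records at startup as a 1-hour window. The real cost depends on the actual write rate, record size, and consumer settings.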
> mm2-offsets topic should be set retention.ms=1h or less as default
> ------------------------------------------------------------------
>
>                 Key: KAFKA-14171
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14171
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 3.2.1
>            Reporter: Justinwins
>            Priority: Major