[jira] [Updated] (KAFKA-16228) Add --remote-log-metadata-decoder to kafka-dump-log.sh
[ https://issues.apache.org/jira/browse/KAFKA-16228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Federico Valeri updated KAFKA-16228: Labels: kip-required (was: ) > Add --remote-log-metadata-decoder to kafka-dump-log.sh > -- > > Key: KAFKA-16228 > URL: https://issues.apache.org/jira/browse/KAFKA-16228 > Project: Kafka > Issue Type: New Feature > Components: Tiered-Storage >Affects Versions: 3.6.1 >Reporter: Federico Valeri >Priority: Major > Labels: kip-required > > It would be good to improve the kafka-dump-log.sh tool adding a decode flag > for __remote_log_metadata records. Something like the following would be > useful for debugging. > {code} > bin/kafka-dump-log.sh --remote-log-metadata-decoder --files > /opt/kafka/data/__remote_log_metadata-0/.log > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-16218) Partition reassignment can't complete if any target replica is out-of-sync
[ https://issues.apache.org/jira/browse/KAFKA-16218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Drawxy updated KAFKA-16218: --- Issue Type: Improvement (was: Bug) > Partition reassignment can't complete if any target replica is out-of-sync > -- > > Key: KAFKA-16218 > URL: https://issues.apache.org/jira/browse/KAFKA-16218 > Project: Kafka > Issue Type: Improvement >Affects Versions: 3.1.2 >Reporter: Drawxy >Priority: Major > > Assumed that there were 4 brokers (1001,2001,3001,4001) and a topic partition > _foo-0_ (replicas[1001,2001,3001], isr[1001,3001]). The replica 2001 can't > catch up and become out-of-sync due to some issue. > If we launch a partition reassinment for this _foo-0_ (the target replica > list is [1001,2001,4001]), the partition reassignment can't complete even if > the adding replica 4001 already catches up. At that time, the partition state > would be replicas[1001,2001,4001,3001] isr[1001,3001,4001]. > > The out-of-sync replica 2001 shouldn't make the partition reassignment stuck. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] MINOR fix word spelling mistakes [kafka]
showuon merged PR #15331: URL: https://github.com/apache/kafka/pull/15331 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (KAFKA-16232) kafka hangs forever in the starting process if the authorizer future is not returned
Luke Chen created KAFKA-16232: - Summary: kafka hangs forever in the starting process if the authorizer future is not returned Key: KAFKA-16232 URL: https://issues.apache.org/jira/browse/KAFKA-16232 Project: Kafka Issue Type: Improvement Affects Versions: 3.6.1 Reporter: Luke Chen For security reason, during broker startup, we will wait until all ACL entries loaded before starting serving requests. But recently, we accidentally set standardAuthorizer to ZK broker, and then, the broker never enters RUNNING state because it's waiting for the standardAuthorizer future completion. Of course this is a human error to set the wrong configuration, but it'd be better we could handle this case better. Suggestions: 1. set timeout for authorizer future waiting (how long is long enough?) 2. add logs before and after future waiting, to allow admin to know we're waiting for the authorizer future. We can start with (2), and thinking about (1) later. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-12823) Remove Deprecated method KStream#through
[ https://issues.apache.org/jira/browse/KAFKA-12823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17815089#comment-17815089 ] Matthias J. Sax commented on KAFKA-12823: - Look for ticket labeled "beginner" or "newbie": https://issues.apache.org/jira/browse/KAFKA-16209?jql=project%20%3D%20KAFKA%20AND%20labels%20in%20(Beginner%2C%20beginner%2C%20newbie%2C%20%22newbie%2B%2B%22)%20ORDER%20BY%20created%20DESC%2C%20priority%20DESC%2C%20updated%20DESC > Remove Deprecated method KStream#through > > > Key: KAFKA-12823 > URL: https://issues.apache.org/jira/browse/KAFKA-12823 > Project: Kafka > Issue Type: Sub-task > Components: streams >Reporter: Josep Prat >Priority: Blocker > Fix For: 4.0.0 > > > The method through in Java and Scala class KStream was deprecated in version > 2.6: > * org.apache.kafka.streams.scala.kstream.KStream#through > * org.apache.kafka.streams.kstream.KStream#through(java.lang.String) > * org.apache.kafka.streams.kstream.KStream#through(java.lang.String, > org.apache.kafka.streams.kstream.Produced) > > See KAFKA-10003 and KIP-221 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[PR] MINOR fix word spelling mistakes [kafka]
eliasyaoyc opened a new pull request, #15331: URL: https://github.com/apache/kafka/pull/15331 - fix word spelling mistakes in KafkaRaftClient ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-14517: Implement regex subscriptions [kafka]
JimmyWang6 commented on PR #14327: URL: https://github.com/apache/kafka/pull/14327#issuecomment-1931122011 > Thanks for your reply! > I will remove this part of code @dajac I'm sorry that I replied with the wrong content. Here is my email address: [www.wangzhiw...@qq.com](www.wangzhiw...@qq.com) and much thanks for your invitation -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] KAFKA-16231: Update consumer_test.py to support KIP-848’s group protocol config [kafka]
kirktrue opened a new pull request, #15330: URL: https://github.com/apache/kafka/pull/15330 Added a new optional `group_protocol` parameter to the test methods, then passed that down to the `setup_consumer` method. Unfortunately, because the new consumer can only be used with the new coordinator, this required a new `@matrix` block instead of adding the `group_protocol=["classic", "consumer"]` to the existing blocks ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (KAFKA-14567) Kafka Streams crashes after ProducerFencedException
[ https://issues.apache.org/jira/browse/KAFKA-14567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias J. Sax updated KAFKA-14567: Description: Running a Kafka Streams application with EOS-v2. We first see a `ProducerFencedException`. After the fencing, the fenced thread crashed resulting in a non-recoverable error: {quote}[2022-12-22 13:49:13,423] ERROR [i-0c291188ec2ae17a0-StreamThread-3] stream-thread [i-0c291188ec2ae17a0-StreamThread-3] Failed to process stream task 1_2 due to the following error: (org.apache.kafka.streams.processor.internals.TaskExecutor) org.apache.kafka.streams.errors.StreamsException: Exception caught in process. taskId=1_2, processor=KSTREAM-SOURCE-05, topic=node-name-repartition, partition=2, offset=539776276, stacktrace=java.lang.IllegalStateException: TransactionalId stream-soak-test-72b6e57c-c2f5-489d-ab9f-fdbb215d2c86-3: Invalid transition attempted from state FATAL_ERROR to state ABORTABLE_ERROR at org.apache.kafka.clients.producer.internals.TransactionManager.transitionTo(TransactionManager.java:974) at org.apache.kafka.clients.producer.internals.TransactionManager.transitionToAbortableError(TransactionManager.java:394) at org.apache.kafka.clients.producer.internals.TransactionManager.maybeTransitionToErrorState(TransactionManager.java:620) at org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:1079) at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:959) at org.apache.kafka.streams.processor.internals.StreamsProducer.send(StreamsProducer.java:257) at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.send(RecordCollectorImpl.java:207) at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.send(RecordCollectorImpl.java:162) at org.apache.kafka.streams.processor.internals.SinkNode.process(SinkNode.java:85) at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forwardInternal(ProcessorContextImpl.java:290) at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:269) at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:228) at org.apache.kafka.streams.kstream.internals.KStreamKTableJoinProcessor.process(KStreamKTableJoinProcessor.java:88) at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:157) at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forwardInternal(ProcessorContextImpl.java:290) at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:269) at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:228) at org.apache.kafka.streams.processor.internals.SourceNode.process(SourceNode.java:84) at org.apache.kafka.streams.processor.internals.StreamTask.lambda$doProcess$1(StreamTask.java:791) at org.apache.kafka.streams.processor.internals.metrics.StreamsMetricsImpl.maybeMeasureLatency(StreamsMetricsImpl.java:867) at org.apache.kafka.streams.processor.internals.StreamTask.doProcess(StreamTask.java:791) at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:722) at org.apache.kafka.streams.processor.internals.TaskExecutor.processTask(TaskExecutor.java:95) at org.apache.kafka.streams.processor.internals.TaskExecutor.process(TaskExecutor.java:76) at org.apache.kafka.streams.processor.internals.TaskManager.process(TaskManager.java:1645) at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:788) at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:607) at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:569) at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:748) at org.apache.kafka.streams.processor.internals.TaskExecutor.processTask(TaskExecutor.java:95) at org.apache.kafka.streams.processor.internals.TaskExecutor.process(TaskExecutor.java:76) at org.apache.kafka.streams.processor.internals.TaskManager.process(TaskManager.java:1645) at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:788) at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:607) at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:569) Caused by: java.lang.IllegalStateException: TransactionalId stream-soak-test-72b6e57c-c2f5-489d-ab9f-fdbb215d2c86-3: Invalid transition attempted from state FATAL_ERROR to state ABORTABLE_ERROR
[PR] KIP848- Add JMH Benchmarks for Client And Server Side Assignors [kafka]
rreddy-22 opened a new pull request, #15329: URL: https://github.com/apache/kafka/pull/15329 This PR contains the addition of two JMH benchmark workloads to test the performance of the different assignors present on both the client side and the new server side assignors (KIP-848). For the purpose of testing we've used the following assumptions: -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] KAFKA-16230: Update verifiable_consumer.py to support KIP-848’s group protocol config [kafka]
kirktrue opened a new pull request, #15328: URL: https://github.com/apache/kafka/pull/15328 Including the new `--group-protocol` command line option to `VerifiableConsumer` (from KAFKA-16037/#15325) if the node is running 3.7.0+ of the consumer ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (KAFKA-15538) Client support for java regex based subscription
[ https://issues.apache.org/jira/browse/KAFKA-15538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17815041#comment-17815041 ] Phuc Hong Tran commented on KAFKA-15538: [~lianetm] I think instead of closing this ticket we can move the part of including Pattern into heartbeat request to ticket KAFKA-15561, as the foundation work for it is already there. This ticket will be exclusive to enabling relating tests of the subscribe methods that use Pattern, wdyt? > Client support for java regex based subscription > > > Key: KAFKA-15538 > URL: https://issues.apache.org/jira/browse/KAFKA-15538 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Reporter: Lianet Magrans >Assignee: Phuc Hong Tran >Priority: Critical > Labels: kip-848-client-support, newbie, regex > Fix For: 3.8.0 > > > When using subscribe with a java regex (Pattern), we need to resolve it on > the client side to send the broker a list of topic names to subscribe to. > Context: > The new consumer group protocol uses [Google > RE2/J|https://github.com/google/re2j] for regular expressions and introduces > new methods in the consumer API to subscribe using a `SubscribePattern`. The > subscribe using a java `Pattern` will be still supported for a while but > eventually removed. > * When the subscribe with SubscriptionPattern is used, the client should > just send the regex to the broker and it will be resolved on the server side. > * In the case of the subscribe with Pattern, the regex should be resolved on > the client side. > As part of this task, we should re-enable all integration tests defined in > the PlainTextAsyncConsumer that relate to subscription with pattern and that > are currently disabled for the new consumer + new protocol -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] KAFKA-7663: Custom Processor supplied on addGlobalStore is not used when restoring state from topic [kafka]
wcarlson5 closed pull request #15326: KAFKA-7663: Custom Processor supplied on addGlobalStore is not used when restoring state from topic URL: https://github.com/apache/kafka/pull/15326 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] MINOR Fix a case where not all ACLs for a given resource are written to ZK [kafka]
mumrah opened a new pull request, #15327: URL: https://github.com/apache/kafka/pull/15327 (no comment) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] KAFKA-7663: Custom Processor supplied on addGlobalStore is not used when restoring state from topic [kafka]
wcarlson5 opened a new pull request, #15326: URL: https://github.com/apache/kafka/pull/15326 The global table is reloaded but without going through the processor supplied; instead, it calls GlobalStateManagerImp#restoreState which simply stores the input topic K,V records into rocksDB. ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-16226 Reduce synchronization between producer threads [kafka]
hachikuji commented on code in PR #15323: URL: https://github.com/apache/kafka/pull/15323#discussion_r1480620848 ## clients/src/main/java/org/apache/kafka/clients/producer/internals/RecordAccumulator.java: ## @@ -647,27 +647,27 @@ private long batchReady(boolean exhausted, TopicPartition part, Node leader, } /** - * Iterate over partitions to see which one have batches ready and collect leaders of those partitions - * into the set of ready nodes. If partition has no leader, add the topic to the set of topics with - * no leader. This function also calculates stats for adaptive partitioning. + * Iterate over partitions to see which one have batches ready and collect leaders of those + * partitions into the set of ready nodes. If partition has no leader, add the topic to the set + * of topics with no leader. This function also calculates stats for adaptive partitioning. * - * @param metadata The cluster metadata - * @param nowMs The current time - * @param topic The topic - * @param topicInfo The topic info + * @param cluster The cluster metadata + * @param nowMs The current time + * @param topic The topic + * @param topicInfo The topic info * @param nextReadyCheckDelayMs The delay for next check - * @param readyNodes The set of ready nodes (to be filled in) - * @param unknownLeaderTopics The set of topics with no leader (to be filled in) + * @param readyNodesThe set of ready nodes (to be filled in) + * @param unknownLeaderTopics The set of topics with no leader (to be filled in) * @return The delay for next check */ -private long partitionReady(Metadata metadata, long nowMs, String topic, +private long partitionReady(Cluster cluster, long nowMs, String topic, Review Comment: In some ways, this is a step backwards. We have been trying to reduce the reliance on `Cluster` internally because it is public. With a lot of internal usage, we end up making changes to the API which are only needed for the internal implementation (as we are doing in this PR). Have you considered alternatives? Perhaps we could expose something like `Cluster`, but with a reduced scope? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-16206 Fix unnecessary topic config deletion during ZK migration [kafka]
ahuang98 commented on PR #14206: URL: https://github.com/apache/kafka/pull/14206#issuecomment-1930876640 Is there a downside to having `deleteTopic` in `ZkTopicMigrationClient` not delete configs? Otherwise changing the logging level seems okay to me. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-16055: Thread unsafe use of HashMap stored in QueryableStoreProvider#storeProviders [kafka]
wcarlson5 commented on PR #15121: URL: https://github.com/apache/kafka/pull/15121#issuecomment-1930810239 @ableegoldman I think you have context on this issue. https://lists.apache.org/thread/gpct1275bfqovlckptn3lvf683qpoxol -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-16215; KAFKA-16178: Fix member not rejoining after error [kafka]
lianetm commented on PR #15311: URL: https://github.com/apache/kafka/pull/15311#issuecomment-1930740037 Hey @dajac , I updated this to ensure we record a failed attempt for all errors in HB. That will effectively update the received time and backoff, with the ability to skip backoff (0 backoff) in some specific errors where the next HB depends on other conditions (coordinator discovered, assignment released) and we won't add any extra backoff to that. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] KAFKA-16037: Update VerifiableConsumer to support KIP-848’s group protocol config [kafka]
kirktrue opened a new pull request, #15325: URL: https://github.com/apache/kafka/pull/15325 Add the optional `--group-protocol` command line option that can be set in the system tests There are no existing unit tests for `VerifiableConsumer`. It was tested by running the system tests locally without regression. ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (KAFKA-16231) Update consumer_test.py to support KIP-848’s group protocol config
[ https://issues.apache.org/jira/browse/KAFKA-16231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-16231: -- Description: This task is to update {{consumer_test.py}} to support the {{group.protocol}} configuration introduced in [KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol] by adding an optional {{group_protocol}} argument to the tests and matrixes. For example, here's how it would look to add the new group_protocol parameter to the parameterized tests: {code:python} @cluster(num_nodes=6) @matrix( assignment_strategy=["org.apache.kafka.clients.consumer.RangeAssignor", "org.apache.kafka.clients.consumer.RoundRobinAssignor", "org.apache.kafka.clients.consumer.StickyAssignor"], metadata_quorum=[quorum.zk], use_new_coordinator=[False] ) @matrix( assignment_strategy=["org.apache.kafka.clients.consumer.RangeAssignor", "org.apache.kafka.clients.consumer.RoundRobinAssignor", "org.apache.kafka.clients.consumer.StickyAssignor"], metadata_quorum=[quorum.isolated_kraft], use_new_coordinator=[False] ) @matrix( metadata_quorum=[quorum.isolated_kraft], use_new_coordinator=[True], group_protocol=["classic", "consumer"] ) def test_the_consumer(self, assignment_strategy, metadata_quorum=quorum.zk, use_new_coordinator=False, group_protocol="classic"): consumer = self.setup_consumer("my_topic", group_protocol=group_protocol) {code} The {{group_protocol}} parameter will default to {{{}classic{}}}. {*}Note{*}: we only test the new group protocol setting when {{use_new_coordinator}} is {{{}True{}}}, as that is the only supported mode. was:This task is to update {{verifiable_consumer.py}} to support the {{group.protocol}} configuration introduced in [KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol] by adding an optional {{group_protocol}} argument. It will default to classic and we will take a separate task (Jira) to update the callers. > Update consumer_test.py to support KIP-848’s group protocol config > -- > > Key: KAFKA-16231 > URL: https://issues.apache.org/jira/browse/KAFKA-16231 > Project: Kafka > Issue Type: Test > Components: clients, consumer, system tests >Reporter: Kirk True >Assignee: Kirk True >Priority: Blocker > Labels: kip-848-client-support, system-tests > Fix For: 3.8.0 > > > This task is to update {{consumer_test.py}} to support the {{group.protocol}} > configuration introduced in > [KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol] > by adding an optional {{group_protocol}} argument to the tests and matrixes. > For example, here's how it would look to add the new group_protocol parameter > to the parameterized tests: > {code:python} > @cluster(num_nodes=6) > @matrix( > > assignment_strategy=["org.apache.kafka.clients.consumer.RangeAssignor", > > "org.apache.kafka.clients.consumer.RoundRobinAssignor", > > "org.apache.kafka.clients.consumer.StickyAssignor"], > metadata_quorum=[quorum.zk], > use_new_coordinator=[False] > ) > @matrix( > > assignment_strategy=["org.apache.kafka.clients.consumer.RangeAssignor", > > "org.apache.kafka.clients.consumer.RoundRobinAssignor", > > "org.apache.kafka.clients.consumer.StickyAssignor"], > metadata_quorum=[quorum.isolated_kraft], > use_new_coordinator=[False] > ) > @matrix( > metadata_quorum=[quorum.isolated_kraft], > use_new_coordinator=[True], > group_protocol=["classic", "consumer"] > ) > def test_the_consumer(self, assignment_strategy, > metadata_quorum=quorum.zk, use_new_coordinator=False, > group_protocol="classic"): > consumer = self.setup_consumer("my_topic", > group_protocol=group_protocol) > {code} > The {{group_protocol}} parameter will default to {{{}classic{}}}. > {*}Note{*}: we only test the new group protocol setting when > {{use_new_coordinator}} is {{{}True{}}}, as that is the only supported mode. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16231) Update consumer_test.py to support KIP-848’s group protocol config
Kirk True created KAFKA-16231: - Summary: Update consumer_test.py to support KIP-848’s group protocol config Key: KAFKA-16231 URL: https://issues.apache.org/jira/browse/KAFKA-16231 Project: Kafka Issue Type: Test Components: clients, consumer, system tests Reporter: Kirk True Assignee: Kirk True Fix For: 3.8.0 This task is to update {{verifiable_consumer.py}} to support the {{group.protocol}} configuration introduced in [KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol] by adding an optional {{group_protocol}} argument. It will default to classic and we will take a separate task (Jira) to update the callers. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-16230) Update verifiable_consumer.py to support KIP-848’s group protocol config
[ https://issues.apache.org/jira/browse/KAFKA-16230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-16230: -- Summary: Update verifiable_consumer.py to support KIP-848’s group protocol config (was: Update verifiable_consumer to support KIP-848’s group protocol config) > Update verifiable_consumer.py to support KIP-848’s group protocol config > > > Key: KAFKA-16230 > URL: https://issues.apache.org/jira/browse/KAFKA-16230 > Project: Kafka > Issue Type: Test > Components: clients, consumer, system tests >Reporter: Kirk True >Assignee: Kirk True >Priority: Blocker > Labels: kip-848-client-support, system-tests > Fix For: 3.8.0 > > > This task is to update {{verifiable_consumer.py}} to support the > {{group.protocol}} configuration introduced in > [KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol] > by adding an optional {{group_protocol}} argument. It will default to > classic and we will take a separate task (Jira) to update the callers. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-16230) Update verifiable_consumer to support KIP-848’s group protocol config
[ https://issues.apache.org/jira/browse/KAFKA-16230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-16230: -- Description: This task is to update {{verifiable_consumer.py}} to support the {{group.protocol}} configuration introduced in [KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol] by adding an optional {{group_protocol}} argument. It will default to classic and we will take a separate task (Jira) to update the callers. (was: This task is to update {{VerifiableConsumer}} to support the {{group.protocol}} configuration introduced in [KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol] by adding an optional {{--group-protocol}} command line option.) > Update verifiable_consumer to support KIP-848’s group protocol config > - > > Key: KAFKA-16230 > URL: https://issues.apache.org/jira/browse/KAFKA-16230 > Project: Kafka > Issue Type: Test > Components: clients, consumer, system tests >Reporter: Kirk True >Assignee: Kirk True >Priority: Blocker > Labels: kip-848-client-support, system-tests > Fix For: 3.8.0 > > > This task is to update {{verifiable_consumer.py}} to support the > {{group.protocol}} configuration introduced in > [KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol] > by adding an optional {{group_protocol}} argument. It will default to > classic and we will take a separate task (Jira) to update the callers. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16230) Update verifiable_consumer to support KIP-848’s group protocol config
Kirk True created KAFKA-16230: - Summary: Update verifiable_consumer to support KIP-848’s group protocol config Key: KAFKA-16230 URL: https://issues.apache.org/jira/browse/KAFKA-16230 Project: Kafka Issue Type: Test Components: clients, consumer, system tests Reporter: Kirk True Assignee: Kirk True Fix For: 3.8.0 This task is to update {{VerifiableConsumer}} to support the {{group.protocol}} configuration introduced in [KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol] by adding an optional {{--group-protocol}} command line option. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-16037) Update VerifiableConsumer to support KIP-848’s group protocol config
[ https://issues.apache.org/jira/browse/KAFKA-16037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-16037: -- Description: This task is to update {{VerifiableConsumer}} to support the {{group.protocol}} configuration introduced in [KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol] by adding an optional {{--group-protocol}} command line option. (was: This task is to allow the the system tests so that they execute the Consumer for both of the group protocols introduced in [KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol]: * classic * consumer This is done in a few steps... First, update the Java-based VerifiableConsumer to support passing in the group protocol by adding a new group_protocol parameter to the relevant parameterized tests: {code:python} @matrix( metadata_quorum=[quorum.isolated_kraft], use_new_coordinator=[True], group_protocol=["classic", "consumer"] ) def test_the_consumer(self, group_protocol, metadata_quorum=quorum.zk, use_new_coordinator=False): consumer = self.setup_consumer("my_topic", group_protocol=group_protocol) {code} The VerifiableConsumer class in Python represents the calling of the VerifiableConsumer class in Java. The ) > Update VerifiableConsumer to support KIP-848’s group protocol config > > > Key: KAFKA-16037 > URL: https://issues.apache.org/jira/browse/KAFKA-16037 > Project: Kafka > Issue Type: Test > Components: clients, consumer, system tests >Reporter: Kirk True >Assignee: Kirk True >Priority: Blocker > Labels: kip-848-client-support, system-tests > Fix For: 3.8.0 > > > This task is to update {{VerifiableConsumer}} to support the > {{group.protocol}} configuration introduced in > [KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol] > by adding an optional {{--group-protocol}} command line option. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-16037) Update VerifiableConsumer to support KIP-848’s group protocol config
[ https://issues.apache.org/jira/browse/KAFKA-16037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-16037: -- Description: This task is to allow the the system tests so that they execute the Consumer for both of the group protocols introduced in [KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol]: * classic * consumer This is done in a few steps... First, update the Java-based VerifiableConsumer to support passing in the group protocol by adding a new group_protocol parameter to the relevant parameterized tests: {code:python} @matrix( metadata_quorum=[quorum.isolated_kraft], use_new_coordinator=[True], group_protocol=["classic", "consumer"] ) def test_the_consumer(self, group_protocol, metadata_quorum=quorum.zk, use_new_coordinator=False): consumer = self.setup_consumer("my_topic", group_protocol=group_protocol) {code} The VerifiableConsumer class in Python represents the calling of the VerifiableConsumer class in Java. The was: This task is to parameterize the system tests so that they execute the Consumer for both of the group protocols introduced in [KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol]: * classic * consumer This is done in a few steps... First, update the Java-based VerifiableConsumer to support passing in the group protocol by adding a new group_protocol parameter to the relevant parameterized tests: {code:python} @matrix( metadata_quorum=[quorum.isolated_kraft], use_new_coordinator=[True], group_protocol=["classic", "consumer"] ) def test_the_consumer(self, group_protocol, metadata_quorum=quorum.zk, use_new_coordinator=False): consumer = self.setup_consumer("my_topic", group_protocol=group_protocol) {code} The VerifiableConsumer class in Python represents the calling of the VerifiableConsumer class in Java. The > Update VerifiableConsumer to support KIP-848’s group protocol config > > > Key: KAFKA-16037 > URL: https://issues.apache.org/jira/browse/KAFKA-16037 > Project: Kafka > Issue Type: Test > Components: clients, consumer, system tests >Reporter: Kirk True >Assignee: Kirk True >Priority: Blocker > Labels: kip-848-client-support, system-tests > Fix For: 3.8.0 > > > This task is to allow the the system tests so that they execute the Consumer > for both of the group protocols introduced in > [KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol]: > * classic > * consumer > This is done in a few steps... > First, update the Java-based VerifiableConsumer to support passing in the > group protocol > by adding a new group_protocol parameter to the relevant parameterized tests: > {code:python} > @matrix( > metadata_quorum=[quorum.isolated_kraft], > use_new_coordinator=[True], > group_protocol=["classic", "consumer"] > ) > def test_the_consumer(self, group_protocol, metadata_quorum=quorum.zk, > use_new_coordinator=False): > consumer = self.setup_consumer("my_topic", group_protocol=group_protocol) > {code} > The VerifiableConsumer class in Python represents the calling of the > VerifiableConsumer class in Java. The -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-16037) Update VerifiableConsumer to support KIP-848’s group protocol config
[ https://issues.apache.org/jira/browse/KAFKA-16037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-16037: -- Summary: Update VerifiableConsumer to support KIP-848’s group protocol config (was: Upgrade existing system tests to use new consumer) > Update VerifiableConsumer to support KIP-848’s group protocol config > > > Key: KAFKA-16037 > URL: https://issues.apache.org/jira/browse/KAFKA-16037 > Project: Kafka > Issue Type: Test > Components: clients, consumer, system tests >Reporter: Kirk True >Assignee: Kirk True >Priority: Blocker > Labels: kip-848-client-support, system-tests > Fix For: 3.8.0 > > > This task is to parameterize the system tests so that they execute the > Consumer for both of the group protocols introduced in > [KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol]: > * classic > * consumer > This is done in a few steps... > First, update the Java-based VerifiableConsumer to support passing in the > group protocol > by adding a new group_protocol parameter to the relevant parameterized tests: > {code:python} > @matrix( > metadata_quorum=[quorum.isolated_kraft], > use_new_coordinator=[True], > group_protocol=["classic", "consumer"] > ) > def test_the_consumer(self, group_protocol, metadata_quorum=quorum.zk, > use_new_coordinator=False): > consumer = self.setup_consumer("my_topic", group_protocol=group_protocol) > {code} > The VerifiableConsumer class in Python represents the calling of the > VerifiableConsumer class in Java. The -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-16037) Upgrade existing system tests to use new consumer
[ https://issues.apache.org/jira/browse/KAFKA-16037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-16037: -- Description: This task is to parameterize the system tests so that they execute the Consumer for both of the group protocols introduced in [KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol]: * classic * consumer This is done in a few steps... First, update the Java-based VerifiableConsumer to support passing in the group protocol by adding a new group_protocol parameter to the relevant parameterized tests: {code:python} @matrix( metadata_quorum=[quorum.isolated_kraft], use_new_coordinator=[True], group_protocol=["classic", "consumer"] ) def test_the_consumer(self, group_protocol, metadata_quorum=quorum.zk, use_new_coordinator=False): consumer = self.setup_consumer("my_topic", group_protocol=group_protocol) {code} The VerifiableConsumer class in Python represents the calling of the VerifiableConsumer class in Java. The was:This task is to parameterize the tests to run twice: both for the old and the new Consumer. > Upgrade existing system tests to use new consumer > - > > Key: KAFKA-16037 > URL: https://issues.apache.org/jira/browse/KAFKA-16037 > Project: Kafka > Issue Type: Test > Components: clients, consumer, system tests >Reporter: Kirk True >Assignee: Kirk True >Priority: Blocker > Labels: kip-848-client-support, system-tests > Fix For: 3.8.0 > > > This task is to parameterize the system tests so that they execute the > Consumer for both of the group protocols introduced in > [KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol]: > * classic > * consumer > This is done in a few steps... > First, update the Java-based VerifiableConsumer to support passing in the > group protocol > by adding a new group_protocol parameter to the relevant parameterized tests: > {code:python} > @matrix( > metadata_quorum=[quorum.isolated_kraft], > use_new_coordinator=[True], > group_protocol=["classic", "consumer"] > ) > def test_the_consumer(self, group_protocol, metadata_quorum=quorum.zk, > use_new_coordinator=False): > consumer = self.setup_consumer("my_topic", group_protocol=group_protocol) > {code} > The VerifiableConsumer class in Python represents the calling of the > VerifiableConsumer class in Java. The -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (KAFKA-16037) Upgrade existing system tests to use new consumer
[ https://issues.apache.org/jira/browse/KAFKA-16037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True reassigned KAFKA-16037: - Assignee: Kirk True (was: Dongnuo Lyu) > Upgrade existing system tests to use new consumer > - > > Key: KAFKA-16037 > URL: https://issues.apache.org/jira/browse/KAFKA-16037 > Project: Kafka > Issue Type: Test > Components: clients, consumer, system tests >Reporter: Kirk True >Assignee: Kirk True >Priority: Blocker > Labels: kip-848-client-support, system-tests > Fix For: 3.8.0 > > > This task is to parameterize the tests to run twice: both for the old and the > new Consumer. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] KAFKA-16215; KAFKA-16178: Fix member not rejoining after error [kafka]
lianetm commented on code in PR #15311: URL: https://github.com/apache/kafka/pull/15311#discussion_r1480405280 ## clients/src/main/java/org/apache/kafka/clients/consumer/internals/RequestState.java: ## @@ -106,6 +106,13 @@ public void onSendAttempt(final long currentTimeMs) { this.lastSentMs = currentTimeMs; } +/** + * Update the lastReceivedTime in milliseconds, indicating that a response has been received. + */ +public void updateLastReceivedTime(final long lastReceivedMs) { +this.lastReceivedMs = lastReceivedMs; Review Comment: Exactly, I did see the skip backoff logic in produce when a new leader is discovered, related to what you described. I think that the concept of "progress" here would be more abstract but still applicable, depending on the exact error we know that the action it triggers is based on some progress (send HB as new member, send HB when coord available) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (KAFKA-16227) Console consumer fails with `IllegalStateException`
[ https://issues.apache.org/jira/browse/KAFKA-16227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-16227: -- Priority: Critical (was: Major) > Console consumer fails with `IllegalStateException` > --- > > Key: KAFKA-16227 > URL: https://issues.apache.org/jira/browse/KAFKA-16227 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Affects Versions: 3.7.0 >Reporter: David Jacot >Assignee: Kirk True >Priority: Critical > Labels: kip-848-client-support > Fix For: 3.8.0 > > > I have seen a few occurrences like the following one. There is a race between > the background thread and the foreground thread. I imagine the following > steps: > * quickstart-events-2 is assigned by the background thread; > * the foreground thread starts the initialization of the partition (e.g. > reset offset); > * quickstart-events-2 is removed by the background thread; > * the initialization completes and quickstart-events-2 does not exist > anymore. > > {code:java} > [2024-02-06 16:21:57,375] ERROR Error processing message, terminating > consumer process: (kafka.tools.ConsoleConsumer$) > java.lang.IllegalStateException: No current assignment for partition > quickstart-events-2 > at > org.apache.kafka.clients.consumer.internals.SubscriptionState.assignedState(SubscriptionState.java:367) > at > org.apache.kafka.clients.consumer.internals.SubscriptionState.updateHighWatermark(SubscriptionState.java:579) > at > org.apache.kafka.clients.consumer.internals.FetchCollector.handleInitializeSuccess(FetchCollector.java:283) > at > org.apache.kafka.clients.consumer.internals.FetchCollector.initialize(FetchCollector.java:226) > at > org.apache.kafka.clients.consumer.internals.FetchCollector.collectFetch(FetchCollector.java:110) > at > org.apache.kafka.clients.consumer.internals.AsyncKafkaConsumer.collectFetch(AsyncKafkaConsumer.java:1540) > at > org.apache.kafka.clients.consumer.internals.AsyncKafkaConsumer.pollForFetches(AsyncKafkaConsumer.java:1525) > at > org.apache.kafka.clients.consumer.internals.AsyncKafkaConsumer.poll(AsyncKafkaConsumer.java:711) > at > org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:874) > at > kafka.tools.ConsoleConsumer$ConsumerWrapper.receive(ConsoleConsumer.scala:473) > at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:103) > at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:77) > at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:54) > at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala) {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (KAFKA-16227) Console consumer fails with `IllegalStateException`
[ https://issues.apache.org/jira/browse/KAFKA-16227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True reassigned KAFKA-16227: - Assignee: (was: Kirk True) > Console consumer fails with `IllegalStateException` > --- > > Key: KAFKA-16227 > URL: https://issues.apache.org/jira/browse/KAFKA-16227 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Affects Versions: 3.7.0 >Reporter: David Jacot >Priority: Critical > Labels: kip-848-client-support > Fix For: 3.8.0 > > > I have seen a few occurrences like the following one. There is a race between > the background thread and the foreground thread. I imagine the following > steps: > * quickstart-events-2 is assigned by the background thread; > * the foreground thread starts the initialization of the partition (e.g. > reset offset); > * quickstart-events-2 is removed by the background thread; > * the initialization completes and quickstart-events-2 does not exist > anymore. > > {code:java} > [2024-02-06 16:21:57,375] ERROR Error processing message, terminating > consumer process: (kafka.tools.ConsoleConsumer$) > java.lang.IllegalStateException: No current assignment for partition > quickstart-events-2 > at > org.apache.kafka.clients.consumer.internals.SubscriptionState.assignedState(SubscriptionState.java:367) > at > org.apache.kafka.clients.consumer.internals.SubscriptionState.updateHighWatermark(SubscriptionState.java:579) > at > org.apache.kafka.clients.consumer.internals.FetchCollector.handleInitializeSuccess(FetchCollector.java:283) > at > org.apache.kafka.clients.consumer.internals.FetchCollector.initialize(FetchCollector.java:226) > at > org.apache.kafka.clients.consumer.internals.FetchCollector.collectFetch(FetchCollector.java:110) > at > org.apache.kafka.clients.consumer.internals.AsyncKafkaConsumer.collectFetch(AsyncKafkaConsumer.java:1540) > at > org.apache.kafka.clients.consumer.internals.AsyncKafkaConsumer.pollForFetches(AsyncKafkaConsumer.java:1525) > at > org.apache.kafka.clients.consumer.internals.AsyncKafkaConsumer.poll(AsyncKafkaConsumer.java:711) > at > org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:874) > at > kafka.tools.ConsoleConsumer$ConsumerWrapper.receive(ConsoleConsumer.scala:473) > at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:103) > at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:77) > at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:54) > at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala) {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-16227) Console consumer fails with `IllegalStateException`
[ https://issues.apache.org/jira/browse/KAFKA-16227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-16227: -- Labels: kip-848-client-support (was: consumer-threading-refactor) > Console consumer fails with `IllegalStateException` > --- > > Key: KAFKA-16227 > URL: https://issues.apache.org/jira/browse/KAFKA-16227 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Affects Versions: 3.7.0 >Reporter: David Jacot >Assignee: Kirk True >Priority: Major > Labels: kip-848-client-support > Fix For: 3.8.0 > > > I have seen a few occurrences like the following one. There is a race between > the background thread and the foreground thread. I imagine the following > steps: > * quickstart-events-2 is assigned by the background thread; > * the foreground thread starts the initialization of the partition (e.g. > reset offset); > * quickstart-events-2 is removed by the background thread; > * the initialization completes and quickstart-events-2 does not exist > anymore. > > {code:java} > [2024-02-06 16:21:57,375] ERROR Error processing message, terminating > consumer process: (kafka.tools.ConsoleConsumer$) > java.lang.IllegalStateException: No current assignment for partition > quickstart-events-2 > at > org.apache.kafka.clients.consumer.internals.SubscriptionState.assignedState(SubscriptionState.java:367) > at > org.apache.kafka.clients.consumer.internals.SubscriptionState.updateHighWatermark(SubscriptionState.java:579) > at > org.apache.kafka.clients.consumer.internals.FetchCollector.handleInitializeSuccess(FetchCollector.java:283) > at > org.apache.kafka.clients.consumer.internals.FetchCollector.initialize(FetchCollector.java:226) > at > org.apache.kafka.clients.consumer.internals.FetchCollector.collectFetch(FetchCollector.java:110) > at > org.apache.kafka.clients.consumer.internals.AsyncKafkaConsumer.collectFetch(AsyncKafkaConsumer.java:1540) > at > org.apache.kafka.clients.consumer.internals.AsyncKafkaConsumer.pollForFetches(AsyncKafkaConsumer.java:1525) > at > org.apache.kafka.clients.consumer.internals.AsyncKafkaConsumer.poll(AsyncKafkaConsumer.java:711) > at > org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:874) > at > kafka.tools.ConsoleConsumer$ConsumerWrapper.receive(ConsoleConsumer.scala:473) > at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:103) > at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:77) > at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:54) > at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala) {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-16227) Console consumer fails with `IllegalStateException`
[ https://issues.apache.org/jira/browse/KAFKA-16227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-16227: -- Fix Version/s: 3.8.0 > Console consumer fails with `IllegalStateException` > --- > > Key: KAFKA-16227 > URL: https://issues.apache.org/jira/browse/KAFKA-16227 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Affects Versions: 3.7.0 >Reporter: David Jacot >Assignee: Kirk True >Priority: Major > Labels: consumer-threading-refactor > Fix For: 3.8.0 > > > I have seen a few occurrences like the following one. There is a race between > the background thread and the foreground thread. I imagine the following > steps: > * quickstart-events-2 is assigned by the background thread; > * the foreground thread starts the initialization of the partition (e.g. > reset offset); > * quickstart-events-2 is removed by the background thread; > * the initialization completes and quickstart-events-2 does not exist > anymore. > > {code:java} > [2024-02-06 16:21:57,375] ERROR Error processing message, terminating > consumer process: (kafka.tools.ConsoleConsumer$) > java.lang.IllegalStateException: No current assignment for partition > quickstart-events-2 > at > org.apache.kafka.clients.consumer.internals.SubscriptionState.assignedState(SubscriptionState.java:367) > at > org.apache.kafka.clients.consumer.internals.SubscriptionState.updateHighWatermark(SubscriptionState.java:579) > at > org.apache.kafka.clients.consumer.internals.FetchCollector.handleInitializeSuccess(FetchCollector.java:283) > at > org.apache.kafka.clients.consumer.internals.FetchCollector.initialize(FetchCollector.java:226) > at > org.apache.kafka.clients.consumer.internals.FetchCollector.collectFetch(FetchCollector.java:110) > at > org.apache.kafka.clients.consumer.internals.AsyncKafkaConsumer.collectFetch(AsyncKafkaConsumer.java:1540) > at > org.apache.kafka.clients.consumer.internals.AsyncKafkaConsumer.pollForFetches(AsyncKafkaConsumer.java:1525) > at > org.apache.kafka.clients.consumer.internals.AsyncKafkaConsumer.poll(AsyncKafkaConsumer.java:711) > at > org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:874) > at > kafka.tools.ConsoleConsumer$ConsumerWrapper.receive(ConsoleConsumer.scala:473) > at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:103) > at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:77) > at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:54) > at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala) {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-16227) Console consumer fails with `IllegalStateException`
[ https://issues.apache.org/jira/browse/KAFKA-16227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-16227: -- Component/s: consumer > Console consumer fails with `IllegalStateException` > --- > > Key: KAFKA-16227 > URL: https://issues.apache.org/jira/browse/KAFKA-16227 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Affects Versions: 3.7.0 >Reporter: David Jacot >Assignee: Kirk True >Priority: Major > > I have seen a few occurrences like the following one. There is a race between > the background thread and the foreground thread. I imagine the following > steps: > * quickstart-events-2 is assigned by the background thread; > * the foreground thread starts the initialization of the partition (e.g. > reset offset); > * quickstart-events-2 is removed by the background thread; > * the initialization completes and quickstart-events-2 does not exist > anymore. > > {code:java} > [2024-02-06 16:21:57,375] ERROR Error processing message, terminating > consumer process: (kafka.tools.ConsoleConsumer$) > java.lang.IllegalStateException: No current assignment for partition > quickstart-events-2 > at > org.apache.kafka.clients.consumer.internals.SubscriptionState.assignedState(SubscriptionState.java:367) > at > org.apache.kafka.clients.consumer.internals.SubscriptionState.updateHighWatermark(SubscriptionState.java:579) > at > org.apache.kafka.clients.consumer.internals.FetchCollector.handleInitializeSuccess(FetchCollector.java:283) > at > org.apache.kafka.clients.consumer.internals.FetchCollector.initialize(FetchCollector.java:226) > at > org.apache.kafka.clients.consumer.internals.FetchCollector.collectFetch(FetchCollector.java:110) > at > org.apache.kafka.clients.consumer.internals.AsyncKafkaConsumer.collectFetch(AsyncKafkaConsumer.java:1540) > at > org.apache.kafka.clients.consumer.internals.AsyncKafkaConsumer.pollForFetches(AsyncKafkaConsumer.java:1525) > at > org.apache.kafka.clients.consumer.internals.AsyncKafkaConsumer.poll(AsyncKafkaConsumer.java:711) > at > org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:874) > at > kafka.tools.ConsoleConsumer$ConsumerWrapper.receive(ConsoleConsumer.scala:473) > at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:103) > at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:77) > at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:54) > at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala) {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-16227) Console consumer fails with `IllegalStateException`
[ https://issues.apache.org/jira/browse/KAFKA-16227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-16227: -- Labels: consumer-threading-refactor (was: ) > Console consumer fails with `IllegalStateException` > --- > > Key: KAFKA-16227 > URL: https://issues.apache.org/jira/browse/KAFKA-16227 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Affects Versions: 3.7.0 >Reporter: David Jacot >Assignee: Kirk True >Priority: Major > Labels: consumer-threading-refactor > > I have seen a few occurrences like the following one. There is a race between > the background thread and the foreground thread. I imagine the following > steps: > * quickstart-events-2 is assigned by the background thread; > * the foreground thread starts the initialization of the partition (e.g. > reset offset); > * quickstart-events-2 is removed by the background thread; > * the initialization completes and quickstart-events-2 does not exist > anymore. > > {code:java} > [2024-02-06 16:21:57,375] ERROR Error processing message, terminating > consumer process: (kafka.tools.ConsoleConsumer$) > java.lang.IllegalStateException: No current assignment for partition > quickstart-events-2 > at > org.apache.kafka.clients.consumer.internals.SubscriptionState.assignedState(SubscriptionState.java:367) > at > org.apache.kafka.clients.consumer.internals.SubscriptionState.updateHighWatermark(SubscriptionState.java:579) > at > org.apache.kafka.clients.consumer.internals.FetchCollector.handleInitializeSuccess(FetchCollector.java:283) > at > org.apache.kafka.clients.consumer.internals.FetchCollector.initialize(FetchCollector.java:226) > at > org.apache.kafka.clients.consumer.internals.FetchCollector.collectFetch(FetchCollector.java:110) > at > org.apache.kafka.clients.consumer.internals.AsyncKafkaConsumer.collectFetch(AsyncKafkaConsumer.java:1540) > at > org.apache.kafka.clients.consumer.internals.AsyncKafkaConsumer.pollForFetches(AsyncKafkaConsumer.java:1525) > at > org.apache.kafka.clients.consumer.internals.AsyncKafkaConsumer.poll(AsyncKafkaConsumer.java:711) > at > org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:874) > at > kafka.tools.ConsoleConsumer$ConsumerWrapper.receive(ConsoleConsumer.scala:473) > at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:103) > at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:77) > at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:54) > at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala) {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] KAFKA-15761: KRaft support in EpochDrivenReplicationProtocolAcceptanceTest [kafka]
mimaison commented on PR #15295: URL: https://github.com/apache/kafka/pull/15295#issuecomment-1930519984 The build is still failing with Java 8 and Scala 2.12 so we can't merge this PR as is. You should be able to reproduce the issue by running: `./gradlew -PscalaVersion=2.12 core:compileTestScala` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] MINOR:Type Casting Correction AND Null Pointer Exception (NPE) Defense [kafka]
mimaison commented on code in PR #9786: URL: https://github.com/apache/kafka/pull/9786#discussion_r1480365763 ## streams/src/main/java/org/apache/kafka/streams/kstream/internals/InternalStreamsBuilder.java: ## @@ -437,7 +438,10 @@ private void rewriteSingleStoreSelfJoin( if (currentNode instanceof StreamStreamJoinNode && currentNode.parentNodes().size() == 1) { final StreamStreamJoinNode joinNode = (StreamStreamJoinNode) currentNode; // Remove JoinOtherWindowed node -final GraphNode parent = joinNode.parentNodes().stream().findFirst().get(); +final GraphNode parent = joinNode.parentNodes().stream() Review Comment: It seems we checked there is one item in parentNodes in the `if` condition just above, so do we really need this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Assigned] (KAFKA-7663) Custom Processor supplied on addGlobalStore is not used when restoring state from topic
[ https://issues.apache.org/jira/browse/KAFKA-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walker Carlson reassigned KAFKA-7663: - Assignee: Walker Carlson > Custom Processor supplied on addGlobalStore is not used when restoring state > from topic > --- > > Key: KAFKA-7663 > URL: https://issues.apache.org/jira/browse/KAFKA-7663 > Project: Kafka > Issue Type: Improvement > Components: streams >Affects Versions: 1.0.0 >Reporter: Frederic Tardif >Assignee: Walker Carlson >Priority: Major > Labels: new-streams-runtime-should-fix > Attachments: image-2018-11-20-11-42-14-697.png > > > I have implemented a StreamBuilder#{{addGlobalStore}} supplying a custom > processor responsible to transform a K,V record from the input stream into a > V,K records. It works fine and my {{store.all()}} does print the correct > persisted V,K records. However, if I clean the local store and restart the > stream app, the global table is reloaded but without going through the > processor supplied; instead, it calls {{GlobalStateManagerImp#restoreState}} > which simply stores the input topic K,V records into rocksDB (hence bypassing > the mapping function of my custom processor). I believe this must not be the > expected result? > This is a follow up on stackoverflow discussion around storing a K,V topic > as a global table with some stateless transformations based on a "custom" > processor added on the global store: > [https://stackoverflow.com/questions/50993292/kafka-streams-shared-changelog-topic#comment93591818_50993729] > If we address this issue, we should also apply > `default.deserialization.exception.handler` during restore (cf. KAFKA-8037) > -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] MINOR: Update LICENSE-binary file [kafka]
mimaison merged PR #15322: URL: https://github.com/apache/kafka/pull/15322 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-16229: Fix slow expired producer id deletion [kafka]
jolshan commented on PR #15324: URL: https://github.com/apache/kafka/pull/15324#issuecomment-1930479050 Hey @jeqo thanks for taking a look and improving this area! Can we add the benchmarks from the ticket to the PR description? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Resolved] (KAFKA-16133) Commits during reconciliation always time out
[ https://issues.apache.org/jira/browse/KAFKA-16133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lianet Magrans resolved KAFKA-16133. Fix Version/s: 3.7.0 (was: 3.8.0) Resolution: Fixed > Commits during reconciliation always time out > - > > Key: KAFKA-16133 > URL: https://issues.apache.org/jira/browse/KAFKA-16133 > Project: Kafka > Issue Type: Bug > Components: clients, consumer >Affects Versions: 3.7.0 >Reporter: Lucas Brutschy >Assignee: Lianet Magrans >Priority: Critical > Labels: consumer-threading-refactor, reconciliation, timeout > Fix For: 3.7.0 > > > This only affects the AsyncKafkaConsumer, which is in Preview in 3.7. > In MembershipManagerImpl there is a confusion between timeouts and deadlines. > [https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/internals/MembershipManagerImpl.java#L836C38-L836C38] > This causes all autocommits during reconciliation to immediately time out. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-16229) Slow expiration of Producer IDs leading to high CPU usage
[ https://issues.apache.org/jira/browse/KAFKA-16229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jorge Esteban Quilcate Otoya updated KAFKA-16229: - Description: Expiration of ProducerIds is implemented with a slow removal of map keys: ``` producers.keySet().removeAll(keys); ``` Unnecessarily going through all producer ids and then throw all expired keys to be removed. This leads to exponential time on worst case when most/all keys need to be removed: ``` Benchmark (numProducerIds) Mode Cnt Score Error Units ProducerStateManagerBench.testDeleteExpiringIds 100 avgt 3 9164.043 ± 10647.877 ns/op ProducerStateManagerBench.testDeleteExpiringIds 1000 avgt 3 341561.093 ± 20283.211 ns/op ProducerStateManagerBench.testDeleteExpiringIds 1 avgt 3 44957983.550 ± 9389011.290 ns/op ProducerStateManagerBench.testDeleteExpiringIds 10 avgt 3 5683374164.167 ± 1446242131.466 ns/op ``` A simple fix is to use map#remove(key) instead, leading to a more linear growth: ``` Benchmark (numProducerIds) Mode Cnt Score Error Units ProducerStateManagerBench.testDeleteExpiringIds 100 avgt 3 5779.056 ± 651.389 ns/op ProducerStateManagerBench.testDeleteExpiringIds 1000 avgt 3 61430.530 ± 21875.644 ns/op ProducerStateManagerBench.testDeleteExpiringIds 1 avgt 3 643887.031 ± 600475.302 ns/op ProducerStateManagerBench.testDeleteExpiringIds 10 avgt 3 7741689.539 ± 3218317.079 ns/op ``` was: Expiration of ProducerIds is implemented with a slow removal of map keys: ``` producers.keySet().removeAll(keys); ``` Unnecessarily going through all producer ids and then throw all expired keys to be removed. This leads to exponential time on worst case when most/all keys need to be removed: ``` Benchmark (numProducerIds) Mode Cnt Score Error Units ProducerStateManagerBench.testDeleteExpiringIds 100 avgt 3 9164.043 ± 10647.877 ns/op ProducerStateManagerBench.testDeleteExpiringIds 1000 avgt 3 341561.093 ± 20283.211 ns/op ProducerStateManagerBench.testDeleteExpiringIds 1 avgt 3 44957983.550 ± 9389011.290 ns/op ProducerStateManagerBench.testDeleteExpiringIds 10 avgt 3 5683374164.167 ± 1446242131.466 ns/op ``` A simple fix is to use map#remove(key) instead, leading to a more linear growth: ``` Benchmark(numProducerIds) Mode Cnt Score Error Units ProducerStateManagerBench.testDeleteExpiringIds 100 avgt3 5779.056 ± 651.389 ns/op ProducerStateManagerBench.testDeleteExpiringIds 1000 avgt3 61430.530 ± 21875.644 ns/op ProducerStateManagerBench.testDeleteExpiringIds 1 avgt3 643887.031 ± 600475.302 ns/op ProducerStateManagerBench.testDeleteExpiringIds10 avgt3 7741689.539 ± 3218317.079 ns/op ``` > Slow expiration of Producer IDs leading to high CPU usage > - > > Key: KAFKA-16229 > URL: https://issues.apache.org/jira/browse/KAFKA-16229 > Project: Kafka > Issue Type: Bug >Reporter: Jorge Esteban Quilcate Otoya >Assignee: Jorge Esteban Quilcate Otoya >Priority: Major > > Expiration of ProducerIds is implemented with a slow removal of map keys: > ``` > producers.keySet().removeAll(keys); > ``` > Unnecessarily going through all producer ids and then throw all expired keys > to be removed. > This leads to exponential time on worst case when most/all keys need to be > removed: > ``` > Benchmark (numProducerIds) Mode Cnt > Score Error Units > ProducerStateManagerBench.testDeleteExpiringIds 100 avgt 3 > 9164.043 ± 10647.877 ns/op > ProducerStateManagerBench.testDeleteExpiringIds 1000 avgt 3 > 341561.093 ± 20283.211 ns/op > ProducerStateManagerBench.testDeleteExpiringIds 1 avgt 3 > 44957983.550 ± 9389011.290 ns/op > ProducerStateManagerBench.testDeleteExpiringIds 10 avgt 3 > 5683374164.167 ± 1446242131.466 ns/op > ``` > A simple fix is to use map#remove(key) instead, leading to a more linear > growth: > ``` > Benchmark (numProducerIds) Mode Cnt > Score Error Units > ProducerStateManagerBench.testDeleteExpiringIds
[PR] KAFKA-16229: Fix slow expired producer id deletion [kafka]
jeqo opened a new pull request, #15324: URL: https://github.com/apache/kafka/pull/15324 [[KAFKA-16229](https://issues.apache.org/jira/browse/KAFKA-16229)] ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-16215; KAFKA-16178: Fix member not rejoining after error [kafka]
AndrewJSchofield commented on code in PR #15311: URL: https://github.com/apache/kafka/pull/15311#discussion_r1480305816 ## clients/src/main/java/org/apache/kafka/clients/consumer/internals/RequestState.java: ## @@ -106,6 +106,13 @@ public void onSendAttempt(final long currentTimeMs) { this.lastSentMs = currentTimeMs; } +/** + * Update the lastReceivedTime in milliseconds, indicating that a response has been received. + */ +public void updateLastReceivedTime(final long lastReceivedMs) { +this.lastReceivedMs = lastReceivedMs; Review Comment: The implementation of exponential backoff for KIP-580 introduced the idea of "equivalent responses". The idea is that exponential backoff applies when metadata responses are "equivalent", but when things are progressing, the backoff is avoided. I think a similar principle could be helpful here. Essentially, if "progress" is being made such as a coordinator change, do not apply the backoff. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (KAFKA-16229) Slow expiration of Producer IDs leading to high CPU usage
Jorge Esteban Quilcate Otoya created KAFKA-16229: Summary: Slow expiration of Producer IDs leading to high CPU usage Key: KAFKA-16229 URL: https://issues.apache.org/jira/browse/KAFKA-16229 Project: Kafka Issue Type: Bug Reporter: Jorge Esteban Quilcate Otoya Assignee: Jorge Esteban Quilcate Otoya Expiration of ProducerIds is implemented with a slow removal of map keys: ``` producers.keySet().removeAll(keys); ``` Unnecessarily going through all producer ids and then throw all expired keys to be removed. This leads to exponential time on worst case when most/all keys need to be removed: ``` Benchmark (numProducerIds) Mode Cnt Score Error Units ProducerStateManagerBench.testDeleteExpiringIds 100 avgt 3 9164.043 ± 10647.877 ns/op ProducerStateManagerBench.testDeleteExpiringIds 1000 avgt 3 341561.093 ± 20283.211 ns/op ProducerStateManagerBench.testDeleteExpiringIds 1 avgt 3 44957983.550 ± 9389011.290 ns/op ProducerStateManagerBench.testDeleteExpiringIds 10 avgt 3 5683374164.167 ± 1446242131.466 ns/op ``` A simple fix is to use map#remove(key) instead, leading to a more linear growth: ``` Benchmark(numProducerIds) Mode Cnt Score Error Units ProducerStateManagerBench.testDeleteExpiringIds 100 avgt3 5779.056 ± 651.389 ns/op ProducerStateManagerBench.testDeleteExpiringIds 1000 avgt3 61430.530 ± 21875.644 ns/op ProducerStateManagerBench.testDeleteExpiringIds 1 avgt3 643887.031 ± 600475.302 ns/op ProducerStateManagerBench.testDeleteExpiringIds10 avgt3 7741689.539 ± 3218317.079 ns/op ``` -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-16228) Add --remote-log-metadata-decoder to kafka-dump-log.sh
[ https://issues.apache.org/jira/browse/KAFKA-16228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Federico Valeri updated KAFKA-16228: Description: It would be good to improve the kafka-dump-log.sh tool adding a decode flag for __remote_log_metadata records. Something like the following would be useful for debugging. {code} bin/kafka-dump-log.sh --remote-log-metadata-decoder --files /opt/kafka/data/__remote_log_metadata-0/.log {code} was: It would be good to improve the kafka-dump-log.sh tool adding a decode flags for __remote_log_metadata records. Something like the following would be useful for debugging. {code} bin/kafka-dump-log.sh --remote-log-metadata-decoder --files /opt/kafka/data/__remote_log_metadata-0/.log {code} > Add --remote-log-metadata-decoder to kafka-dump-log.sh > -- > > Key: KAFKA-16228 > URL: https://issues.apache.org/jira/browse/KAFKA-16228 > Project: Kafka > Issue Type: New Feature > Components: Tiered-Storage >Affects Versions: 3.6.1 >Reporter: Federico Valeri >Priority: Major > > It would be good to improve the kafka-dump-log.sh tool adding a decode flag > for __remote_log_metadata records. Something like the following would be > useful for debugging. > {code} > bin/kafka-dump-log.sh --remote-log-metadata-decoder --files > /opt/kafka/data/__remote_log_metadata-0/.log > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16228) Add --remote-log-metadata-decoder to kafka-dump-log.sh
Federico Valeri created KAFKA-16228: --- Summary: Add --remote-log-metadata-decoder to kafka-dump-log.sh Key: KAFKA-16228 URL: https://issues.apache.org/jira/browse/KAFKA-16228 Project: Kafka Issue Type: New Feature Components: Tiered-Storage Affects Versions: 3.6.1 Reporter: Federico Valeri It would be good to improve the kafka-dump-log.sh tool adding a decode flags for __remote_log_metadata records. Something like the following would be useful for debugging. {code} bin/kafka-dump-log.sh --remote-log-metadata-decoder --files /opt/kafka/data/__remote_log_metadata-0/.log {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] KAFKA-16215; KAFKA-16178: Fix member not rejoining after error [kafka]
lianetm commented on code in PR #15311: URL: https://github.com/apache/kafka/pull/15311#discussion_r1480240755 ## clients/src/main/java/org/apache/kafka/clients/consumer/internals/RequestState.java: ## @@ -106,6 +106,13 @@ public void onSendAttempt(final long currentTimeMs) { this.lastSentMs = currentTimeMs; } +/** + * Update the lastReceivedTime in milliseconds, indicating that a response has been received. + */ +public void updateLastReceivedTime(final long lastReceivedMs) { +this.lastReceivedMs = lastReceivedMs; Review Comment: Thanks for the feedback. Agree with @dajac's point that we might be carrying on with a non-desired backoff by leaving it unchanged in this situation, so we definitely need to be setting one, and as @AndrewJSchofield said, all errors would update the last receive time and apply "some" backoff. That leads to deciding which one, and my intention was 0 backoff in a subset of errors. As I see it, there are some specific errors in HB where we don't just retry the same request, but rather do a different one, so it makes sense to skip backoff as an optimization, ex. 1. HB to rejoin as new member after fence error. We would send it right away after the fence, skipping backoff 2. HB to a new coordinator after not_coordinator error. We would send it as soon as the new coordinator is discovered, without applying any backoff. Makes sense? I will update to make sure we always set a backoff, but with custom 0 backoff for these cases as the initial intention was, if it makes sense. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (KAFKA-16227) Console consumer fails with `IllegalStateException`
[ https://issues.apache.org/jira/browse/KAFKA-16227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17814894#comment-17814894 ] Lianet Magrans commented on KAFKA-16227: [~kirktrue] Seems like a similar challenge came out in the context of KIP-951, might be helpful https://issues.apache.org/jira/browse/KAFKA-15824 when we get to this one > Console consumer fails with `IllegalStateException` > --- > > Key: KAFKA-16227 > URL: https://issues.apache.org/jira/browse/KAFKA-16227 > Project: Kafka > Issue Type: Sub-task > Components: clients >Affects Versions: 3.7.0 >Reporter: David Jacot >Assignee: Kirk True >Priority: Major > > I have seen a few occurrences like the following one. There is a race between > the background thread and the foreground thread. I imagine the following > steps: > * quickstart-events-2 is assigned by the background thread; > * the foreground thread starts the initialization of the partition (e.g. > reset offset); > * quickstart-events-2 is removed by the background thread; > * the initialization completes and quickstart-events-2 does not exist > anymore. > > {code:java} > [2024-02-06 16:21:57,375] ERROR Error processing message, terminating > consumer process: (kafka.tools.ConsoleConsumer$) > java.lang.IllegalStateException: No current assignment for partition > quickstart-events-2 > at > org.apache.kafka.clients.consumer.internals.SubscriptionState.assignedState(SubscriptionState.java:367) > at > org.apache.kafka.clients.consumer.internals.SubscriptionState.updateHighWatermark(SubscriptionState.java:579) > at > org.apache.kafka.clients.consumer.internals.FetchCollector.handleInitializeSuccess(FetchCollector.java:283) > at > org.apache.kafka.clients.consumer.internals.FetchCollector.initialize(FetchCollector.java:226) > at > org.apache.kafka.clients.consumer.internals.FetchCollector.collectFetch(FetchCollector.java:110) > at > org.apache.kafka.clients.consumer.internals.AsyncKafkaConsumer.collectFetch(AsyncKafkaConsumer.java:1540) > at > org.apache.kafka.clients.consumer.internals.AsyncKafkaConsumer.pollForFetches(AsyncKafkaConsumer.java:1525) > at > org.apache.kafka.clients.consumer.internals.AsyncKafkaConsumer.poll(AsyncKafkaConsumer.java:711) > at > org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:874) > at > kafka.tools.ConsoleConsumer$ConsumerWrapper.receive(ConsoleConsumer.scala:473) > at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:103) > at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:77) > at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:54) > at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala) {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] KAFKA-14822: Allow restricting File and Directory ConfigProviders to specific paths [kafka]
tinaselenge commented on code in PR #14995: URL: https://github.com/apache/kafka/pull/14995#discussion_r1480171669 ## clients/src/main/java/org/apache/kafka/common/config/provider/FileConfigProvider.java: ## @@ -40,7 +42,13 @@ public class FileConfigProvider implements ConfigProvider { private static final Logger log = LoggerFactory.getLogger(FileConfigProvider.class); +public static final String ALLOWED_PATHS_CONFIG = "allowed.paths"; +public static final String ALLOWED_PATHS_DOC = "A comma separated list of paths that this config provider is " + +"allowed to access. If not set, all paths are allowed."; +private AllowedPaths allowedPaths = null; Review Comment: @gharris1727 you are right. The mock classes extending the provider class needed to be updated as they didn't call `configure()` causing test failures. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (KAFKA-16227) Console consumer fails with `IllegalStateException`
[ https://issues.apache.org/jira/browse/KAFKA-16227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Jacot updated KAFKA-16227: Affects Version/s: 3.7.0 > Console consumer fails with `IllegalStateException` > --- > > Key: KAFKA-16227 > URL: https://issues.apache.org/jira/browse/KAFKA-16227 > Project: Kafka > Issue Type: Sub-task > Components: clients >Affects Versions: 3.7.0 >Reporter: David Jacot >Assignee: Kirk True >Priority: Major > > I have seen a few occurrences like the following one. There is a race between > the background thread and the foreground thread. I imagine the following > steps: > * quickstart-events-2 is assigned by the background thread; > * the foreground thread starts the initialization of the partition (e.g. > reset offset); > * quickstart-events-2 is removed by the background thread; > * the initialization completes and quickstart-events-2 does not exist > anymore. > > {code:java} > [2024-02-06 16:21:57,375] ERROR Error processing message, terminating > consumer process: (kafka.tools.ConsoleConsumer$) > java.lang.IllegalStateException: No current assignment for partition > quickstart-events-2 > at > org.apache.kafka.clients.consumer.internals.SubscriptionState.assignedState(SubscriptionState.java:367) > at > org.apache.kafka.clients.consumer.internals.SubscriptionState.updateHighWatermark(SubscriptionState.java:579) > at > org.apache.kafka.clients.consumer.internals.FetchCollector.handleInitializeSuccess(FetchCollector.java:283) > at > org.apache.kafka.clients.consumer.internals.FetchCollector.initialize(FetchCollector.java:226) > at > org.apache.kafka.clients.consumer.internals.FetchCollector.collectFetch(FetchCollector.java:110) > at > org.apache.kafka.clients.consumer.internals.AsyncKafkaConsumer.collectFetch(AsyncKafkaConsumer.java:1540) > at > org.apache.kafka.clients.consumer.internals.AsyncKafkaConsumer.pollForFetches(AsyncKafkaConsumer.java:1525) > at > org.apache.kafka.clients.consumer.internals.AsyncKafkaConsumer.poll(AsyncKafkaConsumer.java:711) > at > org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:874) > at > kafka.tools.ConsoleConsumer$ConsumerWrapper.receive(ConsoleConsumer.scala:473) > at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:103) > at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:77) > at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:54) > at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala) {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] KAFKA-15561: Client support for new SubscriptionPattern based subscription [kafka]
Phuc-Hong-Tran commented on PR #15188: URL: https://github.com/apache/kafka/pull/15188#issuecomment-1930102760 @cadonna Just for clarification, when we were talking about "implement and test everything up to the point where the field is populated", does that mean we're not gonna implement and test the part where the client receive the assignment from broker at this stage? (I'm mostly blocked at this part) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-15761: KRaft support in EpochDrivenReplicationProtocolAcceptanceTest [kafka]
highluck commented on PR #15295: URL: https://github.com/apache/kafka/pull/15295#issuecomment-1930100354 @mimaison I tried this and that, but I think I need to study a little more. After finishing this, would it be okay to do follow-up PR work on the `shouldFollowLeaderEpochBasicWorkflow` function? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (KAFKA-16226) Java client: Performance regression in Trogdor benchmark with high partition counts
[ https://issues.apache.org/jira/browse/KAFKA-16226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Shekhar Narula updated KAFKA-16226: -- Description: h1. Background https://issues.apache.org/jira/browse/KAFKA-15415 implemented optimisation in java-client to skip backoff period if client knows of a newer leader, for produce-batch being retried. h1. What changed The implementation introduced a regression noticed on a trogdor-benchmark running with high partition counts(36000!). With regression, following metrics changed on the produce side. # record-queue-time-avg: increased from 20ms to 30ms. # request-latency-avg: increased from 50ms to 100ms. h1. Why it happened As can be seen from the original [PR|https://github.com/apache/kafka/pull/14384] RecordAccmulator.partitionReady() & drainBatchesForOneNode() started using synchronised method Metadata.currentLeader(). This has led to increased synchronization between KafkaProducer's application-thread that call send(), and background-thread that actively send producer-batches to leaders. See lock profiles that clearly show increased synchronisation in KAFKA-15415 PR(highlighted in {color:#de350b}Red{color}) Vs baseline. Note the synchronisation is much worse for paritionReady() in this benchmark as its called for each partition, and it has 36k partitions! h3. Lock Profile: Kafka-15415 !kafka_15415_lock_profile.png! h3. Lock Profile: Baseline !baseline_lock_profile.png! h1. Fix Synchronization has to be reduced between 2 threads in order to address this. [https://github.com/apache/kafka/pull/15323] is a fix for it, as it avoids using Metadata.currentLeader() instead rely on Cluster.leaderFor(). With the fix, lock-profile & metrics are similar to baseline. was: h1. Background https://issues.apache.org/jira/browse/KAFKA-15415 implemented optimisation in java-client to skip backoff period if client knows of a newer leader, for produce-batch being retried. h1. What changed The implementation introduced a regression noticed on a trogdor-benchmark running with high partition counts(36000!). With regression, following metrics changed on the produce side. # record-queue-time-avg: increased from 20ms to 30ms. # request-latency-avg: increased from 50ms to 100ms. h1. Why it happened As can be seen from the original [PR|https://github.com/apache/kafka/pull/14384] RecordAccmulator.partitionReady() & drainBatchesForOneNode() started using synchronised method Metadata.currentLeader(). This has led to increased synchronization between KafkaProducer's application-thread that call send(), and background-thread that actively send producer-batches to leaders. See lock profiles that clearly show increased synchronisation in KAFKA-15415 PR(highlighted in {color:#de350b}Red{color}) Vs baseline. Note the synchronisation is much worse for paritionReady() in this benchmark as its called for each partition, and it has 36k partitions! h3. Lock Profile: Kafka-15415 !kafka_15415_lock_profile.png! h3. Lock Profile: Baseline !baseline_lock_profile.png! h1. Fix Synchronization has to be reduced between 2 threads in order to address this. [https://github.com/apache/kafka/pull/15323] is a fix for it, as it avoids using Metadata.currentLeader() instead rely on Cluster.leaderFor(). > Java client: Performance regression in Trogdor benchmark with high partition > counts > --- > > Key: KAFKA-16226 > URL: https://issues.apache.org/jira/browse/KAFKA-16226 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 3.7.0, 3.6.1 >Reporter: Mayank Shekhar Narula >Assignee: Mayank Shekhar Narula >Priority: Major > Labels: kip-951 > Fix For: 3.6.2, 3.8.0, 3.7.1 > > Attachments: baseline_lock_profile.png, kafka_15415_lock_profile.png > > > h1. Background > https://issues.apache.org/jira/browse/KAFKA-15415 implemented optimisation in > java-client to skip backoff period if client knows of a newer leader, for > produce-batch being retried. > h1. What changed > The implementation introduced a regression noticed on a trogdor-benchmark > running with high partition counts(36000!). > With regression, following metrics changed on the produce side. > # record-queue-time-avg: increased from 20ms to 30ms. > # request-latency-avg: increased from 50ms to 100ms. > h1. Why it happened > As can be seen from the original > [PR|https://github.com/apache/kafka/pull/14384] > RecordAccmulator.partitionReady() & drainBatchesForOneNode() started using > synchronised method Metadata.currentLeader(). This has led to increased > synchronization between KafkaProducer's application-thread that call send(), > and background-thread that actively send
[jira] [Updated] (KAFKA-16226) Java client: Performance regression in Trogdor benchmark with high partition counts
[ https://issues.apache.org/jira/browse/KAFKA-16226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Shekhar Narula updated KAFKA-16226: -- Description: h1. Background https://issues.apache.org/jira/browse/KAFKA-15415 implemented optimisation in java-client to skip backoff period if client knows of a newer leader, for produce-batch being retried. h1. What changed The implementation introduced a regression noticed on a trogdor-benchmark running with high partition counts(36000!). With regression, following metrics changed on the produce side. # record-queue-time-avg: increased from 20ms to 30ms. # request-latency-avg: increased from 50ms to 100ms. h1. Why it happened As can be seen from the original [PR|https://github.com/apache/kafka/pull/14384] RecordAccmulator.partitionReady() & drainBatchesForOneNode() started using synchronised method Metadata.currentLeader(). This has led to increased synchronization between KafkaProducer's application-thread that call send(), and background-thread that actively send producer-batches to leaders. See lock profiles that clearly show increased synchronisation in KAFKA-15415 PR(highlighted in {color:#de350b}Red{color}) Vs baseline. Note the synchronisation is much worse for paritionReady() in this benchmark as its called for each partition, and it has 36k partitions! h3. Lock Profile: Kafka-15415 !kafka_15415_lock_profile.png! h3. Lock Profile: Baseline !baseline_lock_profile.png! h1. Fix Synchronization has to be reduced between 2 threads in order to address this. [https://github.com/apache/kafka/pull/15323] is a fix for it, as it avoids using Metadata.currentLeader() instead rely on Cluster.leaderFor(). was: h1. Background https://issues.apache.org/jira/browse/KAFKA-15415 implemented optimisation in java-client to skip backoff period if client knows of a newer leader, for produce-batch being retried. h1. What changed The implementation introduced a regression noticed on a trogdor-benchmark running with high partition counts(36000!). With regression, following metrics changed on the produce side. # record-queue-time-avg: increased from 20ms to 30ms. # request-latency-avg: increased from 50ms to 100ms. h1. Why it happened As can be seen from the original [PR|https://github.com/apache/kafka/pull/14384] RecordAccmulator.partitionReady() & drainBatchesForOneNode() started using synchronised method Metadata.currentLeader(). This has led to increased synchronization between KafkaProducer's application-thread that call send(), and background-thread that actively send producer-batches to leaders. See lock profiles that clearly show increased synchronisation in KAFKA-15415 PR(highlighted in {color:#de350b}Red{color}) Vs baseline. Note the synchronisation is much worse for paritionReady() in this benchmark as its called for each partition, and it has 36k partitions! h2. Lock Profile: Kafka-15415 !kafka_15415_lock_profile.png! h2. Lock Profile: Baseline !baseline_lock_profile.png! h1. Fix Synchronization has to be reduced between 2 threads in order to address this. [https://github.com/apache/kafka/pull/15323] is a fix for it, as it avoids using Metadata.currentLeader() instead rely on Cluster.leaderFor(). > Java client: Performance regression in Trogdor benchmark with high partition > counts > --- > > Key: KAFKA-16226 > URL: https://issues.apache.org/jira/browse/KAFKA-16226 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 3.7.0, 3.6.1 >Reporter: Mayank Shekhar Narula >Assignee: Mayank Shekhar Narula >Priority: Major > Labels: kip-951 > Fix For: 3.6.2, 3.8.0, 3.7.1 > > Attachments: baseline_lock_profile.png, kafka_15415_lock_profile.png > > > h1. Background > https://issues.apache.org/jira/browse/KAFKA-15415 implemented optimisation in > java-client to skip backoff period if client knows of a newer leader, for > produce-batch being retried. > h1. What changed > The implementation introduced a regression noticed on a trogdor-benchmark > running with high partition counts(36000!). > With regression, following metrics changed on the produce side. > # record-queue-time-avg: increased from 20ms to 30ms. > # request-latency-avg: increased from 50ms to 100ms. > h1. Why it happened > As can be seen from the original > [PR|https://github.com/apache/kafka/pull/14384] > RecordAccmulator.partitionReady() & drainBatchesForOneNode() started using > synchronised method Metadata.currentLeader(). This has led to increased > synchronization between KafkaProducer's application-thread that call send(), > and background-thread that actively send producer-batches to leaders. > See lock profiles that clearly show
[jira] [Updated] (KAFKA-16226) Java client: Performance regression in Trogdor benchmark with high partition counts
[ https://issues.apache.org/jira/browse/KAFKA-16226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Shekhar Narula updated KAFKA-16226: -- Attachment: baseline_lock_profile.png > Java client: Performance regression in Trogdor benchmark with high partition > counts > --- > > Key: KAFKA-16226 > URL: https://issues.apache.org/jira/browse/KAFKA-16226 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 3.7.0, 3.6.1 >Reporter: Mayank Shekhar Narula >Assignee: Mayank Shekhar Narula >Priority: Major > Labels: kip-951 > Fix For: 3.6.2, 3.8.0, 3.7.1 > > Attachments: baseline_lock_profile.png, kafka_15415_lock_profile.png > > > h1. Background > https://issues.apache.org/jira/browse/KAFKA-15415 implemented optimisation in > java-client to skip backoff period if client knows of a newer leader, for > produce-batch being retried. > h1. What changed > The implementation introduced a regression noticed on a trogdor-benchmark > running with high partition counts(36000!). > With regression, following metrics changed on the produce side. > # record-queue-time-avg: increased from 20ms to 30ms. > # request-latency-avg: increased from 50ms to 100ms. > h1. Why it happened > As can be seen from the original > [PR|https://github.com/apache/kafka/pull/14384] > RecordAccmulator.partitionReady() & drainBatchesForOneNode() started using > synchronised method Metadata.currentLeader(). This has led to increased > synchronization between KafkaProducer's application-thread that call send(), > and background-thread that actively send producer-batches to leaders. > See lock profiles that clearly show increased synchronisation in KAFKA-15415 > PR(highlighted in {color:#de350b}Red{color}) Vs baseline. Note the > synchronisation is much worse for paritionReady() in this benchmark as its > called for each partition, and it has 36k partitions! > h2. Lock Profile: Kafka-15415 > !Screenshot 2024-02-01 at 11.06.36.png! > h2. Lock Profile: Baseline > !image-20240201-105752.png! > h1. Fix > Synchronization has to be reduced between 2 threads in order to address this. > [https://github.com/apache/kafka/pull/15323] is a fix for it, as it avoids > using Metadata.currentLeader() instead rely on Cluster.leaderFor(). > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-16226) Java client: Performance regression in Trogdor benchmark with high partition counts
[ https://issues.apache.org/jira/browse/KAFKA-16226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Shekhar Narula updated KAFKA-16226: -- Description: h1. Background https://issues.apache.org/jira/browse/KAFKA-15415 implemented optimisation in java-client to skip backoff period if client knows of a newer leader, for produce-batch being retried. h1. What changed The implementation introduced a regression noticed on a trogdor-benchmark running with high partition counts(36000!). With regression, following metrics changed on the produce side. # record-queue-time-avg: increased from 20ms to 30ms. # request-latency-avg: increased from 50ms to 100ms. h1. Why it happened As can be seen from the original [PR|https://github.com/apache/kafka/pull/14384] RecordAccmulator.partitionReady() & drainBatchesForOneNode() started using synchronised method Metadata.currentLeader(). This has led to increased synchronization between KafkaProducer's application-thread that call send(), and background-thread that actively send producer-batches to leaders. See lock profiles that clearly show increased synchronisation in KAFKA-15415 PR(highlighted in {color:#de350b}Red{color}) Vs baseline. Note the synchronisation is much worse for paritionReady() in this benchmark as its called for each partition, and it has 36k partitions! h2. Lock Profile: Kafka-15415 !kafka_15415_lock_profile.png! h2. Lock Profile: Baseline !baseline_lock_profile.png! h1. Fix Synchronization has to be reduced between 2 threads in order to address this. [https://github.com/apache/kafka/pull/15323] is a fix for it, as it avoids using Metadata.currentLeader() instead rely on Cluster.leaderFor(). was: h1. Background https://issues.apache.org/jira/browse/KAFKA-15415 implemented optimisation in java-client to skip backoff period if client knows of a newer leader, for produce-batch being retried. h1. What changed The implementation introduced a regression noticed on a trogdor-benchmark running with high partition counts(36000!). With regression, following metrics changed on the produce side. # record-queue-time-avg: increased from 20ms to 30ms. # request-latency-avg: increased from 50ms to 100ms. h1. Why it happened As can be seen from the original [PR|https://github.com/apache/kafka/pull/14384] RecordAccmulator.partitionReady() & drainBatchesForOneNode() started using synchronised method Metadata.currentLeader(). This has led to increased synchronization between KafkaProducer's application-thread that call send(), and background-thread that actively send producer-batches to leaders. See lock profiles that clearly show increased synchronisation in KAFKA-15415 PR(highlighted in {color:#de350b}Red{color}) Vs baseline. Note the synchronisation is much worse for paritionReady() in this benchmark as its called for each partition, and it has 36k partitions! h2. Lock Profile: Kafka-15415 !Screenshot 2024-02-01 at 11.06.36.png! h2. Lock Profile: Baseline !image-20240201-105752.png! h1. Fix Synchronization has to be reduced between 2 threads in order to address this. [https://github.com/apache/kafka/pull/15323] is a fix for it, as it avoids using Metadata.currentLeader() instead rely on Cluster.leaderFor(). > Java client: Performance regression in Trogdor benchmark with high partition > counts > --- > > Key: KAFKA-16226 > URL: https://issues.apache.org/jira/browse/KAFKA-16226 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 3.7.0, 3.6.1 >Reporter: Mayank Shekhar Narula >Assignee: Mayank Shekhar Narula >Priority: Major > Labels: kip-951 > Fix For: 3.6.2, 3.8.0, 3.7.1 > > Attachments: baseline_lock_profile.png, kafka_15415_lock_profile.png > > > h1. Background > https://issues.apache.org/jira/browse/KAFKA-15415 implemented optimisation in > java-client to skip backoff period if client knows of a newer leader, for > produce-batch being retried. > h1. What changed > The implementation introduced a regression noticed on a trogdor-benchmark > running with high partition counts(36000!). > With regression, following metrics changed on the produce side. > # record-queue-time-avg: increased from 20ms to 30ms. > # request-latency-avg: increased from 50ms to 100ms. > h1. Why it happened > As can be seen from the original > [PR|https://github.com/apache/kafka/pull/14384] > RecordAccmulator.partitionReady() & drainBatchesForOneNode() started using > synchronised method Metadata.currentLeader(). This has led to increased > synchronization between KafkaProducer's application-thread that call send(), > and background-thread that actively send producer-batches to leaders. > See lock profiles that clearly
[jira] [Updated] (KAFKA-16226) Java client: Performance regression in Trogdor benchmark with high partition counts
[ https://issues.apache.org/jira/browse/KAFKA-16226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Shekhar Narula updated KAFKA-16226: -- Attachment: kafka_15415_lock_profile.png > Java client: Performance regression in Trogdor benchmark with high partition > counts > --- > > Key: KAFKA-16226 > URL: https://issues.apache.org/jira/browse/KAFKA-16226 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 3.7.0, 3.6.1 >Reporter: Mayank Shekhar Narula >Assignee: Mayank Shekhar Narula >Priority: Major > Labels: kip-951 > Fix For: 3.6.2, 3.8.0, 3.7.1 > > Attachments: baseline_lock_profile.png, kafka_15415_lock_profile.png > > > h1. Background > https://issues.apache.org/jira/browse/KAFKA-15415 implemented optimisation in > java-client to skip backoff period if client knows of a newer leader, for > produce-batch being retried. > h1. What changed > The implementation introduced a regression noticed on a trogdor-benchmark > running with high partition counts(36000!). > With regression, following metrics changed on the produce side. > # record-queue-time-avg: increased from 20ms to 30ms. > # request-latency-avg: increased from 50ms to 100ms. > h1. Why it happened > As can be seen from the original > [PR|https://github.com/apache/kafka/pull/14384] > RecordAccmulator.partitionReady() & drainBatchesForOneNode() started using > synchronised method Metadata.currentLeader(). This has led to increased > synchronization between KafkaProducer's application-thread that call send(), > and background-thread that actively send producer-batches to leaders. > See lock profiles that clearly show increased synchronisation in KAFKA-15415 > PR(highlighted in {color:#de350b}Red{color}) Vs baseline. Note the > synchronisation is much worse for paritionReady() in this benchmark as its > called for each partition, and it has 36k partitions! > h2. Lock Profile: Kafka-15415 > !Screenshot 2024-02-01 at 11.06.36.png! > h2. Lock Profile: Baseline > !image-20240201-105752.png! > h1. Fix > Synchronization has to be reduced between 2 threads in order to address this. > [https://github.com/apache/kafka/pull/15323] is a fix for it, as it avoids > using Metadata.currentLeader() instead rely on Cluster.leaderFor(). > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-16226) Java client: Performance regression in Trogdor benchmark with high partition counts
[ https://issues.apache.org/jira/browse/KAFKA-16226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Shekhar Narula updated KAFKA-16226: -- Attachment: (was: image-20240201-105752.png) > Java client: Performance regression in Trogdor benchmark with high partition > counts > --- > > Key: KAFKA-16226 > URL: https://issues.apache.org/jira/browse/KAFKA-16226 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 3.7.0, 3.6.1 >Reporter: Mayank Shekhar Narula >Assignee: Mayank Shekhar Narula >Priority: Major > Labels: kip-951 > Fix For: 3.6.2, 3.8.0, 3.7.1 > > > h1. Background > https://issues.apache.org/jira/browse/KAFKA-15415 implemented optimisation in > java-client to skip backoff period if client knows of a newer leader, for > produce-batch being retried. > h1. What changed > The implementation introduced a regression noticed on a trogdor-benchmark > running with high partition counts(36000!). > With regression, following metrics changed on the produce side. > # record-queue-time-avg: increased from 20ms to 30ms. > # request-latency-avg: increased from 50ms to 100ms. > h1. Why it happened > As can be seen from the original > [PR|https://github.com/apache/kafka/pull/14384] > RecordAccmulator.partitionReady() & drainBatchesForOneNode() started using > synchronised method Metadata.currentLeader(). This has led to increased > synchronization between KafkaProducer's application-thread that call send(), > and background-thread that actively send producer-batches to leaders. > See lock profiles that clearly show increased synchronisation in KAFKA-15415 > PR(highlighted in {color:#de350b}Red{color}) Vs baseline. Note the > synchronisation is much worse for paritionReady() in this benchmark as its > called for each partition, and it has 36k partitions! > h2. Lock Profile: Kafka-15415 > !Screenshot 2024-02-01 at 11.06.36.png! > h2. Lock Profile: Baseline > !image-20240201-105752.png! > h1. Fix > Synchronization has to be reduced between 2 threads in order to address this. > [https://github.com/apache/kafka/pull/15323] is a fix for it, as it avoids > using Metadata.currentLeader() instead rely on Cluster.leaderFor(). > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-16226) Java client: Performance regression in Trogdor benchmark with high partition counts
[ https://issues.apache.org/jira/browse/KAFKA-16226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Shekhar Narula updated KAFKA-16226: -- Attachment: (was: Screenshot 2024-02-01 at 11.06.36.png) > Java client: Performance regression in Trogdor benchmark with high partition > counts > --- > > Key: KAFKA-16226 > URL: https://issues.apache.org/jira/browse/KAFKA-16226 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 3.7.0, 3.6.1 >Reporter: Mayank Shekhar Narula >Assignee: Mayank Shekhar Narula >Priority: Major > Labels: kip-951 > Fix For: 3.6.2, 3.8.0, 3.7.1 > > > h1. Background > https://issues.apache.org/jira/browse/KAFKA-15415 implemented optimisation in > java-client to skip backoff period if client knows of a newer leader, for > produce-batch being retried. > h1. What changed > The implementation introduced a regression noticed on a trogdor-benchmark > running with high partition counts(36000!). > With regression, following metrics changed on the produce side. > # record-queue-time-avg: increased from 20ms to 30ms. > # request-latency-avg: increased from 50ms to 100ms. > h1. Why it happened > As can be seen from the original > [PR|https://github.com/apache/kafka/pull/14384] > RecordAccmulator.partitionReady() & drainBatchesForOneNode() started using > synchronised method Metadata.currentLeader(). This has led to increased > synchronization between KafkaProducer's application-thread that call send(), > and background-thread that actively send producer-batches to leaders. > See lock profiles that clearly show increased synchronisation in KAFKA-15415 > PR(highlighted in {color:#de350b}Red{color}) Vs baseline. Note the > synchronisation is much worse for paritionReady() in this benchmark as its > called for each partition, and it has 36k partitions! > h2. Lock Profile: Kafka-15415 > !Screenshot 2024-02-01 at 11.06.36.png! > h2. Lock Profile: Baseline > !image-20240201-105752.png! > h1. Fix > Synchronization has to be reduced between 2 threads in order to address this. > [https://github.com/apache/kafka/pull/15323] is a fix for it, as it avoids > using Metadata.currentLeader() instead rely on Cluster.leaderFor(). > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-16226) Java client: Performance regression in Trogdor benchmark with high partition counts
[ https://issues.apache.org/jira/browse/KAFKA-16226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Shekhar Narula updated KAFKA-16226: -- Description: h1. Background https://issues.apache.org/jira/browse/KAFKA-15415 implemented optimisation in java-client to skip backoff period if client knows of a newer leader, for produce-batch being retried. h1. What changed The implementation introduced a regression noticed on a trogdor-benchmark running with high partition counts(36000!). With regression, following metrics changed on the produce side. # record-queue-time-avg: increased from 20ms to 30ms. # request-latency-avg: increased from 50ms to 100ms. h1. Why it happened As can be seen from the original [PR|https://github.com/apache/kafka/pull/14384] RecordAccmulator.partitionReady() & drainBatchesForOneNode() started using synchronised method Metadata.currentLeader(). This has led to increased synchronization between KafkaProducer's application-thread that call send(), and background-thread that actively send producer-batches to leaders. See lock profiles that clearly show increased synchronisation in KAFKA-15415 PR(highlighted in {color:#de350b}Red{color}) Vs baseline. Note the synchronisation is much worse for paritionReady() in this benchmark as its called for each partition, and it has 36k partitions! h2. Lock Profile: Kafka-15415 !Screenshot 2024-02-01 at 11.06.36.png! h2. Lock Profile: Baseline !image-20240201-105752.png! h1. Fix Synchronization has to be reduced between 2 threads in order to address this. [https://github.com/apache/kafka/pull/15323] is a fix for it, as it avoids using Metadata.currentLeader() instead rely on Cluster.leaderFor(). was: h1. Background https://issues.apache.org/jira/browse/KAFKA-15415 implemented optimisation in java-client to skip backoff period if client knows of a newer leader, for produce-batch being retried. h1. What changed The implementation introduced a regression noticed on a trogdor-benchmark running with high partition counts(36000!). With regression, following metrics changed on the produce side. # record-queue-time-avg: increased from 20ms to 30ms. # request-latency-avg: increased from 50ms to 100ms. h1. Why it happened As can be seen from the original [PR|https://github.com/apache/kafka/pull/14384] RecordAccmulator.partitionReady() & drainBatchesForOneNode() started using synchronised method Metadata.currentLeader(). This has led to increased synchronization between KafkaProducer's application-thread that call send(), and background-thread that actively send producer-batches to leaders. See lock profiles that clearly show increased synchronisation in KAFKA-15415 PR(highlighted in {color:#de350b}Red{color}) Vs baseline. Note the synchronisation is much worse for paritionReady() in this benchmark as its called for each partition, and it has 36k partitions! h2. Lock Profile: Kafka-15415 !Screenshot 2024-02-01 at 11.06.36.png! h2. Lock Profile: Baseline !image-20240201-105752.png! h1. Fix > Java client: Performance regression in Trogdor benchmark with high partition > counts > --- > > Key: KAFKA-16226 > URL: https://issues.apache.org/jira/browse/KAFKA-16226 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 3.7.0, 3.6.1 >Reporter: Mayank Shekhar Narula >Assignee: Mayank Shekhar Narula >Priority: Major > Labels: kip-951 > Fix For: 3.6.2, 3.8.0, 3.7.1 > > Attachments: Screenshot 2024-02-01 at 11.06.36.png, > image-20240201-105752.png > > > h1. Background > https://issues.apache.org/jira/browse/KAFKA-15415 implemented optimisation in > java-client to skip backoff period if client knows of a newer leader, for > produce-batch being retried. > h1. What changed > The implementation introduced a regression noticed on a trogdor-benchmark > running with high partition counts(36000!). > With regression, following metrics changed on the produce side. > # record-queue-time-avg: increased from 20ms to 30ms. > # request-latency-avg: increased from 50ms to 100ms. > h1. Why it happened > As can be seen from the original > [PR|https://github.com/apache/kafka/pull/14384] > RecordAccmulator.partitionReady() & drainBatchesForOneNode() started using > synchronised method Metadata.currentLeader(). This has led to increased > synchronization between KafkaProducer's application-thread that call send(), > and background-thread that actively send producer-batches to leaders. > See lock profiles that clearly show increased synchronisation in KAFKA-15415 > PR(highlighted in {color:#de350b}Red{color}) Vs baseline. Note the > synchronisation is much worse for paritionReady() in this benchmark as its > called
[jira] [Created] (KAFKA-16227) Console consumer fails with `IllegalStateException`
David Jacot created KAFKA-16227: --- Summary: Console consumer fails with `IllegalStateException` Key: KAFKA-16227 URL: https://issues.apache.org/jira/browse/KAFKA-16227 Project: Kafka Issue Type: Sub-task Components: clients Reporter: David Jacot Assignee: Kirk True I have seen a few occurrences like the following one. There is a race between the background thread and the foreground thread. I imagine the following steps: * quickstart-events-2 is assigned by the background thread; * the foreground thread starts the initialization of the partition (e.g. reset offset); * quickstart-events-2 is removed by the background thread; * the initialization completes and quickstart-events-2 does not exist anymore. {code:java} [2024-02-06 16:21:57,375] ERROR Error processing message, terminating consumer process: (kafka.tools.ConsoleConsumer$) java.lang.IllegalStateException: No current assignment for partition quickstart-events-2 at org.apache.kafka.clients.consumer.internals.SubscriptionState.assignedState(SubscriptionState.java:367) at org.apache.kafka.clients.consumer.internals.SubscriptionState.updateHighWatermark(SubscriptionState.java:579) at org.apache.kafka.clients.consumer.internals.FetchCollector.handleInitializeSuccess(FetchCollector.java:283) at org.apache.kafka.clients.consumer.internals.FetchCollector.initialize(FetchCollector.java:226) at org.apache.kafka.clients.consumer.internals.FetchCollector.collectFetch(FetchCollector.java:110) at org.apache.kafka.clients.consumer.internals.AsyncKafkaConsumer.collectFetch(AsyncKafkaConsumer.java:1540) at org.apache.kafka.clients.consumer.internals.AsyncKafkaConsumer.pollForFetches(AsyncKafkaConsumer.java:1525) at org.apache.kafka.clients.consumer.internals.AsyncKafkaConsumer.poll(AsyncKafkaConsumer.java:711) at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:874) at kafka.tools.ConsoleConsumer$ConsumerWrapper.receive(ConsoleConsumer.scala:473) at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:103) at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:77) at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:54) at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] MINOR:Type Casting Correction AND Null Pointer Exception (NPE) Defense [kafka]
highluck commented on PR #9786: URL: https://github.com/apache/kafka/pull/9786#issuecomment-1930075970 @mimaison Can you review this? thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (KAFKA-16226) Java client: Performance regression in Trogdor benchmark with high partition counts
[ https://issues.apache.org/jira/browse/KAFKA-16226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Shekhar Narula updated KAFKA-16226: -- Description: h1. Background https://issues.apache.org/jira/browse/KAFKA-15415 implemented optimisation in java-client to skip backoff period if client knows of a newer leader, for produce-batch being retried. h1. What changed The implementation introduced a regression noticed on a trogdor-benchmark running with high partition counts(36000!). With regression, following metrics changed on the produce side. # record-queue-time-avg: increased from 20ms to 30ms. # request-latency-avg: increased from 50ms to 100ms. h1. Why it happened As can be seen from the original [PR|[https://github.com/apache/kafka/pull/14384]] RecordAccmulator.partitionReady() & drainBatchesForOneNode() started using synchronised method Metadata.currentLeader(). This has led to increased synchronization between KafkaProducer's application-thread that call send(), and background-thread that actively send producer-batches to leaders. See lock profiles that clearly show increased synchronisation in KAFKA-15415 PR(highlighted in {color:#de350b}Red{color}) Vs baseline. Note the synchronisation is much worse for paritionReady() in this benchmark as its called for each partition, and it has 36k partitions! h2. Lock Profile: Kafka-15415 !Screenshot 2024-02-01 at 11.06.36.png! h2. Lock Profile: Baseline !image-20240201-105752.png! h1. Fix was: h1. Background https://issues.apache.org/jira/browse/KAFKA-15415 implemented optimisation in java-client to skip backoff period if client knows of a newer leader, for produce-batch being retried. h1. What changed The implementation introduced a regression noticed on a trogdor-benchmark running with high partition counts(36000!). With regression, following metrics changed on the produce side. # record-queue-time-avg: increased from 20ms to 30ms. # request-latency-avg: increased from 50ms to 100ms. h1. Why it happened As can be seen from the original [PR|[https://github.com/apache/kafka/pull/14384],] RecordAccmulator.partitionReady() & drainBatchesForOneNode() started using synchronised method Metadata.currentLeader(). This has led to increased synchronization between KafkaProducer's application-thread that call send(), and background-thread that actively send producer-batches to leaders. See lock profiles that clearly show increased synchronisation in KAFKA-15415 PR(highlighted in {color:#de350b}Red{color}) Vs baseline. Note the synchronisation is much worse for paritionReady() in this benchmark as its called for each partition, and it has 36k partitions! h2. Lock Profile: Kafka-15415 !Screenshot 2024-02-01 at 11.06.36.png! h2. Lock Profile: Baseline !image-20240201-105752.png! h1. Fix > Java client: Performance regression in Trogdor benchmark with high partition > counts > --- > > Key: KAFKA-16226 > URL: https://issues.apache.org/jira/browse/KAFKA-16226 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 3.7.0, 3.6.1 >Reporter: Mayank Shekhar Narula >Assignee: Mayank Shekhar Narula >Priority: Major > Labels: kip-951 > Fix For: 3.6.2, 3.8.0, 3.7.1 > > Attachments: Screenshot 2024-02-01 at 11.06.36.png, > image-20240201-105752.png > > > h1. Background > https://issues.apache.org/jira/browse/KAFKA-15415 implemented optimisation in > java-client to skip backoff period if client knows of a newer leader, for > produce-batch being retried. > h1. What changed > The implementation introduced a regression noticed on a trogdor-benchmark > running with high partition counts(36000!). > With regression, following metrics changed on the produce side. > # record-queue-time-avg: increased from 20ms to 30ms. > # request-latency-avg: increased from 50ms to 100ms. > h1. Why it happened > As can be seen from the original > [PR|[https://github.com/apache/kafka/pull/14384]] > RecordAccmulator.partitionReady() & drainBatchesForOneNode() started using > synchronised method Metadata.currentLeader(). This has led to increased > synchronization between KafkaProducer's application-thread that call send(), > and background-thread that actively send producer-batches to leaders. > See lock profiles that clearly show increased synchronisation in KAFKA-15415 > PR(highlighted in {color:#de350b}Red{color}) Vs baseline. Note the > synchronisation is much worse for paritionReady() in this benchmark as its > called for each partition, and it has 36k partitions! > h2. Lock Profile: Kafka-15415 > !Screenshot 2024-02-01 at 11.06.36.png! > h2. Lock Profile: Baseline > !image-20240201-105752.png! > h1. Fix > > -- This
[jira] [Updated] (KAFKA-16226) Java client: Performance regression in Trogdor benchmark with high partition counts
[ https://issues.apache.org/jira/browse/KAFKA-16226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Shekhar Narula updated KAFKA-16226: -- Description: h1. Background https://issues.apache.org/jira/browse/KAFKA-15415 implemented optimisation in java-client to skip backoff period if client knows of a newer leader, for produce-batch being retried. h1. What changed The implementation introduced a regression noticed on a trogdor-benchmark running with high partition counts(36000!). With regression, following metrics changed on the produce side. # record-queue-time-avg: increased from 20ms to 30ms. # request-latency-avg: increased from 50ms to 100ms. h1. Why it happened As can be seen from the original [PR|https://github.com/apache/kafka/pull/14384] RecordAccmulator.partitionReady() & drainBatchesForOneNode() started using synchronised method Metadata.currentLeader(). This has led to increased synchronization between KafkaProducer's application-thread that call send(), and background-thread that actively send producer-batches to leaders. See lock profiles that clearly show increased synchronisation in KAFKA-15415 PR(highlighted in {color:#de350b}Red{color}) Vs baseline. Note the synchronisation is much worse for paritionReady() in this benchmark as its called for each partition, and it has 36k partitions! h2. Lock Profile: Kafka-15415 !Screenshot 2024-02-01 at 11.06.36.png! h2. Lock Profile: Baseline !image-20240201-105752.png! h1. Fix was: h1. Background https://issues.apache.org/jira/browse/KAFKA-15415 implemented optimisation in java-client to skip backoff period if client knows of a newer leader, for produce-batch being retried. h1. What changed The implementation introduced a regression noticed on a trogdor-benchmark running with high partition counts(36000!). With regression, following metrics changed on the produce side. # record-queue-time-avg: increased from 20ms to 30ms. # request-latency-avg: increased from 50ms to 100ms. h1. Why it happened As can be seen from the original [PR|[https://github.com/apache/kafka/pull/14384]] RecordAccmulator.partitionReady() & drainBatchesForOneNode() started using synchronised method Metadata.currentLeader(). This has led to increased synchronization between KafkaProducer's application-thread that call send(), and background-thread that actively send producer-batches to leaders. See lock profiles that clearly show increased synchronisation in KAFKA-15415 PR(highlighted in {color:#de350b}Red{color}) Vs baseline. Note the synchronisation is much worse for paritionReady() in this benchmark as its called for each partition, and it has 36k partitions! h2. Lock Profile: Kafka-15415 !Screenshot 2024-02-01 at 11.06.36.png! h2. Lock Profile: Baseline !image-20240201-105752.png! h1. Fix > Java client: Performance regression in Trogdor benchmark with high partition > counts > --- > > Key: KAFKA-16226 > URL: https://issues.apache.org/jira/browse/KAFKA-16226 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 3.7.0, 3.6.1 >Reporter: Mayank Shekhar Narula >Assignee: Mayank Shekhar Narula >Priority: Major > Labels: kip-951 > Fix For: 3.6.2, 3.8.0, 3.7.1 > > Attachments: Screenshot 2024-02-01 at 11.06.36.png, > image-20240201-105752.png > > > h1. Background > https://issues.apache.org/jira/browse/KAFKA-15415 implemented optimisation in > java-client to skip backoff period if client knows of a newer leader, for > produce-batch being retried. > h1. What changed > The implementation introduced a regression noticed on a trogdor-benchmark > running with high partition counts(36000!). > With regression, following metrics changed on the produce side. > # record-queue-time-avg: increased from 20ms to 30ms. > # request-latency-avg: increased from 50ms to 100ms. > h1. Why it happened > As can be seen from the original > [PR|https://github.com/apache/kafka/pull/14384] > RecordAccmulator.partitionReady() & drainBatchesForOneNode() started using > synchronised method Metadata.currentLeader(). This has led to increased > synchronization between KafkaProducer's application-thread that call send(), > and background-thread that actively send producer-batches to leaders. > See lock profiles that clearly show increased synchronisation in KAFKA-15415 > PR(highlighted in {color:#de350b}Red{color}) Vs baseline. Note the > synchronisation is much worse for paritionReady() in this benchmark as its > called for each partition, and it has 36k partitions! > h2. Lock Profile: Kafka-15415 > !Screenshot 2024-02-01 at 11.06.36.png! > h2. Lock Profile: Baseline > !image-20240201-105752.png! > h1. Fix > > -- This message
[jira] [Updated] (KAFKA-16226) Java client: Performance regression in Trogdor benchmark with high partition counts
[ https://issues.apache.org/jira/browse/KAFKA-16226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Shekhar Narula updated KAFKA-16226: -- Description: h1. Background https://issues.apache.org/jira/browse/KAFKA-15415 implemented optimisation in java-client to skip backoff period if client knows of a newer leader, for produce-batch being retried. h1. What changed The implementation introduced a regression noticed on a trogdor-benchmark running with high partition counts(36000!). With regression, following metrics changed on the produce side. # record-queue-time-avg: increased from 20ms to 30ms. # request-latency-avg: increased from 50ms to 100ms. h1. Why it happened As can be seen from the original [PR|[https://github.com/apache/kafka/pull/14384],] RecordAccmulator.partitionReady() & drainBatchesForOneNode() started using synchronised method Metadata.currentLeader(). This has led to increased synchronization between KafkaProducer's application-thread that call send(), and background-thread that actively send producer-batches to leaders. See lock profiles that clearly show increased synchronisation in KAFKA-15415 PR(highlighted in {color:#de350b}Red{color}) Vs baseline. Note the synchronisation is much worse for paritionReady() in this benchmark as its called for each partition, and it has 36k partitions! h2. Lock Profile: Kafka-15415 !Screenshot 2024-02-01 at 11.06.36.png! h2. Lock Profile: Baseline !image-20240201-105752.png! h1. Fix was: h1. Background https://issues.apache.org/jira/browse/KAFKA-15415 implemented optimisation in java-client to skip backoff period if client knows of a newer leader, for produce-batch being retried. h1. What changed The implementation introduced a regression noticed on a trogdor-benchmark running with high partition counts(36000!). With regression, following metrics changed on the produce side. # record-queue-time-avg: increased from 20ms to 30ms. # request-latency-avg: increased from 50ms to 100ms. h1. Why it happened As can be seen from the original [PR|[https://github.com/apache/kafka/pull/14384],] RecordAccmulator.partitionReady() & drainBatchesForOneNode() started using synchronised method Metadata.currentLeader(). This has led to increased synchronization between KafkaProducer's application-thread that call send(), and background-thread that actively send producer-batches to leaders. See lock profiles that clearly show increased synchronisation in KAFKA-15415 PR(highlighted in {color:#de350b}Red{color}) Vs baseline. Note the synchronisation is much worse for paritionReady() in this benchmark as its called for each partition, and it has 36k partitions! h2. Lock Profile: Kafka-15415 !Screenshot 2024-02-01 at 11.06.36.png! h2. Lock Profile: Baseline !image-20240201-105752.png! h1. Fix > Java client: Performance regression in Trogdor benchmark with high partition > counts > --- > > Key: KAFKA-16226 > URL: https://issues.apache.org/jira/browse/KAFKA-16226 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 3.7.0, 3.6.1 >Reporter: Mayank Shekhar Narula >Assignee: Mayank Shekhar Narula >Priority: Major > Labels: kip-951 > Fix For: 3.6.2, 3.8.0, 3.7.1 > > Attachments: Screenshot 2024-02-01 at 11.06.36.png, > image-20240201-105752.png > > > h1. Background > https://issues.apache.org/jira/browse/KAFKA-15415 implemented optimisation in > java-client to skip backoff period if client knows of a newer leader, for > produce-batch being retried. > h1. What changed > The implementation introduced a regression noticed on a trogdor-benchmark > running with high partition counts(36000!). > With regression, following metrics changed on the produce side. > # record-queue-time-avg: increased from 20ms to 30ms. > # request-latency-avg: increased from 50ms to 100ms. > h1. Why it happened > As can be seen from the original > [PR|[https://github.com/apache/kafka/pull/14384],] > RecordAccmulator.partitionReady() & drainBatchesForOneNode() started using > synchronised method Metadata.currentLeader(). This has led to increased > synchronization between KafkaProducer's application-thread that call send(), > and background-thread that actively send producer-batches to leaders. > See lock profiles that clearly show increased synchronisation in KAFKA-15415 > PR(highlighted in {color:#de350b}Red{color}) Vs baseline. Note the > synchronisation is much worse for paritionReady() in this benchmark as its > called for each partition, and it has 36k partitions! > h2. Lock Profile: Kafka-15415 > !Screenshot 2024-02-01 at 11.06.36.png! > h2. Lock Profile: Baseline > !image-20240201-105752.png! > h1. Fix > > -- This
[jira] [Updated] (KAFKA-16226) Java client: Performance regression in Trogdor benchmark with high partition counts
[ https://issues.apache.org/jira/browse/KAFKA-16226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Shekhar Narula updated KAFKA-16226: -- Attachment: image-20240201-105752.png > Java client: Performance regression in Trogdor benchmark with high partition > counts > --- > > Key: KAFKA-16226 > URL: https://issues.apache.org/jira/browse/KAFKA-16226 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 3.7.0, 3.6.1 >Reporter: Mayank Shekhar Narula >Assignee: Mayank Shekhar Narula >Priority: Major > Labels: kip-951 > Fix For: 3.6.2, 3.8.0, 3.7.1 > > Attachments: Screenshot 2024-02-01 at 11.06.36.png, > image-20240201-105752.png > > > Background > https://issues.apache.org/jira/browse/KAFKA-15415 implemented optimisation in > java-client to skip backoff period if client knows of a newer leader, for > produce-batch being retried. > What changed > The implementation introduced a regression noticed on a trogdor-benchmark > running with high partition counts(36000!). > With regression, following metrics changed on the produce side. > # record-queue-time-avg: increased from 20ms to 30ms. > # request-latency-avg: increased from 50ms to 100ms. > How it happened > As can be seen from the original > [PR|[https://github.com/apache/kafka/pull/14384],] > RecordAccmulator.partitionReady() & drainBatchesForOneNode() started using > synchronised method Metadata.currentLeader(). This has led to increased > synchronization between KafkaProducer's application-thread that call send(), > and background-thread that actively send producer-batches to leaders. > See lock profiles that clearly show increased synchronisation in KAFKA-15415 > PR(highlighted in {color:#de350b}Red{color}) Vs baseline. Note the > synchronisation is much worse for paritionReady() in this benchmark as its > called for each partition, and it has 36k partitions! > Fix -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-16226) Java client: Performance regression in Trogdor benchmark with high partition counts
[ https://issues.apache.org/jira/browse/KAFKA-16226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Shekhar Narula updated KAFKA-16226: -- Description: h1. Background https://issues.apache.org/jira/browse/KAFKA-15415 implemented optimisation in java-client to skip backoff period if client knows of a newer leader, for produce-batch being retried. h1. What changed The implementation introduced a regression noticed on a trogdor-benchmark running with high partition counts(36000!). With regression, following metrics changed on the produce side. # record-queue-time-avg: increased from 20ms to 30ms. # request-latency-avg: increased from 50ms to 100ms. h1. Why it happened As can be seen from the original [PR|[https://github.com/apache/kafka/pull/14384],] RecordAccmulator.partitionReady() & drainBatchesForOneNode() started using synchronised method Metadata.currentLeader(). This has led to increased synchronization between KafkaProducer's application-thread that call send(), and background-thread that actively send producer-batches to leaders. See lock profiles that clearly show increased synchronisation in KAFKA-15415 PR(highlighted in {color:#de350b}Red{color}) Vs baseline. Note the synchronisation is much worse for paritionReady() in this benchmark as its called for each partition, and it has 36k partitions! h2. Lock Profile: Kafka-15415 !Screenshot 2024-02-01 at 11.06.36.png! h2. Lock Profile: Baseline !image-20240201-105752.png! h1. Fix was: Background https://issues.apache.org/jira/browse/KAFKA-15415 implemented optimisation in java-client to skip backoff period if client knows of a newer leader, for produce-batch being retried. What changed The implementation introduced a regression noticed on a trogdor-benchmark running with high partition counts(36000!). With regression, following metrics changed on the produce side. # record-queue-time-avg: increased from 20ms to 30ms. # request-latency-avg: increased from 50ms to 100ms. How it happened As can be seen from the original [PR|[https://github.com/apache/kafka/pull/14384],] RecordAccmulator.partitionReady() & drainBatchesForOneNode() started using synchronised method Metadata.currentLeader(). This has led to increased synchronization between KafkaProducer's application-thread that call send(), and background-thread that actively send producer-batches to leaders. See lock profiles that clearly show increased synchronisation in KAFKA-15415 PR(highlighted in {color:#de350b}Red{color}) Vs baseline. Note the synchronisation is much worse for paritionReady() in this benchmark as its called for each partition, and it has 36k partitions! Fix > Java client: Performance regression in Trogdor benchmark with high partition > counts > --- > > Key: KAFKA-16226 > URL: https://issues.apache.org/jira/browse/KAFKA-16226 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 3.7.0, 3.6.1 >Reporter: Mayank Shekhar Narula >Assignee: Mayank Shekhar Narula >Priority: Major > Labels: kip-951 > Fix For: 3.6.2, 3.8.0, 3.7.1 > > Attachments: Screenshot 2024-02-01 at 11.06.36.png, > image-20240201-105752.png > > > h1. Background > https://issues.apache.org/jira/browse/KAFKA-15415 implemented optimisation in > java-client to skip backoff period if client knows of a newer leader, for > produce-batch being retried. > h1. What changed > The implementation introduced a regression noticed on a trogdor-benchmark > running with high partition counts(36000!). > With regression, following metrics changed on the produce side. > # record-queue-time-avg: increased from 20ms to 30ms. > # request-latency-avg: increased from 50ms to 100ms. > h1. Why it happened > As can be seen from the original > [PR|[https://github.com/apache/kafka/pull/14384],] > RecordAccmulator.partitionReady() & drainBatchesForOneNode() started using > synchronised method Metadata.currentLeader(). This has led to increased > synchronization between KafkaProducer's application-thread that call send(), > and background-thread that actively send producer-batches to leaders. > See lock profiles that clearly show increased synchronisation in KAFKA-15415 > PR(highlighted in {color:#de350b}Red{color}) Vs baseline. Note the > synchronisation is much worse for paritionReady() in this benchmark as its > called for each partition, and it has 36k partitions! > h2. Lock Profile: Kafka-15415 > !Screenshot 2024-02-01 at 11.06.36.png! > h2. Lock Profile: Baseline > !image-20240201-105752.png! > h1. Fix -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] KAFKA-16066: Upgrade apacheds to 2.0.0.AM27 With apache kerby [kafka]
OmniaGM commented on code in PR #15277: URL: https://github.com/apache/kafka/pull/15277#discussion_r1480038350 ## core/src/test/scala/kafka/security/minikdc/MiniKdc.scala: ## @@ -19,38 +19,22 @@ package kafka.security.minikdc import java.io._ -import java.net.InetSocketAddress import java.nio.charset.StandardCharsets import java.nio.file.Files import java.text.MessageFormat import java.util.{Locale, Properties, UUID} - import kafka.utils.{CoreUtils, Exit, Logging} +import org.apache.directory.server.kerberos.shared.crypto.encryption.KerberosKeyFactory import scala.jdk.CollectionConverters._ -import org.apache.commons.lang.text.StrSubstitutor -import org.apache.directory.api.ldap.model.entry.{DefaultEntry, Entry} -import org.apache.directory.api.ldap.model.ldif.LdifReader -import org.apache.directory.api.ldap.model.name.Dn -import org.apache.directory.api.ldap.schema.extractor.impl.DefaultSchemaLdifExtractor -import org.apache.directory.api.ldap.schema.loader.LdifSchemaLoader -import org.apache.directory.api.ldap.schema.manager.impl.DefaultSchemaManager -import org.apache.directory.server.constants.ServerDNConstants -import org.apache.directory.server.core.DefaultDirectoryService -import org.apache.directory.server.core.api.{CacheService, DirectoryService, InstanceLayout} -import org.apache.directory.server.core.api.schema.SchemaPartition -import org.apache.directory.server.core.kerberos.KeyDerivationInterceptor -import org.apache.directory.server.core.partition.impl.btree.jdbm.{JdbmIndex, JdbmPartition} -import org.apache.directory.server.core.partition.ldif.LdifPartition -import org.apache.directory.server.kerberos.KerberosConfig -import org.apache.directory.server.kerberos.kdc.KdcServer -import org.apache.directory.server.kerberos.shared.crypto.encryption.KerberosKeyFactory -import org.apache.directory.server.kerberos.shared.keytab.{Keytab, KeytabEntry} -import org.apache.directory.server.protocol.shared.transport.{TcpTransport, UdpTransport} -import org.apache.directory.server.xdbm.Index -import org.apache.directory.shared.kerberos.KerberosTime +import org.apache.kerby.kerberos.kerb.KrbException +import org.apache.kerby.kerberos.kerb.identity.backend.BackendConfig +import org.apache.kerby.kerberos.kerb.server.{KdcConfig, KdcConfigKey, SimpleKdcServer} import org.apache.kafka.common.utils.{Java, Utils} - +import org.apache.kerby.kerberos.kerb.`type`.KerberosTime +import org.apache.kerby.kerberos.kerb.`type`.base.{EncryptionKey, PrincipalName} +import org.apache.kerby.kerberos.kerb.keytab.{Keytab, KeytabEntry} +import org.apache.kerby.util.NetworkUtil /** * Mini KDC based on Apache Directory Server that can be embedded in tests or used from command line as a standalone Review Comment: @highluck I believe this java doc still need to change -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (KAFKA-16226) Java client: Performance regression in Trogdor benchmark with high partition counts
[ https://issues.apache.org/jira/browse/KAFKA-16226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Shekhar Narula updated KAFKA-16226: -- Attachment: Screenshot 2024-02-01 at 11.06.36.png > Java client: Performance regression in Trogdor benchmark with high partition > counts > --- > > Key: KAFKA-16226 > URL: https://issues.apache.org/jira/browse/KAFKA-16226 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 3.7.0, 3.6.1 >Reporter: Mayank Shekhar Narula >Assignee: Mayank Shekhar Narula >Priority: Major > Labels: kip-951 > Fix For: 3.6.2, 3.8.0, 3.7.1 > > Attachments: Screenshot 2024-02-01 at 11.06.36.png > > > Background > https://issues.apache.org/jira/browse/KAFKA-15415 implemented optimisation in > java-client to skip backoff period if client knows of a newer leader, for > produce-batch being retried. > What changed > The implementation introduced a regression noticed on a trogdor-benchmark > running with high partition counts(36000!). > With regression, following metrics changed on the produce side. > # record-queue-time-avg: increased from 20ms to 30ms. > # request-latency-avg: increased from 50ms to 100ms. > How it happened > As can be seen from the original > [PR|[https://github.com/apache/kafka/pull/14384],] > RecordAccmulator.partitionReady() & drainBatchesForOneNode() started using > synchronised method Metadata.currentLeader(). This has led to increased > synchronization between KafkaProducer's application-thread that call send(), > and background-thread that actively send producer-batches to leaders. > See lock profiles that clearly show increased synchronisation in KAFKA-15415 > PR(highlighted in {color:#de350b}Red{color}) Vs baseline. Note the > synchronisation is much worse for paritionReady() in this benchmark as its > called for each partition, and it has 36k partitions! > Fix -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-16226) Java client: Performance regression in Trogdor benchmark with high partition counts
[ https://issues.apache.org/jira/browse/KAFKA-16226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Shekhar Narula updated KAFKA-16226: -- Description: Background https://issues.apache.org/jira/browse/KAFKA-15415 implemented optimisation in java-client to skip backoff period if client knows of a newer leader, for produce-batch being retried. What changed The implementation introduced a regression noticed on a trogdor-benchmark running with high partition counts(36000!). With regression, following metrics changed on the produce side. # record-queue-time-avg: increased from 20ms to 30ms. # request-latency-avg: increased from 50ms to 100ms. How it happened As can be seen from the original [PR|[https://github.com/apache/kafka/pull/14384],] RecordAccmulator.partitionReady() & drainBatchesForOneNode() started using synchronised method Metadata.currentLeader(). This has led to increased synchronization between KafkaProducer's application-thread that call send(), and background-thread that actively send producer-batches to leaders. See lock profiles that clearly show increased synchronisation in KAFKA-15415 PR(highlighted in {color:#de350b}Red{color}) Vs baseline. Note the synchronisation is much worse for paritionReady() in this benchmark as its called for each partition, and it has 36k partitions! Fix was: Background https://issues.apache.org/jira/browse/KAFKA-15415 implemented optimisation in java-client to skip backoff period if client knows of a newer leader, for produce-batch being retried. What changed The implementation introduced a regression noticed on a trogdor-benchmark running with high partition counts(36000!). With regression, following metrics changed on the produce side. # record-queue-time-avg: increased from 20ms to 30ms. # request-latency-avg: increased from 50ms to 100ms. How it happened As can be seen from the original [PR|[http://example.com|http://example.com/]https://github.com/apache/kafka/pull/14384] Fix > Java client: Performance regression in Trogdor benchmark with high partition > counts > --- > > Key: KAFKA-16226 > URL: https://issues.apache.org/jira/browse/KAFKA-16226 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 3.7.0, 3.6.1 >Reporter: Mayank Shekhar Narula >Assignee: Mayank Shekhar Narula >Priority: Major > Labels: kip-951 > Fix For: 3.6.2, 3.8.0, 3.7.1 > > > Background > https://issues.apache.org/jira/browse/KAFKA-15415 implemented optimisation in > java-client to skip backoff period if client knows of a newer leader, for > produce-batch being retried. > What changed > The implementation introduced a regression noticed on a trogdor-benchmark > running with high partition counts(36000!). > With regression, following metrics changed on the produce side. > # record-queue-time-avg: increased from 20ms to 30ms. > # request-latency-avg: increased from 50ms to 100ms. > How it happened > As can be seen from the original > [PR|[https://github.com/apache/kafka/pull/14384],] > RecordAccmulator.partitionReady() & drainBatchesForOneNode() started using > synchronised method Metadata.currentLeader(). This has led to increased > synchronization between KafkaProducer's application-thread that call send(), > and background-thread that actively send producer-batches to leaders. > See lock profiles that clearly show increased synchronisation in KAFKA-15415 > PR(highlighted in {color:#de350b}Red{color}) Vs baseline. Note the > synchronisation is much worse for paritionReady() in this benchmark as its > called for each partition, and it has 36k partitions! > Fix -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] KAFKA-16066: Upgrade apacheds to 2.0.0.AM27 With apache kerby [kafka]
highluck commented on PR #15277: URL: https://github.com/apache/kafka/pull/15277#issuecomment-1930043014 @divijvaidya @mimaison @OmniaGM Here’s a reminder! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] MINOR: Add MetadataType metric from KIP-866 #15299 [kafka]
OmniaGM commented on PR #15306: URL: https://github.com/apache/kafka/pull/15306#issuecomment-1930043205 @cmccabe you left a [comment](https://github.com/apache/kafka/pull/15299#issuecomment-1922507934 ) on the original pr #15299 stating that you are moving the discussion for KIP-866 metrics here but the original pr has been committed and this pr seems to be just a duplicate of @mumrah's PR. Can you same some context when you have time please? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (KAFKA-16226) Java client: Performance regression in Trogdor benchmark with high partition counts
[ https://issues.apache.org/jira/browse/KAFKA-16226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Shekhar Narula updated KAFKA-16226: -- Description: Background https://issues.apache.org/jira/browse/KAFKA-15415 implemented optimisation in java-client to skip backoff period if client knows of a newer leader, for produce-batch being retried. What changed The implementation introduced a regression noticed on a trogdor-benchmark running with high partition counts(36000!). With regression, following metrics changed on the produce side. # record-queue-time-avg: increased from 20ms to 30ms. # request-latency-avg: increased from 50ms to 100ms. How it happened As can be seen from the original [PR|[http://example.com|http://example.com/]https://github.com/apache/kafka/pull/14384] Fix was: Background https://issues.apache.org/jira/browse/KAFKA-15415 implemented optimisation in java-client to skip backoff period if client knows of a newer leader, for produce-batch being retried. What changed The implementation introduced a regression noticed on a trogdor-benchmark running with high partition counts(36000!). With regression, following metrics changed on the produce side. # record-queue-time-avg: increased from 20ms to 30ms. # request-latency-avg: increased from 50ms to 100ms. How it happened As can be seen from the original [PR|[http://example.com]https://github.com/apache/kafka/pull/14384]] Fix > Java client: Performance regression in Trogdor benchmark with high partition > counts > --- > > Key: KAFKA-16226 > URL: https://issues.apache.org/jira/browse/KAFKA-16226 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 3.7.0, 3.6.1 >Reporter: Mayank Shekhar Narula >Assignee: Mayank Shekhar Narula >Priority: Major > Labels: kip-951 > Fix For: 3.6.2, 3.8.0, 3.7.1 > > > Background > https://issues.apache.org/jira/browse/KAFKA-15415 implemented optimisation in > java-client to skip backoff period if client knows of a newer leader, for > produce-batch being retried. > What changed > The implementation introduced a regression noticed on a trogdor-benchmark > running with high partition counts(36000!). > With regression, following metrics changed on the produce side. > # record-queue-time-avg: increased from 20ms to 30ms. > # > request-latency-avg: increased from 50ms to 100ms. > How it happened > As can be seen from the original > [PR|[http://example.com|http://example.com/]https://github.com/apache/kafka/pull/14384] > Fix -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-16226) Java client: Performance regression in Trogdor benchmark with high partition counts
[ https://issues.apache.org/jira/browse/KAFKA-16226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Shekhar Narula updated KAFKA-16226: -- Description: Background https://issues.apache.org/jira/browse/KAFKA-15415 implemented optimisation in java-client to skip backoff period if client knows of a newer leader, for produce-batch being retried. What changed The implementation introduced a regression noticed on a trogdor-benchmark running with high partition counts(36000!). With regression, following metrics changed on the produce side. # record-queue-time-avg: increased from 20ms to 30ms. # request-latency-avg: increased from 50ms to 100ms. How it happened As can be seen from the original [PR|[http://example.com]https://github.com/apache/kafka/pull/14384]] Fix was: Background https://issues.apache.org/jira/browse/KAFKA-15415 implemented optimisation in java-client to skip backoff period if client knows of a newer leader, for produce-batch being retried. What changed The implementation introduced a regression noticed on a trogdor-benchmark running with high partition counts(36000!). With regression, following metrics changed on the produce side. 1. record_queue_time_avg Regression Details Fix > Java client: Performance regression in Trogdor benchmark with high partition > counts > --- > > Key: KAFKA-16226 > URL: https://issues.apache.org/jira/browse/KAFKA-16226 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 3.7.0, 3.6.1 >Reporter: Mayank Shekhar Narula >Assignee: Mayank Shekhar Narula >Priority: Major > Labels: kip-951 > Fix For: 3.6.2, 3.8.0, 3.7.1 > > > Background > https://issues.apache.org/jira/browse/KAFKA-15415 implemented optimisation in > java-client to skip backoff period if client knows of a newer leader, for > produce-batch being retried. > What changed > The implementation introduced a regression noticed on a trogdor-benchmark > running with high partition counts(36000!). > With regression, following metrics changed on the produce side. > # record-queue-time-avg: increased from 20ms to 30ms. > # > request-latency-avg: increased from 50ms to 100ms. > How it happened > As can be seen from the original > [PR|[http://example.com]https://github.com/apache/kafka/pull/14384]] > Fix -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-16226) Java client: Performance regression in Trogdor benchmark with high partition counts
[ https://issues.apache.org/jira/browse/KAFKA-16226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Shekhar Narula updated KAFKA-16226: -- Affects Version/s: 3.6.1 3.7.0 > Java client: Performance regression in Trogdor benchmark with high partition > counts > --- > > Key: KAFKA-16226 > URL: https://issues.apache.org/jira/browse/KAFKA-16226 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 3.7.0, 3.6.1 >Reporter: Mayank Shekhar Narula >Assignee: Mayank Shekhar Narula >Priority: Major > Fix For: 3.6.2, 3.8.0, 3.7.1 > > > Background > https://issues.apache.org/jira/browse/KAFKA-15415 implemented optimisation in > java-client to skip backoff period if client knows of a newer leader, for > produce-batch being retried. > What changed > The implementation introduced a regression noticed on a trogdor-benchmark > running with high partition counts(36000!). > With regression, following metrics changed on the produce side. > 1. record_queue_time_avg > Regression Details > Fix -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-16226) Java client: Performance regression in Trogdor benchmark with high partition counts
[ https://issues.apache.org/jira/browse/KAFKA-16226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Shekhar Narula updated KAFKA-16226: -- Labels: kip-951 (was: ) > Java client: Performance regression in Trogdor benchmark with high partition > counts > --- > > Key: KAFKA-16226 > URL: https://issues.apache.org/jira/browse/KAFKA-16226 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 3.7.0, 3.6.1 >Reporter: Mayank Shekhar Narula >Assignee: Mayank Shekhar Narula >Priority: Major > Labels: kip-951 > Fix For: 3.6.2, 3.8.0, 3.7.1 > > > Background > https://issues.apache.org/jira/browse/KAFKA-15415 implemented optimisation in > java-client to skip backoff period if client knows of a newer leader, for > produce-batch being retried. > What changed > The implementation introduced a regression noticed on a trogdor-benchmark > running with high partition counts(36000!). > With regression, following metrics changed on the produce side. > 1. record_queue_time_avg > Regression Details > Fix -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-16226) Java client: Performance regression in Trogdor benchmark with high partition counts
[ https://issues.apache.org/jira/browse/KAFKA-16226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Shekhar Narula updated KAFKA-16226: -- Summary: Java client: Performance regression in Trogdor benchmark with high partition counts (was: Performance regression in Trogdor benchmark with high partition counts) > Java client: Performance regression in Trogdor benchmark with high partition > counts > --- > > Key: KAFKA-16226 > URL: https://issues.apache.org/jira/browse/KAFKA-16226 > Project: Kafka > Issue Type: Bug > Components: clients >Reporter: Mayank Shekhar Narula >Assignee: Mayank Shekhar Narula >Priority: Major > Fix For: 3.6.2, 3.8.0, 3.7.1 > > > Background > https://issues.apache.org/jira/browse/KAFKA-15415 implemented optimisation in > java-client to skip backoff period if client knows of a newer leader, for > produce-batch being retried. > What changed > The implementation introduced a regression noticed on a trogdor-benchmark > running with high partition counts(36000!). > With regression, following metrics changed on the produce side. > 1. record_queue_time_avg > Regression Details > Fix -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-16226) Performance regression in Trogdor benchmark with high partition counts
[ https://issues.apache.org/jira/browse/KAFKA-16226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Shekhar Narula updated KAFKA-16226: -- Description: Background https://issues.apache.org/jira/browse/KAFKA-15415 implemented optimisation in java-client to skip backoff period if client knows of a newer leader, for produce-batch being retried. What changed The implementation introduced a regression noticed on a trogdor-benchmark running with high partition counts(36000!). With regression, following metrics changed on the produce side. 1. record_queue_time_avg Regression Details Fix was:For KIP-951, > Performance regression in Trogdor benchmark with high partition counts > -- > > Key: KAFKA-16226 > URL: https://issues.apache.org/jira/browse/KAFKA-16226 > Project: Kafka > Issue Type: Bug > Components: clients >Reporter: Mayank Shekhar Narula >Assignee: Mayank Shekhar Narula >Priority: Major > Fix For: 3.6.2, 3.8.0, 3.7.1 > > > Background > https://issues.apache.org/jira/browse/KAFKA-15415 implemented optimisation in > java-client to skip backoff period if client knows of a newer leader, for > produce-batch being retried. > What changed > The implementation introduced a regression noticed on a trogdor-benchmark > running with high partition counts(36000!). > With regression, following metrics changed on the produce side. > 1. record_queue_time_avg > Regression Details > Fix -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-16226) Performance regression in Trogdor benchmark with high partition counts
[ https://issues.apache.org/jira/browse/KAFKA-16226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Shekhar Narula updated KAFKA-16226: -- Description: For KIP-951, (was: Right now in java-client, producer-batches backoff upto retry.backoff.ms(100ms by default). This Jira proposes that backoff should be skipped if client knows of a newer-leader for the partition in a sub-sequent retry(typically through refresh of parition-metadata via the Metadata RPC). This would help improve the latency of the produce-request around when partition leadership changes.) > Performance regression in Trogdor benchmark with high partition counts > -- > > Key: KAFKA-16226 > URL: https://issues.apache.org/jira/browse/KAFKA-16226 > Project: Kafka > Issue Type: Bug > Components: clients >Reporter: Mayank Shekhar Narula >Assignee: Mayank Shekhar Narula >Priority: Major > Fix For: 3.6.2, 3.8.0, 3.7.1 > > > For KIP-951, -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-16226) Performance regression in Trogdor benchmark with high partition counts
[ https://issues.apache.org/jira/browse/KAFKA-16226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Shekhar Narula updated KAFKA-16226: -- Issue Type: Bug (was: Improvement) > Performance regression in Trogdor benchmark with high partition counts > -- > > Key: KAFKA-16226 > URL: https://issues.apache.org/jira/browse/KAFKA-16226 > Project: Kafka > Issue Type: Bug > Components: clients >Reporter: Mayank Shekhar Narula >Assignee: Mayank Shekhar Narula >Priority: Major > Fix For: 3.7.0, 3.6.1 > > > Right now in java-client, producer-batches backoff upto > retry.backoff.ms(100ms by default). This Jira proposes that backoff should be > skipped if client knows of a newer-leader for the partition in a sub-sequent > retry(typically through refresh of parition-metadata via the Metadata RPC). > This would help improve the latency of the produce-request around when > partition leadership changes. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-16226) Performance regression in Trogdor benchmark with high partition counts
[ https://issues.apache.org/jira/browse/KAFKA-16226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Shekhar Narula updated KAFKA-16226: -- Fix Version/s: 3.6.2 3.8.0 3.7.1 (was: 3.7.0) (was: 3.6.1) > Performance regression in Trogdor benchmark with high partition counts > -- > > Key: KAFKA-16226 > URL: https://issues.apache.org/jira/browse/KAFKA-16226 > Project: Kafka > Issue Type: Bug > Components: clients >Reporter: Mayank Shekhar Narula >Assignee: Mayank Shekhar Narula >Priority: Major > Fix For: 3.6.2, 3.8.0, 3.7.1 > > > Right now in java-client, producer-batches backoff upto > retry.backoff.ms(100ms by default). This Jira proposes that backoff should be > skipped if client knows of a newer-leader for the partition in a sub-sequent > retry(typically through refresh of parition-metadata via the Metadata RPC). > This would help improve the latency of the produce-request around when > partition leadership changes. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16226) Performance regression in Trogdor benchmark with high partition counts
Mayank Shekhar Narula created KAFKA-16226: - Summary: Performance regression in Trogdor benchmark with high partition counts Key: KAFKA-16226 URL: https://issues.apache.org/jira/browse/KAFKA-16226 Project: Kafka Issue Type: Improvement Components: clients Reporter: Mayank Shekhar Narula Assignee: Mayank Shekhar Narula Fix For: 3.7.0, 3.6.1 Right now in java-client, producer-batches backoff upto retry.backoff.ms(100ms by default). This Jira proposes that backoff should be skipped if client knows of a newer-leader for the partition in a sub-sequent retry(typically through refresh of parition-metadata via the Metadata RPC). This would help improve the latency of the produce-request around when partition leadership changes. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-15970) KIP-951, port newly added tests in FetcherTest.java to FetchRequestManagerTest.ajva
[ https://issues.apache.org/jira/browse/KAFKA-15970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Shekhar Narula resolved KAFKA-15970. --- Resolution: Fixed > KIP-951, port newly added tests in FetcherTest.java to > FetchRequestManagerTest.ajva > --- > > Key: KAFKA-15970 > URL: https://issues.apache.org/jira/browse/KAFKA-15970 > Project: Kafka > Issue Type: Improvement > Components: clients >Affects Versions: 3.7.0 >Reporter: Mayank Shekhar Narula >Assignee: Mayank Shekhar Narula >Priority: Major > Labels: kip-951 > > Java client changes for > https://cwiki.apache.org/confluence/display/KAFKA/KIP-951%3A+Leader+discovery+optimisations+for+the+client -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-16224) Fix handling of deleted topic when auto-committing before revocation
[ https://issues.apache.org/jira/browse/KAFKA-16224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lianet Magrans updated KAFKA-16224: --- Description: Current logic for auto-committing offsets when partitions are revoked will retry continuously when getting UNKNOWN_TOPIC_OR_PARTITION, leading to the member not completing the revocation in time. We should consider this as an indication of the topic being deleted, and in the context of committing offsets to revoke partitions, we should abort the commit attempt and move on to complete and ack the revocation. While reviewing this, review the behaviour around this error for other commit operations as well in case a similar reasoning should be applied. Note that legacy coordinator behaviour around this seems to be the same as the new consumer currently has. was: Current logic for auto-committing offsets when partitions are revoked will retry continuously when getting UNKNOWN_TOPIC_OR_PARTITION, leading to the member not completing the revocation in time. We should consider this as an indication of the topic being deleted, and in the context of committing offsets to revoke partitions, we should abort the commit attempt and move on to complete and ack the revocation. While reviewing this, review the behaviour around this error for other commit operations as well in case a similar reasoning should be applied. > Fix handling of deleted topic when auto-committing before revocation > > > Key: KAFKA-16224 > URL: https://issues.apache.org/jira/browse/KAFKA-16224 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Reporter: Lianet Magrans >Assignee: Lianet Magrans >Priority: Major > Labels: kip-848-client-support > > Current logic for auto-committing offsets when partitions are revoked will > retry continuously when getting UNKNOWN_TOPIC_OR_PARTITION, leading to the > member not completing the revocation in time. We should consider this as an > indication of the topic being deleted, and in the context of committing > offsets to revoke partitions, we should abort the commit attempt and move on > to complete and ack the revocation. > While reviewing this, review the behaviour around this error for other commit > operations as well in case a similar reasoning should be applied. > Note that legacy coordinator behaviour around this seems to be the same as > the new consumer currently has. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[PR] working change [kafka]
msn-tldr opened a new pull request, #15323: URL: https://github.com/apache/kafka/pull/15323 *More detailed description of your change, if necessary. The PR title and PR message become the squashed commit message, so use a separate comment to ping reviewers.* *Summary of testing strategy (including rationale) for the feature or bug fix. Unit and/or integration tests are expected for any behaviour change and system tests should be considered for larger changes.* ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-14945: Add Serializer#serializeToByteBuffer() to reduce memory copying [kafka]
LinShunKang commented on PR #12685: URL: https://github.com/apache/kafka/pull/12685#issuecomment-1929526582 > @LinShunKang , sorry for being late. I had a quick look at #14617, it looks like the `ByteBufferSerializer#serialize` is a public API and cannot be changed without KIP. You know more than I do, so, in your opinion, what should we do from now? Could we not touch the `ByteBufferSerializer#serialize` and only implement `ByteBufferSerializer#serializeToByteBuffer`? Do you think we should include people in this [PR](https://github.com/apache/kafka/pull/14617) to this PR for discussion? Or do you have any other thoughts? We could not only implement `ByteBufferSerializer#serializeToByteBuffer` because if `Serializer` implements this method, then the `Serializer#serialize` will never be called. And `ByteBufferSerializer#serialize` has obvious logical problems. I believe we should address the logical issues in `ByteBufferSerializer#serialize`, but this will introduce breaking changes for existing users. And the `Serializer` should not have both `serialize` and `serializeToByteBuffer` methods at the same time. Therefore, I suggest we tackle these issues in Kafka 4.0, where we can modify the signature of the `Serializer`: from ``` //before 4.0 public interface Serializer { byte[] serialize(String topic, T data); default byte[] serialize(String topic, Headers headers, T data) { return serialize(topic, data); } } ``` to ``` //since 4.0 public interface Serializer { ByteBuffer serialize(String topic, T data); ByteBuffer serialize(String topic, Headers headers, T data); } ``` Then we could announce that we are modified the signature of the `Serializer` for existing users. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] MINOR: Update LICENSE-binary file [kafka]
jlprat commented on PR #15322: URL: https://github.com/apache/kafka/pull/15322#issuecomment-1929371251 If you need to run this several times, you can try to omit the test suite, it will be faster :) ``` ./gradlewAll clean releaseTarGz -x test ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] MINOR: Update LICENSE-binary file [kafka]
mimaison opened a new pull request, #15322: URL: https://github.com/apache/kafka/pull/15322 Tested with: ``` $ ./gradlewAll clean releaseTarGz $ tar xzf core/build/distributions/kafka_2.13-3.8.0-SNAPSHOT.tgz $ cd kafka_2.13-3.8.0-SNAPSHOT/ $ for f in $(ls libs | grep -v "^kafka\|connect\|trogdor"); do if ! grep -q ${f%.*} LICENSE; then echo "${f%.*} is missing in license file"; fi; done ``` ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-15561: Client support for new SubscriptionPattern based subscription [kafka]
Phuc-Hong-Tran commented on PR #15188: URL: https://github.com/apache/kafka/pull/15188#issuecomment-1929317343 @cadonna @lianetm, since we're supporting for both subscribe method using java.util.regex.Pattern and SubscriptionPattern, I think we should throw a illegal heartbeat exeption when user try to use both method at the same time and inform the user to use once at a time, since the field SubscribedRegex is used for java.util.regex.Pattern as well as SubscriptionPattern. What do you guys think? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] MINOR: Upgrade maven artifact version to 3.9.6 [kafka]
mimaison merged PR #15309: URL: https://github.com/apache/kafka/pull/15309 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-16181: Use incrementalAlterConfigs when updating broker configs by kafka-configs.sh [kafka]
dajac commented on code in PR #15304: URL: https://github.com/apache/kafka/pull/15304#discussion_r1479611758 ## core/src/test/scala/integration/kafka/admin/ConfigCommandIntegrationTest.scala: ## @@ -162,19 +174,198 @@ class ConfigCommandIntegrationTest extends QuorumTestHarness with Logging { assertThrows(classOf[ConfigException], () => alterConfigWithZk(configs, None, encoderConfigs)) // Dynamic config updates using ZK should fail if broker is running. -registerBrokerInZk(brokerId.toInt) +registerBrokerInZk(zkClient, brokerId.toInt) assertThrows(classOf[IllegalArgumentException], () => alterConfigWithZk(Map("message.max.size" -> "21"), Some(brokerId))) assertThrows(classOf[IllegalArgumentException], () => alterConfigWithZk(Map("message.max.size" -> "22"), None)) // Dynamic config updates using ZK should for a different broker that is not running should succeed alterAndVerifyConfig(Map("message.max.size" -> "23"), Some("2")) } - private def registerBrokerInZk(id: Int): Unit = { + private def registerBrokerInZk(zkClient: kafka.zk.KafkaZkClient, id: Int): Unit = { zkClient.createTopLevelPaths() val securityProtocol = SecurityProtocol.PLAINTEXT val endpoint = new EndPoint("localhost", 9092, ListenerName.forSecurityProtocol(securityProtocol), securityProtocol) val brokerInfo = BrokerInfo(Broker(id, Seq(endpoint), rack = None), MetadataVersion.latestTesting, jmxPort = 9192) zkClient.registerBroker(brokerInfo) } + + @ClusterTest + def testUpdateInvalidBrokersConfig(): Unit = { +checkInvalidBrokerConfig(None) + checkInvalidBrokerConfig(Some(cluster.anyBrokerSocketServer().config.brokerId.toString)) + } + + private def checkInvalidBrokerConfig(entityNameOrDefault: Option[String]): Unit = { +for (incremental <- Array(true, false)) { + val entityNameParams = entityNameOrDefault.map(name => Array("--entity-name", name)).getOrElse(Array("--entity-default")) + ConfigCommand.alterConfig(cluster.createAdminClient(), new ConfigCommandOptions( +Array("--bootstrap-server", s"${cluster.bootstrapServers()}", + "--alter", + "--add-config", "invalid=2", + "--entity-type", "brokers") + ++ entityNameParams + ), incremental) + + val describeResult = TestUtils.grabConsoleOutput( +ConfigCommand.describeConfig(cluster.createAdminClient(), new ConfigCommandOptions( + Array("--bootstrap-server", s"${cluster.bootstrapServers()}", +"--describe", +"--entity-type", "brokers") +++ entityNameParams +))) + // We will treat unknown config as sensitive + assertTrue(describeResult.contains("sensitive=true")) + // Sensitive config will not return + assertTrue(describeResult.contains("invalid=null")) +} + } + + @ClusterTest + def testUpdateInvalidTopicConfig(): Unit = { +TestUtils.createTopicWithAdminRaw( + admin = cluster.createAdminClient(), + topic = "test-config-topic", +) +assertInstanceOf( + classOf[InvalidConfigurationException], + assertThrows( +classOf[ExecutionException], +() => ConfigCommand.alterConfig(cluster.createAdminClient(), new ConfigCommandOptions( + Array("--bootstrap-server", s"${cluster.bootstrapServers()}", +"--alter", +"--add-config", "invalid=2", +"--entity-type", "topics", +"--entity-name", "test-config-topic") +), true)).getCause +) + } + + @ClusterTest + def testUpdateAndDeleteBrokersConfig(): Unit = { +checkBrokerConfig(None) + checkBrokerConfig(Some(cluster.anyBrokerSocketServer().config.brokerId.toString)) + } + + private def checkBrokerConfig(entityNameOrDefault: Option[String]): Unit = { +val entityNameParams = entityNameOrDefault.map(name => Array("--entity-name", name)).getOrElse(Array("--entity-default")) +ConfigCommand.alterConfig(cluster.createAdminClient(), new ConfigCommandOptions( + Array("--bootstrap-server", s"${cluster.bootstrapServers()}", +"--alter", +"--add-config", "log.cleaner.threads=2", +"--entity-type", "brokers") +++ entityNameParams +), true) +TestUtils.waitUntilTrue( + () => cluster.brokerSocketServers().asScala.forall(broker => broker.config.getInt("log.cleaner.threads") == 2), + "Timeout waiting for topic config propagating to broker") + +val describeResult = TestUtils.grabConsoleOutput( + ConfigCommand.describeConfig(cluster.createAdminClient(), new ConfigCommandOptions( +Array("--bootstrap-server", s"${cluster.bootstrapServers()}", + "--describe", + "--entity-type", "brokers") + ++ entityNameParams + ))) +assertTrue(describeResult.contains("log.cleaner.threads=2")) +assertTrue(describeResult.contains("sensitive=false")) + +
Re: [PR] KAFKA-15561: Client support for new SubscriptionPattern based subscription [kafka]
Phuc-Hong-Tran commented on code in PR #15188: URL: https://github.com/apache/kafka/pull/15188#discussion_r1479598776 ## clients/src/main/java/org/apache/kafka/clients/consumer/internals/SubscriptionState.java: ## @@ -84,6 +85,9 @@ private enum SubscriptionType { /* the pattern user has requested */ private Pattern subscribedPattern; +/* we should rename this to something more specific */ +private SubscriptionPattern subscriptionPattern; Review Comment: Will do -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-15717: Added KRaft support in LeaderEpochIntegrationTest [kafka]
mimaison commented on PR #15225: URL: https://github.com/apache/kafka/pull/15225#issuecomment-1929267278 Done, thanks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Assigned] (KAFKA-15717) KRaft support in LeaderEpochIntegrationTest
[ https://issues.apache.org/jira/browse/KAFKA-15717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mickael Maison reassigned KAFKA-15717: -- Assignee: appchemist > KRaft support in LeaderEpochIntegrationTest > --- > > Key: KAFKA-15717 > URL: https://issues.apache.org/jira/browse/KAFKA-15717 > Project: Kafka > Issue Type: Task > Components: core >Reporter: Sameer Tejani >Assignee: appchemist >Priority: Minor > Labels: kraft, kraft-test, newbie > Fix For: 3.8.0 > > > The following tests in LeaderEpochIntegrationTest in > core/src/test/scala/unit/kafka/server/epoch/LeaderEpochIntegrationTest.scala > need to be updated to support KRaft > 67 : def shouldAddCurrentLeaderEpochToMessagesAsTheyAreWrittenToLeader(): > Unit = { > 99 : def shouldSendLeaderEpochRequestAndGetAResponse(): Unit = { > 144 : def shouldIncreaseLeaderEpochBetweenLeaderRestarts(): Unit = { > Scanned 305 lines. Found 0 KRaft tests out of 3 tests -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] KAFKA-16215; KAFKA-16178: Fix member not rejoining after error [kafka]
AndrewJSchofield commented on code in PR #15311: URL: https://github.com/apache/kafka/pull/15311#discussion_r1479506790 ## clients/src/main/java/org/apache/kafka/clients/consumer/internals/RequestState.java: ## @@ -106,6 +106,13 @@ public void onSendAttempt(final long currentTimeMs) { this.lastSentMs = currentTimeMs; } +/** + * Update the lastReceivedTime in milliseconds, indicating that a response has been received. + */ +public void updateLastReceivedTime(final long lastReceivedMs) { +this.lastReceivedMs = lastReceivedMs; Review Comment: You're probably right @dajac. Should we be applying exponential backoff for heartbeats? Does it only apply to a subset of errors? As a starting position, I would have said that any response or error updates the last receive time, and we would apply exponential backoff in all cases. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (KAFKA-16190) Member should send full heartbeat when rejoining
[ https://issues.apache.org/jira/browse/KAFKA-16190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17814698#comment-17814698 ] Phuc Hong Tran commented on KAFKA-16190: [~lianetm] I understand, thanks for the explaination. > Member should send full heartbeat when rejoining > > > Key: KAFKA-16190 > URL: https://issues.apache.org/jira/browse/KAFKA-16190 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Reporter: Lianet Magrans >Assignee: Quoc Phong Dang >Priority: Critical > Labels: client-transitions-issues, kip-848-client-support, newbie > Fix For: 3.8.0 > > > The heartbeat request builder should make sure that all fields are sent in > the heartbeat request when the consumer rejoins (currently the > HeartbeatRequestManager request builder is reset on failure scenarios, which > should cover the fence+rejoin sequence). > Note that the existing HeartbeatRequestManagerTest.testHeartbeatState misses > this exact case given that it does explicitly change the subscription when it > gets fenced. We should ensure we test a consumer that keeps it same initial > subscription when it rejoins after being fenced. -- This message was sent by Atlassian Jira (v8.20.10#820010)