[jira] [Created] (KAFKA-16899) Rename MembershipManagerImpl's rebalanceTimeoutMs for clarity
Kirk True created KAFKA-16899: - Summary: Rename MembershipManagerImpl's rebalanceTimeoutMs for clarity Key: KAFKA-16899 URL: https://issues.apache.org/jira/browse/KAFKA-16899 Project: Kafka Issue Type: Improvement Components: clients, consumer Affects Versions: 3.8.0 Reporter: Kirk True Fix For: 4.0.0 The naming of {{{}MembershipManagerImpl{}}}'s {{rebalanceTimeoutMs}} is a little confusing. Ultimately the broker enforces the rebalance process' timeout, not the client. It's therefore a little misleading to use {{rebalanceTimeoutMs}} as the name of a variable that only handles the client's commit portion of the process. It is used in {{MembershipManagerImpl}} as a means to limit the client's efforts in the case where it is repeatedly trying to commit but failing. A suggested name change was {{{}commitTimeoutDuringReconciliation{}}}. It is not clear to me that simply changing the name from {{rebalanceTimeoutMs}} to {{commitTimeoutDuringReconciliation}} is sufficient. When a caller instantiates a {{{}MembershipManagerImpl{}}}, it is passing in a variable named {{rebalanceTimeoutMs}} that ultimately comes from the {{ClientConfig}} object. I can imagine it being a little confusing that a value for {{rebalanceTimeoutMs}} is passed in, but then really only used as the value for {{commitTimeoutDuringReconciliation}}. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16818) Move event-processing tests from ConsumerNetworkThreadTest to ApplicationEventProcessorTest
Kirk True created KAFKA-16818: - Summary: Move event-processing tests from ConsumerNetworkThreadTest to ApplicationEventProcessorTest Key: KAFKA-16818 URL: https://issues.apache.org/jira/browse/KAFKA-16818 Project: Kafka Issue Type: Improvement Components: clients, consumer, unit tests Affects Versions: 3.8.0 Reporter: Kirk True Assignee: Kirk True Fix For: 3.8.0 The {{ConsumerNetworkThreadTest}} currently has a number of tests which do the following: # Add event of type _T_ to the event queue # Call {{ConsumerNetworkThread.runOnce()}} to dequeue the events and call {{ApplicationEventProcessor.process()}} # Verify that the appropriate {{ApplicationEventProcessor}} process method was invoked for the event Those types of tests should be moved to {{{}ApplicationEventProcessorTest{}}}. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16578) Revert changes to connect_distributed_test.py for the new async Consumer
[ https://issues.apache.org/jira/browse/KAFKA-16578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True resolved KAFKA-16578. --- Resolution: Won't Fix Most of the {{connect_distributed_test.py}} system tests were fixed, and {{test_exactly_once_source}} was reverted in a separate Jira/PR. > Revert changes to connect_distributed_test.py for the new async Consumer > > > Key: KAFKA-16578 > URL: https://issues.apache.org/jira/browse/KAFKA-16578 > Project: Kafka > Issue Type: Task > Components: clients, consumer, system tests >Affects Versions: 3.8.0 >Reporter: Kirk True >Assignee: Kirk True >Priority: Critical > Labels: kip-848-client-support, system-tests > Fix For: 3.8.0 > > > To test the new, asynchronous Kafka {{Consumer}} implementation, we migrated > a slew of system tests to run both the "old" and "new" implementations. > KAFKA-16272 updated the system tests in {{connect_distributed_test.py}} so it > could test the new consumer with Connect. However, we are not supporting > Connect with the new consumer in 3.8.0. Unsurprisingly, when we run the > Connect system tests with the new {{AsyncKafkaConsumer}}, we get errors like > the following: > {code} > test_id: > kafkatest.tests.connect.connect_distributed_test.ConnectDistributedTest.test_exactly_once_source.clean=False.connect_protocol=eager.metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=True.group_protocol=consumer > status: FAIL > run time: 6 minutes 3.899 seconds > InsufficientResourcesError('Not enough nodes available to allocate. linux > nodes requested: 1. linux nodes available: 0') > Traceback (most recent call last): > File > "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py", > line 184, in _do_run > data = self.run_test() > File > "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py", > line 262, in run_test > return self.test_context.function(self.test) > File > "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/mark/_mark.py", > line 433, in wrapper > return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs) > File > "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/connect/connect_distributed_test.py", > line 919, in test_exactly_once_source > consumer_validator = ConsoleConsumer(self.test_context, 1, self.kafka, > self.source.topic, consumer_timeout_ms=1000, print_key=True) > File > "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/services/console_consumer.py", > line 97, in __init__ > BackgroundThreadService.__init__(self, context, num_nodes) > File > "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/services/background_thread.py", > line 26, in __init__ > super(BackgroundThreadService, self).__init__(context, num_nodes, > cluster_spec, *args, **kwargs) > File > "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/services/service.py", > line 107, in __init__ > self.allocate_nodes() > File > "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/services/service.py", > line 217, in allocate_nodes > self.nodes = self.cluster.alloc(self.cluster_spec) > File > "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/cluster/cluster.py", > line 54, in alloc > allocated = self.do_alloc(cluster_spec) > File > "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/cluster/finite_subcluster.py", > line 31, in do_alloc > allocated = self._available_nodes.remove_spec(cluster_spec) > File > "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/cluster/node_container.py", > line 117, in remove_spec > raise InsufficientResourcesError("Not enough nodes available to allocate. > " + msg) > ducktape.cluster.node_container.InsufficientResourcesError: Not enough nodes > available to allocate. linux nodes requested: 1. linux nodes available: 0 > {code} > The task here is to revert the changes made in KAFKA-16272 [PR > 15576|https://github.com/apache/kafka/pull/15576]. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16787) Remove TRACE level logging from AsyncKafkaConsumer hot path
Kirk True created KAFKA-16787: - Summary: Remove TRACE level logging from AsyncKafkaConsumer hot path Key: KAFKA-16787 URL: https://issues.apache.org/jira/browse/KAFKA-16787 Project: Kafka Issue Type: Improvement Components: clients, consumer, logging Affects Versions: 3.7.0 Reporter: Kirk True Fix For: 3.8.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16642) Update KafkaConsumerTest to show parameters in test lists
Kirk True created KAFKA-16642: - Summary: Update KafkaConsumerTest to show parameters in test lists Key: KAFKA-16642 URL: https://issues.apache.org/jira/browse/KAFKA-16642 Project: Kafka Issue Type: Bug Components: clients, consumer, unit tests Affects Versions: 3.7.0 Reporter: Kirk True Assignee: Kirk True Fix For: 3.8.0 {{KafkaConsumerTest}} was recently updated to make many of its tests parameterized to exercise both the {{CLASSIC}} and {{CONSUMER}} group protocols. However, in some of the tools in which [lists of tests are provided|https://ge.apache.org/scans/tests?search.names=Git%20branch=P28D=kafka=America%2FLos_Angeles=trunk=org.apache.kafka.clients.consumer.KafkaConsumerTest=FLAKY], say, for analysis, the group protocol information is not exposed. For example, one test ({{{}testReturnRecordsDuringRebalance{}}}) is flaky, but it's difficult to know at a glance which group protocol is causing the problem because the list simply shows: {quote}{{testReturnRecordsDuringRebalance(GroupProtocol)[1]}} {quote} Ideally, it would expose more information, such as: {quote}{{testReturnRecordsDuringRebalance(GroupProtocol=CONSUMER)}} {quote} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16460) New consumer times out consuming records in consumer_test.py system test
[ https://issues.apache.org/jira/browse/KAFKA-16460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True resolved KAFKA-16460. --- Resolution: Duplicate > New consumer times out consuming records in consumer_test.py system test > > > Key: KAFKA-16460 > URL: https://issues.apache.org/jira/browse/KAFKA-16460 > Project: Kafka > Issue Type: Bug > Components: clients, consumer, system tests >Affects Versions: 3.7.0 >Reporter: Kirk True >Assignee: Kirk True >Priority: Blocker > Labels: kip-848-client-support, system-tests > Fix For: 3.8.0 > > > The {{consumer_test.py}} system test fails with the following errors: > {quote} > * Timed out waiting for consumption > {quote} > Affected tests: > * {{test_broker_failure}} > * {{test_consumer_bounce}} > * {{test_static_consumer_bounce}} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16609) Update parse_describe_topic to support new topic describe output
[ https://issues.apache.org/jira/browse/KAFKA-16609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True resolved KAFKA-16609. --- Reviewer: Lucas Brutschy Resolution: Fixed > Update parse_describe_topic to support new topic describe output > > > Key: KAFKA-16609 > URL: https://issues.apache.org/jira/browse/KAFKA-16609 > Project: Kafka > Issue Type: Bug > Components: admin, system tests >Affects Versions: 3.8.0 >Reporter: Kirk True >Assignee: Kirk True >Priority: Major > Labels: system-test-failure > Fix For: 3.8.0 > > > It appears that recent changes to the describe topic output has broken the > system test's ability to parse the output. > {noformat} > test_id: > kafkatest.tests.core.reassign_partitions_test.ReassignPartitionsTest.test_reassign_partitions.bounce_brokers=False.reassign_from_offset_zero=True.metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=True.group_protocol=consumer > status: FAIL > run time: 50.333 seconds > IndexError('list index out of range') > Traceback (most recent call last): > File > "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py", > line 184, in _do_run > data = self.run_test() > File > "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py", > line 262, in run_test > return self.test_context.function(self.test) > File > "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/mark/_mark.py", > line 433, in wrapper > return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs) > File > "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/core/reassign_partitions_test.py", > line 175, in test_reassign_partitions > self.run_produce_consume_validate(core_test_action=lambda: > self.reassign_partitions(bounce_brokers)) > File > "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/produce_consume_validate.py", > line 105, in run_produce_consume_validate > core_test_action(*args) > File > "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/core/reassign_partitions_test.py", > line 175, in > self.run_produce_consume_validate(core_test_action=lambda: > self.reassign_partitions(bounce_brokers)) > File > "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/core/reassign_partitions_test.py", > line 82, in reassign_partitions > partition_info = > self.kafka.parse_describe_topic(self.kafka.describe_topic(self.topic)) > File > "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/services/kafka/kafka.py", > line 1400, in parse_describe_topic > fields = list(map(lambda x: x.split(" ")[1], fields)) > File > "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/services/kafka/kafka.py", > line 1400, in > fields = list(map(lambda x: x.split(" ")[1], fields)) > IndexError: list index out of range > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16623) KafkaAsyncConsumer system tests warn about revoking partitions that weren't previously assigned
Kirk True created KAFKA-16623: - Summary: KafkaAsyncConsumer system tests warn about revoking partitions that weren't previously assigned Key: KAFKA-16623 URL: https://issues.apache.org/jira/browse/KAFKA-16623 Project: Kafka Issue Type: Bug Components: clients, consumer, system tests Affects Versions: 3.8.0 Reporter: Kirk True Assignee: Kirk True Fix For: 3.8.0 When running system tests for the KafkaAsyncConsumer, we occasionally see this warning: {noformat} File "/usr/lib/python3.7/threading.py", line 917, in _bootstrap_inner self.run() File "/usr/lib/python3.7/threading.py", line 865, in run self._target(*self._args, **self._kwargs) File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/services/background_thread.py", line 38, in _protected_worker self._worker(idx, node) File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/services/verifiable_consumer.py", line 304, in _worker handler.handle_partitions_revoked(event, node, self.logger) File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/services/verifiable_consumer.py", line 163, in handle_partitions_revoked (tp, node.account.hostname) AssertionError: Topic partition TopicPartition(topic='test_topic', partition=0) cannot be revoked from worker20 as it was not previously assigned to that consumer {noformat} It is unclear what is causing this. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16609) Update parse_describe_topic to support new topic describe output
Kirk True created KAFKA-16609: - Summary: Update parse_describe_topic to support new topic describe output Key: KAFKA-16609 URL: https://issues.apache.org/jira/browse/KAFKA-16609 Project: Kafka Issue Type: Bug Components: admin, system tests Affects Versions: 3.8.0 Reporter: Kirk True Assignee: Kirk True Fix For: 3.8.0 It appears that recent changes to the describe topic output has broken the system test's ability to parse the output. {noformat} test_id: kafkatest.tests.core.reassign_partitions_test.ReassignPartitionsTest.test_reassign_partitions.bounce_brokers=False.reassign_from_offset_zero=True.metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=True.group_protocol=consumer status: FAIL run time: 50.333 seconds IndexError('list index out of range') Traceback (most recent call last): File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py", line 184, in _do_run data = self.run_test() File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py", line 262, in run_test return self.test_context.function(self.test) File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/mark/_mark.py", line 433, in wrapper return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs) File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/core/reassign_partitions_test.py", line 175, in test_reassign_partitions self.run_produce_consume_validate(core_test_action=lambda: self.reassign_partitions(bounce_brokers)) File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/produce_consume_validate.py", line 105, in run_produce_consume_validate core_test_action(*args) File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/core/reassign_partitions_test.py", line 175, in self.run_produce_consume_validate(core_test_action=lambda: self.reassign_partitions(bounce_brokers)) File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/core/reassign_partitions_test.py", line 82, in reassign_partitions partition_info = self.kafka.parse_describe_topic(self.kafka.describe_topic(self.topic)) File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/services/kafka/kafka.py", line 1400, in parse_describe_topic fields = list(map(lambda x: x.split(" ")[1], fields)) File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/services/kafka/kafka.py", line 1400, in fields = list(map(lambda x: x.split(" ")[1], fields)) IndexError: list index out of range {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16462) New consumer fails with timeout in security_test.py system test
[ https://issues.apache.org/jira/browse/KAFKA-16462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True resolved KAFKA-16462. --- Resolution: Duplicate > New consumer fails with timeout in security_test.py system test > --- > > Key: KAFKA-16462 > URL: https://issues.apache.org/jira/browse/KAFKA-16462 > Project: Kafka > Issue Type: Bug > Components: clients, consumer, system tests >Affects Versions: 3.7.0 >Reporter: Kirk True >Assignee: Kirk True >Priority: Blocker > Labels: kip-848-client-support, system-tests > Fix For: 3.8.0 > > > The {{security_test.py}} system test fails with the following error: > {noformat} > test_id: > kafkatest.tests.core.security_test.SecurityTest.test_client_ssl_endpoint_validation_failure.security_protocol=SSL.interbroker_security_protocol=PLAINTEXT.metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=True.group_protocol=consumer > status: FAIL > run time: 1 minute 30.885 seconds > TimeoutError('') > Traceback (most recent call last): > File > "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py", > line 184, in _do_run > data = self.run_test() > File > "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py", > line 262, in run_test > return self.test_context.function(self.test) > File > "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/mark/_mark.py", > line 433, in wrapper > return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs) > File > "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/core/security_test.py", > line 142, in test_client_ssl_endpoint_validation_failure > wait_until(lambda: self.producer_consumer_have_expected_error(error), > timeout_sec=30) > File > "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/utils/util.py", > line 58, in wait_until > raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from > last_exception > ducktape.errors.TimeoutError > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16464) New consumer fails with timeout in replication_replica_failure_test.py system test
[ https://issues.apache.org/jira/browse/KAFKA-16464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True resolved KAFKA-16464. --- Resolution: Duplicate > New consumer fails with timeout in replication_replica_failure_test.py system > test > -- > > Key: KAFKA-16464 > URL: https://issues.apache.org/jira/browse/KAFKA-16464 > Project: Kafka > Issue Type: Bug > Components: clients, consumer, system tests >Affects Versions: 3.7.0 >Reporter: Kirk True >Assignee: Kirk True >Priority: Blocker > Labels: kip-848-client-support, system-tests > Fix For: 3.8.0 > > > The {{replication_replica_failure_test.py}} system test fails with the > following error: > {noformat} > test_id: > kafkatest.tests.core.replication_replica_failure_test.ReplicationReplicaFailureTest.test_replication_with_replica_failure.metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=True.group_protocol=consumer > status: FAIL > run time: 1 minute 20.972 seconds > TimeoutError('Timed out after 30s while awaiting initial record delivery > of 5 records') > Traceback (most recent call last): > File > "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py", > line 184, in _do_run > data = self.run_test() > File > "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py", > line 262, in run_test > return self.test_context.function(self.test) > File > "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/mark/_mark.py", > line 433, in wrapper > return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs) > File > "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/core/replication_replica_failure_test.py", > line 97, in test_replication_with_replica_failure > self.await_startup() > File > "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/end_to_end.py", > line 125, in await_startup > (timeout_sec, min_records)) > File > "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/utils/util.py", > line 58, in wait_until > raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from > last_exception > ducktape.errors.TimeoutError: Timed out after 30s while awaiting initial > record delivery of 5 records > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16579) Revert changes to consumer_rolling_upgrade_test.py for the new async Consumer
Kirk True created KAFKA-16579: - Summary: Revert changes to consumer_rolling_upgrade_test.py for the new async Consumer Key: KAFKA-16579 URL: https://issues.apache.org/jira/browse/KAFKA-16579 Project: Kafka Issue Type: Task Components: clients, consumer, system tests Affects Versions: 3.8.0 Reporter: Kirk True Assignee: Kirk True Fix For: 3.8.0 To test the new, asynchronous Kafka {{Consumer}} implementation, we migrated a slew of system tests to run both the "old" and "new" implementations. KAFKA-16272 updated the system tests in {{connect_distributed_test.py}} so it could test the new consumer with Connect. However, we are not supporting Connect with the new consumer in 3.8.0. Unsurprisingly, when we run the Connect system tests with the new {{{}AsyncKafkaConsumer{}}}, we get errors like the following: {code:java} test_id: kafkatest.tests.connect.connect_distributed_test.ConnectDistributedTest.test_exactly_once_source.clean=False.connect_protocol=eager.metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=True.group_protocol=consumer status: FAIL run time: 6 minutes 3.899 seconds InsufficientResourcesError('Not enough nodes available to allocate. linux nodes requested: 1. linux nodes available: 0') Traceback (most recent call last): File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py", line 184, in _do_run data = self.run_test() File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py", line 262, in run_test return self.test_context.function(self.test) File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/mark/_mark.py", line 433, in wrapper return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs) File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/connect/connect_distributed_test.py", line 919, in test_exactly_once_source consumer_validator = ConsoleConsumer(self.test_context, 1, self.kafka, self.source.topic, consumer_timeout_ms=1000, print_key=True) File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/services/console_consumer.py", line 97, in __init__ BackgroundThreadService.__init__(self, context, num_nodes) File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/services/background_thread.py", line 26, in __init__ super(BackgroundThreadService, self).__init__(context, num_nodes, cluster_spec, *args, **kwargs) File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/services/service.py", line 107, in __init__ self.allocate_nodes() File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/services/service.py", line 217, in allocate_nodes self.nodes = self.cluster.alloc(self.cluster_spec) File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/cluster/cluster.py", line 54, in alloc allocated = self.do_alloc(cluster_spec) File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/cluster/finite_subcluster.py", line 31, in do_alloc allocated = self._available_nodes.remove_spec(cluster_spec) File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/cluster/node_container.py", line 117, in remove_spec raise InsufficientResourcesError("Not enough nodes available to allocate. " + msg) ducktape.cluster.node_container.InsufficientResourcesError: Not enough nodes available to allocate. linux nodes requested: 1. linux nodes available: 0 {code} The task here is to revert the changes made in KAFKA-16272 [PR 15576|https://github.com/apache/kafka/pull/15576]. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16578) Revert changes to connect_distributed_test.py for the new async Consumer
Kirk True created KAFKA-16578: - Summary: Revert changes to connect_distributed_test.py for the new async Consumer Key: KAFKA-16578 URL: https://issues.apache.org/jira/browse/KAFKA-16578 Project: Kafka Issue Type: Task Components: clients, consumer, system tests Affects Versions: 3.8.0 Reporter: Kirk True Assignee: Kirk True Fix For: 3.8.0 To test the new, asynchronous Kafka {{Consumer}} implementation, we migrated a slew of system tests to run both the "old" and "new" implementations. KAFKA-16272 updated the system tests in {{connect_distributed_test.py}} so it could test the new consumer with Connect. However, we are not supporting Connect with the new consumer in 3.8.0. Unsurprisingly, when we run the Connect system tests with the new {{{}AsyncKafkaConsumer{}}}, we get errors like the following: {code:java} test_id: kafkatest.tests.connect.connect_distributed_test.ConnectDistributedTest.test_exactly_once_source.clean=False.connect_protocol=eager.metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=True.group_protocol=consumer status: FAIL run time: 6 minutes 3.899 seconds InsufficientResourcesError('Not enough nodes available to allocate. linux nodes requested: 1. linux nodes available: 0') Traceback (most recent call last): File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py", line 184, in _do_run data = self.run_test() File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py", line 262, in run_test return self.test_context.function(self.test) File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/mark/_mark.py", line 433, in wrapper return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs) File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/connect/connect_distributed_test.py", line 919, in test_exactly_once_source consumer_validator = ConsoleConsumer(self.test_context, 1, self.kafka, self.source.topic, consumer_timeout_ms=1000, print_key=True) File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/services/console_consumer.py", line 97, in __init__ BackgroundThreadService.__init__(self, context, num_nodes) File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/services/background_thread.py", line 26, in __init__ super(BackgroundThreadService, self).__init__(context, num_nodes, cluster_spec, *args, **kwargs) File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/services/service.py", line 107, in __init__ self.allocate_nodes() File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/services/service.py", line 217, in allocate_nodes self.nodes = self.cluster.alloc(self.cluster_spec) File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/cluster/cluster.py", line 54, in alloc allocated = self.do_alloc(cluster_spec) File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/cluster/finite_subcluster.py", line 31, in do_alloc allocated = self._available_nodes.remove_spec(cluster_spec) File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/cluster/node_container.py", line 117, in remove_spec raise InsufficientResourcesError("Not enough nodes available to allocate. " + msg) ducktape.cluster.node_container.InsufficientResourcesError: Not enough nodes available to allocate. linux nodes requested: 1. linux nodes available: 0 {code} The task here is to revert the changes made in KAFKA-16272 [PR 15576|https://github.com/apache/kafka/pull/15576]. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16577) New consumer fails with stop timeout in consumer_test.py’s test_consumer_bounce system test
Kirk True created KAFKA-16577: - Summary: New consumer fails with stop timeout in consumer_test.py’s test_consumer_bounce system test Key: KAFKA-16577 URL: https://issues.apache.org/jira/browse/KAFKA-16577 Project: Kafka Issue Type: Bug Components: clients, consumer, system tests Affects Versions: 3.7.0 Reporter: Kirk True Assignee: Kirk True Fix For: 3.8.0 The {{consumer_test.py}} system test intermittently fails with the following error: {code} test_id: kafkatest.tests.client.consumer_test.OffsetValidationTest.test_consumer_failure.clean_shutdown=True.enable_autocommit=True.metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=True.group_protocol=consumer status: FAIL run time: 42.582 seconds AssertionError() Traceback (most recent call last): File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py", line 184, in _do_run data = self.run_test() File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py", line 262, in run_test return self.test_context.function(self.test) File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/mark/_mark.py", line 433, in wrapper return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs) File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/client/consumer_test.py", line 399, in test_consumer_failure assert partition_owner is not None AssertionError Notify {code} Affected tests: * {{test_consumer_failure}} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16576) New consumer fails with assert in consumer_test.py’s test_consumer_failure system test
Kirk True created KAFKA-16576: - Summary: New consumer fails with assert in consumer_test.py’s test_consumer_failure system test Key: KAFKA-16576 URL: https://issues.apache.org/jira/browse/KAFKA-16576 Project: Kafka Issue Type: Bug Components: clients, consumer, system tests Affects Versions: 3.7.0 Reporter: Kirk True Assignee: Kirk True Fix For: 3.8.0 The {{consumer_test.py}} system test fails with the following errors: {quote} * Timed out waiting for consumption {quote} Affected tests: * {{test_broker_failure}} * {{test_consumer_bounce}} * {{test_static_consumer_bounce}} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16405) Mismatch assignment error when running consumer rolling upgrade system tests
[ https://issues.apache.org/jira/browse/KAFKA-16405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True resolved KAFKA-16405. --- Reviewer: Lucas Brutschy Resolution: Fixed > Mismatch assignment error when running consumer rolling upgrade system tests > > > Key: KAFKA-16405 > URL: https://issues.apache.org/jira/browse/KAFKA-16405 > Project: Kafka > Issue Type: Bug > Components: clients, consumer, system tests >Reporter: Philip Nee >Assignee: Philip Nee >Priority: Blocker > Labels: kip-848-client-support, system-tests > Fix For: 3.8.0 > > > relevant to [https://github.com/apache/kafka/pull/15578] > > We are seeing: > {code:java} > > SESSION REPORT (ALL TESTS) > ducktape version: 0.11.4 > session_id: 2024-03-21--001 > run time: 3 minutes 24.632 seconds > tests run:7 > passed: 5 > flaky:0 > failed: 2 > ignored: 0 > > test_id: > kafkatest.tests.client.consumer_rolling_upgrade_test.ConsumerRollingUpgradeTest.rolling_update_test.metadata_quorum=COMBINED_KRAFT.use_new_coordinator=True.group_protocol=classic > status: PASS > run time: 24.599 seconds > > test_id: > kafkatest.tests.client.consumer_rolling_upgrade_test.ConsumerRollingUpgradeTest.rolling_update_test.metadata_quorum=COMBINED_KRAFT.use_new_coordinator=True.group_protocol=consumer > status: FAIL > run time: 26.638 seconds > AssertionError("Mismatched assignment: {frozenset(), > frozenset({TopicPartition(topic='test_topic', partition=3), > TopicPartition(topic='test_topic', partition=0), > TopicPartition(topic='test_topic', partition=1), > TopicPartition(topic='test_topic', partition=2)})}") > Traceback (most recent call last): > File > "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", > line 186, in _do_run > data = self.run_test() > File > "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", > line 246, in run_test > return self.test_context.function(self.test) > File "/usr/local/lib/python3.9/dist-packages/ducktape/mark/_mark.py", line > 433, in wrapper > return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs) > File > "/opt/kafka-dev/tests/kafkatest/tests/client/consumer_rolling_upgrade_test.py", > line 77, in rolling_update_test > self._verify_range_assignment(consumer) > File > "/opt/kafka-dev/tests/kafkatest/tests/client/consumer_rolling_upgrade_test.py", > line 38, in _verify_range_assignment > assert assignment == set([ > AssertionError: Mismatched assignment: {frozenset(), > frozenset({TopicPartition(topic='test_topic', partition=3), > TopicPartition(topic='test_topic', partition=0), > TopicPartition(topic='test_topic', partition=1), > TopicPartition(topic='test_topic', partition=2)})} > > test_id: > kafkatest.tests.client.consumer_rolling_upgrade_test.ConsumerRollingUpgradeTest.rolling_update_test.metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=False > status: PASS > run time: 29.815 seconds > > test_id: > kafkatest.tests.client.consumer_rolling_upgrade_test.ConsumerRollingUpgradeTest.rolling_update_test.metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=True > status: PASS > run time: 29.766 seconds > > test_id: > kafkatest.tests.client.consumer_rolling_upgrade_test.ConsumerRollingUpgradeTest.rolling_update_test.metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=True.group_protocol=classic > status: PASS > run time: 30.086 seconds > > test_id: > kafkatest.tests.client.consumer_rolling_upgrade_test.ConsumerRollingUpgradeTest.rolling_update_test.metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=True.group_protocol=consumer > status: FAIL > run time: 35.965 seconds > AssertionError("Mismatched assignment: {frozenset(), > frozenset({TopicPartition(topic='test_topic', partition=3), > TopicPartition(topic='test_topic', partition=0), > TopicPartition(topic='test_topic', partition=1), > TopicPartition(topic='test_topic', partition=2)})}") > Traceback (most recent call last): > File > "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", > line 186, in _do_run >
[jira] [Created] (KAFKA-16565) IncrementalAssignmentConsumerEventHandler throws error when attempting to remove a partition that isn't assigned
Kirk True created KAFKA-16565: - Summary: IncrementalAssignmentConsumerEventHandler throws error when attempting to remove a partition that isn't assigned Key: KAFKA-16565 URL: https://issues.apache.org/jira/browse/KAFKA-16565 Project: Kafka Issue Type: Bug Components: clients, consumer, system tests Affects Versions: 3.8.0 Reporter: Kirk True Fix For: 3.8.0 In {{{}verifiable_consumer.py{}}}, the Incremental {code:java} def handle_partitions_revoked(self, event): self.revoked_count += 1 self.state = ConsumerState.Rebalancing self.position = {} for topic_partition in event["partitions"]: topic = topic_partition["topic"] partition = topic_partition["partition"] self.assignment.remove(TopicPartition(topic, partition)) {code} If the {{self.assignment.remove()}} call is passed a {{TopicPartition}} that isn't in the list, an error is thrown. For now, we should first check that the {{TopicPartition}} is in the list, and if not, log a warning or something. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16558) Implement HeartbeatRequestState.toStringBase()
Kirk True created KAFKA-16558: - Summary: Implement HeartbeatRequestState.toStringBase() Key: KAFKA-16558 URL: https://issues.apache.org/jira/browse/KAFKA-16558 Project: Kafka Issue Type: Bug Components: clients, consumer Affects Versions: 3.7.0 Reporter: Kirk True Assignee: Kirk True Fix For: 3.8.0 The code incorrectly overrides the {{toString()}} method instead of overriding {{{}toStringBase(){}}}. This affects debugging and troubleshooting consumer issues. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16557) Fix CommitRequestManager’s OffsetFetchRequestState.toString()
Kirk True created KAFKA-16557: - Summary: Fix CommitRequestManager’s OffsetFetchRequestState.toString() Key: KAFKA-16557 URL: https://issues.apache.org/jira/browse/KAFKA-16557 Project: Kafka Issue Type: Bug Components: clients, consumer Affects Versions: 3.7.0 Reporter: Kirk True Assignee: Kirk True Fix For: 3.8.0 The code incorrectly overrides the {{toString()}} method instead of overriding {{{}toStringBase(){}}}. This affects debugging and troubleshooting consumer issues. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16556) Race condition between ConsumerRebalanceListener and SubscriptionState
Kirk True created KAFKA-16556: - Summary: Race condition between ConsumerRebalanceListener and SubscriptionState Key: KAFKA-16556 URL: https://issues.apache.org/jira/browse/KAFKA-16556 Project: Kafka Issue Type: Bug Components: clients, consumer Affects Versions: 3.7.0 Reporter: Kirk True Fix For: 3.8.0 There appears to be a race condition between invoking the {{ConsumerRebalanceListener}} callbacks on reconciliation and {{initWithCommittedOffsetsIfNeeded}} in the consumer. The membership manager adds the newly assigned partitions to the {{{}SubscriptionState{}}}, but marks them as {{{}pendingOnAssignedCallback{}}}. Then, after the {{ConsumerRebalanceListener.onPartitionsAssigned()}} completes, the membership manager will invoke {{enablePartitionsAwaitingCallback}} to set all of those partitions' 'pending' flag to false. During the main {{Consumer.poll()}} loop, {{AsyncKafkaConsumer}} may need to call {{initWithCommittedOffsetsIfNeeded()}} if the positions aren't already cached. Inside {{{}initWithCommittedOffsetsIfNeeded{}}}, the consumer calls the subscription's {{initializingPartitions}} method to get a set of the partitions for which to fetch their committed offsets. However, {{SubscriptionState.initializingPartitions()}} only returns partitions that have the {{pendingOnAssignedCallback}} flag set to to false. The result is: * If the {{MembershipManagerImpl.assignPartitions()}} future is completed on the background thread first, the 'pending' flag is set to false. On the application thread, when {{SubscriptionState.initializingPartitions()}} is called, it returns the partition, and we fetch its committed offsets * If instead the application thread calls {{SubscriptionState.initializingPartitions()}} first, the partitions's 'pending' flag is still set to false, and so the partition is omitted from the returned set. The {{updateFetchPositions()}} method then continues on and re-initializes the partition's fetch offset to 0. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16555) Consumer's RequestState has incorrect logic to determine if inflight
Kirk True created KAFKA-16555: - Summary: Consumer's RequestState has incorrect logic to determine if inflight Key: KAFKA-16555 URL: https://issues.apache.org/jira/browse/KAFKA-16555 Project: Kafka Issue Type: Task Components: clients, consumer Affects Versions: 3.7.0 Reporter: Kirk True Assignee: Kirk True Fix For: 3.8.0 When running system tests for the new consumer, I've hit an issue where the {{HeartbeatRequestManager}} is sending out multiple concurrent {{CONSUMER_GROUP_REQUEST}} RPCs. The effect is the coordinator creates multiple members which causes downstream assignment problems. Here's the order of events: * Time 202: {{HearbeatRequestManager.poll()}} determines it's OK to send a request. In so doing, it updates the {{RequestState}}'s {{lastSentMs}} to the current timestamp, 202 * Time 236: the response is received and response handler is invoked, setting the {{RequestState}}'s {{lastReceivedMs}} to the current timestamp, 236 * Time 236: {{HearbeatRequestManager.poll()}} is invoked again, and it sees that it's OK to send a request. It creates another request, once again updating the {{RequestState}}'s {{lastSentMs}} to the current timestamp, 236 * Time 237: {{HearbeatRequestManager.poll()}} is invoked again, and ERRONEOUSLY decides it's OK to send another request, despite one already in flight. Here's the problem with {{requestInFlight()}}: {code:java} public boolean requestInFlight() { return this.lastSentMs > -1 && this.lastReceivedMs < this.lastSentMs; } {code} On our case, {{lastReceivedMs}} is 236 and {{lastSentMs}} is _also_ 236. So the received timestamp is _equal_ to the sent timestamp, not _less_. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16465) New consumer does not invoke rebalance callbacks as expected in consumer_test.py system test
Kirk True created KAFKA-16465: - Summary: New consumer does not invoke rebalance callbacks as expected in consumer_test.py system test Key: KAFKA-16465 URL: https://issues.apache.org/jira/browse/KAFKA-16465 Project: Kafka Issue Type: Bug Components: clients, consumer, system tests Affects Versions: 3.7.0 Reporter: Kirk True Assignee: Kirk True Fix For: 3.8.0 The {{replication_replica_failure_test.py}} system test fails with the following error: {noformat} test_id: kafkatest.tests.core.replication_replica_failure_test.ReplicationReplicaFailureTest.test_replication_with_replica_failure.metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=True.group_protocol=consumer status: FAIL run time: 1 minute 20.972 seconds TimeoutError('Timed out after 30s while awaiting initial record delivery of 5 records') Traceback (most recent call last): File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py", line 184, in _do_run data = self.run_test() File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py", line 262, in run_test return self.test_context.function(self.test) File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/mark/_mark.py", line 433, in wrapper return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs) File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/core/replication_replica_failure_test.py", line 97, in test_replication_with_replica_failure self.await_startup() File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/end_to_end.py", line 125, in await_startup (timeout_sec, min_records)) File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/utils/util.py", line 58, in wait_until raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from last_exception ducktape.errors.TimeoutError: Timed out after 30s while awaiting initial record delivery of 5 records {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16464) New consumer fails with timeout in replication_replica_failure_test.py system test
Kirk True created KAFKA-16464: - Summary: New consumer fails with timeout in replication_replica_failure_test.py system test Key: KAFKA-16464 URL: https://issues.apache.org/jira/browse/KAFKA-16464 Project: Kafka Issue Type: Bug Components: clients, consumer, system tests Affects Versions: 3.7.0 Reporter: Kirk True Assignee: Kirk True Fix For: 3.8.0 The {{security_test.py}} system test fails with the following error: {noformat} test_id: kafkatest.tests.core.security_test.SecurityTest.test_client_ssl_endpoint_validation_failure.security_protocol=SSL.interbroker_security_protocol=PLAINTEXT.metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=True.group_protocol=consumer status: FAIL run time: 1 minute 30.885 seconds TimeoutError('') Traceback (most recent call last): File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py", line 184, in _do_run data = self.run_test() File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py", line 262, in run_test return self.test_context.function(self.test) File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/mark/_mark.py", line 433, in wrapper return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs) File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/core/security_test.py", line 142, in test_client_ssl_endpoint_validation_failure wait_until(lambda: self.producer_consumer_have_expected_error(error), timeout_sec=30) File "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/utils/util.py", line 58, in wait_until raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from last_exception ducktape.errors.TimeoutError {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16462) New consumer fails with timeout in security_test.py system test
Kirk True created KAFKA-16462: - Summary: New consumer fails with timeout in security_test.py system test Key: KAFKA-16462 URL: https://issues.apache.org/jira/browse/KAFKA-16462 Project: Kafka Issue Type: Bug Components: clients, consumer, system tests Affects Versions: 3.7.0 Reporter: Kirk True Assignee: Kirk True Fix For: 3.8.0 The {{security_test.py}} system test fails with the following error: {quote} * Consumer failed to consume up to offsets {quote} Affected test: * {{test_client_ssl_endpoint_validation_failure}} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16461) New consumer fails to consume records in security_test.py system test
Kirk True created KAFKA-16461: - Summary: New consumer fails to consume records in security_test.py system test Key: KAFKA-16461 URL: https://issues.apache.org/jira/browse/KAFKA-16461 Project: Kafka Issue Type: Bug Components: clients, consumer, system tests Affects Versions: 3.7.0 Reporter: Kirk True Assignee: Kirk True Fix For: 3.8.0 The {{consumer_test.py}} system test fails with the following errors: {quote} * Timed out waiting for consumption {quote} Affected tests: * {{test_broker_failure}} * {{test_consumer_bounce}} * {{test_static_consumer_bounce}} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16460) New consumer times out system test
Kirk True created KAFKA-16460: - Summary: New consumer times out system test Key: KAFKA-16460 URL: https://issues.apache.org/jira/browse/KAFKA-16460 Project: Kafka Issue Type: Bug Components: clients, consumer, system tests Affects Versions: 3.7.0 Reporter: Kirk True Assignee: Kirk True Fix For: 3.8.0 The {{consumer_test.py}} system test fails with two different errors related to consumers joining the consumer group in a timely fashion. {quote} * Consumers failed to join in a reasonable amount of time * Timed out waiting for consumers to join, expected total X joined, but only see Y joined fromnormal consumer group and Z from conflict consumer group{quote} Affected tests: * {{test_fencing_static_consumer}} * {{test_static_consumer_bounce}} * {{test_static_consumer_persisted_after_rejoin}} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16459) consumer_test.py’s static membership tests fail with new consumer
Kirk True created KAFKA-16459: - Summary: consumer_test.py’s static membership tests fail with new consumer Key: KAFKA-16459 URL: https://issues.apache.org/jira/browse/KAFKA-16459 Project: Kafka Issue Type: Bug Components: clients, consumer, system tests Affects Versions: 3.7.0 Reporter: Kirk True Assignee: Philip Nee Fix For: 3.8.0 The following error is reported when running the {{test_valid_assignment}} test from {{consumer_test.py}}: {code} Traceback (most recent call last): File "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", line 186, in _do_run data = self.run_test() File "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", line 246, in run_test return self.test_context.function(self.test) File "/usr/local/lib/python3.9/dist-packages/ducktape/mark/_mark.py", line 433, in wrapper return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs) File "/opt/kafka-dev/tests/kafkatest/tests/client/consumer_test.py", line 584, in test_valid_assignment wait_until(lambda: self.valid_assignment(self.TOPIC, self.NUM_PARTITIONS, consumer.current_assignment()), File "/usr/local/lib/python3.9/dist-packages/ducktape/utils/util.py", line 58, in wait_until raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from last_exception ducktape.errors.TimeoutError: expected valid assignments of 6 partitions when num_started 2: [('ducker@ducker05', []), ('ducker@ducker06', [])] {code} To reproduce, create a system test suite file named {{test_valid_assignment.yml}} with these contents: {code:yaml} failures: - 'kafkatest/tests/client/consumer_test.py::AssignmentValidationTest.test_valid_assignment@{"metadata_quorum":"ISOLATED_KRAFT","use_new_coordinator":true,"group_protocol":"consumer","group_remote_assignor":"range"}' {code} Then set the the {{TC_PATHS}} environment variable to include that test suite file. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16444) Run KIP-848 unit tests under code coverage
Kirk True created KAFKA-16444: - Summary: Run KIP-848 unit tests under code coverage Key: KAFKA-16444 URL: https://issues.apache.org/jira/browse/KAFKA-16444 Project: Kafka Issue Type: Task Components: clients, consumer, unit tests Affects Versions: 3.7.0 Reporter: Kirk True Assignee: Kirk True Fix For: 3.8.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16443) Update streams_static_membership_test.py to support KIP-848’s group protocol config
Kirk True created KAFKA-16443: - Summary: Update streams_static_membership_test.py to support KIP-848’s group protocol config Key: KAFKA-16443 URL: https://issues.apache.org/jira/browse/KAFKA-16443 Project: Kafka Issue Type: Test Components: clients, consumer, system tests Affects Versions: 3.7.0 Reporter: Kirk True Assignee: Kirk True Fix For: 4.0.0 This task is to update the test method(s) in {{streams_standby_replica_test.py}} to support the {{group.protocol}} configuration introduced in [KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol] by adding an optional {{group_protocol}} argument to the tests and matrixes. See KAFKA-16231 as an example of how the test parameters can be changed. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16442) Update streams_standby_replica_test.py to support KIP-848’s group protocol config
Kirk True created KAFKA-16442: - Summary: Update streams_standby_replica_test.py to support KIP-848’s group protocol config Key: KAFKA-16442 URL: https://issues.apache.org/jira/browse/KAFKA-16442 Project: Kafka Issue Type: Test Components: clients, consumer, system tests Affects Versions: 3.7.0 Reporter: Kirk True Assignee: Kirk True Fix For: 4.0.0 This task is to update the test method(s) in {{security_test.py}} to support the {{group.protocol}} configuration introduced in [KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol] by adding an optional {{group_protocol}} argument to the tests and matrixes. See KAFKA-16231 as an example of how the test parameters can be changed. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16440) Update security_test.py to support KIP-848’s group protocol config
Kirk True created KAFKA-16440: - Summary: Update security_test.py to support KIP-848’s group protocol config Key: KAFKA-16440 URL: https://issues.apache.org/jira/browse/KAFKA-16440 Project: Kafka Issue Type: Test Components: clients, consumer, system tests Affects Versions: 3.7.0 Reporter: Kirk True Assignee: Kirk True Fix For: 3.8.0 This task is to update the test method(s) in {{replication_replica_failure_test.py}} to support the {{group.protocol}} configuration introduced in [KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol] by adding an optional {{group_protocol}} argument to the tests and matrixes. See KAFKA-16231 as an example of how the test parameters can be changed. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16441) Update streams_broker_down_resilience_test.py to support KIP-848’s group protocol config
Kirk True created KAFKA-16441: - Summary: Update streams_broker_down_resilience_test.py to support KIP-848’s group protocol config Key: KAFKA-16441 URL: https://issues.apache.org/jira/browse/KAFKA-16441 Project: Kafka Issue Type: Test Components: clients, consumer, system tests Affects Versions: 3.7.0 Reporter: Kirk True Assignee: Kirk True Fix For: 3.8.0 This task is to update the test method(s) in {{security_test.py}} to support the {{group.protocol}} configuration introduced in [KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol] by adding an optional {{group_protocol}} argument to the tests and matrixes. See KAFKA-16231 as an example of how the test parameters can be changed. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16439) Update replication_replica_failure_test.py to support KIP-848’s group protocol config
Kirk True created KAFKA-16439: - Summary: Update replication_replica_failure_test.py to support KIP-848’s group protocol config Key: KAFKA-16439 URL: https://issues.apache.org/jira/browse/KAFKA-16439 Project: Kafka Issue Type: Test Components: clients, consumer, system tests Affects Versions: 3.7.0 Reporter: Kirk True Assignee: Kirk True Fix For: 3.8.0 This task is to update the test method(s) in {{replica_scale_test.py}} to support the {{group.protocol}} configuration introduced in [KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol] by adding an optional {{group_protocol}} argument to the tests and matrixes. See KAFKA-16231 as an example of how the test parameters can be changed. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16438) Update consumer_test.py’s static tests to support KIP-848’s group protocol config
Kirk True created KAFKA-16438: - Summary: Update consumer_test.py’s static tests to support KIP-848’s group protocol config Key: KAFKA-16438 URL: https://issues.apache.org/jira/browse/KAFKA-16438 Project: Kafka Issue Type: Bug Components: clients, consumer, system tests Reporter: Kirk True Assignee: Kirk True This task is to update the following test method(s) in {{consumer_test.py}} to support the {{group.protocol}} configuration: * {{test_fencing_static_consumer}} * {{test_static_consumer_bounce}} * {{test_static_consumer_persisted_after_rejoin}} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16271) Update consumer_rolling_upgrade_test.py to support KIP-848’s group protocol config
[ https://issues.apache.org/jira/browse/KAFKA-16271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True resolved KAFKA-16271. --- Reviewer: Lucas Brutschy Resolution: Fixed > Update consumer_rolling_upgrade_test.py to support KIP-848’s group protocol > config > -- > > Key: KAFKA-16271 > URL: https://issues.apache.org/jira/browse/KAFKA-16271 > Project: Kafka > Issue Type: Test > Components: clients, consumer, system tests >Affects Versions: 3.7.0 >Reporter: Kirk True >Assignee: Philip Nee >Priority: Blocker > Labels: kip-848-client-support, system-tests > Fix For: 3.8.0 > > > This task is to update the test method(s) in > {{consumer_rolling_upgrade_test.py}} to support the {{group.protocol}} > configuration introduced in > [KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol] > by adding an optional {{group_protocol}} argument to the tests and matrixes. > See KAFKA-16231 as an example of how the test parameters can be changed. > The tricky wrinkle here is that the existing test relies on client-side > assignment strategies that aren't applicable with the new KIP-848-enabled > consumer. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-14246) Update threading model for Consumer
[ https://issues.apache.org/jira/browse/KAFKA-14246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True resolved KAFKA-14246. --- Resolution: Fixed > Update threading model for Consumer > --- > > Key: KAFKA-14246 > URL: https://issues.apache.org/jira/browse/KAFKA-14246 > Project: Kafka > Issue Type: Improvement > Components: clients, consumer >Reporter: Philip Nee >Assignee: Kirk True >Priority: Major > Labels: consumer-threading-refactor > Fix For: 3.8.0 > > > Hi community, > > We are refactoring the current KafkaConsumer and making it more asynchronous. > This is the master Jira to track the project's progress; subtasks will be > linked to this ticket. Please review the design document and feel free to > use this thread for discussion. > > The design document is here: > [https://cwiki.apache.org/confluence/display/KAFKA/Proposal%3A+Consumer+Threading+Model+Refactor] > > The original email thread is here: > [https://lists.apache.org/thread/13jvwzkzmb8c6t7drs4oj2kgkjzcn52l] > > I will continue to update the 1pager as reviews and comments come. > > Thanks, > P -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16389) consumer_test.py’s test_valid_assignment fails with new consumer
Kirk True created KAFKA-16389: - Summary: consumer_test.py’s test_valid_assignment fails with new consumer Key: KAFKA-16389 URL: https://issues.apache.org/jira/browse/KAFKA-16389 Project: Kafka Issue Type: Bug Components: clients, consumer, system tests Affects Versions: 3.7.0 Reporter: Kirk True Fix For: 3.8.0 The following error is reported when running the {{test_valid_assignment}} test from {{consumer_test.py}}: {code} Traceback (most recent call last): File "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", line 186, in _do_run data = self.run_test() File "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", line 246, in run_test return self.test_context.function(self.test) File "/usr/local/lib/python3.9/dist-packages/ducktape/mark/_mark.py", line 433, in wrapper return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs) File "/opt/kafka-dev/tests/kafkatest/tests/client/consumer_test.py", line 584, in test_valid_assignment wait_until(lambda: self.valid_assignment(self.TOPIC, self.NUM_PARTITIONS, consumer.current_assignment()), File "/usr/local/lib/python3.9/dist-packages/ducktape/utils/util.py", line 58, in wait_until raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from last_exception ducktape.errors.TimeoutError: expected valid assignments of 6 partitions when num_started 2: [('ducker@ducker05', []), ('ducker@ducker06', [])] {code} To reproduce, create a system test suite file named {{test_valid_assignment.yml}} with these contents: {code:yaml} failures: - 'kafkatest/tests/client/consumer_test.py::AssignmentValidationTest.test_valid_assignment@{"metadata_quorum":"ISOLATED_KRAFT","use_new_coordinator":true,"group_protocol":"consumer","group_remote_assignor":"range"}' {code} Then run set the the {{TC_PATHS}} environment variable to include that test suite file. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Reopened] (KAFKA-15691) Add new system tests to use new consumer
[ https://issues.apache.org/jira/browse/KAFKA-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True reopened KAFKA-15691: --- > Add new system tests to use new consumer > > > Key: KAFKA-15691 > URL: https://issues.apache.org/jira/browse/KAFKA-15691 > Project: Kafka > Issue Type: Test > Components: clients, consumer, system tests >Reporter: Kirk True >Assignee: Kirk True >Priority: Major > Labels: kip-848-client-support, system-tests > Fix For: 3.8.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16315) Investigate propagating metadata updates via queues
Kirk True created KAFKA-16315: - Summary: Investigate propagating metadata updates via queues Key: KAFKA-16315 URL: https://issues.apache.org/jira/browse/KAFKA-16315 Project: Kafka Issue Type: Task Components: clients, consumer Affects Versions: 3.7.0 Reporter: Kirk True Fix For: 4.0.0 Some of the new {{AsyncKafkaConsumer}} logic enqueues events for the network I/O thread then issues a call to update the {{ConsumerMetadata}} via {{requestUpdate()}}. If the event ends up stuck in the queue for a long time, it is possible that the metadata is not updated at the correct time. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16312) ConsumerRebalanceListener.onPartitionsAssigned should be called after joining, even if empty
Kirk True created KAFKA-16312: - Summary: ConsumerRebalanceListener.onPartitionsAssigned should be called after joining, even if empty Key: KAFKA-16312 URL: https://issues.apache.org/jira/browse/KAFKA-16312 Project: Kafka Issue Type: Bug Components: clients, consumer Affects Versions: 3.7.0 Reporter: Kirk True Assignee: Lianet Magrans Fix For: 3.8.0 There is a difference between the {{LegacyKafkaConsumer}} and {{AsyncKafkaConsumer}} respecting when the {{ConsumerRebalanceListener.onPartitionsAssigned()}} method is invoked. For example, with {{onPartitionsAssigned()}}: * {{LegacyKafkaConsumer}}: the listener method is invoked when the consumer joins the group, even if that consumer was not assigned any partitions. In this case it's passed an empty list. * {{AsyncKafkaConsumer}}: the listener method is only invoked after the consumer joins the group iff it has assigned partitions This difference is affecting the system tests. The system tests use a Java class named {{VerifiableConsumer}} which uses a {{ConsumerRebalanceListener}} that logs when the callbacks are invoked. The system tests then read from that log to determine when the callbacks are invoked. This coordination is used by the system tests to determine the lifecycle and status of the consumers. The system tests rely heavily on the listener behavior of the {{LegacyKafkaConsumer}}. It invokes the {{onPartitionsAssigned()}} method when the consumer joins the group, and the system tests use that to determine when the consumer is actively a member of the group. This validation of membership is used as an assertion throughout the consumer-related tests. In the system test I'm executing from {{consumer_test.py}}, there's a test that creates three consumers to read from a single topic with a single partition. It's a bit of an oddball test, but it demonstrates the issue. Here are the logs pulled from the test run when executed using the {{LegacyKafkaConsumer}}: Node 1: {code:java} [2024-02-15 00:43:52,400] INFO Adding newly assigned partitions: (org.apache.kafka.clients.consumer.internals.ConsumerRebalanceListenerInvoker){code} Node 2: {code:java} [2024-02-15 00:43:52,401] INFO Adding newly assigned partitions: test_topic-0 (org.apache.kafka.clients.consumer.internals.ConsumerRebalanceListenerInvoker){code} Node 3: {code:java} [2024-02-15 00:43:52,399] INFO Adding newly assigned partitions: (org.apache.kafka.clients.consumer.internals.ConsumerRebalanceListenerInvoker){code} Here are the logs when executing the same test using the {{AsyncKafkaConsumer}}: Node 1: {code:java} [2024-02-15 01:15:46,576] INFO Adding newly assigned partitions: test_topic-0 (org.apache.kafka.clients.consumer.internals.ConsumerRebalanceListenerInvoker){code} Node 2: {code:java}n/a{code} Node 3: {code:java}n/a{code} As a result of this change, the existing system tests do not work with the new consumer. However, even more importantly, this change in behavior may adversely affect existing users. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-15475) Request might retry forever even if the user API timeout expires
[ https://issues.apache.org/jira/browse/KAFKA-15475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True resolved KAFKA-15475. --- Resolution: Fixed > Request might retry forever even if the user API timeout expires > > > Key: KAFKA-15475 > URL: https://issues.apache.org/jira/browse/KAFKA-15475 > Project: Kafka > Issue Type: Bug > Components: clients, consumer >Reporter: Philip Nee >Assignee: Kirk True >Priority: Critical > Labels: consumer-threading-refactor, timeout > Fix For: 3.8.0 > > > If the request timeout in the background thread, it will be completed with > TimeoutException, which is Retriable. In the TopicMetadataRequestManager and > possibly other managers, the request might continue to be retried forever. > > There are two ways to fix this > # Pass a timer to the manager to remove the inflight requests when it is > expired. > # Pass the future to the application layer and continue to retry. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Reopened] (KAFKA-16200) Enforce that RequestManager implementations respect user-provided timeout
[ https://issues.apache.org/jira/browse/KAFKA-16200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True reopened KAFKA-16200: --- > Enforce that RequestManager implementations respect user-provided timeout > - > > Key: KAFKA-16200 > URL: https://issues.apache.org/jira/browse/KAFKA-16200 > Project: Kafka > Issue Type: Improvement > Components: clients, consumer >Reporter: Kirk True >Assignee: Kirk True >Priority: Blocker > Labels: consumer-threading-refactor, timeout > Fix For: 3.8.0 > > > The intention of the {{CompletableApplicationEvent}} is for a {{Consumer}} to > block waiting for the event to complete. The application thread will block > for the timeout, but there is not yet a consistent manner in which events are > timed out. > Enforce at the request manager layer that timeouts are respected per the > design in KAFKA-15848. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16199) Prune the event queue if event timeout expired before starting
[ https://issues.apache.org/jira/browse/KAFKA-16199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True resolved KAFKA-16199. --- Resolution: Duplicate > Prune the event queue if event timeout expired before starting > -- > > Key: KAFKA-16199 > URL: https://issues.apache.org/jira/browse/KAFKA-16199 > Project: Kafka > Issue Type: Improvement > Components: clients, consumer >Reporter: Kirk True >Assignee: Kirk True >Priority: Major > Labels: consumer-threading-refactor, timeout > Fix For: 3.8.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16200) Enforce that RequestManager implementations respect user-provided timeout
[ https://issues.apache.org/jira/browse/KAFKA-16200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True resolved KAFKA-16200. --- Resolution: Duplicate > Enforce that RequestManager implementations respect user-provided timeout > - > > Key: KAFKA-16200 > URL: https://issues.apache.org/jira/browse/KAFKA-16200 > Project: Kafka > Issue Type: Improvement > Components: clients, consumer >Reporter: Kirk True >Assignee: Kirk True >Priority: Blocker > Labels: consumer-threading-refactor, timeout > Fix For: 3.8.0 > > > The intention of the {{CompletableApplicationEvent}} is for a {{Consumer}} to > block waiting for the event to complete. The application thread will block > for the timeout, but there is not yet a consistent manner in which events are > timed out. > Enforce at the request manager layer that timeouts are respected per the > design in KAFKA-15848. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16019) Some of the tests in PlaintextConsumer can't seem to deterministically invoke and verify the consumer callback
[ https://issues.apache.org/jira/browse/KAFKA-16019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True resolved KAFKA-16019. --- Resolution: Fixed > Some of the tests in PlaintextConsumer can't seem to deterministically invoke > and verify the consumer callback > -- > > Key: KAFKA-16019 > URL: https://issues.apache.org/jira/browse/KAFKA-16019 > Project: Kafka > Issue Type: Test > Components: clients, consumer >Reporter: Philip Nee >Assignee: Kirk True >Priority: Critical > Labels: consumer-threading-refactor, integration-tests, timeout > Fix For: 3.8.0 > > > I was running the PlaintextConsumer to test the async consumer; however, a > few tests were failing with not being able to verify the listener is invoked > correctly > For example `testPerPartitionLeadMetricsCleanUpWithSubscribe` > Around 50% of the time, the listener's callsToAssigned was never incremented > correctly. Event changing it to awaitUntilTrue it was still the same case > {code:java} > consumer.subscribe(List(topic, topic2).asJava, listener) > val records = awaitNonEmptyRecords(consumer, tp) > assertEquals(1, listener.callsToAssigned, "should be assigned once") {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16023) PlaintextConsumerTest needs to wait for reconciliation to complete before proceeding
[ https://issues.apache.org/jira/browse/KAFKA-16023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True resolved KAFKA-16023. --- Resolution: Fixed {{testPerPartitionLagMetricsCleanUpWithSubscribe}} is now passing consistently, so marking this as fixed. > PlaintextConsumerTest needs to wait for reconciliation to complete before > proceeding > > > Key: KAFKA-16023 > URL: https://issues.apache.org/jira/browse/KAFKA-16023 > Project: Kafka > Issue Type: Test > Components: clients, consumer >Reporter: Philip Nee >Assignee: Kirk True >Priority: Critical > Labels: consumer-threading-refactor, integration-tests, timeout > Fix For: 3.8.0 > > > Several tests in PlaintextConsumerTest.scala (such as > testPerPartitionLagMetricsCleanUpWithSubscribe) uses: > assertEquals(1, listener.callsToAssigned, "should be assigned once") > However, as the timing for reconciliation completion is not deterministic due > to asynchronous processing. We actually need to wait until the condition to > happen. > However, another issue is the timeout - some of these tasks might not > complete within the 600ms timeout, so the tests are deemed to be flaky. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-15993) Enable max poll integration tests that depend on callback invocation
[ https://issues.apache.org/jira/browse/KAFKA-15993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True resolved KAFKA-15993. --- Resolution: Duplicate > Enable max poll integration tests that depend on callback invocation > > > Key: KAFKA-15993 > URL: https://issues.apache.org/jira/browse/KAFKA-15993 > Project: Kafka > Issue Type: Test > Components: clients, consumer >Reporter: Philip Nee >Assignee: Kirk True >Priority: Blocker > Labels: consumer-threading-refactor, integration-tests, timeout > Fix For: 3.8.0 > > > We will enable integration tests using the async consumer in KAFKA-15971. > However, we should also enable tests that rely on rebalance listeners after > KAFKA-15628 is closed. One example would be testMaxPollIntervalMs, that I > relies on the listener to verify the correctness. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16208) Design new Consumer timeout policy
[ https://issues.apache.org/jira/browse/KAFKA-16208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True resolved KAFKA-16208. --- Resolution: Duplicate > Design new Consumer timeout policy > -- > > Key: KAFKA-16208 > URL: https://issues.apache.org/jira/browse/KAFKA-16208 > Project: Kafka > Issue Type: Task > Components: clients, consumer, documentation >Affects Versions: 3.7.0 >Reporter: Kirk True >Assignee: Kirk True >Priority: Blocker > Labels: consumer-threading-refactor, timeout > Fix For: 3.8.0 > > > This task is to design and document the timeout policy for the new Consumer > implementation. > The documentation lives here: > https://cwiki.apache.org/confluence/display/KAFKA/Java+client+Consumer+timeouts -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16287) Implement example test for common rebalance callback scenarios
Kirk True created KAFKA-16287: - Summary: Implement example test for common rebalance callback scenarios Key: KAFKA-16287 URL: https://issues.apache.org/jira/browse/KAFKA-16287 Project: Kafka Issue Type: Test Components: clients, consumer Reporter: Kirk True Assignee: Lucas Brutschy Fix For: 3.8.0 There is justified concern that the new threading model may not play well with "tricky" {{ConsumerRebalanceListener}} callbacks. We need to provide some assurance that it will support complicated patterns. # Design and implement test scenarios # Update and document any design changes with the callback sub-system where needed # Provide fix(es) to the {{AsyncKafkaConsumer}} implementation to abide by said design -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16276) Update transactions_test.py to support KIP-848’s group protocol config
Kirk True created KAFKA-16276: - Summary: Update transactions_test.py to support KIP-848’s group protocol config Key: KAFKA-16276 URL: https://issues.apache.org/jira/browse/KAFKA-16276 Project: Kafka Issue Type: Test Components: clients, consumer, system tests Affects Versions: 3.7.0 Reporter: Kirk True Assignee: Kirk True Fix For: 3.8.0 This task is to update the test method(s) in {{transactions_test.py}} to support the {{group.protocol}} configuration introduced in [KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol] by adding an optional {{group_protocol}} argument to the tests and matrixes. The wrinkle here is that {{transactions_test.py}} was not able to run as-is. That might deprioritize this until whatever is causing that is resolved. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16275) Update kraft_upgrade_test.py to support KIP-848’s group protocol config
Kirk True created KAFKA-16275: - Summary: Update kraft_upgrade_test.py to support KIP-848’s group protocol config Key: KAFKA-16275 URL: https://issues.apache.org/jira/browse/KAFKA-16275 Project: Kafka Issue Type: Test Components: clients, consumer, system tests Affects Versions: 3.7.0 Reporter: Kirk True Assignee: Kirk True Fix For: 3.8.0 This task is to update the test method(s) in {{kraft_upgrade_test.py}} to support the {{group.protocol}} configuration introduced in [KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol] by adding an optional {{group_protocol}} argument to the tests and matrixes. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16274) Update replica_scale_test.py to support KIP-848’s group protocol config
Kirk True created KAFKA-16274: - Summary: Update replica_scale_test.py to support KIP-848’s group protocol config Key: KAFKA-16274 URL: https://issues.apache.org/jira/browse/KAFKA-16274 Project: Kafka Issue Type: Test Components: clients, consumer, system tests Affects Versions: 3.7.0 Reporter: Kirk True Assignee: Kirk True Fix For: 3.8.0 This task is to update the test method(s) in {{replica_scale_test.py}} to support the {{group.protocol}} configuration introduced in [KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol] by adding an optional {{group_protocol}} argument to the tests and matrixes. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16272) Update connect_distributed_test.py to support KIP-848’s group protocol config
Kirk True created KAFKA-16272: - Summary: Update connect_distributed_test.py to support KIP-848’s group protocol config Key: KAFKA-16272 URL: https://issues.apache.org/jira/browse/KAFKA-16272 Project: Kafka Issue Type: Test Components: clients, consumer, system tests Affects Versions: 3.7.0 Reporter: Kirk True Assignee: Kirk True Fix For: 3.8.0 This task is to update the test method(s) in {{connect_distributed_test.py}} to support the {{group.protocol}} configuration introduced in [KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol] by adding an optional {{group_protocol}} argument to the tests and matrixes. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16273) Update consume_bench_test.py to support KIP-848’s group protocol config
Kirk True created KAFKA-16273: - Summary: Update consume_bench_test.py to support KIP-848’s group protocol config Key: KAFKA-16273 URL: https://issues.apache.org/jira/browse/KAFKA-16273 Project: Kafka Issue Type: Test Components: clients, consumer, system tests Affects Versions: 3.7.0 Reporter: Kirk True Assignee: Kirk True Fix For: 3.8.0 This task is to update the test method(s) in {{consume_bench_test.py}} to support the {{group.protocol}} configuration introduced in [KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol] by adding an optional {{group_protocol}} argument to the tests and matrixes. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16271) Update consumer_rolling_upgrade_test.py to support KIP-848’s group protocol config
Kirk True created KAFKA-16271: - Summary: Update consumer_rolling_upgrade_test.py to support KIP-848’s group protocol config Key: KAFKA-16271 URL: https://issues.apache.org/jira/browse/KAFKA-16271 Project: Kafka Issue Type: Test Components: clients, consumer, system tests Affects Versions: 3.7.0 Reporter: Kirk True Assignee: Kirk True Fix For: 3.8.0 This task is to update the test method(s) in {{consumer_rolling_upgrade_test.py}} to support the {{group.protocol}} configuration introduced in [KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol] by adding an optional {{group_protocol}} argument to the tests and matrixes. The tricky wrinkle here is that the existing test relies on client-side assignment strategies that aren't applicable with the new KIP-848-enabled consumer. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16270) Update snapshot_test.py to support KIP-848’s group protocol config
Kirk True created KAFKA-16270: - Summary: Update snapshot_test.py to support KIP-848’s group protocol config Key: KAFKA-16270 URL: https://issues.apache.org/jira/browse/KAFKA-16270 Project: Kafka Issue Type: Test Components: clients, consumer, system tests Affects Versions: 3.7.0 Reporter: Kirk True Assignee: Kirk True Fix For: 3.8.0 This task is to update the test method(s) in {{snapshot_test.py}} to support the {{group.protocol}} configuration introduced in [KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol] by adding an optional {{group_protocol}} argument to the tests and matrixes. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16269) Update reassign_partitions_test.py to support KIP-848’s group protocol config
Kirk True created KAFKA-16269: - Summary: Update reassign_partitions_test.py to support KIP-848’s group protocol config Key: KAFKA-16269 URL: https://issues.apache.org/jira/browse/KAFKA-16269 Project: Kafka Issue Type: Test Components: clients, consumer, system tests Affects Versions: 3.7.0 Reporter: Kirk True Assignee: Kirk True Fix For: 3.8.0 This task is to update the test method(s) in {{reassign_partitions_test.py}} to support the {{group.protocol}} configuration introduced in [KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol] by adding an optional {{group_protocol}} argument to the tests and matrixes. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16268) Update fetch_from_follower_test.py to support KIP-848’s group protocol config
Kirk True created KAFKA-16268: - Summary: Update fetch_from_follower_test.py to support KIP-848’s group protocol config Key: KAFKA-16268 URL: https://issues.apache.org/jira/browse/KAFKA-16268 Project: Kafka Issue Type: Test Components: clients, consumer, system tests Affects Versions: 3.7.0 Reporter: Kirk True Assignee: Kirk True Fix For: 3.8.0 This task is to update the test method(s) in {{fetch_from_follower_test.py}} to support the {{group.protocol}} configuration introduced in [KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol] by adding an optional {{group_protocol}} argument to the tests and matrixes. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16267) Update consumer_group_command_test.py to support KIP-848’s group protocol config
Kirk True created KAFKA-16267: - Summary: Update consumer_group_command_test.py to support KIP-848’s group protocol config Key: KAFKA-16267 URL: https://issues.apache.org/jira/browse/KAFKA-16267 Project: Kafka Issue Type: Test Components: clients, consumer, system tests Affects Versions: 3.7.0 Reporter: Kirk True Assignee: Kirk True Fix For: 3.8.0 This task is to update the test method(s) in {{consumer_group_command_test.py}} to support the {{group.protocol}} configuration introduced in [KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol] by adding an optional {{group_protocol}} argument to the tests and matrixes. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16256) Update ConsumerConfig to validate use of group.remote.assignor and partition.assignment.strategy based on group.protocol
Kirk True created KAFKA-16256: - Summary: Update ConsumerConfig to validate use of group.remote.assignor and partition.assignment.strategy based on group.protocol Key: KAFKA-16256 URL: https://issues.apache.org/jira/browse/KAFKA-16256 Project: Kafka Issue Type: Improvement Components: clients, consumer Affects Versions: 3.7.0 Reporter: Kirk True Fix For: 3.8.0 {{ConsumerConfig}} supports both the {{group.remote.assignor}} and {{partition.assignment.strategy}} configuration options. These, however, should not be used together; the former is applicable only when the {{group.protocol}} is set to {{consumer}} and the latter when the {{group.protocol}} is set to {{{}classic{}}}. We should emit a warning if the user specifies the incorrect configuration based on the value of {{{}group.protocol{}}}. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16255) AsyncKafkaConsumer should not use partition.assignment.strategy
Kirk True created KAFKA-16255: - Summary: AsyncKafkaConsumer should not use partition.assignment.strategy Key: KAFKA-16255 URL: https://issues.apache.org/jira/browse/KAFKA-16255 Project: Kafka Issue Type: Bug Components: clients, consumer Affects Versions: 3.7.0 Reporter: Kirk True Assignee: Kirk True Fix For: 3.8.0 The partition.assignment.strategy configuration is used to specify a list of zero or more ConsumerPartitionAssignor instances. However, that interface is not applicable for the KIP-848-based protocol on top of which AsyncKafkaConsumer is built. Therefore, the use of ConsumerPartitionAssignor is in appropriate and should be removed. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16231) Update consumer_test.py to support KIP-848’s group protocol config
Kirk True created KAFKA-16231: - Summary: Update consumer_test.py to support KIP-848’s group protocol config Key: KAFKA-16231 URL: https://issues.apache.org/jira/browse/KAFKA-16231 Project: Kafka Issue Type: Test Components: clients, consumer, system tests Reporter: Kirk True Assignee: Kirk True Fix For: 3.8.0 This task is to update {{verifiable_consumer.py}} to support the {{group.protocol}} configuration introduced in [KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol] by adding an optional {{group_protocol}} argument. It will default to classic and we will take a separate task (Jira) to update the callers. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16230) Update verifiable_consumer to support KIP-848’s group protocol config
Kirk True created KAFKA-16230: - Summary: Update verifiable_consumer to support KIP-848’s group protocol config Key: KAFKA-16230 URL: https://issues.apache.org/jira/browse/KAFKA-16230 Project: Kafka Issue Type: Test Components: clients, consumer, system tests Reporter: Kirk True Assignee: Kirk True Fix For: 3.8.0 This task is to update {{VerifiableConsumer}} to support the {{group.protocol}} configuration introduced in [KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol] by adding an optional {{--group-protocol}} command line option. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-15691) Add new system tests to use new consumer
[ https://issues.apache.org/jira/browse/KAFKA-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True resolved KAFKA-15691. --- Resolution: Duplicate > Add new system tests to use new consumer > > > Key: KAFKA-15691 > URL: https://issues.apache.org/jira/browse/KAFKA-15691 > Project: Kafka > Issue Type: Test > Components: clients, consumer, system tests >Reporter: Kirk True >Assignee: Kirk True >Priority: Major > Labels: kip-848-client-support, system-tests > Fix For: 3.8.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16208) Design new Consumer timeout policy
Kirk True created KAFKA-16208: - Summary: Design new Consumer timeout policy Key: KAFKA-16208 URL: https://issues.apache.org/jira/browse/KAFKA-16208 Project: Kafka Issue Type: Task Components: clients, consumer, documentation Affects Versions: 3.7.0 Reporter: Kirk True Assignee: Kirk True Fix For: 3.8.0 This task is to design and document the timeout policy for the new Consumer implementation. The documentation lives here: https://cwiki.apache.org/confluence/display/KAFKA/Java+client+Consumer+timeouts -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16200) Ensure RequestManager handling of expired timeouts are consistent
Kirk True created KAFKA-16200: - Summary: Ensure RequestManager handling of expired timeouts are consistent Key: KAFKA-16200 URL: https://issues.apache.org/jira/browse/KAFKA-16200 Project: Kafka Issue Type: Sub-task Components: clients, consumer Reporter: Kirk True Assignee: Kirk True Fix For: 3.8.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16199) Prune the event queue if events have expired before starting
Kirk True created KAFKA-16199: - Summary: Prune the event queue if events have expired before starting Key: KAFKA-16199 URL: https://issues.apache.org/jira/browse/KAFKA-16199 Project: Kafka Issue Type: Sub-task Components: clients, consumer Reporter: Kirk True Assignee: Kirk True Fix For: 3.8.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16191) Clean up of consumer client internal events
Kirk True created KAFKA-16191: - Summary: Clean up of consumer client internal events Key: KAFKA-16191 URL: https://issues.apache.org/jira/browse/KAFKA-16191 Project: Kafka Issue Type: Bug Components: clients, consumer Affects Versions: 3.7.0 Reporter: Kirk True Assignee: Kirk True Fix For: 3.8.0 There are a few minor issues with the event sub-classes in the org.apache.kafka.clients.consumer.internals package that should be cleaned up: # Update the names of subclasses to remove "Application" or "Background" # Make toString() final in the base classes and clean up the implementations of toStringBase() # Fix minor whitespace inconsistencies -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16167) Fix PlaintextConsumerTest.testAutoCommitOnCloseAfterWakeup
Kirk True created KAFKA-16167: - Summary: Fix PlaintextConsumerTest.testAutoCommitOnCloseAfterWakeup Key: KAFKA-16167 URL: https://issues.apache.org/jira/browse/KAFKA-16167 Project: Kafka Issue Type: Bug Components: clients, consumer, unit tests Affects Versions: 3.7.0 Reporter: Kirk True Assignee: Kirk True Fix For: 3.8.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16151) Fix PlaintextConsumerTest.testPerPartitionLeadMetricsCleanUpWithSubscribe
[ https://issues.apache.org/jira/browse/KAFKA-16151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True resolved KAFKA-16151. --- Resolution: Duplicate > Fix PlaintextConsumerTest.testPerPartitionLeadMetricsCleanUpWithSubscribe > - > > Key: KAFKA-16151 > URL: https://issues.apache.org/jira/browse/KAFKA-16151 > Project: Kafka > Issue Type: Bug > Components: clients, consumer, unit tests >Affects Versions: 3.7.0 >Reporter: Kirk True >Assignee: Kirk True >Priority: Major > Labels: consumer-threading-refactor, kip-848-client-support > Fix For: 3.8.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16150) Fix PlaintextConsumerTest.testPerPartitionLagMetricsCleanUpWithSubscribe
[ https://issues.apache.org/jira/browse/KAFKA-16150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True resolved KAFKA-16150. --- Resolution: Duplicate > Fix PlaintextConsumerTest.testPerPartitionLagMetricsCleanUpWithSubscribe > > > Key: KAFKA-16150 > URL: https://issues.apache.org/jira/browse/KAFKA-16150 > Project: Kafka > Issue Type: Bug > Components: clients, consumer, unit tests >Affects Versions: 3.7.0 >Reporter: Kirk True >Assignee: Kirk True >Priority: Major > Labels: consumer-threading-refactor, kip-848-client-support > Fix For: 3.8.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16152) Fix PlaintextConsumerTest.testStaticConsumerDetectsNewPartitionCreatedAfterRestart
Kirk True created KAFKA-16152: - Summary: Fix PlaintextConsumerTest.testStaticConsumerDetectsNewPartitionCreatedAfterRestart Key: KAFKA-16152 URL: https://issues.apache.org/jira/browse/KAFKA-16152 Project: Kafka Issue Type: Bug Components: clients, consumer, unit tests Affects Versions: 3.7.0 Reporter: Kirk True Assignee: Kirk True Fix For: 3.8.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16151) Fix PlaintextConsumerTest.testPerPartitionLedMetricsCleanUpWithSubscribe
Kirk True created KAFKA-16151: - Summary: Fix PlaintextConsumerTest.testPerPartitionLedMetricsCleanUpWithSubscribe Key: KAFKA-16151 URL: https://issues.apache.org/jira/browse/KAFKA-16151 Project: Kafka Issue Type: Bug Components: clients, consumer, unit tests Affects Versions: 3.7.0 Reporter: Kirk True Assignee: Kirk True Fix For: 3.8.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16150) Fix PlaintextConsumerTest.testPerPartitionLagMetricsCleanUpWithSubscribe
Kirk True created KAFKA-16150: - Summary: Fix PlaintextConsumerTest.testPerPartitionLagMetricsCleanUpWithSubscribe Key: KAFKA-16150 URL: https://issues.apache.org/jira/browse/KAFKA-16150 Project: Kafka Issue Type: Bug Components: clients, consumer, unit tests Affects Versions: 3.7.0 Reporter: Kirk True Assignee: Kirk True Fix For: 3.8.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16149) Aggressively expire unused client connections
Kirk True created KAFKA-16149: - Summary: Aggressively expire unused client connections Key: KAFKA-16149 URL: https://issues.apache.org/jira/browse/KAFKA-16149 Project: Kafka Issue Type: Improvement Components: clients, consumer, producer Reporter: Kirk True Assignee: Kirk True The goal is to minimize the number of connections from the client to the brokers. On the Java client, there are potentially two types of network connections to brokers: # Connections for metadata requests # Connections for fetch, produce, etc. requests The idea is to apply a much shorter idle time to client connections that have _only_ served metadata (type 1 above) so that they become candidates for expiration more quickly. Alternatively (or additionally), a change to the way metadata requests are routed could be made to reduce the number of connections. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16142) Update metrics documentation for errors and new metrics
Kirk True created KAFKA-16142: - Summary: Update metrics documentation for errors and new metrics Key: KAFKA-16142 URL: https://issues.apache.org/jira/browse/KAFKA-16142 Project: Kafka Issue Type: Task Components: clients, consumer Affects Versions: 3.7.0 Reporter: Kirk True Fix For: 3.8.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16143) New metrics for KIP-848 protocol
Kirk True created KAFKA-16143: - Summary: New metrics for KIP-848 protocol Key: KAFKA-16143 URL: https://issues.apache.org/jira/browse/KAFKA-16143 Project: Kafka Issue Type: Task Components: clients, consumer Affects Versions: 3.7.0 Reporter: Kirk True Assignee: Philip Nee Fix For: 3.8.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16112) Review JMX metrics in Async Consumer and determine the missing ones
Kirk True created KAFKA-16112: - Summary: Review JMX metrics in Async Consumer and determine the missing ones Key: KAFKA-16112 URL: https://issues.apache.org/jira/browse/KAFKA-16112 Project: Kafka Issue Type: Task Components: clients, consumer Reporter: Kirk True Assignee: Philip Nee Fix For: 3.8.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16111) Implement tests for tricky rebalance callbacks scenarios
Kirk True created KAFKA-16111: - Summary: Implement tests for tricky rebalance callbacks scenarios Key: KAFKA-16111 URL: https://issues.apache.org/jira/browse/KAFKA-16111 Project: Kafka Issue Type: Test Components: clients, consumer Reporter: Kirk True Assignee: Kirk True Fix For: 3.8.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16110) Implement consumer performance tests
Kirk True created KAFKA-16110: - Summary: Implement consumer performance tests Key: KAFKA-16110 URL: https://issues.apache.org/jira/browse/KAFKA-16110 Project: Kafka Issue Type: New Feature Components: clients, consumer Reporter: Kirk True Assignee: Kirk True Fix For: 3.8.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16109) Ensure system tests cover the "simple consumer + commit" use case
Kirk True created KAFKA-16109: - Summary: Ensure system tests cover the "simple consumer + commit" use case Key: KAFKA-16109 URL: https://issues.apache.org/jira/browse/KAFKA-16109 Project: Kafka Issue Type: Improvement Components: clients, consumer, system tests Reporter: Kirk True Assignee: Kirk True Fix For: 3.8.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Reopened] (KAFKA-15250) DefaultBackgroundThread is running tight loop
[ https://issues.apache.org/jira/browse/KAFKA-15250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True reopened KAFKA-15250: --- Assignee: Kirk True (was: Philip Nee) > DefaultBackgroundThread is running tight loop > - > > Key: KAFKA-15250 > URL: https://issues.apache.org/jira/browse/KAFKA-15250 > Project: Kafka > Issue Type: Bug > Components: clients, consumer >Reporter: Philip Nee >Assignee: Kirk True >Priority: Major > Labels: consumer-threading-refactor > > The DefaultBackgroundThread is running tight loops and wasting CPU cycles. I > think we need to reexamine the timeout pass to networkclientDelegate.poll. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16037) Upgrade existing system tests to use new consumer
Kirk True created KAFKA-16037: - Summary: Upgrade existing system tests to use new consumer Key: KAFKA-16037 URL: https://issues.apache.org/jira/browse/KAFKA-16037 Project: Kafka Issue Type: Test Components: clients, consumer, system tests Reporter: Kirk True Fix For: 4.0.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16029) Investigate cause of "Unable to find FetchSessionHandler for node X" in logs
Kirk True created KAFKA-16029: - Summary: Investigate cause of "Unable to find FetchSessionHandler for node X" in logs Key: KAFKA-16029 URL: https://issues.apache.org/jira/browse/KAFKA-16029 Project: Kafka Issue Type: Task Components: clients, consumer Affects Versions: 3.7.0 Reporter: Kirk True Assignee: Kirk True -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16011) Fix PlaintextConsumerTest.testMultiConsumerSessionTimeoutOnClose
Kirk True created KAFKA-16011: - Summary: Fix PlaintextConsumerTest.testMultiConsumerSessionTimeoutOnClose Key: KAFKA-16011 URL: https://issues.apache.org/jira/browse/KAFKA-16011 Project: Kafka Issue Type: Test Components: clients, consumer, unit tests Affects Versions: 3.7.0 Reporter: Kirk True The integration test {{PlaintextConsumerTest.testMultiConsumerSessionTimeoutOnStopPolling}} is failing when using the {{AsyncKafkaConsumer}}. The error is: {code} org.opentest4j.AssertionFailedError: Did not get valid assignment for partitions [topic1-2, topic1-4, topic-1, topic-0, topic1-5, topic1-1, topic1-0, topic1-3] after one consumer left at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:38) at org.junit.jupiter.api.Assertions.fail(Assertions.java:134) at kafka.api.AbstractConsumerTest.validateGroupAssignment(AbstractConsumerTest.scala:286) at kafka.api.PlaintextConsumerTest.runMultiConsumerSessionTimeoutTest(PlaintextConsumerTest.scala:1883) at kafka.api.PlaintextConsumerTest.testMultiConsumerSessionTimeoutOnStopPolling(PlaintextConsumerTest.scala:1281) {code} The logs include these lines: {code} [2023-12-13 15:26:40,736] WARN [Consumer clientId=ConsumerTestConsumer, groupId=my-test] consumer poll timeout has expired. This means the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time processing messages. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records. (org.apache.kafka.clients.consumer.internals.HeartbeatRequestManager:188) [2023-12-13 15:26:40,736] WARN [Consumer clientId=ConsumerTestConsumer, groupId=my-test] consumer poll timeout has expired. This means the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time processing messages. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records. (org.apache.kafka.clients.consumer.internals.HeartbeatRequestManager:188) [2023-12-13 15:26:40,736] WARN [Consumer clientId=ConsumerTestConsumer, groupId=my-test] consumer poll timeout has expired. This means the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time processing messages. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records. (org.apache.kafka.clients.consumer.internals.HeartbeatRequestManager:188) {code} I don't know if that's related or not. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16010) Fix PlaintextConsumerTest.testMultiConsumerSessionTimeoutOnStopPolling
Kirk True created KAFKA-16010: - Summary: Fix PlaintextConsumerTest.testMultiConsumerSessionTimeoutOnStopPolling Key: KAFKA-16010 URL: https://issues.apache.org/jira/browse/KAFKA-16010 Project: Kafka Issue Type: Test Components: clients, consumer, unit tests Affects Versions: 3.7.0 Reporter: Kirk True The integration test {{PlaintextConsumerTest.testMaxPollIntervalMsDelayInRevocation}} is failing when using the {{AsyncKafkaConsumer}}. The error is: {code} org.opentest4j.AssertionFailedError: Timed out before expected rebalance completed at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:38) at org.junit.jupiter.api.Assertions.fail(Assertions.java:134) at kafka.api.AbstractConsumerTest.awaitRebalance(AbstractConsumerTest.scala:317) at kafka.api.PlaintextConsumerTest.testMaxPollIntervalMsDelayInRevocation(PlaintextConsumerTest.scala:235) {code} The logs include this line: {code} [2023-12-13 15:20:27,824] WARN [Consumer clientId=ConsumerTestConsumer, groupId=my-test] consumer poll timeout has expired. This means the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time processing messages. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records. (org.apache.kafka.clients.consumer.internals.HeartbeatRequestManager:188) {code} I don't know if that's related or not. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16009) Fix PlaintextConsumerTest.testMaxPollIntervalMsDelayInRevocation
Kirk True created KAFKA-16009: - Summary: Fix PlaintextConsumerTest.testMaxPollIntervalMsDelayInRevocation Key: KAFKA-16009 URL: https://issues.apache.org/jira/browse/KAFKA-16009 Project: Kafka Issue Type: Test Components: clients, consumer, unit tests Affects Versions: 3.7.0 Reporter: Kirk True The integration test {{PlaintextConsumerTest.testMaxPollIntervalMs}} is failing when using the {{AsyncKafkaConsumer}}. The error is: {code} org.opentest4j.AssertionFailedError: Timed out before expected rebalance completed at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:38) at org.junit.jupiter.api.Assertions.fail(Assertions.java:134) at kafka.api.AbstractConsumerTest.awaitRebalance(AbstractConsumerTest.scala:317) at kafka.api.PlaintextConsumerTest.testMaxPollIntervalMs(PlaintextConsumerTest.scala:194) {code} The logs include this line: {code} [2023-12-13 15:11:16,134] WARN [Consumer clientId=ConsumerTestConsumer, groupId=my-test] consumer poll timeout has expired. This means the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time processing messages. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records. (org.apache.kafka.clients.consumer.internals.HeartbeatRequestManager:188) {code} I don't know if that's related or not. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16008) Fix PlaintextConsumerTest.testMaxPollIntervalMs
Kirk True created KAFKA-16008: - Summary: Fix PlaintextConsumerTest.testMaxPollIntervalMs Key: KAFKA-16008 URL: https://issues.apache.org/jira/browse/KAFKA-16008 Project: Kafka Issue Type: Test Components: clients, consumer, unit tests Affects Versions: 3.7.0 Reporter: Kirk True The integration test {{PlaintextConsumerTest.testMaxPollIntervalMs}} is failing with the {{{}AsyncKafkaConsumer{}}}. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15974) Enforce that CompletableApplicationEvent has a timeout
Kirk True created KAFKA-15974: - Summary: Enforce that CompletableApplicationEvent has a timeout Key: KAFKA-15974 URL: https://issues.apache.org/jira/browse/KAFKA-15974 Project: Kafka Issue Type: Improvement Components: clients, consumer Reporter: Kirk True Assignee: Kirk True The intention of the CompletableApplicationEvent is for a Consumer to block waiting for the event to complete. The application thread will block for the timeout, but there is not yet a consistent manner in which events that have timed out are: a) pruned from the event queue if they've expired before starting b) canceled by the background thread if they've expired after starting -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15909) Remove support for empty "group.id"
Kirk True created KAFKA-15909: - Summary: Remove support for empty "group.id" Key: KAFKA-15909 URL: https://issues.apache.org/jira/browse/KAFKA-15909 Project: Kafka Issue Type: Sub-task Components: clients, consumer Reporter: Kirk True Assignee: Kirk True Per KIP-266, the {{Consumer.poll(long timeout)}} method was deprecated back in 2.0.0. In 3.7, there are two implementations, each with different behavior: * The {{LegacyKafkaConsumer}} implementation will continue to work but will log a warning about its removal * The {{AsyncKafkaConsumer}} implementation will throw an error. In 4.0, the `poll` method that takes a single `long` timeout will be removed altogether. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15908) Remove deprecated Consumer API poll(long timeout)
Kirk True created KAFKA-15908: - Summary: Remove deprecated Consumer API poll(long timeout) Key: KAFKA-15908 URL: https://issues.apache.org/jira/browse/KAFKA-15908 Project: Kafka Issue Type: Sub-task Reporter: Kirk True Assignee: Kirk True Per KIP-266, the {{Consumer.poll(long timeout)}} method was deprecated back in 2.0.0. In 3.7, there are two implementations, each with different behavior: * The {{LegacyKafkaConsumer}} implementation will continue to work but will log a warning about its removal * The {{AsyncKafkaConsumer}} implementation will throw an error. In 4.0, the `poll` method that takes a single `long` timeout will be removed altogether. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15907) Remove previously deprecated Consumer features from 4.0
Kirk True created KAFKA-15907: - Summary: Remove previously deprecated Consumer features from 4.0 Key: KAFKA-15907 URL: https://issues.apache.org/jira/browse/KAFKA-15907 Project: Kafka Issue Type: Task Reporter: Kirk True Assignee: Kirk True This Jira serves as the main collection of APIs, logic, etc. that were previously marked as "deprecated" by other KIPs. With 4.0, we will be updating the code to remove the deprecated features. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15848) Consumer API timeout inconsistent between LegacyKafkaConsumer and AsyncKafkaConsumer
Kirk True created KAFKA-15848: - Summary: Consumer API timeout inconsistent between LegacyKafkaConsumer and AsyncKafkaConsumer Key: KAFKA-15848 URL: https://issues.apache.org/jira/browse/KAFKA-15848 Project: Kafka Issue Type: Bug Components: clients, consumer Reporter: Kirk True -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15837) Throw error on use of Consumer.poll(long timeout)
Kirk True created KAFKA-15837: - Summary: Throw error on use of Consumer.poll(long timeout) Key: KAFKA-15837 URL: https://issues.apache.org/jira/browse/KAFKA-15837 Project: Kafka Issue Type: Improvement Components: clients, consumer Reporter: Kirk True Assignee: Kirk True Fix For: 3.7.0 Per [KIP-266|https://cwiki.apache.org/confluence/x/5kiHB], the Consumer.poll(long timeout) method was deprecated back in 2.0.0. The method will now throw a KafkaException. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-15694) New integration tests to have full coverage for preview
[ https://issues.apache.org/jira/browse/KAFKA-15694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True resolved KAFKA-15694. --- Assignee: Kirk True Resolution: Duplicate > New integration tests to have full coverage for preview > --- > > Key: KAFKA-15694 > URL: https://issues.apache.org/jira/browse/KAFKA-15694 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Reporter: Kirk True >Assignee: Kirk True >Priority: Minor > Labels: kip-848, kip-848-client-support, kip-848-preview > > These are to fix bugs discovered during PR reviews but not tests. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15767) Redesign TransactionManager to avoid use of ThreadLocal
Kirk True created KAFKA-15767: - Summary: Redesign TransactionManager to avoid use of ThreadLocal Key: KAFKA-15767 URL: https://issues.apache.org/jira/browse/KAFKA-15767 Project: Kafka Issue Type: Improvement Components: clients, producer Affects Versions: 3.6.0 Reporter: Kirk True Assignee: Kirk True Fix For: 3.7.0 A {{TransactionManager}} instance is created by the {{KafkaProducer}} and shared with the {{Sender}} thread. The {{TransactionManager}} has internal states through which it transitions as part of its initialization, transaction management, shutdown, etc. It contains logic to ensure that those state transitions are valid, such that when an invalid transition is attempted, it is handled appropriately. The issue is, the definition of "handled appropriately" depends on which thread is making the API call that is attempting an invalid transition. The application thread expects that the invalid transition will generate an exception. However, the sender thread expects that the invalid transition will "poison" the entire {{TransactionManager}} instance. So as part of the implementation of KAFKA-14831, we needed a way to change logic in the {{TransactionManager}} on a per-thread basis, so a {{ThreadLocal}} instance variable was added to the {{TransactionManager}} to keep track of this. However, the use of ThreadLocal instance variables is generally discouraged because of their tendency for memory leaks, shared state across multiple threads, and not working with virtual threads. The initial implementation attempt of KAFKA-14831 used a context object, passed in to each method, to affect the logic. However, the number of methods that needed to be changed to accept this new context object grew until most of the methods in {{TransactionManager}} needed to be updated. Thus all the affected call sites needed to be updated, resulting in a much larger change than anticipated. The task here is to remove the use of the {{ThreadLocal}} instance variable, let the application thread and {{Sender}} thread keep their desired behavior, but keep the change to a minimum. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15696) Revoke partitions on Consumer.close()
Kirk True created KAFKA-15696: - Summary: Revoke partitions on Consumer.close() Key: KAFKA-15696 URL: https://issues.apache.org/jira/browse/KAFKA-15696 Project: Kafka Issue Type: Sub-task Components: clients, consumer Reporter: Kirk True Assignee: Philip Nee Upon closing of the {{Consumer}} we need to: # Complete pending commits # Revoke assignment (Note that the revocation involves stop fetching, committing offsets if auto-commit enabled and invoking the onPartitionsRevoked callback) # Send the last GroupConsumerHeartbeatRequest with epoch = -1 to leave the group (or -2 if static member) # Close any fetch sessions on the brokers # Poll the NetworkClient to complete pending I/O There is a mechanism introduced in PR [14406|https://github.com/apache/kafka/pull/14406] that allows for performing network I/O on shutdown. The new method {{DefaultBackgroundThread.runAtClose()}} will be executed when {{Consumer.close()}} is invoked. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15694) New integration tests to have full coverage for preview
Kirk True created KAFKA-15694: - Summary: New integration tests to have full coverage for preview Key: KAFKA-15694 URL: https://issues.apache.org/jira/browse/KAFKA-15694 Project: Kafka Issue Type: Sub-task Components: clients, consumer Reporter: Kirk True These are to fix bugs discovered during PR reviews but not tests. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15692) New integration tests to have full coverage
Kirk True created KAFKA-15692: - Summary: New integration tests to have full coverage Key: KAFKA-15692 URL: https://issues.apache.org/jira/browse/KAFKA-15692 Project: Kafka Issue Type: Sub-task Components: clients, consumer Reporter: Kirk True -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15691) Upgrade existing and add new system tests to use new coordinator
Kirk True created KAFKA-15691: - Summary: Upgrade existing and add new system tests to use new coordinator Key: KAFKA-15691 URL: https://issues.apache.org/jira/browse/KAFKA-15691 Project: Kafka Issue Type: Sub-task Components: clients, consumer Reporter: Kirk True -- This message was sent by Atlassian Jira (v8.20.10#820010)