[jira] [Created] (KAFKA-16899) Rename MembershipManagerImpl's rebalanceTimeoutMs for clarity

2024-06-05 Thread Kirk True (Jira)
Kirk True created KAFKA-16899:
-

 Summary: Rename MembershipManagerImpl's rebalanceTimeoutMs for 
clarity
 Key: KAFKA-16899
 URL: https://issues.apache.org/jira/browse/KAFKA-16899
 Project: Kafka
  Issue Type: Improvement
  Components: clients, consumer
Affects Versions: 3.8.0
Reporter: Kirk True
 Fix For: 4.0.0


The naming of {{{}MembershipManagerImpl{}}}'s {{rebalanceTimeoutMs}} is a 
little confusing.

Ultimately the broker enforces the rebalance process' timeout, not the client. 
It's therefore a little misleading to use {{rebalanceTimeoutMs}} as the name of 
a variable that only handles the client's commit portion of the process. It is 
used in {{MembershipManagerImpl}} as a means to limit the client's efforts in 
the case where it is repeatedly trying to commit but failing.

A suggested name change was {{{}commitTimeoutDuringReconciliation{}}}.

It is not clear to me that simply changing the name from {{rebalanceTimeoutMs}} 
to {{commitTimeoutDuringReconciliation}} is sufficient. When a caller 
instantiates a  {{{}MembershipManagerImpl{}}}, it is passing in a variable 
named {{rebalanceTimeoutMs}} that ultimately comes from the {{ClientConfig}} 
object. I can imagine it being a little confusing that a value for 
{{rebalanceTimeoutMs}} is passed in, but then really only used as the value for 
{{commitTimeoutDuringReconciliation}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16818) Move event-processing tests from ConsumerNetworkThreadTest to ApplicationEventProcessorTest

2024-05-22 Thread Kirk True (Jira)
Kirk True created KAFKA-16818:
-

 Summary: Move event-processing tests from 
ConsumerNetworkThreadTest to ApplicationEventProcessorTest
 Key: KAFKA-16818
 URL: https://issues.apache.org/jira/browse/KAFKA-16818
 Project: Kafka
  Issue Type: Improvement
  Components: clients, consumer, unit tests
Affects Versions: 3.8.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


The {{ConsumerNetworkThreadTest}} currently has a number of tests which do the 
following:
 # Add event of type _T_ to the event queue
 # Call {{ConsumerNetworkThread.runOnce()}} to dequeue the events and call 
{{ApplicationEventProcessor.process()}}
 # Verify that the appropriate {{ApplicationEventProcessor}} process method was 
invoked for the event

Those types of tests should be moved to {{{}ApplicationEventProcessorTest{}}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16578) Revert changes to connect_distributed_test.py for the new async Consumer

2024-05-21 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True resolved KAFKA-16578.
---
Resolution: Won't Fix

Most of the {{connect_distributed_test.py}} system tests were fixed, and 
{{test_exactly_once_source}} was reverted in a separate Jira/PR.

> Revert changes to connect_distributed_test.py for the new async Consumer
> 
>
> Key: KAFKA-16578
> URL: https://issues.apache.org/jira/browse/KAFKA-16578
> Project: Kafka
>  Issue Type: Task
>  Components: clients, consumer, system tests
>Affects Versions: 3.8.0
>Reporter: Kirk True
>Assignee: Kirk True
>Priority: Critical
>  Labels: kip-848-client-support, system-tests
> Fix For: 3.8.0
>
>
> To test the new, asynchronous Kafka {{Consumer}} implementation, we migrated 
> a slew of system tests to run both the "old" and "new" implementations. 
> KAFKA-16272 updated the system tests in {{connect_distributed_test.py}} so it 
> could test the new consumer with Connect. However, we are not supporting 
> Connect with the new consumer in 3.8.0. Unsurprisingly, when we run the 
> Connect system tests with the new {{AsyncKafkaConsumer}}, we get errors like 
> the following:
> {code}
> test_id:
> kafkatest.tests.connect.connect_distributed_test.ConnectDistributedTest.test_exactly_once_source.clean=False.connect_protocol=eager.metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=True.group_protocol=consumer
> status: FAIL
> run time:   6 minutes 3.899 seconds
> InsufficientResourcesError('Not enough nodes available to allocate. linux 
> nodes requested: 1. linux nodes available: 0')
> Traceback (most recent call last):
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py",
>  line 184, in _do_run
> data = self.run_test()
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py",
>  line 262, in run_test
> return self.test_context.function(self.test)
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/mark/_mark.py",
>  line 433, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/connect/connect_distributed_test.py",
>  line 919, in test_exactly_once_source
> consumer_validator = ConsoleConsumer(self.test_context, 1, self.kafka, 
> self.source.topic, consumer_timeout_ms=1000, print_key=True)
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/services/console_consumer.py",
>  line 97, in __init__
> BackgroundThreadService.__init__(self, context, num_nodes)
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/services/background_thread.py",
>  line 26, in __init__
> super(BackgroundThreadService, self).__init__(context, num_nodes, 
> cluster_spec, *args, **kwargs)
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/services/service.py",
>  line 107, in __init__
> self.allocate_nodes()
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/services/service.py",
>  line 217, in allocate_nodes
> self.nodes = self.cluster.alloc(self.cluster_spec)
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/cluster/cluster.py",
>  line 54, in alloc
> allocated = self.do_alloc(cluster_spec)
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/cluster/finite_subcluster.py",
>  line 31, in do_alloc
> allocated = self._available_nodes.remove_spec(cluster_spec)
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/cluster/node_container.py",
>  line 117, in remove_spec
> raise InsufficientResourcesError("Not enough nodes available to allocate. 
> " + msg)
> ducktape.cluster.node_container.InsufficientResourcesError: Not enough nodes 
> available to allocate. linux nodes requested: 1. linux nodes available: 0
> {code}
> The task here is to revert the changes made in KAFKA-16272 [PR 
> 15576|https://github.com/apache/kafka/pull/15576].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16787) Remove TRACE level logging from AsyncKafkaConsumer hot path

2024-05-16 Thread Kirk True (Jira)
Kirk True created KAFKA-16787:
-

 Summary: Remove TRACE level logging from AsyncKafkaConsumer hot 
path
 Key: KAFKA-16787
 URL: https://issues.apache.org/jira/browse/KAFKA-16787
 Project: Kafka
  Issue Type: Improvement
  Components: clients, consumer, logging
Affects Versions: 3.7.0
Reporter: Kirk True
 Fix For: 3.8.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16642) Update KafkaConsumerTest to show parameters in test lists

2024-04-29 Thread Kirk True (Jira)
Kirk True created KAFKA-16642:
-

 Summary: Update KafkaConsumerTest to show parameters in test lists
 Key: KAFKA-16642
 URL: https://issues.apache.org/jira/browse/KAFKA-16642
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer, unit tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


{{KafkaConsumerTest}} was recently updated to make many of its tests 
parameterized to exercise both the {{CLASSIC}} and {{CONSUMER}} group 
protocols. However, in some of the tools in which [lists of tests are 
provided|https://ge.apache.org/scans/tests?search.names=Git%20branch=P28D=kafka=America%2FLos_Angeles=trunk=org.apache.kafka.clients.consumer.KafkaConsumerTest=FLAKY],
 say, for analysis, the group protocol information is not exposed. For example, 
one test ({{{}testReturnRecordsDuringRebalance{}}}) is flaky, but it's 
difficult to know at a glance which group protocol is causing the problem 
because the list simply shows:
{quote}{{testReturnRecordsDuringRebalance(GroupProtocol)[1]}}
{quote}
Ideally, it would expose more information, such as:
{quote}{{testReturnRecordsDuringRebalance(GroupProtocol=CONSUMER)}}
{quote}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16460) New consumer times out consuming records in consumer_test.py system test

2024-04-29 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True resolved KAFKA-16460.
---
Resolution: Duplicate

> New consumer times out consuming records in consumer_test.py system test
> 
>
> Key: KAFKA-16460
> URL: https://issues.apache.org/jira/browse/KAFKA-16460
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer, system tests
>Affects Versions: 3.7.0
>Reporter: Kirk True
>Assignee: Kirk True
>Priority: Blocker
>  Labels: kip-848-client-support, system-tests
> Fix For: 3.8.0
>
>
> The {{consumer_test.py}} system test fails with the following errors:
> {quote}
> * Timed out waiting for consumption
> {quote}
> Affected tests:
> * {{test_broker_failure}}
> * {{test_consumer_bounce}}
> * {{test_static_consumer_bounce}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16609) Update parse_describe_topic to support new topic describe output

2024-04-26 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True resolved KAFKA-16609.
---
  Reviewer: Lucas Brutschy
Resolution: Fixed

> Update parse_describe_topic to support new topic describe output
> 
>
> Key: KAFKA-16609
> URL: https://issues.apache.org/jira/browse/KAFKA-16609
> Project: Kafka
>  Issue Type: Bug
>  Components: admin, system tests
>Affects Versions: 3.8.0
>Reporter: Kirk True
>Assignee: Kirk True
>Priority: Major
>  Labels: system-test-failure
> Fix For: 3.8.0
>
>
> It appears that recent changes to the describe topic output has broken the 
> system test's ability to parse the output.
> {noformat}
> test_id:
> kafkatest.tests.core.reassign_partitions_test.ReassignPartitionsTest.test_reassign_partitions.bounce_brokers=False.reassign_from_offset_zero=True.metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=True.group_protocol=consumer
> status: FAIL
> run time:   50.333 seconds
> IndexError('list index out of range')
> Traceback (most recent call last):
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py",
>  line 184, in _do_run
> data = self.run_test()
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py",
>  line 262, in run_test
> return self.test_context.function(self.test)
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/mark/_mark.py",
>  line 433, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/core/reassign_partitions_test.py",
>  line 175, in test_reassign_partitions
> self.run_produce_consume_validate(core_test_action=lambda: 
> self.reassign_partitions(bounce_brokers))
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 105, in run_produce_consume_validate
> core_test_action(*args)
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/core/reassign_partitions_test.py",
>  line 175, in 
> self.run_produce_consume_validate(core_test_action=lambda: 
> self.reassign_partitions(bounce_brokers))
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/core/reassign_partitions_test.py",
>  line 82, in reassign_partitions
> partition_info = 
> self.kafka.parse_describe_topic(self.kafka.describe_topic(self.topic))
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/services/kafka/kafka.py",
>  line 1400, in parse_describe_topic
> fields = list(map(lambda x: x.split(" ")[1], fields))
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/services/kafka/kafka.py",
>  line 1400, in 
> fields = list(map(lambda x: x.split(" ")[1], fields))
> IndexError: list index out of range
> {noformat} 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16623) KafkaAsyncConsumer system tests warn about revoking partitions that weren't previously assigned

2024-04-25 Thread Kirk True (Jira)
Kirk True created KAFKA-16623:
-

 Summary: KafkaAsyncConsumer system tests warn about revoking 
partitions that weren't previously assigned
 Key: KAFKA-16623
 URL: https://issues.apache.org/jira/browse/KAFKA-16623
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer, system tests
Affects Versions: 3.8.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


When running system tests for the KafkaAsyncConsumer, we occasionally see this 
warning:

{noformat}
  File "/usr/lib/python3.7/threading.py", line 917, in _bootstrap_inner
self.run()
  File "/usr/lib/python3.7/threading.py", line 865, in run
self._target(*self._args, **self._kwargs)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/services/background_thread.py",
 line 38, in _protected_worker
self._worker(idx, node)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/services/verifiable_consumer.py",
 line 304, in _worker
handler.handle_partitions_revoked(event, node, self.logger)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/services/verifiable_consumer.py",
 line 163, in handle_partitions_revoked
(tp, node.account.hostname)
AssertionError: Topic partition TopicPartition(topic='test_topic', partition=0) 
cannot be revoked from worker20 as it was not previously assigned to that 
consumer
{noformat}

It is unclear what is causing this.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16609) Update parse_describe_topic to support new topic describe output

2024-04-23 Thread Kirk True (Jira)
Kirk True created KAFKA-16609:
-

 Summary: Update parse_describe_topic to support new topic describe 
output
 Key: KAFKA-16609
 URL: https://issues.apache.org/jira/browse/KAFKA-16609
 Project: Kafka
  Issue Type: Bug
  Components: admin, system tests
Affects Versions: 3.8.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


It appears that recent changes to the describe topic output has broken the 
system test's ability to parse the output.

{noformat}
test_id:
kafkatest.tests.core.reassign_partitions_test.ReassignPartitionsTest.test_reassign_partitions.bounce_brokers=False.reassign_from_offset_zero=True.metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=True.group_protocol=consumer
status: FAIL
run time:   50.333 seconds


IndexError('list index out of range')
Traceback (most recent call last):
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py",
 line 184, in _do_run
data = self.run_test()
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py",
 line 262, in run_test
return self.test_context.function(self.test)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/mark/_mark.py",
 line 433, in wrapper
return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/core/reassign_partitions_test.py",
 line 175, in test_reassign_partitions
self.run_produce_consume_validate(core_test_action=lambda: 
self.reassign_partitions(bounce_brokers))
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/produce_consume_validate.py",
 line 105, in run_produce_consume_validate
core_test_action(*args)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/core/reassign_partitions_test.py",
 line 175, in 
self.run_produce_consume_validate(core_test_action=lambda: 
self.reassign_partitions(bounce_brokers))
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/core/reassign_partitions_test.py",
 line 82, in reassign_partitions
partition_info = 
self.kafka.parse_describe_topic(self.kafka.describe_topic(self.topic))
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/services/kafka/kafka.py",
 line 1400, in parse_describe_topic
fields = list(map(lambda x: x.split(" ")[1], fields))
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/services/kafka/kafka.py",
 line 1400, in 
fields = list(map(lambda x: x.split(" ")[1], fields))
IndexError: list index out of range
{noformat} 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16462) New consumer fails with timeout in security_test.py system test

2024-04-23 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True resolved KAFKA-16462.
---
Resolution: Duplicate

> New consumer fails with timeout in security_test.py system test
> ---
>
> Key: KAFKA-16462
> URL: https://issues.apache.org/jira/browse/KAFKA-16462
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer, system tests
>Affects Versions: 3.7.0
>Reporter: Kirk True
>Assignee: Kirk True
>Priority: Blocker
>  Labels: kip-848-client-support, system-tests
> Fix For: 3.8.0
>
>
> The {{security_test.py}} system test fails with the following error:
> {noformat}
> test_id:
> kafkatest.tests.core.security_test.SecurityTest.test_client_ssl_endpoint_validation_failure.security_protocol=SSL.interbroker_security_protocol=PLAINTEXT.metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=True.group_protocol=consumer
> status: FAIL
> run time:   1 minute 30.885 seconds
> TimeoutError('')
> Traceback (most recent call last):
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py",
>  line 184, in _do_run
> data = self.run_test()
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py",
>  line 262, in run_test
> return self.test_context.function(self.test)
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/mark/_mark.py",
>  line 433, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/core/security_test.py",
>  line 142, in test_client_ssl_endpoint_validation_failure
> wait_until(lambda: self.producer_consumer_have_expected_error(error), 
> timeout_sec=30)
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/utils/util.py",
>  line 58, in wait_until
> raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from 
> last_exception
> ducktape.errors.TimeoutError
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16464) New consumer fails with timeout in replication_replica_failure_test.py system test

2024-04-23 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True resolved KAFKA-16464.
---
Resolution: Duplicate

> New consumer fails with timeout in replication_replica_failure_test.py system 
> test
> --
>
> Key: KAFKA-16464
> URL: https://issues.apache.org/jira/browse/KAFKA-16464
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer, system tests
>Affects Versions: 3.7.0
>Reporter: Kirk True
>Assignee: Kirk True
>Priority: Blocker
>  Labels: kip-848-client-support, system-tests
> Fix For: 3.8.0
>
>
> The {{replication_replica_failure_test.py}} system test fails with the 
> following error:
> {noformat}
> test_id:
> kafkatest.tests.core.replication_replica_failure_test.ReplicationReplicaFailureTest.test_replication_with_replica_failure.metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=True.group_protocol=consumer
> status: FAIL
> run time:   1 minute 20.972 seconds
> TimeoutError('Timed out after 30s while awaiting initial record delivery 
> of 5 records')
> Traceback (most recent call last):
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py",
>  line 184, in _do_run
> data = self.run_test()
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py",
>  line 262, in run_test
> return self.test_context.function(self.test)
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/mark/_mark.py",
>  line 433, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/core/replication_replica_failure_test.py",
>  line 97, in test_replication_with_replica_failure
> self.await_startup()
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/end_to_end.py",
>  line 125, in await_startup
> (timeout_sec, min_records))
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/utils/util.py",
>  line 58, in wait_until
> raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from 
> last_exception
> ducktape.errors.TimeoutError: Timed out after 30s while awaiting initial 
> record delivery of 5 records
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16579) Revert changes to consumer_rolling_upgrade_test.py for the new async Consumer

2024-04-17 Thread Kirk True (Jira)
Kirk True created KAFKA-16579:
-

 Summary: Revert changes to consumer_rolling_upgrade_test.py for 
the new async Consumer
 Key: KAFKA-16579
 URL: https://issues.apache.org/jira/browse/KAFKA-16579
 Project: Kafka
  Issue Type: Task
  Components: clients, consumer, system tests
Affects Versions: 3.8.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


To test the new, asynchronous Kafka {{Consumer}} implementation, we migrated a 
slew of system tests to run both the "old" and "new" implementations. 
KAFKA-16272 updated the system tests in {{connect_distributed_test.py}} so it 
could test the new consumer with Connect. However, we are not supporting 
Connect with the new consumer in 3.8.0. Unsurprisingly, when we run the Connect 
system tests with the new {{{}AsyncKafkaConsumer{}}}, we get errors like the 
following:
{code:java}
test_id:
kafkatest.tests.connect.connect_distributed_test.ConnectDistributedTest.test_exactly_once_source.clean=False.connect_protocol=eager.metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=True.group_protocol=consumer
status: FAIL
run time:   6 minutes 3.899 seconds


InsufficientResourcesError('Not enough nodes available to allocate. linux 
nodes requested: 1. linux nodes available: 0')
Traceback (most recent call last):
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py",
 line 184, in _do_run
data = self.run_test()
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py",
 line 262, in run_test
return self.test_context.function(self.test)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/mark/_mark.py",
 line 433, in wrapper
return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/connect/connect_distributed_test.py",
 line 919, in test_exactly_once_source
consumer_validator = ConsoleConsumer(self.test_context, 1, self.kafka, 
self.source.topic, consumer_timeout_ms=1000, print_key=True)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/services/console_consumer.py",
 line 97, in __init__
BackgroundThreadService.__init__(self, context, num_nodes)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/services/background_thread.py",
 line 26, in __init__
super(BackgroundThreadService, self).__init__(context, num_nodes, 
cluster_spec, *args, **kwargs)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/services/service.py",
 line 107, in __init__
self.allocate_nodes()
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/services/service.py",
 line 217, in allocate_nodes
self.nodes = self.cluster.alloc(self.cluster_spec)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/cluster/cluster.py",
 line 54, in alloc
allocated = self.do_alloc(cluster_spec)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/cluster/finite_subcluster.py",
 line 31, in do_alloc
allocated = self._available_nodes.remove_spec(cluster_spec)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/cluster/node_container.py",
 line 117, in remove_spec
raise InsufficientResourcesError("Not enough nodes available to allocate. " 
+ msg)
ducktape.cluster.node_container.InsufficientResourcesError: Not enough nodes 
available to allocate. linux nodes requested: 1. linux nodes available: 0
{code}
The task here is to revert the changes made in KAFKA-16272 [PR 
15576|https://github.com/apache/kafka/pull/15576].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16578) Revert changes to connect_distributed_test.py for the new async Consumer

2024-04-17 Thread Kirk True (Jira)
Kirk True created KAFKA-16578:
-

 Summary: Revert changes to connect_distributed_test.py for the new 
async Consumer
 Key: KAFKA-16578
 URL: https://issues.apache.org/jira/browse/KAFKA-16578
 Project: Kafka
  Issue Type: Task
  Components: clients, consumer, system tests
Affects Versions: 3.8.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


To test the new, asynchronous Kafka {{Consumer}} implementation, we migrated a 
slew of system tests to run both the "old" and "new" implementations. 
KAFKA-16272 updated the system tests in {{connect_distributed_test.py}} so it 
could test the new consumer with Connect. However, we are not supporting 
Connect with the new consumer in 3.8.0. Unsurprisingly, when we run the Connect 
system tests with the new {{{}AsyncKafkaConsumer{}}}, we get errors like the 
following:
{code:java}
test_id:
kafkatest.tests.connect.connect_distributed_test.ConnectDistributedTest.test_exactly_once_source.clean=False.connect_protocol=eager.metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=True.group_protocol=consumer
status: FAIL
run time:   6 minutes 3.899 seconds


InsufficientResourcesError('Not enough nodes available to allocate. linux 
nodes requested: 1. linux nodes available: 0')
Traceback (most recent call last):
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py",
 line 184, in _do_run
data = self.run_test()
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py",
 line 262, in run_test
return self.test_context.function(self.test)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/mark/_mark.py",
 line 433, in wrapper
return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/connect/connect_distributed_test.py",
 line 919, in test_exactly_once_source
consumer_validator = ConsoleConsumer(self.test_context, 1, self.kafka, 
self.source.topic, consumer_timeout_ms=1000, print_key=True)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/services/console_consumer.py",
 line 97, in __init__
BackgroundThreadService.__init__(self, context, num_nodes)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/services/background_thread.py",
 line 26, in __init__
super(BackgroundThreadService, self).__init__(context, num_nodes, 
cluster_spec, *args, **kwargs)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/services/service.py",
 line 107, in __init__
self.allocate_nodes()
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/services/service.py",
 line 217, in allocate_nodes
self.nodes = self.cluster.alloc(self.cluster_spec)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/cluster/cluster.py",
 line 54, in alloc
allocated = self.do_alloc(cluster_spec)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/cluster/finite_subcluster.py",
 line 31, in do_alloc
allocated = self._available_nodes.remove_spec(cluster_spec)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/cluster/node_container.py",
 line 117, in remove_spec
raise InsufficientResourcesError("Not enough nodes available to allocate. " 
+ msg)
ducktape.cluster.node_container.InsufficientResourcesError: Not enough nodes 
available to allocate. linux nodes requested: 1. linux nodes available: 0
{code}
The task here is to revert the changes made in KAFKA-16272 [PR 
15576|https://github.com/apache/kafka/pull/15576].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16577) New consumer fails with stop timeout in consumer_test.py’s test_consumer_bounce system test

2024-04-17 Thread Kirk True (Jira)
Kirk True created KAFKA-16577:
-

 Summary: New consumer fails with stop timeout in 
consumer_test.py’s test_consumer_bounce system test
 Key: KAFKA-16577
 URL: https://issues.apache.org/jira/browse/KAFKA-16577
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer, system tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


The {{consumer_test.py}} system test intermittently fails with the following 
error:

{code}
test_id:
kafkatest.tests.client.consumer_test.OffsetValidationTest.test_consumer_failure.clean_shutdown=True.enable_autocommit=True.metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=True.group_protocol=consumer
status: FAIL
run time:   42.582 seconds


AssertionError()
Traceback (most recent call last):
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py",
 line 184, in _do_run
data = self.run_test()
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py",
 line 262, in run_test
return self.test_context.function(self.test)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/mark/_mark.py",
 line 433, in wrapper
return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/client/consumer_test.py",
 line 399, in test_consumer_failure
assert partition_owner is not None
AssertionError
Notify
{code}

Affected tests:
 * {{test_consumer_failure}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16576) New consumer fails with assert in consumer_test.py’s test_consumer_failure system test

2024-04-17 Thread Kirk True (Jira)
Kirk True created KAFKA-16576:
-

 Summary: New consumer fails with assert in consumer_test.py’s 
test_consumer_failure system test
 Key: KAFKA-16576
 URL: https://issues.apache.org/jira/browse/KAFKA-16576
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer, system tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


The {{consumer_test.py}} system test fails with the following errors:

{quote}
* Timed out waiting for consumption
{quote}

Affected tests:

* {{test_broker_failure}}
* {{test_consumer_bounce}}
* {{test_static_consumer_bounce}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16405) Mismatch assignment error when running consumer rolling upgrade system tests

2024-04-17 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True resolved KAFKA-16405.
---
  Reviewer: Lucas Brutschy
Resolution: Fixed

> Mismatch assignment error when running consumer rolling upgrade system tests
> 
>
> Key: KAFKA-16405
> URL: https://issues.apache.org/jira/browse/KAFKA-16405
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer, system tests
>Reporter: Philip Nee
>Assignee: Philip Nee
>Priority: Blocker
>  Labels: kip-848-client-support, system-tests
> Fix For: 3.8.0
>
>
> relevant to [https://github.com/apache/kafka/pull/15578]
>  
> We are seeing:
> {code:java}
> 
> SESSION REPORT (ALL TESTS)
> ducktape version: 0.11.4
> session_id:   2024-03-21--001
> run time: 3 minutes 24.632 seconds
> tests run:7
> passed:   5
> flaky:0
> failed:   2
> ignored:  0
> 
> test_id:
> kafkatest.tests.client.consumer_rolling_upgrade_test.ConsumerRollingUpgradeTest.rolling_update_test.metadata_quorum=COMBINED_KRAFT.use_new_coordinator=True.group_protocol=classic
> status: PASS
> run time:   24.599 seconds
> 
> test_id:
> kafkatest.tests.client.consumer_rolling_upgrade_test.ConsumerRollingUpgradeTest.rolling_update_test.metadata_quorum=COMBINED_KRAFT.use_new_coordinator=True.group_protocol=consumer
> status: FAIL
> run time:   26.638 seconds
> AssertionError("Mismatched assignment: {frozenset(), 
> frozenset({TopicPartition(topic='test_topic', partition=3), 
> TopicPartition(topic='test_topic', partition=0), 
> TopicPartition(topic='test_topic', partition=1), 
> TopicPartition(topic='test_topic', partition=2)})}")
> Traceback (most recent call last):
>   File 
> "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", 
> line 186, in _do_run
> data = self.run_test()
>   File 
> "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", 
> line 246, in run_test
> return self.test_context.function(self.test)
>   File "/usr/local/lib/python3.9/dist-packages/ducktape/mark/_mark.py", line 
> 433, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/opt/kafka-dev/tests/kafkatest/tests/client/consumer_rolling_upgrade_test.py",
>  line 77, in rolling_update_test
> self._verify_range_assignment(consumer)
>   File 
> "/opt/kafka-dev/tests/kafkatest/tests/client/consumer_rolling_upgrade_test.py",
>  line 38, in _verify_range_assignment
> assert assignment == set([
> AssertionError: Mismatched assignment: {frozenset(), 
> frozenset({TopicPartition(topic='test_topic', partition=3), 
> TopicPartition(topic='test_topic', partition=0), 
> TopicPartition(topic='test_topic', partition=1), 
> TopicPartition(topic='test_topic', partition=2)})}
> 
> test_id:
> kafkatest.tests.client.consumer_rolling_upgrade_test.ConsumerRollingUpgradeTest.rolling_update_test.metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=False
> status: PASS
> run time:   29.815 seconds
> 
> test_id:
> kafkatest.tests.client.consumer_rolling_upgrade_test.ConsumerRollingUpgradeTest.rolling_update_test.metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=True
> status: PASS
> run time:   29.766 seconds
> 
> test_id:
> kafkatest.tests.client.consumer_rolling_upgrade_test.ConsumerRollingUpgradeTest.rolling_update_test.metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=True.group_protocol=classic
> status: PASS
> run time:   30.086 seconds
> 
> test_id:
> kafkatest.tests.client.consumer_rolling_upgrade_test.ConsumerRollingUpgradeTest.rolling_update_test.metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=True.group_protocol=consumer
> status: FAIL
> run time:   35.965 seconds
> AssertionError("Mismatched assignment: {frozenset(), 
> frozenset({TopicPartition(topic='test_topic', partition=3), 
> TopicPartition(topic='test_topic', partition=0), 
> TopicPartition(topic='test_topic', partition=1), 
> TopicPartition(topic='test_topic', partition=2)})}")
> Traceback (most recent call last):
>   File 
> "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", 
> line 186, in _do_run
>   

[jira] [Created] (KAFKA-16565) IncrementalAssignmentConsumerEventHandler throws error when attempting to remove a partition that isn't assigned

2024-04-16 Thread Kirk True (Jira)
Kirk True created KAFKA-16565:
-

 Summary: IncrementalAssignmentConsumerEventHandler throws error 
when attempting to remove a partition that isn't assigned
 Key: KAFKA-16565
 URL: https://issues.apache.org/jira/browse/KAFKA-16565
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer, system tests
Affects Versions: 3.8.0
Reporter: Kirk True
 Fix For: 3.8.0


In {{{}verifiable_consumer.py{}}}, the Incremental

 
{code:java}
def handle_partitions_revoked(self, event):
self.revoked_count += 1
self.state = ConsumerState.Rebalancing
self.position = {}
for topic_partition in event["partitions"]:
topic = topic_partition["topic"]
partition = topic_partition["partition"]
self.assignment.remove(TopicPartition(topic, partition))
 {code}
If the {{self.assignment.remove()}} call is passed a {{TopicPartition}} that 
isn't in the list, an error is thrown. For now, we should first check that the 
{{TopicPartition}} is in the list, and if not, log a warning or something.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16558) Implement HeartbeatRequestState.toStringBase()

2024-04-15 Thread Kirk True (Jira)
Kirk True created KAFKA-16558:
-

 Summary: Implement HeartbeatRequestState.toStringBase()
 Key: KAFKA-16558
 URL: https://issues.apache.org/jira/browse/KAFKA-16558
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


The code incorrectly overrides the {{toString()}} method instead of overriding 
{{{}toStringBase(){}}}. This affects debugging and troubleshooting consumer 
issues.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16557) Fix CommitRequestManager’s OffsetFetchRequestState.toString()

2024-04-15 Thread Kirk True (Jira)
Kirk True created KAFKA-16557:
-

 Summary: Fix CommitRequestManager’s 
OffsetFetchRequestState.toString()
 Key: KAFKA-16557
 URL: https://issues.apache.org/jira/browse/KAFKA-16557
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


The code incorrectly overrides the {{toString()}} method instead of overriding 
{{{}toStringBase(){}}}. This affects debugging and troubleshooting consumer 
issues.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16556) Race condition between ConsumerRebalanceListener and SubscriptionState

2024-04-15 Thread Kirk True (Jira)
Kirk True created KAFKA-16556:
-

 Summary: Race condition between ConsumerRebalanceListener and 
SubscriptionState
 Key: KAFKA-16556
 URL: https://issues.apache.org/jira/browse/KAFKA-16556
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer
Affects Versions: 3.7.0
Reporter: Kirk True
 Fix For: 3.8.0


There appears to be a race condition between invoking the 
{{ConsumerRebalanceListener}} callbacks on reconciliation and 
{{initWithCommittedOffsetsIfNeeded}} in the consumer.
 
The membership manager adds the newly assigned partitions to the 
{{{}SubscriptionState{}}}, but marks them as {{{}pendingOnAssignedCallback{}}}. 
Then, after the {{ConsumerRebalanceListener.onPartitionsAssigned()}} completes, 
the membership manager will invoke {{enablePartitionsAwaitingCallback}} to set 
all of those partitions' 'pending' flag to false.
 
During the main {{Consumer.poll()}} loop, {{AsyncKafkaConsumer}} may need to 
call {{initWithCommittedOffsetsIfNeeded()}} if the positions aren't already 
cached. Inside {{{}initWithCommittedOffsetsIfNeeded{}}}, the consumer calls the 
subscription's {{initializingPartitions}} method to get a set of the partitions 
for which to fetch their committed offsets. However, 
{{SubscriptionState.initializingPartitions()}} only returns partitions that 
have the {{pendingOnAssignedCallback}} flag set to to false.
 
The result is: * If the {{MembershipManagerImpl.assignPartitions()}} future  is 
completed on the background thread first, the 'pending' flag is set to false. 
On the application thread, when {{SubscriptionState.initializingPartitions()}} 
is called, it returns the partition, and we fetch its committed offsets
 * If instead the application thread calls 
{{SubscriptionState.initializingPartitions()}} first, the partitions's 
'pending' flag is still set to false, and so the partition is omitted from the 
returned set. The {{updateFetchPositions()}} method then continues on and 
re-initializes the partition's fetch offset to 0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16555) Consumer's RequestState has incorrect logic to determine if inflight

2024-04-15 Thread Kirk True (Jira)
Kirk True created KAFKA-16555:
-

 Summary: Consumer's RequestState has incorrect logic to determine 
if inflight
 Key: KAFKA-16555
 URL: https://issues.apache.org/jira/browse/KAFKA-16555
 Project: Kafka
  Issue Type: Task
  Components: clients, consumer
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


When running system tests for the new consumer, I've hit an issue where the 
{{HeartbeatRequestManager}} is sending out multiple concurrent 
{{CONSUMER_GROUP_REQUEST}} RPCs. The effect is the coordinator creates multiple 
members which causes downstream assignment problems.

Here's the order of events:

* Time 202: {{HearbeatRequestManager.poll()}} determines it's OK to send a 
request. In so doing, it updates the {{RequestState}}'s {{lastSentMs}} to the 
current timestamp, 202
* Time 236: the response is received and response handler is invoked, setting 
the {{RequestState}}'s {{lastReceivedMs}} to the current timestamp, 236
* Time 236: {{HearbeatRequestManager.poll()}} is invoked again, and it sees 
that it's OK to send a request. It creates another request, once again updating 
the {{RequestState}}'s {{lastSentMs}} to the current timestamp, 236
* Time 237:  {{HearbeatRequestManager.poll()}} is invoked again, and 
ERRONEOUSLY decides it's OK to send another request, despite one already in 
flight.

Here's the problem with {{requestInFlight()}}:

{code:java}
public boolean requestInFlight() {
return this.lastSentMs > -1 && this.lastReceivedMs < this.lastSentMs;
}
{code}

On our case, {{lastReceivedMs}} is 236 and {{lastSentMs}} is _also_ 236. So the 
received timestamp is _equal_ to the sent timestamp, not _less_.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16465) New consumer does not invoke rebalance callbacks as expected in consumer_test.py system test

2024-04-02 Thread Kirk True (Jira)
Kirk True created KAFKA-16465:
-

 Summary: New consumer does not invoke rebalance callbacks as 
expected in consumer_test.py system test
 Key: KAFKA-16465
 URL: https://issues.apache.org/jira/browse/KAFKA-16465
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer, system tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


The {{replication_replica_failure_test.py}} system test fails with the 
following error:

{noformat}
test_id:
kafkatest.tests.core.replication_replica_failure_test.ReplicationReplicaFailureTest.test_replication_with_replica_failure.metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=True.group_protocol=consumer
status: FAIL
run time:   1 minute 20.972 seconds


TimeoutError('Timed out after 30s while awaiting initial record delivery of 
5 records')
Traceback (most recent call last):
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py",
 line 184, in _do_run
data = self.run_test()
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py",
 line 262, in run_test
return self.test_context.function(self.test)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/mark/_mark.py",
 line 433, in wrapper
return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/core/replication_replica_failure_test.py",
 line 97, in test_replication_with_replica_failure
self.await_startup()
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/end_to_end.py",
 line 125, in await_startup
(timeout_sec, min_records))
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/utils/util.py",
 line 58, in wait_until
raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from 
last_exception
ducktape.errors.TimeoutError: Timed out after 30s while awaiting initial record 
delivery of 5 records
{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16464) New consumer fails with timeout in replication_replica_failure_test.py system test

2024-04-02 Thread Kirk True (Jira)
Kirk True created KAFKA-16464:
-

 Summary: New consumer fails with timeout in 
replication_replica_failure_test.py system test
 Key: KAFKA-16464
 URL: https://issues.apache.org/jira/browse/KAFKA-16464
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer, system tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


The {{security_test.py}} system test fails with the following error:

{noformat}
test_id:
kafkatest.tests.core.security_test.SecurityTest.test_client_ssl_endpoint_validation_failure.security_protocol=SSL.interbroker_security_protocol=PLAINTEXT.metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=True.group_protocol=consumer
status: FAIL
run time:   1 minute 30.885 seconds


TimeoutError('')
Traceback (most recent call last):
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py",
 line 184, in _do_run
data = self.run_test()
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py",
 line 262, in run_test
return self.test_context.function(self.test)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/mark/_mark.py",
 line 433, in wrapper
return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/core/security_test.py",
 line 142, in test_client_ssl_endpoint_validation_failure
wait_until(lambda: self.producer_consumer_have_expected_error(error), 
timeout_sec=30)
  File 
"/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/utils/util.py",
 line 58, in wait_until
raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from 
last_exception
ducktape.errors.TimeoutError
{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16462) New consumer fails with timeout in security_test.py system test

2024-04-02 Thread Kirk True (Jira)
Kirk True created KAFKA-16462:
-

 Summary: New consumer fails with timeout in security_test.py 
system test
 Key: KAFKA-16462
 URL: https://issues.apache.org/jira/browse/KAFKA-16462
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer, system tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


The {{security_test.py}} system test fails with the following error:

{quote}
* Consumer failed to consume up to offsets
{quote}

Affected test:

* {{test_client_ssl_endpoint_validation_failure}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16461) New consumer fails to consume records in security_test.py system test

2024-04-02 Thread Kirk True (Jira)
Kirk True created KAFKA-16461:
-

 Summary: New consumer fails to consume records in security_test.py 
system test
 Key: KAFKA-16461
 URL: https://issues.apache.org/jira/browse/KAFKA-16461
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer, system tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


The {{consumer_test.py}} system test fails with the following errors:

{quote}
* Timed out waiting for consumption
{quote}

Affected tests:

* {{test_broker_failure}}
* {{test_consumer_bounce}}
* {{test_static_consumer_bounce}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16460) New consumer times out system test

2024-04-02 Thread Kirk True (Jira)
Kirk True created KAFKA-16460:
-

 Summary: New consumer times out system test
 Key: KAFKA-16460
 URL: https://issues.apache.org/jira/browse/KAFKA-16460
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer, system tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


The {{consumer_test.py}} system test fails with two different errors related to 
consumers joining the consumer group in a timely fashion.

{quote}
* Consumers failed to join in a reasonable amount of time
* Timed out waiting for consumers to join, expected total X joined, but only 
see Y joined fromnormal consumer group and Z from conflict consumer group{quote}

Affected tests:

 * {{test_fencing_static_consumer}}
 * {{test_static_consumer_bounce}}
 * {{test_static_consumer_persisted_after_rejoin}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16459) consumer_test.py’s static membership tests fail with new consumer

2024-04-02 Thread Kirk True (Jira)
Kirk True created KAFKA-16459:
-

 Summary: consumer_test.py’s static membership tests fail with new 
consumer
 Key: KAFKA-16459
 URL: https://issues.apache.org/jira/browse/KAFKA-16459
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer, system tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Philip Nee
 Fix For: 3.8.0


The following error is reported when running the {{test_valid_assignment}} test 
from {{consumer_test.py}}:

 {code}
Traceback (most recent call last):
  File 
"/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", line 
186, in _do_run
data = self.run_test()
  File 
"/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", line 
246, in run_test
return self.test_context.function(self.test)
  File "/usr/local/lib/python3.9/dist-packages/ducktape/mark/_mark.py", line 
433, in wrapper
return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/opt/kafka-dev/tests/kafkatest/tests/client/consumer_test.py", line 
584, in test_valid_assignment
wait_until(lambda: self.valid_assignment(self.TOPIC, self.NUM_PARTITIONS, 
consumer.current_assignment()),
  File "/usr/local/lib/python3.9/dist-packages/ducktape/utils/util.py", line 
58, in wait_until
raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from 
last_exception
ducktape.errors.TimeoutError: expected valid assignments of 6 partitions when 
num_started 2: [('ducker@ducker05', []), ('ducker@ducker06', [])]
{code}

To reproduce, create a system test suite file named 
{{test_valid_assignment.yml}} with these contents:

{code:yaml}
failures:
  - 
'kafkatest/tests/client/consumer_test.py::AssignmentValidationTest.test_valid_assignment@{"metadata_quorum":"ISOLATED_KRAFT","use_new_coordinator":true,"group_protocol":"consumer","group_remote_assignor":"range"}'
{code}

Then set the the {{TC_PATHS}} environment variable to include that test suite 
file.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16444) Run KIP-848 unit tests under code coverage

2024-03-28 Thread Kirk True (Jira)
Kirk True created KAFKA-16444:
-

 Summary: Run KIP-848 unit tests under code coverage
 Key: KAFKA-16444
 URL: https://issues.apache.org/jira/browse/KAFKA-16444
 Project: Kafka
  Issue Type: Task
  Components: clients, consumer, unit tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16443) Update streams_static_membership_test.py to support KIP-848’s group protocol config

2024-03-28 Thread Kirk True (Jira)
Kirk True created KAFKA-16443:
-

 Summary: Update streams_static_membership_test.py to support 
KIP-848’s group protocol config
 Key: KAFKA-16443
 URL: https://issues.apache.org/jira/browse/KAFKA-16443
 Project: Kafka
  Issue Type: Test
  Components: clients, consumer, system tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 4.0.0


This task is to update the test method(s) in 
{{streams_standby_replica_test.py}} to support the {{group.protocol}} 
configuration introduced in 
[KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol]
 by adding an optional {{group_protocol}} argument to the tests and matrixes.

See KAFKA-16231 as an example of how the test parameters can be changed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16442) Update streams_standby_replica_test.py to support KIP-848’s group protocol config

2024-03-28 Thread Kirk True (Jira)
Kirk True created KAFKA-16442:
-

 Summary: Update streams_standby_replica_test.py to support 
KIP-848’s group protocol config
 Key: KAFKA-16442
 URL: https://issues.apache.org/jira/browse/KAFKA-16442
 Project: Kafka
  Issue Type: Test
  Components: clients, consumer, system tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 4.0.0


This task is to update the test method(s) in {{security_test.py}} to support 
the {{group.protocol}} configuration introduced in 
[KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol]
 by adding an optional {{group_protocol}} argument to the tests and matrixes.

See KAFKA-16231 as an example of how the test parameters can be changed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16440) Update security_test.py to support KIP-848’s group protocol config

2024-03-28 Thread Kirk True (Jira)
Kirk True created KAFKA-16440:
-

 Summary: Update security_test.py to support KIP-848’s group 
protocol config
 Key: KAFKA-16440
 URL: https://issues.apache.org/jira/browse/KAFKA-16440
 Project: Kafka
  Issue Type: Test
  Components: clients, consumer, system tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


This task is to update the test method(s) in 
{{replication_replica_failure_test.py}} to support the {{group.protocol}} 
configuration introduced in 
[KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol]
 by adding an optional {{group_protocol}} argument to the tests and matrixes.

See KAFKA-16231 as an example of how the test parameters can be changed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16441) Update streams_broker_down_resilience_test.py to support KIP-848’s group protocol config

2024-03-28 Thread Kirk True (Jira)
Kirk True created KAFKA-16441:
-

 Summary: Update streams_broker_down_resilience_test.py to support 
KIP-848’s group protocol config
 Key: KAFKA-16441
 URL: https://issues.apache.org/jira/browse/KAFKA-16441
 Project: Kafka
  Issue Type: Test
  Components: clients, consumer, system tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


This task is to update the test method(s) in {{security_test.py}} to support 
the {{group.protocol}} configuration introduced in 
[KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol]
 by adding an optional {{group_protocol}} argument to the tests and matrixes.

See KAFKA-16231 as an example of how the test parameters can be changed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16439) Update replication_replica_failure_test.py to support KIP-848’s group protocol config

2024-03-28 Thread Kirk True (Jira)
Kirk True created KAFKA-16439:
-

 Summary: Update replication_replica_failure_test.py to support 
KIP-848’s group protocol config
 Key: KAFKA-16439
 URL: https://issues.apache.org/jira/browse/KAFKA-16439
 Project: Kafka
  Issue Type: Test
  Components: clients, consumer, system tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


This task is to update the test method(s) in {{replica_scale_test.py}} to 
support the {{group.protocol}} configuration introduced in 
[KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol]
 by adding an optional {{group_protocol}} argument to the tests and matrixes.

See KAFKA-16231 as an example of how the test parameters can be changed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16438) Update consumer_test.py’s static tests to support KIP-848’s group protocol config

2024-03-28 Thread Kirk True (Jira)
Kirk True created KAFKA-16438:
-

 Summary: Update consumer_test.py’s static tests to support 
KIP-848’s group protocol config
 Key: KAFKA-16438
 URL: https://issues.apache.org/jira/browse/KAFKA-16438
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer, system tests
Reporter: Kirk True
Assignee: Kirk True


This task is to update the following test method(s) in {{consumer_test.py}} to 
support the {{group.protocol}} configuration:

* {{test_fencing_static_consumer}}
* {{test_static_consumer_bounce}}
* {{test_static_consumer_persisted_after_rejoin}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16271) Update consumer_rolling_upgrade_test.py to support KIP-848’s group protocol config

2024-03-28 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True resolved KAFKA-16271.
---
  Reviewer: Lucas Brutschy
Resolution: Fixed

> Update consumer_rolling_upgrade_test.py to support KIP-848’s group protocol 
> config
> --
>
> Key: KAFKA-16271
> URL: https://issues.apache.org/jira/browse/KAFKA-16271
> Project: Kafka
>  Issue Type: Test
>  Components: clients, consumer, system tests
>Affects Versions: 3.7.0
>Reporter: Kirk True
>Assignee: Philip Nee
>Priority: Blocker
>  Labels: kip-848-client-support, system-tests
> Fix For: 3.8.0
>
>
> This task is to update the test method(s) in 
> {{consumer_rolling_upgrade_test.py}} to support the {{group.protocol}} 
> configuration introduced in 
> [KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol]
>  by adding an optional {{group_protocol}} argument to the tests and matrixes.
> See KAFKA-16231 as an example of how the test parameters can be changed.
> The tricky wrinkle here is that the existing test relies on client-side 
> assignment strategies that aren't applicable with the new KIP-848-enabled 
> consumer.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14246) Update threading model for Consumer

2024-03-25 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True resolved KAFKA-14246.
---
Resolution: Fixed

> Update threading model for Consumer
> ---
>
> Key: KAFKA-14246
> URL: https://issues.apache.org/jira/browse/KAFKA-14246
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, consumer
>Reporter: Philip Nee
>Assignee: Kirk True
>Priority: Major
>  Labels: consumer-threading-refactor
> Fix For: 3.8.0
>
>
> Hi community,
>  
> We are refactoring the current KafkaConsumer and making it more asynchronous. 
>  This is the master Jira to track the project's progress; subtasks will be 
> linked to this ticket.  Please review the design document and feel free to 
> use this thread for discussion. 
>  
> The design document is here: 
> [https://cwiki.apache.org/confluence/display/KAFKA/Proposal%3A+Consumer+Threading+Model+Refactor]
>  
> The original email thread is here: 
> [https://lists.apache.org/thread/13jvwzkzmb8c6t7drs4oj2kgkjzcn52l]
>  
> I will continue to update the 1pager as reviews and comments come.
>  
> Thanks, 
> P



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16389) consumer_test.py’s test_valid_assignment fails with new consumer

2024-03-19 Thread Kirk True (Jira)
Kirk True created KAFKA-16389:
-

 Summary: consumer_test.py’s test_valid_assignment fails with new 
consumer
 Key: KAFKA-16389
 URL: https://issues.apache.org/jira/browse/KAFKA-16389
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer, system tests
Affects Versions: 3.7.0
Reporter: Kirk True
 Fix For: 3.8.0


The following error is reported when running the {{test_valid_assignment}} test 
from {{consumer_test.py}}:

 {code}
Traceback (most recent call last):
  File 
"/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", line 
186, in _do_run
data = self.run_test()
  File 
"/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", line 
246, in run_test
return self.test_context.function(self.test)
  File "/usr/local/lib/python3.9/dist-packages/ducktape/mark/_mark.py", line 
433, in wrapper
return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/opt/kafka-dev/tests/kafkatest/tests/client/consumer_test.py", line 
584, in test_valid_assignment
wait_until(lambda: self.valid_assignment(self.TOPIC, self.NUM_PARTITIONS, 
consumer.current_assignment()),
  File "/usr/local/lib/python3.9/dist-packages/ducktape/utils/util.py", line 
58, in wait_until
raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from 
last_exception
ducktape.errors.TimeoutError: expected valid assignments of 6 partitions when 
num_started 2: [('ducker@ducker05', []), ('ducker@ducker06', [])]
{code}

To reproduce, create a system test suite file named 
{{test_valid_assignment.yml}} with these contents:

{code:yaml}
failures:
  - 
'kafkatest/tests/client/consumer_test.py::AssignmentValidationTest.test_valid_assignment@{"metadata_quorum":"ISOLATED_KRAFT","use_new_coordinator":true,"group_protocol":"consumer","group_remote_assignor":"range"}'
{code}

Then run set the the {{TC_PATHS}} environment variable to include that test 
suite file.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Reopened] (KAFKA-15691) Add new system tests to use new consumer

2024-03-14 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True reopened KAFKA-15691:
---

> Add new system tests to use new consumer
> 
>
> Key: KAFKA-15691
> URL: https://issues.apache.org/jira/browse/KAFKA-15691
> Project: Kafka
>  Issue Type: Test
>  Components: clients, consumer, system tests
>Reporter: Kirk True
>Assignee: Kirk True
>Priority: Major
>  Labels: kip-848-client-support, system-tests
> Fix For: 3.8.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16315) Investigate propagating metadata updates via queues

2024-02-29 Thread Kirk True (Jira)
Kirk True created KAFKA-16315:
-

 Summary: Investigate propagating metadata updates via queues
 Key: KAFKA-16315
 URL: https://issues.apache.org/jira/browse/KAFKA-16315
 Project: Kafka
  Issue Type: Task
  Components: clients, consumer
Affects Versions: 3.7.0
Reporter: Kirk True
 Fix For: 4.0.0


Some of the new {{AsyncKafkaConsumer}} logic enqueues events for the network 
I/O thread then issues a call to update the {{ConsumerMetadata}} via 
{{requestUpdate()}}. If the event ends up stuck in the queue for a long time, 
it is possible that the metadata is not updated at the correct time.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16312) ConsumerRebalanceListener.onPartitionsAssigned should be called after joining, even if empty

2024-02-28 Thread Kirk True (Jira)
Kirk True created KAFKA-16312:
-

 Summary: ConsumerRebalanceListener.onPartitionsAssigned should be 
called after joining, even if empty
 Key: KAFKA-16312
 URL: https://issues.apache.org/jira/browse/KAFKA-16312
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Lianet Magrans
 Fix For: 3.8.0


There is a difference between the {{LegacyKafkaConsumer}} and 
{{AsyncKafkaConsumer}} respecting when the 
{{ConsumerRebalanceListener.onPartitionsAssigned()}} method is invoked.

 
For example, with {{onPartitionsAssigned()}}:

* {{LegacyKafkaConsumer}}: the listener method is invoked when the consumer 
joins the group, even if that consumer was not assigned any partitions. In this 
case it's passed an empty list.
* {{AsyncKafkaConsumer}}: the listener method is only invoked after the 
consumer joins the group iff it has assigned partitions

 
This difference is affecting the system tests. The system tests use a Java 
class named {{VerifiableConsumer}} which uses a {{ConsumerRebalanceListener}} 
that logs when the callbacks are invoked. The system tests then read from that 
log to determine when the callbacks are invoked. This coordination is used by 
the system tests to determine the lifecycle and status of the consumers.
 
The system tests rely heavily on the listener behavior of the 
{{LegacyKafkaConsumer}}. It invokes the {{onPartitionsAssigned()}} method when 
the consumer joins the group, and the system tests use that to determine when 
the consumer is actively a member of the group. This validation of membership 
is used as an assertion throughout the consumer-related tests.
 
In the system test I'm executing from {{consumer_test.py}}, there's a test that 
creates three consumers to read from a single topic with a single partition. 
It's a bit of an oddball test, but it demonstrates the issue.
 
Here are the logs pulled from the test run when executed using the 
{{LegacyKafkaConsumer}}:
 
Node 1:
 
{code:java}
[2024-02-15 00:43:52,400] INFO Adding newly assigned partitions:  
(org.apache.kafka.clients.consumer.internals.ConsumerRebalanceListenerInvoker){code}
 
Node 2:
 
{code:java}
[2024-02-15 00:43:52,401] INFO Adding newly assigned partitions: test_topic-0 
(org.apache.kafka.clients.consumer.internals.ConsumerRebalanceListenerInvoker){code}
 
Node 3:
 
{code:java}
[2024-02-15 00:43:52,399] INFO Adding newly assigned partitions:  
(org.apache.kafka.clients.consumer.internals.ConsumerRebalanceListenerInvoker){code}

Here are the logs when executing the same test using the {{AsyncKafkaConsumer}}:

Node 1:

{code:java}
[2024-02-15 01:15:46,576] INFO Adding newly assigned partitions: test_topic-0 
(org.apache.kafka.clients.consumer.internals.ConsumerRebalanceListenerInvoker){code}

Node 2:

{code:java}n/a{code}

Node 3:

{code:java}n/a{code}

As a result of this change, the existing system tests do not work with the new 
consumer. However, even more importantly, this change in behavior may adversely 
affect existing users.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15475) Request might retry forever even if the user API timeout expires

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True resolved KAFKA-15475.
---
Resolution: Fixed

> Request might retry forever even if the user API timeout expires
> 
>
> Key: KAFKA-15475
> URL: https://issues.apache.org/jira/browse/KAFKA-15475
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer
>Reporter: Philip Nee
>Assignee: Kirk True
>Priority: Critical
>  Labels: consumer-threading-refactor, timeout
> Fix For: 3.8.0
>
>
> If the request timeout in the background thread, it will be completed with 
> TimeoutException, which is Retriable.  In the TopicMetadataRequestManager and 
> possibly other managers, the request might continue to be retried forever.
>  
> There are two ways to fix this
>  # Pass a timer to the manager to remove the inflight requests when it is 
> expired.
>  # Pass the future to the application layer and continue to retry.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Reopened] (KAFKA-16200) Enforce that RequestManager implementations respect user-provided timeout

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True reopened KAFKA-16200:
---

> Enforce that RequestManager implementations respect user-provided timeout
> -
>
> Key: KAFKA-16200
> URL: https://issues.apache.org/jira/browse/KAFKA-16200
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, consumer
>Reporter: Kirk True
>Assignee: Kirk True
>Priority: Blocker
>  Labels: consumer-threading-refactor, timeout
> Fix For: 3.8.0
>
>
> The intention of the {{CompletableApplicationEvent}} is for a {{Consumer}} to 
> block waiting for the event to complete. The application thread will block 
> for the timeout, but there is not yet a consistent manner in which events are 
> timed out.
> Enforce at the request manager layer that timeouts are respected per the 
> design in KAFKA-15848.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16199) Prune the event queue if event timeout expired before starting

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True resolved KAFKA-16199.
---
Resolution: Duplicate

> Prune the event queue if event timeout expired before starting
> --
>
> Key: KAFKA-16199
> URL: https://issues.apache.org/jira/browse/KAFKA-16199
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, consumer
>Reporter: Kirk True
>Assignee: Kirk True
>Priority: Major
>  Labels: consumer-threading-refactor, timeout
> Fix For: 3.8.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16200) Enforce that RequestManager implementations respect user-provided timeout

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True resolved KAFKA-16200.
---
Resolution: Duplicate

> Enforce that RequestManager implementations respect user-provided timeout
> -
>
> Key: KAFKA-16200
> URL: https://issues.apache.org/jira/browse/KAFKA-16200
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, consumer
>Reporter: Kirk True
>Assignee: Kirk True
>Priority: Blocker
>  Labels: consumer-threading-refactor, timeout
> Fix For: 3.8.0
>
>
> The intention of the {{CompletableApplicationEvent}} is for a {{Consumer}} to 
> block waiting for the event to complete. The application thread will block 
> for the timeout, but there is not yet a consistent manner in which events are 
> timed out.
> Enforce at the request manager layer that timeouts are respected per the 
> design in KAFKA-15848.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16019) Some of the tests in PlaintextConsumer can't seem to deterministically invoke and verify the consumer callback

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True resolved KAFKA-16019.
---
Resolution: Fixed

> Some of the tests in PlaintextConsumer can't seem to deterministically invoke 
> and verify the consumer callback
> --
>
> Key: KAFKA-16019
> URL: https://issues.apache.org/jira/browse/KAFKA-16019
> Project: Kafka
>  Issue Type: Test
>  Components: clients, consumer
>Reporter: Philip Nee
>Assignee: Kirk True
>Priority: Critical
>  Labels: consumer-threading-refactor, integration-tests, timeout
> Fix For: 3.8.0
>
>
> I was running the PlaintextConsumer to test the async consumer; however, a 
> few tests were failing with not being able to verify the listener is invoked 
> correctly
> For example `testPerPartitionLeadMetricsCleanUpWithSubscribe`
> Around 50% of the time, the listener's callsToAssigned was never incremented 
> correctly.  Event changing it to awaitUntilTrue it was still the same case
> {code:java}
> consumer.subscribe(List(topic, topic2).asJava, listener)
> val records = awaitNonEmptyRecords(consumer, tp)
> assertEquals(1, listener.callsToAssigned, "should be assigned once") {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16023) PlaintextConsumerTest needs to wait for reconciliation to complete before proceeding

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True resolved KAFKA-16023.
---
Resolution: Fixed

{{testPerPartitionLagMetricsCleanUpWithSubscribe}} is now passing consistently, 
so marking this as fixed.

> PlaintextConsumerTest needs to wait for reconciliation to complete before 
> proceeding
> 
>
> Key: KAFKA-16023
> URL: https://issues.apache.org/jira/browse/KAFKA-16023
> Project: Kafka
>  Issue Type: Test
>  Components: clients, consumer
>Reporter: Philip Nee
>Assignee: Kirk True
>Priority: Critical
>  Labels: consumer-threading-refactor, integration-tests, timeout
> Fix For: 3.8.0
>
>
> Several tests in PlaintextConsumerTest.scala (such as 
> testPerPartitionLagMetricsCleanUpWithSubscribe) uses:
> assertEquals(1, listener.callsToAssigned, "should be assigned once")
> However, as the timing for reconciliation completion is not deterministic due 
> to asynchronous processing. We actually need to wait until the condition to 
> happen.
> However, another issue is the timeout - some of these tasks might not 
> complete within the 600ms timeout, so the tests are deemed to be flaky.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15993) Enable max poll integration tests that depend on callback invocation

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True resolved KAFKA-15993.
---
Resolution: Duplicate

> Enable max poll integration tests that depend on callback invocation
> 
>
> Key: KAFKA-15993
> URL: https://issues.apache.org/jira/browse/KAFKA-15993
> Project: Kafka
>  Issue Type: Test
>  Components: clients, consumer
>Reporter: Philip Nee
>Assignee: Kirk True
>Priority: Blocker
>  Labels: consumer-threading-refactor, integration-tests, timeout
> Fix For: 3.8.0
>
>
> We will enable integration tests using the async consumer in KAFKA-15971.  
> However, we should also enable tests that rely on rebalance listeners after 
> KAFKA-15628 is closed.  One example would be testMaxPollIntervalMs, that I 
> relies on the listener to verify the correctness.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16208) Design new Consumer timeout policy

2024-02-20 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True resolved KAFKA-16208.
---
Resolution: Duplicate

> Design new Consumer timeout policy
> --
>
> Key: KAFKA-16208
> URL: https://issues.apache.org/jira/browse/KAFKA-16208
> Project: Kafka
>  Issue Type: Task
>  Components: clients, consumer, documentation
>Affects Versions: 3.7.0
>Reporter: Kirk True
>Assignee: Kirk True
>Priority: Blocker
>  Labels: consumer-threading-refactor, timeout
> Fix For: 3.8.0
>
>
> This task is to design and document the timeout policy for the new Consumer 
> implementation.
> The documentation lives here: 
> https://cwiki.apache.org/confluence/display/KAFKA/Java+client+Consumer+timeouts



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16287) Implement example test for common rebalance callback scenarios

2024-02-20 Thread Kirk True (Jira)
Kirk True created KAFKA-16287:
-

 Summary: Implement example test for common rebalance callback 
scenarios
 Key: KAFKA-16287
 URL: https://issues.apache.org/jira/browse/KAFKA-16287
 Project: Kafka
  Issue Type: Test
  Components: clients, consumer
Reporter: Kirk True
Assignee: Lucas Brutschy
 Fix For: 3.8.0


There is justified concern that the new threading model may not play well with 
"tricky" {{ConsumerRebalanceListener}} callbacks. We need to provide some 
assurance that it will support complicated patterns.
 # Design and implement test scenarios
 # Update and document any design changes with the callback sub-system where 
needed
 # Provide fix(es) to the {{AsyncKafkaConsumer}} implementation to abide by 
said design



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16276) Update transactions_test.py to support KIP-848’s group protocol config

2024-02-17 Thread Kirk True (Jira)
Kirk True created KAFKA-16276:
-

 Summary: Update transactions_test.py to support KIP-848’s group 
protocol config
 Key: KAFKA-16276
 URL: https://issues.apache.org/jira/browse/KAFKA-16276
 Project: Kafka
  Issue Type: Test
  Components: clients, consumer, system tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


This task is to update the test method(s) in {{transactions_test.py}} to 
support the {{group.protocol}} configuration introduced in 
[KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol]
 by adding an optional {{group_protocol}} argument to the tests and matrixes.

The wrinkle here is that {{transactions_test.py}}  was not able to run as-is. 
That might deprioritize this until whatever is causing that is resolved.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16275) Update kraft_upgrade_test.py to support KIP-848’s group protocol config

2024-02-17 Thread Kirk True (Jira)
Kirk True created KAFKA-16275:
-

 Summary: Update kraft_upgrade_test.py to support KIP-848’s group 
protocol config
 Key: KAFKA-16275
 URL: https://issues.apache.org/jira/browse/KAFKA-16275
 Project: Kafka
  Issue Type: Test
  Components: clients, consumer, system tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


This task is to update the test method(s) in {{kraft_upgrade_test.py}} to 
support the {{group.protocol}} configuration introduced in 
[KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol]
 by adding an optional {{group_protocol}} argument to the tests and matrixes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16274) Update replica_scale_test.py to support KIP-848’s group protocol config

2024-02-17 Thread Kirk True (Jira)
Kirk True created KAFKA-16274:
-

 Summary: Update replica_scale_test.py to support KIP-848’s group 
protocol config
 Key: KAFKA-16274
 URL: https://issues.apache.org/jira/browse/KAFKA-16274
 Project: Kafka
  Issue Type: Test
  Components: clients, consumer, system tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


This task is to update the test method(s) in {{replica_scale_test.py}} to 
support the {{group.protocol}} configuration introduced in 
[KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol]
 by adding an optional {{group_protocol}} argument to the tests and matrixes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16272) Update connect_distributed_test.py to support KIP-848’s group protocol config

2024-02-17 Thread Kirk True (Jira)
Kirk True created KAFKA-16272:
-

 Summary: Update connect_distributed_test.py to support KIP-848’s 
group protocol config
 Key: KAFKA-16272
 URL: https://issues.apache.org/jira/browse/KAFKA-16272
 Project: Kafka
  Issue Type: Test
  Components: clients, consumer, system tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


This task is to update the test method(s) in {{connect_distributed_test.py}} to 
support the {{group.protocol}} configuration introduced in 
[KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol]
 by adding an optional {{group_protocol}} argument to the tests and matrixes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16273) Update consume_bench_test.py to support KIP-848’s group protocol config

2024-02-17 Thread Kirk True (Jira)
Kirk True created KAFKA-16273:
-

 Summary: Update consume_bench_test.py to support KIP-848’s group 
protocol config
 Key: KAFKA-16273
 URL: https://issues.apache.org/jira/browse/KAFKA-16273
 Project: Kafka
  Issue Type: Test
  Components: clients, consumer, system tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


This task is to update the test method(s) in {{consume_bench_test.py}} to 
support the {{group.protocol}} configuration introduced in 
[KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol]
 by adding an optional {{group_protocol}} argument to the tests and matrixes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16271) Update consumer_rolling_upgrade_test.py to support KIP-848’s group protocol config

2024-02-17 Thread Kirk True (Jira)
Kirk True created KAFKA-16271:
-

 Summary: Update consumer_rolling_upgrade_test.py to support 
KIP-848’s group protocol config
 Key: KAFKA-16271
 URL: https://issues.apache.org/jira/browse/KAFKA-16271
 Project: Kafka
  Issue Type: Test
  Components: clients, consumer, system tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


This task is to update the test method(s) in 
{{consumer_rolling_upgrade_test.py}} to support the {{group.protocol}} 
configuration introduced in 
[KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol]
 by adding an optional {{group_protocol}} argument to the tests and matrixes.

The tricky wrinkle here is that the existing test relies on client-side 
assignment strategies that aren't applicable with the new KIP-848-enabled 
consumer.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16270) Update snapshot_test.py to support KIP-848’s group protocol config

2024-02-17 Thread Kirk True (Jira)
Kirk True created KAFKA-16270:
-

 Summary: Update snapshot_test.py to support KIP-848’s group 
protocol config
 Key: KAFKA-16270
 URL: https://issues.apache.org/jira/browse/KAFKA-16270
 Project: Kafka
  Issue Type: Test
  Components: clients, consumer, system tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


This task is to update the test method(s) in {{snapshot_test.py}} to support 
the {{group.protocol}} configuration introduced in 
[KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol]
 by adding an optional {{group_protocol}} argument to the tests and matrixes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16269) Update reassign_partitions_test.py to support KIP-848’s group protocol config

2024-02-17 Thread Kirk True (Jira)
Kirk True created KAFKA-16269:
-

 Summary: Update reassign_partitions_test.py to support KIP-848’s 
group protocol config
 Key: KAFKA-16269
 URL: https://issues.apache.org/jira/browse/KAFKA-16269
 Project: Kafka
  Issue Type: Test
  Components: clients, consumer, system tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


This task is to update the test method(s) in {{reassign_partitions_test.py}} to 
support the {{group.protocol}} configuration introduced in 
[KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol]
 by adding an optional {{group_protocol}} argument to the tests and matrixes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16268) Update fetch_from_follower_test.py to support KIP-848’s group protocol config

2024-02-17 Thread Kirk True (Jira)
Kirk True created KAFKA-16268:
-

 Summary: Update fetch_from_follower_test.py to support KIP-848’s 
group protocol config
 Key: KAFKA-16268
 URL: https://issues.apache.org/jira/browse/KAFKA-16268
 Project: Kafka
  Issue Type: Test
  Components: clients, consumer, system tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


This task is to update the test method(s) in {{fetch_from_follower_test.py}} to 
support the {{group.protocol}} configuration introduced in 
[KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol]
 by adding an optional {{group_protocol}} argument to the tests and matrixes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16267) Update consumer_group_command_test.py to support KIP-848’s group protocol config

2024-02-17 Thread Kirk True (Jira)
Kirk True created KAFKA-16267:
-

 Summary: Update consumer_group_command_test.py to support 
KIP-848’s group protocol config
 Key: KAFKA-16267
 URL: https://issues.apache.org/jira/browse/KAFKA-16267
 Project: Kafka
  Issue Type: Test
  Components: clients, consumer, system tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


This task is to update the test method(s) in {{consumer_group_command_test.py}} 
to support the {{group.protocol}} configuration introduced in 
[KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol]
 by adding an optional {{group_protocol}} argument to the tests and matrixes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16256) Update ConsumerConfig to validate use of group.remote.assignor and partition.assignment.strategy based on group.protocol

2024-02-14 Thread Kirk True (Jira)
Kirk True created KAFKA-16256:
-

 Summary: Update ConsumerConfig to validate use of 
group.remote.assignor and partition.assignment.strategy based on group.protocol
 Key: KAFKA-16256
 URL: https://issues.apache.org/jira/browse/KAFKA-16256
 Project: Kafka
  Issue Type: Improvement
  Components: clients, consumer
Affects Versions: 3.7.0
Reporter: Kirk True
 Fix For: 3.8.0


{{ConsumerConfig}} supports both the {{group.remote.assignor}} and 
{{partition.assignment.strategy}} configuration options. These, however, should 
not be used together; the former is applicable only when the {{group.protocol}} 
is set to {{consumer}} and the latter when the {{group.protocol}} is set to 
{{{}classic{}}}. We should emit a warning if the user specifies the incorrect 
configuration based on the value of {{{}group.protocol{}}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16255) AsyncKafkaConsumer should not use partition.assignment.strategy

2024-02-14 Thread Kirk True (Jira)
Kirk True created KAFKA-16255:
-

 Summary: AsyncKafkaConsumer should not use 
partition.assignment.strategy
 Key: KAFKA-16255
 URL: https://issues.apache.org/jira/browse/KAFKA-16255
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


The partition.assignment.strategy configuration is used to specify a list of 
zero or more 
ConsumerPartitionAssignor instances. However, that interface is not applicable 
for the KIP-848-based protocol on top of which AsyncKafkaConsumer is built. 
Therefore, the use of ConsumerPartitionAssignor is in appropriate and should be 
removed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16231) Update consumer_test.py to support KIP-848’s group protocol config

2024-02-06 Thread Kirk True (Jira)
Kirk True created KAFKA-16231:
-

 Summary: Update consumer_test.py to support KIP-848’s group 
protocol config
 Key: KAFKA-16231
 URL: https://issues.apache.org/jira/browse/KAFKA-16231
 Project: Kafka
  Issue Type: Test
  Components: clients, consumer, system tests
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


This task is to update {{verifiable_consumer.py}} to support the 
{{group.protocol}} configuration introduced in 
[KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol]
 by adding an optional {{group_protocol}} argument. It will default to classic 
and we will take a separate task (Jira) to update the callers.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16230) Update verifiable_consumer to support KIP-848’s group protocol config

2024-02-06 Thread Kirk True (Jira)
Kirk True created KAFKA-16230:
-

 Summary: Update verifiable_consumer to support KIP-848’s group 
protocol config
 Key: KAFKA-16230
 URL: https://issues.apache.org/jira/browse/KAFKA-16230
 Project: Kafka
  Issue Type: Test
  Components: clients, consumer, system tests
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


This task is to update {{VerifiableConsumer}} to support the {{group.protocol}} 
configuration introduced in 
[KIP-848|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol]
 by adding an optional {{--group-protocol}} command line option.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15691) Add new system tests to use new consumer

2024-02-01 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True resolved KAFKA-15691.
---
Resolution: Duplicate

> Add new system tests to use new consumer
> 
>
> Key: KAFKA-15691
> URL: https://issues.apache.org/jira/browse/KAFKA-15691
> Project: Kafka
>  Issue Type: Test
>  Components: clients, consumer, system tests
>Reporter: Kirk True
>Assignee: Kirk True
>Priority: Major
>  Labels: kip-848-client-support, system-tests
> Fix For: 3.8.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16208) Design new Consumer timeout policy

2024-01-29 Thread Kirk True (Jira)
Kirk True created KAFKA-16208:
-

 Summary: Design new Consumer timeout policy
 Key: KAFKA-16208
 URL: https://issues.apache.org/jira/browse/KAFKA-16208
 Project: Kafka
  Issue Type: Task
  Components: clients, consumer, documentation
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


This task is to design and document the timeout policy for the new Consumer 
implementation.

The documentation lives here: 
https://cwiki.apache.org/confluence/display/KAFKA/Java+client+Consumer+timeouts



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16200) Ensure RequestManager handling of expired timeouts are consistent

2024-01-26 Thread Kirk True (Jira)
Kirk True created KAFKA-16200:
-

 Summary: Ensure RequestManager handling of expired timeouts are 
consistent
 Key: KAFKA-16200
 URL: https://issues.apache.org/jira/browse/KAFKA-16200
 Project: Kafka
  Issue Type: Sub-task
  Components: clients, consumer
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16199) Prune the event queue if events have expired before starting

2024-01-26 Thread Kirk True (Jira)
Kirk True created KAFKA-16199:
-

 Summary: Prune the event queue if events have expired before 
starting
 Key: KAFKA-16199
 URL: https://issues.apache.org/jira/browse/KAFKA-16199
 Project: Kafka
  Issue Type: Sub-task
  Components: clients, consumer
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16191) Clean up of consumer client internal events

2024-01-24 Thread Kirk True (Jira)
Kirk True created KAFKA-16191:
-

 Summary: Clean up of consumer client internal events
 Key: KAFKA-16191
 URL: https://issues.apache.org/jira/browse/KAFKA-16191
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0


There are a few minor issues with the event sub-classes in the 
org.apache.kafka.clients.consumer.internals package that should be cleaned up:
  # Update the names of subclasses to remove "Application" or "Background"
 # Make toString() final in the base classes and clean up the implementations 
of toStringBase()
 # Fix minor whitespace inconsistencies



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16167) Fix PlaintextConsumerTest.testAutoCommitOnCloseAfterWakeup

2024-01-18 Thread Kirk True (Jira)
Kirk True created KAFKA-16167:
-

 Summary: Fix PlaintextConsumerTest.testAutoCommitOnCloseAfterWakeup
 Key: KAFKA-16167
 URL: https://issues.apache.org/jira/browse/KAFKA-16167
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer, unit tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16151) Fix PlaintextConsumerTest.testPerPartitionLeadMetricsCleanUpWithSubscribe

2024-01-17 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True resolved KAFKA-16151.
---
Resolution: Duplicate

> Fix PlaintextConsumerTest.testPerPartitionLeadMetricsCleanUpWithSubscribe
> -
>
> Key: KAFKA-16151
> URL: https://issues.apache.org/jira/browse/KAFKA-16151
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer, unit tests
>Affects Versions: 3.7.0
>Reporter: Kirk True
>Assignee: Kirk True
>Priority: Major
>  Labels: consumer-threading-refactor, kip-848-client-support
> Fix For: 3.8.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16150) Fix PlaintextConsumerTest.testPerPartitionLagMetricsCleanUpWithSubscribe

2024-01-17 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True resolved KAFKA-16150.
---
Resolution: Duplicate

> Fix PlaintextConsumerTest.testPerPartitionLagMetricsCleanUpWithSubscribe
> 
>
> Key: KAFKA-16150
> URL: https://issues.apache.org/jira/browse/KAFKA-16150
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer, unit tests
>Affects Versions: 3.7.0
>Reporter: Kirk True
>Assignee: Kirk True
>Priority: Major
>  Labels: consumer-threading-refactor, kip-848-client-support
> Fix For: 3.8.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16152) Fix PlaintextConsumerTest.testStaticConsumerDetectsNewPartitionCreatedAfterRestart

2024-01-16 Thread Kirk True (Jira)
Kirk True created KAFKA-16152:
-

 Summary: Fix 
PlaintextConsumerTest.testStaticConsumerDetectsNewPartitionCreatedAfterRestart
 Key: KAFKA-16152
 URL: https://issues.apache.org/jira/browse/KAFKA-16152
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer, unit tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16151) Fix PlaintextConsumerTest.testPerPartitionLedMetricsCleanUpWithSubscribe

2024-01-16 Thread Kirk True (Jira)
Kirk True created KAFKA-16151:
-

 Summary: Fix 
PlaintextConsumerTest.testPerPartitionLedMetricsCleanUpWithSubscribe
 Key: KAFKA-16151
 URL: https://issues.apache.org/jira/browse/KAFKA-16151
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer, unit tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16150) Fix PlaintextConsumerTest.testPerPartitionLagMetricsCleanUpWithSubscribe

2024-01-16 Thread Kirk True (Jira)
Kirk True created KAFKA-16150:
-

 Summary: Fix 
PlaintextConsumerTest.testPerPartitionLagMetricsCleanUpWithSubscribe
 Key: KAFKA-16150
 URL: https://issues.apache.org/jira/browse/KAFKA-16150
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer, unit tests
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16149) Aggressively expire unused client connections

2024-01-16 Thread Kirk True (Jira)
Kirk True created KAFKA-16149:
-

 Summary: Aggressively expire unused client connections
 Key: KAFKA-16149
 URL: https://issues.apache.org/jira/browse/KAFKA-16149
 Project: Kafka
  Issue Type: Improvement
  Components: clients, consumer, producer 
Reporter: Kirk True
Assignee: Kirk True


The goal is to minimize the number of connections from the client to the 
brokers.

On the Java client, there are potentially two types of network connections to 
brokers:
 # Connections for metadata requests
 # Connections for fetch, produce, etc. requests

The idea is to apply a much shorter idle time to client connections that have 
_only_ served metadata (type 1 above) so that they become candidates for 
expiration more quickly.

Alternatively (or additionally), a change to the way metadata requests are 
routed could be made to reduce the number of connections.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16142) Update metrics documentation for errors and new metrics

2024-01-15 Thread Kirk True (Jira)
Kirk True created KAFKA-16142:
-

 Summary: Update metrics documentation for errors and new metrics
 Key: KAFKA-16142
 URL: https://issues.apache.org/jira/browse/KAFKA-16142
 Project: Kafka
  Issue Type: Task
  Components: clients, consumer
Affects Versions: 3.7.0
Reporter: Kirk True
 Fix For: 3.8.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16143) New metrics for KIP-848 protocol

2024-01-15 Thread Kirk True (Jira)
Kirk True created KAFKA-16143:
-

 Summary: New metrics for KIP-848 protocol
 Key: KAFKA-16143
 URL: https://issues.apache.org/jira/browse/KAFKA-16143
 Project: Kafka
  Issue Type: Task
  Components: clients, consumer
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Philip Nee
 Fix For: 3.8.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16112) Review JMX metrics in Async Consumer and determine the missing ones

2024-01-10 Thread Kirk True (Jira)
Kirk True created KAFKA-16112:
-

 Summary: Review JMX metrics in Async Consumer and determine the 
missing ones
 Key: KAFKA-16112
 URL: https://issues.apache.org/jira/browse/KAFKA-16112
 Project: Kafka
  Issue Type: Task
  Components: clients, consumer
Reporter: Kirk True
Assignee: Philip Nee
 Fix For: 3.8.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16111) Implement tests for tricky rebalance callbacks scenarios

2024-01-10 Thread Kirk True (Jira)
Kirk True created KAFKA-16111:
-

 Summary: Implement tests for tricky rebalance callbacks scenarios
 Key: KAFKA-16111
 URL: https://issues.apache.org/jira/browse/KAFKA-16111
 Project: Kafka
  Issue Type: Test
  Components: clients, consumer
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16110) Implement consumer performance tests

2024-01-10 Thread Kirk True (Jira)
Kirk True created KAFKA-16110:
-

 Summary: Implement consumer performance tests
 Key: KAFKA-16110
 URL: https://issues.apache.org/jira/browse/KAFKA-16110
 Project: Kafka
  Issue Type: New Feature
  Components: clients, consumer
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16109) Ensure system tests cover the "simple consumer + commit" use case

2024-01-10 Thread Kirk True (Jira)
Kirk True created KAFKA-16109:
-

 Summary: Ensure system tests cover the "simple consumer + commit" 
use case
 Key: KAFKA-16109
 URL: https://issues.apache.org/jira/browse/KAFKA-16109
 Project: Kafka
  Issue Type: Improvement
  Components: clients, consumer, system tests
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.8.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Reopened] (KAFKA-15250) DefaultBackgroundThread is running tight loop

2024-01-09 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True reopened KAFKA-15250:
---
  Assignee: Kirk True  (was: Philip Nee)

> DefaultBackgroundThread is running tight loop
> -
>
> Key: KAFKA-15250
> URL: https://issues.apache.org/jira/browse/KAFKA-15250
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer
>Reporter: Philip Nee
>Assignee: Kirk True
>Priority: Major
>  Labels: consumer-threading-refactor
>
> The DefaultBackgroundThread is running tight loops and wasting CPU cycles.  I 
> think we need to reexamine the timeout pass to networkclientDelegate.poll.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16037) Upgrade existing system tests to use new consumer

2023-12-20 Thread Kirk True (Jira)
Kirk True created KAFKA-16037:
-

 Summary: Upgrade existing system tests to use new consumer
 Key: KAFKA-16037
 URL: https://issues.apache.org/jira/browse/KAFKA-16037
 Project: Kafka
  Issue Type: Test
  Components: clients, consumer, system tests
Reporter: Kirk True
 Fix For: 4.0.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16029) Investigate cause of "Unable to find FetchSessionHandler for node X" in logs

2023-12-18 Thread Kirk True (Jira)
Kirk True created KAFKA-16029:
-

 Summary: Investigate cause of "Unable to find FetchSessionHandler 
for node X" in logs
 Key: KAFKA-16029
 URL: https://issues.apache.org/jira/browse/KAFKA-16029
 Project: Kafka
  Issue Type: Task
  Components: clients, consumer
Affects Versions: 3.7.0
Reporter: Kirk True
Assignee: Kirk True






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16011) Fix PlaintextConsumerTest.testMultiConsumerSessionTimeoutOnClose

2023-12-13 Thread Kirk True (Jira)
Kirk True created KAFKA-16011:
-

 Summary: Fix 
PlaintextConsumerTest.testMultiConsumerSessionTimeoutOnClose
 Key: KAFKA-16011
 URL: https://issues.apache.org/jira/browse/KAFKA-16011
 Project: Kafka
  Issue Type: Test
  Components: clients, consumer, unit tests
Affects Versions: 3.7.0
Reporter: Kirk True


The integration test 
{{PlaintextConsumerTest.testMultiConsumerSessionTimeoutOnStopPolling}} is 
failing when using the {{AsyncKafkaConsumer}}.

The error is:

{code}
org.opentest4j.AssertionFailedError: Did not get valid assignment for 
partitions [topic1-2, topic1-4, topic-1, topic-0, topic1-5, topic1-1, topic1-0, 
topic1-3] after one consumer left
at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:38)
at org.junit.jupiter.api.Assertions.fail(Assertions.java:134)
at 
kafka.api.AbstractConsumerTest.validateGroupAssignment(AbstractConsumerTest.scala:286)
at 
kafka.api.PlaintextConsumerTest.runMultiConsumerSessionTimeoutTest(PlaintextConsumerTest.scala:1883)
at 
kafka.api.PlaintextConsumerTest.testMultiConsumerSessionTimeoutOnStopPolling(PlaintextConsumerTest.scala:1281)
{code}

The logs include these lines:
 
{code}
[2023-12-13 15:26:40,736] WARN [Consumer clientId=ConsumerTestConsumer, 
groupId=my-test] consumer poll timeout has expired. This means the time between 
subsequent calls to poll() was longer than the configured max.poll.interval.ms, 
which typically implies that the poll loop is spending too much time processing 
messages. You can address this either by increasing max.poll.interval.ms or by 
reducing the maximum size of batches returned in poll() with max.poll.records. 
(org.apache.kafka.clients.consumer.internals.HeartbeatRequestManager:188)
[2023-12-13 15:26:40,736] WARN [Consumer clientId=ConsumerTestConsumer, 
groupId=my-test] consumer poll timeout has expired. This means the time between 
subsequent calls to poll() was longer than the configured max.poll.interval.ms, 
which typically implies that the poll loop is spending too much time processing 
messages. You can address this either by increasing max.poll.interval.ms or by 
reducing the maximum size of batches returned in poll() with max.poll.records. 
(org.apache.kafka.clients.consumer.internals.HeartbeatRequestManager:188)
[2023-12-13 15:26:40,736] WARN [Consumer clientId=ConsumerTestConsumer, 
groupId=my-test] consumer poll timeout has expired. This means the time between 
subsequent calls to poll() was longer than the configured max.poll.interval.ms, 
which typically implies that the poll loop is spending too much time processing 
messages. You can address this either by increasing max.poll.interval.ms or by 
reducing the maximum size of batches returned in poll() with max.poll.records. 
(org.apache.kafka.clients.consumer.internals.HeartbeatRequestManager:188)
{code} 

I don't know if that's related or not.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16010) Fix PlaintextConsumerTest.testMultiConsumerSessionTimeoutOnStopPolling

2023-12-13 Thread Kirk True (Jira)
Kirk True created KAFKA-16010:
-

 Summary: Fix 
PlaintextConsumerTest.testMultiConsumerSessionTimeoutOnStopPolling
 Key: KAFKA-16010
 URL: https://issues.apache.org/jira/browse/KAFKA-16010
 Project: Kafka
  Issue Type: Test
  Components: clients, consumer, unit tests
Affects Versions: 3.7.0
Reporter: Kirk True


The integration test 
{{PlaintextConsumerTest.testMaxPollIntervalMsDelayInRevocation}} is failing 
when using the {{AsyncKafkaConsumer}}.

The error is:

{code}
org.opentest4j.AssertionFailedError: Timed out before expected rebalance 
completed
at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:38)
at org.junit.jupiter.api.Assertions.fail(Assertions.java:134)
at 
kafka.api.AbstractConsumerTest.awaitRebalance(AbstractConsumerTest.scala:317)
at 
kafka.api.PlaintextConsumerTest.testMaxPollIntervalMsDelayInRevocation(PlaintextConsumerTest.scala:235)
{code}

The logs include this line:
 
{code}
[2023-12-13 15:20:27,824] WARN [Consumer clientId=ConsumerTestConsumer, 
groupId=my-test] consumer poll timeout has expired. This means the time between 
subsequent calls to poll() was longer than the configured max.poll.interval.ms, 
which typically implies that the poll loop is spending too much time processing 
messages. You can address this either by increasing max.poll.interval.ms or by 
reducing the maximum size of batches returned in poll() with max.poll.records. 
(org.apache.kafka.clients.consumer.internals.HeartbeatRequestManager:188)
{code} 

I don't know if that's related or not.

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16009) Fix PlaintextConsumerTest.testMaxPollIntervalMsDelayInRevocation

2023-12-13 Thread Kirk True (Jira)
Kirk True created KAFKA-16009:
-

 Summary: Fix 
PlaintextConsumerTest.testMaxPollIntervalMsDelayInRevocation
 Key: KAFKA-16009
 URL: https://issues.apache.org/jira/browse/KAFKA-16009
 Project: Kafka
  Issue Type: Test
  Components: clients, consumer, unit tests
Affects Versions: 3.7.0
Reporter: Kirk True


The integration test {{PlaintextConsumerTest.testMaxPollIntervalMs}} is failing 
when using the {{AsyncKafkaConsumer}}.

The error is:

{code}
org.opentest4j.AssertionFailedError: Timed out before expected rebalance 
completed
    at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:38)
at org.junit.jupiter.api.Assertions.fail(Assertions.java:134)
at 
kafka.api.AbstractConsumerTest.awaitRebalance(AbstractConsumerTest.scala:317)
at 
kafka.api.PlaintextConsumerTest.testMaxPollIntervalMs(PlaintextConsumerTest.scala:194)
{code}

The logs include this line:
 
{code}
[2023-12-13 15:11:16,134] WARN [Consumer clientId=ConsumerTestConsumer, 
groupId=my-test] consumer poll timeout has expired. This means the time between 
subsequent calls to poll() was longer than the configured max.poll.interval.ms, 
which typically implies that the poll loop is spending too much time processing 
messages. You can address this either by increasing max.poll.interval.ms or by 
reducing the maximum size of batches returned in poll() with max.poll.records. 
(org.apache.kafka.clients.consumer.internals.HeartbeatRequestManager:188)
{code} 

I don't know if that's related or not.

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16008) Fix PlaintextConsumerTest.testMaxPollIntervalMs

2023-12-13 Thread Kirk True (Jira)
Kirk True created KAFKA-16008:
-

 Summary: Fix PlaintextConsumerTest.testMaxPollIntervalMs
 Key: KAFKA-16008
 URL: https://issues.apache.org/jira/browse/KAFKA-16008
 Project: Kafka
  Issue Type: Test
  Components: clients, consumer, unit tests
Affects Versions: 3.7.0
Reporter: Kirk True


The integration test {{PlaintextConsumerTest.testMaxPollIntervalMs}} is failing 
with the {{{}AsyncKafkaConsumer{}}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15974) Enforce that CompletableApplicationEvent has a timeout

2023-12-05 Thread Kirk True (Jira)
Kirk True created KAFKA-15974:
-

 Summary: Enforce that CompletableApplicationEvent has a timeout
 Key: KAFKA-15974
 URL: https://issues.apache.org/jira/browse/KAFKA-15974
 Project: Kafka
  Issue Type: Improvement
  Components: clients, consumer
Reporter: Kirk True
Assignee: Kirk True


The intention of the CompletableApplicationEvent is for a Consumer to block 
waiting for the event to complete. The application thread will block for the 
timeout, but there is not yet a consistent manner in which events that have 
timed out are:

a) pruned from the event queue if they've expired before starting
b) canceled by the background thread if they've expired after starting



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15909) Remove support for empty "group.id"

2023-11-27 Thread Kirk True (Jira)
Kirk True created KAFKA-15909:
-

 Summary: Remove support for empty "group.id"
 Key: KAFKA-15909
 URL: https://issues.apache.org/jira/browse/KAFKA-15909
 Project: Kafka
  Issue Type: Sub-task
  Components: clients, consumer
Reporter: Kirk True
Assignee: Kirk True


Per KIP-266, the {{Consumer.poll(long timeout)}} method was deprecated back in 
2.0.0. 

In 3.7, there are two implementations, each with different behavior:

* The {{LegacyKafkaConsumer}} implementation will continue to work but will log 
a warning about its removal
* The {{AsyncKafkaConsumer}} implementation will throw an error.

In 4.0, the `poll` method that takes a single `long` timeout will be removed 
altogether.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15908) Remove deprecated Consumer API poll(long timeout)

2023-11-27 Thread Kirk True (Jira)
Kirk True created KAFKA-15908:
-

 Summary: Remove deprecated Consumer API poll(long timeout)
 Key: KAFKA-15908
 URL: https://issues.apache.org/jira/browse/KAFKA-15908
 Project: Kafka
  Issue Type: Sub-task
Reporter: Kirk True
Assignee: Kirk True


Per KIP-266, the {{Consumer.poll(long timeout)}} method was deprecated back in 
2.0.0. 

In 3.7, there are two implementations, each with different behavior:

* The {{LegacyKafkaConsumer}} implementation will continue to work but will log 
a warning about its removal
* The {{AsyncKafkaConsumer}} implementation will throw an error.

In 4.0, the `poll` method that takes a single `long` timeout will be removed 
altogether.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15907) Remove previously deprecated Consumer features from 4.0

2023-11-27 Thread Kirk True (Jira)
Kirk True created KAFKA-15907:
-

 Summary: Remove previously deprecated Consumer features from 4.0
 Key: KAFKA-15907
 URL: https://issues.apache.org/jira/browse/KAFKA-15907
 Project: Kafka
  Issue Type: Task
Reporter: Kirk True
Assignee: Kirk True


This Jira serves as the main collection of APIs, logic, etc. that were 
previously marked as "deprecated" by other KIPs. With 4.0, we will be updating 
the code to remove the deprecated features.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15848) Consumer API timeout inconsistent between LegacyKafkaConsumer and AsyncKafkaConsumer

2023-11-16 Thread Kirk True (Jira)
Kirk True created KAFKA-15848:
-

 Summary: Consumer API timeout inconsistent between 
LegacyKafkaConsumer and AsyncKafkaConsumer
 Key: KAFKA-15848
 URL: https://issues.apache.org/jira/browse/KAFKA-15848
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer
Reporter: Kirk True






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15837) Throw error on use of Consumer.poll(long timeout)

2023-11-15 Thread Kirk True (Jira)
Kirk True created KAFKA-15837:
-

 Summary: Throw error on use of Consumer.poll(long timeout)
 Key: KAFKA-15837
 URL: https://issues.apache.org/jira/browse/KAFKA-15837
 Project: Kafka
  Issue Type: Improvement
  Components: clients, consumer
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.7.0


Per [KIP-266|https://cwiki.apache.org/confluence/x/5kiHB], the 
Consumer.poll(long timeout) method was deprecated back in 2.0.0. The method 
will now throw a KafkaException.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15694) New integration tests to have full coverage for preview

2023-11-02 Thread Kirk True (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True resolved KAFKA-15694.
---
  Assignee: Kirk True
Resolution: Duplicate

> New integration tests to have full coverage for preview
> ---
>
> Key: KAFKA-15694
> URL: https://issues.apache.org/jira/browse/KAFKA-15694
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, consumer
>Reporter: Kirk True
>Assignee: Kirk True
>Priority: Minor
>  Labels: kip-848, kip-848-client-support, kip-848-preview
>
> These are to fix bugs discovered during PR reviews but not tests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15767) Redesign TransactionManager to avoid use of ThreadLocal

2023-10-31 Thread Kirk True (Jira)
Kirk True created KAFKA-15767:
-

 Summary: Redesign TransactionManager to avoid use of ThreadLocal
 Key: KAFKA-15767
 URL: https://issues.apache.org/jira/browse/KAFKA-15767
 Project: Kafka
  Issue Type: Improvement
  Components: clients, producer 
Affects Versions: 3.6.0
Reporter: Kirk True
Assignee: Kirk True
 Fix For: 3.7.0


A {{TransactionManager}} instance is created by the {{KafkaProducer}} and 
shared with the {{Sender}} thread. The {{TransactionManager}} has internal 
states through which it transitions as part of its initialization, transaction 
management, shutdown, etc. It contains logic to ensure that those state 
transitions are valid, such that when an invalid transition is attempted, it is 
handled appropriately. 

The issue is, the definition of "handled appropriately" depends on which thread 
is making the API call that is attempting an invalid transition. The 
application thread expects that the invalid transition will generate an 
exception. However, the sender thread expects that the invalid transition will 
"poison" the entire {{TransactionManager}} instance.

So as part of the implementation of KAFKA-14831, we needed a way to change 
logic in the {{TransactionManager}} on a per-thread basis, so a {{ThreadLocal}} 
instance variable was added to the {{TransactionManager}} to keep track of 
this. However, the use of ThreadLocal instance variables is generally 
discouraged because of their tendency for memory leaks, shared state across 
multiple threads, and not working with virtual threads.

The initial implementation attempt of KAFKA-14831 used a context object, passed 
in to each method, to affect the logic. However, the number of methods that 
needed to be changed to accept this new context object grew until most of the 
methods in {{TransactionManager}} needed to be updated. Thus all the affected 
call sites needed to be updated, resulting in a much larger change than 
anticipated.

The task here is to remove the use of the {{ThreadLocal}} instance variable, 
let the application thread and {{Sender}} thread keep their desired behavior, 
but keep the change to a minimum.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15696) Revoke partitions on Consumer.close()

2023-10-26 Thread Kirk True (Jira)
Kirk True created KAFKA-15696:
-

 Summary: Revoke partitions on Consumer.close()
 Key: KAFKA-15696
 URL: https://issues.apache.org/jira/browse/KAFKA-15696
 Project: Kafka
  Issue Type: Sub-task
  Components: clients, consumer
Reporter: Kirk True
Assignee: Philip Nee


Upon closing of the {{Consumer}} we need to:
 # Complete pending commits
 # Revoke assignment (Note that the revocation involves stop fetching, 
committing offsets if auto-commit enabled and invoking the onPartitionsRevoked 
callback)
 # Send the last GroupConsumerHeartbeatRequest with epoch = -1 to leave the 
group (or -2 if static member)
 # Close any fetch sessions on the brokers
 # Poll the NetworkClient to complete pending I/O

There is a mechanism introduced in PR 
[14406|https://github.com/apache/kafka/pull/14406] that allows for performing 
network I/O on shutdown. The new method 
{{DefaultBackgroundThread.runAtClose()}} will be executed when 
{{Consumer.close()}} is invoked.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15694) New integration tests to have full coverage for preview

2023-10-26 Thread Kirk True (Jira)
Kirk True created KAFKA-15694:
-

 Summary: New integration tests to have full coverage for preview
 Key: KAFKA-15694
 URL: https://issues.apache.org/jira/browse/KAFKA-15694
 Project: Kafka
  Issue Type: Sub-task
  Components: clients, consumer
Reporter: Kirk True


These are to fix bugs discovered during PR reviews but not tests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15692) New integration tests to have full coverage

2023-10-26 Thread Kirk True (Jira)
Kirk True created KAFKA-15692:
-

 Summary: New integration tests to have full coverage
 Key: KAFKA-15692
 URL: https://issues.apache.org/jira/browse/KAFKA-15692
 Project: Kafka
  Issue Type: Sub-task
  Components: clients, consumer
Reporter: Kirk True






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15691) Upgrade existing and add new system tests to use new coordinator

2023-10-26 Thread Kirk True (Jira)
Kirk True created KAFKA-15691:
-

 Summary: Upgrade existing and add new system tests to use new 
coordinator
 Key: KAFKA-15691
 URL: https://issues.apache.org/jira/browse/KAFKA-15691
 Project: Kafka
  Issue Type: Sub-task
  Components: clients, consumer
Reporter: Kirk True






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


  1   2   >