[ 
https://issues.apache.org/jira/browse/KAFKA-16389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17833704#comment-17833704
 ] 

Philip Nee commented on KAFKA-16389:
------------------------------------

Hi [~kirktrue] Thanks for the initial investigation.  I think your approach 
makes sense but I do think we need to rewrite the verifiable_consumer.py's 
event handler.  As the states transition doesn't necessary match the behavior 
of the current consumer.  And I think that's why there's still some flakiness 
in the patch you submitted.  See my notes below:

 

I'm still occasionally getting errors like: "ducktape.errors.TimeoutError: 
expected valid assignments of 6 partitions when num_started 1: 
[('ducker@ducker11', [])]"

 

This seems to be caused by some weird reconciliation state.  For example: Here 
We can see consumer1 got assigned 6 partitions and then immediately gave up all 
of them.  It is unclear why onPartitionsRevoke is triggered. 

 
{code:java}
  1 node  <ducktape.cluster.cluster.ClusterNode object at 0xffff9fe617c0>
wait for member
idx  1 partiton assigned [{'topic': 'test_topic', 'partition': 0}, {'topic': 
'test_topic', 'partition': 1}, {'topic': 'test_topic', 'partition': 2}, 
{'topic': 'test_topic', 'partition': 3}, {'topic': 'test_topic', 'partition': 
4}, {'topic': 'test_topic', 'partition': 5}]
idx  1 partiton revoked [{'topic': 'test_topic', 'partition': 0}, {'topic': 
'test_topic', 'partition': 1}, {'topic': 'test_topic', 'partition': 2}, 
{'topic': 'test_topic', 'partition': 3}, {'topic': 'test_topic', 'partition': 
4}, {'topic': 'test_topic', 'partition': 5}]
node: ducker11
Current assignment: {<ducktape.cluster.cluster.ClusterNode object at 
0xffff9fe617c0>: []}
idx  1 partiton assigned []
[WARNING - 2024-04-03 11:05:34,587 - service_registry - stop_all - lineno:53]: 
Error stopping service <VerifiableConsumer-0-281473364398912: num_nodes: 3, 
nodes: ['ducker11', 'ducker12', 'ducker13']>: 
<ducktape.cluster.cluster.ClusterNode object at 0xffff9fe61640>
[WARNING - 2024-04-03 11:06:09,128 - service_registry - clean_all - lineno:67]: 
Error cleaning service <VerifiableConsumer-0-281473364398912: num_nodes: 3, 
nodes: ['ducker11', 'ducker12', 'ducker13']>: 
<ducktape.cluster.cluster.ClusterNode object at 0xffff9fe61640>
[INFO:2024-04-03 11:06:09,134]: RunnerClient: 
kafkatest.tests.client.consumer_test.AssignmentValidationTest.test_valid_assignment.metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=True.group_protocol=consumer.group_remote_assignor=range:
 FAIL: TimeoutError("expected valid assignments of 6 partitions when 
num_started 1: [('ducker@ducker11', [])]")
Traceback (most recent call last):
  File 
"/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", line 
186, in _do_run
    data = self.run_test()
  File 
"/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", line 
246, in run_test
    return self.test_context.function(self.test)
  File "/usr/local/lib/python3.9/dist-packages/ducktape/mark/_mark.py", line 
433, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/opt/kafka-dev/tests/kafkatest/tests/client/consumer_test.py", line 
583, in test_valid_assignment
    wait_until(lambda: self.valid_assignment(self.TOPIC, self.NUM_PARTITIONS, 
consumer.current_assignment()),
  File "/usr/local/lib/python3.9/dist-packages/ducktape/utils/util.py", line 
58, in wait_until
    raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from 
last_exception
ducktape.errors.TimeoutError: expected valid assignments of 6 partitions when 
num_started 1: [('ducker@ducker11', [])] {code}
 

> consumer_test.py’s test_valid_assignment fails with new consumer
> ----------------------------------------------------------------
>
>                 Key: KAFKA-16389
>                 URL: https://issues.apache.org/jira/browse/KAFKA-16389
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, consumer, system tests
>    Affects Versions: 3.7.0
>            Reporter: Kirk True
>            Assignee: Philip Nee
>            Priority: Blocker
>              Labels: kip-848-client-support, system-tests
>             Fix For: 3.8.0
>
>         Attachments: KAFKA-16389.patch
>
>
> The following error is reported when running the {{test_valid_assignment}} 
> test from {{consumer_test.py}}:
>  {code}
> Traceback (most recent call last):
>   File 
> "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", 
> line 186, in _do_run
>     data = self.run_test()
>   File 
> "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", 
> line 246, in run_test
>     return self.test_context.function(self.test)
>   File "/usr/local/lib/python3.9/dist-packages/ducktape/mark/_mark.py", line 
> 433, in wrapper
>     return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File "/opt/kafka-dev/tests/kafkatest/tests/client/consumer_test.py", line 
> 584, in test_valid_assignment
>     wait_until(lambda: self.valid_assignment(self.TOPIC, self.NUM_PARTITIONS, 
> consumer.current_assignment()),
>   File "/usr/local/lib/python3.9/dist-packages/ducktape/utils/util.py", line 
> 58, in wait_until
>     raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from 
> last_exception
> ducktape.errors.TimeoutError: expected valid assignments of 6 partitions when 
> num_started 2: [('ducker@ducker05', []), ('ducker@ducker06', [])]
> {code}
> To reproduce, create a system test suite file named 
> {{test_valid_assignment.yml}} with these contents:
> {code:yaml}
> failures:
>   - 
> 'kafkatest/tests/client/consumer_test.py::AssignmentValidationTest.test_valid_assignment@{"metadata_quorum":"ISOLATED_KRAFT","use_new_coordinator":true,"group_protocol":"consumer","group_remote_assignor":"range"}'
> {code}
> Then set the the {{TC_PATHS}} environment variable to include that test suite 
> file.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to