[jira] [Commented] (KAFKA-15591) Trogdor produce workload reports errors in KRaft mode

Xi Yang (Jira) Thu, 12 Oct 2023 16:24:40 -0700


    [ 
https://issues.apache.org/jira/browse/KAFKA-15591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17774710#comment-17774710
 ]


Xi Yang commented on KAFKA-15591:
---------------------------------

Thanks for your reply [~rndgstn]. 

>If the broker is responding that it does not know about that partition then it 
>could be the case that it has not replicated and acted upon the records in the 
>metadata log that created the partition and identified it as the leader.

But in this case there is only one broker and the replicationFactor is 1, 
so shouldn't the broker already know about the partitions once the topic is created?

 

>The logs you pasted above show this happening between 2023-10-12 00:30:50,862 
>and 2023-10-12 00:30:50,876, which is 14 ms, which doesn't seem like a lot of 
>time.

I attached the full log. It takes about 1 second, with a few retries, for the 
NOT_LEADER_OR_FOLLOWER error to resolve. For example, on partition foo1-8:
{code:java}
[2023-10-12 22:59:31,766] WARN [Producer clientId=producer-1] Got error produce 
response with correlation id 6 on topic-partition foo1-8, retrying (2147483646 
attempts left). Error: NOT_LEADER_OR_FOLLOWER 
(org.apache.kafka.clients.producer.internals.Sender)
.....
a few retries until
.....
[2023-10-12 22:59:32,059] WARN [Producer clientId=producer-1] Got error produce 
response with correlation id 44 on topic-partition foo1-8, retrying (2147483644 
attempts left). Error: NOT_LEADER_OR_FOLLOWER 
(org.apache.kafka.clients.producer.internals.Sender)
[2023-10-12 22:59:32,059] WARN [Producer clientId=producer-1] Received invalid 
metadata error in produce request on partition foo1-8 due to 
org.apache.kafka.common.errors.NotLeaderOrFollowerException: For requests 
intended only for the leader, this error indicates that the broker is not the 
current leader. For requests intended for any replica, this error indicates 
that the broker is not a replica of the topic partition.. Going to request 
metadata update now (org.apache.kafka.clients.producer.internals.Sender){code}
Is there a way to reduce this latency? Adding a delay of a few seconds to each 
benchmark iteration could increase the total benchmarking time significantly, 
since users normally run 20 to 30 iterations and repeat this process 20 times.
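
One possible workaround on the benchmark side, sketched under assumptions: poll a readiness check at a short interval before starting the produce workload, instead of adding a fixed multi-second delay. The names `ReadyWait` and `awaitReady` and the `check` supplier are hypothetical. A real check might ask the Kafka Admin client whether every partition of the new topics reports a leader; that part is not shown here, and the stand-in check below only demonstrates the polling loop.

```java
import java.util.function.BooleanSupplier;

final class ReadyWait {
    /**
     * Polls check every pollMs milliseconds until it returns true or
     * timeoutMs has elapsed. Returns true if the check passed in time.
     * Hypothetical helper, not part of Kafka or Trogdor.
     */
    static boolean awaitReady(BooleanSupplier check, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (check.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out without the check passing
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in check that passes on the third call (assumption for the demo).
        int[] calls = {0};
        boolean ready = awaitReady(() -> ++calls[0] >= 3, 1_000, 10);
        System.out.println("ready=" + ready + " after " + calls[0] + " checks");
    }
}
```

With a poll interval in the tens of milliseconds, the added wait per iteration would be bounded by how long the broker actually needs rather than by a worst-case fixed delay.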

 

The log also shows that, after the NOT_LEADER_OR_FOLLOWER error, the Trogdor 
client reports an OUT_OF_ORDER_SEQUENCE_NUMBER error that lasts about 60 
seconds. Any idea how to address this error?

 
{code:java}
[2023-10-12 22:59:33,055] WARN [Producer clientId=producer-1] Got error produce 
response with correlation id 90 on topic-partition foo3-2, retrying (2147483642 
attempts left). Error: OUT_OF_ORDER_SEQUENCE_NUMBER 
(org.apache.kafka.clients.producer.internals.Sender)
.....
repeat until
.....
[2023-10-12 23:01:31,755] WARN [Producer clientId=producer-1] Got error produce 
response with correlation id 227 on topic-partition foo3-2, retrying 
(2147483523 attempts left). Error: OUT_OF_ORDER_SEQUENCE_NUMBER 
(org.apache.kafka.clients.producer.internals.Sender)
 {code}
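For reference, the gap between two log timestamps can be computed as in the small sketch below. `LogGap` and `millisBetween` are hypothetical helper names, and the timestamps are copied from the excerpts in this comment, which may not bracket the full error window in the attached log.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

final class LogGap {
    // Timestamp layout of the client log lines, e.g. "2023-10-12 22:59:31,766".
    private static final DateTimeFormatter LOG_TS =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss,SSS");

    /** Milliseconds elapsed from the first log timestamp to the last. */
    static long millisBetween(String first, String last) {
        LocalDateTime start = LocalDateTime.parse(first, LOG_TS);
        LocalDateTime end = LocalDateTime.parse(last, LOG_TS);
        return Duration.between(start, end).toMillis();
    }

    public static void main(String[] args) {
        // foo1-8 NOT_LEADER_OR_FOLLOWER excerpt: prints 293
        System.out.println(millisBetween("2023-10-12 22:59:31,766", "2023-10-12 22:59:32,059"));
        // foo3-2 OUT_OF_ORDER_SEQUENCE_NUMBER excerpt: prints 118700
        System.out.println(millisBetween("2023-10-12 22:59:33,055", "2023-10-12 23:01:31,755"));
    }
}
```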
cc: [~cmccabe] have you seen similar errors when running the 
simple_produce_bench.json Trogdor task?

 

 
[^trogdor-kafka-kraft.txt]
 

 

> Trogdor produce workload reports errors in KRaft mode
> -----------------------------------------------------
>
>                 Key: KAFKA-15591
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15591
>             Project: Kafka
>          Issue Type: Bug
>         Environment: Linux
>            Reporter: Xi Yang
>            Priority: Blocker
>         Attachments: trogdor-kafka-kraft.txt
>
>
> The Kafka benchmark in the DaCapo Benchmark Suite uses Trogdor's exec 
> mode (https://github.com/dacapobench/dacapobench/pull/224) to test the 
> Kafka broker.
>  
> I am trying to update the benchmark to use the KRaft protocol. We use a single 
> Kafka instance that acts as both controller and broker, following the guide in 
> the Kafka README.md 
> (https://github.com/apache/kafka#running-a-kafka-broker-in-kraft-mode).
>  
> However, the Trogdor produce workload 
> (tests/spec/simple_produce_bench.json) reports the NOT_LEADER_OR_FOLLOWER 
> error. The errors go away after many retries. Is this because, in the KRaft 
> protocol, Kafka does not elect leaders immediately after a new topic is 
> created, but instead does so on demand after receiving the first message 
> on the topic? If this is the root cause, is there a way to ask Kafka to elect 
> the leader right after creating the topic?
> {code:java}
> ./bin/trogdor.sh agent -n node0 -c ./config/trogdor.conf --exec 
> ./tests/spec/simple_produce_bench.json
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/home/xyang/code/kafka/tools/build/dependant-libs-2.13.12/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/home/xyang/code/kafka/trogdor/build/dependant-libs-2.13.12/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Reload4jLoggerFactory]
> Oct 12, 2023 12:30:50 AM org.glassfish.jersey.server.wadl.WadlFeature 
> configure
> WARNING: JAXBContext implementation could not be found. WADL feature is 
> disabled.
> Oct 12, 2023 12:30:50 AM org.glassfish.jersey.internal.inject.Providers 
> checkProviderRuntime
> WARNING: A provider org.apache.kafka.trogdor.agent.AgentRestResource 
> registered in SERVER runtime does not implement any provider interfaces 
> applicable in the SERVER runtime. Due to constraint configuration problems 
> the provider org.apache.kafka.trogdor.agent.AgentRestResource will be ignored.
> Waiting for completion of task:{
>   "class" : "org.apache.kafka.trogdor.workload.ProduceBenchSpec",
>   "startMs" : 1697070650540,
>   "durationMs" : 10000000,
>   "producerNode" : "node0",
>   "bootstrapServers" : "localhost:9092",
>   "targetMessagesPerSec" : 10000,
>   "maxMessages" : 50000,
>   "keyGenerator" : {
>     "type" : "sequential",
>     "size" : 4,
>     "startOffset" : 0
>   },
>   "valueGenerator" : {
>     "type" : "constant",
>     "size" : 512,
>     "value" : 
> "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA="
>   },
>   "activeTopics" : {
>     "foo[1-3]" : {
>       "numPartitions" : 10,
>       "replicationFactor" : 1
>     }
>   },
>   "inactiveTopics" : {
>     "foo[4-5]" : {
>       "numPartitions" : 10,
>       "replicationFactor" : 1
>     }
>   },
>   "useConfiguredPartitioner" : false,
>   "skipFlush" : false
> }
> [2023-10-12 00:30:50,862] WARN [Producer clientId=producer-1] Got error 
> produce response with correlation id 6 on topic-partition foo1-8, retrying 
> (2147483646 attempts left). Error: NOT_LEADER_OR_FOLLOWER 
> (org.apache.kafka.clients.producer.internals.Sender)
> [2023-10-12 00:30:50,862] WARN [Producer clientId=producer-1] Received 
> invalid metadata error in produce request on partition foo1-8 due to 
> org.apache.kafka.common.errors.NotLeaderOrFollowerException: For requests 
> intended only for the leader, this error indicates that the broker is not the 
> current leader. For requests intended for any replica, this error indicates 
> that the broker is not a replica of the topic partition.. Going to request 
> metadata update now (org.apache.kafka.clients.producer.internals.Sender)
> [2023-10-12 00:30:50,870] WARN [Producer clientId=producer-1] Got error 
> produce response with correlation id 8 on topic-partition foo2-8, retrying 
> (2147483646 attempts left). Error: NOT_LEADER_OR_FOLLOWER 
> (org.apache.kafka.clients.producer.internals.Sender)
> [2023-10-12 00:30:50,870] WARN [Producer clientId=producer-1] Received 
> invalid metadata error in produce request on partition foo2-8 due to 
> org.apache.kafka.common.errors.NotLeaderOrFollowerException: For requests 
> intended only for the leader, this error indicates that the broker is not the 
> current leader. For requests intended for any replica, this error indicates 
> that the broker is not a replica of the topic partition.. Going to request 
> metadata update now (org.apache.kafka.clients.producer.internals.Sender)
> [2023-10-12 00:30:50,875] WARN [Producer clientId=producer-1] Got error 
> produce response with correlation id 9 on topic-partition foo3-8, retrying 
> (2147483646 attempts left). Error: NOT_LEADER_OR_FOLLOWER 
> (org.apache.kafka.clients.producer.internals.Sender)
> [2023-10-12 00:30:50,875] WARN [Producer clientId=producer-1] Received 
> invalid metadata error in produce request on partition foo3-8 due to 
> org.apache.kafka.common.errors.NotLeaderOrFollowerException: For requests 
> intended only for the leader, this error indicates that the broker is not the 
> current leader. For requests intended for any replica, this error indicates 
> that the broker is not a replica of the topic partition.. Going to request 
> metadata update now (org.apache.kafka.clients.producer.internals.Sender)
> [2023-10-12 00:30:50,876] WARN [Producer clientId=producer-1] Got error 
> produce response with correlation id 9 on topic-partition foo2-4, retrying 
> (2147483646 attempts left). Error: NOT_LEADER_OR_FOLLOWER 
> (org.apache.kafka.clients.producer.internals.Sender)
> [2023-10-12 00:30:50,876] WARN [Producer clientId=producer-1] Received 
> invalid metadata error in produce request on partition foo2-4 due to 
> org.apache.kafka.common.errors.NotLeaderOrFollowerException: For requests 
> intended only for the leader, this error indicates that the broker is not the 
> current leader. For requests intended for any replica, this error indicates 
> that the broker is not a replica of the topic partition.. Going to request 
> metadata update now (org.apache.kafka.clients.producer.internals.Sender) 
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
