Kamal Chandraprakash created KAFKA-19599:
--------------------------------------------
Summary: Reduce the frequency of ReplicaNotAvailableException
thrown to clients when RLMM is not ready
Key: KAFKA-19599
URL: https://issues.apache.org/jira/browse/KAFKA-19599
Project: Kafka
Issue Type: Task
Reporter: Kamal Chandraprakash
Assignee: Kamal Chandraprakash
During broker restarts, the topic-based RemoteLogMetadataManager (RLMM) rebuilds
its state by reading the internal {{__remote_log_metadata}} topic. While a
partition is not yet ready to perform remote storage operations, a
ReplicaNotAvailableException is thrown back to the consumer, and the client
retries the request immediately.
This can result in a large number of FetchConsumer requests on the broker, which
tie up the request handler threads. Using a CountDownLatch, the frequency of
ReplicaNotAvailableException thrown back to the clients can be reduced. This
will improve the request handler thread usage on the broker.
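A minimal sketch of the latch idea, not the actual Kafka change: the class
PartitionReadinessGate, its methods, the maxWaitMs parameter and the
NotReadyException stand-in (used here in place of Kafka's
ReplicaNotAvailableException) are all hypothetical names chosen for
illustration. A caller waits for a bounded time for the partition to become
ready and fails only after the wait expires, so a client that retries
immediately is answered at most once per wait interval instead of in a tight
loop.
{code:java}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Kafka: gates remote reads for one partition.
final class PartitionReadinessGate {

    // Stand-in for ReplicaNotAvailableException so the sketch is self-contained.
    static final class NotReadyException extends RuntimeException {
        NotReadyException(String message) {
            super(message);
        }
    }

    private final CountDownLatch ready = new CountDownLatch(1);
    private final long maxWaitMs; // assumed upper bound on how long a caller may block

    PartitionReadinessGate(long maxWaitMs) {
        this.maxWaitMs = maxWaitMs;
    }

    // Called once the metadata for the partition has been loaded.
    void markReady() {
        ready.countDown();
    }

    // Waits up to maxWaitMs; returns true if the partition became ready in time.
    boolean awaitReady() {
        try {
            return ready.await(maxWaitMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }

    // Throws only after the bounded wait has expired without readiness.
    void ensureReady() {
        if (!awaitReady()) {
            throw new NotReadyException("partition is not ready for remote storage operations");
        }
    }
}
{code}
Once markReady() has been called, awaitReady() returns immediately, so the
latch adds no delay in the steady state. Which thread performs the wait, and
how long it may wait, are design decisions this sketch leaves open.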
Reproducer:
1. Set up a standalone one-node cluster with LocalTieredStorage.
2. Create a topic with remote storage enabled. RF = 1 and partitionCount = 2
3. Produce a few messages and ensure that the segments are uploaded to remote
storage.
4. Use console-consumer to read the produced messages from the beginning of the
topic.
5. Update
[RemotePartitionMetadataStore|https://sourcegraph.com/github.com/apache/kafka/-/blob/storage/src/main/java/org/apache/kafka/server/log/remote/metadata/storage/RemotePartitionMetadataStore.java?L166]
to mimic that the partition is not ready.
6. Replace the jar and restart the broker.
7. Start the console-consumer to read from the beginning of the topic.
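Result: the broker receives ~18K FetchConsumer requests per second from a single
consumer (the log count below, 1107088 matching lines within one minute, works
out to roughly 18,450 per second):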
{code:java}
% sh kafka-topics.sh --bootstrap-server localhost:9092 --topic apple \
    --replication-factor 1 --partitions 2 --create --config segment.bytes=1048576 \
    --config local.retention.ms=60000 --config remote.storage.enable=true
% sh kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic apple \
    --from-beginning --property print.key=false --property print.value=false
# broker logs
% less nohup.out | grep "Error occurred while reading the remote data for 4ChgxqKOTPakBikyo0Thjw" | grep -c "2025-08-12 21:18"
1107088
{code}