Rajini Sivaram created KAFKA-9171:
-------------------------------------
Summary: DelayedFetch completion may throw exception, causing
successful produce to be failed
Key: KAFKA-9171
URL: https://issues.apache.org/jira/browse/KAFKA-9171
Project: Kafka
Issue Type: Bug
Components: core
Affects Versions: 2.4.0
Reporter: Rajini Sivaram
Assignee: Rajini Sivaram
Fix For: 2.4.0
I was looking at the logs of the system test failure of ReassignPartitionsTest.
The producer log shows ReplicaNotAvailableException produce errors for two
records, but the data logs of all the brokers contain those records. The
offsets of these records were then returned as successful produce for two
subsequent records, which don't appear in the logs, and hence the test failed.
Broker logs of the leader at the time of the reassignment and leader change
show:
{{[2019-11-11 07:23:17,727] ERROR [ReplicaManager broker=3] Error processing
append operation on partition test_topic-17 (kafka.server.ReplicaManager)
org.apache.kafka.common.errors.ReplicaNotAvailableException: Partition
test_topic-5 is not available}}
This is failing the append operation on `test_topic-17` because a different
partition, `test_topic-5`, was unavailable for fetch. I think the exception
came from a fetch, since a produce would have thrown
NotLeaderForPartitionException rather than ReplicaNotAvailableException.
We don't expect DelayedFetch to throw exceptions and it looks like we are not
handling `ReplicaNotAvailableException`.
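
To make the suspected failure mode concrete, here is a minimal, self-contained Java sketch. It is not Kafka code: `Broker`, `DelayedOp`, `ReplicaUnavailable`, `guardCompletion` and `brokerWithFailingFetch` are hypothetical names invented for illustration. It assumes, as the log above suggests, that completion of waiting fetches is attempted on the same code path as the append, so that an exception thrown while completing a fetch for one partition surfaces as an error on the append to another partition. The guarded variant shows one possible way to isolate the failure; it is not necessarily how this will be fixed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the suspected interaction; not Kafka's real classes.
final class DelayedFetchSketch {

    /** Stand-in for a delayed operation waiting to be completed. */
    interface DelayedOp {
        boolean tryComplete();
    }

    /** Stand-in for ReplicaNotAvailableException. */
    static final class ReplicaUnavailable extends RuntimeException {
        ReplicaUnavailable(String message) { super(message); }
    }

    static final class Broker {
        final List<String> log = new ArrayList<>();
        final List<DelayedOp> pendingFetches = new ArrayList<>();
        final boolean guardCompletion;

        Broker(boolean guardCompletion) { this.guardCompletion = guardCompletion; }

        /** Returns "OK", or the error that would be reported to the producer. */
        String append(String partition, String record) {
            try {
                log.add(partition + ":" + record); // the write itself succeeds
                completeDelayedFetches();          // then waiting fetches are re-checked
                return "OK";
            } catch (RuntimeException e) {
                // The record is already in the log, but the producer sees a failure.
                return "ERROR on " + partition + ": " + e.getMessage();
            }
        }

        private void completeDelayedFetches() {
            for (DelayedOp op : pendingFetches) {
                if (guardCompletion) {
                    try {
                        op.tryComplete();
                    } catch (RuntimeException e) {
                        // Assumed mitigation: a failing fetch must not fail the append.
                    }
                } else {
                    op.tryComplete(); // an exception here escapes into append()
                }
            }
        }
    }

    /** Builds a broker with one pending fetch whose completion always throws. */
    static Broker brokerWithFailingFetch(boolean guardCompletion) {
        Broker broker = new Broker(guardCompletion);
        broker.pendingFetches.add(() -> {
            throw new ReplicaUnavailable("Partition test_topic-5 is not available");
        });
        return broker;
    }

    public static void main(String[] args) {
        Broker unguarded = brokerWithFailingFetch(false);
        System.out.println(unguarded.append("test_topic-17", "record-1"));
        System.out.println("records in log: " + unguarded.log.size());

        Broker guarded = brokerWithFailingFetch(true);
        System.out.println(guarded.append("test_topic-17", "record-1"));
        System.out.println("records in log: " + guarded.log.size());
    }
}
```

In the unguarded case the append to `test_topic-17` reports an error naming `test_topic-5` even though the record was written, which matches the mismatch between the producer log and the data logs described above. In the guarded case the same append reports success.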
I am not sure if this fixes the issues with ReassignPartitionsTest, but this
seems to be a scenario that we should fix.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)