cwildman opened a new pull request, #15441:
URL: https://github.com/apache/kafka/pull/15441
…ient
## Description
Brokers can respond to metadata requests with uninitialized metadata when
they are starting up. The
[NetworkClient](https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/NetworkClient.java#L1187-L1191)
detects this scenario and ignores the empty metadata so that it can be retried
later. Unfortunately the KafkaAdminClient only detects empty metadata for
[listConsumerGroups](https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/admin/KafkaAdminClient.java#L3369-L3371).
As a result, other metadata requests fail with incorrect errors when they
handle an uninitialized metadata response. For example, `describeTopics`
will throw an UnknownTopicOrPartitionException for topics that do exist in the
cluster.
This PR changes the KafkaAdminClient to detect any MetadataResponse that
contains an empty broker set and throw a StaleMetadataException, which enables
the call to be automatically retried. For example, `describeTopics`,
`listTopics`, `describeCluster` and the client's own metadata fetches will now
be retried if the returned broker set is empty. The same applies to any calls
that rely on metadata through the AllBrokerStrategy or the
PartitionLeaderStrategy. `listConsumerGroups` was already retrying and will
continue to do so.
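To illustrate the detect-and-retry behaviour described above, here is a minimal, self-contained Java sketch. It deliberately does not use the real Kafka classes: `FakeMetadataResponse`, `StaleMetadataSketchException`, `requireBrokers` and `fetchWithRetry` are hypothetical names invented for this example, and the real KafkaAdminClient retry machinery (backoff, timeouts, node selection) is not modelled.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

final class EmptyMetadataRetrySketch {

    /** Hypothetical stand-in for a metadata response; not the real MetadataResponse. */
    static final class FakeMetadataResponse {
        final List<String> brokers;
        final List<String> topics;

        FakeMetadataResponse(List<String> brokers, List<String> topics) {
            this.brokers = brokers;
            this.topics = topics;
        }
    }

    /** Hypothetical retriable exception, playing the role StaleMetadataException plays in the PR. */
    static final class StaleMetadataSketchException extends RuntimeException {
        StaleMetadataSketchException(String message) {
            super(message);
        }
    }

    /** Treats an empty broker set as uninitialized metadata instead of a valid answer. */
    static FakeMetadataResponse requireBrokers(FakeMetadataResponse response) {
        if (response.brokers.isEmpty()) {
            throw new StaleMetadataSketchException("Metadata response contained no brokers");
        }
        return response;
    }

    /** Re-issues the fetch while the response looks uninitialized, up to maxAttempts times. */
    static FakeMetadataResponse fetchWithRetry(Supplier<FakeMetadataResponse> fetch, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        StaleMetadataSketchException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return requireBrokers(fetch.get());
            } catch (StaleMetadataSketchException e) {
                last = e; // retriable: try again on the next loop iteration
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        // Simulated broker: the first response is empty (still starting up), the second is populated.
        Iterator<FakeMetadataResponse> canned = List.of(
            new FakeMetadataResponse(List.of(), List.of()),
            new FakeMetadataResponse(List.of("broker-0"), List.of("my-topic"))
        ).iterator();
        int[] attempts = {0};
        FakeMetadataResponse result = fetchWithRetry(() -> {
            attempts[0]++;
            return canned.next();
        }, 3);
        System.out.println("brokers=" + result.brokers + " after " + attempts[0] + " attempts");
    }
}
```

The point of the sketch is that the empty response never reaches the topic lookup, so the caller sees either a populated response or a retriable failure, not a misleading "unknown topic" error.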
## Discussion
I think the better long-term solution here is to have the brokers respond
with a specific error when their metadata is uninitialized. This would be a
clearer signal to all clients than relying on the obscure empty-brokers
condition. That would be a larger change, so I'd like some feedback on whether
that's the direction we want to go.
I don't think StaleMetadataException is the perfect exception for the
uninitialized metadata scenario, but there was already precedent for it in
`listConsumerGroups`, so I went with that. I'm open to creating a new exception
type if people would prefer, or to just making its message clearer.
## Testing
I wrote a unit test that verifies `describeTopics` now retries when it
receives an empty metadata response. The same test fails with an
UnknownTopicOrPartitionException without my change. I also patched one of the
other test scenarios (describeProducers) to include brokers in its mock
metadata response, because that test would otherwise fail with this change.
### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)