[
https://issues.apache.org/jira/browse/KAFKA-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363678#comment-14363678
]
Jun Rao commented on KAFKA-2020:
--------------------------------
The protocol for TopicMetadataResponse is shown at the end of this comment.
Currently, we do the following:
1. If leader is not available, we set the partition level error code to
LeaderNotAvailable.
2. If a non-leader replica is not available, we take that replica out of the
assigned replica list and isr in the response. To indicate that we did so, we
set the partition level error code to ReplicaNotAvailable.
This has a few problems. First, ReplicaNotAvailable probably shouldn't be an
error, at least for the normal producer/consumer clients that just want to find
out the leader. Second, it can happen that both the leader and another replica
are not available at the same time. There is no error code to indicate both.
Third, even if a replica is not available, it's still useful to return its
replica id since some clients (e.g. admin tool) may still make use of it.
One way to address this issue is to always return the replica id for leader,
assigned replicas, and isr regardless of whether the corresponding broker is
live or not. Since we also return the list of live brokers, the client can
figure out whether a leader or a replica is live or not and act accordingly.
This way, we don't need to set the partition level error code when the leader
or a replica is not available. This doesn't change the wire protocol, but does
change the semantics. So, a new version of the protocol is needed. Since we are
debating evolving TopicMetadataRequest in KIP-4, we can potentially piggyback
on that.
{code}
MetadataResponse => [Broker][TopicMetadata]
Broker => NodeId Host Port (any number of brokers may be returned)
NodeId => int32
Host => string
Port => int32
TopicMetadata => TopicErrorCode TopicName [PartitionMetadata]
TopicErrorCode => int16
PartitionMetadata => PartitionErrorCode PartitionId Leader Replicas Isr
PartitionErrorCode => int16
PartitionId => int32
Leader => int32
Replicas => [int32]
Isr => [int32]
{code}
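To illustrate the proposal, here is a rough sketch of how a client could work
out liveness itself once the response always carries the leader, replica, and
isr ids. This is not Kafka client code: PartitionView and LivenessCheck are
hypothetical names, and the fields simply mirror PartitionMetadata in the
layout above. The set of live broker ids is assumed to be collected from the
NodeId of each Broker entry in the same response.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical client-side view of one PartitionMetadata entry (not a Kafka API).
final class PartitionView {
    final int partitionId;
    final int leader;             // broker id of the leader, returned even if it is down
    final List<Integer> replicas; // all assigned replica ids, live or not
    final List<Integer> isr;      // in-sync replica ids

    PartitionView(int partitionId, int leader, List<Integer> replicas, List<Integer> isr) {
        this.partitionId = partitionId;
        this.leader = leader;
        this.replicas = replicas;
        this.isr = isr;
    }
}

// Hypothetical helper: derives liveness from the broker list instead of an error code.
final class LivenessCheck {
    // True when the leader's broker id appears among the live brokers in the response.
    static boolean leaderIsLive(PartitionView p, Set<Integer> liveBrokerIds) {
        return liveBrokerIds.contains(p.leader);
    }

    // Assigned replicas whose broker id is missing from the live broker list,
    // in the order they appear in the replica list.
    static List<Integer> offlineReplicas(PartitionView p, Set<Integer> liveBrokerIds) {
        List<Integer> offline = new ArrayList<>();
        for (int id : p.replicas) {
            if (!liveBrokerIds.contains(id)) {
                offline.add(id);
            }
        }
        return offline;
    }
}
```

With live brokers {1} and a partition whose leader is 2 and whose replicas are
[1, 2, 3], leaderIsLive returns false and offlineReplicas returns [2, 3]. Both
conditions are visible at once, which a single partition level error code
cannot express.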
> I expect ReplicaNotAvailableException to have proper Javadocs
> -------------------------------------------------------------
>
> Key: KAFKA-2020
> URL: https://issues.apache.org/jira/browse/KAFKA-2020
> Project: Kafka
> Issue Type: Bug
> Components: consumer
> Reporter: Chris Riccomini
> Assignee: Neha Narkhede
>
> It looks like ReplicaNotAvailableException was copy and pasted from
> LeaderNotAvailable exception. The Javadocs were never changed. This means
> that users think that ReplicaNotAvailableException signifies leaders are not
> available. This is very different from, "I can ignore this exception," which
> is what the Kafka protocol docs say to do with ReplicaNotAvailableException.
> Related: what's the point of ReplicaNotAvailableException if it's supposed to
> be ignored?