[
https://issues.apache.org/jira/browse/KAFKA-2756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14992681#comment-14992681
]
Guozhang Wang commented on KAFKA-2756:
--------------------------------------
Thanks for the patch, [[email protected]]. LGTM.
Some background on this issue: the versioning protocol is designed to keep
client development simple in any programming language (e.g. third-party
non-Java clients), so that developers can choose to support only one version
in their clients. Therefore the response does not include the version id in
its schema, and as a result the recommended upgrade path is servers first,
clients later, so that the servers always understand the request versions
sent by the clients.
Now for inter-server communication during upgrades, which shares the same
{code} NetworkClient {code} as the client, we need all the servers to stick to
a common, older version of the protocol while doing the rolling bounce to
upgrade them, as [~granthenke] specified in
https://kafka.apache.org/090/documentation.html#upgrade
In either case, the network client should always expect the response version
id to be the same as its request version id, and hence use the version id of
its request to decode the response. However, if the clients are upgraded
before the servers and use a version that the old servers do not recognize,
the servers have no way to cleanly notify the clients, hence the problem in
KAFKA-2750. This can probably be fixed in the future, after KIP-35.
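
To make the decoding rule above concrete, here is a minimal sketch, not the actual NetworkClient code. It assumes a made-up response layout (v0 has a single int field, v1 appends a long) and a hypothetical client class that remembers the version of each in-flight request by correlation id. All names in it are illustrative and none come from the Kafka code base.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the response carries no version id, so the client
// must decode it with the version it used for the matching request.
class VersionedDecodeSketch {
    // Correlation id -> version of the request still awaiting a response.
    private final Map<Integer, Short> inFlight = new HashMap<>();
    private int nextCorrelationId = 0;

    // Made-up layouts: v0 = [int], v1 = [int, long].
    static long[] decode(short version, ByteBuffer body) {
        ByteBuffer buf = body.duplicate(); // leave the caller's position untouched
        if (version == 0) {
            return new long[] { buf.getInt() };
        }
        return new long[] { buf.getInt(), buf.getLong() };
    }

    // Record the version we are about to send and return its correlation id.
    int send(short requestVersion) {
        int correlationId = nextCorrelationId++;
        inFlight.put(correlationId, requestVersion);
        return correlationId;
    }

    // Response = [correlation id][body]; the version comes from our own bookkeeping.
    long[] receive(ByteBuffer response) {
        ByteBuffer buf = response.duplicate();
        int correlationId = buf.getInt();
        Short version = inFlight.remove(correlationId);
        if (version == null) {
            throw new IllegalStateException("No in-flight request for id " + correlationId);
        }
        return decode(version, buf);
    }

    public static void main(String[] args) {
        VersionedDecodeSketch client = new VersionedDecodeSketch();
        int id = client.send((short) 0); // pinned to the older version

        // An old server answers with a v0 body.
        ByteBuffer response = ByteBuffer.allocate(8);
        response.putInt(id).putInt(42);
        response.flip();
        System.out.println(Arrays.toString(client.receive(response)));

        // Decoding the same v0 body with the newer layout runs off the end.
        ByteBuffer v0Body = ByteBuffer.allocate(4);
        try {
            decode((short) 1, v0Body);
        } catch (BufferUnderflowException e) {
            System.out.println("v1 layout cannot read a v0 body: " + e);
        }
    }
}
```

The second half of main mimics the failure mode in this ticket: reading an older response with the latest layout fails part way through the buffer, which is roughly what the BufferUnderflowException in the quoted logs suggests.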
> Replication Broken between Kafka 0.8.2.1 and 0.9 - NetworkClient.java uses
> wrong protocol version
> -------------------------------------------------------------------------------------------------
>
> Key: KAFKA-2756
> URL: https://issues.apache.org/jira/browse/KAFKA-2756
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 0.9.0.0
> Reporter: Matthew Bruce
> Assignee: Matthew Bruce
> Fix For: 0.9.0.0
>
> Attachments: KAFKA-2756.patch
>
>
> During a rolling upgrade from 0.8.2.1 to 0.9.0.0, replication between 0.9.0.0
> and 0.8.2.1 fails due to
> org.apache.kafka.clients.NetworkClient.handleCompletedReceives always using
> the latest available version of the API key instead of the one specified by
> inter.broker.protocol.version.
> This line should not use ProtoUtils.currentResponseSchema and should instead
> call ProtoUtils.responseSchema and specify a version explicitly:
> {code}
> Struct body = (Struct) ProtoUtils.currentResponseSchema(apiKey).read(receive.payload());
> {code}
> This results in WARN messages like the following in the server.log file as
> the responses are decoded with the wrong Schema:
> {code}
> [2015-11-05 19:13:10,309] WARN [ReplicaFetcherThread-0-182050600], Error in
> fetch kafka.server.ReplicaFetcherThread$FetchRequest@6cc18858. Possible
> cause: org.apache.kafka.common.protocol.types.SchemaException: Error reading
> field 'responses': Error reading field 'topic':
> java.nio.BufferUnderflowException (kafka.server.ReplicaFetcherThread)
> {code}
> {code}
> [2015-11-03 16:55:15,178] WARN [ReplicaFetcherThread-1-182050600], Error in
> fetch kafka.server.ReplicaFetcherThread$FetchRequest@224388b2. Possible
> cause: org.apache.kafka.common.protocol.types.SchemaException: Error reading
> field 'responses': Error reading field 'partition_responses': Error reading
> field 'record_set': java.lang.IllegalArgumentException
> (kafka.server.ReplicaFetcherThread)
> {code}