More information about the issue:
When the issue happens, the controller is always on one of the 0.9 Kafka
brokers.
In the server.log of the other brokers, we see errors like this:
[2016-03-23 22:36:02,814] ERROR [ReplicaFetcherThread-0-5], Error for
partition [topic,208] to broker
5:org.apache.kafka.common.errors.NotLeaderForPartitionException: This
server is not the leader for that topic-partition.
(kafka.server.ReplicaFetcherThread)

After restarting the broker that holds the controller, everything works again.
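
For anyone who wants to check the same thing on their cluster: the current
controller is recorded in the /controller znode in ZooKeeper, which can be
read with any ZooKeeper client. Below is a minimal sketch, assuming the znode
payload is the usual JSON with a "brokerid" field. The sample payload and the
broker-id-to-version map are made up for illustration; you would have to
maintain that map yourself.

```python
import json


def controller_broker_id(znode_payload: str) -> int:
    """Return the broker id stored in the /controller znode JSON payload."""
    return int(json.loads(znode_payload)["brokerid"])


def controller_on_version(znode_payload: str, broker_versions: dict, prefix: str) -> bool:
    """True if the controller broker's version string starts with `prefix`.

    `broker_versions` is a hand-maintained map of broker id -> version string
    (an assumption of this sketch, not something Kafka provides).
    """
    return broker_versions[controller_broker_id(znode_payload)].startswith(prefix)


# Hypothetical sample data, for illustration only.
sample_payload = '{"version":1,"brokerid":5,"timestamp":"1458772562814"}'
sample_versions = {1: "0.8.2.1", 5: "0.9.0.0"}

print(controller_broker_id(sample_payload))
print(controller_on_version(sample_payload, sample_versions, "0.9"))
```

Checking this each time the cluster breaks would show whether the problem
really correlates with the controller moving to a 0.9 broker.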


On Tue, Mar 22, 2016 at 6:14 PM, Qi Xu <shkir...@gmail.com> wrote:

> Hi folks, Rajiv, Jun,
> I'd like to bring up this thread again from Rajiv Kurian 3 months ago.
> Basically we did the same thing as Rajiv did. I upgraded two machines (out
> of 10) from 0.8.2.1 to 0.9, so after the upgrade there were 2 machines
> on 0.9 and 8 machines on 0.8.2.1. Initially it all worked fine, but
> after about 2 hours all the old uploaders and consumers broke because no
> leader could be found for any partition of any topic. The producer just
> complains "unknown error for topic xxx" when it tries to refresh the
> metadata, and on the server side there are errors complaining about no
> leader for a partition.
> Are there any known issues with 0.9 and 0.8.2 brokers co-existing in the
> same cluster? Thanks a lot.
>
>
> Below is the original thread:
>
> We had to revert to 0.8.3 because three of our topics seem to have gotten
> corrupted during the upgrade. As soon as we did the upgrade, producers to
> the three topics I mentioned stopped being able to write. The clients
> complained (occasionally) about leader-not-found exceptions. We restarted
> our clients and brokers but that didn't seem to help. In fact, even after
> reverting to 0.8.3 these three topics were still broken. To fix it we had
> to stop all clients, delete the topics, create them again and then restart
> the clients.
>
> I realize this is not a lot of info. I couldn't wait to get more debug info
> because the cluster was actually being used. Has anyone run into something
> like this? Are there any known issues with old consumers/producers? The
> topics that got busted had clients writing to them using the old Java
> wrapper over the Scala producer.
>
> Here are the steps I took to upgrade.
>
> For each broker:
>
> 1. Stop the broker.
> 2. Restart with the 0.9 broker running with
> inter.broker.protocol.version=0.8.2.X
> 3. Wait for under replicated partitions to go down to 0.
> 4. Go to step 1.
>
> Once all the brokers were running the 0.9 code with
> inter.broker.protocol.version=0.8.2.X we restarted them one by one with
> inter.broker.protocol.version=0.9.0.0
>
> When reverting I did the following.
>
> For each broker:
>
> 1. Stop the broker.
> 2. Restart with the 0.9 broker running with
> inter.broker.protocol.version=0.8.2.X
> 3. Wait for under replicated partitions to go down to 0.
> 4. Go to step 1.
>
> Once all the brokers were running 0.9 code with
> inter.broker.protocol.version=0.8.2.X I restarted them one by one with
> the 0.8.2.3 broker code. This, however, as I mentioned, did not fix the
> three broken topics.
>
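
For reference, the two-phase rolling restart described in the quoted steps
can be sketched roughly as below. This is only an illustration of the
ordering, not a tested upgrade tool: restart_with_protocol and
under_replicated are hypothetical callbacks you would have to supply (for
example, a wrapper around your own deploy scripts and a metrics query), and
the broker ids in the demo are made up.

```python
import time


def rolling_restart(brokers, restart, under_replicated, poll_seconds=0.0, max_polls=100):
    """Restart brokers one at a time, waiting for under-replicated partitions to reach 0."""
    for broker in brokers:
        restart(broker)
        for _ in range(max_polls):
            if under_replicated() == 0:
                break  # cluster has caught up; move on to the next broker
            time.sleep(poll_seconds)
        else:
            # Never reached 0 within max_polls: stop rather than restart another broker.
            raise RuntimeError("under-replicated partitions did not reach 0 after broker %s" % broker)


def two_phase_upgrade(brokers, restart_with_protocol, under_replicated):
    """Pass 1: new code with the old inter.broker.protocol.version. Pass 2: bump the protocol."""
    for protocol in ("0.8.2.X", "0.9.0.0"):
        rolling_restart(
            brokers,
            lambda broker, p=protocol: restart_with_protocol(broker, p),
            under_replicated,
        )


# Demo with fake callbacks that only record what would have been done.
restart_log = []
two_phase_upgrade(
    [1, 2],
    lambda broker, protocol: restart_log.append((broker, protocol)),
    lambda: 0,  # pretend the cluster is always fully replicated
)
print(restart_log)
```

The point of the structure is that no broker speaks the 0.9.0.0 inter-broker
protocol until every broker is already running the 0.9 code. A cluster with
only 2 of 10 brokers upgraded, as in my case, is still in the middle of the
first pass.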
