Re: what to do if replicas are not in sync

2015-04-21 Thread Gwen Shapira
They should be trying to get back into sync on their own.
Do you see any errors in broker logs?
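One quick way to act on that advice is to grep the broker logs for ERROR and WARN lines from around the time the ISRs shrank. A minimal sketch in Python; the sample lines below are hypothetical (they mimic the default log4j layout, and the real log path depends on your broker configuration, e.g. server.log under the configured log directory):

```python
import re

# Hypothetical broker log lines in the default log4j layout:
# [timestamp] LEVEL message (logger)
SAMPLE_LOG = """\
[2015-04-21 10:02:11,481] INFO Partition [raw-events,2] on broker 387: shrinking ISR
[2015-04-21 10:02:12,003] ERROR Error for partition [raw-events,2] (kafka.server.ReplicaFetcherThread)
[2015-04-21 10:02:13,477] WARN Too many open files (kafka.network.Acceptor)
"""

def problem_lines(log_text):
    """Return lines whose log level is ERROR or WARN."""
    pattern = re.compile(r"\] (ERROR|WARN) ")
    return [line for line in log_text.splitlines() if pattern.search(line)]

for line in problem_lines(SAMPLE_LOG):
    print(line)
```

The "Too many open files" pattern in particular matches the file-descriptor exhaustion described below, which is a common cause of replicas falling out of the ISR.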

Gwen

On Tue, Apr 21, 2015 at 10:15 AM, Thomas Kwan  wrote:
> We have 5 Kafka brokers and created a topic with a replication factor of
> 3. After a few broker issues (e.g. brokers running out of file
> descriptors), running kafkacat on the producer node shows the following:
>
> Command:
>
> kafkacat-CentOS-6.5-x86_64 -L -b "kafka01-east.manage.com,
> kafka02-east.manage.com,kafka03-east.manage.com,kafka04-east.manage.com,
> kafka05-east.manage.com"
>
> Output:
>
>  5 brokers:
>   broker 385 at kafka04-east.manage.com:9092
>   broker 389 at kafka03-east.manage.com:9092
>   broker 381 at kafka01-east.manage.com:9092
>   broker 387 at kafka05-east.manage.com:9092
>   broker 383 at kafka02-east.manage.com:9092
> ...
>   topic "raw-events" with 32 partitions:
> partition 23, leader 387, replicas: 389,387,381, isrs: 387,389
> partition 8, leader 389, replicas: 381,389,383, isrs: 389,381
> partition 17, leader 389, replicas: 383,389,381, isrs: 389,381
> partition 26, leader 387, replicas: 387,389,381, isrs: 387,389
> partition 11, leader 387, replicas: 389,387,381, isrs: 387,389
> partition 29, leader 389, replicas: 383,389,381, isrs: 389,381
> partition 20, leader 389, replicas: 381,389,383, isrs: 389,381
> partition 2, leader 387, replicas: 387,389,381, isrs: 387
> partition 5, leader 389, replicas: 383,389,381, isrs: 389,381
> partition 14, leader 387, replicas: 387,389,381, isrs: 387,389
> partition 4, leader 387, replicas: 381,387,389, isrs: 387,389
> partition 13, leader 387, replicas: 383,387,389, isrs: 387,389
> partition 22, leader 389, replicas: 387,383,389, isrs: 389,387
> partition 31, leader 387, replicas: 389,383,387, isrs: 387,389
> partition 7, leader 387, replicas: 389,383,387, isrs: 387,389
> partition 16, leader 387, replicas: 381,387,389, isrs: 387
> partition 25, leader 387, replicas: 383,387,389, isrs: 387,389
> partition 10, leader 387, replicas: 387,383,389, isrs: 387,389
> partition 1, leader 387, replicas: 383,387,389, isrs: 387,389
> partition 28, leader 387, replicas: 381,387,389, isrs: 387
> partition 19, leader 387, replicas: 389,383,387, isrs: 387,389
> partition 18, leader 387, replicas: 387,381,383, isrs: 387,381
> partition 9, leader 387, replicas: 383,381,387, isrs: 387,381
> partition 27, leader 389, replicas: 389,381,383, isrs: 389,381
> partition 12, leader 387, replicas: 381,383,387, isrs: 387,381
> partition 21, leader 387, replicas: 383,381,387, isrs: 387,381
> partition 3, leader 389, replicas: 389,381,383, isrs: 389,381
> partition 30, leader 387, replicas: 387,381,383, isrs: 387,381
> partition 15, leader 389, replicas: 389,381,383, isrs: 389,381
> partition 6, leader 387, replicas: 387,381,383, isrs: 387,381
> partition 24, leader 387, replicas: 381,383,387, isrs: 387,381
> partition 0, leader 387, replicas: 381,383,387, isrs: 387,381
>
> I notice that some partitions (partition 2, for example) have only one
> broker under isrs. From what I read, isrs lists the replicas whose data is
> in sync with the leader.
>
> My question is: now that some partitions are out of sync, what do I do to
> get them back in sync?
>
> thanks
> thomas
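A partition in the listing above is under-replicated exactly when its isrs list is shorter than its replicas list. As a sanity check, the kafkacat -L partition lines can be scanned mechanically; a minimal sketch, assuming the line format shown above (a few of the lines are reproduced as sample input):

```python
import re

# Sample partition lines in the kafkacat -L format shown above.
METADATA = """\
partition 23, leader 387, replicas: 389,387,381, isrs: 387,389
partition 2, leader 387, replicas: 387,389,381, isrs: 387
partition 16, leader 387, replicas: 381,387,389, isrs: 387
"""

LINE = re.compile(
    r"partition (\d+), leader (\d+), replicas: ([\d,]+), isrs: ([\d,]+)"
)

def under_replicated(metadata):
    """Map each under-replicated partition to the broker IDs that are
    assigned as replicas but missing from the ISR."""
    out = {}
    for m in LINE.finditer(metadata):
        partition = int(m.group(1))
        replicas = set(m.group(3).split(","))
        isr = set(m.group(4).split(","))
        missing = replicas - isr
        if missing:
            out[partition] = sorted(missing)
    return out

print(under_replicated(METADATA))
```

Applied to the full listing, this would show every partition missing at least one replica from its ISR (note also that broker 385 holds no replicas of this topic at all), which is consistent with the brokers still catching up after the file-descriptor incident.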

