Neil,

We fixed a bug related to the BadVersion problem in 0.8.1.1. Would you mind
repeating your test on 0.8.1.1? If you can still reproduce the issue,
please send around a thread dump and attach the logs to KAFKA-1407.
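The BadVersion errors in the quoted logs come from a ZooKeeper conditional write. `setData` takes an expected version and fails with `BadVersionException` if the znode's current version differs; an expected version of -1 matches any version. The toy model below only illustrates that compare-and-set behaviour. `VersionedStore`, `conditional_update`, and the "second writer bumps the version" scenario are hypothetical, and the model does not reproduce the specific bug fixed in 0.8.1.1.

```python
class BadVersionError(Exception):
    """Stand-in for org.apache.zookeeper.KeeperException$BadVersionException."""


class VersionedStore:
    """Hypothetical in-memory model of ZooKeeper's per-znode version counter."""

    def __init__(self):
        self._nodes = {}  # path -> (data, version)

    def create(self, path, data):
        self._nodes[path] = (data, 0)

    def get(self, path):
        return self._nodes[path]

    def set_data(self, path, data, expected_version):
        _, current = self._nodes[path]
        # -1 means "any version", as with ZooKeeper's setData.
        if expected_version != -1 and expected_version != current:
            raise BadVersionError(
                f"BadVersion for {path}: expected {expected_version}, found {current}")
        self._nodes[path] = (data, current + 1)
        return current + 1


def conditional_update(store, path, data, expected_version):
    """Hypothetical helper: returns (True, new_version) on success, (False, -1) on a version clash."""
    try:
        return True, store.set_data(path, data, expected_version)
    except BadVersionError:
        return False, -1


if __name__ == "__main__":
    store = VersionedStore()
    path = "/brokers/topics/foo.bar/partitions/3/state"
    store.create(path, '{"leader":3,"isr":[3,1]}')
    cached = store.get(path)[1]                          # writer A caches version 0
    store.set_data(path, '{"leader":3,"isr":[3,1]}', -1)  # writer B bumps it to 1
    print(conditional_update(store, path, '{"leader":3,"isr":[3]}', cached))  # (False, -1)
```

A writer that acts on a stale cached version is rejected rather than silently overwriting the newer state, which is the intent of the conditional update.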

Thanks,
Neha

On Tue, Oct 21, 2014 at 11:56 AM, Neil Harkins <nhark...@gmail.com> wrote:

> Hi. I've got a 5 node cluster running Kafka 0.8.1,
> with 4697 partitions (2 replicas each) across 564 topics.
> I'm sending it about 1% of our total messaging load now,
> and several times a day there is a period where 1 to 1500
> partitions have one replica not in sync. Is this normal?
> If a consumer is reading from a replica that gets deemed
> "not in sync", does it get redirected to the good replica?
> Is there a number of partitions above which maintenance tasks
> become infeasible?
>
> Relevant config bits:
> auto.leader.rebalance.enable=true
> leader.imbalance.per.broker.percentage=20
> leader.imbalance.check.interval.seconds=30
> replica.lag.time.max.ms=10000
> replica.lag.max.messages=4000
> num.replica.fetchers=4
> replica.fetch.max.bytes=10485760
>
> Not necessarily correlated to those periods,
> I see a lot of these errors in the logs:
>
> [2014-10-20 21:23:26,999] 21963614 [ReplicaFetcherThread-3-1] ERROR
> kafka.server.ReplicaFetcherThread  - [ReplicaFetcherThread-3-1], Error
> in fetch Name: FetchRequest; Version: 0; CorrelationId: 77423;
> ClientId: ReplicaFetcherThread-3-1; ReplicaId: 2; MaxWait: 500 ms;
> MinBytes: 1 bytes; RequestInfo: ...
>
> And a few of these:
>
> [2014-10-20 21:23:39,555] 3467527 [kafka-scheduler-2] ERROR
> kafka.utils.ZkUtils$  - Conditional update of path
> /brokers/topics/foo.bar/partitions/3/state with data
> {"controller_epoch":11,"leader":3,"version":1,"leader_epoch":109,"isr":[3]}
> and expected version 197 failed due to
> org.apache.zookeeper.KeeperException$BadVersionException:
> KeeperErrorCode = BadVersion for
> /brokers/topics/foo.bar/partitions/3/state
>
> And this one, I assume, is a client closing the connection non-gracefully,
> so shouldn't it be a warning rather than an error?
>
> [2014-10-20 21:54:15,599] 23812214 [kafka-processor-9092-3] ERROR
> kafka.network.Processor  - Closing socket for /10.31.0.224 because of
> error
>
> -neil
>
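For reference, per the 0.8.x broker documentation, the two lag settings in the quoted config are the two conditions under which a partition leader drops a follower from the ISR: the follower makes no fetch progress for `replica.lag.time.max.ms`, or it falls more than `replica.lag.max.messages` behind the leader. The sketch below is a simplified model of that rule, not Kafka's code; `FollowerState`, `is_out_of_sync`, and the field names are hypothetical, and the defaults are the values from the config above.

```python
from dataclasses import dataclass

# Values taken from the broker config quoted in this thread.
REPLICA_LAG_TIME_MAX_MS = 10_000
REPLICA_LAG_MAX_MESSAGES = 4_000


@dataclass
class FollowerState:
    """Hypothetical snapshot of what a partition leader tracks for one follower."""
    leo_updated_ms: int  # when a fetch last advanced this follower's log end offset
    log_end_offset: int  # follower's log end offset as seen by the leader


def is_out_of_sync(follower: FollowerState,
                   leader_log_end_offset: int,
                   now_ms: int,
                   lag_time_max_ms: int = REPLICA_LAG_TIME_MAX_MS,
                   lag_max_messages: int = REPLICA_LAG_MAX_MESSAGES) -> bool:
    """Simplified 0.8.x-style ISR check: a follower is dropped if stuck OR too far behind."""
    stuck = now_ms - follower.leo_updated_ms > lag_time_max_ms
    slow = leader_log_end_offset - follower.log_end_offset > lag_max_messages
    return stuck or slow


if __name__ == "__main__":
    # Follower fetched 2 s ago (not stuck) but a burst left it 5000 messages behind.
    burst = FollowerState(leo_updated_ms=98_000, log_end_offset=10_000)
    print(is_out_of_sync(burst, leader_log_end_offset=15_000, now_ms=100_000))  # True
```

Under this rule a follower that is fetching normally can still count as out of sync during a short produce burst on a busy partition, so comparing ISR-shrink times against per-partition message rates may be worth a look.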
