Still hoping for some help here.

On Fri, Dec 7, 2018 at 12:24 AM Suman B N <sumannew...@gmail.com> wrote:

> Guys,
> Another observation: roughly 90% of the under-replicated partitions have
> the same node as the follower.
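>
> For anyone wanting to reproduce the check: the affected partitions can
> be listed with something like the command below (assuming the stock
> kafka-topics.sh CLI and that zk1:2181 is a reachable ZooKeeper node);
> the common follower shows up when the Replicas/Isr columns of the
> output are grouped by broker id:
>
> kafka-topics.sh --zookeeper zk1:2181 --describe --under-replicated-partitions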
>
> *Any help here is very much appreciated. We have very little time to
> stabilize Kafka. Thanks a lot in advance.*
>
> -Suman
>
> On Thu, Dec 6, 2018 at 9:08 PM Suman B N <sumannew...@gmail.com> wrote:
>
>> +users
>>
>> On Thu, Dec 6, 2018 at 9:01 PM Suman B N <sumannew...@gmail.com> wrote:
>>
>>> Team,
>>>
>>> We are observing ISR shrink and expand very frequently. The following
>>> errors appear in the follower's logs:
>>>
>>> [2018-12-06 20:00:42,709] WARN [ReplicaFetcherThread-2-15], Error in
>>> fetch kafka.server.ReplicaFetcherThread$FetchRequest@a0f9ba9
>>> (kafka.server.ReplicaFetcherThread)
>>> java.io.IOException: Connection to 15 was disconnected before the
>>> response was read
>>>         at
>>> kafka.utils.NetworkClientBlockingOps$.$anonfun$blockingSendAndReceive$3(NetworkClientBlockingOps.scala:114)
>>>         at
>>> kafka.utils.NetworkClientBlockingOps$.$anonfun$blockingSendAndReceive$3$adapted(NetworkClientBlockingOps.scala:112)
>>>         at scala.Option.foreach(Option.scala:257)
>>>         at
>>> kafka.utils.NetworkClientBlockingOps$.$anonfun$blockingSendAndReceive$1(NetworkClientBlockingOps.scala:112)
>>>         at
>>> kafka.utils.NetworkClientBlockingOps$.recursivePoll$1(NetworkClientBlockingOps.scala:136)
>>>         at
>>> kafka.utils.NetworkClientBlockingOps$.pollContinuously$extension(NetworkClientBlockingOps.scala:142)
>>>         at
>>> kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:108)
>>>         at
>>> kafka.server.ReplicaFetcherThread.sendRequest(ReplicaFetcherThread.scala:249)
>>>         at
>>> kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:234)
>>>         at
>>> kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
>>>         at
>>> kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:118)
>>>         at
>>> kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:103)
>>>         at
>>> kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
>>>
>>> Can someone explain this and help us understand how to resolve these
>>> under-replicated partitions?
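>>>
>>> A sketch of the kind of check we can run from this follower, assuming
>>> broker 15 listens on the default port 9092 and "kafka15" is a
>>> placeholder for its hostname:
>>>
>>> nc -vz kafka15 9092
>>>
>>> plus grepping broker 15's server.log around the disconnect
>>> timestamps. Is that the right direction, or is there a better way to
>>> tell a network problem apart from broker 15 closing the connection
>>> itself?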
>>>
>>> server.properties file:
>>> broker.id=15
>>> port=9092
>>> zookeeper.connect=zk1,zk2,zk3,zk4,zk5,zk6
>>>
>>> default.replication.factor=2
>>> log.dirs=/data/kafka
>>> delete.topic.enable=true
>>> zookeeper.session.timeout.ms=10000
>>> inter.broker.protocol.version=0.10.2
>>> num.partitions=3
>>> min.insync.replicas=1
>>> log.retention.ms=259200000
>>> message.max.bytes=20971520
>>> replica.fetch.max.bytes=20971520
>>> replica.fetch.response.max.bytes=20971520
>>> max.partition.fetch.bytes=20971520
>>> fetch.max.bytes=20971520
>>> log.flush.interval.ms=5000
>>> log.roll.hours=24
>>> num.replica.fetchers=3
>>> num.io.threads=8
>>> num.network.threads=6
>>> log.message.format.version=0.9.0.1
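>>>
>>> The ISR-related timeouts are not set above, so presumably we are on
>>> the 0.10.2 defaults, i.e. something like:
>>>
>>> replica.lag.time.max.ms=10000
>>> replica.socket.timeout.ms=30000
>>> replica.fetch.wait.max.ms=500
>>>
>>> If a disconnect keeps a fetcher stalled for longer than
>>> replica.lag.time.max.ms, that alone would explain the shrink/expand
>>> cycle, but please correct us if that reasoning is off.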
>>>
>>> Also, in what cases do we end up in this state? We have 1,200-1,400
>>> topics and 5,000-6,000 partitions spread across a 20-node cluster, yet
>>> only 30-40 partitions are under-replicated while the rest are in sync.
>>> About 95% of these partitions have a replication factor of 2.
>>>
>>> --
>>> *Suman*
>>>
>>
>>
>> --
>> *Suman*
>> *OlaCabs*
>>
>
>
> --
> *Suman*
> *OlaCabs*
>


-- 
*Suman*
*OlaCabs*
