Re: Very long consumer rebalances

2018-08-22 Thread Shantanu Deshmukh
Can anyone help me understand how to debug this issue? I tried setting the log
level to trace in the consumer logback configuration. But at such times nothing
appears in the log, even at trace level. It is like the entire code is frozen.

On Thu, Aug 16, 2018 at 6:32 PM Shantanu Deshmukh 
wrote:

> I saw a few topics with segment.ms and retention.ms property set. Can
> that be causing any issue? I remember that this is the only change I
> carried out to the cluster in last couple of months after which the problem
> started.
>
> On Fri, Aug 10, 2018 at 2:55 PM M. Manna  wrote:
>
>> if you can upgrade, I would say upgrading to 0.10.2.x would be better for
>> you (or even higher, 2.0.0). Otherwise you have to play around with
>> max.poll.records and session.timeout.ms.
>>
>> As the doc says (or newer versions), the adjustment should be such that
>> request.timeout.ms >= max.poll.interval.ms. Also, heartbeat.interval.ms
>> should be curbed at (rule of thumb) 30% of session.timeout.ms.
>>
>> Lastly, all these have to be within the bounds of
>> group.min.session.timeout.ms and group.max.session.timeout.ms.
>>
>> You can check all these, tune them as necessary, and retry. Some of these
>> configs may or may not be applicable at runtime. so a rolling restart may
>> be required before all changes take place.
>>
>> On 9 August 2018 at 13:48, Shantanu Deshmukh 
>> wrote:
>>
>> > Hi,
>> >
>> > Yes my consumer application works like below
>> >
>> >1. Reads how many workers are required to process each topics from
>> >properties file
>> >2. As many threads are spawned as there are workers mentioned in
>> >properties file, topic name is passed to this thread. FixedThreadPool
>> >implementation is used.
>> >3. Each worker thread initializes one consumer object and subscribes
>> to
>> >given topic. Consumer group is simply -consumer. So if my
>> > topic
>> >bulk-email, then consumer group for all those threads is
>> > bulk-email-consumer
>> >4. Once this is done, inside an infinite while loop
>> consumer.poll(100)
>> >method keeps running. So this application is a daemon. Shuts down
>> only
>> > when
>> >server shuts down or in case of kill command.
>> >
>> > I have configured session.timeout.ms in consumer properties. I haven't
>> > done
>> > anything about zookeeper timeout. Is it required now? Since consumer
>> > accesses only the brokers.
>> >
>> > On Thu, Aug 9, 2018 at 3:03 PM M. Manna  wrote:
>> >
>> > > In the simplest way, how have you implemented your consumer?
>> > >
>> > > 1) Does your consumers join a designated group, process messages, and
>> > then
>> > > closes all connection? Or does it stay open perpetually until server
>> > > shutdown?
>> > > 2) Have you configured the session timeouts for client and zookeeper
>> > > accordingly?
>> > >
>> > > Regards,
>> > >
>> > > On 9 August 2018 at 08:00, Shantanu Deshmukh 
>> > > wrote:
>> > >
>> > > >  I am facing too many problems these days. Now one of our consumer
>> > groups
>> > > > is rebalancing every now and then. And rebalance takes very low,
>> more
>> > > than
>> > > > 5-10 minutes. Even after re-balancing I see that only half of the
>> > > consumers
>> > > > are active/receive assignment. Its all going haywire.
>> > > >
>> > > > I am seeing these logs in kafka consumer logs. Can anyone help me
>> > > > understand what is going on here? It is a very long piece of log,
>> but
>> > > > someone please help me. I am desperately looking for any solution
>> since
>> > > > more than 2 months now. But to no avail.
>> > > >
>> > > > [2018-08-09 11:39:51] :: DEBUG ::
>> > > > AbstractCoordinator$HeartbeatResponseHandler:694 - Received
>> successful
>> > > > heartbeat response for group bulk-email-consumer
>> > > > [2018-08-09 11:39:53] :: DEBUG ::
>> > > > ConsumerCoordinator$OffsetCommitResponseHandler:640 - Group
>> > > > bulk-email-consumer committed offset 25465113 for partition
>> > bulk-email-8
>> > > > [2018-08-09 11:39:53] :: DEBUG :: ConsumerCoordinator$4:539 -
>> Completed
>> > > > autocommit of offsets
>> {bulk-email-8=OffsetAndMetadata{offset=25465113,
>> > > > metadata=''}} for group bulk-email-consumer
>> > > > [2018-08-09 11:39:53] :: DEBUG ::
>> > > > ConsumerCoordinator$OffsetCommitResponseHandler:640 - Group
>> > > > bulk-email-consumer committed offset 25463566 for partition
>> > bulk-email-6
>> > > > [2018-08-09 11:39:53] :: DEBUG :: ConsumerCoordinator$4:539 -
>> Completed
>> > > > autocommit of offsets
>> {bulk-email-6=OffsetAndMetadata{offset=25463566,
>> > > > metadata=''}} for group bulk-email-consumer
>> > > > [2018-08-09 11:39:53] :: DEBUG ::
>> > > > ConsumerCoordinator$OffsetCommitResponseHandler:640 - Group
>> > > > bulk-email-consumer committed offset 2588 for partition
>> > bulk-email-9
>> > > > [2018-08-09 11:39:53] :: DEBUG :: ConsumerCoordinator$4:539 -
>> Completed
>> > > > autocommit of offsets
>> {bulk-email-9=OffsetAndMetadata{offset=2588,
>> > > > metadata=''}} for group 

Re: Very long consumer rebalances

2018-08-16 Thread Shantanu Deshmukh
I saw a few topics with the segment.ms and retention.ms properties set. Can that
be causing any issue? I remember that this is the only change I carried out
to the cluster in the last couple of months, after which the problem started.

On Fri, Aug 10, 2018 at 2:55 PM M. Manna  wrote:

> if you can upgrade, I would say upgrading to 0.10.2.x would be better for
> you (or even higher, 2.0.0). Otherwise you have to play around with
> max.poll.records and session.timeout.ms.
>
> As the doc says (or newer versions), the adjustment should be such that
> request.timeout.ms >= max.poll.interval.ms. Also, heartbeat.interval.ms
> should be curbed at (rule of thumb) 30% of session.timeout.ms.
>
> Lastly, all these have to be within the bounds of
> group.min.session.timeout.ms and group.max.session.timeout.ms.
>
> You can check all these, tune them as necessary, and retry. Some of these
> configs may or may not be applicable at runtime. so a rolling restart may
> be required before all changes take place.
>
> On 9 August 2018 at 13:48, Shantanu Deshmukh 
> wrote:
>
> > Hi,
> >
> > Yes my consumer application works like below
> >
> >1. Reads how many workers are required to process each topics from
> >properties file
> >2. As many threads are spawned as there are workers mentioned in
> >properties file, topic name is passed to this thread. FixedThreadPool
> >implementation is used.
> >3. Each worker thread initializes one consumer object and subscribes
> to
> >given topic. Consumer group is simply -consumer. So if my
> > topic
> >bulk-email, then consumer group for all those threads is
> > bulk-email-consumer
> >4. Once this is done, inside an infinite while loop consumer.poll(100)
> >method keeps running. So this application is a daemon. Shuts down only
> > when
> >server shuts down or in case of kill command.
> >
> > I have configured session.timeout.ms in consumer properties. I haven't
> > done
> > anything about zookeeper timeout. Is it required now? Since consumer
> > accesses only the brokers.
> >
> > On Thu, Aug 9, 2018 at 3:03 PM M. Manna  wrote:
> >
> > > In the simplest way, how have you implemented your consumer?
> > >
> > > 1) Does your consumers join a designated group, process messages, and
> > then
> > > closes all connection? Or does it stay open perpetually until server
> > > shutdown?
> > > 2) Have you configured the session timeouts for client and zookeeper
> > > accordingly?
> > >
> > > Regards,
> > >
> > > On 9 August 2018 at 08:00, Shantanu Deshmukh 
> > > wrote:
> > >
> > > >  I am facing too many problems these days. Now one of our consumer
> > groups
> > > > is rebalancing every now and then. And rebalance takes very low, more
> > > than
> > > > 5-10 minutes. Even after re-balancing I see that only half of the
> > > consumers
> > > > are active/receive assignment. Its all going haywire.
> > > >
> > > > I am seeing these logs in kafka consumer logs. Can anyone help me
> > > > understand what is going on here? It is a very long piece of log, but
> > > > someone please help me. I am desperately looking for any solution
> since
> > > > more than 2 months now. But to no avail.
> > > >
> > > > [2018-08-09 11:39:51] :: DEBUG ::
> > > > AbstractCoordinator$HeartbeatResponseHandler:694 - Received
> successful
> > > > heartbeat response for group bulk-email-consumer
> > > > [2018-08-09 11:39:53] :: DEBUG ::
> > > > ConsumerCoordinator$OffsetCommitResponseHandler:640 - Group
> > > > bulk-email-consumer committed offset 25465113 for partition
> > bulk-email-8
> > > > [2018-08-09 11:39:53] :: DEBUG :: ConsumerCoordinator$4:539 -
> Completed
> > > > autocommit of offsets
> {bulk-email-8=OffsetAndMetadata{offset=25465113,
> > > > metadata=''}} for group bulk-email-consumer
> > > > [2018-08-09 11:39:53] :: DEBUG ::
> > > > ConsumerCoordinator$OffsetCommitResponseHandler:640 - Group
> > > > bulk-email-consumer committed offset 25463566 for partition
> > bulk-email-6
> > > > [2018-08-09 11:39:53] :: DEBUG :: ConsumerCoordinator$4:539 -
> Completed
> > > > autocommit of offsets
> {bulk-email-6=OffsetAndMetadata{offset=25463566,
> > > > metadata=''}} for group bulk-email-consumer
> > > > [2018-08-09 11:39:53] :: DEBUG ::
> > > > ConsumerCoordinator$OffsetCommitResponseHandler:640 - Group
> > > > bulk-email-consumer committed offset 2588 for partition
> > bulk-email-9
> > > > [2018-08-09 11:39:53] :: DEBUG :: ConsumerCoordinator$4:539 -
> Completed
> > > > autocommit of offsets
> {bulk-email-9=OffsetAndMetadata{offset=2588,
> > > > metadata=''}} for group bulk-email-consumer
> > > > [2018-08-09 11:39:54] :: DEBUG ::
> > > > AbstractCoordinator$HeartbeatResponseHandler:694 - Received
> successful
> > > > heartbeat response for group bulk-email-consumer
> > > > [2018-08-09 11:39:54] :: DEBUG ::
> > > > AbstractCoordinator$HeartbeatResponseHandler:694 - Received
> successful
> > > > heartbeat response for group bulk-email-consumer
> > > > [2018-08-09 11:39:54] :: DEBUG ::
> > > 

Re: Very long consumer rebalances

2018-08-10 Thread Puneet Saha
Please remove me from the list

On Fri, Jul 6, 2018 at 2:55 AM Shantanu Deshmukh 
wrote:

> Hello everyone,
>
> We are running a 3 broker Kafka 0.10.0.1 cluster. We have a java app which
> spawns many consumer threads consuming from different topics. For every
> topic we have specified different consumer-group. A lot of times I see that
> whenever this application is restarted a CG on one or two topics takes more
> than 5 minutes to receive partition assignment. Till that time consumers
> for that topic don't consumer anything. If I go to Kafka broker and run
> consumer-groups.sh and describe that particular CG I see that it is
> rebalancing. There is time critical data stored in that topic and we cannot
> tolerate such long delays. What can be the reason for such long rebalances.
>
> Here's our consumer config
>
>
> auto.commit.interval.ms = 3000
> auto.offset.reset = latest
> bootstrap.servers = [x.x.x.x:9092, x.x.x.x:9092, x.x.x.x:9092]
> check.crcs = true
> client.id =
> connections.max.idle.ms = 54
> enable.auto.commit = true
> exclude.internal.topics = true
> fetch.max.bytes = 52428800
> fetch.max.wait.ms = 500
> fetch.min.bytes = 1
> group.id = otp-notifications-consumer
> heartbeat.interval.ms = 3000
> interceptor.classes = null
> key.deserializer = class
> org.apache.kafka.common.serialization.StringDeserializer
> max.partition.fetch.bytes = 1048576
> max.poll.interval.ms = 30
> max.poll.records = 50
> metadata.max.age.ms = 30
> metric.reporters = []
> metrics.num.samples = 2
> metrics.sample.window.ms = 3
> partition.assignment.strategy = [class
> org.apache.kafka.clients.consumer.RangeAssignor]
> receive.buffer.bytes = 65536
> reconnect.backoff.ms = 50
> request.timeout.ms = 305000
> retry.backoff.ms = 100
> sasl.kerberos.kinit.cmd = /usr/bin/kinit
> sasl.kerberos.min.time.before.relogin = 6
> sasl.kerberos.service.name = null
> sasl.kerberos.ticket.renew.jitter = 0.05
> sasl.kerberos.ticket.renew.window.factor = 0.8
> sasl.mechanism = GSSAPI
> security.protocol = SSL
> send.buffer.bytes = 131072
> session.timeout.ms = 30
> ssl.cipher.suites = null
> ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
> ssl.endpoint.identification.algorithm = null
> ssl.key.password = null
> ssl.keymanager.algorithm = SunX509
> ssl.keystore.location = null
> ssl.keystore.password = null
> ssl.keystore.type = JKS
> ssl.protocol = TLS
> ssl.provider = null
> ssl.secure.random.implementation = null
> ssl.trustmanager.algorithm = PKIX
> ssl.truststore.location = /x/x/client.truststore.jks
> ssl.truststore.password = [hidden]
> ssl.truststore.type = JKS
> value.deserializer = class
> org.apache.kafka.common.serialization.StringDeserializer
>
> Please help.
>
> *Thanks & Regards,*
> *Shantanu Deshmukh*
>


Re: Very long consumer rebalances

2018-08-10 Thread Shantanu Deshmukh
All this is fine. But what I do not understand is why only some of the
consumer groups start very late. We have 7 topics and their consumers
belong to 7 different CGs. Whenever I restart my application, only
one or two of them start very late. It takes almost 5-10 minutes
before their consumers start receiving data.
When that delay occurs there are absolutely no logs generated by the consumer
library.

On Fri, Aug 10, 2018 at 2:55 PM M. Manna  wrote:

> if you can upgrade, I would say upgrading to 0.10.2.x would be better for
> you (or even higher, 2.0.0). Otherwise you have to play around with
> max.poll.records and session.timeout.ms.
>
> As the doc says (or newer versions), the adjustment should be such that
> request.timeout.ms >= max.poll.interval.ms. Also, heartbeat.interval.ms
> should be curbed at (rule of thumb) 30% of session.timeout.ms.
>
> Lastly, all these have to be within the bounds of
> group.min.session.timeout.ms and group.max.session.timeout.ms.
>
> You can check all these, tune them as necessary, and retry. Some of these
> configs may or may not be applicable at runtime. so a rolling restart may
> be required before all changes take place.
>
> On 9 August 2018 at 13:48, Shantanu Deshmukh 
> wrote:
>
> > Hi,
> >
> > Yes my consumer application works like below
> >
> >1. Reads how many workers are required to process each topics from
> >properties file
> >2. As many threads are spawned as there are workers mentioned in
> >properties file, topic name is passed to this thread. FixedThreadPool
> >implementation is used.
> >3. Each worker thread initializes one consumer object and subscribes
> to
> >given topic. Consumer group is simply -consumer. So if my
> > topic
> >bulk-email, then consumer group for all those threads is
> > bulk-email-consumer
> >4. Once this is done, inside an infinite while loop consumer.poll(100)
> >method keeps running. So this application is a daemon. Shuts down only
> > when
> >server shuts down or in case of kill command.
> >
> > I have configured session.timeout.ms in consumer properties. I haven't
> > done
> > anything about zookeeper timeout. Is it required now? Since consumer
> > accesses only the brokers.
> >
> > On Thu, Aug 9, 2018 at 3:03 PM M. Manna  wrote:
> >
> > > In the simplest way, how have you implemented your consumer?
> > >
> > > 1) Does your consumers join a designated group, process messages, and
> > then
> > > closes all connection? Or does it stay open perpetually until server
> > > shutdown?
> > > 2) Have you configured the session timeouts for client and zookeeper
> > > accordingly?
> > >
> > > Regards,
> > >
> > > On 9 August 2018 at 08:00, Shantanu Deshmukh 
> > > wrote:
> > >
> > > >  I am facing too many problems these days. Now one of our consumer
> > groups
> > > > is rebalancing every now and then. And rebalance takes very low, more
> > > than
> > > > 5-10 minutes. Even after re-balancing I see that only half of the
> > > consumers
> > > > are active/receive assignment. Its all going haywire.
> > > >
> > > > I am seeing these logs in kafka consumer logs. Can anyone help me
> > > > understand what is going on here? It is a very long piece of log, but
> > > > someone please help me. I am desperately looking for any solution
> since
> > > > more than 2 months now. But to no avail.
> > > >
> > > > [2018-08-09 11:39:51] :: DEBUG ::
> > > > AbstractCoordinator$HeartbeatResponseHandler:694 - Received
> successful
> > > > heartbeat response for group bulk-email-consumer
> > > > [2018-08-09 11:39:53] :: DEBUG ::
> > > > ConsumerCoordinator$OffsetCommitResponseHandler:640 - Group
> > > > bulk-email-consumer committed offset 25465113 for partition
> > bulk-email-8
> > > > [2018-08-09 11:39:53] :: DEBUG :: ConsumerCoordinator$4:539 -
> Completed
> > > > autocommit of offsets
> {bulk-email-8=OffsetAndMetadata{offset=25465113,
> > > > metadata=''}} for group bulk-email-consumer
> > > > [2018-08-09 11:39:53] :: DEBUG ::
> > > > ConsumerCoordinator$OffsetCommitResponseHandler:640 - Group
> > > > bulk-email-consumer committed offset 25463566 for partition
> > bulk-email-6
> > > > [2018-08-09 11:39:53] :: DEBUG :: ConsumerCoordinator$4:539 -
> Completed
> > > > autocommit of offsets
> {bulk-email-6=OffsetAndMetadata{offset=25463566,
> > > > metadata=''}} for group bulk-email-consumer
> > > > [2018-08-09 11:39:53] :: DEBUG ::
> > > > ConsumerCoordinator$OffsetCommitResponseHandler:640 - Group
> > > > bulk-email-consumer committed offset 2588 for partition
> > bulk-email-9
> > > > [2018-08-09 11:39:53] :: DEBUG :: ConsumerCoordinator$4:539 -
> Completed
> > > > autocommit of offsets
> {bulk-email-9=OffsetAndMetadata{offset=2588,
> > > > metadata=''}} for group bulk-email-consumer
> > > > [2018-08-09 11:39:54] :: DEBUG ::
> > > > AbstractCoordinator$HeartbeatResponseHandler:694 - Received
> successful
> > > > heartbeat response for group bulk-email-consumer
> > > > [2018-08-09 11:39:54] :: DEBUG 

Re: Very long consumer rebalances

2018-08-10 Thread M. Manna
If you can upgrade, I would say upgrading to 0.10.2.x (or even higher, 2.0.0)
would be better for you. Otherwise you have to play around with
max.poll.records and session.timeout.ms.

As the docs for newer versions say, the adjustment should be such that
request.timeout.ms >= max.poll.interval.ms. Also, heartbeat.interval.ms
should be capped at (rule of thumb) 30% of session.timeout.ms.

Lastly, all these have to be within the bounds of
group.min.session.timeout.ms and group.max.session.timeout.ms.

You can check all these, tune them as necessary, and retry. Some of these
configs may or may not be applicable at runtime, so a rolling restart may
be required before all changes take effect.
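
For illustration, one internally consistent set of values following those rules
might look like the Java fragment below (the numbers are examples only, not
recommendations, and assume a Properties object passed to the consumer):

Properties props = new Properties();
// session.timeout.ms must lie between the broker's
// group.min.session.timeout.ms and group.max.session.timeout.ms
props.put("session.timeout.ms", "30000");
// rule of thumb: heartbeat.interval.ms around 30% of session.timeout.ms
props.put("heartbeat.interval.ms", "9000");
// keep request.timeout.ms >= max.poll.interval.ms
props.put("max.poll.interval.ms", "300000");
props.put("request.timeout.ms", "305000");
// fewer records per poll() means less processing time between polls
props.put("max.poll.records", "50");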

On 9 August 2018 at 13:48, Shantanu Deshmukh  wrote:

> Hi,
>
> Yes my consumer application works like below
>
>1. Reads how many workers are required to process each topics from
>properties file
>2. As many threads are spawned as there are workers mentioned in
>properties file, topic name is passed to this thread. FixedThreadPool
>implementation is used.
>3. Each worker thread initializes one consumer object and subscribes to
>given topic. Consumer group is simply -consumer. So if my
> topic
>bulk-email, then consumer group for all those threads is
> bulk-email-consumer
>4. Once this is done, inside an infinite while loop consumer.poll(100)
>method keeps running. So this application is a daemon. Shuts down only
> when
>server shuts down or in case of kill command.
>
> I have configured session.timeout.ms in consumer properties. I haven't
> done
> anything about zookeeper timeout. Is it required now? Since consumer
> accesses only the brokers.
>
> On Thu, Aug 9, 2018 at 3:03 PM M. Manna  wrote:
>
> > In the simplest way, how have you implemented your consumer?
> >
> > 1) Does your consumers join a designated group, process messages, and
> then
> > closes all connection? Or does it stay open perpetually until server
> > shutdown?
> > 2) Have you configured the session timeouts for client and zookeeper
> > accordingly?
> >
> > Regards,
> >
> > On 9 August 2018 at 08:00, Shantanu Deshmukh 
> > wrote:
> >
> > >  I am facing too many problems these days. Now one of our consumer
> groups
> > > is rebalancing every now and then. And rebalance takes very low, more
> > than
> > > 5-10 minutes. Even after re-balancing I see that only half of the
> > consumers
> > > are active/receive assignment. Its all going haywire.
> > >
> > > I am seeing these logs in kafka consumer logs. Can anyone help me
> > > understand what is going on here? It is a very long piece of log, but
> > > someone please help me. I am desperately looking for any solution since
> > > more than 2 months now. But to no avail.
> > >
> > > [2018-08-09 11:39:51] :: DEBUG ::
> > > AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
> > > heartbeat response for group bulk-email-consumer
> > > [2018-08-09 11:39:53] :: DEBUG ::
> > > ConsumerCoordinator$OffsetCommitResponseHandler:640 - Group
> > > bulk-email-consumer committed offset 25465113 for partition
> bulk-email-8
> > > [2018-08-09 11:39:53] :: DEBUG :: ConsumerCoordinator$4:539 - Completed
> > > autocommit of offsets {bulk-email-8=OffsetAndMetadata{offset=25465113,
> > > metadata=''}} for group bulk-email-consumer
> > > [2018-08-09 11:39:53] :: DEBUG ::
> > > ConsumerCoordinator$OffsetCommitResponseHandler:640 - Group
> > > bulk-email-consumer committed offset 25463566 for partition
> bulk-email-6
> > > [2018-08-09 11:39:53] :: DEBUG :: ConsumerCoordinator$4:539 - Completed
> > > autocommit of offsets {bulk-email-6=OffsetAndMetadata{offset=25463566,
> > > metadata=''}} for group bulk-email-consumer
> > > [2018-08-09 11:39:53] :: DEBUG ::
> > > ConsumerCoordinator$OffsetCommitResponseHandler:640 - Group
> > > bulk-email-consumer committed offset 2588 for partition
> bulk-email-9
> > > [2018-08-09 11:39:53] :: DEBUG :: ConsumerCoordinator$4:539 - Completed
> > > autocommit of offsets {bulk-email-9=OffsetAndMetadata{offset=2588,
> > > metadata=''}} for group bulk-email-consumer
> > > [2018-08-09 11:39:54] :: DEBUG ::
> > > AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
> > > heartbeat response for group bulk-email-consumer
> > > [2018-08-09 11:39:54] :: DEBUG ::
> > > AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
> > > heartbeat response for group bulk-email-consumer
> > > [2018-08-09 11:39:54] :: DEBUG ::
> > > AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
> > > heartbeat response for group bulk-email-consumer
> > > [2018-08-09 11:39:54] :: DEBUG ::
> > > AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
> > > heartbeat response for group bulk-email-consumer
> > > [2018-08-09 11:39:54] :: DEBUG ::
> > > AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
> > > heartbeat response for group bulk-email-consumer
> > > [2018-08-09 

Re: Very long consumer rebalances

2018-08-09 Thread Shantanu Deshmukh
Hi,

Yes, my consumer application works as below:

   1. Reads how many workers are required to process each topic from the
   properties file.
   2. As many threads are spawned as there are workers mentioned in the
   properties file, and the topic name is passed to each thread. A
   FixedThreadPool implementation is used.
   3. Each worker thread initializes one consumer object and subscribes to the
   given topic. The consumer group is simply <topic>-consumer. So if my topic is
   bulk-email, then the consumer group for all those threads is
   bulk-email-consumer.
   4. Once this is done, consumer.poll(100) keeps running inside an infinite
   while loop. So this application is a daemon; it shuts down only when the
   server shuts down or on a kill command.

I have configured session.timeout.ms in the consumer properties. I haven't done
anything about the zookeeper timeout. Is it required now, since the consumer
accesses only the brokers?
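
For reference, a rough, simplified sketch of that setup is below. The class
name, the hard-coded worker count, and the property values are illustrative
only, not our actual code:

import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class TopicWorker implements Runnable {
    private final String topic;

    TopicWorker(String topic) { this.topic = topic; }

    @Override
    public void run() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "x.x.x.x:9092");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        // consumer group is derived from the topic, e.g. bulk-email-consumer
        props.put("group.id", topic + "-consumer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList(topic));
            while (true) {                                   // daemon-style loop
                ConsumerRecords<String, String> records = consumer.poll(100);
                for (ConsumerRecord<String, String> record : records) {
                    // application-specific processing goes here
                }
            }
        }
    }

    public static void main(String[] args) {
        int workers = 3;   // in reality read per-topic from a properties file
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.submit(new TopicWorker("bulk-email"));
        }
    }
}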

On Thu, Aug 9, 2018 at 3:03 PM M. Manna  wrote:

> In the simplest way, how have you implemented your consumer?
>
> 1) Does your consumers join a designated group, process messages, and then
> closes all connection? Or does it stay open perpetually until server
> shutdown?
> 2) Have you configured the session timeouts for client and zookeeper
> accordingly?
>
> Regards,
>
> On 9 August 2018 at 08:00, Shantanu Deshmukh 
> wrote:
>
> >  I am facing too many problems these days. Now one of our consumer groups
> > is rebalancing every now and then. And rebalance takes very low, more
> than
> > 5-10 minutes. Even after re-balancing I see that only half of the
> consumers
> > are active/receive assignment. Its all going haywire.
> >
> > I am seeing these logs in kafka consumer logs. Can anyone help me
> > understand what is going on here? It is a very long piece of log, but
> > someone please help me. I am desperately looking for any solution since
> > more than 2 months now. But to no avail.
> >
> > [2018-08-09 11:39:51] :: DEBUG ::
> > AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
> > heartbeat response for group bulk-email-consumer
> > [2018-08-09 11:39:53] :: DEBUG ::
> > ConsumerCoordinator$OffsetCommitResponseHandler:640 - Group
> > bulk-email-consumer committed offset 25465113 for partition bulk-email-8
> > [2018-08-09 11:39:53] :: DEBUG :: ConsumerCoordinator$4:539 - Completed
> > autocommit of offsets {bulk-email-8=OffsetAndMetadata{offset=25465113,
> > metadata=''}} for group bulk-email-consumer
> > [2018-08-09 11:39:53] :: DEBUG ::
> > ConsumerCoordinator$OffsetCommitResponseHandler:640 - Group
> > bulk-email-consumer committed offset 25463566 for partition bulk-email-6
> > [2018-08-09 11:39:53] :: DEBUG :: ConsumerCoordinator$4:539 - Completed
> > autocommit of offsets {bulk-email-6=OffsetAndMetadata{offset=25463566,
> > metadata=''}} for group bulk-email-consumer
> > [2018-08-09 11:39:53] :: DEBUG ::
> > ConsumerCoordinator$OffsetCommitResponseHandler:640 - Group
> > bulk-email-consumer committed offset 2588 for partition bulk-email-9
> > [2018-08-09 11:39:53] :: DEBUG :: ConsumerCoordinator$4:539 - Completed
> > autocommit of offsets {bulk-email-9=OffsetAndMetadata{offset=2588,
> > metadata=''}} for group bulk-email-consumer
> > [2018-08-09 11:39:54] :: DEBUG ::
> > AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
> > heartbeat response for group bulk-email-consumer
> > [2018-08-09 11:39:54] :: DEBUG ::
> > AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
> > heartbeat response for group bulk-email-consumer
> > [2018-08-09 11:39:54] :: DEBUG ::
> > AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
> > heartbeat response for group bulk-email-consumer
> > [2018-08-09 11:39:54] :: DEBUG ::
> > AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
> > heartbeat response for group bulk-email-consumer
> > [2018-08-09 11:39:54] :: DEBUG ::
> > AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
> > heartbeat response for group bulk-email-consumer
> > [2018-08-09 11:39:54] :: DEBUG ::
> > AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
> > heartbeat response for group bulk-email-consumer
> > [2018-08-09 11:39:54] :: DEBUG ::
> > AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
> > heartbeat response for group bulk-email-consumer
> > [2018-08-09 11:39:54] :: DEBUG ::
> > AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
> > heartbeat response for group bulk-email-consumer
> > [2018-08-09 11:39:56] :: DEBUG ::
> > ConsumerCoordinator$OffsetCommitResponseHandler:640 - Group
> > bulk-email-consumer committed offset 25463566 for partition bulk-email-6
> > [2018-08-09 11:39:56] :: DEBUG ::
> > ConsumerCoordinator$OffsetCommitResponseHandler:640 - Group
> > bulk-email-consumer committed offset 2588 for partition bulk-email-9
> > [2018-08-09 11:39:56] :: DEBUG ::
> > ConsumerCoordinator$OffsetCommitResponseHandler:640 - 

Re: Very long consumer rebalances

2018-08-09 Thread Kamal Chandraprakash
In v0.10.0.1, the consumer heartbeat background thread feature is not
available. A lot of users have faced similar errors, so KIP-62 was proposed.
You have to update your Kafka version.

I highly recommend upgrading Kafka to a version where the heartbeat
background thread feature is implemented (v0.10.1.0 or later). If you don't
have any option to upgrade, you have to heartbeat the coordinator manually
from your consumer. You can use this code snippet for reference.
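
For what it's worth, one common workaround on 0.10.0.x (where heartbeats are
only sent from poll()) is sketched below. This is only an illustration, not
necessarily the snippet referenced above; it assumes `consumer`, the usual
Kafka client imports, and a `processBatch` method already exist:

// Sketch: keep poll() running (and thus heartbeating) while a batch is
// processed on another thread; paused partitions return no records.
ExecutorService worker = Executors.newSingleThreadExecutor();
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    if (records.isEmpty()) {
        continue;
    }
    // 0.10.0.x pause()/resume() take varargs; newer clients take a Collection
    TopicPartition[] assigned =
            consumer.assignment().toArray(new TopicPartition[0]);
    consumer.pause(assigned);
    Future<?> done = worker.submit(() -> processBatch(records));
    while (!done.isDone()) {
        consumer.poll(0);   // sends heartbeats, returns nothing while paused
    }
    consumer.resume(assigned);
}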




On Thu, Aug 9, 2018 at 3:03 PM M. Manna  wrote:

> In the simplest way, how have you implemented your consumer?
>
> 1) Does your consumers join a designated group, process messages, and then
> closes all connection? Or does it stay open perpetually until server
> shutdown?
> 2) Have you configured the session timeouts for client and zookeeper
> accordingly?
>
> Regards,
>
> On 9 August 2018 at 08:00, Shantanu Deshmukh 
> wrote:
>
> >  I am facing too many problems these days. Now one of our consumer groups
> > is rebalancing every now and then. And rebalance takes very low, more
> than
> > 5-10 minutes. Even after re-balancing I see that only half of the
> consumers
> > are active/receive assignment. Its all going haywire.
> >
> > I am seeing these logs in kafka consumer logs. Can anyone help me
> > understand what is going on here? It is a very long piece of log, but
> > someone please help me. I am desperately looking for any solution since
> > more than 2 months now. But to no avail.
> >
> > [2018-08-09 11:39:51] :: DEBUG ::
> > AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
> > heartbeat response for group bulk-email-consumer
> > [2018-08-09 11:39:53] :: DEBUG ::
> > ConsumerCoordinator$OffsetCommitResponseHandler:640 - Group
> > bulk-email-consumer committed offset 25465113 for partition bulk-email-8
> > [2018-08-09 11:39:53] :: DEBUG :: ConsumerCoordinator$4:539 - Completed
> > autocommit of offsets {bulk-email-8=OffsetAndMetadata{offset=25465113,
> > metadata=''}} for group bulk-email-consumer
> > [2018-08-09 11:39:53] :: DEBUG ::
> > ConsumerCoordinator$OffsetCommitResponseHandler:640 - Group
> > bulk-email-consumer committed offset 25463566 for partition bulk-email-6
> > [2018-08-09 11:39:53] :: DEBUG :: ConsumerCoordinator$4:539 - Completed
> > autocommit of offsets {bulk-email-6=OffsetAndMetadata{offset=25463566,
> > metadata=''}} for group bulk-email-consumer
> > [2018-08-09 11:39:53] :: DEBUG ::
> > ConsumerCoordinator$OffsetCommitResponseHandler:640 - Group
> > bulk-email-consumer committed offset 2588 for partition bulk-email-9
> > [2018-08-09 11:39:53] :: DEBUG :: ConsumerCoordinator$4:539 - Completed
> > autocommit of offsets {bulk-email-9=OffsetAndMetadata{offset=2588,
> > metadata=''}} for group bulk-email-consumer
> > [2018-08-09 11:39:54] :: DEBUG ::
> > AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
> > heartbeat response for group bulk-email-consumer
> > [2018-08-09 11:39:54] :: DEBUG ::
> > AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
> > heartbeat response for group bulk-email-consumer
> > [2018-08-09 11:39:54] :: DEBUG ::
> > AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
> > heartbeat response for group bulk-email-consumer
> > [2018-08-09 11:39:54] :: DEBUG ::
> > AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
> > heartbeat response for group bulk-email-consumer
> > [2018-08-09 11:39:54] :: DEBUG ::
> > AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
> > heartbeat response for group bulk-email-consumer
> > [2018-08-09 11:39:54] :: DEBUG ::
> > AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
> > heartbeat response for group bulk-email-consumer
> > [2018-08-09 11:39:54] :: DEBUG ::
> > AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
> > heartbeat response for group bulk-email-consumer
> > [2018-08-09 11:39:54] :: DEBUG ::
> > AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
> > heartbeat response for group bulk-email-consumer
> > [2018-08-09 11:39:56] :: DEBUG ::
> > ConsumerCoordinator$OffsetCommitResponseHandler:640 - Group
> > bulk-email-consumer committed offset 25463566 for partition bulk-email-6
> > [2018-08-09 11:39:56] :: DEBUG ::
> > ConsumerCoordinator$OffsetCommitResponseHandler:640 - Group
> > bulk-email-consumer committed offset 2588 for partition bulk-email-9
> > [2018-08-09 11:39:56] :: DEBUG ::
> > ConsumerCoordinator$OffsetCommitResponseHandler:640 - Group
> > bulk-email-consumer committed offset 25465113 for partition bulk-email-8
> > [2018-08-09 11:39:56] :: DEBUG :: ConsumerCoordinator$4:539 - Completed
> > autocommit of offsets 

Re: Very long consumer rebalances

2018-08-09 Thread M. Manna
In the simplest way, how have you implemented your consumer?

1) Do your consumers join a designated group, process messages, and then
close all connections? Or do they stay open perpetually until server
shutdown?
2) Have you configured the session timeouts for the client and zookeeper
accordingly?

Regards,

On 9 August 2018 at 08:00, Shantanu Deshmukh  wrote:

>  I am facing too many problems these days. Now one of our consumer groups
> is rebalancing every now and then. And rebalance takes very low, more than
> 5-10 minutes. Even after re-balancing I see that only half of the consumers
> are active/receive assignment. Its all going haywire.
>
> I am seeing these logs in kafka consumer logs. Can anyone help me
> understand what is going on here? It is a very long piece of log, but
> someone please help me. I am desperately looking for any solution since
> more than 2 months now. But to no avail.
>
> [2018-08-09 11:39:51] :: DEBUG ::
> AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
> heartbeat response for group bulk-email-consumer
> [2018-08-09 11:39:53] :: DEBUG ::
> ConsumerCoordinator$OffsetCommitResponseHandler:640 - Group
> bulk-email-consumer committed offset 25465113 for partition bulk-email-8
> [2018-08-09 11:39:53] :: DEBUG :: ConsumerCoordinator$4:539 - Completed
> autocommit of offsets {bulk-email-8=OffsetAndMetadata{offset=25465113,
> metadata=''}} for group bulk-email-consumer
> [2018-08-09 11:39:53] :: DEBUG ::
> ConsumerCoordinator$OffsetCommitResponseHandler:640 - Group
> bulk-email-consumer committed offset 25463566 for partition bulk-email-6
> [2018-08-09 11:39:53] :: DEBUG :: ConsumerCoordinator$4:539 - Completed
> autocommit of offsets {bulk-email-6=OffsetAndMetadata{offset=25463566,
> metadata=''}} for group bulk-email-consumer
> [2018-08-09 11:39:53] :: DEBUG ::
> ConsumerCoordinator$OffsetCommitResponseHandler:640 - Group
> bulk-email-consumer committed offset 2588 for partition bulk-email-9
> [2018-08-09 11:39:53] :: DEBUG :: ConsumerCoordinator$4:539 - Completed
> autocommit of offsets {bulk-email-9=OffsetAndMetadata{offset=2588,
> metadata=''}} for group bulk-email-consumer
> [2018-08-09 11:39:54] :: DEBUG ::
> AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
> heartbeat response for group bulk-email-consumer
> [2018-08-09 11:39:54] :: DEBUG ::
> AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
> heartbeat response for group bulk-email-consumer
> [2018-08-09 11:39:54] :: DEBUG ::
> AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
> heartbeat response for group bulk-email-consumer
> [2018-08-09 11:39:54] :: DEBUG ::
> AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
> heartbeat response for group bulk-email-consumer
> [2018-08-09 11:39:54] :: DEBUG ::
> AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
> heartbeat response for group bulk-email-consumer
> [2018-08-09 11:39:54] :: DEBUG ::
> AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
> heartbeat response for group bulk-email-consumer
> [2018-08-09 11:39:54] :: DEBUG ::
> AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
> heartbeat response for group bulk-email-consumer
> [2018-08-09 11:39:54] :: DEBUG ::
> AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
> heartbeat response for group bulk-email-consumer
> [2018-08-09 11:39:56] :: DEBUG ::
> ConsumerCoordinator$OffsetCommitResponseHandler:640 - Group
> bulk-email-consumer committed offset 25463566 for partition bulk-email-6
> [2018-08-09 11:39:56] :: DEBUG ::
> ConsumerCoordinator$OffsetCommitResponseHandler:640 - Group
> bulk-email-consumer committed offset 2588 for partition bulk-email-9
> [2018-08-09 11:39:56] :: DEBUG ::
> ConsumerCoordinator$OffsetCommitResponseHandler:640 - Group
> bulk-email-consumer committed offset 25465113 for partition bulk-email-8
> [2018-08-09 11:39:56] :: DEBUG :: ConsumerCoordinator$4:539 - Completed
> autocommit of offsets {bulk-email-6=OffsetAndMetadata{offset=25463566,
> metadata=''}} for group bulk-email-consumer
> [2018-08-09 11:39:56] :: DEBUG :: ConsumerCoordinator$4:539 - Completed
> autocommit of offsets {bulk-email-9=OffsetAndMetadata{offset=2588,
> metadata=''}} for group bulk-email-consumer
> [2018-08-09 11:39:56] :: DEBUG :: ConsumerCoordinator$4:539 - Completed
> autocommit of offsets {bulk-email-8=OffsetAndMetadata{offset=25465113,
> metadata=''}} for group bulk-email-consumer
> [2018-08-09 11:39:57] :: DEBUG ::
> AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
> heartbeat response for group bulk-email-consumer
> [2018-08-09 11:39:57] :: DEBUG ::
> AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
> heartbeat response for group bulk-email-consumer
> [2018-08-09 11:39:57] :: DEBUG ::
> AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
> heartbeat response 

Re: Very long consumer rebalances

2018-08-09 Thread Shantanu Deshmukh
I am facing too many problems these days. Now one of our consumer groups
is rebalancing every now and then. And a rebalance takes very long, more than
5-10 minutes. Even after re-balancing I see that only half of the consumers
are active/receive assignment. It's all going haywire.

I am seeing these logs in the kafka consumer logs. Can anyone help me
understand what is going on here? It is a very long piece of log, but
someone please help me. I have been desperately looking for a solution for
more than 2 months now. But to no avail.

[2018-08-09 11:39:51] :: DEBUG ::
AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
heartbeat response for group bulk-email-consumer
[2018-08-09 11:39:53] :: DEBUG ::
ConsumerCoordinator$OffsetCommitResponseHandler:640 - Group
bulk-email-consumer committed offset 25465113 for partition bulk-email-8
[2018-08-09 11:39:53] :: DEBUG :: ConsumerCoordinator$4:539 - Completed
autocommit of offsets {bulk-email-8=OffsetAndMetadata{offset=25465113,
metadata=''}} for group bulk-email-consumer
[2018-08-09 11:39:53] :: DEBUG ::
ConsumerCoordinator$OffsetCommitResponseHandler:640 - Group
bulk-email-consumer committed offset 25463566 for partition bulk-email-6
[2018-08-09 11:39:53] :: DEBUG :: ConsumerCoordinator$4:539 - Completed
autocommit of offsets {bulk-email-6=OffsetAndMetadata{offset=25463566,
metadata=''}} for group bulk-email-consumer
[2018-08-09 11:39:53] :: DEBUG ::
ConsumerCoordinator$OffsetCommitResponseHandler:640 - Group
bulk-email-consumer committed offset 2588 for partition bulk-email-9
[2018-08-09 11:39:53] :: DEBUG :: ConsumerCoordinator$4:539 - Completed
autocommit of offsets {bulk-email-9=OffsetAndMetadata{offset=2588,
metadata=''}} for group bulk-email-consumer
[2018-08-09 11:39:54] :: DEBUG ::
AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
heartbeat response for group bulk-email-consumer
[2018-08-09 11:39:54] :: DEBUG ::
AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
heartbeat response for group bulk-email-consumer
[2018-08-09 11:39:54] :: DEBUG ::
AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
heartbeat response for group bulk-email-consumer
[2018-08-09 11:39:54] :: DEBUG ::
AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
heartbeat response for group bulk-email-consumer
[2018-08-09 11:39:54] :: DEBUG ::
AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
heartbeat response for group bulk-email-consumer
[2018-08-09 11:39:54] :: DEBUG ::
AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
heartbeat response for group bulk-email-consumer
[2018-08-09 11:39:54] :: DEBUG ::
AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
heartbeat response for group bulk-email-consumer
[2018-08-09 11:39:54] :: DEBUG ::
AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
heartbeat response for group bulk-email-consumer
[2018-08-09 11:39:56] :: DEBUG ::
ConsumerCoordinator$OffsetCommitResponseHandler:640 - Group
bulk-email-consumer committed offset 25463566 for partition bulk-email-6
[2018-08-09 11:39:56] :: DEBUG ::
ConsumerCoordinator$OffsetCommitResponseHandler:640 - Group
bulk-email-consumer committed offset 2588 for partition bulk-email-9
[2018-08-09 11:39:56] :: DEBUG ::
ConsumerCoordinator$OffsetCommitResponseHandler:640 - Group
bulk-email-consumer committed offset 25465113 for partition bulk-email-8
[2018-08-09 11:39:56] :: DEBUG :: ConsumerCoordinator$4:539 - Completed
autocommit of offsets {bulk-email-6=OffsetAndMetadata{offset=25463566,
metadata=''}} for group bulk-email-consumer
[2018-08-09 11:39:56] :: DEBUG :: ConsumerCoordinator$4:539 - Completed
autocommit of offsets {bulk-email-9=OffsetAndMetadata{offset=2588,
metadata=''}} for group bulk-email-consumer
[2018-08-09 11:39:56] :: DEBUG :: ConsumerCoordinator$4:539 - Completed
autocommit of offsets {bulk-email-8=OffsetAndMetadata{offset=25465113,
metadata=''}} for group bulk-email-consumer
[2018-08-09 11:39:57] :: DEBUG ::
AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
heartbeat response for group bulk-email-consumer
[2018-08-09 11:39:57] :: DEBUG ::
AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
heartbeat response for group bulk-email-consumer
[2018-08-09 11:39:57] :: DEBUG ::
AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
heartbeat response for group bulk-email-consumer
[2018-08-09 11:39:57] :: DEBUG ::
AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
heartbeat response for group bulk-email-consumer
[2018-08-09 11:39:57] :: DEBUG ::
AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
heartbeat response for group bulk-email-consumer
[2018-08-09 11:39:57] :: DEBUG ::
AbstractCoordinator$HeartbeatResponseHandler:694 - Received successful
heartbeat response for group bulk-email-consumer
[2018-08-09 11:39:57] :: DEBUG ::

Re: Very long consumer rebalances

2018-07-12 Thread Steve Tian
It's a very good and important doc, so I think you should read it all. You
should get some ideas from sections like *Detecting Consumer Failures* and
*Multi-threaded Processing* for your case.

On Thu, Jul 12, 2018, 3:17 PM Shantanu Deshmukh 
wrote:

> Hi Steve,
>
> Could you please shed more light on this? What section should I revisit? I
> am using high-level consumer. So I am simply calling consumer.close() when
> I am shutting down the process. Is there any other method to be called
> before calling close()?
>
> On Mon, Jul 9, 2018 at 5:58 PM Steve Tian  wrote:
>
> > Please re-read the javadoc of KafkaConsumer, make sure you know how to
> > wakeup/close consumer properly while shutting down your application.  Try
> > to understand the motivation of KIP-62 and adjust related timeout.
> >
> > On Mon, Jul 9, 2018, 8:05 PM harish lohar  wrote:
> >
> > > Try reducing below timer
> > > metadata.max.age.ms = 30
> > >
> > >
> > > On Fri, Jul 6, 2018 at 5:55 AM Shantanu Deshmukh <
> shantanu...@gmail.com>
> > > wrote:
> > >
> > > > Hello everyone,
> > > >
> > > > We are running a 3 broker Kafka 0.10.0.1 cluster. We have a java app
> > > which
> > > > spawns many consumer threads consuming from different topics. For
> every
> > > > topic we have specified different consumer-group. A lot of times I
> see
> > > that
> > > > whenever this application is restarted a CG on one or two topics
> takes
> > > more
> > > > than 5 minutes to receive partition assignment. Till that time
> > consumers
> > > > for that topic don't consumer anything. If I go to Kafka broker and
> run
> > > > consumer-groups.sh and describe that particular CG I see that it is
> > > > rebalancing. There is time critical data stored in that topic and we
> > > cannot
> > > > tolerate such long delays. What can be the reason for such long
> > > rebalances.
> > > >
> > > > Here's our consumer config
> > > >
> > > >
> > > > auto.commit.interval.ms = 3000
> > > > auto.offset.reset = latest
> > > > bootstrap.servers = [x.x.x.x:9092, x.x.x.x:9092, x.x.x.x:9092]
> > > > check.crcs = true
> > > > client.id =
> > > > connections.max.idle.ms = 54
> > > > enable.auto.commit = true
> > > > exclude.internal.topics = true
> > > > fetch.max.bytes = 52428800
> > > > fetch.max.wait.ms = 500
> > > > fetch.min.bytes = 1
> > > > group.id = otp-notifications-consumer
> > > > heartbeat.interval.ms = 3000
> > > > interceptor.classes = null
> > > > key.deserializer = class
> > > > org.apache.kafka.common.serialization.StringDeserializer
> > > > max.partition.fetch.bytes = 1048576
> > > > max.poll.interval.ms = 30
> > > > max.poll.records = 50
> > > > metadata.max.age.ms = 30
> > > > metric.reporters = []
> > > > metrics.num.samples = 2
> > > > metrics.sample.window.ms = 3
> > > > partition.assignment.strategy = [class
> > > > org.apache.kafka.clients.consumer.RangeAssignor]
> > > > receive.buffer.bytes = 65536
> > > > reconnect.backoff.ms = 50
> > > > request.timeout.ms = 305000
> > > > retry.backoff.ms = 100
> > > > sasl.kerberos.kinit.cmd = /usr/bin/kinit
> > > > sasl.kerberos.min.time.before.relogin = 6
> > > > sasl.kerberos.service.name = null
> > > > sasl.kerberos.ticket.renew.jitter = 0.05
> > > > sasl.kerberos.ticket.renew.window.factor = 0.8
> > > > sasl.mechanism = GSSAPI
> > > > security.protocol = SSL
> > > > send.buffer.bytes = 131072
> > > > session.timeout.ms = 30
> > > > ssl.cipher.suites = null
> > > > ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
> > > > ssl.endpoint.identification.algorithm = null
> > > > ssl.key.password = null
> > > > ssl.keymanager.algorithm = SunX509
> > > > ssl.keystore.location = null
> > > > ssl.keystore.password = null
> > > > ssl.keystore.type = JKS
> > > > ssl.protocol = TLS
> > > > ssl.provider = null
> > > > ssl.secure.random.implementation = null
> > > > ssl.trustmanager.algorithm = PKIX
> > > > ssl.truststore.location = /x/x/client.truststore.jks
> > > > ssl.truststore.password = [hidden]
> > > > ssl.truststore.type = JKS
> > > > value.deserializer = class
> > > > org.apache.kafka.common.serialization.StringDeserializer
> > > >
> > > > Please help.
> > > >
> > > > *Thanks & Regards,*
> > > > *Shantanu Deshmukh*
> > > >
> > >
> >
>


Re: Very long consumer rebalances

2018-07-12 Thread Shantanu Deshmukh
Hi Steve,

Could you please shed more light on this? What section should I revisit? I
am using the high-level consumer, so I am simply calling consumer.close() when
I am shutting down the process. Is there any other method to be called
before calling close()?

On Mon, Jul 9, 2018 at 5:58 PM Steve Tian  wrote:

> Please re-read the javadoc of KafkaConsumer, make sure you know how to
> wakeup/close consumer properly while shutting down your application.  Try
> to understand the motivation of KIP-62 and adjust related timeout.
>
> On Mon, Jul 9, 2018, 8:05 PM harish lohar  wrote:
>
> > Try reducing below timer
> > metadata.max.age.ms = 30
> >
> >
> > On Fri, Jul 6, 2018 at 5:55 AM Shantanu Deshmukh 
> > wrote:
> >
> > > Hello everyone,
> > >
> > > We are running a 3 broker Kafka 0.10.0.1 cluster. We have a java app
> > which
> > > spawns many consumer threads consuming from different topics. For every
> > > topic we have specified different consumer-group. A lot of times I see
> > that
> > > whenever this application is restarted a CG on one or two topics takes
> > more
> > > than 5 minutes to receive partition assignment. Till that time
> consumers
> > > for that topic don't consumer anything. If I go to Kafka broker and run
> > > consumer-groups.sh and describe that particular CG I see that it is
> > > rebalancing. There is time critical data stored in that topic and we
> > cannot
> > > tolerate such long delays. What can be the reason for such long
> > rebalances.
> > >
> > > Here's our consumer config
> > >
> > >
> > > auto.commit.interval.ms = 3000
> > > auto.offset.reset = latest
> > > bootstrap.servers = [x.x.x.x:9092, x.x.x.x:9092, x.x.x.x:9092]
> > > check.crcs = true
> > > client.id =
> > > connections.max.idle.ms = 54
> > > enable.auto.commit = true
> > > exclude.internal.topics = true
> > > fetch.max.bytes = 52428800
> > > fetch.max.wait.ms = 500
> > > fetch.min.bytes = 1
> > > group.id = otp-notifications-consumer
> > > heartbeat.interval.ms = 3000
> > > interceptor.classes = null
> > > key.deserializer = class
> > > org.apache.kafka.common.serialization.StringDeserializer
> > > max.partition.fetch.bytes = 1048576
> > > max.poll.interval.ms = 30
> > > max.poll.records = 50
> > > metadata.max.age.ms = 30
> > > metric.reporters = []
> > > metrics.num.samples = 2
> > > metrics.sample.window.ms = 3
> > > partition.assignment.strategy = [class
> > > org.apache.kafka.clients.consumer.RangeAssignor]
> > > receive.buffer.bytes = 65536
> > > reconnect.backoff.ms = 50
> > > request.timeout.ms = 305000
> > > retry.backoff.ms = 100
> > > sasl.kerberos.kinit.cmd = /usr/bin/kinit
> > > sasl.kerberos.min.time.before.relogin = 6
> > > sasl.kerberos.service.name = null
> > > sasl.kerberos.ticket.renew.jitter = 0.05
> > > sasl.kerberos.ticket.renew.window.factor = 0.8
> > > sasl.mechanism = GSSAPI
> > > security.protocol = SSL
> > > send.buffer.bytes = 131072
> > > session.timeout.ms = 30
> > > ssl.cipher.suites = null
> > > ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
> > > ssl.endpoint.identification.algorithm = null
> > > ssl.key.password = null
> > > ssl.keymanager.algorithm = SunX509
> > > ssl.keystore.location = null
> > > ssl.keystore.password = null
> > > ssl.keystore.type = JKS
> > > ssl.protocol = TLS
> > > ssl.provider = null
> > > ssl.secure.random.implementation = null
> > > ssl.trustmanager.algorithm = PKIX
> > > ssl.truststore.location = /x/x/client.truststore.jks
> > > ssl.truststore.password = [hidden]
> > > ssl.truststore.type = JKS
> > > value.deserializer = class
> > > org.apache.kafka.common.serialization.StringDeserializer
> > >
> > > Please help.
> > >
> > > *Thanks & Regards,*
> > > *Shantanu Deshmukh*
> > >
> >
>


Re: Very long consumer rebalances

2018-07-09 Thread Steve Tian
Please re-read the javadoc of KafkaConsumer and make sure you know how to
wakeup/close the consumer properly while shutting down your application. Try
to understand the motivation of KIP-62 and adjust the related timeouts.
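
For example, the shutdown pattern described in the KafkaConsumer javadoc looks
roughly like the sketch below (the topic name and `props` are placeholders, and
the shutdown-hook usage is a simplification):

final KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
// wakeup() is the only consumer method that is safe to call from another
// thread; it makes a blocked poll() throw WakeupException so the loop can exit.
Runtime.getRuntime().addShutdownHook(new Thread(() -> consumer.wakeup()));
try {
    consumer.subscribe(Collections.singletonList("my-topic"));
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(100);
        // process records ...
    }
} catch (WakeupException e) {
    // expected during shutdown, ignore
} finally {
    consumer.close();   // leave the group cleanly so the rebalance is not delayed
}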

On Mon, Jul 9, 2018, 8:05 PM harish lohar  wrote:

> Try reducing below timer
> metadata.max.age.ms = 30
>
>
> On Fri, Jul 6, 2018 at 5:55 AM Shantanu Deshmukh 
> wrote:
>
> > Hello everyone,
> >
> > We are running a 3 broker Kafka 0.10.0.1 cluster. We have a java app
> which
> > spawns many consumer threads consuming from different topics. For every
> > topic we have specified different consumer-group. A lot of times I see
> that
> > whenever this application is restarted a CG on one or two topics takes
> more
> > than 5 minutes to receive partition assignment. Till that time consumers
> > for that topic don't consumer anything. If I go to Kafka broker and run
> > consumer-groups.sh and describe that particular CG I see that it is
> > rebalancing. There is time critical data stored in that topic and we
> cannot
> > tolerate such long delays. What can be the reason for such long
> rebalances.
> >
> > Here's our consumer config
> >
> >
> > auto.commit.interval.ms = 3000
> > auto.offset.reset = latest
> > bootstrap.servers = [x.x.x.x:9092, x.x.x.x:9092, x.x.x.x:9092]
> > check.crcs = true
> > client.id =
> > connections.max.idle.ms = 54
> > enable.auto.commit = true
> > exclude.internal.topics = true
> > fetch.max.bytes = 52428800
> > fetch.max.wait.ms = 500
> > fetch.min.bytes = 1
> > group.id = otp-notifications-consumer
> > heartbeat.interval.ms = 3000
> > interceptor.classes = null
> > key.deserializer = class
> > org.apache.kafka.common.serialization.StringDeserializer
> > max.partition.fetch.bytes = 1048576
> > max.poll.interval.ms = 30
> > max.poll.records = 50
> > metadata.max.age.ms = 30
> > metric.reporters = []
> > metrics.num.samples = 2
> > metrics.sample.window.ms = 3
> > partition.assignment.strategy = [class
> > org.apache.kafka.clients.consumer.RangeAssignor]
> > receive.buffer.bytes = 65536
> > reconnect.backoff.ms = 50
> > request.timeout.ms = 305000
> > retry.backoff.ms = 100
> > sasl.kerberos.kinit.cmd = /usr/bin/kinit
> > sasl.kerberos.min.time.before.relogin = 6
> > sasl.kerberos.service.name = null
> > sasl.kerberos.ticket.renew.jitter = 0.05
> > sasl.kerberos.ticket.renew.window.factor = 0.8
> > sasl.mechanism = GSSAPI
> > security.protocol = SSL
> > send.buffer.bytes = 131072
> > session.timeout.ms = 30
> > ssl.cipher.suites = null
> > ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
> > ssl.endpoint.identification.algorithm = null
> > ssl.key.password = null
> > ssl.keymanager.algorithm = SunX509
> > ssl.keystore.location = null
> > ssl.keystore.password = null
> > ssl.keystore.type = JKS
> > ssl.protocol = TLS
> > ssl.provider = null
> > ssl.secure.random.implementation = null
> > ssl.trustmanager.algorithm = PKIX
> > ssl.truststore.location = /x/x/client.truststore.jks
> > ssl.truststore.password = [hidden]
> > ssl.truststore.type = JKS
> > value.deserializer = class
> > org.apache.kafka.common.serialization.StringDeserializer
> >
> > Please help.
> >
> > *Thanks & Regards,*
> > *Shantanu Deshmukh*
> >
>


Re: Very long consumer rebalances

2018-07-09 Thread harish lohar
Try reducing the below timer:
metadata.max.age.ms = 30


On Fri, Jul 6, 2018 at 5:55 AM Shantanu Deshmukh 
wrote:

> Hello everyone,
>
> We are running a 3 broker Kafka 0.10.0.1 cluster. We have a java app which
> spawns many consumer threads consuming from different topics. For every
> topic we have specified different consumer-group. A lot of times I see that
> whenever this application is restarted a CG on one or two topics takes more
> than 5 minutes to receive partition assignment. Till that time consumers
> for that topic don't consumer anything. If I go to Kafka broker and run
> consumer-groups.sh and describe that particular CG I see that it is
> rebalancing. There is time critical data stored in that topic and we cannot
> tolerate such long delays. What can be the reason for such long rebalances.
>
> Here's our consumer config
>
>
> auto.commit.interval.ms = 3000
> auto.offset.reset = latest
> bootstrap.servers = [x.x.x.x:9092, x.x.x.x:9092, x.x.x.x:9092]
> check.crcs = true
> client.id =
> connections.max.idle.ms = 54
> enable.auto.commit = true
> exclude.internal.topics = true
> fetch.max.bytes = 52428800
> fetch.max.wait.ms = 500
> fetch.min.bytes = 1
> group.id = otp-notifications-consumer
> heartbeat.interval.ms = 3000
> interceptor.classes = null
> key.deserializer = class
> org.apache.kafka.common.serialization.StringDeserializer
> max.partition.fetch.bytes = 1048576
> max.poll.interval.ms = 30
> max.poll.records = 50
> metadata.max.age.ms = 30
> metric.reporters = []
> metrics.num.samples = 2
> metrics.sample.window.ms = 3
> partition.assignment.strategy = [class
> org.apache.kafka.clients.consumer.RangeAssignor]
> receive.buffer.bytes = 65536
> reconnect.backoff.ms = 50
> request.timeout.ms = 305000
> retry.backoff.ms = 100
> sasl.kerberos.kinit.cmd = /usr/bin/kinit
> sasl.kerberos.min.time.before.relogin = 6
> sasl.kerberos.service.name = null
> sasl.kerberos.ticket.renew.jitter = 0.05
> sasl.kerberos.ticket.renew.window.factor = 0.8
> sasl.mechanism = GSSAPI
> security.protocol = SSL
> send.buffer.bytes = 131072
> session.timeout.ms = 30
> ssl.cipher.suites = null
> ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
> ssl.endpoint.identification.algorithm = null
> ssl.key.password = null
> ssl.keymanager.algorithm = SunX509
> ssl.keystore.location = null
> ssl.keystore.password = null
> ssl.keystore.type = JKS
> ssl.protocol = TLS
> ssl.provider = null
> ssl.secure.random.implementation = null
> ssl.trustmanager.algorithm = PKIX
> ssl.truststore.location = /x/x/client.truststore.jks
> ssl.truststore.password = [hidden]
> ssl.truststore.type = JKS
> value.deserializer = class
> org.apache.kafka.common.serialization.StringDeserializer
>
> Please help.
>
> *Thanks & Regards,*
> *Shantanu Deshmukh*
>


Re: Very long consumer rebalances

2018-07-09 Thread Shantanu Deshmukh
Kind people on this group, please help me!

On Fri, Jul 6, 2018 at 3:24 PM Shantanu Deshmukh 
wrote:

> Hello everyone,
>
> We are running a 3 broker Kafka 0.10.0.1 cluster. We have a java app which
> spawns many consumer threads consuming from different topics. For every
> topic we have specified different consumer-group. A lot of times I see that
> whenever this application is restarted a CG on one or two topics takes more
> than 5 minutes to receive partition assignment. Till that time consumers
> for that topic don't consumer anything. If I go to Kafka broker and run
> consumer-groups.sh and describe that particular CG I see that it is
> rebalancing. There is time critical data stored in that topic and we cannot
> tolerate such long delays. What can be the reason for such long rebalances.
>
> Here's our consumer config
>
>
> auto.commit.interval.ms = 3000
> auto.offset.reset = latest
> bootstrap.servers = [x.x.x.x:9092, x.x.x.x:9092, x.x.x.x:9092]
> check.crcs = true
> client.id =
> connections.max.idle.ms = 54
> enable.auto.commit = true
> exclude.internal.topics = true
> fetch.max.bytes = 52428800
> fetch.max.wait.ms = 500
> fetch.min.bytes = 1
> group.id = otp-notifications-consumer
> heartbeat.interval.ms = 3000
> interceptor.classes = null
> key.deserializer = class
> org.apache.kafka.common.serialization.StringDeserializer
> max.partition.fetch.bytes = 1048576
> max.poll.interval.ms = 30
> max.poll.records = 50
> metadata.max.age.ms = 30
> metric.reporters = []
> metrics.num.samples = 2
> metrics.sample.window.ms = 3
> partition.assignment.strategy = [class
> org.apache.kafka.clients.consumer.RangeAssignor]
> receive.buffer.bytes = 65536
> reconnect.backoff.ms = 50
> request.timeout.ms = 305000
> retry.backoff.ms = 100
> sasl.kerberos.kinit.cmd = /usr/bin/kinit
> sasl.kerberos.min.time.before.relogin = 6
> sasl.kerberos.service.name = null
> sasl.kerberos.ticket.renew.jitter = 0.05
> sasl.kerberos.ticket.renew.window.factor = 0.8
> sasl.mechanism = GSSAPI
> security.protocol = SSL
> send.buffer.bytes = 131072
> session.timeout.ms = 30
> ssl.cipher.suites = null
> ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
> ssl.endpoint.identification.algorithm = null
> ssl.key.password = null
> ssl.keymanager.algorithm = SunX509
> ssl.keystore.location = null
> ssl.keystore.password = null
> ssl.keystore.type = JKS
> ssl.protocol = TLS
> ssl.provider = null
> ssl.secure.random.implementation = null
> ssl.trustmanager.algorithm = PKIX
> ssl.truststore.location = /x/x/client.truststore.jks
> ssl.truststore.password = [hidden]
> ssl.truststore.type = JKS
> value.deserializer = class
> org.apache.kafka.common.serialization.StringDeserializer
>
> Please help.
>
> *Thanks & Regards,*
> *Shantanu Deshmukh*
>


Very long consumer rebalances

2018-07-06 Thread Shantanu Deshmukh
Hello everyone,

We are running a 3 broker Kafka 0.10.0.1 cluster. We have a java app which
spawns many consumer threads consuming from different topics. For every
topic we have specified a different consumer group. A lot of times I see that
whenever this application is restarted, a CG on one or two topics takes more
than 5 minutes to receive partition assignment. Till that time, consumers
for that topic don't consume anything. If I go to a Kafka broker, run
consumer-groups.sh, and describe that particular CG, I see that it is
rebalancing. There is time-critical data stored in that topic and we cannot
tolerate such long delays. What can be the reason for such long rebalances?

Here's our consumer config


auto.commit.interval.ms = 3000
auto.offset.reset = latest
bootstrap.servers = [x.x.x.x:9092, x.x.x.x:9092, x.x.x.x:9092]
check.crcs = true
client.id =
connections.max.idle.ms = 54
enable.auto.commit = true
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = otp-notifications-consumer
heartbeat.interval.ms = 3000
interceptor.classes = null
key.deserializer = class
org.apache.kafka.common.serialization.StringDeserializer
max.partition.fetch.bytes = 1048576
max.poll.interval.ms = 30
max.poll.records = 50
metadata.max.age.ms = 30
metric.reporters = []
metrics.num.samples = 2
metrics.sample.window.ms = 3
partition.assignment.strategy = [class
org.apache.kafka.clients.consumer.RangeAssignor]
receive.buffer.bytes = 65536
reconnect.backoff.ms = 50
request.timeout.ms = 305000
retry.backoff.ms = 100
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 6
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.mechanism = GSSAPI
security.protocol = SSL
send.buffer.bytes = 131072
session.timeout.ms = 30
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = null
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = /x/x/client.truststore.jks
ssl.truststore.password = [hidden]
ssl.truststore.type = JKS
value.deserializer = class
org.apache.kafka.common.serialization.StringDeserializer

Please help.

*Thanks & Regards,*
*Shantanu Deshmukh*