Re: Best approach to frequently restarting consumer process
Consumer groups aren't going to handle 'let it crash' particularly well. The same goes for any session-based service, but consumer groups are hit especially hard because a single failure affects the entire group.

That said, 'let it crash' doesn't necessarily have to mean 'don't try to clean up at all'. The consumer group will recover *much* more quickly if you make sure any crash path includes a

    finally {
        consumer.close();
    }

block to do some minimal cleanup. This will cause the consumer to make a best effort to explicitly leave the group, so the rebalance can complete as soon as the rest of the members rejoin. If you don't do this, your rebalances get much more expensive, since the group coordinator has to wait for the session timeout to expire before it considers the crashed member gone. This will probably lead to noticeably longer pauses.

The one drawback to doing this today is that close() can potentially block, so the process may not fail as fast as you want it to. It would be good to get a timeout-based close() implemented as well. That said, the LeaveGroup request *is* best effort, so if the consumer was otherwise in a healthy state, this should be very fast.

All this said, 'let it crash' isn't the same thing as 'constant crashes are ok'. It's a fault recovery methodology, but crashing every 5 minutes isn't what the telecom industry had in mind... If things are crashing that frequently, there is likely a very common bug, memory leak, or similar that can be fixed to significantly reduce the frequency of crashes. 'Let it crash' systems generally also provide a good way to collect debugging information for exactly this purpose.

-Ewen

On Wed, Dec 7, 2016 at 1:38 AM, Harald Kirsch wrote:
> With 'restart' I mean a 'let it crash' setup (as promoted by Erlang and
> Akka, e.g. http://doc.akka.io/docs/akka/snapshot/intro/what-is-akka.html).
> The consumer gets in trouble due to an OOM or a runaway computation or
> whatever that we want to preempt somehow. It crashes or gets killed
> externally.
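A minimal Java sketch of the crash-path cleanup described above, under some assumptions: `GroupConsumer` is a hypothetical stand-in for the few `KafkaConsumer` calls involved (so the example runs without a broker), and `closeWithTimeout` is a hypothetical helper that caps how long a blocking close() can delay the crash, since the close() discussed here takes no timeout. The `finally` only runs when the failure surfaces inside the loop's own thread; a process killed externally with SIGKILL never reaches it, and the coordinator then falls back to the session timeout.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for the KafkaConsumer calls used here (not the real API).
interface GroupConsumer {
    void subscribe(String topic);
    void poll();   // fetch and process one batch (simplified)
    void close();  // best-effort LeaveGroup; may block
}

final class CrashPath {
    private CrashPath() {}

    /** Poll loop that always attempts a bounded close(), however the loop ends. */
    static void runLoop(GroupConsumer consumer, String topic, long closeTimeoutMs) {
        try {
            consumer.subscribe(topic);
            while (true) {
                consumer.poll(); // any exception or Error thrown here is the "crash"
            }
        } finally {
            // Runs for exceptions and Errors alike, giving the consumer a chance
            // to leave the group instead of making the coordinator wait out the
            // session timeout. Assumes the loop thread is no longer inside a
            // consumer call, so handing close() to another thread is safe.
            closeWithTimeout(consumer, closeTimeoutMs);
        }
    }

    /** Returns true if close() finished within timeoutMs, false otherwise. */
    static boolean closeWithTimeout(GroupConsumer consumer, long timeoutMs) {
        ExecutorService closer = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "consumer-closer");
            t.setDaemon(true); // a stuck close() must not keep the dying JVM alive
            return t;
        });
        try {
            Future<?> done = closer.submit(consumer::close);
            done.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException | ExecutionException e) {
            return false; // close() hung or failed: give up and let the process die
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            closer.shutdownNow(); // interrupts a close() that is still blocked
        }
    }
}
```

The crash still propagates to whatever supervises the process; the only change is that the group gets told about it first. Newer client releases added a `close(Duration)` overload to `KafkaConsumer`, which would make the executor wrapper unnecessary.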
Re: Best approach to frequently restarting consumer process
With 'restart' I mean a 'let it crash' setup (as promoted by Erlang and Akka, e.g. http://doc.akka.io/docs/akka/snapshot/intro/what-is-akka.html). The consumer gets in trouble due to an OOM or a runaway computation or whatever that we want to preempt somehow. It crashes or gets killed externally.

So whether close() is called or not in the dying process, I don't know. But clearly subscribe() is called after a restart.

I understand that we are out of luck with this. We would have to separate the crashing part out into a different operating system process and keep the consumer running all the time. :-(

Thanks for the insight
Harald

On 06.12.2016 19:26, Gwen Shapira wrote:
> Can you clarify what you mean by "restart"?
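Harald's fallback, moving the crash-prone work into a separate OS process while the consumer process stays up, could be sketched with the JDK's `ProcessBuilder`. `IsolatedWorker` and `runIsolated` are hypothetical names, the command line is a placeholder for whatever program processes one batch, and handing the batch to the child (via stdin or a temp file, say) is left out. The point of the design is that the child, not the consumer's JVM, is what runs out of memory or gets killed, so the consumer's group membership is unaffected.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: runs one unit of crash-prone work in a child process.
final class IsolatedWorker {
    enum Outcome { OK, FAILED, KILLED }

    private IsolatedWorker() {}

    static Outcome runIsolated(List<String> command, long timeoutMs) {
        Process child;
        try {
            child = new ProcessBuilder(command).inheritIO().start();
        } catch (IOException e) {
            throw new UncheckedIOException("could not start worker process", e);
        }
        try {
            if (!child.waitFor(timeoutMs, TimeUnit.MILLISECONDS)) {
                child.destroyForcibly(); // preempt the runaway computation
                child.waitFor();         // reap it before reporting
                return Outcome.KILLED;
            }
            // A crash (including an OOM) in the child shows up as a non-zero exit.
            return child.exitValue() == 0 ? Outcome.OK : Outcome.FAILED;
        } catch (InterruptedException e) {
            child.destroyForcibly();
            Thread.currentThread().interrupt();
            return Outcome.KILLED;
        }
    }
}
```

If the consumer thread waits on the child directly, the timeout has to stay well below the limits on how long the consumer may go without polling, or the consumer will be dropped from the group anyway.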
Re: Best approach to frequently restarting consumer process
Can you clarify what you mean by "restart"? If you call consumer.close() and consumer.subscribe() you will definitely trigger a rebalance.

It doesn't matter if it's the "same consumer knocking"; we already rebalance when you call consumer.close(). Since we want both consumer.close() and consumer.subscribe() to trigger a rebalance immediately (and not wait for the heartbeat), I don't think we'll be changing their behavior.

Depending on why consumers need to restart, I'm wondering if you can restart other threads in your application but keep the consumer up and running to avoid the rebalances.

On Tue, Dec 6, 2016 at 7:18 AM, Harald Kirsch wrote:
> We have consumer processes which need to restart frequently, say, every 5
> minutes.

--
Gwen Shapira
Product Manager | Confluent
650.450.2760 | @gwenshap
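Gwen's suggestion, keeping the consumer up and restarting only the threads behind it, might look roughly like this sketch. `RestartableWorker` is a hypothetical class: the consumer thread hands each polled batch to a single worker thread and waits up to a deadline; if the batch throws or overruns, the worker executor is discarded and replaced while the consumer itself keeps polling, so the group sees no restart at all.

```java
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;

// Hypothetical supervisor: "let it crash" applies to the worker, not the consumer.
final class RestartableWorker {
    private ExecutorService worker = newWorker();
    private int restarts = 0;

    private static ExecutorService newWorker() {
        return Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "batch-worker");
            t.setDaemon(true); // abandoned workers must not block JVM exit
            return t;
        });
    }

    /** Called from the consumer thread between poll() calls. */
    boolean process(List<String> batch, Consumer<List<String>> handler, long deadlineMs) {
        Future<?> result = worker.submit(() -> handler.accept(batch));
        try {
            result.get(deadlineMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException | ExecutionException e) {
            restartWorker(result); // batch overran its deadline or threw
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            restartWorker(result);
            return false;
        }
    }

    private void restartWorker(Future<?> failed) {
        failed.cancel(true);   // only interrupts; code that ignores interrupts keeps running
        worker.shutdownNow();  // abandon the old thread instead of waiting for it
        worker = newWorker();  // fresh thread for the next batch; the consumer never left
        restarts++;
    }

    int restarts() { return restarts; }
}
```

One limitation to state plainly: Java cannot forcibly stop a thread, so `cancel(true)` only interrupts it, and a runaway computation that ignores interrupts keeps using CPU and memory in the abandoned thread. That is the case where moving the work into a separate process is the stronger option. The deadline also has to stay comfortably below the consumer's `max.poll.interval.ms`, since the consumer thread is blocked while it waits.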
Best approach to frequently restarting consumer process
We have consumer processes which need to restart frequently, say, every 5 minutes. We have 10 of them, so we are facing two restarts every minute on average.

1) It seems that nearly every time a consumer restarts, the group is rebalanced, even if the restart takes less than the heartbeat interval.

2) My guess is that the group manager just cannot know that the same consumer is knocking at the door again.

Are my suspicions (1) and (2) correct? Is there a chance to fix this such that a restart within the heartbeat interval does not lead to a rebalance? Would a well-defined client.id help?

Regards
Harald
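The arithmetic behind "two restarts every minute", writing N for the number of consumers and T for each one's restart interval, and assuming the restarts are spread out evenly:

```latex
r = \frac{N}{T} = \frac{10}{5\ \text{min}} = 2\ \text{restarts/min},
\qquad
\bar{\Delta} = \frac{1}{r} = \frac{60\ \text{s}}{2} = 30\ \text{s}
```

So if nearly every restart triggers a rebalance, as observed in (1), the group goes through one roughly every 30 seconds.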