Hi Jason,

Thanks for writing up a proposal (and a thorough one)! I had been thinking about this problem this week too, as I have run into it more than a handful of times now.

I like the idea of having a larger processing timeout. That timeout, in unison with max.poll.records, should in many cases provide reasonable assurance that the consumer will stay alive.

In rejected alternatives, "Add a separate API the user can call to indicate liveness" is listed. I think a heartbeat API could be added along with these new timeout configurations and used for "advanced" use cases where the processing time is highly variable and less predictable. I think one place where we might use the heartbeat API in Kafka is MirrorMaker.

Today, I have seen people trying to find ways to leverage the existing API to "force" heartbeats by:

1. Call poll to get the batch of records to process
2. Call pause on all partitions
3. Process the record batch
3a. While processing, periodically call poll (which is essentially just a heartbeat, since the partitions are paused and it returns no records)
4. Commit offsets and un-pause
5. Repeat from 1

Thanks,
Grant

On Wed, May 25, 2016 at 6:32 PM, Jason Gustafson <ja...@confluent.io> wrote:

> Hi All,
>
> One of the persistent problems we see with the new consumer is the use of
> the session timeout in order to ensure progress. Whenever there is a delay
> in message processing which exceeds the session timeout, no heartbeats can
> be sent and the consumer is removed from the group. We seem to hit this
> problem everywhere the consumer is used (including Kafka Connect and Kafka
> Streams) and we don't always have a great solution. I've written a KIP to
> address this problem here:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-62%3A+Allow+consumer+to+send+heartbeats+from+a+background+thread
> .
> Have a look and let me know what you think.
>
> Thanks,
> Jason
>

--
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke