All,
I'm encountering a situation on a fast machine where the Kafka log
aggregation topic is not empty when the system shuts down. The scenario:
* log consumer consumes all messages
* consumer sleeps (500ms) due to empty queue
* containers exit, posting /final log messages/ about why
* controller notices containers are down and terminates consumers.
* consumer is interrupted from sleep but has already been canceled, so
it does not get the rest of the messages (sketched below).
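Roughly, the pattern I am seeing is this (a simplified sketch of the
behavior above; fetchMessages() and handleMessages() are placeholders,
not Twill's actual SimpleKafkaConsumer code):

    // Simplified sketch of the consume loop behavior described above;
    // fetchMessages() and handleMessages() are placeholders.
    while (!Thread.currentThread().isInterrupted()) {
      List<String> messages = fetchMessages();      // placeholder fetch
      if (messages.isEmpty()) {
        try {
          TimeUnit.MILLISECONDS.sleep(500);         // back off on empty queue
        } catch (InterruptedException e) {
          // The controller cancels the consumer at this point, so any
          // final messages the containers published while we slept are
          // never fetched or forwarded.
          Thread.currentThread().interrupt();
          break;
        }
        continue;
      }
      handleMessages(messages);                     // placeholder delivery
    }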
This scenario can be really confusing during development because an
error may be missed (as in my case) if it falls into the /final log
messages/. Before I file a ticket and fix this, I wanted to get some
feedback. Looking at
org.apache.twill.internal.kafka.client.SimpleKafkaConsumer, it seems this
behavior could be intentional, given this log message (line 384):
LOG.debug("Unable to fetch messages on {}, kafka consumer
service shutdown is in progress.", topicPart);
My opinion is that final messages logged by a container are likely to be
critical in diagnosing errors, and that Twill should do whatever it can
to forward them before shutting things down. If there is agreement on
this, I'll file a ticket and fix it. My general approach would be to
indicate to the consumer that it is in a shuttingDown state, which it
would use to break from the consume loop once the message set is
empty. If this makes sense, would we need to support a timeout for the
maximum amount of time to be in this state before punting on the rest of
the messages? My instinct is no, get them all, but given the way the
code is set up now, perhaps there are good reasons to time out.
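To make that concrete, the shape of the change I have in mind is roughly
this (hypothetical names, just a sketch of the idea, not actual Twill
code):

    // Hypothetical sketch of the proposed drain-on-shutdown behavior;
    // field and method names are mine, not Twill's.
    private volatile boolean shuttingDown = false;

    // Called by the controller when containers are down, instead of
    // cancelling the consumer outright.
    void beginShutdown() {
      shuttingDown = true;
    }

    void consumeLoop() {
      while (true) {
        List<String> messages = fetchMessages();    // placeholder fetch
        if (!messages.isEmpty()) {
          handleMessages(messages);                 // placeholder delivery
          continue;                                 // keep draining
        }
        if (shuttingDown) {
          // Queue is empty and shutdown was requested: the final log
          // messages have been forwarded, so it is safe to stop.
          break;
        }
        sleepQuietly(500);                          // back off and retry
      }
    }

If a drain timeout did turn out to be necessary, the shuttingDown check
could also compare against a deadline recorded in beginShutdown(), but
as I said above, my instinct is to drain everything.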
Thanks,
Martin Serrano