Ben Stopford created KAFKA-2909:
-----------------------------------
Summary: Another Instance of Gap in Consumption after Restart
Key: KAFKA-2909
URL: https://issues.apache.org/jira/browse/KAFKA-2909
Project: Kafka
Issue Type: Sub-task
Reporter: Ben Stopford
This seems very similar to KAFKA-2891, reported by Rajini.
*Context*
The context is the Security Rolling Upgrade test with a 30s consumer timeout.
There was a 2s sleep between restarts, and throughput was limited to 1000
messages per second.
*Failure*
At least one acked message did not appear in the consumed messages.
acked_minus_consumed: set(36802, 36804, 36805, 36807, 36808, 36810, 36811,
64403, 64406, 64409, 36799)
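The check above is a set difference: values the producer saw acknowledged, minus values the consumer actually read. Below is a minimal Python sketch. It assumes both sides are collected as integer values. The function name and the toy input are hypothetical and do not come from the system-test harness.

```python
def acked_minus_consumed(acked, consumed):
    """Return values that were acknowledged by the broker but never consumed."""
    # Anything acked that the consumer never saw counts as lost data
    # from the test's point of view.
    return set(acked) - set(consumed)


# Hypothetical toy input: 3 and 5 were acked but never consumed.
missing = acked_minus_consumed([1, 2, 3, 4, 5], [1, 2, 4])
print(sorted(missing))  # [3, 5]
```

An empty result means every acknowledged value was consumed. The failure above corresponds to a non-empty result.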
The missing data was correctly written to the Kafka data files:
{quote}
value 36802 -> partition 1, offset: 12216
kafka/bin/kafka-run-class.sh kafka.tools.DumpLogSegments --files worker7/kafka-data-logs/test_topic-1/00000000000000000000.log | grep 'offset: 12216'
-> offset: 12216 position: 374994 isvalid: true payloadsize: 5 magic: 0 compresscodec: NoCompressionCodec crc: 3001177408
This record is present in all three data files, so the data is there.
{quote}
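The remaining missing offsets can be checked the same way by parsing the record line that DumpLogSegments prints into fields. Below is a minimal Python sketch. It assumes the `key: value` layout shown in the quote above, which may differ between Kafka versions. The helper name `parse_dump_line` is hypothetical.

```python
import re

# Matches "key: value" pairs such as "offset: 12216" or "isvalid: true".
FIELD_RE = re.compile(r"(\w+):\s*(\S+)")


def parse_dump_line(line):
    """Parse one DumpLogSegments record line into a dict of string fields."""
    return dict(FIELD_RE.findall(line))


line = ("offset: 12216 position: 374994 isvalid: true payloadsize: 5 magic: 0 "
        "compresscodec: NoCompressionCodec crc: 3001177408")
fields = parse_dump_line(line)
print(fields["offset"], fields["isvalid"])  # 12216 true
```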
The first missing value was written at 20:42:30,185, around the time the third
node went down.
The writes that were never consumed coincide with the consumer logging
NOT_COORDINATOR_FOR_GROUP and "Marking the coordinator" messages. However, these
messages appear many times over a long period, so it is hard to conclude that
they are the cause, or that they correlate specifically with this error.
*Timeline*
{quote}
grep -r 'shutdown complete' *
20:42:06,132 - Node 1 shutdown completed
20:42:18,560 - Node 2 shutdown completed
20:42:30,185 - *Writes that never make it are written by producer*
20:42:31,164 - Node 3 shutdown completed
20:42:57,872 - Node 1 shutdown completed
…
{quote}
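From the timeline timestamps, the first missing write lands just under one second before node 3 finishes shutting down and about 11.6s after node 2 does. The Python sketch below shows that arithmetic. It assumes all timestamps fall on the same day and come from comparable clocks: the log lines carry no date, and clock skew between workers is not accounted for.

```python
from datetime import datetime

# Log timestamps in the form HH:MM:SS,mmm (no date component).
FMT = "%H:%M:%S,%f"


def parse_ts(ts):
    """Parse an 'HH:MM:SS,mmm' log timestamp; %f treats '185' as 185 ms."""
    return datetime.strptime(ts, FMT)


first_missing_write = parse_ts("20:42:30,185")
node2_shutdown = parse_ts("20:42:18,560")
node3_shutdown = parse_ts("20:42:31,164")

before_node3 = (node3_shutdown - first_missing_write).total_seconds()
after_node2 = (first_missing_write - node2_shutdown).total_seconds()
print(before_node3, after_node2)  # 0.979 11.625
```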
All logs for this incident are attached.