Akhilesh Dubey created KAFKA-13635:
--------------------------------------
Summary: Make Consumer Group Protocol resilient to disk issues
with __consumer_offsets
Key: KAFKA-13635
URL: https://issues.apache.org/jira/browse/KAFKA-13635
Project: Kafka
Issue Type: Improvement
Reporter: Akhilesh Dubey
While working with 6.1.1, we experienced offset resets on some consumer groups
after a disk-full issue (the actual underlying cause was an uncontrolled
shutdown of Kafka and of the machine).
When the machine and Kafka brokers were restarted, consumer applications
received {{Found no committed offset for partition <xyz>}}, which triggered an
offset reset; in our case the reset policy was set to earliest:
{{Resetting offset for partition <xyz>}}.
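As a consumer-side illustration of the reset behaviour described above, here is a minimal sketch of consumer properties that use {{auto.offset.reset=none}}. With that setting the consumer raises {{NoOffsetForPartitionException}} when it finds no committed offset, instead of rewinding to earliest. The class name, method name, and deserializer choice are assumptions made for this example; this only mitigates the symptom on the client and is not the broker-side change requested in this issue.

```java
import java.util.Properties;

// Hypothetical helper for illustration; not part of Kafka.
class ConsumerOffsetResetConfig {

    // Builds consumer properties that fail fast when no committed offset is found.
    static Properties failFastConsumerProps(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId);
        // Assumed String keys and values; swap in the deserializers your topics need.
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        // "none": throw to the application instead of resetting to earliest/latest.
        props.setProperty("auto.offset.reset", "none");
        return props;
    }
}
```

The application then has to decide what to do when the exception is raised, for example alert an operator rather than reprocess the whole topic.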
On further investigation, we noticed that {{GroupMetadataManager}} silently
handled an offset load failure, logging only the following error:
ERROR [GroupMetadataManager brokerId=1] Error loading offsets from
__consumer_offsets-33 (kafka.coordinator.group.GroupMetadataManager)
org.apache.kafka.common.errors.CorruptRecordException: Record size 0 is less
than the minimum record overhead (14)
There is nothing wrong with the corruption itself: the uncontrolled shutdown,
and possibly page cache issues, could have prevented data from being flushed to
disk, and GroupCoordinator cannot do much if the offsets themselves are missing.
I would like to request a feature that stops progress, or retries the load, if
a {{__consumer_offsets}} partition fails to load.
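To make the request concrete, here is a rough sketch of the behaviour I have in mind. It is not Kafka broker code: {{OffsetsLoadGuard}}, {{loadWithRetry}}, and the {{State}} enum are hypothetical names, and the load itself is passed in as a plain {{Callable}}. The idea is to retry a failed load a bounded number of times and, if it still fails, report the partition as failed rather than treating it as loaded with no offsets.

```java
import java.util.concurrent.Callable;

// Hypothetical sketch; names and structure are assumptions, not Kafka internals.
class OffsetsLoadGuard {

    enum State { LOADED, FAILED }

    // Runs the load up to maxAttempts times, sleeping backoffMs between attempts.
    // Returns FAILED when every attempt throws, so the caller can refuse to
    // serve offset fetches for this partition instead of answering "no offset".
    static State loadWithRetry(Callable<Void> load, int maxAttempts, long backoffMs) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                load.call();
                return State.LOADED;
            } catch (Exception e) {
                if (attempt == maxAttempts) {
                    break; // out of attempts, fall through to FAILED
                }
                try {
                    Thread.sleep(backoffMs);
                } catch (InterruptedException ie) {
                    // Preserve the interrupt flag and stop retrying.
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return State.FAILED;
    }
}
```

With a {{FAILED}} state available, the coordinator could in principle return an error to consumers for the affected groups until an operator intervenes, which would avoid the silent reset to earliest.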
--
This message was sent by Atlassian Jira
(v8.20.1#820001)