[ https://issues.apache.org/jira/browse/KAFKA-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kamal Chandraprakash resolved KAFKA-8547.
-----------------------------------------
    Resolution: Duplicate

Duplicate of https://issues.apache.org/jira/browse/KAFKA-8335
PR [https://github.com/apache/kafka/pull/6715]

> 2 __consumer_offsets partitions grow very big
> ---------------------------------------------
>
>                 Key: KAFKA-8547
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8547
>             Project: Kafka
>          Issue Type: Bug
>          Components: log cleaner
>    Affects Versions: 2.1.1
>        Environment: Ubuntu 18.04, Kafka 2.1.12-2.1.1, running as a systemd service
>           Reporter: Lerh Chuan Low
>           Priority: Major
>
> It seems the log cleaner does not clean old data from {{__consumer_offsets}} under that topic's default policy of compact. This can eventually cause the disk to fill up or the servers to run out of memory.
> We observed a few out-of-memory errors on our Kafka servers, and our theory was that they were caused by 2 overly large partitions in {{__consumer_offsets}}. On further digging, those 2 large partitions held segments dating back up to 3 months. These old files also accounted for most of the data in those partitions (about 10G of the partition's 12G).
> When we tried dumping those old segments, we saw:
>
> {code:java}
> 1:40 $ ./kafka-run-class.sh kafka.tools.DumpLogSegments --files 00000000161728257775.log --offsets-decoder --print-data-log --deep-iteration
> Dumping 00000000161728257775.log
> Starting offset: 161728257775
> offset: 161728257904 position: 61 CreateTime: 1553457816168 isvalid: true keysize: 4 valuesize: 6 magic: 2 compresscodec: NONE producerId: 367038 producerEpoch: 3 sequence: -1 isTransactional: true headerKeys: [] endTxnMarker: COMMIT coordinatorEpoch: 746
> offset: 161728258098 position: 200 CreateTime: 1553457816230 isvalid: true keysize: 4 valuesize: 6 magic: 2 compresscodec: NONE producerId: 366036 producerEpoch: 3 sequence: -1 isTransactional: true headerKeys: [] endTxnMarker: COMMIT coordinatorEpoch: 761
> ...{code}
> It looks like all of those old segments contain transactional information. (As a side note, it took us a while to figure out that for a batch with the control bit set, the key really is {{endTxnMarker}} and the value is {{coordinatorEpoch}}; in a non-control batch dump it would instead be the key and payload. We wondered whether seeing what those 2 partitions held in their keys might give us any clues.) Our current workaround is based on this comment: https://issues.apache.org/jira/browse/KAFKA-3917?focusedCommentId=16816874&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16816874. We set the cleanup policy to {{compact,delete}}, and the partition very quickly dropped to below 2G. Not sure if this is something the log cleaner should be able to handle normally? Interestingly, other partitions also contain transactional information, so it is quite curious that these 2 specific partitions could not be cleaned.
> There is a related issue here: https://issues.apache.org/jira/browse/KAFKA-3917; it seemed a little outdated/dead, so I opened a new one. Please feel free to merge!
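The workaround described above (switching the topic's cleanup policy to {{compact,delete}}) can be applied with the stock {{kafka-configs.sh}} tool. A minimal sketch, assuming a ZooKeeper-based cluster at {{localhost:2181}} (the usual config path for Kafka 2.1; the address is a placeholder):

```shell
# Add "delete" alongside "compact" so old __consumer_offsets segments
# become eligible for time/size-based retention, not just compaction.
./kafka-configs.sh --zookeeper localhost:2181 \
  --entity-type topics --entity-name __consumer_offsets \
  --alter --add-config cleanup.policy=compact,delete

# Verify the override took effect.
./kafka-configs.sh --zookeeper localhost:2181 \
  --entity-type topics --entity-name __consumer_offsets \
  --describe
```

Note that with {{delete}} enabled, segments older than the topic's retention settings will be removed outright, so this trades the compaction guarantee for bounded disk usage; it is a mitigation rather than a fix for the cleaner bug itself.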
-- This message was sent by Atlassian JIRA (v7.6.3#76005)