[ https://issues.apache.org/jira/browse/KAFKA-13575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469101#comment-17469101 ]
Alok Kumar Singh edited comment on KAFKA-13575 at 1/5/22, 8:08 AM:
-------------------------------------------------------------------

If someone picks this up, we have a running cluster available to reproduce the issue and provide more information to help fix it faster. We suspect this configuration path is not well tested, since most people either use compacted topics or set log.retention.ms=-1 rather than log.retention.hours=-1. If you can give some pointers, I will also try to look through the code to debug this. We have a pending RCA for a production downtime and need this soon.


Unwanted cleanup since log.retention.hours=-1 does not work
-----------------------------------------------------------

Key: KAFKA-13575
URL: https://issues.apache.org/jira/browse/KAFKA-13575
Project: Kafka
Issue Type: Bug
Components: log cleaner
Affects Versions: 2.6.0
Environment: AWS, Kubernetes cluster, Kafka run as strimzi pods.
Reporter: Alok Kumar Singh
Priority: Critical
Attachments: Screenshot 2022-01-02 at 4.22.50 PM.png, Screenshot 2022-01-02 at 4.30.38 PM.png

Use case: We never want topic data to be deleted, so we set `log.retention.hours=-1` to make the topic's retention infinite.

Kafka's cleanup was still triggered, however, and it deleted our topic's data, moving the oldest offset from 0 to a higher value. Below are logs from Kafka applying the default 7-day retention (604800000 ms) and moving the offsets because of deleted segments. The logs are filled with "due to {*}retention time 604800000ms breach{*}" messages when the infinite retention from the configured value should have applied.
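As we understand it, the broker derives its effective time-based retention from three overlapping properties, in order of precedence: log.retention.ms, then log.retention.minutes, then log.retention.hours, with a negative result meaning "retain forever". The sketch below only illustrates that assumed precedence for anyone debugging this; the class and method names are made up, and this is not the actual broker (KafkaConfig/LogConfig) code.
{code:java}
// Illustrative sketch of the assumed retention precedence; not Kafka source code.
final class RetentionResolverSketch {

    // Resolve the effective retention in ms from the three overlapping properties.
    // Null means "not set"; a negative result is treated as infinite retention.
    static long resolveRetentionMs(Long retentionMs, Integer retentionMinutes, Integer retentionHours) {
        long millis;
        if (retentionMs != null) {
            millis = retentionMs;                          // log.retention.ms wins if set
        } else if (retentionMinutes != null) {
            millis = retentionMinutes * 60L * 1000L;       // else log.retention.minutes
        } else {
            millis = retentionHours * 60L * 60L * 1000L;   // else log.retention.hours
        }
        return millis < 0 ? -1L : millis;                  // negative => no time-based cleanup
    }

    public static void main(String[] args) {
        // Our setup: only log.retention.hours=-1 is set -> expect -1 (infinite).
        System.out.println(resolveRetentionMs(null, null, -1));
        // Stock defaults: log.retention.hours=168 -> expect 604800000 (7 days).
        System.out.println(resolveRetentionMs(null, null, 168));
    }
}
{code}
If that precedence holds, log.retention.hours=-1 on its own should already resolve to infinite retention, which is why the 604800000ms breach messages below look wrong to us.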
*Logs* of the Kafka Pod that has the cleanup traces:
{code:java}
2021-12-21 15:28:20,891 INFO [Log partition=ts.praccomm.patientsms-0, dir=/var/lib/kafka/data/kafka-log2] Found deletable segments with base offsets [0,2214146,4890534,7645992,10394781,13149705,15798698,18084408,20638290,23202704,25760900,28207227,30796134,33437632,35763142,38247977,40861013,43473617,46086956,48469054,50776433,53050704,55299660,57529943,59743683,61984715,64224317,66458910,68670877,70886848,73080554,74872031,76656553,78573653,80412933,82457759,84854900,89950485,94263626,98668311,103762764,107797683,111809208,113716544,115406491,117052453,119304884,121765245,123536361,125364537,127191439,129028759,130902041,132719344,134787234,139027470,143932296,147888638,150492879,152410799,154338596,156416190,160672948,165054157,169273069,171824978,173847615,175602822,177765267,179701522,181622192,183511328,185382862,187200966,188998652,193646877,198500489,203354616,208211378,210992397,212806790,214599592,216456221,218231836,220014080,221812337,223574882,225341294,227151134,228894227,231000663,232855550,234554788,236304441,238012477,239523793,241164599,242938200,244945345,246534386,248345745,249972929,251761034,253511669,255180052,256960087,258715796,260382401,262123766,263900146,265585895,267349067,269106101,270936370,272601164,274416555,276154378,277854987,279656300,281487544,283272481,285126550,286886989,288671958,290389171,292132610,293756088,295527031,297230008,298992929,300641108,302415673,304138766,305900851,307542789,309368805,311075952,312729042,314516564,316294429,318006961,319760277,321426916,323177673,324845934,326606764,328385019,330117277,331930761,333729477,335525038,337253962,339046341,340756170,342402991,344019861,345789681,347575987,349276417,350912980,352675272,354444307,356157225,357960082,359616551,361418905,363132123,364932894,366638881,368368450] due to retention time 604800000ms breach (kafka.log.Log) [kafka-scheduler-2]
2021-12-23 02:28:18,124 INFO [Log partition=ts.praccomm.patientsms-0, dir=/var/lib/kafka/data/kafka-log2] Found deletable segments with base offsets [370099688] due to retention time 604800000ms breach (kafka.log.Log) [kafka-scheduler-4]
2021-12-24 15:03:18,125 INFO [Log partition=ts.praccomm.patientsms-0, dir=/var/lib/kafka/data/kafka-log2] Found deletable segments with base offsets [371913024] due to retention time 604800000ms breach (kafka.log.Log) [kafka-scheduler-1]
2021-12-26 13:33:18,124 INFO [Log partition=ts.praccomm.patientsms-0, dir=/var/lib/kafka/data/kafka-log2] Found deletable segments with base offsets [373561655] due to retention time 604800000ms breach (kafka.log.Log) [kafka-scheduler-2]
2021-12-28 06:18:18,124 INFO [Log partition=ts.praccomm.patientsms-0, dir=/var/lib/kafka/data/kafka-log2] Found deletable segments with base offsets [375346988] due to retention time 604800000ms breach (kafka.log.Log) [kafka-scheduler-9]
2021-12-29 20:18:18,125 INFO [Log partition=ts.praccomm.patientsms-0, dir=/var/lib/kafka/data/kafka-log2] Found deletable segments with base offsets [377066152] due to retention time 604800000ms breach (kafka.log.Log) [kafka-scheduler-2]
2021-12-31 12:18:18,124 INFO [Log partition=ts.praccomm.patientsms-0, dir=/var/lib/kafka/data/kafka-log2] Found deletable segments with base offsets [378821599] due to retention time 604800000ms breach (kafka.log.Log) [kafka-scheduler-2]
{code}

*Listing out Configurations to prove we are setting the right values*

*Broker Configurations:*
{code:java}
$ ./bin/kafka-configs.sh --command-config ./bin/client-ssl-auth.properties --bootstrap-server XXX:9094 --entity-type brokers --entity-name 2 --describe --all | grep retention
(org.apache.kafka.clients.admin.AdminClientConfig)
log.cleaner.delete.retention.ms=86400000 sensitive=false synonyms={DEFAULT_CONFIG:log.cleaner.delete.retention.ms=86400000}
log.retention.hours=-1 sensitive=false synonyms={STATIC_BROKER_CONFIG:log.retention.hours=-1, DEFAULT_CONFIG:log.retention.hours=168}
log.retention.minutes=null sensitive=false synonyms={}
offsets.retention.check.interval.ms=600000 sensitive=false synonyms={DEFAULT_CONFIG:offsets.retention.check.interval.ms=600000}
log.retention.check.interval.ms=300000 sensitive=false synonyms={DEFAULT_CONFIG:log.retention.check.interval.ms=300000}
log.retention.ms=null sensitive=false synonyms={}
log.retention.bytes=-1 sensitive=false synonyms={DEFAULT_CONFIG:log.retention.bytes=-1}
offsets.retention.minutes=525600 sensitive=false synonyms={STATIC_BROKER_CONFIG:offsets.retention.minutes=525600, DEFAULT_CONFIG:offsets.retention.minutes=10080}
{code}

*I also checked that the other brokers have the same configuration:*
{code:java}
./bin/kafka-configs.sh --command-config ./bin/client-ssl-auth.properties --bootstrap-server XXX --entity-type brokers --entity-name 0 --describe --all | grep log.retention
log.retention.hours=-1 sensitive=false synonyms={STATIC_BROKER_CONFIG:log.retention.hours=-1, DEFAULT_CONFIG:log.retention.hours=168}
{code}

*Topic Configuration:*
{code:java}
$ ./bin/kafka-configs.sh --command-config ./bin/client-ssl-auth.properties --bootstrap-server XXXX:9094 --entity-type topics --entity-name ts.praccomm.patientsms --describe --all
(org.apache.kafka.clients.admin.AdminClientConfig)
All configs for topic ts.praccomm.patientsms are:
compression.type=producer sensitive=false synonyms={DEFAULT_CONFIG:compression.type=producer}
leader.replication.throttled.replicas= sensitive=false synonyms={}
min.insync.replicas=1 sensitive=false synonyms={DEFAULT_CONFIG:min.insync.replicas=1}
message.downconversion.enable=true sensitive=false synonyms={DEFAULT_CONFIG:log.message.downconversion.enable=true}
segment.jitter.ms=0 sensitive=false synonyms={}
cleanup.policy=delete sensitive=false synonyms={DEFAULT_CONFIG:log.cleanup.policy=delete}
flush.ms=9223372036854775807 sensitive=false synonyms={}
follower.replication.throttled.replicas= sensitive=false synonyms={}
segment.bytes=1073741824 sensitive=false synonyms={DEFAULT_CONFIG:log.segment.bytes=1073741824}
retention.ms=-1 sensitive=false synonyms={}
flush.messages=9223372036854775807 sensitive=false synonyms={DEFAULT_CONFIG:log.flush.interval.messages=9223372036854775807}
message.format.version=2.6-IV0 sensitive=false synonyms={STATIC_BROKER_CONFIG:log.message.format.version=2.6, DEFAULT_CONFIG:log.message.format.version=2.6-IV0}
max.compaction.lag.ms=9223372036854775807 sensitive=false synonyms={DEFAULT_CONFIG:log.cleaner.max.compaction.lag.ms=9223372036854775807}
file.delete.delay.ms=60000 sensitive=false synonyms={DEFAULT_CONFIG:log.segment.delete.delay.ms=60000}
max.message.bytes=1572864 sensitive=false synonyms={DYNAMIC_BROKER_CONFIG:message.max.bytes=1572864, STATIC_BROKER_CONFIG:message.max.bytes=1572864, DEFAULT_CONFIG:message.max.bytes=1048588}
min.compaction.lag.ms=0 sensitive=false synonyms={DEFAULT_CONFIG:log.cleaner.min.compaction.lag.ms=0}
message.timestamp.type=CreateTime sensitive=false synonyms={DEFAULT_CONFIG:log.message.timestamp.type=CreateTime}
preallocate=false sensitive=false synonyms={DEFAULT_CONFIG:log.preallocate=false}
index.interval.bytes=4096 sensitive=false synonyms={DEFAULT_CONFIG:log.index.interval.bytes=4096}
min.cleanable.dirty.ratio=0.5 sensitive=false synonyms={DEFAULT_CONFIG:log.cleaner.min.cleanable.ratio=0.5}
unclean.leader.election.enable=false sensitive=false synonyms={DEFAULT_CONFIG:unclean.leader.election.enable=false}
retention.bytes=-1 sensitive=false synonyms={DEFAULT_CONFIG:log.retention.bytes=-1}
delete.retention.ms=86400000 sensitive=false synonyms={DEFAULT_CONFIG:log.cleaner.delete.retention.ms=86400000}
segment.ms=604800000 sensitive=false synonyms={}
message.timestamp.difference.max.ms=9223372036854775807 sensitive=false synonyms={DEFAULT_CONFIG:log.message.timestamp.difference.max.ms=9223372036854775807}
segment.index.bytes=10485760 sensitive=false synonyms={DEFAULT_CONFIG:log.index.size.max.bytes=10485760}
{code}

With log.retention.hours set to -1, we expected log cleanup not to be triggered, but Kafka still applied the default 7-day retention and deleted the segments. The cleanups are still happening, since we have left the configuration unchanged so that the cause can be debugged. Let me know if you need more details. Please look into this.
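A possible interim mitigation, which we have not verified, would be to set the millisecond-level retention to -1 explicitly, since that variant is documented as disabling time-based retention. The commands below reuse the placeholder bootstrap server and client properties file from the outputs above and are only a suggestion, not a confirmed fix:
{code:java}
# Per-topic override (takes precedence over broker-level settings):
./bin/kafka-configs.sh --command-config ./bin/client-ssl-auth.properties \
  --bootstrap-server XXX:9094 --entity-type topics --entity-name ts.praccomm.patientsms \
  --alter --add-config retention.ms=-1

# Cluster-wide dynamic default for all brokers:
./bin/kafka-configs.sh --command-config ./bin/client-ssl-auth.properties \
  --bootstrap-server XXX:9094 --entity-type brokers --entity-default \
  --alter --add-config log.retention.ms=-1
{code}
Note that the topic output above already reports retention.ms=-1, so this would mainly force an explicit override instead of relying on the value derived from log.retention.hours.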