paul mackles created KAFKA-1063:
-----------------------------------
Summary: run log cleanup at startup
Key: KAFKA-1063
URL: https://issues.apache.org/jira/browse/KAFKA-1063
Project: Kafka
Issue Type: Bug
Components: core
Affects Versions: 0.8
Reporter: paul mackles
Priority: Minor
Jun suggested I file this ticket to have the brokers run log cleanup at
startup. Here is the scenario that precipitated it:
We ran into a situation on our dev cluster (3 nodes, v0.8) where we ran out of
disk space on one of the nodes. As expected, the broker shut itself down and
all of the clients switched over to the other nodes. So far so good.
To free up disk space, I reduced log.retention.hours to something more
manageable (from 172 to 12). I did this on all 3 nodes. Since the other 2
nodes were running OK, I first tried to restart the node that had run out of
disk. Unfortunately, it kept shutting itself down due to the full disk. From
the logs, I think this was because it was trying to sync up the replicas it
was responsible for and of course couldn't due to the lack of disk space. My
hope was that upon restart, it would see the new retention settings and free
up a bunch of disk space before trying to do any syncs.
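For reference, the retention change described above amounts to a one-line edit
to each broker's server.properties (the 12-hour value matches what I used):

```properties
# server.properties: shrink the retention window so old segments
# become eligible for deletion
log.retention.hours=12
```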
I then went and restarted the other 2 nodes. They both picked up the new
retention settings and freed up a bunch of storage as a result. I then went
back and tried to restart the 3rd node, but to no avail. It still had
problems with the full disk.
I thought about trying to reassign partitions so that the node in question had
less to manage, but that turned out to be a hassle, so I wound up manually
deleting some of the old log/segment files. The broker seemed to come back
fine after that, but that's not something I would want to do on a production
server.
We obviously need better monitoring/alerting to avoid this situation
altogether, but I am wondering if the order of operations at startup
could/should be changed to better account for scenarios like this. Or maybe a
utility to remove old logs after changing the TTL? Did I miss a better way to
handle this?
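To make the "utility" idea concrete, here is a rough sketch of what I did by
hand, written as a script that only *lists* segments older than the retention
window (you would review the output before deleting anything). The directory
layout, variable names, and the demo files are assumptions for illustration,
not real Kafka settings:

```shell
#!/bin/sh
# Hypothetical sketch: find Kafka segment files older than the retention
# window. Variable names are made up for this example.
RETENTION_HOURS=12

# Demo data dir standing in for the broker's log directory, with one
# stale segment (mtime forced back to Jan 2020) and one fresh segment.
LOG_DIR=$(mktemp -d)
mkdir -p "$LOG_DIR/mytopic-0"
touch -t 202001010000 "$LOG_DIR/mytopic-0/00000000000000000000.log"  # old
touch "$LOG_DIR/mytopic-0/00000000000000000042.log"                  # fresh

# List segments whose mtime is older than RETENTION_HOURS; one would add
# -delete only after eyeballing this output.
find "$LOG_DIR" -type f -name '*.log' -mmin +"$((RETENTION_HOURS * 60))"
```

This prints only the stale segment's path. It is a blunt instrument compared
to having the broker itself run cleanup at startup, since it goes by file
mtime rather than the broker's own retention logic.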
Original email thread is here:
http://mail-archives.apache.org/mod_mbox/kafka-users/201309.mbox/%3cce6365ae.82d66%[email protected]%3e