[ https://issues.apache.org/jira/browse/KAFKA-12560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rufus Skyfii updated KAFKA-12560: --------------------------------- Description: These two log files have 0 byte size and last created modified or updated along time ago and these files seem to be created on startup. When disks fill up, sometimes admins go through old log files and delete them. In this case, these two .log were also picked up. Note that I acknowledge this is a mistake as it should be filtered by extension but nevertheless anything in /var/log should not cause process to have issues and if the file doesn't exist, the process should simply recreate it. The observed effect of this on a HA kafka cluster with 3 nodes, the replica members were out of sync with just one member being leader and not in sync with rest of the nodes while some topics had two in sync instead of 3 and the leader had a -1 status. Restarting kafka did not resolve so in order to resolve, kafka was flushed in one of the node where it had lots of -1 leader status and then rest of the nodes restarted. However, this should not happen even if someone deleted kafka-authorizer.log and kafka-request.log and should be more resistant of breaking down due to file system changes. Issue occurs several hours later. {code:java} root@ip-172-17-1-63:/var/log/kafka# find /var/log/kafka -mtime +30 -print | xargs ls -ll |grep '.log$' -rw-r--r-- 1 root root 0 Nov 26 05:38 /var/log/kafka/kafka-authorizer.log -rw-r--r-- 1 root root 0 Nov 26 05:38 /var/log/kafka/kafka-request.log {code} {code:java} root@ip-172-17-1-63:/var/log/kafka# stat kafka-request.log File: kafka-request.log Size: 0 Blocks: 0 IO Block: 4096 regular empty file Device: 34h/52d Inode: 768131 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2020-11-26 05:38:04.724561118 +0000 Modify: 2020-11-26 05:38:04.724561118 +0000 Change: 2020-11-26 05:38:04.724561118 +0000 Birth: - root@ip-172-17-1-63:/var/log/kafka# stat kafka-authorizer.log File: kafka-authorizer.log Size: 0 Blocks: 0 IO Block: 4096 regular empty file Device: 34h/52d Inode: 768132 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2020-11-26 05:38:04.724561118 +0000 Modify: 2020-11-26 05:38:04.724561118 +0000 Change: 2020-11-26 05:38:04.724561118 +0000 Birth: - {code} Checking file handles in staging, I can see these files while not updated in 30 days are part of java process and the pid was current: {code:java} root@ip-172-17-1-63:/var/log/kafka# lsof kafka-authorizer.log COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME java 28 root 86w REG 0,52 0 768132 kafka-authorizer.log root@ip-172-17-1-63:/var/log/kafka# lsof kafka-request.log COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME java 28 root 85w REG 0,52 0 768131 kafka-request.log root@ip-172-17-1-63:/var/log/kafka# ps -ef | grep java root 28 11 83 20:12 ? 02:32:37 /usr/local/openjdk-11/bin/java -Xmx2G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -Djava.awt.headless=true -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=9999 -Dkafka.logs.dir=/var/log/kafka -Dlog4j.configuration=file:/opt/kafka_2.12-2.2.1/bin/../config/log4j.properties -cp /opt/kafka_2.12-2.2.1/bin/../libs/activation-1.1.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/aopalliance-repackaged-2.5.0-b42.jar:/opt/kafka_2.12-2.2.1/bin/../libs/argparse4j-0.7.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/audience-annotations-0.5.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/commons-lang3-3.8.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/connect-api-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/connect-basic-auth-extension-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/connect-file-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/connect-json-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/connect-runtime-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/connect-transforms-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/guava-20.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/hk2-api-2.5.0-b42.jar:/opt/kafka_2.12-2.2.1/bin/../libs/hk2-locator-2.5.0-b42.jar:/opt/kafka_2.12-2.2.1/bin/../libs/hk2-utils-2.5.0-b42.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jackson-annotations-2.9.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jackson-core-2.9.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jackson-databind-2.9.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jackson-datatype-jdk8-2.9.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jackson-jaxrs-base-2.9.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jackson-jaxrs-json-provider-2.9.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jackson-module-jaxb-annotations-2.9.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/javassist-3.22.0-CR2.jar:/opt/kafka_2.12-2.2.1/bin/../libs/javax.annotation-api-1.2.jar:/opt/kafka_2.12-2.2.1/bin/../libs/javax.inject-1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/javax.inject-2.5.0-b42.jar:/opt/kafka_2.12-2.2.1/bin/../libs/javax.servlet-api-3.1.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/javax.ws.rs-api-2.1.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/javax.ws.rs-api-2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jaxb-api-2.3.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jersey-client-2.27.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jersey-common-2.27.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jersey-container-servlet-2.27.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jersey-container-servlet-core-2.27.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jersey-hk2-2.27.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jersey-media-jaxb-2.27.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jersey-server-2.27.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-client-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-continuation-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-http-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-io-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-security-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-server-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-servlet-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-servlets-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-util-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jopt-simple-5.0.4.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka-clients-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka-log4j-appender-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka-streams-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka-streams-examples-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka-streams-scala_2.12-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka-streams-test-utils-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka-tools-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka_2.12-2.2.1-sources.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka_2.12-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/log4j-1.2.17.jar:/opt/kafka_2.12-2.2.1/bin/../libs/lz4-java-1.5.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/maven-artifact-3.6.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/metrics-core-2.2.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/osgi-resource-locator-1.0.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/plexus-utils-3.1.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/reflections-0.9.11.jar:/opt/kafka_2.12-2.2.1/bin/../libs/rocksdbjni-5.15.10.jar:/opt/kafka_2.12-2.2.1/bin/../libs/scala-library-2.12.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/scala-logging_2.12-3.9.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/scala-reflect-2.12.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/slf4j-api-1.7.25.jar:/opt/kafka_2.12-2.2.1/bin/../libs/slf4j-log4j12-1.7.25.jar:/opt/kafka_2.12-2.2.1/bin/../libs/snappy-java-1.1.7.2.jar:/opt/kafka_2.12-2.2.1/bin/../libs/validation-api-1.1.0.Final.jar:/opt/kafka_2.12-2.2.1/bin/../libs/zkclient-0.11.jar:/opt/kafka_2.12-2.2.1/bin/../libs/zookeeper-3.4.13.jar:/opt/kafka_2.12-2.2.1/bin/../libs/zstd-jni-1.3.8-1.jar kafka.Kafka /opt/kafka_2.12-2.2.1/config/server.properties root 29 10 0 20:12 ? 00:00:38 /usr/bin/java -Dzookeeper.log.dir=/var/log/zookeeper -Dzookeeper.root.logger=INFO,ROLLINGFILE -cp /etc/zookeeper/conf:/usr/share/java/jline.jar:/usr/share/java/log4j-1.2.jar:/usr/share/java/xercesImpl.jar:/usr/share/java/xmlParserAPIs.jar:/usr/share/java/netty.jar:/usr/share/java/slf4j-api.jar:/usr/share/java/slf4j-log4j12.jar:/usr/share/java/zookeeper.jar -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=true org.apache.zookeeper.server.quorum.QuorumPeerMain /etc/zookeeper/conf/zoo.cfg {code} Below confirms that all extensions that do are not current log (identified by just ".log") and match these ".log.x", ".log.x.gz", ".backup" or ".gz" do not have open file handles and can be safely cleaned. To verify, I checked lsof for all files matching logrotate extensions modified in last 2 days: {code:java} root@ip-172-17-1-63:/var/log/kafka# find /var/log/kafka -mtime -2 -ls | egrep -e '.log.$|.backup$|.gz$' 768472 136 -rw-r--r-- 1 root root 137614 Mar 25 06:25 /var/log/kafka/server.log.1.gz 768517 16 -rw-r--r-- 1 root root 13823 Mar 24 04:58 /var/log/kafka/log-cleaner.log.2.gz 768431 10248 -rw-r--r-- 1 root root 10485918 Mar 24 01:27 /var/log/kafka/server.log.1-2021032406.backup 768507 688 -rw-r--r-- 1 root root 701207 Mar 24 06:24 /var/log/kafka/server.log.2.gz 768401 51508 -rw-r--r-- 1 root root 52742072 Mar 24 23:59 /var/log/kafka/controller.log.1-2021032506.backup 768459 10248 -rw-r--r-- 1 root root 10485913 Mar 25 05:51 /var/log/kafka/server.log.1-2021032506.backup 768432 1664 -rw-r--r-- 1 root root 1701962 Mar 24 05:54 /var/log/kafka/state-change.log.2.gz 768071 51208 -rw-r--r-- 1 root root 52429175 Mar 24 20:16 /var/log/kafka/state-change.log.1-2021032506.backup 768522 20 -rw-r--r-- 1 root root 16598 Mar 25 06:23 /var/log/kafka/log-cleaner.log.1.gz 768465 1052 -rw-r--r-- 1 root root 1075433 Mar 25 06:08 /var/log/kafka/state-change.log.1.gz 768562 3544 -rw-r--r-- 1 root root 3625106 Mar 25 06:24 /var/log/kafka/controller.log.1.gz root@ip-172-17-1-63:/var/log/kafka# find /var/log/kafka -mtime -2 -ls | egrep -e '.log.$|.backup$|.gz$' | awk '{print $11}' | xargs lsof {code} However, I still think accidental deletion of any log files should not affect operational state of the Kafka cluster. was: These two log files have 0 byte size and last created modified or updated along time ago and these files seem to be created on startup. When disks fill up, sometimes admins go through old log files and delete them. In this case, these two .log were also picked up. Note that I acknowledge this is a mistake as it should be filtered by extension but nevertheless anything in /var/log should not cause process to have issues and if the file doesn't exist, the process should simply recreate it. The observed effect of this on a HA kafka cluster with 3 nodes, the replica members were out of sync with just one member being leader and not in sync with rest of the nodes while some topics had two in sync instead of 3 and the leader had a -1 status. Restarting kafka did not resolve so in order to resolve, kafka was flushed in one of the node where it had lots of -1 leader status and then rest of the nodes restarted. However, this should not happen even if someone deleted kafka-authorizer.log and kafka-request.log and should be more resistant of breaking down due to file system changes. {code:java} root@ip-172-17-1-63:/var/log/kafka# find /var/log/kafka -mtime +30 -print | xargs ls -ll |grep '.log$' -rw-r--r-- 1 root root 0 Nov 26 05:38 /var/log/kafka/kafka-authorizer.log -rw-r--r-- 1 root root 0 Nov 26 05:38 /var/log/kafka/kafka-request.log {code} {code:java} root@ip-172-17-1-63:/var/log/kafka# stat kafka-request.log File: kafka-request.log Size: 0 Blocks: 0 IO Block: 4096 regular empty file Device: 34h/52d Inode: 768131 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2020-11-26 05:38:04.724561118 +0000 Modify: 2020-11-26 05:38:04.724561118 +0000 Change: 2020-11-26 05:38:04.724561118 +0000 Birth: - root@ip-172-17-1-63:/var/log/kafka# stat kafka-authorizer.log File: kafka-authorizer.log Size: 0 Blocks: 0 IO Block: 4096 regular empty file Device: 34h/52d Inode: 768132 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2020-11-26 05:38:04.724561118 +0000 Modify: 2020-11-26 05:38:04.724561118 +0000 Change: 2020-11-26 05:38:04.724561118 +0000 Birth: - {code} Checking file handles in staging, I can see these files while not updated in 30 days are part of java process and the pid was current: {code:java} root@ip-172-17-1-63:/var/log/kafka# lsof kafka-authorizer.log COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME java 28 root 86w REG 0,52 0 768132 kafka-authorizer.log root@ip-172-17-1-63:/var/log/kafka# lsof kafka-request.log COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME java 28 root 85w REG 0,52 0 768131 kafka-request.log root@ip-172-17-1-63:/var/log/kafka# ps -ef | grep java root 28 11 83 20:12 ? 02:32:37 /usr/local/openjdk-11/bin/java -Xmx2G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -Djava.awt.headless=true -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=9999 -Dkafka.logs.dir=/var/log/kafka -Dlog4j.configuration=file:/opt/kafka_2.12-2.2.1/bin/../config/log4j.properties -cp /opt/kafka_2.12-2.2.1/bin/../libs/activation-1.1.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/aopalliance-repackaged-2.5.0-b42.jar:/opt/kafka_2.12-2.2.1/bin/../libs/argparse4j-0.7.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/audience-annotations-0.5.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/commons-lang3-3.8.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/connect-api-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/connect-basic-auth-extension-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/connect-file-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/connect-json-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/connect-runtime-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/connect-transforms-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/guava-20.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/hk2-api-2.5.0-b42.jar:/opt/kafka_2.12-2.2.1/bin/../libs/hk2-locator-2.5.0-b42.jar:/opt/kafka_2.12-2.2.1/bin/../libs/hk2-utils-2.5.0-b42.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jackson-annotations-2.9.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jackson-core-2.9.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jackson-databind-2.9.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jackson-datatype-jdk8-2.9.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jackson-jaxrs-base-2.9.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jackson-jaxrs-json-provider-2.9.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jackson-module-jaxb-annotations-2.9.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/javassist-3.22.0-CR2.jar:/opt/kafka_2.12-2.2.1/bin/../libs/javax.annotation-api-1.2.jar:/opt/kafka_2.12-2.2.1/bin/../libs/javax.inject-1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/javax.inject-2.5.0-b42.jar:/opt/kafka_2.12-2.2.1/bin/../libs/javax.servlet-api-3.1.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/javax.ws.rs-api-2.1.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/javax.ws.rs-api-2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jaxb-api-2.3.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jersey-client-2.27.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jersey-common-2.27.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jersey-container-servlet-2.27.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jersey-container-servlet-core-2.27.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jersey-hk2-2.27.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jersey-media-jaxb-2.27.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jersey-server-2.27.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-client-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-continuation-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-http-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-io-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-security-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-server-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-servlet-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-servlets-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-util-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jopt-simple-5.0.4.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka-clients-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka-log4j-appender-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka-streams-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka-streams-examples-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka-streams-scala_2.12-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka-streams-test-utils-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka-tools-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka_2.12-2.2.1-sources.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka_2.12-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/log4j-1.2.17.jar:/opt/kafka_2.12-2.2.1/bin/../libs/lz4-java-1.5.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/maven-artifact-3.6.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/metrics-core-2.2.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/osgi-resource-locator-1.0.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/plexus-utils-3.1.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/reflections-0.9.11.jar:/opt/kafka_2.12-2.2.1/bin/../libs/rocksdbjni-5.15.10.jar:/opt/kafka_2.12-2.2.1/bin/../libs/scala-library-2.12.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/scala-logging_2.12-3.9.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/scala-reflect-2.12.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/slf4j-api-1.7.25.jar:/opt/kafka_2.12-2.2.1/bin/../libs/slf4j-log4j12-1.7.25.jar:/opt/kafka_2.12-2.2.1/bin/../libs/snappy-java-1.1.7.2.jar:/opt/kafka_2.12-2.2.1/bin/../libs/validation-api-1.1.0.Final.jar:/opt/kafka_2.12-2.2.1/bin/../libs/zkclient-0.11.jar:/opt/kafka_2.12-2.2.1/bin/../libs/zookeeper-3.4.13.jar:/opt/kafka_2.12-2.2.1/bin/../libs/zstd-jni-1.3.8-1.jar kafka.Kafka /opt/kafka_2.12-2.2.1/config/server.properties root 29 10 0 20:12 ? 00:00:38 /usr/bin/java -Dzookeeper.log.dir=/var/log/zookeeper -Dzookeeper.root.logger=INFO,ROLLINGFILE -cp /etc/zookeeper/conf:/usr/share/java/jline.jar:/usr/share/java/log4j-1.2.jar:/usr/share/java/xercesImpl.jar:/usr/share/java/xmlParserAPIs.jar:/usr/share/java/netty.jar:/usr/share/java/slf4j-api.jar:/usr/share/java/slf4j-log4j12.jar:/usr/share/java/zookeeper.jar -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=true org.apache.zookeeper.server.quorum.QuorumPeerMain /etc/zookeeper/conf/zoo.cfg {code} Below confirms that all extensions that do are not current log (identified by just ".log") and match these ".log.x", ".log.x.gz", ".backup" or ".gz" do not have open file handles and can be safely cleaned. To verify, I checked lsof for all files matching logrotate extensions modified in last 2 days: {code:java} root@ip-172-17-1-63:/var/log/kafka# find /var/log/kafka -mtime -2 -ls | egrep -e '.log.$|.backup$|.gz$' 768472 136 -rw-r--r-- 1 root root 137614 Mar 25 06:25 /var/log/kafka/server.log.1.gz 768517 16 -rw-r--r-- 1 root root 13823 Mar 24 04:58 /var/log/kafka/log-cleaner.log.2.gz 768431 10248 -rw-r--r-- 1 root root 10485918 Mar 24 01:27 /var/log/kafka/server.log.1-2021032406.backup 768507 688 -rw-r--r-- 1 root root 701207 Mar 24 06:24 /var/log/kafka/server.log.2.gz 768401 51508 -rw-r--r-- 1 root root 52742072 Mar 24 23:59 /var/log/kafka/controller.log.1-2021032506.backup 768459 10248 -rw-r--r-- 1 root root 10485913 Mar 25 05:51 /var/log/kafka/server.log.1-2021032506.backup 768432 1664 -rw-r--r-- 1 root root 1701962 Mar 24 05:54 /var/log/kafka/state-change.log.2.gz 768071 51208 -rw-r--r-- 1 root root 52429175 Mar 24 20:16 /var/log/kafka/state-change.log.1-2021032506.backup 768522 20 -rw-r--r-- 1 root root 16598 Mar 25 06:23 /var/log/kafka/log-cleaner.log.1.gz 768465 1052 -rw-r--r-- 1 root root 1075433 Mar 25 06:08 /var/log/kafka/state-change.log.1.gz 768562 3544 -rw-r--r-- 1 root root 3625106 Mar 25 06:24 /var/log/kafka/controller.log.1.gz root@ip-172-17-1-63:/var/log/kafka# find /var/log/kafka -mtime -2 -ls | egrep -e '.log.$|.backup$|.gz$' | awk '{print $11}' | xargs lsof {code} However, I still think accidental deletion of any log files should not affect operational state of the Kafka cluster. > Accidental delete of some log files kafka-authorizer.log and > kafka-request.log can break topics in cluster > ---------------------------------------------------------------------------------------------------------- > > Key: KAFKA-12560 > URL: https://issues.apache.org/jira/browse/KAFKA-12560 > Project: Kafka > Issue Type: Bug > Components: log > Affects Versions: 2.2.1 > Environment: AWS EC2 i3.xlarge > Reporter: Rufus Skyfii > Priority: Minor > > These two log files have 0 byte size and last created modified or updated > along time ago and these files seem to be created on startup. > When disks fill up, sometimes admins go through old log files and delete > them. In this case, these two .log were also picked up. Note that I > acknowledge this is a mistake as it should be filtered by extension but > nevertheless anything in /var/log should not cause process to have issues and > if the file doesn't exist, the process should simply recreate it. > The observed effect of this on a HA kafka cluster with 3 nodes, the replica > members were out of sync with just one member being leader and not in sync > with rest of the nodes while some topics had two in sync instead of 3 and the > leader had a -1 status. Restarting kafka did not resolve so in order to > resolve, kafka was flushed in one of the node where it had lots of -1 leader > status and then rest of the nodes restarted. However, this should not happen > even if someone deleted kafka-authorizer.log and kafka-request.log and should > be more resistant of breaking down due to file system changes. > Issue occurs several hours later. > > {code:java} > root@ip-172-17-1-63:/var/log/kafka# find /var/log/kafka -mtime +30 -print | > xargs ls -ll |grep '.log$' > -rw-r--r-- 1 root root 0 Nov 26 05:38 > /var/log/kafka/kafka-authorizer.log > -rw-r--r-- 1 root root 0 Nov 26 05:38 /var/log/kafka/kafka-request.log > {code} > {code:java} > root@ip-172-17-1-63:/var/log/kafka# stat kafka-request.log > > > File: kafka-request.log > > > Size: 0 Blocks: 0 IO Block: 4096 regular empty > file > > Device: 34h/52d Inode: 768131 Links: 1 > > > Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) > > > Access: 2020-11-26 05:38:04.724561118 +0000 > > > Modify: 2020-11-26 05:38:04.724561118 +0000 > > > Change: 2020-11-26 05:38:04.724561118 +0000 > > > Birth: - > > > root@ip-172-17-1-63:/var/log/kafka# stat kafka-authorizer.log > > > File: kafka-authorizer.log > > > Size: 0 Blocks: 0 IO Block: 4096 regular empty > file > > Device: 34h/52d Inode: 768132 Links: 1 > > > Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) > > > Access: 2020-11-26 05:38:04.724561118 +0000 > > Modify: 2020-11-26 05:38:04.724561118 +0000 > > Change: 2020-11-26 05:38:04.724561118 +0000 > > Birth: - > > {code} > Checking file handles in staging, I can see these files while not updated in > 30 days are part of java process and the pid was current: > {code:java} > root@ip-172-17-1-63:/var/log/kafka# lsof kafka-authorizer.log > COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME > java 28 root 86w REG 0,52 0 768132 kafka-authorizer.log > root@ip-172-17-1-63:/var/log/kafka# lsof kafka-request.log > COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME > java 28 root 85w REG 0,52 0 768131 kafka-request.log > root@ip-172-17-1-63:/var/log/kafka# ps -ef | grep java > root 28 11 83 20:12 ? 02:32:37 > /usr/local/openjdk-11/bin/java -Xmx2G -Xms1G -server -XX:+UseG1GC > -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 > -XX:+ExplicitGCInvokesConcurrent -Djava.awt.headless=true > -Dcom.sun.management.jmxremote > -Dcom.sun.management.jmxremote.authenticate=false > -Dcom.sun.management.jmxremote.ssl=false > -Dcom.sun.management.jmxremote.port=9999 -Dkafka.logs.dir=/var/log/kafka > -Dlog4j.configuration=file:/opt/kafka_2.12-2.2.1/bin/../config/log4j.properties > -cp > /opt/kafka_2.12-2.2.1/bin/../libs/activation-1.1.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/aopalliance-repackaged-2.5.0-b42.jar:/opt/kafka_2.12-2.2.1/bin/../libs/argparse4j-0.7.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/audience-annotations-0.5.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/commons-lang3-3.8.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/connect-api-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/connect-basic-auth-extension-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/connect-file-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/connect-json-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/connect-runtime-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/connect-transforms-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/guava-20.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/hk2-api-2.5.0-b42.jar:/opt/kafka_2.12-2.2.1/bin/../libs/hk2-locator-2.5.0-b42.jar:/opt/kafka_2.12-2.2.1/bin/../libs/hk2-utils-2.5.0-b42.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jackson-annotations-2.9.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jackson-core-2.9.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jackson-databind-2.9.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jackson-datatype-jdk8-2.9.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jackson-jaxrs-base-2.9.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jackson-jaxrs-json-provider-2.9.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jackson-module-jaxb-annotations-2.9.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/javassist-3.22.0-CR2.jar:/opt/kafka_2.12-2.2.1/bin/../libs/javax.annotation-api-1.2.jar:/opt/kafka_2.12-2.2.1/bin/../libs/javax.inject-1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/javax.inject-2.5.0-b42.jar:/opt/kafka_2.12-2.2.1/bin/../libs/javax.servlet-api-3.1.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/javax.ws.rs-api-2.1.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/javax.ws.rs-api-2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jaxb-api-2.3.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jersey-client-2.27.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jersey-common-2.27.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jersey-container-servlet-2.27.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jersey-container-servlet-core-2.27.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jersey-hk2-2.27.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jersey-media-jaxb-2.27.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jersey-server-2.27.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-client-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-continuation-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-http-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-io-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-security-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-server-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-servlet-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-servlets-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-util-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jopt-simple-5.0.4.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka-clients-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka-log4j-appender-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka-streams-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka-streams-examples-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka-streams-scala_2.12-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka-streams-test-utils-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka-tools-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka_2.12-2.2.1-sources.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka_2.12-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/log4j-1.2.17.jar:/opt/kafka_2.12-2.2.1/bin/../libs/lz4-java-1.5.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/maven-artifact-3.6.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/metrics-core-2.2.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/osgi-resource-locator-1.0.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/plexus-utils-3.1.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/reflections-0.9.11.jar:/opt/kafka_2.12-2.2.1/bin/../libs/rocksdbjni-5.15.10.jar:/opt/kafka_2.12-2.2.1/bin/../libs/scala-library-2.12.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/scala-logging_2.12-3.9.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/scala-reflect-2.12.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/slf4j-api-1.7.25.jar:/opt/kafka_2.12-2.2.1/bin/../libs/slf4j-log4j12-1.7.25.jar:/opt/kafka_2.12-2.2.1/bin/../libs/snappy-java-1.1.7.2.jar:/opt/kafka_2.12-2.2.1/bin/../libs/validation-api-1.1.0.Final.jar:/opt/kafka_2.12-2.2.1/bin/../libs/zkclient-0.11.jar:/opt/kafka_2.12-2.2.1/bin/../libs/zookeeper-3.4.13.jar:/opt/kafka_2.12-2.2.1/bin/../libs/zstd-jni-1.3.8-1.jar > kafka.Kafka /opt/kafka_2.12-2.2.1/config/server.properties > root 29 10 0 20:12 ? 00:00:38 /usr/bin/java > -Dzookeeper.log.dir=/var/log/zookeeper > -Dzookeeper.root.logger=INFO,ROLLINGFILE -cp > /etc/zookeeper/conf:/usr/share/java/jline.jar:/usr/share/java/log4j-1.2.jar:/usr/share/java/xercesImpl.jar:/usr/share/java/xmlParserAPIs.jar:/usr/share/java/netty.jar:/usr/share/java/slf4j-api.jar:/usr/share/java/slf4j-log4j12.jar:/usr/share/java/zookeeper.jar > -Dcom.sun.management.jmxremote > -Dcom.sun.management.jmxremote.local.only=true > org.apache.zookeeper.server.quorum.QuorumPeerMain /etc/zookeeper/conf/zoo.cfg > {code} > Below confirms that all extensions that do are not current log (identified by > just ".log") and match these ".log.x", ".log.x.gz", ".backup" or ".gz" do not > have open file handles and can be safely cleaned. To verify, I checked lsof > for all files matching logrotate extensions modified in last 2 days: > {code:java} > root@ip-172-17-1-63:/var/log/kafka# find /var/log/kafka -mtime -2 -ls | egrep > -e '.log.$|.backup$|.gz$' > 768472 136 -rw-r--r-- 1 root root 137614 Mar 25 06:25 > /var/log/kafka/server.log.1.gz > 768517 16 -rw-r--r-- 1 root root 13823 Mar 24 04:58 > /var/log/kafka/log-cleaner.log.2.gz > 768431 10248 -rw-r--r-- 1 root root 10485918 Mar 24 01:27 > /var/log/kafka/server.log.1-2021032406.backup > 768507 688 -rw-r--r-- 1 root root 701207 Mar 24 06:24 > /var/log/kafka/server.log.2.gz > 768401 51508 -rw-r--r-- 1 root root 52742072 Mar 24 23:59 > /var/log/kafka/controller.log.1-2021032506.backup > 768459 10248 -rw-r--r-- 1 root root 10485913 Mar 25 05:51 > /var/log/kafka/server.log.1-2021032506.backup > 768432 1664 -rw-r--r-- 1 root root 1701962 Mar 24 05:54 > /var/log/kafka/state-change.log.2.gz > 768071 51208 -rw-r--r-- 1 root root 52429175 Mar 24 20:16 > /var/log/kafka/state-change.log.1-2021032506.backup > 768522 20 -rw-r--r-- 1 root root 16598 Mar 25 06:23 > /var/log/kafka/log-cleaner.log.1.gz > 768465 1052 -rw-r--r-- 1 root root 1075433 Mar 25 06:08 > /var/log/kafka/state-change.log.1.gz > 768562 3544 -rw-r--r-- 1 root root 3625106 Mar 25 06:24 > /var/log/kafka/controller.log.1.gz > root@ip-172-17-1-63:/var/log/kafka# find /var/log/kafka -mtime -2 -ls | egrep > -e '.log.$|.backup$|.gz$' | awk '{print $11}' | xargs lsof > {code} > However, I still think accidental deletion of any log files should not affect > operational state of the Kafka cluster. -- This message was sent by Atlassian Jira (v8.3.4#803005)