Tim Stilwell created KAFKA-600: ---------------------------------- Summary: kafka should respond gracefully rather than crash when unable to write due to ENOSPC Key: KAFKA-600 URL: https://issues.apache.org/jira/browse/KAFKA-600 Project: Kafka Issue Type: Bug Components: core Reporter: Tim Stilwell
problem: user starts kafka with log.dir value set to a small partition and begins writing data to the mq. when the disk partition is full, kafka crashes. given that this product is used for both reading and writing operations, crashing seems rather drastic even if the error message is helpful. something more robust would be appreciated. perhaps, logging an error and rejecting additional write requests while accepting additional read requests? perhaps, sending an email alert to Operations? at least shutdown gracefully so the user is aware that received messages were saved with a helpful message providing some details of the last message received. when tens or hundreds of thousands of messages can be processed in a second, it isn't helpful to merely log a timestamp and crash. steps to reproduce: 1) download and install kafka 2) modify server.properties # vi /opt/kafka-0.7.2-incubating-src/config/server.properties set log.dir="/var/log/kafka" 3) modify log4j # vi /opt/kafka-0.7.2-incubating-src/config/log4j.properties set fileAppender.File=/var/log/kafka/kafka-request.log 4) start kafka service $ sudo bash # ulimit -c unlimited # /opt/kafka-0.7.2-incubating-src/bin/kafka-server-start.sh /opt/kafka-0.7.2-incubating-src/config/server.properties & 6) begin writing data to hostname:9092 7) review /var/log/kafka-request.log results: $ grep log.dir /opt/kafka-0.7.2-incubating-src/config/server.properties log.dir=/var/log/kafka $ df -h /var/log/kafka Filesystem Size Used Avail Use% Mounted on /dev/sda1 4.0G 4.0G 0 100% / $ tail /var/log/kafka/kafka-request.log 17627442 [ZkClient-EventThread-14-10.0.20.242:2181] INFO kafka.server.KafkaZooKeeper - Begin registering broker topic /brokers/topics/raw/0 with 1 partitions 17627444 [ZkClient-EventThread-14-10.0.20.242:2181] INFO kafka.server.KafkaZooKeeper - End registering broker topic /brokers/topics/raw/0 17627445 [ZkClient-EventThread-14-10.0.20.242:2181] INFO kafka.server.KafkaZooKeeper - done re-registering broker 18337676 [kafka-processor-3] ERROR kafka.network.Processor - Closing socket for /10.0.20.138 because of error java.io.IOException: Connection reset by peer at sun.nio.ch.FileDispatcherImpl.read0(Native Method) at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:218) at sun.nio.ch.IOUtil.read(IOUtil.java:191) at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:359) at kafka.utils.Utils$.read(Utils.scala:538) at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54) at kafka.network.Processor.read(SocketServer.scala:311) at kafka.network.Processor.run(SocketServer.scala:214) at java.lang.Thread.run(Thread.java:722) 18391974 [kafka-processor-4] INFO kafka.network.Processor - Closing socket connection to /10.0.20.138. 18422004 [kafka-processor-5] INFO kafka.network.Processor - Closing socket connection to /10.0.20.138. 18434563 [kafka-processor-6] INFO kafka.network.Processor - Closing socket connection to /10.0.20.138. 18485005 [kafka-processor-7] INFO kafka.network.Processor - Closing socket connection to /10.0.20.138. 18497083 [kafka-processor-0] INFO kafka.network.Processor - Closing socket connection to /10.0.20.138. 18525720 [kafka-processor-1] INFO kafka.network.Processor - Closing socket connection to /10.0.20.138. 18543843 [kafka-processor-2] INFO kafka.network.Processor - Closing socket connection to /10.0.20.138. 18563230 [kafka-processor-4] INFO kafka.network.Processor - Closing socket connection to /10.0.20.138. 18575613 [kafka-processor-5] INFO kafka.network.Processor - Closing socket connection to /10.0.20.138. 18677568 [kafka-processor-6] ERROR kafka.network.Processor - Closing socket for /10.0.20.138 because of error java.io.IOException: Connection reset by peer at sun.nio.ch.FileDispatcherImpl.read0(Native Method) at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:218) at sun.nio.ch.IOUtil.read(IOUtil.java:191) at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:359) at kafka.utils.Utils$.read(Utils.scala:538) at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54) at kafka.network.Processor.read(SocketServer.scala:311) at kafka.network.Processor.run(SocketServer.scala:214) at java.lang.Thread.run(Thread.java:722) 18828016 [kafka-processor-7] INFO kafka.network.Processor - Closing socket connection to /10.0.20.248. 18844274 [kafka-processor-0] INFO kafka.network.Processor - Closing socket connection to /10.0.20.248. 18849691 [kafka-processor-1] INFO kafka.network.Processor - Closing socket connection to /10.0.20.248. 18896883 [kafka-processor-2] INFO kafka.network.Processor - Closing socket connection to /10.0.20.248. 22383195 [kafka-processor-2] FATAL kafka.log.Log - Halting due to unrecoverable I/O error while handling producer request java.io.IOException: No space left on device at sun.nio.ch.FileDispatcherImpl.write0(Native Method) at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:59) at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:89) at sun.nio.ch.IOUtil.write(IOUtil.java:60) at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:195) at kafka.message.ByteBufferMessageSet.writeTo(ByteBufferMessageSet.scala:76) at kafka.message.FileMessageSet.append(FileMessageSet.scala:159) at kafka.log.LogSegment.append(Log.scala:105) at kafka.log.Log.liftedTree1$1(Log.scala:246) at kafka.log.Log.append(Log.scala:242) at kafka.server.KafkaRequestHandlers.kafka$server$KafkaRequestHandlers$$handleProducerRequest(KafkaRequestHandlers.scala:69) at kafka.server.KafkaRequestHandlers.handleProducerRequest(KafkaRequestHandlers.scala:53) at kafka.server.KafkaRequestHandlers$$anonfun$handlerFor$1.apply(KafkaRequestHandlers.scala:38) at kafka.server.KafkaRequestHandlers$$anonfun$handlerFor$1.apply(KafkaRequestHandlers.scala:38) at kafka.network.Processor.handle(SocketServer.scala:296) at kafka.network.Processor.read(SocketServer.scala:319) at kafka.network.Processor.run(SocketServer.scala:214) at java.lang.Thread.run(Thread.java:722) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira