Tim Stilwell created KAFKA-600:
----------------------------------

             Summary: kafka should respond gracefully rather than crash when unable to write due to ENOSPC
                 Key: KAFKA-600
                 URL: https://issues.apache.org/jira/browse/KAFKA-600
             Project: Kafka
          Issue Type: Bug
          Components: core
            Reporter: Tim Stilwell


problem:
When a user starts Kafka with log.dir set to a small partition and begins
writing data to the message queue, Kafka crashes as soon as the partition is
full. Given that the broker serves both reads and writes, crashing seems
drastic, even if the error message is helpful. Something more robust would be
appreciated. Perhaps log an error and reject further write requests while
continuing to accept read requests? Perhaps send an email alert to Operations?
At the very least, shut down gracefully, so the user knows that received
messages were saved, and log a helpful message with some details of the last
message received. When tens or hundreds of thousands of messages can be
processed per second, it isn't helpful to merely log a timestamp and crash.
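
The "reject writes, keep serving reads" idea could look roughly like the sketch below. This is not Kafka code: DegradingLog, Storage and AppendResult are hypothetical names made up for illustration, and the in-memory list merely stands in for the on-disk log. The only point is that an IOException on append flips the log into a read-only state and records the last good offset instead of halting the process.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: none of these names exist in Kafka.
final class DegradingLog {
    /** Storage abstraction so a full disk can be simulated without filling a real one. */
    interface Storage {
        void write(byte[] message) throws IOException;
    }

    enum AppendResult { OK, REJECTED_READ_ONLY }

    private final Storage storage;
    private final List<byte[]> readable = new ArrayList<>(); // stands in for the on-disk log
    private volatile boolean readOnly = false;
    private long lastGoodOffset = -1;

    DegradingLog(Storage storage) {
        this.storage = storage;
    }

    /** Appends a message, or rejects it once an earlier write has failed. */
    synchronized AppendResult append(byte[] message) {
        if (readOnly) {
            return AppendResult.REJECTED_READ_ONLY; // do not touch the disk again
        }
        try {
            storage.write(message);
        } catch (IOException e) {
            // Instead of halting, remember the failure and report the last good offset.
            readOnly = true;
            System.err.println("ERROR write failed (" + e.getMessage()
                    + "); last good offset=" + lastGoodOffset
                    + "; rejecting writes, still serving reads");
            return AppendResult.REJECTED_READ_ONLY;
        }
        readable.add(message);
        lastGoodOffset++;
        return AppendResult.OK;
    }

    /** Reads stay available even after the log has gone read-only. */
    synchronized byte[] read(int offset) {
        return readable.get(offset);
    }

    boolean isReadOnly() {
        return readOnly;
    }

    synchronized long lastGoodOffset() {
        return lastGoodOffset;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated disk that accepts two writes and then reports ENOSPC.
        Storage tinyDisk = m -> {
            if (calls[0]++ >= 2) {
                throw new IOException("No space left on device");
            }
        };
        DegradingLog log = new DegradingLog(tinyDisk);
        for (String s : new String[] {"m0", "m1", "m2", "m3"}) {
            System.out.println(s + " -> " + log.append(s.getBytes()));
        }
        System.out.println("read(0) = " + new String(log.read(0)));
    }
}
```

A real broker would also need a way to leave the read-only state once space is freed, and a way to signal the rejection back to producers; both are left out of this sketch.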

steps to reproduce:
1) download and install kafka
2) modify server.properties
    # vi /opt/kafka-0.7.2-incubating-src/config/server.properties
    set log.dir="/var/log/kafka"
3) modify log4j
    # vi /opt/kafka-0.7.2-incubating-src/config/log4j.properties
    set fileAppender.File=/var/log/kafka/kafka-request.log
4) start kafka service
    $ sudo bash
    # ulimit -c unlimited
    # /opt/kafka-0.7.2-incubating-src/bin/kafka-server-start.sh /opt/kafka-0.7.2-incubating-src/config/server.properties &
5) begin writing data to hostname:9092
6) review /var/log/kafka/kafka-request.log

results:
$ grep log.dir /opt/kafka-0.7.2-incubating-src/config/server.properties
log.dir=/var/log/kafka
$ df -h /var/log/kafka
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       4.0G  4.0G     0 100% /
$ tail /var/log/kafka/kafka-request.log
17627442 [ZkClient-EventThread-14-10.0.20.242:2181] INFO  kafka.server.KafkaZooKeeper  - Begin registering broker topic /brokers/topics/raw/0 with 1 partitions
17627444 [ZkClient-EventThread-14-10.0.20.242:2181] INFO  kafka.server.KafkaZooKeeper  - End registering broker topic /brokers/topics/raw/0
17627445 [ZkClient-EventThread-14-10.0.20.242:2181] INFO  kafka.server.KafkaZooKeeper  - done re-registering broker
18337676 [kafka-processor-3] ERROR kafka.network.Processor  - Closing socket for /10.0.20.138 because of error
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:218)
        at sun.nio.ch.IOUtil.read(IOUtil.java:191)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:359)
        at kafka.utils.Utils$.read(Utils.scala:538)
        at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
        at kafka.network.Processor.read(SocketServer.scala:311)
        at kafka.network.Processor.run(SocketServer.scala:214)
        at java.lang.Thread.run(Thread.java:722)
18391974 [kafka-processor-4] INFO  kafka.network.Processor  - Closing socket connection to /10.0.20.138.
18422004 [kafka-processor-5] INFO  kafka.network.Processor  - Closing socket connection to /10.0.20.138.
18434563 [kafka-processor-6] INFO  kafka.network.Processor  - Closing socket connection to /10.0.20.138.
18485005 [kafka-processor-7] INFO  kafka.network.Processor  - Closing socket connection to /10.0.20.138.
18497083 [kafka-processor-0] INFO  kafka.network.Processor  - Closing socket connection to /10.0.20.138.
18525720 [kafka-processor-1] INFO  kafka.network.Processor  - Closing socket connection to /10.0.20.138.
18543843 [kafka-processor-2] INFO  kafka.network.Processor  - Closing socket connection to /10.0.20.138.
18563230 [kafka-processor-4] INFO  kafka.network.Processor  - Closing socket connection to /10.0.20.138.
18575613 [kafka-processor-5] INFO  kafka.network.Processor  - Closing socket connection to /10.0.20.138.
18677568 [kafka-processor-6] ERROR kafka.network.Processor  - Closing socket for /10.0.20.138 because of error
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:218)
        at sun.nio.ch.IOUtil.read(IOUtil.java:191)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:359)
        at kafka.utils.Utils$.read(Utils.scala:538)
        at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
        at kafka.network.Processor.read(SocketServer.scala:311)
        at kafka.network.Processor.run(SocketServer.scala:214)
        at java.lang.Thread.run(Thread.java:722)
18828016 [kafka-processor-7] INFO  kafka.network.Processor  - Closing socket connection to /10.0.20.248.
18844274 [kafka-processor-0] INFO  kafka.network.Processor  - Closing socket connection to /10.0.20.248.
18849691 [kafka-processor-1] INFO  kafka.network.Processor  - Closing socket connection to /10.0.20.248.
18896883 [kafka-processor-2] INFO  kafka.network.Processor  - Closing socket connection to /10.0.20.248.
22383195 [kafka-processor-2] FATAL kafka.log.Log  - Halting due to unrecoverable I/O error while handling producer request
java.io.IOException: No space left on device
        at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
        at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:59)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:89)
        at sun.nio.ch.IOUtil.write(IOUtil.java:60)
        at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:195)
        at kafka.message.ByteBufferMessageSet.writeTo(ByteBufferMessageSet.scala:76)
        at kafka.message.FileMessageSet.append(FileMessageSet.scala:159)
        at kafka.log.LogSegment.append(Log.scala:105)
        at kafka.log.Log.liftedTree1$1(Log.scala:246)
        at kafka.log.Log.append(Log.scala:242)
        at kafka.server.KafkaRequestHandlers.kafka$server$KafkaRequestHandlers$$handleProducerRequest(KafkaRequestHandlers.scala:69)
        at kafka.server.KafkaRequestHandlers.handleProducerRequest(KafkaRequestHandlers.scala:53)
        at kafka.server.KafkaRequestHandlers$$anonfun$handlerFor$1.apply(KafkaRequestHandlers.scala:38)
        at kafka.server.KafkaRequestHandlers$$anonfun$handlerFor$1.apply(KafkaRequestHandlers.scala:38)
        at kafka.network.Processor.handle(SocketServer.scala:296)
        at kafka.network.Processor.read(SocketServer.scala:319)
        at kafka.network.Processor.run(SocketServer.scala:214)
        at java.lang.Thread.run(Thread.java:722)
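
The FATAL entry above is only logged once the write has already failed. As a complement, free space on the log.dir partition could be checked ahead of time so that a warning (or an alert to Operations) goes out before ENOSPC is hit. The sketch below is an assumption-laden illustration, not part of Kafka: LogDirSpaceCheck, hasHeadroom and the 512 MB threshold are all hypothetical; only java.io.File.getUsableSpace() is a real JDK call.

```java
import java.io.File;

// Hypothetical helper, not part of Kafka.
final class LogDirSpaceCheck {
    /**
     * Returns true if the partition holding logDir reports at least minFreeBytes usable.
     * File.getUsableSpace() returns 0 when the path does not name a partition,
     * so a missing directory is treated as "no headroom".
     */
    static boolean hasHeadroom(File logDir, long minFreeBytes) {
        return logDir.getUsableSpace() >= minFreeBytes;
    }

    public static void main(String[] args) {
        // Default path taken from the log.dir value in this report.
        File logDir = new File(args.length > 0 ? args[0] : "/var/log/kafka");
        long minFree = 512L * 1024 * 1024; // hypothetical threshold: 512 MB
        if (!hasHeadroom(logDir, minFree)) {
            System.err.println("WARN less than " + minFree + " bytes usable on " + logDir);
        } else {
            System.out.println("OK " + logDir.getUsableSpace() + " bytes usable on " + logDir);
        }
    }
}
```

The value returned by getUsableSpace() is a hint rather than a guarantee, so a check like this can reduce the chance of hitting ENOSPC but cannot replace handling the failed write itself.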


