Guozhang Wang created KAFKA-1860:
------------------------------------
Summary: File system errors are not detected unless Kafka tries to write
Key: KAFKA-1860
URL: https://issues.apache.org/jira/browse/KAFKA-1860
Project: Kafka
Issue Type: Bug
Reporter: Guozhang Wang
Assignee: Guozhang Wang
Fix For: 0.9.0
When the disk (RAID with the caches dir) dies on a Kafka broker, the filesystem
typically gets remounted read-only, and hence when Kafka tries to write to the
disk, it gets a FileNotFoundException with the read-only errno set (EROFS).
However, as long as no produce request is received, and hence no writes are
attempted on the disks, Kafka will not exit on such a FATAL error (when the disk
starts working again, Kafka might think some files are gone, while they will
reappear later as the RAID comes back online). Instead it keeps logging
exceptions like:
{code}
2015/01/07 09:47:41.543 ERROR [KafkaScheduler] [kafka-scheduler-1] [kafka-server] [] Uncaught exception in scheduled task 'kafka-recovery-point-checkpoint'
java.io.FileNotFoundException: /export/content/kafka/i001_caches/recovery-point-offset-checkpoint.tmp (Read-only file system)
    at java.io.FileOutputStream.open(Native Method)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:206)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:156)
    at kafka.server.OffsetCheckpoint.write(OffsetCheckpoint.scala:37)
{code}
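One possible way to detect this condition without waiting for a produce request
is to probe each log directory for writability. The following is only a minimal
sketch of that idea, not the fix adopted for this issue: the class LogDirProbe
and its method isWritable are hypothetical names and are not part of Kafka. It
assumes that creating a small temp file fails with an IOException once the file
system has been remounted read-only.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper (not part of Kafka): checks whether a directory still accepts writes. */
final class LogDirProbe {
    private LogDirProbe() {}

    /**
     * Returns true if a small temp file can be created and then deleted in dir.
     * Any IOException (for example a read-only file system or a missing
     * directory) is reported as "not writable".
     */
    static boolean isWritable(Path dir) {
        try {
            Path probe = Files.createTempFile(dir, ".probe-", ".tmp");
            Files.delete(probe);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

A broker could call such a probe from a periodic task for every log directory
and treat a false result as fatal, instead of only logging the exception from
the checkpoint task. How the broker should shut down in that case is left open
here.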