[jira] [Commented] (KAFKA-1860) File system errors are not detected unless Kafka tries to write
[ https://issues.apache.org/jira/browse/KAFKA-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127177#comment-15127177 ] ASF GitHub Bot commented on KAFKA-1860: --- Github user asfgit closed the pull request at: https://github.com/apache/kafka/pull/698

> File system errors are not detected unless Kafka tries to write
> ---
>
> Key: KAFKA-1860
> URL: https://issues.apache.org/jira/browse/KAFKA-1860
> Project: Kafka
> Issue Type: Bug
> Reporter: Guozhang Wang
> Assignee: Mayuresh Gharat
> Fix For: 0.10.0.0
>
> Attachments: KAFKA-1860.patch
>
> When the disk (RAID with caches dir) dies on a Kafka broker, the filesystem
> typically gets remounted read-only, and hence when Kafka tries to write to
> the disk it gets a FileNotFoundException with the read-only errno set (EROFS).
> However, as long as no produce request is received, and hence no write is
> attempted on the disks, Kafka will not exit on such a FATAL error (when the
> disk starts working again, Kafka might think some files are gone while they
> will reappear later as the RAID comes back online). Instead it keeps spilling
> exceptions like:
> {code}
> 2015/01/07 09:47:41.543 ERROR [KafkaScheduler] [kafka-scheduler-1]
> [kafka-server] [] Uncaught exception in scheduled task
> 'kafka-recovery-point-checkpoint'
> java.io.FileNotFoundException:
> /export/content/kafka/i001_caches/recovery-point-offset-checkpoint.tmp
> (Read-only file system)
> at java.io.FileOutputStream.open(Native Method)
> at java.io.FileOutputStream.<init>(FileOutputStream.java:206)
> at java.io.FileOutputStream.<init>(FileOutputStream.java:156)
> at kafka.server.OffsetCheckpoint.write(OffsetCheckpoint.scala:37)
> {code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
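The stack trace above comes from the broker's periodic recovery-point checkpoint, which writes a `.tmp` file and then renames it into place. A minimal sketch of that write pattern (illustrative class and method names, not Kafka's actual `OffsetCheckpoint` code) shows why a read-only filesystem surfaces as a `FileNotFoundException` on the `.tmp` open rather than on the rename:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class CheckpointWriter {
    // Sketch of a write-tmp-then-rename checkpoint, the pattern
    // kafka.server.OffsetCheckpoint.write uses. Names are illustrative,
    // not Kafka's actual API.
    public static void write(Path dir, String name, long offset) {
        Path tmp = dir.resolve(name + ".tmp");
        try {
            // On an EROFS-mounted volume, opening the .tmp file for writing
            // is the call that fails, surfacing as a FileNotFoundException
            // with the "(Read-only file system)" detail seen in the log.
            try (BufferedWriter w = Files.newBufferedWriter(tmp)) {
                w.write(Long.toString(offset));
            }
            // The rename is atomic within one POSIX filesystem, so readers
            // never observe a half-written checkpoint file.
            Files.move(tmp, dir.resolve(name), StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            // Propagate instead of swallowing, so the caller (or scheduler)
            // can decide this is fatal and stop the broker.
            throw new UncheckedIOException("checkpoint write failed: " + tmp, e);
        }
    }
}
```

The tmp-plus-rename design protects the checkpoint contents, but it also means a dead disk is only noticed on the next write attempt, which is exactly the detection gap this issue describes.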
[ https://issues.apache.org/jira/browse/KAFKA-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15113128#comment-15113128 ] Mayuresh Gharat commented on KAFKA-1860: [~guozhang] can you take another look at the PR? I have addressed most of the comments on the PR.
[ https://issues.apache.org/jira/browse/KAFKA-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064922#comment-15064922 ] ASF GitHub Bot commented on KAFKA-1860: --- GitHub user MayureshGharat opened a pull request: https://github.com/apache/kafka/pull/697 KAFKA-1860 The JVM should stop if the underlying file system goes into read-only mode

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MayureshGharat/kafka kafka-1860

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/kafka/pull/697.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #697

commit 5c7f2e749fd8674bae66b6698319181a0f3e9251
Author: Mayuresh Gharat
Date: 2015-12-18T18:28:32Z

    Added topic-partition information to the exception message on batch expiry in RecordAccumulator

commit 140d89f33171d665ec27839e8589f2055dc2a34b
Author: Mayuresh Gharat
Date: 2015-12-18T19:02:49Z

    Made the exception message clearer, explaining why the batches expired
[ https://issues.apache.org/jira/browse/KAFKA-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064942#comment-15064942 ] ASF GitHub Bot commented on KAFKA-1860: --- Github user MayureshGharat closed the pull request at: https://github.com/apache/kafka/pull/697
[ https://issues.apache.org/jira/browse/KAFKA-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064951#comment-15064951 ] ASF GitHub Bot commented on KAFKA-1860: --- GitHub user MayureshGharat opened a pull request: https://github.com/apache/kafka/pull/698 KAFKA-1860 The JVM should stop if the underlying file system goes into read-only mode

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MayureshGharat/kafka KAFKA-1860

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/kafka/pull/698.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #698

commit 4ba2186fbf422395387254d3201f53bc6707
Author: Mayuresh Gharat
Date: 2015-12-18T23:21:01Z

    The JVM should stop if the underlying file system goes into read-only mode
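The fix direction named in the PR title can be sketched as a wrapper that promotes I/O failures in scheduled tasks to a fatal handler instead of logging them indefinitely; in a real broker the handler would do something like `Runtime.getRuntime().halt(1)`. Everything below is an illustrative sketch, not Kafka's actual implementation:

```java
import java.io.UncheckedIOException;
import java.util.function.Consumer;

public final class FatalErrorWrapper {
    // Wrap a periodic task (e.g. the recovery-point checkpoint) so that an
    // I/O failure, such as a write attempt on a read-only filesystem, reaches
    // a fatal handler instead of being swallowed by the scheduler and
    // retried forever. Names are illustrative; Kafka's real scheduler
    // API differs.
    public static Runnable fatalOnIoError(Runnable task, Consumer<Throwable> onFatal) {
        return () -> {
            try {
                task.run();
            } catch (UncheckedIOException e) {
                // In production the handler would log the error and then
                // halt the JVM, so a supervisor can restart the broker
                // once healthy storage is back.
                onFatal.accept(e);
            }
        };
    }
}
```

A broker scheduling its checkpoint task through such a wrapper would fail fast on the first EROFS error instead of emitting the endless ERROR lines shown in the issue description.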
[ https://issues.apache.org/jira/browse/KAFKA-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15062380#comment-15062380 ] Ismael Juma commented on KAFKA-1860: Maybe worth filing a PR with this change.
[ https://issues.apache.org/jira/browse/KAFKA-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15062409#comment-15062409 ] Mayuresh Gharat commented on KAFKA-1860: Cool.
[ https://issues.apache.org/jira/browse/KAFKA-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652067#comment-14652067 ] Mayuresh Gharat commented on KAFKA-1860: [~guozhang] ping.
[ https://issues.apache.org/jira/browse/KAFKA-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366069#comment-14366069 ] Mayuresh Gharat commented on KAFKA-1860: Created reviewboard https://reviews.apache.org/r/32172/diff/ against branch origin/trunk
[ https://issues.apache.org/jira/browse/KAFKA-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366088#comment-14366088 ] Mayuresh Gharat commented on KAFKA-1860: We need to reproduce and test the fix on our Kafka servers.