[jira] [Commented] (KAFKA-1860) File system errors are not detected unless Kafka tries to write

2016-02-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127177#comment-15127177
 ] 

ASF GitHub Bot commented on KAFKA-1860:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/698


> File system errors are not detected unless Kafka tries to write
> ---
>
> Key: KAFKA-1860
> URL: https://issues.apache.org/jira/browse/KAFKA-1860
> Project: Kafka
> Issue Type: Bug
> Reporter: Guozhang Wang
> Assignee: Mayuresh Gharat
> Fix For: 0.10.0.0
>
> Attachments: KAFKA-1860.patch
>
>
> When the disk (RAID with caches dir) dies on a Kafka broker, the filesystem
> typically gets remounted in read-only mode, and hence when Kafka tries to
> read the disk, it gets a FileNotFoundException with the read-only errno
> set (EROFS).
> However, as long as no produce request is received, and hence no writes
> are attempted on the disks, Kafka will not exit on such a FATAL error (when
> the disk starts working again, Kafka might think some files are gone while
> they will reappear later as the RAID comes back online). Instead it keeps
> spilling exceptions like:
> {code}
> 2015/01/07 09:47:41.543 ERROR [KafkaScheduler] [kafka-scheduler-1] [kafka-server] [] Uncaught exception in scheduled task 'kafka-recovery-point-checkpoint'
> java.io.FileNotFoundException: /export/content/kafka/i001_caches/recovery-point-offset-checkpoint.tmp (Read-only file system)
>   at java.io.FileOutputStream.open(Native Method)
>   at java.io.FileOutputStream.<init>(FileOutputStream.java:206)
>   at java.io.FileOutputStream.<init>(FileOutputStream.java:156)
>   at kafka.server.OffsetCheckpoint.write(OffsetCheckpoint.scala:37)
> {code}
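
One way to catch the condition described above without waiting for a produce request is to probe each log directory with a small write on a timer. The Java sketch below is illustrative only and is not the patch attached to this issue; `DiskHealthProbe`, `isWritable`, `checkAll`, and the probe file name are hypothetical names chosen for the example.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

/** Hypothetical helper, not part of Kafka: probes a directory by actually writing to it. */
final class DiskHealthProbe {

    /** Returns true if a small probe file can be created, written, synced and deleted in dir. */
    static boolean isWritable(File dir) {
        File probe = new File(dir, ".disk-health-probe.tmp"); // assumed file name
        try (FileOutputStream out = new FileOutputStream(probe)) {
            out.write(0);
            out.getFD().sync(); // push the byte to the device so a failing disk is noticed
        } catch (IOException e) {
            // On a read-only mount the open fails with a FileNotFoundException
            // whose message ends in "(Read-only file system)".
            return false;
        }
        return probe.delete();
    }

    /** Runs onFailure for the first directory that fails the probe; returns true if all pass. */
    static boolean checkAll(Iterable<File> dirs, Runnable onFailure) {
        for (File dir : dirs) {
            if (!isWritable(dir)) {
                onFailure.run();
                return false;
            }
        }
        return true;
    }
}
```

A broker could call `checkAll` from a scheduled task and pass a callback that logs a fatal error and then calls `Runtime.getRuntime().halt(1)`. Whether to halt immediately or attempt a clean shutdown is a design choice this sketch leaves open.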



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1860) File system errors are not detected unless Kafka tries to write

2016-01-22 Thread Mayuresh Gharat (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15113128#comment-15113128
 ] 

Mayuresh Gharat commented on KAFKA-1860:


[~guozhang] can you take another look at the PR? I have addressed most of the 
comments on the PR. 



[jira] [Commented] (KAFKA-1860) File system errors are not detected unless Kafka tries to write

2015-12-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064922#comment-15064922
 ] 

ASF GitHub Bot commented on KAFKA-1860:
---

GitHub user MayureshGharat opened a pull request:

https://github.com/apache/kafka/pull/697

KAFKA-1860

The JVM should stop if the underlying file system goes in to Read only mode

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/MayureshGharat/kafka kafka-1860

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/697.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #697


commit 5c7f2e749fd8674bae66b6698319181a0f3e9251
Author: Mayuresh Gharat 
Date:   2015-12-18T18:28:32Z

Added topic-partition information to the exception message on batch expiry 
in RecordAccumulator

commit 140d89f33171d665ec27839e8589f2055dc2a34b
Author: Mayuresh Gharat 
Date:   2015-12-18T19:02:49Z

Made the exception message more clear explaining why the batches expired






[jira] [Commented] (KAFKA-1860) File system errors are not detected unless Kafka tries to write

2015-12-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064942#comment-15064942
 ] 

ASF GitHub Bot commented on KAFKA-1860:
---

Github user MayureshGharat closed the pull request at:

https://github.com/apache/kafka/pull/697




[jira] [Commented] (KAFKA-1860) File system errors are not detected unless Kafka tries to write

2015-12-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064951#comment-15064951
 ] 

ASF GitHub Bot commented on KAFKA-1860:
---

GitHub user MayureshGharat opened a pull request:

https://github.com/apache/kafka/pull/698

KAFKA-1860

The JVM should stop if the underlying file system goes in to Read only mode

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/MayureshGharat/kafka KAFKA-1860

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/698.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #698


commit 4ba2186fbf422395387254d3201f53bc6707
Author: Mayuresh Gharat 
Date:   2015-12-18T23:21:01Z

The JVM should stop if the underlying file system goes in to Read only mode






[jira] [Commented] (KAFKA-1860) File system errors are not detected unless Kafka tries to write

2015-12-17 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15062380#comment-15062380
 ] 

Ismael Juma commented on KAFKA-1860:


Maybe worth filing a PR with this change.



[jira] [Commented] (KAFKA-1860) File system errors are not detected unless Kafka tries to write

2015-12-17 Thread Mayuresh Gharat (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15062409#comment-15062409
 ] 

Mayuresh Gharat commented on KAFKA-1860:


Cool.



[jira] [Commented] (KAFKA-1860) File system errors are not detected unless Kafka tries to write

2015-08-03 Thread Mayuresh Gharat (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652067#comment-14652067
 ] 

Mayuresh Gharat commented on KAFKA-1860:


[~guozhang] ping.

> File system errors are not detected unless Kafka tries to write
> ---
>
> Key: KAFKA-1860
> URL: https://issues.apache.org/jira/browse/KAFKA-1860
> Project: Kafka
> Issue Type: Bug
> Reporter: Guozhang Wang
> Assignee: Mayuresh Gharat
> Fix For: 0.9.0
>
> Attachments: KAFKA-1860.patch




[jira] [Commented] (KAFKA-1860) File system errors are not detected unless Kafka tries to write

2015-03-17 Thread Mayuresh Gharat (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366069#comment-14366069
 ] 

Mayuresh Gharat commented on KAFKA-1860:


Created reviewboard https://reviews.apache.org/r/32172/diff/
 against branch origin/trunk



[jira] [Commented] (KAFKA-1860) File system errors are not detected unless Kafka tries to write

2015-03-17 Thread Mayuresh Gharat (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366088#comment-14366088
 ] 

Mayuresh Gharat commented on KAFKA-1860:


We need to reproduce and test the fix on our kafka server mp
