[jira] [Commented] (KAFKA-1539) Due to OS caching Kafka might lose offset files which causes full reset of data
[ https://issues.apache.org/jira/browse/KAFKA-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069899#comment-14069899 ] Dmitry Bugaychenko commented on KAFKA-1539:
---
Going to test power failure again later today; I'll get back with results as soon as we have them.

Due to OS caching Kafka might lose offset files which causes full reset of data
Key: KAFKA-1539
URL: https://issues.apache.org/jira/browse/KAFKA-1539
Project: Kafka
Issue Type: Bug
Components: log
Affects Versions: 0.8.1.1
Reporter: Dmitry Bugaychenko
Assignee: Jay Kreps
Attachments: KAFKA-1539.patch

Seen this while testing power failure and disk failures. Due to caching at the OS level (e.g. XFS can cache data for 30 seconds), after a failure we got offset files of zero length. This dramatically slows down broker startup (it has to re-check all segments), and if the high watermark offsets are lost it simply erases all data and starts recovering from other brokers (which looks funny: first spending 2-3 hours re-checking logs and then deleting them all due to the missing high watermark).

Proposal: introduce offset file rotation. Keep two versions of the offset file, write to the oldest, and read from the newest valid one. That way the offset checkpoint interval can be configured so that at least one file is always flushed and valid.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
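The two-file rotation in the proposal above (write to the older file, read from the newest one that is valid) could be sketched roughly as follows. This is an illustrative Python sketch, not Kafka code; the file names, the generation counter on the first line, and the entry-count validity check are all assumptions made for the example.

```python
import os

# Sketch of the proposed checkpoint rotation: two files, overwrite the
# older one on each write, read back the newest file that parses fully.
FILES = ["checkpoint.0", "checkpoint.1"]

def _load(path):
    """Return (generation, offsets) or None if the file is missing/corrupt."""
    try:
        with open(path) as f:
            lines = f.read().splitlines()
        gen = int(lines[0])
        count = int(lines[1])
        entries = lines[2:2 + count]
        if len(entries) != count:
            return None  # truncated file (e.g. after power loss) -> invalid
        offsets = {}
        for line in entries:
            topic, part, off = line.split()
            offsets[(topic, int(part))] = int(off)
        return gen, offsets
    except (OSError, ValueError, IndexError):
        return None

def read_checkpoint():
    """Read from the newest valid file; fall back to the other one."""
    best = None
    for path in FILES:
        loaded = _load(path)
        if loaded and (best is None or loaded[0] > best[0]):
            best = loaded
    return best[1] if best else {}

def write_checkpoint(offsets):
    """Overwrite the older of the two files with a higher generation."""
    loaded = [_load(p) for p in FILES]
    gens = [l[0] if l else -1 for l in loaded]
    target = FILES[gens.index(min(gens))]
    with open(target, "w") as f:
        f.write("%d\n%d\n" % (max(gens) + 1, len(offsets)))
        for (topic, part), off in offsets.items():
            f.write("%s %d %d\n" % (topic, part, off))
        f.flush()
        os.fsync(f.fileno())  # rotation alone is not enough; see Jun Rao's comment
```

As Jun Rao points out later in the thread, rotation by itself does not guarantee durability if neither copy is fsynced, which is why the sketch still calls os.fsync.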
[ https://issues.apache.org/jira/browse/KAFKA-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070274#comment-14070274 ] Dmitry Bugaychenko commented on KAFKA-1539:
---
With the fileOutputStream.getFD.sync() patch we passed the power failure tests without losing offset files. So, it seems to work.
[ https://issues.apache.org/jira/browse/KAFKA-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068831#comment-14068831 ] Jay Kreps commented on KAFKA-1539:
--
Created reviewboard https://reviews.apache.org/r/23743/ against branch trunk.
[ https://issues.apache.org/jira/browse/KAFKA-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068837#comment-14068837 ] Jay Kreps commented on KAFKA-1539:
--
This is a really good catch; we were clearly thinking flush() meant fsync, which is totally wrong. I uploaded a patch with your fix. If you are testing this, let me know whether it actually fixes the issue you saw.
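For reference, the distinction in question: flush() only moves application buffers into the OS page cache, while an fsync (FileDescriptor.sync() on the JVM) asks the kernel to push the data all the way to the storage device. A minimal illustration of the same pattern, as a Python sketch rather than the JVM code used in the patch:

```python
import os

def durable_write(path, data):
    """Write data so it survives a power failure, not just a process crash."""
    with open(path, "w") as f:
        f.write(data)
        f.flush()             # user-space buffer -> OS page cache only
        os.fsync(f.fileno())  # page cache -> device, before returning
```

Without the fsync step, the kernel may hold the dirty pages for many seconds (30 by default on XFS, per the report above), which is exactly the window in which a power failure produces a zero-length file.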
[ https://issues.apache.org/jira/browse/KAFKA-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068848#comment-14068848 ] Sriram Subramanian commented on KAFKA-1539:
---
I encountered the same issue in another project and had to explicitly use fsync to fix it.
[ https://issues.apache.org/jira/browse/KAFKA-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068010#comment-14068010 ] Dmitry Bugaychenko commented on KAFKA-1539:
---
Dug into the problem a bit more. It looks like calling flush on new BufferedWriter(new FileWriter(temp)) only forces the buffered writer to dump everything into the FileOutputStream underneath and call flush on it. However, according to http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7u40-b43/java/io/FileOutputStream.java#FileOutputStream that flush does nothing. To really force the data to be written to disk you need to call fos.getFD().sync(). Accordingly, the patch could look like this:
{code}
def write(offsets: Map[TopicAndPartition, Long]) {
  lock synchronized {
    // write to temp file and then swap with the existing file
    val temp = new File(file.getAbsolutePath + ".tmp")
    val fileOutputStream = new FileOutputStream(temp)
    val writer = new BufferedWriter(new OutputStreamWriter(fileOutputStream))
    try {
      // write the current version
      writer.write(0.toString)
      writer.newLine()

      // write the number of entries
      writer.write(offsets.size.toString)
      writer.newLine()

      // write the entries
      offsets.foreach { case (topicPart, offset) =>
        writer.write("%s %d %d".format(topicPart.topic, topicPart.partition, offset))
        writer.newLine()
      }

      // flush the buffered writer down to the OS
      writer.flush()

      // force fsync to disk
      fileOutputStream.getFD.sync()
    } finally {
      writer.close()
    }

    // swap new offset checkpoint file with previous one
    if (!temp.renameTo(file)) {
      // renameTo() fails on Windows if the destination file exists.
      file.delete()
      if (!temp.renameTo(file))
        throw new IOException("File rename from %s to %s failed.".format(temp.getAbsolutePath, file.getAbsolutePath))
    }
  }
}
{code}
Note that the problem is easily reproducible only on XFS; ext3/ext4 seem to handle this case much better. We hope to try the patch later this week and check whether it helps.
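For completeness, the format written above (a version line, an entry count, then one `topic partition offset` line per entry) can be read back with a truncation check, which is also how a zero-length file after power loss becomes detectable at startup. This is an illustrative Python sketch, not the actual Kafka reader:

```python
def read_offset_checkpoint(path):
    """Parse a checkpoint file: version, entry count, then
    'topic partition offset' lines. Raises on a truncated or
    empty file, the symptom seen after the power failures above."""
    with open(path) as f:
        lines = f.read().splitlines()
    if not lines or int(lines[0]) != 0:
        raise IOError("unrecognized checkpoint version in %s" % path)
    count = int(lines[1])
    entries = lines[2:2 + count]
    if len(entries) != count:
        raise IOError("truncated checkpoint file %s" % path)
    offsets = {}
    for line in entries:
        topic, partition, offset = line.split()
        offsets[(topic, int(partition))] = int(offset)
    return offsets
```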
[ https://issues.apache.org/jira/browse/KAFKA-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061724#comment-14061724 ] Dmitry Bugaychenko commented on KAFKA-1539:
---
It looks like even after flush, data are not necessarily written to the HDD. On XFS they can be cached for up to 30 seconds by default, and they can also be cached by a disk controller, etc. Writing to a temp file is a good idea, but it is better to keep the previous file untouched (do not replace it with the temp one). On a 20-HDD server with XFS this is pretty easy to reproduce: after a power failure we got corrupted offset files on 4-5 disks.
[ https://issues.apache.org/jira/browse/KAFKA-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062973#comment-14062973 ] Jun Rao commented on KAFKA-1539:
If flush is not guaranteed, will keeping two versions of the file help? At some point we will have flushed both versions, and neither one is guaranteed to persist.
[ https://issues.apache.org/jira/browse/KAFKA-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060456#comment-14060456 ] Dmitry Bugaychenko commented on KAFKA-1539:
---
This is not about the log files themselves, but about the checkpoint offset files:
{code}
-rw-r--r-- 1 root root 158 Jul 14 12:11 recovery-point-offset-checkpoint
-rw-r--r-- 1 root root 163 Jul 14 12:11 replication-offset-checkpoint
-rw-r--r-- 1 root root   0 May 28 13:09 cleaner-offset-checkpoint
{code}
If recovery-point-offset-checkpoint gets corrupted, broker startup slows down dramatically (to hours); if replication-offset-checkpoint gets corrupted, the broker removes all the data it has and starts recovering from other replicas. If both get corrupted, you get both: the broker spends hours checking log segment files and then removes them all.