[jira] [Commented] (KAFKA-1539) Due to OS caching Kafka might lose offset files which causes full reset of data

2014-07-22 Thread Dmitry Bugaychenko (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069899#comment-14069899 ]

Dmitry Bugaychenko commented on KAFKA-1539:
---

Going to test power failure again later today; I'll get back with the results as 
soon as we have them.

 Due to OS caching Kafka might lose offset files which causes full reset of data
 

 Key: KAFKA-1539
 URL: https://issues.apache.org/jira/browse/KAFKA-1539
 Project: Kafka
  Issue Type: Bug
  Components: log
Affects Versions: 0.8.1.1
Reporter: Dmitry Bugaychenko
Assignee: Jay Kreps
 Attachments: KAFKA-1539.patch


 Seen this while testing power and disk failures. Due to caching at the OS level 
 (e.g. XFS can cache data for 30 seconds), after a failure we got offset files of 
 zero length. This dramatically slows down broker startup (it has to re-check all 
 segments), and if the high watermark offsets are lost it simply erases all data 
 and starts recovering from other brokers (which looks funny: first spending 2-3 
 hours re-checking logs and then deleting them all due to the missing high 
 watermark).
 Proposal: introduce offset file rotation. Keep two versions of the offset file, 
 write to the oldest, read from the newest valid one. In this case we would be 
 able to configure the offset checkpoint time in a way that at least one file is 
 always flushed and valid.
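
A rough sketch of the proposed rotation (hypothetical file names, not an actual 
patch) could look like this, assuming some existing routine can tell whether a 
checkpoint file parses:

{code}
// Hypothetical sketch of the proposed checkpoint rotation: always write to the
// older of two files so the newer, already-flushed copy stays untouched, and on
// startup read the newest file that still parses.
import java.io.File

object OffsetCheckpointRotation {
  def checkpointFiles(dir: File): Seq[File] =
    Seq(new File(dir, "offset-checkpoint.0"), new File(dir, "offset-checkpoint.1"))

  // Pick the older (or missing) file as the write target.
  def fileToWrite(dir: File): File =
    checkpointFiles(dir).minBy(f => if (f.exists()) f.lastModified() else 0L)

  // Pick the newest file that still parses as the read source.
  def fileToRead(dir: File, parses: File => Boolean): Option[File] =
    checkpointFiles(dir)
      .filter(f => f.exists() && parses(f))
      .sortBy(-_.lastModified())
      .headOption
}
{code}

If the checkpoint interval is longer than the worst-case flush delay, at most one 
of the two files is ever unflushed at a time, so the other remains valid after a 
crash.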





[jira] [Commented] (KAFKA-1539) Due to OS caching Kafka might lose offset files which causes full reset of data

2014-07-22 Thread Dmitry Bugaychenko (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070274#comment-14070274 ]

Dmitry Bugaychenko commented on KAFKA-1539:
---

With the fileOutputStream.getFD.sync() patch we passed the power failure tests 
without losing offset files, so it seems to work.



[jira] [Commented] (KAFKA-1539) Due to OS caching Kafka might lose offset files which causes full reset of data

2014-07-21 Thread Jay Kreps (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068831#comment-14068831 ]

Jay Kreps commented on KAFKA-1539:
--

Created reviewboard https://reviews.apache.org/r/23743/
 against branch trunk



[jira] [Commented] (KAFKA-1539) Due to OS caching Kafka might lose offset files which causes full reset of data

2014-07-21 Thread Jay Kreps (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068837#comment-14068837 ]

Jay Kreps commented on KAFKA-1539:
--

This is a really good catch; we were clearly thinking flush() meant fsync, which 
is totally wrong. I uploaded a patch with your fix. If you are testing with this, 
let me know whether it actually fixes the issue you saw.
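
For reference, a minimal sketch (not the actual patch) of the distinction the fix 
relies on: flush() only hands the bytes over to the OS page cache, while 
getFD.sync() asks the kernel to force them onto the storage device.

{code}
// Minimal flush-vs-fsync illustration (demo only; the file path is arbitrary).
import java.io.{File, FileOutputStream}

object FlushVsFsyncDemo extends App {
  val fos = new FileOutputStream(new File("/tmp/checkpoint-demo"))
  fos.write("0\n".getBytes("UTF-8"))
  fos.flush()       // FileOutputStream.flush() is a no-op: bytes may still sit only in the page cache
  fos.getFD.sync()  // fsync(2): ask the kernel to persist the bytes to the device
  fos.close()
}
{code}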



[jira] [Commented] (KAFKA-1539) Due to OS caching Kafka might lose offset files which causes full reset of data

2014-07-21 Thread Sriram Subramanian (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068848#comment-14068848 ]

Sriram Subramanian commented on KAFKA-1539:
---

I had encountered the same issue in another project and had to explicitly use 
fsync to fix it.



[jira] [Commented] (KAFKA-1539) Due to OS caching Kafka might lose offset files which causes full reset of data

2014-07-20 Thread Dmitry Bugaychenko (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068010#comment-14068010 ]

Dmitry Bugaychenko commented on KAFKA-1539:
---

Dug into the problem a bit more. It looks like calling flush on a new 
BufferedWriter(new FileWriter(temp)) only forces the buffered writer to dump 
everything into the FileOutputStream underneath the FileWriter and call flush on 
it. However, according to 
http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7u40-b43/java/io/FileOutputStream.java#FileOutputStream
FileOutputStream.flush() does nothing. To really force the data to be written to 
disk you need to call fos.getFD().sync(). With that, the patch could look like this:

{code}
  // OffsetCheckpoint.write(): dump the offsets to a temp file, fsync it, then
  // atomically swap it in place of the previous checkpoint file.
  def write(offsets: Map[TopicAndPartition, Long]) {
    lock synchronized {
      // write to temp file and then swap with the existing file
      val temp = new File(file.getAbsolutePath + ".tmp")

      val fileOutputStream = new FileOutputStream(temp)
      val writer = new BufferedWriter(new OutputStreamWriter(fileOutputStream))
      try {
        // write the current version
        writer.write(0.toString)
        writer.newLine()

        // write the number of entries
        writer.write(offsets.size.toString)
        writer.newLine()

        // write the entries
        offsets.foreach { case (topicPart, offset) =>
          writer.write("%s %d %d".format(topicPart.topic, topicPart.partition, offset))
          writer.newLine()
        }

        // flush and overwrite old file
        writer.flush()

        // force fsync to disk
        fileOutputStream.getFD.sync()
      } finally {
        writer.close()
      }

      // swap new offset checkpoint file with previous one
      if(!temp.renameTo(file)) {
        // renameTo() fails on Windows if the destination file exists.
        file.delete()
        if(!temp.renameTo(file))
          throw new IOException("File rename from %s to %s failed.".format(temp.getAbsolutePath, file.getAbsolutePath))
      }
    }
  }
{code}

Note that the problem is easily reproducible only on XFS; ext3/ext4 seem to 
handle this case much better. We hope to be able to try the patch later this week 
and check whether it helps.



[jira] [Commented] (KAFKA-1539) Due to OS caching Kafka might lose offset files which causes full reset of data

2014-07-15 Thread Dmitry Bugaychenko (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061724#comment-14061724 ]

Dmitry Bugaychenko commented on KAFKA-1539:
---

It looks like even after flush the data is not necessarily written to the HDD. On 
XFS it can be cached for up to 30 seconds by default, and it can also be cached 
by a disk controller, etc. Writing to a temp file is a good idea, but it is 
better to keep the previous file untouched (do not replace it with the temp one).

On a 20-HDD server with XFS it is pretty easy to reproduce: after a power failure 
we got corrupted offset files on 4-5 disks.



[jira] [Commented] (KAFKA-1539) Due to OS caching Kafka might lose offset files which causes full reset of data

2014-07-15 Thread Jun Rao (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062973#comment-14062973 ]

Jun Rao commented on KAFKA-1539:


If flush is not guaranteed, will keeping two versions of the file help? At some 
point, we will have flushed both versions and neither one is guaranteed to 
persist.



[jira] [Commented] (KAFKA-1539) Due to OS caching Kafka might lose offset files which causes full reset of data

2014-07-14 Thread Dmitry Bugaychenko (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060456#comment-14060456 ]

Dmitry Bugaychenko commented on KAFKA-1539:
---

This is not about the log files themselves, but about the checkpoint offset files:

{code}
-rw-r--r--  1 root root   158 Jul 14 12:11 recovery-point-offset-checkpoint
-rw-r--r--  1 root root   163 Jul 14 12:11 replication-offset-checkpoint
-rw-r--r--  1 root root 0 May 28 13:09 cleaner-offset-checkpoint
{code}

If recovery-point-offset-checkpoint gets corrupted, broker startup slows down 
dramatically (to hours); if replication-offset-checkpoint gets corrupted, the 
broker removes all the data it has and starts recovering from other replicas. If 
both get corrupted you get both: the broker spends hours checking log segment 
files and then removes them all.
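
For context, these checkpoints are tiny text files in a "version / count / topic 
partition offset" layout (the same layout the checkpoint write code in this 
thread produces). A hypothetical read sketch (not the Kafka source) shows why a 
zero-length or truncated file is unusable and forces the slow recovery paths:

{code}
// Hypothetical checkpoint reader for the "version / count / topic partition offset"
// layout. A zero-length or truncated file fails one of the checks below, which is
// what triggers the full segment re-check or the full re-replication described above.
import java.io.File
import scala.io.Source

object CheckpointReadSketch {
  def read(file: File): Map[(String, Int), Long] = {
    val lines = Source.fromFile(file).getLines().toList
    require(lines.size >= 2, "empty or truncated checkpoint file")  // e.g. zero length after power loss
    require(lines.head.trim == "0", "unexpected checkpoint version")
    val expected = lines(1).trim.toInt
    val entries = lines.drop(2).map { line =>
      val Array(topic, partition, offset) = line.split(" ")
      (topic, partition.toInt) -> offset.toLong
    }.toMap
    require(entries.size == expected, "truncated checkpoint file")
    entries
  }
}
{code}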


--
This message was sent by Atlassian JIRA
(v6.2#6252)