Dmitrii Kovalkov created ZOOKEEPER-4311:
-------------------------------------------

             Summary: Fsync errors are ignored in AtomicFileWritingIdiom
                 Key: ZOOKEEPER-4311
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4311
             Project: ZooKeeper
          Issue Type: Bug
          Components: leaderElection, server
            Reporter: Dmitrii Kovalkov


Class AtomicFileOutputStream has non-trivial logic in its 'close' method
([code|https://github.com/apache/zookeeper/blob/5c102298f8a160ea996be7b6d6f95189d4ff2f41/zookeeper-server/src/main/java/org/apache/zookeeper/common/AtomicFileOutputStream.java#L76-L106]).
It ensures that the data is durably stored on disk by calling 'flush' and
'fsync' on the .tmp file, then tries to rename the .tmp file to the target
file. If any of these steps fails, the .tmp file is deleted and an exception
is thrown.
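
To make the sequence concrete, here is a minimal sketch of the
flush / fsync / rename pattern described above. It is not the ZooKeeper
implementation: the class name AtomicWriteSketch and the method
writeAtomically are hypothetical, and the rename is done with Files.move
rather than whatever the real class uses.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

// Hypothetical, simplified stand-in for the close() sequence described above.
// It is NOT the ZooKeeper class, only an illustration of the pattern.
class AtomicWriteSketch {

    static void writeAtomically(File target, byte[] data) throws IOException {
        File tmp = new File(target.getParentFile(), target.getName() + ".tmp");
        boolean renamed = false;
        try {
            try (FileOutputStream out = new FileOutputStream(tmp)) {
                out.write(data);
                out.flush();
                // fsync: throws SyncFailedException (an IOException) on failure.
                out.getFD().sync();
            }
            // Replace the target only after the data has reached the disk.
            Files.move(tmp.toPath(), target.toPath(),
                    StandardCopyOption.ATOMIC_MOVE,
                    StandardCopyOption.REPLACE_EXISTING);
            renamed = true;
        } finally {
            // On any failure, remove the partial .tmp file; the exception
            // still propagates to the caller.
            if (!renamed) {
                tmp.delete();
            }
        }
    }
}
```

The important property is that a failed fsync or rename reaches the caller
as an IOException, so the caller can tell that the target file was not
updated.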

AtomicFileWritingIdiom, which is built on AtomicFileOutputStream, calls only
'flush' explicitly. The 'close' method is called via IOUtils.closeStream
([code|https://github.com/apache/zookeeper/blob/5c102298f8a160ea996be7b6d6f95189d4ff2f41/zookeeper-server/src/main/java/org/apache/zookeeper/common/AtomicFileWritingIdiom.java#L87]).
But the docs say that IOUtils.closeStream ignores any IOException, which can
be thrown during fsync
([docs|https://hadoop.apache.org/docs/r2.7.3/api/org/apache/hadoop/io/IOUtils.html#closeStream(java.io.Closeable)]).
As a result, if fsync fails, the .tmp file is deleted and the main file is
not updated, but ZooKeeper ignores the exception and assumes that everything
is OK.
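
The effect can be reproduced in isolation. The sketch below is illustrative
only: FailingStream simulates a stream whose 'close' fails (as it would on an
fsync error), and closeQuietly is a hypothetical helper that mirrors the
swallow-on-close behavior described above. Neither is ZooKeeper code.

```java
import java.io.Closeable;
import java.io.IOException;

// Illustration only: all names here are hypothetical, not ZooKeeper APIs.
class CloseErrorSketch {

    // Simulates a stream whose close() fails, e.g. because fsync failed.
    static class FailingStream implements Closeable {
        @Override
        public void close() throws IOException {
            throw new IOException("simulated fsync failure");
        }
    }

    // Mirrors a close helper that ignores IOException.
    static void closeQuietly(Closeable c) {
        try {
            if (c != null) {
                c.close();
            }
        } catch (IOException ignored) {
            // The failure is dropped here; the caller never sees it.
        }
    }

    // Returns true when the caller believes the write succeeded.
    static boolean writeIgnoringCloseErrors() {
        FailingStream out = new FailingStream();
        // write + flush would happen here
        closeQuietly(out);
        return true; // reports success even though close() failed
    }

    // Same write, but close() errors propagate via try-with-resources.
    static boolean writePropagatingCloseErrors() {
        try (FailingStream out = new FailingStream()) {
            // write + flush would happen here
        } catch (IOException e) {
            return false; // the caller learns that the write did not persist
        }
        return true;
    }
}
```

In this sketch writeIgnoringCloseErrors reports success while
writePropagatingCloseErrors reports failure for the same underlying error.
One possible direction for a fix, stated here only as an assumption, is to
close the stream in a way that lets the IOException reach the caller.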

AtomicFileWritingIdiom is used in leader election to store the 'currentEpoch'
and 'acceptedEpoch' files. In theory, this bug can lead to two leaders being
elected in the same epoch if a disk failure occurs.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
