[ https://issues.apache.org/jira/browse/ZOOKEEPER-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435785#comment-15435785 ]
Patrick Hunt commented on ZOOKEEPER-2528:
-----------------------------------------
Thanks for the report. Do you have the stack trace from the EOF exception?
That would simplify tracking this one down.
> ZooKeeper cluster can become unavailable due to power failures
> --------------------------------------------------------------
>
> Key: ZOOKEEPER-2528
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2528
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.8
> Environment: A normal ZooKeeper cluster of 3 nodes running on 3 Linux
> machines.
> Reporter: Ramnatthan Alagappan
>
> A ZooKeeper cluster can become unavailable if power failures happen at
> specific points in time.
> Details:
> I am running a three-node ZooKeeper cluster. I perform a simple update from a
> client machine.
> When I update a value, ZooKeeper may create a new log file (for example,
> when the current log is full). First, it creates the file and appends some
> header information to the newly created log. The system call sequence looks
> like this:
> creat(log.200000001)
> append(log.200000001, offset=0, count=16)
> Now, if a power failure happens just after the creat of the log file but
> before the append of the header information, the log file is left without
> its header and the node simply crashes with an EOF exception. If the same
> problem occurs at two or more nodes in my three-node cluster, the entire
> cluster becomes unavailable because a majority of the servers have crashed.
> A simultaneous power failure across multiple nodes is possible in
> single-data-center or single-rack deployments.
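
The failure window described above can be reproduced outside ZooKeeper with a
small sketch. This is not ZooKeeper's code: the function names are
hypothetical, and the header layout (two 4-byte integers and one 8-byte
integer, 16 bytes in total to match the count=16 append) is an assumption made
only for illustration. It creates the log file without writing the header, as
if power were lost between the two system calls, and then shows that a strict
reader hits EOF while a tolerant reader could treat the zero-length file as an
empty log.

```python
import os
import struct
import tempfile

# Assumed header layout for illustration only: magic, version, db id
# (4 + 4 + 8 = 16 bytes, matching the 16-byte append in the report).
HEADER_FORMAT = ">iiq"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


def simulate_crash_after_creat(directory, name="log.200000001"):
    """Create the log file but never write its header.

    This mimics a power failure after creat() and before the append.
    """
    path = os.path.join(directory, name)
    fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o644)
    os.close(fd)
    return path


def read_header_strict(path):
    """Read the header, raising EOFError if the file is too short."""
    with open(path, "rb") as f:
        data = f.read(HEADER_SIZE)
    if len(data) < HEADER_SIZE:
        raise EOFError(
            f"{path}: expected {HEADER_SIZE} header bytes, got {len(data)}"
        )
    return struct.unpack(HEADER_FORMAT, data)


def read_header_tolerant(path):
    """Return None for a zero-length log instead of failing.

    Hypothetical recovery policy: a file with no bytes holds no
    transactions, so the caller may choose to skip it.
    """
    if os.path.getsize(path) == 0:
        return None
    return read_header_strict(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log_path = simulate_crash_after_creat(tmp)
        try:
            read_header_strict(log_path)
        except EOFError as exc:
            print("strict reader failed:", exc)
        print("tolerant reader returned:", read_header_tolerant(log_path))
```

Whether skipping a zero-length log is actually safe depends on the server's
recovery design; the sketch only illustrates why a reader that always expects
a complete header fails in this window.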