>> Why does ZooKeeper not fix the corrupted entry automatically using replicas? What is the reason for this design decision?
My view is that no single repair-or-not decision applies in every case. In some cases, such as hardware failures, it is better to crash-stop and have an admin intervene (e.g. by reconfiguring the cluster) than to prematurely attempt an autonomous repair. Such a repair would not succeed because of the hardware failure, and, worse, proceeding on top of such failures might compromise the safety properties ZooKeeper promises. Crashing immediately sacrifices liveness to a certain extent (a quorum may fail to form once the failed node is taken out), but it preserves safety. Another benefit of crashing immediately is that it greatly simplifies the system model and its behavior, making both easier to reason about for applications and management layers built on top of ZK.

On the other hand, in some cases it might be perfectly reasonable to attempt a repair, and I remember there are some ZooKeeper JIRAs describing such best-effort actions. But in general, since ZooKeeper does not deal with Byzantine faults, such repair attempts might not always be applicable.

On Mon, Jan 9, 2017 at 9:41 AM, Edward Ribeiro <edward.ribe...@gmail.com> wrote:

> Hi,
>
> I am not aware if this was a design decision, to be honest. AFAIK, this has
> been a long standing bug. :( I have compiled a handful of JIRA issues that
> are basically this problem scattered through multiple repetitive issues.
> Gonna aggregate them soon, I hope. We, the community, should raise the
> priority and tackle this issue to make ZK server more resilient and robust
> in the face of logs/txn files corruption. Any suggestion is more than
> welcome, by the way!
>
> Cheers,
> Eddie
>
>
> On Fri, Jan 6, 2017 at 6:38 PM, Aishwarya Ganesan <ash8as...@gmail.com>
> wrote:
>
> > Hi,
> >
> > We are looking at how ZooKeeper handles silent data corruptions resulting
> > from underlying problems in disks and file systems atop them [1,2].
> >
> > We set up a 3-node ZooKeeper cluster and introduce silent data corruptions
> > to different blocks in the on-disk files. In all the cases, ZooKeeper is
> > able to detect corruptions in the log file using checksums.
> >
> > However, on detecting a corruption, the ZooKeeper node in which corruption
> > occurred crashes instead of trying to fix the corrupted data automatically
> > using the replicas. Why does ZooKeeper not fix the corrupted entry
> > automatically using replicas? What is the reason for this design decision?
> > It would be helpful if anyone could give some insights on this.
> >
> > [1] https://research.cs.wisc.edu/wind/Publications/zfs-corruption-fast10.pdf
> > [2] http://www.cs.toronto.edu/~bianca/papers/fast08.pdf
> >
> > Thanks,
> > Aishwarya
> >

--
Cheers
Michael.
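To make the trade-off discussed in this thread concrete, here is a minimal Python sketch of the two policies: crash-stop on a checksum mismatch versus a best-effort repair from a replica. It assumes a toy in-memory log of (checksum, payload) pairs. Adler-32 (via `zlib.adler32`) stands in for a per-entry checksum, and `append_entry`, `replay`, `CorruptLogError`, and the `repair` callback are hypothetical names, not ZooKeeper APIs. This is not how ZooKeeper's transaction log is actually laid out or recovered.

```python
import zlib
from typing import Callable, List, Optional, Tuple

# Hypothetical toy log entry: (stored Adler-32 checksum, payload bytes).
Entry = Tuple[int, bytes]


class CorruptLogError(Exception):
    """Raised when an entry fails checksum verification and is not repaired."""


def append_entry(log: List[Entry], payload: bytes) -> None:
    """Append a payload together with its Adler-32 checksum."""
    log.append((zlib.adler32(payload), payload))


def replay(log: List[Entry],
           repair: Optional[Callable[[int], bytes]] = None) -> List[bytes]:
    """Replay entries in order and return the recovered payloads.

    repair=None: crash-stop on the first checksum mismatch.
    repair given (hypothetical, e.g. fetch the entry from a peer): try a
    best-effort repair, but still stop if the replacement does not verify.
    """
    state: List[bytes] = []
    for index, (stored, payload) in enumerate(log):
        if zlib.adler32(payload) != stored:
            if repair is None:
                # Crash-stop: give up liveness on this node to keep safety.
                raise CorruptLogError(f"checksum mismatch at entry {index}")
            candidate = repair(index)
            if zlib.adler32(candidate) != stored:
                # The repair did not yield a verifiable entry (for example the
                # stored checksum itself was corrupted), so stop after all.
                raise CorruptLogError(f"repair failed at entry {index}")
            payload = candidate
            log[index] = (stored, payload)  # keep the repaired entry in the log
        state.append(payload)
    return state


if __name__ == "__main__":
    log: List[Entry] = []
    for p in (b"create /a", b"set /a 1", b"delete /a"):
        append_entry(log, p)
    log[1] = (log[1][0], b"set /a 9")  # simulate a silent corruption
    try:
        replay(log)
    except CorruptLogError as err:
        print("crash-stop:", err)
```

With `repair=None`, the replay raises on the first bad entry, mirroring the crash observed in the experiments described above. Passing a `repair` callback models the best-effort repair mentioned in the JIRAs, and it still falls back to stopping when the replacement does not match the stored checksum, which is one way an autonomous repair can fail to succeed.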