>> Why does ZooKeeper not fix the corrupted entry
automatically using replicas? What is the reason for this design decision?
My view is that whether to repair or not is not a decision that applies
uniformly to all cases. In some cases, such as hardware failures, it is
better to crash and have an admin intervene (i.e. by reconfiguring the
cluster) than to attempt a premature autonomous repair that would not
succeed because of the hardware failure. Worse, proceeding on top of such
failures might compromise the safety properties ZooKeeper promises. Crashing
immediately sacrifices liveness to a certain extent (if a quorum fails to
form once the failed node is taken out), but keeps the safety property.
Another benefit of crashing immediately is that it greatly simplifies the
system model and behavior that applications / management layers built on
top of ZK have to reason about.
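
To make the crash-on-corruption idea concrete, here is a minimal sketch of
a fail-stop log replay. It is not ZooKeeper's actual code: the names
append_record, replay and CorruptRecordError are made up for illustration,
the log is just an in-memory list, and Adler-32 is used only as an example
checksum.

```python
import zlib


class CorruptRecordError(Exception):
    """Raised when a stored checksum does not match the record bytes."""


def append_record(log, payload):
    # Hypothetical helper: store each record together with its checksum.
    log.append((zlib.adler32(payload), payload))


def replay(log):
    # Fail-stop replay: verify every record and stop at the first mismatch
    # instead of guessing at a repair or skipping the record.
    applied = []
    for index, (stored_checksum, payload) in enumerate(log):
        if zlib.adler32(payload) != stored_checksum:
            raise CorruptRecordError(f"checksum mismatch at record {index}")
        applied.append(payload)
    return applied
```

A caller that lets CorruptRecordError propagate and exits gives up liveness
on that node but never applies a record it cannot trust, which is the
trade-off described above.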
On the other hand, in some cases it might be entirely reasonable to attempt
a repair, and I remember there are some ZooKeeper JIRAs describing such
best-effort actions. But in general, since ZooKeeper does not deal with
Byzantine faults, such repair attempts might not always be applicable.
On Mon, Jan 9, 2017 at 9:41 AM, Edward Ribeiro wrote:
> Hi,
>
> I am not aware if this was a design decision, to be honest. AFAIK, this has
> been a long-standing bug. :( I have compiled a handful of JIRA issues that
> are basically this problem scattered through multiple repetitive issues.
> Gonna aggregate them soon, I hope. We, the community, should raise the
> priority and tackle this issue to make the ZK server more resilient and
> robust in the face of log/txn file corruption. Any suggestion is more than
> welcome, by the way!
>
> Cheers,
> Eddie
>
>
> On Fri, Jan 6, 2017 at 6:38 PM, Aishwarya Ganesan wrote:
>
> > Hi,
> >
> > We are looking at how ZooKeeper handles silent data corruptions resulting
> > from underlying problems in disks and file systems atop them [1,2].
> >
> > We set up a 3-node ZooKeeper cluster and introduce silent data corruptions
> > to different blocks in the on-disk files. In all the cases, ZooKeeper is
> > able to detect corruptions in the log file using checksums.
> >
> > However, on detecting a corruption, the ZooKeeper node in which corruption
> > occurred crashes instead of trying to fix the corrupted data automatically
> > using the replicas. Why does ZooKeeper not fix the corrupted entry
> > automatically using replicas? What is the reason for this design decision?
> > It would be helpful if anyone could give some insights on this.
> >
> > [1] https://research.cs.wisc.edu/wind/Publications/zfs-corruption-fast10.pdf
> > [2] http://www.cs.toronto.edu/~bianca/papers/fast08.pdf
> >
> > Thanks,
> > Aishwarya
> >
>
--
Cheers
Michael.