Jay Bae created CURATOR-76:
------------------------------

             Summary: Adding leader selection and TTL feature in ChildReaper recipe
                 Key: CURATOR-76
                 URL: https://issues.apache.org/jira/browse/CURATOR-76
             Project: Apache Curator
          Issue Type: Bug
          Components: Recipes
            Reporter: Jay Bae


We are seeing serious data corruption during rolling restarts of our ZooKeeper
servers, due to one application that uses the ChildReaper recipe. I am not sure
of the root cause, but my theory is that when multiple instances run the
ChildReaper recipe, they conflict with each other between checking whether a
path exists and deleting it, and this conflict can corrupt data. We observed
all servers die because of corrupted data, and we had to manually copy the
log/snapshot data and restart them.

Also, simply checking whether the znode is empty is not enough. It would be
better if ChildReaper checked that the node is empty and has not been modified
for a configurable amount of time (a TTL).
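A minimal sketch of the proposed check, written in Python for brevity and not based on Curator's actual code. It assumes a hypothetical `NodeSnapshot` holding a znode's child count and `cversion` (ZooKeeper's counter of changes to a node's children). How the snapshot is fetched is out of scope. Since ZooKeeper's `mtime` tracks changes to a node's own data rather than its children, the sketch does not rely on it. Instead it remembers when it first saw a node empty at a given `cversion`, and it allows reaping only when the leader later sees the same `cversion`, still with zero children, at least `ttl_ms` later. `TtlReaperPolicy`, `should_reap` and `is_leader` are illustrative names, not Curator API.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class NodeSnapshot:
    """Hypothetical snapshot of a znode's metadata (not a Curator/ZooKeeper type)."""
    path: str
    num_children: int
    cversion: int  # mirrors ZooKeeper's Stat cversion: bumped on child create/delete


class TtlReaperPolicy:
    """Allows reaping only of nodes that stayed empty and unchanged for ttl_ms."""

    def __init__(self, ttl_ms: int) -> None:
        self.ttl_ms = ttl_ms
        # path -> (cversion when first seen empty, local time of that observation)
        self._empty_since: Dict[str, Tuple[int, int]] = {}

    def should_reap(self, snap: NodeSnapshot, now_ms: int, is_leader: bool) -> bool:
        if not is_leader:
            # Only the elected leader reaps, so instances do not race each other.
            return False
        if snap.num_children != 0:
            # Node is in use: forget any earlier empty observation.
            self._empty_since.pop(snap.path, None)
            return False
        seen = self._empty_since.get(snap.path)
        if seen is None or seen[0] != snap.cversion:
            # First empty observation, or children changed since: restart the clock.
            self._empty_since[snap.path] = (snap.cversion, now_ms)
            return False
        if now_ms - seen[1] >= self.ttl_ms:
            # Drop the entry so a later re-created node starts a fresh TTL.
            del self._empty_since[snap.path]
            return True
        return False


policy = TtlReaperPolicy(ttl_ms=60_000)
node = NodeSnapshot("/locks/a", num_children=0, cversion=4)
print(policy.should_reap(node, now_ms=0, is_leader=True))       # first sighting
print(policy.should_reap(node, now_ms=60_000, is_leader=True))  # TTL elapsed
```

Even with this policy, a child could be created between the check and the delete. ZooKeeper rejects deleting a node that has children, so a reaper built this way should treat that failure as "not reapable yet" rather than as an error.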



--
This message was sent by Atlassian JIRA
(v6.1#6144)
