Fangmin Lv created ZOOKEEPER-2808:
-------------------------------------

             Summary: ACL with index 1 might be removed if it's only being used once
                 Key: ZOOKEEPER-2808
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2808
             Project: ZooKeeper
          Issue Type: Bug
          Components: server
    Affects Versions: 3.6.0
            Reporter: Fangmin Lv
            Priority: Critical
When Zeus starts up, it creates a DataTree instance, in which the empty config znode is created with the READ_ACL_UNSAFE ACL; that ACL is stored in the ACL map with index 1. Zeus then loads the snapshot from disk, which clears the node map and the ACL map, but the reconfig znode still references ACL index 1. Because the reconfig znode object is reused, index 1 may now point to a different ACL stored in the snapshot.

After leader-follower syncing, the reconfig znode is added back (if it doesn't exist), which removes its previous reference to ACL index 1. If index 1 then has zero references, it is removed from the ACL map. That ACL becomes unusable, and the znode that still uses it can no longer be read.

Related error logs:
-----------------------------
2017-06-12 12:02:21,443 [myid:2] - ERROR [CommitProcWorkThread-14:DataTree@249] - ERROR: ACL not available for long 1
2017-06-12 12:02:21,444 [myid:2] - ERROR [CommitProcWorkThread-14:FinalRequestProcessor@567] - Failed to process sessionid:0x201035cc882002d type:getChildren cxid:0x1 zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a
java.lang.RuntimeException: Failed to fetch acls for 1
    at org.apache.zookeeper.server.DataTree.convertLong(DataTree.java:250)
    at org.apache.zookeeper.server.DataTree.getACL(DataTree.java:799)
    at org.apache.zookeeper.server.ZKDatabase.getACL(ZKDatabase.java:574)
    at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:463)
    at org.apache.zookeeper.server.quorum.CommitProcessor$CommitWorkRequest.doWork(CommitProcessor.java:439)
    at org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:151)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
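The reference-counting failure described in this issue can be replayed in a small, self-contained model. This is a hypothetical sketch: `RefCountedAclCache`, `Node`, `runScenario` and the ACL strings (`"READ_ACL_UNSAFE"`, `"digest:app-acl"`) are illustrative names, not ZooKeeper's actual DataTree or ACL-cache API, and ACLs are modeled as plain strings. It follows the reported sequence: index 1 is allocated for the config znode at startup, the snapshot load rebinds index 1 to another ACL used by a single znode, and re-adding the config znode releases its stale reference, which evicts that ACL.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the bug; names are illustrative only and
// do not correspond to ZooKeeper's real classes or methods.
public class AclCacheSketch {

    // Maps ACL index <-> ACL and counts how many znodes reference each index.
    static class RefCountedAclCache {
        private final Map<Long, String> longToAcl = new HashMap<>();
        private final Map<String, Long> aclToLong = new HashMap<>();
        private final Map<Long, Integer> refCounts = new HashMap<>();
        private long maxIndex = 0;

        // Returns the index for this ACL (allocating one if needed) and adds a reference.
        long acquire(String acl) {
            Long idx = aclToLong.get(acl);
            if (idx == null) {
                idx = ++maxIndex;
                longToAcl.put(idx, acl);
                aclToLong.put(acl, idx);
            }
            refCounts.merge(idx, 1, Integer::sum);
            return idx;
        }

        // Drops one reference; the entry is evicted once its count reaches zero.
        void release(long idx) {
            Integer count = refCounts.get(idx);
            if (count == null) {
                return;
            }
            if (count <= 1) {
                refCounts.remove(idx);
                aclToLong.remove(longToAcl.remove(idx));
            } else {
                refCounts.put(idx, count - 1);
            }
        }

        // Models snapshot loading: wipe everything, then restore the snapshot's entries.
        void loadSnapshot(Map<Long, String> acls, Map<Long, Integer> counts) {
            longToAcl.clear();
            aclToLong.clear();
            refCounts.clear();
            maxIndex = 0;
            for (Map.Entry<Long, String> e : acls.entrySet()) {
                longToAcl.put(e.getKey(), e.getValue());
                aclToLong.put(e.getValue(), e.getKey());
                refCounts.put(e.getKey(), counts.getOrDefault(e.getKey(), 0));
                maxIndex = Math.max(maxIndex, e.getKey());
            }
        }

        // Resolves an index, failing like the "Failed to fetch acls" error in the report.
        String lookup(long idx) {
            String acl = longToAcl.get(idx);
            if (acl == null) {
                throw new RuntimeException("Failed to fetch acls for " + idx);
            }
            return acl;
        }
    }

    // Minimal znode stand-in that only remembers its ACL index.
    static class Node {
        long aclIndex;
    }

    // Replays the sequence from the report; returns the lookup error for /app, or "ok".
    static String runScenario() {
        RefCountedAclCache cache = new RefCountedAclCache();

        // 1. Startup: the config znode gets READ_ACL_UNSAFE, which lands at index 1.
        Node configNode = new Node();
        configNode.aclIndex = cache.acquire("READ_ACL_UNSAFE");

        // 2. Snapshot load: index 1 now means a different ACL used by exactly one
        //    znode (/app), but the reused configNode still holds index 1.
        Node appNode = new Node();
        appNode.aclIndex = 1L;
        cache.loadSnapshot(Map.of(1L, "digest:app-acl"), Map.of(1L, 1));

        // 3. Sync re-adds the config znode: releasing the stale reference drops
        //    index 1 to zero references and evicts it.
        cache.release(configNode.aclIndex);
        configNode.aclIndex = cache.acquire("READ_ACL_UNSAFE");

        // 4. Reading /app's ACL now fails.
        try {
            cache.lookup(appNode.aclIndex);
            return "ok";
        } catch (RuntimeException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(runScenario());
    }
}
```

Running `main` prints `Failed to fetch acls for 1`, the same message as the `RuntimeException` in the logs above. The model shows why the problem only bites when index 1 has a single user in the snapshot: with two or more references, the stale release would merely decrement the count and the ACL would survive.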