Alex Mirgorodskiy created ZOOKEEPER-3591:
--------------------------------------------

             Summary: Inconsistent resync with dynamic reconfig
                 Key: ZOOKEEPER-3591
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3591
             Project: ZooKeeper
          Issue Type: Bug
          Components: server
    Affects Versions: 3.5.5
            Reporter: Alex Mirgorodskiy
         Attachments: instance1.log.gz, instance6.log.gz

We've run into a problem where one of the ZooKeeper instances lost most of its 
data after its zk process was restarted. We suspect an interaction between 
dynamic reconfiguration and snapshot-based resync of that instance. Details and 
some amateurish analysis are below. We can also upload transaction logs, if 
need be.

We have a 6-instance ensemble running version 3.5.5 with 3 quorum members and 3 
observers. One of the observers (Instance 6) saw its db shrink from 3162 znodes 
down to 10 after that instance restarted:

> 2019-10-13T16:44:19.060+0000 [.zk-monitor-0] Monitor command mntr: zk_version 
> 3.5.5-afd10a8846b22a34c5a818034bb22e99dd44587b, built on 09/16/2019 18:31 GMT
> zk_znode_count 3162
> --
> 2019-10-13T16:48:32.713+0000 [.zk-monitor-0] Monitor command mntr: zk_version 
> 3.5.5-afd10a8846b22a34c5a818034bb22e99dd44587b, built on 09/16/2019 18:31 GMT
> zk_znode_count 10

Contrast it with Instance 1, which was the leader at the time and whose 
znode_count remained stable around 3200:

> 2019-10-13T16:44:48.625+0000 [.zk-monitor-0] Monitor command mntr: zk_version 
> 3.5.5-afd10a8846b22a34c5a818034bb22e99dd44587b, built on 09/16/2019 18:31 GMT
> zk_znode_count 3178
> --
> ...
> --
> 2019-10-13T16:48:48.731+0000 [.zk-monitor-0] Monitor command mntr: zk_version 
> 3.5.5-afd10a8846b22a34c5a818034bb22e99dd44587b, built on 09/16/2019 18:31 GMT
> zk_znode_count 3223
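
For reference, the counts above come from the mntr four-letter command. A minimal 
way to poll it programmatically, sketched here with ZooKeeper's bundled 
FourLetterWordMain helper (illustrative only; host and port are placeholders), is:

import org.apache.zookeeper.client.FourLetterWordMain;

public class ZnodeCountCheck {
    public static void main(String[] args) throws Exception {
        // Send "mntr" to the server and pick out the znode count line.
        String stats = FourLetterWordMain.send4LetterWord("10.80.209.138", 2181, "mntr");
        for (String line : stats.split("\n")) {
            if (line.startsWith("zk_znode_count")) {
                System.out.println(line);
            }
        }
    }
}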

It appears that the problem had happened 30 minutes earlier, when Instance 6 
got resynced from the leader via the SNAP method, yet skipped creating an 
on-disk snapshot. The end result was that the in-memory state was fine, but 
there was only the primordial snapshot.0 on disk, and transaction logs only 
started after the missing snapshot:

> $ ls -l version-2
> total 1766
> -rw-r--r-- 1 daautomation daautomation 1 Oct 13 09:14 acceptedEpoch
> -rw-r--r-- 1 daautomation daautomation 1 Oct 13 10:12 currentEpoch
> -rw-r--r-- 1 daautomation daautomation 2097168 Oct 13 09:44 log.6000002e0
> -rw-r--r-- 1 daautomation daautomation 1048592 Oct 13 10:09 log.600001f1b
> -rw-r--r-- 1 daautomation daautomation 4194320 Oct 13 12:16 log.600003310
> -rw-r--r-- 1 daautomation daautomation 770 Oct 13 09:14 snapshot.0

So the zk reboot wiped out most of the state.
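
For context, the resync logic in Learner.syncWithLeader() appears to work roughly as 
follows (paraphrased and simplified from 3.5.5, not verbatim): a SNAP sync only 
deserializes the leader's database into memory and defers the on-disk snapshot until 
the NEWLEADER packet is handled.

// Paraphrased sketch of Learner.syncWithLeader() in 3.5.5; names simplified.
QuorumPacket qp = new QuorumPacket();
boolean snapshotNeeded = true;
readPacket(qp);
if (qp.getType() == Leader.DIFF) {
    snapshotNeeded = false;               // diff sync: no snapshot required
} else if (qp.getType() == Leader.SNAP) {
    // Full resync: the leader's db is streamed straight into memory,
    // but nothing is written to disk at this point.
    zk.getZKDatabase().deserializeSnapshot(leaderIs);
}
// ... packet-processing loop ...
while (self.isRunning()) {
    readPacket(qp);
    switch (qp.getType()) {
    case Leader.NEWLEADER:
        if (snapshotNeeded) {
            zk.takeSnapshot();            // only here does a snapshot file reach disk
        }
        self.setCurrentEpoch(newEpoch);
        break;
    // ... PROPOSAL, COMMIT, INFORM, UPTODATE, etc. ...
    }
}

So if the sync is aborted before NEWLEADER is handled, the learner ends up with a 
fully populated in-memory db but nothing newer than snapshot.0 on disk, which matches 
the listing above.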

Dynamic reconfig might be relevant here. Instance 6 started as an observer, got 
removed, and immediately re-added as a participant. Instance 2 went the other 
way, from participant to observer:

> 2019-10-13T16:14:19.323+0000 ZK reconfig: removing node 6
> 2019-10-13T16:14:19.359+0000 ZK reconfig: adding 
> server.6=10.80.209.138:2888:3888:participant;0.0.0.0:2181
> 2019-10-13T16:14:19.399+0000 ZK reconfig: adding 
> server.2=10.80.209.131:2888:3888:observer;0.0.0.0:2181
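
For reference, the membership changes above can be driven through the admin client 
API; an illustrative sketch (using org.apache.zookeeper.admin.ZooKeeperAdmin with a 
placeholder connect string, not our actual tooling) would be:

import org.apache.zookeeper.admin.ZooKeeperAdmin;

public class ReconfigSketch {
    public static void main(String[] args) throws Exception {
        // Connect string and timeout are placeholders; the watcher ignores events.
        ZooKeeperAdmin admin = new ZooKeeperAdmin("10.80.209.131:2181", 30000, event -> { });
        // Remove instance 6 from the ensemble...
        admin.reconfigure(null, "6", null, -1, null);
        // ...then re-add it as a participant and move instance 2 to an observer.
        admin.reconfigure(
            "server.6=10.80.209.138:2888:3888:participant;0.0.0.0:2181,"
            + "server.2=10.80.209.131:2888:3888:observer;0.0.0.0:2181",
            null, null, -1, null);
        admin.close();
    }
}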

Looking at the logs, Instance 6 started and received a resync snapshot from the 
leader right before the dynamic reconfig:

> 2019-10-13T16:14:19.284+0000 
> [.QuorumPeer[myid=6](plain=/0.0.0.0:2181)(secure=disabled)] Getting a 
> snapshot from leader 0x6000002dd
> ...
> 2019-10-13T16:14:19.401+0000 
> [.QuorumPeer[myid=6](plain=/0.0.0.0:2181)(secure=disabled)] Got zxid 
> 0x6000002de expected 0x1

Had it processed the NEWLEADER packet afterwards, it would've persisted the 
snapshot locally. But there's no NEWLEADER message in the Instance 6 log. 
Instead, there's a "changes proposed in reconfig" exception, likely a result of 
the instance getting dynamically removed and re-added as a participant:

> 2019-10-13T16:14:19.467+0000 
> [.QuorumPeer[myid=6](plain=/0.0.0.0:2181)(secure=disabled)] Becoming a 
> non-voting participant
> 2019-10-13T16:14:19.467+0000 
> [.QuorumPeer[myid=6](plain=/0.0.0.0:2181)(secure=disabled)] Exception when 
> observing the leader
> java.lang.Exception: changes proposed in reconfig
>         at org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:506)
>         at org.apache.zookeeper.server.quorum.Observer.observeLeader(Observer.java:74)
>         at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1258)

Perhaps the NEWLEADER packet was still in the socket, but sitting behind 
INFORMANDACTIVATE, whose exception prevented us from ever processing NEWLEADER?
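
For what it's worth, the relevant handling in Learner.syncWithLeader() seems to be 
roughly the following (paraphrased and simplified from 3.5.5): a reconfig delivered 
via COMMITANDACTIVATE or INFORMANDACTIVATE during sync is applied with 
processReconfig(), and if it changes this server's own role the learner bails out of 
the sync with exactly this exception, so a NEWLEADER packet still queued behind it 
would never be read.

// Paraphrased sketch of the INFORMANDACTIVATE handling in Learner.syncWithLeader()
// (3.5.5); variable names simplified, not verbatim.
case Leader.INFORM:
case Leader.INFORMANDACTIVATE:
    if (qp.getType() == Leader.INFORMANDACTIVATE) {
        // The packet carries the new config plus the suggested leader id.
        QuorumVerifier qv = self.configFromString(newConfigData);
        boolean majorChange =
            self.processReconfig(qv, suggestedLeaderId, qp.getZxid(), true);
        if (majorChange) {
            // Our own role changed (here: observer -> participant), so abandon
            // the sync and rejoin with the new role, before NEWLEADER is ever seen.
            throw new Exception("changes proposed in reconfig");
        }
    }
    break;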

Also, it may or may not be related, but this area got changed recently as part 
of https://issues.apache.org/jira/browse/ZOOKEEPER-3104.


