Fangmin Lv created ZOOKEEPER-3658:
-------------------------------------

             Summary: Potential data inconsistency due to txns gap in committedLog when ZkDB not fully shutdown
                 Key: ZOOKEEPER-3658
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3658
             Project: ZooKeeper
          Issue Type: Bug
          Components: server
    Affects Versions: 3.5.6, 3.6.0
            Reporter: Fangmin Lv
            Assignee: Fangmin Lv

During DIFF sync, the txns are applied to the learner's DataTree but are not added to the in-memory committed txns cache (committedLog) in ZkDatabase. If this server later becomes the new leader and other servers try to sync with it, part of the txns will be missing from that cache, which may cause data inconsistency.

This is not a problem if we fully shut down the ZkDB and reload it from disk, but the current behavior in 3.5 and 3.6 does not fully shut down the DB, which is a nice optimization to reduce the unavailable time with a large snapshot.

Internally, we have another version of the 'Retain DB' implementation. We caught this issue with the digest feature we just upstreamed and have fixed it internally, but just realized we haven't upstreamed that fix. This is the Jira for that issue; I will send a PR for it soon.
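
To make the gap concrete, here is a minimal toy model in Java. It is only a sketch: `ToyZkDb`, `CommittedLogGapDemo`, and all of their methods are hypothetical names invented for illustration and are not ZooKeeper's real ZKDatabase API, and the small integer zxids stand in for real 64-bit zxids. The "fixed" path simply assumes that DIFF-synced txns should also be recorded in the cache; the actual fix is not described in this issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Toy model only: hypothetical classes, NOT ZooKeeper's real ZKDatabase API.
final class ToyZkDb {
    // zxids whose effects are visible in the (toy) DataTree.
    final TreeSet<Long> dataTree = new TreeSet<>();
    // zxids held in the in-memory committed txns cache, in commit order.
    final List<Long> committedLog = new ArrayList<>();

    // Normal commit path: apply to the tree and record in the cache.
    void commit(long zxid) {
        dataTree.add(zxid);
        committedLog.add(zxid);
    }

    // DIFF sync path as described in the issue: applied to the tree only.
    void applyDuringDiffSync(long zxid) {
        dataTree.add(zxid);
    }

    // Assumed fix, for illustration: DIFF sync also records the txn in the cache.
    void applyDuringDiffSyncFixed(long zxid) {
        commit(zxid);
    }

    // What this server, acting as leader, could send from its cache
    // to a learner whose last seen zxid is peerLastZxid.
    List<Long> txnsToSend(long peerLastZxid) {
        List<Long> out = new ArrayList<>();
        for (long zxid : committedLog) {
            if (zxid > peerLastZxid) {
                out.add(zxid);
            }
        }
        return out;
    }

    // Txns applied to the tree after peerLastZxid that the cache cannot serve.
    List<Long> missingFor(long peerLastZxid) {
        List<Long> missing = new ArrayList<>(dataTree.tailSet(peerLastZxid, false));
        missing.removeAll(committedLog);
        return missing;
    }
}

final class CommittedLogGapDemo {
    // Builds a server that commits 1..3 normally, receives 4..6 through
    // DIFF sync, keeps its DB in memory, and then commits 7..8 normally.
    static ToyZkDb build(boolean fixed) {
        ToyZkDb db = new ToyZkDb();
        for (long z = 1; z <= 3; z++) {
            db.commit(z);
        }
        for (long z = 4; z <= 6; z++) {
            if (fixed) {
                db.applyDuringDiffSyncFixed(z);
            } else {
                db.applyDuringDiffSync(z);
            }
        }
        for (long z = 7; z <= 8; z++) {
            db.commit(z);
        }
        return db;
    }

    public static void main(String[] args) {
        ToyZkDb buggy = build(false);
        System.out.println("sent to peer at zxid 3: " + buggy.txnsToSend(3));
        System.out.println("missing from cache:     " + buggy.missingFor(3));

        ToyZkDb fixed = build(true);
        System.out.println("with assumed fix, missing: " + fixed.missingFor(3));
    }
}
```

In this model, a follower whose last zxid is 3 would be sent only txns 7 and 8 by the buggy server, silently skipping 4 through 6 even though they are in the server's DataTree. Reloading the DB from disk would rebuild the cache without the hole, which matches the observation above that a full shutdown hides the problem.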