[ https://issues.apache.org/jira/browse/ZOOKEEPER-4425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17459705#comment-17459705 ]
Mate Szalay-Beko commented on ZOOKEEPER-4425:
---------------------------------------------

I can see why this would be useful, but it will not solve the "problem" described above. Snapshots alone are not enough to restore the state; you always need both the snapshots and the write-ahead logs (a.k.a. "transaction logs").

See: https://zookeeper.apache.org/doc/r3.7.0/zookeeperAdmin.html#sc_dataFileManagement

bq. The snapshot files stored in the data directory are fuzzy snapshots in the sense that during the time the ZooKeeper server is taking the snapshot, updates are occurring to the data tree. The suffix of the snapshot file names is the zxid, the ZooKeeper transaction id, of the last committed transaction at the start of the snapshot. Thus, the snapshot includes a subset of the updates to the data tree that occurred while the snapshot was in process. The snapshot, then, may not correspond to any data tree that actually existed, and for this reason we refer to it as a fuzzy snapshot. Still, ZooKeeper can recover using this snapshot because it takes advantage of the idempotent nature of its updates. By replaying the transaction log against fuzzy snapshots ZooKeeper gets the state of the system at the end of the log.

So in general, to be able to restore ZooKeeper you need to back up the last snapshot plus the transaction logs that were flushed since the creation of that snapshot started. (To be on the safe side, people usually take the last 2-3 snapshots plus the related transaction logs, so that recovery is still possible even if the latest snapshot is corrupted for some reason.)

Forcing the generation of a snapshot doesn't really change this: you will still need the transaction logs to be able to restore. Of course, the more recent the snapshot is, the fewer transaction logs you need, and in general the recovery (startup time) can be quicker.
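To make the backup rule above concrete, here is a minimal Python sketch that picks the files to copy from a data directory listing. It assumes the usual on-disk naming, `snapshot.<zxid in hex>` and `log.<zxid in hex>`, where a log's suffix is the first zxid written to that log. The function names `parse_zxid` and `select_backup_files` and the `keep_snapshots` parameter are made up for illustration; this is not ZooKeeper's own code, and it only selects file names, it does not copy anything.

```python
def parse_zxid(name, prefix):
    """Return the zxid encoded in a name like 'snapshot.1a2b', or None."""
    parts = name.split(".")
    if len(parts) < 2 or parts[0] != prefix:
        return None
    try:
        return int(parts[1], 16)  # the suffix is a hexadecimal zxid
    except ValueError:
        return None


def select_backup_files(names, keep_snapshots=3):
    """Pick the newest snapshots plus every transaction log needed to
    replay from the oldest snapshot that is kept.

    Hypothetical helper: 'names' is a plain list of file names.
    """
    if keep_snapshots < 1:
        raise ValueError("keep_snapshots must be at least 1")

    snaps, logs = [], []
    for name in names:
        zxid = parse_zxid(name, "snapshot")
        if zxid is not None:
            snaps.append((zxid, name))
            continue
        zxid = parse_zxid(name, "log")
        if zxid is not None:
            logs.append((zxid, name))
    snaps.sort()
    logs.sort()

    kept = snaps[-keep_snapshots:]
    if not kept:
        # No snapshot at all: every log is needed to rebuild the state.
        return [], [name for _, name in logs]

    oldest_kept = kept[0][0]
    # Transactions written right after the oldest kept snapshot may live in
    # a log that was started before it, so begin with the newest log whose
    # first zxid is <= that snapshot's zxid.
    earlier = [zxid for zxid, _ in logs if zxid <= oldest_kept]
    first_needed = max(earlier) if earlier else 0
    needed_logs = [name for zxid, name in logs if zxid >= first_needed]
    return [name for _, name in kept], needed_logs


if __name__ == "__main__":
    listing = ["snapshot.100", "snapshot.200", "snapshot.300",
               "log.1", "log.101", "log.1f0", "log.250", "log.301", "myid"]
    print(select_backup_files(listing, keep_snapshots=2))
```

With the sample listing, keeping two snapshots selects `snapshot.200` and `snapshot.300` together with `log.1f0`, `log.250` and `log.301`: `log.1f0` is included because it was started before `snapshot.200` and may hold transactions committed after it.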
> 4lw Command: On demand snapshot
> -------------------------------
>
>                 Key: ZOOKEEPER-4425
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4425
>             Project: ZooKeeper
>          Issue Type: New Feature
>          Components: server
>            Reporter: Pablo Francisco Pérez Hidalgo
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Working with disaster recovery scenarios at work, we found that being able
> to tell a ZooKeeper instance to take a snapshot, thus dumping the contents
> of its view of the internal database into files, could be a last-resort
> escape hatch against potential data loss.
> As an example, imagine that all the voting members of the ensemble are wiped
> out due to a wrong deployment configuration change. A single surviving
> observer could hold in its memory the last copy of the most recently updated
> ensemble data. Sending it a _*snap*_ *four letter word command* that forced
> it to save a snapshot of that information to disk could be a very convenient
> way of recovering the database.
>
> This issue aims to discuss the addition of this feature and to serve as the
> gate for an already available patch providing it.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)