Hi,

On Sat, Mar 10, 2012 at 11:32 PM, Jörg Hoh <[email protected]> wrote:
> We should have the possibility to create backup during normal operation of
> the repository, without shutting down the repository and without major
> impact to read or write performance. A true online backup.

With the Oak architecture as currently envisioned, there are at least
three alternative ways to achieve this:

1) The MVCC model gives us a stable snapshot of the repository state at
   any given time, so a backup client should be able to export a snapshot
   of the entire repository without interfering (except for the extra IO
   overhead and potential cache impact) with normal repository use.

2) Assuming we get the clustering architecture right (which we should),
   it should be possible to add a new read-only node to an existing
   cluster, wait for it to synchronize all existing content from the rest
   of the cluster, and finally stop this backup node. The result should be
   a complete, runnable copy of the repository.

3) Since the Oak architecture builds on immutable data, most persistence
   models will likely employ an append-only approach with garbage
   collection to clean up unused space. With a little coordination with
   the garbage collector, it should be possible to also get a stable
   snapshot of the entire repository with native backup tools of the
   underlying persistence mechanism.

> A bonus would be if this backup facility is additionally able to produce
> a diff to the latest backup (incremental backup).

I believe this should be doable with all the above approaches.

BR,

Jukka Zitting
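
PS: To make option 1 and the incremental case a bit more concrete, here is a
rough sketch in Java. It is only an illustration under simplifying
assumptions: the repository is modelled as a flat path-to-value map, every
write publishes a new immutable revision, and deletions are ignored. All names
(SnapshotBackupSketch, Revision, Repository, fullBackup, incrementalBackup)
are made up for this example and are not part of any Oak API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch, not the Oak API: MVCC-style snapshots for online backup.
class SnapshotBackupSketch {

    /** An immutable revision: once published, its content never changes. */
    static final class Revision {
        final long id;
        final Map<String, String> content; // path -> value, unmodifiable

        Revision(long id, Map<String, String> content) {
            this.id = id;
            this.content = Collections.unmodifiableMap(new HashMap<>(content));
        }
    }

    /** Writers never modify a revision in place; they publish a new head. */
    static final class Repository {
        private final AtomicReference<Revision> head =
                new AtomicReference<>(new Revision(0, new HashMap<>()));

        void put(String path, String value) {
            head.updateAndGet(old -> {
                Map<String, String> copy = new HashMap<>(old.content);
                copy.put(path, value);
                return new Revision(old.id + 1, copy);
            });
        }

        /** A stable snapshot is just a reference to the current revision. */
        Revision snapshot() {
            return head.get();
        }
    }

    /** Full backup: export everything visible in the given snapshot. */
    static Map<String, String> fullBackup(Revision snapshot) {
        return new HashMap<>(snapshot.content);
    }

    /** Incremental backup: entries added or changed since the base revision. */
    static Map<String, String> incrementalBackup(Revision base, Revision current) {
        Map<String, String> diff = new HashMap<>();
        for (Map.Entry<String, String> e : current.content.entrySet()) {
            if (!e.getValue().equals(base.content.get(e.getKey()))) {
                diff.put(e.getKey(), e.getValue());
            }
        }
        return diff;
    }

    public static void main(String[] args) {
        Repository repo = new Repository();
        repo.put("/a", "1");
        Revision lastBackup = repo.snapshot(); // backup starts here
        repo.put("/b", "2");                   // writes continue meanwhile
        System.out.println("full: " + fullBackup(lastBackup));
        System.out.println("incremental: "
                + incrementalBackup(lastBackup, repo.snapshot()));
    }
}
```

The point of the sketch is that the backup client only holds on to a revision
reference, so later writes cannot change what it exports. A real
implementation would presumably compare content trees rather than copy whole
maps, and would also need to record deletions in the diff.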
