There's an algorithm in the BigCouch codebase for storing up to N checkpoints with exponentially increasing spacing (in terms of sequence values) between them. It strikes a nice balance between checkpoint document size and the ability to resume with minimal replay.
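
To make the idea concrete, here is a minimal Python sketch of one way to thin a checkpoint history so that the gaps widen with age. It is not the BigCouch code. The bucketing rule (power-of-two age buckets, keeping the oldest entry in each) and the names prune_checkpoints and best_resume_seq are assumptions made purely for illustration.

```python
def prune_checkpoints(seqs, max_checkpoints):
    """Thin a list of checkpoint sequence numbers so gaps widen with age.

    Hypothetical rule: a checkpoint's age is its distance from the newest
    sequence. Ages are grouped into power-of-two buckets (0, 1, 2-3, 4-7,
    8-15, ...) and only the oldest checkpoint in each bucket survives.
    Returns at most max_checkpoints sequences, newest first.
    """
    if not seqs:
        return []
    newest = max(seqs)
    oldest_in_bucket = {}
    # Ascending order: the first sequence seen in a bucket is its oldest.
    for seq in sorted(set(seqs)):
        age = newest - seq
        bucket = age.bit_length()
        oldest_in_bucket.setdefault(bucket, seq)
    kept = sorted(oldest_in_bucket.values(), reverse=True)
    # If still over the limit, drop the oldest checkpoints first.
    return kept[:max_checkpoints]


def best_resume_seq(kept, limit):
    """Newest retained checkpoint at or below limit, or 0 to replay fully."""
    candidates = [seq for seq in kept if seq <= limit]
    return max(candidates) if candidates else 0
```

With checkpoints written at every sequence from 1 to 100 and N = 5, this keeps 100, 99, 97, 93 and 85. If the two sides can only agree on history up to sequence 95, the replay starts from 93 rather than from the beginning.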

Adam

> On Apr 23, 2014, at 11:28 AM, Calvin Metcalf <[email protected]> wrote:
>
> with the rolling hash thingy, a checkpoint document could store more then one database hash, e.g. the last 5 but totally up to whoever is storing the checkpoint. This would cover the case where you stop the replication after one of the dbs has stored the checkpoint but before the other one has.
>
>
>> On Tue, Apr 15, 2014 at 9:21 PM, Dale Harvey <[email protected]> wrote:
>>
>> ah, yeh got it now, cheers
>>
>>
>>> On 16 April 2014 02:17, Calvin Metcalf <[email protected]> wrote:
>>>
>>> Your source data base is upto seq 10, but the box its on catches fire. You have a backup though but its at seq 8, same UUID though but you'll miss the next 2 seqs.
>>>> On Apr 15, 2014 8:57 PM, "Dale Harvey" <[email protected]> wrote:
>>>>
>>>> Sorry still dont understand the problem here
>>>>
>>>> The uuid is stored inside the database file, you either have the same data and the same uuid, or none of them?
>>>>
>>>>
>>>> On 15 April 2014 19:54, Calvin Metcalf <[email protected]> wrote:
>>>>
>>>>> I think the problem is not as much deleting and recreating a database but wiping a virtual machine and restoring from a backup, now you have more or less gone back in time with the target database and it has different stuff but the same uuid.
>>>>>
>>>>>
>>>>>> On Tue, Apr 15, 2014 at 2:32 PM, Dale Harvey <[email protected]> wrote:
>>>>>>
>>>>>> I dont understand the problem with per db uuids, so the uuid isnt multivalued nor is it queried
>>>>>>
>>>>>> A is readyonly, B is client, B starts replication from A
>>>>>> B reads the db uuid from A / itself, generates a replication_id, stores on B
>>>>>> try to fetch replication checkpoint, if successful we query changes from since?
>>>>>>
>>>>>> In pouch we store the uuid along with the data, so file based backups arent a problem, seems couchdb could / should do that too
>>>>>>
>>>>>> This also fixes the problem mentioned on the mailing list, and one I have run into personally where people forward db requests but not server requests via a proxy
>>>>>>
>>>>>>
>>>>>> On 15 April 2014 19:18, Calvin Metcalf <[email protected]> wrote:
>>>>>>
>>>>>>> except there is no way to calculate that from outside the database as changes only ever gives the more recent document version.
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Apr 13, 2014 at 9:47 PM, Calvin Metcalf <[email protected]> wrote:
>>>>>>>
>>>>>>>> oo didn't think of that, yeah uuids wouldn't hurt, though the more I think about the rolling hashing on revs, the more I like that
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sun, Apr 13, 2014 at 6:00 PM, Adam Kocoloski <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Yes, but then sysadmins have to be very very careful about restoring from a file-based backup. We run the risk that {uuid, seq} could be multi-valued, which diminishes its value considerably.
>>>>>>>>>
>>>>>>>>> I like the UUID in general -- we've added them to our internal shard files at Cloudant -- but on their own they're not a bulletproof solution for read-only incremental replications.
>>>>>>>>>
>>>>>>>>> Adam
>>>>>>>>>
>>>>>>>>>> On Apr 13, 2014, at 5:16 PM, Calvin Metcalf <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>> I mean if your going to add new features to couch you could just have the db generate a random uuid on creation that would be different if it was deleted and recreated
>>>>>>>>>>> On Apr 13, 2014 1:59 PM, "Adam Kocoloski" <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Other thoughts:
>>>>>>>>>>>
>>>>>>>>>>> - We could enhance the authorization system to have a role that allows updates to _local docs but nothing else. It wouldn't make sense for completely untrusted peers, but it could give peace of mind to sysadmins trying to execute replications with the minimum level of access possible.
>>>>>>>>>>>
>>>>>>>>>>> - We could teach the sequence index to maintain a report of rolling hash of the {id,rev} pairs that comprise the database up to that sequence, record that in the replication checkpoint document, and check that it's unchanged on resume. It's a new API enhancement and it grows the amount of information stored with each sequence, but it completely closes off the probabilistic edge case associated with simply checking that the {id, rev} associated with the checkpointed sequence has not changed. Perhaps overkill for what is admittedly a pretty low-probability event.
>>>>>>>>>>>
>>>>>>>>>>> Adam
>>>>>>>>>>>
>>>>>>>>>>> On Apr 13, 2014, at 1:50 PM, Adam Kocoloski <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Yeah, this is a subtle little thing. The main reason we checkpoint on both source and target and compare is to cover the case where the source database is deleted and recreated in between replication attempts. If that were to happen and the replicator just resumes blindly from the checkpoint sequence stored on the target then the replication could permanently miss some documents written to the new source.
>>>>>>>>>>>>
>>>>>>>>>>>> I'd love to have a robust solution for incremental replication of read-only databases. To first order a UUID on the source database that was fixed at create time could do the trick, but we'll run into trouble with file-based backup and restores. If a database file is restored to a point before the latest replication checkpoint we'd again be in a position of potentially permanently missing updates.
>>>>>>>>>>>>
>>>>>>>>>>>> Calvin's suggestion of storing e.g. {seq, id, rev} instead of simply seq as the checkpoint information would dramatically reduce the likelihood of that type of permanent skip in the replication, but it's only a probabilistic answer.
>>>>>>>>>>>>
>>>>>>>>>>>> Adam
>>>>>>>>>>>>
>>>>>>>>>>>>> On Apr 13, 2014, at 1:31 PM, Calvin Metcalf <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Though currently we have the opposite problem right if we delete the target db? (this on me brain storming)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Could we store last rev in addition to last seq?
>>>>>>>>>>>>>> On Apr 13, 2014 1:15 PM, "Dale Harvey" <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If the src database was to be wiped, when we restarted replication nothing would happen until the source database caught up to the previously written checkpoint
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> create A, write 5 documents
>>>>>>>>>>>>>> replicate 5 documents A -> B, write checkpoint 5 on B
>>>>>>>>>>>>>> destroy A
>>>>>>>>>>>>>> write 4 documents
>>>>>>>>>>>>>> replicate A -> B, pick up checkpoint from B and to ?since=5
>>>>>>>>>>>>>> .. no documents written
>>>>>>>>>>>>>> https://github.com/pouchdb/pouchdb/blob/master/tests/test.replication.js#L771 is our test that covers it
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 13 April 2014 18:02, Calvin Metcalf <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If we were to unilaterally switch to checkpoint on target what would happen, replication in progress would loose their place?
>>>>>>>>>>>>>>>> On Apr 13, 2014 11:21 AM, "Dale Harvey" <[email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> So with checkpointing we write the checkpoint to both A and B and verify they match before using the checkpoint
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> What happens if the src of the replication is read only?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> As far as I can tell couch will just checkout a checkpoint_commit_error and carry on from the start, The only improvement I can think of is the user specifies they know the src is read only and to only use the target checkpoint, we can 'possibly' make that happen automatically if the src specifically fails the write due to permissions.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> -Calvin W. Metcalf
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> -Calvin W. Metcalf
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> -Calvin W. Metcalf
>
>
>
> --
> -Calvin W. Metcalf
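
For reference, the rolling hash over {id, rev} pairs discussed in the quoted thread could be sketched roughly as follows. This is only an illustration under stated assumptions: the chaining scheme (SHA-256 over the previous hash plus the pair), the in-memory update list, and the names roll, hashes_by_seq and can_resume are all hypothetical, and the sketch assumes the database itself maintains the hash, since (as noted in the thread) it cannot be derived from the changes feed from outside.

```python
import hashlib
import json


def roll(prev_hash, doc_id, rev):
    """Fold one {id, rev} pair into the running hash (hypothetical scheme)."""
    # JSON-encode the fields so their boundaries stay unambiguous.
    payload = json.dumps([prev_hash, doc_id, rev]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def hashes_by_seq(updates):
    """Map each sequence (starting at 1) to the rolling hash up to it.

    updates is a list of (doc_id, rev) tuples in sequence order.
    """
    current = ""
    result = {}
    for seq, (doc_id, rev) in enumerate(updates, start=1):
        current = roll(current, doc_id, rev)
        result[seq] = current
    return result


def can_resume(checkpoint, current_hashes):
    """True only if the history up to the checkpointed seq is unchanged."""
    seq, expected_hash = checkpoint
    return current_hashes.get(seq) == expected_hash


# The backup scenario from the thread: checkpoint taken at seq 10, the
# source is restored from a backup at seq 8, then receives different writes.
original = [("doc%d" % i, "1-a") for i in range(1, 11)]
checkpoint = (10, hashes_by_seq(original)[10])
restored = original[:8] + [("docX", "1-b"), ("docY", "1-c")]
```

Here can_resume(checkpoint, hashes_by_seq(restored)) is False, so the replicator would know not to resume blindly from seq 10, while a source that has only appended new updates after seq 10 still passes the check.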
