Re: Checkpointing on read only databases

Dale Harvey Tue, 15 Apr 2014 11:33:07 -0700

I don't understand the problem with per-db uuids: the uuid isn't
multivalued, nor is it queried.


   A is read-only, B is the client, B starts replication from A
   B reads the db uuid from A / itself, generates a replication_id, and
   stores it on B
   B tries to fetch the replication checkpoint; if successful, we query
   changes from since?
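
To make the steps above concrete, here is a rough sketch in Python. It is a toy in-memory model, not how CouchDB or PouchDB actually implement it: the Db class, replication_id() and replicate() are names made up for illustration, and hashing the two uuids together is just one assumed way to derive the replication id.

```python
import hashlib
import uuid
from typing import Dict, List


class Db:
    """Toy in-memory database with a uuid fixed at creation (hypothetical)."""

    def __init__(self) -> None:
        self.uuid = uuid.uuid4().hex           # new uuid every time the db is created
        self.changes: List[str] = []           # doc ids in write order; seq = index + 1
        self.local_docs: Dict[str, int] = {}   # replication_id -> checkpointed seq

    def write(self, doc_id: str) -> None:
        self.changes.append(doc_id)

    def changes_since(self, since: int) -> List[str]:
        return self.changes[since:]


def replication_id(source: Db, target: Db) -> str:
    # Assumption: the id is derived from both db uuids, so a source that was
    # deleted and recreated produces a different replication id.
    return hashlib.sha256(f"{source.uuid}:{target.uuid}".encode()).hexdigest()


def replicate(source: Db, target: Db) -> int:
    """Pull from a read-only source; the checkpoint is stored only on the target."""
    rep_id = replication_id(source, target)
    since = target.local_docs.get(rep_id, 0)   # no checkpoint found: start from 0
    new_docs = source.changes_since(since)
    for doc_id in new_docs:
        target.write(doc_id)
    target.local_docs[rep_id] = since + len(new_docs)
    return len(new_docs)
```

In this model, if A is destroyed and recreated, its uuid changes, B finds no checkpoint under the new replication_id, and replication restarts from 0 instead of skipping documents. It does not address a source file restored from an older backup, since the uuid is unchanged in that case.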

In pouch we store the uuid along with the data, so file-based backups aren't
a problem; it seems couchdb could / should do that too.

This also fixes the problem mentioned on the mailing list, and one I have
run into personally where people forward db requests but not server
requests via a proxy


On 15 April 2014 19:18, Calvin Metcalf <[email protected]> wrote:

> except there is no way to calculate that from outside the database as
> changes only ever gives the more recent document version.
>
>
> On Sun, Apr 13, 2014 at 9:47 PM, Calvin Metcalf <[email protected]> wrote:
>
> > oo didn't think of that, yeah uuids wouldn't hurt, though the more I
> > think about the rolling hashing on revs, the more I like that
> >
> >
> > On Sun, Apr 13, 2014 at 6:00 PM, Adam Kocoloski <[email protected]> wrote:
> >
> >> Yes, but then sysadmins have to be very very careful about restoring
> >> from a file-based backup. We run the risk that {uuid, seq} could be
> >> multi-valued, which diminishes its value considerably.
> >>
> >> I like the UUID in general -- we've added them to our internal shard
> >> files at Cloudant -- but on their own they're not a bulletproof solution
> >> for read-only incremental replications.
> >>
> >> Adam
> >>
> >> > On Apr 13, 2014, at 5:16 PM, Calvin Metcalf <[email protected]> wrote:
> >> >
> >> > I mean if you're going to add new features to couch you could just
> >> > have the db generate a random uuid on creation that would be
> >> > different if it was deleted and recreated
> >> >> On Apr 13, 2014 1:59 PM, "Adam Kocoloski" <[email protected]> wrote:
> >> >>
> >> >> Other thoughts:
> >> >>
> >> >> - We could enhance the authorization system to have a role that
> >> >> allows updates to _local docs but nothing else. It wouldn't make
> >> >> sense for completely untrusted peers, but it could give peace of
> >> >> mind to sysadmins trying to execute replications with the minimum
> >> >> level of access possible.
> >> >>
> >> >> - We could teach the sequence index to maintain a rolling hash of
> >> >> the {id,rev} pairs that comprise the database up to that sequence,
> >> >> record that in the replication checkpoint document, and check that
> >> >> it's unchanged on resume. It's a new API enhancement and it grows
> >> >> the amount of information stored with each sequence, but it
> >> >> completely closes off the probabilistic edge case associated with
> >> >> simply checking that the {id, rev} associated with the checkpointed
> >> >> sequence has not changed. Perhaps overkill for what is admittedly a
> >> >> pretty low-probability event.
> >> >>
> >> >> Adam
> >> >>
> >> >> On Apr 13, 2014, at 1:50 PM, Adam Kocoloski <[email protected]> wrote:
> >> >>
> >> >>> Yeah, this is a subtle little thing. The main reason we checkpoint
> >> >>> on both source and target and compare is to cover the case where
> >> >>> the source database is deleted and recreated in between replication
> >> >>> attempts. If that were to happen and the replicator just resumes
> >> >>> blindly from the checkpoint sequence stored on the target then the
> >> >>> replication could permanently miss some documents written to the
> >> >>> new source.
> >> >>>
> >> >>> I'd love to have a robust solution for incremental replication of
> >> >>> read-only databases. To first order a UUID on the source database
> >> >>> that was fixed at create time could do the trick, but we'll run
> >> >>> into trouble with file-based backup and restores. If a database
> >> >>> file is restored to a point before the latest replication
> >> >>> checkpoint we'd again be in a position of potentially permanently
> >> >>> missing updates.
> >> >>>
> >> >>> Calvin's suggestion of storing e.g. {seq, id, rev} instead of
> >> >>> simply seq as the checkpoint information would dramatically reduce
> >> >>> the likelihood of that type of permanent skip in the replication,
> >> >>> but it's only a probabilistic answer.
> >> >>>
> >> >>> Adam
> >> >>>
> >> >>>> On Apr 13, 2014, at 1:31 PM, Calvin Metcalf <[email protected]> wrote:
> >> >>>
> >> >>>> Though currently we have the opposite problem, right, if we delete
> >> >>>> the target db? (this is me brainstorming)
> >> >>>>
> >> >>>> Could we store last rev in addition to last seq?
> >> >>>>> On Apr 13, 2014 1:15 PM, "Dale Harvey" <[email protected]> wrote:
> >> >>>>>
> >> >>>>> If the src database was to be wiped, when we restarted replication
> >> >>>>> nothing would happen until the source database caught up to the
> >> >>>>> previously written checkpoint
> >> >>>>>
> >> >>>>> create A, write 5 documents
> >> >>>>> replicate 5 documents A -> B, write checkpoint 5 on B
> >> >>>>> destroy A
> >> >>>>> write 4 documents
> >> >>>>> replicate A -> B, pick up checkpoint from B and go to ?since=5
> >> >>>>> .. no documents written
> >> >>>>>
> >> >>>>> https://github.com/pouchdb/pouchdb/blob/master/tests/test.replication.js#L771
> >> >>>>> is our test that covers it
> >> >>>>>
> >> >>>>>
> >> >>>>> On 13 April 2014 18:02, Calvin Metcalf <[email protected]> wrote:
> >> >>>>>
> >> >>>>>> If we were to unilaterally switch to checkpoint on target what
> >> >>>>>> would happen, replications in progress would lose their place?
> >> >>>>>>> On Apr 13, 2014 11:21 AM, "Dale Harvey" <[email protected]> wrote:
> >> >>>>>>>
> >> >>>>>>> So with checkpointing we write the checkpoint to both A and B
> >> >>>>>>> and verify they match before using the checkpoint
> >> >>>>>>>
> >> >>>>>>> What happens if the src of the replication is read only?
> >> >>>>>>>
> >> >>>>>>> As far as I can tell couch will just checkout a
> >> >>>>>>> checkpoint_commit_error and carry on from the start. The only
> >> >>>>>>> improvement I can think of is the user specifies they know the
> >> >>>>>>> src is read only and to only use the target checkpoint; we can
> >> >>>>>>> 'possibly' make that happen automatically if the src
> >> >>>>>>> specifically fails the write due to permissions.
> >> >>
> >> >>
> >>
> >
> >
> >
> > --
> > -Calvin W. Metcalf
> >
>
>
>
> --
> -Calvin W. Metcalf
>
