Re: Checkpointing on read only databases

Dale Harvey Tue, 15 Apr 2014 18:23:19 -0700

ah, yeh got it now, cheers
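
For anyone skimming the thread, the backup case Calvin describes below can be reproduced with a toy model. This is only a sketch in Python, not CouchDB or PouchDB code: the Db class, the replicate function and the checkpoint shape are all made up for illustration.

```python
import copy


class Db:
    """Toy database: a uuid plus an append-only changes feed (hypothetical)."""

    def __init__(self, uuid):
        self.uuid = uuid
        self.changes = []  # (seq, doc_id) tuples, seq starts at 1

    def write(self, doc_id):
        self.changes.append((len(self.changes) + 1, doc_id))

    def changes_since(self, since):
        return [c for c in self.changes if c[0] > since]


def replicate(source, target_docs, checkpoint):
    """Resume from the target-side checkpoint when the source uuid matches."""
    since = 0
    if checkpoint and checkpoint["uuid"] == source.uuid:
        since = checkpoint["seq"]
    for seq, doc_id in source.changes_since(since):
        target_docs.add(doc_id)
        since = seq
    return {"uuid": source.uuid, "seq": since}


source = Db("abc")
for i in range(8):
    source.write(f"doc-{i}")
backup = copy.deepcopy(source)   # file-based backup taken at seq 8
source.write("doc-8")
source.write("doc-9")            # source is now at seq 10

target = set()
checkpoint = replicate(source, target, None)  # checkpoint records seq 10

restored = backup                # box catches fire, restore the backup
restored.write("new-a")          # seq 9 on the restored copy
restored.write("new-b")          # seq 10 on the restored copy

# Same uuid, so the replicator resumes from seq 10 and sees nothing new.
checkpoint = replicate(restored, target, checkpoint)
missed = {doc_id for _, doc_id in restored.changes} - target
print(sorted(missed))
```

The uuid check passes because the restored file carries the same uuid, and the two writes made after the restore reuse seqs 9 and 10, so they sit at or below the stored checkpoint and are never fetched.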


On 16 April 2014 02:17, Calvin Metcalf <[email protected]> wrote:

> Your source database is up to seq 10, but the box it's on catches fire. You
> have a backup, but it's at seq 8. Same UUID, but you'll miss the
> next 2 seqs.
> On Apr 15, 2014 8:57 PM, "Dale Harvey" <[email protected]> wrote:
>
> > Sorry, still don't understand the problem here
> >
> > The uuid is stored inside the database file, you either have the same
> > data and the same uuid, or none of them?
> >
> >
> > On 15 April 2014 19:54, Calvin Metcalf <[email protected]> wrote:
> >
> > > I think the problem is not as much deleting and recreating a database
> > > but wiping a virtual machine and restoring from a backup, now you have
> > > more or less gone back in time with the target database and it has
> > > different stuff but the same uuid.
> > >
> > >
> > > On Tue, Apr 15, 2014 at 2:32 PM, Dale Harvey <[email protected]>
> > wrote:
> > >
> > > > I don't understand the problem with per-db uuids, so the uuid isn't
> > > > multivalued nor is it queried
> > > >
> > > >    A is read-only, B is client, B starts replication from A
> > > >    B reads the db uuid from A / itself, generates a replication_id,
> > > >    stores on B
> > > >    try to fetch replication checkpoint, if successful we query
> > > >    changes from since?
> > > >
> > > > In pouch we store the uuid along with the data, so file-based backups
> > > > aren't a problem, seems couchdb could / should do that too
> > > >
> > > > This also fixes the problem mentioned on the mailing list, and one I
> > > > have run into personally where people forward db requests but not
> > > > server requests via a proxy
> > > >
> > > >
> > > > On 15 April 2014 19:18, Calvin Metcalf <[email protected]>
> > wrote:
> > > >
> > > > > except there is no way to calculate that from outside the database,
> > > > > as changes only ever gives the most recent document version.
> > > > >
> > > > >
> > > > > On Sun, Apr 13, 2014 at 9:47 PM, Calvin Metcalf <
> > > > [email protected]
> > > > > >wrote:
> > > > >
> > > > > > oo didn't think of that, yeah uuids wouldn't hurt, though the
> > > > > > more I think about the rolling hashing on revs, the more I like
> > > > > > that
> > > > > >
> > > > > >
> > > > > > On Sun, Apr 13, 2014 at 6:00 PM, Adam Kocoloski <
> > > > > [email protected]>wrote:
> > > > > >
> > > > > > >> Yes, but then sysadmins have to be very very careful about
> > > > > > >> restoring from a file-based backup. We run the risk that
> > > > > > >> {uuid, seq} could be multi-valued, which diminishes its value
> > > > > > >> considerably.
> > > > > > >>
> > > > > > >> I like the UUID in general -- we've added them to our internal
> > > > > > >> shard files at Cloudant -- but on their own they're not a
> > > > > > >> bulletproof solution for read-only incremental replications.
> > > > > >>
> > > > > >> Adam
> > > > > >>
> > > > > >> > On Apr 13, 2014, at 5:16 PM, Calvin Metcalf <
> > > > [email protected]
> > > > > >
> > > > > >> wrote:
> > > > > >> >
> > > > > > >> > I mean if you're going to add new features to couch you could
> > > > > > >> > just have the db generate a random uuid on creation that would
> > > > > > >> > be different if it was deleted and recreated
> > > > > >> >> On Apr 13, 2014 1:59 PM, "Adam Kocoloski" <
> > > > [email protected]>
> > > > > >> wrote:
> > > > > >> >>
> > > > > >> >> Other thoughts:
> > > > > >> >>
> > > > > > >> >> - We could enhance the authorization system to have a role that
> > > > > > >> >> allows updates to _local docs but nothing else. It wouldn't make
> > > > > > >> >> sense for completely untrusted peers, but it could give peace of
> > > > > > >> >> mind to sysadmins trying to execute replications with the minimum
> > > > > > >> >> level of access possible.
> > > > > > >> >>
> > > > > > >> >> - We could teach the sequence index to maintain a rolling hash of
> > > > > > >> >> the {id,rev} pairs that comprise the database up to that sequence,
> > > > > > >> >> record that in the replication checkpoint document, and check that
> > > > > > >> >> it's unchanged on resume. It's a new API enhancement and it grows
> > > > > > >> >> the amount of information stored with each sequence, but it
> > > > > > >> >> completely closes off the probabilistic edge case associated with
> > > > > > >> >> simply checking that the {id, rev} associated with the
> > > > > > >> >> checkpointed sequence has not changed. Perhaps overkill for what
> > > > > > >> >> is admittedly a pretty low-probability event.
> > > > > >> >>
> > > > > >> >> Adam
> > > > > >> >>
> > > > > >> >> On Apr 13, 2014, at 1:50 PM, Adam Kocoloski <
> > > > > [email protected]>
> > > > > >> >> wrote:
> > > > > >> >>
> > > > > > >> >>> Yeah, this is a subtle little thing. The main reason we
> > > > > > >> >>> checkpoint on both source and target and compare is to cover the
> > > > > > >> >>> case where the source database is deleted and recreated in
> > > > > > >> >>> between replication attempts. If that were to happen and the
> > > > > > >> >>> replicator just resumes blindly from the checkpoint sequence
> > > > > > >> >>> stored on the target then the replication could permanently miss
> > > > > > >> >>> some documents written to the new source.
> > > > > > >> >>>
> > > > > > >> >>> I'd love to have a robust solution for incremental replication of
> > > > > > >> >>> read-only databases. To first order a UUID on the source database
> > > > > > >> >>> that was fixed at create time could do the trick, but we'll run
> > > > > > >> >>> into trouble with file-based backup and restores. If a database
> > > > > > >> >>> file is restored to a point before the latest replication
> > > > > > >> >>> checkpoint we'd again be in a position of potentially permanently
> > > > > > >> >>> missing updates.
> > > > > > >> >>>
> > > > > > >> >>> Calvin's suggestion of storing e.g. {seq, id, rev} instead of
> > > > > > >> >>> simply seq as the checkpoint information would dramatically
> > > > > > >> >>> reduce the likelihood of that type of permanent skip in the
> > > > > > >> >>> replication, but it's only a probabilistic answer.
> > > > > >> >>>
> > > > > >> >>> Adam
> > > > > >> >>>
> > > > > >> >>>> On Apr 13, 2014, at 1:31 PM, Calvin Metcalf <
> > > > > >> [email protected]>
> > > > > >> >>> wrote:
> > > > > >> >>>
> > > > > > >> >>>> Though currently we have the opposite problem, right, if we
> > > > > > >> >>>> delete the target db? (this is me brainstorming)
> > > > > >> >>>>
> > > > > >> >>>> Could we store last rev in addition to last seq?
> > > > > >> >>>>> On Apr 13, 2014 1:15 PM, "Dale Harvey" <
> [email protected]
> > >
> > > > > wrote:
> > > > > >> >>>>>
> > > > > > >> >>>>> If the src database was to be wiped, when we restarted
> > > > > > >> >>>>> replication nothing would happen until the source database
> > > > > > >> >>>>> caught up to the previously written checkpoint
> > > > > > >> >>>>>
> > > > > > >> >>>>> create A, write 5 documents
> > > > > > >> >>>>> replicate 5 documents A -> B, write checkpoint 5 on B
> > > > > > >> >>>>> destroy A
> > > > > > >> >>>>> recreate A, write 4 documents
> > > > > > >> >>>>> replicate A -> B, pick up checkpoint from B and go to ?since=5
> > > > > > >> >>>>> .. no documents written
> > > > > >> >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://github.com/pouchdb/pouchdb/blob/master/tests/test.replication.js#L771is
> > > > > >> >>>>> our test that covers it
> > > > > >> >>>>>
> > > > > >> >>>>>
> > > > > >> >>>>> On 13 April 2014 18:02, Calvin Metcalf <
> > > > [email protected]>
> > > > > >> >> wrote:
> > > > > >> >>>>>
> > > > > > >> >>>>>> If we were to unilaterally switch to checkpoint on target,
> > > > > > >> >>>>>> what would happen? Would replications in progress lose their
> > > > > > >> >>>>>> place?
> > > > > >> >>>>>>> On Apr 13, 2014 11:21 AM, "Dale Harvey" <
> > > [email protected]>
> > > > > >> wrote:
> > > > > >> >>>>>>>
> > > > > > >> >>>>>>> So with checkpointing we write the checkpoint to both A and B
> > > > > > >> >>>>>>> and verify they match before using the checkpoint
> > > > > > >> >>>>>>>
> > > > > > >> >>>>>>> What happens if the src of the replication is read only?
> > > > > > >> >>>>>>>
> > > > > > >> >>>>>>> As far as I can tell couch will just throw a
> > > > > > >> >>>>>>> checkpoint_commit_error and carry on from the start. The only
> > > > > > >> >>>>>>> improvement I can think of is that the user specifies they
> > > > > > >> >>>>>>> know the src is read only and to only use the target
> > > > > > >> >>>>>>> checkpoint; we can 'possibly' make that happen automatically
> > > > > > >> >>>>>>> if the src specifically fails the write due to permissions.
> > > > > >> >>
> > > > > >> >>
> > > > > >>
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > -Calvin W. Metcalf
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > -Calvin W. Metcalf
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > -Calvin W. Metcalf
> > >
> >
>
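
The two checkpoint checks discussed above, Calvin's {seq, id, rev} and Adam's rolling hash over the {id,rev} pairs, can be compared with a small sketch. It assumes Python, SHA-256 and a made-up checkpoint layout, none of which come from CouchDB or PouchDB, and the function names are hypothetical.

```python
import hashlib


def rolling_hashes(changes):
    """Map each seq to a digest covering every (id, rev) pair up to that seq."""
    h = hashlib.sha256()
    digests = {}
    for seq, doc_id, rev in changes:
        h.update(f"{doc_id}\x00{rev}\n".encode("utf-8"))
        digests[seq] = h.hexdigest()  # digest of everything fed in so far
    return digests


def pair_check(changes, checkpoint):
    """Calvin's idea: only the (id, rev) at the checkpointed seq must match."""
    at_seq = {seq: (doc_id, rev) for seq, doc_id, rev in changes}
    return at_seq.get(checkpoint["seq"]) == (checkpoint["id"], checkpoint["rev"])


def hash_check(changes, checkpoint):
    """Adam's idea: the whole history up to the checkpointed seq must match."""
    return rolling_hashes(changes).get(checkpoint["seq"]) == checkpoint["hash"]


original = [(i, f"doc-{i}", f"1-{i:03d}") for i in range(1, 11)]
checkpoint = {
    "seq": 10,
    "id": "doc-10",
    "rev": "1-010",
    "hash": rolling_hashes(original)[10],
}

# Restored from a seq-8 backup, then written to again. By bad luck the same
# document lands at seq 10, but seq 9 is different from the original history.
restored = original[:8] + [(9, "new-a", "1-aaa"), (10, "doc-10", "1-010")]

print(pair_check(original, checkpoint), hash_check(original, checkpoint))
print(pair_check(restored, checkpoint), hash_check(restored, checkpoint))
```

In the restored case the pair check is fooled because seq 10 happens to hold the same document, while the rolling hash differs because seq 9 changed. That is the low-probability edge case Adam mentions.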
