Other thoughts:
- We could enhance the authorization system to have a role that allows updates
to _local docs but nothing else. It wouldn't make sense for completely
untrusted peers, but it could give peace of mind to sysadmins trying to execute
replications with the minimum level of access possible.
- We could teach the sequence index to maintain a rolling hash of the
{id,rev} pairs that comprise the database up to that sequence, record that in
the replication checkpoint document, and check that it's unchanged on resume.
It's a new API enhancement and it grows the amount of information stored with
each sequence, but it completely closes off the probabilistic edge case
associated with simply checking that the {id, rev} associated with the
checkpointed sequence has not changed. Perhaps overkill for what is admittedly
a pretty low-probability event.
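
To make the rolling-hash idea concrete, here is a rough Python sketch. It is
only an illustration under stated assumptions: SeqIndex, roll, can_resume and
id_rev_matches are made-up names, the index is a toy append-only list rather
than the real sequence index, and SHA-256 is just one possible hash choice.

```python
import hashlib


def roll(prev_hash: bytes, doc_id: str, rev: str) -> bytes:
    """Fold one {id, rev} pair into the running hash (hypothetical scheme)."""
    h = hashlib.sha256()
    h.update(prev_hash)
    h.update(doc_id.encode("utf-8"))
    h.update(b"\x00")
    h.update(rev.encode("utf-8"))
    return h.digest()


class SeqIndex:
    """Toy append-only sequence index: entry i covers update sequence i + 1."""

    def __init__(self):
        self.entries = []  # list of (doc_id, rev, rolling_hash)

    def write(self, doc_id, rev):
        prev = self.entries[-1][2] if self.entries else b""
        self.entries.append((doc_id, rev, roll(prev, doc_id, rev)))
        return len(self.entries)  # the new update sequence

    def hash_at(self, seq):
        if seq < 1 or seq > len(self.entries):
            return None
        return self.entries[seq - 1][2]


def can_resume(index, checkpoint):
    """Rolling-hash check: the whole history up to seq must be unchanged."""
    return index.hash_at(checkpoint["seq"]) == checkpoint["hash"]


def id_rev_matches(index, checkpoint):
    """Weaker check: only the {id, rev} stored at the checkpointed seq."""
    seq = checkpoint["seq"]
    if seq < 1 or seq > len(index.entries):
        return False
    doc_id, rev, _ = index.entries[seq - 1]
    return (doc_id, rev) == (checkpoint["id"], checkpoint["rev"])


original = SeqIndex()
for doc_id, rev in [("a", "1-x"), ("b", "1-y"), ("c", "1-z")]:
    original.write(doc_id, rev)
checkpoint = {"seq": 3, "id": "c", "rev": "1-z", "hash": original.hash_at(3)}

# A recreated source that happens to hold the same {id, rev} at seq 3.
recreated = SeqIndex()
for doc_id, rev in [("d", "1-q"), ("e", "1-r"), ("c", "1-z")]:
    recreated.write(doc_id, rev)

print(id_rev_matches(recreated, checkpoint))  # the weaker check is fooled
print(can_resume(recreated, checkpoint))      # the rolling hash is not
```

The recreated index is contrived so that the {id, rev} at the checkpointed
sequence matches by coincidence, which is exactly the edge case the rolling
hash is meant to close off.
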
Adam
On Apr 13, 2014, at 1:50 PM, Adam Kocoloski <[email protected]> wrote:
> Yeah, this is a subtle little thing. The main reason we checkpoint on both
> source and target and compare is to cover the case where the source database
> is deleted and recreated in between replication attempts. If that were to
> happen and the replicator just resumes blindly from the checkpoint sequence
> stored on the target then the replication could permanently miss some
> documents written to the new source.
>
> I'd love to have a robust solution for incremental replication of read-only
> databases. To first order a UUID on the source database that was fixed at
> create time could do the trick, but we'll run into trouble with file-based
> backup and restores. If a database file is restored to a point before the
> latest replication checkpoint we'd again be in a position of potentially
> permanently missing updates.
>
> Calvin's suggestion of storing e.g. {seq, id, rev} instead of simply seq as
> the checkpoint information would dramatically reduce the likelihood of that
> type of permanent skip in the replication, but it's only a probabilistic
> answer.
>
> Adam
>
> On Apr 13, 2014, at 1:31 PM, Calvin Metcalf <[email protected]> wrote:
>
>> Though currently we have the opposite problem, right, if we delete the target
>> db? (this is me brainstorming)
>>
>> Could we store last rev in addition to last seq?
>> On Apr 13, 2014 1:15 PM, "Dale Harvey" <[email protected]> wrote:
>>
>>> If the src database was to be wiped, when we restarted replication nothing
>>> would happen until the source database caught up to the previously written
>>> checkpoint
>>>
>>> create A, write 5 documents
>>> replicate 5 documents A -> B, write checkpoint 5 on B
>>> destroy A
>>> write 4 documents
>>> replicate A -> B, pick up checkpoint from B and go to ?since=5
>>> .. no documents written
>>>
>>>
>>> https://github.com/pouchdb/pouchdb/blob/master/tests/test.replication.js#L771
>>> is our test that covers it
>>>
>>>
>>> On 13 April 2014 18:02, Calvin Metcalf <[email protected]> wrote:
>>>
>>>> If we were to unilaterally switch to checkpoint on target what would
>>>> happen, replications in progress would lose their place?
>>>> On Apr 13, 2014 11:21 AM, "Dale Harvey" <[email protected]> wrote:
>>>>
>>>>> So with checkpointing we write the checkpoint to both A and B and verify
>>>>> they match before using the checkpoint
>>>>>
>>>>> What happens if the src of the replication is read only?
>>>>>
>>>>> As far as I can tell couch will just checkout a checkpoint_commit_error and
>>>>> carry on from the start. The only improvement I can think of is the user
>>>>> specifies they know the src is read only and to only use the target
>>>>> checkpoint; we can 'possibly' make that happen automatically if the src
>>>>> specifically fails the write due to permissions.
>>>>>
>>>>
>>>
>
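
For reference, Dale's wiped-source scenario quoted above can be replayed with
a small Python simulation. This is a sketch under assumptions, not how either
CouchDB or PouchDB is implemented: Database, replicate and scenario are
hypothetical names, documents are reduced to bare ids, and the checkpoint is
a plain integer sequence.

```python
class Database:
    """Toy database: an append-only list of doc ids plus local checkpoints."""

    def __init__(self):
        self.log = []    # doc ids in update-sequence order
        self.local = {}  # stand-in for _local checkpoint documents

    def write(self, doc_id):
        if doc_id not in self.log:
            self.log.append(doc_id)

    def changes(self, since):
        return self.log[since:]


def replicate(source, target, rep_id, compare_checkpoints):
    """Copy changes from source to target; return how many were read."""
    target_cp = target.local.get(rep_id, 0)
    if compare_checkpoints and source.local.get(rep_id, 0) != target_cp:
        since = 0  # checkpoints disagree, so start over from the beginning
    else:
        since = target_cp
    changes = source.changes(since)
    for doc_id in changes:
        target.write(doc_id)
    last_seq = since + len(changes)
    target.local[rep_id] = last_seq
    if compare_checkpoints:
        source.local[rep_id] = last_seq
    return len(changes)


def scenario(compare_checkpoints):
    a, b = Database(), Database()
    for i in range(5):
        a.write(f"old-{i}")
    replicate(a, b, "a-to-b", compare_checkpoints)  # checkpoint 5 on B
    a = Database()                                  # destroy and recreate A
    for i in range(4):
        a.write(f"new-{i}")
    return replicate(a, b, "a-to-b", compare_checkpoints)


print(scenario(compare_checkpoints=False))  # 0: target-only checkpoint skips
print(scenario(compare_checkpoints=True))   # 4: mismatch forces a restart
```

With the target-only checkpoint the second run asks for changes since 5 and
sees nothing, while comparing against the (now missing) source checkpoint
makes the replication start over and pick up the four new documents.
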