Would it be safer to have a low- and high-watermark for the update_seq in memory? What I mean is that the db writer will never hand out an update_seq that is more than N higher than the last committed one; if it is forced to do so to permit a write, it first fsyncs and resets high_seq relative to the new last_committed_seq. This way you can genuinely ensure that you don't reuse an update_seq. In practice we could allow a large delta, one that is larger than the number of updates we expect to see between fsyncs in the commit interval.
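For the sake of discussion, here's a minimal sketch of that watermark scheme (not CouchDB code; `MAX_DELTA` and the class names are illustrative assumptions). The writer hands out sequences freely up to committed_seq + MAX_DELTA, and only when a write would cross that high watermark does it force an fsync first, so a crash can never cause a handed-out seq to be reused:

```python
# Hypothetical sketch of the low/high watermark idea, not CouchDB's
# actual implementation. MAX_DELTA is the assumed headroom: larger
# than the number of updates expected between fsyncs.

MAX_DELTA = 1000

class Db:
    def __init__(self):
        self.committed_seq = 0   # last update_seq fsync'd in the header
        self.update_seq = 0      # current in-memory sequence

    def fsync_header(self):
        # stand-in for writing the DB header and fsyncing it
        self.committed_seq = self.update_seq

    def next_update_seq(self):
        if self.update_seq + 1 > self.committed_seq + MAX_DELTA:
            # forced past the high watermark: commit first, which
            # advances committed_seq and restores the headroom
            self.fsync_header()
        self.update_seq += 1
        return self.update_seq
```

On recovery after a crash, the DB would restart its sequence at committed_seq + MAX_DELTA, since no seq beyond that point can ever have been handed out without a commit.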
Your idea to just bump the update_seq "significantly" mostly pans out (I know of a system that does precisely this), but it would be a data loss scenario when it doesn't pan out.

B.

On Mon, Apr 12, 2010 at 3:54 AM, Adam Kocoloski <[email protected]> wrote:
> Currently a DB update_seq can be reused if there's a power failure before the
> header is sync'ed to disk. This adds some extra complexity and overhead to
> the replicator, which must confirm before saving a checkpoint that the source
> update_seq it is recording will not be reused later. It does this by issuing
> an ensure_full_commit call to the source DB, which may be a pretty expensive
> operation if the source has a constant write load.
>
> Should we try to fix that? One way to do so would be to start at a
> significantly higher update_seq than the committed one whenever the DB is
> opened after an "unclean" shutdown; that is, one where the DB header is not
> the last term stored in the file. Although, I suppose that's not an ironclad
> test for data loss -- it might be the case that none of the lost updates were
> written to the file. I suppose we could "bump" the update_seq on every
> startup.
>
> Adam
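For comparison, a sketch of the "bump on startup" alternative from the quoted mail (again not real CouchDB code; `BUMP` and the clean-shutdown flag are illustrative assumptions). The unclean-shutdown test is whether the header is the last term stored in the file:

```python
# Hedged sketch of bumping the update_seq after an unclean shutdown.
# BUMP is an assumed constant; for safety it must exceed the number of
# updates that could have been written after the last header commit,
# which is exactly the guarantee this approach cannot make ironclad.

BUMP = 1_000_000

def startup_update_seq(header_seq, header_is_last_term):
    if header_is_last_term:
        # clean shutdown: nothing was written after the header,
        # so no update_seq can have been lost
        return header_seq
    # unclean shutdown: skip well past any possibly-lost seqs
    return header_seq + BUMP
```

The residual risk Adam notes still applies: an unclean shutdown where none of the lost updates reached the file passes the header test, which is why he floats bumping on every startup instead.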
