On Jan 24, 2010, at 5:42 PM, Paul Serice wrote:

> On Sun, 2010-01-24 at 15:35 -0500, D. Richard Hipp wrote:
>> Could this be a case of "we've never done it that way before"?
>
> I think it's more a case of "been there, done that, never want to do
> it again."  Just search for "Berkeley DB usage leading to respository
> corruption and data loss" on the Wikipedia page for Subversion.

That's why Subversion switched to SQLite, isn't it?


>
> I think an analogous (and common) experience is losing most of a
> *.tar.gpg file because one byte early in the file is corrupt.  (I
> think the same thing is true of *.tar.gz but not *.tar.bz2.)  On the
> other hand, if your archive is just *.tar, one corrupt byte will cause
> you to lose at most up to the start of the next file in the archive.
>
> What's the story with SQLite?  What's my exposure to a single corrupt
> byte?

For the SQLite repository, just over three-quarters of the database
is the BLOB table, which contains compressed deltas of artifacts.  If
a disk byte error occurs on one of the earlier deltas in a revision
chain, you've lost all prior versions of that file.  So it is similar
to the compressed tarball case.  On the other hand, disk byte errors
are relatively uncommon, and if one does occur, you can simply
re-clone the whole repository from one of your sync partners.
Disaster recovery is therefore very quick and easy.
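Recovery is literally a one-line operation.  (The URL and file name
here are hypothetical; substitute one of your own sync partners.)

    # Re-create the damaged repository from a remote clone:
    fossil clone http://example.com/repo repo.fossil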

Usually it is the whole drive that goes out, rather than a single
byte changing.  For that reason, we keep clones of all our important
repositories on geographically separated machines, and cron jobs run
a "sync" periodically to keep all copies aligned.  So if a data
center explodes, we still have complete backups.  See
http://www.fossil-scm.org/fossil/doc/tip/www/selfhost.wiki
for details.
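A crontab entry along these lines does the job (the URL and
repository path are hypothetical):

    # Hypothetical crontab line: exchange changes with a peer
    # server once an hour, on the hour.
    0 * * * * fossil sync http://example.com/repo -R /home/backup/repo.fossil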
If we lose a data center, all we have to do is lease space at a new
one, set up a web server, and clone one of the other repositories.
We could be back up in a few minutes with no data loss.  Contrast
this with the disaster that the folks at http://www.firebirdsql.org/
faced in December when their main disk went corrupt.  (You can see a
synopsis of the problem in news postings on their main page.)  It
took them a "gargantuan" effort over nearly a month to recover.  Had
Firebird been using Fossil (or something like it, perhaps implemented
using Firebird instead of SQLite) then their recovery could have been
accomplished in 10 minutes with a single "clone" command.

Note that running "sync" every hour is not expensive.  Fossil "sync"
is very efficient, even more so than rsync.  See
http://www.fossil-scm.org/fossil/doc/tip/www/stats.wiki
for a detailed analysis.

Software bugs seem a more serious threat to repository consistency.
Fossil uses a variety of techniques to prevent repository damage due
to software errors.  See the
http://www.fossil-scm.org/fossil/doc/tip/www/selfcheck.wiki
page for an overview.  Bottom line: all updates to a repository are
made within a transaction, and that transaction will not commit until
Fossil has verified that it can recover all content files.  This
makes commits a little slower, but it is very effective at catching
software problems before they can corrupt a repository.
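In SQL terms, the pattern is roughly the following.  This is a
minimal sketch only; the actual verification in Fossil is C code that
re-extracts each new artifact and compares it against the original
content, not SQL.

    BEGIN;
    -- ... insert the new artifact and delta rows ...
    -- Re-extract everything just written and compare it with
    -- the original content.  Only if that self-check passes:
    COMMIT;
    -- On any mismatch, abandon the whole transaction instead:
    -- ROLLBACK;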

Note also that Fossil leverages the heavily tested crash recovery
mechanisms of SQLite so that a power loss in the middle of a commit  
does not corrupt your repository.  I'm not sure what would happen if  
you took a power loss in the middle of a commit in git or hg, but I'm  
guessing that it would not be pretty.  (Somebody please correct me if  
I'm wrong.)  See section 3.3 of http://www.sqlite.org/testing.html for  
a discussion of the extensive crash testing performed on SQLite.

Another safety feature is that the "Fossil File Format" is really an
unordered bag of artifacts.  SQLite is used for local storage, but
that is merely an implementation detail.  (See
http://www.fossil-scm.org/fossil/doc/tip/www/theory1.wiki
and http://www.fossil-scm.org/fossil/doc/tip/www/fileformat.wiki
for further elaboration of this concept.)  The SQLite database can be
completely reconstructed from the original, canonical artifacts at
any time.  In fact, that is exactly what the "rebuild" command does.
The artifacts are normally stored as compressed deltas for space
efficiency.  But if you want to store backups as individual files
(one file per artifact), Andreas Kupries has created a fossil command
to do exactly that: "deconstruct".
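For instance (the file and directory names are illustrative; see
"fossil help deconstruct" for the exact syntax in your version):

    # Rebuild the SQLite database from the canonical artifacts:
    fossil rebuild repo.fossil

    # Write each artifact out as an individual file:
    fossil deconstruct -R repo.fossil artifacts/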

In 2.5 years of operation on a variety of projects, no file has ever
been lost (to our knowledge) after having been checked into a Fossil
repository and synced to a backup machine.


D. Richard Hipp
d...@hwaci.com



_______________________________________________
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
