[BackupPC-devel] BackupPC 4.0 features - reference counting

Craig Barratt Wed, 02 Mar 2011 11:52:01 -0800

In 3.x hardlinks are used for reference counting of pool files.

In 4.x hardlinks are not used.  Reference counting is done using simple
flat file databases.  For every file (digest) in the pool a reference
count is maintained.  This count is the number of backup instances that
refer to (i.e., use) this file.  The reference counts need to be updated as
backups are completed and as backups are removed.  The sole purpose of
the reference counts is to determine when a pool file is no longer used
by any backups and can therefore be removed.


The main risk of using application-level reference counting, rather
than file-system reference counting with hardlinks, is inconsistency
due to bugs, abnormally terminated backup processes, or race
conditions.  In addition, a file system error could corrupt the entire
reference count database.

However, the benefits significantly outweigh the drawbacks:

 - eliminating hardlinks means the backup storage is much easier
   to replicate, copy or restore.

 - determining which pool files can be deleted is much more
   efficient, since only the reference count database needs
   to be searched for reference counts of 0.  It is no longer
   necessary to stat() every file in the pool, which is very
   time consuming on large pools.

It is not necessary to update the reference counts in real time, so
the implementation is a lot simpler and more efficient.  In fact,
the reference count updating is done as part of the BackupPC_nightly
process.

The reference count database is stored in 128 different files,
based on the first byte of the digest ANDed with 0xfe.  Therefore
the file:

    CPOOL_DIR/4e/poolCnt

stores all the reference counts for digests that start with 0x4e or
0x4f.  The file itself is the result of using Storable::store() on a
hash whose key is the digest and value is the reference count.  This
is a compact format for storing the perl data structure.  The entire
file is read or written in a batch-like manner - it is not intended
for dynamic updates of individual entries.
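
As a rough illustration of the mapping just described, here is a
minimal sketch in Python (not BackupPC's own Perl code).  The function
names pool_cnt_path and all_shards are hypothetical, and CPOOL_DIR is
left as a literal placeholder.  It only shows how masking the first
digest byte with 0xfe yields 128 count files; it does not reproduce the
Storable file format.

```python
def pool_cnt_path(digest: bytes, pool_dir: str = "CPOOL_DIR") -> str:
    """Return the path of the count file that covers this digest.

    Hypothetical helper: the name and signature are assumptions made
    for illustration only.
    """
    if not digest:
        raise ValueError("digest must not be empty")
    # Clear the low bit of the first byte, so 0x4e and 0x4f both map to 0x4e.
    shard = digest[0] & 0xFE
    return f"{pool_dir}/{shard:02x}/poolCnt"


def all_shards():
    """List the 128 directory names produced by masking with 0xfe."""
    return [f"{b:02x}" for b in range(0, 256, 2)]
```

Digests beginning with 0x4e and 0x4f both resolve to
CPOOL_DIR/4e/poolCnt, matching the example above.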

When backups are done or backups are deleted, a file is created
that records the changes in reference counts.  For example, if a
backup is being done, and a new file is matched to an existing
pool file, then the reference count for that pool file needs to be
incremented.  Similarly, if a backup is deleted so that a given
pool file is no longer referenced, then that reference count needs
to be decremented.  Remember that backups are stored as reverse-time
deltas in 4.x, so there are a few subtle issues about how reference
counts change.  For example, if a file was present in the prior backup,
but has been removed prior to the current backup, then the reference
count doesn't change - the file is simply moved from the current backup
to the prior backup.

These "pool reference delta" files are stored in each PC's backup
directory, and also in the trash directory.  Each of those directories
could hold several or many of them as backups are done and others are
deleted.  While a file is being written its name has the form
"tpoolCntDelta.PID.NNN", where PID is the process ID of the writing
process, and NNN is a number to ensure the file name is unique.  Once
the file is closed, it is renamed to "poolCntDelta.PID.NNN".
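
The write-then-rename naming scheme can be sketched as follows, again
in Python rather than BackupPC's Perl.  The function name
write_delta_file and the one-line-per-digest text format
("<digest> <signed delta>") are assumptions made for illustration; the
actual contents of a delta file are not described here.

```python
import os


def write_delta_file(directory, deltas, seq):
    """Write deltas under a temporary name, then rename when complete.

    deltas maps a hex digest string to a signed change in its count.
    The on-disk line format used here is an assumption, not BackupPC's.
    """
    suffix = f"{os.getpid()}.{seq}"
    tmp_path = os.path.join(directory, f"tpoolCntDelta.{suffix}")
    final_path = os.path.join(directory, f"poolCntDelta.{suffix}")
    with open(tmp_path, "w") as fh:
        for digest, delta in sorted(deltas.items()):
            fh.write(f"{digest} {delta:+d}\n")
    # Only a fully written, closed file ever carries the final name.
    os.rename(tmp_path, final_path)
    return final_path
```

A reader that only looks for names starting with "poolCntDelta" never
picks up a file that is still being written.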

The script bin/BackupPC_refCountUpdate reads all the poolCntDelta*
files in the PC and trash directories, and updates the poolCnt
files below CPOOL_DIR and POOL_DIR.  If it encounters any errors
it does its best to restore all the files to their original
form.
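
The batch update can be pictured with the following Python sketch.  It
is not the logic of BackupPC_refCountUpdate itself: the function names,
the plain-text delta format ("<digest> <signed delta>" per line) and
the in-memory dict of counts are all assumptions, and the error
recovery mentioned above is left out.

```python
import glob
import os
from collections import Counter


def read_deltas(directories):
    """Sum the deltas from every completed poolCntDelta* file."""
    total = Counter()
    for d in directories:
        # The pattern skips in-progress "tpoolCntDelta.*" files.
        for path in sorted(glob.glob(os.path.join(d, "poolCntDelta*"))):
            with open(path) as fh:
                for line in fh:
                    digest, delta = line.split()
                    total[digest] += int(delta)
    return total


def apply_deltas(counts, deltas):
    """Return a new digest -> count dict with the deltas applied."""
    updated = dict(counts)
    for digest, delta in deltas.items():
        updated[digest] = updated.get(digest, 0) + delta
    return updated


def unreferenced(counts):
    """Digests whose count has dropped to zero (or below)."""
    return sorted(d for d, n in counts.items() if n <= 0)
```

Digests returned by unreferenced() correspond to pool files that no
backup uses any more, which is the only thing the counts are needed
for.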

A script bin/BackupPC_fsck can be used to verify the reference
counts and/or to fix them.  BackupPC must not be running while
BackupPC_fsck is used.

Craig
