Hi Kern,

On Friday 02 July 2010, Kern Sibbald wrote:
> On Thursday 01 July 2010 21:46:50 Howard Thomson wrote:
> > Hi Kern,
> >
> > On Thursday 01 July 2010, Kern Sibbald wrote:
> > > Hello Howard,
> > >
> > > What does "chunked" backup mean exactly? I am not sure what the high
> > > level concept is here. Bacula can already backup multi-gigabyte virtual
> > > disks, so obviously you are thinking about something different.
> >
> > The concept that I am calling 'chunked backup' is sub-file incremental
> > backup.
> >
> > Currently, for a 10Gb Virtualbox virtual disk, a Full-backup will backup
> > the whole file.
> >
> > Subsequent incremental backups, where perhaps only 1Mb of the virtual-disk
> > has changed, will backup the entire [10Gb] single file, because it has
> > changed.
> >
> > Bacula currently records a hash-value for the entire file, whereas I am
> > intending, in addition and for appropriately large files, to record a
> > hash-value for sub-file chunks, to be able to selectively not backup those
> > chunks when doing an incremental / differential backup.
>
> OK, now I understand. This is a feature that we are working on -- it is
> actually a form of deduplication. Before implementing it, there are a number
> of things that need to be decided and some important changes in Bacula that
> need to be made.
>
> 1. By the way, I call these "deltas" that is it is some change to the
> originally backed up image that must be applied. However, what is different
> from an Incremental is two things: 1. only a part of the file is saved. 2.
> *all* the deltas must be restored (not just the most recent as is what
> happens for incremental backups).
>
> 2. From the above, you can see that we need some way of marking these as
> deltas rather than incremental. Perhaps it could simply be called a "delta"
> backup level rather than Incremental.
>
> 3. We need to decide how the "deltas" are going to be generated -- there
> needs to be something to figure out what has changed, which means, in
> general, you need access to the previous backups or some form of hashing
> done by deduplication code.
>
> 4. Determine how the deltas are going to be stored -- actually, IMO, that is
> trivial it just needs a very small amount of code that looks much like the
> sparse file handling code -- we may even be able to use the same code.
>
> >
> > I want to use Bacula to do full + incremental backups of my own system, to
> > disk, without separating out virtual-disks into separate backups, with
> > different recycle criteria for space constraint reasons.
> >
> > Current [admittedly] simple-minded incremental backups of my file-tree are
> > much larger than they need to be ...
>
> Yes, much larger. We have some Bacula Systems scripts that help with this
> for VirtualBox, but it is not integrated with Bacula as deltas would be.
>
> This whole subject is non-trivial.

It is certainly non-trivial ...

Delta backup, to use your terminology, requires:

1/ Retrieve file-offset / hash-code pairs for the file being backed up
2/ Generate a hash-code for each file-offset otherwise selected for backup
3/ Look up the file-offset in the retrieved list and proceed with backup if
   either not found [sparse file chunk not backed up] or found but different
4/ Store all newly generated file-offset / hash-code pairs to the database.
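
As a rough illustration of steps 1/ to 4/, here is a minimal Python sketch. It is not Bacula code: the function name delta_backup, the fixed chunk size, the use of SHA-1 and the in-memory dict standing in for the catalog are all assumptions made for the example, and sparse-hole detection is left out.

```python
import hashlib
import io


def delta_backup(stream, previous, chunk_size=4):
    """Compare a file against the offset -> hash map of the last backup.

    `previous` is the map retrieved in step 1/ (hypothetical: a plain dict).
    Returns (to_backup, current): the (offset, data) chunks that must be
    saved, and the new offset -> hash map to store in step 4/.
    """
    current = {}
    to_backup = []
    offset = 0
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        # Step 2/: hash the chunk at this file-offset.
        digest = hashlib.sha1(data).hexdigest()
        current[offset] = digest
        # Step 3/: back up if the offset is unknown or the hash differs.
        if previous.get(offset) != digest:
            to_backup.append((offset, data))
        offset += len(data)
    return to_backup, current


# Full backup: nothing known yet, so every chunk is selected.
full, full_map = delta_backup(io.BytesIO(b"aaaabbbbcccc"), {})
# Later backup: only the chunk at offset 4 has changed.
delta, delta_map = delta_backup(io.BytesIO(b"aaaaXXXXcccc"), full_map)
```

With the tiny 4-byte chunk size used here, the second call selects only the chunk at offset 4. A real chunk size would be far larger, and the map would be stored in the database rather than kept in memory.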

Restore of a delta backed-up file requires:

5/ Retrieve jobid (?) / file-offset pairs from the database
6/ For each backup-stream read, selectively restore deltas as needed.
   Restoring all deltas, in the right order, would work but be
   bandwidth inefficient.
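
Steps 5/ and 6/ could be sketched along the following lines, again in Python and again with hypothetical names (plan_restore, restore) and in-memory lists standing in for the catalog and the backup streams. The sketch assumes the jobs are listed oldest first and that the file does not change size between backups.

```python
def plan_restore(job_offsets):
    """Step 5/: map each file-offset to the newest job that saved it.

    `job_offsets` is a list of (jobid, [offsets]) pairs, oldest job first.
    """
    newest = {}
    for jobid, offsets in job_offsets:
        for offset in offsets:
            newest[offset] = jobid  # later jobs overwrite earlier ones
    return newest


def restore(streams, plan, size):
    """Step 6/: read each stream, writing only the chunks the plan selects.

    `streams` is a list of (jobid, [(offset, data), ...]) pairs.
    """
    out = bytearray(size)
    for jobid, chunks in streams:
        for offset, data in chunks:
            if plan.get(offset) == jobid:  # skip superseded chunks
                out[offset:offset + len(data)] = data
    return bytes(out)


streams = [
    (1, [(0, b"aaaa"), (4, b"bbbb"), (8, b"cccc")]),  # full backup
    (2, [(4, b"XXXX")]),                              # delta backup
]
plan = plan_restore([(j, [o for o, _ in c]) for j, c in streams])
restored = restore(streams, plan, 12)
```

Here the chunk at offset 4 from job 1 is never written, because the plan already says job 2 supersedes it. That is the saving over restoring every delta in order.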

In looking at all the relevant code, I am finding that the interaction with
the database, directly and indirectly, is the least obvious structure to
extend and change ...

The comment on sparse file handling is, of course, correct and I am treating
delta file backup as a special case of sparse file backup.

It seems to be the responsibility of the SD to send relevant updates to the
Director, currently at the end of each file. However, the SD has no knowledge
of which file-offsets of a sparse file it has processed on behalf of an FD, so
I am unclear at present as to how, and when, the step 4/ updates to the
database will occur.

When you say that this is being worked on, is it worth my continuing with the
current work-in-progress?

I haven't altered many files yet in my git repo; I've spent more time reading
code than writing it so far ...!

Regards,

Howard
--
"Only two things are infinite, the universe and human stupidity,
and I'm not sure about the former." -- Albert Einstein
_______________________________________________
Bacula-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-devel