Hi all,

I am currently backing up several similar systems onto one volume, as probably 
most people do. To save time and space, I have the following suggestion:

In principle it is unnecessary to store the same content twice on the same 
volume. I assume that, during restore, a volume can either be read completely 
or is lost completely, so there is no immediate benefit from storing the same 
content twice on a single volume.

Now it would be nice if Bacula never put the same file *content* twice on 
the same volume. This could happen in a transparent manner: Bacula has a 
database with all files stored on each volume. When a request to store a file 
arrives, Bacula could easily determine if this file has already been written to 
the volume earlier (looking up the file size and MD5 sum in all entries in the 
database for this volume) and then store something like a "backlink on the 
tape" together with the filename (which may be different from the other file 
with the same content) instead of the file itself.
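
To make the lookup concrete, here is a minimal sketch in Python. It is only an 
illustration of the idea, not Bacula code: the VolumeIndex class, its store() 
method and the record tuples are all hypothetical names, and a plain dict 
stands in for the catalog database. As in the proposal, two files count as 
identical when size and MD5 sum both match.

```python
import hashlib


class VolumeIndex:
    """Hypothetical stand-in for the catalog entries of one volume."""

    def __init__(self):
        # (size, md5 hex digest) -> record number of the first copy on this volume
        self._seen = {}
        # What would end up on the volume, in write order
        self.records = []

    def store(self, path, data):
        """Write either the file data or a backlink; return which one was written."""
        key = (len(data), hashlib.md5(data).hexdigest())
        first = self._seen.get(key)
        if first is None:
            # Content not on this volume yet: store it and remember where
            self._seen[key] = len(self.records)
            self.records.append(("data", path, data))
            return "data"
        # Same content already present: store only the name plus a backlink
        self.records.append(("backlink", path, first))
        return "backlink"


vol = VolumeIndex()
r1 = vol.store("/hostA/etc/services", b"same content")
r2 = vol.store("/hostB/etc/services", b"same content")
r3 = vol.store("/hostB/etc/hosts", b"other content")
print(r1, r2, r3)
```

The second file has a different name but the same content, so only a backlink 
to record 0 is written for it.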

Personally, I am thinking of the following implementation: before transmitting 
a file, the FD could first send the path/name, size and MD5 sum. The SD could 
then decide whether it still needs the data or whether it could simply store 
this "hardlink" instead. The SD would probably have to consult the Director 
for the database lookup.
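
The exchange could look roughly like the following Python sketch. It does not 
model the real FD/SD protocol: the StorageDaemon class, offer(), send_data() 
and fd_backup() are hypothetical names, and the set called known stands in for 
the database lookup through the Director.

```python
import hashlib


class StorageDaemon:
    """Hypothetical SD side of the two-step exchange."""

    def __init__(self):
        self.known = set()       # (size, md5) pairs already on the volume
        self.volume = []         # records written so far
        self.bytes_received = 0  # file data actually transmitted

    def offer(self, path, size, md5):
        """Step 1: the FD announces a file. Returns True if the data is still needed."""
        if (size, md5) in self.known:
            # Content is already on the volume: store a link, ask for no data
            self.volume.append(("link", path, size, md5))
            return False
        return True

    def send_data(self, path, data):
        """Step 2: only used when offer() returned True."""
        md5 = hashlib.md5(data).hexdigest()
        self.known.add((len(data), md5))
        self.volume.append(("data", path, len(data), md5))
        self.bytes_received += len(data)


def fd_backup(sd, path, data):
    """Hypothetical FD side: announce first, transmit only on request."""
    md5 = hashlib.md5(data).hexdigest()
    if sd.offer(path, len(data), md5):
        sd.send_data(path, data)


sd = StorageDaemon()
payload = b"x" * 1000
fd_backup(sd, "/client1/usr/bin/tool", payload)
fd_backup(sd, "/client2/usr/bin/tool", payload)
print(sd.bytes_received)
```

In this sketch the second client still has to read the file to compute the 
MD5 sum, but the file data itself crosses the network only once.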

Perhaps this idea could be merged with the "basefiles" concept: a file would 
become a basefile automatically when the same content arrives a second time on 
the same volume. But, according to my idea, there would never be a need to 
store the "basefiles" separately, so no new backup level and/or strategy would 
be needed.

Of course, restores will become more complicated: instead of working through 
one single physical session, multiple sessions will have to be read.
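
As a rough illustration of what that means for a restore, here is a small 
Python sketch. The volume layout (a dict of sessions holding "data" and "link" 
records) and the function restore_session() are assumptions made up for this 
example. It also assumes that a link always points at a data record, never at 
another link.

```python
# Hypothetical volume layout: session id -> list of (kind, path, payload) records
volume = {
    1: [("data", "/hostA/etc/motd", b"welcome")],
    2: [
        # A link's payload is (session, record index) of the stored data
        ("link", "/hostB/etc/motd", (1, 0)),
        ("data", "/hostB/etc/hostname", b"hostB"),
    ],
}


def restore_session(volume, session_id):
    """Restore all files of one session, following links into other sessions."""
    restored = {}
    sessions_read = {session_id}
    for kind, path, payload in volume[session_id]:
        if kind == "link":
            src_session, src_index = payload
            sessions_read.add(src_session)
            # Fetch the content from the session that actually holds it
            _, _, data = volume[src_session][src_index]
            restored[path] = data
        else:
            restored[path] = payload
    return restored, sorted(sessions_read)


files, sessions = restore_session(volume, 2)
print(sessions)
```

Restoring session 2 alone therefore requires reading session 1 as well, which 
is exactly the additional seeking mentioned above.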

This concept would make volume handling much more flexible. A volume would 
never overflow because of storing the same set of files multiple times. Full 
Backups would automatically shrink when inadvertently done twice on the same 
tape.

Logically, each volume would still contain the same data, just in a different 
form. Effectively this would just be an "abstraction layer" above the physical 
volume. This would mean that most procedures would not change much, and there 
would be no need for a new concept (like "static files" or the like).

There are some more complications, e.g. when considering spooled catalog 
updates while streaming to a fast tape device, but these should not be 
insurmountable.

What do you think about this?
Regards
--Marcel



_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
