John Drescher wrote:
> I have seen this discussed on this list before, and I believe there are
> several problems on top of the small chance that a file will have the
> same size and the same md5sum but different contents. One is: do we
> search (for duplicates) only in the current backup job or volume, or do
> we include other backups and other volumes? If we include other backups,
> how do we handle the case where a file from job X is on a volume from
> job Y because of a duplicate, and now some user has purged the volume
> that contains job Y?

My opinion is that, as a backup solution, we should check only within one
volume; however, one volume can hold data from more than one file daemon.
Otherwise we will have problems with volume retention, and this is true
mainly for removable storage (tapes, external drives, ...).

With file-system storage we could use an algorithm similar to CDP to limit
the number of copies held in storage (or their age), and because a file
system is randomly accessible and always available, it would be easy to
copy data. Or, if a database-type storage backend were used (why not?), we
could simply create/delete links to a row.

As for detecting duplicates: it amounts to comparing a checksum, which can
be done quickly in SQL with a b-tree index (I think). If a matching file
is found, the file itself is not transmitted over the network, only the
relevant metadata (filename, location, permissions, etc.).

--
Hristo Benev
IT Manager
WAVEROAD - Partners in Telecommunications
514-935-2020 x225 T
514-935-1001 F
www.waveroad.ca
[EMAIL PROTECTED]
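The checksum probe described above could be sketched roughly as follows. This is only an illustration of the idea, not Bacula's actual catalog schema: the `file_blob` table, its columns, and the `store_or_link` helper are all invented names for this sketch.

```python
import hashlib
import sqlite3

# Hypothetical schema: an in-memory table standing in for the catalog.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE file_blob (
           id   INTEGER PRIMARY KEY,
           size INTEGER NOT NULL,
           md5  TEXT    NOT NULL
       )"""
)
# A b-tree index on (size, md5) makes the duplicate probe a fast lookup.
conn.execute("CREATE INDEX idx_blob_size_md5 ON file_blob (size, md5)")

def store_or_link(data: bytes):
    """Return (row id, was_duplicate). On a hit, only the metadata
    (filename, location, permissions, ...) would cross the network;
    the file contents themselves are skipped."""
    digest = hashlib.md5(data).hexdigest()
    row = conn.execute(
        "SELECT id FROM file_blob WHERE size = ? AND md5 = ?",
        (len(data), digest),
    ).fetchone()
    if row:
        return row[0], True   # duplicate: link to the existing row
    cur = conn.execute(
        "INSERT INTO file_blob (size, md5) VALUES (?, ?)",
        (len(data), digest),
    )
    return cur.lastrowid, False  # new content: would be transmitted

first_id, dup1 = store_or_link(b"hello world")
second_id, dup2 = store_or_link(b"hello world")
```

Here the second call finds the existing (size, md5) pair and returns the same row id without "transmitting" anything, which is the link-to-a-row behaviour suggested above.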
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users