Hi,

Les Mikesell wrote on 16.05.2007 at 13:55:04 [Re: [BackupPC-users] Advice on BackupPC]:
> Vetch wrote:
> > I have a two site network [...] Our bandwidth is limited [...]
> > I want to backup my data from one site to the other...
> > In order to assess whether that would be do-able, I went to an
> > exhibition of backup technologies.
> > One that caught my eye was a company called Data Domain, who claimed to
> > de-duplicate data at the block level of 16KB chunks...
> > Apparently, all they send are the changed chunks and the schema to
> > retrieve the data.
>
> Backuppc can use rsync to transfer the data. Rsync works by reading
> through the file at both ends, exchanging block checksums to find the
> changed parts.

The important part about this is that rsync compares a file with the
version in the reference backup (the last incremental of a lower level,
or the full backup). Consequently, a new file will be transferred in
full even if an identical file already exists in the pool.
De-duplication happens at the file level, after the transfer. As far as
I know, rsync uses 2KB chunks of the file, so you may need to transfer
less data in some cases than with 16KB chunks. On the other hand, more
checksums will need to be transferred in the general case. rsync
incremental backups take file attributes into account (modification
time, permissions etc.) and only transfer apparently changed files,
using block checksums just as with full backups.
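In case a picture helps, the block-checksum idea looks roughly like the
following. This is a much simplified Python sketch, not rsync's actual
algorithm -- real rsync uses a rolling weak checksum so it can match
blocks at any byte offset (not just aligned ones), plus a strong
checksum to confirm matches:

    # Much simplified illustration of the block-checksum idea behind
    # rsync's delta transfer -- NOT rsync's real algorithm.  Comparing
    # aligned blocks is enough to show why unchanged parts of a big
    # dump need not be re-sent.
    import hashlib, os

    BLOCK = 2048  # small rsync-like blocks (Data Domain reportedly uses 16KB)

    def block_sums(old: bytes) -> dict:
        """Checksum -> block index for the receiver's old copy of the file."""
        return {hashlib.md5(old[i:i + BLOCK]).hexdigest(): i // BLOCK
                for i in range(0, len(old), BLOCK)}

    def delta(old: bytes, new: bytes):
        """Yield ('copy', index) for blocks the receiver already has and
        ('literal', data) for blocks that would have to cross the wire."""
        have = block_sums(old)
        for i in range(0, len(new), BLOCK):
            chunk = new[i:i + BLOCK]
            idx = have.get(hashlib.md5(chunk).hexdigest())
            yield ('copy', idx) if idx is not None else ('literal', chunk)

    if __name__ == '__main__':
        old = os.urandom(10 * BLOCK)                 # stand-in for yesterday's dump
        new = bytearray(old)
        new[3 * BLOCK:3 * BLOCK + 8] = b'CHANGED!'   # overwrite 8 bytes in one block
        literal = sum(len(d) for op, d in delta(old, bytes(new)) if op == 'literal')
        print("literal bytes to send:", literal, "of", len(new))

Only the single modified 2KB block shows up as literal data; with 16KB
blocks, the same 8-byte change would already cost you a full 16KB.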
> > Does it send the changed data down the line and then check to see if it
> > already has a copy, or does it check then send?

In general, it sends data and then checks (on the fly, without creating
a temporary copy for existing files). With rsync, it is possible to cut
down bandwidth requirements by comparing against the previous version
of the respective file.

> > The other thing is, can BackupPC de-duplicate at the block level or is
> > it just file level?
> > I'm thinking that block level might save considerable amounts of
> > traffic, because we will need to send file dumps of Exchange databases
> > over the wire...
> > ... Which I assume will mean that we've got about 16GB at least to copy
> > everyday, since it'll be creating a new file daily...

File level. That means you'll have a new file every day. Unless you
happen to have other files with identical contents, pooling won't gain
you anything for these files, though compression might.

> > On the other hand, would 16KB blocks be duplicated that regularly - I
> > imagine there is a fair amount of variability in 16KB of ones and zeros,
> > and the chances of them randomly reoccurring without being part of the
> > same file, I would say are slim...

Well, for your database dumps, that would be sufficient, wouldn't it?
If you've got multiple copies of a 16GB database file and each differs
only by a few MB, that would leave a lot of identical blocks.
Considering we're talking about a Microsoft product, I wouldn't bet on
the dump format being especially convenient, though. They've probably
got a variable-length header format just for the sake of defeating
block-level de-duplication strategies :-).

> > What do you think?
>
> I think rsync will do it as well as it can be done.

For the transfer: yes - if the database dumps are always stored under
the same file name. If you have a new file name each day (including the
date, for instance), then rsync won't help you at all. For storage, the
transfer method is irrelevant.

> You can test the transfer efficiency locally first to get an idea of how
> well the common blocks are handled.

Correct. You can do this for single files (database dumps) or for the
whole file tree you want to back up. For your database dumps, rsync
should also give you a hint of how much block-level de-duplication
could save you: if rsync can't speed up the transfer, de-duplication
likely won't save any disk space either.
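For a single dump file, you can simulate that locally along these lines
-- a rough sketch only: the paths are made up, and you need enough
scratch space for a copy of the dump. --no-whole-file forces rsync's
delta algorithm, which it normally skips for local copies, and --stats
reports "Literal data" (roughly what would cross the wire) versus
"Matched data" (what the old copy already provides):

    # Rough local test: how much of today's dump would rsync actually
    # have to transfer, given yesterday's dump as the starting point?
    # The file names below are made up -- adjust them to your own dumps.
    import os, shutil, subprocess, tempfile

    old_dump = "/backups/exchange-2007-05-15.dmp"   # yesterday's dump (example)
    new_dump = "/backups/exchange-2007-05-16.dmp"   # today's dump (example)

    # Pretend the "remote" side already holds yesterday's data under
    # today's name, then let rsync update it with the delta algorithm.
    workdir = tempfile.mkdtemp()
    target = os.path.join(workdir, os.path.basename(new_dump))
    shutil.copy(old_dump, target)

    result = subprocess.run(
        ["rsync", "--no-whole-file", "--stats", new_dump, target],
        capture_output=True, text=True, check=True)

    for line in result.stdout.splitlines():
        if line.startswith(("Literal data", "Matched data", "Total bytes sent")):
            print(line)

If the "Literal data" figure ends up close to the full size of the
dump, block-level de-duplication of those dumps probably wouldn't do
much better either.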
BackupPC is not difficult to set up. You could simply test how well it
works for you before deciding to spend money on a commercial product.
BackupPC has its limits, which may make a commercial product the better
choice for you. But then, the commercial product probably also has its
limits, and the question is whether they are as well documented. If
it's only the block-level de-duplication you're after, disk space might
well be cheaper than software.

Regards,
Holger

P.S.: For LVM snapshots, the problem is also that de-duplication takes
place at the file level.
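To make the "file level" point concrete: pooling only ever matches
whole files, roughly like this (a minimal sketch of the general idea,
not BackupPC's actual pool code, which additionally compresses pool
files and deals with hash collisions):

    # Minimal sketch of file-level pooling: identical file *contents*
    # are stored once and hard-linked; a file that differs by a single
    # byte (e.g. a fresh database dump) gets a completely new pool entry.
    import hashlib, os

    POOL = "/var/lib/backuppc-sketch/pool"   # made-up path, just for the sketch

    def file_digest(path: str) -> str:
        """MD5 over the whole file contents."""
        h = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def store(path: str) -> str:
        """Link 'path' into the pool: reuse an existing entry only if the
        whole file is byte-for-byte identical, otherwise add a new one."""
        os.makedirs(POOL, exist_ok=True)
        pooled = os.path.join(POOL, file_digest(path))
        if os.path.exists(pooled):
            os.remove(path)
            os.link(pooled, path)    # identical contents: share one inode
        else:
            os.link(path, pooled)    # new contents: a brand new pool file
        return pooled

Two 16GB dumps that differ in a few MB therefore end up as two separate
pool files: rsync can save you transfer volume, but not disk space.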