Hi,

Jeffrey J. Kosowsky wrote on 2008-12-08 09:37:16 -0500 [Re: [BackupPC-users] Advice on creating duplicate backup server]:
> It just hit me that given the known architecture of the pool and cpool
> directories shouldn't it be possible to come up with a scheme that
> works better than either rsync (which can choke on too many hard
> links) or 'dd' (which has no notion of incremental and requires you
> to resize the filesystem etc.).
yes, that hit someone on the list several years ago (I don't remember
the name, sorry). I implemented the idea he sketched (well, more or
less; there's some work left to make it really useful).

> My thought is as follows:
> 1. First, recurse through the pc directory to create a list of
>    files/paths and the corresponding pool links.
>    Note that finding the pool links can be done in one of several
>    ways:
>    - Method 1: Create a sorted list of pool files (which should be
>      significantly shorter than the list of all files due to the
>      nature of pooling and therefore require less memory than rsync)
>      and then look up the links.

Wrong. You need one entry per inode that points to an arbitrary path
(the first one you copy). Every file(*) is in the pool, meaning a list
of all pool files is exactly what you need. A different way to look at
it: every file with a link count > 1 is a pooled file, and it's these
files that cause rsync & co. problems, not single-link files. (Well,
yes, rsync pre-3 needed a complete list of all files.)

(*) Files that are not in the pool:
1.) 0-byte files. They take up no file system blocks, so pooling them
    saves only inodes. Not pooling them makes things simpler.
2.) log files (they get appended to, which would make pooling somewhat
    difficult; besides, what chance is there of a pool hit?), and the
    backups files (including backups.old).
attrib files *are* pooled, contrary to popular belief, and that makes
sense, because they are often identical to the same attrib file from
the previous backup(s).

The algorithm I implemented is somewhat similar:
1.) Walk pool/, cpool/ and pc/, printing information on the files and
    directories to a file (which will be quite large; by default I put
    it on the destination pool FS, because there should be large
    amounts of space there).
2.) Sort that file with the 'sort' command.
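To make step 1 concrete, here is a rough sketch in Python (the actual
script is Perl; the line format and the rank field are invented for
illustration): print one line per directory and one line per file,
built so that a plain 'sort' later does all the ordering work.

```python
import os
import sys

def emit_list(roots, out):
    """Print one sortable line per directory entry.

    `roots` is a list of (path, rank) pairs: rank 0 for pool/cpool,
    rank 1 for pc/, so that after sorting, the pool copy of an inode
    precedes all of its pc/ links.
    """
    for root, rank in roots:
        for dirpath, dirnames, filenames in os.walk(root):
            # "D" sorts before "F", so all directories come out first.
            out.write("D %s\n" % dirpath)
            for name in filenames:
                path = os.path.join(dirpath, name)
                st = os.lstat(path)
                # Zero-padded inode number, so lexicographic order
                # equals numeric order and 'sort' needs no -n tricks.
                out.write("F %020d %d %s\n" % (st.st_ino, rank, path))

if __name__ == "__main__":
    emit_list([("pool", 0), ("cpool", 0), ("pc", 1)], sys.stdout)
```

Note this toy breaks on filenames containing newlines, which a real
tool would have to escape before handing the list to 'sort'.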
The lines in the file are designed such that they will be sorted into
a meaningful order:
- directories first, so I can create them and subsequently not worry
  about whether the place I want to copy/link a file to already exists
  or not
- files next, sorted by inode number, with the (c)pool file preceding
  its pc/ links

The consequence is that I get all references to one inode on adjacent
lines. The first time, I copy the file. For the repetitions, I link to
the first copy. All I need to keep in memory is something like one
line from the file list, one "previous inode number", and one "file
name of previous inode".

'sort' handles huge files quite nicely, but it seems to create large
(amounts of) files under /tmp, possibly under $TMPDIR if you set that
(not sure). You need to make sure you've got the space, but if you're
copying a multi-GB/TB pool, you probably have. My guess is that the
necessary amount of space roughly equals the size of the file I'm
sorting.

3.) Walk the sorted file, line by line, creating directories, copying
    files (with File::Copy::cp, but I plan to change that to PoolWrite,
    so I can add (part of) one pool to an existing second pool, or to
    something that communicates over TCP/IP, so I can copy to a
    different machine) and linking files (with the Perl function
    link()). In theory, a pool could also be compressed or uncompressed
    on the fly (uncompressed for copying to zfs, for instance).

Once again, because people seem determined to miss the point: it's
*not* processing by sorted inode numbers in order to save disk seeks
that is the point. The point is that the 'link' system call takes two
paths

    link $source_path, $dest_path;  # to use Perl notation

while the 'stat' system call gives you only an inode number. To link a
filename to a previously copied inode, you need to know the name you
copied it to. A general-purpose tool can't know when it will need that
information, so it needs to keep information on every inode with link
count > 1 it has encountered.
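Step 3, sketched in Python rather than the script's Perl (a toy that
assumes the hypothetical line format "F <inode> <rank> <path>" with
paths relative to the current directory): the replay loop only ever
remembers the previous inode and the path its first copy landed at.

```python
import os
import shutil

def replay(sorted_lines, dest):
    """Replay a sorted file list: copy each inode once, hard-link the
    rest. Only one (inode, first-copied-path) pair is held in memory,
    which is the whole point of sorting by inode number first."""
    prev_ino = prev_dest = None
    for line in sorted_lines:
        kind, rest = line.split(" ", 1)
        if kind == "D":
            # Directories sort first, so every later copy/link target
            # already has its parent directory in place.
            os.makedirs(os.path.join(dest, rest), exist_ok=True)
        else:  # "F <inode> <rank> <path>"
            ino, _rank, path = rest.split(" ", 2)
            target = os.path.join(dest, path)
            if ino == prev_ino:
                os.link(prev_dest, target)     # link() takes two *paths*
            else:
                shutil.copyfile(path, target)  # first sight of this inode
                prev_ino, prev_dest = ino, target
```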
You can keep a mapping of inode_number -> file_name in memory for a
few thousand files, but not for hundreds of millions. By sorting the
list by inode number, I can be sure that I'll never need the info for
one inode again once I've reached the next inode, so I only have to
keep info for one file in memory, regardless of how many I'm copying.
The difficult part is now the 'sort', but, again, the 'sort' command
is good at handling huge files - probably without limit to the file
size.

So, what's the problem? Well, I used it once, because I needed to copy
a pool. It seemed to Work For Me (tm), but I'm unsure how to verify
the result, aside from randomly looking at a few files and hoping the
99.99999% I didn't look at are ok too. It's far from complete. Its
usefulness is limited as long as I can't copy to a remote machine. It
only handles the pool/, cpool/ and pc/ directories; the rest needs to
be copied by hand. There is debug output for cases I hope not to
encounter but which may be present in other people's pools. I think
I'm still missing a chown() or two.

Let's see if I can find the figures. My pool was 103 GB: 10 million
directory entries pointing to 4 million inodes. Copy from local disk
to an iSCSI target over a shared 100 MBit network took 10:45 hours. Of
this time, 15 minutes were spent examining cpool (pool was empty), 73
minutes examining pc/, 165 seconds sorting the file list (1.1 GB), and
9:14 hours copying/linking. rsync might have worked for this pool, but
I didn't test. I would be very curious how this scales to a 3 TB pool
though ;-).

Question (to a tar expert named Craig or otherwise): Is it possible to
create a tar stream with this structure (i.e. lots of directories,
then file 1/2/3/123whatever with content, then several links in
different directories under pc/ to this file, then the next pool file
and so on), or does a tar need to be sorted by directories?
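For what it's worth, Python's tarfile module will at least *emit* a
stream in exactly that order: directories up front, one regular-file
entry carrying the content, then hard-link entries that reference it
by name (the paths below are made up). Whether every tar
implementation extracts such a stream correctly is a separate
question; those I'm aware of only require that the link target appear
earlier in the stream, which the inode-sorted order guarantees.

```python
import io
import tarfile

def pool_order_tar(out):
    """Write a tar stream in 'pool order': directories, then a pool
    file with content, then hard-link entries pointing back at it."""
    tf = tarfile.open(fileobj=out, mode="w|")  # streaming, as for a pipe
    for d in ("cpool", "cpool/1", "pc", "pc/host"):
        info = tarfile.TarInfo(d)
        info.type = tarfile.DIRTYPE
        tf.addfile(info)
    data = b"pool file content"
    info = tarfile.TarInfo("cpool/1/123abc")   # the pool copy
    info.size = len(data)
    tf.addfile(info, io.BytesIO(data))
    for linkname in ("pc/host/f1", "pc/host/f2"):
        info = tarfile.TarInfo(linkname)
        info.type = tarfile.LNKTYPE            # hard link: no content,
        info.linkname = "cpool/1/123abc"       # just a reference by name
        tf.addfile(info)
    tf.close()
```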
If it *is* possible, creating a tar stream instead of copying/linking
would not be difficult, and then you could run

    BackupPC_copyPool ... | ssh hostname tar xpf -

(or via netcat, or even store the result on tape). Even merging into
an existing pool could be split off into a BackupPC_mergePool script
which takes a tar stream and does whatever is necessary.

> - Method 2: Calculate the md5sum file path of the file to determine
>   where it is in the pool. Where necessary, determine among chain
>   duplicates

That's basically what BackupPC_tarPCCopy does.

> - Method 3: Not possible yet but would be possible if the md5sum
>   file paths were appended to compressed backups. This would add
>   very little to the storage but it would allow you to very easily
>   determine the right link. If so then you could just read the link
>   path from the file. I believe this would speed up
>   BackupPC_tarPCCopy by many orders of magnitude.
>
> 2. Then rsync *just* the pool -- this should be no problem since by
>    definition there are no hard links within the pool itself
>
> 3. Finally, run through the list generated in #1 to create the new
>    pc directory by creating the necessary links (and for files with
>    no hard links, just copy/rsync them)

See BackupPC_tarPCCopy.

> The above could also be easily adapted to allow for "incremental"
> syncing. Specifically, in #1, you would use rsync to just generate a
> list of *changed* files in the pc directory. In #2, you would
> continue to use rsync to just sync *changed* pool entries. In #3 you
> would only act on the shortened incremental sync list generated in
> #1.

While you can limit your pc/ directory traversal to only a subset of
all backups of a host (or all hosts, if you give a start date, for
example), I don't quite see how syncing the pool should work. Remember
that pool files with hash collisions may be renumbered. Is this
supposed to be limited to re-creating an identical pool? Even then,
renaming a pool file does not affect the pc/ links to it.
Overwriting it with different content does. You would need to
re-establish the correct links for existing backups too, or figure out
how the source pool was changed and replicate the changes to the
destination pool (rm foo_2 foo_3; mv foo_4 foo_2). This can be done,
but not with rsync, as far as I can tell.

> The more I think about it, the more I LIKE the idea of appending the
> md5sum file paths to compressed pool files (Method #3)

Yes, but the question is how often this information is needed. We're
going to a lot of trouble to *eliminate* redundancy. Adding redundancy
for a case every 100th user is going to encounter once in his life may
not be warranted. Then again, it's a fixed amount per file and
probably not enough to worry about ...

Regards,
Holger

_______________________________________________
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/