Hi,

Jim Wilcoxson wrote on 2009-08-31 08:08:48 -0400 [Re: [BackupPC-users] Keeping servers in sync]:
> [...]
> I did some reading today about BackupPC's storage layout and design.
> I haven't finished yet, but one thing stuck out:
>
> "BackupPC_link reads the NewFileList written by BackupPC_dump and
> inspects each new file in the backup."
>
> To speed up incrementals, HashBackup could make use of the NewFileList.
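For readers unfamiliar with the mechanism: the quoted sentence describes content-based pooling via hard links. Below is a minimal sketch of the idea in Python, not BackupPC's actual implementation - the real pool hashes only part of each file's (uncompressed) content and resolves hash collisions with numbered chain suffixes, which this flat, full-content-MD5 layout ignores.

```python
import hashlib
import os

def link_new_files(newfilelist_path, pool_dir):
    """Illustrative sketch of NewFileList processing: for each path
    listed in a NewFileList-style file, hash its content and hard-link
    it against a flat pool directory, so identical files share one
    inode.  (BackupPC's real pool layout and hashing differ.)"""
    with open(newfilelist_path) as f:
        for line in f:
            path = line.strip()
            if not path:
                continue
            with open(path, "rb") as fh:
                digest = hashlib.md5(fh.read()).hexdigest()
            pool_file = os.path.join(pool_dir, digest)
            if os.path.exists(pool_file):
                # Content already pooled: replace this copy with a
                # hard link to the existing pool file.
                os.unlink(path)
                os.link(pool_file, path)
            else:
                # New content: make this file the pool copy.
                os.link(path, pool_file)
```

The key property is that after linking, a pc/ tree holds no duplicate data: every file is just another name for a pool inode.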
BackupPC_link deletes the NewFileList upon completion. Certainly, BackupPC could be changed to keep the NewFileList (as NewFileList.N for N = backup number), but that is a bit awkward, because BackupPC no longer needs this information, and it is strange information at that: a list of all files in the backup that were not yet linked to pool files but need to be. It is really only meaningful for the communication between BackupPC_dump and BackupPC_link. It might conceivably help with incremental pool backups (though only of {c,}pool/, not of pc/). Several concerns come to mind, though:

1.) Robustness - do you want to trust the contents of files on a file
    system and risk missing pool files in your copy because they are,
    for whatever reason, not listed?
2.) Completeness - you need to account for pool chain renumbering (and
    deletion of pool files). Unless BackupPC_nightly also provides
    information on what it changed, you need to traverse the pool anyway.
3.) Which NewFileList.* files would you want to look at? Presumably
    those for all backups for which you need to copy the pc/host/num/
    tree.
4.) How do you handle trees of backups that are in progress?

> Reading the
> NewFileList might be a way to speed up an incremental backup of the
> BackupPC pool, though incremental scans are fairly quick already.

I tend to think that you would introduce dependencies (on the BackupPC version) for an insignificant gain.

> Another thing about BackupPC is that by my reading, new files are
> first written to the PC area, then pool links are created by
> BackupPC_link. This suggests that backing up the pool last might
> improve performance, because it is likely to be more fragmented.

I'm not sure about that. Full backups contain links to all files in the corresponding pc/host/num/ tree, which will be to pool files, wherever on the disk they might be. Incremental backups don't only contain files that are new to the pool (NewFileList) but also links to existing pool files with the same content.
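As an aside on the renumbering concern in point 2 above: an mtime-based incremental scan is exactly what misses chain renumbering, because hard-linking and (on Linux) renaming a file update its st_ctime but leave st_mtime untouched. A hypothetical ctime-based scan, sketched below, would at least notice such operations; the function name and interface are illustrative, not anything BackupPC or HashBackup provides.

```python
import os

def changed_pool_files(pool_dir, since):
    """Walk a {c,}pool/-style tree and report files whose inode change
    time (st_ctime) is newer than `since`.  Renumbering a hash chain
    renames and re-links files without modifying their content, so
    st_mtime stays old while st_ctime is updated - an mtime-only scan
    would skip these files."""
    changed = []
    for root, _dirs, files in os.walk(pool_dir):
        for name in files:
            path = os.path.join(root, name)
            st = os.lstat(path)
            if st.st_ctime > since:
                changed.append(path)
    return changed
```

Even so, this only detects that something changed; reconstructing *what* the renumbering did still requires traversing the pool, which is the point of the objection.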
Again, it's impossible to predict where on the disk those pool files might be.

> Right now, HB will backup cpool first, then pc, then pool, in that
> order. It might be better to backup pc first, then cpool and pool.
> I'm not sure how much of a difference it would make, if any, because
> it's hard to predict disk layouts in any filesystem.

I'm sceptical that it will make any systematic difference. For one pool it might be significantly faster one way, for another pool the other way. What exactly is the speed advantage you are hoping for? Having inode information in cache from one part to the next (i.e. {c,}pool/ vs. pc/ traversal), or reading file content for multiple small files? Or are you thinking about the resulting speed of your HashBackup pool?

Regards,
Holger

_______________________________________________
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/