Hi,

John Rouillard wrote on 2011-03-31 15:20:23 +0000 [[BackupPC-users] BackupPC_dump hangs with: .: size doesn't match (12288 vs 17592185913344)]:
> [...]
> I get a bunch of output (the share being backed up /etc on a centos
> 5.5. box) which ends with:
>
> attribSet: dir=f%2fetc exists
> attribSet(dir=f%2fetc, file=zshrc, size=640, placeholder=1)
> Starting file 0 (.), blkCnt=134217728, blkSize=131072, remainder=0
> .: size doesn't match (12288 vs 17592185913344)

At first glance, this would appear to be an indication of something I have been suspecting for a long time: corruption - caused by whatever - in an attrib file leading to the SIGALRM abort. If I remember correctly, someone (presumably File::RsyncP) would ordinarily try to allocate space for the file (though that doesn't seem to make sense, so I probably remember incorrectly) and either gives up when that fails or refrains from trying in the first place, because the amount is obviously insane.

The weird thing in this case is that we're seeing a directory. There is absolutely no reason (unless I am missing something) to worry about the *size* of a directory. The value is entirely file system dependent and not even necessarily an indication of the *current* number of entries in the directory. In any case, you restore the contents of a directory by restoring the files in it, and you (incrementally) back up a directory by determining whether any files have changed or been added. The *size* of a directory will not help with that decision. Then again, the problematic file (or attrib file entry) may or may not be the last one reported (maybe it's the first one not reported?).

> [...] I have had similar hanging issues before
> but usully scheduling a full backup or removing a prior backup or two
> in the chain will let things work again. However I would like to
> actually get this fixed this time around as it seems to be occurring
> more often recently (on different backuppc servers and against
> different hosts).

I agree with you there. This is probably one of the most frustrating problems to be encountered with BackupPC, because there is no obvious cause and nothing obvious to correct (throwing away part of your backup history for no better reason than "after that it works again" is somewhat unsatisfactory).
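
As a quick sanity check on the numbers in the quoted log lines, here is a minimal Python sketch. It is pure arithmetic on the quoted values and says nothing about where the bogus size actually comes from. The helper names are made up for illustration, and the formula in `size_from_blocks` is only one possible reading of blkCnt/blkSize/remainder (an assumption, not taken from the File::RsyncP or BackupPC code).

```python
# Values copied verbatim from the quoted log lines.
BLK_CNT = 134217728
BLK_SIZE = 131072
REMAINDER = 0
ATTRIB_SIZE = 12288              # first number in "size doesn't match"
REPORTED_SIZE = 17592185913344   # second number in "size doesn't match"


def log2_if_power_of_two(n):
    """Return k if n == 2**k, otherwise None (hypothetical helper)."""
    if n > 0 and n & (n - 1) == 0:
        return n.bit_length() - 1
    return None


def size_from_blocks(blk_cnt, blk_size, remainder):
    """ASSUMPTION: size = (blk_cnt - 1) full blocks plus the remainder.

    This is just one formula that happens to reproduce the reported
    number; it is not claimed to be what the software computes.
    """
    return (blk_cnt - 1) * blk_size + remainder


if __name__ == "__main__":
    print("blkCnt  is 2 **", log2_if_power_of_two(BLK_CNT))
    print("blkSize is 2 **", log2_if_power_of_two(BLK_SIZE))
    implied = size_from_blocks(BLK_CNT, BLK_SIZE, REMAINDER)
    print("implied size matches reported size:", implied == REPORTED_SIZE)
    print("reported size / attrib size:", REPORTED_SIZE // ATTRIB_SIZE)
```

Both blkCnt and blkSize are exact powers of two, and the reported size is exactly one block short of their product. That may well be a coincidence, but it is the sort of pattern worth keeping in mind when looking at the attrib file.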

The reason this matter has not been investigated any further so far seems to be that it is usually "solved" by removing the reference backup (I believe simply running a full backup will encounter the same problem again), because people tend to want to get their backups back up and running.

There are two things to think about here:

1.) Why does attrib file corruption cause the backup to hang? Is there no
    sane(r) way to deal with the situation?
2.) How does the attrib file get corrupted in the first place?

All of this presumes that it *is* attrib file corruption. Could you please send me a copy of the attrib file off-list?

> If I dump the root attrib file (where /etc starts) for either last
> successful or the current (partial) failing backup I see:
>
> '/etc' => {
>     'uid' => 0,
>     'mtime' => 1300766985,
>     'mode' => 16877,
>     'size' => 12288,
>     'sizeDiv4GB' => 0,
>     'type' => 5,
>     'gid' => 0,
>     'sizeMod4GB' => 12288
> },

I would expect the interesting part to be the '.' entry in the attrib file for '/etc' (f%2fetc/attrib of the last successful backup, that is). And I would be curious about how the attrib file was decoded, because I'd implement decoding differently from how BackupPC does, though BackupPC's method does appear to be well tested ;-).

> [...] the last few lines of strace show:
>
> [...]
> 19368 15:00:38.199634 select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
> <59.994597>

I believe this is the result of File::RsyncP having given up on the transfer because of either a failing malloc() or a suppressed malloc(). I'll have to find some time to check in more detail. I vaguely remember it was a rather complicated matter, and there was never really enough evidence to support that corrupted attrib files were really the cause. But I sure would like to get to the bottom of this :-).
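
To make the 'sizeDiv4GB' / 'sizeMod4GB' fields in the quoted dump a bit more concrete, here is a small Python sketch. It assumes, going only by the field names and the quoted values, that the two fields are the quotient and remainder of the size with respect to 4 GB (2**32). The function names are hypothetical, and this is not BackupPC's decoding code.

```python
FOUR_GB = 2 ** 32  # ASSUMPTION: "4GB" in the field names means 2**32 bytes


def combine_size(size_div_4gb, size_mod_4gb):
    """Rebuild a size from the two fields (assumed quotient/remainder)."""
    return size_div_4gb * FOUR_GB + size_mod_4gb


def split_size(size):
    """Split a size into (sizeDiv4GB, sizeMod4GB) under the same assumption."""
    return divmod(size, FOUR_GB)


if __name__ == "__main__":
    # Values from the quoted '/etc' entry: the fields agree with 'size'.
    print(combine_size(0, 12288) == 12288)

    # What the two fields would have to be for the size in the error message.
    div, mod = split_size(17592185913344)
    print("sizeDiv4GB =", div, "sizeMod4GB =", mod)
```

Under that assumption the quoted '/etc' entry is self-consistent, and an entry carrying the size from the error message would need a non-zero sizeDiv4GB. So that is the field I would look at first in the '.' entry.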

Regards,
Holger