Robin Lee Powell wrote at about 17:02:06 -0800 on Monday, December 6, 2010:
 > On Mon, Dec 06, 2010 at 01:17:43PM -0800, Robin Lee Powell wrote:
 > >
 > > So, yeah.  More than one link, matches something in the pool, but
 > > not actually linked to it.  Isn't that *awesome*?  ;'(
 > >
 > > I very much want BackupPC_fixLinks to deal with this, and I'm
 > > trying to modify it to do that now.
 >
 > Seems to be working; here's the diff.  Feel free to drop the print
 > statements.  :)
 >
 > For all I know this will eat your dog; I have no idea what else I
 > broke.  I *do* know that it should be a flag, because I expect that
 > checksumming *everything* takes a very, very long time.

I looked through your code. It is certainly a quick-and-dirty patch,
and it may even work for your purposes, but:

1. It needlessly does a lot of file comparisons rather than inode
   number comparisons, so it can be sped up considerably.
2. It mishandles some use cases by always going down the first branch
   of the if statement on any non-zero file.

So I redid the logic to be significantly faster: it first compares
inode numbers rather than md5sums to verify chain matches, and only
when that fails does it actually look at the file contents to find a
potential match. I also preserved the original logic, so it still
works when links=1 or file size=0, or when you are not interested in
verifying the pc hierarchy links. I also added a new -V (verify) flag
to turn this option on and off.

At the same time I did some scattered minor code cleanup. Looking
back, this code is a bit amateurish, since it was one of the first
real Perl programs I ever wrote. I don't have the time for a thorough
rewrite, but I cleaned up a little and improved some of the commenting
and documentation.

Note that the time spent in the file comparison part of the code
(which is now only significant when a good fraction of the total cpool
entries are dups or pc entries are missing) can probably be reduced by
almost a factor of 2 if, instead of using md5sums, you calculate the
md4sum and compare it to the md4sum checksums that are appended to
each cpool file (note this only occurs with rsync, and I think only
the second time the file is backed up). In general, I think the bad
files are typically a small fraction of the entire pool or pc tree, so
it probably is not worth the effort to decode and test the md4sums.

Anyway, here is the diff. I have not had time to check it much beyond
verifying that it seems to run, SO I WOULD TRULY APPRECIATE IT IF YOU
CONTINUE TO TEST IT AND GIVE ME FEEDBACK. Also, it would be great if
you would let me know approximately what speedup you achieved with
this code vs.
your original.

Thanks

---------------------------------------------------------------------------
--- BackupPC_fixLinks.pl  2009-12-22 07:50:24.291625432 -0500
+++ BackupPC_fixLinks.pl.test  2010-12-08 23:48:43.845678288 -0500
@@ -11,7 +11,7 @@
 #   Jeff Kosowsky
 #
 # COPYRIGHT
-#   Copyright (C) 2008, 2009  Jeff Kosowsky
+#   Copyright (C) 2008, 2009, 2010  Jeff Kosowsky
 #
 #   This program is free software; you can redistribute it and/or modify
 #   it under the terms of the GNU General Public License as published by
@@ -29,12 +29,12 @@
 #
 #========================================================================
 #
-# Version 0.2, released Aug 2009
+# Version 0.3, released December 2010
 #
 #========================================================================
 
 use strict;
-#use warnings;
+use warnings;
 use File::Path;
 use File::Find;
 #use File::Compare;
@@ -53,7 +53,7 @@
 %Conf   = $bpc->Conf(); #Global variable defined in jLib.pm (do not use 'my')
 
 my %opts;
-if ( !getopts("i:l:fb:dsqvch", \%opts) || @ARGV > 0 || $opts{h} ||
+if ( !getopts("i:l:fb:Vdsqvch", \%opts) || @ARGV > 0 || $opts{h} ||
      ($opts{i} && $opts{l})) {
     print STDERR <<EOF;
 usage: $0 [options]
@@ -68,23 +68,30 @@
   sure there are no holes in the pool (although this shouldn''t happen...)
 
   Options:
-    -i <inode file> Read innodes from file and proceed with 2nd pc tree pass
-    -l <link file>  Read links from file and proceed with final repair pass
+
+    -i <inode file> Read pool dups from file and proceed with 2nd pc tree pass
+    -l <link file>  Read pool dups & bad pc links from file and proceed
+                    with final repair pass
+                    NOTE: -i and -l options are mutually exclusive.
+    -s              Skip first pass of generating (or tabulating if
+                    -i or -l options are set) cpool dups
     -f              Fix links
     -c              Clean up pool - schedule BackupPC_nightly to run
                     (requires server running)
-    -s              Skip first pass of generating/reading cpool dups
     -b <path>       Search backups from <path> (relative to TopDir/pc)
+    -V              Verify links of all files in pc path (WARNING: slow!)
     -d              Dry-run
     -q              Quiet - only print summaries & results
     -v              Verbose - print details on each relink
     -h              Print this usage message
+
 EOF
 exit(1);
 }
 my $file = ($opts{i} ? $opts{i} : $opts{l});
-my $verbose =!$opts{q};
-my $Verbose=$opts{v};
+my $verifypc=$opts{V};
+my $notquiet =!$opts{q};
+my $verbose=$opts{v};
 $dryrun = $opts{d};  #global variable in jLib.pm
 my $fixlinks = $opts{f};
 my $runnightly = $opts{c};
@@ -202,23 +209,21 @@
 
 # Find or read-in list of duplicate pool entries
 if (!$opts{s}) { # Read in or find duplicate pool entries
-    if ($opts{i} || $opts{l}) { #Read in previously generated list of inodes (note link entriew will be ignored if they exist)
+    if ($opts{i} || $opts{l}) { #Read in and tabulate previously generated list of inodes from input file (note link entries will be ignored if they exist)
         read_inodHOA($file);
-        print_inodHOA() if $verbose;
+        print_inodHOA() if $notquiet;
     }
-    elsif (!$opts{s}){ # Find inodes
+    else{ # Find inodes by recursing through the pool
         find(\&pool_dups, $pooldir, $cpooldir);
     }
     print "Found $totdups dups (and $collisions true collisions) with $totlinks total links and $totsize size\n";
 }
 
 # Find backup files with broken/missing links or with links to duplicate pool entries
-if ($opts{l}) { # Read in previously generated list of inodes && start fixing links if -r flag set
+if ($opts{l}) { # Read in previously generated list of inodes & optionally start fixing links & duplicate pool entries if -f flag set
     read_LinkFile($file);
-    $totunlinked = $totnewlinks + $totnewfiles;
-    print "Found $totmatches matching files and $totunlinked unlinked files ($totnewfiles NewFiles, $totnewlinks NewLinks, $totmd5errs MD5Errors)\n";
 }
-else {
+else { #Find bad links in pc path and optionally fix together with duplicate pool nodes if -f flag set
     foreach my $backup (@backups) {
         $backup =~ m#^($pc/[^/]*/[^/]*)#;
         $cmprsslvl = get_bakinfo($1, "compress"); #Note this is set at the level of the backup number
@@ -226,9 +231,9 @@
         print "Finding links in $backup\n";
         find(\&find_BadOrMissingLinks, $backup);
     }
+}
     $totunlinked = $totnewlinks + $totnewfiles;
     print "Found $totmatches matching files and $totunlinked unlinked files ($totnewfiles NewFiles, $totnewlinks NewLinks, $totmd5errs MD5Errors)\n";
-}
 print "Fixed $totfixed out of $totbroken links\n" if $fixlinks;
 run_nightly() if (!$dryrun && $runnightly);
 print "DONE\n";
@@ -294,7 +299,7 @@
             $comparflg='#';
         }
         $inodHOA{$inoD} = [$parent, $dup, $thepool, $comparflg.$fbyteD.$fbyteP, --$nlinkD, $sizeD];
-        print "$inoD @{ $inodHOA{$inoD} }\n" if $verbose;
+        print "$inoD @{ $inodHOA{$inoD} }\n" if $notquiet;
 #       print "$inoD $parent $dup $thepool $comparflg, $nlinkD $sizeD\n";
         $totdups++;
         $totlinks += $nlinkD;
@@ -302,7 +307,7 @@
         return; #Earliest duplicate checksum (i.e. parent) in the chain found so stop going down chain
     }
     # No matching copies found in the chain
-    print "$inoD $dup COLLISION $thepool X $nlinkD $sizeD\n" if $verbose;
+    print "$inoD $dup COLLISION $thepool X $nlinkD $sizeD\n" if $notquiet;
     $collisions++;
 }
 
@@ -345,13 +350,14 @@
         }
         else {$fixed=" BROKEN$DRYRUN";}
     }
-    if ($verbose) {
+    if ($notquiet) {
         my $name = shift(@MatchA);
         print "\"" . $name . "\" " . join(" ", @MatchA) . "$fixed\n";
     }
 }
 
-# Return -1 if no match
+# Return -1 if no problem detected with link
+# Return -2 if can't stat file (shouldn't happen)
 # Return 0 if MD5Err - shouldn't happen
 # Return 1 if links to pool dup in %inodHoA
 # Return 2 if no links to pool but matching pool entry found (NewLink)
@@ -367,9 +373,18 @@
     unless (($devM, $inoM, $modeM, $nlinkM, $uidM, $gidM, $rdevM, $sizeM, $therestM) =
             stat($_)) {
         warnerr "Can't stat: $matchpath\n";
-        return;
+        return -2; #This really shouldn't happen!
+    }
+    if (exists $inodHOA{$inoM}) { #File links to dup pool element in our list
+        @MatchA = ($matchname, $inoM, @{$inodHOA{$inoM}});
+#       print "\"$matchname\" $inoM @{ $inodHOA{$inoM} }\n";
+        $totmatches++;
+        return 1; #type=1
     }
-    if ($nlinkM == 1 && $sizeM > 0) { # Non-zero file with no link to pool
+    elsif($sizeM == 0 || ($nlinkM > 1 && !$verifypc)){
+        return -1; #Zero length or single-linked file
+    }
+    else {
         my $matchbyte = firstbyte($matchpath);
         my $comparflg = 'x';  # Default if no link to pool
         my $matchtype = "NewFile"; # Default if no link to pool
@@ -384,11 +399,21 @@
         }
         my $thepool = ($cmprsslvl > 0 ? "cpool" : "pool");
         my $thepooldir = ($cmprsslvl > 0 ? $cpooldir : $pooldir);
-        my $md5sumpath = my $md5sumpathbase = $bpc->MD52Path($md5sum, 0, $thepooldir);
+        my $md5sumpathbase = $bpc->MD52Path($md5sum, 0, $thepooldir);
         my $i;
-        for ($i=-1; -f $md5sumpath ; $md5sumpath = $md5sumpathbase . '_' . ++$i) {
-            #Again start at the root, try to find best match in pool...
-            if ((my $cmpresult = compare_files ($matchpath, $md5sumpath, $cmprsslvl)) > 0) { #match found
+        if($verifypc) {
+            for ($i=-1, my $md5sumpath = $md5sumpathbase;
+                 -f $md5sumpath; $md5sumpath = $md5sumpathbase . '_' . ++$i) {
+                #Start at the root, looking for inode match in the pool...
+                return -1 if($inoM == (stat($md5sumpath))[1]);
+            }
+            #Otherwise, pc file not found in pool
+        }
+        # Now we know we have a pc file that doesn't link to the pool...
+        for ($i=-1, my $md5sumpath = $md5sumpathbase;
+             -f $md5sumpath; $md5sumpath = $md5sumpathbase . '_' . ++$i) {
+            #Again start at the root, try to find file content match in pool...
+            if ((my $cmpresult = compare_files ($matchpath, $md5sumpath, $cmprsslvl)) > 0) { #Exact file match found
                 my $inod =(stat($md5sumpath))[1]; #inode
                 if (exists $inodHOA{$inod}) { #Oops target set to be relinked
@@ -407,9 +432,9 @@
                 $totnewlinks++;
                 $rettype=2; #NewLink
                 goto match_return;
-            } #Otherwise, continue to move up the chain looking for a pool match...
+            } #Otherwise, continue up the chain looking for a pool match...
         }
-        $totnewfiles++; #Otherwise must be a NewFile
+        $totnewfiles++; #Otherwise must be a NewFile since not found in pool
         my $fullmd5sum = zFile2FullMD5($bpc, $md5, $matchpath, $cmprsslvl);
         ($md5sum .= '_' . $i) if $i >= 0; # Name of first empty pool slot
         if ($md5sumhash{$fullmd5sum}) { #Already seen before!
@@ -427,13 +452,6 @@
 #       print "\"$matchname\" $inoM $md5sum $matchtype $thepool ${comparflg}${matchbyte}${md5sumbyte} $nlinkM $sizeM\n";
         return $rettype;
     }
-    elsif (exists $inodHOA{$inoM}) { #File links to dup element in our list
-        @MatchA = ($matchname, $inoM, @{$inodHOA{$inoM}});
-#       print "\"$matchname\" $inoM @{ $inodHOA{$inoM} }\n";
-        $totmatches++;
-        return 1; #type=1
-    }
-    else { return -1;} #No dup or single-linked file
 }
 
 #Read in link file for matching pool md5sums(dups), NewFiles, NewLinks; don't read in MD5Err entries or other errors
@@ -458,7 +476,7 @@
         }
         my $name = shift(@MatchA);
         print "\"" . $name . "\" " . join(" ", @MatchA) . "$fixed\n"
-            if $matchtype >= 0 && $verbose;
+            if $matchtype >= 0 && $notquiet;
     }
 }
 
@@ -536,7 +554,7 @@
             warnerr "\"$matchname\" - link from \"$md5sum\" failed\n";
             return -1;
         }
-        print "\"$matchname\" successfully (re)linked from $matchtype [$inoM] to $md5sum [$inoP]" if $Verbose;
+        print "\"$matchname\" successfully (re)linked from $matchtype [$inoM] to $md5sum [$inoP]" if $verbose;
         return 1;
     }
     elsif ($type == 3 && $matchtype =~ m|^NewFile$|) { #New File
@@ -552,12 +570,12 @@
         }
         $md5sumpath =~ m|(.*)/|; # Find the containing directory
         jmkpath($1, 0, 0777) if (!-d $1);
-        print "\"$matchname\" - Making new pool directory $1\n" if ($Verbose && ! -d $1);
+        print "\"$matchname\" - Making new pool directory $1\n" if ($verbose && ! -d $1);
         if (!jlink($matchpath, $md5sumpath)){ # Note reverse order of link from types 1&2
             warnerr "\"$matchname\" - link to \"$md5sum\" failed\n";
             return -1;
         }
-        print "\"$matchname\" successfully linked to new file $md5sum [$inoM]" if $Verbose;
+        print "\"$matchname\" successfully linked to new file $md5sum [$inoM]" if $verbose;
         return 1;
     }
     else {
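
For anyone who wants to see the inode-first idea outside the patch,
here is a minimal standalone sketch. It is not code from
BackupPC_fixLinks: find_in_chain(), write_file() and the
'linked'/'content'/'none' labels are made up for illustration, the
"pool" is just a temp directory, and it compares raw bytes with
File::Compare, whereas the real script calls compare_files with the
backup's compression level. The point is only the ordering: one cheap
stat() per chain entry first, file contents read only if no inode
matches.

```perl
#!/usr/bin/perl
# Sketch only: the helper names and return labels are hypothetical and
# are not part of BackupPC_fixLinks.pl or jLib.pm.
use strict;
use warnings;
use File::Compare qw(compare);
use File::Temp qw(tempdir);

# Walk a pool chain: $base, ${base}_0, ${base}_1, ...
# Returns ('linked',  $path) if $file shares an inode with a chain entry,
#         ('content', $path) if the bytes match but it is not hard-linked,
#         ('none',    undef) otherwise.
sub find_in_chain {
    my ($file, $base) = @_;
    my ($dev, $ino) = (stat($file))[0, 1];
    return ('none', undef) unless defined $ino;

    # Collect the chain once so both passes see the same entries.
    my @chain;
    my $i = -1;
    for (my $path = $base; -f $path; $path = $base . '_' . ++$i) {
        push @chain, $path;
    }

    # Pass 1 (cheap): compare device and inode numbers only.
    for my $path (@chain) {
        my ($pdev, $pino) = (stat($path))[0, 1];
        return ('linked', $path) if $pdev == $dev && $pino == $ino;
    }

    # Pass 2 (expensive): read file contents only when pass 1 failed.
    for my $path (@chain) {
        return ('content', $path) if compare($file, $path) == 0;
    }
    return ('none', undef);
}

# Small helper to create test files.
sub write_file {
    my ($path, $data) = @_;
    open(my $fh, '>', $path) or die "open $path: $!";
    print {$fh} $data;
    close($fh) or die "close $path: $!";
}

# Build a toy chain with two entries and three "pc" files.
my $dir  = tempdir(CLEANUP => 1);
my $base = "$dir/abc123";
write_file($base,        "first");
write_file("${base}_0",  "second");
link("${base}_0", "$dir/pc_linked") or die "link failed: $!";
write_file("$dir/pc_copy", "first");   # same bytes as $base, not linked
write_file("$dir/pc_new",  "third");   # not in the chain at all

for my $name (qw(pc_linked pc_copy pc_new)) {
    my ($status) = find_in_chain("$dir/$name", $base);
    print "$name: $status\n";
}
```

In the patch itself the inode pass is only taken when -V is set, and a
hit there returns -1 (nothing to fix); the sketch returns a label in
both cases just so the two outcomes are easy to tell apart.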