Robin Lee Powell wrote at about 17:02:06 -0800 on Monday, December 6, 2010:
 > On Mon, Dec 06, 2010 at 01:17:43PM -0800, Robin Lee Powell wrote:
 > > 
 > > So, yeah.  More than one link, matches something in the pool, but
 > > not actually linked to it.  Isn't that *awesome*?  ;'(
 > > 
 > > I very much want BackupPC_fixLinks to deal with this, and I'm
 > > trying to modify it to do that now.
 > 
 > Seems to be working; here's the diff.  Feel free to drop the print
 > statements.  :)
 > 
 > For all I know this will eat your dog; I have no idea what else I
 > broke.  I *do* know that it should be a flag, because I expect that
 > checksumming *everything* takes a very, very long time.

I looked through your code... it is certainly a quick-and-dirty patch,
and it may even work for your purposes, but...
1. It needlessly does a lot of file comparisons rather than inode
   number comparisons, so it can be sped up considerably.
2. It mishandles some use cases by always going down the first branch
   of the if statement for any non-zero-length file...

So, I redid the logic to be significantly faster: it first compares inode
numbers rather than md5sums to verify chain matches, and only when that
fails does it actually look at the file contents to find a potential
match. I also preserved the original logic so it still works when
links=1 or file size=0 or when you are not interested in verifying
the pc hierarchy links. I also added a new -V (verify) flag to turn
this option on and off.
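
A minimal sketch of the inode-first check described above, for illustration only. It assumes a pool chain named base, base_0, base_1, ... as in the patch; the helper name linked_into_chain, the temporary directory, and the file names are all made up for the demo and are not part of BackupPC_fixLinks.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper (not part of BackupPC_fixLinks): return 1 if $file
# shares an inode with any entry in the chain $base, ${base}_0, ${base}_1, ...
# and 0 otherwise. No file contents are read.
sub linked_into_chain {
    my ($file, $base) = @_;
    my @st = stat($file) or return 0;    # can't stat: treat as not linked
    my $ino = $st[1];
    my $i = -1;
    for (my $path = $base; -f $path; $path = $base . '_' . ++$i) {
        return 1 if $ino == (stat($path))[1];
    }
    return 0;
}

# Demo on a throwaway directory standing in for a pool chain.
my $dir  = tempdir(CLEANUP => 1);
my $base = "$dir/0123abcd";              # made-up pool file name
for my $path ($base, "${base}_0") {
    open(my $fh, '>', $path) or die "Can't write $path: $!";
    print $fh "contents of $path\n";
    close($fh);
}
link("${base}_0", "$dir/linked") or die "link failed: $!";
open(my $fh, '>', "$dir/orphan") or die "Can't write orphan: $!";
print $fh "contents of ${base}_0\n";     # same bytes, different inode
close($fh);

printf "%s: %d\n", $_, linked_into_chain("$dir/$_", $base) for qw(linked orphan);
```

The point of the ordering is that each chain entry costs one stat() call, so a content comparison is only needed for files that fail this check.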

At the same time I did some scattered minor code cleanup. Looking
back, this code is a bit amateurish, since it was one of the first real
Perl programs I ever wrote. I don't have the time for a
thorough rewrite, but I cleaned up a little and improved some of the
comments and documentation.

Note that the file comparison part of the code (which now really only
matters when a good fraction of the total cpool entries are dups
or pc entries are missing) could probably be sped up by almost a factor
of 2 if, instead of using md5sums, you calculated the md4sum and compared
it to the md4sum checksums that are appended to each cpool file (note
this only occurs with rsync and, I think, only the second time the file
is backed up). In general, though, I think the bad files are typically a
small fraction of the entire pool or pc tree, so it is probably
not worth the effort to decode and test the md4sums.
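
For reference, a rough sketch of the slower content-comparison step that the inode check is meant to avoid. This is not the script's compare_files or zFile2FullMD5 (both of which take a compression level); it hashes the raw bytes on disk with the core Digest::MD5 module, so it would only be meaningful for uncompressed files. The helper names file_md5 and same_content are hypothetical.

```perl
use strict;
use warnings;
use Digest::MD5;

# Hypothetical helper: hex MD5 digest of a file's raw bytes on disk.
sub file_md5 {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!";
    binmode($fh);
    my $hex = Digest::MD5->new->addfile($fh)->hexdigest;
    close($fh);
    return $hex;
}

# Return 1 if the two files have identical size and MD5 digest, else 0.
sub same_content {
    my ($x, $y) = @_;
    my $size_x = (stat($x))[7];
    my $size_y = (stat($y))[7];
    return 0 unless defined $size_x && defined $size_y;  # a file is missing
    return 0 unless $size_x == $size_y;                  # cheap check before hashing
    return file_md5($x) eq file_md5($y) ? 1 : 0;
}
```

Every call reads both files in full, which is why this step dominates the run time when many files have to be checked by content.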

Anyway here is the diff. I have not had time to check it much beyond
verifying that it seems to run -- SO I WOULD TRULY APPRECIATE IT IF
YOU CONTINUE TO TEST IT AND GIVE ME FEEDBACK. Also, it would be great
if you would let me know approximately what speedup you achieved with
this code vs. your original.

Thanks

---------------------------------------------------------------------------
--- BackupPC_fixLinks.pl        2009-12-22 07:50:24.291625432 -0500
+++ BackupPC_fixLinks.pl.test   2010-12-08 23:48:43.845678288 -0500
@@ -11,7 +11,7 @@
 #   Jeff Kosowsky
 #
 # COPYRIGHT
-#   Copyright (C) 2008, 2009  Jeff Kosowsky
+#   Copyright (C) 2008, 2009, 2010  Jeff Kosowsky
 #
 #   This program is free software; you can redistribute it and/or modify
 #   it under the terms of the GNU General Public License as published by
@@ -29,12 +29,12 @@
 #
 #========================================================================
 #
-# Version 0.2, released Aug 2009
+# Version 0.3, released December 2010
 #
 #========================================================================
 
 use strict;
-#use warnings;
+use warnings;
 use File::Path;
 use File::Find;
 #use File::Compare;
@@ -53,7 +53,7 @@
 %Conf   = $bpc->Conf(); #Global variable defined in jLib.pm (do not use 'my')
 
 my %opts;
-if ( !getopts("i:l:fb:dsqvch", \%opts) || @ARGV > 0 || $opts{h} ||
+if ( !getopts("i:l:fb:Vdsqvch", \%opts) || @ARGV > 0 || $opts{h} ||
         ($opts{i} && $opts{l})) {
     print STDERR <<EOF;
 usage: $0 [options]
@@ -68,23 +68,30 @@
   sure there are no holes in the pool (although this shouldn''t happen...)
 
   Options:
-    -i <inode file>  Read innodes from file and proceed with 2nd pc tree pass
-    -l <link file>   Read links from file and proceed with final repair pass
+
+    -i <inode file>  Read pool dups from file and proceed with 2nd pc tree pass
+    -l <link file>   Read pool dups & bad pc links from file and proceed
+                     with final repair pass
+                     NOTE: -i and -l options are mutually exclusive. 
+    -s               Skip first pass of generating (or tabulating if
+                     -i or -l options are set) cpool dups
     -f               Fix links
     -c               Clean up pool - schedule BackupPC_nightly to run 
                      (requires server running)
-    -s               Skip first pass of generating/reading cpool dups
     -b <path>        Search backups from <path> (relative to TopDir/pc)
+    -V               Verify links of all files in pc path (WARNING: slow!)
     -d               Dry-run
     -q               Quiet - only print summaries & results
     -v               Verbose - print details on each relink
     -h               Print this usage message
+
 EOF
 exit(1);
 }
 my $file = ($opts{i} ? $opts{i} : $opts{l});
-my $verbose =!$opts{q};
-my $Verbose=$opts{v};
+my $verifypc=$opts{V};
+my $notquiet =!$opts{q};
+my $verbose=$opts{v};
 $dryrun = $opts{d}; #global variable in jLib.pm
 my $fixlinks = $opts{f};
 my $runnightly = $opts{c};
@@ -202,23 +209,21 @@
 
 # Find or read-in list of duplicate pool entries
 if (!$opts{s}) {  # Read in or find duplicate pool entries
-       if ($opts{i} || $opts{l}) { #Read in previously generated list of inodes (note link entriew will be ignored if they exist)
+       if ($opts{i} || $opts{l}) { #Read in and tabulate previously generated list of inodes from input file (note link entries will be ignored if they exist)
                read_inodHOA($file);
-               print_inodHOA() if $verbose;
+               print_inodHOA() if $notquiet;
        }
-       elsif (!$opts{s}){ # Find inodes
+       else{ # Find inodes by recursing through the pool
                find(\&pool_dups, $pooldir, $cpooldir); 
        }
        print "Found $totdups dups (and $collisions true collisions) with $totlinks total links and $totsize size\n";
 }
 
 # Find backup files with broken/missing links or with links to duplicate pool entries
-if ($opts{l}) { # Read in previously generated list of inodes && start fixing links if -r flag set
+if ($opts{l}) { # Read in previously generated list of inodes & optionally start fixing links & duplicate pool entries if -f flag set
        read_LinkFile($file);
-       $totunlinked = $totnewlinks + $totnewfiles;
-       print "Found $totmatches matching files and $totunlinked unlinked files ($totnewfiles NewFiles, $totnewlinks NewLinks, $totmd5errs MD5Errors)\n";
 }
-else {
+else { #Find bad links in pc path and optionally fix together with duplicate pool nodes if -f flag set
        foreach my $backup (@backups) {
                $backup =~ m#^($pc/[^/]*/[^/]*)#;
                $cmprsslvl = get_bakinfo($1, "compress"); #Note this is set at the level of the backup number
@@ -226,9 +231,9 @@
                print "Finding links in $backup\n";
                find(\&find_BadOrMissingLinks, $backup);
        }
+}
        $totunlinked = $totnewlinks + $totnewfiles;
        print "Found $totmatches matching files and $totunlinked unlinked files ($totnewfiles NewFiles, $totnewlinks NewLinks, $totmd5errs MD5Errors)\n";
-}
 print "Fixed $totfixed out of $totbroken links\n" if $fixlinks;
 run_nightly() if (!$dryrun && $runnightly);
 print "DONE\n";
@@ -294,7 +299,7 @@
                        $comparflg='#';
                }
                $inodHOA{$inoD} = [$parent, $dup, $thepool, $comparflg.$fbyteD.$fbyteP, --$nlinkD, $sizeD];
-               print "$inoD @{ $inodHOA{$inoD} }\n" if $verbose;
+               print "$inoD @{ $inodHOA{$inoD} }\n" if $notquiet;
 #              print "$inoD $parent $dup $thepool $comparflg, $nlinkD $sizeD\n";
                $totdups++;
                $totlinks += $nlinkD;
@@ -302,7 +307,7 @@
                return;  #Earliest duplicate checksum (i.e. parent) in the chain found so stop going down chain
        }
        # No matching copies found in the chain
-       print "$inoD $dup COLLISION $thepool X $nlinkD $sizeD\n" if $verbose;
+       print "$inoD $dup COLLISION $thepool X $nlinkD $sizeD\n" if $notquiet;
        $collisions++;
 }
 
@@ -345,13 +350,14 @@
                }
                else {$fixed=" BROKEN$DRYRUN";}
        }
-       if ($verbose) {
+       if ($notquiet) {
                my $name = shift(@MatchA);
                print "\"" . $name . "\" " . join(" ", @MatchA) . "$fixed\n";
        }
 }
 
-# Return -1 if no match
+# Return -1 if no problem detected with link
+# Return -2 if can't stat file (shouldn't happen)
 # Return 0 if MD5Err - shouldn't happen
 # Return 1 if links to pool dup in %inodHoA
 # Return 2 if no links to pool but matching pool entry found (NewLink)
@@ -367,9 +373,18 @@
        unless (($devM, $inoM, $modeM, $nlinkM, $uidM, $gidM, $rdevM, $sizeM, $therestM)
                        = stat($_)) {
                warnerr "Can't stat: $matchpath\n";
-               return;
+               return -2; #This really shouldn't happen!
+       }
+       if (exists $inodHOA{$inoM}) { #File links to dup pool element in our list
+               @MatchA = ($matchname, $inoM, @{$inodHOA{$inoM}});
+#              print "\"$matchname\" $inoM @{ $inodHOA{$inoM} }\n";
+               $totmatches++;
+               return 1;  #type=1
        }
-       if ($nlinkM == 1 && $sizeM > 0) { # Non-zero file with no link to pool
+       elsif($sizeM == 0 || ($nlinkM > 1 && !$verifypc)){
+               return -1; #Zero length or single-linked file
+       }
+       else {
                my $matchbyte = firstbyte($matchpath);
                my $comparflg = 'x';  # Default if no link to pool
                my $matchtype = "NewFile"; # Default if no link to pool
@@ -384,11 +399,21 @@
                }
                my $thepool = ($cmprsslvl > 0 ? "cpool" : "pool");
                my $thepooldir = ($cmprsslvl > 0 ? $cpooldir : $pooldir);
-               my $md5sumpath = my $md5sumpathbase = $bpc->MD52Path($md5sum, 0, $thepooldir);
+               my $md5sumpathbase = $bpc->MD52Path($md5sum, 0, $thepooldir);
                my $i;
-               for ($i=-1; -f $md5sumpath ; $md5sumpath = $md5sumpathbase . '_' . ++$i) {
-            #Again start at the root, try to find best match in pool...
-                       if ((my $cmpresult  = compare_files ($matchpath, $md5sumpath, $cmprsslvl)) > 0) { #match found
+               if($verifypc) {
+                       for ($i=-1, my $md5sumpath = $md5sumpathbase; 
+                                -f $md5sumpath; $md5sumpath = $md5sumpathbase . '_' . ++$i) {
+                               #Start at the root, looking for inode match in the pool...
+                               return -1 if($inoM ==  (stat($md5sumpath))[1]);
+                       }
+                       #Otherwise, pc file not found in pool
+               }
+               # Now we know we have a pc file that doesn't link to the pool...
+               for ($i=-1, my $md5sumpath = $md5sumpathbase; 
+                        -f $md5sumpath; $md5sumpath = $md5sumpathbase . '_' . ++$i) {
+            #Again start at the root, try to find file content match in pool...
+                       if ((my $cmpresult = compare_files ($matchpath, $md5sumpath, $cmprsslvl)) > 0) { #Exact file match found
 
                                my $inod =(stat($md5sumpath))[1]; #inode
                                if (exists $inodHOA{$inod}) { #Oops target set to be relinked
@@ -407,9 +432,9 @@
                                $totnewlinks++;
                                $rettype=2; #NewLink
                                goto match_return;
-                       } #Otherwise, continue to move up the chain looking for a pool match...
+                       } #Otherwise, continue up the chain looking for a pool match...
                }
-               $totnewfiles++; #Otherwise must be a NewFile
+               $totnewfiles++; #Otherwise must be a NewFile since not found in pool
                my $fullmd5sum = zFile2FullMD5($bpc, $md5, $matchpath, $cmprsslvl);
                ($md5sum .= '_' . $i) if $i >= 0;  # Name of first empty pool slot
                if ($md5sumhash{$fullmd5sum}) {   #Already seen before!
@@ -427,13 +452,6 @@
 #              print "\"$matchname\" $inoM $md5sum $matchtype $thepool ${comparflg}${matchbyte}${md5sumbyte} $nlinkM $sizeM\n";
                return $rettype;
        }
-       elsif (exists $inodHOA{$inoM}) { #File links to dup element in our list
-               @MatchA = ($matchname, $inoM, @{$inodHOA{$inoM}});
-#              print "\"$matchname\" $inoM @{ $inodHOA{$inoM} }\n";
-               $totmatches++;
-               return 1;  #type=1
-       }
-       else { return -1;} #No dup or single-linked file
 }
 
 #Read in link file for matching pool md5sums(dups), NewFiles, NewLinks; don't read in MD5Err entries or other errors
@@ -458,7 +476,7 @@
                }
                my $name = shift(@MatchA);
                print "\"" . $name . "\" " . join(" ", @MatchA) . "$fixed\n" 
-                       if $matchtype >= 0 && $verbose;
+                       if $matchtype >= 0 && $notquiet;
        }
 }
 
@@ -536,7 +554,7 @@
                        warnerr "\"$matchname\" - link from \"$md5sum\" failed\n";
                        return -1;
                        }
-               print "\"$matchname\" successfully (re)linked from $matchtype [$inoM] to $md5sum [$inoP]" if $Verbose;
+               print "\"$matchname\" successfully (re)linked from $matchtype [$inoM] to $md5sum [$inoP]" if $verbose;
                return 1;
        }
        elsif ($type == 3 && $matchtype =~ m|^NewFile$|) {  #New File
@@ -552,12 +570,12 @@
                }
                $md5sumpath =~ m|(.*)/|;  # Find the containing directory
                jmkpath($1, 0, 0777) if (!-d $1);
-               print "\"$matchname\" - Making new pool directory $1\n" if ($Verbose && ! -d $1);
+               print "\"$matchname\" - Making new pool directory $1\n" if ($verbose && ! -d $1);
            if (!jlink($matchpath, $md5sumpath)){ # Note reverse order of link from types 1&2
                        warnerr "\"$matchname\" - link to \"$md5sum\" failed\n";
                        return -1;
                }
-               print "\"$matchname\" successfully linked to new file $md5sum [$inoM]" if $Verbose;
+               print "\"$matchname\" successfully linked to new file $md5sum [$inoM]" if $verbose;
                return 1;
        }
        else {

_______________________________________________
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/