On Sat, 26 Feb 2005 23:53:10 +0100, Patrick Useldinger <[EMAIL PROTECTED]> wrote:

> I've tested it intensively

"Famous Last Words" :-)

> Thanks for your feedback!

Here's some more:

(1) Manic s/w producing lots of files all the same size: the Borland
C[++] compiler produces a debug symbol file (.tds) that's always 384KB;
I have 144 of these on my HD, rarely more than 1 in the same directory.

Here's a snippet from a duplicate detection run:

DUP|393216|2|\devel\delimited\build\lib.win32-1.5\delimited.tds|\devel\delimited\build\lib.win32-2.1\delimited.tds
DUP|393216|2|\devel\delimited\build\lib.win32-2.3\delimited.tds|\devel\delimited\build\lib.win32-2.4\delimited.tds

(2) There appears to be a flaw in your logic such that it will find
duplicates only if they are in the *SAME* directory and only when there
are no other directories with two or more files of the same size. The
above duplicates were detected only when I made the following changes
to your script:

--- fdups       Sat Feb 26 06:41:36 2005
+++ fdups_jm.py Sun Feb 27 12:18:04 2005
@@ -29,13 +29,14 @@
         self.count = self.totalsize = self.inodecount = self.slinkcount = 0
         self.gain = self.bytescompared = self.bytesread = self.inodecount = 0
         for toplevel in args:
-            os.path.walk(toplevel, self.buildList, None)
+            os.path.walk(toplevel, self.updateDict, None)
         if self.count > 0:
             self.compare()

-    def buildList(self,arg,dirpath,namelist):
-        """ build a dictionnary of files to be analysed, indexed by length """
-        files = {}
+    def updateDict(self,arg,dirpath,namelist):
+        """ update a dictionary of files to be analysed, indexed by length """
+        # files = {}
+        files = self.compfiles
         for filepath in namelist:
             fullpath = os.path.join(dirpath,filepath)
             if os.path.isfile(fullpath):
@@ -51,20 +52,23 @@
                 if size >= MIN_FILESIZE:
                     self.count += 1
                     self.totalsize += size
+                    # is above totalling in the wrong place?
                     if size not in files:
                         files[size]=[fullpath]
                     else:
                         files[size].append(fullpath)
-        for size in files:
-            if len(files[size]) != 1:
-                self.compfiles[size]=files[size]
+        # for size in files:
+        #     if len(files[size]) != 1:
+        #         self.compfiles[size]=files[size]

     def compare(self):
         """ compare all files of the same size - outer loop """
         sizes=self.compfiles.keys()
         sizes.sort()
         for size in sizes:
-            self.comparefiles(size,self.compfiles[size])
+            list_of_filenames = self.compfiles[size]
+            if len(list_of_filenames) > 1:
+                self.comparefiles(size, list_of_filenames)

     def comparefiles(self,size,filelist):
         """ compare all files of the same size - inner loop """

(3) Your fdups-check gadget doesn't work on Windows; the commands
module works only on Unix but is supplied with Python on all platforms.
The results might just confuse a newbie:

(1, "'{' is not recognized as an internal or external command,\noperable program or batch file.")

Why not use the Python filecmp module?

Cheers,
John
-- 
http://mail.python.org/mailman/listinfo/python-list
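
P.S. For anyone following along, here is a minimal sketch of the two ideas in (2) and (3): keep one size-indexed dictionary for the whole run instead of one per directory, then confirm candidates with filecmp. It is not fdups's actual code. The names find_duplicates and min_size are made up for illustration, and it uses os.walk from current Python rather than the os.path.walk callback style the script uses.

```python
import filecmp
import os
from collections import defaultdict


def find_duplicates(toplevels, min_size=1):
    """Return groups of paths whose contents are byte-for-byte identical.

    Hypothetical helper for illustration only; not part of fdups.
    """
    # One dictionary shared across every directory visited, keyed by size,
    # so equal-sized files in different directories land in the same list.
    by_size = defaultdict(list)
    for toplevel in toplevels:
        for dirpath, _dirnames, filenames in os.walk(toplevel):
            for name in filenames:
                fullpath = os.path.join(dirpath, name)
                # Skip symlinks and anything that is not a regular file.
                if os.path.islink(fullpath) or not os.path.isfile(fullpath):
                    continue
                size = os.path.getsize(fullpath)
                if size >= min_size:
                    by_size[size].append(fullpath)

    duplicates = []
    for size in sorted(by_size):
        candidates = by_size[size]
        # Only sizes with two or more files can contain duplicates.
        while len(candidates) > 1:
            first, rest = candidates[0], candidates[1:]
            # shallow=False forces a content comparison instead of
            # trusting the os.stat() signatures alone.
            same = [p for p in rest if filecmp.cmp(first, p, shallow=False)]
            if same:
                duplicates.append([first] + same)
            candidates = [p for p in rest if p not in same]
    return duplicates
```

Calling find_duplicates([r"\devel"]) would return a list of lists of paths, one inner list per set of identical files. Because filecmp is pure Python and ships with the standard library, this approach should behave the same on Windows and Unix, unlike shelling out through the commands module.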