Re: fdups: calling for beta testers
John Machin wrote:

>> I've tested it intensively
>
> Famous Last Words :-)

;-)

> (1) Manic s/w producing lots of files all the same size: the Borland
> C[++] compiler produces a debug symbol file (.tds) that's always
> 384KB; I have 144 of these on my HD, rarely more than 1 in the same
> directory.

Not sure what you want me to do about it. I've decreased the minimum block size once more, to accommodate more files of the same length without increasing the total amount of memory used.

> (2) There appears to be a flaw in your logic such that it will find
> duplicates only if they are in the *SAME* directory and only when
> there are no other directories with two or more files of the same
> size.

Ooops... A really stupid mistake on my side. Corrected.

> (3) Your fdups-check gadget doesn't work on Windows; the commands
> module works only on Unix but is supplied with Python on all
> platforms. The results might just confuse a newbie:
> Why not use the Python filecmp module?

Done. It's also faster AND it works better. Thanks for the suggestion.

Please fetch the new version from http://www.homepages.lu/pu/fdups.html.

-pu
--
http://mail.python.org/mailman/listinfo/python-list
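A minimal sketch of the kind of check the filecmp suggestion refers to. The helper name same_content is hypothetical and this is not taken from fdups itself; it only shows the standard-library call, where shallow=False asks for a comparison of the file contents instead of the os.stat() signatures alone.

```python
import filecmp
import os


def same_content(path_a, path_b):
    """Return True if both files hold identical bytes (hypothetical helper)."""
    # With shallow=False, filecmp.cmp() reads and compares the contents
    # rather than trusting matching os.stat() signatures.
    return filecmp.cmp(path_a, path_b, shallow=False)
```

Unlike the commands module, filecmp does not shell out to an external program, which is why it behaves the same on Windows and Unix.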
Re: fdups: calling for beta testers
Patrick Useldinger wrote:

>> (9) Any good reason why the executables don't have .py extensions
>> on their names?
>
> (9) Because I am lazy and Linux doesn't care. I suppose Windows does?

Unfortunately, yes. Windows has nothing like the x permission bit, so you have to have an actual extension on the filename, and Windows (XP anyway) will check it against the list of extensions in the PATHEXT environment variable to determine if it should be treated like an executable. Otherwise you must type python and the full filename.

-Peter
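The PATHEXT lookup described above can be approximated in a few lines. This is only an illustration: the function name runs_by_name and the fallback extension list are assumptions, and the real decision is made by the Windows shell, not by Python.

```python
import os


def runs_by_name(filename, pathext=None):
    """Guess whether `filename` has an extension listed in PATHEXT.

    `pathext` defaults to the PATHEXT environment variable; the fallback
    string below is an assumed value for systems where it is not set.
    """
    if pathext is None:
        pathext = os.environ.get("PATHEXT", ".COM;.EXE;.BAT;.CMD")
    # PATHEXT is a semicolon-separated list; matching is case-insensitive.
    extensions = [ext.lower() for ext in pathext.split(";") if ext]
    _, ext = os.path.splitext(filename)
    return ext.lower() in extensions
```

A script named fdups has no extension, so it never matches; fdups.py matches only if .PY is present in the list.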
Re: fdups: calling for beta testers
Peter Hansen wrote:

> Patrick Useldinger wrote:
>
>>> (9) Any good reason why the executables don't have .py extensions
>>> on their names?
>>
>> (9) Because I am lazy and Linux doesn't care. I suppose Windows does?
>
> Unfortunately, yes. Windows has nothing like the x permission bit,
> so you have to have an actual extension on the filename, and Windows
> (XP anyway) will check it against the list of extensions in the
> PATHEXT environment variable to determine if it should be treated
> like an executable. Otherwise you must type python and the full
> filename.

Or use exemaker, which IMHO is the best way to handle this problem.

Serge.
Re: fdups: calling for beta testers
Serge Orlov wrote:

> Or use exemaker, which IMHO is the best way to handle this problem.

Looks good, but I do not use Windows.

-pu
Re: fdups: calling for beta testers
On Sat, 26 Feb 2005 23:53:10 +0100, Patrick Useldinger [EMAIL PROTECTED] wrote:

> I've tested it intensively

Famous Last Words :-)

> Thanks for your feedback!

Here's some more:

(1) Manic s/w producing lots of files all the same size: the Borland C[++] compiler produces a debug symbol file (.tds) that's always 384KB; I have 144 of these on my HD, rarely more than 1 in the same directory.

Here's a snippet from a duplicate detection run:

DUP|393216|2|\devel\delimited\build\lib.win32-1.5\delimited.tds|\devel\delimited\build\lib.win32-2.1\delimited.tds
DUP|393216|2|\devel\delimited\build\lib.win32-2.3\delimited.tds|\devel\delimited\build\lib.win32-2.4\delimited.tds

(2) There appears to be a flaw in your logic such that it will find duplicates only if they are in the *SAME* directory and only when there are no other directories with two or more files of the same size. The above duplicates were detected only when I made the following changes to your script:

--- fdups       Sat Feb 26 06:41:36 2005
+++ fdups_jm.py Sun Feb 27 12:18:04 2005
@@ -29,13 +29,14 @@
         self.count = self.totalsize = self.inodecount = self.slinkcount = 0
         self.gain = self.bytescompared = self.bytesread = self.inodecount = 0
         for toplevel in args:
-            os.path.walk(toplevel, self.buildList, None)
+            os.path.walk(toplevel, self.updateDict, None)
         if self.count > 0:
             self.compare()

-    def buildList(self,arg,dirpath,namelist):
-        """ build a dictionnary of files to be analysed, indexed by length """
-        files = {}
+    def updateDict(self,arg,dirpath,namelist):
+        """ update a dictionary of files to be analysed, indexed by length """
+#        files = {}
+        files = self.compfiles
         for filepath in namelist:
             fullpath = os.path.join(dirpath,filepath)
             if os.path.isfile(fullpath):
@@ -51,20 +52,23 @@
                 if size >= MIN_FILESIZE:
                     self.count += 1
                     self.totalsize += size
+# is above totalling in the wrong place?
                     if size not in files:
                         files[size]=[fullpath]
                     else:
                         files[size].append(fullpath)
-        for size in files:
-            if len(files[size]) != 1:
-                self.compfiles[size]=files[size]
+#        for size in files:
+#            if len(files[size]) != 1:
+#                self.compfiles[size]=files[size]

     def compare(self):
         """ compare all files of the same size - outer loop """
         sizes=self.compfiles.keys()
         sizes.sort()
         for size in sizes:
-            self.comparefiles(size,self.compfiles[size])
+            list_of_filenames = self.compfiles[size]
+            if len(list_of_filenames) > 1:
+                self.comparefiles(size, list_of_filenames)

     def comparefiles(self,size,filelist):
         """ compare all files of the same size - inner loop """

(3) Your fdups-check gadget doesn't work on Windows; the commands module works only on Unix but is supplied with Python on all platforms. The results might just confuse a newbie:

(1, '{' is not recognized as an internal or external command,\noperable program or batch file.)

Why not use the Python filecmp module?

Cheers,
John
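The point of the patch is that the size index has to be shared across every directory visited, and the "more than one file" filter can only be applied once the whole walk is finished. A minimal standalone sketch of that idea follows; it is not fdups' code, the function name group_by_size and the min_size parameter are hypothetical, and it uses os.walk rather than the os.path.walk callback style shown in the diff.

```python
import os
from collections import defaultdict


def group_by_size(toplevels, min_size=1):
    """Map file size -> list of paths, across all directories (sketch)."""
    by_size = defaultdict(list)  # one dict shared by every directory
    for top in toplevels:
        for dirpath, _dirnames, filenames in os.walk(top):
            for name in filenames:
                path = os.path.join(dirpath, name)
                # Skip symbolic links and anything that is not a regular file.
                if os.path.islink(path) or not os.path.isfile(path):
                    continue
                size = os.path.getsize(path)
                if size >= min_size:
                    by_size[size].append(path)
    # Filter only after the walk is complete: a size with a single
    # file cannot contain duplicates.
    return {size: paths for size, paths in by_size.items() if len(paths) > 1}
```

Files of equal size in different directories, such as the .tds files in the DUP lines above, end up in the same list and become candidates for comparison.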
fdups: calling for beta testers
Hi all,

I am looking for beta-testers for fdups.

fdups is a program to detect duplicate files on locally mounted filesystems. Files are considered equal if their content is identical, regardless of their filename. Also, fdups ignores symbolic links and is able to detect and ignore hardlinks, where available.

In contrast to similar programs, fdups does not rely on md5 sums or other hash functions to detect potentially identical files. Instead, it does a direct blockwise comparison and stops reading as soon as possible, thus reducing the file reads to a minimum.

fdups has been developed on Linux but should run on all platforms that support Python.

fdups' homepage is at http://www.homepages.lu/pu/fdups.html, where you'll also find a link to download the tar.

I am primarily interested in feedback on whether it produces correct results. But as I haven't been programming in Python for a year or so, I'd also be interested in comments on the code if you happen to look at it in detail.

Your help is much appreciated.

-pu
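A minimal sketch of a blockwise comparison with early exit, for two files only. It illustrates the general technique described above and is not fdups' implementation; the function name identical and the block size of 8192 bytes are assumptions.

```python
import os

BLOCKSIZE = 8192  # assumed value, chosen only for illustration


def identical(path_a, path_b, blocksize=BLOCKSIZE):
    """Compare two files block by block, stopping at the first difference."""
    # Files of different length can never be identical: no reads needed.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        while True:
            block_a = file_a.read(blocksize)
            block_b = file_b.read(blocksize)
            if block_a != block_b:
                return False  # early exit: the rest is never read
            if not block_a:
                return True   # both files exhausted without a mismatch
```

A hash-based approach has to read each candidate file to the end to compute its digest; here, two large files that differ in their first block cost one block read each.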
Re: fdups: calling for beta testers
Patrick Useldinger wrote:

> fdups' homepage is at http://www.homepages.lu/pu/fdups.html, where
> you'll also find a link to download the tar.
>
> fdups has no installation program. Just change into a temporary
> directory, and type tar xfj fdups.tar.bz. You should also chown the
> files according to your needs, and then copy the executables to your
> PATH.

(1) It's actually .bz2, not .bz

(2) Why annoy people with the not-widely-known bzip2 format just to save a few % of a 12KB file??

(3) Typing that on Windows command line doesn't produce a useful result

(4) Haven't you heard of distutils?

(5)
    if files[subgroup[j]]['flag'] and files[subgroup[i]]['buffer'] == files[subgroup[j]]['buffer']:

That's not the most readable code I've ever seen.

(6) You are keeping open handles for all files of a given size -- have you actually considered the possibility of an exception like this:

    IOError: [Errno 24] Too many open files: 'foo509'

Once upon a time, max 20 open files was considered as generous as 640KB of memory. Looks like Bill thinks 512 (open files, that is) is about right these days.

(7)
!    def compare(self):
!        """ compare all files of the same size - outer loop """
!        sizes=self.compfiles.keys()
!        sizes.sort()
!        for size in sizes:
!            self.comparefiles(size,self.compfiles[size])

Why sort? What's wrong with just two lines:

!        for size, file_list in self.compfiles.iteritems():
!            self.comparefiles(size, file_list)

(8)
    global MIN_FILESIZE,MAX_ONEBUFFER,MAX_ALLBUFFERS,BLOCKSIZE,INODES

That doesn't sit very well with the 'everything must be in a class' religion seemingly espoused by the following:

! class fDups:
!     """ encapsulates the whole logic """

(9) Any good reason why the executables don't have .py extensions on their names?

All in all, a very poor out-of-the-box experience. Bear in mind that very few Windows users would have even heard of bzip2, let alone have a bzip2.exe on their machine. They wouldn't even be able to *open* the box. And what is chown -- any relation of Perl's chomp?
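One possible way around the open-handle limit raised in point (6) is to open each file only for the duration of a single block read. The sketch below is an illustration under that assumption, not what fdups does; read_block, split_by_content and the default block size are hypothetical.

```python
def read_block(path, offset, blocksize):
    """Open the file, read one block at `offset`, and close it again."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        return handle.read(blocksize)


def split_by_content(paths, size, blocksize=65536):
    """Partition same-size files into groups of identical content.

    At most one file is open at any time, so the number of candidates
    is not bounded by the per-process limit on open files.
    """
    groups = [list(paths)] if len(paths) > 1 else []
    offset = 0
    while offset < size and groups:
        next_groups = []
        for group in groups:
            by_block = {}
            for path in group:
                block = read_block(path, offset, blocksize)
                by_block.setdefault(block, []).append(path)
            # A file whose block matches no other file is unique: drop it.
            next_groups.extend(g for g in by_block.values() if len(g) > 1)
        groups = next_groups
        offset += blocksize
    return groups
```

The trade-off is an extra open, seek and close per block, so a larger block size than in a keep-everything-open design is probably sensible.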