Re: Slight discrepancy with filecmp.cmp
Hi All--

John Machin wrote:
> On Sun, 17 Apr 2005 22:06:04 -0600, Ivan Van Laningham [EMAIL PROTECTED] wrote:
> [snip]
> > So I wrote a set of programs to both index the disk versions with the cd versions, and to compare, using filecmp.cmp(), the cd and disk version. Works fine. Turned up several dozen files that had been inadvertently rotated or saved with the wrong quality, various fat-fingered mistakes like that. However, it didn't flag the files that I know have bitrot. I seem to remember that diff uses a checksum algorithm on binary files, not a byte-by-byte comparison. Am I wrong?
>
> According to the docs:
>
> cmp( f1, f2[, shallow[, use_statcache]])
>     Compare the files named f1 and f2, returning True if they seem equal, False otherwise. Unless shallow is given and is false, files with identical os.stat() signatures are taken to be equal
>
> and what is an os.stat() signature, you ask? So did I.
>
> According to the code itself:
>
> def _sig(st):
>     return (stat.S_IFMT(st.st_mode), st.st_size, st.st_mtime)
>
> Looks like it assumes two files are the same if they are of the same type, same size, and same time-last-modified. Normally I guess that's good enough, but maybe the phantom bit-toggler is bypassing the file system somehow. What OS are you running?

WinXP, SP2

> You might like to do two things: (1) run your comparison again with shallow=False (2) submit a patch to the docs.

You know, I read that doc, tried it, and it made absolutely no difference. Then I read your message, read the docs again, and finally realized I had flipped the sense of shallow in my head. Sheesh. So then I tried it with shallow=False, not True, and it runs about ten times slower, but it works. Beautifully. Now I have to go back and redo the first five thousand, but it's worth it. Thanks.

Shows how much you need another set of eyeballs to debug your brain;-)

> (-: You have of course attempted to eliminate other variables by checking that the bit-rot effect is apparent using different display software, a different computer, an observer who's not on the same medication as you, ... haven't you? :-)

;-) Absolutely. Several different viewers and several different OSs. And my wife never sees anything the way I do;-)

Metta,
Ivan
--
Ivan Van Laningham
God N Locomotive Works
http://www.andi-holmes.com/
http://www.foretec.com/python/workshops/1998-11/proceedings.html
Army Signal Corps: Cu Chi, Class of '70
Author: Teach Yourself Python in 24 Hours
--
http://mail.python.org/mailman/listinfo/python-list
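For anyone repeating the disk-versus-CD check described above, here is a minimal sketch of a byte-for-byte sweep with shallow=False. It is not Ivan's program: the function name find_mismatches is hypothetical, and it assumes the disk copies sit at the same relative paths as the CD originals, which may not match the index his own scripts built.

```python
import filecmp
import os


def find_mismatches(cd_root, disk_root):
    """Return relative paths whose disk copy is missing or differs from the CD copy.

    Hypothetical helper: assumes both trees use the same relative layout.
    """
    mismatches = []
    for dirpath, _dirnames, filenames in os.walk(cd_root):
        for name in filenames:
            cd_path = os.path.join(dirpath, name)
            rel = os.path.relpath(cd_path, cd_root)
            disk_path = os.path.join(disk_root, rel)
            if not os.path.isfile(disk_path):
                mismatches.append(rel)  # no disk copy at all
            elif not filecmp.cmp(cd_path, disk_path, shallow=False):
                mismatches.append(rel)  # contents differ byte-for-byte
    return sorted(mismatches)
```

Passing shallow=False makes filecmp.cmp read and compare the file contents whenever the sizes match, which is why the run is much slower than the default stat-based check.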
Re: Slight discrepancy with filecmp.cmp
On Mon, 18 Apr 2005 09:02:44 -0600, Ivan Van Laningham [EMAIL PROTECTED] wrote:

> ...
> Shows how much you need another set of eyeballs to debug your brain;-)

+1 QOTW

> ...
> And my wife never sees anything the way I do;-)

There's probably a rude joke in there somewhere about your wife's eyes debugging your brain, but since I would like to remain married, I will not make it. :-/

Regards,
Dan
--
Dan Sommers
http://www.tombstonezero.net/dan/
c = 1
Slight discrepancy with filecmp.cmp
Hi All--

I noticed recently that a few of the jpgs from my digital cameras have developed bitrot. Not a real problem, because the cameras are CD Mavicas, and I can simply copy the original from the CD. Except for the fact that I've got nearly 25,000 images to check.

So I wrote a set of programs to both index the disk versions with the CD versions, and to compare, using filecmp.cmp(), the CD and disk version. Works fine. Turned up several dozen files that had been inadvertently rotated or saved with the wrong quality, various fat-fingered mistakes like that.

However, it didn't flag the files that I know have bitrot. I seem to remember that diff uses a checksum algorithm on binary files, not a byte-by-byte comparison. Am I wrong? If I am, what then is the source of the problem in my jpg images where it looks like a bit or two has been shifted or added; suddenly, there's a line going through the picture above which it's normal, and below it either the color has changed (usually to pinkish) or the remaining raster lines are all shifted either right or left?

Any ideas?

Metta,
Ivan
--
Ivan Van Laningham
God N Locomotive Works
http://www.andi-holmes.com/
http://www.foretec.com/python/workshops/1998-11/proceedings.html
Army Signal Corps: Cu Chi, Class of '70
Author: Teach Yourself Python in 24 Hours
Re: Slight discrepancy with filecmp.cmp
On Sun, 17 Apr 2005 22:06:04 -0600, Ivan Van Laningham [EMAIL PROTECTED] wrote:

[snip]
> So I wrote a set of programs to both index the disk versions with the cd versions, and to compare, using filecmp.cmp(), the cd and disk version. Works fine. Turned up several dozen files that had been inadvertently rotated or saved with the wrong quality, various fat-fingered mistakes like that. However, it didn't flag the files that I know have bitrot. I seem to remember that diff uses a checksum algorithm on binary files, not a byte-by-byte comparison. Am I wrong?

According to the docs:

cmp( f1, f2[, shallow[, use_statcache]])
    Compare the files named f1 and f2, returning True if they seem equal, False otherwise. Unless shallow is given and is false, files with identical os.stat() signatures are taken to be equal

and what is an os.stat() signature, you ask? So did I.

According to the code itself:

def _sig(st):
    return (stat.S_IFMT(st.st_mode), st.st_size, st.st_mtime)

Looks like it assumes two files are the same if they are of the same type, same size, and same time-last-modified. Normally I guess that's good enough, but maybe the phantom bit-toggler is bypassing the file system somehow. What OS are you running?

You might like to do two things: (1) run your comparison again with shallow=False (2) submit a patch to the docs.

(-: You have of course attempted to eliminate other variables by checking that the bit-rot effect is apparent using different display software, a different computer, an observer who's not on the same medication as you, ... haven't you? :-)

HTH,
John
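To make the stat-signature point concrete, here is a small sketch that builds two files with the same type, size, and modification time but one differing byte, then compares them both ways. It assumes a current Python 3, where the signature is filecmp.cmp(f1, f2, shallow=True) and the use_statcache argument quoted above no longer exists; the helper name make_pair and the file names are made up for illustration.

```python
import filecmp
import os
import tempfile


def make_pair(dirname):
    """Create two same-size files that differ in one byte but share an mtime."""
    a = os.path.join(dirname, "a.bin")
    b = os.path.join(dirname, "b.bin")
    with open(a, "wb") as f:
        f.write(b"\x00" * 1024)
    with open(b, "wb") as f:
        f.write(b"\x00" * 1023 + b"\x01")  # last byte differs
    # Force identical access/modification times so the stat signatures match.
    os.utime(a, (1_000_000_000, 1_000_000_000))
    os.utime(b, (1_000_000_000, 1_000_000_000))
    return a, b


with tempfile.TemporaryDirectory() as tmp:
    a, b = make_pair(tmp)
    filecmp.clear_cache()
    shallow_result = filecmp.cmp(a, b)               # default: shallow=True
    deep_result = filecmp.cmp(a, b, shallow=False)   # compares the bytes
    print(shallow_result, deep_result)               # True False
```

The default call reports the files as equal because only the stat signature is consulted; with shallow=False the contents are read and the difference shows up.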