Re: Slight discrepancy with filecmp.cmp

2005-04-18 Thread Ivan Van Laningham
Hi All--

John Machin wrote:
 
 On Sun, 17 Apr 2005 22:06:04 -0600, Ivan Van Laningham
 [EMAIL PROTECTED] wrote:
 [snip]
  So I wrote a set of
 programs to both index the disk versions with the cd versions, and to
 compare, using filecmp.cmp(), the cd and disk version.  Works fine.
 Turned up several dozen files that had been inadvertently rotated or
 saved with the wrong quality, various fat-fingered mistakes like that.
 
 However, it didn't flag the files that I know have bitrot.  I seem to
 remember that diff uses a checksum algorithm on binary files, not a
 byte-by-byte comparison.  Am I wrong?
 
 According to the docs:
 
 
 cmp( f1, f2[, shallow[, use_statcache]])
 
 Compare the files named f1 and f2, returning True if they seem equal,
 False otherwise.
 Unless shallow is given and is false, files with identical os.stat()
 signatures are taken to be equal
 
 
 and what is an os.stat() signature, you ask? So did I.
 
 According to the code itself:
 
 def _sig(st):
     return (stat.S_IFMT(st.st_mode),
             st.st_size,
             st.st_mtime)
 
 Looks like it assumes two files are the same if they are of the same
 type, same size, and same time-last-modified. Normally I guess that's
 good enough, but maybe the phantom bit-toggler is bypassing the file
 system somehow. What OS are you running?
 

WinXP, SP2

 You might like to do two things: (1) run your comparison again with
 shallow=False (2) submit a patch to the docs.
 

You know, I read that doc, tried it, and it made absolutely no
difference.  Then I read your message, read the docs again, and finally
realized I had flipped the sense of shallow in my head.  Sheesh.  So
then I tried it with shallow=False, not True, and it runs about ten
times slower, but it works.  Beautifully.
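
For anyone following along, here is a minimal sketch of the difference,
assuming a current Python 3 where filecmp.cmp() takes only f1, f2 and
shallow. The file names and the make_pair() helper are made up for
illustration; the two files get the same size and the same
modification time but differ in their last byte.

```python
import filecmp
import os
import tempfile


def make_pair(directory):
    """Create two same-sized files that differ in one byte but share an mtime.

    Hypothetical helper: it only simulates a file whose content changed
    without the size or timestamp changing.
    """
    good = os.path.join(directory, "good.jpg")
    rotted = os.path.join(directory, "rotted.jpg")
    with open(good, "wb") as f:
        f.write(b"\xff\xd8" + b"\x00" * 100)
    with open(rotted, "wb") as f:
        f.write(b"\xff\xd8" + b"\x00" * 99 + b"\x01")  # last byte differs
    # Copy the timestamps so both files end up with the same os.stat() signature.
    st = os.stat(good)
    os.utime(rotted, ns=(st.st_atime_ns, st.st_mtime_ns))
    return good, rotted


with tempfile.TemporaryDirectory() as tmp:
    good, rotted = make_pair(tmp)
    # shallow defaults to True: matching type, size and mtime count as "equal".
    shallow_result = filecmp.cmp(good, rotted)
    # shallow=False forces the file contents to be read and compared.
    deep_result = filecmp.cmp(good, rotted, shallow=False)
    print("shallow:", shallow_result)
    print("deep:   ", deep_result)
```

The shallow call reports the files as equal and the deep call reports
them as different, which matches what happened with the images.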

Now I have to go back and redo the first five thousand, but it's worth
it.  Thanks.  Shows how much you need another set of eyeballs to debug
your brain;-)
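
For redoing a large batch, one possible approach (a sketch, not the
programs described in this thread) is filecmp.cmpfiles(), which
compares a list of file names across two directories and also accepts
shallow=False. The find_damaged() function and the directory and file
names below are hypothetical.

```python
import filecmp
import os
import tempfile


def find_damaged(cd_dir, disk_dir):
    """Return (mismatched, errors) for names present in both directories.

    shallow=False makes cmpfiles() compare file contents instead of
    trusting the os.stat() signature.
    """
    common = sorted(set(os.listdir(cd_dir)) & set(os.listdir(disk_dir)))
    _match, mismatch, errors = filecmp.cmpfiles(
        cd_dir, disk_dir, common, shallow=False
    )
    return mismatch, errors


with tempfile.TemporaryDirectory() as tmp:
    cd_dir = os.path.join(tmp, "cd")
    disk_dir = os.path.join(tmp, "disk")
    os.mkdir(cd_dir)
    os.mkdir(disk_dir)
    # a.jpg is identical in both places; b.jpg differs by one byte on disk.
    samples = {"a.jpg": (b"same", b"same"), "b.jpg": (b"good", b"goo!")}
    for name, (cd_bytes, disk_bytes) in samples.items():
        with open(os.path.join(cd_dir, name), "wb") as f:
            f.write(cd_bytes)
        with open(os.path.join(disk_dir, name), "wb") as f:
            f.write(disk_bytes)
    mismatch, errors = find_damaged(cd_dir, disk_dir)
    print("differ:", mismatch)
    print("errors:", errors)
```

Names that could not be compared at all end up in the errors list, so
it is worth checking that list as well as the mismatches.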

 (-:
 You have of course attempted to eliminate other variables by checking
 that the bit-rot effect is apparent using different display software,
 a different computer, an observer who's not on the same medication as
 you, ... haven't you?
 :-)
 

;-)  Absolutely.  Several different viewers and several different OSs. 
And my wife never sees anything the way I do;-)

Metta,
Ivan
--
Ivan Van Laningham
God N Locomotive Works
http://www.andi-holmes.com/
http://www.foretec.com/python/workshops/1998-11/proceedings.html
Army Signal Corps:  Cu Chi, Class of '70
Author:  Teach Yourself Python in 24 Hours
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Slight discrepancy with filecmp.cmp

2005-04-18 Thread Dan Sommers
On Mon, 18 Apr 2005 09:02:44 -0600,
Ivan Van Laningham [EMAIL PROTECTED] wrote:

 ... Shows how much you need another set of eyeballs to debug your
 brain;-)

+1 QOTW

 ... And my wife never sees anything the way I do;-)

There's probably a rude joke in there somewhere about your wife's eyes
debugging your brain, but since I would like to remain married, I will
not make it.  :-/

Regards,
Dan

-- 
Dan Sommers
http://www.tombstonezero.net/dan/
c = 1


Slight discrepancy with filecmp.cmp

2005-04-17 Thread Ivan Van Laningham
Hi All--
I noticed recently that a few of the jpgs from my digital cameras have
developed bitrot.  Not a real problem, because the cameras are CD
Mavicas, and I can simply copy the original from the cd.  Except for the
fact that I've got nearly 25,000 images to check.  So I wrote a set of
programs to both index the disk versions with the cd versions, and to
compare, using filecmp.cmp(), the cd and disk version.  Works fine. 
Turned up several dozen files that had been inadvertently rotated or
saved with the wrong quality, various fat-fingered mistakes like that.

However, it didn't flag the files that I know have bitrot.  I seem to
remember that diff uses a checksum algorithm on binary files, not a
byte-by-byte comparison.  Am I wrong?  If I am, what then is the source
of the problem in my jpg images where it looks like a bit or two has
been shifted or added; suddenly, there's a line going through the
picture above which it's normal, and below it either the color has
changed (usually to pinkish) or the remaining raster lines are all
shifted either right or left?

Any ideas?

Metta,
Ivan
--
Ivan Van Laningham
God N Locomotive Works
http://www.andi-holmes.com/
http://www.foretec.com/python/workshops/1998-11/proceedings.html
Army Signal Corps:  Cu Chi, Class of '70
Author:  Teach Yourself Python in 24 Hours


Re: Slight discrepancy with filecmp.cmp

2005-04-17 Thread John Machin
On Sun, 17 Apr 2005 22:06:04 -0600, Ivan Van Laningham
[EMAIL PROTECTED] wrote:
[snip]
 So I wrote a set of
programs to both index the disk versions with the cd versions, and to
compare, using filecmp.cmp(), the cd and disk version.  Works fine. 
Turned up several dozen files that had been inadvertently rotated or
saved with the wrong quality, various fat-fingered mistakes like that.

However, it didn't flag the files that I know have bitrot.  I seem to
remember that diff uses a checksum algorithm on binary files, not a
byte-by-byte comparison.  Am I wrong?  

According to the docs:


cmp( f1, f2[, shallow[, use_statcache]]) 

Compare the files named f1 and f2, returning True if they seem equal,
False otherwise. 
Unless shallow is given and is false, files with identical os.stat()
signatures are taken to be equal


and what is an os.stat() signature, you ask? So did I.

According to the code itself:

def _sig(st):
    return (stat.S_IFMT(st.st_mode),
            st.st_size,
            st.st_mtime)

Looks like it assumes two files are the same if they are of the same
type, same size, and same time-last-modified. Normally I guess that's
good enough, but maybe the phantom bit-toggler is bypassing the file
system somehow. What OS are you running?
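
To make that concrete, here is a small sketch that rebuilds the same
three-field signature and shows it matching for two files whose bytes
differ. stat_signature() is a stand-in written for this example (it
does not call the private helper in filecmp), and the file names are
invented.

```python
import os
import stat
import tempfile


def stat_signature(path):
    """Return (file type, size, mtime), mirroring the _sig() helper quoted above."""
    st = os.stat(path)
    return (stat.S_IFMT(st.st_mode), st.st_size, st.st_mtime)


with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "original.bin")
    damaged = os.path.join(tmp, "damaged.bin")
    with open(original, "wb") as f:
        f.write(b"ABCDEFGH")
    with open(damaged, "wb") as f:
        f.write(b"ABCDEFGX")  # same length, last byte changed
    # Give the damaged copy the same timestamps as the original.
    st = os.stat(original)
    os.utime(damaged, ns=(st.st_atime_ns, st.st_mtime_ns))

    same_signature = stat_signature(original) == stat_signature(damaged)
    with open(original, "rb") as f1, open(damaged, "rb") as f2:
        same_bytes = f1.read() == f2.read()
    print("signatures equal:", same_signature)
    print("contents equal:  ", same_bytes)
```

The signatures compare equal while the contents do not, so a
comparison that stops at the signature cannot see this kind of damage.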

You might like to do two things: (1) run your comparison again with
shallow=False (2) submit a patch to the docs.

(-:
You have of course attempted to eliminate other variables by checking
that the bit-rot effect is apparent using different display software,
a different computer, an observer who's not on the same medication as
you, ... haven't you?
:-)


HTH,
John
