Re: find_longest_match in SequenceMatcher

2006-07-25 Thread koara
Hello again John -- your hack/fix seems to work. Thanks a lot, now
let's hope timbot will indeed be here shortly with a proper fix =)

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: find_longest_match in SequenceMatcher

2006-07-24 Thread koara
John Machin wrote:
 --test results snip---
 Looks to me like the problem has nothing at all to do with the length
 of the searched strings, but a bug appeared in 2.3.  What version(s)
 were you using? Can you reproduce your results (500  499 giving
 different answers) with the same version?

Hello John, thank you for investigating and responding!

Yes, I can reproduce the behaviour with different results within the
same version -- which is 2.4.3 (#69, Mar 29 2006, 17:35:34) [MSC v.1310
32 bit (Intel)]

The catch is to remove the last character, as i described in my
original post, as opposed to passing reduced length parameters to
find_longest_match, which is what you did.

It is morning now, but i still fail to see the mistake i am making --
if it is indeed a bug, where do i report it? 

Cheers!

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: find_longest_match in SequenceMatcher

2006-07-24 Thread John Machin
koara wrote:
 John Machin wrote:
  --test results snip---
  Looks to me like the problem has nothing at all to do with the length
  of the searched strings, but a bug appeared in 2.3.  What version(s)
  were you using? Can you reproduce your results (500  499 giving
  different answers) with the same version?

 Hello John, thank you for investigating and responding!

 Yes, I can reproduce the behaviour with different results within the
 same version -- which is 2.4.3 (#69, Mar 29 2006, 17:35:34) [MSC v.1310
 32 bit (Intel)]

 The catch is to remove the last character, as i described in my
 original post, as opposed to passing reduced length parameters to
 find_longest_match, which is what you did.

 It is morning now, but i still fail to see the mistake i am making --
 if it is indeed a bug, where do i report it?


Further sniffing shows that we were both partially right/wrong:
There is a bug.
It is length-dependant.
It was introduced in Python 2.3.

If you want to, you can hack your copy of
Python-2.4_installation_directory/Lib/difflib.py
so that line 316 reads:
if 0:
instead of:
if n = 200 and len(indices) * 100  n:

I'll report it.

Cheers,
John

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: find_longest_match in SequenceMatcher

2006-07-24 Thread John Machin

John Machin wrote:
 koara wrote:
  John Machin wrote:
 There is a bug.
 I'll report it.

Reported.
http://sourceforge.net/tracker/index.php?func=detailaid=1528074group_id=5470atid=105470

-- 
http://mail.python.org/mailman/listinfo/python-list


find_longest_match in SequenceMatcher

2006-07-23 Thread koara
Hello, it might be too late or too hot, but i cannot work out this
behaviour of find_longest_match() in difflib.SequenceMatcher:

string1:
releasenotesforwildmagicversion01thiscdromcontainstheinitialreleaseofthesourcecodethataccompaniesthebook3dgameenginedesign:apracticalapproachtorealtimecomputergraphicsthereareanumberofknownissuesaboutthecodeastheseissuesareaddressedtheupdatedcodewillbeavailableatthewebsitehttp://wwwmagicsoftwarecom/[EMAIL
 PROTECTED]

string2:
releasenotesforwildmagicversion02updatefromversion01toversion02ifyourcopyofthebookhasversion01andifyoudownloadedversion02fromthewebsitethenapplythefollowingdirectionsforinstallingtheupdateforalinuxinstallationseethesectionattheendofthisdocumentupdatedirectionsassumingthatthetopleveldirectoryiscalledmagicreplacebyyourtoplevelnameyoushouldhavetheversion01contentsinthislocation1deletethecontentsofmagic\include2deletethesubdirectorymagic\source\mgcapplication3deletetheobsoletefiles:amagic\source\mgc

find_longest_match(0,500,0,500)=(24,43,10)=version01t

What? O_o Clearly there is a longer match, right at the beginning!
And then, after removal of the last character from each string (i found
the limit of 500 by trial and error -- and it looks suspiciously
rounded):

find_longest_match(0,499,0,499)=(0,0,32)=releasenotesforwildmagicversion0


Is this the expected behaviour? What's going on?
Thank you for any ideas

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: find_longest_match in SequenceMatcher

2006-07-23 Thread John Machin

koara wrote:
 Hello, it might be too late or too hot, but i cannot work out this
 behaviour of find_longest_match() in difflib.SequenceMatcher:

 string1:
[snipped 500-byte string]

 string2:

[snipped 500-byte string]

 find_longest_match(0,500,0,500)=(24,43,10)=version01t

 What? O_o Clearly there is a longer match, right at the beginning!
 And then, after removal of the last character from each string (i found
 the limit of 500 by trial and error -- and it looks suspiciously
 rounded):

What limit? (a) My results [see below] (b) my inspection of the Python
version 2.4 source for the difflib module (c) what I know of the author
-- all tend to indicate that there is no hidden undocumented length
limit.


 find_longest_match(0,499,0,499)=(0,0,32)=releasenotesforwildmagicversion0


 Is this the expected behaviour? What's going on?

My code: (koara.py)
8---
strg1 =
rreleasenotesforwildmagicversion01thiscdromcontainstheinitialreleaseofthesourcecodethataccompaniesthebook3dgameenginedesign:apracticalapproachtorealtimecomputergraphicsthereareanumberofknownissuesaboutthecodeastheseissuesareaddressedtheupdatedcodewillbeavailableatthewebsitehttp://wwwmagicsoftwarecom/[EMAIL
 PROTECTED]
strg2 =
rreleasenotesforwildmagicversion02updatefromversion01toversion02ifyourcopyofthebookhasversion01andifyoudownloadedversion02fromthewebsitethenapplythefollowingdirectionsforinstallingtheupdateforalinuxinstallationseethesectionattheendofthisdocumentupdatedirectionsassumingthatthetopleveldirectoryiscalledmagicreplacebyyourtoplevelnameyoushouldhavetheversion01contentsinthislocation1deletethecontentsofmagic\include2deletethesubdirectorymagic\source\mgcapplication3deletetheobsoletefiles:amagic\source\mgc
import sys
print sys.version
from difflib import SequenceMatcher as SM
smo = SM(None, strg1, strg2)
print len(strg1), len(strg2)
print smo.find_longest_match(0, 500, 0, 500)
print smo.find_longest_match(0, 499, 0, 499)
print smo.find_longest_match(0, 100, 0, 100)
print smo.find_longest_match(1, 101, 1, 101)
print smo.find_longest_match(2, 102, 2, 102)
8---

The results on 4 python versions:

C:\junkc:\python24\python koara.py
2.4.3 (#69, Mar 29 2006, 17:35:34) [MSC v.1310 32 bit (Intel)]
500 500
(24, 43, 10)
(24, 43, 10)
(24, 43, 10)
(24, 43, 10)
(24, 43, 10)

C:\junkc:\python23\python koara.py
2.3.5 (#62, Feb  8 2005, 16:23:02) [MSC v.1200 32 bit (Intel)]
500 500
(24, 43, 10)
(24, 43, 10)
(24, 43, 10)
(24, 43, 10)
(24, 43, 10)

C:\junkc:\python22\python koara.py
2.2.3 (#42, May 30 2003, 18:12:08) [MSC 32 bit (Intel)]
500 500
(0, 0, 32)
(0, 0, 32)
(0, 0, 32)
(1, 1, 31)
(2, 2, 30)

C:\junkc:\python21\python koara.py
2.1.3 (#35, Apr  8 2002, 17:47:50) [MSC 32 bit (Intel)]
500 500
(0, 0, 32)
(0, 0, 32)
(0, 0, 32)
(1, 1, 31)
(2, 2, 30)

Looks to me like the problem has nothing at all to do with the length
of the searched strings, but a bug appeared in 2.3.  What version(s)
were you using? Can you reproduce your results (500  499 giving
different answers) with the same version?

Anyway, as they say in the classics, Take a number; the timbot will be
with you shortly.

Cheers,
John

-- 
http://mail.python.org/mailman/listinfo/python-list