New submission from Simon Descarpentries <silt...@acoeuro.com>:

Hi, this is my first post here. I'm a French computer-science engineer with 10
years of experience, and a manager at the Acoeuro.com SSLL company. I suggested
better regexp integration on python-ideas a few months ago, without managing to
convince anyone to get it done.

Although issue https://bugs.python.org/issue25391 was closed in 2010, nothing
on https://docs.python.org/3/library/difflib.html seems to help users
anticipate that two strings matching almost perfectly at 199 characters will
not match at all at 200. This inconsistent behavior looks like a bug.

#!/usr/bin/env python3

from difflib import SequenceMatcher

a = 'ab' * 400
b = 'ba' * 400

# Compare growing prefixes of a and b; print ten "length ratio" pairs per line.
for i in range(1, 400):
    diff_ratio = SequenceMatcher(None, a=a[:i], b=b[:i]).ratio()
    print('%3d %.2f' % (i, diff_ratio), end=' ')
    if i % 10 == 0:
        print()

EOF

At 199 characters I get a ratio of 0.99, and at 200 characters 0.00. The
results are nearly the same with strings built from emails, as in
https://github.com/Siltaar/drop_alternatives, especially when comparing strings
like:

"performantsetdvelopperducontenusimilairepourvosprochainstweets.suivezn...@twitterbusinesspourdautresinfosetactus.testerlespublicitstwitterbusiness.twitter.com|@TwitterBusiness|Aide|SedsinscrireLemailsadresse@gggTwitter,Inc.MarketStreet,SuiteSanFrancisco,CA"

"rducontenusimilairepourvosprochainsTweets.Suiveznous@TwitterBusinesspourprofiterdautresinfosetactus.TesterlesPublicitésTwitterbusiness.twitter.com@TwitterBusinessAideSedésinscrireTwitterInternationalCompanyOneCumberlandPlace,FenianStreetDublin,DAXIRELAND"

Fortunately, I didn't experience the problem when using quick_ratio().
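
To make the comparison concrete, here is a minimal sketch that calls the three
methods on the same 'ab' / 'ba' prefixes as the script above. The helper name
three_ratios is only illustrative and is not part of difflib.

```python
from difflib import SequenceMatcher


def three_ratios(a, b):
    """Return (ratio, quick_ratio, real_quick_ratio) for two strings.

    Illustrative helper, not part of difflib.
    """
    sm = SequenceMatcher(None, a, b)
    return sm.ratio(), sm.quick_ratio(), sm.real_quick_ratio()


a = 'ab' * 400
b = 'ba' * 400

# Look at the two prefix lengths on either side of the observed drop.
for n in (199, 200):
    r, q, rq = three_ratios(a[:n], b[:n])
    print('%3d ratio=%.2f quick_ratio=%.2f real_quick_ratio=%.2f'
          % (n, r, q, rq))
```

On these inputs only ratio() should fall to 0.00 at 200 characters; the two
quick variants are documented as upper bounds on ratio() and stay high.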

The documentation is not clear about the differences between ratio(),
quick_ratio() and real_quick_ratio(), and the results look unstable. Combined
with this inconsistent behavior, it seems worth fixing.
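
A minimal sketch of the threshold, assuming the drop comes from the autojunk
heuristic of SequenceMatcher (the autojunk constructor argument, True by
default, which only takes effect once the second sequence holds at least 200
items). The helper name ratio_at is only illustrative.

```python
from difflib import SequenceMatcher

a = 'ab' * 400
b = 'ba' * 400


def ratio_at(n, autojunk):
    """ratio() of the first n characters of a and b (illustrative helper)."""
    return SequenceMatcher(None, a[:n], b[:n], autojunk=autojunk).ratio()


# Compare the default behavior with the heuristic switched off.
for n in (199, 200, 400):
    print('%3d autojunk=True: %.2f  autojunk=False: %.2f'
          % (n, ratio_at(n, True), ratio_at(n, False)))
```

With autojunk=False the ratio should stay close to 1 on both sides of the
200-character mark, which suggests the heuristic, rather than the matching
itself, is what produces the 0.00.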

----------
components: Library (Lib)
messages: 305153
nosy: Siltaar
priority: normal
severity: normal
status: open
title: difflib SequenceMatcher ratio() still have unpredictable behavior
type: behavior
versions: Python 2.7, Python 3.5, Python 3.6

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue31889>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
