[issue46667] SequenceMatcher & autojunk - false negative

Jonathan Mon, 07 Feb 2022 07:02:08 -0800


Jonathan <bugrepo...@lightpear.com> added the comment:


I still don't get how UNIQUESTRING is the longest match even with 
autojunk=True, but that's an implementation detail and I'll trust you that 
it's working as expected.

Given this, I'd suggest the following:

* `autojunk=False` should be the default unless there's some reason to believe 
SequenceMatcher is mostly used for code comparisons.

* If - for whatever reason - the default can't be changed, I'd suggest a nice 
big docs "Warning" (at a minimum a "Note") saying something like "The default 
autojunk=True is not suitable for normal string comparison. See autojunk for 
more information".

* Add a human-friendly explanation of autojunk to the docs. The current 
explanation is only going to be helpful to the tiny fraction of users who 
understand the algorithm. Your explanation is a good start:
        "Autojunk was introduced as a way to greatly speed comparing files of 
        code, viewing them as sequences of lines. But it more often backfires 
        when comparing strings (viewed as sequences of characters)"

Put simply: The current docs aren't helpful to users who don't have 
text-matching expertise, nor do they emphasise the huge caveat that 
autojunk=True carries.
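
To make the caveat concrete, here is a minimal sketch of the kind of false 
negative I mean. The strings are made up for illustration (they are not the 
ones from my original report), and the comments reflect my reading of the 
documented heuristic rather than the implementation itself:

```python
from difflib import SequenceMatcher

# Hypothetical inputs: two 301-character strings that differ only in
# their first character.
a = "X" + "ab" * 150
b = "Y" + "ab" * 150

with_autojunk = SequenceMatcher(None, a, b)  # autojunk=True is the default
without_autojunk = SequenceMatcher(None, a, b, autojunk=False)

# Once the second sequence has 200 or more items, autojunk treats any item
# that makes up more than about 1% of it as "popular" and stops using it
# to start matches.
print(sorted(with_autojunk.bpopular))      # ['a', 'b']
print(with_autojunk.ratio())               # 0.0
print(round(without_autojunk.ratio(), 3))  # 0.997
```

Two strings that share 300 of their 301 characters are reported as having 
nothing in common under the default, which is the behaviour a docs warning 
should spell out.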

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue46667>
_______________________________________
