New submission from Springem Springsbee <mral...@gmail.com>:

Hello,

I'm using difflib's SequenceMatcher to locate common substrings. It seems that the matcher is missing a common substring. I'm guessing this is a rather low-level issue in difflib. The autojunk parameter has no effect, for obvious reasons.

Alternate pairwise comparisons between the following 3 strings omit the 2-character match 'AC':

GATTACA
TAGACCA
ATACA

The following GitHub gist captures the issue, which I'll repeat here for redundancy:

https://gist.github.com/MatthewRalston/b0ab6ac1dbe322cb12063310ccdbb786

>>> import difflib
>>> string1 = "TAGACCA"
>>> string2 = "ATACA"
>>> s = difflib.SequenceMatcher(None, string1, string2)
>>> blox = s.get_matching_blocks()
>>> print(blox)
[Match(a=0, b=1, size=2), Match(a=5, b=3, size=2), Match(a=7, b=5, size=0)]
# Missing Match(a=3, b=2, size=2)
>>> print([string1[x.a:x.a+x.size] for x in blox if x.size > 1])
['TA', 'CA']
# Missing the substring 'AC'

----------
components: Library (Lib)
messages: 328593
nosy: Springem Springsbee, terry.reedy
priority: normal
severity: normal
status: open
title: difflib.SequenceMatcher.get_matching_blocks omits common strings
type: behavior
versions: Python 2.7, Python 3.4, Python 3.5, Python 3.6, Python 3.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue35079>
_______________________________________
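
As a cross-check on the report above, here is a minimal sketch that lists every maximal common substring of the two strings by brute force and compares the result with what get_matching_blocks() returns. The helper names common_substrings and overlaps_in_b are hypothetical and are not part of difflib. The sketch only observes that the unreported 'AC' block overlaps the reported blocks in string2. Whether that overlap is the reason for the omission is an assumption here, based on the difflib documentation describing the returned triples as monotonically increasing in both sequences.

```python
import difflib


def common_substrings(a, b, min_size=2):
    """Brute-force all maximal runs (i, j, size) with a[i:i+size] == b[j:j+size].

    Hypothetical helper for illustration; not part of difflib.
    """
    found = []
    for i in range(len(a)):
        for j in range(len(b)):
            # Skip positions that only continue a run that started earlier.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            size = 0
            while (i + size < len(a) and j + size < len(b)
                   and a[i + size] == b[j + size]):
                size += 1
            if size >= min_size:
                found.append((i, j, size))
    return found


def overlaps_in_b(x, y):
    """True if blocks x and y, given as (i, j, size), share an index in b."""
    return x[1] < y[1] + y[2] and y[1] < x[1] + x[2]


string1 = "TAGACCA"
string2 = "ATACA"

matcher = difflib.SequenceMatcher(None, string1, string2)
# Drop the zero-size sentinel block that always ends the list.
reported = [tuple(m) for m in matcher.get_matching_blocks() if m.size]
everything = common_substrings(string1, string2)
missing = [blk for blk in everything if blk not in reported]

for blk in missing:
    i, j, size = blk
    clashes = [r for r in reported if overlaps_in_b(blk, r)]
    print(string1[i:i + size], blk, "overlaps in string2 with", clashes)
```

If the goal is to enumerate all common substrings rather than to obtain a single alignment, a brute-force scan like this (or a suffix-based index for longer inputs) may be a better fit than get_matching_blocks().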