New submission from Martin <ma...@dtu.dk>:

difflib.SequenceMatcher fails to align two sequences of equal length that 
differ by only 3 single-letter substitutions. It reports a similarity ratio 
of 0.16 instead of the expected value of about 0.99.

Here is a snippet to replicate the failure:
>>> from difflib import SequenceMatcher
>>> aa_ref = 'MTLFTTLLVLIFERLFKLGEHWQLDHRLEAFFRRVKHFSLGRTLGMTIIAMGVTFLLLRALQGVLFNVPTLLVWLLIGLLCIGAGKVRLHYHAYLTAASRNDSHARATMAGELTMIHGVPAGCDEREYLRELQNALLWINFRFYLAPLFWLIVGGTWGPVTLMGYAFLRAWQYWLARYQTPHHRLQSGIDAVLHVLDWVPVRLAGVVYALIGHGEKALPAWFASLGDFHTSQYQVLTRLAQFSLAREPHVDKVETPKAAVSMAKKTSFVVVVVIALLTIYGALV'
>>> aa_seq = 'MTLFTTLLVLIFERLFKLGEHWQLDHRLEAFFRRVKHFSLGRTLCMTIIAMGVTFLLLRALQGVLFNVPTLLVWLLIGLLCIGAGKVRLHYHAYLTAASRNDSHAHATMAGELTMIHGVPAGCDEREYLRELQNALLWINFRFYLAPLFWLIVGGTWGPVTLMGYAFLRAWQYWLARYQTPHHRLQSGIDAVLHALDWVPVRLAGVVYALIGHGEKALPAWFASLGDFHTSQYQVLTRLAQFSLAREPHVDKVETPKAAVSMAKKTSFVVVVVIALLTIYGALV'
>>> sum(a!=b for a, b in zip(aa_ref, aa_seq))
3
>>> match = SequenceMatcher(a=aa_ref, b=aa_seq)
>>> match.ratio()
0.1619718309859155
>>> match.get_opcodes()
[('equal', 0, 43, 0, 43), ('delete', 43, 79, 43, 43),
 ('equal', 79, 81, 43, 45), ('replace', 81, 122, 45, 80),
 ('equal', 122, 123, 80, 81), ('replace', 123, 284, 81, 284)]
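
One possible explanation, not confirmed for this report, is the documented 
"autojunk" heuristic: when the second sequence has at least 200 items, 
SequenceMatcher treats any item that makes up more than 1% of it as 
"popular" and ignores it when looking for matches. With a 20-letter alphabet 
and 284 letters, nearly every letter would qualify. The sketch below uses 
hypothetical synthetic data (a shuffled 300-letter string with three 
substitutions at arbitrarily chosen positions), not the sequences above, to 
compare the default behaviour with autojunk=False.

```python
import random
from difflib import SequenceMatcher

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # 20 letters, as in a protein sequence

# Hypothetical test data: 300 letters, each letter occurring 15 times,
# so every letter is far above the 1% "popular" threshold.
rng = random.Random(0)
letters = list(ALPHABET * 15)
rng.shuffle(letters)
ref = "".join(letters)

# Substitute three positions, each with a different letter.
mutated = list(ref)
for pos in (75, 150, 225):
    old = mutated[pos]
    mutated[pos] = ALPHABET[(ALPHABET.index(old) + 1) % len(ALPHABET)]
seq = "".join(mutated)

mismatches = sum(x != y for x, y in zip(ref, seq))

# Default: autojunk=True, the popularity heuristic is active.
default_ratio = SequenceMatcher(a=ref, b=seq).ratio()

# Heuristic disabled: every letter takes part in matching.
exact_ratio = SequenceMatcher(a=ref, b=seq, autojunk=False).ratio()

print("mismatches:", mismatches)
print("default ratio:", default_ratio)
print("autojunk=False ratio:", exact_ratio)
```

If the same effect is at work here, constructing the matcher as 
SequenceMatcher(a=aa_ref, b=aa_seq, autojunk=False) should give a ratio 
close to the expected one.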

----------
messages: 314163
nosy: mcft
priority: normal
severity: normal
status: open
title: SequenceMatcher bug
type: behavior
versions: Python 3.5

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue33112>
_______________________________________