Tim Peters <t...@python.org> added the comment:
difflib generally synchs on the longest contiguous matching subsequence that doesn't contain a "junk" element. By default, `ndiff()`'s optional `charjunk` argument considers blanks and tabs to be junk characters.

In the strings:

    "drwxrwxr-x 2 2000 2000\n"
    "drwxr-xr-x 2 2000 2000\n"

the longest matching substring not containing whitespace is "rwxr-x", of length 6, starting at index 4 in the first string and at index 1 in the second. So it's aligning the strings like so:

    "drwxrwxr-x 2 2000 2000\n"
       "drwxr-xr-x 2 2000 2000\n"
         123456

That's why it wants to delete the 1:4 slice in the first string and insert "r-x" after the longest matching substring.

The default is aimed at improving results for human-readable text, like prose and Python code, where stuff between whitespace is often read "as a whole" (words, keywords, identifiers, ...).

For cases like this one, where character-by-character differences are important, it's often better to pass `charjunk=None`. Then the longest matching substring is "xr-x 2 2000 2000" at the tail end of both strings, and you get the output you're expecting.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue35955>
_______________________________________
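
The synch points described in the comment above can be checked directly. This is only a sketch: it assumes that calling `difflib.SequenceMatcher.find_longest_match()` on the two strings is a fair stand-in for the first character-level step `ndiff()` takes on a pair of similar lines, and the helper name `longest_match` is made up for illustration.

```python
import difflib

a = "drwxrwxr-x 2 2000 2000\n"
b = "drwxr-xr-x 2 2000 2000\n"


def longest_match(isjunk):
    """Hypothetical helper: return (index in a, index in b, matched text)
    for the longest matching block under the given junk predicate."""
    sm = difflib.SequenceMatcher(isjunk, a, b)
    m = sm.find_longest_match(0, len(a), 0, len(b))
    return m.a, m.b, a[m.a:m.a + m.size]


# ndiff()'s default charjunk: blanks and tabs count as junk.
with_junk = longest_match(difflib.IS_CHARACTER_JUNK)

# charjunk=None: no character is treated as junk.
without_junk = longest_match(None)

print(with_junk)
print(without_junk)

# The same two settings passed to ndiff() itself, for comparison.
print("".join(difflib.ndiff([a], [b])), end="")
print("".join(difflib.ndiff([a], [b], charjunk=None)), end="")
```

With the default junk predicate the match is "rwxr-x" at index 4 in the first string and index 1 in the second. With no junk predicate the match starts at index 6 in both strings and runs through the trailing newline.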