[issue35955] difflib reports incorrect location of mismatch

2019-02-11 Thread Tim Peters


Tim Peters  added the comment:

It's probably OK, but there's no "pure win" to be had here.  There's generally 
more than one way to convert one string to another, and what "looks right" to 
humans depends a whole lot on context.

For example, consider these strings:

"private Thread currentThread;"
"private volatile Thread currentThread;"

"It's obvious" someone inserted "volatile" into the first string, and that's 
what ndiff's default says:

- private Thread currentThread;
+ private volatile Thread currentThread;
?         +++++++++

However, pass `charjunk=None` instead, and ndiff claims someone inserted "e 
volatil" after the "t" in "private":

- private Thread currentThread;
+ private volatile Thread currentThread;
?        +++++++++

Which is also a correct way, but - to human eyes - an insane way ;-)
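
A minimal sketch to reproduce both results, assuming only the stdlib difflib module; the variable names (old, new, default_diff, nojunk_diff) are illustrative and not from the original comment:

```python
import difflib

old = "private Thread currentThread;\n"
new = "private volatile Thread currentThread;\n"

# Default: charjunk=difflib.IS_CHARACTER_JUNK, so blanks and tabs are junk.
default_diff = list(difflib.ndiff([old], [new]))

# charjunk=None: no character is treated as junk.
nojunk_diff = list(difflib.ndiff([old], [new], charjunk=None))

print("".join(default_diff))
print("".join(nojunk_diff))
```

The last line of each result is the "?" guide line; the run of "+" marks starts two columns further left when no characters are junk.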

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35955] difflib reports incorrect location of mismatch

2019-02-11 Thread Jason R. Coombs


Jason R. Coombs  added the comment:

Nice insight, Tim.




[issue35955] difflib reports incorrect location of mismatch

2019-02-11 Thread Karthikeyan Singaravelan

Karthikeyan Singaravelan  added the comment:

Thanks for the explanation. Passing charjunk=None to the multiline string 
comparison helper seems to give the desired diff. I am not sure how useful it 
would be to pass it to the sequence and dict comparisons that also use ndiff. 
I can open a PR if it's okay to use the strings from the report as a test 
case. No existing tests in the unittest test suite fail, so this seems like a 
safe change to me.


# With patch charjunk=None

./python.exe ../backups/bpo35955_1.py
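F
======================================================================
FAIL: test_foo (__main__.FooTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "../backups/bpo35955_1.py", line 6, in test_foo
    self.assertEqual("drwxrwxr-x 2 2000  2000\n", "drwxr-xr-x 2 2000  2000\n")
AssertionError: 'drwxrwxr-x 2 2000  2000\n' != 'drwxr-xr-x 2 2000  2000\n'
- drwxrwxr-x 2 2000  2000
?      ^
+ drwxr-xr-x 2 2000  2000
?      ^


----------------------------------------------------------------------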
Ran 1 test in 0.003s

FAILED (failures=1)

# Without patch

➜  cpython git:(master) ✗ python3.7 ../backups/bpo35955_1.py
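F
======================================================================
FAIL: test_foo (__main__.FooTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "../backups/bpo35955_1.py", line 6, in test_foo
    self.assertEqual("drwxrwxr-x 2 2000  2000\n", "drwxr-xr-x 2 2000  2000\n")
AssertionError: 'drwxrwxr-x 2 2000  2000\n' != 'drwxr-xr-x 2 2000  2000\n'
- drwxrwxr-x 2 2000  2000
?  ---
+ drwxr-xr-x 2 2000  2000
?        +++


----------------------------------------------------------------------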
Ran 1 test in 0.002s

FAILED (failures=1)
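
For illustration only, here is a rough sketch of the idea outside unittest. The helper name char_level_diff is hypothetical and this is not the actual patch; it only assumes that the diff is built by calling difflib.ndiff on the strings split into lines:

```python
import difflib


def char_level_diff(first: str, second: str) -> str:
    """Hypothetical helper: ndiff two strings with no junk characters."""
    first_lines = first.splitlines(keepends=True)
    second_lines = second.splitlines(keepends=True)
    # charjunk=None disables the default "blanks and tabs are junk" rule.
    return "".join(difflib.ndiff(first_lines, second_lines, charjunk=None))


result = char_level_diff("drwxrwxr-x 2 2000  2000\n",
                         "drwxr-xr-x 2 2000  2000\n")
print(result)
```

With these two strings, both "?" lines carry a single "^" under the one character that differs.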




[issue35955] difflib reports incorrect location of mismatch

2019-02-11 Thread Tim Peters


Tim Peters  added the comment:

difflib generally synchs on the longest contiguous matching subsequence that 
doesn't contain a "junk" element.  By default, `ndiff()`'s optional `charjunk` 
argument considers blanks and tabs to be junk characters.

In the strings:

"drwxrwxr-x 2 2000  2000\n"
"drwxr-xr-x 2 2000  2000\n"

the longest matching substring not containing whitespace is "rwxr-x", of length 
6, starting at index 4 in the first string and at index 1 in the second.  So 
it's aligning the strings like so:

"drwxrwxr-x 2 2000  2000\n"
   "drwxr-xr-x 2 2000  2000\n"
     123456

That's why it wants to delete the 1:4 slice in the first string and insert 
"r-x" after the longest matching substring.

The default is aimed at improving results for human-readable text, like prose 
and Python code, where stuff between whitespace is often read "as a whole" 
(words, keywords, identifiers, ...).

For cases like this one, where character-by-character differences are 
important, it's often better to pass `charjunk=None`.  Then the longest 
matching substring is "xr-x 2 2000  2000" at the tail end of both strings, and 
you get the output you're expecting.
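
A small sketch that checks the two synch points described above, assuming only the stdlib; the helper name longest is illustrative:

```python
from difflib import IS_CHARACTER_JUNK, SequenceMatcher

a = "drwxrwxr-x 2 2000  2000\n"
b = "drwxr-xr-x 2 2000  2000\n"


def longest(isjunk):
    """Return (index in a, index in b, matched text) of the longest match."""
    matcher = SequenceMatcher(isjunk, a, b)
    m = matcher.find_longest_match(0, len(a), 0, len(b))
    return m.a, m.b, a[m.a:m.a + m.size]


# Blanks and tabs are junk: the match is the 6-character "rwxr-x".
print(longest(IS_CHARACTER_JUNK))

# Nothing is junk: the match is the shared tail of both strings.
print(longest(None))
```

The first call reports indices 4 and 1, matching the alignment shown above; the second reports index 6 in both strings, with the trailing newline included in the match.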




[issue35955] difflib reports incorrect location of mismatch

2019-02-11 Thread Karthikeyan Singaravelan


Karthikeyan Singaravelan  added the comment:

I am not sure this is a duplicate, since the other issue was about a newline 
at the end of strings. This one is about the diff being misleading even when 
the strings end with a newline. Here is a sample program where changing the 
character at index 5 gives the reported diff.

import difflib

for i in range(7):
    print(f"Change character at {i}")
    a = list("drwxrwxr-x 2 2000  2000\n")
    b = "drwxrwxr-x 2 2000  2000\n"
    # Replace the character at index i with '-' and diff against the original.
    a[i] = '-'
    a = ''.join(a)
    print(''.join(difflib.ndiff([a], [b])))

Change character at 0
- -rwxrwxr-x 2 2000  2000
? ^
+ drwxrwxr-x 2 2000  2000
? ^

Change character at 1
- d-wxrwxr-x 2 2000  2000
?  ^
+ drwxrwxr-x 2 2000  2000
?  ^

Change character at 2
- dr-xrwxr-x 2 2000  2000
?   ^
+ drwxrwxr-x 2 2000  2000
?   ^

Change character at 3
- drw-rwxr-x 2 2000  2000
?    ^
+ drwxrwxr-x 2 2000  2000
?    ^

Change character at 4
- drwx-wxr-x 2 2000  2000
?     ^
+ drwxrwxr-x 2 2000  2000
?     ^

Change character at 5
- drwxr-xr-x 2 2000  2000
?        ---
+ drwxrwxr-x 2 2000  2000
?  +++

Change character at 6
- drwxrw-r-x 2 2000  2000
?       ^
+ drwxrwxr-x 2 2000  2000
?       ^




[issue35955] difflib reports incorrect location of mismatch

2019-02-11 Thread Jason R. Coombs


Jason R. Coombs  added the comment:

I don't think so, because the issue happens on a single line diff... although 
it's plausible there's a common-mode fix.




[issue35955] difflib reports incorrect location of mismatch

2019-02-11 Thread Chris Jerdonek


Chris Jerdonek  added the comment:

Is this a duplicate of issue24780?

--
nosy: +chris.jerdonek




[issue35955] difflib reports incorrect location of mismatch

2019-02-11 Thread Karthikeyan Singaravelan


Karthikeyan Singaravelan  added the comment:

I have tried changing other positions where only '-' and 'w' differ. They 
produced a correct diff, except for this one case where the diff was confusing.

--
nosy: +tim.peters
type:  -> behavior
versions: +Python 2.7, Python 3.7, Python 3.8




[issue35955] difflib reports incorrect location of mismatch

2019-02-11 Thread Jason R. Coombs


Jason R. Coombs  added the comment:

I'm re-opening this issue as it does seem to apply to the stdlib 
(difflib.ndiff), which is why I encountered it in both unittest and pytest. 
Thanks xtreak for the distilled example.

--
resolution: third party -> 
stage: resolved -> 
status: closed -> open
title: unittest assertEqual reports incorrect location of mismatch -> difflib 
reports incorrect location of mismatch
