[issue43689] difflib: mention other "problematic" characters in documentation

2021-04-05 Thread Tim Peters


Tim Peters  added the comment:

Terry, your suggested replacement statement looks like an improvement to me. 
Perhaps the longer explanation could be placed in a footnote.

Note that I'm old ;-) I grew up on plain old ASCII, decades & decades ago, and 
tabs are in fact the only "characters" I've had a problem with in doctests. But 
then, e.g., I never in my life used goofy things like ASCII "form feed" 
characters, or NUL bytes, or ... in text either.

I don't use Unicode either, except to the extent that Python forces me to when 
I'm sticking printable ASCII characters inside string quotes ;-)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43689] difflib: mention other "problematic" characters in documentation

2021-04-05 Thread Terry J. Reedy


Terry J. Reedy  added the comment:

I have an alternate replacement:  "These lines can be confusing if the 
sequences contain tab characters or other characters that result in the 
indicator symbols in these lines being mislocated."

Or leave the current sentence as is.

Explanation with the details omitted from the above:
In 3.x, strings are Unicode.  Even if one uses a fixed-pitch font for the ASCII 
subset, a majority of characters will be rendered either in a different fixed 
pitch or with variable pitch.  And on a graphics screen that is not simulating 
a fixed-pitch text terminal (such as Windows console), the so-called 
double-wide East Asian characters are not really double wide but more like 1.6 
times as wide.  The details depend on the OS, the font, and perhaps the font 
size.  One can explore this in the font sample box on the Font tab of the IDLE 
settings dialog.  The problems include characters narrower than 'one space', 
down to zero width.  For general Unicode, ^ marking does not work.  Syntax 
error marking has the same problem and there is no general solution.
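
A rough sketch of the mismatch described above. It assumes a display that gives 
East Asian Wide and Fullwidth characters exactly two columns each, which is only 
an approximation (see the 1.6 figure above). The helper name display_width and 
the sample strings are made up for illustration; the point is that the "?" 
marker is placed by string index, not by display column.

```python
import difflib
import unicodedata


def display_width(text):
    """Approximate column count (assumption: Wide/Fullwidth = 2, combining = 0, else 1)."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks are drawn on top of the previous character
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


old = "日本語 abc\n"
new = "日本語 abd\n"
lines = list(difflib.Differ().compare([old], [new]))
for line in lines:
    print(line, end="")

minus_line, marker_line = lines[0], lines[1]
marker_index = marker_line.index("^")    # where difflib put the caret
changed_index = minus_line.index("c")    # string index of the changed character
shown_col = display_width(minus_line[:changed_index])  # column where "c" is drawn

# The caret and the changed character share a string index, but under the
# two-column assumption the character is drawn further to the right.
print(marker_index, changed_index, shown_col)
```

With these inputs the caret and the changed character have the same string 
index, while the estimated display column of the character is larger, so the 
caret appears to point at the wrong place.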

Tab ('\t') is an example of a character that is displayed either as a 
variable-width space or as a fixed-width space of double width or more.  If we 
were to make a change, we should mention, as above, that many non-ASCII 
characters are just as confusing as tabs.

In your example above, the caret at least points to the right space.  It 
correctly indicates some difference beyond the visible end - a non-visible 
whitespace difference.




[issue43689] difflib: mention other "problematic" characters in documentation

2021-04-03 Thread Jürgen Gmach

Jürgen Gmach  added the comment:

First, I need to apologize for not providing more information when I created 
the issue.

Initially, I did not even plan to create an issue, and thought the PR with the 
context of the current documentation would be sufficient information.

Thanks for taking the time anyway!

Also, thanks to Tim for explaining the meaning of the question mark in detail. 
When I read the documentation, I also had to pause a moment to understand the 
sentence. But I agree with Tim, it is hard to explain it better without getting 
much more verbose.

My initial reason to read (and then to update) the documentation was some 
pytest output that left me puzzled.

E   AssertionError: assert 'ROOT: No tox...ith_no_t0/p\n' == 'ROOT: No tox..._with_no_t0/p'
E Skipping 136 identical leading characters in diff, use -v to show
E - ith_no_t0/p
E + ith_no_t0/p
E ?+

Here is the screenshot and some discussion:
https://twitter.com/jugmac00/status/1377317886419738624

Using a snippet similar to Tim's, here is a minimal example:

import difflib
d = difflib.Differ()
for L in d.compare(["abcdefghijkl"], ["abcdefghijkl\n"]):
    print(L)

- abcdefghijkl
+ abcdefghijkl

?             +


The output is pretty obvious most of the time, so I never actually noticed the 
question mark - except when whitespace characters are involved.
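
One way to make such a whitespace-only difference visible, offered as a small 
sketch rather than as anything pytest or difflib does for you, is to print the 
repr() of each line that Differ.compare() yields. The variable name result is 
just for illustration.

```python
import difflib

d = difflib.Differ()
result = list(d.compare(["abcdefghijkl"], ["abcdefghijkl\n"]))

# repr() shows the trailing "\n" explicitly, so the "+" marker on the "?"
# line can be matched to the newline character it refers to.
for line in result:
    print(repr(line))
```

In the repr() output the second input line visibly ends in \n, and the "+" on 
the "?" line sits at the same string index as that newline.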

I was then told that pytest uses difflib, and I was kindly pointed to the 
Python documentation.

As only the tab character was listed, I thought it would be a good idea to add 
the other whitespace characters as well.

After Tim's explanation, I see that tabs could be especially confusing, while 
all other whitespace characters are confusing at a normal level :-), especially 
at the end of the diff.

I certainly won't forget what I learned, but maybe my proposal helps one fellow 
Python user or another.




[issue43689] difflib: mention other "problematic" characters in documentation

2021-04-03 Thread Terry J. Reedy

Terry J. Reedy  added the comment:

After 3+ years of GitHub I did not remember that B diffs use lines with 
change position markers and, in particular, that they (often? always?) start 
with ?s.  IDLE also uses color to mark positions (for syntax errors).  The 
following would have been clearer to me, and likely to people who have never 
seen such lines.

"Location marker lines beginning with ‘?’ use symbols to guide the eye to 
intraline differences."

Tim, you seem to still think that tabs are especially problematical. 

Jürgen, without evidence otherwise, I agree with this.  Adding other chars to 
the sentence would dilute the current focus on tabs.  Hence my request for 
examples to justify doing so.  Sorry I was not as clear as I could and should 
have been.




[issue43689] difflib: mention other "problematic" characters in documentation

2021-04-02 Thread Tim Peters


Tim Peters  added the comment:

Lines beginning with "?" are entirely synthetic: they were not present in 
either input.  So that's what that part means.

I'm not clear on what else could be materially clearer without greatly bloating 
the text. For example,

>>> import difflib
>>> d = difflib.Differ()
>>> for L in d.compare(["abcefghijkl\n"], ["a cxefghijkl\n"]):
...     print(L, end="")
...
- abcefghijkl
?  ^
+ a cxefghijkl
?  ^ +

The "?" lines guide the eye to the places that differ: "b" was replaced by a 
blank, and "x" was inserted.  The marks on the "?" lines are intended to point 
out exactly where changes (substitutions, insertions, deletions) occurred.
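
For completeness, a small sketch of the remaining kind of mark, a deletion. The 
input strings are made up for illustration: the first input has an extra "x" 
that the second lacks, so the "?" line under the "-" line carries a "-" mark at 
that position.

```python
import difflib

d = difflib.Differ()
deleted = list(d.compare(["abcxefghijkl\n"], ["abcefghijkl\n"]))

# The "?" line follows the "- " line and marks the deleted "x" with "-".
# No "?" line follows the "+ " line, because nothing was added or replaced there.
for line in deleted:
    print(line, end="")
```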

If the second input had a tab instead of a blank, the "+" wouldn't _appear_ to 
be under the "x" at all.  It would instead "look like" a long string of blanks 
was between "a" and "c" in the first input, and the "+" would appear to be 
under one of them somewhere near the middle of the empty space.
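
A sketch of that tab case, assuming the output is shown with 8-column tab stops 
(the real width depends on the terminal or editor). The "+" marker is placed by 
string index, so it matches the index of "x" in the "+" line, but once the tab 
is expanded the "x" is drawn several columns further right.

```python
import difflib

old = ["abcefghijkl\n"]
new = ["a\tcxefghijkl\n"]  # same as the example above, with a tab instead of the blank

lines = list(difflib.Differ().compare(old, new))
for line in lines:
    print(line, end="")

plus_line, marker_line = lines[2], lines[3]
marker_col = marker_line.index("+")               # where difflib put the "+" mark
x_col_raw = plus_line.index("x")                  # string index of the inserted "x"
x_col_shown = plus_line.expandtabs(8).index("x")  # column of "x" with 8-column tabs (assumption)

print(marker_col, x_col_raw, x_col_shown)
```

The first two numbers agree; the third is larger, which is why the mark seems to 
float in the middle of the blank area produced by the tab.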

Tough luck. Use tab characters (or any other kind of "goofy" whitespace) in 
input to visual tools, and you deserve whatever you get :-)




[issue43689] difflib: mention other "problematic" characters in documentation

2021-04-02 Thread Terry J. Reedy

Terry J. Reedy  added the comment:

The quote is in the following section.
https://docs.python.org/3/library/difflib.html#difflib.Differ
I do not really understand the previous line "Lines beginning with ‘?’ attempt 
to guide the eye to intraline differences, and were not present in either input 
sequence."  Can you give examples where '?' occurs, with tabs and spaces 
(newlines would not be within a line)?

--
nosy: +terry.reedy
versions:  -Python 3.6, Python 3.7




[issue43689] difflib: mention other "problematic" characters in documentation

2021-04-01 Thread Karthikeyan Singaravelan


Change by Karthikeyan Singaravelan :


--
nosy: +tim.peters




[issue43689] difflib: mention other "problematic" characters in documentation

2021-04-01 Thread Jürgen Gmach

New submission from Jürgen Gmach :

The documentation currently says this about the "?" output lines:

"These lines can be confusing if the sequences contain tab characters."

From first-hand experience :-), I can assure you it is also very confusing for 
other types of whitespace characters, such as spaces and line breaks.

I'd like to add the other characters to the documentation.

--
assignee: docs@python
components: Documentation
messages: 389961
nosy: docs@python, jugmac00
priority: normal
pull_requests: 23879
severity: normal
status: open
title: difflib: mention other "problematic" characters in documentation
versions: Python 3.10, Python 3.6, Python 3.7, Python 3.8, Python 3.9
