[issue32397] textwrap output may change if you wrap a paragraph twice

2021-08-23 Thread Terry J. Reedy


Change by Terry J. Reedy :


--
type: behavior -> enhancement
versions: +Python 3.11 -Python 3.7, Python 3.8, Python 3.9

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue32397] textwrap output may change if you wrap a paragraph twice

2021-08-03 Thread Andrei Kulakov


Andrei Kulakov  added the comment:

I've added an initial draft PR: 
https://github.com/python/cpython/pull/27587/files

I will add docs and news if this looks good in general.

--




[issue32397] textwrap output may change if you wrap a paragraph twice

2021-08-03 Thread Andrei Kulakov


Change by Andrei Kulakov :


--
pull_requests: +26089
pull_request: https://github.com/python/cpython/pull/27587




[issue32397] textwrap output may change if you wrap a paragraph twice

2021-08-03 Thread Andrei Kulakov


Andrei Kulakov  added the comment:

Note that I'm not handling a lone '\r', because that line ending predates 
Mac OS X; it is still handled by the following line (i.e. by the old logic):

text = text.translate(self.unicode_whitespace_trans)
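
For reference, a small sketch of what that translate step does to carriage returns. It uses the `unicode_whitespace_trans` class attribute of `textwrap.TextWrapper`, which is the table the quoted line refers to; the helper name `munge` is made up for illustration and is not part of textwrap.

```python
import textwrap


def munge(text):
    # Hypothetical helper: apply only the whitespace translation table that
    # TextWrapper uses when replace_whitespace is true. Every character in
    # '\t\n\x0b\x0c\r ' is mapped to a single space.
    return text.translate(textwrap.TextWrapper.unicode_whitespace_trans)


print(repr(munge("a\rb")))    # 'a b'
print(repr(munge("a\r\nb")))  # 'a  b' (two characters in, two spaces out)
```

So a lone '\r' already becomes a space, and an unhandled '\r\n' becomes two spaces rather than one.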

--




[issue32397] textwrap output may change if you wrap a paragraph twice

2021-08-03 Thread Andrei Kulakov


Andrei Kulakov  added the comment:

Irit: I assume you mean r' \r?\n'. That's a great idea; it's much faster than 
adding a separate replacement step.

The latest version I came up with is this:

if re.search(r' \r?\n', text):
    text = re.sub(r' \r?\n', ' ', text)
if re.search(r'\r?\n ', text):
    text = re.sub(r'\r?\n ', ' ', text)

This optimizes the case where there are no newlines, which is likely the most 
common case for small fragments of text. It may be the less common case for 
larger fragments, where performance matters more, so I'm not sure it's worth 
it.

Timings:
# sub() has to run
2904 (~/opensource/cpython) % ./python.exe -mtimeit 'import textwrap' 'textwrap.wrap("abc foo\n bar baz", 5)'
5000 loops, best of 5: 67.6 usec per loop

# search() runs; but sub() does NOT because there's no adjacent space
2906 (~/opensource/cpython) % ./python.exe -mtimeit 'import textwrap' 'textwrap.wrap("abc foo\nbar baz", 5)'
5000 loops, best of 5: 60.3 usec per loop
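
The guarded version above can be tried outside of textwrap as a standalone function. This is only a sketch: the function name is borrowed from the argument name floated elsewhere in this thread, and it is not an existing textwrap API.

```python
import re


def collapse_space_newline(text):
    # Hypothetical standalone helper wrapping the two guarded substitutions.
    # A newline (optionally preceded by '\r') that touches a space is removed,
    # so the later whitespace translation does not turn it into a second space.
    if re.search(r' \r?\n', text):
        text = re.sub(r' \r?\n', ' ', text)
    if re.search(r'\r?\n ', text):
        text = re.sub(r'\r?\n ', ' ', text)
    return text


print(repr(collapse_space_newline("abc foo\n bar baz")))  # 'abc foo bar baz'
print(repr(collapse_space_newline("abc foo\nbar baz")))   # unchanged
print(repr(collapse_space_newline("abc foo \r\nbar")))    # 'abc foo bar'
```

A newline with no adjacent space is left alone here; the existing translate step still turns it into a single space afterwards.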

--




[issue32397] textwrap output may change if you wrap a paragraph twice

2021-08-03 Thread Irit Katriel


Irit Katriel  added the comment:

You should be able to do both in one regex, something like:

text = re.sub(r' ?\n', ' ', text)
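
A quick sketch of what that single pattern does on a few inputs (the helper name `join_lines` is hypothetical). As written, the pattern absorbs a space before the newline but leaves a space after the newline in place.

```python
import re


def join_lines(text):
    # Hypothetical helper: replace a newline, plus an optional space right
    # before it, with one space.
    return re.sub(r' ?\n', ' ', text)


print(repr(join_lines("a \nb")))  # 'a b'
print(repr(join_lines("a\nb")))   # 'a b'
print(repr(join_lines("a\n b")))  # 'a  b' (the space after '\n' survives)
```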

--
nosy: +iritkatriel




[issue32397] textwrap output may change if you wrap a paragraph twice

2021-08-03 Thread Andrei Kulakov


Andrei Kulakov  added the comment:

I think the fix to make `drop_whitespace=False` stable can be as simple as 
adding two lines in `_munge_whitespace()`:

+text = re.sub(r' \n', ' ', text)
+text = re.sub(r'\n ', ' ', text)
 text = text.translate(self.unicode_whitespace_trans)

The perf impact is not small, though: 12%.

2892 (~/opensource/cpython) % ./python.exe -mtimeit 'import textwrap' 'textwrap.wrap("abc foo\nbar baz", 5)'
5000 loops, best of 5: 60.2 usec per loop

2893 (~/opensource/cpython) % r
./python.exe -mtimeit 'import textwrap' 'textwrap.wrap("abc foo\nbar baz", 5)'
5000 loops, best of 5: 52.9 usec per loop


I don't know if it's worth doing, but if yes, the options are:

 - just add this change for drop_whitespace=False, which is not the default, so 
the perf regression will not affect default usage of wrap.

 - add a new arg that will only have an effect when drop_whitespace=False, and 
will run these 2 lines. The name could be something like 
`collapse_space_newline`. It's hard to think of a good name.

If '\r\n' is handled, it needs one additional `sub()` line, and the perf 
difference is 22%.

--




[issue32397] textwrap output may change if you wrap a paragraph twice

2021-08-02 Thread Andrei Kulakov


Andrei Kulakov  added the comment:

This issue is due to the *drop_whitespace* arg being True by default.

Documentation states that *drop_whitespace* applies after wrapping, so the fix 
in the PR would break that promise because, as far as I understand, it applies 
the wrapping after *drop_whitespace*. (Irit's comment on the PR refers to this).

I feel like the behaviour in the PR is more logical and probably what most 
users would prefer and need in most cases. But it's a backwards-incompatible 
change to behaviour that isn't buggy.

'foo  bar'
# wrap for width=10
['foo', 'bar']  # current behaviour
['foo bar']     # more intuitive wrapping to width=10
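
A runnable sketch of the "drop after wrapping" order. The run of eight spaces is an assumption chosen so that the whitespace itself overflows the width; the exact spacing of the example above is not recoverable from this archive.

```python
import textwrap

# Assumed input: two short words separated by a long run of spaces.
text = "foo" + " " * 8 + "bar"

# The line is broken first, because 'foo' plus the eight spaces exceeds
# width=10; only afterwards is the whitespace at the break dropped.
lines = textwrap.wrap(text, width=10)
print(lines)  # ['foo', 'bar']
```

If the whitespace were collapsed before wrapping, both words would fit on one line of width 10.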

As I was looking at this, I thought that wrapping with drop_whitespace=False 
would surely be stable. Not so!

It looks like the logic assumes that \n needs to be replaced by a space for 
cases like 'foo\nbar', which makes sense because otherwise two words would be 
joined together. But with drop_whitespace=False, repeated fill() calls will 
keep replacing \n with a space and then adding a new \n to split lines again:

  original: '    .  '
   wrapped: '    .  \n'
 wrapped twice: '    .   \n'
wrapped thrice: '    .\n'

Further, a newline with a nearby space within the requested width will be 
converted to two spaces:
'a\n b' => 'a  b'
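
Both effects can be reproduced with a short script. This is a sketch: the sample text and the helper name `refill` are made up for illustration, and the results shown are for interpreters without any fix for this issue.

```python
import textwrap


def refill(text, times, width=12):
    # Hypothetical helper: call fill() repeatedly with drop_whitespace=False
    # and keep every intermediate result.
    results = []
    for _ in range(times):
        text = textwrap.fill(text, width=width, drop_whitespace=False)
        results.append(text)
    return results


for step in refill("aaa bbb.  ccc", 3):
    print(repr(step))
# 'aaa bbb.  \nccc'
# 'aaa bbb.   \nccc'
# 'aaa bbb.    \nccc'

# A newline next to a space, within the width, becomes two spaces.
print(repr(textwrap.fill("a\n b", width=10, drop_whitespace=False)))  # 'a  b'
```

Each pass turns the previous newline into one more trailing space before inserting a fresh newline, so the output never settles.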

I don't know if the original issue is worth fixing or not, but I think the 
issue shown above would be good to fix.

--
nosy: +andrei.avk




[issue32397] textwrap output may change if you wrap a paragraph twice

2020-10-08 Thread Julien Palard


Julien Palard  added the comment:

Could be "related" to https://bugs.python.org/issue41975.

--
nosy: +mdk




[issue32397] textwrap output may change if you wrap a paragraph twice

2019-10-20 Thread Cheryl Sabella


Change by Cheryl Sabella :


--
versions: +Python 3.9 -Python 3.6




[issue32397] textwrap output may change if you wrap a paragraph twice

2019-05-18 Thread Cheryl Sabella


Cheryl Sabella  added the comment:

@larry, it looks like this was close to being merged pending some review 
comments by Serhiy.  Although this is considered a bug and not a new feature, 
it might be nice to try to get this in for 3.8.  Thanks!

--
nosy: +cheryl.sabella




[issue32397] textwrap output may change if you wrap a paragraph twice

2018-02-11 Thread Serhiy Storchaka

Change by Serhiy Storchaka :


--
nosy: +serhiy.storchaka
type:  -> behavior
versions: +Python 3.6, Python 3.8




[issue32397] textwrap output may change if you wrap a paragraph twice

2018-02-10 Thread Larry Hastings

Change by Larry Hastings :


--
keywords: +patch
pull_requests: +5425
stage:  -> patch review




[issue32397] textwrap output may change if you wrap a paragraph twice

2017-12-21 Thread Larry Hastings

Larry Hastings  added the comment:

FWIW, the test program produces this output:

--

 original: '    .  '
  wrapped: '    .\n'
wrapped twice: '    . '

Traceback (most recent call last):
  File "textwrap.isnt.stable.py", line 24, in <module>
    assert wrapped == wrapped2
AssertionError

--




[issue32397] textwrap output may change if you wrap a paragraph twice

2017-12-21 Thread Larry Hastings

New submission from Larry Hastings :

If you word-wrap a paragraph twice with textwrap, you may get different 
results.  Specifically, you *will* get different results when:
* the original text has a line that is too long by one character,
* the last word on the line is the first word in a new sentence, and
* there are two spaces after the period.

The first textwrap will replace the two spaces after the period with a newline; 
the second textwrap will replace the newline with a single space.

Attached is a test case demonstrating the problem.

It's not a big problem, but it did cause an assertion failure in blurb.  The 
workaround was to word-wrap all paragraphs twice, which works but is kind of 
dumb.
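
A minimal reproduction along the lines described above. This is a sketch with a made-up sample string, not the attached test case; the results shown are for interpreters without any fix for this issue.

```python
import textwrap

# 13 characters: one too many for width=12. The last word starts a new
# sentence and is preceded by two spaces.
text = "aaa bbb.  ccc"

once = textwrap.fill(text, width=12)
twice = textwrap.fill(once, width=12)

print(repr(once))   # 'aaa bbb.\nccc'
print(repr(twice))  # 'aaa bbb. ccc'
```

The first pass breaks at the two spaces; the second pass turns the newline into a single space, and the shorter text now fits on one line.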

--
components: Library (Lib)
files: textwrap.isnt.stable.py
messages: 308872
nosy: larry
priority: low
severity: normal
status: open
title: textwrap output may change if you wrap a paragraph twice
versions: Python 3.7
Added file: https://bugs.python.org/file47344/textwrap.isnt.stable.py
