On Thu, 11 Jun 2015 12:28 am, Skip Montanaro wrote:

> On Wed, Jun 10, 2015 at 8:28 AM, Tim Chase
> <python.l...@tim.thechases.com> wrote:
>> Is this a bug?
>
> Looks like it's been reported a few times with slightly different context:
>
> https://bugs.python.org/issue6537
> https://bugs.python.org/issue16623
> https://bugs.python.org/issue20491
> https://bugs.python.org/issue1390608
>
> The couple times it's come up in the context of str.split, it's been
> rejected, since the purpose of that method is to split words.

That reasoning is ... strange. The whole point of the NBSP is specifically
*not* to split on it. If you wanted it to split, you would use a regular
space.

(Oh, and for the record, there are at least two non-breaking spaces in
Unicode, U+00A0 "NO-BREAK SPACE" and U+202F "NARROW NO-BREAK SPACE".)

http://www.unicode.org/charts/PDF/U0080.pdf
http://www.unicode.org/charts/PDF/U2000.pdf

Non-breaking spaces should be used when you want to prevent word-wrapping,
and also for "open form" compound words:

http://grammar.ccc.commnet.edu/grammar/compounds.htm

textwrap should also treat NBSPs as non-spaces for the purposes of wrapping.

As a work-around, I think this should work:

- split the string on NBSPs;
- for each substring returned, split normally;
- merge the sub-substrings on either side of each NBSP, unless ordinary
  whitespace sits between them and the NBSP.


def split(s):
    """Split on whitespace, except NBSP.

    >>> split(u'hello world spam\\u00A0eggs cheese')
    [u'hello', u'world', u'spam\\xa0eggs', u'cheese']
    """
    NBSP = u'\u00A0'
    words = []
    joined = False  # True while the last word may still grow
    for i, sub in enumerate(s.split(NBSP)):
        if i > 0:
            # Put back the NBSP that preceded this substring.
            if joined:
                words[-1] += NBSP
            else:
                words.append(NBSP)
            joined = True
        parts = sub.split()
        # Glue the first part on only if no whitespace follows the NBSP.
        if parts and joined and not sub[0].isspace():
            words[-1] += parts.pop(0)
        words.extend(parts)
        if sub:
            joined = not sub[-1].isspace()
    return words


-- 
Steven
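
For comparison, the same work-around can be sketched with a regular
expression instead of the split-and-merge loop. This is only a sketch and
assumes Python 3.3 or later, where \S in a str pattern is Unicode-aware and
the re module understands \uXXXX escapes. The name split_keep_nbsp is
hypothetical, and treating U+202F the same way as U+00A0 is an assumption
based on the two code points mentioned above.

```python
import re

# A "word" is a run of characters that are either non-whitespace (\S)
# or one of the two no-break spaces named in the post.
_WORD = re.compile(r'[\S\u00A0\u202F]+')


def split_keep_nbsp(s):
    """Split on whitespace, but keep U+00A0 and U+202F inside words."""
    return _WORD.findall(s)


if __name__ == '__main__':
    print(split_keep_nbsp('hello world spam\u00A0eggs cheese'))
```

Because findall() only collects the matching runs, leading and trailing
whitespace never produce empty strings, and a no-break space next to an
ordinary space stays attached to whichever word it touches.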