[issue8859] split() splits on non-whitespace char when there is no separator given.

Peter Landgren Sun, 30 May 2010 13:03:55 -0700

Peter Landgren <[email protected]> added the comment:

I am not sure I can follow you. I will try to be more specific.


The test string originally consists of a single character: the Czech Š.

1. On Linux with Python 2.6.4
1.1 If I keep the original code line order:
label = obj.get()
print type(label), repr(label)
label = " ".join(label.split())
print type(label), repr(label)
label = unicode(label)
if len(label) > 40:
    label = label[:40] + "..."

Both "print type(label), repr(label)" lines give:
<type 'str'> '\xc5\xa0'

1.2 If I change order and take the unicode conversion first:
label = obj.get()
label = unicode(label)
print type(label), repr(label)
label = " ".join(label.split())
print type(label), repr(label)
if len(label) > 40:
    label = label[:40] + "..."

Both "print type(label), repr(label)" lines give:
<type 'unicode'> u'\u0160'
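
The ordering in 1.2 (convert to unicode first, then normalize whitespace, then truncate) can be sketched as one small helper. This is only an illustration, written for Python 3, where bytes and str play the roles that str and unicode play in Python 2. The name shorten_label and the assumption that the input is UTF-8 are hypothetical and do not come from Gramps.

```python
# Python 3 sketch: bytes/str here correspond to str/unicode in Python 2.
# shorten_label is a hypothetical helper, not a Gramps function.
def shorten_label(raw, limit=40, encoding="utf-8"):
    """Decode first, then collapse whitespace, then truncate."""
    # Assumption: byte input is UTF-8 encoded text.
    text = raw.decode(encoding) if isinstance(raw, bytes) else raw
    # split() now works on characters, so a multi-byte character
    # can never be cut in the middle.
    text = " ".join(text.split())
    if len(text) > limit:
        text = text[:limit] + "..."
    return text


# The single test character from this report, given as UTF-8 bytes.
print(ascii(shorten_label("\u0160".encode("utf-8"))))
```

Because the split operates on decoded text, the result does not depend on how individual bytes are classified.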

2. On Windows with Python 2.6.5
2.1 The original code line order:
The two "print type(label), repr(label)" lines give:
<type 'str'> '\xc5\xa0'
<type 'str'> '\xc5'
 8217: ERROR: gramps.py: line 138: Unhandled exception
 ....

2.2 If I change order and take the unicode conversion first:
Both "print type(label), repr(label)" lines give:
<type 'unicode'> u'\u0160'
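
The '\xc5' in 2.1 is what one would get if the byte-string split() treated the byte 0xA0 as whitespace. That is only an assumption here, not something this report confirms. The second byte of the UTF-8 encoding of Š is 0xA0, which is the no-break space in Latin-1. The Python 3 sketch below imitates such a split by viewing each byte as a Latin-1 character; it does not reproduce the actual Windows behaviour.

```python
# Python 3 sketch; it imitates, but does not reproduce, the 2.1 result.
raw = "\u0160".encode("utf-8")          # b'\xc5\xa0'

# bytes.split() in Python 3 only knows ASCII whitespace: nothing is lost.
ascii_split = b" ".join(raw.split())
print(ascii_split)                      # b'\xc5\xa0'

# Assumption: a split that regards byte 0xA0 as whitespace. View every
# byte as a Latin-1 character; U+00A0 (no-break space) counts as
# whitespace for str.split(), so the second byte is dropped.
as_latin1 = raw.decode("latin-1")
lossy = " ".join(as_latin1.split()).encode("latin-1")
print(lossy)                            # b'\xc5'

# The remaining byte is an incomplete UTF-8 sequence, so a later
# conversion to text fails.
try:
    lossy.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
print(decode_failed)                    # True
```

If that assumption holds, the failure after the split would come from converting the truncated byte string, which fits with 2.2 working once the conversion is done first.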

3. If I use this small script:
# -*- coding: utf-8 -*-
label = 'Š'
print type(label), repr(label)
label = " ".join(label.split())
print type(label), repr(label)
I get:
<type 'str'> '\xc5\xa0'
<type 'str'> '\xc5\xa0'
on both Linux and Windows.

The examples under 1. and 2. above come from an application, Gramps.

There is still something I don't understand.

----------

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue8859>
_______________________________________