Re: 'Straße' ('Strasse') and Python 2

Ned Batchelder Wed, 15 Jan 2014 04:16:13 -0800

On 1/15/14 7:00 AM, Robin Becker wrote:

On 12/01/2014 07:50, wxjmfa...@gmail.com wrote:

sys.version

2.7.6 (default, Nov 10 2013, 19:24:18) [MSC v.1500 32 bit (Intel)]

s = 'Straße'
assert len(s) == 6
assert s[5] == 'e'

jmf


On my utf8 based system

robin@everest ~:
$ cat ooo.py
if __name__=='__main__':
    import sys
    s='A̅B'
    print('version_info=%s\nlen(%s)=%d' % (sys.version_info,s,len(s)))
robin@everest ~:
$ python ooo.py
version_info=sys.version_info(major=3, minor=3, micro=3, releaselevel='final', serial=0)
len(A̅B)=3
robin@everest ~:
$



so two 'characters' are 3 (or 2 or more) codepoints. If I want to
isolate so called graphemes I need an algorithm even for python's
unicode ie when it really matters, python3 str is just another encoding.

You are right that more than one codepoint makes up a grapheme, and that you'll need code to deal with the correspondence between them. But let's not muddy these already confusing waters by referring to that mapping as an encoding.
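
To make the codepoint/grapheme correspondence concrete, here is a rough Python 3 sketch that attaches each combining mark to the base codepoint before it. The function name rough_graphemes is made up for illustration, and this is only an approximation: real grapheme segmentation follows the rules in Unicode Standard Annex #29, which cover more cases than combining marks.

```python
import unicodedata

def rough_graphemes(text):
    """Group each base codepoint with the combining marks that follow it.

    Approximation only: this is not the full UAX #29 algorithm.
    """
    clusters = []
    for ch in text:
        # unicodedata.combining() returns a nonzero class for combining marks.
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

s = 'A\u0305B'  # 'A', U+0305 COMBINING OVERLINE, 'B'
print(len(s))                   # 3 codepoints
print(len(rough_graphemes(s)))  # 2 clusters under this approximation
```

Nothing here touches bytes at all: both the input and the output are made of codepoints.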

In Unicode terms, an encoding is a mapping between codepoints and bytes. Python 3's str is a sequence of codepoints.
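
A small Python 3 sketch of that distinction, using the string from the subject line. The variable names are only for illustration. The same six codepoints come out as different byte sequences depending on which encoding is chosen:

```python
s = 'Stra\u00dfe'  # 'Straße' as a Python 3 str: a sequence of codepoints

# len() on a str counts codepoints, regardless of any encoding.
n_codepoints = len(s)

# Encoding maps those codepoints to bytes; the result depends on the codec.
utf8_bytes = s.encode('utf-8')      # U+00DF becomes two bytes: 0xC3 0x9F
latin1_bytes = s.encode('latin-1')  # U+00DF becomes one byte: 0xDF

print(n_codepoints, len(utf8_bytes), len(latin1_bytes))

# Decoding with the matching codec maps the bytes back to the same codepoints.
assert utf8_bytes.decode('utf-8') == s
assert latin1_bytes.decode('latin-1') == s
```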


--
Ned Batchelder, http://nedbatchelder.com

--
https://mail.python.org/mailman/listinfo/python-list
