On 12/01/2014 07:50, wxjmfa...@gmail.com wrote:
sys.version
2.7.6 (default, Nov 10 2013, 19:24:18) [MSC v.1500 32 bit (Intel)]
s = 'Straße'
assert len(s) == 6
assert s[5] == 'e'


jmf


On my utf8 based system


robin@everest ~:
$ cat ooo.py
if __name__=='__main__':
    import sys
    s='A̅B'
    print('version_info=%s\nlen(%s)=%d' % (sys.version_info,s,len(s)))
robin@everest ~:
$ python ooo.py
version_info=sys.version_info(major=3, minor=3, micro=3, releaselevel='final', 
serial=0)
len(A̅B)=3
robin@everest ~:
$


so two 'characters' are 3 (or 2 or more) codepoints. If I want to isolate so called graphemes I need an algorithm even for python's unicode ie when it really matters, python3 str is just another encoding.
--
Robin Becker

--
https://mail.python.org/mailman/listinfo/python-list

Reply via email to