On 15/01/2014 12:13, Ned Batchelder wrote:
........
On my UTF-8 based system


robin@everest ~:
$ cat ooo.py
if __name__=='__main__':
    import sys
    s='A̅B'
    print('version_info=%s\nlen(%s)=%d' % (sys.version_info,s,len(s)))
robin@everest ~:
$ python ooo.py
version_info=sys.version_info(major=3, minor=3, micro=3, releaselevel='final', serial=0)
len(A̅B)=3
robin@everest ~:
$


........
You are right that more than one codepoint makes up a grapheme, and that you'll
need code to deal with the correspondence between them. But let's not muddy
these already confusing waters by referring to that mapping as an encoding.

In Unicode terms, an encoding is a mapping between codepoints and bytes.  Python
3's str is a sequence of codepoints.
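
To make that distinction concrete, here is a minimal sketch using the same string as the ooo.py example above. It assumes the string is spelled with explicit escapes (U+0041, U+0305, U+0042) so the result does not depend on the editor's encoding; the helper name `describe` is made up for illustration.

```python
import unicodedata

# 'A', COMBINING OVERLINE, 'B' written as escapes (an assumption, so the
# source file's own encoding cannot change what we measure).
s = 'A\u0305B'

def describe(text):
    """Return (codepoint count, UTF-8 byte count, codepoint names).

    Hypothetical helper, not part of any library.
    """
    names = [unicodedata.name(ch) for ch in text]
    # len() on a str counts codepoints; encoding maps them to bytes.
    return len(text), len(text.encode('utf-8')), names

print(describe(s))
```

The str has three codepoints, and encoding it to UTF-8 gives four bytes, because the combining overline takes two bytes. Decoding those bytes gives back the same three codepoints.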

Semantics is everything. For me, graphemes are the endpoint (or should be). To get a proper rendering of a sequence of graphemes I can use either a sequence of bytes or a sequence of codepoints. Both are encodings of the graphemes. What Unicode calls an encoding doesn't define what encodings are in general, i.e. mappings from some source alphabet to a target alphabet.
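
A rough sketch of the codepoint-to-grapheme correspondence being discussed, assuming it is enough to attach each combining mark to the base character before it. That is only an approximation (full grapheme cluster rules are in Unicode Standard Annex #29), and the function name `rough_graphemes` is made up for illustration.

```python
import unicodedata

def rough_graphemes(text):
    """Group each base codepoint with the combining marks that follow it.

    Approximation only: this does not implement the full UAX #29
    grapheme cluster rules.
    """
    clusters = []
    for ch in text:
        # unicodedata.combining() returns a nonzero class for combining marks.
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

s = 'A\u0305B'  # 'A', COMBINING OVERLINE, 'B'
print(len(s), len(rough_graphemes(s)))
```

For this string the codepoint count is 3, while the grouping yields 2 clusters, which matches what a reader sees on screen.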
--
Robin Becker

