Re: 'Straße' ('Strasse') and Python 2

Robin Becker Wed, 15 Jan 2014 09:01:26 -0800

On 15/01/2014 16:28, Travis Griggs wrote:

........ of a sequence of graphemes I can use either a sequence of bytes or a sequence of codepoints. They are both encodings of the graphemes; what unicode says is an encoding doesn't define what encodings are ie mappings from some source alphabet to a target alphabet.


But you’re talking about two levels of encoding. One runs on top of the other. 
So insisting that you be able to call them all encodings, makes the term 
pointless, because now it’s ambiguous as to what you’re referring to. Are you 
referring to encoding in the sense of representing code points with bytes? Or 
are you referring to what the unicode guys call “forms”?

For example, the NFC form of ‘ñ’ is ‘\u00F1’. The NFD form represents the 
exact same grapheme, but is ‘\u006e\u0303’. You can call them encodings if you 
want, but I echo Ned’s sentiment that you keep that to yourself. 
Conventionally, they’re different forms, not different encodings. You can 
encode either form with an encoding, e.g.

'\u00F1'.encode('utf8')
'\u00F1'.encode('utf16')

'\u006e\u0303'.encode('utf8')
'\u006e\u0303'.encode('utf16')
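
A minimal sketch of the distinction drawn above, assuming Python 3 and the standard unicodedata module: normalization converts between the two forms (code points to code points), and only afterwards does a codec turn either form into bytes. The variable names are illustrative, and 'utf-16-le' is used rather than 'utf16' only so that no byte order mark is prepended.

```python
import unicodedata

# Normalization maps code points to code points; no bytes are involved yet.
nfc = unicodedata.normalize('NFC', '\u006e\u0303')  # composed: 1 code point
nfd = unicodedata.normalize('NFD', '\u00f1')        # decomposed: 2 code points

# Encoding is the separate step that maps code points to bytes.
for form_name, text in (('NFC', nfc), ('NFD', nfd)):
    for codec in ('utf-8', 'utf-16-le'):
        print(form_name, codec, len(text), text.encode(codec))
```

Both forms render as the same grapheme, yet they differ in code point count and in the bytes every codec produces for them.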

I think about these as encodings, because that's what they are mathematically, logically & practically. I can encode the target grapheme sequence as a sequence of bytes using a particular 'unicode encoding' eg utf8 or a sequence of code points.


The fact that unicoders want to take over the meaning of encoding is not 
relevant.

In my utf8 bash shell the python print() takes one encoding (python3 str) and translates that to the stdout encoding which happens to be utf8 and passes that to the shell which probably does a lot of work to render the result as graphical symbols (or graphemes).
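
As a rough illustration of that translation step, assuming Python 3: the helper below (encode_for_stream is a made-up name, not anything in the standard library) produces approximately the bytes print() would hand to a stream with a given encoding. It ignores the trailing newline and the stream's own error handler.

```python
import sys

def encode_for_stream(text, encoding=None):
    """Approximate the str-to-bytes step print() performs for a text stream."""
    # Fall back to stdout's encoding, then to utf-8 if stdout reports none.
    encoding = encoding or getattr(sys.stdout, 'encoding', None) or 'utf-8'
    return text.encode(encoding)

# The same one-code-point str becomes different bytes per stream encoding.
print(repr(encode_for_stream('\u00f1', 'utf-8')))
print(repr(encode_for_stream('\u00f1', 'latin-1')))
```

What the terminal then does with those bytes (font lookup, combining, shaping) is outside Python entirely.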

I'm not anti unicode, that's just an assignment of identity to some symbols. Coding the values of the ids is a separate issue. It's my belief that we don't need more than the byte level encoding to represent unicode. One of the claims made for python3 unicode is that it somehow eliminates the problems associated with other encodings eg utf8, but in fact they will remain until we force printers/designers to stop using complicated multi-codepoint graphemes. I suspect that won't happen.
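
A small sketch of the multi-codepoint problem, assuming Python 3 and the standard unicodedata module (the sample strings are my own choice): a str is indexed by code point, so a grapheme built from a base letter plus a combining mark can still be split, much as a byte slice can split a utf8 sequence.

```python
import unicodedata

decomposed = 'man\u0303ana'  # "mañana" spelled with n + COMBINING TILDE
composed = unicodedata.normalize('NFC', decomposed)

print(len(decomposed))                  # 7 code points for 6 graphemes
print(len(composed))                    # 6: NFC can use precomposed U+00F1
print(len(decomposed.encode('utf-8')))  # 8 bytes for the same 6 graphemes

# Slicing by code point lands between the n and its tilde.
head = decomposed[:3]
print(repr(head))                       # 'man', the tilde is left behind

# NFC only helps where a precomposed code point exists; q + tilde has none.
no_precomposed = unicodedata.normalize('NFC', 'q\u0303')
print(len(no_precomposed))              # still 2 code points
```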

--
Robin Becker

--
https://mail.python.org/mailman/listinfo/python-list