Re: 'Straße' ('Strasse') and Python 2

Terry Reedy Wed, 15 Jan 2014 16:29:46 -0800

On 1/15/2014 11:55 AM, Robin Becker wrote:

> The fact that unicoders want to take over the meaning of encoding is not
> relevant.

I agree with you that 'encoding' should not be limited to 'byte encoding of a (subset of) unicode characters'. For instance, .jpg and .png are byte encodings of images. On the other hand, it is common in human discourse to omit qualifiers in particular contexts. 'Computer virus' gets condensed to 'virus' in computer contexts.

The problem with graphemes is that there is no fixed set of unicode graphemes. Which is to say, the effective set of graphemes is context-specific. Just limiting ourselves to English, 'fi' is usually 2 graphemes when printing to screen, but often just one when printing to paper. This is why the Unicode Consortium punted 'graphemes' to 'application' code.
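A small illustration of the 'fi' point, offered as a sketch: Unicode happens to carry the printed ligature as its own compatibility character, U+FB01, so whether 'fi' is one unit or two depends on which text you were handed. The variable names below are illustrative only.

```python
import unicodedata

two = "fi"        # two code points: LATIN SMALL LETTER F + LATIN SMALL LETTER I
lig = "\ufb01"    # one code point: the typographic ligature

print(len(two), len(lig))
print(unicodedata.name(lig))
# Compatibility normalization (NFKC) folds the ligature back into two letters.
print(unicodedata.normalize("NFKC", lig) == two)
```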

> I'm not anti unicode, that's just an assignment of identity to some
> symbols. Coding the values of the ids is a separate issue. It's my
> belief that we don't need more than the byte level encoding to represent
> unicode. One of the claims made for python3 unicode is that it somehow
> eliminates the problems associated with other encodings eg utf8,

The claim is true for the following problems of the way-too-numerous unicode byte encodings.


Subsetting: only a subset of characters can be encoded.

Shifting: the meaning of a byte depends on a preceding shift character, which might be back at the beginning of the sequence.

Varying size: the number of bytes to encode a character depends on the character.
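A minimal sketch of the three problems in Python 3, using codecs that ship with CPython. The specific encodings (ASCII, ISO-2022-JP, UTF-8) are illustrative choices, not ones named in the post.

```python
# One small demonstration per problem, using codecs bundled with CPython 3.
text = "Straße"

# Subsetting: ASCII has no code for 'ß', so the encoder refuses it.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    print("ascii cannot encode", repr(text[exc.start:exc.end]))

# Shifting: ISO-2022-JP reuses ASCII byte values. The pair b'%"' reads as
# two ASCII characters by default, but as KATAKANA LETTER A (U+30A2) after
# the ESC $ B shift sequence; ESC ( B shifts back to ASCII.
plain = b'%"'.decode("iso2022_jp")
shifted = b'\x1b$B%"\x1b(B'.decode("iso2022_jp")
print(repr(plain), repr(shifted))

# Varying size: UTF-8 spends 1 byte on 'S' but 2 bytes on 'ß'.
sizes = [len(ch.encode("utf-8")) for ch in text]
print(sizes, "total:", len(text.encode("utf-8")))
```

Note that to know what b'%"' means in the shifted case, a decoder has to have seen the escape sequence earlier in the stream, which is exactly what makes random access awkward.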

Both of the last two problems can turn O(1) operations, such as indexing the nth character, into O(n) operations. Python 3.3+ eliminates all these problems.
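To make the O(1)-versus-O(n) point concrete, here is a sketch assuming '3.3+' refers to the PEP 393 flexible string representation introduced in CPython 3.3, where every character of a given str is stored at the same width (1, 2 or 4 bytes, chosen by its widest character). `utf8_char_at` and `bytes_per_char` are hypothetical helpers; the second relies on CPython's `sys.getsizeof` accounting, an implementation detail other interpreters need not share.

```python
import sys


def utf8_char_at(data: bytes, n: int) -> str:
    """Return character n of UTF-8 `data` by scanning from the start.

    UTF-8 is variable-width, so the byte offset of character n is unknown
    until every earlier character has been walked: O(n) per lookup.
    """
    count = -1
    for pos, byte in enumerate(data):
        if byte & 0xC0 != 0x80:  # not a continuation byte (0b10xxxxxx)
            count += 1
            if count == n:
                end = pos + 1
                while end < len(data) and data[end] & 0xC0 == 0x80:
                    end += 1
                return data[pos:end].decode("utf-8")
    raise IndexError("character index out of range")


def bytes_per_char(ch: str) -> int:
    """CPython-specific: per-character storage width chosen for a string of `ch`."""
    return sys.getsizeof(ch * 101) - sys.getsizeof(ch * 100)


word = "Straße"
print(utf8_char_at(word.encode("utf-8"), 4))  # walks the bytes from the start
print(word[4])                                 # fixed width: offset arithmetic

for ch in ("a", "ß", "\u20ac", "\U0001F600"):
    print(repr(ch), bytes_per_char(ch))
```

Because the width is fixed within each string, `word[4]` is a multiply-and-add on the storage rather than a scan; the trade-off is that one astral character widens the whole string.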


--
Terry Jan Reedy

--
https://mail.python.org/mailman/listinfo/python-list
