Re: Handle foreign character web input

Richard Damon Sun, 30 Jun 2019 05:28:02 -0700

On 6/30/19 4:00 AM, moi wrote:
> On Saturday, 29 June 2019 19:25:40 UTC+2, Richard Damon wrote:
>>
>> Now (as I understand it), all Python (3) 'Strings' are internally
>> Unicode, if you need something with a different encoding it needs to be
>> in Bytes.
>>
>> -- 
> 
> Unfortunately not.
> 
> The only thing Python succeeds to propose is a mechanism
> which does the opposite of UTF-8 when it comes to handle
> memory *and* - at the same time - which also does the opposite
> of UTF-32 regarding performance.
> 
> For some other reasons, this mechanism leads to buggy
> code.
>


My understanding was that the Python 3 'String' class (str) always
holds Unicode (never a code-page encoding). If you index into a string,
you get at each location the full code point value of that character.
Now Unicode isn't just UTF-8 or UTF-32/UCS-4 or the like; those are just
different ways to encode Unicode code points into memory or a stream.
It may be that Python makes some awkward choices about how it stores the
characters in memory, but to the programmer, a string is just a sequence
of Unicode code points. If you specifically want something like a UTF-8
encoding, that is one of the uses of Bytes.
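
A small sketch of what I mean, assuming CPython 3 (the sample string and
variable names are just made up for illustration): indexing a str gives
whole code points, and a specific encoding such as UTF-8 only appears
once you convert to bytes.

```python
# Sample text with one 2-byte and one 3-byte UTF-8 character
# (U+00EF "i with diaeresis" and U+20AC "euro sign").
s = "na\u00efve \u20ac"

# Indexing and len() work on code points, not on encoded bytes.
code_points = [ord(ch) for ch in s]
print(len(s), hex(ord(s[2])), hex(ord(s[6])))

# An explicit encoding produces bytes; the length now depends on the encoding.
utf8 = s.encode("utf-8")
utf32 = s.encode("utf-32-le")  # little-endian, no BOM
print(len(utf8), len(utf32))

# Decoding the bytes gives back an equal str.
round_trip = utf8.decode("utf-8")
print(round_trip == s)
```

The str itself never tells you which encoding it "is in"; that question
only comes up at the bytes boundary (files, sockets, web input).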
-- 
https://mail.python.org/mailman/listinfo/python-list
