On 03.11.2021 01:21, Chris Angelico wrote:
> On Wed, Nov 3, 2021 at 11:09 AM Steven D'Aprano <st...@pearwood.info> wrote:
>>
>> On Wed, Nov 03, 2021 at 03:03:54AM +1100, Chris Angelico wrote:
>>> On Wed, Nov 3, 2021 at 1:06 AM Petr Viktorin <encu...@gmail.com> wrote:
>>>> Let me know if it's clear in the newest version, with this note:
>>>>
>>>>> Here, ``encoding: unicode_escape`` in the initial comment is an encoding
>>>>> declaration. The ``unicode_escape`` encoding instructs Python to treat
>>>>> ``\u0027`` as a single quote (which can start/end a string), ``\u002c`` as
>>>>> a comma (punctuator), etc.
>>>>
>>>
>>> Huh. Is that level of generality actually still needed? Can Python
>>> deprecate all but a small handful of encodings?
>>
>> To be clear, are you proposing to deprecate the encodings *completely*
>> or just as the source code encoding?
> 
> Only source code encodings. Obviously we still need to be able to cope
> with all manner of *data*, but Python source code shouldn't need to be
> in bizarre, weird encodings.
> 
> (Honestly, I'd love to just require that Python source code be UTF-8,
> but that would probably cause problems, so mandating that it be one of
> a small set of encodings would be a safer option.)

Most Python code will be written in UTF-8 going forward, but there's
still a lot of code out there in other encodings. Limiting source
code encodings to some reduced set doesn't really make sense, since
it's not clear where to draw the line.

Coming back to the thread topic, many of the Unicode security
considerations don't apply to non-Unicode encodings, since those
usually don't support e.g. changing the bidi direction within a
stream of text or other interesting features you have in Unicode
such as combining code points, invisible (space) code points, font
rendering hint code points, etc.

So in a sense, those non-Unicode encodings are safer than
using UTF-8 :-)

Please also note that most character lookalikes are not encoding
issues, but instead font issues, which then result in the characters
looking similar.

There are fonts which are designed to avoid this, and it's no surprise
that fonts intended for source code typically make e.g. 0 and O, as
well as 1 and l, look sufficiently different to tell them apart.

Things get a lot harder when dealing with combining characters, since
it's not always easy to spot the added diacritics, e.g. try
this:

>>> print('a\u0348bc') # combining double vertical line below (strong articulation)
a͈bc
>>> print('a\u034Fbc') # combining grapheme joiner
a͏bc

The latter is only "visible" in the unicode_escape encoding:

>>> print ('a\u034Fbc'.encode('unicode_escape'))
b'a\\u034fbc'
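
Another way to make such code points visible is to ask the unicodedata
module for their names and categories. The following is only a minimal
sketch; the helper name describe_non_ascii is made up for illustration
and is not part of any library:

```python
import unicodedata

def describe_non_ascii(text):
    """Return (code point, category, name) for each non-ASCII character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch),
         unicodedata.name(ch, "<unnamed>"))
        for ch in text
        if ord(ch) > 127
    ]

# Hypothetical usage with the grapheme joiner example from above.
for item in describe_non_ascii("a\u034Fbc"):
    print(item)
# ('U+034F', 'Mn', 'COMBINING GRAPHEME JOINER')
```

Category "Mn" (nonspacing mark) is what marks both examples as combining
characters, even when nothing shows up on screen.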

Projects wanting to limit source code encodings, or to disallow bidi
markers and other special code points in source code, can easily
do this via e.g. pre-commit hooks, special editor settings, code
linters or security scanners.
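
As a rough sketch of what such a hook could check for the encoding part:
tokenize.detect_encoding() from the standard library reports the encoding
Python would use for a source file. The allowlist and the function names
below are assumptions for illustration, not an existing tool:

```python
import io
import tokenize

# Hypothetical project policy (an assumption, adjust as needed).
ALLOWED_ENCODINGS = {"utf-8", "utf-8-sig"}

def source_encoding(source_bytes):
    """Return the encoding Python would pick for these source bytes."""
    readline = io.BytesIO(source_bytes).readline
    encoding, _ = tokenize.detect_encoding(readline)
    return encoding

def encoding_allowed(source_bytes):
    """True if the detected source encoding is on the project allowlist."""
    return source_encoding(source_bytes) in ALLOWED_ENCODINGS

# Hypothetical usage: no declaration means the UTF-8 default applies,
# while an explicit latin-1 declaration is rejected by this policy.
print(encoding_allowed(b"x = 1\n"))                                # True
print(encoding_allowed(b"# -*- coding: latin-1 -*-\nx = 1\n"))     # False
```

A real hook would read each staged file in binary mode and fail the
commit when encoding_allowed() returns False.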

I don't think limiting the source code encoding is the right approach
to making code more secure. Instead, tooling has to be used to detect
potentially malicious code points in code.
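
To give an idea of what such tooling could look like, here is a minimal
sketch of a scanner which flags bidi controls, invisible format
characters and combining marks. The names and the choice of categories
are assumptions for illustration; a real linter would need a more careful
policy, e.g. to allow legitimate combining marks in string literals:

```python
import unicodedata

# Explicit bidi controls: U+202A..U+202E and U+2066..U+2069.
BIDI_CONTROLS = {
    chr(cp) for cp in (*range(0x202A, 0x202F), *range(0x2066, 0x206A))
}

def suspicious_code_points(source):
    """Return (line, column, code point, reason) tuples worth a review."""
    findings = []
    for lineno, line in enumerate(source.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            category = unicodedata.category(ch)
            if ch in BIDI_CONTROLS:
                reason = "bidi control"
            elif category == "Cf":
                reason = "invisible format character"
            elif category in ("Mn", "Me"):
                reason = "combining mark"
            else:
                continue
            findings.append((lineno, col, f"U+{ord(ch):04X}", reason))
    return findings

# Hypothetical usage on a two-line snippet.
sample = 'x = "a\u202eb"\ny = "a\u034fbc"\n'
for finding in suspicious_code_points(sample):
    print(finding)
# (1, 7, 'U+202E', 'bidi control')
# (2, 7, 'U+034F', 'combining mark')
```

This works on decoded text, so it is independent of the source code
encoding the file was stored in.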

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Experts (#1, Nov 03 2021)
