[issue46572] Unicode identifiers not necessarily unique

2022-01-29 Thread Diego Argueta


Diego Argueta added the comment:

I did read PEP-3131 before posting this but I still thought the behavior was 
counterintuitive.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46572] Unicode identifiers not necessarily unique

2022-01-29 Thread Eryk Sun

Eryk Sun added the comment:

Please read "Identifiers and keywords" [1] in the documentation. For example:

>>> import unicodedata as ud
>>> ud.normalize('NFKC', '𝖇𝖆𝖗') == 'bar'
True

>>> c = '\N{CYRILLIC SMALL LETTER A}'
>>> ud.name(ud.normalize('NFKC', c))
'CYRILLIC SMALL LETTER A'

---
[1] 
https://docs.python.org/3/reference/lexical_analysis.html?highlight=nfkc#identifiers

--
nosy: +eryksun
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed




[issue46572] Unicode identifiers not necessarily unique

2022-01-29 Thread Pablo Galindo Salgado


Pablo Galindo Salgado added the comment:

This seems coherent with https://www.python.org/dev/peps/pep-3131/ to me. The 
parser ensures all identifiers are converted into the normal form NFKC while 
parsing; comparison of identifiers is based on NFKC.
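
A minimal sketch of that normalization step, assuming CPython's ast module exposes the same identifier the parser stores. The helper name parsed_name and the variable names are illustrative and do not come from the thread.

```python
import ast
import unicodedata

# "bar" spelled with MATHEMATICAL BOLD FRAKTUR SMALL B, A, R
# (U+1D587, U+1D586, U+1D597).
styled = "\U0001D587\U0001D586\U0001D597"


def parsed_name(source: str) -> str:
    """Return the identifier stored in the AST for a bare-name expression."""
    node = ast.parse(source, mode="eval").body
    if not isinstance(node, ast.Name):
        raise ValueError("expected a single identifier")
    return node.id


# The name recorded by the parser is the NFKC form, not the text as typed.
print(parsed_name(styled))                             # bar
print(unicodedata.normalize("NFKC", styled) == "bar")  # True

# Assigning through the styled spelling binds the plain name "bar".
namespace = {}
exec(styled + " = 0", namespace)
print("bar" in namespace)  # True
```

Because the rewrite happens while parsing, the styled spelling never reaches the namespace; only its NFKC form does.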

--




[issue46572] Unicode identifiers not necessarily unique

2022-01-29 Thread Diego Argueta

New submission from Diego Argueta:

The way Python 3 handles identifiers containing mathematical characters appears 
to be broken. I didn't test the entire range of U+1D400 through U+1D59F but I 
spot-checked them and the bug manifests itself there:

Python 3.9.7 (default, Sep 10 2021, 14:59:43) 
[GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> foo = 1234567890
>>> bar = 1234567890
>>> foo is bar
False
>>> 𝖇𝖆𝖗 = 1234567890

>>> foo is 𝖇𝖆𝖗
False
>>> bar is 𝖇𝖆𝖗
True

>>> 𝖇𝖆𝖗 = 0
>>> bar
0
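
The spot check above could be extended to the whole range with a short scan. This is only a sketch: the function name ascii_collisions and the filtering rule (NFKC result differs from the input, is ASCII, and is a valid identifier) are assumptions made for illustration, not something taken from the report.

```python
import unicodedata


def ascii_collisions(first: int = 0x1D400, last: int = 0x1D59F) -> dict:
    """Map each code point in [first, last] to the ASCII identifier
    it becomes under NFKC, skipping code points that do not change."""
    collisions = {}
    for code_point in range(first, last + 1):
        ch = chr(code_point)
        folded = unicodedata.normalize("NFKC", ch)
        # Unassigned code points normalize to themselves and are skipped.
        if folded != ch and folded.isascii() and folded.isidentifier():
            collisions[ch] = folded
    return collisions


collisions = ascii_collisions()
print(len(collisions), "code points in the range fold to an ASCII identifier")
print(collisions["\U0001D587"])  # b
```

The exact count depends on the Unicode version bundled with the interpreter, so it is printed rather than assumed.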


This differs from the behavior with other non-ASCII characters. For example, 
ASCII 'a' and Cyrillic 'a' are properly treated as different identifiers:

>>> а = 987654321  # Cyrillic lowercase 'a', U+0430
>>> a = 123456789  # ASCII 'a'
>>> а  # Cyrillic
987654321
>>> a  # ASCII
123456789
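
One way to see why the two cases differ is to look at the decomposition field in the Unicode database: the mathematical letters carry a compatibility mapping to ASCII, while the Cyrillic letter has none. A small sketch using the standard unicodedata module (the helper name folds_to_ascii is made up for this example):

```python
import unicodedata

fraktur_a = "\U0001D586"  # MATHEMATICAL BOLD FRAKTUR SMALL A
cyrillic_a = "\u0430"     # CYRILLIC SMALL LETTER A

# A "<font>" tag marks a compatibility mapping, which NFKC applies.
print(repr(unicodedata.decomposition(fraktur_a)))   # '<font> 0061'
# An empty string means there is nothing for NFKC to rewrite.
print(repr(unicodedata.decomposition(cyrillic_a)))  # ''


def folds_to_ascii(name: str) -> bool:
    """True if NFKC changes the name and the result is pure ASCII."""
    folded = unicodedata.normalize("NFKC", name)
    return folded != name and folded.isascii()


print(folds_to_ascii(fraktur_a))   # True
print(folds_to_ascii(cyrillic_a))  # False
```

So the Cyrillic letter stays a distinct identifier because normalization leaves it untouched, not because it is handled by a separate rule.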


While a bit of a pathological case, it is a nasty surprise. It's possible this 
is a symptom of a larger bug in the way identifiers are resolved.

This is similar but not identical to https://bugs.python.org/issue46555

Note: I did not find this myself; I give credit to Cooper Stimson 
(https://github.com/6C1) for finding this bug. I merely reported it.

--
components: Parser, Unicode
messages: 412084
nosy: da, ezio.melotti, lys.nikolaou, pablogsal, vstinner
priority: normal
severity: normal
status: open
title: Unicode identifiers not necessarily unique
type: behavior
versions: Python 3.7, Python 3.8, Python 3.9
