We seem to agree that this is work for linters. That's reasonable; I'd
generalize it to "tools and policies". But even so, discussing what we'd
expect linters to do is on topic here.
Perhaps we can even find ways for the language to support linters --
type checking is also left to external tools, but it has language support.
For example: should the parser emit a lightweight audit event if it
finds a non-ASCII identifier? (See below for why ASCII is special.)
Or for encoding declarations?
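To make the audit-event idea concrete, here is a rough sketch of what a tool could observe. CPython emits no such event today: the event name `example.non_ascii_identifier` and the helper function are made up for illustration, and the scan uses `tokenize` to stand in for the parser.

```python
import io
import sys
import tokenize

# Hypothetical event name; CPython does not define or emit this event.
EVENT = "example.non_ascii_identifier"


def audit_non_ascii_identifiers(source):
    """Raise one audit event per non-ASCII identifier found in `source`.

    Returns the (identifier, line number) pairs as well, for convenience.
    """
    found = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and not tok.string.isascii():
            sys.audit(EVENT, tok.string, tok.start[0])
            found.append((tok.string, tok.start[0]))
    return found


seen = []


def hook(event, args):
    # A linter or site policy could log, warn or raise here.
    if event == EVENT:
        seen.append(args)


sys.addaudithook(hook)
audit_non_ascii_identifiers("x = 1\nимя = 2\n")
```

After this runs, `seen` holds one entry for the Cyrillic name on line 2. The ASCII name `x` raises no event.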
On 03. 11. 21 6:26, Stephen J. Turnbull wrote:
> Serhiy Storchaka writes:
> > All control characters except CR, LF, TAB and FF are banned outside
> > comments and string literals. I think it is worth to ban them in
> > comments and string literals too.
> +1
> > > For homoglyphs/confusables, should there be a SyntaxWarning when an
> > > identifier looks like ASCII but isn't?
> >
> > It would virtually ban Cyrillic.
> +1 (for the comment and for the implied -1 on SyntaxWarning, let's
> keep the Cyrillic repertoire in Python!)
I don't think this would actually ban Cyrillic/Greek.
(My suggestion is not vanilla confusables detection; it might require
careful reading: "should there be a [linter] warning when an identifier
looks like ASCII but isn't?")
I am not a native speaker, but I did try a bit to find an actual
ASCII-like word in a language that uses Cyrillic. I didn't succeed; I
think they might be very rare.
Even if there were such a word -- or a one-letter abbreviation used as a
variable name -- it would be confusing to use. Removing the possibility
of confusion could *help* Cyrillic users. (I can't speak for them; this
is just a brainstorming idea.)
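A minimal sketch of the check I have in mind: flag an identifier only when it is not ASCII and yet *every* character in it is either ASCII or resembles an ASCII letter. The table below is a tiny hand-picked sample for illustration, not Unicode's confusables data, and the function name is made up.

```python
# Tiny illustrative table of letters that resemble ASCII letters.
# A real tool would need a far more complete data source.
LOOKALIKES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0443": "y",  # CYRILLIC SMALL LETTER U
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
    "\u03bd": "v",  # GREEK SMALL LETTER NU
}


def ascii_lookalike(name):
    """Return the ASCII string that `name` resembles, or None.

    None means either the name is plain ASCII, or it contains at least
    one character that does not look like an ASCII letter.
    """
    if name.isascii():
        return None
    out = []
    for ch in name:
        if ch.isascii():
            out.append(ch)
        elif ch in LOOKALIKES:
            out.append(LOOKALIKES[ch])
        else:
            return None  # clearly not ASCII-looking; leave it alone
    return "".join(out)
```

With this, `ascii_lookalike("sc\u043epe")` gives `"scope"`, while an ordinary Cyrillic word such as `"\u043f\u0440\u0438\u0432\u0435\u0442"` gives `None`, because most of its letters have no ASCII twin.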
> Steven adds:
> > Let's not enshrine as a language "feature" that non Western European
> > languages are dangerous second-class citizens.
That would be going too far, yes, but the fact is that non-English
languages *are* second-class citizens. Code that uses Python keywords
and the stdlib must use English, possibly alongside another language. It is the
mixing of languages that can be dangerous/confusing, not the languages
themselves.
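One way a linter could look for mixing rather than for any particular language is to check whether a single identifier combines letters from more than one script. This is only a sketch: the stdlib does not expose the Unicode Script property, so I am assuming the first word of each character's Unicode name is a good enough proxy, and both function names are made up.

```python
import unicodedata


def scripts_in(name):
    """Return the rough set of scripts used by the letters in `name`.

    Approximation: the first word of the character name ("LATIN",
    "CYRILLIC", "GREEK", ...) stands in for the real Script property.
    Digits and underscores are ignored.
    """
    scripts = set()
    for ch in name:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_mixed_script(name):
    """True if the identifier's letters come from more than one script."""
    return len(scripts_in(name)) > 1
```

An all-Latin or all-Cyrillic identifier passes. Something like `"sc\u043epe"`, with one Cyrillic letter among Latin ones, is reported as mixed.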
> > It is a work for linters,
> +1
> Aside from the reasons Serhiy presents, I'd rather not tie
> this kind of rather ambiguous improvement in Unicode handling to the
> release cycle.
> It might be worth having a pep9999 module/script in Python (perhaps
> more likely, PyPI but maintained by whoever does the work to make
> these improvements + Petr or somebody Petr trusts to do it), that
> lints scripts specifically for confusables and other issues.
If I have any say in it, the name definitely won't include a PEP number ;)
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/LB4O3YVDNVVNLYPMNH236QXGGUYG4BVI/