We seem to agree that this is work for linters. That's reasonable; I'd generalize it to "tools and policies". But even so, discussing what we'd expect linters to do is on topic here. Perhaps we can even find ways for the language to support linters -- type checking is also left to external tools, yet it has language support.

For example: should the parser emit a lightweight audit event if it finds a non-ASCII identifier? (See below for why ASCII is special.)
Or if it finds an encoding declaration?
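
CPython's parser raises no such events today, so here is a minimal sketch of the idea emulated from the outside. It is a linter-style function that tokenizes a file with the stdlib `tokenize` module and calls `sys.audit()` (available since Python 3.8) when it sees an encoding declaration or a non-ASCII identifier. The event names `hypothetical.encoding_declaration` and `hypothetical.nonascii_identifier` and the function `audit_source` are made up for illustration. The cookie regex follows the shape given in PEP 263.

```python
import io
import re
import sys
import tokenize

# Encoding-declaration pattern in the shape given by PEP 263.
CODING_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)")

# HYPOTHETICAL event names: CPython's parser raises no such audit events.
EVENT_CODING = "hypothetical.encoding_declaration"
EVENT_IDENTIFIER = "hypothetical.nonascii_identifier"


def audit_source(source: bytes, filename: str = "<string>") -> None:
    """Raise audit events for an encoding declaration and non-ASCII names."""
    # detect_encoding() reads at most the two lines PEP 263 looks at.
    _encoding, header_lines = tokenize.detect_encoding(io.BytesIO(source).readline)
    for line in header_lines:
        match = CODING_RE.match(line)
        if match:
            sys.audit(EVENT_CODING, filename, match.group(1).decode("ascii"))
            break

    # Identifiers come out of the tokenizer as NAME tokens (already decoded).
    for tok in tokenize.tokenize(io.BytesIO(source).readline):
        if tok.type == tokenize.NAME and not tok.string.isascii():
            sys.audit(EVENT_IDENTIFIER, filename, tok.string, tok.start)
```

A policy tool would subscribe to these events with `sys.addaudithook()`, the same way it does for CPython's built-in audit events.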

On 03. 11. 21 6:26, Stephen J. Turnbull wrote:
Serhiy Storchaka writes:

  > All control characters except CR, LF, TAB and FF are banned outside
  > comments and string literals. I think it is worth banning them in
  > comments and string literals too.

+1
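
For illustration, a linter could enforce that ban today without any change to the language. This is a minimal sketch, assuming "control character" means Unicode category Cc; the helper name `control_chars_in_literals` is hypothetical. It walks the token stream and reports disallowed control characters inside COMMENT and STRING tokens, plus the literal parts of f-strings and t-strings on Python versions that tokenize those separately.

```python
import io
import tokenize
import unicodedata

# CR, LF, TAB and FF are the control characters Python already accepts in
# code; any other character in Unicode category "Cc" gets reported, even
# inside comments and string literals.
ALLOWED_CONTROLS = {"\r", "\n", "\t", "\f"}

# Token types that carry comment or string-literal text. FSTRING_MIDDLE
# exists on Python 3.12+, TSTRING_MIDDLE on 3.14+.
LITERAL_TYPES = {tokenize.COMMENT, tokenize.STRING}
for _name in ("FSTRING_MIDDLE", "TSTRING_MIDDLE"):
    if hasattr(tokenize, _name):
        LITERAL_TYPES.add(getattr(tokenize, _name))


def control_chars_in_literals(source: str):
    """Return (token start, code point) for each disallowed control char."""
    findings = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type not in LITERAL_TYPES:
            continue
        for ch in tok.string:
            if unicodedata.category(ch) == "Cc" and ch not in ALLOWED_CONTROLS:
                findings.append((tok.start, f"U+{ord(ch):04X}"))
    return findings
```

Reporting the token's start position (rather than the exact column of the character) keeps the sketch simple for multi-line strings.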

  > > For homoglyphs/confusables, should there be a SyntaxWarning when an
  > > identifier looks like ASCII but isn't?
  >
  > It would virtually ban Cyrillic.

+1 (for the comment and for the implied -1 on SyntaxWarning, let's
keep the Cyrillic repertoire in Python!)

I don't think this would actually ban Cyrillic/Greek.
(My suggestion is not vanilla confusables detection; it might require careful reading: "should there be a [linter] warning when an identifier looks like ASCII but isn't?")

I am not a native speaker, but I did try a bit to find an actual ASCII-like word in a language that uses Cyrillic. I didn't succeed; I suspect such words are very rare. Even if there were such a word -- or a one-letter abbreviation used as a variable name -- it would be confusing to use. Removing the possibility of confusion could *help* Cyrillic users. (I can't speak for them; this is just a brainstorming idea.)
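
To make the "looks like ASCII but isn't" idea concrete, here is a minimal sketch. It maps each character of a non-ASCII identifier to an ASCII look-alike and warns only when the *whole* identifier maps to ASCII, so an ordinary Cyrillic word like привет passes. The `ASCII_LOOKALIKES` table is a small hand-picked sample chosen for illustration, not the Unicode confusables data (UTS #39) a real tool would use, and the function names are hypothetical.

```python
import io
import tokenize
import unicodedata

# Small hand-picked SAMPLE of Cyrillic and Greek letters that render like
# ASCII letters. A real tool would use the Unicode confusables data.
ASCII_LOOKALIKES = {
    "\u0430": "a", "\u0435": "e", "\u043e": "o", "\u0440": "p",
    "\u0441": "c", "\u0443": "y", "\u0445": "x", "\u0456": "i",
    "\u0410": "A", "\u0412": "B", "\u0415": "E", "\u041a": "K",
    "\u041c": "M", "\u041d": "H", "\u041e": "O", "\u0420": "P",
    "\u0421": "C", "\u0422": "T", "\u0425": "X",
    "\u03bf": "o", "\u0391": "A", "\u0392": "B", "\u0395": "E",
    "\u039f": "O", "\u03a1": "P",
}


def ascii_lookalike(name: str):
    """Return the ASCII string `name` could be mistaken for, or None."""
    if name.isascii():
        return None  # genuinely ASCII: nothing to warn about
    # Python compares identifiers after NFKC normalization, so do the same.
    normalized = unicodedata.normalize("NFKC", name)
    skeleton = "".join(ASCII_LOOKALIKES.get(ch, ch) for ch in normalized)
    return skeleton if skeleton.isascii() else None


def lookalike_warnings(source: str):
    """Return (line, identifier, ASCII look-alike) for suspicious names."""
    warnings = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME:
            target = ascii_lookalike(tok.string)
            if target is not None:
                warnings.append((tok.start[0], tok.string, target))
    return warnings
```

Because a single unmapped letter makes the skeleton non-ASCII, real Cyrillic or Greek words are left alone; only names that could be misread as ASCII are flagged.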

Steven adds:
Let's not enshrine as a language "feature" that non-Western-European languages are dangerous second-class citizens.

That would be going too far, yes, but the fact is that non-English languages *are* second-class citizens. Code that uses Python keywords and the stdlib must use English, and possibly another language as well. It is the mixing of languages that can be dangerous or confusing, not the languages themselves.
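
Since it is the mixing that is risky, a linter could also flag identifiers that combine letters from more than one script. A rough sketch follows. The stdlib `unicodedata` module does not expose the Unicode Script property, so the script is guessed from the first word of each letter's Unicode name (e.g. "CYRILLIC SMALL LETTER A"), which is only a heuristic. It would also flag legitimate mixes such as Japanese Kanji plus Hiragana, which UTS #39 treats specially. The function names are hypothetical.

```python
import io
import tokenize
import unicodedata


def rough_script(ch: str) -> str:
    # HEURISTIC: first word of the character name, e.g.
    # "CYRILLIC SMALL LETTER A" -> "CYRILLIC". Not the real Script property.
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def mixed_script_identifiers(source: str):
    """Return (identifier, line, sorted scripts) for mixed-script names."""
    findings = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.NAME or tok.string.isascii():
            continue
        # Only letters count; digits and "_" are shared by every script.
        scripts = {
            rough_script(ch)
            for ch in tok.string
            if unicodedata.category(ch).startswith("L")
        }
        if len(scripts) > 1:
            findings.append((tok.string, tok.start[0], sorted(scripts)))
    return findings
```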

  > It is work for linters,

+1

Aside from the reasons Serhiy presents, I'd rather not tie
this kind of somewhat ambiguous improvement in Unicode handling to the
release cycle.

It might be worth having a pep9999 module/script in Python (or,
perhaps more likely, on PyPI, maintained by whoever does the work to make
these improvements plus Petr or somebody Petr trusts to do it) that
lints scripts specifically for confusables and other issues.

If I have any say in it, the name definitely won't include a PEP number ;)
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/LB4O3YVDNVVNLYPMNH236QXGGUYG4BVI/