Guido van Rossum wrote:
>> The distinction of letters and digits is also straight-forward:
>> a digit is ASCII [0-9]; it's a separate lexical class only
>> because it plays a special role in (number) literals. More
>> generally, there is the distinction of starter and non-starter
>> characters.
>
> But Unicode has many alternative sets of digits for which "isdigit" is true.

You mean the Python isdigit() method? Sure, but the tokenizer uses the
C isdigit function, which returns true only for [0-9]. FWIW, POSIX allows
6 alternative characters to be defined as hexdigits for isxdigit, so
the tokenizer shouldn't really use isxdigit for hexadecimal literals.

So from the implementation point of view, nothing much would have to
change: the use of isalnum in the tokenizer is already wrong, as it
already allows non-ASCII characters in identifiers if the locale
classifies them as alphanumeric.

I can't see why the Unicode notion of digits should affect the language
specification in any way. The notion of digit is only used to define
what number literals are, and I don't propose to change the lexical
rules for number literals; I propose to change the rules for identifiers.

> You can as far as the lexer is concerned because the lexer treats
> keywords as "just" identifiers. Only the parser knows which ones are
> really keywords.

Right. But if the identifier syntax were

  [:identifier_start:][:identifier_cont:]*

then things would work out just fine: identifier_start intersected with
ASCII would be [A-Za-z_], and identifier_cont intersected with ASCII
would be [A-Za-z0-9_]; this would include all keywords. You would still
need punctuation between two subsequent "identifiers", and that
punctuation would have to be ASCII, as non-ASCII characters would be
restricted to comments, string literals, and identifiers.

Regards,
Martin
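
To illustrate the identifier rule discussed above, here is a minimal sketch in Python. The post does not define identifier_start or identifier_cont, so the Unicode general-category sets below are an assumption made purely for illustration, and the helper names (is_ascii_digit, identifier_start, identifier_cont, is_identifier) are hypothetical. The sketch checks that both classes, intersected with ASCII, reduce to [A-Za-z_] and [A-Za-z0-9_], and that str.isdigit() accepts digits the ASCII-only rule for number literals would reject.

```python
import keyword
import string
import unicodedata

# Assumed category sets; the post leaves the two classes undefined.
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
CONT_CATEGORIES = START_CATEGORIES | {"Mn", "Mc", "Nd", "Pc"}


def is_ascii_digit(ch):
    """Digit as used in number literals: a single character in [0-9]."""
    return len(ch) == 1 and ch in string.digits


def identifier_start(ch):
    """Hypothetical starter class: underscore or a letter-like character."""
    return ch == "_" or unicodedata.category(ch) in START_CATEGORIES


def identifier_cont(ch):
    """Hypothetical continuation class: starters plus marks, digits, connectors."""
    return identifier_start(ch) or unicodedata.category(ch) in CONT_CATEGORIES


def is_identifier(s):
    """Match [:identifier_start:][:identifier_cont:]* against the whole string."""
    return bool(s) and identifier_start(s[0]) and all(
        identifier_cont(c) for c in s[1:]
    )


# Intersect both classes with ASCII (code points 0..127).
ascii_chars = [chr(i) for i in range(128)]
ascii_start = {c for c in ascii_chars if identifier_start(c)}
ascii_cont = {c for c in ascii_chars if identifier_cont(c)}

# ARABIC-INDIC DIGIT THREE: a Unicode digit, but not an ASCII one.
arabic_three = "\u0663"
```

Under these assumptions, ascii_start equals set(string.ascii_letters + "_"), ascii_cont adds string.digits, and every entry of keyword.kwlist satisfies is_identifier, so keywords remain ordinary identifiers as far as the lexer is concerned.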
