On Tue, Mar 22, 2016 at 12:48 AM, Steven D'Aprano <st...@pearwood.info> wrote:
> On Mon, 21 Mar 2016 11:59 pm, Chris Angelico wrote:
>
>> On Mon, Mar 21, 2016 at 11:34 PM, BartC <b...@freeuk.com> wrote:
>>> For Python I would have used a table of 0..255 functions, indexed by the
>>> ord() code of each character. So all 52 letter codes map to the same
>>> name-handling function. (No Dict is needed at this point.)
>>>
>>
>> Once again, you forget that there are not 256 characters - there are
>> 1114112. (Give or take.)
>
> Pardon me, do I understand you correctly? You're saying that the C parser is
> Unicode-aware and allows you to use Unicode in C source code? Because
> Bart's test is for a (simplified?) C tokeniser, and expecting his tokeniser
> to support character sets that C does not would be, well, Not Cricket, my
> good chap.

We nutted part of this out earlier in the thread; Python 3.x source is
defined to be Unicode, and any other modern language's should be too.
(And yes, MRAB, I'm aware that only a tiny fraction of codepoints are
defined; it's still a lot more than 256, and it's going to make for a
far larger lookup table.) While you could plausibly define that your
source code consists only of ASCII - say, the printable characters
plus a few controls (9, 10, 13, 32-126) - it is an extremely bad idea
to declare that it has exactly 256 possibilities: that shackles your
language to a parser definition that includes both more (byte values
with no agreed meaning) and less (the rest of Unicode) than people
will expect.
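
Just to illustrate (a rough sketch of my own, not Bart's actual code,
and all names here are made up): you can keep the table-dispatch idea
without pretending the alphabet stops at 256 - a small table for the
ASCII range, and one fallback for everything above it:

    # Hypothetical sketch: list dispatch for codepoints < 128, with a
    # single fallback for the other ~1.1 million codepoints, instead
    # of one giant table covering all of Unicode.
    def handle_name(ch): return 'NAME'
    def handle_digit(ch): return 'NUMBER'
    def handle_other(ch): return 'OP'

    def handle_nonascii(ch):
        # Python 3 itself allows non-ASCII letters in identifiers.
        return 'NAME' if ch.isidentifier() else 'OP'

    # Build the ASCII dispatch table once, at startup.
    DISPATCH = [handle_other] * 128
    for c in 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_':
        DISPATCH[ord(c)] = handle_name
    for c in '0123456789':
        DISPATCH[ord(c)] = handle_digit

    def classify(ch):
        cp = ord(ch)
        return DISPATCH[cp](ch) if cp < 128 else handle_nonascii(ch)

    print(classify('x'))  # NAME
    print(classify('7'))  # NUMBER
    print(classify('λ'))  # NAME - no 256-entry assumption needed

The table stays tiny, and the fallback is where you'd hook in real
Unicode character-class checks.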

ChrisA
