DMD: invalid UTF character `\U0000d800`

Per Nordlöw via Digitalmars-d-learn Sat, 07 Nov 2020 08:15:29 -0800

I'm writing a parser generator for ANTLR grammars and have come across the rule


fragment Letter
    : [a-zA-Z$_] // these are below 0x7F
    | ~[\u0000-\u007F\uD800-\uDBFF] // covers all characters above 0x7F which are not a surrogate
    | [\uD800-\uDBFF] [\uDC00-\uDFFF] // covers UTF-16 surrogate pairs encodings for U+10000 to U+10FFFF


at

https://github.com/antlr/grammars-v4/blob/master/cto/CtoLexer.g4#L158

This rule is converted into

    Match m__Letter()
    {
        return alt(alt(rng('a', 'z'), rng('A', 'Z'), ch('$'), ch('_')),
                   not(alt(rng('\u0000', '\u007F'), rng('\uD800', '\uDBFF'))),
                   seq(rng('\uD800', '\uDBFF'), rng('\uDC00', '\uDFFF')));
    }

given suitable defs of alt, rng, seq, not.

This errors as

    CtoLexer_parser.d 665 57 error invalid UTF character \U0000d800
    CtoLexer_parser.d 665 67 error invalid UTF character \U0000dbff
    CtoLexer_parser.d 666 28 error invalid UTF character \U0000d800
    CtoLexer_parser.d 666 38 error invalid UTF character \U0000dbff
    CtoLexer_parser.d 666 53 error invalid UTF character \U0000dc00
    CtoLexer_parser.d 666 63 error invalid UTF character \U0000dfff
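
As a possible workaround, here is a minimal sketch that spells the surrogate bounds as integer code points instead of character literals. It assumes that only the `\u` escapes are rejected and that plain integer comparisons against a `dchar` are unaffected. The helper names `inRange`, `isHighSurrogate`, and `isLowSurrogate` are hypothetical and not part of the generated parser.

```d
// Hypothetical helpers, not taken from CtoLexer_parser.d.

/// True if `c` lies in the inclusive code point range [lo, hi].
bool inRange(dchar c, uint lo, uint hi) pure nothrow @nogc @safe
{
    return lo <= c && c <= hi;
}

/// True if `c` is a UTF-16 high (leading) surrogate, 0xD800 to 0xDBFF.
bool isHighSurrogate(dchar c) pure nothrow @nogc @safe
{
    return inRange(c, 0xD800, 0xDBFF);
}

/// True if `c` is a UTF-16 low (trailing) surrogate, 0xDC00 to 0xDFFF.
bool isLowSurrogate(dchar c) pure nothrow @nogc @safe
{
    return inRange(c, 0xDC00, 0xDFFF);
}

// Writing '\uD800' here would trigger the error quoted above, so the
// surrogate values are produced by casting integers (an assumption of
// this sketch, not something the generator currently does).
static assert(isHighSurrogate(cast(dchar) 0xD800));
static assert(!isLowSurrogate(cast(dchar) 0xD800));
static assert(isLowSurrogate(cast(dchar) 0xDFFF));
static assert(!isHighSurrogate('\uD7FF')); // last code point below the surrogate block
```

The checks are `static assert`s, so they are evaluated at compile time and the module needs no `main` to exercise them. Whether the generated `rng` calls could take integer bounds in the same way depends on how `rng` is defined.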


Doesn't DMD support these Unicode code points yet?
