> On Jul 4, 2023, at 11:45 AM, Nikolay Nikolov via fpc-pascal 
> <fpc-pascal@lists.freepascal.org> wrote:
> 
> But you just don't need to do this, in order to tokenize Pascal. The 
> beginning and the end of the string literal is the apostrophe, which is 
> ASCII. The bear is a sequence of UTF-8 code units (opaque to the compiler), 
> that will not be mistaken for an apostrophe, or end of line, because they 
> will have their high bit set. There's simply no need for a Pascal tokenizer 
> to iterate over UTF-8 code points, instead of code units.

You know, you're right: with properly delimited patterns you can capture 
everything inside and it works. You won't know whether the string contained 
Unicode, though; whether that matters depends on what's being parsed 
(I'm writing a TOML parser).
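
To make the byte-wise idea concrete, here is a minimal sketch, in Python for 
brevity rather than Pascal. The names scan_string_literal and has_non_ascii 
are made up for illustration and do not come from FPC or any existing parser. 
It compares only against ASCII bytes (apostrophe, CR, LF) and copies 
everything else through untouched. If it matters whether the string held 
non-ASCII text, checking the captured bytes for a set high bit afterwards 
should be enough.

```python
from typing import Tuple

APOSTROPHE = 0x27          # ASCII '
LINE_ENDS = (0x0A, 0x0D)   # LF, CR


def scan_string_literal(src: bytes, start: int) -> Tuple[bytes, int]:
    """Scan a Pascal-style literal whose opening apostrophe is at src[start].

    Returns (raw body bytes, index just past the closing apostrophe).
    Works purely on UTF-8 code units; nothing is decoded.
    """
    if start >= len(src) or src[start] != APOSTROPHE:
        raise ValueError("expected opening apostrophe")
    body = bytearray()
    i = start + 1
    while i < len(src):
        b = src[i]
        if b == APOSTROPHE:
            # A doubled apostrophe stands for one literal apostrophe.
            if i + 1 < len(src) and src[i + 1] == APOSTROPHE:
                body.append(APOSTROPHE)
                i += 2
                continue
            return bytes(body), i + 1
        if b in LINE_ENDS:
            raise ValueError("unterminated string literal")
        # Bytes >= 0x80 belong to multi-byte sequences and can never
        # equal an ASCII delimiter, so they are copied as opaque data.
        body.append(b)
        i += 1
    raise ValueError("unterminated string literal")


def has_non_ascii(body: bytes) -> bool:
    """True if any byte has its high bit set, i.e. the text is not pure ASCII."""
    return any(b >= 0x80 for b in body)
```

For example, scanning the bytes of x := 'caf\u00e9 it''s'; from the first 
apostrophe yields the body bytes for "café it's" and leaves the index at the 
semicolon.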

Maybe I can skip that part and just focus on decoding the Unicode scalars.
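
For the decoding step, a rough sketch of what I have in mind, again in Python 
only to keep it short. The function name decode_utf8_scalars is hypothetical, 
and this is one straightforward way to do it, not how FPC or any particular 
TOML library does it. It turns UTF-8 bytes into scalar values and rejects 
stray continuation bytes, truncated sequences, overlong encodings, surrogates 
and values above U+10FFFF.

```python
def decode_utf8_scalars(data: bytes):
    """Yield Unicode scalar values from UTF-8 bytes.

    Raises ValueError on malformed input.
    """
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        # The lead byte decides how many continuation bytes follow and the
        # smallest value that length may legally encode (overlong check).
        if b0 < 0x80:
            cp, need, min_cp = b0, 0, 0x0
        elif 0xC0 <= b0 < 0xE0:
            cp, need, min_cp = b0 & 0x1F, 1, 0x80
        elif 0xE0 <= b0 < 0xF0:
            cp, need, min_cp = b0 & 0x0F, 2, 0x800
        elif 0xF0 <= b0 < 0xF8:
            cp, need, min_cp = b0 & 0x07, 3, 0x10000
        else:
            # 0x80..0xBF is a stray continuation byte; 0xF8..0xFF is never valid.
            raise ValueError(f"invalid lead byte at offset {i}")
        if i + need >= n and need > 0:
            raise ValueError(f"truncated sequence at offset {i}")
        for k in range(1, need + 1):
            c = data[i + k]
            if c & 0xC0 != 0x80:
                raise ValueError(f"invalid continuation byte at offset {i + k}")
            cp = (cp << 6) | (c & 0x3F)
        if cp < min_cp:
            raise ValueError(f"overlong encoding at offset {i}")
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"not a Unicode scalar value at offset {i}")
        yield cp
        i += need + 1
```

Since it is a generator, errors only surface once the values are consumed, 
e.g. via list(decode_utf8_scalars(body)).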

Regards,
Ryan Joseph

_______________________________________________
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
