Re: [fpc-pascal] Parse unicode scalar

Nikolay Nikolov via fpc-pascal Mon, 03 Jul 2023 22:08:44 -0700


On 7/4/23 07:56, Hairy Pixels via fpc-pascal wrote:

On Jul 4, 2023, at 11:50 AM, Hairy Pixels <[email protected]> wrote:

You know, you're right: with properly enclosed patterns you can capture
everything inside and it works. You won't know whether your string contained
Unicode, though, but that depends on what's being parsed and whether you care
(I'm doing a TOML parser).

Sorry I'm still curious even though it's not my current problem :)

How can I make this program output the expected results:

   var
     w: widechar;
     a: array of widechar;
   begin
     for w in 'abc🐻' do
       a += [w];
     // Outputs 7 instead of 4
     writeln(length(a));
   end;

The user doesn't know about Unicode; they just want to get an array of
characters and not worry about all these little details. What can FPC do to
solve this problem?

Depends on what you need, but I suppose in this case you want to count
the number of extended grapheme clusters (a.k.a. "user perceived
characters" - how many character-like things are displayed on the
screen). You might be tempted to count the number of Unicode code
points, but that's not the same, due to the existence of combining
characters:


https://en.wikipedia.org/wiki/Combining_character
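
To make the difference concrete, here is a minimal sketch (not taken from any FPC unit) that counts code points in a UTF-8 string by skipping continuation bytes. The names Utf8CodePointCount, Bear and EAcute are made up for this illustration, and the strings are spelled out as explicit UTF-8 bytes so that the result does not depend on the source file encoding. It assumes the input is valid UTF-8.

```pascal
program CountCodePoints;
{$mode objfpc}{$H+}

{ Hypothetical helper: counts Unicode code points in a UTF-8 encoded
  byte string. Every byte that is not a continuation byte (10xxxxxx)
  starts a new code point. Assumes S is valid UTF-8. }
function Utf8CodePointCount(const S: RawByteString): SizeInt;
var
  I: SizeInt;
begin
  Result := 0;
  for I := 1 to Length(S) do
    if (Ord(S[I]) and $C0) <> $80 then
      Inc(Result);
end;

const
  { 'abc' followed by U+1F43B (bear face), encoded as F0 9F 90 BB }
  Bear = 'abc'#$F0#$9F#$90#$BB;
  { 'e' followed by U+0301 (combining acute accent), encoded as CC 81 }
  EAcute = 'e'#$CC#$81;

begin
  WriteLn(Length(Bear), ' bytes, ', Utf8CodePointCount(Bear), ' code points');
  WriteLn(Length(EAcute), ' bytes, ', Utf8CodePointCount(EAcute), ' code points');
end.
```

For the first string this should report 7 bytes and 4 code points, which is where the 7 in the question comes from. For the second it should report 3 bytes and 2 code points, even though only one character-like thing is displayed, so counting code points still isn't enough.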

For extended grapheme clusters, there's an iterator in the
graphemebreakproperty unit. I implemented this for the Unicode KVM and
FreeVision. There it's needed for figuring out how many character blocks
in the console will be needed to display a certain string. For the
console or other GUIs that use fixed width fonts, there's also the East
Asian Width property - some characters (East Asian - Chinese,
Japanese, Korean) take double the space. So, to figure out where to move
the cursor, you need to take East Asian Width into account as well.


Nikolay

_______________________________________________
fpc-pascal maillist  -  [email protected]
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
