Hi Ryan,

I’ve created the attached unit, which takes a code point and returns the UTF-8 character as a string.
It’s based on the Wikipedia article on UTF-8.

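UTF-8 encodes code points in one to four bytes, depending on the value of the code point. The x characters are replaced by the bits of the code point:

First code point   Last code point   Byte 1     Byte 2     Byte 3     Byte 4
U+0000             U+007F            0xxxxxxx
U+0080             U+07FF            110xxxxx   10xxxxxx
U+0800             U+FFFF            1110xxxx   10xxxxxx   10xxxxxx
U+10000            U+10FFFF          11110xxx   10xxxxxx   10xxxxxx   10xxxxxx

This table is copied from Wikipedia.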
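
For anyone reading without the attachment, here is a minimal sketch of the one-to-four-byte layout described above, assuming Free Pascal 3.x in objfpc mode. The name CodePointToUtf8 is hypothetical and chosen only for illustration; this is not the code from uencoding.pas, just one way the bit packing could be written.

```pascal
program Utf8Sketch;
{$mode objfpc}{$H+}

// Hypothetical helper for illustration; NOT the contents of uencoding.pas.
// Returns the UTF-8 bytes of code point cp, or '' if cp cannot be encoded.
function CodePointToUtf8(cp: Cardinal): RawByteString;
var
  n, i: Integer;
begin
  Result := '';
  // Surrogates and values above U+10FFFF are not valid scalar values.
  if (cp > $10FFFF) or ((cp >= $D800) and (cp <= $DFFF)) then
    Exit;

  // Pick the sequence length from the code point range.
  if cp <= $7F then n := 1
  else if cp <= $7FF then n := 2
  else if cp <= $FFFF then n := 3
  else n := 4;

  SetLength(Result, n);
  if n = 1 then
  begin
    Result[1] := AnsiChar(Byte(cp));               // 0xxxxxxx
    Exit;
  end;

  // Fill the continuation bytes (10xxxxxx) from the end, 6 bits at a time.
  for i := n downto 2 do
  begin
    Result[i] := AnsiChar(Byte($80 or (cp and $3F)));
    cp := cp shr 6;
  end;

  // Lead byte: length marker in the high bits, remaining bits below it.
  case n of
    2: Result[1] := AnsiChar(Byte($C0 or cp));     // 110xxxxx
    3: Result[1] := AnsiChar(Byte($E0 or cp));     // 1110xxxx
    4: Result[1] := AnsiChar(Byte($F0 or cp));     // 11110xxx
  end;
end;

var
  s: RawByteString;
  i: Integer;
begin
  s := CodePointToUtf8($1F496);
  // Print each byte as a decimal number.
  for i := 1 to Length(s) do
    Write(Ord(s[i]), ' ');
  WriteLn;
end.
```

For U+1F496 the 21 payload bits are 000 011111 010010 010110. Prefixing them with 11110, 10, 10 and 10 gives F0 9F 92 96, i.e. 240 159 146 150, which are the four bytes from your message.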

Attachment: uencoding.pas
Description: Binary data

Hope it’s useful for you. If you improve the code, please let me know.

Best regards,
Jeroen



On 2 Jul 2023, at 15:30, Hairy Pixels via fpc-pascal <fpc-pascal@lists.freepascal.org> wrote:

I'm interested in parsing Unicode scalars (I think they're called) into byte-sized values, but I'm not sure where to start. The first thing I did was choose the Unicode scalar U+1F496 (💖).

Next I cheated and asked ChatGPT. :) Amazingly, from my question it was able to tell me that the scalar is made up of these 4 bytes:

240 159 146 150

I was able to correctly concatenate these characters and writeln printed the correct character.

var
  s: String;
begin
  // The four UTF-8 bytes of U+1F496
  s := char(240) + char(159) + char(146) + char(150);
  writeln(s);
end.

The question is, how was 1F496 decomposed into 4 bytes?

Regards,
Ryan Joseph

_______________________________________________
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
