Marco van de Voort wrote:
They have a UTF-16/UCS-2 internal representation, the same as MSEgui, which works very well and is fast and handy, BTW.
And len, slicing, etc. work as expected.
Note that if you need characters beyond $ffff you have to compile it
with wide unicode support, and in that case every character will use 4
bytes.
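To put rough numbers on the memory cost, here is a minimal sketch. Python is used only as a convenient notation, and the helper name storage_bytes is hypothetical. It counts the bytes the same text needs in UTF-8, in a 2-byte-per-unit encoding (UTF-16) and in a 4-byte-per-character encoding (UTF-32), without a byte order mark:

```python
def storage_bytes(text: str) -> dict:
    """Hypothetical helper: bytes needed to store `text` in each encoding.

    The little-endian codecs are used so that no BOM is counted.
    """
    return {
        "UTF-8": len(text.encode("utf-8")),
        "UTF-16": len(text.encode("utf-16-le")),
        "UTF-32": len(text.encode("utf-32-le")),
    }


if __name__ == "__main__":
    # Plain ASCII: 1, 2 and 4 bytes per character respectively.
    print(storage_bytes("Pascal"))
```

For plain ASCII text a 4-byte representation takes four times the space of UTF-8 and twice that of UCS-2, which is the "memory hog" side of the trade-off.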

That's IMHO a faulty system. It requires you to choose between an incomplete
solution and making strings a horrible memory hog.

OTOH, using variable-length characters makes string operations expensive: you can't just multiply the index by 2 or 4 to find a character, you have to scan the string from the beginning, and the length in bytes isn't the same as the length in characters.
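A minimal sketch of the indexing difference, again in Python purely as notation over raw bytes; the function names are hypothetical. With a fixed-width encoding the byte offset of character i is i * width, a single multiplication. With UTF-8 you have to walk the buffer and count lead bytes (anything that is not a 0b10xxxxxx continuation byte), so both indexing and length are O(n):

```python
def char_at_fixed(buf: bytes, index: int, width: int) -> bytes:
    """Fixed width (UCS-2: width=2, UTF-32: width=4): offset is index * width, O(1)."""
    if index < 0 or (index + 1) * width > len(buf):
        raise IndexError("character index out of range")
    start = index * width
    return buf[start:start + width]


def char_at_utf8(buf: bytes, index: int) -> bytes:
    """Variable width (UTF-8): scan from the start until the index-th character, O(n)."""
    if index < 0:
        raise IndexError("character index out of range")
    count = -1
    start = None
    for pos, byte in enumerate(buf):
        if byte & 0xC0 != 0x80:          # lead byte: a new character starts here
            count += 1
            if count == index:
                start = pos              # found the requested character
            elif count == index + 1:
                return buf[start:pos]    # next character begins, so slice ends here
    if start is None:
        raise IndexError("character index out of range")
    return buf[start:]                   # requested character is the last one


def utf8_length(buf: bytes) -> int:
    """Length in characters: count the lead bytes, which also needs a full scan."""
    return sum(1 for byte in buf if byte & 0xC0 != 0x80)


if __name__ == "__main__":
    text = "h\u00e9llo w\u00f6rld"
    utf8 = text.encode("utf-8")
    print(len(utf8), utf8_length(utf8))  # byte length vs. character length
```

For "héllo wörld" the UTF-8 buffer is 13 bytes long but holds 11 characters, which is exactly the byte-length/character-length mismatch mentioned above.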

But maybe that doesn't
matter for mere scripting languages (though I wonder then why they didn't
choose UTF-32 directly).

Surrogates are not nice, but they were invented for a reason.
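For reference, this is how a surrogate pair encodes a code point above $FFFF in UTF-16 (the mapping defined by the Unicode Standard; the function names here are hypothetical). The code point minus $10000 leaves 20 bits: the top 10 go into a high surrogate in $D800..$DBFF and the bottom 10 into a low surrogate in $DC00..$DFFF. This is why a UTF-16 string containing such characters is no longer strictly fixed-width:

```python
def to_utf16_units(code_point: int) -> list:
    """Split a Unicode scalar value into one or two UTF-16 code units."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if code_point <= 0xFFFF:
        return [code_point]               # BMP: a single 16-bit unit
    offset = code_point - 0x10000         # 20 significant bits remain
    high = 0xD800 + (offset >> 10)        # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)       # bottom 10 bits -> low (trail) surrogate
    return [high, low]


def from_utf16_units(high: int, low: int) -> int:
    """Recombine a surrogate pair into the original code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    # U+1D11E MUSICAL SYMBOL G CLEF needs two UTF-16 units.
    print([hex(u) for u in to_utf16_units(0x1D11E)])
```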

Well, yes, they're a trade-off between performance and memory consumption, but I fear we're losing one of the advantages that Pascal has over C: fast and simple string handling.

Bye
--
Luca
_______________________________________________
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
