Marco van de Voort wrote:
They have a UTF-16/UCS-2 internal representation, same as MSEgui, which
works very well and is fast and handy, BTW.
And len, slicing, etc. work as expected.
Note that if you need characters beyond $ffff you have to compile it
with wide Unicode support, and in that case every character will use 4
bytes.
That's IMHO a faulty system. It requires you to choose between an incomplete
solution and making strings a horrible memory hog.
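
To put rough numbers on that trade-off, here is a minimal sketch in Python
(picked only for brevity; the helper name encoded_sizes is made up for this
example). It assumes 2-byte units for the UTF-16 case and 4-byte units for
the wide case, and ignores any per-string header overhead:

```python
def encoded_sizes(text):
    """Return the character count and the byte cost in UTF-16 and UTF-32."""
    return {
        "code_points": len(text),
        "utf16_bytes": len(text.encode("utf-16-le")),  # 2 or 4 bytes per character
        "utf32_bytes": len(text.encode("utf-32-le")),  # always 4 bytes per character
    }

bmp_only = "hello"            # every character is below $ffff
with_astral = "hi\U0001D11E"  # U+1D11E lies beyond $ffff

print(encoded_sizes(bmp_only))
print(encoded_sizes(with_astral))
```

For text that stays below $ffff the 4-byte representation costs twice as
much; the character beyond $ffff takes 4 bytes either way.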
OTOH using variable-length characters will make string operations
expensive: you can't just multiply the index by 2 or 4 to find a
character, you have to scan the string from the beginning, and the
length in bytes isn't the same as the length in characters.
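
A rough illustration of that cost, again in Python and assuming well-formed
UTF-16 input. The names utf16_units, char_at and char_length are
hypothetical helpers written for this sketch, not taken from any library:

```python
import struct

def utf16_units(text):
    """Split text into a list of 16-bit UTF-16 code units."""
    data = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))

def is_high_surrogate(unit):
    return 0xD800 <= unit <= 0xDBFF

def char_at(units, index):
    """Return the code point at character position index.

    The position cannot be computed as index * 2; every earlier
    character has to be inspected to see how many units it occupies.
    """
    pos = 0
    for _ in range(index):
        pos += 2 if is_high_surrogate(units[pos]) else 1
    first = units[pos]
    if is_high_surrogate(first):
        second = units[pos + 1]
        return 0x10000 + ((first - 0xD800) << 10) + (second - 0xDC00)
    return first

def char_length(units):
    """Count characters: every unit except the low half of a pair."""
    return sum(1 for u in units if not 0xDC00 <= u <= 0xDFFF)

units = utf16_units("a\U0001D11Eb")
print(len(units), char_length(units), hex(char_at(units, 1)))
```

Both char_at and char_length walk the string, so they are linear in its
length, whereas a fixed-width representation answers both in constant time.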
But maybe that doesn't
matter for mere scripting languages (though I wonder then why they didn't
choose UTF-32 directly)
Surrogates are not nice, but they were invented for a reason.
Well, yes, they're a trade-off between performance and memory
consumption, but I fear we're losing one of the advantages that Pascal
has over C: fast and simple string handling.
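
For reference, the surrogate arithmetic itself is small. This is a sketch of
the standard UTF-16 mapping in Python; the function names to_surrogates and
from_surrogates are invented for the example:

```python
def to_surrogates(code_point):
    """Split a code point beyond $ffff into a (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point does not need a surrogate pair")
    offset = code_point - 0x10000          # 20 significant bits remain
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low

def from_surrogates(high, low):
    """Recombine a (high, low) surrogate pair into one code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

high, low = to_surrogates(0x1D11E)
print(hex(high), hex(low), hex(from_surrogates(high, low)))
```

The cost is not in this arithmetic but in having to check for it on every
index and length operation.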
Bye
--
Luca
_______________________________________________
fpc-pascal maillist - fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-pascal