Aleksa Todorovic wrote:
On Tue, Aug 21, 2012 at 10:16 AM, Ivanko B <ivankob4m...@gmail.com> wrote:
Handling 1..4(6) bytes is less efficient than handling surrogate
*pairs*.
===============
But surrogate pairs break array-like fast char access anyway, don't they?
It's also "broken" in UTF8 in the same way - so none of them gets +1
on this. UCS4 is the only real winner here (one dword for each
character).
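
To make the code-unit issue concrete, here is a minimal Free Pascal sketch
(assuming FPC with {$mode objfpc}; UnicodeStringToUCS4String is assumed to be
available and Delphi-compatible, i.e. its result carries a terminating zero
element). A codepoint outside the BMP occupies two UTF-16 code units and four
UTF-8 bytes, so s[i] yields only a fragment in either encoding, while the
UCS-4 array holds one dword per codepoint:

program IndexSketch;
{$mode objfpc}
var
  u16: UnicodeString;
  u8:  UTF8String;
  u32: UCS4String;
begin
  { U+1D11E MUSICAL SYMBOL G CLEF lies outside the BMP }
  SetLength(u16, 2);
  u16[1] := WideChar($D834);            { high surrogate }
  u16[2] := WideChar($DD1E);            { low surrogate  }
  u8  := UTF8Encode(u16);
  u32 := UnicodeStringToUCS4String(u16);
  WriteLn('UTF-16 code units: ', Length(u16));      { 2 - u16[1] is half a char }
  WriteLn('UTF-8  code units: ', Length(u8));       { 4 - u8[1] is half a char  }
  WriteLn('UCS-4  codepoints:  ', Length(u32) - 1); { 1 - u32[0] is the char    }
end.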
Depending on the language, ligatures etc. can still span multiple
codepoints. IMO everybody should decide whether they want to do text
processing for full Unicode, or whether simple string handling (as used
until now) is sufficient.
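
A hedged sketch of that point, under the same FPC assumptions: 'e' followed
by U+0301 COMBINING ACUTE ACCENT displays as one character but counts as two
codepoints, so even UCS-4 indexing does not reach user-perceived characters:

program GraphemeSketch;
{$mode objfpc}
var
  s: UnicodeString;
begin
  s := 'e';
  s := s + WideChar($0301);             { e + combining acute accent }
  WriteLn('Codepoints: ', Length(s));   { 2, yet one visible character }
  { Truncating or reversing s per element would tear the accent off the 'e';
    full Unicode text processing has to treat such clusters as one unit,
    simple string handling does not. }
end.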
I have never heard of non-canonical text causing problems in character
sets with accents or umlauts, except in (MacOS, Linux) filenames. Since
file searches have to use the platform API, all the required special
handling can be encapsulated in the RTL.
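
For illustration only (same FPC assumptions): the precomposed U+00E9 and the
decomposed 'e' + U+0301 render identically, yet a plain string comparison
treats them as different, which is exactly the filename situation described
above:

program CanonSketch;
{$mode objfpc}
var
  nfc, nfd: UnicodeString;
begin
  nfc := WideChar($00E9);               { precomposed e-acute (NFC) }
  nfd := 'e';
  nfd := nfd + WideChar($0301);         { e + combining acute (NFD) }
  WriteLn(nfc = nfd);                   { FALSE, although both display as "é" }
end.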
Breaking strings into substrings can be done on specific delimiters
(spaces etc.), which are all ASCII, so again there is no complication with
UTF. Comparison or searching for given patterns is also insensitive to the
encoding. Where would one really need indexed access to single characters?
DoDi
_______________________________________________
fpc-devel maillist - fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel