Martin Schreiber wrote:

All "access a char by index into a string" code I have seen works,
99.99% of the time, in a sequential manner. For that reason there is no
speed difference between using a UTF-16 or UTF-8 encoded string. Both
can be coded equally efficiently.

Graeme, this is simply not true. When searching a UnicodeString for known German characters, the program can use the simple approach of indexing by character (code unit). The same is even possible for known Chinese characters in the BMP. And a single "if" for surrogate pairs is more efficient than a 4-way "case" for UTF-8.
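To make the "if" versus "case" comparison concrete, here is a minimal sketch in Free Pascal. The function names Utf16SeqLen and Utf8SeqLen are made up for this illustration and are not RTL routines, and the UTF-8 branch assumes it is called on a lead byte of well-formed input. It shows only the shape of the two tests and makes no claim about measured speed.

```pascal
program SeqLenSketch;
{$mode objfpc}{$H+}

// Hypothetical helper: number of UTF-16 code units used by the code point
// that starts at s[i]. One range test on the code unit is enough.
function Utf16SeqLen(const s: UnicodeString; i: SizeInt): Integer;
begin
  if (Ord(s[i]) >= $D800) and (Ord(s[i]) <= $DBFF) then
    Result := 2   // high surrogate: a low surrogate follows
  else
    Result := 1;  // any other code unit is a complete BMP character
end;

// Hypothetical helper: number of bytes used by the code point that starts
// at s[i] in a UTF-8 string. The lead byte selects one of four lengths.
function Utf8SeqLen(const s: RawByteString; i: SizeInt): Integer;
begin
  case Ord(s[i]) of
    $00..$7F: Result := 1;
    $C2..$DF: Result := 2;
    $E0..$EF: Result := 3;
    $F0..$F4: Result := 4;
  else
    Result := 1;  // continuation or invalid byte: just step over it
  end;
end;

var
  u: UnicodeString;
  a: RawByteString;
begin
  // "Gruen" with u-umlaut (U+00FC) as the third character, in both encodings
  u := 'Gr';
  u := u + WideChar($00FC) + 'n';
  a := 'Gr' + Chr($C3) + Chr($BC) + 'n';

  WriteLn('UTF-16 units at index 3: ', Utf16SeqLen(u, 3));
  WriteLn('UTF-8 bytes at index 3:  ', Utf8SeqLen(a, 3));
end.
```

For text known to stay inside the BMP, the UTF-16 test never takes the surrogate branch, which is why plain code unit indexing works in that case.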

The good ole Pos() can do that, why search for more complicated implementations?

You still try to use old coding patterns which are simply inappropriate for dealing with Unicode strings. Why make a distinction between searching for a single character or multiple characters, when it's known that one character can require multiple bytes or words in UTF-8/16?
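As a small illustration of treating every search as a substring search, the sketch below looks for one character that needs a surrogate pair in UTF-16, using the standard Pos() on UnicodeString values. The variable names and the sample text are assumptions chosen for this example only.

```pascal
program PosSketch;
{$mode objfpc}{$H+}

var
  hay, needle: UnicodeString;
begin
  // The "character" to find is U+1D11E (musical G clef), which UTF-16
  // stores as the surrogate pair D834 DD1E, i.e. two code units.
  needle := WideChar($D834);
  needle := needle + WideChar($DD1E);

  // Haystack: "Gr", u-umlaut (U+00FC), "n", a space, then the clef.
  hay := 'Gr';
  hay := hay + WideChar($00FC) + 'n ';
  hay := hay + needle;

  // Pos() neither knows nor cares that the needle is "one character":
  // it reports the code unit index where the substring starts.
  WriteLn('Found at code unit index: ', Pos(needle, hay));
end.
```

The same call serves for a one-unit needle such as the u-umlaut, so the caller needs no separate code path for "single character" searches.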

DoDi

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel
