Re: [fpc-devel] Unicode support in RTL - Roadmap

Sergei Gorelkin Fri, 21 Nov 2008 06:47:28 -0800

Michael Schnell wrote:

I don't really understand your question.
I think we would need two different functions,
UTF8ElementLength(UTF8String) and UTF8PointLength(UTF8String), the first giving the string length in code elements (bytes) and the second giving the length in code points (Unicode characters).
So UTF8ElementLength('Ü') would be 2 and UTF8PointLength('Ü') would be 1.
I think we should have a third function Length(UTF8String) that can be selected by the user (e.g. via a {$ option) to be mapped to either of the two.
The same would be necessary for the SetLength function.

e.g.
(1) UTF8ElementSetLength(utfstring_1, UTF8ElementLength(utfstring_2));
or
(2) UTF8PointSetLength(utfstring_1, UTF8PointLength(utfstring_2));
(2) would work as expected if the purpose is to delete all but the first n characters in a string.
I don't see a decent use for (1) other than creating a string long enough to use as a buffer for e.g. TStream.read.
I do see that there is in fact a compatibility problem when porting old code with the setting UTF8Count=Point.
Here,

SetLength(utfstring_1, Length(utfstring_2)); would be translated as
UTF8PointSetLength(utfstring_1, UTF8PointLength(utfstring_2));
which does not make sense if UTF8PointLength(utfstring_1) is smaller than UTF8PointLength(utfstring_2).
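
For reference, the two notions of length in the quoted proposal can be sketched as follows. This is only an illustration: UTF8ElementLength and UTF8PointLength are the hypothetical names from the proposal, not existing RTL functions, and the sketch assumes the string holds well-formed UTF-8 in a plain AnsiString.

```pascal
program Utf8LengthDemo;
{$mode objfpc}{$H+}

// Hypothetical helper (name taken from the proposal):
// length in code elements, i.e. bytes.
function UTF8ElementLength(const S: AnsiString): SizeInt;
begin
  Result := Length(S);
end;

// Hypothetical helper: length in code points.
// Assumes S is well-formed UTF-8 and counts every byte that is
// not a continuation byte (bit pattern 10xxxxxx).
function UTF8PointLength(const S: AnsiString): SizeInt;
var
  I: SizeInt;
begin
  Result := 0;
  for I := 1 to Length(S) do
    if (Ord(S[I]) and $C0) <> $80 then
      Inc(Result);
end;

var
  S: AnsiString;
begin
  // 'Ü' (U+00DC) is the two bytes C3 9C in UTF-8. The string is built
  // byte by byte so the result does not depend on the source file encoding.
  SetLength(S, 2);
  S[1] := #$C3;
  S[2] := #$9C;
  WriteLn(UTF8ElementLength(S)); // prints 2
  WriteLn(UTF8PointLength(S));   // prints 1
end.
```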

The SetLength function is used mostly for allocating the storage for new strings. Yes, it can be used for truncating overlong strings, but truncating can be perfectly done with Delete (or UTF8Delete).

As you mentioned yourself, allocating utf-8 strings using length in codepoints is senseless. This is exactly what I wanted to say initially.

What follows is that for calls like SetLength(str1, Pos('foo', str2)) you also cannot freely change the return value of Pos() from elements to codepoints. And so on, and so forth.
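
A minimal sketch of that last point, assuming well-formed UTF-8 in an AnsiString. UTF8PointPos is a hypothetical helper invented for this illustration (a Pos() that returns a code point index); it is not an RTL function.

```pascal
program SetLengthPosDemo;
{$mode objfpc}{$H+}

// Hypothetical: like Pos(), but returns the index in code points
// instead of bytes. Assumes S and SubStr are well-formed UTF-8.
function UTF8PointPos(const SubStr, S: AnsiString): SizeInt;
var
  I, BytePos: SizeInt;
begin
  BytePos := Pos(SubStr, S);
  Result := 0;
  // Count the code points that start at or before the byte position.
  for I := 1 to BytePos do
    if (Ord(S[I]) and $C0) <> $80 then
      Inc(Result);
end;

var
  Str1, Str2: AnsiString;
begin
  // Str2 is 'Üfoo': the bytes C3 9C followed by 'foo'.
  Str2 := 'xxfoo';
  Str2[1] := #$C3;
  Str2[2] := #$9C;

  // Old-style idiom: allocate room for the prefix before 'foo', then copy it.
  SetLength(Str1, Pos('foo', Str2) - 1);          // byte index 3, so 2 bytes
  Move(Str2[1], Str1[1], Length(Str1));
  WriteLn(Length(Str1));                          // prints 2: the complete 'Ü'

  // Same call with a code point index: the buffer is one byte short.
  SetLength(Str1, UTF8PointPos('foo', Str2) - 1); // code point index 2, so 1 byte
  WriteLn(Length(Str1));                          // prints 1: only half of 'Ü'
end.
```

The second SetLength leaves a lone lead byte in Str1, which is no longer valid UTF-8. The allocation only works when Pos() and SetLength count in the same unit.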


Regards,
Sergei
_______________________________________________
fpc-devel maillist  -  [email protected]
http://lists.freepascal.org/mailman/listinfo/fpc-devel
