On 21 Nov 2008, at 14:50, Michael Schnell wrote:
If Length() would return its value in chars, what length in *bytes*
would the following call set:
SetLength(utfstring_1, Length(utfstring_2));
I don't really understand your question.
I think we would need to have two different functions,
UTF8ElementlLength(UTF8String) and UTF8PointLength(UTF8String),
the first giving the string length in code elements (bytes) and the second
giving the length in code points (Unicode characters).
So UTF8ElementlLength('Ü') would be 2 and UTF8PointLength('Ü') would
be 1.
Or 2, depending on whether it's precomposed or decomposed.
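To make the precomposed/decomposed difference concrete, here is a minimal sketch. The UTF8PointLength below is only a hypothetical stand-in for the proposed function (it is not an existing RTL routine), it assumes well-formed UTF-8, and the strings are written as explicit byte escapes so that the source file encoding does not matter:

```pascal
program Utf8Lengths;
{$mode objfpc}{$H+}

// Hypothetical helper, named after the function proposed in this thread.
// Counts code points by counting every byte that is not a UTF-8
// continuation byte (bit pattern 10xxxxxx). Assumes well-formed UTF-8.
function UTF8PointLength(const S: AnsiString): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to Length(S) do
    if (Ord(S[I]) and $C0) <> $80 then
      Inc(Result);
end;

var
  Precomposed, Decomposed: AnsiString;
begin
  Precomposed := #$C3#$9C;     // U+00DC, a single code point
  Decomposed := 'U'#$CC#$88;   // U+0055 followed by combining U+0308

  WriteLn('precomposed: ', Length(Precomposed), ' bytes, ',
          UTF8PointLength(Precomposed), ' code point(s)');
  WriteLn('decomposed:  ', Length(Decomposed), ' bytes, ',
          UTF8PointLength(Decomposed), ' code point(s)');
end.
```

Both strings render as the same 'Ü', yet the first is 2 bytes and 1 code point while the second is 3 bytes and 2 code points.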
I think we should have a third function Length(UTF8String) that the
user can map (e.g. via a {$ option) to either
of the two.
He's simply talking about the case where Length is mapped to your
proposed UTF8PointLength.
I do see that there in fact is a compatibility problem when porting
old code with the setting of UTF8Count=Point.
Here
SetLength(utfstring_1, Length(utfstring_2)); would be translated as
UTF8PointSetLength(utfstring_1, UTF8PointLength(utfstring_2));
which does not make sense if UTF8PointLength(utfstring_1) is smaller
than UTF8PointLength(utfstring_2).
It does not make any sense under any circumstances, because there is
no way for "UTF8PointSetLength" to know how many bytes it has to
allocate when you pass a value (any value, regardless of where it
comes from) to it.
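A small sketch of why the byte count cannot be derived from a code point count. UTF8PointLength is again just a hypothetical helper written for this example, and the sample strings are arbitrary; each one holds exactly two code points:

```pascal
program SamePointsDifferentBytes;
{$mode objfpc}{$H+}

// Hypothetical helper (not an existing RTL routine): counts code points by
// skipping UTF-8 continuation bytes (10xxxxxx). Assumes well-formed UTF-8.
function UTF8PointLength(const S: AnsiString): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to Length(S) do
    if (Ord(S[I]) and $C0) <> $80 then
      Inc(Result);
end;

var
  Samples: array[0..3] of AnsiString;
  I: Integer;
begin
  Samples[0] := 'ab';                              // 1 byte per code point
  Samples[1] := #$C3#$9C#$C3#$9C;                  // 2 x U+00DC, 2 bytes each
  Samples[2] := #$E2#$82#$AC#$E2#$82#$AC;          // 2 x U+20AC, 3 bytes each
  Samples[3] := #$F0#$9F#$98#$80#$F0#$9F#$98#$80;  // 2 x U+1F600, 4 bytes each

  // Same code point count every time, but a different byte length.
  for I := 0 to 3 do
    WriteLn(UTF8PointLength(Samples[I]), ' code points, ',
            Length(Samples[I]), ' bytes');
end.
```

A request for "2 code points" could therefore need anywhere from 2 to 8 bytes, and a routine that is only given the number 2 has nothing to choose between them.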
Jonas