On 21 Nov 2008, at 14:50, Michael Schnell wrote:

If Length() returned its value in chars, what length in *bytes* would the following call set:

SetLength(utfstring_1, Length(utfstring_2));

I don't really understand your question.

I think we would need to have two different functions:

UTF8ElementLength(UTF8String) and UTF8PointLength(UTF8String), the first giving the string length in code elements (bytes) and the second giving the length in code points (Unicode characters).
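A minimal sketch of how these could look (the function names are the ones proposed here; the counting logic is just one possible implementation, assuming UTF8String stays byte-counted as it is today):

function UTF8ElementLength(const s: UTF8String): SizeInt;
begin
  { code elements are bytes in UTF-8, i.e. what Length already counts }
  Result := Length(s);
end;

function UTF8PointLength(const s: UTF8String): SizeInt;
var
  i: SizeInt;
begin
  { count only lead bytes: bytes of the form 10xxxxxx ($80..$BF)
    are continuation bytes and do not start a new code point }
  Result := 0;
  for i := 1 to Length(s) do
    if (Ord(s[i]) and $C0) <> $80 then
      Inc(Result);
end;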

So UTF8ElementLength('Ü') would be 2 and UTF8PointLength('Ü') would be 1.

Or 2 (with an element length of 3), depending on whether it's precomposed or decomposed.
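Using the functions sketched above, with the byte sequences written out explicitly:

const
  UPre: UTF8String = #$C3#$9C;     { U+00DC, precomposed: 2 elements, 1 point }
  UDec: UTF8String = 'U'#$CC#$88;  { U+0055 U+0308, decomposed: 3 elements, 2 points }

begin
  WriteLn(UTF8ElementLength(UPre), ' ', UTF8PointLength(UPre));  { 2 1 }
  WriteLn(UTF8ElementLength(UDec), ' ', UTF8PointLength(UDec));  { 3 2 }
end.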

I think we should have a third function Length(UTF8String) that can be mapped by the user (e.g. via a {$...} compiler directive) to either of the two.
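For example (the directive spelling here is hypothetical, matching the UTF8Count=Point setting mentioned below):

{$UTF8COUNT ELEMENT}  { Length(s) behaves like UTF8ElementLength(s) }
{$UTF8COUNT POINT}    { Length(s) behaves like UTF8PointLength(s) }

And presumably SetLength would follow the same mapping.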

He's simply talking about the case where Length is mapped to your proposed UTF8PointLength.

I do see that there is in fact a compatibility problem when porting old code with the setting UTF8Count=Point.

Here,

SetLength(utfstring_1, Length(utfstring_2)); would be translated as
UTF8PointSetLength(utfstring_1, UTF8PointLength(utfstring_2));

which does not make sense if UTF8PointLength(utfstring_1) is smaller than UTF8PointLength(utfstring_2).


It does not make sense under any circumstances, because there is no way for "UTF8PointSetLength" to know how many bytes it has to allocate when you pass a value to it (any value, regardless of where it comes from).
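To spell out the allocation problem (the variable contents here are just for illustration):

var
  a, b: UTF8String;
begin
  b := #$C3#$9C'ber';       { 'Über': 5 bytes, 4 code points }
  SetLength(a, Length(b));  { today: both sides count bytes, a gets exactly 5 bytes }
  { A point-based UTF8PointSetLength(a, UTF8PointLength(b)) would be
    asked for "4 code points", which can occupy anywhere from 4 to 16
    bytes in UTF-8, so the byte allocation is simply undetermined. }
end;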


Jonas
