Hi Experts,

There has been a long and winding discussion on this in the "German Lazarus Forum", and I have been very dissatisfied with the result.

Maybe this has already been discussed in one of the "Unicode" threads here, but I did not follow all of them down to the latest twig and leaf. So I am starting a new thread, hoping for a more comprehensive result.

When using UTF8String I found that if s is a UTF8String containing "ö2", length(s) is 3 and s[3] is "2". Obviously, a UTF8String's content is counted in 8-bit code units and not in "visible" characters. While I don't like this "un-String-like" behavior at all, I am aware that this is by design to guarantee a decent speed.
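The counting can be reproduced outside Pascal. Below is a minimal sketch in Python, used here only to illustrate the encoding; the variable names are made up for this example. It shows that "ö2" takes three bytes in UTF-8 and that the third byte is "2":

```python
# "\u00f6" is the letter o-umlaut; the escape avoids depending on the
# encoding of this source file.
text = "\u00f62"
utf8_bytes = text.encode("utf-8")   # two bytes for the umlaut, one for "2"

byte_count = len(utf8_bytes)        # 8-bit code units, like length(s) on a UTF8String
char_count = len(text)              # code points, i.e. the "visible" characters
third_byte = chr(utf8_bytes[2])     # Pascal's s[3] is 1-based, so index 2 here

print(byte_count, char_count, third_byte)
```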

But happily we don't need to use UTF8Strings to handle Unicode, as we do have WideStrings, which suffer from this odd behavior only when we try to store extremely strange characters (code points above $FFFF) using "surrogate pairs". I feel that I am very unlikely to ever need to do this.
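For completeness, the surrogate-pair case can be sketched the same way. This small Python illustration assumes U+1D11E (a musical symbol) as an arbitrary example of a code point above $FFFF; it is not taken from any real program:

```python
import struct

def utf16_units(s):
    """Return the 16-bit code units of s when encoded as UTF-16."""
    raw = s.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))

bmp_units = utf16_units("\u00f6")        # inside the 16-bit range: one unit
astral_units = utf16_units("\U0001D11E") # above $FFFF: a surrogate pair

print([hex(u) for u in bmp_units], [hex(u) for u in astral_units])
```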

So I did some tests with WideStrings and found strange things with them, too. While some of them are Lazarus issues, one quite obviously is introduced by the compiler.

When I want to assign the constant text "ö2" to a WideString, I would expect to simply write s := 'ö2'; . But I found that this does not work: it creates a WideString of length 3 that contains the three 8-bit code units of the UTF-8 encoded string "ö2", each zero-extended to 16 bits and stored in one WideChar element. For me this is very surprising, and it is incompatible with the same code (s := 'ö2';) used in a Turbo Delphi program.

Obviously, unlike Turbo Delphi, which uses ANSIString here, a constant string gets UTF8String as its intermediate type. This might be a useful definition, but if it is done this way, why does an assignment WideString := UTF8String not implicitly call UTF8Decode as a type conversion? In my example it calls fpc_ansistr_to_widestr instead, just as if the UTF8String were an ANSIString.
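The difference between the two results can be sketched as follows. This Python snippet only models the outcome described above (one 16-bit element per byte) next to what a UTF-8 decode would give; it is not the RTL's implementation of fpc_ansistr_to_widestr or UTF8Decode, and all names in it are hypothetical:

```python
import struct

utf8_bytes = "\u00f62".encode("utf-8")   # the three bytes of "ö2" in UTF-8

# Observed result: every byte zero-extended into its own 16-bit element.
zero_extended = [int(b) for b in utf8_bytes]

# Expected result: decode UTF-8 first, then store UTF-16 code units.
decoded = utf8_bytes.decode("utf-8")
raw = decoded.encode("utf-16-le")
decoded_units = list(struct.unpack("<%dH" % (len(raw) // 2), raw))

# What the zero-extended elements look like when read as characters.
misread_text = "".join(chr(u) for u in zero_extended)

print([hex(u) for u in zero_extended], [hex(u) for u in decoded_units])
```

The first list has three elements and the second has two, which matches the length 3 seen with the plain assignment versus the length 2 one would expect after a decode.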

Is there some compiler setting to change this?

-Michael

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel