Re: [fpc-devel] Unicode support in RTL - Roadmap

Michael Schnell Mon, 24 Nov 2008 03:56:54 -0800

> Your comments are absolutely vague and meaningless.

Sorry, but this has already been discussed several times, so I assumed that the problems I see were known to the participants in the discussion.

But here is a simple example: a Lazarus project with all options left at their default settings:


procedure TForm1.Button1Click(Sender: TObject);
var
  sAnsiString: AnsiString;
  sUTF8String: UTF8String;
  sWideString: WideString;
begin
  sAnsiString := 'üu';
  sUTF8String := 'üu';
  sWideString := 'üu';

  Memo1.Lines.Add('1) ' + IntToHex(integer(sAnsiString[1]), sizeof(char)*2) + ' ' +
                  IntToHex(integer(sAnsiString[2]), sizeof(char)*2) +
                  ' should be FC 75');

  Memo1.Lines.Add('2) ' + IntToHex(integer(sUTF8String[1]), sizeof(char)*2) + ' ' +
                  IntToHex(integer(sUTF8String[2]), sizeof(char)*2) +
                  ' should be C3 BC');

  Memo1.Lines.Add('3) ' + IntToHex(integer(sWideString[1]), sizeof(WideChar)*2) + ' ' +
                  IntToHex(integer(sWideString[2]), sizeof(WideChar)*2) +
                  ' should be 00FC 0075');
end;

This results in

1) C3 BC should be FC 75
2) C3 BC should be C3 BC
3) 00C3 00BC should be 00FC 0075

You don't need to tell me why the result is what it is; I do know the details. But for me this really is "not at all desirable", as any newcomer will get hit by this as soon as he tries to do any string handling.


Comment:

1) The type is named ANSIString, so anybody will assume that it in fact holds data of that kind (ANSI code according to the system's locale) unless the user program does something else with it. Obviously it does not: with a German locale on Windows, the ANSI code of ü is $FC.

2) This is in fact as expected, provided you know that UTF8Strings are counted in code units rather than in code points (aka Unicode characters). But I feel that anybody who does not explicitly use Unicode will assume characters (notwithstanding that a UTF-8 character is not defined in FPC). You can legitimately claim, though, that anybody who really wants to do Unicode should make himself familiar with the details of UTF-8.

3) Assigning a string constant to a WideString does not work as expected. The result is not a valid UTF-16 representation of the constant the user wrote.
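
To make points 2) and 3) concrete, here is a minimal console sketch. It is not taken from the Lazarus project above: the helpers CountCodePoints and WidenBytes are hypothetical names introduced only for illustration, and the string is written as explicit UTF-8 bytes so that the result does not depend on how the source file is encoded. It assumes that the 00C3 00BC result comes from each UTF-8 byte being widened to one WideChar without decoding, and contrasts that with the RTL's UTF8Decode.

```pascal
program Utf8Widen;

{$mode objfpc}{$H+}

uses
  SysUtils;

const
  // 'üu' as explicit UTF-8 bytes: ü = C3 BC, u = 75.
  Utf8Bytes: AnsiString = #$C3#$BC#$75;

// Hypothetical helper: counts code points by skipping UTF-8
// continuation bytes (bit pattern 10xxxxxx).
function CountCodePoints(const S: AnsiString): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to Length(S) do
    if (Ord(S[i]) and $C0) <> $80 then
      Inc(Result);
end;

// Hypothetical helper: widens every byte to one WideChar without
// decoding it (assumed to mirror what happens in case 3).
function WidenBytes(const S: AnsiString): WideString;
var
  i: Integer;
begin
  SetLength(Result, Length(S));
  for i := 1 to Length(S) do
    Result[i] := WideChar(Word(Ord(S[i])));
end;

var
  Widened, Decoded: WideString;
begin
  // Length counts code units (bytes), not characters.
  WriteLn('bytes: ', Length(Utf8Bytes),
          ', code points: ', CountCodePoints(Utf8Bytes));

  Widened := WidenBytes(Utf8Bytes);
  WriteLn('widened: ', IntToHex(Integer(Ord(Widened[1])), 4), ' ',
          IntToHex(Integer(Ord(Widened[2])), 4));

  // UTF8Decode interprets the bytes as UTF-8 and yields UTF-16.
  Decoded := UTF8Decode(Utf8Bytes);
  WriteLn('decoded: ', IntToHex(Integer(Ord(Decoded[1])), 4), ' ',
          IntToHex(Integer(Ord(Decoded[2])), 4));
end.
```

Under these assumptions the program should report 3 bytes but 2 code points, 00C3 00BC for the widened string, and 00FC 0075 for the decoded one, which is the value the user would expect in case 3).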

> Not to mention they also don't propose an alternative.

In these discussions I provided a lot of suggestions (that might or might not be sensible), but of course the executive teams (FPC and Lazarus) need to decide for themselves what to do. (The FPC team seems to intend to introduce strings that dynamically know which encoding they contain.)

> Sorry to be blunt, but so were your comments.

Sorry if I sounded blunt. I'm very happy and thankful that there are volunteers who dedicate their spare time to making things like FPC and Lazarus happen. My ranting was meant to help them improve the usability of Lazarus and FPC.

While the previous Lazarus version's string handling worked as expected with ANSIString, the new version forces UTF-8 encoding onto the user, even if he is perfectly happy with the locale-dependent ANSI he is used to. IMHO this is only harmful (scaring away potential users), as in the standard situation it does not work exactly like the old ANSIString handling.


-Michael



_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel
