Re: [fpc-devel] Unicode support in RTL - Roadmap

2008-11-24 Thread Michael Schnell


> Your comments are absolutely vague and meaningless.


Sorry, but this has already been discussed several times, so I assumed that 
the problems I see are known to the participants.


Here is a simple example from a Lazarus project with all options left at 
their default settings:


procedure TForm1.Button1Click(Sender: TObject);
var
  sAnsiString: AnsiString;
  sUTF8String: UTF8String;
  sWideString: WideString;
begin
  // The same literal is assigned to all three string types.
  sAnsiString := 'üu';
  sUTF8String := 'üu';
  sWideString := 'üu';
  // Show the first two elements of each string in hex.
  Memo1.Lines.Add('1) ' +
    IntToHex(Ord(sAnsiString[1]), SizeOf(Char) * 2) + ' ' +
    IntToHex(Ord(sAnsiString[2]), SizeOf(Char) * 2) +
    ' should be FC 75');
  Memo1.Lines.Add('2) ' +
    IntToHex(Ord(sUTF8String[1]), SizeOf(Char) * 2) + ' ' +
    IntToHex(Ord(sUTF8String[2]), SizeOf(Char) * 2) +
    ' should be C3 BC');
  Memo1.Lines.Add('3) ' +
    IntToHex(Ord(sWideString[1]), SizeOf(WideChar) * 2) + ' ' +
    IntToHex(Ord(sWideString[2]), SizeOf(WideChar) * 2) +
    ' should be 00FC 0075');
end;

This results in

1) C3 BC should be FC 75
2) C3 BC should be C3 BC
3) 00C3 00BC should be 00FC 0075



You don't need to tell me why the result is as it is; I do know the 
details. But for me this is not at all desirable, as any 
newcomer will get hit by it as soon as he tries to do any string handling.


Comments:

1) The type is named ANSIString, so anybody will assume that it in fact 
holds data of this kind (ANSI code according to the system's locale) 
unless the user program does something else with it. Obviously it does 
not: with a German locale on Windows the ANSI code of ü is $FC.


2) This is in fact as expected, provided you know that UTF8Strings are 
indexed in code units rather than in code points (aka Unicode 
characters). But I feel that anybody who does not explicitly use 
Unicode will assume that an index addresses a character (notwithstanding 
that a UTF-8 character type is not defined in FPC). You can legitimately 
claim, though, that anybody who really wants to do Unicode should 
familiarize himself with the details of UTF-8.


3) Assigning a string constant to a WideString does not work as 
expected. The result is not the UTF-16 representation of the constant 
the user wrote.
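
A minimal sketch of how result 3 can arise, assuming (this is not confirmed in the thread) that the compiler takes the UTF-8 bytes of the literal as single-byte characters and widens each one separately. WidenBytewise and HexUnits are names made up for this illustration; UTF8Decode is the RTL routine. The bytes are spelled out so the encoding of the source file does not matter.

```pascal
program WidenDemo;
{$mode objfpc}{$H+}
uses
  SysUtils;

// Illustrative helper: widen every byte to one 16-bit unit, ignoring encoding.
function WidenBytewise(const s: AnsiString): WideString;
var
  i: Integer;
begin
  SetLength(Result, Length(s));
  for i := 1 to Length(s) do
    Result[i] := WideChar(Ord(s[i]));
end;

// Illustrative helper: space-separated hex dump of the 16-bit units.
function HexUnits(const w: WideString): string;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to Length(w) do
  begin
    if i > 1 then
      Result := Result + ' ';
    Result := Result + IntToHex(Ord(w[i]), 4);
  end;
end;

var
  utf8Bytes: AnsiString;
begin
  // 'üu' as UTF-8 bytes: C3 BC 75.
  utf8Bytes := #$C3#$BC'u';
  WriteLn('bytewise: ', HexUnits(WidenBytewise(utf8Bytes)));  // 00C3 00BC 0075
  WriteLn('decoded:  ', HexUnits(UTF8Decode(utf8Bytes)));     // 00FC 0075
end.
```

Only the second line matches what the test project labels as "should be"; the first reproduces the 00C3 00BC seen in the Memo.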




> Not to mention they also don't propose an alternative.
  
In these discussions I provided a lot of suggestions (that might or 
might not be sensible), but of course the executive teams (FPC and 
Lazarus) themselves need to decide what to do. (The FPC team seems to 
intend to introduce strings that dynamically know which encoding they contain.)

> Sorry to be blunt, but so were your comments.
  
Sorry if I sounded blunt. I'm very happy and thankful that there are 
volunteers who dedicate their spare time to make things like FPC and 
Lazarus happen. My ranting was meant to help them improve Lazarus and 
FPC usability.


While the previous Lazarus version's string handling worked as expected 
with ANSIString, the new version forces UTF-8 encoding onto the user, even 
if he is perfectly happy with the locale-dependent ANSI he is used to. 
IMHO this is only harmful (it scares away potential users), because in 
the default configuration it does not behave exactly like the old 
ANSIString handling.


-Michael



___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel


Re: [fpc-devel] Unicode support in RTL - Roadmap

2008-11-24 Thread Michael Schnell



> if compiled using *non* utf8 mode.
I did not find a way to set a non-UTF-8 mode in Lazarus, so that I 
can just use ANSIString (and WideString) like I did in the previous version.


Did I miss this option?

If it exists, why not make it the default, so that things work for someone 
who ignores Unicode?


(But I suppose this is prevented by the UTF-8 API of the LCL and by FPC not 
being able to tell ANSIString from UTF8String.)


-Michael (We are going in circles on that issue)


Re: [fpc-devel] Unicode support in RTL - Roadmap

2008-11-24 Thread Michael Schnell




> It works for win32 only for now. Only the system unit is finished. Work 
> in progress...

Sounds great so far!

Is there a document on how exactly it is going to work (will a common 
String type get a dynamic encoding specification, or will there be 
different String types for the encoding variants)?


-Michael


Re: [fpc-devel] Unicode support in RTL - Roadmap

2008-11-24 Thread Jeff Wormsley


Martin Friebe wrote:
> I must agree with the "FPC cannot do it all automatically" line (as 
> much as I regret it, and admit the beauty there would be if FPC could).
>
>
> What I mean is:
>
> 1) Any application/program that currently compiles and works (using 
> non-utf8, never mind if ASCII or ANSI) will keep working if compiled 
> using *non* utf8 mode.
This is reasonable.  It also implies that perhaps what everyone is 
trying to do is impossible.  With plain strings, or Ansi strings, we 
have code that works today.  If you change any of those to UTF*, then 
code that uses things such as SetLength, Length, stringvar[index], 
copy(string, index, count), pos etc. cannot work 100% reliably.

You don't know what the programmer wants when he says stringvar[3].  Does he 
mean the third character in the string?  Or the third byte in the memory 
array represented by the string (perhaps he was using a string as a 
buffer)?  If you assume one or the other, then whenever one element of a 
string doesn't equal one byte you'll be wrong half of the time, no matter 
which UTF type you are using or what locale you are in.

It almost seems to me that if you want to use UTF strings as 
the default, you should either throw errors or at least stern warnings 
on any use of Length, SetLength, stringvar[index] et al. and force any 
code using them to be rewritten with UTF variants.  It would make more 
sense to knowingly say all code using such constructs is broken in a 
Unicode environment than to leave it to chance that the way the code now 
interprets these constructs is the way the coder originally intended.
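
The ambiguity can be shown in a few lines. This is only a sketch: Utf8CodePointCount is a made-up helper, not an RTL function, and it assumes the string holds well-formed UTF-8.

```pascal
program Utf8LengthDemo;
{$mode objfpc}{$H+}

// Illustrative helper: count code points by skipping UTF-8 continuation
// bytes (bit pattern 10xxxxxx). Assumes s holds well-formed UTF-8.
function Utf8CodePointCount(const s: AnsiString): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to Length(s) do
    if (Ord(s[i]) and $C0) <> $80 then
      Inc(Result);
end;

var
  s: AnsiString;
begin
  s := #$C3#$BC'u';  // 'üu' as UTF-8 bytes
  WriteLn('Length (bytes): ', Length(s));              // 3
  WriteLn('Code points:    ', Utf8CodePointCount(s));  // 2
  WriteLn('s[2] as a byte: ', Ord(s[2]));              // 188 ($BC), not 'u'
end.
```

Code that used the string as a byte buffer wants the first and third answers; code that meant "second character" wants neither of them.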


I know much of my code would break just using AnsiString as opposed to 
the original counted string.  For me, *any* UTF* version discussed here 
would break it even more.


I don't have any need for Unicode, so feel free to ignore anything I 
say.  But I don't want my code breaking in unpredictable ways, either, 
because the underlying string types change on me behind my back (i.e., in 
the RTL/FCL).


Jeff.



Re: [fpc-devel] Unicode support in RTL - Roadmap

2008-11-24 Thread Graeme Geldenhuys
On Mon, Nov 24, 2008 at 3:55 PM, Jeff Wormsley wrote:
> such as SetLength, Length, stringvar[index], copy(string, index, count), pos
> etc. cannot work 100% reliably.  You don't know what the programmer wants
> when he says stringvar[3].  Does he mean the third character in the string?
> Or the third byte in the memory array represented by the string (perhaps he
> was using a string as a buffer)?

That is why I currently use CharAt(str, i) in my projects and fpGUI -
instead of direct array access. CharAt() handles ANSI and UTF-8
strings perfectly.  Yes it might be slower, but I hardly ever need
character access for the type of applications I am writing. So using
CharAt() once or twice in my application is not a performance problem.
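
For comparison with direct indexing, here is a rough sketch of what a character-at-index helper for UTF-8 might look like. This is not fpGUI's CharAt, whose implementation is not shown in this thread; Utf8CharAt is a made-up name, and the code assumes well-formed UTF-8 input.

```pascal
program CharAtSketch;
{$mode objfpc}{$H+}

// Hypothetical helper: returns the Index-th (1-based) UTF-8 code point of s
// as a byte string, or '' if Index is out of range.
function Utf8CharAt(const s: AnsiString; Index: Integer): AnsiString;
var
  i, start, seen: Integer;
begin
  Result := '';
  seen := 0;
  i := 1;
  while i <= Length(s) do
  begin
    start := i;
    Inc(i);
    // Swallow continuation bytes (bit pattern 10xxxxxx).
    while (i <= Length(s)) and ((Ord(s[i]) and $C0) = $80) do
      Inc(i);
    Inc(seen);
    if seen = Index then
    begin
      Result := Copy(s, start, i - start);
      Exit;
    end;
  end;
end;

var
  s: AnsiString;
begin
  s := #$C3#$BC'u';  // 'üu' as UTF-8 bytes
  WriteLn(Length(Utf8CharAt(s, 1)));  // 2: the two bytes of 'ü'
  WriteLn(Utf8CharAt(s, 2));          // u
  WriteLn(Length(Utf8CharAt(s, 3)));  // 0: out of range
end.
```

The scan from the start of the string is why such a helper is slower than s[i]: each lookup is linear in the index rather than constant time.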


Regards,
  - Graeme -


___
fpGUI - a cross-platform Free Pascal GUI toolkit
http://opensoft.homeip.net/fpgui/


Re: [fpc-devel] Unicode support in RTL - Roadmap

2008-11-24 Thread Michael Schnell


> With plain strings, or Ansi strings, we have code that works today.  
> If you change any of those to UTF*, then code that uses things such as 
> SetLength, Length, stringvar[index], copy(string, index, count), pos 
> etc. cannot work 100% reliably.  You don't know what the programmer 
> wants when he says stringvar[3].
That is what the two types ANSIString and UTF8String suggest: if you use 
ANSIString, everything works as it always did; if you use 
UTF8String, you need to look into what Unicode handling is all 
about. But unfortunately the compiler does not know the difference 
between the two types, so it cannot do the appropriate conversions where 
necessary (e.g. when accessing the LCL, which uses UTF8String) or call the 
appropriate functions (e.g. for uppercasing) according to the 
type(s) used in an operation.


-Michael


Re: [fpc-devel] Unicode support in RTL - Roadmap

2008-11-24 Thread Yury Sidorov

From: Michael Schnell


>> It works for win32 only for now. Only the system unit is finished. 
>> Work in progress...
>
> Sounds great so far!
>
> Is there a document on how exactly it is going to work (will a 
> common String type get a dynamic encoding specification, or will there 
> be different String types for the encoding variants)?


No document is available yet. This branch is still experimental. It 
introduces RtlString, a string type which is native to the RTL on the 
corresponding target: RtlString=utf16string on Windows, 
RtlString=utf8string on Unix, etc.
RtlString can also be ansistring. In this case the RTL will be ANSI only 
and 100% compatible with existing ANSI user code.


It is planned to allow users to build a Unicode or an ANSI RTL.
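
Purely as an illustration of the idea, and not the code in the experimental branch, a per-target string alias could be sketched like this. SKETCH_ANSI_RTL is a hypothetical define, RtlChar and RtlCharSize are made-up names, and WideString stands in for the branch's utf16string type.

```pascal
program RtlStringSketch;
{$mode objfpc}{$H+}

type
{$IFDEF SKETCH_ANSI_RTL}   // hypothetical switch for an ANSI-only build
  RtlString = AnsiString;
  RtlChar   = AnsiChar;
{$ELSE}
  {$IFDEF WINDOWS}
  RtlString = WideString;  // UTF-16 code units
  RtlChar   = WideChar;
  {$ELSE}
  RtlString = UTF8String;  // UTF-8 code units
  RtlChar   = AnsiChar;
  {$ENDIF}
{$ENDIF}

// Code written against RtlString can ask how wide one element is.
function RtlCharSize: Integer;
begin
  Result := SizeOf(RtlChar);
end;

begin
  WriteLn('Bytes per RtlString element: ', RtlCharSize);
end.
```

With such an alias the element size is fixed at compile time per target, which is the alternative to a single string type that carries its encoding at run time.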

Yury. 


Re: [fpc-devel] libc on various platforms

2008-11-24 Thread Marco van de Voort
In our previous episode, Mark Morgan Lloyd said:
> Looking at 2.2.3 on sparc-linux I see that libc.ppu etc. is no longer 
> being built; it is however built for i386 and arm and for 2.2.0 and 
> older. Is this intentional?
>
> To be honest I can't remember why I needed this but I've got notes to 
> copy the files manually during installation; it might have been 
> something to do with Lazarus.

In addition to Jonas' comments, see here:

http://wiki.freepascal.org/libc_unit


Re: [fpc-devel] Unicode conversion routines

2008-11-24 Thread Marco van de Voort
In our previous episode, Florian Klaempfl said:
> > Of course, if the OS provides functions, use them! They should be properly
> > implemented and will be even faster without serious memory impact, but
> > I'm quite sure that not all functions will be provided by all OSs.
> > Maybe a small subset should be available for platforms that do not
> > provide native support like DOS and a WideString or UnicodeString is
> > available, maybe as a separate unit to be linked in only if needed.
>
> Well, if anybody volunteers, something similar to cwstrings for unix
> could be done :)

Worse, the DOS users are the worst with respect to binary size syndrome.


Re: [fpc-devel] Unicode conversion routines

2008-11-24 Thread Tomas Hajny
On 24 Nov 08, at 21:49, Marco van de Voort wrote:
> In our previous episode, Florian Klaempfl said:
> > > Of course, if the OS provides functions, use them! They should be properly
> > > implemented and will be even faster without serious memory impact, but
> > > I'm quite sure that not all functions will be provided by all OSs.
> > > Maybe a small subset should be available for platforms that do not
> > > provide native support like DOS and a WideString or UnicodeString is
> > > available, maybe as a separate unit to be linked in only if needed.
> >
> > Well, if anybody volunteers, something similar to cwstrings for unix
> > could be done :)
>
> Worse, the DOS users are the worst with respect to binary size syndrome.

Well, as long as it's optional (as suggested by the original post), 
no one should complain... ;-)

Tomas
