Re: [fpc-devel] ccharset.pas, charset.pas and strings/unicode ?

2011-04-07 Thread Marco van de Voort
In our previous episode, Skybuck Flying said:
> Suppose some kind of weird disaster happens, like the tsunami in Japan... all
> our computers are destroyed...

(Then recompiling Free Pascal is the least of our problems.)

> What would happen if the compiler was unicode only ?
>
> Could the compiler still be built ? I would guess so... unless it depends on
> some ansi strings in assembly or so...

This is a very complicated situation, and such questions cannot be answered
in general, since everything depends on the boundary conditions.

The easiest approach would probably be to go back in time and research how
development systems migrated from local computer encodings to a standardised
character set (ANSI). Since Unicode mostly just enlarges the character set,
it is less of a revolution than ANSI was.
 
> Furthermore what happens to statements/code like this:
>
> SomeString := 'SomeText';
>
> I think in a unicode compiler 'SomeText' might actually be defaulted to
> unicode ?

Source code encoding and runtime encoding are two different things. In
recent Delphi versions, for example, the source code encoding (including
such string literals) is UTF-8, while the runtime library encoding is UTF-16.
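
A minimal sketch of that distinction in FPC terms, not taken from the
compiler sources. It assumes the file is saved as UTF-8 and uses FPC's
{$codepage utf8} directive to say so; the program and variable names are
made up for illustration. The same literal ends up as 1-byte or 2-byte
characters depending on the variable it is assigned to, independent of how
the source file itself is encoded.

```pascal
program EncodingDemo;
{$mode objfpc}{$H+}
{$codepage utf8}  // assumption: this source file is saved as UTF-8

var
  A: AnsiString;
  U: UnicodeString;
begin
  // The same ASCII-only literal is stored differently depending on the
  // type of the variable it is assigned to, not on the source encoding.
  A := 'SomeText';
  U := 'SomeText';

  WriteLn('AnsiString:    ', Length(A), ' chars of ', SizeOf(A[1]), ' byte(s)');
  WriteLn('UnicodeString: ', Length(U), ' chars of ', SizeOf(U[1]), ' byte(s)');
end.
```

With an ASCII-only literal both strings should report the same length; only
the size of a single character differs.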

In general, your questions are oversimplified to the point that they are
either unanswerable or not really a problem.

As far as bootstrapping is concerned, such problems were very interesting in
the seventies and eighties, and are considered mostly solved nowadays.
Bootstrapping is a general problem for any compiler, not just Free Pascal.

For some details of how FPC's bootstrapping is set up, have a look at the buildfaq:

http://www.stack.nl/~marcov/buildfaq/
 
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel


[fpc-devel] ccharset.pas, charset.pas and strings/unicode ?

2011-04-06 Thread Skybuck Flying

Hello,

I am having momentary confusion about the situation with ccharset.pas and 
charset.pas and strings, ansistrings and unicode in general... ?!?


So some questions about this:

In particular, I do not understand the following uses clause:

{$ifdef VER2_2}ccharset{$else VER2_2}charset{$endif VER2_2},

Somewhere it says something about bootstrapping and stuff like that... it 
seems to have something to do with unicode mappings...


It also said that this wasn't necessary anymore beyond version 2.2.2 or 
something ?


This seems to me like a little unicode-hack to get unicode into the compiler 
or something ?


What the hell is this ? =D

Anyway some questions about the free pascal 2.4.2 sources in relation to 
Delphi XE situation:


In the latest Delphi versions string is now considered a Unicode string.

What's the situation with the options.pas in the compiler folder ?

Lots of string stuff and character stuff going on there... ansistring 
versus unicodestring, ansichar versus unicodechar ?


Seems a bit conflicting for what I am trying to do... which is use some of 
this code in Delphi...


So I am getting all kinds of typecast/implicit string cast warnings and 
errors and stuff and potential data loss

from string to ansistring... a bit too whacky for my taste but ok...

So to get some sense into all of this let me ask you a simple question:

1. What type of strings does free pascal use ? Especially in options.pas ?

Are these string types considered to be AnsiStrings or UnicodeStrings ???

And what about char types ? Are those AnsiChar or UnicodeChar ???

(probably also known as WideChar, WideString...)

(I have in principle done no real programming yet with the newer Delphi 
versions with the unicode stuff in it...
so this is new stuff for me... and now a bit confusing unfortunately... and 
perhaps even unavoidable confusion...
because this reinterpretation that new-borland did is now conflicting 
and causing interpretation issues :(
so it depends on the compiler... and I don't know what free pascal does... 
so that's why I ask here...)


Also there is something I don't understand about the conditional way above:

It reads, in a way:

IF VERSION IS 2.2 THEN USE CCHARSET ELSE CHARSET

The thing is: I am using 2.4.2 and CHARSET is missing from 2.4.2

So perhaps this conditional was meant to read something like:

if Version  2.2 then use CCHARSET else CHARSET; ???

So for 2.4.2 I must probably use CCHARSET.pas the thing with the confusing 
strings remains though ;)


Sorry for messy posting... but this is messy ! ;) =D

Bye,
 Skybuck. 




Re: [fpc-devel] ccharset.pas, charset.pas and strings/unicode ?

2011-04-06 Thread Michael Schnell

On 04/06/2011 08:30 AM, Skybuck Flying wrote:


> In the latest Delphi versions string is now considered a Unicode
> string.


Something like this is being implemented in a dedicated new string
branch in svn called cpstrnew.


AFAIK, it's still far from usable.

-Michael


Re: [fpc-devel] ccharset.pas, charset.pas and strings/unicode ?

2011-04-06 Thread Sven Barth

On 06.04.2011 08:30, Skybuck Flying wrote:

> Hello,
>
> I am having momentary confusion about the situation with ccharset.pas
> and charset.pas and strings, ansistrings and unicode in general... ?!?
>
> So some questions about this:
>
> In particular, I do not understand the following uses clause:
>
> {$ifdef VER2_2}ccharset{$else VER2_2}charset{$endif VER2_2},
>
> Somewhere it says something about bootstrapping and stuff like that...
> it seems to have something to do with unicode mappings...
>
> It also said that this wasn't necessary anymore beyond version 2.2.2 or
> something ?



Something like this is normally done when code that the compiler uses as
well is added to the RTL (in this case the unit charset). As the compiler
must first be built with an older compiler (and its older RTL), that older
compiler does not yet know about the charset unit. Therefore the unit is
copied to the compiler's directory with a c prefix (in this case ccharset)
until a release is made which contains the new unit. The unit you are
looking for is in rtl/inc now, so that ifdef construct (and the ccharset
unit) could be removed now.
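
A small sketch of how that conditional resolves, for trying it out with
whatever compiler is at hand. It relies only on the VER2_2 symbol, which
2.2.x compilers define, and on the {$I %FPCVERSION%} include directive; the
program name and messages are made up for illustration and are not part of
the compiler sources.

```pascal
program WhichCharset;
{$mode objfpc}

begin
  {$ifdef VER2_2}
  // Only a 2.2.x compiler defines VER2_2, so only it takes this branch.
  WriteLn('FPC 2.2.x: the compiler-local copy (ccharset) would be used');
  {$else}
  // Every other compiler version falls through to the RTL unit.
  WriteLn('FPC ', {$I %FPCVERSION%}, ': charset from the RTL would be used');
  {$endif}
end.
```

Built with 2.4.2, the else branch is taken, which matches the uses clause
picking charset from rtl/inc.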


Something similar was done a few days ago with the new windirs unit 
which was added as cwindirs to the compiler as well.



> This seems to me like a little unicode-hack to get unicode into the
> compiler or something ?
>
> What the hell is this ? =D
>
> Anyway some questions about the free pascal 2.4.2 sources in relation to
> Delphi XE situation:
>
> In the latest Delphi versions string is now considered a Unicode string.
>
> What's the situation with the options.pas in the compiler folder ?
>
> Lots of string stuff and character stuff going on there... ansistring
> versus unicodestring, ansichar versus unicodechar ?



Options.pas has nothing to do with different string types. It parses the
command line arguments and the configuration file and sets up the initial
defines based on those arguments and files. Mostly you don't need to touch
options.pas at all.



> Seems a bit conflicting for what I am trying to do... which is use some
> of this code in Delphi...
>
> So I am getting all kinds of typecast/implicit string cast warnings and
> errors and stuff and potential data loss
> from string to ansistring... a bit too whacky for my taste but ok...
>
> So to get some sense into all of this let me ask you a simple question:
>
> 1. What type of strings does free pascal use ? Especially in options.pas ?
>
> Are these string types considered to be AnsiStrings or UnicodeStrings ???
>
> And what about char types ? Are those AnsiChar or UnicodeChar ???
>
> (probably also known as WideChar, WideString...)



The compiler itself mostly uses ShortString and pointers to ShortString,
as they don't have reference counting and are thus faster to handle.
In a few rare cases AnsiString (aka String) is used, and WideString is,
as far as I'm aware, never used.


The string types supported by FPC, though, are ShortString, AnsiString,
WideString (non reference counted 2-byte string for Windows
compatibility) and UnicodeString (reference counted 2-byte string). On
all platforms except Windows (Win32, Win64, WinCE), WideString is an
alias for UnicodeString.
In mode Delphi, String is an alias for AnsiString; in all other modes
(unless {$H+} is given) String is an alias for ShortString.
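
A quick way to check which flavour String has in a given mode is to print a
few sizes. This is only a sketch with a made-up program name; it assumes
mode objfpc without {$H+} and no command-line switch that changes the
default string type.

```pascal
program StringKinds;
{$mode objfpc}  // not mode Delphi and no {$H+}: String should be ShortString

var
  S: String;
begin
  // A ShortString occupies a fixed block (length byte plus 255 chars);
  // AnsiString and UnicodeString variables are pointers to managed data.
  WriteLn('SizeOf(String)        = ', SizeOf(S));
  WriteLn('SizeOf(ShortString)   = ', SizeOf(ShortString));
  WriteLn('SizeOf(AnsiString)    = ', SizeOf(AnsiString));
  WriteLn('SizeOf(UnicodeString) = ', SizeOf(UnicodeString));
  WriteLn('SizeOf(AnsiChar)      = ', SizeOf(AnsiChar));
  WriteLn('SizeOf(WideChar)      = ', SizeOf(WideChar));
end.
```

If the first two lines print the same value, String is a ShortString in
that mode; adding {$H+} or switching to mode Delphi should make the first
line match AnsiString instead.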



> (I have in principle done no real programming yet with the newer Delphi
> versions with the unicode stuff in it...
> so this is new stuff for me... and now a bit confusing unfortunately...
> and perhaps even unavoidable confusion...
> because this reinterpretation that new-borland did is now
> conflicting and causing interpretation issues :(
> so it depends on the compiler... and I don't know what free pascal
> does... so that's why I ask here...)
>
> Also there is something I don't understand about the conditional way above:
>
> It reads, in a way:
>
> IF VERSION IS 2.2 THEN USE CCHARSET ELSE CHARSET
>
> The thing is: I am using 2.4.2 and CHARSET is missing from 2.4.2


The condition is correct as written. CCharSet should probably be removed,
as all compilers from 2.4.0 on use CharSet from the RTL directory.




> So perhaps this conditional was meant to read something like:
>
> if Version  2.2 then use CCHARSET else CHARSET; ???
>
> So for 2.4.2 I must probably use CCHARSET.pas the thing with the
> confusing strings remains though ;)
>
> Sorry for messy posting... but this is messy ! ;) =D


No, it's not ;)

Regards,
Sven


Re: [fpc-devel] ccharset.pas, charset.pas and strings/unicode ?

2011-04-06 Thread Skybuck Flying
Hmm ok, so here is a little theoretical/hypothetical question for you to 
think and guess about ;):


Suppose some kind of weird disaster happens, like the tsunami in Japan... all 
our computers are destroyed...


What remains are the free pascal source codes.

What remains is an object pascal compiler which works with unicode strings 
only.


Now suppose string is defined as a unicode string.

This would lead to some problems... but ok... if the compiler supports 
shortstring then that's easily solved...


But my question is a little bit the following:

What would happen if the compiler was unicode only ?

Could the compiler still be built ? I would guess so... unless it depends on 
some ansi strings in assembly or so...


Furthermore what happens to statements/code like this:

SomeString := 'SomeText';


I think in a unicode compiler 'SomeText' might actually be defaulted to 
unicode ?


So then perhaps in compiler it's necessary to typecast this explicitly to:

SomeString := AnsiString('SomeText');

or perhaps even

SomeString := ShortString('SomeText');

I am not sure if these typecasts are needed or if there is a better way...

One way would be:

ShortString := 'SomeText';

But then the assumption would be that the compiler turns the string into 
whatever ShortString is...



But then the question is what would the following do:


if SomeString = 'SomeText' then

???

Would 'SomeText' be used to have the same type as SomeString ?

Would automatic conversion take place ?

or

Would a string type violation occur if the SomeString was of another type 
than the default of 'SomeText'... ?
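
These cases can be tried out directly. The sketch below is only an
experiment with made-up names, assuming a current FPC in mode objfpc with
{$H+}; it shows that the same literal can be assigned to and compared with
each string type, and that the explicit typecasts are accepted as well. How
a unicode-only compiler would treat the literal is not something it can
answer.

```pascal
program LiteralCompare;
{$mode objfpc}{$H+}

var
  S: ShortString;
  A: AnsiString;
  U: UnicodeString;
begin
  // One literal, three destination types.
  S := 'SomeText';
  A := 'SomeText';
  U := 'SomeText';

  // Comparing each variable with the same literal compiles without a cast.
  if S = 'SomeText' then WriteLn('ShortString   = literal');
  if A = 'SomeText' then WriteLn('AnsiString    = literal');
  if U = 'SomeText' then WriteLn('UnicodeString = literal');

  // The explicit typecasts are accepted too, but are not required here.
  A := AnsiString('SomeText');
  S := ShortString('SomeText');
  if A = S then WriteLn('AnsiString    = ShortString');
end.
```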


So that's pretty nasty...

At the moment I have little idea what Delphi XE does... (little experience 
with unicode)


But I would guess everything defaults to unicode ?!? I could be wrong 
though...

(At least that's what it seems to be doing ;))

That does not necessarily mean I agree with how things are done in Delphi XE 
but such is life ;)




Anyway what remains to be discussed is advantages of a unicode compiler...


One thing comes to mind: chinese people and greek people might be able to 
develop a compiler in their own language...



Also what remains is disadvantages of unicode compiler...


You already mentioned possible performance issues... though is there really 
that much difference between shortstring and widestring and ansistring...
it's more or less the same except one has a reference count and another has 
double the amount of characters...


But a bigger disadvantage which I can imagine is operating systems... 
perhaps older ones which do not support unicode ?!?


What would happen to them ?!? Big string corruption me thinks ;) But I could 
be wrong ;)


Maybe even free pascal dos applications could still somehow use unicode if 
the compiler took care of all of it ?


At least internally in the application it would then work... same could be 
done for win95...


Only communication with APIs in win95 or interrupts in dos would probably 
be screwed up... relating to dos those pretty little might be re-written but 
ok.
Must draw the line somewhere... perhaps even unicode-fonts could be included 
;) pff ;) (but that's probably pushing it ! ;) =D) Nice to think of 
possibilities though... I like backwards compatibility quite a lot ;)



Bye,
 Skybuck =D

