Hmm ok, so here is a little theoretical/hypothetical question for you to
think and guess about ;):
Suppose some kind of weird disaster happens, like the tsunami in Japan... all
our computers are destroyed...
What remains are the Free Pascal source codes.
What remains is an Object Pascal compiler which works with unicode strings
only.
Now suppose string is defined as a unicode string.
This would lead to some problems... but ok... if the compiler supports
shortstring then that's easily solved...
But my question is a little bit the following:
What would happen if the compiler was unicode only ?
Could the compiler still be built ? I would guess so... unless it depends on
some ansi strings in assembly or so...
Furthermore what happens to statements/code like this:
SomeString := 'SomeText';
I think in a unicode compiler 'SomeText' might actually be defaulted to
unicode ?
So then perhaps in the compiler sources it's necessary to typecast this explicitly to:
SomeString := AnsiString('SomeText');
or perhaps even
SomeString := ShortString('SomeText');
I am not sure if these typecasts are needed or if there is a better way...
One way would be:
SomeShortString := 'SomeText';
But then the assumption would be that the compiler turns the literal into
whatever type SomeShortString is...
But then the question is what would the following do:
if SomeString = 'SomeText' then
???
Would 'SomeText' be treated as having the same type as SomeString ?
Would automatic conversion take place ?
or
Would a string type violation occur if SomeString was of another type
than the default of 'SomeText'... ?
So that's pretty nasty...
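A minimal sketch of how this can be tried out in Free Pascal, assuming a compiler that knows UnicodeString (2.4.0 or later). The program and variable names are made up for illustration. As far as I know the literal has no fixed string type of its own here; it is converted to the type of whatever it is assigned to or compared with:

```pascal
program LiteralTypes;
{$mode objfpc}{$H+}

var
  S: ShortString;
  A: AnsiString;
  U: UnicodeString;
begin
  // The same literal is assigned to three different string types
  // without any explicit typecast.
  S := 'SomeText';
  A := 'SomeText';
  U := 'SomeText';

  // The literal is compared against each type in turn.
  if S = 'SomeText' then
    WriteLn('ShortString comparison matches');
  if A = 'SomeText' then
    WriteLn('AnsiString comparison matches');
  if U = 'SomeText' then
    WriteLn('UnicodeString comparison matches');

  // Mixing two string types needs a conversion; the explicit cast
  // below only makes visible what would otherwise happen implicitly.
  if U = UnicodeString(A) then
    WriteLn('UnicodeString and AnsiString hold the same text');
end.
```

With plain ASCII text like this, all four comparisons should succeed. Text outside ASCII is a different story, because the conversion between AnsiString and UnicodeString then depends on the code page in use.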
At the moment I have little idea what Delphi XE does... (little experience
with unicode)
But I would guess everything defaults to unicode ?!? I could be wrong
though...
(At least that's what it seems to be doing ;))
That does not necessarily mean I agree with how things are done in Delphi XE
but such is life ;)
Anyway what remains to be discussed is the advantages of a unicode compiler...
One thing comes to mind: "chinese people and greek people might be able to
develop a compiler in their own language..."
Also what remains is the disadvantages of a unicode compiler...
You already mentioned possible performance issues... though is there really
that much difference between shortstring and widestring and ansistring...
it's more or less the same except one has a reference count and another uses
double the number of bytes per character...
But a bigger disadvantage which I can imagine is operating systems...
perhaps older ones which do not support unicode ?!?
What would happen to them ?!? Big string corruption methinks ;) But I could
be wrong ;)
Maybe even free pascal dos applications could still somehow use unicode if
the compiler took care of all of it ?
At least internally in the application it would then work... same could be
done for win95...
Only communication with APIs in win95 or interrupts in dos would probably be
screwed up... for dos those few bits might be re-written but
ok.
Must draw the line somewhere... perhaps even unicode-fonts could be included
;) pff ;) (but that's probably pushing it ! ;) =D) Nice to think of
possibilities though... I like backwards compatibility quite a lot ;)
Bye,
Skybuck =D
----- Original Message -----
From: "Sven Barth" <pascaldra...@googlemail.com>
To: <fpc-devel@lists.freepascal.org>
Sent: Wednesday, 6 April, 2011 14:40 PM
Subject: Re: [fpc-devel] ccharset.pas, charset.pas and strings/unicode ?
Am 06.04.2011 08:30, schrieb Skybuck Flying:
Hello,
I am momentarily confused about the situation with ccharset.pas
and charset.pas and strings, ansistrings and unicode in general... ?!?
So some questions about this:
In particular I do not understand the following uses clause:
{$ifdef VER2_2}ccharset{$else VER2_2}charset{$endif VER2_2},
Somewhere it says something about bootstrapping and stuff like that...
it seems to have something to do with unicode mappings...
It also said that this wasn't necessary anymore beyond version 2.2.2 or
something ?
Something like this is normally done when code is added to the RTL (in
this case the unit "charset") which is used by the compiler as well. As
the compiler must be built with an older compiler (and its older RTL)
first, that compiler does not yet know about the "charset" unit. Therefore
the unit is copied to the compiler's directory with a "c" prefix (in this
case "ccharset") until a release is made which contains that new unit. The
unit you are looking for is in rtl/inc now, so that ifdef-construct (and
the ccharset unit) could be removed now.
Something similar was done a few days ago with the new "windirs" unit
which was added as "cwindirs" to the compiler as well.
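A minimal sketch of that bootstrapping pattern as a standalone program, only to illustrate the mechanism. The program name is made up. It assumes that VER2_2 is defined by a 2.2.x compiler, that "charset" can be found in the RTL of any newer compiler, and that a copy named "ccharset" is in the unit search path when an old 2.2.x compiler is used:

```pascal
program BootstrapUses;

uses
  // An old 2.2.x compiler has no "charset" unit in its RTL yet, so it
  // falls back to the copy with the "c" prefix. Newer compilers take
  // the unit straight from their RTL.
  {$ifdef VER2_2}ccharset{$else}charset{$endif};

begin
  {$ifdef VER2_2}
  WriteLn('Built by a 2.2.x compiler: using the local ccharset copy');
  {$else}
  WriteLn('Built by a newer compiler: using charset from the RTL');
  {$endif}
end.
```

Once no supported starting compiler is 2.2.x anymore, both the ifdef and the copied unit can go and the uses clause shrinks to plain "charset".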
This seems to me like a little unicode-hack to get unicode into the
compiler or something ?
What the hell is this ? =D
Anyway some questions about the free pascal 2.4.2 sources in relation to
Delphi XE situation:
In the latest Delphi versions "string" is now considered a Unicode
string.
What's the situation with "options.pas" in the compiler folder ?
Lots of string stuff and character stuff going on there... ansistring
versus unicodestring, ansichar versus unicodechar ?
Options.pas has nothing to do with different string types. It's for
parsing the command line arguments and the configuration file and for
setting up the start defines based on those arguments and files. Mostly you
don't need to touch options.pas at all.
Seems a bit conflicting for what I am trying to do... which is use some
of this code in Delphi...
So I am getting all kinds of typecast/implicit string cast warnings and
errors and stuff and potential data loss
from "string" to "ansistring"... a bit too whacky for my taste but ok...
So to get some sense into all of this let me ask you a simple question:
1. What type of strings does free pascal use ? Especially in options.pas
?
Are these "string" types considered to be AnsiStrings or UnicodeStrings
???
And what about "char" types ? Are those AnsiChar or UnicodeChar ???
(probably also known as widechar, widestring...)
The compiler itself mostly uses ShortString and pointers to ShortString as
they don't have the reference counting and thus are faster to handle. In
some rare cases AnsiString (aka String) is used and WideString is - as
far as I'm aware - never used.
The string types supported by FPC though are ShortString, AnsiString,
WideString (non reference counted 2 Byte String for Windows compatibility)
and UnicodeString (reference counted 2 Byte String). On all platforms
except Windows (Win32, Win64, WinCE) a WideString is an alias for
UnicodeString.
In mode Delphi "String" is an alias for "AnsiString"; in all other modes
(unless $H+ is given) "String" is an alias for "ShortString".
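A minimal sketch to check this on your own compiler, assuming that $H can be switched between two type declarations. The program and type names are made up for illustration:

```pascal
program StringAliases;
{$mode objfpc}

{$H-}
type
  TShortAlias = string;   // with $H- "string" means ShortString
{$H+}
type
  TLongAlias = string;    // with $H+ "string" means AnsiString

begin
  // A ShortString stores a length byte plus up to 255 characters in place.
  WriteLn('SizeOf(TShortAlias) = ', SizeOf(TShortAlias));
  // An AnsiString variable is only a pointer to reference counted data.
  WriteLn('SizeOf(TLongAlias)  = ', SizeOf(TLongAlias));
  WriteLn('SizeOf(Pointer)     = ', SizeOf(Pointer));
  // Character sizes behind the 1 byte and 2 byte string types.
  WriteLn('SizeOf(AnsiChar)    = ', SizeOf(AnsiChar));
  WriteLn('SizeOf(WideChar)    = ', SizeOf(WideChar));
end.
```

The first line should report the full in-place size of a ShortString, while the second should match the pointer size, which is an easy way to tell which alias a given unit ends up with.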
(I have in principle done no real programming yet with the newer Delphi
versions with the unicode stuff in it...
so this is new stuff for me... and a bit confusing unfortunately...
and perhaps even unavoidably confusing...
because this "reinterpretation" that "new-borland" did is now
conflicting and causing interpretation issues :(
so it depends on the compiler... and I don't know what Free Pascal
does... so that's why I ask here...)
Also there is something I don't understand about the conditional way
above:
It reads in a way:
IF VERSION IS 2.2 THEN USE CCHARSET ELSE CHARSET
The thing is: I am using 2.4.2 and CHARSET is missing from 2.4.2
This condition is the correct one. CCharSet should maybe be removed, as all
compilers from 2.4.0 on use CharSet from the RTL directory.
So perhaps this conditional was meant to read something like:
if Version > 2.2 then use CCHARSET else CHARSET; ???
So for 2.4.2 I must probably use CCHARSET.pas the thing with the
confusing strings remains though ;)
Sorry for the messy posting... but this is messy ! ;) =D
No, it's not ;)
Regards,
Sven
_______________________________________________
fpc-devel maillist - fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel