Subject: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)


On 01/07/2013 02:01 PM, Tomas Hajny wrote:
> (also just my understanding of what Jonas wrote)

I feel you are wrong. The string does not know the encoding in which its
content is to be interpreted (unlike in Delphi XE).


-Michael
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel

Re: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)

So the ambiguity with _filling_ a string with data in fact arises when 
_not_ using the #nn notation :-) . With #nn the effect (i.e. the 
resulting binary) is obvious.


-Michael

Re: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)

2013-01-07 Thread Tomas Hajny

On Mon, January 7, 2013 13:28, Ewald wrote:
> Once upon a time, on 01/07/2013 12:39 PM to be precise, Michael Schnell
> said:
>> On 01/05/2013 12:28 PM, Jonas Maebe wrote:
>>> Using whatever #xx#xx or #xx#xx#xx sequence represents the UTF-8
>>> encoding of that character.
>> Sorry, I can't follow. Does #xx not just define a numerical
>> representation of an 8 bit entity ?
>>
>> The interpretation in any code might be done later by any code that
>> digests the string.
>>
>> Am I wrong ?
> I *think* Jonas is trying to say that if you want the character `Ǿ` in a
> string you would either type
> - 'Ǿ' or
> - #$C7#$BE if you want to keep the source free of encoding specific
> characters
 .
 .

...or
- #$01FE and then the whole string becomes a Unicode string which is
either kept that way (if it is assigned to a UnicodeString constant), or
it is converted to some 8-bit encoding at compile time (if it is assigned
to an 8-bit constant/variable like ansistring)
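
The widechar case can be checked with a short program. This is only a minimal sketch, not something posted in the thread; it assumes a Free Pascal version with UnicodeString support, and the program and variable names are made up for illustration.

```pascal
program WideCharConst;
{$mode objfpc}{$H+}
var
  U: UnicodeString;   // illustrative variable name
  I: Integer;
begin
  // #$01FE does not fit in 8 bits, so the constant is a widechar constant
  // and is stored as UTF-16 when assigned to a UnicodeString.
  U := #$01FE;
  // Print every UTF-16 code unit of the resulting string in hexadecimal.
  for I := 1 to Length(U) do
    WriteLn(HexStr(Ord(U[I]), 4));
end.
```

If the understanding above is right, the loop should report a single code unit with the value 01FE.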

(also just my understanding of what Jonas wrote)

Tomas



Re: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)

2013-01-07 Thread Ewald

Once upon a time, on 01/07/2013 12:39 PM to be precise, Michael Schnell
said:
> On 01/05/2013 12:28 PM, Jonas Maebe wrote:
>> Using whatever #xx#xx or #xx#xx#xx sequence represents the UTF-8
>> encoding of that character.
> Sorry, I can't follow. Does #xx not just define a numerical
> representation of an 8 bit entity ?
>
> The interpretation in any code might be done later by any code that
> digests the string.
>
> Am I wrong ?
I *think* Jonas is trying to say that if you want the character `Ǿ` in a
string you would either type
- 'Ǿ' or
- #$C7#$BE if you want to keep the source free of encoding specific
characters

You as a programmer make up what you do with it afterwards, if you
decide to write it to an UTF-8 terminal, you would get `Ǿ`, and if you
write it to some other terminal you might see a character that matches
$C7, followed by a character that matches $BE in the lookuptable of the
encoding of the terminal. Look at it this way: the byte sequence ($C7,
$BE) has got no meaning to the compiler whatsoever, it is a byte
sequence. That's what matters to the compiler, what is in this sequence
is for you to decide.
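
As a minimal sketch of that point (not from the original posts): the program below stores the two bytes in an AnsiString and prints them back in hexadecimal. It assumes the source file carries no {$codepage} directive and is compiled without -Fc, so the constant is taken as plain 8-bit data; the names are illustrative.

```pascal
program ByteSequence;
{$mode objfpc}{$H+}
var
  S: AnsiString;   // illustrative variable name
  I: Integer;
begin
  // Two 8-bit values; the compiler attaches no meaning to them.
  S := #$C7#$BE;
  // Print each byte in hexadecimal, one per line.
  for I := 1 to Length(S) do
    WriteLn(HexStr(Ord(S[I]), 2));
end.
```

Whether those two bytes show up as `Ǿ` or as two unrelated characters depends only on what later interprets them, for example the terminal.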

Correct me if I'm wrong.

-- 
Ewald


Re: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)


On 01/05/2013 01:35 PM, Jy V wrote:
> I do vote for UTF-8

-1

Since conversions in the RTL (or LCL) are a rather infrequent
runtime task, GUI performance issues do not really need to be
considered.

The issues that do matter seem to be Delphi compatibility, backward
compatibility, usability, and runtime performance with time-consuming
complex string tasks (these seem to vote against UTF-8, but for either
static UTF-16 or (quasi-)dynamic (CE-alike) encoding), as well as memory
usage and runtime performance with time-consuming simple string tasks
(which vote for locale-based ANSI or UTF-8).


-Michael

Re: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)


On 01/05/2013 01:10 PM, Michael Van Canneyt wrote:
> String = Ansistring.

Which is the mother of all confusion, IMHO :-[ .

-Michael

Re: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)


On 01/05/2013 12:28 PM, Jonas Maebe wrote:
> Using whatever #xx#xx or #xx#xx#xx sequence represents the UTF-8
> encoding of that character.

Sorry, I can't follow. Does #xx not just define a numerical
representation of an 8 bit entity ?

The interpretation in any code might be done later by any code that
digests the string.

Am I wrong ?

-Michael

Re: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)

2013-01-06 Thread Michael Van Canneyt




On Sun, 6 Jan 2013, Hans-Peter Diettrich wrote:

> Michael Van Canneyt wrote:
>>> IMO resource strings are for display purposes, so that UTF-8/16 encoding
>>> is expected by an OS API.  AFAIR Win32 string resources are stored in
>>> UTF-16,
>>
>> You are very much wrong.
>
> Not really. I was talking about Win32 resources, not about what FPC makes
> from resourcestring.

The discussion is about unnecessary conversions in *FPC resourcestrings*,
not about win32 resources.

Why you brought up the Windows resourcestrings was (and is) a mystery to me.
From your statement, I assumed that you probably thought FPC stores its
resourcestrings as win32 resources. It does not.

>> To start with, resource strings are not stored as Win32 resources.
>> Secondly, they are stored in the code as an ansistring.
>>
>> The resource string of the above example is stored as:
>>
>> .globl  _$PROGRAM$_Ld2
>> _$PROGRAM$_Ld2:
>> .ascii  "Something\000"
>> .balign 8
>> .short  0,1
>> .long   0
>> .quad   -1,15
>> .globl  _$PROGRAM$_Ld3
>>
>> Thirdly: in my apps, no UTF-8/16 encoding is expected by the OS. If it
>> were, I would have used widestrings instead of ansistring to begin with,
>> and in that case I would not have made any remark...
>
> I don't know which OS you're using, but the WinAPI uses UTF-16 throughout.

I use both windows and Linux.

You are mistakenly assuming that I am using Windows GUI calls or so.
There is no GUI.

Probably the only call that cares about codepage is FileCreate(), and that is
not done using resource strings.
For the rest, all is done using FileWrite() and sendto()/recvfrom().
Neither cares about encoding. They transfer bytes, that's it.

So I use ansistrings throughout.

And hence, resourcestrings being stored in unicode format would cause totally
unnecessary conversions.
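
To illustrate the "they transfer bytes" point, here is a minimal sketch (not part of the original mail) that writes an AnsiString with FileWrite() from SysUtils. The file name is hypothetical and error handling is kept to the bare minimum.

```pascal
program WriteBytes;
{$mode objfpc}{$H+}
uses
  SysUtils;
const
  FileName = 'resourcestring-demo.bin';  // hypothetical file name
var
  S: AnsiString;
  H: THandle;
  Written: LongInt;
begin
  S := 'Something';
  H := FileCreate(FileName);
  // FileCreate signals failure by returning THandle(-1).
  if H = THandle(-1) then
  begin
    WriteLn('Could not create ', FileName);
    Halt(1);
  end;
  // FileWrite takes an untyped buffer and a byte count;
  // no code page information is passed along.
  Written := FileWrite(H, S[1], Length(S));
  FileClose(H);
  WriteLn(Written, ' bytes written');
end.
```

The only place where an encoding could matter in this sketch is the file name handed to FileCreate(), which matches the remark above.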

Michael.

Re: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)

2013-01-06 Thread Hans-Peter Diettrich


Paul Ishenin wrote:
> 05.01.13, 23:54, Michael Van Canneyt wrote:
>> You are very much wrong.
>>
>> To start with, resource strings are not stored as Win32 resources.
>
> I personally think that resources should be stored in their native
> formats where possible. This will allow changing them using software
> designed for that environment. For example, for Windows there are many
> resource editors which can replace icons, bitmaps and string resources
> too. It would be nice to have this ability also for binaries which FPC
> produces. On OS X resources are also stored differently from what FPC
> does currently - they are stored in application bundles as far as I
> know, so they can be edited by external programs too.

Point taken :-)

But I'm not sure about the use of native resources nowadays. Even on Windows
most programs nowadays don't use Windows resources for their menus,
dialog boxes etc. any more. I've used the Delphi ResourceWorkshop for
some time, to tweak some third party programs and even Windows itself.

This will be almost impossible with current software. Try e.g. to set
the Windows menu color to yellow, which I did for a long time, and you'll
find out that the Explorer and many other Windows tools don't honor that
setting. Or you'll find that these system settings have been removed
altogether, replaced perhaps by themes?

So I'm not sure about the use of native resources nowadays. How should
a multi-platform application handle a string or graphical (icon...)
resource, so that it can be designed on one platform, and be shown on
all other platforms without modifications?

With graphical resources I'd use a single internal (FPC) format, which
is converted by the widgetset for use on the target platform. String
resources may require more adjustments than only a translation, to match
the different semantics of other languages - independently of the
target platform.

That's why I'd suggest UTF-8 encoding for resource strings, which doesn't
affect program logic because AnsiString still can be used. The *encoded*
AnsiStrings require that the coder knows about the best encoding of
every string, when he wants to reduce the number of implicit string
conversions. Using AnsiString(CP_ACP) may be a reasonable decision for
use in a program with *very* limited usage (one country, one language,
one target platform...), but FPC should support programs with a broader
audience as well.


DoDi


Re: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)

2013-01-06 Thread Hans-Peter Diettrich


Michael Van Canneyt wrote:
>> IMO resource strings are for display purposes, so that UTF-8/16
>> encoding is expected by an OS API.  AFAIR Win32 string resources are
>> stored in UTF-16,
>
> You are very much wrong.

Not really. I was talking about Win32 resources, not about what FPC
makes from resourcestring.

> To start with, resource strings are not stored as Win32 resources.
> Secondly, they are stored in the code as an ansistring.
>
> The resource string of the above example is stored as:
>
> .globl  _$PROGRAM$_Ld2
> _$PROGRAM$_Ld2:
> .ascii  "Something\000"
> .balign 8
> .short  0,1
> .long   0
> .quad   -1,15
> .globl  _$PROGRAM$_Ld3
>
> Thirdly: in my apps, no UTF-8/16 encoding is expected by the OS. If it
> were, I would have used widestrings instead of ansistring to begin with,
> and in that case I would not have made any remark...

I don't know which OS you're using, but the WinAPI uses UTF-16
throughout. I suppose that other OS also use some Unicode string
representation, for lossless representation of texts of all languages.

The dual W/A interface of Win32 is due to the stripped-down Win9x
versions, which require Unicode extensions for supporting more than
CP_ACP. But now we are in 2013, with Unicode being present everywhere.

> So the conversion really would be 100% totally redundant.

It may look so to you...

Why then do you use resourcestring instead of ordinary string constants?

Another note and question, about multi-lingual resources. Windows
resource scripts (.RC) allow for multi-lingual stringtables. In my
recent research I learned that the resource compiler extracts the
requested language from the script, and stores only these strings in the
resource file (.RES) and application (.EXE, .DLL). That's why
resourcestring was added to Delphi.

How does FPC support the same? (.PO files?)

DoDi


Re: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)

On Sat, 5 Jan 2013, Paul Ishenin wrote:

> 05.01.13, 23:54, Michael Van Canneyt wrote:
>> You are very much wrong.
>>
>> To start with, resource strings are not stored as Win32 resources.
>
> I personally think that resources should be stored in their native formats
> where possible. This will allow changing them using software designed for
> that environment. For example, for Windows there are many resource editors
> which can replace icons, bitmaps and string resources too. It would be nice to
> have this ability also for binaries which FPC produces. On OS X resources are
> also stored differently from what FPC does currently - they are stored in
> application bundles as far as I know, so they can be edited by external
> programs too.

And Jonas is worried about the overhead in the compiler of a simple 1/2 byte format?
I doubt this will relieve his worries ;-)

The idea of FPC's resourcestring implementation has always been to be independent
of any OS features, so that it is a cross-platform solution. That's how it is implemented,
and as far as I am concerned that's how it should stay as the default.

If we do what you suggest by default, then people making a cross-platform app will need to start
using different technologies to translate their strings. I doubt that is a good solution.

Currently, people who want to use native win32/OSX resource strings always have the option
of doing so, but no special language support for it exists.

Michael.

Re: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)

2013-01-05 Thread Paul Ishenin


05.01.13, 23:54, Michael Van Canneyt wrote:
> You are very much wrong.
>
> To start with, resource strings are not stored as Win32 resources.

I personally think that resources should be stored in their native
formats where possible. This will allow changing them using software
designed for that environment. For example, for Windows there are many
resource editors which can replace icons, bitmaps and string resources
too. It would be nice to have this ability also for binaries which FPC
produces. On OS X resources are also stored differently from what FPC
does currently - they are stored in application bundles as far as I
know, so they can be edited by external programs too.


Best regards,
Paul Ishenin

Re: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)




On Sat, 5 Jan 2013, Hans-Peter Diettrich wrote:


> Michael Van Canneyt wrote:
>> On Sat, 5 Jan 2013, Jonas Maebe wrote:
>>> On 05 Jan 2013, at 12:53, Paul Ishenin wrote:
>>>> ResourceStrings are stored as AnsiString type with 0 codepage (as I
>>>> remember). Delphi now stores ResourceStrings as UnicodeString type. I
>>>> think FPC will follow this in m_default_unicodestring modeswitch.
>>>
>>> It would probably even be better to always do that. At least I don't see a
>>> downside, other than slightly larger binaries (and that's not an issue in
>>> this case as far as I'm concerned; maintaining two separate resourcestring
>>> systems/handlers is just not worth the trouble).
>>
>> But it means that for
>>
>> Resourcestring
>>   AString = 'Something';
>>
>> Var
>>   S : Ansistring;
>>
>> begin
>>   S:=AString;
>> end.
>>
>> Always a conversion will happen.
>>
>> I do not think this is a good idea given that currently, String =
>> Ansistring.
>
> IMO resource strings are for display purposes, so that UTF-8/16 encoding is
> expected by an OS API.  AFAIR Win32 string resources are stored in UTF-16,

You are very much wrong.

To start with, resource strings are not stored as Win32 resources.
Secondly, they are stored in the code as an ansistring.

The resource string of the above example is stored as:

.globl  _$PROGRAM$_Ld2
_$PROGRAM$_Ld2:
.ascii  "Something\000"
.balign 8
.short  0,1
.long   0
.quad   -1,15
.globl  _$PROGRAM$_Ld3

Thirdly: in my apps, no UTF-8/16 encoding is expected by the OS.
If it were, I would have used widestrings instead of ansistring
to begin with, and in that case I would not have made any remark...

So the conversion really would be 100% totally redundant.

Michael.

Re: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)

2013-01-05 Thread Hans-Peter Diettrich


Michael Van Canneyt wrote:
> On Sat, 5 Jan 2013, Jonas Maebe wrote:
>> On 05 Jan 2013, at 12:53, Paul Ishenin wrote:
>>> ResourceStrings are stored as AnsiString type with 0 codepage (as I
>>> remember). Delphi now stores ResourceStrings as UnicodeString type. I
>>> think FPC will follow this in m_default_unicodestring modeswitch.
>>
>> It would probably even be better to always do that. At least I don't see a
>> downside, other than slightly larger binaries (and that's not an issue in
>> this case as far as I'm concerned; maintaining two separate resourcestring
>> systems/handlers is just not worth the trouble).
>
> But it means that for
>
> Resourcestring
>   AString = 'Something';
>
> Var
>   S : Ansistring;
>
> begin
>   S:=AString;
> end.
>
> Always a conversion will happen.
>
> I do not think this is a good idea given that currently, String =
> Ansistring.

IMO resource strings are for display purposes, so that UTF-8/16 encoding
is expected by an OS API. AFAIR Win32 string resources are stored in
UTF-16, so that assignments to an AnsiString already require a
conversion. So IMO UTF-8 would be better, for now and in future.


DoDi


Re: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)




On Sat, 5 Jan 2013, Sven Barth wrote:

> On 05.01.2013 14:16, Michael Van Canneyt wrote:
>> On Sat, 5 Jan 2013, Jonas Maebe wrote:
>>> On 05 Jan 2013, at 13:10, Michael Van Canneyt wrote:
>>>> On Sat, 5 Jan 2013, Jonas Maebe wrote:
>>>>> On 05 Jan 2013, at 12:53, Paul Ishenin wrote:
>>>>>> ResourceStrings are stored as AnsiString type with 0 codepage (as I
>>>>>> remember). Delphi now stores ResourceStrings as UnicodeString type.
>>>>>> I think FPC will follow this in m_default_unicodestring modeswitch.
>>>>>
>>>>> It would probably even be better to always do that. At least I don't
>>>>> see a downside, other than slightly larger binaries (and that's not an
>>>>> issue in this case as far as I'm concerned; maintaining two separate
>>>>> resourcestring systems/handlers is just not worth the trouble).
>>>>
>>>> But it means that for
>>>>
>>>> Resourcestring
>>>>  AString = 'Something';
>>>>
>>>> Var
>>>>  S : Ansistring;
>>>>
>>>> begin
>>>>  S:=AString;
>>>> end.
>>>>
>>>> Always a conversion will happen.
>>>>
>>>> I do not think this is a good idea given that currently, String =
>>>> Ansistring.
>>>
>>> String will always be shortstring or ansistring in the syntax modes in
>>> which that is currently the case. And yes, it will involve a
>>> conversion in that case. Just like every single constant string
>>> assignment to an ansistring in 2.6.x in case the constant string
>>> contains non-ASCII characters and is part of a {$codepage xxx} file
>>> (because those strings are all stored as unicodestring in the program
>>> there).
>>
>> Judging by all the code that I have written during 14 years, there would
>> never be a single conversion necessary.
>> This system would force them on me for every single use.
>>
>> I do not think that the support of both ansi/unicode string resources is
>> such a burden that it justifies that.
>>
>> I admittedly have limited knowledge of compiler internals, but I cannot
>> imagine that being able to store them in 2 formats (ansi and some form
>> of unicode) is more than a matter of maintaining 1 flag per string, and
>> writing a word instead of a byte.
>>
>> All the other code, needed for conversions depending on codepage and
>> whatnot settings, is necessary anyway.
>
> You forget also the code necessary to translate resourcestrings (at runtime).
> Currently the ResourceString related code inside rtl/objpas/objpas.pp only
> handles AnsiString and then this would need to be adjusted so that
> UnicodeString can also be handled. For example there will be the need for a
> "SetResourceStrings" overload with a UnicodeString based TResourceIterator.

No, I had thought of that.
It will need to be changed anyhow, and fell under "is necessary anyway",
since we'll need some kind of backwards-compatibility mechanism.


Michael.

Re: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)

2013-01-05 Thread Sven Barth


On 05.01.2013 14:16, Michael Van Canneyt wrote:
> On Sat, 5 Jan 2013, Jonas Maebe wrote:
>> On 05 Jan 2013, at 13:10, Michael Van Canneyt wrote:
>>> On Sat, 5 Jan 2013, Jonas Maebe wrote:
>>>> On 05 Jan 2013, at 12:53, Paul Ishenin wrote:
>>>>> ResourceStrings are stored as AnsiString type with 0 codepage (as I
>>>>> remember). Delphi now stores ResourceStrings as UnicodeString type.
>>>>> I think FPC will follow this in m_default_unicodestring modeswitch.
>>>>
>>>> It would probably even be better to always do that. At least I don't
>>>> see a downside, other than slightly larger binaries (and that's not an
>>>> issue in this case as far as I'm concerned; maintaining two separate
>>>> resourcestring systems/handlers is just not worth the trouble).
>>>
>>> But it means that for
>>>
>>> Resourcestring
>>>  AString = 'Something';
>>>
>>> Var
>>>  S : Ansistring;
>>>
>>> begin
>>>  S:=AString;
>>> end.
>>>
>>> Always a conversion will happen.
>>>
>>> I do not think this is a good idea given that currently, String =
>>> Ansistring.
>>
>> String will always be shortstring or ansistring in the syntax modes in
>> which that is currently the case. And yes, it will involve a
>> conversion in that case. Just like every single constant string
>> assignment to an ansistring in 2.6.x in case the constant string
>> contains non-ASCII characters and is part of a {$codepage xxx} file
>> (because those strings are all stored as unicodestring in the program
>> there).
>
> Judging by all the code that I have written during 14 years, there would
> never be a single conversion necessary.
> This system would force them on me for every single use.
>
> I do not think that the support of both ansi/unicode string resources is
> such a burden that it justifies that.
>
> I admittedly have limited knowledge of compiler internals, but I cannot
> imagine that being able to store them in 2 formats (ansi and some form
> of unicode) is more than a matter of maintaining 1 flag per string, and
> writing a word instead of a byte.
>
> All the other code, needed for conversions depending on codepage and
> whatnot settings, is necessary anyway.

You forget also the code necessary to translate resourcestrings (at
runtime). Currently the ResourceString related code inside
rtl/objpas/objpas.pp only handles AnsiString and then this would need to
be adjusted so that UnicodeString can also be handled. For example there
will be the need for a "SetResourceStrings" overload with a
UnicodeString based TResourceIterator.


Regards,
Sven

Re: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)

On Sat, 5 Jan 2013, Jonas Maebe wrote:

> On 05 Jan 2013, at 13:10, Michael Van Canneyt wrote:
>> On Sat, 5 Jan 2013, Jonas Maebe wrote:
>>> On 05 Jan 2013, at 12:53, Paul Ishenin wrote:
>>>> ResourceStrings are stored as AnsiString type with 0 codepage (as I remember).
>>>> Delphi now stores ResourceStrings as UnicodeString type. I think FPC will
>>>> follow this in m_default_unicodestring modeswitch.
>>>
>>> It would probably even be better to always do that. At least I don't see a
>>> downside, other than slightly larger binaries (and that's not an issue in
>>> this case as far as I'm concerned; maintaining two separate resourcestring
>>> systems/handlers is just not worth the trouble).
>>
>> But it means that for
>>
>> Resourcestring
>>  AString = 'Something';
>>
>> Var
>>  S : Ansistring;
>>
>> begin
>>  S:=AString;
>> end.
>>
>> Always a conversion will happen.
>>
>> I do not think this is a good idea given that currently, String = Ansistring.
>
> String will always be shortstring or ansistring in the syntax modes in which
> that is currently the case. And yes, it will involve a conversion in that case.
> Just like every single constant string assignment to an ansistring in 2.6.x in
> case the constant string contains non-ASCII characters and is part of a
> {$codepage xxx} file (because those strings are all stored as unicodestring in
> the program there).

Judging by all the code that I have written during 14 years, there would never
be a single conversion necessary.
This system would force them on me for every single use.

I do not think that the support of both ansi/unicode string resources is such a
burden that it justifies that.

I admittedly have limited knowledge of compiler internals, but I cannot imagine that being able to store them
in 2 formats (ansi and some form of unicode) is more than a matter of maintaining 1 flag per string, and writing
a word instead of a byte.

All the other code, needed for conversions depending on codepage and whatnot
settings, is necessary anyway.

Michael.

Re: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)

2013-01-05 Thread Jy V

> Yes, the exception is probably UTF-8 on Unix systems, but is that really
> worth it to complicate the compiler and RTL? Resourcestrings are generally
> not used in performance-critical code, I'd assume. Always using UTF-8 is
> however also fine for me,


I do vote for UTF-8


> btw. I just don't believe it is worth the trouble to support both
> unicodestring and ansistring resourcestrings.
>

I agree.

Re: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)

On 05 Jan 2013, at 13:33, Martin Schreiber wrote:

> On Saturday 05 January 2013 12:57:44 Jonas Maebe wrote:
>> On 05 Jan 2013, at 12:53, Martin Schreiber wrote:
>>> So compiled with -Fcutf8
>>> "
>>> unicodestringvar:= 'Best'#228'tigung';
>>> "
>>> produces a different result on fixes_2_6 and trunk? I assume in trunk
>>> there will be a compile error?
>> 
>> No. In both cases it results in a widestring with this content:
>> 
>> .short   66,101,115,116,228,116,105,103,117,110,103,0
>> 
>> I guess invalid utf-8 values are just copied through by the compiler. As
>> mentioned: absolutely nothing whatsoever changed in how character sequences
>> are interpreted by the compiler in 2.7.x. The explanation you quoted above
>> (and which I deleted) applies to both 2.6.x and 2.7.x. I really don't know
>> how I can say this in another way, and repeating it clearly doesn't help.
>> 
>> I think it's best if you compile trunk for yourself and test as many
>> scenarios as you can, because I feel I cannot add anything further to the
>> discussion, and I'm not interested in playing compile bot.
>> 
> Then it was a misunderstanding again

No, it was simply an omission in my explanation. As mentioned above: "I guess 
invalid utf-8 values are just copied through by the compiler". It's a special 
case, but the special case is the same in 2.6.x and 2.7.x (2.6.x converts the 
UTF-8 string to UTF-16 immediately in the scanner, while 2.7.x does it while 
processing the assignment; the actual conversion code that's used is however 
exactly the same). The fact that everything remains 100% the same in all cases 
everywhere always between 2.6.x and 2.7.x has been mentioned at least 10 times 
in this thread, and that's what I keep trying to make clear. But I give up.

Jonas

Re: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)

2013-01-05 Thread Martin Schreiber

On Saturday 05 January 2013 12:57:44 Jonas Maebe wrote:
> On 05 Jan 2013, at 12:53, Martin Schreiber wrote:
> > So compiled with -Fcutf8
> > "
> > unicodestringvar:= 'Best'#228'tigung';
> > "
> > produces a different result on fixes_2_6 and trunk? I assume in trunk
> > there will be a compile error?
>
> No. In both cases it results in a widestring with this content:
>
> .short   66,101,115,116,228,116,105,103,117,110,103,0
>
> I guess invalid utf-8 values are just copied through by the compiler. As
> mentioned: absolutely nothing whatsoever changed in how character sequences
> are interpreted by the compiler in 2.7.x. The explanation you quoted above
> (and which I deleted) applies to both 2.6.x and 2.7.x. I really don't know
> how I can say this in another way, and repeating it clearly doesn't help.
>
> I think it's best if you compile trunk for yourself and test as many
> scenarios as you can, because I feel I cannot add anything further to the
> discussion, and I'm not interested in playing compile bot.
>
Then it was a misunderstanding again because I read
"
Alternatively, in both cases you can instead define a unicodestring/widestring 
constant instead of an ansistring/shortstring constant by embedding widechar 
constants in the character sequence. Such widechar constants are of the form 
# with  a valid Pascal representation of an integer constant 
between 255 and 65535.
"
and
"
Whether or not they contain character literals whose value is >#127 in the 
source code's code page, or explicit "#xx", "#xxx" etc expressions has no 
influence, nothing changed in the compiler in that account.
"
and
"
I have no idea how anything I wrote suggests that it wouldn't. As mentioned, 
the only difference is that string constants containing characters >#127 are 
no longer always converted to unicodestring constants at compile time.
"
--> >#255 <> >#127 and the question arose how can one define "widechar 
constants" for strings without a character value >255.

Martin

Re: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)

On 05 Jan 2013, at 13:10, Michael Van Canneyt wrote:

> On Sat, 5 Jan 2013, Jonas Maebe wrote:
> 
>> 
>> On 05 Jan 2013, at 12:53, Paul Ishenin wrote:
>> 
>>> ResourceStrings are stored as AnsiString type with 0 codepage (as I 
>>> remember). Delphi now stores ResourceStrings as UnicodeString type. I think 
>>> FPC will follow this in m_default_unicodestring modeswitch.
>> 
>> It would probably even be better to always do that. At least I don't see a
>> downside, other than slightly larger binaries (and that's not an issue in
>> this case as far as I'm concerned; maintaining two separate resourcestring
>> systems/handlers is just not worth the trouble).
> 
> But it means that for
> 
> Resourcestring
>  AString = 'Something';
> 
> Var
>  S : Ansistring;
> 
> begin
>  S:=AString;
> end.
> 
> Always a conversion will happen.
> 
> I do not think this is a good idea given that currently, String = Ansistring.

String will always be shortstring or ansistring in the syntax modes in which 
that is currently the case. And yes, it will involve a conversion in that case. 
Just like every single constant string assignment to an ansistring in 2.6.x in 
case the constant string contains non-ASCII characters and is part of a 
{$codepage xxx} file (because those strings are all stored as unicodestring in 
the program there).

Then again, it will also involve a conversion if the implementation using 
ansistrings is fixed to support non-ASCII resourcestrings and the system 
codepage is different from the code page in which the resource string has been 
stored by the compiler. In fact, it will then cause two conversions on most 
systems (few systems can directly transcode from arbitrary code page X to 
arbitrary code page Y; most use UTF-16 as intermediate format, although some 
can probably also use UTF-8).
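
The two-step route through UTF-16 can be sketched as follows. This is an illustration only, not the RTL's actual resourcestring code: it assumes FPC 3.0 or later (for RawByteString), uses UTF-8 as both the source and the target code page to keep the example self-contained, and the variable names are made up.

```pascal
program TwoStepTranscode;
{$mode objfpc}{$H+}
var
  Source, Target: RawByteString;   // illustrative names
  Intermediate: UnicodeString;
begin
  // UTF-8 bytes of U+01FE; RawByteString avoids any implicit conversion.
  Source := #$C7#$BE;
  // Step 1: source code page (here UTF-8) -> UTF-16 intermediate.
  Intermediate := UTF8Decode(Source);
  // Step 2: UTF-16 intermediate -> target code page (here UTF-8 again).
  Target := UTF8Encode(Intermediate);
  // Report the length of each representation in code units.
  WriteLn(Length(Source), ' ', Length(Intermediate), ' ', Length(Target));
end.
```

With two different 8-bit code pages the shape stays the same; only the decode and encode routines change, which is why each such assignment costs two conversions.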

Yes, the exception is probably UTF-8 on Unix systems, but is that really worth 
it to complicate the compiler and RTL? Resourcestrings are generally not used in 
performance-critical code, I'd assume. Always using UTF-8 is however also fine 
for me, btw. I just don't believe it is worth the trouble to support both 
unicodestring and ansistring resourcestrings.

Jonas

Re: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)




On Sat, 5 Jan 2013, Jonas Maebe wrote:

> On 05 Jan 2013, at 12:53, Paul Ishenin wrote:
>
>> ResourceStrings are stored as AnsiString type with 0 codepage (as I remember).
>> Delphi now stores ResourceStrings as UnicodeString type. I think FPC will
>> follow this in m_default_unicodestring modeswitch.
>
> It would probably even be better to always do that. At least I don't see a
> downside, other than slightly larger binaries (and that's not an issue in
> this case as far as I'm concerned; maintaining two separate resourcestring
> systems/handlers is just not worth the trouble).


But it means that for

Resourcestring
  AString = 'Something';

Var
  S : Ansistring;

begin
  S:=AString;
end.

a conversion will always happen.

I do not think this is a good idea given that currently, String = Ansistring.

Michael.

Re: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)

On 05 Jan 2013, at 12:53, Paul Ishenin wrote:

> ResourceStrings are stored as AnsiString type with 0 codepage (as I 
> remember). Delphi now stores ResourceStrings as UnicodeString type. I think 
> FPC will follow this in m_default_unicodestring modeswitch.

It would probably even be better to always do that. At least I don't see a 
downside, other than slightly larger binaries (and that's not an issue in this 
case as far as I'm concerned; maintaining two separate resourcestring 
systems/handlers is just not worth the trouble).

Jonas

Re: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)

On 05 Jan 2013, at 12:53, Martin Schreiber wrote:

> So compiled with -Fcutf8
> "
> unicodestringvar:= 'Best'#228'tigung';
> "
> produces a different result on fixes_2_6 and trunk? I assume in trunk there 
> will be a compile error?

No. In both cases it results in a widestring with this content:

.short  66,101,115,116,228,116,105,103,117,110,103,0

I guess invalid utf-8 values are just copied through by the compiler. As 
mentioned: absolutely nothing whatsoever changed in how character sequences are 
interpreted by the compiler in 2.7.x. The explanation you quoted above (and 
which I deleted) applies to both 2.6.x and 2.7.x. I really don't know how I can 
say this in another way, and repeating it clearly doesn't help.

I think it's best if you compile trunk for yourself and test as many scenarios 
as you can, because I feel I cannot add anything further to the discussion, and 
I'm not interested in playing compile bot.
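
A minimal test program for this kind of experiment could look as follows. It is a sketch added for illustration and is not from the original message; the program name and the Dump helper are hypothetical. It prints the 16-bit code units of two literals: the one from the question, and one containing a widechar constant, which per the explanation quoted in this thread makes the whole literal a sequence of UTF-16 code units. No expected output is given for the first literal, since that is precisely what should be checked per compiler version and with or without -Fcutf8.

```pascal
program dumpunits;

{$mode objfpc}{$H+}

// Print the 16-bit code units of a UnicodeString as decimal numbers.
procedure Dump(const Title: string; const U: UnicodeString);
var
  i: Integer;
begin
  Write(Title, ':');
  for i := 1 to Length(U) do
    Write(' ', Ord(U[i]));
  WriteLn;
end;

var
  A, B: UnicodeString;

begin
  A := 'Best'#228'tigung';  // literal from the question above
  B := 'A'#$01FE'B';        // contains a widechar constant
  Dump('A', A);
  Dump('B', B);
end.
```

Compiling the same file with fixes_2_6 and with trunk and comparing the two output lines answers the question directly.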

Jonas

Re: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)

2013-01-05 Thread Paul Ishenin


05.01.13, 19:40, Jonas Maebe wrote:

> You can put anything in it and it may or may not work depending on the current 
> system code page, but afaik the only thing that is guaranteed to work at this 
> time is plain ASCII.


ResourceStrings are stored as AnsiString type with 0 codepage (as I 
remember). Delphi now stores ResourceStrings as UnicodeString type. I 
think FPC will follow this in m_default_unicodestring modeswitch.


Best regards,
Paul Ishenin

Re: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)

2013-01-05 Thread Martin Schreiber

On Saturday 05 January 2013 12:28:03 Jonas Maebe wrote:

> Alternatively, in both cases you can instead define a
> unicodestring/widestring constant instead of an ansistring/shortstring
> constant by embedding widechar constants in the character sequence. Such
> widechar constants are of the form #nnn, with nnn a valid Pascal
> representation of an integer constant between 255 and 65535. Then you can
> use those widechars to represent the desired characters as UTF-16 code
> points. In that case, the entire string will however be parsed as a
> sequence of UTF-16 code points (because a string is either a sequence of
> ansichars, or a sequence of widechars; it can never be a mixture of the
> two), and hence also #1 or #128 appearing in a widestring will be parsed as
> widechar(#1) and widechar(#128) as opposed to being interpreted according
> to the current codepage setting.
>
So compiled with -Fcutf8
"
unicodestringvar:= 'Best'#228'tigung';
"
produces a different result on fixes_2_6 and trunk? I assume in trunk there 
will be a compile error? We use this form of character constants in MSEgui to 
have the sources in pure ASCII.

Martin

Re: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)


On 05 Jan 2013, at 12:36, Sven Barth wrote:

> On 05.01.2013 12:28, Jonas Maebe wrote:
>>> And again, sorry for the impertinence, how do resource strings fit in the
>>> string handling scenario of Free Pascal trunk?
>> 
>> Unicode support for resourcestrings is still not available in FPC trunk. 
>> They can currently still only be used safely for ASCII content.
> 
> What about UTF8 content?

You can put anything in it and it may or may not work depending on the current 
system code page, but afaik the only thing that is guaranteed to work at this 
time is plain ASCII.


Jonas

Re: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)

2013-01-05 Thread Sven Barth


On 05.01.2013 12:28, Jonas Maebe wrote:
>> And again, sorry for the impertinence, how do resource strings fit in the
>> string handling scenario of Free Pascal trunk?
>
> Unicode support for resourcestrings is still not available in FPC trunk. They 
> can currently still only be used safely for ASCII content.


What about UTF8 content?

Regards,
Sven

Re: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)