Re: [Lazarus] String vs WideString

2017-08-12 Thread Mattias Gaertner via Lazarus

On Sat, 12 Aug 2017 16:46:09 -0300
"Marcos Douglas B. Santos via Lazarus" 
wrote:

>[...]
> Lib.SetLicense(
>   IniFile.ReadString('TheLib', 'license', '')
> );

What encoding has the ini file?

Mattias
-- 
___
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus

Re: [Lazarus] String vs WideString

2017-08-12 Thread Marcos Douglas B. Santos via Lazarus

On Sat, Aug 12, 2017 at 5:32 PM, Mattias Gaertner via Lazarus
 wrote:
> On Sat, 12 Aug 2017 16:46:09 -0300
> "Marcos Douglas B. Santos via Lazarus" 
> wrote:
>
>>[...]
>> Lib.SetLicense(
>>   IniFile.ReadString('TheLib', 'license', '')
>> );
>
> What encoding has the ini file?

ANSI. A simple text file on Windows with only ANSI chars.

But I'm so sorry Mattias, it was my fault.
The program was reading the wrong file version (problem in paths...).

It works now, but I have one question:
How should I write the code so that I no longer see this warning?

Warning: Implicit string type conversion from "AnsiString" to "WideString"

Regards,
Marcos Douglas

Re: [Lazarus] String vs WideString

2017-08-12 Thread Mattias Gaertner via Lazarus

On Sat, 12 Aug 2017 17:43:29 -0300
"Marcos Douglas B. Santos via Lazarus" 
wrote:

>[...]
> > What encoding has the ini file?  
> 
> ANSI. A simple text file on Windows with only ANSI chars.

Which one? Do you mean Windows CP-1252?

 
>[...]
> Warning: Implicit string type conversion from "AnsiString" to "WideString"

Explicit type cast:

Lib.SetLicense(
   WideString(IniFile.ReadString('TheLib', 'license', ''))
); 

Mattias
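
A minimal sketch of the cast described above, written as a standalone FPC program. SetLicense is a hypothetical stand-in for Lib.SetLicense, and the literal key replaces the IniFile.ReadString call; neither is taken from the original code.

```pascal
program CastDemo;
{$mode objfpc}{$H+}

// Hypothetical stand-in for Lib.SetLicense, which expects a WideString.
procedure SetLicense(const AKey: WideString);
begin
  WriteLn('license length in UTF-16 code units: ', Length(AKey));
end;

var
  Key: AnsiString;
begin
  Key := 'ABC-123';            // stands in for IniFile.ReadString(...)
  SetLicense(Key);             // implicit conversion: this is the line the compiler warns about
  SetLicense(WideString(Key)); // explicit cast: same conversion, stated on purpose
end.
```

The cast does not change what happens at run time; it only tells the compiler that the conversion is intended. The result is still correct only if the bytes in the AnsiString really are in the code page the compiler assumes for it, which is why the encoding of the ini file matters.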

Re: [Lazarus] String vs WideString

2017-08-12 Thread Marcos Douglas B. Santos via Lazarus

On Sat, Aug 12, 2017 at 5:49 PM, Mattias Gaertner via Lazarus
 wrote:
> On Sat, 12 Aug 2017 17:43:29 -0300
> "Marcos Douglas B. Santos via Lazarus" 
> wrote:
>
>>[...]
>> > What encoding has the ini file?
>>
>> ANSI. A simple text file on Windows with only ANSI chars.
>
> Which one? Do you mean Windows CP-1252?

Yes...
But would it make any difference?

>>[...]
>> Warning: Implicit string type conversion from "AnsiString" to "WideString"
>
> Explicit type cast:
>
> Lib.SetLicense(
>WideString(IniFile.ReadString('TheLib', 'license', ''))
> );

Wow... everywhere? :(

Regards,
Marcos Douglas

Re: [Lazarus] String vs WideString

2017-08-12 Thread Bo Berglund via Lazarus

On Sat, 12 Aug 2017 17:56:58 -0300, "Marcos Douglas B. Santos via
Lazarus"  wrote:

>> Which one? Do you mean Windows CP-1252?
>
>Yes...
>But would it make any difference?

I recently had a problem with an application that was converted from
old string type to AnsiString and seemingly worked in the new Unicode
environment.
However, we received reports that it had failed in some Asian
countries (Korea, China, Thailand) and upon checking it turned out
that the data inside a string used as buffer was changed because of
locale differences.

After switching out the affected variable declarations from AnsiString
to RawByteString the application seemingly started to work again in
these locations as well.

So AnsiString is not safe either.

And after this I have spent some time to totally rework the use of
strings as buffers to instead use TBytes. Lots of work but guaranteed
to not sneak in unexpected conversions.

-- 
Bo Berglund
Developer in Sweden


Re: [Lazarus] String vs WideString

2017-08-12 Thread Marcos Douglas B. Santos via Lazarus

On Sat, Aug 12, 2017 at 7:21 PM, Bo Berglund via Lazarus
 wrote:
> On Sat, 12 Aug 2017 17:56:58 -0300, "Marcos Douglas B. Santos via
> Lazarus"  wrote:
>
>>> Which one? Do you mean Windows CP-1252?
>>
>>Yes...
>>But would it make any difference?
>
> I recently had a problem with an application that was converted from
> old string type to AnsiString and seemingly worked in the new Unicode
> environment.
> However, we received reports that it had failed in some Asian
> countries (Korea, China, Thailand) and upon checking it turned out
> that the data inside a string used as buffer was changed because of
> locale differences
>
> After switching out the affected variable declarations from AnsiString
> to RawByteString the application seemingly started to work again also
> on these locations.
>
> So AnsiString is not safe either
>
> And after this I have spent some time to totally rework the use of
> strings as buffers to instead use TBytes. Lots of work but guaranteed
> to not sneak in unexpected conversions.

Isn't it simpler to use RawByteString instead of TBytes?

Regards,
Marcos Douglas

Re: [Lazarus] String vs WideString

2017-08-13 Thread Bo Berglund via Lazarus

On Sat, 12 Aug 2017 23:42:43 -0300, "Marcos Douglas B. Santos via
Lazarus"  wrote:

>> After switching out the affected variable declarations from AnsiString
>> to RawByteString the application seemingly started to work again also
>> on these locations.
>>
>> So AnsiString is not safe either
>>
>> And after this I have spent some time to totally rework the use of
>> strings as buffers to instead use TBytes. Lots of work but guaranteed
>> to not sneak in unexpected conversions.
>
>Is not simpler to use RawByteString instead TBytes?

Well, initially just changing the declarations would seem to be
simpler. But given how the conversion problem sneaked up behind my
back, I thought it wiser to move all serial comm buffers from various
string types (string->AnsiString->RawByteString) to TBytes since that
is really guaranteed to be "the real thing".

Whenever there is a need for displaying the data or putting them into
a string type variable I have added a few utility functions to do the
conversions using the Move() procedure. Likewise I made a PosBin() for
searching for patterns like Pos() for strings etc.

-- 
Bo Berglund
Developer in Sweden
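
The utility functions mentioned above were not posted, so the following is only a sketch of what such helpers might look like. PosBin follows the name used in the message, but its signature and the BytesToHex helper are assumptions made for illustration.

```pascal
program BytesBufferDemo;
{$mode objfpc}{$H+}

uses
  SysUtils; // declares TBytes (a dynamic array of Byte) and IntToHex

// Hypothetical helper: 0-based index of the first occurrence of Pattern
// in Buffer, or -1 if it does not occur.
function PosBin(const Pattern, Buffer: TBytes): Integer;
var
  i, j: Integer;
  Match: Boolean;
begin
  Result := -1;
  if Length(Pattern) = 0 then
    Exit;
  for i := 0 to Length(Buffer) - Length(Pattern) do
  begin
    Match := True;
    for j := 0 to High(Pattern) do
      if Buffer[i + j] <> Pattern[j] then
      begin
        Match := False;
        Break;
      end;
    if Match then
      Exit(i);
  end;
end;

// Hypothetical helper: render the bytes as hex for display, so that no
// code page conversion is involved in showing the data.
function BytesToHex(const Buffer: TBytes): string;
var
  i: Integer;
begin
  Result := '';
  for i := 0 to High(Buffer) do
    Result := Result + IntToHex(Buffer[i], 2) + ' ';
end;

var
  Rx, Marker: TBytes;
begin
  // Simulated bytes from a serial port, including values above $7F.
  SetLength(Rx, 5);
  Rx[0] := $02; Rx[1] := $41; Rx[2] := $FF; Rx[3] := $80; Rx[4] := $03;

  SetLength(Marker, 2);
  Marker[0] := $FF; Marker[1] := $80;

  WriteLn(BytesToHex(Rx));
  WriteLn('marker found at index ', PosBin(Marker, Rx));
end.
```

Because the buffer is an array of Byte, values such as $FF and $80 are never interpreted as characters, so no locale can alter them.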


Re: [Lazarus] String vs WideString

2017-08-13 Thread Juha Manninen via Lazarus

On Sun, Aug 13, 2017 at 1:21 AM, Bo Berglund via Lazarus
 wrote:
> So AnsiString is not safe either

That is a little misleading.
Actually using the Windows system codepage is not safe any more.
The current Unicode system in Lazarus maps AnsiString to use UTF-8.
Text with Windows codepage must be converted explicitly.
This is a breaking change compared to the old Unicode support in
Lazarus 1.4.x + FPC 2.6.x.
The right solution is to use Unicode everywhere. Windows codepages can
be seen as a historical remnant, retained for backwards compatibility.
It is now 2017 and Unicode has been in use for decades. Everybody should
use it by now.

Marcos Douglas, please change the encoding in your text file to UTF-8.
Every decent text editor, including the editor in Lazarus, has a
feature to do it.
Once the data is Unicode, it is all smooth sailing.
Data is converted between UTF-8 and UTF-16 losslessly.

One more thing:
Data for WideString/UnicodeString parameters in WinAPI functions are
converted automatically. You can ignore the warning or suppress it by
a type cast as Mattias showed.
However for PWideChar parameters you should create an explicit
temporary variable, usually UnicodeString but WideString for OLE.
Assigning to it from your "String" data converts encoding.
Then cast the new variable as the required pointer type.

Juha
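
The PWideChar advice above could be sketched as follows. ShowFirstUnit is a hypothetical stand-in for an API routine with a PWideChar parameter, and the sketch uses UTF8Decode to make the conversion explicit, on the assumption that the source string holds UTF-8; a plain assignment to the UnicodeString variable would convert as well.

```pascal
program PWideCharDemo;
{$mode objfpc}{$H+}

// Hypothetical stand-in for a WinAPI-style routine with a PWideChar parameter.
procedure ShowFirstUnit(P: PWideChar);
begin
  WriteLn('first UTF-16 code unit: $', HexStr(Ord(P^), 4));
end;

var
  S: AnsiString;
  W: UnicodeString; // use WideString instead when calling OLE
begin
  S := 'license';      // assumed to hold UTF-8 (plain ASCII here)
  W := UTF8Decode(S);  // UTF-8 -> UTF-16 conversion into a named variable
  ShowFirstUnit(PWideChar(W)); // W stays alive for the duration of the call
end.
```

Keeping the converted text in a named variable means the pointer handed to the API refers to memory that lives until the variable goes out of scope, rather than to a compiler-managed temporary.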

Re: [Lazarus] String vs WideString

2017-08-13 Thread Juha Manninen via Lazarus

On Sun, Aug 13, 2017 at 1:21 AM, Bo Berglund via Lazarus
 wrote:
> I recently had a problem with an application that was converted from
> old string type to AnsiString and seemingly worked in the new Unicode
> environment.

What was the old string type?

> However, we received reports that it had failed in some Asian
> countries (Korea, China, Thailand) and upon checking it turned out
> that the data inside a string used as buffer was changed because of
> locale differences

Unicode was designed to solve exactly the problems caused by locale differences.
Why don't you use it?

> After switching out the affected variable declarations from AnsiString
> to RawByteString the application seemingly started to work again also
> on these locations.
> ...
> And after this I have spent some time to totally rework the use of
> strings as buffers to instead use TBytes. Lots of work but
> guaranteed to not sneak in unexpected conversions.

RawByteString is for text whose encoding is not meant to be converted.
It has its special use cases.
TBytes is usually for binary data.
Did I understand right: you use TBytes to hold strings having Windows
codepage encoding?
That sounds like a very dummy thing to do!
Again: Why not Unicode? Then you could throw away your hacks.

Juha

Re: [Lazarus] String vs WideString

2017-08-13 Thread Bo Berglund via Lazarus

On Sun, 13 Aug 2017 14:18:23 +0300, Juha Manninen via Lazarus
 wrote:

>On Sun, Aug 13, 2017 at 1:21 AM, Bo Berglund via Lazarus
> wrote:
>> I recently had a problem with an application that was converted from
>> old string type to AnsiString and seemingly worked in the new Unicode
>> environment.
>
>What was the old string type?

Note: The programs were started back in around 2000 using Delphi 7...

We used "string" as the container for processing serial data to/from
CNC machine tool controllers amongst others. This was triggered really
by the serial components, which mostly transferred char(acters) and
had methods for sending and receiving strings, even though we usually
used char.

>> However, we received reports that it had failed in some Asian
>> countries (Korea, China, Thailand) and upon checking it turned out
>> that the data inside a string used as buffer was changed because of
>> locale differences
>
>Unicode was designed to solve exactly the problems caused by locale 
>differences.
>Why don't you use it?

Again, these are old existing programs and we are not doing this
anymore for new programs. However, there is still one problem because
there is an interface point to the hardware, in the form of serial
components, which still handle chars...
And chars are nowadays Unicode chars, i.e. they no longer map to the
bytes sent over RS232...
And our data are NOT text, they are binary streams of bytes.

>> After switching out the affected variable declarations from AnsiString
>> to RawByteString the application seemingly started to work again also
>> on these locations.
>> ...
>> And after this I have spent some time to totally rework the use of
>> strings as buffers to instead use TBytes. Lots of work but
>> guaranteed to not sneak in unexpected conversions.
>
>RawByteString is for text which encoding is not meant to be converted.
>It has its special use cases.

My first attempt at "fixing" the problem in Asian locales was to use
RawByteString so as to inhibit conversions.
Still with these as comm buffers...
It seemed to work out, but to be safer I have reworked one application
to replace with TBytes everywhere comm data are handled.

>TBytes is usually for binary data.

Exactly, and this is why I made the comment that to be on the safe
side dealing with RS232 the buffers should be TBytes (or some other
similar construct).

>Did I understand right: you use TBytes to hold strings having Windows
>codepage encoding?

No, definitely not. At the time we were not aware of any encoding at
all. To us a string was just a handy container for the serial data
like a dynamic array of byte with some useful functions available for
searching and things like that. I think we were not alone...

>Again: Why not Unicode? Then you could throw away your hacks.

The application itself is Unicode now but we had to run circles around
the RS232 comm part. When converting to Unicode we first set the comm
related strings to be AnsiString...

PS: We never programmed the serial interface directly, we always used
commercial RS232 components and they all dealt with char and string...
DS

-- 
Bo Berglund
Developer in Sweden


Re: [Lazarus] String vs WideString

2017-08-13 Thread Juha Manninen via Lazarus

On Sun, Aug 13, 2017 at 7:41 PM, Bo Berglund via Lazarus
 wrote:
> And our data are NOT text, they are binary streams of bytes.

I see. Then TBytes indeed is the best choice.
You have misused "String" or "AnsiString" from the beginning for binary data.
There have always been warnings against it.
The new Lazarus Unicode system did not create the problem but made it
more visible.

Marcos Douglas however has a different problem.
Your recommendation to use RawByteString or TBytes does not apply in
his case and thus was a bit misleading.

Juha

Re: [Lazarus] String vs WideString

2017-08-14 Thread Michael Schnell via Lazarus


On 13.08.2017 22:41, Juha Manninen via Lazarus wrote:
> You have misused "String" or "AnsiString" from the beginning for binary data.
> There have always been warnings against it.

While this might be true, it's decently silly, IMHO.

The name "String" can easily be interpreted as "String of things" and
does not necessarily mean "String of printable stuff".

The management Pascal has always provided for strings (ever since the
"Short String" stopped being the only string type), i.e. operators,
built-in functions, lazy copy and reference counting, is perfectly
applicable to "Strings of things" and doesn't force any known encoding
at all.

The drama was only introduced by Embarcadero's abysmal / sloppy
implementation of automatic code conversion for strings.


-Michael

Re: [Lazarus] String vs WideString

2017-08-14 Thread Tony Whyman via Lazarus



On 13/08/17 12:18, Juha Manninen via Lazarus wrote:
> Unicode was designed to solve exactly the problems caused by locale differences.
> Why don't you use it?

I believe you effectively answer your own question in your preceding post:

> Actually using the Windows system codepage is not safe any more.
> The current Unicode system in Lazarus maps AnsiString to use UTF-8.
> Text with Windows codepage must be converted explicitly.
> This is a breaking change compared to the old Unicode support in
> Lazarus 1.4.x + FPC 2.6.x.

If you are processing strings as "text" then you probably do not care
how it is encoded and can live with "breaking changes". However, if for
some reason you are, or need to be, aware of how the text is encoded, or
are using string types as a useful container for binary data, then types
that sneak up on you with implicit type conversions, or whose semantics
change between compilers or versions, are just another source of bugs.


PChar used to be  a safe means to access binary data - but not anymore, 
especially if you move between FPC and Delphi. (One of my gripes is that 
the FCL still makes too much use of PChar instead of PByte with the 
resulting Delphi incompatibility). The "string" type also used to be a 
safe container for any sort of binary data, but when its definition can 
change between compilers and versions, it is now something to be avoided.


As a general rule, I now always use PByte for any sort of string that is 
binary, untyped or encoding to be determined. It works across compilers 
(FPC and Delphi) with consistent semantics and is safe for such use.


I also really like AnsiString from FPC 3.0 onwards. By making the
encoding a dynamic attribute of the type, it means that I know what is 
in the container and can keep control.


I am sorry, but I would only even consider using Unicodestrings as a 
type (or the default string type) when I am just processing text for 
which the encoding is a don't care, such as a window caption, or for 
intensive text analysis. If I am reading/writing text from a file or 
database where the encoding is often implicit and may vary from the 
Unicode standard then my preference is for AnsiString. I can then read 
the text (e.g. from the file) into a (RawByteString) buffer, set the 
encoding and then process it safely while often avoiding the overhead 
from any transliteration. PByte comes into its own when the file 
contains a mixture of binary data and text.


Text files and databases tend to use UTF-8 or are encoded using legacy 
Windows Code pages. The Chinese also have GB18030. With a database, the 
encoding is usually known and AnsiString is a good way to read/write 
data and to convey the encoding, especially as databases usually use a 
variable length multi-byte encoding natively and not UTF-16/Unicode. 
With files, the text encoding is usually implicit and AnsiString is 
ideal for this as it lets you read in the text and then assign the 
(implicit) encoding to the string, or ensure the correct encoding when 
writing.


And anyway, I do most of my work in Linux, so why would I even want to 
bother myself with arrays of widechars when the default environment is UTF8?


We do need some stability and consistency in strings which, as someone 
else noted have been confused by Embarcadero. I would like to see that 
focused on AnsiString with UnicodeString being only for specialist use 
on Windows or when intensive text analysis makes a two byte encoding 
more efficient than a variable length multi-byte encoding.


Tony Whyman
MWA
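
A small sketch of the "read into a RawByteString buffer, then set the encoding" approach described above, assuming FPC 3.0 or later. The two bytes are hard-coded here in place of a real file read.

```pascal
program CodePageTagDemo;
{$mode objfpc}{$H+}

var
  Buf: RawByteString;
begin
  // Bytes as they might arrive from a file: U+00E9 in UTF-8 is $C3 $A9.
  Buf := '';
  SetLength(Buf, 2);
  Buf[1] := #$C3;
  Buf[2] := #$A9;

  // Label the bytes as UTF-8 without touching them (Convert = False).
  SetCodePage(Buf, CP_UTF8, False);

  WriteLn('code page:    ', StringCodePage(Buf));
  WriteLn('bytes:        ', Length(Buf));
  WriteLn('UTF-16 units: ', Length(UTF8Decode(Buf)));
end.
```

Passing False as the last argument only changes the code page recorded in the string header. Passing True would ask the RTL to transcode the bytes, which on Unix-like systems generally needs a widestring manager such as the cwstring unit.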


Re: [Lazarus] String vs WideString

2017-08-14 Thread Marcos Douglas B. Santos via Lazarus

On Sun, Aug 13, 2017 at 7:51 AM, Juha Manninen via Lazarus
 wrote:
> On Sun, Aug 13, 2017 at 1:21 AM, Bo Berglund via Lazarus
>  wrote:
>> So AnsiString is not safe either
>
> That is a little misleading.
> Actually using the Windows system codepage is not safe any more.
> The current Unicode system in Lazarus maps AnsiString to use UTF-8.
> Text with Windows codepage must be converted explicitly.
> This is a breaking change compared to the old Unicode suppport in
> Lazarus 1.4.x + FPC 2.6.x.
> The right solution is to use Unicode everywhere. Windows codepages can
> be seen as a historical remain, retained for backwards compatibility.
> Now is year 2017, Unicode has been used for decades. Everybody should
> use it by now.

"The right solution is to use Unicode everywhere."
I agree. But it would be best if the compiler used Unicode everywhere and
we developers used just one type called "string", even if this
breaks old code. Or maybe, instead of "string", new code should
just use UnicodeString...

Well, I know that many people here have already had this "fight" about
Unicode, so let's forget about what the compiler "should" or should not
do.

> Marcos Douglas, please change the encoding in your text file to UTF-8.
> Every decent text editor, including the editor in Lazarus, has a
> feature to do it.
> Once the data is Unicode, it is all smooth sailing.
> Data is converted between UTF-8 and UTF-16 losslessly.

You're right.

> One more thing:
> Data for WideString/UnicodeString parameters in WinAPI functions are
> converted automatically. You can ignore the warning or suppress it by
> a type cast as Mattias showed.
> However for PWideChar parameters you should create an explicit
> temporary variable, usually UnicodeString but WideString for OLE.
> Assigning to it from your "String" data converts encoding.
> Then cast the new variable as the required pointer type.

This is an ugly trick... but I understand what you mean.

Best regards,
Marcos Douglas

Re: [Lazarus] String vs WideString

2017-08-14 Thread Marcos Douglas B. Santos via Lazarus

On Mon, Aug 14, 2017 at 6:53 AM, Tony Whyman via Lazarus
 wrote:
>
> On 13/08/17 12:18, Juha Manninen via Lazarus wrote:
>>
>> Unicode was designed to solve exactly the problems caused by locale
>> differences.
>> Why don't you use it?
>
> I believe you effectively answer your own question in your preceding post:
>
>> Actually using the Windows system codepage is not safe any more.
>> The current Unicode system in Lazarus maps AnsiString to use UTF-8.
>> Text with Windows codepage must be converted explicitly.
>> This is a breaking change compared to the old Unicode suppport in
>> Lazarus 1.4.x + FPC 2.6.x.
>
> If you are processing strings as "text" then you probably do not care how it
> is encoded and can live with "breaking changes". However, if, for some
> reason you are or need to be aware of how the text is encoded - or are using
> string types as a useful container for binary data then, types that sneak up
> on you with implicit type conversions or which have semantics that change
> between compilers or versions, are just another source of bugs.
>
> PChar used to be  a safe means to access binary data - but not anymore,
> especially if you move between FPC and Delphi. (One of my gripes is that the
> FCL still makes too much use of PChar instead of PByte with the resulting
> Delphi incompatibility). The "string" type also used to be a safe container
> for any sort of binary data, but when its definition can change between
> compilers and versions, it is now something to be avoided.
>
> As a general rule, I now always use PByte for any sort of string that is
> binary, untyped or encoding to be determined. It works across compilers (FPC
> and Delphi) with consistent semantics and is safe for such use.
>
> I also really like AnsiString from FCP 3.0 onwards. By making the encoding a
> dynamic attribute of the type, it means that I know what is in the container
> and can keep control.
>
> I am sorry, but I would only even consider using Unicodestrings as a type
> (or the default string type) when I am just processing text for which the
> encoding is a don't care, such as a window caption, or for intensive text
> analysis. If I am reading/writing text from a file or database where the
> encoding is often implicit and may vary from the Unicode standard then my
> preference is for AnsiString. I can then read the text (e.g. from the file)
> into a (RawByteString) buffer, set the encoding and then process it safely
> while often avoiding the overhead from any transliteration. PByte comes into
> its own when the file contains a mixture of binary data and text.
>
> Text files and databases tend to use UTF-8 or are encoded using legacy
> Windows Code pages. The Chinese also have GB18030. With a database, the
> encoding is usually known and AnsiString is a good way to read/write data
> and to convey the encoding, especially as databases usually use a variable
> length multi-byte encoding natively and not UTF-16/Unicode. With files, the
> text encoding is usually implicit and AnsiString is ideal for this as it
> lets you read in the text and then assign the (implicit) encoding to the
> string, or ensure the correct encoding when writing.

Unicode everywhere, and yet you are using AnsiString for everything...
Now I'm confused.

> And anyway, I do most of my work in Linux, so why would I even want to
> bother myself with arrays of widechars when the default environment is UTF8?

Maybe you do not have problems because you don't use Windows.

> We do need some stability and consistency in strings which, as someone else
> noted have been confused by Embarcadero. I would like to see that focused on
> AnsiString with UnicodeString being only for specialist use on Windows or
> when intensive text analysis makes a two byte encoding more efficient than a
> variable length multi-byte encoding.

FPC and Lazarus claim they are cross-platform (this is a fact) and
because of that, IMHO, both should be used in only one way on every
system, don't you think?

Best regards,
Marcos Douglas

Re: [Lazarus] String vs WideString

2017-08-14 Thread Michael Schnell via Lazarus


On 14.08.2017 14:50, Marcos Douglas B. Santos via Lazarus wrote:
> "The right solution is to use Unicode everywhere."

Embarcadero thought that this would not be the "right" solution. Otherwise
they would not have invented the encoding-aware strings.

IMHO that was a good idea. They only completely failed to do a decent
specification and implementation.


-Michael

Re: [Lazarus] String vs WideString

2017-08-14 Thread Tony Whyman via Lazarus



On 14/08/17 14:11, Marcos Douglas B. Santos via Lazarus wrote:
> FPC and Lazarus claim they are cross-platform (this is a fact) and
> because of that, IMHO, both should be used in only one way on every
> system, don't you think?
>
> Best regards,
> Marcos Douglas

Precisely. But why this fixation on UTF-16/Unicode and not UTF8?

Lazarus is already a UTF8 environment.

Much of the LCL assumes UTF8.

UTF8 is arguably a much more efficient way to store and transfer data.

UTF-16/Unicode can only store 65,536 characters while the Unicode
standard (that covers UTF8 as well) defines 136,755 characters.

UTF-16/Unicode's main advantage seems to be for rapid indexing of large
strings.

You may need UTF-16/Unicode support for accessing Microsoft APIs but
apart from that, why is it being promoted as the universal standard?


Re: [Lazarus] String vs WideString

2017-08-14 Thread Mattias Gaertner via Lazarus

On Mon, 14 Aug 2017 14:21:57 +0100
Tony Whyman via Lazarus  wrote:

>[...]
> Lazarus is already a UTF8 environment.
> 
> Much of the LCL assumes UTF8.

True.

> UTF8 is arguably a much more efficient way to store and transfer data

It depends.

> UTF-16/Unicode can only store 65,536 characters while the Unicode 
> standard (that covers UTF8 as well) defines 136,755 characters.

No.
UTF-16 can encode the full Unicode range up to U+10FFFF (about 1.1
million code points). It uses one or two 16-bit words per codepoint.
UTF-8 uses 1 to 4 bytes.
See here for more details:
https://en.wikipedia.org/wiki/UTF-16

You are right, though, that there are still many applications that
falsely claim to support UTF-16 but only support the first $D800
codepoints.

> UTF-16/Unicode's main advantage seems to be for rapid indexing of large 
> strings.

That's only true for UCS-2, which is obsolete.

> You made need UTF-16/Unicode support for accessing Microsoft APIs but 
> apart from that, why is it being promoted as the universal standard?

Who does that?

Mattias
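
To make the "one or two words per codepoint" point concrete, here is a small arithmetic sketch. The function names are made up for this illustration and are not RTL routines.

```pascal
program SurrogateDemo;
{$mode objfpc}{$H+}

// Number of 16-bit code units UTF-16 needs for one code point.
function Utf16Units(CodePoint: Cardinal): Integer;
begin
  if CodePoint < $10000 then
    Result := 1
  else
    Result := 2;
end;

// Number of bytes UTF-8 needs for one code point.
function Utf8Bytes(CodePoint: Cardinal): Integer;
begin
  if CodePoint < $80 then
    Result := 1
  else if CodePoint < $800 then
    Result := 2
  else if CodePoint < $10000 then
    Result := 3
  else
    Result := 4;
end;

// Split a code point above $FFFF into its UTF-16 surrogate pair.
procedure ToSurrogates(CodePoint: Cardinal; out HighSur, LowSur: Word);
var
  V: Cardinal;
begin
  V := CodePoint - $10000;             // 20 significant bits remain
  HighSur := Word($D800 + (V shr 10)); // top 10 bits
  LowSur := Word($DC00 + (V and $3FF)); // bottom 10 bits
end;

var
  HighSur, LowSur: Word;
begin
  ToSurrogates($1F600, HighSur, LowSur);
  // U+1F600 -> $D83D $DE00
  WriteLn('U+1F600 -> $', HexStr(HighSur, 4), ' $', HexStr(LowSur, 4));
  WriteLn('UTF-16 units: ', Utf16Units($1F600),
          ', UTF-8 bytes: ', Utf8Bytes($1F600));
end.
```

Code that treats every 16-bit unit as a complete character handles only the one-unit case, which is the UCS-2 assumption called obsolete above.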

Re: [Lazarus] String vs WideString

2017-08-14 Thread Graeme Geldenhuys via Lazarus


On 2017-08-13 11:51, Juha Manninen via Lazarus wrote:
> Now is year 2017, Unicode has been used for decades. Everybody should
> use it by now.

Indeed, I can't agree more. Plus, I normally use UTF-8 for any text
files I create.


Regards,
  Graeme


Re: [Lazarus] String vs WideString

2017-08-14 Thread Tony Whyman via Lazarus


On 14/08/17 14:46, Mattias Gaertner via Lazarus wrote:
>> You may need UTF-16/Unicode support for accessing Microsoft APIs but
>> apart from that, why is it being promoted as the universal standard?
>
> Who does that?
>
> Mattias


Because the obvious implication when someone argues against AnsiString 
(from which UTF8String derives) and talks about Unicode is that they are 
promoting UTF-16 and the UnicodeString type. Perhaps this is because I 
am old enough to remember when MS first added wide characters to Windows 
and that they called it "Unicode". To me, when people say "Unicode" they 
mean Windows wide characters.


Perhaps the problem is the use of the word "Unicode".  By trying to 
embrace UTF8, UTF16 and UTF32 with the older UCS-2 it is perhaps too 
ambiguous a term - especially as the Delphi/FPC UnicodeString type 
exists and probably (but I'm not certain)  means UTF-16.


What I see in FPC/Lazarus today is:

-  UTF8 supported through AnsiString.

- A confusion of Widestring/UnicodeString for UTF-16 and legacy UCS-2.

- Nothing for UTF-32.

If nothing else, FPC/Lazarus could do with a clean-up of both
terminology and string types. Indeed, why isn't there a single container
string type for all character sets, where the encoding (whether a legacy
code page, UTF8, UTF16 or UTF32) is simply a dynamic attribute of the
type, a sort of extended AnsiString?







Re: [Lazarus] String vs WideString

2017-08-14 Thread Graeme Geldenhuys via Lazarus


On 2017-08-14 15:11, Tony Whyman via Lazarus wrote:
> ambiguous a term - especially as the Delphi/FPC UnicodeString type
> exists and probably (but I'm not certain)  means UTF-16.


Yes, that is f**ken annoying. FPC should have named it what it really is 
- UTF16String! But instead they followed Delphi like a lemming and named 
it UnicodeString.


In reality, UNICODE means text with an encoding of any of UTF-8, 
UTF-16LE, UTF-16BE, or UTF-32.


In terms of Delphi and FPC, they decided Unicode = UTF-16. I'm not even 
sure if they mean LE or BE.


Regards,
  Graeme

--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp

Re: [Lazarus] String vs WideString

2017-08-14 Thread Sven Barth via Lazarus

On 14.08.2017 16:11, "Tony Whyman via Lazarus" <lazarus@lists.lazarus-ide.org> wrote:
> If nothing else, FPC Lazarus could do with a clean-up of both terminology
> and string types. Indeed, why isn't there a single container string type
> for all character sets where the encoding whether a legacy code page, UTF8,
> UTF16 or UTF32 is simply a dynamic attribute of the type - a sort of
> extended AnsiString?

The main problem of such a dynamic type would be the inability to do fast
indexing as the compiler would need to insert runtime checks for the size
of a character. I had already thought the same, but then had to discard the
idea due to this.

Regards,
Sven
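
A rough sketch of the indexing cost described above. TDynString and UnitAt are hypothetical and exist only for this illustration; no such type is part of FPC.

```pascal
program DynEncodingDemo;
{$mode objfpc}{$H+}

type
  // Hypothetical string container whose code unit size is known only at run time.
  TDynString = record
    UnitSize: Byte;      // 1 (UTF-8 / code pages), 2 (UTF-16) or 4 (UTF-32)
    Data: array of Byte; // code units stored little-endian
  end;

// Return the code unit at a 0-based index. The offset and the number of
// bytes to read both depend on UnitSize, which is only known at run time.
function UnitAt(const S: TDynString; Index: Integer): Cardinal;
var
  Base, k: Integer;
begin
  Result := 0;
  Base := Index * S.UnitSize;
  for k := S.UnitSize - 1 downto 0 do
    Result := (Result shl 8) or S.Data[Base + k];
end;

var
  S: TDynString;
begin
  // U+1F600 as UTF-16LE: $D83D $DE00
  S.UnitSize := 2;
  SetLength(S.Data, 4);
  S.Data[0] := $3D; S.Data[1] := $D8;
  S.Data[2] := $00; S.Data[3] := $DE;

  WriteLn('unit 0 = $', HexStr(Int64(UnitAt(S, 0)), 4));
  WriteLn('unit 1 = $', HexStr(Int64(UnitAt(S, 1)), 4));
end.
```

With a fixed-width type such as AnsiString or UnicodeString, the element size is a compile-time constant, so S[i] can compile down to a single scaled memory access instead of the run-time arithmetic shown here.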

Re: [Lazarus] String vs WideString

2017-08-14 Thread Sven Barth via Lazarus

On 14.08.2017 16:21, "Graeme Geldenhuys via Lazarus" <lazarus@lists.lazarus-ide.org> wrote:
>
> On 2017-08-14 15:11, Tony Whyman via Lazarus wrote:
>>
>> ambiguous a term - especially as the Delphi/FPC UnicodeString type
>> exists and probably (but I'm not certain)  means UTF-16.
>
> Yes, that is f**ken annoying. FPC should have named it what it really is
> - UTF16String! But instead they followed Delphi like a lemming and named it
> UnicodeString.

Because the crowd demanding Delphi compatibility is larger than the crowd
demanding exact terminology.

> In reality, UNICODE means text with an encoding of any of UTF-8,
> UTF-16LE, UTF-16BE, or UTF-32.
>
> In terms of Delphi and FPC, they decided Unicode = UTF-16. I'm not even
> sure if they mean LE or BE.

If I remember correctly it depends on the endianess of the platform...
Though I could be wrong.

Regards,
Sven
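
One hedged way to check this on a given target is to look at the first
byte of a UTF-16 code unit in memory. The sketch below assumes FPC 3.x; it
only reports what the running platform does and says nothing about what the
RTL guarantees. All names are illustrative.

```pascal
program Utf16ByteOrder;
{$mode objfpc}{$H+}
var
  U: UnicodeString;
  FirstByte: Byte;
begin
  SetLength(U, 1);
  U[1] := WideChar($0041);        // 'A', code unit $0041
  // Read the first byte of that 16-bit code unit as it sits in memory.
  FirstByte := PByte(@U[1])^;
  if FirstByte = $41 then
    WriteLn('low byte first: UTF-16LE layout in memory')
  else
    WriteLn('high byte first: UTF-16BE layout in memory');
end.
```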

Re: [Lazarus] String vs WideString

2017-08-14 Thread Marcos Douglas B. Santos via Lazarus

On Mon, Aug 14, 2017 at 10:21 AM, Tony Whyman via Lazarus
 wrote:
>
> On 14/08/17 14:11, Marcos Douglas B. Santos via Lazarus wrote:
>>
>> FPC and Lazarus claim they are cross-platform — this is a fact — and
>> because that, IMHO, both should be use in only one way in every
>> system, don't you think?
>>
>> Best regards,
>> Marcos Douglas
>
> Precisely. But why this fixation on UTF-16/Unicode and not UTF8?

I have no fixation on any Unicode flavor...
My "problem" is that I use Windows, not Linux, where UTF8 is the default.

> Lazarus is already a UTF8 environment.
>
> Much of the LCL assumes UTF8.
>
> UTF8 is arguably a much more efficient way to store and transfer data
>
> UTF-16/Unicode can only store 65,536 characters while the Unicode standard
> (that covers UTF8 as well) defines 136,755 characters.
>
> UTF-16/Unicode's main advantage seems to be for rapid indexing of large
> strings.
>
> You made need UTF-16/Unicode support for accessing Microsoft APIs but apart
> from that, why is it being promoted as the universal standard?

I didn't propose that.
But take a look at other languages and see what they are using.


Best regards,
Marcos Douglas

Re: [Lazarus] String vs WideString

2017-08-14 Thread Juha Manninen via Lazarus

On Mon, Aug 14, 2017 at 5:11 PM, Tony Whyman via Lazarus
 wrote:
> Indeed, why isn't there a single container string type for
> all character sets where the encoding whether a legacy code page, UTF8,
> UTF16 or UTF32 is simply a dynamic attribute of the type - a sort of
> extended AnsiString?

As Sven Barth wrote, they have different size of char.

Tony Whyman, this issue has been discussed again and again for the
past 10+ years first in FPC mailing lists and then in Lazarus lists.
The current Unicode support in Lazarus works f***ing well and is
amazingly compatible with Delphi.
WinAPI parameters may require an explicit temporary UnicodeString
variable but even then the code is compatible with Delphi.

Tony Whyman, Marcos Douglas and Michael Schnell, please study the facts.
For starters, this is about the current Unicode support in Lazarus:
  http://wiki.freepascal.org/Unicode_Support_in_Lazarus
I think the dynamic encoding and automatic conversion now work perfectly well.
If you have a piece of code where it does not work, please ask for
detailed info.

Juha

Re: [Lazarus] String vs WideString

2017-08-15 Thread Michael Schnell via Lazarus


On 14.08.2017 18:49, Sven Barth via Lazarus wrote:
> Because the crowd demanding Delphi compatibility is larger than the
> crowd demanding exact terminology.

... or even a revised concept avoiding the junk presented by Embarcadero :(

But obviously the fpc team has no choice.

-Michael

Re: [Lazarus] String vs WideString

2017-08-15 Thread Michael Schnell via Lazarus


On 14.08.2017 18:47, Sven Barth via Lazarus wrote:
> The main problem of such a dynamic type would be the inability to do
> fast indexing as the compiler would need to insert runtime checks for
> the size of a character.

What kind of "indexing" do you have in mind?
Could you give an example where such a difference is supposed to become
important?

(As you know, I wrote a paper in which I claimed the contrary. I'd like to
revise it if necessary.)


-Michael

Re: [Lazarus] String vs WideString

2017-08-15 Thread Tony Whyman via Lazarus


On 14/08/17 17:47, Sven Barth via Lazarus wrote:
> The main problem of such a dynamic type would be the inability to do
> fast indexing as the compiler would need to insert runtime checks for
> the size of a character. I had already thought the same, but then had
> to discard the idea due to this.

Is this really a big problem? It is not as if it would be necessary to
do a table lookup every time you index a string, as the indexing method
could be an attribute of the string and updated with the character
encoding attribute. Is it really that complicated for the compiler to
generate code that jumps to an indexing method depending upon a data
attribute?

Is your problem really more about the result type as, depending on the
character width, the result could be an AnsiChar or WideChar or a UTF8
character for which I don't believe there is a defined char type (other
than an arguable misuse of UCS4Char)?

I can accept that a clean-up of this area would also have to extend to
the char types as well - but I would also argue that that is well
overdue. On a quick count, I found 7 different char types in the system
unit.
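
To make the trade-off concrete, here is a rough sketch of indexing driven
by a data attribute. TDynString and CodeUnitAt are hypothetical names
invented for this illustration; nothing like them exists in the FPC RTL.
Note that the function has to return a plain Cardinal, because there is no
single char type that fits every element size, which is the result-type
question raised above.

```pascal
program DynamicIndexSketch;
{$mode objfpc}{$H+}

type
  // Hypothetical string with a run-time element size (not an FPC type).
  TDynString = record
    ElementSize: Byte;     // 1, 2 or 4 bytes per code unit
    Data: array of Byte;   // raw code units in native byte order
  end;

// Every access has to branch on ElementSize at run time.
function CodeUnitAt(const S: TDynString; Index: Integer): Cardinal;
var
  Offset: Integer;
  W: Word;
begin
  Result := 0;
  Offset := (Index - 1) * S.ElementSize;
  case S.ElementSize of
    1: Result := S.Data[Offset];
    2: begin
         W := 0;
         Move(S.Data[Offset], W, SizeOf(W));
         Result := W;
       end;
    4: Move(S.Data[Offset], Result, SizeOf(Result));
  end;
end;

var
  S: TDynString;
  W: Word;
begin
  S.ElementSize := 2;
  SetLength(S.Data, 4);
  W := $0041; Move(W, S.Data[0], SizeOf(W));  // 'A'
  W := $20AC; Move(W, S.Data[2], SizeOf(W));  // euro sign
  WriteLn(CodeUnitAt(S, 2));                  // 8364, i.e. $20AC
end.
```

The branch itself is cheap, but it sits on every S[i] access, whereas the
statically typed strings need none.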


Re: [Lazarus] String vs WideString

2017-08-15 Thread Mattias Gaertner via Lazarus

On Sat, 12 Aug 2017 17:56:58 -0300
"Marcos Douglas B. Santos via Lazarus" 
wrote:

>[...]
> > Which one? Do you mean Windows CP-1252?  
> 
> Yes...
> But would it make any difference?

Just

> >>[...]
> >> Warning: Implicit string type conversion from "AnsiString" to "WideString" 
> >>  
> >
> > Explicit type cast:
> >
> > Lib.SetLicense(
> >    WideString(IniFile.ReadString('TheLib', 'license', ''))
> > );
> 
> Wow... everywhere? :(

You could instead define an overloaded Lib.SetLicense(AnsiString). Or
you could disable this message altogether for your project (not
recommended): select the message in the Messages window, right-click it
and choose the item that adds the -vm option.

Mattias
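
The overload suggestion could look roughly like this. TLib below is a
hypothetical stand-in for the third-party "Lib" object from the thread,
whose real declaration is not shown here; the license value is a
placeholder as well. The idea is simply that the one explicit cast lives in
the overload instead of at every call site.

```pascal
program OverloadSketch;
{$mode objfpc}{$H+}

type
  // Hypothetical stand-in for the library class discussed in the thread.
  TLib = class
  private
    FLicense: WideString;
  public
    procedure SetLicense(const Value: WideString); overload;
    procedure SetLicense(const Value: AnsiString); overload;
  end;

procedure TLib.SetLicense(const Value: WideString);
begin
  FLicense := Value;
end;

procedure TLib.SetLicense(const Value: AnsiString);
begin
  // Single explicit conversion; resolves to the WideString overload.
  SetLicense(WideString(Value));
end;

var
  Lib: TLib;
  FromIni: AnsiString;
begin
  FromIni := 'ABC-123';  // placeholder for IniFile.ReadString(...)
  Lib := TLib.Create;
  try
    Lib.SetLicense(FromIni);  // picks the AnsiString overload
  finally
    Lib.Free;
  end;
end.
```

If the library class cannot be modified, the same cast could live in a
small wrapper procedure of your own instead.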

Re: [Lazarus] String vs WideString

2017-08-15 Thread Mattias Gaertner via Lazarus

On Mon, 14 Aug 2017 18:47:58 +0200
Sven Barth via Lazarus  wrote:

>[...]
> The main problem of such a dynamic type would be the inability to do fast
> indexing as the compiler would need to insert runtime checks for the size
> of a character. I had already thought the same, but then had to discard the
> idea due to this.

IMHO the main problem of adding a new string type is
https://xkcd.com/927/

Mattias

Re: [Lazarus] String vs WideString

2017-08-15 Thread Tony Whyman via Lazarus


On 14/08/17 22:01, Juha Manninen via Lazarus wrote:
> Tony Whyman, this issue has been discussed again and again for the
> past 10+ years first in FPC mailing lists and then in Lazarus lists.
> The current Unicode support in Lazarus works f***ing well and is
> amazingly compatible with Delphi.
> WinAPI parameters may require an explicit temporary UnicodeString
> variable but even then the code is compatible with Delphi.
>
> Tony Whyman, Marcos Douglas and Michael Schnell, please study the facts.
> For starters, this is about the current Unicode support in Lazarus:
>    http://wiki.freepascal.org/Unicode_Support_in_Lazarus
> I think the dynamic encoding and automatic conversion now work perfectly well.
> If you have a piece of code where it does not work, please ask for
> detailed info.

If a topic keeps on being discussed after 10+ years of argument, the
reason is usually either (a) the problem and its solution have not been
documented properly, or (b) the outcome is an unsatisfactory compromise.

In this case, I would argue that both are true.

I went back and read the wiki article you mentioned and was none the
wiser as to why the current mess exists. Is it really no more than
because Delphi continues to screw up in this area, so must FPC? The body
of the article appears to be a set of notes - not necessarily wrong in
themselves but lacking the background and context needed to explain why
it is like it is.

This problem will keep coming up until it is fixed properly and, by
that, I mean that the solution is consistent, intuitively understandable
and well documented. Windows eccentricities also need to be kept to Windows.


Here is my wish list:

1. Stop using the term "Unicode".

   It is too ambiguous. It is used as both an all embracing term for
   multi-byte encoding and as a synonym for UTF16 and that is really
   too confusing. The problem is made worse by having UnicodeString as
   a two byte wide string type in both FPC and Delphi.


2. Clean up the char type.

   When Wirth created the "char" type in Pascal it was a simple ASCII
   or EBCDIC character. There are now seven different char types
   (including type equivalence) with no guidelines on when each is
   applicable. This is too many. Why shouldn't there be a single char
   type that intuitively represents a single character regardless of
   how many bytes are used to represent it? Yes, in a world where we
   have to live with UTF8, UTF16, UTF32, legacy code pages and Chinese
   variations on UTF8, that means that dynamic attributes have to be
   included in the type. But isn't that the only way to have consistent
   and intuitive character handling?


3. The problem with string handling today is that it is not based on a 
consistent approach to the character type.


   If you clean up character handling then the model for string
   handling should become obvious. A string is after all no more than a
   container for a character array and which should be constrained to
   have the same character encoding. A string should intuitively
   represent a string of text regardless of how many bytes are used to
   represent each character and with dynamic attributes to tell you how
   it is encoded.


4. FPC should clean up Delphi's mess for it. If a unified string type 
follows a consistent model then it should be possible to make all Delphi 
string types synonyms.


   You will need to allow exceptions for legacy programs that insist on
   manipulating the bytes themselves - but that is not rocket science.
   There is also the issue of the Windows API and its insistence on
   Wide Strings - but isn't that why calling conventions such as cdecl
   and stdcall exist - to tell the compiler when it needs to reformat
   the call for a given API convention?

Tony Whyman




Re: [Lazarus] String vs WideString

2017-08-15 Thread Tony Whyman via Lazarus


You can count me as a "like" on that one.

On 15/08/17 10:13, Mattias Gaertner via Lazarus wrote:
> IMHO the main problem of adding a new string type is
> https://xkcd.com/927/



Re: [Lazarus] String vs WideString

2017-08-15 Thread Michael Van Canneyt via Lazarus




On Tue, 15 Aug 2017, Mattias Gaertner via Lazarus wrote:

> On Mon, 14 Aug 2017 18:47:58 +0200
> Sven Barth via Lazarus  wrote:
>
>> [...]
>> The main problem of such a dynamic type would be the inability to do fast
>> indexing as the compiler would need to insert runtime checks for the size
>> of a character. I had already thought the same, but then had to discard the
>> idea due to this.
>
> IMHO the main problem of adding a new string type is
> https://xkcd.com/927/

Exactly. I don't think we should add even more.

As it is now, FPC offers a way out for all cases:

WideString/UnicodeString for those that want 2-byte characters.
A codepage-aware single-byte string for those that want 1-byte characters.
The shortstring is even still available.

Attempting to store binary data in a string is not advisable.
Dynamic arrays, TBytes and - in the worst case - TBytesStream are powerful
enough to cover most use-cases in this area.

Michael.
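
As a small illustration of the TBytes route, assuming FPC 3.x where TBytes
is declared in SysUtils: a byte buffer holds arbitrary values, including
$00 and values above $7F, and no code page is ever attached to it. The
buffer contents are arbitrary sample values.

```pascal
program BinaryBuffer;
{$mode objfpc}{$H+}
uses
  SysUtils;
var
  Buf: TBytes;
  I: Integer;
begin
  SetLength(Buf, 4);
  Buf[0] := $00;
  Buf[1] := $FF;
  Buf[2] := $80;
  Buf[3] := $0A;
  // Dump the buffer as hex; no string conversion touches the data itself.
  for I := 0 to High(Buf) do
    Write(IntToHex(LongInt(Buf[I]), 2), ' ');
  WriteLn;  // prints: 00 FF 80 0A
end.
```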

Re: [Lazarus] String vs WideString

2017-08-15 Thread Michael Schnell via Lazarus


On 15.08.2017 11:25, Michael Van Canneyt via Lazarus wrote:
> WideString/UnicodeString for those that want 2-byte characters.
> A codepage-aware single-byte string for those that want 1-byte
> characters.
>
> The shortstring is even still available.

IM (often stated) O, this does not help as long as TStrings does not
support the string type the user is inclined to choose without forced
auto-conversion.

This obviously requires an (additional) fully dynamic string brand.

This (again obviously) is not the "Embarcadero way", but supposedly does
not necessarily lead to incompatibility regarding the user code.


-Michael


Re: [Lazarus] String vs WideString

2017-08-15 Thread Michael Van Canneyt via Lazarus




On Tue, 15 Aug 2017, Michael Schnell via Lazarus wrote:

> On 15.08.2017 11:25, Michael Van Canneyt via Lazarus wrote:
>> WideString/UnicodeString for those that want 2-byte characters.
>> A codepage-aware single-byte string for those that want 1-byte
>> characters.
>>
>> The shortstring is even still available.
>
> IM (often stated) O, this does not help as long as TStrings does not
> without forced auto-conversion support the string type the user is
> inclined to choose.

Please check TStrings in trunk. This exists.

procedure LoadFromFile(const FileName: string; AEncoding: TEncoding); overload; virtual;
procedure LoadFromStream(Stream: TStream; AEncoding: TEncoding); overload; virtual;

The only 'problem' is that TStrings uses a single-byte string.

This cannot be solved properly except by duplicating the classes unit.

Michael.
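
A usage sketch for the overload quoted above. It assumes an FPC version
whose Classes unit already contains LoadFromFile with a TEncoding parameter
(trunk at the time of this thread), and the file name is a placeholder. On
Unix the cwstring unit is included so that a widestring manager is
available for the conversion.

```pascal
program LoadWithEncoding;
{$mode objfpc}{$H+}
uses
  {$IFDEF UNIX}cwstring,{$ENDIF}
  Classes, SysUtils;
const
  FileName = 'license_demo.txt';  // placeholder file name
var
  Lines: TStringList;
begin
  if not FileExists(FileName) then
  begin
    WriteLn('File not found: ', FileName);
    Exit;
  end;
  Lines := TStringList.Create;
  try
    // Tell TStrings how the bytes on disk are encoded.
    Lines.LoadFromFile(FileName, TEncoding.UTF8);
    WriteLn('Lines read: ', Lines.Count);
  finally
    Lines.Free;
  end;
end.
```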

Re: [Lazarus] String vs WideString

2017-08-15 Thread Michael Schnell via Lazarus


On 15.08.2017 11:15, Tony Whyman via Lazarus wrote:
> In this case, I would argue that both are true.

And the culprit obviously is Embarcadero and not the fpc or the Lazarus
team, who did their best to try to do a compatible implementation
that is really workable on the multiple supported platforms (which E$
did not feel necessary when they released the encoding aware strings).

Maybe a better solution can be found, but who would want to nudge the
fpc / Lazarus developers to invest a huge amount of time to create it
and then make sure it is decently tested and stable?


-Michael

Re: [Lazarus] String vs WideString

2017-08-15 Thread Michael Schnell via Lazarus


On 15.08.2017 11:52, Michael Van Canneyt via Lazarus wrote:
> This cannot be solved properly except by duplicating the classes unit.

Sorry to disagree, but IMHO this can only be solved properly by defining
an additional fully dynamically encoded string type and using it for
TStrings (see ->
http://wiki.freepascal.org/not_Delphi_compatible_enhancement_for_Unicode_Support
)

But I am perfectly aware that implementing this would be a huge effort
(see other mail here), and nobody is entitled to ask for this. (I wrote
the article just to elaborate what was discussed in the fpc mailing list
at that time.)


-Michael

Re: [Lazarus] String vs WideString

2017-08-15 Thread Mattias Gaertner via Lazarus

On Tue, 15 Aug 2017 12:02:28 +0200
Michael Schnell via Lazarus  wrote:

> On 15.08.2017 11:52, Michael Van Canneyt via Lazarus wrote:
> > This cannot be solved properly except by duplicating the classes unit.  
> 
> Sorry to disagree, but IMHO this can only be solved properly by defining
> an additional fully dynamically encoded string type and use same for
> TStrings (see ->
> http://wiki.freepascal.org/not_Delphi_compatible_enhancement_for_Unicode_Support
> )

It does not explain what the characters of DynamicString are, does it?

Mattias

Re: [Lazarus] String vs WideString

2017-08-15 Thread Michael Van Canneyt via Lazarus




On Tue, 15 Aug 2017, Mattias Gaertner via Lazarus wrote:

> On Tue, 15 Aug 2017 12:02:28 +0200
> Michael Schnell via Lazarus  wrote:
>
>> On 15.08.2017 11:52, Michael Van Canneyt via Lazarus wrote:
>>> This cannot be solved properly except by duplicating the classes unit.
>>
>> Sorry to disagree, but IMHO this can only be solved properly by defining
>> an additional fully dynamically encoded string type and use same for
>> TStrings (see ->
>> http://wiki.freepascal.org/not_Delphi_compatible_enhancement_for_Unicode_Support
>> )
>
> It does not explain what the characters of DynamicString are, does it?


I was just going to write that.

The problem of the element size is circumvented by simply not digging into it.

What does S[2] mean in your proposal ? Is it 1, 2, 4 or even 8 bytes ?


Michael.


Re: [Lazarus] String vs WideString

2017-08-15 Thread Michael Schnell via Lazarus


On 15.08.2017 12:11, Mattias Gaertner via Lazarus wrote:
> It does not explain what the characters of DynamicString are, does it?

I don't understand what you are asking.

The element size and encoding of a Dynamic String ("CP_ANY" in the
document) are not predefined, but depend on the content:

http://wiki.freepascal.org/not_Delphi_compatible_enhancement_for_Unicode_Support
-> Defining String variables and String types:
> *CP_ANY* = $FF00 // ElementSize dynamically assigned // fully
> dynamical String for intermediate storing string content // just
> assigned to the Type or variable, never used in the "Encoding" field
> in the string header.

Hence it stores the "branding" when it is assigned to from a string with
a fixed branding (such as *CP_UTF8*), and the content is auto-converted
if necessary when assigning from CP_ANY to a fixed branded string variable.

If (in your example) the data is read from a file, a CP_ANY Strings
based StringList would keep the encoding/char_size of the data as it is
in the file (it would need to somehow get to know the presumed encoding
of the file, anyway) and store that information in the
EncodingBrandNumber and ElementSize fields (which do exist in any
"NewString" variable, anyway), in each String read.

If the user assigns an element of the stringlist to a fixed branding
(such as *CP_UTF8*), the content obviously is auto-converted if
necessary when assigning from CP_ANY to a fixed branded string
variable, as usual.

In fact I suppose that the current implementation of TStringlist does
not use new strings to store the data on the heap, but I never said that
trying to implement such an idea would not require a lot of work.


-Michael

Re: [Lazarus] String vs WideString

2017-08-15 Thread Michael Schnell via Lazarus


On 15.08.2017 12:15, Michael Van Canneyt via Lazarus wrote:
> What does S[2] mean in your proposal ? Is it 1, 2, 4 or even 8 bytes ?

Regarding the users' appreciation, the S[x] notation is decently
incompatible between the different string types and compiler versions.

There were hundreds of complaints in all the appropriate forums and
mailing lists.

So not much additional harm can be done, anyway.

I suggest that it should be according to the character_size definition
stored in S, and the operation c := S[x] should transfer the appropriate
count of bits, provided the type of c allows for taking them.

This seems to be compatible with the current implementation of any 1-Byte
brand and UTF16.


-Michael

Re: [Lazarus] String vs WideString

2017-08-15 Thread Michael Van Canneyt via Lazarus




On Tue, 15 Aug 2017, Michael Schnell via Lazarus wrote:

> On 15.08.2017 12:15, Michael Van Canneyt via Lazarus wrote:
>> What does S[2] mean in your proposal ? Is it 1, 2, 4 or even 8 bytes ?
>
> Regarding the users' appreciation, the S[x] notation is decently
> incompatible between the different string types and compiler versions.

Of course not.

It's 1 byte for ansistring, 2 bytes for widestring.

The point is that the compiler knows how many bytes it is based on the
declaration of S. In your proposal, it is dynamic, if I understand it
correctly.

> There were hundreds of complains in all the appropriate forums and
> mailing list.

Complaints about what exactly ?

> So not much additional harm can be done, anyway.
>
> I suggest that it should be according to the character_size definition
> stored S, and the operation c := S[x] should transfer the appropriate
> count of bits, provided the type of c allows for taking them.

As far as I understand your proposal, this currently cannot be done ?

The compiler needs to know the S[X] size at compile time.

Michael.

Re: [Lazarus] String vs WideString

2017-08-15 Thread Bart via Lazarus

On 8/15/17, Tony Whyman via Lazarus  wrote:

> 2. Clean up the char type.
>
> Why shouldn't there be a single char
> type that intuitively represents a single character regardless of
> how many bytes are used to represent it.

You would have to define what "a single character" means in the first place.
This is especially important when it involves precomposed characters
and combining characters.

Bart
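
A short sketch of why "a single character" needs a definition first,
assuming FPC 3.x. The same visible letter, e with an acute accent, can be
stored either as one precomposed code point or as a base letter plus a
combining mark, and the two strings differ in length and compare as
different. Variable names are illustrative.

```pascal
program ComposedVsCombining;
{$mode objfpc}{$H+}
var
  Precomposed, Combined: UnicodeString;
begin
  // U+00E9 LATIN SMALL LETTER E WITH ACUTE
  SetLength(Precomposed, 1);
  Precomposed[1] := WideChar($00E9);

  // U+0065 'e' followed by U+0301 COMBINING ACUTE ACCENT
  SetLength(Combined, 2);
  Combined[1] := WideChar($0065);
  Combined[2] := WideChar($0301);

  WriteLn('Precomposed length: ', Length(Precomposed));  // 1
  WriteLn('Combined length:    ', Length(Combined));     // 2
  // Plain "=" compares code units, not what the user sees on screen.
  WriteLn('Equal with "=":     ', Precomposed = Combined);
end.
```

A char type that "intuitively represents a single character" would have to
decide whether Combined[1] alone is a character or only part of one.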

Re: [Lazarus] String vs WideString

2017-08-15 Thread Michael Schnell via Lazarus


On 15.08.2017 11:15, Tony Whyman via Lazarus wrote:
> Why shouldn't there be a single char type that intuitively represents
> a single character regardless of how many bytes are used to represent it.

I suppose by "char" you mean "single printable thingy". With Unicode it's
rather debatable what such a thingy is.

Hence a single Unicode char would need to be just a Unicode string.

-Michael

Re: [Lazarus] String vs WideString

2017-08-15 Thread Michael Schnell via Lazarus


On 15.08.2017 11:15, Tony Whyman via Lazarus wrote:
> 3. The problem with string handling today is that it is not based on a
> consistent approach to the character type.
>
>    If you clean up character handling then the model for string
>    handling should become obvious. A string is after all no more than
>    a container for a character array and which should be constrained
>    to have the same character encoding. A string should intuitively
>    represent a string of text regardless of how many bytes are used
>    to represent each character and with dynamic attributes to tell
>    you how it is encoded.
>
> 4. FPC should clean up Delphi's mess for it. If a unified string type
> follows a consistent model then it should be possible to make all
> Delphi string types synonyms.
>
>    You will need to allow exceptions for legacy programs that insist
>    on manipulating the bytes themselves - but that is not rocket
>    science. There is also the issue of the Windows API and its
>    insistence on Wide Strings - but isn't that why calling
>    conventions such as cdecl and stdcall exist - to tell the compiler
>    when it needs to reformat the call for a given API convention.

see ->
http://wiki.freepascal.org/not_Delphi_compatible_enhancement_for_Unicode_Support


-Michael

Re: [Lazarus] String vs WideString

2017-08-15 Thread Mattias Gaertner via Lazarus

On Tue, 15 Aug 2017 14:26:34 +0200
Michael Schnell via Lazarus  wrote:

> On 15.08.2017 11:15, Tony Whyman via Lazarus wrote:
> > Why shouldn't there be a single char type that intuitively represents 
> > a single character regardless of how many bytes are used to represent it.  
> 
> I suppose by "char" you mean "single printable thingy" with Unicode it's 
> rather debatable what such a thingy is.
> 
> Hence a Unicode singe char would need to be just be a Unicode string.

Do you mean a 'char' is a string in your proposal?

Mattias

Re: [Lazarus] String vs WideString

2017-08-15 Thread Michael Van Canneyt via Lazarus

On Tue, 15 Aug 2017, Mattias Gaertner via Lazarus wrote:

> On Tue, 15 Aug 2017 14:26:34 +0200
> Michael Schnell via Lazarus  wrote:
>
>> On 15.08.2017 11:15, Tony Whyman via Lazarus wrote:
>>> Why shouldn't there be a single char type that intuitively represents
>>> a single character regardless of how many bytes are used to represent it.
>>
>> I suppose by "char" you mean "single printable thingy" with Unicode it's
>> rather debatable what such a thingy is.
>>
>> Hence a Unicode singe char would need to be just be a Unicode string.
>
> Do you mean a 'char' is a string in your proposal?

That would be a neat recursive definition :)

Michael.

Re: [Lazarus] String vs WideString

2017-08-15 Thread Michael Schnell via Lazarus


On 15.08.2017 14:53, Mattias Gaertner via Lazarus wrote:
> Do you mean a 'char' is a string in your proposal?

Nope. In my proposal there would be Chars for any statically encoded
String Type, hence 1, 2, 4, and 8 byte wide. (As regards statically
encoded string (and char) brands, it's just an extension of the existing
paradigm.)

I did not think about the necessity to also have a dynamically encoded
Char type. If so, it (like a string) would need the additional fields
for encoding number and bytes_per_char, and the appropriate compiler
magic to handle them appropriately (working like a one-element string).


-Michael

Re: [Lazarus] String vs WideString

2017-08-15 Thread Mattias Gaertner via Lazarus

On Tue, 15 Aug 2017 16:44:30 +0200
Michael Schnell via Lazarus  wrote:

> On 15.08.2017 14:53, Mattias Gaertner via Lazarus wrote:
> > Do you mean a 'char' is a string in your proposal?   
> Nope. In my proposal there would be Chars for any statically encoded 
> String Type, hence 1, 2, 4, and 8 byte wide. (As regarding statically 
> encoded string (and char) brands, it's just an extension of the existing 
> paradigm.

8 bytes?

Do you propose a string without the array operator [] ?

Mattias

Re: [Lazarus] String vs WideString

2017-08-15 Thread Graeme Geldenhuys via Lazarus


On 2017-08-15 10:52, Michael Van Canneyt via Lazarus wrote:
> The only 'problem' is that TStrings uses a single-byte string.

Why can't that be changed to a UnicodeString or UTF8String - after all,
the Unicode standard is meant to support all languages. I would have
thought that would be an obvious move for a Unicode-aware RTL. TStrings
could also be extended (if it hasn't already) to keep track of what
encoding is read in from file, and what encoding it should produce
when lines are extracted - in case those two encodings are not the same.


Regards,
  Graeme

--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp

Re: [Lazarus] String vs WideString

2017-08-15 Thread Luca Olivetti via Lazarus


On 15/08/17 at 11:25, Michael Van Canneyt via Lazarus wrote:
> Attempting to store binary data in a string is not advisable. Dynamic
> arrays, TBytes and - in the worst case - TBytesStream are powerful
> enough to cover most use-cases in this area.

It has worked extremely well and reliably until fpc 2.6.4 (i.e. with
string=ansistring).

Does it not work in 3.x?
If not it's a big problem, not only for my code (that I can,
reluctantly, change) but for 3rd party libraries/components (e.g.
synapse comes to mind)


Bye
--
Luca Olivetti
Wetron Automation Technology http://www.wetron.es/
Tel. +34 93 5883004 (Ext.3010)  Fax +34 93 5883007

Re: [Lazarus] String vs WideString

2017-08-15 Thread wkitty42--- via Lazarus


On 08/15/2017 05:25 AM, Michael Van Canneyt via Lazarus wrote:
> As it is now, FPC offers a way out for all cases:
>
> WideString/UnicodeString for those that want 2-byte characters.



what if 3 and 4 byte characters are required? will they also work in 
UnicodeStrings?
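
A hedged sketch of how this works in UTF-16, assuming FPC 3.x: a code point
above U+FFFF (4 bytes in UTF-8) is stored in a UnicodeString as two 16-bit
code units, a surrogate pair. The example builds U+1F600 by hand and
decodes it again; the variable names are made up.

```pascal
program SurrogatePair;
{$mode objfpc}{$H+}
uses
  SysUtils;
var
  U: UnicodeString;
  HighSur, LowSur, CodePoint: Cardinal;
begin
  // U+1F600 encoded as the UTF-16 surrogate pair D83D DE00.
  SetLength(U, 2);
  U[1] := WideChar($D83D);
  U[2] := WideChar($DE00);

  HighSur := Ord(U[1]);
  LowSur := Ord(U[2]);
  // Standard UTF-16 decoding of a surrogate pair.
  CodePoint := $10000 + ((HighSur - $D800) shl 10) + (LowSur - $DC00);

  WriteLn('UTF-16 code units: ', Length(U));                        // 2
  WriteLn('Code point: U+', IntToHex(Int64(CodePoint), 4));         // U+1F600
end.
```

So such characters fit, but Length and S[i] then count code units rather
than characters.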

i'm looking at this from a linux POV but have been trying to come from the very 
old school DOS TP stuff using codepages... especially needing to be able to read 
codepage strings and properly convert all their characters to UTF-8...


converting back would be a huge help, too... even with the possible loss of 
characters requiring replacing them with "?" or something to hold their place 
and show they didn't convert... that or even leaving them in their 2, 3 or 4 
byte form and let those using older codepage stuff see them raw...
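
For the code page to UTF-8 direction, one possible approach, assuming
FPC 3.x and its SetCodePage routine: label the raw bytes with the code page
they were written in, then ask the RTL to convert. The sample byte and code
page 437 are arbitrary choices for illustration. On Unix this relies on a
widestring manager such as cwstring being installed; without one, characters
outside ASCII may not convert as expected.

```pascal
program CodePageToUtf8;
{$mode objfpc}{$H+}
uses
  {$IFDEF UNIX}cwstring,{$ENDIF}
  SysUtils;
var
  S: RawByteString;
  I: Integer;
begin
  // One byte, $84, which is "a with diaeresis" in DOS code page 437.
  SetLength(S, 1);
  S[1] := AnsiChar($84);

  SetCodePage(S, 437, False);     // label the bytes as CP437, no conversion
  SetCodePage(S, CP_UTF8, True);  // now convert the content to UTF-8

  // Show the resulting UTF-8 bytes as hex.
  for I := 1 to Length(S) do
    Write(IntToHex(LongInt(Ord(S[I])), 2), ' ');
  WriteLn;
end.
```

Converting back would be the same call with the target code page; what
happens to characters that have no equivalent there depends on the
platform's conversion routine.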



--
 NOTE: No off-list assistance is given without prior approval.
   *Please keep mailing list traffic on the list unless*
   *a signed and pre-paid contract is in effect with us.*

Re: [Lazarus] String vs WideString

2017-08-15 Thread Graeme Geldenhuys via Lazarus


On 2017-08-15 18:29, Luca Olivetti via Lazarus wrote:
> but for 3rd party libraries/components (e.g.
> synapse comes to mind


Then better start filing bug reports to all those 3rd party libraries 
and components - they have been abusing the system and will silently 
fail. Not to mention that FPC is almost at v3.0.4 and the new string 
changes were introduced in v3.0.0 already.


Regards,
  Graeme

--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp

Re: [Lazarus] String vs WideString

2017-08-15 Thread Luca Olivetti via Lazarus


On 15/08/17 at 21:14, Graeme Geldenhuys via Lazarus wrote:
> On 2017-08-15 18:29, Luca Olivetti via Lazarus wrote:
>> but for 3rd party libraries/components (e.g.
>> synapse comes to mind
>
> Then better start filing bug reports to all those 3rd party libraries
> and components - they have been abusing the system and will silently
> fail. Not to mention that FPC is almost at v3.0.4 and the new string
> changes were introduced in v3.0.0 already.

Wait a minute, why "abuse"?
After all, before code page aware strings, an ansistring could store any
kind of arbitrary data with no problem and no conversion, and made it
extremely easy to, e.g., add bytes to a buffer or find and extract data
from the same buffer.
*If* code that worked before (and dare I say without abusing the
language) suddenly breaks, the bug is in the compiler and not in the
library.
(I stressed the "if" because I don't know if that's the case; according
to Bo Berglund's experience it is)


Bye

--
Luca Olivetti
Wetron Automation Technology http://www.wetron.es/
Tel. +34 93 5883004 (Ext.3010)  Fax +34 93 5883007

Re: [Lazarus] String vs WideString

2017-08-15 Thread Mattias Gaertner via Lazarus

On Tue, 15 Aug 2017 21:22:10 +0200
Luca Olivetti via Lazarus  wrote:

>[...]
> *If* code that worked before (and dare I say without abusing the 
> language) suddenly breaks, the bug is in the compiler and not in the 
> library.

... unless of course the incompatibility is deliberate and documented.
In this case it is.

Mattias
-- 
___
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus

Re: [Lazarus] String vs WideString

2017-08-15 Thread Ondrej Pokorny via Lazarus


On 15.08.2017 21:34, Mattias Gaertner via Lazarus wrote:

On Tue, 15 Aug 2017 21:22:10 +0200
Luca Olivetti via Lazarus  wrote:


[...]
*If* code that worked before (and dare I say without abusing the
language) suddenly breaks, the bug is in the compiler and not in the
library.

... unless of course the incompatibility is deliberate and documented.
In this case it is.


Furthermore, if you use(d) strings for binary data, just replace old 
string for AnsiString/RawByteString (and Char for AnsiChar, PChar for 
PAnsiChar) and you are good to go. Annoying but no big deal.
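
A minimal sketch of that replacement, assuming FPC 3.x in objfpc mode. The program name and the byte values are invented for illustration; the buffer is filled by index so that no string conversion is involved at all.

```pascal
program RawBufferSketch;
{$mode objfpc}{$H+}

var
  Buf: RawByteString;   // byte container without a fixed code page
  I: Integer;
begin
  // Fill the buffer by index; no string-to-string assignment happens here.
  SetLength(Buf, 3);
  Buf[1] := AnsiChar($C0);
  Buf[2] := AnsiChar($00);   // an embedded zero byte is fine in a counted string
  Buf[3] := AnsiChar($E9);

  // Dump the stored bytes in hex.
  for I := 1 to Length(Buf) do
    Write(HexStr(Ord(Buf[I]), 2), ' ');
  WriteLn;
end.
```

Whether concatenation or assignment between differently declared string types stays conversion-free depends on the declared code pages, so indexing is the conservative choice in this sketch.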


Ondrej
--
___
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus

Re: [Lazarus] String vs WideString

2017-08-15 Thread Luca Olivetti via Lazarus


On 15/08/17 at 21:38, Ondrej Pokorny via Lazarus wrote:

On 15.08.2017 21:34, Mattias Gaertner via Lazarus wrote:

On Tue, 15 Aug 2017 21:22:10 +0200
Luca Olivetti via Lazarus  wrote:


[...]
*If* code that worked before (and dare I say without abusing the
language) suddenly breaks, the bug is in the compiler and not in the
library.

... unless of course the incompatibility is deliberate and documented.
In this case it is.


Furthermore, if you use(d) strings for binary data, just replace old 
string for AnsiString/RawByteString (and Char for AnsiChar, PChar for 
PAnsiChar) and you are good to go. Annoying but no big deal.



If that's all it's OK then, thank you.

Bye
--
Luca Olivetti
Wetron Automation Technology http://www.wetron.es/
Tel. +34 93 5883004 (Ext.3010)  Fax +34 93 5883007
--
___
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus

Re: [Lazarus] String vs WideString

2017-08-15 Thread Luca Olivetti via Lazarus


On 15/08/17 at 22:08, Luca Olivetti wrote:

On 15/08/17 at 21:38, Ondrej Pokorny via Lazarus wrote:

On 15.08.2017 21:34, Mattias Gaertner via Lazarus wrote:

On Tue, 15 Aug 2017 21:22:10 +0200
Luca Olivetti via Lazarus  wrote:


[...]
*If* code that worked before (and dare I say without abusing the
language) suddenly breaks, the bug is in the compiler and not in the
library.

... unless of course the incompatibility is deliberate and documented.
In this case it is.


Furthermore, if you use(d) strings for binary data, just replace old 
string for AnsiString/RawByteString (and Char for AnsiChar, PChar for 
PAnsiChar) and you are good to go. Annoying but no big deal.



If that's all it's OK then, thank you.


Sorry for the direct reply, it was meant for the list.

Bye
--
Luca Olivetti
Wetron Automation Technology http://www.wetron.es/
Tel. +34 93 5883004 (Ext.3010)  Fax +34 93 5883007
--
___
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus

Re: [Lazarus] String vs WideString

2017-08-15 Thread Graeme Geldenhuys via Lazarus


On 2017-08-15 20:22, Luca Olivetti via Lazarus wrote:

Wait a minute, why "abuse"?
After all, before code aware strings, an ansistring could store any kind
of arbitrary data with no problem and no conversion, and made it
extremely easy



Just listen to what you are saying... A string type, and you want to 
store all kinds of non-string-related data in that type. How is that not 
"abuse"???  Use a TBytes, TStream or other binary byte-based storage 
mechanism. A string type was definitely not the right choice.
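
As a rough sketch of the byte-based alternative, assuming FPC with the standard SysUtils and Classes units. The byte values are invented for illustration and do not come from any real protocol.

```pascal
program BytesSketch;
{$mode objfpc}{$H+}

uses
  SysUtils, Classes;   // TBytes lives in SysUtils, TMemoryStream in Classes

var
  Data: TBytes;
  Stream: TMemoryStream;
  I: Integer;
begin
  // A dynamic array of Byte: the compiler never converts its contents.
  SetLength(Data, 3);
  Data[0] := $C0;
  Data[1] := $00;
  Data[2] := $E9;

  // The same bytes written to a stream, again without any conversion.
  Stream := TMemoryStream.Create;
  try
    Stream.WriteBuffer(Data[0], Length(Data));
    WriteLn('Stream size: ', Stream.Size);
  finally
    Stream.Free;
  end;

  for I := 0 to High(Data) do
    Write(HexStr(Data[I], 2), ' ');
  WriteLn;
end.
```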


Regards,
  Graeme

--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp
--
___
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus

Re: [Lazarus] String vs WideString

2017-08-15 Thread Luca Olivetti via Lazarus


On 15/08/17 at 22:45, Graeme Geldenhuys via Lazarus wrote:

On 2017-08-15 20:22, Luca Olivetti via Lazarus wrote:

Wait a minute, why "abuse"?
After all, before code aware strings, an ansistring could store any kind
of arbitrary data with no problem and no conversion, and made it
extremely easy



Just listen to what you are saying... A string type, and you want to 
store all kinds of non-string-related data in that type. How is that not 
"abuse"???  Use a TBytes, TStream or other binary byte-based storage 
mechanism. A string type was definitely not the right choice.


A "string" was just a handy container for bytes so I think it was the 
right choice for storing, er, bytes.


Bye

--
Luca Olivetti
Wetron Automation Technology http://www.wetron.es/
Tel. +34 93 5883004 (Ext.3010)  Fax +34 93 5883007
--
___
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus

Re: [Lazarus] String vs WideString

2017-08-15 Thread Graeme Geldenhuys via Lazarus


On 2017-08-15 23:41, Luca Olivetti via Lazarus wrote:

A "string" was just a handy container for bytes so I think it was the
right choice for storing, er, bytes.



The type "String" has always been an alias to another type, and could 
mean many things. eg: ShortString, AnsiString, and now UnicodeString. 
Making the assumption that it will always be a container for byte sized 
data was wrong.


In hindsight, using TBytes or TMemoryStream would have made it very 
clear that it is a storage container for byte-sized data, and that no 
automatic conversion (by the compiler) would be done to data stored in 
such containers.


Don't worry though, you were not alone in making that wrong assumption. 
Many Delphi developers have made that mistake, and some are still making 
that mistake today.


Regards,
  Graeme

--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp
--
___
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus

Re: [Lazarus] String vs WideString

2017-08-15 Thread Bo Berglund via Lazarus

On Tue, 15 Aug 2017 21:22:10 +0200, Luca Olivetti via Lazarus
 wrote:

>(I stress the "if" because I don't know whether that's the case; according 
>to Bo Berglund's experience it is.)

Just to expand on my "experience" and the reason I posted:

My work on converting the old program started back a couple of years
when I went from Delphi 2007 (pre-unicode) to Delphi XE5 because we
wanted the GUI to be translatable to non-western languages.

But then all the communications functions (and these are many in this
utility application) broke because they used strings as containers for
the inherently binary serial data.

So I followed advice on the Embarcadero forum to switch to AnsiString
because that was really what the old string type was an alias for.
I had no great insight in the inner workings of the string handling
functions but I "knew" that AnsiString was a 1-byte per element and
(unicode)string was now a 2-byte per element container. The fact that
the code could alter the content of the AnsiString did not dawn on me
at all.
And the comm functions worked fine after the change (I tested a lot,
but of course only on my English Win7 computer).

Then some time ago there was a report of a failure of the new program
version that only happened in Korea, China and Thailand. In the log
files there was a very strange entry about finding an illegal command
byte when sending a command to the equipment.

It never triggered when I debugged the problem; for me and my
colleagues it worked flawlessly. So I had to add more logging and found
that the problem arose when the outgoing command was built. A certain
1-byte command was then expanded to 2 bytes with the wrong first byte!
The commands in the protocol are the first byte of the data of a
telegram and they are in the range $C0..$E9.
When one of these (I no longer remember exactly which one) was used in
an assignment to the AnsiString buffer, it was converted to $3F plus
something that was never logged, and the operation failed because the
equipment could not decode the command.

So I asked again on the forum and was steered towards RawByteString
because presumably that container would disallow conversions.
And when I changed this and sent a new version to the distributor in
Korea the problem was seemingly gone.

Based on this experience I wanted to alert the OP to the fact that
using AnsiString instead of string is not a cure-all for binary data;
you need to fix the codepage too, which is what RawByteString does
for you.
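
The effect can be sketched roughly as follows, assuming FPC 3.x. The two typed string declarations and the Dump helper are hypothetical names made up for this illustration, and the converted bytes depend on the platform and on the installed widestring manager, so no particular output is claimed.

```pascal
program CodePageSketch;
{$mode objfpc}{$H+}

type
  // Hypothetical code-page-typed strings, chosen only for illustration.
  TLatin1String   = type AnsiString(28591);  // ISO-8859-1
  TCyrillicString = type AnsiString(1251);   // Windows-1251

// Prints the dynamic code page and the raw bytes of a string.
procedure Dump(const Name: string; const S: RawByteString);
var
  I: Integer;
begin
  Write(Name, ' (code page ', StringCodePage(S), '):');
  for I := 1 to Length(S) do
    Write(' $', HexStr(Ord(S[I]), 2));
  WriteLn;
end;

var
  Src: TLatin1String;
  Conv: TCyrillicString;
  Raw: RawByteString;
begin
  SetLength(Src, 1);
  Src[1] := AnsiChar($C0);   // one "command byte"

  Conv := Src;   // declared code pages differ: the compiler inserts a conversion
  Raw := Src;    // RawByteString target: the bytes are taken over as they are

  Dump('Src ', Src);
  Dump('Conv', Conv);
  Dump('Raw ', Raw);
end.
```

If the target code page has no equivalent for the source byte, a conversion like the one into Conv may substitute a replacement such as $3F ('?'), which would match the symptom described above.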

But I have now moved on and replaced all comm related containers with
TBytes including modifying the serial component we have used.
(With some help from Remy Lebeau).

-- 
Bo Berglund
Developer in Sweden

-- 
___
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus

Re: [Lazarus] String vs WideString

2017-08-15 Thread Bo Berglund via Lazarus

On Wed, 16 Aug 2017 07:53:11 +0200, Bo Berglund via Lazarus
 wrote:

>But I have now moved on and replaced all comm related containers with
>TBytes including modifying the serial component we have used.
>(With some help from Remy Lebeau).

I forgot to mention that the problem area is located inside a non-GUI
class file for handling the communications, and this file is also used
in some programs written in FPC for Raspberry Pi target computers.
I.e. Linux and the reason for going to FPC.
So I want it to be both FPC and Delphi compatible...

-- 
Bo Berglund
Developer in Sweden

-- 
___
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus

Re: [Lazarus] String vs WideString

2017-08-16 Thread Michael Schnell via Lazarus


On 15.08.2017 19:18, Graeme Geldenhuys via Lazarus wrote:


Why can't that be changed to a UnicodeString or UTF8String


IMHO, any implementation of TStrings that forces a conversion (just 
because the class uses TStrings and not due to a logical demand), is a 
contradiction to providing code aware strings at all.


-Michael
--
___
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus

Re: [Lazarus] String vs WideString

2017-08-16 Thread Michael Schnell via Lazarus


On 15.08.2017 19:29, Luca Olivetti via Lazarus wrote:
It has worked extremely well and reliably until fpc 2.6.4 (i.e. with 
string=ansistring).

Does it not work in 3.x?
I understand that storing unencoded bytes in UTF-8 strings (hence in fpc) 
works as well as it always has, as long as all strings are declared with 
the same encoding brand as TStrings (and friends), i.e. UTF-8, because 
then there will never be a conversion.


But it does not work in Delphi, as there TStrings is defined to be UTF-16.

-Michael
--
___
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus

Re: [Lazarus] String vs WideString

2017-08-16 Thread Michael Schnell via Lazarus


On 15.08.2017 21:38, Ondrej Pokorny via Lazarus wrote:


> Furthermore, if you use(d) strings for binary data, just replace old 
> string for AnsiString/RawByteString (and Char for AnsiChar, PChar for 
> PAnsiChar) and you are good to go. Annoying but no big deal.

This only works if all the tools that you use do the same. And a major tool 
for handling strings is TStrings and its siblings. You can hardly avoid 
using them.


-Michael

--
___
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus

Re: [Lazarus] String vs WideString

2017-08-16 Thread Mattias Gaertner via Lazarus

On Wed, 16 Aug 2017 10:47:37 +0200
Michael Schnell via Lazarus  wrote:

> On 15.08.2017 19:29, Luca Olivetti via Lazarus wrote:
> > It has worked extremely well and reliably until fpc 2.6.4 (i.e. with 
> > string=ansistring).
> > Does it not work in 3.x?  
> I understand that storing unencoded bytes in UTF-8 strings (hence in fpc) 
> works as well as it always has, as long as all strings are declared with 
> the same encoding brand as TStrings (and friends), i.e. UTF-8, because 
> then there will never be a conversion.
> 
> But it does not work in Delphi, as there TStrings is defined to be UTF-16.

This thread is going off topic.
Please start a new thread if you want to discuss Delphi strings.

Mattias
-- 
___
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus

Re: [Lazarus] String vs WideString

2017-08-16 Thread Juha Manninen via Lazarus

On Wed, Aug 16, 2017 at 8:53 AM, Bo Berglund via Lazarus
 wrote:
> Based on this experience I wanted to alert the OP of the fact that
> using AnsiString instead of string is not a cure-all for binary data,
> you need to fix the codepage too, which is what the RawByteString does
> for you

Bo, everybody has known for decades that AnsiString is not for binary data.
Why do you proclaim it as a new discovery?
The OP's problem was completely different. It was about text encoding.
TBytes is clearly the right choice for your binary data, but this
discussion is not about binary data!

What does "AnsiString instead of string" mean?
String is typically an alias for AnsiString.

Your sentence about RawByteString is also wrong.
There is no automatic codepage conversion for RawByteString.

Juha
-- 
___
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus

Re: [Lazarus] String vs WideString

2017-08-16 Thread Michael Schnell via Lazarus


On 15.08.2017 22:45, Graeme Geldenhuys via Lazarus wrote:

> How is that not "abuse"???

IMHO it's a major shortcoming to define "string" as "printable text". In 
fact the name "String" does not suggest this at all. A "string" in my 
understanding is just a sequence of similar "things".

> A string type was definitely not the right choice.

Notwithstanding the discussion about the mere wording, this would only 
hold if the system provided a differently named basic type for data that 
is not "printable text", with the features needed for such usage: 
reference counting, lazy copy, simple operators for concatenation and 
element extraction and replacement, built-in functions for locating 
substrings, ...


-Michael
--
___
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus

Re: [Lazarus] String vs WideString

2017-08-16 Thread Michael Van Canneyt via Lazarus




On Wed, 16 Aug 2017, Michael Schnell via Lazarus wrote:


On 15.08.2017 22:45, Graeme Geldenhuys via Lazarus wrote:

 How is that not "abuse"???

IMHO it's a major shortcoming to define "string" as "printable text".


On the contrary. That is exactly what it means. 
Anything else is just a collection of bytes.


Michael.
--
___
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus

Re: [Lazarus] String vs WideString

2017-08-16 Thread Graeme Geldenhuys via Lazarus


On 2017-08-16 09:43, Michael Schnell via Lazarus wrote:

IMHO, any implementation of TStrings that forces a conversion (just
because the class uses TStrings and not due to a logical demand), is a
contradiction to providing code aware strings at all.


But in FPC 3.x (using modern compiler modes - not TP or Mac) String = 
UnicodeString. So it makes sense that TStrings should use UnicodeString 
internally to store its data. The Unicode standard is also the only 
standard that can support any language. So all Windows code-pages can be 
supported with the single UnicodeString type.


Are you suggesting that internally TStrings should have different 
storage for all possible languages, or some RawByteString type? So if 
you load some non-Latin code-page text, internally it still stores that 
text as those non-Latin bytes? That would just over-complicate the 
TStrings class. FPC is moving towards UnicodeString being used 
internally for everything in the RTL, so why must TStrings be any different?


Regards,
  Graeme

--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp
--
___
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus

Re: [Lazarus] String vs WideString

2017-08-16 Thread Michael Schnell via Lazarus


On 16.08.2017 10:58, Mattias Gaertner via Lazarus wrote:

> This thread is going off topic.
> Please start a new thread if you want to discuss Delphi strings.

You can't discuss fpc's string problems without mentioning Delphi, as 
they are a direct consequence of both Delphi compatibility and 
Delphi incompatibility.


-Michael

--
___
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus

Re: [Lazarus] String vs WideString

2017-08-16 Thread Michael Schnell via Lazarus


On 15.08.2017 19:53, wkitty42--- via Lazarus wrote:
> what if 3 and 4 byte characters are required? will they also work in 
> UnicodeStrings?

UTF-8 and UTF-16 are just encoding variants for 32 bit Unicode 
"characters", storing them in n (or 2*n) Bytes according to a simple 
scheme.


-Michael

--
___
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus

Re: [Lazarus] String vs WideString

2017-08-16 Thread Juha Manninen via Lazarus

On Mon, Aug 14, 2017 at 4:11 PM, Marcos Douglas B. Santos via Lazarus
 wrote:
> Unicode everywhere and you using AnsiString and doing everything...
> Now I'm confused.

Yes, please read:
 http://wiki.freepascal.org/Unicode_Support_in_Lazarus
I have advertised it so much that some people are already irritated,
but maybe you missed it so far.

> FPC and Lazarus claim to be cross-platform (this is a fact), and
> because of that, IMHO, both should be used in only one way on every
> system, don't you think?

Yes, and that's how it works.

> This is an ugly trick... but I understood what you mean.

This was about the explicit temporary UnicodeString variable for
WinAPI call parameters.
No, it is not ugly; the code remains 100% compatible with Delphi.
Please remember also that direct WinAPI calls are not needed in
cross-platform code.

Juha
-- 
___
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus

Re: [Lazarus] String vs WideString

2017-08-16 Thread Michael Schnell via Lazarus



On 15.08.2017 18:33, Mattias Gaertner via Lazarus wrote:
> Do you propose a string without the array operator [] ? 

I don't understand what you mean by this.

Of course an appropriate "char" type for each string encoding brand 
could be provided, hence a "CP_QWord Char" as an alias for a QWord.


(Please keep in mind that in that paper (as explicitly pointed out) 
"String" is not a synonym for "printable text" but for "sequence of 
similar things". And  here of course (at least in a 64 bit system) it's 
extremely appropriate to allow for 64 bit elements. And of course this 
is just a suggestion that could solve a certain class of problems but 
needs a big effort to do and verify the modifications in the compiler 
and the libraries.)


-Michael
--
___
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus

Re: [Lazarus] String vs WideString

2017-08-16 Thread Mattias Gaertner via Lazarus

On Wed, 16 Aug 2017 11:09:17 +0200
Michael Schnell via Lazarus  wrote:

> On 16.08.2017 10:58, Mattias Gaertner via Lazarus wrote:
> > This thread is going off topic.
> > Please start a new thread if you want to discuss Delphi strings.  
> You can't discuss fpc's string problems without mentioning Delphi, as 
> they are a direct consequence of both Delphi compatibility and 
> Delphi incompatibility.

The original post was about a string conversion warning.

Anyone who wants to discuss the grand picture of strings in FPC for
the millionth time should start a new topic.


Mattias
-- 
___
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus

Re: [Lazarus] String vs WideString

2017-08-16 Thread Michael Schnell via Lazarus


On 16.08.2017 11:08, Graeme Geldenhuys via Lazarus wrote:
> So it makes sense that TStrings should use UnicodeString internally to 
> store its data. The Unicode standard is also the only standard that 
> can support any language. 

But in fact "Unicode" is just a universal standard defining 64 bit 
entities. The encoding of those varies: UTF-8, UTF-16 high byte first, 
UTF-16 low byte first, 64 bit low byte first, 64 bit high byte first, 
... fpc and Delphi do support several of those as string encodings 
(thereby creating any number of problems).


-Michael
--
___
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus

Re: [Lazarus] String vs WideString

2017-08-16 Thread Michael Schnell via Lazarus


On 16.08.2017 11:08, Graeme Geldenhuys via Lazarus wrote:
> Are you suggesting that internally TStrings should have different 
> storage for all possible languages,

Not at all. In the said paper I point out that a new, fully dynamic 
string encoding brand would be introduced and used for TStrings. 
Nothing else will improve the class of problems that has been under 
discussion for years.


-Michael (knowing that this will never happen)
--
___
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus

Re: [Lazarus] String vs WideString

2017-08-16 Thread Michael Schnell via Lazarus


On 16.08.2017 11:32, Mattias Gaertner via Lazarus wrote:

> Anyone who wants to discuss the grand picture of strings in FPC for the 
> millionth time should start a new topic.

Right you are. And it would be far too late and futile anyway, 
for the reasons already discussed a million times.


-Michael
--
___
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus

Re: [Lazarus] String vs WideString

2017-08-16 Thread Mattias Gaertner via Lazarus

On Wed, 16 Aug 2017 11:33:04 +0200
Michael Schnell via Lazarus  wrote:

>[...]
> But in fact "Unicode" is just a universal standard defining 64 bit 
> entities. 

No.
1,114,112 possible code points need at most 21 bits. Due to encoding,
at most 32 bits.
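
The arithmetic can be checked with a short sketch. It assumes only that the highest Unicode code point is U+10FFFF (17 planes of 65,536 code points each); the program and variable names are made up for illustration.

```pascal
program CodePointBits;
{$mode objfpc}{$H+}

const
  MaxCodePoint = $10FFFF;   // highest Unicode code point

var
  Count, V: LongWord;
  Bits: Integer;
begin
  // Code points run from 0 to MaxCodePoint inclusive.
  Count := MaxCodePoint + 1;

  // Count how many bits are needed to represent MaxCodePoint.
  Bits := 0;
  V := MaxCodePoint;
  while V > 0 do
  begin
    Inc(Bits);
    V := V shr 1;
  end;

  WriteLn('Code points: ', Count);   // 1114112 = 17 * 65536
  WriteLn('Bits needed: ', Bits);    // 21
end.
```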

Mattias
-- 
___
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus

Re: [Lazarus] String vs WideString

2017-08-16 Thread Juha Manninen via Lazarus

On Mon, Aug 14, 2017 at 4:21 PM, Tony Whyman via Lazarus
 wrote:
> UTF-16/Unicode can only store 65,536 characters while the Unicode standard
> (that covers UTF8 as well) defines 136,755 characters.
> UTF-16/Unicode's main advantage seems to be for rapid indexing of large
> strings.

That shows complete ignorance from your side about Unicode.
You consider UTF-16 as a fixed-width encoding.  :(
Unfortunately many other programmers had the same wrong idea or they
were just lazy. The result anyway is a lot of broken UTF-16 code out
there.
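
To illustrate why UTF-16 is not fixed-width, here is a small sketch that splits a code point above U+FFFF into its two UTF-16 code units (a surrogate pair). The code point U+1F600 is just an example value, and the variable names are invented for this illustration.

```pascal
program SurrogateSketch;
{$mode objfpc}{$H+}

var
  CodePoint, Offset: LongWord;
  HighSur, LowSur: Word;
begin
  CodePoint := $1F600;   // example code point outside the Basic Multilingual Plane

  // Standard UTF-16 rule: subtract $10000, then split the 20-bit rest
  // into two 10-bit halves.
  Offset := CodePoint - $10000;
  HighSur := Word($D800 + (Offset shr 10));
  LowSur := Word($DC00 + (Offset and $3FF));

  // One code point, two 16-bit code units.
  WriteLn(HexStr(HighSur, 4), ' ', HexStr(LowSur, 4));   // D83D DE00
end.
```

Code that assumes one 16-bit element per code point would treat the two halves as separate items, which is the kind of breakage described above.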

On Tue, Aug 15, 2017 at 12:15 PM, Tony Whyman via Lazarus
 wrote:
> If a topic keeps on being discussed after 10+ years of argument, the reason
> is usually either (a) the problem and its solution have not been documented
> properly, or (b) the outcome is an unsatisfactory compromise.

Or (c) The people discussing are ignorant about the topic.

> I went back and read the wiki article you mentioned and was no more the
> wiser as to why the current mess exists. Is it really no more than because
> Delphi continues to screw up in this area, so must FPC? The body of the
> article appears to be a set of notes - not necessarily wrong in themselves
> but lacking the background and context needed to explain why it is like it is.

Hmmm...
Originally the page was a mess because it had lots of irrelevant
background info about the old obsolete LCL Unicode support. Text was
added by many people but none was removed.
Finally I cleaned the page. It now has most relevant info at the top
and then special cases and technical details later.
I am rather happy with the page now, it explains how to use Unicode
with Lazarus as clearly as possible.
However I am willing to improve it. What kind of background and
context would you need?

> 1. Stop using the term "Unicode".

You can stop using it. No problem.
For others however it is a well defined international standard. See:
  https://en.wikipedia.org/wiki/Unicode

> 2. Clean up the char type.
> ...
> Why shouldn't there be a single char type that intuitively represents
> a single character regardless of how many bytes are used to represent it.

What do you mean by "a single character"?
A "character" in Unicode can mean about 7 different things. Which one
is your pick?
This question is for everybody in this thread who used the word "character".

> Yes, in a world where we have to live with UTF8, UTF16, UTF32, legacy code
> pages and Chinese variations on UTF8, that means that dynamic attributes
> have to be included in the type. But isn't that the only way to have
> consistent and intuitive character handling?

What do you mean? Chinese don't have a variation of UTF8.
UTF8 is global unambiguous encoding standard, part of Unicode.

The fundamental problem is that you want to hide the complexity of
Unicode by some magic String type of a compiler.
It is not possible. Unicode remains complex but the complexity is NOT
in encodings!
No, a codepoint's encoding is the easy part. For example I was easily
able to create a unit to support encoding agnostic code. See unit
LazUnicode in package LazUtils.
The complexity is elsewhere:
- "Character" composed of codepoints in precomposed and decomposed
(normalized) forms.
- Compare and sort text based on locale.
- Uppercase / Lowercase rules based on locale.
- Glyphs
- Graphemes
- etc.

I must admit I don't understand well those complex parts.
I do understand codeunits and codepoints, and I understand they are
the easy part.
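
A tiny sketch of the first item in that list, assuming FPC's UnicodeString (UTF-16 code units). The same visible letter "é" is stored once precomposed and once decomposed; the variable names are invented for illustration.

```pascal
program ComposeSketch;
{$mode objfpc}{$H+}

var
  Composed, Decomposed: UnicodeString;
begin
  // Precomposed form: U+00E9 (LATIN SMALL LETTER E WITH ACUTE).
  SetLength(Composed, 1);
  Composed[1] := WideChar($00E9);

  // Decomposed form: U+0065 (e) followed by U+0301 (COMBINING ACUTE ACCENT).
  SetLength(Decomposed, 2);
  Decomposed[1] := WideChar($0065);
  Decomposed[2] := WideChar($0301);

  // Length counts code units, not what a reader would call "characters".
  WriteLn('Precomposed code units: ', Length(Composed));    // 1
  WriteLn('Decomposed code units:  ', Length(Decomposed));  // 2
end.
```

Both strings render as one letter, so neither a byte count nor a code point count answers the question of what "a single character" is.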

Juha
-- 
___
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus

Re: [Lazarus] String vs WideString

2017-08-16 Thread Michael Schnell via Lazarus


On 16.08.2017 11:55, Mattias Gaertner via Lazarus wrote:
> 1,114,112 possible code points need at most 21 bits. Due to encoding,
> at most 32 bits.

Sorry. Typo.
-Michael
--
___
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus

Re: [Lazarus] String vs WideString

2017-08-16 Thread Juha Manninen via Lazarus

On Wed, Aug 16, 2017 at 12:12 PM, Michael Schnell via Lazarus
 wrote:
> UTF-8 and UTF-16 are just encoding variants for 32 bit Unicode "characters",
> storing them in n (or 2*n) Bytes according to a simple scheme.

No, they are encodings for codepoints, not "characters" (whatever that means).

Michael Schnell, your posts are completely off topic.
Unicode-related topics clearly pull you like a magnet, and then you
lose all control and start to proclaim your grand plan for a string
revamp.
It can continue for months, as we remember from past years.
You should stop writing in this thread now. I agree with Mattias.

Juha
-- 
___
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus

Re: [Lazarus] String vs WideString

2017-08-16 Thread Michael Schnell via Lazarus


On 16.08.2017 12:22, Juha Manninen via Lazarus wrote:
> You should stop writing in this thread now. I agree with Mattias. 

I perfectly agree with you. But you can't blame me for answering when 
asked.


-Michael

--
___
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus

Re: [Lazarus] String vs WideString

2017-08-16 Thread Graeme Geldenhuys via Lazarus


On 2017-08-16 11:05, Juha Manninen via Lazarus wrote:

Unfortunately many other programmers had the same wrong idea or they
were just lazy. The result anyway is a lot of broken UTF-16 code out
there.


Yeah, I see that even in commercial products and projects. It's very sad 
to see. Hence I always promote UTF-8, as you can't get it wrong as 
easily as UTF-16. No endianness to worry about, no surrogate pairs, and 
UTF-8 is ready for streaming (network or disk) out of the box.
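
For comparison, a minimal sketch of the UTF-8 encoding rule, written out by hand. EncodeUtf8 and Show are hypothetical helpers made up for this illustration (they are not RTL routines), and no validation is done on the input.

```pascal
program Utf8Sketch;
{$mode objfpc}{$H+}

// Encodes one code point as a UTF-8 byte sequence. Illustration only:
// surrogates and values above $10FFFF are not rejected.
function EncodeUtf8(CP: LongWord): RawByteString;
begin
  Result := '';
  if CP < $80 then
  begin
    SetLength(Result, 1);
    Result[1] := AnsiChar(Byte(CP));
  end
  else if CP < $800 then
  begin
    SetLength(Result, 2);
    Result[1] := AnsiChar(Byte($C0 or (CP shr 6)));
    Result[2] := AnsiChar(Byte($80 or (CP and $3F)));
  end
  else if CP < $10000 then
  begin
    SetLength(Result, 3);
    Result[1] := AnsiChar(Byte($E0 or (CP shr 12)));
    Result[2] := AnsiChar(Byte($80 or ((CP shr 6) and $3F)));
    Result[3] := AnsiChar(Byte($80 or (CP and $3F)));
  end
  else
  begin
    SetLength(Result, 4);
    Result[1] := AnsiChar(Byte($F0 or (CP shr 18)));
    Result[2] := AnsiChar(Byte($80 or ((CP shr 12) and $3F)));
    Result[3] := AnsiChar(Byte($80 or ((CP shr 6) and $3F)));
    Result[4] := AnsiChar(Byte($80 or (CP and $3F)));
  end;
end;

// Prints a code point and its UTF-8 bytes in hex.
procedure Show(CP: LongWord);
var
  S: RawByteString;
  I: Integer;
begin
  S := EncodeUtf8(CP);
  Write('U+', HexStr(CP, 6), ' ->');
  for I := 1 to Length(S) do
    Write(' ', HexStr(Ord(S[I]), 2));
  WriteLn;
end;

begin
  Show($41);      // 1 byte
  Show($E9);      // 2 bytes
  Show($20AC);    // 3 bytes
  Show($1F600);   // 4 bytes: F0 9F 98 80
end.
```

The output is a plain byte sequence with a fixed order, so there is no byte-order variant to choose, unlike UTF-16.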


Regards,
  Graeme

--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp
--
___
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus

Re: [Lazarus] String vs WideString

2017-08-16 Thread Marcos Douglas B. Santos via Lazarus

On Wed, Aug 16, 2017 at 6:12 AM, Juha Manninen via Lazarus
 wrote:
> On Mon, Aug 14, 2017 at 4:11 PM, Marcos Douglas B. Santos via Lazarus
>  wrote:
>> Unicode everywhere and you using AnsiString and doing everything...
>> Now I'm confused.
>
> Yes, please read:
>  http://wiki.freepascal.org/Unicode_Support_in_Lazarus
> I have advertised it so much that some people are already irritated,
> but maybe you missed it so far.

Thanks. I know about this page... unfortunately looks like it is not
enough, since many others still complain.

>> This is an ugly trick... but I understood what you mean.
>
> This was about the explicit temporary UnicodeString variable for
> WinAPI call parameters.
> No, it is not ugly; the code remains 100% compatible with Delphi.
> Please remember also that direct WinAPI calls are not needed in
> cross-platform code.

This thread is not only about WinAPI. I have this problem because I
need to use a third-party Windows library, which uses WideString.

Best regards,
Marcos Douglas
-- 
___
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus

Re: [Lazarus] String vs WideString

2017-08-16 Thread Juha Manninen via Lazarus

On Wed, Aug 16, 2017 at 5:13 PM, Marcos Douglas B. Santos via Lazarus
 wrote:
> Thanks. I know about this page... unfortunately looks like it is not
> enough, since many others still complain.

What is missing? I can try to improve it.

> This thread is not only about WinAPI. I have this problem because I
> need to use a third-party Windows library, which uses WideString.

Then just use WideString or UnicodeString where needed. It is not a problem.

Note: WideString is for OLE programming. In most cases you should use
UnicodeString. Their memory management differs.
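
A minimal sketch of passing a string to a WideString parameter with an explicit conversion. SetLicense here is a hypothetical stand-in for the third-party call, and the constant key replaces the IniFile.ReadString call from the thread. Which conversion is right depends on the actual encoding of the source string.

```pascal
program LicenseSketch;
{$mode objfpc}{$H+}

// Hypothetical stand-in for the third-party routine; the real
// Lib.SetLicense from the thread is not shown there.
procedure SetLicense(const Key: WideString);
begin
  WriteLn('Key length in UTF-16 code units: ', Length(Key));
end;

var
  Raw: AnsiString;
begin
  Raw := 'ABC-123';   // stands in for IniFile.ReadString('TheLib', 'license', '')

  // Explicit typecast: converts using the string's code page and avoids
  // the implicit-conversion warning.
  SetLicense(WideString(Raw));

  // If the text is known to be UTF-8, decode it explicitly instead.
  SetLicense(UTF8Decode(Raw));
end.
```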

Juha
-- 
___
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus

Re: [Lazarus] String vs WideString

2017-08-16 Thread Marcos Douglas B. Santos via Lazarus

On Wed, Aug 16, 2017 at 11:37 AM, Juha Manninen via Lazarus
 wrote:
> On Wed, Aug 16, 2017 at 5:13 PM, Marcos Douglas B. Santos via Lazarus
>  wrote:
>> Thanks. I know about this page... unfortunately looks like it is not
>> enough, since many others still complain.
>
> What is missing? I can try to improve it.

I cannot speak for others, but for now my issue is with WideString.

>> This thread is not only about WinAPI. I have this problem because I
>> need to use a Windows 3rd Lib, which uses WideString.
>
> Then just use WideString or UnicodeString where needed. It is not a problem.

Are you saying that I need to do this?
(following the first example in this thread)

=== begin ===
var
  U: UnicodeString;
  W: WideString;
begin
  U := IniFile.ReadString('TheLib', 'license', '');
  W := U;
  Lib.SetLicense(W);
  // ...
end;
=== end ===

...and I will not get a "Warning", right?


> Note,  WideString is for OLE programming. Most often you should use
> UnicodeString. Their memory management differs.

Ok... thanks... but in my case it is an OLE object that I need to use.

Best regards,
Marcos Douglas

Re: [Lazarus] String vs WideString

2017-08-16 Thread Juha Manninen via Lazarus

On Wed, Aug 16, 2017 at 5:48 PM, Marcos Douglas B. Santos via Lazarus
 wrote:
> I cannot speak for others, but for now my issue is with WideString.

The section "Calling Windows API" says:
 'Only the "W" versions of Windows API functions should be called. It
is like in Delphi except that you must assign strings to/from API
calls to UnicodeString variables or typecast with UnicodeString().'
Then it also explains the difference between WideString and UnicodeString.
I should add a mention about PWideChar parameters.
Anyway the idea is to keep the information useful and dense. Earlier
it was bloated and intimidating.

> Are you saying that I need to do this?
> (following the first example in this thread)

No, if the parameter is a WideString, not a PWideChar pointer, you can
just call it like you did. Suppress the warning as Mattias suggested if it
bothers you. You can also make a helper function so the conversion
happens in one place.
Yes, for OLE you need WideString.
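
A minimal sketch of such a helper, using the explicit cast that Mattias
suggested earlier in the thread. The names ToWide and SetLicense are
hypothetical: SetLicense only stands in for the third-party routine, whose
real declaration is not shown in this thread.

```pascal
program widehelper;

{$mode objfpc}{$H+}

// Hypothetical helper: the only place where AnsiString becomes WideString.
// The explicit cast replaces the implicit conversion the compiler warns about.
function ToWide(const S: AnsiString): WideString;
begin
  Result := WideString(S);
end;

// Stand-in for the third-party routine that expects a WideString.
procedure SetLicense(const aLicense: WideString);
begin
  Writeln('License length: ', Length(aLicense));
end;

begin
  // In real code the argument would come from IniFile.ReadString(...).
  SetLicense(ToWide('ABC-123'));
end.
```

The cast converts from the string's code page, so this only gives correct
results when the source string really is in the encoding the RTL assumes.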

Juha

Re: [Lazarus] String vs WideString

2017-08-16 Thread Sven Barth via Lazarus

On 15.08.2017 10:34, Tony Whyman via Lazarus wrote:
> On 14/08/17 17:47, Sven Barth via Lazarus wrote:
>> The main problem of such a dynamic type would be the inability to do
>> fast indexing as the compiler would need to insert runtime checks for
>> the size of a character. I had already thought the same, but then had
>> to discard the idea due to this.
> 
> Is this really a big problem? It is not as if it would be necessary to
> do a table lookup everytime you index a string as the indexing method
> could be an attribute of the string and updated with the character
> encoding attribute. Is it really that complicated for the compiler to
> generate code that jumps to an indexing method depending upon a data
> attribute?

In a tight loop where one accesses the string character by character
(take Pos() for example) this will lead to a significant slowdown as the
compiler (without optimizations) will have to insert a call to the
lookup function for each access. While I generally don't consider
performance degradation as a backwards compatibility issue I do in this
case, due to the significant decrease in performance.

Take this evaluation example:

=== code begin ===

program tperf;

{$mode objfpc}{$H+}

uses
  SysUtils;

function lookup(const aStr: String; aIndex: SizeInt): Char;
begin
  Result := aStr[aIndex];
end;

var
  str: String;
  starttime, endtime: TDateTime;
  i, j: LongInt;
begin
  SetLength(str, 1);

  starttime := Now;
  for i := 0 to 1 do
    for j := 1 to Length(str) do
      if str[j] <> '' then ;
  endtime := Now;

  Writeln('Direct: ', FormatDateTime('hh:nn:ss.zzz', endtime - starttime));

  starttime := Now;
  for i := 0 to 1 do
    for j := 1 to Length(str) do
      if lookup(str, j) <> '' then ;
  endtime := Now;

  Writeln('Lookup: ', FormatDateTime('hh:nn:ss.zzz', endtime - starttime));
end.

=== code end ===

=== output begin ===

Direct: 00:00:01.766
Lookup: 00:00:02.061

=== output end ===

While this example is of course artificial, it nevertheless shows the
slowdown.

> Is your problem really more about the result type as, depending on the
> character width, the result could be an AnsiChar or WideChar or a UTF8
> character for which I don't believe there is a defined char type (other
> than an arguable  mis-use of UCS4Char)?

That is indeed also a problem. I might not have had that one in mind
with my mail above, but I did back then when I had brainstormed this.

> I can accept that a clear up of this area would also have to extend to
> the char types as well - but I would also argue that that is well
> overdue. On a quick count, I found 7 different char types in the system
> unit.

And most important of all: any solution that is developed *MUST* be
backwards compatible, which means that at the very least the type aliases
would remain anyway.

Regards,
Sven

Re: [Lazarus] String vs WideString

2017-08-16 Thread Sven Barth via Lazarus

On 16.08.2017 11:08, Graeme Geldenhuys via Lazarus wrote:
> On 2017-08-16 09:43, Michael Schnell via Lazarus wrote:
>> IMHO, any implementation of TStrings that forces a conversion (just
>> because the class uses TStrings and not due to a logical demand), is a
>> contradiction to providing code aware strings at all.
> 
> But in FPC 3.x (using modern compiler modes - not TP or Mac) String =
> UnicodeString. So it makes sense that TStrings should use UnicodeString
> internally to store its data. The Unicode standard is also the only
> standard that can support any language. So all Windows code-pages can be
> supported with the single UnicodeString type.

You are wrong. The string types in 3.0.x and 3.1 are like this:

TP, Iso, ExtPas, MacPas, FPC, ObjFPC (or below modes with $H-): String =
ShortString
Delphi (or other modes with $H+): String = AnsiString (or more precisely
String(CP_ACP), meaning the system codepage)
Delphi_Unicode (or other modes with $H+ and $modeswitch unicodestrings):
String = UnicodeString
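
A small sketch to check the last case yourself, assuming FPC 3.0 or later.
With the modeswitch in place, String should be UnicodeString and Char a
2-byte WideChar; removing the modeswitch line should bring both sizes back
to 1. The program name and variable are illustrative only.

```pascal
program strtypes;

{$mode objfpc}{$H+}
{$modeswitch unicodestrings}

var
  s: String;
begin
  s := 'abc';
  // Size in bytes of the default Char type under the current settings.
  Writeln('SizeOf(Char) = ', SizeOf(Char));
  // Size in bytes of one element of the default String type.
  Writeln('SizeOf(s[1]) = ', SizeOf(s[1]));
end.
```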

Regards,
Sven

Re: [Lazarus] String vs WideString

2017-08-16 Thread Luca Olivetti via Lazarus


On 16/08/17 at 01:17, Graeme Geldenhuys via Lazarus wrote:
> In hind sight, using TBytes or TMemoryStream and it would have been very
> clear that it is a storage container for byte sized data, and no
> automatic conversion (by the compiler) would be done to data stored in
> such containers.


Call me lazy but I don't want to reinvent the wheel and re-implement 
from scratch the functionality that a plain ansistring provides and 
TBytes to this day doesn't.
I mean, TBytes is just an "array of char". I can't (easily) add a byte 
to the end, cut a slice of the bytes, find one byte in the array, etc.
OK, I can, but I have to program it all by myself while a string does 
all that and more and probably it's a lot more efficient.


Bye
--
Luca Olivetti
Wetron Automation Technology http://www.wetron.es/
Tel. +34 93 5883004 (Ext.3010)  Fax +34 93 5883007

Re: [Lazarus] String vs WideString

2017-08-16 Thread Luca Olivetti via Lazarus


On 16/08/17 at 20:26, Luca Olivetti via Lazarus wrote:
> On 16/08/17 at 01:17, Graeme Geldenhuys via Lazarus wrote:
>
>> In hind sight, using TBytes or TMemoryStream and it would have been
>> very clear that it is a storage container for byte sized data, and no
>> automatic conversion (by the compiler) would be done to data stored in
>> such containers.
>
> Call me lazy but I don't want to reinvent the wheel and re-implement
> from scratch the functionality that a plain ansistring provides and
> TBytes to this day doesn't.
> I mean, TBytes is just an "array of char". I can't (easily) add a byte
> to the end, cut a slice of the bytes, find one byte in the array, etc.
> OK, I can, but I have to program it all by myself while a string does
> all that and more and probably it's a lot more efficient.

Not to mention that its index starts from 0. If I wanted to program in C
I would be programming in C, not pascal ;-)


Bye



--
Luca Olivetti
Wetron Automation Technology http://www.wetron.es/
Tel. +34 93 5883004 (Ext.3010)  Fax +34 93 5883007

Re: [Lazarus] String vs WideString

2017-08-16 Thread Sven Barth via Lazarus

On 16.08.2017 20:26, Luca Olivetti via Lazarus wrote:
> El 16/08/17 a les 01:17, Graeme Geldenhuys via Lazarus ha escrit:
> 
>> In hind sight, using TBytes or TMemoryStream and it would have been
>> very clear that it is a storage container for byte sized data, and no
>> automatic conversion (by the compiler) would be done to data stored in
>> such containers.
> 
> Call me lazy but I don't want to reinvent the wheel and re-implement
> from scratch the functionality that a plain ansistring provides and
> TBytes to this day doesn't.
> I mean, TBytes is just an "array of char". I can't (easily) add a byte
> to the end, cut a slice of the bytes, find one byte in the array, etc.
> OK, I can, but I have to program it all by myself while a string does
> all that and more and probably it's a lot more efficient.

Trunk supports Insert() and Delete() on dynamic arrays, Concat() and +
are on the near term ToDo list.
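
A hedged sketch of what that looks like on a TBytes buffer, assuming a
compiler that already has these intrinsics for dynamic arrays (trunk at the
time of writing, FPC 3.2 or later afterwards). The buffer contents are made
up for illustration.

```pascal
program bytebuf;

{$mode objfpc}{$H+}

uses
  SysUtils;

var
  buf: TBytes;
  b: Byte;
  i: Integer;
begin
  SetLength(buf, 3);
  buf[0] := 1;
  buf[1] := 2;
  buf[2] := 3;

  // Append one byte by inserting it at the index just past the last element.
  b := 9;
  Insert(b, buf, Length(buf));

  // Remove one element starting at index 0, i.e. drop the first byte.
  Delete(buf, 0, 1);

  // Print the remaining bytes.
  for i := 0 to High(buf) do
    Write(buf[i], ' ');
  Writeln;
end.
```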

Regards,
Sven


Re: [Lazarus] String vs WideString

2017-08-16 Thread Luca Olivetti via Lazarus


On 16/08/17 at 22:40, Sven Barth via Lazarus wrote:
> On 16.08.2017 20:26, Luca Olivetti via Lazarus wrote:
>> On 16/08/17 at 01:17, Graeme Geldenhuys via Lazarus wrote:
>>
>>> In hind sight, using TBytes or TMemoryStream and it would have been
>>> very clear that it is a storage container for byte sized data, and no
>>> automatic conversion (by the compiler) would be done to data stored in
>>> such containers.
>>
>> Call me lazy but I don't want to reinvent the wheel and re-implement
>> from scratch the functionality that a plain ansistring provides and
>> TBytes to this day doesn't.
>> I mean, TBytes is just an "array of char". I can't (easily) add a byte
>> to the end, cut a slice of the bytes, find one byte in the array, etc.
>> OK, I can, but I have to program it all by myself while a string does
>> all that and more and probably it's a lot more efficient.
>
> Trunk supports Insert() and Delete() on dynamic arrays, Concat() and +
> are on the near term ToDo list.

I started using strings as communication buffers since delphi 2. There
weren't even dynamic arrays then...


Bye
--
Luca Olivetti
Wetron Automation Technology http://www.wetron.es/
Tel. +34 93 5883004 (Ext.3010)  Fax +34 93 5883007

Re: [Lazarus] String vs WideString

2017-08-16 Thread Graeme Geldenhuys via Lazarus


On 2017-08-16 18:35, Sven Barth via Lazarus wrote:
> You are wrong. The string types in 3.0.x and 3.1 are like this:

Thanks for correcting me. I was thinking of the "$modeswitch
unicodestrings" option.


Regards,
  Graeme


Re: [Lazarus] String vs WideString

2017-08-16 Thread Graeme Geldenhuys via Lazarus


On 2017-08-16 19:26, Luca Olivetti via Lazarus wrote:
> I mean, TBytes is just an "array of char".


NO!  Char can now mean a 1-byte char or a 2-byte char (I don't know how 
FPC plans to support Unicode surrogate pairs which will require 
4-bytes). In the olden days (Delphi 7 and FPC 2.6.4) the Char type might 
always have meant 1-byte, but it doesn't necessarily these days.


TBytes has always been a container for Byte data.

Regards,
  Graeme

--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp

Re: [Lazarus] String vs WideString

2017-08-16 Thread Graeme Geldenhuys via Lazarus


On 2017-08-16 23:46, Luca Olivetti via Lazarus wrote:
> I started using strings as communication buffers since delphi 2. There
> weren't even dynamic arrays then...

Well, linked lists have existed since the beginning of time. I used them
plenty in my TP days, and adding, inserting, indexing etc. was pretty easy.
Maybe programmers have just become spoilt over time with all the "out of
the box" functionality and have actually become lazy in coding.


Regards,
  Graeme

--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp

Re: [Lazarus] String vs WideString

2017-08-16 Thread wkitty42--- via Lazarus


On 08/16/2017 07:30 PM, Graeme Geldenhuys via Lazarus wrote:
> On 2017-08-16 18:35, Sven Barth via Lazarus wrote:
>> You are wrong. The string types in 3.0.x and 3.1 are like this:
>
> Thanks for correcting me. I was thinking of the "$modeswitch unicodestrings"
> option.

will that modeswitch take care of the warning about implicit conversion between
ansistring and unicode string when one has

var foo : unicodestring;

writeln(padright(foo,5));

??

i wrote a quick and simple little array exhibit program for someone... i had
thought to try to embrace this new unicode stuff by using unicode strings... then
using padright and similar string manipulators gave me warnings about
ansistring conversions :?

NOTE: this may be because i have an older lazarus and fpc installed... lazarus
fixes 1.6.1 and fpc fixes 3.0.something...
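
One way to keep the UnicodeString variable and still avoid the warning is
the explicit cast already suggested earlier in this thread. This is only a
sketch, assuming PadRight comes from StrUtils and takes an AnsiString there;
it does not answer whether the modeswitch alone would help.

```pascal
program padcast;

{$mode objfpc}{$H+}

uses
  StrUtils;

var
  foo: UnicodeString;
begin
  foo := 'abc';
  // The explicit cast makes the UnicodeString -> AnsiString conversion
  // intentional, so the compiler has no implicit conversion to warn about.
  Writeln(PadRight(AnsiString(foo), 5), '|');
end.
```

The cast can lose characters that the target code page cannot represent,
which is exactly what the warning is there to point out.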



--
 NOTE: No off-list assistance is given without prior approval.
   *Please keep mailing list traffic on the list unless*
   *a signed and pre-paid contract is in effect with us.*
