Re: [Lazarus] String vs WideString

2017-08-16 Thread Sven Barth via Lazarus
On 17.08.2017 04:16, "wkitty42--- via Lazarus" <lazarus@lists.lazarus-ide.org> wrote:
>
> On 08/16/2017 06:46 PM, Luca Olivetti via Lazarus wrote:
>>
>> I started using strings as communication buffers since delphi 2. There
>> weren't even dynamic arrays then...
>
>
> really? delphi came from TP/BP... i was (still am, actually) using
> dynamic arrays in TP6 ;)

Dynamic arrays in the form of "array of Type" were only introduced in
Delphi 3 if I remember correctly. Anything before that needed manual memory
management.

Regards,
Sven
-- 
___
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus


Re: [Lazarus] String vs WideString

2017-08-16 Thread wkitty42--- via Lazarus

On 08/16/2017 06:46 PM, Luca Olivetti via Lazarus wrote:

> I started using strings as communication buffers since delphi 2. There
> weren't even dynamic arrays then...


really? delphi came from TP/BP... i was (still am, actually) using dynamic 
arrays in TP6 ;)



--
 NOTE: No off-list assistance is given without prior approval.
   *Please keep mailing list traffic on the list unless*
   *a signed and pre-paid contract is in effect with us.*
--


Re: [Lazarus] String vs WideString

2017-08-16 Thread wkitty42--- via Lazarus

On 08/16/2017 07:30 PM, Graeme Geldenhuys via Lazarus wrote:

> On 2017-08-16 18:35, Sven Barth via Lazarus wrote:
>> You are wrong. The string types in 3.0.x and 3.1 are like this:
>
> Thanks for correcting me. I was thinking of the "$modeswitch unicodestring"
> option.



will that modeswitch take care of the warning about implicit conversion between
ansistring and unicodestring when one has


var foo : unicodestring;

writeln(padright(foo,5));

??

i wrote a quick and simple little array exhibit program for someone... i had
thought to try to embrace this new unicode stuff by using unicode strings... then
using padright and similar string manipulators gave me warnings about
ansistring conversions :?


NOTE: this may be because i have an older lazarus and fpc installed... lazarus 
fixes 1.6.1 and fpc fixes 3.0.something...
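
A minimal sketch of the situation described above, assuming FPC 3.0.x, where StrUtils.PadRight is declared with AnsiString parameters. A modeswitch only changes what String means in the unit being compiled, so it should not be expected to change the parameter type of an already compiled RTL routine. An explicit conversion at the call site is one way to state the intent; treating the data as UTF-8 is an assumption made for this example.

```pascal
program padwarn;

{$mode objfpc}{$H+}

uses
  StrUtils;

var
  foo: UnicodeString;
begin
  foo := 'abc';

  // Implicit UnicodeString -> AnsiString conversion; this is the kind of
  // call that triggers the conversion warning:
  //   WriteLn(PadRight(foo, 5), '|');

  // Explicit conversion (assumes UTF-8 is the wanted 8-bit encoding):
  WriteLn(PadRight(UTF8Encode(foo), 5), '|');
end.
```

Note that PadRight pads by code units, so the visual width is only right as long as the text is plain ASCII.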





Re: [Lazarus] String vs WideString

2017-08-16 Thread Graeme Geldenhuys via Lazarus

On 2017-08-16 23:46, Luca Olivetti via Lazarus wrote:

> I started using strings as communication buffers since delphi 2. There
> weren't even dynamic arrays then...


Well, linked lists have existed since the beginning of time. I used them plenty
in my TP days, and adding, inserting, indexing etc. was pretty easy.
Maybe programmers have just become spoilt over time with all the "out of
the box" functionality and have actually become lazy in coding.


Regards,
  Graeme

--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp
--


Re: [Lazarus] String vs WideString

2017-08-16 Thread Graeme Geldenhuys via Lazarus

On 2017-08-16 19:26, Luca Olivetti via Lazarus wrote:

> I mean, TBytes is just an "array of char".


NO!  Char can now mean a 1-byte char or a 2-byte char (I don't know how 
FPC plans to support Unicode surrogate pairs which will require 
4-bytes). In the olden days (Delphi 7 and FPC 2.6.4) the Char type might 
always have meant 1-byte, but it doesn't necessarily these days.


TBytes has always been a container for Byte data.
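
The size difference is easy to check. The sketch below prints the element sizes involved; the commented-out directive is spelled `unicodestrings` here, which to my knowledge is how the compiler names the modeswitch, so treat that spelling as an assumption and check it against your compiler version.

```pascal
program charsize;

{$mode objfpc}{$H+}
// Uncomment to let Char and String follow the UTF-16 types in this unit:
// {$modeswitch unicodestrings}

begin
  WriteLn('SizeOf(Byte)     = ', SizeOf(Byte));      // 1
  WriteLn('SizeOf(AnsiChar) = ', SizeOf(AnsiChar));  // 1
  WriteLn('SizeOf(WideChar) = ', SizeOf(WideChar));  // 2
  WriteLn('SizeOf(Char)     = ', SizeOf(Char));      // depends on the mode
end.
```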

Regards,
  Graeme



Re: [Lazarus] String vs WideString

2017-08-16 Thread Graeme Geldenhuys via Lazarus

On 2017-08-16 18:35, Sven Barth via Lazarus wrote:

> You are wrong. The string types in 3.0.x and 3.1 are like this:


Thanks for correcting me. I was thinking of the "$modeswitch 
unicodestring" option.


Regards,
  Graeme



Re: [Lazarus] String vs WideString

2017-08-16 Thread Sven Barth via Lazarus
On 16.08.2017 20:26, Luca Olivetti via Lazarus wrote:
> On 16/08/17 at 01:17, Graeme Geldenhuys via Lazarus wrote:
> 
>> In hindsight, using TBytes or TMemoryStream would have made it
>> very clear that it is a storage container for byte sized data, and no
>> automatic conversion (by the compiler) would be done to data stored in
>> such containers.
> 
> Call me lazy but I don't want to reinvent the wheel and re-implement
> from scratch the functionality that a plain ansistring provides and
> TBytes to this day doesn't.
> I mean, TBytes is just an "array of char". I can't (easily) add a byte
> to the end, cut a slice of the bytes, find one byte in the array, etc.
> OK, I can, but I have to program it all by myself while a string does
> all that and more and probably it's a lot more efficient.

Trunk supports Insert() and Delete() on dynamic arrays, Concat() and +
are on the near term ToDo list.
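
For compilers without those intrinsics, the operations mentioned in the quoted text (append a byte, cut a slice, find a byte) can be sketched with a couple of small helpers. AppendByte and IndexOfByte are hypothetical names made up for this example, not RTL routines; Copy() on a dynamic array is the standard intrinsic.

```pascal
program bytesdemo;

{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical helper: grow the array by one element and store B there.
procedure AppendByte(var A: TBytes; B: Byte);
begin
  SetLength(A, Length(A) + 1);
  A[High(A)] := B;
end;

// Hypothetical helper: zero-based index of the first B, or -1 if absent.
function IndexOfByte(const A: TBytes; B: Byte): SizeInt;
var
  i: SizeInt;
begin
  for i := 0 to High(A) do
    if A[i] = B then
      Exit(i);
  Result := -1;
end;

var
  Buf, Slice: TBytes;
begin
  Buf := nil;
  AppendByte(Buf, $02);
  AppendByte(Buf, $41);
  AppendByte(Buf, $42);
  AppendByte(Buf, $03);

  Slice := Copy(Buf, 1, 2);  // two elements starting at zero-based index 1

  WriteLn('length:       ', Length(Buf));             // 4
  WriteLn('index of $03: ', IndexOfByte(Buf, $03));   // 3
  WriteLn('slice length: ', Length(Slice));           // 2
end.
```

Growing by one element per call is simple but not fast; a real buffer would normally grow in larger steps.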

Regards,
Sven



Re: [Lazarus] dynamic string proposal

2017-08-16 Thread Martin Frb via Lazarus

On 16/08/2017 20:44, Juha Manninen via Lazarus wrote:



>> So using "char" (the type) as reference to "codepoint" is something we have
>> to do, because today the type "char" is for codepoints.
>
> Sorry I didn't understand this one.
> "Char" (the type) holds a codeunit, not a codepoint. Char is either 1
> [...]

Right, yes. A genuine mistake amid all the confusion.



Re: [Lazarus] dynamic string proposal

2017-08-16 Thread Juha Manninen via Lazarus
On Wed, Aug 16, 2017 at 7:53 PM, Martin Frb via Lazarus
 wrote:
>> I know CodeUnit and CodePoint are not called "character" officially by
>> the Unicode Standard.
>> They however are called "character" in normal communication.
>
> And that is where the problem starts.
> ...

Exactly. Discussions where the word "character" is used are very vague
and inaccurate.

> So using "char" (the type) as reference to "codepoint" is something we have
> to do, because today the type "char" is for codepoints.

Sorry I didn't understand this one.
"Char" (the type) holds a codeunit, not a codepoint. Char is either 1
byte or 2 bytes depending on if it maps to AnsiChar or WideChar, for
UTF-8 or UTF-16 respectively.
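
A small sketch of the codeunit/codepoint difference, using the single codepoint U+1F377 that comes up elsewhere in this thread. The strings are built from explicit code unit values so that the example does not depend on the source file encoding.

```pascal
program codeunits;

{$mode objfpc}{$H+}

var
  U8: RawByteString;
  U16: UnicodeString;
begin
  // U+1F377 encoded as UTF-8: four 1-byte code units
  SetLength(U8, 4);
  U8[1] := AnsiChar($F0);
  U8[2] := AnsiChar($9F);
  U8[3] := AnsiChar($8D);
  U8[4] := AnsiChar($B7);

  // The same codepoint as UTF-16: a surrogate pair, two 2-byte code units
  SetLength(U16, 2);
  U16[1] := WideChar($D83C);
  U16[2] := WideChar($DF77);

  // Length() counts code units, not codepoints
  WriteLn('UTF-8 code units:  ', Length(U8));   // 4
  WriteLn('UTF-16 code units: ', Length(U16));  // 2
end.
```

In both encodings Length() is greater than 1 although the text holds exactly one codepoint.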

Juha


Re: [Lazarus] String vs WideString

2017-08-16 Thread Luca Olivetti via Lazarus

On 16/08/17 at 20:26, Luca Olivetti via Lazarus wrote:
> On 16/08/17 at 01:17, Graeme Geldenhuys via Lazarus wrote:
>> In hindsight, using TBytes or TMemoryStream would have made it
>> very clear that it is a storage container for byte sized data, and no
>> automatic conversion (by the compiler) would be done to data stored in
>> such containers.
>
> Call me lazy but I don't want to reinvent the wheel and re-implement
> from scratch the functionality that a plain ansistring provides and
> TBytes to this day doesn't.
> I mean, TBytes is just an "array of char". I can't (easily) add a byte
> to the end, cut a slice of the bytes, find one byte in the array, etc.
> OK, I can, but I have to program it all by myself while a string does
> all that and more and probably it's a lot more efficient.

Not to mention that its index starts from 0. If I wanted to program in C
I would be programming in C, not pascal ;-)


Bye



--
Luca Olivetti
Wetron Automation Technology http://www.wetron.es/
Tel. +34 93 5883004 (Ext.3010)  Fax +34 93 5883007
--


Re: [Lazarus] String vs WideString

2017-08-16 Thread Luca Olivetti via Lazarus

On 16/08/17 at 01:17, Graeme Geldenhuys via Lazarus wrote:
> In hindsight, using TBytes or TMemoryStream would have made it very
> clear that it is a storage container for byte sized data, and no
> automatic conversion (by the compiler) would be done to data stored in
> such containers.


Call me lazy but I don't want to reinvent the wheel and re-implement 
from scratch the functionality that a plain ansistring provides and 
TBytes to this day doesn't.
I mean, TBytes is just an "array of char". I can't (easily) add a byte 
to the end, cut a slice of the bytes, find one byte in the array, etc.
OK, I can, but I have to program it all by myself while a string does 
all that and more and probably it's a lot more efficient.


Bye


Re: [Lazarus] String vs WideString

2017-08-16 Thread Sven Barth via Lazarus
On 16.08.2017 11:08, Graeme Geldenhuys via Lazarus wrote:
> On 2017-08-16 09:43, Michael Schnell via Lazarus wrote:
>> IMHO, any implementation of TStrings that forces a conversion (just
>> because the class uses TStrings and not due to a logical demand), is a
>> contradiction to providing code aware strings at all.
> 
> But in FPC 3.x (using modern compiler modes - not TP or Mac) String =
> UnicodeString. So it makes sense that TStrings should use UnicodeString
> internally to store its data. The Unicode standard is also the only
> standard that can support any language. So all Windows code-pages can be
> supported with the single UnicodeString type.

You are wrong. The string types in 3.0.x and 3.1 are like this:

- TP, Iso, ExtPas, MacPas, FPC, ObjFPC (or the modes below with $H-):
  String = ShortString
- Delphi (or other modes with $H+): String = AnsiString (or more precisely
  String(CP_ACP), meaning the system codepage)
- Delphi_Unicode (or other modes with $H+ and $modeswitch unicodestring):
  String = UnicodeString
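
One way to check which of these cases applies to a given unit is to ask the RTTI what kind of type String currently is. This is only a sketch: it relies on the TypInfo unit and prints the name of the TTypeKind value, which one would expect to be tkSString, tkAString or tkUString for the three cases above.

```pascal
program strkind;

{$mode objfpc}{$H+}

uses
  TypInfo;

begin
  // Prints the type kind the compiler associates with plain "String"
  // under the directives at the top of this file. Change the mode,
  // $H switch or modeswitch and recompile to compare.
  WriteLn(GetEnumName(TypeInfo(TTypeKind),
    Ord(PTypeInfo(TypeInfo(String))^.Kind)));
end.
```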

Regards,
Sven


Re: [Lazarus] String vs WideString

2017-08-16 Thread Sven Barth via Lazarus
On 15.08.2017 10:34, Tony Whyman via Lazarus wrote:
> On 14/08/17 17:47, Sven Barth via Lazarus wrote:
>> The main problem of such a dynamic type would be the inability to do
>> fast indexing as the compiler would need to insert runtime checks for
>> the size of a character. I had already thought the same, but then had
>> to discard the idea due to this.
> 
> Is this really a big problem? It is not as if it would be necessary to
> do a table lookup everytime you index a string as the indexing method
> could be an attribute of the string and updated with the character
> encoding attribute. Is it really that complicated for the compiler to
> generate code that jumps to an indexing method depending upon a data
> attribute?

In a tight loop where one accesses the string character by character
(take Pos() for example) this will lead to a significant slowdown as the
compiler (without optimizations) will have to insert a call to the
lookup function for each access. While I generally don't consider
performance degradation a backwards compatibility issue, I do in this
case, due to the significant decrease in performance.

Take this evaluation example:

=== code begin ===

program tperf;

{$mode objfpc}{$H+}

uses
  SysUtils;

function lookup(const aStr: String; aIndex: SizeInt): Char;
begin
  Result := aStr[aIndex];
end;

var
  str: String;
  starttime, endtime: TDateTime;
  i, j: LongInt;
begin
  SetLength(str, 1);

  starttime := Now;
  for i := 0 to 1 do
    for j := 1 to Length(str) do
      if str[j] <> '' then ;
  endtime := Now;

  Writeln('Direct: ', FormatDateTime('hh:nn:ss.zzz', endtime - starttime));

  starttime := Now;
  for i := 0 to 1 do
    for j := 1 to Length(str) do
      if lookup(str, j) <> '' then ;
  endtime := Now;

  Writeln('Lookup: ', FormatDateTime('hh:nn:ss.zzz', endtime - starttime));
end.

=== code end ===

=== output begin ===

Direct: 00:00:01.766
Lookup: 00:00:02.061

=== output end ===

While this example is of course artificial it nevertheless shows the
slow down.

> Is your problem really more about the result type as, depending on the
> character width, the result could be an AnsiChar or WideChar or a UTF8
> character for which I don't believe there is a defined char type (other
> than an arguable  mis-use of UCS4Char)?

That is indeed also a problem. I might not have had that one in mind
with my mail above, but I did back then when I had brainstormed this.

> I can accept that a clear up of this area would also have to extend to
> the char types as well - but I would also argue that that is well
> overdue. On a quick count, I found 7 different char types in the system
> unit.

And most important of all: any solution that is developed *MUST* be
backwards compatible, which means that at the very least the type aliases
would remain anyway.

Regards,
Sven


Re: [Lazarus] dynamic string proposal

2017-08-16 Thread Martin Frb via Lazarus

On 16/08/2017 16:55, Juha Manninen via Lazarus wrote:
> On Wed, Aug 16, 2017 at 6:24 PM, Martin Frb via Lazarus wrote:
>> Actually no.
>
> I know CodeUnit and CodePoint are not called "character" officially by
> the Unicode Standard.
> They however are called "character" in normal communication.

And that is where the problem starts.

As long as people do this, even if they know it is incorrect, others
will pick it up and learn the wrong concepts.

Calling codepoints "chars" means that newcomers will think s[x] is a
valid way to deal with characters.

And that is wrong, even in UTF-32.

> For example in the "String vs WideString" thread most people used
> "character" as a synonym for CodePoint.

Lots of people used the word "character" as if it were the same as
codeunit.

But the question is: did they use it as a synonym? I.e. did they know they
were substituting the wrong word?

If so, why would they intentionally use misleading terms?

> For CodeUnit the term is very logical for historical reasons as the
> type "Char" is a short form of "Character".

That is why today it is a misnomer.

So using "char" (the type) as reference to "codepoint" is something we
have to do, because today the type "char" is for codepoints.

That is different from the English word "char", and that can cause huge
confusion.

The English word "character", however, is unambiguous. It is not the name
of a type. So it refers to character only, not to codepoint.






Re: [Lazarus] dynamic string proposal

2017-08-16 Thread Mattias Gaertner via Lazarus
On Wed, 16 Aug 2017 18:06:36 +0200
Michael Schnell via Lazarus  wrote:

>[...]
> The only difference to the current status is that with the "dynamic" 
> string brand the content of the "bytes per element" field is not 
> predefined by the variable declaration but can change when something is 
> assigned to that (additional) brand of string variables (I feel that 
> this is clearly stated in the paper). Hence for that (additional) brand 
> of string variables the compiler needs to generate code to read this 
> field when implementing the built-in functions.

This "dynamicstring" sounds like Rawbytestring times two. Any function
accessing the inner chars of a "dynamicstring" has to handle
Rawbytestring codepages and unicodestring and array of byte/word/dword.
If this is the price for avoiding some conversions, many programmers
will become unhappy.
Michael, please tell me your proposal has some serious advantages. I
don't see them.

Mattias



Re: [Lazarus] dynamic string proposal

2017-08-16 Thread Michael Schnell via Lazarus

On 16.08.2017 17:55, Juha Manninen via Lazarus wrote:

> although Pos(), Copy() and Length() deal with CodeUnit resolution.
> I wonder how the new fancy string types would handle it without a
> performance penalty.

This again is not in the scope of the paper, and is supposed to stay as it
is. S[x], Pos(), and friends work in units of "bytes per element" bytes.


The only difference to the current status is that with the "dynamic" 
string brand the content of the "bytes per element" field is not 
predefined by the variable declaration but can change when something is 
assigned to that (additional) brand of string variables (I feel that 
this is clearly stated in the paper). Hence for that (additional) brand 
of string variables the compiler needs to generate code to read this 
field when implementing the built-in functions.
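
To make the run-time check concrete, here is a rough sketch of what reading a "bytes per element" field on every access could look like. TDynString and ElementAt are hypothetical names invented for this illustration; nothing like them exists in FPC, and the paper may well have a different layout in mind.

```pascal
program dynindex;

{$mode objfpc}{$H+}

type
  // Hypothetical layout for illustration; not an existing FPC type.
  TDynString = record
    BytesPerElement: Byte;  // 1, 2 or 4
    Data: array of Byte;    // raw element storage
  end;

// Zero-based element access. The "case" is the run-time check that a
// statically typed string does not need.
function ElementAt(const S: TDynString; Index: SizeInt): LongWord;
begin
  case S.BytesPerElement of
    1: Result := S.Data[Index];
    2: Result := PWord(@S.Data[Index * 2])^;
    4: Result := PLongWord(@S.Data[Index * 4])^;
  else
    Result := 0;
  end;
end;

var
  S: TDynString;
begin
  S.BytesPerElement := 2;
  SetLength(S.Data, 4);          // room for two 2-byte elements
  PWord(@S.Data[0])^ := $0041;
  PWord(@S.Data[2])^ := $0042;
  WriteLn(ElementAt(S, 1));      // 66
end.
```

Every built-in that indexes such a string would need a branch of this kind, which is the cost being discussed in this thread.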


-Michael




Re: [Lazarus] dynamic string proposal

2017-08-16 Thread Michael Schnell via Lazarus

On 16.08.2017 17:20, Juha Manninen via Lazarus wrote:


> Unicode is the standard now. We cannot ignore it, and we don't want to
> ignore it because it solves so many problems of the earlier solutions.
> If you create a new string type, you certainly must take Unicode into account.

It is not "ignored", as it is handled by the conversion functions, the
functionality of which is not touched. The paper is just about storing
the information in the strings (including the "encoding brand" and
"bytes per element" fields).


So the actual meaning of the stuff that is stored in the strings is
beyond the scope of the paper, and is supposed to stay as it currently is.


-Michael



Re: [Lazarus] dynamic string proposal

2017-08-16 Thread Martin Frb via Lazarus

On 16/08/2017 16:20, Juha Manninen via Lazarus wrote:


> The word "character" in Unicode can mean:
>
> 1. CodeUnit - Represented by Pascal type "Char".

Actually no.

It can overlap. But a codeunit is NOT a character.

For example, a codeunit that holds a codepoint of class "combining mark"
is not a character. It is just something that can form a character
if combined with other codepoints.

> 2. CodePoint

Also not a character. Same as above.

Some codepoints happen to also be a character. But some are not.


Re: [Lazarus] dynamic string proposal

2017-08-16 Thread Juha Manninen via Lazarus
On Wed, Aug 16, 2017 at 4:49 PM, Michael Schnell via Lazarus
 wrote:
>> You are writing about encodings etc. which are part of codepoints, but
>> you call them "characters". Why?
>
> Because the type for this stuff used in Delphi and FPC is called "char".

No, actually the Pascal type "Char" contains a CodeUnit, not CodePoint.
It is the smallest fixed width "atom" of Unicode text. It is still
extremely useful in Unicode related programming.
The word "character" in Unicode can mean:

1. CodeUnit — Represented by Pascal type "Char".

2. CodePoint — all the arguments about one encoding's supremacy over
another deal with CodePoints. Yes, UTF-8, UTF-16, UTF-32 etc. all only
encode CodePoints.

3. Abstract Unicode character — like  'WINE GLASS'.

4. Coded Unicode character — "U" + a unique number, like U+1F377. This
is what "character" means in Unicode Standard.

5. User-perceived character — Whatever the end user thinks of as a character.
This is language dependent. For instance, ‘ch’ is two letters in
English but one letter in Czech and Slovak.
Many more complexities are involved here, including decomposed codepoints.

6. Grapheme cluster

7. Glyph — related to fonts.

So, number 4. is the official Unicode "character".
Otherwise the most useful meanings are 1. "CodeUnit" for programmers
and 5. "User-perceived character" for everybody else.
Note, CodePoint is NOT a useful meaning for "character". It would only
confuse things. Yet most people in these Unicode threads write about
"character" like it meant CodePoint. It can only mean that those
people are ignorant of the complexity of Unicode.  :(


> In fact I did not explicitly talk about Unicode at all. the paper says it:
> ...

Unicode is the standard now. We cannot ignore it, and we don't want to
ignore it because it solves so many problems of the earlier solutions.
If you create a new string type, you certainly must take Unicode into account.

Juha


Re: [Lazarus] String vs WideString

2017-08-16 Thread Marcos Douglas B. Santos via Lazarus
On Wed, Aug 16, 2017 at 11:37 AM, Juha Manninen via Lazarus
 wrote:
> On Wed, Aug 16, 2017 at 5:13 PM, Marcos Douglas B. Santos via Lazarus
>  wrote:
>> Thanks. I know about this page... unfortunately looks like it is not
>> enough, since many others still complain.
>
> What is missing? I can try to improve it.

I cannot speak for others, but I have this issue (with WideString) for now.

>> This thread is not only about WinAPI. I have this problem because I
>> need to use a Windows 3rd Lib, which uses WideString.
>
> Then just use WideString or UnicodeString where needed. It is not a problem.

Are you saying that I need to do this?
(following the first example in this thread)

=== begin ===
var
  U: UnicodeString;
  W: WideString;
begin
  U := IniFile.ReadString('TheLib', 'license', '');
  W := U;
  Lib.SetLicense(W);
  // ...
end;
=== end ===

...and I will not get a "Warning", right?
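
A self-contained variant of the snippet above, as a sketch only. SetLicense here is a hypothetical stand-in for the third-party call, and the source string is assumed to hold UTF-8, as is the convention in Lazarus. UnicodeString and WideString both hold UTF-16, so the assignment between them does not change the encoding; the one real conversion is made explicit with UTF8Decode.

```pascal
program widedemo;

{$mode objfpc}{$H+}

// Hypothetical stand-in for the third-party OLE call.
procedure SetLicense(const Key: WideString);
begin
  WriteLn('key length in UTF-16 code units: ', Length(Key));
end;

var
  S: String;      // AnsiString, assumed to contain UTF-8
  W: WideString;
begin
  S := 'ABC-123';
  W := UTF8Decode(S);  // explicit 8-bit -> UTF-16 conversion
  SetLicense(W);       // prints 7 for this ASCII-only key
end.
```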


> Note,  WideString is for OLE programming. Most often you should use
> UnicodeString. Their memory management differs.

Ok... thanks... but in my case it is an OLE object that I need to use.

Best regards,
Marcos Douglas


Re: [Lazarus] dynamic string proposal

2017-08-16 Thread Martin Frb via Lazarus

On 16/08/2017 13:48, Michael Schnell wrote:

> On 16.08.2017 14:30, Martin Frb via Lazarus wrote:
>> And that would still not be "char", but "codepoint"
>>
>> A char can be composed of several combining code points (each of them
>> afaik, in the 32 bit range).
>> So a char can have 96 or more bits. (And not all of them have a
>> combined form).
>
> Unfortunately in Delphi and FPC the appropriate work-alike existing
> type is called Char (with certain extensions). It would cause major
> problems to drop that name for something else, even if that would be
> appropriate.

I agree. "char" actually is a "code unit".

But renaming it would probably be as good as killing the language.

And anyone can do
type codeunit=char;

and use this.
and use this.


Re: [Lazarus] dynamic string proposal

2017-08-16 Thread Martin Frb via Lazarus

On 16/08/2017 13:37, Alexey via Lazarus wrote:

> On 16.08.2017 15:30, Martin Frb via Lazarus wrote:
>> A char can be composed of several combining code points (each of them
>> afaik, in the 32 bit range).
>> So a char can have 96 or more bits. (And not all of them have a
>> combined form).
>
> See my prev post: i see that each S[i] good to be like QWord
> (sizeof(one char)= sizeof(Qword)). It can be TextChar. And type can be
> TextString. internally it can be compressed to utf8. TextString is
> good if i want to parse text by "chars". If "char" needs more bytes-
> lets take more (internally it is same utf8)

Have a look at
https://www.reddit.com/r/Unicode/comments/4yie0a/tallest_longest_unicode_character/

There is ONE character that comprises more than 200 codepoints.
The only way to store such a char is in a type of dynamic size (aka string).

Well, I couldn't find an official doc on what defines the boundaries of a char.

But as far as I can see: if ä is one character, and it can be encoded as
"non-combining codepoint" + "combining codepoint", then a character is
any sequence of one "non-combining codepoint" + zero or more "combining
codepoints" (AFAIK Arabic scripts have chars that carry several
"combining codepoints", so this is happening in actual languages).

The example as far as I checked fulfils this definition.
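
The working definition above (one base codepoint followed by zero or more combining codepoints) can be sketched as follows. This is a deliberate simplification: IsCombining only knows the "Combining Diacritical Marks" block U+0300..U+036F, and surrogate pairs are not handled, so it illustrates the idea and is not a real segmentation routine.

```pascal
program clusters;

{$mode objfpc}{$H+}

// Simplification: only U+0300..U+036F is treated as combining.
function IsCombining(C: WideChar): Boolean;
begin
  Result := (Ord(C) >= $0300) and (Ord(C) <= $036F);
end;

// Counts base codepoints; each one starts a new "character" in the
// sense used above. Surrogate pairs are not handled.
function CountChars(const S: UnicodeString): SizeInt;
var
  i: SizeInt;
begin
  Result := 0;
  for i := 1 to Length(S) do
    if not IsCombining(S[i]) then
      Inc(Result);
end;

var
  U: UnicodeString;
begin
  SetLength(U, 3);
  U[1] := WideChar($0061);  // a
  U[2] := WideChar($0308);  // combining diaeresis, forms "ä" with U[1]
  U[3] := WideChar($0062);  // b

  WriteLn('code units: ', Length(U));      // 3
  WriteLn('characters: ', CountChars(U));  // 2
end.
```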



Re: [Lazarus] String vs WideString

2017-08-16 Thread Juha Manninen via Lazarus
On Wed, Aug 16, 2017 at 5:13 PM, Marcos Douglas B. Santos via Lazarus
 wrote:
> Thanks. I know about this page... unfortunately looks like it is not
> enough, since many others still complain.

What is missing? I can try to improve it.

> This thread is not only about WinAPI. I have this problem because I
> need to use a Windows 3rd Lib, which uses WideString.

Then just use WideString or UnicodeString where needed. It is not a problem.

Note,  WideString is for OLE programming. Most often you should use
UnicodeString. Their memory management differs.

Juha


Re: [Lazarus] String vs WideString

2017-08-16 Thread Marcos Douglas B. Santos via Lazarus
On Wed, Aug 16, 2017 at 6:12 AM, Juha Manninen via Lazarus
 wrote:
> On Mon, Aug 14, 2017 at 4:11 PM, Marcos Douglas B. Santos via Lazarus
>  wrote:
>> Unicode everywhere and you using AnsiString and doing everything...
>> Now I'm confused.
>
> Yes, please read:
>  http://wiki.freepascal.org/Unicode_Support_in_Lazarus
> I have advertised it so much that some people are already irritated,
> but maybe you missed it so far.

Thanks. I know about this page... unfortunately looks like it is not
enough, since many others still complain.

>> This is a ugly trick... but I understood what you mean.
>
> This was about the explicit temporary UnicodeString variable for
> WinAPI call parameters.
> No, it is not ugly, the code remains 100% compatible with Delphi.
> Please remember also that direct WinAPI call are not needed in
> cross-platform code.

This thread is not only about WinAPI. I have this problem because I
need to use a Windows 3rd Lib, which uses WideString.

Best regards,
Marcos Douglas


Re: [Lazarus] dynamic string proposal

2017-08-16 Thread Michael Schnell via Lazarus

On 16.08.2017 15:33, Juha Manninen via Lazarus wrote:
> Why don't you implement such a system. This is all FOSS, free and open
> source.

I would never dare to try to edit the compiler :-[

> You are writing about encodings etc. which are part of codepoints, but
> you call them "characters". Why?

Because the type for this stuff used in Delphi and FPC is called
"char".

> Is it possible you don't know Unicode beyond codepoints?

In fact I did not explicitly talk about Unicode at all. The paper says:
"In this article, a "String" is thought of as a reference counted
ordered array of a number of "Things" (aka elements). (I feel that this
is what the name String suggests.)" ... "If the elements of the strings
are printable characters or partial codes of UTF. OK, this is nice
(provided the conversion functions are in place) and makes doing
programs handling conventional problems very easy" ...

> Do you have plans to tackle also the complex issues of Unicode?

Not at all.

> If not, then your efforts are useless because codeunits and codepoints
> are easy in any case.

I know. The intention was to handle a completely different problem from
the one you suggest here.

> You use energy for a problem that does not
> exist.

I wrote the paper because I was once requested to do so in the fpc forum.

-Michael



Re: [Lazarus] dynamic string proposal

2017-08-16 Thread Juha Manninen via Lazarus
On Wed, Aug 16, 2017 at 2:47 PM, Michael Schnell via Lazarus
 wrote:
> -Michael (It's rather frustrating to discuss that obviously never will
> happen :-()

Why don't you implement such a system. This is all FOSS, free and open source.

You are writing about encodings etc. which are part of codepoints, but
you call them "characters". Why?
Is it possible you don't know Unicode beyond codepoints?
Do you have plans to tackle also the complex issues of Unicode?
If not, then your efforts are useless because codeunits and codepoints
are easy in any case. You use energy for a problem that does not
exist.

Juha


Re: [Lazarus] dynamic string proposal

2017-08-16 Thread Juha Manninen via Lazarus
On Wed, Aug 16, 2017 at 3:37 PM, Alexey via Lazarus
 wrote:
> See my prev post: i see that each S[i] good to be like QWord (sizeof(one
> char)= sizeof(Qword)). It can be TextChar. And type can be TextString.
> internally it can be compressed to utf8. TextString is good if i want to
> parse text by "chars". If "char" needs more bytes- lets take more
> (internally it is same utf8)

No Alexey, you are now explaining codepoints.
Codeunits and codepoints are the easy part in any case.
Could you please define character.

Juha


Re: [Lazarus] dynamic string proposal

2017-08-16 Thread Michael Schnell via Lazarus

On 16.08.2017 14:43, Mattias Gaertner via Lazarus wrote:

> For some unknown
> reason you want to store different encodings in a TStrings and fear
> the "time-consuming" and loss-prone auto conversions.

It's obvious that a user using a different encoding brand in a string
var than that suggested by TStrings (UTF-8 in fpc, UTF-16 in Delphi)
implicitly triggers auto-conversion when handling the string. This has
several consequences.


It might be a really good idea, e.g. when some code in a loop
needs a certain operation that is very fast with UTF-16 while
TStringList would store the data in a more compact way.


It might be time consuming when the conversion is done without being
necessary.


It might be error-prone when the user stores some random stuff in the
string that cannot survive the conversion back and forth.


In any case all this happens without the user being aware of it, which
might cause frustration.


-Michael


Re: [Lazarus] dynamic string proposal

2017-08-16 Thread Michael Schnell via Lazarus

On 16.08.2017 14:43, Mattias Gaertner via Lazarus wrote:
> Not if complicated things get more complicated.

Please leave out the additional encoding brands suggested just as an
afterthought in the paper. These are not the purpose at all but (if the
other stuff were in place) just come as a natural enhancement.


-Michael


Re: [Lazarus] GlobalMemoryStatus is Windows only, how to get installed RAM on Linux ?

2017-08-16 Thread Michael Van Canneyt via Lazarus



On Wed, 16 Aug 2017, Landmesser John via Lazarus wrote:


> googled in vain ...
>
> ... and "TsmBios" ( -> Win/Linux https://github.com/RRUZ/tsmbios ) won't 
> compile :-(
>
> So how to get Information about installed RAM on Linux for example?
>
> Ok, i could grep "hwinfo" or such in a terminal but thats not what i'm 
> looking for.


Your best option is most likely to read /proc/meminfo and parse the result.
It contains a wealth of information.
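
A rough sketch of that parsing step, written in Python for brevity (the same line-by-line split carries over to Pascal). The function names and the sample text are made up for illustration. Note that the MemTotal field is the RAM usable by the kernel, which is typically a little less than the physically installed amount, and that the "kB" suffix denotes units of 1024 bytes.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict mapping field name to a number.

    Fields with a "kB" suffix are converted to bytes; others are kept as-is.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if not parts:
            continue
        value = int(parts[0])
        if len(parts) > 1 and parts[1] == "kB":
            value *= 1024  # the kernel's "kB" means 1024 bytes
        info[name.strip()] = value
    return info


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live file (Linux only)."""
    with open(path, "r", encoding="ascii") as f:
        return parse_meminfo(f.read())


# Hypothetical sample input, in the same format as the real file.
SAMPLE = """MemTotal:       16318480 kB
MemFree:         1234567 kB
HugePages_Total:       0
"""

if __name__ == "__main__":
    total = parse_meminfo(SAMPLE)["MemTotal"]
    print("MemTotal in MiB:", total // (1024 * 1024))
```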

Michael.


[Lazarus] GlobalMemoryStatus is Windows only, how to get installed RAM on Linux ?

2017-08-16 Thread Landmesser John via Lazarus

googled in vain ...

... and "TsmBios" ( -> Win/Linux https://github.com/RRUZ/tsmbios ) won't 
compile :-(



So how do I get information about installed RAM on Linux, for example?

OK, I could grep "hwinfo" or such in a terminal, but that's not what I'm 
looking for.


Tips are welcome.



Re: [Lazarus] dynamic string proposal

2017-08-16 Thread Michael Schnell via Lazarus

On 16.08.2017 14:22, Alexey via Lazarus wrote:
> BTW, it will be good to have "Cstring" (or another name, not 
> "dynamicstring") : ...


You are missing the point the paper is supposed to be about: enhancing 
the versatility of the library functions such as those using TStrings. 
Not just creating another type of strings, which is nothing but a 
prerequisite for the main purpose.


-Michael


Re: [Lazarus] dynamic string proposal

2017-08-16 Thread Mattias Gaertner via Lazarus
On Wed, 16 Aug 2017 15:22:20 +0300
Alexey via Lazarus  wrote:

> On 16.08.2017 12:51, Mattias Gaertner via Lazarus wrote:
> > When you propose a new string type "dynamicstring" you have to define these 
> > operators.  
> 
> BTW, it will be good to have "Cstring" (or another name, not 
> "dynamicstring") :
> 
> - [] operator is 0-based like Python/C
> 
> - s[i] is DWORD per char (for all Unicode chars from 0 to MaxDWORD codes)
> 
> PChar(s)/PWChar(s) wont work for it? so it is not ok idea? But this type 
> can be compressed inside, eg in utf8. S[i] is DWORD outside. It is like 
> some class.

This sounds, as if you want an UTF-32 string type. 

Michael's proposal is a multi encoded string type, storing Ansi, UTF-8,
UTF-16 and UTF-32.
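
To make the UTF-32 comparison concrete, here is a small Python sketch (Python only for brevity; the helper name is hypothetical). It shows the trade-off behind the "DWORD per element" idea: UTF-32 gives constant-time indexing by code point at four bytes per element, while UTF-8 stores the same text more compactly but with a variable width.

```python
s = "a\u00e9\U0001F600"  # 'a', 'é', and an emoji outside the BMP

utf8 = s.encode("utf-8")       # 1 + 2 + 4 bytes
utf32 = s.encode("utf-32-le")  # always 4 bytes per code point, no BOM


def code_point_at(buf, i):
    """Return the i-th code point of a UTF-32-LE buffer (0-based)."""
    return int.from_bytes(buf[4 * i:4 * i + 4], "little")


if __name__ == "__main__":
    print(len(utf8), len(utf32))
    print(hex(code_point_at(utf32, 2)))
```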

Mattias


Re: [Lazarus] dynamic string proposal

2017-08-16 Thread Michael Schnell via Lazarus

On 16.08.2017 14:30, Martin Frb via Lazarus wrote:


> And that would still not be "char", but "codepoint"
>
> A char can be composed of several combining code points (each of them 
> afaik, in the 32 bit range).
> So a char can have 96 or more bits. (And not all of them have a 
> combined form).

Unfortunately in Delphi and FPC the appropriate work-alike existing type 
is called Char (with certain extensions). It would cause major problems 
to drop that name for something else, even if that would be appropriate.


-Michael


Re: [Lazarus] dynamic string proposal

2017-08-16 Thread Mattias Gaertner via Lazarus
On Wed, 16 Aug 2017 13:47:26 +0200
Michael Schnell via Lazarus  wrote:

> On 16.08.2017 13:17, Mattias Gaertner via Lazarus wrote:
> > You are confusing people if you name your encodings like this.   
> There also are no "official" code pages named "Default" or "None", the 
> naming "CP_DEFAULT" and "CP_NONE" has just been invented by Embarcadero.

It is not about "official". A codepage describes a character set. What
has your CP_QWORD to do with any character set?

 
>[...]
> > What is the intention of your proposal?  
> 
> That is given in the instructional paragraph "The problem":
> "The most obvious candidate for pain on that behalf is “TStrings”.

I read it, but I must admit, I don't understand it. For some unknown
reason you want to store different encodings in a TStrings and fear
the "time-consuming" and loss-prone auto conversions. And then it
sounds as if this is a common problem ("much more urgent").


>[...]
> Enhancing the count of available encoding brandings is just a logical 
> consequence of a less problem prone and more versatile (not implicitly 
> restricted to printable text) overall string handling.

Who wants to have more encodings?
AFAIK everyone wants less, preferably only one.

 
> -Michael (It's rather frustrating to discuss that obviously never will 
> happen :-()

Not if complicated things get more complicated.

Mattias


Re: [Lazarus] dynamic string proposal

2017-08-16 Thread Alexey via Lazarus

On 16.08.2017 15:30, Martin Frb via Lazarus wrote:


> A char can be composed of several combining code points (each of them 
> afaik, in the 32 bit range).
> So a char can have 96 or more bits. (And not all of them have a 
> combined form).


See my previous post: I think each S[i] should be like a QWord (sizeof(one 
char) = sizeof(QWord)). It can be TextChar, and the type can be TextString. 
Internally it can be compressed to UTF-8. TextString is good if I want to 
parse text by "chars". If a "char" needs more bytes, let's take more 
(internally it is the same UTF-8).


--
Regards,
Alexey



Re: [Lazarus] dynamic string proposal

2017-08-16 Thread Martin Frb via Lazarus

On 16/08/2017 10:51, Mattias Gaertner via Lazarus wrote:

>> Of course an appropriate "char" type for each string encoding brand
>> could to be provided, hence a "CP_QWord Char" as an alias or a QWord.
>
> There is no QWord codepage. That would be confusing.



And that would still not be "char", but "codepoint"

A char can be composed of several combining code points (each of them 
afaik, in the 32 bit range).
So a char can have 96 or more bits. (And not all of them have a combined 
form).
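
A short Python illustration of the point about combining code points (Python only because its standard unicodedata module makes this easy to show; the variable names are illustrative). One user-perceived character can be two code points, normalization to NFC merges them only when a precomposed form exists, and 'q' with an acute accent is an example that has none.

```python
import unicodedata

# 'e' followed by COMBINING ACUTE ACCENT: two code points, one visible character.
decomposed = "e\u0301"
precomposed = unicodedata.normalize("NFC", decomposed)  # becomes U+00E9

# 'q' followed by the same accent: Unicode has no precomposed form for it,
# so NFC leaves the sequence as two code points.
no_combined_form = unicodedata.normalize("NFC", "q\u0301")

if __name__ == "__main__":
    print(len(decomposed), len(precomposed), len(no_combined_form))
    # The two spellings of the accented e compare unequal until normalized.
    print(decomposed == precomposed)
```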



Re: [Lazarus] dynamic string proposal

2017-08-16 Thread Alexey via Lazarus

On 16.08.2017 12:51, Mattias Gaertner via Lazarus wrote:

> When you propose a new string type "dynamicstring" you have to define these 
> operators.


BTW, it will be good to have "Cstring" (or another name, not 
"dynamicstring") :


- [] operator is 0-based like Python/C

- s[i] is DWORD per char (for all Unicode chars from 0 to MaxDWORD codes)

PChar(s)/PWChar(s) won't work for it? So is it not a good idea? But this type 
can be compressed internally, e.g. in UTF-8. S[i] is a DWORD from the outside. 
It is like some class.


--
Regards,
Alexey



Re: [Lazarus] dynamic string proposal

2017-08-16 Thread Michael Schnell via Lazarus

On 16.08.2017 13:17, Mattias Gaertner via Lazarus wrote:
> You are confusing people if you name your encodings like this.

There also are no "official" code pages named "Default" or "None"; the 
naming "CP_DEFAULT" and "CP_NONE" has just been invented by Embarcadero.


So I did the same and just brainlessly extended the existing "CP..." 
naming scheme.

> Your "dynamicstring" supports char, widechar, byte, word, dword, qword.
> Why not shortint or smallint?
> Why not boolean, single and variant?

As pointed out, this is just a draft of a proposal, prone to enhancement 
and improvement.


> What is the intention of your proposal?


That is given in the instructional paragraph "The problem":
"The most obvious candidate for pain on that behalf is “TStrings”."

Only a fully dynamically encoded version of TStrings and friends would 
allow for a solution for many string encoding related problems, as the 
user can't modify the string encoding brand TStrings uses and hence will 
face the described problems when he uses TStrings with all but one of 
the String encoding brandings he can choose from.


Enhancing the count of available encoding brandings is just a logical 
consequence of a less problem prone and more versatile (not implicitly 
restricted to printable text) overall string handling.


-Michael (It's rather frustrating to discuss something that obviously 
never will happen :-()



Re: [Lazarus] dynamic string proposal

2017-08-16 Thread Mattias Gaertner via Lazarus
On Wed, 16 Aug 2017 12:24:55 +0200
Michael Schnell via Lazarus  wrote:

> On 16.08.2017 11:51, Mattias Gaertner via Lazarus wrote:
> > Every Delphi/FPC type has a bunch of operators. Strings support :=, =, 
> > <>, >=, <= and [] for read and write.
> > When you propose a new string type "dynamicstring" you have to define these 
> > operators.  
>[...]
> For "new" encoding brandings, such as CP_Byte, CP_Word, CP_DWord, 
> CP_QWord, the working of the operators is obvious.

There are no such codepages. You are confusing people if you name
your encodings like this.


> If somebody tries to 
> compare a printable Text string with a string of binary elements, maybe 
> the behavior is undefined.

Your "dynamicstring" supports char, widechar, byte, word, dword, qword.
Why not shortint or smallint? 
Why not boolean, single and variant?

What is the intention of your proposal?

 
Mattias


Re: [Lazarus] String vs WideString

2017-08-16 Thread Graeme Geldenhuys via Lazarus

On 2017-08-16 11:05, Juha Manninen via Lazarus wrote:

> Unfortunately many other programmers had the same wrong idea or they
> were just lazy. The result anyway is a lot of broken UTF-16 code out
> there.


Yeah, I see that even in commercial products and projects. It's very sad 
to see. Hence I always promote UTF-8, and you can't get it wrong as 
easily as UTF-16. No endianness to worry about, no surrogate pairs and 
UTF-8 is ready for streaming (network or disk) out of the box.
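
The endianness point can be sketched in a few lines of Python (used only for brevity; the variable names are illustrative). The same text has two different UTF-16 byte streams depending on byte order, and reading one with the other's byte order silently yields different text, whereas UTF-8 has a single byte sequence. UTF-8 is still variable-width, though: 'é' takes two bytes here.

```python
text = "A\u00e9"  # 'A' followed by 'é'

utf8 = text.encode("utf-8")          # one possible byte sequence
utf16_le = text.encode("utf-16-le")  # low byte first
utf16_be = text.encode("utf-16-be")  # high byte first

# Decoding little-endian data as big-endian succeeds but gives other text.
misread = utf16_le.decode("utf-16-be")

if __name__ == "__main__":
    print(utf8.hex(), utf16_le.hex(), utf16_be.hex())
    print(misread == text)
```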


Regards,
  Graeme

--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp


Re: [Lazarus] String vs WideString

2017-08-16 Thread Michael Schnell via Lazarus

On 16.08.2017 12:22, Juha Manninen via Lazarus wrote:
> You should stop writing in this thread now. I agree with Mattias.

I perfectly agree with you. But you can't blame me for answering when 
asked.


-Michael



Re: [Lazarus] dynamic string proposal

2017-08-16 Thread Michael Schnell via Lazarus

On 16.08.2017 11:51, Mattias Gaertner via Lazarus wrote:
> Every Delphi/FPC type has a bunch of operators. Strings support :=, =, 
> <>, >=, <= and [] for read and write.
>
> When you propose a new string type "dynamicstring" you have to define these 
> operators.

That is easily doable.
The definition of := is discussed in the paper. (Only for := there is 
no accessible encoding definition for the left operand.)
If the encoding branding is one of those that already exist, the current 
definition is used.
For "new" encoding brandings, such as CP_Byte, CP_Word, CP_DWord, 
CP_QWord, the working of the operators is obvious. If somebody tries to 
compare a printable text string with a string of binary elements, maybe 
the behavior is undefined.


> There is no QWord codepage. That would be confusing.


Of course the term "Codepage" Embarcadero chose for the encoding 
identification is misleading in this context. That is why in the said 
paper it's called "encoding style" (which is not a really appropriate 
wording, either, but hey, it's just an initial suggestion and not yet a 
final documentation, and it had been clear from the beginning that it's 
in vain, anyway. )


-Michael


Re: [Lazarus] String vs WideString

2017-08-16 Thread Juha Manninen via Lazarus
On Wed, Aug 16, 2017 at 12:12 PM, Michael Schnell via Lazarus
 wrote:
> UTF-8 and UTF-16 are just encoding variants for 32 bit Unicode "characters",
> storing them in n (or 2*n) Bytes according to a simple scheme.

No, they are encodings for codepoints, not "characters" (whatever that means).

Michael Schnell, your posts are completely off topic.
Unicode related topics clearly pull you like a magnet and then you
lose all control and start to proclaim your grand plan for a string
revamp.
It can continue for months as we remember from past years.
You should stop writing in this thread now. I agree with Mattias.

Juha


Re: [Lazarus] String vs WideString

2017-08-16 Thread Michael Schnell via Lazarus

On 16.08.2017 11:55, Mattias Gaertner via Lazarus wrote:
> 1,114,112 possible code points need at most 21 bits. Due to encoding 
> at most 32bit.

Sorry. Typo.
-Michael


[Lazarus] OpenGL 4.6 bindings and generator

2017-08-16 Thread Kostas Michalopoulos via Lazarus
Hi all,

After finding the OpenGL bindings that come with Lazarus a bit on the
ancient side of things (I think they only support up to 4.0? Also there is a
4.3 version loading function but it only seems to call 3.3's loader, ignoring
4.0, and loads only a single extension) and never really liking the global
function pointers approach (if nothing else it makes the autocompletion
in the IDE a bit annoying) I decided to make some brand new bindings.

I wrote a parser for Khronos' XML spec (gl.xml) that generates the
appropriate interface and implementation. You only have to call LoadGLProcs
after you have a context ready and it tries to load everything it knows of.
Instead of Load_some_extension you get a global Has_some_extension variable
(these are initialized via LoadGLProcs too). As a bonus you get a
HasExtension function as well as an AllExtensions array of strings.

Btw it is not a drop in replacement for GL/GLext/GLotherstuff, although you
can use it with the OpenGL control that comes with Lazarus and most likely
it is compatible as long as you don't use it from the same unit (since all
they do is call the driver stuff anyway) and you initialize both using the
same context (since different contexts might give different functions).

You can find the code as well as a pregenerated "OpenGL" unit at:

http://runtimeterror.com/rep/gl2unit/index

At the moment only Windows is supported, but soon i'll add Linux and
eventually Mac OS X support (it should be around ~10 lines of code for each
OS, hopefully). Also i have done minimal testing so there might be bugs :-).

Kostas


Re: [Lazarus] String vs WideString

2017-08-16 Thread Juha Manninen via Lazarus
On Mon, Aug 14, 2017 at 4:21 PM, Tony Whyman via Lazarus
 wrote:
> UTF-16/Unicode can only store 65,536 characters while the Unicode standard
> (that covers UTF8 as well) defines 136,755 characters.
> UTF-16/Unicode's main advantage seems to be for rapid indexing of large
> strings.

That shows complete ignorance from your side about Unicode.
You consider UTF-16 as a fixed-width encoding.  :(
Unfortunately many other programmers had the same wrong idea or they
were just lazy. The result anyway is a lot of broken UTF-16 code out
there.
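
A compact Python sketch of why UTF-16 is not fixed-width (Python for brevity; the function names are hypothetical). Code points above U+FFFF are stored as a pair of 16-bit code units, a high and a low surrogate, so code that assumes one 16-bit unit per character breaks on them.

```python
def to_surrogates(cp):
    """Split a code point above U+FFFF into its UTF-16 surrogate pair."""
    v = cp - 0x10000
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


def utf16_code_units(s):
    """Return the 16-bit code units of s as encoded in UTF-16."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


if __name__ == "__main__":
    # 'A' fits in one code unit; U+1F600 needs two.
    print([hex(u) for u in utf16_code_units("A")])
    print([hex(u) for u in utf16_code_units("\U0001F600")])
```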


On Tue, Aug 15, 2017 at 12:15 PM, Tony Whyman via Lazarus
 wrote:
> If a topic keeps on being discussed after 10+ years of argument, the reason
> is usually either (a) the problem and its solution have not been documented
> properly, or (b) the outcome is an unsatisfactory compromise.

Or (c) The people discussing are ignorant about the topic.

> I went back and read the wiki article you mentioned and was no more the
> wiser as to why the current mess exists. Is it really no more than because
> Delphi continues to screw up in this area, so must FPC? The body of the
> article appears to be a set of notes - not necessarily wrong in themselves
> but lacking the background and context needed to explain why it is like it is.

Hmmm...
Originally the page was a mess because it had lots of irrelevant
background info about the old obsolete LCL Unicode support. Text was
added by many people but none was removed.
Finally I cleaned the page. It now has most relevant info at the top
and then special cases and technical details later.
I am rather happy with the page now, it explains how to use Unicode
with Lazarus as clearly as possible.
However I am willing to improve it. What kind of background and
context would you need?

> 1. Stop using the term "Unicode".

You can stop using it. No problem.
For others however it is a well defined international standard. See:
  https://en.wikipedia.org/wiki/Unicode

> 2. Clean up the char type.
> ...
> Why shouldn't there be a single char type that intuitively represents
> a single character regardless of how many bytes are used to represent it.

What do you mean by "a single character"?
A "character" in Unicode can mean about 7 different things. Which one
is your pick?
This question is for everybody in this thread who used the word "character".

> Yes, in a world where we have to live with UTF8, UTF16, UTF32, legacy code
> pages and Chinese variations on UTF8, that means that dynamic attributes
> have to be included in the type. But isn't that the only way to have
> consistent and intuitive character handling?

What do you mean? The Chinese don't have a variation of UTF8.
UTF8 is a global, unambiguous encoding standard, part of Unicode.

The fundamental problem is that you want to hide the complexity of
Unicode by some magic String type of a compiler.
It is not possible. Unicode remains complex but the complexity is NOT
in encodings!
No, a codepoint's encoding is the easy part. For example I was easily
able to create a unit to support encoding agnostic code. See unit
LazUnicode in package LazUtils.
The complexity is elsewhere:
- "Character" composed of codepoints in precomposed and decomposed
(normalized) forms.
- Compare and sort text based on locale.
- Uppercase / Lowercase rules based on locale.
- Glyphs
- Graphemes
- etc.
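
Two of the items in the list above can be hinted at with a few lines of Python (Python for brevity; this shows only default, locale-independent behaviour and is not a locale-aware implementation). Sorting by raw code point puts "Äpfel" after "zebra", which is not what a German reader expects, and uppercasing can change a string's length. Locale-specific rules go further still: Turkish, for example, uppercases "i" to a dotted capital, which the default mapping does not do.

```python
words = ["zebra", "\u00c4pfel", "apple"]  # "Äpfel"

# Plain comparison orders by code point: 'Ä' (U+00C4) sorts after 'z' (U+007A).
by_code_point = sorted(words)

# Default case mapping: German sharp s uppercases to two letters.
upper = "stra\u00dfe".upper()  # "straße"

if __name__ == "__main__":
    print(by_code_point)
    print(upper, len("stra\u00dfe"), len(upper))
```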

I must admit I don't understand those complex parts well.
I do understand codeunits and codepoints, and I understand they are
the easy part.

Juha


Re: [Lazarus] String vs WideString

2017-08-16 Thread Mattias Gaertner via Lazarus
On Wed, 16 Aug 2017 11:33:04 +0200
Michael Schnell via Lazarus  wrote:

>[...]
> But in fact "Unicode" is just a universal standard defining 64 bit 
> entities. 

No.
1,114,112 possible code points need at most 21 bits. Due to encoding at
most 32bit.
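
The arithmetic behind those numbers, as a small Python check (Python for brevity; the constant names are illustrative): the highest code point is U+10FFFF, which gives 17 planes of 65,536 code points each and fits in 21 bits, while UTF-32 stores each code point in a full 32 bits.

```python
MAX_CODE_POINT = 0x10FFFF

code_point_count = MAX_CODE_POINT + 1         # 1,114,112
planes = code_point_count // 0x10000          # 17 planes of 65,536 each
bits_needed = MAX_CODE_POINT.bit_length()     # 21

# UTF-32 spends 4 bytes (32 bits) on every code point, including the largest.
utf32_bytes = len(chr(MAX_CODE_POINT).encode("utf-32-le"))

if __name__ == "__main__":
    print(code_point_count, planes, bits_needed, utf32_bytes)
```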

Mattias


Re: [Lazarus] String vs WideString

2017-08-16 Thread Michael Schnell via Lazarus

On 16.08.2017 11:32, Mattias Gaertner via Lazarus wrote:

> Anyone who wants to discuss the grand picture of strings in FPC for the 
> millionth time should start a new topic.

Right you are. And it will be by far too late and futile, anyway, 
because of the reasons discussed a million times.


-Michael


Re: [Lazarus] String vs WideString

2017-08-16 Thread Michael Schnell via Lazarus

On 16.08.2017 11:08, Graeme Geldenhuys via Lazarus wrote:
> Are you suggesting that internally TStrings should have different 
> storage for all possible languages,

Not at all. In the said paper I point out that a new fully dynamic 
string encoding brand would be introduced and the same would be used for 
TStrings. Nothing else will provide an improvement for the class of 
problems that has been under discussion for years.


-Michael (knowing that this will never happen)


Re: [Lazarus] String vs WideString

2017-08-16 Thread Michael Schnell via Lazarus

On 16.08.2017 11:08, Graeme Geldenhuys via Lazarus wrote:
> So it makes sense that TStrings should use UnicodeString internally to 
> store its data. The Unicode standard is also the only standard that 
> can support any language.

But in fact "Unicode" is just a universal standard defining 64 bit 
entities. The encoding of those varies: UTF-8, UTF-16 high byte first, 
UTF-16 low byte first, 64 bit low byte first, 64 bit high byte first, 
fpc and Delphi do support several of those as a string encoding 
(and with that creating any number of problems).


-Michael


Re: [Lazarus] String vs WideString

2017-08-16 Thread Mattias Gaertner via Lazarus
On Wed, 16 Aug 2017 11:09:17 +0200
Michael Schnell via Lazarus  wrote:

> On 16.08.2017 10:58, Mattias Gaertner via Lazarus wrote:
> > This thread is going out of topic.
> > Please start a new thread if you want to discuss Delphi strings.  
> You can't discuss fpc's string problems without mentioning Delphi, as 
> they are a direct consequence as well of Delphi-compatibility as of 
> Delphi-incompatibility.

The original post was about a string conversion warning.

Anyone who wants to discuss the grand picture of strings in FPC for
the millionth time should start a new topic.


Mattias
-- 
___
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus


Re: [Lazarus] String vs WideString

2017-08-16 Thread Michael Schnell via Lazarus

On 16.08.2017 10:58, Mattias Gaertner via Lazarus wrote:

> This thread is going out of topic.
> Please start a new thread if you want to discuss Delphi strings.

You can't discuss fpc's string problems without mentioning Delphi, as 
they are a direct consequence of Delphi-compatibility as well as of 
Delphi-incompatibility.


-Michael



Re: [Lazarus] String vs WideString

2017-08-16 Thread Graeme Geldenhuys via Lazarus

On 2017-08-16 09:43, Michael Schnell via Lazarus wrote:

> IMHO, any implementation of TStrings that forces a conversion (just
> because the class uses TStrings and not due to a logical demand), is a
> contradiction to providing code aware strings at all.


But in FPC 3.x (using modern compiler modes - not TP or Mac) String = 
UnicodeString. So it makes sense that TStrings should use UnicodeString 
internally to store its data. The Unicode standard is also the only 
standard that can support any language. So all Windows code-pages can be 
supported with the single UnicodeString type.


Are you suggesting that internally TStrings should have different 
storage for all possible languages, or some RawByteString type? So if 
you load some non-Latin code-page text, internally it still stores that 
text as those non-Latin bytes? That would just over-complicate the 
TStrings class. FPC is moving towards UnicodeString being used 
internally for everything in the RTL, so why must TStrings be any different?


Regards,
  Graeme

--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp


Re: [Lazarus] String vs WideString

2017-08-16 Thread Michael Van Canneyt via Lazarus



On Wed, 16 Aug 2017, Michael Schnell via Lazarus wrote:


> On 15.08.2017 22:45, Graeme Geldenhuys via Lazarus wrote:
>
>>  How is that not "abuse"???
>
> IMHO it's a major shortcoming to define "string" as "printable text".


On the contrary. That is exactly what it means. 
Anything else is just a collection of bytes.


Michael.


Re: [Lazarus] String vs WideString

2017-08-16 Thread Michael Schnell via Lazarus

On 15.08.2017 22:45, Graeme Geldenhuys via Lazarus wrote:

>  How is that not "abuse"???

IMHO it's a major shortcoming to define "string" as "printable text". In 
fact the name "String" does not suggest this at all. A "string" in my 
understanding is just a sequence of similar "things".


> A string type was definitely not the right choice.

Notwithstanding the discussion about the mere wording, this would only 
hold if the system provided a differently named non "printable 
text" basic type that comes with the features needed for such usage: 
reference counting, lazy copy, simple operators for concatenating and 
element extraction and replacement, built-in functions for substring 
locating, ...


-Michael


Re: [Lazarus] String vs WideString

2017-08-16 Thread Juha Manninen via Lazarus
On Wed, Aug 16, 2017 at 8:53 AM, Bo Berglund via Lazarus
 wrote:
> Based on this experience I wanted to alert the OP of the fact that
> using AnsiString instead of string is not a cure-all for binary data,
> you need to fix the codepage too, which is what the RawByteString does
> for you

Bo, everybody has known for decades that AnsiString is not for binary data.
Why do you proclaim it as a new discovery?
The OP's problem was completely different. It was about text encoding.
TBytes is clearly the right choice for your binary data, but this
discussion is not about binary data!

What does "AnsiString instead of string" mean?
String is typically an alias for AnsiString.

Your sentence about RawByteString is also wrong.
There is no automatic codepage conversion for RawByteString.

Juha


Re: [Lazarus] String vs WideString

2017-08-16 Thread Mattias Gaertner via Lazarus
On Wed, 16 Aug 2017 10:47:37 +0200
Michael Schnell via Lazarus  wrote:

> On 15.08.2017 19:29, Luca Olivetti via Lazarus wrote:
> > I has worked extremely well and reliably until fpc 2.6.4 (i.e. with 
> > string=ansistring).
> > Does it not work in 3.x?  
> I understand that storing uncoded Bytes in UTF8-Strings (hence in fpc) 
> works as good as it always had, as long as all strings are defined with 
> the same code branding as TStrings (and friends) is (i.e. UTF8), because 
> there never will be a conversion.
> 
> But it does not work in Delphi, as here TStrings is defined to be UTF-16.

This thread is going out of topic.
Please start a new thread if you want to discuss Delphi strings.

Mattias


Re: [Lazarus] String vs WideString

2017-08-16 Thread Michael Schnell via Lazarus

On 15.08.2017 21:38, Ondrej Pokorny via Lazarus wrote:


> Furthermore, if you use(d) strings for binary data, just replace old 
> string for AnsiString/RawByteString (and Char for AnsiChar, PChar for 
> PAnsiChar) and you are good to go. Annoying but no big deal.

This only works if all tools that you use do the same. And a major tool 
for handling strings is TStrings and its siblings. You can hardly avoid 
using them.


-Michael



Re: [Lazarus] String vs WideString

2017-08-16 Thread Michael Schnell via Lazarus

On 15.08.2017 19:18, Graeme Geldenhuys via Lazarus wrote:


> Why can't that be changed to a UnicodeString or UTF8String


IMHO, any implementation of TStrings that forces a conversion (just 
because the class uses TStrings and not due to a logical demand), is a 
contradiction to providing code aware strings at all.


-Michael


Re: [Lazarus] The new kid is growing up fast

2017-08-16 Thread Michael Schnell via Lazarus

On 15.08.2017 21:40, Ondrej Pokorny via Lazarus wrote:

> Too bad that Eugene didn't decide to improve Lazarus Cocoa bindings :)

Does he use fpc as a compiler?

-Michael