Re: [Lazarus] Converting all code to use UnicodeString

2017-09-28 Thread el es via Lazarus
On 27/09/17 09:16, Graeme Geldenhuys via Lazarus wrote:
> On 2017-09-27 03:51, Marcos Douglas B. Santos via Lazarus wrote:
>> A constant that can change...
> 
> 
> Yeah, that concept still blows my mind. [figuratively speaking] They
> should shoot the developer that came up with that idea - and the team
> leader that approved it.
> 
> Regards, Graeme
> 

comp.compilers.free-pascal.social is leaking ;)

It dates back to when, Turbo Pascal?
Late 1980s / early 1990s?

Imagine this:

Developer (thinking): "The rave was great last weekend, still feeling the pain
on Thursday"
Developer: "We have this almost ready and this looks like a great idea"
Supervisor (thinking): "Ah the world's going to end next week anyway, who cares"
Supervisor: "OK, make it so"

;) 


-L.

-- 
___
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus


Re: [Lazarus] Converting all code to use UnicodeString

2017-09-27 Thread Marcos Douglas B. Santos via Lazarus
On Wed, Sep 27, 2017 at 7:05 AM, Juha Manninen via Lazarus
 wrote:
> On Tue, Sep 26, 2017 at 10:52 PM, Marcos Douglas B. Santos via Lazarus
> [...]
> About the string constant concatenation, just use variables when it is proper:
> const
>   V1: string = 'a';
> var
>   S1: String;
> ... later in code ...
>   S1 := V1 + 'b';
>
> String literals can be assigned without problems as long as your
> variables are "String".
> The big table in the wiki page is intimidating, in reality the issue
> is not so complex.

I'm already doing that.
This is not perfect, but it is better than having problems. Thanks.

Best regards,
Marcos Douglas


Re: [Lazarus] Converting all code to use UnicodeString

2017-09-27 Thread Marcos Douglas B. Santos via Lazarus
On Wed, Sep 27, 2017 at 5:16 AM, Graeme Geldenhuys via Lazarus
 wrote:
> On 2017-09-27 03:51, Marcos Douglas B. Santos via Lazarus wrote:
>>
>> A constant that
>> can change...
>
>
>
> Yeah, that concept still blows my mind. [figuratively speaking] They should
> shoot the developer that came up with that idea - and the team leader that
> approved it.

Everybody has crazy ideas... the problem is who signs off on them, saying
"yeah, go ahead!" :)

Regards,
Marcos Douglas


Re: [Lazarus] Converting all code to use UnicodeString

2017-09-27 Thread Juha Manninen via Lazarus
On Tue, Sep 26, 2017 at 10:52 PM, Marcos Douglas B. Santos via Lazarus
 wrote:
> So we can say that Lazarus code do not use XPath to work with XML, right?

No, I cannot say much about the issue. I didn't try it myself.
I understood Mattias and Michael V.C. have plans to migrate the XML
units to FCL sources. Maybe they can elaborate.

> I don't use it. (Windows codepages)

Ok, then I misunderstood.  :)

About the string constant concatenation: just use variables where appropriate:
const
  V1: string = 'a';
var
  S1: String;
... later in code ...
  S1 := V1 + 'b';

String literals can be assigned without problems as long as your
variables are "String".
The big table in the wiki page is intimidating, in reality the issue
is not so complex.
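
A minimal, self-contained sketch of the suggestion above, assuming plain FPC in objfpc mode with long strings enabled. The function name Combined is only illustrative; the point is that the concatenation happens at run time rather than inside a const declaration:

```pascal
program ConstConcat;
{$mode objfpc}{$H+}

const
  V1: string = 'a';  // typed constant: cannot be used in another const expression

// Illustrative helper: build the combined value at run time instead.
function Combined: string;
begin
  Result := V1 + 'b';
end;

var
  S1: string;
begin
  S1 := Combined;
  WriteLn(S1);
end.
```

The typed constant keeps its explicit "string" type, and only the derived value becomes a variable or a function result.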


On Tue, Sep 26, 2017 at 7:29 PM, zeljko  wrote:
> POS receipt printers :)

Ok maybe. I don't have one, difficult to say.

Juha


Re: [Lazarus] Converting all code to use UnicodeString

2017-09-27 Thread Graeme Geldenhuys via Lazarus

On 2017-09-27 03:51, Marcos Douglas B. Santos via Lazarus wrote:

> A constant that can change...



Yeah, that concept still blows my mind. [figuratively speaking] They 
should shoot the developer that came up with that idea - and the team 
leader that approved it.


Regards,
  Graeme



Re: [Lazarus] Converting all code to use UnicodeString

2017-09-26 Thread Marcos Douglas B. Santos via Lazarus
On Tue, Sep 26, 2017 at 5:06 PM, Howard Page-Clark via Lazarus
 wrote:
> On 26/09/17 20:51, Marcos Douglas B. Santos via Lazarus wrote:
>>
>> I understood that I can use it like this:
>> const
>>   VALUE: string = 'áéíóú';
>>
>> Not like this:
>> const
>>   VALUE = 'áéíóú';
>>
>> Right?
>> But this does not compile:
>> const
>>   V1: string = 'a';
>>   V2: string = V1 + 'b';
>
> You can't do that in a const declaration.
> But in an implementation, the following does compile:
>
> {$J+} {$H+}
> const
>   V1: string = 'a';
>   V2: string = 'b';
>   V3: String = '';
>
> begin
>   V3:=V1 + V2;
>   WriteLn(V3);
> end.

I know this trick, which was deprecated a long time ago. A constant that
can change...
I think it may be better not to use constants in the code anymore.

But thanks, anyway.


Marcos Douglas


Re: [Lazarus] Converting all code to use UnicodeString

2017-09-26 Thread Howard Page-Clark via Lazarus

On 26/09/17 20:51, Marcos Douglas B. Santos via Lazarus wrote:

> I understood that I can use it like this:
> const
>   VALUE: string = 'áéíóú';
>
> Not like this:
> const
>   VALUE = 'áéíóú';
>
> Right?
> But this does not compile:
> const
>   V1: string = 'a';
>   V2: string = V1 + 'b';

You can't do that in a const declaration.
But in an implementation, the following does compile:

{$J+} {$H+}
const
  V1: string = 'a';
  V2: string = 'b';
  V3: String = '';

begin
  V3:=V1 + V2;
  WriteLn(V3);
end.
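
To illustrate what the {$J+} switch in the program above permits, here is a small sketch (the names NextId and Counter are made up for the example). A typed constant declared under {$J+} behaves like an initialised static variable, which is the "constant that can change" discussed in this thread:

```pascal
program WritableConst;
{$mode objfpc}{$H+}
{$J+}  // typed constants are writable

// Illustrative counter: the typed constant is initialised once and
// keeps its value between calls.
function NextId: Integer;
const
  Counter: Integer = 0;
begin
  Inc(Counter);
  Result := Counter;
end;

begin
  WriteLn(NextId);
  WriteLn(NextId);
end.
```

With {$J-} in effect at the declaration, the Inc(Counter) line should be rejected by the compiler instead.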


Re: [Lazarus] Converting all code to use UnicodeString

2017-09-26 Thread Marcos Douglas B. Santos via Lazarus
On Tue, Sep 26, 2017 at 9:09 AM, Juha Manninen via Lazarus
 wrote:
> On Tue, Sep 26, 2017 at 12:11 AM, Marcos Douglas B. Santos via Lazarus
>  wrote:
>> Yeah, but DOM uses DOMString, which is WideString.
>> Lazarus uses UTF8 and have a laz2_DOM that uses "string", which is
>> UTF8, but I cannot use this unit with XPath unit, which needs a
>> TXMLDocument that works with WideString... see my point?
>
> That is a problem. I guess you can use the units with Lazarus but it
> results to many conversions between encodings.
> It should be solved somehow.

So we can say that Lazarus code does not use XPath to work with XML, right?


Marcos Douglas


Re: [Lazarus] Converting all code to use UnicodeString

2017-09-26 Thread Marcos Douglas B. Santos via Lazarus
On Tue, Sep 26, 2017 at 6:31 AM, Juha Manninen via Lazarus
 wrote:
> On Tue, Sep 26, 2017 at 4:37 AM, Marcos Douglas B. Santos via Lazarus
>  wrote:
>> But according with this table, I shouldn't do that because so many
>> problems could happen.
>> http://wiki.freepascal.org/Unicode_Support_in_Lazarus#Without_.7B.24codepage_utf8.7D_or_compilerswitch_-FcUTF8
>
> No. It works when assigning to String and that is what matters.

I understood that I can use it like this:
const
  VALUE: string = 'áéíóú';

Not like this:
const
  VALUE = 'áéíóú';

Right?
But this does not compile:
const
  V1: string = 'a';
  V2: string = V1 + 'b';

>>> The solution is to NOT use Windows codepages.
>>> ...
>> So, no problems here and the page is outdated. OK.
>
> The page is correct but your code and/or data is outdated if it uses
> the Windows codepage encoding. :)
> Well, honestly, why do you still use it?
> Unicode has been around for decades. It solved all the horrible
> problems inherent to locale dependent codepages. Windows has supported
> full Unicode for ~18 years.
> Maybe there still is a valid reason to use codepages but I don't know
> what it is.

I don't use it.

Regards,
Marcos Douglas


Re: [Lazarus] Converting all code to use UnicodeString

2017-09-26 Thread zeljko via Lazarus

On 26.09.2017 11:31, Juha Manninen via Lazarus wrote:


> Maybe there still is a valid reason to use codepages but I don't know
> what it is.


POS receipt printers :)

zeljko


Re: [Lazarus] Converting all code to use UnicodeString

2017-09-26 Thread Juha Manninen via Lazarus
On Tue, Sep 26, 2017 at 12:11 AM, Marcos Douglas B. Santos via Lazarus
 wrote:
> Yeah, but DOM uses DOMString, which is WideString.
> Lazarus uses UTF8 and have a laz2_DOM that uses "string", which is
> UTF8, but I cannot use this unit with XPath unit, which needs a
> TXMLDocument that works with WideString... see my point?

That is a problem. I guess you can use the units with Lazarus but it
results in many conversions between encodings.
It should be solved somehow.

Juha


Re: [Lazarus] Converting all code to use UnicodeString

2017-09-26 Thread Juha Manninen via Lazarus
On Tue, Sep 26, 2017 at 4:37 AM, Marcos Douglas B. Santos via Lazarus
 wrote:
> But according with this table, I shouldn't do that because so many
> problems could happen.
> http://wiki.freepascal.org/Unicode_Support_in_Lazarus#Without_.7B.24codepage_utf8.7D_or_compilerswitch_-FcUTF8

No. It works when assigning to String and that is what matters.


>> The solution is to NOT use Windows codepages.
>> ...
> So, no problems here and the page is outdated. OK.

The page is correct but your code and/or data is outdated if it uses
the Windows codepage encoding. :)
Well, honestly, why do you still use it?
Unicode has been around for decades. It solved all the horrible
problems inherent to locale dependent codepages. Windows has supported
full Unicode for ~18 years.
Maybe there still is a valid reason to use codepages but I don't know
what it is.


> Like I said, it's a hack. But, again, it was|is a great job. No doubt.

Yes. The wiki page lists 3 simple rules:
* Normally use type "String" instead of UTF8String or UnicodeString.
* Assign a constant always to a type String variable.
* Use type UnicodeString explicitly for API calls that need it.

For you I would add:
* Use Unicode instead of Windows system codepages.

With those rules the code is mostly compatible with Delphi. Not bad.

Juha


Re: [Lazarus] Converting all code to use UnicodeString

2017-09-26 Thread Kostas Michalopoulos via Lazarus
I do not see how it is a hack. When you have a function taking a
null-terminated string of a specific character type (in this case PWideChar)
and you only have the generic string type, you don't know what format the
underlying memory of the string is in, so you cannot pass it as a pointer to
the function. In this case you need to convert to the explicit type.

These are (potentially) two different types; they just happen to be
strings. You could think of it as if you had a function Foo(VP: PSingle)
and a variable V: Number, where Number could be either Single or Double
depending on some macro. You don't know which one, so to avoid passing
a Double you'd need to assign it to a temporary variable to convert it to
the right type.

There is nothing wrong or hacky with that approach. This is how working
with functions that accept pointers works in general: you need to make sure
that the pointer you pass in is of the correct type.
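
The Single/Double analogy can be written out as a short sketch. Foo and Number come from the paragraph above; the body of Foo, the helper CallFoo and the choice Number = Double are assumptions made only so the example compiles:

```pascal
program PointerTypeDemo;
{$mode objfpc}{$H+}

type
  Number = Double;  // assumption: could equally be Single, depending on a define

// Hypothetical API that reads a Single through a pointer.
function Foo(VP: PSingle): Single;
begin
  Result := VP^ * 2;
end;

// Illustrative wrapper: convert through a temporary of the exact type.
function CallFoo(V: Number): Single;
var
  Tmp: Single;
begin
  Tmp := V;             // value conversion to Single happens here
  Result := Foo(@Tmp);  // the pointer now really points at a Single
end;

begin
  WriteLn(CallFoo(1.5):0:2);
end.
```

Passing @V directly would hand Foo eight bytes of Double where it expects four bytes of Single; the temporary plays the same role as the UnicodeString temporary in the PWideChar case.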


On Tue, Sep 26, 2017 at 4:37 AM, Marcos Douglas B. Santos via Lazarus <
lazarus@lists.lazarus-ide.org> wrote:

> On Mon, Sep 25, 2017 at 9:52 PM, Juha Manninen via Lazarus
>  wrote:
> > On Tue, Sep 26, 2017 at 3:14 AM, Marcos Douglas B. Santos via Lazarus
> >  wrote:
> >> So, you mean that I cannot declare a constant without specify the
> >> type. The language allow me but it won't work?
> >
> > Yes you can declare a string constant without specifying the type.
>
> But according with this table, I shouldn't do that because so many
> problems could happen.
> http://wiki.freepascal.org/Unicode_Support_in_Lazarus#
> Without_.7B.24codepage_utf8.7D_or_compilerswitch_-FcUTF8
>
> >> 3.1. "When a parameter type is a pointer PWideChar,
> >> you need a temporary UnicodeString variable.
> >> ...
> >> That is a ugly hack. This code doesn't make any sense, if you don't
> >> know about these Unicode issues.
> >> We need do remember that trick when we are coding... not good.
> >
> > It is not so ugly. It is actually an elegant solution. Just one
> > assignment, using the FPC's automatic conversion in a clever way. No
> > explicit conversion functions or anything.
> > The "ugly" pointer typecast is needed always, also in Delphi.
>
> The "ugly" is because we need to remember to do that instead of just
> assign the variable.
> IMHO, both design are wrong. But I understand that the problem is in
> the compiler — or RTL.
>
> >> 4. "Reading / writing text file with Windows codepage"
> >> ...
> >> The text said: "This is not compatible with Delphi ".
> >> Examples on that page are hacks.
> >
> > The solution is to NOT use Windows codepages. They can be seen as a
> > historical remain with severe inherent problems which are solved by
> > Unicode already a long ago.
> > Windows has supported full Unicode since year 2000, and supported
> > UCS-2 before that.
> > Why would anybody still use the historical Windows codepages?
>
> So, no problems here and the page is outdated. OK.
>
> >> Summary:
> >> I know that was a huge work for who made that. Lazarus is more
> >> Unicode, more compatible with Delphi, and the team could move on.
> >> Great.
> >> But you might agree with me that this is far from a good design, right?
> >
> > IMO it is not far from a good design. From FPC's point of view it is a
> > hack but you can write 100% Delphi compatible code by following just
> > few simple rules (and dumping the historical Windows codepages).
>
> Like I said, it's a hack. But, again, it was|is a great job. No doubt.
>
> Best regards,
> Marcos Douglas


Re: [Lazarus] Converting all code to use UnicodeString

2017-09-25 Thread Juha Manninen via Lazarus
On Tue, Sep 26, 2017 at 3:14 AM, Marcos Douglas B. Santos via Lazarus
 wrote:
> So, you mean that I cannot declare a constant without specify the
> type. The language allow me but it won't work?

Yes you can declare a string constant without specifying the type.


> 3.1. "When a parameter type is a pointer PWideChar,
> you need a temporary UnicodeString variable.
> ...
> That is a ugly hack. This code doesn't make any sense, if you don't
> know about these Unicode issues.
> We need do remember that trick when we are coding... not good.

It is not so ugly. It is actually an elegant solution. Just one
assignment, using the FPC's automatic conversion in a clever way. No
explicit conversion functions or anything.
The "ugly" pointer typecast is needed always, also in Delphi.


> 4. "Reading / writing text file with Windows codepage"
> ...
> The text said: "This is not compatible with Delphi ".
> Examples on that page are hacks.

The solution is to NOT use Windows codepages. They can be seen as a
historical remnant with severe inherent problems that Unicode solved
long ago.
Windows has supported full Unicode since year 2000, and supported
UCS-2 before that.
Why would anybody still use the historical Windows codepages?


> Summary:
> I know that was a huge work for who made that. Lazarus is more
> Unicode, more compatible with Delphi, and the team could move on.
> Great.
> But you might agree with me that this is far from a good design, right?

IMO it is not far from a good design. From FPC's point of view it is a
hack but you can write 100% Delphi compatible code by following just a
few simple rules (and dumping the historical Windows codepages).

Juha


Re: [Lazarus] Converting all code to use UnicodeString

2017-09-25 Thread Marcos Douglas B. Santos via Lazarus
On Mon, Sep 25, 2017 at 7:52 PM, Juha Manninen via Lazarus
 wrote:
> Marcos Douglas, this wiki page answers all your questions about using
> Unicode with Lazarus:
>  http://wiki.freepascal.org/Unicode_Support_in_Lazarus

OK, let's talk:

1. "Using UTF-8 in non-LCL programs"
"In a non-LCL project add a dependency for LazUtils package. Then add
LazUTF8 unit in the uses section of main program file. It must be near
the beginning, just after the critical memory managers and threading
stuff (e.g. cmem, heaptrc, cthreads)."

Indeed, that was very good. Thanks.
That solved one of my questions. I tested it and it worked perfectly.
I would say that it should be part of the compiler, not of a Lazarus
package, because this is a basic thing that should work without another
"3rd party lib".


2. "Assign a constant always to a type String variable."

So, you mean that I cannot declare a constant without specifying the
type? The language allows it, but it won't work?


3. "Calling API functions that use WideString or UnicodeString"
"When a parameter type is WideString or UnicodeString, you can just
pass a String to it. The compiler converts data automatically. There
will be a warning about converting from AnsiString to UnicodeString
which can be either ignored or suppressed by typecasting the String to
UnicodeString."

Then the example:
=== code begin ===
procedure ApiCall(aParam: UnicodeString);  // Definition.
 ...
ApiCall(S);// Call with String S, ignore warning.
ApiCall(UnicodeString(S)); // Call with String S, suppress warning.

=== code end ===

All these warnings are so annoying. I understand the point here, but I
don't like to see any hint or warning. I need to resolve them all.
But I am in doubt about which is more annoying: typecasting all
arguments or ignoring all the warnings.


3.1. "When a parameter type is a pointer PWideChar, you need a
temporary UnicodeString variable. Assign your String to it. The
compiler then converts its data. Then typecast the temporary variable
to PWideChar."
=== code begin ===
procedure ApiCallP(aParamP: PWideChar);  // Definition.
 ...
var Tmp: UnicodeString;   // Temporary variable.
 ...
Tmp := S; // Assign String -> UnicodeString.
ApiCallP(PWideChar(Tmp)); // Call with temp variable, typecast to pointer.
=== code end ===

That is an ugly hack. This code doesn't make any sense if you don't
know about these Unicode issues.
We need to remember that trick when we are coding... not good.
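
For reference, the wiki fragment quoted above can be turned into a complete program. This is only a sketch: the body of ApiCallP is a made-up stand-in that counts UTF-16 code units, and CallWithString is an illustrative wrapper name:

```pascal
program PWideCharDemo;
{$mode objfpc}{$H+}

// Hypothetical stand-in for an API taking a null-terminated UTF-16 string;
// it counts UTF-16 code units up to the terminator.
function ApiCallP(aParamP: PWideChar): Integer;
begin
  Result := 0;
  if aParamP = nil then
    Exit;
  while aParamP[Result] <> #0 do
    Inc(Result);
end;

// Illustrative wrapper that hides the temporary variable from callers.
function CallWithString(const S: string): Integer;
var
  Tmp: UnicodeString;                 // temporary variable
begin
  Tmp := S;                           // compiler converts String -> UnicodeString
  Result := ApiCallP(PWideChar(Tmp)); // typecast the temporary to a pointer
end;

begin
  WriteLn(CallWithString('abc'));
end.
```

Wrapping the call once, as in CallWithString, keeps the temporary in a single place so the rest of the code does not have to remember it.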


4. "Reading / writing text file with Windows codepage"
"This is not compatible with Delphi nor with former Lazarus code. In
practice you must encapsulate the code dealing with system codepage
and convert the data to UTF-8 as quickly as possible."

The text said: "This is not compatible with Delphi ".

Examples on that page are hacks.


5. "CodePoint functions for encoding agnostic code"

I was glad to learn that there is a unit for working with code points
that is agnostic as to whether the encoding is UTF8 or UTF16. I will use
it. Thanks again.


On Mon, Sep 25, 2017 at 8:01 PM, Juha Manninen via Lazarus
 wrote:
> And more ...
>
> Marcos Douglas, the Unicode solution in Lazarus works amazingly well
> when your data is Unicode from the start.
> It only has trouble with Windows system codepages but they can be
> converted, too.

Nowadays, I'm only using Windows so...

> Question: what is the fundamental problem? Why can't you use the
> system as it is advertised and documented?

I've already written about my issues in the first email. Please see the
first email and then one of my answers to Mattias about WideString,
DOM, etc.


Summary:
I know that was a huge amount of work for whoever did it. Lazarus is
more Unicode, more compatible with Delphi, and the team could move on.
Great.
But you might agree with me that this is far from a good design, right?


Best regards,
Marcos Douglas


Re: [Lazarus] Converting all code to use UnicodeString

2017-09-25 Thread Juha Manninen via Lazarus
And more ...

Marcos Douglas, the Unicode solution in Lazarus works amazingly well
when your data is Unicode from the start.
It only has trouble with Windows system codepages but they can be
converted, too.
Question: what is the fundamental problem? Why can't you use the
system as it is advertised and documented?

Juha


Re: [Lazarus] Converting all code to use UnicodeString

2017-09-25 Thread Juha Manninen via Lazarus
Marcos Douglas, this wiki page answers all your questions about using
Unicode with Lazarus:
 http://wiki.freepascal.org/Unicode_Support_in_Lazarus


On Mon, Sep 25, 2017 at 9:19 PM, Ondrej Pokorny via Lazarus
 wrote:
> You will have to write your own methods with IFDEF-ed code for things
> where it matters (read/write from/to buffer, char-by-char iterations etc.).

For iterating codepoints or even "Unicode characters" (*) you don't need IFDEFs.
Unit LazUnicode provides helper functions and iterators for it.

(*) Unicode character here includes combining codepoints which covers
most practical use cases at least with western languages.

Juha


Re: [Lazarus] Converting all code to use UnicodeString

2017-09-25 Thread Marcos Douglas B. Santos via Lazarus
On Mon, Sep 25, 2017 at 6:10 PM, Sven Barth via Lazarus
 wrote:
> On 25.09.2017 22:18, Marcos Douglas B. Santos via Lazarus wrote:
>> [...]
>> Yes, but using {$modeswitch unicodestrings}, at least in a certain
>> unit, should work with the same code between compilers because
>> "string", for that unit, is UnicodeString as Delphi string is, no?
>
> Yes, but it does not change the types of functions, classes, etc. that
> are used. They have the types they were compiled with while you are
> using a different string type. So you can't simply override a virtual
> method for example that has a String argument that is in fact a
> AnsiString with a method that has a String that's a UnicodeString as
> argument. So of course there will be warnings in case you're passing
> UnicodeString variables to AnsiString variables.

I saw that many RTL functions have an overload like this:
Function FileExists (Const FileName : RawByteString) : Boolean;
Function FileExists (Const FileName : UnicodeString) : Boolean;

The first one calls the second:
Function FileExists (Const FileName : RawByteString) : Boolean;
begin
  Result:=FileExists(UnicodeString(FileName));
end;

My question is:
No matter what the encoding of FileName: RawByteString is, if I cast it
to UnicodeString, will I lose any characters?
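
A small experiment along these lines, as a sketch only. It assumes FPC 3.0 or later, where the RawByteString to UnicodeString conversion goes by the string's dynamic code page. The helper names are made up, and the UTF-8 bytes of U+00E1 are built by hand so the result does not depend on the source file encoding:

```pascal
program RawToUnicode;
{$mode objfpc}{$H+}
{$ifdef unix}
uses
  cwstring;  // widestring manager on Unix
{$endif}

// Illustrative helper: the two UTF-8 bytes of U+00E1, tagged as CP_UTF8.
function MakeUtf8AAcute: RawByteString;
begin
  Result := '';
  SetLength(Result, 2);
  Result[1] := #$C3;
  Result[2] := #$A1;
  SetCodePage(Result, CP_UTF8, False);  // set the tag only, keep the bytes
end;

// The same cast the RTL overload performs.
function ToUnicode(const R: RawByteString): UnicodeString;
begin
  Result := UnicodeString(R);
end;

var
  W: UnicodeString;
begin
  W := ToUnicode(MakeUtf8AAcute);
  WriteLn(Length(W), ' ', HexStr(Ord(W[1]), 4));
end.
```

If the code page tag does not match the actual bytes (for example UTF-8 data tagged with a Windows codepage), the cast would be expected to produce wrong characters, so the tag is what has to be correct.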

>> Yes, Lazarus do that by default. But did you see in my examples, at
>> the first email, how many inconsistencies I got, using just Lazarus
>> and change chars in one simple constant?
>
> Note: I'll ignore the GUI example, cause Ondrej might be better for that.

No problem.

> For the console you need to keep in mind that the console - at least on
> Windows - has a code page as well. On my Linux - which is set to UTF-8 -
> your example works without any problem, but if I use Wine I get the same
> output as you.

Ok, but the compiler knows if a program is a CLI, I believe... so it
could change those variables, DefaultSystemCodePage,
DefaultFileSystemCodePage...
For users (developers) this is not clear, do you agree?
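
To see what those two variables actually hold on a given machine, a short sketch like the following can help (assuming FPC 3.0 or later; DescribeCodePages is just an illustrative name). The values differ between platforms and setups, so no output is shown here:

```pascal
program ShowCodePages;
{$mode objfpc}{$H+}
uses
  SysUtils;

// Illustrative helper: formats the two RTL variables mentioned above.
function DescribeCodePages: string;
begin
  Result := Format('DefaultSystemCodePage=%d DefaultFileSystemCodePage=%d',
    [DefaultSystemCodePage, DefaultFileSystemCodePage]);
end;

begin
  WriteLn(DescribeCodePages);
end.
```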

>>[...]
>> I know almost nothing about compilers. But IMHO, the compiler should
>> have which it already have: "string", which is an alias.
>> Then, for each OS, we should pass one argument like (simplifying):
>> -S=UnicodeString  or -S=AnsiString... something like that (I hope you
>> understood).
>
> The compiler is not the problem. It's that especially the low level part
> of the RTL needs to be aware of the String type and handle it correctly.
> Essentially all functions will need to be checked whether they can
> correctly handle String (as in the generic string type) or are specific
> for AnsiString and thus would need to be adjusted.

I see...

>> I mean, we should not have overload functions, but only one type of
>> string. Even if that type may be RawByteString.
>
> You are wrong. Think about functions reading or writing data from/to
> files. Especially when the data was written with the other String type
> in mind.

It is normal for external data (files) to have different encodings.
IMO, only in these cases should we care about encoding, because
external data is outside of our code; we cannot control it.

>> After compiled, we will have a RTL that will work follow the "-S" argument.
>>
>>> So the RTL will be adjusted in a way that it can be easily
>>> compiled with String = UnicodeString or as is now with String =
>>> AnsiString(CP_ACP). But we are not there yet.
>>
>> Now we're talking.
>> Almost everyone that know how to work with "the group of strings",
>> making them compatible between FPC and Delphi, are saying that Unicode
>> is already done and everything is fine. You are the first one to say
>> that is not complete yet. Thank you. I'm glad to know that I'm not
>> crazy.
>
> Unicode itself is working, but in the form of UTF-8, not UTF-16 and as
> such it is as compatible to Delphi as it can currently get with some
> caveats when the specific type is important.

Well, I only set {$mode delphi} and {$modeswitch unicodestrings}, and
I did not leave Lazarus, and I still got strange results... it looks like
the FPC flags are not compatible with FPC itself or with Lazarus.
Again, I know that you, Mattias and many others understand this
perfectly. But my examples were very simple, and they still didn't work
perfectly using just FPC and Lazarus.

Regards,
Marcos Douglas


Re: [Lazarus] Converting all code to use UnicodeString

2017-09-25 Thread Sven Barth via Lazarus
On 25.09.2017 23:11, Marcos Douglas B. Santos via Lazarus wrote:
>>> [...]
>>> I know almost nothing about compilers. But IMHO, the compiler should
>>> have which it already have: "string", which is an alias.
>>> Then, for each OS, we should pass one argument like (simplifying):
>>> -S=UnicodeString  or -S=AnsiString... something like that (I hope you
>>> understood).
>>
>> The flags are -MDelphiUnicode, -MDelphi or -MObjFPC.
>> But they only compile the units with sources in the unit path, which
>> excludes all FPC units. Also keep in mind that the system unit and the
>> RTL require a lot of low level functions, which require separate
>> versions.
> 
> Which make this flags useless for that. It should be all code (my,
> RTL, Lazarus, etc) to make this work using one type of string.

No, because especially the RTL and FCL is usually provided precompiled.
Thus you can't change the string type anymore afterwards without
recompiling all the code.

>> Unicode <> UnicodeString
>> Unicode is working with UTF-8.
>> If you want a Delphi compatible UTF-16 RTL and packages you are welcome
>> to help the FPC team.
> 
> I can help in a high level way (Classes, Components, etc) not in the
> compiler level.
> But how can I know about these tasks? May I just pick one in bug
> tracker that I want? How to know who is working on each task, which is
> more important?

Currently no one is working on it.

A first step would be to add modeswitch headers to all units that must
not use a specific mode (e.g. the System, ObjPas and some more language
support units) like this:

=== code begin ===

{$ifdef FPC_UNICODE_RTL}
{$modeswitch unicodestrings}
{$endif}

=== code end ===

Once this is done one can test to compile the RTL, FCL and packages with
FPC_UNICODE_RTL defined and see what blows and fix that step by step...

Alternatively a constant in the System unit might be better so that one
can check like this:

=== code begin ===

// System unit
{$ifdef FPC_UNICODE_RTL}
FpcRtlIsUnicode = true;
{$else}
FpcRtlIsUnicode = false;
{$endif}

// some other unit
{$if FpcRtlIsUnicode}
{$modeswitch unicodestrings}
{$endif}

=== code end ===

Or if one wants to compile with -Municodestrings, then instead the core
units need to be protected with

=== code begin ===

{$modeswitch unicodestrings-}

=== code end ===

I'm currently not sure what would be the better approach in the long
term... :/

Regards,
Sven


Re: [Lazarus] Converting all code to use UnicodeString

2017-09-25 Thread Marcos Douglas B. Santos via Lazarus
Hi Mattias,

On Mon, Sep 25, 2017 at 5:45 PM, Mattias Gaertner via Lazarus
 wrote:
> On Mon, 25 Sep 2017 17:18:05 -0300
> "Marcos Douglas B. Santos via Lazarus" 
> wrote:
>
>>[...]
>
> Your first email does not contain a simple Lazarus+string example. I
> see an example for LCL+unicodestring.

Yes, because I tried to make the code compatible. If Delphi uses UTF16,
there is some logic in using the same encoding... I thought.

>>[...]
>> I know almost nothing about compilers. But IMHO, the compiler should
>> have which it already have: "string", which is an alias.
>> Then, for each OS, we should pass one argument like (simplifying):
>> -S=UnicodeString  or -S=AnsiString... something like that (I hope you
>> understood).
>
> The flags are -MDelphiUnicode, -MDelphi or -MObjFPC.
> But they only compile the units with sources in the unit path, which
> excludes all FPC units. Also keep in mind that the system unit and the
> RTL require a lot of low level functions, which require separate
> versions.

Which makes these flags useless for that. It would have to apply to all
code (mine, RTL, Lazarus, etc.) to make this work with one type of string.

>> I mean, we should not have overload functions, but only one type of
>> string. Even if that type may be RawByteString.
>
> From a user pov: Yes, that's what Lazarus is recommending: Simply use
> one string type, and that is String. The confusion starts when you start
> using different string types.

Yeah, but DOM uses DOMString, which is WideString.
Lazarus uses UTF8 and has a laz2_DOM that uses "string", which is
UTF8, but I cannot use this unit with the XPath unit, which needs a
TXMLDocument that works with WideString... see my point?
The RTL was ANSI only and now has overloads for UnicodeString... but
according to Sven, the Unicode support is not finished yet.

And what about the huge number of warnings between these units? Do you
think it is normal to use a cast on everything?

> Unicode <> UnicodeString
> Unicode is working with UTF-8.
> If you want a Delphi compatible UTF-16 RTL and packages you are welcome
> to help the FPC team.

I can help in a high level way (Classes, Components, etc) not in the
compiler level.
But how can I know about these tasks? May I just pick one in bug
tracker that I want? How to know who is working on each task, which is
more important?

Best regards,
Marcos Douglas


Re: [Lazarus] Converting all code to use UnicodeString

2017-09-25 Thread Sven Barth via Lazarus
On 25.09.2017 22:18, Marcos Douglas B. Santos via Lazarus wrote:
> Hi Sven,
> First of all, thanks for your time to answer me.
> 
> On Mon, Sep 25, 2017 at 4:43 PM, Sven Barth via Lazarus
>  wrote:
>> On 25.09.2017 20:51, Marcos Douglas B. Santos via Lazarus wrote:
>>> I understand use IFDEF to compile in different platforms like Windows
>>> vs... err... Haiku. Or Linux vs Nintendo Wii...
>>> But why should I use IFDEF in a code that should be the same in both
>>> compilers (FPC vs Delphi)?
>>
>> Because they *aren't* the same. In Delphi String = UnicodeString while
>> in the RTL, the FCL and the LCL String = AnsiString(CP_ACP) and using a
>> different modeswitch *does not* change that, cause modes are unit specific.
> 
> Yes, but using {$modeswitch unicodestrings}, at least in a certain
> unit, should work with the same code between compilers because
> "string", for that unit, is UnicodeString as Delphi string is, no?

Yes, but it does not change the types of functions, classes, etc. that
are used. They have the types they were compiled with while you are
using a different string type. So you can't, for example, simply
override a virtual method whose String argument is in fact an
AnsiString with a method whose String argument is a UnicodeString.
And of course there will be warnings whenever you assign UnicodeString
values to AnsiString variables.
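
A minimal sketch of what this looks like in practice, assuming the
AnsiString side holds UTF-8 (as it does in a Lazarus application). The
unit uses the unicodestrings modeswitch, TStrings is still AnsiString
based, and the conversion at the boundary is written out explicitly
with UTF8Encode/UTF8Decode instead of being left to the implicit
conversion that produces the warnings. The program and variable names
are made up for illustration.

```pascal
program ExplicitConv;
{$mode delphi}
{$modeswitch unicodestrings}
uses
  SysUtils, Classes;
var
  List: TStrings;
  U: string;  // UnicodeString in this unit because of the modeswitch
begin
  List := TStringList.Create;
  try
    U := 'abc';
    // TStrings.Text was compiled as AnsiString, so convert explicitly
    // (assumption: the AnsiString side is expected to hold UTF-8).
    List.Text := UTF8Encode(U);
    // Going back: decode the UTF-8 bytes into a UnicodeString.
    U := UTF8Decode(List.Text);
    Writeln(Length(U));
  finally
    List.Free;
  end;
end.
```

This only moves the conversion into the open; the types of the
precompiled units stay what they are.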

>> Especially the RTL is not ready for String = UnicodeString. So your best
>> bet is to use UTF8String or set the default code page to UTF8 (the LCL
>> units do that by default if I remember correctly, but Ondrej can confirm
>> or deny that).
> 
> Yes, Lazarus do that by default. But did you see in my examples, at
> the first email, how many inconsistencies I got, using just Lazarus
> and change chars in one simple constant?

Note: I'll ignore the GUI example, cause Ondrej might be better for that.

For the console you need to keep in mind that the console - at least on
Windows - has a code page as well. On my Linux - which is set to UTF-8 -
your example works without any problem, but if I use Wine I get the same
output as you.
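
A rough sketch of one way to line the code pages up on Windows,
assuming FPC 3.0 or later and a console font that can show the
characters. It switches the console to UTF-8 with the Win32 call
SetConsoleOutputCP and tells the RTL that Output is UTF-8 with
SetTextCodePage. The program name and the test string are only for
illustration, and the source file is assumed to be saved as UTF-8.

```pascal
program ConsoleUtf8;
{$mode objfpc}{$H+}
{$codepage utf8}  // assumption: this source file is stored as UTF-8
{$IFDEF WINDOWS}
uses
  Windows;
{$ENDIF}
var
  S: UTF8String;
begin
  {$IFDEF WINDOWS}
  // Ask the console to interpret the bytes it receives as UTF-8 ...
  SetConsoleOutputCP(CP_UTF8);
  // ... and tell the RTL that the Output text file is UTF-8 as well.
  SetTextCodePage(Output, CP_UTF8);
  {$ENDIF}
  S := 'áéíóú';
  Writeln(S);
end.
```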

>>> It will be slower than now? Yes, maybe... but we already use objects!
>>> If you want 500% performance, use pointers, records and procedures
>>> with whatever encode you want. But if you use objects, the overhead
>>> already exists... and who cares? 1ms... 2ms... even 2s that you may
>>> lost using UTF16? (or UTF8, but make all equal!) So? The world is
>>> using Ruby and they don't care... or Python, Java... and they store in
>>> UTF16 too, which requires a double of space... but if it works and the
>>> code is clean, should be more important, don't agree?
>>
>> For FPC also more restricted targets are to be kept in mind (AVR, DOS,
>> etc.).
> 
> I know almost nothing about compilers. But IMHO, the compiler should
> have which it already have: "string", which is an alias.
> Then, for each OS, we should pass one argument like (simplifying):
> -S=UnicodeString  or -S=AnsiString... something like that (I hope you
> understood).

The compiler is not the problem. It's that especially the low level part
of the RTL needs to be aware of the String type and handle it correctly.
Essentially all functions will need to be checked whether they can
correctly handle String (as in the generic string type) or are specific
for AnsiString and thus would need to be adjusted.

> I mean, we should not have overload functions, but only one type of
> string. Even if that type may be RawByteString.

You are wrong. Think about functions reading or writing data from/to
files. Especially when the data was written with the other String type
in mind.
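
To make that concrete, here is a small sketch (not RTL code) of a
routine whose on-disk format is tied to one specific encoding. The
helper name SaveAsUtf8 and the file name are hypothetical. The point is
that the parameter and the byte buffer have explicit types, so the file
content does not change when the meaning of "String" changes.

```pascal
program SaveUtf8;
{$mode objfpc}{$H+}
uses
  Classes, SysUtils;

// Hypothetical helper: always writes UTF-8 bytes, whatever "string"
// happens to mean in the calling unit.
procedure SaveAsUtf8(const FileName: string; const Value: UnicodeString);
var
  Bytes: UTF8String;
  Stream: TFileStream;
begin
  Bytes := UTF8Encode(Value);
  Stream := TFileStream.Create(FileName, fmCreate);
  try
    // Write the raw UTF-8 bytes; skip the call for an empty string.
    if Length(Bytes) > 0 then
      Stream.WriteBuffer(Bytes[1], Length(Bytes));
  finally
    Stream.Free;
  end;
end;

begin
  SaveAsUtf8('out.txt', 'abc');
end.
```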

> 
> After compiled, we will have a RTL that will work follow the "-S" argument.
> 
>> So the RTL will be adjusted in a way that it can be easily
>> compiled with String = UnicodeString or as is now with String =
>> AnsiString(CP_ACP). But we are not there yet.
> 
> Now we're talking.
> Almost everyone that know how to work with "the group of strings",
> making them compatible between FPC and Delphi, are saying that Unicode
> is already done and everything is fine. You are the first one to say
> that is not complete yet. Thank you. I'm glad to know that I'm not
> crazy.

Unicode itself is working, but in the form of UTF-8, not UTF-16, and as
such it is as compatible with Delphi as it can currently get, with some
caveats when the specific type is important.

Regards,
Sven


Re: [Lazarus] Converting all code to use UnicodeString

2017-09-25 Thread Mattias Gaertner via Lazarus
On Mon, 25 Sep 2017 17:18:05 -0300
"Marcos Douglas B. Santos via Lazarus" 
wrote:

>[...]
> Yes, but using {$modeswitch unicodestrings}, at least in a certain
> unit, should work with the same code between compilers because
> "string", for that unit, is UnicodeString as Delphi string is, no?

The important thing is "in a certain unit". As soon as you access
strings from other units, you have to consider their type.

 
> > Especially the RTL is not ready for String = UnicodeString. So your best
> > bet is to use UTF8String or set the default code page to UTF8 (the LCL
> > units do that by default if I remember correctly, but Ondrej can confirm
> > or deny that).  

Unit LazUtf8 does it.

 
> Yes, Lazarus do that by default. But did you see in my examples, at
> the first email, how many inconsistencies I got, using just Lazarus
> and change chars in one simple constant?

Your first email does not contain a simple Lazarus+string example. I
see an example for LCL+unicodestring.

 
>[...]
> I know almost nothing about compilers. But IMHO, the compiler should
> have which it already have: "string", which is an alias.
> Then, for each OS, we should pass one argument like (simplifying):
> -S=UnicodeString  or -S=AnsiString... something like that (I hope you
> understood).

The flags are -MDelphiUnicode, -MDelphi or -MObjFPC.
But they only compile the units with sources in the unit path, which
excludes all FPC units. Also keep in mind that the system unit and the
RTL require a lot of low level functions, which require separate
versions.


> I mean, we should not have overload functions, but only one type of
> string. Even if that type may be RawByteString.

From a user pov: Yes, that's what Lazarus is recommending: Simply use
one string type, and that is String. The confusion starts when you start
using different string types.


> After compiled, we will have a RTL that will work follow the "-S" argument.

The RTL already has a lot of IFDEFs for the coming UnicodeString RTL.

 
> > So the RTL will be adjusted in a way that it can be easily
> > compiled with String = UnicodeString or as is now with String =
> > AnsiString(CP_ACP). But we are not there yet.  
> 
> Now we're talking.
> Almost everyone that know how to work with "the group of strings",
> making them compatible between FPC and Delphi, are saying that Unicode
> is already done and everything is fine. You are the first one to say
> that is not complete yet. Thank you. I'm glad to know that I'm not
> crazy.

Unicode <> UnicodeString
Unicode is working with UTF-8.
If you want a Delphi compatible UTF-16 RTL and packages you are welcome
to help the FPC team.


Mattias


Re: [Lazarus] Converting all code to use UnicodeString

2017-09-25 Thread Marcos Douglas B. Santos via Lazarus
Hi Sven,
First of all, thanks for your time to answer me.

On Mon, Sep 25, 2017 at 4:43 PM, Sven Barth via Lazarus
 wrote:
> On 25.09.2017 20:51, Marcos Douglas B. Santos via Lazarus wrote:
>> I understand use IFDEF to compile in different platforms like Windows
>> vs... err... Haiku. Of Linux vs Nintendo Wii...
>> But why should I use IFDEF in a code that should be the same in both
>> compilers (FPC vs Delphi)?
>
> Because they *aren't* the same. In Delphi String = UnicodeString while
> in the RTL, the FCL and the LCL String = AnsiString(CP_ACP) and using a
> different modeswitch *does not* change that, cause modes are unit specific.

Yes, but using {$modeswitch unicodestrings}, at least in a certain
unit, should work with the same code between compilers because
"string", for that unit, is UnicodeString as Delphi string is, no?

> Especially the RTL is not ready for String = UnicodeString. So your best
> bet is to use UTF8String or set the default code page to UTF8 (the LCL
> units do that by default if I remember correctly, but Ondrej can confirm
> or deny that).

Yes, Lazarus does that by default. But did you see, in the examples in
my first email, how many inconsistencies I got using just Lazarus and
changing the chars in one simple constant?

>> It will be slower than now? Yes, maybe... but we already use objects!
>> If you want 500% performance, use pointers, records and procedures
>> with whatever encode you want. But if you use objects, the overhead
>> already exists... and who cares? 1ms... 2ms... even 2s that you may
>> lost using UTF16? (or UTF8, but make all equal!) So? The world is
>> using Ruby and they don't care... or Python, Java... and they store in
>> UTF16 too, which requires a double of space... but if it works and the
>> code is clean, should be more important, don't agree?
>
> For FPC also more restricted targets are to be kept in mind (AVR, DOS,
> etc.).

I know almost nothing about compilers. But IMHO, the compiler should
keep what it already has: "string", which is an alias.
Then, for each OS, we would pass one argument like (simplifying):
-S=UnicodeString  or -S=AnsiString... something like that (I hope you
understand).
I mean, we should not have overloaded functions, but only one type of
string. Even if that type is RawByteString.

Once compiled, we would have an RTL that follows the "-S" argument.

> So the RTL will be adjusted in a way that it can be easily
> compiled with String = UnicodeString or as is now with String =
> AnsiString(CP_ACP). But we are not there yet.

Now we're talking.
Almost everyone who knows how to work with "the group of strings",
making them compatible between FPC and Delphi, says that Unicode
is already done and everything is fine. You are the first one to say
that it is not complete yet. Thank you. I'm glad to know that I'm not
crazy.

Best regards,
Marcos Douglas


Re: [Lazarus] Converting all code to use UnicodeString

2017-09-25 Thread Marcos Douglas B. Santos via Lazarus
I understand using IFDEF to compile for different platforms, like
Windows vs... err... Haiku. Or Linux vs Nintendo Wii...
But why should I use IFDEF in code that should be the same in both
compilers (FPC vs Delphi)?
Is it because the string type is not Unicode? OK, so I want to convert
everything to UTF16, i.e. UnicodeString (wrong name), and make ALL code
compatible. But this does not look possible without:

* IFDEFs
* knowing a few {modes}
* knowing what type of string I'm working with


If the compiler had an argument meaning "every string is a
UnicodeString, as in Java, C#, Delphi and all the others", that would
be great.
Then we could compile the compiler and Lazarus with the same type of
string and everything would work.
Would it be slower than now? Yes, maybe... but we already use objects!
If you want 500% performance, use pointers, records and procedures
with whatever encoding you want. But if you use objects, the overhead
already exists... and who cares about the 1ms... 2ms... or even 2s
that you may lose using UTF16? (or UTF8, but make it all the same!)
So? The world is using Ruby and they don't care... or Python, Java...
and they store UTF16 too, which takes twice the space... but working,
clean code should matter more, don't you agree?

Best regards,
Marcos Douglas


Re: [Lazarus] Converting all code to use UnicodeString

2017-09-25 Thread Marcos Douglas B. Santos via Lazarus
On Mon, Sep 25, 2017 at 3:19 PM, Ondrej Pokorny via Lazarus
 wrote:
> On 25.09.2017 20:02, Marcos Douglas B. Santos via Lazarus wrote:
>>
>> May I code using just "string"?
>
>
> Yes. LCL is ANSI/UTF8 only, so is TStrings.
>
> You can write Lazarus+Delphi compatible code without a lot of problems. Just
> use the string type. The only thing you have to be aware is that in Delphi
> you work with UTF-16 and in Lazarus with UTF-8 - but for most cases it
> doesn't really matter. You will have to write your own methods with IFDEF-ed
> code for things where it matters (read/write from/to buffer, char-by-char
> iterations etc.).

But my code gave different outputs and/or warnings using only Lazarus!

You said compatible.
What about the warnings?
Why do I need IFDEF-ed code if the code "is" compatible?

For example, is this code compatible with Delphi? Does it work there?
https://github.com/mdbs99/james/blob/a9ad48fb8eaf4f11c6dd7b65d6ac2f63e6fc09fb/test/james.data.tests.pas#L57

Best regards,
Marcos Douglas


Re: [Lazarus] Converting all code to use UnicodeString

2017-09-25 Thread Ondrej Pokorny via Lazarus

On 25.09.2017 20:02, Marcos Douglas B. Santos via Lazarus wrote:
> May I code using just "string"?

Yes. LCL is ANSI/UTF8 only, so is TStrings.

You can write Lazarus+Delphi compatible code without a lot of problems.
Just use the string type. The only thing you have to be aware of is that
in Delphi you work with UTF-16 and in Lazarus with UTF-8 - but for most
cases it doesn't really matter. You will have to write your own methods
with IFDEF-ed code for things where it matters (read/write from/to
buffer, char-by-char iterations etc.).
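
As an illustration of such an IFDEF-ed method, here is a minimal sketch
that counts code points. CodePointCount is a hypothetical helper, not
part of the RTL or the LCL. It assumes that under FPC the string holds
UTF-8 (the Lazarus default, unit compiled without the unicodestrings
modeswitch) and that under Delphi it holds UTF-16.

```pascal
program CountCodePoints;
{$IFDEF FPC}{$mode delphi}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

// Hypothetical helper: number of Unicode code points in S.
function CodePointCount(const S: string): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to Length(S) do
  {$IFDEF FPC}
    // UTF-8: count every byte that is not a continuation byte (10xxxxxx).
    if (Ord(S[I]) and $C0) <> $80 then
  {$ELSE}
    // UTF-16: count every code unit that is not a low surrogate.
    if (Ord(S[I]) < $DC00) or (Ord(S[I]) > $DFFF) then
  {$ENDIF}
      Inc(Result);
end;

begin
  Writeln(CodePointCount('abc'));  // pure ASCII: prints 3 on both compilers
end.
```

Only the body differs between the compilers; callers keep using plain
"string" on both sides.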


Ondrej


[Lazarus] Converting all code to use UnicodeString

2017-09-25 Thread Marcos Douglas B. Santos via Lazarus
Hi,

Yes, yes... another email about Unicode, because it has not been
completed yet. Sorry.
I would like to know how I can develop CLI and GUI (Lazarus) programs
using just UnicodeString, making them all compatible with the Delphi
compiler.

My environment is:
Lazarus 1.9.0 r54784 FPC 3.0.1 i386-win32-win32/win64

I thought that I could use this:

{$mode delphi}
{$modeswitch unicodestrings}

Then, I made some tests:

1- Writing a CLI with basic chars:

===code-begin===
program Project1;
{$mode delphi}
{$modeswitch unicodestrings}
uses
  SysUtils, Classes;
const
  TXT = '1'#13#10'2'#13#10'3';
var
  Ss: TStrings;
begin
  Ss := TStringList.Create;
  try
    Ss.Text := TXT;
    Writeln('text ', ss.Text);
    Writeln('count ', ss.Count);
  finally
    Ss.Free;
  end;
  ReadLn;
end.
===code-end===

===output-begin===
text 1
2
3

count 3
===output-end===

Everything worked. No warnings. Good.


Then I changed the const like this:
===code-begin===
const
  TXT: string = '1'#13#10'2'#13#10'3';
===code-end===

Everything worked. But now I have a warning:
project1.lpr(13,19) Warning: Implicit string type conversion with
potential data loss from "UnicodeString" to "AnsiString"

Why?
Isn't String supposed to be UnicodeString?
Is TStrings ANSI, and is that why I get this warning?


Then I changed the const like this:
===code-begin===
const
  TXT = '1'#13#10'2'#13#10'3'#13#10'áéíóú';
===code-end===


And:
===output-begin===
text 1
2
3
áéíóú

count 4
===output-end===

Is it not possible to display accented chars on the console?




2- Writing a GUI with basic chars:

I just copied the same code with some modifications:
===code-begin===
unit Unit1;

{$mode delphi}
{$modeswitch unicodestrings}

interface

uses
  Classes, SysUtils, FileUtil, Forms, Controls, Graphics, Dialogs, StdCtrls;

type
  TForm1 = class(TForm)
    Button1: TButton;
    procedure Button1Click(Sender: TObject);
  end;

var
  Form1: TForm1;

implementation

{$R *.lfm}

const
  TXT = '1'#13#10'2'#13#10'3'#13#10;

{ TForm1 }

procedure TForm1.Button1Click(Sender: TObject);
var
  Ss: TStrings;
begin
  Ss := TStringList.Create;
  try
    Ss.Text := TXT;
    ShowMessage('text ' + ss.Text);
    ShowMessage('count ' + ss.Count.ToString);
  finally
    Ss.Free;
  end;
end;

end.
===code-end===


Everything worked. But now I have a few warnings:

Compile Project, Target: C:\temp\project1.exe: Success, Warnings: 4
unit1.pas(36,29) Warning: Implicit string type conversion from
"AnsiString" to "UnicodeString"
unit1.pas(36,34) Warning: Implicit string type conversion with
potential data loss from "UnicodeString" to "AnsiString"
unit1.pas(37,30) Warning: Implicit string type conversion from
"AnsiString" to "UnicodeString"
unit1.pas(37,45) Warning: Implicit string type conversion with
potential data loss from "UnicodeString" to "AnsiString"


Then, I changed the const like this (again):
===code-begin===
const
  TXT: string = '1'#13#10'2'#13#10'3';
===code-end===

Everything worked. But I get the same warnings as before.



Then, I changed the const like this (yeah, again):
===code-begin===
const
  TXT = '1'#13#10'2'#13#10'3'#13#10'áéíóú';
===code-end===

Everything worked. But I get the same warnings as before.


Finally, I changed the const like this:
===code-begin===
const
  TXT: string = '1'#13#10'2'#13#10'3'#13#10'áéíóú';
===code-end===

And now, the first message is:
[Window Title]
project1

[Content]
text 1
2
3
áéíóú



Summary:

1. I wrote simple programs with simple constants and got different
results or warnings.

2. I truly believe that smarter people, such as the FPC team, the
Lazarus team and all the smart contributors, can write CLI and GUI code
that is fully compatible with Delphi... but for me it is still very
hard to understand.


What I need, and some thoughts to help you help me:

1. I would like to use just "string" everywhere.
2. I DON'T care about performance, like gaining 2ms... 10ms... 1s... I
really don't care, as long as I can write simple and elegant code.
3. Don't think about external files. If I have a UTF8 encoded file, I
know that and I can use string conversion... but it is EXTERNAL, not
part of the code. So, in my mind this is OK and normal.

Again, my env is:
Lazarus 1.9.0 r54784 FPC 3.0.1 i386-win32-win32/win64


May I code using just "string"?

Thank you.

Best regards,
Marcos Douglas