Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-07 Thread Sven Barth via Lazarus

On 07.05.2017 12:17, Florian Klaempfl via Lazarus wrote:
> On 07.05.2017 at 12:11, Sven Barth via Lazarus wrote:
>> On 07.05.2017 12:07, "Florian Klaempfl via Lazarus"
>> <lazarus@lists.lazarus-ide.org> wrote:
>>>
>>> On 07.05.2017 at 11:57, Graeme Geldenhuys via Lazarus wrote:
>>>> On 2017-05-07 09:10, Florian Klaempfl via Lazarus wrote:
>>>>>> Yeah, that would be the logical thing to do.
>>>>>
>>>>> Why? What makes a string literal UTF-8?
>>>>>
>>>>
>>>> As Mattias said, the fact that the source unit is UTF-8 encoded.
>>>> Defined by a BOM marker, or -Fcutf8 or {$codepage utf8}. If the source
>>>> unit is UTF-8 encoded, the literal string constant can't (and
>>>> shouldn't) be in any other encoding.
>>>>
>>>> I would say the same if the source unit was stored in UTF-16
>>>> encoding. Then string literals would be treated as UTF-16.
>>>
>>> And if a ISO/Ansi codepage is given? Things would probably fail.
>>>
>>> The point is: FPC is consistent in this regard: also sources with a
>>> given iso/ansi codepage are handled the same way. If there is a string
>>> literal with non-ascii chars, it is converted to UTF-16 using the
>>> codepage of the source. Very simple, very logical. It is a matter of
>>> preference if UTF-8, -16, -32 are chosen at this point, but FPC uses
>>> UTF-16. If it uses UTF-8, the problem would occur the other way around.
>>>
>>> If no codepage is given (by directive, command line, BOM), string
>>> literals are handled byte-wise as raw strings.
>>
>> Small correction: FPC only does this conversion if the codepage is
>> UTF-8, no other.
> 
> Then something is wrong/broken :)
> 

Well, the code in tscannerfile.readtoken() only does the conversion to
UTF-16 if the source codepage is UTF-8; otherwise it only converts to
UTF-16 if the string is already a UTF-16 string.
So it is probably not broken, as it looks rather deliberate; if anything,
it's wrong...

Regards,
Sven
-- 
___
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
http://lists.lazarus-ide.org/listinfo/lazarus

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-07 Thread Graeme Geldenhuys via Lazarus

On 2017-05-07 11:17, Florian Klaempfl via Lazarus wrote:
> Then something is wrong/broken :)


I rest my case.  :-P


Regards,
  Graeme


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-07 Thread Florian Klaempfl via Lazarus

On 07.05.2017 at 12:11, Sven Barth via Lazarus wrote:
> On 07.05.2017 12:07, "Florian Klaempfl via Lazarus"
> <lazarus@lists.lazarus-ide.org> wrote:
>>
>> On 07.05.2017 at 11:57, Graeme Geldenhuys via Lazarus wrote:
>> > On 2017-05-07 09:10, Florian Klaempfl via Lazarus wrote:
>> >>> Yeah, that would be the logical thing to do.
>> >>
>> >> Why? What makes a string literal UTF-8?
>> >>
>> >
>> > As Mattias said, the fact that the source unit is UTF-8 encoded.
>> > Defined by a BOM marker, or -Fcutf8 or {$codepage utf8}. If the source
>> > unit is UTF-8 encoded, the literal string constant can't (and
>> > shouldn't) be in any other encoding.
>> >
>> > I would say the same if the source unit was stored in UTF-16
>> > encoding. Then string literals would be treated as UTF-16.
>>
>> And if a ISO/Ansi codepage is given? Things would probably fail.
>>
>> The point is: FPC is consistent in this regard: also sources with a
>> given iso/ansi codepage are handled the same way. If there is a string
>> literal with non-ascii chars, it is converted to UTF-16 using the
>> codepage of the source. Very simple, very logical. It is a matter of
>> preference if UTF-8, -16, -32 are chosen at this point, but FPC uses
>> UTF-16. If it uses UTF-8, the problem would occur the other way around.
>>
>> If no codepage is given (by directive, command line, BOM), string
>> literals are handled byte-wise as raw strings.
> 
> Small correction: FPC only does this conversion if the codepage is
> UTF-8, no other.

Then something is wrong/broken :)


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-07 Thread Florian Klaempfl via Lazarus

On 07.05.2017 at 11:57, Graeme Geldenhuys via Lazarus wrote:
> On 2017-05-07 09:10, Florian Klaempfl via Lazarus wrote:
>>> Yeah, that would be the logical thing to do. 
>>
>> Why? What makes a string literal UTF-8?
>>
> 
> As Mattias said, the fact that the source unit is UTF-8 encoded.
> Defined by a BOM marker, or -Fcutf8 or {$codepage utf8}. If the source
> unit is UTF-8 encoded, the literal string constant can't (and
> shouldn't) be in any other encoding.
> 
> I would say the same if the source unit was stored in UTF-16
> encoding. Then string literals would be treated as UTF-16.

And if a ISO/Ansi codepage is given? Things would probably fail.

The point is: FPC is consistent in this regard: also sources with a
given iso/ansi codepage are handled the same way. If there is a string
literal with non-ascii chars, it is converted to UTF-16 using the
codepage of the source. Very simple, very logical. It is a matter of
preference if UTF-8, -16, -32 are chosen at this point, but FPC uses
UTF-16. If it uses UTF-8, the problem would occur the other way around.

If no codepage is given (by directive, command line, BOM), string
literals are handled byte-wise as raw strings.
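
A minimal sketch of the behaviour described above, assuming FPC 3.x and a source file saved as UTF-8 (the program and variable names are made up for illustration). With {$codepage utf8} the non-ASCII literal should be converted to UTF-16 at compile time, so Length counts UTF-16 code units rather than UTF-8 bytes:

```pascal
program LiteralDemo;
{$mode objfpc}{$H+}
{$codepage utf8}   // tells the compiler the source file is UTF-8

var
  U: UnicodeString;
  I: Integer;
begin
  // Per the description above, this literal is stored as UTF-16.
  U := 'äöü';
  // Length counts UTF-16 code units, not UTF-8 bytes.
  WriteLn('code units: ', Length(U));
  // Dump each code unit as hex; no console encoding is involved here.
  for I := 1 to Length(U) do
    Write(HexStr(Ord(U[I]), 4), ' ');
  WriteLn;
end.
```

Without the directive (and without a BOM or -Fcutf8) the literal would be treated as raw bytes, as the last paragraph above says, so the assignment would then depend on a run-time conversion from the default system codepage.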

> 
> It's perfectly logical to me.

It is logical only in a limited view.


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-07 Thread Sven Barth via Lazarus

On 07.05.2017 12:07, "Florian Klaempfl via Lazarus"
<lazarus@lists.lazarus-ide.org> wrote:
>
> On 07.05.2017 at 11:57, Graeme Geldenhuys via Lazarus wrote:
> > On 2017-05-07 09:10, Florian Klaempfl via Lazarus wrote:
> >>> Yeah, that would be the logical thing to do.
> >>
> >> Why? What makes a string literal UTF-8?
> >>
> >
> > As Mattias said, the fact that the source unit is UTF-8 encoded.
> > Defined by a BOM marker, or -Fcutf8 or {$codepage utf8}. If the source
> > unit is UTF-8 encoded, the literal string constant can't (and
> > shouldn't) be in any other encoding.
> >
> > I would say the same if the source unit was stored in UTF-16
> > encoding. Then string literals would be treated as UTF-16.
>
> And if a ISO/Ansi codepage is given? Things would probably fail.
>
> The point is: FPC is consistent in this regard: also sources with a
> given iso/ansi codepage are handled the same way. If there is a string
> literal with non-ascii chars, it is converted to UTF-16 using the
> codepage of the source. Very simple, very logical. It is a matter of
> preference if UTF-8, -16, -32 are chosen at this point, but FPC uses
> UTF-16. If it uses UTF-8, the problem would occur the other way around.
>
> If no codepage is given (by directive, command line, BOM), string
> literals are handled byte-wise as raw strings.

Small correction: FPC only does this conversion if the codepage is UTF-8,
no other.

Regards,
Sven

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-07 Thread Graeme Geldenhuys via Lazarus

On 2017-05-07 09:10, Florian Klaempfl via Lazarus wrote:
>> Yeah, that would be the logical thing to do. 
>
> Why? What makes a string literal UTF-8?
> 

As Mattias said, the fact that the source unit is UTF-8 encoded.
Defined by a BOM marker, or -Fcutf8 or {$codepage utf8}. If the source
unit is UTF-8 encoded, the literal string constant can't (and
shouldn't) be in any other encoding.

I would say the same if the source unit was stored in UTF-16
encoding. Then string literals would be treated as UTF-16.

It's perfectly logical to me.

Regards,
  Graeme

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-07 Thread Florian Klaempfl via Lazarus

On 07.05.2017 at 10:30, Mattias Gaertner via Lazarus wrote:
> On Sun, 7 May 2017 10:10:26 +0200
> Florian Klaempfl via Lazarus  wrote:
> 
>> On 05.05.2017 at 13:35, Graeme Geldenhuys via Lazarus wrote:
>>> On 2017-05-05 12:17, Mattias Gaertner via Lazarus wrote:
>>>> I wonder if it would help if FPC would store UTF-8 string literals as
>>>> UTF-8
>>>
>>> Yeah, that would be the logical thing to do.   
>>
>> Why? What makes a string literal UTF-8?
> 
> Here: $codepage utf-8 and non ASCII.
> 
> 
>>> FPC not doing that is what
>>> really confused me.  
>>
>> You have to distinguish between source encoding and string encoding.
> 
> Yes, but sometimes the string encoding is not obvious:
> 
> {$codepage utf8}
> const s = 'äöüالعَرَبِيَّة';
> begin
>   writeln(s); // needs widestringmanager
> end.

Yes. Which is good imo. The compiler should call the unicode writeln in
this case, and this requires a widestringmanager anyway (I agree that its
name is badly chosen; it should be named unicodestringmanager). If
the string were encoded as UTF-8 and the ansistring writeln were called,
your example would work only on a system with a UTF-8 console. But you can
achieve the same (broken output) by just leaving out the codepage
directive. It will not work on non-UTF-8 consoles either.
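
A sketch of the quoted example made self-contained, assuming a Unix-like target where the cwstring unit supplies the widestring manager (as far as I know the Windows RTL brings its own, so the conditional uses clause is an assumption about the target, and the program name is made up):

```pascal
program WriteLiteral;
{$mode objfpc}{$H+}
{$codepage utf8}   // source is UTF-8, so the literal below becomes UTF-16

uses
  {$ifdef unix}
  cwstring,        // installs a widestring manager on Unix-like systems
  {$endif}
  SysUtils;        // keeps the uses clause valid on every target

const
  S = 'äöü';       // untyped constant: its encoding is decided by the compiler

begin
  // The unicode WriteLn converts to the console codepage at run time,
  // which is where the widestring manager is needed.
  WriteLn(S);
end.
```

Dropping the {$codepage utf8} line turns the literal back into raw bytes, which is the "same (broken output)" case mentioned above whenever the console is not UTF-8.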


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-07 Thread Mattias Gaertner via Lazarus

On Sun, 7 May 2017 10:10:26 +0200
Florian Klaempfl via Lazarus  wrote:

> On 05.05.2017 at 13:35, Graeme Geldenhuys via Lazarus wrote:
> > On 2017-05-05 12:17, Mattias Gaertner via Lazarus wrote:  
> >> I wonder if it would help if FPC would store UTF-8 string literals as
> >> UTF-8   
> > 
> > Yeah, that would be the logical thing to do.   
> 
> Why? What makes a string literal UTF-8?

Here: $codepage utf-8 and non ASCII.


> > FPC not doing that is what
> > really confused me.  
> 
> You have to distinguish between source encoding and string encoding.

Yes, but sometimes the string encoding is not obvious:

{$codepage utf8}
const s = 'äöüالعَرَبِيَّة';
begin
  writeln(s); // needs widestringmanager
end.

Mattias

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-07 Thread Florian Klaempfl via Lazarus

On 05.05.2017 at 13:35, Graeme Geldenhuys via Lazarus wrote:
> On 2017-05-05 12:17, Mattias Gaertner via Lazarus wrote:
>> I wonder if it would help if FPC would store UTF-8 string literals as
>> UTF-8 
> 
> Yeah, that would be the logical thing to do. 

Why? What makes a string literal UTF-8?

> FPC not doing that is what
> really confused me.

You have to distinguish between source encoding and string encoding.


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Mattias Gaertner via Lazarus

On Fri, 5 May 2017 16:36:51 +0300
Juha Manninen via Lazarus  wrote:

> On Fri, May 5, 2017 at 4:21 PM, Mattias Gaertner via Lazarus
>  wrote:
> > Oops. Which one?  
> 
> The FAQ says:
> "Since FPC 3.0 you must add the flag -FcUTF8 or add {$codepage UTF8}
> at the beginning of the unit."

I improved it a bit.

Mattias

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Juha Manninen via Lazarus

On Fri, May 5, 2017 at 4:21 PM, Mattias Gaertner via Lazarus
 wrote:
> Oops. Which one?

The FAQ says:
"Since FPC 3.0 you must add the flag -FcUTF8 or add {$codepage UTF8}
at the beginning of the unit."

The same page in "String Literals" section says:
 "In most cases {$codepage utf8} / -FcUTF8 is not needed."
which is the correct information.

Actually I don't know if that FAQ entry is yours. Many people have
added stuff there. The page is intimidating for a user who just wants
to support Unicode without a fuss.

Juha

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Juha Manninen via Lazarus

On Fri, May 5, 2017 at 3:56 PM, Sven Barth via Lazarus
 wrote:
> That is mainly due to the compiler not supporting surrogate pairs for the
> UTF-8 -> UTF-16 conversion. If it would support them, then there wouldn't be
> a problem anymore...

That is a serious bug. Getting codepoints right is the absolute
minimum requirement for Unicode support. Surrogate pairs are the
UTF-16 equivalent of multi-byte codepoints in UTF-8.
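
For reference, a small sketch of what surrogate-pair support means arithmetically. It is only an illustration of the UTF-16 encoding rule, not the compiler's code; the procedure and variable names are made up:

```pascal
program SurrogateDemo;
{$mode objfpc}{$H+}

// Splits a code point above $FFFF into a UTF-16 surrogate pair.
// Hypothetical helper for illustration only.
procedure ToSurrogates(CodePoint: Cardinal; out HighSur, LowSur: Word);
var
  V: Cardinal;
begin
  V := CodePoint - $10000;                 // 20 significant bits remain
  HighSur := Word($D800 + (V shr 10));     // upper 10 bits
  LowSur  := Word($DC00 + (V and $3FF));   // lower 10 bits
end;

var
  H, L: Word;
begin
  ToSurrogates($1F600, H, L);
  WriteLn(HexStr(H, 4), ' ', HexStr(L, 4)); // prints D83D DE00
end.
```

Any code point that needs four bytes in UTF-8 lies above 65535 and therefore needs such a pair in UTF-16, which is the case the quoted compiler error refers to.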

Now I understand this was not caused by our UTF-8 run-time switch
"hack". It is a plain bug in FPC.
Is there a plan to fix it?

Juha

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Mattias Gaertner via Lazarus

On Fri, 5 May 2017 14:12:05 +0300
Juha Manninen via Lazarus  wrote:

>[...]
> Then Mattias adds FAQs contradicting the earlier texts ...

Oops. Which one?

Mattias

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Sven Barth via Lazarus

On 05.05.2017 13:50, "Juha Manninen via Lazarus" <
lazarus@lists.lazarus-ide.org> wrote:
>
> On Fri, May 5, 2017 at 2:29 PM, Michael Van Canneyt via Lazarus
>  wrote:
> > Then what is still the problem ?
>
> With BOM you get:
>  Error: UTF-8 code greater than 65535 found
> which is counter-intuitive when the file and the string literal are both
> UTF-8.

That is mainly due to the compiler not supporting surrogate pairs for the
UTF-8 -> UTF-16 conversion. If it would support them, then there wouldn't
be a problem anymore... (though of course it would need to be ensured that
other parts of the RTL support surrogate pairs as well)

Regards,
Sven

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Graeme Geldenhuys via Lazarus

On 2017-05-05 12:49, Juha Manninen via Lazarus wrote:
> Wrong information propagates easily, thus it is important to get this right.

No worries, I agree. Thanks for correcting my terminology.

Regards,
  Graeme


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Juha Manninen via Lazarus

On Fri, May 5, 2017 at 2:02 PM, Graeme Geldenhuys via Lazarus
 wrote:
> If so, then why does LCL also call the above two functions?

Graeme, they are called by LazUtils package, LazUTF8 unit, not by LCL.
It is not limited to GUI programming.
Wrong information propagates easily, thus it is important to get this right.

Juha

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Juha Manninen via Lazarus

On Fri, May 5, 2017 at 2:29 PM, Michael Van Canneyt via Lazarus
 wrote:
> Then what is still the problem ?

With BOM you get:
 Error: UTF-8 code greater than 65535 found
which is counter-intuitive when the file and the string literal are both UTF-8.
It is related to changing the default codepage at run-time which is a
hack from FPC's POV.
For the same reason we need this grid:
 
http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus#String_Literals_Overview
So, it is not only a communication issue. It is truly messy. If only
it could be improved...

Juha

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Ondrej Pokorny via Lazarus


On 05.05.2017 13:02, Graeme Geldenhuys via Lazarus wrote:
> On 2017-05-05 10:41, Ondrej Pokorny via Lazarus wrote:
>> Just use "DefaultSystemCodePage := CP_UTF8" and every single-byte string
>> is unicode enabled.
>
> So does that mean you don't have to also call the following two functions
> (which LCL does).
>
>   SetMultiByteConversionCodePage(CP_UTF8);
>   SetMultiByteRTLFileSystemCodePage(CP_UTF8);
>
> So doing
>
>    DefaultSystemCodePage := CP_UTF8;
>
> is all you need to switch the RTL, FCL and the String data type to UTF-8?
>
> If so, then why does LCL also call the above two functions?


SetMultiByteConversionCodePage does only one thing: it sets 
DefaultSystemCodePage :) So yes, if you set DefaultSystemCodePage you 
don't have to call SetMultiByteConversionCodePage.


You are right - I forgot about 
SetMultiByteRTLFileSystemCodePage/DefaultRTLFileSystemCodePage.


BUT if I take a look into the RTL sources I see that it's used only in 
FindFirst/FindNext, FExpand and GetDir/do_GetDir. And only in the result 
strings. IMO it could be removed and replaced with DefaultSystemCodePage.
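
A minimal sketch of the switch being discussed, assuming FPC 3.0 or later where these routines and variables live in the System unit (the program name is made up). It simply makes the two calls and prints the resulting codepage variables so the effect can be checked on a given platform:

```pascal
program Utf8Switch;
{$mode objfpc}{$H+}

begin
  // Per the explanation above, this only sets DefaultSystemCodePage.
  SetMultiByteConversionCodePage(CP_UTF8);
  // Affects the result strings of the file-system routines named above.
  SetMultiByteRTLFileSystemCodePage(CP_UTF8);

  // Print the variables to see what the calls changed.
  WriteLn('DefaultSystemCodePage:        ', DefaultSystemCodePage);
  WriteLn('DefaultRTLFileSystemCodePage: ', DefaultRTLFileSystemCodePage);
end.
```

Replacing the first call with a direct assignment to DefaultSystemCodePage should be equivalent if the description above is accurate.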


Ondrej

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Graeme Geldenhuys via Lazarus

On 2017-05-05 12:17, Mattias Gaertner via Lazarus wrote:
> I wonder if it would help if FPC would store UTF-8 string literals as
> UTF-8 

Yeah, that would be the logical thing to do. FPC not doing that is what
really confused me.

Regards,
  Graeme


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Michael Van Canneyt via Lazarus




On Fri, 5 May 2017, Mattias Gaertner via Lazarus wrote:

> On Fri, 5 May 2017 12:52:48 +0200 (CEST)
> Michael Van Canneyt via Lazarus  wrote:
>
>> [...]
>> I propose to let the compiler observe the BOM.
>> But I don't think more is needed.
>
> FPC observes the BOM. Same as Delphi.

Then what is still the problem ?

> I wonder if it would help if FPC would store UTF-8 string literals as
> UTF-8 and how much work that is.

Amount of work is probably not so much.

The question is whether it will cause problems for e.g. the JVM code
generator.

But I still fail to see the actual problem, aside from a lot of confusion by
users.

Confusion arises from a lack of clear information.

So in my opinion, we simply need to provide clear information, before we
start changing things.

Michael.

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Mattias Gaertner via Lazarus

On Fri, 5 May 2017 12:52:48 +0200 (CEST)
Michael Van Canneyt via Lazarus  wrote:

>[...]
> I propose to let the compiler observe the BOM. 
> But I don't think more is needed.

FPC observes the BOM. Same as Delphi.

I wonder if it would help if FPC would store UTF-8 string literals as
UTF-8 and how much work that is.

Mattias

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Michael Schnell via Lazarus


On 05.05.2017 12:16, Graeme Geldenhuys via Lazarus wrote:

> In the end it’s about supporting Unicode. Does it really matter
> what internal encoding it is to achieve the “Unicode support”
> goal?


Yep it does.

There are ways around that issue (i.e. code aware strings) but in fact 
these trigger a new bunch of problems.


You might want to read -> 
http://wiki.freepascal.org/not_Delphi_compatible_enhancement_for_Unicode_Support


-Michael

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Graeme Geldenhuys via Lazarus

On 2017-05-05 11:55, Jürgen Hestermann via Lazarus wrote:
> I use UTF-8 internally and
> convert to/from UTF-16 for all Windows API functions and
> I never found any problem with it.
> The time that the API functions requires is so much longer than the
> time for string conversion that it does not matter at all.

This is what I've been doing for years, and I agree, it works great.
Windows is also the only platform (of any modern OS) that doesn't
use UTF-8 as standard - so I consider it the minority.

> A situation where it may be a problem is when reading
> (UTF-16 encoded) text files.

I'm yet to find a UTF-16 encoded text file in the wild. I'm not saying
they don't exist, I'm just saying they are extremely rare and more
like an anomaly. UTF-8 seems to rule the roost and the Internet.

This graph should say it all:

https://en.wikipedia.org/wiki/File:Utf8webgrowth.svg

  (source):  https://en.wikipedia.org/wiki/UTF-8

Even so, a simple conversion to UTF-8 at load time should resolve
all possible problems.
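
A rough sketch of such a load-time conversion, assuming a UTF-16LE file and a little-endian host (LoadUtf16AsUtf8 is a hypothetical helper name, and a big-endian file would need a byte swap that is not shown):

```pascal
program Utf16ToUtf8;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

// Hypothetical helper: reads a UTF-16LE file and returns its text as UTF-8.
function LoadUtf16AsUtf8(const FileName: string): UTF8String;
var
  FS: TFileStream;
  U: UnicodeString;
  CharCount: SizeInt;
begin
  FS := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    // Read the whole file straight into a UTF-16 string buffer.
    CharCount := FS.Size div SizeOf(WideChar);
    SetLength(U, CharCount);
    if CharCount > 0 then
      FS.ReadBuffer(U[1], CharCount * SizeOf(WideChar));
  finally
    FS.Free;
  end;
  // Drop a leading byte order mark, if present.
  if (Length(U) > 0) and (Ord(U[1]) = $FEFF) then
    Delete(U, 1, 1);
  // Convert once; everything after this point works with UTF-8.
  Result := UTF8Encode(U);
end;

begin
  if ParamCount = 1 then
    WriteLn(Length(LoadUtf16AsUtf8(ParamStr(1))), ' bytes of UTF-8')
  else
    WriteLn('usage: Utf16ToUtf8 <file>');
end.
```

The conversion happens once per file, so its cost is small next to the file I/O itself.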

Regards,
  Graeme

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Juha Manninen via Lazarus

On Fri, May 5, 2017 at 1:20 AM, Graeme Geldenhuys via Lazarus
 wrote:
> A case in point. Looking at the Wiki page you listed, I read the following:
> "
> Since FPC 3.0 you must add the flag -FcUTF8 or add {$codepage UTF8} at the 
> beginning of the unit.
> ...

Uhhh, the same page in "String Literals" section says:
 "In most cases {$codepage utf8} / -FcUTF8 is not needed."
which is the correct information.

Also this wiki page has become a mess when many people add stuff but
nobody removes any.
For example Michl added the grid about how constant assignment works
with and without {$codepage utf8}. It is nice but he didn't remove the
other paragraphs explaining the same thing. It looks like an extremely
complex topic for a new user, while in reality he should code like
with Delphi + remember only few simple rules.
Then Mattias adds FAQs contradicting the earlier texts ...

The comment from Martok was valid. This page is not good for users who
just want to get started quickly.
I will simplify the page. I will remove stuff and move the FAQ to a
new page. Sorry in advance for people whose text will be removed.

Juha
-- 
___
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
http://lists.lazarus-ide.org/listinfo/lazarus

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Graeme Geldenhuys via Lazarus

On 2017-05-05 10:41, Ondrej Pokorny via Lazarus wrote:
> Just use "DefaultSystemCodePage := CP_UTF8" and every single-byte string 
> is unicode enabled.

So does that mean you don't have to also call the following two functions 
(which LCL does).

 SetMultiByteConversionCodePage(CP_UTF8);
 SetMultiByteRTLFileSystemCodePage(CP_UTF8);

So doing

   DefaultSystemCodePage := CP_UTF8;

is all you need to switch the RTL, FCL and the String data type to UTF-8?

If so, then why does LCL also call the above two functions?

Regards,
  Graeme


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Ondrej Pokorny via Lazarus


On 05.05.2017 12:55, Jürgen Hestermann via Lazarus wrote:

> A situation where it may be a problem is when reading
> (UTF-16 encoded) text files.


No, not at all. If you convert the file on the fly, there is almost 0 
performance penalty.


Ondrej

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Ondrej Pokorny via Lazarus


On 05.05.2017 12:01, Michael Van Canneyt via Lazarus wrote:
> On Fri, 5 May 2017, Ondrej Pokorny via Lazarus wrote:
>> Believe me, I use it in production without any problems: I have
>> unicode-aware TStrings, I can read files with unicode names, I can do
>> everything with plain FPC trunk.
>
> I am aware of this, I do it myself. But I work on Linux, where UTF8 is
> the norm.
>
> So I cannot vouch for other platforms...

For now I am only on Windows and I have to say loudly: IT WORKS GREAT :)

>> I don't need a 100% UTF-16 Delphi-Compatible RTL for that at all.
>
> This is the crux of the problem. Is this wanted/needed or do we stick
> to UTF8 ?
>
> We claim Delphi compatibility. So IMHO we must provide a UTF-16 Delphi
> compatible RTL.


I write code that is compatible with FPC and Delphi 5 - 10.2 and it 
works fine. So you already have a Delphi-compatible RTL. The only (well 
documented) difference is that FPC uses single-byte string and Delphi 
uses 2-byte string.


The only place where you need to handle the difference is where you need 
the size of char (when you access string as buffer) - which is 
particularly low-level code:


MyStream.WriteBuffer(MyString[1], Length(MyString) * SizeOf(Char));

-> you need the extra SizeOf(Char) and not a constant (1 for fpc, 2 for 
unicode Delphi).
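
The line above can be put into a small self-contained sketch. The program name and the test string are made up; the point is only that SizeOf(Char) lets the same source work whether string is 1 byte or 2 bytes per character:

```pascal
program StreamDemo;
{$ifdef FPC}{$mode delphi}{$endif}
{$ifndef FPC}{$APPTYPE CONSOLE}{$endif}

uses
  Classes;

var
  MyStream: TMemoryStream;
  MyString: string;
begin
  MyString := 'abc';
  MyStream := TMemoryStream.Create;
  try
    // Guard against the empty string, where MyString[1] is invalid.
    if MyString <> '' then
      MyStream.WriteBuffer(MyString[1], Length(MyString) * SizeOf(Char));
    // Byte count written: depends on SizeOf(Char) for this compiler.
    WriteLn(MyStream.Size);
  finally
    MyStream.Free;
  end;
end.
```

Hard-coding 1 or 2 instead of SizeOf(Char) would write too few or too many bytes on the other compiler.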


That's all. All high-level code is compatible already. Good job. I 
really do think it's not worth it to pollute FPC RTL with UnicodeString 
overloads of every function, class etc.


Better to keep 1 clean approach (UTF-8 RTL) and not confuse people with 
2 approaches (UTF-8 vs UTF-16). E.g. how do you want to call the new 
UnicodeString-TStrings class? You have 2 options:
1.) Break compatibility to legacy FPC. (New TStrings will use 
UnicodeString.)

2.) Break compatibility to Delphi. (TStrings will stay with 8-bit string.)

There is no obvious solution for the problem :/

And then if you will introduce a compiler switch to change String from 
1-byte to 2-bytes... Oh no, so much mess and so many variants to care 
about. Really, sometimes it's better to give people no options :) (Or 
have you already introduced the switch?)


Just stick with current utf8 approach that proved well :)

Ondrej

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Jürgen Hestermann via Lazarus


On 2017-05-05 at 12:16, Graeme Geldenhuys via Lazarus wrote:
> In the end it’s about supporting Unicode. Does it really matter
> what internal encoding it is to achieve the “Unicode support”
> goal?

From a performance perspective it may be unwanted
to convert string encodings back and forth all the time.

Although, in my file manager I use UTF-8 internally and
convert to/from UTF-16 for all Windows API functions and
I never found any problem with it.
The time that the API functions requires is so much longer than the
time for string conversion that it does not matter at all.
Even fast API-functions like changing attributes only take
a second for thousands of files.

A situation where it may be a problem is when reading
(UTF-16 encoded) text files.
But I never stumbled over such a thing yet.

I would promote the use of UTF-8 wherever possible
while converting to target encodings only when unavoidable.
It makes life much easier if you only concentrate on one (the best)
Unicode encoding (UTF-8).

Therefore I see no use of a UTF-16 based RTL.
I don't think that you would notice any performance difference
to the UTF-8 based RTL.
It would only waste valuable time that can be invested in other things.


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Michael Van Canneyt via Lazarus




On Fri, 5 May 2017, Juha Manninen via Lazarus wrote:

> On Fri, May 5, 2017 at 9:43 AM, Michael Van Canneyt via Lazarus
>  wrote:
>> What tricks do you still need in 3.0.x ?
>
> The annoying tricky part with our UTF-8 solution is the assignment of
> Unicode string literals.
> With UTF-8 BOM it does not work at all, as discussed here.
> Without BOM it depends on string type + compiler settings in an illogical way.
> We would need a more robust solution for that. Do you have ideas?

I propose to let the compiler observe the BOM.
But I don't think more is needed.


Michael.

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Juha Manninen via Lazarus

On Fri, May 5, 2017 at 9:43 AM, Michael Van Canneyt via Lazarus
 wrote:
> What tricks do you still need in 3.0.x ?

The annoying tricky part with our UTF-8 solution is the assignment of
Unicode string literals.
With UTF-8 BOM it does not work at all, as discussed here.
Without BOM it depends on string type + compiler settings in an illogical way.
We would need a more robust solution for that. Do you have ideas?

Juha

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Mattias Gaertner via Lazarus

On Fri, 5 May 2017 12:17:22 +0200
Ondrej Pokorny via Lazarus  wrote:

>[...]
> Embarcadero realized they made a mistake when they disabled (yes, only 
> disabled not removed) 8-bit strings from NEXTGEN compilers. UTF8String 
> and RawByteString are back for all NEXTGEN compilers since 10.1. You can 
> use them in Linux Delphi as well.
> 
> http://andy.jgknet.de/blog/2016/05/system-bytestrings-for-10-1-berlin/

Wow. I guess that means FPC lost the title of
"compiler with most confusing string types".


Mattias

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Mattias Gaertner via Lazarus

On Fri, 5 May 2017 12:01:47 +0200 (CEST)
Michael Van Canneyt via Lazarus  wrote:

>[...]
> > Believe me, I use it in production without any problems: I have 
> > unicode-aware TStrings, I can read files with unicode names, I can do 
> > everything with plain FPC trunk.  
> 
> I am aware of this, I do it myself. 
> But I work on Linux, where UTF8 is the norm.
> 
> So I cannot vouch for other platforms...

It has worked on Linux for years.
Since FPC 3.0 it works on Windows too.

Mattias

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Ondrej Pokorny via Lazarus


On 05.05.2017 12:08, Mattias Gaertner via Lazarus wrote:
> On Fri, 5 May 2017 10:56:41 +0100
> Graeme Geldenhuys via Lazarus  wrote:
>
>> [...]
>>> or work with large amount of 8-bit strings.
>>
>> Why would you want to? Unicode supports all languages,
>
> Maybe there is a misunderstanding. Let me rephrase my question:
> What string do you use in Linux Delphi when working with UTF-8 strings?

Embarcadero realized they made a mistake when they disabled (yes, only
disabled not removed) 8-bit strings from NEXTGEN compilers. UTF8String
and RawByteString are back for all NEXTGEN compilers since 10.1. You can
use them in Linux Delphi as well.


http://andy.jgknet.de/blog/2016/05/system-bytestrings-for-10-1-berlin/

Ondrej

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Graeme Geldenhuys via Lazarus

On 2017-05-05 11:01, Michael Van Canneyt via Lazarus wrote:
> We claim Delphi compatibility. 
> So IMHO we must provide a UTF-16 Delphi compatible RTL.

In the end it’s about supporting Unicode. Does it really matter
what internal encoding it is to achieve the “Unicode support”
goal?

Regards,
  Graeme


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Mattias Gaertner via Lazarus

On Fri, 5 May 2017 10:56:41 +0100
Graeme Geldenhuys via Lazarus  wrote:

>[...]
> > or work with large amount of 8-bit strings.  
> 
> Why would you want to? Unicode supports all languages,

Maybe there is a misunderstanding. Let me rephrase my question:
What string do you use in Linux Delphi when working with UTF-8 strings?

Mattias

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Michael Van Canneyt via Lazarus




On Fri, 5 May 2017, Ondrej Pokorny via Lazarus wrote:

> On 05.05.2017 11:23, Michael Van Canneyt via Lazarus wrote:
>> Yes, this somewhat alleviates the problem; but this still is a
>> single-byte TStrings, as opposed to the WideString
>> TStrings of Delphi. It's also still a single-byte filename argument.
>
> Yes but you forget that unicode is also single-byte UTF-8. And the
> greatest thing about FPC: it fully supports "DefaultSystemCodePage :=
> CP_UTF8".
>
> Therefore you don't need WideString/UnicodeString file arguments and
> UnicodeString-TStrings to have full unicode support in current FPC.
>
> Just use "DefaultSystemCodePage := CP_UTF8" and every single-byte string
> is unicode enabled.
>
> Believe me, I use it in production without any problems: I have
> unicode-aware TStrings, I can read files with unicode names, I can do
> everything with plain FPC trunk.

I am aware of this, I do it myself.
But I work on Linux, where UTF8 is the norm.

So I cannot vouch for other platforms...

> I don't need a 100% UTF-16 Delphi-Compatible RTL for that at all.

This is the crux of the problem.
Is this wanted/needed or do we stick to UTF8 ?

We claim Delphi compatibility.
So IMHO we must provide a UTF-16 Delphi compatible RTL.


Michael.

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Graeme Geldenhuys via Lazarus

On 2017-05-05 10:41, Mattias Gaertner via Lazarus wrote:
> I wonder what they do when you need to access the raw 8-bit file names,

OSX, iOS, Android and Linux all use UTF-8 as standard, so filename access
is not going to be any problem. Windows is moving more and more towards
UTF-16 everywhere, so that shouldn't be a problem either.

> or work with large amount of 8-bit strings.

Why would you want to? Unicode supports all languages, there simply is no
need for other non-Unicode encodings any more. If it is memory usage
you are worried about, convert your 8-bit strings as UTF-8 encoded text
(most Western countries text will all use low memory then - compared to
UTF-16 as an alternative).

Java has only supported Unicode since its inception in 1995, and Java runs
everywhere. It's never had a problem running on non-Unicode enabled
platforms.

Regards,
  Graeme

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Mattias Gaertner via Lazarus

On Fri, 5 May 2017 10:01:24 +0100
Graeme Geldenhuys via Lazarus  wrote:

>[...]
> > AFAIK you are using UTF-8 in AnsiString in FPC 2.6.4. That works in
> > many cases, because of double fooling the compiler. This trick does not
> > work on Windows with RTL file functions though.  
> 
> Yes and true, but fpGUI supplies its own "wrapper" RTL file functions, thus
> it works 100% on all platforms for years. I believe LCL used to do the same.

Yes, and with FPC 3.0 many of them are no longer needed.

> RawByteString type (yet another string type in FPC & Delphi's arsenal) did
> not exist at the time, otherwise I would probably have defined...
> 
>   TfpgString = RawByteString;
> 
> and used that everywhere.

How would that help?

> > Of course it would be nicer, if we don't need tricks to get Unicode.  
> 
> Indeed, and that is why I love solutions implemented by Java and Qt
> Framework. They are simple, it works and not confusing.

IMO you are comparing apples and oranges.
The FP compiler provides a very easy Unicode solution - or even two
(UTF-8 and UTF-16). The problem is the old RTL and libs, which are
written for system encoding, not for Unicode.
You can design a Unicode RTL in FPC just like Java and Qt. fpgui and
LazUtils are kind of a start of that.
Or you can help FPC finish the Unicode RTL. So stop complaining and
help them.

> Even Embarcadero
> is doing some string type clean-up. Their new Linux compiler completely
> removed AnsiString support. After all, why do you need any other
> string types when you support the Unicode standard.

That's true for most cases.
I wonder what they do when you need to access the raw 8-bit file names,
or work with large amount of 8-bit strings.

Mattias

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Ondrej Pokorny via Lazarus


On 05.05.2017 11:23, Michael Van Canneyt via Lazarus wrote:
> Yes, this somewhat alleviates the problem; but this still is a
> single-byte TStrings, as opposed to the WideString
> TStrings of Delphi. It's also still a single-byte filename argument.

Yes but you forget that unicode is also single-byte UTF-8. And the
greatest thing about FPC: it fully supports "DefaultSystemCodePage :=
CP_UTF8".

Therefore you don't need WideString/UnicodeString file arguments and
UnicodeString-TStrings to have full unicode support in current FPC.

Just use "DefaultSystemCodePage := CP_UTF8" and every single-byte string
is unicode enabled.

Believe me, I use it in production without any problems: I have
unicode-aware TStrings, I can read files with unicode names, I can do
everything with plain FPC trunk.

I don't need a 100% UTF-16 Delphi-Compatible RTL for that at all. I can
do that with current UTF-8 FPC RTL as well. (Honestly I think it's
better for FPC to stick with UTF-8 and don't overcomplicate the RTL with
UTF-16 support.)


Ondrej

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Michael Van Canneyt via Lazarus




On Fri, 5 May 2017, Ondrej Pokorny via Lazarus wrote:

> On 05.05.2017 11:17, Michael Van Canneyt via Lazarus wrote:
>> On Fri, 5 May 2017, Graeme Geldenhuys via Lazarus wrote:
>>> On 2017-05-05 07:43, Michael Van Canneyt via Lazarus wrote:
>>>> As far as I know, you don't need any tricks to work with unicode
>>>> filenames or output in 3.0.2. Maybe with exception of TStrings and
>>>> TFileStream.
>>>
>>> Again, I didn't have time to follow FPC 3.x development much, and I was too
>>> confused with all the Unicode changes.
>>>
>>> With FPC 3.0.x, can you now load text files from disk using TStringList and
>>> specify the encoding of the file at load time?
>>>
>>> Something like:
>>>  sl.LoadFromFile('some_utf8_file.txt', CP_UTF8);
>>>  sl.LoadFromFile('some_utf16_file.txt', CP_UTF16);
>>>  sl.LoadFromFile('some_latin1_file.txt', CP_Latin1);
>>
>> Not yet. These are the exceptions I was talking about.
>
> Current trunk 3.1.1 can do that since r34475 - you applied it :) I don't
> know if you ported it back to 3.0.x, though.

As far as I know, it is not backported, but Marco would need to confirm it.

Michael.

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Ondrej Pokorny via Lazarus


On 05.05.2017 11:24, Graeme Geldenhuys via Lazarus wrote:
> On 2017-05-05 10:17, Michael Van Canneyt via Lazarus wrote:
>>> Something like:
>>>
>>>   sl.LoadFromFile('some_utf8_file.txt', CP_UTF8);
>>>   sl.LoadFromFile('some_utf16_file.txt', CP_UTF16);
>>>   sl.LoadFromFile('some_latin1_file.txt', CP_Latin1);
>>
>> Not yet. These are the exceptions I was talking about.
>
> That also means FPC 3.0.x is then seriously flawed. It
> supports Unicode, but it also doesn't support Unicode.
>
> So what is the suggested work-around for FPC 3.0.2 to load
> various text encoding files into a TStringList? Hopefully
> the answer is not: "there is none"  :-/

Use "DefaultSystemCodePage := CP_UTF8" and you can load any text in any 
encoding into TStrings without character loss - the file will be 
converted to UTF-8 in LoadFrom* and converted back in SaveTo*. So your 
code can handle all encodings equally.


There are no limitations and no problems whatsoever. Yes, FPC is fully 
unicode-ready - in case you are fine with using UTF-8 internally!
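
A minimal sketch of the setup described above, assuming FPC 3.0 or later. It uses the RTL procedure SetMultiByteConversionCodePage, which sets DefaultSystemCodePage; the program name and the two test bytes are illustrative and not taken from the thread.

```pascal
program Utf8DefaultCodePage;
{$mode objfpc}{$H+}

{$ifdef unix}
uses
  cwstring; // widestring manager, commonly needed for code page conversions on Unix
{$endif}

var
  s: AnsiString;
  u: UnicodeString;
begin
  // Same intent as "DefaultSystemCodePage := CP_UTF8" in the message above.
  SetMultiByteConversionCodePage(CP_UTF8);
  WriteLn('DefaultSystemCodePage = ', DefaultSystemCodePage);

  // Two raw bytes that together encode one character (U+00E4) in UTF-8.
  s := #$C3#$A4;

  // The implicit AnsiString -> UnicodeString conversion interprets the bytes
  // using the default system code page, which is now UTF-8.
  u := s;
  WriteLn('bytes: ', Length(s), ', UTF-16 code units: ', Length(u));
end.
```

The call has to run early, before any strings are converted, which is why it is usually placed at the very start of the program or in the initialization section of a unit that is listed first.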


Ondrej

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Graeme Geldenhuys via Lazarus

On 2017-05-05 10:17, Michael Van Canneyt via Lazarus wrote:
>> Something like: 
>>
>>  sl.LoadFromFile('some_utf8_file.txt', CP_UTF8);
>>  sl.LoadFromFile('some_utf16_file.txt', CP_UTF16);
>>  sl.LoadFromFile('some_latin1_file.txt', CP_Latin1);
>
> Not yet. These are the exceptions I was talking about.

That also means FPC 3.0.x is then seriously flawed. It
supports Unicode, but it also doesn't support Unicode.

So what is the suggested work-around for FPC 3.0.2 to load
various text encoding files into a TStringList? Hopefully
the answer is not: "there is none"  :-/

Because that will seriously impair/break INI usage too. The
first example off the top of my head. XML and JSON probably
too.

Regards,
  Graeme

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Michael Van Canneyt via Lazarus




On Fri, 5 May 2017, Graeme Geldenhuys via Lazarus wrote:

> On 2017-05-05 10:17, Michael Van Canneyt via Lazarus wrote:
>>> Something like:
>>>
>>>  sl.LoadFromFile('some_utf8_file.txt', CP_UTF8);
>>>  sl.LoadFromFile('some_utf16_file.txt', CP_UTF16);
>>>  sl.LoadFromFile('some_latin1_file.txt', CP_Latin1);
>>
>> Not yet. These are the exceptions I was talking about.
>
> That also means FPC 3.0.x is then seriously flawed. It
> supports Unicode, but it also doesn't support Unicode.
>
> So what is the suggested work-around for FPC 3.0.2 to load
> various text encoding files into a TStringList? Hopefully
> the answer is not: "there is none"  :-/

Use the plain pascal routines to read lines from a file,
fill stringlist. You can write a class helper for it.

> Because that will seriously impair/break INI usage too. The
> first example off the top of my head. XML and JSON probably
> too.

No. Those have been using widestring/UTF8string since day 1.

The main problem to switch the classes unit is backwards compatibility.
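
The workaround mentioned above (plain Pascal text file routines filling a string list) could look roughly like the sketch below. The helper name LoadLinesFromTextFile and its CodePage parameter are hypothetical. SetTextCodePage is assumed to be available (FPC 3.0 or later), and the sketch only covers byte-oriented encodings such as UTF-8 or Latin-1, not UTF-16 files.

```pascal
program ReadLinesDemo;
{$mode objfpc}{$H+}
uses
  {$ifdef unix}cwstring,{$endif} // widestring manager for conversions on Unix
  Classes;

// Hypothetical helper: reads a text file line by line into a string list.
procedure LoadLinesFromTextFile(const FileName: string; List: TStrings;
  CodePage: TSystemCodePage);
var
  f: TextFile;
  line: string;
begin
  AssignFile(f, FileName);
  // Declare which code page the bytes in the file use (assumes FPC >= 3.0).
  SetTextCodePage(f, CodePage);
  Reset(f);
  try
    while not EOF(f) do
    begin
      ReadLn(f, line);
      List.Add(line);
    end;
  finally
    CloseFile(f);
  end;
end;

var
  sl: TStringList;
begin
  sl := TStringList.Create;
  try
    // File name taken from the example in the thread; it must exist.
    LoadLinesFromTextFile('some_utf8_file.txt', sl, CP_UTF8);
    WriteLn(sl.Count, ' lines read');
  finally
    sl.Free;
  end;
end.
```

Wrapping the same loop in a class helper for TStrings, as suggested, would only change how it is called; the reading logic stays the same.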

Michael.

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Graeme Geldenhuys via Lazarus

On 2017-05-05 10:17, Ondrej Pokorny via Lazarus wrote:
> I don't know about 3.0.x but you can do it in trunk 3.1.1. I posted a 
> patch for it (r34475). 

Fantastic! Glad to see somebody was thinking in the same train of thought
as I did. :)

Is that scheduled to be back-ported to FPC 3.0.x?

Regards,
  Graeme


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Ondrej Pokorny via Lazarus


On 05.05.2017 11:17, Michael Van Canneyt via Lazarus wrote:
> On Fri, 5 May 2017, Graeme Geldenhuys via Lazarus wrote:
>> On 2017-05-05 07:43, Michael Van Canneyt via Lazarus wrote:
>>> As far as I know, you don't need any tricks to work with unicode
>>> filenames or output in 3.0.2. Maybe with exception of TStrings and
>>> TFileStream.
>>
>> Again, I didn't have time to follow FPC 3.x development much, and I was too
>> confused with all the Unicode changes.
>>
>> With FPC 3.0.x, can you now load text files from disk using TStringList and
>> specify the encoding of the file at load time?
>>
>> Something like:
>>  sl.LoadFromFile('some_utf8_file.txt', CP_UTF8);
>>  sl.LoadFromFile('some_utf16_file.txt', CP_UTF16);
>>  sl.LoadFromFile('some_latin1_file.txt', CP_Latin1);
>
> Not yet. These are the exceptions I was talking about.

Current trunk 3.1.1 can do that since r34475 - you applied it :) I don't
know if you ported it back to 3.0.x, though.

TFileStream can also open files with unicode names - at least on Windows
(since 3.0.0 if I am not mistaken). See

  Function FileCreate (Const FileName : UnicodeString; ShareMode : Integer; Rights : Integer) : THandle;

in rtl/win/sysutils.pas

-->> There are absolutely no limitations whatsoever, AFAIK. At least I
don't know any and I don't experience any.


Ondrej

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Michael Van Canneyt via Lazarus




On Fri, 5 May 2017, Ondrej Pokorny via Lazarus wrote:

> On 05.05.2017 11:06, Graeme Geldenhuys via Lazarus wrote:
>> On 2017-05-05 07:43, Michael Van Canneyt via Lazarus wrote:
>>> As far as I know, you don't need any tricks to work with unicode
>>> filenames or output in 3.0.2. Maybe with exception of TStrings and
>>> TFileStream.
>>
>> Again, I didn't have time to follow FPC 3.x development much, and I was too
>> confused with all the Unicode changes.
>>
>> With FPC 3.0.x, can you now load text files from disk using TStringList and
>> specify the encoding of the file at load time?
>>
>> Something like:
>>
>>    sl.LoadFromFile('some_utf8_file.txt', CP_UTF8);
>>    sl.LoadFromFile('some_utf16_file.txt', CP_UTF16);
>>    sl.LoadFromFile('some_latin1_file.txt', CP_Latin1);
>
> I don't know about 3.0.x but you can do it in trunk 3.1.1. I posted a
> patch for it (r34475). I also extended TEncoding to support AnsiString,
> which was the requirement for TStrings encoding support.

Yes, this somewhat alleviates the problem;
but this still is a single-byte TStrings, as opposed to the WideString
TStrings of Delphi. It's also still a single-byte filename argument.

Michael.

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Graeme Geldenhuys via Lazarus

On 2017-05-05 09:59, Michael Schnell via Lazarus wrote:
> (Most obvious drawback: not flexibly typed TStrings.)

I know not everybody likes Generics, but that is where I see
Generics could come in very handy. A single TStrings implementation
that supports multiple string types.

Or just implement a UTF-8 version. ;-)
On a side note:
  I have implemented a UTF-8 version of TStrings & TStringList somewhere
  on my hard drive.

Regards,
  Graeme

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Michael Van Canneyt via Lazarus




On Fri, 5 May 2017, Graeme Geldenhuys via Lazarus wrote:

> On 2017-05-05 07:43, Michael Van Canneyt via Lazarus wrote:
>> As far as I know, you don't need any tricks to work with unicode
>> filenames or output in 3.0.2. Maybe with exception of TStrings and
>> TFileStream.
>
> Again, I didn't have time to follow FPC 3.x development much, and I was too
> confused with all the Unicode changes.
>
> With FPC 3.0.x, can you now load text files from disk using TStringList and
> specify the encoding of the file at load time?
>
> Something like:
>
>  sl.LoadFromFile('some_utf8_file.txt', CP_UTF8);
>  sl.LoadFromFile('some_utf16_file.txt', CP_UTF16);
>  sl.LoadFromFile('some_latin1_file.txt', CP_Latin1);

Not yet. These are the exceptions I was talking about.
But the FileOpen, Assign, Reset, Write of plain pascal
do work with both Unicode and plain strings.

To fix the classes issues properly, we need a unicode RTL
and an ANSI RTL if we wish to remain backwards compatible:

The Strings[] property can have only 1 type.

Michael.

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Ondrej Pokorny via Lazarus


On 05.05.2017 11:06, Graeme Geldenhuys via Lazarus wrote:
> On 2017-05-05 07:43, Michael Van Canneyt via Lazarus wrote:
>> As far as I know, you don't need any tricks to work with unicode
>> filenames or output in 3.0.2. Maybe with exception of TStrings and
>> TFileStream.
>
> Again, I didn't have time to follow FPC 3.x development much, and I was too
> confused with all the Unicode changes.
>
> With FPC 3.0.x, can you now load text files from disk using TStringList and
> specify the encoding of the file at load time?
>
> Something like:
>
>    sl.LoadFromFile('some_utf8_file.txt', CP_UTF8);
>    sl.LoadFromFile('some_utf16_file.txt', CP_UTF16);
>    sl.LoadFromFile('some_latin1_file.txt', CP_Latin1);


I don't know about 3.0.x but you can do it in trunk 3.1.1. I posted a 
patch for it (r34475). I also extended TEncoding to support AnsiString, 
which was the requirement for TStrings encoding support.
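
For comparison with the CP_* pseudo-calls quoted above: the encoding-aware overloads take a TEncoding instance rather than a code page constant. The sketch below assumes an FPC version that already ships these overloads (trunk at the time of this thread, released versions later); the output file name is made up for illustration.

```pascal
program LoadWithEncoding;
{$mode objfpc}{$H+}
uses
  {$ifdef unix}cwstring,{$endif} // widestring manager for conversions on Unix
  Classes, SysUtils;

var
  sl: TStringList;
begin
  sl := TStringList.Create;
  try
    // Input file name from the thread's example; it must exist on disk.
    sl.LoadFromFile('some_utf8_file.txt', TEncoding.UTF8);

    // Write the same lines back as UTF-16 (TEncoding.Unicode).
    // 'copy_as_utf16.txt' is a hypothetical output name.
    sl.SaveToFile('copy_as_utf16.txt', TEncoding.Unicode);
  finally
    sl.Free;
  end;
end.
```

The strings held in the list are still single-byte AnsiStrings; the TEncoding argument only describes the bytes in the file.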


Ondrej

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Graeme Geldenhuys via Lazarus

On 2017-05-05 09:31, Kostas Michalopoulos via Lazarus wrote:
> After all, BMP does include practically all languages used today.

The bottom line:

   Unicode Standard <> BMP only!

If you think that, then rather promote your application as a UCS-2
compliant application, not a Unicode compliant application.

I can't remember my exact use case at the time, but the code-points
I needed to work with (using a data dump text file) were outside
the BMP range. I had to use a Java based text editor to correctly
edit the files.

Also, as Mattias said, the Emoji's, Musical notes, Scientific symbols,
Map symbols etc all fall outside the BMP.
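
To make the BMP point concrete, here is a small sketch (not from the thread) that stores one code point outside the BMP, U+1F600, in a UnicodeString. The program name is illustrative; the surrogate values follow from the UTF-16 encoding rules.

```pascal
program SurrogatePairDemo;
{$mode objfpc}{$H+}

var
  u: UnicodeString;
  s8: UTF8String;
begin
  // U+1F600 lies outside the BMP, so UTF-16 stores it as a surrogate pair.
  SetLength(u, 2);
  u[1] := WideChar($D83D); // high surrogate
  u[2] := WideChar($DE00); // low surrogate

  // Length counts 16-bit code units, not code points: one character, two units.
  WriteLn('UTF-16 code units: ', Length(u));

  // The same single code point takes four bytes in UTF-8.
  s8 := UTF8Encode(u);
  WriteLn('UTF-8 bytes: ', Length(s8));
end.
```

Code that indexes u[i] and treats each element as a full character will split such a pair, which is the failure mode described for editors that assume fixed-width UTF-16.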

Regards,
  Graeme

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Graeme Geldenhuys via Lazarus

On 2017-05-05 07:43, Michael Van Canneyt via Lazarus wrote:
> As far as I know, you don't need any tricks to work with unicode
> filenames or output in 3.0.2. Maybe with exception of TStrings and
> TFileStream.

Again, I didn't have time to follow FPC 3.x development much, and I was too
confused with all the Unicode changes.

With FPC 3.0.x, can you now load text files from disk using TStringList and
specify the encoding of the file at load time?

Something like:  

  sl.LoadFromFile('some_utf8_file.txt', CP_UTF8);
  sl.LoadFromFile('some_utf16_file.txt', CP_UTF16);
  sl.LoadFromFile('some_latin1_file.txt', CP_Latin1);

etc

Regards,
  Graeme

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Graeme Geldenhuys via Lazarus

On 2017-05-05 00:15, Mattias Gaertner via Lazarus wrote:

> I added a FAQ:
> http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus#What_happens_when_I_use_.24codepage_utf8.3F

Ah, thanks for that explanation.

> AFAIK you are using UTF-8 in AnsiString in FPC 2.6.4. That works in
> many cases, because of double fooling the compiler. This trick does not
> work on Windows with RTL file functions though.

Yes and true, but fpGUI supplies its own "wrapper" RTL file functions, thus
it works 100% on all platforms for years. I believe LCL used to do the same.

RawByteString type (yet another string type in FPC & Delphi's arsenal) did
not exist at the time, otherwise I would probably have defined...

  TfpgString = RawByteString;

and used that everywhere.

> Of course it would be nicer, if we don't need tricks to get Unicode.

Indeed, and that is why I love solutions implemented by Java and Qt
Framework. They are simple, it works and not confusing. Even Embarcadero
is doing some string type clean-up. Their new Linux compiler completely
removed AnsiString support. After all, why do you need any other
string types when you support the Unicode standard.

Regards,
  Graeme

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Michael Schnell via Lazarus


On 04.05.2017 16:56, Juha Manninen via Lazarus wrote:
> I believe everybody is happy to get rid of the horrendous Windows

If this is true, there is a decent need for backwards compatibility.
That is why, theoretically, code aware strings are a good idea.
Unfortunately the implementation of those, IMHO, is abysmal, in
Delphi as well as in fpc. (Most obvious drawback: not flexibly typed TStrings.)


-Michael

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Mattias Gaertner via Lazarus

On Fri, 5 May 2017 11:31:00 +0300
Kostas Michalopoulos via Lazarus  wrote:

>[...]
> To play the devil's advocate, the fact that ALL reviews said that it has
> excellent support for Unicode means that characters outside the BMP *are*
> rare. After all, BMP does include practically all languages used today.
> 
> I mean, it isn't technically correct, it is just that in practice it is
> good enough for a very large number of tasks.

Devil's advocate: The new emojis are outside BMP.

Mattias

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Kostas Michalopoulos via Lazarus

On Thu, May 4, 2017 at 8:53 PM, Graeme Geldenhuys via Lazarus <
lazarus@lists.lazarus-ide.org> wrote:

> On 2017-05-04 15:56, Juha Manninen via Lazarus wrote:
> > I have seen comments saying that treating UTF-16 as fixed width
> > encoding is OK because the characters outside BMP are so rare. It is
> > like saying that a buggy spreadsheet app is OK because it calculates
> > the sums wrong only sometimes. IMO such people should not do
> > programming.
>
> +1
> I purchased a commercial text editor renowned for having excellent
> Unicode support - at least that is what ALL the reviews said. Umm
> yeah, to my disappointment it internally uses UTF-16 (because it is
> written in Delphi), and treats UTF-16 as 2-byte fixed width! WTF!
>


To play the devil's advocate, the fact that ALL reviews said that it has
excellent support for Unicode means that characters outside the BMP *are*
rare. After all, BMP does include practically all languages used today.

I mean, it isn't technically correct, it is just that in practice it is
good enough for a very large number of tasks.

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-04 Thread Michael Van Canneyt via Lazarus




On Fri, 5 May 2017, Mattias Gaertner via Lazarus wrote:

>> I simply can't see myself moving past FPC 2.6.4 at this point. FPC 3.x just
>> doesn't make any sense.
>
> AFAIK you are using UTF-8 in AnsiString in FPC 2.6.4. That works in
> many cases, because of double fooling the compiler. This trick does not
> work on Windows with RTL file functions though.
> The good news is that the same trick works in FPC 3.0. And with some new
> tricks it is now possible to make the RTL file functions support UTF-8.
> Of course it would be nicer, if we don't need tricks to get Unicode.

What tricks do you still need in 3.0.x ?

As far as I know, you don't need any tricks to work with unicode filenames
or output in 3.0.2. Maybe with exception of TStrings and TFileStream.

Michael.

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-04 Thread Mattias Gaertner via Lazarus

On Thu, 4 May 2017 23:20:33 +0100
Graeme Geldenhuys via Lazarus  wrote:

>[...]
> Since FPC 3.0 you must add the flag -FcUTF8 or add {$codepage UTF8} at the 
> beginning of the unit.
> 
>   * Note: This changes all string literals to UTF-16, increasing the size of 
> the binary and slowing it down. That's why Lazarus does not add it by default.
> "
> 
> That makes NO sense to me. My units are always saved as UTF-8 encoded text. 
> By helping the compiler out by explicitly telling it my files are UTF-8 
> encoded using -FcUTF8 or adding {$codepage UTF8} or saving the unit with a 
> BOM marker breaks writeln() output under Linux/FreeBSD. Who knows what else 
> it breaks. Apparently the breakage is because of the "NOTE" quoted above. Why 
> the hell does FPC consider string literals UTF-16 when I explicitly told it 
> the whole unit is UTF-8 encoded? FPC is doing the opposite of what I told it!

I added a FAQ:
http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus#What_happens_when_I_use_.24codepage_utf8.3F

 
> I simply can't see myself moving past FPC 2.6.4 at this point. FPC 3.x just 
> doesn't make any sense.

AFAIK you are using UTF-8 in AnsiString in FPC 2.6.4. That works in
many cases, because of double fooling the compiler. This trick does not
work on Windows with RTL file functions though. 
The good news is that the same trick works in FPC 3.0. And with some new
tricks it is now possible to make the RTL file functions support UTF-8.
Of course it would be nicer, if we don't need tricks to get Unicode.

Mattias

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-04 Thread Graeme Geldenhuys via Lazarus

On 2017-05-04 21:53, Juha Manninen via Lazarus wrote:
> It is briefly explained here:

I haven't been following FPC 3.x development much because I think the Unicode 
changes are terribly confusing.

A case in point. Looking at the Wiki page you listed, I read the following:

"
Since FPC 3.0 you must add the flag -FcUTF8 or add {$codepage UTF8} at the 
beginning of the unit.

  * Note: This changes all string literals to UTF-16, increasing the size of 
the binary and slowing it down. That's why Lazarus does not add it by default.
"

That makes NO sense to me. My units are always saved as UTF-8 encoded text. By 
helping the compiler out by explicitly telling it my files are UTF-8 encoded 
using -FcUTF8 or adding {$codepage UTF8} or saving the unit with a BOM marker 
breaks writeln() output under Linux/FreeBSD. Who knows what else it breaks. 
Apparently the breakage is because of the "NOTE" quoted above. Why the hell 
does FPC consider string literals UTF-16 when I explicitly told it the whole 
unit is UTF-8 encoded? FPC is doing the opposite of what I told it!

I simply can't see myself moving past FPC 2.6.4 at this point. FPC 3.x just 
doesn't make any sense.

Regards,
  Graeme


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-04 Thread Juha Manninen via Lazarus

On Thu, May 4, 2017 at 8:53 PM, Graeme Geldenhuys via Lazarus
 wrote:
> You made me curious, so I want to take a look. Hopefully it doesn’t
> depend too heavily on the rest of LCL, so I’ll be able to use it in
> other projects of mine.

It has no dependency on LCL, it is part of LazUtils package just like
the other Unicode stuff.
It can be used also with Delphi by copying 2 unit files.
It is briefly explained here:
 
http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus#CodePoint_functions_for_encoding_agnostic_code

Juha

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-04 Thread Graeme Geldenhuys via Lazarus

On 2017-05-04 15:56, Juha Manninen via Lazarus wrote:
> I have seen comments saying that treating UTF-16 as fixed width 
> encoding is OK because the characters outside BMP are so rare. It is 
> like saying that a buggy spreadsheet app is OK because it calculates 
> the sums wrong only sometimes. IMO such people should not do
> programming.

+1
I purchased a commercial text editor renowned for having excellent
Unicode support - at least that is what ALL the reviews said. Umm
yeah, to my disappointment it internally uses UTF-16 (because it is
written in Delphi), and treats UTF-16 as 2-byte fixed width! WTF!

 
> I have not seen any feedback or comments about LazUnicode so far. I
> guess it means that nobody uses it. :(

You made me curious, so I want to take a look. Hopefully it doesn’t
depend too heavily on the rest of LCL, so I’ll be able to use it in
other projects of mine.

Regards,
  Graeme



Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-04 Thread wkitty42--- via Lazarus


On 05/04/2017 10:56 AM, Juha Manninen via Lazarus wrote:
> On Thu, May 4, 2017 at 2:47 PM, wkitty42--- via Lazarus
>  wrote:
>> On 05/03/2017 05:21 AM, Juha Manninen via Lazarus wrote:
>>> Encoding does not matter any more, as long as it is Unicode.
>>
>> reminds me of a saying that is attributed to Henry Ford...
>> Any customer can have a car painted any color that he wants so long as it is
>> black.
>
> Ok, maybe my wording was not good.

i was making a joke... my apologies if it didn't make the great divide... you're
good! :) :) :)




--
 NOTE: No off-list assistance is given without prior approval.
   *Please keep mailing list traffic on the list* unless
   private contact is specifically requested and granted.

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-04 Thread Juha Manninen via Lazarus

On Thu, May 4, 2017 at 2:47 PM, wkitty42--- via Lazarus
 wrote:
> On 05/03/2017 05:21 AM, Juha Manninen via Lazarus wrote:
>> Encoding does not matter any more, as long as it is Unicode.
>
> reminds me of a saying that is attributed to Henry Ford...
> Any customer can have a car painted any color that he wants so long as it is
> black.

Ok, maybe my wording was not good.
I believe everybody is happy to get rid of the horrendous Windows
system codepages and the question marks in text.
Unicode is a good thing!
The LazUnicode unit provides a solution for encoding agnostic code.
Yes. It is no joke.

There have been many wars about UTF-8 <> UTF-16 encoding supremacy.
They look ridiculous to anybody who knows the complexity of Unicode.
The complexity is not related to encodings; it is at the abstract
Unicode level, where encodings have no effect.

Currently Delphi and Lazarus use different encodings. Some people see
it as a problem. LazUnicode proves it is not a problem. You can write
code that works with both. Codepoints in both encodings have variable
width, thus they must use the same fundamental concepts. It was easy
to write functions to support them both.
However, LazUnicode does not solve the problems at the abstract Unicode
level and nobody claims it does.
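
To illustrate the point about variable width, here is a minimal sketch. It is written in Python purely for demonstration and is not the LazUnicode API; the function name `utf16_codepoints` is made up for this example. It walks UTF-16 code units and joins surrogate pairs, which is the same kind of step a UTF-8 iterator has to take with multi-byte sequences.

```python
import struct

def utf16_codepoints(data: bytes):
    """Yield code points from UTF-16LE bytes, joining surrogate pairs."""
    units = struct.unpack("<%dH" % (len(data) // 2), data)
    i = 0
    while i < len(units):
        u = units[i]
        # A high surrogate followed by a low surrogate encodes one code point.
        if (0xD800 <= u <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            yield 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        else:
            yield u
            i += 1

text = "a\u00e4\U0001F600"          # 'a', 'ä', and one code point outside the BMP
utf16 = text.encode("utf-16-le")
utf8 = text.encode("utf-8")

code_units_16 = len(utf16) // 2     # 4 code units for 3 code points
bytes_8 = len(utf8)                 # 1 + 2 + 4 = 7 bytes for 3 code points
decoded = list(utf16_codepoints(utf16))
```

Neither count matches the number of code points, so code that treats either encoding as fixed width will miscount as soon as a character outside the BMP (UTF-16) or outside ASCII (UTF-8) shows up.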

Another fact is that lots of UTF-16 code out there is broken because
people treat UTF-16 as a fixed-width encoding for some reason. Using
the LazUnicode unit improves the situation. The code will inevitably work
right with both encodings.
I have seen comments saying that treating UTF-16 as a fixed-width
encoding is OK because the characters outside the BMP are so rare. It is
like saying that a buggy spreadsheet app is OK because it calculates
the sums wrong only sometimes.
IMO such people should not do programming.

I have not seen any feedback or comments about LazUnicode so far.
I guess it means that nobody uses it. :(

Juha

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-04 Thread wkitty42--- via Lazarus


On 05/03/2017 05:21 AM, Juha Manninen via Lazarus wrote:
> Encoding does not matter any more, as long as it is Unicode.

reminds me of a saying that is attributed to Henry Ford...

Any customer can have a car painted any color that he wants so long as it is
black.

:)


--
 NOTE: No off-list assistance is given without prior approval.
   *Please keep mailing list traffic on the list* unless
   private contact is specifically requested and granted.

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-04 Thread Mattias Gaertner via Lazarus

On Thu, 4 May 2017 09:56:18 +0100
Tony Whyman via Lazarus  wrote:

>[...]
> I don't believe that string indexing even works for UTF8 strings at 
> present - at least not in a simple s[i] way.

It exists the same as for UTF-16 strings.

> Is it really that much overhead to have a simple codepage check before 
> calling the correct function to index a string? The obvious optimisation 
> would be to check for UTF8, then UTF16 then the Default codepage and 
> then the rest. Or perhaps UTF16 first for Windows. With register level 
> code you are talking about very few actual machine level operations.

The char type does not fit widechar. You would need widechar.

And in most cases the [] are used in loops. The compiler would have
to add checks on each access. It would be faster to convert the string
to UnicodeString at the beginning and back at the end.
That is a technique many RTL functions use to support any string type.
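
The convert-once idea can be sketched as follows. Python is used only for illustration, and `upper_any` is a made-up name, not an RTL function: the input is converted to one internal form up front, all work happens on that form, and the result is converted back once at the end, so no per-access encoding check is needed inside the loop.

```python
def upper_any(raw: bytes, encoding: str) -> bytes:
    """Uppercase text in any supported encoding via a single round trip."""
    text = raw.decode(encoding)       # convert once at the beginning
    result = text.upper()             # all work happens on one internal form
    return result.encode(encoding)    # convert back once at the end

word = "gr\u00fcn"                    # 'grün'
as_utf8 = upper_any(word.encode("utf-8"), "utf-8")
as_latin1 = upper_any(word.encode("latin-1"), "latin-1")
as_utf16 = upper_any(word.encode("utf-16-le"), "utf-16-le")
```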

> To me, a unified string type would have the advantage that:
> 
> - You would only have one managed string type "string" (and hence avoids 
> the confusion that exists today).

You can avoid the confusion by using only one string encoding,
either UTF-8 or UTF-16. The problem is that existing libraries often
support only one.

>[...]
> - The only time that a programmer has to think about the character 
> encoding is when writing code that interacts directly with an external 
> interface.

That's already possible. With LazUTF8.
The problem is legacy code and sharing code with Delphi.

>[...]

Mattias

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-04 Thread Graeme Geldenhuys via Lazarus

On 2017-05-04 09:56, Tony Whyman via Lazarus wrote:
> I don't believe that string indexing even works for UTF8 strings at 
> present - at least not in a simple s[i] way.

It's simple: STOP using array indexing into strings. It doesn't work for
Unicode! Use specialised code-point iterators or something similar instead.

If you expect a Byte value from s[i] then fine, but if you expect a
"character" (like something you see on the screen), then no it will
never work. Why?  See below:

* UTF-16 will return a 2-byte value which isn't big enough to cover the
full Unicode range, i.e. the BMP and the planes above it.

* UTF-8 will return a 1-byte value which again isn't big enough to cover
all possible code points in Unicode. For UTF-8 it could be anything from
1-4 bytes.

* A "character seen on the screen" could be made up of multiple code
points. eg: U+0065 (e) + U+0302 (^) gives you ê. So although it might look
like one "character", it is *not*. How is array indexing into a string
supposed to handle this? It can't, unless it first normalises all
Unicode strings, but even that will not work in all cases - because not
all combining code points can be normalised.
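
A small sketch of the last point, using Python's standard `unicodedata` module for illustration only (nothing here is FPC or LazUtils code). The pair e + U+0302 has a precomposed form, so NFC normalisation merges it into one code point. The pair q + U+0302 has no precomposed form, so it stays at two code points even after normalisation.

```python
import unicodedata

decomposed = "e\u0302"                       # U+0065 + U+0302, renders as ê
composed = unicodedata.normalize("NFC", decomposed)

# NFC merges the pair into the single code point U+00EA.
len_before = len(decomposed)                 # 2 code points
len_after = len(composed)                    # 1 code point

# There is no precomposed "q with circumflex", so NFC leaves two code points.
stubborn = unicodedata.normalize("NFC", "q\u0302")
len_stubborn = len(stubborn)                 # still 2
```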

Regards,
  Graeme

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-04 Thread Graeme Geldenhuys via Lazarus

On 2017-05-04 10:15, Graeme Geldenhuys via Lazarus wrote:
> * A "character seen on the screen" could be made up of multiple code
> points. eg: U+0065 (e) + U+0302 (^) gives you ê. So it might look like
> one "character", it is *not*.

Applying better typography to that representation would yield:

U+0065 (e) + U+0302 (̂◌) = ê.

The “DejaVu Sans” font will render the above correctly.

And here is a list of Combining Diacritics, and as I mentioned before,
not all of them can be normalised into a single code-point “character”.
Also some languages even use multiple combining diacritics for a single
"character on the screen".

  https://en.wikipedia.org/wiki/Combining_Diacritical_Marks

Regards,
  Graeme


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-04 Thread Tony Whyman via Lazarus


On 03/05/17 17:53, Sven Barth via Lazarus wrote:
> On 03.05.2017 at 14:37, "Tony Whyman via Lazarus"
> <lazarus@lists.lazarus-ide.org> wrote:
>> On the other hand, AnsiString and UnicodeString are still separate
>> types. Why? Why should there not be a single unified string type with
>> (e.g.) ASCII, UTF8 and UTF-16 (or MS Unicode) being just another code
>> page?
>
> Because indexed access to the string data would slow down quite a bit
> as the RTL would need to determine whether the string is a 1-Byte,
> 2-Byte, 4-Byte or multi Byte String. Yes the compiler could do
> optimizations for this inside loops, but it would definitely slow down
> -O- code.
>
> Regards,
> Sven





I don't believe that string indexing even works for UTF8 strings at 
present - at least not in a simple s[i] way.


Is it really that much overhead to have a simple codepage check before 
calling the correct function to index a string? The obvious optimisation 
would be to check for UTF8, then UTF16 then the Default codepage and 
then the rest. Or perhaps UTF16 first for Windows. With register level 
code you are talking about very few actual machine level operations.


To me, a unified string type would have the advantage that:

- You would only have one managed string type "string" (and hence avoids 
the confusion that exists today).


- You would have standard string byte length and string character length 
functions (which yes, in the latter case, would have to have a codepage 
check as above).


- String indexing could be standardised as always returning the 
character at position 'i' (including UTF8 strings - albeit after having 
to "walk" the string).


- Automatic transliteration on string compare (with code page check of 
course) - and perhaps with the option to specify a non-standard collation.


- Readily portable code.

- The only time that a programmer has to think about the character 
encoding is when writing code that interacts directly with an external 
interface.
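
The cost of "walking" a UTF-8 string, as mentioned in the list above, can be sketched like this. It is a Python illustration under the assumption of valid UTF-8 input; `utf8_codepoint_at` is a hypothetical helper, not an existing RTL or LazUTF8 function. Finding code point i means scanning from the start, so each lookup is linear in the string length, while a raw byte index is constant time but returns only one code unit.

```python
def utf8_codepoint_at(data: bytes, index: int) -> str:
    """Return the index-th code point (0-based) of valid UTF-8 bytes by scanning."""
    if index < 0:
        raise IndexError("code point index out of range")
    seen = -1
    start = None
    for pos, byte in enumerate(data):
        # Bytes of the form 10xxxxxx are continuation bytes;
        # every other byte starts a new code point.
        if byte & 0xC0 != 0x80:
            seen += 1
            if seen == index:
                start = pos
            elif seen == index + 1:
                return data[start:pos].decode("utf-8")
    if start is None:
        raise IndexError("code point index out of range")
    return data[start:].decode("utf-8")

raw = "na\u00efve \u2603".encode("utf-8")   # 7 code points in 10 bytes
third = utf8_codepoint_at(raw, 2)           # the 2-byte 'ï'
last = utf8_codepoint_at(raw, 6)            # the 3-byte snowman
raw_byte = raw[2]                           # plain byte index: only the lead byte
```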


How often would that extra lookup be significant compared with the 
benefits that unified string handling would bring? And, there is no 
reason why you could not retain the UnicodeString type for cases where 
you really need to optimise UTF16 handling.


I see the unified string type as a further extension to AnsiString to 
include UTF16 and UCS2 code pages together with appropriate function 
support.


Tony



Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-04 Thread Juha Manninen via Lazarus

On Thu, May 4, 2017 at 12:55 AM, Graeme Geldenhuys via Lazarus
 wrote:
> On 2017-05-03 20:47, Juha Manninen via Lazarus wrote:
>> If you share and edit the sources between Delphi and Lazarus then you
>> cannot use the full Unicode.
>
> Quite comical considering that the FPC team always makes such a big fuss
> about "we want Delphi compatibility", and now it seems to be worse than
> ever before.

Well, the code + its string literals are compatible. If you copy/paste
them, they compile and work. Now only this small invisible BOM screws
up things.

Actually the IDE could have an option to remove the BOM. No scripts or
packages are needed for that. It should be easy to implement. I will
look at it later, hold on ...

Juha

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-03 Thread Marcos Douglas B. Santos via Lazarus

On Wed, May 3, 2017 at 9:37 AM, Mattias Gaertner via Lazarus
 wrote:
>> Hmmm... why does FPC not understand the BOM?
>
> It does. And so does Delphi. But with and without BOM have different
> meanings.
>
>
>> > You are right, that using Unicode with Lazarus only needs a couple of
>> > rules to follow. Sharing code with Delphi adds a few more rules.
>>
>> One valid choice is to edit in Lazarus and copy to Delphi only to be
>> built. I understood Marcos Douglas planned something like that.
>> If code really must be edited in both, how to solve it?
>
> That's what I tried to explain to Marcos and what you described as
> "That must be very confusing."
>
>
>> This was another complication I did not think about. :(
>>
>> About the couple of rules to follow, I had these in mind:
>>  1. Normally use type "String".
>>  2. Assign a constant always to a type String variable.
>>  3. Use type UnicodeString explicitly for API calls that need it.
>
> 4. When sharing code with Delphi use BOM or use only ASCII constants.
> You can choose for each unit. Either way load texts using
> resourcestring or similar techniques.
>
> 5. What about Char and aString[]?
> 6. What about PChar?

I got it.

Best regards,
Marcos Douglas

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-03 Thread Marcos Douglas B. Santos via Lazarus

On Wed, May 3, 2017 at 6:13 AM, Ondrej Pokorny via Lazarus
 wrote:
> Not if you need pre-unicode Delphi support :)
>
> (Well, Marcos didn't specify what Delphi version he wants to target but he
> stated "If Delphi sources don't use UTF8 [...]", which applies to
> pre-unicode Delphi versions.)

Yeah, sorry. I said that because I didn't know which encoding Delphi
is using nowadays.
But I would like to use only Delphi with Unicode support with
"dot.unit.support".

Marcos Douglas

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-03 Thread Graeme Geldenhuys via Lazarus

On 2017-05-03 20:47, Juha Manninen via Lazarus wrote:
> If you share and edit the sources between Delphi and Lazarus then you
> cannot use the full Unicode. 

Quite comical considering that the FPC team always makes such a big fuss
about "we want Delphi compatibility", and now it seems to be worse than
ever before.

Regards,
  Graeme


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-03 Thread Sven Barth via Lazarus

On 03.05.2017 21:47, Juha Manninen via Lazarus wrote:
> Why Delphi adds the BOM? Why can't it just read the file and
> understand it is UTF-8?

Probably for the same reason as FPC: the default code page if no BOM is
available and no command line option is set and no $codepage directive
is found is ISO-8859-1.
A BOM does the same as the command line option -FcUTF8 or the directive
{$codepage utf8}, namely switching the source codepage to UTF-8. Only
then string constants that contain UTF-8 characters are converted to
UnicodeString constants.

Regards,
Sven

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-03 Thread Ondrej Pokorny via Lazarus


On 03.05.2017 21:47, Juha Manninen via Lazarus wrote:
> How many people are editing their sources in both Delphi and Lazarus?


Me, but I keep the files ASCII-only because I need to target all Delphi 
versions down to D5 :/ My customers really demand it, unfortunately. I'd 
like to kill these dinosaurs, believe me - not the customers but the 
thousand different Delphi versions from ancient eras, during the best 
period even 2 of them in a year :)


I still feel that code should be in English, including comments - and I 
am a patriot, even if it is not popular nowadays :). The utf-8 bom seems 
to be a stupid issue, though.


Ondrej

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-03 Thread Juha Manninen via Lazarus

On Wed, May 3, 2017 at 12:03 PM, Juha Manninen
 wrote:
> Marcos Douglas B. Santos wrote:
>> But if I put theses constants as resourcestrings, it's Ok as Mattias
>> told me, right?
>
> I don't think it makes any difference. You can use the full Unicode in
> both cases.

I stand corrected.
If you share and edit the sources between Delphi and Lazarus then you
cannot use the full Unicode. Resourcestrings work for translated
Unicode texts.
If you edit only with Lazarus and copy the files for Delphi
compilation then you can use full Unicode.

Actually I now remember testing with UTF-8 BOM and noticing the same
issue but I forgot already. This is an unfortunate side-effect of the
solution being a hack.

It is so close to being easily portable.
Why does Delphi add the BOM? Why can't it just read the file and
understand it is UTF-8?
If somebody is interested, this could be solved by removing the BOM
from all files by a script before opening again in Lazarus. It could
even be a plugin package in Lazarus. It would be relatively easy to
implement.
How many people are editing their sources in both Delphi and Lazarus?
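
A script of the kind suggested above could look roughly like this. It is a minimal Python sketch, not an existing Lazarus tool, and the function names are made up. It assumes the only unwanted bytes are the three-byte UTF-8 BOM (EF BB BF) at the very start of the file.

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

def strip_bom_in_file(path: str) -> bool:
    """Rewrite the file without its BOM; return True if the file was changed."""
    with open(path, "rb") as f:
        original = f.read()
    cleaned = strip_utf8_bom(original)
    if cleaned != original:
        with open(path, "wb") as f:
            f.write(cleaned)
        return True
    return False
```

Working on bytes rather than decoded text keeps the rest of the file untouched, so the script cannot change line endings or re-encode anything by accident.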

Juha

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-03 Thread Sven Barth via Lazarus

On 03.05.2017 at 14:37, "Tony Whyman via Lazarus"
<lazarus@lists.lazarus-ide.org> wrote:
> On the other hand, AnsiString and UnicodeString are still separate types.
> Why? Why should there not be a single unified string type with (e.g.)
> ASCII, UTF8 and UTF-16 (or MS Unicode) being just another code page?

Because indexed access to the string data would slow down quite a bit as
the RTL would need to determine whether the string is a 1-Byte, 2-Byte,
4-Byte or multi Byte String. Yes the compiler could do optimizations for
this inside loops, but it would definitely slow down -O- code.

Regards,
Sven

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-03 Thread Sven Barth via Lazarus

On 03.05.2017 at 11:34, "Graeme Geldenhuys via Lazarus"
<lazarus@lists.lazarus-ide.org> wrote:
> For example, take a look at ConEmu for Windows.
>   * Tab support built-in
>   * Resizeable console windows

While not point and click you can resize console windows (and the window
buffer) without problems. And it's even remembered *per shortcut*!

>   * User defined encoding per console window

While not selectable upon start (I think) it can easily be changed with a
command.

>   * Font choice

Supported since at least XP

>   * better mouse & clipboard support

If you use the PowerShell window that experience is already vastly
improved. Could be that this is the case for the default console window in
Windows 10 as well.

>   * User defined "console engine" per window or tab.
> eg: I can have Bash run in one tab and the standard
>  windows console in another.

No one stops you from running a different shell in a command window (e.g. I
often switch to cmd inside PowerShell when some program gives me trouble
with the latter).

>   * color customisation

Already supported since at least Windows 7 (can't currently check older
versions).

> I don't know why anybody would still want to run the standard Windows
> console - it is 20 years behind everybody else.

Because it does its job and it's part of the OS without the need to install
some 3rd party application.

Regards,
Sven

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-03 Thread Graeme Geldenhuys via Lazarus

On 2017-05-03 13:37, Tony Whyman via Lazarus wrote:
> Is Delphi/FPC string handling that much worse than 'C'?

I can’t answer about C, but compared to Java and Qt’s solution, Delphi
and FPC’s solutions are terrible and very confusing.

Regards,
  Graeme


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-03 Thread Martok via Lazarus

On 03.05.2017 at 11:03, Juha Manninen via Lazarus wrote:
> How could this thing be communicated so that people understand?
It would probably help if there weren't three different pages about "Unicode
Support" on the wiki, all saying slightly different and conflicting things
(because they talk about different things, but that's really not obvious unless
you already know) and decidedly *not* saying what a user might want to know...

Maybe split the technical internals from a "simpler" user's guide?


Martok


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-03 Thread Mattias Gaertner via Lazarus

On Wed, 3 May 2017 13:37:24 +0100
Tony Whyman via Lazarus  wrote:

>[...]
> On the other hand, AnsiString and UnicodeString are still separate 
> types. Why? Why should there not be a single unified string type with 
> (e.g.) ASCII, UTF8 and UTF-16 (or MS Unicode) being just another code page?

Many 8bit string functions work for any 8bit encoding, and so do many
16bit string functions for any 16bit encoding. But almost no string
function works for both.

Mattias

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-03 Thread Tony Whyman via Lazarus


On 03/05/17 09:52, Graeme Geldenhuys via Lazarus wrote:
> [rant]
> ps:
>    Both FPC and Delphi is in such a messed up state when it comes to
>    string and character types. It is the laughing stock of programming
>    languages at the moment. At least EMBT is heading in the right
>    direction with their Linux Delphi compiler - they completely removed
>    AnsiString.
>
>    FPC and Delphi can learn a huge lesson from Java and Qt in how to
>    handle string and character types.
> [/rant]
>
> Regards,
>    Graeme

Is Delphi/FPC string handling that much worse than 'C'?

To me, the great thing about AnsiString is that it provides unified 
handling of UTF8 and legacy codepages in a single managed type by 
including the code page id as a dynamic property of the string.


On the other hand, AnsiString and UnicodeString are still separate 
types. Why? Why should there not be a single unified string type with 
(e.g.) ASCII, UTF8 and UTF-16 (or MS Unicode) being just another code page?


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-03 Thread Mattias Gaertner via Lazarus

On Wed, 3 May 2017 15:15:53 +0300
Juha Manninen via Lazarus  wrote:

>[...]
> > Back in Lazarus compiling such a file gives the error:
> > LazUnicodeTest.lpr(28,10) Error: UTF-8 code greater than 65535 found  
> 
> Äh, I did not test moving it back to Lazarus.

Well, that's the point of sharing code, isn't it?


> Hmmm... why does FPC not understand the BOM?

It does. And so does Delphi. But with and without BOM have different
meanings.

 
> > You are right, that using Unicode with Lazarus only needs a couple of
> > rules to follow. Sharing code with Delphi adds a few more rules.  
> 
> One valid choice is to edit in Lazarus and copy to Delphi only to be
> built. I understood Marcos Douglas planned something like that.
> If code really must be edited in both, how to solve it?

That's what I tried to explain to Marcos and what you described as
"That must be very confusing."


> This was another complication I did not think about. :(
> 
> About the couple of rules to follow, I had these in mind:
>  1. Normally use type "String".
>  2. Assign a constant always to a type String variable.
>  3. Use type UnicodeString explicitly for API calls that need it.

4. When sharing code with Delphi use BOM or use only ASCII constants.
You can choose for each unit. Either way load texts using
resourcestring or similar techniques.

5. What about Char and aString[]?
6. What about PChar?

 
> They are not listed as a short list in wiki. The wiki page is more detailed.
> Maybe they should be listed as a "Getting started" paragraph.

Good idea.

Mattias

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-03 Thread Marcos Douglas B. Santos via Lazarus

On Wed, May 3, 2017 at 5:59 AM, Graeme Geldenhuys via Lazarus
 wrote:
> On 2017-05-03 01:21, Marcos Douglas B. Santos via Lazarus wrote:
>> Sorry about that. I stopped using Delphi at version 7, that uses ANSI.
>> I thought that Delphi nowadays was using UTF16.
>
> They (Delphi) loves to follow Microsoft. Files are stored in UTF-8 (this
> is the norm), but they use UTF-16 internally.
>
> Lazarus stores files in UTF-8 and uses UTF-8 internally.
>
> Some background info
> 
> UTF-16 was the first encoding implementation for Unicode - at a time
> when they thought 2-bytes will be big enough for everything. They were
> wrong. So then they invented UTF-8 to solve the problem. But by that
> time Microsoft already standardised on UTF-16, so Delphi followed suite.
> Linux, FreeBSD etc saw the light and used UTF-8 instead.

Ok, thank you for this information.

best regards,
Marcos Douglas

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-03 Thread Juha Manninen via Lazarus

On Wed, May 3, 2017 at 2:43 PM, Mattias Gaertner via Lazarus
 wrote:
> First it did not compile in Delphi, because of an unsupported inline. I
> fix that with an IFDEF FPC.

Right, I had added that after testing with Delphi.
The inline looks good to me, don't know why Delphi does not like it.

> Then it runs. The output is somewhat hard to interpret as the
> Windows console shows many chars as '?' and the writelns do not
> explain what it is supposed to show.
> The good news is that it works.

Yes, I guess it should be a GUI app. The console is the only place in
Windows that still does not support Unicode. Damn MS, they have supported
Unicode in other APIs and apps for nearly 20 years already.

> The bad news is, that it only works because Delphi silently altered
> the source file and added the BOM.
> Back in Lazarus compiling such a file gives the error:
> LazUnicodeTest.lpr(28,10) Error: UTF-8 code greater than 65535 found

Äh, I did not test moving it back to Lazarus.
Hmmm... why does FPC not understand the BOM?

> You are right, that using Unicode with Lazarus only needs a couple of
> rules to follow. Sharing code with Delphi adds a few more rules.

One valid choice is to edit in Lazarus and copy to Delphi only to be
built. I understood Marcos Douglas planned something like that.
If code really must be edited in both, how to solve it? This was
another complication I did not think about. :(

About the couple of rules to follow, I had these in mind:
 1. Normally use type "String".
 2. Assign a constant always to a type String variable.
 3. Use type UnicodeString explicitly for API calls that need it.

They are not listed as a short list in the wiki. The wiki page is more detailed.
Maybe they should be listed as a "Getting started" paragraph.

Juha

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-03 Thread Mattias Gaertner via Lazarus

On Wed, 3 May 2017 12:03:41 +0300
Juha Manninen via Lazarus  wrote:

>[...]
> Please also look at program LazUnicodeTest in components/lazutils/test/.
> It does advanced Unicode stuff and works in both Delphi and Lazarus.

I tried it:

First it did not compile in Delphi, because of an unsupported inline. I
fixed that with an IFDEF FPC.

Then it runs. The output is somewhat hard to interpret as the
Windows console shows many chars as '?' and the writelns do not
explain what it is supposed to show. 
The good news is that it works.

The bad news is, that it only works because Delphi silently altered
the source file and added the BOM.

Back in Lazarus compiling such a file gives the error:
LazUnicodeTest.lpr(28,10) Error: UTF-8 code greater than 65535 found
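
The message reads as if the compiler found a code point above U+FFFF (65535) in a string constant once the BOM made it treat the file as UTF-8; that is my reading of the wording, not a confirmed description of the compiler internals. Under that assumption, a quick way to locate such characters in a source file is a scan like the following Python sketch (`find_non_bmp` is a made-up helper, and its columns count code points, which may differ from the column FPC reports).

```python
def find_non_bmp(source: str):
    """Return (line, column, code point) for each character above U+FFFF."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 0xFFFF:
                hits.append((line_no, col, "U+%04X" % ord(ch)))
    return hits

# Hypothetical source text: one constant outside the BMP, one inside it.
sample = "const\n  Smile = '\U0001F600';\n  Snow = '\u2603';\n"
hits = find_non_bmp(sample)
```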

> It means any code dealing with Unicode can do it.
> 
> How could this thing be communicated so that people understand?
> Why other Lazarus developers don't want to mention it?
> I am puzzled. :(

You are right, that using Unicode with Lazarus only needs a couple of
rules to follow. Sharing code with Delphi adds a few more rules.

Mattias

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-03 Thread Mattias Gaertner via Lazarus

On Wed, 3 May 2017 12:03:41 +0300
Juha Manninen via Lazarus  wrote:

>[...]
> Mattias Gaertner wrote:
> > Option a) You can use English in sources and load all non ASCII
> > constants via resourcestrings or similar. Then the codepage is
> > irrelevant.
> > Option b) You can store all files as UTF-8 with BOM. Then FPC will
> > store all non ASCII string constants as unicodestrings. Be careful when
> > using PChar with them. This adds implicit conversions, so it might be
> > slower.  
> 
> That must be very confusing. Why didn't you just tell him to use the
> default Unicode support in Lazarus which allows to write Delphi
> compatible code, just by remembering couple of rules.

The "default Unicode support in Lazarus" is not compatible with Delphi.
For compatibility it needs the BOM.

Do you have a link where the couple of rules are listed?


Mattias

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-03 Thread Jürgen Hestermann via Lazarus


On 2017-05-03 at 11:34, Graeme Geldenhuys via Lazarus wrote:
> I don't know why anybody would still want to run the standard Windows
> console - it is 20 years behind everybody else.

The reason: It is available on every Windows machine.
The alternatives need to be installed first
so scripts designed for them don't work out of the box.


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-03 Thread Juha Manninen via Lazarus

On Wed, May 3, 2017 at 12:13 PM, Ondrej Pokorny via Lazarus
 wrote:
> Not if you need pre-unicode Delphi support :)

Ok, true. IMO such old Delphi versions should not be used any more for new code.
Maintenance tasks only I think.

Fortunately there is again a free Delphi Starter edition. It means
anybody can use the latest version. Things are surely getting better!

Juha

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-03 Thread Ondrej Pokorny via Lazarus


On 03.05.2017 11:21, Juha Manninen via Lazarus wrote:
> Windows already supports Unicode in everything ... except for console
> output! Why is that?


You can start the console with UTF-8 codepage: 
http://stackoverflow.com/questions/14109024/how-to-make-unicode-charset-in-cmd-exe-by-default


Then you have full unicode (utf-8) support.

Ondrej
--
___
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
http://lists.lazarus-ide.org/listinfo/lazarus

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-03 Thread Graeme Geldenhuys via Lazarus

On 2017-05-03 10:25, Ondrej Pokorny via Lazarus wrote:
> You can start the console with UTF-8 codepage: 
> http://stackoverflow.com/questions/14109024/how-to-make-unicode-charset-in-cmd-exe-by-default
> 
> Then you have full unicode (utf-8) support.

Or use the much better console alternatives. The Windows platform is
finally catching on to what has been available in X11 environments for
years - a choice of multiple consoles.

For example, take a look at ConEmu for Windows.
  * Tab support built in
  * Resizable console windows
  * User-defined encoding per console window
  * Font choice
  * Better mouse & clipboard support
  * User-defined "console engine" per window or tab,
    e.g. Bash in one tab and the standard Windows console in another
  * Color customisation
  * Transparency support

  https://sourceforge.net/projects/conemu/

I don't know why anybody would still want to run the standard Windows
console - it is 20 years behind everybody else.

Regards,
  Graeme

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-03 Thread Juha Manninen via Lazarus

On Wed, May 3, 2017 at 11:52 AM, Graeme Geldenhuys via Lazarus wrote:
> At least EMBT is heading in the right direction with their
> Linux Delphi compiler - they completely removed AnsiString.

I must agree with you. I hope it will be removed in the (far) future, when
nobody uses the old Windows system codepages any more.
Windows already supports Unicode in everything ... except for console
output! Why is that?

Anyway, please let's leave out encoding supremacy issues now.
My point has been that our Unicode solution makes the encoding issues
irrelevant. It is almost compatible at source level despite the
different encodings.
Think how improbable that is, yet it works.
See also the encoding-agnostic support provided by LazUnicode.

Encoding does not matter any more, as long as it is Unicode.

Juha

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-03 Thread Ondrej Pokorny via Lazarus


On 03.05.2017 11:03, Juha Manninen via Lazarus wrote:

> I am puzzled why there were so many misleading and confusing replies,
> also from knowledgeable Lazarus developers.
> Remember, the question was about making sources compatible with Delphi.
> The person (Marcos Douglas) did not know details of how strings work
> in Delphi and Lazarus.
> Now we finally have a system that allows (more or less) compatible
> code when using Unicode. Why was it not even mentioned by you guys?
>
> For example:
>
> Ondrej Pokorny wrote:
>> Speaking from my experience, the only approach (not only the best one but
>> the only one) is not to use characters above #255.
>
> Nonsense. Full Unicode is supported.


Not if you need pre-unicode Delphi support :)

(Well, Marcos didn't specify what Delphi version he wants to target but 
he stated "If Delphi sources don't use UTF8 [...]", which applies to 
pre-unicode Delphi versions.)


Ondrej

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-03 Thread Juha Manninen via Lazarus

Marcos Douglas B. Santos wrote:
> I develop on Windows. What problems do you mean?

Unicode is recommended also on Windows. No worries. You don't need to
use the old system codepages.
People who need them must convert them explicitly because the Unicode
system of Lazarus does not support them directly.

> Sorry about that. I stopped using Delphi at version 7, that uses ANSI.
> I thought that Delphi nowadays was using UTF16.
> I will install Delphi Tokyo Starter and discover these things.

You are mixing up two separate things here.
The encoding of their "String" is now UTF-16.
Source files are saved as UTF-8.
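
To make the difference between the two encodings concrete, here is a small
sketch (not from the thread; the program name is invented) that holds the
same four-character text once as UTF-16 and once as UTF-8 and prints the
length of each. It assumes FPC 3.0 or later and uses only UnicodeString,
UTF8String and UTF8Encode from the RTL. The text is built from explicit code
units, so the result does not depend on how the source file is saved.

```pascal
program EncodingSizesSketch;  // hypothetical name, for illustration only
{$mode objfpc}{$H+}

var
  U16: UnicodeString;  // UTF-16, the encoding of "String" in Unicode Delphi
  U8: UTF8String;      // UTF-8, the encoding Lazarus uses for "String"
begin
  // Build "Gr" + U+00FC + "n" from explicit code units.
  U16 := 'Gr';
  U16 := U16 + WideChar($00FC);
  U16 := U16 + 'n';

  // Re-encode the same text as UTF-8.
  U8 := UTF8Encode(U16);

  WriteLn('UTF-16 code units: ', Length(U16));  // 4
  WriteLn('UTF-8 bytes:       ', Length(U8));   // 5
end.
```

Length counts code units in both cases, which is why the same text reports
4 for UTF-16 and 5 for UTF-8.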

> But if I put these constants as resourcestrings, it's Ok as Mattias
> told me, right?

I don't think it makes any difference. You can use the full Unicode in
both cases.

---
I am puzzled why there were so many misleading and confusing replies,
also from knowledgeable Lazarus developers.
Remember, the question was about making sources compatible with Delphi.
The person (Marcos Douglas) did not know details of how strings work
in Delphi and Lazarus.
Now we finally have a system that allows (more or less) compatible
code when using Unicode. Why was it not even mentioned by you guys?

For example:

Ondrej Pokorny wrote:
> Speaking from my experience, the only approach (not only the best one but
> the only one) is not to use characters above #255.

Nonsense. Full Unicode is supported.

Mattias Gaertner wrote:
> Option a) You can use English in sources and load all non ASCII
> constants via resourcestrings or similar. Then the codepage is
> irrelevant.
> Option b) You can store all files as UTF-8 with BOM. Then FPC will
> store all non ASCII string constants as unicodestrings. Be careful when
> using PChar with them. This adds implicit conversions, so it might be
> slower.

That must be very confusing. Why didn't you just tell him to use the
default Unicode support in Lazarus, which allows writing Delphi
compatible code just by remembering a couple of rules?

Also Tony's advice to use AnsiString explicitly is quite irresponsible
towards a person who is looking for Delphi compatibility.
AnsiString is not Delphi compatible any more in our system and it
brings you back to the stone age in Delphi, to the horrors of system
codepages.

Is it possible that people still don't know how Delphi compatible the
Lazarus Unicode system is (unless you need the old system codepages
obviously)?
For example Lazarus developer Werner (wp) didn't know the Ansi...()
string functions, like AnsiCompareStr(), are compatible with Delphi.

http://forum.lazarus.freepascal.org/index.php/topic,36664.msg244619.html#msg244619
Yes they are!
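
As a minimal illustration of that claim, the sketch below is meant to compile
unchanged with both compilers. It is an assumption-laden example, not code
from the forum post: it uses AnsiCompareStr and AnsiCompareText from SysUtils
with plain ASCII data, and the program name is invented.

```pascal
program AnsiCompareSketch;  // hypothetical name, for illustration only
{$IFDEF FPC}{$mode delphi}{$H+}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

begin
  // Case-sensitive comparison: a negative result means the first
  // string sorts before the second.
  if AnsiCompareStr('abc', 'abd') < 0 then
    WriteLn('abc sorts before abd');

  // Case-insensitive comparison: zero means the strings are equal.
  if AnsiCompareText('ABC', 'abc') = 0 then
    WriteLn('ABC and abc are equal when case is ignored');
end.
```

The data is deliberately ASCII only, so the sketch says nothing about
locale-specific ordering of non-ASCII text.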

Please also look at program LazUnicodeTest in components/lazutils/test/.
It does advanced Unicode stuff and works in both Delphi and Lazarus.
It means any code dealing with Unicode can do it.

How could this be communicated so that people understand?
Why don't other Lazarus developers want to mention it?
I am puzzled. :(

Juha

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-03 Thread Graeme Geldenhuys via Lazarus

On 2017-05-03 01:21, Marcos Douglas B. Santos via Lazarus wrote:
> Sorry about that. I stopped using Delphi at version 7, that uses ANSI.
> I thought that Delphi nowadays was using UTF16.

Delphi loves to follow Microsoft. Files are stored in UTF-8 (this
is the norm), but they use UTF-16 internally.

Lazarus stores files in UTF-8 and uses UTF-8 internally.

Some background info

UCS-2, the fixed 2-byte forerunner of UTF-16, was the first encoding
implementation for Unicode - from a time when they thought 2 bytes would
be big enough for everything. They were wrong. UTF-8, and later UTF-16
with its surrogate pairs, were invented to solve the problem. But by that
time Microsoft had already standardised on 2-byte characters, so Delphi
followed suit. Linux, FreeBSD etc saw the light and used UTF-8 instead.

Regards,
  Graeme

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-03 Thread Graeme Geldenhuys via Lazarus

On 2017-05-02 09:30, Juha Manninen via Lazarus wrote:
> From FPC's point of view our UTF-8 solution is a hack. 

FPC's point of view or Marco's point of view? Just curious - so what is
FPC's "correct" solution then for using UTF-8 as the preferred
encoding? What's the alternative they offer?


[rant]
ps:
  Both FPC and Delphi are in such a messed-up state when it comes to
  string and character types. They are the laughing stock of programming
  languages at the moment. At least EMBT is heading in the right
  direction with their Linux Delphi compiler - they completely removed
  AnsiString.

  FPC and Delphi can learn a huge lesson from Java and Qt in how to
  handle string and character types.
[/rant]

Regards,
  Graeme

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-02 Thread Marcos Douglas B. Santos via Lazarus

On Tue, May 2, 2017 at 5:30 AM, Juha Manninen via Lazarus wrote:
>>>  1. Assign a constant always to a type String variable.
>>
>> What do you mean? Instead of creating a constant, is it better to create a
>> String variable and assign the string to it?
>
> From FPC's point of view our UTF-8 solution is a hack. In practice it
> means that success in assigning string literals depends on the string
> type.
> This:
>   S := 'Have 🍷 for FPC 💓 Lazarus';
> always works if "S" is a "String". It may not work with other string types.
> It is all explained in the wiki page.

I understood, thanks.

> When all your string data is Unicode then you can code in a Delphi
> compatible way.
> Only the Windows system codepages pose a problem, but I got the
> impression you don't need them now.

I develop on Windows. What problems do you mean?


Marcos Douglas

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-02 Thread Marcos Douglas B. Santos via Lazarus

On Tue, May 2, 2017 at 5:58 AM, Juha Manninen via Lazarus wrote:
> On Sun, Apr 30, 2017 at 7:37 PM, Marcos Douglas B. Santos via Lazarus wrote:
>> If Delphi sources don't use UTF8, what is the best way to maintain sources
>> that need to work in both compilers?
>
> I wonder if I have misunderstood something about your questions.
> What does "Delphi sources don't use UTF8" mean?
> AFAIK they do use UTF8.

Sorry about that. I stopped using Delphi at version 7, that uses ANSI.
I thought that Delphi nowadays was using UTF16.
I will install Delphi Tokyo Starter and discover these things.

> One more clarification about assigning string data:
> The potential problem is only with string literals, constants.
> Assignment between variables always goes right thanks to their dynamic
> encoding in FPC 3+.

But if I put these constants as resourcestrings, it's Ok as Mattias
told me, right?

Thank you for these tips.

Best regards,
Marcos Douglas

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-02 Thread Juha Manninen via Lazarus

On Sun, Apr 30, 2017 at 7:37 PM, Marcos Douglas B. Santos via Lazarus wrote:
> If Delphi sources don't use UTF8, what is the best way to maintain sources
> that need to work in both compilers?

I wonder if I have misunderstood something about your questions.
What does "Delphi sources don't use UTF8" mean?
AFAIK they do use UTF8.

One more clarification about assigning string data:
The potential problem is only with string literals, constants.
Assignment between variables always goes right thanks to their dynamic
encoding in FPC 3+.

Juha

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-02 Thread Juha Manninen via Lazarus

On Tue, May 2, 2017 at 2:30 AM, Marcos Douglas B. Santos via Lazarus wrote:
> So, as Mattias said, we should code using ANSI chars and everything will be Ok.

No, you can use all the Unicode freely.
The source files are saved as UTF-8 by default. Delphi does the same,
this detail is also compatible.

>> For Delphi compatible generics you can use FPC trunk and the Generics
>> Collection lib made by Maciej.
>
> Is it part of FPC? If not, could you post the official URL?

It is part of FPC trunk.

>>  1. Assign a constant always to a type String variable.
>
> What do you mean? Instead of creating a constant, is it better to create a
> String variable and assign the string to it?

From FPC's point of view our UTF-8 solution is a hack. In practice it
means that success in assigning string literals depends on the string
type.
This:
  S := 'Have 🍷 for FPC 💓 Lazarus';
always works if "S" is a "String". It may not work with other string types.
It is all explained in the wiki page.
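
A sketch of the rule above, for illustration only. It assumes the file is
saved as UTF-8 (the Lazarus default), that the project depends on the
LazUtils package so that the LazUTF8 unit is available, and that no
{$codepage} directive is set. UTF8Length comes from LazUTF8; the program name
is invented.

```pascal
program LiteralToStringSketch;  // hypothetical name, for illustration only
{$mode objfpc}{$H+}

uses
  LazUTF8;  // from the LazUtils package; assumed to be a project dependency

var
  S: String;
begin
  // The literal is assigned to a plain "String" variable, as the rule says.
  S := 'Have 🍷 for FPC 💓 Lazarus';

  // Length counts bytes of the UTF-8 data, UTF8Length counts codepoints,
  // so the first number is larger whenever the text is not pure ASCII.
  WriteLn('Bytes:      ', Length(S));
  WriteLn('Codepoints: ', UTF8Length(S));
end.
```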

When all your string data is Unicode then you can code in a Delphi
compatible way.
Only the Windows system codepages pose a problem, but I got the
impression you don't need them now.

Juha

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-01 Thread Marcos Douglas B. Santos via Lazarus

On Mon, May 1, 2017 at 11:06 AM, Juha Manninen via Lazarus wrote:
> On Mon, May 1, 2017 at 12:30 PM, Tony Whyman via Lazarus
> ...
>
> No! The good idea is to use "String".

I agree.

>> 5. Take care when using string literals.
>> I added
>> {$IFDEF FPC}
>> {$codepage UTF8}
>> {$ENDIF}
>
> Yes, string literals are tricky but usually you should NOT use {$codepage UTF8}.
> It is explained in the wiki page. I will not repeat it here.

So, as Mattias said, we should code using ANSI chars and everything will be Ok.

>> 7. Generics
>
> For Delphi compatible generics you can use FPC trunk and the Generics
> Collection lib made by Maciej.

Is it part of FPC? If not, could you post the official URL?


>> I hope you find this a useful checklist.
>
> It contained so much false information that it only confuses people. :(
>
> I want to repeat that it is possible to write code dealing with
> Unicode that is fully compatible with Delphi at source level.
> It will be compatible with a future UTF-16 solution in Lazarus as well.
> Encoding agnostic (UTF-8 / UTF-16) code is possible even if you must
> iterate individual codepoints. See the wiki page for details.

That is what I wanted to read. Thanks.

Some doubts:

> Remember these to keep your code compatible:
>  1. Normally use type "String".
>  1. Assign a constant always to a type String variable.

What do you mean? Instead of creating a constant, is it better to create a
String variable and assign the string to it?

>  2. Use type UnicodeString explicitly for API calls that need it.


Best regards,
Marcos Douglas

Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-01 Thread Marcos Douglas B. Santos via Lazarus

On Mon, May 1, 2017 at 8:47 AM, Mattias Gaertner via Lazarus wrote:
> Option a) You can use English in sources and load all non ASCII
> constants via resourcestrings or similar. Then the codepage is
> irrelevant.
> Option b) You can store all files as UTF-8 with BOM. Then FPC will
> store all non ASCII string constants as unicodestrings. Be careful when
> using PChar with them. This adds implicit conversions, so it might be
> slower.
>

Maybe option A could be the best.
I did not remember to use resourcestrings... it is a good tip, thanks.
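
For reference, a minimal sketch of what option A looks like in source. The
identifier rsGreeting and the program name are invented for illustration.
The string in the source stays plain ASCII; loading a translation with
non-ASCII text at run time is a separate step not shown here.

```pascal
program ResourceStringSketch;  // hypothetical name, for illustration only
{$mode objfpc}{$H+}

resourcestring
  // ASCII-only default text; translations can replace it at run time.
  rsGreeting = 'Hello, world';

begin
  WriteLn(rsGreeting);
end.
```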

Marcos Douglas
