Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Mattias Gaertner via Lazarus
On Fri, 5 May 2017 16:36:51 +0300
Juha Manninen via Lazarus  wrote:

> On Fri, May 5, 2017 at 4:21 PM, Mattias Gaertner via Lazarus
>  wrote:
> > Oops. Which one?  
> 
> The FAQ says:
> "Since FPC 3.0 you must add the flag -FcUTF8 or add {$codepage UTF8}
> at the beginning of the unit."

I improved it a bit.

Mattias
-- 
___
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
http://lists.lazarus-ide.org/listinfo/lazarus


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Juha Manninen via Lazarus
On Fri, May 5, 2017 at 4:21 PM, Mattias Gaertner via Lazarus
 wrote:
> Oops. Which one?

The FAQ says:
"Since FPC 3.0 you must add the flag -FcUTF8 or add {$codepage UTF8}
at the beginning of the unit."

The same page in "String Literals" section says:
 "In most cases {$codepage utf8} / -FcUTF8 is not needed."
which is the correct information.

Actually I don't know if that FAQ entry is yours. Many people have
added stuff there. The page is intimidating for a user who just wants
to support Unicode without a fuss.

Juha


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Juha Manninen via Lazarus
On Fri, May 5, 2017 at 3:56 PM, Sven Barth via Lazarus
 wrote:
> That is mainly due to the compiler not supporting surrogate pairs for the
> UTF-8 -> UTF-16 conversion. If it would support them, then there wouldn't be
> a problem anymore...

That is a serious bug. Getting codepoints right is the absolute
minimum requirement for Unicode support. Surrogate pairs are the
UTF-16 equivalent of multi-byte codepoints in UTF-8.
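
For readers unfamiliar with the term, here is a minimal sketch of how a code
point above U+FFFF maps to a UTF-16 surrogate pair. It only shows the standard
UTF-16 arithmetic, not what the FPC compiler does internally; the program and
procedure names are made up for illustration.

```pascal
program SurrogateSketch;
{$mode objfpc}{$H+}
uses
  SysUtils;

// Hypothetical helper: split a code point in U+10000..U+10FFFF
// into its UTF-16 high and low surrogate.
procedure ToSurrogatePair(CodePoint: Cardinal; out HighSur, LowSur: Word);
begin
  CodePoint := CodePoint - $10000;                // 20 significant bits remain
  HighSur := Word($D800 + (CodePoint shr 10));    // upper 10 bits
  LowSur  := Word($DC00 + (CodePoint and $3FF));  // lower 10 bits
end;

var
  H, L: Word;
begin
  ToSurrogatePair($1F600, H, L);  // U+1F600 takes 4 bytes in UTF-8
  WriteLn(IntToHex(H, 4), ' ', IntToHex(L, 4));  // prints D83D DE00
end.
```

A code point at or below U+FFFF needs no pair, which is presumably why the
error message mentions 65535.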

Now I understand this was not caused by our UTF-8 run-time switch
"hack". It is a plain bug in FPC.
Is there a plan to fix it?

Juha


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Mattias Gaertner via Lazarus
On Fri, 5 May 2017 14:12:05 +0300
Juha Manninen via Lazarus  wrote:

>[...]
> Then Mattias adds FAQs contradicting the earlier texts ...

Oops. Which one?

Mattias


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Sven Barth via Lazarus
On 05.05.2017 at 13:50, "Juha Manninen via Lazarus" <
lazarus@lists.lazarus-ide.org> wrote:
>
> On Fri, May 5, 2017 at 2:29 PM, Michael Van Canneyt via Lazarus
>  wrote:
> > Then what is still the problem ?
>
> With BOM you get:
>  Error: UTF-8 code greater than 65535 found
> which is counter-intuitive when the file and the string literal are both
> UTF-8.

That is mainly due to the compiler not supporting surrogate pairs for the
UTF-8 -> UTF-16 conversion. If it would support them, then there wouldn't
be a problem anymore... (though of course it would need to be ensured that
other parts of the RTL support surrogate pairs as well)

Regards,
Sven


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Graeme Geldenhuys via Lazarus
On 2017-05-05 12:49, Juha Manninen via Lazarus wrote:
> Wrong information propagates easily, so it is important to get this right.

No worries, I agree. Thanks for correcting my terminology.

Regards,
  Graeme



Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Juha Manninen via Lazarus
On Fri, May 5, 2017 at 2:02 PM, Graeme Geldenhuys via Lazarus
 wrote:
> If so, then why does LCL also call the above two functions?

Graeme, they are called by LazUtils package, LazUTF8 unit, not by LCL.
It is not limited to GUI programming.
Wrong information propagates easily, so it is important to get this right.

Juha


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Juha Manninen via Lazarus
On Fri, May 5, 2017 at 2:29 PM, Michael Van Canneyt via Lazarus
 wrote:
> Then what is still the problem ?

With BOM you get:
 Error: UTF-8 code greater than 65535 found
which is counter-intuitive when the file and the string literal are both UTF-8.
It is related to changing the default codepage at run-time which is a
hack from FPC's POV.
For the same reason we need this grid:
 
http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus#String_Literals_Overview
So, it is not only a communication issue. It is truly messy. If only
it could be improved...

Juha


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Ondrej Pokorny via Lazarus

On 05.05.2017 13:02, Graeme Geldenhuys via Lazarus wrote:
> On 2017-05-05 10:41, Ondrej Pokorny via Lazarus wrote:
>> Just use "DefaultSystemCodePage := CP_UTF8" and every single-byte string
>> is unicode enabled.
>
> So does that mean you don't have to also call the following two functions
> (which LCL does)?
>
>   SetMultiByteConversionCodePage(CP_UTF8);
>   SetMultiByteRTLFileSystemCodePage(CP_UTF8);
>
> So doing
>
>   DefaultSystemCodePage := CP_UTF8;
>
> is all you need to switch the RTL, FCL and the String data type to UTF-8?
>
> If so, then why does LCL also call the above two functions?


SetMultiByteConversionCodePage does only one thing: it sets 
DefaultSystemCodePage :) So yes, if you set DefaultSystemCodePage you 
don't have to call SetMultiByteConversionCodePage.


You are right - I forgot about 
SetMultiByteRTLFileSystemCodePage/DefaultRTLFileSystemCodePage.


BUT if I take a look into the RTL sources I see that it's used only in 
FindFirst/FindNext, FExpand and GetDir/do_GetDir. And only in the result 
strings. IMO it could be removed and replaced with DefaultSystemCodePage.
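
A minimal startup sketch of the settings discussed above, assuming FPC 3.0 or
later. The two routines, the two variables and CP_UTF8 are the System unit
names mentioned in this thread; the program name is made up.

```pascal
program Utf8Startup;
{$mode objfpc}{$H+}
begin
  // Sets DefaultSystemCodePage, as described above.
  SetMultiByteConversionCodePage(CP_UTF8);
  // Sets DefaultRTLFileSystemCodePage, which affects the result strings
  // of FindFirst/FindNext, FExpand and GetDir.
  SetMultiByteRTLFileSystemCodePage(CP_UTF8);

  WriteLn('DefaultSystemCodePage        = ', DefaultSystemCodePage);
  WriteLn('DefaultRTLFileSystemCodePage = ', DefaultRTLFileSystemCodePage);
end.
```

Both lines should report the numeric value of CP_UTF8 (65001). Assigning
DefaultSystemCodePage directly instead of calling the first routine should
give the same result for that variable.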


Ondrej


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Graeme Geldenhuys via Lazarus
On 2017-05-05 12:17, Mattias Gaertner via Lazarus wrote:
> I wonder if it would help if FPC would store UTF-8 string literals as
> UTF-8 

Yeah, that would be the logical thing to do. FPC not doing that is what
really confused me.

Regards,
  Graeme



Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Michael Van Canneyt via Lazarus



On Fri, 5 May 2017, Mattias Gaertner via Lazarus wrote:

> On Fri, 5 May 2017 12:52:48 +0200 (CEST)
> Michael Van Canneyt via Lazarus  wrote:
>
>> [...]
>> I propose to let the compiler observe the BOM.
>> But I don't think more is needed.
>
> FPC observes the BOM. Same as Delphi.

Then what is still the problem?

> I wonder if it would help if FPC would store UTF-8 string literals as
> UTF-8 and how much work that is.


Amount of work is probably not so much.

The question is whether it will cause problems for e.g. the JVM code
generator.

But I still fail to see the actual problem, aside from a lot of confusion by
users.

Confusion arises from a lack of clear information.

So in my opinion, we simply need to provide clear information, before we
start changing things.

Michael.


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Mattias Gaertner via Lazarus
On Fri, 5 May 2017 12:52:48 +0200 (CEST)
Michael Van Canneyt via Lazarus  wrote:

>[...]
> I propose to let the compiler observe the BOM. 
> But I don't think more is needed.

FPC observes the BOM. Same as Delphi.

I wonder if it would help if FPC would store UTF-8 string literals as
UTF-8 and how much work that is.

Mattias


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Michael Schnell via Lazarus

On 05.05.2017 12:16, Graeme Geldenhuys via Lazarus wrote:
> In the end it’s about supporting Unicode. Does it really matter
> what internal encoding it is to achieve the “Unicode support”
> goal?


Yep it does.

There are ways around that issue (e.g. code-page-aware strings), but in fact
these trigger a whole new set of problems.


You might want to read -> 
http://wiki.freepascal.org/not_Delphi_compatible_enhancement_for_Unicode_Support


-Michael


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Graeme Geldenhuys via Lazarus
On 2017-05-05 11:55, Jürgen Hestermann via Lazarus wrote:
> I use UTF-8 internally and
> convert to/from UTF-16 for all Windows API functions and
> I never found any problem with it.
> The time that the API functions requires is so much longer than the
> time for string conversion that it does not matter at all.

This is what I've been doing for years, and I agree, it works great.
Windows is also the only platform (of any modern OS) that doesn't
use UTF-8 as standard - so I consider it the minority.


> A situation where it may be a problem is when reading
> (UTF-16 encoded) text files.

I'm yet to find a UTF-16 encoded text file in the wild. I'm not saying
they don't exist, I'm just saying they are extremely rare and more
like an anomaly. UTF-8 seems to rule the roost and the Internet.

This graph should say it all:

https://en.wikipedia.org/wiki/File:Utf8webgrowth.svg

  (source):  https://en.wikipedia.org/wiki/UTF-8

Even so, a simple conversion to UTF-8 at load time should resolve
all possible problems.



Regards,
  Graeme

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Juha Manninen via Lazarus
On Fri, May 5, 2017 at 1:20 AM, Graeme Geldenhuys via Lazarus
 wrote:
> A case in point. Looking at the Wiki page you listed, I read the following:
> "
> Since FPC 3.0 you must add the flag -FcUTF8 or add {$codepage UTF8} at the 
> beginning of the unit.
> ...

Uhhh, the same page in "String Literals" section says:
 "In most cases {$codepage utf8} / -FcUTF8 is not needed."
which is the correct information.

Also, this wiki page has become a mess because many people add stuff but
nobody removes any.
For example, Michl added the grid about how constant assignment works
with and without {$codepage utf8}. It is nice, but he didn't remove the
other paragraphs explaining the same thing. It looks like an extremely
complex topic to a new user, while in reality he should code like
with Delphi + remember only a few simple rules.
Then Mattias adds FAQs contradicting the earlier texts ...

The comment from Martok was valid. This page is not good for users who
just want to get started quickly.
I will simplify the page. I will remove stuff and move the FAQ to a
new page. Sorry in advance to the people whose text will be removed.

Juha


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Graeme Geldenhuys via Lazarus
On 2017-05-05 10:41, Ondrej Pokorny via Lazarus wrote:
> Just use "DefaultSystemCodePage := CP_UTF8" and every single-byte string 
> is unicode enabled.

So does that mean you don't have to also call the following two functions
(which LCL does)?

 SetMultiByteConversionCodePage(CP_UTF8);
 SetMultiByteRTLFileSystemCodePage(CP_UTF8);


So doing

   DefaultSystemCodePage := CP_UTF8;

is all you need to switch the RTL, FCL and the String data type to UTF-8?

If so, then why does LCL also call the above two functions?

Regards,
  Graeme



Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Ondrej Pokorny via Lazarus

On 05.05.2017 12:55, Jürgen Hestermann via Lazarus wrote:
> A situation where it may be a problem is when reading
> (UTF-16 encoded) text files.


No, not at all. If you convert the file on the fly, there is almost 0 
performance penalty.


Ondrej


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Ondrej Pokorny via Lazarus

On 05.05.2017 12:01, Michael Van Canneyt via Lazarus wrote:
> On Fri, 5 May 2017, Ondrej Pokorny via Lazarus wrote:
>> Believe me, I use it in production without any problems: I have
>> unicode-aware TStrings, I can read files with unicode names, I can do
>> everything with plain FPC trunk.
>
> I am aware of this, I do it myself. But I work on Linux, where UTF8 is
> the norm.
>
> So I cannot vouch for other platforms...

For now I am only on Windows and I have to say loudly: IT WORKS GREAT :)

>> I don't need a 100% UTF-16 Delphi-Compatible RTL for that at all.
>
> This is the crux of the problem. Is this wanted/needed or do we stick
> to UTF8?
>
> We claim Delphi compatibility. So IMHO we must provide a UTF-16 Delphi
> compatible RTL.


I write code that is compatible with FPC and Delphi 5 - 10.2 and it 
works fine. So you already have a Delphi-compatible RTL. The only (well 
documented) difference is that FPC uses single-byte string and Delphi 
uses 2-byte string.


The only place where you need to handle the difference is where you need 
the size of char (when you access string as buffer) - which is 
particularly low-level code:


MyStream.WriteBuffer(MyString[1], Length(MyString) * SizeOf(Char));

-> you need the extra SizeOf(Char) and not a constant (1 for fpc, 2 for 
unicode Delphi).
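
A small self-contained sketch of the line above, written so that it should
compile in both FPC and Delphi. The program name and the sample string are
made up; only the WriteBuffer call is taken from the message.

```pascal
program CharSizeSketch;
{$IFDEF FPC}{$mode delphi}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}
uses
  Classes;
var
  MyStream: TMemoryStream;
  MyString: string;
begin
  MyString := 'abc';
  MyStream := TMemoryStream.Create;
  try
    // SizeOf(Char) is 1 where string is a single-byte string (FPC default)
    // and 2 where string is UnicodeString (Unicode Delphi), so the byte
    // count stays correct without a hard-coded constant.
    MyStream.WriteBuffer(MyString[1], Length(MyString) * SizeOf(Char));
    WriteLn('bytes written: ', MyStream.Size);
  finally
    MyStream.Free;
  end;
end.
```

The string must not be empty here, since MyString[1] is only valid when
Length(MyString) > 0.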


That's all. All high-level code is compatible already. Good job. I 
really do think it's not worth it to pollute FPC RTL with UnicodeString 
overloads of every function, class etc.


Better to keep 1 clean approach (UTF-8 RTL) and not confuse people with 
2 approaches (UTF-8 vs UTF-16). E.g. how do you want to call the new 
UnicodeString-TStrings class? You have 2 options:
1.) Break compatibility with legacy FPC. (New TStrings will use
UnicodeString.)
2.) Break compatibility with Delphi. (TStrings will stay with 8-bit string.)

There is no obvious solution for the problem :/

And then if you will introduce a compiler switch to change String from 
1-byte to 2-bytes... Oh no, so much mess and so many variants to care 
about. Really, sometimes it's better to give people no options :) (Or 
have you already introduced the switch?)


Just stick with the current UTF-8 approach, which has proven itself :)

Ondrej


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Jürgen Hestermann via Lazarus

On 2017-05-05 at 12:16, Graeme Geldenhuys via Lazarus wrote:
> In the end it’s about supporting Unicode. Does it really matter
> what internal encoding it is to achieve the “Unicode support”
> goal?

From a performance perspective it may be unwanted
to convert string encodings back and forth all the time.

Although, in my file manager I use UTF-8 internally and
convert to/from UTF-16 for all Windows API functions and
I never found any problem with it.
The time that the API functions require is so much longer than the
time for string conversion that it does not matter at all.
Even fast API-functions like changing attributes only take
a second for thousands of files.
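
A minimal sketch of that pattern, assuming the conversion happens right at
the API boundary. UTF8Decode and UTF8Encode are the System unit conversion
routines; the program name and the file name are made up, and no real Windows
API call is made here.

```pascal
program Utf8BoundarySketch;
{$mode objfpc}{$H+}
var
  Name8: UTF8String;
  Name16: UnicodeString;
begin
  Name8 := 'report.txt';         // application data stays UTF-8
  Name16 := UTF8Decode(Name8);   // convert just before a wide (UTF-16) API call
  // ... PWideChar(Name16) would be handed to the OS API here ...
  Name8 := UTF8Encode(Name16);   // convert any returned text back to UTF-8
  WriteLn(Name8, ' (', Length(Name16), ' UTF-16 code units)');
end.
```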

A situation where it may be a problem is when reading
(UTF-16 encoded) text files.
But I never stumbled over such a thing yet.

I would promote the use of UTF-8 wherever possible
while converting to target encodings only when unavoidable.
It makes life much easier if you only concentrate on one (the best)
Unicode encoding (UTF-8).

Therefore I see no use for a UTF-16-based RTL.
I don't think that you would notice any performance difference
to the UTF-8 based RTL.
It would only waste valuable time that can be invested in other things.



Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Michael Van Canneyt via Lazarus



On Fri, 5 May 2017, Juha Manninen via Lazarus wrote:

> On Fri, May 5, 2017 at 9:43 AM, Michael Van Canneyt via Lazarus
>  wrote:
>> What tricks do you still need in 3.0.x ?
>
> The annoying tricky part with our UTF-8 solution is the assignment of
> Unicode string literals.
> With UTF-8 BOM it does not work at all, as discussed here.
> Without BOM it depends on string type + compiler settings in an illogical way.
> We would need a more robust solution for that. Do you have ideas?


I propose to let the compiler observe the BOM. 
But I don't think more is needed.


Michael.


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Juha Manninen via Lazarus
On Fri, May 5, 2017 at 9:43 AM, Michael Van Canneyt via Lazarus
 wrote:
> What tricks do you still need in 3.0.x ?

The annoying tricky part with our UTF-8 solution is the assignment of
Unicode string literals.
With UTF-8 BOM it does not work at all, as discussed here.
Without BOM it depends on string type + compiler settings in an illogical way.
We would need a more robust solution for that. Do you have ideas?

Juha


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Mattias Gaertner via Lazarus
On Fri, 5 May 2017 12:17:22 +0200
Ondrej Pokorny via Lazarus  wrote:

>[...]
> Embarcadero realized they made a mistake when they disabled (yes, only
> disabled, not removed) 8-bit strings in NEXTGEN compilers. UTF8String
> and RawByteString are back for all NEXTGEN compilers since 10.1. You can
> use them in Linux Delphi as well.
> 
> http://andy.jgknet.de/blog/2016/05/system-bytestrings-for-10-1-berlin/

Wow. I guess that means FPC lost the title of
"compiler with most confusing string types".


Mattias


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Mattias Gaertner via Lazarus
On Fri, 5 May 2017 12:01:47 +0200 (CEST)
Michael Van Canneyt via Lazarus  wrote:

>[...]
> > Believe me, I use it in production without any problems: I have 
> > unicode-aware TStrings, I can read files with unicode names, I can do 
> > everything with plain FPC trunk.  
> 
> I am aware of this, I do it myself. 
> But I work on Linux, where UTF8 is the norm.
> 
> So I cannot vouch for other platforms...

It has worked on Linux for years.
Since FPC 3.0 it works on Windows too.

Mattias


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Ondrej Pokorny via Lazarus

On 05.05.2017 12:08, Mattias Gaertner via Lazarus wrote:
> On Fri, 5 May 2017 10:56:41 +0100
> Graeme Geldenhuys via Lazarus  wrote:
>
>> [...]
>>> or work with large amount of 8-bit strings.
>>
>> Why would you want to? Unicode supports all languages,
>
> Maybe there is a misunderstanding. Let me rephrase my question:
> What string do you use in Linux Delphi when working with UTF-8 strings?


Embarcadero realized they made a mistake when they disabled (yes, only
disabled, not removed) 8-bit strings in NEXTGEN compilers. UTF8String
and RawByteString are back for all NEXTGEN compilers since 10.1. You can
use them in Linux Delphi as well.


http://andy.jgknet.de/blog/2016/05/system-bytestrings-for-10-1-berlin/

Ondrej


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Graeme Geldenhuys via Lazarus
On 2017-05-05 11:01, Michael Van Canneyt via Lazarus wrote:
> We claim Delphi compatibility. 
> So IMHO we must provide a UTF-16 Delphi compatible RTL.

In the end it’s about supporting Unicode. Does it really matter
what internal encoding it is to achieve the “Unicode support”
goal?


Regards,
  Graeme



Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Mattias Gaertner via Lazarus
On Fri, 5 May 2017 10:56:41 +0100
Graeme Geldenhuys via Lazarus  wrote:

>[...]
> > or work with large amount of 8-bit strings.  
> 
> Why would you want to? Unicode supports all languages,

Maybe there is a misunderstanding. Let me rephrase my question:
What string do you use in Linux Delphi when working with UTF-8 strings?

Mattias


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Michael Van Canneyt via Lazarus



On Fri, 5 May 2017, Ondrej Pokorny via Lazarus wrote:

> On 05.05.2017 11:23, Michael Van Canneyt via Lazarus wrote:
>> Yes, this somewhat alleviates the problem; but this still is a
>> single-byte TStrings, as opposed to the WideString
>> TStrings of Delphi. It's also still a single-byte filename argument.
>
> Yes but you forget that unicode is also single-byte UTF-8. And the
> greatest thing about FPC: it fully supports "DefaultSystemCodePage :=
> CP_UTF8".
>
> Therefore you don't need WideString/UnicodeString file arguments and
> UnicodeString-TStrings to have full unicode support in current FPC.
>
> Just use "DefaultSystemCodePage := CP_UTF8" and every single-byte string
> is unicode enabled.
>
> Believe me, I use it in production without any problems: I have
> unicode-aware TStrings, I can read files with unicode names, I can do
> everything with plain FPC trunk.

I am aware of this, I do it myself.
But I work on Linux, where UTF8 is the norm.

So I cannot vouch for other platforms...

> I don't need a 100% UTF-16 Delphi-Compatible RTL for that at all.

This is the crux of the problem.
Is this wanted/needed or do we stick to UTF8?

We claim Delphi compatibility.
So IMHO we must provide a UTF-16 Delphi compatible RTL.



Michael.


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Graeme Geldenhuys via Lazarus
On 2017-05-05 10:41, Mattias Gaertner via Lazarus wrote:
> I wonder what they do when you need to access the raw 8-bit file names,


OSX, iOS, Android and Linux all use UTF-8 as standard, so filename access
is not going to be any problem. Windows is moving more and more towards
UTF-16 everywhere, so that shouldn't be a problem either.


> or work with large amount of 8-bit strings.

Why would you want to? Unicode supports all languages, so there simply is no
need for other non-Unicode encodings any more. If it is memory usage
you are worried about, convert your 8-bit strings to UTF-8 encoded text
(most Western-language text will then use little memory, compared to
UTF-16 as an alternative).

Java has only supported Unicode since its inception in 1995, and Java runs
everywhere. It's never had a problem running on non-Unicode enabled
platforms.


Regards,
  Graeme



Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Mattias Gaertner via Lazarus
On Fri, 5 May 2017 10:01:24 +0100
Graeme Geldenhuys via Lazarus  wrote:

>[...]
> > AFAIK you are using UTF-8 in AnsiString in FPC 2.6.4. That works in
> > many cases, because of double fooling the compiler. This trick does not
> > work on Windows with RTL file functions though.  
> 
> Yes and true, but fpGUI supplies its own "wrapper" RTL file functions, thus
> it works 100% on all platforms for years. I believe LCL used to do the same.

Yes, and with FPC 3.0 many of them are no longer needed.

 
> RawByteString type (yet another string type in FPC & Delphi's arsenal) did
> not exist at the time, otherwise I would probably have defined...
> 
>   TfpgString = RawByteString;
> 
> and used that everywhere.

How would that help?

  
> > Of course it would be nicer, if we don't need tricks to get Unicode.  
> 
> Indeed, and that is why I love solutions implemented by Java and Qt
> Framework. They are simple, it works and not confusing.

IMO you are comparing apples and oranges.
The FP compiler provides a very easy Unicode solution - or even two
(UTF-8 and UTF-16). The problem is the old RTL and libs, which are
written for system encoding, not for Unicode.
You can design a Unicode RTL in FPC just like Java and Qt. fpGUI and
LazUtils are kind of a start of that.
Or you can help FPC finish the Unicode RTL. So stop complaining and
help them.


> Even Embarcadero
> is doing some string type clean-up. Their new Linux compiler completely
> removed AnsiString support. After all, why do you need any other
> string types when you support the Unicode standard.

That's true for most cases.
I wonder what they do when you need to access the raw 8-bit file names,
or work with large amount of 8-bit strings.

Mattias


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Ondrej Pokorny via Lazarus

On 05.05.2017 11:23, Michael Van Canneyt via Lazarus wrote:
> Yes, this somewhat alleviates the problem; but this still is a
> single-byte TStrings, as opposed to the WideString
> TStrings of Delphi. It's also still a single-byte filename argument.


Yes but you forget that unicode is also single-byte UTF-8. And the 
greatest thing about FPC: it fully supports "DefaultSystemCodePage := 
CP_UTF8".


Therefore you don't need WideString/UnicodeString file arguments and 
UnicodeString-TStrings to have full unicode support in current FPC.


Just use "DefaultSystemCodePage := CP_UTF8" and every single-byte string 
is unicode enabled.


Believe me, I use it in production without any problems: I have 
unicode-aware TStrings, I can read files with unicode names, I can do 
everything with plain FPC trunk.


I don't need a 100% UTF-16 Delphi-Compatible RTL for that at all. I can 
do that with current UTF-8 FPC RTL as well. (Honestly I think it's 
better for FPC to stick with UTF-8 and don't overcomplicate the RTL with 
UTF-16 support.)


Ondrej


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Michael Van Canneyt via Lazarus



On Fri, 5 May 2017, Ondrej Pokorny via Lazarus wrote:

> On 05.05.2017 11:17, Michael Van Canneyt via Lazarus wrote:
>> On Fri, 5 May 2017, Graeme Geldenhuys via Lazarus wrote:
>>> On 2017-05-05 07:43, Michael Van Canneyt via Lazarus wrote:
>>>> As far as I know, you don't need any tricks to work with unicode
>>>> filenames or output in 3.0.2. Maybe with exception of TStrings and
>>>> TFileStream.
>>>
>>> Again, I didn't have time to follow FPC 3.x development much, and I
>>> was too confused with all the Unicode changes.
>>>
>>> With FPC 3.0.x, can you now load text files from disk using
>>> TStringList and specify the encoding of the file at load time?
>>>
>>> Something like:
>>>  sl.LoadFromFile('some_utf8_file.txt', CP_UTF8);
>>>  sl.LoadFromFile('some_utf16_file.txt', CP_UTF16);
>>>  sl.LoadFromFile('some_latin1_file.txt', CP_Latin1);
>>
>> Not yet. These are the exceptions I was talking about.
>
> Current trunk 3.1.1 can do that since r34475 - you applied it :) I don't
> know if you ported it back to 3.0.x, though.


As far as I know, it is not backported, but Marco would need to confirm it.

Michael.


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Ondrej Pokorny via Lazarus

On 05.05.2017 11:24, Graeme Geldenhuys via Lazarus wrote:
> On 2017-05-05 10:17, Michael Van Canneyt via Lazarus wrote:
>>> Something like:
>>>
>>>   sl.LoadFromFile('some_utf8_file.txt', CP_UTF8);
>>>   sl.LoadFromFile('some_utf16_file.txt', CP_UTF16);
>>>   sl.LoadFromFile('some_latin1_file.txt', CP_Latin1);
>>
>> Not yet. These are the exceptions I was talking about.
>
> That also means FPC 3.0.x is then seriously flawed. It
> supports Unicode, but it also doesn't support Unicode.
>
> So what is the suggested work-around for FPC 3.0.2 to load
> various text encoding files into a TStringList? Hopefully
> the answer is not: "there is none"  :-/


Use "DefaultSystemCodePage := CP_UTF8" and you can load any text in any 
encoding into TStrings without character loss - the file will be 
converted to UTF-8 in LoadFrom* and converted back in SaveTo*. So your 
code can handle all encodings equally.


There are no limitations and no problems whatsoever. Yes, FPC is fully 
unicode-ready - in case you are fine with using UTF-8 internally!


Ondrej


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Graeme Geldenhuys via Lazarus
On 2017-05-05 10:17, Michael Van Canneyt via Lazarus wrote:
>> Something like: 
>>
>>  sl.LoadFromFile('some_utf8_file.txt', CP_UTF8);
>>  sl.LoadFromFile('some_utf16_file.txt', CP_UTF16);
>>  sl.LoadFromFile('some_latin1_file.txt', CP_Latin1);
>
> Not yet. These are the exceptions I was talking about.


That also means FPC 3.0.x is then seriously flawed. It
supports Unicode, but it also doesn't support Unicode.

So what is the suggested work-around for FPC 3.0.2 to load
various text encoding files into a TStringList? Hopefully
the answer is not: "there is none"  :-/

Because that will seriously impair/break INI usage too. The
first example off the top of my head. XML and JSON probably
too.

Regards,
  Graeme



Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Michael Van Canneyt via Lazarus



On Fri, 5 May 2017, Graeme Geldenhuys via Lazarus wrote:

> On 2017-05-05 10:17, Michael Van Canneyt via Lazarus wrote:
>>> Something like:
>>>
>>>  sl.LoadFromFile('some_utf8_file.txt', CP_UTF8);
>>>  sl.LoadFromFile('some_utf16_file.txt', CP_UTF16);
>>>  sl.LoadFromFile('some_latin1_file.txt', CP_Latin1);
>>
>> Not yet. These are the exceptions I was talking about.
>
> That also means FPC 3.0.x is then seriously flawed. It
> supports Unicode, but it also doesn't support Unicode.
>
> So what is the suggested work-around for FPC 3.0.2 to load
> various text encoding files into a TStringList? Hopefully
> the answer is not: "there is none"  :-/


Use the plain Pascal routines to read lines from a file and
fill the stringlist with them. You can write a class helper for it.
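
A minimal sketch of such a helper, assuming FPC 3.0 or later. The helper
name TStringsCodePageHelper and its method LoadFromFileCP are made up for
illustration and are not part of the RTL. The sketch relies on
SetTextCodePage making ReadLn convert from the file's code page, and on
DefaultSystemCodePage being UTF-8 so that nothing is lost when the line is
added to the list.

```pascal
program CodePageHelperDemo;
{$mode objfpc}{$H+}
uses
  {$ifdef unix}cwstring,{$endif}  // code page conversion support on Unix
  SysUtils, Classes;

type
  // Hypothetical helper: name and method are illustrative only.
  TStringsCodePageHelper = class helper for TStrings
    procedure LoadFromFileCP(const AFileName: string;
      ACodePage: TSystemCodePage);
  end;

procedure TStringsCodePageHelper.LoadFromFileCP(const AFileName: string;
  ACodePage: TSystemCodePage);
var
  F: TextFile;
  Line: UTF8String;
begin
  Clear;
  // AssignFile instead of Assign, which would resolve to TStrings.Assign here.
  AssignFile(F, AFileName);
  SetTextCodePage(F, ACodePage);  // tell the RTL how the file is encoded
  Reset(F);
  try
    while not Eof(F) do
    begin
      ReadLn(F, Line);  // converted from ACodePage to UTF-8
      Add(Line);        // converted to DefaultSystemCodePage if it differs
    end;
  finally
    CloseFile(F);
  end;
end;

var
  Raw: TFileStream;
  SL: TStringList;
begin
  // Keep UTF-8 in AnsiString, as suggested elsewhere in this thread.
  SetMultiByteConversionCodePage(CP_UTF8);

  // Test input: "e acute" in Latin-1 (single byte $E9) plus a line feed.
  Raw := TFileStream.Create('some_latin1_file.txt', fmCreate);
  try
    Raw.WriteByte($E9);
    Raw.WriteByte($0A);
  finally
    Raw.Free;
  end;

  SL := TStringList.Create;
  try
    SL.LoadFromFileCP('some_latin1_file.txt', 28591);  // 28591 = ISO-8859-1
    // Byte length of the loaded line; the same character takes
    // two bytes once it is stored as UTF-8.
    WriteLn(Length(SL[0]));
  finally
    SL.Free;
  end;
end.
```

This only covers byte-oriented encodings. A UTF-16 file cannot be read
line by line through a Text file in this way, because its line endings
are two bytes wide.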



> Because that will seriously impair/break INI usage too. The
> first example off the top of my head. XML and JSON probably
> too.


No. Those have been using widestring/UTF8string since day 1.

The main problem with switching the classes unit is backwards compatibility.

Michael.


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Graeme Geldenhuys via Lazarus
On 2017-05-05 10:17, Ondrej Pokorny via Lazarus wrote:
> I don't know about 3.0.x but you can do it in trunk 3.1.1. I posted a 
> patch for it (r34475). 

Fantastic! Glad to see somebody was following the same train of thought
as I was. :)

Is that scheduled to be back-ported to FPC 3.0.x?

Regards,
  Graeme



Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Ondrej Pokorny via Lazarus

On 05.05.2017 11:17, Michael Van Canneyt via Lazarus wrote:
> On Fri, 5 May 2017, Graeme Geldenhuys via Lazarus wrote:
>> On 2017-05-05 07:43, Michael Van Canneyt via Lazarus wrote:
>>> As far as I know, you don't need any tricks to work with unicode
>>> filenames or output in 3.0.2. Maybe with exception of TStrings and
>>> TFileStream.
>>
>> Again, I didn't have time to follow FPC 3.x development much, and I was too
>> confused with all the Unicode changes.
>>
>> With FPC 3.0.x, can you now load text files from disk using TStringList and
>> specify the encoding of the file at load time?
>>
>> Something like:
>>  sl.LoadFromFile('some_utf8_file.txt', CP_UTF8);
>>  sl.LoadFromFile('some_utf16_file.txt', CP_UTF16);
>>  sl.LoadFromFile('some_latin1_file.txt', CP_Latin1);
>
> Not yet. These are the exceptions I was talking about.


Current trunk 3.1.1 can do that since r34475 - you applied it :) I don't 
know if you ported it back to 3.0.x, though.


TFileStream can also open files with unicode names - at least on Windows 
(since 3.0.0 if I am not mistaken). See


  Function FileCreate (Const FileName : UnicodeString; ShareMode : Integer; Rights : Integer) : THandle;


in rtl/win/sysutils.pas

-->> There are absolutely no limitations whatsoever, AFAIK. At least I 
don't know of any and I don't experience any.


Ondrej


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Michael Van Canneyt via Lazarus



On Fri, 5 May 2017, Ondrej Pokorny via Lazarus wrote:

> On 05.05.2017 11:06, Graeme Geldenhuys via Lazarus wrote:
>> On 2017-05-05 07:43, Michael Van Canneyt via Lazarus wrote:
>>> As far as I know, you don't need any tricks to work with unicode
>>> filenames or output in 3.0.2. Maybe with exception of TStrings and
>>> TFileStream.
>>
>> Again, I didn't have time to follow FPC 3.x development much, and I was too
>> confused with all the Unicode changes.
>>
>> With FPC 3.0.x, can you now load text files from disk using TStringList and
>> specify the encoding of the file at load time?
>>
>> Something like:
>>
>>    sl.LoadFromFile('some_utf8_file.txt', CP_UTF8);
>>    sl.LoadFromFile('some_utf16_file.txt', CP_UTF16);
>>    sl.LoadFromFile('some_latin1_file.txt', CP_Latin1);
>
> I don't know about 3.0.x but you can do it in trunk 3.1.1. I posted a
> patch for it (r34475). I also extended TEncoding to support AnsiString,
> which was the requirement for TStrings encoding support.

Yes, this somewhat alleviates the problem;
but this still is a single-byte TStrings, as opposed to the WideString
TStrings of Delphi. It's also still a single-byte filename argument.

Michael.


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Graeme Geldenhuys via Lazarus
On 2017-05-05 09:59, Michael Schnell via Lazarus wrote:
> (Most obvious drawback: not flexibly typed TStrings.)

I know not everybody likes Generics, but that is where I see
Generics could come in very handy. A single TStrings implementation
that supports multiple string types.

Or just implement a UTF-8 version. ;-)
On a side note:
  I have implemented a UTF-8 version of TStrings & TStringList somewhere
  on my hard drive.

Regards,
  Graeme

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Michael Van Canneyt via Lazarus



On Fri, 5 May 2017, Graeme Geldenhuys via Lazarus wrote:

> On 2017-05-05 07:43, Michael Van Canneyt via Lazarus wrote:
>> As far as I know, you don't need any tricks to work with unicode
>> filenames or output in 3.0.2. Maybe with exception of TStrings and
>> TFileStream.
>
> Again, I didn't have time to follow FPC 3.x development much, and I was too
> confused with all the Unicode changes.
>
> With FPC 3.0.x, can you now load text files from disk using TStringList and
> specify the encoding of the file at load time?
>
> Something like:
>
>  sl.LoadFromFile('some_utf8_file.txt', CP_UTF8);
>  sl.LoadFromFile('some_utf16_file.txt', CP_UTF16);
>  sl.LoadFromFile('some_latin1_file.txt', CP_Latin1);


Not yet. These are the exceptions I was talking about.
But the FileOpen, Assign, Reset and Write routines of plain Pascal
do work with both Unicode and plain strings.

To fix the classes issues properly, we need a Unicode RTL
and an ANSI RTL if we wish to remain backwards compatible:
the Strings[] property can have only one type.

Michael.


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Ondrej Pokorny via Lazarus

On 05.05.2017 11:06, Graeme Geldenhuys via Lazarus wrote:
> On 2017-05-05 07:43, Michael Van Canneyt via Lazarus wrote:
>> As far as I know, you don't need any tricks to work with unicode
>> filenames or output in 3.0.2. Maybe with exception of TStrings and
>> TFileStream.
>
> Again, I didn't have time to follow FPC 3.x development much, and I was too
> confused with all the Unicode changes.
>
> With FPC 3.0.x, can you now load text files from disk using TStringList and
> specify the encoding of the file at load time?
>
> Something like:
>
>    sl.LoadFromFile('some_utf8_file.txt', CP_UTF8);
>    sl.LoadFromFile('some_utf16_file.txt', CP_UTF16);
>    sl.LoadFromFile('some_latin1_file.txt', CP_Latin1);


I don't know about 3.0.x but you can do it in trunk 3.1.1. I posted a 
patch for it (r34475). I also extended TEncoding to support AnsiString, 
which was the requirement for TStrings encoding support.
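
A rough sketch of what loading with an explicit encoding could look like,
assuming a compiler that already contains r34475 (it will not build with
3.0.x). As far as I know the second argument is a TEncoding instance
rather than a CP_* constant as in the question above. The WriteBytes
procedure and the sample bytes are only there to make the example
self-contained.

```pascal
program LoadWithEncoding;
{$mode objfpc}{$H+}
uses
  {$ifdef unix}cwstring,{$endif}  // code page conversion support on Unix
  SysUtils, Classes;

// Illustrative helper: writes raw bytes so the sample files exist.
procedure WriteBytes(const AName: string; const ABytes: array of Byte);
var
  Raw: TFileStream;
begin
  Raw := TFileStream.Create(AName, fmCreate);
  try
    Raw.WriteBuffer(ABytes[0], Length(ABytes));
  finally
    Raw.Free;
  end;
end;

var
  SL: TStringList;
  Latin1: TEncoding;
begin
  // Keep UTF-8 in AnsiString so no characters are lost after loading.
  SetMultiByteConversionCodePage(CP_UTF8);

  // The same text ("e acute" plus a line feed) in three encodings, no BOM.
  WriteBytes('some_utf8_file.txt',   [$C3, $A9, $0A]);
  WriteBytes('some_utf16_file.txt',  [$E9, $00, $0A, $00]);
  WriteBytes('some_latin1_file.txt', [$E9, $0A]);

  // GetEncoding returns a new instance that the caller frees;
  // 28591 is the code page number of ISO-8859-1.
  Latin1 := TEncoding.GetEncoding(28591);
  SL := TStringList.Create;
  try
    SL.LoadFromFile('some_utf8_file.txt', TEncoding.UTF8);
    WriteLn('UTF-8 file:   ', SL.Count, ' line(s)');

    SL.LoadFromFile('some_utf16_file.txt', TEncoding.Unicode);  // UTF-16LE
    WriteLn('UTF-16 file:  ', SL.Count, ' line(s)');

    SL.LoadFromFile('some_latin1_file.txt', Latin1);
    WriteLn('Latin-1 file: ', SL.Count, ' line(s)');
  finally
    SL.Free;
    Latin1.Free;
  end;
end.
```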


Ondrej


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Graeme Geldenhuys via Lazarus
On 2017-05-05 09:31, Kostas Michalopoulos via Lazarus wrote:
> After all, BMP does include practically all languages used today.

The bottom line:

   Unicode Standard <> BMP only!

If you think that, then rather promote your application as a UCS-2
compliant application, not a Unicode compliant application.

I can't remember my exact use case at the time, but the code-points
I needed to work with (using a data dump text file) were outside
the BMP range. I had to use a Java based text editor to correctly
edit the files.

Also, as Mattias said, the emojis, musical notes, scientific symbols,
map symbols etc. all fall outside the BMP.
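
To illustrate why UTF-16 cannot be treated as fixed width, here is a
minimal sketch that counts code points in a UnicodeString by treating a
high/low surrogate pair as one code point. The function name
CodePointCount is made up for this example, and unpaired surrogates are
simply counted as one code point each.

```pascal
program SurrogateDemo;
{$mode objfpc}{$H+}

// Counts Unicode code points in a UTF-16 string. A high surrogate
// ($D800..$DBFF) followed by a low surrogate ($DC00..$DFFF) is one
// code point; everything else counts as one code point per code unit.
function CodePointCount(const S: UnicodeString): Integer;
var
  i: Integer;
begin
  Result := 0;
  i := 1;
  while i <= Length(S) do
  begin
    if (Ord(S[i]) >= $D800) and (Ord(S[i]) <= $DBFF) and
       (i < Length(S)) and
       (Ord(S[i + 1]) >= $DC00) and (Ord(S[i + 1]) <= $DFFF) then
      Inc(i, 2)   // skip both halves of the pair
    else
      Inc(i);
    Inc(Result);
  end;
end;

var
  Clef: UnicodeString;
begin
  // U+1D11E MUSICAL SYMBOL G CLEF, stored as the pair $D834 $DD1E.
  SetLength(Clef, 2);
  Clef[1] := WideChar($D834);
  Clef[2] := WideChar($DD1E);

  WriteLn(Length(Clef));          // prints 2: UTF-16 code units
  WriteLn(CodePointCount(Clef));  // prints 1: code points
end.
```

An editor that assumes one WideChar per character would see two
characters here and could split the pair when moving the cursor or
deleting.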

Regards,
  Graeme

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Graeme Geldenhuys via Lazarus
On 2017-05-05 07:43, Michael Van Canneyt via Lazarus wrote:
> As far as I know, you don't need any tricks to work with unicode
> filenames or output in 3.0.2. Maybe with exception of TStrings and
> TFileStream.

Again, I didn't have time to follow FPC 3.x development much, and I was too
confused with all the Unicode changes.

With FPC 3.0.x, can you now load text files from disk using TStringList and
specify the encoding of the file at load time?

Something like:  

  sl.LoadFromFile('some_utf8_file.txt', CP_UTF8);
  sl.LoadFromFile('some_utf16_file.txt', CP_UTF16);
  sl.LoadFromFile('some_latin1_file.txt', CP_Latin1);

etc

Regards,
  Graeme

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Graeme Geldenhuys via Lazarus
On 2017-05-05 00:15, Mattias Gaertner via Lazarus wrote:

> I added a FAQ:
> http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus#What_happens_when_I_use_.24codepage_utf8.3F

Ah, thanks for that explanation.

 
> AFAIK you are using UTF-8 in AnsiString in FPC 2.6.4. That works in
> many cases, because of double fooling the compiler. This trick does not
> work on Windows with RTL file functions though.

Yes and true, but fpGUI supplies its own "wrapper" RTL file functions, so
it has worked 100% on all platforms for years. I believe LCL used to do the same.

RawByteString type (yet another string type in FPC & Delphi's arsenal) did
not exist at the time, otherwise I would probably have defined...

  TfpgString = RawByteString;

and used that everywhere.

 
> Of course it would be nicer, if we don't need tricks to get Unicode.

Indeed, and that is why I love the solutions implemented by Java and the Qt
Framework. They are simple, they work and they are not confusing. Even
Embarcadero is doing some string type clean-up. Their new Linux compiler
completely removed AnsiString support. After all, why do you need any other
string types when you support the Unicode standard?


Regards,
  Graeme

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Michael Schnell via Lazarus

On 04.05.2017 16:56, Juha Manninen via Lazarus wrote:
> I believe everybody is happy to get rid of the horrendous Windows

Even if this is true, there is a decent need for backwards compatibility. 
That is why, theoretically, codepage-aware strings are a good idea. 
Unfortunately their implementation is, IMHO, abysmal, in Delphi as well 
as in FPC. (Most obvious drawback: not flexibly typed TStrings.)


-Michael


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Mattias Gaertner via Lazarus
On Fri, 5 May 2017 11:31:00 +0300
Kostas Michalopoulos via Lazarus  wrote:

>[...]
> To play the devil's advocate, the fact that ALL reviews said that it has
> excellent support for Unicode means that characters outside the BMP *are*
> rare. After all, BMP does include practically all languages used today.
> 
> I mean, it isn't technically correct, it is just that in practice it is
> good enough for a very large number of tasks.

Devil's advocate: The new emojis are outside BMP.

Mattias


Re: [Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

2017-05-05 Thread Kostas Michalopoulos via Lazarus
On Thu, May 4, 2017 at 8:53 PM, Graeme Geldenhuys via Lazarus <
lazarus@lists.lazarus-ide.org> wrote:

> On 2017-05-04 15:56, Juha Manninen via Lazarus wrote:
> > I have seen comments saying that treating UTF-16 as fixed width
> > encoding is OK because the characters outside BMP are so rare. It is
> > like saying that a buggy spreadsheet app is OK because it calculates
> > the sums wrong only sometimes. IMO such people should not do
> > programming.
>
> +1
> I purchased a commercial text editor renowned for having excellent
> Unicode support - at least that is what ALL the reviews said. Umm
> yeah, to my disappointment it internally uses UTF-16 (because it is
> written in Delphi), and treats UTF-16 as 2-byte fixed width! WTF!
>


To play the devil's advocate, the fact that ALL reviews said that it has
excellent support for Unicode means that characters outside the BMP *are*
rare. After all, BMP does include practically all languages used today.

I mean, it isn't technically correct, it is just that in practice it is
good enough for a very large number of tasks.