Re: [Lazarus] lresources.pp(3089, 67) Error: Identifier not found "RT_RCDATA"

2014-11-24 Thread Joost van der Sluis

On 11/23/2014 09:09 PM, waldo kitty wrote:

On 11/22/2014 6:45 PM, Mattias Gaertner wrote:

On Sat, 22 Nov 2014 23:27:30 +0100
Bart  wrote:


On 11/22/14, Joost van der Sluis  wrote:


Add the windows-unit to the uses section of lresources as a quick fix.


Done.


i pulled your update last evening and compiled... it seems to have
worked...


With 'quick-fix' I meant that you could do this change locally to avoid 
the problem. Not that it should be committed as a definitive fix.


I think it's a bug in fpc. Sven?

Regards,

Joost.


--
___
Lazarus mailing list
Lazarus@lists.lazarus.freepascal.org
http://lists.lazarus.freepascal.org/mailman/listinfo/lazarus


Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread Michael Schnell

On 11/23/2014 07:52 PM, Felipe Monteiro de Carvalho wrote:


Well, the first reports of what the Unicode RTL would look like were
pretty scary: a total break of the string part of millions of lines of
code that people have written with Lazarus over the years.

That is why I stopped recommending Lazarus to my colleagues who are 
doing Delphi.


They went through a huge amount of pain to convert their software from
Delphi one-byte strings to Delphi two-byte strings. Hence they will not be
pleased to be forced to convert back to one-byte strings to be able to
use Lazarus, and some time later convert to two-byte strings again once
Lazarus is finally forced to follow Delphi in that respect.


-Michael



Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread Michael Schnell

On 11/22/2014 05:18 PM, Hans-Peter Diettrich wrote:
Does this mean that Lazarus (new mode) ignores the OS system codepage 
setting?


IMHO that would be just GREAT for writing portable software. The
RTL and LCL interface should be OS-agnostic for portability. In user
code, the user should be allowed to use the string encoding (and byte
count per character) that he finds most convenient for his application.


OTOH this of course brings a decent set of problems, including but
not limited to unnecessary conversions in certain cases.


-Michael



Re: [Lazarus] lresources.pp(3089, 67) Error: Identifier not found "RT_RCDATA"

2014-11-24 Thread Sven Barth
Am 24.11.2014 10:20 schrieb "Joost van der Sluis" :
>
> On 11/23/2014 09:09 PM, waldo kitty wrote:
>>
>> On 11/22/2014 6:45 PM, Mattias Gaertner wrote:
>>>
>>> On Sat, 22 Nov 2014 23:27:30 +0100
>>> Bart  wrote:
>>>
>>>> On 11/22/14, Joost van der Sluis  wrote:
>>>>
>>>>> Add the windows-unit to the uses section of lresources as a quick fix.
>>>
>>>
>>> Done.
>>
>>
>> i pulled your update last evening and compiled... it seems to have
>> worked...
>
>
> With 'quick-fix' I meant that you could do this change locally to avoid
> the problem. Not that it should be committed as a definitive fix.
>
> I think it's a bug in fpc. Sven?

I already fixed it yesterday evening. I re-added RT_RCDATA for Win32 and
Win64 (WinCE didn't have it there and other platforms have it in System).
But I added it with a deprecated message, because the correct solution is
to use the one in the Windows unit (or maybe it would be better if we'd
enable the RT_* constants in System for Windows as well).
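
As a rough illustration of the local workaround discussed in this thread (a sketch only, not the committed fix): the Windows unit is pulled in only on Windows, on the assumption stated above that other platforms provide RT_RCDATA in System. The program name is made up for this example.

```pascal
program RcDataDemo;  // hypothetical name, for illustration only
{$mode objfpc}{$H+}

{$IFDEF WINDOWS}
uses
  Windows;  // declares RT_RCDATA on Win32/Win64
{$ENDIF}

begin
  // RT_RCDATA is a pointer-typed resource identifier
  // (MAKEINTRESOURCE(10) in the Win32 API), so cast it for printing.
  WriteLn('RT_RCDATA = ', PtrUInt(RT_RCDATA));
end.
```

Wrapping the whole uses clause in the conditional avoids an empty clause on non-Windows targets.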

Regards,
Sven


Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread luiz americo pereira camara
2014-11-24 6:29 GMT-03:00 Michael Schnell :

> On 11/23/2014 07:52 PM, Felipe Monteiro de Carvalho wrote:
>
>>
>> Well, the first reports of what the Unicode RTL would look like were
>> pretty scary: a total break of the string part of millions of lines of
>> code that people have written with Lazarus over the years.
>>
> That is why I stopped recommending Lazarus to my colleagues who are
> doing Delphi.
>
> They went through a huge amount of pain to convert their software from
> Delphi one-byte strings to Delphi two-byte strings. Hence they will not be
> pleased to be forced to convert back to one-byte strings to be able to
> use Lazarus, and some time later convert to two-byte strings again once
> Lazarus is finally forced to follow Delphi in that respect.
>
>
If the program does not explicitly assume a specific encoding, i.e. it uses
only the String type and does no low-level string handling, there will be no
need to change.

I have converted (and still convert) a lot of Delphi components and can assure
you that most need no changes as things stand today.

Luiz


Re: [Lazarus] Need testers for the a new debugger

2014-11-24 Thread Joost van der Sluis

On 11/22/2014 12:45 PM, C Western wrote:

I have been switching back and forth between gdb and the new one - both
have some issues. The one I noticed most recently with the new one is
with a multi-threaded application - a breakpoint set in the thread seems to
cause the debugger to become "lost". Is the debugger set up to cope with
this situation? (I will often turn off the multi threading for
debugging, but this is not always possible.)


I did not do anything threading-related. So I'm not surprised that it 
does not really work.


Can you create a bug report for this? And the OS you were using?

Regards,

Joost.




Re: [Lazarus] Need testers for the a new debugger

2014-11-24 Thread Joost van der Sluis

On 11/22/2014 12:18 AM, Mattias Gaertner wrote:

On Fri, 21 Nov 2014 23:08:00 +
Martin Frb  wrote:


[...]
So as far as the debugger goes, this would then be correctly following
the debug info.


Funny fpc. Thanks for checking.

BTW, the default TGDBMIDebugger jumps even worse. Begin, End, Begin,
i:=3. So the new debugger is better here.


:-) Yet another thing the new debugger is better at.

Btw: It could be that this problem is fixed in fpc-trunk. If not I have 
to look at it.


There's also the NextOnlyStopOnStartLine debugger-specific option. You 
can check whether that fixes your problem.


Regards,

Joost.



Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread Michael Schnell

On 11/24/2014 11:44 AM, luiz americo pereira camara wrote:


If the program does not explicitly assume a specific encoding, i.e. it 
uses only the String type and does no low-level string handling, there 
will be no need to change.


I don't know the internals of the program(s). It's a huge system and 
does anything that somehow might be possible :-) .


-Michael


Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread Juha Manninen
On Mon, Nov 24, 2014 at 11:33 AM, Michael Schnell  wrote:
> IMHO that would be just GREAT for writing portable software. The RTL
> and LCL interface should be OS-agnostic for portability. In user code, the
> user should be allowed to use the string encoding (and byte count per
> character) that he finds most convenient for his application.
>
> OTOH this of course brings a decent set of problems, including but not
> limited to unnecessary conversions in certain cases.

See the request from Mattias :
"Please test and tell what you find out."

Michael Schnell and others, let's keep this thread on a more concrete level.
You can start another philosophical thread about how strings should be
in a perfect world.

Juha



Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread Mattias Gaertner
On Sun, 23 Nov 2014 21:37:56 -0300
luiz americo pereira camara  wrote:

> 2014-11-20 13:21 GMT-03:00 Mattias Gaertner :
>[...]

First of all: Thanks for testing.

> Without {$codepage utf8} directive String constants will get Code Page 0
> (CP_ACP) and not the 1200 (UTF16 - UnicodeString).

Beware: There are different types of string constants.

 
> String variables assigned to those constants will also have Code Page = 0
> 
> This is because the constant string code page is evaluated at compile time
> 
> Not sure if there's a compiler command line param with same effect as
> {$codepage utf8}
> 
> The attached program shows how data loss can occur

The program uses writeln, which converts to console CP.
When you save the strings to a file you can see what they contain. Or
write the byte values.
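
A minimal sketch of the "write the byte values" suggestion, assuming FPC 2.7.1 or later (for RawByteString and StringCodePage). DumpString is a hypothetical helper, not part of the RTL, and the test string is built from byte escapes so the result does not depend on how the source file is saved or on any {$codepage} setting.

```pascal
program DumpBytes;  // hypothetical name, for illustration only
{$mode objfpc}{$H+}

// Print the dynamic code page of S followed by each byte in hex.
// RawByteString is used so that no conversion happens at the call.
procedure DumpString(const Name: string; const S: RawByteString);
var
  i: Integer;
begin
  Write(Name, ' (cp ', StringCodePage(S), '):');
  for i := 1 to Length(S) do
    Write(' ', HexStr(Ord(S[i]), 2));
  WriteLn;
end;

var
  S: AnsiString;
begin
  // The UTF-8 bytes of "João", written as escapes.
  S := 'Jo'#$C3#$A3'o';
  DumpString('S', S);
end.
```

Unlike plain writeln of the string, this output is not affected by the console code page.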

This works with or without {$codepage utf8}:

  S := 'João';   // constant to (Ansi or Short)string
  W := S;
  SUTF8 := S;

  const c: string = 'João';
  W := c;        // constant to Wide/Unicode/UTF8String

This requires {$codepage utf8} or -Fcutf8:

  W := 'João';   // constant to Wide/Unicode/UTF8string

  const c = 'João';
  W := c;

I guess it would be a good idea to pass -Fcutf8 with FPC 2.7.1. For
both modes.


Mattias



Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread Mattias Gaertner
On Mon, 24 Nov 2014 12:15:03 +0100
Mattias Gaertner  wrote:

>[...]
> I guess it would be a good idea to pass -Fcutf8 with FPC 2.7.1. For
> both modes.

On second thought: only for new mode. 
Passing it in the old mode will make the wide/unicode/utf8string work,
but the Ansi/Shortstring will be wrong.

We need a table in the wiki. FPC 2.6.5 and below, FPC 2.7.1+
and FPC 2.7.1+ with UTF8 as default CP. And with or without {$codepage
utf8}.

Mattias



Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread Michael Schnell

On 11/24/2014 12:01 PM, Juha Manninen wrote:

See the request from Mattias : "Please test and tell what you find out."


I do not have enough knowledge to be able to patch the compiler :-(


let's keep this thread on a more concrete level.

Agreed (even though I don't think that will lead to anything fairly portable).

As requested by Michael vC, I will do a Wiki page tomorrow and start a 
new Thread based on this.


-Michael





Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread Mattias Gaertner
On Mon, 24 Nov 2014 13:12:04 +0100
Michael Schnell  wrote:

> On 11/24/2014 12:01 PM, Juha Manninen wrote:
> > See the request from Mattias : "Please test and tell what you find out."
> 
> I do not have enough knowledge to be able to patch the compiler :-(

I asked for testing by compiling with -dEnableUTF8RTL.
Don't hijack threads.

Mattias



Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread Mattias Gaertner
On Sun, 23 Nov 2014 18:27:12 -0300
luiz americo pereira camara  wrote:

> 2014-11-20 13:21 GMT-03:00 Mattias Gaertner :
>[...]
> Please test and tell what you find out.
> >
> >
> The FormatSettings fields are still encoded with System Code Page
> regardless of DefaultSystemCodePage value.
> 
> While for English locales there's no problem, other locales like PT-BR have
> accented names for days and months.
> 
> The problem is in the Windows SysUtils.GetLocaleStr function, which uses a
> non-Unicode Win API function. This problem will also affect the UnicodeString
> RTL.
> 
> Attached is a test app that shows the issue. It also has a version of
> GetLocaleStr that fixes the issue for the RTL (both versions)

Thanks. It works here too.

I reported it:
http://bugs.freepascal.org/view.php?id=27086

Mattias



Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread Hans-Peter Diettrich

Michael Schnell schrieb:

On 11/23/2014 07:52 PM, Felipe Monteiro de Carvalho wrote:


Well, the first reports of what the Unicode RTL would look like were
pretty scary: a total break of the string part of millions of lines of
code that people have written with Lazarus over the years.

That is why I stopped recommending Lazarus to my colleagues who are 
doing Delphi.


They went through a huge amount of pain to convert their software from Delphi 
one-byte strings to Delphi two-byte strings.


I had similar problems, but only in porting a huge codebase from 
ShortString to AnsiString. The move from D5 to XE was painless then; 
only the uses lists needed some updates. To that extent it might be a good 
idea to teach some old-school Delphi coders how to deal with managed 
strings and other post-BP items in general.


As for Lazarus, I think that UTF-8 is the best choice for multi-platform 
projects, with almost no extra conversions required on any platform.
Please note that until now Windows did the Ansi to UTF conversions 
itself, in every API call with strings involved. If this was not noticed 
before, the conversions won't be noticeable afterwards as well.


A move to UTF-16 instead will only favor Windows, while additional 
string conversions will be required on almost every other platform. I 
think that FPC/Lazarus should fork and support separate libraries 
(RTL...) for UTF-8 and UTF-16 strings, if compatibility with newer 
Delphi VCL projects is desired. Full Delphi compatibility would also 
require a FireMonkey replacement for the LCL, and that would be another 
entirely new project, extending the UTF-16 branch (only).


Just my 0.02€
DoDi




Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread Michael Schnell

On 11/24/2014 02:19 PM, Hans-Peter Diettrich wrote:


A move to UTF-16 instead will only favor Windows,

Regarding the RTL interface, you of course are right.

Writing the user software with UTF-16 instead of UTF-8 strings in many 
cases (but of course not perfectly) allows keeping old-style 1-byte 
ANSI code that uses s[n] and manually uses the result of pos().


-Michael



Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread Sven Barth
Am 24.11.2014 14:55 schrieb "Hans-Peter Diettrich" :
> Please note that until now Windows did the Ansi to UTF conversions
> itself, in every API call with strings involved. If this was not noticed
> before, the conversions won't be noticeable afterwards as well.

This is something that one definitely shouldn't forget! Up to now Windows
did the conversion for us and do we see people complaining about the
conversion during API calls? No, we don't...

Regards,
Sven


Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread Michael Schnell

On 11/24/2014 02:50 PM, Hans-Peter Diettrich wrote:


code, the user should be allowed to use the string encoding (and byte 
count per character) that he finds most convenient for his application.


I'm not sure what exactly you mean here.
Here I meant that for a *new project* the user might be willing to 
choose e.g. either UTF-16 (sometimes easier to use) or UTF-8 (sometimes 
faster and with less memory overhead) for his own code, while the RTL might 
be done specifically in favor of the OS.


-Michael



Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread Mattias Gaertner

Please don't start a UTF war again.

This has been discussed at length and a zillion times.

Mattias



Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread Graeme Geldenhuys
On 2014-11-24 10:52, Michael Schnell wrote:
> I don't know the internals of the program(s). It's a huge system and 
> does anything that somehow might be possible :-) .

Luckily you have everything unit tested, right? So it would simply be a
case of running the test suite to see what works and what doesn't. ;-)


Regards,
  - Graeme -

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/



Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread luiz americo pereira camara
2014-11-24 8:15 GMT-03:00 Mattias Gaertner :

> On Sun, 23 Nov 2014 21:37:56 -0300
> luiz americo pereira camara  wrote:
>
> > The attached program shows how data loss can occur
>
> The program uses writeln, which converts to console CP.
> When you save the strings to a file you can see what they contain. Or
> write the byte values.
>
>
Yes. I improved the program (see the message that followed) to write the byte
values, so the comparison should be more exact.


> This works with or without {$codepage utf8}:
>
> S := 'João'; // constant to (Ansi or Short)string
>

Without {$codepage utf8}:
When DefaultSystemCodePage is CP_ACP, the variable S will have UTF-8
content but its encoding will be ACP (in my case 1252), just as it is today.
With DefaultSystemCodePage as CP_UTF8, both content and code page will match.

[..]

> I guess it would be a good idea to pass -Fcutf8 with FPC 2.7.1. For
> both modes.
>
>
Probably yes.
There's one case that must be tested: when the file is encoded in ANSI, like
those shared with Delphi.
As I understand it, with -Fcutf8 the compiler will interpret that content
as UTF-8, creating wrongly encoded constants.

Does the $codepage directive override -Fcutf8?
If so, the developer could fix this by using $codepage with the correct file
encoding.

Luiz


Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread Mattias Gaertner
On Mon, 24 Nov 2014 12:45:54 -0300
luiz americo pereira camara  wrote:

> 2014-11-24 8:15 GMT-03:00 Mattias Gaertner :
>[...]
> > This works with or without {$codepage utf8}:
> >
> > S := 'João'; // constant to (Ansi or Short)string
> >
> 
> Without {$codepage utf8}:
> When DefaultSystemCodePage is CP_ACP, the variable S will have UTF-8
> content but its encoding will be ACP (in my case 1252), just as it is today.
> With DefaultSystemCodePage as CP_UTF8, both content and code page will match.

Yes, but CP_ACP is treated as CP_UTF8. So it does not matter.

 
> [..]
> 
> I guess it would be a good idea to pass -Fcutf8 with FPC 2.7.1. For
> > both modes.
> >
> >
> Probably yes.
> There's one case that must be tested: when the file is encoded in ANSI, like
> those shared with Delphi.
> As I understand it, with -Fcutf8 the compiler will interpret that content
> as UTF-8, creating wrongly encoded constants.

Yes.
 
> Does the $codepage directive override -Fcutf8?

Yes.

> If so, the developer could fix this by using $codepage with the correct file
> encoding.

Yes.
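
To make that concrete, here is a small sketch, assuming a source file that really is saved as Windows-1252 (for example one shared with Delphi) and a project compiled with -Fcutf8. The program name is invented, and the literal only decodes correctly if the file encoding matches the directive.

```pascal
program LegacySource;  // hypothetical name, for illustration only
{$mode objfpc}{$H+}
// Per-file override of -Fcutf8.
// Assumption: this file is stored as Windows-1252 on disk.
{$codepage cp1252}

var
  W: UnicodeString;
begin
  // The compiler decodes the literal using the declared code page
  // before converting it to UTF-16 for W.
  W := 'João';
  WriteLn('UTF-16 code units: ', Length(W));
end.
```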

Mattias



Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread Graeme Geldenhuys
On 2014-11-22 16:38, Michael Van Canneyt wrote:
> The exact behaviour of the RTL is controlled by a couple of variables:
> DefaultSystemCodePage, DefaultFileSystemCodePage , 
> DefaultRTLFileSystemCodePage.

I've read the updated wiki page, but still confused about something...

  TFormatSettings = record
    CurrencyFormat: Byte;
    NegCurrFormat: Byte;
    ThousandSeparator: Char;
    DecimalSeparator: Char;
    ...snip...


How are ThousandSeparator and DecimalSeparator supposed to work in
TFormatSettings? If you switched the RTL to UTF-8 or UTF-16 a Russian
thousand separator (4-byte non-breaking white space character) for
example will not fit into a Char type.
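
A tiny sketch of the size mismatch, taking U+00A0 (NO-BREAK SPACE) as an assumed example separator; which character a given locale actually uses is not established in this thread. The bytes are written as escapes so the source file encoding does not matter.

```pascal
program SeparatorSize;  // hypothetical name, for illustration only
{$mode objfpc}{$H+}

const
  // U+00A0 NO-BREAK SPACE encoded as UTF-8.
  NbspUtf8 = #$C2#$A0;

begin
  // A one-byte Char field cannot hold a multi-byte UTF-8 sequence.
  WriteLn('SizeOf(AnsiChar)   = ', SizeOf(AnsiChar));
  WriteLn('UTF-8 bytes needed = ', Length(NbspUtf8));
end.
```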

I haven't read this whole thread yet, and haven't played with the latest
FPC 2.7.1 yet - so maybe I'm just missing some key information for now.

Or is TFormatSettings just something that hasn't yet been converted to
be Unicode friendly?


Regards,
  - Graeme -




Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread Mattias Gaertner
On Mon, 24 Nov 2014 16:25:15 +
Graeme Geldenhuys  wrote:

>[...]
> Or is TFormatSettings just something that hasn't yet been converted to
> be Unicode friendly?

It has not yet been converted.

We can help the FPC team by collecting all such places.


Mattias



Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread Graeme Geldenhuys
On 2014-11-24 16:36, Mattias Gaertner wrote:
> It has not yet been converted.

Many thanks for confirming that.


> We can help the FPC team by collecting all places.

Where should we report this? Mantis or Unicode page of the Wiki?


Regards,
  - Graeme -




Re: [Lazarus] Need testers for the a new debugger

2014-11-24 Thread C Western

On 24/11/14 10:46, Joost van der Sluis wrote:

On 11/22/2014 12:45 PM, C Western wrote:

I have been switching back and forth between gdb and the new one - both
have some issues. The one I noticed most recently with the new one is
with a multi-threaded application - a breakpoint set in the thread seems to
cause the debugger to become "lost". Is the debugger set up to cope with
this situation? (I will often turn off the multi threading for
debugging, but this is not always possible.)


I did not do anything threading-related. So I'm not surprised that it
does not really work.

Can you create a bug report for this? And the OS you were using?


The OS was linux - x86_64

I just tried on lazarus/examples/multithreading/multithreadingexample1.lpi

Setting a break point on line 100 (in the thread routine) seems to lead 
to immediate disaster.


Colin




Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread Hans-Peter Diettrich

luiz americo pereira camara schrieb:

When DefaultSystemCodePage is CP_ACP, the variable S will have UTF-8 
content but its encoding will be ACP (in my case 1252), just as 
it is today.

With DefaultSystemCodePage as CP_UTF8 both content and code page will match


The Delphi (and FPC) encoding model allows for strings of different 
static (declared) and dynamic (true content) encoding, see the special 
handling of RawByteString (Wiki).


Hence it's not a good idea to simply *assume* that a string variable 
contains bytes of the declared encoding. In detail one should check or 
force the right dynamic encoding of every string variable, before 
searching for specific bytes (chars) in it.
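
One way to do such a check, as a sketch assuming FPC 2.7.1 or later: StringCodePage reports the dynamic encoding, and SetCodePage with Convert = False relabels a string whose bytes are already known to be UTF-8. The bytes are written as escapes, and the sketch assumes no {$codepage} directive or -Fc switch is in effect.

```pascal
program ForceEncoding;  // hypothetical name, for illustration only
{$mode objfpc}{$H+}

var
  S: RawByteString;
begin
  // The UTF-8 bytes of "João", written as escapes.
  S := 'Jo'#$C3#$A3'o';

  // Relabel only: Convert = False leaves the bytes untouched.
  SetCodePage(S, CP_UTF8, False);

  // Search for bytes only once the dynamic encoding is what we expect.
  if StringCodePage(S) = CP_UTF8 then
    WriteLn('bytes: ', Length(S), ', first "o" at byte ', Pos('o', S))
  else
    WriteLn('unexpected code page: ', StringCodePage(S));
end.
```

Passing Convert = True instead would transcode the content from its current code page, which is the right call when the label is trusted and the bytes are not yet UTF-8.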


I'm missing documentation for working safely (and efficiently) with such 
irregular strings, most probably none of the FPC (and Delphi) developers 
ever noticed how users are left alone with this problem :-(


DoDi




Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread Hans-Peter Diettrich

Graeme Geldenhuys schrieb:


How are ThousandSeparator and DecimalSeparator supposed to work in
TFormatSettings? If you switched the RTL to UTF-8 or UTF-16 a Russian
thousand separator (4-byte non-breaking white space character) for
example will not fit into a Char type.


The Char type is quite useless with Unicode, at least if it has less 
than 3 bytes (4 for UTF-8). There exist many more flaws in the RTL/LCL, 
assuming that a character always fits into a Char (like the Pos 
overload...).


In the best case Char could be retyped into a string (substring), so 
that it can hold any Unicode character *and* its encoding. Unicode 
string handling in general should always use substrings, for the same 
reasons. Until then 99.9% of occurrences of Char in UTF-8 aware library 
or application code can be considered bugs :-(


The FPC team can sort out the real low-level code (most probably only 
the string conversion routines), the rest will become Delphi 
incompatible when fixed.


DoDi




Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread Mattias Gaertner
On Mon, 24 Nov 2014 22:15:29 +0100
Hans-Peter Diettrich  wrote:

>[...]
> The Delphi (and FPC) encoding model allows for strings of different 
> static (declared) and dynamic (true content) encoding, see the special 
> handling of RawByteString (Wiki).
> 
> Hence it's not a good idea to simply *assume* that a string variable 
> contains bytes of the declared encoding. In detail one should check or 
> force the right dynamic encoding of every string variable, before 
> searching for specific bytes (chars) in it.
> 
> I'm missing documentation for working safely (and efficiently) with such 
> irregular strings, most probably none of the FPC (and Delphi) developers 
> ever noticed how users are left alone with this problem :-(

Maybe I don't understand the question, but it seems to me this is
documented where static-, dynamic cp and rawbytestring are explained.

http://wiki.freepascal.org/FPC_Unicode_support#Ansistring

When a procedure requires a specific encoding it uses a specific String
type. If it works with CP_ACP it uses "String". If it needs UTF8 it
uses UTF8String. If it can work with any 8-bit encoding it uses
RawByteString. If you need it even more detailed use the
StringCodePage function.
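
A short sketch of that convention, assuming FPC 2.7.1 or later. ShowCodePage and Utf8ByteCount are hypothetical routines made up for this example; the parameter type alone states which encoding each one expects.

```pascal
program EncodingParams;  // hypothetical name, for illustration only
{$mode objfpc}{$H+}

// Works with any 8-bit encoding: the argument arrives unconverted.
procedure ShowCodePage(const S: RawByteString);
begin
  WriteLn('dynamic code page: ', StringCodePage(S),
          ', bytes: ', Length(S));
end;

// Needs UTF-8: arguments declared with another code page are
// converted by the compiler at the call site.
function Utf8ByteCount(const S: UTF8String): SizeInt;
begin
  Result := Length(S);
end;

var
  U: UTF8String;
begin
  U := 'abc';
  ShowCodePage(U);
  WriteLn('UTF-8 bytes: ', Utf8ByteCount(U));
end.
```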

What else do you need?

Mattias



Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread Mattias Gaertner
On Mon, 24 Nov 2014 22:53:44 +0100
Hans-Peter Diettrich  wrote:

> Graeme Geldenhuys schrieb:
> 
> > How are ThousandSeparator and DecimalSeparator supposed to work in
> > TFormatSettings? If you switched the RTL to UTF-8 or UTF-16 a Russian
> > thousand separator (4-byte non-breaking white space character) for
> > example will not fit into a Char type.
> 
> The Char type is quite useless with Unicode,

Correction: *This* Char type needs to be extended.
"Char" in general is very useful.

> at least if it has less 
> than 3 bytes (4 for UTF-8). There exist many more flaws in the RTL/LCL, 
> assuming that a character always fits into a Char (like the Pos 
> overload...).

There is a Pos overload for strings. Where is the flaw in Pos?

 
> In the best case Char could be retyped into a string (substring),

That would be wrong in 99.9% of the cases.

> so 
> that it can hold any Unicode character *and* its encoding. Unicode 
> string handling in general should always use substrings, for the same 
> reasons. Until then 99.9% of occurrences of Char in UTF-8 aware library 
> or application code can be considered bugs :-(
> 
> The FPC team can sort out the real low-level code (most probably only 
> the string conversion routines), the rest will become Delphi 
> incompatible when fixed.

Please give real world examples.

Mattias



Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread Mattias Gaertner
On Mon, 24 Nov 2014 16:40:06 +
Graeme Geldenhuys  wrote:

>[...]
> Where should we report this? Mantis or Unicode page of the Wiki?

On second thought, a programmer needs to know what might fail and the
alternative/workaround. The latter depends on settings.
In case of the new LCL mode we can extend the "LCL Unicode support" page.


Mattias
