Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-12 Thread Michael Schnell

On 05/11/2016 04:38 PM, Michael Van Canneyt wrote:
>> Where is the string-type for string-buffers gone?
>
> There never was one, this would break in 2.6.4 too.

Right.

But ->
http://wiki.freepascal.org/not_Delphi_compatible_enhancement_for_Unicode_Support


-Michael
___
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal


Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Graeme Geldenhuys
On 2016-05-11 21:58, Mattias Gaertner wrote:
>> > They made the changes to take advantage of the new
>> > functionality in FPC 3.0, because the end result is much simpler code
>> > both in the LCL and in user programs.
>
> Yes, simpler and more powerful. For example FPC now supports full UTF-8
> in many RTL/FCL functions under Windows.

Thanks Jonas and Mattias. That is at least some promising news.

Regards,
  Graeme

--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp


Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Mattias Gaertner
On Wed, 11 May 2016 22:33:13 +0200
Jonas Maebe  wrote:

> Graeme Geldenhuys wrote:
> > On 2016-05-11 18:58, Michael Van Canneyt wrote:
> >> >  For 99,99% of cases, no changes to your code are required.
> >> >  If it worked in 2.6.4, it will work in 3.0.0
> >
> > Just curious, so why was there so many changes required for LCL, and a
> > whole wiki page of its own to explain it?
> 
> Those changes were not required, (almost) everything worked still fine 
> with the old code. They made the changes to take advantage of the new 
> functionality in FPC 3.0, because the end result is much simpler code 
> both in the LCL and in user programs.

Yes, simpler and more powerful. For example FPC now supports full UTF-8
in many RTL/FCL functions under Windows.

 
> The wiki page is mainly to explain all of the things you no longer have 
> to do when using this new method.

Yes. And the few incompatibilities.

Mattias


Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Tomas Hajny
On Wed, May 11, 2016 22:08, Graeme Geldenhuys wrote:
> On 2016-05-11 18:58, Michael Van Canneyt wrote:
>> For 99,99% of cases, no changes to your code are required.
>> If it worked in 2.6.4, it will work in 3.0.0
>
> Just curious, so why was there so many changes required for LCL, and a
> whole wiki page of its own to explain it?

My understanding: Because LCL wanted to benefit from new possibilities in
version 3.0.0 (e.g. use functionality newly provided by the RTL instead of
certain own alternative routines), but the previous LCL code included some
assumptions (like that all ansistrings should always contain UTF-8) which
may not always be the case in FPC RTL by default (e.g. certainly not for
MS Windows applications). But again - they could continue using the
original code as it was if they wanted to do so.

Tomas




Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Jonas Maebe

Graeme Geldenhuys wrote:
> On 2016-05-11 18:58, Michael Van Canneyt wrote:
>> For 99,99% of cases, no changes to your code are required.
>> If it worked in 2.6.4, it will work in 3.0.0
>
> Just curious, so why was there so many changes required for LCL, and a
> whole wiki page of its own to explain it?

Those changes were not required, (almost) everything worked still fine
with the old code. They made the changes to take advantage of the new
functionality in FPC 3.0, because the end result is much simpler code
both in the LCL and in user programs.

The wiki page is mainly to explain all of the things you no longer have
to do when using this new method.



Jonas


Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Graeme Geldenhuys
On 2016-05-11 18:58, Michael Van Canneyt wrote:
> For 99,99% of cases, no changes to your code are required. 
> If it worked in 2.6.4, it will work in 3.0.0

Just curious, so why was there so many changes required for LCL, and a
whole wiki page of its own to explain it?

Regards,
  Graeme



Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Sven Barth
On 11.05.2016 19:35, "Santiago A." wrote:
> Something about codepages needs a second thought.
>
> a) There shouldn't be automatic conversion at all.
> b) The codepage of a string shouldn't change when you assign a string
> with another codepage, just raise an error.
> c) Corollary of previous premises: Empty strings should also have a
> codepage.

The codepage-aware ansistring was implemented for Delphi compatibility, so
this is highly unlikely to change.

> Extra 1) Besides calling SetCodePage, it would be handy if you could
> set the codepage when you declare a string. I don't mean codepage should be
> statically typed, just it would be handy.

A string is nil upon its declaration, so there is nowhere you could
store that information. It only has the static codepage that it was
declared with.

> Extra 2) Being able to set the codepage statically, so that a codepage
> mismatch could be detected at compile time, would be handy. In this case I
> do mean codepage could also be statically typed,

Codepages are already set statically.

Regards,
Sven

Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Michael Van Canneyt



On Wed, 11 May 2016, Santiago A. wrote:

> On 11/05/2016 at 16:38, Michael Van Canneyt wrote:
>>> FPC 3.0 adds unsafe auto-conversions
>>
>> Why do you think it is unsafe ?
>
> I have an answer for this.
>
> In short:
> Different codepage strings and raw strings should be considered
> different incompatible types. Pascal is a hardtyped language, and I love
> that, and codepages are prone to errors (all these threads prove it).

They are only prone to errors if you don't understand what is happening.

That is so for any feature.

> Something about codepages needs a second thought.
>
> a) There shouldn't be automatic conversion at all.

This is simply not debatable, it is Delphi compatibility that requires this.


To be clear: I think all the problems are hugely exaggerated and blown out
of proportion.

For 99,99% of cases, no changes to your code are required. 
If it worked in 2.6.4, it will work in 3.0.0


You will only have problems if different codepages are used explicitly
somewhere, or if the characters are in a different codepage than the one
the string's codepage setting declares
(which is what is happening in TStringField.AsString).


In those cases, you would have problems anyway, no matter what the solution.
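
The second case (bytes stored in one codepage while the label claims
another) can be shown at byte level. What follows is only a sketch in
Python, not FPC code, and it makes no claim about how
TStringField.AsString works internally; the sample text and the helper
name `convert` are invented for illustration. It assumes a conversion
that simply trusts the declared codepage.

```python
# Illustration only: a code page conversion trusts the label attached to
# the bytes. If the label is wrong, the conversion itself corrupts the data.

def convert(raw: bytes, declared: str, target: str) -> bytes:
    """Convert raw bytes from the declared encoding to the target encoding."""
    return raw.decode(declared).encode(target)

text = "Привет"                      # sample text (hypothetical data)
stored = text.encode("cp1251")       # the bytes really are cp1251

# Correct label: the conversion to UTF-8 preserves the text.
good = convert(stored, "cp1251", "utf-8")
print(good.decode("utf-8") == text)  # True

# Wrong label: cp1251 bytes declared as Latin-1 become different characters.
bad = convert(stored, "latin-1", "utf-8")
print(bad.decode("utf-8") == text)   # False

# No conversion at all: bytes copied untouched stay intact whatever the label.
untouched = bytes(stored)
print(untouched == stored)           # True
```

With the correct label the conversion is lossless; with the wrong one the
conversion itself produces different characters, and copying the bytes
without any conversion leaves them intact.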

I have a huge codebase dealing with databases and lots of string manipulation. 
It uses 2.6.4. It converts data from a database with cp1251 data to UTF8, 
in 2.6.4.


I have recompiled the code, I am running this since 3.0 came out, and have yet 
to encounter the first problem in the applications.


Michael.

Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Santiago A.
On 11/05/2016 at 16:38, Michael Van Canneyt wrote:
>
>> FPC 3.0 adds unsafe auto-conversions
>
> Why do you think it is unsafe ?
>
I have an answer for this.

In short:
Different codepage strings and raw strings should be considered
different incompatible types. Pascal is a hardtyped language, and I love
that, and codepages are prone to errors (all these threads prove it).

Something about codepages needs a second thought.

a) There shouldn't be automatic conversion at all.
b) The codepage of a string shouldn't change when you assign a string
with another codepage, just raise an error.
c) Corollary of previous premises: Empty strings should also have a codepage.

Extra 1) Besides calling SetCodePage, it would be handy if you could
set the codepage when you declare a string. I don't mean codepage should
be statically typed, just it would be handy.
Extra 2) Being able to set the codepage statically, so that a codepage
mismatch could be detected at compile time, would be handy. In this
case I do mean codepage could also be statically typed.

--
Regards

Santi
s...@ciberpiula.net


Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Jonas Maebe


Andreas Dorn wrote on Wed, 11 May 2016:

> All in all Graeme is right. FPC looks pretty much broken to me, too.
> For my projects I pulled the emergency brake on anything FPC.
>
> The most serious flaws for me of FPC 3.0 are:
> - assuming that it's possible to assign an encoding to every string
> - using an (unsafe) guess about the encoding for auto-conversions

Do you have code that works correctly in FPC 2.6.x, but not in FPC
3.0? If so, can you please post it or file bug reports? Again: the
main focus when designing all of this new functionality was backward
compatibility: existing code that uses plain
string/shortstring/ansistring/unicodestring/char/widechar/unicodechar/pchar/pwidechar/punicodechar
should have the same behaviour in FPC 3.0 as in previous FPC versions
if you don't make any changes. And in virtually all cases it does (the
utf8string type being a notable exception).



> Some examples:
> 1) String-Buffers
> Split a UTF-8 String into chunks of 1024 bytes. Trying to assign an
> encoding to those chunks, and allowing auto-conversions will just lead
> to corruption.
>
> Where is the string-type for string-buffers gone?

There never was any, but as long as you don't try to convert strings
containing such arbitrary data from one code page to another (by
either calling setcodepage() or by assigning them from a string with
declared code page X to a string with declared code page Y), no
conversions will happen.
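
The distinction between moving bytes around and converting them can be
sketched at byte level. This is a Python illustration, not FPC code; the
chunk size of 3 and the helper names `split_bytes` and
`chunk_is_valid_utf8` are made up for the example (the thread talks about
1024-byte chunks).

```python
# Illustration only: splitting UTF-8 data at arbitrary byte offsets.

def split_bytes(data: bytes, size: int) -> list:
    """Cut data into chunks of at most `size` bytes, ignoring character boundaries."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def chunk_is_valid_utf8(chunk: bytes) -> bool:
    """Report whether a chunk on its own decodes as UTF-8."""
    try:
        chunk.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

text = "naïve café"                  # sample text with two 2-byte characters
data = text.encode("utf-8")          # 12 bytes
chunks = split_bytes(data, 3)        # tiny chunk size, chosen to force a split

# Treating the chunks as opaque bytes and joining them is lossless.
print(b"".join(chunks).decode("utf-8") == text)      # True

# Converting each chunk on its own breaks as soon as a chunk boundary
# falls inside a multi-byte sequence.
print([chunk_is_valid_utf8(c) for c in chunks])      # [False, False, True, True]
```

The first two chunks cut the two-byte sequence for "ï" in half, so neither
can be converted on its own, while the untouched bytes reassemble without
loss.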



> 2) Most programming languages out there use something like "sequence of
> UTF-16 codepoints" as a string-type.
> (That's not the same as UTF-16 string !)
> It's a proper string type for "UTF-16 buffer" - pretty much nobody out
> there uses a low-level string-type that assumes
> that the content is a complete UTF-16 string.

The meaning of UnicodeString has not changed in FPC 3.0 compared to
previous FPC versions, nor the way they are converted to/from other
string types. You can argue it was broken from the start, but that's
unrelated to the present animosity that's getting vented about FPC 3.0.



> 3) Filenames on Windows
> You can't convert any random filename on Windows to UTF8 and back without
> data loss.
> There simply isn't any encoding that correctly fits to all possible
> filenames.

We only auto-convert Windows file names from UTF-16 to anything else
if you use non-unicodestring/widestring variables with the file name
APIs. If you consistently use unicodestring/widestring, no conversion
will happen (except with not yet converted APIs, such as classes).



> A lot of APIs use buffers. You can try to assign an encoding to a buffer,
> but if you use that encoding
> to auto-convert anything you made a blatant mistake. Assuming that anything
> from the outside world
> (WindowsAPI, C#, Java...) is UTF-16 is yet another blatant mistake...

Maybe we should add support for "WTF-8" like in Rust:
https://github.com/rust-lang/rust/issues/12056
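
For readers unfamiliar with the WTF-8 idea, here is a rough byte-level
sketch in Python (not FPC code). It assumes the issue under discussion is
that Windows file names are sequences of 16-bit units that need not be
well-formed UTF-16, for example a lone surrogate; the file name is
hypothetical, and Python's `surrogatepass` error handler stands in for a
WTF-8 style encoding.

```python
# Illustration only: a name containing a 16-bit unit that is not valid UTF-16.
name = "report_\ud800.txt"           # hypothetical file name, lone high surrogate

# Strict UTF-8 has no representation for a lone surrogate.
try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
print(strict_ok)                                           # False

# The "surrogatepass" handler encodes the surrogate anyway, similar in
# spirit to WTF-8, so the round trip is lossless.
relaxed = name.encode("utf-8", "surrogatepass")
print(relaxed.decode("utf-8", "surrogatepass") == name)    # True

# A lossy conversion substitutes the unit, so distinct names can collide.
lossy = name.encode("utf-8", "replace")
print(lossy)                                               # b'report_?.txt'
```

Keeping such names in a 16-bit string type avoids the question entirely,
since no conversion takes place.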



> 4) some Barcodes,

I would not consider these to be strings, but other than that the same
holds as for String Buffers above.

> 5) Various File-Format-Standards,

Idem.

> 6) anything that uses ASCII + some Control-Bytes for communication,

Idem.

> 7) some encodings used in databases, ...
> all that won't fit into the FPC scheme of 'known encodings'..
>
> The most obvious showstoppers for FPC 3.0 are:
> FPC 3.0 doesn't have a useful type for string-buffers.

Use arrays, like in any other programming language. If you insist on
using strings, simply stick to consistently using a single string type.

> FPC 3.0 doesn't have a useful type for Filenames

Use UnicodeString: as long as you do not assign it to another string
type, it won't get converted.

> FPC 3.0 adds unsafe auto-conversions

Where/when?


Jonas

Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Michael Van Canneyt



On Wed, 11 May 2016, Andreas Dorn wrote:

> All in all Graeme is right. FPC looks pretty much broken to me, too.
> For my projects I pulled the emergency brake on anything FPC.
>
> The most serious flaws for me of FPC 3.0 are:
> - assuming that it's possible to assign an encoding to every string
> - using an (unsafe) guess about the encoding for auto-conversions
>
> It's not possible to assign a valid encoding to every string (not
> automatically, and not even manually).

Please stop spreading FUD, this is plainly a false statement.

> Some examples:
> 1) String-Buffers
> Split a UTF-8 String into chunks of 1024 bytes. Trying to assign an encoding to
> those chunks, and allowing auto-conversions will just lead to corruption.
>
> Where is the string-type for string-buffers gone?

There never was one, this would break in 2.6.4 too.

If you thought there was in 2.6.4, you are simply mistaken.

> 2) Most programming languages out there use something like "sequence of UTF-16
> codepoints" as a string-type.
> (That's not the same as UTF-16 string !)
> It's a proper string type for "UTF-16 buffer" - pretty much nobody out there
> uses a low-level string-type that assumes
> that the content is a complete UTF-16 string.

No-one stops you from using Unicodestring ?

> 3) Filenames on Windows
> You can't convert any random filename on Windows to UTF8 and back without
> data loss.
> There simply isn't any encoding that correctly fits to all possible filenames.

You will need to explain what you mean by this.


> A lot of APIs use buffers. You can try to assign an encoding to a buffer, but
> if you use that encoding
> to auto-convert anything you made a blatant mistake. Assuming that anything
> from the outside world
> (WindowsAPI, C#, Java...) is UTF-16 is yet another blatant mistake...
>
> 4) some Barcodes,
> 5) Various File-Format-Standards,
> 6) anything that uses ASCII + some Control-Bytes for communication,
> 7) some encodings used in databases, ...
> all that won't fit into the FPC scheme of 'known encodings'..

FPC 3.0.0 has not changed compared to 2.6.4 in this regard.

> The most obvious showstoppers for FPC 3.0 are:
> FPC 3.0 doesn't have a useful type for string-buffers.

Please explain what you mean with 'string buffers'.

When using e.g. windows or C apis, the string buffer you need to use is
either "Array of char" or "array of widechar".

Which one you should use depends on the API you want to access.

In the case of Array of Char, you must take care of encoding, but this was so
in 2.6.4 as well.

Nothing has changed in this regard.

> FPC 3.0 doesn't have a useful type for Filenames

Just use the native filename type, or UnicodeString.

> FPC 3.0 adds unsafe auto-conversions

Why do you think it is unsafe ?

Michael.

Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Andreas Dorn
All in all Graeme is right. FPC looks pretty much broken to me, too.
For my projects I pulled the emergency brake on anything FPC.

The most serious flaws for me of FPC 3.0 are:
- assuming that it's possible to assign an encoding to every string
- using an (unsafe) guess about the encoding for auto-conversions

It's not possible to assign a valid encoding to every string (not
automatically, and not even manually).

Some examples:

1) String-Buffers
Split a UTF-8 String into chunks of 1024 bytes. Trying to assign an encoding to
those chunks, and allowing auto-conversions will just lead to corruption.

Where is the string-type for string-buffers gone?

2) Most programming languages out there use something like "sequence of
UTF-16 codepoints" as a string-type.
(That's not the same as UTF-16 string !)
It's a proper string type for "UTF-16 buffer" - pretty much nobody out there
uses a low-level string-type that assumes
that the content is a complete UTF-16 string.

3) Filenames on Windows
You can't convert any random filename on Windows to UTF8 and back without
data loss.
There simply isn't any encoding that correctly fits to all possible filenames.

A lot of APIs use buffers. You can try to assign an encoding to a buffer, but
if you use that encoding
to auto-convert anything you made a blatant mistake. Assuming that anything
from the outside world
(WindowsAPI, C#, Java...) is UTF-16 is yet another blatant mistake...

4) some Barcodes,
5) Various File-Format-Standards,
6) anything that uses ASCII + some Control-Bytes for communication,
7) some encodings used in databases, ...
all that won't fit into the FPC scheme of 'known encodings'..

The most obvious showstoppers for FPC 3.0 are:
FPC 3.0 doesn't have a useful type for string-buffers.
FPC 3.0 doesn't have a useful type for Filenames
FPC 3.0 adds unsafe auto-conversions

Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Jonas Maebe


Graeme Geldenhuys wrote on Wed, 11 May 2016:

> My test program under FPC 2.6.4 doesn't give problems. It's when that
> same program is compiled under FPC 3.0.0 that it does. All due to String
> (and thus AnsiString) changing its encoding based on the running
> environment.

In FPC 2.6.x, if you use a widestring manager (such as cwstring), the
code page of shortstring/ansistring/pchar /also/ depends on the
running environment, in fact in exactly the same way as in FPC 3.x.
The main thing that is new in FPC 3.x, is that instead of the RTL
assuming that all ansistring contents are always encoded in this code
page, we now explicitly attach the code page information in a hidden
field of the ansistring structure (so different ansistrings can have
different encodings, but the default for plain ansistrings remains
exactly the same).

> With FPC 2.6.4 compiled programs, no matter the environment
> (UTF-8 or Latin-1), my test program behaves the same.

As I asked before: did you change "String" to "Unicodestring" when
compiling under FPC 2.6.4? As mentioned before, {$modeswitch
unicodestrings} is a new feature in FPC 3.0 and is ignored by FPC
2.6.4 (you will get a warning about this when compiling with FPC
2.6.4, but not an error).



Jonas


Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Michael Van Canneyt



On Wed, 11 May 2016, Jonas Maebe wrote:

> Michael Van Canneyt wrote on Wed, 11 May 2016:
>> On Wed, 11 May 2016, Graeme Geldenhuys wrote:
>>> If anybody has the time, I would really like to learn how. Using FPC
>>> 3.x. Running the example program in a Latin-1 [console] environment and
>>> still get the correct data stored in the output.data file. I can't see a
>>> solution until SqlDB is changed.
>>
>> There is none satisfactory.
>
> While not 100% satisfactory, doing exactly the same as in FPC 2.6.x will give
> you exactly the same as you got there.

I am aware of this. We are using lots of DB apps at my work and so I tested
this.
The apps work without change.

The biggest problem for going to 3.0.0 is the currency bug.

Michael.


Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Graeme Geldenhuys
On 2016-05-11 14:14, Jonas Maebe wrote:
> compared to FPC 2.6.x due to automatic conversions in the RTL and FCL.  
> When it is clear that is not true, you are now saying that the  
> behaviour of FPC 3.0 is different to FPC 2.6.x if you compile  
> different code with each one.

My test program under FPC 2.6.4 doesn't give problems. It's when that
same program is compiled under FPC 3.0.0 that it does. All due to String
(and thus AnsiString) changing its encoding based on the running
environment. With FPC 2.6.4 compiled programs, no matter the environment
(UTF-8 or Latin-1), my test program behaves the same.

I give up!

Regards,
  Graeme



Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Jonas Maebe


Michael Van Canneyt wrote on Wed, 11 May 2016:

> On Wed, 11 May 2016, Graeme Geldenhuys wrote:
>> If anybody has the time, I would really like to learn how. Using FPC
>> 3.x. Running the example program in a Latin-1 [console] environment and
>> still get the correct data stored in the output.data file. I can't see a
>> solution until SqlDB is changed.
>
> There is none satisfactory.

While not 100% satisfactory, doing exactly the same as in FPC 2.6.x
will give you exactly the same as you got there.



Jonas


Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Jonas Maebe

Graeme Geldenhuys wrote on Wed, 11 May 2016:

> On 2016-05-11 13:27, Jonas Maebe wrote:
>> If you change the string to unicodestring (since
>> FPC 2.6.4 does not know {$modeswitch unicodestring}), you should get
>> the same results in FPC 2.6.4 and FPC 3.x
>
> No, because FPC 2.6.4 doesn't do automatic encoding conversions.

FPC 2.6.x and FPC 3.0 perform exactly the same automatic encoding
conversions when assigning that ansistring property to a unicodestring
variable in your test program.

> I would
> first have to add UTF8Decode() calls wherever I assign known UTF-8 data
> to a UnicodeString.

If you do the same in FPC 3.0, you will get exactly the same results
as in FPC 2.6.x.

> With FPC 2.6.4 I never use UnicodeString. Like with Lazarus LCL, I use
> AnsiString with a UTF-8 payload. I define a new type which I use in my
> application to remind me of that fact.

First, you start with a warning about how no one should use FPC 3.0
with the String type, because it completely changes the behaviour
compared to FPC 2.6.x due to automatic conversions in the RTL and FCL.
When it is clear that is not true, you are now saying that the
behaviour of FPC 3.0 is different to FPC 2.6.x if you compile
different code with each one.

Well, yes: if you use different code in FPC 2.6.x and FPC 3.x, then
you can indeed get very different behaviour.



Jonas


Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Michael Van Canneyt



On Wed, 11 May 2016, Graeme Geldenhuys wrote:

> On 2016-05-11 13:37, Michael Van Canneyt wrote:
>> It would not help if we did this: the data would be wrong in the TDataset
>> buffers, and the result would be worse.
>
> I didn't mean literally search and replace - that would simply be too
> easy. ;-) Some work and testing would be required, otherwise Delphi
> would have had Unicode support much sooner.
>
>> You just need to know what conversions happen where, and if you do it
>> works just fine.
>
> If anybody has the time, I would really like to learn how. Using FPC
> 3.x. Running the example program in a Latin-1 [console] environment and
> still get the correct data stored in the output.data file. I can't see a
> solution until SqlDB is changed.

There is none satisfactory.
As I said: TField.AsString is a problem, we are aware of it.


Michael.


Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Graeme Geldenhuys
On 2016-05-11 13:27, Jonas Maebe wrote:
> If you change the string to unicodestring (since  
> FPC 2.6.4 does not know {$modeswitch unicodestring}), you should get  
> the same results in FPC 2.6.4 and FPC 3.x

No, because FPC 2.6.4 doesn't do automatic encoding conversions. I would
first have to add UTF8Decode() calls wherever I assign known UTF-8 data
to a UnicodeString.

With FPC 2.6.4 I never use UnicodeString. Like with Lazarus LCL, I use
AnsiString with a UTF-8 payload. I define a new type which I use in my
application to remind me of that fact.

Regards,
  Graeme



Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Graeme Geldenhuys
On 2016-05-11 13:37, Michael Van Canneyt wrote:
> 
> It would not help if we did this: the data would be wrong in the TDataset
> buffers, and the result would be worse.

I didn't mean literally search and replace - that would simply be too
easy. ;-) Some work and testing would be required, otherwise Delphi
would have had Unicode support much sooner.


> You just need to know what conversions happen where, and if you do it 
> works just fine.

If anybody has the time, I would really like to learn how. Using FPC
3.x. Running the example program in a Latin-1 [console] environment and
still get the correct data stored in the output.data file. I can't see a
solution until SqlDB is changed.


Regards,
  Graeme



Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Graeme Geldenhuys
On 2016-05-11 12:03, Graeme Geldenhuys wrote:
> I wrote a small
> test program that reads data from a Firebird database where the database
> and field charset is set to UTF8.

For those that want to try the sample application, a backup of the
database (3.7MB in size) can be found at:

  http://geldenhuys.co.uk/~graemeg/temp/unicode_test.fbk


Regards,
  Graeme




Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Michael Van Canneyt



On Wed, 11 May 2016, Graeme Geldenhuys wrote:

> On 2016-05-11 13:07, Michael Van Canneyt wrote:
>> But what does your program prove ?
>
> See below...
>
>> You're only proving that a conversion happens when you do
>> s := fieldByName('somefield').asString;
>
> I'm proving that using the String type anywhere in the RTL and FCL is
> now terrible. If the FPC team did a search and replace (String ->
> UnicodeString) all over the RTL and FCL, then such data corruption
> probably would not have occurred because UnicodeString is not affected
> by the running environment. Probably the exact reason why Delphi now
> uses String = UnicodeString on all platforms.

It would not help if we did this: the data would be wrong in the TDataset
buffers, and the result would be worse.

I agree that the situation is currently not ideal: TField.AsString is a
problem, but it's nowhere near as bad as you make it out to be.

You just need to know what conversions happen where, and if you do it
works just fine.


Michael.


Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread LacaK






> That's because you have {$modeswitch unicodestring}, so
> string=unicodestring.

This is the answer to my question :-)



Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread LacaK






> I just double checked my results again. With u: String variable and
> compiled with FPC 3.0 and running in a Latin-1 environment, data is
> completely corrupted.

It would be good to know where this happens.
Because AFAIK fcl-db internally uses AnsiString/String, so assigning
between them should not trigger any code page conversion.
So if you fetch UTF-8 data from the database and then move it between
various string instances, it should be preserved
(no matter that the ACP of String is Latin1).
So in the end, when you save this data to a file, it should still be UTF-8
encoded?

Can you dump the binary content of "u" before it is saved to the file?
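
As a rough guide to reading such a dump, the following Python sketch (not
FPC code; the helper `hexdump` and the sample character are chosen for
illustration) prints the bytes of the same character when it is intact
UTF-8, when it has been converted to Latin-1, and when UTF-8 bytes were
taken for Latin-1 and converted to UTF-8 again.

```python
# Illustration only: byte patterns for "é" (U+00E9) in three situations.

def hexdump(raw: bytes) -> str:
    """Return the bytes as space-separated upper-case hex pairs."""
    return " ".join(f"{b:02X}" for b in raw)

utf8 = "é".encode("utf-8")                       # intact UTF-8
latin1 = "é".encode("latin-1")                   # converted to Latin-1
double = utf8.decode("latin-1").encode("utf-8")  # UTF-8 taken for Latin-1, re-encoded

print(hexdump(utf8))     # C3 A9
print(hexdump(latin1))   # E9
print(hexdump(double))   # C3 83 C2 A9
```

A dump showing the third pattern would point to a conversion applied to
data that was already UTF-8; the first pattern would mean the bytes reached
that point unchanged.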
-Laco.



Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Jonas Maebe


Graeme Geldenhuys wrote on Wed, 11 May 2016:

> On 2016-05-11 13:05, Jonas Maebe wrote:
>> Your concern is with utf8string, not with string/ansistring.
>
> UTF8String is a AnsiString with utf-8 code page set.
>
>> If you
>> only use string/ansistring/unicodestring, then the behaviour of your
>> program will be identical with FPC 2.6.4 and 3.0. With utf8string, the
>> result is different in FPC 3.0 because now, just like when assigning
>
> No it's not. I welcome you to try the program yourself. The test program
> includes a $DEFINE where I can toggle between using String or
> UTF8String. Simply disable that define at the top of the unit.

That's because you have {$modeswitch unicodestring}, so
string=unicodestring. If you change the string to unicodestring (since
FPC 2.6.4 does not know {$modeswitch unicodestring}), you should get
the same results in FPC 2.6.4 and FPC 3.x (since then they are also
actually using the same string type).


I can't easily try myself as I have no database server whatsoever  
running, nor any experience with setting them up.



Jonas


Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Graeme Geldenhuys
On 2016-05-11 13:07, Michael Van Canneyt wrote:
> 
> But what does your program prove ?

See below...

> You're only proving that a conversion happens when you do
> s := fieldByName('somefield').asString;

I'm proving that using the String type anywhere in the RTL and FCL is
now terrible. If the FPC team did a search and replace (String ->
UnicodeString) all over the RTL and FCL, then such data corruption
probably would not have occurred, because UnicodeString is not affected
by the running environment. That is probably the exact reason why Delphi
now uses String = UnicodeString on all platforms.

> AFAIK 3.0 is no different in this matter from 2.6.4, Jonas can confirm/deny. 
> Unlike 2.6.4, 3.0.0 offers us the possibility to fix it by allowing to specify

See my reply to Jonas. There is a massive difference between FPC 2.6.4
and 3.0.0 using the exact same program and test environment.

I can't see how anybody can currently switch to FPC 3.0.0 - it simply
isn't ready for prime usage. As my test shows, you can't simply
recompile your application with FPC 3.0.0 and think it is going to work
like it did in FPC 2.6.4 - it doesn't.

Yes some parts in FPC 3.0 are now in place going forward, but there is
still too much that can go wrong (in the RTL and FCL) due to the
dynamically changing AnsiString type being used everywhere.


Regards,
  Graeme

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp


Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Graeme Geldenhuys
On 2016-05-11 13:05, Jonas Maebe wrote:
> 
> Your concern is with utf8string, not with string/ansistring.

UTF8String is an AnsiString with the utf-8 code page set.

> If you
> only use string/ansistring/unicodestring, then the behaviour of your  
> program will be identical with FPC 2.6.4 and 3.0. With utf8string, the  
> result is different in FPC 3.0 because now, just like when assigning  

No it's not. I welcome you to try the program yourself. The test program
includes a $DEFINE where I can toggle between using String or
UTF8String. Simply disable that define at the top of the unit.

I just double checked my results again. With u: String variable and
compiled with FPC 3.0 and running in a Latin-1 environment, data is
completely corrupted.

[unicode_test]$ export LANG=en_US.ISO8859-1
[unicode_test]$ ./unicodetest

And see the attached screenshot for the result of the data.

Regards,
  Graeme

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp

Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Michael Van Canneyt



On Wed, 11 May 2016, Graeme Geldenhuys wrote:


> Hi,
>
> Here is an example [proof if you will] of the problem. I wrote a small
> test program that reads data from a Firebird database where the database
> and field charset is set to UTF8.
>
> I compile the program, then run it. No recompiles between the two runs.
> For the first run my system is set to a UTF-8 locale. For the second run
> I set my system to an ISO8859-1 (Latin-1) locale. The program
> outputs the DefaultSystemCodePage to the console.
>
> Because the locale changes the behaviour of String (aka AnsiString) in
> the RTL and FCL, the first run works, but the second run corrupts my data.
>
> Console output:
>
> [unicode_test]$ export LANG=en_US.UTF-8
> [unicode_test]$ ./unicodetest
> 65001
>
> [unicode_test]$ export LANG=en_US.ISO8859-1
> [unicode_test]$ ./unicodetest
> 28591
>
>
> In my test program I write the data read from the database to a file
> using TFileStream, thus console and file encoding settings will not
> affect the data being written to file. TFileStream is simply writing bytes.


But what does your program prove ?

You're only proving that a conversion happens when you do
s := fieldByName('somefield').asString;
and that the conversion takes into account the locale, which in one of the 2
runs is different from the actual locale data in the database.

This conversion is as-designed, and known to be wrong in the case of TField.AsString, 
but will not be solved by simply using {$modeswitch unicodestring} in the database code.


AFAIK 3.0 is no different in this matter from 2.6.4, Jonas can confirm/deny. 
Unlike 2.6.4, 3.0.0 offers us the possibility to fix it by allowing the 
codepage to be specified in TField. This is not yet implemented, however.
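
Until something like that exists, one possible user-side workaround is to relabel the bytes instead of converting them. This is only a sketch, assuming FPC 3.0 or later and assuming that the string returned by the field really holds UTF-8 bytes that are merely declared with the system code page. `AsUTF8` is a hypothetical helper name and is not part of fcl-db.

```pascal
program relabel;

{$mode objfpc}
{$H+}

// Hypothetical helper: returns the bytes of S unchanged, but marked as UTF-8.
function AsUTF8(const S: AnsiString): UTF8String;
var
  raw: RawByteString;
begin
  raw := S;                          // RawByteString keeps bytes and code page
  SetCodePage(raw, CP_UTF8, False);  // False = change the label only
  Result := raw;                     // already marked UTF-8, bytes stay as-is
end;

var
  a: AnsiString;
  u: UTF8String;
begin
  // Stand-in for data fetched from a UTF-8 database: the two bytes of U+00E9.
  SetLength(a, 2);
  a[1] := #$C3;
  a[2] := #$A9;
  u := AsUTF8(a);
  writeln('bytes in source: ', Length(a));
  writeln('bytes in result: ', Length(u), ', code page ', StringCodePage(u));
end.
```

With a dataset, the call would look like `u := AsUTF8(FQuery.FieldByName('DESCRIPTION').AsString);`. If the field ever delivers data that is correctly labelled with a non-UTF-8 code page, this relabelling would be wrong, so it is a stopgap rather than a fix.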


Michael.


Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Graeme Geldenhuys
On 2016-05-11 12:03, Graeme Geldenhuys wrote:
> Console output:
> 
> [unicode_test]$ export LANG=en_US.UTF-8
> [unicode_test]$ ./unicodetest
> 65001
> 
> [unicode_test]$ export LANG=en_US.ISO8859-1
> [unicode_test]$ ./unicodetest
> 28591

Just to add, compiling that test program with FPC 2.6.4 I get the
correct output in output.data, no matter what my locale setting is.
That's what I meant by the fact that I can accurately assume AnsiString
contains a UTF-8 payload (because I'm reading UTF-8 data), and that the
RTL and FCL did not make any encoding conversions.


Regards,
  Graeme

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp


Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Jonas Maebe


Graeme Geldenhuys wrote on Wed, 11 May 2016:


> I'm honestly trying very hard to understand the string changes
> implemented in FPC 3.x, and the best way to use it going forward. In
> this example I tried everything I learned from the recent mailing list
> discussions. My concern with the usage of String/AnsiString still
> stands, as this test program shows.


Your concern is with utf8string, not with string/ansistring. If you  
only use string/ansistring/unicodestring, then the behaviour of your  
program will be identical with FPC 2.6.4 and 3.0. With utf8string, the  
result is different in FPC 3.0 because now, just like when assigning  
an ansistring to a unicodestring, if you assign an ansistring to a  
utf8string the compiler will insert a code page conversion if necessary.


So yes: if you use utf8string, then your code may behave differently.  
It's not due to code page conversions in the RTL or FCL though (which  
is what you claimed before), but due to code page conversions in your  
own code.
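
A minimal sketch of that compiler-inserted conversion, assuming FPC 3.0 or later. `TLatin1String` is a hypothetical type alias used only here. On Unix a widestring manager such as `cwstring` is needed for the conversion to handle non-ASCII characters.

```pascal
program cpconv;

{$mode objfpc}
{$H+}

uses
  {$ifdef unix}cwstring,{$endif}  // widestring manager for Unix targets
  sysutils;

type
  // Hypothetical alias: an ansistring declared as ISO 8859-1 (code page 28591)
  TLatin1String = type AnsiString(28591);

var
  l: TLatin1String;
  u: UTF8String;
begin
  SetLength(l, 1);
  l[1] := #$E9;  // U+00E9 as a single Latin-1 byte
  u := l;        // declared code pages differ, so a conversion is inserted
  writeln('source: ', Length(l), ' byte(s), code page ', StringCodePage(l));
  writeln('target: ', Length(u), ' byte(s), code page ', StringCodePage(u));
end.
```

The same mechanism applies when the source is a plain ansistring whose code page follows the locale: if its bytes are in fact UTF-8 but the string is treated as Latin-1, the assignment re-encodes them and the result no longer matches the original data.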


I also think utf8string is the only such case where behaviour is  
different. Ideally, we should not have introduced that type before FPC  
3.0. Of course, hindsight is 20/20.



Jonas


[fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Graeme Geldenhuys
Hi,

Here is an example [proof if you will] of the problem. I wrote a small
test program that reads data from a Firebird database where the database
and field charset is set to UTF8.

I compile the program, then run it. No recompiles between the two runs.
For the first run my system is set to a UTF-8 locale. For the second run
I set my system to an ISO8859-1 (Latin-1) locale. The program
outputs the DefaultSystemCodePage to the console.

Because the locale changes the behaviour of String (aka AnsiString) in
the RTL and FCL, the first run works, but the second run corrupts my data.

Console output:

[unicode_test]$ export LANG=en_US.UTF-8
[unicode_test]$ ./unicodetest
65001

[unicode_test]$ export LANG=en_US.ISO8859-1
[unicode_test]$ ./unicodetest
28591


In my test program I write the data read from the database to a file
using TFileStream, thus console and file encoding settings will not
affect the data being written to file. TFileStream is simply writing bytes.

The “locale_utf8.png” screenshot shows the actual data in the database
on the left, and the data read (and saved to a file “output.data”) on
the right.

Compiled with 64-bit FPC 3.0.1 (updated yesterday) on my FreeBSD 10.3
system. Firebird v2.5.4 is being used. I can supply a backup of the test
Firebird database too if needed - it is small.


I'm honestly trying very hard to understand the string changes
implemented in FPC 3.x, and the best way to use it going forward. In
this example I tried everything I learned from the recent mailing list
discussions. My concern with the usage of String/AnsiString still
stands, as this test program shows.

Regards,
  Graeme

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp
program project1;

{$mode objfpc}
{$H+}
{$modeswitch unicodestrings} // this makes String = UnicodeString

 // use UTF8String instead of String
{$DEFINE u8}

uses
  cwstring,
  classes,
  sysutils,
  db,
  sqldb,
  IBConnection;

const
  cBOM = #$EF#$BB#$BF;

var
  FDatabase: TIBConnection;
  FTransaction: TSQLTransaction;
  FQuery: TSQLQuery;
  f: TFileStream;
  u: {$IFDEF u8} UTF8String {$ELSE} String {$ENDIF};
begin
  writeln(DefaultSystemCodePage);
  FDatabase := TIBConnection.Create(nil);
  FDatabase.Dialect := 3;
  FDatabase.LoginPrompt := False;
  FDatabase.CharSet := 'UTF8';
  FDatabase.DatabaseName := '192.168.0.2:/data/devel/data/unicode_test.fdb';
  FDatabase.UserName := 'sysdba';
  FDatabase.Password := 'masterkey';

  FTransaction := TSQLTransaction.Create(nil);
  FDatabase.Transaction := FTransaction;

  FQuery := TSQLQuery.Create(nil);
  FQuery.DataBase := FDatabase;
  FQuery.SQL.Text := 'SELECT DESCRIPTION, UNIVALUE FROM UNICODE'; // where Description = ''Ligatures''';

  FDatabase.Connected := True;
  FQuery.Open;

  f := TFileStream.Create('output.data', fmCreate);
  f.Write(cBOM[1], 3);

  FQuery.First;
  while not FQuery.EOF do
  begin
    // field one
    u := FQuery.FieldByName('DESCRIPTION').AsString;
    f.write(u[1], Length(u));
    f.WriteByte(10); // new line
    // field two
    u := FQuery.FieldByName('UNIVALUE').AsString;
    f.write(u[1], Length(u));
    f.WriteByte(10); // new line
    f.WriteByte(10); // new line to separate records
    FQuery.Next;
  end;
  f.Free;

  FDatabase.Connected := False;

  FQuery.Free;
  FTransaction.Free;
  FDatabase.Free;
end.
