Re: RE : [fpc-pascal] JSON and UTF8

2012-07-11 Thread Luiz Americo Pereira Camara

On 10/7/2012 23:19, waldo kitty wrote:

On 7/10/2012 07:00, Luiz Americo Pereira Camara wrote:
With the old behavior, on a system whose system code page is UTF8, if I try to
show the parsed value of \u4E01 in e.g. an LCL app, I will get garbage.

I would expect it to work correctly in any environment.


this means that some environments will end up with garbage for those
UTF-8 characters that cannot be translated back to the local
codepage... i've been running headlong into this with another project
that needs to convert from UTF-8 back to at least CP437... there are
more than 255 characters in UTF-8 and there's no way i know of to
translate them all back to 255 characters... even when trying to use
multiples like ae for æ ( alt-145 in CP437 i think, realizing that this
editor can do whatever it wants to :/ )... the doublet and the
character i typed are the ones i was thinking of for this example, though...


In the previous behavior (conversion from UTF16 to the system code page) you
will get a meaningless character anyway, i.e., those Unicode characters are
not translated correctly to the system code page, since that is impossible.
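
To make the point about lossy conversion concrete, here is a minimal sketch. It is written in Python rather than Pascal and uses Python's own codecs as a stand-in for the code-page conversion, so it is not fpjson code; the helper name to_code_page is made up for the illustration. It shows that \u4E01 survives a round trip through UTF-8 but not through CP437.

```python
# Illustration only: Python codecs stand in for the code-page conversion
# discussed in the thread. This is not fpjson code.
import json


def to_code_page(text: str, code_page: str) -> bytes:
    """Encode text to a legacy code page, putting '?' where no mapping exists."""
    return text.encode(code_page, errors="replace")


parsed = json.loads('"\\u4E01"')             # JSON escape for U+4E01
utf8_bytes = parsed.encode("utf-8")          # three bytes, nothing lost
cp437_bytes = to_code_page(parsed, "cp437")  # CP437 has no such character

print(utf8_bytes)                             # b'\xe4\xb8\x81'
print(cp437_bytes)                            # b'?'
print(utf8_bytes.decode("utf-8") == parsed)   # True
print(cp437_bytes.decode("cp437") == parsed)  # False
```

Whatever substitute character a given conversion routine picks, the original code point cannot be recovered from the code-page bytes.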


BTW: the original issue is already fixed. Thanks Michael

Luiz

___
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-pascal


RE : [fpc-pascal] JSON and UTF8

2012-07-10 Thread Ludo Brands
 Following up on bug 22310 http://bugs.freepascal.org/view.php?id=22310
 
 I enabled the use of UTF8 in the FPC JSON support.
 
 The constructors of the JSON parser/scanner now accept an extra argument
 UseUTF8 which tells them to convert JSON strings to UTF8, not the
 system codepage.
 
I don't understand why you want to keep buggy behavior for backwards
compatibility. Let me explain:
StringToJSONString doesn't do any character conversion except for some
special characters. The JSON spec, RFC 4627 par. 3, says: "JSON text SHALL be
encoded in Unicode.  The default encoding is UTF-8." Since no conversion is
done, logic says that fpjson expects Unicode. Since TJSONStringType =
AnsiString, UTF-8 is the only Unicode encoding possible. So I don't
understand why a TJSONStringType has to be UTF-8 when written, to be compliant
with the spec and with the outside world, while the same string is converted
to the system encoding when read back.
If system encoding support is needed, then StringToJSONString should also do
a system-to-UTF-8 conversion.
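
The asymmetry described above can be sketched as follows. This is Python, not Pascal, and it is only a model: it assumes that raw bytes in the JSON text pass through untouched while \uXXXX escapes are converted to the target encoding, which is one reading of the description in this thread and not a verified account of fpjson internals. The names decode_escapes and is_valid_utf8 are hypothetical, cp1252 is an arbitrary example code page, and surrogate pairs are ignored.

```python
# Model only: assumes raw bytes pass through and \uXXXX escapes are converted.
# Not fpjson code; names and the cp1252 code page are illustrative choices.
import re

ESCAPE = re.compile(rb"\\u([0-9A-Fa-f]{4})")


def decode_escapes(body: bytes, use_utf8: bool,
                   system_code_page: str = "cp1252") -> bytes:
    """Replace \\uXXXX escapes with encoded bytes; leave other bytes alone."""
    target = "utf-8" if use_utf8 else system_code_page

    def replace(match):
        char = chr(int(match.group(1).decode("ascii"), 16))
        return char.encode(target, errors="replace")

    return ESCAPE.sub(replace, body)


def is_valid_utf8(data: bytes) -> bool:
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# One raw UTF-8 "é" (as a writer without conversion would emit it)
# followed by the same character as a \u escape.
body = b"caf\xc3\xa9 \\u00e9"

with_utf8 = decode_escapes(body, use_utf8=True)
legacy = decode_escapes(body, use_utf8=False)

print(with_utf8, is_valid_utf8(with_utf8))  # b'caf\xc3\xa9 \xc3\xa9' True
print(legacy, is_valid_utf8(legacy))        # b'caf\xc3\xa9 \xe9' False
```

Under these assumptions the code-page variant ends up holding two encodings in one string, which is the inconsistency between writing and reading that the paragraph objects to.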

Ludo



RE : RE : [fpc-pascal] JSON and UTF8

2012-07-10 Thread Ludo Brands
 Because the old behaviour is not buggy.
 
 It simply did not support Unicode, and does the next best thing, 
 in this case: it transforms to the system codepage.
 
 A car without ABS and SAT-Nav is not buggy. 
 It just doesn't support features which are nowadays called 
 standard. You can perfectly drive it.
 

An old car without a seat belt is buggy although when it was built seat
belts were not standard.

 Just as Unicode is nowadays standard, N years ago, it was not.
 
 Since backwards compatibility is very important, we must 
 offer the option.
 
The Json standard specifies unicode. Why have utf8 default off?  
Old cars are retro-fitted with seat belts. Why sell new cars with seat belts
as an option? 

Ludo 



Re: RE : RE : [fpc-pascal] JSON and UTF8

2012-07-10 Thread Michael Van Canneyt



On Tue, 10 Jul 2012, Ludo Brands wrote:


Because the old behaviour is not buggy.

It simply did not support Unicode, and does the next best thing,
in this case: it transforms to the system codepage.

A car without ABS and SAT-Nav is not buggy.
It just doesn't support features which are nowadays called
standard. You can perfectly drive it.



An old car without a seat belt is buggy although when it was built seat
belts were not standard.


I still wouldn't call the car buggy. 
For me a car must be able to drive. That is its essential function.

If it doesn't start, or stops every 5 minutes, then it is buggy.
All the rest are options. Whether they are legally required or 
not is irrelevant to the car-ness of the car.


But I drive a motorcycle, so the point is moot ;-)




Just as Unicode is nowadays standard, N years ago, it was not.

Since backwards compatibility is very important, we must
offer the option.


The Json standard specifies unicode. Why have utf8 default off?


For backwards compatibility by default.


Old cars are retro-fitted with seat belts. Why sell new cars with seat belts
as an option?


So you'd reverse the constructor boolean argument to specify Utf8 as default, 
and let the user choose the old behaviour if he needs it ?


Michael.


RE : RE : RE : [fpc-pascal] JSON and UTF8

2012-07-10 Thread Ludo Brands
 So you'd reverse the constructor boolean argument to specify 
 Utf8 as default, 
 and let the user choose the old behaviour if he needs it ?
 
If that is unthinkable then define new constructors
TJSONParser.Create2(...,AUseUTF8 : Boolean = True) or Create2(...,AUseUTF8 :
Boolean = True) and mark the Create(...) as deprecated. Or deprecate
TJSONParser and make it a descendant of TJSONParser2 that has
Create(...,AUseUTF8 : Boolean = True) constructors.

Ludo



Re: RE : RE : RE : [fpc-pascal] JSON and UTF8

2012-07-10 Thread Michael Van Canneyt



On Tue, 10 Jul 2012, Ludo Brands wrote:


So you'd reverse the constructor boolean argument to specify
Utf8 as default,
and let the user choose the old behaviour if he needs it ?


If that is unthinkable then define new constructors
TJSONParser.Create2(...,AUseUTF8 : Boolean = True) or Create2(...,AUseUTF8 :
Boolean = True) and mark the Create(...) as deprecated. Or deprecate
TJSONParser and make it a descendant of TJSONParser2 that has
Create(...,AUseUTF8 : Boolean = True) constructors.


Nothing is unthinkable. The other constructs are very ugly.

I reversed the argument default value to True. UTF8 is now the default.
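
For readers who want to see what reversing the default means for callers, here is a minimal sketch. It is Python, not the actual fpjson declaration; the class ParserSketch, its method target_encoding and the cp1252 code page are hypothetical stand-ins. Only the idea of a boolean constructor argument defaulting to True comes from this thread.

```python
# Hypothetical stand-in for a parser whose constructor has a UTF-8 switch.
# Not fpjson code; only the "defaults to True" idea is taken from the thread.
class ParserSketch:
    def __init__(self, source: str, use_utf8: bool = True):
        self.source = source
        self.use_utf8 = use_utf8

    def target_encoding(self, system_code_page: str = "cp1252") -> str:
        """Name the encoding that parsed strings would be converted to."""
        return "utf-8" if self.use_utf8 else system_code_page


default_parser = ParserSketch('"\\u4E01"')                 # no argument given
legacy_parser = ParserSketch('"\\u4E01"', use_utf8=False)  # explicit opt-out

print(default_parser.target_encoding())  # utf-8
print(legacy_parser.target_encoding())   # cp1252
```

Callers that pass nothing get UTF-8, and code that depends on the old conversion has to ask for it explicitly, with no second constructor or class needed.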

Michael.


RE : RE : RE : RE : [fpc-pascal] JSON and UTF8

2012-07-10 Thread Ludo Brands

 Nothing is unthinkable. The other constructs are very ugly.
 
 I reversed the argument default value to True. UTF8 is now 
 the default.
 

A very wise decision;)

Ludo



Re: RE : RE : RE : RE : [fpc-pascal] JSON and UTF8

2012-07-10 Thread Michael Van Canneyt



On Tue, 10 Jul 2012, Ludo Brands wrote:




Nothing is unthinkable. The other constructs are very ugly.

I reversed the argument default value to True. UTF8 is now
the default.



A very wise decision;)


I must be getting older :)

Seriously: when json support was written, Unicode/UTF8 support in FPC was 
sketchy at best.
It makes sense to change it, but backward compatibility must be preserved as 
much as possible.

I have added a section to

http://wiki.freepascal.org/index.php?title=User_Changes_Trunk

to notify people of this change.

Michael.


Re: RE : RE : RE : RE : [fpc-pascal] JSON and UTF8

2012-07-10 Thread Jonas Maebe

On 10 Jul 2012, at 11:40, Michael Van Canneyt wrote:

 I have added a section to
 
 http://wiki.freepascal.org/index.php?title=User_Changes_Trunk
 
 to notify people of this change.

Could you change it to use the same format/template as the other entries?


Jonas


Re: RE : RE : RE : RE : [fpc-pascal] JSON and UTF8

2012-07-10 Thread Marco van de Voort
In our previous episode, Michael Van Canneyt said:
 
  http://wiki.freepascal.org/index.php?title=User_Changes_Trunk
 
  to notify people of this change.
 
  Could you change it to use the same format/template as the other entries?
 
 We can always trust Jonas to notice such things :-)
 
 Done.

Jonas' remark reminded me of something else. Delphi uses TEncoding
everywhere as a parameter (especially in load/savetofile/stream-like methods)
to select the codepage. Maybe do that here too?


Re: RE : [fpc-pascal] JSON and UTF8

2012-07-10 Thread Luiz Americo Pereira Camara

On 10/7/2012 04:32, Ludo Brands wrote:

Following up on bug 22310 http://bugs.freepascal.org/view.php?id=22310

I enabled the use of UTF8 in the FPC JSON support.

The constructors of the JSON parser/scanner now accept an
extra argument
UseUTF8 which tells them to convert JSON strings to UTF8, not
the system codepage.


I don't understand why you want to keep buggy behavior for backwards
compatibility. Let me explain:
StringToJSONString doesn't do any character conversion except for some
special characters. The JSON spec, RFC 4627 par. 3, says: "JSON text SHALL be
encoded in Unicode.  The default encoding is UTF-8." Since no conversion is
done, logic says that fpjson expects Unicode. Since TJSONStringType =
AnsiString, UTF-8 is the only Unicode encoding possible. So I don't
understand why a TJSONStringType has to be UTF-8 when written, to be compliant
with the spec and with the outside world, while the same string is converted
to the system encoding when read back.
If system encoding support is needed, then StringToJSONString should also do
a system-to-UTF-8 conversion.


Agree with Ludo

Luiz


Re: RE : [fpc-pascal] JSON and UTF8

2012-07-10 Thread Luiz Americo Pereira Camara

On 10/7/2012 04:59, Michael Van Canneyt wrote:



On Tue, 10 Jul 2012, Ludo Brands wrote:


Following up on bug 22310 http://bugs.freepascal.org/view.php?id=22310

I enabled the use of UTF8 in the FPC JSON support.

The constructors of the JSON parser/scanner now accept an
extra argument
UseUTF8 which tells them to convert JSON strings to UTF8, not
the system codepage.


I don't understand why you want to keep buggy behavior for backwards
compatibility.


Because the old behaviour is not buggy.


It is.

With the old behavior, on a system whose system code page is UTF8, if I try to
show the parsed value of \u4E01 in e.g. an LCL app, I will get garbage.

I would expect it to work correctly in any environment.

Luiz


Re: RE : [fpc-pascal] JSON and UTF8

2012-07-10 Thread waldo kitty

On 7/10/2012 07:00, Luiz Americo Pereira Camara wrote:

With the old behavior, on a system whose system code page is UTF8, if I try to
show the parsed value of \u4E01 in e.g. an LCL app, I will get garbage.

I would expect it to work correctly in any environment.


this means that some environments will end up with garbage for those UTF-8
characters that cannot be translated back to the local codepage... i've been
running headlong into this with another project that needs to convert from
UTF-8 back to at least CP437... there are more than 255 characters in UTF-8 and
there's no way i know of to translate them all back to 255 characters... even
when trying to use multiples like ae for æ ( alt-145 in CP437 i think, realizing
that this editor can do whatever it wants to :/ )... the doublet and the
character i typed are the ones i was thinking of for this example, though...
