Re: [fpc-pascal] problems using utf8toansi

2007-12-10 Thread Jonas Maebe


On 10 Dec 2007, at 08:43, Marc Santhoff wrote:

>> You can compile with -al and search for CWSTRING in the assembler file
>> generated for your main program. Since that unit has an initialization
>> section, it will be in the init/final table if it's included somewhere.
>
> Hm, that's funny, the string is not found.
>
> I did:
>
> $ fpc -Fu../zipfile -al -B -FE./bin TestDocInfo
> $ grep -i CWSTRING bin/*.s
>
> and the output was empty.
>
> Meanwhile I had a look and found that DOM is using a type DOMString
> everywhere, which itself is defined as
>
> DOMString = WideString;
>
> so is that an indicator for using widestrings? The uses line looks like
> this:
>
> uses
>   {$IFDEF MEM_CHECK}MemCheck,{$ENDIF}
>   SysUtils, Classes, AVL_Tree;
>
> Confusing ...


The system and sysutils units contain bare metal widestring support:
i.e., widestring support which only works (as far as alphabetical
ordering, upper/lowercase support and converting from/to ansistrings
is concerned) with ASCII values <= #127. It is perfectly possible to
use widestrings in that way, but then they are simply using twice the
memory for no gain whatsoever.

You have to add cwstring on any *nix platform to get actual ansi/widestring
support for your current locale. If you don't, anything can happen.
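
As a rough illustration of the difference (a sketch, not taken from the original mails): the program below pulls in cwstring on *nix and upper-cases a single non-ASCII widechar. The program name and the variable names are made up for the example. Whether the character actually changes depends on the FPC version and on the locale the program runs under; with only the built-in support from system/sysutils one would expect anything above #127 to come back unchanged.

```pascal
program cwstringdemo;

{$mode objfpc}{$H+}

uses
  {$ifdef unix}
  cwstring,   // installs the locale-aware widestring manager on *nix
  {$endif}
  SysUtils;

var
  wLower, wUpper: WideString;
begin
  // U+00E4 (a-umlaut), built from its code point so that the
  // encoding of this source file does not matter.
  wLower := WideChar($00E4);

  // WideUpperCase goes through the installed widestring manager.
  wUpper := WideUpperCase(wLower);

  WriteLn('before: U+', IntToHex(Ord(wLower[1]), 4));
  WriteLn('after:  U+', IntToHex(Ord(wUpper[1]), 4));
end.
```

Removing cwstring from the uses clause and running the program again is a quick way to compare the two behaviours on a given system.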



Jonas
___
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-pascal


Re: [fpc-pascal] problems using utf8toansi

2007-12-10 Thread Marc Santhoff
On Monday, 10 Dec 2007, at 11:10 +0100, Jonas Maebe wrote:
> On 10 Dec 2007, at 08:43, Marc Santhoff wrote:
>
>> Confusing ...
>
> The system and sysutils units contain bare metal widestring support:
> i.e., widestring support which only works (as far as alphabetical
> ordering, upper/lowercase support and converting from/to ansistrings
> is concerned) with ASCII values <= #127. It is perfectly possible to
> use widestrings in that way, but then they are simply using twice the
> memory for no gain whatsoever.
>
> You have to add cwstring on any *nix platform to get actual ansi/widestring
> support for your current locale. If you don't, anything can happen.

Now things are getting clear. I'll look at sysutils and try out the
behaviour in both cases to be safe.

Thank you,
Marc




Re: [fpc-pascal] problems using utf8toansi

2007-12-09 Thread Jonas Maebe


On 07 Dec 2007, at 20:01, Marc Santhoff wrote:

> On Friday, 07 Dec 2007, at 14:00 +0100, Jonas Maebe wrote:
>
>> Also, if you do not use the cwstring unit, a lot of things will not
>> work with widestrings under *nix (including FreeBSD). The fact that
>> some chars such as umlauts and 'ß' work suggests that some other unit
>> is already using it though.
>
> That may well be the case, it is a component's source pulling in lots of
> LCL stuff (derived from Darius' TZipFile).
>
> Although I searched the first levels of uses-dependencies to no avail.

You can compile with -al and search for CWSTRING in the assembler file
generated for your main program. Since that unit has an initialization
section, it will be in the init/final table if it's included somewhere.



Jonas


Re: [fpc-pascal] problems using utf8toansi

2007-12-09 Thread Marc Santhoff
On Sunday, 09 Dec 2007, at 21:38 +0100, Jonas Maebe wrote:
> On 07 Dec 2007, at 20:01, Marc Santhoff wrote:
>
>> On Friday, 07 Dec 2007, at 14:00 +0100, Jonas Maebe wrote:
>>
>>> Also, if you do not use the cwstring unit, a lot of things will not
>>> work with widestrings under *nix (including FreeBSD). The fact that
>>> some chars such as umlauts and 'ß' work suggests that some other unit
>>> is already using it though.
>>
>> That may well be the case, it is a component's source pulling in lots of
>> LCL stuff (derived from Darius' TZipFile).
>>
>> Although I searched the first levels of uses-dependencies to no avail.
>
> You can compile with -al and search for CWSTRING in the assembler file
> generated for your main program. Since that unit has an initialization
> section, it will be in the init/final table if it's included somewhere.

Hm, that's funny, the string is not found.

I did:

$ fpc -Fu../zipfile -al -B -FE./bin TestDocInfo
$ grep -i CWSTRING bin/*.s

and the output was empty.

Meanwhile I had a look and found that DOM is using a type DOMString
everywhere, which itself is defined as

DOMString = WideString;

so is that an indicator for using widestrings? The uses line looks like
this:

uses
  {$IFDEF MEM_CHECK}MemCheck,{$ENDIF}
  SysUtils, Classes, AVL_Tree;

Confusing ...
Marc




Re: [fpc-pascal] problems using utf8toansi

2007-12-09 Thread Marc Santhoff
On Sunday, 09 Dec 2007, at 21:38 +0100, Jonas Maebe wrote:
> You can compile with -al and search for CWSTRING in the assembler file
> generated for your main program. Since that unit has an initialization
> section, it will be in the init/final table if it's included somewhere.

Another try:

$ nm dom.o

revealed at least:

...
 U FPC_WIDESTR_DECR_REF
 U FPC_WIDESTR_INCR_REF
...
 U fpc_widestr_compare
 U fpc_widestr_concat
 U fpc_widestr_copy
 U fpc_widestr_decr_ref
 U fpc_widestr_setlength


so however it is done, DOM does seem to use widestrings IMHO.

Marc




Re: [fpc-pascal] problems using utf8toansi

2007-12-07 Thread Jonas Maebe


On 07 Dec 2007, at 07:43, Marc Santhoff wrote:

> <output>
> dbg: Description
> testing, one, two ...
> ? à
> dbg:
> </output>
>
> Using German umlauts the same happens, the string is empty. When feeding
> in plain ASCII the output is okay, the string is actually filled.

On which platform with which locale/codepage? If on *nix, are you
using the cwstring unit?



Jonas


Re: [fpc-pascal] problems using utf8toansi

2007-12-07 Thread Jonas Maebe


On 07 Dec 2007, at 13:17, Marc Santhoff wrote:

> On Friday, 07 Dec 2007, at 11:28 +0100, Jonas Maebe wrote:
>> On 07 Dec 2007, at 07:43, Marc Santhoff wrote:
>>
>>> <output>
>>> dbg: Description
>>> testing, one, two ...
>>> ? à
>>> dbg:
>>> </output>
>>>
>>> Using German umlauts the same happens, the string is empty. When
>>> feeding in plain ASCII the output is okay, the string is actually
>>> filled.
>>
>> On which platform with which locale/codepage? If on *nix, are you
>> using the cwstring unit?
>
> I'm using FreeBSD with ISO8859-1 or 15 and do not use cwstring
> explicitly.
>
> But I think my error was to assume the strings given by objects from the
> dom unit are undecoded UTF-8. Now I think (haven't checked yet) that
> decoding to the German system locale (ISO8859-1 or 15) is done already.


Ansistrings indeed always use the system locale.
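
A minimal sketch of what this means for DOM strings, assuming (as found elsewhere in this thread) that DOMString is declared as WideString: assigning such a value to an ansistring already converts it to the system locale, so no extra Utf8ToAnsi call is needed. NodeValueStandIn is a hypothetical helper that merely stands in for Item[i].FirstChild.NodeValue, and the local DOMString declaration only mirrors the one in the DOM unit.

```pascal
program domstringdemo;

{$mode objfpc}{$H+}

uses
  {$ifdef unix}
  cwstring,   // locale-aware conversions on *nix
  {$endif}
  SysUtils;

type
  DOMString = WideString;   // mirrors the declaration in the DOM unit

// Hypothetical stand-in for Item[i].FirstChild.NodeValue.
// The non-ASCII characters are built from code points so that the
// encoding of this source file does not matter.
function NodeValueStandIn: DOMString;
begin
  Result := 'Gr';
  Result := Result + WideChar($00FC) + WideChar($00DF) + 'e';
end;

var
  fDescription: String;     // an ansistring because of {$H+}
begin
  // Implicit widestring -> ansistring conversion; the target encoding
  // is whatever the system locale says. No Utf8ToAnsi involved.
  fDescription := NodeValueStandIn;
  WriteLn('dbg: ', fDescription);
end.
```

What ends up on the terminal depends on the locale and on the terminal's own encoding, so the sketch makes no claim about the exact output.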


> If I leave out the decoding completely it works - besides the missing
> euro sign, but that has very low priority. Umlauts and 'ß' are okay.


ISO 8859-1 does not have a euro sign. ISO 8859-15 should have it though.
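
To see which byte a given locale produces for the euro sign, one can feed its UTF-8 encoding (E2 82 AC, i.e. U+20AC) through Utf8ToAnsi and dump the result in hex. This is only a sketch: the program name and variables are invented, and the outcome depends on the FPC version, on cwstring being present and on the active locale. In ISO 8859-15 the euro sign sits at byte A4; ISO 8859-1 has no position for it, so no faithful result is possible there.

```pascal
program eurodemo;

{$mode objfpc}{$H+}

uses
  {$ifdef unix}
  cwstring,   // locale-aware conversions on *nix
  {$endif}
  SysUtils;

var
  u: UTF8String;
  s: AnsiString;
  i: Integer;
begin
  // UTF-8 encoding of U+20AC (euro sign), written byte by byte so
  // that no source-file or string-constant encoding gets in the way.
  SetLength(u, 3);
  u[1] := #$E2;
  u[2] := #$82;
  u[3] := #$AC;

  // Convert from UTF-8 to the encoding of the current locale.
  s := Utf8ToAnsi(u);

  // Dump the converted string as hex bytes.
  WriteLn('length: ', Length(s));
  for i := 1 to Length(s) do
    Write(IntToHex(Ord(s[i]), 2), ' ');
  WriteLn;
end.
```

Running it once under an ISO 8859-1 locale and once under ISO 8859-15 should make the difference between the two character sets visible.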


Jonas


[fpc-pascal] problems using utf8toansi

2007-12-06 Thread Marc Santhoff
Hi,

when using system.utf8toansi() the result is an empty string as soon as
I put in some special chars:

<code>
{$H+}
...
fDescription: String;
...

function sDecode(sin: string): string; inline;
begin
  result := utf8toansi(sin);
end;

...

fDescription := sDecode(Item[i].FirstChild.NodeValue);
writeln('dbg: '+Item[i].FirstChild.NodeValue);
writeln('dbg: '+fDescription);
</code>

<input>
Description
testing, one, two ...
€ à
</input>

<xml>
<dc:description>Description
testing, one, two ...
€</dc:description>
</xml>

<output>
dbg: Description
testing, one, two ...
? à
dbg:
</output>

Using german umlauts the same happens, the string is empty. When feeding
in plain ascii the output is okay, the string is actually filled.

I fear this is another problem using the rather old fpc 2.0.4, but
what's going on here?

TIA,
Marc

