Re: [fpc-pascal] problems using utf8toansi
On 10 Dec 2007, at 08:43, Marc Santhoff wrote:

>> You can compile with -al and search for CWSTRING in the assembler
>> file generated for your main program. Since that unit has an
>> initialization section, it will be in the init/final table if it's
>> included somewhere.
>
> Hm, that's funny, the string is not found. I did:
>
>   $ fpc -Fu../zipfile -al -B -FE./bin TestDocInfo
>   $ grep -i CWSTRING bin/*.s
>
> and the output was empty.
>
> Meanwhile I had a look and found that DOM is using a type DOMString
> everywhere, which itself is defined as
>
>   DOMString = WideString;
>
> so is that an indicator for using widestrings?
>
> The uses line looks like this:
>
>   uses {$IFDEF MEM_CHECK}MemCheck,{$ENDIF} SysUtils, Classes, AVL_Tree;
>
> Confusing ...

The system and sysutils units contain bare-metal widestring support, i.e., widestring support which only works (as far as alphabetical ordering, upper/lowercase support and converting from/to ansistrings is concerned) with ASCII values <= #127. It is perfectly possible to use widestrings in that way, but then they simply use twice the memory for no gain whatsoever.

You have to add cwstring on any *nix platform to get actual ansi/widestring support for your current locale. If you don't, anything can happen.

Jonas
___
fpc-pascal maillist - fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
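A minimal sketch of the point about cwstring, for readers who want to try it. The program name and variable names are made up for illustration, and it assumes a *nix target where the UNIX define is set. It upper-cases a single a-umlaut (U+00E4) with WideUpperCase from SysUtils and prints the code unit before and after. Whether the second value changes should depend on whether cwstring is in the uses clause and on the active locale, so no expected output is given here.

```pascal
program cwcheck;  // hypothetical name, not from the thread
{$mode objfpc}{$H+}

uses
  // cwstring should come as early as possible in the main program's
  // uses clause so that its widestring manager is installed first.
  {$IFDEF UNIX}cwstring,{$ENDIF}
  SysUtils;

var
  W, Up: WideString;
begin
  // Build the test string from a code point so that the source file
  // encoding plays no role.
  SetLength(W, 1);
  W[1] := WideChar($00E4);  // a-umlaut

  // Upper-casing a non-ASCII character is one of the operations that
  // needs locale-aware widestring support.
  Up := WideUpperCase(W);

  WriteLn('before: $', IntToHex(LongInt(Ord(W[1])), 4));
  WriteLn('after:  $', IntToHex(LongInt(Ord(Up[1])), 4));
end.
```

Compiling it once with and once without the cwstring line (and running both under the same locale) is one way to see the difference described above.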
Re: [fpc-pascal] problems using utf8toansi
On Monday, 10.12.2007, at 11:10 +0100, Jonas Maebe wrote:

> On 10 Dec 2007, at 08:43, Marc Santhoff wrote:
>
>> Confusing ...
>
> The system and sysutils units contain bare-metal widestring support,
> i.e., widestring support which only works (as far as alphabetical
> ordering, upper/lowercase support and converting from/to ansistrings
> is concerned) with ASCII values <= #127. It is perfectly possible to
> use widestrings in that way, but then they simply use twice the
> memory for no gain whatsoever.
>
> You have to add cwstring on any *nix platform to get actual
> ansi/widestring support for your current locale. If you don't,
> anything can happen.

Now things are getting clear. I'll look at sysutils and try out the behaviour in both cases to be safe.

Thank you,
Marc
Re: [fpc-pascal] problems using utf8toansi
On 07 Dec 2007, at 20:01, Marc Santhoff wrote:

> On Friday, 07.12.2007, at 14:00 +0100, Jonas Maebe wrote:
>
>> Also, if you do not use the cwstring unit, a lot of things will not
>> work with widestrings under *nix (including FreeBSD). The fact that
>> some chars such as umlauts and 'ß' work suggests that some other
>> unit is already using it though.
>
> That may well be the case, it is a component's source pulling lots of
> LCL stuff in (derived from Darius' TZipFile). I searched the first
> levels of uses dependencies, though, to no avail.

You can compile with -al and search for CWSTRING in the assembler file generated for your main program. Since that unit has an initialization section, it will be in the init/final table if it's included somewhere.

Jonas
Re: [fpc-pascal] problems using utf8toansi
On Sunday, 09.12.2007, at 21:38 +0100, Jonas Maebe wrote:

> On 07 Dec 2007, at 20:01, Marc Santhoff wrote:
>
>> On Friday, 07.12.2007, at 14:00 +0100, Jonas Maebe wrote:
>>
>>> Also, if you do not use the cwstring unit, a lot of things will not
>>> work with widestrings under *nix (including FreeBSD). The fact that
>>> some chars such as umlauts and 'ß' work suggests that some other
>>> unit is already using it though.
>>
>> That may well be the case, it is a component's source pulling lots
>> of LCL stuff in (derived from Darius' TZipFile). I searched the
>> first levels of uses dependencies, though, to no avail.
>
> You can compile with -al and search for CWSTRING in the assembler
> file generated for your main program. Since that unit has an
> initialization section, it will be in the init/final table if it's
> included somewhere.

Hm, that's funny, the string is not found. I did:

  $ fpc -Fu../zipfile -al -B -FE./bin TestDocInfo
  $ grep -i CWSTRING bin/*.s

and the output was empty.

Meanwhile I had a look and found that DOM is using a type DOMString everywhere, which itself is defined as

  DOMString = WideString;

so is that an indicator for using widestrings?

The uses line looks like this:

  uses {$IFDEF MEM_CHECK}MemCheck,{$ENDIF} SysUtils, Classes, AVL_Tree;

Confusing ...

Marc
Re: [fpc-pascal] problems using utf8toansi
On Sunday, 09.12.2007, at 21:38 +0100, Jonas Maebe wrote:

> You can compile with -al and search for CWSTRING in the assembler
> file generated for your main program. Since that unit has an
> initialization section, it will be in the init/final table if it's
> included somewhere.

Another try:

  $ nm dom.o

revealed at least:

  ...
  U FPC_WIDESTR_DECR_REF
  U FPC_WIDESTR_INCR_REF
  ...
  U fpc_widestr_compare
  U fpc_widestr_concat
  U fpc_widestr_copy
  U fpc_widestr_decr_ref
  U fpc_widestr_setlength

so however it is done, DOM does seem to use widestrings, IMHO.

Marc
Re: [fpc-pascal] problems using utf8toansi
On 07 Dec 2007, at 07:43, Marc Santhoff wrote:

> <output>
> dbg: Description testing, one, two ... ? à
> dbg:
> </output>
>
> Using German umlauts the same happens, the string is empty. When
> feeding in plain ASCII the output is okay, the string is actually
> filled.

On which platform with which locale/codepage? If on *nix, are you using the cwstring unit?

Jonas
Re: [fpc-pascal] problems using utf8toansi
On 07 Dec 2007, at 13:17, Marc Santhoff wrote:

> On Friday, 07.12.2007, at 11:28 +0100, Jonas Maebe wrote:
>
>> On 07 Dec 2007, at 07:43, Marc Santhoff wrote:
>>
>>> <output>
>>> dbg: Description testing, one, two ... ? à
>>> dbg:
>>> </output>
>>>
>>> Using German umlauts the same happens, the string is empty. When
>>> feeding in plain ASCII the output is okay, the string is actually
>>> filled.
>>
>> On which platform with which locale/codepage? If on *nix, are you
>> using the cwstring unit?
>
> I'm using FreeBSD with ISO8859-1 or 15 and do not use cwstring
> explicitly.
>
> But I think my error was to assume that the strings given by objects
> from the dom unit are un-decoded UTF-8. Now I think (haven't checked
> yet) that decoding to the German system locale (ISO8859-1 or 15) is
> done already.

Ansistrings indeed always use the system locale.

> If I leave out the decoding completely it works, apart from the
> missing euro sign, but that has very low priority. Umlauts and 'ß'
> are okay.

ISO 8859-1 does not have a euro sign. ISO 8859-15 should have it though.

Jonas
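A rough sketch of the conclusion reached above, under stated assumptions: NodeText is a hypothetical stand-in for a DOMString (that is, a WideString) such as the NodeValue from the original post, and the program name is invented. Assigning a WideString to an AnsiString already converts it to the system encoding, so no Utf8ToAnsi call is needed on top of that. If UTF-8 bytes are actually wanted, UTF8Encode from the system unit takes the WideString directly.

```pascal
program domconv;  // hypothetical name, not from the thread
{$mode objfpc}{$H+}

uses
  {$IFDEF UNIX}cwstring,{$ENDIF}  // locale-aware conversions on *nix
  SysUtils;

var
  NodeText: WideString;  // stands in for Item[i].FirstChild.NodeValue
  Ansi: AnsiString;
  Utf8: UTF8String;
begin
  // Two non-ASCII characters, built from code points so that the
  // source file encoding does not matter.
  SetLength(NodeText, 2);
  NodeText[1] := WideChar($00FC);  // u-umlaut
  NodeText[2] := WideChar($00DF);  // sharp s

  // Implicit WideString -> AnsiString conversion: the result is in
  // the system encoding, so it must not be fed to Utf8ToAnsi again.
  Ansi := NodeText;

  // Explicit conversion when UTF-8 is what is needed.
  Utf8 := UTF8Encode(NodeText);

  WriteLn('wide chars:  ', Length(NodeText));
  WriteLn('ansi bytes:  ', Length(Ansi));
  WriteLn('UTF-8 bytes: ', Length(Utf8));
end.
```

The ansi byte count depends on the locale in effect, which is why the sketch prints lengths rather than the strings themselves.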
[fpc-pascal] problems using utf8toansi
Hi,

when using system.utf8toansi() the result is an empty string as soon as I put in some special chars:

<code>
{$H+}
...
fDescription: String;
...
function sDecode(sin: string): string; inline;
begin
  result := utf8toansi(sin);
end;
...
fDescription := sDecode(Item[i].FirstChild.NodeValue);
writeln('dbg: '+Item[i].FirstChild.NodeValue);
writeln('dbg: '+fDescription);
</code>

<input>
Description testing, one, two ... € à
</input>

<xml>
<dc:description>Description testing, one, two ... â¬</dc:description>
</xml>

<output>
dbg: Description testing, one, two ... ? à
dbg:
</output>

Using German umlauts the same happens, the string is empty. When feeding in plain ASCII the output is okay, the string is actually filled.

I fear this is another problem with using the rather old fpc 2.0.4, but what's going on here?

TIA,
Marc