[fpc-pascal] XML with lazarus UTF8 problem
Hi,

I am using the DOM and XMLRead units for parsing XML files. The XML file is UTF-8. The text nodes in the file contain German umlauts. The file is valid XML!

When I try to get the text nodes containing German umlauts, I get an empty string, although the length of the string seems to be correct.

I use Lazarus from trunk as of this morning, with FPC 2.2.4. I even tried fcl-xml from FPC svn trunk, with no change...

--
Hinnack
___
fpc-pascal maillist - fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] XML with lazarus UTF8 problem
Henrik Genssen wrote:
> When I try to get the text nodes containing German umlauts, I get an
> empty string, although the length of the string seems to be correct.

Maybe this helps you:
http://news.gmane.org/find-root.php?message_id=%3c4734B336.1080806%40erdves.lt%3e

I solved this problem by using the UTF8Decode/UTF8Encode routines to convert between UTF-8 and WideString.

--
Valdas Jankūnas
Re: [fpc-pascal] XML with lazarus UTF8 problem
Hello FPC-Pascal,

Monday, July 6, 2009, 8:13:56 AM, you wrote:

> I am using DOM and XMLRead units for parsing XML files.
> The XML file is UTF-8. The text nodes in the file contain German umlauts.
> The file is valid XML!

I do not work with the XML DOM, but if the file is declared as UTF-8 and a text node contains a German umlaut in an ANSI encoding, the XML is not valid, because the text is not UTF-8 conformant. The FPC UTF-8 decoder clears any UTF-8 string that contains an invalid sequence (I think the trunk version replaces invalid sequences with a '?' mark).

--
Best regards,
JoshyFun
RE: Re: [fpc-pascal] XML with lazarus UTF8 problem
As I said, the file is valid XML. It parses without errors in:
- Firefox
- IE
- libxml
- msxml

regards
Hinnack
RE: Re: [fpc-pascal] XML with lazarus UTF8 problem
On Mon, 6 Jul 2009, Henrik Genssen wrote:

> As I said, the file is valid XML.

No one disputes this; the question is how the codepage is used in the rest of your program. You may need to do a manual conversion, e.g. to WideStrings, in order to be able to use the XML objects in your application.

Michael.
RE: Re: [fpc-pascal] XML with lazarus UTF8 problem
Using UTF8Encode / UTF8Decode does not make any difference. I used that already; even leaving it out does not work.

regards
Hinnack
RE: Re: [fpc-pascal] XML with lazarus UTF8 problem
Hi Hinnack,

On Monday, 06.07.2009, at 11:28 +0200, Henrik Genssen wrote:

> Using UTF8Encode / UTF8Decode does not make any difference.
> I used that already; even leaving it out does not work.

I have used those routines with XML files encoded similarly to yours. After reading the file into a DOM tree, I put some strings into GTK components like this:

    stitle := UTF8Encode(widestring(entry.Title));

(Here entry is a structure pointing into the DOM tree.) The umlauts are shown as expected.

Did you add cwstring to the uses clause of your program?

    uses
      {$IFDEF UNIX}{$IFDEF UseCThreads}
      cthreads,
      {$ENDIF}{$ENDIF}
      Interfaces, // this includes the LCL widgetset
      Forms
      { you can add units after this }, cwstring, ...

HTH,
--
Marc Santhoff m.santh...@web.de
RE: RE: Re: [fpc-pascal] XML with lazarus UTF8 problem
Hi,

sounds interesting. Where do I get cwstring? In this thread someone pointed out that one needs it only on Linux. Do I need it on Darwin, too?
http://lists.freepascal.org/lists/fpc-devel/2007-November/012047.html

Why do I have to encode rather than decode UTF-8 if the file is UTF-8? That was my mistake. It works now! But I do not understand it. Can someone explain?

regards
Hinnack
RE: RE: Re: [fpc-pascal] XML with lazarus UTF8 problem
On Monday, 06.07.2009, at 12:12 +0200, Henrik Genssen wrote:

> sounds interesting. Where do I get cwstring?

It comes with FPC; you only need to activate it by adding it to your uses clause.

> In this thread someone pointed out that one needs it only on Linux.
> Do I need it on Darwin, too?
> http://lists.freepascal.org/lists/fpc-devel/2007-November/012047.html

Dunno, I'm not using Darwin.

> Why do I have to encode rather than decode UTF-8 if the file is UTF-8?
> That was my mistake. It works now!
> But I do not understand it. Can someone explain?

Don't ask me, I didn't name those functions. IIRC I asked myself the same question. ;)

Stay cheerful,
--
Marc Santhoff m.santh...@web.de
Re[4]: [fpc-pascal] XML with lazarus UTF8 problem
Hello FPC-Pascal,

Monday, July 6, 2009, 12:12:18 PM, you wrote:

> sounds interesting. Where do I get cwstring?
> In this thread someone pointed out that one needs it only on
> Linux. Do I need it on Darwin, too?
> http://lists.freepascal.org/lists/fpc-devel/2007-November/012047.html

> Why do I have to encode rather than decode UTF-8 if the file is
> UTF-8? That was my mistake. It works now!
> But I do not understand it. Can someone explain?

So the problem was not reading the DOM but displaying the nodes. The FPC DOM (if my memory serves me) always stores everything in WideString format, regardless of the input encoding. That is the reason you cast it to WideString and then UTF-8 encode it to be displayed by the LCL. I think the cast is not needed, but as the string is a DOMString, maybe it is required (not checked).

--
Best regards,
JoshyFun
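
Editor's sketch: the explanation in this last message can be illustrated with a small console program. This is only a minimal sketch, not code from the thread. The program name, the in-memory sample document, and all variable names are assumptions made for illustration. It parses a UTF-8 XML document, reads a text node as a DOMString (wide string), and converts it to UTF-8 with UTF8Encode, which is the step needed before handing the text to a UTF-8 based consumer such as the LCL.

```pascal
program umlautdemo; // hypothetical name, not from the thread

{$mode objfpc}{$H+}

uses
  {$IFDEF UNIX}cwstring,{$ENDIF} // wide string manager on Unix-like systems
  Classes, DOM, XMLRead;

const
  // Assumed sample data: the UTF-8 bytes of <root><title>Grüße</title></root>.
  // The umlaut and the sharp s are written as byte escapes so that the
  // encoding of this source file does not matter.
  SampleXml = '<root><title>Gr'#$C3#$BC#$C3#$9F'e</title></root>';

var
  Source: TStringStream;
  Doc: TXMLDocument;
  TextNode: TDOMNode;
  WideTitle: DOMString;   // the DOM hands text out as a wide string
  Utf8Title: UTF8String;  // what a UTF-8 based consumer expects
begin
  Source := TStringStream.Create(SampleXml);
  try
    // No encoding declaration in the sample, so the parser assumes UTF-8.
    ReadXMLFile(Doc, Source);
    try
      // <root> -> <title> -> text node
      TextNode := Doc.DocumentElement.FirstChild.FirstChild;
      WideTitle := TextNode.NodeValue;

      // Wide string -> UTF-8. This is "encode", not "decode", because
      // the parser has already decoded the file into wide strings.
      Utf8Title := UTF8Encode(WideTitle);

      WriteLn('Wide string length (code units): ', Length(WideTitle));
      WriteLn('UTF-8 string length (bytes):     ', Length(Utf8Title));
    finally
      Doc.Free;
    end;
  finally
    Source.Free;
  end;
end.
```

The UTF-8 length comes out larger than the wide string length, because each of the two non-ASCII characters takes two bytes in UTF-8. UTF8Decode goes the other way and is only needed when UTF-8 text from the application has to be written back into the DOM.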