[fpc-pascal] XMLWrite looses data
Hi,

I'm loading an XSL file into a TXMLDocument using XMLRead. Up to this point everything seems to be OK, and I can query the DOM nodes without problems. If I then save that file out again using XMLWrite, I notice that some data is lost. :-/

I don't know if this is because the file is an XSL file. I thought XSL has exactly the same structure as XML, so I didn't expect any problems.

Anyway, here is a sample area in the XSL file that loses data.

Before:

----8<----8<----8<----8<----8<----
<!-- Short Month Name -->
<xsl:template name="format-date-short"><xsl:param name="date"/>
  <xsl:choose>
    <xsl:when test="string-length($date)=0"></xsl:when>
    <xsl:otherwise>
      <xsl:variable name="month">
        <xsl:choose>
          <xsl:when test="substring($date,1,3)='Jan'">01</xsl:when>
          <xsl:when test="substring($date,1,3)='Feb'">02</xsl:when>
          <xsl:when test="substring($date,1,3)='Mar'">03</xsl:when>
          <xsl:when test="substring($date,1,3)='Apr'">04</xsl:when>
          <xsl:when test="substring($date,1,3)='May'">05</xsl:when>
          <xsl:when test="substring($date,1,3)='Jun'">06</xsl:when>
          <xsl:when test="substring($date,1,3)='Jul'">07</xsl:when>
          <xsl:when test="substring($date,1,3)='Aug'">08</xsl:when>
          <xsl:when test="substring($date,1,3)='Sep'">09</xsl:when>
          <xsl:when test="substring($date,1,3)='Oct'">10</xsl:when>
          <xsl:when test="substring($date,1,3)='Nov'">11</xsl:when>
          <xsl:when test="substring($date,1,3)='Dec'">12</xsl:when>
        </xsl:choose>
      </xsl:variable>
      <xsl:value-of select="substring($date,5,2)"/>&#xa0;<xsl:value-of select="$month"/>&#xa0;<xsl:value-of select="substring($date,8,4)"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
----8<----8<----8<----8<----8<----

After the save:

----8<----8<----8<----8<----8<----
<!-- Short Month Name -->
<xsl:template name="format-date-short">
  <xsl:param name="date"/>
  <xsl:choose>
    <xsl:when test="string-length($date)=0"/>
    <xsl:otherwise>
      <xsl:variable name="month">
        <xsl:choose>
          <xsl:when test="substring($date,1,3)='Jan'">01</xsl:when>
          <xsl:when test="substring($date,1,3)='Feb'">02</xsl:when>
          <xsl:when test="substring($date,1,3)='Mar'">03</xsl:when>
          <xsl:when test="substring($date,1,3)='Apr'">04</xsl:when>
          <xsl:when test="substring($date,1,3)='May'">05</xsl:when>
          <xsl:when test="substring($date,1,3)='Jun'">06</xsl:when>
          <xsl:when test="substring($date,1,3)='Jul'">07</xsl:when>
          <xsl:when test="substring($date,1,3)='Aug'">08</xsl:when>
          <xsl:when test="substring($date,1,3)='Sep'">09</xsl:when>
          <xsl:when test="substring($date,1,3)='Oct'">10</xsl:when>
          <xsl:when test="substring($date,1,3)='Nov'">11</xsl:when>
          <xsl:when test="substring($date,1,3)='Dec'">12</xsl:when>
        </xsl:choose>
      </xsl:variable>
      <xsl:value-of select="substring($date,5,2)"/> <xsl:value-of select="$month"/> <xsl:value-of select="substring($date,8,4)"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
----8<----8<----8<----8<----8<----

Note that the two '&#xa0;' escaped characters near the end are lost in the newly written file.

I'm using FPC 2.6.2 under 64-bit FreeBSD, but will be compiling this application for Windows 32-bit and 64-bit tomorrow at work.

Any idea what is causing this? Is it a bug, is it because I'm using XSL, or is it something else? The XSL file passed all the validation tools I could throw at it. We currently use the full XSL file to populate and generate PDF documents, so I don't believe there are any validation or syntax issues.

In case this is useful, the XSL file starts like this:

----8<----8<----8<----8<----8<----
<?xml version="1.0" encoding="utf-8" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="html" indent="yes" />
<xsl:decimal-format name="noNaN" decimal-separator="." grouping-separator="," NaN="" />
<xsl:template match="/QUESTIONS">
...snip...
----8<----8<----8<----8<----8<----

Regards,
  - Graeme -

--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

_______________________________________________
fpc-pascal maillist - fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] XMLWrite looses data
On Sun, Mar 23, 2014 at 2:58 PM, Graeme Geldenhuys <mailingli...@geldenhuys.co.uk> wrote:

> I'm using FPC 2.6.2 under 64-bit FreeBSD, but will be compiling this
> application for Windows 32-bit and 64-bit tomorrow at work.

If you can, also try the Laz2_ XML units: Laz2_Dom, Laz2_xmlwrite and Laz2_xmlread. They seem to work better with Unicode, or at least with UTF-8.

> Any idea what is causing this? A bug, because I'm using XSL or
> anything else maybe?

No idea, but maybe changing some parser option can help. The "Validating a document" example shows how to change options:

http://wiki.freepascal.org/XML_Tutorial#Validating_a_document
Re: [fpc-pascal] XMLWrite looses data
On Sun, 23 Mar 2014 17:58:16 + Graeme Geldenhuys <mailingli...@geldenhuys.co.uk> wrote:

> Hi, I'm loading up a XSL file into a TXMLDocument using XMLRead. Up
> to this point everything seems to be ok, and I can query the DOMNodes
> without problem. If I then save that file out again, using XMLWrite,
> I noticed that some data is lost. :-/ I don't know if this is because
> the file is a XSL file? Though I thought XSL is exactly the same
> structure as XML - so didn't expect any problems.

Yes, XSL is XML.

> [...]
> Note the two '&#xa0;' escaped characters are lost near the end in the
> newly written file.

The parser converts &#*; to Unicode characters when reading. AFAIR some XSL processors like xsltproc do the same.

If you want XSLT to output '&#xa0;' you can use '&amp;#xa0;'.

Mattias
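The behaviour Mattias describes is not specific to fcl-xml. As a rough illustration only, here is a minimal sketch using Python's standard xml.etree.ElementTree rather than the FPC units discussed in this thread; the sample element name and the variable names are made up for the example. It suggests that a numeric character reference is resolved to its code point at parse time, and that the writer then decides on its own whether to emit the character raw or as a reference.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element name "date" is made up.
source = "<date>24&#xa0;03&#xa0;2014</date>"

root = ET.fromstring(source)

# After parsing, the reference is gone: the text holds the real
# U+00A0 NO-BREAK SPACE code point.
parsed_text = root.text

# Writing to a Unicode string emits the character unescaped ...
raw_output = ET.tostring(root, encoding="unicode")

# ... while the default US-ASCII output cannot represent U+00A0 and
# falls back to a numeric character reference (in decimal form).
ascii_output = ET.tostring(root)
```

In both outputs the document content is the same; only the spelling of the no-break space differs, and neither writer preserves the hexadecimal form that was in the input.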
Re: [fpc-pascal] XMLWrite looses data
On 2014-03-24 13:58, Mattias Gaertner wrote:

> Yes, XSL is XML.

Thought so - thanks for confirming.

> The parser converts &#*; to Unicode characters when reading. AFAIR
> some xsl parsers like xsltproc do the same. If you want xslt to
> output '&#xa0;' you can use '&amp;#xa0;'

Thanks for that info, it helped find the problem (though no solution yet).

The character isn't actually a Unicode character, it is simply a no-break space character at position $A0 in the ASCII chart. It was escaped using hexadecimal notation instead of the more popular decimal notation.

===[ charmap details ]===
U+00A0 NO-BREAK SPACE
UTF-8: 0xC2 0xA0
UTF-16: 0x00A0
C octal escaped UTF-8: \302\240
XML decimal entity: &#160;
=========================

But I now see what happened. When I enabled "show hidden characters" (spaces, tabs and so on) in my editor, I noticed that the no-break space character is still there; in the resaved output file it is simply not escaped any more.

How is the fcl-xml package supposed to handle escaped characters that form part of the data the XSL will generate? Is fcl-xml supposed to write them back as escaped characters, or as normal unescaped characters?

I tried using the decimal notation too: &#160;
That produced the same result as the original.

Note: When we process an XML file with our XSL file, we want the resulting output to contain a no-break space character. We don't want to display the text '&#xa0;', which I think is what your suggestion with the &amp; will produce.

To put this in context, in case my original XSL snippet wasn't clear: that snippet generates a date string in the format 'dd MMM ', and the spaces between those elements are not normal spaces but no-break spaces, so that the whole text stays together (and doesn't word-wrap in the middle).

The current resaved XSL file still works, but not being able to physically see the no-break space characters could cause us problems months down the line when we re-edit those files. Hence the reason they were escaped (to make them clearly visible to the developer).

Regards,
  - Graeme -
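The charmap values quoted in this message can be cross-checked in a few lines. The following is a small sketch in Python, chosen only for brevity and not part of any FPC code in this thread; all variable names are illustrative. The last line also checks, under the assumption that an XSLT processor reads the stylesheet like any other XML parser, what escaping the ampersand does to the parsed text.

```python
import unicodedata
import xml.etree.ElementTree as ET

nbsp = "\u00a0"

name = unicodedata.name(nbsp)            # official Unicode character name
utf8_bytes = nbsp.encode("utf-8")        # two bytes in UTF-8
utf16_bytes = nbsp.encode("utf-16-be")   # one 16-bit unit, big-endian
decimal_ref = "&#%d;" % ord(nbsp)        # XML decimal character reference
hex_ref = "&#x%x;" % ord(nbsp)           # XML hexadecimal character reference
octal_escape = "".join("\\%o" % b for b in utf8_bytes)  # C-style octal escapes
is_ascii = ord(nbsp) < 128               # 7-bit ASCII ends at code 127

# Escaping the ampersand makes the parser see six literal characters,
# "&#xa0;", instead of a single no-break space.
literal_text = ET.fromstring("<t>&amp;#xa0;</t>").text
```

The two reference forments, decimal and hexadecimal, name the same code point, which is why switching between them in the stylesheet makes no difference to what the parser hands back.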
Re: [fpc-pascal] XMLWrite looses data
On Mon, 24 Mar 2014 20:12:50 + Graeme Geldenhuys <mailingli...@geldenhuys.co.uk> wrote:

> [...]
>> The parser converts &#*; to Unicode characters when reading. AFAIR
>> some xsl parsers like xsltproc do the same. If you want xslt to
>> output '&#xa0;' you can use '&amp;#xa0;'
>
> Thanks for that info, it helped find the problem (though no solution
> yet). The character isn't actually a Unicode character, it is simply
> a no-break space character at position $A0 in the ASCII chart.

Well, I see that the term "character" is confusing here. It is a Unicode code point. The &#xa0; is just an XML alias for it. For XML it does not matter whether you write it as a numeric code or encoded in UTF-8/UTF-16.

> Using hex value notation, instead of the more popular decimal
> notation when escaped.
>
> ===[ charmap details ]===
> U+00A0 NO-BREAK SPACE
> UTF-8: 0xC2 0xA0
> UTF-16: 0x00A0
> C octal escaped UTF-8: \302\240
> XML decimal entity: &#160;
> =========================
>
> But I now see what happened. When I enabled show hidden characters
> like spaces and tabs in my editor, I noticed that the no-break space
> character is still there, but in the resaved output file it is simply
> not escaped any more.

Yes. That's what I meant.

> How is the fcl-xml package supposed to handle escaped characters
> which will form part of the data the XSL will generate? Is fcl-xml
> supposed to write them back as escaped characters, or as a normal
> un-escaped character?

XML writers can choose. Both forms are valid XML for the given text.

> I tried using the decimal notation too: &#160; And that produced the
> same result as the original.
>
> Note: When we process a XML file with our XSL file, we want the
> resulting output to have a no-break character - we don't want to
> display the text '&#xa0;' - which I think is what your suggestion
> with the &amp; will produce.
>
> To put this in context, in case my original XSL snippet wasn't clear.
> That snippet generates a date string in the format 'dd MMM ' and the
> spaces between those elements are not normal spaces, but no-break
> spaces, so that whole text stays together (and wouldn't wordwrap in
> the middle). The current resaved XSL file still works, but not being
> able to physically see the no-break space characters could cause us
> problems months down the line when we re-edit those files. Hence the
> reason they were escaped (to make them clearly visible to the
> developer).

You can use comments.

The current XML writer only escapes '<', '>', '&' and #0..#31. Maybe you want to extend it with an option or hook to escape more characters, for example all control characters.

Mattias
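One way to picture the "option or hook" idea is a post-processing step that re-escapes selected characters after the document has been serialized. The sketch below uses Python's standard library purely for illustration; it is not the fcl-xml writer, and the names ESCAPE_CHARS and reescape are hypothetical. It assumes the serialized text contains no comments, CDATA sections or processing instructions, because character references are not resolved inside those.

```python
import xml.etree.ElementTree as ET

# Characters the developer wants to see spelled out; extend as needed.
ESCAPE_CHARS = "\u00a0"


def reescape(xml_text, chars=ESCAPE_CHARS):
    """Replace each listed character with a hexadecimal character reference."""
    table = {ord(c): "&#x%x;" % ord(c) for c in chars}
    return xml_text.translate(table)


# Hypothetical sample document; the element name "date" is made up.
root = ET.fromstring("<date>24&#xa0;03&#xa0;2014</date>")

# The writer emits the raw character ...
serialized = ET.tostring(root, encoding="unicode")

# ... and the hook turns it back into a visible reference.
visible = reescape(serialized)

# Parsing the result again yields the same text as before.
round_trip_text = ET.fromstring(visible).text
```

A hook inside the writer itself would be more robust than this text-level pass, since it could limit the escaping to text nodes and attribute values.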