[fpc-pascal] XMLWrite looses data

2014-03-24 Thread Graeme Geldenhuys
Hi,

I'm loading an XSL file into a TXMLDocument using XMLRead. Up to this
point everything seems to be OK, and I can query the DOM nodes without
a problem. If I then save that file out again using XMLWrite, some
data is lost. :-/

I don't know if this is because the file is an XSL file. I thought XSL
has exactly the same structure as XML, so I didn't expect any problems.

Anyway, here is a sample area in the XSL file that loses data.

Before:
-----8<-----8<-----8<-----8<-----8<-----

<!-- Short Month Name -->
<xsl:template name="format-date-short"><xsl:param name="date"/>
  <xsl:choose>
    <xsl:when test="string-length($date)=0"></xsl:when>
    <xsl:otherwise>
      <xsl:variable name="month">
        <xsl:choose>
          <xsl:when test="substring($date,1,3)='Jan'">01</xsl:when>
          <xsl:when test="substring($date,1,3)='Feb'">02</xsl:when>
          <xsl:when test="substring($date,1,3)='Mar'">03</xsl:when>
          <xsl:when test="substring($date,1,3)='Apr'">04</xsl:when>
          <xsl:when test="substring($date,1,3)='May'">05</xsl:when>
          <xsl:when test="substring($date,1,3)='Jun'">06</xsl:when>
          <xsl:when test="substring($date,1,3)='Jul'">07</xsl:when>
          <xsl:when test="substring($date,1,3)='Aug'">08</xsl:when>
          <xsl:when test="substring($date,1,3)='Sep'">09</xsl:when>
          <xsl:when test="substring($date,1,3)='Oct'">10</xsl:when>
          <xsl:when test="substring($date,1,3)='Nov'">11</xsl:when>
          <xsl:when test="substring($date,1,3)='Dec'">12</xsl:when>
        </xsl:choose>
      </xsl:variable>
      <xsl:value-of select="substring($date,5,2)"/>&#xa0;<xsl:value-of
        select="$month"/>&#xa0;<xsl:value-of select="substring($date,8,4)"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
-----8<-----8<-----8<-----8<-----8<-----


After the save:
-----8<-----8<-----8<-----8<-----8<-----
  <!-- Short Month Name -->
  <xsl:template name="format-date-short">
    <xsl:param name="date"/>
    <xsl:choose>
      <xsl:when test="string-length($date)=0"/>
      <xsl:otherwise>
        <xsl:variable name="month">
          <xsl:choose>
            <xsl:when test="substring($date,1,3)='Jan'">01</xsl:when>
            <xsl:when test="substring($date,1,3)='Feb'">02</xsl:when>
            <xsl:when test="substring($date,1,3)='Mar'">03</xsl:when>
            <xsl:when test="substring($date,1,3)='Apr'">04</xsl:when>
            <xsl:when test="substring($date,1,3)='May'">05</xsl:when>
            <xsl:when test="substring($date,1,3)='Jun'">06</xsl:when>
            <xsl:when test="substring($date,1,3)='Jul'">07</xsl:when>
            <xsl:when test="substring($date,1,3)='Aug'">08</xsl:when>
            <xsl:when test="substring($date,1,3)='Sep'">09</xsl:when>
            <xsl:when test="substring($date,1,3)='Oct'">10</xsl:when>
            <xsl:when test="substring($date,1,3)='Nov'">11</xsl:when>
            <xsl:when test="substring($date,1,3)='Dec'">12</xsl:when>
          </xsl:choose>
        </xsl:variable>
        <xsl:value-of select="substring($date,5,2)"/> 
        <xsl:value-of select="$month"/> 
        <xsl:value-of select="substring($date,8,4)"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
-----8<-----8<-----8<-----8<-----8<-----


Note that the two '&#xa0;' escaped characters near the end are lost in
the newly written file.

I'm using FPC 2.6.2 under 64-bit FreeBSD, but will be compiling this
application for Windows 32-bit and 64-bit tomorrow at work.

Any idea what is causing this? Is it a bug, is it because I'm using XSL,
or is it something else?

The XSL file passed all validation tools I could throw at it. We
currently use the full XSL file to populate and generate PDF documents,
so I don't believe there are any validation or syntax issues.


In case this is useful, the XSL file starts like this.

-----8<-----8<-----8<-----8<-----8<-----
<?xml version="1.0" encoding="utf-8" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:output method="html" indent="yes" />
<xsl:decimal-format name="noNaN" decimal-separator="."
grouping-separator="," NaN="" />
<xsl:template match="/QUESTIONS">
...snip...
-----8<-----8<-----8<-----8<-----8<-----

Regards,
  - Graeme -

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/
___
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal


Re: [fpc-pascal] XMLWrite looses data

2014-03-24 Thread Daniel Gaspary
On Sun, Mar 23, 2014 at 2:58 PM, Graeme Geldenhuys
mailingli...@geldenhuys.co.uk wrote:
> I'm using FPC 2.6.2 under 64-bit FreeBSD, but will be compiling this
> application for Windows 32-bit and 64-bit tomorrow at work.

If you can, also try the Laz2_ XML units: Laz2_DOM, Laz2_XMLRead and
Laz2_XMLWrite.

They seem to work better with Unicode, or at least with UTF-8.

> Any idea what is causing this? Is it a bug, is it because I'm using XSL,
> or is it something else?

No idea, but maybe changing some parser option can help. The
"Validating a document" example shows how to change options:

http://wiki.freepascal.org/XML_Tutorial#Validating_a_document


Re: [fpc-pascal] XMLWrite looses data

2014-03-24 Thread Mattias Gaertner
On Sun, 23 Mar 2014 17:58:16 +
Graeme Geldenhuys mailingli...@geldenhuys.co.uk wrote:

> Hi,
>
> I'm loading an XSL file into a TXMLDocument using XMLRead. Up to this
> point everything seems to be OK, and I can query the DOM nodes without
> a problem. If I then save that file out again using XMLWrite, some
> data is lost. :-/
>
> I don't know if this is because the file is an XSL file. I thought XSL
> has exactly the same structure as XML, so I didn't expect any problems.

Yes, XSL is XML.

> [...]
> Note that the two '&#xa0;' escaped characters near the end are lost in
> the newly written file.

The parser converts &#*; references to Unicode characters when
reading. AFAIR some XSL processors like xsltproc do the same.
If you want XSLT to output '&#xa0;' you can use '&amp;#xa0;'.
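
The read/write behaviour described above is not specific to fcl-xml. As a
rough illustration only, the sketch below uses Python's standard
xml.dom.minidom as a stand-in (it is not the FPC parser, and the tiny
document and the element names r, a and b are made up for the example).
It shows a character reference being turned into the character on read,
written back unescaped, and what the '&amp;#xa0;' spelling produces.

```python
from xml.dom import minidom

# Stand-in for the stylesheet fragment: two elements separated by a
# no-break space written as a hexadecimal character reference.
source = '<r><a/>&#xa0;<b/></r>'
doc = minidom.parseString(source)

# After parsing, the DOM holds the character itself, not the reference.
parsed_text = doc.documentElement.childNodes[1].data

# Writing the DOM back out emits the raw character, so the '&#xa0;'
# spelling from the input is not preserved.
written = doc.documentElement.toxml()

# Escaping the ampersand instead yields the literal six-character text.
literal_doc = minidom.parseString('<r>&amp;#xa0;</r>')
literal_text = literal_doc.documentElement.firstChild.data

print(repr(parsed_text))
print(repr(written))
print(repr(literal_text))
```

The printed representations make the otherwise invisible no-break space
show up as an escape sequence, which is the easiest way to confirm that
the character survived the round trip.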


Mattias


Re: [fpc-pascal] XMLWrite looses data

2014-03-24 Thread Graeme Geldenhuys
On 2014-03-24 13:58, Mattias Gaertner wrote:
 
> Yes, XSL is XML.

Thought so - thanks for confirming.


> The parser converts &#*; references to Unicode characters when
> reading. AFAIR some XSL processors like xsltproc do the same.
> If you want XSLT to output '&#xa0;' you can use '&amp;#xa0;'.

Thanks for that info, it helped find the problem (though no solution
yet). That character isn't actually a Unicode character; it is simply a
no-break space character at position $A0 in the ASCII chart. It is
escaped using hex notation instead of the more popular decimal notation.

===[ charmap details ]===
U+00A0 NO-BREAK SPACE
UTF-8: 0xC2 0xA0
UTF-16: 0x00A0

C octal escaped UTF-8: \302\240
XML decimal entity: &#160;
=========================

But I now see what happened. When I enabled "show hidden characters"
(spaces, tabs and so on) in my editor, I noticed that the no-break space
character is still there; in the resaved output file it is simply
not escaped any more.

How is the fcl-xml package supposed to handle escaped characters which
will form part of the data the XSL will generate? Is fcl-xml supposed to
write them back as escaped characters, or as normal unescaped characters?

I tried using the decimal notation too: &#160;
And that produced the same result as the original.

Note:
When we process an XML file with our XSL file, we want the resulting
output to have a no-break space character. We don't want to display the
text '&#xa0;', which I think is what your suggestion with the &amp; will
produce.

To put this in context, in case my original XSL snippet wasn't clear:
that snippet generates a date string in the format 'dd MMM yyyy', and the
spaces between those elements are not normal spaces but no-break
spaces, so that the whole text stays together (and won't word-wrap in
the middle).


The current resaved XSL file still works, but not being able to
physically see the no-break space characters could cause us problems
months down the line when we re-edit those files. Hence the reason they
were escaped (to make them clearly visible to the developer).


Regards,
  - Graeme -



Re: [fpc-pascal] XMLWrite looses data

2014-03-24 Thread Mattias Gaertner
On Mon, 24 Mar 2014 20:12:50 +
Graeme Geldenhuys mailingli...@geldenhuys.co.uk wrote:

[...]
> > The parser converts &#*; references to Unicode characters when
> > reading. AFAIR some XSL processors like xsltproc do the same.
> > If you want XSLT to output '&#xa0;' you can use '&amp;#xa0;'.
>
> Thanks for that info, it helped find the problem (though no solution
> yet). That character isn't actually a Unicode character; it is simply a
> no-break space character at position $A0 in the ASCII chart.

Well, I see that the term "character" is confusing here.
It is a Unicode code point. The &#xa0; is just an XML alias for it. For
XML it does not matter whether you write it as a character reference or
encoded directly in UTF-8/UTF-16.
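
To make that equivalence concrete, here is a small sketch, again using
Python's xml.dom.minidom purely as an illustrative stand-in for any
conforming XML parser. The element name d and the sample date text are
invented for the example. It parses the same text written three ways and
also checks the byte values from the charmap details quoted below.

```python
from xml.dom import minidom

NBSP = '\u00a0'  # U+00A0 NO-BREAK SPACE


def text_of(xml_source):
    """Return the text content of the root element of a tiny document."""
    return minidom.parseString(xml_source).documentElement.firstChild.data


# The same text with the no-break space spelled three different ways.
hex_ref = text_of('<d>24&#xa0;Mar&#xa0;2014</d>')
dec_ref = text_of('<d>24&#160;Mar&#160;2014</d>')
raw_utf8 = text_of(f'<d>24{NBSP}Mar{NBSP}2014</d>'.encode('utf-8'))

all_equal = hex_ref == dec_ref == raw_utf8

# Code point and encodings of the character itself.
code_point = ord(NBSP)
utf8_bytes = NBSP.encode('utf-8')
utf16_bytes = NBSP.encode('utf-16-be')
is_ascii = code_point < 128

print(all_equal, hex(code_point), utf8_bytes, utf16_bytes, is_ascii)
```

Because all three inputs produce an identical DOM, a writer has no way to
know which spelling the author originally used unless the parser records
it separately.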


> It is escaped using hex notation instead of the more popular decimal
> notation.
>
> ===[ charmap details ]===
> U+00A0 NO-BREAK SPACE
> UTF-8: 0xC2 0xA0
> UTF-16: 0x00A0
>
> C octal escaped UTF-8: \302\240
> XML decimal entity: &#160;
> =========================
>
> But I now see what happened. When I enabled "show hidden characters"
> (spaces, tabs and so on) in my editor, I noticed that the no-break space
> character is still there; in the resaved output file it is simply
> not escaped any more.

Yes. That's what I meant.

 
> How is the fcl-xml package supposed to handle escaped characters which
> will form part of the data the XSL will generate? Is fcl-xml supposed to
> write them back as escaped characters, or as normal unescaped characters?

XML writers can choose. Both forms are valid XML for the given text.

 
> I tried using the decimal notation too: &#160;
> And that produced the same result as the original.
>
> Note:
> When we process an XML file with our XSL file, we want the resulting
> output to have a no-break space character. We don't want to display the
> text '&#xa0;', which I think is what your suggestion with the &amp; will
> produce.
>
> To put this in context, in case my original XSL snippet wasn't clear:
> that snippet generates a date string in the format 'dd MMM yyyy', and the
> spaces between those elements are not normal spaces but no-break
> spaces, so that the whole text stays together (and won't word-wrap in
> the middle).
>
>
> The current resaved XSL file still works, but not being able to
> physically see the no-break space characters could cause us problems
> months down the line when we re-edit those files. Hence the reason they
> were escaped (to make them clearly visible to the developer).

You can use comments.

The current XML writer only escapes '<', '>', '&', #0..#31.
Maybe you want to extend it with an option or hook to escape more
characters. For example all control characters.
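
One way to picture such a hook is a post-processing step applied to the
serialized text. The sketch below is in Python, not Pascal, and is not
part of fcl-xml; the function name escape_non_ascii is hypothetical. It
assumes that non-ASCII characters occur only in text nodes and attribute
values, because a character reference inside a comment, a CDATA section
or an element name would not mean the same thing.

```python
from xml.dom import minidom


def escape_non_ascii(xml_text):
    """Rewrite every non-ASCII character as a hexadecimal character reference.

    Assumption: such characters appear only in text and attribute values.
    """
    return ''.join(
        ch if ord(ch) < 128 else '&#x{:x};'.format(ord(ch))
        for ch in xml_text
    )


doc = minidom.parseString('<d><v/>&#xa0;<w/></d>')

# The stock writer emits the raw no-break space ...
plain = doc.documentElement.toxml()

# ... and the extra pass makes it visible again as '&#xa0;'.
visible = escape_non_ascii(plain)

# Both spellings still parse to the same text.
reparsed = minidom.parseString(visible).documentElement.childNodes[1].data

print(visible)
```

A real option inside the writer would do this while serializing each text
node, which avoids the caveat about comments and CDATA sections.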

Mattias