Re: [fpc-devel] fpdoc and unicode characters

Sergei Gorelkin Thu, 14 Aug 2008 05:24:56 -0700

Graeme Geldenhuys wrote:

On Thu, Aug 14, 2008 at 1:14 PM, Marco van de Voort <[EMAIL PROTECTED]> wrote:

How does this argument fit with XML which also uses UTF-8 as the de
facto standard encoding. And seeing that fpdoc uses XML for the
documentation files, can I use the actual Unicode characters in my
fpdoc documentation, or must I still stick with the (what now seems to
be outdated) escaped method?

Depends. Is & a steering character in all of XML, or only the xhtml like
standards?


I think only XHTML.

XML too. In XML, you *must* escape the ampersand (U+0026) and the
less-than sign (U+003C). The greater-than sign (U+003E) must also be
escaped if it is preceded by a ']]' sequence. Additionally, in attribute
values, quotes (U+0022) must be escaped if they are used as the value
delimiters (the other option is to delimit values with apostrophes
(U+0027)).
Here I mean the XML file, not the DOM tree. You may freely use the
mentioned characters in plain text while manipulating the DOM; the writer
will escape them on output.
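The escaping rules above can be illustrated with a minimal sketch. It uses Python's standard library rather than the FPC XML units, so it only demonstrates the general XML behaviour, not fpdoc's own writer; the element name `descr` and the sample strings are made up for the example.

```python
from xml.dom.minidom import Document
from xml.sax.saxutils import escape, quoteattr

# Escaping character data by hand: '&' and '<' must always be escaped;
# escape() also escapes '>', which covers the ']]>' case.
text = 'a < b && c ]]> d'
escaped = escape(text)

# Attribute values: a value containing quotes can simply be delimited
# with apostrophes; if it contains both, the quotes get escaped.
attr_simple = quoteattr('say "hi"')
attr_mixed = quoteattr('it\'s "x"')

# Working through a DOM: the text node holds the raw characters and
# the writer escapes them when the tree is serialized.
doc = Document()
root = doc.createElement("descr")  # hypothetical element name
root.appendChild(doc.createTextNode("a < b & c"))
doc.appendChild(root)
xml_out = root.toxml()

print(escaped)
print(attr_simple)
print(attr_mixed)
print(xml_out)
```

The last part mirrors the point about the DOM tree: the text node is created with plain characters, and escaping happens only on output.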

But what are fpdoc's XML files?  Pure XML, XHTML or some custom/hybrid
format? The layout of fpdoc's files seems to be XML, but the
documentation content seems to be some hybrid HTML - hence the confusion
about what is allowed!

XHTML is XML with a defined 'vocabulary' (DTD). These formats have no
character-level differences.

Anybody know the rules of strict XML files and Unicode?  Can I use
Unicode characters as data in XML nodes? I would imagine I may because
most well-formed XML files specify UTF-8 as the encoding type.

Also something I think has been resolved in recent versions, but in
older 'makeskel' versions, it did not include the encoding in the
generated .xml file.  So what are we supposed to treat such files'
encoding as? Default to W3C standards and assume UTF-8?  LCL and
fpGUI's fpdoc documentation (mostly) has no encoding specified in the
.xml files.  FPC's documentation specifies ISO8859-1 as the encoding
type, though I found one file (dateutils.xml) in the FPC docs that hasn't
got an encoding (but my doc update is out of date).

W3C demands that an XML file without an encoding label be treated as
UTF-8 (unless it has a UTF-16 BOM, in which case it should be treated
as UTF-16). Therefore UTF-8 labeling is optional.
In older times, makeskel used to write an 'ISO8859-1' label, which btw is
invalid (the IANA-recognized names are ISO-8859-1 and ISO_8859-1). Later,
when the parser got more compliant, the labeling was removed. The parser
has a workaround to understand the ISO8859-1 labeling.
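As a rough illustration of the detection rule described above, here is a minimal sketch in Python. It is not the FPC parser's code; the function name `guess_xml_encoding` is hypothetical, and the sketch ignores UTF-32 and other exotic cases. The mapping of the 'ISO8859-1' spelling to 'ISO-8859-1' only imitates the kind of workaround mentioned.

```python
import codecs
import re

# Matches the encoding label inside an XML declaration at the start of the data.
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._\-]*)["\']'
)

def guess_xml_encoding(data: bytes) -> str:
    """Return the encoding a parser would assume for the raw bytes of an XML file."""
    # A UTF-16 byte order mark decides the matter.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "UTF-16"
    # Skip an optional UTF-8 BOM before looking for the declaration.
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    match = _DECL.match(data)
    if match is None:
        # No label at all: the default is UTF-8.
        return "UTF-8"
    label = match.group(1).decode("ascii")
    # Assumed workaround: accept the invalid 'ISO8859-1' spelling.
    if label.upper() == "ISO8859-1":
        return "ISO-8859-1"
    return label

print(guess_xml_encoding(b'<?xml version="1.0"?><a/>'))
print(guess_xml_encoding(b'<?xml version="1.0" encoding="ISO8859-1"?><a/>'))
```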

The XML writer always produces UTF-8 encoding and writes no label.

To summarize: Unicode can be used in fpdoc xml files. If the file has an
ISO8859-1 encoding label, it should be removed or replaced with a UTF-8
label. The output stages of fpdoc may or may not have problems with
Unicode - that requires additional research.
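A minimal sketch of the relabeling step, in Python. It assumes the file's bytes really are Latin-1, as the old label claims, so it re-encodes the content as well as swapping the label; if a file already holds UTF-8 bytes under the old label, only the label should be changed. The function name `relabel_latin1_as_utf8` is hypothetical and is not part of fpdoc or makeskel.

```python
import re

# Finds an ISO8859-1 / ISO-8859-1 / ISO_8859-1 label in the XML declaration.
_LABEL = re.compile(
    r'(<\?xml[^>]*?encoding\s*=\s*["\'])ISO[-_]?8859-1(["\'])',
    re.IGNORECASE,
)

def relabel_latin1_as_utf8(data: bytes) -> bytes:
    """Decode Latin-1 bytes, replace the encoding label, and return UTF-8 bytes."""
    text = data.decode("latin-1")  # assumption: the content really is Latin-1
    text = _LABEL.sub(r"\1UTF-8\2", text, count=1)
    return text.encode("utf-8")

source = '<?xml version="1.0" encoding="ISO8859-1"?><short>caf\xe9</short>'
converted = relabel_latin1_as_utf8(source.encode("latin-1"))
print(converted.decode("utf-8"))
```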


Sergei

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel
