Graeme Geldenhuys wrote:
On Thu, Aug 14, 2008 at 1:14 PM, Marco van de Voort <[EMAIL PROTECTED]> wrote:
How does this argument fit with XML, which also uses UTF-8 as the de
facto standard encoding? And seeing that fpdoc uses XML for the
documentation files, can I use the actual Unicode characters in my
fpdoc documentation, or must I still stick with the (what now seems to
be outdated) escaped method?
Depends. Is & a special (markup) character in all of XML, or only in
the XHTML-like standards?
I think only XHTML.
XML too. In XML, you *must* escape the ampersand (U+0026) and the
less-than sign (U+003C). The greater-than sign (U+003E) must also be
escaped if it follows a ']]' sequence. Additionally, in attribute
values, quotes (U+0022) must be escaped if they are used as the value
delimiters (the other option is to delimit values with apostrophes
(U+0027)).
Here I mean the XML file, not the DOM tree. You may freely use the
mentioned characters as plain text while manipulating the DOM; the
writer will escape them on output.
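
As a rough illustration of the file-versus-DOM distinction, here is a
small Python sketch. It uses Python's standard xml.dom.minidom purely
for illustration (it is not the FPC DOM unit), and the element name
'descr' and attribute 'title' are made-up examples. The text is stored
unescaped in the tree and only escaped when it is written out:

```python
from xml.dom.minidom import Document, parseString

doc = Document()
root = doc.createElement("descr")          # hypothetical element name
doc.appendChild(root)

# Plain text containing the characters that must be escaped in a file.
root.appendChild(doc.createTextNode("a < b && c ]]> d"))
root.setAttribute("title", 'say "hi" & bye')  # hypothetical attribute

plain = root.firstChild.data   # what the DOM holds: unescaped text
serialized = root.toxml()      # what ends up in the file: escaped text

print(plain)
print(serialized)

# Parsing the serialized form gives the original plain text back.
roundtrip = parseString(serialized).documentElement
print(roundtrip.firstChild.data == plain)
```

The point is only that escaping is the writer's job; the exact set of
characters a given writer chooses to escape may be larger than the
minimum described above.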
But what are fpdoc's xml files? Pure XML, XHTML or some custom/hybrid
format? The layout of fpdoc's files seems to be XML, but the
documentation content seems to be some hybrid HTML - hence the
confusion about what is allowed!
XHTML is XML with a defined 'vocabulary' (DTD). The two formats have no
character-level differences.
Does anybody know the rules of strict XML files and Unicode? Can I use
Unicode characters as data in XML nodes? I would imagine I may, because
most well-formed XML files specify UTF-8 as the encoding type.
Also, something I think has been resolved in recent versions: older
'makeskel' versions did not include the encoding in the generated .xml
file. So what encoding are we supposed to assume for such files?
Default to the W3C standards and assume UTF-8? LCL and fpGUI's fpdoc
documentation (mostly) has no encoding specified in the .xml files.
FPC's documentation specifies ISO8859-1 as the encoding type, though I
found one file (dateutils.xml) in the FPC docs that hasn't got an
encoding (but my copy of the docs is out of date).
The W3C XML specification requires that an XML file without an encoding
label be treated as UTF-8 (unless it has a UTF-16 BOM, in which case it
is treated as UTF-16). Therefore the UTF-8 label is optional.
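
A minimal sketch of that defaulting rule, in Python and under
simplifying assumptions: the function name guess_xml_encoding is made
up for this example, only the UTF-16 BOMs and an ASCII-compatible XML
declaration are examined, and UTF-32 and the other auto-detection
cases from the XML specification are ignored.

```python
import codecs
import re

# Encoding label inside a leading XML declaration (optionally after a
# UTF-8 BOM). Only ASCII-compatible input is handled by this pattern.
_LABEL = re.compile(
    rb'(?:\xef\xbb\xbf)?<\?xml[^>]*?encoding\s*=\s*'
    rb'["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

def guess_xml_encoding(data: bytes) -> str:
    """Guess an XML document's encoding from its first bytes (sketch)."""
    # A UTF-16 byte order mark wins over everything else.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "UTF-16"
    # Otherwise use the label from the XML declaration, if there is one.
    m = _LABEL.match(data)
    if m:
        return m.group(1).decode("ascii")
    # No BOM and no label: the default is UTF-8.
    return "UTF-8"

print(guess_xml_encoding(b'<?xml version="1.0"?><fpdoc-descriptions/>'))
```

A file with no label and no BOM falls through to the last line, which
is why writing an explicit UTF-8 label changes nothing for a
conforming parser.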
In the past, makeskel used to write an 'ISO8859-1' label, which btw is
invalid (the IANA-recognized names are ISO-8859-1 and ISO_8859-1).
Later, when the parser became more compliant, the labeling was removed.
The parser has a workaround to understand the ISO8859-1 label.
The XML writer always produces UTF-8 output and writes no label.
To summarize: Unicode can be used in fpdoc xml files. If a file has an
ISO8859-1 encoding label, the label should be removed or replaced with
a UTF-8 label. The output stages of fpdoc may or may not have problems
with Unicode - that requires additional research.
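
For what it's worth, relabeling an existing file could look roughly
like the Python sketch below. The function name relabel_as_utf8 and
its actual_encoding parameter are made up for this example. It also
assumes that the bytes in the file really are Latin-1 and therefore
transcodes them; if a file is plain ASCII or already UTF-8, changing
the label alone is enough, so check before running anything like this
over real documentation.

```python
import re

# The encoding pseudo-attribute inside a leading XML declaration.
_DECL = re.compile(r'^(<\?xml[^>]*?)\s+encoding\s*=\s*(["\'])[^"\']*\2')

def relabel_as_utf8(data: bytes, actual_encoding: str = "latin-1") -> bytes:
    """Return the document re-encoded as UTF-8 with a UTF-8 label.

    ASSUMPTION: `data` is really encoded in `actual_encoding`.
    Files without an encoding label are only transcoded.
    """
    text = data.decode(actual_encoding)
    text = _DECL.sub(r'\1 encoding="UTF-8"', text, count=1)
    return text.encode("utf-8")

old = '<?xml version="1.0" encoding="ISO8859-1"?>\n<descr>caf\xe9</descr>'
new = relabel_as_utf8(old.encode("latin-1"))
print(new.decode("utf-8"))
```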
Sergei
_______________________________________________
fpc-devel maillist - fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel