PHPDoc uses the DocBook setup suggested by the DocBook manual [1]:
split the DocBook source up into fragments, and then mush them back
together using external entities. These external entities are defined
within the internal subset of a document, for example:

<!DOCTYPE book [
  <!ENTITY subset SYSTEM "subset.xml">
]>

PHPDoc takes this system up the wazoo; there are entities for literally
everything, and we only use XIncludes for very specific post-processing
that cannot be done with entities.

The practical consequence is that virtually *none* of the XML files
found in phpdoc/ are well-formed. The mere use of &reftitle.description;
makes a file ill-formed, since /that/ particular file doesn't declare
this entity (it doesn't have a DTD at all, because an external parsed
entity cannot carry a DOCTYPE declaration of its own [2]).
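The failure is easy to reproduce outside the build. This Python sketch uses only the standard library's ElementTree (an expat-based, non-validating parser); the fragment and the entity's replacement text "Description" are made up for illustration and are not phpdoc's real values. A bare reference to an undeclared entity is rejected, while the same content parses once an internal subset declares the entity.

```python
import xml.etree.ElementTree as ET

# A made-up fragment in the style of a phpdoc source file: it references
# an entity but never declares it.
FRAGMENT = "<refsect1><title>&reftitle.description;</title></refsect1>"

# The same content with an internal subset declaring the entity. The
# replacement text "Description" is a placeholder, not phpdoc's real value.
WITH_SUBSET = (
    "<!DOCTYPE refsect1 [\n"
    '  <!ENTITY reftitle.description "Description">\n'
    "]>\n" + FRAGMENT
)


def is_well_formed(text):
    """Return True if the parser accepts text as a well-formed document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(FRAGMENT))      # False: undefined entity
print(is_well_formed(WITH_SUBSET))   # True
print(ET.fromstring(WITH_SUBSET).find("title").text)  # Description
```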

A further consequence is that tool-based validators, even when given the
DocBook DTD, cannot validate these files. I came across this when I
opened an XML file in Komodo Edit 4, and the very first entity was
red-underlined. Tools like xmllint can't be applied to single files; the
entire build system must be invoked to test a single-file change.

I think PHPDoc can do better.

THE SOLUTION
------------

The internal subset we use to define our entities is merely an extension
of the DTD. Thus, those declarations can be *moved* to an external DTD,
something that looks like:

<!-- We use xhtml entities all over the place -->
<!ENTITY % xhtml-lat1        SYSTEM "@srcdir@/entities/ISO/xhtml1-lat1.ent">
%xhtml-lat1;

[snip all the other entity inclusions]

<!ENTITY % docbook-dtd PUBLIC "-//OASIS//DTD DocBook XML V5.0//EN"
"@srcdir@/docbook/docbook-xml/docbook.dtd">
%docbook-dtd;

The DOCTYPE in manual.xml(.in) then becomes short and sweet:

<!DOCTYPE set SYSTEM "@srcdir@/phpdoc.dtd"> [3]

And although we can't put a DOCTYPE in every XML document in phpdoc/,
we can hand the DTD to xmllint directly with --dtdvalid, and presto,
instant validation. There's nothing earth-shattering about this change;
we've simply factored out the necessary DTD definitions so that they can
be tacked onto an arbitrary file. [4]

Our documents still aren't well-formed, but with a little coaxing they
can be made so. If we wanted to get fancy, it would be trivial to
create a "wrapper" script that takes an XML source file, inserts the
proper DOCTYPE, and outputs the result for validation (remember: the
name in <!DOCTYPE $element ...> declares the root element, so we can use
whatever element the file starts with and the DTD will still validate it
fine).
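A minimal sketch of such a wrapper in Python, assuming the source file has no DOCTYPE of its own and that the caller passes the DTD path (phpdoc.dtd below is only the placeholder name from this proposal). The function name add_doctype is hypothetical. It reads the name of the first element, builds a DOCTYPE for it, and places that after any XML declaration, since the declaration has to come first.

```python
import re

# Optional XML declaration at the very start of the file; it must stay
# first, so the DOCTYPE goes right after it.
XML_DECL = re.compile(r"\A\s*<\?xml[^>]*\?>")
COMMENT = re.compile(r"<!--.*?-->", re.S)
# First start tag: "<" followed by a name-start character. This skips
# "<?...?>" processing instructions and "<!...>" declarations.
START_TAG = re.compile(r"<([A-Za-z_][\w.:-]*)")


def add_doctype(source, dtd_path):
    """Return source with a DOCTYPE naming its root element and dtd_path."""
    body = COMMENT.sub("", source)  # ignore tags mentioned inside comments
    if "<!DOCTYPE" in body:
        raise ValueError("source already has a DOCTYPE")
    match = START_TAG.search(body)
    if match is None:
        raise ValueError("no root element found")
    doctype = '<!DOCTYPE %s SYSTEM "%s">\n' % (match.group(1), dtd_path)
    decl = XML_DECL.match(source)
    if decl is None:
        return doctype + source
    rest = source[decl.end():].lstrip("\n")
    return source[:decl.end()] + "\n" + doctype + rest


if __name__ == "__main__":
    src = ('<?xml version="1.0" encoding="utf-8"?>\n'
           '<refentry xml:id="function.foo"><refnamediv/></refentry>\n')
    print(add_doctype(src, "phpdoc.dtd"))
```

Detecting the root instead of hard-coding one matters because phpdoc fragments start with different elements (refentry, section, chapter and so on); since the new DTD pulls in the full DocBook DTD, any of them should be acceptable as the DOCTYPE name. The output could then be written to a temporary file and checked with something like `xmllint --noout --valid`.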

Oh, and there's one minor implementation detail: we need to pass
LIBXML_DTDLOAD when we load the document (the DTD gets loaded anyway
when we call validate(), so this shouldn't hurt performance).

Comments?

ENDNOTES
--------

[1] http://www.docbook.org/tdg5/en/html/ch02.html
[2] This limitation could be worked around using XInclude, but as
discussed in phd/RFC/Buildsystem-proposal.rtf, that approach is too
large and unwieldy to be useful.
[3] Public identifier pending; also, the path could point anywhere,
probably somewhere in docbook/.
[4] In theory, it should be possible to make a catalog that maps to our
new DTD (although, in that case, it would be a good idea to redefine
xmlns to something else). I haven't tested this with xmllint, but it
turns out that Komodo's XML syntax checking doesn't use catalog files,
so without a DOCTYPE, Komodo refuses to recognize entities and syntax
checking remains out of my reach. I have filed a bug/feature request
accordingly:
http://bugs.activestate.com/show_bug.cgi?id=75287

-- 
 Edward Z. Yang                        GnuPG: 0x869C48DA
 HTML Purifier <http://htmlpurifier.org> Anti-XSS Filter
 [[ 3FA8 E9A9 7385 B691 A6FC B3CB A933 BE7D 869C 48DA ]]
