The problem
-----------
It seems that some tools still do not use a Catalog Entity Resolver [1]
to locate local copies of DTDs. These tools break because there are no
actual DTDs on the website. This limits the uptake of the Apache
Document DTDs and draws lots of user questions.

Some background
---------------
Forrest and Cocoon both use the Catalog Entity Resolver and this works
fine in all situations (command-line, webapp). Any sensible XML tool
can utilise the xml-commons-resolver.jar, and the Catalog supplied by
Forrest. Some other tools either don't bother to use it or the user
has not bothered to configure it.
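
As a rough illustration of what a catalog lookup does, here is a minimal
sketch. It is not the xml-commons-resolver API. The public identifier,
the local path and the function name are assumptions made up for the
example, not entries from the real Forrest Catalog.

```python
# Hypothetical catalog: public identifier -> local copy of the DTD.
# Both the identifier and the path are illustrative assumptions.
CATALOG = {
    "-//APACHE//DTD Documentation V1.2//EN": "schema/dtd/document-v12.dtd",
}


def resolve(public_id, system_id, catalog=CATALOG):
    """Return the local path for a known public identifier.

    If the identifier is not in the catalog, fall back to the system
    identifier, which is what sends a tool out to the website.
    """
    local = catalog.get(public_id)
    if local is not None:
        return local
    return system_id
```

A tool that skips this lookup always takes the fall-back branch, which
is why it needs real DTDs at the system identifier URL.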

The solution
------------
Place the document-v* DTDs and supporting files on the Forrest website
and keep them up-to-date with Forrest CVS.

Add an HTML page to explain that there are better solutions and that
this is only a fall-back solution.

Some issues
-----------
1) The DTDs will still be managed in the Forrest CVS at [2].

2) The Forrest build system is complex. It would be good to automate
the publishing of DTD versions, but that may not be possible.

3) The Forrest website is built using the "stable" version of
Forrest (currently v0.5.1). So how will DTDs from the current
CVS (v0.6-dev) get into the website CVS [3]? Manual copy? See 4).

4) If a committer changes the DTDs in CVS then the website copies will
be out of sync. Will committers remember to do the manual copy? See 3).

5) Extra impact on the Apache webserver. Is this a big bandwidth
consumer? See some estimates in [4].

6) What will be the URLs for the DTDs?
http://xml.apache.org/forrest/dtd/...
We presume that Forrest will not be a top-level project soon.

7) Does the Apache webserver deliver the supporting files using
the appropriate Content-Type? Is that "text/plain"? Does it matter?
We have *.dtd and *.mod and *.pen and *.ent extensions.

8) We have two separate directories in CVS. The /dtd/ and the
/entity/ directories are parallel. We can probably merge
everything into /dtd/ and change the Catalogs.

9) Once the DTDs are on the website, we will not notice if the Catalog
Entity Resolver gets broken after an upgrade. Forrest will still work
but will be slower, downloading the DTD and supporting files
on each document parse. We can probably add a test document
in the "forrest seed site" to detect failure.

10) Cocoon has a copy of the DTDs and supporting files in its own CVS
so that it can build its own documentation via its webapp
and command-line builds. This can continue, but it needs
a better solution. Perhaps Forrest can provide one later.

11) Do we need to ask infrastructure@ about this proposal
or just do it?
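
For issues 3) and 4), a small check could report which files differ
between the Forrest CVS copy and the website copy. The following is
only a sketch under assumptions: the function names are invented here,
and the two directories are whatever local checkouts you pass in. It
compares the *.dtd, *.mod, *.pen and *.ent files by content hash.

```python
import hashlib
from pathlib import Path

# The extensions listed in issue 7).
EXTENSIONS = {".dtd", ".mod", ".pen", ".ent"}


def digests(root):
    """Map each DTD-related file under root (relative path) to its SHA-256."""
    root = Path(root)
    result = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in EXTENSIONS:
            name = path.relative_to(root).as_posix()
            result[name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return result


def out_of_sync(source_dir, website_dir):
    """Return the sorted relative paths that are missing or differ on either side."""
    src = digests(source_dir)
    web = digests(website_dir)
    names = src.keys() | web.keys()
    return sorted(n for n in names if src.get(n) != web.get(n))
```

An empty list means the two copies match. A non-empty list names the
files that someone needs to copy by hand, or that a build step could
copy automatically.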

Next steps
----------
* Discuss this proposal on cocoon-dev and forrest-dev.
* If no new issues surface, then summarise and call a vote if needed.
* Modify Forrest build system to copy the DTDs into the website CVS.

References
----------
[1] Apache Catalog Entity Resolver
http://xml.apache.org/commons/components/resolver/

[2] DTDs in Forrest CVS
http://cvs.apache.org/viewcvs/xml-forrest/src/core/context/resources/schema/

[3] Forrest website CVS
http://cvs.apache.org/viewcvs/xml-site/targets/forrest/

[4] Some recent mail discussion:
http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=107042081805837

[5] Jira issue FOR-107
http://issues.cocoondev.org/jira//secure/ViewIssue.jspa?key=FOR-107

