Joerg Heinicke wrote:
> David Crossley wrote:
> <snip/>
> >>>Would it be acceptable to infrastructure@ apache? It is not just
> >>>one simple DTD you are downloading; there are many included bits.
> >>>
> >>>I do not agree with the approach. Cocoon has encouraged people to
> >>>use the entity resolver. We should not join their bad practice.
> 
> What exactly is the problem with it? In most (?) cases the entity 
> resolver would jump in. Only when a parser does not understand the 
> concept of the resolvers, the live access would jump in.

I think that it is bad practice to ever enable an xml tool to
clumsily drag the DTDs across the network. Website and network
efficiency has always been a big driver for me. I do not know
what percentage of requests would result in this.
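Joerg's point is that a resolver-aware parser never touches the network, and only tools without resolver support fall through to the live URL. Below is a minimal sketch of that "local first, network second" behaviour, using the SAX `EntityResolver` interface as exposed by Python's `xml.sax.handler`. The class name, the `mapping` table and the `network_fallbacks` log are hypothetical and for illustration only. A parser only consults the resolver once it has been registered with `setEntityResolver()`.

```python
import os
from xml.sax.handler import EntityResolver


class LocalFirstResolver(EntityResolver):
    """Resolve DTDs from local copies, falling back to the network.

    `mapping` is a hypothetical table of public identifier -> path
    (relative to `base_dir`). Returning the original system identifier
    lets the parser fetch it remotely, i.e. the "live access" case.
    """

    def __init__(self, mapping, base_dir):
        self.mapping = mapping
        self.base_dir = base_dir
        self.network_fallbacks = []  # system ids left for the parser to fetch

    def resolveEntity(self, publicId, systemId):
        rel_path = self.mapping.get(publicId)
        if rel_path is not None:
            local = os.path.join(self.base_dir, rel_path)
            if os.path.isfile(local):
                return local  # local copy wins: no network access
        # Unknown public id or missing local file: hand back the remote id.
        self.network_fallbacks.append(systemId)
        return systemId
```

Counting `network_fallbacks` in a tool like this would be one way to measure what percentage of requests actually reach the network, at least for tools we control.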

What do other people think? (See stats below.)

<snip/>
> > Also if people are working with Cocoon documentation, then
> > they already have the DTDs with the distribution.
> > 
> > Here is an alternative. We could go back to having the hard-coded
> > ../../dtd/document-v10.dtd type of System Identifiers and set up
> > the entity resolver to have a catalog at the top-level of xdocs.
> 
> But this would again not work for the ViewCVS web view, which was the reason for 
> this thread :-) At the location "../../dtd/document-v10.dtd" you won't find
> a DTD, but an HTML page showing the CVS data for that file.

In my opinion we should not be driven by that use case.
Why would someone try to use a web browser via ViewCVS web
application to view a raw XML file, and then complain that
there are bits missing?
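The alternative above keeps relative System Identifiers in the documents and points resolver-aware tools at a catalog in the top-level xdocs directory. Here is a rough sketch of reading such a catalog. It assumes a TR9401-style file with one `PUBLIC "id" "path"` entry per line, with paths relative to the catalog's own directory. `load_catalog` is a hypothetical helper, and real catalog resolvers handle far more of the format than this does.

```python
import os
import re

# Matches:  PUBLIC "public-id" "system-path"   (TR9401-style catalog entry)
_PUBLIC_ENTRY = re.compile(r'^\s*PUBLIC\s+"([^"]+)"\s+"([^"]+)"')


def load_catalog(text, catalog_dir):
    """Map public identifiers to paths resolved against the catalog's directory.

    Only single-line PUBLIC entries are handled; comments and other entry
    types are skipped. This is a simplification of the real format.
    """
    entries = {}
    for line in text.splitlines():
        match = _PUBLIC_ENTRY.match(line)
        if match:
            public_id, rel_path = match.groups()
            entries[public_id] = os.path.normpath(os.path.join(catalog_dir, rel_path))
    return entries
```

Because the paths are resolved relative to the catalog, the same catalog works wherever the xdocs tree is checked out.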

> > That is preferable to retrieving a mass of DTD stuff across
> > the network every time that someone looks at a document.
> 
> The question is (and I can't answer it): is this really that much data?

Hard to estimate, but here are some clues:
----------
Average xdocs/*.xml size is roughly 12 kB

With document-v10 there are 2 additional downloads:
document-v10.dtd = 20 kB
characters.ent = 32 kB

With document-v12 there are 9 additional downloads, as it is more
modular (and the faq and changes DTDs add even more modules):
document-v12.dtd = 8 kB
document-v12.mod = 16 kB
common-charents-v10.mod = 4 kB
iso*.pen entity sets (6 files) = 40 kB
----------
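To turn these figures into a per-view cost, here is a small calculation using exactly the sizes and file counts listed. It assumes no caching, so every view of a document fetches all of its DTD pieces. The faq and changes DTDs would add more, so the v12 numbers are a lower bound. `per_view_cost` is just an illustrative helper.

```python
# Sizes in kB and file counts, as listed in the estimate.
DOC_KB = 12  # average xdocs/*.xml document

DTD_SETS = {
    "document-v10": {"document-v10.dtd": (20, 1), "characters.ent": (32, 1)},
    "document-v12": {
        "document-v12.dtd": (8, 1),
        "document-v12.mod": (16, 1),
        "common-charents-v10.mod": (4, 1),
        "iso*.pen entity sets": (40, 6),
    },
}


def per_view_cost(parts, doc_kb=DOC_KB):
    """Return (extra kB, total kB, HTTP requests) for one uncached view."""
    extra_kb = sum(kb for kb, _ in parts.values())
    requests = 1 + sum(n for _, n in parts.values())  # document + DTD pieces
    return extra_kb, doc_kb + extra_kb, requests


for name, parts in DTD_SETS.items():
    extra, total, reqs = per_view_cost(parts)
    print(f"{name}: +{extra} kB ({extra / DOC_KB:.1f}x the document), "
          f"{total} kB over {reqs} requests")
```

With these inputs, document-v10 adds 52 kB (about 4.3 times the document), for 64 kB over 3 requests. Document-v12 adds 68 kB (about 5.7 times the document), for 80 kB over 10 requests.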

So, if that would not have a big impact on Apache infrastructure,
then perhaps we should put the DTDs at somewhere.apache.org.
If we did that, then we should not do the hard-coded local
System Identifier thing; just let those users suffer the
network overhead.

The Forrest project would need to participate in this discussion.
They are currently managing the DTDs. There is also the issue of
duplication of these between Cocoon and Forrest. It might be
better if they were part of Cocoon and were made available in
a way that Forrest can utilise for its various needs (during
the build by Ant, during the command-line docs build, while
running as a webapp, when augmented by other projects with their
own DTDs, etc.).

I see the need for a Proposal, but I also feel that my case of
volunteeritis is worsening. So if it is up to me then it might
take some time.

--David

