On 29 Mar 2004, at 18:52, Craig McClanahan wrote:
Paul Libbrecht wrote:

I think Digester.parse(java.io.File) should do it for me, shouldn't it?
(this method does build an input-source with the correct URL, btw)
The maven code even makes an effort to turn this into an absolute path.


In theory it should ... but if it doesn't, you can easily construct a URL for a file and use the technique I described.
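As an illustration of constructing a URL for a file, here is a minimal sketch that turns a java.io.File into an absolute file: URL and sets it as the system ID of a SAX InputSource. The class name, method name and the path conf/rules.xml are hypothetical; only the JDK and SAX calls are real.

```java
import java.io.File;
import org.xml.sax.InputSource;

class FileSystemIdSketch {

    // Builds an InputSource whose system ID is the absolute file: URL of the
    // document, so the parser has a base for relative entity references.
    static InputSource inputSourceFor(File file) {
        String systemId = file.getAbsoluteFile().toURI().toString();
        return new InputSource(systemId);
    }

    public static void main(String[] args) {
        // "conf/rules.xml" is a hypothetical path; the file need not exist
        // for the system ID to be computed.
        InputSource source = inputSourceFor(new File("conf/rules.xml"));
        System.out.println(source.getSystemId());
        // The source could then be handed to a parser, for example via
        // Digester.parse(InputSource).
    }
}
```

Because the InputSource carries no stream, the parser is left to open the document from the system ID itself.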

But the problem remains: if you look at the code of Digester.java, there's nothing that keeps the URL of the file! And the call to the method configure() is without any parameter!

But that's a feature, not a bug :-). No code in Digester is necessary, because it's all handled by the SAX parser underneath.

I do think, contrary to what Robert claims, that XML compliance requires entities with relative system ids to be resolved fully as long as we have a URL.

Correct relative entity resolution also requires users to correctly utilize what the JAXP APIs provide. If you don't provide an absolute URL for the document being parsed, relative URL references will fail. If you do provide an absolute URL, entity references will work in a manner totally transparent to Digester, because this is a feature built into the underlying SAX-based parser.

'../whatever.dtd' is not a url. XML parsers can therefore reject it and still be specification compliant. (the url should be something like 'file:../whatever.dtd'.) digester makes an attempt to resolve the url in the standard java way, which is more than the xml specification requires in this case.


but paul has highlighted an area where the digester could be improved: in the resolution of relative file urls. digester resolves these using the standard java system. this system can (in many common situations) conflict with the system outlined in the xml specification (which should be relative to the document).
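To make the conflict concrete, here is a rough sketch comparing two ways of resolving '../whatever.dtd'. It assumes that "the standard java system" means resolving a plain path against the JVM working directory, which is only one reading of the phrase and not taken from the Digester source. The class name, method names and the document URL file:/project/conf/rules.xml are hypothetical.

```java
import java.io.File;
import java.net.URI;

class RelativeResolutionSketch {

    // Document-relative resolution: the reference is resolved against the
    // URL of the XML document that contains it.
    static String relativeToDocument(String documentUrl, String systemId) {
        return URI.create(documentUrl).resolve(systemId).toString();
    }

    // Working-directory-relative resolution (assumed reading of "the
    // standard java system"): the reference is treated as a plain file path.
    static String relativeToWorkingDir(String systemId) {
        return new File(systemId).getAbsoluteFile().toURI().normalize().toString();
    }

    public static void main(String[] args) {
        // prints "file:/project/whatever.dtd"
        System.out.println(relativeToDocument("file:/project/conf/rules.xml", "../whatever.dtd"));
        // depends on the directory the JVM was started from
        System.out.println(relativeToWorkingDir("../whatever.dtd"));
    }
}
```

The two results agree only when the JVM happens to be started from the directory that holds the document.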

the SAX specification says 'If the system identifier is a URL, the SAX parser must resolve it fully before reporting it to the application.' from a search, the exact meaning of this phrase seems to be in doubt. i'd hope that '../whatever.dtd' should be passed to the EntityResolver as an absolute file URL but this behaviour quite possibly isn't present in many common parsers.

but SAX does give an option that digester doesn't really exploit at the moment: returning null. this should force the SAX parser to resolve the system identifier in its standard way. i'd say that this should definitely be an option in this particular circumstance.

i'd suggest creating a test for bad URLs (probably something like those that don't contain a ':' in the substring starting at zero-based position 2). for bad urls, digester tries to resolve them using the standard java process. if this fails, then digester returns null leaving the parser to cope with the problem.

i think that this should ensure that in situations where a good URL is specified (such as the cases craig outlined earlier) digester works as at present. in those situations where the URL is not so well specified, this change should give the behaviour expected by users - that of the parser they are using.
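The proposal above could be sketched roughly as follows. This is not Digester's actual code: the class name FallbackEntityResolver and the helper looksLikeUrl are hypothetical, and "the standard java process" is assumed here to mean building a java.net.URL from the string. Only the EntityResolver contract (returning null hands resolution back to the parser) comes from SAX.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

class FallbackEntityResolver implements EntityResolver {

    // The suggested test: a usable URL has a ':' at zero-based index 2 or
    // later, which rules out bare paths and Windows drive letters ("C:\...").
    static boolean looksLikeUrl(String systemId) {
        return systemId != null && systemId.indexOf(':', 2) >= 0;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (looksLikeUrl(systemId)) {
            // Good URL: behave as before and point the parser at it.
            return new InputSource(systemId);
        }
        try {
            // Bad URL: try the assumed "standard java" route first.
            return new InputSource(new URI(systemId).toURL().toExternalForm());
        } catch (URISyntaxException | MalformedURLException
                | IllegalArgumentException | NullPointerException e) {
            // Give up and return null so the parser resolves the system
            // identifier in its own way.
            return null;
        }
    }

    public static void main(String[] args) {
        FallbackEntityResolver resolver = new FallbackEntityResolver();
        System.out.println(resolver.resolveEntity(null, "../whatever.dtd"));
        System.out.println(resolver.resolveEntity(null, "file:/project/whatever.dtd").getSystemId());
    }
}
```

In this sketch a string that fails the ':' test will in practice always end up at the null fallback, since a URL cannot be built from it either; a real implementation might try something more useful in the middle step.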

- robert

