Re: [digester] can't resolve relative entities ?
On 4 Apr 2004, at 20:11, Paul Libbrecht wrote:

> On 4-Apr-04, at 20:26, robert burrell donkin wrote:
>
>> i've been looking into the issue and i now believe that it's parser implementation related. the version of xerces that i'm using seems to resolve all relative urls, passing digester only absolute urls. it would be very helpful if you could tell me which version you're using.
>
> The one in maven, 1.4.1 it seems. Should I contribute a test? You seem to have put one...

the test i've added succeeds on my platform. could you try it on yours? if it passes then please contribute a test that fails for you and i'll see if it fails for me.

- robert

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Re: [digester] can't resolve relative entities ?
I think Digester.parse(java.io.File) should do it for me, right? (This method does build an InputSource with the correct URL, btw.) There are even efforts in the maven code towards making this an absolute path.

But the problem remains: if you look at the code of Digester.java, there's nothing that keeps the URL of the file! And the call to the method configure() takes no parameter!

I do think, contrary to what Robert claims, that XML compliance requires relative system-id entities to be resolved completely as long as we have a URL.

paul

On 29-Mar-04, at 06:17, Craig McClanahan wrote:

> One important ingredient in using relative references for entity resolution is to use the appropriate Digester.parse() method. If you use the one that takes an InputStream, for example, there is no way for the SAX parser or Digester to know what the absolute URL of that resource is, and therefore no way to resolve relative references. On the other hand, if you use the entry point that takes a URL, or a (properly formatted) InputSource, then you are providing enough information for the parser to resolve relative references without doing anything else at all.
Re: [digester] can't resolve relative entities ?
Paul Libbrecht wrote:

> I think Digester.parse(java.io.File) should do it for me, right? (This method does build an InputSource with the correct URL, btw.) There are even efforts in the maven code towards making this an absolute path.

In theory it should ... but if it doesn't, you can easily construct a URL for a file and use the technique I described.

> But the problem remains: if you look at the code of Digester.java, there's nothing that keeps the URL of the file! And the call to the method configure() takes no parameter!

But that's a feature, not a bug :-). No code in Digester is necessary, because it's all handled by the SAX parser underneath.

> I do think, contrary to what Robert claims, that XML compliance requires relative system-id entities to be resolved completely as long as we have a URL.

Correct relative entity resolution also requires users to correctly utilize what the JAXP APIs provide. If you don't provide an absolute URL for the document being parsed, relative URL references will fail. If you do provide an absolute URL, entity references will work in a manner totally transparent to Digester, because this is a feature built in to the underlying SAX-based parser.

> paul

Craig
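A minimal sketch of Craig's point, assuming only the JDK's built-in JAXP parser rather than Digester itself. The class name, the temporary file layout, and the entity name are made up for illustration. The document is handed to the parser as an InputSource whose system ID is an absolute file: URL, and the handler returns null from resolveEntity (the DefaultHandler default), so the parser resolves '../commonDeps.ent' on its own.

```java
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of Digester or maven.
class AbsoluteSystemIdDemo {

    /** Parses the file through an InputSource that carries an absolute system ID. */
    static String parseText(File xml) throws Exception {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        // File.toURI() yields an absolute, correctly escaped file: URI.
        InputSource source = new InputSource(xml.toURI().toString());
        SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        return text.toString();
    }

    /** Builds a throwaway layout: <tmp>/commonDeps.ent and <tmp>/taglib/project.xml. */
    static String demo() throws Exception {
        Path root = Files.createTempDirectory("digester-demo");
        Path taglib = Files.createDirectory(root.resolve("taglib"));
        Files.write(root.resolve("commonDeps.ent"),
                "shared dependencies".getBytes(StandardCharsets.UTF_8));
        Path project = taglib.resolve("project.xml");
        Files.write(project, ("<!DOCTYPE project ["
                + "<!ENTITY commonDeps SYSTEM \"../commonDeps.ent\">"
                + "]><project>&commonDeps;</project>").getBytes(StandardCharsets.UTF_8));
        return parseText(project.toFile());
    }

    public static void main(String[] args) throws Exception {
        // Prints the text pulled in through the relative entity reference.
        System.out.println(demo());
    }
}
```

The same InputSource could be passed to Digester.parse(InputSource); whether Digester's own entity resolver then interferes is exactly what this thread is about, so treat the sketch as showing the parser-level behaviour only.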
Re: [digester] can't resolve relative entities ?
On 29 Mar 2004, at 18:52, Craig McClanahan wrote:

> Paul Libbrecht wrote:
>
>> I think Digester.parse(java.io.File) should do it for me, right? (This method does build an InputSource with the correct URL, btw.) There are even efforts in the maven code towards making this an absolute path.
>
> In theory it should ... but if it doesn't, you can easily construct a URL for a file and use the technique I described.
>
>> But the problem remains: if you look at the code of Digester.java, there's nothing that keeps the URL of the file! And the call to the method configure() takes no parameter!
>
> But that's a feature, not a bug :-). No code in Digester is necessary, because it's all handled by the SAX parser underneath.
>
>> I do think, contrary to what Robert claims, that XML compliance requires relative system-id entities to be resolved completely as long as we have a URL.
>
> Correct relative entity resolution also requires users to correctly utilize what the JAXP APIs provide. If you don't provide an absolute URL for the document being parsed, relative URL references will fail. If you do provide an absolute URL, entity references will work in a manner totally transparent to Digester, because this is a feature built in to the underlying SAX-based parser.

'../whatever.dtd' is not a url. XML parsers can therefore reject it and still be specification compliant. (the url should be something like 'file:../whatever.dtd'.) digester makes an attempt to resolve the url in the standard java way, which is more than the xml specification requires in this case.

but paul has highlighted an area where digester could be improved: the resolution of relative file urls. digester resolves these using the standard java system. this system can (in many common situations) conflict with the system outlined in the xml specification (which should be relative to the document).

the SAX specification says 'If the system identifier is a URL, the SAX parser must resolve it fully before reporting it to the application.'. from a search, the exact meaning of this phrase seems to be in doubt. i'd hope that '../whatever.dtd' would be passed to the EntityResolver as an absolute file URL but this behaviour quite possibly isn't present in many common parsers.

but SAX does give an option that digester doesn't really exploit at the moment: returning null. this should force the SAX parser to resolve the system identifier in its standard way. i'd say that this should definitely be an option in this particular circumstance.

i'd suggest creating a test for bad URLs (probably something like those that don't contain a ':' in the substring starting at zero-based position 2). for bad urls, digester tries to resolve them using the standard java process. if this fails, then digester returns null, leaving the parser to cope with the problem.

i think that this should ensure that in situations where a good URL is specified (such as the cases craig outlined earlier) digester would work as at present. in those situations where the URL is not so well specified, this change should give the behaviour expected by users - that of the parser they are using.

- robert
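A rough sketch of the proposal above, written as a standalone SAX EntityResolver rather than as a patch to Digester. The class name, the register method, and the helper name are hypothetical; this is not Digester's actual code. It keeps a map from public IDs to URLs, applies the "':' at or after position 2" test, tries the plain java file lookup for anything that fails that test, and returns null when the lookup fails so that the parser falls back to its own resolution.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical resolver illustrating the idea; names are assumptions.
class FallbackEntityResolver implements EntityResolver {

    private final Map<String, String> registeredUrls = new HashMap<>();

    /** Associates a public ID with the URL of a local copy. */
    void register(String publicId, String url) {
        registeredUrls.put(publicId, url);
    }

    /** The heuristic from the thread: a ':' at or after zero-based position 2. */
    static boolean looksLikeGoodUrl(String systemId) {
        return systemId.indexOf(':', 2) >= 0;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String registered = (publicId == null) ? null : registeredUrls.get(publicId);
        if (registered != null) {
            return new InputSource(registered);
        }
        if (systemId == null) {
            return null;
        }
        if (looksLikeGoodUrl(systemId)) {
            // Well-formed URL: behave as before and hand it straight back.
            return new InputSource(systemId);
        }
        // "Bad" URL: try the standard java lookup (relative to the working directory).
        File candidate = new File(systemId);
        if (candidate.isFile()) {
            return new InputSource(candidate.toURI().toString());
        }
        // Nothing found: null tells the SAX parser to use its default resolution.
        return null;
    }
}
```

Note that 'C:\whatever.dtd' fails the heuristic because its only ':' sits at position 1, so it takes the file-lookup path rather than being treated as a URL with scheme 'C'.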
Re: [digester] can't resolve relative entities ?
On 29 Mar 2004, at 20:52, robert burrell donkin wrote:

> On 29 Mar 2004, at 18:52, Craig McClanahan wrote:
>
>> Paul Libbrecht wrote:
>>
>>> I think Digester.parse(java.io.File) should do it for me, right? (This method does build an InputSource with the correct URL, btw.) There are even efforts in the maven code towards making this an absolute path.
>>
>> In theory it should ... but if it doesn't, you can easily construct a URL for a file and use the technique I described.
>>
>>> But the problem remains: if you look at the code of Digester.java, there's nothing that keeps the URL of the file! And the call to the method configure() takes no parameter!
>>
>> But that's a feature, not a bug :-). No code in Digester is necessary, because it's all handled by the SAX parser underneath.
>>
>>> I do think, contrary to what Robert claims, that XML compliance requires relative system-id entities to be resolved completely as long as we have a URL.
>>
>> Correct relative entity resolution also requires users to correctly utilize what the JAXP APIs provide. If you don't provide an absolute URL for the document being parsed, relative URL references will fail. If you do provide an absolute URL, entity references will work in a manner totally transparent to Digester, because this is a feature built in to the underlying SAX-based parser.
>
> '../whatever.dtd' is not a url. XML parsers can therefore reject it and still be specification compliant. (the url should be something like 'file:../whatever.dtd'.) digester makes an attempt to resolve the url in the standard java way, which is more than the xml specification requires in this case.

i should probably admit my mistake before others pick it up. relative urls do not need a scheme prefix, but an entity resolver cannot know the base protocol. this is probably why the SAX specification says that only fully resolved URLs should be passed in (i take this to mean absolute URLs).

IMO the only safe way for digester to deal with relative URLs is to return null and leave the parser to try to sort out the mess. the only issue would be to sort out the relative URLs from badly formed absolute file URLs such as 'C:\whatever.dtd'. i don't think that returning null in the case of a badly formed URL should make much difference (most parsers should find the file in question) but i suppose digester could test for the existence of a file (if people think that this is a serious problem).

- robert
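One way the sorting-out step might look, as a sketch using java.net.URI. The class, enum, and method names are invented for illustration, and the drive-letter check is a heuristic of this sketch, not something Digester or SAX defines: a single letter followed by ':' is assumed to be a Windows path rather than a one-letter URL scheme.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper; names and the drive-letter heuristic are assumptions.
class SystemIdKinds {

    enum Kind { ABSOLUTE_URL, RELATIVE_REFERENCE, MALFORMED }

    static Kind classify(String systemId) {
        // 'C:\whatever.dtd' and 'C:/whatever.dtd' are treated as file paths, not URLs.
        if (systemId.length() >= 2
                && Character.isLetter(systemId.charAt(0))
                && systemId.charAt(1) == ':') {
            return Kind.MALFORMED;
        }
        try {
            // URI.isAbsolute() is true exactly when the URI has a scheme.
            return new URI(systemId).isAbsolute()
                    ? Kind.ABSOLUTE_URL
                    : Kind.RELATIVE_REFERENCE;
        } catch (URISyntaxException e) {
            // Illegal characters such as spaces or backslashes end up here.
            return Kind.MALFORMED;
        }
    }

    /** Under the proposal, only a proper absolute URL is resolved by the resolver itself. */
    static boolean shouldLeaveToParser(String systemId) {
        return classify(systemId) != Kind.ABSOLUTE_URL;
    }
}
```

A resolver built on this would return null whenever shouldLeaveToParser is true, optionally after checking whether a file of that name exists.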
Re: [digester] can't resolve relative entities ?
hi paul

On 27 Mar 2004, at 00:31, Paul Libbrecht wrote:

> Dear Digester-Gurus...
>
> While trying hard to work out whether a buggy dom4j is responsible for the entity resolution errors in maven project parsing, I finally realized that Digester may be the reason.
>
> We start with a guess: Digester.parse(File) is weird (around lines 1527...): it doesn't store the reference to the file at all but still offers itself as EntityResolver. How can it resolve an entity if it doesn't know the path??

in many ways, digester builds a more user-friendly interface on top of SAX. the usual philosophy is to offer easy, out-of-the-box support for the most common use cases and then offer access to SAX for those who need more sophisticated solutions.

entity resolution is a good example of this. digester offers simple support for common use cases by offering itself as the default entity resolver. digester maintains a simple map of publicIDs to URLs and a method for users (and digester) to register them. though this is better than the default adopted by most parsers, this approach has many limitations. the standard advice for users who need more sophisticated support is to register a separate EntityResolver. (the business of creating and maintaining DTD catalog programs is best left to specialist components.)

> The pathology shows up while building the jelly taglibs: the project.xml of each taglib extends ../taglib-project.xml, which itself should reference ../commonDeps.ent by means of the DTD internal subset. As this is buggy, the current jelly CVS contains a copy of commonDeps.ent.

i've taken a fresh look at the specs and i think that buggy is probably too strong a word. '../taglib-project.xml' is not a URI and so parsers can legitimately refuse to resolve it, but most common parsers interpret this as a path relative to the file (i think, please correct me if i'm wrong).

i've taken a look at the digester source and it's probable that digester does not allow parsers to apply this feature since it will always interpret a system id as a URI and then rely on java to find it. it should be possible to alter the entity resolution code so that (when the URI is a relative file url) the java relative path is tested first and null returned if the file does not exist, allowing the parser to use its default resolution code which (i think) should find the file relative to the file path.

does this sound like it would fix the jelly problems? and can anyone else see any problems with this approach?

- robert
Re: [digester] can't resolve relative entities ?
robert burrell donkin wrote:

> hi paul
>
> On 27 Mar 2004, at 00:31, Paul Libbrecht wrote:
>
>> Dear Digester-Gurus...
>>
>> While trying hard to work out whether a buggy dom4j is responsible for the entity resolution errors in maven project parsing, I finally realized that Digester may be the reason.
>>
>> We start with a guess: Digester.parse(File) is weird (around lines 1527...): it doesn't store the reference to the file at all but still offers itself as EntityResolver. How can it resolve an entity if it doesn't know the path??
>
> in many ways, digester builds a more user-friendly interface on top of SAX. the usual philosophy is to offer easy, out-of-the-box support for the most common use cases and then offer access to SAX for those who need more sophisticated solutions.
>
> entity resolution is a good example of this. digester offers simple support for common use cases by offering itself as the default entity resolver. digester maintains a simple map of publicIDs to URLs and a method for users (and digester) to register them. though this is better than the default adopted by most parsers, this approach has many limitations. the standard advice for users who need more sophisticated support is to register a separate EntityResolver. (the business of creating and maintaining DTD catalog programs is best left to specialist components.)
>
>> The pathology shows up while building the jelly taglibs: the project.xml of each taglib extends ../taglib-project.xml, which itself should reference ../commonDeps.ent by means of the DTD internal subset. As this is buggy, the current jelly CVS contains a copy of commonDeps.ent.
>
> i've taken a fresh look at the specs and i think that buggy is probably too strong a word. '../taglib-project.xml' is not a URI and so parsers can legitimately refuse to resolve it, but most common parsers interpret this as a path relative to the file (i think, please correct me if i'm wrong).
>
> i've taken a look at the digester source and it's probable that digester does not allow parsers to apply this feature since it will always interpret a system id as a URI and then rely on java to find it. it should be possible to alter the entity resolution code so that (when the URI is a relative file url) the java relative path is tested first and null returned if the file does not exist, allowing the parser to use its default resolution code which (i think) should find the file relative to the file path.
>
> does this sound like it would fix the jelly problems? and can anyone else see any problems with this approach?

One important ingredient in using relative references for entity resolution is to use the appropriate Digester.parse() method. If you use the one that takes an InputStream, for example, there is no way for the SAX parser or Digester to know what the absolute URL of that resource is, and therefore no way to resolve relative references. On the other hand, if you use the entry point that takes a URL, or a (properly formatted) InputSource, then you are providing enough information for the parser to resolve relative references without doing anything else at all.

As an example of this approach, this is a (slightly simplified) version of the logic that Struts uses to construct an InputSource for parsing struts-config.xml files:

    String path = "/WEB-INF/struts-config.xml";
    InputStream stream = getServletContext().getResourceAsStream(path);
    URL url = getServletContext().getResource(path);
    InputSource source = new InputSource(url.toExternalForm());
    source.setByteStream(stream);
    digester.parse(source);

In this way, relative entity references in the struts-config.xml document get resolved to other XML documents in the /WEB-INF directory of my webapp, with no extra muss or fuss. The same approach will work for non-webapp based applications as well, as long as you always configure the InputSource with an absolute URL.

> - robert

Craig
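The InputStream-versus-InputSource distinction can be observed directly. The sketch below assumes the JDK's built-in JAXP parser and uses invented names and paths; it feeds the same bytes to the parser with and without a system ID and records what the parser reports to resolveEntity. The resolver returns placeholder text so that nothing is read from disk.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; the document and paths are made up.
class ReportedSystemIdDemo {

    static final String XML = "<!DOCTYPE project ["
            + "<!ENTITY commonDeps SYSTEM \"../commonDeps.ent\">"
            + "]><project>&commonDeps;</project>";

    /** Returns the system IDs the parser passes to resolveEntity. */
    static List<String> reportedSystemIds(String baseSystemId) throws Exception {
        List<String> seen = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                seen.add(systemId);
                // Placeholder replacement text, so no file access happens.
                return new InputSource(new StringReader("placeholder text"));
            }
        };
        InputSource source = new InputSource();
        source.setByteStream(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
        if (baseSystemId != null) {
            // Same idea as the Struts snippet: a byte stream plus an absolute URL.
            source.setSystemId(baseSystemId);
        }
        SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        return seen;
    }

    public static void main(String[] args) throws Exception {
        // With a base URL the reference is resolved relative to the document.
        System.out.println(reportedSystemIds("file:///work/taglib/project.xml"));
        // Without one the parser has to guess a base of its own.
        System.out.println(reportedSystemIds(null));
    }
}
```

With the base URL set, the reported system ID should point at commonDeps.ent one directory above the document. Without it, the parser has no document location to work from, and what it reports is implementation-specific.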
[digester] can't resolve relative entities ?
Dear Digester-Gurus...

While trying hard to work out whether a buggy dom4j is responsible for the entity resolution errors in maven project parsing, I finally realized that Digester may be the reason.

We start with a guess: Digester.parse(File) is weird (around lines 1527...): it doesn't store the reference to the file at all but still offers itself as EntityResolver. How can it resolve an entity if it doesn't know the path??

The pathology shows up while building the jelly taglibs: the project.xml of each taglib extends ../taglib-project.xml, which itself should reference ../commonDeps.ent by means of the DTD internal subset. As this is buggy, the current jelly CVS contains a copy of commonDeps.ent.

Digging into the source, I realized that the place where it leaves the maven sources is in MavenUtils line 190 (at the beginning of getNonJellyProject()). Inserting something like the code below right before the call to getProjectBeanReader().parse() showed me that the whole logic of relative URLs runs fine... (even the stream opens)...

    URL u = new URL("file://" + projectDescriptor.getAbsolutePath());
    System.out.println("Will parse project: " + u);
    if (projectDescriptor.getName().equals("tag-project.xml")) {
        URL u2 = new URL(u, "../commonDependencies.ent");
        System.out.println("CommonDeps should be at " + u2);
        System.out.println("Stream: " + u2.openStream());
    }

Can someone at least comment on my guess about the inappropriate construction of the entity resolver? I think one needs to create a new one for every new URL handed out to the digester, right?

paul
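The relative-URL arithmetic in the snippet above can also be done with java.io.File.toURI() and java.net.URI.resolve(), which avoids gluing "file://" onto a path by hand (that concatenation can give a wrong URL for Windows paths or paths containing spaces). A small sketch, with a hypothetical class name and example paths:

```java
import java.io.File;
import java.net.URI;

// Hypothetical helper; the paths below are examples only.
class RelativeReferenceDemo {

    /** Resolves a relative reference against the location of a document on disk. */
    static URI resolveAgainst(File document, String relativeRef) {
        // toURI() produces an absolute, escaped file: URI for the document;
        // resolve() then applies the usual relative-reference rules.
        return document.getAbsoluteFile().toURI().resolve(relativeRef);
    }

    public static void main(String[] args) {
        File descriptor = new File("jelly-tags/ant/project.xml");
        System.out.println("CommonDeps should be at "
                + resolveAgainst(descriptor, "../commonDeps.ent"));
    }
}
```

The File does not need to exist for the resolution itself. Only opening the resulting URL touches the file system.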