Re: [digester] can't resolve relative entities ?

2004-04-04 Thread robert burrell donkin
On 4 Apr 2004, at 20:11, Paul Libbrecht wrote:

On 4-Apr-04, at 20:26, robert burrell donkin wrote:
i've been looking into the issue and i now believe that it's parser 
implementation related. the version of xerces that i'm using seems to 
resolve all relative urls, passing digester only absolute urls. it 
would be very helpful if you could tell me which version you're 
using.
The one in maven, 1.4.1 it seems.

Should I contribute a test ?
You seem to have put one...
the test i've added succeeds on my platform. could you try it on yours?

if it passes then please contribute a test that fails for you and i'll 
see if it fails for me.

- robert

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: [digester] can't resolve relative entities ?

2004-03-29 Thread Paul Libbrecht
I think Digester.parse(java.io.File) should do it for me, or?
(this method does build an input-source with correct URL, btw)
There's even, in the maven code, efforts towards making this an 
absolute path.

But the problem remains: if you look at the code of Digester.java, 
there's nothing that keeps the URL of the file! And the call to the 
method configure() is without any parameter!

I do think, contrary to what Robert claims, that XML-compliance 
requires relative-system-id-entities to be resolved completely as long 
as we have a URL.

paul

On 29-Mar-04, at 06:17, Craig McClanahan wrote:

One important ingredient in using relative references for entity 
resolution is to use the appropriate Digester.parse() method.  If you 
use the one that takes an InputStream, as an example, there is no way 
for the SAX parser or Digester to know what the absolute URL of that 
resource is, and therefore no way to resolve relative references.  On 
the other hand, if you use the entry point that takes a URL, or a 
(properly formatted) InputSource, then you are providing enough 
information for the parser to resolve relative references without 
doing anything else at all.



Re: [digester] can't resolve relative entities ?

2004-03-29 Thread Craig McClanahan
Paul Libbrecht wrote:

I think Digester.parse(java.io.File) should do it for me, or?
(this method does build an input-source with correct URL, btw)
There's even, in the maven code, efforts towards making this an 
absolute path.

In theory it should ... but if it doesn't, you can easily construct a 
URL for a file and use the technique I described.
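Craig's fallback can be sketched with the JDK alone. The helper below is hypothetical (the class and method names are invented for illustration); it turns a java.io.File into an absolute file: URL string that can serve as the system id of an org.xml.sax.InputSource. It goes through File.toURI() rather than concatenating "file://" with a path, since toURI() makes the path absolute, escapes characters such as spaces and copes with Windows drive letters.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;

final class FileSystemIds {
    // Hypothetical helper: returns an absolute URL (as a string) for a file.
    // File.toURI() first makes the path absolute against the working
    // directory, then encodes characters such as spaces.
    static String toSystemId(File file) throws MalformedURLException {
        URL url = file.toURI().toURL();
        return url.toExternalForm();
    }

    public static void main(String[] args) throws Exception {
        // Prints something like file:/home/me/jelly/project.xml
        System.out.println(toSystemId(new File("project.xml")));
    }
}
```

The resulting string could be passed to new InputSource(systemId) and from there to Digester.parse(InputSource).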

But the problem remains: if you look at the code of Digester.java, 
there's nothing that keeps the URL of the file! And the call to the 
method configure() is without any parameter!

But that's a feature, not a bug :-).  No code in Digester is necessary, 
because it's all handled by the SAX parser underneath.

I do think, contrary to what Robert claims, that XML-compliance 
requires relative-system-id-entities to be resolved completely as long 
as we have a URL.

Correct relative entity resolution also requires users to correctly 
utilize what the JAXP APIs provide.  If you don't provide an absolute 
URL for the document being parsed, relative URL references will fail.  
If you do provide an absolute URL, entity references will work in a 
manner totally transparent to Digester, because this is a feature built 
in to the underlying SAX based parser.

paul

Craig


Re: [digester] can't resolve relative entities ?

2004-03-29 Thread robert burrell donkin
On 29 Mar 2004, at 18:52, Craig McClanahan wrote:
Paul Libbrecht wrote:

I think Digester.parse(java.io.File) should do it for me, or?
(this method does build an input-source with correct URL, btw)
There's even, in the maven code, efforts towards making this an 
absolute path.

In theory it should ... but if it doesn't, you can easily construct a 
URL for a file and use the technique I described.

But the problem remains: if you look at the code of Digester.java, 
there's nothing that keeps the URL of the file! And the call to the 
method configure() is without any parameter!

But that's a feature, not a bug :-).  No code in Digester is 
necessary, because it's all handled by the SAX parser underneath.

I do think, contrary to what Robert claims, that XML-compliance 
requires relative-system-id-entities to be resolved completely as 
long as we have a URL.

Correct relative entity resolution also requires users to correctly 
utilize what the JAXP APIs provide.  If you don't provide an absolute 
URL for the document being parsed, relative URL references will fail.  
If you do provide an absolute URL, entity references will work in a 
manner totally transparent to Digester, because this is a feature 
built in to the underlying SAX based parser.
'../whatever.dtd' is not a URL. XML parsers can therefore reject it 
and still be specification compliant. (the url should be something like 
'file:../whatever.dtd'.) digester makes an attempt to resolve the url 
in the standard java way which is more than the xml specification 
requires in this case.

but paul has highlighted an area where the digester could be improved: 
in the resolution of relative file urls. digester resolves these using 
the standard java system. this system can (in many common situations) 
conflict with the system outlined in the xml specification (which 
should be relative to the document).
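The conflict can be seen with nothing more than java.net.URI and java.io.File. In the sketch below (hypothetical class name and made-up paths), the same relative system id is resolved once against the URI of the containing document, as the XML specification intends, and once the way a plain new File(systemId) would, against the JVM's working directory.

```java
import java.io.File;
import java.net.URI;

final class RelativeResolution {
    // XML-style: resolve the system id against the URI of the document
    // that contains the reference.
    static URI againstDocument(URI documentUri, String systemId) {
        return documentUri.resolve(systemId);
    }

    // "Standard java" style: a relative path is taken against the current
    // working directory (the user.dir system property), whatever document
    // happened to contain it.
    static URI againstWorkingDir(String systemId) {
        return new File(systemId).toURI().normalize();
    }

    public static void main(String[] args) {
        // Hypothetical layout, for illustration only.
        URI doc = URI.create("file:/projects/jelly/jelly-tags/ant/project.xml");
        System.out.println(againstDocument(doc, "../taglib-project.xml"));
        // -> file:/projects/jelly/jelly-tags/taglib-project.xml
        System.out.println(againstWorkingDir("../taglib-project.xml"));
        // -> depends on where the JVM was started
    }
}
```

The two results only coincide when the JVM happens to be started in the document's own directory, which is why the java-style lookup works in some builds and not in others.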

the SAX specification says 'If the system identifier is a URL, the SAX 
parser must resolve it fully before reporting it to the application.'. 
from a search, the exact meaning of this phrase seems to be in doubt. 
i'd hope that '../whatever.dtd' should be passed to the EntityResolver 
as an absolute file URL but this behaviour quite possibly isn't present 
in many common parsers.

but SAX does give an option that digester doesn't really exploit at the 
moment: returning null. this should force the SAX parser to resolve the 
system identifier in its standard way. i'd say that this should 
definitely be an option in this particular circumstance.
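A minimal sketch of the 'return null' option, using JAXP and SAX from the JDK rather than Digester itself. The handler records the system id it is given and returns null, so the parser opens ../common.ent on its own, relative to main.xml. The file names, the temporary directory layout and the class names are all invented for the example. With the JDK's built-in Xerces-derived parser the recorded id is expected to arrive already absolute, in line with what robert reports for his xerces; other parsers may behave differently.

```java
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class NullResolverDemo {
    // Handler that declines to resolve entities and collects element text.
    static final class NullResolvingHandler extends DefaultHandler {
        final List<String> seenSystemIds = new ArrayList<>();
        final StringBuilder text = new StringBuilder();

        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            seenSystemIds.add(systemId); // what the parser reported
            return null;                 // null: the parser resolves it itself
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    // Writes <dir>/common.ent and <dir>/sub/main.xml, which refers to the
    // entity through the relative system id "../common.ent".
    static Path createSample() throws Exception {
        Path dir = Files.createTempDirectory("entities");
        Path sub = Files.createDirectory(dir.resolve("sub"));
        Files.write(dir.resolve("common.ent"),
                "shared-deps".getBytes(StandardCharsets.UTF_8));
        String xml = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE root [<!ENTITY deps SYSTEM \"../common.ent\">]>\n"
                + "<root>&deps;</root>\n";
        Path doc = sub.resolve("main.xml");
        Files.write(doc, xml.getBytes(StandardCharsets.UTF_8));
        return doc;
    }

    static NullResolvingHandler parse(File document) throws Exception {
        NullResolvingHandler handler = new NullResolvingHandler();
        // The absolute system id is the base for relative references.
        InputSource source = new InputSource(document.toURI().toString());
        SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        NullResolvingHandler h = parse(createSample().toFile());
        System.out.println(h.text + " via " + h.seenSystemIds);
    }
}
```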

i'd suggest creating a test for bad URLs (probably something like those 
that don't contain a ':' in the substring starting at zero-based 
position 2). for bad urls, digester tries to resolve them using the 
standard java process. if this fails, then digester returns null 
leaving the parser to cope with the problem.
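The proposed test and fallback might look roughly like this. It is a sketch of the suggestion, not code from Digester; the class and method names are hypothetical, and "the file exists on disk" is assumed to be what "if this fails" means for the standard java process.

```java
import java.io.File;
import org.xml.sax.InputSource;

final class BadUrlCheck {
    // Heuristic from the thread: a system id is "bad" if no ':' occurs at or
    // after zero-based index 2. Skipping indexes 0 and 1 means a drive
    // letter ("C:\whatever.dtd") does not count as a URL scheme, while
    // "file:/x.dtd" or "http://host/x.dtd" do contain a later ':'.
    static boolean isBadUrl(String systemId) {
        return systemId.indexOf(':', 2) < 0;
    }

    // Fallback for bad URLs: try the plain java reading (a file path, with
    // relative paths taken against the working directory). If there is no
    // such file, return null so the SAX parser applies its own rules.
    static InputSource resolveBadUrl(String systemId) {
        File file = new File(systemId);
        if (!file.exists()) {
            return null;
        }
        return new InputSource(file.toURI().toString());
    }

    public static void main(String[] args) {
        String[] ids = {"../whatever.dtd", "C:\\whatever.dtd", "file:/tmp/whatever.dtd"};
        for (String id : ids) {
            System.out.println(id + " bad? " + isBadUrl(id));
        }
    }
}
```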

i think that this should ensure that in situations where a good URL is 
specified (such as the cases craig outlined earlier) digester would 
work as at present. in those situations where the URL is not so well 
specified then this change should give the behaviour expected by users 
- that of the parser they are using.

- robert


Re: [digester] can't resolve relative entities ?

2004-03-29 Thread robert burrell donkin
On 29 Mar 2004, at 20:52, robert burrell donkin wrote:

On 29 Mar 2004, at 18:52, Craig McClanahan wrote:
Paul Libbrecht wrote:

I think Digester.parse(java.io.File) should do it for me, or?
(this method does build an input-source with correct URL, btw)
There's even, in the maven code, efforts towards making this an 
absolute path.

In theory it should ... but if it doesn't, you can easily construct a 
URL for a file and use the technique I described.

But the problem remains: if you look at the code of Digester.java, 
there's nothing that keeps the URL of the file! And the call to the 
method configure() is without any parameter!

But that's a feature, not a bug :-).  No code in Digester is 
necessary, because it's all handled by the SAX parser underneath.

I do think, contrary to what Robert claims, that XML-compliance 
requires relative-system-id-entities to be resolved completely as 
long as we have a URL.

Correct relative entity resolution also requires users to correctly 
utilize what the JAXP APIs provide.  If you don't provide an absolute 
URL for the document being parsed, relative URL references will fail. 
If you do provide an absolute URL, entity references will work in a 
manner totally transparent to Digester, because this is a feature 
built in to the underlying SAX based parser.
'../whatever.dtd' is not a URL. XML parsers can therefore reject it 
and still be specification compliant. (the url should be something 
like 'file:../whatever.dtd'.) digester makes an attempt to resolve the 
url in the standard java way which is more than the xml specification 
requires in this case.
i should probably admit my mistake before others pick it up. relative 
urls do not need a scheme prefix but an entity resolver cannot know the 
base protocol. this is probably why the SAX specification says that 
only fully resolved URLs should be passed in (i take this to mean 
absolute URLs).

IMO the only safe way for digester to deal with relative URLs is to 
return null and leave the parser to try to sort out the mess. the only 
issue would be to sort out the relative URLs from badly formed absolute 
file URLs such as 'C:\whatever.dtd'. i don't think that returning null 
in the case of a badly formed URL should make much difference (most 
parsers should find the file in question) but i suppose digester could 
test for the existence of a file (if people think that this is a serious 
problem).
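One way to tell relative URLs apart from badly formed absolute file "URLs" such as 'C:\whatever.dtd' is sketched below with java.net.URI. The enum, the method names and the drive-letter rule are assumptions made for illustration; leaveToParser() encodes the policy suggested here, where only well-formed absolute URLs are handled by the resolver and everything else gets null.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class SystemIdKinds {
    enum Kind { ABSOLUTE_URL, RELATIVE_URL, DRIVE_PATH, MALFORMED }

    static Kind classify(String systemId) {
        // A drive letter, ':' and a separator, e.g. C:\whatever.dtd.
        // Checked first because java.net.URI would read "C" as a scheme.
        if (systemId.length() > 2
                && Character.isLetter(systemId.charAt(0))
                && systemId.charAt(1) == ':'
                && (systemId.charAt(2) == '\\' || systemId.charAt(2) == '/')) {
            return Kind.DRIVE_PATH;
        }
        try {
            // isAbsolute() is true exactly when the URI has a scheme.
            return new URI(systemId).isAbsolute() ? Kind.ABSOLUTE_URL : Kind.RELATIVE_URL;
        } catch (URISyntaxException e) {
            return Kind.MALFORMED; // e.g. unescaped spaces
        }
    }

    // Policy sketched from the thread: anything that is not an absolute
    // URL is left to the parser (the resolver would return null).
    static boolean leaveToParser(String systemId) {
        return classify(systemId) != Kind.ABSOLUTE_URL;
    }

    public static void main(String[] args) {
        String[] ids = {"../whatever.dtd", "C:\\whatever.dtd", "file:/tmp/x.dtd"};
        for (String id : ids) {
            System.out.println(id + " -> " + classify(id));
        }
    }
}
```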

- robert


Re: [digester] can't resolve relative entities ?

2004-03-28 Thread robert burrell donkin
hi paul

On 27 Mar 2004, at 00:31, Paul Libbrecht wrote:

Dear Digester-Gurus...

While trying very hard to determine whether a buggy dom4j was responsible 
for errors resolving entities in maven project parsing, I finally 
realized that Digester may be the reason.

We start with a guess: Digester.parse(File) is weird (around lines 
1527...): it doesn't store the reference to the file at all, but 
still offers itself as EntityResolver. How can it resolve an entity 
if it doesn't know the path?
in many ways, digester builds a more user-friendly interface on top of 
SAX. the usual philosophy is to offer easy, out-of-the-box support for 
the most common use cases and then offer access to SAX for those who 
need more sophisticated solutions.

entity resolution is a good example of this. digester offers simple 
support for common use cases by offering itself as the default entity 
resolver. digester maintains a simple map of publicIDs to URLs and a 
method for users (and digester) to register them.
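In its simplest form, a map from public ids to URLs looks like the class below. It is a hypothetical illustration, not Digester's source. In particular, it returns null for public ids that were never registered, the behaviour proposed elsewhere in this thread, and the public id and URL in main() are made up.

```java
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical resolver illustrating a public-id catalog; not Digester's code.
final class PublicIdResolver implements EntityResolver {
    private final Map<String, String> urlsByPublicId = new HashMap<>();

    // Associate a public id (from a DOCTYPE) with the URL of a local copy.
    void register(String publicId, String entityUrl) {
        urlsByPublicId.put(publicId, entityUrl);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String url = (publicId == null) ? null : urlsByPublicId.get(publicId);
        // Unregistered (or absent) public id: return null and let the
        // parser resolve the system id itself.
        return (url == null) ? null : new InputSource(url);
    }

    public static void main(String[] args) {
        PublicIdResolver resolver = new PublicIdResolver();
        resolver.register("-//Example//DTD Config 1.0//EN", "file:/opt/dtds/config_1_0.dtd");
        InputSource hit = resolver.resolveEntity(
                "-//Example//DTD Config 1.0//EN", "http://example.com/config.dtd");
        System.out.println(hit.getSystemId()); // file:/opt/dtds/config_1_0.dtd
    }
}
```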

though this is better than the default adopted by most parsers, this 
approach has many limitations. the standard advice for users who need 
more sophisticated support is to register a separate EntityResolver. (the 
business of creating and maintaining DTD catalog programs is best left 
to specialist components.)

The problem shows up especially when building the jelly taglibs: the 
project.xml of each taglib extends ../taglib-project.xml, which 
itself should reference ../commonDeps.ent by means of the DTD internal 
subset.
As this is buggy, the current jelly CVS contains a copy of 
commonDeps.ent.
i've taken a fresh look at the specs and i think that buggy is probably 
too strong a word. '../taglib-project.xml' is not a URI and so parsers 
can legitimately refuse to resolve it but most common parsers interpret 
this as a path relative to the file (i think, please correct me if i'm 
wrong).

i've taken a look at the digester source and it's probable that 
digester does not allow parsers to apply this feature since it will 
always interpret a system id as an URI and then rely on java to find 
it. it should be possible to alter the entity resolution code so that 
(when the URI is a relative file url) the java relative path is tested 
first and null returned if the file does not exist allowing the parser 
to use its default resolution code which (i think) should find the 
file relative to the file path.

does this sound like it would fix the jelly problems?

and can anyone else see any problems with this approach?

- robert


Re: [digester] can't resolve relative entities ?

2004-03-28 Thread Craig McClanahan
robert burrell donkin wrote:

hi paul

On 27 Mar 2004, at 00:31, Paul Libbrecht wrote:

Dear Digester-Gurus...

While trying very hard to determine whether a buggy dom4j was responsible 
for errors resolving entities in maven project parsing, I finally 
realized that Digester may be the reason.

We start with a guess: Digester.parse(File) is weird (around lines 
1527...): it doesn't store the reference to the file at all, but 
still offers itself as EntityResolver. How can it resolve an entity 
if it doesn't know the path?


in many ways, digester builds a more user-friendly interface on top of 
SAX. the usual philosophy is to offer easy, out-of-the-box support for 
the most common use cases and then offer access to SAX for those who 
need more sophisticated solutions.

entity resolution is a good example of this. digester offers simple 
support for common use cases by offering itself as the default entity 
resolver. digester maintains a simple map of publicIDs to URLs and a 
method for users (and digester) to register them.

though this is better than the default adopted by most parsers, this 
approach has many limitations. the standard advice for users who need 
more sophisticated support is to register a separate EntityResolver. (the 
business of creating and maintaining DTD catalog programs is best left 
to specialist components.)

The problem shows up especially when building the jelly taglibs: the 
project.xml of each taglib extends ../taglib-project.xml, which 
itself should reference ../commonDeps.ent by means of the DTD internal 
subset.
As this is buggy, the current jelly CVS contains a copy of 
commonDeps.ent.


i've taken a fresh look at the specs and i think that buggy is 
probably too strong a word. '../taglib-project.xml' is not a URI and 
so parsers can legitimately refuse to resolve it but most common 
parsers interpret this as a path relative to the file (i think, please 
correct me if i'm wrong).

i've taken a look at the digester source and it's probable that 
digester does not allow parsers to apply this feature since it will 
always interpret a system id as an URI and then rely on java to find 
it. it should be possible to alter the entity resolution code so that 
(when the URI is a relative file url) the java relative path is tested 
first and null returned if the file does not exist allowing the parser 
to use its default resolution code which (i think) should find the 
file relative to the file path.

does this sound like it would fix the jelly problems?

and can anyone else see any problems with this approach?

One important ingredient in using relative references for entity 
resolution is to use the appropriate Digester.parse() method.  If you 
use the one that takes an InputStream, as an example, there is no way 
for the SAX parser or Digester to know what the absolute URL of that 
resource is, and therefore no way to resolve relative references.  On 
the other hand, if you use the entry point that takes a URL, or a 
(properly formatted) InputSource, then you are providing enough 
information for the parser to resolve relative references without doing 
anything else at all.

As an example of this approach, this is a (slightly simplified) version 
of the logic that Struts uses to construct an InputSource for parsing 
struts-config.xml files:

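   // context-relative path of the configuration file
   String path = "/WEB-INF/struts-config.xml";
   // the bytes to parse
   InputStream stream = getServletContext().getResourceAsStream(path);
   // absolute URL of the same resource: the base for relative references
   URL url = getServletContext().getResource(path);
   InputSource source = new InputSource(url.toExternalForm());
   source.setByteStream(stream);
   digester.parse(source);
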
In this way, relative entity references in the struts-config.xml 
document get resolved to other XML documents in the /WEB-INF directory 
of my webapp, with no extra muss or fuss.  The same approach will work 
for non-webapp based applications as well, as long as you always 
configure the InputSource with an absolute URL.
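Outside a webapp, the same recipe can be sketched with java.nio.file and the JDK's SAX parser. The class name, file names and temporary layout below are invented; the point is only that the InputSource carries both the byte stream and the document's absolute URL, so the relative reference to fragment.ent resolves without any help from an EntityResolver.

```java
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class StreamWithSystemId {
    // Collects character data so the expanded entity can be checked.
    static final class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    // Same shape as the Struts snippet: bytes from a stream, plus the
    // absolute URL of the document as the system id.
    static String parseText(Path document) throws Exception {
        try (InputStream stream = Files.newInputStream(document)) {
            InputSource source = new InputSource(document.toUri().toURL().toExternalForm());
            source.setByteStream(stream);
            TextCollector collector = new TextCollector();
            SAXParserFactory.newInstance().newSAXParser().parse(source, collector);
            return collector.text.toString();
        }
    }

    // Writes config.xml and fragment.ent side by side in a temp directory.
    static Path createSample() throws Exception {
        Path dir = Files.createTempDirectory("config");
        Files.write(dir.resolve("fragment.ent"),
                "fragment text".getBytes(StandardCharsets.UTF_8));
        String xml = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE config [<!ENTITY frag SYSTEM \"fragment.ent\">]>\n"
                + "<config>&frag;</config>\n";
        Path doc = dir.resolve("config.xml");
        Files.write(doc, xml.getBytes(StandardCharsets.UTF_8));
        return doc;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parseText(createSample()));
    }
}
```

The InputSource built in parseText() could equally be handed to digester.parse(source), as in the Struts snippet.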

- robert

Craig


[digester] can't resolve relative entities ?

2004-03-26 Thread Paul Libbrecht
Dear Digester-Gurus...

While trying very hard to determine whether a buggy dom4j was responsible 
for errors resolving entities in maven project parsing, I finally 
realized that Digester may be the reason.

We start with a guess: Digester.parse(File) is weird (around lines 
1527...): it doesn't store the reference to the file at all, but still 
offers itself as EntityResolver. How can it resolve an entity if it 
doesn't know the path?

The problem shows up especially when building the jelly taglibs: the 
project.xml of each taglib extends ../taglib-project.xml, which itself 
should reference ../commonDeps.ent by means of the DTD internal subset.
As this is buggy, the current jelly CVS contains a copy of 
commonDeps.ent.

Digging into the source, I realized that the point where control leaves 
the maven sources is MavenUtils line 190 (at the beginning of 
getNonJellyProject()). Inserting something like the code below right 
before the call to getProjectBeanReader().parse() proved to me that the 
whole logic of relative URLs runs fine (even the stream opens):

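// file: URL of the project descriptor ("file://" + an absolute Unix path
// gives file:///...; on Windows, projectDescriptor.toURI().toURL() is safer)
URL u = new URL("file://" + projectDescriptor.getAbsolutePath());
System.out.println("Will parse project: " + u);
if (projectDescriptor.getName().equals("tag-project.xml")) {
    // resolve the entity reference relative to the descriptor's URL
    URL u2 = new URL(u, "../commonDependencies.ent");
    System.out.println("CommonDeps should be at " + u2);
    System.out.println("Stream: " + u2.openStream());
}
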
Can someone at least comment on my guess about the inappropriate 
construction of the entity resolver? I think one needs to create a new 
one for every new URL handed to the digester, or?

paul
