I didn't mean to say that lack of a DOCTYPE should be treated as an error condition. In fact, what's an error and what isn't is really up to the application that parses the XML. Think of your XML processor as having an implicit contract which imposes constraints on what the incoming XML looks like. The details of that contract are up to you. Hence, whether or not the lack of a DOCTYPE is an error depends on the XML processor's contract. The contract could be: "if there is no DOCTYPE, this is an error", or it could be "if there is no DOCTYPE, I'll validate it against my DTD anyway."
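To make the idea of a contract concrete, here is a minimal sketch of the strictest variant ("if there is no DOCTYPE, or the DOCTYPE carries a public id I don't expect, this is an error"). It relies on the SAX2 extension interface LexicalHandler, which reports the DOCTYPE declaration through startDTD(). The class name DoctypeContract, the accepts() helper and the public id are all made up for illustration, and this is not Digester code. It only detects and rejects; it does nothing to make the parser validate a document that has no DOCTYPE.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical handler: rejects documents whose DOCTYPE is missing or
// declares a public id other than the expected one.
final class DoctypeContract extends DefaultHandler2 {
    private final String expectedPublicId;
    private boolean sawDoctype;

    DoctypeContract(String expectedPublicId) {
        this.expectedPublicId = expectedPublicId;
    }

    // LexicalHandler callback: fired when the parser reads the DOCTYPE declaration.
    @Override
    public void startDTD(String name, String publicId, String systemId) throws SAXException {
        sawDoctype = true;
        if (!expectedPublicId.equals(publicId)) {
            throw new SAXException("Unexpected public id: " + publicId);
        }
    }

    // A DOCTYPE must precede the root element, so if none has been seen by the
    // time the first element starts, the document has no DOCTYPE.
    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (!sawDoctype) {
            throw new SAXException("Missing DOCTYPE declaration");
        }
    }

    // For this sketch only: hand back an empty DTD so nothing is fetched.
    // A real application would return its own trusted DTD here.
    @Override
    public InputSource resolveEntity(String name, String publicId, String baseURI, String systemId) {
        return new InputSource(new StringReader(""));
    }

    // Returns true if the document satisfies the contract, false if parsing was aborted.
    static boolean accepts(String xml, String expectedPublicId) {
        DoctypeContract handler = new DoctypeContract(expectedPublicId);
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.setEntityResolver(handler);
            reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String id = "-//Example//DTD Root 1.0//EN"; // made-up public id
        String withDoctype = "<!DOCTYPE root PUBLIC \"" + id
                + "\" \"http://example.invalid/root.dtd\"><root/>";
        System.out.println(accepts(withDoctype, id));
        System.out.println(accepts("<root/>", id));
    }
}
```

Doing the check in a separate handler keeps the contract in the application rather than in the parsing library, which is the same reason I would keep it out of the Digester.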
The problem with the current SAX standard is that it provides no straightforward way to enforce such contracts. Instead, it imposes a default contract, which says: "I will validate against the DTD you specify in your DOCTYPE; if you do not include a DOCTYPE, I will not validate." Jeff Turner's DoctypeChanger looks like a decent workaround for this deficiency in the SAX standard. To me it would seem a bit crufty to actually incorporate this kind of functionality within the Digester, since if you need to control the doctype in this way, you can simply pass a DoctypeChangerStream to the Digester.parse() method.

The point of my last email was essentially to suggest a contract: "if there is no DOCTYPE, validate against my DTD; if there is a DOCTYPE, and it contains the public id that I expect, validate against my DTD; if there is a DOCTYPE, and it contains some unknown public id, terminate with a validation failure exception." Such a contract could be enforced using the DoctypeChanger framework, with a custom implementation of the DoctypeGenerator interface.

-DHM

> -----Original Message-----
> From: Tal Dayan [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, November 13, 2001 8:15 PM
> To: Jakarta Commons Developers List
> Subject: RE: [digester] forcing a specific DTD
>
> Let's see if I get your suggestions right, there are two error cases:
>
> 1. There is no DOCTYPE at all. Since there will be no validation, we want to
> abort the parsing. How can we currently detect this case ?
>
> 2. The document has DOCTYPE but the public id or URL are incorrect. Does the
> current version of Digester provide a way to detect this case ?
>
> Simply parsing the document without validation (and using the correct DTD)
> is not an option since we don't want to write explicitly the entire
> validation code, so at least we need to detect the error condition (that is,
> no validation or validation using a wrong DTD) and abort the parsing.
>
> This looks like a very general XML parsing issue so it will be great if
> Digester will come out of the box with an easy way to address it.
>
> Tal
>
> > -----Original Message-----
> > From: Dave Martin [mailto:[EMAIL PROTECTED]]
> > Sent: Tuesday, November 13, 2001 7:11 PM
> > To: Jakarta Commons Developers List
> > Subject: RE: [digester] forcing a specific DTD
> >
> > I can only speak for myself, but I would summarize it as:
> >
> > 1. If you can enforce that the XML file has a DOCTYPE declaration, you can
> > configure the Digester to use a specified DTD, based on the PUBLIC
> > identifier in the DOCTYPE entity. (See the 'register' method in the Digester
> > class.)
> >
> > 2. If it is possible for the XML file to omit the DOCTYPE declaration, and
> > you still want to validate the XML, you have a slight problem, because the
> > SAX standard currently doesn't provide a mechanism to force validation. (SAX
> > says: "if validation is 'on', _and_ the XML contains DOCTYPE, then validate
> > against the specified DTD").
> >
> > One workaround is to trick the XML parser into validating, by intercepting
> > the XML input stream and prepending a DOCTYPE declaration to it. I do not
> > believe it is appropriate to implement this capability in the Digester
> > itself.
> >
> > That said, the simplest solution to #2 is to simply not validate if the XML
> > file omits a doctype declaration. This can be treacherous, however, because
> > if a DTD defines default values, it's possible for the same XML file to be
> > parsed differently, depending on whether validation is used.
> >
> > -DHM
> >
> > > -----Original Message-----
> > > From: Tal Dayan [mailto:[EMAIL PROTECTED]]
> > > Sent: Tuesday, November 13, 2001 6:44 PM
> > > To: Jakarta Commons Developers List
> > > Subject: RE: [digester] forcing a specific DTD
> > >
> > > So, what is the consensus of the list regarding DTD validation of parsed
> > > files of known type ?
> > > Is there a safe way to do it now (if so, how) ? If not, does it make sense
> > > to add more support for enforcing validation ?
> > >
> > > Tal
> > >
> > > > -----Original Message-----
> > > > From: Dave Martin [mailto:[EMAIL PROTECTED]]
> > > > Sent: Tuesday, November 13, 2001 11:53 AM
> > > > To: Jakarta Commons Developers List
> > > > Subject: RE: [digester] forcing a specific DTD
> > > >
> > > > IMHO, if the input XML document declares a document type other than what is
> > > > expected, the appropriate action should be to terminate the parse due to a
> > > > validation failure. (I.e. "If you're not what you say you are, how can I
> > > > trust you?")
> > > > The Digester's register() method, which, as Craig pointed out, allows you to
> > > > key a specific DTD off of the DOCTYPE's public identifier, should be
> > > > appropriate for almost all cases. (Since if the XML cannot declare the
> > > > location of the DTD via the SYSTEM identifier, it should at least be able to
> > > > identify its type via the PUBLIC identifier.)
> > > > On the other hand, in the scenario where the XML input does not contain any
> > > > DOCTYPE declaration, one might want to 'assume' it follows a particular
> > > > document type, and validate it against that DTD to verify that assumption.
> > > >
> > > > -DHM
> > > >
> > > > > -----Original Message-----
> > > > > From: Arun M. Thomas [mailto:[EMAIL PROTECTED]]
> > > > > Sent: Tuesday, November 13, 2001 11:37 AM
> > > > > To: Jakarta Commons Developers List
> > > > > Subject: RE: [digester] forcing a specific DTD
> > > > >
> > > > > You're absolutely right, this wouldn't work without a DOCTYPE
> > > > > declaration. However, it wouldn't matter what the contents of that
> > > > > declaration were. (The user could modify it to his heart's content
> > > > > and still have the document validated against the same DTD.)
> > > > >
> > > > > -AMT
> > > > >
> > > > > -----Original Message-----
> > > > > From: craigmcc@localhost [mailto:craigmcc@localhost]On Behalf Of Craig
> > > > > R. McClanahan
> > > > > Sent: Tuesday, November 13, 2001 10:17 AM
> > > > > To: Jakarta Commons Developers List
> > > > > Subject: RE: [digester] forcing a specific DTD
> > > > >
> > > > > Would something like this work even in the absence of a <DOCTYPE>
> > > > > declaration at all in the file being parsed? I thought that this was the
> > > > > only time resolveEntity() was called.
> > > > >
> > > > > Craig
> > > > >
> > > > > On Tue, 13 Nov 2001, Arun M. Thomas wrote:
> > > > >
> > > > > > Date: Tue, 13 Nov 2001 10:20:42 -0800
> > > > > > From: Arun M. Thomas <[EMAIL PROTECTED]>
> > > > > > Reply-To: Jakarta Commons Developers List <[EMAIL PROTECTED]>
> > > > > > To: Jakarta Commons Developers List <[EMAIL PROTECTED]>
> > > > > > Subject: RE: [digester] forcing a specific DTD
> > > > > >
> > > > > > Craig,
> > > > > >
> > > > > > Despite my previous response to TAL, it should be possible to do this
> > > > > > by instantiating the SAXParser with a subclass of DefaultHandler which
> > > > > > overrides the resolveEntity method. I had to do exactly this in the
> > > > > > last application on which I worked using the JAXP1.0 API.
> > > > > > In that case, we provided a custom implementation of EntityResolver which
> > > > > > always returned an INPUT source to the same dtd. It appears the JAXP1.1
> > > > > > has hidden the EntityResolver under the DefaultHandler class, so providing
> > > > > > an implementation of that method which is customized to return a specific
> > > > > > dtd should suffice.
> > > > > >
> > > > > > In the Digester case, it means a potentially simple modification to the
> > > > > > resolveEntity method of Digester (which is a DefaultHandler). I've
> > > > > > included a diff of a quick patch below as a suggestion, and attached a copy
> > > > > > of the modified version of digester.
> > > > > >
> > > > > > Cheers,
> > > > > > -AMT
> > > > > >
> > > > > > cvs diff Digester.java (in directory
> > > > > > C:\Dev\jakarta-commons\digester\src\java\org\apache\commons\digester\)
> > > > > > Index: Digester.java
> > > > > > ===================================================================
> > > > > > RCS file:
> > > > > > /home/cvspublic/jakarta-commons/digester/src/java/org/apache/commons/digester/Digester.java,v
> > > > > > retrieving revision 1.23
> > > > > > diff -r1.23 Digester.java
> > > > > > 161c161,171
> > > > > > <
> > > > > > ---
> > > > > > >     /**
> > > > > > >      * Works only in a JAXP1.1 world, but allows the user to supply a fixed
> > > > > > >      * URL against which all documents will be validated. The supplied
> > > > > > >      * parameter will be used by the {@link #resolveEntity(String, String)}
> > > > > > >      * method.
> > > > > > >      */
> > > > > > >     public Digester(String fixedDTDUrl) {
> > > > > > >         super();
> > > > > > >
> > > > > > >         this.fixedDTDUrl = fixedDTDUrl;
> > > > > > >     }
> > > > > > >
> > > > > > 313a324,328
> > > > > > >     /**
> > > > > > >      * URL which may be supplied against which all documents should be
> > > > > > >      * validated regardless of the public and system identifiers.
> > > > > > >      */
> > > > > > >     private String fixedDTDUrl = null;
> > > > > > 1042a1058,1060
> > > > > > >         if (fixedDTDUrl != null)
> > > > > > >             dtdURL = fixedDTDUrl;
> > > > > > >
> > > > > >
> > > > > > -----Original Message-----
> > > > > > From: craigmcc@localhost [mailto:craigmcc@localhost]On Behalf Of Craig
> > > > > > R. McClanahan
> > > > > > Sent: Tuesday, November 13, 2001 9:30 AM
> > > > > > To: Jakarta Commons Developers List
> > > > > > Subject: Re: [digester] forcing a specific DTD
> > > > > >
> > > > > > ...
> > > > > >
> > > > > > This sounds like it might be an interesting idea, but I don't know how to
> > > > > > implement it :-(. Digester uses a SAX parser via the JAXP/1.1 APIs
> > > > > > underneath the covers. How do you tell the parser to use an arbitrary DTD
> > > > > > instead of whatever is specified in the document being parsed?
> > > > > >
> > > > > > Craig

--
To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>