> -----Original Message-----
> From: Joseph Kesselman [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, May 17, 2005 1:17 PM
> To: [email protected]
> Subject: Re: How to use entities with XML Schema?
>
> Officially (officiously?), the folks who designed schemas
> expect us to stop using entities and migrate to some other
> variety of macro/external-data reference.
As a member of the W3C XML Schema WG, let me speak to this (unofficially, of course...and hopefully folks won't think I'm being officious :-).
The schema spec doesn't include a mechanism for declaring entities because the XML spec wouldn't allow us to specify one. The reason is simple: the instance documents that schema processors operate on are not text streams (i.e., pointy brackets) but infosets. As such, the instance documents must be well-formed (or else there wouldn't be an infoset). According to the XML spec [1,2], an instance document that contains references to entities that are not declared in a DTD is not well-formed...therefore there is no infoset to hand to the schema processor as input.
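To make the well-formedness point concrete, here is a minimal sketch in Python. It assumes the standard library's expat-based xml.etree.ElementTree parser (a non-validating processor) as a stand-in for whatever parser sits in front of your schema processor; the helper name is_well_formed and the sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the document, False on a parse error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # For an undeclared entity reference, the parser reports a
        # well-formedness error rather than a validity error.
        return False


# No DTD at all, yet the content references an entity named "foo".
UNDECLARED = "<doc>&foo;</doc>"

# The five predefined entities (amp, lt, gt, apos, quot) need no declaration.
PREDEFINED = "<doc>&amp;</doc>"

print(is_well_formed(UNDECLARED))
print(is_well_formed(PREDEFINED))
```

The first document is rejected before any tree is built, so there is nothing a schema validator layered on top could be given to work with.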
We struggled with this for quite some time, and no one (including folks on the XML Core WG) could come up with a solution short of modifying the XML spec itself, which no one was willing to do.
Also note that entity decls can appear in the internal subset, and non-validating (non-DTD-validating) processors are required to read and process the internal subset, including entity decls. Thus, if you include your entity decls in the internal subset, you should be able to forgo DTD validation and still get schema validation, according to the specs [3]. Granted, this doesn't help if you don't control the instance documents you're trying to schema-validate :-(
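As a rough illustration of the internal-subset approach, the sketch below again assumes Python's expat-based xml.etree.ElementTree as the non-validating processor. The entity name foo, its replacement text, and the helper parse_text are made up for the example. No DTD validation takes place, but the entity decl in the internal subset is still read and the reference is expanded.

```python
import xml.etree.ElementTree as ET

# The internal subset declares the entity. There is no external DTD,
# and the parser performs no DTD validation.
DOC = """<!DOCTYPE doc [
  <!ENTITY foo "bar">
]>
<doc>&foo;</doc>"""


def parse_text(text):
    """Parse the document and return the root element's character data."""
    return ET.fromstring(text).text


# The reference has been replaced by the entity's replacement text.
print(parse_text(DOC))
```

Whatever consumes the parsed result sees only the expanded text, never the entity reference, which is why a schema validator downstream of such a parser has a well-formed document to work from.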
pvb
[1] http://www.w3.org/TR/REC-xml/#wf-entdeclared
[2] http://www.xml.com/axml/notes/EntDecExeg.html (a nice explanation of the rule that appears in Tim Bray's Annotated XML Spec, v1)
[3] In the xerces2-j FAQ on configuration for validation [4] there is a note that reads: "An application may choose to create a configuration that does not have a DTD validator but has an XML Schema validator. This will turn Xerces into a non-compliant processor according to XML 1.0 and XML Schema specifications, thus the validation/augmentation outcome is undefined." I completely disagree with this note and think Xerces is incorrect in this behavior. I suppose their reasoning is that the presence of a <!DOCTYPE> in the instance requires that DTD validation be performed, but this isn't sanctioned by the XML spec. The presence of <!DOCTYPE> is NOT the signal to perform validation [5], although many processors believe it is.
[4] http://xml.apache.org/xerces2-j/faq-pcfp.html
[5] http://www.xml.com/axml/notes/DoctypeMeans.html
