On 5/13/2010 4:39 PM, Ben Griffin wrote:
Alberto,  thanks for this.
What I notice is that if I have preloaded a schema into the grammarpool, then 
the namespace declaration is good enough as a hook to use that, and then the 
document is treated as a schema based document.
Is this a bug / side effect?

It is intentional: if you call loadGrammar with the toCache parameter set to "true", you are also implicitly setting the useCachedGrammarInParse property to "true". This means that any parse done by the same parser object will check whether the element being parsed belongs to a namespace for which a grammar is already cached.

Should I really expect xercesc to treat every document as based on an empty DTD 
unless a schema declaration / DOCTYPE is found?

Yes, but it shouldn't matter. It's just an implementation detail.

If it is safe for me to assume that a namespace declaration is enough to 
identify the type of document as long as the namespace is already in the 
grammarpool, then I have a workaround (which is a shame - as I would have 
preferred the scanner to do this for me) which involves intercepting the 
document's owning xmlns attribute and then looking up and preloading the grammar 
before the parse takes place.

Yes. As for preloading the grammar as soon as a namespace is found, that's a feature that has been requested in the past (https://issues.apache.org/jira/browse/XERCESC-1180)
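
For illustration, the "intercept the xmlns attribute, then look up the grammar" step of that workaround could be sketched roughly as below. This is only a minimal sketch in plain C++ with no Xerces dependency; the names rootDefaultNamespace and schemaForDocument and the catalog map are hypothetical, not part of any Xerces-C API, and the scan is deliberately naive (it assumes a default xmlns declaration on the root element and no internal DTD subset).

```cpp
#include <cstddef>
#include <map>
#include <string>

// Returns the value of the default xmlns attribute on the root start tag,
// or an empty string if there is none. Deliberately naive: it skips the XML
// declaration, processing instructions, comments and a DOCTYPE without an
// internal subset, and it ignores prefixed declarations such as xmlns:p.
std::string rootDefaultNamespace(const std::string& xml) {
    const std::string ws = " \t\r\n";
    std::size_t pos = 0;
    // Walk forward to the first '<' that opens an element.
    while ((pos = xml.find('<', pos)) != std::string::npos) {
        if (xml.compare(pos, 4, "<!--") == 0)    pos = xml.find("-->", pos);
        else if (xml.compare(pos, 2, "<?") == 0) pos = xml.find("?>", pos);
        else if (xml.compare(pos, 2, "<!") == 0) pos = xml.find('>', pos);
        else break;  // root start tag found
        if (pos == std::string::npos) return "";
    }
    if (pos == std::string::npos) return "";

    // Skip the element name, then read attributes one by one.
    pos = xml.find_first_of(ws + "/>", pos);
    while (pos != std::string::npos) {
        pos = xml.find_first_not_of(ws, pos);
        if (pos == std::string::npos || xml[pos] == '>' || xml[pos] == '/') break;
        const std::size_t nameEnd = xml.find_first_of(ws + "=", pos);
        if (nameEnd == std::string::npos) break;
        const std::string name = xml.substr(pos, nameEnd - pos);
        const std::size_t quote = xml.find_first_of("\"'", nameEnd);
        if (quote == std::string::npos) break;
        const std::size_t close = xml.find(xml[quote], quote + 1);
        if (close == std::string::npos) break;
        if (name == "xmlns") return xml.substr(quote + 1, close - quote - 1);
        pos = close + 1;  // continue with the next attribute
    }
    return "";
}

// Hypothetical catalog lookup: maps a namespace URI to a schema location.
// Returns an empty string when the document's namespace is not in the catalog.
std::string schemaForDocument(const std::map<std::string, std::string>& catalog,
                              const std::string& xml) {
    const std::map<std::string, std::string>::const_iterator it =
        catalog.find(rootDefaultNamespace(xml));
    return it == catalog.end() ? std::string() : it->second;
}
```

With Xerces-C, a non-empty result would then presumably be handed to loadGrammar on the DOMLSParser (with Grammar::SchemaGrammarType and toCache set to true) before calling parse on the same parser object, so that the cached grammar is picked up as described above.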


When scanning the document, I notice that the method 
IGXMLScanner::scanStartTagNS(bool&  gotData) is being called and the namespace 
is being identified as a part of the document scan.  It seems to be eminently 
sensible for xercesc to have some form of callback method for allowing the calling 
code to respond to the namespace, and this is what I understood Michael to mean in 
his original post.

Keep in mind that Michael was talking about Xerces-J, not Xerces-C.

Alberto

Best regards, and thanks for your advice on this.

Ben.


On 13 May 2010, at 15:09, Alberto Massari wrote:

When Xerces parses an XML file, it assumes it is based on an empty DTD; only 
when a schema declaration is found does the schema validator become the active one.
Having an element declared in a non-empty namespace doesn't make it use an 
XMLSchema, it's only a namespace declaration.
And any resource resolver is used only when trying to actually load an external 
resource, e.g. when an xsi:schemaLocation or a DOCTYPE declaration is found in 
the XML document.

Alberto

On 5/13/2010 3:39 PM, Ben Griffin wrote:
On the xerces-j list about 3 years ago, Michael said:

On 1/29/07, Michael Glavassevich<[email protected]>   wrote:

If you were expecting to resolve the schema documents based on their target 
namespace
you should use an API which has a resolver that will pass that
information (see the JAXP 1.3 Validation API [2] and LSResourceResolver [3]) to 
you.

I really want to be able to do this using xercesc, but I keep hitting walls.

I am not sure if it is because of the API statement
The LSParser will then allow the application to intercept any external 
entities, including the
external DTD subset and external parameter entities, before including them.
The top-level document entity is never passed to the resolveResource method.

but however I try, when parsing (with DOMLSParser) an xml document such as

<foo xmlns="http://www.foo.org">
...
</foo>

there is no callback to my resolveResource() method.

I am setting up my DOMLSResourceResolver with

conf->setParameter(XMLUni::fgDOMResourceResolver,myResourceHandler);

(I have tried using XMLEntityResolver classes also, to no avail).

Also (and this may be related) - it appears to me that unless the grammar is 
already loaded against the root element's namespace,
the xml document is treated as if it were a DTD instance, rather than a Schema 
instance,
even though there is no DOCTYPE declaration, and there is clearly marked an 
xmlns 'attribute' on the root element.

Any answers?