On 5/13/2010 4:39 PM, Ben Griffin wrote:
Alberto, thanks for this.
What I notice is that if I have preloaded a schema into the grammarpool, then
the namespace declaration is good enough as a hook to use that, and then the
document is treated as a schema based document.
Is this a bug / side effect?
It is intentional; if you use loadGrammar with the toCache parameter set
to "true", you are also implicitly setting to "true" the property
useCachedGrammarInParse. This means that any parse done by the same
parser object will check whether the element being parsed belongs to a
namespace for which a grammar is already cached.
Should I really expect xercesc to treat every document as based on an empty DTD
unless a schema declaration / DOCTYPE is found?
Yes, but it shouldn't matter. It's just an implementation detail.
If it is safe for me to assume that a namespace declaration is enough to
identify the type of document as long as the namespace is already in the
grammarpool, then I have a workaround (which is a shame - as I would have
preferred the scanner to do this for me) which involves intercepting the
document's own xmlns attribute and then looking up and preloading the grammar
before the parse takes place.
Yes. As for preloading the grammar as soon as a namespace is found,
that's a feature that has been requested in the past
(https://issues.apache.org/jira/browse/XERCESC-1180).
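The first half of the workaround discussed above (reading the root element's xmlns before the real parse and mapping it to a schema) might look roughly like the sketch below. It is a naive text scan, not an XML parser, and it deliberately uses no Xerces-C calls. The helper names rootDefaultNamespace and schemaForDocument and the namespace-to-file catalog are hypothetical, invented for illustration. It ignores prefixed namespaces (xmlns:p="...") and would be confused by a DOCTYPE with an internal subset.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: returns the value of the default namespace declaration
// (xmlns="...") on the first start tag in `xml`, or an empty string if the
// root element has none. Naive scan, assumes reasonably well-formed input.
std::string rootDefaultNamespace(const std::string& xml) {
    std::size_t pos = 0;
    // Find the first start tag, skipping comments, PIs and <!...> declarations.
    while ((pos = xml.find('<', pos)) != std::string::npos) {
        if (xml.compare(pos, 4, "<!--") == 0) {
            pos = xml.find("-->", pos + 4);
        } else if (xml.compare(pos, 2, "<?") == 0) {
            pos = xml.find("?>", pos + 2);
        } else if (xml.compare(pos, 2, "<!") == 0) {
            pos = xml.find('>', pos + 2);
        } else {
            break;  // this '<' opens the root element
        }
        if (pos == std::string::npos) return std::string();
    }
    if (pos == std::string::npos) return std::string();

    auto isSpace = [](char c) {
        return std::isspace(static_cast<unsigned char>(c)) != 0;
    };

    ++pos;  // step over '<'
    // Skip the element name.
    while (pos < xml.size() && !isSpace(xml[pos]) &&
           xml[pos] != '>' && xml[pos] != '/') {
        ++pos;
    }
    // Walk the attributes of the start tag, looking for a plain "xmlns".
    while (pos < xml.size()) {
        while (pos < xml.size() && isSpace(xml[pos])) ++pos;
        if (pos >= xml.size() || xml[pos] == '>' || xml[pos] == '/') break;
        const std::size_t nameStart = pos;
        while (pos < xml.size() && xml[pos] != '=' &&
               !isSpace(xml[pos]) && xml[pos] != '>') {
            ++pos;
        }
        const std::string name = xml.substr(nameStart, pos - nameStart);
        while (pos < xml.size() && isSpace(xml[pos])) ++pos;
        if (pos >= xml.size() || xml[pos] != '=') break;  // malformed, give up
        ++pos;
        while (pos < xml.size() && isSpace(xml[pos])) ++pos;
        if (pos >= xml.size() || (xml[pos] != '"' && xml[pos] != '\'')) break;
        const char quote = xml[pos++];
        const std::size_t end = xml.find(quote, pos);
        if (end == std::string::npos) break;
        if (name == "xmlns") return xml.substr(pos, end - pos);
        pos = end + 1;
    }
    return std::string();
}

// Hypothetical lookup: maps the root namespace to a schema location using an
// application-maintained catalog. Returns an empty string when nothing matches.
std::string schemaForDocument(const std::map<std::string, std::string>& catalog,
                              const std::string& xml) {
    const std::string ns = rootDefaultNamespace(xml);
    if (ns.empty()) return std::string();
    const std::map<std::string, std::string>::const_iterator it = catalog.find(ns);
    return it == catalog.end() ? std::string() : it->second;
}
```

With a catalog entry mapping "http://www.foo.org" to, say, "foo.xsd" (a made-up file name), schemaForDocument would return "foo.xsd" for the foo document quoted later in this thread. The application could then hand that location to the parser's loadGrammar with toCache set to true before starting the actual parse.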
When scanning the document, I notice that the method
IGXMLScanner::scanStartTagNS(bool& gotData) is being called and the namespace
is being identified as a part of the document scan. It seems to be eminently
sensible for xercesc to have some form of callback method for allowing the calling
code to respond to the namespace, and this is what I understood Michael to mean in
his original post.
Keep in mind that Michael was talking about Xerces-J, not Xerces-C.
Alberto
Best regards, and thanks for your advice on this.
Ben.
On 13 May 2010, at 15:09, Alberto Massari wrote:
When Xerces parses an XML file, it assumes it is based on an empty DTD; the
schema validator becomes the active one only when a schema declaration is found.
Having an element declared in a non-empty namespace doesn't make it use an
XMLSchema, it's only a namespace declaration.
And any resource resolver is used only when trying to actually load an external
resource, e.g. when an xsi:schemaLocation or a DOCTYPE declaration is found in
the XML document.
Alberto
On 5/13/2010 3:39 PM, Ben Griffin wrote:
On the xerces-j list about 3 years ago, Michael said:
On 1/29/07, Michael Glavassevich wrote:
If you were expecting to resolve the schema documents based on their target
namespace you should use an API which has a resolver that will pass that
information (see the JAXP 1.3 Validation API [2] and LSResourceResolver [3]) to
you.
I really want to be able to do this using xercesc, but I keep hitting walls.
I am not sure if it is because of this API statement:
The LSParser will then allow the application to intercept any external
entities, including the external DTD subset and external parameter entities,
before including them. The top-level document entity is never passed to the
resolveResource method.
but however I try, when parsing (with DOMLSParser) an XML document such as
<foo xmlns="http://www.foo.org">
...
</foo>
there is no callback to my resolveResource() method.
I am setting up my DOMLSResourceResolver with
conf->setParameter(XMLUni::fgDOMResourceResolver,myResourceHandler);
(I have tried using XMLEntityResolver classes also, to no avail).
Also (and this may be related), it appears to me that unless the grammar is
already loaded against the root element's namespace, the XML document is
treated as if it were a DTD instance rather than a Schema instance, even though
there is no DOCTYPE declaration and there is a clearly marked xmlns
'attribute' on the root element.
Any answers?