Alberto, thanks for this.
What I notice is that if I have preloaded a schema into the grammarpool, then 
the namespace declaration is good enough as a hook to use that, and then the 
document is treated as a schema based document.
Is this a bug / side effect? 

Should I really expect xercesc to treat every document as based on an empty DTD 
unless a schema declaration / DOCTYPE is found?

If it is safe for me to assume that a namespace declaration is enough to 
identify the type of document as long as the namespace is already in the 
grammarpool, then I have a workaround: intercept the xmlns attribute on the 
document's root element, then look up and preload the matching grammar before 
the parse takes place. It is a shame, as I would have preferred the scanner to 
do this for me.
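
As a rough illustration of that interception step, here is a minimal sketch in 
standard C++ only. Both function names and the namespace-to-schema registry are 
hypothetical and not part of xercesc; the peek at the root element is 
deliberately simplified (no internal DTD subset, no '>' inside attribute 
values). The path it returns would then be handed to whatever preloading 
mechanism the parser offers before the real parse starts.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper (not part of xercesc): returns the default namespace
// declared with xmlns="..." on the first element start tag in `xml`, or an
// empty string if there is none. Simplified on purpose: it skips comments,
// processing instructions and a DOCTYPE without an internal subset, and it
// assumes no '>' appears inside attribute values of the root start tag.
std::string rootDefaultNamespace(const std::string& xml) {
    const std::size_t npos = std::string::npos;
    std::size_t pos = 0;
    for (;;) {
        pos = xml.find('<', pos);
        if (pos == npos) return "";
        if (xml.compare(pos, 4, "<!--") == 0) {            // comment
            const std::size_t end = xml.find("-->", pos + 4);
            if (end == npos) return "";
            pos = end + 3;
        } else if (pos + 1 < xml.size() &&
                   (xml[pos + 1] == '?' || xml[pos + 1] == '!')) {
            const std::size_t end = xml.find('>', pos);     // PI or DOCTYPE
            if (end == npos) return "";
            pos = end + 1;
        } else {
            break;                                          // root start tag
        }
    }
    const std::size_t tagEnd = xml.find('>', pos);
    if (tagEnd == npos) return "";
    const std::string tag = xml.substr(pos, tagEnd - pos);

    auto isSpace = [](char c) {
        return std::isspace(static_cast<unsigned char>(c)) != 0;
    };
    std::size_t at = 0;
    while ((at = tag.find("xmlns", at)) != npos) {
        const bool startsAttr = at > 0 && isSpace(tag[at - 1]);
        std::size_t i = at + 5;
        at = i;                                             // resume point
        if (!startsAttr) continue;
        while (i < tag.size() && isSpace(tag[i])) ++i;
        if (i >= tag.size() || tag[i] != '=') continue;     // e.g. xmlns:prefix
        ++i;
        while (i < tag.size() && isSpace(tag[i])) ++i;
        if (i >= tag.size() || (tag[i] != '"' && tag[i] != '\'')) continue;
        const char quote = tag[i];
        const std::size_t close = tag.find(quote, i + 1);
        if (close == npos) return "";
        return tag.substr(i + 1, close - i - 1);
    }
    return "";
}

// Hypothetical registry lookup: maps a namespace URI to a local schema path,
// returning an empty string when the namespace is unknown.
std::string schemaPathFor(const std::map<std::string, std::string>& registry,
                          const std::string& ns) {
    const auto it = registry.find(ns);
    return it == registry.end() ? std::string() : it->second;
}
```

In use, the application would read the start of the document, call 
rootDefaultNamespace() on it, and preload the grammar only if schemaPathFor() 
returns a non-empty path; otherwise it would parse as before.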


When scanning the document, I notice that the method 
IGXMLScanner::scanStartTagNS(bool& gotData) is being called and the namespace 
is being identified as a part of the document scan.  It seems to be eminently 
sensible for xercesc to have some form of callback method for allowing the 
calling code to respond to the namespace, and this is what I understood Michael 
to mean in his original post.

Best regards, and thanks for your advice on this.

Ben.


On 13 May 2010, at 15:09, Alberto Massari wrote:

> When Xerces parses an XML file, it assumes it is based on an empty DTD; only 
> when a schema declaration is found, the schema validator becomes the active 
> one.
> Having an element declared in a non-empty namespace doesn't make it use an 
> XMLSchema, it's only a namespace declaration.
> And any resource resolver is used only when trying to actually load an 
> external resource, e.g. when an xsi:schemaLocation or a DOCTYPE instruction is 
> found in the XML document.
> 
> Alberto
> 
> On 5/13/2010 3:39 PM, Ben Griffin wrote:
>> On the xerces-j list about 3 years ago, Michael said:
>> 
>> On 1/29/07, Michael Glavassevich<[email protected]>  wrote:
>>   
>>> If you were expecting to resolve the schema documents based on their target 
>>> namespace
>>> you should use an API which has a resolver that will pass that
>>> information (see the JAXP 1.3 Validation API [2] and LSResourceResolver 
>>> [3]) to you.
>>>     
>> I really want to be able to do this using xercesc, but I keep hitting walls.
>> 
>> I am not sure if it is because of the API statement
>> The LSParser will then allow the application to intercept any external 
>> entities, including the
>> external DTD subset and external parameter entities, before including them.
>> The top-level document entity is never passed to the resolveResource method.
>> 
>> but however I try, when parsing ( with DOMLSParser ) an xml document 
>> such as
>> 
>> <foo xmlns="http://www.foo.org">
>> ...
>> </foo>
>> 
>> there is no callback to my resolveResource() method.
>> 
>> I am setting up my DOMLSResourceResolver with
>> 
>> conf->setParameter(XMLUni::fgDOMResourceResolver,myResourceHandler);
>> 
>> (I have tried using XMLEntityResolver classes also, to no avail).
>> 
>> Also (and this may be related) - it appears to me that unless the grammar is 
>> already loaded against the root element's namespace,
>> the xml document is treated as if it were a DTD instance, rather than a 
>> Schema instance,
>> even though there is no DOCTYPE declaration, and there is clearly marked an 
>> xmlns 'attribute' on the root element.
>> 
>> Any answers?
