Re: [digester2] performance of ns-aware parsing

Simon Kitching Sat, 05 Feb 2005 22:02:48 -0800

On Sat, 2005-02-05 at 21:02 -0800, Reid Pinchback wrote:
> --- Simon Kitching <[EMAIL PROTECTED]> wrote:
> > > Mucking with (d) is supposed to result in significant wins when you
> > > tune the grammar handling to your app, but I haven't tried it myself
> > > and I've never seen timing differences quoted.
> > > 
> > 
> > I don't quite understand what (d) means, but is it actually relevant?
> > Again, we are talking about *namespaces* not validation.
> 
> Yes... and every entity (Element and Attribute) is jammed through a
> resolution process first.  Remember XML attributes with default values?
> Guess where those values are identified and handed to the parser - during
> the resolution process.  Namespaces just add more data to shuffle
> around during the resolution process.


Well, in a document that doesn't use namespaces, the penalty is zero.

In a document that uses namespaces, there are a few xmlns:... attributes
floating around. But these have to be handled by the DTD processor
regardless of whether namespace processing is enabled or not, yes?

I don't see where namespaces add any extra data for a DTD processor to
deal with during the "infoset augmentation" stage.
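To make the cost argument concrete, here is a minimal sketch of the extra
bookkeeping a namespace-aware parser needs, assuming the "map of
prefix -> (stack of namespace)" model quoted further down. PrefixScopes
and its methods are hypothetical, not Xerces internals. In a document
with no xmlns attributes nothing is ever pushed onto a prefix stack; the
per-element overhead is an empty-list push/pop, a scan for ':' and one
map lookup.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of namespace scoping: prefix -> stack of namespace URIs. */
public class PrefixScopes {
    private final Map<String, Deque<String>> bindings = new HashMap<>();
    // Prefixes declared by each open element, so end tags undo only their own.
    private final Deque<List<String>> declaredPerElement = new ArrayDeque<>();

    /** Start tag: push the prefix -> URI pairs taken from its xmlns attributes. */
    public void startElement(Map<String, String> xmlnsDecls) {
        List<String> declared = new ArrayList<>();
        for (Map.Entry<String, String> e : xmlnsDecls.entrySet()) {
            bindings.computeIfAbsent(e.getKey(), k -> new ArrayDeque<>()).push(e.getValue());
            declared.add(e.getKey());
        }
        declaredPerElement.push(declared);
    }

    /** Matching end tag: pop the bindings this element declared. */
    public void endElement() {
        for (String prefix : declaredPerElement.pop()) {
            Deque<String> stack = bindings.get(prefix);
            stack.pop();
            if (stack.isEmpty()) {
                bindings.remove(prefix);
            }
        }
    }

    /**
     * Turns "foo:item" into {namespaceUri, localName}; "" is the default-namespace prefix.
     * Simplification: a real parser also pre-binds "xml" and reports unbound prefixes
     * as errors, whereas this sketch just returns an empty URI.
     */
    public String[] resolve(String qName) {
        int colon = qName.indexOf(':');
        String prefix = colon < 0 ? "" : qName.substring(0, colon);
        String local = colon < 0 ? qName : qName.substring(colon + 1);
        Deque<String> stack = bindings.get(prefix);
        String uri = (stack == null) ? "" : stack.peek();
        return new String[] {uri, local};
    }

    public static void main(String[] args) {
        PrefixScopes scopes = new PrefixScopes();
        scopes.startElement(Map.of("foo", "urn:example:foo")); // <foo:list xmlns:foo="urn:example:foo">
        scopes.startElement(Map.of());                         // <foo:item>
        String[] name = scopes.resolve("foo:item");
        System.out.println(name[0] + " " + name[1]);           // urn:example:foo item
        scopes.endElement();                                   // </foo:item>
        scopes.endElement();                                   // </foo:list>
    }
}
```

Inner declarations shadow outer ones simply by being pushed on top, which
is why a stack per prefix is enough.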


> 
> > What I'm trying to achieve is to avoid having actions or patterns deal
> > with element-names containing prefixes, eg stating that an element's
> > name is "foo:item". This is just broken; the item's name is really the
> > tuple (some-namespace, item).
> > 
> > Grammars/schemas can optionally be bound to namespaces, but namespaces
> > themselves are a lower layer that can be used without any of these
> > things. I'm talking here about requiring the parser to convert
> > <foo:item> into (namespace, item) but do not intend to imply that any
> > kind of schema should be loaded for the specified namespace. 
> 
> That sounds sensible.
> 
> > The SAXParserFactory.setNamespaceAware(true) method does exactly this:
> > it enables mapping of prefixes -> namespaces, but does not enable
> > processing of either DTDs or schemas.
> 
> I don't think it actually has any impact at all on DTD processing.
> DTDs, if declared, are always processed unless you install an entity 
> resolver that excises that activity out.

You are right; DTDs get processed in the same manner regardless of
whether the parser is namespace-aware or not. What I meant was that
namespace awareness does not change the parser's handling of DTDs or
schemas (though it is a prerequisite for schema validation).
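A quick way to see what the setting changes is to print what a SAX
handler receives for <foo:item> with and without it. This sketch uses
the JAXP SAXParserFactory; on a raw XMLReader the equivalent switch is
the http://xml.org/sax/features/namespaces feature. The class name and
the sample namespace URI are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class NsAwareNames {
    /** Returns "{uri}localName" per element when namespace-aware, else the raw qName. */
    public static List<String> elementNames(String xml, boolean namespaceAware) {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(namespaceAware); // prefix -> URI mapping only
        factory.setValidating(false);              // no DTD or schema validation
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // Namespace-aware: the element's name is the (uri, localName) tuple.
                names.add(namespaceAware ? "{" + uri + "}" + localName : qName);
            }
        };
        try {
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<foo:list xmlns:foo='urn:example:foo'><foo:item/></foo:list>";
        System.out.println(elementNames(xml, true));  // [{urn:example:foo}list, {urn:example:foo}item]
        System.out.println(elementNames(xml, false)); // [foo:list, foo:item]
    }
}
```

Without namespace awareness, rules would have to match on the literal
"foo:item", which breaks as soon as a document picks a different prefix
for the same namespace.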

> 
> > > I agree that old parsers providing (c) aren't particularly
> > > interesting, but if you spend any time tracing through the guts of
> > > the parsing, particularly when you see how DTDs are loaded for entity
> > > resolution, you begin to see (d) as having potential. Throwing (b)
> > > away may result in less code in Digester2, but it may be worth doing
> > > some timing tests to see if that code reduction is consequence-free.
> > 
> > What does loading DTDs have to do with namespaces?
> 
> As you said, the XML spec doesn't require that the namespaces mean
> anything, and hence it is possible that a parser won't try to resolve
> and validate against multiple DTDs, but I haven't ever traced through
> the code in a situation where there were multiple namespaces to
> resolve against, so I don't know if there is a relationship there or not.
> In general, if a parser thinks it needs a DTD in order to understand
> a document, it tends to grab it.  

I presume you're using "DTD" as a general term covering both traditional
DTDs (which are not namespace-aware) and w3c schemas?

An xml parser does need to read a DTD regardless of whether validation
is enabled or not, for the reasons you pointed out: default attributes,
entity definitions etc.
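The default-attribute case is easy to demonstrate. In this sketch (class
and element names are hypothetical) a parser that is neither validating
nor namespace-aware still applies an ATTLIST default from the internal
DTD subset, as the XML spec requires of non-validating processors:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class DtdDefaults {
    /** Returns the value the parser reports for attrName on the root element, or null. */
    public static String rootAttribute(String xml, String attrName) {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false);     // no validation...
        factory.setNamespaceAware(false); // ...and no namespace processing
        String[] value = new String[1];
        DefaultHandler handler = new DefaultHandler() {
            private boolean seenRoot = false;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (!seenRoot) {
                    value[0] = atts.getValue(attrName); // includes DTD-supplied defaults
                    seenRoot = true;
                }
            }
        };
        try {
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return value[0];
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE config [<!ATTLIST config version CDATA '1.0'>]><config/>";
        System.out.println(rootAttribute(xml, "version")); // 1.0 (supplied by the DTD)
    }
}
```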

But w3c xml schemas deliberately don't have any functionality that
affects the infoset of the document. So if you're not validating you can
completely ignore any xml schema - and parsers do. To double-check, I
tested this today, and verified the entity resolver isn't called to
resolve xsi:schemaLocation references unless validation is enabled.
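A sketch of that check, assuming the JDK's built-in JAXP parser: the
EntityResolver just records each systemId and hands back an empty
InputSource, so nothing is ever fetched. Returning an empty source is
also the usual way to excise external DTD loading, as Reid describes
above. The class name and URLs are hypothetical. The systemId the
resolver sees for a relative DTD reference may be expanded against the
current directory, so only its tail is worth checking.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class SchemaLocationProbe {
    /** Parses xml without validation; returns every systemId the EntityResolver was asked for. */
    public static List<String> resolverCalls(String xml) {
        List<String> calls = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setValidating(false); // neither DTD nor schema validation
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setEntityResolver((publicId, systemId) -> {
                calls.add(systemId);                          // record the request
                return new InputSource(new StringReader("")); // and hand back nothing
            });
            reader.setContentHandler(new DefaultHandler());
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return calls;
    }

    public static void main(String[] args) {
        String withSchema = "<doc xmlns='urn:example:doc'"
                + " xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'"
                + " xsi:schemaLocation='urn:example:doc http://example.invalid/doc.xsd'/>";
        System.out.println(resolverCalls(withSchema)); // [] : schema hint ignored when not validating
        String withDtd = "<!DOCTYPE doc SYSTEM 'doc.dtd'><doc/>";
        System.out.println(resolverCalls(withDtd));    // one entry ending in doc.dtd
    }
}
```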

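> I don't know if there are situations
> where it tries to interpret namespace declarations as public ids for DTDs.

No, xml parsers never dereference namespace URIs to load either DTDs or
schemas. The only way to reference a schema from an xml document is via
the xsi:schemaLocation attribute, whose value is a whitespace-separated
list of namespace/location pairs:
  xsi:schemaLocation="namespace url"
(or xsi:noNamespaceSchemaLocation="url" for a schema with no target
namespace).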
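Since the attribute value is just whitespace-separated pairs, anything
that wants the namespace -> location mapping has to split it itself. A
small hypothetical helper showing the format:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SchemaLocationPairs {
    /** Splits an xsi:schemaLocation value into namespace URI -> schema location, in document order. */
    public static Map<String, String> parse(String value) {
        String trimmed = value.trim();
        if (trimmed.isEmpty()) {
            return new LinkedHashMap<>();
        }
        String[] tokens = trimmed.split("\\s+");
        if (tokens.length % 2 != 0) {
            throw new IllegalArgumentException("expected namespace/location pairs: " + value);
        }
        Map<String, String> pairs = new LinkedHashMap<>();
        for (int i = 0; i < tokens.length; i += 2) {
            pairs.put(tokens[i], tokens[i + 1]); // token i = namespace, token i+1 = location
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(parse("urn:example:a a.xsd\n    urn:example:b http://example.invalid/b.xsd"));
        // {urn:example:a=a.xsd, urn:example:b=http://example.invalid/b.xsd}
    }
}
```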

I think some XML editing programs do try to load schemas based upon the
namespace URI (eg jEdit, XMLSpy) but this is quite different (and
probably against the xml standard).


> > > > I still find it hard to believe that leaving out namespace support makes
> > > > a performance difference. The parser needs to keep a map of
> > > >    prefix->(stack of namespace)
> > > > and that's about it. 
> 
> I stopped using belief as a measurement of code a long time
> ago.  Usually only works when I wrote all the code.  :-)
> I'll cook up an experiment and see what I can come up with
> in the way of timing information.

That would be excellent. I look forward to seeing the results.
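For anyone who wants to try this locally, here is a crude sketch of such
an experiment: parse the same synthetic document repeatedly with
namespace awareness on and off, after a warm-up pass. The document
shape, run counts and class name are arbitrary assumptions. Numbers from
a single System.nanoTime() loop are noisy, so a harness such as JMH is
the better tool for figures worth quoting.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class NsParseTiming {
    /** Builds a synthetic document with `items` prefixed child elements. */
    public static String syntheticDoc(int items) {
        StringBuilder sb = new StringBuilder("<d:root xmlns:d='urn:example:digester'>");
        for (int i = 0; i < items; i++) {
            sb.append("<d:item id='").append(i).append("'>text</d:item>");
        }
        return sb.append("</d:root>").toString();
    }

    /** Parses doc `runs` times with one reused parser; returns elapsed nanoseconds. */
    public static long timeParses(String doc, boolean namespaceAware, int runs) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            SAXParser parser = factory.newSAXParser();
            DefaultHandler handler = new DefaultHandler(); // ignores all events
            long start = System.nanoTime();
            for (int i = 0; i < runs; i++) {
                parser.parse(new InputSource(new StringReader(doc)), handler);
            }
            return System.nanoTime() - start;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String doc = syntheticDoc(10_000);
        for (boolean ns : new boolean[] {true, false}) {
            timeParses(doc, ns, 50); // warm-up so the JIT has compiled the hot paths
        }
        for (boolean ns : new boolean[] {true, false}) {
            long nanos = timeParses(doc, ns, 200);
            System.out.printf("namespaceAware=%b: %.1f ms for 200 parses%n", ns, nanos / 1e6);
        }
    }
}
```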


Regards,

Simon

