On Thu, 2005-02-03 at 07:52 -0800, Reid Pinchback wrote:
> --- Simon Kitching <[EMAIL PROTECTED]> wrote:
>
> > On Wed, 2005-02-02 at 20:45 -0800, Reid Pinchback wrote:
> > Of course if someone can demonstrate that non-namespace-aware parsers
> > *are* still useful then I'll change my mind.
>
> Just to clarify, since I was being sloppy before (I gotta
> stop typing in shorthand) there is an important distinction:
>
> a) having NS-aware parser, always using NS-aware API methods
> b) having NS-aware parser, selectively using NS-aware API methods
> c) having non-NS-aware parser (and obviously never using NS-aware API methods)
> d) having NS-aware parser where the developer fixes a grammar that
>    ignores any NS distinctions
>
> Even for SAX the performance difference between (a) and (b) is roughly
> a factor of 2 across all parsers when processing small (typical
> message-sized) docs that don't use NS.

I would *really* love to see some actual measurements on this if you can
find some. You seem to be quoting from some study you have done or read;
it would be great to have this. [See comments on Piccolo below]

> Mucking with (d) is supposed to result in significant
> wins when you tune the grammar handling to your app, but I haven't tried it
> myself and I've never seen timing differences quoted.

I don't quite understand what (d) means, but is it actually relevant?
Again, we are talking about *namespaces*, not validation. The W3C
namespaces spec clearly makes a distinction between namespaces and whether
or not the namespace URI "means" anything:

<quote source="http://www.w3c.org/TR/xml-names11/">
Note also that the Namespaces specification says nothing about what might
(or might not) happen if one were to attempt to dereference a URI/IRI used
to identify a namespace.
</quote>

What I'm trying to achieve is to avoid having actions or patterns deal with
element names containing prefixes, eg stating that an element's name is
"foo:item". This is just broken; the item's name is really the tuple
(some-namespace, item).

Grammars/schemas can optionally be bound to namespaces, but namespaces
themselves are a lower layer that can be used without any of these things.
I'm talking here about requiring the parser to convert <foo:item> into
(namespace, item); I do not intend to imply that any kind of schema should
be loaded for the specified namespace.

The SAXParserFactory.setNamespaceAware(true) method does exactly this: it
enables mapping of prefixes -> namespaces, but does not enable processing
of either DTDs or schemas.
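To make the (namespace, item) point concrete, here is a minimal sketch. It
is my own illustration, not Digester code: the class name
NamespaceTupleSketch and the namespace URI urn:example:shop are made up,
and it simply uses whatever parser JAXP returns by default. It turns on
namespace awareness, leaves validation off, and records each element as
"{namespace-uri}local-name".

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of Digester.
final class NamespaceTupleSketch {

    /**
     * Parses xml with a namespace-aware, non-validating SAX parser and
     * returns every element as "{namespace-uri}local-name".
     */
    static List<String> elementNames(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // map prefixes to namespace URIs
        factory.setValidating(false);    // no validation (also the default)
        SAXParser parser = factory.newSAXParser();

        List<String> names = new ArrayList<>();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // The prefix in qName is ignored; only the tuple is kept.
                names.add("{" + uri + "}" + localName);
            }
        });
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<foo:order xmlns:foo='urn:example:shop'>"
                   + "<foo:item/><note/></foo:order>";
        System.out.println(elementNames(xml));
    }
}
```

Feeding it the same document with the prefix "bar" instead of "foo" gives
an identical list, which is the behaviour I want actions and patterns to
rely on; the element with no namespace comes back with an empty URI.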
> I'm not trying to advocate any approach except to notice that, since your
> README mentioned requiring a namespace-aware parser, it sounded like
> there was a potential for options (b), (c), and (d) to become unintentionally
> closed to developers in Digester2 when they weren't in Digester1.

Well, I did intend to close options (b) and (c) as I didn't believe there
was any reason at all to support them. Some real measurements showing the
kind of performance you quote would definitely change my mind.

> I agree
> that old parsers providing (c) aren't particularly interesting, but
> if you spend any time tracing through the guts of the parsing, particularly
> when you see how DTDs are loaded for entity resolution, you begin to see
> (d) as having potential. Throwing (b) away may result in less code in
> Digester2, but it may be worth doing some timing tests to see if that
> code reduction is consequence-free.

What does loading DTDs have to do with namespaces?

> > I still find it hard to believe that leaving out namespace support makes
> > a performance difference. The parser needs to keep a map of
> > prefix->(stack of namespace)
> > and that's about it.
>
> Actually the XML spec distinguishes between the default namespace
> and all other namespaces, so parsers can reasonably make the same
> distinction and try to avoid a bunch of per-entity operations and
> temporary object creations in the case where there is no namespace.

Sorry, what per-entity operations, and what temporary object creations?

> Look at the piccolo stats published on Sourceforge. Compare Soap,
> Soap+NS, and random XML-no NS timings and it suggests that NS
> ain't free.
>
> Useful links:
>
> Jade (now part of Javolution) http://javolution.org/api/index.html,
> look at the javolution.xml package (trades String for CharSequence
> to increase performance, but keeps NS)

Hmm.. I've added a reference to javolution to the wiki.
However I couldn't find any info on the performance of namespaceAware vs
nonNamespaceAware...

> Piccolo you probably already have the link for, but for anybody
> else interested: http://piccolo.sourceforge.net

Piccolo does have a page where they state that their performance test for
"SOAP - namespaces off" is about 12% faster than "SOAP - namespaces on".
But there is no further info on what these phrases mean. The piccolo site
provides a download for the "SAXBench" benchmarking tool, but (a) I never
managed to get this working, and (b) it doesn't seem to include the SOAP
tests referenced anyway.
  http://piccolo.sourceforge.net/bench.html

> Zapthink comments on XML parsing challenges,
> http://searchwebservices.techtarget.com/originalContent/0,289142,sid26_gci858888,00.html

No occurrence of the word "namespace" anywhere in the article.

> Developerworks articles on XML performance,
> http://www-106.ibm.com/developerworks/xml/library/x-perfap1.html

This article had this paragraph:

<quote>
You should also avoid using namespaces in your applications unless they're
absolutely necessary. Processing a document with the namespace feature
enabled can slow the processing of the whole document. A parser not only
processes namespace declarations, verifying their correctness, but it also
ensures that an XML document is namespace well-formed.
</quote>

but I believe this refers only to code that builds DOMs then serializes
them; during serialization the DOM tree is checked to make sure all
elements have valid namespace declarations. This is not relevant to
digester.

> Sun articles on XML performance,
> http://java.sun.com/developer/technicalArticles/xml/JavaTechandXML_part3/

This article didn't seem to have any performance info about namespaces.

So in summary: My instincts still tell me that:
* for documents that don't use namespaces, enabling namespace-aware
  parsing will have no impact at all.
* for documents that do use namespaces, sane coders will want proper
  namespace-aware support anyway.
* performance-maniacs of the sort who would deliberately process documents
  with namespaces using a non-namespace-aware parser in order to get
  faster performance are out of luck and will have to wear a performance
  hit of about 1%. Or they can patch digester themselves.

The piccolo stats suggest they tested *something* to do with namespaces
and got a 12% hit, but as no further details are provided it's hard to
tell whether this is relevant or not.

For the moment, therefore, I don't intend to add non-ns-aware-parser
support for digester2. Anyone else is very welcome to provide a proper
performance test that proves me wrong, at which time I will offer my
congratulations and personally commit their patch to add this feature.

Regards,

Simon
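
P.S. In case it helps anyone take up that offer, here is a rough starting
point for such a timing test. It is only a sketch: the class name
NamespaceTimingSketch, the made-up <order>/<item> document and the
iteration count are all my own assumptions, it uses whatever parser JAXP
hands back, and a naive System.nanoTime() loop like this is easily skewed
by JIT warm-up and GC, so treat any numbers as indicative at best.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical timing harness; not part of Digester.
final class NamespaceTimingSketch {

    /** Only counts elements, so handler cost stays negligible. */
    static final class CountingHandler extends DefaultHandler {
        long elements;

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            elements++;
        }
    }

    /** Parses xml repeatedly; returns {elapsedNanos, elementsSeen}. */
    static long[] run(String xml, boolean namespaceAware, int iterations)
            throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        SAXParser parser = factory.newSAXParser();
        CountingHandler handler = new CountingHandler();

        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            parser.parse(new InputSource(new StringReader(xml)), handler);
        }
        return new long[] { System.nanoTime() - start, handler.elements };
    }

    /** Builds a small message-sized document that uses no namespaces. */
    static String sampleDocument(int items) {
        StringBuilder sb = new StringBuilder("<order>");
        for (int i = 0; i < items; i++) {
            sb.append("<item id='").append(i)
              .append("'><name>widget</name></item>");
        }
        return sb.append("</order>").toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = sampleDocument(50);
        int iterations = 20000;

        // Warm up both configurations before measuring.
        run(xml, true, iterations);
        run(xml, false, iterations);

        long[] aware = run(xml, true, iterations);
        long[] unaware = run(xml, false, iterations);
        System.out.printf("namespace-aware:     %d ms%n", aware[0] / 1_000_000);
        System.out.printf("non-namespace-aware: %d ms%n", unaware[0] / 1_000_000);
    }
}
```

A convincing test would also want documents that do use namespaces,
several parser implementations, and repeated runs, but this should be
enough to show whether the difference is closer to 1%, 12% or a factor
of 2 on a given setup.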