Remember that Digester is based on SAX, so any matching technology has to
be able to function when the appropriate SAX events are fired.  In the
current implementation, Digester calls getRules() to determine the rules
that match in two different places:

* startElement() so it can call the Rule.begin() method on all
  matching rules, and

* endElement() so it can call the Rule.body() method (if there was
  any body text) and the Rule.end() method of all matching rules.
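
To make that concrete, here is a rough sketch of what those two callbacks
do today (hedged: the names and signatures below are from memory, not
copied from the actual Digester source):

    // Rough sketch only -- approximates the current behavior, not real code.
    public void startElement(String uri, String local, String qName,
                             Attributes attributes) throws SAXException {
        // Append the new element to the current match path, e.g. "a/b/c"
        match = (match.length() == 0) ? qName : match + "/" + qName;
        // Fire begin() on every rule whose pattern matches the current path
        for (Iterator i = getRules(match).iterator(); i.hasNext(); ) {
            ((Rule) i.next()).begin(attributes);
        }
    }

    public void endElement(String uri, String local, String qName)
            throws SAXException {
        for (Iterator i = getRules(match).iterator(); i.hasNext(); ) {
            Rule rule = (Rule) i.next();
            if (bodyText.length() > 0) {
                rule.body(bodyText.toString());  // only if there was body text
            }
            rule.end();
        }
        // Pop the last element off the match path
        int slash = match.lastIndexOf('/');
        match = (slash >= 0) ? match.substring(0, slash) : "";
    }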

So, with no changes, you could only select matching rules based on XPath
expressions (or anything else that depends on attributes) in
startElement(), where the attributes are available.  Therefore, to
implement something like XPath matching, we'd have to keep a stack of the
actual elements (and their attributes) representing the current nesting
in the document.
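
Just to illustrate what that stack might look like (hypothetical field
names, not a proposal for any particular API), the SAX handler would have
to do roughly this:

    // Hypothetical sketch: track the full element/attribute nesting ourselves.
    // AttributesImpl (org.xml.sax.helpers) is used to copy the parser's
    // Attributes, since SAX parsers may reuse that object between events.
    private ArrayList elementStack = new ArrayList();    // element names
    private ArrayList attributeStack = new ArrayList();  // copied attributes

    public void startElement(String uri, String local, String qName,
                             Attributes attributes) throws SAXException {
        elementStack.add(qName);
        attributeStack.add(new AttributesImpl(attributes));
        // ... existing matching logic ...
    }

    public void endElement(String uri, String local, String qName)
            throws SAXException {
        // An attribute-aware matcher could now inspect the whole stack here
        // ... existing matching logic ...
        elementStack.remove(elementStack.size() - 1);
        attributeStack.remove(attributeStack.size() - 1);
    }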

While this probably isn't so complex, it does slow down performance pretty
significantly on the "simple cases" that Digester was originally designed
for.  Remember, we're doing *one pass* through the XML document, and the
whole idea of SAX is to avoid building the entire document in memory like
a DOM structure does.

It seems to me that changing getRules() to use regexp-style matching on
the element-name match expression would deal with most of the use cases
that have been presented.  That way, you can do things like the following
(a rough sketch of such matching appears after the list):

* Match if my element is *anywhere* in the match list, or at the front
  only, or at the back only ("*/a/*", "a/*", "*/a", and more complicated
  combinations)

* Match on an element that is nested inside another element, no matter
  how far (a/*/b)

* Match nested pairs of elements, no matter how deep they are
  (a/b/*, */a/b/*, */a/b)
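
For illustration only -- the exact wildcard semantics are obviously up for
discussion, and this sketch uses java.util.regex syntax rather than
whatever regexp package we would actually settle on -- such a pattern
could be translated into a regular expression and tested against the
current element path:

    // Illustration only: translate a wildcard pattern into a regex and
    // test it against the current element path (e.g. "a/x/y/b").
    public static boolean matches(String pattern, String path) {
        StringBuffer regex = new StringBuffer();
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '*') {
                regex.append("[^/]+(/[^/]+)*");   // one or more path segments
            } else if ("\\.[]{}()+-^$|?".indexOf(c) >= 0) {
                regex.append('\\').append(c);     // escape regex metacharacters
            } else {
                regex.append(c);
            }
        }
        return path.matches(regex.toString());
    }

    // matches("a/*/b", "a/x/y/b")  -> true
    // matches("*/a",   "root/a")   -> true
    // matches("a/b/*", "a/b/c/d")  -> true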

What do you think?

Craig

PS:  In the meantime, I'm going to go ahead and fix the problem with
matching when namespace awareness has been turned off - see my message of
a couple of weeks ago about this.



On Wed, 1 Aug 2001, James Strachan wrote:

> From: "Incze Lajos" <[EMAIL PROTECTED]>
> > On Sun, Jul 29, 2001 at 09:18:04PM +0100, robert burrell donkin wrote:
> > > On Sunday, July 29, 2001, at 07:02 PM, Scott Sanders wrote:
> > >
> > > > Bring it on!
> > >
> > > the way that matching rules work at the moment is a concern to me.
> > > (maybe i don't understand them well enough - or maybe they need
> > > enhancing.  i'm going to write as if i understand them but i'm sure
> > > you'll set me right where i don't)
> > >
> > > the current way that matching rules work means that the number of rules
> > > required rises almost exponentially for a complex schema.
> > >
> > > you can only wildcard prefixes (*/a but not a/*). this means that you
> > > end up having a rule for every child for a parent that adds child in a
> > > certain
> >
> > ... etc. I think that in digester it would be a good idea to change the
> > JSP-ish matching rules to XPATH expressions.
> 
> 
> I agree - though I also agree with Craig's concerns that digester is small
> and lightweight and built on top of SAX so dom4j/JDOM/XPath might be too
> much.
> 
> 
> Maybe we can introduce some simple XPath features to the matching of
> digester. e.g.
> 
> foo/@name
> 
> means match the name attribute of element foo, like this
> 
>     <foo name="James">
> 
> This would be useful when properties are in attributes rather than elements.
> 
> Similarly for recursive structures the path
> 
> //foo/bar
> 
> would match all bar elements that are children of foo. Is this equivalent to
> */foo/bar in digester right now?
> Though this could be used inside a path like this...
> 
> //foo//bar
> 
> which is similar to *foo*bar in current digester pattern matching.
> 
> 
> Another idea borrowed from XPath could be the absolute / relative paths.
> Anything starting with "/" denotes an absolute path (rather like in file
> systems) and so begins at the start of the document, whereas everything else
> starts at the current context.
> 
> If working on a 'big model' you could end up with lots of long paths...
> 
> /a/b/c/d/e/f...
> 
> if there were some way to define a 'context' then we could use relative
> paths.
> 
> e.g.
> 
> Context context = new Context( digester, "/a/b/c/d/e" );
> 
> context.addObjectCreate( "f", ... );
> context.addCallMethod( "f/bar", ... );
> 
> It's syntactic sugar, but it would allow the same set of rules to be used on
> different 'contexts' if ever the document structure changes, and it can make
> long paths easier to manage.
> 
> 
> 
> > At the same time digester
> > could use JDOM or DOM4J (they have essentially the same XPATH engine).
> 
> FWIW they will shortly share *exactly* the same XPath engine. The new Jaxen
> project (http://jaxen.org) is making excellent progress. It's an XPath engine
> which can be bound to any tree model, whether dom4j, JDOM, EXML or DOM.
> Hopefully one day soon it will support java beans as well. Currently Jaxen
> completely supports dom4j and DOM and has nearly complete support for JDOM
> and EXML.
> 
> 
> > XPATH was designed to walk through an XML graph, so you can express
> > as complex or as simple rules as you want. both JDOM and DOM4j give
> > you a pretty convenient (I mean collections) interface to the
> > document. Comments?
> 
> I also share your appreciation of XPath (particularly being founder of dom4j
> and cofounder of Jaxen ;-) though parsing a document via dom4j or JDOM then
> performing XPath expressions on it to figure out which Java Bean objects to
> construct may be a little too much for what digester is intended to be. I
> guess it all depends on the complexity of the mapping from XML to beans. If
> it's fairly simple, then digester rocks as is. If it's very complex then XPath
> comes into its own.
> 
> James
> 
> 
