Re: [digester2] Additions

Simon Kitching Mon, 14 Feb 2005 14:17:46 -0800

On Mon, 2005-02-14 at 17:24 +0100, Oliver Zeigermann wrote:
> On Tue, 15 Feb 2005 01:08:53 +1300, Simon Kitching <[EMAIL PROTECTED]> wrote:
> > > - DefaultRuleManager now returns an unmodifiable list upon 
> > > getMatchingActions
> > 
> > I'm not so sure about this one.
> > 
> > This change means a new object is created each time getMatchingActions
> > is called, and that is once for every xml element in the input document.
> > And in addition, each method call is "relayed" by the UnmodifiableList
> > wrapper to the underlying one, resulting in a performance hit too. Is
> > this significant in the context of parsing an xml document? I don't
> > know; maybe it's not measurable.
> > 
> > However the benefits of this change are really limited only to
> > preventing bad code in SAXHandler or RuleManager classes. Cases where
> > users write their own versions of those will be extremely rare. And
> > mistakes in the core digester classes are, I think, likely to be picked
> > up by our unit tests.
> > 
> > So my current feeling is that while the above change **is theoretically
> > correct**, I would leave it out and return the actual list, just
> > documenting that it should be treated as unmodifiable.
> > 
> 
> Well, think of the situation where people inherit and do things like I
> have done on my first try. So, I would be all for keeping it in, but
> would not protest if you removed it.


I said above: "Cases where users write their own versions of those will
be extremely rare". I really believe that the number of people writing
their own custom RuleManager class will be about 3, worldwide, over the
next decade. But the number of people who get impacted by the
(admittedly small) performance hit for the proposed change is 100% of
users.

I guess we have a tie here: +1 from you, and -1 from me. The best thing
would be if we got a third opinion from either Craig or Robert.

Otherwise, we could just have a remote game of bzflag
(http://bzflag.org/screenshots/bzfi0026.jpg) or something, with winner
takes all :-)

[snip]

> > 
> > The ReadOnlyConcatList is cute. I don't think it would save any time or
> > effort over
> >   List list = new ArrayList(l1.size() + l2.size());
> >   list.appendAll(l1);
> >   list.appendAll(l2);
> > but I guess the unmodifiable feature is thrown in for free.
> 
> ReadOnlyConcatList certainly would be faster and have a smaller memory
> footprint.

Can you describe why ReadOnlyConcatList is faster and smaller?

The array list is (in c terms) just an array of pointers to Actions.

To create a new one, I am sure the ArrayList constructor does:
 * malloc a "manager" object with a few ints (size, capacity) 
   and a reference to the data block.
 * malloc the data block, which is capacity*sizeof(reference)
 * for each entry in the source list, copy reference to new list [1] [2]
Once the new ArrayList has been created, access is very fast.
[1] The list is likely to be 1, 2, or 3 elements. Having more than 3
Actions match a particular input xml element would be very unusual..
[2] There is even the possibility that ArrayList has been optimised to
do a "memcpy" if its source is an ArrayList.

With the ReadOnlyConcatList, the work done is:
 * malloc a block of data for the ReadOnlyConcatList, which contains one
int and the two references to ArrayList objects.
 * copy 2 references and call size once
And accesses are fractionally slower, because get invokes the
ReadOnlyConcatList, which performs an if-statement then invokes the get
method on the appropriate underlying list.


So on the positive side I think that ReadOnlyConcatList is marginally
smaller (it doesn't need the data block, which is typically 4-12 bytes),
and slightly more efficient to create (one less malloc, a few less
copies of references)

On the negative, the get method is slightly slower to use. It also adds
a dozen lines to the source and an extra .class file to the jar.

All in all, we are debating microseconds and handfuls of bytes here.
Both seem *acceptable* solutions to me, though I do prefer simple in
this case. And like the Unmodifiable discussion, the easiest resolution
would be if some other Digester developer (eg Robert or Craig) would
give their call, then we could get on with it.


>  
> > I'm also thinking about the possibility that we may move to a
> > RuleManager style that matches based on much richer state than just the
> > current path. FOr example, if we want to support patterns like this:
> >  "/foo/[EMAIL PROTECTED]'me']/baz[pos()==last()]"    [1]
> > then getMatchingActions will definitely not be taking a String path any
> > more, but instead an array of w3c.dom.Element objects or similar so it
> > has sufficient info to decide whether the pattern matches or not. That
> > won't affect many places in the code, but would make this caching
> > impossible to implement so we'd have to take it out then anyway.
> 
> We can worry about this as soon as getMatchingActions takes anything
> else than a String. Right now the method signature that I implemented
> says it is a String, so there is no problem.

It just seems a shame for you to write code, javadoc, and unit tests for
something that may have to be thrown away in order to support changes to
the RuleManager class that are in the roadmap.




[snip simon's counter-proposal]

> OK, if you really want to do this, I won't stop you.

I really do want you (and as many others as possible) working on
digester2. More ideas is definitely better, and anyway it's boring
working on a project alone. So I'm very reluctant indeed to commit my
solution just because "no-one will stop me".

I think a "new" project is always going to have more debate on these
sorts of things than an established one. And having two contributors
makes it trickier. We just need a little patience, and a willingness to
do lots of back-and-forth discussion until conventions for the project
are established...

If neither Robert nor Craig offer their opinion, then I'm willing to
settle for the following compromise. It won't make either of us 100%
happy, but I don't think there's anything we can't live with, either,
and we can move forward again:
 * merge "fallback" rules into RuleManager and DefaultRuleManager,
   and remove FallbackRuleManager
 * leave SupplementaryRuleManager exactly as you wrote it, with
   caching and ReadOnlyConcatList (except for the changes needed
   due to removal of FallbackRuleManager)
 * remove the UnmodifiableList stuff from DefaultRuleManager

Note also that while I won't deliberately change the RuleManager
interface to cause problems for SupplementaryRuleManager, I won't
hesitate to change it if it is needed to add the more complex matching
features I described, and fixing SupplementaryRuleManager will be up to
you...

Would that be ok by you?

Cheers,

Simon


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: [digester2] Additions

Reply via email to