On Mon, 2005-02-14 at 17:24 +0100, Oliver Zeigermann wrote: > On Tue, 15 Feb 2005 01:08:53 +1300, Simon Kitching <[EMAIL PROTECTED]> wrote: > > > - DefaultRuleManager now returns an unmodifiable list upon > > > getMatchingActions > > > > I'm not so sure about this one. > > > > This change means a new object is created each time getMatchingActions > > is called, and that is once for every xml element in the input document. > > And in addition, each method call is "relayed" by the UnmodifiableList > > wrapper to the underlying one, resulting in a performance hit too. Is > > this significant in the context of parsing an xml document? I don't > > know; maybe it's not measurable. > > > > However the benefits of this change are really limited only to > > preventing bad code in SAXHandler or RuleManager classes. Cases where > > users write their own versions of those will be extremely rare. And > > mistakes in the core digester classes are, I think, likely to be picked > > up by our unit tests. > > > > So my current feeling is that while the above change **is theoretically > > correct**, I would leave it out and return the actual list, just > > documenting that it should be treated as unmodifiable. > > > > Well, think of the situation where people inherit and do things like I > have done on my first try. So, I would be all for keeping it in, but > would not protest if you removed it.
I said above: "Cases where users write their own versions of those will be extremely rare". I really believe that the number of people writing their own custom RuleManager class will be about 3, worldwide, over the next decade. But the number of people who get impacted by the (admittedly small) performance hit for the proposed change is 100% of users. I guess we have a tie here: +1 from you, and -1 from me. The best thing would be if we got a third opinion from either Craig or Robert. Otherwise, we could just have a remote game of bzflag (http://bzflag.org/screenshots/bzfi0026.jpg) or something, with winner takes all :-) [snip] > > > > The ReadOnlyConcatList is cute. I don't think it would save any time or > > effort over > > List list = new ArrayList(l1.size() + l2.size()); > > list.appendAll(l1); > > list.appendAll(l2); > > but I guess the unmodifiable feature is thrown in for free. > > ReadOnlyConcatList certainly would be faster and have a smaller memory > footprint. Can you describe why ReadOnlyConcatList is faster and smaller? The array list is (in c terms) just an array of pointers to Actions. To create a new one, I am sure the ArrayList constructor does: * malloc a "manager" object with a few ints (size, capacity) and a reference to the data block. * malloc the data block, which is capacity*sizeof(reference) * for each entry in the source list, copy reference to new list [1] [2] Once the new ArrayList has been created, access is very fast. [1] The list is likely to be 1, 2, or 3 elements. Having more than 3 Actions match a particular input xml element would be very unusual.. [2] There is even the possibility that ArrayList has been optimised to do a "memcpy" if its source is an ArrayList. With the ReadOnlyConcatList, the work done is: * malloc a block of data for the ReadOnlyConcatList, which contains one int and the two references to ArrayList objects. * copy 2 references and call size once And accesses are fractionally slower, because get invokes the ReadOnlyConcatList, which performs an if-statement then invokes the get method on the appropriate underlying list. So on the positive side I think that ReadOnlyConcatList is marginally smaller (it doesn't need the data block, which is typically 4-12 bytes), and slightly more efficient to create (one less malloc, a few less copies of references) On the negative, the get method is slightly slower to use. It also adds a dozen lines to the source and an extra .class file to the jar. All in all, we are debating microseconds and handfuls of bytes here. Both seem *acceptable* solutions to me, though I do prefer simple in this case. And like the Unmodifiable discussion, the easiest resolution would be if some other Digester developer (eg Robert or Craig) would give their call, then we could get on with it. > > > I'm also thinking about the possibility that we may move to a > > RuleManager style that matches based on much richer state than just the > > current path. FOr example, if we want to support patterns like this: > > "/foo/[EMAIL PROTECTED]'me']/baz[pos()==last()]" [1] > > then getMatchingActions will definitely not be taking a String path any > > more, but instead an array of w3c.dom.Element objects or similar so it > > has sufficient info to decide whether the pattern matches or not. That > > won't affect many places in the code, but would make this caching > > impossible to implement so we'd have to take it out then anyway. > > We can worry about this as soon as getMatchingActions takes anything > else than a String. Right now the method signature that I implemented > says it is a String, so there is no problem. It just seems a shame for you to write code, javadoc, and unit tests for something that may have to be thrown away in order to support changes to the RuleManager class that are in the roadmap. [snip simon's counter-proposal] > OK, if you really want to do this, I won't stop you. I really do want you (and as many others as possible) working on digester2. More ideas is definitely better, and anyway it's boring working on a project alone. So I'm very reluctant indeed to commit my solution just because "no-one will stop me". I think a "new" project is always going to have more debate on these sorts of things than an established one. And having two contributors makes it trickier. We just need a little patience, and a willingness to do lots of back-and-forth discussion until conventions for the project are established... If neither Robert nor Craig offer their opinion, then I'm willing to settle for the following compromise. It won't make either of us 100% happy, but I don't think there's anything we can't live with, either, and we can move forward again: * merge "fallback" rules into RuleManager and DefaultRuleManager, and remove FallbackRuleManager * leave SupplementaryRuleManager exactly as you wrote it, with caching and ReadOnlyConcatList (except for the changes needed due to removal of FallbackRuleManager) * remove the UnmodifiableList stuff from DefaultRuleManager Note also that while I won't deliberately change the RuleManager interface to cause problems for SupplementaryRuleManager, I won't hesitate to change it if it is needed to add the more complex matching features I described, and fixing SupplementaryRuleManager will be up to you... Would that be ok by you? Cheers, Simon --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]