Re: [digester2] Additions

2005-02-14 Thread Simon Kitching
Hi Oliver,

On Mon, 2005-02-14 at 08:16 +0100, Oliver Zeigermann wrote:
 [update]
 
 - now all member variables of DefaultRuleManager are private -
 introduced copy ctor for copying members from subclasses

thanks - that looks good to me.

 - DefaultRuleManager now returns an unmodifiable list upon getMatchingActions

I'm not so sure about this one. 

This change means a new object is created each time getMatchingActions
is called, and that is once for every xml element in the input document.
And in addition, each method call is relayed by the UnmodifiableList
wrapper to the underlying one, resulting in a performance hit too. Is
this significant in the context of parsing an xml document? I don't
know; maybe it's not measurable. 

However the benefits of this change are really limited only to
preventing bad code in SAXHandler or RuleManager classes. Cases where
users write their own versions of those will be extremely rare. And
mistakes in the core digester classes are, I think, likely to be picked
up by our unit tests.

So my current feeling is that while the above change **is theoretically
correct**, I would leave it out and return the actual list, just
documenting that it should be treated as unmodifiable.

Comments?

 - created a new FallbackRuleManager class that only takes care about
 fallback actions

[see my comments at end first!]

Have you considered writing this class as a *decorator* rather than a
subclass of DefaultRuleManager? If it was a decorator, it could then be
applied to any rulemanager class, not just the default. It would be used
as follows:
  RuleManager realrm = new DefaultRuleManager(); // or any other type
  RuleManager rm = new FallbackRuleManagerDecorator(realrm);
  Digester digester = new Digester(rm);

The disadvantage would be that it would be necessary to implement
each of the RuleManager methods in order to forward them to the
decorated instance, together with the performance hit that would entail.
[ah, if only this were Ruby]

 - SupplementaryRuleManager now takes account of the explicitely
 unmodifiable list returned by DefaultRuleManager and does a wild
 mixture of caching and concatenating the list returned from
 DefaultRuleManager and its own list of supplementary actions

[see my comments at end first!]

The ReadOnlyConcatList is cute. I don't think it would save any time or
effort over
  List list = new ArrayList(l1.size() + l2.size());
  list.appendAll(l1);
  list.appendAll(l2);
but I guess the unmodifiable feature is thrown in for free.

I've been thinking about the caching and see a mild problem. If wildcard
pattern a (equivalent to */a in digester1), and a elements are
scattered around in various places in the input document, then the cache
is going to grow fairly large.

So basically, whether caching improve things or not depends upon whether
the input xml is simple and repetitive (caching wins) or reasonably
unstructured (caching loses). And given we can't tell which is more
likely over the set of digester users (at least I don't have a clue) I
would probably prefer the simplest solution - not to cache.

I'm also thinking about the possibility that we may move to a
RuleManager style that matches based on much richer state than just the
current path. FOr example, if we want to support patterns like this:
 /foo/[EMAIL PROTECTED]'me']/baz[pos()==last()][1]
then getMatchingActions will definitely not be taking a String path any
more, but instead an array of w3c.dom.Element objects or similar so it
has sufficient info to decide whether the pattern matches or not. That
won't affect many places in the code, but would make this caching
impossible to implement so we'd have to take it out then anyway.

[1] sorry my xpath is a bit rusty - this is probably not valid

 - Both FallbackRuleManager and SupplementaryRuleManager now can have
 several callback actions
 - Still docs and tests need improvement - will do so once the code
 seems somewhat stable
 
 Comments?

You're probably going to hit me for this :-)

After some thought, I think my preferred option would be to include
fallback and supplementary features as standard, ie define them in
the RuleManager interface, provide an implementation in the
DefaultRuleManager class, and to remove the FallbackRuleManager and
SupplementaryRuleManager classes completely.

I've always said I'm happy for fallback features to be in
DefaultRuleManager, and after thought I believe it ought to be defined
as part of the main RuleManager interface. And as you're so keen for
supplementary actions, I'm ok with adding them too as long as there is
no performance hit for people who don't use them. Note that this would
definitely be the easiest solution from the user point of view, with no
special classes required in order to use supplementary actions.

And I would go for the simplest implementation of supplementary
actions, by simply creating a new list each time getMatchingActions is
called when one or more supplementary actions are 

Re: [digester2] Additions

2005-02-14 Thread Oliver Zeigermann
On Tue, 15 Feb 2005 01:08:53 +1300, Simon Kitching [EMAIL PROTECTED] wrote:
  - DefaultRuleManager now returns an unmodifiable list upon 
  getMatchingActions
 
 I'm not so sure about this one.
 
 This change means a new object is created each time getMatchingActions
 is called, and that is once for every xml element in the input document.
 And in addition, each method call is relayed by the UnmodifiableList
 wrapper to the underlying one, resulting in a performance hit too. Is
 this significant in the context of parsing an xml document? I don't
 know; maybe it's not measurable.
 
 However the benefits of this change are really limited only to
 preventing bad code in SAXHandler or RuleManager classes. Cases where
 users write their own versions of those will be extremely rare. And
 mistakes in the core digester classes are, I think, likely to be picked
 up by our unit tests.
 
 So my current feeling is that while the above change **is theoretically
 correct**, I would leave it out and return the actual list, just
 documenting that it should be treated as unmodifiable.
 

Well, think of the situation where people inherit and do things like I
have done on my first try. So, I would be all for keeping it in, but
would not protest if you removed it.

  - created a new FallbackRuleManager class that only takes care about
  fallback actions
 
 [see my comments at end first!]
 
 Have you considered writing this class as a *decorator* rather than a
 subclass of DefaultRuleManager? If it was a decorator, it could then be
 applied to any rulemanager class, not just the default. It would be used
 as follows:
   RuleManager realrm = new DefaultRuleManager(); // or any other type
   RuleManager rm = new FallbackRuleManagerDecorator(realrm);
   Digester digester = new Digester(rm);
 
 The disadvantage would be that it would be necessary to implement
 each of the RuleManager methods in order to forward them to the
 decorated instance, together with the performance hit that would entail.
 [ah, if only this were Ruby]
 
  - SupplementaryRuleManager now takes account of the explicitely
  unmodifiable list returned by DefaultRuleManager and does a wild
  mixture of caching and concatenating the list returned from
  DefaultRuleManager and its own list of supplementary actions
 
 [see my comments at end first!]
 
 The ReadOnlyConcatList is cute. I don't think it would save any time or
 effort over
   List list = new ArrayList(l1.size() + l2.size());
   list.appendAll(l1);
   list.appendAll(l2);
 but I guess the unmodifiable feature is thrown in for free.

ReadOnlyConcatList certainly would be faster and have a smaller memory
footprint.

 I've been thinking about the caching and see a mild problem. If wildcard
 pattern a (equivalent to */a in digester1), and a elements are
 scattered around in various places in the input document, then the cache
 is going to grow fairly large.
 
 So basically, whether caching improve things or not depends upon whether
 the input xml is simple and repetitive (caching wins) or reasonably
 unstructured (caching loses). And given we can't tell which is more
 likely over the set of digester users (at least I don't have a clue) I
 would probably prefer the simplest solution - not to cache.

If the cache gets too large we can easily limit its size using LRUMap
or something.
 
 I'm also thinking about the possibility that we may move to a
 RuleManager style that matches based on much richer state than just the
 current path. FOr example, if we want to support patterns like this:
  /foo/[EMAIL PROTECTED]'me']/baz[pos()==last()][1]
 then getMatchingActions will definitely not be taking a String path any
 more, but instead an array of w3c.dom.Element objects or similar so it
 has sufficient info to decide whether the pattern matches or not. That
 won't affect many places in the code, but would make this caching
 impossible to implement so we'd have to take it out then anyway.

We can worry about this as soon as getMatchingActions takes anything
else than a String. Right now the method signature that I implemented
says it is a String, so there is no problem.

 You're probably going to hit me for this :-)

I can't, you are too far away ;)
 
 After some thought, I think my preferred option would be to include
 fallback and supplementary features as standard, ie define them in
 the RuleManager interface, provide an implementation in the
 DefaultRuleManager class, and to remove the FallbackRuleManager and
 SupplementaryRuleManager classes completely.
 
 I've always said I'm happy for fallback features to be in
 DefaultRuleManager, and after thought I believe it ought to be defined
 as part of the main RuleManager interface. And as you're so keen for
 supplementary actions, I'm ok with adding them too as long as there is
 no performance hit for people who don't use them. Note that this would
 definitely be the easiest solution from the user point of view, with no
 special classes required in order to 

Re: [digester2] Additions

2005-02-14 Thread Simon Kitching
On Mon, 2005-02-14 at 17:24 +0100, Oliver Zeigermann wrote:
 On Tue, 15 Feb 2005 01:08:53 +1300, Simon Kitching [EMAIL PROTECTED] wrote:
   - DefaultRuleManager now returns an unmodifiable list upon 
   getMatchingActions
  
  I'm not so sure about this one.
  
  This change means a new object is created each time getMatchingActions
  is called, and that is once for every xml element in the input document.
  And in addition, each method call is relayed by the UnmodifiableList
  wrapper to the underlying one, resulting in a performance hit too. Is
  this significant in the context of parsing an xml document? I don't
  know; maybe it's not measurable.
  
  However the benefits of this change are really limited only to
  preventing bad code in SAXHandler or RuleManager classes. Cases where
  users write their own versions of those will be extremely rare. And
  mistakes in the core digester classes are, I think, likely to be picked
  up by our unit tests.
  
  So my current feeling is that while the above change **is theoretically
  correct**, I would leave it out and return the actual list, just
  documenting that it should be treated as unmodifiable.
  
 
 Well, think of the situation where people inherit and do things like I
 have done on my first try. So, I would be all for keeping it in, but
 would not protest if you removed it.

I said above: Cases where users write their own versions of those will
be extremely rare. I really believe that the number of people writing
their own custom RuleManager class will be about 3, worldwide, over the
next decade. But the number of people who get impacted by the
(admittedly small) performance hit for the proposed change is 100% of
users.

I guess we have a tie here: +1 from you, and -1 from me. The best thing
would be if we got a third opinion from either Craig or Robert.

Otherwise, we could just have a remote game of bzflag
(http://bzflag.org/screenshots/bzfi0026.jpg) or something, with winner
takes all :-)

[snip]

  
  The ReadOnlyConcatList is cute. I don't think it would save any time or
  effort over
List list = new ArrayList(l1.size() + l2.size());
list.appendAll(l1);
list.appendAll(l2);
  but I guess the unmodifiable feature is thrown in for free.
 
 ReadOnlyConcatList certainly would be faster and have a smaller memory
 footprint.

Can you describe why ReadOnlyConcatList is faster and smaller?

The array list is (in c terms) just an array of pointers to Actions.

To create a new one, I am sure the ArrayList constructor does:
 * malloc a manager object with a few ints (size, capacity) 
   and a reference to the data block.
 * malloc the data block, which is capacity*sizeof(reference)
 * for each entry in the source list, copy reference to new list [1] [2]
Once the new ArrayList has been created, access is very fast.
[1] The list is likely to be 1, 2, or 3 elements. Having more than 3
Actions match a particular input xml element would be very unusual..
[2] There is even the possibility that ArrayList has been optimised to
do a memcpy if its source is an ArrayList.

With the ReadOnlyConcatList, the work done is:
 * malloc a block of data for the ReadOnlyConcatList, which contains one
int and the two references to ArrayList objects.
 * copy 2 references and call size once
And accesses are fractionally slower, because get invokes the
ReadOnlyConcatList, which performs an if-statement then invokes the get
method on the appropriate underlying list.


So on the positive side I think that ReadOnlyConcatList is marginally
smaller (it doesn't need the data block, which is typically 4-12 bytes),
and slightly more efficient to create (one less malloc, a few less
copies of references)

On the negative, the get method is slightly slower to use. It also adds
a dozen lines to the source and an extra .class file to the jar.

All in all, we are debating microseconds and handfuls of bytes here.
Both seem *acceptable* solutions to me, though I do prefer simple in
this case. And like the Unmodifiable discussion, the easiest resolution
would be if some other Digester developer (eg Robert or Craig) would
give their call, then we could get on with it.


  
  I'm also thinking about the possibility that we may move to a
  RuleManager style that matches based on much richer state than just the
  current path. FOr example, if we want to support patterns like this:
   /foo/[EMAIL PROTECTED]'me']/baz[pos()==last()][1]
  then getMatchingActions will definitely not be taking a String path any
  more, but instead an array of w3c.dom.Element objects or similar so it
  has sufficient info to decide whether the pattern matches or not. That
  won't affect many places in the code, but would make this caching
  impossible to implement so we'd have to take it out then anyway.
 
 We can worry about this as soon as getMatchingActions takes anything
 else than a String. Right now the method signature that I implemented
 says it is a String, so there is no problem.

It just seems a 

Re: [digester2] Additions

2005-02-14 Thread Oliver Zeigermann
On Tue, 15 Feb 2005 11:27:29 +1300, Simon Kitching [EMAIL PROTECTED] wrote:
 On Mon, 2005-02-14 at 17:24 +0100, Oliver Zeigermann wrote:
  On Tue, 15 Feb 2005 01:08:53 +1300, Simon Kitching [EMAIL PROTECTED] 
  wrote:
- DefaultRuleManager now returns an unmodifiable list upon 
getMatchingActions
  
   I'm not so sure about this one.
  
   This change means a new object is created each time getMatchingActions
   is called, and that is once for every xml element in the input document.
   And in addition, each method call is relayed by the UnmodifiableList
   wrapper to the underlying one, resulting in a performance hit too. Is
   this significant in the context of parsing an xml document? I don't
   know; maybe it's not measurable.
  
   However the benefits of this change are really limited only to
   preventing bad code in SAXHandler or RuleManager classes. Cases where
   users write their own versions of those will be extremely rare. And
   mistakes in the core digester classes are, I think, likely to be picked
   up by our unit tests.
  
   So my current feeling is that while the above change **is theoretically
   correct**, I would leave it out and return the actual list, just
   documenting that it should be treated as unmodifiable.
  
 
  Well, think of the situation where people inherit and do things like I
  have done on my first try. So, I would be all for keeping it in, but
  would not protest if you removed it.
 
 I said above: Cases where users write their own versions of those will
 be extremely rare. I really believe that the number of people writing
 their own custom RuleManager class will be about 3, worldwide, over the
 next decade. But the number of people who get impacted by the
 (admittedly small) performance hit for the proposed change is 100% of
 users.
 
 I guess we have a tie here: +1 from you, and -1 from me. The best thing
 would be if we got a third opinion from either Craig or Robert.
 
 Otherwise, we could just have a remote game of bzflag
 (http://bzflag.org/screenshots/bzfi0026.jpg) or something, with winner
 takes all :-)

I am not +1. I am +0. I have reverted that change.
 
 [snip]
 
  
   The ReadOnlyConcatList is cute. I don't think it would save any time or
   effort over
 List list = new ArrayList(l1.size() + l2.size());
 list.appendAll(l1);
 list.appendAll(l2);
   but I guess the unmodifiable feature is thrown in for free.
 
  ReadOnlyConcatList certainly would be faster and have a smaller memory
  footprint.
 
 Can you describe why ReadOnlyConcatList is faster and smaller?
 
 The array list is (in c terms) just an array of pointers to Actions.
 
 To create a new one, I am sure the ArrayList constructor does:
  * malloc a manager object with a few ints (size, capacity)
and a reference to the data block.
  * malloc the data block, which is capacity*sizeof(reference)
  * for each entry in the source list, copy reference to new list [1] [2]
 Once the new ArrayList has been created, access is very fast.
 [1] The list is likely to be 1, 2, or 3 elements. Having more than 3
 Actions match a particular input xml element would be very unusual..
 [2] There is even the possibility that ArrayList has been optimised to
 do a memcpy if its source is an ArrayList.
 
 With the ReadOnlyConcatList, the work done is:
  * malloc a block of data for the ReadOnlyConcatList, which contains one
 int and the two references to ArrayList objects.
  * copy 2 references and call size once
 And accesses are fractionally slower, because get invokes the
 ReadOnlyConcatList, which performs an if-statement then invokes the get
 method on the appropriate underlying list.
 
 So on the positive side I think that ReadOnlyConcatList is marginally
 smaller (it doesn't need the data block, which is typically 4-12 bytes),
 and slightly more efficient to create (one less malloc, a few less
 copies of references)
 
 On the negative, the get method is slightly slower to use. It also adds
 a dozen lines to the source and an extra .class file to the jar.
 
 All in all, we are debating microseconds and handfuls of bytes here.
 Both seem *acceptable* solutions to me, though I do prefer simple in
 this case. And like the Unmodifiable discussion, the easiest resolution
 would be if some other Digester developer (eg Robert or Craig) would
 give their call, then we could get on with it.

No problem, really, simply replace it.

 If neither Robert nor Craig offer their opinion, then I'm willing to
 settle for the following compromise. It won't make either of us 100%
 happy, but I don't think there's anything we can't live with, either,
 and we can move forward again:
  * merge fallback rules into RuleManager and DefaultRuleManager,
and remove FallbackRuleManager
  * leave SupplementaryRuleManager exactly as you wrote it, with
caching and ReadOnlyConcatList (except for the changes needed
due to removal of FallbackRuleManager)
  * remove the UnmodifiableList stuff from DefaultRuleManager
 

Re: [digester2] Additions

2005-02-14 Thread Simon Kitching
On Tue, 2005-02-15 at 01:35 +0100, Oliver Zeigermann wrote:
 After I have thought about it, I am +1 to do it the way you proposed
 initially: have it all in one class. That's not because I am
 frustrated or am no longer interested, but I think that's a good
 solution and causes the least controversy. Go ahead with it please.

Ok, done.

I've used the term mandatoryAction rather than supplementaryAction
as it's equivalent, shorter and hopefully more familiar to non-english
speakers.

I've moved the path-matching functionality from SupplementaryRuleManager
into a new class SimpleMatchUtils. And of course new class 
  SimpleMatchUtilsTestCase
now contains the basic xmlio demonstration adapted from the old
  SupplementaryRuleManagerTestCase.

Cheers,

Simon



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: [digester2] Additions

2005-02-13 Thread Oliver Zeigermann
 One design goal was to avoid using protected members. The problem here
 is that using protected for a *member variable* locks us into an
 implementation forever (a protected member is part of the external api
 that users can access). I think it preferable to add a public or
 protected method that provides access to the member rather than exposing
 the member directly. The method can later get the data in some manner
 other than reading from a member if we change the implementation. Note
 that having a method that returns a Map isn't a *whole* lot better. I
 think what we need is a copy-helper type method rather than exposing
 the namespace member just so a subclass can copy it. I can't think what
 the best solution is at the moment, but I feel there's an elegant (and
 probably fairly well known) solution nearby. How does clone() solve this
 issue?
 
 Note that I have (I think) been rigorous about avoiding protected
 members in the o.a.c.digester2 package, but not so rigorous in the
 action subpackage. I think it probably is a good idea to apply this to
 the action package too, though.

In case those member variables really change this would be a general
redesign I guess. In this case you might just as well create a copy
from DefaultRuleManager maybe called NewDefaultRuleManager or
DefaultRuleManager2 or whatever. I agree that passing the information
using a method is no better. And having something like a copy helper
is not enough as people might want have access to DefaultRuleManager 
guts in other scenarios as well.
 
 ===
 
 I would suggest that the match method be placed somewhere other than on
 the Path class. Digester could well provide multiple different matching
 implementations in future. I think a static method on some XMLIOUtils
 class might be appropriate. The user's hand-written Action can then
 choose whether to use that matching algorithm or some other. All that
 Path currently does is return a String representing the canonical form
 of the Path, and that seems clean  focussed to me.

OK. If you think so. Done.

 ===
 
 This bit in SupplementaryRuleManager will cause problems I think:
 public List getMatchingActions(String path) {
 List actionList = super.getMatchingActions(path);
 ...
 if (supplementaryAction != null) {
 actionList.add(supplementaryAction);
 }
 The list the RuleManager returns should not be modified; it is actually
 an internal data structure. This does point out that DefaultRuleManager

I see. You are right.

 should be a little more defensive, perhaps returning
   Collections.unmodifiableList(actionList)
 which would cause the above add call to fail and throw an exception.
 That would, however, cause an object to be created on each
 getMatchingActions call, which I would prefer to avoid if possible.
 How about this instead?
 public List getMatchingActions(String path) {
 List actionList = super.getMatchingActions(path);
 ...
 if (supplementaryAction != null) {
 newActionList = new ArrayList(actionList);
 newActionList.add(supplementaryAction)
 }
 
 I suppose this could be done instead:
 if (supplementaryAction != null) {
 if (actionList.contains(supplementaryAction) {
   // nothing to do, we must have added the
   // supplementary action to this list earlier
 } else {
   // permanently modify DefaultRuleManager's
   // internal data structure; yecch!
   actionList.add(action)
 }
 }
 but as the comments indicate, I think that having one class manipulate
 another's internal data is not in terribly good taste, nor easy to
 maintain.

Got that. I will think about that and provide something new.

 Why is there only facility for one fallback action, and one
 supplementary action? Wouldn't having a List of actions for each be no
 more difficult and potentially more useful?

Maybe. Will do so.
 
 ===
 
 On a minor SVN note, the SVN property svn:keywords needs to be set on
 a file in order to enable expansion of $Id$ into the appropriate text.
 This can be done manually via:
   svn propset svn:keywords Id filename
 It takes a commit to update the repository. I suggest this would be nice
 to do next time you update the SupplementaryRuleManager class, otherwise
 that text in the first line of the file will never change. Adding the
 following to your ~/.subversion/config file will ensure that each .java
 file that you add to subversion automatically gets $Id$ expansion
 enabled:
   [auto-props]
   *.java = svn:keywords=Id

Tried that. Hopefully done now.

  (2) Why are the guts of Context accessible to the Action as well? They
  should only be interesting to the Digester core, not to the
  application. Why not adding an interface that hides the implentation
  details? Like
 
  public interface Context {
  Path getCurrentPath();
  String getMatchPath();
 
  

Re: [digester2] Additions

2005-02-13 Thread Oliver Zeigermann
[update]

- now all member variables of DefaultRuleManager are private -
introduced copy ctor for copying members from subclasses
- DefaultRuleManager now returns an unmodifiable list upon getMatchingActions
- created a new FallbackRuleManager class that only takes care about
fallback actions
- SupplementaryRuleManager now takes account of the explicitely
unmodifiable list returned by DefaultRuleManager and does a wild
mixture of caching and concatenating the list returned from
DefaultRuleManager and its own list of supplementary actions
- Both FallbackRuleManager and SupplementaryRuleManager now can have
several callback actions
- Still docs and tests need improvement - will do so once the code
seems somewhat stable

Comments?

Oliver

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: [digester2] Additions

2005-02-12 Thread Simon Kitching
On Sat, 2005-02-12 at 08:28 +0100, Oliver Zeigermann wrote:
 Hi Simon,
 
 I have now committed the first proposal for the XMLIO like rule
 manager called SupplementaryRuleManager. I also added a very basic
 test for it. I had to make a minor visibility change in
 DefaultRuleManager that it extends. Path now has match methods.
 
 All this isn't perfect, especially Javadoc is missing and tests need
 to be extended, but I wanted to make this public for discussion as
 soon as possible.
 
 So, comments?


One design goal was to avoid using protected members. The problem here
is that using protected for a *member variable* locks us into an
implementation forever (a protected member is part of the external api
that users can access). I think it preferable to add a public or
protected method that provides access to the member rather than exposing
the member directly. The method can later get the data in some manner
other than reading from a member if we change the implementation. Note
that having a method that returns a Map isn't a *whole* lot better. I
think what we need is a copy-helper type method rather than exposing
the namespace member just so a subclass can copy it. I can't think what
the best solution is at the moment, but I feel there's an elegant (and
probably fairly well known) solution nearby. How does clone() solve this
issue?

Note that I have (I think) been rigorous about avoiding protected
members in the o.a.c.digester2 package, but not so rigorous in the
action subpackage. I think it probably is a good idea to apply this to
the action package too, though.

=== 

I would suggest that the match method be placed somewhere other than on
the Path class. Digester could well provide multiple different matching
implementations in future. I think a static method on some XMLIOUtils
class might be appropriate. The user's hand-written Action can then
choose whether to use that matching algorithm or some other. All that
Path currently does is return a String representing the canonical form
of the Path, and that seems clean  focussed to me.

=== 

This bit in SupplementaryRuleManager will cause problems I think:
public List getMatchingActions(String path) {
List actionList = super.getMatchingActions(path);
...
if (supplementaryAction != null) {
actionList.add(supplementaryAction);
}
The list the RuleManager returns should not be modified; it is actually
an internal data structure. This does point out that DefaultRuleManager
should be a little more defensive, perhaps returning 
  Collections.unmodifiableList(actionList)
which would cause the above add call to fail and throw an exception.
That would, however, cause an object to be created on each
getMatchingActions call, which I would prefer to avoid if possible.
How about this instead?
public List getMatchingActions(String path) {
List actionList = super.getMatchingActions(path);
...
if (supplementaryAction != null) {
newActionList = new ArrayList(actionList);
newActionList.add(supplementaryAction)
}

I suppose this could be done instead:
if (supplementaryAction != null) { 
if (actionList.contains(supplementaryAction) {
  // nothing to do, we must have added the
  // supplementary action to this list earlier
} else {
  // permanently modify DefaultRuleManager's 
  // internal data structure; yecch!
  actionList.add(action)
}
}
but as the comments indicate, I think that having one class manipulate
another's internal data is not in terribly good taste, nor easy to
maintain.

=== 

As I mentioned earlier, I'm happy to put the fallbackAction
functionality into DefaultRuleManager. 

=== 

Why is there only facility for one fallback action, and one
supplementary action? Wouldn't having a List of actions for each be no
more difficult and potentially more useful?

=== 

On a minor SVN note, the SVN property svn:keywords needs to be set on
a file in order to enable expansion of $Id$ into the appropriate text.
This can be done manually via:
  svn propset svn:keywords Id filename
It takes a commit to update the repository. I suggest this would be nice
to do next time you update the SupplementaryRuleManager class, otherwise
that text in the first line of the file will never change. Adding the
following to your ~/.subversion/config file will ensure that each .java
file that you add to subversion automatically gets $Id$ expansion
enabled:
  [auto-props]
  *.java = svn:keywords=Id



 And, by the, now that I have done some work and I came across some new
 questions. Maybe they are just dumb, but what do you think about this?
 
 (1) Why aren't relative paths to actions also cached?

The current DefaultRuleManager implementation is straight from the old
RulesBase class, except that absolute paths are expected to start with
/.

I have plans for a major rework of 

Re: [digester2] Additions

2005-02-11 Thread Oliver Zeigermann
Hi Simon,

I have now committed the first proposal for the XMLIO like rule
manager called SupplementaryRuleManager. I also added a very basic
test for it. I had to make a minor visibility change in
DefaultRuleManager that it extends. Path now has match methods.

All this isn't perfect, especially Javadoc is missing and tests need
to be extended, but I wanted to make this public for discussion as
soon as possible.

So, comments?

And, by the, now that I have done some work and I came across some new
questions. Maybe they are just dumb, but what do you think about this?

(1) Why aren't relative paths to actions also cached?

(2) Why are the guts of Context accessible to the Action as well? They
should only be interesting to the Digester core, not to the
application. Why not adding an interface that hides the implentation
details? Like

public interface Context {
Path getCurrentPath();
String getMatchPath();

ClassLoader getClassLoader();
Object getRoot();
boolean isEmpty();
int getStackSize();
Object peek();
Object peek(int n);

putItem(ItemId id, Object data);
Object getItem(ItemId id);
}

and ContextImpl being what Context is now. Similar thing with Path. If
you agree I could do the work.

(3) Bringing this up again: Wouldn't it be more flexible to allow more
than a String as a pattern in RuleManager#addRule(String pattern,
Action action)?

Oliver

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


[digester2] Additions

2005-02-10 Thread Oliver Zeigermann
Folks,

as I noticed Simon has done so much work on Digester2, I just wanted
to be sure that my scheduled additions still are appropriate and
aligned to the overall design. Here they are:

(1) XMLIORuleManager: A rule manager that takes an action and
unconditionally calls it on any event. It's useful to imitate xmlio
style processing, but not limited to this. One application might be
logging. As it is more general any proposal for a better name?

Quoting code sample from Simon:

 // xmlio-style digester
 Action myHandler = new AbstractAction() {
   public void begin(
Context context,
String namespace, String name,
Attributes attrs) {

 String path = context.getMatchPath();
 if (path.equals(..)) {
 
 } else {
 
 }
   }

   public void body(...) {
   }
 }

 RuleManager xmlioRuleManager = new XMLIORuleManager(myHandler);
 Digester d  = new Digester();
 d.setRuleManager(xmlioRuleManager);


As an option it might be possible to register an action with any rule
manager that gets called unconditionally - epsecially for logging and
debugging this can be really helpful. However,  I wasn't quite sure if
Simon was happy with that.

(2) Next to the body and bodySegment call back methods there might be
one that gives you the complete body of an element as a string
composed of all character and markup data. This might be useful when
you want to verbosely take over formatted mixed content like in XHTML.

For performance reasons there should be some mechanism that tells
digester when such content is needed on an element basis. Maybe
something returned by the begin method or - maybe cleaner - something
returned by an additional call back directly called after begin like
boolean requiresComposedBody(String namespace, String name) or
anything similar. This method could alternatively be part of an
additional or extended interface.

Comments?

Oliver

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: [digester2] Additions

2005-02-10 Thread Simon Kitching
Hi Oliver,

On Thu, 2005-02-10 at 19:08 +0100, Oliver Zeigermann wrote:
 Folks,
 
 as I noticed Simon has done so much work on Digester2, I just wanted
 to be sure that my scheduled additions still are appropriate and
 aligned to the overall design. Here they are:
 
 (1) XMLIORuleManager: A rule manager that takes an action and
 unconditionally calls it on any event. It's useful to imitate xmlio
 style processing, but not limited to this. One application might be
 logging. As it is more general any proposal for a better name?
 
 Quoting code sample from Simon:
 
  // xmlio-style digester
  Action myHandler = new AbstractAction() {
public void begin(
 Context context,
 String namespace, String name,
 Attributes attrs) {
 
  String path = context.getMatchPath();
  if (path.equals(..)) {
  
  } else {
  
  }
}
 
public void body(...) {
}
  }
 
  RuleManager xmlioRuleManager = new XMLIORuleManager(myHandler);
  Digester d  = new Digester();
  d.setRuleManager(xmlioRuleManager);
 
 
 As an option it might be possible to register an action with any rule
 manager that gets called unconditionally - epsecially for logging and
 debugging this can be really helpful. However,  I wasn't quite sure if
 Simon was happy with that.
 

Adding functionality to DefaultRuleManager that returns a set of default
actions *if no other action matches the current path* is not a problem.
I'm all for it. And this would meet your requirements, yes? So I'm happy
for you to go with this if you agree. And as a bonus, implementation
should be trivial: a dozen lines or so. I'm not sure whether this API
should then be exposed via the RuleManager interface (ie required for
all RuleManagers) or just left on DefaultRuleManager.

Adding functionality that returns a set of actions *in addition to* the
actions that match the current path is trickier. In what order are they
returned? And can we avoid instantiating a new List object each time
getMatchingActions is called? I'm not saying I'm opposed to it, I'm just
not sure right now.

 (2) Next to the body and bodySegment call back methods there might be
 one that gives you the complete body of an element as a string
 composed of all character and markup data. This might be useful when
 you want to verbosely take over formatted mixed content like in XHTML.
 
 For performance reasons there should be some mechanism that tells
 digester when such content is needed on an element basis. Maybe
 something returned by the begin method or - maybe cleaner - something
 returned by an additional call back directly called after begin like
 boolean requiresComposedBody(String namespace, String name) or
 anything similar. This method could alternatively be part of an
 additional or extended interface.

Yep. I'm in favour of this. I would prefer an eye to be kept on
performance; the requiresComposedBody solution doesn't tempt me
greatly as that is a call that must be made on each Action each time it
matches, and 99.999% of those calls will return false. I could live with
it if there is no better alternative. However if an action's begin
method could enable the body text collection and the end method disable
it then wouldn't that have the same effect without additional work when
the feature isn't being used? I agree it's a little less intuitive for
Action writers though...and I haven't thought it through fully.


Note that I am tentatively planning to do some significant rework of the
DefaultRuleManager class. But that certainly won't be for some weeks so
hack away! I guess you'll also be wanting to work on the Path class, so
the custom Action class can more easily test the current path? I'll make
sure I check on this list before altering this class (though I've got no
plans to do so).

Cheers,

Simon


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]