Re: [digester2] performance of ns-aware parsing

2005-02-06 Thread Reid Pinchback

--- Simon Kitching [EMAIL PROTECTED] wrote:
  I stopped using belief as a measurement of code a long time
  ago.  Usually only works when I wrote all the code.  :-)
  I'll cook up an experiment and see what I can come up with
  in the way of timing information.
 
 That would be excellent. I look forward to seeing the results..

Actually, an experiment implies a question to be answered, and
while this has been an interesting back-and-forth, not sure
we really have a question to answer.  This whole thing began
with me simply asking a question about something you'd
put in your readme file on the upcoming work.  Practically,
I don't see you dropping the expectation of a namespace-aware
parser; the question is really whether the user of Digester2
is using namespace features.  While we could do
timing tests to help people understand what the impact may
or may not be of using NS in the documents they parse, it
obviously has nothing to do with whether or not you are
going to expect a parser to handle NS if the docs contain NS.
That will be the developer's problem, not yours, yes?







__ 
Do you Yahoo!? 
Yahoo! Mail - You care about security. So do we. 
http://promotions.yahoo.com/new_mail

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: [digester2] performance of ns-aware parsing

2005-02-05 Thread Reid Pinchback

--- Simon Kitching [EMAIL PROTECTED] wrote:

 On Thu, 2005-02-03 at 07:52 -0800, Reid Pinchback wrote: 
  Even for Sax the performance difference between (a) and (b) is roughly 
  a factor of 2 across all parsers when processing small (typical 
  message-sized) 
  docs that don't use NS. 
 
 I would *really* love to see some actual measurements on this if you can
 find some. You seem to be quoting from some study you have done or read
 - it would be great to have this. [See comments on Piccolo below]

Take another look at the Piccolo data, and compare the 2 Soap examples
to the random no-NS data.  The difference between the two Soap examples
isn't material because both use NS, so in a sense you have a couple of
different samples of NS data, and in the random case you have another
sample, but I agree it would be better to create tests that were better
understood in order to decide what the difference was.


   Mucking with (d) is supposed to result in significant
  wins when you tune the grammar handling to your app, but I haven't tried it 
  myself and I've never seen timing differences quoted.  
  
 
 I don't quite understand what (d) means, but is it actually relevant?
 Again, we are talking about *namespaces* not validation.

Yes... and every entity (Element and Attribute) is jammed through a
resolution process first.  Remember XML attributes with default values?
Guess where those values are identified and handed to the parser - during
the resolution process.  Namespaces just add more data to shuffle
around during the resolution process.


 What I'm trying to achieve is to avoid having actions or patterns deal
 with element-names containing prefixes, eg stating that an element's
 name is foo:item. This is just broken; the item's name is really the
 tuple (some-namespace, item).
 
 Grammars/schemas can optionally be bound to namespaces, but namespaces
 themselves are a lower layer that can be used without any of these
 things. I'm talking here about requiring the parser to convert
 foo:item into (namespace, item) but do not intend to imply that any
 kind of schema should be loaded for the specified namespace. 

That sounds sensible.
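
A minimal sketch of what "the parser converts foo:item into (namespace, item)" looks like through SAX. It assumes plain JAXP, where the switch is SAXParserFactory.setNamespaceAware(true); the class name NsNameDemo and the urn:example namespace are made up for illustration. No schema or validation is requested, only prefix mapping.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of Digester.
class NsNameDemo {
    /** Parses xml and returns one "{uri}localName" string per start tag. */
    static List<String> elementNames(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);  // map prefixes to namespace URIs
        factory.setValidating(false);     // no validation is requested
        SAXParser parser = factory.newSAXParser();
        final List<String> names = new ArrayList<>();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // The prefix only survives in qName; identity is (uri, localName).
                names.add("{" + uri + "}" + localName);
            }
        });
        return names;
    }
}
```

Two documents that use different prefixes for the same namespace produce the same names, which is the point of matching on the tuple rather than on foo:item.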

 The XMLReader.setNamespaceAware(true) method does exactly this; enables
 mapping of prefixes - namespaces, but does not enable processing of
 either DTDs or schemas.

I don't think it actually has any impact at all on DTD processing.
DTDs, if declared, are always processed unless you install an entity 
resolver that excises that activity out.

   I agree
  that old parsers providing (c) aren't particularly interesting, but
  if you spend any time tracing through the guts of the parsing, particularly
  when you see how DTDs are loaded for entity resolution, you begin to see 
  (d) as having potential.  Throwing (b) away may result in less code in
  Digester2, but it may be worth doing some timing tests to see if that 
  code reduction is consequence-free.
 
 What does loading DTDs have to do with namespaces?

As you said, the XML spec doesn't require that the namespaces mean
anything, and hence it is possible that a parser won't try to resolve
and validate against multiple DTDs, but I haven't ever traced through
the code in a situation where there were multiple namespaces to
resolve against, so I don't know if there is a relationship there or not.
In general, if a parser thinks it needs a DTD in order to understand
a document, it tends to grab it.  I don't know if there are situations
where it tries to interpret namespace declarations as public ids for DTDs.
If that happens, then those DTDs would also be loaded by the parser
and namespaces would have to be matched to the appropriate collections
of contexts during entity resolution.


   I still find it hard to believe that leaving out namespace support makes
   a performance difference. The parser needs to keep a map of
  prefix-(stack of namespace)
   and that's about it. 

I stopped using belief as a measurement of code a long time
ago.  Usually only works when I wrote all the code.  :-)
I'll cook up an experiment and see what I can come up with
in the way of timing information.
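
One rough shape such an experiment could take, assuming plain JAXP/SAX and a do-nothing handler. The class name NsTiming, the sample document, and the iteration counts are all placeholders, and it only toggles the factory's namespace flag, so it is a starting point rather than a faithful test of options (a) through (d).

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical timing harness, not part of Digester.
class NsTiming {
    /** Returns elapsed nanoseconds for parsing xml the given number of times. */
    static long timeParse(String xml, boolean nsAware, int iterations) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(nsAware);
        SAXParser parser = factory.newSAXParser();
        DefaultHandler handler = new DefaultHandler(); // ignores all events
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            parser.parse(new InputSource(new StringReader(xml)), handler);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws Exception {
        // Small message-sized document without namespaces (made-up sample).
        String xml = "<order><item id='1'>widget</item><item id='2'>gadget</item></order>";
        // Warm up both configurations so JIT compilation is not what gets measured.
        timeParse(xml, true, 20000);
        timeParse(xml, false, 20000);
        long aware = timeParse(xml, true, 100000);
        long unaware = timeParse(xml, false, 100000);
        System.out.println("ns-aware:   " + aware / 1000000 + " ms");
        System.out.println("ns-unaware: " + unaware / 1000000 + " ms");
    }
}
```

The numbers will differ by parser and JVM, so the run would need repeating against each parser of interest before drawing any conclusion.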


 Sorry, what per-entity operations, and what temporary object creations?

The Jade/Javolution author wrote a fair bit about that, I'll see
if I can find his pages.  I couldn't find the details at the
Javolution site; when Jade was separate he indicated that the
String operations required to satisfy the SAX API semantics 
dragged down performance heavily.

Zapthink comments on XML parsing challenges,

  http://searchwebservices.techtarget.com/originalContent/0,289142,sid26_gci85,00.html
 
 No occurrence of the word namespace anywhere in the article.

For this and other similar concepts, it helps to start associating
namespaces with other aspects of parsing internals.  Elements and 
attributes have to be matched up to their definitions - the 
resolution process.  Namespaces are an aspect of the match up, just 
more information

Re: [digester] initial code for Digester2.0

2005-02-03 Thread Reid Pinchback

--- Simon Kitching [EMAIL PROTECTED] wrote:

 On Wed, 2005-02-02 at 20:45 -0800, Reid Pinchback wrote:
 Of course if someone can demonstrate that non-namespace-aware parsers
 *are* still useful then I'll change my mind.

Just to clarify, since I was being sloppy before (I gotta
stop typing in shorthand) there is an important distinction:

a) having NS-aware parser, always using NS-aware API methods
b) having NS-aware parser, selectively using NS-aware API methods
c) having non-NS-aware parser (and obviously never using NS-aware API methods)
d) having NS-aware parser where the developer fixes a grammar that
   ignores any NS distinctions

Even for Sax the performance difference between (a) and (b) is roughly 
a factor of 2 across all parsers when processing small (typical message-sized) 
docs that don't use NS.  Mucking with (d) is supposed to result in significant
wins when you tune the grammar handling to your app, but I haven't tried it 
myself and I've never seen timing differences quoted.  

I'm not trying to advocate any approach except to notice that, since your 
README mentioned requiring a namespace-aware parser, it sounded like 
there was a potential for options (b), (c), and (d) to become unintentionally
closed to developers in Digester2 when they weren't in Digester1.  I agree
that old parsers providing (c) aren't particularly interesting, but
if you spend any time tracing through the guts of the parsing, particularly
when you see how DTDs are loaded for entity resolution, you begin to see 
(d) as having potential.  Throwing (b) away may result in less code in
Digester2, but it may be worth doing some timing tests to see if that 
code reduction is consequence-free.



 I still find it hard to believe that leaving out namespace support makes
 a performance difference. The parser needs to keep a map of
prefix-(stack of namespace)
 and that's about it. 

Actually the XML spec distinguishes between the default namespace
and all other namespaces, so parsers can reasonably make the same
distinction and try to avoid a bunch of per-entity operations and 
temporary object creations in the case where there is no namespace.
Look at the piccolo stats published on Sourceforge.  Compare Soap, 
Soap+NS, and random XML-no NS timings and it suggests that NS 
ain't free.
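
For reference, the "map of prefix to stack of namespaces" from the quoted text might be sketched as below. This is an illustration of that bookkeeping only, not how any particular parser implements it; the class and method names are hypothetical. In SAX terms, declare and undeclare correspond to the startPrefixMapping and endPrefixMapping callbacks.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of prefix scoping, not taken from any parser.
class PrefixScopes {
    private final Map<String, Deque<String>> bindings = new HashMap<>();

    /** A nested element declares (or redeclares) a prefix. */
    void declare(String prefix, String uri) {
        bindings.computeIfAbsent(prefix, k -> new ArrayDeque<>()).push(uri);
    }

    /** The declaring element has ended; restore the outer binding, if any. */
    void undeclare(String prefix) {
        Deque<String> stack = bindings.get(prefix);
        if (stack != null && !stack.isEmpty()) {
            stack.pop();
        }
    }

    /** Returns the innermost URI bound to the prefix, or null if unbound. */
    String resolve(String prefix) {
        Deque<String> stack = bindings.get(prefix);
        return (stack == null || stack.isEmpty()) ? null : stack.peek();
    }
}
```

The map itself is cheap; the cost in question is the lookup and string handling repeated for every element and attribute name.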

Useful links:

  Jade (now part of Javolution) http://javolution.org/api/index.html,
  look at the javolution.xml package (trades String for CharSequence
  to increase performance, but keeps NS)

  Piccolo you probably already have the link for, but for anybody
  else interested: http://piccolo.sourceforge.net

  Zapthink comments on XML parsing challenges,
  
http://searchwebservices.techtarget.com/originalContent/0,289142,sid26_gci85,00.html

  Developerworks articles on XML performance,
  http://www-106.ibm.com/developerworks/xml/library/x-perfap1.html

  Sun articles on XML performance,
  http://java.sun.com/developer/technicalArticles/xml/JavaTechandXML_part3/





Re: [digester] initial code for Digester2.0

2005-02-02 Thread Reid Pinchback

One section of the release notes says:

The Digester now *always* uses a namespace-aware xml parser.

I was wondering why this is.  There are a lot of XML parsers
out there, and some of them have done things like trade
namespace awareness for performance.  If somebody has a
application where namespaces aren't an issue, why should
they be limited to only using a namespace-aware parser?
Not something that seems like an important issue if you are
just using a Digester to process some kind of app config
file, but is an issue if processing streams of XML data
is fundamentally what the app is about.








Re: [digester] initial code for Digester2.0

2005-02-02 Thread Reid Pinchback

--- Oliver Zeigermann [EMAIL PROTECTED] wrote:
 On Wed, 02 Feb 2005 18:28:04 +1300, Simon Kitching [EMAIL PROTECTED] wrote:
  My major concern is that if we are going to warn people not to implement 
  the Action interface,
  then what really is the point of providing it in the first place? As I said 
  above, I just
 cannot
  think of any situation where a class would want to be an Action *and* 
  extend some other class.

 I am +1 for using an interface and the default (why abstract?)
 implementation like with Swing or SAX.

I don't get why we would ever warn people not to implement the interface,
beyond including JavaDoc that clarified what the behaviour contract is
for the various methods.  Part of a developer's job is to exercise
judgement about what they are or are not going to do in their implementation.
If the existing Action implementations and base class provide what a developer 
needs to do 99% of the time, they won't bother implementing the interface, but 
when they encounter that 1% scenario, it's nice not to hit a brick wall.

Here is a concrete example of why you could want to implement the interface
and extend another class.  I've actually had situations with the existing
Digester where I'd wished I could do that.  The one that I can recall now
was an instrumentation issue.  Doing debugging and performance tuning of
a suite of rules can be tedious because, currently, the only options are
either to watch a spew of logging messages or single-step your way through
all the callbacks in a debugger (PAIN).  If the major coupling points
in the Digester had been abstracted by interfaces, it would have been
easier to insert instrumentation proxies or EasyMock'd test implementations 
of classes at key points.
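
To make the instrumentation idea concrete, here is a sketch using java.lang.reflect.Proxy, which only works when the coupling point is an interface. The Action interface below is a hypothetical two-method stand-in, not the proposed Digester2 interface, and TimingProxy is likewise made up.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.List;

// Hypothetical stand-in for the Action interface under discussion.
interface Action {
    void begin(String name);
    void end(String name);
}

// Hypothetical helper that wraps any Action with timing instrumentation.
class TimingProxy {
    /** Returns an Action that delegates to target and records call timings in log. */
    static Action instrument(final Action target, final List<String> log) {
        InvocationHandler handler = (proxy, method, args) -> {
            long start = System.nanoTime();
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow what the real Action threw
            } finally {
                log.add(method.getName() + " took "
                        + (System.nanoTime() - start) + " ns");
            }
        };
        return (Action) Proxy.newProxyInstance(
                Action.class.getClassLoader(),
                new Class<?>[] { Action.class },
                handler);
    }
}
```

With an abstract base class instead of an interface, the same wrapper would need a hand-written subclass per Action type or a bytecode library.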






Re: [digester] initial code for Digester2.0

2005-02-02 Thread Reid Pinchback

--- Simon Kitching [EMAIL PROTECTED] wrote:

 Supporting namespaces in an xml parser seems very simple to me. I think
 it much more likely that only antique and unmaintained parsers fail to
 support namespaces. And people who are determined to use antique and
 unmaintained parsers can just stick with digester 1.x as far as I am
 concerned. I'm not pushing for digester to remove non-namespace-aware
 support - just digester2!

Wow, that is an unexpectedly harsh reaction.  My reason for asking 
was simple, and I believe not unreasonable.   You were the one asking
for feedback on your proposal. 

Using the namespace-based API of an XML parser is known to reduce throughput
substantially.  This is covered in a host of Java XML magazine articles, available
from Google searches, and in one or two of the Java performance tuning books still
in distribution.  XML performance tuning is a tough area, and people continually
struggle with it.  I don't recall the SAX-only stats, but I know that for DOM
parsers you can shoot for an order-of-magnitude increase in XML processing
bandwidth through a change in parser and not using NS.  Antiqueness of parsers
isn't the issue.

I think it helps to keep in mind that NS was intended as a way of creating 
name-resolution scopes that allow the merging of document structures from 
different origins that otherwise could experience element and attribute
name clashes.  When somebody has an application that doesn't require that 
kind of merging, and they aren't using a namespace-dependent XML technology 
like Soap or XML Schema, then using NS features of an NS parser can
be a burden without corresponding benefit.  Under the hood, that parser has 
to do a lot of work to continually manage the NS resolution of the node names.
It has no way of knowing that the work is pointless - you've told it to
assume that there is a point when you use the NS features.









Re: [digester] initial code for Digester2.0

2005-02-01 Thread Reid Pinchback

--- Simon Kitching [EMAIL PROTECTED] wrote:
 Does this mean you prefer Action to Rule? I certainly expect to hear
 from people who want to keep the current names...

I'm not wedded to Rule but I do have a concern about Action.
I suspect it could make Struts code rather confusing.








Re: [digester] initial code for Digester2.0

2005-02-01 Thread Reid Pinchback

--- Simon Kitching [EMAIL PROTECTED] wrote:

 Ok, we'll see what the general consensus is. I happen to personally like
 prefixes rather than suffixes, but will go with the majority opinion.

I vote for prefixes.

 That sounds reasonable. However I do dislike having mutual dependencies
 between java packages; a DAG (directed acyclic graph) is good for a
 number of reasons. 

I strongly agree.  Cyclic package dependencies seem
unimportant when you only have a few classes, but as the
amount of code grows, you quickly find that testing and
refactoring become much more difficult than they had to be.









Re: [digester] initial code for Digester2.0

2005-02-01 Thread Reid Pinchback

Sure thing.  Just to make it easier to envision, let's get
packages out of the equation.  Just think about cyclic
dependencies between two classes in the same package.
That is enough to show the problem; packages just add complexity 
because the dependencies can be much harder to detect visually 
(usually you would use something like JDepend to spot them) 
and harder to unwind.

Refactoring is harder simply because you have to do a larger
number of smaller steps.  Doesn't mean impossible, more steps
just mean more work, more time, more money.  Tricky enough when 
only two classes are involved, harder as the number of classes 
involved in the cycle increases.  Get enough classes involved, 
and you start to hear statements like "it will be easier to 
throw that away and start over again than it will be to fix it."

class A {
  int a;
  int fooA(int arg) {
    // 1a. do stuff with {B.fooB,a,arg}
    // 2a. do other stuff with result and {a}
    return 0;  // placeholder so the sketch compiles
  }
}

class B {
  int b1, b2;
  int fooB(int arg) {
    // 1b. do stuff with {A.fooA,b1,arg}
    // 2b. do other stuff with result and b2
    // 3b. do stuff with {A.fooA,b2,arg}
    return 0;  // placeholder so the sketch compiles
  }
}


Refactoring remains possible, but tricky because
you have both compile-time code dependencies and
run-time state dependencies.  You are faced with 
things like factoring out small fragments of code 
into helper classes, and maybe introducing an 
interface to at least eliminate the compile-time
dependency between A and B, even if the run-time
dependency remains.  

Often the solution ends up something like

a) make interface I
b) create class C implements I
   and migrate some of A and B state into C
c) modify A and B to share I

It works, it just takes time... and often you
are doing it before even trying to tackle whatever
bug or feature enhancement you were faced with
in the first place.
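
A minimal sketch of where steps (a) through (c) tend to land. The names I, C, A and B follow the outline above; the method combine, the fields, and the arithmetic are invented purely so the example compiles and runs.

```java
// (a) The interface both classes will depend on.
interface I {
    int combine(int x, int arg);
}

// (b) C implements I and owns the state migrated out of A and B.
class C implements I {
    private final int shared; // hypothetical state that A and B used to reach for
    C(int shared) { this.shared = shared; }
    public int combine(int x, int arg) { return x + arg + shared; }
}

// (c) A and B now share an I and no longer reference each other.
class A {
    private final I helper;
    private final int a;
    A(I helper, int a) { this.helper = helper; this.a = a; }
    int fooA(int arg) { return helper.combine(a, arg); }
}

class B {
    private final I helper;
    private final int b1;
    B(I helper, int b1) { this.helper = helper; this.b1 = b1; }
    int fooB(int arg) { return helper.combine(b1, arg) * 2; }
}
```

A and B can now be compiled and unit tested separately, with a stub I standing in for C where that helps.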








Re: [digester] Are performance improvements wanted?

2004-09-12 Thread Reid Pinchback

I won't repeat my previous comments re: JUnitPerf,
but they apply here too.  Just looked at the bench
case stuff, looks decent, better for fast tests of
small code fragments.  Whether it is appropriate
or not depends on what you are trying to achieve.
If you want to be able to record measurements
(e.g. in some historical performance file) and
compare against that, the approach is fine.

What I'm a bit more concerned about right now
is to, at more-or-less-the-same-time, compare 
the timings of two pieces of code in the same
environment.  I'd like the test to know if
I've achieved an improvement or not.
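
Something along these lines is what I have in mind: time the old and new code in alternating rounds inside the same JVM and report a ratio. This is only a sketch; SideBySide and speedup are made-up names, and taking the best round for each side is one arbitrary way to damp noise.

```java
// Hypothetical helper for same-environment comparisons; not JUnitPerf or BenchCase.
class SideBySide {
    /** Times one batch of runs and returns elapsed nanoseconds. */
    private static long timeBatch(Runnable task, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            task.run();
        }
        return System.nanoTime() - start;
    }

    /** Returns bestOld / bestNew; a value above 1.0 suggests the new code is faster. */
    static double speedup(Runnable oldCode, Runnable newCode, int rounds, int iterations) {
        long bestOld = Long.MAX_VALUE;
        long bestNew = Long.MAX_VALUE;
        // Alternate old and new so both see similar JIT and system conditions.
        for (int r = 0; r < rounds; r++) {
            bestOld = Math.min(bestOld, timeBatch(oldCode, iterations));
            bestNew = Math.min(bestNew, timeBatch(newCode, iterations));
        }
        // Guard against a zero reading on coarse clocks.
        return (double) Math.max(1, bestOld) / Math.max(1, bestNew);
    }
}
```

A unit test could then assert that the ratio stays above some threshold, with the threshold chosen loosely enough to tolerate run-to-run noise.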

On the issue of platform-specific differences,
I agree, that is tough.  The problem with
posting numbers is that systems vary so much
it's hard to draw conclusions.  If somebody
claimed to have similar hardware and O/S to
you, and their numbers are the same, higher,
or lower than yours, what does it tell you?
Unfortunately, the data is from an experiment
that is too uncontrolled to help a developer
decide if a proposed code change is likely
to be faster across multiple platforms.

If you are inclined to muse in the direction
of random impractical thoughts, you could
envision a small reference set of Java code
fragments.  Measure Digester performance in
terms of the reference set.  That performance
number should be platform independent, while
the actual results on any given platform would 
be finally determined by the raw performance of 
the reference set.  That is essentially the
technique used in a variety of numerical
modeling, estimation, or optimization approaches.

Definitely pie-in-the-sky category solution.
Maybe put it on the Wiki for, oh, Digester 27.0.

:-)



--- Phil Steitz [EMAIL PROTECTED] wrote:

 The approach used in
 o.a.c.beanutils.BeanUtilsBenchCase -- creating a 
 separate microbenchmarks test case with timing
 included -- could 
 probably also be applied to [digester] and other
 commons components.


 
 I have no clue how one would go about eliminating
 platform-specific 
 differences.







Re: [digester] Are performance improvements wanted?

2004-09-12 Thread Reid Pinchback

--- Simon Kitching [EMAIL PROTECTED] wrote:

 You should be warned, though, that the logging area
 is particularly tricky. 

Yup, I figured that could be the case.  Before
I even proposed this I'd already decided that I'd
just float each change as a proposal, and just grin
and bear it if there was something that made the
change unwise.  While you strive to create performance
fixes that don't change behaviour at all, sometimes
you run into cases where that isn't true.  When
that happens, folks have to decide if the change
would be to something that mattered, or not.

 From what I remember, there is a requirement
 that frameworks
 which use digester (eg j2ee app servers) must be
 able to direct logging
 output to different destinations depending on which
 app the framework
 is running the digester on behalf of.
...
 I was not able to find a
 better way to organise logging while satisfying the
 original
 requirements.
 
 I'm not saying there *isn't* a way to improve
 digester logging, just
 that it is probably necessary to read that email
 thread first to be sure
 the improvements still satisfy the requirements as
 described by Craig.

Ok, I'll see if I can find anything archived about
that.  At a guess I bet it's something like the
following:

- getLogger returns a reference to a logger
- Digester instances currently each have their
  own reference
- if you use that reference to change the logger
  behaviour for your Digester, do you change only
  your own logging, or everybody else's logging
  via the Digester/Digester.sax categories, and
  would sharing a static logger change that?

Can't say I've traced this kind of thing through
log4j, but I'd have expected that changing the 
logger changed everybody's logging via the same
category against the same repository.  Could be I'm
wrong.  Normally I'd expect that if multiple clients
needed different control of logging for the same
category, they'd need to have their own repositories.
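
The sharing behaviour I'm describing can be seen with java.util.logging, used here only as a stand-in since I haven't traced log4j or commons-logging; they may differ in detail. The category names passed in are whatever the caller chooses, and SharedLoggerDemo is a made-up class.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical demo using java.util.logging, not log4j or commons-logging.
class SharedLoggerDemo {
    /** True if two lookups of the same category yield the same Logger object. */
    static boolean sameInstance(String category) {
        Logger first = Logger.getLogger(category);
        Logger second = Logger.getLogger(category);
        return first == second;
    }

    /** Sets a level through one reference and reads it back through another lookup. */
    static Level levelSeenByOther(String category, Level newLevel) {
        Logger mine = Logger.getLogger(category);
        mine.setLevel(newLevel);
        // A second client asking for the same category sees the change.
        return Logger.getLogger(category).getLevel();
    }
}
```

If the logging layer in use behaves the same way, a static logger and a per-instance reference to the same category end up pointing at the same object, so the per-instance field buys no isolation by itself.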

In any case, I'm not overly worried about winning
on this particular change.  It's the kind of thing
that matters more during development than during
execution - it's a measurable drag on running unit
tests that instantiate Digester instances in loops,
but not such a big deal in real-life Digester usage.

Not an issue for now, but for the future I'm
particularly intrigued by some of the Wiki
comments for Digester 2.0, and how it might be
time to split out various areas of functionality.
I think at that point you might have a chance
to allow for some very serious performance
improvements in areas that wouldn't be possible
today without changing the API in undesirable ways.
I think a lot of the circular dependencies between
classes and packages that exist in Digester today 
are the initial sniff test of interesting
opportunities with a different approach.

   Reid








[digester] Wiki todo 2.1.7, yes Digester can do Ant properties

2004-09-12 Thread Reid Pinchback
FYI, I've verified that yes, the Digester 
substitution facilities in 1.6 can be used
to do the same kind of variable substitution
that Ant has.  Just wanted to send in a note
so nobody wastes time tackling the same
problem.  Once Simon has finished merging
the 1.6 source into the head, I'll post
the change.  At that point somebody with
Wiki godliness should probably indicate the
issue closed.

Nothing earth-shattering to do it.
VariableExpansionTestCase was a large part of
the way there, just needed to take it a little 
bit further.  No changes to functional code are
needed, just required a combination of the 
substitutor framework, CallMethodRule, 
CallParamRule, and an appropriate initial 
object shoved on the Digester stack.
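
For anybody unfamiliar with what "Ant-style" substitution means, the behaviour being matched is roughly the following. This is a standalone illustration in plain Java, not the Digester substitutor API and not the change I'll be posting; AntStyleExpander is a made-up name.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of ${name} expansion; not Digester or Ant code.
class AntStyleExpander {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replaces each ${name} with its value; unknown names are left untouched. */
    static String expand(String text, Map<String, String> props) {
        Matcher m = VAR.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = props.get(m.group(1));
            String replacement = (value != null) ? value : m.group(0);
            // quoteReplacement keeps '$' and '\' in values from being misread.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Leaving an undefined ${name} in place mirrors what Ant does with unset properties.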






[digester] Are performance improvements wanted?

2004-09-10 Thread Reid Pinchback

I just finished a project where I had to do a fair bit
of performance tuning work over the last year.  I was
looking through the current digester source, and even
without torquing the code weirdly or changing class
APIs I've seen places that could probably be made
faster.

1) Would folks be interested in digester performance
fixes?  No point in my wasting time on them if, for
example, some major re-write is underway.

2) What would be the preferred way of submitting them?
 I was thinking of submitting a tweaked class as an
enhancement request with an attached patch and maybe a
unit test that measured both the old and new code. 
People could use the test to try the changes on other
platforms (I'd only be testing on some Win32 sdk
versions, but the fixes I have in mind should either
help or at least do no harm on other platforms).  

How much of a gain people would see in real use of
course would depend on what they were doing; I'm
expecting these fixes to matter more in situations
where digesters would run frequently (e.g. SOAP) and
developers have, where feasible, already dealt with
the obvious (factoring out rule+parser factory+parser
instantiations).

Thanks

 Reid




