Re: [digester] Are performance improvements wanted?
On 12 Sep 2004, at 15:46, Phil Steitz wrote:

>> i've been thinking about the problem of proving performance improvements by using unit tests for a while now. i'd really like to be able to create reports about the current performance of library code. maybe it'd be possible to use some kind of normalization to eliminate (or at least reduce) platform specific differences. i'd be interested to hear comments from other folks about this (or ideally, hear about a tool out there which does this ;)
>
> I have not personally used it, but JUnitPerf http://www.clarkware.com/software/JUnitPerf.html looks like it is designed to measure performance changes in unit tests. It is BSD-licensed.
>
> The approach used in o.a.c.beanutils.BeanUtilsBenchCase -- creating a separate microbenchmarks test case with timing included -- could probably also be applied to [digester] and other commons components.
>
> I have no clue how one would go about eliminating platform-specific differences. Could be the best we can do is make microbenchmark test suites available and set up a place where users can report results on different components for different platforms. The Wiki is a natural place to report things; but does it support forms well enough to organize the results?

i did take a quick look at JUnitPerf a while ago. i haven't been through it in detail, but although it looks like it would work well in a commercial situation running on a central continuous integration box, open source development needs something that can run on different platforms.

i wonder whether it would be possible to calibrate JVM performance using a series of tests and then use that rating to work out what the timings should be on different platforms.

- robert

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
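The calibration idea floated above could look roughly like the following. This is only a minimal sketch under stated assumptions: the class `CalibratedBenchmark`, the method names, and the choice of reference workload are all hypothetical, and nothing here comes from JUnitPerf or from the commons code base. It times a fixed reference workload and the code under test in the same JVM and reports the ratio, in the hope that the ratio varies less between machines than raw timings do.

```java
/** Hypothetical sketch: report a benchmark's cost relative to a fixed reference workload. */
class CalibratedBenchmark {

    /** Written to by the workloads so the JIT cannot discard their results. */
    static volatile long sink;

    /** Runs the task the given number of times and returns elapsed nanoseconds (never 0). */
    static long timeNanos(Runnable task, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            task.run();
        }
        return Math.max(1L, System.nanoTime() - start);
    }

    /** Assumed calibration workload: plain integer arithmetic, no allocation or I/O. */
    static void referenceWorkload() {
        long acc = 17;
        for (int i = 0; i < 10_000; i++) {
            acc = acc * 31 + i;
        }
        sink = acc;
    }

    /** Returns task time divided by reference time, both measured after a warm-up pass. */
    static double relativeCost(Runnable task, int iterations) {
        timeNanos(CalibratedBenchmark::referenceWorkload, iterations); // warm-up, discarded
        timeNanos(task, iterations);                                   // warm-up, discarded
        long reference = timeNanos(CalibratedBenchmark::referenceWorkload, iterations);
        long measured = timeNanos(task, iterations);
        return (double) measured / reference;
    }

    public static void main(String[] args) {
        // Stand-in for the code under test; a real suite would parse a document here.
        Runnable task = () -> sink = "a,b,c,d".split(",").length;
        System.out.printf("relative cost: %.3f%n", relativeCost(task, 2_000));
    }
}
```

A ratio like this cancels platform differences only to the extent that the reference workload stresses the same things as the code under test, and whether a single workload can do that is exactly the open question.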
Re: [digester] Are performance improvements wanted?
On 10 Sep 2004, at 18:18, Reid Pinchback wrote:

> I just finished a project where I had to do a fair bit of performance tuning work over the last year. I was looking through the current digester source, and even without torquing the code weirdly or changing class APIs I've seen places that could probably be made faster.
>
> 1) Would folks be interested in digester performance fixes? No point in my wasting time on them if, for example, some major re-write is underway.

though there's probably going to be a radical rewriting one day (digester2), i (for one) will be willing to review and apply patches to the digester one code stream for the foreseeable future. IMHO digester 1 is approaching feature completeness (at least, given the limits of backwards compatibility) and should continue to be maintained as a mature, stable, well tested library. looking at performance issues now seems appropriate (though it's not a particular itch of mine and i'm not likely to spearhead any comprehensive effort).

> 2) What would be the preferred way of submitting them? I was thinking of submitting a tweaked class as an enhancement request with an attached patch and maybe a unit test that measured both the old and new code. People could use the test to try the changes on other platforms (I'd only be testing on some Win32 sdk versions, but the fixes I have in mind should either help or at least do no harm on other platforms).

i've been thinking about the problem of proving performance improvements by using unit tests for a while now. i'd really like to be able to create reports about the current performance of library code. maybe it'd be possible to use some kind of normalization to eliminate (or at least reduce) platform specific differences. i'd be interested to hear comments from other folks about this (or ideally, hear about a tool out there which does this ;)

so, even if no tool exists (at the moment), it'd be great to have unit tests that demonstrate the performance improvement. that way, once a tool exists, we can just plug it straight in.

in terms of submitting patches, if you haven't taken a look already, read the standard stuff on submitting patches on the web site and attach them to bugzilla enhancements. (IIRC the lists now strip most attachments to limit stress caused by viruses.) you might like to post an email to the list explaining the changes and linking to the request (bugzilla messages often slip through my filters).

it's better to create many small requests (one per improvement) rather than one large one. it's hard to verify large patches and so they tend to get pushed down the priority list.

> How much of a gain people would see in real use of course would depend on what they were doing; I'm expecting these fixes to matter more in situations where digesters would run frequently (e.g. SOAP) and developers have, where feasible, already dealt with the obvious (factoring out rule+parser factory+parser instantiations).

i think that it'd be an excellent idea to collate the collective community knowledge about real life digester performance. the wiki (http://wiki.apache.org/jakarta-commons) seems like the right place for something like this. it'd be really great if you could pull something together on this.

- robert
Re: [digester] Are performance improvements wanted?
I won't repeat my previous comments re: JUnitPerf, but they apply here too.

Just looked at the bench case stuff. It looks decent, and is better suited to fast tests of small code fragments. Whether it is appropriate or not depends on what you are trying to achieve. If you want to be able to record measurements (e.g. in some historical performance file) and compare against that, the approach is fine. What I'm more concerned about right now is comparing, at more or less the same time, the timings of two pieces of code in the same environment. I'd like the test to know whether I've achieved an improvement or not.

On the issue of platform-specific differences, I agree, that is tough. The problem with posting numbers is that systems vary so much it's hard to draw conclusions. Even if somebody claimed to have similar hardware and O/S to yours, what would it tell you if their numbers were the same as, higher than, or lower than yours? Unfortunately, the data is from an experiment that is too uncontrolled to help a developer decide if a proposed code change is likely to be faster across multiple platforms.

If you are inclined to muse in the direction of random impractical thoughts, you could envision a small reference set of Java code fragments. Measure Digester performance in terms of the reference set. That performance number should be platform independent, while the actual results on any given platform would ultimately be determined by the raw performance of the reference set. That is essentially the technique used in a variety of numerical modeling, estimation, or optimization approaches. Definitely a pie-in-the-sky category of solution. Maybe put it on the Wiki for, oh, Digester 27.0. :-)

--- Phil Steitz [EMAIL PROTECTED] wrote:

> The approach used in o.a.c.beanutils.BeanUtilsBenchCase -- creating a separate microbenchmarks test case with timing included -- could probably also be applied to [digester] and other commons components.
>
> I have no clue how one would go about eliminating platform-specific differences.
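The same-environment comparison described in the message above might be approximated as follows. This is a hedged sketch, not the patch that was later submitted: `PairedTiming` and all of its method names are hypothetical, and the median-with-threshold rule is just one assumed way of letting a test decide whether an improvement was achieved. Old and new code are timed in alternating batches inside one JVM, so both see similar JIT and GC conditions.

```java
import java.util.Arrays;

/** Hypothetical sketch: time an old and a new implementation side by side in one JVM. */
class PairedTiming {

    /** Median of the samples (upper middle element for even counts). */
    static long median(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    /** Times one batch of calls in nanoseconds. */
    static long timeBatch(Runnable task, int calls) {
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            task.run();
        }
        return System.nanoTime() - start;
    }

    /** Alternates old and new batches; row 0 holds old timings, row 1 holds new timings. */
    static long[][] sample(Runnable oldCode, Runnable newCode, int rounds, int calls) {
        long[][] samples = new long[2][rounds];
        timeBatch(oldCode, calls); // warm-up, discarded
        timeBatch(newCode, calls); // warm-up, discarded
        for (int r = 0; r < rounds; r++) {
            samples[0][r] = timeBatch(oldCode, calls);
            samples[1][r] = timeBatch(newCode, calls);
        }
        return samples;
    }

    /** True if the new median is at least minGain (0.10 means 10%) below the old median. */
    static boolean isImprovement(long[] oldSamples, long[] newSamples, double minGain) {
        return median(newSamples) <= median(oldSamples) * (1.0 - minGain);
    }

    public static void main(String[] args) {
        // Placeholder workloads; a real test would exercise the old and new code paths.
        Runnable oldCode = () -> new StringBuffer("a").append("b").toString();
        Runnable newCode = () -> new StringBuilder("a").append("b").toString();
        long[][] s = sample(oldCode, newCode, 21, 50_000);
        System.out.println("old median ns: " + median(s[0]));
        System.out.println("new median ns: " + median(s[1]));
        System.out.println("improvement of 10% or more: " + isImprovement(s[0], s[1], 0.10));
    }
}
```

The median is used here because a single GC pause can distort a mean; the threshold guards against declaring a win on noise alone. Neither choice comes from the thread.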
Re: [digester] Are performance improvements wanted?
On Mon, 2004-09-13 at 08:38, Reid Pinchback wrote:

> The first performance-related patch I'll submit shows how I approximate this. Mostly I try to minimize how much JIT, GC, and differences in inheritance hierarchy depth can distort the comparison. The case I've put together measures the impact of handling logger initialization statically in the Digester class. Not a big win, obviously, but an easy example of the approach. Besides, cutting constructor cost in half is never bad.

Hi Reid,

I'm also interested in seeing performance patches. It's great to hear you're working on this topic.

You should be warned, though, that the logging area is particularly tricky. From what I remember, there is a requirement that frameworks which use digester (eg j2ee app servers) must be able to direct logging output to different destinations depending on which app the framework is running the digester on behalf of.

There's some email discussion about logging in digester from about a year back that goes into this in some depth; I was not happy with the way logging worked in Digester but after Craig explained why it was the way it was, and what the requirements were, I was not able to find a better way to organise logging while satisfying the original requirements.

I'm not saying there *isn't* a way to improve digester logging, just that it is probably necessary to read that email thread first to be sure the improvements still satisfy the requirements as described by Craig.

[of course these requirements should really be coded as unit tests so that required behaviour *can't* be changed without unit test failures]

I'm certain, however, that there are a number of other places where optimisations are available, and look forward to seeing some improvements.

Regards,
Simon
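To make the trade-off discussed above concrete, here is a minimal sketch of the two initialization styles. It is not the Digester source and not the submitted patch: `LoggerInitSketch`, its nested classes, and the `lookup` helper are hypothetical, and `java.util.logging` stands in for whatever logging facade the real class uses. The counter only makes visible how often a logger lookup runs in each style.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Logger;

/** Hypothetical sketch: per-instance versus static logger initialization. */
class LoggerInitSketch {

    /** Counts logger lookups so the difference between the two styles is observable. */
    static final AtomicInteger lookups = new AtomicInteger();

    /** Stand-in for the logging facade's lookup call. */
    static Logger lookup(String name) {
        lookups.incrementAndGet();
        return Logger.getLogger(name);
    }

    /** Per-instance style: every constructor call performs a lookup. */
    static class PerInstanceParser {
        final Logger log = lookup("sketch.parser");
    }

    /** Static style: a single lookup when the class is first initialized. */
    static class StaticParser {
        static final Logger LOG = lookup("sketch.parser");
    }

    /** Runs the constructor n times and returns how many lookups that caused. */
    static int lookupsFor(Runnable construct, int n) {
        int before = lookups.get();
        for (int i = 0; i < n; i++) {
            construct.run();
        }
        return lookups.get() - before;
    }

    public static void main(String[] args) {
        System.out.println("per-instance lookups: " + lookupsFor(PerInstanceParser::new, 1000));
        System.out.println("static lookups: " + lookupsFor(StaticParser::new, 1000));
    }
}
```

The static form saves work per constructor call, but it also binds the logger once, when the class is initialized. If the lookup is supposed to depend on which application is currently running (one plausible reading of the requirement mentioned above), a per-instance lookup can honour that and a static one cannot.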
Re: [digester] Are performance improvements wanted?
--- Simon Kitching [EMAIL PROTECTED] wrote:

> You should be warned, though, that the logging area is particularly tricky.

Yup, I figured that could be the case. Before I even proposed this I'd already decided that I'd just float each change as a proposal, and just grin and bear it if there was something that made the change unwise. While you strive to create performance fixes that don't change behaviour at all, sometimes you run into cases where that isn't true. When that happens, folks have to decide if the change would be to something that mattered, or not.

> From what I remember, there is a requirement that frameworks which use digester (eg j2ee app servers) must be able to direct logging output to different destinations depending on which app the framework is running the digester on behalf of. ... I was not able to find a better way to organise logging while satisfying the original requirements. I'm not saying there *isn't* a way to improve digester logging, just that it is probably necessary to read that email thread first to be sure the improvements still satisfy the requirements as described by Craig.

Ok, I'll see if I can find anything archived about that. At a guess I bet it's something like the following:

- getLogger returns a reference to a logger
- Digester instances currently each have their own reference
- if you use that reference to change the logger behaviour for your Digester, do you change only your own logging, or everybody else's logging via the Digester/Digester.sax categories, and would sharing a static logger change that?

Can't say I've traced this kind of thing through log4j, but I'd have expected that changing the logger changed everybody's logging via the same category against the same repository. Could be I'm wrong. Normally I'd expect that if multiple clients needed different control of logging for the same category, they'd need to have their own repositories.

In any case, I'm not overly worried about winning on this particular change. It's the kind of thing that matters more during development than during execution: it's a measurable drag on running unit tests that instantiate Digester instances in loops, but not such a big deal in real-life Digester usage.

Not an issue for now, but for the future I'm particularly intrigued by some of the Wiki comments for Digester 2.0, and how it might be time to split out various areas of functionality. I think at that point you might have a chance to allow for some very serious performance improvements in areas that wouldn't be possible today without changing the API in undesirable ways. I think a lot of the circular dependencies between classes and packages that exist in Digester today are an early sign of interesting opportunities for a different approach.

Reid
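The question in the bulleted guess above can at least be illustrated with the JDK's own logging API. This sketch uses `java.util.logging`, not log4j or commons-logging, so it only shows the general pattern and says nothing definite about how those libraries or Digester behave. The class `SharedLoggerSketch` and the category name `sketch.digester` are hypothetical. In `java.util.logging`, looking up the same name twice yields the same logger object, so a level set through one reference is visible through the other.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch: two clients looking up the same logger name share one logger. */
class SharedLoggerSketch {

    /** Returns true if two lookups of the same name yield the same logger object. */
    static boolean sameInstance(String name) {
        Logger first = Logger.getLogger(name);
        Logger second = Logger.getLogger(name);
        return first == second;
    }

    /** Sets a level through one reference and reads it back through another. */
    static Level levelSeenByOtherClient(String name, Level level) {
        Logger mine = Logger.getLogger(name);
        Logger theirs = Logger.getLogger(name);
        mine.setLevel(level);
        return theirs.getLevel();
    }

    public static void main(String[] args) {
        System.out.println("same instance: " + sameInstance("sketch.digester"));
        System.out.println("level seen by other client: "
                + levelSeenByOtherClient("sketch.digester", Level.FINE));
    }
}
```

If the real logging setup behaves the same way within one repository, then holding the reference per instance or statically would not by itself isolate one client's configuration from another's. Any isolation would have to come from which repository the lookup resolves against, which is the part that still needs checking against the archived discussion.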
[digester] Are performance improvements wanted?
I just finished a project where I had to do a fair bit of performance tuning work over the last year. I was looking through the current digester source, and even without torquing the code weirdly or changing class APIs I've seen places that could probably be made faster.

1) Would folks be interested in digester performance fixes? No point in my wasting time on them if, for example, some major re-write is underway.

2) What would be the preferred way of submitting them? I was thinking of submitting a tweaked class as an enhancement request with an attached patch and maybe a unit test that measured both the old and new code. People could use the test to try the changes on other platforms (I'd only be testing on some Win32 sdk versions, but the fixes I have in mind should either help or at least do no harm on other platforms).

How much of a gain people would see in real use of course would depend on what they were doing; I'm expecting these fixes to matter more in situations where digesters would run frequently (e.g. SOAP) and developers have, where feasible, already dealt with the obvious (factoring out rule+parser factory+parser instantiations).

Thanks
Reid