I wrote the following note to Jeffrey Friedl and figured I'd forward it to the list, since it should be of general interest. More tests are needed before deciding how to proceed with jakarta-oro development, but initial results suggest we're going to have to sacrifice performance on earlier JVMs in order to gain performance on the new ones. I suspect we can regain the performance lead in regular expression matching with some simple changes that take advantage of HotSpot's improvements instead of working around the deficiencies of earlier JITs.
------- Forwarded Message

I finally found some time to run some tests, and it looks like jakarta-oro loses all of its performance advantage to the overhead of converting a String to a char[] before doing the matching. You could help me confirm this by running your tests on plain char arrays rather than Strings.

My tests indicate that with the latest client HotSpot, the ratio between iterating over a char array with direct indexing and iterating over a String with charAt() is 2:3. With the server HotSpot it's 1:1, which means the server VM is properly inlining the charAt() calls. By contrast, with JDK 1.1 and 1.2 the ratio is 2:5, which is why it was such a win for OROMatcher back in those days. In those days, the overhead of converting to a char array before each iteration only reduced the ratio to 1:4, which was still a big win. However, with the latest HotSpot the situation flip-flops to a nasty 4:1. So, as I had suspected, what used to be a beneficial performance workaround has become a major penalty.

I haven't run tests against JDK 1.4's java.util.regex to isolate how much of the performance discrepancy is due to this simple problem, but I suspect it accounts for a large chunk.

Anyway, I just thought you might be interested.

------- End of Forwarded Message
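For anyone who wants to try the comparison described above on their own JVM, here is a rough Java micro-benchmark sketch. It is not the test code behind the ratios quoted in the note. The class name `CharIterationBench`, its methods, the 10,000-character input, and the repetition counts are all hypothetical choices. The sketch times three scans of the same text: indexing an existing char[], calling String.charAt() for each position, and calling String.toCharArray() before every scan, which is the conversion cost jakarta-oro pays. A naive timing loop like this is sensitive to JIT warm-up, GC, and VM flags such as -client versus -server, so treat its numbers as indicative only.

```java
import java.util.function.LongSupplier;

// Crude timing sketch (not a rigorous harness): compares three ways of
// scanning every character of the same text.
public class CharIterationBench {

    // Variant 1: direct indexing into a char[] that already exists.
    static long sumArray(char[] chars) {
        long sum = 0;
        for (int i = 0; i < chars.length; i++) {
            sum += chars[i];
        }
        return sum;
    }

    // Variant 2: one String.charAt() call per position.
    static long sumCharAt(String s) {
        long sum = 0;
        int len = s.length();
        for (int i = 0; i < len; i++) {
            sum += s.charAt(i);
        }
        return sum;
    }

    // Variant 3: copy the String into a fresh char[] first, then index the copy.
    // This mimics converting the input before every match.
    static long sumCopyThenIndex(String s) {
        return sumArray(s.toCharArray());
    }

    // Runs one variant `reps` times and returns the elapsed nanoseconds.
    static long time(LongSupplier scan, int reps) {
        long sink = 0;
        long start = System.nanoTime();
        for (int r = 0; r < reps; r++) {
            sink += scan.getAsLong();
        }
        long elapsed = System.nanoTime() - start;
        // Use the accumulated result so the JIT cannot discard the loop as dead code.
        if (sink == 42) {
            System.out.println("unlikely");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10_000; i++) {
            sb.append((char) ('a' + i % 26));
        }
        final String text = sb.toString();
        final char[] chars = text.toCharArray();
        final int reps = 10_000;

        // Warm-up passes give the JIT a chance to compile each variant.
        for (int w = 0; w < 3; w++) {
            time(() -> sumArray(chars), reps);
            time(() -> sumCharAt(text), reps);
            time(() -> sumCopyThenIndex(text), reps);
        }

        long tArray = time(() -> sumArray(chars), reps);
        long tCharAt = time(() -> sumCharAt(text), reps);
        long tCopy = time(() -> sumCopyThenIndex(text), reps);

        // Report each variant relative to plain char[] indexing.
        System.out.printf("char[] indexing        : 1.00%n");
        System.out.printf("String.charAt()        : %.2f%n", (double) tCharAt / tArray);
        System.out.printf("toCharArray() + index  : %.2f%n", (double) tCopy / tArray);
    }
}
```

Running it under different JVMs, or with -client and -server where those flags still apply, should show whether the copy-then-index variant falls behind charAt() in the way the note describes.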
