In message <[EMAIL PROTECTED]>, David Graham writes:
>Those benchmarks are in line with some I performed for Validator.
>Validator uses ORO but when I replaced it with Java 1.4 regexs I got a 2x
>speed improvement. ORO works well (if not slowly) for Validator.
Why oh why does no one ever donate their benchmark code to help build a better product? :) (I know you probably wrote some quick tests, so I'm making a general plea rather than one directed at you, David.) Seriously, it would provide an incentive to move out of maintenance mode. If ORO broke backward compatibility and took advantage of J2SE 1.4 features, I'm pretty sure the Perl stuff could match java.util.regex on average. But what's the point...

Which is really why I'm chiming in on this thread. Do Java regular expression users see ORO and Regexp mainly as vehicles for supporting pre-J2SE 1.4 code (and possibly J2ME; both can be made to work with J2ME with minor code changes)? Should they stay on the shelf in maintenance mode, or is there any reason to continue enhancing them? Even though there are a lot of directions they could go in, it doesn't seem like anyone has any itches left to scratch.

To answer the original question: if you need Perl expressions (including zero-width negative lookahead assertions), AWK expressions, or glob expressions, use ORO. If you need POSIX-like expressions, use Regexp. If you don't care, then pick some other criterion to make the decision, such as whichever you find easier to use.

Microbenchmarks like the one at http://tusker.org/regex/regex_benchmark.html are not very useful, because the performance of a regular expression library depends heavily on the patterns and input data used (unless those patterns and data are characteristic of what your application will use).
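To illustrate what "characteristic of your application" could mean in practice, here is a rough sketch of a configurable harness, assuming Java 7 or later and using only java.util.regex. RegexBench, Engine, JDK and Result are hypothetical names; they are not part of ORO, Regexp, or the tusker.org benchmark, and adapters for ORO or Regexp would have to be written against those libraries' own APIs. Each pattern is compiled once outside the timed loop, and timings are reported per input so you can substitute your own patterns and data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class RegexBench {

    /** Hypothetical adapter so other engines (e.g. ORO, Regexp) could be plugged in. */
    public interface Engine<P> {
        P compile(String regex);
        boolean contains(P compiled, String input);
    }

    /** Adapter for java.util.regex; contains() is true if any substring matches. */
    public static final Engine<Pattern> JDK = new Engine<Pattern>() {
        public Pattern compile(String regex) { return Pattern.compile(regex); }
        public boolean contains(Pattern p, String input) { return p.matcher(input).find(); }
    };

    /** Elapsed time and match count for one input string. */
    public static final class Result {
        public final String input;
        public final int matches;
        public final long elapsedNanos;

        Result(String input, int matches, long elapsedNanos) {
            this.input = input;
            this.matches = matches;
            this.elapsedNanos = elapsedNanos;
        }
    }

    /** Times `iterations` contains() calls per input, after one untimed warm-up pass. */
    public static <P> List<Result> run(Engine<P> engine, String regex,
                                       List<String> inputs, int iterations) {
        P compiled = engine.compile(regex); // compile once, outside the timed loop
        List<Result> results = new ArrayList<>();
        for (String input : inputs) {
            for (int i = 0; i < iterations; i++) {
                engine.contains(compiled, input); // warm-up pass, not timed
            }
            int matches = 0;
            long start = System.nanoTime();
            for (int i = 0; i < iterations; i++) {
                if (engine.contains(compiled, input)) {
                    matches++; // using the result helps keep the work from being optimized away
                }
            }
            results.add(new Result(input, matches, System.nanoTime() - start));
        }
        return results;
    }

    public static void main(String[] args) {
        // Pattern and inputs from the tusker.org benchmark run discussed in this message;
        // replace them with patterns and data typical of your own application.
        List<String> inputs = Arrays.asList(
                "http://www.linux.com/",
                "http://www.thelinuxshow.com/main.php3",
                "usd 1234.00",
                "he said she said he said no");
        long totalNanos = 0;
        for (Result r : run(JDK, "usd [+-]?[0-9]+.[0-9][0-9]", inputs, 10000)) {
            totalNanos += r.elapsedNanos;
            System.out.printf("%8.3f ms  %5d matches  '%s'%n",
                    r.elapsedNanos / 1e6, r.matches, r.input);
        }
        System.out.printf("total: %.3f ms%n", totalNanos / 1e6);
    }
}
```

Reporting per-input times rather than a single total makes it easier to see which pattern/input combinations a library handles poorly, which is the kind of use case a maintainer would want to isolate.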
For example, in the tusker.org benchmark, ORO beats java.util.regex on the second pattern when I run it:

------------------------------------------
Regular expression library: java.util.regex.Pattern
RE: usd [+-]?[0-9]+.[0-9][0-9]
  MS  MAX     AVG  MIN  DEV  INPUT
  27    1  0.0027    0    0  'http://www.linux.com/'
  61    1  0.0061    0    0  'http://www.thelinuxshow.com/main.php3'
 114    4  0.0114    0    0  'usd 1234.00'
 132    4  0.0132    0    0  'he said she said he said no'
------------------------------------------

------------------------------------------
Regular expression library: org.apache.oro.text.regex.Perl5Matcher
RE: usd [+-]?[0-9]+.[0-9][0-9]
  MS  MAX     AVG  MIN  DEV  INPUT
  18    1  0.0018    0    0  'http://www.linux.com/'
  35    1  0.0035    0    0  'http://www.thelinuxshow.com/main.php3'
  85    1  0.0087    0    0  'usd 1234.00'
 108    1  0.0116    0    0  'he said she said he said no'
------------------------------------------

The total time (which is what the benchmark uses) is 334 ms for java.util.regex and 246 ms for ORO. If you only ran that, you might conclude that ORO is 1.35X faster than java.util.regex. Nonetheless, I have no doubt that java.util.regex is on average faster on J2SE 1.4 than libraries that predate J2SE 1.4, and I trust David Graham's assessment with Validator. I'm just suggesting that you (Simon) be careful about this benchmark, because it uses a very limited number of patterns and inputs.

All that said, it would be great to have a configurable benchmark to test ORO and Regexp in order to isolate use cases where their performance can be improved. But does anybody care anymore?

daniel
