>Basically, the regexp package is smaller and has a reduced feature set.
>In fact, the regexp package jar file is less than half the size of the oro
>package jar.

The feeling that jakarta-oro is large is a common misconception.  The size
of what used to be OROMatcher is very small.  All you need for
regular expressions is the org.apache.oro.text.regex package, not all
of the other stuff.  To dispel this misconception, we're going to
provide a jakarta-oro jar that has everything and then separate jars for
strictly those slices that people want, roughly corresponding to the old
OROMatcher, PerlTools, AwkTools, and TextTools packages.

>Initially, regexp handles matching (and rejecting matches) more quickly. But,
>after a few hundred matches, the time required by the regexp package
>(especially in rejecting matches) increases considerably when compared to
>the oro package.

This is another misconception, although it is not directly related to
the regexp package.  The jakarta-oro package has 4 different regular
expression packages.  So when you compare performance, you have to
specify which one.  Also, a lot of times people talk about jakarta-oro
when they really mean the Perl5Util class, which is a convenience
wrapper around the org.apache.oro.text.regex package.  Perl5Util will
always be slow (although we can improve its performance) because it
does an additional, higher-level parsing step so that you can use Perl-specific
syntactic sugar like 's/foobar/barfoo/g' instead of the allegedly
more cumbersome approach of directly using the org.apache.oro.text.regex
classes.  Furthermore, most people blatantly misuse the
org.apache.oro.text.regex package by constantly reinstantiating
Perl5Compiler and Perl5Matcher instances and constantly recompiling
regular expressions.  Hopefully this will stop after we write a new
user's guide explaining how to make proper use of the package.

A valid performance comparison can only be made by posting the code used
to make the comparison.  I don't know how you reached the assessment you
made.  All performance evaluation code is welcome on oro-dev because
even though the primary goal for at least the Perl related stuff is to
achieve compatibility with Perl, the secondary goal is to be as fast
as possible within the constraints of Perl's regex syntax and Java's
runtime performance.
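
To make the point about recompiling concrete, here is a rough sketch of
the difference between compiling an expression on every use and compiling
it once and reusing it.  It is only an illustration under assumptions: it
uses java.util.regex as a stand-in so that it runs without the jakarta-oro
jar, and the class and method names (ReuseSketch, countRecompiling,
countReusing) are made up for this example.  With
org.apache.oro.text.regex the same idea would apply to keeping one
Perl5Compiler, one Perl5Matcher, and the compiled pattern around instead
of recreating them inside the loop.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; java.util.regex stands in for the ORO classes.
class ReuseSketch {

    // Misuse: the expression is recompiled and a new matcher is created
    // for every single input line.
    static int countRecompiling(String regex, String[] lines) {
        int hits = 0;
        for (String line : lines) {
            if (Pattern.compile(regex).matcher(line).find()) {
                hits++;
            }
        }
        return hits;
    }

    // Intended use: compile once, then reuse the compiled pattern and a
    // single matcher for all inputs.
    static int countReusing(String regex, String[] lines) {
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher("");
        int hits = 0;
        for (String line : lines) {
            if (matcher.reset(line).find()) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] lines = {"foobar", "barfoo", "no match here", "xx foobar xx"};
        System.out.println(countRecompiling("foo(bar)", lines));
        System.out.println(countReusing("foo(bar)", lines));
    }
}
```

Both methods return the same count; only the amount of repeated setup
work differs.  A benchmark that measures something like the first method
is mostly timing compilation, not matching.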

daniel

