Hi there,

This is a development question about how I might approach a particular
regular expression problem using the Jakarta ORO package.  Here we go...

Given an arbitrary string Foo and a set of regular expressions (patterns)
Bar, I'm looking for the most efficient way to find the subset of patterns
in Bar which Foo matches.  Optionally, I'd also like to be able to determine
the "best fit" pattern in that subset if it contains more than one item.

The trivial brute-force approach is to test Foo against each pattern in
turn, which costs O(n) match attempts for n patterns.  But I'm anticipating
having to deal with a relatively large set of patterns, so a linear scan on
every lookup is likely to be too slow.
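
For reference, here is a minimal sketch of that brute-force baseline.  It is
written against java.util.regex rather than ORO purely so that it is
self-contained; that substitution, the class name LinearScan and the sample
patterns are all assumptions for illustration.  With ORO the same loop would
presumably be built on patterns compiled once up front and a matcher applied
to each in turn.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: brute-force scan over a precompiled pattern set.
final class LinearScan {

    // Returns every pattern in 'bar' that matches the whole of 'foo',
    // preserving the order of 'bar'. One match attempt per pattern: O(n).
    static List<Pattern> matching(String foo, List<Pattern> bar) {
        List<Pattern> hits = new ArrayList<>();
        for (Pattern p : bar) {
            if (p.matcher(foo).matches()) {
                hits.add(p);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Sample patterns are made up for the example.
        List<Pattern> bar = Arrays.asList(
            Pattern.compile("/images/.*"),
            Pattern.compile("/images/.*\\.png"),
            Pattern.compile("/docs/.*"));
        for (Pattern p : matching("/images/logo.png", bar)) {
            System.out.println(p.pattern());
        }
    }
}
```

The patterns are compiled once and reused, so the cost per lookup is the n
match attempts themselves; this is the baseline I would like to improve on.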

Are there any features within Jakarta ORO which would allow me to implement
a more efficient solution?  Can anyone suggest how the components in ORO
might be combined/assembled to achieve something like this?

One other thing to bear in mind is that there will likely be a great deal
of overlap between the patterns in the set.  In essence, many of the patterns
will be progressively more specific "child" variants of a given general
"parent" pattern.  Therefore anything matching a child would necessarily
also match the parent.

Can ORO be used to recognize these "inheritance" (I'm sure there's a better
word!) relationships between patterns?  If so, I'm thinking I might be able
to construct some type of nested tree-like data structure: since a string
that fails a "parent" cannot match any of its "children", checking the
parents first would allow me to quickly eliminate most of the set.
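
To make the idea concrete, here is a rough sketch of the tree I have in mind.
It again uses java.util.regex as a stand-in for ORO, and it assumes the
parent/child links are declared by hand, because I don't know of a way to
have the library derive them.  The names PatternTree, Hit, child and bestFit
are hypothetical, and "best fit" is taken to mean the deepest matching node,
which is only one possible definition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical tree of patterns; each child is assumed to match a subset
// of what its parent matches (the caller is responsible for that).
final class PatternTree {

    // A matching pattern together with its depth in the tree.
    static final class Hit {
        final String regex;
        final int depth;
        Hit(String regex, int depth) { this.regex = regex; this.depth = depth; }
    }

    private final Pattern pattern;
    private final List<PatternTree> children = new ArrayList<>();

    PatternTree(String regex) { this.pattern = Pattern.compile(regex); }

    // Declares 'c' as a more specific variant of this pattern.
    PatternTree child(PatternTree c) { children.add(c); return this; }

    // Records this node and recurses only if this node matches; a failed
    // parent prunes its whole subtree without testing any descendant.
    private void collect(String input, int depth, List<Hit> out) {
        if (!pattern.matcher(input).matches()) {
            return;
        }
        out.add(new Hit(pattern.pattern(), depth));
        for (PatternTree c : children) {
            c.collect(input, depth + 1, out);
        }
    }

    // All matching patterns, parents listed before their children.
    static List<Hit> matching(String input, List<PatternTree> roots) {
        List<Hit> out = new ArrayList<>();
        for (PatternTree r : roots) {
            r.collect(input, 0, out);
        }
        return out;
    }

    // Assumed definition of "best fit": the deepest match, first one wins
    // on ties; null when nothing matches.
    static String bestFit(String input, List<PatternTree> roots) {
        Hit best = null;
        for (Hit h : matching(input, roots)) {
            if (best == null || h.depth > best.depth) {
                best = h;
            }
        }
        return best == null ? null : best.regex;
    }
}
```

The saving depends on how many roots fail: if Foo matches every pattern the
tree still tests all n of them, so this only helps when most of the general
"parent" patterns reject the input.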

Anyway, I'm babbling now.  Any responses, help or suggestions you can provide
are greatly appreciated.  Thanks in advance.


Regards,

Sasha Haghani
Brightspark
http://www.brightspark.com/
Toronto, Canada

E: [EMAIL PROTECTED]
T: 416.488.1999 x 241
F: 416.488.1988
