> Given an arbitrary string Foo and a set of regular 
> expressions or patterns
> Bar, I'm looking to for the most optimal way to find the 
> subset of patterns
> in Bar which Foo matches.  Optionally, I'd also like to be 
> able to determine
> the "best fit" pattern in the subset, if it has more than 1 item.
> 
> The trivial brute force approach obviously involves linearly 
> checking each
> pattern in the set with performance of O(n).  But I'm 
> anticipating having to
> deal with a relatively large set of patterns so this doesn't work.
> 
> Are there any features within Jakarta ORO which would allow 
> me to implement
> a more optimal solution?  Can anyone suggest how the 
> components in ORO might
> be combined/assembled to acheive something like this?
> 
> One other thing to bear in mind, is that there will likely be 
> a great deal
> overlap between the patterns in the set.  In essence many of 
> the patterns
> will be more and more specific "child" variants of a given 
> general "parent"
> pattern.   Therefore anything matching a child would 
> necessarily also match
> the parent.
> 
> Can ORO be used to recognize these "inheritance" (I'm sure 
> there's a better
> word!) relationships between patterns?  If so, I'm thinking I 
> might be able
> to construct some type of nested tree-like data-structure 
> whereby checking
> the "parents" will allow me to quickly eliminate most of the set.

No, there is not a general mechanism to do what you want.
If the set of patterns was strictly non-overlapping, you
could put them together with '|', but that is not your case.
Is it possible that the "parents" are non-overlapping?
Then you could pick out the parents in one pass and look
in the substrings for the children.

This would be a problem in Perl or any other Reg Ex package.
What you are talking about is beyond regular expressions.

Even if you can get it down to one scan, the performance is not
of O(n).  Reg Ex parsing is an NP-complete problem.
- Dan

Reply via email to