On Tue, Mar 09, 2004 at 01:43:05PM -0500, Ronald J Kimball wrote:
>
> I think I've got almost what you want. It correctly limits the length and
> number of repetitions (example 1) and handles overlapping occurrences of
> the pattern (example 2).
>
> The one flaw is that it will only match one possible substring starting at
> any point in the string. For example, if you have S = ABAB, P = [AB], A =
> 2, B = 4, C = 2, D = 4, you could match ABAB, ABA, and AB all at the start
> of the string, but my regex will only match ABAB (example 3). Perl won't
> match a 0-length substring more than once at the same position. I'm not
> aware of a simple way around that within the regex.
Here's a fairly simple way to avoid the flaw, but that breaks out of the
regex mold. Instead of returning the results from the match (potentially
one at a time), the regex in this solution pushes the matches directly onto
a results array.
The regex engine will find all the matches with a single execution of the
regex, because the regex uses (?!) to force backtracking even after what
would be a successful match.
With this code, example 3 gives all six possible matches instead of just
three.
#!/usr/local/bin/perl -w
use strict;
use re 'eval';
my @args = (
[qw/ ABCBAB [AB] 3 3 2 2 /],
[qw/ 12121212 1212 8 8 3 3 /],
[qw/ ABAB [AB] 2 4 2 4 /],
);
foreach my $args (@args) {
print "@$args\n";
my @matches = subpattern(@$args);
foreach my $match (@matches) {
print "@$match ",
substr($args->[0], $match->[0], $match->[1] - $match->[0]),
"\n";
}
print "\n";
}
sub subpattern {
my($string, $pattern, $minlen, $maxlen, $minrep, $maxrep) = @_;
my $not_pattern = "(?:(?!$pattern).)";
my $match_length =
"(?{(length\$1>=$minlen&&length\$1<=$maxlen)" .
"&&[EMAIL PROTECTED],[\$-[0],\$-[0]+length\$1]})";
my @matches;
my $re = "(?=($not_pattern*" .
"((?=$pattern).$not_pattern*){$minrep,$maxrep}" .
")$match_length(?!))";
warn "$re\n";
$string =~ /$re/;
return @matches;
}