On Tue, Mar 09, 2004 at 01:43:05PM -0500, Ronald J Kimball wrote:
> 
> I think I've got almost what you want.  It correctly limits the length and
> number of repetitions (example 1) and handles overlapping occurrences of
> the pattern (example 2).
> 
> The one flaw is that it will only match one possible substring starting at
> any point in the string.  For example, if you have S = ABAB, P = [AB], A =
> 2, B = 4, C = 2, D = 4, you could match ABAB, ABA, and AB all at the start
> of the string, but my regex will only match ABAB (example 3).  Perl won't
> match a 0-length substring more than once at the same position.  I'm not
> aware of a simple way around that within the regex.

Here's a fairly simple way to avoid the flaw, but that breaks out of the
regex mold.  Instead of returning the results from the match (potentially
one at a time), the regex in this solution pushes the matches directly onto
a results array.

The regex engine will find all the matches with a single execution of the
regex, because the regex uses (?!) to force backtracking even after what
would be a successful match.

With this code, example 3 gives all six possible matches instead of just
three.


#!/usr/local/bin/perl -w

use strict;
use re 'eval';

my @args = (
            [qw/ ABCBAB [AB] 3 3 2 2 /],
            [qw/ 12121212 1212 8 8 3 3 /],
            [qw/ ABAB [AB] 2 4 2 4 /],
           );

foreach my $args (@args) {
  print "@$args\n";
  my @matches = subpattern(@$args);
  foreach my $match (@matches) {
    print "@$match ",
          substr($args->[0], $match->[0], $match->[1] - $match->[0]),
          "\n";
  }
  print "\n";
}


sub subpattern {
  my($string, $pattern, $minlen, $maxlen, $minrep, $maxrep) = @_;

  my $not_pattern = "(?:(?!$pattern).)";
  my $match_length =
    "(?{(length\$1>=$minlen&&length\$1<=$maxlen)" .
    "&&[EMAIL PROTECTED],[\$-[0],\$-[0]+length\$1]})";

  my @matches;

  my $re = "(?=($not_pattern*" .
                "((?=$pattern).$not_pattern*){$minrep,$maxrep}" .
           ")$match_length(?!))";

  warn "$re\n";

  $string =~ /$re/;

  return @matches;
}

Reply via email to