On Wed, Jun 27, 2007 at 05:45:54AM -0700, Phil Carmody wrote:
> Say I had a string satisfying /^[A-Z_]{6}$/, but not equal to '______'
> and I wish to extract from that the 1 or 2 letters which are closest to
> the n-th character in the string. Is there a simple regexp to perform
> that task?
>
> e.g.
> if the string=A_Z_K_ then:
> if n=1, then I want 'A' (or 'AA', not fussed)
> if n=2, then I want 'AZ'
> if n=3, then I want 'Z' (or 'ZZ', not fussed)
> if n=4, then I want 'ZK'
> if n=5 or 6, then I want 'K' (or 'KK', not fussed)
>
> I can see how to do it with the concatenation of two matches from two
> substrs, but that's barely simpler than a naive loop over each character
> forwards and backwards.

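Well, I wouldn't exactly call this regex simple... But I have come up
with one that does it:

    for (qw/ A_Z_K_ A_____ _____K /) {
        print "$_\n";
        for my $n (1 .. 6) {
            my $r = $n - 1;    # number of characters before position $n
            print "$n: ";
            # First conditional: if there is a letter at or before position
            # $n, take the last such letter; otherwise fall back to .*
            # Second conditional: unless that letter sits exactly at
            # position $n, also capture the next letter (or end of string).
            /^(?(?=.{0,$r}[A-Z]).{0,$r}|.*)([A-Z])(?(?<!^..{$r}).*?([A-Z]|$))/
                && print "$1 $2";
            print "\n";
        }
    }

This has the advantage of always putting the matched characters in $1
and $2. (Note that $1 is always set; if there is no letter at or before
the position, $1 will contain a letter from after the position and $2
will be empty. Because the fallback .* is greedy, that letter is the
last one in the string, which is the first letter after the position
only when a single letter follows it.)

Here are two other approaches. Both use a non-greedy .*? in the final
fallback, so that case yields the first letter after the position:

    # Parenthesized because && binds tighter than ||; without the
    # parentheses the print would only run when the third match succeeds.
    (/^.{$r}([A-Z])/ || /^.{0,$r}([A-Z]).*?([A-Z]|$)/ || /^.*?([A-Z])/)
        && print "$1 $2";

is simpler, but uses three separate regular expressions.

    /^.{$r}([A-Z])|^.{0,$r}([A-Z]).*?([A-Z]|$)|^.*?([A-Z])/
        && print $1 || $4 || "$2 $3";

uses a single regular expression, but the results will be in $1, or in
$2 and $3, or in $4. (And if digits were allowed the print logic would
need to be modified, since a captured "0" is false and the || chain
would skip over it.)

Ronald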
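
P.S. If you want something to check any of the regexes against, here is
a minimal sketch of the naive loop mentioned in the question. The name
nearest_letters is hypothetical (it is not from this thread), and it
assumes "closest" means: the letter at position n if there is one,
otherwise the nearest letter on each side where one exists. That is only
my reading of the examples above.

```perl
use strict;
use warnings;

# Hypothetical reference helper: plain scan, no regex.
# $n is 1-based. Returns a string of one or two letters.
sub nearest_letters {
    my ($str, $n) = @_;
    my @c = split //, $str;
    my $i = $n - 1;

    # A letter exactly at position $n wins outright.
    return $c[$i] if $c[$i] ne '_';

    my @found;
    # Nearest letter to the left, if any.
    for (my $j = $i - 1; $j >= 0; $j--) {
        if ($c[$j] ne '_') { push @found, $c[$j]; last }
    }
    # Nearest letter to the right, if any.
    for (my $j = $i + 1; $j < @c; $j++) {
        if ($c[$j] ne '_') { push @found, $c[$j]; last }
    }
    return join '', @found;
}

for my $s (qw/ A_Z_K_ A_____ _____K /) {
    print "$s\n";
    print "$_: ", nearest_letters($s, $_), "\n" for 1 .. 6;
}
```

Comparing this output with the captures from a regex over more strings
than the three above (for example ones with several letters after the
position) should show quickly whether the two agree.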