On Thu, 26 Jul 2001, Birgit Kellner wrote:
>
> I'm thinking of coding a KWIC search through a text. The user chooses a 
> search string and a horizon, meaning that output is to contain $i words to 
> the left and to the right of the search string (if found).
> 
> This is the code I have so far:
> 
> my $text ="this is just some text I am making up as an example but it 
> should be a bit longer since I am testing this script and would like to use 
> larger numbers";
> print "your searchstring: ";
> chomp ($searchstring = <STDIN>);

I've only been on this list a little while, so I'm not sure if it's
bad etiquette to simply propose an alternative solution to your
problem, especially when you had a very specific question that's worth
an answer in its own right.  (Someone please flame me if it *is* bad
etiquette.)

But ... I never like to pass up an opportunity to use a regexp to
solve a large problem.  So here's a 5-line snippet that does your
whole task (it assumes $text, $searchstring and $i are already set):

-----
if ($text =~ m/((\w+ ){1,$i}$searchstring( \w+){0,$i})/) {
    print $1, "\n";
} else {
    print "not found\n";
}
-----

Here's what the regexp match is doing.  It returns true if it

  (a) first matches anywhere from 1 to $i words,
  (b) then matches $searchstring, and
  (c) then matches anywhere from 0 to $i additional words.

The $1 in the print statement is a special Perl variable that retains
everything in the first -- in this case, the outermost -- set of
parentheses in the regexp match, i.e. precisely the whole phrase
you're looking for.
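
For anyone who wants to try this right away, here is a minimal
self-contained sketch of the same pattern.  The sub name kwic_first
and the values "making" and 2 are assumptions picked only for
illustration; the sample text is the one from the question.

```perl
use strict;
use warnings;

# Sample text from the original question, joined onto one line.
my $text = "this is just some text I am making up as an example but it "
         . "should be a bit longer since I am testing this script and would "
         . "like to use larger numbers";

# Hypothetical helper (not part of the original post): returns the
# first keyword-in-context phrase, or undef when nothing matches.
sub kwic_first {
    my ($text, $searchstring, $i) = @_;
    # 1 to $i words, then the search string, then 0 to $i further words.
    if ($text =~ m/((\w+ ){1,$i}$searchstring( \w+){0,$i})/) {
        return $1;    # contents of the outermost parentheses
    }
    return;
}

my $phrase = kwic_first($text, "making", 2);
print defined $phrase ? "$phrase\n" : "not found\n";
```

With these values it should print "I am making up as": two words of
left context, the search string, and two words of right context.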

One limitation of this solution is that it won't match the very first
"this" because I asked for 1 to $i words before the $searchstring,
which (obviously) does not allow 0 words before the $searchstring.  
Don't ask me why I didn't use {0,$i} ... I tried it and it was giving
me bugs, so I came up with the next closest thing. :)

A hack <gasp> to solve this limitation is to provide an alternative
at the beginning:

  if ($text =~ m/((\b|(\w+ ){1,$i})$searchstring( \w+){0,$i})/) {

which means:

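EITHER
   match a word boundary (\b) directly before the $searchstring,
   with no words in front of it.  (Note that \b is a word boundary,
   not a start-of-string anchor; that would be ^.)
OR
   match any 1 to $i words before the $searchstring (as we were
   discussing above).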

This accomplishes exactly what I originally wanted to do with

  (\w+ ){0,$i}

in that it will match either 0 words or 1 word or 2 words, etc.,
before the $searchstring.
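
For comparison, here is a sketch of a variant that keeps {0,$i}
directly.  It is only one possible approach, and I can't say what
went wrong in the {0,$i} attempt mentioned above.  The assumptions
here are mine: \b on both sides so that only whole words match,
\Q...\E so that the search string is taken literally, and the sub
name kwic_bounded, which is made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: allows zero leading words and matches the
# search string only as a whole word.
sub kwic_bounded {
    my ($text, $searchstring, $i) = @_;
    # \b ... \b pins the search string to word boundaries,
    # \Q ... \E quotes any regexp metacharacters in it, and
    # (?: ... ) groups without capturing, so $1 is the whole phrase.
    if ($text =~ m/(\b(?:\w+ ){0,$i}\Q$searchstring\E\b(?: \w+){0,$i})/) {
        return $1;
    }
    return;
}

my $text = "this is just some text I am making up as an example";
print kwic_bounded($text, "this", 2), "\n";
print kwic_bounded($text, "is", 2), "\n";
```

The first call should print "this is just" (no words before the very
first "this"), and the second "this is just some".  One caveat: the
trailing \b assumes the search string ends in a word character.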

One final limitation of this approach is that you really don't have
much control over which occurrence of, say, the word "am" it will
match.  Without the /g modifier, the match stops at the first
(leftmost) occurrence it finds.  I don't know if that's a feature
that you were looking for in your original program.
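
If every occurrence is wanted, one option is to run the match with
the /g modifier in a while loop.  The sketch below is an assumption
about what might be useful rather than something from the original
question: the sub name kwic_all is made up, and the pattern adds
\b and \Q...\E to stay on whole words.

```perl
use strict;
use warnings;

# Hypothetical helper: collects every non-overlapping hit, left to right.
sub kwic_all {
    my ($text, $searchstring, $i) = @_;
    my @hits;
    # In scalar context, m//g resumes where the previous match ended,
    # so each pass through the loop finds the next occurrence.
    while ($text =~ m/(\b(?:\w+ ){0,$i}\Q$searchstring\E\b(?: \w+){0,$i})/g) {
        push @hits, $1;
    }
    return @hits;
}

my $text = "this is just some text I am making up as an example but it "
         . "should be a bit longer since I am testing this script and would "
         . "like to use larger numbers";
print "$_\n" for kwic_all($text, "am", 2);
```

For "am" with a horizon of 2 this should print "text I am making up"
and "since I am testing this".  Because each match consumes its
context words, two occurrences that sit closer together than the
horizon would not both get their full context.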

I hope this all helps and that it's not too complicated for inclusion
on this list.  (After all, I consider myself a beginner too.)

Daniel

