On Jul 13, Ing. Branislav Gerzo said:

Thomas Bätzler [TB], on Wednesday, July 13, 2005 at 12:38 (+0200) made
these points:

that - for example - begin and end with a "G"
TB>   while( $line =~ m/\W((g\w*?)|(\w*?g))\W/ig ){

this matches words that begin OR end with "g"; he wants AND, so change the regexp to:
m/\W((g)\w*\2)\W/ig

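Except that both of these regexes fail on the string "goulag", because \W has to MATCH a non-word character, and there is none before or after the word when it sits at the very start or end of the string. Use word boundaries (\b) instead; they are zero-width, so they also succeed at the edges of the string. And in your regex, Branislav, there's no reason to capture 'g' and use \2 later. It's not a variable pattern, so hard-code the 'g' both times.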
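To see the difference, here is a minimal sketch. The sub names are made up for this illustration, and the \W version drops the /g modifier because it only tests a single match:

```perl
use strict;
use warnings;

# Hypothetical helper: the \W version from the thread. \W must consume
# a real non-word character on each side of the word.
sub match_with_nonword {
    my ($line) = @_;
    return $line =~ m/\W((g)\w*\2)\W/i ? $1 : undef;
}

# Hypothetical helper: the same idea with \b, a zero-width assertion
# that also succeeds at the very start and end of the string.
sub match_with_boundary {
    my ($line) = @_;
    return $line =~ m/\b(g\w*g)\b/i ? $1 : undef;
}

for my $line ("goulag", " goulag ") {
    my $nonword  = match_with_nonword($line);
    my $boundary = match_with_boundary($line);
    printf "'%s': \\W version => %s, \\b version => %s\n",
        $line,
        defined($nonword)  ? $nonword  : 'no match',
        defined($boundary) ? $boundary : 'no match';
}
```

The \W version only finds "goulag" once the word is padded with a non-word character on both sides; the \b version finds it either way.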

Here's a "first draft":

  /\b(g\w*g)\b/i

That matches a "word" that starts with a 'g' (or a 'G', since the /i modifier makes the regex case-insensitive), then has zero or more word characters (a-zA-Z0-9_), and then ends with a 'g' (or 'G'). The \b on each side means the character immediately before the first 'g' and immediately after the last 'g' cannot be another word character, so while it matches "grog" in "do you like grog?", it doesn't match the "gogg" in "I'm wearing goggles".
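A short sketch of the first draft applied to those two sentences. The sub name g_words and the third test line are just for this example; the /g modifier is added so that every match in the line is collected:

```perl
use strict;
use warnings;

# Hypothetical helper: return every word in $line that both starts and
# ends with 'g' (case-insensitive), using the first-draft regex.
# In list context, m//g returns all the captured words.
sub g_words {
    my ($line) = @_;
    return $line =~ /\b(g\w*g)\b/ig;
}

for my $line ("do you like grog?", "I'm wearing goggles", "Gong or gag") {
    my @found = g_words($line);
    print "$line => ", (@found ? "@found" : "(none)"), "\n";
}
```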

However, this requires the word to be at least two characters long, so the single letter 'g' as a word (if you'd permit that) fails to match. An easy workaround is:

  /\b(g(?:\w*g)?)\b/i

which makes the ".....g" part optional (so long as the word boundaries still match).
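As a rough check of the workaround, here is a sketch with /i added so it stays case-insensitive like the first draft, and /g so it collects every match. The sub name and the sample lines are invented for this example:

```perl
use strict;
use warnings;

# Hypothetical helper: like the first draft, but the trailing "\w*g"
# part is optional, so a lone 'g' also counts as a word.
sub g_words_allowing_single {
    my ($line) = @_;
    return $line =~ /\b(g(?:\w*g)?)\b/ig;
}

for my $line ("the letter g", "G is for grog", "a good gig") {
    my @found = g_words_allowing_single($line);
    print "$line => ", (@found ? "@found" : "(none)"), "\n";
}
```

Note that "good" still fails: once the optional part is skipped, the closing \b has to match right after the first 'g', and it can't because an 'o' follows.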

The remaining problem is in the definition of a "word". What about "I got a grab-bag gift"? Should "grab-bag" match? It doesn't currently, because the '-' is not a valid word character (that is, it isn't matched by \w).
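If hyphenated compounds should count as one word, one possible approach is to swap \b for lookarounds over a character class that includes '-'. This is only a sketch under that assumption, not something settled in this thread, and the sub names are made up:

```perl
use strict;
use warnings;

# Hypothetical helper: the \b version, for comparison.
sub g_words_plain {
    my ($line) = @_;
    return $line =~ /\b(g(?:\w*g)?)\b/ig;
}

# Hypothetical helper: ASSUMES a "word" may contain '-' as well as \w.
# The lookbehind and lookahead play the role of \b for that wider
# definition: no word character or hyphen may touch either end.
sub g_words_with_hyphens {
    my ($line) = @_;
    return $line =~ /(?<![\w-])(g(?:[\w-]*g)?)(?![\w-])/ig;
}

my $line   = "I got a grab-bag gift";
my @plain  = g_words_plain($line);
my @hyphen = g_words_with_hyphens($line);
print "plain:  ", (@plain  ? "@plain"  : "(none)"), "\n";
print "hyphen: ", (@hyphen ? "@hyphen" : "(none)"), "\n";
```

The trade-off is that a hyphen now always belongs to the word, so "gag" in "gag-order" no longer matches on its own. Whether that is what you want depends on how you define a "word".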

--
Jeff "japhy" Pinyan         %  How can we ever be the sold short or
RPI Acacia Brother #734     %  the cheated, we who for every service
http://japhy.perlmonk.org/  %  have long ago been overpaid?
http://www.perlmonks.org/   %    -- Meister Eckhart