RE: pulling out "a","an", "the" from beginning of strings

Bob Showalter Wed, 25 Aug 2004 05:34:40 -0700

Errin Larsen wrote:
> Hey,
> 
> Ok, looking through this ... I'm confused.
> 
> << SNIP >>
> 
> > > 
> > > Perhaps:
> > > 
> > >    $scalar =~ s/^(a|an|the)\s*\b//i;
> > > 
> > > would work better.
> 
> <<SNIP>>
> 
> Is this capturing into $1 the a|an|the (yes)


Yes, but that's only a side effect. I'm not doing anything with $1.

> and the rest of the title
> into $2 (no?).

No.

>  After doing so, will it reverse the two ( i.e.
> s/^(a|an|the)\s+(.*)\b/$2, $1/i )?  

No.

> Also, what is the "\b"?

A word boundary assertion. See perldoc perlre.

>  it seems
> that the trailing "i" is for ignoring case; is that correct?

Yes.

It's not concerned with capturing anything; it's just matching a pattern and
then replacing the text matched with an empty string. The parens are used to
delimit the alternation a|an|the.
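
To illustrate the point about the parens, here is a minimal sketch (the sample title and variable names are made up, and the (?:...) form is not from the original suggestion). It shows that $1 gets set as a side effect, and that a non-capturing group gives the same result if you'd rather not capture at all:

```perl
use strict;
use warnings;

my $title = "The Old Man and the Sea";    # made-up sample title

# Capturing parens: $1 is set as a side effect, but the replacement ignores it.
(my $with_capture = $title) =~ s/^(a|an|the)\s*\b//i;
my $article = $1;    # holds the article that was matched

# Non-capturing group: same match, nothing stored in $1.
(my $without_capture = $title) =~ s/^(?:a|an|the)\s*\b//i;

print "removed article: $article\n";
print "with capture:    $with_capture\n";
print "without capture: $without_capture\n";
```

Both forms should leave "Old Man and the Sea"; the only difference is whether $1 is populated.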

What I'm trying to match is:

   ^           beginning of the string, followed by
   (a|an|the)  one of these sequences, followed by
   \s*         any amount of whitespace, followed by
   \b          a word boundary (see perldoc perlre)

The \s* is there so the whitespace following the leading word ("a", "an",
or "the") will be removed along with the word. The \b ensures that the end
of what we match is either at the start of a new word or at the end of the
string.
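
The same breakdown can be written into the regex itself with the /x modifier, which lets you add whitespace and comments to the pattern. This is only a sketch; the sub name strip_leading_article is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: returns a copy of the title with a leading
# "a", "an", or "the" (any case) and the following whitespace removed.
sub strip_leading_article {
    my ($title) = @_;
    $title =~ s{
        ^            # beginning of the string, followed by
        (a|an|the)   # one of these sequences, followed by
        \s*          # any amount of whitespace, followed by
        \b           # a word boundary
    }{}xi;
    return $title;
}

print strip_leading_article("An Unexpected Journey"), "\n";
print strip_leading_article("the"), "\n";
print strip_leading_article("Theory of Games"), "\n";
```

The first title should lose its "An ", the second should become empty, and the third should be left alone because there is no word boundary after "The" in "Theory".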

If I left off the \b, it would match the "a" in "acme", since \s* can match
the zero-length string between the "a" and the "c". With \b in there, the
match fails, because \b will not match between the "a" and the "c": both
are word characters, so there is no word boundary there.
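
A quick way to see the difference is to run both patterns side by side. This is a sketch with made-up helper names and sample titles; only the pattern with \b comes from the suggestion above:

```perl
use strict;
use warnings;

# Hypothetical helper names; only the first pattern comes from the post.
sub strip_with_boundary {
    my ($title) = @_;
    $title =~ s/^(a|an|the)\s*\b//i;
    return $title;
}

sub strip_without_boundary {
    my ($title) = @_;
    $title =~ s/^(a|an|the)\s*//i;    # same pattern minus the \b
    return $title;
}

for my $title ("acme", "Another Country", "A Tale of Two Cities") {
    printf "%-24s with \\b: %-24s without: %s\n",
        "'$title'",
        "'" . strip_with_boundary($title) . "'",
        "'" . strip_without_boundary($title) . "'";
}
```

With \b, "acme" and "Another Country" should come back unchanged; without it, each loses its leading "a". "A Tale of Two Cities" is stripped to "Tale of Two Cities" either way.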

An alternative to \s*\b would be \s+ (i.e. match at least one whitespace
char). However, this won't match a single-word title like "the", because \s+
needs at least one whitespace char after the word and there is none at the
end of the string, while \s*\b does match there. (How such a title
should be handled is up to the OP; if it should be left alone, then \s+
would be appropriate.)
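
Here is a small sketch comparing the two endings on a single-word title. The helper names and sample titles are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helpers: the two pattern endings discussed above.
sub strip_star_boundary {
    my ($title) = @_;
    $title =~ s/^(a|an|the)\s*\b//i;    # \s*\b: also matches a bare "the"
    return $title;
}

sub strip_plus {
    my ($title) = @_;
    $title =~ s/^(a|an|the)\s+//i;      # \s+: needs whitespace after the word
    return $title;
}

for my $title ("the", "The Hobbit", "theory") {
    printf "%-14s \\s*\\b gives %-12s \\s+ gives %s\n",
        "'$title'",
        "'" . strip_star_boundary($title) . "'",
        "'" . strip_plus($title) . "'";
}
```

The two should differ only on the bare "the": \s*\b reduces it to an empty string, while \s+ leaves it alone. Both strip "The Hobbit" to "Hobbit" and leave "theory" untouched.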

HTH

