Errin Larsen wrote:
> Hey,
>
> Ok, looking through this ... I'm confused.
>
> << SNIP >>
>
> > > Perhaps:
> > >
> > >    $scalar =~ s/^(a|an|the)\s*\b//i;
> > >
> > > would work better.
>
> <<SNIP>>
>
> Is this capturing into $1 the a|an|the (yes)

Yes, but that's only a side effect. I'm not doing anything with $1.

> and the rest of the title
> into $2 (no?).

No.

> After doing so, will it reverse the two ( i.e.
> s/^(a|an|the)\s+(.*)\b/$2, $1/i )?

No.

> Also, what is the "\b"?

A word boundary assertion. See perldoc perlre.

> it seems
> that the trailing "i" is for ignoring case; is that correct?

Yes.

The substitution isn't concerned with capturing anything; it just matches a
pattern and replaces the matched text with an empty string. The parens are
only there to delimit the alternation a|an|the.

What I'm trying to match is:

   ^            beginning of the string, followed by
   (a|an|the)   one of these sequences, followed by
   \s*          any amount of whitespace, followed by
   \b           a word boundary (see perldoc perlre)

The \s* is there so the whitespace following the leading word ("a", "an", or
"the") is removed along with the word.

The \b ensures that the end of what we match is either at the start of a new
word or at the end of the string. If I left off the \b, the pattern would
match the "a" in "acme", since \s* can match the zero-length string between
the "a" and the "c". With \b in there, the match fails, because \b will not
match between the "a" and the "c": both are word characters, so that position
is not a word boundary.

An alternative to \s*\b would be \s+ (i.e. match at least one whitespace
char). However, this won't match a single-word title like "the", because \s+
needs at least one whitespace character and there is none at the end of the
string, while \s*\b does match there. (How such a title should be handled is
up to the OP; if it should be left alone, then \s+ would be appropriate.)

HTH
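
For anyone who wants to try the two variants side by side, here is a small sketch. The helper names strip_article and strip_article_ws and the sample titles are made up for illustration; only the two patterns come from the discussion above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): strip a leading article
# using the \s*\b form of the pattern.
sub strip_article {
    my ($title) = @_;
    $title =~ s/^(a|an|the)\s*\b//i;
    return $title;
}

# Hypothetical helper: the \s+ alternative, which insists on at least
# one whitespace character after the article.
sub strip_article_ws {
    my ($title) = @_;
    $title =~ s/^(a|an|the)\s+//i;
    return $title;
}

# Print each sample title with the result of both variants in brackets.
for my $title ('The Matrix', 'An Apple', 'acme', 'the') {
    printf "%-12s => [%s] [%s]\n",
        "'$title'", strip_article($title), strip_article_ws($title);
}
```

The two variants should agree on the first three titles ("acme" is left alone by both) and differ only on the bare "the", which the \s*\b form reduces to an empty string and the \s+ form leaves untouched.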
