On Sep 8, 2011, at 10:13 AM, Rob Dixon wrote:

> my $string = 'The Kcl Group';
>
> $string =~ s/\b([aeiouy]{3,4}|[^aeiouy]{3,4})\b/\U$1/ig;
>
> print $string, "\n";

I'd like to revisit this, if I could. I've modified the above regex so that it doesn't capitalize ordinal numbers, but I've noticed that it produces incorrect output if a word has an apostrophe.

Given:

my $string = "rex's chicken on 51st st. at lkj";
$string =~ s/\b([aeiouy]{3,4}|[^aeiouy0123456789]{3,4})\b/uc($1)/eg;

the output is:

Rex'S Chicken on 51st ST. at LKJ

It should be:

Rex's Chicken on 51st St. at LKJ

I Googled and tried everything I found, but I can't fix it. Again, that line should capitalize 3- and 4-letter words that consist entirely of vowels or entirely of consonants. The code below, which I found, works great for capitalization except for that one regex, which throws a wrench into it.

Thanks,
Marc

-----------

# http://daringfireball.net/2008/08/title_case_update

use strict;
use warnings;
use utf8;
use open qw( :encoding(UTF-8) :std );

my @small_words = qw( (?<!q&)a an and as at(?!&t) but by en for if in of on or the to v[.]? via vs[.]? );
my $small_re = join '|', @small_words;

my $apos = qr/ (?: ['’] [[:lower:]]* )? /x;

my $string = "rex's chicken on 51st st at lkj";

$string =~ s{
    \b (_*) (?:
        ( [-_[:alpha:]]+ [@.:/] [-_[:alpha:]@.:/]+ $apos )  # URL, domain, or email
        |
        ( (?i: $small_re ) $apos )                          # or small word (case-insensitive)
        |
        ( [[:alpha:]] [[:lower:]'’()\[\]{}]* $apos )        # or word w/o internal caps
        |
        ( [[:alpha:]] [[:alpha:]'’()\[\]{}]* $apos )        # or some other word
    ) (_*) \b
}{
    $1 . (
        defined $2 ? $2         # preserve URL, domain, or email
      : defined $3 ? "\L$3"     # lowercase small word
      : defined $4 ? "\u\L$4"   # capitalize word w/o internal caps
      : $5                      # preserve other kinds of word
    ) . $6
}exgo;

$string =~
    # exceptions for small words: capitalize at start and end of title
    s{
        (  \A [[:punct:]]*          # start of title...
        |  [:.;?!][ ]+              # or of subsentence...
        |  [ ]['"“‘(\[][ ]*  )      # or of inserted subphrase...
        ( $small_re ) \b            # ... followed by small word
    }{$1\u\L$2}xigo;

$string =~ s{
    \b ( $small_re )        # small word...
    (?= [[:punct:]]* \Z     # ... at the end of the title...
    |   ['"’”)\]] [ ] )     # ... or of an inserted subphrase?
}{\u\L$1}xigo;

$string =~ s/\b([aeiouy]{3,4}|[^aeiouy0123456789]{3,4})\b/uc($1)/eg;

print "$string \n";
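
A possible explanation, offered as a sketch rather than a confirmed fix: the negated class [^aeiouy0123456789] matches any character that is not a lowercase vowel or a digit, which includes apostrophes, spaces, and periods. That makes "'s " and " St" each count as a three-character run sitting between two word boundaries, so both get uppercased. The version below assumes the intent is "three or four letters, all vowels or all consonants" and lists the consonants explicitly instead of negating the vowels. The helper name upcase_short_runs is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: uppercase whole words of 3 or 4 letters that are
# either all vowels (y counted as a vowel, as in the original regex) or
# all consonants. The explicit consonant class is an assumption about intent.
sub upcase_short_runs {
    my ($text) = @_;

    # Only letters can match here, so apostrophes, spaces, periods, and
    # digits can never be part of a run. /i is needed because the
    # title-casing pass has already turned "lkj" into "Lkj".
    $text =~ s/\b([aeiouy]{3,4}|[bcdfghjklmnpqrstvwxz]{3,4})\b/\U$1/gi;

    return $text;
}

# Input as it would look after the title-casing substitutions have run.
print upcase_short_runs("Rex's Chicken on 51st St. at Lkj"), "\n";
```

With this class, "St" is only two letters and "'s" only one, so neither qualifies, while "Lkj" still does. One caveat that carries over from the original approach: ordinary short words made only of consonants or only of vowels, such as "Mrs" or "you", would be uppercased as well.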