Stop the mail

On 10/7/11, Marc <[email protected]> wrote:
> On Sep 8, 2011, at 10:13 AM, Rob Dixon wrote:
>
>> my $string = 'The Kcl Group';
>>
>> $string =~ s/\b([aeiouy]{3,4}|[^aeiouy]{3,4})\b/\U$1/ig;
>>
>> print $string, "\n";
>
> I'd like to revisit this, if I could. I've modified the above regex so as
> not to capitalize ordinal numbers, however I've noticed that it produces
> incorrect output if the word has an apostrophe. Given:
>
>     my $string = "rex's chicken on 51st st. at lkj";
>     $string =~ s/\b([aeiouy]{3,4}|[^aeiouy0123456789]{3,4})\b/uc($1)/eg;
>
> the output is:
>     Rex'S Chicken on 51st ST. at LKJ
>
> It should be:
>     Rex's Chicken on 51st St. at LKJ
>
> I Googled and tried everything I'd found, but I can't fix it. Again, that
> line should capitalize 3 and 4 letter words that have either all vowels or
> all capitals. The code I found below works great for capitalization except
> for that one regex which throws a wrench into it.
>
> Thanks,
> Marc
>
> -----------
>
> # http://daringfireball.net/2008/08/title_case_update
>
> use strict;
> use warnings;
> use utf8;
> use open qw( :encoding(UTF-8) :std );
>
>
> my @small_words = qw( (?<!q&)a an and as at(?!&t) but by en for if in of on
>                       or the to v[.]? via vs[.]? );
> my $small_re = join '|', @small_words;
>
> my $apos = qr/ (?: ['’] [[:lower:]]* )? /x;
>
> my $string = "rex's chicken on 51st st at lkj";
>
> $string =~
>   s{
>     \b (_*) (?:
>       ( [-_[:alpha:]]+ [@.:/] [-_[:alpha:]@.:/]+ $apos )  # URL, domain, or email
>       |
>       ( (?i: $small_re ) $apos )                          # or small word (case-insensitive)
>       |
>       ( [[:alpha:]] [[:lower:]'’()\[\]{}]* $apos )        # or word w/o internal caps
>       |
>       ( [[:alpha:]] [[:alpha:]'’()\[\]{}]* $apos )        # or some other word
>     ) (_*) \b
>   }{
>     $1 . (
>       defined $2 ? $2         # preserve URL, domain, or email
>     : defined $3 ? "\L$3"     # lowercase small word
>     : defined $4 ? "\u\L$4"   # capitalize word w/o internal caps
>     : $5                      # preserve other kinds of word
>     ) . $6
>   }exgo;
>
> $string =~
>   # exceptions for small words: capitalize at start and end of title
>   s{
>     (  \A [[:punct:]]*         # start of title...
>     |  [:.;?!][ ]+             # or of subsentence...
>     |  [ ]['"“‘(\[][ ]*     )  # or of inserted subphrase...
>     ( $small_re ) \b           # ... followed by small word
>   }{
>     $1\u\L$2
>   }xigo;
>
> $string =~
>   s{
>     \b ( $small_re )       # small word...
>     (?= [[:punct:]]* \Z    # ... at the end of the title...
>     |   ['"’†)\]] [ ] )    # ... or of an inserted subphrase?
>   }{
>     \u\L$1
>   }xigo;
>
> $string =~ s/\b([aeiouy]{3,4}|[^aeiouy0123456789]{3,4})\b/uc($1)/eg;
>
> print "$string \n";
> print "$string \n";
>
>
> --
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
> http://learn.perl.org/
>
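
One possible way to approach the question quoted above, offered as a sketch and not as an answer that was given in this thread. The negated class [^aeiouy0123456789] matches any character that is not a lowercase vowel or a digit, so it also matches apostrophes, periods and spaces; that appears to be why "'s " and "St. " get upper-cased. Restricting the second alternative to consonant letters avoids this. The names $vowel and $consonant are assumptions of this sketch, and the input is assumed to be the string as it looks after the title-casing pass has already run.

```perl
use strict;
use warnings;

# Assumed input: the title as it stands after the title-casing pass.
my $string = "Rex's Chicken on 51st St. at Lkj";

# Letters only, so punctuation, spaces and digits can never be part of a match.
# Both names are hypothetical helpers introduced for this sketch.
my $vowel     = qr/[aeiouy]/i;
my $consonant = qr/[b-df-hj-np-tv-xz]/i;

# Upper-case whole words of 3 or 4 letters that are all vowels or all consonants.
# The (?:...) wrappers keep Perl from reading "$vowel{3,4}" as a hash lookup.
$string =~ s/\b((?:$vowel){3,4}|(?:$consonant){3,4})\b/\U$1/g;

print "$string\n";    # Rex's Chicken on 51st St. at LKJ
```

With a letter-only class, the "s" after the apostrophe is a one-letter run and "St" is a two-letter run, so neither reaches the three-letter minimum. Note that "y" is still counted as a vowel, as in the original regex, so a word such as "you" would still be upper-cased.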
--
Sent from my mobile device
