On Sep 8, 2011, at 10:13 AM, Rob Dixon wrote:

> my $string = 'The Kcl Group';
> 
> $string =~ s/\b([aeiouy]{3,4}|[^aeiouy]{3,4})\b/\U$1/ig;
> 
> print $string, "\n";

        I'd like to revisit this, if I could.  I've modified the above regex so 
as not to capitalize ordinal numbers; however, I've noticed that it produces 
incorrect output if a word contains an apostrophe.  Given:

my $string = "rex's chicken on 51st st. at lkj";
$string =~ s/\b([aeiouy]{3,4}|[^aeiouy0123456789]{3,4})\b/uc($1)/eg;

the output is:
Rex'S Chicken on 51st ST. at LKJ

It should be:
Rex's Chicken on 51st St. at LKJ
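
One way to see what goes wrong (a minimal diagnostic sketch, not part of the script below) is to list what the regex actually captures. The negated class [^aeiouy0123456789] accepts any character that is not a lowercase vowel or a digit, so apostrophes, spaces, and periods qualify too, and \b is satisfied at the edge of any run of word characters:

```perl
use strict;
use warnings;

my $string = "rex's chicken on 51st st. at lkj";

# Collect every capture the original pattern produces, without modifying $string.
my @hits = $string =~ /\b([aeiouy]{3,4}|[^aeiouy0123456789]{3,4})\b/g;

# Brackets make leading and trailing whitespace in each capture visible.
print "[$_]\n" for @hits;
```

This prints ['s ], [ st], and [ lkj]: each capture includes punctuation or whitespace, so the possessive "'s" is uppercased along with the space after it.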

        I searched and tried everything I found, but I can't fix it.  Again, 
that line should capitalize 3- and 4-letter words that consist entirely of 
vowels or entirely of consonants.  The code below, which I found, works great 
for capitalization except for that one regex, which throws a wrench into it.
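
A possible fix, sketched under the assumptions that only ASCII letters matter and that y counts as a vowel (as in the regex above): spell the consonants out in a positive character class instead of negating the vowels. The class can then never match an apostrophe, space, period, or digit. The input here is a hypothetical title-cased intermediate string, not the output of the full script:

```perl
use strict;
use warnings;

# Assumed intermediate value after title-casing (hypothetical input).
my $string = "Rex's Chicken on 51st St. at Lkj";

# Consonants are listed explicitly: b-d f-h j-n p-t v-x z (everything except aeiouy).
# /i makes both classes case-insensitive; \U uppercases the captured word.
(my $fixed = $string) =~ s/\b([aeiouy]{3,4}|[b-df-hj-np-tv-xz]{3,4})\b/\U$1/gi;

print "$fixed\n";
```

With this input the only match is "Lkj", so the line prints "Rex's Chicken on 51st St. at LKJ". The "s" after the apostrophe and the "St" before the period are too short to match, and "51st" is skipped because there is no word boundary between the digits and the letters.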

Thanks,
Marc

-----------

# http://daringfireball.net/2008/08/title_case_update

use strict;
use warnings;
use utf8;
use open qw( :encoding(UTF-8) :std );


my @small_words = qw(
        (?<!q&)a an and as at(?!&t) but by en for if in of on or
        the to v[.]? via vs[.]?
);
my $small_re = join '|', @small_words;

my $apos = qr/ (?: ['’] [[:lower:]]* )? /x;

my $string = "rex's chicken on 51st st at lkj";

$string =~
        s{
                \b (_*) (?:
                        ( [-_[:alpha:]]+ [@.:/] [-_[:alpha:]@.:/]+ $apos ) # URL, domain, or email
                        |
                        ( (?i: $small_re ) $apos )                         # or small word (case-insensitive)
                        |
                        ( [[:alpha:]] [[:lower:]'’()\[\]{}]* $apos )       # or word w/o internal caps
                        |
                        ( [[:alpha:]] [[:alpha:]'’()\[\]{}]* $apos )       # or some other word
                ) (_*) \b
        }{
                $1 . (
                  defined $2 ? $2         # preserve URL, domain, or email
                : defined $3 ? "\L$3"     # lowercase small word
                : defined $4 ? "\u\L$4"   # capitalize word w/o internal caps
                : $5                      # preserve other kinds of word
                ) . $6
        }exgo;

$string =~
        # exceptions for small words: capitalize at start and end of title
        s{
                (  \A [[:punct:]]*         # start of title...
                |  [:.;?!][ ]+             # or of subsentence...
                |  [ ]['"“‘(\[][ ]* )  # or of inserted subphrase...
                ( $small_re ) \b           # ... followed by small word
        }{$1\u\L$2}xigo;

$string =~
        s{
                \b ( $small_re )         # small word...
                (?= [[:punct:]]* \Z      # ... at the end of the title...
                |   ['"’”)\]] [ ] )   # ... or of an inserted subphrase?
        }{\u\L$1}xigo;

$string =~ s/\b([aeiouy]{3,4}|[^aeiouy0123456789]{3,4})\b/uc($1)/eg;

        print "$string \n";

