On Thu, 28 Apr 2011 10:06:58 -0700 (PDT) Frank Müller <pottw...@freenet.de> wrote:
> dear all,
> I'm trying to do some string replacements with Unicode::Collate which
> usually work very well, but these replacements seem to be case
> insensitive by default - how can I change this? look at this simple
> example:
>
> my $myCollator = Unicode::Collate->new( normalization => undef, level => 1 );
> my $str = "Camel camel donkey zebra came\x{301}l CAMEL horse cAmEL...";
> $myCollator->gsubst($str, "camel", sub { "#$_[0]#" });
>
> which makes the following replacements:
>
> #Camel# #camel# donkey zebra #camél# #CAMEL# horse #cAmEL#...
>
> what I would love to see is the following result:
>
> Camel #camel# donkey zebra #camél# CAMEL horse cAmEL...
>
> As there doesn't seem to be gsubst for case sensitive and gisubst for
> case insensitive string replacements, what would a solution look like?
>
> Thanks a lot for any suggestions,
> Frank

Since (level => 1) is not the default, you can also use (level => 3)
for case-sensitive matching. However, UCA treats an accent difference
(level 2) as more significant than a case difference (level 3), so
camél will not match camel at (level => 3).

  level 1: camel matches camél and Camel.
  level 2: camel matches Camel but not camél.
  level 3: camel matches neither Camel nor camél.

Even level 3 is not that strict: camel still matches "c-a-m-e-l",
"ca mel", etc., since a punctuation difference is level 4.

To make camel match camél but not Camel, some other workaround is
needed. In the next release, a new parameter (ignore_level2) will allow
it. (However, the behavior of ignore_level2 is quite different from the
so-called caseLevel in UCA etc.)

Regards,
SADAHIRO Tomoyuki
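
For illustration only, here is a minimal sketch of the level behavior described in this reply, followed by one possible workaround for current releases. The workaround is an assumption and is not taken from the message: it matches at level 1 and lets the replacement callback reject any match whose base letters differ from the pattern. The names $source, $pattern, $loose and $result are hypothetical.

```perl
use strict;
use warnings;
use Unicode::Collate;
use Unicode::Normalize qw(NFD);

binmode STDOUT, ':encoding(UTF-8)';

my $source = "Camel camel donkey zebra came\x{301}l CAMEL horse cAmEL...";

# Part 1: run the same substitution at collation levels 1 to 3
# to compare what each level treats as a match.
for my $level (1 .. 3) {
    my $collator = Unicode::Collate->new(
        normalization => undef,
        level         => $level,
    );
    my $str = $source;    # gsubst modifies its first argument in place
    $collator->gsubst($str, "camel", sub { "#$_[0]#" });
    print "level $level: $str\n";
}

# Part 2 (assumed workaround, not from the message): match loosely at
# level 1, then keep only matches whose letters, once combining marks
# are removed, equal the pattern exactly (including case).
my $pattern = "camel";
my $loose   = Unicode::Collate->new(normalization => undef, level => 1);
my $result  = $source;

$loose->gsubst($result, $pattern, sub {
    my $match = shift;
    my $bare  = NFD($match);    # decompose so accents become separate marks
    $bare =~ s/\p{M}+//g;       # drop the combining marks
    # Returning the match unchanged leaves that occurrence as it was.
    return $bare eq $pattern ? "#$match#" : $match;
});

print "workaround: $result\n";
```

The callback also leaves alone level-1 matches that contain punctuation, such as "c-a-m-e-l", because the stripped text no longer equals the pattern. Once ignore_level2 is available, combining it with (level => 3) would presumably make this callback unnecessary.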