On Thu, 28 Apr 2011 10:06:58 -0700 (PDT)
Frank Müller <pottw...@freenet.de> wrote:

> dear all,
> I'm trying to do some string replacements with Unicode::Collate which
> usually work very well, but these replacements seem to be case
> insensitive by default - how can I change this? look at this simple
> example:
> 
> my $myCollator = Unicode::Collate->new( normalization => undef, level => 1 );
> my $str = "Camel camel donkey zebra came\x{301}l CAMEL horse cAmEL...";
> $myCollator->gsubst($str, "camel", sub { "#$_[0]#" });
> 
> which makes the following replacements:
> 
> #Camel# #camel# donkey zebra #camél# #CAMEL# horse #cAmEL#...
> 
> what I would love to see is the following result:
> 
> Camel #camel# donkey zebra #camél# CAMEL horse cAmEL...
> 
> As there doesn't seem to be gsubst for case sensitive and gisubst for
> case insensitive string replacements, what would a solution look like?
> 
> Thanks a lot for any suggestions,
> Frank

Since (level => 1) is not the default, you can also specify (level => 3)
for case-sensitive matching.  However, UCA treats an accent difference
(level 2) as more significant than a case difference (level 3), so
camél will not match camel when (level => 3).

level 1: camel matches camél and Camel.
level 2: camel matches Camel but not camél.
level 3: camel matches neither Camel nor camél.
Even at level 3, matching is not fully strict:
  camel matches "c-a-m-e-l", "ca  mel", etc.,
  since punctuation and spacing differences are level 4.
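The table above can be checked with the collator's eq method.  This is
a minimal sketch, assuming a stock Unicode::Collate with its default
table and default variable weighting; %matches and $accented are
illustrative names only, not part of the module:

```perl
use strict;
use warnings;
use Unicode::Collate;

# "camél" spelled with a combining acute accent (U+0301), as in Frank's example.
my $accented = "came\x{301}l";

# $matches{$level}{$word} is 1 if "camel" compares equal to $word at that level.
my %matches;
for my $level (1, 2, 3) {
    my $collator = Unicode::Collate->new(level => $level, normalization => undef);
    for my $word ("Camel", $accented, "c-a-m-e-l", "ca  mel") {
        $matches{$level}{$word} = $collator->eq("camel", $word) ? 1 : 0;
    }
}

binmode STDOUT, ':encoding(UTF-8)';
for my $level (1, 2, 3) {
    my @hits = grep { $matches{$level}{$_} } sort keys %{ $matches{$level} };
    printf "level %d: camel matches %s\n", $level, join(', ', map { qq("$_") } @hits);
}
```

The hyphen and the spaces are "variable" characters, which the default
weighting pushes down to level 4, so they never affect levels 1 to 3.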

To make camel match camél but not Camel, another workaround is
needed.  In the next release, a new parameter (ignore_level2) will
allow this.  (However, the behavior of ignore_level2 is quite different
from the so-called caseLevel in UCA, etc.)
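With a release that accepts ignore_level2, Frank's example might look
like the sketch below.  It assumes the installed Unicode::Collate is
recent enough to support ignore_level2, and combines it with
(level => 3) so that case still counts while accents are ignored:

```perl
use strict;
use warnings;
use Unicode::Collate;

# ASSUMPTION: this Unicode::Collate version supports ignore_level2.
my $collator = Unicode::Collate->new(
    normalization => undef,
    level         => 3,   # case differences (level 3) stay significant
    ignore_level2 => 1,   # accent differences (level 2) are ignored
);

my $str = "Camel camel donkey zebra came\x{301}l CAMEL horse cAmEL...";

# Wrap every match of "camel" in '#' marks; $str is modified in place.
$collator->gsubst($str, "camel", sub { "#$_[0]#" });

binmode STDOUT, ':encoding(UTF-8)';
print "$str\n";
```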

Regards,
SADAHIRO Tomoyuki
