Larry Wall <[EMAIL PROTECTED]> writes:
On Wed, Feb 25, 2004 at 06:19:02PM +0100, Sebastian Lehmann wrote:
: For this example the search value will be "Ibañez". Because the search
: isn't case-sensitive, all letters should be uppercased, using the uc method.
I don't think this is your problem, but in general I think it's better to canonicalize with lc() because it will try to undo both uppercase and titlecase.
Since you are here ;-)
Why does ñ not uppercase to Ñ?
I am no Larry, but I think I can answer this: it is the old mess of 8-bit versus
Unicode. In the old world of 8-bit codepages, ñ upcases to Ñ only if toupper()
says so, which normally needs a "use locale" somewhere, and even then it doesn't work
unless your locale, as defined by your vendor, says so. In the new world of Unicode,
ñ upcases to Ñ if the string is Unicode. For example, this works for me in a UTF-8
terminal window:
$ perl -CO -le '$a=chr(0xD1).chr(256);$b=uc($a);print $b'
ÑĀ
I believe that as soon as the IO stream that Ibañez is coming from is marked as
UTF-8, the ñ will upcase as expected.
Which bits of which Unicode.org files are used by uc()?
pp_uc -> to_utf8_upper -> to_utf8_case, which uses the lib/unicore/To/Foo.pl files, which have been created from UnicodeData.txt.
--
Jarkko Hietaniemi <[EMAIL PROTECTED]> http://www.iki.fi/jhi/ "There is this special
biologist word we use for 'stable'. It is 'dead'." -- Jack Cohen