Larry Wall <[EMAIL PROTECTED]> writes:
On Wed, Feb 25, 2004 at 06:19:02PM +0100, Sebastian Lehmann wrote:
: For this example the search value will be "Ibañez". Because the search
: isn't case-sensitive, all letters should be uppercased, using the uc method.


I don't think this is your problem, but in general I think it's better
to canonicalize with lc() because it will try to undo both uppercase
and titlecase.

Since you are here ;-)


Why does ñ not uppercase to Ñ ?

I am no Larry but I think I can answer this: it is the old mess of 8-bit versus
Unicode. In the old world of 8-bit codepages the ñ upcases to Ñ only if toupper()
says so, which normally needs a "use locale" somewhere, and even then it doesn't work
unless your locale as defined by your vendor says so. In the new world of Unicode the
ñ upcases to Ñ if the string is Unicode. For example this works for me in a UTF-8
terminal window:


$ perl -CO -le '$a=chr(0xD1).chr(256);$b=uc($a);print $b'
ÑĀ
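To make the 8-bit versus Unicode difference concrete, here is a minimal sketch. It assumes no "use locale" and no "use feature 'unicode_strings'" is in effect; the names $bytes, $chars and hex_of are made up for illustration. The same character is uppercased once as a plain byte string and once after utf8::upgrade() has switched it to Perl's internal UTF-8 representation.

```perl
use strict;
use warnings;

# ñ (0xF1) held as a plain 8-bit string; without a locale, uc() only
# touches ASCII letters in such a string.
my $bytes = "\xF1";

# The same character after utf8::upgrade(), so uc() applies Unicode rules.
# (Assumption: no "use locale" and no "unicode_strings" feature is active.)
my $chars = $bytes;
utf8::upgrade($chars);

# Hypothetical helper: show code points instead of depending on the terminal.
sub hex_of {
    my ($str) = @_;
    return join ' ', map { sprintf('U+%04X', ord($_)) } split //, $str;
}

print "byte string:    ", hex_of(uc $bytes), "\n";   # U+00F1, unchanged
print "Unicode string: ", hex_of(uc $chars), "\n";   # U+00D1, uppercased
```

Printing code points rather than the characters themselves keeps the terminal encoding out of the picture.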

I believe that as soon as the IO stream that Ibañez is coming from is marked
UTF-8, the ñ will upcase as expected.
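As a rough sketch of what "marked UTF-8" could look like, assuming the data arrives as UTF-8 encoded bytes: the in-memory filehandle and the variable names below are only stand-ins for whatever the real input source is. Reading through an :encoding(UTF-8) layer yields a character string, and uc() then behaves as hoped.

```perl
use strict;
use warnings;

# UTF-8 encoded bytes for "Ibañez" (ñ is C3 B1), standing in for the
# real input; an in-memory filehandle keeps the example self-contained.
my $raw = "Iba\xC3\xB1ez";

# The :encoding(UTF-8) layer decodes the bytes into characters on read.
open(my $in, '<:encoding(UTF-8)', \$raw) or die "open failed: $!";
my $name = <$in>;
close $in;

my $upper = uc $name;

# Mark the output stream too, so the result is written as UTF-8.
binmode(STDOUT, ':encoding(UTF-8)');
print "$upper\n";   # IBAÑEZ in a UTF-8 terminal
```

For an already opened handle, binmode($fh, ':encoding(UTF-8)') should have the same effect as naming the layer in open().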


Which bits of which Unicode.org files are used by uc()?

pp_uc -> to_utf8_upper -> to_utf8_case, which uses the lib/unicore/To/Foo.pl files, which have been created from UnicodeData.txt.

--
Jarkko Hietaniemi <[EMAIL PROTECTED]> http://www.iki.fi/jhi/ "There is this special
biologist word we use for 'stable'. It is 'dead'." -- Jack Cohen




