# [EMAIL PROTECTED] / 2007-01-21 10:48:30 +0000:
> # [EMAIL PROTECTED] / 2007-01-21 00:11:13 +0100:
> > Roman Neuhauser wrote:
> > > # [EMAIL PROTECTED] / 2007-01-17 16:59:26 +0100:
> > >> wouldn't it be fair to assume (safety through paranoia) that
> > >> ctype_alnum() would suffer the same problem? (given the manual's
> > >> indication that ctype_alnum() and the offending regexp are equivalent?)
> > >
> > > isalnum(3) uses isalpha(3) and isdigit(3), so yes, their results are
> > > locale-dependent (LC_CTYPE, see setlocale(3)), but don't depend on
> > > collating sequence.
> >
> > so really the doc's are slightly misleading or even incorrect,
>
> Slightly, in a usually-behaves-as-described-but-for-different-reasons
> way.
>
> > as a side note: do you have any real world example of where this
> > collation issue might actually bite someone making use of the aforementioned
> > regexp range?
>
> Not off the top of my head. :(

Trying the Czech locale (I normally run with the environment shown
below), I've come across some unexpected behavior.

In ISO 8859-2, 0xE8 is c caron, which in Czech collates between c and d
(though apparently not on this computer), and 0xBE is z caron, which
collates just after z. I'd therefore expect [a-z] to match 0xE8 under
the Czech locale, but it does not.

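A minimal sketch of what I believe is going on, assuming PCRE (in its
non-UTF-8 mode) compares a range like [a-z] by raw byte value rather
than by the locale's collating sequence. The helper in_byte_range() is
hypothetical and exists only to spell that comparison out; it is not
part of PHP or PCRE.

```php
<?php
// Hypothetical helper (an assumption for illustration): a plain
// byte-value range check, standing in for how a class such as [a-z]
// appears to be evaluated.
function in_byte_range($ch, $lo, $hi)
{
    $code = ord($ch);
    return ord($lo) <= $code && $code <= ord($hi);
}

$c_caron = "\xE8"; // c caron in ISO 8859-2
$z_caron = "\xBE"; // z caron in ISO 8859-2

// 'a' is 0x61 and 'z' is 0x7A, so both bytes fall outside the range.
var_dump(in_byte_range($c_caron, 'a', 'z'));
var_dump(in_byte_range($z_caron, 'a', 'z'));

// The same comparison done by PCRE itself.
var_dump(preg_match('~[a-z]~', $c_caron));

// Widening the range by byte value does pick up 0xE8.
var_dump(preg_match('~[\x61-\xE8]~', $c_caron));
```

If that assumption holds, no LC_COLLATE setting would make [a-z] match
0xE8; only the character-class tests such as [[:lower:]] would follow
the locale.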
LANG=cs_CZ.ISO8859-2
LC_COLLATE=en_US.ISO8859-1
LC_CTYPE=en_US.ISO8859-1
LC_MESSAGES=en_US.ISO8859-1
LC_NUMERIC=en_US.ISO8859-1
LC_TIME=en_US.ISO8859-1
[EMAIL PROTECTED] ~/tmp/blemc 1042:0 > uname -srm
FreeBSD 6.1-PRERELEASE amd64
[EMAIL PROTECTED] ~/tmp/blemc 1043:0 > cat ./collseq.php
#!/usr/bin/env php
<?php
function f($c, $l)
{
    printf("char=%c locale=%s\n", $c, $l);
    setlocale(LC_COLLATE, $l);
    setlocale(LC_CTYPE, $l);
    printf("[a-z] = %s\n", var_export(preg_match('~[a-z]~', chr($c)), 1));
    printf("[[:lower:]] = %s\n", var_export(preg_match('~[[:lower:]]~', chr($c)), 1));
    printf("islower(3) = %s\n", var_export(ctype_lower(chr($c)), 1));
    print "\n";
}

f(0xE8, 'C'); f(0xE8, 'cs_CZ.ISO8859-2');
f(0xBE, 'C'); f(0xBE, 'cs_CZ.ISO8859-2');
[EMAIL PROTECTED] ~/tmp/blemc 1044:0 > ./collseq.php
char=č locale=C
[a-z] = 0
[[:lower:]] = 0
islower(3) = false

char=č locale=cs_CZ.ISO8859-2
[a-z] = 0
[[:lower:]] = 1
islower(3) = true

char=ž locale=C
[a-z] = 0
[[:lower:]] = 0
islower(3) = false

char=ž locale=cs_CZ.ISO8859-2
[a-z] = 0
[[:lower:]] = 1
islower(3) = true

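To see where a byte actually collates on a given machine, something
like the following could be run alongside collseq.php. It is only a
sketch: collates_between() is a hypothetical helper of mine, built on
PHP's strcoll() and setlocale(), and what it reports depends entirely on
the locale data installed on the system.

```php
<?php
// Hypothetical helper (not part of collseq.php): returns true when $ch
// collates strictly between $lo and $hi under the given LC_COLLATE
// locale, false when it does not, and null when the locale cannot be
// set on this system.
function collates_between($ch, $lo, $hi, $locale)
{
    $old = setlocale(LC_COLLATE, '0'); // '0' only queries the current setting
    if (setlocale(LC_COLLATE, $locale) === false) {
        return null;
    }
    $result = strcoll($lo, $ch) < 0 && strcoll($ch, $hi) < 0;
    setlocale(LC_COLLATE, $old);       // restore the previous setting
    return $result;
}

// In the C locale strcoll() compares byte values, so 0xE8 sorts after 'd'.
var_dump(collates_between("\xE8", 'c', 'd', 'C'));

// Under the Czech locale the answer depends on the installed locale data.
var_dump(collates_between("\xE8", 'c', 'd', 'cs_CZ.ISO8859-2'));
```

Whatever this prints for the Czech locale, the [a-z] results above
stay at 0, so the regexp range does not seem to consult the collating
sequence at all.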