Andrzej Krzysztofowicz <[EMAIL PROTECTED]> [30-11-2004 15:27]:
> Radoslaw Zielinski wrote:
>> Jakub Bogusz <[EMAIL PROTECTED]> [30-11-2004 13:35]:
>>> On Tue, Nov 30, 2004 at 01:26:50PM +0100, Radoslaw Zielinski wrote:
[...]
>>>> As this shouldn't be locale dependent, until the bug is properly
>>>> resolved,
>>> regex functionality in libc is locale-dependent (uses LC_COLLATE).
>> For [a-z]? It doesn't make sense... What does this class contain in
>> Asian locales then, nothing?
> Eg. for pl_PL it may contain 'ą' and some capital letters. For Asian locales
> it may contain either the same as for C or the same as for pl_PL. Each
> locale character set is a superset of ASCII.

Blah, right you are.  POSIX (9.3.5 RE Bracket Expression) states:

  7. In the POSIX locale, a range expression represents the set of
     collating elements that fall between two elements in the collation
     sequence, inclusive. In other locales, a range expression has
     unspecified behavior: strictly conforming applications shall not
     rely on whether the range expression is valid, or on the set of
     collating elements matched. A range expression shall be expressed
     as the starting point and the ending point separated by a hyphen
     ( '-' ).

So, as I understand, [a-z] may or may not contain a pink elephant, which
makes it pretty useless.  What was the point...?

And this is just sick:

  $ echo x | LC_ALL=et_EE grep '[a-z]'
  $ echo x | LC_ALL=et_EE grep '[a-x]'  # shouldn't be invalid?
  x
  $ echo x | LC_ALL=et_EE grep '[a-y]'  # no, "x" is somewhere...
  x
  $ echo z | LC_ALL=et_EE grep '[a-y]'  # this isn't ASCII order for sure.
  z

BTW, perl implements more logical behaviour:

  # LC_ALL is pl_PL.ISO-8859-2
  $ echo ą | perl -Mlocale -nle 'print if /[a-z]/'
  $ echo ą | perl -Mlocale -nle 'print if /[[:alpha:]]/'
  ą

-- 
Radosław Zieliński <[EMAIL PROTECTED]>
[ GPG key: http://radek.karnet.pl/ ]
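
For scripts that need [a-z] to mean plain ASCII order, one common stopgap
is to force the C locale for the grep call, since LC_ALL overrides
LC_COLLATE. A minimal sketch follows. The wrapper name ascii_grep is made
up for illustration, and this is not necessarily the fix the affected
script should get.

```shell
# Hypothetical helper (not from the thread): run grep with the C/POSIX
# locale forced, so that a range such as [a-z] follows ASCII order
# regardless of the caller's LC_ALL / LC_COLLATE settings.
ascii_grep() {
    LC_ALL=C grep "$@"
}

# 'x' lies between 'a' and 'z' in ASCII order, so this prints: x
echo x | ascii_grep '[a-z]'

# 'z' lies outside a..y in ASCII order, so grep matches nothing
# and the fallback message is printed instead.
echo z | ascii_grep '[a-y]' || echo 'no match'
```

Where locale-aware matching is actually wanted, a named class such as
[[:alpha:]] (as in the perl example above) avoids relying on range
semantics at all.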
_______________________________________________
pld-devel-en mailing list
[EMAIL PROTECTED]
http://lists.pld-linux.org/mailman/listinfo/pld-devel-en
