On 03/25/2013 05:47 AM, Jean-Marc Messina wrote:
> Hi
> 
> I hope I am reporting this bug the right way; if not, please accept my
> apologies and ignore this mail, as it's my first bug report.
> 

> We have been facing a weird behaviour of "grep -E" on Debian Squeeze
> versions which seems not to happen in lenny or wheezy versions.

The behavior you are seeing is locale-dependent.

> 
> Example:
> 
> echo "tanZANIE" | grep -E '^[a-z]{2,20}$'
> No output (normal behaviour)
> 
> echo "tanzANIE" | grep -E '^[a-z]{2,20}$'
> output : "tanzANIE"

You are probably running grep in a locale that has case-insensitive
collation, where the range [a-z] actually expands to [aAbB...yYz]
(but not Z).  For example, glibc's en_US.UTF-8 locale has that behavior.
That is why "tanzANIE" matches (every letter falls inside the expanded
range) while "tanZANIE" does not (Z is excluded).  POSIX leaves the
behavior of range expressions in regular expressions unspecified outside
of the C locale, precisely because of this rather confusing historical
behavior.

There is an effort underway to convert GNU tools to use Rational Range
Interpretation, where [a-z] will be forcibly translated to [abc...yz]
regardless of locale, even when libc would behave otherwise by default.
I'm not sure whether that conversion has reached the version of grep
that you are using, but it may explain part of the difference you are
seeing.  The other thing to do is to compare the output of 'locale'
on the machines that differ.
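
As a rough sketch of how to compare the machines, something like the
following could be run on each one alongside 'locale'.  The name
probe_range is just a hypothetical helper I made up for illustration,
not anything grep provides:

```shell
# probe_range is a hypothetical helper name, not a grep feature.
# It reports whether [a-z] matches an uppercase letter under the
# locale currently in effect.
probe_range() {
    if printf 'B\n' | grep -q '[a-z]'; then
        echo "locale-collated"
    else
        echo "ascii-only"
    fi
}

# Result under whatever locale this shell is using; compare it
# between machines together with the output of 'locale'.
probe_range

# Baseline in a subshell: the C locale treats [a-z] as ASCII
# lowercase only, so this should report "ascii-only".
(LC_ALL=C; export LC_ALL; probe_range)
```

If the first call reports different results on two machines, the
difference is in the locale settings or in how that grep build
interprets ranges, not in the pattern itself.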

Meanwhile, the only PORTABLE way to get the behavior you want is to
avoid range expressions outside of the C locale, by either spelling out
the range:

echo "tanzANIE" | grep -E '^[abcdefghijklmnopqrstuvwxyz]{2,20}$'

or by forcing the locale:

echo "tanzANIE" | LC_ALL=C grep -E '^[a-z]{2,20}$'
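
If the check is needed in a script, the locale override can be kept
local to the one grep call.  This is only a minimal sketch, and
is_lower_word is a hypothetical function name chosen for the example:

```shell
# is_lower_word is a hypothetical helper, not part of grep.
# Succeeds only if the argument is 2 to 20 ASCII lowercase letters.
is_lower_word() {
    # LC_ALL=C applies to this grep invocation only, so the rest
    # of the script keeps the user's locale.
    printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z]{2,20}$'
}

for word in tanzanie tanzANIE tanZANIE; do
    if is_lower_word "$word"; then
        echo "$word: match"
    else
        echo "$word: no match"
    fi
done
```

With the C locale forced, only "tanzanie" should be reported as a
match; both mixed-case inputs are rejected.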

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
