Bug#440813: grep handles character ranges incorrectly on some locales

Petteri Pajunen Tue, 04 Sep 2007 06:49:00 -0700

2007/9/4, Justin Pryzby <[EMAIL PROTECTED]>:
>
> tag 440813 moreinfo
> thanks
>
> On Tue, Sep 04, 2007 at 04:15:05PM +0300, Petteri Pajunen wrote:
> > Package: grep
> > Version: 2.5.1.ds2-6
> >
> > On my amd64 debian box using locale
> > fi_FI.UTF-8 (all LC_XXX variables set to fi_FI.UTF-8 as well as LANG) grep
> > fails to handle character ranges correctly. For example, the first grep is
> > wrong but the second works correctly:
> > $ echo "w" | grep "[a-z]"
> > $ echo "a" | grep "[a-z]"
> > a
> Are you sure this is not the normal problem of character ranges not
> being portable except within the C locale?  The common example is that
> [a-z] might actually mean aAbBcCdDeE...z, which is often not what's
> expected, since it both includes capital letters and misses "Z".  I
> don't know what fi locale looks like..



I am quite sure, since other users reported different results on 32-bit
systems, also using 2.5.1.ds2-6 and
fi_FI (unless fi_FI differs between i386 and amd64).

I agree that applications should not count on character ranges, but a-z is
probably used in many scripts (that's how I ran into this potential bug).
The Finnish alphabetical order is the common one, i.e.
"abcd.....tuvwxyz...". So v and w come after a and before z.
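
A quick way to see which letters are affected on a given box is to test each
one against the range. This is only a minimal sketch: check_range is a
hypothetical helper name, and what it prints depends entirely on the locale
and grep version in use.

```shell
# check_range is a hypothetical helper, not part of grep.
# It prints every ASCII lowercase letter that '[a-z]' fails to match
# under whatever locale is currently in effect.
check_range() {
    for c in a b c d e f g h i j k l m n o p q r s t u v w x y z; do
        printf '%s\n' "$c" | grep -q '[a-z]' || echo "not matched: $c"
    done
}

check_range
```

No output means every letter from a to z was matched by the range.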

The typical solution is to either use [[:alpha:]] or just use:
|LC_ALL=C grep '...'
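
Both workarounds can be sketched as follows. match_lower is a hypothetical
wrapper name used only for illustration. Note that [[:alpha:]] also matches
capital letters, so it is not an exact replacement for [a-z].

```shell
# match_lower is a hypothetical wrapper, not part of grep.
# LC_ALL=C forces byte-wise ranges for this one grep call,
# so [a-z] means exactly the ASCII letters a through z.
match_lower() {
    printf '%s\n' "$1" | LC_ALL=C grep '[a-z]'
}

match_lower "w"                        # prints: w
match_lower "W" || echo "no match"     # prints: no match

# Character class alternative: independent of collation order,
# but it accepts upper-case letters as well.
printf 'w\n' | grep '[[:alpha:]]'      # prints: w
```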

I have no problems with this solution, but I would bet that there are lots
of scripts using a-z ranges that do not expect letters such as v and w to be
outside of a-z. This can introduce subtle bugs, for example ignoring files
with the letters v and w in their names.

Petteri
