tag 440813 - moreinfo
thanks

On Tue, Sep 04, 2007 at 04:44:26PM +0300, Petteri Pajunen wrote:
> 2007/9/4, Justin Pryzby <[EMAIL PROTECTED]>:
> >
> > tag 440813 moreinfo
> > thanks
> >
> > On Tue, Sep 04, 2007 at 04:15:05PM +0300, Petteri Pajunen wrote:
> > > Package: grep
> > > Version: 2.5.1.ds2-6
> > >
> > > On my amd64 Debian box using locale
> > > fi_FI.UTF-8 (all LC_XXX variables set to fi_FI.UTF-8 as well as LANG) grep
> > > fails to handle character ranges correctly. For example, the first grep is
> > > wrong but the second works correctly:
> > > $ echo "w" | grep "[a-z]"
> > > $ echo "a" | grep "[a-z]"
> > > a
> > Are you sure this is not the normal problem of character ranges not
> > being portable except within the C locale?  The common example is that
> > [a-z] might actually mean aAbBcCdDeE...z, which is often not what's
> > expected, since it both includes capital letters and misses "Z".  I
> > don't know what fi locale looks like..
> 
> I am quite sure, since other users reported different results on
> 32-bit systems, also using 2.5.1.ds2-6 and fi_FI (unless fi_FI
> differs between i386 and amd64).
Ok, then I'm afraid I don't have any further insights..

> I agree that applications should not count on character ranges, but a-z is
> probably used in many scripts (that's how I ran into this potential bug).
> The Finnish alphabetical order is the common one, i.e.
> "abcd.....tuvwxyz...". So v and w come after a and before z.
> 
> > The typical solution is to either use [[:alpha:]] or just use:
> > |LC_ALL=C grep '...'
> 
> I have no problems with this solution, but I would bet that there are lots
> of scripts using a-z ranges and not expecting letters such as v and w being
> outside of a-z. This can introduce subtle bugs, for example ignoring files
> with letters v and w in their name etc.
It's a moderately unfortunate situation, but ISTR that it's specified
by standards so there's no going back.  Switching to UTF-8 is more
recent than those standards.
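
For scripts that currently rely on a-z, the two workarounds mentioned
above could look roughly like the sketch below.  The function names are
made up for illustration and are not part of grep.  Note that
[[:alpha:]] also matches capital letters, so [[:lower:]] is probably the
closer replacement for a-z; that substitution is my assumption, not
something tested on the reporter's system.

```shell
#!/bin/sh
# Sketch only: the helper names below are hypothetical, not part of grep.

# Match lines containing an ASCII lowercase letter. Forcing the C locale
# for this one command makes [a-z] mean exactly the bytes a through z.
match_ascii_lower() {
    LC_ALL=C grep '[a-z]'
}

# Match lines containing a lowercase letter as the current locale defines
# it, using a POSIX character class instead of a range.
match_class_lower() {
    grep '[[:lower:]]'
}

echo "w" | match_ascii_lower
echo "w" | match_class_lower
```

The first form ignores the user's locale entirely, which is usually what
a script wants; the second keeps locale awareness but avoids depending
on collation order.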

Justin

