tag 440813 - moreinfo
thanks

On Tue, Sep 04, 2007 at 04:44:26PM +0300, Petteri Pajunen wrote:
> 2007/9/4, Justin Pryzby <[EMAIL PROTECTED]>:
> >
> > tag 440813 moreinfo
> > thanks
> >
> > On Tue, Sep 04, 2007 at 04:15:05PM +0300, Petteri Pajunen wrote:
> > > Package: grep
> > > Version: 2.5.1.ds2-6
> > >
> > > On my amd64 debian box using locale
> > > fi_FI.UTF-8 (all LC_XXX variables set to fi_FI.UTF-8 as well as LANG) grep
> > > fails to handle character ranges correctly. For example, the first grep is
> > > wrong but the second works correctly:
> > > $ echo "w" | grep "[a-z]"
> > > $ echo "a" | grep "[a-z]"
> > > a
> > Are you sure this is not the normal problem of character ranges not
> > being portable except within the C locale? The common example is that
> > [a-z] might actually mean aAbBcCdDeE...z, which is often not what's
> > expected, since it both includes capital letters and misses "Z". I
> > don't know what fi locale looks like..
>
> I am quite sure, since other users reported different results on
> 32-bit systems, also using 2.5.1ds2-6 and fi_FI (unless fi_FI
> differs between i386 and amd64).
Ok, then I'm afraid I don't have any further insights..
> I agree that applications should not count on character ranges, but a-z is
> probably used in many scripts (that's how I ran into this potential bug).
> The Finnish alphabetical order is the common one, i.e.
> "abcd.....tuvwxyz...". So v and w come after a and before z.
>
> > The typical solution is to either use [[:alpha:]] or just use:
> > |LC_ALL=C grep '...'
>
> I have no problems with this solution, but I would bet that there are lots
> of scripts using a-z ranges and not expecting letters such as v and w being
> outside of a-z. This can introduce subtle bugs, for example ignoring files
> with letters v and w in their name etc.
It's a moderately unfortunate situation, but ISTR that it's specified
by standards so there's no going back. Switching to UTF is more
recent than those standards.

Justin
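
For anyone hitting the same thing in a script, the two workarounds mentioned in this thread can be sketched roughly as below. This is only an illustration, not a fix for the reported bug: the variable names are made up, and [[:lower:]] is used in place of [[:alpha:]] because it is the closer stand-in for a-z ([[:alpha:]] also matches capitals).

```shell
# Workaround 1: force the C locale, where a-z is the plain ASCII
# range of the 26 lowercase letters.
match_c=$(echo "w" | LC_ALL=C grep '[a-z]')

# Workaround 2: use a POSIX character class instead of a range, so the
# pattern does not depend on the locale's collation order.
match_class=$(echo "w" | grep '[[:lower:]]')

# In the C locale the range does not pick up capitals. grep exits with
# status 1 when nothing matches, so "|| true" keeps "set -e" scripts alive.
match_upper=$(echo "W" | LC_ALL=C grep '[a-z]' || true)

printf 'range: %s\nclass: %s\nupper: [%s]\n' "$match_c" "$match_class" "$match_upper"
```

Setting LC_ALL on the grep command alone keeps the rest of the script in the user's locale, which is usually what you want when only the pattern matching needs to be predictable.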