bug#6377: Subject: inaccurate character class processing

2010-06-08 Thread Iosif Fettich

Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS:  -DPROGRAM='bash' -DCONF_HOSTTYPE='x86_64' 
-DCONF_OSTYPE='linux-gnu' -DCONF_MACHTYPE='x86_64-unknown-linux-gnu' 
-DCONF_VENDOR='unknown' -DLOCALEDIR='/usr/local/share/locale' 
-DPACKAGE='bash' -DSHELL -DHAVE_CONFIG_H   -I.  -I. -I./include -I./lib 
-g -O2
uname output: Linux pony.netsoft.ro 2.6.32.12-115.fc12.x86_64 #1 SMP Fri 
Apr 30 19:46:25 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux

Machine Type: x86_64-unknown-linux-gnu

Bash Version: 4.1
Patch Level: 0
Release Status: release

Description:

(I'm not sure if this is a bash or a coreutils issue).

ls [A-Z]*

doesn't work as expected/documented.
I'd want/expect it to list the filenames starting with an uppercase 
letter.

Thank you for looking at it!


Repeat-By:
In an empty directory, create files like

touch a A b B z Z

Now,

ls [A-Z]*

outputs

A  b  B  z  Z

(why 'b' and 'z' - and/or where's 'a'...?!!)

and

ls [a-z]*

outputs

a  A  b  B  z

(why 'A' and 'B' - and/or where's 'Z'...?!!)









2010-06-08 Thread Pádraig Brady
tags 6377 + notabug

On 08/06/10 14:48, Iosif Fettich wrote:
> (I'm not sure if this is a bash or a coreutils issue).
>
> ls [A-Z]*
>
> doesn't work as expected/documented.

The logic is in bash, but it's not a bug.
Bash is using the collating sequence of your locale:

$ touch a A b B z Z
$ echo [A-Z]*
A b B z Z
$ export LANG=C
$ echo [A-Z]*
A B Z







2010-06-08 Thread Pierre Gaston
On Tue, Jun 8, 2010 at 4:48 PM, Iosif Fettich ifett...@netsoft.ro wrote:
...

        ls [a-z]*

        outputs

        a  A  b  B  z

        (why 'A' and 'B' - and/or where's 'Z'...?!!)


It's a classic problem with the locale: in some locale definitions the
range [a-z] also contains capital letters, i.e. a-z is aAbB...z (Z sorts
after z, so it falls outside the range).
As a workaround you can export LC_COLLATE=C, or maybe use [[:lower:]]
instead of [a-z].
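
A minimal sketch of the character-class workaround, reusing the file names from the original report. The scratch directory and the variable names (dir, upper, lower) are only for illustration, and this assumes a case-sensitive filesystem so that "a" and "A" are distinct files:

```shell
# Work in a scratch directory so the demo does not touch real files.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch a A b B z Z

# POSIX character classes match by character category rather than by a
# collation range, so the locale's sort order does not leak into them.
upper=$(echo [[:upper:]]*)
lower=$(echo [[:lower:]]*)

echo "upper: $upper"
echo "lower: $lower"
```

With the six files above, this should pick out only A, B and Z for the first pattern and only a, b and z for the second, whatever LC_COLLATE is set to.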






2010-06-08 Thread Greg Wooledge
On Tue, Jun 08, 2010 at 04:48:08PM +0300, Iosif Fettich wrote:
> ls [A-Z]*
>
> doesn't work as expected/documented.
> I'd want/expect it to list the filenames starting with an uppercase
> letter.

The results of this are dependent upon your locale.  If your locale is
set to C or POSIX, you will get what you expect.  If your locale is set
to something else (such as en_US.utf8) then you will get something
completely different.

I explain why this happens, on http://mywiki.wooledge.org/locale.

The glob in your command is expanded by bash (not ls), so in order to
get the results you want, your locale variables would have to be set to
C/POSIX *before* bash expands the glob.  In other words, LANG=C ls [A-Z]*
will not work: bash expands the glob under the current locale first, and
the assignment only ends up in the environment of ls.

This would work, although it's extremely awkward (IMHO):

  LANG=C bash -c 'ls [A-Z]*'

Another approach would be to permanently (or semi-permanently, e.g. just
for one shell session) set the LC_COLLATE variable.  Thus,

  export LC_COLLATE=C
  ls [A-Z]*

This will cause the ordering of glob results (and also of results generated
by ls itself, for example ls with no arguments, or ls dirname) to be
in ASCII order, without throwing away the other locale features.
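
If changing the locale for the whole session is too much, the same idea can be confined to a subshell. This is only a sketch: the scratch directory and the variable name c_upper are made up for the example, and it uses LC_ALL rather than LANG or LC_COLLATE as an assumption about what is most robust, since LC_ALL overrides the other two if they happen to be set.

```shell
# Work in a scratch directory so the demo does not touch real files.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch a A b B z Z

# The command substitution runs in a subshell. LC_ALL is assigned there
# as a separate command, so it is already in effect when the shell
# expands the glob on the next command. The calling shell's locale is
# left unchanged.
c_upper=$( LC_ALL=C; export LC_ALL; echo [A-Z]* )

echo "$c_upper"
```

The point of the separate assignment is the ordering: the locale is switched first, and only then is the pattern expanded, which is what the one-line LANG=C ls [A-Z]* form fails to do.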