Hey Fumiyasu,

On Wed, Jun 03, 2009 at 11:04:56PM +0900, SATOH Fumiyasu wrote:
> $ dpkg -l libc6 bash
> ...
> ii  bash             3.2-5            The GNU Bourne Again SHell
> ii  libc6            2.9-13           GNU C Library: Shared libraries
> $ mkdir tmp
> $ cd tmp
> $ touch a b c x y z A B C X Y Z
> $ LC_ALL=C /bin/bash --noprofile --norc -c 'echo [A-Z]'
> A B C X Y Z
> $ LC_ALL=ja_JP.UTF-8 /bin/bash --noprofile --norc -c 'echo [A-Z]'
> A B C X Y Z
> $ LC_ALL=en_US.UTF-8 /bin/bash --noprofile --norc -c 'echo [A-Z]'
> A b B c C x X y Y z Z

This behavior seems quite dangerous to me: the command “rm [A-Z]*” could
remove more than just files starting with an uppercase letter, which
most people probably would not expect.

The root of this issue is in bash-3.2/lib/glob/smatch.c:

    static int rangecmp (c1, c2)
         int c1, c2;
    {
      static char s1[2] = { ' ', '\0' };
      static char s2[2] = { ' ', '\0' };
      int ret;

      /* Eight bits only.  Period. */
      c1 &= 0xFF;
      c2 &= 0xFF;

      if (c1 == c2)
        return (0);

      s1[0] = c1;
      s2[0] = c2;

      if ((ret = strcoll (s1, s2)) != 0)
        return ret;
      return (c1 - c2);
    }

This function calls strcoll(), which is similar to strcmp() but
“compares two strings using the current locale”.  This allows things
like

    $ touch ö
    $ echo [o-p]
    ö

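to work, but it also causes the problem you described: in en_US.UTF-8
the lowercase letters b to z collate between “A” and “Z”, so [A-Z]
matches them as well.  Interestingly, the POSIX specification permits
this: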

    7. In the POSIX locale, a range expression represents the set of
    collating elements that fall between two elements in the collation
    sequence, inclusive. In other locales, a range expression has
    unspecified behavior: strictly conforming applications shall not
    rely on whether the range expression is valid, or on the set of
    collating elements matched.

     – http://www.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05
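
To make the effect of strcoll() on ranges concrete, here is a minimal
sketch of a range test based on collation next to one based on plain
byte values.  This is my own illustration, not code from bash; the
helper names in_range_coll and in_range_byte are made up for this
example.

```c
#include <string.h>

/* Hypothetical helper, not part of bash: returns 1 if c sorts between
   lo and hi (inclusive) according to the current locale's collation
   order, 0 otherwise. */
int in_range_coll(char lo, char c, char hi)
{
    char l[2] = { lo, '\0' };
    char m[2] = { c,  '\0' };
    char h[2] = { hi, '\0' };

    return strcoll(l, m) <= 0 && strcoll(m, h) <= 0;
}

/* The same test by byte value, independent of the locale. */
int in_range_byte(char lo, char c, char hi)
{
    return (unsigned char)lo <= (unsigned char)c
        && (unsigned char)c <= (unsigned char)hi;
}
```

In the C locale both helpers reject 'b' for the range A to Z.  After
setlocale(LC_COLLATE, "en_US.UTF-8"), I would expect
in_range_coll('A', 'b', 'Z') to return 1 on a system like the one
above, because that locale interleaves upper and lower case; the exact
result depends on the installed locale data.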

So, the bash maintainers may decide to use -DUSE_POSIX_GLOB_LIBRARY
(which is deprecated according to
http://lists.gnu.org/archive/html/bug-bash/2001-02/msg00032.html), patch
away the usage of strcoll, or leave everything as it is.
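
If strcoll() were patched away, the comparison could reduce to
something like the following.  This is only a sketch of the idea, not
an actual or proposed bash patch, and rangecmp_bytes is a hypothetical
name.

```c
/* Sketch only: compare two range endpoints by byte value, ignoring
   the locale's collation order. */
int rangecmp_bytes(int c1, int c2)
{
    /* Eight bits only, as in the original rangecmp(). */
    c1 &= 0xFF;
    c2 &= 0xFF;

    return c1 - c2;
}
```

The price would be that locale-aware ranges stop working: the [o-p]
example above would presumably no longer match ö.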

All the best,
-- 
Michael Schutte <mi...@uiae.at>
