2011/9/1 GNU bug Tracking System <[email protected]>:
> Your bug report
>
> #9418: case sensitivity buggy in sort
>
> which was filed against the coreutils package, has been closed.
>
> The explanation is attached below, along with your original report.
> If you require more details, please reply to [email protected].
>
> --
> 9418: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=9418
> GNU Bug Tracking System
> Contact [email protected] with problems
>
>
> ---------- Forwarded message ----------
> From: Eric Blake <[email protected]>
> To: "Michał Janke" <[email protected]>
> Date: Thu, 01 Sep 2011 10:32:45 -0600
> Subject: Re: bug#9418: case sensitivity buggy in sort
> tag 9418 notabug
> thanks
>
> On 09/01/2011 02:58 AM, Michał Janke wrote:
>>
>> sort (GNU coreutils) 8.12
>>
>> The case-sensitivity looks buggy in sort. Have a look at these examples:
>
> Thanks for the report. However, this is most likely due to your choice of
> locale, and not a bug in sort; this is a FAQ:
> https://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021
>
> Using 'sort --debug' will help expose the issue.
>
>> $ sort -k1,2 bbb
>> a B b 0
>> A b b 1
>> A B b 0
>
> $ sort --debug bbb -k1,2
> sort: using `en_US.UTF-8' sorting rules
> sort: leading blanks are significant in key 1; consider also specifying `b'
> a B b 0
> ___
> _______
> A b b 1
> ___
> _______
> A B b 0
> ___
> _______
> $ LC_ALL=C ../coreutils/src/sort --debug bbb -k1,2
> ../coreutils/src/sort: using simple byte comparison
> A B b 0
> ___
> _______
> A b b 1
> ___
> _______
> a B b 0
> ___
> _______
>
> See the difference? In the C locale, you get ascii sorting (A comes before B
> comes before a comes before b), in the en_US.UTF-8 locale, you get dictionary
> collation sorting (a comes before A comes before b comes before B).
>
> --
> Eric Blake [email protected] +1-801-349-2682
> Libvirt virtualization library http://libvirt.org
>
>
> ---------- Forwarded message ----------
> From: "Michał Janke" <[email protected]>
> To: [email protected]
> Date: Thu, 1 Sep 2011 10:58:58 +0200
> Subject: case sensitivity buggy in sort
> sort (GNU coreutils) 8.12
>
> The case-sensitivity looks buggy in sort. Have a look at these examples:
>
> $ cat bbb
> A B b 0
> a B b 0
> A b b 1
>
> $ sort bbb
> a B b 0
> A B b 0
> A b b 1
>
> $ sort -k1,2 bbb
> a B b 0
> A b b 1
> A B b 0
>
>
> $ cat ccc
> A 2 b 0
> a 2 b 0
> A 1 b 1
>
> $ sort ccc
> A 1 b 1
> a 2 b 0
> A 2 b 0
>
> $ sort -k1 ccc
> A 1 b 1
> a 2 b 0
> A 2 b 0
>
> $ sort -k1,2 ccc
> A 1 b 1
> a 2 b 0
> A 2 b 0
>
> $ sort -k1,1 ccc
> a 2 b 0
> A 1 b 1
> A 2 b 0
>
>
> $ cat ddd
> A2 b 0
> a2 b 0
> A1 b 1
>
> $ sort ddd
> A1 b 1
> a2 b 0
> A2 b 0
>
> $ sort -k1 ddd
> A1 b 1
> a2 b 0
> A2 b 0
>
> $ sort -k1,1 ddd
> A1 b 1
> a2 b 0
> A2 b 0
>
> $ sort -k1,2 ddd
> A1 b 1
> a2 b 0
> A2 b 0
>
> $ sort -k1,3 ddd
> A1 b 1
> a2 b 0
> A2 b 0
>
I definitely don't agree with the "locale issue" explanation. The problem
is not that some letter is treated as greater or less than another. The
problem is that it is _sometimes_ one way and sometimes the other!
Please have a closer look at this one:

$ cat aaa
aa 1
AA 1
Aa 0

Now consider what the output of sort should be in two cases: A>a and A<a.

If A>a, the result should be

aa 1
Aa 0
AA 1

If A<a, it should be

AA 1
Aa 0
aa 1

And now the actual result:

$ sort aaa
Aa 0
aa 1
AA 1

So the lines are sorted primarily according to the second column! True,
when the locale is changed to the native POSIX one, the sorting is done
reasonably:

$ LC_ALL=C sort aaa
AA 1
Aa 0
aa 1

So yes, the bug is visible only when a locale other than the standard
POSIX one is set, but _no_, the results under other locales are not
correct. Capital and lower-case letters seem to be simply aliased.
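
For anyone who wants to reproduce this, here is a minimal sketch, assuming GNU sort (the --debug option needs coreutils 8.6 or later) and a POSIX shell. The file name aaa is the one used above. Whether en_US.UTF-8 is installed on a given system is an assumption, so only the C-locale order is stated in the comments.

```shell
# Recreate the three-line test file from the example above.
printf 'aa 1\nAA 1\nAa 0\n' > aaa

# Plain byte comparison: every uppercase ASCII letter sorts before
# every lowercase one, so this prints AA 1, Aa 0, aa 1.
LC_ALL=C sort aaa

# Same input under a locale with its own collation rules. --debug
# reports on stderr which rules are in use and underlines the part
# of each line that is compared. The resulting order depends on the
# locale data installed on the system, so none is claimed here.
LC_ALL=en_US.UTF-8 sort --debug aaa

# Restrict the key to the first field, so the second column cannot
# influence the key comparison (the whole line is still used as a
# last-resort tie-breaker).
LC_ALL=en_US.UTF-8 sort --debug -k1,1 aaa
```

Comparing the underlined spans in the last two runs shows which part of each line takes part in the comparison in each case.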
