2011/9/2 Michał Janke <[email protected]>:
> 2011/9/1 GNU bug Tracking System <[email protected]>:
>> Your bug report
>>
>> #9418: case sensitivity buggy in sort
>>
>> which was filed against the coreutils package, has been closed.
>>
>> The explanation is attached below, along with your original report.
>> If you require more details, please reply to [email protected].
>>
>> --
>> 9418: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=9418
>> GNU Bug Tracking System
>> Contact [email protected] with problems
>>
>>
>> ---------- Forwarded message ----------
>> From: Eric Blake <[email protected]>
>> To: "Michał Janke" <[email protected]>
>> Date: Thu, 01 Sep 2011 10:32:45 -0600
>> Subject: Re: bug#9418: case sensitivity buggy in sort
>> tag 9418 notabug
>> thanks
>>
>> On 09/01/2011 02:58 AM, Michał Janke wrote:
>>>
>>> sort (GNU coreutils) 8.12
>>>
>>> The case-sensitivity looks buggy in sort. Have a look at these examples:
>>
>> Thanks for the report. However, this is most likely due to your choice of
>> locale, and not a bug in sort; this is a FAQ:
>> https://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021
>>
>> Using 'sort --debug' will help expose the issue.
>>
>>> $ sort -k1,2 bbb
>>> a B b 0
>>> A b b 1
>>> A B b 0
>>
>> $ sort --debug bbb -k1,2
>> sort: using `en_US.UTF-8' sorting rules
>> sort: leading blanks are significant in key 1; consider also specifying `b'
>> a B b 0
>> ___
>> _______
>> A b b 1
>> ___
>> _______
>> A B b 0
>> ___
>> _______
>> $ LC_ALL=C ../coreutils/src/sort --debug bbb -k1,2
>> ../coreutils/src/sort: using simple byte comparison
>> A B b 0
>> ___
>> _______
>> A b b 1
>> ___
>> _______
>> a B b 0
>> ___
>> _______
>>
>> See the difference? In the C locale, you get ascii sorting (A comes before
>> B comes before a comes before b), in the en_US.UTF-8 locale, you get
>> dictionary collation sorting (a comes before A comes before b comes before
>> B).
>>
>> --
>> Eric Blake [email protected] +1-801-349-2682
>> Libvirt virtualization library http://libvirt.org
>>
>>
>> ---------- Forwarded message ----------
>> From: "Michał Janke" <[email protected]>
>> To: [email protected]
>> Date: Thu, 1 Sep 2011 10:58:58 +0200
>> Subject: case sensitivity buggy in sort
>> sort (GNU coreutils) 8.12
>>
>> The case-sensitivity looks buggy in sort. Have a look at these examples:
>>
>> $ cat bbb
>> A B b 0
>> a B b 0
>> A b b 1
>>
>> $ sort bbb
>> a B b 0
>> A B b 0
>> A b b 1
>>
>> $ sort -k1,2 bbb
>> a B b 0
>> A b b 1
>> A B b 0
>>
>>
>> $ cat ccc
>> A 2 b 0
>> a 2 b 0
>> A 1 b 1
>>
>> $ sort ccc
>> A 1 b 1
>> a 2 b 0
>> A 2 b 0
>>
>> $ sort -k1 ccc
>> A 1 b 1
>> a 2 b 0
>> A 2 b 0
>>
>> $ sort -k1,2 ccc
>> A 1 b 1
>> a 2 b 0
>> A 2 b 0
>>
>> $ sort -k1,1 ccc
>> a 2 b 0
>> A 1 b 1
>> A 2 b 0
>>
>>
>> $ cat ddd
>> A2 b 0
>> a2 b 0
>> A1 b 1
>>
>> $ sort ddd
>> A1 b 1
>> a2 b 0
>> A2 b 0
>>
>> $ sort -k1 ddd
>> A1 b 1
>> a2 b 0
>> A2 b 0
>>
>> $ sort -k1,1 ddd
>> A1 b 1
>> a2 b 0
>> A2 b 0
>>
>> $ sort -k1,2 ddd
>> A1 b 1
>> a2 b 0
>> A2 b 0
>>
>> $ sort -k1,3 ddd
>> A1 b 1
>> a2 b 0
>> A2 b 0
>>
>
> I definitely don't agree with the "locale issue" explanation. This is not
> a problem of one letter being treated as > or < another; the problem
> is that it is _sometimes_ one way, sometimes the other!
> Please have a closer look at this one:
>
> $ cat aaa
> aa 1
> AA 1
> Aa 0
>
> Now consider what the output of sort should be in the two cases A>a and A<a.
> If A>a, the result should be
> aa 1
> Aa 0
> AA 1
>
> If A<a, it should be
> AA 1
> Aa 0
> aa 1
>
> And now the actual result:
>
> $ sort aaa
> Aa 0
> aa 1
> AA 1
>
> So the lines are sorted primarily by the second column!
>
> But true, when the locale is changed to the native POSIX one, the sorting
> is done reasonably:
>
> $ LC_ALL=C sort aaa
> AA 1
> Aa 0
> aa 1
>
> So yes, the bug is visible only with a locale other than the standard one,
> but _no_, the results under those other locales are not correct.
> The capital and lower-case letters seem to be simply aliased.
>
If it is the _locale_ that decides that upper- and lower-case letters are equal, then the bug is in the locale, because the results look absurd. Where should a bug report about the locale go?
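The "sorted by the second column" effect in the aaa example is consistent with multi-level collation. The following is a minimal Python sketch under a loudly stated assumption: it is a simplified model, not glibc's actual en_US.UTF-8 tables. In the model, blanks are ignored and case is folded at the first comparison level, and case is consulted only to break ties, lowercase first. All helper names (`collation_key`, `byte_key`, `field_key`, `model_sort`) are hypothetical. Under this model the aaa, bbb and ccc outputs quoted above come out exactly as reported, while plain byte comparison reproduces the `LC_ALL=C` results.

```python
from typing import Callable, Optional


def collation_key(s: str) -> tuple:
    # HYPOTHETICAL simplified model of a multi-level locale collation
    # (an assumption, not glibc's real en_US.UTF-8 tables):
    #   level 1: blanks are ignored and case is folded
    #   level 2: only if level 1 ties, compare case, lowercase (0) before uppercase (1)
    chars = [c for c in s if not c.isspace()]
    level1 = "".join(c.lower() for c in chars)
    level2 = tuple(1 if c.isupper() else 0 for c in chars)
    return (level1, level2)


def byte_key(s: str) -> bytes:
    # C/POSIX locale: plain byte-by-byte comparison.
    return s.encode("utf-8")


def field_key(line: str, start: int, end: Optional[int] = None) -> str:
    # Rough stand-in for `sort -kSTART,END`: fields START..END (1-based),
    # or START to end of line when END is None. Blank handling is simplified,
    # which is harmless for these single-space inputs.
    fields = line.split()
    return " ".join(fields[start - 1 : end])


def model_sort(lines: list, key: Callable = collation_key,
               start: Optional[int] = None, end: Optional[int] = None) -> list:
    # Compare the selected key first; when keys tie, fall back to comparing
    # whole lines, mirroring GNU sort's last-resort comparison.
    def sort_key(line: str) -> tuple:
        selected = line if start is None else field_key(line, start, end)
        return (key(selected), key(line))
    return sorted(lines, key=sort_key)


if __name__ == "__main__":
    aaa = ["aa 1", "AA 1", "Aa 0"]
    bbb = ["A B b 0", "a B b 0", "A b b 1"]
    ccc = ["A 2 b 0", "a 2 b 0", "A 1 b 1"]
    print(model_sort(aaa))                                 # ['Aa 0', 'aa 1', 'AA 1']
    print(model_sort(aaa, key=byte_key))                   # ['AA 1', 'Aa 0', 'aa 1']
    print(model_sort(bbb, start=1, end=2))                 # ['a B b 0', 'A b b 1', 'A B b 0']
    print(model_sort(bbb, key=byte_key, start=1, end=2))   # ['A B b 0', 'A b b 1', 'a B b 0']
    print(model_sort(ccc, start=1, end=1))                 # ['a 2 b 0', 'A 1 b 1', 'A 2 b 0']
```

In this model, "Aa 0" sorts first in aaa because its first-level key "aa0" beats "aa1"; case only decides between "aa 1" and "AA 1". The last-resort whole-line comparison is also what puts "A 1 b 1" before "A 2 b 0" under `-k1,1`: the one-character keys "A" and "A" tie, so the full lines decide.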
