bug#9418: closed (Re: bug#9418: case sensitivity buggy in sort)

Michał Janke Thu, 01 Sep 2011 23:57:48 -0700

2011/9/2 Michał Janke <[email protected]>:
> 2011/9/1 GNU bug Tracking System <[email protected]>:
>> Your bug report
>>
>> #9418: case sensitivity buggy in sort
>>
>> which was filed against the coreutils package, has been closed.
>>
>> The explanation is attached below, along with your original report.
>> If you require more details, please reply to [email protected].
>>
>> --
>> 9418: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=9418
>> GNU Bug Tracking System
>> Contact [email protected] with problems
>>
>>
>> ---------- Forwarded message ----------
>> From: Eric Blake <[email protected]>
>> To: "Michał Janke" <[email protected]>
>> Date: Thu, 01 Sep 2011 10:32:45 -0600
>> Subject: Re: bug#9418: case sensitivity buggy in sort
>> tag 9418 notabug
>> thanks
>>
>> On 09/01/2011 02:58 AM, Michał Janke wrote:
>>>
>>> sort (GNU coreutils) 8.12
>>>
>>> The case-sensitivity looks buggy in sort. Have a look at these examples:
>>
>> Thanks for the report.  However, this is most likely due to your choice of 
>> locale, and not a bug in sort; this is a FAQ:
>> https://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021
>>
>> Using 'sort --debug' will help expose the issue.
>>
>>> $ sort -k1,2 bbb
>>> a B b 0
>>> A b b 1
>>> A B b 0
>>
>> $ sort --debug bbb -k1,2
>> sort: using `en_US.UTF-8' sorting rules
>> sort: leading blanks are significant in key 1; consider also specifying `b'
>> a B b 0
>> ___
>> _______
>> A b b 1
>> ___
>> _______
>> A B b 0
>> ___
>> _______
>> $ LC_ALL=C ../coreutils/src/sort --debug bbb -k1,2
>> ../coreutils/src/sort: using simple byte comparison
>> A B b 0
>> ___
>> _______
>> A b b 1
>> ___
>> _______
>> a B b 0
>> ___
>> _______
>>
>> See the difference?  In the C locale, you get ASCII sorting (A comes before
>> B comes before a comes before b); in the en_US.UTF-8 locale, you get
>> dictionary collation sorting (a comes before A comes before b comes before
>> B).
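
The two letter orders described above can be sketched in Python. This is only an illustration: the sort key used for the dictionary-style order is an assumption chosen to reproduce the order quoted here, not glibc's actual collation tables.

```python
letters = ["b", "A", "B", "a"]

# C locale analogue: compare by code point, so all capitals precede lowercase.
byte_order = sorted(letters)

# Dictionary-style analogue (an assumption, not glibc's real tables):
# compare case-insensitively first, then put lowercase before uppercase.
dictionary_order = sorted(letters, key=lambda c: (c.lower(), c.isupper()))

print(byte_order)        # ['A', 'B', 'a', 'b']
print(dictionary_order)  # ['a', 'A', 'b', 'B']
```
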
>>
>> --
>> Eric Blake   [email protected]    +1-801-349-2682
>> Libvirt virtualization library http://libvirt.org
>>
>>
>>
>> ---------- Forwarded message ----------
>> From: "Michał Janke" <[email protected]>
>> To: [email protected]
>> Date: Thu, 1 Sep 2011 10:58:58 +0200
>> Subject: case sensitivity buggy in sort
>> sort (GNU coreutils) 8.12
>>
>> The case-sensitivity looks buggy in sort. Have a look at these examples:
>>
>> $ cat bbb
>> A B b 0
>> a B b 0
>> A b b 1
>>
>> $ sort bbb
>> a B b 0
>> A B b 0
>> A b b 1
>>
>> $ sort -k1,2 bbb
>> a B b 0
>> A b b 1
>> A B b 0
>>
>>
>> $ cat ccc
>> A 2 b 0
>> a 2 b 0
>> A 1 b 1
>>
>> $ sort ccc
>> A 1 b 1
>> a 2 b 0
>> A 2 b 0
>>
>> $ sort -k1 ccc
>> A 1 b 1
>> a 2 b 0
>> A 2 b 0
>>
>> $ sort -k1,2 ccc
>> A 1 b 1
>> a 2 b 0
>> A 2 b 0
>>
>> $ sort -k1,1 ccc
>> a 2 b 0
>> A 1 b 1
>> A 2 b 0
>>
>>
>> $ cat ddd
>> A2 b 0
>> a2 b 0
>> A1 b 1
>>
>> $ sort ddd
>> A1 b 1
>> a2 b 0
>> A2 b 0
>>
>> $ sort -k1 ddd
>> A1 b 1
>> a2 b 0
>> A2 b 0
>>
>> $ sort -k1,1 ddd
>> A1 b 1
>> a2 b 0
>> A2 b 0
>>
>> $ sort -k1,2 ddd
>> A1 b 1
>> a2 b 0
>> A2 b 0
>>
>> $ sort -k1,3 ddd
>> A1 b 1
>> a2 b 0
>> A2 b 0
>>
>>
>>
>>
>
> I definitely don't agree with the "locale issue" explanation. The problem
> is not that some letter is treated as greater or less than another
> - the problem is that it is _sometimes_ one way and sometimes the other!
> Please have a closer look at this one:
>
> $ cat aaa
> aa 1
> AA 1
> Aa 0
>
> Now consider what the output of sort should be in two cases: A>a and A<a.
> If A>a, the result should be
> aa 1
> Aa 0
> AA 1
>
> If A<a, it should be
> AA 1
> Aa 0
> aa 1
>
> And now the actual result:
>
> $ sort aaa
> Aa 0
> aa 1
> AA 1
>
> So the lines are sorted primarily by the second column!
>
> True, when the locale is changed to the native POSIX one, the sorting is
> reasonable:
>
> $ LC_ALL=C sort aaa
> AA 1
> Aa 0
> aa 1
>
> So yes, the bug is visible only with a locale other than the standard
> POSIX one, but _no_ - the results for other locales are not correct.
> Upper-case and lower-case letters seem to be simply aliased.
>


If it is the _locale_ that decides that upper-case and lower-case letters
are equal, then the bug is in the locale - the results look absurd.
Where should a bug report about the locale go?
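
For what it is worth, the outputs quoted in this thread are consistent with a comparison done in several passes: first with case and blanks ignored, and only on a tie with lowercase placed before uppercase. Below is a minimal Python sketch of that idea. The functions `collation_key` and `locale_like_sort` are hypothetical names, and the two-pass rule is an assumption that happens to reproduce the results above; the real en_US.UTF-8 collation is table-driven and more elaborate.

```python
def collation_key(text):
    """Simplified two-pass key (an assumed model, not glibc's algorithm)."""
    chars = [c for c in text if not c.isspace()]
    # Pass 1: compare with case folded and blanks dropped.
    primary = "".join(c.lower() for c in chars)
    # Pass 2 (tie-break only): lowercase before uppercase, position by position.
    case = tuple(c.isupper() for c in chars)
    return (primary, case)


def locale_like_sort(lines, last_field=None):
    """Sort by fields 1..last_field (whole line if None), like sort -k1,N."""
    def key(line):
        if last_field is None:
            sort_key = line
        else:
            # Simplified field splitting on runs of blanks.
            sort_key = " ".join(line.split()[:last_field])
        # When keys tie, fall back to comparing the whole line.
        return (collation_key(sort_key), collation_key(line))
    return sorted(lines, key=key)


aaa = ["aa 1", "AA 1", "Aa 0"]
bbb = ["A B b 0", "a B b 0", "A b b 1"]

print(locale_like_sort(aaa))                # ['Aa 0', 'aa 1', 'AA 1']
print(locale_like_sort(bbb))                # ['a B b 0', 'A B b 0', 'A b b 1']
print(locale_like_sort(bbb, last_field=2))  # ['a B b 0', 'A b b 1', 'A B b 0']
print(sorted(aaa))                          # ['AA 1', 'Aa 0', 'aa 1'] (byte order)
```

Under this model "Aa 0" comes first in aaa because, with case and the blank ignored, "aa0" sorts before "aa1"; case only decides between "aa 1" and "AA 1". That would explain why the lines appear to be sorted by the second column.
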
