tag 28847 notabug
thanks

On 10/15/2017 02:58 AM, kakaxixi777 wrote:
>    Dear coreutils :
>    I am a Research and Development Engineer in IT. I met a situation when
>    I use “sort” command in Linux shell which could be a bug for the "sort"
>    command. So I hope you read this email, thank you !
>    The whole command I used was :
>    sort test.txt
>    And the result was :
>    20171012|3|2059517|-|8|-|20-24|2|-|2.0|2.0
>    20171012|3|2059517|-|82|-|25-29|2|-|13.0|12.0
>    20171012|3|2059517|-|8|-|-2|-2|-|71.0|64.0
>    20171012|3|2059517|-|82|-|30-34|0|-|2.0|1.0
>    The content in test.txt was:
>    20171012|3|2059517|-|8|-|20-24|2|-|2.0|2.0
>    20171012|3|2059517|-|82|-|25-29|2|-|13.0|12.0
>    20171012|3|2059517|-|8|-|-2|-2|-|71.0|64.0
>    20171012|3|2059517|-|82|-|30-34|0|-|2.0|1.0

Your situation is a FAQ:
https://www.gnu.org/software/coreutils/faq/coreutils-faq.html#Sort-does-not-sort-in-normal-order_0021

Most likely, you are sorting in a locale, such as en_US.UTF8, that does
not give punctuation the same weight as digits.  Notice that once you
ignore the punctuation in '8|-|20-2', '82|-|25', and so forth, the
substring '8202' sorts before '8225', which in turn sorts before '8227'
and finally '8230'.
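
As a rough illustration of that comparison (a sketch, not how the
locale's collation is actually implemented), you can strip everything
except digits from each line and sort what is left bytewise.  The helper
name digits_only is made up for this example, and real collation in such
a locale is multi-level, so this only approximates its first pass:

```shell
# Hypothetical helper (not part of coreutils): keep only digits and
# newlines, mimicking a locale that ignores punctuation on the first pass.
digits_only() {
  tr -cd '0-9\n'
}

printf '%s\n' \
  '20171012|3|2059517|-|8|-|20-24|2|-|2.0|2.0' \
  '20171012|3|2059517|-|82|-|25-29|2|-|13.0|12.0' \
  '20171012|3|2059517|-|8|-|-2|-2|-|71.0|64.0' \
  '20171012|3|2059517|-|82|-|30-34|0|-|2.0|1.0' \
  | digits_only | LC_ALL=C sort
```

The digit strings come out in the same order as the lines in your
result: the one containing '8202' first, then '8225', '8227', and
'8230'.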

>    Which means the “sort” command didn't work, because I think the correct
>    result should be :
>    20171012|3|2059517|-|8|-|20-24|2|-|2.0|2.0
>    20171012|3|2059517|-|8|-|20-24|2|-|2.0|2.0
>    20171012|3|2059517|-|82|-|25-29|2|-|13.0|12.0
>    20171012|3|2059517|-|82|-|25-29|2|-|13.0|12.0

Well, this isn't the right result either, as it duplicates two lines
and omits the other two (did you copy and paste incorrectly?).

>    The version of "sort" command I use is : sort --version
>    "sort (GNU coreutils) 8.4

This version is rather old; we are now at 8.28.  Starting with version
8.6, sort has a --debug option that shows where your expectations are
going wrong (99% of reports about sort misbehavior turn out to be misuse
of either command line options or the current locale).  Observe the
difference:

$ printf '20171012|3|2059517|-|8|-|20-24|2|-|2.0|2.0\n20171012|3|2059517|-|82|-|25-29|2|-|13.0|12.0\n20171012|3|2059517|-|8|-|-2|-2|-|71.0|64.0\n' \
    | LC_ALL=en_US.UTF8 sort --debug
sort: using ‘en_US.UTF8’ sorting rules
20171012|3|2059517|-|8|-|20-24|2|-|2.0|2.0
__________________________________________
20171012|3|2059517|-|82|-|25-29|2|-|13.0|12.0
_____________________________________________
20171012|3|2059517|-|8|-|-2|-2|-|71.0|64.0
__________________________________________

$ printf '20171012|3|2059517|-|8|-|20-24|2|-|2.0|2.0\n20171012|3|2059517|-|82|-|25-29|2|-|13.0|12.0\n20171012|3|2059517|-|8|-|-2|-2|-|71.0|64.0\n' \
    | LC_ALL=C sort --debug
sort: using simple byte comparison
20171012|3|2059517|-|82|-|25-29|2|-|13.0|12.0
_____________________________________________
20171012|3|2059517|-|8|-|-2|-2|-|71.0|64.0
__________________________________________
20171012|3|2059517|-|8|-|20-24|2|-|2.0|2.0
__________________________________________

And if you want the lines containing '|8|' to sort before the lines
containing '|82|', then you can't use plain sort (which compares the
whole line).  Instead, use the -t, -k, and -n options to tell sort how
the fields are separated, which fields to use as keys, and that those
keys should be treated as numbers rather than as character strings
(when sorting an entire line in ASCII, digits sort before |).
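
For example, assuming the fifth '|'-separated field is the one you care
about (that is my guess from your expected output, not something you
stated), a sketch might look like this.  The function name
sort_by_field5 and the choice of field 7 as a tie-breaker are
illustrative assumptions:

```shell
# Hypothetical helper (the name is illustrative, not from coreutils):
# sort '|'-separated records by field 5 as a number, then break ties
# on field 7 using plain byte comparison.
sort_by_field5() {
  LC_ALL=C sort -t '|' -k5,5n -k7,7
}

printf '%s\n' \
  '20171012|3|2059517|-|8|-|20-24|2|-|2.0|2.0' \
  '20171012|3|2059517|-|82|-|25-29|2|-|13.0|12.0' \
  '20171012|3|2059517|-|8|-|-2|-2|-|71.0|64.0' \
  '20171012|3|2059517|-|82|-|30-34|0|-|2.0|1.0' \
  | sort_by_field5
```

This puts both '|8|' lines ahead of both '|82|' lines.  Writing the key
as -k5,5n limits the numeric comparison to field 5 alone, and LC_ALL=C
makes the tie-breaking comparison on field 7 a plain byte comparison.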

>    I am not sure if it is a bug in "sort" command in Linux Shell or maybe
>    it's only my problems in using it.

I think I've demonstrated where the problem was, so I'm closing this as
not a bug.  Feel free to reply with further questions on the topic, though.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
