tag 28847 notabug
thanks

On 10/15/2017 02:58 AM, kakaxixi777 wrote:
> Dear coreutils :
> I am a Research and Development Engineer in IT. I met a situation when
> I use “sort” command in Linux shell which could be a bug for the "sort"
> command. So I hope you read this email, thank you !
> The whole command I used was :
> sort test.txt
> And the result was :
> 20171012|3|2059517|-|8|-|20-24|2|-|2.0|2.0
> 20171012|3|2059517|-|82|-|25-29|2|-|13.0|12.0
> 20171012|3|2059517|-|8|-|-2|-2|-|71.0|64.0
> 20171012|3|2059517|-|82|-|30-34|0|-|2.0|1.0
> The content in test.txt was:
> 20171012|3|2059517|-|8|-|20-24|2|-|2.0|2.0
> 20171012|3|2059517|-|82|-|25-29|2|-|13.0|12.0
> 20171012|3|2059517|-|8|-|-2|-2|-|71.0|64.0
> 20171012|3|2059517|-|82|-|30-34|0|-|2.0|1.0

Your situation is a FAQ:
https://www.gnu.org/software/coreutils/faq/coreutils-faq.html#Sort-does-not-sort-in-normal-order_0021

Most likely, you are sorting in a locale that does not treat punctuation
with the same weight as digits, such as en_US.UTF8. Notice that the
substring '8202' sorts before '8225', which in turn is before '8227' and
finally '8230', once you've ignored the punctuation in '8|-|20-2',
'82|-|25', and so forth.

> Which means the “sort” command didn't work, because I think the correct
> result should be :
> 20171012|3|2059517|-|8|-|20-24|2|-|2.0|2.0
> 20171012|3|2059517|-|8|-|20-24|2|-|2.0|2.0
> 20171012|3|2059517|-|82|-|25-29|2|-|13.0|12.0
> 20171012|3|2059517|-|82|-|25-29|2|-|13.0|12.0

Well, this isn't the right result either, as it duplicates two lines and
is missing two others (did you copy and paste incorrectly?).

> The version of "sort" command I use is : sort --version
> "sort (GNU coreutils) 8.4

This version is rather old; we are now at 8.28. But even as far back as
version 8.6, you can use sort's --debug feature to see where your
expectations are going wrong (99% of reports about sort misbehavior turn
out to be misuse of either command line options or the current locale).
Observe the difference:

$ printf '20171012|3|2059517|-|8|-|20-24|2|-|2.0|2.0\n20171012|3|2059517|-|82|-|25-29|2|-|13.0|12.0\n20171012|3|2059517|-|8|-|-2|-2|-|71.0|64.0\n' | LC_ALL=en_US.UTF8 sort --debug
sort: using ‘en_US.UTF8’ sorting rules
20171012|3|2059517|-|8|-|20-24|2|-|2.0|2.0
__________________________________________
20171012|3|2059517|-|82|-|25-29|2|-|13.0|12.0
_____________________________________________
20171012|3|2059517|-|8|-|-2|-2|-|71.0|64.0
__________________________________________

$ printf '20171012|3|2059517|-|8|-|20-24|2|-|2.0|2.0\n20171012|3|2059517|-|82|-|25-29|2|-|13.0|12.0\n20171012|3|2059517|-|8|-|-2|-2|-|71.0|64.0\n' | LC_ALL=C sort --debug
sort: using simple byte comparison
20171012|3|2059517|-|82|-|25-29|2|-|13.0|12.0
_____________________________________________
20171012|3|2059517|-|8|-|-2|-2|-|71.0|64.0
__________________________________________
20171012|3|2059517|-|8|-|20-24|2|-|2.0|2.0
__________________________________________

And if you want the lines containing '|8|' to sort before the lines
containing '|82|', then you can't use plain sort (which is over the whole
line), but instead need to use various -k, -n, and -t options to tell
sort where the keys are separated and which keys to sort on, and the fact
that the keys should be treated as numbers rather than as character
strings (since when sorting an entire line in ASCII, digits sort before |).

> I am not sure if it is a bug in "sort" command in Linux Shell or maybe
> it's only my problems in using it.

I think I've demonstrated where the problem was, so I'm closing this as
not a bug. Feel free to reply with further questions on the topic, though.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
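
A minimal sketch of the -t/-k/-n approach described above, assuming the fifth '|'-separated field (the '8' versus '82' column) is the key of interest. The helper name sort_by_field5 is hypothetical and only for illustration; the input is the four lines from the report.

```shell
# Hypothetical helper: sorts '|'-separated lines read from stdin.
sort_by_field5() {
    # -t'|'     split each line into fields at '|'
    # -k5,5n    the key is field 5 only, compared as a number
    # LC_ALL=C  lines with equal keys fall back to a plain byte
    #           comparison of the whole line, independent of locale
    LC_ALL=C sort -t'|' -k5,5n
}

printf '20171012|3|2059517|-|8|-|20-24|2|-|2.0|2.0\n20171012|3|2059517|-|82|-|25-29|2|-|13.0|12.0\n20171012|3|2059517|-|8|-|-2|-2|-|71.0|64.0\n20171012|3|2059517|-|82|-|30-34|0|-|2.0|1.0\n' | sort_by_field5
```

With this key, both '|8|' lines come out ahead of both '|82|' lines, since 8 is numerically less than 82. Writing -k5,5 limits the key to field 5; a bare -k5 would run from field 5 to the end of the line.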