Hello, I encountered a file that took hours to sort when it was expected to take negligible time. This seems to be due to the locale LANG=en_US.UTF-8. I've worked around the problem by using LC_ALL=C, but thought I would report it, as I didn't see a relevant bug report.
This was seen on CentOS 8 using package coreutils-8.30-6.el8.x86_64 and the current coreutils-8.30-8.el8.x86_64.

    # takes under 1 second
    export LC_ALL=C
    sort tst00776.out

    # slow sort, takes many hours
    export LC_ALL=en_US.UTF-8
    sort tst00776.out

Looks like most of the time is consumed here:

    #0 0x00007f4a65425c4b in strcoll_l () from /lib64/libc.so.6
    #1 0x00005600d195d365 in strcoll_loop ()
    #2 0x00005600d195bebd in xmemcoll0 ()
    #3 0x00005600d1951176 in compare ()
    #4 0x00005600d1951224 in sequential_sort ()
    #5 0x00005600d19511d5 in sequential_sort ()
    #6 0x00005600d195374b in sortlines ()
    #7 0x00005600d194d96b in main ()

It's possible the input (attached) has invalid UTF-8. I also tried on an older RHEL 7 and did NOT reproduce the problem with coreutils.x86_64 8.22-23.el7.

Thanks,
Jon Klaas
tst00776.out.gz
Description: GNU Zip compressed data
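
The suspicion that the input contains invalid UTF-8 could be checked with something like the sketch below. It is only an illustration: the helper name is_valid_utf8 is made up for this example, the two demo files are tiny stand-ins created on the spot (not the attached input), and it relies on iconv exiting non-zero when it meets a byte sequence that is illegal in the source encoding.

```shell
# Hypothetical helper (name invented for this sketch): succeeds only if the
# file decodes cleanly as UTF-8. iconv exits non-zero on an illegal sequence.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Demo on two small temporary files; these are NOT the attached input.
demo_dir=$(mktemp -d)
printf 'caf\303\251\n' > "$demo_dir/good.txt"   # bytes C3 A9: a valid two-byte sequence
printf 'caf\351\n'     > "$demo_dir/bad.txt"    # lone byte E9: not valid UTF-8

for f in "$demo_dir/good.txt" "$demo_dir/bad.txt"; do
    if is_valid_utf8 "$f"; then
        echo "$(basename "$f"): valid UTF-8"
    else
        echo "$(basename "$f"): invalid UTF-8"
    fi
done

rm -rf "$demo_dir"
```

Against the file from this report the call would be is_valid_utf8 tst00776.out. A failing check would only confirm that invalid sequences are present; it would not by itself show that they are what makes strcoll_l slow.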