bug#23677: sort --debug not ignoring punctuation when sort does

Karl Berry Wed, 01 Jun 2016 15:16:58 -0700

Consider this two-line input file:
M !z
M /a
(! = ASCII 33; / = ASCII 47.)

Locale-dependent sort with debug:
LC_ALL=en_US.UTF-8 sort --debug -k2 /tmp/foo


Output:
sort: using ‘en_US.UTF-8’ sorting rules
..
M /a
 ___
____
M !z
 ___
____

Due to the locale rules, the punctuation characters are presumably being
ignored; otherwise ! would sort before / (as it does with the LC_ALL=C
sort).  Therefore it seems the debug output would be closer to reality
if it were:

M /a
 _ _
____
M !z
 _ _
____

(I think; I'm not sure if all blanks are ignored in the locale
sort, or just multiple blanks collapsed to one.)
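
For anyone wanting to reproduce the comparison, here is a minimal sketch. It
assumes the input lives at /tmp/foo (as in the command above) and that the
en_US.UTF-8 locale is installed; if it is not, the second sort will most
likely fall back to C-locale ordering.

```shell
# Recreate the two-line input file from the report.
printf 'M !z\nM /a\n' > /tmp/foo

# Byte-order comparison: '!' (ASCII 33) sorts before '/' (ASCII 47),
# so "M !z" comes out first.
LC_ALL=C sort -k2 /tmp/foo

# Locale-aware comparison. In the report above this put "M /a" first,
# which is what suggests the punctuation is not being compared.
LC_ALL=en_US.UTF-8 sort -k2 /tmp/foo
```

Adding --debug to either command shows which characters sort marks as
belonging to the key.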

I realize that, in terms of mere string parsing, the punctuation is
included in the sort key.  But when a character is not actually used for
sorting, and the --debug output says it is, that seems suboptimal.
(Especially when the rules are, for all practical purposes,
undocumented.)

I also realize it is not necessarily feasible to change, even if there's
agreement on changing it.

@curmudgeon
How anyone can do anything useful with en_US.UTF-8 sort is beyond me ...
@end curmudgeon

Ok, no more from me in this area, you can be glad to know. --karl
