Consider this two-line input file:

M !z
M /a

(! = ASCII 33; / = ASCII 47.)

Locale-dependent sort with debug:

LC_ALL=en_US.UTF-8 sort --debug -k2 /tmp/foo
Output:

sort: using ‘en_US.UTF-8’ sorting rules
..
M /a
 ___
____
M !z
 ___
____

Due to the locale rules, the punctuation characters are being ignored (presumably), or ! would sort before / (as it does with the LC_ALL=C sort). Therefore it seems the debug output would be closer to reality if it was:

M /a
 _ _
____
M !z
 _ _
____

(I think; I'm not sure if all blanks are ignored in the locale sort, or just multiple blanks collapsed to one.)

I realize that, in terms of mere string parsing, the punctuation is included in the sort key. But when a character is not actually used for sorting, and the --debug output says it is, that seems suboptimal. (Especially when the rules are, for all practical purposes, undocumented.)

I also realize it is not necessarily feasible to change, even if there's agreement on changing it.

@curmudgeon
How anyone can do anything useful with en_US.UTF-8 sort is beyond me ...
@end curmudgeon

Ok, no more from me in this area, you can be glad to know.

--karl
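
P.S. For anyone who wants to reproduce the comparison, here is a minimal sketch. It assumes GNU sort (for --debug) and uses a scratch file from mktemp in place of /tmp/foo. The locale run is skipped if en_US.UTF-8 is not installed, and the result of that run depends on the system's collation tables, so no particular output is claimed for it.

```shell
# Scratch file standing in for /tmp/foo (an assumption; any path works).
tmp=$(mktemp)
printf 'M !z\nM /a\n' > "$tmp"

# Byte-order comparison: '!' (ASCII 33) sorts before '/' (ASCII 47).
c_sorted=$(LC_ALL=C sort -k2 "$tmp")
printf 'C locale:\n%s\n' "$c_sorted"

# Locale-dependent comparison, only attempted if the locale is installed.
if locale -a 2>/dev/null | grep -qiE '^en_US\.utf-?8$'; then
    printf 'en_US.UTF-8 locale:\n'
    LC_ALL=en_US.UTF-8 sort --debug -k2 "$tmp"
fi

rm -f "$tmp"
```

In the C locale the !z line comes first. If the en_US.UTF-8 run puts the /a line first while still underlining the punctuation, that is the mismatch described above.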
