2021-07-04 15:47:55 +0700, Robert Elz via austin-group-l at The Open Group:
>     Date:        Fri, 2 Jul 2021 14:41:50 +0100
>     From:        "Geoff Clare via austin-group-l at The Open Group"
>                  <austin-group-l@opengroup.org>
>     Message-ID:  <20210702134150.GB16587@localhost>
>
>   | I've always assumed that the intention for -c is to answer the
>   | question "if I ran this command without -c would the output be
>   | the same as the input?"  So the NetBSD behaviour seems wrong
>   | to me.
>
> But:
> jinx$ printf '%s\n' a,b a,a
> a,b
> a,a
> jinx$ printf '%s\n' a,b a,a | sort -t, -k1,1
> a,b
> a,a

That would make it non-compliant then.

SUS> When there are multiple key fields, later keys shall be
SUS> compared only after all earlier keys compare equal. Except
SUS> when the -u option is specified, lines that otherwise
                                      ^^^^^^^^^^^^^^^^^^^^
SUS> compare equal shall be ordered as if none of the options
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
SUS> -d, -f, -i, -n, or -k were present (but with -r still in
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
SUS> effect, if it was specified) and with all bytes in the
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
SUS> lines significant to the comparison. The order in which
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
SUS> lines that still compare equal are written is unspecified.

[...]
> ie: When -k args are given, there is no fallback to "whole record" matching,
> if one wants that, one can easily add a final -k1 option to make that happen:
[...]
> which is the way it should be - if one has taken the trouble to specify
> what parts of the record are the keys for sorting (and -u comparisons)
> then sort should not be gratuitously adding more - that it used to do so
> was widely regarded as a bug (especially given that there was no way to
> defeat it, but enabling it is so simple when it is not the default).
[...]

I don't know what the original rationale was, but /one/ rationale
could be to guarantee a deterministic and total order, to make
sure that two files with the same lines (though in different
orders) result in the same output when sorted, whatever the
sorting specification. That guarantee is broken in locales whose
collation order is not total, which was the subject of recent
changes.

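As an illustration of that property (a sketch only: the LC_ALL=C
setting and the variable names are my own assumptions, and the
outcome depends on the sort implementation at hand), one can feed
the same two lines to sort in both orders and compare the results:

```shell
# Same two lines, fed to sort in both orders; the key (field 1) is "a" for both.
# LC_ALL=C is an assumption here, to keep locale collation out of the picture.
first=$(printf '%s\n' a,b a,a | LC_ALL=C sort -t, -k1,1)
second=$(printf '%s\n' a,a a,b | LC_ALL=C sort -t, -k1,1)

# With a last-resort whole-line comparison both outputs are identical;
# a sort that keeps input order for equal keys would make them differ.
if [ "$first" = "$second" ]; then
  echo same
else
  echo differ
fi
```

A sort that leaves lines with equal keys in their input order, as in
the jinx$ transcript quoted above, would report "differ" here.
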
A POSIX sort does sort as specified and, in cases where the user
doesn't say (same sort key but different lines), makes one of
several possible decisions: in this case a last-resort comparison
of the full line (falling back to a memcmp() comparison when
strcoll() finds them equal, if need be), whilst NetBSD sort keeps
the original order.

Note that POSIX doesn't require the sort to be stable, and leaves
unspecified which line is selected by sort -uk1,1, for instance.

-- 
Stephane