Paul Eggert <[EMAIL PROTECTED]> wrote:
> Thomas Habets <[EMAIL PROTECTED]> writes:
>
>>> sort: Add an ordering option -R that causes 'sort' to sort according
>>> to a random permutation of the correct sort order.
>>
>> This means that two different files, that happen to sort to the same output,
>> should give the same output when randomized with the same SEED. Is that
>> right? [*]
>
> Sort of, but not quite.

I couldn't find the "not quite" part of your explanation.

>> Is there a good reason for wanting this?
>
> By "this" do you mean "a fairly-formal definition", or "this
> particular definition of random sorting"? [...] If the latter,
> then because we want sort -R to have the usual properties that
> people expect from "sort", e.g., "sort -rR" should output in the
> reverse order of "sort -R".

Nit: they shouldn't expect that unless they also specify a seed. But
sort -R can still provide this just by permuting the original input
order, rather than the correct sort order. If we have a file A, and
we do:

$ sort -R A > B
$ sort -R --seed=deadbeef A > A1
$ sort -R --seed=deadbeef A > A2
$ sort -R --seed=deadbeef B > B1
$ sort -R --seed=deadbeef B > B2

Then we should expect that A1 and A2 have the same contents, and that
B1 and B2 have the same contents. But the TODO requirement would also
ensure that A1/A2 have the same contents as B1/B2. Is that really
needed? I'm also not sure that clustering lines with equivalent sort
keys is desirable.

>>> if you sort a permutation of the same input file
>>> with the same --random-seed=SEED option twice, you'll get the same
>>> output. [**]
>>
>> Here however it does not explicitly say what I said above about two different
>> files.
>
> If two files sort to the same output, then they're permutations of
> each other. So [**] implies [*]. (The converse does not hold. See
> what I mean about the logic being tricky here?...)

No, I think [*] implies [**] only. [*] is the more general case
placing a requirement on all permutations of the same input; [**] is
the special case where the two files are the same permutation of the
same input.


paul
_______________________________________________
Bug-coreutils mailing list
[EMAIL PROTECTED]
http://lists.gnu.org/mailman/listinfo/bug-coreutils
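
To make the A/B comparison above concrete, here is a minimal Python sketch of the two candidate behaviours. It is only a model, not how sort is or would be implemented: the function names are hypothetical, Python's random.Random stands in for whatever generator sort would seed, plain string comparison stands in for the "correct sort order", and clustering of lines with equivalent keys is not modelled.

```python
import random


def shuffle_sorted_order(lines, seed, reverse=False):
    """Model of the TODO wording: permute the correct sort order.

    The input is sorted first, so the result depends only on the set of
    lines and the seed, not on the order the lines arrived in.
    """
    ordered = sorted(lines)
    random.Random(seed).shuffle(ordered)
    return ordered[::-1] if reverse else ordered


def shuffle_input_order(lines, seed, reverse=False):
    """Model of the alternative: permute the original input order.

    The same seed applies the same index permutation, so the result
    depends on the input order as well as on the seed.
    """
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    return shuffled[::-1] if reverse else shuffled


if __name__ == "__main__":
    a = ["pear", "apple", "fig", "kiwi", "plum"]   # file A
    b = ["fig", "plum", "apple", "pear", "kiwi"]   # file B, a permutation of A

    for model in (shuffle_sorted_order, shuffle_input_order):
        a1 = model(a, "deadbeef")
        a2 = model(a, "deadbeef")
        b1 = model(b, "deadbeef")
        print(model.__name__)
        print("  A1 == A2:", a1 == a2)
        print("  A1 == B1:", a1 == b1)
        print("  -rR is reverse of -R:",
              model(a, "deadbeef", reverse=True) == a1[::-1])
```

In this model both functions give repeatable output for a fixed seed and both satisfy the "sort -rR is the reverse of sort -R" property; they differ only in whether A1 and B1 are forced to match.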