Thomas Habets <[EMAIL PROTECTED]> writes: >> sort: Add an ordering option -R that causes 'sort' to sort according >> to a random permutation of the correct sort order. > > This means that two different files, that happen to sort to the same output, > should give the same output when randomized with the same SEED. Is that > right? [*]
Sort of, but not quite. The idea was that if the input is this: 1 3 2 3.0 4 3.000 5 and we are using sort -n so the correct sort order is this: 1 2 3 4 5 3.0 3.000 (where I've put "3", "3.0", and "3.000" atop each other to indicate that these three values are all tied), then sort -R would permute this order randomly, e.g. to: 3 1 4 5 2 3.0 3.000 and then sort as if that were the correct order. Part of the idea is that all the "3", "3.0", and "3.000" values will sort together, even when sorting "randomly", since they are all the same number. > I'm not a sorting-expert, but this would mean running through the > input twice, would it not? Perhaps, perhaps not. Let's figure out the proper semantics first before we worry about implementation details. > Is there a good reason for wanting this? By "this" do you mean "a fairly-formal definition", or "this particular definition of random sorting"? If the former, we need to define what "sort randomly" means, in a way that would satisfy someone trying to specify it fairly formally, and so that others can implement it according to our spec. If the latter, then because we want sort -R to have the usual properties that people expect from "sort", e.g., "sort -rR" should output in the reverse order of "sort -R". >> if you sort a permutation of the same input file >> with the same --random-seed=SEED option twice, you'll get the same >> output. [**] > > Here however it does not explicitly say what I said above about two different > files. If two files sort to the same output, then they're permutations of each other. So [**] implies [*]. (The converse does not hold. See what I mean about the logic being tricky here?...) > So what is "random"? Do we mean "arbitrary" or "unpredictable"? That's a deep subject. I suggest Knuth Volume 2. http://www-cs-faculty.stanford.edu/~knuth/taocp.html > Also, should "deteministic" here mean that it should now, and forever, and on > all architectures, work the same? Yes. _______________________________________________ Bug-coreutils mailing list [EMAIL PROTECTED] http://lists.gnu.org/mailman/listinfo/bug-coreutils