On Thu, 2 Jun 2005, Frederik Eaton wrote: >Phil> Is it that the app must guarantee all lines of a >Phil> non-seekable stdin must have an equal chance of any sort order? > >See my comment to James above. I think one need not make this >guarantee, since only a tiny fraction of possible sort orders will be >able to be tried by the user. However, I think it would be true for my >proposal (and the one that James suggested to Davis) if one took the >random number generator or hash function to be variable or unspecified >during analysis of the algorithm.
Thinking further on this, I don't think it matters to the guts of sort whether the ultimate random key is based on position hash or PRNG. >One possibility for an efficient random permutation of a large file is >a program which scans the file once and records the location of each >line in an index, then applies a uniformly distributed random >permutation to this line index by stepping through it in order and >swapping the current entry with an entry chosen at random from the set >of later entries, and finally steps through the index again and >dereferences each line entry and prints out the corresponding line in >the original file. That approach is fine on seekable files, but the user may wish to shuffle from stdin. sort already knows how to do this. Here's a concrete example of Paul's suggestion as I understand it: ---INPUT--- a 3 b 2 c 1 d 1 ---INPUT--- sort -R (or -k R) creates an imaginary field and sorts only on that: ---IMAGINARY INPUT--- a 3 0x293a b 2 0xc9f4 c 1 0xa932 d 1 0x5ef5 ---IMAGINARY INPUT--- ---SORTED IMAGINARY OUTPUT--- a 3 0x293a d 1 0x5ef5 c 1 0xa932 b 2 0xc9f4 ---SORTED IMAGINARY OUTPUT--- ---ACTUAL OUTPUT--- a 3 d 1 c 1 b 2 ---ACTUAL OUTPUT--- Of course the random key should be long enough to ensure a negligible chance of repetition (128 bits?) The user is free to make use of the random key and other keys to create random subsets of an overall-sorted list. I'm still interested to read what Paul considers to be the difficulties of such an implementation? Cheers, Phil _______________________________________________ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils