Re: [PATCH] Makes sort create random order

Paul Eggert Thu, 02 Sep 2004 22:24:19 -0700

[EMAIL PROTECTED] (Paul Jarc) writes:

>> Sort of, but not quite.
>
> I couldn't find the "not quite" part of your explanation.

Well, I tried.  :-)

>> "sort -rR" should output in the reverse order of "sort -R".
>
> Nit: they shouldn't expect that unless they also specify a seed.

Yes, of course.

> But sort -R can still provide this just by permuting the original
> input order, rather than the correct sort order.

I don't understand this claim.  If "sort -R" operates by permuting the
original input order, and then sorts the result, then it will generate
the same output as if it hadn't permuted anything (assuming there are
no ties).
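To make the point concrete, here is a small Python model. It is not how sort is implemented, and the helper name sort_every_permutation is made up for illustration. The model sorts every permutation of an input and collects the distinct results. With distinct lines there is exactly one result, so any preliminary shuffle is invisible. With ties on the sort key and a stable sort (loosely analogous to "sort -s"), the input permutation does leak through.

```python
from itertools import permutations


def sort_every_permutation(lines, key=None):
    """Illustrative helper (hypothetical name): return the set of distinct
    outputs produced by sorting every permutation of lines."""
    return {tuple(sorted(p, key=key)) for p in permutations(lines)}


def first_field(line):
    """Sort key: the first whitespace-separated field only."""
    return line.split()[0]


# Distinct lines: every permutation sorts to the same output.
words = ["pear", "apple", "fig", "kiwi"]
print(sort_every_permutation(words))
# {('apple', 'fig', 'kiwi', 'pear')}

# Ties on the key: Python's stable sort keeps tied records in input order,
# so the preliminary permutation shows up in the result.
records = ["a 2", "b 1", "a 1"]
print(len(sort_every_permutation(records, key=first_field)))  # 2
```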

> we do:
> $ sort -R A > B
> $ sort -R --seed=deadbeef A > A1
> $ sort -R --seed=deadbeef A > A2
> $ sort -R --seed=deadbeef B > B1
> $ sort -R --seed=deadbeef B > B2
>
> Then we should expect that A1 and A2 have the same contents, and that
> B1 and B2 have the same contents.  But the TODO requirement would also
> ensure that A1/A2 have the same contents as B1/B2.

Yes, assuming no ties.
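Here is a sketch of one way to get that property in the A/B example. It assumes a seeded "sort -R" orders lines by a keyed hash of each line. The function random_sort, the choice of BLAKE2b, and treating the hex seed "deadbeef" as the hash key are all hypothetical illustration choices, not anything taken from the patch. Because the output depends only on the set of lines and the seed, A1, A2, B1 and B2 all come out identical, and the reversed sort is the exact reverse when there are no ties.

```python
import hashlib
from itertools import permutations


def random_sort(lines, seed, reverse=False):
    """Hypothetical model of a seeded "sort -R": order lines by a keyed hash.

    Each line is hashed with BLAKE2b keyed by the seed; the line itself is
    a tie-breaker for the (astronomically unlikely) case of equal digests.
    """
    def sort_key(line):
        digest = hashlib.blake2b(line.encode(), key=seed).digest()
        return (digest, line)

    return sorted(lines, key=sort_key, reverse=reverse)


seed = bytes.fromhex("deadbeef")  # assumed mapping of --seed=deadbeef
A = ["one", "two", "three", "four"]

# Every permutation of A (B is one of them) yields the same output.
outputs = {tuple(random_sort(p, seed)) for p in permutations(A)}
assert len(outputs) == 1

# With the same seed and distinct lines, the reversed sort is the exact
# reverse of the forward one, as "sort -rR" vs "sort -R" would be.
assert random_sort(A, seed, reverse=True) == random_sort(A, seed)[::-1]
```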

> Is that really needed?

If it's not needed, then why is this relevant to "sort"?  You are
asking for a program that randomly permutes its input.  Then let's
design another program to do that, and not get bogged down with how
its features work together with "sort"'s existing zoo of options.

> I'm also not sure that clustering lines with equivalent sort keys is
> desirable.

Again, it depends on whether you want something relevant to the
collating order (i.e., a sort), or you want something that's
completely irrelevant (i.e., a permutation).  If the latter, then I
suspect we should be talking about a different tool.
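Here is a sketch of the difference, using the same kind of hypothetical keyed-hash model. When only the first field is the sort key, lines with equal keys get equal hashes and so must end up adjacent. A pure random permutation (Python's random.Random.sample here) has no notion of keys and usually splits them apart. All function names are invented for this example.

```python
import hashlib
import random


def hash_key(key, seed):
    """Keyed BLAKE2b digest of a sort key (hypothetical seeded "sort -R" model)."""
    return hashlib.blake2b(key.encode(), key=seed).digest()


def random_sort_by_field(lines, seed):
    # Sort on the hash of the first field only: equal first fields give
    # equal sort keys, so those lines necessarily end up adjacent.
    return sorted(lines, key=lambda s: hash_key(s.split()[0], seed))


def is_clustered(lines):
    """True if the lines sharing a first field form one contiguous run each."""
    seen, prev = set(), None
    for line in lines:
        field = line.split()[0]
        if field != prev:
            if field in seen:
                return False  # this key reappears after a different key
            seen.add(field)
            prev = field
    return True


lines = ["x 1", "y 1", "x 2", "z 1", "y 2", "x 3"]
seed = bytes.fromhex("deadbeef")

assert is_clustered(random_sort_by_field(lines, seed))

# A pure permutation ignores keys, so most shuffles are not clustered.
rng = random.Random(0)
shuffles = [rng.sample(lines, len(lines)) for _ in range(100)]
print(sum(is_clustered(s) for s in shuffles), "of 100 shuffles were clustered")
```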

>>>>> This means that two different files, that happen to sort to the
>>>>> same output, should give the same output when randomized with
>>>>> the same SEED. Is that right? [*]
>>>>     if you sort a permutation of the same input file
>>>>     with the same --random-seed=SEED option twice, you'll get the same
>>>>     output. [**]
>> If two files sort to the same output, then they're permutations of
>> each other.  So [**] implies [*].  (The converse does not hold.  See
>> what I mean about the logic being tricky here?...)
>
> No, I think [*] implies [**] only.  [*] is the more general case
> placing a requirement on all permutations of the same input; [**] is
> the special case where the two files are the same permutation of the
> same input.

Ah, OK, I think I see the problem.  By [**] I meant that if you sort two
permutations of the same input file, and use the same random seed for
both sorts, you'll get the same output.  This is roughly the same as
[*], then.  I say "roughly" because it's not clear from either
statement what should be done with ties.


_______________________________________________
Bug-coreutils mailing list
[EMAIL PROTECTED]
http://lists.gnu.org/mailman/listinfo/bug-coreutils
