Re: Shuffling

Walt Mankowski Wed, 09 Dec 2009 17:00:03 +0100 (CET)

On Wed, Dec 09, 2009 at 02:24:40PM +0000, Smylers wrote:
> Roger Burton West writes:
> 
> > ... take a list and put it into a random order... a Perl program.  Now
> > GNU sort has grown a -R option which does basically the same thing.
> > GNU sort is in C, so that might be faster, right?
> 
> The doc for -R says:
> 
>   -R, --random-sort
>          sort by random hash of keys
> 
> Which implies its first generating a random key for each item in the
> list, then sorting by those keys, using the same sorting algorithm it
> would for 'normal' sorting.
> 
> So it's hardly surprising that's slower than a dedicated shuffle
> algorithm.


Indeed.  The Fisher-Yates shuffle algorithm is O(n), while sorting is
O(n log n).

> > Over a 180K-entry corpus, it's taking around four and a half seconds,
> > as against the 0.6 seconds of my Perl program.
> 

log_2(180000) is about 17.5, so that's roughly in the right ballpark,
considering the normal Perl vs C speed differences.

Walt

Re: Shuffling

Reply via email to