Dave Storrs wrote:

>         Hmmm...this is interesting.  A friend of mine who is in the
> process of getting her graduate degree in CS/information theory stuff
> recently told me that it has been mathematically proven that no sort can
> run faster than O(n log n) unless you know something about the incoming
> data.  I guess that radix sort _only_ works on strings (or stringified
> things), then?

Radix works on numbers, too; you start with the least significant digit
and work to the left. Of course, I think of numbers as stringified, anyways.
There's no difference between base 10 and base n where n is the number of
characters.

Radix sorts in columns. So, if you are sorting 5 digit numbers, you make
5 passes through the data. This would be o(5n), but as n gets huge, the
contant 5 really doesn't matter, so you drop it and it becomes O(n). Some
algorithms are O(n^2) but really take o(n^2 + 2n + 1) to run, but big O
makes it (n^2) because we're interested in what happens as n goes towards
infinity.

To take a stab at your question about "knowing something about the incoming data",
and I'm guessing here, we do know how many columns of data we have,
and therefore know just how big our constant in front of the n is. If this
constant is too big (bigger than log n I would guess),
we might want to consider a different sort.

In order to use radix, we'd probably also want to know that our data is not
"in order" or even "close to order". Part of the problem is that we now are
taking extra time to look at our data and analyse it before deciding what
sort to use. Maybe at this point you might want to just use mergesort
or quicksort and be done with it. I'm going to go mess with Benchmark.pm
and see what happens.

Okay, maybe it's wandering a little far from Perl Beginners, but it sure is
interesting!

        Pete


-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to