On Wed, 24 Oct 2001, Lucy McWilliam wrote:

> Fly ranking problem.

Right, let me get this straight.  You have a big old list of data that has
scores of X versus Y.

What you want is rankings for each X versus Y.

I suggest you first modify your data lines to contain both the name of the
X and the name of the Y as well as their score (you can use code numbers or
references if you want - preferably something you can parse quickly with
unpack). At this point, if you've got flyX_flyY_score in your datafile but
not flyY_flyX_score then you're going to want to add that in.
This will double the number of lines.  Oh well.
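
A minimal sketch of that mirroring step, in Python purely for
illustration.  The (x, y, score) tuple layout, the fly names and the
add_mirrored helper are all assumptions rather than anything from the
original data, and it assumes the score is the same in both directions.

```python
def add_mirrored(records):
    """Return records plus the (y, x, score) mirror of any pair lacking one."""
    # Remember which ordered pairs are already present.
    seen = {(x, y) for x, y, _ in records}
    out = list(records)
    for x, y, score in records:
        if (y, x) not in seen:
            out.append((y, x, score))
            seen.add((y, x))
    return out


# Hypothetical sample data: flyA/flyB already has both directions.
records = [("flyA", "flyB", 0.9), ("flyA", "flyC", 0.4), ("flyB", "flyA", 0.9)]
doubled = add_mirrored(records)
```

The set of seen pairs lives in memory, so for a really big file it may be
simpler to emit both directions blindly and drop duplicates after sorting.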

Now you can sort the data without losing track of which pair each score
belongs to.  Sort on the X value and then, where the X value is the same,
on score (i.e. sorting within each ranking).
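
The two-level sort can be sketched like this (Python, for illustration
only).  It assumes (x, y, score) tuples and that a higher score should rank
first; flip the sign if low scores are better.  rank_sort is a hypothetical
name.

```python
def rank_sort(records):
    """Sort by X name, then by score (highest first) within each X."""
    return sorted(records, key=lambda r: (r[0], -r[2]))


# Hypothetical sample data.
records = [("flyB", "flyA", 0.9), ("flyA", "flyC", 0.4), ("flyA", "flyB", 0.9)]
ranked = rank_sort(records)
```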

If you've got a *lot* of data then you're probably going to want to use a
disk based mergesort.  This will be slow.  This is the nature of things.
It's still a lot faster than a quicksort that has to page because you've
hit the limit of memory.  You can speed this up by only breaking the data
into chunks of 500 lines or so, sorting each of those with quicksort, and
then merging the sorted chunks, rather than dropping down to one line
files like a true mergesort.
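
A rough sketch of the chunked disk based sort, again in Python for
illustration.  It assumes whitespace-separated "x y score" lines that each
end in a newline, and highest score first within each X.  The in-memory
step uses Python's built-in sort (Timsort, not quicksort), and
external_sort and sort_key are hypothetical names.

```python
import heapq
import itertools
import os
import tempfile


def sort_key(line):
    # Assumed line layout: "x y score".
    x, _y, score = line.split()
    return (x, -float(score))


def external_sort(in_path, out_path, chunk_lines=500):
    """Sort a big file by writing sorted chunks to disk, then merging them."""
    chunk_paths = []
    with open(in_path) as src:
        while True:
            chunk = list(itertools.islice(src, chunk_lines))
            if not chunk:
                break
            chunk.sort(key=sort_key)  # in-memory sort of one small chunk
            fd, path = tempfile.mkstemp(suffix=".chunk", text=True)
            with os.fdopen(fd, "w") as tmp:
                tmp.writelines(chunk)
            chunk_paths.append(path)

    files = [open(p) for p in chunk_paths]
    try:
        with open(out_path, "w") as dst:
            # heapq.merge reads the sorted chunks lazily, one line at a time.
            dst.writelines(heapq.merge(*files, key=sort_key))
    finally:
        for f in files:
            f.close()
        for p in chunk_paths:
            os.remove(p)
```

This keeps every chunk file open during the merge, so with very many
chunks you would need a bigger chunk size or several merge passes to stay
under the open-file limit.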

You can then run over this list again if you want to and split it out into
files so you have a separate file for each fly.  Or you could just ignore
the problem completely and binary search on the big massive file every
time you wanted to find something (slower).
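
Splitting the sorted file into one file per fly could look something like
this (Python, illustrative only).  It assumes the file is already sorted on
X, has no blank lines, and that fly names are safe to use as file names.
split_by_fly and the ".txt" suffix are made up for the example.

```python
import itertools
import os


def split_by_fly(sorted_path, out_dir):
    """Write one file per X fly from a file sorted on X; return the paths."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    with open(sorted_path) as src:
        # groupby only groups adjacent lines, hence the need for sorted input.
        groups = itertools.groupby(src, key=lambda line: line.split()[0])
        for fly, lines in groups:
            path = os.path.join(out_dir, fly + ".txt")
            with open(path, "w") as dst:
                dst.writelines(lines)
            written.append(path)
    return written
```

Each per-fly file then holds that fly's ranking in order, so a lookup is
just opening one small file.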

Is that going to be faster than what you're doing already?  Maybe.  Try it
and see.  Be warned, I'm not entirely sure what I'm talking about...but it
works for other really big files that I can't possibly comment about.

Later.

Mark.

-- 
s''  Mark Fowler                                     London.pm   Bath.pm
     http://www.twoshortplanks.com/              [EMAIL PROTECTED]
';use Term'Cap;$t=Tgetent Term'Cap{};print$t->Tputs(cl);for$w(split/  +/
){for(0..30){$|=print$t->Tgoto(cm,$_,$y)." $w";select$k,$k,$k,.03}$y+=2}

