On Wed, Oct 24, 2001 at 04:15:08PM +0100, Lucy McWilliam wrote:
>
> Hmm, this is where admit I'm crap and provide the CFT club with something
> to ponder.  I have made pairwise comparisons of a few thousand flies
> and stored the similarity scores in a *large* file in a particular order
> (to save me storing their names and making the file even bigger).
>
> e.g.
> fly0_versus_fly0_score
> fly0_versus_fly1_score
> fly0_versus_fly2_score
> fly1_versus_fly1_score
> fly1_versus_fly2_score
> fly2_versus_fly2_score
>
> What I'd like to end up with for each comparison is 2 numbers - the
> position of fly_X in a list of numerically descending fly_Y scores -
> e.g. highest score gets rank 1 - and vice versa.  Self comparisons
> should not be included.
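
At the moment your datafile contains (x ** 2 + x) / 2 data items, where x
is the number of flies.  If you're happy with an array with that many
scalars in it then you can probably get a speed up from something like this:

Firstly read all the data into an array, then get rid of the prepare sub.

    my $N = 3000;          # number of flies
    my @rank;              # ranks to append to each line of the file
    my @line = <FILE>;     # FILE is the already-open score file
    chomp @line;

    for my $i (0 .. ($N - 1)) {
        my %scores;
        my $index = $i;    # line holding fly0_versus_fly$i
        for my $j (0 .. ($N - 1)) {
            # collect every score involving fly $i, bar the self comparison
            $scores{$index} = $line[$index] unless $j == $i;
            # step down column $i to the diagonal, then along row $i
            $index += $j < $i ? $N - $j - 1 : 1;
        }
        my $rank = 0;
        foreach my $key (sort { $scores{$b} <=> $scores{$a} } keys %scores) {
            $rank[$key] .= " " . ++$rank;    # highest score gets rank 1
        }
    }

    # each flyA_versus_flyB line gets its rank in flyA's list, then in flyB's
    foreach my $n (0 .. $#line) {
        print $line[$n] . (defined $rank[$n] ? $rank[$n] : '') . "\n";
    }

cheers
Andrew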
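
PS: in case the index stepping looks like magic, here is a rough sketch
that checks it against a closed-form line number for a small number of
flies.  The names pair_index and walk_indices are made up for this
illustration and are not part of the script above, and I am assuming the
file is laid out exactly as in your example (row by row, each row starting
at the self comparison).

```perl
use strict;
use warnings;

# Hypothetical helper: line number of the flyX_versus_flyY score in the
# triangular file, for n flies numbered from 0.
sub pair_index {
    my ($x, $y, $n) = @_;
    ($x, $y) = ($y, $x) if $x > $y;    # the file only stores x <= y
    # Rows 0 .. x-1 hold n, n-1, ..., n-x+1 entries; then move y-x along row x.
    return $x * $n - $x * ($x - 1) / 2 + ($y - $x);
}

# The same stepping as in the ranking loop, collected for one fly $i.
sub walk_indices {
    my ($i, $n) = @_;
    my @idx;
    my $index = $i;
    for my $j (0 .. $n - 1) {
        push @idx, $index;
        $index += $j < $i ? $n - $j - 1 : 1;
    }
    return @idx;
}

my $n = 5;    # small illustrative size, not the real 3000
for my $i (0 .. $n - 1) {
    my @walk   = walk_indices($i, $n);
    my @direct = map { pair_index($i, $_, $n) } 0 .. $n - 1;
    die "mismatch for fly $i\n" unless "@walk" eq "@direct";
}
print "total lines: ", $n * ($n + 1) / 2, "\n";    # prints "total lines: 15"
```

If the walk and the closed form ever disagree the script dies, so a clean
run is a reasonable hint that the stepping visits the right lines.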