On Wed, Oct 24, 2001 at 04:15:08PM +0100, Lucy McWilliam wrote:
> 
> Hmm, this is where I admit I'm crap and provide the CFT club with something
> to ponder.  I have made pairwise comparisons of a few thousand flies
> and stored the similarity scores in a *large* file in a particular order
> (to save me storing their names and making the file even bigger).
> 
> e.g.
> fly0_versus_fly0_score
> fly0_versus_fly1_score
> fly0_versus_fly2_score
> fly1_versus_fly1_score
> fly1_versus_fly2_score
> fly2_versus_fly2_score
> 
> What I'd like to end up with for each comparison is 2 numbers - the
> position of fly_X in a list of numerically descending fly_Y scores -
> e.g. highest score gets rank 1 - and vice versa.  Self comparisons
> should not be included.  

At the moment your datafile contains (x ** 2 + x) / 2 data items for x
flies.  If you're happy with an array with that many scalars in it then you
can probably get a speed-up from something like this:

First, read all the data into an array, then get rid of the prepare sub.

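my $N = 3000;      # number of flies

my @rank;
my @line = <FILE>;   # FILE is assumed to be open on the score file already
chomp @line;

for my $i (0 .. $N - 1) {
  # Collect every score involving fly $i, keyed by its line number in the
  # file.  Rows 0 .. $i-1 each hold one such score; row $i holds the rest.
  my %scores;
  my $index = $i;
  for my $j (0 .. $N - 1) {
    # skip the self comparison (fly $i versus fly $i)
    $scores{$index} = $line[$index] unless $j == $i;
    $index += $j < $i ? $N - $j - 1 : 1;
  }

  # Highest score gets rank 1.
  my $rank = 0;
  foreach my $key (sort { $scores{$b} <=> $scores{$a} } keys %scores) {
    $rank[$key] .= ' ' . ++$rank;
  }
}

# Each non-self line ends up with two ranks, one from each fly's list.
foreach my $n (0 .. $#line) {
  print $line[$n] . (defined $rank[$n] ? $rank[$n] : '') . "\n";
}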
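
If the index stepping in the inner loop looks like magic, here is a rough
sketch of where it comes from. It assumes the file is laid out exactly as in
your example (fly0 versus 0..N-1, then fly1 versus 1..N-1, and so on). The
sub names pair_index and walk_indices are made up for this illustration.
It checks, for 3 flies, that stepping through the file visits the same line
numbers as working them out directly:

```perl
use strict;
use warnings;

# Hypothetical helper: line number of the score for flies $x and $y,
# assuming the triangular layout described above.
sub pair_index {
    my ($x, $y, $n) = @_;
    ($x, $y) = ($y, $x) if $x > $y;      # the file only stores x <= y
    return $x * $n - $x * ($x - 1) / 2 + ($y - $x);
}

# Line numbers visited for fly $i, stepping the same way as the loop above.
sub walk_indices {
    my ($i, $n) = @_;
    my @seen;
    my $index = $i;
    for my $j (0 .. $n - 1) {
        push @seen, $index;
        $index += $j < $i ? $n - $j - 1 : 1;
    }
    return @seen;
}

my $n = 3;
for my $i (0 .. $n - 1) {
    my @direct = map { pair_index($i, $_, $n) } 0 .. $n - 1;
    my @walked = walk_indices($i, $n);
    die "mismatch for fly$i" unless "@direct" eq "@walked";
    print "fly$i: @walked\n";
}

# The last line of the file is fly(N-1) versus itself, so the total
# number of lines is one more than its index: (N ** 2 + N) / 2.
print pair_index($n - 1, $n - 1, $n) + 1, "\n";
```

For 3 flies this prints the line numbers 0 1 2, then 1 3 4, then 2 4 5, and a
total of 6 lines, which matches the six-line example in your mail.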

cheers

Andrew
