> ok, seriously, we'll also assume no two entries have
> the same number, and if
> they did you'd want to delete repeats. This makes
> things a lot easier.
>
> #! perl
>
> open FILE, 'file.txt' or die "Can't open file.txt: $!";
> @list = <FILE>; # get list into array or by some other means
>                 # keep the line breaks if you can
> for $i (0..$#list) {
>     $list[$i] =~ /(....)(\d*)/;
>     $sortedlist[$2] = $list[$i]; # create each element of the new list
> }
> print @sortedlist;
>
> The nice thing is if you have gaps in the array
> (e.g. elements 2,3,4 exist
> but 5-83 don't) it really won't matter.
It does matter a little: if you have big gaps in the array
(e.g., @list = ("exon1", "exon3908239")), you end up
creating a huge array that stores only a few elements,
with every slot in between left undef.
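To make that concrete, here is a minimal sketch (using the hypothetical two-element list from the sentence above) that builds the array-indexed list the same way and then counts its slots:

```perl
use strict;
use warnings;

# Hypothetical two-element input with a large numeric gap
my @list = ("exon1", "exon3908239");

my @sparse;
for my $item (@list) {
    # index the array by the trailing number, as the array-based approach does
    $sparse[$1] = $item if $item =~ /(\d+)$/;
}

# count every slot, then only the slots that actually hold an entry
my $slots   = scalar @sparse;
my $defined = grep { defined } @sparse;
print "slots: $slots, defined: $defined\n";
```

Only two of the roughly 3.9 million slots hold data, and a final print @sortedlist would still have to walk all of them.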
A hash-based approach may be a little more memory-efficient
(no array slots are wasted on the gaps) while keeping the
property that duplicate numbers collapse into a single entry:
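sub sort_custom {
    my %sorted;
    for (@_) {
        # key each entry on its trailing digits; skip entries without any
        next unless /(\d+)$/;
        $sorted{$1} = $_;   # a later duplicate overwrites an earlier one
    }
    # sort the numeric keys and map them back to the original strings
    return map { $sorted{$_} } sort { $a <=> $b } keys %sorted;
}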
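A quick usage sketch for sort_custom, assuming the sample list from the original question plus a hypothetical repeated exon5 to show duplicates collapsing into one entry. The copy of the sub here includes a next unless guard so that entries without trailing digits are skipped rather than filed under a stale $1:

```perl
use strict;
use warnings;

# hash-based sort: key on trailing digits, duplicates collapse
sub sort_custom {
    my %sorted;
    for (@_) {
        next unless /(\d+)$/;
        $sorted{$1} = $_;
    }
    return map { $sorted{$_} } sort { $a <=> $b } keys %sorted;
}

# Sample list from the question, plus a hypothetical duplicate exon5
my @input  = qw(exon1 exon5 exon12 exon30 exon2 exon5);
my @output = sort_custom(@input);
print "@output\n";    # exon1 exon2 exon5 exon12 exon30
```

Six entries go in and five come out: because later entries overwrite earlier ones in %sorted, only one exon5 survives.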
However, timing them with Benchmark on my DV iMac 400
suggests that all our approaches are roughly equivalent
(with my hash-style approach sadly coming in dead last):
use strict;
use warnings;
use Benchmark;

my @list = qw(
    exon1
    exon5
    exon12
    exon30
    exon2
);

# each label times the sub it is named after
timethese(100000, {
    'sorted_with_custom' => sub { my @ary = sort_custom(@list) },
    'sorted_with_array'  => sub { my @ary = sort_array(@list) },
    'sorted_by_exon_num' => sub { my @ary = sort by_exon_num @list },
});

# hash-based: key on trailing digits, duplicates collapse
sub sort_custom {
    my %sorted;
    for (@_) {
        next unless /(\d+)$/;
        $sorted{$1} = $_;
    }
    return map { $sorted{$_} } sort { $a <=> $b } keys %sorted;
}

# comparison sub for sort(): extract and compare the digits of $a and $b
sub by_exon_num {
    my ($a_dig) = $a =~ /(\d+)/;
    my ($b_dig) = $b =~ /(\d+)/;
    $a_dig <=> $b_dig;
}

# array-based: use each number as an array index (gaps are left undef)
sub sort_array {
    my @sortedlist;
    for my $item (@_) {
        next unless $item =~ /(\d+)$/;
        $sortedlist[$1] = $item;
    }
    return @sortedlist;
}
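With results as follows. Note that in this run the labels
were crossed: 'sorted_with_array' was actually timing
sort by_exon_num and 'sorted_by_exon_num' was timing
sort_custom, so sort_array itself was never measured: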
Benchmark: timing 100000 iterations of sorted_with_array, sorted_by_exon_num, sorted_with_custom...
sorted_with_array: 18 secs (17.45 usr 0.00 sys = 17.45 cpu)
sorted_by_exon_num: 19 secs (19.03 usr 0.00 sys = 19.03 cpu)
sorted_with_custom: 20 secs (19.33 usr 0.00 sys = 19.33 cpu)
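by_exon_num re-runs both regexes on every comparison, so its cost grows with the number of comparisons rather than the number of entries. A common Perl idiom that this thread does not use, the Schwartzian transform, extracts each number once before sorting. The sketch below (sort_st is a hypothetical name) pairs it with Benchmark's cmpthese, which prints relative rates instead of raw times. No timings are claimed here; on a list this short the difference may well go either way, so run it on your own machine:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @list = qw(exon1 exon5 exon12 exon30 exon2);

# comparison sub: runs two regex matches per comparison
sub by_exon_num {
    my ($a_dig) = $a =~ /(\d+)/;
    my ($b_dig) = $b =~ /(\d+)/;
    $a_dig <=> $b_dig;
}

# Schwartzian transform: pair each entry with its trailing number
# (computed once), sort on the number, then unwrap the original string
sub sort_st {
    return map  { $_->[1] }
           sort { $a->[0] <=> $b->[0] }
           map  { [ (/(\d+)$/ ? $1 : 0), $_ ] } @_;
}

# run each variant for at least 1 CPU second and print a rate chart;
# cmpthese also returns the chart rows (header row included)
my $rows = cmpthese(-1, {
    by_exon_num => sub { my @ary = sort by_exon_num @list },
    st          => sub { my @ary = sort_st(@list) },
});
```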
> I'm sure it can get really complicated if you have
> many different combos of
> letters at the beginning. But if you can separate
> those out into separate
> lists then run the subroutine over each of them,
> that'll do it.
Well, even if the number of leading characters is
variable, simply catching the trailing digits (i.e.,
using /(\d+)$/, as sort_custom does) should eliminate any
unnecessary complexity that stems from that problem.
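A minimal sketch of the trailing-digits idea, using hypothetical entries whose prefixes vary in length. The second sort is an alternative for the case where you do want entries grouped by prefix (the separate-lists suggestion quoted above), done in a single pass by comparing the prefix first and falling back to the number:

```perl
use strict;
use warnings;

# Hypothetical entries whose non-numeric prefixes differ
my @items = qw(exon12 intron3 e7 cds1 exon30);

# Compare on the trailing digits only; the prefix is ignored
my @sorted = sort {
    ($a =~ /(\d+)$/)[0] <=> ($b =~ /(\d+)$/)[0]
} @items;
print "@sorted\n";    # cds1 intron3 e7 exon12 exon30

# Group by prefix (string order), then by number within each prefix
my @grouped = sort {
    my ($pa, $na) = $a =~ /^(\D*)(\d+)$/;
    my ($pb, $nb) = $b =~ /^(\D*)(\d+)$/;
    $pa cmp $pb || $na <=> $nb
} @items;
print "@grouped\n";   # cds1 e7 exon12 exon30 intron3
```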
I sure had a lot of time on my hands today :-)
Regards,
David
>----- Original Message -----
>
> Hi,
>
> I am trying to sort a list like this
>
> exon1
> exon5
> exon12
> exon30
> exon2
>
> Into ->
>
> exon1
> exon2
> exon5
> exon12
> exon30
>
> Any ideas on how to do this?
>
> Thanks
>
> adam