Hi all,
I have a file that looks like this:
>name1
ADASDADFSDF
ADASDADFSDF
SDFDSFDSDFDF
>name2
ASDFDFDFFDF
ADFEERERREWR
ADFADFQERQEWR
>name1
ADASDADFSDF
SDFDSFADFDF
SDFDSFDSDFDF
>name3
SDAFSDFDFF
WERWERER
WERWERER
and I want to have something like this:
>name1
ADASDADFSDFADASDADFSDFSDFDSFDSDFDF
>name2
ASDFDFDFFDFADFEERERREWRADFADFQERQEWR
>name3
SDAFSDFDFFWERWERERWERWERER
Note that ">name1" is repeated in the input but not in the output.
With the script below I can put everything under ">anyname"
on one line. However, if ">anyname" is repeated, the sequences of
both records get concatenated, and I do not want that.
Any help is welcome, and thanks in advance.
Cheers
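#!/usr/bin/perl
use strict;
use warnings;

if (!@ARGV) {
    print STDERR "usage: $0 fasta_file\n";
    exit 1;
}
my $FILE = shift @ARGV;
my @ID;     # header lines, in order of first appearance
my %SEQ;    # header line => sequence joined on one line
read_alignment($FILE);
foreach my $key (@ID) {    # print each header followed by its sequence
    printf "%s\n%s\n", $key, $SEQ{$key};
}

sub read_alignment {
    my ($file) = @_;
    my $current;    # header of the record currently being read
    open(my $fh, '<', $file) or die "can't open file '$file': $!\n";
    while (my $line = <$fh>) {
        chomp($line);
        if ($line =~ /^(>\S+)/) {
            $current = $1;
            if (!exists $SEQ{$current}) {
                push(@ID, $current);
                $SEQ{$current} = '';
            }
        }
        elsif (defined $current) {
            # a repeated header still appends here: this is the
            # concatenation I want to avoid
            $SEQ{$current} .= $line;
        }
    }
    close $fh;
}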
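
One possible direction, as a minimal sketch only: remember which headers have already been seen and ignore every line of a record whose header is a repeat. This assumes that the first record for each name is the one to keep, which is what the example output for ">name1" suggests. The helper name first_records and the inline demo data are hypothetical and not part of the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: takes FASTA lines and returns a list of
# [header, sequence] pairs, keeping only the first record per header.
sub first_records {
    my @lines = @_;
    my (%seen, @records);
    my $keep = 0;    # true while reading a record we want to keep
    for my $line (@lines) {
        chomp $line;
        if ($line =~ /^(>\S+)/) {
            # new record: keep it only if this header was not seen before
            $keep = !$seen{$1}++;
            push @records, [$1, ''] if $keep;
        }
        elsif ($keep) {
            # sequence line of a kept record: append to its single line
            $records[-1][1] .= $line;
        }
    }
    return @records;
}

# Small demo on made-up data with a repeated ">name1" header.
my @demo = (">name1", "AAA", "CCC", ">name2", "GGG", ">name1", "TTT");
print "$_->[0]\n$_->[1]\n" for first_records(@demo);
```

To use it on a real file, the lines could be read from a filehandle and passed in, e.g. first_records(<$fh>). Because the records are kept in an array rather than printed from the keys of a hash, the output order follows the input order.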
However, I find that
*******************************************************************
PEDRO A. RECHE, PhD                  TL: 617 632 3824
Dana-Farber Cancer Institute,        FX: 617 632 4569
Harvard Medical School,              EM: [EMAIL PROTECTED]
44 Binney Street, D1510A,            EM: [EMAIL PROTECTED]
Boston, MA 02115                     URL: http://www.reche.org
*******************************************************************