On Feb 4, 2004, at 11:35 AM, Michael S. Robeson II wrote:


Hi, I am still too new to Perl, and I am having trouble reformatting my data. Here is my problem:

I have data (DNA sequence) in a file that looks like this:

####
# Infile
####
>bob
AGTGATGCCGACG
>fred
ACGCATATCGCAT
>jon
CAGTACGATTTATC

and I need it converted to:

####
# Outfile
####
R 1 20

 A G U G A U G C C G A C G - - - - - - -       bob
 A C G C A U A U C G C A U - - - - - - -       fred
 C A G U A C G A U U U A U C - - - - - -       jon
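A minimal Perl sketch of reading the input format above into name/sequence pairs. It assumes every record starts with a ">name" line. It also joins any extra sequence lines onto the current record, which is an assumption because the sample has only one line per sequence. The sub name read_records is hypothetical, chosen for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read ">name" header lines and the sequence line(s)
# that follow them. Returns a list of [name, sequence] pairs in file order.
# Assumptions: extra lines of one sequence are joined together; blank lines
# and anything before the first header (e.g. "# Infile") are skipped.
sub read_records {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;           # skip blank lines
        if ($line =~ /^>(\S+)/) {
            push @records, [ $1, '' ];      # start a new record
        }
        elsif (@records) {
            $records[-1][1] .= $line;       # append to the current sequence
        }
    }
    return @records;
}

# Usage with an in-memory file holding the sample input:
my $sample = ">bob\nAGTGATGCCGACG\n>fred\nACGCATATCGCAT\n>jon\nCAGTACGATTTATC\n";
open(my $in, '<', \$sample) or die "Can't open in-memory file: $!";
my @parsed = read_records($in);
close($in);

for my $rec (@parsed) {
    print "$rec->[0]: $rec->[1]\n";
}
```

Reading all records first keeps the reading step separate from the formatting step, so each piece can be checked on its own.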


The "R 1" is static and should always appear. The "20" at the top of the new file is a number chosen by the user: the script should prompt for the length each sequence should be. That total length (letters plus added dashes) could be 20, 3000, or anything else. So if the user types 20 and a row has only 10 letters, the script should add 10 dashes to bring the total up to 20.
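The padding rule can be sketched as a small Perl helper. The name pad_sequence is hypothetical. Cutting a sequence that is already longer than the requested length is an assumption, because the request above only describes the padding case.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pad $seq with '-' up to $len characters.
# Assumption: a sequence longer than $len is cut down to its first $len
# characters, since the request only covers sequences that are too short.
sub pad_sequence {
    my ($seq, $len) = @_;
    if (length($seq) > $len) {
        return substr($seq, 0, $len);              # too long: keep first $len
    }
    return $seq . ('-' x ($len - length($seq)));   # too short: add dashes
}

my $padded = pad_sequence('ACGCATATCG', 20);       # 10 letters in the row
print "$padded\n";
print length($padded), "\n";
```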


Note that there should be a space before every letter and dash, including one at the beginning of the line. After the sequence string come 7 spaces, followed by the name, as shown in the example output above. Also note that all T's are changed to U's: for those of you who know biology, I am not only switching the data format but also converting DNA to RNA.
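One output row can be built in Perl as follows, assuming the sequence has already been padded to the requested length. The sub name format_row is hypothetical. The spacing follows the description above: a space before every character, then 7 spaces, then the name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build one output row from a name and an already
# padded sequence (assumption: padding/truncation was done beforehand).
sub format_row {
    my ($name, $seq) = @_;
    (my $rna = $seq) =~ tr/T/U/;                 # DNA -> RNA: T becomes U
    my $spaced = ' ' . join(' ', split //, $rna); # leading space, then "A G U ..."
    return $spaced . (' ' x 7) . $name;          # 7 spaces, then the name
}

print format_row('bob', 'AGTGATGCCGACG-------'), "\n";
```

Using split and join makes the spacing explicit, which may be easier to follow than a single substitution over every character position.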

I hope I am explaining this clearly enough. Below is as far as I can get with the code; I just do not know how to structure the loop. I always have trouble manipulating data the way I want inside a loop. I would prefer easy-to-understand code over efficient code, so that I can learn the simple stuff first and the shortcuts later. Thanks to anyone who can help.

- Cheers!
- Mike

######
#!/usr/bin/perl
use warnings;
use strict;

print "Enter the path of the INFILE to be processed:\n";

# For example "rotifer.txt" or "../Desktop/Folder/rotifer.txt"

chomp (my $infile = <STDIN>);

open(INFILE, '<', $infile)
                or die "Can't open $infile for input: $!";

print "Enter the path of the OUTFILE:\n";

# For example "rotifer_out.txt" or "../Desktop/Folder/rotifer_out.txt"

chomp (my $outfile = <STDIN>);

open(OUTFILE, '>', $outfile)
                or die "Can't open $outfile for output: $!";

print "Enter the LENGTH you want each sequence to be:\n";
my ( $len ) = <STDIN> =~ /(\d+)/ or die "Invalid length parameter";


print OUTFILE "R 1 $len\n\n"; # "R 1" header line, then one blank line before the rows

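my $name;
while (<INFILE>) {
    chomp;
    next unless /\S/;                     # skip blank lines
    if (/^>(\w+)/) {                      # header line: remember the name
        $name = $1;
    }
    elsif (defined $name) {               # sequence line for the last name seen
        tr/T/U/;                          # convert Ts to Us
        substr($_, $len) = '' if length($_) > $len;           # shorten, if needed
        $_ .= '-' x ($len - length($_)) if length($_) < $len; # pad with dashes
        my $spaced = join ' ', split //, $_;                  # "A G U ..."
        print OUTFILE " $spaced       $name\n"; # leading space, 7 spaces, name
    }
}
close(INFILE);
close(OUTFILE);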


Hope that helps.

James

