Hi all!

Well, based on the input I have received from everyone thus far, I have been able to cobble the following code together (see below for the input and output of this script).

Anyway, though it works great, I am having a tough time figuring out WHY it works. I am especially having trouble with the line "next unless s/^\s*(\S+)//" in relation to the while loop it is in. Basically, I do not understand how the script tells the ">bob" line in the input apart from the lines of "agactgatcg" (again, see the input and output at the bottom). I know that "$/" has something to do with this, but I am not sure how or why it works.
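A minimal sketch of what the while loop actually sees, assuming the sample data is read from an in-memory string instead of INFILE (the `records` sub is a hypothetical helper, not part of the script, and it uses `local $/` where the script sets `$/` globally). With `$/` set to '>', each read returns everything up to and including the next '>' rather than one line, so a name and its sequence lines arrive together as a single record:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample data in the same shape as "input type 1" (multiline records).
my $data = ">bob\natcgactagcatcgatcg\nacacgtacgactagcac\n\n>fred\nactgactacgatcgaca\nacgcgcgatacggcat\n";

# Hypothetical helper: split $text into [name, rest] pairs the way the script's loop does.
sub records {
    my ($text) = @_;
    my @out;
    open my $fh, '<', \$text or die "Can't open in-memory handle: $!";
    local $/ = '>';    # each <$fh> now reads up to and including the next '>' (or to end of file)
    while ( my $rec = <$fh> ) {
        chomp $rec;    # chomp removes a trailing '>' (the current $/), not "\n"
        # The very first read returns just ">", which chomp turns into "".
        # The substitution cannot match an empty string, so that record is skipped.
        next unless $rec =~ s/^\s*(\S+)//;
        push @out, [ $1, $rec ];    # $1 is the name; $rec now holds only the sequence lines
    }
    close $fh;
    return @out;
}

for my $r ( records($data) ) {
    my ( $name, $rest ) = @$r;
    ( my $shown = $rest ) =~ s/\n/\\n/g;    # make the newlines visible
    print "name=[$name] rest=[$shown]\n";
}
```

The name is never on a "line" of its own from the loop's point of view: it is simply the first run of non-whitespace characters in each record, and everything after it is sequence.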

I hate to sound like a dummy, but if anyone can help me understand WHAT the script is doing in the "while loop", I would really appreciate it. I think understanding the mechanics behind this script will help my future understanding of writing Perl scripts, especially when it comes to regular expressions and loops. Heck, if there is a better way to do certain parts of this, let me know! Also, special thanks to James Gray for the help thus far!! Till then, I'll be wracking my head with my Perl books!

The working script:
_________

#!/usr/bin/perl

use warnings;
use strict;

print "Enter the path of the INFILE to be processed:\n";

# For example "rotifer.txt" or "../Desktop/Folder/rotifer.txt"

chomp (my $infile = <STDIN>);

open(INFILE, '<', $infile)
                or die "Can't open INFILE for input: $!";

print "Enter in the path of the OUTFILE:\n";

# For example "rotifer_out.txt" or "../Desktop/Folder/rotifer_out.txt"

chomp (my $outfile = <STDIN>);

open(OUTFILE, '>', $outfile)
                or die "Can't open OUTFILE for output: $!";

print "Enter in the LENGTH you want the sequence to be:\n";
my ( $len ) = <STDIN> =~ /(\d+)/ or die "Invalid length parameter";


print OUTFILE "R 1 $len\n\n\n\n"; # The top of the file.


$/ = '>'; # Set the input record separator: each <INFILE> now reads up to the next '>'

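while ( <INFILE> ) {            # $_ holds one '>'-delimited record, e.g. "bob\natcg...\n\n>"
    chomp;                      # removes the trailing '>' (chomp strips $/, not just "\n")
    # Capture the first word (the name) into $1 and delete it from $_.
    # The first record read is just ">" (the file starts with '>'), which chomp
    # leaves empty, so the match fails and that record is skipped.
    next unless s/^\s*(\S+)//;
    my $name = $1;
    # Every letter left in $_, padded with '-' and cut to exactly $len elements.
    my @char = ( /[a-z]/ig, ( '-' ) x $len )[ 0 .. $len - 1 ];
    my $sequence = join( ' ', @char);
    $sequence =~ tr/Tt/Uu/;     # DNA to RNA: T becomes U
    print OUTFILE " $sequence       $name\n";
}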


close INFILE; close OUTFILE;

___________
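The line that builds `@char` packs several steps into one list slice. A sketch that spells them out, using a hypothetical `pad_or_trim` sub that is not part of the script:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper doing the same work as the script's line
#   my @char = ( /[a-z]/ig, ( '-' ) x $len )[ 0 .. $len - 1 ];
sub pad_or_trim {
    my ( $seq, $len ) = @_;
    my @letters = $seq =~ /[a-z]/ig;    # every letter, in order; newlines and spaces are ignored
    my @dashes  = ('-') x $len;         # list repetition: $len copies of '-'
    # Put the two lists end to end, then keep only elements 0 .. $len-1:
    # short sequences get padded with '-', long ones get cut off.
    my @padded = ( @letters, @dashes )[ 0 .. $len - 1 ];
    return @padded;
}

print join( ' ', pad_or_trim( "acgu\n", 6 ) ), "\n";      # a c g u - -
print join( ' ', pad_or_trim( "acguacgu", 3 ) ), "\n";    # a c g
```

Because the slice always asks for exactly `$len` elements and at least `$len` dashes are always available, every output row comes out the same length (35 letters plus 7 dashes for "bob" at a length of 42).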

Again, this script converts the following data, which comes as either single-line or multiline sequence data:

### input type 1 ###
>bob
atcgactagcatcgatcg
acacgtacgactagcac

>fred
actgactacgatcgaca
acgcgcgatacggcat
#####

or (as I posted originally)

### input type 2 ###
>bob
atcgactagcatcgatcgacacgtacgactagcac

>fred
actgactacgatcgacaacgcgcgatacggcat
#####
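Both input types give the same output because `/[a-z]/ig` in list context returns each letter as a separate match and never matches a newline. A small check, assuming the name has already been stripped off as in the loop (the variable names here are illustrative only):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The "bob" record after the name is removed: input type 1 (two lines) and type 2 (one line).
my $type1 = "\natcgactagcatcgatcg\nacacgtacgactagcac\n\n";
my $type2 = "\natcgactagcatcgatcgacacgtacgactagcac\n\n";

# A //g match in list context returns every match; [a-z] with /i matches
# one letter at a time, so "\n" and blank lines are simply skipped over.
my @from1 = $type1 =~ /[a-z]/ig;
my @from2 = $type2 =~ /[a-z]/ig;

print scalar(@from1), " letters vs ", scalar(@from2), " letters\n";
print join( '', @from1 ) eq join( '', @from2 ) ? "identical\n" : "different\n";
```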

### output ###
## Note that the T's are converted to U's in the output! ##

R 1 42


a u c g a c u a g c a u c g a u c g a c a c g u a c g a c u a g c a c - - - - - - - bob
a c u g a c u a c g a u c g a c a a c g c g c g a u a c g g c a u - - - - - - - - - fred


####
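The T-to-U conversion uses `tr///`, which translates characters one for one rather than matching a pattern. A minimal illustration on a short made-up string:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $sequence = join( ' ', qw(a t c g T) );    # "a t c g T", built like join(' ', @char)

# tr/// maps characters by position: T -> U and t -> u.
# It works on single characters, not regexes, and returns how many it changed.
my $changed = ( $sequence =~ tr/Tt/Uu/ );

print "$sequence\n";           # a u c g U
print "changed $changed\n";    # changed 2
```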


