position data
Well, with much help I have been able to come up with this currently non-working code:

    #!/usr/bin/perl
    use warnings;
    use strict;
    use Data::Dumper;

    my (%gap, %gap_pos, $animal);

    while (<DATA>) {
        if (/>(\w+)/) {
            $animal = $1;
        } else {
            while (/(-+)/g) {
                my $gap_length = length $1;
                my $position = pos ($1);
                $gap{$animal}{$gap_length}++;
                push (@{$gap{$animal}{$gap_length}}, $position);
            }
        }
    }
    print Dumper \%gap;

    __DATA__
    >human
    acgtt---cgatacg---acgact-t
    >chimp
    acgtacgatac---actgca---ac
    >mouse
    acgata---acgatcgacgt

Actually, the code will work fine if you comment out the second and last lines of the inner while loop. Anyway, I am having trouble adding position data to my hashes. I would like Data::Dumper to output data like this (I always get my syntax messed up, so I will just show part of $VAR1, but hopefully you'll understand):

    $VAR1 = {
        'human' => {
            '5' => '1' => { 25 }
            '3' => '2' => { 6, 16, },
    }

So, I am trying to figure out why the code I have does not work. What am I not getting? Any suggestions?

-Thanks
-Mike

--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
http://learn.perl.org/ http://learn.perl.org/first-response
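A minimal sketch of one way to record the positions, offered as an illustration rather than as the list's answer. Two things in the posted loop look like the likely culprits: pos() is being asked about $1 instead of the string being matched, and the same hash slot is first incremented as a counter and then used as an array reference. The sketch keeps only an array of start positions per gap length and derives the count from the array size. The helper name gap_positions is hypothetical.

```perl
use strict;
use warnings;

# gap_positions() is a hypothetical helper: it maps each gap length
# to a list of 1-based start positions of gaps with that length.
sub gap_positions {
    my ($seq) = @_;
    my %gaps;
    while ($seq =~ /(-+)/g) {
        my $len   = length $1;
        my $start = pos($seq) - $len + 1;   # pos() is the offset just past the match
        push @{ $gaps{$len} }, $start;
    }
    return \%gaps;
}

my $gaps = gap_positions('acgtt---cgatacg---acgact-t');
for my $len (sort { $a <=> $b } keys %$gaps) {
    my @pos = @{ $gaps->{$len} };
    printf "length %d: %d gap(s) at %s\n", $len, scalar @pos, join(', ', @pos);
}
# prints:
# length 1: 1 gap(s) at 25
# length 3: 2 gap(s) at 6, 16
```

Storing the result per animal would then be a matter of assigning the returned reference to $gap{$animal}.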
comparing data between hashes
I have several hashes of hashes that look like this (sorry if my formatting is a little off):

    Human => {                  # HoH for human
        1 => [1,32,54,67]       # the numbers in [ ] are one comma-delimited string, not separate hash values
        2 => [14,52,74,87]
        5 => [33,44,64,107]
    }

    Chimp => {                  # HoH for chimp
        1 => [1,32,67]
        2 => [14,74,87]
        5 => [33,44]
    }

Note: The numbers between the [ ] are a STRING delimited by commas and NOT separate hash values. I already have a working script that appends these numbers to the appropriate hash, separated by a comma or newline, like this:

    $class{$position} .= $pos . ",";    # to get: 5 => [33,44,64,107]

I am having trouble comparing the data. I want to compare each number (i.e. 1, 2, 5) and its data with the same number in the other species. For example, I would like to print out a table (see below) that compares the data for group 1 in each species.

    Alleles for set: 1
    Allele:   1   32   54   67
    Human     1    1    1    1     # 1 = present
    Chimp     1    1    0    1     # 0 = absent

And so on for 2 and 5. I can most likely do the print formatting for the table myself, so I do not need help with that. I just need help with comparing the data within each hash of a hash (HoH). I do not know if I should read this data into yet another hash (which would be very busy) or into an array somehow. I have been trying to figure this out all week to no avail. Any ideas or suggestions?

- Thanks
- Mike
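One possible approach, sketched under the assumption that the data really is stored as comma-delimited strings as described: split each string, collect the union of alleles for the set, and then flag each species against that union. The helper name presence_table and the %alleles layout are hypothetical; only the sample numbers come from the post.

```perl
use strict;
use warnings;

# Sample data copied from the post; each value is one comma-delimited string.
my %alleles = (
    Human => { 1 => '1,32,54,67', 2 => '14,52,74,87', 5 => '33,44,64,107' },
    Chimp => { 1 => '1,32,67',    2 => '14,74,87',    5 => '33,44' },
);

# presence_table() is a hypothetical helper. For one set it returns the
# sorted union of alleles and, per species, a list of 1/0 flags in that order.
sub presence_table {
    my ($data, $set) = @_;
    my (%seen, %has);
    for my $species (keys %$data) {
        my $list = $data->{$species}{$set};
        next unless defined $list;
        for my $allele (split /,/, $list) {   # a trailing comma is harmless here
            $seen{$allele}          = 1;
            $has{$species}{$allele} = 1;
        }
    }
    my @all = sort { $a <=> $b } keys %seen;
    my %row;
    for my $species (keys %$data) {
        $row{$species} = [ map { $has{$species}{$_} ? 1 : 0 } @all ];
    }
    return (\@all, \%row);
}

my ($all, $row) = presence_table(\%alleles, 1);
print "Alleles for set: 1\n";
print join("\t", 'Allele:', @$all), "\n";
print join("\t", $_, @{ $row->{$_} }), "\n" for sort keys %$row;
```

A lookup hash per species avoids nested loops over the allele lists, which is why the sketch builds %has instead of comparing arrays element by element.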
Counting gaps in sequence data revisited.
I just wanted to thank everyone for their help and suggestions. This is the final, full working code to count continuous gaps in every sequence in a multi-sequence FASTA file. It may not be elegant, but it is fast and works well. Sorry for the long post, but I wanted to share this with those who do any DNA work. :-)

I show, in order: the format of the input data, the output for that input, and finally the working script. I have yet to add comments to the code, but I am sure many of you veterans will figure it out.

-Thanks again all! As always, comments, questions or suggestions are welcome!
-Mike

--- INPUT ---

    >dog
    atcg--acgat---act-ca----
    >cat
    acgt-acgt----acgt-gt-agct-
    >mouse
    ---acgtacg-atcg---actgac-

--- OUTPUT ---

    ***Discovered the following DNA sequences:***
    dog found!
    cat found!
    mouse found!

    mouse
    Indel size: 1    Times found: 2
    Positions:
    11
    25
    Indel size: 3    Times found: 2
    Positions:
    1
    16

    cat
    Indel size: 1    Times found: 4
    Positions:
    5
    18
    21
    26
    Indel size: 4    Times found: 1
    Positions:
    10

    dog
    Indel size: 1    Times found: 1
    Positions:
    18
    Indel size: 2    Times found: 1
    Positions:
    5
    Indel size: 3    Times found: 1
    Positions:
    12
    Indel size: 4    Times found: 1
    Positions:
    21

--- Script ---

    #!/usr/bin/perl
    # By Michael S. Robeson II, with the help of friends at learn.perl.org and bioperl.org! :-)
    # 10/16/2004

    use warnings;
    use strict;

    ###
    # Open Sequence Data & OUTFILE
    ###

    print "Enter in the name of the DNA sequence file:\n";
    chomp (my $dna_seq = <STDIN>);
    open(DNA_SEQ, $dna_seq) or die "Can't open file: $!\n";
    open(OUTFILE, ">indel_list_".$dna_seq) or die "Can't open outfile: $!\n";

    # Read sequence data into a hash
    my %sequences;
    $/ = '>';
    print "\n***Discovered the following DNA sequences:***\n";
    while ( <DNA_SEQ> ) {
        chomp;
        next unless s/^\s*(.+)//;
        my $name = $1;
        s/\s//g;
        $sequences{$name} = $_;
        print "$name found!\n";
    }
    close DNA_SEQ;

    ##
    # iterate over gaps and write to file
    ##

    foreach (keys %sequences) {
        print "\t\t\\ $_ \\\n";
        print OUTFILE "\\ $_ \\\n";
        my $dna = $sequences{$_};
        my %gap_data;
        my %position;
        while ($dna =~ /(\-+)/g) {
            my $gap_pos = pos ($dna) - length($1) + 1;
            my $gap_length = length $1;    # $1 =~ tr/\-+//
            $gap_data{$gap_length}++;
            $position{$gap_length} .= $gap_pos. "\n";
        }
        my @indels = keys (%gap_data);
        my @keys = sort { $a <=> $b } @indels;
        foreach my $key (@keys) {
            print "Indel size:\t$key\tTimes found:\t$gap_data{$key}\n";
            print OUTFILE "Indel size:\t$key\tTimes found:\t$gap_data{$key}\n";
            print "Positions:\n";
            print OUTFILE "Positions:\n";
            print "$position{$key}";
            print OUTFILE "$position{$key}";
            print "\n";
            print OUTFILE "\n";
        }
        # Can replace the last foreach loop above with the while loop
        # below to do the same thing. Only gap sizes will not be sorted,
        # nor is it set up to print to a file.
        #
        # while (my ($key, $value) = each (%gap_data)) {
        #     print "Indel size:\t$key\tTimes found:\t$gap_data{$key}\n";
        #     print "Positions:\n";
        #     print "$position{$key}";
        #     print "\n\n";
        # }
    }
Re: Counting gaps in sequence data revisited.
I cleaned up the code a little. So, here it is for anyone interested:

    #!/usr/bin/perl
    # By Michael S. Robeson II with the help from the folks at learn.perl.org and bioperl.org
    # 10/16/2004
    # Last updated: 10/17/2004
    # This script was made for the purpose of searching for indels (gaps) in aligned
    # DNA or protein sequences that are in FASTA format. It tallies up all of the different
    # size gaps within each sequence string. While it does this it counts the number of
    # times each gap of a given size is represented in each sequence and at the same time
    # reports all of the positions at which that particular gap size or indel appears.
    # contact: [EMAIL PROTECTED] if you have questions or comments

    use warnings;
    use strict;

    #
    # Introduction
    #

    print "\n\t**Welcome to Mike Robeson's Gap-Counting Script!**\n
    A - Just be sure that your sequence alignment file is in FASTA format!
    B - Make sure there are no duplicate names within an individual file!
    C - Output file will be based on the name of the input file. It is named
        by appending \'indel_list_\' to the name of your input file.\n\n";

    ###
    # Open Sequence Data & OUTFILE
    ###

    print "Enter in the name of the DNA sequence file:\n";
    chomp (my $dna_seq = <STDIN>);
    open(DNA_SEQ, $dna_seq) or die "Can't open file: $!\n";
    open(OUTFILE, ">indel_list_".$dna_seq) or die "Can't open outfile: $!\n";

    #
    # Read sequence data into a hash
    #

    my %sequences;
    $/ = '>';
    print "\n***Discovered the following DNA sequences:***\n";
    while ( <DNA_SEQ> ) {
        chomp;
        next unless s/^\s*(.+)//;
        my $name = $1;
        s/\s//g;
        $sequences{$name} = $_;
        print "$name found!\n";
        print "\n";
    }
    close DNA_SEQ;

    ##
    # Iterate over gaps and write to file
    ##

    foreach (keys %sequences) {
        print "\t\t\\ $_ \\\n";
        print OUTFILE "\\ $_ \\\n";
        my $dna = $sequences{$_};
        my %gap_data;
        my %position;
        while ($dna =~ /(\-+)/g) {
            my $gap_length = length $1;
            my $gap_pos = pos ($dna) - $gap_length + 1;
            $gap_data{$gap_length}++;
            $position{$gap_length} .= $gap_pos. "\n";
        }
        my @indels = keys (%gap_data);
        my @keys = sort { $a <=> $b } @indels;
        foreach my $key (@keys) {
            print "Indel size:\t$key\tTimes found:\t$gap_data{$key}\n";
            print OUTFILE "Indel size:\t$key\tTimes found:\t$gap_data{$key}\n";
            print "Positions:\n";
            print OUTFILE "Positions:\n";
            print "$position{$key}";
            print OUTFILE "$position{$key}";
            print "\n";
            print OUTFILE "\n";
        }
    }
Moving between hashes 2.
Ok, well I think I can see the forest but I have little idea as to what is actually going on here. I spent a few hours looking things up and I have a general sense of what is occurring, but I am getting lost in the details that were posted in the last digest. See below:

On Sep 19, 2004, at 10:08, [EMAIL PROTECTED] wrote:

> I see that you also made use of arrays. It struck me that, since the
> starting point is strings and not lists, using substr() would be more
> straight-forward:
>
>     my %hash3;
>     for ( keys %hash1 ) {
>         while ( my $aa = substr $hash1{$_},0,1,'' ) {

I have never seen anything like this, nor can I find anything in any of my Perl books to help me explain what the 0,1 and the '' are doing to the substr of $hash1. I assume it is position information of some kind? If so, what is going on?

>             $hash3{$_} .= $aa eq '-' ? '---' : substr $hash2{$_},0,3,'';

This is something new to me. I think I follow your use of the ?: pattern feature. However, none of the Perl books I have discuss its use in this fashion. So, I am unsure of how you know to do that, or rather... how would I have known that I can do that? But basically I see that you are looking for '-' and equating it with what is between the ? and : (i.e. '---'). So, as far as I can tell, you are saying: hey, if you find '-' in $aa then append a '---' to $hash3, otherwise append the next three DNA letters. However, I do not understand the syntax of how Perl is actually doing this. Help with an explanation would be greatly appreciated.

As you can see, I can see what the big picture is; it's just that I am unable to determine mechanistically how Perl is actually going about doing it. Also, any online references to the techniques used above would be great. I'd look for them myself, but I do not know what some of these are actually called.

-Thanks so much, I have learned a little just from this much so far.
-mike
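For readers with the same question, here is a small illustration of the two constructs in isolation. The four-argument form of substr (EXPR, OFFSET, LENGTH, REPLACEMENT) returns the selected characters and replaces them in the original string, so "0,1,''" means "take one character from offset 0 and delete it from the string". The ?: construct is the conditional operator described in perlop. The variable names and sample values are made up for the example.

```perl
use strict;
use warnings;

my $protein = 'mf-d';

# substr EXPR, OFFSET, LENGTH, REPLACEMENT
# Returns LENGTH characters starting at OFFSET and replaces them in
# $protein with REPLACEMENT (here the empty string, i.e. removes them).
my $first = substr $protein, 0, 1, '';
print "$first $protein\n";    # prints: m f-d

# CONDITION ? VALUE_IF_TRUE : VALUE_IF_FALSE
# The whole expression evaluates to one of the two values.
my $aa    = '-';
my $codon = $aa eq '-' ? '---' : 'agt';
print "$codon\n";             # prints: ---
```

One caveat worth knowing: a loop written as while ( my $aa = substr ... ) stops as soon as the extracted character is false in Perl terms, which includes the character '0'. That does not matter for amino-acid letters and dashes, but it would for numeric data.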
Moving between hashes.
I have two sets of data that have been stored in hashes. The first hash has amino-acid (protein) sequence data. The second hash has the corresponding DNA sequence of those amino acids:

Hash 1 key:value:

    cat   = mfgdhf
    dog   = mfg--f
    mouse = mf-d-f

Hash 2 key:value:

    cat   = agtcatgcacactgatcg
    dog   = agtcatgcatcg
    mouse = agtcatcactcg

And I need to insert gaps (missing or absent data) proportionally into the DNA sequence (Hash 2) so that the output is as follows:

Hash 3 key:value:

    cat   = agtcatgcacactgatcg
    dog   = agtcatgca------tcg
    mouse = agtcat---cac---tcg

All the lines should end up being the same length in a Courier font. Basically, I am having trouble scanning through, say, $hash1{cat} so that every dash found there ends up represented as three dashes in the DNA for cat. Also, every amino acid is represented by 3 DNA letters. This is why I need to move in increments of 3 and add in increments of 3 for my final data to appear as it does in Hash 3.

Example of relationship:

    M   F   -   D   -   F     = amino-acid
    agt tca --- act --- tcg   = dna

I have everything else set up; I just need a few suggestions on how to do the above. Any help will be greatly appreciated.
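A minimal sketch of the mapping described above, assuming the DNA string always holds exactly three letters for every non-gap amino acid. It splits the DNA into codons up front and walks the protein one character at a time; the function name insert_gaps is hypothetical.

```perl
use strict;
use warnings;

# insert_gaps() is a hypothetical helper: for each protein character it
# emits '---' for a gap, otherwise the next three DNA letters.
sub insert_gaps {
    my ($protein, $dna) = @_;
    my @codons  = $dna =~ /(.{3})/g;    # DNA in groups of three letters
    my $aligned = '';
    for my $aa (split //, $protein) {
        my $codon = $aa eq '-' ? '---' : shift @codons;
        die "DNA too short for protein '$protein'\n" unless defined $codon;
        $aligned .= $codon;
    }
    return $aligned;
}

my %hash1 = ( cat => 'mfgdhf',             dog => 'mfg--f',       mouse => 'mf-d-f' );
my %hash2 = ( cat => 'agtcatgcacactgatcg', dog => 'agtcatgcatcg', mouse => 'agtcatcactcg' );

my %hash3 = map { $_ => insert_gaps($hash1{$_}, $hash2{$_}) } keys %hash1;
printf "%-5s = %s\n", $_, $hash3{$_} for sort keys %hash3;
# prints:
# cat   = agtcatgcacactgatcg
# dog   = agtcatgca------tcg
# mouse = agtcat---cac---tcg
```

Because the helper works on copies of its arguments, the original hashes are left untouched, which may or may not matter for the rest of the script.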
Re: nested if
No, your post was not in the last e-mail digest I received, unless I missed it somehow. But the link you provided seems to clear things up for me. So, it's not about the order of operation but about the timing of when the variables actually get defined. That is, at the beginning of the loop the values are not yet defined, so they are ignored. When the loop comes around again, these values are defined because the previous iteration of the loop has set them in the else statement?

-Mike

On Jul 2, 2004, at 10:25 PM, Michael S. Robeson II wrote:

> Well yeah, the indentation makes it much clearer. However, this does
> not help me understand how the nested if statements are working. Which
> of the two if statements gets evaluated first? I am trying to figure out
> in English what the if statements are actually doing. Is it saying:
>
> If a line begins with >bla-bla and if $seq (which appears nowhere else
> in the code other than $seq="") exists, assign it to the hash pro with
> the name bla-bla.
>
> So my question is how does the inner if statement work when $seq="" is
> outside that if statement? Is the outer if statement evaluated first,
> then the inner? Because how does the inner if statement know what $seq
> is? I am probably not making any sense, but I am trying to figure out
> mechanically how the perl interpreter knows what to do in the context of
> the nested if statements.
>
> -Thanks
> -Mike
>
> Gunnar Hjalmarsson wrote:
> > That illustrates the importance of indenting the code in a way that
> > makes sense:
> >
> >     while (<align>) {
> >         $line=$_;
> >         if ($line=~/^>(.+)/) {
> >             if ($seq) {
> >                 $pro{$name}=$seq;
> >                 #print "SEQ:\n$pro\n\n";
> >             }
> >             $name=$1;
> >             $name=~s/\s//g;
> >             push @names, $name;
> >             #print "$name\n";
> >             $k++;
> >             $seq="";
> >         } else {
> >             chomp $line;
> >             $seq.=$line;
> >         }
> >     }
> >
> > Quite a difference, isn't it?
> >
> > --
> > Gunnar Hjalmarsson
> > Email: http://www.gunnar.cc/cgi-bin/contact.pl
Re: nested if
Great, thanks for the help!

-Mike

On Jul 3, 2004, at 2:16 PM, Gunnar Hjalmarsson wrote:

> Michael S. Robeson II wrote:
> > No, your post was not in the last e-mail digest I received,
>
> I see. Sometimes I think that digest mode for mailing lists is a
> nuisance. ;-)
>
> > But the link you provided seems to clear things up for me. So, it's
> > not about the order of operation but about the timing of when the
> > variables actually get defined. That is, at the beginning of the loop
> > the values are not yet defined, so they are ignored. When the loop
> > comes around again, these values are defined because the previous
> > iteration of the loop has set them in the else statement?
>
> Yep, that's it. I would suggest that you post the above comment to the
> list, too.
>
> Rgds,
> Gunnar
Re: nested if
Well yeah, the indentation makes it much clearer. However, this does not help me understand how the nested if statements are working. Which of the two if statements gets evaluated first? I am trying to figure out in English what the if statements are actually doing. Is it saying:

If a line begins with >bla-bla and if $seq (which appears nowhere else in the code other than $seq="") exists, assign it to the hash pro with the name bla-bla.

So my question is how does the inner if statement work when $seq="" is outside that if statement? Is the outer if statement evaluated first, then the inner? Because how does the inner if statement know what $seq is? I am probably not making any sense, but I am trying to figure out mechanically how the perl interpreter knows what to do in the context of the nested if statements.

-Thanks
-Mike

Gunnar Hjalmarsson wrote:

> That illustrates the importance of indenting the code in a way that
> makes sense:
>
>     while (<align>) {
>         $line=$_;
>         if ($line=~/^>(.+)/) {
>             if ($seq) {
>                 $pro{$name}=$seq;
>                 #print "SEQ:\n$pro\n\n";
>             }
>             $name=$1;
>             $name=~s/\s//g;
>             push @names, $name;
>             #print "$name\n";
>             $k++;
>             $seq="";
>         } else {
>             chomp $line;
>             $seq.=$line;
>         }
>     }
>
> Quite a difference, isn't it?
>
> --
> Gunnar Hjalmarsson
> Email: http://www.gunnar.cc/cgi-bin/contact.pl
Re: nested if
Ok, that makes much more sense, I think. So, I guess, the outer 'if' and 'else' statements get evaluated first. Then the inner 'if' can proceed once all the lines of data have been gathered in the outer 'else' statement. This way the lines can be assigned as a key-value pair in the hash. I guess the individual who wrote the code could have made it much cleaner or easier to read. The re-organizing of the conditionals in your second e-mail makes it perfectly clear and is something I would have done had I known how the nested 'if' statement was working (again, if I have it right). Hopefully, I 'got it' now. I can see why many coders are annoyed with nested if statements. :-)

-Thanks!
-Mike

On Jul 3, 2004, at 4:38 AM, Randy W. Sims wrote:

> On 7/2/2004 10:25 PM, Michael S. Robeson II wrote:
> > Well yeah, the indentation makes it much clearer. However, this does
> > not help me understand how the nested if statements are working.
> > Which of the two if statements gets evaluated first? I am trying to
> > figure out in English what the if statements are actually doing. Is it
> > saying:
> >
> > If a line begins with >bla-bla and if $seq (which appears nowhere
> > else in the code other than $seq="") exists, assign it to the hash pro
> > with the name bla-bla.
> >
> > So my question is how does the inner if statement work when $seq="" is
> > outside that if statement? Is the outer if statement evaluated first,
> > then the inner? Because how does the inner if statement know what $seq
> > is? I am probably not making any sense, but I am trying to figure out
> > mechanically how the perl interpreter knows what to do in the context
> > of the nested if statements.
>
> Tidied up a little more:
>
>     my( %pro, @names);
>     my( $name, $seq, $k );
>     while (defined( my $line = <DATA> )) {
>         if ($line =~ /^>(.+)/) {
>             if ($seq) {
>                 $pro{$name} = $seq;
>                 $seq = '';
>             }
>             $name = $1;
>             $name =~ s/\s//g;
>             push @names, $name;
>             $k++;
>         } else {
>             chomp( $line );
>             $seq .= $line;
>         }
>     }
>
> This code deals with multi-line sequences, putting multiple lines
> together until a sequence is complete.
>
> The 'else' part of the outer 'if' does the accumulation of multiple
> lines into a sequence. The 'if' part determines that a sequence is
> complete, captures some type of name from the sequence, stores the
> complete sequence in the '%pro' hash, and pushes the name onto a
> '@names' array. I'm guessing '$k' keeps tally of the number of
> sequences; it's not clear if that is necessary since `scalar @names`
> possibly will provide the same info. It's also unclear why there is a
> '@names' array that mostly duplicates `keys %pro`.
>
> I think where you're, understandably, getting confused is that most of
> those variables are global. That's made more explicit in my strictified
> rewrite above. It could probably be rewritten better if we knew the
> exact format of the data being read.
>
> Randy.
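A sketch of the same accumulate-then-store pattern wrapped in a subroutine, assuming FASTA-style input where name lines begin with ">". One detail worth noting about the loop as posted: a sequence is stored only when the next name line is seen, so the final sequence in the file is never stored unless something does so after the loop. The sketch adds that final step. The function name read_fasta and the sample data are hypothetical.

```perl
use strict;
use warnings;

# read_fasta() is a hypothetical helper that returns (\%pro, \@names).
sub read_fasta {
    my ($fh) = @_;
    my (%pro, @names, $name);
    my $seq = '';
    while (defined( my $line = <$fh> )) {
        chomp $line;
        if ($line =~ /^>(.+)/) {
            # A new name line means the previous record is complete.
            $pro{$name} = $seq if defined $name;
            ($name = $1) =~ s/\s//g;
            push @names, $name;
            $seq = '';
        } else {
            $seq .= $line;                  # accumulate multi-line sequence
        }
    }
    $pro{$name} = $seq if defined $name;    # store the last record too
    return (\%pro, \@names);
}

# Usage with an in-memory file so the example is self-contained.
my $data = ">cat\nacgt\nac\n>dog\nttga\n";
open my $fh, '<', \$data or die "Can't open in-memory file: $!";
my ($pro, $names) = read_fasta($fh);
print "$_ = $pro->{$_}\n" for @$names;
# prints:
# cat = acgtac
# dog = ttga
```

Keeping the state in lexical variables inside the subroutine also sidesteps the confusion about globals mentioned in the reply.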
nested if
I came across some code on the internet that looks like this (this is only part of the script):

    while (<align>) {
    $line=$_;
    if ($line=~/^>(.+)/) {
    if ($seq) {
    $pro{$name}=$seq;
    #print "SEQ:\n$pro\n\n";
    }
    $name=$1;
    $name=~s/\s//g;
    push @names, $name;
    #print "$name\n";
    $k++;
    $seq="";
    } else {
    chomp $line;
    $seq.=$line;
    }
    }

I am having trouble figuring out how the nested if statements work (i.e. what the order of operation is, etc.) and their associated else statements. I pretty much understand the rest of what is going on, but I am having trouble putting into words what the nested if statements are doing. I mean, I know enough to see that the code is... ummm... yuck!!! :-)

-Thanks for any help!
-Mike
combining data from more than one file...
Hi all,

I am having trouble with combining data from several files, and I can't even figure out how to get started. So, I am NOT asking for any code (though pseudo-code is ok), as I would like to try figuring this problem out myself. If anyone can give me any references or hints, that would be great. Here is what I am trying to do.

I have, say, 2 files (I'd like to do this for as many files as the user needs):

    ***FILE 1***
    >cat
    atacta--gat--acgt-
    ac-ac-ggttta-ca--
    >dog
    atgcgtatgc-atcgat-ac--ac-a-ac-a-cac
    >mouse
    acagctagc-atgca--
    acgtatgctacg--atg-
    ***end file 1***

    ***FILE 2***
    >mouse
    aatctgatcgc-atgca--
    acgtaaggctagg-
    >cat
    atacta--gat--acgt-
    ac-acacagcta--ca--
    >dog
    atgcgtatgc-atcgat
    -ac--ac-a-ac-a-cac
    ***end file 2***

Basically, I would like to concatenate the sequence of each corresponding animal so that the various input files would be output to a file like so:

    ***output***
    >cat
    atacta--gat--acgt-ac-ac-ggttta-ca--atacta--gat--acgt-ac-acacagcta--ca--
    >dog
    atgcgtatgc-atcgat-ac--ac-a-ac-a-cacatgcgtatgc-atcgat-ac--ac-a-ac-a-cac
    >mouse
    acagctagc-atgca--acgtatgctacg--atg-aatctgatcgc-atgca--
    acgtaaggctagg-
    ***output end***

Notice that in the two files the data are not in the same order. So, I am trying to figure out how to have the script work out what the first organism is in FILE 1 (say cat in this case) and find the corresponding cat in the other input files. Then it should take the sequence data (all the cat data) from FILE 2 and concatenate it to the cat sequence data from FILE 1 in an output file. Then it should go on to the next organism in FILE 1 and search for that organism in the other files (in this case FILE 2). I do not care about the order of the data, only that like data is concatenated together.

Again, I do NOT want this solved for me (unless I am totally lost). Otherwise, I'll never learn. I would just like hints, suggestions, pseudo-code, or even links to books or sites that discuss this particular topic. Meanwhile, I am eagerly awaiting my PERL Cookbook and I'll keep searching the web.

-Thanks!
-Mike
Re: combining data from more than one file...
Well, this is the best I could do thinking through what you said. This is actually my first time working with hashes. Also, I am still a PERL newbie. So, I guess a little helpful code would go a long way. I just can't figure out how to link the regular expressions to the hash when searching through the multiple files, to do as you say:

***Philipp wrote:***

- open the first file
- search for the beginning of an organism (say: cat), read everything after this point
- search in your hash if you already stored data of this organism
- if yes, append your new sequence to the already existing data
- if no, create a new key in the hash
- repeat this until you run out of organisms
- repeat the whole procedure until you run out of files

***end***

    #!/usr/bin/perl
    # This script will take separate FASTA files and combine the like
    # data into one FASTA file.
    #

    use warnings;
    use strict;

    my %organisms (
        $orgID => $orgSeq,
    );

    print "Enter in a list of files to be processed:\n";
    # For example:
    # CytB.fasta
    # NADH1.fasta
    # 

    chomp (my @infiles = <STDIN>);

    foreach $infile (@infiles) {
        open (FASTA, $infile) or die "Can't open INFILE: $!";
        $/='>';    #Set input operator
        while (<FASTA>) {
            chomp;
            # Some regular expression match here?
            # something that will set, say... cat
            # as the key $orgID, something similar
            # to below?
            # and then set the sequence as the value
            # $orgSeq like below?

            # Do not know if or where to put the following,
            # but something like:
            if (exists $organisms{$orgID}) {
                # somehow concatenate like data
                # from the different files
            }
            # print the final Hash to an outfile?
        }

yeah, I'm lost. :-)

-Mike
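The steps listed above can be sketched roughly as follows. This is only an illustration, assuming FASTA-style records that each start with ">" followed by the organism name; the subroutine name add_fasta and the sample data are hypothetical, and in-memory strings stand in for the real files. Appending with .= covers both the "already stored" and the "new key" cases, so no explicit exists test is needed.

```perl
use strict;
use warnings;

# add_fasta() is a hypothetical helper: it appends every record read
# from $fh to %$organisms, keyed by organism name.
sub add_fasta {
    my ($fh, $organisms) = @_;
    local $/ = '>';                              # read one ">"-delimited record at a time
    while (my $record = <$fh>) {
        chomp $record;                           # drop the trailing ">"
        next unless $record =~ s/^\s*(\S+)//;    # pull off the name; skips the empty first record
        my $id = $1;
        $record =~ s/\s//g;                      # join multi-line sequence data
        $organisms->{$id} .= $record;            # creates the key or appends to it
    }
}

# Two in-memory "files" stand in for the real input files.
my @files = (
    ">cat\natac--\nac-a\n>dog\natgc\n",
    ">dog\n-ac-\n>cat\nggtt\n",
);

my %organisms;
for my $text (@files) {
    open my $fh, '<', \$text or die "Can't open in-memory file: $!";
    add_fasta($fh, \%organisms);
    close $fh;
}

print ">$_\n$organisms{$_}\n" for sort keys %organisms;
# prints:
# >cat
# atac--ac-aggtt
# >dog
# atgc-ac-
```

With real files, the loop body would open each filename from the user's list instead of a string reference; the order of records within each file does not matter because the hash does the matching.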
Re: formatting the loop
On Feb 11, 2004, at 2:55 PM, James Edward Gray II wrote:

> [snip]
>
> >     my @char = ( /[a-z]/ig, ( '-' ) x $len )[ 0 .. $len - 1 ];
>
> If I may, yuck! This builds up a list of all the A-Za-z characters in
> the string, adds a boat load of extra - characters, trims the whole
> list to the length you want and stuffs all that inside @char. It also
> receives a rank of awful, on the James Gray Scale of Readability. ;)
>
> [snip]

Ok, now I understand. I found that my problem was with how the next command was operating in conjunction with the grouping of characters. Ok, making progress. :-)

Now, about that array slice I have:

    my @char = ( /[a-z]/ig, ( '-' ) x $len) [0 .. $len - 1];

I know it wastes a lot of memory and makes perl do much extra work. However, when I try to replace that line with something like this:

    my @char = ( /[a-z]/ig, ( '-' ) x ($len - length) );

it doesn't work the way I thought it would (gee, what a thought). I would like to express the code similar to ( '-' ) x ($len - length) because it is easy for me to read and it tells you clearly what is going on. However, every time I try to implement something like that I get unexpected output or I have to really rewrite the loop, which I have been unable to troubleshoot, as you have been seeing. :-)

I think the 'length' command is also counting any '\n' characters or something, because my output ends up with different lengths like this when I use the ($len - length) way:

    a c u g a c g a g u - - - - - - - - bob
    a c u g a c u a g c u g - - - - - - - fred

with this input:

    >bob
    actgacgagt
    >fred
    actgactagctg

The reason I went with /[a-z]/ig is that some sequence data uses other letters to denote ambiguity and other things. I guess I could list only the letters it uses, but I was just lazy and typed in the entire range of a to z.

I will be continuing to work on it, but here is the code as it stands now (with that awful array slice).

    #!/usr/bin/perl

    use warnings;
    use strict;

    print "Enter the path of the INFILE to be processed:\n";
    # For example "rotifer.txt" or "../Desktop/Folder/rotifer.txt"
    chomp (my $infile = <STDIN>);
    open(INFILE, $infile) or die "Can't open INFILE for input: $!";

    print "Enter in the path of the OUTFILE:\n";
    # For example "rotifer_out.txt" or "../Desktop/Folder/rotifer_out.txt"
    chomp (my $outfile = <STDIN>);
    open(OUTFILE, ">$outfile") or die "Can't open OUTFILE for output: $!";

    print "Enter in the LENGTH you want the sequence to be:\n";
    my ( $len ) = <STDIN> =~ /(\d+)/ or die "Invalid length parameter";

    print OUTFILE "R 1 $len\n\n\n\n";    # The top of the file is supposed

    $/ = '>';    # Set input operator

    while ( <INFILE> ) {
        chomp;
        next unless s/^\s*(.+)//;    # delete name and place in memory
        my $name = $1;               # whatever is in memory is saved as $name
        my @char = ( /[a-z]/ig, ( '-' ) x $len) [0 .. $len -1];
                                     # take only sequence letters
                                     # and add '-' to the end
        my $sequence = join( ' ', @char);    # turn into scalar
        $sequence =~ tr/Tt/Uu/;      # convert T's to U's
        print OUTFILE "$sequence $name\n";
    }

    close INFILE;
    close OUTFILE;
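A possible way to get the readable ($len - count) form, sketched under the assumption that the uneven output comes from length measuring the whole record, newlines included, rather than just the letters. Collecting the letters first and then counting the resulting array avoids that. The function name pad_sequence is hypothetical, and the T-to-U conversion is left out to keep the example focused.

```perl
use strict;
use warnings;

# pad_sequence() is a hypothetical helper: it keeps only the letters of
# $record, pads with '-' up to $len, or truncates if there are too many.
sub pad_sequence {
    my ($record, $len) = @_;
    my @char = $record =~ /[a-z]/ig;                     # letters only, no newlines
    push @char, ('-') x ($len - @char) if @char < $len;  # @char in numeric context is its size
    splice @char, $len if @char > $len;                  # drop anything past $len
    return join ' ', @char;
}

print pad_sequence("\nactgacgagt\n", 12), "\n";
# prints: a c t g a c g a g t - -
print pad_sequence("\nactgactagctg\n", 12), "\n";
# prints: a c t g a c t a g c t g
```

Both sample records now come out at the same width because the padding is computed from the number of letters rather than from the raw string length.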
formatting the loop
Hi all!

Well, based on the input I have received from everyone thus far, I have been able to cobble the following code together (see below for the input and output of this script). Anyway, though it works great, I am having a tough time trying to figure out WHY it works. I am especially having trouble with the line:

    next unless s/^\s*(\S+)//

in relation to the while loop it is in. Basically, I do not understand how the script is differentiating the >bob line in the input from the lines of agactgatcg (again, see input and output at bottom). I know that $/ has something to do with this, but I am not sure how or why it works.

I hate to sound like a dummy, but if anyone can help me understand WHAT the script is doing in the while loop, I would really appreciate it. I think if I can understand the mechanics behind this script it will only help my future understanding of writing PERL scripts, especially when it comes to regular expressions and loops. Heck, if there is a better way to do certain parts of this, let me know!

Also, special thanks to James Gray for the help thus far!! Till then, I'll be wracking my head with my PERL books!

The working script:
_

    #!/usr/bin/perl

    use warnings;
    use strict;

    print "Enter the path of the INFILE to be processed:\n";
    # For example "rotifer.txt" or "../Desktop/Folder/rotifer.txt"
    chomp (my $infile = <STDIN>);
    open(INFILE, $infile) or die "Can't open INFILE for input: $!";

    print "Enter in the path of the OUTFILE:\n";
    # For example "rotifer_out.txt" or "../Desktop/Folder/rotifer_out.txt"
    chomp (my $outfile = <STDIN>);
    open(OUTFILE, ">$outfile") or die "Can't open OUTFILE for output: $!";

    print "Enter in the LENGTH you want the sequence to be:\n";
    my ( $len ) = <STDIN> =~ /(\d+)/ or die "Invalid length parameter";

    print OUTFILE "R 1 $len\n\n\n\n";    # The top of the file.

    $/ = '>';    # Set input operator

    while ( <INFILE> ) {
        chomp;
        next unless s/^\s*(\S+)//;
        my $name = $1;
        my @char = ( /[a-z]/ig, ( '-' ) x $len )[ 0 .. $len - 1 ];
        my $sequence = join( ' ', @char);
        $sequence =~ tr/Tt/Uu/;
        print OUTFILE "$sequence $name\n";
    }

    close INFILE;
    close OUTFILE;
___

Again, this script is to convert the following data, existing as either single-line or multi-line sequence data:

    ### input type 1 ###
    >bob
    atcgactagcatcgatcg
    acacgtacgactagcac
    >fred
    actgactacgatcgaca
    acgcgcgatacggcat
    #

or (as I posted originally)

    ### input type 2 ###
    >bob
    atcgactagcatcgatcgacacgtacgactagcac
    >fred
    actgactacgatcgacaacgcgcgatacggcat
    #

    ### output ###
    ## Note that the T's are converted to U's in the output! ##

    R 1 42



    a u c g a c u a g c a u c g a u c g a c a c g u a c g a c u a g c a c - - - - - - - bob
    a c u g a c u a c g a u c g a c a a c g c g c g a u a c g g c a u - - - - - - - - - fred
Re: formatting the loop
See comments below.

On Feb 11, 2004, at 2:55 PM, James Edward Gray II wrote:

> On Feb 11, 2004, at 1:27 PM, Michael S. Robeson II wrote:
>
> [snip]
>
>> Anyway, though it works great I am having a tough time trying to
>> figure out WHY it works.
>
> See comments below, in the code.
>
> [snip]
>
>> I think if I can understand the mechanics behind this script it will
>> only help my future understanding of writing PERL scripts.
>
> Perl. The language you are learning is called Perl, not PERL. :)

Hehe, thanks. :-)

> [snip]
>
>> $/ = '>'; # Set input operator
>
> Here's most of the magic. This sets Perl's input separator to a >
> character. That means that <INFILE> won't return a sequence of
> characters ending in a \n like it usually does, but a sequence of
> characters ending in a >. It basically jumps name to name, in other
> words.
>
>> while ( <INFILE> ) {
>> chomp;
>
> chomp() will remove the trailing >.

OK, that makes pretty good sense. I understand that now, I hope. See next comment.

>> next unless s/^\s*(\S+)//;
>> my $name = $1;
>
> Well, if we're reading name to name, the thing right at the beginning
> of our sequence is going to be a name, right? The above removes the
> name, and saves it for later use.

OK, I think this is where my problem is. That is, how does it know that the characters as in "bob" or "fred" are the names, and not mistake the sequence of letters "agtcaccgatg" for something to be placed in memory ($name)? Basically, I am reading the following:

    next unless s/^\s*(\S+)//;

as "Go to the next line unless you see a line with zero or more whitespace characters followed by one or more non-whitespace characters, and save the non-whitespace characters in memory."

If this is correct, then how can perl tell the difference between the lines containing "bob" or "fred" (and put them in memory) and the "acgatctagc" lines (and not put these in memory)? Both kinds of lines seem to fit the pattern to me. I think it has something to do with how perl is reading through the file that makes this work?

So, there is something I am missing, not noticing or realizing here. Maybe I've been staring at the code for far too long and should take a break! :-)

>> my @char = ( /[a-z]/ig, ( '-' ) x $len )[ 0 .. $len - 1 ];
>
> If I may, yuck! This builds up a list of all the A-Za-z characters in
> the string, adds a boat load of extra - characters, trims the whole
> list to the length you want and stuffs all that inside @char. It also
> receives a rank of "awful" on the James Gray Scale of Readability. ;)

Yeah, I need to clean that up a bit!

> [snip]

-Mike
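The question about how the names and the sequences are told apart can be explored with a small experiment. The sketch below assumes the names in the input are prefixed with a > character and that $/ is set to '>'; the in-memory sample data and the helper name parse_records are made up for illustration and are not part of the posted script. The point it tries to show is that the loop never sees individual lines: each read returns a whole record (name plus all of its sequence lines), and the anchored substitution only ever touches the first run of non-whitespace in that record.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical sample data, held in a string so the sketch runs without a file.
my $data = ">bob\natcgactagcatcgatcg\nacacgtacgactagcac\n"
         . ">fred\nactgactacgatcgaca\nacgcgcgatacggcat\n";

# parse_records is a made-up helper name used only for this sketch.
sub parse_records {
    my ($text) = @_;
    my @records;

    open( my $fh, '<', \$text ) or die "Can't open in-memory file: $!";
    local $/ = '>';    # a "line" now ends at the next '>' instead of at "\n"

    while ( my $record = <$fh> ) {
        chomp $record;    # chomp strips a trailing $/, here the '>'

        # The very first read returns just ">" (nothing precedes the first
        # name). After chomp it is empty, the substitution fails, and the
        # record is skipped.
        next unless $record =~ s/^\s*(\S+)//;
        my $name = $1;    # first run of non-whitespace in the record

        # What is left is the sequence, possibly spread over several lines.
        ( my $sequence = $record ) =~ s/\s+//g;
        push @records, [ $name, $sequence ];
    }
    close $fh;
    return @records;
}

# Print one "name sequence" line per record.
printf "%-5s %s\n", @$_ for parse_records($data);
```

Because of the ^ anchor and the lack of a /g modifier, the substitution matches once per record, at the start. The sequence lines come later in the same record, so they are never candidates for $name.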
formatting and syntax
Hi,

I am still very new to PERL and I am having trouble converting my data into a new format. So here is my problem. I have data (DNA sequence) in a file that looks like this:

    # Infile
    >bob
    AGTGATGCCGACG
    >fred
    ACGCATATCGCAT
    >jon
    CAGTACGATTTATC

and I need it converted to:

    # Outfile
    R 1 20
     A G U G A U G C C G A C G - - - - - - -       bob
     A C G C A U A U C G C A U - - - - - - -       fred
     C A G U A C G A U U U A U C - - - - - -       jon

The "R 1" is static and should always appear. The "20" at the top of the new file should be a number defined by the user; that is, they should be prompted for the length they wish the sequence to be. The total length of the sequence plus the added dashes could be 20 or 3000 or whatever. So, if they type 20 and there are only 10 letters in that row, then the script should add 10 dashes to bring the total up to the 20 chosen by the user.

Note that there should be a space between all letters and dashes, including a space at the beginning. Then there are supposed to be 7 spaces after the sequence string, followed by the name, as shown in the example output file above. Also of note is the fact that all of the T's are changed to U's. For those of you who know biology, I am not only switching formats of the data but also changing DNA to RNA.

I hope I am explaining this clearly enough, but here (see below) is as far as I can get with the code. I just do not know how to structure the loop/code to do this. I always have trouble with manipulating data the way I want when it comes to a loop. I would prefer easier-to-understand code rather than efficient code. This way I can learn the simple stuff first and learn the shortcuts later.

Thanks to anyone who can help.

- Cheers!
- Mike

    #!/usr/bin/perl
    use warnings;
    use strict;

    print "Enter the path of the INFILE to be processed:\n";
    # For example "rotifer.txt" or "../Desktop/Folder/rotifer.txt"
    chomp (my $infile = <STDIN>);
    open(INFILE, $infile) or die "Can't open INFILE for input: $!";

    print "Enter in the path of the OUTFILE:\n";
    # For example "rotifer_out.txt" or "../Desktop/Folder/rotifer_out.txt"
    chomp (my $outfile = <STDIN>);
    open(OUTFILE, ">$outfile") or die "Can't open OUTFILE for output: $!";

    print "Enter in the LENGTH you want the sequence to be:\n";
    my ( $len ) = <STDIN> =~ /(\d+)/ or die "Invalid length parameter";

    print OUTFILE "R 1 $len\n\n\n\n"; # The top of the file is supposed

    # type of loop or structure to follow ?