Re: combining data from more than one file...

2004-05-17 Thread Johan Viklund
On Sun, 16 May 2004 19:50:57 -0400, Michael S. Robeson II  
<[EMAIL PROTECTED]> wrote:

Hi all,
Hello and Welcome to the world of bioinformatics with perl!
...
I think you should take a look at BioPerl, since this is genome data. For
this exercise it's not what you want, but it will be useful if you want to
do more biology with Perl (BLAST, interfacing with databases, easy format
conversion, and so on). BioPerl can be found at http://www.bioperl.org/

***FILE 1***
 >cat
atacta--gat--acgt-
ac-ac-ggttta-ca--
...
Again, I do NOT want this solved for me (unless I am totally lost).  
Otherwise, I'll never learn. I would just like either hints /  
suggestions / pseudo code / even links to books or sites that discuss  
this particular topic. Meanwhile, I am eagerly awaiting my "PERL  
Cookbook" and I'll keep searching the web.
So this was more like a link ;)

-Thanks!
-Mike
/Johan Viklund
Ps.

Next exercise (or really the one before) would be to calculate the GC-skew.
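
As a hint for that exercise, here is a minimal sketch. It assumes the usual definition of GC skew, (G - C) / (G + C), counted over a single sequence; the sub name gc_skew is only illustrative, and in practice the skew is often computed over sliding windows along a genome rather than over a whole sequence at once.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# GC skew = (G - C) / (G + C), counted over one sequence.
# Gap characters ('-') and all other letters are simply ignored.
sub gc_skew {
    my ($sequence) = @_;
    my $g = ($sequence =~ tr/Gg//);   # tr with an empty replacement only counts
    my $c = ($sequence =~ tr/Cc//);
    return if $g + $c == 0;           # skew is undefined without any G or C
    return ($g - $c) / ($g + $c);
}

# Sample sequence taken from FILE 1 in the original question.
my $skew = gc_skew('atacta--gat--acgt-');
printf "GC skew: %.3f\n", $skew if defined $skew;
```

Counting with tr leaves the sequence untouched, so gaps do not need to be stripped first.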

--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
<http://learn.perl.org/> <http://learn.perl.org/first-response>



Re: combining data from more than one file...

2004-05-19 Thread Johan Viklund
Hi,
See my comments in the code below.
On Tue, 18 May 2004 13:16:37 -0400, Michael Robeson <[EMAIL PROTECTED]>  
wrote:

Ok great. Most of what you show does make sense. However, there are some  
bits of code that I need further clarification with. Some bits I am able  
to tell what they are doing but I do not quite know how or why they work  
the way they do. I'll state these areas in the code we've got together  
at this point.

Hopefully, I have copied over the bits you wrote correctly. I find this  
is like learning Spanish. I can read and (roughly) get the gist of the  
code. But when it comes to writing the original code on my own is when I  
have trouble. I am sure this will go away when I practice more. :-)

I didn't finish everything because I just need some code explained /  
clarified.

 >>>Start PERL code<<<
#!/usr/bin/perl -w
use strict;
use FileHandle;
# I am unsure of what this module is. I've tried looking it up
# in the Camel and Llama book to no avail, not enough description.
# I guess I have to figure out the whole object thing?
# Run 'perldoc FileHandle' on the command line to see its documentation
# (you can do this with (hopefully) all new modules you come across).
my %organisms;
print "Enter in a list of files to be processed:\n";
# For example:
# Cytb.fasta
# NADH1.fasta
# ...
# chomp (my @infiles = <STDIN>);
# TODO we should make this nicer later
my @infiles = ('genetics.txt');
foreach my $infile(@infiles) {
my $FASTA = new FileHandle;

# Does the above statement tell PERL to create a new
# filehandle for each file it finds? I guess I need to understand
# what "new" and the module "FileHandle" are doing.
Right on.
open ($FASTA, $infile)
or die "Can't open INFILE:$!";

#$/='>' #Set input operator
my $orgID;
while (defined($_ = <$FASTA>)) {

# Above I am unsure of why the "defined function
# helps us here? I know it has something to do with an
# expression containing a valid string, but I am unsure
# of its function here. This is something I would have
# never thought to do.  :-)
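It's what
while (<$FASTA>)
actually does behind the scenes.
The defined function checks whether $_ got set or not, so the loop ends at
end of file and not on a line that merely evaluates to false (such as "0").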
chomp;
print "\nworking on >>$_<<\n";

if (/^\s*>(\w+)/) {
$orgID=$1;
print "Found a new organism start line ('$orgID')\n";

# The above regex makes complete sense. Actually, I was going to put
# something similar to that in my original post but wasn't sure
# if this was appropriate at the time. I guess it was!

} else {
print "This is just some data: $_\n";
print "This data needs to be appended to the hash entry for $orgID\n";
# okay, in the above you are taking the left over
# sequence ($_) and linking it as a "value" to "$orgID" ?
The if/else statement below should do what you want. I would do it like
this instead:
$organisms{$orgID} .= $_;

No if and no else, just that single line. Perl will make it work the way
it's supposed to: if the hash key doesn't exist it gets created, and the
contents of $_ are appended to it (as a string).

if (exists ($organisms{$orgID})) {
#TODO append the data to the hash here

# I guess I would put the following to append to
# the already existing hash:
# $organisms{$orgID} .= $_;

} else {
#create new hash entry for this data
$organisms{$orgID} = $_;
}
}   
}

# Do not forget to close the input file
close ($FASTA)
or die "Could not close INFILE: $!";
} # end of the foreach loop over @infiles
# We've processed all input files... print the resulting hash
print "\n*\n";
while (my($orgID, $sequence) = each(%organisms)) {
# since I want the output as:
# >cat
# actgac---cgatc-ag-cttag---acg
# >dog
# actatc---actat-at-accta---atc
# I would change the print statement to:
print "> . $orgID\n $sequence\n";
Hmm, you're trying to do string concatenation here but in that case it  
should be:
print ">" . $orgID . "\n" . $sequence . "\n";
but it's much easier to just do it like:
print ">$orgID\n$sequence\n";

}
exit;
 >>>end PERL code<<<
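
For reference, the pieces discussed above can be pulled together into one short script. This is only a sketch under a few assumptions: it uses a plain lexical filehandle with three-argument open instead of the FileHandle module, it takes the file names from the command line rather than from a prompt, and the sub name combine_fasta is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read FASTA-style records from a list of files and join the sequence
# lines of each organism, across all files, into one string per organism.
sub combine_fasta {
    my (@infiles) = @_;
    my %organisms;
    foreach my $infile (@infiles) {
        open(my $fh, '<', $infile) or die "Can't open $infile: $!";
        my $orgID;
        while (defined(my $line = <$fh>)) {
            chomp $line;
            if ($line =~ /^\s*>(\w+)/) {
                $orgID = $1;                    # header line such as ">cat"
            } elsif (defined $orgID and $line =~ /\S/) {
                $line =~ s/\s+//g;              # drop stray whitespace
                $organisms{$orgID} .= $line;    # creates the key if needed
            }
        }
        close($fh) or die "Could not close $infile: $!";
    }
    return %organisms;
}

# Usage: perl combine.pl Cytb.fasta NADH1.fasta
if (@ARGV) {
    my %organisms = combine_fasta(@ARGV);
    foreach my $orgID (sort keys %organisms) {
        print ">$orgID\n$organisms{$orgID}\n";
    }
}
```

Sorting the keys is a choice made here to get a stable output order; each() as used above returns the entries in no particular order.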
Thanks for all your help so far! Most of this is starting to help my  
thinking. I will be doing a lot more of this multi-file parsing as most  
of my work entails manipulating data in several files or folders at once.

-Mike


/Johan
 



Re: Dates are killing me..

2004-07-07 Thread Johan Viklund
Hi

Use the following functions from the Date::Calc module on CPAN:

Week_of_Year:   
($week,$year) = Week_of_Year($year,$month,$day);

Monday_of_Week: 
($year,$month,$day) = Monday_of_Week($week,$year);

Add_Delta_Days: 
($year,$month,$day) = Add_Delta_Days($year,$month,$day, $Dd);

Shouldn't be too hard I think.
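
Put together, a minimal sketch could look like the following. It assumes Date::Calc is installed from CPAN and that weeks run Monday to Sunday; the sub name week_bounds is just illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Date::Calc qw(Week_of_Year Monday_of_Week Add_Delta_Days);

# Return the Monday and Sunday of the (Monday-based) week containing a date.
sub week_bounds {
    my ($year, $month, $day) = @_;
    # Week_of_Year also returns the year the week belongs to, which can
    # differ from the calendar year around New Year, so pass that one on.
    my ($week, $week_year) = Week_of_Year($year, $month, $day);
    my @monday = Monday_of_Week($week, $week_year);
    my @sunday = Add_Delta_Days(@monday, 6);
    return (sprintf('%04d-%02d-%02d', @monday),
            sprintf('%04d-%02d-%02d', @sunday));
}

my ($monday, $sunday) = week_bounds(split /-/, '2004-07-07');
print "Monday: $monday\nSunday: $sunday\n";
```

For 2004-07-07, a Wednesday, this gives Monday 2004-07-05 and Sunday 2004-07-11.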

On Wed, 2004-07-07 at 06:03, Chris Puccio wrote:
> Hi Guys,
> 
> I'm having a real hard time trying to figure this out..
> 
> There are tons of modules on dates, etc, but I can't seem to find one to do 
> what I need.
> 
> I have one date, for example: 2004-07-07.
> 
> I need to take that date, get Monday's date and Sunday's date where 2004-07-07 
> is between.
> 
> Any suggestions?
> 
> Thanks!!
> -c
-- 
Johan Viklund <[EMAIL PROTECTED]>

