Dear all, I am a Perl newbie struggling to accomplish a conceptually simple bioinformatics task. I have a single large file containing about 200,000 DNA probe sequences (from an Affymetrix microarray), each of which is accompanied by a header, like so (this is in FASTA format):
>probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; Antisense; TGGCTCCTGCTGAGGTCCCCTTTCC >probe:HG_U95Av2:107_at:543:519; Interrogation_Position=258; Antisense; CTACTCTCGTGGTGCACAAGGAGTG >probe:HG_U95Av2:1156_at:528:483; Interrogation_Position=2054; Antisense; TGCAGGTGGCAGATCTGCAGTCCAT >probe:HG_U95Av2:1102_s_at:541:589; Interrogation_Position=4316; Antisense; GTGAAGGTTGCTGAGGCTCTGACCC .........etc. What I would like to do is extract from this file a subset of ~130,800 probes (both the header and the sequence) and output this subset into a new text file in the same (FASTA) format. These 130,800 probes correspond to 8,175 probe set IDs ("1138_at" is an example of a probe set ID in the header listed above). I have these 8,175 IDs listed in a separate file called "ID.txt" and the 200,000 probe sequences in a file called "HG_U95Av2_probe_fasta.txt". The script below is missing something because the output file ("probe_subset.txt") is blank. This is also the case if I replace the file "ID.txt" with a file consisting of a single probe set ID (e.g. 1138_at). Does anyone know what I am missing? I am running this script in Cygwin on Windows XP. I appreciate any suggestions! ~ Mike O. #!/usr/bin/perl -w use strict; my $IDs = 'ID.txt'; unless (open(IDFILE, $IDs)) { print "Could not open file $IDs!\n"; } my $probes = 'HG_U95Av2_probe_fasta.txt'; unless (open(PROBES, $probes)) { print "Could not open file $probes!\n"; } open (OUT,'>','probe_subset.txt') or die "Can't write output: $!"; my @ID = <IDFILE>; chomp @ID; while (<PROBES>) { foreach (@ID) { if(/^>probe:\w+:(\w+):/) { print OUT; print OUT scalar(<PROBES>); } } } exit; -- No virus found in this outgoing message. Checked by AVG Free Edition. Version: 7.1.394 / Virus Database: 268.8.4/363 - Release Date: 6/13/2006 -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] <http://learn.perl.org/> <http://learn.perl.org/first-response>