Hello Tiago,

On Thu, Jan 26, 2012 at 11:08 AM, Tiago Hori <tiago.h...@gmail.com> wrote:
> Hi All,
>
> I need some help to get started on a script.
>
> I have these huge data files with 16K rows and several columns. I need to
> parse the rows into a subset of these 16K rows. Each row has an identifier
> made up of 2 letters and 6 numbers, and the ones I want have specific
> letters: they start with either C or D. So I know I can use regex, but I
> have been trying to figure out the rest and I don't know where to start.
> This is the first time I am trying to do something from scratch, so any
> suggestions would be appreciated. I am not asking for the script, just
> some help on how to go about it.
>
> So, what I want to be able to do is retrieve all the rows that have
> identifiers starting with C or D. Should I use arrays? Can I store each row
> as one item (a set of information separated by tabs) in an array?
>
Yes, I would split each row into an array of fields and then use a regex to
pick out the rows that match your criteria. I put together a little sample
program using fictitious data; you should be able to apply the same concept
to your needs. The pattern /^[CD]/ matches a field that starts with either C
or D. (Writing it as /^C|D/ would instead match "starts with C" or "contains
D anywhere", because alternation binds more loosely than the anchor.)

***tested***

#!/usr/bin/perl
use warnings;
use strict;

while ( <DATA> ) {
    chomp;
    my @array  = split;        # split the row on whitespace
    my $GeneID = $array[6];    # the identifier is the 7th field in this sample
    next unless defined $GeneID;    # skip blank or short rows

    # keep only rows whose identifier starts with C or D
    if ($GeneID =~ /^[CD]/) {
        print $_, "\n";
    }
}

__DATA__
Line1 c 2 3 4 5 C 7 8 9
Line2 1 2 3 4 5 6 7 8 9
Line3 1 2 3 4 5 D 7 8 9
Line4 1 2 3 4 5 6 7 8 9
Line5 1 2 3 4 5 D 7 8 9
Line6 1 2 3 4 5 6 7 8 9

***output***

Line1 c 2 3 4 5 C 7 8 9
Line3 1 2 3 4 5 D 7 8 9
Line5 1 2 3 4 5 D 7 8 9
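
Since your real rows are tab-separated and the identifiers are 2 letters plus 6 digits, here is a slightly stricter sketch of the same idea. Several things in it are my assumptions, not something from your data: that the identifier sits in the first column (index 0), that the leading C or D is upper case, and the sample rows themselves are made up. The helper names is_wanted_id and filter_rows are just names I picked for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if the identifier starts with C or D,
# followed by one more letter and exactly six digits.
sub is_wanted_id {
    my ($id) = @_;
    return $id =~ /^[CD][A-Za-z]\d{6}$/ ? 1 : 0;
}

# Hypothetical helper: return the rows whose identifier column matches.
# $id_column is the zero-based index of the identifier field (assumed).
sub filter_rows {
    my ($id_column, @rows) = @_;
    my @kept;
    for my $row (@rows) {
        my @fields = split /\t/, $row;    # rows are tab-separated
        my $id = $fields[$id_column];
        next unless defined $id;          # skip blank or short rows
        push @kept, $row if is_wanted_id($id);
    }
    return @kept;
}

# Made-up sample rows; the identifier is assumed to be the first field.
my @sample = (
    "CA123456\t0.5\t1.2",
    "AD123456\t0.7\t3.1",    # contains a D but does not start with one
    "DB000001\t2.2\t0.4",
    "XC999999\t1.0\t1.0",
);

print "$_\n" for filter_rows(0, @sample);
```

With these sample rows, only the CA123456 and DB000001 rows are printed. For a real file you would read the lines from a filehandle, chomp them, and pass the right column index for your layout.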