Re: re-reading from already read file handle

Jim Gibson Mon, 20 Aug 2012 16:10:05 -0700

On Aug 20, 2012, at 3:32 PM, Rajeev Prasad wrote:

> Thx. I did some timestamp prints from within script, this piece is taking too 
> long: almost 5 minutes to complete...!!!
> 
> fyi, the strArr array contains about 1500 string elements. (this loop runs 
> that many times)
> the file tmp_FH_SR  is 27Mb and 300,000 lines of data.
> the file tmp_FH_RL is 13 Mb with around 150,000 lines of data.


You are reading the first file 1500 times and the second file 0 to 1500 times, 
depending on how many strings are matched only once. That is why it is taking 5 
minutes.

You are iterating over your string array 1500 times and reading the file during 
each iteration. It would be faster to read the file once and iterate over the 
array of strings for each record. You will have to use some sort of data 
structure to save the counts for each string. A hash would be suitable for this.
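
As a rough sketch of that idea (not your actual program): the sample data, the
in-memory filehandle, and the count_matches helper below are all made up for
illustration, and I am assuming a plain substring match is what you need.

```perl
use strict;
use warnings;

# Hypothetical stand-ins: @strArr holds the search strings and $data plays
# the role of the large file. An in-memory filehandle keeps this runnable.
my @strArr = ('alpha', 'beta', 'gamma');
my $data   = "alpha 1\nbeta 2\nalpha 3\nnothing here\n";

# Count, for every search string, how many records contain it.
sub count_matches {
    my ($fh, @strings) = @_;
    my %count = map { $_ => 0 } @strings;
    while (my $line = <$fh>) {              # the file is read exactly once
        for my $str (@strings) {
            $count{$str}++ if index($line, $str) >= 0;
        }
    }
    return \%count;
}

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $count = count_matches($fh, @strArr);
close $fh;

print "$_: $count->{$_}\n" for sort keys %$count;
```

The inner loop still runs once per string for every record, but the expensive
part, reading the file from disk, now happens only once.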

> 
> I have changed the names of variable to protect actual names...
> 
> in the first while, based on the fact that the $str was found only once in 
> the file, i obtain another field from the matching record. I use this field 
> to search for no of occurrences of this field in another file. Based on that 
> output i do further things with $str.

Here also you can identify which strings match only once in the first file, 
read each record in the second file, search for any of the once-matched strings 
in the just-read record, and process accordingly. If that is possible, then you 
will only read each file once, and you can't do better than that. 
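
Something along these lines might work. It is only a sketch: the record layout
(comma-separated, with the wanted field in the second column), the sample data,
and all variable names other than @strArr are assumptions of mine, since I do
not know your real format.

```perl
use strict;
use warnings;

# Hypothetical data; in-memory filehandles stand in for the two files.
my @strArr = ('alpha', 'beta');
my $first  = "alpha,K1\nbeta,K2\nbeta,K3\n";
my $second = "K1 x\nK1 y\nK2 z\n";

# Pass over the first file: count matches and remember the matching record.
my (%count, %record);
open my $fh1, '<', \$first or die "Cannot open first file: $!";
while (my $line = <$fh1>) {
    chomp $line;
    for my $str (@strArr) {
        next if index($line, $str) < 0;
        $count{$str}++;
        $record{$str} = $line;
    }
}
close $fh1;

# Keep only the strings matched exactly once, with the field to look up.
my %field_for;
for my $str (grep { ($count{$_} || 0) == 1 } @strArr) {
    $field_for{$str} = (split /,/, $record{$str})[1];
}

# Pass over the second file: count the records containing each saved field.
my %field_count = map { $_ => 0 } values %field_for;
open my $fh2, '<', \$second or die "Cannot open second file: $!";
while (my $line = <$fh2>) {
    for my $field (keys %field_count) {
        $field_count{$field}++ if index($line, $field) >= 0;
    }
}
close $fh2;

for my $str (sort keys %field_for) {
    my $field = $field_for{$str};
    print "$str -> $field: $field_count{$field}\n";
}
```

Each file is opened and read a single time; everything else is hash lookups in
memory.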

[program snipped]

> how can i make it faster? read the 27mb and 13mb files in an array one time 
> and work? 

Yes, you can try to read the files into memory and process the records there. 
That would be the fastest approach. 40MB of data should be no problem. Perl can 
utilize lots of available memory. You should still try to minimize the number 
of times you iterate over your data.
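
For example, a minimal sketch of reading everything into an array first. The
sample data and the in-memory filehandle are placeholders of mine for your
real file, and a substring match is assumed.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the file contents.
my $data   = "alpha 1\nbeta 2\nalpha 3\n";
my @strArr = ('alpha', 'beta');

# Reading the handle in list context pulls every line into memory at once.
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
chomp(my @records = <$fh>);
close $fh;

# A single pass over the in-memory records, counting matches per string.
my %count = map { $_ => 0 } @strArr;
for my $record (@records) {
    for my $str (@strArr) {
        $count{$str}++ if index($record, $str) >= 0;
    }
}

print "$_: $count{$_}\n" for sort keys %count;
```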


