I have been pondering whether it would be feasible to work with a 100,000 entry
file, and had put yesterday aside to do some timing tests. I first generated
some sample index files of various lengths. Each entry consisted of a single
line of the form

ASDF;rhubarb, rhubarb, ....

where the ASDF is a randomly generated four-character index, and the rest of
the line is filling, which varies slightly in length and contents from line to
line, just in case something tried to get smart and cache the line. The average
length of a line is about 80 bytes.
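
For anyone wanting to repeat the test, a generator along these lines would
produce a file in that format. This is only a sketch, not the script I
actually used: the function name make_sample_file() and the exact way the
filling is varied are assumptions made for illustration.

```php
<?php
// Hypothetical generator: writes $count lines of "KEY;filling" to $path.
function make_sample_file($path, $count)
{
    $letters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
    $fh = fopen($path, 'w');
    for ($i = 0; $i < $count; $i++) {
        // Random four-character index.
        $key = '';
        for ($j = 0; $j < 4; $j++) {
            $key .= $letters[mt_rand(0, 25)];
        }
        // Filling that varies a little in length and contents per line,
        // giving lines of roughly 80 bytes on average.
        $filling = str_repeat('rhubarb, ', mt_rand(7, 9)) . $i;
        fwrite($fh, $key . ';' . $filling . "\n");
    }
    fclose($fh);
}
```

Note that with only 26^4 possible indexes, a large file built this way will
contain repeated keys.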

Then I wrote another program which read the file into an array, using the
four-character index as the key and the filling as the contents, sorted the
array, and then rewrote it to another file, reporting the elapsed time after
each step.

My first version used fgets() to read the source file a line at a time, and
fwrite() to write the new file. This version performed quite consistently, and
took approximately 1.3 seconds to read in a 100,000 entry 7.86 MB file, and
another 5 seconds to write it out.
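
The read, sort, write and timing steps of this first version can be sketched
roughly as follows. The function names are hypothetical, and I am assuming
ksort() for the sort step and microtime(true) for the timing; treat it as an
outline of the approach rather than the exact test program.

```php
<?php
// Read "KEY;filling" lines one at a time with fgets() into an array
// keyed by KEY. Entries that share a key overwrite each other here.
function read_with_fgets($path)
{
    $entries = array();
    $fh = fopen($path, 'r');
    while (($line = fgets($fh)) !== false) {
        $line = rtrim($line, "\r\n");
        if ($line === '') {
            continue;
        }
        $parts = explode(';', $line, 2);
        $entries[$parts[0]] = isset($parts[1]) ? $parts[1] : '';
    }
    fclose($fh);
    return $entries;
}

// Write the array back out one line at a time with fwrite().
function write_with_fwrite($path, array $entries)
{
    $fh = fopen($path, 'w');
    foreach ($entries as $key => $filling) {
        fwrite($fh, $key . ';' . $filling . "\n");
    }
    fclose($fh);
}

// Run the three steps and return the elapsed seconds for each.
function timed_fgets_fwrite($source, $target)
{
    $t0 = microtime(true);
    $entries = read_with_fgets($source);
    $t1 = microtime(true);
    ksort($entries);
    $t2 = microtime(true);
    write_with_fwrite($target, $entries);
    $t3 = microtime(true);
    return array('read' => $t1 - $t0, 'sort' => $t2 - $t1, 'write' => $t3 - $t2);
}
```

Calling timed_fgets_fwrite('index.txt', 'sorted.txt') (both file names are
placeholders) and printing the returned array gives one timing per step.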

I then read the discussion following fschnittke's post "File write operation
slows to a crawl ... " and wondered if the suggestions made there would help.

First I used file() to read the entire file into memory, then processed each
line into the form required to set up my array. This gave a useful improvement
for small files, halving the time required to read and process a 10,000 entry
815 kB file, but for a 30,000 entry file the improvement had dropped to about
15%, and it made little difference for a 300,000 entry file.
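
In outline, the file() variant of the read step looks something like this.
The function name read_with_file() is made up for the example, and the two
FILE_* flags are my choice here, not necessarily what the timed version used.

```php
<?php
// Read the whole file in one call, then split each line into key and filling.
function read_with_file($path)
{
    $entries = array();
    // file() returns every line of the file as one array element.
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return $entries; // file could not be read
    }
    foreach ($lines as $line) {
        $parts = explode(';', $line, 2);
        $entries[$parts[0]] = isset($parts[1]) ? $parts[1] : '';
    }
    return $entries;
}
```

The whole file and the finished array are both held in memory at once for a
while, which may have something to do with the gain fading on bigger files,
though I have not measured that.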

Then I tried writing my whole array into a single horrendous string, and using
file_put_contents() to write out the whole string in one bang. I started
testing on a short file, and thought I was onto a good thing, as it halved the
time to write out a 10,000 entry 800 kB file. But as I increased the file size
it began to fail dismally. With a 30,000 entry file it was 20% slower, and at
100,000 entries it was three times slower.
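
The single-string write amounts to something like the following sketch. The
function name is hypothetical, and building the string by repeated
concatenation is an assumption; joining an array of lines with implode()
would be another way to get the same string.

```php
<?php
// Build one big string from the array, then write it in a single call.
function write_with_file_put_contents($path, array $entries)
{
    $buffer = '';
    foreach ($entries as $key => $filling) {
        $buffer .= $key . ';' . $filling . "\n";
    }
    // Returns the number of bytes written, or false on failure.
    return file_put_contents($path, $buffer);
}
```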

On Shawn McKenzie's suggestion, I also tried replacing fgets() with
stream_get_line(). As I had anticipated, any difference was well below the
timing noise level.
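
For reference, the substitution is a small change to the read loop, roughly
as below. The function name and the 4096 byte maximum length are assumptions
for this sketch; the lines here are only about 80 bytes, so any limit
comfortably above that would do.

```php
<?php
// Same read loop as with fgets(), but stream_get_line() strips the
// delimiter itself, so no rtrim() is needed for "\n" line endings.
function read_with_stream_get_line($path)
{
    $entries = array();
    $fh = fopen($path, 'r');
    // A line longer than 4096 bytes would come back in pieces.
    while (($line = stream_get_line($fh, 4096, "\n")) !== false) {
        if ($line === '') {
            continue;
        }
        $parts = explode(';', $line, 2);
        $entries[$parts[0]] = isset($parts[1]) ? $parts[1] : '';
    }
    fclose($fh);
    return $entries;
}
```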

In conclusion, for short (1 MB!) files, using file() to read the whole file
into memory is substantially better than using fgets() to read the file a line
at a time, but the advantage rapidly diminishes for longer files. Similarly,
using file_put_contents() in place of fwrite() to write it out again is better
for short files (up to perhaps 1 MB), but the performance deteriorates rapidly
above this.

PHP General Mailing List (http://www.php.net/)