Hi people.  I am having some trouble with the PCRE (preg_*) functions in PHP.

Here's what I am trying to do...

First of all I am reading in a file which is 1.5 MB in size; it could be much larger, 
going up to 8 MB.  The contents of the file are read into a string.

The format of the file is as follows...

#    #    #    "quoted text"    "quoted text"    #    #

The # represents a number.  The first three numbers are only ever 1 or 2 digits long.  
The final two numbers can get rather large, into the thousands and millions.  The 
elements are separated by tabs, and a carriage return (\r) terminates each record.
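
To make the format concrete, here is a minimal sketch that builds two records in that 
layout and splits them apart.  The sample values are made up for illustration and are 
not taken from my real file.

```php
<?php
// Hypothetical sample data: two records in the layout described above.
// Fields are tab-separated and each record ends with a carriage return (\r).
$input = "1\t1\t7\t\"first title\"\t\"first note\"\t1500\t2500000\r"
       . "1\t2\t12\t\"second title\"\t\"second note\"\t42\t9000\r";

// Split into records on \r, dropping the empty piece after the final \r.
$records = array_filter(explode("\r", $input), 'strlen');

// Split the first record into its seven tab-separated fields.
// The quoted fields keep their surrounding double quotes at this stage.
$fields = explode("\t", $records[0]);
```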

I use preg_match_all to find all the lines that have 1 and 1 as their first two 
numbers; typically there will be 25 entries of 1 1.  So I am looking for all lines in 
this format:

1    1    #    "quoted text"    "quoted text"    #    #

I have the search pattern figured out; it is as follows:

preg_match_all("/($first)\t($second)\t([0-9]{1,2})\t\"([^\"]*)\"\t\"([^\"]*)\"\t([0-9]*)\t([0-9]*)\r/",
 $input, $output, PREG_SET_ORDER );

When this pattern finds a record whose first two numbers equal $first and $second, it 
puts all the elements of the record into the array $output: $output[0] is the array 
of elements from the first record matched, $output[1] is the second record matched, 
and so on.
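
As a small illustration of the array shape I expect from PREG_SET_ORDER, here is a 
sketch that runs the same pattern over three made-up records.  The sample data is an 
assumption for the example only; the pattern is the one quoted above.

```php
<?php
// Hypothetical sample data, not taken from my real file.
$input = "1\t1\t7\t\"alpha\"\t\"beta\"\t1500\t2500000\r"
       . "1\t2\t12\t\"gamma\"\t\"delta\"\t42\t9000\r"
       . "1\t1\t8\t\"epsilon\"\t\"zeta\"\t3\t4\r";
$first  = 1;
$second = 1;

// Same pattern as above; preg_match_all returns the number of full matches.
$count = preg_match_all(
    "/($first)\t($second)\t([0-9]{1,2})\t\"([^\"]*)\"\t\"([^\"]*)\"\t([0-9]*)\t([0-9]*)\r/",
    $input, $output, PREG_SET_ORDER
);

// $output[0] is the first matching record, $output[1] the second, and so on.
// Within each entry, index 0 is the whole matched text and 1..7 are the fields.
foreach ($output as $match) {
    echo $match[3], ' ', $match[4], ' ', $match[7], "\n";
}
```

On a small input like this, only the first and third records should be captured, which 
is the behaviour I am expecting on the large file as well.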

This pattern does actually work to some extent.  When the file size is low (100 KB) it 
works fine, but once I get over that size it becomes greedy and the $second value 
doesn't seem to be taken into account when it searches.  It seems to return everything 
that matches the following:

1    #    #    "quoted text"    "quoted text"    #    #

Obviously not what I want.  Could this be some sort of overflow problem?  I am at a 
loose end here, so if anyone could offer some insight as to why it is not functioning 
correctly I would most welcome it.  Otherwise the only solution I can think of is 
chopping up the input.  I don't really want to go down that path, as it seems like a 
rather cheap workaround.
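
For reference, the chopping workaround I have in mind would look roughly like the 
sketch below.  The function name findRecords is hypothetical, and the ^ and $ anchors 
are an addition that is not in my pattern above; it also assumes records are separated 
by a bare \r with no \n.  I have not confirmed that this avoids the problem on the 
large file.

```php
<?php
// Hypothetical helper: the name and signature are only for illustration.
function findRecords(string $input, int $first, int $second): array
{
    // ^ and $ anchor the pattern to a whole record, so the leading
    // numbers cannot match part-way through a longer field.
    $pattern = "/^($first)\t($second)\t([0-9]{1,2})\t\"([^\"]*)\"\t\"([^\"]*)\"\t([0-9]*)\t([0-9]*)$/";

    $found = [];
    // Chop the input into records at each carriage return and test each
    // record on its own, so the regex engine only ever sees one short string.
    foreach (explode("\r", $input) as $record) {
        if (preg_match($pattern, $record, $match)) {
            $found[] = $match; // same layout as one PREG_SET_ORDER entry
        }
    }
    return $found;
}
```

Calling findRecords($input, 1, 1) would then return an array with the same layout as 
$output above.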

Thanks.

Matt
