Hi Matt,

First, it might be easier to just use split here:

$fields = split("\t",$line);

I say "might" because I suspect that the records which are converting
incorrectly don't actually follow the tab convention. Said another way,
I've never seen preg break just because the file got over a meg in
size ;)
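
For what it's worth, here is a minimal sketch of the split approach,
assuming the whole file is already in $input and that records end in \r
as you describe. The helper name find_records() and the sample data are
made up for illustration. I've used explode() rather than split()
because split() was deprecated in PHP 5.3 and removed in PHP 7, and
explode() does the same job for a literal tab. The sketch also collects
any record that doesn't have exactly seven tab-separated fields, which
is where I'd look first.

```php
<?php
// Hypothetical helper: returns [matching records, malformed records].
// Assumes records are terminated by "\r" and fields are separated by "\t".
function find_records(string $input, string $first, string $second): array
{
    $matches = [];
    $malformed = [];
    foreach (explode("\r", $input) as $line) {
        // Drop a stray "\n" in case the file really uses "\r\n" endings.
        $line = trim($line, "\n");
        if ($line === '') {
            continue; // skip the empty piece after the final terminator
        }
        $fields = explode("\t", $line);
        if (count($fields) !== 7) {
            $malformed[] = $line; // does not follow the tab convention
            continue;
        }
        // Fields are strings, so compare against string values.
        if ($fields[0] === $first && $fields[1] === $second) {
            $matches[] = $fields; // quoted fields keep their quote marks
        }
    }
    return [$matches, $malformed];
}

// Usage with made-up sample data:
$input = "1\t1\t5\t\"a\"\t\"b\"\t100\t2000\r"
       . "1\t2\t7\t\"c\"\t\"d\"\t300\t4000\r"
       . "1 1 9 \"e\" \"f\" 5 6\r"; // spaces instead of tabs
[$matches, $malformed] = find_records($input, '1', '1');
```

If $malformed comes back non-empty on the real file, those are the
records to inspect before blaming preg.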

On Fri, Dec 07, 2001 at 11:50:51PM -0000, Matthew Moreton wrote:
> Hi people.  I am having some trouble with the PREG functions in PHP.
> 
> Here's what I am trying to do...
> 
> First of all, I am reading in a file which is 1.5 MB in size (it could be much
> larger, up to 8 MB). The contents of the file are read into a string.
> 
> The format of the file is as follows...
> 
> #    #    #    "quoted text"    "quoted text"    #    #
> 
> The # represents a number. The first three numbers are only ever 1 or 2 digits
> long. The final two numbers can get rather big, into the thousands and
> millions. Each element is separated by a tab, and a carriage return (\r)
> terminates each record.
> 
> I use preg_match_all to find all the lines that start with 1 and 1 as their
> first two numbers; typically there will be 25 entries of 1 1.  So I am looking
> for all lines in this format:
> 
> 1    1    #    "quoted text"    "quoted text"    #    #
> 
> I have the search pattern figured out; it is as follows:
> 
> 
> preg_match_all("/($first)\t($second)\t([0-9]{1,2})\t\"([^\"]*)\"\t\"([^\"]*)\"\t([0-9]*)\t([0-9]*)\r/",
>     $input, $output, PREG_SET_ORDER);
> 
> When this pattern finds a line beginning with $first and $second, it puts all
> the elements of the record into the array $output: $output[0] is the array of
> elements from the first line matched, $output[1] is the second line that was
> matched, and so on.
> 
> This pattern does actually work to some extent.  When the file size is low
> (100 KB) it works fine, but when I get over that size it becomes greedy and the
> $second value doesn't seem to be taken into account when it searches.  It
> seems to return everything that matches the following:
> 
> 1    #    #    "quoted text"    "quoted text"    #    #
> 
> Obviously not what I want.  Could this be some sort of overflow problem?  I am
> at a loose end here, so if anyone could offer some insight as to why it is not
> functioning correctly I would most welcome it.  Otherwise the only solution I
> can think of is chopping up the input. I don't really want to go down that
> path, as it seems like a rather cheap workaround.
> 
> Thanks.
> 
> Matt
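
One more thing worth ruling out, offered as a sketch rather than a
diagnosis: the pattern above isn't anchored to the start of a record, so
with $first set to 1 a record beginning 11<tab>1<tab>... also matches,
starting at its second digit. Assuming \r-terminated records as
described (not \r\n), a lookbehind can pin the match to the start of the
input or to the position just after a \r. The function name
match_records() and the sample data are invented for this example, and
preg_quote() is only there in case $first or $second ever contain regex
metacharacters.

```php
<?php
// Hypothetical helper: the same pattern as in the quoted message, but
// anchored so that a match can only begin at the start of a record.
// Assumes records end in a bare "\r" (not "\r\n").
function match_records(string $input, string $first, string $second): array
{
    $f = preg_quote($first, '/');
    $s = preg_quote($second, '/');
    // (?<![^\r]) means "not preceded by a non-\r character", i.e. we are
    // at the very start of the input or directly after a "\r".
    $pattern = "/(?<![^\\r])($f)\\t($s)\\t([0-9]{1,2})\\t"
             . "\"([^\"]*)\"\\t\"([^\"]*)\"\\t([0-9]*)\\t([0-9]*)\\r/";
    preg_match_all($pattern, $input, $output, PREG_SET_ORDER);
    return $output;
}

// Usage with made-up sample data: the first record starts with 11, not 1,
// so only the second record should be returned.
$input = "11\t1\t5\t\"a\"\t\"b\"\t1\t2\r"
       . "1\t1\t7\t\"c\"\t\"d\"\t30\t400\r";
$output = match_records($input, '1', '1');
```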

-- 
Hank Marquardt <[EMAIL PROTECTED]>
http://web.yerpso.net
GPG Id: 2BB5E60C
Fingerprint: D807 61BC FD18 370A AC1D  3EDF 2BF9 8A2D 2BB5 E60C
*** Web Development: PHP, MySQL/PgSQL - Network Admin: Debian/FreeBSD
*** PHP Instructor - Intnl. Webmasters Assn./HTML Writers Guild 
*** Beginning PHP -- Starts January 7, 2002 
*** See http://www.hwg.org/services/classes

-- 
PHP General Mailing List (http://www.php.net/)
