Re: Re: [PHP] Regular Expression - it works but uses way too much memory ?

Ulrik S. Kofod Sat, 19 Jun 2004 15:51:57 -0700

Robin Vickery said:
>
> The S modifier that you're using means that it's storing the studied
> expression. If the regexp changes each time around the loop then over
> 30000 iterations, that'll add up. See if removing that modifier helps
> at all.
>
The S modifier wasn't needed. I added it because I thought it would speed things up,
but it didn't. Removing it didn't reduce the memory usage, but the script performs a
little better without it.


> If that's not it, then these *might* save you some memory, although
> I've not tested them:
>
> I'm not entirely sure why you're matching (.*) at the end then putting
> it back in with your replacement text. Without running it, I'd have
> thought that you could leave out the (.*) from your pattern and the $4
> from your replacement and get exactly the same effect.
>

I tried removing $4 and (.*), but the result isn't the same. My first regular
expression actually didn't have $4, but I had to add it: without it, 51 of the 1246
texts aren't processed correctly. There also isn't really any difference in
performance with or without it.

>
> You could use a non-capturing subpattern for $2 which you're not using
> in your replacement.
>
>   $replace = "/^((?:[a-z]+?[^a-z]+?){".($count)."})(".$typedmask.")/i";

I didn't know you could do that... cool :). This made the script run a little faster,
but it still uses the same amount of memory.
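
A minimal sketch of what the non-capturing group changes, assuming hypothetical
sample values for $count, $typedmask and $text (they are not taken from the real
data): the inner group no longer gets a number, so the typed word moves from $3
to $2.

```php
<?php
// Hypothetical sample values standing in for the loop variables.
$count = 2;
$typedmask = 'go';
$text = 'we will go home';

// Inner group captures: groups are 1 (prefix), 2 (last word), 3 (typed word).
$capturing = '/^(([a-z]+?[^a-z]+?){' . $count . '})(' . $typedmask . ')/i';

// Inner group is (?:...): groups are 1 (prefix) and 2 (typed word) only.
$nonCapturing = '/^((?:[a-z]+?[^a-z]+?){' . $count . '})(' . $typedmask . ')/i';

preg_match($capturing, $text, $m1);
preg_match($nonCapturing, $text, $m2);

echo count($m1), "\n"; // 4: whole match plus three groups
echo count($m2), "\n"; // 3: whole match plus two groups
echo $m1[3], ' ', $m2[2], "\n"; // the typed word in both cases
```

One group less to store per match, which fits with the small speed-up but would
not be expected to change the overall memory picture much.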

>
> And maybe a look-behind assertion for the first subpattern rather than
> a capturing pattern then re-inserting $1.
>
>   $replace = "/^(?<=(?:[a-z]+?[^a-z]+?){".($count)."})(".$typedmask.")/i";
>   $with = "<error-start sourcetext=".$corr['sourcetext']." id=".$corr['id']."
>   ...
>

With ?<= I get a lot of warnings:

Here is an example:
$replace is '/^(?<=(?:[a-z]+?[^a-z]+?){50})(go)(.*)/i'
$with is '<error-start sourcetext=3 id=49 group="-" class="-" corrected-from="go"
corrected-to="god">$2<error-end sourcetext=3 id=49>$3'
<br />
<b>Warning</b>:  Compilation failed: lookbehind assertion is not fixed length at
offset 34
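
The warning comes from PCRE requiring a lookbehind to match a fixed number of
characters, which a repeated [a-z]+?[^a-z]+? cannot do. A possible alternative,
sketched here under the assumption of a PCRE version that supports \K (newer than
what was common when this was written), is to match the prefix normally and then
reset the start of the match. The sample values and the shortened $with are
hypothetical.

```php
<?php
// Hypothetical sample values standing in for the loop variables.
$count = 2;
$typedmask = 'go';
$text = 'we will go home';

// \K drops everything matched so far from the reported match, so only the
// typed word is replaced and the prefix does not need to be captured at all.
$replace = '/^(?:[a-z]+?[^a-z]+?){' . $count . '}\K(' . $typedmask . ')/i';

// Shortened, hypothetical replacement; the real one carries more attributes.
$with = '<error-start>$1<error-end>';

$result = preg_replace($replace, $with, $text, 1);
echo $result, "\n"; // we will <error-start>go<error-end> home
```

Whether this saves any memory in the real script is untested.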


With the corrections added, the regular expression looks like this:
$typedmask = preg_replace("/\s+/",".*?",$corr['typed']);

$replace = '/^((?:[a-z]+?[^a-z]+?){'.($count).'})('.$typedmask.')(.*)/i';

$with = '$1<error-start sourcetext='.$corr['sourcetext'].' id='.$corr['id'].'
group="'.$corr['grupper'].'" class="'.$corr['ordklasse'].'"
corrected-from="'.$corr['typed'].'"
corrected-to="'.$corr['corrected'].'">$2<error-end
sourcetext='.$corr['sourcetext'].' id='.$corr['id'].'>$3';

$text = $skipText[0] . preg_replace ($replace,$with,$text,1);

It completes a little faster and the output is exactly the same as before,
but it still uses way too much memory.

[EMAIL PROTECTED] testextract]# time php ../export.php > export6.txt
real    1m15.851s
user    0m18.720s
sys     0m1.750s

From "top" just before the script completed:
  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
 7843 root      17   0  269M 269M  3328 R    41.7 53.6   0:19 php

This isn't a huge problem anymore, since we have been allowed to move the project to
a server that is three times faster and less busy (because of this).

But I would still like to know if there is a solution to this, because it seems quite
insane that it allocates more than 250MB of memory to generate 4MB of output.
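
One way to narrow this down might be to sample memory_get_usage() after each
iteration and see whether the growth follows the preg_replace calls. The sketch
below is only an illustration: memoryGrowth() is a hypothetical helper, and the
text and pattern are made-up stand-ins for the real loop. It assumes a PHP
version with closures and memory_get_peak_usage().

```php
<?php
// Hypothetical helper: run $step $iterations times and record the memory
// in use after each call. The $samples array itself also uses some memory.
function memoryGrowth(callable $step, int $iterations): array
{
    $samples = [];
    for ($i = 0; $i < $iterations; $i++) {
        $step($i);
        $samples[] = memory_get_usage();
    }
    return $samples;
}

// Made-up input standing in for one of the texts.
$text = str_repeat('we will go home ', 1000);

$samples = memoryGrowth(function (int $i) use (&$text) {
    // The pattern changes on every iteration, as in the real script.
    $pattern = '/^((?:[a-z]+?[^a-z]+?){' . ($i % 50) . '})(go)/i';
    // preg_replace returns null on a PCRE error; keep the old text then.
    $text = preg_replace($pattern, '$1<x>$2</x>', $text, 1) ?? $text;
}, 200);

printf(
    "first: %d bytes, last: %d bytes, peak: %d bytes\n",
    $samples[0],
    end($samples),
    memory_get_peak_usage()
);
```

If the numbers stay flat in a reduced loop like this, the growth probably comes
from somewhere else in the script, for example from arrays that keep every
processed text around.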

Thanks Robin! I really appreciate your answer.

Brgds Ulrik

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
