I am in the process of writing an application that does a lot of parsing,
where performance is key.  In the process, I performed some rudimentary
speed testing that yielded some interesting results.  Rather than keep them
to myself, I thought I would share them.

I am currently parsing out the contents of general webpages, so for the test
case, I used a string that contained the html to generate a fairly standard
select list (with 31 options).  For the first test, I merely wanted to pull
out the name of the select element.  I used the following two parsing
commands in a side-by-side timing comparison and ran each in a loop 5000
times to get a larger time value:

    preg_match( "/name=[ ]?(['\"])?((?(1)[^\\1]|[^\s\>])+?)(?(1)\\1|[\s>])/i",
                $string, $arr );

    eregi( "name=[\"']{0,1}([_0-9a-zA-Z]+)[\"']{0,1}", $string, $arr );

Note: The preg_match expression is actually far more accurate than the eregi
one, as well as more complex.  It handles the case of "name=34 multiple>" as
well as "name='my select'".  Both expressions are case insensitive.
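
To make the two cases above concrete, here is a minimal sketch that runs the preg_match pattern from this post against a few sample tags.  The sample strings and the extract_name() helper are assumptions made up for illustration; they are not the test string used for the timings.  One caveat worth knowing: inside a character class, \1 is not treated as a backreference, so the quoted case works because the lazy +? stops at the first closing quote rather than because [^\1] excludes it.

```php
<?php
// Pattern copied from the post; in a double-quoted PHP string "\\1" becomes \1.
$pattern = "/name=[ ]?(['\"])?((?(1)[^\\1]|[^\s\>])+?)(?(1)\\1|[\s>])/i";

// Hypothetical sample inputs, not the original 31-option test string.
$samples = array(
    "<select name=34 multiple>",
    "<select name='my select'>",
    '<SELECT NAME="day">',
);

// Hypothetical helper: return the name attribute value, or null if none is found.
function extract_name($pattern, $html)
{
    // Group 1 is the optional opening quote, group 2 is the attribute value.
    if (preg_match($pattern, $html, $arr)) {
        return $arr[2];
    }
    return null;
}

foreach ($samples as $html) {
    echo $html, ' => ', var_export(extract_name($pattern, $html), true), "\n";
}
```

The unquoted value ends at the first whitespace or ">", while a quoted value runs to the matching quote, which is why a name containing a space is only captured whole when it is quoted.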

The results:
        preg_match
        Timer: This page was generated in 0.26572799682617 seconds.

        eregi
        Timer: This page was generated in 1.2171900272369 seconds.


The preg_match is considerably faster than eregi (about 4.6 times in this
test) and much more powerful (see the PHP homepage for the documentation),
and while the syntax takes a little adjustment (if you have never used Perl
before), it is not that difficult to convert to.  When I replaced all of my
eregi statements with their preg_match equivalents, I found that the parsing
portion of my page went from 0.46 seconds to 0.23 seconds.  When it comes to
regular expression pattern matching, I have come to the conclusion that the
only option is preg_match.
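
For anyone who wants to repeat the comparison, here is a rough sketch of a timing loop together with a preg_match translation of the eregi pattern above.  The time_loop() helper and the generated $string are assumptions of mine, standing in for the original timer and test page.  The ereg family was removed in PHP 7, so the sketch only times the preg_match side.

```php
<?php
// Hypothetical helper: call $fn $iterations times and return elapsed seconds.
function time_loop(callable $fn, $iterations = 5000)
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

// Stand-in for the test string: a select list with 31 options (assumed markup).
$string = "<select name='day'>";
for ($d = 1; $d <= 31; $d++) {
    $string .= "<option value=\"$d\">$d</option>";
}
$string .= "</select>";

// The eregi pattern rewritten for PCRE: add delimiters, use the /i flag for
// case insensitivity, and shorten {0,1} to the equivalent ?.
$converted = '/name=["\']?([_0-9a-zA-Z]+)["\']?/i';

$elapsed = time_loop(function () use ($string, $converted) {
    preg_match($converted, $string, $arr);
});

// Run the match once more outside the loop to show what it captures.
preg_match($converted, $string, $arr);
echo "name: ", $arr[1], "\n";
printf("5000 iterations: %.4f seconds\n", $elapsed);
```

Absolute numbers will differ from the ones in this post, since they depend on the machine and the PHP version; only the relative difference between two patterns timed the same way is meaningful.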


Inspired by this revelation, I decided to test preg_split vs split vs
explode.  It was not nearly as interesting, but I thought I would post my
results nonetheless.

Using the same string as above, I decided to split the string by the
</option tag.  I used the following commands in a side-by-side comparison
(again in a loop of 5000):

    $arr = preg_split( '/<\/option/i', $string );
    $arr = spliti( "</option>", $string );
    $arr = explode( "</option>", $string );

The results:

        preg_split
        Timer: This page was generated in 0.23138296604156 seconds.

        spliti
        Timer: This page was generated in 0.22009003162384 seconds.

        explode
        Timer: This page was generated in 0.14973497390747 seconds.


This really is not too surprising when it comes to explode.  If there is no
complex pattern matching, always use explode.  preg_split vs split was a
little surprising given my findings above, but in general, it looks like
while there is not much of a difference, split has the slight edge.
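
As a small illustration of where explode is enough and where a regular expression is needed, the sketch below splits a short sample string both ways.  The sample markup is an assumption made up for the example.  Note that both functions return the resulting array, and that the optional third parameter of each is a limit on the number of pieces, not an output variable.

```php
<?php
// Hypothetical sample markup, much shorter than the 31-option test string.
$string = "<option>a</option><OPTION>b</OPTION><option>c</option>";

// explode() does a plain, case-sensitive substring split.
$plain = explode("</option>", $string);

// preg_split() with the /i flag also splits on the upper-case closing tag.
$regex = preg_split('/<\/option>/i', $string);

// PREG_SPLIT_NO_EMPTY drops the empty piece left after the final closing tag.
$clean = preg_split('/<\/option>/i', $string, -1, PREG_SPLIT_NO_EMPTY);

echo "explode pieces:    ", count($plain), "\n";
echo "preg_split pieces: ", count($regex), "\n";
echo "non-empty pieces:  ", count($clean), "\n";
```

If the markup is known to use one consistent case, explode gives the same pieces without the pattern-matching overhead; the regular expression only earns its cost when the delimiter can vary.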


Summary:
* If you are doing regular expression matching in a string, use preg_match.
Not only is it much faster, but it is much more powerful than ereg.
* If you are splitting a string by a simple string pattern, use explode.
* If you are splitting a string using regular expressions, use split unless
you need the functionality of preg_split.


Disclaimer:
I have not done an exhaustive performance study of all of the possible
scenarios to find discrepancies, but from my observations so far, the above
conclusions have held true.  If anyone has any other information, please
post it for us all to share.  I hope some of you have found this useful.


Matthew Aznoe
Fuzz Technologies
[EMAIL PROTECTED]
(406) 587-1100 x217

-- 
PHP General Mailing List (http://www.php.net/)