I am writing an application that does a lot of parsing, and performance is key. Along the way I did some rudimentary speed testing that yielded some interesting results. Rather than keep them to myself, I thought I would share them.

I am currently parsing the contents of general web pages, so for the test case I used a string containing the HTML for a fairly standard select list (with 31 options). For the first test, I simply wanted to pull out the name of the select element. I ran the following two commands in a side-by-side timing comparison, each in a loop of 5000 iterations to get a larger time value:

    preg_match( "/name=[ ]?(['\"])?((?(1)[^\\1]|[^\s\>])+?)(?(1)\\1|[\s>])/i", $string, $arr );
    eregi( "name=[\"']{0,1}([_0-9a-zA-Z]+)[\"']{0,1}", $string, $arr );

Note: The preg_match expression is actually far more accurate than the eregi one, as well as more complex. It handles the case of "name=34 multiple>" as well as "name='my select'". Both expressions are case insensitive.

The results:

    preg_match Timer: This page was generated in 0.26572799682617 seconds.
    eregi Timer: This page was generated in 1.2171900272369 seconds.

preg_match is considerably faster than eregi (about 4.6 times faster in this test) and much more powerful (see the PHP homepage for the documentation). The syntax takes a little adjustment if you have never used Perl before, but it is not that difficult to convert to. When I replaced all of my eregi statements with their preg_match equivalents, the parsing portion of my page went from 0.46 seconds to 0.23. When it comes to regular expression pattern matching, I have come to the conclusion that the only option is preg_match.

Inspired by this revelation, I decided to test preg_split vs split vs explode. It was not nearly as interesting, but I thought I would post my results nonetheless. Using the same string as above, I decided to split the string by the </option> tag.
I used the following commands in a side-by-side comparison (again in a loop of 5000):

    $arr = preg_split( '/<\/option/i', $string );
    $arr = spliti( "</option>", $string );
    $arr = explode( "</option>", $string );

The results:

    preg_split Timer: This page was generated in 0.23138296604156 seconds.
    split Timer: This page was generated in 0.22009003162384 seconds.
    explode Timer: This page was generated in 0.14973497390747 seconds.

The explode result is not too surprising. If there is no complex pattern matching, always use explode. preg_split vs split was a little surprising given my findings above, but in general it looks like there is not much of a difference, with split having a slight edge (about 5% in this test).

Summary:

* If you are doing regular expression matching in a string, use preg_match. Not only is it much faster, it is also much more powerful than ereg.
* If you are splitting a string by a simple string pattern, use explode.
* If you are splitting a string using regular expressions, use split unless you need the functionality of preg_split.

Disclaimer: I have not done an exhaustive performance study of all possible scenarios to find discrepancies, but from my observations so far, the above conclusions have held true. If anyone has any other information, please post it for us all to share.

I hope some of you have found this useful.

Matthew Aznoe
Fuzz Technologies
[EMAIL PROTECTED]
(406) 587-1100 x217
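
P.S. For anyone who wants to repeat the comparison, here is a minimal sketch of a timing loop along the lines described above. The build_select() and time_loop() helpers and the sample HTML are hypothetical stand-ins, not the code or the string used for the numbers in this post. The sketch covers only preg_match, preg_split and explode, because eregi and spliti were removed in PHP 7 and will not run there. Absolute timings will differ from machine to machine.

```php
<?php
// Hypothetical sample input: a select list with 31 options, standing in
// for the HTML string used in the tests (the original string is not shown).
function build_select(int $options = 31): string
{
    $html = "<select name='my select' multiple>\n";
    for ($i = 1; $i <= $options; $i++) {
        $html .= "<option value=\"$i\">Option $i</option>\n";
    }
    return $html . "</select>\n";
}

// Hypothetical helper: call $fn $iterations times, return elapsed seconds.
function time_loop(callable $fn, int $iterations = 5000): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

$string  = build_select();
// The preg_match pattern from the post, unchanged.
$pattern = "/name=[ ]?(['\"])?((?(1)[^\\1]|[^\s\>])+?)(?(1)\\1|[\s>])/i";

$timings = [
    'preg_match' => time_loop(function () use ($pattern, $string) {
        preg_match($pattern, $string, $arr);
    }),
    'preg_split' => time_loop(function () use ($string) {
        $arr = preg_split('/<\/option/i', $string);
    }),
    'explode' => time_loop(function () use ($string) {
        $arr = explode("</option>", $string);
    }),
];

// One untimed run of each call, to check what the functions return.
preg_match($pattern, $string, $match);            // $match[2] holds the name
$pieces_preg    = preg_split('/<\/option/i', $string);
$pieces_explode = explode("</option>", $string);

foreach ($timings as $label => $seconds) {
    printf("%-10s Timer: %.6f seconds\n", $label, $seconds);
}
```

Note that preg_split and explode return the resulting array; their optional third argument is a limit on the number of pieces, not an output variable.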
-- 
PHP General Mailing List (http://www.php.net/)