On 01/09/2013 02:45 PM, Gustavo Lopes wrote:
On Thu, 03 Jan 2013 11:40:31 +0100, Gustavo Lopes <glo...@nebm.ist.utl.pt> 
wrote:

The algorithm behaves very poorly in this case because at each position of the 
text, all the substrings starting there and with size between m and n (where m 
is the size of the smallest pattern and n is the largest) are checked, even if 
there are only
two patterns with size m and n. We could fix this easily by building a set of 
the pattern sizes found and try only with those. The hashing of the substrings 
could also be improved; we don't have to recalculate everything when we advance 
in the text.


Both optimizations (the hash rolling and limiting the substrings hashed on each 
iteration) worked quite well.

But I got much better results with another algorithm [1], so I'm going to merge 
the branch with it [2] instead. I get these results with a 1.7 MB string and 13 
replacement strings, the smallest with 6 characters and 30 iterations (x86-64, 
gcc -O3):

strtr: 0.1387
str_replace: 0.4471

The algorithm doesn't perform as well when the replacement strings are small. 
Adding a replacement for the pattern '_' (1 character) yields:

strtr: 0.6157
str_replace: 0.6230

How does this compare with your baseline results?


But even in this case, it works better than my optimized version of the current 
algorithm.

I plan on merging to 5.4 and 5.5; you may want to review it as introducing 
completely new code carries some risk.

Depending on the improvement, it might be tempting to merge to 5.4 but I would
prefer to see it in 5.5+.  Let's keep 5.4 stable.

Chris



[1] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.13.2927
[2] https://github.com/cataphract/php-src/compare/strtr_wu94


--
christopher.jo...@oracle.com  http://twitter.com/ghrd
Newly updated, free PHP & Oracle book:
http://www.oracle.com/technetwork/topics/php/underground-php-oracle-manual-098250.html

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php

Reply via email to