On Wed, 09 Jan 2013 23:45:03 +0100, Gustavo Lopes <glo...@nebm.ist.utl.pt>
wrote:
On Thu, 03 Jan 2013 11:40:31 +0100, Gustavo Lopes
<glo...@nebm.ist.utl.pt> wrote:
The algorithm behaves very poorly in this case because at each position
of the text, all the substrings starting there and with size between m
and n (where m is the size of the smallest pattern and n is the
largest) are checked, even if there are only two patterns with size m
and n. We could fix this easily by building a set of the pattern sizes
found and try only with those. The hashing of the substrings could also
be improved; we don't have to recalculate everything when we advance in
the text.
Both optimizations (the hash rolling and limiting the substrings hashed
on each iteration) worked quite well.
But I got much better results with another algorithm [1], so I'm going
to merge the branch with it [2] instead.
OK, so now the plan is to merge this onto 5.4:
https://github.com/cataphract/php-src/compare/php:PHP-5.4...cataphract:strtr_wu94_54
And this to 5.5:
https://github.com/cataphract/php-src/compare/php:PHP-5.5...cataphract:strtr_wu94_55
The difference is that the 5.4 version does not introduce zend_qsort_r()
and instead has dedicated simple heap sort.
Any thoughts on this?
--
Gustavo Lopes
--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php