Re: [PHP-DEV] strtr vs. str_replace runtime
On Mon, 14 Jan 2013 22:55:33 +0100, Gustavo Lopes glo...@nebm.ist.utl.pt wrote: OK, so now the plan is to merge this onto 5.4: https://github.com/cataphract/php-src/compare/php:PHP-5.4...cataphract:strtr_wu94_54 And this to 5.5: https://github.com/cataphract/php-src/compare/php:PHP-5.5...cataphract:strtr_wu94_55 The difference is that the 5.4 version does not introduce zend_qsort_r() and instead has dedicated simple heap sort. Given the marked improvements in performance and no objection from the RMs, I went through with this plan. -- Gustavo Lopes -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] strtr vs. str_replace runtime
Em 2013-01-11 0:32, Christopher Jones escreveu: How does this compare with your baseline results? I ran some benchmarks. Configure line: CC=gcc-mp-4.8 CFLAGS=-O3 -march=native ./configure --disable-all --host=x86_64-apple-darwin10 --build=x86_64-apple-darwin10 CPU: Intel(R) Core(TM) i5-2500S CPU @ 2.70GHz The text being searched is this page copy pasted: http://lxr.php.net/xref/PHP_TRUNK/Zend/zend_vm_execute.h The functions are run 30 time in a loop. All new algorithm (branch cataphract/strtr_wu94): Replacements: 13 total, smallest 6 largest 19 strtr: 0.1534 str_replace: 0.8625 Replacements: 14 total, smallest 1 largest 19 strtr: 0.5305 str_replace: 1.0654 Replacements: 1 total, smallest 7 largest 7 strtr: 0.1142 str_replace: 0.0985 Replacements: 2 total, smallest 7 largest 2500 strtr: 0.1165 str_replace: 0.1656 === Old algorithm improved (branch cataphract/strtr): Replacements: 13 total, smallest 6 largest 19 strtr: 0.8922 str_replace: 0.8606 Replacements: 14 total, smallest 1 largest 19 strtr: 1.2031 str_replace: 1.0687 Replacements: 1 total, smallest 7 largest 7 strtr: 0.4130 str_replace: 0.0991 Replacements: 2 total, smallest 7 largest 2500 strtr: 0.5886 str_replace: 0.1719 === Current (branch master) Replacements: 13 total, smallest 6 largest 19 strtr: 20.0317 str_replace: 0.8707 results match! Replacements: 14 total, smallest 1 largest 19 strtr: 26.6792 str_replace: 1.1017 results match! Replacements: 1 total, smallest 7 largest 7 strtr: 1.2030 str_replace: 0.0850 results match! Replacements: 2 total, smallest 7 largest 2500 ^C (got tired of waiting after a few minutes) -- Gustavo Lopes -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] strtr vs. str_replace runtime
On Wed, 09 Jan 2013 23:45:03 +0100, Gustavo Lopes glo...@nebm.ist.utl.pt wrote: On Thu, 03 Jan 2013 11:40:31 +0100, Gustavo Lopes glo...@nebm.ist.utl.pt wrote: The algorithm behaves very poorly in this case because at each position of the text, all the substrings starting there and with size between m and n (where m is the size of the smallest pattern and n is the largest) are checked, even if there are only two patterns with size m and n. We could fix this easily by building a set of the pattern sizes found and try only with those. The hashing of the substrings could also be improved; we don't have to recalculate everything when we advance in the text. Both optimizations (the hash rolling and limiting the substrings hashed on each iteration) worked quite well. But I got much better results with another algorithm [1], so I'm going to merge the branch with it [2] instead. OK, so now the plan is to merge this onto 5.4: https://github.com/cataphract/php-src/compare/php:PHP-5.4...cataphract:strtr_wu94_54 And this to 5.5: https://github.com/cataphract/php-src/compare/php:PHP-5.5...cataphract:strtr_wu94_55 The difference is that the 5.4 version does not introduce zend_qsort_r() and instead has dedicated simple heap sort. Any thoughts on this? -- Gustavo Lopes -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] strtr vs. str_replace runtime
On 01/14/2013 01:55 PM, Gustavo Lopes wrote: On Wed, 09 Jan 2013 23:45:03 +0100, Gustavo Lopes glo...@nebm.ist.utl.pt wrote: On Thu, 03 Jan 2013 11:40:31 +0100, Gustavo Lopes glo...@nebm.ist.utl.pt wrote: The algorithm behaves very poorly in this case because at each position of the text, all the substrings starting there and with size between m and n (where m is the size of the smallest pattern and n is the largest) are checked, even if there are only two patterns with size m and n. We could fix this easily by building a set of the pattern sizes found and try only with those. The hashing of the substrings could also be improved; we don't have to recalculate everything when we advance in the text. Both optimizations (the hash rolling and limiting the substrings hashed on each iteration) worked quite well. But I got much better results with another algorithm [1], so I'm going to merge the branch with it [2] instead. OK, so now the plan is to merge this onto 5.4: https://github.com/cataphract/php-src/compare/php:PHP-5.4...cataphract:strtr_wu94_54 And this to 5.5: https://github.com/cataphract/php-src/compare/php:PHP-5.5...cataphract:strtr_wu94_55 The difference is that the 5.4 version does not introduce zend_qsort_r() and instead has dedicated simple heap sort. Any thoughts on this? Despite temptation, I'm still erring on the side of merging to 5.5+ only. My argument is based on the potential for regressions (behavior and performance), added code maintenance issues, merge effort etc. The ability to differentiate the benefits of PHP 5.5 over 5.4, and promote migration to 5.5 will also be improved. Chris -- christopher.jo...@oracle.com http://twitter.com/ghrd Newly updated, free PHP Oracle book: http://www.oracle.com/technetwork/topics/php/underground-php-oracle-manual-098250.html -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] strtr vs. str_replace runtime
Hi! OK, so now the plan is to merge this onto 5.4: https://github.com/cataphract/php-src/compare/php:PHP-5.4...cataphract:strtr_wu94_54 This one looks mostly harmless, so if all strtr tests still pass I think it's OK for 5.4. -- Stanislav Malyshev, Software Architect SugarCRM: http://www.sugarcrm.com/ (408)454-6900 ext. 227 -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] strtr vs. str_replace runtime
Hi! On 9 January 2013 23:45, Gustavo Lopes glo...@nebm.ist.utl.pt wrote: On Thu, 03 Jan 2013 11:40:31 +0100, Gustavo Lopes glo...@nebm.ist.utl.pt wrote: The algorithm behaves very poorly in this case because at each position of the text, all the substrings starting there and with size between m and n (where m is the size of the smallest pattern and n is the largest) are checked, even if there are only two patterns with size m and n. We could fix this easily by building a set of the pattern sizes found and try only with those. The hashing of the substrings could also be improved; we don't have to recalculate everything when we advance in the text. Both optimizations (the hash rolling and limiting the substrings hashed on each iteration) worked quite well. But I got much better results with another algorithm [1], so I'm going to merge the branch with it [2] instead. I get these results with a 1.7 MB string and 13 replacement strings, the smallest with 6 characters and 30 iterations (x86-64, gcc -O3): strtr: 0.1387 str_replace: 0.4471 Nice :) The algorithm doesn't perform as well when the replacement strings are small. Adding a replacement for the pattern '_' (1 character) yields: strtr: 0.6157 str_replace: 0.6230 Even that is way better than before :) But even in this case, it works better than my optimized version of the current algorithm. I plan on merging to 5.4 and 5.5; you may want to review it as introducing completely new code carries some risk. [1] http://citeseerx.ist.psu.edu/**viewdoc/summary?doi=10.1.1.13.**2927http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.13.2927 [2] https://github.com/cataphract/php-src/compare/strtr_wu94 Does merging to 5.4 mean I can grab the head of the 5.4 branch afterwards and try it myself? Thanks! Greetings Nico
Re: [PHP-DEV] strtr vs. str_replace runtime
On 01/09/2013 02:45 PM, Gustavo Lopes wrote: On Thu, 03 Jan 2013 11:40:31 +0100, Gustavo Lopes glo...@nebm.ist.utl.pt wrote: The algorithm behaves very poorly in this case because at each position of the text, all the substrings starting there and with size between m and n (where m is the size of the smallest pattern and n is the largest) are checked, even if there are only two patterns with size m and n. We could fix this easily by building a set of the pattern sizes found and try only with those. The hashing of the substrings could also be improved; we don't have to recalculate everything when we advance in the text. Both optimizations (the hash rolling and limiting the substrings hashed on each iteration) worked quite well. But I got much better results with another algorithm [1], so I'm going to merge the branch with it [2] instead. I get these results with a 1.7 MB string and 13 replacement strings, the smallest with 6 characters and 30 iterations (x86-64, gcc -O3): strtr: 0.1387 str_replace: 0.4471 The algorithm doesn't perform as well when the replacement strings are small. Adding a replacement for the pattern '_' (1 character) yields: strtr: 0.6157 str_replace: 0.6230 How does this compare with your baseline results? But even in this case, it works better than my optimized version of the current algorithm. I plan on merging to 5.4 and 5.5; you may want to review it as introducing completely new code carries some risk. Depending on the improvement, it might be tempting to merge to 5.4 but I would prefer to see it in 5.5+. Let's keep 5.4 stable. Chris [1] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.13.2927 [2] https://github.com/cataphract/php-src/compare/strtr_wu94 -- christopher.jo...@oracle.com http://twitter.com/ghrd Newly updated, free PHP Oracle book: http://www.oracle.com/technetwork/topics/php/underground-php-oracle-manual-098250.html -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] strtr vs. str_replace runtime
On Thu, 03 Jan 2013 11:40:31 +0100, Gustavo Lopes glo...@nebm.ist.utl.pt wrote: The algorithm behaves very poorly in this case because at each position of the text, all the substrings starting there and with size between m and n (where m is the size of the smallest pattern and n is the largest) are checked, even if there are only two patterns with size m and n. We could fix this easily by building a set of the pattern sizes found and try only with those. The hashing of the substrings could also be improved; we don't have to recalculate everything when we advance in the text. Both optimizations (the hash rolling and limiting the substrings hashed on each iteration) worked quite well. But I got much better results with another algorithm [1], so I'm going to merge the branch with it [2] instead. I get these results with a 1.7 MB string and 13 replacement strings, the smallest with 6 characters and 30 iterations (x86-64, gcc -O3): strtr: 0.1387 str_replace: 0.4471 The algorithm doesn't perform as well when the replacement strings are small. Adding a replacement for the pattern '_' (1 character) yields: strtr: 0.6157 str_replace: 0.6230 But even in this case, it works better than my optimized version of the current algorithm. I plan on merging to 5.4 and 5.5; you may want to review it as introducing completely new code carries some risk. [1] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.13.2927 [2] https://github.com/cataphract/php-src/compare/strtr_wu94 -- Gustavo Lopes -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] strtr vs. str_replace runtime
Em 2013-01-02 16:53, Nicolai Scheer escreveu: I might have chosen the wrong tool for what I'm trying to achieve in the first place, but can anyone comment on the algorithmic complexity of strtr? This is definitely not the expected behaviour for such small inputs. Since the inputs varied and the keys where determined automatically in my original script, I was confronted with runtimes of several hours compared to just a few seconds with str_replace. If this is the expected behaviour, at least the documentation should be adjusted to state that this function is very inefficient with keylengths that are very distant from each other... Please open a bug to track this. The algorithm behaves very poorly in this case because at each position of the text, all the substrings starting there and with size between m and n (where m is the size of the smallest pattern and n is the largest) are checked, even if there are only two patterns with size m and n. We could fix this easily by building a set of the pattern sizes found and try only with those. The hashing of the substrings could also be improved; we don't have to recalculate everything when we advance in the text. -- Gustavo Lopes -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] strtr vs. str_replace runtime
Hi! On 3 January 2013 11:40, Gustavo Lopes glo...@nebm.ist.utl.pt wrote: Em 2013-01-02 16:53, Nicolai Scheer escreveu: I might have chosen the wrong tool for what I'm trying to achieve in the first place, but can anyone comment on the algorithmic complexity of strtr? This is definitely not the expected behaviour for such small inputs. Since the inputs varied and the keys where determined automatically in my original script, I was confronted with runtimes of several hours compared to just a few seconds with str_replace. If this is the expected behaviour, at least the documentation should be adjusted to state that this function is very inefficient with keylengths that are very distant from each other... Please open a bug to track this. The algorithm behaves very poorly in this case because at each position of the text, all the substrings starting there and with size between m and n (where m is the size of the smallest pattern and n is the largest) are checked, even if there are only two patterns with size m and n. We could fix this easily by building a set of the pattern sizes found and try only with those. The hashing of the substrings could also be improved; we don't have to recalculate everything when we advance in the text. Ok, here it goes: https://bugs.php.net/bug.php?id=63893 Greetings, Nico
[PHP-DEV] strtr vs. str_replace runtime
Hi! I stumbled upon a problem with the function strtr() the other day... I noticed a very long running php script, and tried to reproduce the behaviour. I traced it down to a single call of strtr doing text replacements using a search = replace array. It wasn't quit obvious why the call would take that long, so I started to investigate the issue. From the docs, it says that strtr ... will be the most efficient when all the keys have the same size. My testcase showed, that in fact I was using Keys of very different lengths (they are determined automatically, so there's no fixed list). I wrote a simple script to reproduce this behaviour: ?php $text = str_repeat( 'm', 2000 ); $long_from_a = str_repeat( 'a', 1 ); $long_from_x = str_repeat( 'x', 1500 ); $replacements = array( $long_from_a = 'b', $long_from_x = 'y' ); $start = microtime( true ); $result_1 = strtr( $text, $replacements ); echo strtr: . number_format( microtime( true ) - $start, 4 ) . \n; $start = microtime( true ); $result_2 = str_replace( array_keys( $replacements ), array_values( $replacements ), $text ); echo str_replace: . number_format( microtime( true ) - $start, 4 ) . \n; echo $result_1 === $result_2 ? results match!\n: no match!\n; ? On my box, this reports 2.5 seconds for strtr and 0.0 seconds for str_replace. As far as I know the only difference between str_replace and strtr is that strtr does not replace stuff in already replaced parts of the string. Might be wrong here, though. If I adjust the str_repeat for m from 2000 to 2 the runtime is 45 seconds for strtr and 0.0001 for str_replace. I might have chosen the wrong tool for what I'm trying to achieve in the first place, but can anyone comment on the algorithmic complexity of strtr? This is definitely not the expected behaviour for such small inputs. Since the inputs varied and the keys where determined automatically in my original script, I was confronted with runtimes of several hours compared to just a few seconds with str_replace. If this is the expected behaviour, at least the documentation should be adjusted to state that this function is very inefficient with keylengths that are very distant from each other... Greetings Nico