Re: [PHP-DEV] strtr vs. str_replace runtime

2013-01-15 Thread Gustavo Lopes
On Mon, 14 Jan 2013 22:55:33 +0100, Gustavo Lopes glo...@nebm.ist.utl.pt  
wrote:



OK, so now the plan is to merge this onto 5.4:

https://github.com/cataphract/php-src/compare/php:PHP-5.4...cataphract:strtr_wu94_54

And this to 5.5:

https://github.com/cataphract/php-src/compare/php:PHP-5.5...cataphract:strtr_wu94_55

The difference is that the 5.4 version does not introduce zend_qsort_r()  
and instead has dedicated simple heap sort.




Given the marked improvements in performance and no objection from the  
RMs, I went through with this plan.


--
Gustavo Lopes

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] strtr vs. str_replace runtime

2013-01-14 Thread Gustavo Lopes

Em 2013-01-11 0:32, Christopher Jones escreveu:


How does this compare with your baseline results?



I ran some benchmarks.
Configure line:
CC=gcc-mp-4.8 CFLAGS=-O3 -march=native ./configure --disable-all 
--host=x86_64-apple-darwin10 --build=x86_64-apple-darwin10

CPU: Intel(R) Core(TM) i5-2500S CPU @ 2.70GHz

The text being searched is this page copy pasted: 
http://lxr.php.net/xref/PHP_TRUNK/Zend/zend_vm_execute.h

The functions are run 30 time in a loop.

All new algorithm (branch cataphract/strtr_wu94):

Replacements: 13 total, smallest 6 largest 19
strtr: 0.1534
str_replace: 0.8625

Replacements: 14 total, smallest 1 largest 19
strtr: 0.5305
str_replace: 1.0654

Replacements: 1 total, smallest 7 largest 7
strtr: 0.1142
str_replace: 0.0985

Replacements: 2 total, smallest 7 largest 2500
strtr: 0.1165
str_replace: 0.1656

===
Old algorithm improved (branch cataphract/strtr):

Replacements: 13 total, smallest 6 largest 19
strtr: 0.8922
str_replace: 0.8606

Replacements: 14 total, smallest 1 largest 19
strtr: 1.2031
str_replace: 1.0687

Replacements: 1 total, smallest 7 largest 7
strtr: 0.4130
str_replace: 0.0991

Replacements: 2 total, smallest 7 largest 2500
strtr: 0.5886
str_replace: 0.1719

===
Current (branch master)

Replacements: 13 total, smallest 6 largest 19
strtr: 20.0317
str_replace: 0.8707
results match!

Replacements: 14 total, smallest 1 largest 19
strtr: 26.6792
str_replace: 1.1017
results match!

Replacements: 1 total, smallest 7 largest 7
strtr: 1.2030
str_replace: 0.0850
results match!

Replacements: 2 total, smallest 7 largest 2500
^C (got tired of waiting after a few minutes)

--
Gustavo Lopes

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] strtr vs. str_replace runtime

2013-01-14 Thread Gustavo Lopes
On Wed, 09 Jan 2013 23:45:03 +0100, Gustavo Lopes glo...@nebm.ist.utl.pt  
wrote:


On Thu, 03 Jan 2013 11:40:31 +0100, Gustavo Lopes  
glo...@nebm.ist.utl.pt wrote:


The algorithm behaves very poorly in this case because at each position  
of the text, all the substrings starting there and with size between m  
and n (where m is the size of the smallest pattern and n is the  
largest) are checked, even if there are only two patterns with size m  
and n. We could fix this easily by building a set of the pattern sizes  
found and try only with those. The hashing of the substrings could also  
be improved; we don't have to recalculate everything when we advance in  
the text.




Both optimizations (the hash rolling and limiting the substrings hashed  
on each iteration) worked quite well.


But I got much better results with another algorithm [1], so I'm going  
to merge the branch with it [2] instead.


OK, so now the plan is to merge this onto 5.4:

https://github.com/cataphract/php-src/compare/php:PHP-5.4...cataphract:strtr_wu94_54

And this to 5.5:

https://github.com/cataphract/php-src/compare/php:PHP-5.5...cataphract:strtr_wu94_55

The difference is that the 5.4 version does not introduce zend_qsort_r()  
and instead has dedicated simple heap sort.


Any thoughts on this?

--
Gustavo Lopes

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] strtr vs. str_replace runtime

2013-01-14 Thread Christopher Jones



On 01/14/2013 01:55 PM, Gustavo Lopes wrote:

On Wed, 09 Jan 2013 23:45:03 +0100, Gustavo Lopes glo...@nebm.ist.utl.pt 
wrote:


On Thu, 03 Jan 2013 11:40:31 +0100, Gustavo Lopes glo...@nebm.ist.utl.pt 
wrote:


The algorithm behaves very poorly in this case because at each position of the 
text, all the substrings starting there and with size between m and n (where m 
is the size of the smallest pattern and n is the largest) are checked, even if 
there are only
two patterns with size m and n. We could fix this easily by building a set of 
the pattern sizes found and try only with those. The hashing of the substrings 
could also be improved; we don't have to recalculate everything when we advance 
in the text.



Both optimizations (the hash rolling and limiting the substrings hashed on each 
iteration) worked quite well.

But I got much better results with another algorithm [1], so I'm going to merge 
the branch with it [2] instead.


OK, so now the plan is to merge this onto 5.4:

https://github.com/cataphract/php-src/compare/php:PHP-5.4...cataphract:strtr_wu94_54

And this to 5.5:

https://github.com/cataphract/php-src/compare/php:PHP-5.5...cataphract:strtr_wu94_55

The difference is that the 5.4 version does not introduce zend_qsort_r() and 
instead has dedicated simple heap sort.

Any thoughts on this?



Despite temptation, I'm still erring on the side of merging to 5.5+
only.  My argument is based on the potential for regressions (behavior
and performance), added code maintenance issues, merge effort etc.
The ability to differentiate the benefits of PHP 5.5 over 5.4, and
promote migration to 5.5 will also be improved.

Chris

--
christopher.jo...@oracle.com  http://twitter.com/ghrd
Newly updated, free PHP  Oracle book:
http://www.oracle.com/technetwork/topics/php/underground-php-oracle-manual-098250.html

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] strtr vs. str_replace runtime

2013-01-14 Thread Stas Malyshev
Hi!

 OK, so now the plan is to merge this onto 5.4:
 
 https://github.com/cataphract/php-src/compare/php:PHP-5.4...cataphract:strtr_wu94_54

This one looks mostly harmless, so if all strtr tests still pass I think
it's OK for 5.4.

-- 
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] strtr vs. str_replace runtime

2013-01-10 Thread Nicolai Scheer
Hi!


On 9 January 2013 23:45, Gustavo Lopes glo...@nebm.ist.utl.pt wrote:

 On Thu, 03 Jan 2013 11:40:31 +0100, Gustavo Lopes glo...@nebm.ist.utl.pt
 wrote:

  The algorithm behaves very poorly in this case because at each position
 of the text, all the substrings starting there and with size between m and
 n (where m is the size of the smallest pattern and n is the largest) are
 checked, even if there are only two patterns with size m and n. We could
 fix this easily by building a set of the pattern sizes found and try only
 with those. The hashing of the substrings could also be improved; we don't
 have to recalculate everything when we advance in the text.


 Both optimizations (the hash rolling and limiting the substrings hashed on
 each iteration) worked quite well.

 But I got much better results with another algorithm [1], so I'm going to
 merge the branch with it [2] instead. I get these results with a 1.7 MB
 string and 13 replacement strings, the smallest with 6 characters and 30
 iterations (x86-64, gcc -O3):

 strtr: 0.1387
 str_replace: 0.4471


Nice :)



 The algorithm doesn't perform as well when the replacement strings are
 small. Adding a replacement for the pattern '_' (1 character) yields:

 strtr: 0.6157
 str_replace: 0.6230


Even that is way better than before :)



 But even in this case, it works better than my optimized version of the
 current algorithm.

 I plan on merging to 5.4 and 5.5; you may want to review it as introducing
 completely new code carries some risk.

 [1] 
 http://citeseerx.ist.psu.edu/**viewdoc/summary?doi=10.1.1.13.**2927http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.13.2927
 [2] https://github.com/cataphract/php-src/compare/strtr_wu94


Does merging to 5.4 mean I can grab the head of the 5.4 branch afterwards
and try it myself?

Thanks!

Greetings

Nico


Re: [PHP-DEV] strtr vs. str_replace runtime

2013-01-10 Thread Christopher Jones



On 01/09/2013 02:45 PM, Gustavo Lopes wrote:

On Thu, 03 Jan 2013 11:40:31 +0100, Gustavo Lopes glo...@nebm.ist.utl.pt 
wrote:


The algorithm behaves very poorly in this case because at each position of the 
text, all the substrings starting there and with size between m and n (where m 
is the size of the smallest pattern and n is the largest) are checked, even if 
there are only
two patterns with size m and n. We could fix this easily by building a set of 
the pattern sizes found and try only with those. The hashing of the substrings 
could also be improved; we don't have to recalculate everything when we advance 
in the text.



Both optimizations (the hash rolling and limiting the substrings hashed on each 
iteration) worked quite well.

But I got much better results with another algorithm [1], so I'm going to merge 
the branch with it [2] instead. I get these results with a 1.7 MB string and 13 
replacement strings, the smallest with 6 characters and 30 iterations (x86-64, 
gcc -O3):

strtr: 0.1387
str_replace: 0.4471

The algorithm doesn't perform as well when the replacement strings are small. 
Adding a replacement for the pattern '_' (1 character) yields:

strtr: 0.6157
str_replace: 0.6230


How does this compare with your baseline results?



But even in this case, it works better than my optimized version of the current 
algorithm.

I plan on merging to 5.4 and 5.5; you may want to review it as introducing 
completely new code carries some risk.


Depending on the improvement, it might be tempting to merge to 5.4 but I would
prefer to see it in 5.5+.  Let's keep 5.4 stable.

Chris




[1] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.13.2927
[2] https://github.com/cataphract/php-src/compare/strtr_wu94



--
christopher.jo...@oracle.com  http://twitter.com/ghrd
Newly updated, free PHP  Oracle book:
http://www.oracle.com/technetwork/topics/php/underground-php-oracle-manual-098250.html

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] strtr vs. str_replace runtime

2013-01-09 Thread Gustavo Lopes
On Thu, 03 Jan 2013 11:40:31 +0100, Gustavo Lopes glo...@nebm.ist.utl.pt  
wrote:


The algorithm behaves very poorly in this case because at each position  
of the text, all the substrings starting there and with size between m  
and n (where m is the size of the smallest pattern and n is the largest)  
are checked, even if there are only two patterns with size m and n. We  
could fix this easily by building a set of the pattern sizes found and  
try only with those. The hashing of the substrings could also be  
improved; we don't have to recalculate everything when we advance in the  
text.




Both optimizations (the hash rolling and limiting the substrings hashed on  
each iteration) worked quite well.


But I got much better results with another algorithm [1], so I'm going to  
merge the branch with it [2] instead. I get these results with a 1.7 MB  
string and 13 replacement strings, the smallest with 6 characters and 30  
iterations (x86-64, gcc -O3):


strtr: 0.1387
str_replace: 0.4471

The algorithm doesn't perform as well when the replacement strings are  
small. Adding a replacement for the pattern '_' (1 character) yields:


strtr: 0.6157
str_replace: 0.6230

But even in this case, it works better than my optimized version of the  
current algorithm.


I plan on merging to 5.4 and 5.5; you may want to review it as introducing  
completely new code carries some risk.


[1] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.13.2927
[2] https://github.com/cataphract/php-src/compare/strtr_wu94

--
Gustavo Lopes

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] strtr vs. str_replace runtime

2013-01-03 Thread Gustavo Lopes

Em 2013-01-02 16:53, Nicolai Scheer escreveu:


I might have chosen the wrong tool for what I'm trying to achieve in 
the
first place, but can anyone comment on the algorithmic complexity of 
strtr?
This is definitely not the expected behaviour for such small inputs. 
Since

the inputs varied and the keys where determined automatically in my
original script, I was confronted with runtimes of several hours 
compared

to just a few seconds with str_replace.

If this is the expected behaviour, at least the documentation should 
be
adjusted to state that this function is very inefficient with 
keylengths

that are very distant from each other...



Please open a bug to track this.

The algorithm behaves very poorly in this case because at each position 
of the text, all the substrings starting there and with size between m 
and n (where m is the size of the smallest pattern and n is the largest) 
are checked, even if there are only two patterns with size m and n. We 
could fix this easily by building a set of the pattern sizes found and 
try only with those. The hashing of the substrings could also be 
improved; we don't have to recalculate everything when we advance in the 
text.


--
Gustavo Lopes

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] strtr vs. str_replace runtime

2013-01-03 Thread Nicolai Scheer
Hi!

On 3 January 2013 11:40, Gustavo Lopes glo...@nebm.ist.utl.pt wrote:

 Em 2013-01-02 16:53, Nicolai Scheer escreveu:


 I might have chosen the wrong tool for what I'm trying to achieve in the
 first place, but can anyone comment on the algorithmic complexity of
 strtr?
 This is definitely not the expected behaviour for such small inputs. Since
 the inputs varied and the keys where determined automatically in my
 original script, I was confronted with runtimes of several hours compared
 to just a few seconds with str_replace.

 If this is the expected behaviour, at least the documentation should be
 adjusted to state that this function is very inefficient with keylengths
 that are very distant from each other...


 Please open a bug to track this.

 The algorithm behaves very poorly in this case because at each position of
 the text, all the substrings starting there and with size between m and n
 (where m is the size of the smallest pattern and n is the largest) are
 checked, even if there are only two patterns with size m and n. We could
 fix this easily by building a set of the pattern sizes found and try only
 with those. The hashing of the substrings could also be improved; we don't
 have to recalculate everything when we advance in the text.


Ok, here it goes:

https://bugs.php.net/bug.php?id=63893

Greetings,

Nico