Re: [Python-Dev] New stringbench benchmark results

2011-10-07 Thread Victor Stinner

Le 07/10/2011 03:19, Steven D'Aprano a écrit :

Given that strings are immutable, would it not be an obvious
optimization for replace to return the source string unchanged if the
old and new substrings are equal, and avoid making a potentially
expensive copy?


I just implemented this optimization in 9c1b76936b79, but only if old 
and new substrings are the same object (old is new). *Compare* 
substrings (content) would slow down .replace() in most cases.


Victor
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] New stringbench benchmark results

2011-10-07 Thread Victor Stinner

Le 06/10/2011 12:42, Victor Stinner a écrit :

A.join([Bob]*100)): 0.92 =  2.11


I just optimized PyUnicode_Join() for such dummy benchmark. It's now 
1.2x slower instead of 2.3x slower on this dummy benchmark.


With longer *ASCII* strings, Python 3.3 is now 2x (narrow 3.2) or 4x 
(wide 3.2) faster than Python 3.2. For example with this micro-benchmark:


./python -m timeit 'x=[x*500]*5000; y=\n; z=y.join' 'z(x)'

Victor
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] New stringbench benchmark results

2011-10-07 Thread Victor Stinner
Le jeudi 6 octobre 2011 02:06:30, Victor Stinner a écrit :
 The rfind case is really strange: the code between Python 3.2 and 3.3 is
 exactly the same. Even in Python 3.2: rfind looks twice faster than find:
 
 (AB*300+C).find(BC) (*1000) : 1.21
 (C+AB*300).rfind(CA) (*1000) : 0.57

It looks to be a gcc bug: using attached patch (written by Antoine), str.find() 
is a little bit faster. With the patch, the function does the same memory 
access, but it generates a different machine code.

I don't know exactly the difference yet, but it may be related to the CMOVNE 
instruction (which looks to be slower than a classical conditional jump, JNE).

Victor
diff --git a/Objects/stringlib/fastsearch.h b/Objects/stringlib/fastsearch.h
--- a/Objects/stringlib/fastsearch.h
+++ b/Objects/stringlib/fastsearch.h
@@ -76,6 +76,8 @@ FASTSEARCH(const STRINGLIB_CHAR* s, Py_s
 mask = 0;
 
 if (mode != FAST_RSEARCH) {
+const STRINGLIB_CHAR *ss = s + m - 1;
+const STRINGLIB_CHAR *pp = p + m - 1;
 
 /* create compressed boyer-moore delta 1 table */
 
@@ -90,7 +92,7 @@ FASTSEARCH(const STRINGLIB_CHAR* s, Py_s
 
 for (i = 0; i = w; i++) {
 /* note: using mlast in the skip path slows things down on x86 */
-if (s[i+m-1] == p[m-1]) {
+if (ss[i] == pp[0]) {
 /* candidate match */
 for (j = 0; j  mlast; j++)
 if (s[i+j] != p[j])
@@ -106,13 +108,13 @@ FASTSEARCH(const STRINGLIB_CHAR* s, Py_s
 continue;
 }
 /* miss: check if next character is part of pattern */
-if (!STRINGLIB_BLOOM(mask, s[i+m]))
+if (!STRINGLIB_BLOOM(mask, ss[i+1]))
 i = i + m;
 else
 i = i + skip;
 } else {
 /* skip: check if next character is part of pattern */
-if (!STRINGLIB_BLOOM(mask, s[i+m]))
+if (!STRINGLIB_BLOOM(mask, ss[i+1]))
 i = i + m;
 }
 }___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] New stringbench benchmark results

2011-10-06 Thread Victor Stinner
Hum, copy-paste failure, I wrote numbers in the wrong order, it's:

(test: Python 3.2 = Python 3.3)
A.join([Bob]*100)): 0.92 = 2.11
(C+AB*300).rfind(CA): 0.57 = 1.03
(A + (Z*128*1024)).replace(A, BB, 1): 0.25 = 0.50

I improved str.replace(): it's now 5 times faster instead of 2 times slower 
for this specific benchmark :-) (or 10 times faster in Python 3.3 before/after 
my patch)

Victor
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] New stringbench benchmark results

2011-10-06 Thread Steven D'Aprano

Victor Stinner wrote:

Hum, copy-paste failure, I wrote numbers in the wrong order, it's:

(test: Python 3.2 = Python 3.3)
A.join([Bob]*100)): 0.92 = 2.11
(C+AB*300).rfind(CA): 0.57 = 1.03
(A + (Z*128*1024)).replace(A, BB, 1): 0.25 = 0.50

I improved str.replace(): it's now 5 times faster instead of 2 times slower 
for this specific benchmark :-) (or 10 times faster in Python 3.3 before/after 
my patch)



Talking about str.replace, I was surprised to see this behaviour in 3.2:

 s = 'spam'
 t = s.replace('a', 'a')
 s is t
False

Given that strings are immutable, would it not be an obvious 
optimization for replace to return the source string unchanged if the 
old and new substrings are equal, and avoid making a potentially 
expensive copy?


I note that if count is zero, the source string is returned unchanged:

 t = s.replace('a', 'b', 0)
 t is s
True


--
Steven
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] New stringbench benchmark results

2011-10-06 Thread Greg Ewing

Steven D'Aprano wrote:

Given that strings are immutable, would it not be an obvious 
optimization for replace to return the source string unchanged if the 
old and new substrings are equal,


Only if this situation occurs frequently enough to outweigh
the overhead of comparing the target and replacement strings.

This check could be performed very cheaply when both strings
are interned, so it might be worth doing in that case.

--
Greg
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com