Re: [Python-Dev] str.count is slow
Hi Ben, On Mon, Feb 27, 2006 at 06:50:28PM -0500, Ben Cartwright wrote: It seems to me that str.count is awfully slow. Is there some reason for this? stringobject.c could do with a good clean-up. It contains very similar algorithms multiple times, in slightly different styles and with different performance characteristics. If I find some motivation I'll try to come up with a patch. A bientot, Armin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] str.count is slow
On Feb 28, 2006, at 6:14 PM, Guido van Rossum wrote: On 2/28/06, Greg Ewing [EMAIL PROTECTED] wrote: Fredrik Lundh wrote: My personal goal in life right now is to stay as far away from C++ as I can get. so what C compiler are you using ? Gcc, mostly. I don't mind if it's capable of compiling C++, as long as I can choose not to write any. That's getting harder and harder though. Recent versions of GCC appear to be implementing C98 by default -- at least I didn't get complaints about declarations placed after non-declarations in the same block from any of the buildbot hosts... I don't know whether you meant C++98 or C99 in the above, but the default is (mostly) C99, now. If you like, you can still tell it to compile in C89 mode, with --std=c89. C99 contains some of the superficial C++ syntax changes such as // comments and declarations after non-declarations which have long been implemented as non-standard extensions, anyhow, but it's still nothing like C++. As for the question of whether to switch to C++ in 3.0, I'd say probably not, as it's much harder to interface with C++ from other languages than to C. James ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] str.count is slow
Guido van Rossum wrote: Recent versions of GCC appear to be implementing C98 by default -- at least I didn't get complaints about declarations placed after non-declarations in the same block from any of the buildbot hosts... As long as it doesn't complain when I *do* put all my declarations before my non-declarations, I can live with that. :-) -- Greg ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] str.count is slow
From comp.lang.python: [EMAIL PROTECTED] wrote: It seems to me that str.count is awfully slow. Is there some reason for this? Evidence: str.count time test import string import time import array s = string.printable * int(1e5) # 10**7 character string a = array.array('c', s) u = unicode(s) RIGHT_ANSWER = s.count('a') def main(): print 'str:', time_call(s.count, 'a') print 'array: ', time_call(a.count, 'a') print 'unicode:', time_call(u.count, 'a') def time_call(f, *a): start = time.clock() assert RIGHT_ANSWER == f(*a) return time.clock()-start if __name__ == '__main__': main() ## end On my machine, the output is: str: 0.29365715475 array: 0.448095498171 unicode: 0.0243757237303 If a unicode object can count characters so fast, why should an str object be ten times slower? Just curious, really - it's still fast enough for me (so far). This is with Python 2.4.1 on WinXP. Chris Perkins Your evidence points to some unoptimized code in the underlying C implementation of Python. As such, this should probably go to the python-dev list (http://mail.python.org/mailman/listinfo/python-dev). The problem is that the C library function memcmp is slow, and str.count calls it frequently. See lines 2165+ in stringobject.c (inside function string_count): r = 0; while (i m) { if (!memcmp(s+i, sub, n)) { r++; i += n; } else { i++; } } This could be optimized as: r = 0; while (i m) { if (s[i] == *sub !memcmp(s+i, sub, n)) { r++; i += n; } else { i++; } } This tactic typically avoids most (sometimes all) of the calls to memcmp. Other string search functions, including unicode.count, unicode.index, and str.index, use this tactic, which is why you see unicode.count performing better than str.count. The above might be optimized further for cases such as yours, where a single character appears many times in the string: r = 0; if (n == 1) { /* optimize for a single character */ while (i m) { if (s[i] == *sub) r++; i++; } } else { while (i m) { if (s[i] == *sub !memcmp(s+i, sub, n)) { r++; i += n; } else { i++; } } } Note that there might be some subtle reason why neither of these optimizations are done that I'm unaware of... in which case a comment in the C source would help. :-) --Ben ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] str.count is slow
Zitat von Fredrik Lundh [EMAIL PROTECTED]: it's about time that someone sat down and merged the string and unicode implementations into a single stringlib code base (see the SRE sources for an efficient way to do this in plain C). [1] [...] 1) anyone want me to start working on this ? This would be a waste of time: In Python 3, the string type will be gone (or, rather, the unicode type, depending on the point of view). Regards, Martin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] str.count is slow
[EMAIL PROTECTED] wrote: it's about time that someone sat down and merged the string and unicode implementations into a single stringlib code base (see the SRE sources for an efficient way to do this in plain C). [1] [...] 1) anyone want me to start working on this ? This would be a waste of time: In Python 3, the string type will be gone (or, rather, the unicode type, depending on the point of view). no matter what ends up in Python 3, you'll still need to perform operations on both 8-bit buffers and Unicode buffers. (not to mention that a byte type that doesn't support find/split/count etc is pretty useless). /F ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] str.count is slow
Fredrik Lundh wrote: moving to (basic) C++ might also be a good idea (in 3.0, perhaps). is any- one still stuck with pure C89 these days ? Some of us actually *prefer* working with plain C when we have a choice, and don't consider ourselves stuck with it. My personal goal in life right now is to stay as far away from C++ as I can get. If CPython becomes C++-based (C++Python?) I will find it quite distressing, because my most favourite language will then be built on top of my least favourite language. Greg ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] str.count is slow
Greg Ewing wrote: Fredrik Lundh wrote: moving to (basic) C++ might also be a good idea (in 3.0, perhaps). is any- one still stuck with pure C89 these days ? Some of us actually *prefer* working with plain C when we have a choice, and don't consider ourselves stuck with it. perhaps, but polymorphic code is a lot easier to write in C++ than in C. My personal goal in life right now is to stay as far away from C++ as I can get. so what C compiler are you using ? /F ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com