Re: [Python-Dev] str.count is slow

2006-03-12 Thread Armin Rigo
Hi Ben,

On Mon, Feb 27, 2006 at 06:50:28PM -0500, Ben Cartwright wrote:
  It seems to me that str.count is awfully slow.  Is there some reason
  for this?

stringobject.c could do with a good clean-up.  It contains very similar
algorithms multiple times, in slightly different styles and with
different performance characteristics.

If I find some motivation I'll try to come up with a patch.


A bientot,

Armin
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] str.count is slow

2006-02-28 Thread James Y Knight

On Feb 28, 2006, at 6:14 PM, Guido van Rossum wrote:

 On 2/28/06, Greg Ewing [EMAIL PROTECTED] wrote:
 Fredrik Lundh wrote:

 My personal goal in life right now is to stay as
 far away from C++ as I can get.

 so what C compiler are you using ?

 Gcc, mostly. I don't mind if it's capable of
 compiling C++, as long as I can choose not to
 write any.

 That's getting harder and harder though. Recent versions of GCC appear
 to be implementing C98 by default -- at least I didn't get complaints
 about declarations placed after non-declarations in the same block
 from any of the buildbot hosts...

I don't know whether you meant C++98 or C99 in the above, but the  
default is (mostly) C99, now. If you like, you can still tell it to  
compile in C89 mode, with --std=c89.

C99 contains some of the superficial C++ syntax changes such as //  
comments and declarations after non-declarations which have long been  
implemented as non-standard extensions, anyhow, but it's still  
nothing like C++.

As for the question of whether to switch to C++ in 3.0, I'd say  
probably not, as it's much harder to interface with C++ from other  
languages than to C.

James

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] str.count is slow

2006-02-28 Thread Greg Ewing
Guido van Rossum wrote:

 Recent versions of GCC appear
 to be implementing C98 by default -- at least I didn't get complaints
 about declarations placed after non-declarations in the same block
 from any of the buildbot hosts...

As long as it doesn't complain when I *do* put all
my declarations before my non-declarations, I can
live with that. :-)

--
Greg
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] str.count is slow

2006-02-27 Thread Ben Cartwright
From comp.lang.python:
[EMAIL PROTECTED] wrote:
 It seems to me that str.count is awfully slow.  Is there some reason
 for this?
 Evidence:

  str.count time test 
 import string
 import time
 import array

 s = string.printable * int(1e5) # 10**7 character string
 a = array.array('c', s)
 u = unicode(s)
 RIGHT_ANSWER = s.count('a')

 def main():
 print 'str:', time_call(s.count, 'a')
 print 'array:  ', time_call(a.count, 'a')
 print 'unicode:', time_call(u.count, 'a')

 def time_call(f, *a):
 start = time.clock()
 assert RIGHT_ANSWER == f(*a)
 return time.clock()-start

 if __name__ == '__main__':
 main()

 ## end 

 On my machine, the output is:

 str: 0.29365715475
 array:   0.448095498171
 unicode: 0.0243757237303

 If a unicode object can count characters so fast, why should an str
 object be ten times slower?  Just curious, really - it's still fast
 enough for me (so far).

 This is with Python 2.4.1 on WinXP.


 Chris Perkins


Your evidence points to some unoptimized code in the underlying C
implementation of Python.  As such, this should probably go to the
python-dev list (http://mail.python.org/mailman/listinfo/python-dev).

The problem is that the C library function memcmp is slow, and
str.count calls it frequently.  See lines 2165+ in stringobject.c
(inside function string_count):

r = 0;
while (i  m) {
if (!memcmp(s+i, sub, n)) {
r++;
i += n;
} else {
i++;
}
}

This could be optimized as:

r = 0;
while (i  m) {
if (s[i] == *sub  !memcmp(s+i, sub, n)) {
r++;
i += n;
} else {
i++;
}
}

This tactic typically avoids most (sometimes all) of the calls to
memcmp.  Other string search functions, including unicode.count,
unicode.index, and str.index, use this tactic, which is why you see
unicode.count performing better than str.count.

The above might be optimized further for cases such as yours, where a
single character appears many times in the string:

r = 0;
if (n == 1) {
/* optimize for a single character */
while (i  m) {
if (s[i] == *sub)
r++;
i++;
}
} else {
while (i  m) {
if (s[i] == *sub  !memcmp(s+i, sub, n)) {
r++;
i += n;
} else {
i++;
}
}
}

Note that there might be some subtle reason why neither of these
optimizations are done that I'm unaware of... in which case a comment
in the C source would help. :-)

--Ben
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] str.count is slow

2006-02-27 Thread martin
Zitat von Fredrik Lundh [EMAIL PROTECTED]:

 it's about time that someone sat down and merged the string and unicode
 implementations into a single stringlib code base (see the SRE sources for
 an efficient way to do this in plain C). [1]
[...]
 1) anyone want me to start working on this ?

This would be a waste of time: In Python 3, the string type will be
gone (or, rather, the unicode type, depending on the point of view).

Regards,
Martin


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] str.count is slow

2006-02-27 Thread Fredrik Lundh
[EMAIL PROTECTED] wrote:

  it's about time that someone sat down and merged the string and unicode
  implementations into a single stringlib code base (see the SRE sources for
  an efficient way to do this in plain C). [1]
 [...]
  1) anyone want me to start working on this ?

 This would be a waste of time: In Python 3, the string type will be
 gone (or, rather, the unicode type, depending on the point of view).

no matter what ends up in Python 3, you'll still need to perform operations
on both 8-bit buffers and Unicode buffers.

(not to mention that a byte type that doesn't support find/split/count etc
is pretty useless).

/F



___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] str.count is slow

2006-02-27 Thread Greg Ewing
Fredrik Lundh wrote:

 moving to (basic) C++ might also be a good idea (in 3.0, perhaps).  is any-
 one still stuck with pure C89 these days ?

Some of us actually *prefer* working with plain C
when we have a choice, and don't consider ourselves
stuck with it.

My personal goal in life right now is to stay as
far away from C++ as I can get. If CPython becomes
C++-based (C++Python?) I will find it quite
distressing, because my most favourite language will
then be built on top of my least favourite language.

Greg
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] str.count is slow

2006-02-27 Thread Fredrik Lundh
Greg Ewing wrote:

 Fredrik Lundh wrote:

  moving to (basic) C++ might also be a good idea (in 3.0, perhaps).  is any-
  one still stuck with pure C89 these days ?

 Some of us actually *prefer* working with plain C
 when we have a choice, and don't consider ourselves
 stuck with it.

perhaps, but polymorphic code is a lot easier to write in C++ than
in C.

 My personal goal in life right now is to stay as
 far away from C++ as I can get.

so what C compiler are you using ?

/F



___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com