[issue10667] collections.Counter object in C

2010-12-15 Thread Raymond Hettinger

Raymond Hettinger  added the comment:

Fixed "if(" spacing and applied in r87265.

--
resolution:  -> fixed
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue10667] collections.Counter object in C

2010-12-14 Thread Gregory P. Smith

Gregory P. Smith  added the comment:

rhettinger's fastcount.patch looks good to me.  a couple style nits but they're 
minor.

no space between if and (?  yuck.   short variable names like "it"?  yuck.

but the code looks good otherwise.  i'm all for it.

--
nosy: +gregory.p.smith

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue10667] collections.Counter object in C

2010-12-14 Thread Raymond Hettinger

Raymond Hettinger  added the comment:

Attaching a new patch for high-speed update() and __init__().  I also tried a C 
version of __missing__() but the speed-up was too small to be worth it.

--
resolution: later -> 
stage:  -> patch review
versions: +Python 3.2 -Python 3.3
Added file: http://bugs.python.org/file20043/fastcount.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue10667] collections.Counter object in C

2010-12-10 Thread Justin Peel

Justin Peel  added the comment:

Okay, I was done submitting. I thought that Antoine was asking for a version 
that kept a pure Python version so I did that.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue10667] collections.Counter object in C

2010-12-10 Thread Raymond Hettinger

Raymond Hettinger  added the comment:

I would like this API to "sit and cook" for a good while.  There are many 
possible ways to add more methods and most be end-up being YAGNI. 
Also, my experience with dict.fromkeys() is that a fair number of people get 
confused by alternative constructors.  So, at this point I have zero interest 
in adding from_items().

Am taking the C optimizations under advisement.  In the meantime, there is no 
need to keep submitting patch variations.  The question is less how-to-do-it 
and more a question of whether-to-do-it.  We try not to add an alternate C 
paths unless a feature is an important building block.  There is an on-going 
maintenance cost and a risk of slightly different behaviors (at a minimum, the 
tracebacks look different and the code can't be traced through pdb).  That 
being said, there is a nice speed-up to be had, so I'll give that some weight.

--
priority: normal -> low
resolution:  -> later

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue10667] collections.Counter object in C

2010-12-10 Thread Justin Peel

Justin Peel  added the comment:

I've done as Antoine asked and made a pure Python PyCounter class and a 
C-enhanced Counter class that both use the mixin CounterBase class. I also 
added to the tests so that both PyCounter and Counter are tested. I left the 
update_fromsubs() method in Counter for now, but haven't made a Python 
equivalent function for PyCounter in case the method is rejected.

Also, I realized that my build was still in debug mode so those benchmark 
weren't quite accurate (Oops). The Counter class is only 4-6x faster for most 
cases. That's still good, but not quite as good as I'd previously reported.

Lastly, I thought of one more method that could be quite useful. Say that you 
have a list like [('apples',4),('oranges',5),('apples',8)] where the second 
element in each tuple indicates how many of them there are. The resultant 
Counter should have {'apples': 12, 'oranges':5}. We don't really have a good 
way to count this, but we could add a small method like the following:

def fromitems(self, items):
for key, value in items:
self[key] += value

and I could easily make a C function for it. I think that this simple method 
could be very useful to some people. What do you all think of this addition?

--
Added file: http://bugs.python.org/file20004/counterpatch2.diff

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue10667] collections.Counter object in C

2010-12-10 Thread Andrew Svetlov

Changes by Andrew Svetlov :


--
nosy: +asvetlov

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue10667] collections.Counter object in C

2010-12-10 Thread Daniel Urban

Changes by Daniel Urban :


--
nosy: +durban

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue10667] collections.Counter object in C

2010-12-09 Thread Antoine Pitrou

Antoine Pitrou  added the comment:

When adding some C accelerations, we often try to keep a pure Python 
alternative in the stdlib, since it can then benefit other Python 
implementations (and easy prototyping).

If you move some of the methods inside a mixin and use multiple inheritance 
with the base C impl, this should be easy.
Also, the unit tests should then test both the C impl and the pure Python impl.

--
nosy: +pitrou

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue10667] collections.Counter object in C

2010-12-09 Thread Raymond Hettinger

Changes by Raymond Hettinger :


Added file: http://bugs.python.org/file19993/time_counter.py

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue10667] collections.Counter object in C

2010-12-09 Thread Raymond Hettinger

Raymond Hettinger  added the comment:

Thanks for the patch.

FWIW, I'm attaching some timing code that I've used in the past.

--
assignee:  -> rhettinger

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue10667] collections.Counter object in C

2010-12-09 Thread Justin Peel

New submission from Justin Peel :

I put the Counter's update and __missing__ methods into C code. The rest of the 
existing methods remain the same. The new Counter is at least 2x-10x (or more) 
faster for updating depending on the input. I also added a new method, 
update_fromsubs, which is documented below the timings for updating a Counter.

Here are a few examples to demonstrate the speed of the new update function:

Counting list(range(100))*100 - 
Old Counter: 18.6 msec
New Counter: 1.43
13x speed-up

Counting range(1) - 
Old Counter: 21.5 msec
New Counter: 3.84 msec
5.6x speed-up

Counter generator  i % 100 for i in range(1) - 
Old Counter: 36.7 msec
New Counter: 17.5 msec
2.1x speed-up

and the __missing__ method being in C makes getting missing keys about twice as 
fast.

Also, I added a new method, update_fromsubs, which counts subarrays/substrings 
of a sequence. This method only works with immutable sequences like tuples, 
bytes, and unicode objects. The method is of the form

update_fromsubs(seq, frame[, step[, lo[, hi]]])

where frame is the size of the subarrays, step is how far to move forward after 
each subarray, and lo and hi are the respective starting and ending indices to 
count subarrays within the sequence.

For instance,

c = Counter()
c.update_fromsubs('abcd', 2)

yields

Counter({'cd': 1, 'ab': 1, 'bc': 1})

and

d = Counter()
d.update_fromsubs('abcd', 2, 2)

yields

Counter({'ab': 1, 'cd: 1})

These sorts of operations could be done with generators in a manner like

Counter(s[i:i+2] for i in range(0, len(s)-1))

but it can be about twice as fast by using update_fromsubs. Here's an example

Counting subarrays of length 2 of "ab"*1:
update_fromsubs:30.8 ms
New Counter w/ generator:   54.3 ms
Old Counter w/ generator:   98.8 ms

This method would probably be most useful in processing strings, but there may 
be other uses. If it is accepted, please feel to change the method name and 
parameters. I especially wasn't sure what to call this method 
(update_fromsubsequences seemed rather long).

--
components: Extension Modules, Library (Lib)
files: counterpatch.diff
keywords: patch
messages: 123706
nosy: jpeel, rhettinger
priority: normal
severity: normal
status: open
title: collections.Counter object in C
type: performance
versions: Python 3.3
Added file: http://bugs.python.org/file19992/counterpatch.diff

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com