New submission from Lars Buitinck <l.j.buiti...@uva.nl>:

I've found some counterintuitive behavior in collections.Counter while hacking 
on the scikit-learn project [1]. I wanted to use a bunch of Counters to do some 
simple term counting in a set of documents, roughly as follows:

   from collections import Counter

   count_total = Counter()
   count_per_doc = []
   for doc in documents:
       count_current = Counter(analyze(doc))  # analyze() yields the terms of one document
       count_total += count_current
       count_per_doc.append(count_current)

Performance was horrible. After some digging, I found that Counter [2] does 
not define __iadd__, so += falls back to __add__, which copies the entire 
left-hand side on every addition. I've attached a patch that fixes the issue 
(for += only; I have not patched the test suite).
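To illustrate the idea (this is a minimal sketch, not the attached patch), a Counter subclass can define __iadd__ so that += mutates the left-hand side instead of building a new object. The class name InplaceCounter and the sample documents are hypothetical. The sketch touches only the keys present in the right-hand operand, so each += costs time proportional to len(other) rather than len(self); it assumes the left-hand side holds only positive counts, which is always true for term counts.

```python
from collections import Counter


class InplaceCounter(Counter):
    """Hypothetical Counter subclass whose += updates self in place.

    Assumes self holds only positive counts (true for term counts), in
    which case the result matches Counter.__add__: non-positive sums
    are dropped.
    """

    def __iadd__(self, other):
        # Visit only the keys of `other`; no copy of self is made.
        for elem, count in other.items():
            new = self[elem] + count  # missing keys read as 0
            if new > 0:
                self[elem] = new
            else:
                self.pop(elem, None)  # drop zero/negative results
        return self  # returning self keeps the same object bound


# The loop from the report; the sample documents and str.split()
# are hypothetical stand-ins for the real corpus and analyze().
documents = ["the cat sat", "the dog sat", "a cat"]
count_total = InplaceCounter()
count_per_doc = []
for doc in documents:
    count_current = Counter(doc.split())
    target = count_total
    count_total += count_current
    assert count_total is target  # still the same object: updated in place
    count_per_doc.append(count_current)
```

For purely non-negative counts, calling count_total.update(count_current) is another in-place option: Counter.update adds counts to the existing object rather than replacing them, although unlike + it does not discard non-positive results.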

[1] https://github.com/scikit-learn/scikit-learn/commit/de6e93094499e4d81b8e3b15fc66b6b9252945af

----------
components: Library (Lib)
files: cpython-counter-iadd.diff
keywords: patch
messages: 145063
nosy: larsmans
priority: normal
severity: normal
status: open
title: collections.Counter's += copies the entire object
type: behavior
versions: Python 3.4
Added file: http://bugs.python.org/file23336/cpython-counter-iadd.diff

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue13121>
_______________________________________
_______________________________________________
Python-bugs-list mailing list