New submission from Lars Buitinck <[email protected]>:
I've found some counterintuitive behavior in collections.Counter while hacking
on the scikit-learn project [1]. I wanted to use a bunch of Counters to do some
simple term counting in a set of documents, roughly as follows:
    count_total = Counter()
    for doc in documents:
        count_current = Counter(analyze(doc))
        count_total += count_current
        count_per_doc.append(count_current)
Performance was horrible. After some digging, I found out that Counter [2] does
not define __iadd__, so += falls back to __add__, which copies the entire
left-hand side on every iteration. I've attached a patch that fixes the issue
(for += only; I have not patched the test suite).
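
For illustration, here is a minimal sketch of what an in-place += could look
like. It is not the attached patch: InPlaceCounter and count_terms are
hypothetical names, analyze defaults to str.split purely as a stand-in, and
unlike Counter.__add__ this sketch does not drop zero or negative counts.

```python
from collections import Counter


class InPlaceCounter(Counter):
    """Hypothetical subclass for illustration; not the attached patch."""

    def __iadd__(self, other):
        # Fold the counts from `other` into `self` without building a new
        # Counter. Missing keys read as 0 via Counter.__missing__.
        # Assumption: zero/negative counts are kept, unlike Counter.__add__.
        for elem, count in other.items():
            self[elem] += count
        return self


def count_terms(documents, analyze=str.split):
    """Term counting loop from the report; `analyze` is a stand-in tokenizer."""
    count_total = InPlaceCounter()
    count_per_doc = []
    for doc in documents:
        count_current = Counter(analyze(doc))
        count_total += count_current  # mutates count_total in place
        count_per_doc.append(count_current)
    return count_total, count_per_doc
```

Without changing Counter at all, count_total.update(count_current) also adds
the counts in place and can serve as a workaround in the loop above.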
[1]
https://github.com/scikit-learn/scikit-learn/commit/de6e93094499e4d81b8e3b15fc66b6b9252945af
----------
components: Library (Lib)
files: cpython-counter-iadd.diff
keywords: patch
messages: 145063
nosy: larsmans
priority: normal
severity: normal
status: open
title: collections.Counter's += copies the entire object
type: behavior
versions: Python 3.4
Added file: http://bugs.python.org/file23336/cpython-counter-iadd.diff
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue13121>
_______________________________________