New submission from Lars Buitinck <l.j.buiti...@uva.nl>:

I've found some counterintuitive behavior in collections.Counter while hacking on the scikit-learn project [1]. I wanted to use a bunch of Counters to do some simple term counting in a set of documents, roughly as follows:
    from collections import Counter

    count_total = Counter()
    count_per_doc = []
    for doc in documents:
        count_current = Counter(analyze(doc))
        count_total += count_current
        count_per_doc.append(count_current)

Performance was horrible. After some digging, I found out that Counter [2] does not have __iadd__, so += falls back to __add__, which copies the entire left-hand side. I've attached a patch that fixes the issue (for += only; I have not patched the test suite).

[1] https://github.com/scikit-learn/scikit-learn/commit/de6e93094499e4d81b8e3b15fc66b6b9252945af

----------
components: Library (Lib)
files: cpython-counter-iadd.diff
keywords: patch
messages: 145063
nosy: larsmans
priority: normal
severity: normal
status: open
title: collections.Counter's += copies the entire object
type: behavior
versions: Python 3.4

Added file: http://bugs.python.org/file23336/cpython-counter-iadd.diff

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue13121>
_______________________________________
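For illustration only, here is a minimal sketch of what an in-place += could look like. It is not the attached patch: the class name InPlaceCounter is hypothetical, and the choice to drop only the entries touched by the right-hand side (rather than rescanning the whole left-hand side) is an assumption made so that the cost stays proportional to the size of the right-hand operand.

```python
from collections import Counter


class InPlaceCounter(Counter):
    """Hypothetical Counter subclass whose += mutates the left-hand side."""

    def __iadd__(self, other):
        if not isinstance(other, Counter):
            return NotImplemented
        # Snapshot other's items so that `c += c` cannot resize the dict
        # while it is being iterated.
        for elem, count in tuple(other.items()):
            new_count = self[elem] + count  # a missing key reads as 0
            if new_count > 0:
                self[elem] = new_count
            else:
                # Assumption: like Counter.__add__, do not keep
                # non-positive counts, but only for touched elements.
                self.pop(elem, None)
        return self


total = InPlaceCounter()
alias = total  # second reference, used to show that += does not rebind
for words in (["a", "b", "a"], ["b", "c"]):
    total += Counter(words)

print(total is alias)
print(sorted(total.items()))
```

Because __iadd__ returns self, the name on the left keeps pointing at the same object, and each += only walks the right-hand Counter instead of copying the accumulated total.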