Hello,

[First off, I'm not a member of this list, so please Cc: me in a reply!]

I've found some counterintuitive behavior in collections.Counter while hacking on the scikit-learn project [1]. I wanted to use a bunch of Counters to do some simple term counting in a set of documents, roughly as follows:

    count_total = Counter()
    count_per_doc = []
    for doc in documents:
        count_current = Counter(analyze(doc))
        count_total += count_current
        count_per_doc.append(count_current)

Because we target Python 2.5+, I implemented a lightweight replacement with just the functionality we need, including __iadd__, but then my co-developer ran the above code on Python 2.7 and performance was horrible.

After some digging, I found out that Counter [2] does not have __iadd__, so += falls back to __add__, which copies the entire left-hand side! I also figured out that I should use the update method instead, which I will, but I still find that uglier than +=.

I would submit a patch to implement __iadd__, but I first want to know if that's considered the right behavior, since it changes the semantics of +=:

    >>> from collections import Counter
    >>> a = Counter([1,2,3])
    >>> b = a
    >>> a += Counter([3,4,5])
    >>> a is b
    False

would become

    # snip
    >>> a is b
    True

TIA,
Lars

[1] https://github.com/scikit-learn/scikit-learn/commit/de6e93094499e4d81b8e3b15fc66b6b9252945af
[2] http://hg.python.org/cpython/file/tip/Lib/collections/__init__.py#l399

-- 
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam
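
For reference, a minimal sketch of the update()-based workaround mentioned in the message. The sample token lists in `docs` are made up for illustration and stand in for the output of analyze(doc); the `alias` name is also hypothetical and only serves to show that the running total is modified in place rather than replaced by a copy.

```python
from collections import Counter

# Hypothetical sample data: each inner list stands in for analyze(doc).
docs = [["a", "b", "a"], ["b", "c"]]

count_total = Counter()
alias = count_total      # second reference to the same Counter object
count_per_doc = []

for tokens in docs:
    count_current = Counter(tokens)
    # update() adds the counts to count_total in place,
    # so the running total is not copied on each iteration.
    count_total.update(count_current)
    count_per_doc.append(count_current)

print(count_total is alias)          # True: still the same object
print(sorted(count_total.items()))   # [('a', 2), ('b', 2), ('c', 1)]
```

One difference worth keeping in mind: update() keeps zero and negative counts, while the + operator drops them from its result, so the two only agree when all counts are positive, as they are in term counting.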