Vlastimil Brom <vlastimil.b...@gmail.com> writes:

> Hi all,
> I'd like to ask about the possibility of negative "counts" in
> collections.Counter (using Python 3.1);
> I believe, my usecase is rather trivial, basically I have the word
> frequencies of two texts and I want do compare them (e.g. to see what
> was added and removed between different versions of a text).
>
> This is simple enough to do with own code, but I thought, this would
> be exactly the case for Counter...
> However, as the Counter only returns positive counts, one has to get
> the difference in both directions and combine them afterwards, maybe
> something like:
>
> >>> c1=collections.Counter("aabcddd")
> >>> c2=collections.Counter("abbbd")
> >>> added_c2 = c2-c1
> >>> removed_c2 = c1-c2
> >>> negative_added_c2 = dict((k, v*-1) for (k, v) in removed_c2.items())
> >>> changed_c2 = dict(added_c2)
> >>> changed_c2.update(negative_added_c2)
> >>> changed_c2
> {'a': -1, 'c': -1, 'b': 2, 'd': -2}
> >>>
>
> It seems to me, that with negative counts allowed in Counter, this
> would simply be the matter of a single difference:
> changed_c2 = c2 - c1
>
> Is there a possibility to make the Counter work this way (other than
> replacing its methods in a subclass, which might be comparable to
> writing the naive counting class itself)?
> Are there maybe some reasons I missed to disable negative counts here?
> (As I could roughly understand, the Counter isn't quite limited to the
> mathematical notion of multiset; it seems to accept negative counts,
> but its methods only output the positive part).
> Is this kind of task - a comparison in both directions - an unusual
> one, or is it simply not the case for Counter?
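For reference, the two-way comparison described above can be written in a few lines with plain dictionary lookups. This is only a sketch: the name signed_difference is made up for illustration and is not part of collections, and zero differences are dropped by choice.

```python
from collections import Counter


def signed_difference(new, old):
    """Return new[x] - old[x] for every key in either mapping.

    Hypothetical helper, not a collections API. Keys whose counts
    are equal in both mappings are left out of the result.
    """
    diff = {}
    for key in set(new) | set(old):
        # .get() avoids inserting missing keys into either mapping
        delta = new.get(key, 0) - old.get(key, 0)
        if delta:
            diff[key] = delta
    return diff


c1 = Counter("aabcddd")
c2 = Counter("abbbd")
changed_c2 = signed_difference(c2, c1)
# changed_c2 == {'a': -1, 'b': 2, 'c': -1, 'd': -2}
```

It accepts Counters or ordinary dicts, since it relies only on iteration and .get().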
Every time I have needed something like collections.Counter, I have
wanted the behaviour you require too. As a result, I have never used
collections.Counter. Instead I have used plain dictionaries or my own
class.

I don't understand why the Counter's + and - operators behave as they
do. Here is an example from the docs:

    >>> c = Counter(a=3, b=1)
    >>> d = Counter(a=1, b=2)
    >>> c + d    # add two counters together: c[x] + d[x]
    Counter({'a': 4, 'b': 3})
    >>> c - d    # subtract (keeping only positive counts)
    Counter({'a': 2})
    >>> c & d    # intersection: min(c[x], d[x])
    Counter({'a': 1, 'b': 1})
    >>> c | d    # union: max(c[x], d[x])
    Counter({'a': 3, 'b': 2})

If + and - just added or subtracted the multiplicities of elements,
keeping negative multiplicities, we would get:

    >>> c - d
    Counter({'a': 2, 'b': -1})

Which I think is useful in many cases. But we could still get the
result of the current c - d very simply:

    >>> (c - d) | Counter()    # | Counter() removes negative multiplicities
    Counter({'a': 2})

Altogether more versatile and coherent IMHO.

--
Arnaud
--
http://mail.python.org/mailman/listinfo/python-list
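
The + and - semantics proposed above could be tried out with a small subclass. What follows is a sketch under assumptions, not how collections.Counter behaves: SignedCounter is a hypothetical name, and it relies on Counter.subtract(), which the documentation lists as new in Python 3.2 (so it would not run on 3.1). The inherited | operator is left untouched, so | Counter() still drops non-positive counts.

```python
from collections import Counter


class SignedCounter(Counter):
    """Hypothetical Counter variant whose + and - keep negative counts."""

    def __add__(self, other):
        result = SignedCounter(self)
        result.update(other)      # Counter.update adds counts in place
        return result

    def __sub__(self, other):
        result = SignedCounter(self)
        result.subtract(other)    # keeps zero and negative counts
        return result


c = SignedCounter(a=3, b=1)
d = SignedCounter(a=1, b=2)

signed = c - d                    # counts: a -> 2, b -> -1
positive_only = signed | Counter()  # inherited union keeps only a -> 2
```

Note that, unlike the built-in operators, this version also keeps keys whose count ends up at zero.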