Vlastimil Brom <vlastimil.b...@gmail.com> writes:

> Hi all,
> I'd like to ask about the possibility of negative "counts" in
> collections.Counter (using Python 3.1);
> I believe my use case is rather trivial: basically, I have the word
> frequencies of two texts and I want to compare them (e.g. to see what
> was added and removed between different versions of a text).
>
> This is simple enough to do with my own code, but I thought this would
> be exactly the use case for Counter...
> However, as the Counter only returns positive counts, one has to get
> the difference in both directions and combine them afterwards, maybe
> something like:
>
>>>> c1=collections.Counter("aabcddd")
>>>> c2=collections.Counter("abbbd")
>>>> added_c2 = c2-c1
>>>> removed_c2 = c1-c2
>>>> negative_added_c2 = dict((k, v*-1) for (k, v) in removed_c2.items())
>>>> changed_c2 = dict(added_c2)
>>>> changed_c2.update(negative_added_c2)
>>>> changed_c2
> {'a': -1, 'c': -1, 'b': 2, 'd': -2}
>>>>
>
> It seems to me that, with negative counts allowed in Counter, this
> would simply be a matter of a single difference:
> changed_c2 = c2 - c1
>
> Is there a possibility to make the Counter work this way (other than
> replacing its methods in a subclass, which might be comparable to
> writing the naive counting class itself)?
> Are there perhaps reasons I missed for disallowing negative counts here?
> (As far as I understand, Counter isn't strictly limited to the
> mathematical notion of a multiset; it seems to accept negative counts,
> but its methods only output the positive part.)
> Is this kind of task, a comparison in both directions, an unusual
> one, or is it simply not what Counter is meant for?

Every time I have needed something like collections.Counter, I have
wanted the behaviour you require too.  As a result, I have never used
collections.Counter.  Instead I have used plain dictionaries or my own
class.
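For the word-frequency comparison in the question, the plain-dictionary
approach might look like the following minimal sketch
(`signed_difference` is just an illustrative name, not something in
`collections`).  It only uses `.get()`, so it accepts plain dicts and
Counters alike:

```python
from collections import Counter


def signed_difference(new, old):
    """Return {key: new[key] - old[key]} for every key whose count changed.

    A negative value means the key occurs less often in `new` than in `old`.
    Works with plain dicts as well as Counters, since it only uses .get().
    """
    diff = {}
    for key in set(new) | set(old):
        delta = new.get(key, 0) - old.get(key, 0)
        if delta != 0:          # unchanged keys are left out
            diff[key] = delta
    return diff


c1 = Counter("aabcddd")
c2 = Counter("abbbd")
print(sorted(signed_difference(c2, c1).items()))
# [('a', -1), ('b', 2), ('c', -1), ('d', -2)]
```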

I don't understand why the Counter's + and - operators behave as they
do.  Here is an example from the docs:

>>> c = Counter(a=3, b=1)
>>> d = Counter(a=1, b=2)
>>> c + d                       # add two counters together:  c[x] + d[x]
Counter({'a': 4, 'b': 3})
>>> c - d                       # subtract (keeping only positive counts)
Counter({'a': 2})
>>> c & d                       # intersection:  min(c[x], d[x])
Counter({'a': 1, 'b': 1})
>>> c | d                       # union:  max(c[x], d[x])
Counter({'a': 3, 'b': 2})

If + and - just added or subtracted the multiplicities of elements,
keeping negative multiplicities, we would get:

>>> c - d
Counter({'a': 2, 'b': -1})
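As it happens, Python 3.2 (newer than the 3.1 the question mentions)
added `Counter.subtract()`, which produces exactly this signed result,
working in place.  A minimal sketch with the docs' `c` and `d`:

```python
from collections import Counter

c = Counter(a=3, b=1)
d = Counter(a=1, b=2)

# subtract() modifies the counter in place, so work on a copy
# to leave c untouched.
signed = c.copy()
signed.subtract(d)      # unlike c - d, keeps zero and negative counts
print(signed)           # Counter({'a': 2, 'b': -1})
```

Python 3.3 later added unary `+` as well, so `+signed` strips the zero
and negative entries again.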

Signed results like this are useful in many cases, I think.  But we
could still get the result of the current c - d very simply:

>>> (c - d) | Counter() # | Counter() removes negative multiplicities
Counter({'a': 2})

Altogether more versatile and coherent IMHO.
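Short of changing Counter itself, the semantics I'm arguing for can be
approximated with a small subclass.  A minimal sketch, under these
assumptions: `SignedCounter` is a made-up name, it relies on
`Counter.subtract()` (Python 3.2+), and only `+`, `-`, `+=` and `-=` are
overridden, so `&`, `|` and the unary operators keep Counter's
positive-only behaviour:

```python
from collections import Counter


class SignedCounter(Counter):
    """Illustrative Counter subclass whose + and - keep negative counts.

    Entries that end up at exactly zero are dropped, so subtracting
    equal counters gives an empty result.
    """

    def _drop_zeros(self):
        # Build the key list first: deleting while iterating would fail.
        for key in [k for k, v in self.items() if v == 0]:
            del self[key]
        return self

    def __add__(self, other):
        if not isinstance(other, Counter):
            return NotImplemented
        result = SignedCounter(self)
        result.update(other)          # Counter.update() adds counts
        return result._drop_zeros()

    def __sub__(self, other):
        if not isinstance(other, Counter):
            return NotImplemented
        result = SignedCounter(self)
        result.subtract(other)        # subtract() keeps negative results
        return result._drop_zeros()

    # Since Python 3.3 Counter also defines += and -= (which clip to
    # positive counts), so override them to stay consistent.
    def __iadd__(self, other):
        self.update(other)
        return self._drop_zeros()

    def __isub__(self, other):
        self.subtract(other)
        return self._drop_zeros()


c = SignedCounter(a=3, b=1)
d = SignedCounter(a=1, b=2)
print(sorted((c - d).items()))        # [('a', 2), ('b', -1)]
print(dict((c - d) | Counter()))      # {'a': 2}
```

The last line is the `(c - d) | Counter()` idiom from above: the
inherited `|` still discards non-positive counts, so the clipped result
stays one step away.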

-- 
Arnaud
-- 
http://mail.python.org/mailman/listinfo/python-list
