[issue25478] Consider adding a normalize() method to collections.Counter()

2021-12-31 Thread Raymond Hettinger


Raymond Hettinger  added the comment:

Withdrawing the suggestions for scaled_to() and scaled_by().  Am thinking that 
people are mostly better off with a dict comprehension where they can control 
the details of rounding and type conversions.
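
As an illustration of the dict-comprehension approach, here is a minimal sketch. The sample counts and the variable names are assumptions chosen for the example; sum(counts.values()) is used so that it also runs on versions without Counter.total().

```python
from collections import Counter
from fractions import Fraction

counts = Counter(red=11, green=5, blue=4)
total = sum(counts.values())  # same value as counts.total() on Python 3.10+

# Float probabilities: the caller accepts ordinary float rounding.
probabilities = {color: n / total for color, n in counts.items()}

# Exact rational probabilities: the caller picks the result type.
exact = {color: Fraction(n, total) for color, n in counts.items()}

# Integer percentages: the caller picks the rounding rule.
percentages = {color: round(100 * n / total) for color, n in counts.items()}
```

Each comprehension makes the rounding and type decision explicit at the call site, which is the control a fixed scaled_to() method could not offer.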

--
resolution:  -> rejected
stage: patch review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25478] Consider adding a normalize() method to collections.Counter()

2021-05-02 Thread Raymond Hettinger


Raymond Hettinger  added the comment:


New changeset 8c598dbb9483bcfcb88fc579ebf27821d8861465 by Raymond Hettinger in 
branch 'master':
bpo-25478: Add total() method to collections.Counter (GH-25829)
https://github.com/python/cpython/commit/8c598dbb9483bcfcb88fc579ebf27821d8861465


--




[issue25478] Consider adding a normalize() method to collections.Counter()

2021-05-02 Thread Raymond Hettinger


Change by Raymond Hettinger :


--
pull_requests: +24516
pull_request: https://github.com/python/cpython/pull/25829




[issue25478] Consider adding a normalize() method to collections.Counter()

2020-12-20 Thread Allen Downey


Allen Downey  added the comment:

This API would work well for my use cases.

And looking back at previous comments in this thread, I think this proposal 
avoids the most objectionable pitfalls.

--
nosy: +AllenDowney




[issue25478] Consider adding a normalize() method to collections.Counter()

2020-12-20 Thread Raymond Hettinger


Raymond Hettinger  added the comment:

Here's what I propose to add:

    def total(self):
        return sum(self.values())

    def scaled_by(self, factor):
        return Counter({elem : count * factor for elem, count in self.items()})

    def scaled_to(self, target_total=1.0):
        ratio = target_total / self.total()
        return self.scaled_by(ratio)

These cover the common cases and they don't mutate the counter.
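
A rough way to try these out is a subclass. The sketch below assumes a hypothetical ScalableCounter class (not part of the standard library) and returns type(self) rather than a plain Counter, which is a choice made for the example only.

```python
from collections import Counter


class ScalableCounter(Counter):
    """Hypothetical subclass used only to try out the proposed methods."""

    def total(self):
        # Sum of all counts.
        return sum(self.values())

    def scaled_by(self, factor):
        # Build a new counter; self is left untouched.
        return type(self)({elem: count * factor for elem, count in self.items()})

    def scaled_to(self, target_total=1.0):
        # Rescale so that the counts sum (approximately) to target_total.
        ratio = target_total / self.total()
        return self.scaled_by(ratio)


counts = ScalableCounter(red=11, green=5, blue=4)
pmf = counts.scaled_to(1.0)
```

After this runs, counts still holds the original integer counts and pmf holds float values that sum to roughly 1.0, subject to float rounding.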

--
versions: +Python 3.10 -Python 3.8




[issue25478] Consider adding a normalize() method to collections.Counter()

2018-06-24 Thread Raymond Hettinger


Raymond Hettinger  added the comment:

Vedran, Counters do explicitly support floats, decimals, ints, fractions, etc. 

Also, total() needs to be a method rather than a property to be consistent with 
the existing API and for clarity that a computation is being performed (as 
opposed to looking up a running total or other cheap operation).

--




[issue25478] Consider adding a normalize() method to collections.Counter()

2018-05-18 Thread Vedran Čačić

Vedran Čačić  added the comment:

My reading of the documentation says floats are only tentatively supported. The 
main text of the documentation says the values are supposed to be integers. 
Even the note mostly talks about negative values, the floats are mentioned in 
only one item. (And in that item, it says the value type should support 
addition and subtraction. I think it's not too big a stretch to stipulate it 
should support them accurately.:)

But whatever you do about total (cached property was just my idea to enable 
implementation of a probability distribution as a view on a Counter), it's 
obvious from the documentation that the output of normalize is _not_ a Counter. 
It might be a subclass, but I'd rather it be a completely separate class. The 
API intersection is not really that fat.

--




[issue25478] Consider adding a normalize() method to collections.Counter()

2018-05-18 Thread Mark Dickinson

Mark Dickinson  added the comment:

The point is that if you cache the total and update on each operation, you end 
up with a total that depends not just on the current contents of the Counter, 
but also on the history of operations. That seems like a bad idea: you could 
have two Counters with exactly the same counts in them (so that they compare 
equal), but with different cached totals.

So if floats (or Decimal instances) are permitted as Counter values, your 
suggested caching approach is not viable.

Of course, if Counter values are restricted to be plain old integers then it's 
fine, but that's not the current state of affairs.
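
A small sketch may make the history dependence concrete. It assumes a hypothetical RunningTotalDict, a plain dict subclass (not Counter itself, and not code from any patch) that keeps a running total in __setitem__:

```python
class RunningTotalDict(dict):
    """Hypothetical mapping that caches the sum of its values."""

    def __init__(self):
        super().__init__()
        self.cached_total = 0

    def __setitem__(self, key, value):
        # Adjust the cache by (new value - old value) for this key.
        self.cached_total += value - self.get(key, 0)
        super().__setitem__(key, value)


a = RunningTotalDict()
a['spam'] = 1e100
a['eggs'] = 1
a['spam'] = 0   # the 1 added earlier was absorbed by 1e100 and is now lost

b = RunningTotalDict()
b['spam'] = 0
b['eggs'] = 1

same_contents = (a == b)                       # True
totals = (a.cached_total, b.cached_total)      # (0.0, 1)
```

Both mappings hold the same values and compare equal, yet their cached totals differ because they were reached by different sequences of updates.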

--




[issue25478] Consider adding a normalize() method to collections.Counter()

2018-05-18 Thread Vedran Čačić

Vedran Čačić  added the comment:

Well, yes, floats are inaccurate. Do you really expect to normalize Counters 
containing values like a googol, or was it just a strawman? For me, it is much 
more imaginable* that total be zero because you have a negative value (e.g. 
{'spam': -1, 'eggs': 1}) than because you had a googol in your Counter at some 
time in the past.

(*) Note that the documentation says

> Counts are allowed to be any integer value including zero or _negative_ 
> counts. (emphasis mine)

... and floats are only mentioned at the bottom, in a Note. Besides, floats 
have that problem already, even with an existing API:

>>> from collections import Counter as C
>>> big = C(spam=1e100)
>>> c = C(spam=1)
>>> not +c
False
>>> c.update(big)
>>> c.subtract(big)
>>> not +c
True

--




[issue25478] Consider adding a normalize() method to collections.Counter()

2018-05-18 Thread Mark Dickinson

Mark Dickinson  added the comment:

> total should be a cached property, that's updated on every Counter update

That would run into difficulties for Counters with float values: e.g., after

>>> c = Counter()
>>> c['spam'] = 1e100
>>> c['eggs'] = 1
>>> c['spam'] = 0

the cached total would likely be 0.0, because that's what the sum of the 
(new-old) values gives:

>>> (1e100 - 0) + (1 - 0) + (0 - 1e100)
0.0

--




[issue25478] Consider adding a normalize() method to collections.Counter()

2018-05-17 Thread Vedran Čačić

Vedran Čačić  added the comment:

As I said above, if we're going to go down that route, it seems much more 
reasonable to me that total should be a cached property, that's updated on 
every Counter update (in __setitem__, increased by a difference of a new value 
and an old one for that key).

And normalization should just provide a view over the Counter, that just passes 
the values through division with the above cached property. The view should of 
course be immutable by itself, but should reflect the changes of the underlying 
counter, just as already existing views (e.g. dict_values) do.
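
A minimal sketch of such a view, under some assumptions: NormalizedView is a hypothetical name, and for simplicity it recomputes the total on every lookup instead of reading a cached property.

```python
from collections import Counter
from collections.abc import Mapping


class NormalizedView(Mapping):
    """Hypothetical read-only view of a Counter's values divided by their sum."""

    def __init__(self, counter):
        self._counter = counter

    def __getitem__(self, key):
        if key not in self._counter:
            raise KeyError(key)
        # Recomputed on each access; a cached total would go here instead.
        return self._counter[key] / sum(self._counter.values())

    def __iter__(self):
        return iter(self._counter)

    def __len__(self):
        return len(self._counter)


counts = Counter(red=3, blue=1)
view = NormalizedView(counts)
before = view['red']    # 3 / 4
counts['blue'] += 2     # the view sees the change
after = view['red']     # 3 / 6
```

Because Mapping provides no mutating methods, the view cannot be assigned to, while later changes to counts show up in subsequent lookups.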

--




[issue25478] Consider adding a normalize() method to collections.Counter()

2018-05-17 Thread Allen Downey

Allen Downey  added the comment:

I'd like to second Raymond's suggestion.  With just a few additional methods, 
you could support a useful set of operations.  One possible API:

    def scaled(self, factor):
        """Returns a new Counter with all values multiplied by factor."""

    def normalized(self, total=1):
        """Returns a new Counter with values normalized so their sum is total."""

    def total(self):
        """Returns the sum of the values in the Counter."""

These operations would make it easier to use a Counter as a PMF without 
subclassing.

I understand two arguments against this proposal:

1) If you modify the Counter after normalizing, the result is probably nonsense.

That's true, but it is already the case that some Counter methods don't make 
sense for some use cases, depending on how you are using the Counter (as a bag, 
multiset, etc)

So the new features would come with caveats, but I don't think that's fatal.

2) PMF operations are not general enough for core Python; they should be in a 
stats module.

I think PMFs are used (or would be used) for lots of quick computations that 
don't require full-fledged stats.

Also, stats libraries tend to focus on analytic distributions; they don't 
really provide this kind of light-weight empirical PMF.

I think the proposed features have a high ratio of usefulness to implementation 
effort, without expanding the API unacceptably.


Two thoughts for alternatives/extensions:

1) It might be good to make scaled() available as __mul__, as Peter Norvig 
suggests.

2) If the argument of scaled() is a mapping type, it might be good to support 
elementwise scaling.  That would provide an elegant implementation of Raymond's 
chi-squared example and my inspection paradox example 
(http://greenteapress.com/thinkstats2/html/thinkstats2004.html#sec33)
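
To sketch what elementwise scaling might look like, here is a free function rather than a method. The name scaled(), the sample class sizes, and the rule that elements missing from the mapping are left unscaled are all assumptions made for the example.

```python
from collections import Counter
from collections.abc import Mapping


def scaled(counter, factor):
    """Return a new Counter; factor is a number or a mapping of per-element factors."""
    if isinstance(factor, Mapping):
        # Assumption: elements missing from the mapping are left unscaled.
        return Counter({elem: count * factor.get(elem, 1)
                        for elem, count in counter.items()})
    return Counter({elem: count * factor for elem, count in counter.items()})


# Inspection-paradox flavour: 5 classes of size 10 and 2 classes of size 50.
class_sizes = Counter({10: 5, 50: 2})
# Weight each class size by itself to count students instead of classes.
students_by_class_size = scaled(class_sizes, {size: size for size in class_sizes})
doubled = scaled(class_sizes, 2)
```

Here students_by_class_size counts 50 students in classes of 10 and 100 students in classes of 50, so the distribution seen by students is skewed toward large classes.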

Thank you!
Allen

--
nosy: +Allen Downey




[issue25478] Consider adding a normalize() method to collections.Counter()

2018-04-23 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:

Was __rtruediv__ discussed before? What is the use case for it?

Besides __rtruediv__ all LGTM.

--
nosy: +serhiy.storchaka




[issue25478] Consider adding a normalize() method to collections.Counter()

2018-04-22 Thread Raymond Hettinger

Change by Raymond Hettinger :


--
keywords: +patch
pull_requests: +6275
stage:  -> patch review




[issue25478] Consider adding a normalize() method to collections.Counter()

2018-01-29 Thread Raymond Hettinger

Change by Raymond Hettinger :


--
versions: +Python 3.8 -Python 3.7




[issue25478] Consider adding a normalize() method to collections.Counter()

2017-03-15 Thread Steven D'Aprano

Steven D'Aprano added the comment:

It seems to me that the basic Counter class should be left as-is, and if there 
are specialized methods used for statistics (such as normalize) it should go 
into a subclass in the statistics module.

The statistics module already uses Counter internally to calculate the mode.

It makes some sense to me for statistics to have a FrequencyTable (and 
CumulativeFrequencyTable?) class built on top of Counter. I don't think it 
makes sense to overload the collections.Counter type with these sorts of 
specialised methods.

--
nosy: +steven.daprano




[issue25478] Consider adding a normalize() method to collections.Counter()

2017-03-14 Thread David Mertz

David Mertz added the comment:

Raymond wrote:
> The idea is that the method would return a new counter instance
> and leave the existing instance untouched.

Your own first example suggested:

c /= sum(c.values())

That would suggest an inplace modification.  But even if it's not that, but 
creating a new object, that doesn't make much difference to the end user who 
has rebound the name `c`.

Likewise, I think users would be somewhat tempted by:

c = c.scale_by(1.0/c.total)  # My property/attribute suggestion

This would present the same attractive nuisance.  If the interface was the 
slightly less friendly:

freqs = {k:v/c.total for k, v in c.items()}

I think there would be far less temptation to rebind the same name 
unintentionally.

--




[issue25478] Consider adding a normalize() method to collections.Counter()

2017-03-14 Thread Raymond Hettinger

Raymond Hettinger added the comment:

The idea is that the method would return a new counter instance and leave the 
existing instance untouched.

--




[issue25478] Consider adding a normalize() method to collections.Counter()

2017-03-14 Thread David Mertz

David Mertz added the comment:

I definitely wouldn't want a mutator that "normalized" counts for the reason 
Antoine mentions.  It would be a common error to normalize then continue 
meaningless counting.

One could write a `Frequency` subclass easily enough.  The essential feature in 
my mind would be to keep an attribute `Counter.total` around to perform the 
normalization.  I'm +1 on adding that to `collections.Counter` itself.

I'm not sure if this would be better as an attribute kept directly or as a 
property that called `sum(self.values())` when accessed.  I believe that having 
`mycounter.total` would provide the right normalization in a clean API, and 
also expose easy access to other questions one would naturally ask (e.g. "How 
many observations were made?")

--
nosy: +David Mertz




[issue25478] Consider adding a normalize() method to collections.Counter()

2017-03-14 Thread Vedran Čačić

Vedran Čačić added the comment:

That seems horribly arbitrary to me, not to mention inviting another intdiv 
fiasco (from sanity import division:). If only Counter had been committed to 
working only with integer values from the start, it might be acceptable, but 
since the Counter implementation was always careful not to preclude using 
Counter with non-int values, it wouldn't make sense.

Also, there is an interesting inconsistency then, in the form of

c = Counter(a=5,b=5).normalize(5)

Presumably c.a and c.b would be equal integers, and their sum equal to 5. That 
is unfortunately not possible. :-o

--




[issue25478] Consider adding a normalize() method to collections.Counter()

2017-03-14 Thread Wolfgang Maier

Wolfgang Maier added the comment:

>   >>> Counter(red=11, green=5, blue=4).normalize(100) # percentage
>  Counter(red=55, green=25, blue=20)

I like this example, where the normalize method of a Counter returns a new 
Counter, but I think the new Counter should always only have integer counts. 
More specifically, it should be the closest approximation of the original 
Counter that is possible with integers adding up to the argument to the method 
or, statistically speaking, it should represent the expected number of 
observations of each outcome for a given sample size.
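
One way such an integer normalization could be done is the largest-remainder method; this is a technique swapped in for illustration, not something specified above. The sketch assumes non-negative integer counts, a positive total, and a hypothetical function name normalized_to_int().

```python
from collections import Counter


def normalized_to_int(counter, target):
    """Return integer counts summing to target, using largest remainders."""
    total = sum(counter.values())
    shares = {}
    remainders = {}
    for elem, count in counter.items():
        # Floor of the exact share, plus what was cut off.
        shares[elem], remainders[elem] = divmod(count * target, total)
    # Hand the units lost to flooring to the largest remainders.
    leftover = target - sum(shares.values())
    for elem in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        shares[elem] += 1
    return Counter(shares)


percent = normalized_to_int(Counter(red=11, green=5, blue=4), 100)
thirds = normalized_to_int(Counter(a=1, b=1, c=1), 100)
tie = normalized_to_int(Counter(a=5, b=5), 5)
```

The results always sum to the target, but when the target cannot be split evenly (as in thirds and tie) equal inputs receive unequal outputs, and which element gets the extra unit depends on an arbitrary tie-break.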

--
nosy: +wolma

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25478] Consider adding a normalize() method to collections.Counter()

2016-09-19 Thread Vedran Čačić

Vedran Čačić added the comment:

Operator seems OK. After all, we can currently do c+c, which is kinda like c*2 
(sequences behave this way generally, and it is a usual convention in 
mathematics too). And division by a number is just a multiplication by its 
reciprocal. But a dedicated normalize method? No. As Josh said, then you're 
forking the API.

The correct way is probably to have a "normalized view" of a Counter. But I 
don't know the best way to calculate it fast. I mean, I know it mathematically 
(cache the sum of values and update it on every Counter update) but I don't 
know whether it's Pythonic enough.

--
nosy: +veky




[issue25478] Consider adding a normalize() method to collections.Counter()

2016-09-14 Thread Antoine Pitrou

Antoine Pitrou added the comment:

The pitfall I imagine here is that if you continue adding elements after 
normalize() is called, the results will be nonsensical.

--
nosy: +pitrou




[issue25478] Consider adding a normalize() method to collections.Counter()

2016-09-12 Thread SilentGhost

Changes by SilentGhost :


--
nosy:  -SilentGhost




[issue25478] Consider adding a normalize() method to collections.Counter()

2016-09-12 Thread SilentGhost

Changes by SilentGhost :


--
Removed message: http://bugs.python.org/msg275998




[issue25478] Consider adding a normalize() method to collections.Counter()

2016-09-12 Thread SilentGhost

SilentGhost added the comment:

Floats are also not fully supported by the Counter class, for example,
sorted(Counter(a=1.0).elements()) results in TypeError.
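
A short check of that behaviour, as a sketch; the variable names are only for illustration. The error appears when the iterator is consumed, since elements() repeats each key count times and a float is not accepted as a repeat count.

```python
from collections import Counter

int_counter = Counter(a=2, b=1)
float_counter = Counter(a=1.0)

# Integer counts expand into repeated elements.
expanded = sorted(int_counter.elements())

# A float count fails once the iterator is consumed.
try:
    sorted(float_counter.elements())
    error = None
except TypeError as exc:
    error = exc
```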

--
nosy: +SilentGhost




[issue25478] Consider adding a normalize() method to collections.Counter()

2016-09-12 Thread Raymond Hettinger

Changes by Raymond Hettinger :


--
versions: +Python 3.7 -Python 3.6




[issue25478] Consider adding a normalize() method to collections.Counter()

2015-10-26 Thread Raymond Hettinger

New submission from Raymond Hettinger:

Allen Downey suggested this at PyCon in Montreal and said it would be useful in 
his Bayesian statistics courses.  Separately, Peter Norvig created a 
normalize() function in his probability tutorial at In[45] in 
http://nbviewer.ipython.org/url/norvig.com/ipython/Probability.ipynb .

I'm creating this tracker item to record thoughts about the idea.  Right now, 
it isn't clear whether Counter is the right place to support this operation or 
how it should be designed.  Should it be an in-place operation or one that 
creates a new counter?  Should it round so that the result sums to exactly 
1.0?  Should it use math.fsum() for float inputs?
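
To show why math.fsum() is worth asking about, here is a small sketch with deliberately extreme, made-up values. The naive total is accumulated in an explicit loop because the float behaviour of the built-in sum() may differ between Python versions.

```python
import math
from collections import Counter

weights = Counter(spam=1e100, eggs=1.0, ham=-1e100)

# Left-to-right float addition: the 1.0 is absorbed by 1e100 and lost.
naive_total = 0.0
for value in weights.values():
    naive_total += value

# math.fsum() tracks the lost low-order bits and returns the exact sum.
exact_total = math.fsum(weights.values())
```

With these values naive_total ends up as 0.0 while exact_total is 1.0, so a normalize() built on the naive total would divide by zero.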

Should it support other target totals besides 1.0?

  >>> Counter(red=11, green=5, blue=4).normalize(100) # percentage
  Counter(red=55, green=25, blue=20)

Also would it make sense to support something like this?

  sampled_gender_dist = Counter(male=405, female=421)
  world_gender_dist = Counter(male=0.51, female=0.50)
  cs = world_gender_dist.chi_squared(observed=sampled_gender_dist)

Would it be better to just have a general multiply-by-scalar operation for 
scaling?

  c = Counter(observations)
  c.scale_by(1.0 / sum(c.values()))

Perhaps use an operator?

  c /= sum(c.values())
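
A sketch of how the operator spelling could behave, assuming a hypothetical OperatorCounter subclass that defines only __mul__ and __truediv__. Since no in-place variant is defined, c /= x builds a new object and rebinds the name.

```python
from collections import Counter


class OperatorCounter(Counter):
    """Hypothetical subclass adding scalar multiplication and division."""

    def __mul__(self, factor):
        return type(self)({elem: count * factor for elem, count in self.items()})

    def __truediv__(self, divisor):
        return type(self)({elem: count / divisor for elem, count in self.items()})


c = OperatorCounter(red=11, green=5, blue=4)
original = c
c /= sum(c.values())   # falls back to __truediv__, so c is rebound
doubled = original * 2
```

After this runs, original still holds the integer counts, while c refers to a new counter of float values.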

--
assignee: rhettinger
components: Library (Lib)
messages: 253452
nosy: rhettinger
priority: low
severity: normal
status: open
title: Consider adding a normalize() method to collections.Counter()
type: enhancement
versions: Python 3.6




[issue25478] Consider adding a normalize() method to collections.Counter()

2015-10-26 Thread Josh Rosenberg

Josh Rosenberg added the comment:

Counter is documented as being primarily intended for integer counts. While you 
can use them with floats, I'm not sure they're the right data type for this use 
case. Having some methods that only make sense with floats, and others (like 
elements) that only make sense with integers is just confusing.

--
nosy: +josh.r




[issue25478] Consider adding a normalize() method to collections.Counter()

2015-10-25 Thread Raymond Hettinger

Changes by Raymond Hettinger :


--
nosy: +mark.dickinson
