[Numpy-discussion] Re: Proposed enhancement: Finite Zipf distribution for NumPy

Charles R Harris via NumPy-Discussion Sun, 30 Nov 2025 20:16:36 -0800

On Sun, Nov 30, 2025 at 7:10 PM Warren Weckesser via NumPy-Discussion <
[email protected]> wrote:


> NumPy implements the Zipf distribution (also known as the zeta
> distribution), as `random.Generator.zipf(a, size=None)`. This is a
> discrete distribution with infinite support {1, 2, 3, ...}. The PMF of
> the distribution is
>
>     p(k, a) = k**-a / zeta(a), a > 1, k = 1, 2, 3, ...,
>
> where zeta(a) is the Riemann zeta function
> (https://en.wikipedia.org/wiki/Riemann_zeta_function). (Technically,
> NumPy's implementation is limited to the maximum representable 64 bit
> integer integer, but the intent is to model the infinite support.) The
> distribution is implemented in SciPy as `scipy.stats.zipf`.
>
> A variation of this distribution is the finite Zipf distribution. It
> has finite support {1, 2, 3, ..., n}, and the PMF is
>
>     p(k, a, n) = k**-a / H(n, a), a >= 0, k = 1, 2, ..., n
>
> where H(n, a) is the generalized harmonic number. In SciPy, this
> distribution is implemented in `scipy.stats.zipfian`.
>
> I have an implementation of an efficient rejection method for the
> finite distribution that is currently in a SciPy pull request:
> https://github.com/scipy/scipy/pull/24011.
>
> I think this would make a good addition to NumPy. We wouldn't need a
> new method; this could be implemented by adding the `n` parameter to
> the existing `zipf` method of `numpy.random.Generator`, so the
> signature becomes
>
>     zipf(a, size=None, *, n=None)
>
> The default n=None preserves the current behavior. When `n` is a
> positive integer, variates from the finite distribution are generated.
> Because the finite distribution has no convergence limitation for the
> normalizing constant, the constraint on `a` in that case can be
> broadened to `a >= 0`.
>
> Notes on the implementation are at
>
> https://github.com/WarrenWeckesser/experiments/tree/main/python/numpy/random-cython/docs
> ,
> and some validation scripts are in
>
> https://github.com/WarrenWeckesser/experiments/tree/main/python/scipy/test-zipfian-distr
> .
>
> API-wise, this is a small change to NumPy. Also, the algorithm is a
> straightforward implementation of the rejection method (well,
> "straightforward" if you are already familiar with the general idea),
> so the implementation should be easy to understand and maintain.
>
> Why add it to both SciPy and NumPy? Ideally, this would just be in
> NumPy, and SciPy could start using it reliably once the NumPy version
> with the enhancement becomes the oldest supported NumPy version. But
> the current SciPy implementation of random variate generation when 0
> <= a <= 1 is very slow, so I would like to also have this available in
> SciPy sooner than later. When the NumPy version with the enhancement
> is the oldest version supported by SciPy, SciPy can remove its
> implementation and use NumPy.
>
> What do you think?
>
> Warren
>

I like the idea. I suppose the most common case is actually a=1. We have
seen occasional complaints about speed for small values and I'd guess that
those who want it don't really care about the values way out in the tail.

Chuck

_______________________________________________
NumPy-Discussion mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3//lists/numpy-discussion.python.org
Member address: [email protected]

[Numpy-discussion] Re: Proposed enhancement: Finite Zipf distribution for NumPy

Reply via email to