[Numpy-discussion] Deprecate zipf distribution?

Charles R Harris Sat, 07 Oct 2017 08:30:02 -0700

Hi All,

The current NumPy implementation of the truncated zipf distribution has
several drawbacks.



   - Extremely poor performance when the parameter `a` is near 1. For
   instance, when `a = 1.000001` a simple change in the implementation speeds
   things up by a factor of 1,657. When the parameter is even closer to 1,
   the algorithm effectively hangs.
   - Because the distribution is truncated, say to integers in the range of
   int64, the parameter could be allowed to take all values > 0, even though
   the untruncated series diverges for `a <= 1`. There is some indication
   that such values of `a` can be useful in modeling because of the heavy
   tail they produce.
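
To illustrate the second point, here is a minimal sketch of why truncation
makes every `a > 0` admissible: the normalizing sum is finite, so the
probabilities are well defined even where the infinite series diverges. The
function names and the small bound `n_max` are hypothetical and chosen only
for illustration. This is not the NumPy implementation, and tabulating the
whole pmf is only practical for small truncation bounds.

```python
import numpy as np

def truncated_zeta_pmf(a, n_max):
    """P(k) proportional to k**(-a) for k = 1..n_max (hypothetical helper)."""
    if a <= 0:
        raise ValueError("a must be positive")
    k = np.arange(1, n_max + 1, dtype=np.float64)
    weights = k ** (-a)
    # The sum is finite because of the truncation, so this works for a <= 1 too.
    return weights / weights.sum()

def sample_truncated_zeta(a, n_max, size, seed=None):
    """Draw samples by table lookup; only sensible for small n_max."""
    rng = np.random.default_rng(seed)
    pmf = truncated_zeta_pmf(a, n_max)
    return rng.choice(np.arange(1, n_max + 1), size=size, p=pmf)

# a = 0.5 is not allowed for the untruncated distribution, but is fine here.
samples = sample_truncated_zeta(0.5, 1000, size=10, seed=0)
```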

Because fixing these problems will change the output stream, I suggest
implementing a truncated zeta distribution, which is an alternative name
for the same distribution, and deprecating the zipf distribution.
Furthermore, rather than truncate at the maximum value of a C long, which
varies by platform, truncate at max(int64), or some possibly smaller value,
say 2**44, which allows all integers up to that value to be realized with
approximately correct probabilities when using double precision for the
intermediate computations.
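
As a rough illustration of the precision concern, the sketch below checks a
few facts about IEEE double precision. It is one possible reading of why a
bound below max(int64) is attractive, not a derivation of the 2**44 figure;
the variable names are hypothetical.

```python
import numpy as np

INT64_MAX = np.iinfo(np.int64).max

# Spacing between adjacent doubles at various magnitudes.
ulp_at_2_44 = np.spacing(2.0 ** 44)   # fractional resolution remains here
ulp_at_2_53 = np.spacing(2.0 ** 53)   # consecutive integers are no longer distinct

# max(int64) is not representable as a double; it rounds up to 2**63,
# which is outside the int64 range.
int64_max_as_double = float(INT64_MAX)

print(ulp_at_2_44, ulp_at_2_53, int64_max_as_double == 2.0 ** 63)
```

Near 2**44 a double still resolves fractions of an integer, whereas above
2**53 whole runs of integers collapse onto the same double.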

Thoughts?

Chuck

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
