Hi All,

The current NumPy implementation of the truncated zipf distribution has several drawbacks.

- Extremely poor performance when the parameter `a` is near 1. For instance, when `a = 1.000001`, a simple change in the implementation speeds things up by a factor of 1,657. When the parameter is closer still to 1, the algorithm effectively hangs.
- Because the distribution is truncated, say to integers in the range of int64, the parameter could be allowed to take all values > 0, even though the untruncated series diverges for `a` <= 1. There is some indication that such values of `a` can be useful in modeling because of the heavy tail of the distribution.

Because fixing these problems will change the output stream, I suggest implementing a truncated zeta distribution, which is an alternative name for the same distribution, and deprecating the zipf distribution. Furthermore, rather than truncating at the maximum value of a C long, which varies by platform, truncate at max(int64), or at some possibly smaller value, say 2**44, which allows all integers up to that value to be realized with approximately correct probabilities when using double precision for the intermediate computations.

Thoughts?

Chubk
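
To make the point about `a` <= 1 concrete, here is a minimal sketch of sampling from a truncated zeta distribution by inverting a tabulated CDF. It is only an illustration under stated assumptions: the names `truncated_zeta_cdf` and `sample_truncated_zeta` are hypothetical, the approach is a plain table lookup that is practical only for a small upper bound `n_max`, and it is neither the current NumPy algorithm nor the change proposed above.

```python
import bisect
import itertools
import math
import random


def truncated_zeta_cdf(a, n_max):
    """Return the CDF of P(k) proportional to k**(-a) for k = 1..n_max.

    Hypothetical helper for illustration. Because the support is finite,
    the normalizing sum exists for every a > 0, including a <= 1.
    """
    if a <= 0:
        raise ValueError("a must be > 0")
    weights = [k ** -a for k in range(1, n_max + 1)]
    total = math.fsum(weights)  # accurate summation of the finite series
    cdf = list(itertools.accumulate(w / total for w in weights))
    cdf[-1] = 1.0  # guard against rounding in the last entry
    return cdf


def sample_truncated_zeta(a, n_max, size, rng):
    """Draw `size` integers in [1, n_max] by inverse-CDF table lookup."""
    cdf = truncated_zeta_cdf(a, n_max)
    # rng.random() is in [0, 1), so the lookup index never exceeds n_max - 1.
    return [bisect.bisect_right(cdf, rng.random()) + 1 for _ in range(size)]


if __name__ == "__main__":
    rng = random.Random(12345)
    # a = 0.5 is not a valid parameter for the untruncated distribution,
    # but is fine here because the support stops at n_max.
    print(sample_truncated_zeta(0.5, 1000, 10, rng))
```

The table costs O(n_max) memory, so a bound such as 2**44 or max(int64) would need a different method, for example rejection sampling; the sketch only shows that truncation makes every `a` > 0 well defined.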