On Sun, Feb 17, 2019 at 11:02:37PM +0100, Tomas Vondra wrote:
> On 2/17/19 6:33 PM, David Fetter wrote:
> > On Sun, Feb 17, 2019 at 11:09:27AM -0500, Tom Lane wrote:
> >> Fabien COELHO <coe...@cri.ensmp.fr> writes:
> >>>> I'm trying to use random_zipfian() for benchmarking of skewed data sets,
> >>>> and I ran head-first into an issue with rather excessive CPU costs.
> >>
> >>> If you want skewed but not especially zipfian, use exponential which is
> >>> quite cheap. Also zipfian with a > 1.0 parameter does not have to compute
> >>> the harmonic number, so it depends in the parameter.
> >>
> >> Maybe we should drop support for parameter values < 1.0, then. The idea
> >> that pgbench is doing something so expensive as to require caching seems
> >> flat-out insane from here. That cannot be seen as anything but a foot-gun
> >> for unwary users. Under what circumstances would an informed user use
> >> that random distribution rather than another far-cheaper-to-compute one?
> >>
> >>> ... This is why I submitted a pseudo-random permutation
> >>> function, which alas does not get much momentum from committers.
> >>
> >> TBH, I think pgbench is now much too complex; it does not need more
> >> features, especially not ones that need large caveats in the docs.
> >> (What exactly is the point of having zipfian at all?)
> >
> > Taking a statistical perspective, Zipfian distributions violate some
> > assumptions we make by assuming uniform distributions. This matters
> > because Zipf-distributed data sets are quite common in real life.
> >
>
> I don't think there's any disagreement about the value of non-uniform
> distributions. The question is whether it has to be a zipfian one, when
> the best algorithm we know about is this expensive in some cases? Or
> would an exponential distribution be enough?

I suppose people who care about the difference between Zipf and
exponential would appreciate having the former around to test.

Whether pgbench should support this is a different question, and it's
sounding a little like the answer to that one is "no."

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate