Hello Ildar,

Here is a new version of patch. I've splitted it into two parts. The first one is almost the same as v4 from [1] with some refactoring. The second part introduces random_seed variable as you proposed.

Patch 1 applies. Compilations fails, there are two "hash_seed" declared in "pgbench.c".

Patch 2 applies cleanly on top of the previous patch and compiles, because the variable is removed...

If an ":hash_seed" pgbench variable is used, ISTM that there is no need for a global variable at all, so the two patches are going back and forth, which is unhelpful. ISTM better to provide just one combined patch for the feature.

If the hash_seed variable really needs to be kept, it should be an "int64" variable, like other pgbench values. Calling random just usually initializes about 31 bits, so random should be called 2 or maybe 3 times? Or maybe use the internal getrand which has 48 pseudorandom bits?

For me "random seed" is what is passed to srandom, so the variable should rather be named hash_seed because there could also be a random seed (actually, there is in another parallel patch:-). Moreover, this seed may or may not be random if set, so calling it "random_seed" is not desirable.

The len < 1 || len > 2 is checked twice, once in the "switch", on in an "if" just after the "switch". Once is enough.

I didn't do the executor simplification thing yet because I'm a little concerned about inventive users, who may want to change random_seed variable in runtime (which is possible since pgbench doesn't have read only variables aka constants AFAIK).

If the user choses to overide hash_seed in their script, it is their decision, the documentation has only to be clear about :hash_seed being the default seed. I see no clear reason to work around this possibility by evaluating the seed at parse time, especially as the variable may not have its final value yet depending on the option order. I'd suggest to just use make_variable("hash_seed") for the default second argument and simplify the executor.

The seed variable is not tested explicitely in the script, you could add
a "hash(5432) == hash(5432, :hash_seed)" for instance.

It would be nice if an native English speaker could proofread the documentation text. I'd suggest: "*an* optional seed parameter". "In case *the* seed...". "<literal>:hash_seed</literal>". "shared for" -> "shared by". "following listing" -> "following pgbench script". "few accounts generates" -> "few accounts generate".

For the document example, I'd use larger values for the random & modulo, eg 100000000 and 1000000. The drawback is that zipfian does a costly computation of the generalized harmonic number when the parameter is lower than 1.0. For cities, the parameter found by Zipf was 1.07 (says Wikipedia). Maybe use this historical value. Or maybe use an exponential distribution in the example.

--
Fabien.

Reply via email to