Just to clarify: if I understand Anthony correctly, this proposal is not about implementing YCSB exactly as it is, but rather about using a Zipfian distribution for the id in the regular pgbench table structure, combined with a read/write balance, to simulate something similar to it.
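A minimal Python sketch of the "read/write balance" part, under stated assumptions: each transaction picks a read or an update script at random, weighted the way pgbench's `-b name@weight` option weights builtin scripts. The script names are hypothetical placeholders, and the 50/50 and 95/5 splits are the read/update mixes of YCSB core workloads A and B. The id distribution is deliberately left out here.

```python
import random
from collections import Counter

# Read/update mixes of YCSB core workloads A and B.
# "read" and "update" are placeholder names for two hypothetical pgbench scripts.
WORKLOAD_A = {"read": 50, "update": 50}
WORKLOAD_B = {"read": 95, "update": 5}


def pick_script(weights, rng):
    """Pick one script name with probability weight / sum(weights)."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def simulate(weights, transactions, seed=0):
    """Count how often each script would run over `transactions` draws."""
    rng = random.Random(seed)
    return Counter(pick_script(weights, rng) for _ in range(transactions))


if __name__ == "__main__":
    counts = simulate(WORKLOAD_B, 100_000)
    print(counts, counts["read"] / 100_000)
```

Over many transactions the observed read fraction converges to the configured weight, which is all the balance amounts to; what makes it YCSB-like is the key distribution used inside each script.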

Ok, I misunderstood. My 0.02€: If it does not implement YCSB, and the
point is not to implement YCSB, then do not call it YCSB:-)

Maybe there could be other, simpler builtins that use non-uniform
distributions, e.g. {zipf,exp,...}-{simple,select}, with default values
(exp_param, zipf_param?) for the random distribution parameters.

  \set id random_zipfian(1, 100000*:scale, :zipf_param)
  \set val random(-5000, 5000)
  UPDATE pgbench_whatever ...;

Then

  pgbench -b zipf-se@1 -b zipf-si@1 [ -D zipf_param=1.1 ... ] -T 10000 ...
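For reference, `random_zipfian(lo, hi, s)` draws the k-th value of [lo, hi] with probability proportional to 1/k^s, so `lo` is the most frequent value. The Python sketch below illustrates that distribution with a naive inverse-CDF table; it is not pgbench's algorithm (which does not materialize a table), and `zipfian_sampler` is a hypothetical helper.

```python
import bisect
import itertools
import random
from collections import Counter


def zipfian_sampler(lo, hi, s, seed=0):
    """Return a draw() function yielding integers in [lo, hi], where
    lo + k - 1 has probability proportional to 1 / k**s (k = 1 .. n).

    Naive inverse-CDF over a precomputed cumulative table: O(n) memory,
    fine for illustration, not for 100000 * scale keys.
    """
    n = hi - lo + 1
    cumulative = list(itertools.accumulate(1.0 / k ** s for k in range(1, n + 1)))
    total = cumulative[-1]
    rng = random.Random(seed)

    def draw():
        u = rng.random() * total
        # Index of the first cumulative weight strictly greater than u;
        # min() guards against u rounding up to exactly `total`.
        i = min(bisect.bisect_right(cumulative, u), n - 1)
        return lo + i

    return draw


if __name__ == "__main__":
    draw = zipfian_sampler(1, 1000, 1.1)
    counts = Counter(draw() for _ in range(100_000))
    print(counts.most_common(5))
```

With `s = 1.1` over 1000 keys, the counts for 1, 2, 3 come out in decreasing order, with 1 by far the most frequent key.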

And instead of implementing the exact YCSB workload inside pgbench, it probably makes more sense to add PostgreSQL Jsonb as one of the options in the YCSB framework itself (I was in the middle of that a few years ago, but then got distracted by some interesting benchmarking results).

Sure.

Hello,
thank you for your interest. I'm still improving this idea and the patch,
and I'm very happy about the discussion we are having. It really helps.

The idea was to implement the workloads as close to YCSB as possible
using pgbench.

Basically I'm against having something called YCSB if it is not YCSB;-)

So, the schema it should be applied to is the default schema generated by
pgbench -i (pgbench_accounts).

This is a contradiction, because the pgbench_accounts table is in no way, even remotely, conformant to the YCSB benchmark test table.

So for me there are three possibilities:

(1) do nothing, which is always an option, as committers may be against extending pgbench in this direction anyway. Personally I'm fine with having it.

(2) implement YCSB cleanly, i.e. both initialization and operations, at least if this is "reasonable" (i.e. it does not result in 2000 lines of new code). ISTM that it can be done, given that the YCSB schema is very simple, hence my suggestion of "pgbench -i --schema ycsb" to trigger a non-default initialization.
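To give an idea of how small a dedicated initialization could be, here is a Python sketch that only generates SQL for a YCSB-like table: a `usertable` with a `YCSB_KEY` primary key and ten text fields of 100 characters, following YCSB's default `fieldcount`/`fieldlength` and, as far as I recall, the table layout suggested for its JDBC binding. The `--schema ycsb` option is only a proposal, the helper names are hypothetical, and keys are plain `user<n>` without YCSB's key hashing.

```python
FIELD_COUNT = 10    # YCSB core default "fieldcount"
FIELD_LENGTH = 100  # YCSB core default "fieldlength", in characters here


def ycsb_ddl(table="usertable", field_count=FIELD_COUNT):
    """CREATE TABLE for a YCSB-like table: one key plus field_count text fields."""
    columns = ["YCSB_KEY VARCHAR(255) PRIMARY KEY"]
    columns += [f"FIELD{i} TEXT" for i in range(field_count)]
    return f"CREATE TABLE {table} (\n    " + ",\n    ".join(columns) + "\n);"


def ycsb_load(rows, table="usertable", field_count=FIELD_COUNT,
              field_length=FIELD_LENGTH):
    """Server-side INSERT ... SELECT filling `rows` rows with random text.

    Keys are plain 'user<n>'; YCSB's own key hashing is not reproduced.
    """
    # md5() returns 32 hex characters: repeat enough times, then truncate.
    reps = -(-field_length // 32)  # ceiling division
    field = f"substr(repeat(md5(random()::text), {reps}), 1, {field_length})"
    fields = ", ".join([field] * field_count)
    return (f"INSERT INTO {table}\n"
            f"SELECT 'user' || i, {fields}\n"
            f"FROM generate_series(1, {rows}) AS i;")


if __name__ == "__main__":
    print(ycsb_ddl())
    print(ycsb_load(100_000))
```

Because `random()` is volatile, each field expression is evaluated separately for every row, so the ten fields get independent contents.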

(3) if you are interested in demonstrating non-uniform distributions on pgbench_accounts, I'm also fine with that, just do so, but do *NOT* call it YCSB.

Also, it seems that YCSB hashes keys to scatter them, so that 1 is not the most frequent key, 2 the second most frequent, and so on. There is a hash function in pgbench which can be used for this; the solution is not perfect, as some values cannot be reached, but it is the approach YCSB uses. Otherwise, I'm planning to submit a pseudo-random permutation function to ease this some day, provided that the size of the table stays constant.
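To see why hashing scatters keys but leaves some values unreachable, here is a Python sketch: an FNV-1a style 64-bit hash of the rank reduced modulo the table size, next to a simple affine permutation `(a * (k - 1) + c) mod n + 1` with `gcd(a, n) = 1`, which is a bijection and therefore reaches every key. The hash is similar in spirit to pgbench's `hash_fnv1a()` and YCSB's FNV hashing but not guaranteed to be byte-for-byte identical to either, and the affine map is only an illustration of the permutation idea, not the function being planned; it scatters ranks much less thoroughly.

```python
import math

FNV_OFFSET_64 = 0xCBF29CE484222325
FNV_PRIME_64 = 0x100000001B3
MASK_64 = (1 << 64) - 1


def fnv1a_64(value):
    """FNV-1a over the 8 little-endian bytes of a non-negative integer."""
    h = FNV_OFFSET_64
    for byte in value.to_bytes(8, "little"):
        h ^= byte
        h = (h * FNV_PRIME_64) & MASK_64
    return h


def hashed_key(rank, n):
    """Scatter a rank in [1, n] over [1, n]; collisions are possible."""
    return fnv1a_64(rank) % n + 1


def affine_permutation(n, a=0x9E3779B97F4A7C15, c=12345):
    """Return a bijection on [1, n]: k -> (a*(k-1) + c) mod n + 1.

    It is a permutation only if gcd(a, n) == 1, so bump a until it is.
    """
    while math.gcd(a, n) != 1:
        a += 1

    def permute(rank):
        return (a * (rank - 1) + c) % n + 1

    return permute


if __name__ == "__main__":
    n = 1000
    hashed = {hashed_key(k, n) for k in range(1, n + 1)}
    perm = affine_permutation(n)
    permuted = {perm(k) for k in range(1, n + 1)}
    print(f"hash reaches {len(hashed)} of {n} keys")
    print(f"permutation reaches {len(permuted)} of {n} keys")
    print("rank 1 ->", hashed_key(1, n), "(hash),", perm(1), "(permutation)")
```

A pgbench-script analogue of the hashing variant might be `\set id 1 + abs(hash(random_zipfian(1, :n, 1.1)) % :n)`, where `:n` is a hypothetical variable holding the number of rows; it inherits the same collisions.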

--
Fabien.
