[HACKERS] Desirable pgbench features?

Fabien Wed, 30 Mar 2016 08:30:13 -0700


Hello pgdevs,


I've been having a look at why pgbench only implements a TPC-B "like" benchmark,
and not the full, although obsolete, TPC-B.

I'm not particularly interested in running TPC-B per se, although I like
the principle of a simple steady-state update-intensive OLTP stress test,
but rather in what relevant capabilities are missing in pgbench to implement
such a simple test.


I also had a brief look at the TPC-C benchmark, but it is several orders of
magnitude more complex, and there is an open source implementation available
(TPC-C-Uva).

(0) TPC-B assumes signed integers which can hold 10 figures, which means
    that currently used INTs are not enough, it should be INT8. Quite easy
    to fix, or extend to keep some upward compatibility with the previous
    version.

(1) TPC-B test driver must obtain a value from a query (the branch is the one
    of the chosen teller, not any random branch) and reuse it in another
    query. Currently the communication is one way, all results are silently
    discarded.

    This is not representative of client applications (say a web app) which
    interact with the server with some logic of their own, including reading
    things and writing others depending on the previous reading.

    This may be simulated on server side with a plpgsql script, but that
    would not exercise the client/server protocol logic and its performance
    impact, so I think that this simple read capability is important and
    missing.

(2) There is an "if" required in TPC-B with a boolean condition when
    selecting an account: 85% of the time the account does an operation in
    its own branch, 15% in another branch.

(3) More binary operators would be useful, eg | is used by TPC-C for
    building a non uniform random generation from a uniform random one
    (although pgbench has clean exponential and gaussian randoms, which
    still lack some shuffling, though).

(4) Weighted transaction scripts are used by TPC-C, some of which can be
    voluntarily aborted. This already works with recently added weights
    and using ROLLBACK instead of END in a given script. Nothing to do.

(5) Consistency check: after a run, some properties are expected to be
    true, such as the balance of each branch being the sum of the balances
    of its tellers and also of its accounts... This should/could be checked,
    maybe with an additional query.
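
As a rough illustration of the consistency check in (5), the sketch below
runs one possible "additional query" against a tiny in-memory SQLite
database. The table and column names follow pgbench's standard schema
(pgbench_branches.bbalance, pgbench_tellers.tbalance); the query itself, the
demo data and the helper names are assumptions made for this example, not
part of pgbench.

```python
import sqlite3

# Branches whose balance differs from the sum of their tellers' balances.
# The query text is an assumption about what such a check could look like.
INCONSISTENT_BRANCHES = """
SELECT b.bid
FROM pgbench_branches AS b
WHERE b.bbalance <> (SELECT COALESCE(SUM(t.tbalance), 0)
                     FROM pgbench_tellers AS t
                     WHERE t.bid = b.bid)
ORDER BY b.bid
"""


def make_demo_db():
    """Build a small, consistent demo database (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE pgbench_branches (bid INTEGER PRIMARY KEY, bbalance INTEGER);
        CREATE TABLE pgbench_tellers (tid INTEGER PRIMARY KEY, bid INTEGER,
                                      tbalance INTEGER);
        INSERT INTO pgbench_branches VALUES (1, 30), (2, -5);
        INSERT INTO pgbench_tellers VALUES (1, 1, 10), (2, 1, 20), (3, 2, -5);
    """)
    return conn


def inconsistent_branches(conn):
    """Return the ids of branches that fail the balance check."""
    return [row[0] for row in conn.execute(INCONSISTENT_BRANCHES)]
```

An empty result means the check passed. The same pattern would extend to
accounts and to the history table.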

I think that these capabilities are useful features for composing a
reasonable bench, and allowing pgbench to do some of these, while
remaining at a basic expression level (i.e. not creating a full
client-side language, not my intent in any way), should be ok if the
syntax, code and performance impacts are small.

Indeed, some of the missing features can be included, probably at low cost
in pgbench:



* better expressions: comparisons, condition, binary operators...

Comparisons (say <= < == > >= != operators) and an if() function for (2),
on the client side, could look like this:

  \set abid if(random(0, 99) < 85, expression-1, expression-2)

This is pretty easy to implement with the current function infrastructure,
as well as new operators such as |&^! (3).

Note: the "?:" C-like syntax looks attractive but would probably interact
quite badly with the existing ":variable" syntax; moreover the ?: syntax
suggests a short-circuit evaluation, but pgbench expressions are fully
evaluated. These operators & function would probably require around 100
lines of pretty basic code, plus doc.
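
To make the intended semantics concrete, here is a minimal sketch in Python
(not pgbench's C code) of an eagerly evaluated if(), the 85%/15% branch
choice from (2), and the TPC-C NURand construction that needs the |
operator from (3). The function names, the 1..nbranches numbering and the
rng parameter are assumptions made for this example.

```python
import random


def pgbench_if(condition, then_value, else_value):
    """Eager if(): both values were already computed by the caller."""
    return then_value if condition else else_value


def pick_account_branch(teller_branch, nbranches, rng):
    """Own branch 85% of the time, another branch otherwise."""
    if nbranches == 1:
        return teller_branch
    # Draw uniformly among the other branches (numbered 1..nbranches).
    other = rng.randint(1, nbranches - 1)
    if other >= teller_branch:
        other += 1
    # Both candidates exist before the choice: no short-circuit evaluation.
    return pgbench_if(rng.randint(0, 99) < 85, teller_branch, other)


def nurand(a, x, y, c, rng):
    """TPC-C NURand(A, x, y): non-uniform value in [x, y] built with |."""
    return (((rng.randint(0, a) | rng.randint(x, y)) + c) % (y - x + 1)) + x
```

For instance, nurand(1023, 1, 3000, 0, random.Random()) draws a value
between 1 and 3000 with a skewed distribution.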



* using values from a query

For this use case (1), the best syntax and implementation is unclear. In
particular, I'm not fond of the \gset syntax used in psql because the ';'
is dropped and the \gset seems to act as a statement terminator.

After giving it some thought, I would suggest a simple two-line explicit
syntax compatible with current conventions, with a SELECT statement
terminated with a ';', on one side and where to put the results on the
other, something like:


  SELECT ... ;
  \into some variable names

Or maybe in the other way around:

  \setsql some variable names
  SELECT ... ;

The variable names provided could be stored in the command structure of
the SELECT and they would be assigned when the query is executed.

Among the possible constraints, enforced or not, the variable types should
be int or double, the query should be a select, there should be one row
only (or keep the first and set to zero if nothing?), the number of
variables should be less than the number of columns...
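
The constraints listed above can be sketched as a small client-side
assignment step. This is only one reading of the proposal, written in
Python for brevity: assign_into is a hypothetical helper, it chooses the
strict "exactly one row" option, and it allows at most as many variables as
there are columns.

```python
def assign_into(variables, names, rows):
    """Store the columns of a single-row result into named variables.

    variables: dict of client-side variables, updated in place
    names:     variable names given to the hypothetical \\into command
    rows:      query result as a list of tuples
    """
    if len(rows) != 1:
        raise ValueError("expected exactly one row, got %d" % len(rows))
    row = rows[0]
    if len(names) > len(row):
        raise ValueError("more variables than result columns")
    for name, value in zip(names, row):
        # Only int and double values are accepted; bool is rejected
        # explicitly because Python treats it as a subclass of int.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError("variable %s: value must be int or double" % name)
        variables[name] = value
    return variables
```

With a result of [(7, 100)] and the names ["bid"], the variable bid is set
to 7 and the extra column is ignored.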

* shuffling function, i.e. a parametric permutation so that non uniform
randoms can be shuffled so that rows with close pkey do not have close
drawing probabilities. This is non trivial, basically some kind of cypher
function but which does not operate on power-of-two sizes...

* if all necessary features are available, being able to run a strict
tpc-b bench (this means adapting the init phase, and creating a
new builtin script which matches the real spec): no big deal.
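
One known way to get a permutation over a size that is not a power of two
is "cycle walking": apply a bijection on the next larger power-of-two
domain repeatedly until the value falls back inside [0, size). The sketch
below does this in Python with a small Feistel network. This technique is
swapped in here for illustration and is not something the proposal commits
to; the round function and its constants are arbitrary and not
cryptographically strong.

```python
def _feistel(x, half_bits, seed):
    """Bijection on [0, 2**(2*half_bits)) built from 4 Feistel rounds."""
    mask = (1 << half_bits) - 1
    left, right = x >> half_bits, x & mask
    for rnd in range(4):
        # Arbitrary mixing; a Feistel round is invertible whatever f is.
        f = (right * 0x9E3779B1) ^ (seed + rnd * 0x85EBCA6B)
        f = (f ^ (f >> 5)) & mask
        left, right = right, left ^ f
    return (left << half_bits) | right


def permute(i, size, seed):
    """Map i in [0, size) to a seed-dependent position in [0, size)."""
    if size <= 0 or not 0 <= i < size:
        raise ValueError("need 0 <= i < size")
    # Smallest even bit width whose domain covers [0, size).
    half_bits = max(1, ((size - 1).bit_length() + 1) // 2)
    x = _feistel(i, half_bits, seed)
    # Cycle walking: skip values outside the target range. This terminates
    # because following the cycle of a bijection eventually returns to i.
    while x >= size:
        x = _feistel(x, half_bits, seed)
    return x
```

Because the underlying map is a bijection, permute(i, size, seed) hits each
value in [0, size) exactly once as i ranges over [0, size).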



Any thoughts on these points? Especially the query to variable syntax?

--
Fabien.


