Hello Robert,

>>> 2. ddebug and idebug seem like a lousy idea to me.

>> It was really useful to me for debugging and testing.

> That doesn't mean it belongs in the final patch.

I think it is useful when debugging a script, not just for debugging the evaluation code itself.

>>> 3. I'm perplexed by why you've replaced evaluateExpr() with
>>> evalInt() and evalDouble().

>> Basically, the code is significantly shorter and more elegant with
>> this option.

> I find that pretty hard to swallow.

Hmmm, maybe with some more explanations?

> If the backend took this approach,

Sure, I would never suggest doing anything like that in the backend!

> we'd have a separate evaluate function for every datatype, which
> would make it completely impossible to support the creation of new
> datatypes.

The backend implementation is generic about types (one can add a type dynamically to the system), which is a whole other abstraction level; it does not compare in any way.

> And I find this code to be quite difficult to follow.

It is really the function *you* wrote, and there is just one version for
int and one for double.

> What I think we should have is a type like PgBenchValue that
> represents a value which may be an integer or a double or whatever
> else we want to support, but not an expression - specifically a
> value. Then when you invoke a function or an operator it gets passed
> one or more PgBenchValue objects and can do whatever it wants with
> those, storing the result into an output PgBenchValue. So add might
> look like this:

Sure, I do know what it looks like, and I want to avoid it, because it is just a lot of ugly code which is useless for pgbench's purpose.

> void
> add(PgBenchValue *result, PgBenchValue *x, PgBenchValue *y)
> {
>     if (x->type == PGBT_INTEGER && y->type == PGBT_INTEGER)
>     {
>         result->type = PGBT_INTEGER;
>         result->u.ival = x->u.ival + y->u.ival;
>         return;
>     }
>     if (x->type == PGBT_DOUBLE && y->type == PGBT_DOUBLE)
>     {
>         result->type = PGBT_DOUBLE;
>         result->u.dval = x->u.dval + y->u.dval;
>         return;
>     }
>     /* cross-type cases, if desired, go here */

Why reject 1 + 2.0? The cross-type cases are really required for user sanity, which adds:

    if (x->type == PGBT_DOUBLE && y->type == PGBT_INTEGER)
    {
        result->type = PGBT_DOUBLE;
        result->u.dval = x->u.dval + y->u.ival;
        return;
    }
    if (x->type == PGBT_INTEGER && y->type == PGBT_DOUBLE)
    {
        result->type = PGBT_DOUBLE;
        result->u.dval = x->u.ival + y->u.dval;
        return;
    }
}

For the overloaded '+' operator with conversions there are 4 cases (2 types ** 2 arguments) to handle. For all 5 binary operators (+ - * / %), that makes 20 cases to handle. Then for every function you have to deal with type conversion as well: each function costs (number of types ** number of arguments) cases. That is 2 cases for abs, 4 cases for random, 8 cases for each random_exp*, 8 for random_gaus*, and so on. Some thinking would be required for n-ary functions (min & max).

Basically, it can be done, there is no technical issue; it is just a matter of writing a *lot* of repetitive code, hundreds of lines of it. As I think it does not bring any value for pgbench's purpose, I used the other approach, which reduces the code size by avoiding the combinatorial "cross-type" conversions.
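
Here is a minimal standalone sketch of the eval-per-type idea (the Expr type, its fields, and the function names are made up for illustration, not the patch's actual code): coercions happen once, at the leaves, so each operator appears once per evaluation function instead of once per type combination.

#include <stdint.h>
#include <stdio.h>

/* hypothetical, simplified expression tree: integer and double
 * constants plus a single binary '+' operator */
typedef enum { EXPR_ICONST, EXPR_DCONST, EXPR_ADD } ExprKind;

typedef struct Expr
{
    ExprKind     kind;
    int64_t      ival;              /* EXPR_ICONST */
    double       dval;              /* EXPR_DCONST */
    struct Expr *left, *right;      /* EXPR_ADD */
} Expr;

/* evaluate in integer context: a double leaf is coerced here, once */
static int64_t
evalInt(Expr *e)
{
    switch (e->kind)
    {
        case EXPR_ICONST:
            return e->ival;
        case EXPR_DCONST:
            return (int64_t) e->dval;
        case EXPR_ADD:
            return evalInt(e->left) + evalInt(e->right);
    }
    return 0;                       /* keeps the compiler happy */
}

/* evaluate in double context: an integer leaf is promoted here, once */
static double
evalDouble(Expr *e)
{
    switch (e->kind)
    {
        case EXPR_ICONST:
            return (double) e->ival;
        case EXPR_DCONST:
            return e->dval;
        case EXPR_ADD:
            return evalDouble(e->left) + evalDouble(e->right);
    }
    return 0.0;
}

int
main(void)
{
    /* 1 + 2.0 evaluated in double context: no cross-type case needed */
    Expr one = { EXPR_ICONST, 1, 0.0, NULL, NULL };
    Expr two = { EXPR_DCONST, 0, 2.0, NULL, NULL };
    Expr sum = { EXPR_ADD, 0, 0.0, &one, &two };

    printf("%g\n", evalDouble(&sum));   /* prints 3 */
    return 0;
}

The '+' logic does appear twice, once per eval function, which is the two-copies point below, but no operator ever needs the combinatorial cross-type cases.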

> The way you have it, the logic for every function and operator exists
> in two copies: one in the integer function, and the other in the
> double function.

Yep, but with only 2 cases to handle instead of 4 as in the above example; that is the point.

> As soon as we add another data type - strings, dates, whatever - we'd
> need to have cases for all of these things in those functions as
> well, and every time we add a function, we have to update all the
> case statements for every datatype's evaluation function. That's just
> not a scalable model.

On the contrary, it is reasonably scalable:

With the eval-per-type approach, for every added type one implements one eval function which handles all functions and operators, with each eval function roughly the same size as the current ones.

With the above approach, the overloaded add function which handles 2 operands over 3 types ('post' + 'gres' -> 'postgres') would have to deal with 3**2 = 9 type combinations instead of 4, so basically it would more than double the code size. If you add dates on top of that, 4**2 = 16 cases just for one operator. No difficulty there, just a lot of lines...
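
To make that concrete, a string type would slot into the hypothetical sketch above as one new eval function, with no edits to the existing ones (this assumes an EXPR_SCONST kind and an sval field added to the made-up Expr; illustration only, not patch code):

/* evaluate in string context: coercions again happen at the leaves */
static void
evalString(Expr *e, char *buf, size_t len)
{
    switch (e->kind)
    {
        case EXPR_SCONST:               /* hypothetical string leaf */
            snprintf(buf, len, "%s", e->sval);
            return;
        case EXPR_ADD:                  /* '+' as concatenation */
        {
            char l[64], r[64];

            evalString(e->left, l, sizeof(l));
            evalString(e->right, r, sizeof(r));
            snprintf(buf, len, "%s%s", l, r);
            return;
        }
        case EXPR_ICONST:               /* coerce once, at the leaf */
            snprintf(buf, len, "%lld", (long long) e->ival);
            return;
        case EXPR_DCONST:
            snprintf(buf, len, "%g", e->dval);
            return;
    }
}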

Basically the code size complexity of the above approach is:

  #functions * (#types ** #args)

while for the approach in the submitted patch it is:

  #types * #functions
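
For instance, with the current 2 types and the 5 binary operators, that is 5 * 2**2 = 20 operator cases for the value-based approach versus 2 * 5 = 10 for the per-type one; a third type makes it 5 * 3**2 = 45 versus 3 * 5 = 15.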

Now I obviously agree that the approach is not generic and should not be used in other contexts; it is just that it is simple and short, and it fits the bill for pgbench.

Note that I do not really envision adding more types for pgbench scripts. The double type is just a side effect of the non-uniform randoms. What I plan is to add a few functions, especially a pseudo-random permutation and/or a parametric hash, that would allow running more realistic tests with non-uniform distributions.

--
Fabien.

