In the numeric width_bucket() code, we currently do the following: mul_var(&operand_var, count_var, &operand_var, operand_var.dscale + count_var->dscale); div_var(&operand_var, &bound2_var, result_var, select_div_scale(&operand_var, &bound2_var), true);
if (cmp_var(result_var, count_var) >= 0) set_var_from_var(count_var, result_var); else { add_var(result_var, &const_one, result_var); floor_var(result_var, result_var); } select_div_scale() returns a value between 16 and 1000, depending on the dscales and weights of its inputs, so div_var() computes that many digits after the decimal point and rounds the final digit. Assuming the result doesn't exceed count, we then floor the result, throwing away all those digits to get the integer part that we want. Instead, this can be done more simply and efficiently, using division with truncation as follows: mul_var(&operand_var, count_var, &operand_var, operand_var.dscale + count_var->dscale); div_var(&operand_var, &bound2_var, result_var, 0, false); add_var(result_var, &const_one, result_var); This doesn't compute any digits after the decimal point, and instead just returns the integer part of the quotient, which is much cheaper to compute, and precisely what we want. In fact, the current code is slightly inaccurate, because the rounding step can incorrectly round up into the next internal bucket, for example: width_bucket(6.666666666666666, 0, 10, 3) -> 2 width_bucket(6.6666666666666666, 0, 10, 3) -> 3 though in practice that's extremely unlikely and doesn't really matter. Patch attached. I didn't bother with any new test cases, since there appears to be sufficient coverage already. As a quick performance/correctness test, I ran the following: SELECT setseed(0); CREATE TEMP TABLE t AS SELECT random(-4.000000, 8.000000) op, random(-4.100000, -2.000000) b1, random(6.000000, 8.100000) b2, random(1, 15) c FROM generate_series(1, 10000000); SELECT hash_array(array_agg(width_bucket(op, b1, b2, c))) FROM t; -- Result not changed by patch SELECT sum(width_bucket(op, b1, b2, c)) FROM t; Time: 3658.962 ms (00:03.659) -- HEAD Time: 3089.946 ms (00:03.090) -- with patch Regards, Dean
diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c new file mode 100644 index 5510a20..5999a19 --- a/src/backend/utils/adt/numeric.c +++ b/src/backend/utils/adt/numeric.c @@ -1909,7 +1909,7 @@ width_bucket_numeric(PG_FUNCTION_ARGS) /* * 'operand' is inside the bucket range, so determine the correct - * bucket for it to go. The calculations performed by this function + * bucket for it to go in. The calculations performed by this function * are derived directly from the SQL2003 spec. Note however that we * multiply by count before dividing, to avoid unnecessary roundoff error. */ @@ -1928,32 +1928,26 @@ compute_bucket(Numeric operand, Numeric if (!reversed_bounds) { + /* bound1 <= operand < bound2 */ sub_var(&operand_var, &bound1_var, &operand_var); sub_var(&bound2_var, &bound1_var, &bound2_var); } else { + /* bound2 < operand <= bound1 */ sub_var(&bound1_var, &operand_var, &operand_var); sub_var(&bound1_var, &bound2_var, &bound2_var); } - mul_var(&operand_var, count_var, &operand_var, - operand_var.dscale + count_var->dscale); - div_var(&operand_var, &bound2_var, result_var, - select_div_scale(&operand_var, &bound2_var), true); - /* - * Roundoff in the division could give us a quotient exactly equal to - * "count", which is too large. Clamp so that we do not emit a result - * larger than "count". + * Now we have 0 <= operand < bound2. The required bucket index is given + * by (operand * count) / bound2 + 1, using "floor division" -- i.e., + * dividing to zero decimal places with truncation. */ - if (cmp_var(result_var, count_var) >= 0) - set_var_from_var(count_var, result_var); - else - { - add_var(result_var, &const_one, result_var); - floor_var(result_var, result_var); - } + mul_var(&operand_var, count_var, &operand_var, + operand_var.dscale + count_var->dscale); + div_var(&operand_var, &bound2_var, result_var, 0, false); + add_var(result_var, &const_one, result_var); free_var(&bound1_var); free_var(&bound2_var);