On 06.12.2019 19:52, Konstantin Knizhnik wrote:


On 06.12.2019 18:53, Robert Haas wrote:
On Thu, Nov 28, 2019 at 2:08 AM Konstantin Knizhnik
<k.knizh...@postgrespro.ru> wrote:
calls float4_accum for each row of T, the same query in VOPS will call
vops_float4_avg_accumulate for each tile which contains 64 elements.
So vops_float4_avg_accumulate is called 64 times less often than float4_accum.
And inside, it contains a straightforward loop:

              for (i = 0; i < TILE_SIZE; i++) {
                  sum += opd->payload[i];
              }

which can be optimized by the compiler (loop unrolling, use of SIMD
instructions, ...).
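The call-count argument can be made concrete with a small C sketch. It assumes a simplified, hypothetical tile type (`float4_tile`, a bare array of `TILE_SIZE` = 64 floats; real VOPS tiles also carry null/filter bitmaps) and hypothetical helpers `accum_row`, `accum_tile` and `compare`. These are not the VOPS or PostgreSQL sources. The sketch only contrasts one accumulator call per value with one call per tile.

```c
#include <stddef.h>

#define TILE_SIZE 64 /* tile size quoted in the text */

/* Hypothetical tile type: real VOPS tiles also carry null/filter
 * bitmaps, which this sketch omits. */
typedef struct {
    float payload[TILE_SIZE];
} float4_tile;

/* Row-at-a-time: one call per value, the way float4_accum is
 * invoked once per row. */
static double accum_row(double state, float value)
{
    return state + value;
}

/* Tile-at-a-time: one call per 64 values. The inner loop has no calls
 * and no branches, so the compiler can unroll it; vectorizing a
 * floating-point sum usually also requires permission to reassociate
 * additions (e.g. GCC's -ffast-math). */
static double accum_tile(double state, const float4_tile *opd)
{
    double sum = 0.0;
    for (size_t i = 0; i < TILE_SIZE; i++) {
        sum += opd->payload[i];
    }
    return state + sum;
}

/* Sums ntiles tiles both ways and counts the accumulator calls each
 * approach makes. */
static void compare(const float4_tile *tiles, size_t ntiles,
                    double *row_sum, size_t *row_calls,
                    double *tile_sum, size_t *tile_calls)
{
    *row_sum = *tile_sum = 0.0;
    *row_calls = *tile_calls = 0;
    for (size_t t = 0; t < ntiles; t++) {
        for (size_t i = 0; i < TILE_SIZE; i++) {
            *row_sum = accum_row(*row_sum, tiles[t].payload[i]);
            (*row_calls)++;
        }
        *tile_sum = accum_tile(*tile_sum, &tiles[t]);
        (*tile_calls)++;
    }
}
```

For 4 tiles holding the values 0..63, both approaches return the same sum (8064), but the row path makes 256 accumulator calls and the tile path makes 4.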
Part of the reason why the compiler can optimize that so well is
probably related to the fact that it includes no overflow checks.

Maybe it makes sense to use an aggregate transition function that does not check for overflow, and to perform this check only in the final function?
NaN and Inf values will be preserved in any case...

I have tried commenting out check_float8_val in float4_pl/float8_pl and got no difference in performance at all.

But if I replace the query

select
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice*(1-l_discount)) as sum_disc_price,
    sum(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge,
    sum(l_quantity) as avg_qty,
    sum(l_extendedprice) as avg_price,
    sum(l_discount) as avg_disc,
    count(*) as count_order
from lineitem_inmem;


with

select sum(l_quantity + l_extendedprice + l_discount + l_tax) from lineitem_inmem;


then the time is reduced from 3686 to 1748 msec.
So at least half of the time is spent in expression evaluation and aggregate accumulation.
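The "at least half" estimate follows from the two timings. Dropping most of the aggregates and arithmetic saved this fraction of the original run time:

```latex
\frac{3686\,\text{ms} - 1748\,\text{ms}}{3686\,\text{ms}} = \frac{1938}{3686} \approx 0.526
```

The simplified query still evaluates one expression and one aggregate, so the share spent on these two steps in the original query is presumably somewhat higher than 52.6%.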

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


