Re: Window Function "Run Conditions"

Andy Fan Tue, 05 Apr 2022 00:39:08 -0700

Hi David:


I just came to the office today to double check this patch.  I probably can
> finish it very soon.
>

I would share my current review result first and more review is still in
progress.
There is a lot of amazing stuff there but I'd save the simple +1 and just
share
something I'm not fully understand now.  I just focused on the execution
part and
only 1 WindowAgg node situation right now.

1. We can do more on PASSTHROUGH, we just bypass the window function
currently,  but IIUC we can ignore all of the following tuples in current
partition
once we go into this mode.  patch 0001 shows what I mean.

--- without patch 0001,  we need 1653 ms for the below query, with the
patch 0001,
--- we need 629ms.   This is not a serious performance comparison since I
--- build software with -O0 and --enable_cassert.  but it can show some
improvement.
postgres=# explain analyze select * from (select x,y,row_number() over
(partition
by x order by y) rn from xy) as xy where rn < 2;
                                                                      QUERY
PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan on xy  (cost=0.42..55980.43 rows=5000 width=16) (actual
time=0.072..1653.631 rows=1000 loops=1)
   Filter: (xy.rn = 1)
   Rows Removed by Filter: 999000
   ->  WindowAgg  (cost=0.42..43480.43 rows=1000000 width=16) (actual
time=0.069..1494.553 rows=1000000 loops=1)
         Run Condition: (row_number() OVER (?) < 2)
         ->  Index Only Scan using xy_x_y_idx on xy xy_1
 (cost=0.42..25980.42 rows=1000000 width=8) (actual time=0.047..330.283
rows=1000000 loops=1)
               Heap Fetches: 0
 Planning Time: 0.240 ms
 Execution Time: 1653.913 ms
(9 rows)


postgres=# explain analyze select * from (select x,y,row_number() over
(partition
by x order by y) rn from xy) as xy where rn < 2;
                                                                      QUERY
PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan on xy  (cost=0.42..55980.43 rows=5000 width=16) (actual
time=0.103..629.428 rows=1000 loops=1)
   Filter: (xy.rn < 2)
   Rows Removed by Filter: 1000
   ->  WindowAgg  (cost=0.42..43480.43 rows=1000000 width=16) (actual
time=0.101..628.821 rows=2000 loops=1)
         Run Condition: (row_number() OVER (?) < 2)
         ->  Index Only Scan using xy_x_y_idx on xy xy_1
 (cost=0.42..25980.42 rows=1000000 width=8) (actual time=0.063..281.715
rows=1000000 loops=1)
               Heap Fetches: 0
 Planning Time: 1.119 ms
 Execution Time: 629.781 ms
(9 rows)

Time: 633.241 ms


2. the "Rows Removed by Filter: 1000" is strange to me for the above
example.

 Subquery Scan on xy  (cost=0.42..55980.43 rows=5000 width=16) (actual
time=0.103..629.428 rows=1000 loops=1)
   Filter: (xy.rn < 2)
   Rows Removed by Filter: 1000

The root cause is even ExecQual(winstate->runcondition, econtext) return
false, we
still return the slot to the upper node.  A simple hack can avoid it.

3.  With the changes in 2,  I think we can avoid the subquery node totally
for the above query.

4. If all the above are correct, looks the enum WindowAggStatus addition is
not a
must since we can do what WINDOWAGG_PASSTHROUGH does just when we find it
is, like
patch 3 shows.  (I leave WINDOWAGG_DONE only, but it can be replaced with
previous all_done field).

Finally, Thanks for the patch, it is a good material to study the knowledge
in this area.

-- 
Best Regards
Andy Fan

v1-0003-Try-to-remove-enum-WindowAggStatus.patch
Description: Binary data

v1-0001-When-we-are-in-PASSTHROUGH-mode-all-the-following.patch
Description: Binary data

v1-0002-not-return-run-condition-false-tuple-to-upper-nod.patch
Description: Binary data

Re: Window Function "Run Conditions"

Reply via email to