Hi David:
I just came to the office today to double check this patch. I probably can > finish it very soon. > I would share my current review result first and more review is still in progress. There is a lot of amazing stuff there but I'd save the simple +1 and just share something I'm not fully understand now. I just focused on the execution part and only 1 WindowAgg node situation right now. 1. We can do more on PASSTHROUGH, we just bypass the window function currently, but IIUC we can ignore all of the following tuples in current partition once we go into this mode. patch 0001 shows what I mean. --- without patch 0001, we need 1653 ms for the below query, with the patch 0001, --- we need 629ms. This is not a serious performance comparison since I --- build software with -O0 and --enable_cassert. but it can show some improvement. postgres=# explain analyze select * from (select x,y,row_number() over (partition by x order by y) rn from xy) as xy where rn < 2; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------- Subquery Scan on xy (cost=0.42..55980.43 rows=5000 width=16) (actual time=0.072..1653.631 rows=1000 loops=1) Filter: (xy.rn = 1) Rows Removed by Filter: 999000 -> WindowAgg (cost=0.42..43480.43 rows=1000000 width=16) (actual time=0.069..1494.553 rows=1000000 loops=1) Run Condition: (row_number() OVER (?) < 2) -> Index Only Scan using xy_x_y_idx on xy xy_1 (cost=0.42..25980.42 rows=1000000 width=8) (actual time=0.047..330.283 rows=1000000 loops=1) Heap Fetches: 0 Planning Time: 0.240 ms Execution Time: 1653.913 ms (9 rows) postgres=# explain analyze select * from (select x,y,row_number() over (partition by x order by y) rn from xy) as xy where rn < 2; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------- Subquery Scan on xy (cost=0.42..55980.43 rows=5000 width=16) (actual time=0.103..629.428 rows=1000 loops=1) Filter: (xy.rn < 2) Rows Removed by Filter: 1000 -> WindowAgg (cost=0.42..43480.43 rows=1000000 width=16) (actual time=0.101..628.821 rows=2000 loops=1) Run Condition: (row_number() OVER (?) < 2) -> Index Only Scan using xy_x_y_idx on xy xy_1 (cost=0.42..25980.42 rows=1000000 width=8) (actual time=0.063..281.715 rows=1000000 loops=1) Heap Fetches: 0 Planning Time: 1.119 ms Execution Time: 629.781 ms (9 rows) Time: 633.241 ms 2. the "Rows Removed by Filter: 1000" is strange to me for the above example. Subquery Scan on xy (cost=0.42..55980.43 rows=5000 width=16) (actual time=0.103..629.428 rows=1000 loops=1) Filter: (xy.rn < 2) Rows Removed by Filter: 1000 The root cause is even ExecQual(winstate->runcondition, econtext) return false, we still return the slot to the upper node. A simple hack can avoid it. 3. With the changes in 2, I think we can avoid the subquery node totally for the above query. 4. If all the above are correct, looks the enum WindowAggStatus addition is not a must since we can do what WINDOWAGG_PASSTHROUGH does just when we find it is, like patch 3 shows. (I leave WINDOWAGG_DONE only, but it can be replaced with previous all_done field). Finally, Thanks for the patch, it is a good material to study the knowledge in this area. -- Best Regards Andy Fan
v1-0003-Try-to-remove-enum-WindowAggStatus.patch
Description: Binary data
v1-0001-When-we-are-in-PASSTHROUGH-mode-all-the-following.patch
Description: Binary data
v1-0002-not-return-run-condition-false-tuple-to-upper-nod.patch
Description: Binary data