Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500

2023-09-07 Thread Jian Guo
Hi Jian He,

Thanks for fixing the compiler warnings. It seems the CI used a rather old 
compiler and complained:

 ISO C90 forbids mixed declarations and code 
[-Werror=declaration-after-statement]

But later C standards have relaxed this requirement: ISO C99 and later allow 
declarations and code to be freely mixed within compound statements: 
https://gcc.gnu.org/onlinedocs/gcc/Mixed-Labels-and-Declarations.html


From: jian he 
Sent: Wednesday, September 6, 2023 14:00
To: Jian Guo 
Cc: Tomas Vondra ; Hans Buschmann 
; pgsql-hackers@lists.postgresql.org 

Subject: Re: Wrong rows estimations with joins of CTEs slows queries by more 
than factor 500

!! External Email

On Tue, Aug 22, 2023 at 10:35 AM Jian Guo  wrote:
>
> Sure, Tomas.
>
> Here is the PG Commitfest link: 
> https://commitfest.postgresql.org/44/4510/
> 

hi.
wandering around http://cfbot.cputube.org/
there is a compiler warning: 
https://cirrus-ci.com/task/6052087599988736

I slightly edited the code to make the compiler warning go away.

I am not sure if the following comment, duplicated from the (rte->rtekind ==
RTE_SUBQUERY && !rte->inh) branch, is correct:
/*
* OK, recurse into the subquery.  Note that the original setting
* of vardata->isunique (which will surely be false) is left
* unchanged in this situation.  That's what we want, since even
* if the underlying column is unique, the subquery may have
* joined to other tables in a way that creates duplicates.
*/

Index varnoSaved = var->varno;
should varnoSaved be declared as int here?

The attached image is the coverage report.
If I understand the coverage report correctly,
`
if (rel->subroot) examine_simple_variable(rel->subroot, var, vardata);
`
the above line is never actually executed?

!! External Email: This email originated from outside of the organization. Do 
not click links or open attachments unless you recognize the sender.


Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500

2023-08-21 Thread Jian Guo
Sure, Tomas.

Here is the PG Commitfest link: https://commitfest.postgresql.org/44/4510/

From: Tomas Vondra 
Sent: Monday, August 21, 2023 18:56
To: Jian Guo ; Hans Buschmann ; 
pgsql-hackers@lists.postgresql.org 
Cc: Zhenghua Lyu 
Subject: Re: Wrong rows estimations with joins of CTEs slows queries by more 
than factor 500


On 8/21/23 10:16, Jian Guo wrote:
> Hi hackers,
>
> I found a new approach to fix this issue, which seems better, so I would
> like to post another version of the patch here. The original patch assumed
> that the values of Vars from a CTE must be unique, which could be very
> wrong. This patch examines the variables for Vars inside the CTE, which
> avoids the bad assumption, so the results can be much more accurate.
>

No problem with posting a reworked patch to the same thread, but I'll
repeat my suggestion to register this in the CF app [1]. The benefit is
that people are more likely to notice the patch and also cfbot [2] will
run regression tests.

[1] 
https://commitfest.postgresql.org/
[2] 
http://cfbot.cputube.org/


--
Tomas Vondra
EnterpriseDB: 
http://www.enterprisedb.com/
The Enterprise PostgreSQL Company



Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500

2023-08-21 Thread Jian Guo
Hi hackers,

I found a new approach to fix this issue, which seems better, so I would like 
to post another version of the patch here. The original patch assumed that the 
values of Vars from a CTE must be unique, which could be very wrong. This patch 
examines the variables for Vars inside the CTE, which avoids the bad 
assumption, so the results can be much more accurate.

Regards,
Jian


From: Tomas Vondra 
Sent: Monday, August 14, 2023 20:58
To: Jian Guo ; Hans Buschmann ; 
pgsql-hackers@lists.postgresql.org 
Subject: Re: Wrong rows estimations with joins of CTEs slows queries by more 
than factor 500


Hi,

I haven't looked at the patch, but please add the patch to the next
commit fest (2023-09), so that we don't lose track of it.

See 
https://commitfest.postgresql.org/


regards

Tomas

On 8/14/23 13:12, Jian Guo wrote:
> Hi hackers,
>
> I have written a patch to add stats info for Vars in CTEs. With this
> patch, the join size estimation on the upper of CTE scans became more
> accurate.
>
> The function |selfuncs.c:eqjoinsel| uses the number of distinct values of
> the two join variables to estimate the join size, and the function
> |selfuncs.c:get_variable_numdistinct| returns a default value
> |DEFAULT_NUM_DISTINCT| (200 in Postgres and 1000 in Greenplum); with the
> default value, you can never expect a good plan.
>
> Thanks if anyone could give a review.
>
> Regards,
> Jian
>
> 
> *From:* Hans Buschmann 
> *Sent:* Wednesday, February 8, 2023 21:55
> *To:* pgsql-hackers@lists.postgresql.org
> 
> *Subject:* Wrong rows estimations with joins of CTEs slows queries by
> more than factor 500
>
>
>
> During data refactoring of our Application I encountered $subject when
> joining 4 CTEs with left join or inner join.
>
>
> 1. Background
>
> PG 15.1 on Windows x64 (the OS seems to have no meaning here)
>
>
> I try to collect data from 4 (analyzed) tables (up,li,in,ou) by grouping
> certain data (4 CTEs qup,qli,qin,qou)
>
> The grouping of the data in the CTEs gives estimated row counts of about
> 1000 (one tenth of the real value). This is OK for estimation.
>
>
> These 4 CTEs are then used to combine the data by joining them.
>
>
> 2. Problem
>
> The 4 CTEs are joined by left joins as shown below:
>
>
> from qup
> left join qli on (qli.curr_season=qup.curr_season and
> qli.curr_code=qup.curr_code and qli.ibitmask>0 and
> cardinality(qli.mat_arr) <=8)
> left join qin on (qin.curr_season=qup.curr_season and
> qin.curr_code=qup.curr_code and qin.ibitmask>0 and
> cardinality(qin.mat_arr) <=8)
> left join qou on (qou.curr_season=qup.curr_season and
> qou.curr_code=qup.curr_code and qou.ibitmask>0 and
> cardinality(qou.mat_arr) <=11)
> where qup.ibitmask>0 and cardinality(qup.mat_arr) <=21
>
> The plan first retrieves qup and qli, taking the estimated row counts of
> 1163 and 1147 respectively
>
>
> BUT the result is then hashed and the row count is estimated as 33!
>
>
> In a left join the row count always stays the same as that of the left
> table (here qup with 1163 rows)
>
>
> The same algorithm which reduces the row estimation from 1163 to 33 is
> used in the next step to give an estimation of 1 row.
>
> This is totally wrong.
>
>
> Here is the execution plan of the query:
>
> (search the plan for rows=33)
>
>
>
> QUERY PLAN
> --
>  Append  (cost=13673.81..17463.30 rows=5734 width=104) (actual
> time=168.307..222.670 rows=9963 loops=1)
>CTE qup
>  ->  GroupAggregate  (cost=5231.22..6303.78 rows=10320 width=80)
> (actual time=35.466..68.131 rows=10735 loops=1)
>Group Key: sa_upper.sup_season, sa_upper.sup_sa_code
>->  Sort  (cost=5231.22..5358.64 rows=50969 width=18) (actual
> time=35.454..36.819 rows=50969 loops=1)
>  Sort Key: sa_upper.sup_season, sa_upper.sup_sa_code
> COLLATE "C"
>  Sort Method: quicksort  Memory: 4722kB
>  ->  Hash Left Join  (cost=41.71..1246.13 rows=50969
> width=18) (actual time=0.148..

Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500

2023-08-14 Thread Jian Guo
  Sort Key: qin_1.curr_season, qin_1.curr_code COLLATE "C"
 Sort Method: quicksort  Memory: 68kB
 ->  CTE Scan on qin qin_1  (cost=0.00..293.65 rows=5932 
width=72) (actual time=0.016..1.046 rows=481 loops=1)
   Filter: ((ibitmask < 0) OR (cardinality(mat_arr) > 
8))
   Rows Removed by Filter: 10197
 ->  Sort  (cost=666.83..681.69 rows=5944 width=72) (actual 
time=1.119..1.128 rows=417 loops=1)
   Sort Key: qou_1.curr_season, qou_1.curr_code COLLATE "C"
   Sort Method: quicksort  Memory: 68kB
   ->  CTE Scan on qou qou_1  (cost=0.00..294.22 rows=5944 
width=72) (actual time=0.029..1.056 rows=424 loops=1)
 Filter: ((ibitmask < 0) OR (cardinality(mat_arr) > 11))
 Rows Removed by Filter: 10275
 Planning Time: 1.746 ms
 Execution Time: 13405.503 ms
(116 rows)

This case really brought me to detect the problem!

The original query and data are not shown here, but the principle should be 
clear from the execution plans.

I think the planner shouldn't change the row estimations in further steps after 
left joins at all, and should be a bit more conservative on inner joins.
This may be related to the fact that this case has 2 join conditions (xx_season 
and xx_code).

Thanks for looking

Hans Buschmann





From 29db9f3a6e6592cf277b011b7f92bb7bc0c69baf Mon Sep 17 00:00:00 2001
From: Jian Guo 
Date: Wed, 9 Aug 2023 03:32:01 -0400
Subject: [PATCH] Add stats info for Vars in CTEs.

Signed-off-by: Jian Guo 
---
 src/backend/utils/adt/selfuncs.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 9f5536c04b6..b70b779a595 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -5343,7 +5343,10 @@ get_variable_numdistinct(VariableStatData *vardata, bool *isdefault)
 	stadistinct = getgpsegmentCount();
 	break;
 default:
-	stadistinct = 0.0;	/* means "unknown" */
+	if (vardata->rel->rtekind == RTE_CTE)
+		stadistinct = -1.0;
+	else
+		stadistinct = 0.0;	/* means "unknown" */
 	break;
 			}
 		}
-- 
2.37.3



Re: On disable_cost

2023-08-03 Thread Jian Guo
Hi hackers,

I have written an initial patch to retire the disable_cost GUC. It sets a flag 
on the Path struct instead of adding a big cost which is hard to estimate. 
Though it involves tons of plan changes in regression tests, I have tested it 
on some simple cases, such as eagerly generating a two-stage agg plan, and it 
worked. Could someone help to review?


regards,

Jian

From: Euler Taveira 
Sent: Friday, November 1, 2019 22:48
To: Zhenghua Lyu 
Cc: PostgreSQL Hackers 
Subject: Re: On disable_cost


On Fri, Nov 1, 2019 at 03:42, Zhenghua Lyu wrote:
>
> My issue: I did some spikes and tests on TPCDS 1TB Bytes data. For query 
> 104, it generates
>  nestloop join even with enable_nestloop set off. And the final plan's total 
> cost is very huge (about 1e24). But If I enlarge the disable_cost to 1e30, 
> then, planner will generate hash join.
>
> So I guess that disable_cost is not large enough for huge amount of data.
>
> It is tricky to set disable_cost a huge number. Can we come up with 
> better solution?
>
Isn't it a case for a GUC disable_cost? As Thomas suggested, DBL_MAX
upper limit should be sufficient.

> The following thoughts are from Heikki:
>>
>> Aside from not having a large enough disable cost, there's also the fact 
>> that the high cost might affect the rest of the plan, if we have to use a 
>> plan type that's disabled. For example, if a table doesn't have any indexes, 
>> but enable_seqscan is off, we might put the unavoidable Seq Scan on 
>> different side of a join than we would with enable_seqscan=on, because of 
>> the high cost estimate.
>
>
>>
>> I think a more robust way to disable forbidden plan types would be to handle 
>> the disabling in add_path(). Instead of having a high disable cost on the 
>> Path itself, the comparison add_path() would always consider disabled paths 
>> as more expensive than others, regardless of the cost.
>
I'm afraid it is not as cheap as using disable_cost as a node cost. Are
you proposing to add a new boolean variable in the Path struct to handle
those cases in compare_path_costs_fuzzily?


--
   Euler Taveira   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento





From baf0143438a91c8739534c7d85b9d121a7bb560e Mon Sep 17 00:00:00 2001
From: Jian Guo 
Date: Thu, 3 Aug 2023 17:03:49 +0800
Subject: [PATCH] Retire disable_cost, introduce new flag is_disabled into Path
 struct.

Signed-off-by: Jian Guo 
---
 src/backend/optimizer/path/costsize.c | 30 +++---
 src/backend/optimizer/util/pathnode.c | 58 +++
 src/include/nodes/pathnodes.h |  1 +
 3 files changed, 73 insertions(+), 16 deletions(-)

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index ef475d95a1..0814604c49 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -274,7 +274,7 @@ cost_seqscan(Path *path, PlannerInfo *root,
 		path->rows = baserel->rows;
 
 	if (!enable_seqscan)
-		startup_cost += disable_cost;
+		path->is_disabled = true;
 
 	/* fetch estimated page cost for tablespace containing table */
 	get_tablespace_page_costs(baserel->reltablespace,
@@ -463,7 +463,7 @@ cost_gather_merge(GatherMergePath *path, PlannerInfo *root,
 		path->path.rows = rel->rows;
 
 	if (!enable_gathermerge)
-		startup_cost += disable_cost;
+		path->path.is_disabled = true;
 
 	/*
 	 * Add one to the number of workers to account for the leader.  This might
@@ -576,7 +576,7 @@ cost_index(IndexPath *path, PlannerInfo *root, double loop_count,
 	}
 
 	if (!enable_indexscan)
-		startup_cost += disable_cost;
+		path->path.is_disabled = true;
 	/* we don't need to check enable_indexonlyscan; indxpath.c does that */
 
 	/*
@@ -1011,7 +1011,7 @@ cost_bitmap_heap_scan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
 		path->rows = baserel->rows;
 
 	if (!enable_bitmapscan)
-		startup_cost += disable_cost;
+		path->is_disabled = true;
 
 	pages_fetched = compute_bitmap_pages(root, baserel, bitmapqual,
 		 loop_count, ,
@@ -1279,11 +1279,10 @@ cost_tidscan(Path *path, PlannerInfo *root,
 	 */
 	if (isCurrentOf)
 	{
-		Assert(bas

Question about use_physical_tlist() which is applied on Scan path

2023-07-26 Thread Jian Guo
Hi hackers,

I have a question about `use_physical_tlist()` which is applied in 
`create_scan_plan()`:
```
if (flags == CP_IGNORE_TLIST)
{
  tlist = NULL;
}
else if (use_physical_tlist(root, best_path, flags))
{
  if (best_path->pathtype == T_IndexOnlyScan)
  {
/* For index-only scan, the preferred tlist is the index's */
tlist = copyObject(((IndexPath *) best_path)->indexinfo->indextlist);

/*
 * Transfer sortgroupref data to the replacement tlist, if
 * requested (use_physical_tlist checked that this will work).
 */
if (flags & CP_LABEL_TLIST)
  apply_pathtarget_labeling_to_tlist(tlist, best_path->pathtarget);
  }
  else
  {
tlist = build_physical_tlist(root, rel);
……
```
And the comment above the code block says:

```
/*
* For table scans, rather than using the relation targetlist (which is
* only those Vars actually needed by the query), we prefer to generate a
* tlist containing all Vars in order.  This will allow the executor to
* optimize away projection of the table tuples, if possible.
*
* But if the caller is going to ignore our tlist anyway, then don't
* bother generating one at all.  We use an exact equality test here, so
* that this only applies when CP_IGNORE_TLIST is the only flag set.
*/
```

But for a column-oriented database based on Postgres, projecting only the 
needed columns during execution could help a lot. Are there any other 
optimization considerations behind this design?

e.g. If we have such table definition and a query:

```
CREATE TABLE partsupp
(PS_PARTKEY INT,
PS_SUPPKEY INT,
PS_AVAILQTY INTEGER,
PS_SUPPLYCOST DECIMAL(15,2),
PS_COMMENT VARCHAR(199),
dummy text);

explain analyze verbose select sum(ps_supplycost * ps_availqty) from partsupp;
```

And the planner would give such plan:

```
QUERY PLAN
---
Aggregate  (cost=12.80..12.81 rows=1 width=32) (actual time=0.013..0.015 rows=1 
loops=1)
   Output: sum((ps_supplycost * (ps_availqty)::numeric))
   ->  Seq Scan on public.partsupp  (cost=0.00..11.60 rows=160 width=22) 
(actual time=0.005..0.005 rows=0 loops=1)
 Output: ps_partkey, ps_suppkey, ps_availqty, ps_supplycost, 
ps_comment, dummy
Planning Time: 0.408 ms
Execution Time: 0.058 ms
(6 rows)
```
It looks like the columns besides `ps_supplycost` and `ps_availqty` are not 
necessary, but they are all fetched from the tuples at once. For row-based 
storage such as heap this is fine, but for column-based storage it would 
result in unnecessary overhead and impact performance. Is there any plan to 
optimize here?

Thanks.


Re: Summary Sort workers Stats in EXPLAIN ANALYZE

2022-03-28 Thread Jian Guo

I have updated the patch addressing the review comments, but I didn't move 
this code block into VERBOSE mode, to keep consistency with 
`show_incremental_sort_info`:

https://github.com/postgres/postgres/blob/d4ba8b51c76300f06cc23f4d8a41d9f7210c4866/src/backend/commands/explain.c#L2890

Please review, thanks.


From: Julien Rouhaud 
Sent: Friday, March 25, 2022 17:04
To: Jian Guo 
Cc: pgsql-hackers@lists.postgresql.org ; 
Zhenghua Lyu 
Subject: Re: Summary Sort workers Stats in EXPLAIN ANALYZE


Hi,

On Thu, Mar 24, 2022 at 07:50:11AM +, Jian Guo wrote:
> For a simple demo, with this explain statement:
>
> -- Test sort stats summary
> set force_parallel_mode=on;
> select explain_filter('explain (analyze, summary off, timing off, costs off, 
> format json) select * from tenk1 order by unique1');
>
> Before this patch, we got plan like this:
>
>
>"Node Type": "Sort",+
>"Parent Relationship": "Outer", +
>"Parallel Aware": false,+
>"Async Capable": false, +
>"Actual Rows": 1,   +
>"Actual Loops": 1,  +
>"Sort Key": ["unique1"],+
>"Workers": [+
>  { +
>"Worker Number": 0, +
>"Sort Method": "external merge",+
>"Sort Space Used": 2496,+
>"Sort Space Type": "Disk"   +
>  } +
>],  +

> After this patch, the effected plan is this:
>
>"Node Type": "Sort",   +
>"Parent Relationship": "Outer",+
>"Parallel Aware": false,   +
>"Async Capable": false,+
>"Actual Rows": N,  +
>"Actual Loops": N, +
>"Sort Key": ["unique1"],   +
>"Workers planned": N,  +
>"Sort Method": "external merge",   +
>"Average Sort Space Used": N,  +
>"Peak Sort Space Used": N, +
>"Sort Space Type": "Disk", +

I think the idea is interesting; however, there are a few problems in the patch.

First, I think that it should only be done in the VERBOSE OFF mode.  If you ask
for a VERBOSE output you don't need both the details and the summarized
version.

Other minor problems:

- why (only) emitting the number of workers planned and not the number of
  workers launched?
- the textual format is missing details about what the numbers are, which is
  particularly obvious since avgSpaceUsed and peakSpaceUsed don't have any unit
  or even space between them:

+"Sort Method: %s  %s: " INT64_FORMAT INT64_FORMAT 
"kB\n",
+sortMethod, spaceType, avgSpaceUsed, peakSpaceUsed);




From 578b9ce26c7ea1a13d6cf78d0848fa6a950bd82b Mon Sep 17 00:00:00 2001
From: Aegeaner Guo 
Date: Mon, 21 Mar 2022 11:19:46 +0800
Subject: [PATCH] Summary Sort workers Stats.

Signed-off-by: Jian Guo 
---
 src/backend/commands/explain.c|  43 +--
 src/test/regress/expected/explain.out | 522 ++
 src/test/regress/sql/explain.sql  |  13 +-
 3 files changed, 390 insertions(+), 188 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 9f632285b6..cdda0ac955 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -2758,40 +2758,41 @@ show_sort_info(SortState *sortstate, ExplainState *es)
 	if (sortstate->shared_info != NULL)
 	{
 		int			n;
+		const char *sortMethod;
+		const char *spaceType;
+		int64		peakSpaceUsed = 0;
+		int64		totalSpaceUsed = 0;
 
 		for (n = 0; n < sortstate->shared_info->num_workers; n++)
 		{
 			TuplesortInstrumentation *sinstrument;
-			const char *sortMethod;
-			const char *spaceType;
-			int64		spaceUsed;
 
 			sinstrument = >shared_info->sinstrument[n];
 			if (sinstrument->sortMethod == SORT_TYPE_STILL_IN_PROGRESS)
 continue;		/* ignore any unfilled slots */
 			sortMethod = tuplesort_method_name(sinstrument->

Re: Summary Sort workers Stats in EXPLAIN ANALYZE

2022-03-24 Thread Jian Guo
For a simple demo, with this explain statement:

-- Test sort stats summary
set force_parallel_mode=on;
select explain_filter('explain (analyze, summary off, timing off, costs off, 
format json) select * from tenk1 order by unique1');

Before this patch, we got plan like this:


   "Node Type": "Sort",+
   "Parent Relationship": "Outer", +
   "Parallel Aware": false,+
   "Async Capable": false, +
   "Actual Rows": 1,   +
   "Actual Loops": 1,  +
   "Sort Key": ["unique1"],+
   "Workers": [+
 { +
   "Worker Number": 0, +
   "Sort Method": "external merge",+
   "Sort Space Used": 2496,+
   "Sort Space Type": "Disk"   +
 } +
   ],  +



After this patch, the effected plan is this:

   "Node Type": "Sort",   +
   "Parent Relationship": "Outer",+
   "Parallel Aware": false,   +
   "Async Capable": false,+
   "Actual Rows": N,  +
   "Actual Loops": N, +
   "Sort Key": ["unique1"],   +
   "Workers planned": N,  +
   "Sort Method": "external merge",   +
   "Average Sort Space Used": N,  +
   "Peak Sort Space Used": N, +
   "Sort Space Type": "Disk", +

From: Jian Guo 
Sent: Monday, March 21, 2022 15:50
To: pgsql-hackers@lists.postgresql.org 
Cc: Zhenghua Lyu 
Subject: Re: Summary Sort workers Stats in EXPLAIN ANALYZE

There was a problem with the last patch; I have removed the 
`ExplainOpenWorker` call to fix it.

And also, I have added a test case in explain.sql according to the code change.

From: Jian Guo 
Sent: Monday, March 21, 2022 11:36
To: pgsql-hackers@lists.postgresql.org 
Cc: Zhenghua Lyu 
Subject: Summary Sort workers Stats in EXPLAIN ANALYZE


In the current EXPLAIN ANALYZE implementation, the Sort Node stats from each 
worker are not summarized: 
https://github.com/postgres/postgres/blob/d4ba8b51c76300f06cc23f4d8a41d9f7210c4866/src/backend/commands/explain.c#L2762

When the number of workers is large, it will print out a huge amount of node 
details in the plan. I have created this patch to summarize the tuplesort stats 
as AverageSpaceUsed / PeakSpaceUsed, making it behave just like 
`show_incremental_sort_group_info()`: 
https://github.com/postgres/postgres/blob/d4ba8b51c76300f06cc23f4d8a41d9f7210c4866/src/backend/commands/explain.c#L2890



Re: Summary Sort workers Stats in EXPLAIN ANALYZE

2022-03-21 Thread Jian Guo
There was a problem with the last patch; I have removed the 
`ExplainOpenWorker` call to fix it.

And also, I have added a test case in explain.sql according to the code change.

From: Jian Guo 
Sent: Monday, March 21, 2022 11:36
To: pgsql-hackers@lists.postgresql.org 
Cc: Zhenghua Lyu 
Subject: Summary Sort workers Stats in EXPLAIN ANALYZE


In the current EXPLAIN ANALYZE implementation, the Sort Node stats from each 
worker are not summarized: 
https://github.com/postgres/postgres/blob/d4ba8b51c76300f06cc23f4d8a41d9f7210c4866/src/backend/commands/explain.c#L2762

When the number of workers is large, it will print out a huge amount of node 
details in the plan. I have created this patch to summarize the tuplesort stats 
as AverageSpaceUsed / PeakSpaceUsed, making it behave just like 
`show_incremental_sort_group_info()`: 
https://github.com/postgres/postgres/blob/d4ba8b51c76300f06cc23f4d8a41d9f7210c4866/src/backend/commands/explain.c#L2890

From 5b044523ee16bdae7b998ddd31fca92434e5028a Mon Sep 17 00:00:00 2001
From: Aegeaner Guo 
Date: Mon, 21 Mar 2022 11:19:46 +0800
Subject: [PATCH] Summary Sort workers Stats.

Signed-off-by: Jian Guo 
---
 src/backend/commands/explain.c|  44 +--
 src/test/regress/expected/explain.out | 523 ++
 src/test/regress/sql/explain.sql  |  13 +-
 3 files changed, 392 insertions(+), 188 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 9f632285b6..57409bdea2 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -2758,40 +2758,42 @@ show_sort_info(SortState *sortstate, ExplainState *es)
 	if (sortstate->shared_info != NULL)
 	{
 		int			n;
+		const char *sortMethod;
+		const char *spaceType;
+		int64		peakSpaceUsed = 0;
+		int64		totalSpaceUsed = 0;
 
 		for (n = 0; n < sortstate->shared_info->num_workers; n++)
 		{
 			TuplesortInstrumentation *sinstrument;
-			const char *sortMethod;
-			const char *spaceType;
-			int64		spaceUsed;
 
 			sinstrument = >shared_info->sinstrument[n];
 			if (sinstrument->sortMethod == SORT_TYPE_STILL_IN_PROGRESS)
 continue;		/* ignore any unfilled slots */
 			sortMethod = tuplesort_method_name(sinstrument->sortMethod);
 			spaceType = tuplesort_space_type_name(sinstrument->spaceType);
-			spaceUsed = sinstrument->spaceUsed;
+			peakSpaceUsed = Max(peakSpaceUsed, sinstrument->spaceUsed);
+			totalSpaceUsed += sinstrument->spaceUsed;
+		}
 
-			if (es->workers_state)
-ExplainOpenWorker(n, es);
+		int64 avgSpaceUsed = sortstate->shared_info->num_workers > 0 ?
+totalSpaceUsed / sortstate->shared_info->num_workers : 0;
 
-			if (es->format == EXPLAIN_FORMAT_TEXT)
-			{
-ExplainIndentText(es);
-appendStringInfo(es->str,
- "Sort Method: %s  %s: " INT64_FORMAT "kB\n",
- sortMethod, spaceType, spaceUsed);
-			}
-			else
-			{
-ExplainPropertyText("Sort Method", sortMethod, es);
-ExplainPropertyInteger("Sort Space Used", "kB", spaceUsed, es);
-ExplainPropertyText("Sort Space Type", spaceType, es);
-			}
+		ExplainPropertyInteger("Workers planned", NULL, sortstate->shared_info->num_workers, es);
 
-			if (es->workers_state)
-ExplainCloseWorker(n, es);
+		if (es->format == EXPLAIN_FORMAT_TEXT)
+		{
+			ExplainIndentText(es);
+			appendStringInfo(es->str,
+			 "Sort Method: %s  %s: " INT64_FORMAT INT64_FORMAT "kB\n",
+			 sortMethod, spaceType, avgSpaceUsed, peakSpaceUsed);
+		}
+		else
+		{
+			ExplainPropertyText("Sort Method", sortMethod, es);
+			ExplainPropertyInteger("Average Sort Space Used", "kB", avgSpaceUsed, es);
+			ExplainPropertyInteger("Peak Sort Space Used", "kB", peakSpaceUsed, es);
+			ExplainPropertyText("Sort Space Type", spaceType, es);
 		}
 	}
 }
diff 

Summary Sort workers Stats in EXPLAIN ANALYZE

2022-03-20 Thread Jian Guo

In the current EXPLAIN ANALYZE implementation, the Sort Node stats from each 
worker are not summarized: 
https://github.com/postgres/postgres/blob/d4ba8b51c76300f06cc23f4d8a41d9f7210c4866/src/backend/commands/explain.c#L2762

When the worker number is large, it will print out huge amount of node details 
in the plan. I have created this patch to summarize the tuplesort stats by 
AverageSpaceUsed / PeakSpaceUsed, make it behave just like in 
`show_incremental_sort_group_info()`: 
https://github.com/postgres/postgres/blob/d4ba8b51c76300f06cc23f4d8a41d9f7210c4866/src/backend/commands/explain.c#L2890
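The aggregation itself is small. A standalone sketch of the same fold (hypothetical names, not the patch code): peak is a running maximum over the per-worker figures, average is their total divided by the worker count.

```c
#include <assert.h>
#include <stdint.h>

typedef struct
{
	int64_t		avg;
	int64_t		peak;
} sort_space_summary;

/* Fold per-worker space figures into a peak (max) and an average. */
static sort_space_summary
summarize_space(const int64_t *space_used, int num_workers)
{
	sort_space_summary summary = {0, 0};
	int64_t		total = 0;

	for (int n = 0; n < num_workers; n++)
	{
		if (space_used[n] > summary.peak)
			summary.peak = space_used[n];
		total += space_used[n];
	}
	/* Guard against division by zero when no worker reported stats. */
	summary.avg = (num_workers > 0) ? total / num_workers : 0;
	return summary;
}
```

Note that the patch additionally skips slots still marked `SORT_TYPE_STILL_IN_PROGRESS` before folding; the sketch assumes the caller has already filtered those out.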

From 8970f3d0f4a4535457dbe7f625081e592ebc1901 Mon Sep 17 00:00:00 2001
From: Jian Guo 
Date: Mon, 21 Mar 2022 11:19:46 +0800
Subject: [PATCH] Summary Sort workers Stats.

Signed-off-by: Jian Guo 
---
 src/backend/commands/explain.c | 48 +++---
 1 file changed, 27 insertions(+), 21 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 9f632285b6..5561231d7d 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -2758,41 +2758,47 @@ show_sort_info(SortState *sortstate, ExplainState *es)
 	if (sortstate->shared_info != NULL)
 	{
 		int			n;
+		const char *sortMethod;
+		const char *spaceType;
+		int64		peakSpaceUsed = 0;
+		int64		totalSpaceUsed = 0;
 
 		for (n = 0; n < sortstate->shared_info->num_workers; n++)
 		{
 			TuplesortInstrumentation *sinstrument;
-			const char *sortMethod;
-			const char *spaceType;
-			int64		spaceUsed;
 
 			sinstrument = &sortstate->shared_info->sinstrument[n];
 			if (sinstrument->sortMethod == SORT_TYPE_STILL_IN_PROGRESS)
 				continue;		/* ignore any unfilled slots */
 			sortMethod = tuplesort_method_name(sinstrument->sortMethod);
 			spaceType = tuplesort_space_type_name(sinstrument->spaceType);
-			spaceUsed = sinstrument->spaceUsed;
+			peakSpaceUsed = Max(peakSpaceUsed, sinstrument->spaceUsed);
+			totalSpaceUsed += sinstrument->spaceUsed;
+		}
 
-			if (es->workers_state)
-				ExplainOpenWorker(n, es);
+		int64		avgSpaceUsed = sortstate->shared_info->num_workers > 0 ?
+			totalSpaceUsed / sortstate->shared_info->num_workers : 0;
 
-			if (es->format == EXPLAIN_FORMAT_TEXT)
-			{
-				ExplainIndentText(es);
-				appendStringInfo(es->str,
-								 "Sort Method: %s  %s: " INT64_FORMAT "kB\n",
-								 sortMethod, spaceType, spaceUsed);
-			}
-			else
-			{
-				ExplainPropertyText("Sort Method", sortMethod, es);
-				ExplainPropertyInteger("Sort Space Used", "kB", spaceUsed, es);
-				ExplainPropertyText("Sort Space Type", spaceType, es);
-			}
+		if (es->workers_state)
+			ExplainOpenWorker(n, es);
 
-			if (es->workers_state)
-				ExplainCloseWorker(n, es);
+		if (es->format == EXPLAIN_FORMAT_TEXT)
+		{
+			ExplainIndentText(es);
+			appendStringInfo(es->str,
+							 "Sort Method: %s  %s: " INT64_FORMAT "kB average, " INT64_FORMAT "kB peak\n",
+							 sortMethod, spaceType, avgSpaceUsed, peakSpaceUsed);
+		}
+		else
+		{
+			ExplainPropertyText("Sort Method", sortMethod, es);
+			ExplainPropertyInteger("Average Sort Space Used", "kB", avgSpaceUsed, es);
+			ExplainPropertyInteger("Peak Sort Space Used", "kB", peakSpaceUsed, es);
+			ExplainPropertyText("Sort Space Type", spaceType, es);
 		}
+
+		if (es->workers_state)
+			ExplainCloseWorker(n, es);
 	}
 }
 
-- 
2.35.1



Re: Teach pg_receivewal to use lz4 compression

2021-09-10 Thread Jian Guo
@@ -250,14 +302,18 @@ FindStreamingStart(uint32 *tli)
 		/*
 		 * Check that the segment has the right size, if it's supposed to be
 		 * completed.  For non-compressed segments just check the on-disk size
-		 * and see if it matches a completed segment. For compressed segments,
-		 * look at the last 4 bytes of the compressed file, which is where the
-		 * uncompressed size is located for gz files with a size lower than
-		 * 4GB, and then compare it to the size of a completed segment. The 4
-		 * last bytes correspond to the ISIZE member according to
+		 * and see if it matches a completed segment. For zlib compressed
+		 * segments, look at the last 4 bytes of the compressed file, which is
+		 * where the uncompressed size is located for gz files with a size lower
+		 * than 4GB, and then compare it to the size of a completed segment.
+		 * The 4 last bytes correspond to the ISIZE member according to
 		 * http://www.zlib.org/rfc-gzip.html.
+		 *
+		 * For lz4 compressed segments read the header using the exposed API and
+		 * compare the uncompressed file size, stored in
+		 * LZ4F_frameInfo_t{.contentSize}, to that of a completed segment.
 		 */
-		if (!ispartial && !iscompress)
+		if (!ispartial && wal_compression_method == COMPRESSION_NONE)
 		{
 			struct stat statbuf;
 			char		fullpath[MAXPGPATH * 2];
@@ -276,7 +332,7 @@ FindStreamingStart(uint32 *tli)
 				continue;
 			}
 		}
-		else if (!ispartial && iscompress)
+		else if (!ispartial && wal_compression_method == COMPRESSION_ZLIB)
 		{
 			int			fd;
 			char		buf[4];
@@ -322,6 +378,70 @@ FindStreamingStart(uint32 *tli)
 				continue;
 			}
 		}
+		else if (!ispartial && compression_method == COMPRESSION_LZ4)
+		{
+#ifdef HAVE_LIBLZ4
+			int			fd;
+			int			r;
+			size_t		consumed_len = LZ4F_HEADER_SIZE_MAX;
+			char		buf[LZ4F_HEADER_SIZE_MAX];
+			char		fullpath[MAXPGPATH * 2];
+			LZ4F_frameInfo_t frame_info = {0};
+			LZ4F_decompressionContext_t ctx = NULL;
+
+			snprintf(fullpath, sizeof(fullpath), "%s/%s", basedir, dirent->d_name);
+
+			fd = open(fullpath, O_RDONLY | PG_BINARY, 0);

The fd should be closed before any exit or abort path here.
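The usual way to guarantee that is to funnel every failure path after a successful open() through a single cleanup label. A minimal sketch of that pattern (hypothetical helper, not the patch code):

```c
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Read the first `len` bytes of a file into buf.  Every failure path
 * after open() succeeds goes through the cleanup label, so the
 * descriptor is always closed before the function reports an error. */
static bool
read_file_header(const char *path, char *buf, size_t len)
{
	bool		ok = false;
	int			fd = open(path, O_RDONLY, 0);

	if (fd < 0)
		return false;			/* nothing opened yet, nothing to close */

	if (read(fd, buf, len) != (ssize_t) len)
		goto cleanup;			/* short read: fail, but still close fd */
	ok = true;

cleanup:
	close(fd);
	return ok;
}
```

In the patch itself the same applies to the `pg_log_error` branches after the LZ4 header read: each one should close the descriptor (and free the decompression context) before bailing out.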
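For reference, the zlib-side check the comment describes (trailing ISIZE field, per RFC 1952) can be sketched independently of pg_receivewal (illustrative names, not the tool's code):

```c
#include <stdint.h>
#include <stdio.h>

/* Return 0 and set *isize to the uncompressed size recorded in a gzip
 * file's trailing ISIZE field (last 4 bytes, little-endian, modulo 2^32);
 * return -1 on any I/O failure. */
static int
gzip_isize(const char *path, uint32_t *isize)
{
	unsigned char tail[4];
	FILE	   *f = fopen(path, "rb");

	if (f == NULL)
		return -1;
	if (fseek(f, -4L, SEEK_END) != 0 || fread(tail, 1, 4, f) != 4)
	{
		fclose(f);
		return -1;
	}
	fclose(f);
	*isize = (uint32_t) tail[0] | ((uint32_t) tail[1] << 8) |
		((uint32_t) tail[2] << 16) | ((uint32_t) tail[3] << 24);
	return 0;
}
```

Because ISIZE is the size modulo 2^32, the comparison against a completed segment is only reliable for segments under 4GB, which is the caveat the patch comment spells out.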