Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning

2024-05-28 Thread Ashutosh Bapat
On Mon, Feb 19, 2024 at 5:17 AM Ashutosh Bapat 
wrote:

> On Mon, Feb 19, 2024 at 4:35 AM Tomas Vondra
>  wrote:
> >
> > Hi,
> >
> > After taking a look at the patch optimizing SpecialJoinInfo allocations,
> > I decided to take a quick look at this one too. I don't have any
> > specific comments on the code yet, but it seems quite a bit more complex
> > than the other patch ... it's introducing a HTAB into the optimizer,
> > surely that has costs too.
>
> Thanks for looking into this too.
>
> >
> > I started by doing the same test as with the other patch, comparing
> > master to the two patches (applied independently and both). And I got
> > about this (in MB):
> >
> >   tablesmaster sjinforinfo  both
> >   ---
> >237 36   3433
> >3   138129  122   113
> >4   421376  363   318
> >5  1495   1254 1172   931
> >
> > Unlike the SpecialJoinInfo patch, I haven't seen any reduction in
> > planning time for this one.
>
> Yeah. That agreed with my observation as well.
>
> >
> > The reduction in allocated memory is nice. I wonder what's allocating
> > the remaining memory, and we'd have to do to reduce it further.
>
> Please see reply to SpecialJoinInfo thread. Other that the current
> patches, we require memory efficient Relids implementation. I have
> shared some ideas in the slides I shared in the other thread, but
> haven't spent time experimenting with any ideas there.
>
> >
> > However, this is a somewhat extreme example - it's joining 5 tables,
> > each with 1000 partitions, using a partition-wise join. It reduces the
> > amount of memory, but the planning time is still quite high (and
> > essentially the same as without the patch). So it's not like it'd make
> > them significantly more practical ... do we have any other ideas/plans
> > how to improve that?
>
> Yuya has been working on reducing planning time [1]. Has some
> significant improvements in that area based on my experiments. But
> those patches are complex and still WIP.
>
> >
> > AFAIK we don't expect this to improve "regular" cases with modest number
> > of tables / partitions, etc. But could it make them slower in some case?
> >
>
> AFAIR, my experiments did not show any degradation in regular cases
> with modest number of tables/partitions. The variation in planning
> time was with the usual planning time variations.
>

Documenting some comments from todays' patch review session
1. Instead of a nested hash table, it might be better to use a flat hash
table to save more memory.
2. new comm_rinfo member in RestrictInfo may have problems when copying
RestrictInfo or translating it. Instead commuted versions may be tracked
outside RestrictInfo

Combining the above two, it feels like we need a single hash table with
(commuted, rinfo_serial, relids) as key to store all the translations of a
RestrictInfo.

-- 
Best Wishes,
Ashutosh Bapat


Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning

2024-02-19 Thread Ashutosh Bapat
On Mon, Feb 19, 2024 at 4:35 AM Tomas Vondra
 wrote:
>
> Hi,
>
> After taking a look at the patch optimizing SpecialJoinInfo allocations,
> I decided to take a quick look at this one too. I don't have any
> specific comments on the code yet, but it seems quite a bit more complex
> than the other patch ... it's introducing a HTAB into the optimizer,
> surely that has costs too.

Thanks for looking into this too.

>
> I started by doing the same test as with the other patch, comparing
> master to the two patches (applied independently and both). And I got
> about this (in MB):
>
>   tablesmaster sjinforinfo  both
>   ---
>237 36   3433
>3   138129  122   113
>4   421376  363   318
>5  1495   1254 1172   931
>
> Unlike the SpecialJoinInfo patch, I haven't seen any reduction in
> planning time for this one.

Yeah. That agreed with my observation as well.

>
> The reduction in allocated memory is nice. I wonder what's allocating
> the remaining memory, and we'd have to do to reduce it further.

Please see reply to SpecialJoinInfo thread. Other that the current
patches, we require memory efficient Relids implementation. I have
shared some ideas in the slides I shared in the other thread, but
haven't spent time experimenting with any ideas there.

>
> However, this is a somewhat extreme example - it's joining 5 tables,
> each with 1000 partitions, using a partition-wise join. It reduces the
> amount of memory, but the planning time is still quite high (and
> essentially the same as without the patch). So it's not like it'd make
> them significantly more practical ... do we have any other ideas/plans
> how to improve that?

Yuya has been working on reducing planning time [1]. Has some
significant improvements in that area based on my experiments. But
those patches are complex and still WIP.

>
> AFAIK we don't expect this to improve "regular" cases with modest number
> of tables / partitions, etc. But could it make them slower in some case?
>

AFAIR, my experiments did not show any degradation in regular cases
with modest number of tables/partitions. The variation in planning
time was with the usual planning time variations.

[1] 
https://www.postgresql.org/message-id/flat/CAExHW5uVZ3E5RT9cXHaxQ_DEK7tasaMN%3DD6rPHcao5gcXanY5w%40mail.gmail.com#112b3e104e0f9e39eb007abe075aae20

-- 
Best Wishes,
Ashutosh Bapat




Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning

2024-02-18 Thread Andrei Lepikhov

On 19/2/2024 06:05, Tomas Vondra wrote:

However, this is a somewhat extreme example - it's joining 5 tables,
each with 1000 partitions, using a partition-wise join. It reduces the
amount of memory, but the planning time is still quite high (and
essentially the same as without the patch). So it's not like it'd make
them significantly more practical ... do we have any other ideas/plans
how to improve that?
The planner principle of cleaning up all allocated structures after the 
optimization stage simplifies development and code. But, if we want to 
achieve horizontal scalability on many partitions, we should introduce 
per-partition memory context and reset it in between. GEQO already has a 
short-lived memory context, making designing extensions a bit more 
challenging but nothing too painful.


--
regards,
Andrei Lepikhov
Postgres Professional





Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning

2024-02-18 Thread Tomas Vondra
Hi,

After taking a look at the patch optimizing SpecialJoinInfo allocations,
I decided to take a quick look at this one too. I don't have any
specific comments on the code yet, but it seems quite a bit more complex
than the other patch ... it's introducing a HTAB into the optimizer,
surely that has costs too.

I started by doing the same test as with the other patch, comparing
master to the two patches (applied independently and both). And I got
about this (in MB):

  tablesmaster sjinforinfo  both
  ---
   237 36   3433
   3   138129  122   113
   4   421376  363   318
   5  1495   1254 1172   931

Unlike the SpecialJoinInfo patch, I haven't seen any reduction in
planning time for this one.

The reduction in allocated memory is nice. I wonder what's allocating
the remaining memory, and we'd have to do to reduce it further.

However, this is a somewhat extreme example - it's joining 5 tables,
each with 1000 partitions, using a partition-wise join. It reduces the
amount of memory, but the planning time is still quite high (and
essentially the same as without the patch). So it's not like it'd make
them significantly more practical ... do we have any other ideas/plans
how to improve that?

AFAIK we don't expect this to improve "regular" cases with modest number
of tables / partitions, etc. But could it make them slower in some case?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning

2024-01-29 Thread Ashutosh Bapat
On Sat, Jan 27, 2024 at 8:26 AM vignesh C  wrote:
>
> On Tue, 31 Oct 2023 at 10:48, Ashutosh Bapat
>  wrote:
> >
> > On Thu, Sep 14, 2023 at 4:39 PM Ashutosh Bapat
> >  wrote:
> > >
> > > The patch set is thus
> > > 0001 - patch used to measure memory used during planning
> > >
> > > 0002 - Patch to free intermediate Relids computed by
> > > adjust_child_relids_multilevel(). I didn't test memory consumption for
> > > multi-level partitioning. But this is clear improvement. In that
> > > function we free the AppendInfos array which as many pointers long as
> > > the number of relids. So it doesn't make sense not to free the Relids
> > > which can be {largest relid}/8 bytes long at least.
> > >
> > > 0003 - patch to save and reuse commuted RestrictInfo. This patch by
> > > itself shows a small memory saving (3%) in the query below where the
> > > same clause is commuted twice. The query does not contain any
> > > partitioned tables.
> > > create table t2 (a int primary key, b int, c int);
> > > create index t2_a_b on t2(a, b);
> > > select * from t2 where 10 = a
> > > Memory used without patch: 13696 bytes
> > > Memory used with patch: 13264 bytes
> > >
> > > 0004 - Patch which implements the hash table of hash table described
> > > above and also code to avoid repeated RestrictInfo list translations.
> > >
> > > I will add this patchset to next commitfest.
> > >
> > > --
> > > Best Wishes,
> > > Ashutosh Bapat
> >
> > PFA rebased patches. Nothing changes in 0002, 0003 and 0004. 0001 is
> > the squashed version of the latest patch set at
> > https://www.postgresql.org/message-id/CAExHW5sCJX7696sF-OnugAiaXS=Ag95=-m1csrjcmyyj8pd...@mail.gmail.com.
>
> CFBot shows that the patch does not apply anymore as in [1]:
> === Applying patches on top of PostgreSQL commit ID
> 7014c9a4bba2d1b67d60687afb5b2091c1d07f73 ===
> === applying patch
> ./0001-Report-memory-used-for-planning-a-query-in--20231031.patch
> ...
> patching file src/test/regress/expected/explain.out
> Hunk #5 FAILED at 290.
> Hunk #6 succeeded at 545 (offset 4 lines).
> 1 out of 6 hunks FAILED -- saving rejects to file
> src/test/regress/expected/explain.out.rej
> patching file src/tools/pgindent/typedefs.list
> Hunk #1 succeeded at 1562 (offset 18 lines).
>
> Please post an updated version for the same.
>
> [1] - http://cfbot.cputube.org/patch_46_4564.log
>
> Regards,
> Vignesh

Thanks Vignesh for the notification. PFA rebased patches. 0001 in
earlier patch-set is now removed.

-- 
Best Wishes,
Ashutosh Bapat
From 996664e4b4d7e65449b813828e81d0924af64e2b Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat 
Date: Mon, 4 Sep 2023 14:56:21 +0530
Subject: [PATCH 3/4] Save commuted RestrictInfo for later use

A commuted RestrictInfo may be produced as many times as the number of indexes
it is used for. It's the same RestrictInfo always. Save some CPU and memory by
saving the result in the original RestrictInfo.

Ashutosh Bapat
---
 src/backend/optimizer/util/appendinfo.c   |  6 ++
 src/backend/optimizer/util/restrictinfo.c | 22 +-
 src/include/nodes/pathnodes.h |  9 +
 3 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index dd4251be80..cb9a6d2d95 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -492,6 +492,12 @@ adjust_appendrel_attrs_mutator(Node *node,
 		newinfo->left_mcvfreq = -1;
 		newinfo->right_mcvfreq = -1;
 
+		/*
+		 * Wipe out commuted parent RestrictInfo. The caller will compute
+		 * commuted clause if required.
+		 */
+		newinfo->comm_rinfo = NULL;
+
 		return (Node *) newinfo;
 	}
 
diff --git a/src/backend/optimizer/util/restrictinfo.c b/src/backend/optimizer/util/restrictinfo.c
index 0b406e9334..48f8b38258 100644
--- a/src/backend/optimizer/util/restrictinfo.c
+++ b/src/backend/optimizer/util/restrictinfo.c
@@ -246,6 +246,8 @@ make_restrictinfo_internal(PlannerInfo *root,
 
 	restrictinfo->left_hasheqoperator = InvalidOid;
 	restrictinfo->right_hasheqoperator = InvalidOid;
+	restrictinfo->comm_rinfo = NULL;
+	restrictinfo->is_commuted = false;
 
 	return restrictinfo;
 }
@@ -354,14 +356,27 @@ make_sub_restrictinfos(PlannerInfo *root,
  * be hazardous if the source is subject to change.  Also notice that we
  * assume without checking that the commutator op is a member of the same
  * btree and hash opclasses as the original op.
+ *
+ * If a commuted RestrictInfo is already available it is returned.
  */
 RestrictInfo *
 commute_restrictinfo(RestrictInfo *rinfo, Oid comm_op)
 {
 	RestrictInfo *result;
 	OpExpr	   *newclause;
-	OpExpr	   *clause = castNode(OpExpr, rinfo->clause);
+	OpExpr	   *clause;
+
+	if (rinfo->comm_rinfo)
+	{
+		result = rinfo->comm_rinfo;
+		newclause = castNode(OpExpr, result->clause);
+		Assert(list_length(newclause->args) == 2);
+		Assert(newclause->opno == comm_op);
 
+		return result;
+	}
+
+	clause = castNode(OpExpr, 

Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning

2024-01-26 Thread vignesh C
On Tue, 31 Oct 2023 at 10:48, Ashutosh Bapat
 wrote:
>
> On Thu, Sep 14, 2023 at 4:39 PM Ashutosh Bapat
>  wrote:
> >
> > The patch set is thus
> > 0001 - patch used to measure memory used during planning
> >
> > 0002 - Patch to free intermediate Relids computed by
> > adjust_child_relids_multilevel(). I didn't test memory consumption for
> > multi-level partitioning. But this is clear improvement. In that
> > function we free the AppendInfos array which as many pointers long as
> > the number of relids. So it doesn't make sense not to free the Relids
> > which can be {largest relid}/8 bytes long at least.
> >
> > 0003 - patch to save and reuse commuted RestrictInfo. This patch by
> > itself shows a small memory saving (3%) in the query below where the
> > same clause is commuted twice. The query does not contain any
> > partitioned tables.
> > create table t2 (a int primary key, b int, c int);
> > create index t2_a_b on t2(a, b);
> > select * from t2 where 10 = a
> > Memory used without patch: 13696 bytes
> > Memory used with patch: 13264 bytes
> >
> > 0004 - Patch which implements the hash table of hash table described
> > above and also code to avoid repeated RestrictInfo list translations.
> >
> > I will add this patchset to next commitfest.
> >
> > --
> > Best Wishes,
> > Ashutosh Bapat
>
> PFA rebased patches. Nothing changes in 0002, 0003 and 0004. 0001 is
> the squashed version of the latest patch set at
> https://www.postgresql.org/message-id/CAExHW5sCJX7696sF-OnugAiaXS=Ag95=-m1csrjcmyyj8pd...@mail.gmail.com.

CFBot shows that the patch does not apply anymore as in [1]:
=== Applying patches on top of PostgreSQL commit ID
7014c9a4bba2d1b67d60687afb5b2091c1d07f73 ===
=== applying patch
./0001-Report-memory-used-for-planning-a-query-in--20231031.patch
...
patching file src/test/regress/expected/explain.out
Hunk #5 FAILED at 290.
Hunk #6 succeeded at 545 (offset 4 lines).
1 out of 6 hunks FAILED -- saving rejects to file
src/test/regress/expected/explain.out.rej
patching file src/tools/pgindent/typedefs.list
Hunk #1 succeeded at 1562 (offset 18 lines).

Please post an updated version for the same.

[1] - http://cfbot.cputube.org/patch_46_4564.log

Regards,
Vignesh




Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning

2023-11-24 Thread Ashutosh Bapat
On Fri, Nov 24, 2023 at 3:56 PM Alena Rybakina  wrote:
>
> On 24.11.2023 13:20, Alena Rybakina wrote:
>
> Hi! Thank you for your work on the subject, I think it's a really useful 
> feature.
>
> I've reviewed your patch and I have a few questions.
>
> First of all, have you thought about creating a gun parameter to display 
> memory scheduling information? I agree that this is an important feature, but 
> I think only for debugging.

Not a GUC parameter but I have a proposal to use EXPLAIN for the same. [1]

>
> Secondly, I noticed that for the child_rinfo_hash key you use a counter (int) 
> and can it lead to collisions? Why didn't you generate a hash from 
> childRestrictInfo for this? For example, something like how it is formed here 
> [0].

Not usually. But that's the only "key" we have to access a set of
sematically same RestrictInfos. Relids is another key to access the
exact RestrictInfo. A child RestrictInfo can not be used since there
will many child RestrictInfos. Similar parent RestrictInfo can not be
used since there will be multiple forms of the same RestrictInfo.

[1] https://commitfest.postgresql.org/45/4492/

-- 
Best Wishes,
Ashutosh Bapat




Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning

2023-11-24 Thread Alena Rybakina

On 24.11.2023 13:20, Alena Rybakina wrote:


Hi! Thank you for your work on the subject, I think it's a really 
useful feature.


I've reviewed your patch and I have a few questions.

First of all, have you thought about creating a gun parameter to 
display memory scheduling information? I agree that this is an 
important feature, but I think only for debugging.


Secondly, I noticed that for the child_rinfo_hash key you use a 
counter (int) and can it lead to collisions? Why didn't you generate a 
hash from childRestrictInfo for this? For example, something like how 
it is formed here [0].


[0] 
https://www.postgresql.org/message-id/43ad8a48-b980-410d-a83c-5beebf82a4ed%40postgrespro.ru 



Sorry, my first question was not clear, I mean: have you thought about 
creating a guc parameter to display memory planning information?


--
Regards,
Alena Rybakina


Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning

2023-11-24 Thread Alena Rybakina
Hi! Thank you for your work on the subject, I think it's a really useful 
feature.


I've reviewed your patch and I have a few questions.

First of all, have you thought about creating a gun parameter to display 
memory scheduling information? I agree that this is an important 
feature, but I think only for debugging.


Secondly, I noticed that for the child_rinfo_hash key you use a counter 
(int) and can it lead to collisions? Why didn't you generate a hash from 
childRestrictInfo for this? For example, something like how it is formed 
here [0].


[0] 
https://www.postgresql.org/message-id/43ad8a48-b980-410d-a83c-5beebf82a4ed%40postgrespro.ru 



--
Regards,
Alena Rybakina
Postgres Professional


Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning

2023-10-30 Thread Ashutosh Bapat
On Thu, Sep 14, 2023 at 4:39 PM Ashutosh Bapat
 wrote:
>
> The patch set is thus
> 0001 - patch used to measure memory used during planning
>
> 0002 - Patch to free intermediate Relids computed by
> adjust_child_relids_multilevel(). I didn't test memory consumption for
> multi-level partitioning. But this is clear improvement. In that
> function we free the AppendInfos array which as many pointers long as
> the number of relids. So it doesn't make sense not to free the Relids
> which can be {largest relid}/8 bytes long at least.
>
> 0003 - patch to save and reuse commuted RestrictInfo. This patch by
> itself shows a small memory saving (3%) in the query below where the
> same clause is commuted twice. The query does not contain any
> partitioned tables.
> create table t2 (a int primary key, b int, c int);
> create index t2_a_b on t2(a, b);
> select * from t2 where 10 = a
> Memory used without patch: 13696 bytes
> Memory used with patch: 13264 bytes
>
> 0004 - Patch which implements the hash table of hash table described
> above and also code to avoid repeated RestrictInfo list translations.
>
> I will add this patchset to next commitfest.
>
> --
> Best Wishes,
> Ashutosh Bapat

PFA rebased patches. Nothing changes in 0002, 0003 and 0004. 0001 is
the squashed version of the latest patch set at
https://www.postgresql.org/message-id/CAExHW5sCJX7696sF-OnugAiaXS=Ag95=-m1csrjcmyyj8pd...@mail.gmail.com.

-- 
Best Wishes,
Ashutosh Bapat
From de823e9cf0b03f21f8de51b49347df58ecb27c5a Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat 
Date: Thu, 31 Aug 2023 19:02:23 +0530
Subject: [PATCH 2/4] Free intermediate Relids created in
 adjust_child_relids_multilevel()

adjust_child_relids_multilevel() creates Relids for all the intermediate stages
in a partition hierarchy. These relid sets are not required finally. Relids or
Bitmapset take reasonably large space when thousands of partitions are
invovled. Hence free these intermediate relid sets.

Ashutosh Bapat
---
 src/backend/optimizer/util/appendinfo.c | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index f456b3b0a4..4008e52de2 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -591,6 +591,8 @@ adjust_child_relids_multilevel(PlannerInfo *root, Relids relids,
 {
 	AppendRelInfo **appinfos;
 	int			nappinfos;
+	Relids		tmp_relids = NULL;
+	Relids		child_relids;
 
 	/*
 	 * If the given relids set doesn't contain any of the parent relids, it
@@ -599,13 +601,14 @@ adjust_child_relids_multilevel(PlannerInfo *root, Relids relids,
 	if (!bms_overlap(relids, parentrel->relids))
 		return relids;
 
+	tmp_relids = relids;
 	/* Recurse if immediate parent is not the top parent. */
 	if (childrel->parent != parentrel)
 	{
 		if (childrel->parent)
-			relids = adjust_child_relids_multilevel(root, relids,
-	childrel->parent,
-	parentrel);
+			tmp_relids = adjust_child_relids_multilevel(root, relids,
+		childrel->parent,
+		parentrel);
 		else
 			elog(ERROR, "childrel is not a child of parentrel");
 	}
@@ -613,11 +616,15 @@ adjust_child_relids_multilevel(PlannerInfo *root, Relids relids,
 	/* Now translate for this child. */
 	appinfos = find_appinfos_by_relids(root, childrel->relids, );
 
-	relids = adjust_child_relids(relids, nappinfos, appinfos);
+	child_relids = adjust_child_relids(tmp_relids, nappinfos, appinfos);
+
+	/* Free any intermediate Relids created. */
+	if (tmp_relids != relids)
+		bms_free(tmp_relids);
 
 	pfree(appinfos);
 
-	return relids;
+	return child_relids;
 }
 
 /*
-- 
2.25.1

From 69cfd8bd6ea38a99a715268423d92fa4179f7cc7 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat 
Date: Mon, 4 Sep 2023 14:56:21 +0530
Subject: [PATCH 3/4] Save commuted RestrictInfo for later use

A commuted RestrictInfo may be produced as many times as the number of indexes
it is used for. It's the same RestrictInfo always. Save some CPU and memory by
saving the result in the original RestrictInfo.

Ashutosh Bapat
---
 src/backend/optimizer/util/appendinfo.c   |  6 ++
 src/backend/optimizer/util/restrictinfo.c | 22 +-
 src/include/nodes/pathnodes.h |  9 +
 3 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index 4008e52de2..53e1233dce 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -492,6 +492,12 @@ adjust_appendrel_attrs_mutator(Node *node,
 		newinfo->left_mcvfreq = -1;
 		newinfo->right_mcvfreq = -1;
 
+		/*
+		 * Wipe out commuted parent RestrictInfo. The caller will compute
+		 * commuted clause if required.
+		 */
+		newinfo->comm_rinfo = NULL;
+
 		return (Node *) newinfo;
 	}
 
diff --git a/src/backend/optimizer/util/restrictinfo.c b/src/backend/optimizer/util/restrictinfo.c
index 

Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning

2023-09-14 Thread Ashutosh Bapat
Hi All,

On Fri, Aug 11, 2023 at 6:24 PM Ashutosh Bapat
 wrote:
>
> Obtaining child clauses from parent clauses by translation and
> tracking the translations is less complex and may be more efficient
> too. I will post a patch on those lines soon.
>

PFA patch set to add infrastructure to track RestrictInfo translations
and reuse them.

PlannerInfo gets a new member "struct HTAB *child_rinfo_hash" which is
a hash table of hash tables keyed by RestrictInfo::rinfo_serial. Each
element in the array is a hash table of RestrictInfos keyed by
RestrictInfo::required_relids as explained in my previous email. When
building child clauses when a. building child join rels or b. when
reparameterizing paths, we first access the first level hash table
using RestrictInfo::rinfo_serial of the parent and search the required
translation by computing the child RestrictInfo::required_relids
obtained by translating RestrictInfo::required_relids of the parent
RestrictInfo. If the translation doesn't exist, we create one and add
to the hash table.

RestrictInfo::required_relids is same for a RestrictInfo and its
commuted RestrictInfo. The order of operands is important for
IndexClauses. Hence we track the commuted RestrictInfo in a new field
RestrictInfo::comm_rinfo. RestrictInfo::is_commuted differentiates
between a RestrictInfo and its commuted version. This is explained as
a comment in the patch. This scheme has a minor benefit of saving
memory when the same RestrictInfo is commuted multiple times.

Hash table of hash table is used instead of an array of hash tables
since a. not every rinfo_serial has a RestrictInfo associated with it
b. not every RestrictInfo has translations, c. I don't think the exact
size of this array is not known till the planning ends since we
continue to create new clauses as we create new RelOptInfos. Of
course, an array can be repalloc'ed and unused slots in the array may
not waste a lot of memory. I am open to change hash table to an array
which may be more efficient.

With these set of patches, the memory consumption stands as below
Number of tables | without patch  | with patch | % reduction |
being joined ||| |
--
   2 |  40.8 MiB  |   37.4 MiB |   8.46% |
   3 | 151.6 MiB  |  135.0 MiB |  10.96% |
   4 | 464.0 MiB  |  403.6 MiB |  12.00% |
   5 |1663.9 MiB  | 1329.1 MiB |  20.12% |

The patch set is thus
0001 - patch used to measure memory used during planning

0002 - Patch to free intermediate Relids computed by
adjust_child_relids_multilevel(). I didn't test memory consumption for
multi-level partitioning. But this is clear improvement. In that
function we free the AppendInfos array which as many pointers long as
the number of relids. So it doesn't make sense not to free the Relids
which can be {largest relid}/8 bytes long at least.

0003 - patch to save and reuse commuted RestrictInfo. This patch by
itself shows a small memory saving (3%) in the query below where the
same clause is commuted twice. The query does not contain any
partitioned tables.
create table t2 (a int primary key, b int, c int);
create index t2_a_b on t2(a, b);
select * from t2 where 10 = a
Memory used without patch: 13696 bytes
Memory used with patch: 13264 bytes

0004 - Patch which implements the hash table of hash table described
above and also code to avoid repeated RestrictInfo list translations.

I will add this patchset to next commitfest.

--
Best Wishes,
Ashutosh Bapat
From 3b9e8039c15c87f6dc83369419c85c4a0a3b5688 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat 
Date: Thu, 31 Aug 2023 19:02:23 +0530
Subject: [PATCH 2/4] Free intermediate Relids created in
 adjust_child_relids_multilevel()

adjust_child_relids_multilevel() creates Relids for all the intermediate stages
in a partition hierarchy. These relid sets are not required finally. Relids or
Bitmapset take reasonably large space when thousands of partitions are
invovled. Hence free these intermediate relid sets.

Ashutosh Bapat
---
 src/backend/optimizer/util/appendinfo.c | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index f456b3b0a4..4008e52de2 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -591,6 +591,8 @@ adjust_child_relids_multilevel(PlannerInfo *root, Relids relids,
 {
 	AppendRelInfo **appinfos;
 	int			nappinfos;
+	Relids		tmp_relids = NULL;
+	Relids		child_relids;
 
 	/*
 	 * If the given relids set doesn't contain any of the parent relids, it
@@ -599,13 +601,14 @@ adjust_child_relids_multilevel(PlannerInfo *root, Relids relids,
 	if (!bms_overlap(relids, parentrel->relids))
 		return relids;
 
+	tmp_relids = relids;
 	/* Recurse if immediate parent is not the top parent. */
 	if (childrel->parent 

Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning

2023-08-11 Thread Ashutosh Bapat
I spent some time on 4th point below but also looked at other points.
Here's what I have found so far

On Thu, Jul 27, 2023 at 7:35 PM Ashutosh Bapat
 wrote:
>
> 1. The patch uses RestrictInfo::required_relids as the key for
> searching child RelOptInfos. I am not sure which of the two viz.
> required_relids and clause_relids is a better key. required_relids
> seems to be a subset of clause_relids and from the description it
> looks like that's the set that decides the applicability of a clause
> in a join. But clause_relids is obtained from all the Vars that appear
> in the clause, so may be that's the one that matters for the
> translations. Can somebody guide me?

I was wrong that required_relids is subset of clause_relids. The first
can contain OJ relids which the other can not. OJ relids do not have
any children, so they won't be translated. So clause_relids seems to
be a better key. I haven't made a decision yet.

>
> 2. The patch adds two extra pointers per RestrictInfo. They will
> remain unused when partitionwise join is not used. Right now, I do not
> see any increase in memory consumed by planner because of those
> pointers even in case of unpartitioned tables; maybe they are absorbed
> in memory alignment. They may show up as extra memory in the future. I
> am wondering whether we can instead save and track translations in
> PlannerInfo as a hash table using  whatever is the answer to above question) of parent and child
> respectively> as key. That will just add one extra pointer in
> PlannerInfo when partitionwise join is not used. Please let me know
> your suggestions.

I will go ahead with a pointer in PlannerInfo for now.

>
> 3. I have changed adjust_appendrel_attrs_mutator() to return a
> translated RestrictInfo if it already exists. IOW, it won't always
> return a deep copy of given RestrictInfo as it does today. This can be
> fixed by writing wrappers around adjust_appendrel_attrs() to translate
> RestrictInfo specifically. But maybe we don't always need deep copies.
> Are there any cases when we need translated deep copies of
> RestrictInfo? Those cases will require fixing callers of
> adjust_appendrel_attrs() instead of the mutator.

I think it's better to handle the tracking logic outside
adjust_appendrel_attrs. That will be some code churn but it will be
cleaner and won't affect anything other that partitionwise joins.

>
> 4. IIRC, when partitionwise join was implemented we had discussed
> creating child RestrictInfos using a login similar to
> build_joinrel_restrictlist(). That might be another way to build
> RestrictInfo only once and use it multiple times. But we felt that it
> was much harder problem to solve since it's not known which partitions
> from joining partitioned tables will match and will be joined till we
> enter try_partitionwise_join(), so the required RestrictInfos may not
> be available in RelOptInfo::joininfo. Let me know your thoughts on
> this.

Here's some lengthy description of why I feel translations are better
compared to computing restrictlist and joininfo for a child join from
joining relation's joininfo
Consider a query "select * from p, q, r where p.c1 = q.c1 and q.c1 =
r.c1 and p.c2 + q.c2 < r.c2 and p.c3 != q.c3 and q.c3 != r.c3". The
query has following clauses
1. p.c1 = q.c1
2. q.c1 = r.c1
3. p.c2 + q.c2 < r.c2
4. p.c3 != q.c3
5. q.c3 != r.c3

The first two clauses are added to EC machinery and do not appear in
joininfo. They appear in restrictlist when we construct clauses in
restrictlist from ECs. Let's ignore them for now.

Assume that each table p, q, r has partitions (p1, p2, ...), (q1, q2,
...) and (r1, r2, ... ) respectively. Each triplet (pi, qi,ri) forms
the set of matching partitions from p, q, r respectively for all "i".
Consider join, p1q1r1. We will generate relations p1, q1, r1, p1q1,
p1r1, q1r1 and p1q1r1 while building the last join. Below is
description of how these clauses would look in each of these relations
and the list they appear in when computing that join. Please notice
the numeric suffixes carefully.

p1.
joininfo: p1.c2 + q.c2 < r.c2, p1.c3 != q.c3
restrictlist: <>

q1
joininfo: p.c2 + q1.c2 < r.c2, p.c3 != q1.c3, q1.c3 != r.c3
restrictlist: <>

r1
joininfo: p.c2 + q.c2 < r1.c2, q.c3 != r1.c3
restrictlist: <>

p1q1
joininfo: p1.c2 + q1.c2 < r.c2, q1.c3 != r.c3
restrictlist: p1.c3 != q1.c3

q1r1
joininfo: p.c2 + q1.c2 < r1.c2, p.c3 != q1.c3
restrictlist: q1.c3 != r1.c3

p1r1
joininfo: p1.c2 + q.c2 < r1.c2, p1.c3 != q.c3, q.c3 != r1.c3
restrictlist: <>

p1q1r1
joininfo: <>
restrictlist for (p1q1)r1: p1.c2 + q1.c2 < r1.c2, q1.c3 != r1.c3
restrictlist for (p1r1)q1: p1.c2 + q1.c2 < r1.c2, p1.c3 != q1.c3, q1.c3 != r1.c3
restrictlist for p1(q1r1): p1.c2 + q1.c2 < r1.c2, p1.c3 != q1.c3

If we translate the clauses when building join e.g. translate p1.c3 !=
q1.c3 when building p1q1 or p1q1r1, it would cause repeated
translations. So the translations need to be saved in lower relations
when we detect matching partitions and then 

Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning

2023-08-09 Thread Ashutosh Bapat
On Wed, Aug 9, 2023 at 10:12 AM David Rowley  wrote:
>
> On Fri, 28 Jul 2023 at 02:06, Ashutosh Bapat
>  wrote:
> > 0001 - to measure memory consumption during planning. This is the same
> > one as attached to [1].
>

I have started a separate thread to discuss this patch. I am taking
this discussion to that thread.

-- 
Best Wishes,
Ashutosh Bapat




Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning

2023-08-08 Thread David Rowley
On Fri, 28 Jul 2023 at 02:06, Ashutosh Bapat
 wrote:
> 0001 - to measure memory consumption during planning. This is the same
> one as attached to [1].

I see you're recording the difference in the CurrentMemoryContext of
palloc'd memory before and after planning.  That won't really alert us
to problems if the planner briefly allocates, say 12GBs of memory
during, say the join search then quickly pfree's it again.  unless
it's an oversized chunk, aset.c won't free() any memory until
MemoryContextReset(). Chunks just go onto a freelist for reuse later.
So at the end of planning, the context may still have that 12GBs
malloc'd, yet your new EXPLAIN ANALYZE property might end up just
reporting a tiny fraction of that.

I wonder if it would be more useful to just have ExplainOneQuery()
create a new memory context and switch to that and just report the
context's mem_allocated at the end.

It's also slightly annoying that these planner-related summary outputs
are linked to EXPLAIN ANALYZE. We could be showing them in EXPLAIN
without ANALYZE.  If we were to change that now, it might be a bit
annoying for the regression tests as we'd need to go and add SUMMARY
OFF in a load of places...

drowley@amd3990x:~/pg_src/src/test/regress/sql$ git grep -i "costs off" | wc -l
1592

hmm, that would cause a bit of churn... :-(

David




Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning

2023-08-08 Thread Ashutosh Bapat
On Tue, Aug 8, 2023 at 4:08 PM Richard Guo  wrote:
>
>
> I haven't looked into the details but with 0002 patch I came across a
> crash while planning the query below.
>
> regression=# set enable_partitionwise_join to on;
> SET
> regression=# EXPLAIN (COSTS OFF)
> SELECT * FROM prt1 t1, prt2 t2
> WHERE t1.a = t2.b AND t1.a < 450 AND t2.b > 250 AND t1.b = 0;
> server closed the connection unexpectedly

Thanks for the report. PFA the patches with this issue fixed. There is
another issue seen when running partition_join.sql "ERROR:  index key
does not match expected index column". I will investigate that issue.

Right now I am looking at the 4th point in my first email in this
thread. [1]. I am trying to figure whether that approach would work.
If that approach doesn't work what's the best to track the
translations and also figure out answers to 1, 2, 3 there. Please let
me know your opinions if any.

-- 
Best Wishes,
Ashutosh Bapat

[1] 
https://www.postgresql.org/message-id/CAExHW5s=bclmmq8n_bn6iu+pjau0ds3z_6dn6ile69esmsp...@mail.gmail.com
From 4abdfd676a31d885ccdbe1803c60c1df1d0c1df2 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat 
Date: Wed, 12 Jul 2023 14:34:14 +0530
Subject: [PATCH 1/2] Report memory used for planning a query in EXPLAIN
 ANALYZE

The memory used in the CurrentMemoryContext and its children is sampled
before and after calling pg_plan_query() from ExplainOneQuery(). The
difference in the two samples is reported as the memory consumed while
planning the query. This may not account for the memory allocated in
memory contexts which are not children of CurrentMemoryContext. These
contexts are usually other long lived contexts, e.g.
CacheMemoryContext, which are shared by all the queries run in a
session. The consumption in those can not be attributed only to a given
query and hence should not be reported any way.

The memory consumption is reported as "Planning Memory" property in
EXPLAIN ANALYZE output.

Ashutosh Bapat
---
 src/backend/commands/explain.c| 12 ++--
 src/backend/commands/prepare.c|  7 ++-
 src/backend/utils/mmgr/mcxt.c | 19 +++
 src/include/commands/explain.h|  3 ++-
 src/include/utils/memutils.h  |  1 +
 src/test/regress/expected/explain.out | 15 +++
 6 files changed, 49 insertions(+), 8 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 8570b14f62..9f859949f0 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -397,16 +397,20 @@ ExplainOneQuery(Query *query, int cursorOptions,
 	planduration;
 		BufferUsage bufusage_start,
 	bufusage;
+		Size		mem_consumed;
 
 		if (es->buffers)
 			bufusage_start = pgBufferUsage;
 		INSTR_TIME_SET_CURRENT(planstart);
+		mem_consumed = MemoryContextMemUsed(CurrentMemoryContext);
 
 		/* plan the query */
 		plan = pg_plan_query(query, queryString, cursorOptions, params);
 
 		INSTR_TIME_SET_CURRENT(planduration);
 		INSTR_TIME_SUBTRACT(planduration, planstart);
+		mem_consumed = MemoryContextMemUsed(CurrentMemoryContext)
+		- mem_consumed;
 
 		/* calc differences of buffer counters. */
 		if (es->buffers)
@@ -417,7 +421,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
 
 		/* run it (if needed) and produce output */
 		ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
-	   , (es->buffers ?  : NULL));
+	   , (es->buffers ?  : NULL), _consumed);
 	}
 }
 
@@ -527,7 +531,7 @@ void
 ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 			   const char *queryString, ParamListInfo params,
 			   QueryEnvironment *queryEnv, const instr_time *planduration,
-			   const BufferUsage *bufusage)
+			   const BufferUsage *bufusage, const Size *mem_consumed)
 {
 	DestReceiver *dest;
 	QueryDesc  *queryDesc;
@@ -628,6 +632,10 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 		double		plantime = INSTR_TIME_GET_DOUBLE(*planduration);
 
 		ExplainPropertyFloat("Planning Time", "ms", 1000.0 * plantime, 3, es);
+
+		if (mem_consumed)
+			ExplainPropertyUInteger("Planning Memory", "bytes",
+		(uint64) *mem_consumed, es);
 	}
 
 	/* Print info about runtime of triggers */
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 18f70319fc..84544ce481 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -583,10 +583,12 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
 	instr_time	planduration;
 	BufferUsage bufusage_start,
 bufusage;
+	Size		mem_consumed;
 
 	if (es->buffers)
 		bufusage_start = pgBufferUsage;
 	INSTR_TIME_SET_CURRENT(planstart);
+	mem_consumed = MemoryContextMemUsed(CurrentMemoryContext);
 
 	/* Look it up in the hash table */
 	entry = FetchPreparedStatement(execstmt->name, true);
@@ -623,6 +625,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
 
 	

Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning

2023-08-08 Thread Richard Guo
On Thu, Jul 27, 2023 at 10:06 PM Ashutosh Bapat <
ashutosh.bapat@gmail.com> wrote:

> 0002 - WIP patch to avoid repeated translations of RestrictInfo.
> The WIP patch avoids repeated translations by tracking the child for
> which a RestrictInfo is translated and reusing the same translation
> every time it is requested. In order to track the translations,
> RestrictInfo gets two new members.
> 1. parent_rinfo - In a child's RestrictInfo this points to the
> RestrictInfo applicable to the topmost parent in partition hierarchy.
> This is NULL in the topmost parent's RestrictInfo
> 2. child_rinfos - In a parent's RestrictInfo, this is a list that
> contains all the translated child RestrictInfos. In child
> RestrictInfos this is NULL.


I haven't looked into the details but with 0002 patch I came across a
crash while planning the query below.

regression=# set enable_partitionwise_join to on;
SET
regression=# EXPLAIN (COSTS OFF)
SELECT * FROM prt1 t1, prt2 t2
WHERE t1.a = t2.b AND t1.a < 450 AND t2.b > 250 AND t1.b = 0;
server closed the connection unexpectedly

Thanks
Richard


Reducing memory consumed by RestrictInfo list translations in partitionwise join planning

2023-07-27 Thread Ashutosh Bapat
Hi All,

Following up on [1] ...

A restrictlist is a list of RestrictInfo nodes, each representing one
clause, applicable to a set of joins.  When partitionwise join is used
as a strategy, the restrictlists for child join are obtained by
translating the restrictlists for the parent in
try_partitionwise_join(). That function is called once for every join
order. E.g. when computing join ABC, it will be called for planning
joins AB, BC, AC, (AB)C, A(BC), B(AC). Every time it is called it will
translate the given parent restrictlist. This means that a
RestrictInfo node applicable to given child relations will be
translated as many times as the join orders in which those child
relations appear in different joining relations.

For example, consider a query "select * from A, B, C where A.a = B.a
and B.a = C.a" where A, B and C are partitioned tables. A has
partitions A1, A2, ... An. B has partitions B1, B2, ... Bn and C has
partitions C1, C2, ... Cn. Partitions Ai, Bi and Ci are matching
partitions respectively for all i's. The join ABC is computed as
Append of (A1B1C1, A2B2C2, ... AnBnCn). The clause A.a = B.a is
translated to A1.a = B1.a thrice, when computing A1B1, A1(B1C1) and
B1(A1C1) respectively. Similarly other clauses are translated multiple
times. Some extra translations also happen in
reparameterize_path_by_child().

These translations consume memory which remains allocated till the
statement finishes. A ResrtictInfo should be translated only once per
parent-child pair, thus avoiding consuming extra memory.

There are two patches attached
0001 - to measure memory consumption during planning. This is the same
one as attached to [1].
0002 - WIP patch to avoid repeated translations of RestrictInfo.
The WIP patch avoids repeated translations by tracking the child for
which a RestrictInfo is translated and reusing the same translation
every time it is requested. In order to track the translations,
RestrictInfo gets two new members.
1. parent_rinfo - In a child's RestrictInfo this points to the
RestrictInfo applicable to the topmost parent in partition hierarchy.
This is NULL in the topmost parent's RestrictInfo
2. child_rinfos - In a parent's RestrictInfo, this is a list that
contains all the translated child RestrictInfos. In child
RestrictInfos this is NULL.

Every translated RestrictInfo is stored in the top parent's
RestrictInfo child_rinfos. RestrictInfo::required_relids is used as a
key to search a given translation. I have intercepted
adjust_appendrel_attrs_mutator() to track translations as well as
avoid multiple translations. It first looks for an existing
translation when translating a RestrictInfo and creates a new one only
when one doesn't exist already.

Using this patch the memory consumption for the above query reduces as follows

Number of partitions: 1000

Number of tables | without patch  | with patch | % reduction |
being joined ||| |
--
   2 |  40.3 MiB  |   37.7 MiB |   6.43% |
   3 | 146.8 MiB  |  133.0 MiB |   9.42% |
   4 | 445.4 MiB  |  389.5 MiB |  12.57% |
   5 |1563.2 MiB  | 1243.2 MiB |  20.47% |

The number of times a RestrictInfo requires to be translated increases
exponentially with the number of tables joined. Thus we see more
memory saved as the number of tables joined increases. When two tables
are joined there's only a single join planned so no extra translations
happen in try_partitionwise_join(). The memory saved in case of 2
joining tables comes from avoiding extra translations happening during
reparameterization of paths (in reparameterize_path_by_child()).

The attached patch is to show how much memory can be saved if we avoid
extra translation. But I want to discuss the following things about
the approach.

1. The patch uses RestrictInfo::required_relids as the key for
searching child RelOptInfos. I am not sure which of the two viz.
required_relids and clause_relids is a better key. required_relids
seems to be a subset of clause_relids and from the description it
looks like that's the set that decides the applicability of a clause
in a join. But clause_relids is obtained from all the Vars that appear
in the clause, so may be that's the one that matters for the
translations. Can somebody guide me?

2. The patch adds two extra pointers per RestrictInfo. They will
remain unused when partitionwise join is not used. Right now, I do not
see any increase in memory consumed by planner because of those
pointers even in case of unpartitioned tables; maybe they are absorbed
in memory alignment. They may show up as extra memory in the future. I
am wondering whether we can instead save and track translations in
PlannerInfo as a hash table using  as key. That will just add one extra pointer in
PlannerInfo when partitionwise join is not used. Please let me know
your suggestions.

3. I have changed