Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning
On Mon, Feb 19, 2024 at 5:17 AM Ashutosh Bapat wrote: > On Mon, Feb 19, 2024 at 4:35 AM Tomas Vondra > wrote: > > > > Hi, > > > > After taking a look at the patch optimizing SpecialJoinInfo allocations, > > I decided to take a quick look at this one too. I don't have any > > specific comments on the code yet, but it seems quite a bit more complex > > than the other patch ... it's introducing a HTAB into the optimizer, > > surely that has costs too. > > Thanks for looking into this too. > > > > > I started by doing the same test as with the other patch, comparing > > master to the two patches (applied independently and both). And I got > > about this (in MB): > > > > tablesmaster sjinforinfo both > > --- > >237 36 3433 > >3 138129 122 113 > >4 421376 363 318 > >5 1495 1254 1172 931 > > > > Unlike the SpecialJoinInfo patch, I haven't seen any reduction in > > planning time for this one. > > Yeah. That agreed with my observation as well. > > > > > The reduction in allocated memory is nice. I wonder what's allocating > > the remaining memory, and we'd have to do to reduce it further. > > Please see reply to SpecialJoinInfo thread. Other that the current > patches, we require memory efficient Relids implementation. I have > shared some ideas in the slides I shared in the other thread, but > haven't spent time experimenting with any ideas there. > > > > > However, this is a somewhat extreme example - it's joining 5 tables, > > each with 1000 partitions, using a partition-wise join. It reduces the > > amount of memory, but the planning time is still quite high (and > > essentially the same as without the patch). So it's not like it'd make > > them significantly more practical ... do we have any other ideas/plans > > how to improve that? > > Yuya has been working on reducing planning time [1]. Has some > significant improvements in that area based on my experiments. But > those patches are complex and still WIP. > > > > > AFAIK we don't expect this to improve "regular" cases with modest number > > of tables / partitions, etc. But could it make them slower in some case? > > > > AFAIR, my experiments did not show any degradation in regular cases > with modest number of tables/partitions. The variation in planning > time was with the usual planning time variations. > Documenting some comments from todays' patch review session 1. Instead of a nested hash table, it might be better to use a flat hash table to save more memory. 2. new comm_rinfo member in RestrictInfo may have problems when copying RestrictInfo or translating it. Instead commuted versions may be tracked outside RestrictInfo Combining the above two, it feels like we need a single hash table with (commuted, rinfo_serial, relids) as key to store all the translations of a RestrictInfo. -- Best Wishes, Ashutosh Bapat
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning
On Mon, Feb 19, 2024 at 4:35 AM Tomas Vondra wrote: > > Hi, > > After taking a look at the patch optimizing SpecialJoinInfo allocations, > I decided to take a quick look at this one too. I don't have any > specific comments on the code yet, but it seems quite a bit more complex > than the other patch ... it's introducing a HTAB into the optimizer, > surely that has costs too. Thanks for looking into this too. > > I started by doing the same test as with the other patch, comparing > master to the two patches (applied independently and both). And I got > about this (in MB): > > tablesmaster sjinforinfo both > --- >237 36 3433 >3 138129 122 113 >4 421376 363 318 >5 1495 1254 1172 931 > > Unlike the SpecialJoinInfo patch, I haven't seen any reduction in > planning time for this one. Yeah. That agreed with my observation as well. > > The reduction in allocated memory is nice. I wonder what's allocating > the remaining memory, and we'd have to do to reduce it further. Please see reply to SpecialJoinInfo thread. Other that the current patches, we require memory efficient Relids implementation. I have shared some ideas in the slides I shared in the other thread, but haven't spent time experimenting with any ideas there. > > However, this is a somewhat extreme example - it's joining 5 tables, > each with 1000 partitions, using a partition-wise join. It reduces the > amount of memory, but the planning time is still quite high (and > essentially the same as without the patch). So it's not like it'd make > them significantly more practical ... do we have any other ideas/plans > how to improve that? Yuya has been working on reducing planning time [1]. Has some significant improvements in that area based on my experiments. But those patches are complex and still WIP. > > AFAIK we don't expect this to improve "regular" cases with modest number > of tables / partitions, etc. But could it make them slower in some case? > AFAIR, my experiments did not show any degradation in regular cases with modest number of tables/partitions. The variation in planning time was with the usual planning time variations. [1] https://www.postgresql.org/message-id/flat/CAExHW5uVZ3E5RT9cXHaxQ_DEK7tasaMN%3DD6rPHcao5gcXanY5w%40mail.gmail.com#112b3e104e0f9e39eb007abe075aae20 -- Best Wishes, Ashutosh Bapat
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning
On 19/2/2024 06:05, Tomas Vondra wrote: However, this is a somewhat extreme example - it's joining 5 tables, each with 1000 partitions, using a partition-wise join. It reduces the amount of memory, but the planning time is still quite high (and essentially the same as without the patch). So it's not like it'd make them significantly more practical ... do we have any other ideas/plans how to improve that? The planner principle of cleaning up all allocated structures after the optimization stage simplifies development and code. But, if we want to achieve horizontal scalability on many partitions, we should introduce per-partition memory context and reset it in between. GEQO already has a short-lived memory context, making designing extensions a bit more challenging but nothing too painful. -- regards, Andrei Lepikhov Postgres Professional
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning
Hi, After taking a look at the patch optimizing SpecialJoinInfo allocations, I decided to take a quick look at this one too. I don't have any specific comments on the code yet, but it seems quite a bit more complex than the other patch ... it's introducing a HTAB into the optimizer, surely that has costs too. I started by doing the same test as with the other patch, comparing master to the two patches (applied independently and both). And I got about this (in MB): tablesmaster sjinforinfo both --- 237 36 3433 3 138129 122 113 4 421376 363 318 5 1495 1254 1172 931 Unlike the SpecialJoinInfo patch, I haven't seen any reduction in planning time for this one. The reduction in allocated memory is nice. I wonder what's allocating the remaining memory, and we'd have to do to reduce it further. However, this is a somewhat extreme example - it's joining 5 tables, each with 1000 partitions, using a partition-wise join. It reduces the amount of memory, but the planning time is still quite high (and essentially the same as without the patch). So it's not like it'd make them significantly more practical ... do we have any other ideas/plans how to improve that? AFAIK we don't expect this to improve "regular" cases with modest number of tables / partitions, etc. But could it make them slower in some case? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning
On Sat, Jan 27, 2024 at 8:26 AM vignesh C wrote: > > On Tue, 31 Oct 2023 at 10:48, Ashutosh Bapat > wrote: > > > > On Thu, Sep 14, 2023 at 4:39 PM Ashutosh Bapat > > wrote: > > > > > > The patch set is thus > > > 0001 - patch used to measure memory used during planning > > > > > > 0002 - Patch to free intermediate Relids computed by > > > adjust_child_relids_multilevel(). I didn't test memory consumption for > > > multi-level partitioning. But this is clear improvement. In that > > > function we free the AppendInfos array which as many pointers long as > > > the number of relids. So it doesn't make sense not to free the Relids > > > which can be {largest relid}/8 bytes long at least. > > > > > > 0003 - patch to save and reuse commuted RestrictInfo. This patch by > > > itself shows a small memory saving (3%) in the query below where the > > > same clause is commuted twice. The query does not contain any > > > partitioned tables. > > > create table t2 (a int primary key, b int, c int); > > > create index t2_a_b on t2(a, b); > > > select * from t2 where 10 = a > > > Memory used without patch: 13696 bytes > > > Memory used with patch: 13264 bytes > > > > > > 0004 - Patch which implements the hash table of hash table described > > > above and also code to avoid repeated RestrictInfo list translations. > > > > > > I will add this patchset to next commitfest. > > > > > > -- > > > Best Wishes, > > > Ashutosh Bapat > > > > PFA rebased patches. Nothing changes in 0002, 0003 and 0004. 0001 is > > the squashed version of the latest patch set at > > https://www.postgresql.org/message-id/CAExHW5sCJX7696sF-OnugAiaXS=Ag95=-m1csrjcmyyj8pd...@mail.gmail.com. > > CFBot shows that the patch does not apply anymore as in [1]: > === Applying patches on top of PostgreSQL commit ID > 7014c9a4bba2d1b67d60687afb5b2091c1d07f73 === > === applying patch > ./0001-Report-memory-used-for-planning-a-query-in--20231031.patch > ... > patching file src/test/regress/expected/explain.out > Hunk #5 FAILED at 290. > Hunk #6 succeeded at 545 (offset 4 lines). > 1 out of 6 hunks FAILED -- saving rejects to file > src/test/regress/expected/explain.out.rej > patching file src/tools/pgindent/typedefs.list > Hunk #1 succeeded at 1562 (offset 18 lines). > > Please post an updated version for the same. > > [1] - http://cfbot.cputube.org/patch_46_4564.log > > Regards, > Vignesh Thanks Vignesh for the notification. PFA rebased patches. 0001 in earlier patch-set is now removed. -- Best Wishes, Ashutosh Bapat From 996664e4b4d7e65449b813828e81d0924af64e2b Mon Sep 17 00:00:00 2001 From: Ashutosh Bapat Date: Mon, 4 Sep 2023 14:56:21 +0530 Subject: [PATCH 3/4] Save commuted RestrictInfo for later use A commuted RestrictInfo may be produced as many times as the number of indexes it is used for. It's the same RestrictInfo always. Save some CPU and memory by saving the result in the original RestrictInfo. Ashutosh Bapat --- src/backend/optimizer/util/appendinfo.c | 6 ++ src/backend/optimizer/util/restrictinfo.c | 22 +- src/include/nodes/pathnodes.h | 9 + 3 files changed, 36 insertions(+), 1 deletion(-) diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c index dd4251be80..cb9a6d2d95 100644 --- a/src/backend/optimizer/util/appendinfo.c +++ b/src/backend/optimizer/util/appendinfo.c @@ -492,6 +492,12 @@ adjust_appendrel_attrs_mutator(Node *node, newinfo->left_mcvfreq = -1; newinfo->right_mcvfreq = -1; + /* + * Wipe out commuted parent RestrictInfo. The caller will compute + * commuted clause if required. + */ + newinfo->comm_rinfo = NULL; + return (Node *) newinfo; } diff --git a/src/backend/optimizer/util/restrictinfo.c b/src/backend/optimizer/util/restrictinfo.c index 0b406e9334..48f8b38258 100644 --- a/src/backend/optimizer/util/restrictinfo.c +++ b/src/backend/optimizer/util/restrictinfo.c @@ -246,6 +246,8 @@ make_restrictinfo_internal(PlannerInfo *root, restrictinfo->left_hasheqoperator = InvalidOid; restrictinfo->right_hasheqoperator = InvalidOid; + restrictinfo->comm_rinfo = NULL; + restrictinfo->is_commuted = false; return restrictinfo; } @@ -354,14 +356,27 @@ make_sub_restrictinfos(PlannerInfo *root, * be hazardous if the source is subject to change. Also notice that we * assume without checking that the commutator op is a member of the same * btree and hash opclasses as the original op. + * + * If a commuted RestrictInfo is already available it is returned. */ RestrictInfo * commute_restrictinfo(RestrictInfo *rinfo, Oid comm_op) { RestrictInfo *result; OpExpr *newclause; - OpExpr *clause = castNode(OpExpr, rinfo->clause); + OpExpr *clause; + + if (rinfo->comm_rinfo) + { + result = rinfo->comm_rinfo; + newclause = castNode(OpExpr, result->clause); + Assert(list_length(newclause->args) == 2); + Assert(newclause->opno == comm_op); + return result; + } + + clause = castNode(OpExpr,
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning
On Tue, 31 Oct 2023 at 10:48, Ashutosh Bapat wrote: > > On Thu, Sep 14, 2023 at 4:39 PM Ashutosh Bapat > wrote: > > > > The patch set is thus > > 0001 - patch used to measure memory used during planning > > > > 0002 - Patch to free intermediate Relids computed by > > adjust_child_relids_multilevel(). I didn't test memory consumption for > > multi-level partitioning. But this is clear improvement. In that > > function we free the AppendInfos array which as many pointers long as > > the number of relids. So it doesn't make sense not to free the Relids > > which can be {largest relid}/8 bytes long at least. > > > > 0003 - patch to save and reuse commuted RestrictInfo. This patch by > > itself shows a small memory saving (3%) in the query below where the > > same clause is commuted twice. The query does not contain any > > partitioned tables. > > create table t2 (a int primary key, b int, c int); > > create index t2_a_b on t2(a, b); > > select * from t2 where 10 = a > > Memory used without patch: 13696 bytes > > Memory used with patch: 13264 bytes > > > > 0004 - Patch which implements the hash table of hash table described > > above and also code to avoid repeated RestrictInfo list translations. > > > > I will add this patchset to next commitfest. > > > > -- > > Best Wishes, > > Ashutosh Bapat > > PFA rebased patches. Nothing changes in 0002, 0003 and 0004. 0001 is > the squashed version of the latest patch set at > https://www.postgresql.org/message-id/CAExHW5sCJX7696sF-OnugAiaXS=Ag95=-m1csrjcmyyj8pd...@mail.gmail.com. CFBot shows that the patch does not apply anymore as in [1]: === Applying patches on top of PostgreSQL commit ID 7014c9a4bba2d1b67d60687afb5b2091c1d07f73 === === applying patch ./0001-Report-memory-used-for-planning-a-query-in--20231031.patch ... patching file src/test/regress/expected/explain.out Hunk #5 FAILED at 290. Hunk #6 succeeded at 545 (offset 4 lines). 1 out of 6 hunks FAILED -- saving rejects to file src/test/regress/expected/explain.out.rej patching file src/tools/pgindent/typedefs.list Hunk #1 succeeded at 1562 (offset 18 lines). Please post an updated version for the same. [1] - http://cfbot.cputube.org/patch_46_4564.log Regards, Vignesh
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning
On Fri, Nov 24, 2023 at 3:56 PM Alena Rybakina wrote: > > On 24.11.2023 13:20, Alena Rybakina wrote: > > Hi! Thank you for your work on the subject, I think it's a really useful > feature. > > I've reviewed your patch and I have a few questions. > > First of all, have you thought about creating a gun parameter to display > memory scheduling information? I agree that this is an important feature, but > I think only for debugging. Not a GUC parameter but I have a proposal to use EXPLAIN for the same. [1] > > Secondly, I noticed that for the child_rinfo_hash key you use a counter (int) > and can it lead to collisions? Why didn't you generate a hash from > childRestrictInfo for this? For example, something like how it is formed here > [0]. Not usually. But that's the only "key" we have to access a set of sematically same RestrictInfos. Relids is another key to access the exact RestrictInfo. A child RestrictInfo can not be used since there will many child RestrictInfos. Similar parent RestrictInfo can not be used since there will be multiple forms of the same RestrictInfo. [1] https://commitfest.postgresql.org/45/4492/ -- Best Wishes, Ashutosh Bapat
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning
On 24.11.2023 13:20, Alena Rybakina wrote: Hi! Thank you for your work on the subject, I think it's a really useful feature. I've reviewed your patch and I have a few questions. First of all, have you thought about creating a gun parameter to display memory scheduling information? I agree that this is an important feature, but I think only for debugging. Secondly, I noticed that for the child_rinfo_hash key you use a counter (int) and can it lead to collisions? Why didn't you generate a hash from childRestrictInfo for this? For example, something like how it is formed here [0]. [0] https://www.postgresql.org/message-id/43ad8a48-b980-410d-a83c-5beebf82a4ed%40postgrespro.ru Sorry, my first question was not clear, I mean: have you thought about creating a guc parameter to display memory planning information? -- Regards, Alena Rybakina
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning
Hi! Thank you for your work on the subject, I think it's a really useful feature. I've reviewed your patch and I have a few questions. First of all, have you thought about creating a gun parameter to display memory scheduling information? I agree that this is an important feature, but I think only for debugging. Secondly, I noticed that for the child_rinfo_hash key you use a counter (int) and can it lead to collisions? Why didn't you generate a hash from childRestrictInfo for this? For example, something like how it is formed here [0]. [0] https://www.postgresql.org/message-id/43ad8a48-b980-410d-a83c-5beebf82a4ed%40postgrespro.ru -- Regards, Alena Rybakina Postgres Professional
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning
On Thu, Sep 14, 2023 at 4:39 PM Ashutosh Bapat wrote: > > The patch set is thus > 0001 - patch used to measure memory used during planning > > 0002 - Patch to free intermediate Relids computed by > adjust_child_relids_multilevel(). I didn't test memory consumption for > multi-level partitioning. But this is clear improvement. In that > function we free the AppendInfos array which as many pointers long as > the number of relids. So it doesn't make sense not to free the Relids > which can be {largest relid}/8 bytes long at least. > > 0003 - patch to save and reuse commuted RestrictInfo. This patch by > itself shows a small memory saving (3%) in the query below where the > same clause is commuted twice. The query does not contain any > partitioned tables. > create table t2 (a int primary key, b int, c int); > create index t2_a_b on t2(a, b); > select * from t2 where 10 = a > Memory used without patch: 13696 bytes > Memory used with patch: 13264 bytes > > 0004 - Patch which implements the hash table of hash table described > above and also code to avoid repeated RestrictInfo list translations. > > I will add this patchset to next commitfest. > > -- > Best Wishes, > Ashutosh Bapat PFA rebased patches. Nothing changes in 0002, 0003 and 0004. 0001 is the squashed version of the latest patch set at https://www.postgresql.org/message-id/CAExHW5sCJX7696sF-OnugAiaXS=Ag95=-m1csrjcmyyj8pd...@mail.gmail.com. -- Best Wishes, Ashutosh Bapat From de823e9cf0b03f21f8de51b49347df58ecb27c5a Mon Sep 17 00:00:00 2001 From: Ashutosh Bapat Date: Thu, 31 Aug 2023 19:02:23 +0530 Subject: [PATCH 2/4] Free intermediate Relids created in adjust_child_relids_multilevel() adjust_child_relids_multilevel() creates Relids for all the intermediate stages in a partition hierarchy. These relid sets are not required finally. Relids or Bitmapset take reasonably large space when thousands of partitions are invovled. Hence free these intermediate relid sets. Ashutosh Bapat --- src/backend/optimizer/util/appendinfo.c | 17 - 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c index f456b3b0a4..4008e52de2 100644 --- a/src/backend/optimizer/util/appendinfo.c +++ b/src/backend/optimizer/util/appendinfo.c @@ -591,6 +591,8 @@ adjust_child_relids_multilevel(PlannerInfo *root, Relids relids, { AppendRelInfo **appinfos; int nappinfos; + Relids tmp_relids = NULL; + Relids child_relids; /* * If the given relids set doesn't contain any of the parent relids, it @@ -599,13 +601,14 @@ adjust_child_relids_multilevel(PlannerInfo *root, Relids relids, if (!bms_overlap(relids, parentrel->relids)) return relids; + tmp_relids = relids; /* Recurse if immediate parent is not the top parent. */ if (childrel->parent != parentrel) { if (childrel->parent) - relids = adjust_child_relids_multilevel(root, relids, - childrel->parent, - parentrel); + tmp_relids = adjust_child_relids_multilevel(root, relids, + childrel->parent, + parentrel); else elog(ERROR, "childrel is not a child of parentrel"); } @@ -613,11 +616,15 @@ adjust_child_relids_multilevel(PlannerInfo *root, Relids relids, /* Now translate for this child. */ appinfos = find_appinfos_by_relids(root, childrel->relids, ); - relids = adjust_child_relids(relids, nappinfos, appinfos); + child_relids = adjust_child_relids(tmp_relids, nappinfos, appinfos); + + /* Free any intermediate Relids created. */ + if (tmp_relids != relids) + bms_free(tmp_relids); pfree(appinfos); - return relids; + return child_relids; } /* -- 2.25.1 From 69cfd8bd6ea38a99a715268423d92fa4179f7cc7 Mon Sep 17 00:00:00 2001 From: Ashutosh Bapat Date: Mon, 4 Sep 2023 14:56:21 +0530 Subject: [PATCH 3/4] Save commuted RestrictInfo for later use A commuted RestrictInfo may be produced as many times as the number of indexes it is used for. It's the same RestrictInfo always. Save some CPU and memory by saving the result in the original RestrictInfo. Ashutosh Bapat --- src/backend/optimizer/util/appendinfo.c | 6 ++ src/backend/optimizer/util/restrictinfo.c | 22 +- src/include/nodes/pathnodes.h | 9 + 3 files changed, 36 insertions(+), 1 deletion(-) diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c index 4008e52de2..53e1233dce 100644 --- a/src/backend/optimizer/util/appendinfo.c +++ b/src/backend/optimizer/util/appendinfo.c @@ -492,6 +492,12 @@ adjust_appendrel_attrs_mutator(Node *node, newinfo->left_mcvfreq = -1; newinfo->right_mcvfreq = -1; + /* + * Wipe out commuted parent RestrictInfo. The caller will compute + * commuted clause if required. + */ + newinfo->comm_rinfo = NULL; + return (Node *) newinfo; } diff --git a/src/backend/optimizer/util/restrictinfo.c b/src/backend/optimizer/util/restrictinfo.c index
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning
Hi All, On Fri, Aug 11, 2023 at 6:24 PM Ashutosh Bapat wrote: > > Obtaining child clauses from parent clauses by translation and > tracking the translations is less complex and may be more efficient > too. I will post a patch on those lines soon. > PFA patch set to add infrastructure to track RestrictInfo translations and reuse them. PlannerInfo gets a new member "struct HTAB *child_rinfo_hash" which is a hash table of hash tables keyed by RestrictInfo::rinfo_serial. Each element in the array is a hash table of RestrictInfos keyed by RestrictInfo::required_relids as explained in my previous email. When building child clauses when a. building child join rels or b. when reparameterizing paths, we first access the first level hash table using RestrictInfo::rinfo_serial of the parent and search the required translation by computing the child RestrictInfo::required_relids obtained by translating RestrictInfo::required_relids of the parent RestrictInfo. If the translation doesn't exist, we create one and add to the hash table. RestrictInfo::required_relids is same for a RestrictInfo and its commuted RestrictInfo. The order of operands is important for IndexClauses. Hence we track the commuted RestrictInfo in a new field RestrictInfo::comm_rinfo. RestrictInfo::is_commuted differentiates between a RestrictInfo and its commuted version. This is explained as a comment in the patch. This scheme has a minor benefit of saving memory when the same RestrictInfo is commuted multiple times. Hash table of hash table is used instead of an array of hash tables since a. not every rinfo_serial has a RestrictInfo associated with it b. not every RestrictInfo has translations, c. I don't think the exact size of this array is not known till the planning ends since we continue to create new clauses as we create new RelOptInfos. Of course, an array can be repalloc'ed and unused slots in the array may not waste a lot of memory. I am open to change hash table to an array which may be more efficient. With these set of patches, the memory consumption stands as below Number of tables | without patch | with patch | % reduction | being joined ||| | -- 2 | 40.8 MiB | 37.4 MiB | 8.46% | 3 | 151.6 MiB | 135.0 MiB | 10.96% | 4 | 464.0 MiB | 403.6 MiB | 12.00% | 5 |1663.9 MiB | 1329.1 MiB | 20.12% | The patch set is thus 0001 - patch used to measure memory used during planning 0002 - Patch to free intermediate Relids computed by adjust_child_relids_multilevel(). I didn't test memory consumption for multi-level partitioning. But this is clear improvement. In that function we free the AppendInfos array which as many pointers long as the number of relids. So it doesn't make sense not to free the Relids which can be {largest relid}/8 bytes long at least. 0003 - patch to save and reuse commuted RestrictInfo. This patch by itself shows a small memory saving (3%) in the query below where the same clause is commuted twice. The query does not contain any partitioned tables. create table t2 (a int primary key, b int, c int); create index t2_a_b on t2(a, b); select * from t2 where 10 = a Memory used without patch: 13696 bytes Memory used with patch: 13264 bytes 0004 - Patch which implements the hash table of hash table described above and also code to avoid repeated RestrictInfo list translations. I will add this patchset to next commitfest. -- Best Wishes, Ashutosh Bapat From 3b9e8039c15c87f6dc83369419c85c4a0a3b5688 Mon Sep 17 00:00:00 2001 From: Ashutosh Bapat Date: Thu, 31 Aug 2023 19:02:23 +0530 Subject: [PATCH 2/4] Free intermediate Relids created in adjust_child_relids_multilevel() adjust_child_relids_multilevel() creates Relids for all the intermediate stages in a partition hierarchy. These relid sets are not required finally. Relids or Bitmapset take reasonably large space when thousands of partitions are invovled. Hence free these intermediate relid sets. Ashutosh Bapat --- src/backend/optimizer/util/appendinfo.c | 17 - 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c index f456b3b0a4..4008e52de2 100644 --- a/src/backend/optimizer/util/appendinfo.c +++ b/src/backend/optimizer/util/appendinfo.c @@ -591,6 +591,8 @@ adjust_child_relids_multilevel(PlannerInfo *root, Relids relids, { AppendRelInfo **appinfos; int nappinfos; + Relids tmp_relids = NULL; + Relids child_relids; /* * If the given relids set doesn't contain any of the parent relids, it @@ -599,13 +601,14 @@ adjust_child_relids_multilevel(PlannerInfo *root, Relids relids, if (!bms_overlap(relids, parentrel->relids)) return relids; + tmp_relids = relids; /* Recurse if immediate parent is not the top parent. */ if (childrel->parent
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning
I spent some time on 4th point below but also looked at other points. Here's what I have found so far On Thu, Jul 27, 2023 at 7:35 PM Ashutosh Bapat wrote: > > 1. The patch uses RestrictInfo::required_relids as the key for > searching child RelOptInfos. I am not sure which of the two viz. > required_relids and clause_relids is a better key. required_relids > seems to be a subset of clause_relids and from the description it > looks like that's the set that decides the applicability of a clause > in a join. But clause_relids is obtained from all the Vars that appear > in the clause, so may be that's the one that matters for the > translations. Can somebody guide me? I was wrong that required_relids is subset of clause_relids. The first can contain OJ relids which the other can not. OJ relids do not have any children, so they won't be translated. So clause_relids seems to be a better key. I haven't made a decision yet. > > 2. The patch adds two extra pointers per RestrictInfo. They will > remain unused when partitionwise join is not used. Right now, I do not > see any increase in memory consumed by planner because of those > pointers even in case of unpartitioned tables; maybe they are absorbed > in memory alignment. They may show up as extra memory in the future. I > am wondering whether we can instead save and track translations in > PlannerInfo as a hash table using whatever is the answer to above question) of parent and child > respectively> as key. That will just add one extra pointer in > PlannerInfo when partitionwise join is not used. Please let me know > your suggestions. I will go ahead with a pointer in PlannerInfo for now. > > 3. I have changed adjust_appendrel_attrs_mutator() to return a > translated RestrictInfo if it already exists. IOW, it won't always > return a deep copy of given RestrictInfo as it does today. This can be > fixed by writing wrappers around adjust_appendrel_attrs() to translate > RestrictInfo specifically. But maybe we don't always need deep copies. > Are there any cases when we need translated deep copies of > RestrictInfo? Those cases will require fixing callers of > adjust_appendrel_attrs() instead of the mutator. I think it's better to handle the tracking logic outside adjust_appendrel_attrs. That will be some code churn but it will be cleaner and won't affect anything other that partitionwise joins. > > 4. IIRC, when partitionwise join was implemented we had discussed > creating child RestrictInfos using a login similar to > build_joinrel_restrictlist(). That might be another way to build > RestrictInfo only once and use it multiple times. But we felt that it > was much harder problem to solve since it's not known which partitions > from joining partitioned tables will match and will be joined till we > enter try_partitionwise_join(), so the required RestrictInfos may not > be available in RelOptInfo::joininfo. Let me know your thoughts on > this. Here's some lengthy description of why I feel translations are better compared to computing restrictlist and joininfo for a child join from joining relation's joininfo Consider a query "select * from p, q, r where p.c1 = q.c1 and q.c1 = r.c1 and p.c2 + q.c2 < r.c2 and p.c3 != q.c3 and q.c3 != r.c3". The query has following clauses 1. p.c1 = q.c1 2. q.c1 = r.c1 3. p.c2 + q.c2 < r.c2 4. p.c3 != q.c3 5. q.c3 != r.c3 The first two clauses are added to EC machinery and do not appear in joininfo. They appear in restrictlist when we construct clauses in restrictlist from ECs. Let's ignore them for now. Assume that each table p, q, r has partitions (p1, p2, ...), (q1, q2, ...) and (r1, r2, ... ) respectively. Each triplet (pi, qi,ri) forms the set of matching partitions from p, q, r respectively for all "i". Consider join, p1q1r1. We will generate relations p1, q1, r1, p1q1, p1r1, q1r1 and p1q1r1 while building the last join. Below is description of how these clauses would look in each of these relations and the list they appear in when computing that join. Please notice the numeric suffixes carefully. p1. joininfo: p1.c2 + q.c2 < r.c2, p1.c3 != q.c3 restrictlist: <> q1 joininfo: p.c2 + q1.c2 < r.c2, p.c3 != q1.c3, q1.c3 != r.c3 restrictlist: <> r1 joininfo: p.c2 + q.c2 < r1.c2, q.c3 != r1.c3 restrictlist: <> p1q1 joininfo: p1.c2 + q1.c2 < r.c2, q1.c3 != r.c3 restrictlist: p1.c3 != q1.c3 q1r1 joininfo: p.c2 + q1.c2 < r1.c2, p.c3 != q1.c3 restrictlist: q1.c3 != r1.c3 p1r1 joininfo: p1.c2 + q.c2 < r1.c2, p1.c3 != q.c3, q.c3 != r1.c3 restrictlist: <> p1q1r1 joininfo: <> restrictlist for (p1q1)r1: p1.c2 + q1.c2 < r1.c2, q1.c3 != r1.c3 restrictlist for (p1r1)q1: p1.c2 + q1.c2 < r1.c2, p1.c3 != q1.c3, q1.c3 != r1.c3 restrictlist for p1(q1r1): p1.c2 + q1.c2 < r1.c2, p1.c3 != q1.c3 If we translate the clauses when building join e.g. translate p1.c3 != q1.c3 when building p1q1 or p1q1r1, it would cause repeated translations. So the translations need to be saved in lower relations when we detect matching partitions and then
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning
On Wed, Aug 9, 2023 at 10:12 AM David Rowley wrote: > > On Fri, 28 Jul 2023 at 02:06, Ashutosh Bapat > wrote: > > 0001 - to measure memory consumption during planning. This is the same > > one as attached to [1]. > I have started a separate thread to discuss this patch. I am taking this discussion to that thread. -- Best Wishes, Ashutosh Bapat
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning
On Fri, 28 Jul 2023 at 02:06, Ashutosh Bapat wrote: > 0001 - to measure memory consumption during planning. This is the same > one as attached to [1]. I see you're recording the difference in the CurrentMemoryContext of palloc'd memory before and after planning. That won't really alert us to problems if the planner briefly allocates, say 12GBs of memory during, say the join search then quickly pfree's it again. unless it's an oversized chunk, aset.c won't free() any memory until MemoryContextReset(). Chunks just go onto a freelist for reuse later. So at the end of planning, the context may still have that 12GBs malloc'd, yet your new EXPLAIN ANALYZE property might end up just reporting a tiny fraction of that. I wonder if it would be more useful to just have ExplainOneQuery() create a new memory context and switch to that and just report the context's mem_allocated at the end. It's also slightly annoying that these planner-related summary outputs are linked to EXPLAIN ANALYZE. We could be showing them in EXPLAIN without ANALYZE. If we were to change that now, it might be a bit annoying for the regression tests as we'd need to go and add SUMMARY OFF in a load of places... drowley@amd3990x:~/pg_src/src/test/regress/sql$ git grep -i "costs off" | wc -l 1592 hmm, that would cause a bit of churn... :-( David
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning
On Tue, Aug 8, 2023 at 4:08 PM Richard Guo wrote: > > > I haven't looked into the details but with 0002 patch I came across a > crash while planning the query below. > > regression=# set enable_partitionwise_join to on; > SET > regression=# EXPLAIN (COSTS OFF) > SELECT * FROM prt1 t1, prt2 t2 > WHERE t1.a = t2.b AND t1.a < 450 AND t2.b > 250 AND t1.b = 0; > server closed the connection unexpectedly Thanks for the report. PFA the patches with this issue fixed. There is another issue seen when running partition_join.sql "ERROR: index key does not match expected index column". I will investigate that issue. Right now I am looking at the 4th point in my first email in this thread. [1]. I am trying to figure whether that approach would work. If that approach doesn't work what's the best to track the translations and also figure out answers to 1, 2, 3 there. Please let me know your opinions if any. -- Best Wishes, Ashutosh Bapat [1] https://www.postgresql.org/message-id/CAExHW5s=bclmmq8n_bn6iu+pjau0ds3z_6dn6ile69esmsp...@mail.gmail.com From 4abdfd676a31d885ccdbe1803c60c1df1d0c1df2 Mon Sep 17 00:00:00 2001 From: Ashutosh Bapat Date: Wed, 12 Jul 2023 14:34:14 +0530 Subject: [PATCH 1/2] Report memory used for planning a query in EXPLAIN ANALYZE The memory used in the CurrentMemoryContext and its children is sampled before and after calling pg_plan_query() from ExplainOneQuery(). The difference in the two samples is reported as the memory consumed while planning the query. This may not account for the memory allocated in memory contexts which are not children of CurrentMemoryContext. These contexts are usually other long lived contexts, e.g. CacheMemoryContext, which are shared by all the queries run in a session. The consumption in those can not be attributed only to a given query and hence should not be reported any way. The memory consumption is reported as "Planning Memory" property in EXPLAIN ANALYZE output. Ashutosh Bapat --- src/backend/commands/explain.c| 12 ++-- src/backend/commands/prepare.c| 7 ++- src/backend/utils/mmgr/mcxt.c | 19 +++ src/include/commands/explain.h| 3 ++- src/include/utils/memutils.h | 1 + src/test/regress/expected/explain.out | 15 +++ 6 files changed, 49 insertions(+), 8 deletions(-) diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index 8570b14f62..9f859949f0 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -397,16 +397,20 @@ ExplainOneQuery(Query *query, int cursorOptions, planduration; BufferUsage bufusage_start, bufusage; + Size mem_consumed; if (es->buffers) bufusage_start = pgBufferUsage; INSTR_TIME_SET_CURRENT(planstart); + mem_consumed = MemoryContextMemUsed(CurrentMemoryContext); /* plan the query */ plan = pg_plan_query(query, queryString, cursorOptions, params); INSTR_TIME_SET_CURRENT(planduration); INSTR_TIME_SUBTRACT(planduration, planstart); + mem_consumed = MemoryContextMemUsed(CurrentMemoryContext) + - mem_consumed; /* calc differences of buffer counters. */ if (es->buffers) @@ -417,7 +421,7 @@ ExplainOneQuery(Query *query, int cursorOptions, /* run it (if needed) and produce output */ ExplainOnePlan(plan, into, es, queryString, params, queryEnv, - , (es->buffers ? : NULL)); + , (es->buffers ? : NULL), _consumed); } } @@ -527,7 +531,7 @@ void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es, const char *queryString, ParamListInfo params, QueryEnvironment *queryEnv, const instr_time *planduration, - const BufferUsage *bufusage) + const BufferUsage *bufusage, const Size *mem_consumed) { DestReceiver *dest; QueryDesc *queryDesc; @@ -628,6 +632,10 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es, double plantime = INSTR_TIME_GET_DOUBLE(*planduration); ExplainPropertyFloat("Planning Time", "ms", 1000.0 * plantime, 3, es); + + if (mem_consumed) + ExplainPropertyUInteger("Planning Memory", "bytes", + (uint64) *mem_consumed, es); } /* Print info about runtime of triggers */ diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c index 18f70319fc..84544ce481 100644 --- a/src/backend/commands/prepare.c +++ b/src/backend/commands/prepare.c @@ -583,10 +583,12 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es, instr_time planduration; BufferUsage bufusage_start, bufusage; + Size mem_consumed; if (es->buffers) bufusage_start = pgBufferUsage; INSTR_TIME_SET_CURRENT(planstart); + mem_consumed = MemoryContextMemUsed(CurrentMemoryContext); /* Look it up in the hash table */ entry = FetchPreparedStatement(execstmt->name, true); @@ -623,6 +625,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning
On Thu, Jul 27, 2023 at 10:06 PM Ashutosh Bapat < ashutosh.bapat@gmail.com> wrote: > 0002 - WIP patch to avoid repeated translations of RestrictInfo. > The WIP patch avoids repeated translations by tracking the child for > which a RestrictInfo is translated and reusing the same translation > every time it is requested. In order to track the translations, > RestrictInfo gets two new members. > 1. parent_rinfo - In a child's RestrictInfo this points to the > RestrictInfo applicable to the topmost parent in partition hierarchy. > This is NULL in the topmost parent's RestrictInfo > 2. child_rinfos - In a parent's RestrictInfo, this is a list that > contains all the translated child RestrictInfos. In child > RestrictInfos this is NULL. I haven't looked into the details but with 0002 patch I came across a crash while planning the query below. regression=# set enable_partitionwise_join to on; SET regression=# EXPLAIN (COSTS OFF) SELECT * FROM prt1 t1, prt2 t2 WHERE t1.a = t2.b AND t1.a < 450 AND t2.b > 250 AND t1.b = 0; server closed the connection unexpectedly Thanks Richard
Reducing memory consumed by RestrictInfo list translations in partitionwise join planning
Hi All, Following up on [1] ... A restrictlist is a list of RestrictInfo nodes, each representing one clause, applicable to a set of joins. When partitionwise join is used as a strategy, the restrictlists for child join are obtained by translating the restrictlists for the parent in try_partitionwise_join(). That function is called once for every join order. E.g. when computing join ABC, it will be called for planning joins AB, BC, AC, (AB)C, A(BC), B(AC). Every time it is called it will translate the given parent restrictlist. This means that a RestrictInfo node applicable to given child relations will be translated as many times as the join orders in which those child relations appear in different joining relations. For example, consider a query "select * from A, B, C where A.a = B.a and B.a = C.a" where A, B and C are partitioned tables. A has partitions A1, A2, ... An. B has partitions B1, B2, ... Bn and C has partitions C1, C2, ... Cn. Partitions Ai, Bi and Ci are matching partitions respectively for all i's. The join ABC is computed as Append of (A1B1C1, A2B2C2, ... AnBnCn). The clause A.a = B.a is translated to A1.a = B1.a thrice, when computing A1B1, A1(B1C1) and B1(A1C1) respectively. Similarly other clauses are translated multiple times. Some extra translations also happen in reparameterize_path_by_child(). These translations consume memory which remains allocated till the statement finishes. A ResrtictInfo should be translated only once per parent-child pair, thus avoiding consuming extra memory. There are two patches attached 0001 - to measure memory consumption during planning. This is the same one as attached to [1]. 0002 - WIP patch to avoid repeated translations of RestrictInfo. The WIP patch avoids repeated translations by tracking the child for which a RestrictInfo is translated and reusing the same translation every time it is requested. In order to track the translations, RestrictInfo gets two new members. 1. parent_rinfo - In a child's RestrictInfo this points to the RestrictInfo applicable to the topmost parent in partition hierarchy. This is NULL in the topmost parent's RestrictInfo 2. child_rinfos - In a parent's RestrictInfo, this is a list that contains all the translated child RestrictInfos. In child RestrictInfos this is NULL. Every translated RestrictInfo is stored in the top parent's RestrictInfo child_rinfos. RestrictInfo::required_relids is used as a key to search a given translation. I have intercepted adjust_appendrel_attrs_mutator() to track translations as well as avoid multiple translations. It first looks for an existing translation when translating a RestrictInfo and creates a new one only when one doesn't exist already. Using this patch the memory consumption for the above query reduces as follows Number of partitions: 1000 Number of tables | without patch | with patch | % reduction | being joined ||| | -- 2 | 40.3 MiB | 37.7 MiB | 6.43% | 3 | 146.8 MiB | 133.0 MiB | 9.42% | 4 | 445.4 MiB | 389.5 MiB | 12.57% | 5 |1563.2 MiB | 1243.2 MiB | 20.47% | The number of times a RestrictInfo requires to be translated increases exponentially with the number of tables joined. Thus we see more memory saved as the number of tables joined increases. When two tables are joined there's only a single join planned so no extra translations happen in try_partitionwise_join(). The memory saved in case of 2 joining tables comes from avoiding extra translations happening during reparameterization of paths (in reparameterize_path_by_child()). The attached patch is to show how much memory can be saved if we avoid extra translation. But I want to discuss the following things about the approach. 1. The patch uses RestrictInfo::required_relids as the key for searching child RelOptInfos. I am not sure which of the two viz. required_relids and clause_relids is a better key. required_relids seems to be a subset of clause_relids and from the description it looks like that's the set that decides the applicability of a clause in a join. But clause_relids is obtained from all the Vars that appear in the clause, so may be that's the one that matters for the translations. Can somebody guide me? 2. The patch adds two extra pointers per RestrictInfo. They will remain unused when partitionwise join is not used. Right now, I do not see any increase in memory consumed by planner because of those pointers even in case of unpartitioned tables; maybe they are absorbed in memory alignment. They may show up as extra memory in the future. I am wondering whether we can instead save and track translations in PlannerInfo as a hash table using as key. That will just add one extra pointer in PlannerInfo when partitionwise join is not used. Please let me know your suggestions. 3. I have changed