Re: A performance issue with Memoize
On Thu, Feb 1, 2024 at 3:43 PM Anthonin Bonnefoy <anthonin.bonne...@datadoghq.com> wrote:
> Hi,
>
> I've seen a similar issue with the following query (tested on the current
> head):
>
> EXPLAIN ANALYZE SELECT * FROM tenk1 t1
>   LEFT JOIN LATERAL (SELECT t1.two, tenk2.hundred, tenk2.two FROM tenk2) t2
>   ON t1.hundred = t2.hundred WHERE t1.hundred < 5;
>                                                               QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------------------
>  Nested Loop Left Join  (cost=8.46..1495.10 rows=5 width=256) (actual time=0.860..111.013 rows=5 loops=1)
>    ->  Bitmap Heap Scan on tenk1 t1  (cost=8.16..376.77 rows=500 width=244) (actual time=0.798..1.418 rows=500 loops=1)
>          Recheck Cond: (hundred < 5)
>          Heap Blocks: exact=263
>          ->  Bitmap Index Scan on tenk1_hundred  (cost=0.00..8.04 rows=500 width=0) (actual time=0.230..0.230 rows=500 loops=1)
>                Index Cond: (hundred < 5)
>    ->  Memoize  (cost=0.30..4.89 rows=100 width=12) (actual time=0.009..0.180 rows=100 loops=500)
>          Cache Key: t1.hundred
>          Cache Mode: logical
>          Hits: 0  Misses: 500  Evictions: 499  Overflows: 0  Memory Usage: 5kB
>          ->  Index Scan using tenk2_hundred on tenk2  (cost=0.29..4.88 rows=100 width=12) (actual time=0.007..0.124 rows=100 loops=500)
>                Index Cond: (hundred = t1.hundred)
>  Planning Time: 0.661 ms
>  Execution Time: 113.076 ms
> (14 rows)
>
> The Memoize node's cache key only uses t1.hundred, while the nested loop has
> two changed parameters: the lateral var t1.two and t1.hundred. This
> leads to a chgParam that is always different, and the cache is purged
> on each rescan.

Thanks for the report!  This issue is caused by the fact that we fail to
check PHVs for lateral references, and hence cannot know that t1.two
should also be included in the cache keys.  I reported exactly the same
issue in [1], and verified that it can be fixed by the patch in [2]
(which seems to require a rebase).
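The purge behaviour described above can be modeled outside the executor. The sketch below is a toy Python model (not PostgreSQL code, and all names are illustrative): when the set of parameters that changed on rescan is not covered by the cache keys, the node has no choice but to flush the whole cache, so a key set that misses t1.two never gets a hit.

```python
class ToyMemoize:
    """Toy model of nodeMemoize's rescan behaviour (illustrative only)."""

    def __init__(self, keyparamids):
        self.keyparamids = set(keyparamids)  # param IDs covered by the cache key
        self.cache = {}
        self.hits = self.misses = 0

    def rescan(self, chgparam, params, compute):
        # If some changed param is not part of the cache key, cached entries
        # may be stale in ways the key cannot distinguish: purge everything.
        if not set(chgparam) <= self.keyparamids:
            self.cache.clear()
        key = tuple(params[p] for p in self.keyparamids)
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        self.cache[key] = compute(params)
        return self.cache[key]

# Cache key covers only param 1, but params 0 and 1 both change on each
# rescan, mimicking 'Cache Key: t1.hundred' with chgParam = {0, 1}:
# every rescan purges the cache, so repeated key values never hit.
memo = ToyMemoize(keyparamids=[1])
for hundred in [0, 1, 0, 1]:
    memo.rescan(chgparam={0, 1}, params={0: hundred % 2, 1: hundred},
                compute=lambda ps: ps[1] * 10)
print(memo.hits, memo.misses)   # no hits despite repeated key values
```

With `keyparamids=[0, 1]` (i.e. chgParam fully covered by the cache keys), the same loop would get cache hits on the repeated values instead.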
[1] https://www.postgresql.org/message-id/CAMbWs4_imG5C8rXt7xdU7zf6whUDc2rdDun%2BVtrowcmxb41CzA%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAMbWs49%2BCjoy0S0xkCRDcHXGHvsYLOdvr9jq9OTONOBnsgzXOg%40mail.gmail.com

Thanks
Richard
Re: A performance issue with Memoize
Hi,

I've seen a similar issue with the following query (tested on the current head):

EXPLAIN ANALYZE SELECT * FROM tenk1 t1
  LEFT JOIN LATERAL (SELECT t1.two, tenk2.hundred, tenk2.two FROM tenk2) t2
  ON t1.hundred = t2.hundred WHERE t1.hundred < 5;
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=8.46..1495.10 rows=5 width=256) (actual time=0.860..111.013 rows=5 loops=1)
   ->  Bitmap Heap Scan on tenk1 t1  (cost=8.16..376.77 rows=500 width=244) (actual time=0.798..1.418 rows=500 loops=1)
         Recheck Cond: (hundred < 5)
         Heap Blocks: exact=263
         ->  Bitmap Index Scan on tenk1_hundred  (cost=0.00..8.04 rows=500 width=0) (actual time=0.230..0.230 rows=500 loops=1)
               Index Cond: (hundred < 5)
   ->  Memoize  (cost=0.30..4.89 rows=100 width=12) (actual time=0.009..0.180 rows=100 loops=500)
         Cache Key: t1.hundred
         Cache Mode: logical
         Hits: 0  Misses: 500  Evictions: 499  Overflows: 0  Memory Usage: 5kB
         ->  Index Scan using tenk2_hundred on tenk2  (cost=0.29..4.88 rows=100 width=12) (actual time=0.007..0.124 rows=100 loops=500)
               Index Cond: (hundred = t1.hundred)
 Planning Time: 0.661 ms
 Execution Time: 113.076 ms
(14 rows)

The Memoize node's cache key only uses t1.hundred, while the nested loop
has two changed parameters: the lateral var t1.two and t1.hundred.  This
leads to a chgParam that is always different, and the cache is purged on
each rescan.

Regards,
Anthonin

On Sat, Jan 27, 2024 at 5:00 AM Alexander Lakhin wrote:
>
> Hello,
>
> 27.01.2024 00:09, Tom Lane wrote:
> > David Rowley writes:
> >> On Sat, 27 Jan 2024 at 09:41, Tom Lane wrote:
> >>> drongo and fairywren are consistently failing the test case added
> >>> by this commit.  I'm not quite sure why the behavior of Memoize
> >>> would be platform-specific when we're dealing with integers,
> >>> but ...
> >> Maybe snprintf(buf, "%.*f", 0, 5.0 / 2.0); results in "3" on those
> >> rather than "2"?
> >> Looking at the code in fmtfloat(), we fall back on the built-in snprintf.
> > Maybe ... I don't have a better theory.
>
> FWIW, I've found where this behaviour is documented:
> https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/sprintf-sprintf-l-swprintf-swprintf-l-swprintf-l?view=msvc-170
>
> (I've remembered a case with test/sql/partition_prune from 2020, where
> sprintf on Windows worked the other way.)
>
> Best regards,
> Alexander
Re: A performance issue with Memoize
Hello,

27.01.2024 00:09, Tom Lane wrote:
> David Rowley writes:
>> On Sat, 27 Jan 2024 at 09:41, Tom Lane wrote:
>>> drongo and fairywren are consistently failing the test case added
>>> by this commit.  I'm not quite sure why the behavior of Memoize
>>> would be platform-specific when we're dealing with integers,
>>> but ...
>> Maybe snprintf(buf, "%.*f", 0, 5.0 / 2.0); results in "3" on those
>> rather than "2"?
>> Looking at the code in fmtfloat(), we fall back on the built-in snprintf.
> Maybe ... I don't have a better theory.

FWIW, I've found where this behaviour is documented:
https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/sprintf-sprintf-l-swprintf-swprintf-l-swprintf-l?view=msvc-170

(I've remembered a case with test/sql/partition_prune from 2020, where
sprintf on Windows worked the other way.)

Best regards,
Alexander
Re: A performance issue with Memoize
David Rowley writes:
> On Sat, 27 Jan 2024 at 09:41, Tom Lane wrote:
>> drongo and fairywren are consistently failing the test case added
>> by this commit.  I'm not quite sure why the behavior of Memoize
>> would be platform-specific when we're dealing with integers,
>> but ...

> Maybe snprintf(buf, "%.*f", 0, 5.0 / 2.0); results in "3" on those
> rather than "2"?
> Looking at the code in fmtfloat(), we fall back on the built-in snprintf.

Maybe ... I don't have a better theory.

> I can try changing the unique1 < 5 to unique1 < 4 to see if that's more
> stable.

Worth a try.

			regards, tom lane
Re: A performance issue with Memoize
On Sat, 27 Jan 2024 at 09:41, Tom Lane wrote:
>
> David Rowley writes:
> > I've adjusted the comments to what you mentioned and also leaned out
> > the pretty expensive test case to something that'll run much faster
> > and pushed the result.
>
> drongo and fairywren are consistently failing the test case added
> by this commit.  I'm not quite sure why the behavior of Memoize
> would be platform-specific when we're dealing with integers,
> but ...

Maybe snprintf(buf, "%.*f", 0, 5.0 / 2.0); results in "3" on those
rather than "2"?  Looking at the code in fmtfloat(), we fall back on the
built-in snprintf.

I can try changing the unique1 < 5 to unique1 < 4 to see if that's more
stable.

David
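For what it's worth, the suspected rounding difference is easy to see outside of C: 5.0 / 2.0 is exactly 2.5 in binary floating point, so formatting it with zero decimal places depends entirely on the midpoint rounding rule. IEEE-754 correct rounding is round-half-to-even, which yields "2"; a C runtime that rounds half away from zero (as some older Windows CRTs did) prints "3". Python's float formatting also rounds half to even, so a quick illustration:

```python
# 5.0 / 2.0 is exactly representable as 2.5, so the formatted result
# depends only on the midpoint rounding rule of the formatter.
value = 5.0 / 2.0
print(f"{value:.0f}")   # round-half-to-even: "2"; half-away-from-zero would print "3"
print(f"{3.5:.0f}")     # half-to-even rounds 3.5 up to "4", since 4 is even
```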
Re: A performance issue with Memoize
David Rowley writes:
> I've adjusted the comments to what you mentioned and also leaned out
> the pretty expensive test case to something that'll run much faster
> and pushed the result.

drongo and fairywren are consistently failing the test case added
by this commit.  I'm not quite sure why the behavior of Memoize
would be platform-specific when we're dealing with integers,
but ...

			regards, tom lane
Re: A performance issue with Memoize
On Fri, 26 Jan 2024 at 19:03, Richard Guo wrote:
> At first I wondered if we should assume that the same param expr must
> have the same equality operator.  If not, we should also check the
> operator to tell if the cache key is a duplicate, like
>
> -       if (!list_member(*param_exprs, expr))
> +       if (!list_member(*param_exprs, expr) ||
> +           !list_member_oid(*operators, hasheqoperator))

Hmm, if that were the case you wouldn't do it that way.  You'd need to
forboth() and look for an item at the same position in both lists
matching the search.

> But after looking at how rinfo->left_hasheqoperator/right_hasheqoperator
> is set, it seems we can assume that: the operator is from the type cache
> entry which is fetched according to the expr datatype.

Yip.

> So I think the patch makes sense.  +1.

Thanks for reviewing.  I've pushed the patch.

David
Re: A performance issue with Memoize
On Fri, Jan 26, 2024 at 12:18 PM David Rowley wrote:
> On Fri, 26 Jan 2024 at 16:51, Tom Lane wrote:
> > >> However ... it seems like we're not out of the woods yet.  Why
> > >> is Richard's proposed test case still showing
> > >> +   ->  Memoize (actual rows=5000 loops=N)
> > >> +         Cache Key: t1.two, t1.two
> > >> Seems like there is missing de-duplication logic, or something.
> >
> > > This seems separate and isn't quite causing the same problems as what
> > > Richard wants to fix so I didn't touch this for now.
> >
> > Fair enough, but I think it might be worth pursuing later.
>
> Here's a patch for that.

At first I wondered if we should assume that the same param expr must
have the same equality operator.  If not, we should also check the
operator to tell if the cache key is a duplicate, like

-       if (!list_member(*param_exprs, expr))
+       if (!list_member(*param_exprs, expr) ||
+           !list_member_oid(*operators, hasheqoperator))

But after looking at how rinfo->left_hasheqoperator/right_hasheqoperator
is set, it seems we can assume that: the operator is from the type cache
entry which is fetched according to the expr datatype.  So I think the
patch makes sense.  +1.

Thanks
Richard
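David's point about needing forboth() can be illustrated with a toy version of the two checks (Python pseudocode standing in for the C list API, not the actual patch): testing membership in the two lists independently wrongly skips an (expr, operator) pair whose parts each appear somewhere in the lists, just not together; a forboth()-style pairwise scan only skips genuine duplicates.

```python
def add_key_independent(param_exprs, operators, expr, op):
    # Broken variant from the quoted sketch: checks each list separately,
    # like list_member(...) || list_member_oid(...).
    if expr not in param_exprs or op not in operators:
        param_exprs.append(expr)
        operators.append(op)

def add_key_pairwise(param_exprs, operators, expr, op):
    # forboth()-style: the pair is a duplicate only if expr and op
    # appear together at the same list position.
    for e, o in zip(param_exprs, operators):
        if e == expr and o == op:
            return
    param_exprs.append(expr)
    operators.append(op)

# Lists already hold the pairs ("x", 1) and ("y", 2).  The pair ("x", 2)
# is new, but "x" and 2 each occur somewhere, so the independent check
# wrongly treats it as a duplicate.
exprs, ops = ["x", "y"], [1, 2]
add_key_independent(exprs, ops, "x", 2)
print(list(zip(exprs, ops)))      # ("x", 2) was silently dropped

exprs2, ops2 = ["x", "y"], [1, 2]
add_key_pairwise(exprs2, ops2, "x", 2)
print(list(zip(exprs2, ops2)))    # ("x", 2) correctly appended
```

As Richard notes, the committed fix sidesteps this entirely: since the operator is derived from the expr's datatype via the type cache, the same expr always carries the same operator, so a plain membership test on param_exprs suffices.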
Re: A performance issue with Memoize
On Fri, Jan 26, 2024 at 1:22 AM Tom Lane wrote:
> Apologies for not having noticed this thread before.  I'm taking
> a look at it now.  However, while sniffing around this I found
> what seems like an oversight in paramassign.c's
> assign_param_for_var(): it says it should compare all the same
> fields as _equalVar except for varlevelsup, but it's failing to
> compare varnullingrels.  Is that a bug?  It's conceivable that
> it's not possible to get here with varnullingrels different and
> all else the same, but I don't feel good about that proposition.
>
> I tried adding
>
> @@ -91,7 +91,10 @@ assign_param_for_var(PlannerInfo *root, Var *var)
>                  pvar->vartype == var->vartype &&
>                  pvar->vartypmod == var->vartypmod &&
>                  pvar->varcollid == var->varcollid)
> +            {
> +                Assert(bms_equal(pvar->varnullingrels,
> +                                 var->varnullingrels));
>                  return pitem->paramId;
> +            }
>          }
>      }

Yeah, I think it should be safe to assert that the varnullingrels is
equal here.  The Var is supposed to be an upper-level Var, and two same
such Vars should not have different varnullingrels at this point,
although the varnullingrels might be adjusted later in
identify_current_nestloop_params according to which form of identity 3
we end up applying.

Thanks
Richard
Re: A performance issue with Memoize
On Fri, 26 Jan 2024 at 16:51, Tom Lane wrote:
> >> However ... it seems like we're not out of the woods yet.  Why
> >> is Richard's proposed test case still showing
> >> +   ->  Memoize (actual rows=5000 loops=N)
> >> +         Cache Key: t1.two, t1.two
> >> Seems like there is missing de-duplication logic, or something.
>
> > This seems separate and isn't quite causing the same problems as what
> > Richard wants to fix so I didn't touch this for now.
>
> Fair enough, but I think it might be worth pursuing later.

Here's a patch for that.

David

diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index c0ba087b40..8ecf9fe8dc 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -492,8 +492,16 @@ paraminfo_get_equal_hashops(PlannerInfo *root, ParamPathInfo *param_info,
 			return false;
 		}
 
-		*operators = lappend_oid(*operators, hasheqoperator);
-		*param_exprs = lappend(*param_exprs, expr);
+		/*
+		 * 'expr' may already exist as a parameter from a previous
+		 * ppi_clauses.  No need to include it again, however we'd better
+		 * ensure we do switch into binary mode if required.  See below.
+		 */
+		if (!list_member(*param_exprs, expr))
+		{
+			*operators = lappend_oid(*operators, hasheqoperator);
+			*param_exprs = lappend(*param_exprs, expr);
+		}
 
 		/*
 		 * When the join operator is not hashable then it's possible that
@@ -536,8 +544,16 @@ paraminfo_get_equal_hashops(PlannerInfo *root, ParamPathInfo *param_info,
 			return false;
 		}
 
-		*operators = lappend_oid(*operators, typentry->eq_opr);
-		*param_exprs = lappend(*param_exprs, expr);
+		/*
+		 * 'expr' may already exist as a parameter from the ppi_clauses.  No
+		 * need to include it again, however we'd better ensure we do switch
+		 * into binary mode.
+		 */
+		if (!list_member(*param_exprs, expr))
+		{
+			*operators = lappend_oid(*operators, typentry->eq_opr);
+			*param_exprs = lappend(*param_exprs, expr);
+		}
 
 		/*
 		 * We must go into binary mode as we don't have too much of an idea of
diff --git a/src/test/regress/expected/memoize.out b/src/test/regress/expected/memoize.out
index ca198ec3b8..17bb3c8661 100644
--- a/src/test/regress/expected/memoize.out
+++ b/src/test/regress/expected/memoize.out
@@ -107,7 +107,7 @@ WHERE t1.unique1 < 10;', false);
    ->  Index Scan using tenk1_unique1 on tenk1 t1 (actual rows=10 loops=N)
          Index Cond: (unique1 < 10)
    ->  Memoize (actual rows=2 loops=N)
-         Cache Key: t1.two, t1.two
+         Cache Key: t1.two
          Cache Mode: binary
          Hits: 8  Misses: 2  Evictions: Zero  Overflows: 0  Memory Usage: NkB
          ->  Subquery Scan on t2 (actual rows=2 loops=N)
Re: A performance issue with Memoize
David Rowley writes:
> I've adjusted the comments to what you mentioned and also leaned out
> the pretty expensive test case to something that'll run much faster
> and pushed the result.

+1, I was wondering if the test could be cheaper.  It wasn't horrid
as Richard had it, but core regression tests add up over time.

>> However ... it seems like we're not out of the woods yet.  Why
>> is Richard's proposed test case still showing
>> +   ->  Memoize (actual rows=5000 loops=N)
>> +         Cache Key: t1.two, t1.two
>> Seems like there is missing de-duplication logic, or something.

> This seems separate and isn't quite causing the same problems as what
> Richard wants to fix so I didn't touch this for now.

Fair enough, but I think it might be worth pursuing later.

			regards, tom lane
Re: A performance issue with Memoize
On Fri, 26 Jan 2024 at 07:32, Tom Lane wrote:
>
> David Rowley writes:
> > I'd feel better about doing it your way if Tom could comment on if
> > there was a reason he put the function calls that way around in
> > 5ebaaa494.
>
> I'm fairly sure I thought it wouldn't matter because of the Param
> de-duplication done in paramassign.c.  However, Richard's example
> shows that's not so, because process_subquery_nestloop_params is
> picky about the param ID assigned to a particular Var while
> replace_nestloop_params is not.  So flipping the order makes sense.

Makes sense.  I've adjusted the comments to what you mentioned and also
leaned out the pretty expensive test case to something that'll run much
faster and pushed the result.

> However ... it seems like we're not out of the woods yet.  Why
> is Richard's proposed test case still showing
>
> +   ->  Memoize (actual rows=5000 loops=N)
> +         Cache Key: t1.two, t1.two
>
> Seems like there is missing de-duplication logic, or something.

This seems separate and isn't quite causing the same problems as what
Richard wants to fix, so I didn't touch this for now.

David
Re: A performance issue with Memoize
On Fri, Jan 26, 2024 at 2:32 AM Tom Lane wrote:
> I'm fairly sure I thought it wouldn't matter because of the Param
> de-duplication done in paramassign.c.  However, Richard's example
> shows that's not so, because process_subquery_nestloop_params is
> picky about the param ID assigned to a particular Var while
> replace_nestloop_params is not.  So flipping the order makes sense.
> I'd change the comment though, maybe to
>
>     /*
>      * Replace any outer-relation variables with nestloop params.
>      *
>      * We must provide nestloop params for both lateral references of
>      * the subquery and outer vars in the scan_clauses.  It's better
>      * to assign the former first, because that code path requires
>      * specific param IDs, while replace_nestloop_params can adapt
>      * to the IDs assigned by process_subquery_nestloop_params.
>      * This avoids possibly duplicating nestloop params when the
>      * same Var is needed for both reasons.
>      */

+1.  It's much better.

> However ... it seems like we're not out of the woods yet.  Why
> is Richard's proposed test case still showing
>
> +   ->  Memoize (actual rows=5000 loops=N)
> +         Cache Key: t1.two, t1.two
>
> Seems like there is missing de-duplication logic, or something.

When we collect the cache keys in paraminfo_get_equal_hashops() we
search param_info's ppi_clauses as well as innerrel's lateral_vars for
outer expressions.  We do not perform de-duplication on the collected
outer expressions there.  In my proposed test case, the same Var
't1.two' appears both in the param_info->ppi_clauses and in the
innerrel->lateral_vars, so we see two identical cache keys in the plan.
I noticed this before and wondered if we should do de-duplication on
the cache keys, but somehow I did not chase this to the ground.

Thanks
Richard
Re: A performance issue with Memoize
David Rowley writes:
> I think fixing it your way makes sense.  I don't really see any reason
> why we should have two.  However...

> Another way it *could* be fixed would be to get rid of pull_paramids()
> and change create_memoize_plan() to set keyparamids to all the param
> IDs that are equal() to each of the param_exprs.  That way nodeMemoize
> wouldn't purge the cache as we'd know the changing param is accounted
> for in the cache.  For the record, I don't think that's better, but it
> scares me a bit less as I don't know what other repercussions there
> are of applying your patch to get rid of the duplicate
> NestLoopParam.paramval.

> I'd feel better about doing it your way if Tom could comment on if
> there was a reason he put the function calls that way around in
> 5ebaaa494.

I'm fairly sure I thought it wouldn't matter because of the Param
de-duplication done in paramassign.c.  However, Richard's example
shows that's not so, because process_subquery_nestloop_params is
picky about the param ID assigned to a particular Var while
replace_nestloop_params is not.  So flipping the order makes sense.
I'd change the comment though, maybe to

    /*
     * Replace any outer-relation variables with nestloop params.
     *
     * We must provide nestloop params for both lateral references of
     * the subquery and outer vars in the scan_clauses.  It's better
     * to assign the former first, because that code path requires
     * specific param IDs, while replace_nestloop_params can adapt
     * to the IDs assigned by process_subquery_nestloop_params.
     * This avoids possibly duplicating nestloop params when the
     * same Var is needed for both reasons.
     */

However ... it seems like we're not out of the woods yet.  Why
is Richard's proposed test case still showing

+   ->  Memoize (actual rows=5000 loops=N)
+         Cache Key: t1.two, t1.two

Seems like there is missing de-duplication logic, or something.

> I also feel like we might be getting a bit close to the minor version
> releases to be adjusting this stuff in back branches.

Yeah, I'm not sure I would change this in the back branches.

			regards, tom lane
Re: A performance issue with Memoize
David Rowley writes:
> I'd feel better about doing it your way if Tom could comment on if
> there was a reason he put the function calls that way around in
> 5ebaaa494.

Apologies for not having noticed this thread before.  I'm taking
a look at it now.  However, while sniffing around this I found
what seems like an oversight in paramassign.c's
assign_param_for_var(): it says it should compare all the same
fields as _equalVar except for varlevelsup, but it's failing to
compare varnullingrels.  Is that a bug?  It's conceivable that
it's not possible to get here with varnullingrels different and
all else the same, but I don't feel good about that proposition.

I tried adding

@@ -91,7 +91,10 @@ assign_param_for_var(PlannerInfo *root, Var *var)
                 pvar->vartype == var->vartype &&
                 pvar->vartypmod == var->vartypmod &&
                 pvar->varcollid == var->varcollid)
+            {
+                Assert(bms_equal(pvar->varnullingrels,
+                                 var->varnullingrels));
                 return pitem->paramId;
+            }
         }
     }

This triggers no failures in the regression tests, but we know how
little that proves.

Anyway, that's just a side observation unrelated to the problem at
hand.  More later.

			regards, tom lane
Re: A performance issue with Memoize
On Fri, 20 Oct 2023 at 23:40, Richard Guo wrote:
> The Memoize runtime stats 'Hits: 0 Misses: 1 Evictions: '
> seems suspicious to me, so I've looked into it a little bit, and found
> that the MemoizeState's keyparamids and its outerPlan's chgParam are
> always different, and that makes us have to purge the entire cache each
> time we rescan the memoize node.
>
> But why are they always different?  Well, for the query above, we have
> two NestLoopParam nodes, one (with paramno 1) is created when we replace
> outer-relation Vars in the scan qual 't1.two = s.two', the other one
> (with paramno 0) is added from the subquery's subplan_params, which was
> created when we replaced uplevel vars with Param nodes for the subquery.
> That is to say, the chgParam would be {0, 1}.
>
> When it comes to replace outer-relation Vars in the memoize keys, the
> two 't1.two' Vars are both replaced with the NestLoopParam with paramno
> 1, because it is the first NLP we see in root->curOuterParams that is
> equal to the Vars in memoize keys.  That is to say, the memoize node's
> keyparamids is {1}.

I see the function calls were put this way around in 5ebaaa494
(Implement SQL-standard LATERAL subqueries.), per:

@@ -1640,6 +1641,7 @@ create_subqueryscan_plan(PlannerInfo *root, Path *best_path,
 	{
 		scan_clauses = (List *)
 			replace_nestloop_params(root, (Node *) scan_clauses);
+		identify_nestloop_extparams(root, best_path->parent->subplan);
 	}

(identify_nestloop_extparams was later renamed to
process_subquery_nestloop_params in 46c508fbc.)

I think fixing it your way makes sense.  I don't really see any reason
why we should have two.  However...

Another way it *could* be fixed would be to get rid of pull_paramids()
and change create_memoize_plan() to set keyparamids to all the param
IDs that are equal() to each of the param_exprs.  That way nodeMemoize
wouldn't purge the cache, as we'd know the changing param is accounted
for in the cache.

For the record, I don't think that's better, but it scares me a bit
less, as I don't know what other repercussions there are of applying
your patch to get rid of the duplicate NestLoopParam.paramval.

I'd feel better about doing it your way if Tom could comment on if
there was a reason he put the function calls that way around in
5ebaaa494.

I also feel like we might be getting a bit close to the minor version
releases to be adjusting this stuff in back branches.

David
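The alternative David describes (sketched below in toy Python as a thought experiment, not as what was committed) amounts to widening keyparamids: instead of taking the param ID of whichever NestLoopParam happens to be found first in root->curOuterParams, collect every param ID whose expression is equal() to a cache-key expression, so the chgParam set is always covered and the cache need not be purged.

```python
def keyparamids_first_match(param_exprs, nlparams):
    # Models the current behaviour: each cache-key expr maps to the first
    # NestLoopParam whose expression compares equal, then stops.
    ids = set()
    for expr in param_exprs:
        for paramid, pexpr in nlparams:
            if pexpr == expr:
                ids.add(paramid)
                break
    return ids

def keyparamids_all_matches(param_exprs, nlparams):
    # Models the sketched alternative: every param whose expression equals
    # some cache-key expr is counted as covered by the cache key.
    return {paramid for paramid, pexpr in nlparams
            for expr in param_exprs if pexpr == expr}

# Two NestLoopParams both carry 't1.two', as in Richard's example: paramno 1
# from the scan clauses (seen first in curOuterParams) and paramno 0 from
# the subquery's subplan_params.
nlparams = [(1, "t1.two"), (0, "t1.two")]
cache_keys = ["t1.two", "t1.two"]
print(keyparamids_first_match(cache_keys, nlparams))  # {1}: chgParam {0, 1} not covered
print(keyparamids_all_matches(cache_keys, nlparams))  # {0, 1}: rescans could hit the cache
```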
Re: A performance issue with Memoize
Hi Richard,

I can tell this is a real-world problem.  I have seen this multiple
times in production.  The fix seems surprisingly simple.

I hope my questions here aren't completely off.  I still struggle to
think about the implications.  I wonder if there is any stuff we are
breaking by bluntly forgetting about the subplan params.  Maybe some
table-valued function scan within a subquery scan?  Or something about
casts on a join condition that could be performed differently?  I
wasn't able to construct a problem case.

I might just be missing context here, but I am not yet fully convinced
that this is safe to do in all cases.

Regards
Arne
Re: A performance issue with Memoize
On Tue, Oct 31, 2023 at 1:36 PM Andrei Lepikhov wrote:
> On 30/10/2023 14:55, Richard Guo wrote:
> > On Thu, Oct 26, 2023 at 12:07 PM Andrei Lepikhov wrote:
> >> Do you've thought about the case, fixed with the commit 1db5667?  As I
> >> see, that bugfix still isn't covered by regression tests.  Could your
> >> approach of a PARAM_EXEC slot reusing break that case?
> >
> > Hm, I don't think so.  The issue fixed by commit 1db5667 was caused by
> > sharing PARAM_EXEC slots between different levels of NestLoop.  AFAICS
> > it's safe to share PARAM_EXEC slots within the same level of NestLoop.
> >
> > The change here is about sharing PARAM_EXEC slots between subquery's
> > subplan_params and outer-relation variables, which happens within the
> > same level of NestLoop.
> > ...
> > Did you notice a case that the change here breaks?
> >
> > Hi Tom, could you share your insights on this issue and the proposed
> > fix?
>
> I think your patch works correctly so far.  I mentioned the commit
> 1db5667 because, as I see, the origin of the problem was parallel
> workers.  I have thought about pushing Memoize down to a parallel worker
> and couldn't imagine whether such a solution would be correct.
> Sorry if I disturbed you in vain.

Thanks for mentioning commit 1db5667, which brings my attention to more
aspects of the PARAM_EXEC mechanism.  I don't think the discussion is in
vain; it helps a lot.  Thanks for looking into this patch.

Thanks
Richard
Re: A performance issue with Memoize
On 30/10/2023 14:55, Richard Guo wrote:
> On Thu, Oct 26, 2023 at 12:07 PM Andrei Lepikhov wrote:
>> Do you've thought about the case, fixed with the commit 1db5667?  As I
>> see, that bugfix still isn't covered by regression tests.  Could your
>> approach of a PARAM_EXEC slot reusing break that case?
>
> Hm, I don't think so.  The issue fixed by commit 1db5667 was caused by
> sharing PARAM_EXEC slots between different levels of NestLoop.  AFAICS
> it's safe to share PARAM_EXEC slots within the same level of NestLoop.
>
> The change here is about sharing PARAM_EXEC slots between subquery's
> subplan_params and outer-relation variables, which happens within the
> same level of NestLoop.
> ...
> Did you notice a case that the change here breaks?
>
> Hi Tom, could you share your insights on this issue and the proposed
> fix?

I think your patch works correctly so far.  I mentioned the commit
1db5667 because, as I see, the origin of the problem was parallel
workers.  I have thought about pushing Memoize down to a parallel worker
and couldn't imagine whether such a solution would be correct.  Sorry if
I disturbed you in vain.

-- 
regards, Andrei Lepikhov
Postgres Professional
Re: A performance issue with Memoize
On Thu, Oct 26, 2023 at 12:07 PM Andrei Lepikhov wrote:
> Do you've thought about the case, fixed with the commit 1db5667?  As I
> see, that bugfix still isn't covered by regression tests.  Could your
> approach of a PARAM_EXEC slot reusing break that case?

Hm, I don't think so.  The issue fixed by commit 1db5667 was caused by
sharing PARAM_EXEC slots between different levels of NestLoop.  AFAICS
it's safe to share PARAM_EXEC slots within the same level of NestLoop.

The change here is about sharing PARAM_EXEC slots between subquery's
subplan_params and outer-relation variables, which happens within the
same level of NestLoop.

Actually, even without this change, we'd still share PARAM_EXEC slots
between subquery's subplan_params and outer-relation variables in some
cases.  As an example, consider

explain (costs off)
select * from t t1 left join
    (t t2 left join
     lateral (select t1.a as t1a, t2.a as t2a, * from t t3) s
     on t2.b = s.b)
on t1.b = s.b and t1.a = t2.a;
                   QUERY PLAN
-------------------------------------------------
 Nested Loop Left Join
   ->  Seq Scan on t t1
   ->  Nested Loop
         Join Filter: (t1.a = t2.a)
         ->  Seq Scan on t t2
         ->  Subquery Scan on s
               Filter: ((t1.b = s.b) AND (t2.b = s.b))
               ->  Seq Scan on t t3
(8 rows)

For outer-relation Var 't1.a' from qual 't1.a = t2.a', it shares
PARAM_EXEC slot 0 with the PlannerParamItem for 't1.a' within the
subquery (from its targetlist).

Did you notice a case that the change here breaks?

Hi Tom, could you share your insights on this issue and the proposed
fix?

Thanks
Richard
Re: A performance issue with Memoize
On 20/10/2023 17:40, Richard Guo wrote:
> I noticed $subject with the query below.
>
> set enable_memoize to off;
>
> explain (analyze, costs off)
> select * from tenk1 t1 left join lateral
>     (select t1.two as t1two, * from tenk1 t2 offset 0) s
>   on t1.two = s.two;
>                                 QUERY PLAN
> --------------------------------------------------------------------------
>  Nested Loop Left Join (actual time=0.050..59578.053 rows=5000 loops=1)
>    ->  Seq Scan on tenk1 t1 (actual time=0.027..2.703 rows=1 loops=1)
>    ->  Subquery Scan on s (actual time=0.004..4.819 rows=5000 loops=1)
>          Filter: (t1.two = s.two)
>          Rows Removed by Filter: 5000
>          ->  Seq Scan on tenk1 t2 (actual time=0.002..3.834 rows=1 loops=1)
>  Planning Time: 0.666 ms
>  Execution Time: 60937.899 ms
> (8 rows)
>
> set enable_memoize to on;
>
> explain (analyze, costs off)
> select * from tenk1 t1 left join lateral
>     (select t1.two as t1two, * from tenk1 t2 offset 0) s
>   on t1.two = s.two;
>                                  QUERY PLAN
> ----------------------------------------------------------------------------
>  Nested Loop Left Join (actual time=0.061..122684.607 rows=5000 loops=1)
>    ->  Seq Scan on tenk1 t1 (actual time=0.026..3.367 rows=1 loops=1)
>    ->  Memoize (actual time=0.011..9.821 rows=5000 loops=1)
>          Cache Key: t1.two, t1.two
>          Cache Mode: binary
>          Hits: 0  Misses: 1  Evictions:   Overflows: 0  Memory Usage: 1368kB
>          ->  Subquery Scan on s (actual time=0.008..5.188 rows=5000 loops=1)
>                Filter: (t1.two = s.two)
>                Rows Removed by Filter: 5000
>                ->  Seq Scan on tenk1 t2 (actual time=0.004..4.081 rows=1 loops=1)
>  Planning Time: 0.607 ms
>  Execution Time: 124431.388 ms
> (12 rows)
>
> The execution time (best of 3) is 124431.388 vs 60937.899 with and
> without memoize.
>
> The Memoize runtime stats 'Hits: 0 Misses: 1 Evictions: '
> seems suspicious to me, so I've looked into it a little bit, and found
> that the MemoizeState's keyparamids and its outerPlan's chgParam are
> always different, and that makes us have to purge the entire cache each
> time we rescan the memoize node.
>
> But why are they always different?  Well, for the query above, we have
> two NestLoopParam nodes, one (with paramno 1) is created when we replace
> outer-relation Vars in the scan qual 't1.two = s.two', the other one
> (with paramno 0) is added from the subquery's subplan_params, which was
> created when we replaced uplevel vars with Param nodes for the subquery.
> That is to say, the chgParam would be {0, 1}.
>
> When it comes to replace outer-relation Vars in the memoize keys, the
> two 't1.two' Vars are both replaced with the NestLoopParam with paramno
> 1, because it is the first NLP we see in root->curOuterParams that is
> equal to the Vars in memoize keys.  That is to say, the memoize node's
> keyparamids is {1}.
> ...
> Any thoughts?

Have you thought about the case fixed with commit 1db5667?  As I see,
that bugfix still isn't covered by regression tests.  Could your
approach of reusing a PARAM_EXEC slot break that case?

-- 
regards, Andrei Lepikhov
Postgres Professional
Re: A performance issue with Memoize
On Fri, Oct 20, 2023 at 7:43 PM Pavel Stehule wrote:
> +1
>
> it would be great to fix this problem - I've seen this issue a few times.

Thanks for the input.  I guess this is not rare in the real world.  If
the subquery contains a lateral reference to a Var that also appears in
the subquery's join clauses, we'd probably suffer from this issue.

Thanks
Richard
Re: A performance issue with Memoize
On Fri, Oct 20, 2023 at 6:40 PM Richard Guo wrote:
> I haven't thought thoroughly about the fix yet.  But one way I'm
> thinking is that in create_subqueryscan_plan() we can first add the
> subquery's subplan_params to root->curOuterParams, and then replace
> outer-relation Vars in scan_clauses afterwards.  That can make us able
> to share the same PARAM_EXEC slot for the same Var that both belongs
> to the subquery's uplevel vars and to the NestLoop's outer-relation
> vars.  To be concrete, something like attached.

After some more thought, I think this is the right way to fix this
issue.  The idea here is to make sure that the same NLP Var shares the
same PARAM_EXEC slot.  This change can also help to save PARAM_EXEC
slots (which is trivial though, since slots are very cheap).

Thanks
Richard
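The ordering fix can be modeled with a tiny toy of the slot-assignment logic (hypothetical Python standing in for paramassign.c, with the two callers reduced to a "picky" flag): process_subquery_nestloop_params must use the specific slot assigned when the subquery was planned, while replace_nestloop_params can reuse any slot holding an equal expression. Registering the picky params first lets the flexible path find and share the slot; in the original order, two slots end up carrying the same Var.

```python
def assign_params(exprs_in_order, picky_flags):
    """Toy of PARAM_EXEC slot assignment (illustrative, not the real code).

    picky_flags[i] is True when the caller requires its own specific slot
    (like process_subquery_nestloop_params); False when it may reuse any
    slot holding an equal expression (like replace_nestloop_params).
    """
    slots = []      # slot id -> expression
    assigned = []
    for expr, picky in zip(exprs_in_order, picky_flags):
        if picky:
            slots.append(expr)              # demands its own slot
            assigned.append(len(slots) - 1)
        else:
            for sid, e in enumerate(slots): # reuse an equal expression's slot
                if e == expr:
                    assigned.append(sid)
                    break
            else:
                slots.append(expr)
                assigned.append(len(slots) - 1)
    return assigned

# Original order: scan clauses (flexible) first, subquery params (picky)
# second -- two distinct slots both carrying 't1.two'.
print(assign_params(["t1.two", "t1.two"], [False, True]))   # [0, 1]
# Fixed order: subquery params first; the flexible path reuses slot 0.
print(assign_params(["t1.two", "t1.two"], [True, False]))   # [0, 0]
```

With a single shared slot, the Memoize node's keyparamids matches the rescan's chgParam, which is the behaviour the patch restores.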
Re: A performance issue with Memoize
On Fri, Oct 20, 2023 at 1:01 PM Richard Guo wrote:
> I noticed $subject with the query below.
>
> set enable_memoize to off;
>
> explain (analyze, costs off)
> select * from tenk1 t1 left join lateral
>     (select t1.two as t1two, * from tenk1 t2 offset 0) s
>   on t1.two = s.two;
>                                QUERY PLAN
> --------------------------------------------------------------------------
>  Nested Loop Left Join (actual time=0.050..59578.053 rows=5000 loops=1)
>    ->  Seq Scan on tenk1 t1 (actual time=0.027..2.703 rows=1 loops=1)
>    ->  Subquery Scan on s (actual time=0.004..4.819 rows=5000 loops=1)
>          Filter: (t1.two = s.two)
>          Rows Removed by Filter: 5000
>          ->  Seq Scan on tenk1 t2 (actual time=0.002..3.834 rows=1 loops=1)
>  Planning Time: 0.666 ms
>  Execution Time: 60937.899 ms
> (8 rows)
>
> set enable_memoize to on;
>
> explain (analyze, costs off)
> select * from tenk1 t1 left join lateral
>     (select t1.two as t1two, * from tenk1 t2 offset 0) s
>   on t1.two = s.two;
>                                QUERY PLAN
> --------------------------------------------------------------------------
>  Nested Loop Left Join (actual time=0.061..122684.607 rows=5000 loops=1)
>    ->  Seq Scan on tenk1 t1 (actual time=0.026..3.367 rows=1 loops=1)
>    ->  Memoize (actual time=0.011..9.821 rows=5000 loops=1)
>          Cache Key: t1.two, t1.two
>          Cache Mode: binary
>          Hits: 0  Misses: 1  Evictions:  Overflows: 0  Memory Usage: 1368kB
>          ->  Subquery Scan on s (actual time=0.008..5.188 rows=5000 loops=1)
>                Filter: (t1.two = s.two)
>                Rows Removed by Filter: 5000
>                ->  Seq Scan on tenk1 t2 (actual time=0.004..4.081 rows=1 loops=1)
>  Planning Time: 0.607 ms
>  Execution Time: 124431.388 ms
> (12 rows)
>
> The execution time (best of 3) is 124431.388 ms with memoize vs.
> 60937.899 ms without it.
>
> The Memoize runtime stats 'Hits: 0 Misses: 1 Evictions: ' seem
> suspicious to me, so I've looked into it a little bit, and found
> that the MemoizeState's keyparamids and its outerPlan's chgParam are
> always different, which makes us purge the entire cache each time we
> rescan the memoize node.
>
> But why are they always different?  Well, for the query above, we have
> two NestLoopParam nodes: one (with paramno 1) is created when we
> replace outer-relation Vars in the scan qual 't1.two = s.two'; the
> other (with paramno 0) is added from the subquery's subplan_params,
> which was created when we replaced uplevel vars with Param nodes for
> the subquery.  That is to say, the chgParam would be {0, 1}.
>
> When it comes to replacing outer-relation Vars in the memoize keys,
> the two 't1.two' Vars are both replaced with the NestLoopParam with
> paramno 1, because it is the first NLP in root->curOuterParams that is
> equal to the Vars in the memoize keys.  That is to say, the memoize
> node's keyparamids is {1}.
>
> I haven't thought thoroughly about the fix yet.  But one way I'm
> thinking of is that in create_subqueryscan_plan() we can first add the
> subquery's subplan_params to root->curOuterParams, and then replace
> outer-relation Vars in scan_clauses afterwards.  That makes it
> possible to share the same PARAM_EXEC slot for the same Var that
> belongs both to the subquery's uplevel vars and to the NestLoop's
> outer-relation vars.  To be concrete, something like attached.
>
> With this change the same query runs much faster and the Memoize
> runtime stats look more normal.
>
> explain (analyze, costs off)
> select * from tenk1 t1 left join lateral
>     (select t1.two as t1two, * from tenk1 t2 offset 0) s
>   on t1.two = s.two;
>                                QUERY PLAN
> --------------------------------------------------------------------------
>  Nested Loop Left Join (actual time=0.063..21177.476 rows=5000 loops=1)
>    ->  Seq Scan on tenk1 t1 (actual time=0.025..5.415 rows=1 loops=1)
>    ->  Memoize (actual time=0.001..0.234 rows=5000 loops=1)
>          Cache Key: t1.two, t1.two
>          Cache Mode: binary
>          Hits: 9998  Misses: 2  Evictions: 0  Overflows: 0  Memory Usage: 2735kB
>          ->  Subquery Scan on s (actual time=0.009..5.169 rows=5000 loops=2)
>                Filter: (t1.two = s.two)
>                Rows Removed by Filter: 5000
>                ->  Seq Scan on tenk1 t2 (actual time=0.006..4.050 rows=1 loops=2)
>  Planning Time: 0.593 ms
>  Execution Time: 22486.621 ms
> (12 rows)
>
> Any thoughts?

+1 it would be great to fix this problem - I've seen this issue a few
times.

Regards

Pavel

> Thanks
> Richard
A performance issue with Memoize
I noticed $subject with the query below.

set enable_memoize to off;

explain (analyze, costs off)
select * from tenk1 t1 left join lateral
    (select t1.two as t1two, * from tenk1 t2 offset 0) s
  on t1.two = s.two;
                               QUERY PLAN
--------------------------------------------------------------------------
 Nested Loop Left Join (actual time=0.050..59578.053 rows=5000 loops=1)
   ->  Seq Scan on tenk1 t1 (actual time=0.027..2.703 rows=1 loops=1)
   ->  Subquery Scan on s (actual time=0.004..4.819 rows=5000 loops=1)
         Filter: (t1.two = s.two)
         Rows Removed by Filter: 5000
         ->  Seq Scan on tenk1 t2 (actual time=0.002..3.834 rows=1 loops=1)
 Planning Time: 0.666 ms
 Execution Time: 60937.899 ms
(8 rows)

set enable_memoize to on;

explain (analyze, costs off)
select * from tenk1 t1 left join lateral
    (select t1.two as t1two, * from tenk1 t2 offset 0) s
  on t1.two = s.two;
                               QUERY PLAN
--------------------------------------------------------------------------
 Nested Loop Left Join (actual time=0.061..122684.607 rows=5000 loops=1)
   ->  Seq Scan on tenk1 t1 (actual time=0.026..3.367 rows=1 loops=1)
   ->  Memoize (actual time=0.011..9.821 rows=5000 loops=1)
         Cache Key: t1.two, t1.two
         Cache Mode: binary
         Hits: 0  Misses: 1  Evictions:  Overflows: 0  Memory Usage: 1368kB
         ->  Subquery Scan on s (actual time=0.008..5.188 rows=5000 loops=1)
               Filter: (t1.two = s.two)
               Rows Removed by Filter: 5000
               ->  Seq Scan on tenk1 t2 (actual time=0.004..4.081 rows=1 loops=1)
 Planning Time: 0.607 ms
 Execution Time: 124431.388 ms
(12 rows)

The execution time (best of 3) is 124431.388 ms with memoize vs.
60937.899 ms without it.

The Memoize runtime stats 'Hits: 0 Misses: 1 Evictions: ' seem
suspicious to me, so I've looked into it a little bit, and found that
the MemoizeState's keyparamids and its outerPlan's chgParam are always
different, which makes us purge the entire cache each time we rescan
the memoize node.

But why are they always different?  Well, for the query above, we have
two NestLoopParam nodes: one (with paramno 1) is created when we
replace outer-relation Vars in the scan qual 't1.two = s.two'; the
other (with paramno 0) is added from the subquery's subplan_params,
which was created when we replaced uplevel vars with Param nodes for
the subquery.  That is to say, the chgParam would be {0, 1}.

When it comes to replacing outer-relation Vars in the memoize keys, the
two 't1.two' Vars are both replaced with the NestLoopParam with paramno
1, because it is the first NLP in root->curOuterParams that is equal to
the Vars in the memoize keys.  That is to say, the memoize node's
keyparamids is {1}.

I haven't thought thoroughly about the fix yet.  But one way I'm
thinking of is that in create_subqueryscan_plan() we can first add the
subquery's subplan_params to root->curOuterParams, and then replace
outer-relation Vars in scan_clauses afterwards.  That makes it possible
to share the same PARAM_EXEC slot for the same Var that belongs both to
the subquery's uplevel vars and to the NestLoop's outer-relation vars.
To be concrete, something like attached.

With this change the same query runs much faster and the Memoize
runtime stats look more normal.

explain (analyze, costs off)
select * from tenk1 t1 left join lateral
    (select t1.two as t1two, * from tenk1 t2 offset 0) s
  on t1.two = s.two;
                               QUERY PLAN
--------------------------------------------------------------------------
 Nested Loop Left Join (actual time=0.063..21177.476 rows=5000 loops=1)
   ->  Seq Scan on tenk1 t1 (actual time=0.025..5.415 rows=1 loops=1)
   ->  Memoize (actual time=0.001..0.234 rows=5000 loops=1)
         Cache Key: t1.two, t1.two
         Cache Mode: binary
         Hits: 9998  Misses: 2  Evictions: 0  Overflows: 0  Memory Usage: 2735kB
         ->  Subquery Scan on s (actual time=0.009..5.169 rows=5000 loops=2)
               Filter: (t1.two = s.two)
               Rows Removed by Filter: 5000
               ->  Seq Scan on tenk1 t2 (actual time=0.006..4.050 rows=1 loops=2)
 Planning Time: 0.593 ms
 Execution Time: 22486.621 ms
(12 rows)

Any thoughts?
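The keyparamids-vs-chgParam interaction described above can be sketched as follows. This is a hypothetical Python model of the rescan decision, not the actual nodeMemoize.c code; the cache contents and parameter numbers are illustrative only.

```python
# Hypothetical model (not the actual nodeMemoize.c code) of the rescan
# decision: if some parameter that changed (chgParam) is not one of the
# cache key parameters (keyparamids), no cached entry can be trusted,
# so the whole cache must be purged.

def rescan_memoize(keyparamids, chgParam, cache):
    """Purge the cache unless every changed param is a cache key."""
    if not chgParam <= keyparamids:     # chgParam not a subset of the keys
        cache.clear()                   # full purge: no hits are possible
    return cache


# Before the fix: the lateral reference and the join qual got two
# different PARAM_EXEC slots (0 and 1), but the memoize keys only
# referenced slot 1, so every rescan purged the cache.
cache = {("two", 0): ["cached rows"]}
rescan_memoize(keyparamids={1}, chgParam={0, 1}, cache=cache)

# After the fix: both references share one slot, so chgParam matches
# keyparamids and cached entries survive across rescans.
cache = {("two", 0): ["cached rows"]}
rescan_memoize(keyparamids={0}, chgParam={0}, cache=cache)
```

In this model the first call empties the cache (matching the observed 'Hits: 0' stats), while the second leaves it intact, which is why sharing the PARAM_EXEC slot restores cache hits.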
Thanks
Richard

v1-0001-Fix-a-performance-issue-with-Memoize.patch
Description: Binary data