Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
David Rowley writes: > [ v13-0001-Forgo-generating-single-subpath-Append-and-Merge.patch ] I continue to think that this is the wrong way to go about it, and as proof of concept present the attached, which reproduces all of the regression-test plan changes appearing in v13 --- with a whole lot less mechanism and next to no added planning cycles (which certainly cannot be said of v13). I was a bit surprised to find that I didn't need to fool around with lying about whether [Merge]Append can project. I've not dug into the exact reason why, but I suspect it's that previous changes made in support of parallelization have resulted in ensuring that we push the upper tlist down to the children anyway, at some earlier stage. I haven't looked into whether this does the right things for parallel planning --- possibly create_[merge]append_path need to propagate up parallel-related path fields from the single child? Also, I wonder why you didn't teach ExecSupportsMarkRestore that a single-child MergeAppend can support mark/restore. I didn't add such code here either, but I suspect that's an oversight. One other remark is that the division of labor between create_[merge]append_path and their costsize.c subroutines seems pretty unprincipled. I'd be inclined to push all the relevant logic into costsize.c, but have not done so here. Moving the existing cost calculations in create_mergeappend_path into costsize.c would better be done as a separate refactoring patch, perhaps. regards, tom lane diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index 42108bd..2be67bc 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -8427,17 +8427,16 @@ SELECT t1.a,t2.b,t3.c FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) INNER J 400 | 400 | 0008 (4 rows) --- left outer join + nullable clasue -EXPLAIN (COSTS OFF) +-- left outer join + nullable clause +EXPLAIN (VERBOSE, COSTS OFF) SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3; -QUERY PLAN - Sort - Sort Key: t1.a, ftprt2_p1.b, ftprt2_p1.c - -> Append - -> Foreign Scan - Relations: (public.ftprt1_p1 t1) LEFT JOIN (public.ftprt2_p1 fprt2) -(5 rows) + QUERY PLAN + + Foreign Scan + Output: t1.a, ftprt2_p1.b, ftprt2_p1.c + Relations: (public.ftprt1_p1 t1) LEFT JOIN (public.ftprt2_p1 fprt2) + Remote SQL: SELECT r6.a, r9.b, r9.c FROM (public.fprt1_p1 r6 LEFT JOIN public.fprt2_p1 r9 ON (((r6.a = r9.b)) AND ((r6.b = r9.a)) AND ((r9.a < 10 WHERE ((r6.a < 10)) ORDER BY r6.a ASC NULLS LAST, r9.b ASC NULLS LAST, r9.c ASC NULLS LAST +(4 rows) SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3; a | b | c diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql index eb9d1ad..4728511 100644 --- a/contrib/postgres_fdw/sql/postgres_fdw.sql +++ b/contrib/postgres_fdw/sql/postgres_fdw.sql @@ -2309,8 +2309,8 @@ EXPLAIN (COSTS OFF) SELECT t1.a,t2.b,t3.c FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) INNER JOIN fprt1 t3 ON (t2.b = t3.a) WHERE t1.a % 25 =0 ORDER BY 1,2,3; SELECT t1.a,t2.b,t3.c FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) INNER JOIN fprt1 t3 ON (t2.b = t3.a) WHERE t1.a % 25 =0 ORDER BY 1,2,3; --- left outer join + nullable clasue -EXPLAIN (COSTS OFF) +-- left outer join + nullable clause +EXPLAIN (VERBOSE, COSTS OFF) SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3; SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3; diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c index 187f892..14ad738 100644 --- a/src/backend/executor/execAmi.c +++ b/src/backend/executor/execAmi.c @@ -447,6 +447,21 @@ ExecSupportsMarkRestore(Path *pathnode) return false; /* childless Result */ } + case T_Append: + { +AppendPath *appendPath = castNode(AppendPath, pathnode); + +/* + * If there's exact
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On Mon, 4 Mar 2019 at 16:26, Tom Lane wrote: > > David Rowley writes: > > [ v13-0001-Forgo-generating-single-subpath-Append-and-Merge.patch ] > > I continue to think that this is the wrong way to go about it, > and as proof of concept present the attached, which reproduces > all of the regression-test plan changes appearing in v13 --- > with a whole lot less mechanism and next to no added planning > cycles (which certainly cannot be said of v13). Nice. I see you solved the problem I had been complaining about by pulling the surplus Append out of the final plan after setrefs, in which case the varnos are all set to the special varnos, meaning you don't suffer from the same problem with nodes higher in the plan having the varnos of the parent instead of the append child. This method is certainly much much better than what I had. Thanks for taking the time to show me how this should be done. > Also, I wonder why you didn't teach ExecSupportsMarkRestore that a > single-child MergeAppend can support mark/restore. I didn't add such > code here either, but I suspect that's an oversight. The reason for that was that I didn't ever create any single-subpath MergeAppends. I instead created Append paths having isproxy as true. This is slightly confusing as I was just borrowing Append to act as this proxy path and it was only these proxy Appends I was dealing with in ExecSupportsMarkRestore. I'll need to modify your version of the patch as you're keeping single-subpath MergeAppends paths, so ExecSupportsMarkRestore needs to know about those. > One other remark is that the division of labor between > create_[merge]append_path and their costsize.c subroutines > seems pretty unprincipled. I'd be inclined to push all the > relevant logic into costsize.c, but have not done so here. > Moving the existing cost calculations in create_mergeappend_path > into costsize.c would better be done as a separate refactoring patch, > perhaps. The part I don't like about that is the fact that we don't generate sort paths in the MergeAppend subpaths, instead, we rely on checking pathkeys_contained_in() again in create_merge_append_plan() and just creating a sort node, if required. There is some repeat pathkeys checking there but I wonder if we'll see much a performance regression if we go to the trouble of making sort paths. Is this what you meant? -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
David Rowley writes: > On Mon, 4 Mar 2019 at 16:26, Tom Lane wrote: >> One other remark is that the division of labor between >> create_[merge]append_path and their costsize.c subroutines >> seems pretty unprincipled. I'd be inclined to push all the >> relevant logic into costsize.c, but have not done so here. >> Moving the existing cost calculations in create_mergeappend_path >> into costsize.c would better be done as a separate refactoring patch, >> perhaps. > The part I don't like about that is the fact that we don't generate > sort paths in the MergeAppend subpaths, instead, we rely on checking > pathkeys_contained_in() again in create_merge_append_plan() and just > creating a sort node, if required. There is some repeat pathkeys > checking there but I wonder if we'll see much a performance regression > if we go to the trouble of making sort paths. Is this what you meant? No, I'm just suggesting taking the stanza "Add up the sizes and costs of the input paths" out of create_merge_append_path and putting it into cost_merge_append. It seems weird to me that in the plain Append case, cost_append does the adding-up of the child costs, but in the MergeAppend case it's not done similarly. regards, tom lane
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On Mon, 4 Mar 2019 at 16:26, Tom Lane wrote: > I was a bit surprised to find that I didn't need to fool around > with lying about whether [Merge]Append can project. I've not dug > into the exact reason why, but I suspect it's that previous changes > made in support of parallelization have resulted in ensuring that > we push the upper tlist down to the children anyway, at some earlier > stage. I can get a plan that does end up with Result nodes above a Scan node. create table rangep (a int, b int) partition by range (a); create table rangep1 partition of rangep for values from(0) to (100); explain select r1::text from rangep r1 inner join rangep r2 on r1::text = r2::Text order by r1::text; However, if we modify is_projection_capable_plan() similar to how ExecSupportsMarkRestore() has been modified then we'd need to ensure that any changes made to the Append/MergeAppend's tlist also are made to the underlying node's tlist. I'm unsure if that's worth the complexity. What do you think? > I haven't looked into whether this does the right things for parallel > planning --- possibly create_[merge]append_path need to propagate up > parallel-related path fields from the single child? I looked at this. For Append, with the current code in add_paths_to_append_rel() we won't increase the parallel_workers higher than the parallel workers from the single child rel. The current code does: parallel_workers = Max(parallel_workers, fls(list_length(live_childrels))); so fls(1) == 1. However, I wonder if it's better to set that explicitly in create_append_path() for the list_length(pathnode->subpaths) == 1 case. The other issue I found when looking into the parallel code was that partial paths with pathkeys will get made, but will never be used. The reason is that Append can't normally do anything with these as it can't maintain their sort order and MergeAppend is not a parallel aware node. I ended up adding some new code at the bottom of generate_append_paths() to add these paths as single subpaths to Append nodes. This also required changing create_append_path() so that it records the child pathkeys when dealing with a single subpath Append. > Also, I wonder why you didn't teach ExecSupportsMarkRestore that a > single-child MergeAppend can support mark/restore. I didn't add such > code here either, but I suspect that's an oversight. Added code for that. > One other remark is that the division of labor between > create_[merge]append_path and their costsize.c subroutines > seems pretty unprincipled. I'd be inclined to push all the > relevant logic into costsize.c, but have not done so here. > Moving the existing cost calculations in create_mergeappend_path > into costsize.c would better be done as a separate refactoring patch, > perhaps. I've not touched that, but happy to do it as a separate patch. I've attached my changes to your v14 patch. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services drop-useless-merge-appends-15.patch Description: Binary data
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
David Rowley writes: > [ drop-useless-merge-appends-15.patch ] Pushed with some minor adjustments. > I can get a plan that does end up with Result nodes above a Scan node. > create table rangep (a int, b int) partition by range (a); > create table rangep1 partition of rangep for values from(0) to (100); > explain select r1::text from rangep r1 inner join rangep r2 on > r1::text = r2::Text order by r1::text; > However, if we modify is_projection_capable_plan() similar to how > ExecSupportsMarkRestore() has been modified then we'd need to ensure > that any changes made to the Append/MergeAppend's tlist also are made > to the underlying node's tlist. I'm unsure if that's worth the > complexity. What do you think? I'm inclined to think not, at least not without more compelling examples than that one ;-). Note that we had Result-above-the-Append before, so we're not regressing for such cases, we're just leaving something on the table. >> One other remark is that the division of labor between >> create_[merge]append_path and their costsize.c subroutines >> seems pretty unprincipled. I'd be inclined to push all the >> relevant logic into costsize.c, but have not done so here. >> Moving the existing cost calculations in create_mergeappend_path >> into costsize.c would better be done as a separate refactoring patch, >> perhaps. > I've not touched that, but happy to do it as a separate patch. After looking at this, I'm less excited than I was before. It seems it'd be only marginally less crufty than the way it is now. In particular, costsize.c would have to be aware of the rule about single-child MergeAppend being removable, which seems a little outside its purview: typically we only ask costsize.c to estimate the cost of a plan node as-presented, not to make decisions about what the shape of the plan tree will be. So I left it alone. regards, tom lane
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On Tue, 26 Mar 2019 at 09:00, Tom Lane wrote: > > David Rowley writes: > > [ drop-useless-merge-appends-15.patch ] > > Pushed with some minor adjustments. Thank for all your work on this, and thanks for the final push. FWIW, you should probably have credited yourself as the main author since I don't think there was much of my patch left. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
I've attached an updated patch. The last one no longer applied due to the changes made in d723f5687 -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services v10-0001-Forgo-generating-single-subpath-Append-and-Merge.patch Description: Binary data
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
David Rowley writes: > On Thu, 3 Jan 2019 at 08:01, Tomas Vondra > wrote: >> AFAICS the patch essentially does two things: (a) identifies Append >> paths with a single member and (b) ensures the Vars are properly mapped >> when the Append node is skipped when creating the plan. >> I agree doing (a) in setrefs.c is too late, as mentioned earlier. But >> perhaps we could do at least (b) to setrefs.c? > The problem I found with doing var translations in setrefs.c was that > the plan tree is traversed there breadth-first and in createplan.c we > build the plan depth-first. The problem I didn't manage to overcome > with that is that when we replace a "ProxyPath" (actually an Append > path marked as is_proxy=true) it only alter targetlists, etc of nodes > above that in the plan tree. The nodes below it and their targetlists > don't care, as these don't fetch Vars from nodes above them. If we do > the Var translation in setrefs then we've not yet looked at the nodes > that don't care and we've already done the nodes that do care. So the > tree traversal is backwards. I don't quite buy this argument, because you haven't given a reason why it doesn't apply with equal force to setrefs.c's elimination of no-op SubqueryScan nodes. Yet that does work. Stepping back for a minute: even if we get this to work, how often is it going to do anything useful? It seems like most of the other development that's going on is trying to postpone pruning till later, so I wonder how often we'll really know at the beginning of planning that only one child will survive. regards, tom lane
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On Sat, 26 Jan 2019 at 13:51, Tom Lane wrote: > > David Rowley writes: > > On Thu, 3 Jan 2019 at 08:01, Tomas Vondra > > wrote: > >> AFAICS the patch essentially does two things: (a) identifies Append > >> paths with a single member and (b) ensures the Vars are properly mapped > >> when the Append node is skipped when creating the plan. > >> I agree doing (a) in setrefs.c is too late, as mentioned earlier. But > >> perhaps we could do at least (b) to setrefs.c? > > > The problem I found with doing var translations in setrefs.c was that > > the plan tree is traversed there breadth-first and in createplan.c we > > build the plan depth-first. The problem I didn't manage to overcome > > with that is that when we replace a "ProxyPath" (actually an Append > > path marked as is_proxy=true) it only alter targetlists, etc of nodes > > above that in the plan tree. The nodes below it and their targetlists > > don't care, as these don't fetch Vars from nodes above them. If we do > > the Var translation in setrefs then we've not yet looked at the nodes > > that don't care and we've already done the nodes that do care. So the > > tree traversal is backwards. > > I don't quite buy this argument, because you haven't given a reason > why it doesn't apply with equal force to setrefs.c's elimination of > no-op SubqueryScan nodes. Yet that does work. I assume you're talking about the code that's in set_subqueryscan_references() inside the trivial_subqueryscan() condition? If so, then that seems pretty different from what I'm doing here because trivial_subqueryscan() only ever returns true when the Vars/Exprs from the subquery's target list match the scan's targetlist exactly, in the same order. It's not quite that simple with the single-subpath Append/MergeAppend case since the Vars/Exprs won't be the same since they're from different relations and possibly in some different order. > Stepping back for a minute: even if we get this to work, how often > is it going to do anything useful? It seems like most of the other > development that's going on is trying to postpone pruning till later, > so I wonder how often we'll really know at the beginning of planning > that only one child will survive. It'll do something useful everytime the planner would otherwise produce an Append or MergeAppend with a single subpath. Pruning down to just 1 relation is pretty common, so seems worth optimising for. There's no other development that postpones pruning until later. If you're talking about run-time pruning then nothing is "postponed". We still attempt to prune during planning, it's just that we might not be able to prune in some cases until during execution, so we may do both. You make it sound like we're trying to do away with plan-time pruning, but that's not happening, in fact, we're working pretty hard to speed up the planner when plan-time pruning *can* be done. So far there are no patches to speed up planner for partitioned tables when the planner can't prune any partitions, although it is something that I've been giving some thought to. I've also attached a rebased patch. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services v11-0001-Forgo-generating-single-subpath-Append-and-Merge.patch Description: Binary data
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On Thu, 31 Jan 2019 at 17:22, David Rowley wrote: > I've also attached a rebased patch. Rebased again. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services v12-0001-Forgo-generating-single-subpath-Append-and-Merge.patch Description: Binary data
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On Tue, 5 Feb 2019 at 12:05, David Rowley wrote: > Rebased again. and again, due to new create_append_path() call added in ab5fcf2b04f9. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services v13-0001-Forgo-generating-single-subpath-Append-and-Merge.patch Description: Binary data
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On Mon, Feb 19, 2018 at 4:02 AM, David Rowley wrote: > On 19 February 2018 at 18:01, David Rowley > wrote: >> On 19 February 2018 at 15:11, Tomas Vondra >> wrote: >>> and perhaps we should do s/isproxy/is_proxy/ which seems like the usual >>> naming for boolean variables. >> >> You're right. I'll change that and post an updated patch. I'll also >> reprocess your email again and try to improve some comments to make >> the intent of the code more clear. > > I've made a few changes to the patch. "isproxy" is now "is_proxy". > I've made the comments a bit more verbose at the top of the definition > of struct AppendPath. Also shuffled some comments around in allpaths.c > > I've attached both a delta patch with just these changes, and also a > completely new patch based on a recent master. While I was working on http://postgr.es/m/ca+tgmoakt5gmahbpwgqrr2nadfomaonoxyowhrdvfgws34t...@mail.gmail.com I noticed that a projection path is very close to being usable as a proxy path, because we've already got code to strip out unnecessary proxy paths at plan generation time. I noticed that there were a few problems with that which I wrote patches to fix (see 0001, 0002 attached to that email) but they seem to be only minor issues. What do you think about the idea of using a projection path as a proxy path instead of inventing a new method? It seems simple enough to do: new_path = (Path *) create_projection_path(root, new_rel, old_path, old_path->pathtarget); ...when we need a proxy path. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On 14 March 2018 at 09:25, Robert Haas wrote: > What do you think about the idea of using a projection path as a proxy > path instead of inventing a new method? It seems simple enough to do: > > new_path = (Path *) create_projection_path(root, new_rel, old_path, > old_path->pathtarget); > > ...when we need a proxy path. I'm very open to finding a better way to do this, but does that not just handle the targetlist issue? The proxy path also carries information which allows the translation of Vars in other places in the plan from the old rel into the vars of the new rel. Join conditions, sort clauses etc. Wouldn't a ProjectionPath just need the same additional translation fields that I've bolted onto AppendPath to make it work properly? I've looked at the projection path code but got a bit confused with the dummypp field. I see where it gets set, but not where it gets checked. A comment in create_projection_plan() seems to explain that dummypp is pretty useless since targetlists are modified sometimes, so I'm a bit unsure what the point of it is. Maybe that just goes to show that my understanding of projection paths is not that great. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On Tue, Mar 13, 2018 at 8:20 PM, David Rowley wrote: > On 14 March 2018 at 09:25, Robert Haas wrote: >> What do you think about the idea of using a projection path as a proxy >> path instead of inventing a new method? It seems simple enough to do: >> >> new_path = (Path *) create_projection_path(root, new_rel, old_path, >> old_path->pathtarget); >> >> ...when we need a proxy path. > > I'm very open to finding a better way to do this, but does that not > just handle the targetlist issue? The proxy path also carries > information which allows the translation of Vars in other places in > the plan from the old rel into the vars of the new rel. Join > conditions, sort clauses etc. > > Wouldn't a ProjectionPath just need the same additional translation > fields that I've bolted onto AppendPath to make it work properly? Well, I guess I'm not sure. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On Thu, Mar 15, 2018 at 9:22 AM, Robert Haas wrote: >> Wouldn't a ProjectionPath just need the same additional translation >> fields that I've bolted onto AppendPath to make it work properly? > > Well, I guess I'm not sure. Sorry, hit send too soon there. I'm not sure I entirely understand the purpose of those changes; I think the comments could use some work. For example: + /* +* First we must record the translation expressions in the PlannerInfo. +* These need to be found when the expression translation is being done +* when the final plan is being assembled. +*/ + /* +* Record this child as having been promoted. Some places treat child +* relations in a special way, and this will give them a VIP ticket to +* adulthood, where required. +*/ These comments just recapitulate what the code does, without really explaining what problem we're trying to solve ("need" for unspecified reasons, unspecified "special" handling). That said, I gather that one problem is the path might contain references to child varnos where we need to reference parent varnos. That does seem like something we need to handle, but I'm not sure whether this is really the right method. I wonder if we couldn't deduce the necessary translations from the parent pointers of the paths. That is, if we've got a proxy path associated with the parent rel and the path it's proxying is associated with the child rel, then we need to translate from realpath->parent->relids to proxypath->parent_relids. Not that this is an argument of overwhelming import, but note that ExecSupportsMarkRestore wouldn't need to change if we used ProjectionPath; that's already got the required handling. If we stick with your idea of using AppendPath, do we actually need generate_proxy_paths()? What paths get lost if we don't have a special case for them here? I find promote_child_rel_equivalences() kind of scary. It seems like a change that might have unintended and hard-to-predict consequences. Perhaps that's only my own lack of knowledge. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
Robert Haas writes: > That said, I gather that one problem is the path might contain > references to child varnos where we need to reference parent varnos. > That does seem like something we need to handle, but I'm not sure > whether this is really the right method. I wonder if we couldn't > deduce the necessary translations from the parent pointers of the > paths. That is, if we've got a proxy path associated with the parent > rel and the path it's proxying is associated with the child rel, then > we need to translate from realpath->parent->relids to > proxypath->parent_relids. I hadn't been paying much attention to this thread, but I've now taken a quick look at the 2018-02-19 patch, and I've got to say I do not like it much. The changes in createplan.c in particular seem like hack-and- slash rather than anything principled or maintainable. The core issue here is that Paths involving the appendrel and higher will contain Vars referencing the appendrel's varno, whereas the child is set up to emit Vars containing its own varno, and somewhere we've got to match those up. I don't think though that this problem is exactly specific to single-member Appends, and so I would rather we not invent a solution that's specific to that. A nearly identical issue is getting rid of no-op SubqueryScan nodes. I've long wished we could simply not emit those in the first place, but it's really hard to do because of the fact that Vars inside the subquery have different varnos from those outside. (I've toyed with the idea of globally flattening the rangetable before we start planning, not at the end, but haven't made it happen yet; and anyway that would be only one step towards such a goal.) It might be worth looking at whether we couldn't fix the single-member- Append issue the same way we fix no-op SubqueryScans, ie let setrefs.c get rid of them. That's not the most beautiful solution perhaps, but it'd be very localized and low-risk. In general setrefs.c is the right place to deal with variable-matching issues. So even if you don't like that specific idea, it'd probably be worth thinking about handling this by recording instructions telling setrefs what to do, instead of actually doing it at earlier stages. regards, tom lane
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On Thu, Mar 15, 2018 at 11:01 AM, Tom Lane wrote: > It might be worth looking at whether we couldn't fix the single-member- > Append issue the same way we fix no-op SubqueryScans, ie let setrefs.c > get rid of them. That's not the most beautiful solution perhaps, but > it'd be very localized and low-risk. That's definitely a thought; it's a probably the simplest way of saving the run-time cost of the Append node. However, I don't think it's a great solution overall because it doesn't get us the other advantages that David mentions in his original post. I think that to gain those advantages we'll need to know at path-creation time that there won't ultimately be an Append node in the finished plan. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
Robert Haas writes: > On Thu, Mar 15, 2018 at 11:01 AM, Tom Lane wrote: >> It might be worth looking at whether we couldn't fix the single-member- >> Append issue the same way we fix no-op SubqueryScans, ie let setrefs.c >> get rid of them. That's not the most beautiful solution perhaps, but >> it'd be very localized and low-risk. > That's definitely a thought; it's a probably the simplest way of > saving the run-time cost of the Append node. However, I don't think > it's a great solution overall because it doesn't get us the other > advantages that David mentions in his original post. I think that to > gain those advantages we'll need to know at path-creation time that > there won't ultimately be an Append node in the finished plan. Meh. We could certainly know that by inspection ("only one child? it'll be history"). I remain of the opinion that this is a big patch with a small patch struggling to get out. regards, tom lane
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On 16 March 2018 at 02:46, Robert Haas wrote: > If we stick with your idea of using AppendPath, do we actually need > generate_proxy_paths()? What paths get lost if we don't have a > special case for them here? Actually, we do a surprisingly good job of allowing plan shapes to stay the same when planning for an only-child scan for an Append or MergeAppend. The main difference I did find was that and Append and MergeAppend don't support Mark and Restore, so if you just generated the same paths and simply skipped over Appends and MergeAppends you'd still be left with Materialize nodes which might not actually be required at all. This might not be huge, but seeing this made me worried that there might be some valid reason, if not today, then sometime in the future why it might not be safe to simply pluck the singleton Append/MergeAppend nodes out the plan tree. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On 16 March 2018 at 04:01, Tom Lane wrote: > I hadn't been paying much attention to this thread, but I've now taken > a quick look at the 2018-02-19 patch, and I've got to say I do not like > it much. The changes in createplan.c in particular seem like hack-and- > slash rather than anything principled or maintainable. Thanks for looking at this. I didn't manage to discover any other working solutions to when the Vars can be replaced. If we don't do this in createplan.c then it's going to cause problems in places such as apply_pathtarget_labeling_to_tlist, which is well before setrefs.c gets hold of the plan. > The core issue here is that Paths involving the appendrel and higher > will contain Vars referencing the appendrel's varno, whereas the child > is set up to emit Vars containing its own varno, and somewhere we've got > to match those up. I don't think though that this problem is exactly > specific to single-member Appends, and so I would rather we not invent a > solution that's specific to that. A nearly identical issue is getting > rid of no-op SubqueryScan nodes. I've long wished we could simply not > emit those in the first place, but it's really hard to do because of > the fact that Vars inside the subquery have different varnos from those > outside. (I've toyed with the idea of globally flattening the rangetable > before we start planning, not at the end, but haven't made it happen yet; > and anyway that would be only one step towards such a goal.) I'm not quite sure why you think the solution I came up with is specific to single-member Appends. The solution merely uses single-member Append paths as a proxy path for the solution which I've tried to make generic to any node type. For example, the patch also resolves the issue for MergeAppend, so certainly nothing in there is specific to single-member Appends. I could have made the proxy any other path type, it's just that you had suggested Append would be better than inventing ProxyPath, which is what I originally proposed. > It might be worth looking at whether we couldn't fix the single-member- > Append issue the same way we fix no-op SubqueryScans, ie let setrefs.c > get rid of them. That's not the most beautiful solution perhaps, but > it'd be very localized and low-risk. It might be possible, but wouldn't that only solve 1 out of 2 problems. The problem of the planner not generating the most optimal plan is ignored with this. For example, it does not make much sense to bolt a Materialize node on top of an IndexScan node in order to provide the IndexScan with mark/restore capabilities... They already allow that. > In general setrefs.c is the right place to deal with variable-matching > issues. So even if you don't like that specific idea, it'd probably be > worth thinking about handling this by recording instructions telling > setrefs what to do, instead of actually doing it at earlier stages. >From what I can see, setrefs.c is too late. ERRORs are generated before setrefs.c gets hold of the plan if we don't replace Vars. I'm not opposed to finding a better way to do this. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
Now that the September 'fest is open for new patches I'm going to move this patch over there. This patch has become slightly less important than some other stuff, but I'd still like to come back to it. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On Mon, 2 Jul 2018 at 09:37, David Rowley wrote: > Now that the September 'fest is open for new patches I'm going to move > this patch over there. This patch has become slightly less important > than some other stuff, but I'd still like to come back to it. This had bit-rotted quite a bit, so I've rebased it on top of today's master. There's not really any other changes to speak of. I've not reevaluated the patch to see if there is any more simple way to take care of this problem yet. However, I do recall getting off to quite a few false starts with this patch and the way I have it now was the only way I saw to make it possible. Although, perhaps some more experience will show me something different. Anyway, I've attached the rebased version. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services v8-0001-Remove-single-subpath-Append-and-MergeAppend-node.patch Description: Binary data
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On 4/1/18 9:26 AM, David Rowley wrote: > On 16 March 2018 at 04:01, Tom Lane wrote: >> I hadn't been paying much attention to this thread, but I've now taken >> a quick look at the 2018-02-19 patch, and I've got to say I do not like >> it much. The changes in createplan.c in particular seem like hack-and- >> slash rather than anything principled or maintainable. > > Thanks for looking at this. I didn't manage to discover any other > working solutions to when the Vars can be replaced. If we don't do > this in createplan.c then it's going to cause problems in places such > as apply_pathtarget_labeling_to_tlist, which is well before setrefs.c > gets hold of the plan. > >> The core issue here is that Paths involving the appendrel and higher >> will contain Vars referencing the appendrel's varno, whereas the child >> is set up to emit Vars containing its own varno, and somewhere we've got >> to match those up. I don't think though that this problem is exactly >> specific to single-member Appends, and so I would rather we not invent a >> solution that's specific to that. A nearly identical issue is getting >> rid of no-op SubqueryScan nodes. I've long wished we could simply not >> emit those in the first place, but it's really hard to do because of >> the fact that Vars inside the subquery have different varnos from those >> outside. (I've toyed with the idea of globally flattening the rangetable >> before we start planning, not at the end, but haven't made it happen yet; >> and anyway that would be only one step towards such a goal.) > > I'm not quite sure why you think the solution I came up with is > specific to single-member Appends. The solution merely uses > single-member Append paths as a proxy path for the solution which I've > tried to make generic to any node type. For example, the patch also > resolves the issue for MergeAppend, so certainly nothing in there is > specific to single-member Appends. I could have made the proxy any > other path type, it's just that you had suggested Append would be > better than inventing ProxyPath, which is what I originally proposed. > I think a natural question is how the approach devised in this thread (based on single-member Append paths) could be used to deal with no-op Subquery nodes. I don't see an obvious way to do that, and I guess that's what Tom meant by "specific to single-member Appends". >> It might be worth looking at whether we couldn't fix the single-member- >> Append issue the same way we fix no-op SubqueryScans, ie let setrefs.c >> get rid of them. That's not the most beautiful solution perhaps, but >> it'd be very localized and low-risk. > > It might be possible, but wouldn't that only solve 1 out of 2 > problems. The problem of the planner not generating the most optimal > plan is ignored with this. For example, it does not make much sense to > bolt a Materialize node on top of an IndexScan node in order to > provide the IndexScan with mark/restore capabilities... They already > allow that. > Yeah, I think we can't do all the magic in setrefs.c if we want the decisions to affect plan shape in any significant way - the decision whether an Append node has only a single node needs to happen while constructing the path or shortly after it (before we build other paths on top of it). And before costing the path. So setrefs.c seems a too late for that :-( I suppose this limitation was acceptable for no-op Subqueries, but I'm not sure the same thing applies to single-member Appends. >> In general setrefs.c is the right place to deal with variable-matching >> issues. So even if you don't like that specific idea, it'd probably be >> worth thinking about handling this by recording instructions telling >> setrefs what to do, instead of actually doing it at earlier stages. > > From what I can see, setrefs.c is too late. ERRORs are generated > before setrefs.c gets hold of the plan if we don't replace Vars. > > I'm not opposed to finding a better way to do this. > AFAICS the patch essentially does two things: (a) identifies Append paths with a single member and (b) ensures the Vars are properly mapped when the Append node is skipped when creating the plan. I agree doing (a) in setrefs.c is too late, as mentioned earlier. But perhaps we could do at least (b) to setrefs.c? I'd say mapping the Vars is the most fragile and error-prone piece of this patch. It's a bit annoying that it's inventing a new infrastructure to do that, translates expressions in various places, establishes new rules for tlists (having to call finalize_plan_tlist when calling build_path_tlist before create_plan_recurse), etc. But we already do such mappings (not exactly the same, but similar) for other purposes - from setrefs.c. So why can't why do it the same way here? FWIW I haven't tried to do any of that, and I haven't looked on this patch for quite a long time, so perhaps I'm missing an obvious obstacle preventing us from doing that ... regards -- Tomas Vo
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On Thu, 3 Jan 2019 at 08:01, Tomas Vondra wrote: > AFAICS the patch essentially does two things: (a) identifies Append > paths with a single member and (b) ensures the Vars are properly mapped > when the Append node is skipped when creating the plan. Yes, but traditional Append and MergeAppend paths with a single member are never actually generated. I say "traditional" as we do happen to use a single-subpath Append as the "proxy" path to represent a path that needs to not have a plan node generated during createplan. Originally I had a new Path type named "ProxyPath" to do this, but per recommendation from Tom, I overloaded Append to mean that. (Append already has its meaning overloaded in regards to dummy paths.) > I agree doing (a) in setrefs.c is too late, as mentioned earlier. But > perhaps we could do at least (b) to setrefs.c? The problem I found with doing var translations in setrefs.c was that the plan tree is traversed there breadth-first and in createplan.c we build the plan depth-first. The problem I didn't manage to overcome with that is that when we replace a "ProxyPath" (actually an Append path marked as is_proxy=true) it only alter targetlists, etc of nodes above that in the plan tree. The nodes below it and their targetlists don't care, as these don't fetch Vars from nodes above them. If we do the Var translation in setrefs then we've not yet looked at the nodes that don't care and we've already done the nodes that do care. So the tree traversal is backwards. I sort of borrowed the traversal in createplan.c since it was in the correct depth-first order. Equally, I could have invented an entirely new stage that traverses the path tree depth-first, but I imagined that would have added more overhead and raised even more questions. Piggy-backing on createplan seemed like the best option at the time. > I'd say mapping the Vars is the most fragile and error-prone piece of > this patch. It's a bit annoying that it's inventing a new infrastructure > to do that, translates expressions in various places, establishes new > rules for tlists (having to call finalize_plan_tlist when calling > build_path_tlist before create_plan_recurse), etc. I entirely agree. It's by far the worst part of the patch. Getting the code to a working state to do this was hard. I kept finding places that I'd missed the translation. I'd rather it worked some other way. I just don't yet know what that way is. I changed the core regression table "tenk" to become a partitioned table with a single partition in the hope to catch all of the places that need the translations to be performed. I'm not entirely confident that I've caught them all by doing this. I've attached an updated patch that's really just a rebased version of v8. The old version conflicted with changes made in 1db5667b. The only real change was to the commit message. I'd previously managed to miss out the part about not generating needless Append/MergeAppend paths can allow plan shapes that were previously not possible. I'd only mentioned the executor not having to pull tuples through these nodes, which increases performance. Not having this perhaps caused some of the confusion earlier in this thread. One drawback to this patch, which I'm unsure if I previously mentioned is that run-time pruning only works for Append and MergeAppend. If we remove the Append/MergeAppend node then the single Append/MergeAppend sub-plan has no hope of being pruned. Going from 1 to 0 sub-plans may not be that common, but no longer supporting that is something that needs to be considered. I had imagined that gained flexibility of plan shapes plus less executor overhead was better than that, but there likely is a small use-case for keeping that ability. It seems impossible to have both advantages of this patch without that disadvantage. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services v9-0001-Forgo-generating-single-subpath-Append-and-MergeA.patch Description: Binary data
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On Mon, Jul 02, 2018 at 09:37:31AM +1200, David Rowley wrote: > Now that the September 'fest is open for new patches I'm going to move > this patch over there. This patch has become slightly less important > than some other stuff, but I'd still like to come back to it. Please note that the latest patch does not apply anymore, so I moved it to CF 2018-11 (already this time of the year!), waiting for your input. -- Michael signature.asc Description: PGP signature
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
I've attached an updated patch which takes care of the build failure caused by the new call to create_append_path added in bb94ce4. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services remove_singleton_appends_2018-01-26.patch Description: Binary data
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
Hi, On 01/25/2018 01:55 PM, David Rowley wrote: > I've attached an updated patch which takes care of the build failure > caused by the new call to create_append_path added in bb94ce4. > I've been looking at the patch over the past few days. I haven't found any obvious issues with it, but I do have a couple of questions. 1) I can confirm that it indeed eliminates the Append overhead, using the example from [1], modified to use table with a single partition. Of course, this is a pretty formal check, because the patch simply removes the Append node and so it'd be utterly broken if the overhead did not disappear too. [1] https://www.postgresql.org/message-id/CAKJS1f9UXdk6ZYyqbJnjFO9a9hyHKGW7B%3DZRh-rxy9qxfPA5Gw%40mail.gmail.com 2) If I had to guess where the bugs will be, my guess would be a missing finalize_plan_tlist call - not necessarily in existing code, but perhaps in future improvements. I wonder if we could automate this call somehow, say by detecting when create_plan_recurse is called after build_path_tlist (for a given path), and if needed calling finalize_plan_tlist from create_plan_recurse. Although, now that I look at it, promote_child_relation may be doing something like that by adding stuff to the translate_* variables. I'm not quite sure why we need to append to the lists, though? 3) I'm not quite sure what replace_translatable_exprs does, or more precisely why do we actually need it. At this point the comment pretty much says "We need to do translation. This function does translation." Perhaps it should mention why removing/skipping the AppendPath implies the need for translation or something? 4) The thread talks about "[Merge]Append" but the patch apparently only tweaks create_append_path and does absolutely nothing to "merge" case. While looking into why, I notices that generate_mergeappend_paths always gets all_child_pathkeys=NULL in this case (with single subpath), and so we never create the MergePath at all. I suppose there's some condition somewhere responsible for doing that, but I haven't found it ... 5) The comment before AppendPath typedef says this: * An AppendPath with a single subpath can be set up to become a "proxy" * path. This allows a Path which belongs to one relation to be added to * the pathlist of some other relation. This is intended as generic * infrastructure, but its primary use case is to allow Appends with * only a single subpath to be removed from the final plan. I'm not quite sure how adding a 'isproxy' flag into AppendPath node makes this a generic infrastructure? Wouldn't an explicit ProxyPath be a better way to achieve that? What other proxy paths do you envision, and why should they be represented as AppendPath? Although, there seems to be a precedent in using AppendPath to represent dummy paths ... FWIW one advantage of a separate ProxyPath would be that it would be a much clearer breakage for third-party code (say, extensions tweaking paths using hooks), either at compile or execution time. With just a new flag in existing node, that may easily slip through. On the other hand no one promises the structures to be stable API ... 6) I suggest we add this assert to create_append_path: Assert(!isproxy || (isproxy && (list_length(subpaths)==1))); and perhaps we should do s/isproxy/is_proxy/ which seems like the usual naming for boolean variables. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On 19 February 2018 at 15:11, Tomas Vondra wrote: > 1) I can confirm that it indeed eliminates the Append overhead, using > the example from [1], modified to use table with a single partition. Of > course, this is a pretty formal check, because the patch simply removes > the Append node and so it'd be utterly broken if the overhead did not > disappear too. > > [1] > https://www.postgresql.org/message-id/CAKJS1f9UXdk6ZYyqbJnjFO9a9hyHKGW7B%3DZRh-rxy9qxfPA5Gw%40mail.gmail.com Thanks for testing that. > 2) If I had to guess where the bugs will be, my guess would be a missing > finalize_plan_tlist call - not necessarily in existing code, but perhaps > in future improvements. I agree with that. I'd say though that additionally for the future that we might end up with some missing translation calls to replace_translatable_exprs(). To try to ensure I didn't miss anything, I did previously modify the regression test tables "tenk" and "tenk1" to become partitioned tables with a single partition which allowed all possible values to try to ensure I'd not missed anywhere. I just failed to find a reasonable way to make this a more permanent test without enforcing that change on the tables as a permanent change. > I wonder if we could automate this call somehow, say by detecting when > create_plan_recurse is called after build_path_tlist (for a given path), > and if needed calling finalize_plan_tlist from create_plan_recurse. Sounds nice, but I'm unsure how we could do that. > Although, now that I look at it, promote_child_relation may be doing > something like that by adding stuff to the translate_* variables. Do you mean this? /* * Record this child as having been promoted. Some places treat child * relations in a special way, and this will give them a VIP ticket to * adulthood, where required. */ root->translated_childrelids = bms_add_members(root->translated_childrelids, child->relids); That's to get around a problem in use_physical_tlist() where there's different behaviour for RELOPT_OTHER_MEMBER_REL than there is for RELOPT_BASEREL. Another option might be to instead change the RELOPT_OTHER_MEMBER_REL into RELOPT_BASEREL, although I was a little too scared that might result in many other areas requiring being changed. I may be wrong about that though. We do, for example, occasionally change a RELOPT_BASEREL into a RELOPT_DEADREL, so RelOptInfos changing their RelOptKind is not a new concept, so perhaps it's fine. > I'm > not quite sure why we need to append to the lists, though? Multiple Append rels may be replaced by their only-child relation, so we'd need to be able to translate the targetlists for both. For example, joins above the Appends may contain Vars which belong to either of the Appends and those need to be translated into the promoted child relation's Vars. > 3) I'm not quite sure what replace_translatable_exprs does, or more > precisely why do we actually need it. At this point the comment pretty > much says "We need to do translation. This function does translation." > Perhaps it should mention why removing/skipping the AppendPath implies > the need for translation or something? It does mention "Expr translation is required to translate the targetlists of nodes above the Append", but perhaps I can make that a bit more clear. Let's say we have a query such as: SELECT * FROM fact f INNER JOIN dim1 d ON f.dim1_id = d.id WHERE f.date BETWEEN '2018-01-01' AND '2018-01-31'; Which results in a hash join and a single Jan 2018 partition being scanned for "fact". A plan without the Append node having been removed would have the Append targetlist containing the Vars from the "fact" table, as would the Hash Join node... The problem this function is trying to solve is translating the Vars in the Hash Join node (or any node above the Append) to change the "fact" Vars into "fact_jan2018" Vars. In create_plan.c we skip over any isproxy Append paths and instead return the recursive call to the single proxy-path. Without the translation we'd run into problems when trying to find Vars in the targetlists later. > 4) The thread talks about "[Merge]Append" but the patch apparently only > tweaks create_append_path and does absolutely nothing to "merge" case. > While looking into why, I notices that generate_mergeappend_paths always > gets all_child_pathkeys=NULL in this case (with single subpath), and so > we never create the MergePath at all. I suppose there's some condition > somewhere responsible for doing that, but I haven't found it ... Yeah, only Append paths are used as proxy paths. The confusion here is that the path creation logic has also been changed in allpaths.c (see generate_proxy_paths()), so that we no longer build MergeAppend paths when there's a single subpath. It's probably just the fact that Append is being hi-jacked to act as ProxyPath that's causing the confusion there. If you look over generate_proxy_paths and where it gets called from, you'll see that we, instead of creating App
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On 19 February 2018 at 18:01, David Rowley wrote: > On 19 February 2018 at 15:11, Tomas Vondra > wrote: >> and perhaps we should do s/isproxy/is_proxy/ which seems like the usual >> naming for boolean variables. > > You're right. I'll change that and post an updated patch. I'll also > reprocess your email again and try to improve some comments to make > the intent of the code more clear. I've made a few changes to the patch. "isproxy" is now "is_proxy". I've made the comments a bit more verbose at the top of the definition of struct AppendPath. Also shuffled some comments around in allpaths.c I've attached both a delta patch with just these changes, and also a completely new patch based on a recent master. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services remove_singleton_appends_2018-02-19_delta.patch Description: Binary data remove_singleton_appends_2018-02-19.patch Description: Binary data
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On Wed, Nov 15, 2017 at 3:17 PM, David Rowley wrote: > The remove_singleton_appends_examples_of_differences_2017-11-15.patch > which I've attached applies changes to the regression tests to make > many of the major tables partitioned tables with a single partition. > It also changes the expected results of the small insignificant plan > changes and leaves only the actual true differences. This file is not > intended for commit, but more just a demo of it working for a larger > variety of plan shapes. There is actually no additional tests with the > patch. It relies on the places in the existing tests which have > changed. I didn't add any extra as I can't think of any particular > area to target given that it's such a generic thing that can apply to > so many different cases. The last patch set does not apply and did not get any reviews. So I am moving this item to the next with waiting on author as status. -- Michael
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On 30 November 2017 at 15:34, Michael Paquier wrote: > On Wed, Nov 15, 2017 at 3:17 PM, David Rowley > wrote: > > The remove_singleton_appends_examples_of_differences_2017-11-15.patch > > which I've attached applies changes to the regression tests to make > > many of the major tables partitioned tables with a single partition. > > It also changes the expected results of the small insignificant plan > > changes and leaves only the actual true differences. This file is not > > intended for commit, but more just a demo of it working for a larger > > variety of plan shapes. There is actually no additional tests with the > > patch. It relies on the places in the existing tests which have > > changed. I didn't add any extra as I can't think of any particular > > area to target given that it's such a generic thing that can apply to > > so many different cases. > > The last patch set does not apply and did not get any reviews. So I am > moving this item to the next with waiting on author as status. > (returning from 2 weeks leave) Thanks for managing the commitfest. I've attached a patch which fixes the conflict with the regression test expected output in inherits.out. I've also made changes to the expected output in the new partition_prune test's expected output. I'll set this back to waiting on review now. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services remove_append_rel_2017-11-30.patch Description: Binary data
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On Thu, Dec 7, 2017 at 5:11 AM, David Rowley wrote: > > While rebasing this today I also noticed that we won't properly detect > unique joins in add_paths_to_joinrel() as we're still testing for > uniqueness against the partitioned parent rather than the only child. > This is likely not a huge problem since we'll always get a false > negative and never a false positive, but it is a missing optimisation. > I've not thought of how to solve it yet, it's perhaps not worth going > to too much trouble over. > It's only the jointrelids that are parent's relids, but all the following tests for unique join use RelOptInfo::relids which are child relids. case JOIN_UNIQUE_INNER: extra.inner_unique = bms_is_subset(sjinfo->min_lefthand, outerrel->relids); break; case JOIN_UNIQUE_OUTER: extra.inner_unique = innerrel_is_unique(root, outerrel->relids, innerrel, JOIN_INNER, restrictlist, false); break; default: extra.inner_unique = innerrel_is_unique(root, outerrel->relids, innerrel, jointype, restrictlist, false); break; Do you have a testcase, which shows the problem? -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On 11 December 2017 at 21:18, Ashutosh Bapat wrote: > On Thu, Dec 7, 2017 at 5:11 AM, David Rowley > wrote: >> While rebasing this today I also noticed that we won't properly detect >> unique joins in add_paths_to_joinrel() as we're still testing for >> uniqueness against the partitioned parent rather than the only child. >> This is likely not a huge problem since we'll always get a false >> negative and never a false positive, but it is a missing optimisation. >> I've not thought of how to solve it yet, it's perhaps not worth going >> to too much trouble over. >> > [...] > Do you have a testcase, which shows the problem? I do indeed: create table p (a int not null) partition by range (a); create table p1 partition of p for values from (minvalue) to (maxvalue); create unique index on p1 (a); create table t (a int not null); insert into p values(1),(2); insert into t values(1); analyze p; analyze t; explain (verbose, costs off) select * from p inner join t on p.a = t.a; QUERY PLAN - Nested Loop Output: p1.a, t.a Join Filter: (p1.a = t.a) -> Seq Scan on public.t Output: t.a -> Seq Scan on public.p1 Output: p1.a (7 rows) explain (verbose, costs off) select * from p1 inner join t on p1.a = t.a; QUERY PLAN - Nested Loop Output: p1.a, t.a Inner Unique: true Join Filter: (p1.a = t.a) -> Seq Scan on public.t Output: t.a -> Seq Scan on public.p1 Output: p1.a (8 rows) Notice that when we join to the partitioned table directly and the Append is removed, we don't get the "Inner Unique: true" -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On Mon, Dec 11, 2017 at 2:42 PM, David Rowley wrote: > On 11 December 2017 at 21:18, Ashutosh Bapat > wrote: >> On Thu, Dec 7, 2017 at 5:11 AM, David Rowley >> wrote: >>> While rebasing this today I also noticed that we won't properly detect >>> unique joins in add_paths_to_joinrel() as we're still testing for >>> uniqueness against the partitioned parent rather than the only child. >>> This is likely not a huge problem since we'll always get a false >>> negative and never a false positive, but it is a missing optimisation. >>> I've not thought of how to solve it yet, it's perhaps not worth going >>> to too much trouble over. >>> >> > [...] > >> Do you have a testcase, which shows the problem? > > I do indeed: > > create table p (a int not null) partition by range (a); > create table p1 partition of p for values from (minvalue) to (maxvalue); > create unique index on p1 (a); > create table t (a int not null); > > insert into p values(1),(2); > insert into t values(1); > > analyze p; > analyze t; > > explain (verbose, costs off) select * from p inner join t on p.a = t.a; > QUERY PLAN > - > Nested Loop >Output: p1.a, t.a >Join Filter: (p1.a = t.a) >-> Seq Scan on public.t > Output: t.a >-> Seq Scan on public.p1 > Output: p1.a > (7 rows) > > explain (verbose, costs off) select * from p1 inner join t on p1.a = t.a; > QUERY PLAN > - > Nested Loop >Output: p1.a, t.a >Inner Unique: true >Join Filter: (p1.a = t.a) >-> Seq Scan on public.t > Output: t.a >-> Seq Scan on public.p1 > Output: p1.a > (8 rows) > > Notice that when we join to the partitioned table directly and the > Append is removed, we don't get the "Inner Unique: true" Ok. I thought we don't use unique joins in partition-wise joins. Sorry for the misunderstanding. I think this didn't work with inheritance before partition-wise join as well for the same reason. Probably, we could do it if unique paths (sorted paths which can be unique-ified) bubble up the Append hierarchy. We already have code in add_paths_to_append_rel() to create sorted paths when even a single child has an index on it. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On 11 December 2017 at 22:23, Ashutosh Bapat wrote: > I think this didn't work with inheritance before partition-wise join > as well for the same reason. Yeah, I was just demonstrating a reason that the plans are not identical from querying a partitioned table when partition pruning eliminates all but one partition vs directly querying that single partition. I've been attaching some regression test changes which demonstrate some other subtle differences along with each version of the patch. I just wanted to be open with these subtle differences that I've encountered while working on this. > Probably, we could do it if unique paths (sorted paths which can be > unique-ified) bubble up the Append hierarchy. We already have code in > add_paths_to_append_rel() to create sorted paths when even a single > child has an index on it. Sometime in the future, I'd like to see some sort of UniqueKeys List in RelOptInfo which is initially populated by using looking at the unique indexes during build_simple_rel(). The idea is that these UniqueKeys would get propagated into RELOPT_JOINRELs when the join condition matches a set of unique keys on the other side of the join. e.g. If the outer side of the join had a UniqueKey matching the join condition then we could take all of the UniqueKeys from the RelOptInfo for the inner side of the join (and vice versa). This infrastructure would allow us to no-op a DISTINCT when the RelOptInfo has targetlist items which completely cover at least one UniqueKey set, or not bother performing a sort for a GROUP BY when the GROUP BY clause is covered by a UniqueKey. To fix the case we're discussing, we can also propagate those UniqueKeys to an AppendPath with a single subpath similar to how I'm propagating the PathKeys for the same case in this patch, that way the unique join code would find the UniqueKeys instead of what it does today and not find any unique indexes on the partitioned table. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On Mon, Dec 11, 2017 at 3:16 PM, David Rowley wrote: > > Sometime in the future, I'd like to see some sort of UniqueKeys List > in RelOptInfo which is initially populated by using looking at the > unique indexes during build_simple_rel(). The idea is that these > UniqueKeys would get propagated into RELOPT_JOINRELs when the join > condition matches a set of unique keys on the other side of the join. > e.g. If the outer side of the join had a UniqueKey matching the join > condition then we could take all of the UniqueKeys from the RelOptInfo > for the inner side of the join (and vice versa). This infrastructure > would allow us to no-op a DISTINCT when the RelOptInfo has targetlist > items which completely cover at least one UniqueKey set, or not bother > performing a sort for a GROUP BY when the GROUP BY clause is covered > by a UniqueKey. > That looks neat. > To fix the case we're discussing, we can also propagate those > UniqueKeys to an AppendPath with a single subpath similar to how I'm > propagating the PathKeys for the same case in this patch, that way the > unique join code would find the UniqueKeys instead of what it does > today and not find any unique indexes on the partitioned table. > Another possibility is to support partition-wise join between an unpartitioned table and a partitioned table by joining the unpartitioned table with each partition separately and appending the results. That's a lot of work and quite some new infrastructure. Each of those join will pick unique key if appropriate index exists on the partitions. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
I've attached a rebased patch. The previous patch was conflicting with parallel Hash Join (180428404) -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services remove_singleton_appends_2017-12-28.patch Description: Binary data
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On 28 December 2017 at 18:31, David Rowley wrote: > I've attached a rebased patch. > > The previous patch was conflicting with parallel Hash Join (180428404) And another one. Thanks to the patch tester [1], I realised that I didn't make check-world and there was a postgres_fdw test that was failing. The attached fixes. [1] http://commitfest.cputube.org/ -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services remove_singleton_appends_2018-01-07.patch Description: Binary data
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On Tue, 5 Feb 2019 at 12:05, David Rowley wrote: > Rebased again. and again, due to new create_append_path() call added in ab5fcf2b04f9. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services v13-0001-Forgo-generating-single-subpath-Append-and-Merge.patch Description: Binary data
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
David Rowley writes: > [ v13-0001-Forgo-generating-single-subpath-Append-and-Merge.patch ] I continue to think that this is the wrong way to go about it, and as proof of concept present the attached, which reproduces all of the regression-test plan changes appearing in v13 --- with a whole lot less mechanism and next to no added planning cycles (which certainly cannot be said of v13). I was a bit surprised to find that I didn't need to fool around with lying about whether [Merge]Append can project. I've not dug into the exact reason why, but I suspect it's that previous changes made in support of parallelization have resulted in ensuring that we push the upper tlist down to the children anyway, at some earlier stage. I haven't looked into whether this does the right things for parallel planning --- possibly create_[merge]append_path need to propagate up parallel-related path fields from the single child? Also, I wonder why you didn't teach ExecSupportsMarkRestore that a single-child MergeAppend can support mark/restore. I didn't add such code here either, but I suspect that's an oversight. One other remark is that the division of labor between create_[merge]append_path and their costsize.c subroutines seems pretty unprincipled. I'd be inclined to push all the relevant logic into costsize.c, but have not done so here. Moving the existing cost calculations in create_mergeappend_path into costsize.c would better be done as a separate refactoring patch, perhaps. regards, tom lane diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index 42108bd..2be67bc 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -8427,17 +8427,16 @@ SELECT t1.a,t2.b,t3.c FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) INNER J 400 | 400 | 0008 (4 rows) --- left outer join + nullable clasue -EXPLAIN (COSTS OFF) +-- left outer join + nullable clause +EXPLAIN (VERBOSE, COSTS OFF) SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3; -QUERY PLAN - Sort - Sort Key: t1.a, ftprt2_p1.b, ftprt2_p1.c - -> Append - -> Foreign Scan - Relations: (public.ftprt1_p1 t1) LEFT JOIN (public.ftprt2_p1 fprt2) -(5 rows) + QUERY PLAN + + Foreign Scan + Output: t1.a, ftprt2_p1.b, ftprt2_p1.c + Relations: (public.ftprt1_p1 t1) LEFT JOIN (public.ftprt2_p1 fprt2) + Remote SQL: SELECT r6.a, r9.b, r9.c FROM (public.fprt1_p1 r6 LEFT JOIN public.fprt2_p1 r9 ON (((r6.a = r9.b)) AND ((r6.b = r9.a)) AND ((r9.a < 10 WHERE ((r6.a < 10)) ORDER BY r6.a ASC NULLS LAST, r9.b ASC NULLS LAST, r9.c ASC NULLS LAST +(4 rows) SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3; a | b | c diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql index eb9d1ad..4728511 100644 --- a/contrib/postgres_fdw/sql/postgres_fdw.sql +++ b/contrib/postgres_fdw/sql/postgres_fdw.sql @@ -2309,8 +2309,8 @@ EXPLAIN (COSTS OFF) SELECT t1.a,t2.b,t3.c FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) INNER JOIN fprt1 t3 ON (t2.b = t3.a) WHERE t1.a % 25 =0 ORDER BY 1,2,3; SELECT t1.a,t2.b,t3.c FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) INNER JOIN fprt1 t3 ON (t2.b = t3.a) WHERE t1.a % 25 =0 ORDER BY 1,2,3; --- left outer join + nullable clasue -EXPLAIN (COSTS OFF) +-- left outer join + nullable clause +EXPLAIN (VERBOSE, COSTS OFF) SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3; SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3; diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c index 187f892..14ad738 100644 --- a/src/backend/executor/execAmi.c +++ b/src/backend/executor/execAmi.c @@ -447,6 +447,21 @@ ExecSupportsMarkRestore(Path *pathnode) return false; /* childless Result */ } + case T_Append: + { +AppendPath *appendPath = castNode(AppendPath, pathnode); + +/* + * If there's exact
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On Mon, 4 Mar 2019 at 16:26, Tom Lane wrote: > > David Rowley writes: > > [ v13-0001-Forgo-generating-single-subpath-Append-and-Merge.patch ] > > I continue to think that this is the wrong way to go about it, > and as proof of concept present the attached, which reproduces > all of the regression-test plan changes appearing in v13 --- > with a whole lot less mechanism and next to no added planning > cycles (which certainly cannot be said of v13). Nice. I see you solved the problem I had been complaining about by pulling the surplus Append out of the final plan after setrefs, in which case the varnos are all set to the special varnos, meaning you don't suffer from the same problem with nodes higher in the plan having the varnos of the parent instead of the append child. This method is certainly much much better than what I had. Thanks for taking the time to show me how this should be done. > Also, I wonder why you didn't teach ExecSupportsMarkRestore that a > single-child MergeAppend can support mark/restore. I didn't add such > code here either, but I suspect that's an oversight. The reason for that was that I didn't ever create any single-subpath MergeAppends. I instead created Append paths having isproxy as true. This is slightly confusing as I was just borrowing Append to act as this proxy path and it was only these proxy Appends I was dealing with in ExecSupportsMarkRestore. I'll need to modify your version of the patch as you're keeping single-subpath MergeAppends paths, so ExecSupportsMarkRestore needs to know about those. > One other remark is that the division of labor between > create_[merge]append_path and their costsize.c subroutines > seems pretty unprincipled. I'd be inclined to push all the > relevant logic into costsize.c, but have not done so here. > Moving the existing cost calculations in create_mergeappend_path > into costsize.c would better be done as a separate refactoring patch, > perhaps. The part I don't like about that is the fact that we don't generate sort paths in the MergeAppend subpaths, instead, we rely on checking pathkeys_contained_in() again in create_merge_append_plan() and just creating a sort node, if required. There is some repeat pathkeys checking there but I wonder if we'll see much a performance regression if we go to the trouble of making sort paths. Is this what you meant? -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
David Rowley writes: > On Mon, 4 Mar 2019 at 16:26, Tom Lane wrote: >> One other remark is that the division of labor between >> create_[merge]append_path and their costsize.c subroutines >> seems pretty unprincipled. I'd be inclined to push all the >> relevant logic into costsize.c, but have not done so here. >> Moving the existing cost calculations in create_mergeappend_path >> into costsize.c would better be done as a separate refactoring patch, >> perhaps. > The part I don't like about that is the fact that we don't generate > sort paths in the MergeAppend subpaths, instead, we rely on checking > pathkeys_contained_in() again in create_merge_append_plan() and just > creating a sort node, if required. There is some repeat pathkeys > checking there but I wonder if we'll see much a performance regression > if we go to the trouble of making sort paths. Is this what you meant? No, I'm just suggesting taking the stanza "Add up the sizes and costs of the input paths" out of create_merge_append_path and putting it into cost_merge_append. It seems weird to me that in the plain Append case, cost_append does the adding-up of the child costs, but in the MergeAppend case it's not done similarly. regards, tom lane
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On Mon, 4 Mar 2019 at 16:26, Tom Lane wrote: > I was a bit surprised to find that I didn't need to fool around > with lying about whether [Merge]Append can project. I've not dug > into the exact reason why, but I suspect it's that previous changes > made in support of parallelization have resulted in ensuring that > we push the upper tlist down to the children anyway, at some earlier > stage. I can get a plan that does end up with Result nodes above a Scan node. create table rangep (a int, b int) partition by range (a); create table rangep1 partition of rangep for values from(0) to (100); explain select r1::text from rangep r1 inner join rangep r2 on r1::text = r2::Text order by r1::text; However, if we modify is_projection_capable_plan() similar to how ExecSupportsMarkRestore() has been modified then we'd need to ensure that any changes made to the Append/MergeAppend's tlist also are made to the underlying node's tlist. I'm unsure if that's worth the complexity. What do you think? > I haven't looked into whether this does the right things for parallel > planning --- possibly create_[merge]append_path need to propagate up > parallel-related path fields from the single child? I looked at this. For Append, with the current code in add_paths_to_append_rel() we won't increase the parallel_workers higher than the parallel workers from the single child rel. The current code does: parallel_workers = Max(parallel_workers, fls(list_length(live_childrels))); so fls(1) == 1. However, I wonder if it's better to set that explicitly in create_append_path() for the list_length(pathnode->subpaths) == 1 case. The other issue I found when looking into the parallel code was that partial paths with pathkeys will get made, but will never be used. The reason is that Append can't normally do anything with these as it can't maintain their sort order and MergeAppend is not a parallel aware node. I ended up adding some new code at the bottom of generate_append_paths() to add these paths as single subpaths to Append nodes. This also required changing create_append_path() so that it records the child pathkeys when dealing with a single subpath Append. > Also, I wonder why you didn't teach ExecSupportsMarkRestore that a > single-child MergeAppend can support mark/restore. I didn't add such > code here either, but I suspect that's an oversight. Added code for that. > One other remark is that the division of labor between > create_[merge]append_path and their costsize.c subroutines > seems pretty unprincipled. I'd be inclined to push all the > relevant logic into costsize.c, but have not done so here. > Moving the existing cost calculations in create_mergeappend_path > into costsize.c would better be done as a separate refactoring patch, > perhaps. I've not touched that, but happy to do it as a separate patch. I've attached my changes to your v14 patch. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services drop-useless-merge-appends-15.patch Description: Binary data
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
David Rowley writes: > [ drop-useless-merge-appends-15.patch ] Pushed with some minor adjustments. > I can get a plan that does end up with Result nodes above a Scan node. > create table rangep (a int, b int) partition by range (a); > create table rangep1 partition of rangep for values from(0) to (100); > explain select r1::text from rangep r1 inner join rangep r2 on > r1::text = r2::Text order by r1::text; > However, if we modify is_projection_capable_plan() similar to how > ExecSupportsMarkRestore() has been modified then we'd need to ensure > that any changes made to the Append/MergeAppend's tlist also are made > to the underlying node's tlist. I'm unsure if that's worth the > complexity. What do you think? I'm inclined to think not, at least not without more compelling examples than that one ;-). Note that we had Result-above-the-Append before, so we're not regressing for such cases, we're just leaving something on the table. >> One other remark is that the division of labor between >> create_[merge]append_path and their costsize.c subroutines >> seems pretty unprincipled. I'd be inclined to push all the >> relevant logic into costsize.c, but have not done so here. >> Moving the existing cost calculations in create_mergeappend_path >> into costsize.c would better be done as a separate refactoring patch, >> perhaps. > I've not touched that, but happy to do it as a separate patch. After looking at this, I'm less excited than I was before. It seems it'd be only marginally less crufty than the way it is now. In particular, costsize.c would have to be aware of the rule about single-child MergeAppend being removable, which seems a little outside its purview: typically we only ask costsize.c to estimate the cost of a plan node as-presented, not to make decisions about what the shape of the plan tree will be. So I left it alone. regards, tom lane
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On Tue, 26 Mar 2019 at 09:00, Tom Lane wrote: > > David Rowley writes: > > [ drop-useless-merge-appends-15.patch ] > > Pushed with some minor adjustments. Thank for all your work on this, and thanks for the final push. FWIW, you should probably have credited yourself as the main author since I don't think there was much of my patch left. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On Mon, Feb 19, 2018 at 4:02 AM, David Rowley wrote: > On 19 February 2018 at 18:01, David Rowley > wrote: >> On 19 February 2018 at 15:11, Tomas Vondra >> wrote: >>> and perhaps we should do s/isproxy/is_proxy/ which seems like the usual >>> naming for boolean variables. >> >> You're right. I'll change that and post an updated patch. I'll also >> reprocess your email again and try to improve some comments to make >> the intent of the code more clear. > > I've made a few changes to the patch. "isproxy" is now "is_proxy". > I've made the comments a bit more verbose at the top of the definition > of struct AppendPath. Also shuffled some comments around in allpaths.c > > I've attached both a delta patch with just these changes, and also a > completely new patch based on a recent master. While I was working on http://postgr.es/m/ca+tgmoakt5gmahbpwgqrr2nadfomaonoxyowhrdvfgws34t...@mail.gmail.com I noticed that a projection path is very close to being usable as a proxy path, because we've already got code to strip out unnecessary proxy paths at plan generation time. I noticed that there were a few problems with that which I wrote patches to fix (see 0001, 0002 attached to that email) but they seem to be only minor issues. What do you think about the idea of using a projection path as a proxy path instead of inventing a new method? It seems simple enough to do: new_path = (Path *) create_projection_path(root, new_rel, old_path, old_path->pathtarget); ...when we need a proxy path. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On 14 March 2018 at 09:25, Robert Haas wrote: > What do you think about the idea of using a projection path as a proxy > path instead of inventing a new method? It seems simple enough to do: > > new_path = (Path *) create_projection_path(root, new_rel, old_path, > old_path->pathtarget); > > ...when we need a proxy path. I'm very open to finding a better way to do this, but does that not just handle the targetlist issue? The proxy path also carries information which allows the translation of Vars in other places in the plan from the old rel into the vars of the new rel. Join conditions, sort clauses etc. Wouldn't a ProjectionPath just need the same additional translation fields that I've bolted onto AppendPath to make it work properly? I've looked at the projection path code but got a bit confused with the dummypp field. I see where it gets set, but not where it gets checked. A comment in create_projection_plan() seems to explain that dummypp is pretty useless since targetlists are modified sometimes, so I'm a bit unsure what the point of it is. Maybe that just goes to show that my understanding of projection paths is not that great. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On Tue, Mar 13, 2018 at 8:20 PM, David Rowley wrote: > On 14 March 2018 at 09:25, Robert Haas wrote: >> What do you think about the idea of using a projection path as a proxy >> path instead of inventing a new method? It seems simple enough to do: >> >> new_path = (Path *) create_projection_path(root, new_rel, old_path, >> old_path->pathtarget); >> >> ...when we need a proxy path. > > I'm very open to finding a better way to do this, but does that not > just handle the targetlist issue? The proxy path also carries > information which allows the translation of Vars in other places in > the plan from the old rel into the vars of the new rel. Join > conditions, sort clauses etc. > > Wouldn't a ProjectionPath just need the same additional translation > fields that I've bolted onto AppendPath to make it work properly? Well, I guess I'm not sure. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On Thu, Mar 15, 2018 at 9:22 AM, Robert Haas wrote: >> Wouldn't a ProjectionPath just need the same additional translation >> fields that I've bolted onto AppendPath to make it work properly? > > Well, I guess I'm not sure. Sorry, hit send too soon there. I'm not sure I entirely understand the purpose of those changes; I think the comments could use some work. For example: + /* +* First we must record the translation expressions in the PlannerInfo. +* These need to be found when the expression translation is being done +* when the final plan is being assembled. +*/ + /* +* Record this child as having been promoted. Some places treat child +* relations in a special way, and this will give them a VIP ticket to +* adulthood, where required. +*/ These comments just recapitulate what the code does, without really explaining what problem we're trying to solve ("need" for unspecified reasons, unspecified "special" handling). That said, I gather that one problem is the path might contain references to child varnos where we need to reference parent varnos. That does seem like something we need to handle, but I'm not sure whether this is really the right method. I wonder if we couldn't deduce the necessary translations from the parent pointers of the paths. That is, if we've got a proxy path associated with the parent rel and the path it's proxying is associated with the child rel, then we need to translate from realpath->parent->relids to proxypath->parent_relids. Not that this is an argument of overwhelming import, but note that ExecSupportsMarkRestore wouldn't need to change if we used ProjectionPath; that's already got the required handling. If we stick with your idea of using AppendPath, do we actually need generate_proxy_paths()? What paths get lost if we don't have a special case for them here? I find promote_child_rel_equivalences() kind of scary. It seems like a change that might have unintended and hard-to-predict consequences. Perhaps that's only my own lack of knowledge. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
Robert Haas writes: > That said, I gather that one problem is the path might contain > references to child varnos where we need to reference parent varnos. > That does seem like something we need to handle, but I'm not sure > whether this is really the right method. I wonder if we couldn't > deduce the necessary translations from the parent pointers of the > paths. That is, if we've got a proxy path associated with the parent > rel and the path it's proxying is associated with the child rel, then > we need to translate from realpath->parent->relids to > proxypath->parent_relids. I hadn't been paying much attention to this thread, but I've now taken a quick look at the 2018-02-19 patch, and I've got to say I do not like it much. The changes in createplan.c in particular seem like hack-and- slash rather than anything principled or maintainable. The core issue here is that Paths involving the appendrel and higher will contain Vars referencing the appendrel's varno, whereas the child is set up to emit Vars containing its own varno, and somewhere we've got to match those up. I don't think though that this problem is exactly specific to single-member Appends, and so I would rather we not invent a solution that's specific to that. A nearly identical issue is getting rid of no-op SubqueryScan nodes. I've long wished we could simply not emit those in the first place, but it's really hard to do because of the fact that Vars inside the subquery have different varnos from those outside. (I've toyed with the idea of globally flattening the rangetable before we start planning, not at the end, but haven't made it happen yet; and anyway that would be only one step towards such a goal.) It might be worth looking at whether we couldn't fix the single-member- Append issue the same way we fix no-op SubqueryScans, ie let setrefs.c get rid of them. That's not the most beautiful solution perhaps, but it'd be very localized and low-risk. In general setrefs.c is the right place to deal with variable-matching issues. So even if you don't like that specific idea, it'd probably be worth thinking about handling this by recording instructions telling setrefs what to do, instead of actually doing it at earlier stages. regards, tom lane
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On Thu, Mar 15, 2018 at 11:01 AM, Tom Lane wrote: > It might be worth looking at whether we couldn't fix the single-member- > Append issue the same way we fix no-op SubqueryScans, ie let setrefs.c > get rid of them. That's not the most beautiful solution perhaps, but > it'd be very localized and low-risk. That's definitely a thought; it's a probably the simplest way of saving the run-time cost of the Append node. However, I don't think it's a great solution overall because it doesn't get us the other advantages that David mentions in his original post. I think that to gain those advantages we'll need to know at path-creation time that there won't ultimately be an Append node in the finished plan. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
Robert Haas writes: > On Thu, Mar 15, 2018 at 11:01 AM, Tom Lane wrote: >> It might be worth looking at whether we couldn't fix the single-member- >> Append issue the same way we fix no-op SubqueryScans, ie let setrefs.c >> get rid of them. That's not the most beautiful solution perhaps, but >> it'd be very localized and low-risk. > That's definitely a thought; it's a probably the simplest way of > saving the run-time cost of the Append node. However, I don't think > it's a great solution overall because it doesn't get us the other > advantages that David mentions in his original post. I think that to > gain those advantages we'll need to know at path-creation time that > there won't ultimately be an Append node in the finished plan. Meh. We could certainly know that by inspection ("only one child? it'll be history"). I remain of the opinion that this is a big patch with a small patch struggling to get out. regards, tom lane
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On 28 December 2017 at 18:31, David Rowley wrote: > I've attached a rebased patch. > > The previous patch was conflicting with parallel Hash Join (180428404) And another one. Thanks to the patch tester [1], I realised that I didn't make check-world and there was a postgres_fdw test that was failing. The attached fixes. [1] http://commitfest.cputube.org/ -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services remove_singleton_appends_2018-01-07.patch Description: Binary data
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
I've attached an updated patch which takes care of the build failure caused by the new call to create_append_path added in bb94ce4. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services remove_singleton_appends_2018-01-26.patch Description: Binary data
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
Hi, On 01/25/2018 01:55 PM, David Rowley wrote: > I've attached an updated patch which takes care of the build failure > caused by the new call to create_append_path added in bb94ce4. > I've been looking at the patch over the past few days. I haven't found any obvious issues with it, but I do have a couple of questions. 1) I can confirm that it indeed eliminates the Append overhead, using the example from [1], modified to use table with a single partition. Of course, this is a pretty formal check, because the patch simply removes the Append node and so it'd be utterly broken if the overhead did not disappear too. [1] https://www.postgresql.org/message-id/CAKJS1f9UXdk6ZYyqbJnjFO9a9hyHKGW7B%3DZRh-rxy9qxfPA5Gw%40mail.gmail.com 2) If I had to guess where the bugs will be, my guess would be a missing finalize_plan_tlist call - not necessarily in existing code, but perhaps in future improvements. I wonder if we could automate this call somehow, say by detecting when create_plan_recurse is called after build_path_tlist (for a given path), and if needed calling finalize_plan_tlist from create_plan_recurse. Although, now that I look at it, promote_child_relation may be doing something like that by adding stuff to the translate_* variables. I'm not quite sure why we need to append to the lists, though? 3) I'm not quite sure what replace_translatable_exprs does, or more precisely why do we actually need it. At this point the comment pretty much says "We need to do translation. This function does translation." Perhaps it should mention why removing/skipping the AppendPath implies the need for translation or something? 4) The thread talks about "[Merge]Append" but the patch apparently only tweaks create_append_path and does absolutely nothing to "merge" case. While looking into why, I notices that generate_mergeappend_paths always gets all_child_pathkeys=NULL in this case (with single subpath), and so we never create the MergePath at all. I suppose there's some condition somewhere responsible for doing that, but I haven't found it ... 5) The comment before AppendPath typedef says this: * An AppendPath with a single subpath can be set up to become a "proxy" * path. This allows a Path which belongs to one relation to be added to * the pathlist of some other relation. This is intended as generic * infrastructure, but its primary use case is to allow Appends with * only a single subpath to be removed from the final plan. I'm not quite sure how adding a 'isproxy' flag into AppendPath node makes this a generic infrastructure? Wouldn't an explicit ProxyPath be a better way to achieve that? What other proxy paths do you envision, and why should they be represented as AppendPath? Although, there seems to be a precedent in using AppendPath to represent dummy paths ... FWIW one advantage of a separate ProxyPath would be that it would be a much clearer breakage for third-party code (say, extensions tweaking paths using hooks), either at compile or execution time. With just a new flag in existing node, that may easily slip through. On the other hand no one promises the structures to be stable API ... 6) I suggest we add this assert to create_append_path: Assert(!isproxy || (isproxy && (list_length(subpaths)==1))); and perhaps we should do s/isproxy/is_proxy/ which seems like the usual naming for boolean variables. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On 19 February 2018 at 15:11, Tomas Vondra wrote: > 1) I can confirm that it indeed eliminates the Append overhead, using > the example from [1], modified to use table with a single partition. Of > course, this is a pretty formal check, because the patch simply removes > the Append node and so it'd be utterly broken if the overhead did not > disappear too. > > [1] > https://www.postgresql.org/message-id/CAKJS1f9UXdk6ZYyqbJnjFO9a9hyHKGW7B%3DZRh-rxy9qxfPA5Gw%40mail.gmail.com Thanks for testing that. > 2) If I had to guess where the bugs will be, my guess would be a missing > finalize_plan_tlist call - not necessarily in existing code, but perhaps > in future improvements. I agree with that. I'd say though that additionally for the future that we might end up with some missing translation calls to replace_translatable_exprs(). To try to ensure I didn't miss anything, I did previously modify the regression test tables "tenk" and "tenk1" to become partitioned tables with a single partition which allowed all possible values to try to ensure I'd not missed anywhere. I just failed to find a reasonable way to make this a more permanent test without enforcing that change on the tables as a permanent change. > I wonder if we could automate this call somehow, say by detecting when > create_plan_recurse is called after build_path_tlist (for a given path), > and if needed calling finalize_plan_tlist from create_plan_recurse. Sounds nice, but I'm unsure how we could do that. > Although, now that I look at it, promote_child_relation may be doing > something like that by adding stuff to the translate_* variables. Do you mean this? /* * Record this child as having been promoted. Some places treat child * relations in a special way, and this will give them a VIP ticket to * adulthood, where required. */ root->translated_childrelids = bms_add_members(root->translated_childrelids, child->relids); That's to get around a problem in use_physical_tlist() where there's different behaviour for RELOPT_OTHER_MEMBER_REL than there is for RELOPT_BASEREL. Another option might be to instead change the RELOPT_OTHER_MEMBER_REL into RELOPT_BASEREL, although I was a little too scared that might result in many other areas requiring being changed. I may be wrong about that though. We do, for example, occasionally change a RELOPT_BASEREL into a RELOPT_DEADREL, so RelOptInfos changing their RelOptKind is not a new concept, so perhaps it's fine. > I'm > not quite sure why we need to append to the lists, though? Multiple Append rels may be replaced by their only-child relation, so we'd need to be able to translate the targetlists for both. For example, joins above the Appends may contain Vars which belong to either of the Appends and those need to be translated into the promoted child relation's Vars. > 3) I'm not quite sure what replace_translatable_exprs does, or more > precisely why do we actually need it. At this point the comment pretty > much says "We need to do translation. This function does translation." > Perhaps it should mention why removing/skipping the AppendPath implies > the need for translation or something? It does mention "Expr translation is required to translate the targetlists of nodes above the Append", but perhaps I can make that a bit more clear. Let's say we have a query such as: SELECT * FROM fact f INNER JOIN dim1 d ON f.dim1_id = d.id WHERE f.date BETWEEN '2018-01-01' AND '2018-01-31'; Which results in a hash join and a single Jan 2018 partition being scanned for "fact". A plan without the Append node having been removed would have the Append targetlist containing the Vars from the "fact" table, as would the Hash Join node... The problem this function is trying to solve is translating the Vars in the Hash Join node (or any node above the Append) to change the "fact" Vars into "fact_jan2018" Vars. In create_plan.c we skip over any isproxy Append paths and instead return the recursive call to the single proxy-path. Without the translation we'd run into problems when trying to find Vars in the targetlists later. > 4) The thread talks about "[Merge]Append" but the patch apparently only > tweaks create_append_path and does absolutely nothing to "merge" case. > While looking into why, I notices that generate_mergeappend_paths always > gets all_child_pathkeys=NULL in this case (with single subpath), and so > we never create the MergePath at all. I suppose there's some condition > somewhere responsible for doing that, but I haven't found it ... Yeah, only Append paths are used as proxy paths. The confusion here is that the path creation logic has also been changed in allpaths.c (see generate_proxy_paths()), so that we no longer build MergeAppend paths when there's a single subpath. It's probably just the fact that Append is being hi-jacked to act as ProxyPath that's causing the confusion there. If you look over generate_proxy_paths and where it gets called from, you'll see that we, instead of creating App
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On 19 February 2018 at 18:01, David Rowley wrote: > On 19 February 2018 at 15:11, Tomas Vondra > wrote: >> and perhaps we should do s/isproxy/is_proxy/ which seems like the usual >> naming for boolean variables. > > You're right. I'll change that and post an updated patch. I'll also > reprocess your email again and try to improve some comments to make > the intent of the code more clear. I've made a few changes to the patch. "isproxy" is now "is_proxy". I've made the comments a bit more verbose at the top of the definition of struct AppendPath. Also shuffled some comments around in allpaths.c I've attached both a delta patch with just these changes, and also a completely new patch based on a recent master. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services remove_singleton_appends_2018-02-19_delta.patch Description: Binary data remove_singleton_appends_2018-02-19.patch Description: Binary data
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On 16 March 2018 at 02:46, Robert Haas wrote: > If we stick with your idea of using AppendPath, do we actually need > generate_proxy_paths()? What paths get lost if we don't have a > special case for them here? Actually, we do a surprisingly good job of allowing plan shapes to stay the same when planning for an only-child scan for an Append or MergeAppend. The main difference I did find was that and Append and MergeAppend don't support Mark and Restore, so if you just generated the same paths and simply skipped over Appends and MergeAppends you'd still be left with Materialize nodes which might not actually be required at all. This might not be huge, but seeing this made me worried that there might be some valid reason, if not today, then sometime in the future why it might not be safe to simply pluck the singleton Append/MergeAppend nodes out the plan tree. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On 16 March 2018 at 04:01, Tom Lane wrote: > I hadn't been paying much attention to this thread, but I've now taken > a quick look at the 2018-02-19 patch, and I've got to say I do not like > it much. The changes in createplan.c in particular seem like hack-and- > slash rather than anything principled or maintainable. Thanks for looking at this. I didn't manage to discover any other working solutions to when the Vars can be replaced. If we don't do this in createplan.c then it's going to cause problems in places such as apply_pathtarget_labeling_to_tlist, which is well before setrefs.c gets hold of the plan. > The core issue here is that Paths involving the appendrel and higher > will contain Vars referencing the appendrel's varno, whereas the child > is set up to emit Vars containing its own varno, and somewhere we've got > to match those up. I don't think though that this problem is exactly > specific to single-member Appends, and so I would rather we not invent a > solution that's specific to that. A nearly identical issue is getting > rid of no-op SubqueryScan nodes. I've long wished we could simply not > emit those in the first place, but it's really hard to do because of > the fact that Vars inside the subquery have different varnos from those > outside. (I've toyed with the idea of globally flattening the rangetable > before we start planning, not at the end, but haven't made it happen yet; > and anyway that would be only one step towards such a goal.) I'm not quite sure why you think the solution I came up with is specific to single-member Appends. The solution merely uses single-member Append paths as a proxy path for the solution which I've tried to make generic to any node type. For example, the patch also resolves the issue for MergeAppend, so certainly nothing in there is specific to single-member Appends. I could have made the proxy any other path type, it's just that you had suggested Append would be better than inventing ProxyPath, which is what I originally proposed. > It might be worth looking at whether we couldn't fix the single-member- > Append issue the same way we fix no-op SubqueryScans, ie let setrefs.c > get rid of them. That's not the most beautiful solution perhaps, but > it'd be very localized and low-risk. It might be possible, but wouldn't that only solve 1 out of 2 problems. The problem of the planner not generating the most optimal plan is ignored with this. For example, it does not make much sense to bolt a Materialize node on top of an IndexScan node in order to provide the IndexScan with mark/restore capabilities... They already allow that. > In general setrefs.c is the right place to deal with variable-matching > issues. So even if you don't like that specific idea, it'd probably be > worth thinking about handling this by recording instructions telling > setrefs what to do, instead of actually doing it at earlier stages. >From what I can see, setrefs.c is too late. ERRORs are generated before setrefs.c gets hold of the plan if we don't replace Vars. I'm not opposed to finding a better way to do this. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On Thu, 3 Jan 2019 at 08:01, Tomas Vondra wrote: > AFAICS the patch essentially does two things: (a) identifies Append > paths with a single member and (b) ensures the Vars are properly mapped > when the Append node is skipped when creating the plan. Yes, but traditional Append and MergeAppend paths with a single member are never actually generated. I say "traditional" as we do happen to use a single-subpath Append as the "proxy" path to represent a path that needs to not have a plan node generated during createplan. Originally I had a new Path type named "ProxyPath" to do this, but per recommendation from Tom, I overloaded Append to mean that. (Append already has its meaning overloaded in regards to dummy paths.) > I agree doing (a) in setrefs.c is too late, as mentioned earlier. But > perhaps we could do at least (b) to setrefs.c? The problem I found with doing var translations in setrefs.c was that the plan tree is traversed there breadth-first and in createplan.c we build the plan depth-first. The problem I didn't manage to overcome with that is that when we replace a "ProxyPath" (actually an Append path marked as is_proxy=true) it only alter targetlists, etc of nodes above that in the plan tree. The nodes below it and their targetlists don't care, as these don't fetch Vars from nodes above them. If we do the Var translation in setrefs then we've not yet looked at the nodes that don't care and we've already done the nodes that do care. So the tree traversal is backwards. I sort of borrowed the traversal in createplan.c since it was in the correct depth-first order. Equally, I could have invented an entirely new stage that traverses the path tree depth-first, but I imagined that would have added more overhead and raised even more questions. Piggy-backing on createplan seemed like the best option at the time. > I'd say mapping the Vars is the most fragile and error-prone piece of > this patch. It's a bit annoying that it's inventing a new infrastructure > to do that, translates expressions in various places, establishes new > rules for tlists (having to call finalize_plan_tlist when calling > build_path_tlist before create_plan_recurse), etc. I entirely agree. It's by far the worst part of the patch. Getting the code to a working state to do this was hard. I kept finding places that I'd missed the translation. I'd rather it worked some other way. I just don't yet know what that way is. I changed the core regression table "tenk" to become a partitioned table with a single partition in the hope to catch all of the places that need the translations to be performed. I'm not entirely confident that I've caught them all by doing this. I've attached an updated patch that's really just a rebased version of v8. The old version conflicted with changes made in 1db5667b. The only real change was to the commit message. I'd previously managed to miss out the part about not generating needless Append/MergeAppend paths can allow plan shapes that were previously not possible. I'd only mentioned the executor not having to pull tuples through these nodes, which increases performance. Not having this perhaps caused some of the confusion earlier in this thread. One drawback to this patch, which I'm unsure if I previously mentioned is that run-time pruning only works for Append and MergeAppend. If we remove the Append/MergeAppend node then the single Append/MergeAppend sub-plan has no hope of being pruned. Going from 1 to 0 sub-plans may not be that common, but no longer supporting that is something that needs to be considered. I had imagined that gained flexibility of plan shapes plus less executor overhead was better than that, but there likely is a small use-case for keeping that ability. It seems impossible to have both advantages of this patch without that disadvantage. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services v9-0001-Forgo-generating-single-subpath-Append-and-MergeA.patch Description: Binary data
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
I've attached an updated patch. The last one no longer applied due to the changes made in d723f5687 -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services v10-0001-Forgo-generating-single-subpath-Append-and-Merge.patch Description: Binary data
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
David Rowley writes: > On Thu, 3 Jan 2019 at 08:01, Tomas Vondra > wrote: >> AFAICS the patch essentially does two things: (a) identifies Append >> paths with a single member and (b) ensures the Vars are properly mapped >> when the Append node is skipped when creating the plan. >> I agree doing (a) in setrefs.c is too late, as mentioned earlier. But >> perhaps we could do at least (b) to setrefs.c? > The problem I found with doing var translations in setrefs.c was that > the plan tree is traversed there breadth-first and in createplan.c we > build the plan depth-first. The problem I didn't manage to overcome > with that is that when we replace a "ProxyPath" (actually an Append > path marked as is_proxy=true) it only alter targetlists, etc of nodes > above that in the plan tree. The nodes below it and their targetlists > don't care, as these don't fetch Vars from nodes above them. If we do > the Var translation in setrefs then we've not yet looked at the nodes > that don't care and we've already done the nodes that do care. So the > tree traversal is backwards. I don't quite buy this argument, because you haven't given a reason why it doesn't apply with equal force to setrefs.c's elimination of no-op SubqueryScan nodes. Yet that does work. Stepping back for a minute: even if we get this to work, how often is it going to do anything useful? It seems like most of the other development that's going on is trying to postpone pruning till later, so I wonder how often we'll really know at the beginning of planning that only one child will survive. regards, tom lane
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On Sat, 26 Jan 2019 at 13:51, Tom Lane wrote: > > David Rowley writes: > > On Thu, 3 Jan 2019 at 08:01, Tomas Vondra > > wrote: > >> AFAICS the patch essentially does two things: (a) identifies Append > >> paths with a single member and (b) ensures the Vars are properly mapped > >> when the Append node is skipped when creating the plan. > >> I agree doing (a) in setrefs.c is too late, as mentioned earlier. But > >> perhaps we could do at least (b) to setrefs.c? > > > The problem I found with doing var translations in setrefs.c was that > > the plan tree is traversed there breadth-first and in createplan.c we > > build the plan depth-first. The problem I didn't manage to overcome > > with that is that when we replace a "ProxyPath" (actually an Append > > path marked as is_proxy=true) it only alter targetlists, etc of nodes > > above that in the plan tree. The nodes below it and their targetlists > > don't care, as these don't fetch Vars from nodes above them. If we do > > the Var translation in setrefs then we've not yet looked at the nodes > > that don't care and we've already done the nodes that do care. So the > > tree traversal is backwards. > > I don't quite buy this argument, because you haven't given a reason > why it doesn't apply with equal force to setrefs.c's elimination of > no-op SubqueryScan nodes. Yet that does work. I assume you're talking about the code that's in set_subqueryscan_references() inside the trivial_subqueryscan() condition? If so, then that seems pretty different from what I'm doing here because trivial_subqueryscan() only ever returns true when the Vars/Exprs from the subquery's target list match the scan's targetlist exactly, in the same order. It's not quite that simple with the single-subpath Append/MergeAppend case since the Vars/Exprs won't be the same since they're from different relations and possibly in some different order. > Stepping back for a minute: even if we get this to work, how often > is it going to do anything useful? It seems like most of the other > development that's going on is trying to postpone pruning till later, > so I wonder how often we'll really know at the beginning of planning > that only one child will survive. It'll do something useful everytime the planner would otherwise produce an Append or MergeAppend with a single subpath. Pruning down to just 1 relation is pretty common, so seems worth optimising for. There's no other development that postpones pruning until later. If you're talking about run-time pruning then nothing is "postponed". We still attempt to prune during planning, it's just that we might not be able to prune in some cases until during execution, so we may do both. You make it sound like we're trying to do away with plan-time pruning, but that's not happening, in fact, we're working pretty hard to speed up the planner when plan-time pruning *can* be done. So far there are no patches to speed up planner for partitioned tables when the planner can't prune any partitions, although it is something that I've been giving some thought to. I've also attached a rebased patch. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services v11-0001-Forgo-generating-single-subpath-Append-and-Merge.patch Description: Binary data
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On Thu, 31 Jan 2019 at 17:22, David Rowley wrote: > I've also attached a rebased patch. Rebased again. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services v12-0001-Forgo-generating-single-subpath-Append-and-Merge.patch Description: Binary data
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
Now that the September 'fest is open for new patches I'm going to move this patch over there. This patch has become slightly less important than some other stuff, but I'd still like to come back to it. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On Mon, 2 Jul 2018 at 09:37, David Rowley wrote: > Now that the September 'fest is open for new patches I'm going to move > this patch over there. This patch has become slightly less important > than some other stuff, but I'd still like to come back to it. This had bit-rotted quite a bit, so I've rebased it on top of today's master. There's not really any other changes to speak of. I've not reevaluated the patch to see if there is any more simple way to take care of this problem yet. However, I do recall getting off to quite a few false starts with this patch and the way I have it now was the only way I saw to make it possible. Although, perhaps some more experience will show me something different. Anyway, I've attached the rebased version. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services v8-0001-Remove-single-subpath-Append-and-MergeAppend-node.patch Description: Binary data
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On 4/1/18 9:26 AM, David Rowley wrote: > On 16 March 2018 at 04:01, Tom Lane wrote: >> I hadn't been paying much attention to this thread, but I've now taken >> a quick look at the 2018-02-19 patch, and I've got to say I do not like >> it much. The changes in createplan.c in particular seem like hack-and- >> slash rather than anything principled or maintainable. > > Thanks for looking at this. I didn't manage to discover any other > working solutions to when the Vars can be replaced. If we don't do > this in createplan.c then it's going to cause problems in places such > as apply_pathtarget_labeling_to_tlist, which is well before setrefs.c > gets hold of the plan. > >> The core issue here is that Paths involving the appendrel and higher >> will contain Vars referencing the appendrel's varno, whereas the child >> is set up to emit Vars containing its own varno, and somewhere we've got >> to match those up. I don't think though that this problem is exactly >> specific to single-member Appends, and so I would rather we not invent a >> solution that's specific to that. A nearly identical issue is getting >> rid of no-op SubqueryScan nodes. I've long wished we could simply not >> emit those in the first place, but it's really hard to do because of >> the fact that Vars inside the subquery have different varnos from those >> outside. (I've toyed with the idea of globally flattening the rangetable >> before we start planning, not at the end, but haven't made it happen yet; >> and anyway that would be only one step towards such a goal.) > > I'm not quite sure why you think the solution I came up with is > specific to single-member Appends. The solution merely uses > single-member Append paths as a proxy path for the solution which I've > tried to make generic to any node type. For example, the patch also > resolves the issue for MergeAppend, so certainly nothing in there is > specific to single-member Appends. I could have made the proxy any > other path type, it's just that you had suggested Append would be > better than inventing ProxyPath, which is what I originally proposed. > I think a natural question is how the approach devised in this thread (based on single-member Append paths) could be used to deal with no-op Subquery nodes. I don't see an obvious way to do that, and I guess that's what Tom meant by "specific to single-member Appends". >> It might be worth looking at whether we couldn't fix the single-member- >> Append issue the same way we fix no-op SubqueryScans, ie let setrefs.c >> get rid of them. That's not the most beautiful solution perhaps, but >> it'd be very localized and low-risk. > > It might be possible, but wouldn't that only solve 1 out of 2 > problems. The problem of the planner not generating the most optimal > plan is ignored with this. For example, it does not make much sense to > bolt a Materialize node on top of an IndexScan node in order to > provide the IndexScan with mark/restore capabilities... They already > allow that. > Yeah, I think we can't do all the magic in setrefs.c if we want the decisions to affect plan shape in any significant way - the decision whether an Append node has only a single node needs to happen while constructing the path or shortly after it (before we build other paths on top of it). And before costing the path. So setrefs.c seems a too late for that :-( I suppose this limitation was acceptable for no-op Subqueries, but I'm not sure the same thing applies to single-member Appends. >> In general setrefs.c is the right place to deal with variable-matching >> issues. So even if you don't like that specific idea, it'd probably be >> worth thinking about handling this by recording instructions telling >> setrefs what to do, instead of actually doing it at earlier stages. > > From what I can see, setrefs.c is too late. ERRORs are generated > before setrefs.c gets hold of the plan if we don't replace Vars. > > I'm not opposed to finding a better way to do this. > AFAICS the patch essentially does two things: (a) identifies Append paths with a single member and (b) ensures the Vars are properly mapped when the Append node is skipped when creating the plan. I agree doing (a) in setrefs.c is too late, as mentioned earlier. But perhaps we could do at least (b) to setrefs.c? I'd say mapping the Vars is the most fragile and error-prone piece of this patch. It's a bit annoying that it's inventing a new infrastructure to do that, translates expressions in various places, establishes new rules for tlists (having to call finalize_plan_tlist when calling build_path_tlist before create_plan_recurse), etc. But we already do such mappings (not exactly the same, but similar) for other purposes - from setrefs.c. So why can't why do it the same way here? FWIW I haven't tried to do any of that, and I haven't looked on this patch for quite a long time, so perhaps I'm missing an obvious obstacle preventing us from doing that ... regards -- Tomas Vo
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On Mon, Jul 02, 2018 at 09:37:31AM +1200, David Rowley wrote: > Now that the September 'fest is open for new patches I'm going to move > this patch over there. This patch has become slightly less important > than some other stuff, but I'd still like to come back to it. Please note that the latest patch does not apply anymore, so I moved it to CF 2018-11 (already this time of the year!), waiting for your input. -- Michael signature.asc Description: PGP signature
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On Wed, Nov 15, 2017 at 3:17 PM, David Rowley wrote: > The remove_singleton_appends_examples_of_differences_2017-11-15.patch > which I've attached applies changes to the regression tests to make > many of the major tables partitioned tables with a single partition. > It also changes the expected results of the small insignificant plan > changes and leaves only the actual true differences. This file is not > intended for commit, but more just a demo of it working for a larger > variety of plan shapes. There is actually no additional tests with the > patch. It relies on the places in the existing tests which have > changed. I didn't add any extra as I can't think of any particular > area to target given that it's such a generic thing that can apply to > so many different cases. The last patch set does not apply and did not get any reviews. So I am moving this item to the next with waiting on author as status. -- Michael
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On 30 November 2017 at 15:34, Michael Paquier wrote: > On Wed, Nov 15, 2017 at 3:17 PM, David Rowley > wrote: > > The remove_singleton_appends_examples_of_differences_2017-11-15.patch > > which I've attached applies changes to the regression tests to make > > many of the major tables partitioned tables with a single partition. > > It also changes the expected results of the small insignificant plan > > changes and leaves only the actual true differences. This file is not > > intended for commit, but more just a demo of it working for a larger > > variety of plan shapes. There is actually no additional tests with the > > patch. It relies on the places in the existing tests which have > > changed. I didn't add any extra as I can't think of any particular > > area to target given that it's such a generic thing that can apply to > > so many different cases. > > The last patch set does not apply and did not get any reviews. So I am > moving this item to the next with waiting on author as status. > (returning from 2 weeks leave) Thanks for managing the commitfest. I've attached a patch which fixes the conflict with the regression test expected output in inherits.out. I've also made changes to the expected output in the new partition_prune test's expected output. I'll set this back to waiting on review now. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services remove_append_rel_2017-11-30.patch Description: Binary data
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On Thu, Dec 7, 2017 at 5:11 AM, David Rowley wrote: > > While rebasing this today I also noticed that we won't properly detect > unique joins in add_paths_to_joinrel() as we're still testing for > uniqueness against the partitioned parent rather than the only child. > This is likely not a huge problem since we'll always get a false > negative and never a false positive, but it is a missing optimisation. > I've not thought of how to solve it yet, it's perhaps not worth going > to too much trouble over. > It's only the jointrelids that are parent's relids, but all the following tests for unique join use RelOptInfo::relids which are child relids. case JOIN_UNIQUE_INNER: extra.inner_unique = bms_is_subset(sjinfo->min_lefthand, outerrel->relids); break; case JOIN_UNIQUE_OUTER: extra.inner_unique = innerrel_is_unique(root, outerrel->relids, innerrel, JOIN_INNER, restrictlist, false); break; default: extra.inner_unique = innerrel_is_unique(root, outerrel->relids, innerrel, jointype, restrictlist, false); break; Do you have a testcase, which shows the problem? -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On 11 December 2017 at 21:18, Ashutosh Bapat wrote: > On Thu, Dec 7, 2017 at 5:11 AM, David Rowley > wrote: >> While rebasing this today I also noticed that we won't properly detect >> unique joins in add_paths_to_joinrel() as we're still testing for >> uniqueness against the partitioned parent rather than the only child. >> This is likely not a huge problem since we'll always get a false >> negative and never a false positive, but it is a missing optimisation. >> I've not thought of how to solve it yet, it's perhaps not worth going >> to too much trouble over. >> > [...] > Do you have a testcase, which shows the problem? I do indeed: create table p (a int not null) partition by range (a); create table p1 partition of p for values from (minvalue) to (maxvalue); create unique index on p1 (a); create table t (a int not null); insert into p values(1),(2); insert into t values(1); analyze p; analyze t; explain (verbose, costs off) select * from p inner join t on p.a = t.a; QUERY PLAN - Nested Loop Output: p1.a, t.a Join Filter: (p1.a = t.a) -> Seq Scan on public.t Output: t.a -> Seq Scan on public.p1 Output: p1.a (7 rows) explain (verbose, costs off) select * from p1 inner join t on p1.a = t.a; QUERY PLAN - Nested Loop Output: p1.a, t.a Inner Unique: true Join Filter: (p1.a = t.a) -> Seq Scan on public.t Output: t.a -> Seq Scan on public.p1 Output: p1.a (8 rows) Notice that when we join to the partitioned table directly and the Append is removed, we don't get the "Inner Unique: true" -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On Mon, Dec 11, 2017 at 2:42 PM, David Rowley wrote: > On 11 December 2017 at 21:18, Ashutosh Bapat > wrote: >> On Thu, Dec 7, 2017 at 5:11 AM, David Rowley >> wrote: >>> While rebasing this today I also noticed that we won't properly detect >>> unique joins in add_paths_to_joinrel() as we're still testing for >>> uniqueness against the partitioned parent rather than the only child. >>> This is likely not a huge problem since we'll always get a false >>> negative and never a false positive, but it is a missing optimisation. >>> I've not thought of how to solve it yet, it's perhaps not worth going >>> to too much trouble over. >>> >> > [...] > >> Do you have a testcase, which shows the problem? > > I do indeed: > > create table p (a int not null) partition by range (a); > create table p1 partition of p for values from (minvalue) to (maxvalue); > create unique index on p1 (a); > create table t (a int not null); > > insert into p values(1),(2); > insert into t values(1); > > analyze p; > analyze t; > > explain (verbose, costs off) select * from p inner join t on p.a = t.a; > QUERY PLAN > - > Nested Loop >Output: p1.a, t.a >Join Filter: (p1.a = t.a) >-> Seq Scan on public.t > Output: t.a >-> Seq Scan on public.p1 > Output: p1.a > (7 rows) > > explain (verbose, costs off) select * from p1 inner join t on p1.a = t.a; > QUERY PLAN > - > Nested Loop >Output: p1.a, t.a >Inner Unique: true >Join Filter: (p1.a = t.a) >-> Seq Scan on public.t > Output: t.a >-> Seq Scan on public.p1 > Output: p1.a > (8 rows) > > Notice that when we join to the partitioned table directly and the > Append is removed, we don't get the "Inner Unique: true" Ok. I thought we don't use unique joins in partition-wise joins. Sorry for the misunderstanding. I think this didn't work with inheritance before partition-wise join as well for the same reason. Probably, we could do it if unique paths (sorted paths which can be unique-ified) bubble up the Append hierarchy. We already have code in add_paths_to_append_rel() to create sorted paths when even a single child has an index on it. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On 11 December 2017 at 22:23, Ashutosh Bapat wrote: > I think this didn't work with inheritance before partition-wise join > as well for the same reason. Yeah, I was just demonstrating a reason that the plans are not identical from querying a partitioned table when partition pruning eliminates all but one partition vs directly querying that single partition. I've been attaching some regression test changes which demonstrate some other subtle differences along with each version of the patch. I just wanted to be open with these subtle differences that I've encountered while working on this. > Probably, we could do it if unique paths (sorted paths which can be > unique-ified) bubble up the Append hierarchy. We already have code in > add_paths_to_append_rel() to create sorted paths when even a single > child has an index on it. Sometime in the future, I'd like to see some sort of UniqueKeys List in RelOptInfo which is initially populated by using looking at the unique indexes during build_simple_rel(). The idea is that these UniqueKeys would get propagated into RELOPT_JOINRELs when the join condition matches a set of unique keys on the other side of the join. e.g. If the outer side of the join had a UniqueKey matching the join condition then we could take all of the UniqueKeys from the RelOptInfo for the inner side of the join (and vice versa). This infrastructure would allow us to no-op a DISTINCT when the RelOptInfo has targetlist items which completely cover at least one UniqueKey set, or not bother performing a sort for a GROUP BY when the GROUP BY clause is covered by a UniqueKey. To fix the case we're discussing, we can also propagate those UniqueKeys to an AppendPath with a single subpath similar to how I'm propagating the PathKeys for the same case in this patch, that way the unique join code would find the UniqueKeys instead of what it does today and not find any unique indexes on the partitioned table. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On Mon, Dec 11, 2017 at 3:16 PM, David Rowley wrote: > > Sometime in the future, I'd like to see some sort of UniqueKeys List > in RelOptInfo which is initially populated by using looking at the > unique indexes during build_simple_rel(). The idea is that these > UniqueKeys would get propagated into RELOPT_JOINRELs when the join > condition matches a set of unique keys on the other side of the join. > e.g. If the outer side of the join had a UniqueKey matching the join > condition then we could take all of the UniqueKeys from the RelOptInfo > for the inner side of the join (and vice versa). This infrastructure > would allow us to no-op a DISTINCT when the RelOptInfo has targetlist > items which completely cover at least one UniqueKey set, or not bother > performing a sort for a GROUP BY when the GROUP BY clause is covered > by a UniqueKey. > That looks neat. > To fix the case we're discussing, we can also propagate those > UniqueKeys to an AppendPath with a single subpath similar to how I'm > propagating the PathKeys for the same case in this patch, that way the > unique join code would find the UniqueKeys instead of what it does > today and not find any unique indexes on the partitioned table. > Another possibility is to support partition-wise join between an unpartitioned table and a partitioned table by joining the unpartitioned table with each partition separately and appending the results. That's a lot of work and quite some new infrastructure. Each of those join will pick unique key if appropriate index exists on the partitions. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
I've attached a rebased patch. The previous patch was conflicting with parallel Hash Join (180428404) -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services remove_singleton_appends_2017-12-28.patch Description: Binary data