Re: [HACKERS] Extending constraint exclusion for implied constraints/conditions
On Mon, Jul 7, 2014 at 11:46 PM, Greg Stark st...@mit.edu wrote:
> On Mon, Jul 7, 2014 at 3:07 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>> I doubt it. The extra code isn't the problem so much, it's the extra
>> planning cycles spent trying to make proofs that will usually fail.
>> What you propose will create a combinatorial explosion in the number
>> of proof paths to be considered.
>
> Well, not necessarily. You only need to consider constraints on
> matching columns, and only when there's a join clause on those
> columns. So you could imagine, for example, sorting all the
> constraints by the eclass for the non-const side of their expression,
> then going through them linearly to see where you have multiple
> constraints on the same eclass.
>
> For what it's worth, I think there is a case where this is a common
> optimization: when you have multiple tables partitioned the same way.
> Then you ideally want to be able to turn that from a join of multiple
> appends into an append of multiple joins. This is even more important
> when it comes to a parallelizing executor, since it lets you do the
> joins in parallel.

Ah, right. Also, if the foreign tables come under the inheritance
hierarchy, we would want to push the joins down to the foreign servers.

> However, to get from here to there I guess you would need to turn the
> join of the appends into NxM joins of every pair of subscans and then
> figure out which ones to exclude. That would be pretty nuts. To do it
> reasonably we probably need the partitioning infrastructure we've been
> talking about, where Postgres would know what the partitioning key is
> and the order and range of the partitions, so it can directly generate
> the matching subjoins in less than n^2 time.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
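The "less than n^2" pairing Greg mentions can be sketched roughly as follows (Python used purely for illustration; the function name and the `(lo, hi)` bound representation are invented, not PostgreSQL's): if both tables are range-partitioned on the join key and the planner knows the sorted partition bounds, the matching sub-joins fall out of a single merge-style walk instead of an NxM cross check.

```python
def matching_subjoins(parts_a, parts_b):
    """Given two lists of non-overlapping (lo, hi) partition bounds,
    sorted on the shared partition key, pair only the partitions whose
    ranges intersect -- a merge walk, O(n + m), not NxM."""
    i = j = 0
    pairs = []
    while i < len(parts_a) and j < len(parts_b):
        a_lo, a_hi = parts_a[i]
        b_lo, b_hi = parts_b[j]
        if a_lo < b_hi and b_lo < a_hi:   # ranges overlap: a sub-join to plan
            pairs.append((i, j))
        if a_hi <= b_hi:                  # advance whichever range ends first
            i += 1
        else:
            j += 1
    return pairs

# Two tables partitioned identically into [0,10), [10,20), [20,30):
bounds = [(0, 10), (10, 20), (20, 30)]
print(matching_subjoins(bounds, bounds))  # [(0, 0), (1, 1), (2, 2)]
```

With identically partitioned tables the walk degenerates to pairing partition i with partition i, which is exactly the append-of-joins shape Greg describes.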
Re: [HACKERS] Extending constraint exclusion for implied constraints/conditions
Ashutosh Bapat ashutosh.ba...@enterprisedb.com writes:
> On Mon, Jul 7, 2014 at 7:37 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>> I doubt it. The extra code isn't the problem so much, it's the extra
>> planning cycles spent trying to make proofs that will usually fail.
>> What you propose will create a combinatorial explosion in the number
>> of proof paths to be considered.
>
> I can understand that it will create a combinatorial explosion in the
> number of predicates that need to be examined by constraint exclusion.
> I do not understand where the paths get involved here; constraint
> exclusion kicks in before paths are created.

Perhaps I should not have used the term "path" here, because I wasn't
referring to what the planner calls Paths. I just meant that there will
be many more ways to perhaps prove a constraint-exclusion result, and
the planner will have to go through them all. (Usually to no avail,
because how often do people write queries that are guaranteed to
produce no rows?)

An example of what I meant by combinatorial explosion is that if a
clause mentions K variables, and each of those variables has been
equated to M other variables, there are (M+1)^K possible derived
clauses, and it looks to me like we'd have to consider them all to
catch the sort of situation you're talking about.

I think this is going to require a *lot* of additional planner cycles,
and TBH you've not provided any very compelling example of why it's
worthwhile. Yeah, if you can exclude a large partition it's a win, but
how often is that going to happen in real-world queries? The one
example you gave seemed pretty artificial to me, ie unrelated to
typical partitioning constraints.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
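Tom's (M+1)^K figure can be sanity-checked with a small sketch (Python purely for illustration; the clause representation is invented): each of the K variables in a clause may stand for itself or for any of the M variables it has been equated to, and every combination is a distinct derived clause the planner would have to consider.

```python
from itertools import product

def derived_clauses(clause_vars, equivalences):
    """Enumerate every clause obtainable by substituting each variable
    with itself or any variable it has been equated to."""
    choices = [[v] + equivalences.get(v, []) for v in clause_vars]
    return [tuple(combo) for combo in product(*choices)]

# A clause mentioning K = 2 variables, each equated to M = 3 others:
eq = {"a": ["a1", "a2", "a3"], "b": ["b1", "b2", "b3"]}
variants = derived_clauses(["a", "b"], eq)
print(len(variants))  # (M+1)^K = 4^2 = 16
```

Even modest K and M multiply quickly, which is the heart of Tom's objection: almost all of these derived clauses will fail to prove anything.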
Re: [HACKERS] Extending constraint exclusion for implied constraints/conditions
On Tue, Jul 8, 2014 at 7:57 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Ashutosh Bapat ashutosh.ba...@enterprisedb.com writes:
>> On Mon, Jul 7, 2014 at 7:37 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>>> I doubt it. The extra code isn't the problem so much, it's the extra
>>> planning cycles spent trying to make proofs that will usually fail.
>>> What you propose will create a combinatorial explosion in the number
>>> of proof paths to be considered.
>>
>> I can understand that it will create a combinatorial explosion in the
>> number of predicates that need to be examined by constraint
>> exclusion. I do not understand where the paths get involved here;
>> constraint exclusion kicks in before paths are created.
>
> Perhaps I should not have used the term "path" here, because I wasn't
> referring to what the planner calls Paths. I just meant that there
> will be many more ways to perhaps prove a constraint-exclusion result,
> and the planner will have to go through them all. (Usually to no
> avail, because how often do people write queries that are guaranteed
> to produce no rows?)
>
> An example of what I meant by combinatorial explosion is that if a
> clause mentions K variables, and each of those variables has been
> equated to M other variables, there are (M+1)^K possible derived
> clauses, and it looks to me like we'd have to consider them all to
> catch the sort of situation you're talking about.

I agree.

> I think this is going to require a *lot* of additional planner
> cycles, and TBH you've not provided any very compelling example of
> why it's worthwhile. Yeah, if you can exclude a large partition it's
> a win, but how often is that going to happen in real-world queries?
> The one example you gave seemed pretty artificial to me, ie unrelated
> to typical partitioning constraints.

Yes, the example is a cooked one to show the problem, but the case can
be encountered easily with partitioned tables. A partitioned table
(with constraints on each partition) plus a few table-level constraints
can receive queries whose WHERE clauses yield empty results for one or
more partitions. Saving those table scans would be worth the time spent
deriving and examining the (M+1)^K clauses.

Greg's example about parallel joins, or joins between partitioned
foreign tables, would yield much better results if we had this
optimization. But I think I will wait till parallel joins or
partitioned foreign tables are a reality in PostgreSQL.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] Extending constraint exclusion for implied constraints/conditions
Ashutosh Bapat ashutosh.ba...@enterprisedb.com writes:
> Right now, the constraint exclusion code looks only at the provided
> conditions. If we want to avoid the table scan based on constraints
> in the above example, it will need to look at the implied conditions
> as well, e.g. val2 > 30 AND val = val2 => val > 30. Then constraint
> exclusion can see that val > 30 AND val < 30 are contradictory and
> infer that the result is going to be empty. We will need to add
> information about the transitive inferences between operators. Can we
> do that in PostgreSQL? Will the benefits be worth the code that needs
> to be added?

I doubt it. The extra code isn't the problem so much, it's the extra
planning cycles spent trying to make proofs that will usually fail.
What you propose will create a combinatorial explosion in the number of
proof paths to be considered.

> I can see some more benefits. We can use it to eliminate joins where
> the constraints on the joining relations may cause the join to be
> empty, e.g. ...

... and applying constraint exclusion to join relations would make that
problem even worse.

regards, tom lane
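The kind of inference Ashutosh is proposing can be sketched in miniature (Python for illustration only; the `(var, op, const)` representation and both function names are invented, not planner code): propagate a condition like `val2 > 30` across the equality `val = val2`, then notice that the derived `val > 30` contradicts a partition's `CHECK (val < 30)` constraint.

```python
def implied_conditions(equalities, conditions):
    """Propagate each (var, op, const) condition across var = var
    equalities: if x = y and y > 30, then x > 30 as well."""
    implied = set(conditions)
    for x, y in equalities:
        for var, op, const in conditions:
            if var == y:
                implied.add((x, op, const))
            if var == x:
                implied.add((y, op, const))
    return implied

def contradicts(conds):
    """Detect the simplest contradiction: var > c AND var < c."""
    return any((v, "<", c) in conds for (v, op, c) in conds if op == ">")

# WHERE clause gives val2 > 30 and val = val2; derive val > 30:
conds = implied_conditions([("val", "val2")], {("val2", ">", 30)})
check_constraint = ("val", "<", 30)  # the partition's CHECK constraint
print(contradicts(conds | {check_constraint}))  # True: scan can be skipped
```

A real implementation would need the full lattice of transitive inferences between btree operators (`<`, `<=`, `=`, `>=`, `>`), which is exactly the extra machinery, and the extra proof attempts, that Tom is weighing here.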
Re: [HACKERS] Extending constraint exclusion for implied constraints/conditions
On Mon, Jul 7, 2014 at 3:07 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> I doubt it. The extra code isn't the problem so much, it's the extra
> planning cycles spent trying to make proofs that will usually fail.
> What you propose will create a combinatorial explosion in the number
> of proof paths to be considered.

Well, not necessarily. You only need to consider constraints on
matching columns, and only when there's a join clause on those columns.
So you could imagine, for example, sorting all the constraints by the
eclass for the non-const side of their expression, then going through
them linearly to see where you have multiple constraints on the same
eclass.

For what it's worth, I think there is a case where this is a common
optimization: when you have multiple tables partitioned the same way.
Then you ideally want to be able to turn that from a join of multiple
appends into an append of multiple joins. This is even more important
when it comes to a parallelizing executor, since it lets you do the
joins in parallel.

However, to get from here to there I guess you would need to turn the
join of the appends into NxM joins of every pair of subscans and then
figure out which ones to exclude. That would be pretty nuts. To do it
reasonably we probably need the partitioning infrastructure we've been
talking about, where Postgres would know what the partitioning key is
and the order and range of the partitions, so it can directly generate
the matching subjoins in less than n^2 time.

--
greg
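Greg's bucketing idea, which avoids the combinatorial blow-up by only attempting proofs where two or more constraints land on the same equivalence class, might look roughly like this (Python for illustration; the `(eclass_id, op, const)` shape is invented, not the planner's EquivalenceClass machinery):

```python
from collections import defaultdict

def eclasses_needing_proof(constraints):
    """Group (eclass_id, op, const) constraints by eclass in one linear
    pass; only eclasses carrying 2+ constraints can possibly yield a
    contradiction, so only those are handed to the prover."""
    buckets = defaultdict(list)
    for ec, op, const in constraints:
        buckets[ec].append((op, const))
    return {ec: cs for ec, cs in buckets.items() if len(cs) > 1}

cons = [(1, ">", 30), (2, "<", 10), (1, "<", 30), (3, "=", 5)]
print(eclasses_needing_proof(cons))  # {1: [('>', 30), ('<', 30)]}
```

The point of the design is that the filtering pass is linear in the number of constraints; the expensive proof attempts are confined to the (usually few) eclasses that survive it.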
Re: [HACKERS] Extending constraint exclusion for implied constraints/conditions
On Mon, Jul 7, 2014 at 7:37 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Ashutosh Bapat ashutosh.ba...@enterprisedb.com writes:
>> Right now, the constraint exclusion code looks only at the provided
>> conditions. If we want to avoid the table scan based on constraints
>> in the above example, it will need to look at the implied conditions
>> as well, e.g. val2 > 30 AND val = val2 => val > 30. Then constraint
>> exclusion can see that val > 30 AND val < 30 are contradictory and
>> infer that the result is going to be empty. We will need to add
>> information about the transitive inferences between operators. Can
>> we do that in PostgreSQL? Will the benefits be worth the code that
>> needs to be added?
>
> I doubt it. The extra code isn't the problem so much, it's the extra
> planning cycles spent trying to make proofs that will usually fail.
> What you propose will create a combinatorial explosion in the number
> of proof paths to be considered.

I can understand that it will create a combinatorial explosion in the
number of predicates that need to be examined by constraint exclusion.
I do not understand where the paths get involved here; constraint
exclusion kicks in before paths are created:

    static void
    set_rel_size(PlannerInfo *root, RelOptInfo *rel,
                 Index rti, RangeTblEntry *rte)
    {
        if (rel->reloptkind == RELOPT_BASEREL &&
            relation_excluded_by_constraints(root, rel, rte))
        {
            /*
             * We proved we don't need to scan the rel via constraint
             * exclusion, so set up a single dummy path for it.  Here we
             * only check this for regular baserels; if it's an otherrel,
             * CE was already checked in set_append_rel_pathlist().
             *
             * In this case, we go ahead and set up the relation's path
             * right away instead of leaving it for set_rel_pathlist to
             * do.  This is because we don't have a convention for marking
             * a rel as dummy except by assigning a dummy path to it.
             */
            set_dummy_rel_pathlist(rel);
        }

and does not create paths for relations excluded by constraints:

    /*
     * set_rel_pathlist
     *    Build access paths for a base relation
     */
    static void
    set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
                     Index rti, RangeTblEntry *rte)
    {
        if (IS_DUMMY_REL(rel))
        {
            /* We already proved the relation empty, so nothing more to do */
        }

The same happens for joins, in make_join_rel():

    switch (sjinfo->jointype)
    {
        case JOIN_INNER:
            if (is_dummy_rel(rel1) || is_dummy_rel(rel2) ||
                restriction_is_constant_false(restrictlist, false))
            {
                mark_dummy_rel(joinrel);
                break;
            }
            add_paths_to_joinrel(root, joinrel, rel1, rel2,
                                 JOIN_INNER, sjinfo,
                                 restrictlist);
            add_paths_to_joinrel(root, joinrel, rel2, rel1,
                                 JOIN_INNER, sjinfo,
                                 restrictlist);
            break;

BTW, on a side note, I noticed that we use mark_dummy_rel() for
joinrels and set_dummy_rel_pathlist() for baserels. Shouldn't we be
using the same function everywhere for the same functionality (e.g.
adding an append path with no child paths)?

For larger relations, the time spent in constraint exclusion might be
less than the time taken by the actual table/index scan, especially
when such a scan is not going to produce any rows. So we might want to
apply the technique only when the estimated number of rows/pages is
above a certain threshold, and perhaps only when the GUC is on (like we
do today).

>> I can see some more benefits. We can use it to eliminate joins where
>> the constraints on the joining relations may cause the join to be
>> empty, e.g. ...
>
> ... and applying constraint exclusion to join relations would make
> that problem even worse.

The same goes for joins, where, potentially, the cross product of two
tables is huge.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
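The thresholding idea in the last paragraph, attempting the expensive implied-clause proof only when the relation is large enough for a skipped scan to pay off, could be gated roughly like this (Python sketch; the function name, thresholds, and GUC flag are all invented for illustration):

```python
def should_try_implied_exclusion(est_pages, est_rows, guc_enabled=True,
                                 page_threshold=1000, row_threshold=10000):
    """Attempt the (M+1)^K implied-clause proof only when the GUC is on
    and the estimated scan is big enough that skipping it pays for the
    extra planning cycles."""
    return guc_enabled and (est_pages >= page_threshold
                            or est_rows >= row_threshold)

print(should_try_implied_exclusion(est_pages=50000, est_rows=1_000_000))  # True
print(should_try_implied_exclusion(est_pages=10, est_rows=200))           # False
```

This mirrors how constraint_exclusion already gates the existing, cheaper proofs behind a GUC; the extra wrinkle here is a size estimate, since the proposed proofs are far more numerous.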