Re: [HACKERS] Extending constraint exclusion for implied constraints/conditions

2014-07-08 Thread Ashutosh Bapat
On Mon, Jul 7, 2014 at 11:46 PM, Greg Stark st...@mit.edu wrote:

 On Mon, Jul 7, 2014 at 3:07 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  I doubt it.  The extra code isn't the problem so much, it's the extra
  planning cycles spent trying to make proofs that will usually fail.
  What you propose will create a combinatorial explosion in the number
  of proof paths to be considered.

 Well, not necessarily. You only need to consider constraints on
 matching columns and only when there's a join column on those columns.
 So you could imagine, for example, sorting all the constraints by the
 eclass for the non-const side of their expression, then going through
 them linearly to see where you have multiple constraints on the same
 eclass.

 For what it's worth I think there is a case where this is a common
 optimization. When you have multiple tables partitioned the same way.
 Then you ideally want to be able to turn that from an join of multiple
 appends into an append of multiple joins. This is even more important
 when it comes to a parallelizing executor since it lets you do the
 joins in parallel.


Ah, right. Also, if the foreign tables come under the inheritance
hierarchy,  and we want push joins to foreign servers.


 However to get from here to there I guess you would need to turn the
 join of the appends into NxM joins of every pair of subscans and then
 figure out which ones to exclude. That would be pretty nuts. To do it
 reasonably we probably need the partitioning infrastructure we've been
 talking about where Postgres would know what the partitioning key is
 and the order and range of the partitions so it can directly generate
 the matching subjoins in less than n^2 time.

 --
 greg




-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] Extending constraint exclusion for implied constraints/conditions

2014-07-08 Thread Tom Lane
Ashutosh Bapat ashutosh.ba...@enterprisedb.com writes:
 On Mon, Jul 7, 2014 at 7:37 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I doubt it.  The extra code isn't the problem so much, it's the extra
 planning cycles spent trying to make proofs that will usually fail.
 What you propose will create a combinatorial explosion in the number
 of proof paths to be considered.

 I can understand that it will create combinatorial explosion in the number
 of predicates that need to be examined by the constraint exclusion. I do
 not understand where come the paths gets involved here. The constraint
 exclusion kicks in before paths are created

Perhaps I should not have used the term path here, because I wasn't
referring to what the planner calls Paths.  I just meant that there will
be many more ways to perhaps prove a constraint-exclusion result, and the
planner will have to go through them all.  (Usually to no avail, because
how often do people write queries that are guaranteed to produce no rows?)

An example of what I meant by combinatorial explosion is that if a clause
mentions K variables, and each of those variables has been equated to M
other variables, there are (M+1)^K possible derived clauses, and it looks
to me like we'd have to consider them all to catch the sort of situation
you're talking about.

I think this is going to require a *lot* of additional planner cycles,
and TBH you've not provided any very compelling example of why it's
worthwhile.  Yeah, if you can exclude a large partition it's a win,
but how often is that going to happen in real-world queries?  The one
example you gave seemed pretty artificial to me, ie unrelated to typical
partitioning constraints.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extending constraint exclusion for implied constraints/conditions

2014-07-08 Thread Ashutosh Bapat
On Tue, Jul 8, 2014 at 7:57 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Ashutosh Bapat ashutosh.ba...@enterprisedb.com writes:
  On Mon, Jul 7, 2014 at 7:37 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  I doubt it.  The extra code isn't the problem so much, it's the extra
  planning cycles spent trying to make proofs that will usually fail.
  What you propose will create a combinatorial explosion in the number
  of proof paths to be considered.

  I can understand that it will create combinatorial explosion in the
 number
  of predicates that need to be examined by the constraint exclusion. I do
  not understand where come the paths gets involved here. The constraint
  exclusion kicks in before paths are created

 Perhaps I should not have used the term path here, because I wasn't
 referring to what the planner calls Paths.  I just meant that there will
 be many more ways to perhaps prove a constraint-exclusion result, and the
 planner will have to go through them all.

(Usually to no avail, because
 how often do people write queries that are guaranteed to produce no rows?)

 An example of what I meant by combinatorial explosion is that if a clause
 mentions K variables, and each of those variables has been equated to M
 other variables, there are (M+1)^K possible derived clauses, and it looks
 to me like we'd have to consider them all to catch the sort of situation
 you're talking about.


I agree.



 I think this is going to require a *lot* of additional planner cycles,
 and TBH you've not provided any very compelling example of why it's
 worthwhile.  Yeah, if you can exclude a large partition it's a win,
 but how often is that going to happen in real-world queries?  The one
 example you gave seemed pretty artificial to me, ie unrelated to typical
 partitioning constraints.


Yes, the example is a cooked one to show the problem. But the case can be
encountered easily in partitioned tables. A partitioned table (having
constraints on each partition) with few table level constraints, can
undergo queries with some clauses in WHERE clause which yield empty results
for one or more partitions. Saving some table scans would be worth the time
spent in bringing up (M+1)^K derived clauses and examining those. Greg's
example about parallel joins or joins between partitioned foreign tables
would yield much better results, if we have this optimization. But, I
think, I will wait till parallel joins or partitioned foreign tables is a
reality in PostgreSQL.


 regards, tom lane




-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] Extending constraint exclusion for implied constraints/conditions

2014-07-07 Thread Tom Lane
Ashutosh Bapat ashutosh.ba...@enterprisedb.com writes:
 Right now, constraint exclusion code looks only at the provided conditions.
 If we want avoid table scan based on constraints in the above example, it
 will need to look at the implied conditions as well. E.g. val2  30 AND val
 = val2 = val  30. Then the constraint exclusion can see that val  30 AND
 val  30 are contradictory and infer that the result is going to be empty.
 We will need to add information about the transitive inferences between
 operators. Can we do that in PostgreSQL? Will the benefits be worth the
 code, that needs to be added?

I doubt it.  The extra code isn't the problem so much, it's the extra
planning cycles spent trying to make proofs that will usually fail.
What you propose will create a combinatorial explosion in the number
of proof paths to be considered.

 I can see some more benefits. We can use it to eliminate joins where the
 constraints on joining relations may cause the join to be empty e.g.

... and applying constraint exclusion to join relations would make that
problem even worse.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extending constraint exclusion for implied constraints/conditions

2014-07-07 Thread Greg Stark
On Mon, Jul 7, 2014 at 3:07 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I doubt it.  The extra code isn't the problem so much, it's the extra
 planning cycles spent trying to make proofs that will usually fail.
 What you propose will create a combinatorial explosion in the number
 of proof paths to be considered.

Well, not necessarily. You only need to consider constraints on
matching columns and only when there's a join column on those columns.
So you could imagine, for example, sorting all the constraints by the
eclass for the non-const side of their expression, then going through
them linearly to see where you have multiple constraints on the same
eclass.

For what it's worth I think there is a case where this is a common
optimization. When you have multiple tables partitioned the same way.
Then you ideally want to be able to turn that from an join of multiple
appends into an append of multiple joins. This is even more important
when it comes to a parallelizing executor since it lets you do the
joins in parallel.

However to get from here to there I guess you would need to turn the
join of the appends into NxM joins of every pair of subscans and then
figure out which ones to exclude. That would be pretty nuts. To do it
reasonably we probably need the partitioning infrastructure we've been
talking about where Postgres would know what the partitioning key is
and the order and range of the partitions so it can directly generate
the matching subjoins in less than n^2 time.

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extending constraint exclusion for implied constraints/conditions

2014-07-07 Thread Ashutosh Bapat
On Mon, Jul 7, 2014 at 7:37 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Ashutosh Bapat ashutosh.ba...@enterprisedb.com writes:
  Right now, constraint exclusion code looks only at the provided
 conditions.
  If we want avoid table scan based on constraints in the above example, it
  will need to look at the implied conditions as well. E.g. val2  30 AND
 val
  = val2 = val  30. Then the constraint exclusion can see that val  30
 AND
  val  30 are contradictory and infer that the result is going to be
 empty.
  We will need to add information about the transitive inferences between
  operators. Can we do that in PostgreSQL? Will the benefits be worth the
  code, that needs to be added?

 I doubt it.  The extra code isn't the problem so much, it's the extra
 planning cycles spent trying to make proofs that will usually fail.
 What you propose will create a combinatorial explosion in the number
 of proof paths to be considered.


I can understand that it will create combinatorial explosion in the number
of predicates that need to be examined by the constraint exclusion. I do
not understand where come the paths gets involved here. The constraint
exclusion kicks in before paths are created

 220 static void
 221 set_rel_size(PlannerInfo *root, RelOptInfo *rel,
 222  Index rti, RangeTblEntry *rte)
 223 {
 224 if (rel-reloptkind == RELOPT_BASEREL 
 225 relation_excluded_by_constraints(root, rel, rte))
 226 {
 227 /*
 228  * We proved we don't need to scan the rel via constraint
exclusion,
 229  * so set up a single dummy path for it.  Here we only check
this for
 230  * regular baserels; if it's an otherrel, CE was already
checked in
 231  * set_append_rel_pathlist().
 232  *
 233  * In this case, we go ahead and set up the relation's path
right away
 234  * instead of leaving it for set_rel_pathlist to do.  This is
because
 235  * we don't have a convention for marking a rel as dummy
except by
 236  * assigning a dummy path to it.
 237  */
 238 set_dummy_rel_pathlist(rel);
 239 }

 and does not create paths for relations excluded by constraints

 295 /*
 296  * set_rel_pathlist
 297  *Build access paths for a base relation
 298  */
 299 static void
 300 set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 301  Index rti, RangeTblEntry *rte)
 302 {
 303 if (IS_DUMMY_REL(rel))
 304 {
 305 /* We already proved the relation empty, so nothing more to do
*/
 306 }

Same in the case of join in mark_join_rel()

 663  * JOIN_INNER.
 664  */
 665 switch (sjinfo-jointype)
 666 {
 667 case JOIN_INNER:
 668 if (is_dummy_rel(rel1) || is_dummy_rel(rel2) ||
 669 restriction_is_constant_false(restrictlist, false))
 670 {
 671 mark_dummy_rel(joinrel);
 672 break;
 673 }
 674 add_paths_to_joinrel(root, joinrel, rel1, rel2,
 675  JOIN_INNER, sjinfo,
 676  restrictlist);
 677 add_paths_to_joinrel(root, joinrel, rel2, rel1,
 678  JOIN_INNER, sjinfo,
 679  restrictlist);
 680 break;

BTW, on a side note, I noticed, we use mark_dummy_rel() for joinrels and
set_dummy_rel_pathlist() for baserel. Shouldn't we be using the same
function everywhere for the same functionality (e.g. adding an append path
with no child-paths).

For larger relations, the time spent in constraint exclusion might be
lesser than the time taken by actual table/index scan and that to when such
a scan is not going to produce any rows. So, we might want to apply the
technique only when the estimated number of rows/pages are above a certain
threshold and may be when the GUC is ON (like we do it today).


  I can see some more benefits. We can use it to eliminate joins where the
  constraints on joining relations may cause the join to be empty e.g.

 ... and applying constraint exclusion to join relations would make that
 problem even worse.


The same case goes with joins, where potentially, the crossproduct of two
tables is huge.


 regards, tom lane




-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company