Re: [HACKERS] Design notes for EquivalenceClasses

2007-01-19 Thread Zeugswetter Andreas ADI SD
Tom Lane writes:
 Attached is some material from an updated src/backend/optimizer/README
 that describes the optimization principles that the EquivalenceClass
 rewrite is depending on.  Can anyone see any holes in the logic?

Sounds good, I can see no holes.

   SELECT *
 FROM a LEFT JOIN
  (SELECT * FROM b JOIN c ON b.y = c.z WHERE b.y = 10) ss
  ON a.x = ss.y
 WHERE a.x = 42;
 
 We can form the below-outer-join EquivalenceClass {b.y c.z 10} and
thereby
 apply c.z = 10 while scanning c.  (The reason we disallow
outerjoin-delayed
 clauses from forming EquivalenceClasses is exactly that we want to be
able
 to push any derived clauses as far down as possible.)  But once above
the
 outer join it's no longer necessarily the case that b.y = 10, and thus
we
 cannot use such EquivalenceClasses to conclude that sorting is
unnecessary
 (see discussion of PathKeys below).  In this example, notice also that
 a.x = ss.y (really a.x = b.y) is not an equivalence clause because its
 applicability to b is restricted by the outer join; thus we do not
make
 the mistake of concluding b.y = 42, even though we do have an
equivalence
 class for {a.x 42}.

I am not sure I understand the logic behind the above restriction
though.
Although b.y cannot be in the EquivalenceClass as described, it still
seems 
important/possible to push down b.y = 42 into ss. In above query ss can
then
even be const false (b.y=10 and b.y=42). Because of the outer join ss
can be 
null. Put another way (changing ss.y to ss.w (w col in table b)):
 
SELECT *
  FROM a LEFT JOIN
   (SELECT * FROM b JOIN c ON b.y = c.z WHERE b.y = 10) ss
   ON a.x = ss.w
  WHERE a.x = 42;

You can inject ss.w=42 into the ss where clause.

It seems what we want in addition to EquivalenceClasses, is logic to
push
(or rather copy) down a restriction but keep the upperlevel part of it
for
outer joins.

Andreas

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Design notes for EquivalenceClasses

2007-01-19 Thread Tom Lane
Zeugswetter Andreas ADI SD [EMAIL PROTECTED] writes:
 Tom Lane writes:
 SELECT *
 FROM a LEFT JOIN
 (SELECT * FROM b JOIN c ON b.y = c.z WHERE b.y = 10) ss
 ON a.x = ss.y
 WHERE a.x = 42;
 
 ... In this example, notice also that
 a.x = ss.y (really a.x = b.y) is not an equivalence clause because its
 applicability to b is restricted by the outer join; thus we do not make
 the mistake of concluding b.y = 42, even though we do have an equivalence
 class for {a.x 42}.

 I am not sure I understand the logic behind the above restriction
 though.  Although b.y cannot be in the EquivalenceClass as described,
 it still seems important/possible to push down b.y = 42 into ss.

Hmmm ... yeah, you're right, this example needs revision because we
actually do create {b.y 42} as a below outer join equivalence.
In fact with patch I get a plan like

 Nested Loop Left Join  (cost=76.05..139.42 rows=1331 width=12)
   -  Seq Scan on a  (cost=0.00..36.75 rows=11 width=4)
 Filter: (x = 42)
   -  Materialize  (cost=76.05..77.26 rows=121 width=8)
 -  Result  (cost=36.76..75.93 rows=121 width=8)
   One-Time Filter: (42 = 10)
   -  Nested Loop  (cost=36.76..75.93 rows=121 width=8)
 -  Seq Scan on b  (cost=0.00..36.75 rows=11 width=4)
   Filter: (y = 10)
 -  Materialize  (cost=36.76..36.87 rows=11 width=4)
   -  Seq Scan on c  (cost=0.00..36.75 rows=11 width=4)
 Filter: (z = 10)

which'll cause it to not evaluate the b/c join at all, as you suggested.
8.2 also realizes that b.y=42 is required, but it's a lot stupider about
what to do with the knowledge:

 Hash Left Join  (cost=81.79..118.59 rows=11 width=12)
   Hash Cond: (a.x = b.y)
   -  Seq Scan on a  (cost=0.00..36.75 rows=11 width=4)
 Filter: (x = 42)
   -  Hash  (cost=81.65..81.65 rows=11 width=8)
 -  Hash Join  (cost=42.11..81.65 rows=11 width=8)
   Hash Cond: (c.z = b.y)
   -  Seq Scan on c  (cost=0.00..31.40 rows=2140 width=4)
   -  Hash  (cost=42.10..42.10 rows=1 width=4)
 -  Seq Scan on b  (cost=0.00..42.10 rows=1 width=4)
   Filter: ((y = 10) AND (y = 42))

Notice 8.2 also fails to derive c.z=10.

 It seems what we want in addition to EquivalenceClasses, is logic to
 push (or rather copy) down a restriction but keep the upperlevel part
 of it for outer joins.

No, the bit that I was missing when I wrote that sentence was the
concept of a below outer join EquivalenceClass that allows values
to go to null.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Design notes for EquivalenceClasses

2007-01-18 Thread Simon Riggs
On Thu, 2007-01-18 at 11:53 +1100, Gavin Sherry wrote:
 the major rule in the executor: do what ever the plan tells you to do.

I thought the rule was: do what the plan tells you to do, as efficiently
as possible.

Turning an explicit step into a no-op seems like great execution to me.

In the case you mention, the HashJoin node already looks inside its Hash
node child. It seems possible to have the Sort node check at ExecInit
time that it is sitting on a HashJoin node, so that at Exec time it can
ask the HashJoin node whether the Hash node has spilled to disk or not.
If not, it can just pass the rows through as a no-op.

We could formalise a last words API so that parent nodes sometimes
calls child nodes before they execute them, or maybe just leave it
somewhat dirty as the H/HJ communication is now.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Design notes for EquivalenceClasses

2007-01-18 Thread Teodor Sigaev

Note that a bitmap scan or multi-pass indexscan (OR clause scan) has NIL
pathkeys since we can say nothing about the overall order of its result.


It's seems to me that multi-pass indexscan was removed after introducing 
bitmapscan.
--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] Design notes for EquivalenceClasses

2007-01-18 Thread Tom Lane
Teodor Sigaev [EMAIL PROTECTED] writes:
 Note that a bitmap scan or multi-pass indexscan (OR clause scan) has NIL
 pathkeys since we can say nothing about the overall order of its result.

 It's seems to me that multi-pass indexscan was removed after introducing 
 bitmapscan.

Yeah, but it might come back someday, so I didn't feel a need to change
that sentence...

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Design notes for EquivalenceClasses

2007-01-18 Thread Teodor Sigaev

Note that a bitmap scan or multi-pass indexscan (OR clause scan) has NIL
pathkeys since we can say nothing about the overall order of its result.

Yeah, but it might come back someday, so I didn't feel a need to change
that sentence...


Hmm. Our OR patch makes the same possibility by using Append node - without
reintroducing multi-pass indexscan. Moreover, it allows to sort OR clauses to 
match sort order.


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


[HACKERS] Design notes for EquivalenceClasses

2007-01-17 Thread Tom Lane
Attached is some material from an updated src/backend/optimizer/README
that describes the optimization principles that the EquivalenceClass
rewrite is depending on.  Can anyone see any holes in the logic?

I'm particularly interested in the discussion of allowing
EquivalenceClasses to be deduced from clauses below an outer join.
This is something our current code doesn't do at all, which is bad
because what had been a well-optimized view might fall apart the
moment you left-join to it.  I just today decided that I understood
how to do that, and haven't finished revising the patch to support it.
So if anyone sees a reason it won't work, please speak up ...

regards, tom lane


EquivalenceClasses
--

During the deconstruct_jointree() scan of the query's qual clauses, we look
for mergejoinable equality clauses A = B whose applicability is not delayed
by an outer join; these are called equivalence clauses.  When we find
one, we create an EquivalenceClass containing the expressions A and B to
record this knowledge.  If we later find another equivalence clause B = C,
we add C to the existing EquivalenceClass for {A B}; this may require
merging two existing EquivalenceClasses.  At the end of the scan, we have
sets of values that are known all transitively equal to each other.  We can
therefore use a comparison of any pair of the values as a restriction or
join clause (when these values are available at the scan or join, of
course); furthermore, we need test only one such comparison, not all of
them.  Therefore, equivalence clauses are removed from the standard qual
distribution process.  Instead, when preparing a restriction or join clause
list, we examine each EquivalenceClass to see if it can contribute a
clause, and if so we select an appropriate pair of values to compare.  For
example, if we are trying to join A's relation to C's, we can generate the
clause A = C, even though this appeared nowhere explicitly in the original
query.  This may allow us to explore join paths that otherwise would have
been rejected as requiring Cartesian-product joins.

Sometimes an EquivalenceClass may contain a pseudo-constant expression
(i.e., one not containing Vars or Aggs of the current query level, nor
volatile functions).  In this case we do not follow the policy of
dynamically generating join clauses: instead, we dynamically generate
restriction clauses var = const wherever one of the variable members of
the class can first be computed.  For example, if we have A = B and B = 42,
we effectively generate the restriction clauses A = 42 and B = 42, and then
we need not bother with explicitly testing the join clause A = B when the
relations are joined.  In effect, all the class members can be tested at
relation-scan level and there's never a need for join tests.

The precise technical interpretation of an EquivalenceClass is that it
asserts that at any plan node where more than one of its member values
can be computed, output rows in which the values are not all equal may
be discarded without affecting the query result.  (We require all levels
of the plan to enforce EquivalenceClasses, hence a node need not recheck
equality of values that were computable by one of its children.)  For an
ordinary EquivalenceClass that is valid everywhere, we can further infer
that the values are all non-null, because all mergejoinable operators are
strict.  However, we also allow equivalence clauses that appear below the
nullable side of an outer join to form EquivalenceClasses; for these
classes, the interpretation is that either all the values are equal, or
all (except pseudo-constants) have gone to null.  (This requires a
limitation that non-constant members be strict, else they might not go
to null when the other members do.)  Consider for example

SELECT *
  FROM a LEFT JOIN
   (SELECT * FROM b JOIN c ON b.y = c.z WHERE b.y = 10) ss
   ON a.x = ss.y
  WHERE a.x = 42;

We can form the below-outer-join EquivalenceClass {b.y c.z 10} and thereby
apply c.z = 10 while scanning c.  (The reason we disallow outerjoin-delayed
clauses from forming EquivalenceClasses is exactly that we want to be able
to push any derived clauses as far down as possible.)  But once above the
outer join it's no longer necessarily the case that b.y = 10, and thus we
cannot use such EquivalenceClasses to conclude that sorting is unnecessary
(see discussion of PathKeys below).  In this example, notice also that
a.x = ss.y (really a.x = b.y) is not an equivalence clause because its
applicability to b is restricted by the outer join; thus we do not make
the mistake of concluding b.y = 42, even though we do have an equivalence
class for {a.x 42}.

To aid in determining the sort ordering(s) that can work with a mergejoin,
we mark each mergejoinable clause with the EquivalenceClasses of its left
and right inputs.  For an equivalence clause, these are of course the same
EquivalenceClass.  

Re: [HACKERS] Design notes for EquivalenceClasses

2007-01-17 Thread Gavin Sherry
On Wed, 17 Jan 2007, Tom Lane wrote:

 strict.  However, we also allow equivalence clauses that appear below the
 nullable side of an outer join to form EquivalenceClasses; for these
 classes, the interpretation is that either all the values are equal, or
 all (except pseudo-constants) have gone to null.  (This requires a
 limitation that non-constant members be strict, else they might not go
 to null when the other members do.)  Consider for example

   SELECT *
 FROM a LEFT JOIN
  (SELECT * FROM b JOIN c ON b.y = c.z WHERE b.y = 10) ss
  ON a.x = ss.y
 WHERE a.x = 42;

 We can form the below-outer-join EquivalenceClass {b.y c.z 10} and thereby
 apply c.z = 10 while scanning c.  (The reason we disallow outerjoin-delayed
 clauses from forming EquivalenceClasses is exactly that we want to be able
 to push any derived clauses as far down as possible.)  But once above the
 outer join it's no longer necessarily the case that b.y = 10, and thus we
 cannot use such EquivalenceClasses to conclude that sorting is unnecessary
 (see discussion of PathKeys below).  In this example, notice also that
 a.x = ss.y (really a.x = b.y) is not an equivalence clause because its
 applicability to b is restricted by the outer join; thus we do not make
 the mistake of concluding b.y = 42, even though we do have an equivalence
 class for {a.x 42}.

This seems to be correct. A nice improvement.

 With a little further thought, it becomes apparent that nestloop joins
 can also produce sorted output.  For example, if we do a nestloop join
 between outer relation A and inner relation B, then any pathkeys relevant
 to A are still valid for the join result: we have not altered the order of
 the tuples from A.  Even more interesting, if there was an equivalence clause
 A.X=B.Y, and A.X was a pathkey for the outer relation A, then we can assert
 that B.Y is a pathkey for the join result; X was ordered before and still
 is, and the joined values of Y are equal to the joined values of X, so Y
 must now be ordered too.  This is true even though we used neither an
 explicit sort nor a mergejoin on Y.  (Note: hash joins cannot be counted
 on to preserve the order of their outer relation, because the executor
 might decide to batch the join, so we always set pathkeys to NIL for
 a hashjoin path.)  Exception: a RIGHT or FULL join doesn't preserve the
 ordering of its outer relation, because it might insert nulls at random
 points in the ordering.

I was thinking about this, but in relation to hash joins. A hash join
cannot be guaranteed to produce output sorted according to the pathkey of
the outer relation (as explained in the existing README). I wonder,
however, if it might be useful for hash join to pass a hint that the
output is known ordered (i.e., the join was not split into multiple
batches).

Thanks,

Gavin

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Design notes for EquivalenceClasses

2007-01-17 Thread Tom Lane
Gavin Sherry [EMAIL PROTECTED] writes:
 I was thinking about this, but in relation to hash joins. A hash join
 cannot be guaranteed to produce output sorted according to the pathkey of
 the outer relation (as explained in the existing README). I wonder,
 however, if it might be useful for hash join to pass a hint that the
 output is known ordered (i.e., the join was not split into multiple
 batches).

Yeah, I've considered that, but I think it'd have to be the other way
around: the planner tells the executor that it's assuming the output is
sorted, hence do not split into multiple batches.  This has the usual
assortment of problems if the planner has badly misestimated the
rowcount :-(

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Design notes for EquivalenceClasses

2007-01-17 Thread Gavin Sherry
On Wed, 17 Jan 2007, Tom Lane wrote:

 Gavin Sherry [EMAIL PROTECTED] writes:
  I was thinking about this, but in relation to hash joins. A hash join
  cannot be guaranteed to produce output sorted according to the pathkey of
  the outer relation (as explained in the existing README). I wonder,
  however, if it might be useful for hash join to pass a hint that the
  output is known ordered (i.e., the join was not split into multiple
  batches).

 Yeah, I've considered that, but I think it'd have to be the other way
 around: the planner tells the executor that it's assuming the output is
 sorted, hence do not split into multiple batches.  This has the usual
 assortment of problems if the planner has badly misestimated the
 rowcount :-(

Yep, I thought of that and discarded it for the reason you state.

I still think there would be some benefit to passing a hint up the
execution tree, effectively turning explicit sorts into no ops. This,
however, breaks the major rule in the executor: do what ever the plan
tells you to do.

Thanks,

Gavin

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly