Re: Making Vars outer-join aware

2023-07-23 Thread Anton A. Melnikov

On 08.06.2023 19:58, Tom Lane wrote:

I think the right thing here is not either of your patches, but
to tweak adjust_relid_set() to not fail on negative oldrelid.
I'll go make it so.


Thanks! This fully solves the problem with ChangeVarNodes() that i wrote above.



Hmm.  That implies that you're changing plan data structures around after
setrefs.c, which doesn't seem like a great design to me --- IMO that ought
to happen in PlanCustomPath, which will still see the original varnos.


My further searchers led to the fact that it is possible to immediately set the
necessary varnos during custom_scan->scan.plan.targetlist creation and leave the
сustom_scan->custom_scan_tlist = NIL rather than changing them later using 
ChangeVarNodes().
This resulted in a noticeable code simplification.
Thanks a lot for pointing on it!

Sincerely yours,

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Making Vars outer-join aware

2023-06-08 Thread Tom Lane
[ back from PGCon ... ]

"Anton A. Melnikov"  writes:
> On 04.05.2023 15:22, Tom Lane wrote:
>> Under what circumstances would you be trying to inject INDEX_VAR
>> into a nullingrel set?  Only outer-join relids should ever appear there.

> The thing is that i don't try to push INDEX_VAR into a nullingrel set at all,
> i just try to replace the existing rt_index that equals to INDEX_VAR in Var 
> nodes with
> the defined positive indexes by using ChangeVarNodes_walker() function call.

Hmm.  That implies that you're changing plan data structures around after
setrefs.c, which doesn't seem like a great design to me --- IMO that ought
to happen in PlanCustomPath, which will still see the original varnos.
However, it's probably not worth breaking existing code for this, so
now I agree that ChangeVarNodes ought to (continue to) allow negative
rt_index.

> Therefore it also seems better and more logical to me in the case of an index 
> that
> cannot possibly be a member of the Bitmapset, immediately return false.
> Here is a patch like that.

I do not like the blast radius of this patch.  Yes, I know about that
comment in bms_is_member --- I wrote it, if memory serves.  But it's
stood like that for more than two decades, and I believe it's caught
its share of mistakes.  This issue doesn't seem like a sufficient
reason to change a globally-visible behavior.

I think the right thing here is not either of your patches, but
to tweak adjust_relid_set() to not fail on negative oldrelid.
I'll go make it so.

regards, tom lane




Re: Making Vars outer-join aware

2023-05-29 Thread Anton A. Melnikov

Hello and sorry for the big delay in reply!

On 04.05.2023 15:22, Tom Lane wrote:

AFAICS the change you propose would serve only to mask bugs.


Yes, it's a bad decision.


Under what circumstances would you be trying to inject INDEX_VAR
into a nullingrel set?  Only outer-join relids should ever appear there.


The thing is that i don't try to push INDEX_VAR into a nullingrel set at all,
i just try to replace the existing rt_index that equals to INDEX_VAR in Var 
nodes with
the defined positive indexes by using ChangeVarNodes_walker() function call. It 
checks
if the nullingrel contains the existing rt_index and does it need to be updated 
too.
It calls bms_is_member() for that, but the last immediately throws an error
if the value to be checked is negative like INDEX_VAR.

But we are not trying to corrupt the existing nullingrel with this bad index,
so it doesn't seem like an serious error.
And this index certainly cannot be a member of the Bitmapset.

Therefore it also seems better and more logical to me in the case of an index 
that
cannot possibly be a member of the Bitmapset, immediately return false.

Here is a patch like that.

With the best regards,

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Companycommit cc1724759b898efc703867a83d38173e4b2794b5
Author: Anton A. Melnikov 
Date:   Mon May 29 13:52:42 2023 +0300

Return false from bms_is_member() if checked value is negative.

diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c
index 7ba3cf635b..3e1db5fda2 100644
--- a/src/backend/nodes/bitmapset.c
+++ b/src/backend/nodes/bitmapset.c
@@ -446,9 +446,9 @@ bms_is_member(int x, const Bitmapset *a)
 	int			wordnum,
 bitnum;
 
-	/* XXX better to just return false for x<0 ? */
+	/* bitmapset member cannot be negative */
 	if (x < 0)
-		elog(ERROR, "negative bitmapset member not allowed");
+		return false;
 	if (a == NULL)
 		return false;
 	wordnum = WORDNUM(x);


Re: Making Vars outer-join aware

2023-05-04 Thread Tom Lane
"Anton A. Melnikov"  writes:
> The thing is that for custom scan nodes as readme says:
> "INDEX_VAR is abused to signify references to columns of a custom scan tuple 
> type"
> But INDEX_VAR has a negative value, so it can not be used in varnullingrels 
> bitmapset.
> And therefore this improvement seems will not work with custom scan nodes and 
> some
> extensions that use such nodes.

Under what circumstances would you be trying to inject INDEX_VAR
into a nullingrel set?  Only outer-join relids should ever appear there.
AFAICS the change you propose would serve only to mask bugs.

regards, tom lane




Re: Making Vars outer-join aware

2023-05-04 Thread Anton A. Melnikov

Hello!

I'm having doubts about this fix but most likely i don't understand something.
Could you help me to figure it out, please.

The thing is that for custom scan nodes as readme says:
"INDEX_VAR is abused to signify references to columns of a custom scan tuple 
type"
But INDEX_VAR has a negative value, so it can not be used in varnullingrels 
bitmapset.
And therefore this improvement seems will not work with custom scan nodes and 
some
extensions that use such nodes.

If i'm wrong in my doubts and bitmapset for varnullingrels is ok, may be add a 
check before
adjust_relid_set() call like this:

@@ -569,9 +569,10 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context 
*context)
{
if (var->varno == context->rt_index)
var->varno = context->new_index;
-   var->varnullingrels = 
adjust_relid_set(var->varnullingrels,
-  
context->rt_index,
-  
context->new_index);
+   if (context->rt_index >= 0 && context->new_index >= 0)
+   var->varnullingrels = 
adjust_relid_set(var->varnullingrels,
+  
context->rt_index,
+

With the best wishes,

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Making Vars outer-join aware

2023-02-13 Thread Justin Pryzby
On Mon, Feb 13, 2023 at 03:33:15PM +0800, Richard Guo wrote:
> On Mon, Feb 13, 2023 at 7:58 AM Justin Pryzby  wrote:
> > The patch broke this query:
> >
> > select from pg_inherits inner join information_schema.element_types
> > right join (select from pg_constraint as sample_2) on true
> > on false, lateral (select scope_catalog, inhdetachpending from
> > pg_publication_namespace limit 3);
> > ERROR:  could not devise a query plan for the given query

> BTW, here is a simplified query that can trigger this issue on HEAD.
> 
> select * from t1 inner join t2 left join (select null as c from t3 left
> join t4 on true) as sub on true on true, lateral (select c, t1.a from t5
> offset 0 ) ss;

It probably doesn't need to be said that the original query was reduced
from sqlsmith...  But I mention that now to make it searchable.

Thanks,
-- 
Justin




Re: Making Vars outer-join aware

2023-02-13 Thread Tom Lane
Richard Guo  writes:
> Thanks for the report!  I've looked at it a little bit and traced down
> to function have_unsafe_outer_join_ref().  The comment there says
>  * In practice, this test never finds a problem ...
> This seems not correct as showed by the counterexample.

Right.  I'd noticed that the inner loop of that function was not
reached in our regression tests, and incorrectly concluded that it
was not reachable --- but I failed to consider cases where the
inner rel's parameterization depends on Vars from multiple places.

> The NOT_USED part of code is doing this check.  But I think we need a
> little tweak.  We should check the nullable side of related outer joins
> against the satisfied part, rather than inner_paramrels.  Maybe
> something like attached.

Agreed.

> However, this test seems to cost some cycles after the change.  So I
> wonder if it's worthwhile to perform it, considering that join order
> restrictions should be able to guarantee there is no problem here.

Yeah, I think we should reduce it to an Assert check.  It shouldn't be
worth the cycles to run in production, and that will also make it easier
to notice in sqlsmith testing if anyone happens across another
counterexample.

Pushed that way.  Thanks for the report and the patch!

regards, tom lane




Re: Making Vars outer-join aware

2023-02-12 Thread Richard Guo
On Mon, Feb 13, 2023 at 7:58 AM Justin Pryzby  wrote:

> The patch broke this query:
>
> select from pg_inherits inner join information_schema.element_types
> right join (select from pg_constraint as sample_2) on true
> on false, lateral (select scope_catalog, inhdetachpending from
> pg_publication_namespace limit 3);
> ERROR:  could not devise a query plan for the given query


Thanks for the report!  I've looked at it a little bit and traced down
to function have_unsafe_outer_join_ref().  The comment there says

 * In practice, this test never finds a problem ...
 * ...
 * It still seems worth checking
 * as a backstop, but we don't go to a lot of trouble: just reject if the
 * unsatisfied part includes any outer-join relids at all.

This seems not correct as showed by the counterexample.  ISTM that we
need to do the check honestly as what the other comment says

 * If the parameterization is only partly satisfied by the outer rel,
 * the unsatisfied part can't include any outer-join relids that could
 * null rels of the satisfied part.

The NOT_USED part of code is doing this check.  But I think we need a
little tweak.  We should check the nullable side of related outer joins
against the satisfied part, rather than inner_paramrels.  Maybe
something like attached.

However, this test seems to cost some cycles after the change.  So I
wonder if it's worthwhile to perform it, considering that join order
restrictions should be able to guarantee there is no problem here.

BTW, here is a simplified query that can trigger this issue on HEAD.

select * from t1 inner join t2 left join (select null as c from t3 left
join t4 on true) as sub on true on true, lateral (select c, t1.a from t5
offset 0 ) ss;

Thanks
Richard


v1-0001-Fix-for-have_unsafe_outer_join_ref.patch
Description: Binary data


Re: Making Vars outer-join aware

2023-02-12 Thread Justin Pryzby
On Mon, Jan 23, 2023 at 03:38:06PM -0500, Tom Lane wrote:
> Richard, are you planning to review this any more?  I'm getting
> a little antsy to get it committed.  For such a large patch,
> it's surprising it's had so few conflicts to date.

The patch broke this query:

select from pg_inherits inner join information_schema.element_types
right join (select from pg_constraint as sample_2) on true
on false, lateral (select scope_catalog, inhdetachpending from 
pg_publication_namespace limit 3);
ERROR:  could not devise a query plan for the given query





Re: Making Vars outer-join aware

2023-01-30 Thread Tom Lane
Richard Guo  writes:
> Sorry for the delayed reply.  I don't have any more review comments at
> the moment, except a nitpicking one.

> In optimizer/README at line 729 there is a query as

> SELECT * FROM a
>   LEFT JOIN (SELECT * FROM b WHERE b.z = 1) ss ON (a.x = b.y)
> WHERE a.x = 1;

> I think it should be

> SELECT * FROM a
>   LEFT JOIN (SELECT * FROM b WHERE b.z = 1) ss ON (a.x = ss.y)
> WHERE a.x = 1;

Oh, good catch, thanks.

> I have no objection to get it committed.

I'll push forward then.  Thanks for reviewing!

regards, tom lane




Re: Making Vars outer-join aware

2023-01-30 Thread Richard Guo
On Tue, Jan 24, 2023 at 4:38 AM Tom Lane  wrote:

> Richard, are you planning to review this any more?  I'm getting
> a little antsy to get it committed.  For such a large patch,
> it's surprising it's had so few conflicts to date.


Sorry for the delayed reply.  I don't have any more review comments at
the moment, except a nitpicking one.

In optimizer/README at line 729 there is a query as

SELECT * FROM a
  LEFT JOIN (SELECT * FROM b WHERE b.z = 1) ss ON (a.x = b.y)
WHERE a.x = 1;

I think it should be

SELECT * FROM a
  LEFT JOIN (SELECT * FROM b WHERE b.z = 1) ss ON (a.x = ss.y)
WHERE a.x = 1;

I have no objection to get it committed.

Thanks
Richard


Re: Making Vars outer-join aware

2023-01-24 Thread David G. Johnston
On Tue, Jan 24, 2023 at 1:25 PM Tom Lane  wrote:

> "David G. Johnston"  writes:
> > On Tue, Jan 24, 2023 at 12:31 PM Tom Lane  wrote:
> >> select ... from t1 left join t2 on (t1.x = t2.y and t1.x = 1);
> >>
> >> If we turn the generic equivclass.c logic loose on these clauses,
> >> it will deduce t2.y = 1, which is good, and then apply t2.y = 1 at
> >> the scan of t2, which is even better (since we might be able to turn
> >> that into an indexscan qual).  However, it will also try to apply
> >> t1.x = 1 at the scan of t1, and that's just wrong, because that
> >> will eliminate t1 rows that should come through with null extension.
>
> > Is there a particular comment or README where that last conclusion is
> > explained so that it makes sense.
>
> Hm?  It's a LEFT JOIN, so it must not eliminate any rows from t1.
> A row that doesn't have t1.x = 1 will appear in the output with
> null columns for t2 ... but it must still appear, so we cannot
> filter on t1.x = 1 in the scan of t1.
>
>
Ran some queries, figured it out.  Sorry for the noise.  I had turned the
behavior of the RHS side appearing in the ON clause into a personal general
rule then tried to apply it to the LHS (left join mental model) without
working through the rules from first principles.

David J.


Re: Making Vars outer-join aware

2023-01-24 Thread Tom Lane
"David G. Johnston"  writes:
> On Tue, Jan 24, 2023 at 12:31 PM Tom Lane  wrote:
>> select ... from t1 left join t2 on (t1.x = t2.y and t1.x = 1);
>> 
>> If we turn the generic equivclass.c logic loose on these clauses,
>> it will deduce t2.y = 1, which is good, and then apply t2.y = 1 at
>> the scan of t2, which is even better (since we might be able to turn
>> that into an indexscan qual).  However, it will also try to apply
>> t1.x = 1 at the scan of t1, and that's just wrong, because that
>> will eliminate t1 rows that should come through with null extension.

> Is there a particular comment or README where that last conclusion is
> explained so that it makes sense.

Hm?  It's a LEFT JOIN, so it must not eliminate any rows from t1.
A row that doesn't have t1.x = 1 will appear in the output with
null columns for t2 ... but it must still appear, so we cannot
filter on t1.x = 1 in the scan of t1.

regards, tom lane




Re: Making Vars outer-join aware

2023-01-24 Thread David G. Johnston
On Tue, Jan 24, 2023 at 12:31 PM Tom Lane  wrote:

> I wrote:
> > Hans Buschmann  writes:
> >> I just noticed your new efforts in this area.
> >> I wanted to recurr to my old thread [1] considering constant
> propagation of quals.
> >> [1]
> https://www.postgresql.org/message-id/1571413123735.26...@nidsa.net
>
> > Yeah, this patch series is not yet quite up to the point of improving
> > that.  That area is indeed the very next thing I want to work on, and
> > I did spend some effort on it last month, but I ran out of time to get
> > it working.  Maybe we'll have something there for v17.
>
> BTW, to clarify what's going on there: what I want to do is allow
> the regular equivalence-class machinery to handle deductions from
> equality operators appearing in LEFT JOIN ON clauses (maybe full
> joins too, but I'd be satisfied if it works for one-sided outer
> joins).  I'd originally hoped that distinguishing pre-nulled from
> post-nulled variables would be enough to make that safe, but it's
> not.  Here's an example:
>
> select ... from t1 left join t2 on (t1.x = t2.y and t1.x = 1);
>
> If we turn the generic equivclass.c logic loose on these clauses,
> it will deduce t2.y = 1, which is good, and then apply t2.y = 1 at
> the scan of t2, which is even better (since we might be able to turn
> that into an indexscan qual).  However, it will also try to apply
> t1.x = 1 at the scan of t1, and that's just wrong, because that
> will eliminate t1 rows that should come through with null extension.
>
>
Is there a particular comment or README where that last conclusion is
explained so that it makes sense.  Intuitively, I would expect t1.x = 1 to
be applied during the scan of t1 - it isn't like the output of the join is
allowed to include t1 rows not matching that condition anyway.

IOW, I thought the more verbose but equivalent syntax for that was:

select ... from (select * from t1 as insub where insub.x = 1) as t1 left
join t2 on (t1.x  = t2.y)

Thanks!

David J.


Re: Making Vars outer-join aware

2023-01-24 Thread Tom Lane
I wrote:
> Hans Buschmann  writes:
>> I just noticed your new efforts in this area.
>> I wanted to recurr to my old thread [1] considering constant propagation of 
>> quals.
>> [1]  https://www.postgresql.org/message-id/1571413123735.26...@nidsa.net

> Yeah, this patch series is not yet quite up to the point of improving
> that.  That area is indeed the very next thing I want to work on, and
> I did spend some effort on it last month, but I ran out of time to get
> it working.  Maybe we'll have something there for v17.

BTW, to clarify what's going on there: what I want to do is allow
the regular equivalence-class machinery to handle deductions from
equality operators appearing in LEFT JOIN ON clauses (maybe full
joins too, but I'd be satisfied if it works for one-sided outer
joins).  I'd originally hoped that distinguishing pre-nulled from
post-nulled variables would be enough to make that safe, but it's
not.  Here's an example:

select ... from t1 left join t2 on (t1.x = t2.y and t1.x = 1);

If we turn the generic equivclass.c logic loose on these clauses,
it will deduce t2.y = 1, which is good, and then apply t2.y = 1 at
the scan of t2, which is even better (since we might be able to turn
that into an indexscan qual).  However, it will also try to apply
t1.x = 1 at the scan of t1, and that's just wrong, because that
will eliminate t1 rows that should come through with null extension.

My current plan for making this work is to define
EquivalenceClass-generated clauses as applying within "join domains",
which are sets of inner-joined relations, and in the case of a one-sided
outer join then the join itself belongs to the same join domain as its
right-hand side --- but not to the join domain of its left-hand side.
This would allow us to push EC clauses from an outer join's qual down
into the RHS, but not into the LHS, and then anything leftover would
still have to be applied at the join.  In this example we'd have to
apply t1.x = t2.y or t1.x = 1, but not both, at the join.

I got as far as inventing join domains, in the 0012 patch of this
series, but I haven't quite finished puzzling out the clause application
rules that would be needed for this scenario.  Ordinarily an EC
containing a constant would be fully enforced at the scan level
(i.e., apply t1.x = 1 and t2.y = 1 at scan level) and generate no
additional clauses at join level; but that clearly doesn't work
anymore when some of the scans are outside the join domain.
I think that the no-constant case might need to be different too.
I have some WIP code but nothing I can show.

Also, this doesn't seem to help for full joins.  We can treat the
two sides as each being their own join domains, but then the join's
own ON clause doesn't belong to either one, since we can't throw
away rows from either side on the basis of a restriction from ON.
So it seems like we'll still need ad-hoc logic comparable to
reconsider_full_join_clause, if we want to preserve that optimization.
I'm only mildly discontented with that, but still discontented.

regards, tom lane




Re: Making Vars outer-join aware

2023-01-24 Thread Tom Lane
Hans Buschmann  writes:
> I just noticed your new efforts in this area.
> I wanted to recurr to my old thread [1] considering constant propagation of 
> quals.
> [1]  https://www.postgresql.org/message-id/1571413123735.26...@nidsa.net

Yeah, this patch series is not yet quite up to the point of improving
that.  That area is indeed the very next thing I want to work on, and
I did spend some effort on it last month, but I ran out of time to get
it working.  Maybe we'll have something there for v17.

regards, tom lane




Re: Making Vars outer-join aware

2023-01-24 Thread Hans Buschmann
Hello Tom


I just noticed your new efforts in this area.


I wanted to recurr to my old thread [1] considering constant propagation of 
quals.


You gave an elaborated explanation at that time, but my knowledge was/is not 
yet sufficient to reveil the technical details.


In our application the described method is widespread used with much success 
(now at pg15.1 Fedora), but for unexperienced SQL authors this is not really 
obviously to choose (i.e. using the explicit constant xx_season=3 as qual). 
This always requires a "Macro" processor to compose the queries (in my case 
php) and a lot of programmer effort in the source code.


I can't review/understand your patchset for the planner, but since it covers 
the same area, the beformentioned optimization could perhaps be addressed too.


With respect of the nullability of these quals I immediately changed all of 
them to NOT NULL, which seems the most natural way when these quals are also 
used for partioning.


[1]  https://www.postgresql.org/message-id/1571413123735.26...@nidsa.net


Thanks for looking


Hans Buschmann



Re: Making Vars outer-join aware

2022-12-28 Thread Tom Lane
Richard Guo  writes:
> I think I see where the problem is.  And I can see currently in
> get_eclass_for_sort_expr we always use the top JoinDomain.  So although
> the equality clause 'a.x = b.y' belongs to JoinDomain {B}, we set up ECs
> for 'a.x' and 'b.y' that belong to the top JoinDomain {A, B, A/B}.

Yeah, that's a pretty squishy point, and likely wrong in detail.
If we want to associate an EC with the sort order of an index on
b.y, we almost certainly want that EC to belong to join domain {B}.
I had tried to do that in an earlier iteration of 0012, by dint of
adding a JoinDomain argument to get_eclass_for_sort_expr and then
having build_index_pathkeys specify the lowest join domain containing
the index's relation.  It did not work very well: it couldn't generate
mergejoins on full-join clauses, IIRC.

Maybe some variant on that plan can be made to fly, but I'm not at
all clear on what needs to be adjusted.  Anyway, that's part of why
I backed off on the notion of explicitly associating ECs with join
domains.  As long as we only pay attention to the join domain in
connection with constants, get_eclass_for_sort_expr can get away with
just using the top join domain, because we'd never apply it to a
constant unless perhaps somebody manages to ORDER BY or GROUP BY a
constant, and in those cases the top domain really is the right one.
(It's syntactically difficult to write such a thing anyway, but not
impossible.)

I think this is sort of an intermediate state, and hopefully a
year from now we'll have a better idea of how to manage all that.
What I mainly settled for doing in 0012 was getting rid of the
"below outer join" concept for ECs, because having to identify
a value for that had been giving me fits in several previous
attempts at extending ECs to cover outer-join equalities.
I think that the JoinDomain concept will offer a better-defined
replacement.

regards, tom lane




Re: Making Vars outer-join aware

2022-12-28 Thread Richard Guo
On Tue, Dec 27, 2022 at 11:31 PM Tom Lane  wrote:

> The thing that I couldn't get around before is that if you have,
> say, a mergejoinable equality clause in an outer join:
>
> select ... from a left join b on a.x = b.y;
>
> that equality clause can only be associated with the join domain
> for B, because it certainly can't be enforced against A.  However,
> you'd still wish to be able to do a mergejoin using indexes on
> a.x and b.y, and this means that we have to understand the ordering
> induced by a PathKey based on this EC as applicable to A, even
> though that relation is not in the same join domain.  So there are
> situations where sort orderings apply across domain boundaries even
> though equalities don't.  We might have to split the notion of
> EquivalenceClass into two sorts of objects, and somewhere right
> about here is where I realized that this wasn't getting finished
> for v16 :-(.


I think I see where the problem is.  And I can see currently in
get_eclass_for_sort_expr we always use the top JoinDomain.  So although
the equality clause 'a.x = b.y' belongs to JoinDomain {B}, we set up ECs
for 'a.x' and 'b.y' that belong to the top JoinDomain {A, B, A/B}.

But doing so would lead to a situation where the "same" Vars from
different join domains might have the same varnullingrels and thus would
match by equal().  As an example, consider

select ... from a left join b on a.x = b.y where a.x = 1;

As said we would set up EC for 'b.y' as belonging to the top JoinDomain.
Then when reconsider_outer_join_clause generates the equality clause
'b.y = 1', we figure out that the new clause belongs to JoinDomain {B}.
Note that the two 'b.y' here belong to different join domains but they
have the same varnullingrels (empty varnullingrels actually).  As a
result, the equality 'b.y = 1' would be merged into the existing EC for
'b.y', because the two 'b.y' matches by equal() and we do not check
JoinDomain for non-const EC members.  So we would end up with an EC
containing EC members of different join domains.

And it seems this would make the following statement in README not hold
any more.

We don't have to worry about this for Vars (or expressions
containing Vars), because references to the "same" column from
different join domains will have different varnullingrels and thus
won't be equal() anyway.

Thanks
Richard


Re: Making Vars outer-join aware

2022-12-27 Thread Tom Lane
Richard Guo  writes:
> For 0012, I'm still trying to understand JoinDomain.  AFAIU all EC
> members of the same EC should have the same JoinDomain, because for
> constants we match EC members only within the same JoinDomain, and for
> Vars if they come from different join domains they will have different
> nullingrels and thus will not match.  So I wonder if we can have the
> JoinDomain kept in EquivalenceClass rather than in each
> EquivalenceMembers.

Yeah, I tried to do it like that at first, and failed.  There is
some sort of association between ECs and join domains, for sure,
but exactly what it is seems to need more elucidation.

The thing that I couldn't get around before is that if you have,
say, a mergejoinable equality clause in an outer join:

select ... from a left join b on a.x = b.y;

that equality clause can only be associated with the join domain
for B, because it certainly can't be enforced against A.  However,
you'd still wish to be able to do a mergejoin using indexes on
a.x and b.y, and this means that we have to understand the ordering
induced by a PathKey based on this EC as applicable to A, even
though that relation is not in the same join domain.  So there are
situations where sort orderings apply across domain boundaries even
though equalities don't.  We might have to split the notion of
EquivalenceClass into two sorts of objects, and somewhere right
about here is where I realized that this wasn't getting finished
for v16 :-(.

So the next pass at this is likely going to involve some more
refactoring, and maybe we'll end up saying that an EquivalenceClass
is tightly bound to a join domain or maybe we won't.  For the moment
it seemed to work better to associate domains with only the const
members of ECs.  (As written, the patch does fill em_jdomain even
for non-const members, but that was just for simplicity.  I'd
originally meant to make it NULL for non-const members, but that
turned out to be a bit too tedious because the responsibility for
marking a member as const or not is split among several places.)

Another part of the motivation for doing it like that is that
I've been thinking about having just a single common pool of
EquivalenceMember objects, and turning EquivalenceClasses into
bitmapsets of indexes into the shared EquivalenceMember list.
This would support having ECs that share some member(s) without
being exactly the same thing, which I think might be necessary
to get to the point of treating outer-join clauses as creating
EC equalities.

BTW, I can't escape the suspicion that I've reinvented an idea
that's already well known in the literature.  Has anyone seen
something like this "join domain" concept before, and if so
what was it called?

regards, tom lane




Re: Making Vars outer-join aware

2022-12-27 Thread Richard Guo
On Sat, Dec 24, 2022 at 2:20 AM Tom Lane  wrote:

> I shoved some preliminary refactoring into the 0001 patch,
> notably splitting deconstruct_jointree into two passes.
> 0002-0009 cover the same ground as they did before, though
> with some differences in detail.  0010-0012 are new work
> mostly aimed at removing kluges we no longer need.


I'm looking at 0010-0012 and I really like the changes and removals
there.  Thanks for the great work!

For 0010, the change seems quite independent.  I think maybe we can
apply it to HEAD directly.

For 0011, I found that some clauses that were outerjoin_delayed and thus
not equivalent before might be treated as being equivalent now.  For
example

explain (costs off)
select * from a left join b on a.i = b.i where coalesce(b.j, 0) = 0 and
coalesce(b.j, 0) = a.j;
QUERY PLAN
--
 Hash Right Join
   Hash Cond: (b.i = a.i)
   Filter: (COALESCE(b.j, 0) = 0)
   ->  Seq Scan on b
   ->  Hash
 ->  Seq Scan on a
   Filter: (j = 0)
(7 rows)

This is different behavior from HEAD.  But I think it's an improvement.

For 0012, I'm still trying to understand JoinDomain.  AFAIU all EC
members of the same EC should have the same JoinDomain, because for
constants we match EC members only within the same JoinDomain, and for
Vars if they come from different join domains they will have different
nullingrels and thus will not match.  So I wonder if we can have the
JoinDomain kept in EquivalenceClass rather than in each
EquivalenceMembers.

Thanks
Richard


Re: Making Vars outer-join aware

2022-12-23 Thread Tom Lane
Ted Yu  writes:
> For v8-0012-invent-join-domains.patch, in `distribute_qual_to_rels`, it
> seems that `pseudoconstant` and `root->hasPseudoConstantQuals` carry the
> same value.
> Can the variable `pseudoconstant` be omitted ?

Surely not.  'pseudoconstant' tells whether the current qual clause
is pseudoconstant.  root->hasPseudoConstantQuals remembers whether
we have found any pseudoconstant qual in the query.

regards, tom lane




Re: Making Vars outer-join aware

2022-12-23 Thread Ted Yu
On Fri, Dec 23, 2022 at 10:21 AM Tom Lane  wrote:

> Here's a new edition of this patch series.
>
> I shoved some preliminary refactoring into the 0001 patch,
> notably splitting deconstruct_jointree into two passes.
> 0002-0009 cover the same ground as they did before, though
> with some differences in detail.  0010-0012 are new work
> mostly aimed at removing kluges we no longer need.
>
> There are two big areas that I would still like to improve, but
> I think we've run out of time to address them in the v16 cycle:
>
> * It'd be nice to apply the regular EquivalenceClass deduction
> mechanisms to outer-join equalities, instead of the
> reconsider_outer_join_clauses kluge.  I've made several stabs at that
> without much success.  I think that the "join domain" framework added
> in 0012 is likely to provide a workable foundation, but some more
> effort is needed.
>
> * I really want to get rid of RestrictInfo.is_pushed_down and
> RINFO_IS_PUSHED_DOWN(), because those seem exceedingly awkward
> and squishy.  I've not gotten far with finding a better
> replacement there, either.
>
> Despite the work being unfinished, I feel that this has moved us a
> long way towards outer-join handling being less of a jury-rigged
> affair.
>
> regards, tom lane
>
> Hi,
For v8-0012-invent-join-domains.patch, in `distribute_qual_to_rels`, it
seems that `pseudoconstant` and `root->hasPseudoConstantQuals` carry the
same value.
Can the variable `pseudoconstant` be omitted ?

Cheers


Re: Making Vars outer-join aware

2022-11-17 Thread Richard Guo
On Thu, Nov 17, 2022 at 4:46 AM Tom Lane  wrote:

> So we've marked the 4 and 7 joins as possibly commuting, but they
> cannot commute according to 7's min_lefthand set.  I don't think
> the extra clone condition is terribly harmful --- it's useless
> but shouldn't cause any problems.  However, if these joins should be
> able to commute then the min_lefthand marking is preventing us
> from considering legal join orders (and has been doing so all along,
> that's not new in this patch).  It looks to me like they should be
> able to commute (giving your third form), so this is a pre-existing
> planning deficiency.


Yeah.  This is an issue that can also be seen on HEAD and is discussed
in [1].  It happens because when building SpecialJoinInfo for 7, we find
A/B join 5 is in our LHS, and our join condition (Pcd) uses 5's
syn_righthand while is not strict for 5's min_righthand.  So we decide
to preserve the ordering of 7 and 5, by adding 5's full syntactic relset
to 7's min_lefthand.  As discussed in [1], maybe we should consider 5's
min_righthand rather than syn_righthand when checking if Pcd uses 5's
RHS.


> > Looking at the two forms again, it seems the expected two versions for
> > Pcd should be
> > Version 1: C Vars with nullingrels as {B/C}
> > Version 2: C Vars with nullingrels as {B/C, A/B}
> > With this we may have another problem that the two versions are both
> > applicable at the C/D join according to clause_is_computable_at(), in
> > both forms.
>
> At least when I tried it just now, clause_is_computable_at correctly
> rejected the first version, because we've already computed A/B when
> we are trying to form the C/D join so we expect it to be listed in
> varnullingrels.


For the first version of Pcd, which is C Vars with nullingrels as {B/C}
here, although at the C/D join level A/B join has been computed and
meanwhile is not listed in varnullingrels, but since Pcd does not
mention any nullable Vars in A/B's min_righthand, it seems to me this
version cannot be rejected by clause_is_computable_at().  But maybe I'm
missing something.

[1]
https://www.postgresql.org/message-id/flat/CAMbWs4_8n5ANh_aX2PinRZ9V9mtBguhnRd4DOVt9msPgHmEMOQ%40mail.gmail.com

Thanks
Richard


Re: Making Vars outer-join aware

2022-11-16 Thread Tom Lane
Richard Guo  writes:
> I'm reviewing the part about multiple version clauses, and I find a case
> that may not work as expected.  I tried with some query as below
>  (A leftjoin (B leftjoin C on (Pbc)) on (Pab)) left join D on (Pcd)
> Assume Pbc is strict for B and Pcd is strict for C.
> According to identity 3, we know one of its equivalent form is
>  ((A leftjoin B on (Pab)) leftjoin C on (Pbc)) left join D on (Pcd)
> For outer join clause Pcd, we would generate two versions from the first
> form
> Version 1: C Vars with nullingrels as {A/B}
> Version 2: C Vars with nullingrels as {B/C, A/B}
> I understand version 2 is reasonable as the nullingrels from parser
> would be set as that.  But it seems version 1 is not applicable in
> either form.

Hmm.  Looking at the data structures generated for the first form,
we have

B/C join:

  {SPECIALJOININFO 
  :min_lefthand (b 2)
  :min_righthand (b 3)
  :syn_lefthand (b 2)
  :syn_righthand (b 3)
  :jointype 1 
  :ojrelid 4 
  :commute_above_l (b 7)
  :commute_above_r (b 5)
  :commute_below (b)

A/B join:

  {SPECIALJOININFO 
  :min_lefthand (b 1)
  :min_righthand (b 2)
  :syn_lefthand (b 1)
  :syn_righthand (b 2 3 4)
  :jointype 1 
  :ojrelid 5 
  :commute_above_l (b)
  :commute_above_r (b)
  :commute_below (b 4)

everything-to-D join:

  {SPECIALJOININFO 
  :min_lefthand (b 1 2 3 4 5)
  :min_righthand (b 6)
  :syn_lefthand (b 1 2 3 4 5)
  :syn_righthand (b 6)
  :jointype 1 
  :ojrelid 7 
  :commute_above_l (b)
  :commute_above_r (b)
  :commute_below (b 4)

So we've marked the 4 and 7 joins as possibly commuting, but they
cannot commute according to 7's min_lefthand set.  I don't think
the extra clone condition is terribly harmful --- it's useless
but shouldn't cause any problems.  However, if these joins should be
able to commute then the min_lefthand marking is preventing us
from considering legal join orders (and has been doing so all along,
that's not new in this patch).  It looks to me like they should be
able to commute (giving your third form), so this is a pre-existing
planning deficiency.

Without having looked too closely, I suspect this is coming from
the delay_upper_joins/check_outerjoin_delay stuff in initsplan.c.
That's a chunk of logic that I'd like to nuke altogether, and maybe
we will be able to do so once this patchset is a bit further along.
But I've not had time to look at it yet.

I'm not entirely clear on whether the strange selection of clone
clauses for this example is a bug in process_postponed_left_join_quals
or if that function is just getting misled by the bogus min_lefthand
value.

> Looking at the two forms again, it seems the expected two versions for
> Pcd should be
> Version 1: C Vars with nullingrels as {B/C}
> Version 2: C Vars with nullingrels as {B/C, A/B}
> With this we may have another problem that the two versions are both
> applicable at the C/D join according to clause_is_computable_at(), in
> both forms.

At least when I tried it just now, clause_is_computable_at correctly
rejected the first version, because we've already computed A/B when
we are trying to form the C/D join so we expect it to be listed in
varnullingrels.

regards, tom lane




Re: Making Vars outer-join aware

2022-11-15 Thread Richard Guo
On Sun, Nov 6, 2022 at 5:53 AM Tom Lane  wrote:

> I wrote:
> > I've been working away at this patch series, and here is an up-to-date
> > version.
>
> This needs a rebase after ff8fa0bf7 and b0b72c64a.  I also re-ordered
> the patches so that the commit messages' claims about when regression
> tests start to pass are true again.  No interesting changes, though.


I'm reviewing the part about multiple version clauses, and I find a case
that may not work as expected.  I tried with some query as below

 (A leftjoin (B leftjoin C on (Pbc)) on (Pab)) left join D on (Pcd)

Assume Pbc is strict for B and Pcd is strict for C.

According to identity 3, we know one of its equivalent form is

 ((A leftjoin B on (Pab)) leftjoin C on (Pbc)) left join D on (Pcd)

For outer join clause Pcd, we would generate two versions from the first
form

Version 1: C Vars with nullingrels as {A/B}
Version 2: C Vars with nullingrels as {B/C, A/B}

I understand version 2 is reasonable as the nullingrels from parser
would be set as that.  But it seems version 1 is not applicable in
either form.

Looking at the two forms again, it seems the expected two versions for
Pcd should be

Version 1: C Vars with nullingrels as {B/C}
Version 2: C Vars with nullingrels as {B/C, A/B}

With this we may have another problem that the two versions are both
applicable at the C/D join according to clause_is_computable_at(), in
both forms.

Another thing is I believe we have another equivalent form as

 (A left join B on (Pab)) left join (C left join D on (Pcd)) on (Pbc)

Currently this form cannot be generated because of the issue discussed
in [1].  But someday when we can do that, I think we should have a third
version for Pcd

Version 3: C Vars with empty nullingrels

[1]
https://www.postgresql.org/message-id/flat/CAMbWs4_8n5ANh_aX2PinRZ9V9mtBguhnRd4DOVt9msPgHmEMOQ%40mail.gmail.com

Thanks
Richard


Re: Making Vars outer-join aware

2022-11-10 Thread Richard Guo
On Thu, Aug 25, 2022 at 6:27 PM Richard Guo  wrote:

> I'm not sure if I understand it correctly. If we are given the first
> order from the parser, the SpecialJoinInfo for the B/C join would have
> min_lefthand as containing both B and the A/B join. And this
> SpecialJoinInfo would make the B/C join be invalid, which is not what we
> want.
>

Now I see how this works from v6 patch.  Once we notice identity 3
applies, we will remove the lower OJ's ojrelid from the min_lefthand or
min_righthand so that the commutation is allowed.  So in this case, the
A/B join would be removed from B/C join's min_lefthand when we build the
SpecialJoinInfo for B/C join, and that makes the B/C join to be legal.

BTW, inner_join_rels can contain base Relids and OJ Relids.  Maybe we
can revise the comments a bit for it atop deconstruct_recurse and
make_outerjoininfo.  The same for the comments of qualscope, ojscope and
outerjoin_nonnullable atop distribute_qual_to_rels.

The README mentions restriction_is_computable_at(), I think it should be
clause_is_computable_at().

Thanks
Richard


Re: Making Vars outer-join aware

2022-08-29 Thread Richard Guo
On Mon, Aug 29, 2022 at 2:30 PM Richard Guo  wrote:

>
> On Fri, Aug 19, 2022 at 2:45 AM Tom Lane  wrote:
>
>> Here's a rebase up to HEAD, mainly to get the cfbot back in sync
>> as to what's the live patch.
>
>
> Noticed another different behavior from previous. When we try to reduce
> JOIN_LEFT to JOIN_ANTI, we want to know if the join's own quals are
> strict for any Var that was forced null by higher qual levels. We do
> that by checking whether local_nonnullable_vars and forced_null_vars
> overlap. However, the same Var from local_nonnullable_vars and
> forced_null_vars may be labeled with different varnullingrels. If that
> is the case, currently we would fail to tell they actually overlap.
>

I wonder why this is not noticed by regression tests. So I did some
search and it seems we do not have any test cases covering the
transformation we apply to reduce outer joins. I think maybe we should
add such cases in regression tests.

Thanks
Richard


Re: Making Vars outer-join aware

2022-08-29 Thread Richard Guo
On Fri, Aug 19, 2022 at 2:45 AM Tom Lane  wrote:

> Here's a rebase up to HEAD, mainly to get the cfbot back in sync
> as to what's the live patch.


Noticed another different behavior from previous. When we try to reduce
JOIN_LEFT to JOIN_ANTI, we want to know if the join's own quals are
strict for any Var that was forced null by higher qual levels. We do
that by checking whether local_nonnullable_vars and forced_null_vars
overlap. However, the same Var from local_nonnullable_vars and
forced_null_vars may be labeled with different varnullingrels. If that
is the case, currently we would fail to tell they actually overlap. As
an example, consider 'b.i' in the query below

# explain (costs off) select * from a left join b on a.i = b.i where b.i is
null;
QUERY PLAN
---
 Hash Left Join
   Hash Cond: (a.i = b.i)
   Filter: (b.i IS NULL)
   ->  Seq Scan on a
   ->  Hash
 ->  Seq Scan on b
(6 rows)

Thanks
Richard


Re: Making Vars outer-join aware

2022-08-25 Thread Richard Guo
On Thu, Aug 25, 2022 at 5:18 AM Tom Lane  wrote:

> Richard Guo  writes:
> > Do we need to also
> > generate two SpecialJoinInfos for the B/C join in the first order, with
> > and without the A/B join in its min_lefthand?
>
> No, the SpecialJoinInfos would stay as they are now.  It's already the
> case that the first join's min_righthand would contain only B, and
> the second one's min_righthand would contain only C.


I'm not sure if I understand it correctly. If we are given the first
order from the parser, the SpecialJoinInfo for the B/C join would have
min_lefthand as containing both B and the A/B join. And this
SpecialJoinInfo would make the B/C join be invalid, which is not what we
want. Currently the patch resolves this by explicitly running
remove_unneeded_nulling_relids, and the A/B join would be removed from
B/C join's min_lefthand, if Pbc is strict for B.

Do we still need this kind of fixup if we are to keep just one form of
SpecialJoinInfo and two forms of RestrictInfos?

Thanks
Richard


Re: Making Vars outer-join aware

2022-08-24 Thread Tom Lane
Richard Guo  writes:
> On Sun, Aug 21, 2022 at 6:52 AM Tom Lane  wrote:
>> What I'm thinking we should do about this, once we detect that
>> this identity is applicable, is to generate *both* forms of Pbc,
>> either adding or removing the varnullingrels bits depending on
>> which form we got from the parser.

> Do you mean we generate two RestrictInfos for Pbc in the case of
> identity 3, one with varnullingrels and one without varnullingrels, and
> choose the appropriate one when forming join paths?

Right.

> Do we need to also
> generate two SpecialJoinInfos for the B/C join in the first order, with
> and without the A/B join in its min_lefthand?

No, the SpecialJoinInfos would stay as they are now.  It's already the
case that the first join's min_righthand would contain only B, and
the second one's min_righthand would contain only C.

regards, tom lane




Re: Making Vars outer-join aware

2022-08-24 Thread Richard Guo
On Sun, Aug 21, 2022 at 6:52 AM Tom Lane  wrote:

> What I'm thinking we should do about this, once we detect that
> this identity is applicable, is to generate *both* forms of Pbc,
> either adding or removing the varnullingrels bits depending on
> which form we got from the parser.  Then, when we come to forming
> join paths, use the appropriate variant depending on which join
> order we're considering.  build_join_rel() already has the concept
> that the join restriction list varies depending on which input
> relations we're trying to join, so this doesn't require any
> fundamental restructuring -- only a way to identify which
> RestrictInfos to use or ignore for a particular join.  That will
> probably require some new field in RestrictInfo, but I'm not
> fussed about that because there are other fields we'll be able
> to remove as this work progresses.


Do you mean we generate two RestrictInfos for Pbc in the case of
identity 3, one with varnullingrels and one without varnullingrels, and
choose the appropriate one when forming join paths? Do we need to also
generate two SpecialJoinInfos for the B/C join in the first order, with
and without the A/B join in its min_lefthand?

Thanks
Richard


Re: Making Vars outer-join aware

2022-08-20 Thread Tom Lane
Progress report on this ...

I've been trying to remove some of the cruftier aspects of
EquivalenceClasses (which really is the main point of this whole
effort), and gotten repeatedly blocked by the fact that the semantics
are still a bit crufty thanks to the hacking associated with outer
join identity 3.  I think I see a path forward though.

To recap, the thing about identity 3 is that it says you can get
equivalent results from

(A leftjoin B on (Pab)) leftjoin C on (Pb*c)

A leftjoin (B leftjoin C on (Pbc)) on (Pab)

if Pbc is strict for B.  Unlike what it says in optimizer/README,
I've written the first form as "Pb*c" to indicate that any B Vars
appearing in that clause will be marked as possibly nulled by
the A/B join.  This makes the problem apparent: we cannot use
the same representation of Pbc for both join orders, because
in the second variant B's Vars are not nulled by anything.
We've been trying to get away with writing Pbc just one way,
and that leads directly to the semantic squishiness I've been
fighting.

What I'm thinking we should do about this, once we detect that
this identity is applicable, is to generate *both* forms of Pbc,
either adding or removing the varnullingrels bits depending on
which form we got from the parser.  Then, when we come to forming
join paths, use the appropriate variant depending on which join
order we're considering.  build_join_rel() already has the concept
that the join restriction list varies depending on which input
relations we're trying to join, so this doesn't require any
fundamental restructuring -- only a way to identify which
RestrictInfos to use or ignore for a particular join.  That will
probably require some new field in RestrictInfo, but I'm not
fussed about that because there are other fields we'll be able
to remove as this work progresses.

Similarly, generate_join_implied_equalities() will need to generate
EquivalenceClass-derived join clauses with or without varnullingrels
marks, as appropriate.  I'm not quite sure how to do that, but it
feels like just a small matter of programming, not a fundamental
problem with the model which is where things are right now.
We'll only need this for ECs that include source clauses coming
from a movable outer join clause (i.e., Pbc in identity 3).

An interesting point is that I think we want to force movable
outer joins into the second format for the purpose of generating
ECs, that is we want to use Pbc not Pb*c as the EC source form.
The reason for that is to allow generation of relation-scan-level
clauses from an EC, particularly an EC that includes a constant.
As an example, given

A leftjoin (B leftjoin C on (B.b = C.c)) on (A.a = B.b)
where A.a = constant

we can decide unconditionally that A.a, B.b, C.c, and the constant all
belong to the same equivalence class, and thereby generate relation
scan restrictions A.a = constant, B.b = constant, and C.c = constant.
If we start with the other join order, which will include "B.b* = C.c"
(ie Pb*c) then we'd have two separate ECs: {A.a, B.b, constant} and
{B.b*, C.c}.  So we'll fail to produce any scan restriction for C, or
at least we can't do so in any principled way.

Furthermore, if the joins are done in the second order then we don't
need any additional join clauses -- both joins can act like "LEFT JOIN
ON TRUE".  (Right now, we'll emit redundant B.b = C.c and A.a = B.b
join clauses in addition to the scan-level clauses, which is
inefficient.)  However, if we make use of identity 3 to do the
joins in the other order, then we do need an extra join clause, like

(A leftjoin B on (true)) leftjoin C on (B.b* = C.c)

(or maybe we could just emit "B.b* IS NOT NULL" for Pb*c?)
Without any Pb*c join constraint we get wrong answers because
nulling of B fails to propagate to C.

So while there are lots of details to work out, it feels like
this line of thought can lead to something where setrefs.c
doesn't have to ignore varnullingrels mismatches (yay) and
there is no squishiness in EquivalenceClass semantics.

regards, tom lane




Re: Making Vars outer-join aware

2022-08-17 Thread Tom Lane
Richard Guo  writes:
> BTW, the comment just above the two calls to build_joinrel_tlist says:
>  * Create a new tlist containing just the vars that need to be output from
> Here by 'vars' it means both plain Vars and PlaceHolderVars, right? If
> not we may need to adjust this comment to also include PlaceHolderVars.

I think it did intend just Vars because that's all that
build_joinrel_tlist did; but we really should have updated it when we
invented PlaceHolderVars, and even more so now that build_joinrel_tlist
adds PHVs too.  I changed the wording.

> A minor comment is that seems we can get rid of phid inside
> PlaceHolderInfo, since we do not do linear list searches any more. It's
> some duplicate to the phid inside PlaceHolderVar. Currently there are
> two places referencing PlaceHolderInfo->phid, remove_rel_from_query and
> find_placeholder_info. We can use PlaceHolderVar->phid instead in both
> the two places.

Meh, I'm not excited about that.  I don't think that the phid field
is only there to make the search loops faster; it's the basic
identity of the PlaceHolderInfo.

regards, tom lane




Re: Making Vars outer-join aware

2022-08-17 Thread Richard Guo
On Wed, Aug 17, 2022 at 4:57 AM Tom Lane  wrote:

> On further thought, it seems better to maintain the index array
> from the start, allowing complete replacement of the linear list
> searches.  We can add a separate bool flag to denote frozen-ness.


+1 for 0001 patch. Now we process plain Vars and PlaceHolderVars in a
more consistent way when building joinrel's tlist. And this change would
make it easier to build up phnullingrels for PHVs as we climb up the
join tree.

BTW, the comment just above the two calls to build_joinrel_tlist says:

 * Create a new tlist containing just the vars that need to be output from

Here by 'vars' it means both plain Vars and PlaceHolderVars, right? If
not we may need to adjust this comment to also include PlaceHolderVars.


0002 patch looks good to me. Glad we can get rid of create_new_ph flag.
A minor comment is that seems we can get rid of phid inside
PlaceHolderInfo, since we do not do linear list searches any more. It's
some duplicate to the phid inside PlaceHolderVar. Currently there are
two places referencing PlaceHolderInfo->phid, remove_rel_from_query and
find_placeholder_info. We can use PlaceHolderVar->phid instead in both
the two places.

Thanks
Richard


Re: Making Vars outer-join aware

2022-08-16 Thread Tom Lane
I wrote:
> ... We can fix
> that by adding an index array to go straight from phid to the
> PlaceHolderInfo.  While thinking about where to construct such
> an index array, I decided it'd be a good idea to have an explicit
> step to "freeze" the set of PlaceHolderInfos, at the start of
> deconstruct_jointree.

On further thought, it seems better to maintain the index array
from the start, allowing complete replacement of the linear list
searches.  We can add a separate bool flag to denote frozen-ness.
This does have minor downsides:

* Some fiddly code is needed to enlarge the index array at need.
But it's not different from that for, say, simple_rel_array.

* If we ever have a need to mutate the placeholder_list as a whole,
we'd have to reconstruct the index array to point to the new
objects.  We don't do that at present, except in one place in
analyzejoins.c, which is easily fixed.  While the same argument
could be raised against the v1 patch, it's not very likely that
we'd be doing such mutation beyond the start of deconstruct_jointree.

Hence, see v2 attached.

regards, tom lane

diff --git a/src/backend/optimizer/util/placeholder.c b/src/backend/optimizer/util/placeholder.c
index 3b0f0584f0..f0e8cd9965 100644
--- a/src/backend/optimizer/util/placeholder.c
+++ b/src/backend/optimizer/util/placeholder.c
@@ -400,14 +400,16 @@ add_placeholders_to_base_rels(PlannerInfo *root)
 
 /*
  * add_placeholders_to_joinrel
- *		Add any required PlaceHolderVars to a join rel's targetlist;
- *		and if they contain lateral references, add those references to the
- *		joinrel's direct_lateral_relids.
+ *		Add any newly-computable PlaceHolderVars to a join rel's targetlist;
+ *		and if computable PHVs contain lateral references, add those
+ *		references to the joinrel's direct_lateral_relids.
  *
  * A join rel should emit a PlaceHolderVar if (a) the PHV can be computed
  * at or below this join level and (b) the PHV is needed above this level.
- * However, condition (a) is sufficient to add to direct_lateral_relids,
- * as explained below.
+ * Our caller build_join_rel() has already added any PHVs that were computed
+ * in either join input rel, so we need add only newly-computable ones to
+ * the targetlist.  However, direct_lateral_relids must be updated for every
+ * PHV computable at or below this join, as explained below.
  */
 void
 add_placeholders_to_joinrel(PlannerInfo *root, RelOptInfo *joinrel,
@@ -426,13 +428,10 @@ add_placeholders_to_joinrel(PlannerInfo *root, RelOptInfo *joinrel,
 			/* Is it still needed above this joinrel? */
 			if (bms_nonempty_difference(phinfo->ph_needed, relids))
 			{
-/* Yup, add it to the output */
-joinrel->reltarget->exprs = lappend(joinrel->reltarget->exprs,
-	phinfo->ph_var);
-joinrel->reltarget->width += phinfo->ph_width;
-
 /*
- * Charge the cost of evaluating the contained expression if
+ * Yes, but only add to tlist if it wasn't computed in either
+ * input; otherwise it should be there already.  Also, we
+ * charge the cost of evaluating the contained expression if
  * the PHV can be computed here but not in either input.  This
  * is a bit bogus because we make the decision based on the
  * first pair of possible input relations considered for the
@@ -445,12 +444,15 @@ add_placeholders_to_joinrel(PlannerInfo *root, RelOptInfo *joinrel,
 if (!bms_is_subset(phinfo->ph_eval_at, outer_rel->relids) &&
 	!bms_is_subset(phinfo->ph_eval_at, inner_rel->relids))
 {
+	PlaceHolderVar *phv = phinfo->ph_var;
 	QualCost	cost;
 
-	cost_qual_eval_node(, (Node *) phinfo->ph_var->phexpr,
-		root);
+	joinrel->reltarget->exprs = lappend(joinrel->reltarget->exprs,
+		phv);
+	cost_qual_eval_node(, (Node *) phv->phexpr, root);
 	joinrel->reltarget->cost.startup += cost.startup;
 	joinrel->reltarget->cost.per_tuple += cost.per_tuple;
+	joinrel->reltarget->width += phinfo->ph_width;
 }
 			}
 
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 520409f4ba..442c12acef 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -688,6 +688,8 @@ build_join_rel(PlannerInfo *root,
 	 */
 	build_joinrel_tlist(root, joinrel, outer_rel);
 	build_joinrel_tlist(root, joinrel, inner_rel);
+
+	/* Add any newly-computable PlaceHolderVars, too */
 	add_placeholders_to_joinrel(root, joinrel, outer_rel, inner_rel);
 
 	/*
@@ -966,6 +968,7 @@ min_join_parameterization(PlannerInfo *root,
  * The join's targetlist includes all Vars of its member relations that
  * will still be needed above the join.  This subroutine adds all such
  * Vars from the specified input rel's tlist to the join rel's tlist.
+ * Likewise for any PlaceHolderVars emitted by the input rel.
  *
  * We also compute the expected width of the join's output, making use
  * of data that was cached at the baserel level by 

Re: Making Vars outer-join aware

2022-08-16 Thread Tom Lane
I wrote:
> Richard Guo  writes:
>> If the two issues are indeed something we need to fix, maybe we can
>> change add_placeholders_to_joinrel() to search the PHVs from
>> outer_rel/inner_rel's targetlist, and add the ojrelid to phnullingrels
>> if needed, just like what we do in build_joinrel_tlist(). The PHVs there
>> should have the correct phnullingrels (at least the PHVs in base rels'
>> targetlists have correctly set phnullingrels to NULL).

> Yeah, maybe we should do something more invasive and make use of the
> input targetlists rather than doing this from scratch.  Not sure.
> I'm worried that doing it that way would increase the risk of getting
> different join tlist contents depending on which pair of input rels
> we happen to consider first.

After chewing on that for awhile, I've concluded that that is the way
to go.  0001 attached is a standalone patch to rearrange the way that
PHVs are added to joinrel targetlists.  It results in one cosmetic
change in the regression tests, where the targetlist order for an
intermediate join node changes.  I think that's fine; if anything,
the new order is more sensible than the old because it matches the
inputs' targetlist orders better.

I believe the reason I didn't do it like this to start with is that
I was concerned about the cost of searching the placeholder_list
repeatedly.  With a lot of PHVs, build_joinrel_tlist becomes O(N^2)
just from the repeated find_placeholder_info lookups.  We can fix
that by adding an index array to go straight from phid to the
PlaceHolderInfo.  While thinking about where to construct such
an index array, I decided it'd be a good idea to have an explicit
step to "freeze" the set of PlaceHolderInfos, at the start of
deconstruct_jointree.  This allows getting rid of the create_new_ph
flags for find_placeholder_info and add_vars_to_targetlist, which
I've always feared were bugs waiting to happen: they require callers
to have a very clear understanding of when they're invoked.  There
might be some speed gain over existing code too, but I've not really
tried to measure it.  I did drop a couple of hacks that were only
meant to short-circuit find_placeholder_info calls; that function
should now be cheap enough to not matter.

Barring objections, I'd like to push these soon and then rebase
the main outer-join-vars patch set over them.

regards, tom lane

diff --git a/src/backend/optimizer/util/placeholder.c b/src/backend/optimizer/util/placeholder.c
index 3b0f0584f0..f0e8cd9965 100644
--- a/src/backend/optimizer/util/placeholder.c
+++ b/src/backend/optimizer/util/placeholder.c
@@ -400,14 +400,16 @@ add_placeholders_to_base_rels(PlannerInfo *root)
 
 /*
  * add_placeholders_to_joinrel
- *		Add any required PlaceHolderVars to a join rel's targetlist;
- *		and if they contain lateral references, add those references to the
- *		joinrel's direct_lateral_relids.
+ *		Add any newly-computable PlaceHolderVars to a join rel's targetlist;
+ *		and if computable PHVs contain lateral references, add those
+ *		references to the joinrel's direct_lateral_relids.
  *
  * A join rel should emit a PlaceHolderVar if (a) the PHV can be computed
  * at or below this join level and (b) the PHV is needed above this level.
- * However, condition (a) is sufficient to add to direct_lateral_relids,
- * as explained below.
+ * Our caller build_join_rel() has already added any PHVs that were computed
+ * in either join input rel, so we need add only newly-computable ones to
+ * the targetlist.  However, direct_lateral_relids must be updated for every
+ * PHV computable at or below this join, as explained below.
  */
 void
 add_placeholders_to_joinrel(PlannerInfo *root, RelOptInfo *joinrel,
@@ -426,13 +428,10 @@ add_placeholders_to_joinrel(PlannerInfo *root, RelOptInfo *joinrel,
 			/* Is it still needed above this joinrel? */
 			if (bms_nonempty_difference(phinfo->ph_needed, relids))
 			{
-/* Yup, add it to the output */
-joinrel->reltarget->exprs = lappend(joinrel->reltarget->exprs,
-	phinfo->ph_var);
-joinrel->reltarget->width += phinfo->ph_width;
-
 /*
- * Charge the cost of evaluating the contained expression if
+ * Yes, but only add to tlist if it wasn't computed in either
+ * input; otherwise it should be there already.  Also, we
+ * charge the cost of evaluating the contained expression if
  * the PHV can be computed here but not in either input.  This
  * is a bit bogus because we make the decision based on the
  * first pair of possible input relations considered for the
@@ -445,12 +444,15 @@ add_placeholders_to_joinrel(PlannerInfo *root, RelOptInfo *joinrel,
 if (!bms_is_subset(phinfo->ph_eval_at, outer_rel->relids) &&
 	!bms_is_subset(phinfo->ph_eval_at, inner_rel->relids))
 {
+	PlaceHolderVar *phv = phinfo->ph_var;
 	QualCost	cost;
 
-	cost_qual_eval_node(, (Node *) phinfo->ph_var->phexpr,
-		root);
+	joinrel->reltarget->exprs = 

Re: Making Vars outer-join aware

2022-08-16 Thread Tom Lane
Richard Guo  writes:
> When we add required PlaceHolderVars to a join rel's targetlist, if the
> PHV can be computed in the nullable-side of the join, we would add the
> join's RT index to phnullingrels. This is correct as we know the PHV's
> value can be nulled at this join. But I'm wondering if it is necessary
> since we have already propagated any varnullingrels into the PHV when we
> apply pullup variable replacement in perform_pullup_replace_vars().

I'm not seeing the connection there?  Any nullingrels that are set
during perform_pullup_replace_vars would refer to outer joins within the
pulled-up subquery, whereas what you are talking about here is what
happens when the PHV's value propagates up through outer joins of the
parent query.  There's no overlap between those relid sets, or if there
is we have some fault in the logic that constrains join order to ensure
that there's a valid place to compute each PHV.

> On the other hand, the phnullingrels may contain RT indexes of outer
> joins above this join level. It seems not good to add such a PHV to the
> joinrel's targetlist.

Hmm, yeah, add_placeholders_to_joinrel is doing this wrong.  The
intent was to match what build_joinrel_tlist does with plain Vars,
but in that case we're adding the join's relid to what we find in
varnullingrels in the input tlist.  Using the phnullingrels from
the placeholder_list entry is wrong.  (I wonder whether a
placeholder_list entry's phnullingrels is meaningful at all?
Maybe it isn't.)  I think it might work to take the intersection
of the join's relids with root->outer_join_rels.

> If the two issues are indeed something we need to fix, maybe we can
> change add_placeholders_to_joinrel() to search the PHVs from
> outer_rel/inner_rel's targetlist, and add the ojrelid to phnullingrels
> if needed, just like what we do in build_joinrel_tlist(). The PHVs there
> should have the correct phnullingrels (at least the PHVs in base rels'
> targetlists have correctly set phnullingrels to NULL).

Yeah, maybe we should do something more invasive and make use of the
input targetlists rather than doing this from scratch.  Not sure.
I'm worried that doing it that way would increase the risk of getting
different join tlist contents depending on which pair of input rels
we happen to consider first.

regards, tom lane




Re: Making Vars outer-join aware

2022-08-15 Thread Richard Guo
On Tue, Aug 2, 2022 at 3:51 AM Tom Lane  wrote:

> Here's a rebase up to HEAD, mostly to placate the cfbot.
> I accounted for d8e34fa7a (s/all_baserels/all_query_rels/
> in those places) and made one tiny bug-fix change.
> Nothing substantive as yet.


When we add required PlaceHolderVars to a join rel's targetlist, if the
PHV can be computed in the nullable-side of the join, we would add the
join's RT index to phnullingrels. This is correct as we know the PHV's
value can be nulled at this join. But I'm wondering if it is necessary
since we have already propagated any varnullingrels into the PHV when we
apply pullup variable replacement in perform_pullup_replace_vars().

On the other hand, the phnullingrels may contain RT indexes of outer
joins above this join level. It seems not good to add such a PHV to the
joinrel's targetlist. Below is an example:

# explain (verbose, costs off) select a.i, ss.jj from a left join (b left
join (select c.i, coalesce(c.j, 1) as jj from c) ss on b.i = ss.i) on true;
   QUERY PLAN
-
 Nested Loop Left Join
   Output: a.i, (COALESCE(c.j, 1))
   ->  Seq Scan on public.a
 Output: a.i, a.j
   ->  Materialize
 Output: (COALESCE(c.j, 1))
 ->  Hash Left Join
   Output: (COALESCE(c.j, 1))
   Hash Cond: (b.i = c.i)
   ->  Seq Scan on public.b
 Output: b.i, b.j
   ->  Hash
 Output: c.i, (COALESCE(c.j, 1))
 ->  Seq Scan on public.c
   Output: c.i, COALESCE(c.j, 1)
(15 rows)

In this query, for the joinrel {B, C}, the PHV in its targetlist has a
phnullingrels that contains the join of {A} and {BC}, which is confusing
because we have not reached that join level.

I tried the changes below to illustrate the two issues. The assertion
holds true during regression tests and the error pops up for the query
above.

--- a/src/backend/optimizer/util/placeholder.c
+++ b/src/backend/optimizer/util/placeholder.c
@@ -464,18 +464,20 @@ add_placeholders_to_joinrel(PlannerInfo *root,
RelOptInfo *joinrel,
   {
   if (sjinfo->jointype == JOIN_FULL &&
sjinfo->ojrelid != 0)
   {
-  /* PHV's value can be nulled at this
join */
-  phv->phnullingrels =
bms_add_member(phv->phnullingrels,
-
   sjinfo->ojrelid);
+
 Assert(bms_is_member(sjinfo->ojrelid, phv->phnullingrels));
+
+  if
(!bms_is_subset(phv->phnullingrels, joinrel->relids))
+  elog(ERROR, "phnullingrels
is not subset of joinrel's relids");
   }
   }
   else if (bms_is_subset(phinfo->ph_eval_at,
inner_rel->relids))
   {
   if (sjinfo->jointype != JOIN_INNER &&
sjinfo->ojrelid != 0)
   {
-  /* PHV's value can be nulled at this
join */
-  phv->phnullingrels =
bms_add_member(phv->phnullingrels,
-
   sjinfo->ojrelid);
+
 Assert(bms_is_member(sjinfo->ojrelid, phv->phnullingrels));
+
+  if
(!bms_is_subset(phv->phnullingrels, joinrel->relids))
+  elog(ERROR, "phnullingrels
is not subset of joinrel's relids");
   }
   }


If the two issues are indeed something we need to fix, maybe we can
change add_placeholders_to_joinrel() to search the PHVs from
outer_rel/inner_rel's targetlist, and add the ojrelid to phnullingrels
if needed, just like what we do in build_joinrel_tlist(). The PHVs there
should have the correct phnullingrels (at least the PHVs in base rels'
targetlists have correctly set phnullingrels to NULL).

Thanks
Richard


Re: Making Vars outer-join aware

2022-08-01 Thread Tom Lane
Zhihong Yu  writes:
> For v3-0003-label-Var-nullability-in-parser.patch :

> +   if (rtindex > 0 && rtindex <= list_length(pstate->p_nullingrels))
> +   relids = (Bitmapset *) list_nth(pstate->p_nullingrels, rtindex - 1);
> +   else
> +   relids = NULL;
> +
> +   /*
> +* Merge with any already-declared nulling rels.  (Typically there won't
> +* be any, but let's get it right if there are.)
> +*/
> +   if (relids != NULL)

> It seems the last if block can be merged into the previous if block. That
> way `relids = NULL` can be omitted.

No, because the list entry we fetch could be NULL.

regards, tom lane




Re: Making Vars outer-join aware

2022-08-01 Thread Zhihong Yu
On Mon, Aug 1, 2022 at 12:51 PM Tom Lane  wrote:

> Here's a rebase up to HEAD, mostly to placate the cfbot.
> I accounted for d8e34fa7a (s/all_baserels/all_query_rels/
> in those places) and made one tiny bug-fix change.
> Nothing substantive as yet.
>
> regards, tom lane
>
> Hi,
For v3-0003-label-Var-nullability-in-parser.patch :

+   if (rtindex > 0 && rtindex <= list_length(pstate->p_nullingrels))
+   relids = (Bitmapset *) list_nth(pstate->p_nullingrels, rtindex - 1);
+   else
+   relids = NULL;
+
+   /*
+* Merge with any already-declared nulling rels.  (Typically there won't
+* be any, but let's get it right if there are.)
+*/
+   if (relids != NULL)

It seems the last if block can be merged into the previous if block. That
way `relids = NULL` can be omitted.

Cheers


Re: Making Vars outer-join aware

2022-07-13 Thread Tom Lane
Richard Guo  writes:
> But I'm not sure which is better, to evaluate the expression below or
> above the outer join. It seems to me that if the size of base rel is
> large and somehow the size of the joinrel is small, evaluation above the
> outer join would win. And in the opposite case evaluation below the
> outer join would be better.

Reasonable question.  But I think for the purposes of this patch,
it's better to keep the old behavior as much as we can.  People
have probably relied on it while tuning queries.  (I'm not saying
it has to be *exactly* bug-compatible, but simple cases like your
example probably should work the same.)

regards, tom lane




Re: Making Vars outer-join aware

2022-07-13 Thread Richard Guo
On Tue, Jul 12, 2022 at 9:37 PM Tom Lane  wrote:

> Richard Guo  writes:
> > Note that the evaluation of expression 'b.j + 1' now occurs below the
> > outer join. Is this something we need to be concerned about?
>
> It seems more formally correct to me, but perhaps somebody would
> complain about possibly-useless expression evals.  We could likely
> re-complicate the logic that inserts PHVs during pullup so that it
> looks for Vars it can apply the nullingrels to.  Maybe there's an
> opportunity to share code with flatten_join_alias_vars?


Yeah, maybe we can extend and leverage the codes in
adjust_standard_join_alias_expression() to do that?

But I'm not sure which is better, to evaluate the expression below or
above the outer join. It seems to me that if the size of base rel is
large and somehow the size of the joinrel is small, evaluation above the
outer join would win. And in the opposite case evaluation below the
outer join would be better.

Thanks
Richard


Re: Making Vars outer-join aware

2022-07-12 Thread Tom Lane
Richard Guo  writes:
> Note that the evaluation of expression 'b.j + 1' now occurs below the
> outer join. Is this something we need to be concerned about?

It seems more formally correct to me, but perhaps somebody would
complain about possibly-useless expression evals.  We could likely
re-complicate the logic that inserts PHVs during pullup so that it
looks for Vars it can apply the nullingrels to.  Maybe there's an
opportunity to share code with flatten_join_alias_vars?

regards, tom lane




Re: Making Vars outer-join aware

2022-07-12 Thread Richard Guo
On Mon, Jul 11, 2022 at 3:38 AM Tom Lane  wrote:

> Here's v2 of this patch series.  It's functionally identical to v1,
> but I've rebased it over the recent auto-node-support-generation
> changes, and also extracted a few separable bits in hopes of making
> the main planner patch smaller.  (It's still pretty durn large,
> unfortunately.)  Unlike the original submission, each step will
> compile on its own, though the intermediate states mostly don't
> pass all regression tests.


Noticed a different behavior from previous regarding PlaceHolderVar.
Take the query below as an example:

select a.i, ss.jj from a left join (select b.i, b.j + 1 as jj from b) ss
on a.i = ss.i;

Previously the expression 'b.j + 1' would not be wrapped in a
PlaceHolderVar, since it contains a Var of the subquery and meanwhile it
does not contain any non-strict constructs. And now in the patch, we
would insert a PlaceHolderVar for it, in order to have a place to store
varnullingrels. So the plan for the above query now becomes:

# explain (verbose, costs off) select a.i, ss.jj from a left join
(select b.i, b.j + 1 as jj from b) ss on a.i = ss.i;
QUERY PLAN
--
 Hash Right Join
   Output: a.i, ((b.j + 1))
   Hash Cond: (b.i = a.i)
   ->  Seq Scan on public.b
 Output: b.i, (b.j + 1)
   ->  Hash
 Output: a.i
 ->  Seq Scan on public.a
   Output: a.i
(9 rows)

Note that the evaluation of expression 'b.j + 1' now occurs below the
outer join. Is this something we need to be concerned about?

Thanks
Richard


Re: Making Vars outer-join aware

2022-07-10 Thread Tom Lane
Zhihong Yu  writes:
> In remove_unneeded_nulling_relids():

> +   if (removable_relids == NULL)

> Why is bms_is_empty() not used in the above check ?

We initialized that to NULL just a few lines above, and then did
nothing to it other than perhaps bms_add_member, so it's impossible
for it to be empty-and-yet-not-NULL.

> +typedef struct reduce_outer_joins_partial_state

> Since there are already reduce_outer_joins_pass1_state
> and reduce_outer_joins_pass2_state, a comment
> above reduce_outer_joins_partial_state would help other people follow its
> purpose.

We generally document these sorts of structs in the using code,
not at the struct declaration.

> +   if (j->rtindex)
> +   {
> +   if (j->jointype == JOIN_INNER)
> +   {
> +   if (include_inner_joins)
> +   result = bms_add_member(result, j->rtindex);
> +   }
> +   else
> +   {
> +   if (include_outer_joins)

> Since there are other join types beside JOIN_INNER, should there be an
> assertion in the else block?

Like what?  We don't particularly care what the join type is here,
as long as it's not INNER.  In any case there is plenty of nearby
code checking for unsupported join types.

regards, tom lane




Re: Making Vars outer-join aware

2022-07-10 Thread Zhihong Yu
On Sun, Jul 10, 2022 at 12:39 PM Tom Lane  wrote:

> Here's v2 of this patch series.  It's functionally identical to v1,
> but I've rebased it over the recent auto-node-support-generation
> changes, and also extracted a few separable bits in hopes of making
> the main planner patch smaller.  (It's still pretty durn large,
> unfortunately.)  Unlike the original submission, each step will
> compile on its own, though the intermediate states mostly don't
> pass all regression tests.
>
> regards, tom lane
>
> Hi,
For v2-0004-cope-with-nullability-in-planner.patch.
In remove_unneeded_nulling_relids():

+   if (removable_relids == NULL)

Why is bms_is_empty() not used in the above check ?
Earlier there is `if (bms_is_empty(old_nulling_relids))`

+typedef struct reduce_outer_joins_partial_state

Since there are already reduce_outer_joins_pass1_state
and reduce_outer_joins_pass2_state, a comment
above reduce_outer_joins_partial_state would help other people follow its
purpose.

+   if (j->rtindex)
+   {
+   if (j->jointype == JOIN_INNER)
+   {
+   if (include_inner_joins)
+   result = bms_add_member(result, j->rtindex);
+   }
+   else
+   {
+   if (include_outer_joins)

Since there are other join types beside JOIN_INNER, should there be an
assertion in the else block ? e.g. jointype wouldn't be JOIN_UNIQUE_INNER.

Cheers


Re: Making Vars outer-join aware

2022-07-05 Thread Tom Lane
Richard Guo  writes:
> For the query in the example

> SELECT * FROM t1 LEFT JOIN t2 ON (t1.x = t2.y) WHERE foo(t2.z)

> (foo() is not strict.) We want to avoid pushing foo(t2.z) down to the t2
> scan level. Previously we do that with check_outerjoin_delay() by
> scanning all the outer joins below and check if the qual references any
> nullable rels of the OJ, and if so include the OJ's rels into the qual.
> So as a result we'd get that foo(t2.z) is referencing t1 and t2, and
> we'd put the qual into the join lists of t1 and t2.

> Now there is the 'varnullingrels' marker in the t2.z, which is the LEFT
> JOIN below (with RTI 3). So we consider the qual is referencing RTE 2
> (which is t2) and RTE 3 (which is the OJ). Do we still need to include
> RTE 1, i.e. t1 into the qual's required relids? How should we do that?

It seems likely to me that we could leave the qual's required_relids
as just {2,3} and not have to bother ORing any additional bits into
that.  However, in the case of a Var-free JOIN/ON clause it'd still
be necessary to artificially add some relids to its initially empty
relids.  Since I've not yet tried to rewrite distribute_qual_to_rels
I'm not sure how the details will shake out.

regards, tom lane




Re: Making Vars outer-join aware

2022-07-05 Thread Richard Guo
On Sat, Jul 2, 2022 at 12:42 AM Tom Lane  wrote:

>
> Anyway, even though this is far from done, I'm pretty pleased with
> the results so far, so I thought I'd put it out for review by
> anyone who cares to take a look.  I'll add it to the September CF
> in hopes that it might be more or less finished by then, and so
> that the cfbot will check it out.
>

Thanks for the work! I have a question about qual clause placement.

For the query in the example

SELECT * FROM t1 LEFT JOIN t2 ON (t1.x = t2.y) WHERE foo(t2.z)

(foo() is not strict.) We want to avoid pushing foo(t2.z) down to the t2
scan level. Previously we do that with check_outerjoin_delay() by
scanning all the outer joins below and check if the qual references any
nullable rels of the OJ, and if so include the OJ's rels into the qual.
So as a result we'd get that foo(t2.z) is referencing t1 and t2, and
we'd put the qual into the join lists of t1 and t2.

Now there is the 'varnullingrels' marker in the t2.z, which is the LEFT
JOIN below (with RTI 3). So we consider the qual is referencing RTE 2
(which is t2) and RTE 3 (which is the OJ). Do we still need to include
RTE 1, i.e. t1 into the qual's required relids? How should we do that?

Thanks
Richard


Re: Making Vars outer-join aware

2022-07-01 Thread Tom Lane
"Finnerty, Jim"  writes:
> Given that views are represented in a parsed representation, does 
> anything need to happen to the Vars inside a view when that view is 
> outer-joined to?  

No.  The markings only refer to what is in the same Query tree as the Var
itself.

Subquery flattening during planning does deal with this: if we pull up a
subquery (possibly inserted from a view) that was underneath an outer
join, the nullingrel marks on the upper-level Vars referring to subquery
outputs will get merged into what is pulled up, either by unioning the
varnullingrel bitmaps if what is pulled up is just a Var, or if what is
pulled up isn't a Var, by wrapping it in a PlaceHolderVar that carries
the old outer Var's markings.  We had essentially this same behavior
with PlaceHolderVars before, but I think this way makes it a lot more
principled and intelligible (and I suspect there are now cases where we
manage to avoid inserting unnecessary PlaceHolderVars that the old code
couldn't avoid).

> If an outer join is converted to an inner join, must this information get 
> propagated to all the affected Vars, potentially across query block levels?

Yes.  The code is there in the patch to run around and remove nullingrel
bits from affected Vars.

One thing that doesn't happen (and didn't before, so this is not a
regression) is that if we strength-reduce a FULL JOIN USING to an outer
or plain join, it'd be nice if the "COALESCE" hack we represent the
merged USING column with could be replaced with the same lower-relation
Var that the parser would have used if the join weren't FULL to begin
with.  Without that, we're leaving optimization opportunities on the
table.  I'm hesitant to try to do that though as long as the COALESCE
structures look exactly like something a user could write.  It'd be
safer if we used some bespoke node structure for this purpose ...
but nobody's bothered to invent that.

regards, tom lane




Re: Making Vars outer-join aware

2022-07-01 Thread Finnerty, Jim
Tom, two quick questions before attempting to read the patch:

Given that views are represented in a parsed representation, does anything 
need to happen to the Vars inside a view when that view is outer-joined to?  

If an outer join is converted to an inner join, must this information get 
propagated to all the affected Vars, potentially across query block levels?