Re: [HACKERS] Re: Improve OR conditions on joined columns (common star schema problem)

Andy Fan Sat, 15 May 2021 19:03:41 -0700

On Mon, Mar 18, 2019 at 8:09 AM Noah Misch <[email protected]> wrote:

> On Tue, Oct 02, 2018 at 10:53:40AM -0400, Tom Lane wrote:
> > Noah Misch <[email protected]> writes:
> > > On Mon, Oct 01, 2018 at 09:32:10PM -0400, Tom Lane wrote:
> > >> FWIW, my problem with this patch is that I remain unconvinced of the
> basic
> > >> correctness of the transform (specifically the unique-ification
> approach).
> > >> Noah's points would be important to address if we were moving the
> patch
> > >> towards commit, but I don't see much reason to put effort into it
> until
> > >> we can think of a way to prove whether that works.
> >
> > > Not even effort to fix the assertion failures I reported?
> >
> > If it seemed relevant to the proof-of-correctness problem, I would look
> > into it, but ...
>
I put some hours into theoretical study of the proof, and I didn't find any
> holes.  When the planner removes "l0 LEFT JOIN r1", it does that because
> there's one output row per l0.ctid regardless of the rows of r1.  Hence,
> deduplicating on (l0.ctid) is equivalent to deduplicating on (l0.ctid,
> r1.ctid).  In the bad FULL JOIN case, (l0.ctid, r1.ctid) would be
> sufficient
> as a key, but we're out of luck because removing the join makes us have
> only
> l0.ctid for some tuples and only r1.ctid for other tuples.
>
> If PostgreSQL ever gets inner join removal, it would be harder to preserve
> this optimization in this form.  At that point, perhaps we'd cost the path
> that retains the join for the benefit of $SUBJECT.  Given the performance
> test
> results, $SUBJECT may already need a cost-based decision on whether to use
> it.
>
>
As for the proof-of-correctness problem, I think the below strategy should
be
easy to understand.

SELECT * FROM any_v WHERE (A OR B OR C).
=>
SELECT * FROM any_v WHERE A
UNION ALL
SELECT * FROM any_v WHERE B AND !A
UNION ALL
SELECT * FROM any_v WHERE C AND !A AND !(B AND !A);
where !(Expr) means  (NOT (expr) OR (EXPR) IS NULL)
In this way, there is no duplicated data at the first. Oracle uses a similar
way like this[1].

- A shared planner info idea.

About the planning time case, suppose we have a query below.

SELECT * FROM t1, t2, ... t20  WHERE (t1.a = 1 or t2.a = 1) and xxx;

With the current strategy, t3 .. t20 would be planned many times, including
base
relation like (t3.. t20) and joinrel like (t{3, 4}, t(3,4,5}, t(3,4,5,6})
and
so on.  I guess all these relations would get the same result at every
time. we
don't need to do that at every section of the Union ALL case.  So an idea of
PlannerInfo A can access the planning result of PlannerInfo B comes, this
idea
doesn't come in my mind for the first time,  many cost based
transformations may
need this.

Then I have the following POC in my mind.

PlannerInfo
{

  PlannerInfo *shared_planner_info;
  Relids  or_relids; // this field can be improved later.
};

With this way, if we find a 'relids' is not related to or_relids. We can
grab
the planning result from shared_planner_info (in this or-union case, we can
set
that after we plan the first section of UNION).

Then I found this one would not work after planning time partition prune
since
the relid would change. For example:

P (P1, P2) PARTITION BY A
Q (Q1, Q2) PARTITION BY A

SELECT * FROM t, p, q where (t.a = 1 or p.a = 1).  So in the un-transformed
case, Q2 would have relid = 7.  After the transform, Q2 probably has relid
= 6.

So basically, I have 4 questions now.
1. Would investing on the shared_planner_info idea be a good idea?
2. Without planning time prune, shall the above design work?
3. Currently I want to use relid to identify a resultset and pathlists which
   have the planning time prune issue, should we consider other methods?
4. Any other suggestion about to resolve the 'duplicated planning case'
besides
   the shared_planner_info method?

[1]
https://blogs.oracle.com/optimizer/optimizer-transformations:-or-expansion

-- 
Best Regards
Andy Fan (https://www.aliyun.com/)

Re: [HACKERS] Re: Improve OR conditions on joined columns (common star schema problem)

Reply via email to