Re: [HACKERS] Re: Improve OR conditions on joined columns (common star schema problem)

Noah Misch Sun, 17 Mar 2019 17:10:25 -0700

On Tue, Oct 02, 2018 at 10:53:40AM -0400, Tom Lane wrote:
> Noah Misch <[email protected]> writes:
> > On Mon, Oct 01, 2018 at 09:32:10PM -0400, Tom Lane wrote:
> >> FWIW, my problem with this patch is that I remain unconvinced of the basic
> >> correctness of the transform (specifically the unique-ification approach).
> >> Noah's points would be important to address if we were moving the patch
> >> towards commit, but I don't see much reason to put effort into it until
> >> we can think of a way to prove whether that works.
> 
> > Not even effort to fix the assertion failures I reported?
> 
> If it seemed relevant to the proof-of-correctness problem, I would look
> into it, but ...


I put some hours into theoretical study of the proof, and I didn't find any
holes.  When the planner removes "l0 LEFT JOIN r1", it does that because
there's one output row per l0.ctid regardless of the rows of r1.  Hence,
deduplicating on (l0.ctid) is equivalent to deduplicating on (l0.ctid,
r1.ctid).  In the bad FULL JOIN case, (l0.ctid, r1.ctid) would be sufficient
as a key, but we're out of luck because removing the join makes us have only
l0.ctid for some tuples and only r1.ctid for other tuples.

If PostgreSQL ever gets inner join removal, it would be harder to preserve
this optimization in this form.  At that point, perhaps we'd cost the path
that retains the join for the benefit of $SUBJECT.  Given the performance test
results, $SUBJECT may already need a cost-based decision on whether to use it.

Re: [HACKERS] Re: Improve OR conditions on joined columns (common star schema problem)

Reply via email to