On Tue, Oct 02, 2018 at 10:53:40AM -0400, Tom Lane wrote: > Noah Misch <n...@leadboat.com> writes: > > On Mon, Oct 01, 2018 at 09:32:10PM -0400, Tom Lane wrote: > >> FWIW, my problem with this patch is that I remain unconvinced of the basic > >> correctness of the transform (specifically the unique-ification approach). > >> Noah's points would be important to address if we were moving the patch > >> towards commit, but I don't see much reason to put effort into it until > >> we can think of a way to prove whether that works. > > > Not even effort to fix the assertion failures I reported? > > If it seemed relevant to the proof-of-correctness problem, I would look > into it, but ...
I put some hours into theoretical study of the proof, and I didn't find any holes. When the planner removes "l0 LEFT JOIN r1", it does that because there's one output row per l0.ctid regardless of the rows of r1. Hence, deduplicating on (l0.ctid) is equivalent to deduplicating on (l0.ctid, r1.ctid). In the bad FULL JOIN case, (l0.ctid, r1.ctid) would be sufficient as a key, but we're out of luck because removing the join makes us have only l0.ctid for some tuples and only r1.ctid for other tuples. If PostgreSQL ever gets inner join removal, it would be harder to preserve this optimization in this form. At that point, perhaps we'd cost the path that retains the join for the benefit of $SUBJECT. Given the performance test results, $SUBJECT may already need a cost-based decision on whether to use it.