Jim Nasby <jim.na...@openscg.com> writes: > On 3/16/17 11:54 AM, David Steele wrote: >> On 2/14/17 4:03 PM, Tom Lane wrote: >>> Well, the key point is whether it's really OK to de-dup on the basis >>> of only the CTIDs that are not eliminated in any UNION arm. I was >>> feeling fairly good about that until I thought of the full-join-to- >>> left-join-to-no-join conversion issue mentioned in the comment.
>> Jim, have you had time to think about this? Any insights? > Having thought about it, I share Tom's concerns. And now I'm worried > about what happens if there are multiple separate OR clauses. I guess > those would be handled by separate UNIONs? As proposed, the patch would try to optimize by splitting each OR clause independently, and would choose whichever way gave the best cost estimate. I'm not sure it's possible to do better than that, and even if it is, I think improving it could be left for later. > Tom, could you expand the description some, especially with some examples? Here's a counterexample --- admittedly rather artificial, but still a counterexample --- to applying the optimization when there are FULL joins. Consider SELECT count(*) FROM a FULL JOIN b ON (a.id = b.id) WHERE (a.x = 42 OR b.y = 43); and suppose that a and b have mutual FK constraints guaranteeing that every non-null a.id value has exactly one match in b and vice versa. (For the purposes of this example, I think it doesn't matter whether or not we allow there to be null id values.) Also suppose that there are some join rows satisfying both a.x = 42 and b.y = 43, while other join rows satisfy only one of the OR arms. The patch would try to implement this query as, approximately, SELECT count(*) FROM ( SELECT FROM a FULL JOIN b ON (a.id = b.id) WHERE a.x = 42 UNION-using-ctids SELECT FROM a FULL JOIN b ON (a.id = b.id) WHERE b.y = 43 ) Now in the first UNION arm, we'd observe that the strict a.x = 42 condition allows reduction of the join strength to LEFT JOIN, and then we'd deduce from the FK on b that the join to b can be dropped altogether. In the second arm, we'd similarly conclude that a can be dropped altogether, leaving SELECT count(*) FROM ( SELECT FROM a WHERE a.x = 42 UNION-using-ctids SELECT FROM b WHERE b.y = 43 ) But at this point there are no rels that are not eliminated in any UNION arm, so that no de-duping would occur in the UNION, meaning that we'd double-count any join rows in which both a.x = 42 and b.y = 43. I'd also considered an approach of de-duping on the basis of all relation ctids, while allowing a rel's ctid to be returned as NULL from a UNION arm in which the rel was eliminated entirely. But that doesn't fix it, because in this example the first arm would return (a.ctid, NULL) while the second arm would return (NULL, b.ctid), so that the UNION step would still fail to detect any duplication. To make it work, we'd have to not eliminate the joins, which would pretty much defeat the usefulness of the optimization for your original example case. So full joins definitely break this whole optimization. I think it's okay with left joins though, because when starting from "a left join b" it will never be possible to remove "a" so we'll always include a.ctid in the UNION de-duping step. If b is removed in some arm, then it must be true that we get exactly one left-join output row per a row, regardless of the contents of b, in that arm. The argument for the patch being okay is essentially that we must get exactly one left-join output row per a row, regardless of the contents of b, in *every* arm, because the various modified versions of the OR clause can't affect that conclusion. In some of the arms we might not remove b, and we might even be able to reduce the left join to an inner join, but there should still be no more than one join output row per a row. That being the case, it should be sufficient to de-dup using a.ctid while ignoring b.ctid. Any clearer yet? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers