On Saturday, March 12, 2016, David G. Johnston <david.g.johns...@gmail.com>
wrote:

> On Saturday, March 12, 2016, Tom Lane <t...@sss.pgh.pa.us
> <javascript:_e(%7B%7D,'cvml','t...@sss.pgh.pa.us');>> wrote:
>
>> "David G. Johnston" <david.g.johns...@gmail.com> writes:
>> > Don't the semantics of a SEMI JOIN also state that the output columns
>> only
>> > come from the outer relation? i.e., the inner relation doesn't
>> contribute
>> > either rows or columns to the final result?  Or is that simply
>> > an implementation artifact of the fact that the only current way to
>> perform
>> > a semi-join explicitly is via exists/in?
>>
>> I think it's an artifact.  What nodes.h actually says about it is you get
>> the values of one randomly-selected matching inner row, which seems like
>> a fine definition for the purposes we plan to put it to.
>>
>>
> But is it a definition that actually materializes anywhere presently?
>
> I'm not sure what we consider an authoritative source but relational
> algebra does define the results of semi and anti joins as only containing
> rows from main relation.
>
> https://en.m.wikipedia.org/wiki/Relational_algebra
>

Pondering it more calling these optimizations "semi" joins further
distances us from the meaning of "semi" as used in relational algebra.  The
half that semi refers to IS that only one half of the tables are returned.
That you only get a single row of output regardless of multiple potential
matches is simply a consequence of this and general set theory.

In short "semi" communicates a semantic meaning as to the intended output
of the query irrespective of the data upon which it is executed.  We now
are hijacking the and calling something "semi" if by some chance the data
the query is operating against happens to be accommodating to some
particular optimization.

This seems wrong on definitional and cleanliness grounds.

So while I'm still liking the idea of introducing specializations of outer
and inner joins I think calling them "semi" joins adds a definitional
inconsistency we are better off avoiding.

This came about because calling something "outer semi join" struck me as
odd.

Something like "outer only join" and "inner only join" comes to mind.
Consider the parallel between this and "index only scan".  Learning that
"only" means "join the outer row to the (at most for outer) one and only
row in the inner relation" doesn't seem to much of a challenge.

David J.

Reply via email to