Hi Patrick -
Just that you see where I'm lost at the moment - here is my scrap text of
finding solid reasons when to inner join. It's a classical example of
tightening the bounds to get fixed-point solutions in a lattice ... probably
has long be solved by someone. Now I have (for the 6th time) the feeling that I
know what's going on and how to define the various tree attributes correctly (I
think I need 6 of them instead of three as below) - but I won't continue with
it until next weekend.
So long - don't read it to understand (unless you wnat to contribute to a sound
solution ...), just be amused :-).
Harald
.............................................
orop2
1. (||-4) semantics ...
2. Optimization of outer joins to inner joins - local consistency
It is interesting - and for some databases an advantage - to replace outer
joins with inner joins if the outer joins are not necessary. "Not necessary"
here means that the result must not be different whether using outer joins or
inner joins. The question, obviously, is: For which member expressions can we
replace an outer join with an inner join?
It took me a few hours to find this out. Here is the result.
A record (object) will be in the result if the evaluation of the condition in
3-value SQL logic will return true; it will not be in the result if the result
is either logical-null or false. The difference between outer joining and inner
joining is that with the latter, objects are missing from the set on which the
condition is checked. Thus, inner joins "emulates" a result that is
logical-null or false; and therefore, we can replace an outer join with an
inner join only if the resulting condition cannot become true.
If we look at the evaluation tree of a condition, we can assign to each node
the set of member expressions that, when emptily outer joined (i.e., creating
oj-nulls for all their simple properties), might make that node true,
logical-null, or false; lets call these sets T, N, and F, respectively. When we
know these sets at the top condition, we can then safely use inner joins for
all member expressions that are *only* in N and F.
For example, the condition
x.A.B == 4
is logical-null when emptily outer-joining x.A, but never true or false. Thus,
we have T={}, N={x.A}, F={}. Inner joining x.A (which drops records that are
emptily joined) therefore yields the same 2-valued-logic result.
On the other hand, the condition
x.A.B is null and x.E is null
can be made true or false by emptily outer-joining x.A, but never logical-null
(resaon: The right side can only be true or false; and the left side also; so
all of the condition is either true or false - depending on x.E's value). So,
T={x.A}, N={}, F={x.A}, and therefore we cannot inner join x.A. By direct
reasonsing, we see that inner joining x.A will drop all the non-joinable
records, hence inner and outer joining are different for records where x.E is
null, and will yield a different 2-valued-logic result.
Finally, the condition
x.A.B is null or x.E is null
is always true when emptily outer-joining x.A, or T={x.A}, N={}, F={}. Hence,
it is always different from inner joining, where the empty join drops such
records from the result.
How can we compute T, N, and F? First, we note that we can compute these sets
with different "preciseness". For example, one correct way would be to just
collect all member expressions into a set M, and then set T=N=F=M. Obviously,
this prevents any inner joins. On the other hand, we can quite precisely check
condition like the following:
x.A.B == null T=?, N=?, F=?
!(x.A.B != null) T=?, N=?, F=?
(x.A.B ?? 4) == 4 T=?, N=?, F=?
!((x.A.B ?? 4) != 4) T=?, N=?, F=?
(x.A.B ?? x.E) == 4 T=?, N=?, F=?
(x.A.B ?? x.C.D) == 4 T=?, N=?, F=?
In the following, I'll first give the rules for the 4 logical operators &&, ||,
!, and ?!; then I'll give them for very common simple conditions; finally, I'll
look at some complex simple conditions.
(a) && operator:
T(A && B) = T(A) union T(B)
Reason: Emptily outer joining a member expression can make A true. B
might be true for completely different reasons (e.g., it might be a tautology).
Then empty outer joining that member expression from T(A) will make all of A &&
B equal true.
This looks very unpromising: Because T is computed by set union, it is not
possible to "kick out" a member expression from T. So, if a member expression
is in T from some sub-condition, we will never be able to optimize its join to
an inner join! Essentially, this means that already on the simple condition
level, we must be careful when we put a member expression into T. But let's
continue with the logical operators first.
N(A && B) = N(A) union N(B) union T(A) union T(B)
Reason: First, a member expression from N(A) can make A && B equal to
logical-null, e.g. if B is always logical-null or true. Conversely, the same is
true for N(B). However, also a member expression (say t) from T(A) may make the
condition logical-null, e.g. if B is null == null: This condition will always
return logical-null, and when now t makes A true, A && B is true &&
logical-null, which is equal to logical-null. So we have to add all member
expressions of T(A) to N(A && B). The same is obviously also true for T(B).
F(A && B) = F(A) union F(B) union N(A) union N(B) union T(A) union T(B)
Reason: Similiarly to above. Adding N(A) and N(B) is necessary e.g. if
B or A is null == null. Adding T(A) and T(B) is necessary e.g. if B or A is a
contradiction like 0 == 1.
These "blow-ups" of N and F are not as bad as the one for T - after all, inner
joins are not suppressed by having a member expression in N or F. However, this
might be painful when we compute F and N for the ! operator, because there we
might blow up T a lot. We'll see ...
(b) || operator:
T(A || B) = T(A) union T(B) union N(A) union N(B) union F(A) union F(B)
Reason: An emptily outer joined member expressions that makes A true
can also make A || B true - e.g. when B is a contradiction. But also if it
makes A logical-null, all of A || B can become true - e.g. when B is a
tautology. Finally, even if it makes A false, all of A || B can become true
when B is a tautology.
The blow-up for || is even larger - essentially, *every member expression* will
be in T and hence require an outer join. But this is to be expected - it is
actually the foundation of the NH-2583 problem!
SHIT. Does not work for
N(A || B) = N(A) union N(B) union F(A) union F(B)
Reason: If a member expression is in F(A), then emptily outer joining
it can make A || B logical-null if B is e.g. null == null, as logical-null ||
false is logical-null. However, a member expression that might make A true when
emptily outer joined will never make A || B logical-null - there is no
possibility that true || ... becomes logical-null. You might ask: But that
member expression only *might* make A true; what if it actually makes A
logical-null - then, the whole expression will become logical-null after all!
The answer is: In that case, the member-expression must (also) be in N(A), as
it obviously might make A logical-null. So we cannot lose a possibility.
F(A || B) = F(A) union F(B)
Reason: The reasoning is like for T(A && B).
(c) ! operator:
T(!A) = F(A)
Reason: Under 3-valued logic, !A can only become true if A becomes
false. So only member expressions that, when emptily outer joined, *might* make
A false, can possibly make !A true.
N(!A) = N(A)
Reason: Here the reasoning is as simple as for T.
F(!A) = T(A)
Reason: Also this is easy.
(d) Logical ?: operator: A ? B : C is equivalent to A && B || !A && C. From
this, one can compute the T, N, and F sets. Here are the - boring - results:
T(A ? B : C) = T(A && B || !A && C)
= T(A && B) union T(!A && C)
union
N(A && B) union N(!A && C)
union
F(A && B) union F(!A && C)
= T(A) union T(B) union T(!A) union T(C)
union
N(A) union N(B) union T(A) union T(B) union N(!A) union
N(C) union T(!A) union T(C)
union
F(A) union F(B) union N(A) union N(B) union T(A) union
T(B) union F(!A) union F(C) union N(!A) union N(C) union T(!A) union T(C)
= T(A) union T(B) union F(A) union T(C)
union
N(A) union N(B) union T(A) union T(B) union N(A) union
N(C) union F(A) union T(C)
union
F(A) union F(B) union N(A) union N(B) union T(A) union
T(B) union T(A) union F(C) union N(A) union N(C) union F(A) union T(C)
= F(A) union F(B) union F(C) union N(A) union N(B) union
N(C) union T(A) union T(B) union T(C)
N(A ? B : C) = N(A && B || !A && C)
= N(A && B) union N(!A && C) union F(A && B) union F(!A &&
C)
= N(A) union N(B) union T(A) union T(B)
union
N(!A) union N(C) union T(!A) union T(C)
union
F(A) union F(B) union N(A) union N(B) union T(A) union
T(B)
union
F(!A) union F(C) union N(!A) union N(C) union T(!A) union
T(C)
= N(A) union N(B) union T(A) union T(B)
union
N(A) union N(C) union F(A) union T(C)
union
F(A) union F(B) union N(A) union N(B) union T(A) union
T(B)
union
T(A) union F(C) union N(A) union N(C) union F(A) union
T(C)
= F(A) union F(B) union F(C) union N(A) union N(B) union
N(C) union T(A) union T(B) union T(C)
F(A ? B : C) = F(A && B || !A && C)
= F(A && B) union F (!A && C)
= F(A) union F(B) union N(A) union N(B) union T(A) union
T(B)
union
F(!A) union F(C) union N(!A) union N(C) union T(!A) union
T(C)
= F(A) union F(B) union N(A) union N(B) union T(A) union
T(B)
union
T(A) union F(C) union N(A) union N(C) union F(A) union
T(C)
= F(A) union F(B) union F(C) union N(A) union N(B) union
N(C) union T(A) union T(B) union T(C)
Now let us look at simple conditions. The following are *possible* assignments
to T, N, and F. As said above, one can also assign the values more crudely -
e.g., always assign all member expressions in the condition to all three sets.
However, the assignments below are "precise" - i.e., they assign the minimal
sets - for all conditions except (i) (if I did not make a mistake).
(e) mE*.P <op> <constant(s)> (where <constant(s)> are not value-nulls)
mE* here is a symbol for a "nested sequence of member expressions" mE1, mE2,
..., .P is the final property. E.g. in
a.B.C.D.E == 4
the member expressions are mE1=a.B, mE2=a.B.C, and mE3=a.B.C.D; the final
property is E. In this case, we get
T(mE*.P <op> <constant>) == {}
N(mE*.P <op> <constant>) == {mE1, mE2, ...}
F(mE*.P <op> <constant>) == {}
Reason: Emptily outer joining any member expression will yield oj-null
for P; but oj-null <op> <constant> is always logical-null.
(f) mE*.P == me'*.Q
The result depends on the definition of ==: We can either use the Linq
definition where null == null is true (which requires an SQL translation of ...
= ... OR ... IS NULL AND ... IS NULL); or the SQL definition where null == null
is logical-null. In the first case, we reason as follows:
- When one side (e.g. the left one) does empty outer-joining and yields
oj-null for P, but the right side has non-oj-null values, the SQL
oj-null = value OR oj-null IS NULL AND value IS NULL
evaluates as logical-null OR true AND false, which is logical-null OR
true, which is true.
- When both sides do empty outer-joining, we get
oj-null = oj-null OR oj-null IS NULL AND oj-null IS NULL
which is logical-null OR true AND true, which is true.
So, empty outer joining will always yield true, and hence
T(mE*.P == me'*.Q) = {mE1, mE2, ...} union {mE1', mE2', ...}
N(mE*.P == me'*.Q) = {}
F(mE*.P == me'*.Q) = {}
For the SQL definition of equality, we get the following:
- When one side (e.g. the left one) does empty outer-joining and yields
oj-null for P, but the right side has non-oj-null values, the SQL
oj-null = value
evaluates as logical-null.
- When both sides do empty outer-joining, we get
oj-null = oj-null
which is logical-null.
So, empty outer joining will always yield logical-null, and hence
T(mE*.P == me'*.Q) = {}
N(mE*.P == me'*.Q) = {mE1, mE2, ...} union {mE1', mE2', ...}
F(mE*.P == me'*.Q) = {}
(g) mE*.P == null
Empty outer joining can only yield true, hence we get:
T(mE*.P == null) = {mE1, mE2, ...}
N(mE*.P == null) = {}
F(mE*.P == null) = {}
(h) mE*.P != null
Empty outer joining can only yield false, hence we get:
T(mE*.P == null) = {}
N(mE*.P == null) = {}
F(mE*.P == null) = {mE1, mE2, ...}
(i) Other complex simple condition, e.g. (x.A.B ?? x.C.D) <op> <constant(s)>
Here, I opt for the simple assignment
T(...) = N(...) = F(...) = all joinable member expressions in the
condition
It is probably possible to do better by analyzing coalesce operators (??).
Here, we should do a few examples.
(ex.1) Show that the typical "in-to-|| translation" will yield inner joins:
T(a.B.C == 1 || a.B.C == 2)
= T(a.B.C == 1) union T(a.B.C == 2) union N(a.B.C == 1) union N(a.B.C == 2)
union F(a.B.C == 1) union F(a.B.C == 2)
= {} union {} union {a.B} union {a.B} union {} union {}
= {a.B} ????????????????????????????????????????????????????????????? SHIT
- DOES NOT WORK - WE NEED SHARPER UPPER BOUNDS!!!!!
N(a.B.C == 1 || a.B.C == 2)
= N(a.B.C == 1) union N(a.B.C == 2) union F(a.B.C == 1) union F(a.B.C == 2)
= {a.B} union {a.B} union {} union {}
= {a.B}
F(a.B.C == 1 || a.B.C == 2)
= F(a.B.C == 1) union F(a.B.C == 2)
= {} union {}
= {}
........................
3. Optimization of outer joins to inner joins - or-false consistency
Not the same as time-proven logics. One famous example is the "law of the
excluded middle" and SQLs SELECT. The former law says that for a proposition P,
in classical two-valued logics (like propositional or predicate calculus),
either P is true or the negation of P is true. There is no "middle ground".
SQL uses a 3-valued logic in WHERE clause condition evaluation, with the 3
values true, false, and logical-NULL (not to be confused with the *value*
"value-NULL", which is something completely different). However, when executing
a SELECT, SQL is forced to map the 3 values to 2 (true or false): After all,
for a record (tuple) that is the possible result of a query, there are only two
possibilities: The record *is* in the result, or it is *not* - the law of the
excluded middle! In SQL, the mapping is that both false (of 3-valued logic) and
logical-NULL map to false (of 2-valued logic), whereas true (of 3-valued logic)
maps to true (of 2-valued logic). However, this mapping now produces the
well-known possibility that neither a proposition nor its negation is true for
a record. E.g., neither x = 4 nor NOT(x = 4) is true for a record where x is
value-NULL. Thus, the 2-value-logic part of SQL contradicts the law of the
excluded middle. To alleviate that somewhat, SQL goes a long way to ensure that
logical-NULL can only be the result of conditions where at least one
participating value is value-NULL. Thus, when no value-NULLs are present, SQL's
2-value-logic part follows classical logics.
Why is this interesting? When one decides on the semantics of a logic, it is
necessary to keep such general properties in view: Do we want the law of the
excluded middle, de'Morgan's laws etc. to hold or not?
Also for NHib.Linq, it is necessary to decide on such meta-rules. For example,
here is one rule that is true in classical logics as well as in SQL logics
(3-value and also 3-value mapping to 2-value; I will therefore use SQL notation
in the following except in genuine Linq examples):
If Q is a condition that is always false (in a domain of tuples), then
P holds for some value tuple exactly if P OR Q holds.
In other words, (P OR false) is always the same as P.
We could say that a logic which does *not* have this property exhibits the
"or-false-anomaly".
Here are some examples:
When a > 0 is true for some tuples, then also a > 0 OR 0 = 1 is true; and
vice versa.
On the domain of real numbers: When a > 5 is true, then also a > 5 OR b * b
< ABS(b) - 1 is true; and vice versa.
I propose that NHib.Linq should avoid the or-false-anomaly.
Careless introduction of inner joins creates the or-false-anomaly; here is why.
One could believe that outer joins are only necessary for paths that are used
below OR or NOT operators. Thus, in
.Where(x => x.A.B == 1 || x.C.D == 1)
we need outer joins for the table behind x.A's class - otherwise, a
non-existing x.A could filter out objects that *do* have an x.C with x.C.D ==
1. Now, what about
.Where(x => x.A.B == null || x.C.D * x.C.D < Math.Abs(x.C.D) - 1)
As humans, we can easily prove that the right side is always false for real
numbers. But we will not, I assume, get NHibernate to be able to do this.
Therefore, NHibernate must use an outer join for the member x.A in the
left-side condition (on the theory that the right condition only could be true
for some objects).
Now, "pure outer join logic" (without existence guards) will return all objects
where
(a) either there is an x.A, and its B is value-null;
(b) or where there is no x.B at all (reason: This will lead to an outer-joined
x.A.B column that contains null; I will call such a null value introduced by an
outer join oj-null in the following).
As the right side is always false, and we want to avoid the or-false-anomaly,
the query
.Where(x => x.A.B == null)
should return the same result. But this means that also in this simple
condition, we must join to the table of x.A's class with an *outer join*!
Therefore, we *cannot* decide on the usage of outer joins solely on the logical
operators in the where condition, e.g., the presence or absence of ! or ||.
The whole affair somehow requires that we "have to look into the future": If a
member expression m might be outer joined in the condition translation of some
condition ...a... || ...b... and this outer joining alone might yield records
in the result, we *must* outer join m also if we only translate ...a... alone
(or ...b...alone). The consequence of this is:
For all member expressions in a condition that *might* make *any possible
containing top-level condition* true under outer joining, we must always outer
join these member expressions!
Which member expressions may make a condition true when outer joined? Let us
look at a few examples (in C# notation):
x.A.B == null can become true under outer joining if x.A.B is
oj-null
!(x.A.B != null) can similiarly become true under outer joining
(x.A.B ?? 4) == 4 can become true under outer joining if x.A.B is
oj-null
!((x.A.B ?? 4) != 4) can similiarly become true under outer joining
(x.A.B ?? x.E) == 4 can become true under outer joining if there is no
x.A that can be outer joined, but x.E == 4.
(x.A.B ?? x.C.D) == 4 can become true under outer joining if there is no
x.A to be joined, but there is an x.C with value D == 4.
So we see that there are quite a few conditions that might make a condition
true under outer joining to non-existent records. In all these cases, x.A's
class must be outer joined to avoid the or-false-anomaly. And alas, I am not at
all sure that there is an algorithm that allows us to find such "dangerous
member expressions" - at least, this will be a complex machine!
So we try the opposite approach: We *always* do outer joins for member
expression in a condition P, unless we can *prove* that the member expression
being equal to oj-null P can never make a containing top-level condition true.
In that case only, we can use inner joins.
.....................................................
However, we can prove the following:
If a condition x with member expression m is NOT below a !
and does NOT occur "one-sided" below a || operator
and the condition x can only return logical-null for m == value-null (when
other values are set suitably),
then for each surrounding condition c(...x...) of x, we have
c(...x...) || F is true exactly if c(...x...) is true
Aha. (a) Why does this help? (b) How to prove this? (c) How can we use this in
an optimization algorithm?
Ad (a): It helps in the following quite frequent cases:
* A condition needs no joins at all, e.g.
a.E < 5 || !(a.E > 7 && a.E != 9)
* A condition with joins uses no logical operators at all, e.g.
a.B.C > 4
* A condition with joins uses only && operators, e.g.
a.B.C >= 5 && a.B.C <= 10 && a.B.F == 3
* A condition with joins uses only && operators and "trivial ||s", e.g.
a.B.E >= 5 && a.B.E <= 10 && (a.B.F == 3 || a.B.F == 5)
a.B.E <= 10 && (a.B.F == 3 || a.B.F == null)
left side null;
null and true --> null
null and null --> null
null and false (aber NICHT für null!!) --> false
daher inner joins ok!
!(a.B.E <= 10 && (a.B.F == 3 || a.B.F == null))
However, the following ....
a.B.F == 3 || a.B.F == null
--> can return true
if there is no a.B,
!(a && b) || F
where b is false
null || true --> true
In order to avoid the necessity of
In order to steer clear of problems, we therefore do not check w
Actually, *any* condition that may return true or false when fed a single null
--
Empfehlen Sie GMX DSL Ihren Freunden und Bekannten und wir
belohnen Sie mit bis zu 50,- Euro! https://freundschaftswerbung.gmx.de