Re: [nhibernate-development] NH-2583 - Query with || operator and navigations (many-to-one) creates wrong joins

Harald Mueller Sun, 10 Apr 2011 07:11:57 -0700

Hi Patrick -

Just that you see where I'm lost at the moment - here is my scrap text of 
finding solid reasons when to inner join. It's a classical example of 
tightening the bounds to get fixed-point solutions in a lattice ... probably 
has long be solved by someone. Now I have (for the 6th time) the feeling that I 
know what's going on and how to define the various tree attributes correctly (I 
think I need 6 of them instead of three as below) - but I won't continue with 
it until next weekend.


So long - don't read it to understand (unless you wnat to contribute to a sound 
solution ...), just be amused :-).

Harald

.............................................

orop2

1. (||-4) semantics ...




2. Optimization of outer joins to inner joins - local consistency

It is interesting - and for some databases an advantage - to replace outer 
joins with inner joins if the outer joins are not necessary. "Not necessary" 
here means that the result must not be different whether using outer joins or 
inner joins. The question, obviously, is: For which member expressions can we 
replace an outer join with an inner join?

It took me a few hours to find this out. Here is the result.

A record (object) will be in the result if the evaluation of the condition in 
3-value SQL logic will return true; it will not be in the result if the result 
is either logical-null or false. The difference between outer joining and inner 
joining is that with the latter, objects are missing from the set on which the 
condition is checked. Thus, inner joins "emulates" a result that is 
logical-null or false; and therefore, we can replace an outer join with an 
inner join only if the resulting condition cannot become true.

If we look at the evaluation tree of a condition, we can assign to each node 
the set of member expressions that, when emptily outer joined (i.e., creating 
oj-nulls for all their simple properties), might make that node true, 
logical-null, or false; lets call these sets T, N, and F, respectively. When we 
know these sets at the top condition, we can then safely use inner joins for 
all member expressions that are *only* in N and F.

For example, the condition

    x.A.B == 4

is logical-null when emptily outer-joining x.A, but never true or false. Thus, 
we have T={}, N={x.A}, F={}. Inner joining x.A (which drops records that are 
emptily joined) therefore yields the same 2-valued-logic result.

On the other hand, the condition

    x.A.B is null and x.E is null

can be made true or false by emptily outer-joining x.A, but never logical-null 
(resaon: The right side can only be true or false; and the left side also; so 
all of the condition is either true or false - depending on x.E's value). So, 
T={x.A}, N={}, F={x.A}, and therefore we cannot inner join x.A. By direct 
reasonsing, we see that inner joining x.A will drop all the non-joinable 
records, hence inner and outer joining are different for records where x.E is 
null, and will yield a different 2-valued-logic result.

Finally, the condition

    x.A.B is null or x.E is null

is always true when emptily outer-joining x.A, or T={x.A}, N={}, F={}. Hence, 
it is always different from inner joining, where the empty join drops such 
records from the result.

How can we compute T, N, and F? First, we note that we can compute these sets 
with different "preciseness". For example, one correct way would be to just 
collect all member expressions into a set M, and then set T=N=F=M. Obviously, 
this prevents any inner joins. On the other hand, we can quite precisely check 
condition like the following:

    x.A.B == null           T=?, N=?, F=?
    !(x.A.B != null)        T=?, N=?, F=?
    (x.A.B ?? 4) == 4       T=?, N=?, F=?
    !((x.A.B ?? 4) != 4)    T=?, N=?, F=?
    (x.A.B ?? x.E) == 4     T=?, N=?, F=?
    (x.A.B ?? x.C.D) == 4   T=?, N=?, F=?

In the following, I'll first give the rules for the 4 logical operators &&, ||, 
!, and ?!; then I'll give them for very common simple conditions; finally, I'll 
look at some complex simple conditions.

(a) && operator:

    T(A && B) = T(A) union T(B)
        Reason: Emptily outer joining a member expression can make A true. B 
might be true for completely different reasons (e.g., it might be a tautology). 
Then empty outer joining that member expression from T(A) will make all of A && 
B equal true.

This looks very unpromising: Because T is computed by set union, it is not 
possible to "kick out" a member expression from T. So, if a member expression 
is in T from some sub-condition, we will never be able to optimize its join to 
an inner join! Essentially, this means that already on the simple condition 
level, we must be careful when we put a member expression into T. But let's 
continue with the logical operators first.

    N(A && B) = N(A) union N(B) union T(A) union T(B)
        Reason: First, a member expression from N(A) can make A && B equal to 
logical-null, e.g. if B is always logical-null or true. Conversely, the same is 
true for N(B). However, also a member expression (say t) from T(A) may make the 
condition logical-null, e.g. if B is null == null: This condition will always 
return logical-null, and when now t makes A true, A && B is true && 
logical-null, which is equal to logical-null. So we have to add all member 
expressions of T(A) to N(A && B). The same is obviously also true for T(B).

    F(A && B) = F(A) union F(B) union N(A) union N(B) union T(A) union T(B)
        Reason: Similiarly to above. Adding N(A) and N(B) is necessary e.g. if 
B or A is null == null. Adding T(A) and T(B) is necessary e.g. if B or A is a 
contradiction like 0 == 1.

These "blow-ups" of N and F are not as bad as the one for T - after all, inner 
joins are not suppressed by having a member expression in N or F. However, this 
might be painful when we compute F and N for the ! operator, because there we 
might blow up T a lot. We'll see ...

(b) || operator:

    T(A || B) = T(A) union T(B) union N(A) union N(B) union F(A) union F(B)    
        Reason: An emptily outer joined member expressions that makes A true 
can also make A || B true - e.g. when B is a contradiction. But also if it 
makes A logical-null, all of A || B can become true - e.g. when B is a 
tautology. Finally, even if it makes A false, all of A || B can become true 
when B is a tautology.
        
The blow-up for || is even larger - essentially, *every member expression* will 
be in T and hence require an outer join. But this is to be expected - it is 
actually the foundation of the NH-2583 problem!

SHIT. Does not work for 


    N(A || B) = N(A) union N(B) union F(A) union F(B)
        Reason: If a member expression is in F(A), then emptily outer joining 
it can make A || B logical-null if B is e.g. null == null, as logical-null || 
false is logical-null. However, a member expression that might make A true when 
emptily outer joined will never make A || B logical-null - there is no 
possibility that true || ... becomes logical-null. You might ask: But that 
member expression only *might* make A true; what if it actually makes A 
logical-null - then, the whole expression will become logical-null after all! 
The answer is: In that case, the member-expression must (also) be in N(A), as 
it obviously might make A logical-null. So we cannot lose a possibility.
    F(A || B) = F(A) union F(B) 
        Reason: The reasoning is like for T(A && B).

(c) ! operator:

    T(!A) = F(A)
        Reason: Under 3-valued logic, !A can only become true if A becomes 
false. So only member expressions that, when emptily outer joined, *might* make 
A false, can possibly make !A true.
    N(!A) = N(A)
        Reason: Here the reasoning is as simple as for T.
    F(!A) = T(A)
        Reason: Also this is easy.

(d) Logical ?: operator: A ? B : C is equivalent to A && B || !A && C. From 
this, one can compute the T, N, and F sets. Here are the - boring - results:

    T(A ? B : C)    = T(A && B || !A && C)
                    = T(A && B) union T(!A && C)
                      union
                      N(A && B) union N(!A && C)
                      union
                      F(A && B) union F(!A && C)
                    = T(A) union T(B) union T(!A) union T(C)
                      union
                      N(A) union N(B) union T(A) union T(B) union N(!A) union 
N(C) union T(!A) union T(C) 
                      union
                      F(A) union F(B) union N(A) union N(B) union T(A) union 
T(B) union F(!A) union F(C) union N(!A) union N(C) union T(!A) union T(C)
                    = T(A) union T(B) union F(A) union T(C)
                      union
                      N(A) union N(B) union T(A) union T(B) union N(A) union 
N(C) union F(A) union T(C) 
                      union
                      F(A) union F(B) union N(A) union N(B) union T(A) union 
T(B) union T(A) union F(C) union N(A) union N(C) union F(A) union T(C)
                    = F(A) union F(B) union F(C) union N(A) union N(B) union 
N(C) union T(A) union T(B) union T(C)

    N(A ? B : C)    = N(A && B || !A && C)
                    = N(A && B) union N(!A && C) union F(A && B) union F(!A && 
C)
                    = N(A) union N(B) union T(A) union T(B)
                      union
                      N(!A) union N(C) union T(!A) union T(C)
                      union
                      F(A) union F(B) union N(A) union N(B) union T(A) union 
T(B)
                      union
                      F(!A) union F(C) union N(!A) union N(C) union T(!A) union 
T(C)
                    = N(A) union N(B) union T(A) union T(B)
                      union
                      N(A) union N(C) union F(A) union T(C)
                      union
                      F(A) union F(B) union N(A) union N(B) union T(A) union 
T(B)
                      union
                      T(A) union F(C) union N(A) union N(C) union F(A) union 
T(C)
                    = F(A) union F(B) union F(C) union N(A) union N(B) union 
N(C) union T(A) union T(B) union T(C)

    F(A ? B : C)    = F(A && B || !A && C)
                    = F(A && B) union F (!A && C)
                    = F(A) union F(B) union N(A) union N(B) union T(A) union 
T(B) 
                      union
                      F(!A) union F(C) union N(!A) union N(C) union T(!A) union 
T(C)
                    = F(A) union F(B) union N(A) union N(B) union T(A) union 
T(B) 
                      union
                      T(A) union F(C) union N(A) union N(C) union F(A) union 
T(C)
                    = F(A) union F(B) union F(C) union N(A) union N(B) union 
N(C) union T(A) union T(B) union T(C)

Now let us look at simple conditions. The following are *possible* assignments 
to T, N, and F. As said above, one can also assign the values more crudely - 
e.g., always assign all member expressions in the condition to all three sets. 
However, the assignments below are "precise" - i.e., they assign the minimal 
sets - for all conditions except (i) (if I did not make a mistake).

(e) mE*.P <op> <constant(s)> (where <constant(s)> are not value-nulls)

mE* here is a symbol for a "nested sequence of member expressions" mE1, mE2, 
..., .P is the final property. E.g. in 

    a.B.C.D.E == 4

the member expressions are mE1=a.B, mE2=a.B.C, and mE3=a.B.C.D; the final 
property is E. In this case, we get

    T(mE*.P <op> <constant>) == {}
    N(mE*.P <op> <constant>) == {mE1, mE2, ...}
    F(mE*.P <op> <constant>) == {}
        Reason: Emptily outer joining any member expression will yield oj-null 
for P; but oj-null <op> <constant> is always logical-null.
        
(f) mE*.P == me'*.Q

    The result depends on the definition of ==: We can either use the Linq 
definition where null == null is true (which requires an SQL translation of ... 
= ... OR ... IS NULL AND ... IS NULL); or the SQL definition where null == null 
is logical-null. In the first case, we reason as follows:

        - When one side (e.g. the left one) does empty outer-joining and yields 
oj-null for P, but the right side has non-oj-null values, the SQL
            oj-null = value OR oj-null IS NULL AND value IS NULL
          evaluates as logical-null OR true AND false, which is logical-null OR 
true, which is true.
        - When both sides do empty outer-joining, we get
            oj-null = oj-null OR oj-null IS NULL AND oj-null IS NULL
          which is logical-null OR true AND true, which is true.
        So, empty outer joining will always yield true, and hence

    T(mE*.P == me'*.Q) = {mE1, mE2, ...} union {mE1', mE2', ...}
    N(mE*.P == me'*.Q) = {}
    F(mE*.P == me'*.Q) = {}
        
    For the SQL definition of equality, we get the following:

        - When one side (e.g. the left one) does empty outer-joining and yields 
oj-null for P, but the right side has non-oj-null values, the SQL
            oj-null = value
          evaluates as logical-null.
        - When both sides do empty outer-joining, we get
            oj-null = oj-null
          which is logical-null.
        So, empty outer joining will always yield logical-null, and hence
        
    T(mE*.P == me'*.Q) = {}
    N(mE*.P == me'*.Q) = {mE1, mE2, ...} union {mE1', mE2', ...}
    F(mE*.P == me'*.Q) = {}

(g) mE*.P == null
        Empty outer joining can only yield true, hence we get:

    T(mE*.P == null) = {mE1, mE2, ...}        
    N(mE*.P == null) = {}
    F(mE*.P == null) = {}

(h) mE*.P != null
        Empty outer joining can only yield false, hence we get:

    T(mE*.P == null) = {}
    N(mE*.P == null) = {}
    F(mE*.P == null) = {mE1, mE2, ...}

(i) Other complex simple condition, e.g. (x.A.B ?? x.C.D) <op> <constant(s)>

    Here, I opt for the simple assignment 

        T(...) = N(...) = F(...) = all joinable member expressions in the 
condition

    It is probably possible to do better by analyzing coalesce operators (??).

Here, we should do a few examples.

(ex.1) Show that the typical "in-to-|| translation" will yield inner joins:

    T(a.B.C == 1 || a.B.C == 2)
    = T(a.B.C == 1) union T(a.B.C == 2) union N(a.B.C == 1) union N(a.B.C == 2) 
union F(a.B.C == 1) union F(a.B.C == 2)
    = {} union {} union {a.B} union {a.B} union {} union {}
    = {a.B} ????????????????????????????????????????????????????????????? SHIT 
- DOES NOT WORK - WE NEED SHARPER UPPER BOUNDS!!!!!
     
    N(a.B.C == 1 || a.B.C == 2)
    = N(a.B.C == 1) union N(a.B.C == 2) union F(a.B.C == 1) union F(a.B.C == 2)
    = {a.B} union {a.B} union {} union {}
    = {a.B}
    
    F(a.B.C == 1 || a.B.C == 2)
    = F(a.B.C == 1) union F(a.B.C == 2)
    = {} union {}
    = {}

........................
    



3. Optimization of outer joins to inner joins - or-false consistency


Not the same as time-proven logics. One famous example is the "law of the 
excluded middle" and SQLs SELECT. The former law says that for a proposition P, 
in classical two-valued logics (like propositional or predicate calculus), 
either P is true or the negation of P is true. There is no "middle ground". 

SQL uses a 3-valued logic in WHERE clause condition evaluation, with the 3 
values true, false, and logical-NULL (not to be confused with the *value* 
"value-NULL", which is something completely different). However, when executing 
a SELECT, SQL is forced to map the 3 values to 2 (true or false): After all, 
for a record (tuple) that is the possible result of a query, there are only two 
possibilities: The record *is* in the result, or it is *not* - the law of the 
excluded middle! In SQL, the mapping is that both false (of 3-valued logic) and 
logical-NULL map to false (of 2-valued logic), whereas true (of 3-valued logic) 
maps to true (of 2-valued logic). However, this mapping now produces the 
well-known possibility that neither a proposition nor its negation is true for 
a record. E.g., neither x = 4 nor NOT(x = 4) is true for a record where x is 
value-NULL. Thus, the 2-value-logic part of SQL contradicts the law of the 
excluded middle. To alleviate that somewhat, SQL goes a long way to ensure that 
logical-NULL can only be the result of conditions where at least one 
participating value is value-NULL. Thus, when no value-NULLs are present, SQL's 
2-value-logic part follows classical logics.

Why is this interesting? When one decides on the semantics of a logic, it is 
necessary to keep such general properties in view: Do we want the law of the 
excluded middle, de'Morgan's laws etc. to hold or not?

Also for NHib.Linq, it is necessary to decide on such meta-rules. For example, 
here is one rule that is true in classical logics as well as in SQL logics 
(3-value and also 3-value mapping to 2-value; I will therefore use SQL notation 
in the following except in genuine Linq examples):

    If Q is a condition that is always false (in a domain of tuples), then
    P holds for some value tuple exactly if P OR Q holds.

In other words, (P OR false) is always the same as P.
We could say that a logic which does *not* have this property exhibits the 
"or-false-anomaly".

Here are some examples:

    When a > 0 is true for some tuples, then also a > 0 OR 0 = 1 is true; and 
vice versa.
    On the domain of real numbers: When a > 5 is true, then also a > 5 OR b * b 
< ABS(b) - 1 is true; and vice versa.

I propose that NHib.Linq should avoid the or-false-anomaly.

Careless introduction of inner joins creates the or-false-anomaly; here is why.
One could believe that outer joins are only necessary for paths that are used 
below OR or NOT operators. Thus, in

    .Where(x => x.A.B == 1 || x.C.D == 1)
    
we need outer joins for the table behind x.A's class - otherwise, a 
non-existing x.A could filter out objects that *do* have an x.C with x.C.D == 
1. Now, what about 

    .Where(x => x.A.B == null || x.C.D * x.C.D < Math.Abs(x.C.D) - 1)

As humans, we can easily prove that the right side is always false for real 
numbers. But we will not, I assume, get NHibernate to be able to do this. 
Therefore, NHibernate must use an outer join for the member x.A in the 
left-side condition (on the theory that the right condition only could be true 
for some objects).

Now, "pure outer join logic" (without existence guards) will return all objects 
where
(a) either there is an x.A, and its B is value-null;
(b) or where there is no x.B at all (reason: This will lead to an outer-joined 
x.A.B column that contains null; I will call such a null value introduced by an 
outer join oj-null in the following).
As the right side is always false, and we want to avoid the or-false-anomaly, 
the query

    .Where(x => x.A.B == null)

should return the same result. But this means that also in this simple 
condition, we must join to the table of x.A's class with an *outer join*! 
Therefore, we *cannot* decide on the usage of outer joins solely on the logical 
operators in the where condition, e.g., the presence or absence of ! or ||.

The whole affair somehow requires that we "have to look into the future": If a 
member expression m might be outer joined in the condition translation of some 
condition ...a... || ...b... and this outer joining alone might yield records 
in the result, we *must* outer join m also if we only translate ...a... alone 
(or ...b...alone). The consequence of this is:

    For all member expressions in a condition that *might* make *any possible 
containing top-level condition* true under outer joining, we must always outer 
join these member expressions!

Which member expressions may make a condition true when outer joined? Let us 
look at a few examples (in C# notation):

    x.A.B == null           can become true under outer joining if x.A.B is 
oj-null
    !(x.A.B != null)        can similiarly become true under outer joining
    (x.A.B ?? 4) == 4       can become true under outer joining if x.A.B is 
oj-null
    !((x.A.B ?? 4) != 4)    can similiarly become true under outer joining 
    (x.A.B ?? x.E) == 4     can become true under outer joining if there is no 
x.A that can be outer joined, but x.E == 4.
    (x.A.B ?? x.C.D) == 4   can become true under outer joining if there is no 
x.A to be joined, but there is an x.C with value D == 4.
    
So we see that there are quite a few conditions that might make a condition 
true under outer joining to non-existent records. In all these cases, x.A's 
class must be outer joined to avoid the or-false-anomaly. And alas, I am not at 
all sure that there is an algorithm that allows us to find such "dangerous 
member expressions" - at least, this will be a complex machine!

So we try the opposite approach: We *always* do outer joins for member 
expression in a condition P, unless we can *prove* that the member expression 
being equal to oj-null P can never make a containing top-level condition true. 
In that case only, we can use inner joins.


.....................................................




However, we can prove the following:

    If a condition x with member expression m is NOT below a ! 
    and does NOT occur "one-sided" below a || operator 
    and the condition x can only return logical-null for m == value-null (when 
other values are set suitably),
    then for each surrounding condition c(...x...) of x, we have
    
        c(...x...) || F is true exactly if c(...x...) is true
        
Aha. (a) Why does this help? (b) How to prove this? (c) How can we use this in 
an optimization algorithm?

Ad (a): It helps in the following quite frequent cases:
* A condition needs no joins at all, e.g.
    a.E < 5 || !(a.E > 7 && a.E != 9)
* A condition with joins uses no logical operators at all, e.g.
    a.B.C > 4
* A condition with joins uses only && operators, e.g.
    a.B.C >= 5 && a.B.C <= 10 && a.B.F == 3
* A condition with joins uses only && operators and "trivial ||s", e.g.
    a.B.E >= 5 && a.B.E <= 10 && (a.B.F == 3 || a.B.F == 5)





    a.B.E <= 10 && (a.B.F == 3 || a.B.F == null)

left side null; 
    null and true --> null
    null and null --> null
    null and false (aber NICHT für null!!) --> false 
daher inner joins ok!



    !(a.B.E <= 10 && (a.B.F == 3 || a.B.F == null))
    
    
    



However, the following ....

    a.B.F == 3 || a.B.F == null

--> can return true

if there is no a.B, 


    !(a && b) || F
    
    
    where b is false



    null || true --> true
    
    


In order to avoid the necessity of 


In order to steer clear of problems, we therefore do not check w




Actually, *any* condition that may return true or false when fed a single null 

-- 
Empfehlen Sie GMX DSL Ihren Freunden und Bekannten und wir
belohnen Sie mit bis zu 50,- Euro! https://freundschaftswerbung.gmx.de

Re: [nhibernate-development] NH-2583 - Query with || operator and navigations (many-to-one) creates wrong joins

Reply via email to