Re: RE: [nhibernate-development] NH-2583 - Query with || operator and navigations (many-to-one) creates wrong joins

Harald Mueller Sun, 24 Apr 2011 04:02:56 -0700

Hi Patrick - 

here are my thoughts on Coalesce (??). The "culmination" of all that will have 
to be a semantics and inner-join optimization for the ?: operator, which 
"brings it all together". I don't know whether I look forward to this, or want 
it simply to be "unsupported" ...



A. Introduction to the coalesce problem
=======================================

Up to now, the analysis (my previous text) and the implemented code implicitly 
assume the following:

    If any member expression inside an expression is oj-null, the whole 
expression has value logical-null.

E.g. for case (e), we might have an expression like

   a.B.C + a.D.E > 5

The code says "map all to N" - in words: If any of the member expressions is 
oj-null, the result of the condition can only be logical-null.

Or e.g. in case (i), we might have

   a.B.C + a.D.E == null

The code says "map all to T" - in words: If any of the member expressions is 
oj-null, the result of the condition can only be true.

In the two examples above, the conclusions are correct. But - Patrick, you are 
VERY right to point this out - the conclusions drawn by the code (and my text) 
are NOT correct for the ?? operator; and - I add - also not for the ?: operator.

So let us start to think about expressions; but clearly separated from 
conditions, please!

Here is a very general statement:

    The result of an expression can be a value of a set defined by the 
expression's type.

We therefore have to track sets of such values to find out how an oj-null value 
in an expression might influence the result of a surrounding condition ... this 
is done in an are called "constraint-solving", and it is not what we want to do 
in general. But we can try a simple version of it, nonetheless: Namely 
partitioning the result set of an expression into simply tracked subsets and 
hope to get enough information to decide on possible values of conditions(!). 

You indicated such a partitioning of *value sets*: The value null, and all 
other values ("non-null non-boolean value"). So, the partitions are

    u      for the null value (be it value-null or oj-null)
    v      for the set of all not-null values

Please please - once more: Let's not mix u with n (or U with N in the code)! 
The latter is logical-null, the result of a condition; the former is a null 
*value* result of an *expression*.

We must note that such a value set partitioning is *arbitrary*! I.e., we could 
also e.g. partition into

    all numerical values < 0
    all numerical values >= 0
    all non-numerical values except null
    value-null and oj-null

and infinitely many other partitionings. This is *different* from conditions: 
There, we know for sure that the possible condition results can only be true, 
logical-null, and false, so the partitioning into t, n, f is already perfect.

In the following, I will first analyze "u/v-partioning for member expressions". 
Afterwards, I will analyze one coarser and one finer paritioning - just to show 
that these are also possible.

Maybe I need to stress the following once more: What we try here, is to "put 
3-valued logic ('SQL logic') over navigational expressions ('Linq 
expressions')". So it is a *new semantics* for Linq expressions. As I have 
pointed out repeatedly, I personally would have liked to "force" Linq2Objects 
logic into NHib.Linq expressions. However, we agreed that we want "simple 
implementations" by way of using SQL logic as much as possible. But when we do 
this, let's do it in earnest! That means, above all, clearly separating 
*expressions* and *conditions* in our analysis. Of course, Linq expressions do 
not have this separation - so we need special rules to distinguish them. Right 
now, in the code, we loosely rule that expressions with result type bool are 
conditions, unless they are leaf expressions, in which case they are 
interpreted as conditions of the form expr == 1. Let's stick to this at least 
for some more time.

(This ruling does most certainly not work in cases like (a == b) != (c < d), 
where a direct transformation is simply not allowed in SQL - and probably also 
in HQL. But such "cross-overs" between expressions and conditions are a 
different topic - right now, I just propose that we do not support them).

B. u/v partitioning for member expressions
==========================================

B1. Using u/v partitioning for member expressions in condition mappings
-----------------------------------------------------------------------

Let us assume that, for each expression, we have a mapping from each joined 
member expression in the expression to u or v or uv. The meaning of the mapping 
for one member expression mE is, as right now with t/n/f:

    The mapping for mE contains u if emptily outer joining mE can make the 
value of the expression null (value-null or oj-null).
    The mapping for mE contains v if emptily outer joining mE can make the 
value of the expression something different from null.

So, e.g., for

    a.B.C
    
we want the mapping { a.B -> u }. Reason: If a.B is emptily outer joined, then 
a.B.C under our "navigational 3-value semantics mimicking SQL" is always 
oj-null. For the expression

    a.B.C == null ? 2 : null

we want the mapping { a.B -> v }. Reason: If a.B.C is emptily outer joined, we 
want a.B.C == null to have the value true, and therefore the result for the ?: 
is 2, which is never null. Finally, for 

    a.B.C == null ? a.P : null

we want the mapping { a.B -> uv }. Reason: As before, if a.B.C is emptily outer 
joined, the expression value is equal to a.P. Unless we know anything more 
about a.P, we must assume that it can be null or not-null.

Let us assume that we somehow have this mapping - I'll of course develop the 
rules for this below.
If we have this information, we then have to "splice" it into the conditions' 
mappings. Here are the relevant cases from the previous text/code:

(e) expr <op> <constant(s)> (where <constant(s)> are not value-nulls)

Old result: { mE -> n } for all joined mE's in expr.

New result:
- If an outer-joined mE makes the expression expr null (we have u in the 
expression mapping), it makes the condition n.
- If outer-joining that mE makes the expression not-null (we have v in the 
expression), the condition can become t or f, but also n. The latter happens if 
*another* member expression, when outer joined, makes the expression null.

Concisely:
    { mE -> n }         if { mE -> u } in expr
    { mE -> tnf }       if { mE -> v } in expr
and, if both holds
    { mE -> tnf }       if { mE -> vu } in expr


(f) (Only "SQL equality", as NHib uses this) The result is the same as (e) [I 
think; I have not walked through examples ...]

(g) Same as (f) [Again without too precise thinking ...]

(h) expr != null

Old result: { mE -> f }

New result:
- If an outer-joined mE makes the expression expr null (we have u in the 
expression mapping), it can then make the condition only f ("NULL IS NOT NULL" 
is always false). 
- If an outer-joined mE makes the expression expr not-null (we have v in the 
expression mapping for expr), it makes the condition t.

Concisely:
    { mE -> f }         if { mE -> u } in expr
    { mE -> t }         if { mE -> v } in expr
and, if both hold,
    { mE -> tf }        if { mE -> vu } in expr

(i) expr == null

Old result: { mE -> t }

New result, concisely - reasoning like in (h):
    { mE -> t }         if { mE -> u } in expr
    { mE -> f }         if { mE -> v } in expr
and, if both holds
    { mE -> tf }        if { mE -> vu } in expr

B2. Computing u/v partitioning for member expressions
-----------------------------------------------------

As explained above, we now think about computing the u/v mappings. So we have 
to map each joined member expression in the expression to one possible 
partition, again with the meaning "if that member-expression is oj-null, then 
we can get ...".

Case (k):
Constants: Constants do not contain any member expressions, so the mapping 
comes out empty.

Case (l) ["ell"]:
Dot operator on complete expressions: (expr).P - this could be something like 
(a.B ?? a.D).E or worse. I think we should not support such expressions (i.e., 
we allow wrong results); or at least should not support inner join optimization 
for them! In the latter case, we have to invest some thought - we'd have to 
remove all those mappings. Here is a suggestion to solve this easily:

    We replace all mapped values with the pessimistic assumption uv.

I have not even thought about examples - the ?? operator is right now more 
important -, but I would hope that this rule will not produce wrong inner joins.

Case (m):
"mE.P" - i.e., using a property of a joined member mE: We get { mE -> u }. 
Reasoning: If mE is oj-null, any property mE.P is taken as null by SQL logic. 
This is true even if mE.P is of a non-nullable type (e.g. int), because, we 
talk about emptily outer joining mE! (I got this explanation wrong about 3 
times.)

Case (n):
(Member) Functions: I'll skip them for the moment - I think they are a littel 
difficult.

Case (o):
Binary operators: Here is, first, the simple + operator. A minus sign '-' means 
"our joined member expression does not exist in the mappings for that 
expression".

    x   y   x+y
    u   u    u      }
    u   v    u      } null on one side makes x+y null!
    u   -    u      }
    v   u    u
    v   v    v
    v   -    vu     (1)
    -   u    u
    -   v    vu     like (1), sides reversed
    -   -    -

(1) y might be null or not-null, irrespective of the member expression. We must 
assume the worst, so that the result might be any of v and u. Consequence: In

    a.B.C + 1

we get { a.B -> vu } even though outer-joining a.B will always yield a null 
value for a.B.C and hence the complete expression, so we should get { a.B -> u 
}! As with conditions, we could add a "possible result set" for each condition 
to restrict this - see examples below for some thoughts.

This table also is to be used for all other "null-hoisting" operators (i.e., 
where one null value suffices to make the whole expression value null).

Case (p):
Unary operators (like ~, unary minus):

    x   ~x
    u   u       null maps to null
    v   v       not-null maps to not-null
    -   -       not-existence remains not-existence

Case (q):
The very interesting ?? (coalesce) operator:

    x   y   x??y
    u   u    u
    u   v    v
    u   -    vu    (2)
    v   u    v
    v   v    v
    v   -    v     (3)
    -   u    vu    (4)
    -   v    v     (5)
    -   -    -

The entries without a minus (i.e., where there is a mapping for one and the 
same mE becoming oj-null on both sides) are simple: Only if both expressions 
yield a null value, we get null as result.
(2): As with +, we do not know what y might be when our mE is oj-null, so we 
assume the worst.
(3): When x is a non-null value, the whole expression is always non-null.
(4): x might be v under mE being oj-null, but also u.
(5): Whatever x is under mE being oj-null, the result cannot be null.

B3. Examples
------------

Let's do a few examples here, before looking at the horrible ?: operator:

Example (13)

What do we get now for that Coalesce(a.B.C, d.E.F) - in my notation, a.B.C ?? 
d.E.F.

For the left side, { a.B.C -> u }, according to (m).
For the right side, { d.E.F -> u }, also according to (m).
For all, 
    { a.B -> vu,          from case (q)(2)
      d.E -> vu }         from case (q)(4)

A condition like 

    (a.B.C ?? d.E.F ) == null
    
then gets, according to (i), the mapping

    { a.B -> tf,
      d.E -> tf }

This means that - if this is the top condition - we have to outer join both a.B 
and d.E (because there is t in both mapping results).

Example (14)

What about 

    (a.B.C ?? 1) == null

For the left side, { a.B.C -> u }, according to (m).
For the right side, { }, according to (k) (because there is no member 
expression!).
For the ?? expression, 
    { a.B -> vu }         from case (q)(2)
Hence, again, for the whole condition
    { a.B -> tf }

Now, on the face of it, this is wrong. We *know* that the left side is always 
not-null (so already the mapping for ?? should have been { a.B -> v }), and we 
*know* that the whole condition is always false, so the mapping should have 
been { a.B -> f }. Such a more precise reasoning would actually allow us to 
inner join in this case - but what for? The condition is always false anyway ...

Of course, we *could* implement that "possible expression value" concept. In 
the following, I use pairs <M,Z> as expression results, where M is the mapping 
as above and Z is the set of all possible expression values. The Z set can then 
be intersected with all mapping results to possibly reduce the mappings. I do 
not give the complete formalism (which requires all the case tables above to be 
rewritten to add those "possible values"), but just an outline:

For the left side, <{ a.B.C -> u }, vu> according to (m); and "whole 
expression" vu.
For the right side, { }; and "whole expression" v.
For ??, 
    { a.B -> v }          from a suitably modified case (q)(2); it knows that 
the rhs of ?? restricts the possible results.
For the complete condition, we now get
    { a.B -> f }
meaning we can inner join! Still, the condition is always false, anyway.

Example (15)

    (a.B.C ?? 1) != null

Again, we get
    { a.B -> vu }         from case (q)(2)
and then, from (h)
    { a.B -> tf }
    
We need to outer join! Also any optimized version should return this; it might 
tell us { a.B -> t }, but this still forces an outer join because of the t.

B4. The ?: operator
-------------------

TBD - might be a tour the force of recursively combining *condition* and 
*expression* mapping computation ...

B5. A short thought on coding u/v partitioning
----------------------------------------------

With my rigid request to separate conditions and expressions, the analysis 
becomes clearer and more easy to verify. However, in the code, we would need 
two data structures for hoisting the mappings up the Linq expression tree, 
about as follows:

        private const int T = 1;
        private const int N = 2;
        private const int F = 4;

        // The following is used for all *condition* traversal 
        // (but not *expressions* that are not conditions).
        private Dictionary<string, int> _memberExpressionMappingForConditions = 
new Dictionary<string, int>(); 
            // where the int is a subset of TNF

        private const int U = 1;
        private const int V = 2;

        // The following is used for all *expression* traversal.
        private Dictionary<string, int> _memberExpressionMappingForExpressions 
= new Dictionary<string, int>(); 
            // where the int is a subset of UV

However, because that int is agnostic to its contents, we could merge the two 
data structures into one, e.g. as follows:

        private const int T = 1;
        private const int N = 2;
        private const int F = 4;
        private const int U = 1;
        private const int V = 2;

        // The following is used for all traversals.
        private Dictionary<string, int> _memberExpressionMappingForExpressions 
= new Dictionary<string, int>(); 
            // where the int is a subset of TNF *or* of UV.
        
Please note that I did *not* care that U (the "value null" representation) and 
N (the logical-null representation) are mapped to the same value - there is no 
need for this:

*** I want to hammer it into the mind of any one involved that SQL's null 
*values* and null *condition results* are two totally different things! *** 

They are *only* connected by an intended semantics that for many "simple" 
operators, the occurrence of a null *value* in a condition will make the 
condition logical-null. But this is *not so* e.g. for COALESCE, CASE constructs 
with WHEN NULL and OTHERWISE, and maybe more operators (like ISNULL in SQL 
Server).

C. Coarse u/v partitioning ignoring member expressions
======================================================

TBD

D. More fine-grained value partitionings
========================================

TBD




-- 
GMX DSL Doppel-Flat ab 19,99 Euro/mtl.! Jetzt mit 
gratis Handy-Flat! http://portal.gmx.net/de/go/dsl

Re: RE: [nhibernate-development] NH-2583 - Query with || operator and navigations (many-to-one) creates wrong joins

Reply via email to