Hi Sorry to ask a dumb question from the sidelines - I have not been following you in detail.
Has there ever been a general discussion about how close NH's LINQ provider actually _wants_ to be to LINQ to objects? Unfortunately, Microsoft gives absolutely no guidance for provider authors. There's always the argument that code would be portable between DB and in-memory if it were 100% compatible. But one might as well argue that LINQ to SQL is the measure for any LINQ provider, and people are more likely to be surprised if another provider behaves differently. Or more likely to port queries between DB-providers than to/from LINQ to objects. http://msdn.microsoft.com/en-us/library/bb399342.aspx: "LINQ to SQL does not impose null comparison semantics on SQL. Comparison operators are syntactically translated to their SQL equivalents." So a == b in C# becomes a = b in SQL. What the equality operator does is up to the DB (e.g. ANSI NULL settings). I guess that's because there's a reason for many things that SQL does, like the boolean/null semantics. More often than not, that's what you want in set-based operations, and if you write code in LINQ, you want to use that feature too. In any case, 100% compatibility to LINQ 2 objects is an illusion. You'll never get there, so you might as well not try. Consider the typical NullReference problem: a.b.c == x What's this code supposed to do if a.b is null? In C#, you'd get a NullReferenceException. Do you want the same in SQL? Why? In C#, you can work around it: (a.b != null) && (a.b.c == x) But this works only because "&&" is lazy in C#. Since there's no lazy AND operation in SQL, how would you translate this? Detect the pattern? My suggestion is to copy the behavior of LINQ to SQL and stay as close to SQL as you can, because that's easier to handle for the user, and less likely to bring surprises. In any case, there needs to be a clear and concise documentation for translation results. It's just a language that you need to learn before you use it. If one has to look up everything in unit tests or try it out, or even memorize many inconsistent cases, LINQ becomes a burden. A wiki page will do. A reference to unit tests won't. Just my two cents. Cheers, Stefan > -----Original Message----- > From: [email protected] [mailto:nhibernate- > [email protected]] On Behalf Of Harald Mueller > Sent: Sunday, April 10, 2011 4:12 PM > To: [email protected] > Subject: Re: [nhibernate-development] NH-2583 - Query with || operator > and navigations (many-to-one) creates wrong joins > > Hi Patrick - > > Just that you see where I'm lost at the moment - here is my scrap text > of finding solid reasons when to inner join. It's a classical example > of tightening the bounds to get fixed-point solutions in a lattice ... > probably has long be solved by someone. Now I have (for the 6th time) > the feeling that I know what's going on and how to define the various > tree attributes correctly (I think I need 6 of them instead of three as > below) - but I won't continue with it until next weekend. > > So long - don't read it to understand (unless you wnat to contribute to > a sound solution ...), just be amused :-). > > Harald > > ............................................. > > orop2 > > 1. (||-4) semantics ... > > > > > 2. Optimization of outer joins to inner joins - local consistency > > It is interesting - and for some databases an advantage - to replace > outer joins with inner joins if the outer joins are not necessary. "Not > necessary" here means that the result must not be different whether > using outer joins or inner joins. The question, obviously, is: For > which member expressions can we replace an outer join with an inner > join? > > It took me a few hours to find this out. Here is the result. > > A record (object) will be in the result if the evaluation of the > condition in 3-value SQL logic will return true; it will not be in the > result if the result is either logical-null or false. The difference > between outer joining and inner joining is that with the latter, > objects are missing from the set on which the condition is checked. > Thus, inner joins "emulates" a result that is logical-null or false; > and therefore, we can replace an outer join with an inner join only if > the resulting condition cannot become true. > > If we look at the evaluation tree of a condition, we can assign to each > node the set of member expressions that, when emptily outer joined > (i.e., creating oj-nulls for all their simple properties), might make > that node true, logical-null, or false; lets call these sets T, N, and > F, respectively. When we know these sets at the top condition, we can > then safely use inner joins for all member expressions that are *only* > in N and F. > > For example, the condition > > x.A.B == 4 > > is logical-null when emptily outer-joining x.A, but never true or > false. Thus, we have T={}, N={x.A}, F={}. Inner joining x.A (which > drops records that are emptily joined) therefore yields the same 2- > valued-logic result. > > On the other hand, the condition > > x.A.B is null and x.E is null > > can be made true or false by emptily outer-joining x.A, but never > logical-null (resaon: The right side can only be true or false; and the > left side also; so all of the condition is either true or false - > depending on x.E's value). So, T={x.A}, N={}, F={x.A}, and therefore we > cannot inner join x.A. By direct reasonsing, we see that inner joining > x.A will drop all the non-joinable records, hence inner and outer > joining are different for records where x.E is null, and will yield a > different 2-valued-logic result. > > Finally, the condition > > x.A.B is null or x.E is null > > is always true when emptily outer-joining x.A, or T={x.A}, N={}, F={}. > Hence, it is always different from inner joining, where the empty join > drops such records from the result. > > How can we compute T, N, and F? First, we note that we can compute > these sets with different "preciseness". For example, one correct way > would be to just collect all member expressions into a set M, and then > set T=N=F=M. Obviously, this prevents any inner joins. On the other > hand, we can quite precisely check condition like the following: > > x.A.B == null T=?, N=?, F=? > !(x.A.B != null) T=?, N=?, F=? > (x.A.B ?? 4) == 4 T=?, N=?, F=? > !((x.A.B ?? 4) != 4) T=?, N=?, F=? > (x.A.B ?? x.E) == 4 T=?, N=?, F=? > (x.A.B ?? x.C.D) == 4 T=?, N=?, F=? > > In the following, I'll first give the rules for the 4 logical operators > &&, ||, !, and ?!; then I'll give them for very common simple > conditions; finally, I'll look at some complex simple conditions. > > (a) && operator: > > T(A && B) = T(A) union T(B) > Reason: Emptily outer joining a member expression can make A > true. B might be true for completely different reasons (e.g., it might > be a tautology). Then empty outer joining that member expression from > T(A) will make all of A && B equal true. > > This looks very unpromising: Because T is computed by set union, it is > not possible to "kick out" a member expression from T. So, if a member > expression is in T from some sub-condition, we will never be able to > optimize its join to an inner join! Essentially, this means that > already on the simple condition level, we must be careful when we put a > member expression into T. But let's continue with the logical operators > first. > > N(A && B) = N(A) union N(B) union T(A) union T(B) > Reason: First, a member expression from N(A) can make A && B > equal to logical-null, e.g. if B is always logical-null or true. > Conversely, the same is true for N(B). However, also a member > expression (say t) from T(A) may make the condition logical-null, e.g. > if B is null == null: This condition will always return logical-null, > and when now t makes A true, A && B is true && logical-null, which is > equal to logical-null. So we have to add all member expressions of T(A) > to N(A && B). The same is obviously also true for T(B). > > F(A && B) = F(A) union F(B) union N(A) union N(B) union T(A) union > T(B) > Reason: Similiarly to above. Adding N(A) and N(B) is necessary > e.g. if B or A is null == null. Adding T(A) and T(B) is necessary e.g. > if B or A is a contradiction like 0 == 1. > > These "blow-ups" of N and F are not as bad as the one for T - after > all, inner joins are not suppressed by having a member expression in N > or F. However, this might be painful when we compute F and N for the ! > operator, because there we might blow up T a lot. We'll see ... > > (b) || operator: > > T(A || B) = T(A) union T(B) union N(A) union N(B) union F(A) union > F(B) > Reason: An emptily outer joined member expressions that makes A > true can also make A || B true - e.g. when B is a contradiction. But > also if it makes A logical-null, all of A || B can become true - e.g. > when B is a tautology. Finally, even if it makes A false, all of A || B > can become true when B is a tautology. > > The blow-up for || is even larger - essentially, *every member > expression* will be in T and hence require an outer join. But this is > to be expected - it is actually the foundation of the NH-2583 problem! > > SHIT. Does not work for > > > N(A || B) = N(A) union N(B) union F(A) union F(B) > Reason: If a member expression is in F(A), then emptily outer > joining it can make A || B logical-null if B is e.g. null == null, as > logical-null || false is logical-null. However, a member expression > that might make A true when emptily outer joined will never make A || B > logical-null - there is no possibility that true || ... becomes > logical-null. You might ask: But that member expression only *might* > make A true; what if it actually makes A logical-null - then, the whole > expression will become logical-null after all! The answer is: In that > case, the member-expression must (also) be in N(A), as it obviously > might make A logical-null. So we cannot lose a possibility. > F(A || B) = F(A) union F(B) > Reason: The reasoning is like for T(A && B). > > (c) ! operator: > > T(!A) = F(A) > Reason: Under 3-valued logic, !A can only become true if A > becomes false. So only member expressions that, when emptily outer > joined, *might* make A false, can possibly make !A true. > N(!A) = N(A) > Reason: Here the reasoning is as simple as for T. > F(!A) = T(A) > Reason: Also this is easy. > > (d) Logical ?: operator: A ? B : C is equivalent to A && B || !A && C. > From this, one can compute the T, N, and F sets. Here are the - boring > - results: > > T(A ? B : C) = T(A && B || !A && C) > = T(A && B) union T(!A && C) > union > N(A && B) union N(!A && C) > union > F(A && B) union F(!A && C) > = T(A) union T(B) union T(!A) union T(C) > union > N(A) union N(B) union T(A) union T(B) union N(!A) > union N(C) union T(!A) union T(C) > union > F(A) union F(B) union N(A) union N(B) union T(A) > union T(B) union F(!A) union F(C) union N(!A) union N(C) union T(!A) > union T(C) > = T(A) union T(B) union F(A) union T(C) > union > N(A) union N(B) union T(A) union T(B) union N(A) > union N(C) union F(A) union T(C) > union > F(A) union F(B) union N(A) union N(B) union T(A) > union T(B) union T(A) union F(C) union N(A) union N(C) union F(A) union > T(C) > = F(A) union F(B) union F(C) union N(A) union N(B) > union N(C) union T(A) union T(B) union T(C) > > N(A ? B : C) = N(A && B || !A && C) > = N(A && B) union N(!A && C) union F(A && B) union > F(!A && C) > = N(A) union N(B) union T(A) union T(B) > union > N(!A) union N(C) union T(!A) union T(C) > union > F(A) union F(B) union N(A) union N(B) union T(A) > union T(B) > union > F(!A) union F(C) union N(!A) union N(C) union > T(!A) union T(C) > = N(A) union N(B) union T(A) union T(B) > union > N(A) union N(C) union F(A) union T(C) > union > F(A) union F(B) union N(A) union N(B) union T(A) > union T(B) > union > T(A) union F(C) union N(A) union N(C) union F(A) > union T(C) > = F(A) union F(B) union F(C) union N(A) union N(B) > union N(C) union T(A) union T(B) union T(C) > > F(A ? B : C) = F(A && B || !A && C) > = F(A && B) union F (!A && C) > = F(A) union F(B) union N(A) union N(B) union T(A) > union T(B) > union > F(!A) union F(C) union N(!A) union N(C) union > T(!A) union T(C) > = F(A) union F(B) union N(A) union N(B) union T(A) > union T(B) > union > T(A) union F(C) union N(A) union N(C) union F(A) > union T(C) > = F(A) union F(B) union F(C) union N(A) union N(B) > union N(C) union T(A) union T(B) union T(C) > > Now let us look at simple conditions. The following are *possible* > assignments to T, N, and F. As said above, one can also assign the > values more crudely - e.g., always assign all member expressions in the > condition to all three sets. However, the assignments below are > "precise" - i.e., they assign the minimal sets - for all conditions > except (i) (if I did not make a mistake). > > (e) mE*.P <op> <constant(s)> (where <constant(s)> are not value-nulls) > > mE* here is a symbol for a "nested sequence of member expressions" mE1, > mE2, ..., .P is the final property. E.g. in > > a.B.C.D.E == 4 > > the member expressions are mE1=a.B, mE2=a.B.C, and mE3=a.B.C.D; the > final property is E. In this case, we get > > T(mE*.P <op> <constant>) == {} > N(mE*.P <op> <constant>) == {mE1, mE2, ...} > F(mE*.P <op> <constant>) == {} > Reason: Emptily outer joining any member expression will yield > oj-null for P; but oj-null <op> <constant> is always logical-null. > > (f) mE*.P == me'*.Q > > The result depends on the definition of ==: We can either use the > Linq definition where null == null is true (which requires an SQL > translation of ... = ... OR ... IS NULL AND ... IS NULL); or the SQL > definition where null == null is logical-null. In the first case, we > reason as follows: > > - When one side (e.g. the left one) does empty outer-joining > and yields oj-null for P, but the right side has non-oj-null values, > the SQL > oj-null = value OR oj-null IS NULL AND value IS NULL > evaluates as logical-null OR true AND false, which is > logical-null OR true, which is true. > - When both sides do empty outer-joining, we get > oj-null = oj-null OR oj-null IS NULL AND oj-null IS NULL > which is logical-null OR true AND true, which is true. > So, empty outer joining will always yield true, and hence > > T(mE*.P == me'*.Q) = {mE1, mE2, ...} union {mE1', mE2', ...} > N(mE*.P == me'*.Q) = {} > F(mE*.P == me'*.Q) = {} > > For the SQL definition of equality, we get the following: > > - When one side (e.g. the left one) does empty outer-joining > and yields oj-null for P, but the right side has non-oj-null values, > the SQL > oj-null = value > evaluates as logical-null. > - When both sides do empty outer-joining, we get > oj-null = oj-null > which is logical-null. > So, empty outer joining will always yield logical-null, and > hence > > T(mE*.P == me'*.Q) = {} > N(mE*.P == me'*.Q) = {mE1, mE2, ...} union {mE1', mE2', ...} > F(mE*.P == me'*.Q) = {} > > (g) mE*.P == null > Empty outer joining can only yield true, hence we get: > > T(mE*.P == null) = {mE1, mE2, ...} > N(mE*.P == null) = {} > F(mE*.P == null) = {} > > (h) mE*.P != null > Empty outer joining can only yield false, hence we get: > > T(mE*.P == null) = {} > N(mE*.P == null) = {} > F(mE*.P == null) = {mE1, mE2, ...} > > (i) Other complex simple condition, e.g. (x.A.B ?? x.C.D) <op> > <constant(s)> > > Here, I opt for the simple assignment > > T(...) = N(...) = F(...) = all joinable member expressions in > the condition > > It is probably possible to do better by analyzing coalesce > operators (??). > > Here, we should do a few examples. > > (ex.1) Show that the typical "in-to-|| translation" will yield inner > joins: > > T(a.B.C == 1 || a.B.C == 2) > = T(a.B.C == 1) union T(a.B.C == 2) union N(a.B.C == 1) union > N(a.B.C == 2) union F(a.B.C == 1) union F(a.B.C == 2) > = {} union {} union {a.B} union {a.B} union {} union {} > = {a.B} > ????????????????????????????????????????????????????????????? SHIT - > DOES NOT WORK - WE NEED SHARPER UPPER BOUNDS!!!!! > > N(a.B.C == 1 || a.B.C == 2) > = N(a.B.C == 1) union N(a.B.C == 2) union F(a.B.C == 1) union > F(a.B.C == 2) > = {a.B} union {a.B} union {} union {} > = {a.B} > > F(a.B.C == 1 || a.B.C == 2) > = F(a.B.C == 1) union F(a.B.C == 2) > = {} union {} > = {} > > ........................ > > > > > 3. Optimization of outer joins to inner joins - or-false consistency > > > Not the same as time-proven logics. One famous example is the "law of > the excluded middle" and SQLs SELECT. The former law says that for a > proposition P, in classical two-valued logics (like propositional or > predicate calculus), either P is true or the negation of P is true. > There is no "middle ground". > > SQL uses a 3-valued logic in WHERE clause condition evaluation, with > the 3 values true, false, and logical-NULL (not to be confused with the > *value* "value-NULL", which is something completely different). > However, when executing a SELECT, SQL is forced to map the 3 values to > 2 (true or false): After all, for a record (tuple) that is the possible > result of a query, there are only two possibilities: The record *is* in > the result, or it is *not* - the law of the excluded middle! In SQL, > the mapping is that both false (of 3-valued logic) and logical-NULL map > to false (of 2-valued logic), whereas true (of 3-valued logic) maps to > true (of 2-valued logic). However, this mapping now produces the well- > known possibility that neither a proposition nor its negation is true > for a record. E.g., neither x = 4 nor NOT(x = 4) is true for a record > where x is value-NULL. Thus, the 2-value-logic part of SQL contradicts > the law of the excluded middle. To alleviate that somewhat, SQL goes a > long way to ensure that logical-NULL can only be the result of > conditions where at least one participating value is value-NULL. Thus, > when no value-NULLs are present, SQL's 2-value-logic part follows > classical logics. > > Why is this interesting? When one decides on the semantics of a logic, > it is necessary to keep such general properties in view: Do we want the > law of the excluded middle, de'Morgan's laws etc. to hold or not? > > Also for NHib.Linq, it is necessary to decide on such meta-rules. For > example, here is one rule that is true in classical logics as well as > in SQL logics (3-value and also 3-value mapping to 2-value; I will > therefore use SQL notation in the following except in genuine Linq > examples): > > If Q is a condition that is always false (in a domain of tuples), > then > P holds for some value tuple exactly if P OR Q holds. > > In other words, (P OR false) is always the same as P. > We could say that a logic which does *not* have this property exhibits > the "or-false-anomaly". > > Here are some examples: > > When a > 0 is true for some tuples, then also a > 0 OR 0 = 1 is > true; and vice versa. > On the domain of real numbers: When a > 5 is true, then also a > 5 > OR b * b < ABS(b) - 1 is true; and vice versa. > > I propose that NHib.Linq should avoid the or-false-anomaly. > > Careless introduction of inner joins creates the or-false-anomaly; here > is why. > One could believe that outer joins are only necessary for paths that > are used below OR or NOT operators. Thus, in > > .Where(x => x.A.B == 1 || x.C.D == 1) > > we need outer joins for the table behind x.A's class - otherwise, a > non-existing x.A could filter out objects that *do* have an x.C with > x.C.D == 1. Now, what about > > .Where(x => x.A.B == null || x.C.D * x.C.D < Math.Abs(x.C.D) - 1) > > As humans, we can easily prove that the right side is always false for > real numbers. But we will not, I assume, get NHibernate to be able to > do this. Therefore, NHibernate must use an outer join for the member > x.A in the left-side condition (on the theory that the right condition > only could be true for some objects). > > Now, "pure outer join logic" (without existence guards) will return all > objects where > (a) either there is an x.A, and its B is value-null; > (b) or where there is no x.B at all (reason: This will lead to an > outer-joined x.A.B column that contains null; I will call such a null > value introduced by an outer join oj-null in the following). > As the right side is always false, and we want to avoid the or-false- > anomaly, the query > > .Where(x => x.A.B == null) > > should return the same result. But this means that also in this simple > condition, we must join to the table of x.A's class with an *outer > join*! Therefore, we *cannot* decide on the usage of outer joins solely > on the logical operators in the where condition, e.g., the presence or > absence of ! or ||. > > The whole affair somehow requires that we "have to look into the > future": If a member expression m might be outer joined in the > condition translation of some condition ...a... || ...b... and this > outer joining alone might yield records in the result, we *must* outer > join m also if we only translate ...a... alone (or ...b...alone). The > consequence of this is: > > For all member expressions in a condition that *might* make *any > possible containing top-level condition* true under outer joining, we > must always outer join these member expressions! > > Which member expressions may make a condition true when outer joined? > Let us look at a few examples (in C# notation): > > x.A.B == null can become true under outer joining if > x.A.B is oj-null > !(x.A.B != null) can similiarly become true under outer > joining > (x.A.B ?? 4) == 4 can become true under outer joining if > x.A.B is oj-null > !((x.A.B ?? 4) != 4) can similiarly become true under outer > joining > (x.A.B ?? x.E) == 4 can become true under outer joining if > there is no x.A that can be outer joined, but x.E == 4. > (x.A.B ?? x.C.D) == 4 can become true under outer joining if > there is no x.A to be joined, but there is an x.C with value D == 4. > > So we see that there are quite a few conditions that might make a > condition true under outer joining to non-existent records. In all > these cases, x.A's class must be outer joined to avoid the or-false- > anomaly. And alas, I am not at all sure that there is an algorithm that > allows us to find such "dangerous member expressions" - at least, this > will be a complex machine! > > So we try the opposite approach: We *always* do outer joins for member > expression in a condition P, unless we can *prove* that the member > expression being equal to oj-null P can never make a containing top- > level condition true. In that case only, we can use inner joins. > > > ..................................................... > > > > > However, we can prove the following: > > If a condition x with member expression m is NOT below a ! > and does NOT occur "one-sided" below a || operator > and the condition x can only return logical-null for m == value- > null (when other values are set suitably), > then for each surrounding condition c(...x...) of x, we have > > c(...x...) || F is true exactly if c(...x...) is true > > Aha. (a) Why does this help? (b) How to prove this? (c) How can we use > this in an optimization algorithm? > > Ad (a): It helps in the following quite frequent cases: > * A condition needs no joins at all, e.g. > a.E < 5 || !(a.E > 7 && a.E != 9) > * A condition with joins uses no logical operators at all, e.g. > a.B.C > 4 > * A condition with joins uses only && operators, e.g. > a.B.C >= 5 && a.B.C <= 10 && a.B.F == 3 > * A condition with joins uses only && operators and "trivial ||s", e.g. > a.B.E >= 5 && a.B.E <= 10 && (a.B.F == 3 || a.B.F == 5) > > > > > > a.B.E <= 10 && (a.B.F == 3 || a.B.F == null) > > left side null; > null and true --> null > null and null --> null > null and false (aber NICHT für null!!) --> false > daher inner joins ok! > > > > !(a.B.E <= 10 && (a.B.F == 3 || a.B.F == null)) > > > > > > > However, the following .... > > a.B.F == 3 || a.B.F == null > > --> can return true > > if there is no a.B, > > > !(a && b) || F > > > where b is false > > > > null || true --> true > > > > > In order to avoid the necessity of > > > In order to steer clear of problems, we therefore do not check w > > > > > Actually, *any* condition that may return true or false when fed a > single null > > -- > Empfehlen Sie GMX DSL Ihren Freunden und Bekannten und wir > belohnen Sie mit bis zu 50,- Euro! https://freundschaftswerbung.gmx.de
