Hi Patrick - Interesting :-) [all of the discussion!] - especially the argument
> Actually, a.P == a.Q is translated to ...P == ...Q.

Without even reading the arguments, I risk saying that the logic thus
defined is inconsistent, i.e. you can write (complex) queries where it is
unclear whether the result should be true or false. But if people can live
with this sort of "experimental logic" (i.e., program, try out what you
get, then re-program), then so be it. EF and Linq2SQL are the same sort of
thing ... and therefore making them a standard is a little bit
"interesting". (SQL and C#, on the other hand, have a consistency that can
be formally argued.) But as I said: I can live with any sort of semantics.

What I'd ask *you* (and/or others) to do before I change the code: Rewrite
the test cases (which are now, in the best tradition of TDD and "test
first", "specificational tests") so that they match your semantics. What I
do not want is to change the code *and* the tests myself according to how
I believe you meant it to behave - and then have a debate about whether
this was "right" or not. We are now lucky to have those 74 test cases I
wrote (which happily test *all combinations*, so one could argue they are
more like 300 or so test cases) - and many of them test null logic - so
you have the chance to describe, at least for all these expressions and
value combinations, what we should get. If you think that agreement with
(the current version of) EF is paramount, you could simply run them
through EF and say "those are the results we need". I'll then change the
code so that it conforms to what you want!

Some remaining remarks inline ...

[...]

> > There are more anomalies, like the &&-commutativity and
> > ||-commutativity, which are apparently accepted by all ...
>
> What are you referring to here regarding the commutativity?

Oh - just a little thing: && in Linq2Objects is not commutative, but a
short-circuit operator. So

    a.B != null && a.B.P == 4

is *not* the same as

    a.B.P == 4 && a.B != null

On the other hand, C# has always had a commutative logical operator &:

    a.B.Q == 3 & a.B.P == 4

is *exactly* the same as

    a.B.P == 4 & a.B.Q == 3

For reasons which I do not know, we *do* use the non-commutative &&
operator in Linq expressions intended to be translated to the commutative
SQL AND operator - but we do not support the (intentionally commutative) &
operator. Same for || and |. It's just interesting how the eco-system of a
language gets used down the road ...
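Here is a minimal Linq2Objects sketch of that difference (the classes and
values are made up for illustration):

    using System;
    using System.Linq;

    class B { public int P; }
    class A { public B B; }

    static class CommutativityDemo
    {
        static void Main()
        {
            var items = new[] { new A { B = null },
                                new A { B = new B { P = 4 } } };

            // The null check short-circuits &&, so the dereference is
            // guarded: no exception, one match.
            Console.WriteLine(items.Count(a => a.B != null && a.B.P == 4)); // 1

            // Swapped operands: a.B.P is evaluated first and throws for
            // the element whose B is null.
            try
            {
                Console.WriteLine(items.Count(a => a.B.P == 4 && a.B != null));
            }
            catch (NullReferenceException)
            {
                Console.WriteLine("&& is order-dependent in Linq2Objects");
            }

            // The non-short-circuit & evaluates both operands in either
            // order, so swapping them can never change the outcome.
        }
    }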
[...]

> > If you really want to convince me (and probably many others), you have
> > to write down the *disadvantages* of our proposal. If they are
> > visible - i.e., there are examples for them - then one can see and
> > accept them.
>
> These are the disadvantages I see, though they're not particularly
> formal.

I missed a crucial letter, it seems: I want to know the disadvantages of
*your* proposal in order to accept it. I know the disadvantages of mine
:-) - except one: that keeping to a (maybe somewhat flawed) EF or Linq2SQL
semantics is better than having a ... mhm, let me say "straight" semantics
(I'd like to write "consistent" - but I did not provide consistency proofs
for (||-5) either ...). Anyway, you summed up the problems quite nicely
--> so what now about the disadvantages of your "outer join proposal"?

> Of these, the first is of greatest importance. While the semantics
> for a.B.C.D == null could be argued in either direction, the fact that
> (x!=y) is not equivalent to !(x == y) is rather disturbing from a user
> perspective IMHO.

Just to re-iterate: We are *only* talking about the cases where
Linq2Objects would throw an exception. In all other cases, this *is*
equivalent. And the answer is: No, it's not disturbing for us. We have not
had one problem with it over the years. Have you ever programmed against
such a model?

> To me, both of the following mean a.B.C exists and is not equal to 1.
> I would not expect to get rows where a.B is null when I perform the
> second query. Mixing existence and equality is confusing IMHO.
>
>     a.B.C != 1
>     !(a.B.C == 1)

I see your point. I'm also quite sure that with this reasoning, you can
get contradictory results ... here is an attempt to give an example, just
so that you can see how one might argue:

Let's say you can navigate

* from a to B and C, and C is (in your application) an integer >= 0; and
* from a to X and Y, and Y is also an integer >= 0.

The starting point of NH-2583 was that for

    a.B.C != 1 || a.X.Y != 1        [X]

to become true, it should be allowed that a.B is null when a.X.Y != 1. In
other words, objects where a.B is null, but a.X != null and a.X.Y != 1,
should be found by that query. By de Morgan (which holds in both C# and
SQL), this is equivalent to

    !(a.B.C == 1 && a.X.Y == 1)     [Y]

I am not yet sure whether this includes "a.B != null && a.X != null" in
your semantics. But I think we agree that under the conditions above,

    a.B.C == 1 && a.X.Y == 1

is equivalent to

    a.B.C * a.X.Y == 1

(the latter follows from the former; but also the other way round - the
only way to get a product of one from non-negative integers is one times
one). So the whole condition [Y] is equivalent to

    !(a.B.C * a.X.Y == 1)

or, by your expectation,

    a.B.C * a.X.Y != 1              [Z]

I assume now that you would expect that this implicitly includes the fact
that both a.B and a.X are *not* null. But now we have a contradiction: The
same condition, just formulated differently,

* on the one hand should allow that a.B is null (at [X]);
* on the other hand should imply that a.B is not null (at [Z]).

I do not say that you cannot define the operator logic like this. But
then one should probably add the "de Morgan anomaly" to the documentation
(i.e., in contrast to C# and SQL, de Morgan no longer holds). Personally,
to me this logic sounds "risky": At first glance, it is maybe more natural
than others. But when you start arguing about more complex conditions, you
get into muddy water like the above. Maybe you can "define my example
away" - but logic has shown us again and again that the more you keep to
the fundamental laws of propositional and predicate logic, the fewer
problems you get ...
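To see the contradiction as actual queries, here is a hedged sketch (the
entity classes are made up to match the navigations above; 'query' stands
for any provider's IQueryable<AEnt>):

    using System.Linq;

    class BEnt { public int C; }   // C is an integer >= 0
    class XEnt { public int Y; }   // Y is an integer >= 0
    class AEnt { public BEnt B; public XEnt X; }

    static class DeMorganSketch
    {
        static void Compare(IQueryable<AEnt> query)
        {
            // [X]: per NH-2583, this should also find objects where
            // a.B is null but a.X.Y != 1.
            var x = query.Where(a => a.B.C != 1 || a.X.Y != 1);

            // [Y]: de Morgan applied to [X] - in C# and SQL, exactly
            // the same rows as [X].
            var y = query.Where(a => !(a.B.C == 1 && a.X.Y == 1));

            // [Z]: with C >= 0 and Y >= 0 this product test is
            // equivalent to [Y] - but under "!= implies existence" it
            // suddenly demands a.B != null, contradicting [X].
            var z = query.Where(a => a.B.C * a.X.Y != 1);
        }
    }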
It reminds me somewhat of my children asking "but why is 10 to the zeroth
power equal to 1? If you don't write down 10 at all, it should be zero!",
to which the answer is (you would probably say: against "intuition" and
"simplicity"): "If you want a consistent system where you are free of
surprises later, it is just better to have the zeroth power of every
non-zero number be one. Look, here, I can give you examples of what fails
if you take your definition of 10^0 = 0 ... like the failure of
a^x * a^y = a^(x+y): with 10^0 = 0, the product 10^0 * 10^1 would be 0
instead of 10^(0+1) = 10."

Anyway, although I have never seen a programming language that risks such
effects, it is interesting to see that you go for it! You are maybe deeper
into language semantics than I am, and so you know what you do!

> You are correct that we don't share the same assumption here.
> However, my assumption comes not from looking at the source
> expression, but in assuming the outer join operating model and working
> backwards.

An interesting method - writing the compiler before defining the language
;-) ... just being a little nasty. But yes, that's exactly what I did with
my (||-3) "definition"! So you are probably also thinking about examples
of how this can fail - that's how I found the or-sum-anomaly example. Have
you found any other interesting effect?

> Here are the latest reasons:
>
> >> 1. Significantly simpler implementation.
> >
> > That could be the case --> let's try it (although I'm not yet
> > convinced - see below after 5.).
>
> Well, as far as I can see, the most trivial implementation just uses
> outer joins everywhere and neither adds nor removes any clauses. This
> should require almost no code.

Well, the current implementation of NHibernate is even more trivial!

> The thing that I have no real concept of is how difficult it is to
> optimize to inner joins when possible.

Ye(eeee?)s - I did not think it through. I guess that's essentially
optimization B (unless you accept the or-sum-anomaly). Probably about as
complex as the current implementation - after all, you have to distinguish
the operators (below || and ! and ?:, you need possible outer joins; you
then lift them up, and if they come out on top together, you can transform
them to inner joins). So not that complex, but also not that easy.

[...]

> > Linq to RDBMS does have different semantics than Linq to Objects,

That's probably the core observation. In my BEHAV-1...BEHAV-4, I argued
differently: There should be *no* difference in the semantics of a query
when there is no exception in Linq2Objects. If this cornerstone falls,
there is no reason not to adopt arbitrary semantics.

> By easier to understand, I just meant the generated query with fewer
> clauses; I wasn't referring to the programming model. I could argue
> here that II-4 is satisfactory to many developers, since it's being
> used by Linq to SQL and EF and there doesn't seem to be much
> discussion about it.

Did you talk to people using Linq to SQL or EF about 3-valued logic,
logical consistency and anomalies? All people I have met to this day, when
shown the effects (e.g. the famous "null" effect in Linq to SQL, where
replacing a constant null with a variable of value null creates wrong
SQL), at some point said something like "Well, maybe in the next version
Microsoft will iron out those problems" and "we did not write complex
queries up to now; mostly, we avoid the not operator" etc. NHibernate has
seen so many more applications and interesting cases than any Linq
system, I'd say. Still, you are right - people accept quite haphazard
logic, because with some rewriting you get most things to work.
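For readers who have not seen that effect: a hedged sketch (the entity
and the query source are made up):

    using System.Linq;

    class Customer { public string Name; }

    static class NullEffectSketch
    {
        // 'customers' stands for a Linq to SQL table, e.g. a
        // Table<Customer> obtained from a DataContext.
        static void Show(IQueryable<Customer> customers)
        {
            // Constant null: rendered as "WHERE [Name] IS NULL", which
            // finds the rows with a missing name.
            var q1 = customers.Where(c => c.Name == null);

            // Variable of value null: rendered as "WHERE [Name] = @p0"
            // with @p0 = NULL - under SQL's 3-valued logic this matches
            // nothing, although Linq2Objects treats both queries alike.
            string name = null;
            var q2 = customers.Where(c => c.Name == name);
        }
    }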
> While I don't have time to think about it right now, it may also be
> worth considering what happens when writing something like
> a.B.C == a.B.D instead of a.B.C == 1.

This could tie in with my example above ... maybe I'll find time on my
commute to think about it!

Best regards, and thanks for a not-at-all-easy discussion!!!

Harald