Comments inline.

On Sun, Apr 3, 2011 at 11:19 AM, Harald Mueller <[email protected]> wrote:
> In contrast to (||-4), (||-5) gives you the following (I say simple) 
> semantics:
>
> * Some condition using properties of an object can only be true if the object 
> "is there".
> * No "projection anomaly".


I don't disagree that these are nice properties that simplify the
programming model for some cases.


> But, as I and you noted, for Linq2Objects-transgressing expressions we get:
>
> * "Inverse operator anomaly": !(...==...) is not (...!=...)
>
> There are more anomalies, like the &&-commutativity and ||-commutativity, 
> which are apparently accepted by all ...


What are you referring to here regarding the commutativity?


> I would be easier to convince if you gave me a somewhat thorough treatment of 
> your logic. At least you should demonstrate what happens in (||-4) with 
> De Morgan and associativity - obviously, I'll have to do the same for (||-5) 
> (for SQL (triple valued) logic, we all know how to do that; therefore, it 
> should also be done for another query logic).
>
> If you really want to convince me (and probably many others), you have to 
> write down the *disadvantages* of our proposal. If they are visible - i.e., 
> there are examples for them -, then one can see and accept them.


These are the disadvantages I see, though they're not particularly formal.

1. It differs from Linq to SQL and EF, which can lead to "incorrect"
results when code depends on the behavior of the framework.  For
example, there is a growing ecosystem of code that simply generates
Linq expressions and expects certain results.  Unless there is strong
reason to differ, the default should be to conform to the established
behavior to prevent problems when using third party libraries and to
lessen the learning curve for those working with the "Linq to RDBMS"
style technologies.

2.  Introduces additional clauses which lead to:
        * Inverse operator anomaly
        * Increased processing for the query engine (obviously no
          benchmarks were performed to back this up)
        * Increased SQL query complexity (for those actually reading
          the generated code)
        * Decreased performance in generating the query, since the
          existence clauses need to be optimized out.

Of these, the first is of greatest importance.  While the semantics
for a.B.C.D == null could be argued in either direction, the fact that
(x!=y) is not equivalent to !(x == y) is rather disturbing from a user
perspective IMHO.

3.  Regarding a formal definition, the logic operators behave in the
expected way, regardless of whether there are inversions,
disjunctions, or conjunctions.  It's just standard SQL boolean logic
at that point.  The difference is that the implied existence is
factored out of the expression rather than kept inside it.  To me, both
of the following mean a.B.C exists and is not equal to 1.  I would not
expect to get rows where a.B is null when I perform the second query.
Mixing existence and equality is confusing IMHO.

a.B.C != 1
!(a.B.C == 1)
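
To make the claim above concrete, here is a minimal sketch using SQLite
from Python as a stand-in for the database.  The tables A and B, the
column names, and the helper ids() are assumptions made up for this
illustration; they are not taken from NHibernate or any real mapping.
With a plain outer join and standard SQL three-valued logic, both forms
filter out the rows where a.B is null or a.B.C is null.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE B (id INTEGER PRIMARY KEY, C INTEGER);
    CREATE TABLE A (id INTEGER PRIMARY KEY, b_id INTEGER);
    INSERT INTO B VALUES (1, 1), (2, 2), (3, NULL);
    -- A4 has no B at all (a.B is null); A3 has a B whose C is null.
    INSERT INTO A VALUES (1, 1), (2, 2), (3, 3), (4, NULL);
""")

def ids(where):
    """Run the predicate over A outer-joined to B and return matching A ids."""
    sql = ("SELECT a.id FROM A a LEFT OUTER JOIN B b ON a.b_id = b.id "
           "WHERE " + where + " ORDER BY a.id")
    return [row[0] for row in conn.execute(sql)]

not_equal = ids("b.C <> 1")           # a.B.C != 1
negated_equal = ids("NOT (b.C = 1)")  # !(a.B.C == 1)
print(not_equal, negated_equal)
```

Both queries return only the row whose C is a concrete value other than
1, so in this setup (x != y) and !(x == y) stay equivalent.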


> One item which might lead me to give in to you:
>
> There is actually *one* implicit assumption which I never questioned - and 
> which you apparently do not share: That very simple conditions like
>
>    .Where(a => a.B.C.P == ...)
>
> should use inner joins. I grew up in a world where inner joins were 
> significantly cheaper than outer joins ... that's why. If we really use outer 
> joins *all the time*, then there is no or-sum-anomaly, because simple 
> conditions like
>
>    .Where(a => a.B.C.P == null)
>
> would return objects where B or B.C is null. (||-4) might be a way to go then 
> ...


You are correct that we don't share the same assumption here.
However, my assumption comes not from looking at the source
expression, but in assuming the outer join operating model and working
backwards.  Indeed, in cases like a.B.C.P == 1, it should generate
inner joins for performance reasons.  This will be the most common
simple case anyway.


>> More and more I believe that just using outer joins is the best
>> technique.  Here are the latest reasons:
>> 1.  Significantly simpler implementation.
>
> That could be the case --> let's try it (although I'm not yet convinced - see 
> below after 5.).


Well, as far as I can see, the most trivial implementation just uses
outer joins everywhere and neither adds nor removes any clauses.  This
should require almost no code.  The thing that I have no real concept
of is how difficult it is to optimize to inner joins when possible.
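
As a rough illustration of how little the trivial approach needs, the
sketch below turns a dotted navigation path into a chain of outer joins
and adds no other clauses.  Everything in it is hypothetical: the
function name outer_joins_for, the alias scheme, and the schema
convention (a table named after each navigation property, joined via a
"<Name>_id" foreign key) are assumptions for the example and do not
describe the actual NHibernate Linq provider.

```python
def outer_joins_for(path):
    """Return (joins, column) for a dotted path such as 'a.B.C.P'.

    Assumed convention (hypothetical): each navigation property N maps to
    a table N, reached through a foreign key column N_id on the parent.
    """
    root, *steps = path.split(".")
    *navigations, prop = steps
    joins = []
    alias = root
    for depth, nav in enumerate(navigations, start=1):
        next_alias = f"{nav.lower()}{depth}"
        joins.append(
            f"LEFT OUTER JOIN {nav} {next_alias} "
            f"ON {alias}.{nav}_id = {next_alias}.id"
        )
        alias = next_alias
    # The final step is the scalar property read from the last joined table.
    return joins, f"{alias}.{prop}"

joins, column = outer_joins_for("a.B.C.P")
for join in joins:
    print(join)
print("WHERE", column, "= 1")
```

Deciding when one of these joins can safely become an inner join is the
part this sketch deliberately leaves out.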


>> 3.  SQL semantics when null references are present (as opposed to
>> semantics that are neither SQL nor Object)
>
> I think this is wrong reasoning. Linq (like other, older OQLs) is a 
> *navigational query language*. There is no "SQL semantics" in it - never and 
> nowhere. For example, in Linq it is true that null == null (and therefore I 
> expect that SQL providers do translate the expression a.P == a.Q as
>
>    (...P = ...Q OR ...P IS NULL AND ...Q IS NULL)
>
> - everything else is obviously wrong (under BEHAV-1 ... BEHAV-4); I have not 
> tested whether NHib's Linq and EF's provider is correct in this respect). The 
> arguments about "SQL semantics" come (a) from some misguided and unfortunate 
> sentences in the C# docs when Nullables were introduced; and (b) that 
> "implementation thinking" where SQL semantics is allowed to "hi-jack" the 
> Linq semantics because "this is 'simpler'". But it is only "simpler" if you 
> toggle between Linq and SQL semantics erratically for trivial examples. 
> Instead, one should think about whole *classes of queries* (that's how I 
> found the projection anomaly) and consider the semantics.
> Again: There is no SQL semantics for a.B.C.P == null - simply because there 
> is no "navigation a.B.C" in SQL.


Actually, a.P == a.Q is translated to ...P = ...Q.  There was a long
discussion about this in an earlier thread.

     http://216.121.112.228/browse/NH-2402

Linq to RDBMS does have different semantics than Linq to Objects, and
while this is certainly annoying to some developers, it also tries to
provide a simple path from the ternary logic used in SQL to the binary
logic used in .NET.
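
The difference between the two translations can be seen in a small
sketch, again using SQLite from Python.  The table A, its columns P and
Q, and the helper ids() are assumptions for this example, and Python's
None comparison merely stands in for the in-memory null == null
behavior; none of this is output from a real Linq provider.

```python
import sqlite3

rows = [(1, 1, 1), (2, 1, 2), (3, None, None), (4, 1, None)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (id INTEGER PRIMARY KEY, P INTEGER, Q INTEGER)")
conn.executemany("INSERT INTO A VALUES (?, ?, ?)", rows)

def ids(where):
    """Return the ids of rows in A matching the given SQL predicate."""
    sql = f"SELECT id FROM A WHERE {where} ORDER BY id"
    return [r[0] for r in conn.execute(sql)]

# Plain translation: NULL = NULL evaluates to unknown, so row 3 is dropped.
plain = ids("P = Q")

# Null-safe translation from the quoted text above.
null_safe = ids("P = Q OR (P IS NULL AND Q IS NULL)")

# In-memory comparison, where None == None is true.
in_memory = [i for i, p, q in rows if p == q]

print(plain, null_safe, in_memory)
```

Only the null-safe form agrees with the in-memory result; the plain
form omits the row where both values are null.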


>> 4.  If existence guards are added to protect against NREs manually, it
>> behaves like Linq to Objects.
>
> That's the same for both semantics.


Yes, but in (||-5), the extra checks should ideally be optimized out,
leading to extra implementation complexity.


>> 5.  Simpler queries, likely leading to performance improvements, and
>> at very least being easier to understand.
>>
>
> Well - I should march up my 20 developers, shouldn't I ;-) ? Do you have a 
> group of people who have programmed against (||-4) and are satisfied with it 
> = did not come to you [or whoever else] and ask about why they find objects 
> they are not expected to see? I did not invent that (||-5) semantics as a 
> theorist ... all that has been used for a few years, and therefore I say that 
> it is "simpler", "more efficient" (many inner joins) and "easier to 
> understand".


By easier to understand, I just meant the generated query with fewer
clauses; I wasn't referring to the programming model.  I could argue
here that (||-4) is satisfactory to many developers, since it's being
used by Linq to SQL and EF and there doesn't seem to be much
discussion about it.


> But I do not see a constructive way how we can settle our different opinions 
> what is "simpler" and "easier to understand". Do you???
>
>> I'm actually quite interested in seeing a version of your code that
>> just uses the outer joins
>
> That should be easy ...
>
>> and optimizes to inner joins when possible.
>
> ... this is not that easy - but I'll try it out!: In the moment where there 
> is a navigation ("a member expression at depth > 1"), the outer join must 
> remain - otherwise you get the or-sum-anomaly.


As far as I understand, the or-sum-anomaly only happens when you're
checking against null.  You can use inner joins when there are no
"contradictory" disjunctions and you're comparing against a concrete
value.
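
A small sketch of that distinction, using SQLite from Python with a
single navigation step for brevity.  The tables A and B, the column P,
and the helper ids() are assumptions for the example only.  Comparing
against a concrete value gives the same rows with either join type,
while a null check does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE B (id INTEGER PRIMARY KEY, P INTEGER);
    CREATE TABLE A (id INTEGER PRIMARY KEY, b_id INTEGER);
    INSERT INTO B VALUES (1, 1), (2, NULL);
    -- A3 has no B at all (a.B is null).
    INSERT INTO A VALUES (1, 1), (2, 2), (3, NULL);
""")

def ids(join, where):
    """Return matching A ids for the given join type and predicate."""
    sql = (f"SELECT a.id FROM A a {join} B b ON a.b_id = b.id "
           f"WHERE {where} ORDER BY a.id")
    return [r[0] for r in conn.execute(sql)]

# Concrete value: the join type makes no difference to the result.
inner_value = ids("INNER JOIN", "b.P = 1")
outer_value = ids("LEFT OUTER JOIN", "b.P = 1")

# Null check: the inner join loses the row whose B is missing.
inner_null = ids("INNER JOIN", "b.P IS NULL")
outer_null = ids("LEFT OUTER JOIN", "b.P IS NULL")

print(inner_value, outer_value, inner_null, outer_null)
```

This only covers a single predicate; disjunctions that combine a value
comparison with a null check would need the outer join to be kept.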

While I don't have time to think about it right now, it may also be
worth considering what happens when writing something like a.B.C ==
a.B.D instead of a.B.C == 1.

        Patrick Earl
