RE: [nhibernate-development] NH-2583 - Query with || operator and navigations (many-to-one) creates wrong joins

Wenig, Stefan Tue, 12 Apr 2011 13:52:59 -0700

Actually, I think the L2S guys got it quite right, assuming that there just is 
no perfect choice here.

If I'm just repeating an old discussion here, just stop reading and say so. Or 
I'll just go on forever ;-)

I'm making some assumptions about the L2S team's rationales here. So take it 
with a grain of salt, but it's the only way for me to read any sense into it.

OK, here goes: L2S is a way to write SQL queries in C#. It is not made to turn 
SQL into a C# emulator.

Now that doesn't mean that there are no cases where emulating C# semantics in 
the DB wouldn't make any sense. It's just not the way to go in the general 
case. Why? 

Because you HAVE to know SQL in order to get anywhere useful. Just thinking in 
C# and letting some algorithm figure it out transparently is a broken 
abstraction. Just like transparent remoting and many other ideas that were 
tried and abandoned over time. If you just think in C# and depend on 
translations, your queries will not perform, you will have a hard time 
debugging queries, reading queries, modifying them ad hoc for experimentation 
etc.  

(That said, there are things in LINQ that translate horribly to SQL. You can 
always choose to avoid them. Or you might think that they are so much more 
powerful than plain SQL that they're worth the trouble.)

OK, so my assumption (and the L2S teams, I guess) is that in most cases you 
know what SQL you want, and you just use LINQ for ease of use, type safety etc. 

Now I want to write the following query:

SELECT ... FROM ... WHERE A = B

The easiest way to get there is to write where A == B in LINQ, right? But what 
if this gets translated to (A = B) OR (A IS NULL AND B IS NULL)? How will I get 
the SQL above? 

So I guess that explains that, and among all the imperfect choices we have it 
looks relatively good to me.

The other way is easier. If that's what I want, why, I'd just write it in LINQ. 
But wait, you say, how do I write A IS NULL in LINQ? That's where L2S gets a 
bit ugly. The L2S syntax for A IS NULL is just A == null. It's a special case 
you just need to know when using L2S.

Now I can get the SQL above by just writing (A == B) || (A == null && B == null)

This is far from perfect. And it caused some confusion for L2S. But every 
developer needs to know the null semantics of their programming and query 
languages, or they'll be confused. Who does not know a few coders who got 
caught in that trap in SQL?

I think the main mistake of L2S is in documentation. There's a lot of it, but 
what they'd need is a document called "Things you need to know about L2S before 
you use it (because you won't know when to look them up)".

Maybe IsNull() and IsNotNull() extension methods would have been a better way. 
There's room for debate. But I don't think it's necessarily a bad decision, 
just because it looks inconsistent. It's just an imperfect choice among several 
others.

There's the suggestion of applying the translation for == null to variables of 
value null too. The problem with this choice is that the same LINQ code 
generates different SQL depending on runtime values. If A == param gets 
translated to A = @param, I expect that translation to be stable. Next time I 
execute it, the generated SQL changes. How's that for consistency?

Why would I not always want the translation to (A = B) OR (A IS NULL AND B IS 
NULL)?

First, because it kills readability.

Second, depending on the RDBMS, there might be a performance penalty.

Third, because SQL semantics do have advantages. You said there's little in 
practice, but I think it's very obvious when you're comparing columns from two 
tables for a join. Now LINQ joins do not have this problem (at least on 
advantage of that limited syntax), but as soon as you build joins via WHERE 
clauses, you don't want to get the product of all tables that have NULL in 
these columns.

Would it be a good idea to have different semantics for comparing columns to 
variables vs. comparing to another column? Better than making the exception for 
null constants? I think it's a worse consistency, plus you'd still have the 
problem that you can't avoid the translation for variables. 

So, if we disregard the option of emulating C# in SQL, I think the L2S team 
made a pretty good choice. Just too bad they let everyone find it out 
themselves.

Does that make any sense?

Now I don't really have a problem with creating a LINQ provider that really 
does it the C# way. Harald seems to have it all covered, and there are 
situations where this makes a lot of sense. However, I'd not make this the 
default strategy. An alternative query factory method would probably be a good 
place to put it. (Or maybe you make it the default strategy and provide another 
that's closer to the metal, that would be a separate consideration.)

Oh, one more thing: Maybe translating a.Equals(b) the C#-emulation way makes 
sense. Nobody in their clear mind would write that if they just want the SQL to 
say a = b, right? That could make for a nice shortcut in LINQ.

Cheers,
Stefan

> -----Original Message-----
> From: [email protected] [mailto:nhibernate-
> [email protected]] On Behalf Of Patrick Earl
> Sent: Tuesday, April 12, 2011 5:34 PM
> To: [email protected]
> Subject: Re: [nhibernate-development] NH-2583 - Query with || operator
> and navigations (many-to-one) creates wrong joins
> 
> There was a fairly long discussion about null handling earlier on the
> list, and the general consensus was that == nullVariable should be
> translated to IS NULL.  This is a deviation from Linq to SQL, but in a
> way that seems to reduce confusion with no increase in complexity.
> The = NULL semantics are of limited use in practice.
> 
>         Patrick Earl
> 
> On Tue, Apr 12, 2011 at 8:09 AM, Harald Mueller
> <[email protected]> wrote:
> > Richard wrote:
> >> Just to clarify, I meant I expect (or am not at all surprised that):
> >>
> >>     .Where(u => u.UserName == nullVar)
> >>
> >> ... translates to:
> >>
> >>     session
> >>         .CreateQuery("from User u where u.UserName = :name")
> >>         .SetString("name", nullVar)
> >>
> >> ... which in turn generates the SQL:
> >>
> >>     WHERE u.UserName = @p0
> >>     /* with @p0 = NULL */
> >
> > Really?
> > My colleagues (using Linq2SQL) checked that
> >
> >   ....Where(u => u.UserName == null);
> >
> > really returns all "u" where the UserName is null - i.e.
> transformation to "IS NULL" works; then wrote code
> >
> >   ... FindWithUserName(string userName) {
> >        ....Where(u => u.UserName == nullVar);
> >   }
> >
> > then tested it in-memory with NUnit with both string and null values;
> then used it in the application - and searched for a few hours why they
> did not get back the same result as in their tests. That you have to
> write
> >
> >   if (userName == null) {
> >        ....Where(u => u.UserName == null);
> >   } else {
> >        ....Where(u => u.UserName == userName);
> >   }
> >
> > (not even a ?: inside the Where worked) was surprising. Also,
> ReSharper lets you happily inline or "outline" the null ... as almost
> anyone (<= that's me, of course) would expect.
> > Fortunately, NHibernate behaves "as expected by almost anyone ...
> ahem, by me"!
> >
> > On a more general level: As "boundary cases", there seem to be two
> "schools of thought" for Linq SQL providers (I might exaggerate a
> little):
> >
> > * The "Let's-SQLers" believes Linq to be a sort of "macro language to
> control SQL". For them, every operator is mainly (only?) interpreted as
> an SQL operator. It seems they would even go so far to redefine C#
> operators to "control" the resulting SQL. They miss < and > and the
> like keyword for strings and redefine the behavior for .Any to always
> be an inner join. They even know (or "know") which variables and
> constants are outer-parameterized and which are not and therefore can
> control remotely the DB planning performance of a statement via Linq!
> >
> > * The "Linq purists" believe that the following statements should
> *not* return a result:
> >
> >    obj.S = "abc "; // with trailing blank
> >    ...Where(obj => obj.S == "abc"); // without trailing blank
> >
> > For them, checking every- and anything against Linq2Objects is a
> must; they work test-driven in-memory and then complain when the
> database finds or does not find what their unit tests imply. They even
> know that && could be overloaded for a type so that it still returns
> bool - and then would be in heaven if (or when?) the Linq SQL provider
> will *not* create an SQL AND, but a call to some stored function
> implementing their && operator!
> >
> > I confess I tend to the latter belief; but I must accept that the
> code example as well as the overloading horror I described will not
> work on a real database: Trailing whitespace *is* ignored in SQL; and
> bool && *is* the SQL AND operator; period. Finding that "middle ground"
> is work ... and heated discussions.
> >
> > Regards - all the best!
> >
> > Harald
> >
> > --
> > GMX DSL Doppel-Flat ab 19,99 Euro/mtl.! Jetzt mit
> > gratis Handy-Flat! http://portal.gmx.net/de/go/dsl
> >

RE: [nhibernate-development] NH-2583 - Query with || operator and navigations (many-to-one) creates wrong joins

Reply via email to