Hi Harald.

Thanks again for your continued thoughtful analysis.

More and more I believe that just using outer joins is the best
technique.  Here are the latest reasons:
1.  Significantly simpler implementation.
2.  Same semantics when null references are not present.
3.  SQL semantics when null references are present (as opposed to
semantics that are neither SQL nor Object)
4.  If existence guards are added manually to protect against NREs, it
behaves like Linq to Objects.
5.  Simpler queries, likely leading to performance improvements, and
at the very least being easier to understand.
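
To make point 4 concrete, here is a minimal Linq to Objects sketch. The
EntityA and Primitive classes and the GuardDemo helper are hypothetical
stand-ins invented for this example; they are not the actual model in
NHibernate.Linq.Development, and the sample data is made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the test model (assumption, not the real classes).
public class Primitive
{
    public int Id { get; set; }
    public string String { get; set; }
}

public class EntityA
{
    public int Id { get; set; }
    public Primitive Primitive { get; set; }   // may be a null reference
}

public static class GuardDemo
{
    // Made-up data: entity 3 has no Primitive at all.
    public static List<EntityA> SampleData()
    {
        return new List<EntityA>
        {
            new EntityA { Id = 1, Primitive = new Primitive { Id = 10, String = null } },
            new EntityA { Id = 2, Primitive = new Primitive { Id = 11, String = "abc" } },
            new EntityA { Id = 3, Primitive = null },
        };
    }

    // No guard: Linq to Objects dereferences e.Primitive and throws a
    // NullReferenceException as soon as it reaches entity 3.
    public static List<int> Unguarded(IEnumerable<EntityA> source)
    {
        return source.Where(e => e.Primitive.String == null)
                     .Select(e => e.Id)
                     .ToList();
    }

    // Manual existence guard: the null reference is filtered out before
    // the property access, so the query is well defined in memory too.
    public static List<int> Guarded(IEnumerable<EntityA> source)
    {
        return source.Where(e => e.Primitive != null && e.Primitive.String == null)
                     .Select(e => e.Id)
                     .ToList();
    }
}
```

Calling GuardDemo.Guarded(GuardDemo.SampleData()) yields only entity 1,
while GuardDemo.Unguarded on the same data throws an NRE. Assuming a plain
outer join translation, I would expect the unguarded condition to also
match entity 3 in SQL (the joined column is NULL there), and the guarded
one to match only entity 1, the same as in memory.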

I'm actually quite interested in seeing a version of your code that
just uses the outer joins and optimizes to inner joins when possible.
My hope is that you could indeed drop significant portions of it
without the existence guards.

BTW, feel free to add data and tests to the
NHibernate.Linq.Development project.  I'll quickly integrate pull
requests.

       Patrick Earl

On Sun, Apr 3, 2011 at 3:24 AM, Harald Mueller <[email protected]> wrote:
> Hi Patrick, all,
>
> That checking project is great! I thought about writing one, but have not 
> yet found the time.
>
> Regarding the existence guards (or, equivalently, the difference between the 
> (||-4) and (||-5) semantics): EF (I only tested this) exhibits exactly what I 
> say, namely the "projection anomaly". Write the following test:
>
>  [Test]
>  public void ProjectionAnomalyDoesNotOccurWhenConditionOnNullIsFalse()
>  {
>      Database db = DatabaseHelper.Create();
>      var directPrimitives = db.Primitives
>          .Where(p => p.Decimal == 0m)
>          .ToList();
>      var projectedPrimitives = db.EntityAs
>          .Where(e => e.Primitive.Decimal == 0m)
>          .Select(e => e.Primitive)
>          .ToList();
>      CollectionAssert.AreEquivalent(
>          ToIdList(directPrimitives),
>          ToIdList(projectedPrimitives));
>      Assert.AreEqual(4, directPrimitives.Count);
>  }
>
> in LogicalJoinTests.cs (where ToIdList is simply:
>  private static IEnumerable<int> ToIdList(IEnumerable<Primitive> primitives)
>  {
>      return primitives.Select(p => p.Id).ToList();
>  }
> ).
>
> The test will succeed. (Please note: I set up a test that *does* return a 
> result, i.e., we get back 4 Primitive objects, so "everything gets evaluated 
> somewhere". On another note, I did not add .Distinct() as my document 
> requires, because in Patrick's test data there is no Primitive 
> that is reached from two EntityAs.)
>
> Now, write a new test ProjectionAnomalyDoesOccurWhenConditionOnNullIsTrue 
> where you replace the condition p.Decimal == 0m with some condition that 
> yields true on nulls. For Patrick's test data, that could e.g. be "p.String 
> == null". As this condition is also true for all 4 Primitive objects, one 
> could argue that we should get back the same result as in the first test. 
> However, the test now crashes with an NRE.
>
> If your condition contains parameters, whether you get this behavior depends 
> on the parameter values. E.g., if Patrick replaces one Primitive's String 
> value with "abc" and our application then uses the condition
>
>    .Where(...String == param)...
>
> where param is provided by the application and can sometimes be "abc", 
> sometimes null - then the projected query will sometimes work and sometimes 
> not, although the non-projected query will always work.
>
> But: This "projection anomaly" happens *only* with expressions that will 
> throw an exception in Linq2Objects anyway - as will the non-equivalence of 
> !(...==...) and ...!=...
>
> So extensive testing (which should include not-null as well as null values) 
> should find these problems.
> And they can always be corrected by manually adding the existence guard - so 
> that the queries will also work under Linq2Objects!
>
> I, personally, would still keep the existence guards - this is based on our 
> framework, where some 20 people work on a business application of some 2.5 
> million lines of code; where we use an HQL translation including existence 
> guards. But actually, I accept that (||-4) is also a "reasonable" semantics - 
> especially as it exhibits "SQL three-valued null semantics" also for 
> expressions that are undefined under Linq2Objects, so that !(path == value) 
> is the same as path != value also in these semantics-extending cases.
>
> I'll take a look into the code I wrote - maybe I could remove quite a bit of 
> it when existence guards are dropped.
>
> Regards
> Harald
