Hi Patrick - then let me jump in with a list of items on my mind:
a. Let's talk about design & code!
b. Dialects / database vendors to support
c. Still create table joins? or always build inner joins?
d. Unit testing vs. system testing
e. A first list of concrete tests to be done
Ad a. I think it is necessary to explain the design and code (decisions). Here
is the most important item: I try to minimize the number of clauses and left
joins. The most important part of this is the attempt to "pull up" the
"existence clauses" as high as possible. What are "existence clauses?" Well, as
I wrote in a previous posting, the whole idea is to rewrite (Linq)
... || ...a.b.c.d.Prop...
to (HQL)
... LEFT JOIN a.b AS a_b
... LEFT JOIN a_b.c AS a_b_c
... LEFT JOIN a_b_c.d AS a_b_c_d
...
WHERE
... OR (a_b IS NOT NULL AND a_b_c IS NOT NULL AND a_b_c_d IS NOT NULL
AND (...a.b.c.d.Prop...))
Thus, the objects on the "path" a.b.c.d are outer joined so that the ||
operator can still "succeed on the left side". But before anything is evaluated
for a.b.c.d on the right side, it is guaranteed that the objects exist by the
IS NOT NULL clauses. I call these clauses "existence clauses".
They are especially important in the case
... || a.b.Prop == null
which can NOT be transformed simply to
... LEFT JOIN a.b AS a_b
...
WHERE
... OR a_b.Prop IS NULL
because this will obviously be true also when a_b is not at all present.
Analyzing the condition for IS NULL etc. is in my experience (we tried it)
futile, as you can have arbitrarily complex conditions like !(!(!(a.b.Prop !=
null))). So the (or rahter, one possible) correct rewrite is
... LEFT JOIN a.b AS a_b
...
WHERE
... OR (a_b IS NOT NULL AND (a_b.Prop IS NULL))
Naive rewriting of this sort has a problem with a query expression like
a.b.Prop == 4 || a.b.Prop == 5
This would yield
... LEFT JOIN a.b AS a_b
...
WHERE
(a_b IS NOT NULL AND (a_b.Prop == 4))
OR (a_b IS NOT NULL AND (a_b.Prop == 5))
Obviously, neither the left join nor the existence clauses are necessary for
this frequently occurring pattern. Therefore, the code "pulls up" existence
clauses that occur on both sides of an ||.
However, I do *not* go the extra length of reordering clauses. For example,
(a.b.Prop == 4 || a.OtherProp == 100) || a.b.Prop == 5
will create left join + existence clauses for a.b.
... "and so on": I could write here or in comments more about examples and
considerations. However, maybe now you tune in with questions and comments of
yours!
Ad b. Right now, I only test against SQL Server 2008 (Express and Standard).
The || problem is independent of that. Against which databases should I (or we)
test such changes? Something like SQL Server + MySql would sound useful to me
...
Ad c. Right now, my code has an if at the end where it essentially says "if I
did not produce an outer join, then don't do anything different" = "use table
joins."
I did this to not change the behavior in all cases that already work correctly
- which could be important for applications that have tuned their indices to
query plans resulting from the table joins. Personally, I'd prefer to have "the
one and only" join method - i.e., always create inner and outer joins, never
table joins.
Ad d. "System tests," for this code, are tests that set up a database and check
whether the results are correct.
"Unit tests" would be tests that test only a single inner unit, e.g. the
AddJoinReWriter or even smaller parts.
I have the gut feeling that with the non-trivial algorithms, I'd like to also
write unit tests. Seeing problems only through the lens of the subsequent HQL
machine is not enough, neither for exploratory testing (if someone else wants
to understand what goes on) nor for bug fixing. Do you agree? ... although
this, of course, adds more work.
Ad e. Last, but not least, here a quickly written list of possible test cases
(inputs and partial output expectations) that should cover the code (which
could be checked with a coverage tool). The notation is:
- A, B, ... are Linq clauses
- xyzP is shorthand for a path x.y.P, where
-- y and z are :1 associations, and
-- P denotes one or more simple properties.
- IJ means "expected to be inner joined *without* existence clauses"
- OJ means "...to be outer joined *with* existence clauses"
For example, the first test case
(A && B) && (C || D)
// xyP in A, C --> xy IJ
could be expanded to
session.Query<MyBO>().Where(x =>
(x.BO1.I1 == value1 && x.Flag1)
&&
(x.BO1.I2 == value2 || x.Flag2)
For this expression, it would now be necessary to do the following tests:
(a) Is the joining as designed - i.e., is BO1 joined via an inner join?
(b) Is the complete rewrite ok? In other words, one should test via "condition
coverage," i.e., produce enough data so that each side of each && and ||
becomes true and false. I'd hope that there is some sort of tooling for this
(maybe another Linq tree walker? :-) ) - would you know any?
Towards the end, the list gets more and more sketchy ...
// 1. One || tree: (A && B) && (C || D)
// 1a. One path
// xyP in A, C --> xy IJ
// xyP in A, C, D --> xy IJ
// xyP in C, D --> xy IJ
// xyP in C --> xy OJ
// 1b. Two paths
// xyP in A, C; rsQ in C --> xy IJ, rs OJ
// xyP in A, C; rsQ in C, D --> xy IJ, rs IJ
// xyP in C, D; rsQ in C, D --> xy IJ, rs IJ
// xyP in C, rsQ in C --> xy OJ, rs OJ
// xyP in C, rsQ in D --> xy OJ, rs OJ
// 1c. Path and subpath
// wxyP in A, C --> IJ wx, IJ wxy
// wxy in C, wxyP D --> IJ wx, OJ wxy
// wxy in A, wxyP D --> IJ wx, OJ wxy
// 1d. Partially overlapping paths
// wxyP in A, C --> IJ wx, IJ wxy
// wxvP in C, wxyP D --> IJ wx, OJ wxv, OJ wxy
// wxvP in A, wxyP D --> IJ wx, IJ wxv, OJ wxy
// 2. ORDER BY: (A && B) && (C || D) ORDER BY E
// xyP in E --> xy OJ
// ...and all of 1a. with xyP in E
// 3. SELECT: (A && B) && (C || D) SELECT F
// xyP in F --> xy OJ
// xy in A, xyP in F --> xy OJ
// xyP in A, xyP in F --> xy IJ
// xyP in A, xy in F --> xy IJ
// ...and all of 1a. with xyP in F
// 4. SELECT and ORDER BY: (A && B) && (C || D) ORDER BY E
SELECT F
// ...all of 3. with xyP in E
// 5. Two || trees side by side: (A || B) && (C || D)
// 5a. One path
// xyP in A, C --> xy OJ
// xyP in A, C, D --> xy IJ
// xyP in C, D --> xy IJ
// xyP in C --> xy OJ
// 5b. Two paths
// xyP in A, C; rsQ in C --> xy OJ, rs OJ
// xyP in A, C; rsQ in C, D --> xy OJ, rs IJ
// xyP in A, B; rsQ in C, D --> xy IJ, rs IJ
// xyP in C, rsQ in C --> xy OJ, rs OJ
// xyP in A, C, D; rsQ in A, B, D --> xy IJ, rs IJ
// 5c. Path and subpath
// wxyP in A, C --> OJ wx, OJ wxy
// ...and many more...???...
// ...also partial paths as IJ, then rest as OJ...???...
// 6. Two || trees side by side and an &&: (A || B) && (C
|| D) && (E && F)
// ...Idea: E filters out OJ from A and C ???
// ...???...
// 7. Nested || trees: (A || B) || (C || D)
// ...TBD...
// 8. Deeply nested || trees: (A || (B || (C || D)))
// ... all of 7. (semantically equivalent! - but not for
the tree walking algorithms)
// 9. Any: Any(q => A || B)
// ???
// 10. Nested Any: Any(e => A || Any(f => B)
// ???
// 11. Nested Any with ||: Any(e => A || Any(f => B || C)
// ???
// 12. ...Join???, Group???...
So much for some input.
Best regards
Harald M.
-------- Original-Nachricht --------
> Datum: Wed, 23 Mar 2011 15:57:02 -0600
> Von: Patrick Earl <[email protected]>
> An: [email protected]
> Betreff: Re: [nhibernate-development] NH-2583 - Query with || operator and
> navigations (many-to-one) creates wrong joins
> Hi Harald.
>
> I'd like to discuss it with you as well, though I haven't had time to
> get my brain into the problem yet. I've seen your issue and
> appreciate the effort you're putting in. If you don't get traction
> from me or somebody else soon, feel free to bug me again. It's
> definitely an important case we need to take care of.
>
> Patrick Earl
>
--
GMX DSL Doppel-Flat ab 19,99 Euro/mtl.! Jetzt mit
gratis Handy-Flat! http://portal.gmx.net/de/go/dsl