Hi Harald. Nice to hear from you again. I'm very thankful for the time you're putting into this. :)
My comments are inline. On Thu, Mar 24, 2011 at 5:42 AM, Harald Mueller <[email protected]> wrote: > Hi Patrick - then let me jump in with a list of items on my mind: > > a. Let's talk about design & code! > b. Dialects / database vendors to support > c. Still create table joins? or always build inner joins? > d. Unit testing vs. system testing > e. A first list of concrete tests to be done > > Ad a. I think it is necessary to explain the design and code (decisions). > Here is the most important item: I try to minimize the number of clauses and > left joins. The most important part of this is the attempt to "pull up" the > "existence clauses" as high as possible. What are "existence clauses?" Well, > as I wrote in a previous posting, the whole idea is to rewrite (Linq) > > ... || ...a.b.c.d.Prop... > > to (HQL) > > ... LEFT JOIN a.b AS a_b > ... LEFT JOIN a_b.c AS a_b_c > ... LEFT JOIN a_b_c.d AS a_b_c_d > ... > WHERE > ... OR (a_b IS NOT NULL AND a_b_c IS NOT NULL AND a_b_c_d IS NOT NULL > AND (...a.b.c.d.Prop...)) > > Thus, the objects on the "path" a.b.c.d are outer joined so that the || > operator can still "succeed on the left side". But before anything is > evaluated for a.b.c.d on the right side, it is guaranteed that the objects > exist by the IS NOT NULL clauses. I call these clauses "existence clauses". > > They are especially important in the case > > ... || a.b.Prop == null > > which can NOT be transformed simply to > > ... LEFT JOIN a.b AS a_b > ... > WHERE > ... OR a_b.Prop IS NULL > > because this will obviously be true also when a_b is not at all present. > Analyzing the condition for IS NULL etc. is in my experience (we tried it) > futile, as you can have arbitrarily complex conditions like !(!(!(a.b.Prop != > null))). So the (or rahter, one possible) correct rewrite is > > ... LEFT JOIN a.b AS a_b > ... > WHERE > ... OR (a_b IS NOT NULL AND (a_b.Prop IS NULL)) > > Naive rewriting of this sort has a problem with a query expression like > > a.b.Prop == 4 || a.b.Prop == 5 > > This would yield > > ... LEFT JOIN a.b AS a_b > ... > WHERE > (a_b IS NOT NULL AND (a_b.Prop == 4)) > OR (a_b IS NOT NULL AND (a_b.Prop == 5)) > > Obviously, neither the left join nor the existence clauses are necessary for > this frequently occurring pattern. Therefore, the code "pulls up" existence > clauses that occur on both sides of an ||. I think I see what you're saying here. It seems that if you have something like a.b.Prop == 5, the existence checks are not needed. Is that correct? When you are explicitly checking for null (a less frequent case), the existence checks are needed all the way up the chain. I also understand what you're saying about pulling the existence checks up to a common clause. Are we on the same page here? If so, I think it would be desirable to eliminate the majority of the existence clauses in the case of a non-null value check. I don't yet understand the implementation complexities involved > However, I do *not* go the extra length of reordering clauses. For example, > > (a.b.Prop == 4 || a.OtherProp == 100) || a.b.Prop == 5 > > will create left join + existence clauses for a.b. > > ... "and so on": I could write here or in comments more about examples and > considerations. However, maybe now you tune in with questions and comments of > yours! > > Ad b. Right now, I only test against SQL Server 2008 (Express and Standard). > The || problem is independent of that. Against which databases should I (or > we) test such changes? Something like SQL Server + MySql would sound useful > to me ... Everything you've talked about so far seems like it would apply properly against other databases. In fact, when the unit tests are written and committed, they will be automatically checked against MSSQL, Firebird, SQLite, and PostgreSQL. http://teamcity.codebetter.com I'm not too concerned about database-specific issues here. > Ad c. Right now, my code has an if at the end where it essentially says "if I > did not produce an outer join, then don't do anything different" = "use table > joins." What's a table join? An inner join? > I did this to not change the behavior in all cases that already work > correctly - which could be important for applications that have tuned their > indices to query plans resulting from the table joins. Personally, I'd prefer > to have "the one and only" join method - i.e., always create inner and outer > joins, never table joins. > > Ad d. "System tests," for this code, are tests that set up a database and > check whether the results are correct. > "Unit tests" would be tests that test only a single inner unit, e.g. the > AddJoinReWriter or even smaller parts. We will definitely need system tests. If there is sufficient data and queries, it seems to be possible to avoid writing unit tests for the inner functions. In other words, if the external code can verify the query is what it expects in "all cases", we can assume that the inner workings are sufficient. The patterns should be fairly straight-forward to test. My suggestion would be that if you need individual function tests during development, write them. However, as for ensuring everything is working properly in the end, I'm more interested in thorough system tests. It seems that you'd just need to create a number of tables that just follow a pattern with certain numbers of rows with certain attributes (like 3 rows with column A null, column B not null), and then check that the result count matches the number we're expecting for a given query. > I have the gut feeling that with the non-trivial algorithms, I'd like to also > write unit tests. Seeing problems only through the lens of the subsequent HQL > machine is not enough, neither for exploratory testing (if someone else wants > to understand what goes on) nor for bug fixing. Do you agree? ... although > this, of course, adds more work. Ultimately, I'm open to whatever testing strategy you'd like to follow, as long as it's thoroughly tested from a user perspective. > Ad e. Last, but not least, here a quickly written list of possible test cases > (inputs and partial output expectations) that should cover the code (which > could be checked with a coverage tool). The notation is: > > - A, B, ... are Linq clauses > - xyzP is shorthand for a path x.y.P, where > -- y and z are :1 associations, and > -- P denotes one or more simple properties. > - IJ means "expected to be inner joined *without* existence clauses" > - OJ means "...to be outer joined *with* existence clauses" > > For example, the first test case > > (A && B) && (C || D) > // xyP in A, C --> xy IJ > > could be expanded to > > session.Query<MyBO>().Where(x => > (x.BO1.I1 == value1 && x.Flag1) > && > (x.BO1.I2 == value2 || x.Flag2) > > For this expression, it would now be necessary to do the following tests: > > (a) Is the joining as designed - i.e., is BO1 joined via an inner join? > > (b) Is the complete rewrite ok? In other words, one should test via > "condition coverage," i.e., produce enough data so that each side of each && > and || becomes true and false. I'd hope that there is some sort of tooling > for this (maybe another Linq tree walker? :-) ) - would you know any? For a, I see what you're saying. We'd be testing for the "performance" aspect, where it wouldn't do a left join and then exclude certain rows later. For b, I'm not aware of any tooling to come up with this data. I think the best idea would be to just invent some classes and build a few loops to do the inserts. > > Towards the end, the list gets more and more sketchy ... > > // 1. One || tree: (A && B) && (C || D) > > // 1a. One path > // xyP in A, C --> xy IJ > // xyP in A, C, D --> xy IJ > // xyP in C, D --> xy IJ > // xyP in C --> xy OJ > > // 1b. Two paths > // xyP in A, C; rsQ in C --> xy IJ, rs OJ > // xyP in A, C; rsQ in C, D --> xy IJ, rs IJ > // xyP in C, D; rsQ in C, D --> xy IJ, rs IJ > // xyP in C, rsQ in C --> xy OJ, rs OJ > // xyP in C, rsQ in D --> xy OJ, rs OJ > > // 1c. Path and subpath > // wxyP in A, C --> IJ wx, IJ wxy > // wxy in C, wxyP D --> IJ wx, OJ wxy > // wxy in A, wxyP D --> IJ wx, OJ wxy > > // 1d. Partially overlapping paths > // wxyP in A, C --> IJ wx, IJ wxy > // wxvP in C, wxyP D --> IJ wx, OJ wxv, OJ wxy > // wxvP in A, wxyP D --> IJ wx, IJ wxv, OJ wxy > > // 2. ORDER BY: (A && B) && (C || D) ORDER BY E > > // xyP in E --> xy OJ > // ...and all of 1a. with xyP in E > > // 3. SELECT: (A && B) && (C || D) SELECT F > > // xyP in F --> xy OJ > // xy in A, xyP in F --> xy OJ > // xyP in A, xyP in F --> xy IJ > // xyP in A, xy in F --> xy IJ > // ...and all of 1a. with xyP in F > > // 4. SELECT and ORDER BY: (A && B) && (C || D) ORDER BY E > SELECT F > > // ...all of 3. with xyP in E > > // 5. Two || trees side by side: (A || B) && (C || D) > > // 5a. One path > // xyP in A, C --> xy OJ > // xyP in A, C, D --> xy IJ > // xyP in C, D --> xy IJ > // xyP in C --> xy OJ > > // 5b. Two paths > // xyP in A, C; rsQ in C --> xy OJ, rs OJ > // xyP in A, C; rsQ in C, D --> xy OJ, rs IJ > // xyP in A, B; rsQ in C, D --> xy IJ, rs IJ > // xyP in C, rsQ in C --> xy OJ, rs OJ > // xyP in A, C, D; rsQ in A, B, D --> xy IJ, rs IJ > > // 5c. Path and subpath > // wxyP in A, C --> OJ wx, OJ wxy > // ...and many more...???... > // ...also partial paths as IJ, then rest as OJ...???... > > // 6. Two || trees side by side and an &&: (A || B) && (C > || D) && (E && F) > > // ...Idea: E filters out OJ from A and C ??? > // ...???... > > // 7. Nested || trees: (A || B) || (C || D) > > // ...TBD... > > // 8. Deeply nested || trees: (A || (B || (C || D))) > > // ... all of 7. (semantically equivalent! - but not for > the tree walking algorithms) > > // 9. Any: Any(q => A || B) > > // ??? > > // 10. Nested Any: Any(e => A || Any(f => B) > > // ??? > > // 11. Nested Any with ||: Any(e => A || Any(f => B || C) > > // ??? > > // 12. ...Join???, Group???... I didn't have time to go over the list here, but it sounds like you're making good progress on what needs to be tested. Have you had a chance to figure out how to run all the existing tests on your PC? "ShowBuildMenu.bat" can be of some assistance there. Thanks again for pushing forward on this. It's certainly a non-trivial exercise and your effort is very much appreciated. Patrick Earl
