Re: [sqlite] LEFT JOIN + WHERE / OR optimisation
Another reason to do at least the LEFT JOIN + WHERE -> INNER JOIN optimisation: If a view is based on a LEFT JOIN, running a WHERE query on it will exhibit the same poor behavior and here there will be no way to rewrite the query. -- Sent from: http://sqlite.1065341.n5.nabble.com/ ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] LEFT JOIN + WHERE / OR optimisation
Keith Medcalf wrote > The primary difference is likely the test on b.c. /There was some text here that seems to have been lost and my browser opened a "super resume templates" page... maybe I've been already spectre-hacked? Anyway, I repost:/ The primary difference from what I see is that it changes the query plan to start off with an index-based search on b, not a search (scan) on a. To me, this can be due to the supplimentary condition in the ON clause making this plan more appealing to the optimizer. So with this in mind, an easy optimisation would be to just "permeate" any imperative non-null condition in the where clause to the on clause. I don't know if the execution plan is identical to that of an inner join (I never tried to actually convert it to an inner join and I'm working on sth else atm), but it's satisfactory and it starts off the right way, with a search on b, with an execution time well in the expected margins. Keith Medcalf wrote > I think the query is ill-conceived when written as an outer join. If anything, to reflect intentionality the query should have been written as such: SELECT FROM ( SELECT FROM a LEFT JOIN b ) WHERE b.c=5 -- Sent from: http://sqlite.1065341.n5.nabble.com/ ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] LEFT JOIN + WHERE / OR optimisation
Keith Medcalf wrote > The primary difference is likely the test on b.c. Keith Medcalf wrote > I think the query is ill-conceived when written as an outer join. If anything, to reflect intentionality the query should have been written as such: SELECT FROM ( SELECT FROM a LEFT JOIN b ) WHERE b.c=5 -- Sent from: http://sqlite.1065341.n5.nabble.com/ ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] LEFT JOIN + WHERE / OR optimisation
>To reiterate, Keith: to get the query to execute properly, I didn't >change >the LEFT JOIN to an INNER JOIN! >Nope, >I rewrote > >SELECT >FROM > a > LEFT JOIN b ON >WHERE b.c=5 This means: Take all rows of table a match with rows of table b on the key and where there is no match "fill" columns that would have come from b with a NULL AND THEN AFTER THAT return all rows where b.c = 5 >to >SELECT >FROM > a > LEFT JOIN b ON AND b.c=5 >WHERE b.c IS NOT NULL This translates to: Take all rows of table a match with rows of table b on the key AND where b.c = 5 and where there is no match "fill" columns that would have come from b with a NULL AND THEN AFTER THAT return only rows where b.c is not NULL The primary difference is likely the test on b.c. You see, to check if b.c is a particular value requires unpacking the row and retrieving the value. Checking whether b.c IS NULL does not require unpacking the data but merely reading the 'datatype', which should be a much faster operation. If you wrote the query as an inner join it would be much faster since it would not be possible after projection for b.c to be anything other than not null. >So I just added a redundant predicate and it runs perfectly, on >SQLite! >That's why I said this simple improvement can surely be taken care of >on the >optimizer, while the larger discussion of actually changing the outer >join >to an inner join or even tackling the -OR- case is for sure something >nice >to think of, but increasingly more complicated. I think the query is ill-conceived when written as an outer join. You are asking the query to "project" a result containing a crap-load of rows which you will subsequently remove from the result set. For bitty-bases (a couple of hundred rows) this is irrelevant. However, for tables containing millions of rows, creating a googillion of rows then culling out all of them except 4 is a very bad use of computer resources. Why generate them in the first place if you are just going to discard them? I have seen many a mis-designed program which works blazingly quickly on a "test" database that contains a dozen customers, half-a-dozen products, with two or three components each. But when populated with the production data of 1,000,000 customers, 500,000 products, and between 100 and 10,000 components per product frankly take "years" to run any step and are completely unuseable. One should always design as if each table contained a googillion rows. If it is fast enough then, it will certainly be efficient enough to handle your ten row per table database. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] LEFT JOIN + WHERE / OR optimisation
To reiterate, Keith: to get the query to execute properly, I didn't change the LEFT JOIN to an INNER JOIN! Nope, I rewrote SELECT FROM a LEFT JOIN b ON WHERE b.c=5 to SELECT FROM a LEFT JOIN b ON AND b.c=5 WHERE b.c IS NOT NULL So I just added a redundant predicate and it runs perfectly, on SQLite! That's why I said this simple improvement can surely be taken care of on the optimizer, while the larger discussion of actually changing the outer join to an inner join or even tackling the -OR- case is for sure something nice to think of, but increasingly more complicated. -- Sent from: http://sqlite.1065341.n5.nabble.com/ ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] LEFT JOIN + WHERE / OR optimisation
Keith Medcalf wrote > but cannot be re-ordered for OUTER JOINS. Actually, I think order requirements (or rather, row grouping requirements, as far as I can paint it the requirement is just that all outer joined rows come in a bunch for each main join tuple) would not be violated if the join is made on an unique key left-side and an index is used right-side :) or something similar. I don't know, extensive algebra must be involved :) Even without index order inference, the main trunk keys can be sorted in a temp structure to preserve the condition, like in a GROUP BY query. However, I do see that SQLite seems to actually do it (scan b before a I mean) if I reqrite the query as I showed. select * from a, b where a.a *= b.a was replaced by select * from a LEFT JOIN b ON a.a = b.a Right, right, and with this in mind you can see my problem with the query is so easy to understand: My query, on the old format, is: select * from a, b where a.a *= b.a AND b.c = 5 My "improved" query, on the old format: select * from a, b where a.a *= b.a AND b.c *= 5 AND b.c = 5 // (OR IS NOT NULL) You can see the b.c *= 5 (JOIN ON ... AND b.c=5 ... WHERE b.c IS NOT NULL) is redundant, because it's just a weaker predicate, and I needed to add it just as an index hint on the join loop to trigger the right execution plan. Keith Medcalf wrote > I would sincerely doubt that there is *any* SQL optimizer or query planner > that can optimize mis-spoken queries containing OUTER JOINS. I don't know about mis-spoken, I don't think anything is mis-spoken. This app is running fine on Maria, I'm in the process of porting in to SQLite. I wouldn't have picked on this query unless it was lagging behind orders of magnitude (2.5s vs 50ms). So I think Maria does it (I haven't bothered to check the execution plan there, went straight to hacking SQLite). -- Sent from: http://sqlite.1065341.n5.nabble.com/ ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] LEFT JOIN + WHERE / OR optimisation
>Thanks for acknowledging this, you are absolutely right, that's why I >stated >that no DB does perfect optimisations and that computing the >alternative >-OR- based closures are probably much harder to tackle. Also E. Pasma >pointed out the -OR- queries as I wrote them are not really >semantically >equivalent unless the 2 joins are disjunct. I suspect that query re-write of an outer join to an inner join would be violating some rule in the new SQL standards, most likely about visitation (nested loop) order -- which is freely reorderable for INNER JOINS but cannot be re-ordered for OUTER JOINS. Back in the "olden days" one specified outer join conditions in the WHERE clause using *= =* or *=* syntax, where the * was on the side of the operator where all rows came from. This was deprecated many years ago when the JOIN ON syntatactic sugar was created because too many people where forgetting that they need to use the appropriate * to "bind" the where condition to the appropriate outer join binding, then complaining that it was too difficult to remember or figure out where a plain "WHERE" clause was needed and where a "OUTER JOIN" bound condition was required, resulting in many calls to SQL Database support lines. And these same vendors also happen to be who write the specs, so they promptly changed the spec to eliminate the support calls. So the ** format was written out of the standard and the OUTER JOIN ON became mandatory ONLY FOR OUTER JOIN operations. In all other cases the JOIN ON syntax was just syntactic sugar for the old "list of tables" and where clause. Don't recall exactly when this occurred but it was about two decades ago. ( I remember it well because there were many queries that could not be expressed in the new-fangled format ) So really, adding conditions to apply to an OUTER JOIN to the where clause is equivalent to the common error of yester-decades of forgetting the *. OLD SYNTAX: select * from a, b where a.a *= b.a was replaced by select * from a LEFT JOIN b ON a.a = b.a Of course, the old syntax allowed one to specify algebraic conditions that can no longer be expressed with the new syntax. But that is OK, it is easier and simpler to say: go RTFM, it is jolly clear, and if you don't like it then boo-hoo on you. A good implementation was destroyed. Live with it, love it, and get over it. I would sincerely doubt that there is *any* SQL optimizer or query planner that can optimize mis-spoken queries containing OUTER JOINS. Not even DB2's exhaustive search query planner/optimizer can do it, and it is quite possibly now one of the best in existence if you tell it that it can take unlimited time and resources to generate (and run) the plan. All others pale in comparison. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] LEFT JOIN + WHERE / OR optimisation
At the risk of preserving this thread well past its end of life cycle... On 2018/01/05 6:58 PM, Dinu wrote: Ryan, You cannot ask SQL a Method query, that's where my whole RDBMS understanding takes me. Everything you ask SQL is underpinned by a specific Method. Perhaps I should have been more clear - by METHOD I mean, in the case of the join method, "the /way/ you expect the query to enumerate rows" such as an Inner join being one method, and outer join being another method. Sorting/Ordering is a method of output, grouping, etc. Why would the SQL standard propose these different methods if they were not meaningful and distinct? As I said before, there need not be an AI to judge that the query uttered by the programmer can in fact, in a narrow case, be recomputed as another query because the result will be the same and hopefully that the alternate method would be more efficient. Note that I said there needn't be... I did not say there /can't/ be one, indeed, query flattening is a good example, but with query flattening the cost is low and the reward is high for a really broad spectrum of cases. Why did I say it is not needed? Well, what you propose has a relatively high cost (added heuristic AI) considering it is paid across all DB engine query planning to achieve a small advantage in the narrowest of use cases, not to mention that - should the programmer wish for a speed-up for the left join that conforms to this narrow set of circumstances, he or she could instantly change it to a normal join (the way it should have been to start with) and enjoy the fruits of the added speed with zero cost to the rest of us who wouldn't have made the imperfect query in the first place. Why do I call it narrow? Have you looked at your example queries in detail? Do you know how many things must be exactly just so (or how many other normal query things must be absent) for that join replacement to work algebraically? At least in the case of query flattening, it improved a query construct that is found abundantly and considered the correct construct for the expected results. You essentially want the engine to second-guess programmers who didn't write the best query for their expected results. That kind of hand-holding belongs to the realms of Microsoft and MySQL. (Apologies for all the word clarifications, but I'd rather avoid having this turn into a "semantics" debate, so trying to be as clear as possible on meanings of statements... not sure I succeeded though) :) Cheers, Ryan PS: I'm not judging MySQL, at least it has the benefit of being a fully fledged server-side software and greatly tweak-able on the fly for all its hand-holdy functionality. PPS: I will say this - If you're not using MySQL in STRICT mode, you are n hours away from some disaster, where n is a not-too-big positive integer. PPPS: I wish SQLite had a STRICT mode. :) ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] LEFT JOIN + WHERE / OR optimisation
Short sum-up: - The -OR- alternative I provided is not semantically equivalent, I will work on one that is :) - The other one, without the -OR- (second post) still stands. -- Sent from: http://sqlite.1065341.n5.nabble.com/ ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] LEFT JOIN + WHERE / OR optimisation
Richard, Thanks for acknowledging this, you are absolutely right, that's why I stated that no DB does perfect optimisations and that computing the alternative -OR- based closures are probably much harder to tackle. Also E. Pasma pointed out the -OR- queries as I wrote them are not really semantically equivalent unless the 2 joins are disjunct. However, the case of the imperative WHERE NOT NULL implying INNER JOIN is just a matter of replacing a predicate with a stronger one, so in all fairness I imagined it a far lesser overhead than, say, the query flattener. And I imagine it's a much more common situation, too, especially when users are adding additional filters via WHERE clauses to a base query, so it might benefit a lot of users, too. I know it would us, by not having to rewrite these queries when porting; we are working on x86 servers, and a stick of memory or a hard drive cost less than a programmer's day for us :) For the extra memory, I know for computing relational closures the spatial complexity can get big, but only when the structure of the query is written warrants it in the first place, so it shouldn't manifest heavily on a query that doesn't have this structure. This is just my best view on this, obviously it's a political decision to be made so it's no make-it-or-break-it thing, like mentioned before, we are porting some pretty big system and when I notice differences with SQLite, I jolt them down, in the hope it might benefit you or the millions of users, if not by changing SQLite, then simply by pointing out the workaround to other users, such as moving the WHERE condition out to the ON clause, it's not necessarily a trivial thing to consider for everyone. Ryan, You cannot ask SQL a Method query, that's where my whole RDBMS understanding takes me. It nullifies the purpose of queries as well as all efforts you yourselves have put into a lot of things, query flattening to mention just one. The "same result" is not accidental, the equivalent queries will produce the same result no matter which data populates the tables. That is the only deffinition I know of semantic equivalence. SQL is declarative and thus everything that describes the same thing is the same thing. E.Pasma, Thanks for taking the time to make the TC. This is always a huge putdown for me, because finally the execution plan depends on the data indexes are populated with (via ANALYZE) and are tables are huge so it's always a putdown for me to create a minimal TC. Indeed I noticed just now the 2 queries are not equivalent that way :) Thanks for pointing that out! I will work on an equivalent -and- optimized rewrite :) For the query plans though, here is where the index stats come in: here a "SCAN a" makes sense, but in our case the number of records in a is on the order of 1x records to b and c, and also the cardinality of b.d and c.d is on the order of 1000; so a "SEARCH b, SEARCH c" works out. At minimum you should have indexes on b.d, c.d, a.ab, a.ac; but even so and with adding another 1000 records on a, b and c and running the query: EXPLAIN QUERY PLAN SELECT * FROM a JOIN b ON b=ab AND b.d IN (1,2,3) JOIN c ON c=ac AND c.d IN (4,5) selectid |order |from |detail | -|--|-|---| 0|0 |0|SCAN TABLE a | 0|1 |1|SEARCH TABLE b USING INTEGER PRIMARY KEY (rowid=?) | 0|0 |0|EXECUTE LIST SUBQUERY 1| 0|2 |2|SEARCH TABLE c USING INTEGER PRIMARY KEY (rowid=?) | it still plans a "SCAN a" first. So I guess I'll have to backtrack from the real data to generate a TC. -- Sent from: http://sqlite.1065341.n5.nabble.com/ ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] LEFT JOIN + WHERE / OR optimisation
op 05-01-2018 17:23 schreef David Raymond op david.raym...@tomtom.com: >> Anyway the two queries return the same set of rows. > >> This test also show a small semantic difference in the two queries. >> The set of rows is the same but the second query leaves certain >> details null if only one of the OR conditions is true. That occurs in >> row 1. > > You're contradicting yourself there. If there's a difference in the results > then they're not the same set of rows. > > We'll just look at the "all 1" case. a has (1, 1, 1), b has (1, 1) and c has > (1, 1) > > Best if viewed in a fixed-width font > > Step by step version 1: > > a left join b on b = ab > > a b > a ab ac b d > (1, 1, 1) (1, 1) > > result: > a ab ac b d > (1, 1, 1, 1, 1) > > left join c on c = ab > > c > a ab ac b d c d > (1, 1, 1, 1, 1) (1, 1) > > result: > a ab ac b d c d > (1, 1, 1, 1, 1, 1, 1) > > where b.d in (1, 2, 3) or c.d in (4, 5) > > b.d is 1, so it passes > result: > > a ab ac b d c d > (1, 1, 1, 1, 1, 1, 1) > > The OR worked. We got the values from both tables b, and c and because one of > them was correct. > > > > Now, Step by step version 2: > > a left join b on b = ab and b.d in (1, 2, 3) > > a b > a ab ac b d > (1, 1, 1) (1, 1) > > result: > a ab ac b d > (1, 1, 1, 1, 1) same so far (but only by coincidence) > > left join c on c = ac and c.d in (4, 5) > > c > a ab ac b d c d > (1, 1, 1, 1, 1) (1, 1) > > the ON condition doesn't match. Since this is an outer join, and there were no > matches for the row in the left side, nulls are included > > result: > a ab ac b dc d > (1, 1, 1, 1, 1, null, null) > > where b.d is not null or c.d is not null > > b.d isn't null, so that passes. > result: > > a ab ac b dc d > (1, 1, 1, 1, 1, null, null) > > So we get a row saying that a matched something in b, but we're throwing out > the value from the c table, which is not what we wanted. Thanks, all clear except this last line. Did we not want the value to be thrown out. Or not want the value? It depends on that which query is favourite. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] LEFT JOIN + WHERE / OR optimisation
> Anyway the two queries return the same set of rows. > This test also show a small semantic difference in the two queries. > The set of rows is the same but the second query leaves certain > details null if only one of the OR conditions is true. That occurs in > row 1. You're contradicting yourself there. If there's a difference in the results then they're not the same set of rows. We'll just look at the "all 1" case. a has (1, 1, 1), b has (1, 1) and c has (1, 1) Best if viewed in a fixed-width font Step by step version 1: a left join b on b = ab a b a ab ac b d (1, 1, 1) (1, 1) result: a ab ac b d (1, 1, 1, 1, 1) left join c on c = ab c a ab ac b d c d (1, 1, 1, 1, 1) (1, 1) result: a ab ac b d c d (1, 1, 1, 1, 1, 1, 1) where b.d in (1, 2, 3) or c.d in (4, 5) b.d is 1, so it passes result: a ab ac b d c d (1, 1, 1, 1, 1, 1, 1) The OR worked. We got the values from both tables b, and c and because one of them was correct. Now, Step by step version 2: a left join b on b = ab and b.d in (1, 2, 3) a b a ab ac b d (1, 1, 1) (1, 1) result: a ab ac b d (1, 1, 1, 1, 1) same so far (but only by coincidence) left join c on c = ac and c.d in (4, 5) c a ab ac b d c d (1, 1, 1, 1, 1) (1, 1) the ON condition doesn't match. Since this is an outer join, and there were no matches for the row in the left side, nulls are included result: a ab ac b dc d (1, 1, 1, 1, 1, null, null) where b.d is not null or c.d is not null b.d isn't null, so that passes. result: a ab ac b dc d (1, 1, 1, 1, 1, null, null) So we get a row saying that a matched something in b, but we're throwing out the value from the c table, which is not what we wanted. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] LEFT JOIN + WHERE / OR optimisation
Dinu wrote: Hi all, I've ran into an optimisation problem with a double-left join that works as an "either" clause. The query is as follows: SELECT * FROM a LEFT JOIN b ON LEFT JOIN c ON WHERE b.someId IN (1,2,3) OR c.someId IN (4,5) This results in a bloated execution plan: SEARCH a SEARCH SUBQUERY 1 USING AUTOMATIC COVERING INDEX SEARCH SUBQUERY 2 USING AUTOMATIC COVERING INDEX However, the semantically equivalent: SELECT * FROM a LEFT JOIN b ON AND b.someId IN (1,2,3) LEFT JOIN c ON AND c.someId IN (4,5) WHERE b.someId IS NOT NULL OR c.someId IS NOT NULL Gets the proper execution plan: SEARCH b SEARCH c EXECUTE LIST SUBQUERY Hello, the discussion about whether the two queries are equivalent is not satisfactory to me. What Keith sais Your "where" condition is effectively converted an OUTER JOIN into an INNER JOIN .. is true. But the OR condition makes this true for either the one or the other outer join. I hope this is what Dinu means here: And the two queries are algebrically identical. "(X ∊ S or X:=null) AND (X is not null)" is equivalent to "X ∊ S is not null". The two queries might look different only from an imperative programming point of view. Anyway the two queries return the same set of rows. Furthermore: what is a "bloated" execution plan? I set up some test data and the query deamed bloated appears just as fast. See below. This test also show a small semantic difference in the two queries. The set of rows is the same but the second query leaves certain details null if only one of the OR conditions is true. That occurs in row 1. The outcome of the "bloated" execution plan is more complete. Possibly I am too pragmatical and don't understand the discussion. E. Pasma My test script: create table a (a integer primary key, ab, ac); create table b (b integer primary key, d); create table c (c integer primary key, d); insert into a values (null,1,1); insert into a select null, 2,2 from a; insert into a select null, 3,2 from a; insert into a select null, 4,4 from a; insert into a select null, 5,5 from a; insert into a select null, 6,6 from a; insert into a select null, 7,7 from a; insert into a select null, 8,8 from a; insert into a select null, 9,9 from a; insert into a select null, 10,10 from a; insert into b values (1,1),(2,2),(3,3),(4,3),(6,3); insert into c values (1,1),(4,5),(5,5),(7,1); .eqp on .timer on SELECT * FROM a LEFT JOIN b ON b=ab LEFT JOIN c ON c=ac WHERE b.d IN (1,2,3) OR c.d IN (4,5) ; SELECT * FROM a LEFT JOIN b ON b=ab AND b.d IN (1,2,3) LEFT JOIN c ON c=ac AND c.d IN (4,5) WHERE b.d IS NOT NULL OR c.d IS NOT NULL ; Output: --EQP-- 0,0,0,SCAN TABLE a --EQP-- 0,1,1,SEARCH TABLE b USING INTEGER PRIMARY KEY (rowid=?) --EQP-- 0,2,2,SEARCH TABLE c USING INTEGER PRIMARY KEY (rowid=?) --EQP-- 0,0,0,EXECUTE LIST SUBQUERY 1 1|1|1|1|1|1|1 2|2|2|2|2|| 3|3|2|3|3|| 4|3|2|3|3|| 5|4|4|4|3|4|5 6|4|4|4|3|4|5 7|4|4|4|3|4|5 8|4|4|4|3|4|5 9|5|5|||5|5 10|5|5|||5|5 11|5|5|||5|5 12|5|5|||5|5 13|5|5|||5|5 14|5|5|||5|5 15|5|5|||5|5 16|5|5|||5|5 17|6|6|6|3|| 18|6|6|6|3|| 19|6|6|6|3|| 20|6|6|6|3|| 21|6|6|6|3|| 22|6|6|6|3|| 23|6|6|6|3|| 24|6|6|6|3|| 25|6|6|6|3|| 26|6|6|6|3|| 27|6|6|6|3|| 28|6|6|6|3|| 29|6|6|6|3|| 30|6|6|6|3|| 31|6|6|6|3|| 32|6|6|6|3|| Run Time: real 0.003 user 0.001587 sys 0.000358 --EQP-- 0,0,0,SCAN TABLE a --EQP-- 0,1,1,SEARCH TABLE b USING INTEGER PRIMARY KEY (rowid=?) --EQP-- 0,0,0,EXECUTE LIST SUBQUERY 1 --EQP-- 0,2,2,SEARCH TABLE c USING INTEGER PRIMARY KEY (rowid=?) 1|1|1|1|1|| 2|2|2|2|2|| 3|3|2|3|3|| 4|3|2|3|3|| 5|4|4|4|3|4|5 6|4|4|4|3|4|5 7|4|4|4|3|4|5 8|4|4|4|3|4|5 9|5|5|||5|5 10|5|5|||5|5 11|5|5|||5|5 12|5|5|||5|5 13|5|5|||5|5 14|5|5|||5|5 15|5|5|||5|5 16|5|5|||5|5 17|6|6|6|3|| 18|6|6|6|3|| 19|6|6|6|3|| 20|6|6|6|3|| 21|6|6|6|3|| 22|6|6|6|3|| 23|6|6|6|3|| 24|6|6|6|3|| 25|6|6|6|3|| 26|6|6|6|3|| 27|6|6|6|3|| 28|6|6|6|3|| 29|6|6|6|3|| 30|6|6|6|3|| 31|6|6|6|3|| 32|6|6|6|3|| Run Time: real 0.002 user 0.001560 sys 0.000296 ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] LEFT JOIN + WHERE / OR optimisation
On 1/4/18, Dinu wrote: > I think it's every DB's intention to optimize as best possible > a query into an execution plan. None does it perfectly, but all try to, very > hard. There are trade-offs here. How much slower are you willing for sqlite3_prepare() to run in order to get a better query plan? How much extra memory and disk space are you willing to allocation to libsqlite3.so in order to get a better query plan? Are you willing to impose these costs on (literally) billions of other users that don't really need the more advanced query planning? These are hard questions with no easy answers. -- D. Richard Hipp d...@sqlite.org ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] LEFT JOIN + WHERE / OR optimisation
On 2018/01/05 4:24 AM, Dinu wrote: Thank you for your answer, Keith. I had my problem "fixed" before I wrote the first mail. Also with every problem I also provided the fix that worked, for anyone that might run into the same problem. However, it's difficult to not get a little frustrated with your answer. At https://sqlite.org/queryplanner.html I read: "The best feature of SQL (in all its implementations, not just SQLite) is that it is a declarative language, not a procedural language. When programming in SQL you tell the system what you want to compute, not how to compute it." I'm sure his frustration is on a par. :) While we've both stated that your queries are not equivalent, Keith took the time to write an explanation of why that is, which seemingly did not hit home, and now I will try again with an analogy: You are essentially have a delivery person usually tasked to take a truck, go to a farm and pick up a load of eggs. Now you ask the same driver to use the same truck to go to the corner cafe and pick up 6 eggs, and then you exclaim "Wow, why he takes the truck? he can just take the scooter!! The truck is sooo inefficient for this job!". And yes, you are right, but the problem is you asked for it to be done by truck. Now I agree the ideal in SQL is (as the quote above states) that one should merely ask for a result and the engine should decide how best to achieve it, but in practice there are many nuances in the programming of the engine that thwarts this ideal, not to mention how many programmers like to tweak their queries to get the engine to execute the quickest, and there is nothing wrong with this. It does however mean that the engine should in all circumstances, while trying to find the best query plan, still adhere to the type of question that was asked of it. You asked a METHOD A question, it won't (and shouldn't) apply a METHOD B to reach the accidental similar result. Second problem, your queries show the narrowest of use cases. The engine has to work for ALL use cases which can get very involved and complex. The engine needn't have another level of abstraction AI going "Oh this one is simple, we will disregard what the programmer asked for and use our own more simple query because it should get the same result." To sum it up: I think it's every DB's intention to optimize as best possible a query into an execution plan. None does it perfectly, but all try to, very hard. With this intention, I reported a case where the query planner COULD be improved. I think you will at least agree with me that making it better can't be wrong. Whether that happens tomorrow, in a year or never, that's up to the mercy, resources and priorities of the developers, so I am really am not interested in an argue over this. It's a good idea to report possible improvements, and thank you for that, but this case isn't able to improve since mangling an outer join into an inner join when sometimes it might yield the same result is as unsafe as it gets. However, that doesn't mean the devs (who would have read all this) doesn't find something of interest and could possibly think of a tweak that might improve things, so having this debate is never a waste, but the specific algebraic essence of what you are suggesting is not correct - 's all we're sayin. Cheers! Ryan ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] LEFT JOIN + WHERE / OR optimisation
Thank you for your answer, Keith. I had my problem "fixed" before I wrote the first mail. Also with every problem I also provided the fix that worked, for anyone that might run into the same problem. However, it's difficult to not get a little frustrated with your answer. At https://sqlite.org/queryplanner.html I read: "The best feature of SQL (in all its implementations, not just SQLite) is that it is a declarative language, not a procedural language. When programming in SQL you tell the system what you want to compute, not how to compute it." And I completely agree with this, "how to compute it" is called relational algebra and it's what a query planner should do best. And the two queries are algebrically identical. "(X ∊ S or X:=null) AND (X is not null)" is equivalent to "X ∊ S is not null". The two queries might look different only from an imperative programming point of view. As to why the query is written that way: with the above in mind, I will contend that there can absolutely never exist a "mistaken" way to write a query, as long as the description of the predicates is correct and consistent with the schema. You should consider that quite frequently queries are the result of one or more levels of logic abstraction (ORM, DBAL, etc). In my case, modifying the query was not difficult to do, but in other cases one may have few options on rewriting the way the query structure is generated. The only way to reduce a fabricated query is through relational algebra, and that is up to the DB, not the programmer, not the abstractions in-between. In this particular case, the where is optional; depending on parameters, I want the set of data that is correctly defined as the left join of tables a and b, or I might want a subset of this join that has a particular property over the left-joined set. The query was correctly written, to rewrite it so that the query planner might know how to run it is wrong, IMHO. To sum it up: I think it's every DB's intention to optimize as best possible a query into an execution plan. None does it perfectly, but all try to, very hard. With this intention, I reported a case where the query planner COULD be improved. I think you will at least agree with me that making it better can't be wrong. Whether that happens tomorrow, in a year or never, that's up to the mercy, resources and priorities of the developers, so I am really am not interested in an argue over this. -- Sent from: http://sqlite.1065341.n5.nabble.com/ ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] LEFT JOIN + WHERE / OR optimisation
They are not the same. Just as 5 - 3 is not the same as 1 + 1, even though both come up with the same result, 2. by happenstance. Your "where" condition is effectively converted an OUTER JOIN into an INNER JOIN through artifice (and quite likely mistake). If you *want* an inner join, use an inner join. If you want an outer join, use an outer join. Just because subtraction of two different numbers may have the same result as addition of two other numbers, does not meant that addition and subtraction are the same thing. --- The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume. >-Original Message- >From: sqlite-users [mailto:sqlite-users- >boun...@mailinglists.sqlite.org] On Behalf Of Dinu >Sent: Thursday, 4 January, 2018 16:01 >To: sqlite-users@mailinglists.sqlite.org >Subject: Re: [sqlite] LEFT JOIN + WHERE / OR optimisation > >Algebrically, having a non-null imperative lookup condition in the >WHERE >clause means you have a stronger predicate on the same subject (ALL >MUST fit >vs. ANY that fit). > > > >-- >Sent from: http://sqlite.1065341.n5.nabble.com/ >___ >sqlite-users mailing list >sqlite-users@mailinglists.sqlite.org >http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] LEFT JOIN + WHERE / OR optimisation
Algebrically, having a non-null imperative lookup condition in the WHERE clause means you have a stronger predicate on the same subject (ALL MUST fit vs. ANY that fit). -- Sent from: http://sqlite.1065341.n5.nabble.com/ ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] LEFT JOIN + WHERE / OR optimisation
I think they are equivalent, if you look closer. SELECT FROM a LEFT JOIN b ON a.x=b.x WHERE b.y=5 -is- equivalent to SELECT FROM a JOIN b ON a.x=b.x AND b.y=5 SELECT FROM a JOIN b WHERE a.x=b.x AND b.y=5 SELECT FROM a LEFT JOIN b ON a.x=b.x AND b.y=5 WHERE b.y IS NOT NULL All the above are semantically equivalent. When there is only one LEFT JOIN, the presence of any non-null non-alternative condition on the joined table in the WHERE clause transforms it in an INNER join. There is no other way to have a non-null value except if the row exists. The reciprocal is not true of course. I don't know how difficult it is to compute the -OR- closure, as it is more difficult. But for an imperative non-null condition, I did expect the WHERE condition to be ported to the ON lookup for optimisation. -- Sent from: http://sqlite.1065341.n5.nabble.com/ ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] LEFT JOIN + WHERE / OR optimisation
The ON condition is used the "add one result row for each row of the outer table where nothing matches the ON condition" The WHERE condition is used those rows are added. Example with the basic "not in" type of outer join: SQLite version 3.21.0 2017-10-24 18:55:49 Enter ".help" for usage hints. Connected to a transient in-memory database. Use ".open FILENAME" to reopen on a persistent database. sqlite> create table a (x); sqlite> create table b (x); sqlite> insert into a values (1), (2), (3); sqlite> insert into b values (2); sqlite> select a.x from a left outer join b on a.x = b.x where b.x is null; x 1 3 sqlite> select a.x from a left outer join b on a.x = b.x and b.x is null; x 1 2 3 sqlite> select a.x from a left outer join b where a.x = b.x and b.x is null; sqlite> -Original Message- From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On Behalf Of Keith Medcalf Sent: Thursday, January 04, 2018 2:53 PM To: SQLite mailing list Subject: Re: [sqlite] LEFT JOIN + WHERE / OR optimisation They are not semantically equivalent. join conditions attached to an outer join operation are not semantically equivalent to the same conditions being in the where clause. In other words: select a,b,c from a join b join c on a.a=b.b where c.c=b.d is simply syntactic sugar for select a,b,c from a, b, c where a.a=b.b and c.c=b.d; In all cases the conditions in ON clauses of INNER JOINS are nothing more than WHERE clause filters. You do not even have to have the tables used in the ON clause "referenced" at the point you refer to them. the word "INNER JOIN" is syntactic sugar for a comma (,), and ON is sytactic sugar for the word WHERE (or AND). However, for OUTER JOINS the conditions in the ON clause "glue themselves" to the OUTER JOIN operation and ARE NEITHER syntactically or symantically the same as WHERE clause conditions. That is to say the behaviour observed is how it is designed to work and you expectations are misguided. --- The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume. >-Original Message- >From: sqlite-users [mailto:sqlite-users- >boun...@mailinglists.sqlite.org] On Behalf Of Dinu >Sent: Thursday, 4 January, 2018 12:29 >To: sqlite-users@mailinglists.sqlite.org >Subject: [sqlite] LEFT JOIN + WHERE / OR optimisation > >Hi all, >I've ran into an optimisation problem with a double-left join that >works as >an "either" clause. > >The query is as follows: > >SELECT * >FROM > a >LEFT JOIN > b ON >LEFT JOIN > c ON >WHERE > b.someId IN (1,2,3) OR > c.someId IN (4,5) > >This results in a bloated execution plan: >SEARCH a >SEARCH SUBQUERY 1 USING AUTOMATIC COVERING INDEX >SEARCH SUBQUERY 2 USING AUTOMATIC COVERING INDEX > >However, the semantically equivalent: >SELECT * >FROM > a >LEFT JOIN > b ON AND b.someId IN (1,2,3) >LEFT JOIN > c ON AND c.someId IN (4,5) >WHERE > b.someId IS NOT NULL OR > c.someId IS NOT NULL > >Gets the proper execution plan: >SEARCH b >SEARCH c >EXECUTE LIST SUBQUERY > > > >-- >Sent from: http://sqlite.1065341.n5.nabble.com/ >___ >sqlite-users mailing list >sqlite-users@mailinglists.sqlite.org >http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] LEFT JOIN + WHERE / OR optimisation
On 2018/01/04 9:28 PM, Dinu wrote: Hi all, I've ran into an optimisation problem with a double-left join that works as an "either" clause. The query is as follows: SELECT * FROM a LEFT JOIN b ON LEFT JOIN c ON WHERE b.someId IN (1,2,3) OR c.someId IN (4,5) This results in a bloated execution plan: SEARCH a SEARCH SUBQUERY 1 USING AUTOMATIC COVERING INDEX SEARCH SUBQUERY 2 USING AUTOMATIC COVERING INDEX However, the semantically equivalent: SELECT * FROM a LEFT JOIN b ON AND b.someId IN (1,2,3) LEFT JOIN c ON AND c.someId IN (4,5) WHERE b.someId IS NOT NULL OR c.someId IS NOT NULL Gets the proper execution plan: SEARCH b SEARCH c EXECUTE LIST SUBQUERY These Queries are not equivalent, they cannot and should not have the same query plan. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] LEFT JOIN + WHERE / OR optimisation
They are not semantically equivalent. join conditions attached to an outer join operation are not semantically equivalent to the same conditions being in the where clause. In other words: select a,b,c from a join b join c on a.a=b.b where c.c=b.d is simply syntactic sugar for select a,b,c from a, b, c where a.a=b.b and c.c=b.d; In all cases the conditions in ON clauses of INNER JOINS are nothing more than WHERE clause filters. You do not even have to have the tables used in the ON clause "referenced" at the point you refer to them. the word "INNER JOIN" is syntactic sugar for a comma (,), and ON is sytactic sugar for the word WHERE (or AND). However, for OUTER JOINS the conditions in the ON clause "glue themselves" to the OUTER JOIN operation and ARE NEITHER syntactically or symantically the same as WHERE clause conditions. That is to say the behaviour observed is how it is designed to work and you expectations are misguided. --- The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume. >-Original Message- >From: sqlite-users [mailto:sqlite-users- >boun...@mailinglists.sqlite.org] On Behalf Of Dinu >Sent: Thursday, 4 January, 2018 12:29 >To: sqlite-users@mailinglists.sqlite.org >Subject: [sqlite] LEFT JOIN + WHERE / OR optimisation > >Hi all, >I've ran into an optimisation problem with a double-left join that >works as >an "either" clause. > >The query is as follows: > >SELECT * >FROM > a >LEFT JOIN > b ON >LEFT JOIN > c ON >WHERE > b.someId IN (1,2,3) OR > c.someId IN (4,5) > >This results in a bloated execution plan: >SEARCH a >SEARCH SUBQUERY 1 USING AUTOMATIC COVERING INDEX >SEARCH SUBQUERY 2 USING AUTOMATIC COVERING INDEX > >However, the semantically equivalent: >SELECT * >FROM > a >LEFT JOIN > b ON AND b.someId IN (1,2,3) >LEFT JOIN > c ON AND c.someId IN (4,5) >WHERE > b.someId IS NOT NULL OR > c.someId IS NOT NULL > >Gets the proper execution plan: >SEARCH b >SEARCH c >EXECUTE LIST SUBQUERY > > > >-- >Sent from: http://sqlite.1065341.n5.nabble.com/ >___ >sqlite-users mailing list >sqlite-users@mailinglists.sqlite.org >http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] LEFT JOIN + WHERE / OR optimisation
Probably related: Compound join with a left outer join generates different execution plans: LEFT JOIN ( b JOIN c ON ) WHERE b.something = 5 vs. LEFT JOIN ( b JOIN c ON AND b.something = 5 ) WHERE b.something IS NOT NULL -- Sent from: http://sqlite.1065341.n5.nabble.com/ ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] LEFT JOIN + WHERE / OR optimisation
Hi all, I've ran into an optimisation problem with a double-left join that works as an "either" clause. The query is as follows: SELECT * FROM a LEFT JOIN b ON LEFT JOIN c ON WHERE b.someId IN (1,2,3) OR c.someId IN (4,5) This results in a bloated execution plan: SEARCH a SEARCH SUBQUERY 1 USING AUTOMATIC COVERING INDEX SEARCH SUBQUERY 2 USING AUTOMATIC COVERING INDEX However, the semantically equivalent: SELECT * FROM a LEFT JOIN b ON AND b.someId IN (1,2,3) LEFT JOIN c ON AND c.someId IN (4,5) WHERE b.someId IS NOT NULL OR c.someId IS NOT NULL Gets the proper execution plan: SEARCH b SEARCH c EXECUTE LIST SUBQUERY -- Sent from: http://sqlite.1065341.n5.nabble.com/ ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users