Re: [sqlite] JOIN vs IN
On Fri, Nov 11, 2011 at 8:06 PM, Petite Abeille wrote: >> No, "exists" in this case will change query plan significantly and >> performance can degrade drastically as a result. > > Why would that be? How would you rewrite the query using exists? The only thing I have in mind is SELECT * FROM table2 WHERE exists ( SELECT 1 FROM table1 WHERE amount > 500 AND table1.rowid = table2.rowid) And this query will force SQLite to use the second query plan I talked about: scan full table2 and for each row search in table1 for rows with the same rowid and check if it has necessary amount. And this plan will very likely be slower. Pavel ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] JOIN vs IN
Fabian wrote: Suppose I have two tables, and I want to have look for a value in the first table, but display the columns from the second table. The most obvious way would be joining them on rowid. But I don't need to SELECT any columns from the first table, and it's a FTS4 table (which always joins a bit slower than real tables), so I currently do: SELECT * FROM table2 WHERE rowid IN (SELECT rowid FROM table1 WHERE amount 500) It returns the same results, but it doesn't seem much faster. Is there any performance difference to be expected from using IN instead of JOIN, or does SQLite internally rewrite JOIN queries to something similar as IN, which would explain they perform nearly the same? The logical operation you are doing is a semijoin, filtering table2 by matching rows in table1 (if you used NOT IN instead you would be doing an antijoin). A semijoin is most clearly expressed in SQL using the WHERE clause as you did, because the only purpose of table1 is to filter and not to return values from, as putting it in FROM would imply. Now because SQL is bag oriented rather than set oriented, using IN also helps because you avoid generating extra duplicates, whereas if you used the join method instead, then if any row in one table matched multiple rows in the other (because you weren't joining on a (unique) key of both tables), the result could have duplicate table2 rows, which probably isn't what you want. As to your performance question, any good DBMS should make both of your methods perform about the same, but that if they aren't the same, the IN version should always perform faster than the FROM version because with IN you only ever have to look at each row in table2 once; as soon as it finds any match you move on, rather than repeating for all possible matches. Note that semijoins and antijoins are what you have both when you have another select after the NOT/IN and when you have a literal list, such as "IN (1,2,3)". Note that any WHERE clause that consists just of ANDed equality tests, such as the common "WHERE foo = 3" is also a trivial case of a semijoin where the table you are filtering on has exactly 1 row whose field value is 3, and ostensibly such WHERE clauses should also be optimizable. -- Darren Duncan ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] JOIN vs IN
On Nov 12, 2011, at 1:58 AM, Pavel Ivanov wrote: > No, "exists" in this case will change query plan significantly and > performance can degrade drastically as a result. Why would that be? ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] JOIN vs IN
On Fri, Nov 11, 2011 at 11:58 AM, Petite Abeille wrote: >> It returns the same results, but it doesn't seem much faster. Is there any >> performance difference to be expected from using IN instead of JOIN, or >> does SQLite internally rewrite JOIN queries to something similar as IN, >> which would explain they perform nearly the same? > > They should be equivalent in terms of cost. That said, you might want to use > the 'exists' clause instead for the sake of clarity. No, "exists" in this case will change query plan significantly and performance can degrade drastically as a result. For the original question: it's not that SQLite rewrites JOIN queries to be as IN. It's just in your particular case both queries can be executed in the same way: find all rows in table1 with the necessary amount, for each row look into table2 and find rows with the same rowid. Query with JOIN however could be executed differently - for each row in table2 find all rows with the same rowid in table1 and then check amount in them. SQLite decided that this query plan will be less effecient. Pavel ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] JOIN vs IN
On Nov 11, 2011, at 3:03 PM, Fabian wrote: > It returns the same results, but it doesn't seem much faster. Is there any > performance difference to be expected from using IN instead of JOIN, or > does SQLite internally rewrite JOIN queries to something similar as IN, > which would explain they perform nearly the same? They should be equivalent in terms of cost. That said, you might want to use the 'exists' clause instead for the sake of clarity. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users