Fabian wrote:
Suppose I have two tables, and I want to have look for a value in the first
table, but display the columns from the second table. The most obvious way
would be joining them on rowid. But I don't need to SELECT any columns from
the first table, and it's a FTS4 table (which always joins a bit slower
than real tables), so I currently do:

SELECT * FROM table2 WHERE rowid IN (SELECT rowid FROM table1 WHERE amount
500)

It returns the same results, but it doesn't seem much faster. Is there any
performance difference to be expected from using IN instead of JOIN, or
does SQLite internally rewrite JOIN queries to something similar as IN,
which would explain they perform nearly the same?

The logical operation you are doing is a semijoin, filtering table2 by matching rows in table1 (if you used NOT IN instead you would be doing an antijoin).

A semijoin is most clearly expressed in SQL using the WHERE clause as you did, because the only purpose of table1 is to filter and not to return values from, as putting it in FROM would imply.

Now because SQL is bag oriented rather than set oriented, using IN also helps because you avoid generating extra duplicates, whereas if you used the join method instead, then if any row in one table matched multiple rows in the other (because you weren't joining on a (unique) key of both tables), the result could have duplicate table2 rows, which probably isn't what you want.

As to your performance question, any good DBMS should make both of your methods perform about the same, but that if they aren't the same, the IN version should always perform faster than the FROM version because with IN you only ever have to look at each row in table2 once; as soon as it finds any match you move on, rather than repeating for all possible matches.

Note that semijoins and antijoins are what you have both when you have another select after the NOT/IN and when you have a literal list, such as "IN (1,2,3)".

Note that any WHERE clause that consists just of ANDed equality tests, such as the common "WHERE foo = 3" is also a trivial case of a semijoin where the table you are filtering on has exactly 1 row whose field value is 3, and ostensibly such WHERE clauses should also be optimizable.

-- Darren Duncan
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to