Hi all, can someone explain me why this query gets executed the way it is? I simplified it as much as possible and think it's a conceptual/logical thing, so I'll omit - at least for now - the table definitions and sample data for brevity.
-- "Superquery" EXPLAIN SELECT t1.id, t1.name FROM document AS t1 WHERE t1.id IN ( -- "Subquery" SELECT DISTINCT data_id FROM wfd_reference WHERE wfd_field_id =699 AND ref_data_id IN ( 171 ) ) This gives: *************************** 1. row *************************** id: 1 select_type: PRIMARY table: t1 type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 277 Extra: Using where *************************** 2. row *************************** id: 2 select_type: DEPENDENT SUBQUERY table: wfd_reference type: ref possible_keys: field_data,test key: field_data key_len: 4 ref: const,func rows: 4 Extra: Using where; Using temporary However - why do we need the *dependent* subquery at all? The subquery can be executed on its own, as it does not depend on any information of the "superquery". EXPLAIN SELECT DISTINCT data_id FROM wfd_reference WHERE wfd_field_id =699 AND ref_data_id IN ( 171 ) *************************** 1. row *************************** id: 1 select_type: SIMPLE table: wfd_reference type: ref possible_keys: field_data,test key: test key_len: 4 ref: const,const rows: 9 Extra: Using where; Using temporary Now if I just take the result of this query, concat the data_ids on the application level and build the superquery as follows: -- "two-staged superquery variant" EXPLAIN SELECT t1.id, t1.name FROM document AS t1 WHERE t1.id IN ( 32, 31, 30, 53, 56, 57, 58, 59, 60, 111 ) *************************** 1. row *************************** id: 1 select_type: SIMPLE table: t1 type: range possible_keys: PRIMARY key: PRIMARY key_len: 2 ref: NULL rows: 10 Extra: Using where the results seem to be much better. I was afraid of the dependent subquery for the "ALL" scan of t1, as t1 will become huge. OTOH, the subquery will be very restrictive: The number of data_ids will always be very small, at least compared to the number of rows in t1. Are there any non-obvious reasons for the behaviour described above? Is that something that cannot be optimized right now? Am I too short-sighted with my "optimization" approach? Thanks a lot, Matthias -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]