Hi all,
can someone explain me why this query gets executed the way it is? I
simplified it as much as possible and think it's a conceptual/logical
thing, so I'll omit - at least for now - the table definitions and
sample data for brevity.
-- "Superquery"
EXPLAIN SELECT t1.id, t1.name
FROM document AS t1
WHERE t1.id
IN (
-- "Subquery"
SELECT DISTINCT data_id
FROM wfd_reference
WHERE wfd_field_id =699
AND ref_data_id
IN ( 171 )
)
This gives:
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: t1
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 277
Extra: Using where
*************************** 2. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: wfd_reference
type: ref
possible_keys: field_data,test
key: field_data
key_len: 4
ref: const,func
rows: 4
Extra: Using where; Using temporary
However - why do we need the *dependent* subquery at all? The subquery
can be executed on its own, as it does not depend on any information of
the "superquery".
EXPLAIN SELECT DISTINCT data_id
FROM wfd_reference
WHERE wfd_field_id =699
AND ref_data_id
IN ( 171 )
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: wfd_reference
type: ref
possible_keys: field_data,test
key: test
key_len: 4
ref: const,const
rows: 9
Extra: Using where; Using temporary
Now if I just take the result of this query, concat the data_ids on the
application level and build the superquery as follows:
-- "two-staged superquery variant"
EXPLAIN SELECT t1.id, t1.name
FROM document AS t1
WHERE t1.id
IN (
32, 31, 30, 53, 56, 57, 58, 59, 60, 111
)
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
type: range
possible_keys: PRIMARY
key: PRIMARY
key_len: 2
ref: NULL
rows: 10
Extra: Using where
the results seem to be much better.
I was afraid of the dependent subquery for the "ALL" scan of t1, as t1
will become huge. OTOH, the subquery will be very restrictive: The
number of data_ids will always be very small, at least compared to the
number of rows in t1.
Are there any non-obvious reasons for the behaviour described above? Is
that something that cannot be optimized right now? Am I too
short-sighted with my "optimization" approach?
Thanks a lot,
Matthias
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]