Hi all,

can someone explain me why this query gets executed the way it is? I
simplified it as much as possible and think it's a conceptual/logical
thing, so I'll omit - at least for now - the table definitions and
sample data for brevity.

-- "Superquery"
EXPLAIN SELECT t1.id, t1.name
FROM document AS t1
WHERE t1.id
IN (
        -- "Subquery"
        SELECT DISTINCT data_id
        FROM wfd_reference
        WHERE wfd_field_id =699
        AND ref_data_id
        IN ( 171 )
)

This gives:

*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: t1
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 277
        Extra: Using where
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: wfd_reference
         type: ref
possible_keys: field_data,test
          key: field_data
      key_len: 4
          ref: const,func
         rows: 4
        Extra: Using where; Using temporary

However - why do we need the *dependent* subquery at all? The subquery
can be executed on its own, as it does not depend on any information of
the "superquery".

EXPLAIN SELECT DISTINCT data_id
FROM wfd_reference
WHERE wfd_field_id =699
AND ref_data_id
IN ( 171 ) 

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: wfd_reference
         type: ref
possible_keys: field_data,test
          key: test
      key_len: 4
          ref: const,const
         rows: 9
        Extra: Using where; Using temporary

Now if I just take the result of this query, concat the data_ids on the
application level and build the superquery as follows:

-- "two-staged superquery variant"
EXPLAIN SELECT t1.id, t1.name
FROM document AS t1
WHERE t1.id
IN (
32, 31, 30, 53, 56, 57, 58, 59, 60, 111
)

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 2
          ref: NULL
         rows: 10
        Extra: Using where

the results seem to be much better. 

I was afraid of the dependent subquery for the "ALL" scan of t1, as t1
will become huge. OTOH, the subquery will be very restrictive: The
number of data_ids will always be very small, at least compared to the
number of rows in t1.

Are there any non-obvious reasons for the behaviour described above? Is
that something that cannot be optimized right now? Am I too
short-sighted with my "optimization" approach?

Thanks a lot,
Matthias


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to