Javier Diaz wrote:
EXPLAIN SELECT * FROM process_times WHERE date <= date_sub(now(), INTERVAL 2 day) <possible_keys>date_idx,date_proc_idx</possible_keys> <key>(NULL)</key> <rows>10778561</rows>
EXPLAIN SELECT * FROM process_times WHERE date = date_sub(now(), INTERVAL 2 day) <possible_keys>date_idx,date_proc_idx</possible_keys> <key>date_idx</key> <rows>1863456</rows>
MySQL always tries to find the index that will result in the smallest number of rows to evaluate for the final result. These queries show that you have ca. 11 million rows in the table of which ca. 2 million rows are two days old. My guess is that at most 4 million rows are less than two days old, so at least 7 million rows are 2 days old or older. 7 million is far more than 30% of 11 million, so the select for <= 2 days old will use a full table, which is likely to be faster in this case. In case of InnoDB the primary key is stored with the data (clustered indexes), so if date_idx were the primary index it would probably have used it, but first getting 65% of a secundary index and then looking it up in the primary index does not make sense.
EXPLAIN SELECT * FROM process_times WHERE date = date_sub(now(), INTERVAL 2 day) and process=1 <possible_keys>date_idx,process,date_proc_idx</possible_keys> <key>date_proc_idx</key> <rows>550726</rows>
The half a million or so records for this process that are exactly two days old can be retrieved efficiently by using the combined index date_proc_idx.
EXPLAIN SELECT * FROM process_times WHERE date <= date_sub(now(), INTERVAL 2 day) and process=1 <possible_keys>date_idx,process,date_proc_idx</possible_keys> <key>process</key> <rows>1830334</rows>
The combined index cannot be used because you cannot make a range condition for this case:
(0000-00-00 00:00:00, 1) < (date, process) < (2005-01-02 00:00:00, 1)This would include for example (2005-01-02 00:00:00 , 0) which does not meet the condition set in the query.
[See: http://dev.mysql.com/doc/refman/5.0/en/range-access-multi-part.html ]As we've already seen earlier the date_idx would result in far more than 30% of the rows, so only the process index can be used (ca. 17% of the rows).
I would like also ask if is valid to expect that the results of the Explain statement for a query like this SELECT * FROM TABLE-X WHERE [CONDITIONS]Are valid for the equivalentDELETE FROM TABLE-X WHERE [CONDITIONS]
Not necessarily. In many cases it will be about the same, but I suspect that technical conditions such as updating indexes, etc. might influence the execution path that MySQL choses.
You might investigate other methods such as moving the remaining records to a new table, dropping the old table and renaming the new table. This is just a quick idea, I haven't looked up how well InnoDB performs these operations and how it influences the other users of your data.
Regards, Jigal.
smime.p7s
Description: S/MIME Cryptographic Signature