Hello,
 
I'm trying to determine the best way to optimize the query below. Right now
it is taking around 9mins and we need it to take no more than 30 seconds (we
can get it under 30s on MS SQL):
 
explain select count(distinct(phone)) as TOTAL
FROM speedlink
WHERE
county in('247','085','145','285','215','211') AND
state = 'GA' AND
(
 homeowner = 'Y' OR 
 probable_homeowner IN ('8','9') OR
 homeowner_probability_model BETWEEN '080' AND '102'
) AND
phone IS NOT NULL AND
first IS NOT NULL AND
last IS NOT NULL
--------------
 
+----+-------------+-----------+------+-------------------------------------
-------------------------------------------+-----------+---------+-------+--
-------+-------------+
| id | select_type | table     | type | possible_keys
| key       | key_len | ref   | rows    | Extra       |
+----+-------------+-----------+------+-------------------------------------
-------------------------------------------+-----------+---------+-------+--
-------+-------------+
|  1 | SIMPLE      | speedlink | ref  |
idx_state,idx_county,idx_phone,idx_homeowner,idx_hpm,idx_ph,idx_first,idx_la
st | idx_state |       3 | const | 2840162 | Using where |
+----+-------------+-----------+------+-------------------------------------
-------------------------------------------+-----------+---------+-------+--
-------+-------------+
1 row in set (0.00 sec)
 
 
 
Here are some of my thoughts on what could be done to speed it up, but
haven't implemented these yet:
1) make the phone field UNIQUE on data load. This would reduce the data for
other queries to be run but maybe it makes sense to have a few table sets.
2) change the numeric fields from varchars to ints, smallints or something
like that.
3) Not sure if NULL values are slower or faster than using a comparison with
'  ', interested on feedback.
4) Split the table into multiple files? I'm not sure how to do this but have
seen it mentioned in some articles.
 
Some information on the table:
 - ISAM
 - 120 million rows
 - 26 fields in total
 - 23 fields indexed (all fields in the above query are indexed)
 - speedlink.MYD is 12GB, speedlink.MYI is 24GB
 
Some info on the server
 - Quad Xeon 900MHz 
 - 4GB RAM
 - DB is storage on an EMC Symmetrix storage system (fibre channel SAN)
 
Any/all assistance is appreciated.
 
Thanks.
 
Chris.

Reply via email to