Hi,

On Feb 10, 2005, at 7:00 PM, Brad Eacker wrote:

Folks,
     I have a 677M row table with index

desc pts_awarded_snap;
+-------------+---------+------+-----+---------+-------+
| Field       | Type    | Null | Key | Default | Extra |
+-------------+---------+------+-----+---------+-------+
| member_id   | int(11) |      | MUL | 0       |       |
| PTS_AWARDED | int(11) |      |     | 0       |       |
+-------------+---------+------+-----+---------+-------+
2 rows in set (0.00 sec)

select count(*) from pts_awarded_snap;
+-----------+
| count(*)  |
+-----------+
| 776723372 |
+-----------+
1 row in set (0.00 sec)

Which you can see has an index on member_id.

The problem is when I try to create a temporary table from a
portion of this based upon the member_id, the optimizer decides
that a table scan is necessary on this 7GB table.

When I try:
  create temporary table T_awards_snap
    select member_id, pts_awarded
    from pts_awarded_snap FORCE INDEX (member_id)
    where member_id between 700001 and 800000;

the optimizer decides to do a full table scan to choose the
required rows.

Yet a counting query  of the same space:

  select count(*) from pts_awarded_snap
    where member_id between 700001 and 800000;
+----------+
| count(*) |
+----------+
|  3957677 |
+----------+
1 row in set (5.84 sec)

works fine.

explain select member_id, pts_awarded
from pts_awarded_snap FORCE INDEX (member_id)
where member_id between 700001 and 800000;
+------------------+-------+---------------+-----------+--------- +------+------
---+-------------+
| table | type | possible_keys | key | key_len | ref | rows
| Extra |
+------------------+-------+---------------+-----------+--------- +------+------
---+-------------+
| pts_awarded_snap | range | member_id | member_id | 4 | NULL |
3885524 | Using where |
+------------------+-------+---------------+-----------+--------- +------+------
---+-------------+
1 row in set (0.00 sec)


seems to indicate the member_id index can be used. But the length of time to
do this query, and the amount blocks being read as indicated by vmstat, tell
me that a full table scan is being performed.

As it says above, it is using the index to resolve the query. If it wasn't using the index it would say NULL under the key column listed above.


The difference between the count(*) and the other query is that the real query has to use the datafile to retrieve the data when you are involving the actual columns. With the count(*) query it is using an Index only read, meaning that it doesn't have to the use the datafile at all to resolve it. If you do an EXPLAIN on the count(*) query, you should see a 'Using Index' in the Extra column. So it is using the index and estimating it is going to have to read 3885524 rows from the data file. Assuming the estimate is close, that will be an extra 3885524 disk seeks and reads to find the data for your query. That is why vmstat is showing the query doing much more disk i/o.

The only way you could improve this is to make a combined index across (member_id, pts_awarded) and get rid of the only (member_id) index. Then MySQL would be able to again use only the index to resolve the query. Keep in mind this would increase your index size by about a third, so it would take more diskspace and you would fit less into cache, so it would decrease response times slightly for the count(*) query.

Regards,

Harrison

--
Harrison C. Fisk, Trainer and Consultant
MySQL AB, www.mysql.com

Get a jumpstart on MySQL Cluster -- http://www.mysql.com/consulting/packaged/cluster.html


-- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]



Reply via email to