Re: Any means to get the optimizer out of the way?
Harrison Fisk [EMAIL PROTECTED] writes: The difference between the count(*) and the other query is that the real query has to use the datafile to retrieve the data when you are involving the actual columns. With the count(*) query it is using an Index only read, meaning that it doesn't have to the use the datafile at all to resolve it. If you do an EXPLAIN on the count(*) query, you should see a 'Using Index' in the Extra column. So it is using the index and estimating it is going to have to read 3885524 rows from the data file. Assuming the estimate is close, that will be an extra 3885524 disk seeks and reads to find the data for your query. That is why vmstat is showing the query doing much more disk i/o. Harrison, Thanks for helping me to better understand what the explain was telling me. Though I'm not sure it quite fits with the other data I collected from the strace of the mysqld that servicing my query. It looked to be doing a sequential pread, based on the record size of 9 bytes. This was one reason I felt the query was doing a table scan to fulfill the query. One interesting experiment I did was to try to do a summation query with and without an index. The query with an index too 31 hrs. While the same data set without the index took 7 hours. The only way you could improve this is to make a combined index across (member_id, pts_awarded) and get rid of the only (member_id) index. Then MySQL would be able to again use only the index to resolve the query. Keep in mind this would increase your index size by about a third, so it would take more diskspace and you would fit less into cache, so it would decrease response times slightly for the count(*) query. Thank you for this suggestion. I've dropped the original index and have added the composite index to the table. It increased the size of the index file by about 50% as expected, but the summation query seems to be doing a much better job at scanning the data in the index rather than in the index/table combination. We'll see how long this summation query takes at this point. Thank you for your help and explanations, Brad Eacker ([EMAIL PROTECTED]) -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Any means to get the optimizer out of the way?
Folks, I have a 677M row table with index desc pts_awarded_snap; +-+-+--+-+-+---+ | Field | Type| Null | Key | Default | Extra | +-+-+--+-+-+---+ | member_id | int(11) | | MUL | 0 | | | PTS_AWARDED | int(11) | | | 0 | | +-+-+--+-+-+---+ 2 rows in set (0.00 sec) select count(*) from pts_awarded_snap; +---+ | count(*) | +---+ | 776723372 | +---+ 1 row in set (0.00 sec) Which you can see has an index on member_id. The problem is when I try to create a temporary table from a portion of this based upon the member_id, the optimizer decides that a table scan is necessary on this 7GB table. When I try: create temporary table T_awards_snap select member_id, pts_awarded from pts_awarded_snap FORCE INDEX (member_id) where member_id between 71 and 80; the optimizer decides to do a full table scan to choose the required rows. Yet a counting query of the same space: select count(*) from pts_awarded_snap where member_id between 71 and 80; +--+ | count(*) | +--+ | 3957677 | +--+ 1 row in set (5.84 sec) works fine. explain select member_id, pts_awarded from pts_awarded_snap FORCE INDEX (member_id) where member_id between 71 and 80; +--+---+---+---+-+--+-- ---+-+ | table| type | possible_keys | key | key_len | ref | rows | Extra | +--+---+---+---+-+--+-- ---+-+ | pts_awarded_snap | range | member_id | member_id | 4 | NULL | 3885524 | Using where | +--+---+---+---+-+--+-- ---+-+ 1 row in set (0.00 sec) seems to indicate the member_id index can be used. But the length of time to do this query, and the amount blocks being read as indicated by vmstat, tell me that a full table scan is being performed. I've looked over the optimization elements of the MySQL manual, trying set max_seeks_for_key=100; even 'FORCE INDEX (member_id)' has been tried as shown in the query above to no avail, a table scan is still done. Bottom line question out of all this: Is there a way to convince the optimizer that it is fine to use a range portion of the index to fulfill this query? Please let me know if there is something I've missed, Brad Eacker ([EMAIL PROTECTED]) -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Any means to get the optimizer out of the way?
Hi, On Feb 10, 2005, at 7:00 PM, Brad Eacker wrote: Folks, I have a 677M row table with index desc pts_awarded_snap; +-+-+--+-+-+---+ | Field | Type| Null | Key | Default | Extra | +-+-+--+-+-+---+ | member_id | int(11) | | MUL | 0 | | | PTS_AWARDED | int(11) | | | 0 | | +-+-+--+-+-+---+ 2 rows in set (0.00 sec) select count(*) from pts_awarded_snap; +---+ | count(*) | +---+ | 776723372 | +---+ 1 row in set (0.00 sec) Which you can see has an index on member_id. The problem is when I try to create a temporary table from a portion of this based upon the member_id, the optimizer decides that a table scan is necessary on this 7GB table. When I try: create temporary table T_awards_snap select member_id, pts_awarded from pts_awarded_snap FORCE INDEX (member_id) where member_id between 71 and 80; the optimizer decides to do a full table scan to choose the required rows. Yet a counting query of the same space: select count(*) from pts_awarded_snap where member_id between 71 and 80; +--+ | count(*) | +--+ | 3957677 | +--+ 1 row in set (5.84 sec) works fine. explain select member_id, pts_awarded from pts_awarded_snap FORCE INDEX (member_id) where member_id between 71 and 80; +--+---+---+---+- +--+-- ---+-+ | table| type | possible_keys | key | key_len | ref | rows | Extra | +--+---+---+---+- +--+-- ---+-+ | pts_awarded_snap | range | member_id | member_id | 4 | NULL | 3885524 | Using where | +--+---+---+---+- +--+-- ---+-+ 1 row in set (0.00 sec) seems to indicate the member_id index can be used. But the length of time to do this query, and the amount blocks being read as indicated by vmstat, tell me that a full table scan is being performed. As it says above, it is using the index to resolve the query. If it wasn't using the index it would say NULL under the key column listed above. The difference between the count(*) and the other query is that the real query has to use the datafile to retrieve the data when you are involving the actual columns. With the count(*) query it is using an Index only read, meaning that it doesn't have to the use the datafile at all to resolve it. If you do an EXPLAIN on the count(*) query, you should see a 'Using Index' in the Extra column. So it is using the index and estimating it is going to have to read 3885524 rows from the data file. Assuming the estimate is close, that will be an extra 3885524 disk seeks and reads to find the data for your query. That is why vmstat is showing the query doing much more disk i/o. The only way you could improve this is to make a combined index across (member_id, pts_awarded) and get rid of the only (member_id) index. Then MySQL would be able to again use only the index to resolve the query. Keep in mind this would increase your index size by about a third, so it would take more diskspace and you would fit less into cache, so it would decrease response times slightly for the count(*) query. Regards, Harrison -- Harrison C. Fisk, Trainer and Consultant MySQL AB, www.mysql.com Get a jumpstart on MySQL Cluster -- http://www.mysql.com/consulting/packaged/cluster.html -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]