Re: Any means to get the optimizer out of the way?

2005-02-11 Thread beacker
Harrison Fisk [EMAIL PROTECTED] writes:
The difference between the count(*) and the other query is that the  
real query has to use the datafile to retrieve the data when you are  
involving the actual columns.  With the count(*) query it is using an  
Index only read, meaning that it doesn't have to the use the datafile  
at all to resolve it.  If you do an EXPLAIN on the count(*) query, you  
should see a 'Using Index' in the Extra column.  So it is using the  
index and estimating it is going to have to read 3885524 rows from the  
data file.  Assuming the estimate is close, that will be an extra  
3885524 disk seeks and reads to find the data for your query.  That is  
why vmstat is showing the query doing much more disk i/o.

Harrison,
 Thanks for helping me to better understand what the explain was
telling me.  Though I'm not sure it quite fits with the other data
I collected from the strace of the mysqld that servicing my query.  It
looked to be doing a sequential pread, based on the record size of
9 bytes.  This was one reason I felt the query was doing a table
scan to fulfill the query.

 One interesting experiment I did was to try to do a summation
query with and without an index.  The query with an index too 31 hrs.
While the same data set without the index took 7 hours.

The only way you could improve this is to make a combined index across  
(member_id, pts_awarded) and get rid of the only (member_id) index.   
Then MySQL would be able to again use only the index to resolve the  
query.  Keep in mind this would increase your index size by about a  
third, so it would take more diskspace and you would fit less into  
cache, so it would decrease response times slightly for the count(*)  
query.

 Thank you for this suggestion.  I've dropped the original index
and have added the composite index to the table.  It increased the
size of the index file by about 50% as expected, but the summation
query seems to be doing a much better job at scanning the data in
the index rather than in the index/table combination.  We'll see
how long this summation query takes at this point.

   Thank you for your help and explanations,
Brad Eacker ([EMAIL PROTECTED])



-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Any means to get the optimizer out of the way?

2005-02-10 Thread Brad Eacker
Folks,
 I have a 677M row table with index

desc pts_awarded_snap;
+-+-+--+-+-+---+
| Field   | Type| Null | Key | Default | Extra |
+-+-+--+-+-+---+
| member_id   | int(11) |  | MUL | 0   |   |
| PTS_AWARDED | int(11) |  | | 0   |   |
+-+-+--+-+-+---+
2 rows in set (0.00 sec)

select count(*) from pts_awarded_snap;
+---+
| count(*)  |
+---+
| 776723372 |
+---+
1 row in set (0.00 sec)

Which you can see has an index on member_id.

The problem is when I try to create a temporary table from a
portion of this based upon the member_id, the optimizer decides
that a table scan is necessary on this 7GB table.

When I try:
  create temporary table T_awards_snap
select member_id, pts_awarded
from pts_awarded_snap FORCE INDEX (member_id)
where member_id between 71 and 80;

the optimizer decides to do a full table scan to choose the
required rows.

Yet a counting query  of the same space:

  select count(*) from pts_awarded_snap
where member_id between 71 and 80;
+--+
| count(*) |
+--+
|  3957677 |
+--+
1 row in set (5.84 sec)

works fine.

  explain select member_id, pts_awarded
from pts_awarded_snap FORCE INDEX (member_id)
where member_id between 71 and 80;
+--+---+---+---+-+--+--
---+-+
| table| type  | possible_keys | key   | key_len | ref  | rows 
   | Extra   |
+--+---+---+---+-+--+--
---+-+
| pts_awarded_snap | range | member_id | member_id |   4 | NULL | 
3885524 | Using where |
+--+---+---+---+-+--+--
---+-+
1 row in set (0.00 sec)

seems to indicate the member_id index can be used.  But the length of time to
do this query, and the amount blocks being read as indicated by vmstat, tell
me that a full table scan is being performed.

 I've looked over the optimization elements of the MySQL manual, trying

set max_seeks_for_key=100;

even 'FORCE INDEX (member_id)' has been tried as shown in the query above
to no avail, a table scan is still done.

 Bottom line question out of all this:  Is there a way to convince the
optimizer that it is fine to use a range portion of the index to fulfill
this query?

  Please let me know if there is something I've missed,
  Brad Eacker ([EMAIL PROTECTED])



-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: Any means to get the optimizer out of the way?

2005-02-10 Thread Harrison Fisk
Hi,
On Feb 10, 2005, at 7:00 PM, Brad Eacker wrote:
Folks,
 I have a 677M row table with index
desc pts_awarded_snap;
+-+-+--+-+-+---+
| Field   | Type| Null | Key | Default | Extra |
+-+-+--+-+-+---+
| member_id   | int(11) |  | MUL | 0   |   |
| PTS_AWARDED | int(11) |  | | 0   |   |
+-+-+--+-+-+---+
2 rows in set (0.00 sec)
select count(*) from pts_awarded_snap;
+---+
| count(*)  |
+---+
| 776723372 |
+---+
1 row in set (0.00 sec)
Which you can see has an index on member_id.
The problem is when I try to create a temporary table from a
portion of this based upon the member_id, the optimizer decides
that a table scan is necessary on this 7GB table.
When I try:
  create temporary table T_awards_snap
select member_id, pts_awarded
from pts_awarded_snap FORCE INDEX (member_id)
where member_id between 71 and 80;
the optimizer decides to do a full table scan to choose the
required rows.
Yet a counting query  of the same space:
  select count(*) from pts_awarded_snap
where member_id between 71 and 80;
+--+
| count(*) |
+--+
|  3957677 |
+--+
1 row in set (5.84 sec)
works fine.
  explain select member_id, pts_awarded
from pts_awarded_snap FORCE INDEX (member_id)
where member_id between 71 and 80;
+--+---+---+---+- 
+--+--
---+-+
| table| type  | possible_keys | key   | key_len | ref  
 | rows
   | Extra   |
+--+---+---+---+- 
+--+--
---+-+
| pts_awarded_snap | range | member_id | member_id |   4 |  
NULL |
3885524 | Using where |
+--+---+---+---+- 
+--+--
---+-+
1 row in set (0.00 sec)

seems to indicate the member_id index can be used.  But the length of  
time to
do this query, and the amount blocks being read as indicated by  
vmstat, tell
me that a full table scan is being performed.
As it says above, it is using the index to resolve the query.  If it  
wasn't using the index it would say NULL under the key column listed  
above.

The difference between the count(*) and the other query is that the  
real query has to use the datafile to retrieve the data when you are  
involving the actual columns.  With the count(*) query it is using an  
Index only read, meaning that it doesn't have to the use the datafile  
at all to resolve it.  If you do an EXPLAIN on the count(*) query, you  
should see a 'Using Index' in the Extra column.  So it is using the  
index and estimating it is going to have to read 3885524 rows from the  
data file.  Assuming the estimate is close, that will be an extra  
3885524 disk seeks and reads to find the data for your query.  That is  
why vmstat is showing the query doing much more disk i/o.

The only way you could improve this is to make a combined index across  
(member_id, pts_awarded) and get rid of the only (member_id) index.   
Then MySQL would be able to again use only the index to resolve the  
query.  Keep in mind this would increase your index size by about a  
third, so it would take more diskspace and you would fit less into  
cache, so it would decrease response times slightly for the count(*)  
query.

Regards,
Harrison
--
Harrison C. Fisk, Trainer and Consultant
MySQL AB, www.mysql.com
Get a jumpstart on MySQL Cluster --  
http://www.mysql.com/consulting/packaged/cluster.html

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]