Brent,

I tried this and it definitely boosted performance. On a test query that would take 2+ seconds to run with 20 id's - it ran in 0.002 seconds.

Thanks everyone for your help and comments.

Erik

On Sep 21, 2007, at 2:01 PM, Brent Baisley wrote:

As others have mentioned, mysql doesn't handle IN queries efficiently.
You can try changing it to using derived tables/subqueries. I did some
quick tests and the explain shows a different analysis.

select comment, gid, date_posted from tbl
JOIN
(select max(id) as mid
from tbl where gid in ( 1234,2345,3456 .. 7890 ) group by gid) as maxids
ON tpl.id=maxids.mid;

You're really just changing one of the IN statements to a join. My
quick tests showed that a full table is still being done, but it's now
on the derived table (maxids), which would only be as large as how
many gids you are searching on (50?).
I tested this on two related tables, 170K in one and 90K in the other.
Your tables are currently much smaller, so mysql may come up with a
different execution path. For me, the join syntax was far faster.


On 9/21/07, Erik Giberti <[EMAIL PROTECTED]> wrote:
Hello everyone,

The app server in this case is PHP, and the database is MySQL 5.0.22
on RedHat linux

I've got a database with about 7.5K records in it that I expect to
start growing very quickly ~10-12K records per day. The storage
engine is InnoDB. This table is growing quickly and will continue to
grow for a long time. This table stores comments (as you can see from
the structure) and is being used to display a list of comments based
on a users affiliations.

The structure is approximately this - I'm leaving out unrelated columns:

id int - primary key - auto increment
gid bigint - indexed
comment varchar
date_posted timestamp

I run a query with the following form

select comment, gid, date_posted from tbl where id in (select max(id)
from tbl where gid in ( 1234,2345,3456 .. 7890 ) group by gid);

I have an index on gid and id is the primary key

When I describe the query with about 50 gid values inserted (where
indicated above) I get the following:

+----+--------------------+-------+-------+---------------+---------
+---------+------+------+--------------------------+
| id | select_type        | table | type  | possible_keys | key     |
key_len | ref  | rows | Extra                    |
+----+--------------------+-------+-------+---------------+---------
+---------+------+------+--------------------------+
|  1 | PRIMARY            | tbl   | ALL   | NULL          | NULL    |
NULL    | NULL | 7533 | Using where              |
|  2 | DEPENDENT SUBQUERY | tbl   | range | idx_gid       | idx_gid |
9       | NULL |   58 | Using where; Using index |
+----+--------------------+-------+-------+---------------+---------
+---------+------+------+--------------------------+

Running the query on a production machine with sufficient memory and
horsepower (box is only 20% utilized) it still takes 3 seconds to run
- obviously not quick enough for web use.

What I really need is the most recent comment from each group based
on a variable set of gid's that change from user to user.

Any thoughts on how to tweak this to avoid the full table scan? Thank
you in advance for your assistance.

Erik Giberti


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql? [EMAIL PROTECTED]



--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]



--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to