As others have mentioned, mysql doesn't handle IN queries efficiently.
You can try changing it to using derived tables/subqueries. I did some
quick tests and the explain shows a different analysis.

select comment, gid, date_posted from tbl
JOIN
(select max(id) as mid
from tbl where gid in ( 1234,2345,3456 .. 7890 ) group by gid) as maxids
ON tpl.id=maxids.mid;

You're really just changing one of the IN statements to a join. My
quick tests showed that a full table is still being done, but it's now
on the derived table (maxids), which would only be as large as how
many gids you are searching on (50?).
I tested this on two related tables, 170K in one and 90K in the other.
Your tables are currently much smaller, so mysql may come up with a
different execution path. For me, the join syntax was far faster.


On 9/21/07, Erik Giberti <[EMAIL PROTECTED]> wrote:
> Hello everyone,
>
> The app server in this case is PHP, and the database is MySQL 5.0.22
> on RedHat linux
>
> I've got a database with about 7.5K records in it that I expect to
> start growing very quickly ~10-12K records per day. The storage
> engine is InnoDB. This table is growing quickly and will continue to
> grow for a long time. This table stores comments (as you can see from
> the structure) and is being used to display a list of comments based
> on a users affiliations.
>
> The structure is approximately this - I'm leaving out unrelated columns:
>
> id int - primary key - auto increment
> gid bigint - indexed
> comment varchar
> date_posted timestamp
>
> I run a query with the following form
>
> select comment, gid, date_posted from tbl where id in (select max(id)
> from tbl where gid in ( 1234,2345,3456 .. 7890 ) group by gid);
>
> I have an index on gid and id is the primary key
>
> When I describe the query with about 50 gid values inserted (where
> indicated above) I get the following:
>
> +----+--------------------+-------+-------+---------------+---------
> +---------+------+------+--------------------------+
> | id | select_type        | table | type  | possible_keys | key     |
> key_len | ref  | rows | Extra                    |
> +----+--------------------+-------+-------+---------------+---------
> +---------+------+------+--------------------------+
> |  1 | PRIMARY            | tbl   | ALL   | NULL          | NULL    |
> NULL    | NULL | 7533 | Using where              |
> |  2 | DEPENDENT SUBQUERY | tbl   | range | idx_gid       | idx_gid |
> 9       | NULL |   58 | Using where; Using index |
> +----+--------------------+-------+-------+---------------+---------
> +---------+------+------+--------------------------+
>
> Running the query on a production machine with sufficient memory and
> horsepower (box is only 20% utilized) it still takes 3 seconds to run
> - obviously not quick enough for web use.
>
> What I really need is the most recent comment from each group based
> on a variable set of gid's that change from user to user.
>
> Any thoughts on how to tweak this to avoid the full table scan? Thank
> you in advance for your assistance.
>
> Erik Giberti
>
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]
>
>

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to