Re: Assistance avoiding a full table scan
Brent, I tried this and it definitely boosted performance. On a test query that would take 2+ seconds to run with 20 id's - it ran in 0.002 seconds. Thanks everyone for your help and comments. Erik On Sep 21, 2007, at 2:01 PM, Brent Baisley wrote: As others have mentioned, mysql doesn't handle IN queries efficiently. You can try changing it to using derived tables/subqueries. I did some quick tests and the explain shows a different analysis. select comment, gid, date_posted from tbl JOIN (select max(id) as mid from tbl where gid in ( 1234,2345,3456 .. 7890 ) group by gid) as maxids ON tpl.id=maxids.mid; You're really just changing one of the IN statements to a join. My quick tests showed that a full table is still being done, but it's now on the derived table (maxids), which would only be as large as how many gids you are searching on (50?). I tested this on two related tables, 170K in one and 90K in the other. Your tables are currently much smaller, so mysql may come up with a different execution path. For me, the join syntax was far faster. On 9/21/07, Erik Giberti <[EMAIL PROTECTED]> wrote: Hello everyone, The app server in this case is PHP, and the database is MySQL 5.0.22 on RedHat linux I've got a database with about 7.5K records in it that I expect to start growing very quickly ~10-12K records per day. The storage engine is InnoDB. This table is growing quickly and will continue to grow for a long time. This table stores comments (as you can see from the structure) and is being used to display a list of comments based on a users affiliations. The structure is approximately this - I'm leaving out unrelated columns: id int - primary key - auto increment gid bigint - indexed comment varchar date_posted timestamp I run a query with the following form select comment, gid, date_posted from tbl where id in (select max(id) from tbl where gid in ( 1234,2345,3456 .. 7890 ) group by gid); I have an index on gid and id is the primary key When I describe the query with about 50 gid values inserted (where indicated above) I get the following: +++---+---+---+- +-+--+--+--+ | id | select_type| table | type | possible_keys | key | key_len | ref | rows | Extra| +++---+---+---+- +-+--+--+--+ | 1 | PRIMARY| tbl | ALL | NULL | NULL| NULL| NULL | 7533 | Using where | | 2 | DEPENDENT SUBQUERY | tbl | range | idx_gid | idx_gid | 9 | NULL | 58 | Using where; Using index | +++---+---+---+- +-+--+--+--+ Running the query on a production machine with sufficient memory and horsepower (box is only 20% utilized) it still takes 3 seconds to run - obviously not quick enough for web use. What I really need is the most recent comment from each group based on a variable set of gid's that change from user to user. Any thoughts on how to tweak this to avoid the full table scan? Thank you in advance for your assistance. Erik Giberti -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql? [EMAIL PROTECTED] -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED] -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Assistance avoiding a full table scan
As others have mentioned, mysql doesn't handle IN queries efficiently. You can try changing it to using derived tables/subqueries. I did some quick tests and the explain shows a different analysis. select comment, gid, date_posted from tbl JOIN (select max(id) as mid from tbl where gid in ( 1234,2345,3456 .. 7890 ) group by gid) as maxids ON tpl.id=maxids.mid; You're really just changing one of the IN statements to a join. My quick tests showed that a full table is still being done, but it's now on the derived table (maxids), which would only be as large as how many gids you are searching on (50?). I tested this on two related tables, 170K in one and 90K in the other. Your tables are currently much smaller, so mysql may come up with a different execution path. For me, the join syntax was far faster. On 9/21/07, Erik Giberti <[EMAIL PROTECTED]> wrote: > Hello everyone, > > The app server in this case is PHP, and the database is MySQL 5.0.22 > on RedHat linux > > I've got a database with about 7.5K records in it that I expect to > start growing very quickly ~10-12K records per day. The storage > engine is InnoDB. This table is growing quickly and will continue to > grow for a long time. This table stores comments (as you can see from > the structure) and is being used to display a list of comments based > on a users affiliations. > > The structure is approximately this - I'm leaving out unrelated columns: > > id int - primary key - auto increment > gid bigint - indexed > comment varchar > date_posted timestamp > > I run a query with the following form > > select comment, gid, date_posted from tbl where id in (select max(id) > from tbl where gid in ( 1234,2345,3456 .. 7890 ) group by gid); > > I have an index on gid and id is the primary key > > When I describe the query with about 50 gid values inserted (where > indicated above) I get the following: > > +++---+---+---+- > +-+--+--+--+ > | id | select_type| table | type | possible_keys | key | > key_len | ref | rows | Extra| > +++---+---+---+- > +-+--+--+--+ > | 1 | PRIMARY| tbl | ALL | NULL | NULL| > NULL| NULL | 7533 | Using where | > | 2 | DEPENDENT SUBQUERY | tbl | range | idx_gid | idx_gid | > 9 | NULL | 58 | Using where; Using index | > +++---+---+---+- > +-+--+--+--+ > > Running the query on a production machine with sufficient memory and > horsepower (box is only 20% utilized) it still takes 3 seconds to run > - obviously not quick enough for web use. > > What I really need is the most recent comment from each group based > on a variable set of gid's that change from user to user. > > Any thoughts on how to tweak this to avoid the full table scan? Thank > you in advance for your assistance. > > Erik Giberti > > > -- > MySQL General Mailing List > For list archives: http://lists.mysql.com/mysql > To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED] > > -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Assistance avoiding a full table scan
Erik, Even is you eliminate the subquery (which I have used efficiently in the past, but they are always something to be careful of), the IN clause is going to kill you.. above a certain number of elements in that clause, the optimizer will go straight to full table scan. This drawback is well-known. On 9/21/07, Dan Buettner <[EMAIL PROTECTED]> wrote: > Erik, I think the main reason your query is running slowly is the use of a > subselect. MySQL does not generally perform well with subselects, though > work continues in that area. > > There is also a problem/situation in MySQL in that you can't use MAX/GROUP > BY functions quite the way you can in other databases; you'll get an > accurate MAX value for one column, but the value in another won't > necessarily be from the same row. Someone posted on the list about this > recently, calling it a bug, and I tend to agree. > > To solve your problem: > > I would take one of two approaches. > > First approach: split it into two queries in PHP, and use the results of the > first in the second, like so: > > query1 = select max(id) from tbl where gid in ( 1234,2345,3456 .. 7890 ) > group by gid > > in PHP: id_string = join the results with commas. implode function? > > query2 = select comment, gid, date_posted from tbl where id in (id_string) > > Generally speaking, fewer queries = higher performance, and databases are > optimized to join tables, they do it well - but in your case I think you'll > find one of these works better. > > Second approach: > > Insert values from first query into a temporary table, then join on that > temp table in your second query. > > I don't think either approach will have a speed advantage, and the first is > probably easier to code. > > HTH, > Dan > > > > On 9/21/07, Erik Giberti <[EMAIL PROTECTED]> wrote: > > > > Hello everyone, > > > > The app server in this case is PHP, and the database is MySQL 5.0.22 > > on RedHat linux > > > > I've got a database with about 7.5K records in it that I expect to > > start growing very quickly ~10-12K records per day. The storage > > engine is InnoDB. This table is growing quickly and will continue to > > grow for a long time. This table stores comments (as you can see from > > the structure) and is being used to display a list of comments based > > on a users affiliations. > > > > The structure is approximately this - I'm leaving out unrelated columns: > > > > id int - primary key - auto increment > > gid bigint - indexed > > comment varchar > > date_posted timestamp > > > > I run a query with the following form > > > > select comment, gid, date_posted from tbl where id in (select max(id) > > from tbl where gid in ( 1234,2345,3456 .. 7890 ) group by gid); > > > > I have an index on gid and id is the primary key > > > > When I describe the query with about 50 gid values inserted (where > > indicated above) I get the following: > > > > +++---+---+---+- > > +-+--+--+--+ > > | id | select_type| table | type | possible_keys | key | > > key_len | ref | rows | Extra| > > +++---+---+---+- > > +-+--+--+--+ > > | 1 | PRIMARY| tbl | ALL | NULL | NULL| > > NULL| NULL | 7533 | Using where | > > | 2 | DEPENDENT SUBQUERY | tbl | range | idx_gid | idx_gid | > > 9 | NULL | 58 | Using where; Using index | > > +++---+---+---+- > > +-+--+--+--+ > > > > Running the query on a production machine with sufficient memory and > > horsepower (box is only 20% utilized) it still takes 3 seconds to run > > - obviously not quick enough for web use. > > > > What I really need is the most recent comment from each group based > > on a variable set of gid's that change from user to user. > > > > Any thoughts on how to tweak this to avoid the full table scan? Thank > > you in advance for your assistance. > > > > Erik Giberti > > > > > > -- > > MySQL General Mailing List > > For list archives: http://lists.mysql.com/mysql > > To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED] > > > > > -- - michael dykman - [EMAIL PROTECTED] - All models are wrong. Some models are useful. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Assistance avoiding a full table scan
Erik, I think the main reason your query is running slowly is the use of a subselect. MySQL does not generally perform well with subselects, though work continues in that area. There is also a problem/situation in MySQL in that you can't use MAX/GROUP BY functions quite the way you can in other databases; you'll get an accurate MAX value for one column, but the value in another won't necessarily be from the same row. Someone posted on the list about this recently, calling it a bug, and I tend to agree. To solve your problem: I would take one of two approaches. First approach: split it into two queries in PHP, and use the results of the first in the second, like so: query1 = select max(id) from tbl where gid in ( 1234,2345,3456 .. 7890 ) group by gid in PHP: id_string = join the results with commas. implode function? query2 = select comment, gid, date_posted from tbl where id in (id_string) Generally speaking, fewer queries = higher performance, and databases are optimized to join tables, they do it well - but in your case I think you'll find one of these works better. Second approach: Insert values from first query into a temporary table, then join on that temp table in your second query. I don't think either approach will have a speed advantage, and the first is probably easier to code. HTH, Dan On 9/21/07, Erik Giberti <[EMAIL PROTECTED]> wrote: > > Hello everyone, > > The app server in this case is PHP, and the database is MySQL 5.0.22 > on RedHat linux > > I've got a database with about 7.5K records in it that I expect to > start growing very quickly ~10-12K records per day. The storage > engine is InnoDB. This table is growing quickly and will continue to > grow for a long time. This table stores comments (as you can see from > the structure) and is being used to display a list of comments based > on a users affiliations. > > The structure is approximately this - I'm leaving out unrelated columns: > > id int - primary key - auto increment > gid bigint - indexed > comment varchar > date_posted timestamp > > I run a query with the following form > > select comment, gid, date_posted from tbl where id in (select max(id) > from tbl where gid in ( 1234,2345,3456 .. 7890 ) group by gid); > > I have an index on gid and id is the primary key > > When I describe the query with about 50 gid values inserted (where > indicated above) I get the following: > > +++---+---+---+- > +-+--+--+--+ > | id | select_type| table | type | possible_keys | key | > key_len | ref | rows | Extra| > +++---+---+---+- > +-+--+--+--+ > | 1 | PRIMARY| tbl | ALL | NULL | NULL| > NULL| NULL | 7533 | Using where | > | 2 | DEPENDENT SUBQUERY | tbl | range | idx_gid | idx_gid | > 9 | NULL | 58 | Using where; Using index | > +++---+---+---+- > +-+--+--+--+ > > Running the query on a production machine with sufficient memory and > horsepower (box is only 20% utilized) it still takes 3 seconds to run > - obviously not quick enough for web use. > > What I really need is the most recent comment from each group based > on a variable set of gid's that change from user to user. > > Any thoughts on how to tweak this to avoid the full table scan? Thank > you in advance for your assistance. > > Erik Giberti > > > -- > MySQL General Mailing List > For list archives: http://lists.mysql.com/mysql > To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED] > >
Assistance avoiding a full table scan
Hello everyone, The app server in this case is PHP, and the database is MySQL 5.0.22 on RedHat linux I've got a database with about 7.5K records in it that I expect to start growing very quickly ~10-12K records per day. The storage engine is InnoDB. This table is growing quickly and will continue to grow for a long time. This table stores comments (as you can see from the structure) and is being used to display a list of comments based on a users affiliations. The structure is approximately this - I'm leaving out unrelated columns: id int - primary key - auto increment gid bigint - indexed comment varchar date_posted timestamp I run a query with the following form select comment, gid, date_posted from tbl where id in (select max(id) from tbl where gid in ( 1234,2345,3456 .. 7890 ) group by gid); I have an index on gid and id is the primary key When I describe the query with about 50 gid values inserted (where indicated above) I get the following: +++---+---+---+- +-+--+--+--+ | id | select_type| table | type | possible_keys | key | key_len | ref | rows | Extra| +++---+---+---+- +-+--+--+--+ | 1 | PRIMARY| tbl | ALL | NULL | NULL| NULL| NULL | 7533 | Using where | | 2 | DEPENDENT SUBQUERY | tbl | range | idx_gid | idx_gid | 9 | NULL | 58 | Using where; Using index | +++---+---+---+- +-+--+--+--+ Running the query on a production machine with sufficient memory and horsepower (box is only 20% utilized) it still takes 3 seconds to run - obviously not quick enough for web use. What I really need is the most recent comment from each group based on a variable set of gid's that change from user to user. Any thoughts on how to tweak this to avoid the full table scan? Thank you in advance for your assistance. Erik Giberti -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]