Re: Assistance avoiding a full table scan

2007-09-26 Thread Erik Giberti

Brent,

I tried this and it definitely boosted performance. On a test query  
that would take 2+ seconds to run with 20 id's - it ran in 0.002  
seconds.


Thanks everyone for your help and comments.

Erik

On Sep 21, 2007, at 2:01 PM, Brent Baisley wrote:


As others have mentioned, mysql doesn't handle IN queries efficiently.
You can try changing it to using derived tables/subqueries. I did some
quick tests and the explain shows a different analysis.

select comment, gid, date_posted from tbl
JOIN
(select max(id) as mid
from tbl where gid in ( 1234,2345,3456 .. 7890 ) group by gid) as  
maxids

ON tpl.id=maxids.mid;

You're really just changing one of the IN statements to a join. My
quick tests showed that a full table is still being done, but it's now
on the derived table (maxids), which would only be as large as how
many gids you are searching on (50?).
I tested this on two related tables, 170K in one and 90K in the other.
Your tables are currently much smaller, so mysql may come up with a
different execution path. For me, the join syntax was far faster.


On 9/21/07, Erik Giberti <[EMAIL PROTECTED]> wrote:

Hello everyone,

The app server in this case is PHP, and the database is MySQL 5.0.22
on RedHat linux

I've got a database with about 7.5K records in it that I expect to
start growing very quickly ~10-12K records per day. The storage
engine is InnoDB. This table is growing quickly and will continue to
grow for a long time. This table stores comments (as you can see from
the structure) and is being used to display a list of comments based
on a users affiliations.

The structure is approximately this - I'm leaving out unrelated  
columns:


id int - primary key - auto increment
gid bigint - indexed
comment varchar
date_posted timestamp

I run a query with the following form

select comment, gid, date_posted from tbl where id in (select max(id)
from tbl where gid in ( 1234,2345,3456 .. 7890 ) group by gid);

I have an index on gid and id is the primary key

When I describe the query with about 50 gid values inserted (where
indicated above) I get the following:

+++---+---+---+-
+-+--+--+--+
| id | select_type| table | type  | possible_keys | key |
key_len | ref  | rows | Extra|
+++---+---+---+-
+-+--+--+--+
|  1 | PRIMARY| tbl   | ALL   | NULL  | NULL|
NULL| NULL | 7533 | Using where  |
|  2 | DEPENDENT SUBQUERY | tbl   | range | idx_gid   | idx_gid |
9   | NULL |   58 | Using where; Using index |
+++---+---+---+-
+-+--+--+--+

Running the query on a production machine with sufficient memory and
horsepower (box is only 20% utilized) it still takes 3 seconds to run
- obviously not quick enough for web use.

What I really need is the most recent comment from each group based
on a variable set of gid's that change from user to user.

Any thoughts on how to tweak this to avoid the full table scan? Thank
you in advance for your assistance.

Erik Giberti


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql? 
[EMAIL PROTECTED]





--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]




--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: Assistance avoiding a full table scan

2007-09-21 Thread Brent Baisley
As others have mentioned, mysql doesn't handle IN queries efficiently.
You can try changing it to using derived tables/subqueries. I did some
quick tests and the explain shows a different analysis.

select comment, gid, date_posted from tbl
JOIN
(select max(id) as mid
from tbl where gid in ( 1234,2345,3456 .. 7890 ) group by gid) as maxids
ON tpl.id=maxids.mid;

You're really just changing one of the IN statements to a join. My
quick tests showed that a full table is still being done, but it's now
on the derived table (maxids), which would only be as large as how
many gids you are searching on (50?).
I tested this on two related tables, 170K in one and 90K in the other.
Your tables are currently much smaller, so mysql may come up with a
different execution path. For me, the join syntax was far faster.


On 9/21/07, Erik Giberti <[EMAIL PROTECTED]> wrote:
> Hello everyone,
>
> The app server in this case is PHP, and the database is MySQL 5.0.22
> on RedHat linux
>
> I've got a database with about 7.5K records in it that I expect to
> start growing very quickly ~10-12K records per day. The storage
> engine is InnoDB. This table is growing quickly and will continue to
> grow for a long time. This table stores comments (as you can see from
> the structure) and is being used to display a list of comments based
> on a users affiliations.
>
> The structure is approximately this - I'm leaving out unrelated columns:
>
> id int - primary key - auto increment
> gid bigint - indexed
> comment varchar
> date_posted timestamp
>
> I run a query with the following form
>
> select comment, gid, date_posted from tbl where id in (select max(id)
> from tbl where gid in ( 1234,2345,3456 .. 7890 ) group by gid);
>
> I have an index on gid and id is the primary key
>
> When I describe the query with about 50 gid values inserted (where
> indicated above) I get the following:
>
> +++---+---+---+-
> +-+--+--+--+
> | id | select_type| table | type  | possible_keys | key |
> key_len | ref  | rows | Extra|
> +++---+---+---+-
> +-+--+--+--+
> |  1 | PRIMARY| tbl   | ALL   | NULL  | NULL|
> NULL| NULL | 7533 | Using where  |
> |  2 | DEPENDENT SUBQUERY | tbl   | range | idx_gid   | idx_gid |
> 9   | NULL |   58 | Using where; Using index |
> +++---+---+---+-
> +-+--+--+--+
>
> Running the query on a production machine with sufficient memory and
> horsepower (box is only 20% utilized) it still takes 3 seconds to run
> - obviously not quick enough for web use.
>
> What I really need is the most recent comment from each group based
> on a variable set of gid's that change from user to user.
>
> Any thoughts on how to tweak this to avoid the full table scan? Thank
> you in advance for your assistance.
>
> Erik Giberti
>
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
>
>

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: Assistance avoiding a full table scan

2007-09-21 Thread Michael Dykman
Erik,

Even is you eliminate the subquery (which I have used efficiently in
the past, but they are always something to be careful of), the IN
clause is going to kill you..  above a certain number of elements in
that clause, the optimizer will go straight to full table scan.  This
drawback is well-known.

On 9/21/07, Dan Buettner <[EMAIL PROTECTED]> wrote:
> Erik, I think the main reason your query is running slowly is the use of a
> subselect.  MySQL does not generally perform well with subselects, though
> work continues in that area.
>
> There is also a problem/situation in MySQL in that you can't use MAX/GROUP
> BY functions quite the way you can in other databases; you'll get an
> accurate MAX value for one column, but the value in another won't
> necessarily be from the same row.  Someone posted on the list about this
> recently, calling it a bug, and I tend to agree.
>
> To solve your problem:
>
> I would take one of two approaches.
>
> First approach: split it into two queries in PHP, and use the results of the
> first in the second, like so:
>
> query1 = select max(id) from tbl where gid in ( 1234,2345,3456 .. 7890 )
> group by gid
>
> in PHP: id_string = join the results with commas.  implode function?
>
> query2 = select comment, gid, date_posted from tbl where id in (id_string)
>
> Generally speaking, fewer queries = higher performance, and databases are
> optimized to join tables, they do it well - but in your case I think you'll
> find one of these works better.
>
> Second approach:
>
> Insert values from first query into a temporary table, then join on that
> temp table in your second query.
>
> I don't think either approach will have a speed advantage, and the first is
> probably easier to code.
>
> HTH,
> Dan
>
>
>
> On 9/21/07, Erik Giberti <[EMAIL PROTECTED]> wrote:
> >
> > Hello everyone,
> >
> > The app server in this case is PHP, and the database is MySQL 5.0.22
> > on RedHat linux
> >
> > I've got a database with about 7.5K records in it that I expect to
> > start growing very quickly ~10-12K records per day. The storage
> > engine is InnoDB. This table is growing quickly and will continue to
> > grow for a long time. This table stores comments (as you can see from
> > the structure) and is being used to display a list of comments based
> > on a users affiliations.
> >
> > The structure is approximately this - I'm leaving out unrelated columns:
> >
> > id int - primary key - auto increment
> > gid bigint - indexed
> > comment varchar
> > date_posted timestamp
> >
> > I run a query with the following form
> >
> > select comment, gid, date_posted from tbl where id in (select max(id)
> > from tbl where gid in ( 1234,2345,3456 .. 7890 ) group by gid);
> >
> > I have an index on gid and id is the primary key
> >
> > When I describe the query with about 50 gid values inserted (where
> > indicated above) I get the following:
> >
> > +++---+---+---+-
> > +-+--+--+--+
> > | id | select_type| table | type  | possible_keys | key |
> > key_len | ref  | rows | Extra|
> > +++---+---+---+-
> > +-+--+--+--+
> > |  1 | PRIMARY| tbl   | ALL   | NULL  | NULL|
> > NULL| NULL | 7533 | Using where  |
> > |  2 | DEPENDENT SUBQUERY | tbl   | range | idx_gid   | idx_gid |
> > 9   | NULL |   58 | Using where; Using index |
> > +++---+---+---+-
> > +-+--+--+--+
> >
> > Running the query on a production machine with sufficient memory and
> > horsepower (box is only 20% utilized) it still takes 3 seconds to run
> > - obviously not quick enough for web use.
> >
> > What I really need is the most recent comment from each group based
> > on a variable set of gid's that change from user to user.
> >
> > Any thoughts on how to tweak this to avoid the full table scan? Thank
> > you in advance for your assistance.
> >
> > Erik Giberti
> >
> >
> > --
> > MySQL General Mailing List
> > For list archives: http://lists.mysql.com/mysql
> > To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
> >
> >
>


-- 
 - michael dykman
 - [EMAIL PROTECTED]

 - All models are wrong.  Some models are useful.

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: Assistance avoiding a full table scan

2007-09-21 Thread Dan Buettner
Erik, I think the main reason your query is running slowly is the use of a
subselect.  MySQL does not generally perform well with subselects, though
work continues in that area.

There is also a problem/situation in MySQL in that you can't use MAX/GROUP
BY functions quite the way you can in other databases; you'll get an
accurate MAX value for one column, but the value in another won't
necessarily be from the same row.  Someone posted on the list about this
recently, calling it a bug, and I tend to agree.

To solve your problem:

I would take one of two approaches.

First approach: split it into two queries in PHP, and use the results of the
first in the second, like so:

query1 = select max(id) from tbl where gid in ( 1234,2345,3456 .. 7890 )
group by gid

in PHP: id_string = join the results with commas.  implode function?

query2 = select comment, gid, date_posted from tbl where id in (id_string)

Generally speaking, fewer queries = higher performance, and databases are
optimized to join tables, they do it well - but in your case I think you'll
find one of these works better.

Second approach:

Insert values from first query into a temporary table, then join on that
temp table in your second query.

I don't think either approach will have a speed advantage, and the first is
probably easier to code.

HTH,
Dan



On 9/21/07, Erik Giberti <[EMAIL PROTECTED]> wrote:
>
> Hello everyone,
>
> The app server in this case is PHP, and the database is MySQL 5.0.22
> on RedHat linux
>
> I've got a database with about 7.5K records in it that I expect to
> start growing very quickly ~10-12K records per day. The storage
> engine is InnoDB. This table is growing quickly and will continue to
> grow for a long time. This table stores comments (as you can see from
> the structure) and is being used to display a list of comments based
> on a users affiliations.
>
> The structure is approximately this - I'm leaving out unrelated columns:
>
> id int - primary key - auto increment
> gid bigint - indexed
> comment varchar
> date_posted timestamp
>
> I run a query with the following form
>
> select comment, gid, date_posted from tbl where id in (select max(id)
> from tbl where gid in ( 1234,2345,3456 .. 7890 ) group by gid);
>
> I have an index on gid and id is the primary key
>
> When I describe the query with about 50 gid values inserted (where
> indicated above) I get the following:
>
> +++---+---+---+-
> +-+--+--+--+
> | id | select_type| table | type  | possible_keys | key |
> key_len | ref  | rows | Extra|
> +++---+---+---+-
> +-+--+--+--+
> |  1 | PRIMARY| tbl   | ALL   | NULL  | NULL|
> NULL| NULL | 7533 | Using where  |
> |  2 | DEPENDENT SUBQUERY | tbl   | range | idx_gid   | idx_gid |
> 9   | NULL |   58 | Using where; Using index |
> +++---+---+---+-
> +-+--+--+--+
>
> Running the query on a production machine with sufficient memory and
> horsepower (box is only 20% utilized) it still takes 3 seconds to run
> - obviously not quick enough for web use.
>
> What I really need is the most recent comment from each group based
> on a variable set of gid's that change from user to user.
>
> Any thoughts on how to tweak this to avoid the full table scan? Thank
> you in advance for your assistance.
>
> Erik Giberti
>
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
>
>


Assistance avoiding a full table scan

2007-09-21 Thread Erik Giberti

Hello everyone,

The app server in this case is PHP, and the database is MySQL 5.0.22  
on RedHat linux


I've got a database with about 7.5K records in it that I expect to  
start growing very quickly ~10-12K records per day. The storage  
engine is InnoDB. This table is growing quickly and will continue to  
grow for a long time. This table stores comments (as you can see from  
the structure) and is being used to display a list of comments based  
on a users affiliations.


The structure is approximately this - I'm leaving out unrelated columns:

id int - primary key - auto increment
gid bigint - indexed
comment varchar
date_posted timestamp

I run a query with the following form

select comment, gid, date_posted from tbl where id in (select max(id)  
from tbl where gid in ( 1234,2345,3456 .. 7890 ) group by gid);


I have an index on gid and id is the primary key

When I describe the query with about 50 gid values inserted (where  
indicated above) I get the following:


+++---+---+---+- 
+-+--+--+--+
| id | select_type| table | type  | possible_keys | key |  
key_len | ref  | rows | Extra|
+++---+---+---+- 
+-+--+--+--+
|  1 | PRIMARY| tbl   | ALL   | NULL  | NULL|  
NULL| NULL | 7533 | Using where  |
|  2 | DEPENDENT SUBQUERY | tbl   | range | idx_gid   | idx_gid |  
9   | NULL |   58 | Using where; Using index |
+++---+---+---+- 
+-+--+--+--+


Running the query on a production machine with sufficient memory and  
horsepower (box is only 20% utilized) it still takes 3 seconds to run  
- obviously not quick enough for web use.


What I really need is the most recent comment from each group based  
on a variable set of gid's that change from user to user.


Any thoughts on how to tweak this to avoid the full table scan? Thank  
you in advance for your assistance.


Erik Giberti


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]