Re: Slow query on MySQL4 server doing simple inner join of two InnoDB tables
On Mon, Jan 24, 2011 at 6:43 PM, Gavin Towey wrote:
> If you show the EXPLAIN SELECT .. output, and the table structure,
> someone will be able to give a more definite answer.

Thanks for the reply Gavin. I actually did place this info in my very
first message on this thread, along with my basic table structure and
server version. Others and I have simply stopped keeping the full,
deeply-nested, quoted thread inside all subsequent messages, which is
probably why you haven't seen it. However, here is the EXPLAIN SELECT
from the first message (reformatted for email):

  select_type: SIMPLE
        table: recipients
         type: ref
possible_keys: messages_fk, employee_idx
          key: employee_idx
      key_len: 5
          ref: const
         rows: 222640
        Extra: Using where; Using temporary; Using filesort

  select_type: SIMPLE
        table: messages
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: email_archive.recipients.message_id
         rows: 1
        Extra:

Anyhow, having now copied these tables to another server (MySQL 5.1) and
done some tests (bumping up innodb_buffer_pool_size and playing with
innodb_flush_log_at_trx_commit for my writes, plus a few other knobs), it
is simply that these somewhat large tables need lots of RAM to perform
well, just as Reindl Harald originally pointed out.

Thanks again for the help everyone!

--
Kendall Gifford
zettab...@gmail.com
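[Editor's note: the knobs mentioned above are set in my.cnf. As a hedged
illustration only — these values are assumptions for a machine with spare
RAM, not settings taken from this thread — such a change might look like:]

```ini
# my.cnf sketch -- illustrative values only; size to the machine's RAM
[mysqld]
# Ideally large enough to hold the hot portion of both tables' data and
# indexes:
innodb_buffer_pool_size = 2G
# 2 = flush the InnoDB log to disk roughly once per second (faster
# writes, at the cost of losing up to ~1s of transactions on a crash):
innodb_flush_log_at_trx_commit = 2
```

In the server versions discussed here (4.1/5.1) innodb_buffer_pool_size
is not a dynamic variable, so a restart is required after editing my.cnf.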
RE: Slow query on MySQL4 server doing simple inner join of two InnoDB tables
If you show the EXPLAIN SELECT .. output, and the table structure, someone
will be able to give a more definite answer.

-----Original Message-----
From: Kendall Gifford [mailto:zettab...@gmail.com]
Sent: Monday, January 24, 2011 2:29 PM
To: mysql@lists.mysql.com
Subject: Re: Slow query on MySQL4 server doing simple inner join of two
InnoDB tables

[[ ... full quoted thread omitted; see the messages below ... ]]
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=arch...@jab.org
Re: Slow query on MySQL4 server doing simple inner join of two InnoDB tables
On Mon, Jan 24, 2011 at 2:20 PM, Kendall Gifford wrote:
>
> [[ ... earlier discussion of dropping the GROUP BY trimmed; see the
> messages below ... ]]
>
> I will additionally be moving this database to a new server. However,
> for academic interest, I'll see if I can make time to post the query
> time(s) once I change the app, before moving the database to a new (and
> better configured) server.
>

Just an update for posterity: simply removing the GROUP BY clause of my
query above has, overall, no noticeable effect on performance. I suspect
server configuration, as pointed out by Reindl, is too much of a
bottleneck and is what I first need to change (working on that now).
Perhaps the removal of the GROUP BY would/will be noticeable if the
server configuration for InnoDB tables wasn't so horrendous. I'll find
out...

--
Kendall Gifford
zettab...@gmail.com
Re: Slow query on MySQL4 server doing simple inner join of two InnoDB tables
On Mon, Jan 24, 2011 at 3:40 AM, Joerg Bruehe wrote:
> Hi everybody!
>
> [[ ... quoted schema, query, and Shawn's suggestion trimmed; see the
> messages below ... ]]
>
> I don't want to contradict Shawn, but currently I fail to see the need
> for the GROUP BY: Joining like this
>
>   messages INNER JOIN recipients ON recipients.message_id = messages.id
>   WHERE recipients.employee_id = X
>
> can return only one row, unless there are multiple "recipients" records
> for the same values of "message_id" and "employee_id".
>
> I don't know whether that can happen in the poster's application, and
> whether it would cause trouble if the result line would occur multiple
> times.

In my application, there CAN in fact be several "recipients" records with
both the same "message_id" foreign key value AND the same "employee_id"
value (some employees may be a recipient of a message several times over
via alternative addresses and/or aliases). However, as I rework things, I
could probably rework the application logic to nuke the GROUP BY and just
cope, in code, with these extra "messages" records in my result set.
(Just FYI, the SQL query is simply the default query as created by Rails
or, more specifically, ActiveRecord 2.3.9, which I can/will be
optimizing.)

I will additionally be moving this database to a new server. However, for
academic interest, I'll see if I can make time to post the query time(s)
once I change the app, before moving the database to a new (and better
configured) server.

Thanks for the help everybody.

> Regards,
> Jörg

--
Kendall Gifford
zettab...@gmail.com
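[Editor's note: the "nuke the GROUP BY and cope in code" idea above can
be sketched end to end with a toy version of the two tables. This is a
hypothetical illustration using Python's bundled sqlite3 in place of
MySQL; the table and column names mirror the thread, the data is
invented:]

```python
# Sketch: dropping the GROUP BY makes the join return one row per
# matching recipients row, and the duplicates are collapsed in
# application code instead.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages (id INTEGER PRIMARY KEY, sent_at TEXT);
CREATE TABLE recipients (id INTEGER PRIMARY KEY,
                         message_id INTEGER, employee_id INTEGER);
INSERT INTO messages VALUES (1, '2011-01-20'), (2, '2011-01-21');
-- Employee 7 received message 1 twice (e.g. via an alias), message 2 once:
INSERT INTO recipients (message_id, employee_id)
VALUES (1, 7), (1, 7), (2, 7);
""")

# The thread's query, minus GROUP BY messages.id:
rows = conn.execute("""
    SELECT messages.* FROM messages
    INNER JOIN recipients ON recipients.message_id = messages.id
    WHERE recipients.employee_id = 7
    ORDER BY sent_at DESC
""").fetchall()
print(len(rows))  # 3 -- message 1 appears twice

# Dedup in code, preserving the sort order:
seen, unique = set(), []
for row in rows:
    if row[0] not in seen:
        seen.add(row[0])
        unique.append(row)
print([r[0] for r in unique])  # [2, 1]
```

One caveat this sketch makes visible: with a LIMIT in place, duplicates
would consume page slots before deduplication, so paging logic has to
account for that.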
Re: Slow query on MySQL4 server doing simple inner join of two InnoDB tables
Hi everybody!

Shawn Green (MySQL) wrote:
> On 1/21/2011 14:21, Kendall Gifford wrote:
>> Hello everyone, I've got a database on an old Fedora Core 4 server
>> running MySQL 4 (mysql-server.x86_64 4.1.12-2.FC4.1). The database in
>> question has just two (InnoDB) tables:
>>
>> messages (approx 2.5 million records)
>> recipients (approx 6.5 million records)
>>
>> [[ ... see the original post for the schema details ... ]]
>>
>> I have the following query that is just too slow:
>>
>>> SELECT messages.* FROM messages
>>> INNER JOIN recipients ON recipients.message_id = messages.id
>>> WHERE recipients.employee_id = X
>>> GROUP BY messages.id
>>> ORDER BY sent_at DESC
>>> LIMIT 0, 25;
>>
>> This takes about 44 seconds on average. [[...]]
>
> You need to get rid of the GROUP BY to make this go faster. You can do
> that by running two queries, one to pick the list of unique
> recipients.message_id values that match your where condition then
> another to actually retrieve the message data. [[...]]

I don't want to contradict Shawn, but currently I fail to see the need
for the GROUP BY: Joining like this

  messages INNER JOIN recipients ON recipients.message_id = messages.id
  WHERE recipients.employee_id = X

can return only one row, unless there are multiple "recipients" records
for the same values of "message_id" and "employee_id".

I don't know whether that can happen in the poster's application, and
whether it would cause trouble if the result line would occur multiple
times.

Regards,
Jörg

--
Joerg Bruehe, MySQL Build Team, joerg.bru...@oracle.com
ORACLE Deutschland B.V. & Co. KG, Komturstrasse 18a, D-12099 Berlin
Geschaeftsfuehrer: Juergen Kunz, Marcel v.d. Molen, Alexander v.d. Ven
Amtsgericht Muenchen: HRA 95603
Re: Slow query on MySQL4 server doing simple inner join of two InnoDB tables
On Fri, Jan 21, 2011 at 2:01 PM, Shawn Green (MySQL)
<shawn.l.gr...@oracle.com> wrote:

> On 1/21/2011 14:21, Kendall Gifford wrote:
>> [[ ... see the original post for the schema, query, and EXPLAIN
>> details ... ]]
>
> You need to get rid of the GROUP BY to make this go faster. You can do
> that by running two queries, one to pick the list of unique
> recipients.message_id values that match your where condition then
> another to actually retrieve the message data. Something like this
>
> CREATE TEMPORARY TABLE tmpMessages (message_id INT UNSIGNED,
> PRIMARY KEY (message_id)) ENGINE=MEMORY;
>
> INSERT IGNORE tmpMessages
> SELECT message_id
> FROM recipients
> WHERE employee_id = X;
>
> SELECT messages.* FROM messages
> INNER JOIN tmpMessages ON tmpMessages.message_id = messages.id
> ORDER BY sent_at DESC
> LIMIT 0, 25;
>
> By pre-selecting a limited set of message_id values from the recipients
> table, you seriously reduce the number of rows that need to be scanned.
> Also, the INSERT IGNORE technique is faster than the GROUP BY because
> it uses an index to identify any duplicates instead of a scan of all
> previous unique values.
>
> Please let us all know if this is fast enough. (and don't forget to
> drop the temp table once you are through using it)
Re: Slow query on MySQL4 server doing simple inner join of two InnoDB tables
On 1/21/2011 14:21, Kendall Gifford wrote:
> Hello everyone, I've got a database on an old Fedora Core 4 server
> running MySQL 4 (mysql-server.x86_64 4.1.12-2.FC4.1). The database in
> question has just two (InnoDB) tables:
>
> messages (approx 2.5 million records)
> recipients (approx 6.5 million records)
>
> These track information about email messages. Each message "has many"
> recipient records. The structure of the two tables (omitting irrelevant
> data fields) is as follows:
>
> +---------+------------------+------+-----+---------------------+----------------+
> | Field   | Type             | Null | Key | Default             | Extra          |
> +---------+------------------+------+-----+---------------------+----------------+
> | id      | int(10) unsigned |      | PRI | NULL                | auto_increment |
> | sent_at | datetime         |      | MUL | 0000-00-00 00:00:00 |                |
> | ... OTHER FIELDS OMITTED FOR BREVITY ...                                       |
> +---------+------------------+------+-----+---------------------+----------------+
>
> +-------------+------------------+------+-----+---------+----------------+
> | Field       | Type             | Null | Key | Default | Extra          |
> +-------------+------------------+------+-----+---------+----------------+
> | id          | int(10) unsigned |      | PRI | NULL    | auto_increment |
> | message_id  | int(10) unsigned |      | MUL | 0       |                |
> | employee_id | int(10) unsigned | YES  | MUL | NULL    |                |
> | ... OTHER FIELDS OMITTED FOR BREVITY ...                               |
> +-------------+------------------+------+-----+---------+----------------+
>
> I have the following query that is just too slow:
>
>   SELECT messages.* FROM messages
>   INNER JOIN recipients ON recipients.message_id = messages.id
>   WHERE recipients.employee_id = X
>   GROUP BY messages.id
>   ORDER BY sent_at DESC
>   LIMIT 0, 25;
>
> This takes about 44 seconds on average. The query explanation is as
> follows:
>
> +----+-------------+------------+--------+--------------------------+--------------+---------+-------------------------------------+--------+----------------------------------------------+
> | id | select_type | table      | type   | possible_keys            | key          | key_len | ref                                 | rows   | Extra                                        |
> +----+-------------+------------+--------+--------------------------+--------------+---------+-------------------------------------+--------+----------------------------------------------+
> |  1 | SIMPLE      | recipients | ref    | messages_fk,employee_idx | employee_idx | 5       | const                               | 222640 | Using where; Using temporary; Using filesort |
> |  1 | SIMPLE      | messages   | eq_ref | PRIMARY                  | PRIMARY      | 4       | email_archive.recipients.message_id |      1 |                                              |
> +----+-------------+------------+--------+--------------------------+--------------+---------+-------------------------------------+--------+----------------------------------------------+
>
> I've been doing some searching on the web and have no idea if/how this
> can be sped up. Most searches these days reference MySQL 5.x which I'm
> just not sure how much applies. I'm hoping that there is something
> obvious that I'm missing, or that one of you experts knows what I might
> be able to change to speed this query up.
>
> Anyhow, thanks in advance for even so much as reading my message, let
> alone replying :).

You need to get rid of the GROUP BY to make this go faster. You can do
that by running two queries, one to pick the list of unique
recipients.message_id values that match your where condition then another
to actually retrieve the message data. Something like this

CREATE TEMPORARY TABLE tmpMessages (message_id INT UNSIGNED,
PRIMARY KEY (message_id)) ENGINE=MEMORY;

INSERT IGNORE tmpMessages
SELECT message_id
FROM recipients
WHERE employee_id = X;

SELECT messages.* FROM messages
INNER JOIN tmpMessages ON tmpMessages.message_id = messages.id
ORDER BY sent_at DESC
LIMIT 0, 25;

By pre-selecting a limited set of message_id values from the recipients
table, you seriously reduce the number of rows that need to be scanned.
Also, the INSERT IGNORE technique is faster than the GROUP BY because it
uses an index to identify any duplicates instead of a scan of all
previous unique values.

Please let us all know if this is fast enough. (and don't forget to drop
the temp table once you are through using it)

--
Shawn Green
MySQL Principal Technical Support Engineer
Oracle USA, Inc.
Office: Blountville, TN
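[Editor's note: Shawn's two-query technique can be tried outside MySQL as
well. Below is a self-contained sketch using Python's bundled sqlite3;
note that SQLite spells the deduplicating insert INSERT OR IGNORE and its
temporary tables take no ENGINE clause. Schema and data are invented for
illustration:]

```python
# Sketch of the two-query technique: first collect distinct message ids
# into a keyed temp table, then join the messages table against it.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages (id INTEGER PRIMARY KEY, sent_at TEXT);
CREATE TABLE recipients (id INTEGER PRIMARY KEY,
                         message_id INTEGER, employee_id INTEGER);
INSERT INTO messages VALUES (1, '2011-01-20'), (2, '2011-01-21');
INSERT INTO recipients (message_id, employee_id)
VALUES (1, 7), (1, 7), (2, 7);

-- Step 1: distinct message ids for the employee; the primary key plus
-- INSERT OR IGNORE does the deduplication that GROUP BY used to do.
CREATE TEMPORARY TABLE tmpMessages (message_id INTEGER PRIMARY KEY);
INSERT OR IGNORE INTO tmpMessages
SELECT message_id FROM recipients WHERE employee_id = 7;
""")

# Step 2: fetch the message rows, newest first.
page = conn.execute("""
    SELECT messages.* FROM messages
    INNER JOIN tmpMessages ON tmpMessages.message_id = messages.id
    ORDER BY sent_at DESC
    LIMIT 0, 25
""").fetchall()
print([r[0] for r in page])  # [2, 1] -- each message exactly once

conn.execute("DROP TABLE tmpMessages")  # as Shawn notes, drop it when done
```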
Re: Slow query on MySQL4 server doing simple inner join of two InnoDB tables
You need huge RAM / innodb_buffer_pool for large datasets; in a perfect
world the buffer_pool is as large as the data.

How does your current config look? How much RAM does the machine have?

On 21.01.2011 20:21, Kendall Gifford wrote:
> Hello everyone, I've got a database on an old Fedora Core 4 server
> running MySQL 4 (mysql-server.x86_64 4.1.12-2.FC4.1). The database in
> question has just two (InnoDB) tables:
>
> messages (approx 2.5 million records)
> recipients (approx 6.5 million records)
>
> [[ ... see the original post for the schema, query, and EXPLAIN
> details ... ]]
>
> This takes about 44 seconds on average. [[...]]
>
> I've been doing some searching on the web and have no idea if/how this
> can be sped up. Most searches these days reference MySQL 5.x which I'm
> just not sure how much applies. I'm hoping that there is something
> obvious that I'm missing, or that one of you experts knows what I might
> be able to change to speed this query up.
>
> Anyhow, thanks in advance for even so much as reading my message, let
> alone replying :).

--
Best regards,
Reindl Harald
the lounge interactive design GmbH
A-1060 Vienna, Hofmühlgasse 17
CTO / software-development / cms-solutions
p: +43 (1) 595 3999 33, m: +43 (676) 40 221 40
icq: 154546673, http://www.thelounge.net/