Re: Query Optimization
The query SELECT * FROM XYZ WHERE VAL IN ('BLUE','RED','GREEN'); will be faster if the field is an ENUM.

On Fri, Nov 16, 2012 at 12:36 AM, Anupam Karmarkar sb_akarmar...@yahoo.com wrote:

Hi All,

Consider a scenario: I have a table XYZ which contains the values BLUE, RED, GREEN, and NULL. The following queries can be used to get these values:

1. SELECT * FROM XYZ WHERE VAL IN ('BLUE','RED','GREEN');
2. SELECT * FROM XYZ WHERE VAL IS NOT NULL;
3. SELECT * FROM XYZ WHERE VAL = 'BLUE' OR VAL = 'RED' OR VAL = 'GREEN';

and more. So which one is best in terms of optimization? I guess 1 and 3 are similar in terms of formulation.

--Anupam

--
Thanks & Regards,
P. Benaya Paul
http://www.codeasearch.com
http://www.iwannasearch.com
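A minimal sketch of the ENUM suggestion above (table and column names taken from the question; whether the conversion actually helps depends on data volume and indexing, so treat it as something to benchmark, not a guaranteed win):

```sql
-- Convert VAL to an ENUM. This assumes the column only ever holds
-- these three strings or NULL; any other value would be lost.
ALTER TABLE XYZ MODIFY VAL ENUM('BLUE','RED','GREEN') NULL;

-- Each ENUM value is stored internally as a small integer,
-- so the IN comparison works on integers rather than strings.
SELECT * FROM XYZ WHERE VAL IN ('BLUE','RED','GREEN');
```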
RE: Query Optimization
It depends on the distribution of the 4 values in that field. If the cardinality is poor, then INDEX(VAL) won't be used, and all three queries will do a full table scan.

-----Original Message-----
From: Anupam Karmarkar [mailto:sb_akarmar...@yahoo.com]
Sent: Friday, November 16, 2012 12:36 AM
To: mysql@lists.mysql.com
Subject: Query Optimization

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql
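Two quick checks for the cardinality question raised above (plain MySQL statements; the table and column names come from the thread):

```sql
-- Rough cardinality check: distinct values vs. total rows.
SELECT COUNT(DISTINCT VAL) AS distinct_vals, COUNT(*) AS total_rows FROM XYZ;

-- Confirm whether the optimizer actually chooses INDEX(VAL).
EXPLAIN SELECT * FROM XYZ WHERE VAL IN ('BLUE','RED','GREEN');
```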
query optimization
I have a table with 24 million rows and I need to figure out how to optimize a query. It has to do with MAC addresses and RADIUS packets - I want to see the # of connections and the min/max date. So I basically want all this data:

select cpe_mac, count(*) c, min(r3_dt) mindt, max(r3_dt) maxdt, max(rec_num) recn
from radiuscap
where r3_dt >= SUBDATE(NOW(), INTERVAL 30 DAY) and r3_type = 'Access'
group by cpe_mac
order by cpe_mac;

This piece of the query takes 30 seconds to run and produces 3500 rows. I have r3_dt indexed. I also want a status field of the row with the highest r3_dt:

select rec_num, cpe_mac, req_status
from rad_r3cap
where r3_type = 'Access'
and (cpe_mac, r3_dt) in (select cpe_mac, max(r3_dt) from rad_r3cap);

This piece of the query takes forever; I let it run for an hour and it still didn't finish, and it's obviously not using indexes. I have no idea how far along it got. I wrote a PHP script to run the 1st query, then do 3500 individual lookups for the status using the max(rec_num) field from the 1st query, and I can get the data in 31 seconds. So I CAN produce this data, but very slowly, and not in 1 SQL query. I want to consolidate this into 1 SQL statement so I can make a view. If anyone can point me in the right direction, I'd appreciate it!
mysql> desc rad_r3cap;
+-------------+-------------+------+-----+---------+----------------+
| Field       | Type        | Null | Key | Default | Extra          |
+-------------+-------------+------+-----+---------+----------------+
| rec_num     | int(11)     | NO   | PRI | NULL    | auto_increment |
| r3_dt       | datetime    | YES  | MUL | NULL    |                |
| r3_micros   | int(11)     | YES  |     | NULL    |                |
| r3_type     | varchar(16) | YES  |     | NULL    |                |
| req_status  | varchar(16) | YES  |     | NULL    |                |
| req_comment | varchar(64) | YES  |     | NULL    |                |
| asn_ip      | varchar(16) | YES  | MUL | NULL    |                |
| asn_name    | varchar(16) | YES  |     | NULL    |                |
| bsid        | varchar(12) | YES  | MUL | NULL    |                |
| cpe_ip      | varchar(16) | YES  |     | NULL    |                |
| cpe_mac     | varchar(12) | YES  | MUL | NULL    |                |
| filename    | varchar(32) | YES  |     | NULL    |                |
| linenum     | int(11)     | YES  |     | NULL    |                |
| r3_hour     | datetime    | YES  | MUL | NULL    |                |
| user_name   | varchar(64) | YES  |     | NULL    |                |
+-------------+-------------+------+-----+---------+----------------+

mysql> show indexes in rad_r3cap;
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table     | Non_unique | Key_name     | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| rad_r3cap |          0 | PRIMARY      |            1 | rec_num     | A         |    23877677 |     NULL | NULL   |      | BTREE      |         |
| rad_r3cap |          0 | r3cap_dt     |            1 | r3_dt       | A         |        NULL |     NULL | NULL   | YES  | BTREE      |         |
| rad_r3cap |          0 | r3cap_dt     |            2 | r3_micros   | A         |        NULL |     NULL | NULL   | YES  | BTREE      |         |
| rad_r3cap |          1 | r3cap_bsid   |            1 | bsid        | A         |         346 |     NULL | NULL   | YES  | BTREE      |         |
| rad_r3cap |          1 | r3cap_asnip  |            1 | asn_ip      | A         |          55 |     NULL | NULL   | YES  | BTREE      |         |
| rad_r3cap |          1 | r3cap_cpemac |            1 | cpe_mac     | A         |        4758 |     NULL | NULL   | YES  | BTREE      |         |
| rad_r3cap |          1 | r3cap_date   |            1 | r3_hour     | A         |        1548 |     NULL | NULL   | YES  | BTREE      |         |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
7 rows in set (0.00 sec)
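One way to fold both pieces into a single statement is to mirror the PHP workaround in SQL: compute the per-MAC aggregates in a derived table, then join back to the base table by primary key. This is an untested sketch; like the PHP script, it assumes max(rec_num) identifies the row with the latest r3_dt for each MAC:

```sql
SELECT t.cpe_mac, t.c, t.mindt, t.maxdt, r.req_status
FROM (
    SELECT cpe_mac,
           COUNT(*)     AS c,
           MIN(r3_dt)   AS mindt,
           MAX(r3_dt)   AS maxdt,
           MAX(rec_num) AS recn
    FROM rad_r3cap
    WHERE r3_dt >= SUBDATE(NOW(), INTERVAL 30 DAY)
      AND r3_type = 'Access'
    GROUP BY cpe_mac
) AS t
JOIN rad_r3cap AS r ON r.rec_num = t.recn  -- one PRIMARY KEY lookup per MAC
ORDER BY t.cpe_mac;
```

Because the outer join is a point lookup on rec_num, this should cost roughly the 30-second aggregate plus ~3500 primary-key reads, and it can be wrapped in a view.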
Re: query optimization
The subquery (select cpe_mac, max(r3_dt) from rad_r3cap) is doing a full table scan, and it is also missing a GROUP BY cpe_mac, so it returns a single global max rather than one per MAC. You might want to check on this and add a WHERE condition so it can use an indexed column.

On Fri, Sep 23, 2011 at 12:14 AM, supr_star suprstar1...@yahoo.com wrote:
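Applying that advice, the slow IN form needs a GROUP BY to be well-defined per MAC, and a WHERE that lets the r3_dt index prune the scan (a sketch only; MySQL 5.x often still executes row-constructor IN subqueries as dependent subqueries, so the derived-table join form is usually the safer bet):

```sql
SELECT rec_num, cpe_mac, req_status
FROM rad_r3cap
WHERE r3_type = 'Access'
  AND (cpe_mac, r3_dt) IN (
        SELECT cpe_mac, MAX(r3_dt)
        FROM rad_r3cap
        WHERE r3_dt >= SUBDATE(NOW(), INTERVAL 30 DAY)  -- lets the r3_dt index prune
        GROUP BY cpe_mac                                -- one max per MAC, not one global max
      );
```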
Re: Query Optimization
On 09/01/2011 01:32 PM, Brandon Phelps wrote:
On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote:
On 9/1/2011 09:42, Brandon Phelps wrote:
On 09/01/2011 04:59 AM, Jochem van Dieten wrote:
...
WHERE (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00')
AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')

In that case your logic here simplifies to:

WHERE open_dt >= '2011-08-30 00:00:00'
AND close_dt <= '2011-08-30 12:36:53'

Now add an index over open_dt and close_dt and see what happens.

Jochem

Jochem,

I can't really use your WHERE logic because I also need to retrieve results where the open_dt time is out of the range specified. For example, a very large file download might span multiple days, so given your logic, if the connection was started 2 days ago and I want to pull 1 day's worth of connections, I would miss that entry. Basically I want to SELECT all of the records that were opened OR closed during the specified time period, i.e. if any activity happened between my start and end dates, I need to see that record.

Any other ideas?

I believe Jochem was on the right track but he got his dates reversed. Let's try a little ASCII art to show the situation. I will set up a query window with two markers (s) and (e). Events will be marked by |---| markers showing their durations.

a)              (s)                      (e)
b)   |---|
c)         |---------|
d)                    |---|
e)       |---------------------------------|
f)                                  |---------|
g)                                            |---|

To describe these situations:

a) is the window for which you want to query. (s) is the starting time and (e) is the ending time for the date range you are interested in.
b) the event starts and stops before your window exists. It won't be part of your results.
c) the event starts before the window but ends within the window - include this
d) the event starts and ends within the window - include this
e) the event starts before the window and ends after the window - include this
f) the event starts inside the window but ends beyond the window - include this.
g) the event starts and ends beyond the window - exclude this.

In order to get every event in the range of c-f, here is what you need for a WHERE clause:

WHERE start <= (ending time) AND end >= (starting time)

Try that and let us know the results.

Thanks Jochem and Shawn, however the following two queries result in the exact same EXPLAIN output: (I hope the tables don't wrap too early for you)

Old method:

SELECT sc.open_dt, sc.close_dt, sc.protocol,
INET_NTOA(sc.src_address) AS src_address, sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port,
sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc
FROM firewall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE (open_dt >= '2011-08-31 09:53:31' OR close_dt >= '2011-08-31 09:53:31')
AND (open_dt <= '2011-09-01 09:53:31' OR close_dt <= '2011-09-01 09:53:31')
ORDER BY rcvd DESC LIMIT 0, 10;

New method with BTREE index on open_dt, close_dt (index name is ndx_open_close_dt):

SELECT sc.open_dt, sc.close_dt, sc.protocol,
INET_NTOA(sc.src_address) AS src_address, sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port,
sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc
FROM firewall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE open_dt <= '2011-09-01 09:53:31' AND close_dt >= '2011-08-31 09:53:31'
ORDER BY rcvd DESC LIMIT 0, 10;

EXPLAIN output for old method:

+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
| id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
|  1 | SIMPLE      | sc    | index  | open_dt,ndx_open_close_dt | ndx_rcvd | 4       | NULL               |   10 | Using where |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.src_port |    1 |             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.dst_port |    1 |             |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+

EXPLAIN output for new method with new index:

+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
| id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
|  1 | SIMPLE      | sc    | index  | open_dt,ndx_open_close_dt | ndx_rcvd | 4       | NULL               |   10 | Using where |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.src_port |    1 |             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.dst_port |    1 |             |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
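Both plans pick ndx_rcvd because the optimizer would rather read rows already in rcvd order and stop after 10 matches than range-scan the date index and filesort. To see what the range plan looks like, the index can be forced (standard MySQL index-hint syntax; the select list is abbreviated here for space):

```sql
SELECT sc.open_dt, sc.close_dt, sc.rcvd
FROM firewall_connections AS sc FORCE INDEX (ndx_open_close_dt)
WHERE sc.open_dt <= '2011-09-01 09:53:31'   -- started before the window ends
  AND sc.close_dt >= '2011-08-31 09:53:31'  -- ended after the window starts
ORDER BY sc.rcvd DESC
LIMIT 0, 10;
```

Note that with open_dt <= X as the leading condition, a B-tree on (open_dt, close_dt) can only range-scan on open_dt; close_dt is checked as a filter, so this may or may not beat the ndx_rcvd plan in practice.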
Re: Query Optimization
Thinking outside the query, is there any archiving that could happen to make your large tables kinder in the range scan?

Andy

On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelps bphe...@gls.com wrote:
Re: Query Optimization
Thanks for the reply Andy. Unfortunately the users will be selecting varying date ranges and new data is constantly coming in, so I am not sure how I could archive/cache the necessary data in a way that would be any more efficient than simply using the database directly.

On 09/08/2011 02:16 PM, Andrew Moore wrote:
Re: Query Optimization
I don't think I saw any query timings in the emails (maybe I missed them). What version of MySQL are you currently using? What does the EXPLAIN look like when you remove the LIMIT 10? Is your server tuned for MyISAM or InnoDB? What kind of disk setup is in use? How much memory is in your machine?

On Thu, Sep 8, 2011 at 7:27 PM, Brandon Phelps bphe...@gls.com wrote:
Re: Query Optimization
If you're running version 5.1+ you may wanna take a look at the table partitioning options you have.

On Sep 8, 2011, at 2:27 PM, Brandon Phelps wrote:
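A sketch of what the partitioning suggestion could look like for the connections table (the partition names and boundaries are invented for illustration; note that MySQL 5.1 requires every unique key, including the primary key, to contain the partitioning column, so the table definition may need adjusting first):

```sql
ALTER TABLE firewall_connections
PARTITION BY RANGE (TO_DAYS(open_dt)) (
    PARTITION p201108 VALUES LESS THAN (TO_DAYS('2011-09-01')),
    PARTITION p201109 VALUES LESS THAN (TO_DAYS('2011-10-01')),
    PARTITION pmax    VALUES LESS THAN MAXVALUE
);
```

Range conditions on open_dt could then prune to the partitions overlapping the requested window instead of touching all 32M rows.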
Re: Query Optimization
Andy, The queries take minutes to run. MySQL is 5.1.54 and it's running on Ubuntu server 11.04. Unfortunately the machine only has 2GB of RAM but no other major daemons are running on the machine. We are running RAID 1 (mirroring) with 1TB drives. The tables in question here are all MyISAM. When running with the LIMIT 10 my EXPLAIN is: ++-+---++-+-+-++--+-+ | id | select_type | table | type | possible_keys | key | key_len | ref| rows | Extra | ++-+---++-+-+-++--+-+ | 1 | SIMPLE | sc| range | ndx_open_close_rcvd | ndx_open_close_rcvd | 8 | NULL | 32393316 | Using where; Using filesort | | 1 | SIMPLE | spm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.src_port |1 | | | 1 | SIMPLE | dpm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.dst_port |1 | | ++-+---++-+-+-++--+-+ When I remove the LIMIT 10 I get: +-+---++-+-+-++--+-+ | id | select_type | table | type | possible_keys | key | key_len | ref| rows | Extra | ++-+---++-+-+-++--+-+ | 1 | SIMPLE | sc| ALL| ndx_open_close_rcvd | NULL| NULL| NULL | 32393330 | Using where; Using filesort | | 1 | SIMPLE | spm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.src_port |1 | | | 1 | SIMPLE | dpm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.dst_port |1 | | ++-+---++-+-+-++--+-+ Thanks for all your help thus far. On 09/08/2011 02:38 PM, Andrew Moore wrote: I don't think I saw any query timings in the emails (maybe I missed them). What version of MySQL are you currently using? What does the explain look like when your remove the limit 10? Is your server tuned for MyISAM or InnoDB? What kind of disk setup is in use? How much memory is in your machine? On Thu, Sep 8, 2011 at 7:27 PM, Brandon Phelpsbphe...@gls.com wrote: Thanks for the reply Andy. Unfortunately the users will be selecting varying date ranges and new data is constantly coming in, so I am not sure how I could archive/cache the necessary data that would be any more efficient than simply using the database directly. 
On 09/08/2011 02:16 PM, Andrew Moore wrote: Thinking outside the query, is there any archiving that could happen to make your large tables kinder in the range scan? Andy On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelps bphe...@gls.com wrote: On 09/01/2011 01:32 PM, Brandon Phelps wrote: On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote: On 9/1/2011 09:42, Brandon Phelps wrote: On 09/01/2011 04:59 AM, Jochem van Dieten wrote: ... WHERE (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00') AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53') In that case your logic here simplifies to: WHERE open_dt >= '2011-08-30 00:00:00' AND close_dt <= '2011-08-30 12:36:53' Now add an index over open_dt and close_dt and see what happens. Jochem Jochem, I can't really use your WHERE logic because I also need to retrieve results where the open_dt time is out of the range specified. For example, a very large file download might span multiple days, so given your logic, if the connection was started 2 days ago and I want to pull 1 day's worth of connections, I would miss that entry. Basically I want to SELECT all of the records that were opened OR closed during the specified time period, i.e. if any activity happened between my start and end dates, I need to see that record. Any other ideas? I believe Jochem was on the right track but he got his dates reversed. Let's try a little ASCII art to show the situation. I will set up a query window with two markers (s) and (e). Events will be marked by |---| markers showing their durations. a) (s) (e) b) |---| c) |---| d) |---| e) |---| f) |---| g) |---| To describe these situations: a) is the window for which you want to query (s) is the starting
Re: Query Optimization
Correct me if I'm wrong. You're wanting to get all records that have an open_date or a close_date between two times. If that's correct, you might be able to get an index_merge by doing a query like: WHERE ((starting time) <= open_dt <= (ending time)) OR ((starting time) <= close_dt <= (ending time)) and creating two indexes (one on 'open_dt' and the other on 'close_dt') http://dev.mysql.com/doc/refman/5.1/en/index-merge-optimization.html Regards, Derek On Sep 8, 2011, at 2:50 PM, Brandon Phelps wrote: Andy, The queries take minutes to run. MySQL is 5.1.54 and it's running on Ubuntu server 11.04. Unfortunately the machine only has 2GB of RAM but no other major daemons are running on the machine. We are running RAID 1 (mirroring) with 1TB drives. The tables in question here are all MyISAM. When running with the LIMIT 10 my EXPLAIN is:
| id | select_type | table | type   | possible_keys       | key                 | key_len | ref                | rows     | Extra                       |
| 1  | SIMPLE      | sc    | range  | ndx_open_close_rcvd | ndx_open_close_rcvd | 8       | NULL               | 32393316 | Using where; Using filesort |
| 1  | SIMPLE      | spm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.src_port | 1        |                             |
| 1  | SIMPLE      | dpm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.dst_port | 1        |                             |
When I remove the LIMIT 10 I get:
| id | select_type | table | type   | possible_keys       | key     | key_len | ref                | rows     | Extra                       |
| 1  | SIMPLE      | sc    | ALL    | ndx_open_close_rcvd | NULL    | NULL    | NULL               | 32393330 | Using where; Using filesort |
| 1  | SIMPLE      | spm   | eq_ref | PRIMARY             | PRIMARY | 2       | syslog.sc.src_port | 1        |                             |
| 1  | SIMPLE      | dpm   | eq_ref | PRIMARY             | PRIMARY | 2       | syslog.sc.dst_port | 1        |                             |
Thanks for all your help thus far. On 09/08/2011 02:38 PM, Andrew Moore wrote: I don't think I saw any query timings in the emails (maybe I missed them). What version of MySQL are you currently using? What does the explain look like when you remove the limit 10?
Is your server tuned for MyISAM or InnoDB? What kind of disk setup is in use? How much memory is in your machine? On Thu, Sep 8, 2011 at 7:27 PM, Brandon Phelpsbphe...@gls.com wrote: Thanks for the reply Andy. Unfortunately the users will be selecting varying date ranges and new data is constantly coming in, so I am not sure how I could archive/cache the necessary data that would be any more efficient than simply using the database directly. On 09/08/2011 02:16 PM, Andrew Moore wrote: Thinking outside the query, is there any archiving that could happen to make your large tables kinder in the range scan? Andy On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelpsbphe...@gls.com wrote: On 09/01/2011 01:32 PM, Brandon Phelps wrote: On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote: On 9/1/2011 09:42, Brandon Phelps wrote: On 09/01/2011 04:59 AM, Jochem van Dieten wrote: ... WHERE (open_dt= '2011-08-30 00:00:00' OR close_dt= '2011-08-30 00:00:00') AND (open_dt= '2011-08-30 12:36:53' OR close_dt= '2011-08-30 12:36:53') In that case your logic here simplifies to: WHERE open_dt= '2011-08-30 00:00:00' AND close_dt= '2011-08-30 12:36:53' Now add an index over open_dt and close_dt and see what happens. Jochem Jochem, I can't really use your WHERE logic because I also need to retrieve results where the open_dt time is out of the range specified. For example, a very large file download might span multiple days so given your logic if the connection was started 2 days ago and I want to pull 1 days worth of connections, I would miss that entry. Basically I want to SELECT all of the records that were opened
Re: Query Optimization
Mihail, I have considered this but have not yet determined how best to go about partitioning the table. I don't think partitioning by dst_address or src_address would help because most of the queries do not filter on IP address (except very specific queries where the end-user is searching the table for history on a particular employee). I could potentially partition the table based on the day of week the connection was opened on which may improve performance for a while since this would take me from a single 32million record table down to roughly 4.5 million records per partition (32 / 7) however we are looking to eventually store up to 2 months worth of data in the table, and so far that 32 million records is only for 1 month, so I estimate another 32 million-ish before the month is out, bringing me to roughly 70 million records total (it would be nice if I could store even more than 2 months, but given my currently performance dilemma I don't expect that to happen). Also this does not take into account that the end-user will often be pulling data for multiple days at a time, meaning that multiple partitions in this scheme will need to be accessed anyway. The only other logical partitioning scheme I can think of would be to partition based on dst_port (the port the log relates to) but the majority of records are all to port 80 (standard web traffic) so I don't think this would be all that helpful. I have never messed with partitioning so it is totally possible that I am not thinking of something, so if you have any ideas on a decent partitioning scheme based on my criteria and queries below, please let me know. Thanks, Brandon On 09/08/2011 02:47 PM, Mihail Manolov wrote: If you're running version 5.1+ you may wanna take a look at table partitioning options you may have. On Sep 8, 2011, at 2:27 PM, Brandon Phelps wrote: Thanks for the reply Andy. 
Unfortunately the users will be selecting varying date ranges and new data is constantly coming in, so I am not sure how I could archive/cache the necessary data that would be any more efficient than simply using the database directly. On 09/08/2011 02:16 PM, Andrew Moore wrote: Thinking outside the query, is there any archiving that could happen to make your large tables kinder in the range scan? Andy On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelpsbphe...@gls.com wrote: On 09/01/2011 01:32 PM, Brandon Phelps wrote: On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote: On 9/1/2011 09:42, Brandon Phelps wrote: On 09/01/2011 04:59 AM, Jochem van Dieten wrote: ... WHERE (open_dt= '2011-08-30 00:00:00' OR close_dt= '2011-08-30 00:00:00') AND (open_dt= '2011-08-30 12:36:53' OR close_dt= '2011-08-30 12:36:53') In that case your logic here simplifies to: WHERE open_dt= '2011-08-30 00:00:00' AND close_dt= '2011-08-30 12:36:53' Now add an index over open_dt and close_dt and see what happens. Jochem Jochem, I can't really use your WHERE logic because I also need to retrieve results where the open_dt time is out of the range specified. For example, a very large file download might span multiple days so given your logic if the connection was started 2 days ago and I want to pull 1 days worth of connections, I would miss that entry. Basically I want to SELECT all of the records that were opened OR closed during the specified time period, ie. if any activity happened between my start and end dates, I need to see that record. Any other ideas? I believe Jochem was on the right track but he got his dates reversed. Let's try a little ASCII art to show the situation. I will setup a query window with two markers (s) and (e). Events will be marked by || markers showing their durations. 
a)       (s)                     (e)
b) |---|
c)     |----|
d)             |----|
e)     |----------------------------|
f)                              |----|
g)                                    |----|
To describe these situations:
a) is the window for which you want to query. (s) is the starting time and (e) is the ending time for the date range you are interested in.
b) the event starts and stops before your window exists. It won't be part of your results.
c) the event starts before the window but ends within the window - include this
d) the event starts and ends within the window - include this
e) the event starts before the window and ends after the window - include this
f) the event starts inside the window but ends beyond the window - include this.
g) the event starts and ends beyond the window - exclude this.
In order to get every event in the range of c-f, here is what you need for a WHERE clause: WHERE start <= (ending time) AND end >= (starting time) Try that and let us know the results. Thanks Jochem and Shawn, however the following two queries result in the exact same EXPLAIN output: (I hope the tables don't wrap too early for you) Old method: SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS
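Shawn's overlap test can be checked concretely. The sketch below is an assumption for illustration only: it uses SQLite in place of MySQL/MyISAM, a toy table named `events` standing in for `sonicwall_connections`, and integer timestamps 10..20 as the (s)..(e) window; the predicate itself (`open_dt <= e AND close_dt >= s`) is exactly the one from the ASCII diagram.

```python
import sqlite3

# Hypothetical miniature of the connections table: one row per event
# b) through g) from the diagram, with a query window of (s)=10, (e)=20.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (name TEXT, open_dt INTEGER, close_dt INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("b", 1, 5),    # starts and ends before the window -> exclude
        ("c", 5, 12),   # starts before, ends inside        -> include
        ("d", 12, 18),  # entirely inside                   -> include
        ("e", 5, 25),   # spans the whole window            -> include
        ("f", 18, 25),  # starts inside, ends after         -> include
        ("g", 22, 30),  # starts and ends after             -> exclude
    ],
)

s, e = 10, 20
# The overlap test: the event began before the window ended AND
# ended after the window began.
rows = conn.execute(
    "SELECT name FROM events WHERE open_dt <= ? AND close_dt >= ? ORDER BY name",
    (e, s),
).fetchall()
print([r[0] for r in rows])  # -> ['c', 'd', 'e', 'f']
```

Exactly cases c-f come back, matching the include/exclude list above.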
Re: Query Optimization
Thanks for the idea Derek, however given the following query my EXPLAIN output is identical: SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM sonicwall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port WHERE ('2011-09-07 13:18:58' <= open_dt <= '2011-09-08 13:18:58') OR ('2011-09-07 13:18:58' <= close_dt <= '2011-09-08 13:18:58');
| id | select_type | table | type   | possible_keys | key     | key_len | ref                | rows     | Extra       |
| 1  | SIMPLE      | sc    | ALL    | NULL          | NULL    | NULL    | NULL               | 32393330 | Using where |
| 1  | SIMPLE      | spm   | eq_ref | PRIMARY       | PRIMARY | 2       | syslog.sc.src_port | 1        |             |
| 1  | SIMPLE      | dpm   | eq_ref | PRIMARY       | PRIMARY | 2       | syslog.sc.dst_port | 1        |             |
I did create indexes on open_dt and close_dt (2 separate indexes). On 09/08/2011 02:55 PM, Derek Downey wrote: Correct me if I'm wrong. You're wanting to get all records that have an open_date or a close_date between two times. If that's correct, you might be able to get an index_merge by doing a query like: WHERE ((starting time) <= open_dt <= (ending time)) OR ((starting time) <= close_dt <= (ending time)) and creating two indexes (one on 'open_dt' and the other on 'close_dt') http://dev.mysql.com/doc/refman/5.1/en/index-merge-optimization.html Regards, Derek On Sep 8, 2011, at 2:50 PM, Brandon Phelps wrote: Andy, The queries take minutes to run. MySQL is 5.1.54 and it's running on Ubuntu server 11.04. Unfortunately the machine only has 2GB of RAM but no other major daemons are running on the machine. We are running RAID 1 (mirroring) with 1TB drives. The tables in question here are all MyISAM. 
When running with the LIMIT 10 my EXPLAIN is: ++-+---++-+-+-++--+-+ | id | select_type | table | type | possible_keys | key | key_len | ref| rows | Extra | ++-+---++-+-+-++--+-+ | 1 | SIMPLE | sc| range | ndx_open_close_rcvd | ndx_open_close_rcvd | 8 | NULL | 32393316 | Using where; Using filesort | | 1 | SIMPLE | spm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.src_port |1 | | | 1 | SIMPLE | dpm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.dst_port |1 | | ++-+---++-+-+-++--+-+ When I remove the LIMIT 10 I get: +-+---++-+-+-++--+-+ | id | select_type | table | type | possible_keys | key | key_len | ref| rows | Extra | ++-+---++-+-+-++--+-+ | 1 | SIMPLE | sc| ALL| ndx_open_close_rcvd | NULL| NULL| NULL | 32393330 | Using where; Using filesort | | 1 | SIMPLE | spm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.src_port |1 | | | 1 | SIMPLE | dpm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.dst_port |1 | | ++-+---++-+-+-++--+-+ Thanks for all your help thus far. On 09/08/2011 02:38 PM, Andrew Moore wrote: I don't think I saw any query timings in the emails (maybe I missed them). What version of MySQL are you currently using? What does the explain look like when your remove the limit 10? Is your server tuned for MyISAM or InnoDB? What kind of disk setup is in use? How much memory is
Re: Query Optimization
Partitioning isn't a bad idea for this, however I'm still thinking about your dataset size and possible hardware limitations. It's not likely going to fit into relevant buffers/memory, so you're going to be on disk more than you want. You're probably creating temporary tables like crazy and I would bet that there are a good load of them heading to disk too. With your IO performance limited to a small number of disks as you describe, you're not going to be able to get much more from these queries. Although it's a dedicated DB server, are there other databases being accessed on the server? When looking at the scope of your data, are you capturing more than you need? How often and how far back are the users querying? How many users are concurrently performing queries on the 32m record table? On Thu, Sep 8, 2011 at 8:04 PM, Brandon Phelps bphe...@gls.com wrote: Mihail, I have considered this but have not yet determined how best to go about partitioning the table. I don't think partitioning by dst_address or src_address would help because most of the queries do not filter on IP address (except very specific queries where the end-user is searching the table for history on a particular employee). I could potentially partition the table based on the day of week the connection was opened on which may improve performance for a while since this would take me from a single 32million record table down to roughly 4.5 million records per partition (32 / 7) however we are looking to eventually store up to 2 months worth of data in the table, and so far that 32 million records is only for 1 month, so I estimate another 32 million-ish before the month is out, bringing me to roughly 70 million records total (it would be nice if I could store even more than 2 months, but given my currently performance dilemma I don't expect that to happen). 
Also this does not take into account that the end-user will often be pulling data for multiple days at a time, meaning that multiple partitions in this scheme will need to be accessed anyway. The only other logical partitioning scheme I can think of would be to partition based on dst_port (the port the log relates to) but the majority of records are all to port 80 (standard web traffic) so I don't think this would be all that helpful. I have never messed with partitioning so it is totally possible that I am not thinking of something, so if you have any ideas on a decent partitioning scheme based on my criteria and queries below, please let me know. Thanks, Brandon On 09/08/2011 02:47 PM, Mihail Manolov wrote: If you're running version 5.1+ you may wanna take a look at table partitioning options you may have. On Sep 8, 2011, at 2:27 PM, Brandon Phelps wrote: Thanks for the reply Andy. Unfortunately the users will be selecting varying date ranges and new data is constantly coming in, so I am not sure how I could archive/cache the necessary data that would be any more efficient than simply using the database directly. On 09/08/2011 02:16 PM, Andrew Moore wrote: Thinking outside the query, is there any archiving that could happen to make your large tables kinder in the range scan? Andy On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelpsbphe...@gls.com wrote: On 09/01/2011 01:32 PM, Brandon Phelps wrote: On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote: On 9/1/2011 09:42, Brandon Phelps wrote: On 09/01/2011 04:59 AM, Jochem van Dieten wrote: ... WHERE (open_dt= '2011-08-30 00:00:00' OR close_dt= '2011-08-30 00:00:00') AND (open_dt= '2011-08-30 12:36:53' OR close_dt= '2011-08-30 12:36:53') In that case your logic here simplifies to: WHERE open_dt= '2011-08-30 00:00:00' AND close_dt= '2011-08-30 12:36:53' Now add an index over open_dt and close_dt and see what happens. 
Jochem Jochem, I can't really use your WHERE logic because I also need to retrieve results where the open_dt time is out of the range specified. For example, a very large file download might span multiple days so given your logic if the connection was started 2 days ago and I want to pull 1 days worth of connections, I would miss that entry. Basically I want to SELECT all of the records that were opened OR closed during the specified time period, ie. if any activity happened between my start and end dates, I need to see that record. Any other ideas? I believe Jochem was on the right track but he got his dates reversed. Let's try a little ASCII art to show the situation. I will setup a query window with two markers (s) and (e). Events will be marked by || markers showing their durations. a) (s) (e) b) |---| c) |---| d) |---| e) || f) |---| g) |---| To describe these situations: a) is the window for which you want to query (s) is the starting time and (e) is the ending time for the date range you are interested in. b) the event starts and stops before your window exists. It won't be
Re: Query Optimization
How about: SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM sonicwall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58' UNION SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM sonicwall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58' On Sep 8, 2011, at 3:25 PM, Brandon Phelps wrote: Thanks for the idea Derek, however given the following query my EXPLAIN output is identical: SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM sonicwall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port WHERE ('2011-09-07 13:18:58' <= open_dt <= '2011-09-08 13:18:58') OR ('2011-09-07 13:18:58' <= close_dt <= '2011-09-08 13:18:58');
| id | select_type | table | type   | possible_keys | key     | key_len | ref                | rows     | Extra       |
| 1  | SIMPLE      | sc    | ALL    | NULL          | NULL    | NULL    | NULL               | 32393330 | Using where |
| 1  | SIMPLE      | spm   | eq_ref | PRIMARY       | PRIMARY | 2       | syslog.sc.src_port | 1        |             |
| 1  | SIMPLE      | dpm   | eq_ref | PRIMARY       | PRIMARY | 2       | syslog.sc.dst_port | 1        |             |
I did create indexes on open_dt and close_dt (2 separate 
indexes). On 09/08/2011 02:55 PM, Derek Downey wrote: Correct me if I'm wrong. You're wanting to get all records that have an open_date or a close_date between two times. If that's correct, you might be able to get an index_merge by doing a query like: WHERE ((starting time)=open_dt= (ending time)) OR ((starting time)=close_dt=(ending time)) and creating two indexes (one on 'open_dt' and the other on 'close_dt') http://dev.mysql.com/doc/refman/5.1/en/index-merge-optimization.html Regards, Derek On Sep 8, 2011, at 2:50 PM, Brandon Phelps wrote: Andy, The queries take minutes to run. MySQL is 5.1.54 and it's running on Ubuntu server 11.04. Unfortunately the machine only has 2GB of RAM but no other major daemons are running on the machine. We are running RAID 1 (mirroring) with 1TB drives. The tables in question here are all MyISAM. When running with the LIMIT 10 my EXPLAIN is: ++-+---++-+-+-++--+-+ | id | select_type | table | type | possible_keys | key | key_len | ref| rows | Extra | ++-+---++-+-+-++--+-+ | 1 | SIMPLE | sc| range | ndx_open_close_rcvd | ndx_open_close_rcvd | 8 | NULL | 32393316 | Using where; Using filesort | | 1 | SIMPLE | spm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.src_port |1 | | | 1 | SIMPLE | dpm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.dst_port |1 | | ++-+---++-+-+-++--+-+ When I remove the LIMIT 10 I get:
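The core of Mihail's suggestion is that an `OR` across two different indexed columns often forces a full scan, while splitting it into a `UNION` of two single-column range queries lets each branch use its own index. A minimal sketch of the rewrite, using SQLite and a toy `conns` table as stand-ins (an assumption; the thread's server is MySQL 5.1, where the EXPLAIN behavior differs, but the equivalence of the two result sets holds the same way):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE conns (id INTEGER, open_dt INTEGER, close_dt INTEGER)")
# One index per column, as Derek suggested -- each UNION branch can use one.
conn.execute("CREATE INDEX ndx_open ON conns(open_dt)")
conn.execute("CREATE INDEX ndx_close ON conns(close_dt)")
conn.executemany("INSERT INTO conns VALUES (?,?,?)",
                 [(1, 5, 8), (2, 12, 14), (3, 9, 13), (4, 15, 22), (5, 1, 2)])

lo, hi = 10, 20
# Original form: OR across two columns.
or_rows = conn.execute(
    "SELECT id FROM conns WHERE (open_dt BETWEEN ? AND ?) "
    "OR (close_dt BETWEEN ? AND ?) ORDER BY id",
    (lo, hi, lo, hi)).fetchall()
# Rewritten form: one indexed range scan per UNION branch.
union_rows = conn.execute(
    "SELECT id FROM conns WHERE open_dt BETWEEN ? AND ? "
    "UNION "
    "SELECT id FROM conns WHERE close_dt BETWEEN ? AND ? ORDER BY id",
    (lo, hi, lo, hi)).fetchall()
assert or_rows == union_rows  # same result set either way
print([r[0] for r in or_rows])  # -> [2, 3, 4]
```

The `UNION` produces the same rows as the `OR`; only the access path changes.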
Re: Query Optimization
Andrew, Generally there is only 1 user performing the complicated SELECT query at a time, however the background process that fills the table is constantly doing a fast SELECT (0.3 seconds) and a subsequent UPDATE. Basically whenever a connection is closed on the firewall, the bg process SELECTS from the table the last record that was opened (and hasn't already been closed) for a given source IP, source port, dest ip, and protocol. So for example whenever the firewall logs a CONNECTION CLOSED message, the bg process does: select id from sonicwall_connections where src_address = w.x.y.z and src_port = x and dst_address = w.x.y.z and protocol = x ORDER BY open_dt LIMIT 0, 1 then it UPDATES the close_dt column for the record with the selected ID. These select/update statements being run by the background process generally take 0.000x seconds each. The only slowdown we see is with the complicated SELECT I have below, and this is only executed once in a while throughout the day. For example, if an IT manager sees a huge bandwidth spike on the network, he may access the web gui to determine what the source IP is so he can determine who is downloading a large file, etc. I think what I will try to do is create 60 partitions for the table in question based on month and day. This way each day will go into its own partition and if someone runs the query for a date range such as 01/01/2011 - 01/07/2011 it will only need to access 7 partitions instead of the entire table. My only question with this is how would I go about creating the table with 60 partitions in such a way that I won't need to constantly add/remove new/old partitions every day? Is there any partitioning function I can use that would not require me to update the partitions schema every day? I already plan to have a cron run each night to purge records older than 60 days from the database. 
On 09/08/2011 03:26 PM, Andrew Moore wrote: Partitioning isn't a bad idea for this however I'm still thinking about your dataset size and possible hardware limitations. It's not likely going to fit into relevant buffers/memory so you're going to be on disk more then you want. You're probably creating temporary tables like crazy and I would bet that there are a good load of them heading to disk too. With your IO performance limited to a small amount of disks as you describe, you're not going to be able to get much more from these queries. Although a dedicated DB server are there other databases been accessed on the server? When looking at the scope of your data, are you capturing more then you need? How often and how far back are the users querying? How many users concurrently performing queries on the 32m record table? On Thu, Sep 8, 2011 at 8:04 PM, Brandon Phelpsbphe...@gls.com wrote: Mihail, I have considered this but have not yet determined how best to go about partitioning the table. I don't think partitioning by dst_address or src_address would help because most of the queries do not filter on IP address (except very specific queries where the end-user is searching the table for history on a particular employee). I could potentially partition the table based on the day of week the connection was opened on which may improve performance for a while since this would take me from a single 32million record table down to roughly 4.5 million records per partition (32 / 7) however we are looking to eventually store up to 2 months worth of data in the table, and so far that 32 million records is only for 1 month, so I estimate another 32 million-ish before the month is out, bringing me to roughly 70 million records total (it would be nice if I could store even more than 2 months, but given my currently performance dilemma I don't expect that to happen). 
Also this does not take into account that the end-user will often be pulling data for multiple days at a time, meaning that multiple partitions in this scheme will need to be accessed anyway. The only other logical partitioning scheme I can think of would be to partition based on dst_port (the port the log relates to) but the majority of records are all to port 80 (standard web traffic) so I don't think this would be all that helpful. I have never messed with partitioning so it is totally possible that I am not thinking of something, so if you have any ideas on a decent partitioning scheme based on my criteria and queries below, please let me know. Thanks, Brandon On 09/08/2011 02:47 PM, Mihail Manolov wrote: If you're running version 5.1+ you may wanna take a look at table partitioning options you may have. On Sep 8, 2011, at 2:27 PM, Brandon Phelps wrote: Thanks for the reply Andy. Unfortunately the users will be selecting varying date ranges and new data is constantly coming in, so I am not sure how I could archive/cache the necessary data that would be any more efficient than simply using the database directly. On 09/08/2011 02:16 PM, Andrew Moore wrote:
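One possible answer to Brandon's rotation question (an assumption, not something tested against his schema): MySQL 5.1's HASH partitioning applies MOD to the partitioning expression, so something like `PARTITION BY HASH(TO_DAYS(open_dt)) PARTITIONS 60` would cycle each new day into the next of 60 fixed partitions with no nightly schema changes, provided the 60-day purge keeps any one partition from holding two different days (note MySQL also requires the partitioning column to appear in every unique key). The bucket arithmetic is sketched below; `partition_for` is a hypothetical helper and `date.toordinal()` stands in for MySQL's `TO_DAYS()`:

```python
from datetime import date

def partition_for(d, num_partitions=60):
    # MySQL HASH partitioning takes the expression value MOD the partition
    # count; toordinal() is a day counter, like TO_DAYS(), so consecutive
    # days land in consecutive buckets and wrap around automatically.
    return d.toordinal() % num_partitions

start = date(2011, 9, 8)
buckets = [partition_for(date.fromordinal(start.toordinal() + i))
           for i in range(120)]

# Over 120 consecutive days, all 60 buckets are used, and each bucket is
# reused exactly 60 days later -- i.e. no partition maintenance needed.
assert sorted(set(buckets)) == list(range(60))
assert all(buckets[i] == buckets[i + 60] for i in range(60))
print("partition for", start, "is", buckets[0])
```

A 7-day query range then touches 7 buckets, matching the pruning behavior Brandon is after.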
Re: Query Optimization
Mihail, Thanks so much! I modified your example to include the proper ORDER BY and LIMIT clauses and this, so far, is running super fast (0.0007 seconds). Question, if a record's open_dt is between the range AND the close_dt is between the range as well, will the UNION output the record twice? If so, is there any way to prevent that? (SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM sonicwall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58') UNION (SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM sonicwall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58') ORDER BY rcvd DESC LIMIT 10; ++--++++--+-++--++ | id | select_type | table | type | possible_keys | key | key_len | ref| rows | Extra | ++--++++--+-++--++ | 1 | PRIMARY | sc | range | open_dt,ndx_open_close_rcvd,ndx_open_close | open_dt | 8 | NULL | 1057 | Using where| | 1 | PRIMARY | spm| eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.src_port |1 || | 1 | PRIMARY | dpm| eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.dst_port |1 || | 2 | UNION| sc | range | ndx_close_dt | ndx_close_dt | 8 | NULL | 1131 | Using where| | 2 | UNION| spm| eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.src_port |1 || | 2 | UNION| dpm| eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.dst_port |1 || | NULL | UNION RESULT | union1,2 | ALL| NULL | NULL | NULL| NULL | NULL | Using filesort | 
++--++++--+-++--++ On 09/08/2011 03:45 PM, Mihail Manolov wrote: How about: SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM sonicwall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58' UNION SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM sonicwall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58' On Sep 8, 2011, at 3:25 PM, Brandon Phelps wrote: Thanks for the idea Derek, however given the following query my EXPLAIN output is identical: SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM sonicwall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings
Re: Query Optimization
From the manual: The default behavior for UNION is that duplicate rows are removed from the result.

On Sep 8, 2011, at 4:50 PM, Brandon Phelps wrote:

Mihail, Thanks so much! I modified your example to include the proper ORDER BY and LIMIT clauses and this, so far, is running super fast (0.0007 seconds). Question: if a record's open_dt is between the range AND the close_dt is between the range as well, will the UNION output the record twice? If so, is there any way to prevent that?

(SELECT sc.open_dt, sc.close_dt, sc.protocol,
    INET_NTOA(sc.src_address) AS src_address, sc.src_port,
    INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port,
    sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc
 FROM sonicwall_connections AS sc
 LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
 LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
 WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58')
UNION
(SELECT sc.open_dt, sc.close_dt, sc.protocol,
    INET_NTOA(sc.src_address) AS src_address, sc.src_port,
    INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port,
    sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc
 FROM sonicwall_connections AS sc
 LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
 LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
 WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58')
ORDER BY rcvd DESC LIMIT 10;

| id   | select_type  | table      | type   | possible_keys                              | key          | key_len | ref                | rows | Extra          |
|    1 | PRIMARY      | sc         | range  | open_dt,ndx_open_close_rcvd,ndx_open_close | open_dt      | 8       | NULL               | 1057 | Using where    |
|    1 | PRIMARY      | spm        | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.src_port |    1 |                |
|    1 | PRIMARY      | dpm        | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.dst_port |    1 |                |
|    2 | UNION        | sc         | range  | ndx_close_dt                               | ndx_close_dt | 8       | NULL               | 1131 | Using where    |
|    2 | UNION        | spm        | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.src_port |    1 |                |
|    2 | UNION        | dpm        | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.dst_port |    1 |                |
| NULL | UNION RESULT | <union1,2> | ALL    | NULL                                       | NULL         | NULL    | NULL               | NULL | Using filesort |

On 09/08/2011 03:45 PM, Mihail Manolov wrote: How about:

SELECT sc.open_dt, sc.close_dt, sc.protocol,
    INET_NTOA(sc.src_address) AS src_address, sc.src_port,
    INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port,
    sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'
UNION
SELECT sc.open_dt, sc.close_dt, sc.protocol,
    INET_NTOA(sc.src_address) AS src_address, sc.src_port,
    INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port,
    sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'

On Sep 8, 2011, at 3:25 PM, Brandon Phelps wrote: Thanks for the idea Derek, however given the following query my EXPLAIN output is identical: SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM
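Mihail's answer from the manual can be spot-checked in a few lines. The sketch below uses SQLite rather than MySQL purely to illustrate the semantics, and the tiny conn table is a made-up stand-in, not the real sonicwall_connections schema: a row whose open_dt and close_dt both fall in the range matches both UNION branches, yet appears only once.

```python
# UNION removes duplicate rows across branches; UNION ALL keeps them.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE conn (id INTEGER, open_dt TEXT, close_dt TEXT)")
con.executemany("INSERT INTO conn VALUES (?, ?, ?)", [
    (1, "2011-09-07 14:00:00", "2011-09-07 14:05:00"),  # both dts in range
    (2, "2011-09-06 10:00:00", "2011-09-07 14:00:00"),  # only close_dt in range
])
lo, hi = "2011-09-07 13:18:58", "2011-09-08 13:18:58"

union = con.execute(
    "SELECT id FROM conn WHERE open_dt BETWEEN ? AND ? "
    "UNION SELECT id FROM conn WHERE close_dt BETWEEN ? AND ?",
    (lo, hi, lo, hi)).fetchall()
union_all = con.execute(
    "SELECT id FROM conn WHERE open_dt BETWEEN ? AND ? "
    "UNION ALL SELECT id FROM conn WHERE close_dt BETWEEN ? AND ?",
    (lo, hi, lo, hi)).fetchall()
print(len(union), len(union_all))  # 2 3 -- row 1 matched both branches but appears once
```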
Re: Query Optimization
Ah I see. Well thanks for your assistance! -Brandon

On 09/08/2011 05:21 PM, Mihail Manolov wrote: From the manual: The default behavior for UNION is that duplicate rows are removed from the result. [snip]
Re: Query Optimization
On Aug 30, 2011 6:46 PM, Brandon Phelps wrote:

SELECT sc.open_dt, sc.close_dt, sc.protocol,
    INET_NTOA( sc.src_address ) AS src_address, sc.src_port,
    INET_NTOA( sc.dst_address ) AS dst_address, sc.dst_port,
    sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc
FROM firewall_connections AS sc

If this is a firewall connection log I presume open_dt is the time a connection was opened and is always going to be less than close_dt. Right?

WHERE (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00')
  AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')

In that case your logic here simplifies to:

WHERE open_dt >= '2011-08-30 00:00:00' AND close_dt <= '2011-08-30 12:36:53'

Now add an index over open_dt and close_dt and see what happens.

Jochem
Re: Query Optimization
On 09/01/2011 04:59 AM, Jochem van Dieten wrote:

SELECT sc.open_dt, sc.close_dt, sc.protocol,
    INET_NTOA( sc.src_address ) AS src_address, sc.src_port,
    INET_NTOA( sc.dst_address ) AS dst_address, sc.dst_port,
    sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc
FROM firewall_connections AS sc

If this is a firewall connection log I presume open_dt is the time a connection was opened and is always going to be less than close_dt. Right?

WHERE (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00')
  AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')

In that case your logic here simplifies to:

WHERE open_dt >= '2011-08-30 00:00:00' AND close_dt <= '2011-08-30 12:36:53'

Now add an index over open_dt and close_dt and see what happens.

Jochem

Jochem, I can't really use your WHERE logic because I also need to retrieve results where the open_dt time is out of the range specified. For example, a very large file download might span multiple days, so given your logic, if the connection was started 2 days ago and I want to pull one day's worth of connections, I would miss that entry. Basically I want to SELECT all of the records that were opened OR closed during the specified time period, i.e. if any activity happened between my start and end dates, I need to see that record. Any other ideas? Thanks again, Brandon

-- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql?unsub=arch...@jab.org
Re: Query Optimization
On 9/1/2011 09:42, Brandon Phelps wrote: On 09/01/2011 04:59 AM, Jochem van Dieten wrote: ...

WHERE (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00')
  AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')

In that case your logic here simplifies to:

WHERE open_dt >= '2011-08-30 00:00:00' AND close_dt <= '2011-08-30 12:36:53'

Now add an index over open_dt and close_dt and see what happens. Jochem

Jochem, I can't really use your WHERE logic because I also need to retrieve results where the open_dt time is out of the range specified. For example, a very large file download might span multiple days, so given your logic, if the connection was started 2 days ago and I want to pull one day's worth of connections, I would miss that entry. Basically I want to SELECT all of the records that were opened OR closed during the specified time period, i.e. if any activity happened between my start and end dates, I need to see that record. Any other ideas?

I believe Jochem was on the right track but he got his dates reversed. Let's try a little ASCII art to show the situation. I will set up a query window with two markers, (s) and (e). Events will be marked by |---| markers showing their durations.

a)         (s)              (e)
b) |---|
c)      |------|
d)              |---|
e)      |--------------------------|
f)                        |------|
g)                                  |---|

To describe these situations:

a) is the window for which you want to query. (s) is the starting time and (e) is the ending time of the date range you are interested in.
b) the event starts and stops before your window exists. It won't be part of your results.
c) the event starts before the window but ends within the window - include this
d) the event starts and ends within the window - include this
e) the event starts before the window and ends after the window - include this
f) the event starts inside the window but ends beyond the window - include this.
g) the event starts and ends beyond the window - exclude this.

In order to get every event in the range of c-f, here is what you need for a WHERE clause:

WHERE start <= (ending time) AND end >= (starting time)

Try that and let us know the results.

-- Shawn Green MySQL Principal Technical Support Engineer Oracle USA, Inc. - Hardware and Software, Engineered to Work Together. Office: Blountville, TN

-- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql?unsub=arch...@jab.org
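That final WHERE clause is the standard interval-overlap test, and it can be sanity-checked against scenarios b) through g). The sketch below uses SQLite and a hypothetical events table purely to illustrate the predicate (end_ is used as a column name only to dodge SQL's END keyword):

```python
# Interval-overlap predicate: start <= window_end AND end >= window_start.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (label TEXT, start INTEGER, end_ INTEGER)")
# Scenarios b)..g) from the ASCII art; the query window is (s)=10, (e)=20.
con.executemany("INSERT INTO events VALUES (?, ?, ?)", [
    ("b", 1, 5),    # ends before the window opens
    ("c", 5, 15),   # starts before, ends inside
    ("d", 12, 18),  # entirely inside
    ("e", 5, 25),   # spans the whole window
    ("f", 15, 25),  # starts inside, ends after
    ("g", 22, 30),  # starts after the window closes
])

hits = [r[0] for r in con.execute(
    "SELECT label FROM events WHERE start <= 20 AND end_ >= 10 ORDER BY label")]
print(hits)  # ['c', 'd', 'e', 'f'] -- b and g are excluded
```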
Re: Query Optimization
On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote: [snip] In order to get every event in the range of c-f, here is what you need for a WHERE clause:

WHERE start <= (ending time) AND end >= (starting time)

Try that and let us know the results.

Thanks Jochem and Shawn, however the following two queries result in the exact same EXPLAIN output: (I hope the tables don't wrap too early for you)

Old method:

SELECT sc.open_dt, sc.close_dt, sc.protocol,
    INET_NTOA(sc.src_address) AS src_address, sc.src_port,
    INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port,
    sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc
FROM firewall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE (open_dt >= '2011-08-31 09:53:31' OR close_dt >= '2011-08-31 09:53:31')
  AND (open_dt <= '2011-09-01 09:53:31' OR close_dt <= '2011-09-01 09:53:31')
ORDER BY rcvd DESC LIMIT 0, 10;

New method with BTREE index on open_dt, close_dt (index name is ndx_open_close_dt):

SELECT sc.open_dt, sc.close_dt, sc.protocol,
    INET_NTOA(sc.src_address) AS src_address, sc.src_port,
    INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port,
    sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc
FROM firewall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE open_dt <= '2011-09-01 09:53:31' AND close_dt >= '2011-08-31 09:53:31'
ORDER BY rcvd DESC LIMIT 0, 10;

EXPLAIN output for old method:

| id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
|  1 | SIMPLE      | sc    | index  | open_dt,ndx_open_close_dt | ndx_rcvd | 4       | NULL               |   10 | Using where |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.src_port |    1 |             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.dst_port |    1 |             |

EXPLAIN output for new method with new index:

| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
Query Optimization
Hello, I have the following query I'd like to optimize a bit:

SELECT sc.open_dt, sc.close_dt, sc.protocol,
    INET_NTOA( sc.src_address ) AS src_address, sc.src_port,
    INET_NTOA( sc.dst_address ) AS dst_address, sc.dst_port,
    sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc
FROM firewall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00')
  AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')
ORDER BY rcvd DESC LIMIT 0, 10

Currently I have an index on the rcvd column which seems to be working based on the output of EXPLAIN:

| id | select_type | table | type   | possible_keys | key      | key_len | ref                | rows | Extra       |
|  1 | SIMPLE      | sc    | index  | open_dt       | ndx_rcvd | 4       | NULL               |   10 | Using where |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY       | PRIMARY  | 2       | syslog.sc.src_port |    1 |             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY       | PRIMARY  | 2       | syslog.sc.dst_port |    1 |             |

However the query is still fairly slow for some reason, any ideas how I could speed it up with additional indexes, etc? The values I am using in the WHERE clause are variable and change each time. The table has around 23 million records right now but is expected to continue to grow up to a potential 150 million. Here is the table schema:

CREATE TABLE IF NOT EXISTS `firewall_connections` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `open_dt` datetime NOT NULL,
  `close_dt` datetime NOT NULL,
  `protocol` smallint(6) NOT NULL,
  `src_address` int(10) unsigned NOT NULL,
  `src_port` smallint(5) unsigned NOT NULL,
  `dst_address` int(10) unsigned NOT NULL,
  `dst_port` smallint(5) unsigned NOT NULL,
  `sent` int(10) unsigned NOT NULL,
  `rcvd` int(10) unsigned NOT NULL,
  PRIMARY KEY (`id`),
  KEY `ndx_rcvd` (`rcvd`),
  KEY `ndx_sent` (`sent`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

Thanks in advance!
-- Brandon -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql?unsub=arch...@jab.org
Re: Query Optimization
There are a few things going on, but mainly it is the ORs that are killing you. As you require OR to examine two distinct columns, both of equal relevance to the query, MySQL is left with no choice but to do a full table scan on what might be (at a guess) a very large table. No amount of indexing will fix this for the query presented. You would be better off writing it as two distinct queries, each concerned with conditions on a single column (open_dt and close_dt), and then UNIONing the results. In this form, the indexes have a chance of being engaged. Once the indexes are engaged, you probably want to convert your DATE/DATETIME strings into actual DATEs or DATETIMEs, thus: ... (open_dt >= DATE('2011-08-30 00:00:00')) ... In its current state, the DATE fields are being converted to strings implicitly for every row tested, which further frustrates index usage, as the index is against the quasi-numeric DATE, not the string representation which your current implementation appears to expect. This query would also suddenly begin to fail entirely if the DEFAULT_DATE_FORMAT gets modified by an admin or a future release of MySQL.
The explicit casting I have suggested will protect you against that.

- michael dykman

On Tue, Aug 30, 2011 at 12:45 PM, Brandon Phelps bphe...@gls.com wrote: [snip]

--
 - michael dykman
 - mdyk...@gmail.com

May the Source be with you.
Query Optimization
Hello all, I am using the query below and variations of it to query a database with a TON of records. Currently the database has around 11 million records but it grows every day and should cap out at around 150 million. I am curious if there is any way I can better optimize the below query, as currently it takes this query around 10 seconds to run but I am sure this will get slower and slower as the database grows.

SELECT open_dt, close_dt, protocol,
    INET_NTOA(src_address) AS src_address, src_port,
    INET_NTOA(dst_address) AS dst_address, dst_port,
    sent, rcvd
FROM connections
WHERE dst_port = 80
ORDER BY close_dt DESC
LIMIT 0, 30

I do have an index on the dst_port column, as you can see by the output of EXPLAIN:

           id: 1
  select_type: SIMPLE
        table: connections
         type: ref
possible_keys: dst_port
          key: dst_port
      key_len: 2
          ref: const
         rows: 1109273
        Extra: Using where; Using filesort

Thanks in advance,

-- Brandon

-- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql?unsub=arch...@jab.org
Re: Query Optimization
Thanks Singer, this took my query down to 0.0007, perfect! I wasn't aware a single index on multiple columns would work when one of the columns was in the WHERE clause and the other in the ORDER BY clause. Learn something new every day I guess!

On 08/10/2011 02:03 PM, Singer X.J. Wang wrote: Try an index on (dst_port, close_dt)

On Wed, Aug 10, 2011 at 14:01, Brandon Phelps bphe...@gls.com wrote: [snip]

-- The best compliment you could give Pythian for our service is a referral.

-- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql?unsub=arch...@jab.org
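The effect of the composite index is easy to reproduce in miniature. The sketch below uses SQLite (standing in for MySQL, with made-up data and a hypothetical index name) to show that a single (dst_port, close_dt) index can serve both the WHERE filter and the ORDER BY, so no separate sort pass is needed:

```python
# A composite index (dst_port, close_dt): equality column first,
# then the sort column, so the index scan already yields sorted rows.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE connections (
    id INTEGER PRIMARY KEY, dst_port INTEGER, close_dt TEXT)""")
con.executemany("INSERT INTO connections (dst_port, close_dt) VALUES (?, ?)", [
    (80, "2011-08-10 10:00:00"),
    (443, "2011-08-10 11:00:00"),
    (80, "2011-08-10 12:00:00"),
])
con.execute("CREATE INDEX ndx_port_close ON connections (dst_port, close_dt)")

plan = " ".join(row[-1] for row in con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM connections "
    "WHERE dst_port = 80 ORDER BY close_dt DESC LIMIT 30"))
rows = [r[0] for r in con.execute(
    "SELECT close_dt FROM connections "
    "WHERE dst_port = 80 ORDER BY close_dt DESC LIMIT 30")]
print("ndx_port_close" in plan)  # the optimizer picks the composite index
print(rows)                      # newest close_dt first
```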
Re: Query Optimization
On 8/10/2011 1:01 PM, Brandon Phelps wrote: [snip]

Did you try adding your ORDER BY argument close_dt to the index?

PB

-- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql?unsub=arch...@jab.org
Re: Query Optimization
From: Brandon Phelps bphe...@gls.com I am curious if there is any way I can better optimize the below query, as currently it takes this query around 10 seconds to run but I am sure this will get slower and slower as the database grows. You need an index on `close_dt`. SELECT open_dt, close_dt, protocol, INET_NTOA(src_address) AS src_address, src_port, INET_NTOA(dst_address) AS dst_address, dst_port, sent, rcvd FROM connections WHERE dst_port = 80 ORDER BY close_dt DESC LIMIT 0, 30 Current farmers, who have become mere operators of machines and mixers of chemicals, may not have the skills to develop a local, sustainable agriculture. A new generation of farmers, numbering in the tens of millions, will need to be trained and relocated to rural communities. -- Pat Murphy Jan Steinman, EcoReality Co-op -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql?unsub=arch...@jab.org
Re: Need help with query optimization
It may only be returning 51 rows but it's having to read significantly more. Get rid of the derived table join if possible. Something like:

SELECT TAP.ID, M.UID, TAP.FirstName, TAP.MI, TAP.LastName, TAP.State,
    TAP.SchoolName, TAP.StateApproved, TAP.DiplomatApproved,
    C.ChapterType, S.Region, S.District, Sum(Pts.Points) Total_Points
FROM TorchAwardParticipants TAP
JOIN Members M On M.ID=TAP.CurrentMemberID
JOIN Chapters C On C.ID=M.ChapterID
JOIN Schools S On S.ID=C.SchoolID
JOIN TorchAwardSelAct Pts ON Pts.AchievementID=TAP.ID
WHERE TAP.State='OH' AND TAP.Sub='1'
  AND TAP.SubmittedToDiplomat IS NOT NULL
  AND Pts.LocalApproveStatus='A'
GROUP BY TAP.ID, M.UID, TAP.FirstName, TAP.MI, TAP.LastName, TAP.State,
    TAP.SchoolName, TAP.StateApproved, TAP.DiplomatApproved,
    C.ChapterType, S.Region, S.District
ORDER BY TAP.LastName, TAP.FirstName, Sum(Pts.Points)

Regards
John

On Tue, Mar 16, 2010 at 6:17 PM, Jesse j...@msdlg.com wrote: [snip]

-- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql?unsub=john.dais...@butterflysystems.co.uk
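The two shapes John describes can be compared side by side. This reduced two-table sketch (SQLite, with made-up table names and data, not the poster's schema) shows the derived-table join and the folded direct GROUP BY returning identical totals, so the rewrite changes only how the work is done:

```python
# Derived-table aggregate vs. folding the SUM into the outer GROUP BY.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE participants (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("CREATE TABLE points (achievement_id INTEGER, status TEXT, points INTEGER)")
con.executemany("INSERT INTO participants VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
con.executemany("INSERT INTO points VALUES (?, ?, ?)", [
    (1, "A", 10), (1, "A", 5), (1, "R", 99), (2, "A", 7),  # 'R' rows are not approved
])

derived = con.execute("""
    SELECT p.name, pts.tot
    FROM participants p
    JOIN (SELECT achievement_id, SUM(points) AS tot
          FROM points WHERE status = 'A'
          GROUP BY achievement_id) pts ON pts.achievement_id = p.id
    ORDER BY p.name""").fetchall()

direct = con.execute("""
    SELECT p.name, SUM(pts.points)
    FROM participants p
    JOIN points pts ON pts.achievement_id = p.id
    WHERE pts.status = 'A'
    GROUP BY p.id, p.name
    ORDER BY p.name""").fetchall()

print(derived == direct)  # True -- [('Ann', 15), ('Bob', 7)] both ways
```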
Need help with query optimization
I have the following query:

SELECT TAP.ID, M.UID, TAP.FirstName, TAP.MI, TAP.LastName, TAP.State,
    TAP.SchoolName, TAP.StateApproved, TAP.DiplomatApproved,
    C.ChapterType, S.Region, S.District, Pts.TotPoints
FROM TorchAwardParticipants TAP
JOIN Members M On M.ID=TAP.CurrentMemberID
JOIN Chapters C On C.ID=M.ChapterID
JOIN Schools S On S.ID=C.SchoolID
JOIN (SELECT AchievementID, Sum(Points) As TotPoints
      FROM TorchAwardSelAct TASA
      WHERE LocalApproveStatus='A'
      GROUP BY AchievementID) Pts ON Pts.AchievementID=TAP.ID
WHERE TAP.State='OH' AND TAP.Sub='1' AND TAP.SubmittedToDiplomat IS NOT NULL
ORDER BY TAP.LastName, TAP.FirstName, Pts.TotPoints

The TorchAwardParticipants table has about 84,000 records in it. The query takes almost 40 seconds to return the data, which is only 51 rows. An EXPLAIN returns the following:

| id | select_type | table      | type   | possible_keys           | key           | key_len | ref                     | rows   | Extra                           |
|  1 | PRIMARY     | <derived2> | ALL    | NULL                    | NULL          | NULL    | NULL                    |   4382 | Using temporary; Using filesort |
|  1 | PRIMARY     | TAP        | eq_ref | PRIMARY,CurrentMemberID | PRIMARY       | 4       | Pts.AchievementID       |      1 | Using where                     |
|  1 | PRIMARY     | M          | eq_ref | PRIMARY,IX_Members_3    | PRIMARY       | 4       | bpa.TAP.CurrentMemberID |      1 |                                 |
|  1 | PRIMARY     | C          | eq_ref | PRIMARY,IX_Chapters_1   | PRIMARY       | 4       | bpa.M.ChapterID         |      1 |                                 |
|  1 | PRIMARY     | S          | eq_ref | PRIMARY                 | PRIMARY       | 4       | bpa.C.SchoolID          |      1 |                                 |
|  2 | DERIVED     | TASA       | index  | NULL                    | AchievementID | 5       | NULL                    | 161685 | Using where                     |

What is the best way to optimize this query so that it doesn't take 40 seconds to return the dataset?

Jesse

-- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql?unsub=arch...@jab.org
Re: Need help with query optimization
Can you please show us the indexes on both the tables.

regards
anandkl

On Tue, Mar 16, 2010 at 11:47 PM, Jesse j...@msdlg.com wrote: [snip]

-- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql?unsub=anan...@gmail.com
VFP to MySQL Query Optimization
I feel like there should be a better way to do this... so I'm hoping someone will be able to advise. We have contacts that belong to organizations. We also have a document tracker. As a holdover from VFP, you can have up to six organization ids on a document and up to six contact ids. Right now the select to see if a contact has documents looks like this:

SELECT * FROM Contacts
WHERE (id IN (SELECT contid1 FROM documents)
    OR id IN (SELECT contid2 FROM documents)
    OR id IN (SELECT contid3 FROM documents)
    OR id IN (SELECT contid4 FROM documents)
    OR id IN (SELECT contid5 FROM documents)
    OR id IN (SELECT contid6 FROM documents)
    OR orgid IN (SELECT orgid1 FROM documents)
    OR orgid IN (SELECT orgid2 FROM documents)
    OR orgid IN (SELECT orgid3 FROM documents)
    OR orgid IN (SELECT orgid4 FROM documents)
    OR orgid IN (SELECT orgid5 FROM documents)
    OR orgid IN (SELECT orgid6 FROM documents))

Which is UGLY... and I feel like there should be a better way. (I know I could break that out into a many-to-many relationship via a third linking table, but I'm not 'able' to do that now.) The only change I can think of is to union the two halves of the select, but I'm not sure if that would be better... (i.e. id IN (SELECT contid1 FROM documents UNION SELECT contid2 FROM documents) etc.) Any advice, or is this the best I'm going to get until I can reorganize the underlying structure? Thanks! Matt

-- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql?unsub=arch...@jab.org
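One alternative worth trying is a single correlated EXISTS over documents, so each document row is probed once instead of through twelve separate IN() subqueries. The sketch below uses SQLite with only two of each column kind and made-up data (the real tables have six); whether it actually wins on the poster's MySQL version is something only testing will tell, but the two forms return the same contacts:

```python
# Twelve IN() subqueries vs. one correlated EXISTS over documents.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE contacts (id INTEGER, orgid INTEGER)")
con.execute("""CREATE TABLE documents (
    contid1 INTEGER, contid2 INTEGER, orgid1 INTEGER, orgid2 INTEGER)""")
con.executemany("INSERT INTO contacts VALUES (?, ?)", [(1, 10), (2, 20), (3, 30)])
con.execute("INSERT INTO documents VALUES (1, NULL, 20, NULL)")

original = con.execute("""
    SELECT id FROM contacts
    WHERE id IN (SELECT contid1 FROM documents)
       OR id IN (SELECT contid2 FROM documents)
       OR orgid IN (SELECT orgid1 FROM documents)
       OR orgid IN (SELECT orgid2 FROM documents)
    ORDER BY id""").fetchall()

rewritten = con.execute("""
    SELECT id FROM contacts c
    WHERE EXISTS (SELECT 1 FROM documents d
                  WHERE c.id IN (d.contid1, d.contid2)
                     OR c.orgid IN (d.orgid1, d.orgid2))
    ORDER BY id""").fetchall()

print(original == rewritten)  # True -- contacts 1 and 2 match either way
```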
RE: query optimization question (my struggle against 'using temporary; using filesort')
Hi Ciaran,

So I think there are a couple of things going on:

1. The explain plan for your slow query looks wrong, as if MySQL is confused. It's possible your index statistics are incorrect. Try ANALYZE TABLE on listings and addresses. I think a sure way to fix it is to add STRAIGHT_JOIN to force the join order. That should get rid of the temp table and filesort operations and give faster results.

SELECT STRAIGHT_JOIN listings.*, addresses.*
FROM `listings`
JOIN `addresses` ON `addresses`.addressable_id = `listings`.id
  AND `addresses`.addressable_type = 'Listing'
WHERE (addresses.parent_city_id = 3)
ORDER BY listings.id DESC
LIMIT 1

2. I need to make some comments about your table design. This column is AWFUL:

`addressable_type` varchar(255) DEFAULT NULL,

Why have a field that holds up to 255 characters and put a little string like 'Listing' in it? Why does it matter? Well, it makes your indexes disastrously bloated:

KEY `index_addresses_on_parent_city_id_and_addressable_type` (`parent_city_id`,`addressable_type`),

If you noticed in the explain plan, that index is 773 bytes *per row*. Especially using utf8 means each character takes 3 bytes in the index. That's terrible. That type field should be a foreign key tinyint, or at the very least a much, much shorter varchar field (such as 8 or 10). You have lots of those varchar(255) fields, which looks like lazy design: by not gathering correct requirements and designing accordingly you will hurt your database performance, waste disk space, and cause yourself all kinds of future problems.

3. Why are you using OUTER JOIN? It looks to me like you're using it because you don't know the difference, since you're not looking for NULL rows or anything. In fact, it looks like MySQL is smart enough to know that you've negated the OUTER JOIN by putting conditions on the joined tables in the WHERE clause, and converts them to INNER JOINs. Don't rely on that! Use the correct join type.
Regards,
Gavin Towey

-----Original Message-----
From: Ciaran Lee [mailto:ciaran@gmail.com]
Sent: Tuesday, September 22, 2009 1:32 PM
To: mysql@lists.mysql.com
Subject: query optimization question (my struggle against 'using temporary; using filesort')

[quoted message trimmed; the original question appears in full below]
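Gavin's third point, that a WHERE condition on the outer-joined table quietly turns a LEFT OUTER JOIN into an INNER JOIN, can be seen with a toy example. A minimal sketch in Python with SQLite, using invented rows:

```python
import sqlite3

# A WHERE filter on a column of the outer-joined table discards exactly the
# NULL-extended rows that made the join "outer", so the result matches a
# plain INNER JOIN. Table names follow the thread; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE listings  (id INTEGER PRIMARY KEY);
CREATE TABLE addresses (addressable_id INTEGER, parent_city_id INTEGER);
INSERT INTO listings VALUES (1), (2), (3);
INSERT INTO addresses VALUES (1, 3), (2, 7);  -- listing 3 has no address
""")

left_join_where = conn.execute("""
    SELECT l.id FROM listings l
    LEFT OUTER JOIN addresses a ON a.addressable_id = l.id
    WHERE a.parent_city_id = 3   -- filters out the NULL-extended rows
""").fetchall()

inner_join = conn.execute("""
    SELECT l.id FROM listings l
    JOIN addresses a ON a.addressable_id = l.id
    WHERE a.parent_city_id = 3
""").fetchall()

assert left_join_where == inner_join  # only listing 1 survives either way
```

If the intent really were "listings even without an address", the city filter would have to move into the ON clause instead of the WHERE clause.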
Re: query optimization question (my struggle against 'using temporary; using filesort')
Hi Gavin,

Thanks very much, I'll implement as many of your suggestions as possible. The varchar(255)'s are inexcusable and I feel suitably ashamed :) The queries were generated by ActiveRecord (an ORM library for Ruby), although even if I had written them myself they would probably not be much better.

Regards,
Ciaran Lee

2009/9/24 Gavin Towey gto...@ffn.com

[quoted message trimmed; see Gavin's reply above]
query optimization question (my struggle against 'using temporary; using filesort')
Hi,

I hope this is the right place to ask a question about query optimization.

Background: I have a database which has events, which occur in places (listings). Places have addresses, and addresses belong to a city. I can select the latest event within a particular city very efficiently (less than 2ms), but selecting the latest listing within a city is REALLY slow (10-20 seconds) despite being almost a subset of the event query. I have been working on this for about a day, and have tried all sorts of tweaks to the indexes but to no avail. I always seem to end up with 'using temporary; using filesort' as the 'extra' content in the explain result. If anyone has a suggestion for what I might do to fix this, I'd really appreciate it. If not, I could further de-normalize the database for performance reasons, but I would feel dirty for doing so.

Here is the fast query (select the latest event within a particular city), and its EXPLAIN:

SELECT events.*, listings.*, addresses.*
FROM `events`
LEFT OUTER JOIN `listings` ON `listings`.id = `events`.listing_id
LEFT OUTER JOIN `addresses` ON `addresses`.addressable_id = `listings`.id
  AND `addresses`.addressable_type = 'Listing'
WHERE (addresses.parent_city_id = 3)
ORDER BY events.id DESC
LIMIT 1

*** 1. row ***  id: 1 | select_type: SIMPLE | table: events | type: index | possible_keys: index_events_on_listing_id | key: PRIMARY | key_len: 4 | ref: NULL | rows: 1 | Extra:
*** 2. row ***  id: 1 | select_type: SIMPLE | table: listings | type: eq_ref | possible_keys: PRIMARY | key: PRIMARY | key_len: 4 | ref: ratemyarea.events.listing_id | rows: 1 | Extra: Using where
*** 3. row ***  id: 1 | select_type: SIMPLE | table: addresses | type: ref | possible_keys: index_addresses_on_parent_city_id_and_addressable_type, index_addresses_on_addressable_type_and_addressable_id, addressable_id_type_city | key: index_addresses_on_addressable_type_and_addressable_id | key_len: 773 | ref: const, ratemyarea.listings.id | rows: 1 | Extra: Using where

Here is the slow query (select the latest listing within a particular city), and its EXPLAIN:

SELECT listings.*, addresses.*
FROM `listings`
LEFT OUTER JOIN `addresses` ON `addresses`.addressable_id = `listings`.id
  AND `addresses`.addressable_type = 'Listing'
WHERE (addresses.parent_city_id = 3)
ORDER BY listings.id DESC
LIMIT 1

*** 1. row ***  id: 1 | select_type: SIMPLE | table: addresses | type: ref | possible_keys: index_addresses_on_parent_city_id_and_addressable_type, index_addresses_on_addressable_type_and_addressable_id, addressable_id_type_city | key: addressable_id_type_city | key_len: 773 | ref: const,const | rows: 25680 | Extra: Using where; Using temporary; Using filesort
*** 2. row ***  id: 1 | select_type: SIMPLE | table: listings | type: eq_ref | possible_keys: PRIMARY | key: PRIMARY | key_len: 4 | ref: ratemyarea.addresses.addressable_id | rows: 1 | Extra:
Re: Hi Query Optimization steps
Hi!

bharani kumar wrote: Then which DB engine is better.

Which fruit is better, an apple or a banana? And if X is better, why do farmers still grow Y, and why do people buy it?

More seriously: different engines offer different features, as described in the manual. The most obvious differences are full-text search (MyISAM) versus transactions and referential integrity (InnoDB), but there are more. During your database schema design you should select, for each table, the engine that is appropriate.

Jörg

-- Joerg Bruehe, MySQL Build Team, joerg.bru...@sun.com Sun Microsystems GmbH, Komturstraße 18a, D-12099 Berlin Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel Vorsitzender des Aufsichtsrates: Martin Haering Muenchen: HRB161028
RE: Hi Query Optimization steps
-----Original Message-----
From: joerg.bru...@sun.com [mailto:joerg.bru...@sun.com]
Sent: Wednesday, August 26, 2009 5:08 AM
To: mysql@lists.mysql.com
Cc: bharani kumar
Subject: Re: Hi Query Optimization steps

Hi! bharani kumar wrote: Then which DB engine is better. Which fruit is better, an apple or a banana? And if X is better, why do farmers still grow Y, and why do people buy it?

[JS] Good answer. I did performance consulting for over 25 years, on *NIX and other operating systems, and there was one response that suited all questions: It depends.

Regards, Jerry Schwartz The Infoshop by Global Information Incorporated 195 Farmington Ave. Farmington, CT 06032 860.674.8796 / FAX: 860.674.8341 www.the-infoshop.com
Re: Hi Query Optimization steps
Then which DB engine is better.

Thanks

On Mon, Aug 24, 2009 at 8:16 PM, mos mo...@fastmail.fm wrote:

[quoted message trimmed; see mos's reply below]

-- Regards B.S.Bharanikumar http://php-mysql-jquery.blogspot.com/
Re: Hi Query Optimization steps
At 05:01 AM 8/22/2009, you wrote:

> Hi all Tell Me few Query Optimization Tips For example.. some people saying that , avoid using more of and condition in the query

Huh? You need to carefully plan your table structure and use indexes sparingly on the columns you are using in the Where clauses. An EXPLAIN in front of the Select statement will show you which indexes it is using.

> This sort of very basic things please tell me ... Also tell me , in the optimization view ,, which database engine is best

If you need transactions or RI then you have to use InnoDb. Otherwise MyISAM is quite fast. Again it depends on the type of queries you are executing.

> Thanks Bharanikumar

There is no magic wand to getting MySQL to run faster except planning and knowledge. It will take some effort on your part but it will be fun. Take a look at High Performance MySQL 2nd Edition. Some of the first edition is online at http://books.google.com/books?id=sgMvu2uZXlsCprintsec=frontcoverdq=high+performance+mysql#v=onepageq=f=false You can get the second edition at Amazon. Also you may need to read other MySQL books like MySQL Cookbook http://books.google.com/books?id=aC3blgT3D3YCprintsec=frontcoverdq=mysql+cookbook#v=onepageq=f=false The 2nd edition is at Amazon. or MySQL by Paul DuBois is also quite good. These will get you started on the right foot.

Mike
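Mike's EXPLAIN tip can be tried out in seconds. The sketch below uses SQLite's EXPLAIN QUERY PLAN rather than MySQL's EXPLAIN (the output columns differ, but the workflow is the same: prepend a keyword to the SELECT and read off which index, if any, was chosen):

```python
import sqlite3

# Tiny table with one secondary index; names are invented for the demo.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT);
CREATE INDEX idx_val ON t(val);
""")

# Prepend EXPLAIN QUERY PLAN to see how the WHERE clause will be resolved.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM t WHERE val = 'BLUE'"
).fetchall()

# The last field of each plan row is a human-readable strategy string,
# e.g. "SEARCH t USING INDEX idx_val (val=?)".
assert any("idx_val" in row[-1] for row in plan)
print(plan)
```

In MySQL the equivalent is simply `EXPLAIN SELECT ...`, where the `key` column shows the chosen index and `Extra` flags things like "Using temporary" or "Using filesort".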
Hi Query Optimization steps
Hi all,

Tell me a few query optimization tips. For example, some people say to avoid using too many AND conditions in the query. Please tell me this sort of very basic thing. Also tell me, from the optimization point of view, which database engine is best.

Thanks
Bharanikumar
Re: Query Optimization
The index hint is not in production code. I was trying to force it to use the index even when using the OR clause; I meant to take that out before I sent the email. The table structure is:

CREATE TABLE `customer` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `ssn` varchar(32) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_ssn` (`ssn`)
) ENGINE=InnoDB

CREATE TABLE `customer_id` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `customer_id` int(10) unsigned DEFAULT NULL,
  `id_num` varchar(32) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `customer_key` (`customer_id`),
  KEY `id_id_num` (`id_num`)
) ENGINE=InnoDB

The explain output of the query using the OR clause:

*** 1. row ***  id: 1 | select_type: SIMPLE | table: customer | type: ALL | possible_keys: idx_ssn | key: NULL | key_len: NULL | ref: NULL | rows: 176680 | Extra: Using where; Using temporary
*** 2. row ***  id: 1 | select_type: SIMPLE | table: customer_id | type: ref | possible_keys: customer_key | key: customer_key | key_len: 5 | ref: aca_ecash.customer.id | rows: 1 | Extra: Using where; Distinct
2 rows in set (0.00 sec)

Using a UNION results in:

*** 1. row ***  id: 1 | select_type: PRIMARY | table: customer | type: range | possible_keys: idx_ssn | key: idx_ssn | key_len: 35 | ref: NULL | rows: 1 | Extra: Using where; Using temporary
*** 2. row ***  id: 1 | select_type: PRIMARY | table: customer_id | type: ref | possible_keys: customer_key | key: customer_key | key_len: 5 | ref: aca_ecash.customer.id | rows: 1 | Extra: Using index; Distinct
*** 3. row ***  id: 2 | select_type: UNION | table: customer_id | type: range | possible_keys: customer_key,id_id_num | key: id_id_num | key_len: 35 | ref: NULL | rows: 1 | Extra: Using where; Using temporary
*** 4. row ***  id: 2 | select_type: UNION | table: customer | type: eq_ref | possible_keys: PRIMARY | key: PRIMARY | key_len: 4 | ref: aca_ecash.customer_id.customer_id | rows: 1 | Extra: Using where
*** 5. row ***  id: NULL | select_type: UNION RESULT | table: <union1,2> | type: ALL | possible_keys: NULL | key: NULL | key_len: NULL | ref: NULL | rows: NULL | Extra:
5 rows in set (0.01 sec)

The union is much faster. I've tested the same search for ID numbers on our test system (Windows 32-bit, 2GB RAM, P4 3GHz) against the production system (RHEL 64-bit, 16GB RAM, dual Xeon 2GHz); the search on the test system is almost instant, while on the production system it takes 4 to 6 seconds. There's not much traffic today on it. I'm going to put the UNION into production and see how it goes. Thanks for the replies.

-johnny

On Tue, Jan 13, 2009 at 7:39 PM, Andrew Garner andrew.b.gar...@gmail.com wrote:

[quoted message trimmed; see the exchange above]

-- Johnny Withers 601.209.4985 joh...@pixelated.net
Re: Query Optimization
Try a union instead of an or condition.
http://dev.mysql.com/doc/refman/5.0/en/union.html

Johnny Withers wrote:

[quoted message trimmed; the original question appears in full below]
Re: Query Optimization
Do you have an index on id_num? What sort of explain output do you get when you don't use a query hint? Your USE INDEX hint may be causing MySQL to ignore a better strategy. If you have separate indexes on ssn and id_num, MySQL may be able to efficiently use an index merge optimization. See http://dev.mysql.com/doc/refman/5.0/en/index-merge-optimization.html. This is only in 5.0+; on older versions of MySQL you may find a union more efficient.

On Mon, Jan 12, 2009 at 9:43 AM, Johnny Withers joh...@pixelated.net wrote:

[quoted message trimmed; the original question appears in full below]
Re: Query Optimization
> If you have separate indexes on ssn and id_num, MySQL may be able to efficiently use an index merge optimization. See http://dev.mysql.com/doc/refman/5.0/en/index-merge-optimization.html. This is only in 5.0+ - on older versions of MySQL you may find a union more efficient.

And in newer versions, too. The optimizer frequently underestimates the cost of the merge operation and the required random I/O for row lookups. So, yes it can use an index merge, but... efficiency is another question. I've seen table scans outperform a two-way index merge by orders of magnitude.
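What an index merge union does, conceptually: each index scan yields a sorted list of row ids, the lists are merged and de-duplicated, and every surviving id then costs one random row lookup, which is the cost Baron is pointing at. A small sketch with invented row ids:

```python
import heapq

# Hypothetical row ids returned by two separate index range scans.
ssn_index_hits = [4, 9, 57]        # rows where ssn matches
idnum_index_hits = [9, 102, 311]   # rows where id_num matches

# Merge the two sorted streams and drop duplicates (row 9 matched both).
merged = []
for rid in heapq.merge(ssn_index_hits, idnum_index_hits):
    if not merged or merged[-1] != rid:
        merged.append(rid)

assert merged == [4, 9, 57, 102, 311]
# Each surviving id now requires one random-access row fetch; with enough
# hits scattered across the table, a sequential scan can end up cheaper.
```

This is only a model of the strategy, not MySQL's implementation, but it shows why the break-even point depends on how many ids survive the merge.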
Re: Query Optimization
On Tue, Jan 13, 2009 at 7:07 PM, Baron Schwartz ba...@xaprb.com wrote:

[quoted message trimmed; see Baron's reply above]

These appeared to be high selectivity indexes, but perhaps I assumed too much. :)
Query Optimization
I have the following tables:

Customer: id, ssn
Customer_Id: id, customer_id, id_num

The customer table holds customers along with their SSN, and the customer_id table holds identifications for each customer (Driver's License, State Issued ID, Student ID, etc.). The SSN column from the customer table is VARCHAR(9) and the id_num column from the customer_id table is VARCHAR(32). Both of these columns have an index on them.

The following query uses the index on customer.ssn and executes in 0 ms:

SELECT SQL_NO_CACHE customer.id, customer.ssn, customer_id, id_num
FROM customer USE INDEX(idx_ssn)
LEFT JOIN customer_id ON customer.id = customer_id.customer_id
WHERE ssn='123456789';

Explain output:

*** 1. row ***  id: 1 | select_type: SIMPLE | table: customer | type: ref | possible_keys: idx_ssn | key: idx_ssn | key_len: 35 | ref: const | rows: 1 | Extra: Using where; Using index
*** 2. row ***  id: 1 | select_type: SIMPLE | table: customer_id | type: ref | possible_keys: customer_key | key: customer_key | key_len: 5 | ref: aca_ecash.customer.id | rows: 1 | Extra:

Now, this is the query I have trouble with; it does not use the index (or says it does but doesn't?) and on a busy system (200+ queries per sec) can take up to 20 seconds or more to execute:

SELECT SQL_NO_CACHE customer.id, customer.ssn, customer_id, id_num
FROM customer USE INDEX(idx_ssn)
LEFT JOIN customer_id ON customer.id = customer_id.customer_id
WHERE ssn='123456789' OR id_num='123456789';

Explain output:

*** 1. row ***  id: 1 | select_type: SIMPLE | table: customer | type: index | possible_keys: idx_ssn | key: idx_ssn | key_len: 35 | ref: NULL | rows: 165843 | Extra: Using index
*** 2. row ***  id: 1 | select_type: SIMPLE | table: customer_id | type: ref | possible_keys: customer_key | key: customer_key | key_len: 5 | ref: aca_ecash.customer.id | rows: 1 | Extra: Using where

Is there some way I can make it use the index? I've thought about redesigning the query to select from the customer_id table first; if a row is found, just return the matching customer_id from the customer table. But I wanted to see if maybe I'm going about this the wrong way before I engineer some way around this.

Thanks in advance,
- Johnny Withers 601.209.4985 joh...@pixelated.net
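For reference, the UNION rewrite the replies converge on might look like the following. This is a sketch in Python with SQLite on a miniature copy of the two tables (sample rows invented); the point is that each branch carries a single condition that one index can answer:

```python
import sqlite3

# Miniature versions of Johnny's two tables, with the same index layout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, ssn TEXT);
CREATE INDEX idx_ssn ON customer(ssn);
CREATE TABLE customer_id (id INTEGER PRIMARY KEY,
                          customer_id INTEGER, id_num TEXT);
CREATE INDEX customer_key ON customer_id(customer_id);
CREATE INDEX id_id_num ON customer_id(id_num);
INSERT INTO customer VALUES (1, '123456789'), (2, '999999999');
INSERT INTO customer_id VALUES (10, 2, '123456789');
""")

# Branch 1 can use idx_ssn; branch 2 can use id_id_num then customer_key.
# UNION also de-duplicates customers that match on both ssn and id_num.
rows = conn.execute("""
    SELECT c.id, c.ssn FROM customer c WHERE c.ssn = '123456789'
    UNION
    SELECT c.id, c.ssn FROM customer c
    JOIN customer_id ci ON c.id = ci.customer_id
    WHERE ci.id_num = '123456789'
    ORDER BY 1
""").fetchall()

assert rows == [(1, '123456789'), (2, '999999999')]
print(rows)
```

The OR form forces the optimizer to consider both conditions against one driving table, which is why it fell back to scanning the idx_ssn index end to end; splitting the conditions into UNION branches restores one clean index lookup per branch.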
Re: Query optimization help
First, you might want to move the WHERE ... t3.int_a = some integer condition into the join condition for t3. You're not using anything from t4, so I'm not sure why you have that table in your query.

You can suggest or force MySQL to use an index if it's using the wrong one: http://dev.mysql.com/doc/refman/5.0/en/index-hints.html On very rare occasions I've had to do this. What's happening is that MySQL is analyzing the information it has about the data and indexes and coming to the wrong conclusion, perhaps even opting for an entire table scan. You can run ANALYZE TABLE to force MySQL to update the information it has about the data. This may actually solve your problem.

Try SHOW INDEX FROM t1 to see what data MySQL has about the indexes. Sometimes the CARDINALITY (uniqueness) column will be NULL, which can indicate a problem. Posting the result of your EXPLAIN will actually be helpful.

Hope that helps.

Brent Baisley

On Aug 11, 2008, at 8:26 AM, Jonathan Terhorst wrote:

[quoted message trimmed; the original question appears in full below]
Query optimization help
I have this query:

SELECT DISTINCT t1.string
FROM t1
LEFT JOIN t2 ON t1.string=t2.string
LEFT JOIN t3 ON t1.int_a=t3.int_a
LEFT JOIN t4 ON t1.int_b=t4.int_b
WHERE t1.string != ''
  AND t2.string IS NULL
  AND t3.int_a = some integer
ORDER BY t1.string ASC

This query is executing slower than it should. EXPLAIN has it using temporary and using filesort. I have indexes on every column in the query, but I think the problem is the one-index-per-table limitation. According to EXPLAIN, there are two possible indices in use for t1 (int_a and string), but only int_a is being used. So I tried constructing a compound index on int_a and string. Although this new index appears in possible_keys, EXPLAIN still shows the key actually being used as int_a. I tried building the compound key in both orders and had the same results. How do I get MySQL to use all possible keys on t1 when running the query? Thanks!
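One thing worth checking with the compound index is whether it can serve both the equality filter and the ORDER BY at once. Below is a sketch using SQLite's EXPLAIN QUERY PLAN (MySQL's optimizer differs, but the principle carries over: equality on the leading column, sort on the trailing one); table and column names follow the question:

```python
import sqlite3

# With a compound index (int_a, string), rows matching int_a = 5 come out
# of the index already sorted by string, so no separate sort pass is needed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (int_a INTEGER, int_b INTEGER, string TEXT);
CREATE INDEX idx_a_string ON t1(int_a, string);
""")

plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT string FROM t1
    WHERE int_a = 5 AND string != ''
    ORDER BY string
""").fetchall()

detail = " ".join(row[-1] for row in plan)
assert "idx_a_string" in detail          # the compound index drives the filter
assert "TEMP B-TREE" not in detail       # and no extra sort step is planned
print(plan)
```

If MySQL still prefers the single-column int_a index after ANALYZE TABLE, comparing EXPLAIN with a FORCE INDEX hint on the compound index is a quick way to see whether the optimizer's choice or the index itself is the problem.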
Re: Query optimization
-What I'm trying to do: Bit hard to explain. I've got a table consisting of source addresses (ipv4_src), destination addresses (ipv4_dst), and port numbers (port_dst), plus some other irrelevant columns. Ultimately my goal is to find a linear order in a subset of ports. For example, host A connects to B on ports 1, 2, 3, 4, ... 20; I would like to find this linear relation. To achieve this, I have written some mathematical formulas. Unfortunately there is one downside to my formula: it can be fooled by adding some extreme values. It cannot find a linear relation in this list: 1, 2, 3, 4, ... 20, 45000. Although there are 20 numbers lined up, the 45000 ruins the result of the algorithm. So the query I've submitted is meant to remove extreme values. The functions boxplot(pF.port_dst,LOW) and boxplot(pF.port_dst,HIGH) calculate in linear time the allowed range of numbers; extreme values won't be included in this range. So in the example, the range would be [1,20], thereby omitting the value 45000. Finally I would like to filter my table of port numbers for every (ipv4_src, ipv4_dst) tuple and remove all the numbers not fitting in the range. -In human readable pseudo code this is the query: SELECT source,dest,port,octets FROM ( SELECT source,dest,boxplot(port,LOW) AS low,boxplot(port,HIGH) AS high FROM --Calculate the LOW and HIGH values for each source,dest pair. ( SELECT source,dest,port,octets FROM ... GROUP BY source,dest,port --This removes the duplicate entries. ) pF GROUP BY source,dest ) boxplot, ( SELECT source,dest,port,octets FROM ... GROUP BY source,dest,port --This removes the duplicate entries (again!). ) filter WHERE filter.source=boxplot.source AND filter.dest=boxplot.dest AND filter.port>=boxplot.LOW AND filter.port<=boxplot.HIGH --Relate the tables 'boxplot' and 'filter' to each other AND select only the source,dest,port tuples where port is in the range [LOW,HIGH] from the filter table.
-Here is the original query I would like to optimize again: SELECT filter.ipv4_src, filter.ipv4_dst, filter.port_dst, filter.octets FROM ( SELECT pF.ipv4_src, pF.ipv4_dst, boxplot(pF.port_dst,LOW) AS low, boxplot(pF.port_dst,HIGH) AS high FROM ( SELECT ipv4_src, ipv4_dst, port_dst, octets FROM ... GROUP BY ipv4_src, ipv4_dst, port_dst ) pF GROUP BY pF.ipv4_src, pF.ipv4_dst ) boxplot, ( SELECT ipv4_src, ipv4_dst, port_dst, octets FROM ... GROUP BY ipv4_src, ipv4_dst, port_dst ) filter WHERE filter.ipv4_src=boxplot.ipv4_src AND filter.ipv4_dst=boxplot.ipv4_dst AND filter.port_dst>=boxplot.low AND filter.port_dst<=boxplot.high I'll try the EXPLAIN query tonight, but first I've got to create a small database to speed up the results. Thanks in advance for your help. On Jan 22, 2008 8:15 AM, Sebastian Mendel [EMAIL PROTECTED] wrote: Joris Kinable schrieb: [original message and query snipped] What are you trying to do? And how about formatting your query in a human-readable way? Did you try EXPLAIN? What kind of syntax is this: boxplot(pF.port_dst,LOW)? -- Sebastian
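The structural fix the poster is asking for (compute the deduplicated "filter" set once and reuse it) is exactly what a common table expression gives. MySQL 8.0 supports WITH; on the poster's older MySQL a real temporary table or view is needed instead. A sketch on invented data, using SQLite, with MIN/MAX standing in for the poster's own boxplot() UDF:

```python
import sqlite3

# Invented flow data; the duplicate row is removed by the "filter" CTE.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE flows (ipv4_src TEXT, ipv4_dst TEXT, port_dst INTEGER, octets INTEGER);
    INSERT INTO flows VALUES
        ('a','b',1,10), ('a','b',1,10), ('a','b',2,20), ('a','b',3,30);
""")
rows = conn.execute("""
    WITH filter AS (                -- computed once, used twice below
        SELECT ipv4_src, ipv4_dst, port_dst, octets
        FROM flows GROUP BY ipv4_src, ipv4_dst, port_dst
    ),
    boxplot AS (
        SELECT ipv4_src, ipv4_dst,
               MIN(port_dst) AS low, MAX(port_dst) AS high  -- stand-in for boxplot()
        FROM filter GROUP BY ipv4_src, ipv4_dst
    )
    SELECT f.ipv4_src, f.ipv4_dst, f.port_dst
    FROM filter f JOIN boxplot b
      ON f.ipv4_src = b.ipv4_src AND f.ipv4_dst = b.ipv4_dst
    WHERE f.port_dst >= b.low AND f.port_dst <= b.high
    ORDER BY f.port_dst
""").fetchall()
print(rows)   # -> [('a', 'b', 1), ('a', 'b', 2), ('a', 'b', 3)]
```

The deduplication work is done once instead of twice, which was the stated bottleneck.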
Query optimization
Optimize query I've got one query which I would like to improve a lot, since it takes very long (24 hours) to execute. Here is the idea: 1. Take the table (ipv4_src, ipv4_dst, port_dst) (other columns in this table are not mentioned for clarity) and remove all duplicate tuples. This is done by subquery 'filter'. 2. The same query is performed by the boxplot query, but this time an additional GROUP BY is executed, thereby calculating a User Defined Function boxplot(row,type) which returns a double value. 3. Finally the results of the query in step 2 are used to select a subset of results from the 'filter' table. 4. As you can see, the subquery 'pF' used in step 2 is identical to the query 'filter'. It's an extreme waste to calculate the same table twice. I've tried to create a temporary table from filter, but unfortunately MySQL doesn't allow you to access a temporary table twice in the same query. I prefer a 1-query answer, instead of creating views or temporary tables. Is there a way to improve this query, thereby improving the execution time? Query: SELECT filter.ipv4_src, filter.ipv4_dst, filter.port_dst, filter.octets FROM ( SELECT pF.ipv4_src, pF.ipv4_dst, boxplot(pF.port_dst,LOW) AS low, boxplot(pF.port_dst,HIGH) AS high FROM ( SELECT ipv4_src, ipv4_dst, port_dst, octets FROM mytable WHERE prot=6 GROUP BY ipv4_src, ipv4_dst, port_dst ORDER BY ipv4_src, ipv4_dst, port_dst ASC ) pF GROUP BY pF.ipv4_src, pF.ipv4_dst HAVING COUNT(pF.port_dst) > 10 ) boxplot, ( SELECT ipv4_src, ipv4_dst, port_dst, octets FROM mytable WHERE prot=6 GROUP BY ipv4_src, ipv4_dst, port_dst ORDER BY ipv4_src, ipv4_dst, port_dst ASC ) filter WHERE filter.ipv4_src=boxplot.ipv4_src AND filter.ipv4_dst=boxplot.ipv4_dst AND filter.port_dst>=boxplot.low AND filter.port_dst<=boxplot.high
Re: Query optimization
Joris Kinable schrieb: [original message and query snipped] What are you trying to do? And how about formatting your query in a human-readable way? Did you try EXPLAIN? What kind of syntax is this: boxplot(pF.port_dst,LOW)?
-- Sebastian
Re: help with query optimization (cont'd)
Perrin Harkins schrieb: On Jan 4, 2008 5:51 PM, Eben [EMAIL PROTECTED] wrote: The resultset is paginated on the front end using the SQL_CALC_FOUND_ROWS functionality... Usually a bad idea: http://www.mysqlperformanceblog.com/2007/08/28/to-sql_calc_found_rows-or-not-to-sql_calc_found_rows/ Usually ... but I do not know of any index capable of having FULLTEXT and 'normal' fields in one index: WHERE MATCH table_2.field AGAINST ('value') AND table_2.current = 1 AND table_2.shared_id IN (SELECT shared_id FROM table_1_view) Or am I wrong? -- Sebastian
Re: help with query optimization (cont'd)
On Jan 9, 2008 4:33 AM, Sebastian Mendel [EMAIL PROTECTED] wrote: usually ... but i do not know of any index capable of having FULLTEXT and 'normal' fields in one index Does that matter? It would have to be doing a full scan for SQL_CALC_FOUND_ROWS to work out well. - Perrin
Re: help with query optimization (cont'd)
Perrin Harkins schrieb: On Jan 9, 2008 4:33 AM, Sebastian Mendel [EMAIL PROTECTED] wrote: usually ... but i do not know of any index capable of having FULLTEXT and 'normal' fields in one index Does that matter? Yes, as written in the mentioned article, the test is only relevant with correctly used indexes, but MySQL does not use more than one index, so this query cannot all be done with indexes. Please correct me if I am wrong. -- Sebastian
Re: help with query optimization (cont'd)
On Jan 9, 2008 8:34 AM, Sebastian Mendel [EMAIL PROTECTED] wrote: yes, as written in the mentioned article the test is only relevant with correct used indexes, but MySQL does not use more than one index, so this query cannot all be done with indexes Well, first of all, MySQL 5 does use more than one index, although I'm not sure it can do this with a full text index. http://dev.mysql.com/doc/refman/5.0/en/index-merge-optimization.html Also, there's a lot of room between a full table scan and using every index. Using any index in a way that allows the query to be satisfied without scanning every single row should be enough to make the count query better. - Perrin
Re: help with query optimization
Your biggest problem is probably the subquery/IN you are performing. You should change that to a join. And I don't know about using SQL_CALC_FOUND_ROWS in a full text query that's not boolean, and you shouldn't use it unless you have a LIMIT clause. SELECT SQL_CALC_FOUND_ROWS table_1.id, table_2.* FROM table_2 JOIN (SELECT shared_id FROM table_1_view) AS table_3 ON table_2.shared_id=table_3.shared_id LEFT JOIN table_1 ON table_2.shared_id = table_1.shared_id WHERE MATCH table_2.field AGAINST ('value') AND table_2.current = 1 I know the difference doesn't seem that much, but MySQL optimizes it very differently. Brent On Jan 4, 2008, at 5:47 PM, Eben wrote: Hi, I have a query that has to run on a full text indexed table with many millions of records. I'm trying to figure out some optimizations for it. Here's the general query: SELECT SQL_CALC_FOUND_ROWS table_1.id, table_2.* FROM table_2 LEFT JOIN table_1 ON table_2.shared_id = table_1.shared_id WHERE MATCH table_2.field AGAINST ('value') AND table_2.current = 1 AND table_2.shared_id IN (SELECT shared_id FROM table_1_view) Some details: 1. table_1.shared_id and table_2.shared_id are indexed 2. table_2.current is an indexed tinyint 3. table_1_view is derived from a query like: SELECT shared_id FROM table_1 WHERE some_field LIKE 'some_value%' table_1 is a relatively small table i.e. 100k records table_2 is massive with 10 million records Any ideas or suggestions are appreciated thanks, Eben
help with query optimization (cont'd)
I left something out, the query looks like: SELECT SQL_CALC_FOUND_ROWS table_1.id,table_2.* FROM table_2 LEFT JOIN table_1 ON table_2.shared_id = table_1.shared_id WHERE MATCH table_2.field AGAINST ('value') AND table_2.current = 1 AND table_2.shared_id IN (SELECT shared_id FROM table_1_view) LIMIT 0,10 The resultset is paginated on the front end using the SQL_CALC_FOUND_ROWS functionality... thanks, Eben
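The replies below recommend dropping SQL_CALC_FOUND_ROWS in favor of running the page query and a separate COUNT(*) with the same WHERE clause, which can often be satisfied from an index alone. A sketch on an invented stand-in schema (SQLite; a plain equality predicate stands in for the MATCH ... AGAINST filter):

```python
import sqlite3

# Invented stand-in for table_2; "curr" plays the role of the current flag.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_2 (shared_id INTEGER, curr INTEGER, field TEXT);
    INSERT INTO table_2 VALUES (1,1,'value'), (2,1,'value'), (3,0,'value'), (4,1,'other');
""")
where = "field = 'value' AND curr = 1"  # stands in for the full-text predicate
# Page query with LIMIT, then a separate total count for the paginator.
page = conn.execute(f"SELECT shared_id FROM table_2 WHERE {where} LIMIT 0,10").fetchall()
total = conn.execute(f"SELECT COUNT(*) FROM table_2 WHERE {where}").fetchone()[0]
print(page, total)   # -> [(1,), (2,)] 2
```

Two cheap queries frequently beat one query that forces the server to materialize every matching row just to count it.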
Re: help with query optimization (cont'd)
On Jan 4, 2008 5:51 PM, Eben [EMAIL PROTECTED] wrote: The resultset is paginated on the front end using the SQL_CALC_FOUND_ROWS functionality... Usually a bad idea: http://www.mysqlperformanceblog.com/2007/08/28/to-sql_calc_found_rows-or-not-to-sql_calc_found_rows/ - Perrin
help with query optimization
Hi, I have a query that has to run on a full text indexed table with many millions of records. I'm trying to figure out some optimizations for it. Here's the general query: SELECT SQL_CALC_FOUND_ROWS table_1.id,table_2.* FROM table_2 LEFT JOIN table_1 ON table_2.shared_id = table_1.shared_id WHERE MATCH table_2.field AGAINST ('value') AND table_2.current = 1 AND table_2.shared_id IN (SELECT shared_id FROM table_1_view) Some details: 1. table_1.shared_id and table_2.shared_id are indexed 2. table_2.current is an indexed tinyint 3. table_1_view is derived from a query like: SELECT shared_id FROM table_1 WHERE some_field LIKE 'some_value%' table_1 is a relatively small table i.e. 100k records table_2 is massive with 10 million records Any ideas or suggestions are appreciated thanks, Eben
Re: help with query optimization (cont'd)
I've never used this feature before on my application. On Jan 5, 2008 7:06 AM, Perrin Harkins [EMAIL PROTECTED] wrote: On Jan 4, 2008 5:51 PM, Eben [EMAIL PROTECTED] wrote: The resultset is paginated on the front end using the SQL_CALC_FOUND_ROWS functionality... Usually a bad idea: http://www.mysqlperformanceblog.com/2007/08/28/to-sql_calc_found_rows-or-not-to-sql_calc_found_rows/ - Perrin -- I'm a mysql DBA in china. More about me just visit here: http://yueliangdao0608.cublog.cn
Query Optimization
I need to check a date_time value in a number of tables for up-to-date-ness. The following query is part of a larger PHP script that runs as a cron job every 10 minutes: query select case # If within 2 minutes, do nothing. when (unix_timestamp() - unix_timestamp(date_time)) < 120 then 'ok' # If between 2 and 60 minutes old, send an email each time the script is called (every 10 min). when (unix_timestamp() - unix_timestamp(date_time)) >= 120 and (unix_timestamp() - unix_timestamp(date_time)) < 3600 then 'email' # If over an hour old, send out one email per hour. when (unix_timestamp() - unix_timestamp(date_time)) >= 3600 and (unix_timestamp() - unix_timestamp(date_time)) % 3600 > 2999 and (unix_timestamp() - unix_timestamp(date_time)) % 3600 < 3600 then 'email' end as 'test' from mytable order by date_time desc limit 1; /query This seems to run OK, but I'd be interested if anyone sees any way to improve it. Thanks, --David.
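The CASE logic above can be exercised against fixed ages (in seconds) instead of unix_timestamp() so the outcome is deterministic. This sketch runs the same CASE in SQLite; note the NULL at 3601 seconds, which is the once-per-hour throttling at work (past an hour, 'email' only fires while the age modulo 3600 falls in the final ~10-minute window).

```python
import sqlite3

# Fixed ages stand in for unix_timestamp() - unix_timestamp(date_time).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ages (age INTEGER)")
conn.executemany("INSERT INTO ages VALUES (?)", [(60,), (600,), (3601,), (6600,)])
rows = conn.execute("""
    SELECT age,
           CASE
             WHEN age < 120 THEN 'ok'
             WHEN age >= 120 AND age < 3600 THEN 'email'
             WHEN age >= 3600 AND age % 3600 > 2999 AND age % 3600 < 3600 THEN 'email'
           END AS test
    FROM ages ORDER BY age
""").fetchall()
print(rows)   # -> [(60, 'ok'), (600, 'email'), (3601, None), (6600, 'email')]
```

One simplification worth considering: compute the age once in a derived table (or a variable) instead of repeating the unix_timestamp() subtraction in every branch.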
Query Optimization
Hi All, How can I optimize the following 3-way JOIN SQL query given the following schema: SELECT ORDER.ID, ORDER.STATUS, ORDER.ORDER_TIME, ORDER_ITEM.ID, ORDER_ITEM.QUANTITY, PRODUCT.SIZE, PRODUCT.SHAPE, PRODUCT.PAPER, PRODUCT.TURNAROUND FROM ORDER, ORDER_ITEM, PRODUCT WHERE ORDER.ID = ORDER_ITEM.ORDER_ID AND ORDER_ITEM.PRODUCT_ID = PRODUCT.PRODUCT_ID AND ORDER.STATUS = status AND ORDER.ORDER_TIME > 'startTime' AND ORDER.ORDER_TIME < 'endTime' AND ORDER_ITEM.QUANTITY = quantity AND PRODUCT.SIZE = 'size' AND PRODUCT.SHAPE = 'shape' AND PRODUCT.PAPER = 'paper' AND PRODUCT.TURNAROUND = 'turnaround' ORDER BY ORDER.ORDER_TIME DESC LIMIT start, offset; [ORDER TABLE] row count = 350,544 (read/write, mostly read) ++-+--+-+-+ | Field | Type| Null | Key | Default | ++-+--+-+-+ | ID | int(10) unsigned| NO | PRI | 0 | | STATUS | tinyint(3) unsigned | NO | MUL | 0 | | ORDER_TIME | datetime| YES | | NULL| [ORDER_ITEM TABLE] row count = 548,456 (read/write, mostly read) ++-+--+-+-+ | Field | Type| Null | Key | Default | ++-+--+-+-+ | ID | int(10) unsigned| NO | PRI | 0 | | ORDER_ID | int(10) unsigned| NO | PRI | 0 | | PRODUCT_ID | int(10) unsigned| YES | | 0 | | QUANTITY | int(10) unsigned| YES | | NULL| [PRODUCT TABLE] row count = 56,641 (static content, hardly ever changes) +---+--+--+-+-+ | Field | Type | Null | Key | Default | +---+--+--+-+-+ | PRODUCT_ID| int(10) unsigned | NO | PRI | 0 | | SIZE | varchar(50) | YES | | NULL| (indexed) | PAPER | varchar(50) | YES | | NULL| (indexed) | TURNAROUND| varchar(50) | YES | | NULL| (indexed) | SHAPE | varchar(50) | YES | | NULL| (indexed) Thanks, Drew
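One plausible indexing approach for this shape of query: the driving table is ORDER, filtered on STATUS plus a time range and sorted by ORDER_TIME DESC, so a composite index on (STATUS, ORDER_TIME) can serve the WHERE and the ORDER BY in one pass, with ORDER_ITEM(ORDER_ID) carrying the first join. A sketch on invented sample rows (SQLite, tables renamed since ORDER is a reserved word; MySQL DDL is analogous):

```python
import sqlite3

# Invented miniature of the poster's schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status INTEGER, order_time TEXT);
    CREATE TABLE order_item (id INTEGER, order_id INTEGER, product_id INTEGER, quantity INTEGER);
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, size TEXT);
    CREATE INDEX idx_orders_status_time ON orders (status, order_time);  -- filter + sort
    CREATE INDEX idx_item_order ON order_item (order_id);                -- join column
    INSERT INTO orders VALUES (1, 2, '2006-01-02'), (2, 2, '2006-01-03'), (3, 9, '2006-01-02');
    INSERT INTO order_item VALUES (10, 1, 100, 5), (11, 2, 100, 5);
    INSERT INTO product VALUES (100, '4x6');
""")
rows = conn.execute("""
    SELECT o.id FROM orders o
    JOIN order_item i ON o.id = i.order_id
    JOIN product p ON i.product_id = p.product_id
    WHERE o.status = 2 AND o.order_time > '2006-01-01' AND o.order_time < '2006-02-01'
      AND p.size = '4x6'
    ORDER BY o.order_time DESC LIMIT 10
""").fetchall()
print([r[0] for r in rows])   # -> [2, 1]
```

The four single-column indexes on PRODUCT could similarly be collapsed into one composite over (SIZE, SHAPE, PAPER, TURNAROUND) since the query constrains all four with equality.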
Insert ... Select Max() .. transactional Query optimization on an InnoDB table
Hi, Using MySQL-5.0.22/Win-XP-SP2 and storing data in InnoDB tables. Clients connect from VB6. Enclosing all transactions in START TRANSACTION ... COMMIT statements from VB6 clients. I have the following query for a table tmp2 with a column x of data-type INT. Insert into tmp2(x,y,z) Select (Max(x) - ((Max(x) div 100) * 100)) + 1 as pid, 'text1', 'text2' from tmp2 where (x div 100) = 2147 having (Max(x) - ((Max(x) div 100) * 100)) < 483647; I have the following questions: 1) Will the above INSERT statement run if I use constant values for y and z, without a GROUP BY clause? 2) How can I make the above select statement return a 0 (ZERO) value for pid in case it returns an empty resultset? Though I am also thinking about it, it'd be great if you guys could give me some hints. I want to keep it fast so I haven't used any functions, etc. -- Thanks in advance, Asif
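For question 2, one common answer is to wrap the aggregate in IFNULL() (or the standard COALESCE()), since MAX() over an empty group yields NULL. A sketch of the idea on a toy tmp2 (SQLite; `/` replaces MySQL's integer `div`, and the same wrapper works inside the INSERT ... SELECT):

```python
import sqlite3

# Toy tmp2 with only the x column, for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tmp2 (x INTEGER)")
query = "SELECT COALESCE(MAX(x) - (MAX(x) / 100) * 100 + 1, 0) FROM tmp2 WHERE x / 100 = 2147"
pid_empty = conn.execute(query).fetchone()[0]   # empty result -> MAX is NULL -> 0
conn.execute("INSERT INTO tmp2 (x) VALUES (214705)")
pid = conn.execute(query).fetchone()[0]         # 214705 - 214700 + 1
print(pid_empty, pid)   # -> 0 6
```

In MySQL the expression would read IFNULL(MAX(x) - ((MAX(x) div 100) * 100) + 1, 0).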
Re: Query Optimization Question
[EMAIL PROTECTED] wrote: Yes, a ranged query should respond faster than a negation. In some cases you can seriously improve query performance for a negation query if you split it into two range queries unioned together. Here is a pseudo example: This query should be slow due to the table scan it takes to test the condition: SELECT ... FROM ... WHERE indexfield != 16 This query will use the index if the number of rows with indexfield != 16 is small enough (< about 30%). This query should be noticeably faster due to using ranged queries (partial index ranged matches): ( SELECT ... FROM ... WHERE indexfield < 16 ) UNION ( SELECT ... FROM ... WHERE indexfield > 16 ) At the very least, you'll want to make that UNION ALL, rather than just UNION, so mysql doesn't waste time looking for duplicates to remove. But even then, this may be slower. First, these two unioned queries may still not use the index. If the number of rows with indexfield < 16 is too large (more than about 30%) the first query will be a table scan. If the number of rows with indexfield > 16 is too large (more than about 30%) the second query will be a table scan. In fact, if the number of rows with indexfield = 16 is less than about 40% of the table, then at least one of the two unioned queries is guaranteed to be a table scan. Worse yet, this query stands a good chance of being twice as long as the single != query, because it may require 2 table scans! Second, even if both unioned queries use the index, the result still may not be faster. If the combined number of matching rows is too high, the full table scan should be faster than the indexed lookups. For example: SELECT cat, COUNT(*) FROM inits GROUP BY cat; +------+----------+ | cat | COUNT(*) | +------+----------+ | 0 | 5743 | | 1 | 3792 | | 2 | 30727 | | 3 | 1926 | | 4 | 7812 | +------+----------+ 19273 rows (38.55%) match cat != 2, with roughly half (~19%) on either side. First, the != case: EXPLAIN SELECT * FROM inits WHERE cat != 2 \G *** 1.
row *** id: 1 select_type: SIMPLE table: inits type: ALL possible_keys: cat_idx key: NULL key_len: NULL ref: NULL rows: 5 Extra: Using where As expected, a full table scan. SELECT * FROM inits WHERE cat != 2; ... 19273 rows in set (0.37 sec) Now, the unioned range queries: EXPLAIN SELECT * FROM inits WHERE cat < 2 UNION ALL SELECT * FROM inits WHERE cat > 2 \G *** 1. row *** id: 1 select_type: PRIMARY table: inits type: range possible_keys: cat_idx key: cat_idx key_len: 5 ref: NULL rows: 5680 Extra: Using where *** 2. row *** id: 2 select_type: UNION table: inits type: range possible_keys: cat_idx key: cat_idx key_len: 5 ref: NULL rows: 6543 Extra: Using where *** 3. row *** id: NULL select_type: UNION RESULT table: union1,2 type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: NULL Extra: As hoped, mysql plans to use the index for each query. SELECT * FROM inits WHERE cat < 2 UNION ALL SELECT * FROM inits WHERE cat > 2; ... 19273 rows in set (0.78 sec) Despite (because of) using the index, this takes more than twice as long! Of course, if you have to do a table scan ANYWAY (because a value you have in a constraint is not in an index) then this won't help. This optimization is all about getting the engine to use an index whenever it can instead of performing a table scan. Of course, that is not always possible. Even when using an index is possible, it is not always desirable. I'd suggest not trying to outfox the optimizer until you've first determined it is making bad choices, and then test to make sure the solution is actually an improvement. Michael
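Michael's timings rest on the two forms being row-for-row equivalent, which is easy to check mechanically. A small sketch on an invented inits table (SQLite) comparing the negation against the unioned ranges:

```python
import sqlite3

# Invented sample data for the cat column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inits (cat INTEGER)")
conn.executemany("INSERT INTO inits VALUES (?)", [(c,) for c in [0, 1, 2, 2, 3, 4, 4]])
neg = conn.execute("SELECT cat FROM inits WHERE cat != 2 ORDER BY cat").fetchall()
rng = conn.execute("""
    SELECT cat FROM (
        SELECT cat FROM inits WHERE cat < 2
        UNION ALL                       -- ALL: the two ranges cannot overlap
        SELECT cat FROM inits WHERE cat > 2
    ) ORDER BY cat
""").fetchall()
print(neg == rng)   # -> True
```

Equivalence holds because the ranges partition the non-matching values; UNION ALL is safe precisely because they cannot overlap, which is also why plain UNION only adds a pointless deduplication pass.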
Re: Query Optimization Question
Robert DiFalco [EMAIL PROTECTED] wrote on 03/11/2006 12:43:43 PM: In a previous database engine I was using, an IN was more optimal than a <>. So, for example: SELECT * FROM table WHERE table.type IN (1,2,3); Where the possible values of type are 0-3, was appreciably faster than: SELECT * FROM table WHERE table.type <> 0; I've been playing with the Query Browser and checking out the optimization documents and haven't been able to make a clear call on whether or not this is also the case with MySQL/InnoDB. TIA, R. YES, YES, YES! This is definitely an optimization. When you say IN or =, you are asking for matching values. Matches can come from indexes. When you say <> or NOT IN, you are asking for everything BUT matches. In order to evaluate a negative, the database engine (and this is usually true regardless of database server) almost always performs a full table scan to test every row to make sure it is either <> or NOT IN. At the very best, they have to perform a full index scan, which is still less efficient than ranged or values-based lookups. It's when you get into the situation where you are matching against dozens of IN-clause items that you may run into slowdowns again. Until you reach 2 or 3 dozen terms (depending on your hardware) you should be faster with an IN comparison than a <> or a NOT IN comparison. An optimization to search for all BUT a term or two is to create a temporary table of all of your terms and delete the exact ones you want to exclude. Put an index on your temp table, then JOIN that back into your query again (replacing the huge IN clause). The database will match index to index and things will get fast again. This technique can scale up to some really big queries. Always try to code for the affirmative tests. Your users will thank you. Shawn Green Database Administrator Unimin Corporation - Spruce Pine
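Shawn's temp-table trick can be sketched concretely: materialize the full term list, delete the exclusions, index it, and JOIN instead of writing a huge IN (or a negation). Invented data, using SQLite; MySQL's CREATE TEMPORARY TABLE works the same way.

```python
import sqlite3

# Invented table with a low-cardinality type column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER, type INTEGER);
    INSERT INTO t VALUES (1,0), (2,1), (3,2), (4,3);
    CREATE TEMP TABLE allowed (type INTEGER);
    INSERT INTO allowed VALUES (0), (1), (2), (3);   -- all possible terms
    DELETE FROM allowed WHERE type = 0;              -- drop the ones to exclude
    CREATE INDEX temp.idx_allowed ON allowed (type); -- index-to-index join
""")
rows = conn.execute(
    "SELECT t.id FROM t JOIN allowed a ON t.type = a.type ORDER BY t.id"
).fetchall()
print([r[0] for r in rows])   # -> [2, 3, 4]
```

The join expresses "everything but type 0" as an affirmative match, which is exactly the point of the advice above.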
RE: Query Optimization Question
Shawn, Any performance gains for specifying type > 0 rather than type <> 0? R. From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Monday, March 13, 2006 6:37 AM To: Robert DiFalco Cc: mysql@lists.mysql.com Subject: Re: Query Optimization Question [quoted message snipped]
RE: Query Optimization Question
Yes, a ranged query should respond faster than a negation. In some cases you can seriously improve query performance for a negation query if you split it into two range queries unioned together. Here is a pseudo example: This query should be slow due to the table scan it takes to test the condition: SELECT ... FROM ... WHERE indexfield != 16 This query should be noticeably faster due to using ranged queries (partial index ranged matches): ( SELECT ... FROM ... WHERE indexfield < 16 ) UNION ( SELECT ... FROM ... WHERE indexfield > 16 ) Of course, if you have to do a table scan ANYWAY (because a value you have in a constraint is not in an index) then this won't help. This optimization is all about getting the engine to use an index whenever it can instead of performing a table scan. Of course, that is not always possible. Shawn Green Database Administrator Unimin Corporation - Spruce Pine Robert DiFalco [EMAIL PROTECTED] wrote on 03/13/2006 10:48:29 AM: Shawn, Any performance gains for specifying type > 0 rather than type <> 0? R. [quoted message snipped]
RE: Query Optimization Question
Interesting, that seems like an optimization the query optimizer could do itself when it sees a operator on a indexed numeric. From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Monday, March 13, 2006 8:01 AM To: Robert DiFalco Cc: mysql@lists.mysql.com Subject: RE: Query Optimization Question Yes, a ranged query should respond faster than a negation. In some cases you can seriously improve query performance for a negation query if you split it into two range queries unioned together. Here is a pseudo example: This query should be slow due to the table scan it takes to test the condition: SELECT ... FROM ... WHERE indexfield 16 This query should be noticeably faster due to using ranged queries (partial index ranged matches): (SELECT ... FROM ... WHERE indexfield 16 ) UNION ( SELECT ... FROM ... WHERE indexfield 16 ) Of course, if you have to do a table scan ANYWAY (because a value you have in a constraint is not in an index) then this won't help. This optimization is all about getting the engine to use an index whenever it can instead of performing a table scan. Of course, that is not always possible. Shawn Green Database Administrator Unimin Corporation - Spruce Pine Robert DiFalco [EMAIL PROTECTED] wrote on 03/13/2006 10:48:29 AM: Shawn, Any performance gains for specifying type 0 than type 0 ? R. From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Monday, March 13, 2006 6:37 AM To: Robert DiFalco Cc: mysql@lists.mysql.com Subject: Re: Query Optimization Question Robert DiFalco [EMAIL PROTECTED] wrote on 03/11/2006 12:43:43 PM: In a previous database engine I was using an IN was more optimal than a . 
So, for example: SELECT * FROM table WHERE table.type IN (1,2,3); Where the possible values of type are 0-3, was appreciably faster than: SELECT * FROM table WHERE table.type <> 0; I've been playing with the Query Browser and checking out the optimization documents and haven't been able to make a clear call on whether or not this is also the case with MySQL/InnoDB. TIA, R. YES, YES, YES! This is definitely an optimization. When you say IN or =, you are asking for matching values. Matches can come from indexes. When you say <> or NOT IN, you are asking for everything BUT matches. In order to evaluate a negative, the database engine (and this is usually true regardless of database server) almost always performs a full table scan to test every row to make sure it is either <> or NOT IN. At the very best, it has to perform a full index scan, which is still less efficient than ranged or values-based lookups. It's when you get into the situation where you are matching against dozens of IN-clause items that you may run into slowdowns again. Until you reach 2 or 3 dozen terms (depending on your hardware) you should be faster with an IN comparison than with a <> or a NOT IN comparison. An optimization to search for all BUT a term or two is to create a temporary table of all of your terms and delete the exact ones you want to exclude. Put an index on your temp table, then JOIN that back into your query again (replacing the huge IN clause). The database will match index to index and things will get fast again. This technique can scale up to some really big queries. Always try to code for the affirmative tests. Your users will thank you. Shawn Green Database Administrator Unimin Corporation - Spruce Pine
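Shawn's negation-to-ranges rewrite can be checked for equivalence. A small sketch using Python's sqlite3 (table and data are made up; whether the UNION form is actually faster depends on the engine and version). Note UNION ALL is used here so duplicate values survive; Shawn's plain UNION would deduplicate:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (indexfield INTEGER)")
conn.execute("CREATE INDEX idx ON t (indexfield)")
conn.executemany("INSERT INTO t VALUES (?)", [(i % 32,) for i in range(1000)])

# Negation form: the engine must consider every row.
neg = conn.execute(
    "SELECT indexfield FROM t WHERE indexfield <> 16 ORDER BY indexfield"
).fetchall()

# Equivalent pair of ranged queries, each able to walk the index.
ranged = conn.execute(
    "SELECT indexfield FROM t WHERE indexfield < 16"
    " UNION ALL "
    "SELECT indexfield FROM t WHERE indexfield > 16"
    " ORDER BY indexfield"
).fetchall()
```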
Re: Query Optimization Question
Robert DiFalco wrote: In a previous database engine I was using, an IN was more optimal than a <>. So, for example: SELECT * FROM table WHERE table.type IN (1,2,3); Where the possible values of type are 0-3, was appreciably faster than: SELECT * FROM table WHERE table.type <> 0; I've been playing with the Query Browser and checking out the optimization documents and haven't been able to make a clear call on whether or not this is also the case with MySQL/InnoDB. Mladen Adamovic wrote: IN should be faster implemented with both hash tables and BTREEs, so nowadays it should also be faster than <>, as all MySQL implementations AFAIK use those well-known data structures for indexes. [EMAIL PROTECTED] wrote: YES, YES, YES! This is definitely an optimization. When you say IN or =, you are asking for matching values. Matches can come from indexes. When you say <> or NOT IN, you are asking for everything BUT matches. In order to evaluate a negative, the database engine (and this is usually true regardless of database server) almost always performs a full table scan to test every row to make sure it is either <> or NOT IN. At the very best, it has to perform a full index scan, which is still less efficient than ranged or values-based lookups. It's when you get into the situation where you are matching against dozens of IN-clause items that you may run into slowdowns again. Until you reach 2 or 3 dozen terms (depending on your hardware) you should be faster with an IN comparison than with a <> or a NOT IN comparison. An optimization to search for all BUT a term or two is to create a temporary table of all of your terms and delete the exact ones you want to exclude. Put an index on your temp table then JOIN that back into your query again (replacing the huge IN clause). The database will match index to index and things will get fast again. This technique can scale up to some really big queries. Always try to code for the affirmative tests. Your users will thank you.
Implicit in Mladen and Shawn's answers, but never actually mentioned in the original post, is the presence of an index on the type column. This is probably obvious to all concerned, but I mention it for completeness: without an index on type, there is no difference between type IN (1,2,3) and type != 0. That is, the question is not whether IN is better than !=, but rather which will allow the optimizer to make good use of the index on type. I find mysql's optimizer is pretty good with well-written queries, as long as subqueries aren't involved, so my initial reaction was to expect no difference. After all, as the optimizer considers the WHERE conditions and the available indexes, it is certainly possible, at least theoretically, for it to notice that type IN (1,2,3) and type != 0 are identical conditions. That is, a clever optimizer could treat them identically. Shawn's and Mladen's answers gave me pause, however, and aroused my curiosity, so I decided to test:

mysql> SELECT VERSION();
+-----------+
| VERSION() |
+-----------+
| 4.1.15    |
+-----------+

mysql> SELECT cat, COUNT(*) FROM inits GROUP BY cat;
+------+----------+
| cat  | COUNT(*) |
+------+----------+
|    0 |    44224 |
|    1 |     1919 |
|    2 |     1931 |
|    3 |     1926 |
+------+----------+

mysql> EXPLAIN SELECT * FROM inits WHERE cat IN (1,2,3) \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: inits
         type: range
possible_keys: cat_idx
          key: cat_idx
      key_len: 5
          ref: NULL
         rows: 8117
        Extra: Using where

mysql> EXPLAIN SELECT * FROM inits WHERE cat != 0 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: inits
         type: range
possible_keys: cat_idx
          key: cat_idx
      key_len: 5
          ref: NULL
         rows: 8120
        Extra: Using where

As you can see, the optimizer plans to use the index in both cases, examining 8117 rows in the IN case and 8120 rows in the != case, to get 5776 matching rows out of 50,000 (about 11.55%). On the other hand, it makes a difference how many rows will match. What is the distribution of values of type?
If the number of matching rows is more than about 30% of the table, the optimizer won't use an available index in any case. For example,

mysql> EXPLAIN SELECT * FROM inits WHERE cat IN (0,1,2) \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: inits
         type: ALL
possible_keys: cat_idx
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 5
        Extra: Using where

mysql> EXPLAIN SELECT * FROM inits WHERE cat != 3 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table:
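The same experiment can be reproduced on any engine that exposes its query plan. A sketch with Python's sqlite3 and its EXPLAIN QUERY PLAN (table name and data distribution invented; SQLite's optimizer differs from MySQL 4.1's, so only the selective IN case is asserted to use the index here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inits (cat INTEGER, val TEXT)")
conn.execute("CREATE INDEX cat_idx ON inits (cat)")
# Skewed distribution: most rows in category 0, as in the post.
rows = [(0, "x")] * 100 + [(1, "x")] * 10 + [(2, "x")] * 10 + [(3, "x")] * 10
conn.executemany("INSERT INTO inits VALUES (?, ?)", rows)

def plan(sql):
    # Concatenate the 'detail' column of each plan row.
    return " ".join(r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

in_plan  = plan("SELECT * FROM inits WHERE cat IN (1, 2, 3)")
neq_plan = plan("SELECT * FROM inits WHERE cat != 0")

in_rows  = conn.execute("SELECT * FROM inits WHERE cat IN (1, 2, 3)").fetchall()
neq_rows = conn.execute("SELECT * FROM inits WHERE cat != 0").fetchall()
```

The two predicates are logically identical here, so the result sets must match even when the plans differ.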
Query Optimization Question
In a previous database engine I was using, an IN was more optimal than a <>. So, for example: SELECT * FROM table WHERE table.type IN (1,2,3); Where the possible values of type are 0-3, was appreciably faster than: SELECT * FROM table WHERE table.type <> 0; I've been playing with the Query Browser and checking out the optimization documents and haven't been able to make a clear call on whether or not this is also the case with MySQL/InnoDB. TIA, R. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Query Optimization Question
Robert DiFalco wrote: In a previous database engine I was using, an IN was more optimal than a <>. So, for example: SELECT * FROM table WHERE table.type IN (1,2,3); Where the possible values of type are 0-3, was appreciably faster than: SELECT * FROM table WHERE table.type <> 0; IN should be faster implemented with both hash tables and BTREEs, so nowadays it should also be faster than <>, as all MySQL implementations AFAIK use those well-known data structures for indexes. I've been playing with the Query Browser and checking out the optimization documents and haven't been able to make a clear call on whether or not this is also the case with MySQL/InnoDB. TIA, R. -- Mladen Adamovic http://home.blic.net/adamm http://www.shortopedia.com http://www.froola.com
Help with Query Optimization
Hi All, We use Vbulletin for our message boards, and I have a query which takes approximately 1.2 - 1.5 seconds to execute, on a table ('thread') having around 130,000 rows. It is as follows: SELECT threadid FROM thread AS thread LEFT JOIN deletionlog AS deletionlog ON ( thread.threadid = deletionlog.primaryid AND TYPE = 'thread' ) WHERE forumid =98 AND sticky =0 AND visible =1 AND deletionlog.primaryid IS NULL ORDER BY sticky DESC , lastpost DESC LIMIT 0 , 15 There is an index on the field 'lastpost' ( Type: INT(10) ), and EXPLAIN SELECT gives me this:

id  select_type  table        type    possible_keys  key       key_len  ref                             rows    Extra
1   SIMPLE       thread       index   NULL           lastpost  4        NULL                            112783  Using where
1   SIMPLE       deletionlog  eq_ref  PRIMARY        PRIMARY   5        foxbox16.thread.threadid,const  1       Using where; Using index; Not exists

I am no SQL guru, so can somebody kindly help me optimize this query so that it could run faster, as during peak times this slows the DB down a lot. Kishore Jalleda
Re: Help with Query Optimization
Kishore, We use Vbulletin for our message boards, and I have a query which takes approximately 1.2 - 1.5 seconds to execute, on a table ('thread') having around 130,000 rows, it is as follows SELECT threadid FROM thread LEFT JOIN deletionlog ON ( thread.threadid = deletionlog.primaryid AND TYPE = 'thread' ) WHERE forumid = 98 AND sticky = 0 AND visible = 1 AND deletionlog.primaryid IS NULL ORDER BY sticky DESC , lastpost DESC LIMIT 0 , 15 Your formatting of EXPLAIN output is a bit hard to read, but did you try indexes on (sticky,lastpost) and (forumid,sticky,visible)? And depending on how big deletionlog is, you might move ... AND deletionlog.primaryid IS NULL from the WHERE clause to the ON clause, to reduce the number of rows the query engine has to write to its temp table. PB
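PB's composite-index suggestion can be sketched like this, using Python's sqlite3 as a stand-in engine (schema reduced to the relevant columns; the index name forum_idx is invented). The point is that an index on (forumid, sticky, visible, lastpost) can satisfy both the equality filters and the ORDER BY, so no separate sort pass is needed:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE thread (
    threadid INTEGER PRIMARY KEY,
    forumid INTEGER, sticky INTEGER, visible INTEGER, lastpost INTEGER)""")
conn.executemany(
    "INSERT INTO thread (forumid, sticky, visible, lastpost) VALUES (?, ?, ?, ?)",
    [(i % 100, 0, 1, i) for i in range(5000)],
)
# Composite index: equality columns first, then the sort column.
conn.execute(
    "CREATE INDEX forum_idx ON thread (forumid, sticky, visible, lastpost)")

sql = """SELECT threadid FROM thread
         WHERE forumid = 98 AND sticky = 0 AND visible = 1
         ORDER BY lastpost DESC LIMIT 15"""
# The plan should walk forum_idx and need no separate sort step.
plan = " ".join(r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))
rows = conn.execute(sql).fetchall()
```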
Help with query optimization query SUM
Hi: I'm developing a simple shopping cart. I have these two tables: carro_de_compras -- IDU int(11) NOT NULL IDA int(11) NOT NULL CantidadDeArticulos int(11) NOT NULL os_articulo -- IDA int(11) NOT NULL auto_increment, IDC int(11) NOT NULL default '0', ANombre varchar(200) NOT NULL default '', ADescripcion text, ACantidad int(11) NOT NULL default '0', AImagen varchar(50) default NULL, IDU int(11) NOT NULL default '0', APrecio float(6,2) default NULL, KEY AI_IDA (`IDA`) Before asking let me explain some things. As you can see, I have the same field IDU in both tables. In the first (table carro_de_compras) it is the ID of the user logged on to the ecommerce system; the second is the ID of the user who uploaded articles for sale. Something like eBay, in which you can sell and buy at any time. Now I arrive at the point where I need to optimize queries: PHP Code:

$sql = mysql_query("SELECT * FROM carro_de_compras");
$sresultado = mysql_fetch_assoc($sql);
$query = mysql_query("SELECT * FROM os_articulo WHERE (IDA='" . $sresultado['IDA'] . "')");
while ($record = mysql_fetch_assoc($query)) {
    $productos[] = $record;
}

The question for this problem is: does any way exist to optimize this and leave it in only one query? I read in the MySQL doc about it and found something about JOIN, but I can't understand how it works. Maybe because I'm Cuban and don't understand English as well as I want. The other question is how to add some value to a field. For example: $sql = mysql_query("UPDATE table SET value=value+1 WHERE id='1'"); To do this query I do this:

$sql = mysql_query("SELECT value FROM table WHERE id='1'");
$result = mysql_query($sql);
$update = mysql_query("UPDATE table SET (value='" . $result['value'] . "' + 1) WHERE id='1'");

So is it possible to optimize this query? Regards ReynierPM 4th year, Computer Science Engineering Registered Linux user: #310201 * The superhero programmer learns by sharing his knowledge. He is the reference for his colleagues.
Everyone comes to ask him and he secretly encourages it, because that is how he acquires his legendary wisdom: listening and helping others...
Re: Help with query optimization query SUM
Dear Reynier, You can use a JOIN on both tables. The JOIN has to run on the same field in each, i.e. IDA. SELECT * FROM carro_de_compras LEFT JOIN os_articulo ON carro_de_compras.IDA = os_articulo.IDA This query returns all your users with their articles, if any, and you can iterate over it. But one note: use an INDEX on both tables. You may encounter problems when your rows grow. About the UPDATE query: UPDATE table SET value=value+1 WHERE id='1' is enough; use that.
-- Sincerely, Hadi Rastgou
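Hadi's two suggestions can be seen together in a runnable sketch, here with Python's sqlite3 standing in for the MySQL/PHP pair (data invented, schema trimmed to a few columns):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE carro_de_compras (IDU INTEGER, IDA INTEGER,
                                   CantidadDeArticulos INTEGER);
    CREATE TABLE os_articulo (IDA INTEGER PRIMARY KEY,
                              ANombre TEXT, APrecio REAL);
    INSERT INTO os_articulo VALUES
        (1, 'Lamp', 9.99), (2, 'Chair', 25.00), (3, 'Desk', 80.00);
    INSERT INTO carro_de_compras VALUES (7, 1, 2), (7, 3, 1);
""")

# One JOIN replaces the SELECT-then-loop-of-SELECTs in the PHP above.
productos = conn.execute("""
    SELECT c.IDU, a.ANombre, c.CantidadDeArticulos, a.APrecio
    FROM carro_de_compras c
    LEFT JOIN os_articulo a ON c.IDA = a.IDA
""").fetchall()

# The increment is a single atomic statement -- no read-then-write needed.
conn.execute("UPDATE carro_de_compras"
             " SET CantidadDeArticulos = CantidadDeArticulos + 1"
             " WHERE IDA = 1")
qty = conn.execute("SELECT CantidadDeArticulos FROM carro_de_compras"
                   " WHERE IDA = 1").fetchone()[0]
```

The single UPDATE also avoids the race condition of the SELECT-then-UPDATE version: two concurrent clients can't overwrite each other's increment.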
Re: Query optimization
Hi Tripp, all! Tripp Bishop wrote: [[...]] Why would adding extra fields to the query slow the query down so much? In either case it looks like it's using the same index. If a query is to return only fields contained in the index, it can be executed as an index-only query, and the true data need not be accessed. The moment you ask for a field not contained in the index, then for each index entry matching the search condition the base data must be accessed to get that field. This can add very significant cost. HTH, Joerg PS: In my reader, it looked like your original question hijacked another thread: Question regarding running 'mysql' from a bash script. :-( -- Joerg Bruehe, Senior Production Engineer MySQL AB, www.mysql.com
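Joerg's point about index-only execution is easy to see on an engine that labels covering indexes in its plan output. A Python sqlite3 sketch (column names loosely mirror Tripp's schema; remarks stands in for any field not in the index):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE listing (
    group_id INTEGER, listing_id TEXT, price REAL, remarks TEXT)""")
conn.executemany(
    "INSERT INTO listing VALUES (2, ?, ?, ?)",
    [(f"L{i}", 1000.0 * i, "x" * 50) for i in range(1000)],
)
conn.execute("CREATE INDEX idx_search ON listing (group_id, price)")

def plan(sql):
    # Concatenate the 'detail' column of each plan row.
    return " ".join(r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Every column the query touches is inside idx_search -> index-only scan.
covered = plan("SELECT price FROM listing WHERE group_id = 2 AND price > 500")
# 'remarks' is not in the index -> each index hit costs an extra row fetch.
uncovered = plan("SELECT remarks FROM listing WHERE group_id = 2 AND price > 500")
```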
Mysql Query Optimization?
I have a query that's taking about 0.3 seconds to run and it should be faster, I think. I've read all about query optimization and forcing indexes and I'm not entirely sure why I'm having this problem. Google finds all kinds of basic information like using indexes and not doing math on the query fields. Stupid stuff I'm not doing. I have two tables, EmailAddr and CustInfo, that look like this:

+---------------+-----------------------+------+-----+---------+
| Field         | Type                  | Null | Key | Default |
+---------------+-----------------------+------+-----+---------+
| EmailID       | mediumint(8) unsigned |      | PRI | NULL    |
| EmailUsername | varchar(25)           | YES  | MUL | NULL    |
| EmailPassword | varchar(50)           | YES  |     | NULL    |
| CustID        | mediumint(8) unsigned |      | MUL | 0       |
+---------------+-----------------------+------+-----+---------+

And

+-------------+--------------+------+-----+---------+
| Field       | Type         | Null | Key | Default |
+-------------+--------------+------+-----+---------+
| CustID      | mediumint(8) |      | PRI | 0       |
| CustAddress | varchar(150) | YES  |     | NULL    |
| CustCity    | varchar(50)  | YES  |     | NULL    |
| CustState   | varchar(50)  | YES  |     | NULL    |
| etc...      |              |      |     |         |
+-------------+--------------+------+-----+---------+

When I select all the records from either table it's 0.02 seconds. There are about 10,000 records in each table. If I try and join the Email table and CustInfo table on CustID (which is an indexed field in both tables) the query time jumps to 0.3 seconds. SELECT EmailID, e.CustID FROM EmailAddr e, CustInfo c WHERE e.CustID = c.CustID; If I tell it to use the primary index it's fast (0.02 seconds): SELECT EmailID, e.CustID FROM EmailAddr e USE INDEX (Primary), CustInfo c WHERE e.CustID = c.CustID; #1) Why can't it find that relationship itself? #2) Why does telling it to use the Primary key for EmailAddr help at all?! It's not even a query/join field. Scott
Re: Mysql Query Optimization?
Hello. Please provide the output of EXPLAIN for your queries. The performance should improve if you make e.CustID and c.CustID the same type (one of them is unsigned). Have you run ANALYZE TABLE on your tables?
-- Gleb Paharenko [EMAIL PROTECTED] MySQL AB / Ensita.NET, www.mysql.com
Re: Mysql Query Optimization?
It looks like I spoke too soon. That field fix sped up the straight join, but not the more complex query I had; it's still 0.3 seconds. SELECT EmailID, EmailUsername, d.DomainName, e.CustID FROM EmailAddr e, Domain d, CustInfo c WHERE e.CustID = c.CustID AND e.DomainID = d.DomainID AND (EmailUsername LIKE '%thompson%') LIMIT 10; This explains out like this. Slow (0.3 seconds):

+-------+--------+-----------------+---------+---------+------------+------+-------------+
| table | type   | possible_keys   | key     | key_len | ref        | rows | Extra       |
+-------+--------+-----------------+---------+---------+------------+------+-------------+
| c     | index  | PRIMARY         | PRIMARY | 3       | [NULL]     | 9371 | Using index |
| e     | ref    | DomainID,CustID | CustID  | 3       | c.CustID   |    1 | Using where |
| d     | eq_ref | PRIMARY         | PRIMARY | 2       | e.DomainID |    1 |             |
+-------+--------+-----------------+---------+---------+------------+------+-------------+

However if I force the index on EmailAddr: SELECT EmailID, EmailUsername, d.DomainName, e.CustID FROM EmailAddr e USE INDEX(Primary), Domain d, CustInfo c WHERE e.CustID = c.CustID AND e.DomainID = d.DomainID AND (EmailUsername LIKE '%thompson%') LIMIT 10; It's fast (0.02 seconds). So it's like MySQL isn't putting the tables in the right order. Does it have to do with the order I put the fields/tables in the query? The best way would be for mysql to do a full table scan for anything that matches that LIKE query (which should return 8 records) and then join on the CustID, as opposed to doing the join first, joining 10,000 records, and then grepping all those. Fast (0.02 seconds):

+-------+--------+---------------+---------+---------+------------+-------+-------------+
| table | type   | possible_keys | key     | key_len | ref        | rows  | Extra       |
+-------+--------+---------------+---------+---------+------------+-------+-------------+
| e     | ALL    | [NULL]        | [NULL]  | [NULL]  | [NULL]     | 12413 | Using where |
| d     | eq_ref | PRIMARY       | PRIMARY | 2       | e.DomainID |     1 |             |
| c     | eq_ref | PRIMARY       | PRIMARY | 3       | e.CustID   |     1 | Using index |
+-------+--------+---------------+---------+---------+------------+-------+-------------+

Scott Gleb Paharenko wrote: Hello. Please provide the output of EXPLAIN for your queries. The performance should improve if you make e.CustID and c.CustID the same type (one of them is unsigned). Have you run ANALYZE TABLE on your tables?
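One way to test the join-order theory in MySQL is SELECT STRAIGHT_JOIN, which forces tables to be joined in the order listed. The sketch below shows the same idea in Python's sqlite3, where (as a documented quirk) CROSS JOIN pins the join order; the schema is a cut-down, invented version of Scott's tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EmailAddr (EmailID INTEGER PRIMARY KEY,
                            EmailUsername TEXT, CustID INTEGER);
    CREATE TABLE CustInfo (CustID INTEGER PRIMARY KEY, CustCity TEXT);
    CREATE INDEX cust_idx ON EmailAddr (CustID);
""")
conn.executemany("INSERT INTO CustInfo VALUES (?, 'Portland')",
                 [(i,) for i in range(500)])
conn.executemany(
    "INSERT INTO EmailAddr (EmailUsername, CustID) VALUES (?, ?)",
    [(f"user{i}", i % 500) for i in range(500)],
)

# CROSS JOIN pins the order: scan EmailAddr (where the selective LIKE
# filter lives) first, then probe CustInfo by its primary key.
sql = """SELECT EmailAddr.EmailID, EmailAddr.CustID
         FROM EmailAddr CROSS JOIN CustInfo
         WHERE EmailAddr.CustID = CustInfo.CustID
           AND EmailAddr.EmailUsername LIKE '%thompson%'"""
plan = " | ".join(r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))
rows = conn.execute(sql).fetchall()
```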
Re: Mysql Query Optimization?
Oops, forgot to reply to the list. Gleb:

+-------+-------+---------------+---------+---------+----------+------+-------------+
| table | type  | possible_keys | key     | key_len | ref      | rows | Extra       |
+-------+-------+---------------+---------+---------+----------+------+-------------+
| c     | index | PRIMARY       | PRIMARY | 3       | [NULL]   | 9371 | Using index |
| e     | ref   | CustID        | CustID  | 3       | c.CustID |    1 |             |
+-------+-------+---------------+---------+---------+----------+------+-------------+

Hopefully that won't come across too funky for you. Anyway, I ended up changing the CustID field in the EmailAddr table to NOT be unsigned (so both the field types are the same) and it came back immediately. So it appears from my preliminary tests that the problem was that the field types didn't match. What a weird problem; I never would have thought of that. Good catch! Scott Gleb Paharenko wrote: Hello. Please provide the output of EXPLAIN for your queries. The performance should improve if you make e.CustID and c.CustID the same type (one of them is unsigned). Have you run ANALYZE TABLE on your tables?
Query optimization
Howdy all, I've got a question regarding optimizing a query. I've got two tables: listing and listing_city_xref. listing is a pretty wide table with many fields that can be queried by users. There is a subset of fields that are always in queries of this table, but there are many additional fields that may or may not be included. The primary key consists of two fields, group_id int and listing_id varchar(30). The listing_city_xref table is very simple. It consists of 3 fields: group_id, listing_id, city_id. All three fields form the PK. There is an index on group_id and listing_id as well as a foreign key that points back to the listing table consisting of these two fields. I created an index on the listing table that contains the group_id, listing_id, and the other fields that all searches use. When I run a query that joins the two tables on the group_id and listing_id, and I search only fields that are included in the index that I mentioned, the query is really fast. If I add another field that's not included in the index, the query slows down by a factor of 100. I can't simply index every field in the listing table, so what can I do? I don't understand why adding extra criteria to the query destroys its performance. Any ideas? Thanks, Tripp
Re: Query optimization
It might be a good idea if you could post the results of an EXPLAIN SELECT ... for a fast query and a slow query along with their actual SQL statements. James Harvard At 11:37 am -0800 12/1/06, Tripp Bishop wrote: I've got a question regarding optimizing a query.
Re: Query optimization
Jim, here's an example query:

SELECT COUNT(listing.listing_id) AS listing_count
FROM listing_city_xref
INNER JOIN listing ON listing.listing_id = listing_city_xref.listing_id
  AND listing.group_id = listing_city_xref.group_id
WHERE listing_city_xref.group_id = 2
  AND listing_city_xref.city_id IN (83,88)
  AND listing.price BETWEEN 189000.00 AND 23.00
  AND tot_finished_sqft >= 1100
  AND sqft >= 1000
  AND baths >= 1
  AND bedrooms >= 3
  AND garage_spaces >= 1
  AND property_type IN ('RES');

I created an index on the following fields in the listing table: group_id, price, tot_finished_sqft, baths, bedrooms, garage_spaces, property_type. Here's the result of the explain on the query above:

| 1 | SIMPLE | listing               | ref | PRIMARY, idx_search                | idx_search | 4  | const                              | 8708 | Using where              |
| 1 | SIMPLE | listing_wiz_city_xref | ref | PRIMARY, idx_listing, idx_wiz_city | PRIMARY    | 36 | const, ahf_test.listing.listing_id |    1 | Using where; Using index |

This query took 12.91 seconds. Running it multiple times produces similar run times. Now, if I remove the sqft >= 1100 condition from the where clause I get the following explain output:

| 1 | SIMPLE | listing               | range | PRIMARY, idx_search                | idx_search | 15 | NULL                               | 8688 | Using where; Using index |
| 1 | SIMPLE | listing_wiz_city_xref | ref   | PRIMARY, idx_listing, idx_wiz_city | PRIMARY    | 36 | const, ahf_test.listing.listing_id |    1 | Using where; Using index |

The query executes in 0.09 seconds. Running it multiple times produces similar runtimes. One interesting difference between the explain outputs is that in the second case the ref is NULL. In the first scenario it is const. Why would adding extra fields to the query slow the query down so much? In either case it looks like it's using the same index. Thanks, Tripp
Re: Doubt about query optimization
Eric, Can you send us the actual SHOW INDEXES FROM table and EXPLAIN output that isn't shortened? Thank you for answering my question. Actually, the real query and tables are much more complex than the data in the previous message. I just tried to condense the information so you could better understand the trouble. I think I found what's happening. A SELECT with WHERE city = 1 returns more rows than a full table scan of the table front (the real estates that appear on the front page). So it seems MySQL chose the second option, since it has fewer rows to process. Thanks, Ronan
Doubt about query optimization
Hello,

I'm using MySQL-4.1.7, and I have a query where, even with a WHERE clause, a specific table doesn't seem to be optimized properly. I have four tables:

real_state: cod, name, city, ag_cod, sell
agency: cod, name, ag_status
front: cod, rs_cod, ag_cod
photo: cod, rs_cod

These tables have the following indexes:

real_state: cod (pri), city, ag_cod
agency: cod, name
front: cod, rs_cod, ag_cod
photo: cod, rs_cod

When I EXPLAIN the query:

EXPLAIN SELECT front.rs_cod, photo.cod, real_state.descr
FROM real_state, agency, front, photo
WHERE real_state.city = 1
  AND real_state.sell = 1
  AND front.rs_cod = real_state.cod
  AND photo.rs_cod = front.rs_cod
  AND agency.cod = real_state.ag_cod
  AND ag_status = 'A'

it shows me (in short):

table       key     rows
==========  ======  ====
front       rs_cod  2085
real_state  cod     1
agency      cod     1
photo       rs_cod  1

But the trouble is: if I execute

SELECT COUNT(*) FROM real_state, front
WHERE real_state.city = 1 AND real_state.cod = front.rs_cod

it returns 271. So why isn't the index front.rs_cod being used?

Any help would be appreciated. Thank you,
Ronan
Re: Doubt about query optimization
Can you send us the actual SHOW INDEXES FROM table and EXPLAIN output that isn't shortened?

Ronan Lucio wrote:
> Hello, I'm using MySQL-4.1.7, and I have a query where, even with a WHERE clause, a specific table doesn't seem to be optimized properly.
> [snip - full question quoted above]
Group By query optimization
Hi All,

I have a MySQL query which takes 8 seconds to run on a dual Xeon 2.4, 3 GB RAM box:

SELECT gamename, MAX(score) AS score, COUNT(valid=1) AS played
FROM gamesessions AS gamesessions
WHERE valid=1 AND sessiontype IN (1,2)
GROUP BY gamename;

EXPLAIN SELECT gives:

table         type  possible_keys  key   key_len  ref   rows    Extra
gamesessions  ALL   NULL           NULL  NULL     NULL  915522  Using where; Using temporary; Using filesort

How can I optimize the query?

Thanks,
Kishore
Re: Group By query optimization
Hi,

Put indexes on 'valid' and 'sessiontype' and all will be good.

Cheers,
Andrew

On 15/7/05 18:26, Kishore Jalleda [EMAIL PROTECTED] wrote:
> Hi All, I have a MySQL query which takes 8 seconds to run on a dual Xeon 2.4, 3 GB RAM box...
> [snip - full question quoted above]
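Concretely, that advice would look something like the following (index names are made up here; the composite index is an extra option beyond what the reply suggests, and can sometimes let the GROUP BY avoid the "Using temporary; Using filesort" shown in the EXPLAIN):

```sql
-- Single-column indexes, as suggested:
ALTER TABLE gamesessions ADD INDEX idx_valid (valid);
ALTER TABLE gamesessions ADD INDEX idx_sessiontype (sessiontype);

-- One possible alternative: a composite index with the WHERE columns
-- first and the GROUP BY column last, so grouping can follow index order:
ALTER TABLE gamesessions
  ADD INDEX idx_valid_type_game (valid, sessiontype, gamename);
```

Re-running EXPLAIN after adding each index shows whether the optimizer actually picks it up for this query.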
Select Query Optimization - 1 minute long :(
The following query is in desperate need of optimization. Any gurus out there who can share some insights, I'd greatly appreciate it.

I have a request table, from which I'm pulling all records. Each request record has a part number, and for each requested part number, I'd like to query the inventory table for all matching inventory items for that part number and calculate the potential value of the inventory. Each inventory record has 4 price fields, and it is the HIGHEST value of these fields that is used in the calculation.

As you can tell by running the query, it is incredibly expensive. We have potentially millions of inventory records and 10,000+ requests, so the query is going to take several minutes to complete. Also, I found that due to the joining on each record(?), I couldn't implement a LIMIT clause to save time. Whether or not a LIMIT clause is tacked onto the end of the query, it completes the entire calculation process for both tables, and THEN returns the limited number of records. This of course saves me no time. :(

Any help is greatly appreciated, thanks. P.S. All pertinent fields have already been indexed. This reduced the query time by half, but half of an eternity is still not that impressive. I'm also aware that the use of LIKE in my JOIN is expensive, but after replacing it with an =, I achieved only modest performance gains.

SELECT r.id, r.company, r.dateSent, r.fullName, r.phone, r.fax, r.email,
       r.address1, r.address2, r.city, r.province, r.country, r.comments,
       r.partNumber, r.description, r.dateCode, r.qty, r.targetPrice,
       r.manufacturer, r.expiryDate, r.companyType, r.yearEstablished,
       r.url, r.languages,
       GREATEST(i.distySellCost, i.originalCost, i.unitCost, i.unitSellCost) * r.qty AS 'highestValue',
       COUNT(i.id) AS 'matches',
       SUM(i.qty) AS 'qtyAvailable'
FROM request r
LEFT JOIN inventory i
       ON (i.MPN LIKE CONCAT(r.partNumber, '%')
           OR i.MPNClean LIKE CONCAT(r.partNumber, '%'))
      AND i.status = 1
WHERE r.deleted = 0
GROUP BY r.id;

Any help is appreciated.
Thanks.
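One pattern worth trying on a query like the one above: an OR between conditions on two different columns inside a JOIN usually prevents the optimizer from using an index on either column. Splitting it into a UNION of two single-condition joins lets each branch be matched against its own index. A hedged sketch using the column names from the post (it assumes indexes exist on MPN and MPNClean, and the GREATEST/COUNT/SUM aggregation would then be applied over the union result):

```sql
SELECT r.id, i.id AS inventory_id, i.qty
  FROM request r
  JOIN inventory i ON i.MPN LIKE CONCAT(r.partNumber, '%')
 WHERE r.deleted = 0 AND i.status = 1
UNION
SELECT r.id, i.id, i.qty
  FROM request r
  JOIN inventory i ON i.MPNClean LIKE CONCAT(r.partNumber, '%')
 WHERE r.deleted = 0 AND i.status = 1;
```

Even then, a LIKE pattern built from another table's column limits what the optimizer can do per joined row, so this is only a partial fix; checking EXPLAIN on real data is essential before relying on it.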
Query optimization help needed
I asked for help with a version of this query a few months ago, and subsequently made some improvements to it, and also fooled around with other parts of my app that were in need of tuning. I've since done some more extensive benchmarking and realized that this query really is somewhat slow. Even though the data set is rather small and everything is (I think) properly indexed and the joins are sensible, I can't seem to get rid of the "Using temporary" and "Using filesort" in my EXPLAIN. I'd be grateful for any suggestions for improving this.

Here's the query (obviously I run it with different values for subject.name and different LIMIT values, but this is representative):

SELECT citation.*,
       DATE_FORMAT(citation.last_modified, '%e %M, %Y') AS last_modified
FROM citation, subject, citation_subject
WHERE subject.name = 'History'
  AND citation_subject.subject_id = subject.id
  AND citation_subject.citation_id = citation.id
  AND citation.deleted = 0
ORDER BY citation.stripped_word,
         FIELD(citation.part_of_speech, 'NOUN', 'ADJECTIVE', 'ADVERB', 'VERB'),
         citation.id
LIMIT 150, 50

and EXPLAIN gives me this:

*** 1. row ***
table: subject
type: ref
possible_keys: PRIMARY,name
key: name
key_len: 50
ref: const
rows: 1
Extra: Using where; Using temporary; Using filesort
*** 2. row ***
table: citation_subject
type: ref
possible_keys: citation_id,subject_id
key: subject_id
key_len: 4
ref: subject.id
rows: 169
Extra: Using index
*** 3. row ***
table: citation
type: eq_ref
possible_keys: PRIMARY,deleted
key: PRIMARY
key_len: 4
ref: citation_subject.citation_id
rows: 1
Extra: Using where

Finally, here are the three tables involved.
I've trimmed out the irrelevant columns:

CREATE TABLE `citation` (
  `id` int(11) NOT NULL auto_increment,
  `word` varchar(50) NOT NULL default '',
  `stripped_word` varchar(50) default NULL,
  `part_of_speech` enum('NOUN','VERB','ADJECTIVE','ADVERB') NOT NULL default 'NOUN',
  `last_modified` timestamp(14) NOT NULL,
  `deleted` datetime default '0000-00-00 00:00:00',
  PRIMARY KEY (`id`),
  KEY `deleted` (`deleted`),
  KEY `word` (`word`),
  KEY `stripped_word` (`stripped_word`)
) TYPE=MyISAM

CREATE TABLE `citation_subject` (
  `id` int(11) NOT NULL auto_increment,
  `citation_id` int(11) NOT NULL default '0',
  `subject_id` int(11) NOT NULL default '0',
  PRIMARY KEY (`id`),
  KEY `citation_id` (`citation_id`,`subject_id`),
  KEY `subject_id` (`subject_id`,`citation_id`)
) TYPE=MyISAM

CREATE TABLE `subject` (
  `id` int(11) NOT NULL auto_increment,
  `name` varchar(50) NOT NULL default '',
  `deleted` datetime default NULL,
  PRIMARY KEY (`id`),
  KEY `name` (`name`)
) TYPE=MyISAM

Thank you for any suggestions.

Jesse Sheidlower
Doubt about query optimization
Hello,

Supposing I have 2 tables:

product
=======
- id
- name

price
=====
- id
- product_id
- sell_price
- rent_price

If I want to SELECT the products having rent_price between 300.00 and 500.00, I use the query:

SELECT rent_price
FROM product
LEFT JOIN price ON (product.id = price.product_id)
WHERE rent_price BETWEEN 300.00 AND 500.00

My doubt is whether the product table will be optimized. Will the optimization process examine all rows from the product table, or only the rows related to the price table? In other words: if the price table has other columns and the LEFT JOIN is needed anyway, would it still be better to move the sell_price and rent_price columns into the product table?

My table has many columns, and for structural and maintenance reasons it would be better to divide the columns into two tables, but I can't compromise the application's performance because of it.

Thanks,
Ronan
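One detail worth noting about the query above (an observation, not from the original thread): because the WHERE clause filters on price.rent_price, any row where the LEFT JOIN found no match has rent_price NULL and is discarded anyway, so the query is effectively an inner join. Writing it as one makes that explicit and gives the optimizer freedom to start from whichever table is cheaper:

```sql
-- Equivalent to the LEFT JOIN version, since the BETWEEN in the WHERE
-- clause already removes unmatched (NULL) rows:
SELECT pr.rent_price
  FROM product p
  JOIN price pr ON pr.product_id = p.id
 WHERE pr.rent_price BETWEEN 300.00 AND 500.00;
```

With an index on price.rent_price, only the matching price rows and their related product rows need to be touched, regardless of how many columns either table has.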
Query optimization question
I am selecting from a single table, but it has a lot of rows and a very involved calculation. What I really want to do is FIRST restrict the number of rows so that the big calculation is only performed on the ones that are within 3 degrees. Using 4.0.20. A sample query is given here. The application interpolates variable values such as 44.6 into the query string, so from MySQL's point of view they are constants, right? And the EXPLAIN doc says it optimizes constants, but it is looking at all the rows, and I see why.

SELECT city, state, country, latitude, longitude,
       IF(latitude REGEXP '[0-9]+$' AND longitude REGEXP '[0-9]+$',
          ROUND(DEGREES(ACOS(
                SIN(RADIANS(44.6)) * SIN(RADIANS(latitude))
              + COS(RADIANS(44.6)) * COS(RADIANS(latitude))
              * COS(RADIANS(-123.28 - longitude))
          )) * 111),
          NULL) AS distance
FROM londata
WHERE ABS(44.6 - latitude) <= 3.0
  AND ABS(-123.28 - longitude) <= 3.0
ORDER BY distance;

I guess I can't do a subselect with my version... If I could, what would it look like? Something like below? (I might be able to talk the powers that be into an upgrade.) And if I can't, is it more horrible to manually create a temporary table and perform the calculations on it than it is to just do what I am doing?

SELECT city, state, country, latitude, longitude,
       IF(latitude REGEXP '[0-9]+$' AND longitude REGEXP '[0-9]+$',
          ROUND(DEGREES(ACOS(
                SIN(RADIANS(44.6)) * SIN(RADIANS(latitude))
              + COS(RADIANS(44.6)) * COS(RADIANS(latitude))
              * COS(RADIANS(-123.28 - longitude))
          )) * 111),
          NULL) AS distance
FROM (SELECT * FROM londata
       WHERE ABS(44.6 - latitude) <= 3.0
         AND ABS(-123.28 - longitude) <= 3.0) AS sublon
ORDER BY distance;

Thanks.
GT
Re: Query optimization question
Gerald Taylor wrote:
> I am selecting from a single table, but it has a lot of rows and a very involved calculation. What I really want to do is FIRST restrict the number of rows so that the big calculation is only performed on the ones that are within 3 degrees.
> [snip - full question quoted above]

A subselect won't help. You still have to scan the whole table to do the calculation for the WHERE clause. Pre-calculate your min and max lat. and lon.:
WHERE latitude <= @maxlat AND latitude >= @minlat
  AND longitude <= @maxlon AND longitude >= @minlon
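Spelled out, that bounding-box prefilter would look something like this (variable names follow the reply; the values could equally be computed in the application and interpolated as literals, as the original query does):

```sql
SET @minlat = 44.6 - 3.0, @maxlat = 44.6 + 3.0,
    @minlon = -123.28 - 3.0, @maxlon = -123.28 + 3.0;

-- Plain comparisons against the bare columns allow an index on
-- latitude (or longitude) to narrow the scan, which the
-- ABS(44.6 - latitude) form cannot:
SELECT city, state, country, latitude, longitude
  FROM londata
 WHERE latitude  BETWEEN @minlat AND @maxlat
   AND longitude BETWEEN @minlon AND @maxlon;
```

The expensive ACOS/RADIANS distance calculation then only runs over the rows this cheap filter keeps.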
Re: Query optimization question
When you don't have subselects, you have two options: temporary tables or JOINed queries. In your case, I think the temporary table is the better way to go. I would also eliminate the ABS() check so that values are compared directly against the column; I know the math is correct your way, but this way indexes can come into play.

SET @targetLat = 44.6, @targetLon = -123.8, @Delta = 3;

CREATE TEMPORARY TABLE tmpDeltaData
SELECT city, state, country, latitude, longitude
FROM londata
WHERE latitude BETWEEN (@targetLat - @Delta) AND (@targetLat + @Delta)
  AND longitude BETWEEN (@targetLon - @Delta) AND (@targetLon + @Delta);

SELECT city, state, country, latitude, longitude,
       IF(latitude REGEXP '[0-9\\.]+$' AND longitude REGEXP '[0-9\\.]+$',
          ROUND(DEGREES(ACOS(
                SIN(RADIANS(@targetLat)) * SIN(RADIANS(latitude))
              + COS(RADIANS(@targetLat)) * COS(RADIANS(latitude))
              * COS(RADIANS(@targetLon - longitude))
          )) * 111),
          NULL) AS distance
FROM tmpDeltaData
ORDER BY distance;

DROP TEMPORARY TABLE tmpDeltaData;

I would also test the WHERE clause from tmpDeltaData with your original query to compare the speeds of the two methods.

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

Gerald Taylor [EMAIL PROTECTED] wrote on 10/04/2004 10:33:22 AM:
> I am selecting from a single table, but it has a lot of rows and a very involved calculation...
> [snip - full question quoted above]
Query Optimization
Hi,

I'm trying to sort out a query that identifies images that are not in the story table (structures below). Peter Brawley has kindly pointed me in the right direction with the SQL structure (which does work), but I'm getting benchmarks of 5+ seconds on the test data, whereas the live site has 50x the number of rows, and I'm concerned about trying to go live with it. If anyone could suggest ways to improve the query/table structure, I'd be most grateful.

Kris

SELECT i.image_id, i.image_name
FROM table_image i
LEFT JOIN table_story s USING (image_id)
WHERE s.image_id IS NULL
ORDER BY i.image_name

TABLE IMAGE
image_id int(11) unsigned NOT NULL auto_increment,
image_name char(64) NOT NULL default 'default',
PRIMARY KEY (image_id),
KEY image_id (image_id)
) TYPE=MyISAM

TABLE STORY
id int(11) NOT NULL auto_increment,
body text,
image_id int(11) unsigned NOT NULL default '0',
PRIMARY KEY (id),
KEY id (id)
) TYPE=MyISAM

And the results from EXPLAIN SELECT:

table = image
type = ALL
possible keys =
key =
key_len =
ref =
rows = 460
extra =

table = story
type = ALL
possible keys =
key =
key_len =
ref =
rows = 610
extra =
Re: Query Optimization [sorted]
> I'm trying to sort out a query that identifies images that are not in the story table (structures below). Peter Brawley has kindly pointed me in the right direction with the SQL structure (which does work), but I'm getting benchmarks of 5+ seconds on the test data, whereas the live site has 50x the number of rows and I'm concerned about trying to go live with it.

Went back to the manual and realised that the table wasn't properly indexed. Makes a *big* difference! Sorry to waste your time/bandwidth.

Kris
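For readers hitting the same pattern: in a LEFT JOIN ... IS NULL anti-join like the one above, the index that matters is on the join column of the right-hand (story) side, so each image row can probe the story table by key instead of forcing the full scan the EXPLAIN showed. A sketch (the original post mixes the names mainpic_id and image_id; image_id is assumed here to be the join column):

```sql
ALTER TABLE table_story ADD INDEX idx_image (image_id);

-- After adding the index, EXPLAIN should show a keyed lookup for
-- table_story instead of type = ALL:
EXPLAIN SELECT i.image_id, i.image_name
FROM table_image i
LEFT JOIN table_story s ON s.image_id = i.image_id
WHERE s.image_id IS NULL;
```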
Query Optimization
Hi,

Does anybody know where I can find information about query optimization in MySQL? Which techniques are used?

Regards,
Volnei Galbino
Re: Query Optimization
At 21:55 -0300 3/7/04, Volnei Galbino wrote:
> Does anybody know where I can find information about query optimization in MySQL? Which techniques are used?

Yes, there's a chapter on optimization in the MySQL Reference Manual:
http://www.mysql.com/doc/en/MySQL_Optimisation.html

--
Paul DuBois, MySQL Documentation Team
Madison, Wisconsin, USA
MySQL AB, www.mysql.com
MySQL Users Conference: April 14-16, 2004 http://www.mysql.com/uc2004/
Re: Query optimization help
Chuck Gadd wrote:
> I've got a query that I can't seem to get optimized, so I'm hoping someone here can spot something I'm missing!
>
> The table has three columns: CoordID int unsigned, Zip_Lo char(9), Zip_Hi char(9). It has 3 million records and these indexes:
>
> acg_lo (Zip_Lo)
> acg_hi (Zip_Hi)
> acg_combined (Zip_Lo, Zip_Hi)
>
> Here's the query:
>
> select * from acg
> where zip4_lo_pot <= '80128' and zip4_hi_pot >= '80128'
>
> EXPLAIN shows:
>
> type: ALL
> possible keys: acg_lo,acg_hi,acg_combined
> rows: 3022309
> extra: Using where

This kind of query cannot be efficiently optimized on a pre-4.1 version. With 4.1, if you are using MyISAM tables, you could make (zip4_lo_pot, zip4_hi_pot) a spatial column with a spatial index. See http://www.mysql.com/doc/en/Spatial_extensions_in_MySQL.html

--
Sasha Pachev
Create online surveys at http://www.surveyz.com/
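For reference, the interval-containment trick being alluded to stores each (lo, hi) pair as a small geometry so that a bounding-rectangle lookup replaces the unindexable "lo <= x AND hi >= x" predicate. A hedged sketch (column names zip4_lo_pot/zip4_hi_pot as in the post; this assumes the zip values are numeric strings, and uses the MySQL 4.1 spatial functions GeomFromText and MBRContains):

```sql
-- Encode each [lo, hi] interval as a line segment whose bounding
-- rectangle spans x in [-1, 1] and y in [lo, hi]:
ALTER TABLE acg ADD zip_range GEOMETRY NOT NULL;
UPDATE acg SET zip_range = GeomFromText(
    CONCAT('LineString(-1 ', zip4_lo_pot + 0, ', 1 ', zip4_hi_pot + 0, ')'));
ALTER TABLE acg ADD SPATIAL INDEX sp_zip (zip_range);

-- "lo <= 80128 AND hi >= 80128" becomes a rectangle-contains-point
-- test that the spatial index can answer:
SELECT * FROM acg
 WHERE MBRContains(zip_range, GeomFromText('Point(0 80128)'));
```

Whether this beats a plain scan depends on how selective the intervals are, so it's worth benchmarking against the original query on the full 3 million rows.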