Re: Query Optimization
The query SELECT * FROM XYZ WHERE VAL IN ('BLUE','RED','GREEN'); will be faster if the field is an ENUM.

On Fri, Nov 16, 2012 at 12:36 AM, Anupam Karmarkar sb_akarmar...@yahoo.com wrote:

Hi All, Consider a scenario: I have a table XYZ which contains the values BLUE, RED, GREEN, and NULL. The following queries can be used to get these values:
1. SELECT * FROM XYZ WHERE VAL IN ('BLUE','RED','GREEN');
2. SELECT * FROM XYZ WHERE VAL IS NOT NULL;
3. SELECT * FROM XYZ WHERE VAL = 'BLUE' OR VAL = 'RED' OR VAL = 'GREEN';
and more. So which one is better in terms of optimization? I would guess 1 and 3 are similar in form. --Anupam

--
Thanks & Regards,
P.Benaya Paul
http://www.codeasearch.com
http://www.iwannasearch.com
RE: Query Optimization
It depends on the distribution of the four values in that field. If the cardinality is poor, then INDEX(VAL) won't be used, and all of the queries will do a full table scan.

-----Original Message-----
From: Anupam Karmarkar [mailto:sb_akarmar...@yahoo.com]
Sent: Friday, November 16, 2012 12:36 AM
To: mysql@lists.mysql.com
Subject: Query Optimization

[original question quoted above; snipped]

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql
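A quick way to check the cardinality point above is to compare the EXPLAIN plans of the three forms directly. A sketch, using the thread's example table (the `id` column and index name `idx_val` are assumptions added for illustration):

```sql
-- Hypothetical schema matching the thread's example.
CREATE TABLE XYZ (
  id  INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  VAL ENUM('BLUE','RED','GREEN') NULL,  -- ENUM stores a 1-byte code, not the string
  KEY idx_val (VAL)
);

-- The three predicates are logically equivalent; compare their plans:
EXPLAIN SELECT * FROM XYZ WHERE VAL IN ('BLUE','RED','GREEN');
EXPLAIN SELECT * FROM XYZ WHERE VAL IS NOT NULL;
EXPLAIN SELECT * FROM XYZ WHERE VAL = 'BLUE' OR VAL = 'RED' OR VAL = 'GREEN';

-- With only four distinct values (poor cardinality), expect type=ALL
-- (a full table scan) for all three once the table has many rows.
```

Whether the optimizer chooses `idx_val` or a scan depends on the actual value distribution, so the EXPLAIN output on real data is the deciding evidence.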
Re: query optimization
Your outer query, select cpe_mac, max(r3_dt) from rad_r3cap, is doing a full table scan. You might want to check on this and add a WHERE condition that uses an indexed column.

On Fri, Sep 23, 2011 at 12:14 AM, supr_star suprstar1...@yahoo.com wrote:

I have a table with 24 million rows and I need to figure out how to optimize a query. It has to do with MAC addresses and RADIUS packets - I want to see the number of connections and the min/max dates. So I basically want all this data:

SELECT cpe_mac, COUNT(*) c, MIN(r3_dt) mindt, MAX(r3_dt) maxdt, MAX(rec_num) recn
FROM rad_r3cap
WHERE r3_dt >= SUBDATE(NOW(), INTERVAL 30 DAY)
  AND r3_type = 'Access'
GROUP BY cpe_mac
ORDER BY cpe_mac;

This piece of the query takes 30 seconds to run and produces 3500 rows. I have r3_dt indexed. I also want the status field of the row with the highest r3_dt:

SELECT rec_num, cpe_mac, req_status
FROM rad_r3cap
WHERE r3_type = 'Access'
  AND (cpe_mac, r3_dt) IN (SELECT cpe_mac, MAX(r3_dt) FROM rad_r3cap);

This piece of the query takes forever - I let it run for an hour and it still didn't finish, so it's obviously not using indexes, and I have no idea how far along it got. I wrote a PHP script to run the first query and then do 3500 individual lookups for the status using the MAX(rec_num) field from the first query, and I can get the data in 31 seconds. So I CAN produce this data, but very slowly, and not in one SQL query. I want to consolidate this into one statement so I can make a view. If anyone can point me in the right direction, I'd appreciate it!
mysql> desc rad_r3cap;
+-------------+-------------+------+-----+---------+----------------+
| Field       | Type        | Null | Key | Default | Extra          |
+-------------+-------------+------+-----+---------+----------------+
| rec_num     | int(11)     | NO   | PRI | NULL    | auto_increment |
| r3_dt       | datetime    | YES  | MUL | NULL    |                |
| r3_micros   | int(11)     | YES  |     | NULL    |                |
| r3_type     | varchar(16) | YES  |     | NULL    |                |
| req_status  | varchar(16) | YES  |     | NULL    |                |
| req_comment | varchar(64) | YES  |     | NULL    |                |
| asn_ip      | varchar(16) | YES  | MUL | NULL    |                |
| asn_name    | varchar(16) | YES  |     | NULL    |                |
| bsid        | varchar(12) | YES  | MUL | NULL    |                |
| cpe_ip      | varchar(16) | YES  |     | NULL    |                |
| cpe_mac     | varchar(12) | YES  | MUL | NULL    |                |
| filename    | varchar(32) | YES  |     | NULL    |                |
| linenum     | int(11)     | YES  |     | NULL    |                |
| r3_hour     | datetime    | YES  | MUL | NULL    |                |
| user_name   | varchar(64) | YES  |     | NULL    |                |
+-------------+-------------+------+-----+---------+----------------+

mysql> show indexes in rad_r3cap;
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table     | Non_unique | Key_name     | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| rad_r3cap |          0 | PRIMARY      |            1 | rec_num     | A         |    23877677 |     NULL | NULL   |      | BTREE      |         |
| rad_r3cap |          0 | r3cap_dt     |            1 | r3_dt       | A         |        NULL |     NULL | NULL   | YES  | BTREE      |         |
| rad_r3cap |          0 | r3cap_dt     |            2 | r3_micros   | A         |        NULL |     NULL | NULL   | YES  | BTREE      |         |
| rad_r3cap |          1 | r3cap_bsid   |            1 | bsid        | A         |         346 |     NULL | NULL   | YES  | BTREE      |         |
| rad_r3cap |          1 | r3cap_asnip  |            1 | asn_ip      | A         |          55 |     NULL | NULL   | YES  | BTREE      |         |
| rad_r3cap |          1 | r3cap_cpemac |            1 | cpe_mac     | A         |        4758 |     NULL | NULL   | YES  | BTREE      |         |
| rad_r3cap |          1 | r3cap_date   |            1 | r3_hour     | A         |        1548 |     NULL | NULL   | YES  | BTREE      |         |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
7 rows in set (0.00 sec)
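Since MAX(rec_num) in the poster's first query already identifies one row per MAC, one way to fold the two steps into a single statement (so it can back a view) is to join the grouped result back to the table on the primary key. This is a sketch, untested against the poster's data, and it assumes the highest rec_num per MAC is also the row with the highest r3_dt (true if rows are inserted in time order):

```sql
SELECT agg.cpe_mac, agg.c, agg.mindt, agg.maxdt, cap.req_status
FROM (
    -- The 30-second aggregate from the first query
    SELECT cpe_mac, COUNT(*) AS c, MIN(r3_dt) AS mindt,
           MAX(r3_dt) AS maxdt, MAX(rec_num) AS recn
    FROM rad_r3cap
    WHERE r3_dt >= SUBDATE(NOW(), INTERVAL 30 DAY)
      AND r3_type = 'Access'
    GROUP BY cpe_mac
) AS agg
-- One PRIMARY KEY lookup per MAC, replacing the 3500 PHP round trips
JOIN rad_r3cap AS cap ON cap.rec_num = agg.recn
ORDER BY agg.cpe_mac;
```

This mirrors what the PHP script does (aggregate, then point lookups by rec_num), but inside one statement, so the uncorrelated `(cpe_mac, r3_dt) IN (...)` subquery that never finished is avoided entirely.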
Re: Query Optimization
On 09/01/2011 01:32 PM, Brandon Phelps wrote:
On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote:
On 9/1/2011 09:42, Brandon Phelps wrote:
On 09/01/2011 04:59 AM, Jochem van Dieten wrote:
... WHERE (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00') AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')
In that case your logic here simplifies to:
WHERE open_dt >= '2011-08-30 00:00:00' AND close_dt <= '2011-08-30 12:36:53'
Now add an index over open_dt and close_dt and see what happens. Jochem

Jochem, I can't really use your WHERE logic because I also need to retrieve results where the open_dt time is out of the range specified. For example, a very large file download might span multiple days, so given your logic, if the connection was started 2 days ago and I want to pull 1 day's worth of connections, I would miss that entry. Basically I want to SELECT all of the records that were opened OR closed during the specified time period; i.e. if any activity happened between my start and end dates, I need to see that record. Any other ideas?

I believe Jochem was on the right track but he got his dates reversed. Let's try a little ASCII art to show the situation. I will set up a query window with two markers (s) and (e). Events will be marked by |---| markers showing their durations.

a)          (s)                  (e)
b) |---|
c)       |---|
d)              |---|
e)     |----------------------------|
f)                            |---|
g)                                      |---|

To describe these situations:
a) is the window for which you want to query; (s) is the starting time and (e) is the ending time of the date range you are interested in.
b) the event starts and stops before your window exists - it won't be part of your results.
c) the event starts before the window but ends within the window - include this.
d) the event starts and ends within the window - include this.
e) the event starts before the window and ends after the window - include this.
f) the event starts inside the window but ends beyond the window - include this.
g) the event starts and ends beyond the window - exclude this.

In order to get every event in the range of c-f, here is what you need for a WHERE clause:

WHERE start <= (ending time) AND end >= (starting time)

Try that and let us know the results.

Thanks Jochem and Shawn, however the following two queries result in the exact same EXPLAIN output: (I hope the tables don't wrap too early for you)

Old method:

SELECT sc.open_dt, sc.close_dt, sc.protocol,
       INET_NTOA(sc.src_address) AS src_address, sc.src_port,
       INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port,
       sc.sent, sc.rcvd,
       spm.desc AS src_port_desc, dpm.desc AS dst_port_desc
FROM firewall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE (open_dt >= '2011-08-31 09:53:31' OR close_dt >= '2011-08-31 09:53:31')
  AND (open_dt <= '2011-09-01 09:53:31' OR close_dt <= '2011-09-01 09:53:31')
ORDER BY rcvd DESC LIMIT 0, 10;

New method with BTREE index on (open_dt, close_dt) (index name is ndx_open_close_dt):

SELECT sc.open_dt, sc.close_dt, sc.protocol,
       INET_NTOA(sc.src_address) AS src_address, sc.src_port,
       INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port,
       sc.sent, sc.rcvd,
       spm.desc AS src_port_desc, dpm.desc AS dst_port_desc
FROM firewall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE open_dt <= '2011-09-01 09:53:31' AND close_dt >= '2011-08-31 09:53:31'
ORDER BY rcvd DESC LIMIT 0, 10;

EXPLAIN output for the old method:

+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
| id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
|  1 | SIMPLE      | sc    | index  | open_dt,ndx_open_close_dt | ndx_rcvd | 4       | NULL               |   10 | Using where |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.src_port |    1 |             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.dst_port |    1 |             |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+

EXPLAIN output for the new method with the new index:

+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
| id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
|  1 | SIMPLE      | sc    | index  | open_dt,ndx_open_close_dt | ndx_rcvd | 4       | NULL               |   10 | Using where |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.src_port |    1 |             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.dst_port |    1 |             |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
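Shawn's overlap test, applied to the column names used in this thread, comes out as the following sketch (the @win_* variable names are illustrative):

```sql
-- Include every connection whose [open_dt, close_dt] interval overlaps
-- the query window [@win_start, @win_end] (cases c through f in the art).
SET @win_start = '2011-08-31 09:53:31';
SET @win_end   = '2011-09-01 09:53:31';

SELECT sc.open_dt, sc.close_dt, sc.protocol, sc.sent, sc.rcvd
FROM firewall_connections AS sc
WHERE sc.open_dt  <= @win_end    -- started before the window closed
  AND sc.close_dt >= @win_start  -- ended after the window opened
ORDER BY sc.rcvd DESC
LIMIT 0, 10;
```

Note that this is a different predicate from Jochem's simplification (which kept only events fully inside the window, cases d only); the reversed comparison directions are exactly what capture the spanning cases c, e, and f.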
Re: Query Optimization
Thinking outside the query, is there any archiving that could happen to make your large tables kinder in the range scan?

Andy

On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelps bphe...@gls.com wrote:
[earlier message quoted in full; snipped]
Re: Query Optimization
Thanks for the reply Andy. Unfortunately the users will be selecting varying date ranges and new data is constantly coming in, so I am not sure how I could archive/cache the necessary data in a way that would be any more efficient than simply using the database directly.

On 09/08/2011 02:16 PM, Andrew Moore wrote:
Thinking outside the query, is there any archiving that could happen to make your large tables kinder in the range scan?
Andy
[rest of quoted thread snipped]
Re: Query Optimization
I don't think I saw any query timings in the emails (maybe I missed them). What version of MySQL are you currently using? What does the EXPLAIN look like when you remove the LIMIT 10? Is your server tuned for MyISAM or InnoDB? What kind of disk setup is in use? How much memory is in your machine?

On Thu, Sep 8, 2011 at 7:27 PM, Brandon Phelps bphe...@gls.com wrote:
Thanks for the reply Andy. Unfortunately the users will be selecting varying date ranges and new data is constantly coming in, so I am not sure how I could archive/cache the necessary data in a way that would be any more efficient than simply using the database directly.
[rest of quoted thread snipped]
Re: Query Optimization
If you're running version 5.1+ you may want to take a look at the table partitioning options you may have.

On Sep 8, 2011, at 2:27 PM, Brandon Phelps wrote:
Thanks for the reply Andy. Unfortunately the users will be selecting varying date ranges and new data is constantly coming in, so I am not sure how I could archive/cache the necessary data in a way that would be any more efficient than simply using the database directly.
[rest of quoted thread snipped]
Re: Query Optimization
Andy,

The queries take minutes to run. MySQL is 5.1.54 and it's running on Ubuntu Server 11.04. Unfortunately the machine only has 2GB of RAM, but no other major daemons are running on it. We are running RAID 1 (mirroring) with 1TB drives. The tables in question here are all MyISAM. When running with the LIMIT 10 my EXPLAIN is:

+----+-------------+-------+--------+----------------------+----------------------+---------+--------------------+----------+-----------------------------+
| id | select_type | table | type   | possible_keys        | key                  | key_len | ref                | rows     | Extra                       |
+----+-------------+-------+--------+----------------------+----------------------+---------+--------------------+----------+-----------------------------+
|  1 | SIMPLE      | sc    | range  | ndx_open_close_rcvd  | ndx_open_close_rcvd  | 8       | NULL               | 32393316 | Using where; Using filesort |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY              | PRIMARY              | 2       | syslog.sc.src_port |        1 |                             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY              | PRIMARY              | 2       | syslog.sc.dst_port |        1 |                             |
+----+-------------+-------+--------+----------------------+----------------------+---------+--------------------+----------+-----------------------------+

When I remove the LIMIT 10 I get:

+----+-------------+-------+--------+----------------------+----------------------+---------+--------------------+----------+-----------------------------+
| id | select_type | table | type   | possible_keys        | key                  | key_len | ref                | rows     | Extra                       |
+----+-------------+-------+--------+----------------------+----------------------+---------+--------------------+----------+-----------------------------+
|  1 | SIMPLE      | sc    | ALL    | ndx_open_close_rcvd  | NULL                 | NULL    | NULL               | 32393330 | Using where; Using filesort |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY              | PRIMARY              | 2       | syslog.sc.src_port |        1 |                             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY              | PRIMARY              | 2       | syslog.sc.dst_port |        1 |                             |
+----+-------------+-------+--------+----------------------+----------------------+---------+--------------------+----------+-----------------------------+

Thanks for all your help thus far.

On 09/08/2011 02:38 PM, Andrew Moore wrote:
I don't think I saw any query timings in the emails (maybe I missed them). What version of MySQL are you currently using? What does the EXPLAIN look like when you remove the LIMIT 10? Is your server tuned for MyISAM or InnoDB? What kind of disk setup is in use? How much memory is in your machine?
[rest of quoted thread snipped]
Re: Query Optimization
Correct me if I'm wrong. You're wanting to get all records that have an open_date or a close_date between two times. If that's correct, you might be able to get an index_merge by doing a query like: WHERE ((starting time)=open_dt= (ending time)) OR ((starting time)=close_dt=(ending time)) and creating two indexes (one on 'open_dt' and the other on 'close_dt') http://dev.mysql.com/doc/refman/5.1/en/index-merge-optimization.html Regards, Derek On Sep 8, 2011, at 2:50 PM, Brandon Phelps wrote: Andy, The queries take minutes to run. MySQL is 5.1.54 and it's running on Ubuntu server 11.04. Unfortunately the machine only has 2GB of RAM but no other major daemons are running on the machine. We are running RAID 1 (mirroring) with 1TB drives. The tables in question here are all MyISAM. When running with the LIMIT 10 my EXPLAIN is: ++-+---++-+-+-++--+-+ | id | select_type | table | type | possible_keys | key | key_len | ref| rows | Extra | ++-+---++-+-+-++--+-+ | 1 | SIMPLE | sc| range | ndx_open_close_rcvd | ndx_open_close_rcvd | 8 | NULL | 32393316 | Using where; Using filesort | | 1 | SIMPLE | spm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.src_port |1 | | | 1 | SIMPLE | dpm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.dst_port |1 | | ++-+---++-+-+-++--+-+ When I remove the LIMIT 10 I get: +-+---++-+-+-++--+-+ | id | select_type | table | type | possible_keys | key | key_len | ref| rows | Extra | ++-+---++-+-+-++--+-+ | 1 | SIMPLE | sc| ALL| ndx_open_close_rcvd | NULL| NULL | NULL | 32393330 | Using where; Using filesort | | 1 | SIMPLE | spm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.src_port |1 | | | 1 | SIMPLE | dpm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.dst_port |1 | | ++-+---++-+-+-++--+-+ Thanks for all your help thus far. On 09/08/2011 02:38 PM, Andrew Moore wrote: I don't think I saw any query timings in the emails (maybe I missed them). What version of MySQL are you currently using? What does the explain look like when your remove the limit 10? 
Is your server tuned for MyISAM or InnoDB? What kind of disk setup is in use? How much memory is in your machine?

On Thu, Sep 8, 2011 at 7:27 PM, Brandon Phelps bphe...@gls.com wrote:

Thanks for the reply Andy. Unfortunately the users will be selecting varying date ranges and new data is constantly coming in, so I am not sure how I could archive/cache the necessary data in a way that would be any more efficient than simply using the database directly.

On 09/08/2011 02:16 PM, Andrew Moore wrote: Thinking outside the query, is there any archiving that could happen to make your large tables kinder in the range scan? Andy

On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelps bphe...@gls.com wrote: On 09/01/2011 01:32 PM, Brandon Phelps wrote: On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote: On 9/1/2011 09:42, Brandon Phelps wrote: On 09/01/2011 04:59 AM, Jochem van Dieten wrote: ...

WHERE (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00') AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')

In that case your logic here simplifies to:

WHERE open_dt >= '2011-08-30 00:00:00' AND close_dt <= '2011-08-30 12:36:53'

Now add an index over open_dt and close_dt and see what happens. Jochem

Jochem, I can't really use your WHERE logic because I also need to retrieve results where the open_dt time is out of the range specified. For example, a very large file download might span multiple days, so given your logic, if the connection was started 2 days ago and I want to pull 1 day's worth of connections, I would miss that entry. Basically I want to SELECT all of the records that were opened
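Derek's two-index suggestion above can be tried end to end in a few lines. This is only a sketch against SQLite (whose planner has a comparable OR-of-two-indexes optimization), not MySQL; the table and column names follow the thread, and the rows are invented:

```python
import sqlite3

# Hypothetical miniature of the thread's connection log: open_dt/close_dt
# stored as ISO-8601 strings, one single-column index on each.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sonicwall_connections (id INTEGER PRIMARY KEY, open_dt TEXT, close_dt TEXT)")
con.execute("CREATE INDEX ndx_open_dt  ON sonicwall_connections (open_dt)")
con.execute("CREATE INDEX ndx_close_dt ON sonicwall_connections (close_dt)")

rows = [
    (1, "2011-09-06 10:00:00", "2011-09-06 11:00:00"),  # entirely before the window
    (2, "2011-09-07 09:00:00", "2011-09-07 14:00:00"),  # closes inside the window
    (3, "2011-09-07 15:00:00", "2011-09-07 16:00:00"),  # opens and closes inside
    (4, "2011-09-08 13:00:00", "2011-09-09 02:00:00"),  # opens inside, closes after
    (5, "2011-09-09 00:00:00", "2011-09-09 01:00:00"),  # entirely after
]
con.executemany("INSERT INTO sonicwall_connections VALUES (?,?,?)", rows)

start, end = "2011-09-07 13:18:58", "2011-09-08 13:18:58"
# OR of two range predicates, each answerable from one of the two indexes.
hits = [r[0] for r in con.execute(
    "SELECT id FROM sonicwall_connections "
    "WHERE (open_dt BETWEEN ? AND ?) OR (close_dt BETWEEN ? AND ?) ORDER BY id",
    (start, end, start, end))]
print(hits)  # -> [2, 3, 4]
```

Note that this OR form still misses a connection that opens before the window and closes after it (Shawn's case "e" later in the thread), which is why the overlap predicate comes up next.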
Re: Query Optimization
Mihail,

I have considered this but have not yet determined how best to go about partitioning the table. I don't think partitioning by dst_address or src_address would help because most of the queries do not filter on IP address (except very specific queries where the end-user is searching the table for history on a particular employee).

I could potentially partition the table based on the day of week the connection was opened on, which may improve performance for a while, since this would take me from a single 32-million-record table down to roughly 4.5 million records per partition (32 / 7); however, we are looking to eventually store up to 2 months worth of data in the table, and so far that 32 million records is only for 1 month, so I estimate another 32 million-ish before the month is out, bringing me to roughly 70 million records total (it would be nice if I could store even more than 2 months, but given my current performance dilemma I don't expect that to happen).

Also this does not take into account that the end-user will often be pulling data for multiple days at a time, meaning that multiple partitions in this scheme will need to be accessed anyway.

The only other logical partitioning scheme I can think of would be to partition based on dst_port (the port the log relates to), but the majority of records are all to port 80 (standard web traffic), so I don't think this would be all that helpful.

I have never messed with partitioning, so it is totally possible that I am not thinking of something; if you have any ideas on a decent partitioning scheme based on my criteria and queries below, please let me know.

Thanks,
Brandon

On 09/08/2011 02:47 PM, Mihail Manolov wrote: If you're running version 5.1+ you may wanna take a look at table partitioning options you may have.

On Sep 8, 2011, at 2:27 PM, Brandon Phelps wrote: Thanks for the reply Andy.
Unfortunately the users will be selecting varying date ranges and new data is constantly coming in, so I am not sure how I could archive/cache the necessary data in a way that would be any more efficient than simply using the database directly.

On 09/08/2011 02:16 PM, Andrew Moore wrote: Thinking outside the query, is there any archiving that could happen to make your large tables kinder in the range scan? Andy

On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelps bphe...@gls.com wrote: On 09/01/2011 01:32 PM, Brandon Phelps wrote: On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote: On 9/1/2011 09:42, Brandon Phelps wrote: On 09/01/2011 04:59 AM, Jochem van Dieten wrote: ...

WHERE (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00') AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')

In that case your logic here simplifies to:

WHERE open_dt >= '2011-08-30 00:00:00' AND close_dt <= '2011-08-30 12:36:53'

Now add an index over open_dt and close_dt and see what happens. Jochem

Jochem, I can't really use your WHERE logic because I also need to retrieve results where the open_dt time is out of the range specified. For example, a very large file download might span multiple days, so given your logic, if the connection was started 2 days ago and I want to pull 1 day's worth of connections, I would miss that entry. Basically I want to SELECT all of the records that were opened OR closed during the specified time period, i.e. if any activity happened between my start and end dates, I need to see that record. Any other ideas?

I believe Jochem was on the right track but he got his dates reversed. Let's try a little ASCII art to show the situation. I will set up a query window with two markers (s) and (e). Events will be marked by |---| markers showing their durations.
a)        (s)                  (e)
b) |---|
c)      |---|
d)             |---|
e)    |---------------------------|
f)                           |---|
g)                                  |---|

To describe these situations:

a) is the window for which you want to query. (s) is the starting time and (e) is the ending time for the date range you are interested in.
b) the event starts and stops before your window exists. It won't be part of your results.
c) the event starts before the window but ends within the window - include this
d) the event starts and ends within the window - include this
e) the event starts before the window and ends after the window - include this
f) the event starts inside the window but ends beyond the window - include this.
g) the event starts and ends beyond the window - exclude this.

In order to get every event in the range of c-f, here is what you need for a WHERE clause:

WHERE start <= (ending time) AND end >= (starting time)

Try that and let us know the results.

Thanks Jochem and Shawn, however the following two queries result in the exact same EXPLAIN output: (I hope the tables don't wrap too early for you)

Old method:

SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS
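Shawn's c)-f) rule can be checked mechanically. A small Python sketch with integers standing in for the timestamps, one tuple per lettered case:

```python
# The overlap rule in code: an event [start, end] overlaps the query window
# [s, e] iff start <= e AND end >= s. The tuples mirror cases b) through g).
s, e = 10, 20  # query window

cases = {
    "b": (1, 5),    # ends before the window        -> exclude
    "c": (5, 12),   # starts before, ends inside    -> include
    "d": (12, 18),  # entirely inside               -> include
    "e": (5, 25),   # spans the whole window        -> include
    "f": (18, 25),  # starts inside, ends after     -> include
    "g": (22, 30),  # starts after the window       -> exclude
}

def overlaps(start, end):
    return start <= e and end >= s

included = sorted(k for k, (start, end) in cases.items() if overlaps(start, end))
print(included)  # -> ['c', 'd', 'e', 'f']
```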
Re: Query Optimization
Thanks for the idea Derek, however given the following query my EXPLAIN output is identical:

SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE ('2011-09-07 13:18:58' <= open_dt <= '2011-09-08 13:18:58') OR ('2011-09-07 13:18:58' <= close_dt <= '2011-09-08 13:18:58');

+----+-------------+-------+--------+---------------+---------+---------+--------------------+----------+-------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref                | rows     | Extra       |
+----+-------------+-------+--------+---------------+---------+---------+--------------------+----------+-------------+
|  1 | SIMPLE      | sc    | ALL    | NULL          | NULL    | NULL    | NULL               | 32393330 | Using where |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY       | PRIMARY | 2       | syslog.sc.src_port |        1 |             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY       | PRIMARY | 2       | syslog.sc.dst_port |        1 |             |
+----+-------------+-------+--------+---------------+---------+---------+--------------------+----------+-------------+

I did create indexes on open_dt and close_dt (2 separate indexes).

On 09/08/2011 02:55 PM, Derek Downey wrote:

Correct me if I'm wrong. You're wanting to get all records that have an open_date or a close_date between two times. If that's correct, you might be able to get an index_merge by doing a query like:

WHERE ((starting time) <= open_dt <= (ending time)) OR ((starting time) <= close_dt <= (ending time))

and creating two indexes (one on 'open_dt' and the other on 'close_dt')

http://dev.mysql.com/doc/refman/5.1/en/index-merge-optimization.html

Regards, Derek

On Sep 8, 2011, at 2:50 PM, Brandon Phelps wrote:

Andy, The queries take minutes to run. MySQL is 5.1.54 and it's running on Ubuntu server 11.04. Unfortunately the machine only has 2GB of RAM but no other major daemons are running on the machine. We are running RAID 1 (mirroring) with 1TB drives. The tables in question here are all MyISAM.
When running with the LIMIT 10 my EXPLAIN is:

+----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
| id | select_type | table | type   | possible_keys       | key                 | key_len | ref                | rows     | Extra                       |
+----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
|  1 | SIMPLE      | sc    | range  | ndx_open_close_rcvd | ndx_open_close_rcvd | 8       | NULL               | 32393316 | Using where; Using filesort |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.src_port |        1 |                             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.dst_port |        1 |                             |
+----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+

When I remove the LIMIT 10 I get:

+----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
| id | select_type | table | type   | possible_keys       | key                 | key_len | ref                | rows     | Extra                       |
+----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
|  1 | SIMPLE      | sc    | ALL    | ndx_open_close_rcvd | NULL                | NULL    | NULL               | 32393330 | Using where; Using filesort |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.src_port |        1 |                             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.dst_port |        1 |                             |
+----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+

Thanks for all your help thus far.

On 09/08/2011 02:38 PM, Andrew Moore wrote: I don't think I saw any query timings in the emails (maybe I missed them). What version of MySQL are you currently using? What does the explain look like when you remove the LIMIT 10? Is your server tuned for MyISAM or InnoDB? What kind of disk setup is in use? How much memory is
Re: Query Optimization
Partitioning isn't a bad idea for this, however I'm still thinking about your dataset size and possible hardware limitations. It's not likely going to fit into the relevant buffers/memory, so you're going to be on disk more than you want. You're probably creating temporary tables like crazy, and I would bet that a good load of them are heading to disk too. With your IO performance limited to a small number of disks as you describe, you're not going to be able to get much more from these queries.

Although it's a dedicated DB server, are there other databases being accessed on the server? When looking at the scope of your data, are you capturing more than you need? How often and how far back are the users querying? How many users are concurrently performing queries on the 32m record table?

On Thu, Sep 8, 2011 at 8:04 PM, Brandon Phelps bphe...@gls.com wrote:

Mihail, I have considered this but have not yet determined how best to go about partitioning the table. I don't think partitioning by dst_address or src_address would help because most of the queries do not filter on IP address (except very specific queries where the end-user is searching the table for history on a particular employee). I could potentially partition the table based on the day of week the connection was opened on, which may improve performance for a while, since this would take me from a single 32-million-record table down to roughly 4.5 million records per partition (32 / 7); however, we are looking to eventually store up to 2 months worth of data in the table, and so far that 32 million records is only for 1 month, so I estimate another 32 million-ish before the month is out, bringing me to roughly 70 million records total (it would be nice if I could store even more than 2 months, but given my current performance dilemma I don't expect that to happen).
Also this does not take into account that the end-user will often be pulling data for multiple days at a time, meaning that multiple partitions in this scheme will need to be accessed anyway. The only other logical partitioning scheme I can think of would be to partition based on dst_port (the port the log relates to), but the majority of records are all to port 80 (standard web traffic), so I don't think this would be all that helpful. I have never messed with partitioning, so it is totally possible that I am not thinking of something; if you have any ideas on a decent partitioning scheme based on my criteria and queries below, please let me know.

Thanks, Brandon

On 09/08/2011 02:47 PM, Mihail Manolov wrote: If you're running version 5.1+ you may wanna take a look at table partitioning options you may have.

On Sep 8, 2011, at 2:27 PM, Brandon Phelps wrote: Thanks for the reply Andy.
Re: Query Optimization
How about:

SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'
UNION
SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'

On Sep 8, 2011, at 3:25 PM, Brandon Phelps wrote:

Thanks for the idea Derek, however given the following query my EXPLAIN output is identical:

SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM sonicwall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port WHERE ('2011-09-07 13:18:58' <= open_dt <= '2011-09-08 13:18:58') OR ('2011-09-07 13:18:58' <= close_dt <= '2011-09-08 13:18:58');

+----+-------------+-------+--------+---------------+---------+---------+--------------------+----------+-------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref                | rows     | Extra       |
+----+-------------+-------+--------+---------------+---------+---------+--------------------+----------+-------------+
|  1 | SIMPLE      | sc    | ALL    | NULL          | NULL    | NULL    | NULL               | 32393330 | Using where |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY       | PRIMARY | 2       | syslog.sc.src_port |        1 |             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY       | PRIMARY | 2       | syslog.sc.dst_port |        1 |             |
+----+-------------+-------+--------+---------------+---------+---------+--------------------+----------+-------------+

I did create indexes on open_dt and close_dt (2 separate indexes).

On 09/08/2011 02:55 PM, Derek Downey wrote: Correct me if I'm wrong. You're wanting to get all records that have an open_date or a close_date between two times. If that's correct, you might be able to get an index_merge by doing a query like: WHERE ((starting time) <= open_dt <= (ending time)) OR ((starting time) <= close_dt <= (ending time)) and creating two indexes (one on 'open_dt' and the other on 'close_dt').
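Mihail's UNION rewrite returns the same rows as the OR form, and (as comes up next in the thread) UNION deduplicates a row that matches both ranges. A sketch with SQLite standing in for MySQL; the table name is shortened to `c` and the data is invented:

```python
import sqlite3

# Minimal connection-log stand-in: id, open_dt, close_dt as ISO-8601 strings.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE c (id INTEGER PRIMARY KEY, open_dt TEXT, close_dt TEXT)")
con.executemany("INSERT INTO c VALUES (?,?,?)", [
    (1, "2011-09-07 14:00:00", "2011-09-07 15:00:00"),  # both endpoints in range
    (2, "2011-09-06 09:00:00", "2011-09-07 14:00:00"),  # only close_dt in range
    (3, "2011-09-08 14:00:00", "2011-09-08 15:00:00"),  # neither in range
])
p = {"s": "2011-09-07 13:18:58", "e": "2011-09-08 13:18:58"}

# UNION of the two single-column range scans...
union_q = ("SELECT id FROM c WHERE open_dt BETWEEN :s AND :e "
           "UNION "
           "SELECT id FROM c WHERE close_dt BETWEEN :s AND :e")
# ...versus the single OR query.
or_q = ("SELECT id FROM c WHERE open_dt BETWEEN :s AND :e "
        "OR close_dt BETWEEN :s AND :e")

union_ids = sorted(r[0] for r in con.execute(union_q, p))
or_ids = sorted(r[0] for r in con.execute(or_q, p))
print(union_ids, or_ids)  # -> [1, 2] [1, 2]  (row 1 is not duplicated by UNION)
```

The advantage of the UNION form is that each branch has a single sargable range predicate, so each branch can use its own index instead of forcing a full scan.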
Re: Query Optimization
Andrew,

Generally there is only 1 user performing the complicated SELECT query at a time; however, the background process that fills the table is constantly doing a fast SELECT (0.3 seconds) and a subsequent UPDATE. Basically, whenever a connection is closed on the firewall, the bg process SELECTs from the table the last record that was opened (and hasn't already been closed) for a given source IP, source port, dest IP, and protocol. So for example whenever the firewall logs a CONNECTION CLOSED message, the bg process does:

select id from sonicwall_connections where src_address = w.x.y.z and src_port = x and dst_address = w.x.y.z and protocol = x ORDER BY open_dt LIMIT 0, 1

then it UPDATEs the close_dt column for the record with the selected ID. These select/update statements being run by the background process generally take 0.000x seconds each. The only slowdown we see is with the complicated SELECT I have below, and this is only executed once in a while throughout the day. For example, if an IT manager sees a huge bandwidth spike on the network, he may access the web gui to determine what the source IP is so he can determine who is downloading a large file, etc.

I think what I will try to do is create 60 partitions for the table in question, based on month and day. This way each day will go into its own partition, and if someone runs the query for a date range such as 01/01/2011 - 01/07/2011 it will only need to access 7 partitions instead of the entire table. My only question with this is: how would I go about creating the table with 60 partitions in such a way that I won't need to constantly add/remove new/old partitions every day? Is there any partitioning function I can use that would not require me to update the partition schema every day? I already plan to have a cron run each night to purge records older than 60 days from the database.
On 09/08/2011 03:26 PM, Andrew Moore wrote:

Partitioning isn't a bad idea for this, however I'm still thinking about your dataset size and possible hardware limitations. It's not likely going to fit into the relevant buffers/memory, so you're going to be on disk more than you want.

On Thu, Sep 8, 2011 at 8:04 PM, Brandon Phelps bphe...@gls.com wrote:

Mihail, I have considered this but have not yet determined how best to go about partitioning the table.
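One answer to the "rotating partitions without nightly schema changes" question above is hash partitioning on a day number, e.g. MySQL 5.1's `PARTITION BY HASH(TO_DAYS(open_dt)) PARTITIONS 60` (an approach to verify against the partitioning chapter of the manual, not something tested in this thread). A minimal Python sketch of the bucket arithmetic, with `date.toordinal()` standing in for `TO_DAYS()`:

```python
from datetime import date, timedelta

# Rotating day-to-partition mapping: each calendar day hashes to one of 60
# buckets, consecutive days land in consecutive buckets, and a bucket is
# automatically reused 60 days later -- so no nightly ALTER TABLE is needed.
NUM_PARTITIONS = 60

def partition_for(d: date) -> int:
    return d.toordinal() % NUM_PARTITIONS

day0 = date(2011, 1, 1)
parts = [partition_for(day0 + timedelta(days=i)) for i in range(120)]

# 60 consecutive days occupy 60 distinct buckets; day 60 wraps back to day 0's.
print(parts[0] == parts[60], len(set(parts[:60])))  # -> True 60
```

One caveat worth checking before committing to this: MySQL's pruning of HASH partitions for date *range* scans is more limited than pruning of RANGE partitions, so the manual's partition-pruning section is worth a read first.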
Re: Query Optimization
Mihail,

Thanks so much! I modified your example to include the proper ORDER BY and LIMIT clauses and this, so far, is running super fast (0.0007 seconds). Question, if a record's open_dt is between the range AND the close_dt is between the range as well, will the UNION output the record twice? If so, is there any way to prevent that?

(SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58')
UNION
(SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58')
ORDER BY rcvd DESC LIMIT 10;

+------+--------------+------------+--------+--------------------------------------------+--------------+---------+--------------------+------+----------------+
| id   | select_type  | table      | type   | possible_keys                              | key          | key_len | ref                | rows | Extra          |
+------+--------------+------------+--------+--------------------------------------------+--------------+---------+--------------------+------+----------------+
|    1 | PRIMARY      | sc         | range  | open_dt,ndx_open_close_rcvd,ndx_open_close | open_dt      | 8       | NULL               | 1057 | Using where    |
|    1 | PRIMARY      | spm        | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.src_port |    1 |                |
|    1 | PRIMARY      | dpm        | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.dst_port |    1 |                |
|    2 | UNION        | sc         | range  | ndx_close_dt                               | ndx_close_dt | 8       | NULL               | 1131 | Using where    |
|    2 | UNION        | spm        | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.src_port |    1 |                |
|    2 | UNION        | dpm        | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.dst_port |    1 |                |
| NULL | UNION RESULT | <union1,2> | ALL    | NULL                                       | NULL         | NULL    | NULL               | NULL | Using filesort |
+------+--------------+------------+--------+--------------------------------------------+--------------+---------+--------------------+------+----------------+
On 09/08/2011 03:45 PM, Mihail Manolov wrote:

How about:

SELECT ... FROM sonicwall_connections AS sc ... WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'
UNION
SELECT ... FROM sonicwall_connections AS sc ... WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'
Re: Query Optimization
From the manual: The default behavior for UNION is that duplicate rows are removed from the result.

On Sep 8, 2011, at 4:50 PM, Brandon Phelps wrote:

Mihail, Thanks so much! I modified your example to include the proper ORDER BY and LIMIT clauses and this, so far, is running super fast (0.0007 seconds). Question, if a record's open_dt is between the range AND the close_dt is between the range as well, will the UNION output the record twice? If so, is there any way to prevent that?
Re: Query Optimization
Ah I see. Well thanks for your assistance! -Brandon

On 09/08/2011 05:21 PM, Mihail Manolov wrote:

From the manual: The default behavior for UNION is that duplicate rows are removed from the result.
Re: Query Optimization
On Aug 30, 2011 6:46 PM, Brandon Phelps wrote: SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA( sc.src_address ) AS src_address, sc.src_port, INET_NTOA( sc.dst_address ) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM firewall_connections AS sc If this is a firewall connection log I presume open_dt is the time a connection was opened and is always going to be less than close_dt. Right? WHERE (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00') AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53') In that case your logic here simplifies to: WHERE open_dt >= '2011-08-30 00:00:00' AND close_dt <= '2011-08-30 12:36:53' Now add an index over open_dt and close_dt and see what happens. Jochem
Re: Query Optimization
On 09/01/2011 04:59 AM, Jochem van Dieten wrote: SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA( sc.src_address ) AS src_address, sc.src_port, INET_NTOA( sc.dst_address ) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM firewall_connections AS sc If this is a firewall connection log I presume open_dt is the time a connection was opened and is always going to be less than close_dt. Right? WHERE (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00') AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53') In that case your logic here simplifies to: WHERE open_dt >= '2011-08-30 00:00:00' AND close_dt <= '2011-08-30 12:36:53' Now add an index over open_dt and close_dt and see what happens. Jochem

Jochem, I can't really use your WHERE logic because I also need to retrieve results where the open_dt time is out of the range specified. For example, a very large file download might span multiple days, so given your logic, if the connection was started 2 days ago and I want to pull 1 day's worth of connections, I would miss that entry. Basically I want to SELECT all of the records that were opened OR closed during the specified time period, i.e. if any activity happened between my start and end dates, I need to see that record. Any other ideas? Thanks again, Brandon -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql?unsub=arch...@jab.org
Re: Query Optimization
On 9/1/2011 09:42, Brandon Phelps wrote: On 09/01/2011 04:59 AM, Jochem van Dieten wrote: ... WHERE (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00') AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53') In that case your logic here simplifies to: WHERE open_dt >= '2011-08-30 00:00:00' AND close_dt <= '2011-08-30 12:36:53' Now add an index over open_dt and close_dt and see what happens. Jochem Jochem, I can't really use your WHERE logic because I also need to retrieve results where the open_dt time is out of the range specified. For example, a very large file download might span multiple days so given your logic if the connection was started 2 days ago and I want to pull 1 day's worth of connections, I would miss that entry. Basically I want to SELECT all of the records that were opened OR closed during the specified time period, i.e. if any activity happened between my start and end dates, I need to see that record. Any other ideas? I believe Jochem was on the right track but he got his dates reversed. Let's try a little ASCII art to show the situation. I will setup a query window with two markers (s) and (e). Events will be marked by |---| markers showing their durations.

a)         (s)                  (e)
b) |---|
c)       |---|
d)             |---|
e) |------------------------------|
f)                          |---|
g)                                  |---|

To describe these situations: a) is the window for which you want to query (s) is the starting time and (e) is the ending time for the date range you are interested in. b) the event starts and stops before your window exists. It won't be part of your results. c) the event starts before the window but ends within the window - include this d) the event starts and ends within the window - include this e) the event starts before the window and ends after the window - include this f) the event starts inside the window but ends beyond the window - include this. g) the event starts and ends beyond the window - exclude this. 
In order to get every event in the range of c-f, here is what you need for a WHERE clause: WHERE start <= (ending time) AND end >= (starting time) Try that and let us know the results. -- Shawn Green MySQL Principal Technical Support Engineer Oracle USA, Inc. - Hardware and Software, Engineered to Work Together. Office: Blountville, TN -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql?unsub=arch...@jab.org
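Shawn's two-comparison overlap test can be sanity-checked with a small sketch. This uses Python and SQLite as a stand-in for MySQL, and the table, column names, and sample rows are invented for the demo; the predicate itself is the one from this thread (an event overlaps the window exactly when it starts before the window ends AND ends after the window starts):

```python
import sqlite3

# Events b-g mirror the ASCII-art cases: b is entirely before the
# window, g entirely after, and c/d/e/f overlap it in different ways.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (name TEXT, open_dt TEXT, close_dt TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("b", "2011-08-29 00:00", "2011-08-29 12:00"),  # before window
        ("c", "2011-08-29 23:00", "2011-08-30 01:00"),  # ends inside
        ("d", "2011-08-30 02:00", "2011-08-30 03:00"),  # fully inside
        ("e", "2011-08-29 00:00", "2011-08-31 00:00"),  # spans window
        ("f", "2011-08-30 11:00", "2011-08-30 13:00"),  # starts inside
        ("g", "2011-08-30 14:00", "2011-08-30 15:00"),  # after window
    ],
)
s, e = "2011-08-30 00:00", "2011-08-30 12:36"  # query window (s) .. (e)
rows = conn.execute(
    "SELECT name FROM events WHERE open_dt <= ? AND close_dt >= ?"
    " ORDER BY name",
    (e, s),
).fetchall()
print([r[0] for r in rows])  # ['c', 'd', 'e', 'f']
```

Only cases b and g are excluded, which is exactly the c-f range Shawn describes.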
Re: Query Optimization
On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote: On 9/1/2011 09:42, Brandon Phelps wrote: On 09/01/2011 04:59 AM, Jochem van Dieten wrote: ... WHERE (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00') AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53') In that case your logic here simplifies to: WHERE open_dt >= '2011-08-30 00:00:00' AND close_dt <= '2011-08-30 12:36:53' Now add an index over open_dt and close_dt and see what happens. Jochem Jochem, I can't really use your WHERE logic because I also need to retrieve results where the open_dt time is out of the range specified. For example, a very large file download might span multiple days so given your logic if the connection was started 2 days ago and I want to pull 1 day's worth of connections, I would miss that entry. Basically I want to SELECT all of the records that were opened OR closed during the specified time period, i.e. if any activity happened between my start and end dates, I need to see that record. Any other ideas? I believe Jochem was on the right track but he got his dates reversed. Let's try a little ASCII art to show the situation. I will setup a query window with two markers (s) and (e). Events will be marked by |---| markers showing their durations.

a)         (s)                  (e)
b) |---|
c)       |---|
d)             |---|
e) |------------------------------|
f)                          |---|
g)                                  |---|

To describe these situations: a) is the window for which you want to query (s) is the starting time and (e) is the ending time for the date range you are interested in. b) the event starts and stops before your window exists. It won't be part of your results. c) the event starts before the window but ends within the window - include this d) the event starts and ends within the window - include this e) the event starts before the window and ends after the window - include this f) the event starts inside the window but ends beyond the window - include this. g) the event starts and ends beyond the window - exclude this. 
In order to get every event in the range of c-f, here is what you need for a WHERE clause: WHERE start <= (ending time) AND end >= (starting time) Try that and let us know the results.

Thanks Jochem and Shawn, however the following two queries result in the exact same EXPLAIN output: (I hope the tables don't wrap too early for you)

Old method:
SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM firewall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port WHERE (open_dt >= '2011-08-31 09:53:31' OR close_dt >= '2011-08-31 09:53:31') AND (open_dt <= '2011-09-01 09:53:31' OR close_dt <= '2011-09-01 09:53:31') ORDER BY rcvd DESC LIMIT 0, 10;

New method with BTREE index on open_dt, close_dt (index name is ndx_open_close_dt):
SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM firewall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port WHERE open_dt <= '2011-09-01 09:53:31' AND close_dt >= '2011-08-31 09:53:31' ORDER BY rcvd DESC LIMIT 0, 10;

EXPLAIN output for old method:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | sc | index | open_dt,ndx_open_close_dt | ndx_rcvd | 4 | NULL | 10 | Using where |
| 1 | SIMPLE | spm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.src_port | 1 | |
| 1 | SIMPLE | dpm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.dst_port | 1 | |

EXPLAIN output for new method with new index:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
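Why the two query forms above produce the same plan makes more sense once you see that, whenever open_dt <= close_dt (a connection opens before it closes), the OR-based filter and the two-comparison overlap filter select exactly the same rows. A quick mechanical check, in Python/SQLite standing in for MySQL with an invented table and random data:

```python
import random
import sqlite3

# Generate 500 intervals with open_dt <= close_dt (integers stand in
# for datetimes; only ordering matters) and compare the two filters.
random.seed(1)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fw (id INTEGER PRIMARY KEY, open_dt INT, close_dt INT)"
)
rows = []
for i in range(500):
    a, b = sorted(random.sample(range(1000), 2))
    rows.append((i, a, b))
conn.executemany("INSERT INTO fw VALUES (?, ?, ?)", rows)

s, e = 300, 700  # query window
old = conn.execute(
    "SELECT id FROM fw WHERE (open_dt >= ? OR close_dt >= ?)"
    " AND (open_dt <= ? OR close_dt <= ?) ORDER BY id",
    (s, s, e, e),
).fetchall()
new = conn.execute(
    "SELECT id FROM fw WHERE open_dt <= ? AND close_dt >= ? ORDER BY id",
    (e, s),
).fetchall()
print(old == new)  # True
```

The difference is purely about what the optimizer can do with each form: the OR version cannot use a simple range over one index, while the AND version can.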
Re: Query Optimization
There are a few things going on, but mainly it is the ORs that are killing you. As you require OR to examine two distinct columns, both of equal relevance to the query, MySQL is left with no choice but to do a full table scan on what might be (at a guess) a very large table. No amount of indexing will fix this for the query presented. You would be better off writing it as two distinct queries, each concerned with conditions on a single column (open_dt and close_dt) and then UNIONing the results. In this form, the indexes have a chance of being engaged. Once the indexes are engaged, you probably want to convert your DATE/DATETIME strings into actual DATEs or DATETIMEs, thus: ... (open_dt >= DATE('2011-08-30 00:00:00')) ... In its current state, the DATE fields are being converted to strings implicitly for every row tested, which further frustrates index usage as the index is against the quasi-numeric DATE, not the string representation which your current implementation appears to expect. This query would also suddenly begin to fail entirely if the DEFAULT_DATE_FORMAT gets modified by an admin or a future release of MySQL. 
The explicit casting I have suggested will protect you against that. - michael dykman On Tue, Aug 30, 2011 at 12:45 PM, Brandon Phelps bphe...@gls.com wrote: Hello, I have the following query I'd like to optimize a bit: SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA( sc.src_address ) AS src_address, sc.src_port, INET_NTOA( sc.dst_address ) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM firewall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port WHERE (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00') AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53') ORDER BY rcvd DESC LIMIT 0, 10 Currently I have an index on the rcvd column which seems to be working based on the output of EXPLAIN: id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE sc index open_dt ndx_rcvd 4 NULL 10 Using where 1 SIMPLE spm eq_ref PRIMARY PRIMARY 2 syslog.sc.src_port 1 1 SIMPLE dpm eq_ref PRIMARY PRIMARY 2 syslog.sc.dst_port 1 However the query is still fairly slow for some reason, any ideas how I could speed it up with additional indexes, etc? The values I am using in the WHERE clause are variable and change each time. The table has around 23 million records right now but is expected to continue to grow up to a potential 150 million. 
Here is the table schema: CREATE TABLE IF NOT EXISTS `firewall_connections` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT, `open_dt` datetime NOT NULL, `close_dt` datetime NOT NULL, `protocol` smallint(6) NOT NULL, `src_address` int(10) unsigned NOT NULL, `src_port` smallint(5) unsigned NOT NULL, `dst_address` int(10) unsigned NOT NULL, `dst_port` smallint(5) unsigned NOT NULL, `sent` int(10) unsigned NOT NULL, `rcvd` int(10) unsigned NOT NULL, PRIMARY KEY (`id`), KEY `ndx_rcvd` (`rcvd`), KEY `ndx_sent` (`sent`) ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci; Thanks in advance! -- Brandon -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql?unsub=mdyk...@gmail.com -- - michael dykman - mdyk...@gmail.com May the Source be with you.
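Michael's OR-to-UNION rewrite can be sketched concretely. This is Python/SQLite standing in for MySQL, with a trimmed-down version of the firewall_connections schema and invented sample rows and index names; the point is that each arm of the UNION filters on a single indexed column, and UNION (as opposed to UNION ALL) removes the row that matches both arms:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE firewall_connections (
        id INTEGER PRIMARY KEY,
        open_dt TEXT NOT NULL,
        close_dt TEXT NOT NULL,
        rcvd INT NOT NULL
    );
    CREATE INDEX ndx_open ON firewall_connections (open_dt);
    CREATE INDEX ndx_close ON firewall_connections (close_dt);
""")
conn.executemany(
    "INSERT INTO firewall_connections (open_dt, close_dt, rcvd)"
    " VALUES (?, ?, ?)",
    [("2011-08-29 10:00", "2011-08-30 01:00", 10),   # closed in window
     ("2011-08-30 02:00", "2011-08-30 03:00", 99),   # both in window
     ("2011-08-30 11:00", "2011-08-31 02:00", 5)],   # opened in window
)
sql = """
    SELECT id, rcvd FROM firewall_connections
     WHERE open_dt BETWEEN ? AND ?
    UNION
    SELECT id, rcvd FROM firewall_connections
     WHERE close_dt BETWEEN ? AND ?
     ORDER BY rcvd DESC
"""
w = ("2011-08-30 00:00", "2011-08-30 12:36")
rows = conn.execute(sql, w + w).fetchall()
print(rows)  # row 2 matches both arms but appears only once
```

Each arm is a straightforward range scan over its own index, which is exactly the shape the earlier UNION thread (the one that got to 0.0007 seconds) ended up with.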
Re: Query Optimization
Thanks Singer, this took my query down to 0.0007, perfect! I wasn't aware a single index of multiple columns would work when one of the columns was in the WHERE clause and the other in the ORDER BY clause. Learn something new every day I guess! On 08/10/2011 02:03 PM, Singer X.J. Wang wrote: Try an index on (dst_port, close_dt) On Wed, Aug 10, 2011 at 14:01, Brandon Phelps bphe...@gls.com mailto:bphe...@gls.com wrote: Hello all, I am using the query below and variations of it to query a database with a TON of records. Currently the database has around 11 million records but it grows every day and should cap out at around 150 million. I am curious if there is any way I can better optimize the below query, as currently it takes this query around 10 seconds to run but I am sure this will get slower and slower as the database grows. SELECT open_dt, close_dt, protocol, INET_NTOA(src_address) AS src_address, src_port, INET_NTOA(dst_address) AS dst_address, dst_port, sent, rcvd FROM connections WHERE dst_port = 80 ORDER BY close_dt DESC LIMIT 0, 30 I do have an index on the dst_port column, as you can see by the output of EXPLAIN: id 1 select_type SIMPLE table connections type ref possible_keys dst_port key dst_port key_len 2 ref const rows 1109273 Extra Using where; Using filesort Thanks in advance, -- Brandon -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql?unsub=w...@singerwang.com -- The best compliment you could give Pythian for our service is a referral. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql?unsub=arch...@jab.org
Re: Query Optimization
On 8/10/2011 1:01 PM, Brandon Phelps wrote: Hello all, I am using the query below and variations of it to query a database with a TON of records. Currently the database has around 11 million records but it grows every day and should cap out at around 150 million. I am curious if there is any way I can better optimize the below query, as currently it takes this query around 10 seconds to run but I am sure this will get slower and slower as the database grows. SELECT open_dt, close_dt, protocol, INET_NTOA(src_address) AS src_address, src_port, INET_NTOA(dst_address) AS dst_address, dst_port, sent, rcvd FROM connections WHERE dst_port = 80 ORDER BY close_dt DESC LIMIT 0, 30 I do have an index on the dst_port column, as you can see by the output of EXPLAIN: id 1 select_type SIMPLE table connections type ref possible_keys dst_port key dst_port key_len 2 ref const rows 1109273 Extra Using where; Using filesort Did you try adding your ORDER BY argument close_dt to the index? PB - Thanks in advance, -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql?unsub=arch...@jab.org
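The fix that ended this thread — a composite index covering both the WHERE equality and the ORDER BY column — can be illustrated with a rough sketch. This uses Python/SQLite rather than MySQL, and the index name ndx_port_close plus all data are invented; the idea carries over because both engines can walk a (dst_port, close_dt) index to satisfy `WHERE dst_port = 80 ORDER BY close_dt DESC` without a separate sort pass (the "Using filesort" in the EXPLAIN above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE connections (id INTEGER PRIMARY KEY,"
    " dst_port INT, close_dt TEXT)"
)
# Composite index: equality column first, then the sort column.
conn.execute(
    "CREATE INDEX ndx_port_close ON connections (dst_port, close_dt)"
)
conn.executemany(
    "INSERT INTO connections (dst_port, close_dt) VALUES (?, ?)",
    [(80, "2011-08-10 10:00"), (443, "2011-08-10 10:05"),
     (80, "2011-08-10 10:10"), (80, "2011-08-10 09:00")],
)
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT close_dt FROM connections"
    " WHERE dst_port = 80 ORDER BY close_dt DESC LIMIT 30"
).fetchall()
# The plan should reference the composite index, and no sort step.
print(any("ndx_port_close" in row[-1] for row in plan))  # True
```

Ordering matters: (dst_port, close_dt) works here, while (close_dt, dst_port) would not serve the equality lookup first.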
Re: Query Optimization
From: Brandon Phelps bphe...@gls.com I am curious if there is any way I can better optimize the below query, as currently it takes this query around 10 seconds to run but I am sure this will get slower and slower as the database grows. You need an index on `close_dt`. SELECT open_dt, close_dt, protocol, INET_NTOA(src_address) AS src_address, src_port, INET_NTOA(dst_address) AS dst_address, dst_port, sent, rcvd FROM connections WHERE dst_port = 80 ORDER BY close_dt DESC LIMIT 0, 30 Current farmers, who have become mere operators of machines and mixers of chemicals, may not have the skills to develop a local, sustainable agriculture. A new generation of farmers, numbering in the tens of millions, will need to be trained and relocated to rural communities. -- Pat Murphy Jan Steinman, EcoReality Co-op -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql?unsub=arch...@jab.org
RE: query optimization question (my struggle against 'using temporary; using filesort')
Hi Ciaran, So I think there's a couple of things going on: 1. The explain plan for your slow query looks wrong, as if mysql is confused. It's possible your index statistics are incorrect. Try ANALYZE TABLE on listings and addresses. I think a sure way to fix it is to add STRAIGHT_JOIN to force the join order. That should get rid of the temp table and filesort operations and give faster results. SELECT STRAIGHT_JOIN listings.*, addresses.* FROM `listings` JOIN `addresses` ON `addresses`.addressable_id = `listings`.id AND `addresses`.addressable_type = 'Listing' WHERE (addresses.parent_city_id = 3) ORDER BY listings.id DESC LIMIT 1 2. I need to make some comments about your table design: This column is AWFUL: `addressable_type` varchar(255) DEFAULT NULL, Why have a field that holds up to 255 characters and put a little string in it like Listing? Why does it matter? Well it makes your indexes disastrously bloated: KEY `index_addresses_on_parent_city_id_and_addressable_type` (`parent_city_id`,`addressable_type`), If you noticed in the explain plan, that index is 773 bytes *per row*. Especially using utf8 means each character takes 3 bytes in the index. That's terrible. That type field should be a foreign key tinyint or at the very least be a much much shorter varchar field (such as 8 or 10) You have lots of those varchar(255) fields, which looks like lazy design -- by not gathering correct requirements and designing accordingly you will hurt your database performance, waste disk space and cause yourself all kinds of future problems. 3. Why are you using OUTER JOIN? It looks to me like you're using it because you don't know the difference, since you're not looking for NULL rows or anything. In fact, it looks like mysql is smart enough to know that you've negated the OUTER JOIN by putting conditions on the joined tables in the WHERE clause, and convert them to INNER JOINs. Don't rely on that! Use the correct join type. 
Those queries Regards, Gavin Towey -Original Message- From: Ciaran Lee [mailto:ciaran@gmail.com] Sent: Tuesday, September 22, 2009 1:32 PM To: mysql@lists.mysql.com Subject: query optimization question (my struggle against 'using temporary; using filesort') Hi, I hope this is the right place to ask a question about query optimization. Background: I have a database which has events, which occur in places (listings). Places have addresses, and addresses belong to a city. I can select the latest event within a particular city very efficiently (less than 2ms), but selecting the latest listing within a city is REALLY slow (10-20 seconds) despite being almost a subset of the event query. I have been working on this for about a day, and have tried all sorts of tweaks to the indexes but to no avail. I always seem to end up with 'using temporary; using filesort' as the 'extra' content in the explain result. If anyone has a suggestion for what I might do to fix this, I'd really appreciate it. If not, I could further de-normalize the database for performance reasons, but I would feel dirty for doing so. Here is the fast query (select the latest event within a particular city), and its EXPLAIN. 
SELECT events.*, listings.*, addresses.* FROM `events` LEFT OUTER JOIN `listings` ON `listings`.id = `events`.listing_id LEFT OUTER JOIN `addresses` ON `addresses`.addressable_id = `listings`.id AND `addresses`.addressable_type = 'Listing' WHERE (addresses.parent_city_id = 3) ORDER BY events.id DESC LIMIT 1 ++-+---++++-+--+--+-+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | ++-+---++++-+--+--+-+ | 1 | SIMPLE | events| index | index_events_on_listing_id | PRIMARY | 4 | NULL |1 | | | 1 | SIMPLE | listings | eq_ref | PRIMARY | PRIMARY | 4 | ratemyarea.events.listing_id |1 | Using where | | 1 | SIMPLE | addresses | ref| index_addresses_on_parent_city_id_and_addressable_type,index_addresses_on_addressable_type_and_addressable_id,addressable_id_type_city | index_addresses_on_addressable_type_and_addressable_id | 773 | const, ratemyarea.listings.id |1 | Using
Re: query optimization question (my struggle against 'using temporary; using filesort')
Hi Gavin, Thanks very much, I'll implement as many of your suggestions as possible. The varchar(255)'s are inexcusable and I feel suitably ashamed :) The queries were generated by ActiveRecord (an ORM library for Ruby), although even if I had written them myself they would probably not be much better. Regards, Ciaran Lee 2009/9/24 Gavin Towey gto...@ffn.com Hi Ciaran, So I think there's a couple of things going on: 1. The explain plan for your slow query looks wrong, as if mysql is confused. It's possible your index statistics are incorrect. Try ANALYZE TABLE on listings and addresses. I think a sure way to fix it is to add STRAIGHT_JOIN to force the join order. That should get rid of the temp table and filesort operations and give faster results. SELECT STRAIGHT_JOIN listings.*, addresses.* FROM `listings` JOIN `addresses` ON `addresses`.addressable_id = `listings`.id AND `addresses`.addressable_type = 'Listing' WHERE (addresses.parent_city_id = 3) ORDER BY listings.id DESC LIMIT 1 2. I need to make some comments about your table design: This column is AWFUL: `addressable_type` varchar(255) DEFAULT NULL, Why have a field that holds up to 255 characters and put a little string in it like Listing? Why does it matter? Well it makes your indexes disastrously bloated: KEY `index_addresses_on_parent_city_id_and_addressable_type` (`parent_city_id`,`addressable_type`), If you noticed in the explain plan, that index is 773 bytes *per row*. Especially using utf8 means each character takes 3 bytes in the index. That's terrible. That type field should be a foreign key tinyint or at the very least be a much much shorter varchar field (such as 8 or 10) You have lots of those varchar(255) fields, which looks like lazy design -- by not gathering correct requirements and designing accordingly you will hurt your database performance, waste disk space and cause yourself all kinds of future problems. 3. Why are you using OUTER JOIN? 
It looks to me like you're using it because you don't know the difference, since you're not looking for NULL rows or anything. In fact, it looks like mysql is smart enough to know that you've negated the OUTER JOIN by putting conditions on the joined tables in the WHERE clause, and convert them to INNER JOINs. Don't rely on that! Use the correct join type. Those queries Regards, Gavin Towey -Original Message- From: Ciaran Lee [mailto:ciaran@gmail.com] Sent: Tuesday, September 22, 2009 1:32 PM To: mysql@lists.mysql.com Subject: query optimization question (my struggle against 'using temporary; using filesort') Hi, I hope this is the right place to ask a question about query optimization. Background: I have a database which has events, which occur in places (listings). Places have addresses, and addresses belong to a city. I can select the latest event within a particular city very efficiently (less than 2ms), but selecting the latest listing within a city is REALLY slow (10-20 seconds) despite being almost a subset of the event query. I have been working on this for about a day, and have tried all sorts of tweaks to the indexes but to no avail. I always seem to end up with 'using temporary; using filesort' as the 'extra' content in the explain result. If anyone has a suggestion for what I might do to fix this, I'd really appreciate it. If not, I could further de-normalize the database for performance reasons, but I would feel dirty for doing so. Here is the fast query (select the latest event within a particular city), and its EXPLAIN. 
SELECT events.*, listings.*, addresses.* FROM `events` LEFT OUTER JOIN `listings` ON `listings`.id = `events`.listing_id LEFT OUTER JOIN `addresses` ON `addresses`.addressable_id = `listings`.id AND `addresses`.addressable_type = 'Listing' WHERE (addresses.parent_city_id = 3) ORDER BY events.id DESC LIMIT 1 ++-+---++++-+--+--+-+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | ++-+---++++-+--+--+-+ | 1 | SIMPLE | events| index | index_events_on_listing_id | PRIMARY | 4 | NULL |1 | | | 1 | SIMPLE | listings | eq_ref |
Re: Query Optimization
The index hint is not in production code.. I was trying to force it to use the index even when using the OR clause.. meant to take that out before I sent the email. The table structure is: CREATE TABLE `customer` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT, `ssn` varchar(32) DEFAULT NULL, PRIMARY KEY (`id`), KEY `idx_ssn` (`ssn`) ) ENGINE=InnoDB CREATE TABLE `customer_id` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT, `customer_id` int(10) unsigned DEFAULT NULL, `id_num` varchar(32) DEFAULT NULL, PRIMARY KEY (`id`), KEY `customer_key` (`customer_id`), KEY `id_id_num` (`id_num`) ) ENGINE=InnoDB The explain output of the query using the OR clause: *** 1. row *** id: 1 select_type: SIMPLE table: customer type: ALL possible_keys: idx_ssn key: NULL key_len: NULL ref: NULL rows: 176680 Extra: Using where; Using temporary *** 2. row *** id: 1 select_type: SIMPLE table: customer_id type: ref possible_keys: customer_key key: customer_key key_len: 5 ref: aca_ecash.customer.id rows: 1 Extra: Using where; Distinct 2 rows in set (0.00 sec) Using a UNION results in: *** 1. row *** id: 1 select_type: PRIMARY table: customer type: range possible_keys: idx_ssn key: idx_ssn key_len: 35 ref: NULL rows: 1 Extra: Using where; Using temporary *** 2. row *** id: 1 select_type: PRIMARY table: customer_id type: ref possible_keys: customer_key key: customer_key key_len: 5 ref: aca_ecash.customer.id rows: 1 Extra: Using index; Distinct *** 3. row *** id: 2 select_type: UNION table: customer_id type: range possible_keys: customer_key,id_id_num key: id_id_num key_len: 35 ref: NULL rows: 1 Extra: Using where; Using temporary *** 4. row *** id: 2 select_type: UNION table: customer type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: aca_ecash.customer_id.customer_id rows: 1 Extra: Using where *** 5. 
row *** id: NULL select_type: UNION RESULT table: union1,2 type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: NULL Extra: 5 rows in set (0.01 sec) The union is much faster. I've tested the same search for ID numbers on our test system (Windows 32-Bit, 2GB ram, P4 3Ghz) against the production system (RHEL 64-Bit 16GB ram, Dual Xeon 2Ghz).. the search in the test system is almost instant, while on the production system it's taking 4 to 6 seconds. There's not much traffic today on it. I'm going to put the UNION into production and see how it goes. Thanks for the replies. -johnny On Tue, Jan 13, 2009 at 7:39 PM, Andrew Garner andrew.b.gar...@gmail.comwrote: On Tue, Jan 13, 2009 at 7:07 PM, Baron Schwartz ba...@xaprb.com wrote: If you have separate indexes on ssn and id_num, MySQL may be able to efficiently use an index merge optimization . See http://dev.mysql.com/doc/refman/5.0/en/index-merge-optimization.html. This is only in 5.0+ - on older versions of MySQL you may find a union more efficient. And in newer versions, too. The optimizer frequently underestimates the cost of the merge operation and the required random I/O for row lookups. So, yes it can use an index merge, but... efficiency is another question. I've seen table scans outperform a two-way index merge by orders of magnitude. These appeared to be high selectivity indexes, but perhaps I assumed too much. :) -- - Johnny Withers 601.209.4985 joh...@pixelated.net
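The rewrite Johnny settled on — replacing an OR that spans two tables with a UNION of two independently-indexed lookups — can be sketched as follows. Python/SQLite stands in for MySQL here, and the sample data is invented; the check simply confirms the two forms return the same customer ids, which is the correctness half of the rewrite (the speed half is that each UNION arm uses its own index instead of scanning the join):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, ssn TEXT);
    CREATE INDEX idx_ssn ON customer (ssn);
    CREATE TABLE customer_id (
        id INTEGER PRIMARY KEY, customer_id INT, id_num TEXT);
    CREATE INDEX id_id_num ON customer_id (id_num);
    -- Customer 1 matches on ssn; customer 2 matches on an id_num.
    INSERT INTO customer VALUES (1, '123456789'), (2, '999999999');
    INSERT INTO customer_id VALUES (1, 2, '123456789'), (2, 1, '555');
""")
or_form = conn.execute("""
    SELECT DISTINCT customer.id FROM customer
      LEFT JOIN customer_id ON customer.id = customer_id.customer_id
     WHERE ssn = '123456789' OR id_num = '123456789'
     ORDER BY customer.id
""").fetchall()
union_form = conn.execute("""
    SELECT id FROM customer WHERE ssn = '123456789'
    UNION
    SELECT customer.id FROM customer
      JOIN customer_id ON customer.id = customer_id.customer_id
     WHERE id_num = '123456789'
    ORDER BY id
""").fetchall()
print(or_form, union_form)  # both find customers 1 and 2
```

Note the second arm can use an inner JOIN: a row only qualifies when a matching customer_id row exists, so the LEFT JOIN was never doing anything for that branch.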
Re: Query Optimization
Try a union instead of an or condition. http://dev.mysql.com/doc/refman/5.0/en/union.html Johnny Withers wrote: I have the following tables: Customer: id,ssn Customer_Id: id,customer_id,id_num The customer table holds customers along with their SSN and the customer_id table holds identifications for each customer (Driver's License, State Issued ID, Student ID, etc). The SSN column from the customer table is VARCHAR(9) and the id_num column from the customer_id table is VARCHAR(32). Both of these columns have an index on them. The following query uses the index on customer.ssn and executes in 0ms: SELECT SQL_NO_CACHE customer.id,customer.ssn,customer_id,id_num FROM customer USE INDEX(idx_ssn) LEFT JOIN customer_id ON customer.id=customer_id.customer_id WHERE ssn='123456789'; Explain output: *** 1. row *** id: 1 select_type: SIMPLE table: customer type: ref possible_keys: idx_ssn key: idx_ssn key_len: 35 ref: const rows: 1 Extra: Using where; Using index *** 2. row *** id: 1 select_type: SIMPLE table: customer_id type: ref possible_keys: customer_key key: customer_key key_len: 5 ref: aca_ecash.customer.id rows: 1 Extra: Now, this is the query I have trouble with, it does not use the index (or says it does but doesn't?) and on a busy system (200+ queries per sec) can take up to 20 seconds or more to execute: SELECT SQL_NO_CACHE customer.id,customer.ssn,customer_id,id_num FROM customer USE INDEX(idx_ssn) LEFT JOIN customer_id ON customer.id=customer_id.customer_id WHERE ssn='123456789' OR id_num='123456789'; Explain output: *** 1. row *** id: 1 select_type: SIMPLE table: customer type: index possible_keys: idx_ssn key: idx_ssn key_len: 35 ref: NULL rows: 165843 Extra: Using index *** 2. row *** id: 1 select_type: SIMPLE table: customer_id type: ref possible_keys: customer_key key: customer_key key_len: 5 ref: aca_ecash.customer.id rows: 1 Extra: Using where Is there some way I can make it use the index? 
I've thought about redesigning the query to select from the customer_id table first, if a row is found.. just return the matching customer_id from the customer table.. but I wanted to see if maybe i'm going about this the wrong way before I engineer some way around this. Thanks in advance, - Johnny Withers 601.209.4985 joh...@pixelated.net -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql?unsub=arch...@jab.org
Re: Query Optimization
Do you have an index on id_num? What sort of explain output do you get when you don't use a query hint? Your USE INDEX hint may be causing MySQL to ignore a better strategy. If you have separate indexes on ssn and id_num, MySQL may be able to efficiently use an index merge optimization . See http://dev.mysql.com/doc/refman/5.0/en/index-merge-optimization.html. This is only in 5.0+ - on older versions of MySQL you may find a union more efficient. On Mon, Jan 12, 2009 at 9:43 AM, Johnny Withers joh...@pixelated.net wrote: I have the following tables: Customer: id,ssn Customer_Id: id,customer_id,id_num The customer table holds customers along with their SSN and the customer_id table holds identifications for each customer (Driver's License, State Issued ID, Student ID, etc). The SSN column from the customer table is VARCHAR(9) and the id_num column from the customer_id table is VARCHAR(32). Both of these columns have an index on them. The following query uses the index on customer.ssn and executes in 0ms: SELECT SQL_NO_CACHE customer.id,customer.ssn,customer_id,id_num FROM customer USE INDEX(idx_ssn) LEFT JOIN customer_id ON customer.id=customer_id.customer_id WHERE ssn='123456789'; Explain output: *** 1. row *** id: 1 select_type: SIMPLE table: customer type: ref possible_keys: idx_ssn key: idx_ssn key_len: 35 ref: const rows: 1 Extra: Using where; Using index *** 2. row *** id: 1 select_type: SIMPLE table: customer_id type: ref possible_keys: customer_key key: customer_key key_len: 5 ref: aca_ecash.customer.id rows: 1 Extra: Now, this is the query I have trouble with, it does not use the index (or says it does but doesn't?) and on a busy system (200+ queries per sec) can take up to 20 seconds or more to execute: SELECT SQL_NO_CACHE customer.id,customer.ssn,customer_id,id_num FROM customer USE INDEX(idx_ssn) LEFT JOIN customer_id ON customer.id=customer_id.customer_id WHERE ssn='123456789' OR id_num='123456789'; Explain output: *** 1. 
row ***
           id: 1
  select_type: SIMPLE
        table: customer
         type: index
possible_keys: idx_ssn
          key: idx_ssn
      key_len: 35
          ref: NULL
         rows: 165843
        Extra: Using index
*** 2. row ***
           id: 1
  select_type: SIMPLE
        table: customer_id
         type: ref
possible_keys: customer_key
          key: customer_key
      key_len: 5
          ref: aca_ecash.customer.id
         rows: 1
        Extra: Using where

Is there some way I can make it use the index? I've thought about redesigning the query to select from the customer_id table first; if a row is found, just return the matching customer_id from the customer table. But I wanted to see if maybe I'm going about this the wrong way before I engineer some way around it. Thanks in advance, - Johnny Withers 601.209.4985 joh...@pixelated.net -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql?unsub=arch...@jab.org
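For pre-5.0 servers (or when the index merge plan is rejected), the OR across the two differently indexed columns can be rewritten as a union of two single-index lookups, as suggested above. A sketch against the tables described in the post — the exact rewrite is an assumption, not Johnny's final query:

```sql
-- Each branch can use one index (idx_ssn on customer, the id_num index on
-- customer_id); UNION removes customers matched by both predicates.
SELECT customer.id, customer.ssn, customer_id.customer_id, customer_id.id_num
FROM customer
LEFT JOIN customer_id ON customer.id = customer_id.customer_id
WHERE customer.ssn = '123456789'
UNION
SELECT customer.id, customer.ssn, customer_id.customer_id, customer_id.id_num
FROM customer
JOIN customer_id ON customer.id = customer_id.customer_id
WHERE customer_id.id_num = '123456789';
```

UNION (not UNION ALL) is deliberate here: a customer whose ssn and id_num both match would otherwise appear twice.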
Re: Query Optimization
If you have separate indexes on ssn and id_num, MySQL may be able to efficiently use an index merge optimization. See http://dev.mysql.com/doc/refman/5.0/en/index-merge-optimization.html. This is only in 5.0+ - on older versions of MySQL you may find a union more efficient. And in newer versions, too. The optimizer frequently underestimates the cost of the merge operation and the required random I/O for row lookups. So, yes, it can use an index merge, but... efficiency is another question. I've seen table scans outperform a two-way index merge by orders of magnitude.
Re: Query Optimization
On Tue, Jan 13, 2009 at 7:07 PM, Baron Schwartz ba...@xaprb.com wrote: If you have separate indexes on ssn and id_num, MySQL may be able to efficiently use an index merge optimization. See http://dev.mysql.com/doc/refman/5.0/en/index-merge-optimization.html. This is only in 5.0+ - on older versions of MySQL you may find a union more efficient. And in newer versions, too. The optimizer frequently underestimates the cost of the merge operation and the required random I/O for row lookups. So, yes, it can use an index merge, but... efficiency is another question. I've seen table scans outperform a two-way index merge by orders of magnitude. These appeared to be high-selectivity indexes, but perhaps I assumed too much. :)
Re: Query optimization help
First, you might want to move the WHERE ... t3.int_a = some integer condition into the join condition for t3. You're not using anything from t4, so I'm not sure why you have that table in your query. You can suggest or force MySQL to use an index if it's using the wrong one: http://dev.mysql.com/doc/refman/5.0/en/index-hints.html. On very rare occasions I've had to do this. What's happening is that MySQL is analyzing the information it has about the data and indexes and coming to the wrong conclusion, perhaps even opting for an entire table scan. You can run ANALYZE TABLE to force MySQL to update the information it has about the data. This may actually solve your problem. Try SHOW INDEX FROM t1 to see what data MySQL has about the indexes. Sometimes the CARDINALITY (uniqueness) column will be NULL, which can indicate a problem. Posting the result of your EXPLAIN will actually be helpful. Hope that helps. Brent Baisley

On Aug 11, 2008, at 8:26 AM, Jonathan Terhorst wrote:

I have this query:

SELECT DISTINCT t1.string FROM t1
LEFT JOIN t2 ON t1.string=t2.string
LEFT JOIN t3 ON t1.int_a=t3.int_a
LEFT JOIN t4 ON t1.int_b=t4.int_b
WHERE t1.string != '' AND t2.string IS NULL AND t3.int_a = some integer
ORDER BY t1.string ASC

This query is executing slower than it should. EXPLAIN has it using temporary and using filesort. I have indexes on every column in the query, but I think the problem is the one-index-per-table limitation. According to EXPLAIN, there are two possible indices in use for t1 (int_a and string), but only int_a is being used. So I tried constructing a compound index on int_a and string. Although this new index appears in possible_keys, EXPLAIN still shows the key actually being used as int_a. I tried building the compound key in both orders and had the same results. How do I get MySQL to use all possible keys on t1 when running the query? Thanks!
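Brent's first suggestion might look like the sketch below. One subtlety he doesn't spell out: with the t3 filter in the WHERE clause, the LEFT JOIN on t3 already behaves like an inner join, so writing it as an INNER JOIN states the intent directly. The literal 42 merely stands in for "some integer", and dropping t4 assumes it really is unused:

```sql
SELECT DISTINCT t1.string
FROM t1
LEFT JOIN t2 ON t1.string = t2.string   -- anti-join: keep t1 rows with no t2 match
INNER JOIN t3 ON t1.int_a = t3.int_a
             AND t3.int_a = 42          -- filter moved into the join condition
WHERE t1.string != '' AND t2.string IS NULL
ORDER BY t1.string ASC;
```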
Re: Query optimization
-What I'm trying to do: Bit hard to explain. I've got a table consisting of ip addresses (ipv4_src), destination addresses (ipv4_dst), and port numbers (port_dst) and some other irrelevant columns. Ultimately my goal is to find a linear order in a subset of ports. For example, host A connects to B on port 1, 2, 3, 4, ... 20; I would like to find this linear relation. To achieve this, I have written some mathematical formulas. Unfortunately there is one downside to my formula: it can be fooled by adding some extreme values. It cannot find a linear relation in this list: 1, 2, 3, 4, ... 20, 45000. Although there are 20 numbers lined up, the 45000 ruins the result of the algorithm. So the query I've submitted is meant to remove extreme values. The methods boxplot(pF.port_dst,LOW) and boxplot(pF.port_dst,HIGH) calculate in linear time the allowed range of numbers. Extreme values won't be included in this range. So in the example, the range would be [1,20], thereby omitting the value 45000. Finally I would like to filter my table of port numbers for every ipv4_src/ipv4_dst tuple and remove all the numbers not fitting in the range.

-In human readable pseudo code this is the query:

SELECT source,dest,port,octets FROM
(
  SELECT source,dest,boxplot(port,LOW) AS low,boxplot(port,HIGH) AS high
  FROM -- Calculate the LOW and HIGH values for each source,dest pair.
  (
    SELECT source,dest,port,octets FROM ...
    GROUP BY source,dest,port -- This removes the duplicate entries.
  ) pF
  GROUP BY source,dest
) boxplot,
(
  SELECT source,dest,port,octets FROM ...
  GROUP BY source,dest,port -- This removes the duplicate entries (again!).
) filter
WHERE filter.source=boxplot.source AND filter.dest=boxplot.dest
AND filter.port>=boxplot.low AND filter.port<=boxplot.high
-- Relate the tables 'boxplot' and 'filter' to each other AND select only the source,dest,port tuples where port is in the range [LOW,HIGH] from the filter table.
-Here is the original query I would like to optimize again:

SELECT filter.ipv4_src, filter.ipv4_dst, filter.port_dst, filter.octets
FROM
(
  SELECT pF.ipv4_src, pF.ipv4_dst,
         boxplot(pF.port_dst,LOW) AS low, boxplot(pF.port_dst,HIGH) AS high
  FROM
  (
    SELECT ipv4_src, ipv4_dst, port_dst, octets FROM ...
    GROUP BY ipv4_src, ipv4_dst, port_dst
  ) pF
  GROUP BY pF.ipv4_src, pF.ipv4_dst
) boxplot,
(
  SELECT ipv4_src, ipv4_dst, port_dst, octets FROM ...
  GROUP BY ipv4_src, ipv4_dst, port_dst
) filter
WHERE filter.ipv4_src=boxplot.ipv4_src AND filter.ipv4_dst=boxplot.ipv4_dst
AND filter.port_dst>=boxplot.low AND filter.port_dst<=boxplot.high

I'll try the 'EXPLAIN' query tonight, but for that I've got to create a small database first to speed up the results. Thanks in advance for your help.

On Jan 22, 2008 8:15 AM, Sebastian Mendel [EMAIL PROTECTED] wrote: Joris Kinable schrieb: Optimize query I've got one query, which I would like to improve a lot since it takes very long (24 hours) to execute. Here is the idea: 1. Take the table columns ipv4_src, ipv4_dst, port_dst (other columns in this table are not mentioned for clarity) and remove all duplicate tuples. This is done by subquery 'filter'. 2. The same query is performed by the boxplot query, but this time an additional GROUP BY is executed, thereby calculating a user-defined function boxplot(row,type) which returns a double value. 3. Finally the results of the query in step 2 are used to select a subset of results from the 'filter' table. 4. As you can see, the subquery 'pF' used in step 2 is identical to the query 'filter'. It's an extreme waste to calculate the same table twice. I've tried to create a temporary table from filter, but unfortunately MySQL doesn't allow you to access a temporary table twice in the same query. I prefer a 1-query answer, instead of creating views or temporary tables. Is there a way to improve this query, thereby improving the execution time?
Query:

SELECT filter.ipv4_src, filter.ipv4_dst, filter.port_dst, filter.octets
FROM
(
  SELECT pF.ipv4_src, pF.ipv4_dst,
         boxplot(pF.port_dst,LOW) AS low, boxplot(pF.port_dst,HIGH) AS high
  FROM
  (
    SELECT ipv4_src, ipv4_dst, port_dst, octets FROM mytable WHERE prot=6
    GROUP BY ipv4_src, ipv4_dst, port_dst
    ORDER BY ipv4_src, ipv4_dst, port_dst ASC
  ) pF
  GROUP BY pF.ipv4_src, pF.ipv4_dst
  HAVING COUNT(pF.port_dst) > 10
) boxplot,
(
  SELECT ipv4_src, ipv4_dst, port_dst, octets FROM mytable WHERE prot=6
  GROUP BY ipv4_src, ipv4_dst, port_dst
  ORDER BY ipv4_src, ipv4_dst, port_dst ASC
) filter
WHERE filter.ipv4_src=boxplot.ipv4_src AND filter.ipv4_dst=boxplot.ipv4_dst
AND filter.port_dst>=boxplot.low AND filter.port_dst<=boxplot.high

What are you trying to do? And how about formatting your query in a human-readable way? Did you try EXPLAIN? What type of syntax is this: boxplot(pF.port_dst,LOW)? -- Sebastian --
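Joris's "compute the same derived table twice" problem has a direct answer on later servers: from MySQL 8.0, a common table expression lets one query name the deduplicated subquery once and reference it twice. A hedged sketch — boxplot() is the poster's own UDF, so its argument syntax here is an assumption:

```sql
WITH filter AS (
  SELECT ipv4_src, ipv4_dst, port_dst, octets
  FROM mytable
  WHERE prot = 6
  GROUP BY ipv4_src, ipv4_dst, port_dst          -- dedup, computed once
),
bp AS (
  SELECT ipv4_src, ipv4_dst,
         boxplot(port_dst, LOW)  AS low,
         boxplot(port_dst, HIGH) AS high
  FROM filter
  GROUP BY ipv4_src, ipv4_dst
  HAVING COUNT(port_dst) > 10
)
SELECT f.ipv4_src, f.ipv4_dst, f.port_dst, f.octets
FROM filter AS f
JOIN bp ON f.ipv4_src = bp.ipv4_src AND f.ipv4_dst = bp.ipv4_dst
WHERE f.port_dst BETWEEN bp.low AND bp.high;
```

At the time of this thread (MySQL 5.x), the practical choices were a regular scratch table or simply accepting the duplicated subquery.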
Re: Query optimization
Joris Kinable schrieb: Optimize query I've got one query, which I would like to improve a lot since it takes very long (24 hours) to execute. Here is the idea: 1. Take the table columns ipv4_src, ipv4_dst, port_dst (other columns in this table are not mentioned for clarity) and remove all duplicate tuples. This is done by subquery 'filter'. 2. The same query is performed by the boxplot query, but this time an additional GROUP BY is executed, thereby calculating a user-defined function boxplot(row,type) which returns a double value. 3. Finally the results of the query in step 2 are used to select a subset of results from the 'filter' table. 4. As you can see, the subquery 'pF' used in step 2 is identical to the query 'filter'. It's an extreme waste to calculate the same table twice. I've tried to create a temporary table from filter, but unfortunately MySQL doesn't allow you to access a temporary table twice in the same query. I prefer a 1-query answer, instead of creating views or temporary tables. Is there a way to improve this query, thereby improving the execution time?

Query:

SELECT filter.ipv4_src, filter.ipv4_dst, filter.port_dst, filter.octets
FROM
(
  SELECT pF.ipv4_src, pF.ipv4_dst,
         boxplot(pF.port_dst,LOW) AS low, boxplot(pF.port_dst,HIGH) AS high
  FROM
  (
    SELECT ipv4_src, ipv4_dst, port_dst, octets FROM mytable WHERE prot=6
    GROUP BY ipv4_src, ipv4_dst, port_dst
    ORDER BY ipv4_src, ipv4_dst, port_dst ASC
  ) pF
  GROUP BY pF.ipv4_src, pF.ipv4_dst
  HAVING COUNT(pF.port_dst) > 10
) boxplot,
(
  SELECT ipv4_src, ipv4_dst, port_dst, octets FROM mytable WHERE prot=6
  GROUP BY ipv4_src, ipv4_dst, port_dst
  ORDER BY ipv4_src, ipv4_dst, port_dst ASC
) filter
WHERE filter.ipv4_src=boxplot.ipv4_src AND filter.ipv4_dst=boxplot.ipv4_dst
AND filter.port_dst>=boxplot.low AND filter.port_dst<=boxplot.high

What are you trying to do? And how about formatting your query in a human-readable way? Did you try EXPLAIN? What type of syntax is this: boxplot(pF.port_dst,LOW)?
-- Sebastian
Re: Query Optimization Question
[EMAIL PROTECTED] wrote: Yes, a ranged query should respond faster than a negation. In some cases you can seriously improve query performance for a negation query if you split it into two range queries unioned together. Here is a pseudo example: This query should be slow due to the table scan it takes to test the condition:

SELECT ... FROM ... WHERE indexfield != 16

This query will use the index if the number of rows with indexfield != 16 is small enough (less than about 30%). This query should be noticeably faster due to using ranged queries (partial index ranged matches):

( SELECT ... FROM ... WHERE indexfield < 16 )
UNION
( SELECT ... FROM ... WHERE indexfield > 16 )

At the very least, you'll want to make that UNION ALL, rather than just UNION, so MySQL doesn't waste time looking for duplicates to remove. But even then, this may be slower. First, these two unioned queries may still not use the index. If the number of rows with indexfield < 16 is too large (more than about 30%) the first query will be a table scan. If the number of rows with indexfield > 16 is too large (more than about 30%) the second query will be a table scan. In fact, if the number of rows with indexfield = 16 is less than about 40% of the table, then at least one of the two unioned queries is guaranteed to be a table scan. Worse yet, this query stands a good chance of being twice as long as the single != query, because it may require 2 table scans! Second, even if both unioned queries use the index, the result still may not be faster. If the combined number of matching rows is too high, the full table scan should be faster than the indexed lookups. For example:

SELECT cat, COUNT(*) FROM inits GROUP BY cat;
+------+----------+
| cat  | COUNT(*) |
+------+----------+
|    0 |     5743 |
|    1 |     3792 |
|    2 |    30727 |
|    3 |     1926 |
|    4 |     7812 |
+------+----------+

19273 rows (38.55%) match cat != 2, with roughly half (~19%) on either side. First, the != case:

EXPLAIN SELECT * FROM inits WHERE cat != 2 \G
*** 1.
row ***
           id: 1
  select_type: SIMPLE
        table: inits
         type: ALL
possible_keys: cat_idx
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 5
        Extra: Using where

As expected, a full table scan.

SELECT * FROM inits WHERE cat != 2;
...
19273 rows in set (0.37 sec)

Now, the unioned range queries:

EXPLAIN SELECT * FROM inits WHERE cat < 2 UNION ALL SELECT * FROM inits WHERE cat > 2 \G
*** 1. row ***
           id: 1
  select_type: PRIMARY
        table: inits
         type: range
possible_keys: cat_idx
          key: cat_idx
      key_len: 5
          ref: NULL
         rows: 5680
        Extra: Using where
*** 2. row ***
           id: 2
  select_type: UNION
        table: inits
         type: range
possible_keys: cat_idx
          key: cat_idx
      key_len: 5
          ref: NULL
         rows: 6543
        Extra: Using where
*** 3. row ***
           id: NULL
  select_type: UNION RESULT
        table: union1,2
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra:

As hoped, MySQL plans to use the index for each query.

SELECT * FROM inits WHERE cat < 2 UNION ALL SELECT * FROM inits WHERE cat > 2;
...
19273 rows in set (0.78 sec)

Despite (because of) using the index, this takes more than twice as long! Of course, if you have to do a table scan ANYWAY (because a value you have in a constraint is not in an index) then this won't help. This optimization is all about getting the engine to use an index whenever it can instead of performing a table scan. Of course, that is not always possible. Even when using an index is possible, it is not always desirable. I'd suggest not trying to outfox the optimizer until you've first determined it is making bad choices, and then test to make sure the solution is actually an improvement. Michael
Re: Query Optimization Question
Robert DiFalco [EMAIL PROTECTED] wrote on 03/11/2006 12:43:43 PM: In a previous database engine I was using, an IN was more optimal than a <>. So, for example: SELECT * FROM table WHERE table.type IN (1,2,3); where the possible values of type are 0-3, was appreciably faster than: SELECT * FROM table WHERE table.type <> 0; I've been playing with the Query Browser and checking out the optimization documents and haven't been able to make a clear call on whether or not this is also the case with MySQL/InnoDB. TIA, R. YES, YES, YES! This is definitely an optimization. When you say IN or =, you are asking for matching values. Matches can come from indexes. When you say <> or NOT IN, you are asking for everything BUT matches. In order to evaluate a negative, the database engine (and this is usually true regardless of database server) almost always performs a full table scan to test every row to make sure it is either <> or NOT IN. At the very best, they have to perform a full index scan, which is still less efficient than ranged or values-based lookups. It's when you get into the situation where you are matching against dozens of IN-clause items that you may run into slowdowns again. Until you reach 2 or 3 dozen terms (depending on your hardware) you should be faster with an IN comparison than a <> or a NOT IN comparison. An optimization to search for everything BUT a term or two is to create a temporary table of all of your terms and delete the exact ones you want to exclude. Put an index on your temp table, then JOIN that back into your query again (replacing the huge IN clause). The database will match index to index and things will get fast again. This technique can scale up to some really big queries. Always try to code for the affirmative tests. Your users will thank you. Shawn Green Database Administrator Unimin Corporation - Spruce Pine
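Shawn's temporary-table trick, sketched out. The table and column names here are illustrative assumptions, not from the thread:

```sql
-- Goal: everything BUT type 0, without a negation predicate.
CREATE TEMPORARY TABLE wanted_types (type INT NOT NULL PRIMARY KEY);
INSERT INTO wanted_types SELECT DISTINCT type FROM t;   -- all terms...
DELETE FROM wanted_types WHERE type = 0;                -- ...minus the excluded ones

-- The join matches index to index instead of scanning for non-matches.
SELECT t.* FROM t JOIN wanted_types w ON t.type = w.type;

DROP TEMPORARY TABLE wanted_types;
```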
RE: Query Optimization Question
Shawn, Any performance gains for specifying type > 0 rather than type <> 0? R. From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Monday, March 13, 2006 6:37 AM To: Robert DiFalco Cc: mysql@lists.mysql.com Subject: Re: Query Optimization Question Robert DiFalco [EMAIL PROTECTED] wrote on 03/11/2006 12:43:43 PM: In a previous database engine I was using, an IN was more optimal than a <>. So, for example: SELECT * FROM table WHERE table.type IN (1,2,3); where the possible values of type are 0-3, was appreciably faster than: SELECT * FROM table WHERE table.type <> 0; I've been playing with the Query Browser and checking out the optimization documents and haven't been able to make a clear call on whether or not this is also the case with MySQL/InnoDB. TIA, R. YES, YES, YES! This is definitely an optimization. When you say IN or =, you are asking for matching values. Matches can come from indexes. When you say <> or NOT IN, you are asking for everything BUT matches. In order to evaluate a negative, the database engine (and this is usually true regardless of database server) almost always performs a full table scan to test every row to make sure it is either <> or NOT IN. At the very best, they have to perform a full index scan, which is still less efficient than ranged or values-based lookups. It's when you get into the situation where you are matching against dozens of IN-clause items that you may run into slowdowns again. Until you reach 2 or 3 dozen terms (depending on your hardware) you should be faster with an IN comparison than a <> or a NOT IN comparison. An optimization to search for everything BUT a term or two is to create a temporary table of all of your terms and delete the exact ones you want to exclude. Put an index on your temp table, then JOIN that back into your query again (replacing the huge IN clause). The database will match index to index and things will get fast again. This technique can scale up to some really big queries. Always try to code for the affirmative tests.
Your users will thank you. Shawn Green Database Administrator Unimin Corporation - Spruce Pine
RE: Query Optimization Question
Yes, a ranged query should respond faster than a negation. In some cases you can seriously improve query performance for a negation query if you split it into two range queries unioned together. Here is a pseudo example: This query should be slow due to the table scan it takes to test the condition:

SELECT ... FROM ... WHERE indexfield <> 16

This query should be noticeably faster due to using ranged queries (partial index ranged matches):

( SELECT ... FROM ... WHERE indexfield < 16 )
UNION
( SELECT ... FROM ... WHERE indexfield > 16 )

Of course, if you have to do a table scan ANYWAY (because a value you have in a constraint is not in an index) then this won't help. This optimization is all about getting the engine to use an index whenever it can instead of performing a table scan. Of course, that is not always possible. Shawn Green Database Administrator Unimin Corporation - Spruce Pine

Robert DiFalco [EMAIL PROTECTED] wrote on 03/13/2006 10:48:29 AM: Shawn, Any performance gains for specifying type > 0 rather than type <> 0? R. From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Monday, March 13, 2006 6:37 AM To: Robert DiFalco Cc: mysql@lists.mysql.com Subject: Re: Query Optimization Question Robert DiFalco [EMAIL PROTECTED] wrote on 03/11/2006 12:43:43 PM: In a previous database engine I was using, an IN was more optimal than a <>. So, for example: SELECT * FROM table WHERE table.type IN (1,2,3); where the possible values of type are 0-3, was appreciably faster than: SELECT * FROM table WHERE table.type <> 0; I've been playing with the Query Browser and checking out the optimization documents and haven't been able to make a clear call on whether or not this is also the case with MySQL/InnoDB. TIA, R. YES, YES, YES! This is definitely an optimization. When you say IN or =, you are asking for matching values. Matches can come from indexes. When you say <> or NOT IN, you are asking for everything BUT matches.
In order to evaluate a negative, the database engine (and this is usually true regardless of database server) almost always performs a full table scan to test every row to make sure it is either <> or NOT IN. At the very best, they have to perform a full index scan, which is still less efficient than ranged or values-based lookups. It's when you get into the situation where you are matching against dozens of IN-clause items that you may run into slowdowns again. Until you reach 2 or 3 dozen terms (depending on your hardware) you should be faster with an IN comparison than a <> or a NOT IN comparison. An optimization to search for everything BUT a term or two is to create a temporary table of all of your terms and delete the exact ones you want to exclude. Put an index on your temp table, then JOIN that back into your query again (replacing the huge IN clause). The database will match index to index and things will get fast again. This technique can scale up to some really big queries. Always try to code for the affirmative tests. Your users will thank you. Shawn Green Database Administrator Unimin Corporation - Spruce Pine
RE: Query Optimization Question
Interesting, that seems like an optimization the query optimizer could do itself when it sees a <> operator on an indexed numeric. From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Monday, March 13, 2006 8:01 AM To: Robert DiFalco Cc: mysql@lists.mysql.com Subject: RE: Query Optimization Question Yes, a ranged query should respond faster than a negation. In some cases you can seriously improve query performance for a negation query if you split it into two range queries unioned together. Here is a pseudo example: This query should be slow due to the table scan it takes to test the condition: SELECT ... FROM ... WHERE indexfield <> 16 This query should be noticeably faster due to using ranged queries (partial index ranged matches): ( SELECT ... FROM ... WHERE indexfield < 16 ) UNION ( SELECT ... FROM ... WHERE indexfield > 16 ) Of course, if you have to do a table scan ANYWAY (because a value you have in a constraint is not in an index) then this won't help. This optimization is all about getting the engine to use an index whenever it can instead of performing a table scan. Of course, that is not always possible. Shawn Green Database Administrator Unimin Corporation - Spruce Pine Robert DiFalco [EMAIL PROTECTED] wrote on 03/13/2006 10:48:29 AM: Shawn, Any performance gains for specifying type > 0 rather than type <> 0? R. From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Monday, March 13, 2006 6:37 AM To: Robert DiFalco Cc: mysql@lists.mysql.com Subject: Re: Query Optimization Question Robert DiFalco [EMAIL PROTECTED] wrote on 03/11/2006 12:43:43 PM: In a previous database engine I was using, an IN was more optimal than a <>.
So, for example: SELECT * FROM table WHERE table.type IN (1,2,3); where the possible values of type are 0-3, was appreciably faster than: SELECT * FROM table WHERE table.type <> 0; I've been playing with the Query Browser and checking out the optimization documents and haven't been able to make a clear call on whether or not this is also the case with MySQL/InnoDB. TIA, R. YES, YES, YES! This is definitely an optimization. When you say IN or =, you are asking for matching values. Matches can come from indexes. When you say <> or NOT IN, you are asking for everything BUT matches. In order to evaluate a negative, the database engine (and this is usually true regardless of database server) almost always performs a full table scan to test every row to make sure it is either <> or NOT IN. At the very best, they have to perform a full index scan, which is still less efficient than ranged or values-based lookups. It's when you get into the situation where you are matching against dozens of IN-clause items that you may run into slowdowns again. Until you reach 2 or 3 dozen terms (depending on your hardware) you should be faster with an IN comparison than a <> or a NOT IN comparison. An optimization to search for everything BUT a term or two is to create a temporary table of all of your terms and delete the exact ones you want to exclude. Put an index on your temp table, then JOIN that back into your query again (replacing the huge IN clause). The database will match index to index and things will get fast again. This technique can scale up to some really big queries. Always try to code for the affirmative tests. Your users will thank you. Shawn Green Database Administrator Unimin Corporation - Spruce Pine
Re: Query Optimization Question
Robert DiFalco wrote: In a previous database engine I was using, an IN was more optimal than a <>. So, for example: SELECT * FROM table WHERE table.type IN (1,2,3); where the possible values of type are 0-3, was appreciably faster than: SELECT * FROM table WHERE table.type <> 0; I've been playing with the Query Browser and checking out the optimization documents and haven't been able to make a clear call on whether or not this is also the case with MySQL/InnoDB. Mladen Adamovic wrote: IN should be faster implemented with both hash tables and BTREEs, so nowadays it should also be faster than <>, as all MySQL implementations AFAIK use those well-known data structures for indexes. [EMAIL PROTECTED] wrote: YES, YES, YES! This is definitely an optimization. When you say IN or =, you are asking for matching values. Matches can come from indexes. When you say <> or NOT IN, you are asking for everything BUT matches. In order to evaluate a negative, the database engine (and this is usually true regardless of database server) almost always performs a full table scan to test every row to make sure it is either <> or NOT IN. At the very best, they have to perform a full index scan, which is still less efficient than ranged or values-based lookups. It's when you get into the situation where you are matching against dozens of IN-clause items that you may run into slowdowns again. Until you reach 2 or 3 dozen terms (depending on your hardware) you should be faster with an IN comparison than a <> or a NOT IN comparison. An optimization to search for everything BUT a term or two is to create a temporary table of all of your terms and delete the exact ones you want to exclude. Put an index on your temp table then JOIN that back into your query again (replacing the huge IN clause). The database will match index to index and things will get fast again. This technique can scale up to some really big queries. Always try to code for the affirmative tests. Your users will thank you.
Implicit in Mladen and Shawn's answers, but never actually mentioned in the original post, is the presence of an index on the type column. This is probably obvious to all concerned, but I mention it for completeness: without an index on type, there is no difference between type IN (1,2,3) and type != 0. That is, the question is not whether IN is better than !=, but rather which will allow the optimizer to make good use of the index on type. I find mysql's optimizer is pretty good with well-written queries, as long as subqueries aren't involved, so my initial reaction was to expect no difference. After all, as the optimizer considers the WHERE conditions and the available indexes, it is certainly possible, at least theoretically, for it to notice that type IN (1,2,3) and type != 0 are identical conditions. That is, a clever optimizer could treat them identically. Shawn's and Mladen's answers gave me pause, however, and aroused my curiosity, so I decided to test:

SELECT VERSION();
+-----------+
| VERSION() |
+-----------+
| 4.1.15    |
+-----------+

SELECT cat, COUNT(*) FROM inits GROUP BY cat;
+------+----------+
| cat  | COUNT(*) |
+------+----------+
|    0 |    44224 |
|    1 |     1919 |
|    2 |     1931 |
|    3 |     1926 |
+------+----------+

mysql> EXPLAIN SELECT * FROM inits WHERE cat IN (1,2,3) \G
*** 1. row ***
           id: 1
  select_type: SIMPLE
        table: inits
         type: range
possible_keys: cat_idx
          key: cat_idx
      key_len: 5
          ref: NULL
         rows: 8117
        Extra: Using where

mysql> EXPLAIN SELECT * FROM inits WHERE cat != 0 \G
*** 1. row ***
           id: 1
  select_type: SIMPLE
        table: inits
         type: range
possible_keys: cat_idx
          key: cat_idx
      key_len: 5
          ref: NULL
         rows: 8120
        Extra: Using where

As you can see, the optimizer plans to use the index in both cases, examining 8117 rows in the IN case and 8120 rows in the != case, to get 5776 matching rows out of 50,000 (about 11.55%). On the other hand, it makes a difference how many rows will match. What is the distribution of values of type?
If the number of matching rows is more than about 30% of the table, the optimizer won't use an available index in any case. For example,

mysql> EXPLAIN SELECT * FROM inits WHERE cat IN (0,1,2) \G
*** 1. row ***
           id: 1
  select_type: SIMPLE
        table: inits
         type: ALL
possible_keys: cat_idx
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 5
        Extra: Using where

mysql> EXPLAIN SELECT * FROM inits WHERE cat != 3 \G
*** 1. row ***
           id: 1
  select_type: SIMPLE
        table:
Re: Query Optimization Question
Robert DiFalco wrote: In a previous database engine I was using, an IN was more optimal than a <>. So, for example: SELECT * FROM table WHERE table.type IN (1,2,3); where the possible values of type are 0-3, was appreciably faster than: SELECT * FROM table WHERE table.type <> 0; IN should be faster implemented with both hash tables and BTREEs, so nowadays it should also be faster than <>, as all MySQL implementations AFAIK use those well-known data structures for indexes. I've been playing with the Query Browser and checking out the optimization documents and haven't been able to make a clear call on whether or not this is also the case with MySQL/InnoDB. TIA, R. -- Mladen Adamovic http://home.blic.net/adamm http://www.shortopedia.com http://www.froola.com
Re: Query optimization
Hi Tripp, all! Tripp Bishop wrote: [[...]] Why would adding extra fields to the query slow the query down so much? In either case it looks like it's using the same index. If a query is to return only fields contained in the index, it can be executed as an index-only query, and the true data need not be accessed. The moment you ask for a field not contained in the index, then for each index entry matching the search condition the base data must be accessed to get that field. This can add very significant cost. HTH, Joerg PS: In my reader, it looked like your original question hijacked another thread: Question regarding running 'mysql' from a bash script. :-( -- Joerg Bruehe, Senior Production Engineer MySQL AB, www.mysql.com
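Joerg's index-only (covering) point can be sketched against Tripp's listing table. The index definition here is an illustrative assumption, not the thread's actual schema:

```sql
CREATE INDEX idx_grp_price ON listing (group_id, price);

-- Every referenced column is in the index: MySQL can answer from the
-- index alone (EXPLAIN shows "Using index"), never touching the row data.
SELECT price FROM listing WHERE group_id = 2;

-- sqft is not in the index: each matching index entry now costs an
-- extra lookup into the base table to fetch it.
SELECT price, sqft FROM listing WHERE group_id = 2;
```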
Re: Query optimization
It might be a good idea if you could post the results of an EXPLAIN SELECT ... for a fast query and a slow query along with their actual SQL statements. James Harvard At 11:37 am -0800 12/1/06, Tripp Bishop wrote: I've got a question regarding optimizing a query.
Re: Query optimization
Jim, here's an example query:

SELECT COUNT(listing.listing_id) AS listing_count
FROM listing_city_xref
INNER JOIN listing
   ON listing.listing_id = listing_city_xref.listing_id
  AND listing.group_id = listing_city_xref.group_id
WHERE listing_city_xref.group_id = 2
  AND listing_city_xref.city_id IN (83,88)
  AND listing.price BETWEEN 189000.00 AND 23.00
  AND tot_finished_sqft >= 1100
  AND sqft >= 1000
  AND baths >= 1
  AND bedrooms >= 3
  AND garage_spaces >= 1
  AND property_type IN ('RES');

I created an index on the following fields in the listing table: group_id, price, tot_finished_sqft, baths, bedrooms, garage_spaces, property_type. Here's the result of the explain on the query above:

| 1 | SIMPLE | listing | ref | PRIMARY,idx_search | idx_search | 4 | const | 8708 | Using where |
| 1 | SIMPLE | listing_wiz_city_xref | ref | PRIMARY,idx_listing,idx_wiz_city | PRIMARY | 36 | const,ahf_test.listing.listing_id | 1 | Using where; Using index |

This query took 12.91 seconds. Running it multiple times produces similar run times. Now, if I remove the sqft >= 1100 condition from the where clause I get the following explain output:

| 1 | SIMPLE | listing | range | PRIMARY,idx_search | idx_search | 15 | NULL | 8688 | Using where; Using index |
| 1 | SIMPLE | listing_wiz_city_xref | ref | PRIMARY,idx_listing,idx_wiz_city | PRIMARY | 36 | const,ahf_test.listing.listing_id | 1 | Using where; Using index |

The query executes in 0.09 seconds. Running it multiple times produces similar runtimes. One interesting difference between the explain outputs is that in the second case the ref is NULL. In the first scenario it is const. Why would adding extra fields to the query slow the query down so much? In either case it looks like it's using the same index. Thanks, Tripp
Re: Query optimization question
Gerald Taylor wrote: Query optimization question. I am selecting from a single table but it has a lot of rows and it has a very involved calculation. What I really want to do is FIRST restrict the number of rows so that the big calculation is only performed on the ones that are within 3 degrees. Using 4.0.20. A sample query is given here: The application interpolates variable values such as 44.6 into the query string, so from mysql's point of view they are constants, right? And the explain doc says it optimizes constants, but it is looking at all the rows and I see why.

SELECT city, state, country, latitude, longitude,
  IF(latitude REGEXP '[0-9]+$' AND longitude REGEXP '[0-9]+$',
     ROUND(DEGREES(ACOS(SIN(RADIANS(44.6)) * SIN(RADIANS(latitude))
           + COS(RADIANS(44.6)) * COS(RADIANS(latitude))
           * COS(RADIANS(-123.28 - longitude)))) * 111),
     NULL) AS distance
FROM londata
WHERE ABS(44.6 - latitude) <= 3.0
  AND ABS(-123.28 - longitude) <= 3.0
ORDER BY distance;

I guess I can't do a subselect with my version... If I could, what would it look like? Something like below? (I might be able to talk the powers that be into an upgrade.) And if I can't, is it more horrible to manually create a temporary table and perform the calculations on it than it is to just do what I am doing?

SELECT city, state, country, latitude, longitude,
  IF(latitude REGEXP '[0-9]+$' AND longitude REGEXP '[0-9]+$',
     ROUND(DEGREES(ACOS(SIN(RADIANS(44.6)) * SIN(RADIANS(latitude))
           + COS(RADIANS(44.6)) * COS(RADIANS(latitude))
           * COS(RADIANS(-123.28 - longitude)))) * 111),
     NULL) AS distance
FROM (SELECT * FROM londata
      WHERE ABS(44.6 - latitude) <= 3.0
        AND ABS(-123.28 - longitude) <= 3.0) AS sublon
ORDER BY distance;

Thanks. GT

A subselect won't help. You still have to scan the whole table to do the calculation for the WHERE clause. Pre-calculate your min and max lat. and lon.
WHERE latitude <= @maxlat AND latitude >= @minlat AND longitude <= @maxlon AND longitude >= @minlon
Re: Query optimization question
When you don't have subselects, you have two options: temporary tables or JOINed queries. In your case, I think the temporary table is the better way to go. I would also eliminate the ABS() check so that I can compare values directly against the index. I know the math is correct your way, but this way you are comparing values directly against the column, which means that indexes can come into play.

SET @targetLat = 44.6, @targetLon = -123.8, @Delta = 3;

CREATE TEMPORARY TABLE tmpDeltaData
SELECT city, state, country, latitude, longitude
FROM Londata
WHERE latitude BETWEEN (@targetLat - @Delta) AND (@targetLat + @Delta)
  AND longitude BETWEEN (@targetLon - @Delta) AND (@targetLon + @Delta);

SELECT city, state, country, latitude, longitude,
  IF(latitude REGEXP '[0-9\\.]+$' AND longitude REGEXP '[0-9\\.]+$',
     ROUND(DEGREES(ACOS(SIN(RADIANS(@targetLat)) * SIN(RADIANS(latitude))
           + COS(RADIANS(@targetLat)) * COS(RADIANS(latitude))
           * COS(RADIANS(@targetLon - longitude)))) * 111),
     NULL) AS distance
FROM tmpDeltaData
ORDER BY distance;

DROP TEMPORARY TABLE tmpDeltaData;

I would also test the WHERE clause from tmpDeltaData with your original query to compare speeds of the two methods. Shawn Green Database Administrator Unimin Corporation - Spruce Pine

Gerald Taylor [EMAIL PROTECTED] wrote on 10/04/2004 10:33:22 AM: Query optimization question. I am selecting from a single table but it has a lot of rows and it has a very involved calculation. What I really want to do is FIRST restrict the number of rows so that the big calculation is only performed on the ones that are within 3 degrees. Using 4.0.20. A sample query is given here: The application interpolates variable values such as 44.6 into the query string, so from mysql's point of view they are constants, right? And the explain doc says it optimizes constants, but it is looking at all the rows and I see why.
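Shawn's two-phase plan (cheap indexed bounding box first, expensive trig only on the survivors) can be sketched outside SQL too. A hedged Python sketch using the thread's constants and its 111 km-per-degree factor; the city rows are invented sample data:

```python
# Bounding-box pre-filter, then the spherical-law-of-cosines distance.
import math

target_lat, target_lon, delta = 44.6, -123.28, 3.0

cities = [
    ("Corvallis", 44.56, -123.26),
    ("Portland", 45.52, -122.68),
    ("Denver", 39.74, -104.99),   # well outside the 3-degree box
]

def distance_km(lat, lon):
    # Same formula as the thread's ACOS expression, times 111 km/degree.
    return math.degrees(math.acos(
        math.sin(math.radians(target_lat)) * math.sin(math.radians(lat))
        + math.cos(math.radians(target_lat)) * math.cos(math.radians(lat))
        * math.cos(math.radians(target_lon - lon)))) * 111

# The rectangular test stands in for the index-friendly WHERE clause.
in_box = [c for c in cities
          if abs(target_lat - c[1]) <= delta and abs(target_lon - c[2]) <= delta]
results = sorted((distance_km(lat, lon), name) for name, lat, lon in in_box)
print([name for _, name in results])
```

Denver never reaches the trig; that is the whole saving when the box test maps onto indexed latitude/longitude columns.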
Re: Query Optimization [sorted]
I'm trying to sort out a query that identifies images that are not in the story table (structures below). Peter Brawley has kindly pointed me in the right direction with the sql structure (which does work), but I'm getting benchmarks of 5+ seconds on the test data, whereas the live site has 50x the number of rows and I'm concerned about trying to go live with it. Went back to the manual and realised that the table wasn't properly indexed. Makes a *big* difference! Sorry to waste your time/bandwidth. Kris
Re: Query Optimization
At 21:55 -0300 3/7/04, Volnei Galbino wrote: Hi, Does anybody know where I can find information about query optimization in MySQL? Which techniques are used? Regards, Volnei Galbino Yes, there's a chapter on optimization in the MySQL Reference Manual. http://www.mysql.com/doc/en/MySQL_Optimisation.html -- Paul DuBois, MySQL Documentation Team Madison, Wisconsin, USA MySQL AB, www.mysql.com MySQL Users Conference: April 14-16, 2004 http://www.mysql.com/uc2004/
Re: Query optimization help
Chuck Gadd wrote: I've got a query that I can't seem to get optimized, so I'm hoping someone here can spot something I'm missing! Table has three columns: CoordID int unsigned, Zip_Lo char(9), Zip_Hi char(9). Table has 3 million records. Indexes: acg_lo (Zip_Lo), acg_hi (Zip_Hi), acg_combined (Zip_Lo, Zip_Hi). Here's the query:

select * from acg
where zip4_lo_pot <= '80128'
  and zip4_hi_pot >= '80128'

Explain shows: type: ALL, possible keys: acg_lo,acg_hi,acg_combined, rows: 3022309, extra: Using where. This kind of query cannot be efficiently optimized on a pre-4.1 version. With 4.1, if you are using MyISAM tables you could make (zip4_lo_pot,zip4_hi_pot) a spatial column with a spatial index. See http://www.mysql.com/doc/en/Spatial_extensions_in_MySQL.html -- Sasha Pachev Create online surveys at http://www.surveyz.com/
Re: Query optimization help
Without seeing the data, I am assuming that you are going over the 30% threshold with your less-than-or-equal/greater-than-or-equal WHERE clauses. What sort of criteria are you asking the database engine to search for? Original Message On 2/25/04, 9:44:02 PM, [EMAIL PROTECTED] wrote regarding Re: Query optimization help: Maybe i'm wrong here, someone correct me: if it's just ints you are gonna use, set the field types to bigint, it may search faster; you are doing a character search, and to get there quicker in a text-search scenario i'd suggest mysql4 and full-text searching with MATCH ... AGAINST. I've got a query that I can't seem to get optimized, so I'm hoping someone here can spot something I'm missing! Table has three columns: CoordID int unsigned, Zip_Lo char(9), Zip_Hi char(9). Table has 3 million records. Indexes: acg_lo (Zip_Lo), acg_hi (Zip_Hi), acg_combined (Zip_Lo, Zip_Hi). Here's the query: select * from acg where zip4_lo_pot <= '80128' and zip4_hi_pot >= '80128' Explain shows: type: ALL possible keys: acg_lo,acg_hi,acg_combined rows: 3022309 extra: Using where So, how can I optimize this?
Re: Query optimization help
Maybe i'm wrong here, someone correct me: if it's just ints you are gonna use, set the field types to bigint, it may search faster; you are doing a character search, and to get there quicker in a text-search scenario i'd suggest mysql4 and full-text searching with MATCH ... AGAINST. I've got a query that I can't seem to get optimized, so I'm hoping someone here can spot something I'm missing! Table has three columns: CoordID int unsigned, Zip_Lo char(9), Zip_Hi char(9). Table has 3 million records. Indexes: acg_lo (Zip_Lo), acg_hi (Zip_Hi), acg_combined (Zip_Lo, Zip_Hi). Here's the query: select * from acg where zip4_lo_pot <= '80128' and zip4_hi_pot >= '80128' Explain shows: type: ALL possible keys: acg_lo,acg_hi,acg_combined rows: 3022309 extra: Using where So, how can I optimize this?
Re: query optimization with CONCAT query
Mysql 4.0.1+ supports searches IN BOOLEAN MODE, so if mysql 4 is an option, I think that would work for you. http://www.mysql.com/doc/en/Fulltext_Search.html olinux --- Andy Ingham [EMAIL PROTECTED] wrote: Folks -- We have built a table with bibliographic information that contains a field for initial articles and for titles (sans initial articles). Some sample data (with all other fields suppressed):

+---------+---------------------------------------------------------+
| article | title                                                   |
+---------+---------------------------------------------------------+
| The     | '60s (mini)                                             |
|         | À Nous la Liberté [AKA: Freedom for Us; Liberty for Us] |
|         | Full Speed [AKA: À toute vitesse]                       |
| El      | Ángel exterminador [AKA: The Exterminating Angel]       |
| The     | Farm: Angola, USA [AKA: The Farm]                       |
+---------+---------------------------------------------------------+

For searching, we want to check article (+) title. So, we've used CONCAT, as shown below from an example query from our slow query log:

SELECT FILM.length, FILM.film_id, FILM.article, FILM.title,
       FILM.release_date, FILM.description, FILM.displayFlag
FROM (FILM LEFT JOIN FILM_DIRECTOR ON FILM.film_id = FILM_DIRECTOR.film_id)
LEFT JOIN DIRECTOR ON FILM_DIRECTOR.director_id = DIRECTOR.director_id
WHERE FILM.displayFlag = 'yes'
  AND (CONCAT(article,title) LIKE '%paris is burning%')
ORDER BY title, release_date;

I am under the impression (from testing and reading previous posts) that it is not possible to construct an index on the CONCATENATION of two fields. A multi-column index on the article and title fields only helps if the query is like: ... article LIKE '%something%' and title LIKE '%something%' ... which doesn't help us. Likewise, a FULLTEXT index only finds a string that is *fully contained* within any of the fields defined in that index. Any ideas of how to address this issue?
TIA for any insights, Andy. Andy Ingham Systems Librarian Academic Affairs Library UNC-Chapel Hill 919-962-1288 [EMAIL PROTECTED]
Re: query optimization with CONCAT query
Actually, I'm not convinced it would, because I want to be able to find a string that is *NOT fully contained* within any given field. In other words, I want a search for The catcher in% to match the row:

+---------+--------------------+
| article | title              |
+---------+--------------------+
| The     | catcher in the rye |
+---------+--------------------+

CONCAT allows me to do this, but I'm hoping to optimize that operation somehow, perhaps with an index. Andy

olinux wrote: Mysql 4.0.1+ supports searches IN BOOLEAN MODE, so if mysql 4 is an option, I think that would work for you. http://www.mysql.com/doc/en/Fulltext_Search.html olinux
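One common workaround for Andy's problem is to materialize the concatenation into its own column at insert/update time and index that, so a left-anchored LIKE 'prefix%' can use the index. A hedged sketch of the idea in Python (the space separator and the sample rows are assumptions, not Andy's actual schema):

```python
# Simulate an indexed, precomputed CONCAT(article, ' ', title) column:
# a sorted list of combined strings stands in for the B-tree index.
films = [
    ("The", "catcher in the rye"),
    ("", "Full Speed [AKA: À toute vitesse]"),
    ("El", "Ángel exterminador"),
]

# Stand-in for the stored combined column, kept in index (sorted) order.
combined = sorted((article + " " + title).strip() for article, title in films)

prefix = "The catcher in"
matches = [c for c in combined if c.startswith(prefix)]
print(matches)
```

A prefix search over a sorted structure like this is exactly what a B-tree index on the stored column would give; the original CONCAT-inside-WHERE form recomputes the expression per row and can use no index at all.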
Re: query optimization
That is not valid SQL; a table can't be LIKE something. I think you omitted the field name. I also don't see why you need a join, unless you have duplicate ids. Gabe wrote: First off, please excuse me if this is the wrong list to post this sort of question to (and please direct me to the appropriate list). My problem: I have a database of approx. 170,000 records and 1.2 million keywords. Most of my searches happen fairly quickly using the format below. However, the group by is causing a temporary table when I explain my statement, which significantly lengthens query time. If I eliminate the group by, queries are substantially sped up and a temp table is not being generated. Only problem with not using group by is that sometimes the same ID is duplicated in the results due to some of the keywords being very similar (i.e. Richard, Rich, Rick) for the same ID. Can anyone help with a different Select strategy or something else to eliminate the temp table being generated for this query. Tables: main - has unique ID (primary) column and main data columns. kw_table - has ID column and a keyword column. Indexes are created on the ID columns and the keyword column. Query:

SELECT main.* FROM kw_table, kw_table AS kw_table1, main
WHERE kw_table LIKE 'first word%'
  AND kw_table1 LIKE '2nd word%'
  AND main.id = kw_table.id
  AND kw_table1.id = kw_table.id
GROUP BY main.id Limit 0,11

Thanks Gabe
Re: query optimization
I'm sorry, I guess I should make sure the query syntax is correct when asking a question... Here it is with the correct field names:

SELECT main.* FROM kw_table, kw_table AS kw_table1, main
WHERE kw_table.keyword LIKE 'first word%'
  AND kw_table1.keyword LIKE '2nd word%'
  AND main.id = kw_table.id
  AND kw_table1.id = kw_table.id
GROUP BY main.id Limit 0,11

Yes, there can be duplicate id's in the keyword table (but not the main table). For instance:

kw_table
  id  keyword
  --  -------
   1  Rich
   1  Richard
   1  Rick
   1  Bob
   2  Leonard
   3  Bob
   3  Larry
   3  Mike

Thanks Gabe

Gerald Clark wrote: That is not valid SQL; a table can't be LIKE something. I think you omitted the field name. I also don't see why you need a join, unless you have duplicate ids.
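Gabe's duplicates come from the join itself: one output row per matching keyword, so id 1 appears once each for Rich, Richard, and Rick. GROUP BY is one fix; rewriting the keyword test as an EXISTS subquery avoids producing the duplicates in the first place (subqueries need MySQL 4.1+, which is why the thread reaches for GROUP BY). A hedged SQLite sketch using the thread's table names with invented main-table data:

```python
# Show the duplicate-id blowup from the keyword join, and the EXISTS fix.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE main (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE kw_table (id INT, keyword TEXT);
INSERT INTO main VALUES (1,'rec1'), (2,'rec2'), (3,'rec3');
INSERT INTO kw_table VALUES (1,'Rich'),(1,'Richard'),(1,'Rick'),(1,'Bob'),
                            (2,'Leonard'),(3,'Bob'),(3,'Larry'),(3,'Mike');
""")

joined = conn.execute("""
  SELECT main.id FROM kw_table, main
  WHERE kw_table.keyword LIKE 'Ri%' AND main.id = kw_table.id
""").fetchall()          # id 1 comes back once per matching keyword

exists = conn.execute("""
  SELECT main.id FROM main
  WHERE EXISTS (SELECT 1 FROM kw_table
                WHERE kw_table.id = main.id AND kw_table.keyword LIKE 'Ri%')
""").fetchall()          # each qualifying id exactly once, no GROUP BY

print(len(joined), exists)
```

The EXISTS form scans main once and probes the keyword index per row, so no temporary table is needed for deduplication.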
RE: Query Optimization
Hi, I did some more research on my problem. (Sorry for not doing it before I posted the other message.) I think perhaps my question could be summarized as follows: can I use an index to speed up a select max() or select min()? I read what the manual has to say about a query like this: SELECT MIN(key_part2), MAX(key_part2) FROM table_name WHERE key_part1 = 10 However I want to do this: SELECT MAX(key_part1) FROM table_name WHERE key_part1 < 10 Also, what I want to do is basically equivalent to this: SELECT key_part1 FROM table_name WHERE key_part1 < 10 ORDER BY key_part1 DESC LIMIT 1 In testing, the second statement seems slower than the max() for some reason though. Is there any way to get close to an O(logN) search time on these queries? Thanks again, Dave -Original Message- From: Dave Dutcher [mailto:[EMAIL PROTECTED]] Sent: Wednesday, July 24, 2002 1:04 PM To: [EMAIL PROTECTED] Subject: Query Optimization I was wondering if somebody could give me any suggestions on how to optimize a query I am working on. This is my table:

CREATE TABLE testdata (
  Begin char(9) NOT NULL default '',
  End char(9) NOT NULL default '',
  UNIQUE KEY BeginEndIndex (Begin,End)
) TYPE=MyISAM;

It is a table of ranges, i.e. 1-5, 7-11, 2-24000. None of the ranges overlap, and I'm trying to write a query to find a range that contains a specific number such as 500,000. So this is what I've written for a query: select Begin, End from testdata where Begin <= '00500' and End >= '00500' On a table with 100,000 records the explain command tells me it is using the BeginEndIndex, it says the key length is 9, and that it has to look through about 27,000 records. I would like to be able to configure the query or indexes so it will only have to look through a couple records if possible.
MySQL is super fast at a query like this: select Begin from testdata where Begin <= '00500'; So I had the idea of trying this: select max(Begin) from testdata where Begin <= '00500'; I was hoping that it would do the same as the simple select, and then because it has a sorted index it wouldn't have to search the previous records to find the max, but explain still puts this at about 27,000 records. Has anyone else tried writing a query similar to this? Does anybody have any suggestions? Thanks in advance, Dave
Re: Query Optimization
In the last episode (Jul 24), Dave Dutcher said: select Begin, End from testdata where Begin <= '00500' and End >= '00500' Put an index on (begin,end). It will only have to do a range scan on that index to find the matching records. You might also want to make those columns integers instead of chars; they'll take up a bit less space that way. -- Dan Nelson [EMAIL PROTECTED]
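Because Dave's ranges never overlap, the O(log N) lookup he wants is exactly a binary search: find the last Begin at or below the value, then check that range's End. A hedged Python sketch of the algorithm (the range data is invented; in SQL this corresponds to MAX(Begin) <= value followed by one End comparison):

```python
# Range-containment lookup over sorted, non-overlapping [Begin, End] pairs.
import bisect

ranges = [(1, 5), (7, 11), (200, 24000), (500000, 500999)]  # sorted by Begin
begins = [b for b, _ in ranges]

def find_range(value):
    i = bisect.bisect_right(begins, value) - 1   # last Begin <= value
    if i >= 0 and ranges[i][1] >= value:         # does its End cover value?
        return ranges[i]
    return None                                  # value falls in a gap

print(find_range(500000))   # inside (500000, 500999)
print(find_range(6))        # between ranges -> None
```

This is why the B-tree should be able to answer in a couple of probes: the non-overlap guarantee means only one candidate range ever needs checking.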
Re: Query Optimization
Pada Mon, 8 Jul 2002 16:07:04 +0530 Arul [EMAIL PROTECTED] menulis : Hi All The Query below took around 175 Secs to return 22 Rows..Any way this query can be optimized try to use group by instead of distinct i usually found group by is faster ;-) anyway ... performance of query is more related to : - table definition - indexing - mysql parameter values - OS and hardware spec/limitation -- Let's call it an accidental feature. -- Larry Wall MySQL 3.23.51 : up 17 days, Queries : 365.147 per second (avg). - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: Query Optimization
Hi I Found Group By to be a bit faster...but still it takes 30 secs to execute a query... What could be done..I have Indexing on almost all the tables.. -Arul - Original Message - From: Dicky Wahyu Purnomo [EMAIL PROTECTED] To: Arul [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Sent: Monday, July 08, 2002 4:25 PM Subject: Re: Query Optimization Pada Mon, 8 Jul 2002 16:07:04 +0530 Arul [EMAIL PROTECTED] menulis : Hi All The Query below took around 175 Secs to return 22 Rows..Any way this query can be optimized try to use group by instead of distinct i usually found group by is faster ;-) anyway ... performance of query is more related to : - table definition - indexing - mysql parameter values - OS and hardware spec/limitation -- Let's call it an accidental feature. -- Larry Wall MySQL 3.23.51 : up 17 days, Queries : 365.147 per second (avg). - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: Query Optimization
On Monday, 8. July 2002 12:37, Arul wrote: Hi All The Query below took around 175 Secs to return 22 Rows..Any way this query can be optimized For analyzing your problem, you should send your table definition and an output of the EXPLAIN statement. Regards Georg mysql, query - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: Query Optimization
On 8 Jul 2002, at 16:07, Arul [EMAIL PROTECTED] wrote: FROM User_Type_Details UTD, User_Type_Details B , Users U ,TempGeneralSearch1 T1,Users A LEFT JOIN Company C on (U.CompanyID = C.CompanyID) LEFT JOIN Expert_Info EI on (U.UserID = EI.UserID) LEFT JOIN User_Horizontal_Market UHM on (U.UserID = UHM.UserID) LEFT JOIN Industry I on (U.IndustryID = I.IndustryID) LEFT JOIN Technology_Area T on (U.Technology_AreaID = T.Technology_AreaID) LEFT JOIN Role R on (U.RoleID = R.RoleID ) WHERE U.UserID = UTD.UserID AND U.User_Status='ACT' AND U.User_App_Status='APP' AND UTD.User_TypeID IN (4,0,2,3) AND T1.TempID=117 You seem to have some tables in the FROM part of your query that aren't being used (Users aliased to A, User_Type_Details aliased to B) and another that isn't joined to anything (T1, but that may be okay since it looks like you may be getting only one record from it because of the TempID=117). Eliminate the unused tables (which could be resulting in a huge number of duplicated records), and your speed should improve greatly. -- Keith C. Ivey [EMAIL PROTECTED] Tobacco Documents Online http://tobaccodocuments.org - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
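Keith's diagnosis (tables in FROM that are never joined) is worth seeing concretely: each unjoined table multiplies the result by its row count, a Cartesian product. A minimal SQLite sketch with invented tables:

```python
# An unjoined table in FROM multiplies the row count.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY);
CREATE TABLE extra (id INTEGER PRIMARY KEY);
INSERT INTO users VALUES (1),(2),(3);
INSERT INTO extra VALUES (10),(20),(30),(40);
""")

plain = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
cartesian = conn.execute("SELECT COUNT(*) FROM users, extra").fetchone()[0]
print(plain, cartesian)   # 3 rows vs 3*4 = 12 rows
```

With two unused tables of realistic size, Arul's query could easily be producing millions of duplicated intermediate rows before DISTINCT/GROUP BY collapses them, which matches the 175-second runtime for 22 result rows.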
Re: Query optimization
Jeff, Let's do some math here first. Your table has 47 Million rows. You have a primary key on an int, and indexes across 22 chars and 10 chars. You have 512MB of key buffer. Now, assuming the indexes just have information on relevant fields (they're actually bigger than this since they have a row pointer too) you need 47M*4 + 47M*22 + 47M*10 bytes of memory for your keys. This works out to 47M * (4 + 22 + 10) = 47M * (36) = 1.6GB of memory to store your keys. You only have 512MB. Therefore, since MySQL can't keep the key in memory, it has to swap parts out to disk as it works on your query, which causes a huge performance decrease. If you don't need to search on inequalities or partial matches, I would recommend adding two columns -- an integer hash value for each of the char columns. Index those, instead of the char columns, and do your lookups on those. In this scenario, you will need 47M * (4 + 8 + 4) = 717MB of memory, which is much closer to what you have. If you aren't actively using the primary key, you may be able to squeeze the other two indexes in your available RAM. For a table that big, you should probably buy a little bit more RAM, if it will fit in your machine. RAM is cheap :) Steve Meyers On Fri, 2001-11-09 at 18:08, Jeff Isom wrote: I'm trying to figure out how to optimize a query on a fairly large table. I've been reading the MySQL documentation and have tried a few of the suggestions, but not seem to have much effect on the query. The table contains three columns: 1) int unsigned 2) char(12) 3) char(10) The table has 47,000,000 + rows. I have my key_buffer set to 512M. I have a primary key index on column 1, a multicolumn index (col2,col3) and another single column index for col3. The columns are not unique, in fact, some column 2 has as may as 1,000,000 rows with the same value. I am running MySQL 3.23.37 on RedHat 6.0 with 1 Gig memory and a single 700 Mhz processor. The hard drive is a single 18 Gig SCSI drive. 
I am searching on col 2 only, col 3 only, or col 2 and col 3. The queries on single columns can take as long as 6 minutes to execute. Any suggestions on how I can further optimize the setup to get better results? Thanks, Jeff
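Steve's hash-column idea can be sketched roughly as follows. The table and column names are hypothetical stand-ins for Jeff's schema, and CRC32() is only available in later MySQL versions (on 3.23 the hash would have to be computed client-side or with another function):

```sql
-- Sketch of the hash-column technique: replace wide char indexes with
-- narrow integer indexes. Names here are hypothetical.
ALTER TABLE bigtable
    ADD COLUMN col2_hash INT UNSIGNED NOT NULL,
    ADD COLUMN col3_hash INT UNSIGNED NOT NULL,
    ADD INDEX (col2_hash, col3_hash),
    ADD INDEX (col3_hash);

UPDATE bigtable SET col2_hash = CRC32(col2), col3_hash = CRC32(col3);

-- Lookups filter on the narrow hash index first, then re-check the
-- original column to weed out hash collisions:
SELECT * FROM bigtable
WHERE col2_hash = CRC32('somevalue') AND col2 = 'somevalue';
```

The re-check on the original column is required because different strings can hash to the same 32-bit value; the index merely narrows the candidate rows.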
Re: Query optimization issue.
On Thu, Aug 23, 2001 at 11:11:38AM -0700, Steven Roussey wrote:

mysql> explain select * from dominf.domip order by ip desc;

'ORDER BY something DESC' does not use an index. It must be ascending. See the manual. Also, I forgot, if you do this often and it needs to be in DESC order, you can always alter the app to create and use an extra column 'ip_desc unsigned int not null', set it on insert to (0xFFFFFFFF - ip), and have an index on ip_desc. (If the column were signed you would use 0x7FFFFFFF - the variable, but IP numbers need to be stored as unsigned.)

And it's also worth knowing that this is fixed in 4.0.

Jeremy
--
Jeremy D. Zawodny, [EMAIL PROTECTED] Technical Yahoo - Yahoo Finance
Desk: (408) 349-7878 Fax: (408) 349-5454 Cell: (408) 685-5936
MySQL 3.23.41-max: up 8 days, processed 107,141,973 queries (140/sec. avg)
Re: Query optimization issue.
mysql> explain select * from dominf.domip order by ip desc;

'ORDER BY something DESC' does not use an index. It must be ascending. See the manual. Also, I forgot, if you do this often and it needs to be in DESC order, you can always alter the app to create and use an extra column 'ip_desc unsigned int not null', set it on insert to (0xFFFFFFFF - ip), and have an index on ip_desc. (If the column were signed you would use 0x7FFFFFFF - the variable, but IP numbers need to be stored as unsigned.)

Sincerely, Steven Roussey http://Network54.com/?pp=e

PS: I'm trying to set aside a week in September to augment the MySQL code to use an index even with DESC in these situations. MySQL AB has some unfinished code to do this that I'm working from. I'm just familiarizing myself with the context of the code right now.
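Steven's workaround can be sketched as follows, using the column names from the thread. Note the values 0xFFFFFFFF and 0x7FFFFFFF are the unsigned and signed 32-bit maximums; the app must keep ip_desc in sync on every insert and update.

```sql
-- Mirror column that inverts the sort order, so an ASCENDING index scan
-- on ip_desc returns rows in DESCENDING ip order.
ALTER TABLE domip
    ADD COLUMN ip_desc INT UNSIGNED NOT NULL,
    ADD INDEX (ip_desc);

-- Backfill existing rows (new rows must set it the same way):
UPDATE domip SET ip_desc = 0xFFFFFFFF - ip;

-- Equivalent to ORDER BY ip DESC, but eligible for an index scan
-- on pre-4.0 servers:
SELECT * FROM domip ORDER BY ip_desc;
```

Because ip_desc is a strictly decreasing function of ip, ascending ip_desc order and descending ip order are the same ordering.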
Re: Query optimization issue.
Peter Zaitsev wrote: Hello mysql, I was trying to dump a very huge table (5GB, 45 mil rows) to a text dump in sorted order (according to the primary key). The problem is mysql somehow does not want to scan the table by primary key to produce sorted output row by row, but prefers to use filesort, which would take quite a long time in this case: mysql> explain select * from dominf.domip order by ip desc;

Try using a limit clause

--
_/_/ _/_/_/ - Rafał Jank [EMAIL PROTECTED] -
_/ _/ _/ _/ _/ Wirtualna Polska SA http://www.wp.pl
_/_/_/_/ _/_/_/ ul. Uphagena 2, 80-237 Gdansk, tel/fax. (58) 5215625
_/ _/ _/ ==* http://szukaj.wp.pl *==
Re: Query optimization issue.
Rafal Jank wrote: Peter Zaitsev wrote: Hello mysql, I was trying to dump a very huge table (5GB, 45 mil rows) to a text dump in sorted order (according to the primary key). The problem is mysql somehow does not want to scan the table by primary key to produce sorted output row by row, but prefers to use filesort, which would take quite a long time in this case: mysql> explain select * from dominf.domip order by ip desc; Try using a limit clause

I mean: check the plan of

select * from dominf.domip order by ip desc limit (number of records - 1);

That's strange, but it works for me... (on a smaller table). You can get the last record with another query.

--
_/_/ _/_/_/ - Rafał Jank [EMAIL PROTECTED] -
_/ _/ _/ _/ _/ Wirtualna Polska SA http://www.wp.pl
_/_/_/_/ _/_/_/ ul. Uphagena 2, 80-237 Gdansk, tel/fax. (58) 5215625
_/ _/ _/ ==* http://szukaj.wp.pl *==
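Rafal's suggestion amounts to comparing the two plans with EXPLAIN; on some 3.23 servers adding a LIMIT can nudge the optimizer away from filesort. This is a sketch of what to check, not a guarantee — whether the plan actually changes depends on the server version, as Steven's and Jeremy's replies in this thread note:

```sql
-- Plan without LIMIT: typically shows "Using filesort" on 3.23.
EXPLAIN SELECT * FROM dominf.domip ORDER BY ip DESC;

-- Plan with a LIMIT (a value near the table's row count effectively
-- covers the whole table) -- compare whether filesort disappears:
EXPLAIN SELECT * FROM dominf.domip ORDER BY ip DESC LIMIT 45000000;
```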
Re: Query optimization issue.
mysql> explain select * from dominf.domip order by ip desc;

'ORDER BY something DESC' does not use an index. It must be ascending. See the manual.

Sincerely, Steven Roussey http://Network54.com/?pp=e
Re: Query optimization
Hello

On Sat, 17 Mar 2001, Bård Farstad wrote: I have a query in my program which is taking too long when I get data in the database. I'm currently testing with ~40,000 articles in the system. [...] Is it possible to optimize this query further?

Have you played with EXPLAIN SELECT and friends? See also the optimization tips in chapter 12 of the manual.

cheers
--
TS
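The EXPLAIN advice above is applied by prefixing the slow statement with the keyword; the table and column names below are hypothetical, since the original query was elided from the quoted message:

```sql
-- EXPLAIN shows, per table, which index (if any) MySQL chose and
-- roughly how many rows it expects to examine. Names are illustrative.
EXPLAIN SELECT a.id, a.title
FROM articles a, categories c
WHERE a.category_id = c.id
  AND c.name = 'news';
```

A row with key = NULL and a large rows estimate usually marks the table scan that is costing the time.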
RE: Query Optimization
Hi,

Without a where clause the optimizer can use an index for in your initial query, you will never get to use an index. How many rows are there in sessobjs? From the start of the sample output, it looks like there are several thousand, so asking a complex question may, indeed, take a long time. You could create a result table and use several cell_id-limited queries to populate it, if the time taken to run the query for each cell_id turns out to be less than the time to run the one-off query. Hope this helps.

Quentin

-----Original Message-----
From: CLEMENTS,BILL (HP-Roseville,ex1) [mailto:[EMAIL PROTECTED]]
Sent: Thursday, 8 March 2001 1:27 p.m.
To: '[EMAIL PROTECTED]'
Subject: Query Optimization

Hi All, This query takes 3-4 minutes to run from the mysql command line:

select DAYOFMONTH(started_date), object_status, count(*) from sessobjs where MONTH(started_date)=3 group by started_date, object_status

Sample Output:

mysql> select DAYOFMONTH(started_date), object_status, count(*) from sessobjs where MONTH(started_date)=3 group by started_date, object_status;
+--------------------------+---------------+----------+
| DAYOFMONTH(started_date) | object_status | count(*) |
+--------------------------+---------------+----------+
|                        1 |             1 |      185 |
|                        1 |             2 |    23647 |
|                        1 |             3 |      257 |
|                        1 |             4 |      304 |
|                        1 |             5 |       69 |
|                        1 |             6 |        9 |
|                        1 |             7 |       13 |
|                        2 |             1 |      403 |
|                        2 |             3 |      271 |
|                        2 |             4 |      283 |
|                        2 |             5 |       32 |
|                        2 |             7 |       14 |
|                        3 |             1 |      811 |
|                        3 |             2 |    20582 |
|                        3 |             3 |      243 |
|                        3 |             4 |      287 |
|                        3 |             5 |      124 |
|                        3 |             6 |        1 |
|                        3 |             7 |       43 |

However, with the addition of a where clause, this query takes only a few seconds to run:

select DAYOFMONTH(started_date), object_status, count(*) from sessobjs where cell_id=1 AND MONTH(started_date)=3 group by started_date, object_status

Sample output:

mysql> select DAYOFMONTH(started_date), object_status, count(*) from sessobjs
    -> where cell_id=29 and MONTH(started_date)=3 group by started_date, object_status;
+--------------------------+---------------+----------+
| DAYOFMONTH(started_date) | object_status | count(*) |
+--------------------------+---------------+----------+
|                        1 |             2 |      205 |
|                        2 |             2 |      195 |
|                        3 |             2 |      177 |
|                        4 |             2 |      223 |
|                        4 |             3 |        1 |
|                        5 |             2 |      194 |
|                        6 |             2 |      195 |
|                        7 |             2 |       12 |
+--------------------------+---------------+----------+
8 rows in set (2.45 sec)

Here is my index structure for the table SESSOBJS:

mysql> show index from sessobjs;
+----------+------------+---------------+--------------+---------------+-----------+-------------+----------+
| Table    | Non_unique | Key_name      | Seq_in_index | Column_name   | Collation | Cardinality | Sub_part |
+----------+------------+---------------+--------------+---------------+-----------+-------------+----------+
| sessobjs |          0 | PRIMARY       |            1 | sobj_id       | A         |     3449788 |     NULL |
| sessobjs |          1 | sess_id       |            1 | sess_id       | A         |        NULL |     NULL |
| sessobjs |          1 | object_status |            1 | object_status | A         |        NULL |     NULL |
| sessobjs |          1 | objda         |            1 | started_date  | A         |        NULL |     NULL |
| sessobjs |          1 | objda         |            2 | object_status | A         |        NULL |     NULL |
| sessobjs |          1 | cell_id       |            1 | cell_id       | A         |        NULL |     NULL |
| sessobjs |          1 | cell_id       |            2 | started_date  | A         |        NULL |     NULL |
| sessobjs |          1 | cell_id       |            3 | object_status | A         |        NULL |     NULL |
| sessobjs |          1 | site_id       |            1 | site_id       | A         |        NULL |     NULL |
| sessobjs |          1 | site_id       |            2 | started_date  | A         |        NULL |     NULL |
| sessobjs |          1 | site_id       |            3 | object_status | A         |        NULL |     NULL |
| sessobjs |          1 | host_id       |
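One likely reason the slow query cannot use the objda (started_date, object_status) index is that MONTH(started_date)=3 wraps the indexed column in a function, which MySQL cannot match against an index. A hedged sketch of a range rewrite that keeps the column bare (the year is assumed from the thread's date):

```sql
-- Hypothetical rewrite: express the month filter as a range on the raw
-- started_date column so an index whose first part is started_date
-- (or cell_id, started_date for the per-cell query) can be used.
SELECT DAYOFMONTH(started_date), object_status, COUNT(*)
FROM sessobjs
WHERE started_date >= '2001-03-01'
  AND started_date <  '2001-04-01'
GROUP BY started_date, object_status;
```

The half-open range (>= first of the month, < first of the next month) covers exactly one month without any function applied to the column.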
Re: query optimization suggestion
Shouldn't the first query have parentheses? As in:

DELETE FROM test WHERE userID=X AND (testID<20 OR testID>80);

Even if it works the other way, parentheses make it clearer what you're trying to accomplish.

-Angela

Michael Griffith wrote: When using MySQL I've learned to avoid OR in any queries as much as possible. Almost always this causes a major speed decrease. Consider this table:

CREATE TABLE test(
  userID int,  # (non-unique)
  testID int,  # (non-unique)
  PRIMARY KEY (testid, userid)
);

Suppose this table is populated with 1,000,000 rows. Then do this:

DELETE FROM test WHERE userID=X AND testID<20 OR testID>80;

This query is EXTREMELY slow because it looks at every row in the DB. A significant improvement can be achieved by splitting it into 2 statements:

DELETE FROM test WHERE userID=X AND testID<20;
DELETE FROM test WHERE userID=X AND testID>80;

On real data I've achieved at least a ten-fold increase doing this. This is easy to optimize from the client side; however, I don't see any reason why this optimization can't or shouldn't be built into the server. Whenever an OR can be split into two separate queries on the same index, this optimization should work. Food for thought.

--
Angela Curtis
Innovative Business Consultants
http://www.ibc2001.com
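Angela's point is about operator precedence: AND binds tighter than OR in SQL, so the unparenthesized form groups differently than intended. A small sketch (the literal 5 is a hypothetical user id):

```sql
-- Without parentheses, AND binds tighter than OR, so this:
DELETE FROM test WHERE userID=5 AND testID<20 OR testID>80;
-- is parsed as:
DELETE FROM test WHERE (userID=5 AND testID<20) OR testID>80;
-- which also deletes OTHER users' rows with testID>80.
-- The intended predicate needs explicit grouping:
DELETE FROM test WHERE userID=5 AND (testID<20 OR testID>80);
```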
Re: query optimization suggestion
Shouldn't the first query have parentheses? As in:

DELETE FROM test WHERE userID=X AND (testID<20 OR testID>80);

Even if it works the other way, parentheses make it clearer what you're trying to accomplish. -Angela

Actually, yes. Sorry about the poor example. My point is that sometimes applications can be sped up by splitting OR queries into two separate queries, and I believe the server could often recognize this, although I am wrong in this particular example. I'll have to check my code for an exact situation.

Michael
Re: query optimization suggestion
In the last episode (Feb 01), Michael Griffith said:

CREATE TABLE test(
  userID int,  # (non-unique)
  testID int,  # (non-unique)
  PRIMARY KEY (testid, userid)
);

Suppose this table is populated with 1,000,000 rows. Then do this:

DELETE FROM test WHERE userID=X AND testID<20 OR testID>80;

This query is EXTREMELY slow because it looks at every row in the DB. A significant improvement can be achieved by splitting it into 2 statements:

DELETE FROM test WHERE userID=X AND testID<20;
DELETE FROM test WHERE userID=X AND testID>80;

Avoiding ORs is a good idea, but I don't think your example demonstrates it. The real problem is ORs on two different columns. You have "(testID<20 AND userID=X) OR testID>80", which can be optimized with a two-column index on (testID,userID). A better example would be "DELETE FROM test WHERE userID=X OR testID=Y".

On real data I've achieved at least a ten-fold increase doing this. This is easy to optimize from the client side; however, I don't see any reason why this optimization can't or shouldn't be built into the server. Whenever an OR can be split into two separate queries on the same index, this optimization should work.

What you're suggesting here is splitting a query into a UNION of two smaller queries, essentially. But this is not always the best way. Keep in mind that your DELETE example doesn't have to worry about records that are true for both halves of your OR (since the record would get deleted by the first statement).
If it were really a SELECT query, mysql would have to run the first half of the query, remember all the record numbers, then run the second query and make sure that no duplicate records are returned:

SELECT * FROM test WHERE userID=X OR testID=Y;
  i.e. full table scan

would become (if mysql supported UNION):

SELECT DISTINCT * FROM (
  SELECT * FROM test WHERE userID=X
  UNION
  SELECT * FROM test WHERE testID=Y
);
  i.e. random record pull based on userID index + random record pull based on testID index + mergesort + remove duplicate records

If there is a lot of overlap between the two OR clauses, or the percentage of returned records is high relative to the size of the table, doing a full table scan would be faster.

--
Dan Nelson [EMAIL PROTECTED]
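Dan's distinction between the DELETE and SELECT cases can be sketched concretely (the literal values 5 and 7 are hypothetical; UNION itself arrived in MySQL 4.0, after this thread):

```sql
-- The OR-on-two-columns case, split into two statements so each can use
-- its own index. This is safe for DELETE: a row removed by the first
-- statement cannot be matched again by the second, so no double handling.
DELETE FROM test WHERE userID = 5;
DELETE FROM test WHERE testID = 7;

-- For a SELECT, the same split needs duplicate elimination, since a row
-- with userID = 5 AND testID = 7 satisfies both halves. UNION (without
-- ALL) removes those duplicates:
SELECT * FROM test WHERE userID = 5
UNION
SELECT * FROM test WHERE testID = 7;
```

As Dan notes, when the two halves overlap heavily or return a large fraction of the table, the deduplication and merge cost can exceed that of a single table scan.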