Thinking outside the query, is there any archiving that could happen to make your large tables kinder in the range scan?
Andy On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelps <bphe...@gls.com> wrote: > On 09/01/2011 01:32 PM, Brandon Phelps wrote: > >> On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote: >> >>> On 9/1/2011 09:42, Brandon Phelps wrote: >>> >>>> On 09/01/2011 04:59 AM, Jochem van Dieten wrote: >>>> > > ... >>>> >>>> > > WHERE >>>> > > (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 >>>> 00:00:00') >>>> > > AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 >>>> 12:36:53') >>>> >>>> > In that case your logic here simplifies to: >>>> > WHERE >>>> > open_dt >= '2011-08-30 00:00:00' >>>> > AND >>>> > close_dt <= '2011-08-30 12:36:53' >>>> >>>> > Now add an index over open_dt and close_dt and see what happens. >>>> >>>> > Jochem >>>> >>>> Jochem, >>>> >>>> I can't really use your WHERE logic because I also need to retrieve >>>> results where the open_dt time is out of the range specified. For >>>> example, a very large file download might span multiple days so given >>>> your logic if the connection was started 2 days ago and I want to pull 1 >>>> days worth of connections, I would miss that entry. Basically I want to >>>> SELECT all of the records that were opened OR closed during the >>>> specified time period, ie. if any activity happened between my start and >>>> end dates, I need to see that record. >>>> >>>> Any other ideas? >>>> >>>> >>> I believe Jochem was on the right track but he got his dates reversed. >>> >>> Let's try a little ASCII art to show the situation. I will setup a query >>> window with two markers (s) and (e). Events will be marked by |----| markers >>> showing their durations. >>> >>> a) (s) (e) >>> b) |---| >>> c) |---| >>> d) |---| >>> e) |--------------------| >>> f) |---| >>> g) |---| >>> >>> To describe these situations: >>> a) is the window for which you want to query (s) is the starting time and >>> (e) is the ending time for the date range you are interested in. >>> b) the event starts and stops before your window exists. It won't be part >>> of your results. >>> c) the event starts before the window but ends within the window - >>> include this >>> d) the event starts and ends within the window - include this >>> e) the event starts before the window and ends after the window - include >>> this >>> f) the event starts inside the window but ends beyond the window - >>> include this. >>> g) the event starts and ends beyond the window - exclude this. >>> >>> In order to get every event in the range of c-f, here is what you need >>> for a WHERE clause >>> >>> WHERE start <= (ending time) and end >= (starting time) >>> >>> Try that and let us know the results. >>> >> >> Thanks Jochem and Shawn, however the following two queries result in the >> exact same EXPLAIN output: (I hope the tables don't wrap too early for you) >> >> Old method: >> SELECT >> sc.open_dt, >> sc.close_dt, >> sc.protocol, >> INET_NTOA(sc.src_address) AS src_address, >> sc.src_port, >> INET_NTOA(sc.dst_address) AS dst_address, >> sc.dst_port, >> sc.sent, >> sc.rcvd, >> spm.desc AS src_port_desc, >> dpm.desc AS dst_port_desc >> FROM firewall_connections AS sc >> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port >> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port >> WHERE >> (open_dt >= '2011-08-31 09:53:31' OR close_dt >= '2011-08-31 09:53:31') >> AND (open_dt <= '2011-09-01 09:53:31' OR close_dt <= '2011-09-01 >> 09:53:31') >> ORDER BY rcvd DESC >> LIMIT 0, 10; >> >> New method with BTREE index on open_dt, close_dt (index name is >> ndx_open_close_dt): >> SELECT >> sc.open_dt, >> sc.close_dt, >> sc.protocol, >> INET_NTOA(sc.src_address) AS src_address, >> sc.src_port, >> INET_NTOA(sc.dst_address) AS dst_address, >> sc.dst_port, >> sc.sent, >> sc.rcvd, >> spm.desc AS src_port_desc, >> dpm.desc AS dst_port_desc >> FROM firewall_connections AS sc >> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port >> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port >> WHERE >> open_dt <= '2011-09-01 09:53:31' AND close_dt >= '2011-08-31 09:53:31' >> ORDER BY rcvd DESC >> LIMIT 0, 10; >> >> EXPLAIN output for old method: >> +----+-------------+-------+--**------+-----------------------** >> ----+----------+---------+----**----------------+------+------**-------+ >> | id | select_type | table | type | possible_keys | key | key_len | ref | >> rows | Extra | >> +----+-------------+-------+--**------+-----------------------** >> ----+----------+---------+----**----------------+------+------**-------+ >> | 1 | SIMPLE | sc | index | open_dt,ndx_open_close_dt | ndx_rcvd | 4 | >> NULL | 10 | Using where | >> | 1 | SIMPLE | spm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.src_port | >> 1 | | >> | 1 | SIMPLE | dpm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.dst_port | >> 1 | | >> +----+-------------+-------+--**------+-----------------------** >> ----+----------+---------+----**----------------+------+------**-------+ >> >> EXPLAIN output for new method with new index: >> +----+-------------+-------+--**------+-----------------------** >> ----+----------+---------+----**----------------+------+------**-------+ >> | id | select_type | table | type | possible_keys | key | key_len | ref | >> rows | Extra | >> +----+-------------+-------+--**------+-----------------------** >> ----+----------+---------+----**----------------+------+------**-------+ >> | 1 | SIMPLE | sc | index | open_dt,ndx_open_close_dt | ndx_rcvd | 4 | >> NULL | 10 | Using where | >> | 1 | SIMPLE | spm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.src_port | >> 1 | | >> | 1 | SIMPLE | dpm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.dst_port | >> 1 | | >> +----+-------------+-------+--**------+-----------------------** >> ----+----------+---------+----**----------------+------+------**-------+ >> >> SHOW INDEX: >> +----------------------+------**------+-------------------+---** >> -----------+-------------+----**-------+-------------+--------** >> --+--------+------+-----------**-+---------+ >> | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | >> Cardinality | Sub_part | Packed | Null | Index_type | Comment | >> +----------------------+------**------+-------------------+---** >> -----------+-------------+----**-------+-------------+--------** >> --+--------+------+-----------**-+---------+ >> | firewall_connections | 1 | ndx_open_close_dt | 1 | open_dt | A | 1342691 >> | NULL | NULL | | BTREE | | >> | firewall_connections | 1 | ndx_open_close_dt | 2 | close_dt | A | >> 6377783 | NULL | NULL | | BTREE | | >> +----------------------+------**------+-------------------+---** >> -----------+-------------+----**-------+-------------+--------** >> --+--------+------+-----------**-+---------+ >> >> >> Although right now the queries do seem to be executing much faster, >> although I'm not quite sure why. And I'm not sure why the new >> ndx_open_close_dt isn't being used either. >> >> -Brandon >> >> > I am still having a big issue with my query as seen above. The table is up > to around 32 million records at the moment and either of the two SELECT > queries above take a very long time to run. Is there anything at all I can > do to speed things up? It seems that changing the format of the WHERE > clause did not help at all, as the EXPLAIN output is exactly the same for > both version. I also tried adding an index on (open_dt, close_dt, rcvd) but > that index does not get used. > > Any other ideas? > > Thanks in advance, > > > Brandon > > -- > MySQL General Mailing List > For list archives: http://lists.mysql.com/mysql > To unsubscribe: http://lists.mysql.com/mysql?** > unsub=eroomy...@gmail.com<http://lists.mysql.com/mysql?unsub=eroomy...@gmail.com> > >