If you're running version 5.1+ you may wanna take a look at table partitioning options you may have.
On Sep 8, 2011, at 2:27 PM, Brandon Phelps wrote: > Thanks for the reply Andy. Unfortunately the users will be selecting varying > date ranges and new data is constantly coming in, so I am not sure how I > could archive/cache the necessary data that would be any more efficient than > simply using the database directly. > > > On 09/08/2011 02:16 PM, Andrew Moore wrote: >> Thinking outside the query, is there any archiving that could happen to make >> your large tables kinder in the range scan? >> >> Andy >> >> On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelps<bphe...@gls.com> wrote: >> >>> On 09/01/2011 01:32 PM, Brandon Phelps wrote: >>> >>>> On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote: >>>> >>>>> On 9/1/2011 09:42, Brandon Phelps wrote: >>>>> >>>>>> On 09/01/2011 04:59 AM, Jochem van Dieten wrote: >>>>>>>> ... >>>>>> >>>>>>>> WHERE >>>>>>>> (open_dt>= '2011-08-30 00:00:00' OR close_dt>= '2011-08-30 >>>>>> 00:00:00') >>>>>>>> AND (open_dt<= '2011-08-30 12:36:53' OR close_dt<= '2011-08-30 >>>>>> 12:36:53') >>>>>> >>>>>>> In that case your logic here simplifies to: >>>>>>> WHERE >>>>>>> open_dt>= '2011-08-30 00:00:00' >>>>>>> AND >>>>>>> close_dt<= '2011-08-30 12:36:53' >>>>>> >>>>>>> Now add an index over open_dt and close_dt and see what happens. >>>>>> >>>>>>> Jochem >>>>>> >>>>>> Jochem, >>>>>> >>>>>> I can't really use your WHERE logic because I also need to retrieve >>>>>> results where the open_dt time is out of the range specified. For >>>>>> example, a very large file download might span multiple days so given >>>>>> your logic if the connection was started 2 days ago and I want to pull 1 >>>>>> days worth of connections, I would miss that entry. Basically I want to >>>>>> SELECT all of the records that were opened OR closed during the >>>>>> specified time period, ie. if any activity happened between my start and >>>>>> end dates, I need to see that record. >>>>>> >>>>>> Any other ideas? >>>>>> >>>>>> >>>>> I believe Jochem was on the right track but he got his dates reversed. >>>>> >>>>> Let's try a little ASCII art to show the situation. I will setup a query >>>>> window with two markers (s) and (e). Events will be marked by |----| >>>>> markers >>>>> showing their durations. >>>>> >>>>> a) (s) (e) >>>>> b) |---| >>>>> c) |---| >>>>> d) |---| >>>>> e) |--------------------| >>>>> f) |---| >>>>> g) |---| >>>>> >>>>> To describe these situations: >>>>> a) is the window for which you want to query (s) is the starting time and >>>>> (e) is the ending time for the date range you are interested in. >>>>> b) the event starts and stops before your window exists. It won't be part >>>>> of your results. >>>>> c) the event starts before the window but ends within the window - >>>>> include this >>>>> d) the event starts and ends within the window - include this >>>>> e) the event starts before the window and ends after the window - include >>>>> this >>>>> f) the event starts inside the window but ends beyond the window - >>>>> include this. >>>>> g) the event starts and ends beyond the window - exclude this. >>>>> >>>>> In order to get every event in the range of c-f, here is what you need >>>>> for a WHERE clause >>>>> >>>>> WHERE start<= (ending time) and end>= (starting time) >>>>> >>>>> Try that and let us know the results. >>>>> >>>> >>>> Thanks Jochem and Shawn, however the following two queries result in the >>>> exact same EXPLAIN output: (I hope the tables don't wrap too early for you) >>>> >>>> Old method: >>>> SELECT >>>> sc.open_dt, >>>> sc.close_dt, >>>> sc.protocol, >>>> INET_NTOA(sc.src_address) AS src_address, >>>> sc.src_port, >>>> INET_NTOA(sc.dst_address) AS dst_address, >>>> sc.dst_port, >>>> sc.sent, >>>> sc.rcvd, >>>> spm.desc AS src_port_desc, >>>> dpm.desc AS dst_port_desc >>>> FROM firewall_connections AS sc >>>> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port >>>> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port >>>> WHERE >>>> (open_dt>= '2011-08-31 09:53:31' OR close_dt>= '2011-08-31 09:53:31') >>>> AND (open_dt<= '2011-09-01 09:53:31' OR close_dt<= '2011-09-01 >>>> 09:53:31') >>>> ORDER BY rcvd DESC >>>> LIMIT 0, 10; >>>> >>>> New method with BTREE index on open_dt, close_dt (index name is >>>> ndx_open_close_dt): >>>> SELECT >>>> sc.open_dt, >>>> sc.close_dt, >>>> sc.protocol, >>>> INET_NTOA(sc.src_address) AS src_address, >>>> sc.src_port, >>>> INET_NTOA(sc.dst_address) AS dst_address, >>>> sc.dst_port, >>>> sc.sent, >>>> sc.rcvd, >>>> spm.desc AS src_port_desc, >>>> dpm.desc AS dst_port_desc >>>> FROM firewall_connections AS sc >>>> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port >>>> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port >>>> WHERE >>>> open_dt<= '2011-09-01 09:53:31' AND close_dt>= '2011-08-31 09:53:31' >>>> ORDER BY rcvd DESC >>>> LIMIT 0, 10; >>>> >>>> EXPLAIN output for old method: >>>> +----+-------------+-------+--**------+-----------------------** >>>> ----+----------+---------+----**----------------+------+------**-------+ >>>> | id | select_type | table | type | possible_keys | key | key_len | ref | >>>> rows | Extra | >>>> +----+-------------+-------+--**------+-----------------------** >>>> ----+----------+---------+----**----------------+------+------**-------+ >>>> | 1 | SIMPLE | sc | index | open_dt,ndx_open_close_dt | ndx_rcvd | 4 | >>>> NULL | 10 | Using where | >>>> | 1 | SIMPLE | spm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.src_port | >>>> 1 | | >>>> | 1 | SIMPLE | dpm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.dst_port | >>>> 1 | | >>>> +----+-------------+-------+--**------+-----------------------** >>>> ----+----------+---------+----**----------------+------+------**-------+ >>>> >>>> EXPLAIN output for new method with new index: >>>> +----+-------------+-------+--**------+-----------------------** >>>> ----+----------+---------+----**----------------+------+------**-------+ >>>> | id | select_type | table | type | possible_keys | key | key_len | ref | >>>> rows | Extra | >>>> +----+-------------+-------+--**------+-----------------------** >>>> ----+----------+---------+----**----------------+------+------**-------+ >>>> | 1 | SIMPLE | sc | index | open_dt,ndx_open_close_dt | ndx_rcvd | 4 | >>>> NULL | 10 | Using where | >>>> | 1 | SIMPLE | spm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.src_port | >>>> 1 | | >>>> | 1 | SIMPLE | dpm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.dst_port | >>>> 1 | | >>>> +----+-------------+-------+--**------+-----------------------** >>>> ----+----------+---------+----**----------------+------+------**-------+ >>>> >>>> SHOW INDEX: >>>> +----------------------+------**------+-------------------+---** >>>> -----------+-------------+----**-------+-------------+--------** >>>> --+--------+------+-----------**-+---------+ >>>> | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | >>>> Cardinality | Sub_part | Packed | Null | Index_type | Comment | >>>> +----------------------+------**------+-------------------+---** >>>> -----------+-------------+----**-------+-------------+--------** >>>> --+--------+------+-----------**-+---------+ >>>> | firewall_connections | 1 | ndx_open_close_dt | 1 | open_dt | A | 1342691 >>>> | NULL | NULL | | BTREE | | >>>> | firewall_connections | 1 | ndx_open_close_dt | 2 | close_dt | A | >>>> 6377783 | NULL | NULL | | BTREE | | >>>> +----------------------+------**------+-------------------+---** >>>> -----------+-------------+----**-------+-------------+--------** >>>> --+--------+------+-----------**-+---------+ >>>> >>>> >>>> Although right now the queries do seem to be executing much faster, >>>> although I'm not quite sure why. And I'm not sure why the new >>>> ndx_open_close_dt isn't being used either. >>>> >>>> -Brandon >>>> >>>> >>> I am still having a big issue with my query as seen above. The table is up >>> to around 32 million records at the moment and either of the two SELECT >>> queries above take a very long time to run. Is there anything at all I can >>> do to speed things up? It seems that changing the format of the WHERE >>> clause did not help at all, as the EXPLAIN output is exactly the same for >>> both version. I also tried adding an index on (open_dt, close_dt, rcvd) but >>> that index does not get used. >>> >>> Any other ideas? >>> >>> Thanks in advance, >>> >>> >>> Brandon >>> >>> -- >>> MySQL General Mailing List >>> For list archives: http://lists.mysql.com/mysql >>> To unsubscribe: http://lists.mysql.com/mysql?** >>> unsub=eroomy...@gmail.com<http://lists.mysql.com/mysql?unsub=eroomy...@gmail.com> >>> >>> >> > > -- > MySQL General Mailing List > For list archives: http://lists.mysql.com/mysql > To unsubscribe: http://lists.mysql.com/mysql?unsub=mmano...@liquidation.com > -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql?unsub=arch...@jab.org