If you're running version 5.1+ you may wanna take a look at table partitioning 
options you may have.

On Sep 8, 2011, at 2:27 PM, Brandon Phelps wrote:

> Thanks for the reply Andy.  Unfortunately the users will be selecting varying 
> date ranges and new data is constantly coming in, so I am not sure how I 
> could archive/cache the necessary data that would be any more efficient than 
> simply using the database directly.
> 
> 
> On 09/08/2011 02:16 PM, Andrew Moore wrote:
>> Thinking outside the query, is there any archiving that could happen to make
>> your large tables kinder in the range scan?
>> 
>> Andy
>> 
>> On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelps<bphe...@gls.com>  wrote:
>> 
>>> On 09/01/2011 01:32 PM, Brandon Phelps wrote:
>>> 
>>>> On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote:
>>>> 
>>>>> On 9/1/2011 09:42, Brandon Phelps wrote:
>>>>> 
>>>>>> On 09/01/2011 04:59 AM, Jochem van Dieten wrote:
>>>>>>>> ...
>>>>>> 
>>>>>>>> WHERE
>>>>>>>> (open_dt>= '2011-08-30 00:00:00' OR close_dt>= '2011-08-30
>>>>>> 00:00:00')
>>>>>>>> AND (open_dt<= '2011-08-30 12:36:53' OR close_dt<= '2011-08-30
>>>>>> 12:36:53')
>>>>>> 
>>>>>>> In that case your logic here simplifies to:
>>>>>>> WHERE
>>>>>>> open_dt>= '2011-08-30 00:00:00'
>>>>>>> AND
>>>>>>> close_dt<= '2011-08-30 12:36:53'
>>>>>> 
>>>>>>> Now add an index over open_dt and close_dt and see what happens.
>>>>>> 
>>>>>>> Jochem
>>>>>> 
>>>>>> Jochem,
>>>>>> 
>>>>>> I can't really use your WHERE logic because I also need to retrieve
>>>>>> results where the open_dt time is out of the range specified. For
>>>>>> example, a very large file download might span multiple days so given
>>>>>> your logic if the connection was started 2 days ago and I want to pull 1
>>>>>> days worth of connections, I would miss that entry. Basically I want to
>>>>>> SELECT all of the records that were opened OR closed during the
>>>>>> specified time period, ie. if any activity happened between my start and
>>>>>> end dates, I need to see that record.
>>>>>> 
>>>>>> Any other ideas?
>>>>>> 
>>>>>> 
>>>>> I believe Jochem was on the right track but he got his dates reversed.
>>>>> 
>>>>> Let's try a little ASCII art to show the situation. I will setup a query
>>>>> window with two markers (s) and (e). Events will be marked by |----| 
>>>>> markers
>>>>> showing their durations.
>>>>> 
>>>>> a) (s) (e)
>>>>> b) |---|
>>>>> c) |---|
>>>>> d) |---|
>>>>> e) |--------------------|
>>>>> f) |---|
>>>>> g) |---|
>>>>> 
>>>>> To describe these situations:
>>>>> a) is the window for which you want to query (s) is the starting time and
>>>>> (e) is the ending time for the date range you are interested in.
>>>>> b) the event starts and stops before your window exists. It won't be part
>>>>> of your results.
>>>>> c) the event starts before the window but ends within the window -
>>>>> include this
>>>>> d) the event starts and ends within the window - include this
>>>>> e) the event starts before the window and ends after the window - include
>>>>> this
>>>>> f) the event starts inside the window but ends beyond the window -
>>>>> include this.
>>>>> g) the event starts and ends beyond the window - exclude this.
>>>>> 
>>>>> In order to get every event in the range of c-f, here is what you need
>>>>> for a WHERE clause
>>>>> 
>>>>> WHERE start<= (ending time) and end>= (starting time)
>>>>> 
>>>>> Try that and let us know the results.
>>>>> 
>>>> 
>>>> Thanks Jochem and Shawn, however the following two queries result in the
>>>> exact same EXPLAIN output: (I hope the tables don't wrap too early for you)
>>>> 
>>>> Old method:
>>>> SELECT
>>>> sc.open_dt,
>>>> sc.close_dt,
>>>> sc.protocol,
>>>> INET_NTOA(sc.src_address) AS src_address,
>>>> sc.src_port,
>>>> INET_NTOA(sc.dst_address) AS dst_address,
>>>> sc.dst_port,
>>>> sc.sent,
>>>> sc.rcvd,
>>>> spm.desc AS src_port_desc,
>>>> dpm.desc AS dst_port_desc
>>>> FROM firewall_connections AS sc
>>>> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
>>>> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
>>>> WHERE
>>>> (open_dt>= '2011-08-31 09:53:31' OR close_dt>= '2011-08-31 09:53:31')
>>>> AND (open_dt<= '2011-09-01 09:53:31' OR close_dt<= '2011-09-01
>>>> 09:53:31')
>>>> ORDER BY rcvd DESC
>>>> LIMIT 0, 10;
>>>> 
>>>> New method with BTREE index on open_dt, close_dt (index name is
>>>> ndx_open_close_dt):
>>>> SELECT
>>>> sc.open_dt,
>>>> sc.close_dt,
>>>> sc.protocol,
>>>> INET_NTOA(sc.src_address) AS src_address,
>>>> sc.src_port,
>>>> INET_NTOA(sc.dst_address) AS dst_address,
>>>> sc.dst_port,
>>>> sc.sent,
>>>> sc.rcvd,
>>>> spm.desc AS src_port_desc,
>>>> dpm.desc AS dst_port_desc
>>>> FROM firewall_connections AS sc
>>>> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
>>>> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
>>>> WHERE
>>>> open_dt<= '2011-09-01 09:53:31' AND close_dt>= '2011-08-31 09:53:31'
>>>> ORDER BY rcvd DESC
>>>> LIMIT 0, 10;
>>>> 
>>>> EXPLAIN output for old method:
>>>> +----+-------------+-------+--**------+-----------------------**
>>>> ----+----------+---------+----**----------------+------+------**-------+
>>>> | id | select_type | table | type | possible_keys | key | key_len | ref |
>>>> rows | Extra |
>>>> +----+-------------+-------+--**------+-----------------------**
>>>> ----+----------+---------+----**----------------+------+------**-------+
>>>> | 1 | SIMPLE | sc | index | open_dt,ndx_open_close_dt | ndx_rcvd | 4 |
>>>> NULL | 10 | Using where |
>>>> | 1 | SIMPLE | spm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.src_port |
>>>> 1 | |
>>>> | 1 | SIMPLE | dpm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.dst_port |
>>>> 1 | |
>>>> +----+-------------+-------+--**------+-----------------------**
>>>> ----+----------+---------+----**----------------+------+------**-------+
>>>> 
>>>> EXPLAIN output for new method with new index:
>>>> +----+-------------+-------+--**------+-----------------------**
>>>> ----+----------+---------+----**----------------+------+------**-------+
>>>> | id | select_type | table | type | possible_keys | key | key_len | ref |
>>>> rows | Extra |
>>>> +----+-------------+-------+--**------+-----------------------**
>>>> ----+----------+---------+----**----------------+------+------**-------+
>>>> | 1 | SIMPLE | sc | index | open_dt,ndx_open_close_dt | ndx_rcvd | 4 |
>>>> NULL | 10 | Using where |
>>>> | 1 | SIMPLE | spm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.src_port |
>>>> 1 | |
>>>> | 1 | SIMPLE | dpm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.dst_port |
>>>> 1 | |
>>>> +----+-------------+-------+--**------+-----------------------**
>>>> ----+----------+---------+----**----------------+------+------**-------+
>>>> 
>>>> SHOW INDEX:
>>>> +----------------------+------**------+-------------------+---**
>>>> -----------+-------------+----**-------+-------------+--------**
>>>> --+--------+------+-----------**-+---------+
>>>> | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation |
>>>> Cardinality | Sub_part | Packed | Null | Index_type | Comment |
>>>> +----------------------+------**------+-------------------+---**
>>>> -----------+-------------+----**-------+-------------+--------**
>>>> --+--------+------+-----------**-+---------+
>>>> | firewall_connections | 1 | ndx_open_close_dt | 1 | open_dt | A | 1342691
>>>> | NULL | NULL | | BTREE | |
>>>> | firewall_connections | 1 | ndx_open_close_dt | 2 | close_dt | A |
>>>> 6377783 | NULL | NULL | | BTREE | |
>>>> +----------------------+------**------+-------------------+---**
>>>> -----------+-------------+----**-------+-------------+--------**
>>>> --+--------+------+-----------**-+---------+
>>>> 
>>>> 
>>>> Although right now the queries do seem to be executing much faster,
>>>> although I'm not quite sure why. And I'm not sure why the new
>>>> ndx_open_close_dt isn't being used either.
>>>> 
>>>> -Brandon
>>>> 
>>>> 
>>> I am still having a big issue with my query as seen above.  The table is up
>>> to around 32 million records at the moment and either of the two SELECT
>>> queries above take a very long time to run.  Is there anything at all I can
>>> do to speed things up?  It seems that changing the format of the WHERE
>>> clause did not help at all, as the EXPLAIN output is exactly the same for
>>> both version.  I also tried adding an index on (open_dt, close_dt, rcvd) but
>>> that index does not get used.
>>> 
>>> Any other ideas?
>>> 
>>> Thanks in advance,
>>> 
>>> 
>>> Brandon
>>> 
>>> --
>>> MySQL General Mailing List
>>> For list archives: http://lists.mysql.com/mysql
>>> To unsubscribe:    http://lists.mysql.com/mysql?**
>>> unsub=eroomy...@gmail.com<http://lists.mysql.com/mysql?unsub=eroomy...@gmail.com>
>>> 
>>> 
>> 
> 
> -- 
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:    http://lists.mysql.com/mysql?unsub=mmano...@liquidation.com
> 


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql?unsub=arch...@jab.org

Reply via email to