Correct me if I'm wrong. You're wanting to get all records that have an 
open_date or a close_date between two times.

If that's correct, you might be able to get an index_merge by doing a query 
like:

WHERE ((starting time)<=open_dt<= (ending time)) OR ((starting 
time)<=close_dt<=(ending time))

and creating two indexes (one on 'open_dt' and the other on 'close_dt')

http://dev.mysql.com/doc/refman/5.1/en/index-merge-optimization.html

Regards,
Derek

On Sep 8, 2011, at 2:50 PM, Brandon Phelps wrote:

> Andy,
> 
> The queries take minutes to run.  MySQL is 5.1.54 and it's running on Ubuntu 
> server 11.04.  Unfortunately the machine only has 2GB of RAM but no other 
> major daemons are running on the machine.  We are running RAID 1 (mirroring) 
> with 1TB drives.  The tables in question here are all MyISAM.  When running 
> with the LIMIT 10 my EXPLAIN is:
> 
> +----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
> | id | select_type | table | type   | possible_keys       | key               
>   | key_len | ref                | rows     | Extra                       |
> +----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
> |  1 | SIMPLE      | sc    | range  | ndx_open_close_rcvd | 
> ndx_open_close_rcvd | 8       | NULL               | 32393316 | Using where; 
> Using filesort |
> |  1 | SIMPLE      | spm   | eq_ref | PRIMARY             | PRIMARY           
>   | 2       | syslog.sc.src_port |        1 |                             |
> |  1 | SIMPLE      | dpm   | eq_ref | PRIMARY             | PRIMARY           
>   | 2       | syslog.sc.dst_port |        1 |                             |
> +----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
> 
> When I remove the LIMIT 10 I get:
> 
> ----+-------------+-------+--------+---------------------+---------+---------+--------------------+----------+-----------------------------+
> | id | select_type | table | type   | possible_keys       | key     | key_len 
> | ref                | rows     | Extra                       |
> +----+-------------+-------+--------+---------------------+---------+---------+--------------------+----------+-----------------------------+
> |  1 | SIMPLE      | sc    | ALL    | ndx_open_close_rcvd | NULL    | NULL    
> | NULL               | 32393330 | Using where; Using filesort |
> |  1 | SIMPLE      | spm   | eq_ref | PRIMARY             | PRIMARY | 2       
> | syslog.sc.src_port |        1 |                             |
> |  1 | SIMPLE      | dpm   | eq_ref | PRIMARY             | PRIMARY | 2       
> | syslog.sc.dst_port |        1 |                             |
> +----+-------------+-------+--------+---------------------+---------+---------+--------------------+----------+-----------------------------+
> 
> Thanks for all your help thus far.
> 
> On 09/08/2011 02:38 PM, Andrew Moore wrote:
>> I don't think I saw any query timings in the emails (maybe I missed them).
>> 
>> What version of MySQL are you currently using?
>> What does the explain look like when your remove the limit 10?
>> Is your server tuned for MyISAM or InnoDB?
>> What kind of disk setup is in use?
>> How much memory is in your machine?
>> 
>> 
>> On Thu, Sep 8, 2011 at 7:27 PM, Brandon Phelps<bphe...@gls.com>  wrote:
>> 
>>> Thanks for the reply Andy.  Unfortunately the users will be selecting
>>> varying date ranges and new data is constantly coming in, so I am not sure
>>> how I could archive/cache the necessary data that would be any more
>>> efficient than simply using the database directly.
>>> 
>>> 
>>> 
>>> On 09/08/2011 02:16 PM, Andrew Moore wrote:
>>> 
>>>> Thinking outside the query, is there any archiving that could happen to
>>>> make
>>>> your large tables kinder in the range scan?
>>>> 
>>>> Andy
>>>> 
>>>> On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelps<bphe...@gls.com>   wrote:
>>>> 
>>>>  On 09/01/2011 01:32 PM, Brandon Phelps wrote:
>>>>> 
>>>>>  On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote:
>>>>>> 
>>>>>>  On 9/1/2011 09:42, Brandon Phelps wrote:
>>>>>>> 
>>>>>>>  On 09/01/2011 04:59 AM, Jochem van Dieten wrote:
>>>>>>>> 
>>>>>>>>> ...
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>  WHERE
>>>>>>>>>> (open_dt>= '2011-08-30 00:00:00' OR close_dt>= '2011-08-30
>>>>>>>>>> 
>>>>>>>>> 00:00:00')
>>>>>>>> 
>>>>>>>>> AND (open_dt<= '2011-08-30 12:36:53' OR close_dt<= '2011-08-30
>>>>>>>>>> 
>>>>>>>>> 12:36:53')
>>>>>>>> 
>>>>>>>>  In that case your logic here simplifies to:
>>>>>>>>> WHERE
>>>>>>>>> open_dt>= '2011-08-30 00:00:00'
>>>>>>>>> AND
>>>>>>>>> close_dt<= '2011-08-30 12:36:53'
>>>>>>>>> 
>>>>>>>> 
>>>>>>>>  Now add an index over open_dt and close_dt and see what happens.
>>>>>>>>> 
>>>>>>>> 
>>>>>>>>  Jochem
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> Jochem,
>>>>>>>> 
>>>>>>>> I can't really use your WHERE logic because I also need to retrieve
>>>>>>>> results where the open_dt time is out of the range specified. For
>>>>>>>> example, a very large file download might span multiple days so given
>>>>>>>> your logic if the connection was started 2 days ago and I want to pull
>>>>>>>> 1
>>>>>>>> days worth of connections, I would miss that entry. Basically I want
>>>>>>>> to
>>>>>>>> SELECT all of the records that were opened OR closed during the
>>>>>>>> specified time period, ie. if any activity happened between my start
>>>>>>>> and
>>>>>>>> end dates, I need to see that record.
>>>>>>>> 
>>>>>>>> Any other ideas?
>>>>>>>> 
>>>>>>>> 
>>>>>>>>  I believe Jochem was on the right track but he got his dates
>>>>>>> reversed.
>>>>>>> 
>>>>>>> Let's try a little ASCII art to show the situation. I will setup a
>>>>>>> query
>>>>>>> window with two markers (s) and (e). Events will be marked by |----|
>>>>>>> markers
>>>>>>> showing their durations.
>>>>>>> 
>>>>>>> a) (s) (e)
>>>>>>> b) |---|
>>>>>>> c) |---|
>>>>>>> d) |---|
>>>>>>> e) |--------------------|
>>>>>>> f) |---|
>>>>>>> g) |---|
>>>>>>> 
>>>>>>> To describe these situations:
>>>>>>> a) is the window for which you want to query (s) is the starting time
>>>>>>> and
>>>>>>> (e) is the ending time for the date range you are interested in.
>>>>>>> b) the event starts and stops before your window exists. It won't be
>>>>>>> part
>>>>>>> of your results.
>>>>>>> c) the event starts before the window but ends within the window -
>>>>>>> include this
>>>>>>> d) the event starts and ends within the window - include this
>>>>>>> e) the event starts before the window and ends after the window -
>>>>>>> include
>>>>>>> this
>>>>>>> f) the event starts inside the window but ends beyond the window -
>>>>>>> include this.
>>>>>>> g) the event starts and ends beyond the window - exclude this.
>>>>>>> 
>>>>>>> In order to get every event in the range of c-f, here is what you need
>>>>>>> for a WHERE clause
>>>>>>> 
>>>>>>> WHERE start<= (ending time) and end>= (starting time)
>>>>>>> 
>>>>>>> Try that and let us know the results.
>>>>>>> 
>>>>>>> 
>>>>>> Thanks Jochem and Shawn, however the following two queries result in the
>>>>>> exact same EXPLAIN output: (I hope the tables don't wrap too early for
>>>>>> you)
>>>>>> 
>>>>>> Old method:
>>>>>> SELECT
>>>>>> sc.open_dt,
>>>>>> sc.close_dt,
>>>>>> sc.protocol,
>>>>>> INET_NTOA(sc.src_address) AS src_address,
>>>>>> sc.src_port,
>>>>>> INET_NTOA(sc.dst_address) AS dst_address,
>>>>>> sc.dst_port,
>>>>>> sc.sent,
>>>>>> sc.rcvd,
>>>>>> spm.desc AS src_port_desc,
>>>>>> dpm.desc AS dst_port_desc
>>>>>> FROM firewall_connections AS sc
>>>>>> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
>>>>>> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
>>>>>> WHERE
>>>>>> (open_dt>= '2011-08-31 09:53:31' OR close_dt>= '2011-08-31 09:53:31')
>>>>>> AND (open_dt<= '2011-09-01 09:53:31' OR close_dt<= '2011-09-01
>>>>>> 09:53:31')
>>>>>> ORDER BY rcvd DESC
>>>>>> LIMIT 0, 10;
>>>>>> 
>>>>>> New method with BTREE index on open_dt, close_dt (index name is
>>>>>> ndx_open_close_dt):
>>>>>> SELECT
>>>>>> sc.open_dt,
>>>>>> sc.close_dt,
>>>>>> sc.protocol,
>>>>>> INET_NTOA(sc.src_address) AS src_address,
>>>>>> sc.src_port,
>>>>>> INET_NTOA(sc.dst_address) AS dst_address,
>>>>>> sc.dst_port,
>>>>>> sc.sent,
>>>>>> sc.rcvd,
>>>>>> spm.desc AS src_port_desc,
>>>>>> dpm.desc AS dst_port_desc
>>>>>> FROM firewall_connections AS sc
>>>>>> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
>>>>>> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
>>>>>> WHERE
>>>>>> open_dt<= '2011-09-01 09:53:31' AND close_dt>= '2011-08-31 09:53:31'
>>>>>> ORDER BY rcvd DESC
>>>>>> LIMIT 0, 10;
>>>>>> 
>>>>>> EXPLAIN output for old method:
>>>>>> +----+-------------+-------+--****------+---------------------**--**
>>>>>> ----+----------+---------+----****----------------+------+----**
>>>>>> --**-------+
>>>>>> | id | select_type | table | type | possible_keys | key | key_len | ref
>>>>>> |
>>>>>> rows | Extra |
>>>>>> +----+-------------+-------+--****------+---------------------**--**
>>>>>> ----+----------+---------+----****----------------+------+----**
>>>>>> --**-------+
>>>>>> | 1 | SIMPLE | sc | index | open_dt,ndx_open_close_dt | ndx_rcvd | 4 |
>>>>>> NULL | 10 | Using where |
>>>>>> | 1 | SIMPLE | spm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.src_port
>>>>>> |
>>>>>> 1 | |
>>>>>> | 1 | SIMPLE | dpm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.dst_port
>>>>>> |
>>>>>> 1 | |
>>>>>> +----+-------------+-------+--****------+---------------------**--**
>>>>>> ----+----------+---------+----****----------------+------+----**
>>>>>> --**-------+
>>>>>> 
>>>>>> EXPLAIN output for new method with new index:
>>>>>> +----+-------------+-------+--****------+---------------------**--**
>>>>>> ----+----------+---------+----****----------------+------+----**
>>>>>> --**-------+
>>>>>> | id | select_type | table | type | possible_keys | key | key_len | ref
>>>>>> |
>>>>>> rows | Extra |
>>>>>> +----+-------------+-------+--****------+---------------------**--**
>>>>>> ----+----------+---------+----****----------------+------+----**
>>>>>> --**-------+
>>>>>> | 1 | SIMPLE | sc | index | open_dt,ndx_open_close_dt | ndx_rcvd | 4 |
>>>>>> NULL | 10 | Using where |
>>>>>> | 1 | SIMPLE | spm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.src_port
>>>>>> |
>>>>>> 1 | |
>>>>>> | 1 | SIMPLE | dpm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.dst_port
>>>>>> |
>>>>>> 1 | |
>>>>>> +----+-------------+-------+--****------+---------------------**--**
>>>>>> ----+----------+---------+----****----------------+------+----**
>>>>>> --**-------+
>>>>>> 
>>>>>> SHOW INDEX:
>>>>>> +----------------------+------****------+-------------------+-**--**
>>>>>> -----------+-------------+----****-------+-------------+------**--**
>>>>>> --+--------+------+-----------****-+---------+
>>>>>> | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation
>>>>>> |
>>>>>> Cardinality | Sub_part | Packed | Null | Index_type | Comment |
>>>>>> +----------------------+------****------+-------------------+-**--**
>>>>>> -----------+-------------+----****-------+-------------+------**--**
>>>>>> --+--------+------+-----------****-+---------+
>>>>>> | firewall_connections | 1 | ndx_open_close_dt | 1 | open_dt | A |
>>>>>> 1342691
>>>>>> | NULL | NULL | | BTREE | |
>>>>>> | firewall_connections | 1 | ndx_open_close_dt | 2 | close_dt | A |
>>>>>> 6377783 | NULL | NULL | | BTREE | |
>>>>>> +----------------------+------****------+-------------------+-**--**
>>>>>> -----------+-------------+----****-------+-------------+------**--**
>>>>>> --+--------+------+-----------****-+---------+
>>>>>> 
>>>>>> 
>>>>>> Although right now the queries do seem to be executing much faster,
>>>>>> although I'm not quite sure why. And I'm not sure why the new
>>>>>> ndx_open_close_dt isn't being used either.
>>>>>> 
>>>>>> -Brandon
>>>>>> 
>>>>>> 
>>>>>>  I am still having a big issue with my query as seen above.  The table
>>>>> is up
>>>>> to around 32 million records at the moment and either of the two SELECT
>>>>> queries above take a very long time to run.  Is there anything at all I
>>>>> can
>>>>> do to speed things up?  It seems that changing the format of the WHERE
>>>>> clause did not help at all, as the EXPLAIN output is exactly the same for
>>>>> both version.  I also tried adding an index on (open_dt, close_dt, rcvd)
>>>>> but
>>>>> that index does not get used.
>>>>> 
>>>>> Any other ideas?
>>>>> 
>>>>> Thanks in advance,
>>>>> 
>>>>> 
>>>>> Brandon
>>>>> 
>>>>> --
>>>>> MySQL General Mailing List
>>>>> For list archives: http://lists.mysql.com/mysql
>>>>> To unsubscribe:    
>>>>> http://lists.mysql.com/mysql?****<http://lists.mysql.com/mysql?**>
>>>>> unsub=eroomy...@gmail.com<http**://lists.mysql.com/mysql?**
>>>>> unsub=eroomy...@gmail.com<http://lists.mysql.com/mysql?unsub=eroomy...@gmail.com>
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> --
>>> MySQL General Mailing List
>>> For list archives: http://lists.mysql.com/mysql
>>> To unsubscribe:    http://lists.mysql.com/mysql?**
>>> unsub=eroomy...@gmail.com<http://lists.mysql.com/mysql?unsub=eroomy...@gmail.com>
>>> 
>>> 
>> 
> 
> -- 
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:    http://lists.mysql.com/mysql?unsub=de...@orange-pants.com
> 


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql?unsub=arch...@jab.org

Reply via email to