Thinking outside the query, is there any archiving that could happen to make
your large tables kinder in the range scan?

Andy

On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelps <bphe...@gls.com> wrote:

> On 09/01/2011 01:32 PM, Brandon Phelps wrote:
>
>> On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote:
>>
>>> On 9/1/2011 09:42, Brandon Phelps wrote:
>>>
>>>> On 09/01/2011 04:59 AM, Jochem van Dieten wrote:
>>>> > > ...
>>>>
>>>> > > WHERE
>>>> > > (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30
>>>> 00:00:00')
>>>> > > AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30
>>>> 12:36:53')
>>>>
>>>> > In that case your logic here simplifies to:
>>>> > WHERE
>>>> > open_dt >= '2011-08-30 00:00:00'
>>>> > AND
>>>> > close_dt <= '2011-08-30 12:36:53'
>>>>
>>>> > Now add an index over open_dt and close_dt and see what happens.
>>>>
>>>> > Jochem
>>>>
>>>> Jochem,
>>>>
>>>> I can't really use your WHERE logic because I also need to retrieve
>>>> results where the open_dt time is out of the range specified. For
>>>> example, a very large file download might span multiple days so given
>>>> your logic if the connection was started 2 days ago and I want to pull 1
>>>> days worth of connections, I would miss that entry. Basically I want to
>>>> SELECT all of the records that were opened OR closed during the
>>>> specified time period, ie. if any activity happened between my start and
>>>> end dates, I need to see that record.
>>>>
>>>> Any other ideas?
>>>>
>>>>
>>> I believe Jochem was on the right track but he got his dates reversed.
>>>
>>> Let's try a little ASCII art to show the situation. I will setup a query
>>> window with two markers (s) and (e). Events will be marked by |----| markers
>>> showing their durations.
>>>
>>> a) (s) (e)
>>> b) |---|
>>> c) |---|
>>> d) |---|
>>> e) |--------------------|
>>> f) |---|
>>> g) |---|
>>>
>>> To describe these situations:
>>> a) is the window for which you want to query (s) is the starting time and
>>> (e) is the ending time for the date range you are interested in.
>>> b) the event starts and stops before your window exists. It won't be part
>>> of your results.
>>> c) the event starts before the window but ends within the window -
>>> include this
>>> d) the event starts and ends within the window - include this
>>> e) the event starts before the window and ends after the window - include
>>> this
>>> f) the event starts inside the window but ends beyond the window -
>>> include this.
>>> g) the event starts and ends beyond the window - exclude this.
>>>
>>> In order to get every event in the range of c-f, here is what you need
>>> for a WHERE clause
>>>
>>> WHERE start <= (ending time) and end >= (starting time)
>>>
>>> Try that and let us know the results.
>>>
>>
>> Thanks Jochem and Shawn, however the following two queries result in the
>> exact same EXPLAIN output: (I hope the tables don't wrap too early for you)
>>
>> Old method:
>> SELECT
>> sc.open_dt,
>> sc.close_dt,
>> sc.protocol,
>> INET_NTOA(sc.src_address) AS src_address,
>> sc.src_port,
>> INET_NTOA(sc.dst_address) AS dst_address,
>> sc.dst_port,
>> sc.sent,
>> sc.rcvd,
>> spm.desc AS src_port_desc,
>> dpm.desc AS dst_port_desc
>> FROM firewall_connections AS sc
>> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
>> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
>> WHERE
>> (open_dt >= '2011-08-31 09:53:31' OR close_dt >= '2011-08-31 09:53:31')
>> AND (open_dt <= '2011-09-01 09:53:31' OR close_dt <= '2011-09-01
>> 09:53:31')
>> ORDER BY rcvd DESC
>> LIMIT 0, 10;
>>
>> New method with BTREE index on open_dt, close_dt (index name is
>> ndx_open_close_dt):
>> SELECT
>> sc.open_dt,
>> sc.close_dt,
>> sc.protocol,
>> INET_NTOA(sc.src_address) AS src_address,
>> sc.src_port,
>> INET_NTOA(sc.dst_address) AS dst_address,
>> sc.dst_port,
>> sc.sent,
>> sc.rcvd,
>> spm.desc AS src_port_desc,
>> dpm.desc AS dst_port_desc
>> FROM firewall_connections AS sc
>> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
>> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
>> WHERE
>> open_dt <= '2011-09-01 09:53:31' AND close_dt >= '2011-08-31 09:53:31'
>> ORDER BY rcvd DESC
>> LIMIT 0, 10;
>>
>> EXPLAIN output for old method:
>> +----+-------------+-------+--**------+-----------------------**
>> ----+----------+---------+----**----------------+------+------**-------+
>> | id | select_type | table | type | possible_keys | key | key_len | ref |
>> rows | Extra |
>> +----+-------------+-------+--**------+-----------------------**
>> ----+----------+---------+----**----------------+------+------**-------+
>> | 1 | SIMPLE | sc | index | open_dt,ndx_open_close_dt | ndx_rcvd | 4 |
>> NULL | 10 | Using where |
>> | 1 | SIMPLE | spm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.src_port |
>> 1 | |
>> | 1 | SIMPLE | dpm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.dst_port |
>> 1 | |
>> +----+-------------+-------+--**------+-----------------------**
>> ----+----------+---------+----**----------------+------+------**-------+
>>
>> EXPLAIN output for new method with new index:
>> +----+-------------+-------+--**------+-----------------------**
>> ----+----------+---------+----**----------------+------+------**-------+
>> | id | select_type | table | type | possible_keys | key | key_len | ref |
>> rows | Extra |
>> +----+-------------+-------+--**------+-----------------------**
>> ----+----------+---------+----**----------------+------+------**-------+
>> | 1 | SIMPLE | sc | index | open_dt,ndx_open_close_dt | ndx_rcvd | 4 |
>> NULL | 10 | Using where |
>> | 1 | SIMPLE | spm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.src_port |
>> 1 | |
>> | 1 | SIMPLE | dpm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.dst_port |
>> 1 | |
>> +----+-------------+-------+--**------+-----------------------**
>> ----+----------+---------+----**----------------+------+------**-------+
>>
>> SHOW INDEX:
>> +----------------------+------**------+-------------------+---**
>> -----------+-------------+----**-------+-------------+--------**
>> --+--------+------+-----------**-+---------+
>> | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation |
>> Cardinality | Sub_part | Packed | Null | Index_type | Comment |
>> +----------------------+------**------+-------------------+---**
>> -----------+-------------+----**-------+-------------+--------**
>> --+--------+------+-----------**-+---------+
>> | firewall_connections | 1 | ndx_open_close_dt | 1 | open_dt | A | 1342691
>> | NULL | NULL | | BTREE | |
>> | firewall_connections | 1 | ndx_open_close_dt | 2 | close_dt | A |
>> 6377783 | NULL | NULL | | BTREE | |
>> +----------------------+------**------+-------------------+---**
>> -----------+-------------+----**-------+-------------+--------**
>> --+--------+------+-----------**-+---------+
>>
>>
>> Although right now the queries do seem to be executing much faster,
>> although I'm not quite sure why. And I'm not sure why the new
>> ndx_open_close_dt isn't being used either.
>>
>> -Brandon
>>
>>
> I am still having a big issue with my query as seen above.  The table is up
> to around 32 million records at the moment and either of the two SELECT
> queries above take a very long time to run.  Is there anything at all I can
> do to speed things up?  It seems that changing the format of the WHERE
> clause did not help at all, as the EXPLAIN output is exactly the same for
> both version.  I also tried adding an index on (open_dt, close_dt, rcvd) but
> that index does not get used.
>
> Any other ideas?
>
> Thanks in advance,
>
>
> Brandon
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:    http://lists.mysql.com/mysql?**
> unsub=eroomy...@gmail.com<http://lists.mysql.com/mysql?unsub=eroomy...@gmail.com>
>
>

Reply via email to