Mihail,

Thanks so much!  I modified your example to include the proper ORDER BY and 
LIMIT clauses and this, so far, is running super fast (0.0007 seconds).  
Question, if a record's open_dt is between the range AND the close_dt is 
between the range as well, will the UNION output the record twice?  If so, is 
there any way to prevent that?

(SELECT
        sc.open_dt,
        sc.close_dt,
        sc.protocol,
        INET_NTOA(sc.src_address) AS src_address,
        sc.src_port,
        INET_NTOA(sc.dst_address) AS dst_address,
        sc.dst_port,
        sc.sent,
        sc.rcvd,
        spm.desc AS src_port_desc,
        dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
        LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
        LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58')
UNION
(SELECT
        sc.open_dt,
        sc.close_dt,
        sc.protocol,
        INET_NTOA(sc.src_address) AS src_address,
        sc.src_port,
        INET_NTOA(sc.dst_address) AS dst_address,
        sc.dst_port,
        sc.sent,
        sc.rcvd,
        spm.desc AS src_port_desc,
        dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
        LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
        LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58')
ORDER BY rcvd DESC LIMIT 10;

+----+--------------+------------+--------+--------------------------------------------+--------------+---------+--------------------+------+----------------+
| id | select_type  | table      | type   | possible_keys                       
       | key          | key_len | ref                | rows | Extra          |
+----+--------------+------------+--------+--------------------------------------------+--------------+---------+--------------------+------+----------------+
|  1 | PRIMARY      | sc         | range  | 
open_dt,ndx_open_close_rcvd,ndx_open_close | open_dt      | 8       | NULL      
         | 1057 | Using where    |
|  1 | PRIMARY      | spm        | eq_ref | PRIMARY                             
       | PRIMARY      | 2       | syslog.sc.src_port |    1 |                |
|  1 | PRIMARY      | dpm        | eq_ref | PRIMARY                             
       | PRIMARY      | 2       | syslog.sc.dst_port |    1 |                |
|  2 | UNION        | sc         | range  | ndx_close_dt                        
       | ndx_close_dt | 8       | NULL               | 1131 | Using where    |
|  2 | UNION        | spm        | eq_ref | PRIMARY                             
       | PRIMARY      | 2       | syslog.sc.src_port |    1 |                |
|  2 | UNION        | dpm        | eq_ref | PRIMARY                             
       | PRIMARY      | 2       | syslog.sc.dst_port |    1 |                |
| NULL | UNION RESULT | <union1,2> | ALL    | NULL                              
         | NULL         | NULL    | NULL               | NULL | Using filesort |
+----+--------------+------------+--------+--------------------------------------------+--------------+---------+--------------------+------+----------------+



On 09/08/2011 03:45 PM, Mihail Manolov wrote:
How about:

SELECT
        sc.open_dt,
        sc.close_dt,
        sc.protocol,
        INET_NTOA(sc.src_address) AS src_address,
        sc.src_port,
        INET_NTOA(sc.dst_address) AS dst_address,
        sc.dst_port,
        sc.sent,
        sc.rcvd,
        spm.desc AS src_port_desc,
        dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
        LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
        LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'

UNION

SELECT
        sc.open_dt,
        sc.close_dt,
        sc.protocol,
        INET_NTOA(sc.src_address) AS src_address,
        sc.src_port,
        INET_NTOA(sc.dst_address) AS dst_address,
        sc.dst_port,
        sc.sent,
        sc.rcvd,
        spm.desc AS src_port_desc,
        dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
        LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
        LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'



On Sep 8, 2011, at 3:25 PM, Brandon Phelps wrote:

Thanks for the idea Derek, however given the following query my EXPLAIN output 
is identical:

SELECT
        sc.open_dt,
        sc.close_dt,
        sc.protocol,
        INET_NTOA(sc.src_address) AS src_address,
        sc.src_port,
        INET_NTOA(sc.dst_address) AS dst_address,
        sc.dst_port,
        sc.sent,
        sc.rcvd,
        spm.desc AS src_port_desc,
        dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
        LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
        LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE ('2011-09-07 13:18:58'<= open_dt<= '2011-09-08 13:18:58') OR ('2011-09-07 
13:18:58'<= close_dt<= '2011-09-08 13:18:58');


+----+-------------+-------+--------+---------------+---------+---------+--------------------+----------+-------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref   
             | rows     | Extra       |
+----+-------------+-------+--------+---------------+---------+---------+--------------------+----------+-------------+
|  1 | SIMPLE      | sc    | ALL    | NULL          | NULL    | NULL    | NULL  
             | 32393330 | Using where |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY       | PRIMARY | 2       | 
syslog.sc.src_port |        1 |             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY       | PRIMARY | 2       | 
syslog.sc.dst_port |        1 |             |
+----+-------------+-------+--------+---------------+---------+---------+--------------------+----------+-------------+

I did create indexes on open_dt and close_dt (2 separate indexes).



On 09/08/2011 02:55 PM, Derek Downey wrote:
Correct me if I'm wrong. You're wanting to get all records that have an 
open_date or a close_date between two times.

If that's correct, you might be able to get an index_merge by doing a query 
like:

WHERE ((starting time)<=open_dt<= (ending time)) OR ((starting 
time)<=close_dt<=(ending time))

and creating two indexes (one on 'open_dt' and the other on 'close_dt')

http://dev.mysql.com/doc/refman/5.1/en/index-merge-optimization.html

Regards,
Derek

On Sep 8, 2011, at 2:50 PM, Brandon Phelps wrote:

Andy,

The queries take minutes to run.  MySQL is 5.1.54 and it's running on Ubuntu 
server 11.04.  Unfortunately the machine only has 2GB of RAM but no other major 
daemons are running on the machine.  We are running RAID 1 (mirroring) with 1TB 
drives.  The tables in question here are all MyISAM.  When running with the 
LIMIT 10 my EXPLAIN is:

+----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
| id | select_type | table | type   | possible_keys       | key                 
| key_len | ref                | rows     | Extra                       |
+----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
|  1 | SIMPLE      | sc    | range  | ndx_open_close_rcvd | ndx_open_close_rcvd 
| 8       | NULL               | 32393316 | Using where; Using filesort |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY             | PRIMARY             
| 2       | syslog.sc.src_port |        1 |                             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY             | PRIMARY             
| 2       | syslog.sc.dst_port |        1 |                             |
+----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+

When I remove the LIMIT 10 I get:

----+-------------+-------+--------+---------------------+---------+---------+--------------------+----------+-----------------------------+
| id | select_type | table | type   | possible_keys       | key     | key_len | 
ref                | rows     | Extra                       |
+----+-------------+-------+--------+---------------------+---------+---------+--------------------+----------+-----------------------------+
|  1 | SIMPLE      | sc    | ALL    | ndx_open_close_rcvd | NULL    | NULL    | 
NULL               | 32393330 | Using where; Using filesort |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY             | PRIMARY | 2       | 
syslog.sc.src_port |        1 |                             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY             | PRIMARY | 2       | 
syslog.sc.dst_port |        1 |                             |
+----+-------------+-------+--------+---------------------+---------+---------+--------------------+----------+-----------------------------+

Thanks for all your help thus far.

On 09/08/2011 02:38 PM, Andrew Moore wrote:
I don't think I saw any query timings in the emails (maybe I missed them).

What version of MySQL are you currently using?
What does the explain look like when your remove the limit 10?
Is your server tuned for MyISAM or InnoDB?
What kind of disk setup is in use?
How much memory is in your machine?


On Thu, Sep 8, 2011 at 7:27 PM, Brandon Phelps<bphe...@gls.com>    wrote:

Thanks for the reply Andy.  Unfortunately the users will be selecting
varying date ranges and new data is constantly coming in, so I am not sure
how I could archive/cache the necessary data that would be any more
efficient than simply using the database directly.



On 09/08/2011 02:16 PM, Andrew Moore wrote:

Thinking outside the query, is there any archiving that could happen to
make
your large tables kinder in the range scan?

Andy

On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelps<bphe...@gls.com>     wrote:

  On 09/01/2011 01:32 PM, Brandon Phelps wrote:

  On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote:

  On 9/1/2011 09:42, Brandon Phelps wrote:

  On 09/01/2011 04:59 AM, Jochem van Dieten wrote:

...


  WHERE
(open_dt>= '2011-08-30 00:00:00' OR close_dt>= '2011-08-30

00:00:00')

AND (open_dt<= '2011-08-30 12:36:53' OR close_dt<= '2011-08-30

12:36:53')

  In that case your logic here simplifies to:
WHERE
open_dt>= '2011-08-30 00:00:00'
AND
close_dt<= '2011-08-30 12:36:53'


  Now add an index over open_dt and close_dt and see what happens.


  Jochem


Jochem,

I can't really use your WHERE logic because I also need to retrieve
results where the open_dt time is out of the range specified. For
example, a very large file download might span multiple days so given
your logic if the connection was started 2 days ago and I want to pull
1
days worth of connections, I would miss that entry. Basically I want
to
SELECT all of the records that were opened OR closed during the
specified time period, ie. if any activity happened between my start
and
end dates, I need to see that record.

Any other ideas?


  I believe Jochem was on the right track but he got his dates
reversed.

Let's try a little ASCII art to show the situation. I will setup a
query
window with two markers (s) and (e). Events will be marked by |----|
markers
showing their durations.

a) (s) (e)
b) |---|
c) |---|
d) |---|
e) |--------------------|
f) |---|
g) |---|

To describe these situations:
a) is the window for which you want to query (s) is the starting time
and
(e) is the ending time for the date range you are interested in.
b) the event starts and stops before your window exists. It won't be
part
of your results.
c) the event starts before the window but ends within the window -
include this
d) the event starts and ends within the window - include this
e) the event starts before the window and ends after the window -
include
this
f) the event starts inside the window but ends beyond the window -
include this.
g) the event starts and ends beyond the window - exclude this.

In order to get every event in the range of c-f, here is what you need
for a WHERE clause

WHERE start<= (ending time) and end>= (starting time)

Try that and let us know the results.


Thanks Jochem and Shawn, however the following two queries result in the
exact same EXPLAIN output: (I hope the tables don't wrap too early for
you)

Old method:
SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM firewall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE
(open_dt>= '2011-08-31 09:53:31' OR close_dt>= '2011-08-31 09:53:31')
AND (open_dt<= '2011-09-01 09:53:31' OR close_dt<= '2011-09-01
09:53:31')
ORDER BY rcvd DESC
LIMIT 0, 10;

New method with BTREE index on open_dt, close_dt (index name is
ndx_open_close_dt):
SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM firewall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE
open_dt<= '2011-09-01 09:53:31' AND close_dt>= '2011-08-31 09:53:31'
ORDER BY rcvd DESC
LIMIT 0, 10;

EXPLAIN output for old method:
+----+-------------+-------+--****------+---------------------**--**
----+----------+---------+----****----------------+------+----**
--**-------+
| id | select_type | table | type | possible_keys | key | key_len | ref
|
rows | Extra |
+----+-------------+-------+--****------+---------------------**--**
----+----------+---------+----****----------------+------+----**
--**-------+
| 1 | SIMPLE | sc | index | open_dt,ndx_open_close_dt | ndx_rcvd | 4 |
NULL | 10 | Using where |
| 1 | SIMPLE | spm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.src_port
|
1 | |
| 1 | SIMPLE | dpm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.dst_port
|
1 | |
+----+-------------+-------+--****------+---------------------**--**
----+----------+---------+----****----------------+------+----**
--**-------+

EXPLAIN output for new method with new index:
+----+-------------+-------+--****------+---------------------**--**
----+----------+---------+----****----------------+------+----**
--**-------+
| id | select_type | table | type | possible_keys | key | key_len | ref
|
rows | Extra |
+----+-------------+-------+--****------+---------------------**--**
----+----------+---------+----****----------------+------+----**
--**-------+
| 1 | SIMPLE | sc | index | open_dt,ndx_open_close_dt | ndx_rcvd | 4 |
NULL | 10 | Using where |
| 1 | SIMPLE | spm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.src_port
|
1 | |
| 1 | SIMPLE | dpm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.dst_port
|
1 | |
+----+-------------+-------+--****------+---------------------**--**
----+----------+---------+----****----------------+------+----**
--**-------+

SHOW INDEX:
+----------------------+------****------+-------------------+-**--**
-----------+-------------+----****-------+-------------+------**--**
--+--------+------+-----------****-+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation
|
Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+----------------------+------****------+-------------------+-**--**
-----------+-------------+----****-------+-------------+------**--**
--+--------+------+-----------****-+---------+
| firewall_connections | 1 | ndx_open_close_dt | 1 | open_dt | A |
1342691
| NULL | NULL | | BTREE | |
| firewall_connections | 1 | ndx_open_close_dt | 2 | close_dt | A |
6377783 | NULL | NULL | | BTREE | |
+----------------------+------****------+-------------------+-**--**
-----------+-------------+----****-------+-------------+------**--**
--+--------+------+-----------****-+---------+


Although right now the queries do seem to be executing much faster,
although I'm not quite sure why. And I'm not sure why the new
ndx_open_close_dt isn't being used either.

-Brandon


  I am still having a big issue with my query as seen above.  The table
is up
to around 32 million records at the moment and either of the two SELECT
queries above take a very long time to run.  Is there anything at all I
can
do to speed things up?  It seems that changing the format of the WHERE
clause did not help at all, as the EXPLAIN output is exactly the same for
both version.  I also tried adding an index on (open_dt, close_dt, rcvd)
but
that index does not get used.

Any other ideas?

Thanks in advance,


Brandon

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    
http://lists.mysql.com/mysql?****<http://lists.mysql.com/mysql?**>
unsub=eroomy...@gmail.com<http**://lists.mysql.com/mysql?**
unsub=eroomy...@gmail.com<http://lists.mysql.com/mysql?unsub=eroomy...@gmail.com>





--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql?**
unsub=eroomy...@gmail.com<http://lists.mysql.com/mysql?unsub=eroomy...@gmail.com>




--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql?unsub=de...@orange-pants.com




--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql?unsub=mmano...@liquidation.com




--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql?unsub=arch...@jab.org

Reply via email to