Re: Query Optimization
The query SELECT * FROM XYZ WHERE VAL IN ('BLUE','RED','GREEN'); will be faster if the field is an ENUM.

On Fri, Nov 16, 2012 at 12:36 AM, Anupam Karmarkar sb_akarmar...@yahoo.com wrote:

Hi All, Consider a scenario: I have a table XYZ which contains the values BLUE, RED, GREEN, and NULL. The following queries can be used to get these values:
1. SELECT * FROM XYZ WHERE VAL IN ('BLUE','RED','GREEN');
2. SELECT * FROM XYZ WHERE VAL IS NOT NULL;
3. SELECT * FROM XYZ WHERE VAL = 'BLUE' OR VAL = 'RED' OR VAL = 'GREEN';
and more. So which one is better in terms of optimization? I would guess 1 and 3 are similar in form. --Anupam

--
Thanks & Regards,
P.Benaya Paul
http://www.codeasearch.com
http://www.iwannasearch.com
RE: Query Optimization
It depends on the distribution of the four values in that field. If the cardinality is poor, then INDEX(VAL) won't be used, and all of the queries will do a full table scan.

-----Original Message-----
From: Anupam Karmarkar [mailto:sb_akarmar...@yahoo.com]
Sent: Friday, November 16, 2012 12:36 AM
To: mysql@lists.mysql.com
Subject: Query Optimization

[original question quoted above; snipped]

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql
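A quick way to check the cardinality point above is to compare the EXPLAIN plans of the three forms directly. A sketch, using the thread's example table (the `id` column and index name `idx_val` are assumptions added for illustration):

```sql
-- Hypothetical schema matching the thread's example.
CREATE TABLE XYZ (
  id  INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  VAL ENUM('BLUE','RED','GREEN') NULL,  -- ENUM stores a 1-byte code, not the string
  KEY idx_val (VAL)
);

-- The three predicates are logically equivalent; compare their plans:
EXPLAIN SELECT * FROM XYZ WHERE VAL IN ('BLUE','RED','GREEN');
EXPLAIN SELECT * FROM XYZ WHERE VAL IS NOT NULL;
EXPLAIN SELECT * FROM XYZ WHERE VAL = 'BLUE' OR VAL = 'RED' OR VAL = 'GREEN';

-- With only four distinct values (poor cardinality), expect type=ALL
-- (a full table scan) for all three once the table has many rows.
```

Whether the optimizer chooses `idx_val` or a scan depends on the actual value distribution, so the EXPLAIN output on real data is the deciding evidence.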
Re: query optimization
Your outer query, select cpe_mac, max(r3_dt) from rad_r3cap, is doing a full table scan. You might want to check on this and add a WHERE condition that uses an indexed column.

On Fri, Sep 23, 2011 at 12:14 AM, supr_star suprstar1...@yahoo.com wrote:

I have a table with 24 million rows and I need to figure out how to optimize a query. It has to do with MAC addresses and RADIUS packets - I want to see the number of connections and the min/max dates. So I basically want all this data:

SELECT cpe_mac, COUNT(*) c, MIN(r3_dt) mindt, MAX(r3_dt) maxdt, MAX(rec_num) recn
FROM rad_r3cap
WHERE r3_dt >= SUBDATE(NOW(), INTERVAL 30 DAY)
  AND r3_type = 'Access'
GROUP BY cpe_mac
ORDER BY cpe_mac;

This piece of the query takes 30 seconds to run and produces 3500 rows. I have r3_dt indexed. I also want the status field of the row with the highest r3_dt:

SELECT rec_num, cpe_mac, req_status
FROM rad_r3cap
WHERE r3_type = 'Access'
  AND (cpe_mac, r3_dt) IN (SELECT cpe_mac, MAX(r3_dt) FROM rad_r3cap);

This piece of the query takes forever - I let it run for an hour and it still didn't finish, so it's obviously not using indexes, and I have no idea how far along it got. I wrote a PHP script to run the first query and then do 3500 individual lookups for the status using the MAX(rec_num) field from the first query, and I can get the data in 31 seconds. So I CAN produce this data, but very slowly, and not in one SQL query. I want to consolidate this into one statement so I can make a view. If anyone can point me in the right direction, I'd appreciate it!
mysql> desc rad_r3cap;
+-------------+-------------+------+-----+---------+----------------+
| Field       | Type        | Null | Key | Default | Extra          |
+-------------+-------------+------+-----+---------+----------------+
| rec_num     | int(11)     | NO   | PRI | NULL    | auto_increment |
| r3_dt       | datetime    | YES  | MUL | NULL    |                |
| r3_micros   | int(11)     | YES  |     | NULL    |                |
| r3_type     | varchar(16) | YES  |     | NULL    |                |
| req_status  | varchar(16) | YES  |     | NULL    |                |
| req_comment | varchar(64) | YES  |     | NULL    |                |
| asn_ip      | varchar(16) | YES  | MUL | NULL    |                |
| asn_name    | varchar(16) | YES  |     | NULL    |                |
| bsid        | varchar(12) | YES  | MUL | NULL    |                |
| cpe_ip      | varchar(16) | YES  |     | NULL    |                |
| cpe_mac     | varchar(12) | YES  | MUL | NULL    |                |
| filename    | varchar(32) | YES  |     | NULL    |                |
| linenum     | int(11)     | YES  |     | NULL    |                |
| r3_hour     | datetime    | YES  | MUL | NULL    |                |
| user_name   | varchar(64) | YES  |     | NULL    |                |
+-------------+-------------+------+-----+---------+----------------+

mysql> show indexes in rad_r3cap;
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table     | Non_unique | Key_name     | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| rad_r3cap |          0 | PRIMARY      |            1 | rec_num     | A         |    23877677 |     NULL | NULL   |      | BTREE      |         |
| rad_r3cap |          0 | r3cap_dt     |            1 | r3_dt       | A         |        NULL |     NULL | NULL   | YES  | BTREE      |         |
| rad_r3cap |          0 | r3cap_dt     |            2 | r3_micros   | A         |        NULL |     NULL | NULL   | YES  | BTREE      |         |
| rad_r3cap |          1 | r3cap_bsid   |            1 | bsid        | A         |         346 |     NULL | NULL   | YES  | BTREE      |         |
| rad_r3cap |          1 | r3cap_asnip  |            1 | asn_ip      | A         |          55 |     NULL | NULL   | YES  | BTREE      |         |
| rad_r3cap |          1 | r3cap_cpemac |            1 | cpe_mac     | A         |        4758 |     NULL | NULL   | YES  | BTREE      |         |
| rad_r3cap |          1 | r3cap_date   |            1 | r3_hour     | A         |        1548 |     NULL | NULL   | YES  | BTREE      |         |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
7 rows in set (0.00 sec)
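Since MAX(rec_num) in the poster's first query already identifies one row per MAC, one way to fold the two steps into a single statement (so it can back a view) is to join the grouped result back to the table on the primary key. This is a sketch, untested against the poster's data, and it assumes the highest rec_num per MAC is also the row with the highest r3_dt (true if rows are inserted in time order):

```sql
SELECT agg.cpe_mac, agg.c, agg.mindt, agg.maxdt, cap.req_status
FROM (
    -- The 30-second aggregate from the first query
    SELECT cpe_mac, COUNT(*) AS c, MIN(r3_dt) AS mindt,
           MAX(r3_dt) AS maxdt, MAX(rec_num) AS recn
    FROM rad_r3cap
    WHERE r3_dt >= SUBDATE(NOW(), INTERVAL 30 DAY)
      AND r3_type = 'Access'
    GROUP BY cpe_mac
) AS agg
-- One PRIMARY KEY lookup per MAC, replacing the 3500 PHP round trips
JOIN rad_r3cap AS cap ON cap.rec_num = agg.recn
ORDER BY agg.cpe_mac;
```

This mirrors what the PHP script does (aggregate, then point lookups by rec_num), but inside one statement, so the uncorrelated `(cpe_mac, r3_dt) IN (...)` subquery that never finished is avoided entirely.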
Re: Query Optimization
On 09/01/2011 01:32 PM, Brandon Phelps wrote:
On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote:
On 9/1/2011 09:42, Brandon Phelps wrote:
On 09/01/2011 04:59 AM, Jochem van Dieten wrote:
... WHERE (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00') AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')
In that case your logic here simplifies to:
WHERE open_dt >= '2011-08-30 00:00:00' AND close_dt <= '2011-08-30 12:36:53'
Now add an index over open_dt and close_dt and see what happens. Jochem

Jochem, I can't really use your WHERE logic because I also need to retrieve results where the open_dt time is out of the range specified. For example, a very large file download might span multiple days, so given your logic, if the connection was started 2 days ago and I want to pull 1 day's worth of connections, I would miss that entry. Basically I want to SELECT all of the records that were opened OR closed during the specified time period; i.e. if any activity happened between my start and end dates, I need to see that record. Any other ideas?

I believe Jochem was on the right track but he got his dates reversed. Let's try a little ASCII art to show the situation. I will set up a query window with two markers (s) and (e). Events will be marked by |---| markers showing their durations.

a)          (s)                  (e)
b) |---|
c)       |---|
d)              |---|
e)     |----------------------------|
f)                            |---|
g)                                      |---|

To describe these situations:
a) is the window for which you want to query; (s) is the starting time and (e) is the ending time of the date range you are interested in.
b) the event starts and stops before your window exists - it won't be part of your results.
c) the event starts before the window but ends within the window - include this.
d) the event starts and ends within the window - include this.
e) the event starts before the window and ends after the window - include this.
f) the event starts inside the window but ends beyond the window - include this.
g) the event starts and ends beyond the window - exclude this.

In order to get every event in the range of c-f, here is what you need for a WHERE clause:

WHERE start <= (ending time) AND end >= (starting time)

Try that and let us know the results.

Thanks Jochem and Shawn, however the following two queries result in the exact same EXPLAIN output: (I hope the tables don't wrap too early for you)

Old method:

SELECT sc.open_dt, sc.close_dt, sc.protocol,
       INET_NTOA(sc.src_address) AS src_address, sc.src_port,
       INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port,
       sc.sent, sc.rcvd,
       spm.desc AS src_port_desc, dpm.desc AS dst_port_desc
FROM firewall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE (open_dt >= '2011-08-31 09:53:31' OR close_dt >= '2011-08-31 09:53:31')
  AND (open_dt <= '2011-09-01 09:53:31' OR close_dt <= '2011-09-01 09:53:31')
ORDER BY rcvd DESC LIMIT 0, 10;

New method with BTREE index on (open_dt, close_dt) (index name is ndx_open_close_dt):

SELECT sc.open_dt, sc.close_dt, sc.protocol,
       INET_NTOA(sc.src_address) AS src_address, sc.src_port,
       INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port,
       sc.sent, sc.rcvd,
       spm.desc AS src_port_desc, dpm.desc AS dst_port_desc
FROM firewall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE open_dt <= '2011-09-01 09:53:31' AND close_dt >= '2011-08-31 09:53:31'
ORDER BY rcvd DESC LIMIT 0, 10;

EXPLAIN output for the old method:

+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
| id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
|  1 | SIMPLE      | sc    | index  | open_dt,ndx_open_close_dt | ndx_rcvd | 4       | NULL               |   10 | Using where |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.src_port |    1 |             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.dst_port |    1 |             |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+

EXPLAIN output for the new method with the new index:

+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
| id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
|  1 | SIMPLE      | sc    | index  | open_dt,ndx_open_close_dt | ndx_rcvd | 4       | NULL               |   10 | Using where |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.src_port |    1 |             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.dst_port |    1 |             |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
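Shawn's overlap test, applied to the column names used in this thread, comes out as the following sketch (the @win_* variable names are illustrative):

```sql
-- Include every connection whose [open_dt, close_dt] interval overlaps
-- the query window [@win_start, @win_end] (cases c through f in the art).
SET @win_start = '2011-08-31 09:53:31';
SET @win_end   = '2011-09-01 09:53:31';

SELECT sc.open_dt, sc.close_dt, sc.protocol, sc.sent, sc.rcvd
FROM firewall_connections AS sc
WHERE sc.open_dt  <= @win_end    -- started before the window closed
  AND sc.close_dt >= @win_start  -- ended after the window opened
ORDER BY sc.rcvd DESC
LIMIT 0, 10;
```

Note that this is a different predicate from Jochem's simplification (which kept only events fully inside the window, cases d only); the reversed comparison directions are exactly what capture the spanning cases c, e, and f.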
Re: Query Optimization
Thinking outside the query, is there any archiving that could happen to make your large tables kinder in the range scan?

Andy

On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelps bphe...@gls.com wrote:
[earlier message quoted in full; snipped]
Re: Query Optimization
Thanks for the reply Andy. Unfortunately the users will be selecting varying date ranges and new data is constantly coming in, so I am not sure how I could archive/cache the necessary data in a way that would be any more efficient than simply using the database directly.

On 09/08/2011 02:16 PM, Andrew Moore wrote:
Thinking outside the query, is there any archiving that could happen to make your large tables kinder in the range scan?
Andy
[rest of quoted thread snipped]
Re: Query Optimization
I don't think I saw any query timings in the emails (maybe I missed them). What version of MySQL are you currently using? What does the EXPLAIN look like when you remove the LIMIT 10? Is your server tuned for MyISAM or InnoDB? What kind of disk setup is in use? How much memory is in your machine?

On Thu, Sep 8, 2011 at 7:27 PM, Brandon Phelps bphe...@gls.com wrote:
Thanks for the reply Andy. Unfortunately the users will be selecting varying date ranges and new data is constantly coming in, so I am not sure how I could archive/cache the necessary data in a way that would be any more efficient than simply using the database directly.
[rest of quoted thread snipped]
Re: Query Optimization
If you're running version 5.1+ you may want to take a look at the table partitioning options you may have.

On Sep 8, 2011, at 2:27 PM, Brandon Phelps wrote:
Thanks for the reply Andy. Unfortunately the users will be selecting varying date ranges and new data is constantly coming in, so I am not sure how I could archive/cache the necessary data in a way that would be any more efficient than simply using the database directly.
[rest of quoted thread snipped]
Re: Query Optimization
Andy,

The queries take minutes to run. MySQL is 5.1.54 and it's running on Ubuntu Server 11.04. Unfortunately the machine only has 2GB of RAM, but no other major daemons are running on it. We are running RAID 1 (mirroring) with 1TB drives. The tables in question here are all MyISAM. When running with the LIMIT 10 my EXPLAIN is:

+----+-------------+-------+--------+----------------------+----------------------+---------+--------------------+----------+-----------------------------+
| id | select_type | table | type   | possible_keys        | key                  | key_len | ref                | rows     | Extra                       |
+----+-------------+-------+--------+----------------------+----------------------+---------+--------------------+----------+-----------------------------+
|  1 | SIMPLE      | sc    | range  | ndx_open_close_rcvd  | ndx_open_close_rcvd  | 8       | NULL               | 32393316 | Using where; Using filesort |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY              | PRIMARY              | 2       | syslog.sc.src_port |        1 |                             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY              | PRIMARY              | 2       | syslog.sc.dst_port |        1 |                             |
+----+-------------+-------+--------+----------------------+----------------------+---------+--------------------+----------+-----------------------------+

When I remove the LIMIT 10 I get:

+----+-------------+-------+--------+----------------------+----------------------+---------+--------------------+----------+-----------------------------+
| id | select_type | table | type   | possible_keys        | key                  | key_len | ref                | rows     | Extra                       |
+----+-------------+-------+--------+----------------------+----------------------+---------+--------------------+----------+-----------------------------+
|  1 | SIMPLE      | sc    | ALL    | ndx_open_close_rcvd  | NULL                 | NULL    | NULL               | 32393330 | Using where; Using filesort |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY              | PRIMARY              | 2       | syslog.sc.src_port |        1 |                             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY              | PRIMARY              | 2       | syslog.sc.dst_port |        1 |                             |
+----+-------------+-------+--------+----------------------+----------------------+---------+--------------------+----------+-----------------------------+

Thanks for all your help thus far.

On 09/08/2011 02:38 PM, Andrew Moore wrote:
I don't think I saw any query timings in the emails (maybe I missed them). What version of MySQL are you currently using? What does the EXPLAIN look like when you remove the LIMIT 10? Is your server tuned for MyISAM or InnoDB? What kind of disk setup is in use? How much memory is in your machine?
[rest of quoted thread snipped]
Re: Query Optimization
Correct me if I'm wrong. You're wanting to get all records that have an open_date or a close_date between two times. If that's correct, you might be able to get an index_merge by doing a query like: WHERE ((starting time)=open_dt= (ending time)) OR ((starting time)=close_dt=(ending time)) and creating two indexes (one on 'open_dt' and the other on 'close_dt') http://dev.mysql.com/doc/refman/5.1/en/index-merge-optimization.html Regards, Derek On Sep 8, 2011, at 2:50 PM, Brandon Phelps wrote: Andy, The queries take minutes to run. MySQL is 5.1.54 and it's running on Ubuntu server 11.04. Unfortunately the machine only has 2GB of RAM but no other major daemons are running on the machine. We are running RAID 1 (mirroring) with 1TB drives. The tables in question here are all MyISAM. When running with the LIMIT 10 my EXPLAIN is: ++-+---++-+-+-++--+-+ | id | select_type | table | type | possible_keys | key | key_len | ref| rows | Extra | ++-+---++-+-+-++--+-+ | 1 | SIMPLE | sc| range | ndx_open_close_rcvd | ndx_open_close_rcvd | 8 | NULL | 32393316 | Using where; Using filesort | | 1 | SIMPLE | spm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.src_port |1 | | | 1 | SIMPLE | dpm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.dst_port |1 | | ++-+---++-+-+-++--+-+ When I remove the LIMIT 10 I get: +-+---++-+-+-++--+-+ | id | select_type | table | type | possible_keys | key | key_len | ref| rows | Extra | ++-+---++-+-+-++--+-+ | 1 | SIMPLE | sc| ALL| ndx_open_close_rcvd | NULL| NULL | NULL | 32393330 | Using where; Using filesort | | 1 | SIMPLE | spm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.src_port |1 | | | 1 | SIMPLE | dpm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.dst_port |1 | | ++-+---++-+-+-++--+-+ Thanks for all your help thus far. On 09/08/2011 02:38 PM, Andrew Moore wrote: I don't think I saw any query timings in the emails (maybe I missed them). What version of MySQL are you currently using? What does the explain look like when your remove the limit 10? 
Is your server tuned for MyISAM or InnoDB? What kind of disk setup is in use? How much memory is in your machine?

On Thu, Sep 8, 2011 at 7:27 PM, Brandon Phelps bphe...@gls.com wrote:

Thanks for the reply Andy. Unfortunately the users will be selecting varying date ranges and new data is constantly coming in, so I am not sure how I could archive/cache the necessary data in a way that would be any more efficient than simply using the database directly.

On 09/08/2011 02:16 PM, Andrew Moore wrote: Thinking outside the query, is there any archiving that could happen to make your large tables kinder in the range scan? Andy

On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelps bphe...@gls.com wrote: On 09/01/2011 01:32 PM, Brandon Phelps wrote: On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote: On 9/1/2011 09:42, Brandon Phelps wrote: On 09/01/2011 04:59 AM, Jochem van Dieten wrote: ...

WHERE (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00') AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')

In that case your logic here simplifies to:

WHERE open_dt >= '2011-08-30 00:00:00' AND close_dt <= '2011-08-30 12:36:53'

Now add an index over open_dt and close_dt and see what happens. Jochem

Jochem, I can't really use your WHERE logic because I also need to retrieve results where the open_dt time is out of the range specified. For example, a very large file download might span multiple days, so given your logic, if the connection was started 2 days ago and I want to pull 1 day's worth of connections, I would miss that entry. Basically I want to SELECT all of the records that were opened
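Derek's two-index suggestion above can be tried end to end in a few lines. This is only a sketch against SQLite (whose planner has a comparable OR-of-two-indexes optimization), not MySQL; the table and column names follow the thread, and the rows are invented:

```python
import sqlite3

# Hypothetical miniature of the thread's connection log: open_dt/close_dt
# stored as ISO-8601 strings, one single-column index on each.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sonicwall_connections (id INTEGER PRIMARY KEY, open_dt TEXT, close_dt TEXT)")
con.execute("CREATE INDEX ndx_open_dt  ON sonicwall_connections (open_dt)")
con.execute("CREATE INDEX ndx_close_dt ON sonicwall_connections (close_dt)")

rows = [
    (1, "2011-09-06 10:00:00", "2011-09-06 11:00:00"),  # entirely before the window
    (2, "2011-09-07 09:00:00", "2011-09-07 14:00:00"),  # closes inside the window
    (3, "2011-09-07 15:00:00", "2011-09-07 16:00:00"),  # opens and closes inside
    (4, "2011-09-08 13:00:00", "2011-09-09 02:00:00"),  # opens inside, closes after
    (5, "2011-09-09 00:00:00", "2011-09-09 01:00:00"),  # entirely after
]
con.executemany("INSERT INTO sonicwall_connections VALUES (?,?,?)", rows)

start, end = "2011-09-07 13:18:58", "2011-09-08 13:18:58"
# OR of two range predicates, each answerable from one of the two indexes.
hits = [r[0] for r in con.execute(
    "SELECT id FROM sonicwall_connections "
    "WHERE (open_dt BETWEEN ? AND ?) OR (close_dt BETWEEN ? AND ?) ORDER BY id",
    (start, end, start, end))]
print(hits)  # -> [2, 3, 4]
```

Note that this OR form still misses a connection that opens before the window and closes after it (Shawn's case "e" later in the thread), which is why the overlap predicate comes up next.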
Re: Query Optimization
Mihail,

I have considered this but have not yet determined how best to go about partitioning the table. I don't think partitioning by dst_address or src_address would help because most of the queries do not filter on IP address (except very specific queries where the end-user is searching the table for history on a particular employee).

I could potentially partition the table based on the day of week the connection was opened on, which may improve performance for a while, since this would take me from a single 32-million-record table down to roughly 4.5 million records per partition (32 / 7); however, we are looking to eventually store up to 2 months worth of data in the table, and so far that 32 million records is only for 1 month, so I estimate another 32 million-ish before the month is out, bringing me to roughly 70 million records total (it would be nice if I could store even more than 2 months, but given my current performance dilemma I don't expect that to happen).

Also this does not take into account that the end-user will often be pulling data for multiple days at a time, meaning that multiple partitions in this scheme will need to be accessed anyway.

The only other logical partitioning scheme I can think of would be to partition based on dst_port (the port the log relates to), but the majority of records are all to port 80 (standard web traffic), so I don't think this would be all that helpful.

I have never messed with partitioning, so it is totally possible that I am not thinking of something; if you have any ideas on a decent partitioning scheme based on my criteria and queries below, please let me know.

Thanks,
Brandon

On 09/08/2011 02:47 PM, Mihail Manolov wrote: If you're running version 5.1+ you may wanna take a look at table partitioning options you may have.

On Sep 8, 2011, at 2:27 PM, Brandon Phelps wrote: Thanks for the reply Andy.
Unfortunately the users will be selecting varying date ranges and new data is constantly coming in, so I am not sure how I could archive/cache the necessary data in a way that would be any more efficient than simply using the database directly.

On 09/08/2011 02:16 PM, Andrew Moore wrote: Thinking outside the query, is there any archiving that could happen to make your large tables kinder in the range scan? Andy

On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelps bphe...@gls.com wrote: On 09/01/2011 01:32 PM, Brandon Phelps wrote: On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote: On 9/1/2011 09:42, Brandon Phelps wrote: On 09/01/2011 04:59 AM, Jochem van Dieten wrote: ...

WHERE (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00') AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')

In that case your logic here simplifies to:

WHERE open_dt >= '2011-08-30 00:00:00' AND close_dt <= '2011-08-30 12:36:53'

Now add an index over open_dt and close_dt and see what happens. Jochem

Jochem, I can't really use your WHERE logic because I also need to retrieve results where the open_dt time is out of the range specified. For example, a very large file download might span multiple days, so given your logic, if the connection was started 2 days ago and I want to pull 1 day's worth of connections, I would miss that entry. Basically I want to SELECT all of the records that were opened OR closed during the specified time period, i.e. if any activity happened between my start and end dates, I need to see that record. Any other ideas?

I believe Jochem was on the right track but he got his dates reversed. Let's try a little ASCII art to show the situation. I will set up a query window with two markers (s) and (e). Events will be marked by |---| markers showing their durations.
a)        (s)                  (e)
b) |---|
c)      |---|
d)             |---|
e)    |---------------------------|
f)                           |---|
g)                                  |---|

To describe these situations:

a) is the window for which you want to query. (s) is the starting time and (e) is the ending time for the date range you are interested in.
b) the event starts and stops before your window exists. It won't be part of your results.
c) the event starts before the window but ends within the window - include this
d) the event starts and ends within the window - include this
e) the event starts before the window and ends after the window - include this
f) the event starts inside the window but ends beyond the window - include this.
g) the event starts and ends beyond the window - exclude this.

In order to get every event in the range of c-f, here is what you need for a WHERE clause:

WHERE start <= (ending time) AND end >= (starting time)

Try that and let us know the results.

Thanks Jochem and Shawn, however the following two queries result in the exact same EXPLAIN output: (I hope the tables don't wrap too early for you)

Old method:

SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS
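Shawn's c)-f) rule can be checked mechanically. A small Python sketch with integers standing in for the timestamps, one tuple per lettered case:

```python
# The overlap rule in code: an event [start, end] overlaps the query window
# [s, e] iff start <= e AND end >= s. The tuples mirror cases b) through g).
s, e = 10, 20  # query window

cases = {
    "b": (1, 5),    # ends before the window        -> exclude
    "c": (5, 12),   # starts before, ends inside    -> include
    "d": (12, 18),  # entirely inside               -> include
    "e": (5, 25),   # spans the whole window        -> include
    "f": (18, 25),  # starts inside, ends after     -> include
    "g": (22, 30),  # starts after the window       -> exclude
}

def overlaps(start, end):
    return start <= e and end >= s

included = sorted(k for k, (start, end) in cases.items() if overlaps(start, end))
print(included)  # -> ['c', 'd', 'e', 'f']
```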
Re: Query Optimization
Thanks for the idea Derek, however given the following query my EXPLAIN output is identical:

SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE ('2011-09-07 13:18:58' <= open_dt <= '2011-09-08 13:18:58') OR ('2011-09-07 13:18:58' <= close_dt <= '2011-09-08 13:18:58');

+----+-------------+-------+--------+---------------+---------+---------+--------------------+----------+-------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref                | rows     | Extra       |
+----+-------------+-------+--------+---------------+---------+---------+--------------------+----------+-------------+
|  1 | SIMPLE      | sc    | ALL    | NULL          | NULL    | NULL    | NULL               | 32393330 | Using where |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY       | PRIMARY | 2       | syslog.sc.src_port |        1 |             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY       | PRIMARY | 2       | syslog.sc.dst_port |        1 |             |
+----+-------------+-------+--------+---------------+---------+---------+--------------------+----------+-------------+

I did create indexes on open_dt and close_dt (2 separate indexes).

On 09/08/2011 02:55 PM, Derek Downey wrote:

Correct me if I'm wrong. You're wanting to get all records that have an open_date or a close_date between two times. If that's correct, you might be able to get an index_merge by doing a query like:

WHERE ((starting time) <= open_dt <= (ending time)) OR ((starting time) <= close_dt <= (ending time))

and creating two indexes (one on 'open_dt' and the other on 'close_dt')

http://dev.mysql.com/doc/refman/5.1/en/index-merge-optimization.html

Regards, Derek

On Sep 8, 2011, at 2:50 PM, Brandon Phelps wrote:

Andy, The queries take minutes to run. MySQL is 5.1.54 and it's running on Ubuntu server 11.04. Unfortunately the machine only has 2GB of RAM but no other major daemons are running on the machine. We are running RAID 1 (mirroring) with 1TB drives. The tables in question here are all MyISAM.
When running with the LIMIT 10 my EXPLAIN is:

+----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
| id | select_type | table | type   | possible_keys       | key                 | key_len | ref                | rows     | Extra                       |
+----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
|  1 | SIMPLE      | sc    | range  | ndx_open_close_rcvd | ndx_open_close_rcvd | 8       | NULL               | 32393316 | Using where; Using filesort |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.src_port |        1 |                             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.dst_port |        1 |                             |
+----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+

When I remove the LIMIT 10 I get:

+----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
| id | select_type | table | type   | possible_keys       | key                 | key_len | ref                | rows     | Extra                       |
+----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
|  1 | SIMPLE      | sc    | ALL    | ndx_open_close_rcvd | NULL                | NULL    | NULL               | 32393330 | Using where; Using filesort |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.src_port |        1 |                             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.dst_port |        1 |                             |
+----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+

Thanks for all your help thus far.

On 09/08/2011 02:38 PM, Andrew Moore wrote: I don't think I saw any query timings in the emails (maybe I missed them). What version of MySQL are you currently using? What does the explain look like when you remove the LIMIT 10? Is your server tuned for MyISAM or InnoDB? What kind of disk setup is in use? How much memory is
Re: Query Optimization
Partitioning isn't a bad idea for this, however I'm still thinking about your dataset size and possible hardware limitations. It's not likely going to fit into the relevant buffers/memory, so you're going to be on disk more than you want. You're probably creating temporary tables like crazy, and I would bet that a good load of them are heading to disk too. With your IO performance limited to a small number of disks as you describe, you're not going to be able to get much more from these queries.

Although it's a dedicated DB server, are there other databases being accessed on the server? When looking at the scope of your data, are you capturing more than you need? How often and how far back are the users querying? How many users are concurrently performing queries on the 32m record table?

On Thu, Sep 8, 2011 at 8:04 PM, Brandon Phelps bphe...@gls.com wrote:

Mihail, I have considered this but have not yet determined how best to go about partitioning the table. I don't think partitioning by dst_address or src_address would help because most of the queries do not filter on IP address (except very specific queries where the end-user is searching the table for history on a particular employee). I could potentially partition the table based on the day of week the connection was opened on, which may improve performance for a while, since this would take me from a single 32-million-record table down to roughly 4.5 million records per partition (32 / 7); however, we are looking to eventually store up to 2 months worth of data in the table, and so far that 32 million records is only for 1 month, so I estimate another 32 million-ish before the month is out, bringing me to roughly 70 million records total (it would be nice if I could store even more than 2 months, but given my current performance dilemma I don't expect that to happen).
Also this does not take into account that the end-user will often be pulling data for multiple days at a time, meaning that multiple partitions in this scheme will need to be accessed anyway. The only other logical partitioning scheme I can think of would be to partition based on dst_port (the port the log relates to), but the majority of records are all to port 80 (standard web traffic), so I don't think this would be all that helpful. I have never messed with partitioning, so it is totally possible that I am not thinking of something; if you have any ideas on a decent partitioning scheme based on my criteria and queries below, please let me know.

Thanks, Brandon

On 09/08/2011 02:47 PM, Mihail Manolov wrote: If you're running version 5.1+ you may wanna take a look at table partitioning options you may have.

On Sep 8, 2011, at 2:27 PM, Brandon Phelps wrote: Thanks for the reply Andy.
Re: Query Optimization
How about:

SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'
UNION
SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'

On Sep 8, 2011, at 3:25 PM, Brandon Phelps wrote:

Thanks for the idea Derek, however given the following query my EXPLAIN output is identical:

SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM sonicwall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port WHERE ('2011-09-07 13:18:58' <= open_dt <= '2011-09-08 13:18:58') OR ('2011-09-07 13:18:58' <= close_dt <= '2011-09-08 13:18:58');

+----+-------------+-------+--------+---------------+---------+---------+--------------------+----------+-------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref                | rows     | Extra       |
+----+-------------+-------+--------+---------------+---------+---------+--------------------+----------+-------------+
|  1 | SIMPLE      | sc    | ALL    | NULL          | NULL    | NULL    | NULL               | 32393330 | Using where |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY       | PRIMARY | 2       | syslog.sc.src_port |        1 |             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY       | PRIMARY | 2       | syslog.sc.dst_port |        1 |             |
+----+-------------+-------+--------+---------------+---------+---------+--------------------+----------+-------------+

I did create indexes on open_dt and close_dt (2 separate indexes).

On 09/08/2011 02:55 PM, Derek Downey wrote: Correct me if I'm wrong. You're wanting to get all records that have an open_date or a close_date between two times. If that's correct, you might be able to get an index_merge by doing a query like: WHERE ((starting time) <= open_dt <= (ending time)) OR ((starting time) <= close_dt <= (ending time)) and creating two indexes (one on 'open_dt' and the other on 'close_dt').
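Mihail's UNION rewrite returns the same rows as the OR form, and (as comes up next in the thread) UNION deduplicates a row that matches both ranges. A sketch with SQLite standing in for MySQL; the table name is shortened to `c` and the data is invented:

```python
import sqlite3

# Minimal connection-log stand-in: id, open_dt, close_dt as ISO-8601 strings.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE c (id INTEGER PRIMARY KEY, open_dt TEXT, close_dt TEXT)")
con.executemany("INSERT INTO c VALUES (?,?,?)", [
    (1, "2011-09-07 14:00:00", "2011-09-07 15:00:00"),  # both endpoints in range
    (2, "2011-09-06 09:00:00", "2011-09-07 14:00:00"),  # only close_dt in range
    (3, "2011-09-08 14:00:00", "2011-09-08 15:00:00"),  # neither in range
])
p = {"s": "2011-09-07 13:18:58", "e": "2011-09-08 13:18:58"}

# UNION of the two single-column range scans...
union_q = ("SELECT id FROM c WHERE open_dt BETWEEN :s AND :e "
           "UNION "
           "SELECT id FROM c WHERE close_dt BETWEEN :s AND :e")
# ...versus the single OR query.
or_q = ("SELECT id FROM c WHERE open_dt BETWEEN :s AND :e "
        "OR close_dt BETWEEN :s AND :e")

union_ids = sorted(r[0] for r in con.execute(union_q, p))
or_ids = sorted(r[0] for r in con.execute(or_q, p))
print(union_ids, or_ids)  # -> [1, 2] [1, 2]  (row 1 is not duplicated by UNION)
```

The advantage of the UNION form is that each branch has a single sargable range predicate, so each branch can use its own index instead of forcing a full scan.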
Re: Query Optimization
Andrew,

Generally there is only 1 user performing the complicated SELECT query at a time; however, the background process that fills the table is constantly doing a fast SELECT (0.3 seconds) and a subsequent UPDATE. Basically, whenever a connection is closed on the firewall, the bg process SELECTs from the table the last record that was opened (and hasn't already been closed) for a given source IP, source port, dest IP, and protocol. So for example whenever the firewall logs a CONNECTION CLOSED message, the bg process does:

select id from sonicwall_connections where src_address = w.x.y.z and src_port = x and dst_address = w.x.y.z and protocol = x ORDER BY open_dt LIMIT 0, 1

then it UPDATEs the close_dt column for the record with the selected ID. These select/update statements being run by the background process generally take 0.000x seconds each. The only slowdown we see is with the complicated SELECT I have below, and this is only executed once in a while throughout the day. For example, if an IT manager sees a huge bandwidth spike on the network, he may access the web gui to determine what the source IP is so he can determine who is downloading a large file, etc.

I think what I will try to do is create 60 partitions for the table in question, based on month and day. This way each day will go into its own partition, and if someone runs the query for a date range such as 01/01/2011 - 01/07/2011 it will only need to access 7 partitions instead of the entire table. My only question with this is: how would I go about creating the table with 60 partitions in such a way that I won't need to constantly add/remove new/old partitions every day? Is there any partitioning function I can use that would not require me to update the partition schema every day? I already plan to have a cron run each night to purge records older than 60 days from the database.
On 09/08/2011 03:26 PM, Andrew Moore wrote:

Partitioning isn't a bad idea for this, however I'm still thinking about your dataset size and possible hardware limitations. It's not likely going to fit into the relevant buffers/memory, so you're going to be on disk more than you want.

On Thu, Sep 8, 2011 at 8:04 PM, Brandon Phelps bphe...@gls.com wrote:

Mihail, I have considered this but have not yet determined how best to go about partitioning the table.
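One answer to the "rotating partitions without nightly schema changes" question above is hash partitioning on a day number, e.g. MySQL 5.1's `PARTITION BY HASH(TO_DAYS(open_dt)) PARTITIONS 60` (an approach to verify against the partitioning chapter of the manual, not something tested in this thread). A minimal Python sketch of the bucket arithmetic, with `date.toordinal()` standing in for `TO_DAYS()`:

```python
from datetime import date, timedelta

# Rotating day-to-partition mapping: each calendar day hashes to one of 60
# buckets, consecutive days land in consecutive buckets, and a bucket is
# automatically reused 60 days later -- so no nightly ALTER TABLE is needed.
NUM_PARTITIONS = 60

def partition_for(d: date) -> int:
    return d.toordinal() % NUM_PARTITIONS

day0 = date(2011, 1, 1)
parts = [partition_for(day0 + timedelta(days=i)) for i in range(120)]

# 60 consecutive days occupy 60 distinct buckets; day 60 wraps back to day 0's.
print(parts[0] == parts[60], len(set(parts[:60])))  # -> True 60
```

One caveat worth checking before committing to this: MySQL's pruning of HASH partitions for date *range* scans is more limited than pruning of RANGE partitions, so the manual's partition-pruning section is worth a read first.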
Re: Query Optimization
Mihail,

Thanks so much! I modified your example to include the proper ORDER BY and LIMIT clauses and this, so far, is running super fast (0.0007 seconds). Question, if a record's open_dt is between the range AND the close_dt is between the range as well, will the UNION output the record twice? If so, is there any way to prevent that?

(SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58')
UNION
(SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58')
ORDER BY rcvd DESC LIMIT 10;

+------+--------------+------------+--------+--------------------------------------------+--------------+---------+--------------------+------+----------------+
| id   | select_type  | table      | type   | possible_keys                              | key          | key_len | ref                | rows | Extra          |
+------+--------------+------------+--------+--------------------------------------------+--------------+---------+--------------------+------+----------------+
|    1 | PRIMARY      | sc         | range  | open_dt,ndx_open_close_rcvd,ndx_open_close | open_dt      | 8       | NULL               | 1057 | Using where    |
|    1 | PRIMARY      | spm        | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.src_port |    1 |                |
|    1 | PRIMARY      | dpm        | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.dst_port |    1 |                |
|    2 | UNION        | sc         | range  | ndx_close_dt                               | ndx_close_dt | 8       | NULL               | 1131 | Using where    |
|    2 | UNION        | spm        | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.src_port |    1 |                |
|    2 | UNION        | dpm        | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.dst_port |    1 |                |
| NULL | UNION RESULT | <union1,2> | ALL    | NULL                                       | NULL         | NULL    | NULL               | NULL | Using filesort |
+------+--------------+------------+--------+--------------------------------------------+--------------+---------+--------------------+------+----------------+
On 09/08/2011 03:45 PM, Mihail Manolov wrote:

How about:

SELECT ... FROM sonicwall_connections AS sc ... WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'
UNION
SELECT ... FROM sonicwall_connections AS sc ... WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'
Re: Query Optimization
From the manual: The default behavior for UNION is that duplicate rows are removed from the result.

On Sep 8, 2011, at 4:50 PM, Brandon Phelps wrote:

Mihail, Thanks so much! I modified your example to include the proper ORDER BY and LIMIT clauses and this, so far, is running super fast (0.0007 seconds). Question, if a record's open_dt is between the range AND the close_dt is between the range as well, will the UNION output the record twice? If so, is there any way to prevent that?
Re: Query Optimization
Ah I see. Well thanks for your assistance! -Brandon

On 09/08/2011 05:21 PM, Mihail Manolov wrote:

From the manual: The default behavior for UNION is that duplicate rows are removed from the result.
Re: Query Optimization
On Aug 30, 2011 6:46 PM, Brandon Phelps wrote: SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA( sc.src_address ) AS src_address, sc.src_port, INET_NTOA( sc.dst_address ) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM firewall_connections AS sc If this is a firewall connection log I presume open_dt is the time a connection was opened and is always going to be less than close_dt. Right? WHERE (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00') AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53') In that case your logic here simplifies to: WHERE open_dt >= '2011-08-30 00:00:00' AND close_dt <= '2011-08-30 12:36:53' Now add an index over open_dt and close_dt and see what happens. Jochem
Re: Query Optimization
On 09/01/2011 04:59 AM, Jochem van Dieten wrote: SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA( sc.src_address ) AS src_address, sc.src_port, INET_NTOA( sc.dst_address ) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM firewall_connections AS sc If this is a firewall connection log I presume open_dt is the time a connection was opened and is always going to be less than close_dt. Right? WHERE (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00') AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53') In that case your logic here simplifies to: WHERE open_dt >= '2011-08-30 00:00:00' AND close_dt <= '2011-08-30 12:36:53' Now add an index over open_dt and close_dt and see what happens. Jochem

Jochem, I can't really use your WHERE logic because I also need to retrieve results where the open_dt time is out of the range specified. For example, a very large file download might span multiple days, so given your logic, if the connection was started 2 days ago and I want to pull 1 day's worth of connections, I would miss that entry. Basically I want to SELECT all of the records that were opened OR closed during the specified time period, i.e. if any activity happened between my start and end dates, I need to see that record. Any other ideas? Thanks again, Brandon -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql?unsub=arch...@jab.org
Re: Query Optimization
On 9/1/2011 09:42, Brandon Phelps wrote: On 09/01/2011 04:59 AM, Jochem van Dieten wrote: ... WHERE (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00') AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53') In that case your logic here simplifies to: WHERE open_dt >= '2011-08-30 00:00:00' AND close_dt <= '2011-08-30 12:36:53' Now add an index over open_dt and close_dt and see what happens. Jochem Jochem, I can't really use your WHERE logic because I also need to retrieve results where the open_dt time is out of the range specified. For example, a very large file download might span multiple days so given your logic if the connection was started 2 days ago and I want to pull 1 day's worth of connections, I would miss that entry. Basically I want to SELECT all of the records that were opened OR closed during the specified time period, i.e. if any activity happened between my start and end dates, I need to see that record. Any other ideas? I believe Jochem was on the right track but he got his dates reversed. Let's try a little ASCII art to show the situation. I will setup a query window with two markers (s) and (e). Events will be marked by |---| markers showing their durations.

a)         (s)                  (e)
b) |---|
c)       |---|
d)             |---|
e) |------------------------------|
f)                          |---|
g)                                  |---|

To describe these situations: a) is the window for which you want to query (s) is the starting time and (e) is the ending time for the date range you are interested in. b) the event starts and stops before your window exists. It won't be part of your results. c) the event starts before the window but ends within the window - include this d) the event starts and ends within the window - include this e) the event starts before the window and ends after the window - include this f) the event starts inside the window but ends beyond the window - include this. g) the event starts and ends beyond the window - exclude this. 
In order to get every event in the range of c-f, here is what you need for a WHERE clause: WHERE start <= (ending time) AND end >= (starting time) Try that and let us know the results. -- Shawn Green MySQL Principal Technical Support Engineer Oracle USA, Inc. - Hardware and Software, Engineered to Work Together. Office: Blountville, TN -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql?unsub=arch...@jab.org
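Shawn's two-comparison overlap test can be sanity-checked with a small sketch. This uses Python and SQLite as a stand-in for MySQL, and the table, column names, and sample rows are invented for the demo; the predicate itself is the one from this thread (an event overlaps the window exactly when it starts before the window ends AND ends after the window starts):

```python
import sqlite3

# Events b-g mirror the ASCII-art cases: b is entirely before the
# window, g entirely after, and c/d/e/f overlap it in different ways.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (name TEXT, open_dt TEXT, close_dt TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("b", "2011-08-29 00:00", "2011-08-29 12:00"),  # before window
        ("c", "2011-08-29 23:00", "2011-08-30 01:00"),  # ends inside
        ("d", "2011-08-30 02:00", "2011-08-30 03:00"),  # fully inside
        ("e", "2011-08-29 00:00", "2011-08-31 00:00"),  # spans window
        ("f", "2011-08-30 11:00", "2011-08-30 13:00"),  # starts inside
        ("g", "2011-08-30 14:00", "2011-08-30 15:00"),  # after window
    ],
)
s, e = "2011-08-30 00:00", "2011-08-30 12:36"  # query window (s) .. (e)
rows = conn.execute(
    "SELECT name FROM events WHERE open_dt <= ? AND close_dt >= ?"
    " ORDER BY name",
    (e, s),
).fetchall()
print([r[0] for r in rows])  # ['c', 'd', 'e', 'f']
```

Only cases b and g are excluded, which is exactly the c-f range Shawn describes.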
Re: Query Optimization
On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote: On 9/1/2011 09:42, Brandon Phelps wrote: On 09/01/2011 04:59 AM, Jochem van Dieten wrote: ... WHERE (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00') AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53') In that case your logic here simplifies to: WHERE open_dt >= '2011-08-30 00:00:00' AND close_dt <= '2011-08-30 12:36:53' Now add an index over open_dt and close_dt and see what happens. Jochem Jochem, I can't really use your WHERE logic because I also need to retrieve results where the open_dt time is out of the range specified. For example, a very large file download might span multiple days so given your logic if the connection was started 2 days ago and I want to pull 1 day's worth of connections, I would miss that entry. Basically I want to SELECT all of the records that were opened OR closed during the specified time period, i.e. if any activity happened between my start and end dates, I need to see that record. Any other ideas? I believe Jochem was on the right track but he got his dates reversed. Let's try a little ASCII art to show the situation. I will setup a query window with two markers (s) and (e). Events will be marked by |---| markers showing their durations.

a)         (s)                  (e)
b) |---|
c)       |---|
d)             |---|
e) |------------------------------|
f)                          |---|
g)                                  |---|

To describe these situations: a) is the window for which you want to query (s) is the starting time and (e) is the ending time for the date range you are interested in. b) the event starts and stops before your window exists. It won't be part of your results. c) the event starts before the window but ends within the window - include this d) the event starts and ends within the window - include this e) the event starts before the window and ends after the window - include this f) the event starts inside the window but ends beyond the window - include this. g) the event starts and ends beyond the window - exclude this. 
In order to get every event in the range of c-f, here is what you need for a WHERE clause: WHERE start <= (ending time) AND end >= (starting time) Try that and let us know the results.

Thanks Jochem and Shawn, however the following two queries result in the exact same EXPLAIN output: (I hope the tables don't wrap too early for you)

Old method:
SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM firewall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port WHERE (open_dt >= '2011-08-31 09:53:31' OR close_dt >= '2011-08-31 09:53:31') AND (open_dt <= '2011-09-01 09:53:31' OR close_dt <= '2011-09-01 09:53:31') ORDER BY rcvd DESC LIMIT 0, 10;

New method with BTREE index on open_dt, close_dt (index name is ndx_open_close_dt):
SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM firewall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port WHERE open_dt <= '2011-09-01 09:53:31' AND close_dt >= '2011-08-31 09:53:31' ORDER BY rcvd DESC LIMIT 0, 10;

EXPLAIN output for old method:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | sc | index | open_dt,ndx_open_close_dt | ndx_rcvd | 4 | NULL | 10 | Using where |
| 1 | SIMPLE | spm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.src_port | 1 | |
| 1 | SIMPLE | dpm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.dst_port | 1 | |

EXPLAIN output for new method with new index:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
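Why the two query forms above produce the same plan makes more sense once you see that, whenever open_dt <= close_dt (a connection opens before it closes), the OR-based filter and the two-comparison overlap filter select exactly the same rows. A quick mechanical check, in Python/SQLite standing in for MySQL with an invented table and random data:

```python
import random
import sqlite3

# Generate 500 intervals with open_dt <= close_dt (integers stand in
# for datetimes; only ordering matters) and compare the two filters.
random.seed(1)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fw (id INTEGER PRIMARY KEY, open_dt INT, close_dt INT)"
)
rows = []
for i in range(500):
    a, b = sorted(random.sample(range(1000), 2))
    rows.append((i, a, b))
conn.executemany("INSERT INTO fw VALUES (?, ?, ?)", rows)

s, e = 300, 700  # query window
old = conn.execute(
    "SELECT id FROM fw WHERE (open_dt >= ? OR close_dt >= ?)"
    " AND (open_dt <= ? OR close_dt <= ?) ORDER BY id",
    (s, s, e, e),
).fetchall()
new = conn.execute(
    "SELECT id FROM fw WHERE open_dt <= ? AND close_dt >= ? ORDER BY id",
    (e, s),
).fetchall()
print(old == new)  # True
```

The difference is purely about what the optimizer can do with each form: the OR version cannot use a simple range over one index, while the AND version can.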
Re: Query Optimization
There are a few things going on, but mainly it is the ORs that are killing you. As you require OR to examine two distinct columns, both of equal relevance to the query, MySQL is left with no choice but to do a full table scan on what might be (at a guess) a very large table. No amount of indexing will fix this for the query presented. You would be better off writing it as two distinct queries, each concerned with conditions on a single column (open_dt and close_dt) and then UNIONing the results. In this form, the indexes have a chance of being engaged. Once the indexes are engaged, you probably want to convert your DATE/DATETIME strings into actual DATEs or DATETIMEs, thus: ... (open_dt >= DATE('2011-08-30 00:00:00')) ... In its current state, the DATE fields are being converted to strings implicitly for every row tested, which further frustrates index usage as the index is against the quasi-numeric DATE, not the string representation which your current implementation appears to expect. This query would also suddenly begin to fail entirely if the DEFAULT_DATE_FORMAT gets modified by an admin or a future release of MySQL. 
The explicit casting I have suggested will protect you against that. - michael dykman On Tue, Aug 30, 2011 at 12:45 PM, Brandon Phelps bphe...@gls.com wrote: Hello, I have the following query I'd like to optimize a bit: SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA( sc.src_address ) AS src_address, sc.src_port, INET_NTOA( sc.dst_address ) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM firewall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port WHERE (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00') AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53') ORDER BY rcvd DESC LIMIT 0, 10 Currently I have an index on the rcvd column which seems to be working based on the output of EXPLAIN: id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE sc index open_dt ndx_rcvd 4 NULL 10 Using where 1 SIMPLE spm eq_ref PRIMARY PRIMARY 2 syslog.sc.src_port 1 1 SIMPLE dpm eq_ref PRIMARY PRIMARY 2 syslog.sc.dst_port 1 However the query is still fairly slow for some reason, any ideas how I could speed it up with additional indexes, etc? The values I am using in the WHERE clause are variable and change each time. The table has around 23 million records right now but is expected to continue to grow up to a potential 150 million. 
Here is the table schema: CREATE TABLE IF NOT EXISTS `firewall_connections` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT, `open_dt` datetime NOT NULL, `close_dt` datetime NOT NULL, `protocol` smallint(6) NOT NULL, `src_address` int(10) unsigned NOT NULL, `src_port` smallint(5) unsigned NOT NULL, `dst_address` int(10) unsigned NOT NULL, `dst_port` smallint(5) unsigned NOT NULL, `sent` int(10) unsigned NOT NULL, `rcvd` int(10) unsigned NOT NULL, PRIMARY KEY (`id`), KEY `ndx_rcvd` (`rcvd`), KEY `ndx_sent` (`sent`) ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci; Thanks in advance! -- Brandon -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql?unsub=mdyk...@gmail.com -- - michael dykman - mdyk...@gmail.com May the Source be with you.
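Michael's OR-to-UNION rewrite can be sketched concretely. This is Python/SQLite standing in for MySQL, with a trimmed-down version of the firewall_connections schema and invented sample rows and index names; the point is that each arm of the UNION filters on a single indexed column, and UNION (as opposed to UNION ALL) removes the row that matches both arms:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE firewall_connections (
        id INTEGER PRIMARY KEY,
        open_dt TEXT NOT NULL,
        close_dt TEXT NOT NULL,
        rcvd INT NOT NULL
    );
    CREATE INDEX ndx_open ON firewall_connections (open_dt);
    CREATE INDEX ndx_close ON firewall_connections (close_dt);
""")
conn.executemany(
    "INSERT INTO firewall_connections (open_dt, close_dt, rcvd)"
    " VALUES (?, ?, ?)",
    [("2011-08-29 10:00", "2011-08-30 01:00", 10),   # closed in window
     ("2011-08-30 02:00", "2011-08-30 03:00", 99),   # both in window
     ("2011-08-30 11:00", "2011-08-31 02:00", 5)],   # opened in window
)
sql = """
    SELECT id, rcvd FROM firewall_connections
     WHERE open_dt BETWEEN ? AND ?
    UNION
    SELECT id, rcvd FROM firewall_connections
     WHERE close_dt BETWEEN ? AND ?
     ORDER BY rcvd DESC
"""
w = ("2011-08-30 00:00", "2011-08-30 12:36")
rows = conn.execute(sql, w + w).fetchall()
print(rows)  # row 2 matches both arms but appears only once
```

Each arm is a straightforward range scan over its own index, which is exactly the shape the earlier UNION thread (the one that got to 0.0007 seconds) ended up with.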
Re: Query Optimization
Thanks Singer, this took my query down to 0.0007, perfect! I wasn't aware a single index of multiple columns would work when one of the columns was in the WHERE clause and the other in the ORDER BY clause. Learn something new every day I guess! On 08/10/2011 02:03 PM, Singer X.J. Wang wrote: Try an index on (dst_port, close_dt) On Wed, Aug 10, 2011 at 14:01, Brandon Phelps bphe...@gls.com mailto:bphe...@gls.com wrote: Hello all, I am using the query below and variations of it to query a database with a TON of records. Currently the database has around 11 million records but it grows every day and should cap out at around 150 million. I am curious if there is any way I can better optimize the below query, as currently it takes this query around 10 seconds to run but I am sure this will get slower and slower as the database grows. SELECT open_dt, close_dt, protocol, INET_NTOA(src_address) AS src_address, src_port, INET_NTOA(dst_address) AS dst_address, dst_port, sent, rcvd FROM connections WHERE dst_port = 80 ORDER BY close_dt DESC LIMIT 0, 30 I do have an index on the dst_port column, as you can see by the output of EXPLAIN: id 1 select_type SIMPLE table connections type ref possible_keys dst_port key dst_port key_len 2 ref const rows 1109273 Extra Using where; Using filesort Thanks in advance, -- Brandon -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql?unsub=w...@singerwang.com -- The best compliment you could give Pythian for our service is a referral. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql?unsub=arch...@jab.org
Re: Query Optimization
On 8/10/2011 1:01 PM, Brandon Phelps wrote: Hello all, I am using the query below and variations of it to query a database with a TON of records. Currently the database has around 11 million records but it grows every day and should cap out at around 150 million. I am curious if there is any way I can better optimize the below query, as currently it takes this query around 10 seconds to run but I am sure this will get slower and slower as the database grows. SELECT open_dt, close_dt, protocol, INET_NTOA(src_address) AS src_address, src_port, INET_NTOA(dst_address) AS dst_address, dst_port, sent, rcvd FROM connections WHERE dst_port = 80 ORDER BY close_dt DESC LIMIT 0, 30 I do have an index on the dst_port column, as you can see by the output of EXPLAIN: id 1 select_type SIMPLE table connections type ref possible_keys dst_port key dst_port key_len 2 ref const rows 1109273 Extra Using where; Using filesort Did you try adding your ORDER BY argument close_dt to the index? PB - Thanks in advance, -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql?unsub=arch...@jab.org
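The fix that ended this thread — a composite index covering both the WHERE equality and the ORDER BY column — can be illustrated with a rough sketch. This uses Python/SQLite rather than MySQL, and the index name ndx_port_close plus all data are invented; the idea carries over because both engines can walk a (dst_port, close_dt) index to satisfy `WHERE dst_port = 80 ORDER BY close_dt DESC` without a separate sort pass (the "Using filesort" in the EXPLAIN above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE connections (id INTEGER PRIMARY KEY,"
    " dst_port INT, close_dt TEXT)"
)
# Composite index: equality column first, then the sort column.
conn.execute(
    "CREATE INDEX ndx_port_close ON connections (dst_port, close_dt)"
)
conn.executemany(
    "INSERT INTO connections (dst_port, close_dt) VALUES (?, ?)",
    [(80, "2011-08-10 10:00"), (443, "2011-08-10 10:05"),
     (80, "2011-08-10 10:10"), (80, "2011-08-10 09:00")],
)
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT close_dt FROM connections"
    " WHERE dst_port = 80 ORDER BY close_dt DESC LIMIT 30"
).fetchall()
# The plan should reference the composite index, and no sort step.
print(any("ndx_port_close" in row[-1] for row in plan))  # True
```

Ordering matters: (dst_port, close_dt) works here, while (close_dt, dst_port) would not serve the equality lookup first.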
Re: Query Optimization
From: Brandon Phelps bphe...@gls.com I am curious if there is any way I can better optimize the below query, as currently it takes this query around 10 seconds to run but I am sure this will get slower and slower as the database grows. You need an index on `close_dt`. SELECT open_dt, close_dt, protocol, INET_NTOA(src_address) AS src_address, src_port, INET_NTOA(dst_address) AS dst_address, dst_port, sent, rcvd FROM connections WHERE dst_port = 80 ORDER BY close_dt DESC LIMIT 0, 30 Current farmers, who have become mere operators of machines and mixers of chemicals, may not have the skills to develop a local, sustainable agriculture. A new generation of farmers, numbering in the tens of millions, will need to be trained and relocated to rural communities. -- Pat Murphy Jan Steinman, EcoReality Co-op -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql?unsub=arch...@jab.org
RE: query optimization question (my struggle against 'using temporary; using filesort')
Hi Ciaran, So I think there's a couple of things going on: 1. The explain plan for your slow query looks wrong, as if mysql is confused. It's possible your index statistics are incorrect. Try ANALYZE TABLE on listings and addresses. I think a sure way to fix it is to add STRAIGHT_JOIN to force the join order. That should get rid of the temp table and filesort operations and give faster results. SELECT STRAIGHT_JOIN listings.*, addresses.* FROM `listings` JOIN `addresses` ON `addresses`.addressable_id = `listings`.id AND `addresses`.addressable_type = 'Listing' WHERE (addresses.parent_city_id = 3) ORDER BY listings.id DESC LIMIT 1 2. I need to make some comments about your table design: This column is AWFUL: `addressable_type` varchar(255) DEFAULT NULL, Why have a field that holds up to 255 characters and put a little string in it like Listing? Why does it matter? Well it makes your indexes disastrously bloated: KEY `index_addresses_on_parent_city_id_and_addressable_type` (`parent_city_id`,`addressable_type`), If you noticed in the explain plan, that index is 773 bytes *per row*. Especially using utf8 means each character takes 3 bytes in the index. That's terrible. That type field should be a foreign key tinyint or at the very least be a much much shorter varchar field (such as 8 or 10) You have lots of those varchar(255) fields, which looks like lazy design -- by not gathering correct requirements and designing accordingly you will hurt your database performance, waste disk space and cause yourself all kinds of future problems. 3. Why are you using OUTER JOIN? It looks to me like you're using it because you don't know the difference, since you're not looking for NULL rows or anything. In fact, it looks like mysql is smart enough to know that you've negated the OUTER JOIN by putting conditions on the joined tables in the WHERE clause, and convert them to INNER JOINs. Don't rely on that! Use the correct join type. 
Those queries Regards, Gavin Towey -Original Message- From: Ciaran Lee [mailto:ciaran@gmail.com] Sent: Tuesday, September 22, 2009 1:32 PM To: mysql@lists.mysql.com Subject: query optimization question (my struggle against 'using temporary; using filesort') Hi, I hope this is the right place to ask a question about query optimization. Background: I have a database which has events, which occur in places (listings). Places have addresses, and addresses belong to a city. I can select the latest event within a particular city very efficiently (less than 2ms), but selecting the latest listing within a city is REALLY slow (10-20 seconds) despite being almost a subset of the event query. I have been working on this for about a day, and have tried all sorts of tweaks to the indexes but to no avail. I always seem to end up with 'using temporary; using filesort' as the 'extra' content in the explain result. If anyone has a suggestion for what I might do to fix this, I'd really appreciate it. If not, I could further de-normalize the database for performance reasons, but I would feel dirty for doing so. Here is the fast query (select the latest event within a particular city), and its EXPLAIN. 
SELECT events.*, listings.*, addresses.* FROM `events` LEFT OUTER JOIN `listings` ON `listings`.id = `events`.listing_id LEFT OUTER JOIN `addresses` ON `addresses`.addressable_id = `listings`.id AND `addresses`.addressable_type = 'Listing' WHERE (addresses.parent_city_id = 3) ORDER BY events.id DESC LIMIT 1 ++-+---++++-+--+--+-+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | ++-+---++++-+--+--+-+ | 1 | SIMPLE | events| index | index_events_on_listing_id | PRIMARY | 4 | NULL |1 | | | 1 | SIMPLE | listings | eq_ref | PRIMARY | PRIMARY | 4 | ratemyarea.events.listing_id |1 | Using where | | 1 | SIMPLE | addresses | ref| index_addresses_on_parent_city_id_and_addressable_type,index_addresses_on_addressable_type_and_addressable_id,addressable_id_type_city | index_addresses_on_addressable_type_and_addressable_id | 773 | const, ratemyarea.listings.id |1 | Using
Re: query optimization question (my struggle against 'using temporary; using filesort')
Hi Gavin, Thanks very much, I'll implement as many of your suggestions as possible. The varchar(255)'s are inexcusable and I feel suitably ashamed :) The queries were generated by ActiveRecord (an ORM library for Ruby), although even if I had written them myself they would probably not be much better. Regards, Ciaran Lee 2009/9/24 Gavin Towey gto...@ffn.com Hi Ciaran, So I think there's a couple of things going on: 1. The explain plan for your slow query looks wrong, as if mysql is confused. It's possible your index statistics are incorrect. Try ANALYZE TABLE on listings and addresses. I think a sure way to fix it is to add STRAIGHT_JOIN to force the join order. That should get rid of the temp table and filesort operations and give faster results. SELECT STRAIGHT_JOIN listings.*, addresses.* FROM `listings` JOIN `addresses` ON `addresses`.addressable_id = `listings`.id AND `addresses`.addressable_type = 'Listing' WHERE (addresses.parent_city_id = 3) ORDER BY listings.id DESC LIMIT 1 2. I need to make some comments about your table design: This column is AWFUL: `addressable_type` varchar(255) DEFAULT NULL, Why have a field that holds up to 255 characters and put a little string in it like Listing? Why does it matter? Well it makes your indexes disastrously bloated: KEY `index_addresses_on_parent_city_id_and_addressable_type` (`parent_city_id`,`addressable_type`), If you noticed in the explain plan, that index is 773 bytes *per row*. Especially using utf8 means each character takes 3 bytes in the index. That's terrible. That type field should be a foreign key tinyint or at the very least be a much much shorter varchar field (such as 8 or 10) You have lots of those varchar(255) fields, which looks like lazy design -- by not gathering correct requirements and designing accordingly you will hurt your database performance, waste disk space and cause yourself all kinds of future problems. 3. Why are you using OUTER JOIN? 
It looks to me like you're using it because you don't know the difference, since you're not looking for NULL rows or anything. In fact, it looks like mysql is smart enough to know that you've negated the OUTER JOIN by putting conditions on the joined tables in the WHERE clause, and convert them to INNER JOINs. Don't rely on that! Use the correct join type. Those queries Regards, Gavin Towey -Original Message- From: Ciaran Lee [mailto:ciaran@gmail.com] Sent: Tuesday, September 22, 2009 1:32 PM To: mysql@lists.mysql.com Subject: query optimization question (my struggle against 'using temporary; using filesort') Hi, I hope this is the right place to ask a question about query optimization. Background: I have a database which has events, which occur in places (listings). Places have addresses, and addresses belong to a city. I can select the latest event within a particular city very efficiently (less than 2ms), but selecting the latest listing within a city is REALLY slow (10-20 seconds) despite being almost a subset of the event query. I have been working on this for about a day, and have tried all sorts of tweaks to the indexes but to no avail. I always seem to end up with 'using temporary; using filesort' as the 'extra' content in the explain result. If anyone has a suggestion for what I might do to fix this, I'd really appreciate it. If not, I could further de-normalize the database for performance reasons, but I would feel dirty for doing so. Here is the fast query (select the latest event within a particular city), and its EXPLAIN. 
SELECT events.*, listings.*, addresses.* FROM `events` LEFT OUTER JOIN `listings` ON `listings`.id = `events`.listing_id LEFT OUTER JOIN `addresses` ON `addresses`.addressable_id = `listings`.id AND `addresses`.addressable_type = 'Listing' WHERE (addresses.parent_city_id = 3) ORDER BY events.id DESC LIMIT 1 ++-+---++++-+--+--+-+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | ++-+---++++-+--+--+-+ | 1 | SIMPLE | events| index | index_events_on_listing_id | PRIMARY | 4 | NULL |1 | | | 1 | SIMPLE | listings | eq_ref |
Re: Query Optimization
The index hint is not in production code.. I was trying to force it to use the index even when using the OR clause.. meant to take that out before I sent the email. The table structure is: CREATE TABLE `customer` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT, `ssn` varchar(32) DEFAULT NULL, PRIMARY KEY (`id`), KEY `idx_ssn` (`ssn`) ) ENGINE=InnoDB CREATE TABLE `customer_id` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT, `customer_id` int(10) unsigned DEFAULT NULL, `id_num` varchar(32) DEFAULT NULL, PRIMARY KEY (`id`), KEY `customer_key` (`customer_id`), KEY `id_id_num` (`id_num`) ) ENGINE=InnoDB The explain output of the query using the OR clause: *** 1. row *** id: 1 select_type: SIMPLE table: customer type: ALL possible_keys: idx_ssn key: NULL key_len: NULL ref: NULL rows: 176680 Extra: Using where; Using temporary *** 2. row *** id: 1 select_type: SIMPLE table: customer_id type: ref possible_keys: customer_key key: customer_key key_len: 5 ref: aca_ecash.customer.id rows: 1 Extra: Using where; Distinct 2 rows in set (0.00 sec) Using a UNION results in: *** 1. row *** id: 1 select_type: PRIMARY table: customer type: range possible_keys: idx_ssn key: idx_ssn key_len: 35 ref: NULL rows: 1 Extra: Using where; Using temporary *** 2. row *** id: 1 select_type: PRIMARY table: customer_id type: ref possible_keys: customer_key key: customer_key key_len: 5 ref: aca_ecash.customer.id rows: 1 Extra: Using index; Distinct *** 3. row *** id: 2 select_type: UNION table: customer_id type: range possible_keys: customer_key,id_id_num key: id_id_num key_len: 35 ref: NULL rows: 1 Extra: Using where; Using temporary *** 4. row *** id: 2 select_type: UNION table: customer type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: aca_ecash.customer_id.customer_id rows: 1 Extra: Using where *** 5. 
row *** id: NULL select_type: UNION RESULT table: union1,2 type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: NULL Extra: 5 rows in set (0.01 sec) The union is much faster. I've tested the same search for ID numbers on our test system (Windows 32-Bit, 2GB ram, P4 3Ghz) against the production system (RHEL 64-Bit 16GB ram, Dual Xeon 2Ghz).. the search in the test system is almost instant, while on the production system it's taking 4 to 6 seconds. There's not much traffic today on it. I'm going to put the UNION into production and see how it goes. Thanks for the replies. -johnny On Tue, Jan 13, 2009 at 7:39 PM, Andrew Garner andrew.b.gar...@gmail.comwrote: On Tue, Jan 13, 2009 at 7:07 PM, Baron Schwartz ba...@xaprb.com wrote: If you have separate indexes on ssn and id_num, MySQL may be able to efficiently use an index merge optimization . See http://dev.mysql.com/doc/refman/5.0/en/index-merge-optimization.html. This is only in 5.0+ - on older versions of MySQL you may find a union more efficient. And in newer versions, too. The optimizer frequently underestimates the cost of the merge operation and the required random I/O for row lookups. So, yes it can use an index merge, but... efficiency is another question. I've seen table scans outperform a two-way index merge by orders of magnitude. These appeared to be high selectivity indexes, but perhaps I assumed too much. :) -- - Johnny Withers 601.209.4985 joh...@pixelated.net
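The rewrite Johnny settled on — replacing an OR that spans two tables with a UNION of two independently-indexed lookups — can be sketched as follows. Python/SQLite stands in for MySQL here, and the sample data is invented; the check simply confirms the two forms return the same customer ids, which is the correctness half of the rewrite (the speed half is that each UNION arm uses its own index instead of scanning the join):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, ssn TEXT);
    CREATE INDEX idx_ssn ON customer (ssn);
    CREATE TABLE customer_id (
        id INTEGER PRIMARY KEY, customer_id INT, id_num TEXT);
    CREATE INDEX id_id_num ON customer_id (id_num);
    -- Customer 1 matches on ssn; customer 2 matches on an id_num.
    INSERT INTO customer VALUES (1, '123456789'), (2, '999999999');
    INSERT INTO customer_id VALUES (1, 2, '123456789'), (2, 1, '555');
""")
or_form = conn.execute("""
    SELECT DISTINCT customer.id FROM customer
      LEFT JOIN customer_id ON customer.id = customer_id.customer_id
     WHERE ssn = '123456789' OR id_num = '123456789'
     ORDER BY customer.id
""").fetchall()
union_form = conn.execute("""
    SELECT id FROM customer WHERE ssn = '123456789'
    UNION
    SELECT customer.id FROM customer
      JOIN customer_id ON customer.id = customer_id.customer_id
     WHERE id_num = '123456789'
    ORDER BY id
""").fetchall()
print(or_form, union_form)  # both find customers 1 and 2
```

Note the second arm can use an inner JOIN: a row only qualifies when a matching customer_id row exists, so the LEFT JOIN was never doing anything for that branch.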
Re: Query Optimization
Try a union instead of an or condition. http://dev.mysql.com/doc/refman/5.0/en/union.html Johnny Withers wrote: I have the following tables: Customer: id,ssn Customer_Id: id,customer_id,id_num The customer table holds customers along with their SSN and the customer_id table holds identifications for each customer (Driver's License, State Issued ID, Student ID, etc). The SSN column from the customer table is VARCHAR(9) and the id_num column from the customer_id table is VARCHAR(32). Both of these columns have an index on them. The following query uses the index on customer.ssn and executes in 0ms: SELECT SQL_NO_CACHE customer.id,customer.ssn,customer_id,id_num FROM customer USE INDEX(idx_ssn) LEFT JOIN customer_id ON customer.id=customer_id.customer_id WHERE ssn='123456789'; Explain output: *** 1. row *** id: 1 select_type: SIMPLE table: customer type: ref possible_keys: idx_ssn key: idx_ssn key_len: 35 ref: const rows: 1 Extra: Using where; Using index *** 2. row *** id: 1 select_type: SIMPLE table: customer_id type: ref possible_keys: customer_key key: customer_key key_len: 5 ref: aca_ecash.customer.id rows: 1 Extra: Now, this is the query I have trouble with, it does not use the index (or says it does but doesn't?) and on a busy system (200+ queries per sec) can take up to 20 seconds or more to execute: SELECT SQL_NO_CACHE customer.id,customer.ssn,customer_id,id_num FROM customer USE INDEX(idx_ssn) LEFT JOIN customer_id ON customer.id=customer_id.customer_id WHERE ssn='123456789' OR id_num='123456789'; Explain output: *** 1. row *** id: 1 select_type: SIMPLE table: customer type: index possible_keys: idx_ssn key: idx_ssn key_len: 35 ref: NULL rows: 165843 Extra: Using index *** 2. row *** id: 1 select_type: SIMPLE table: customer_id type: ref possible_keys: customer_key key: customer_key key_len: 5 ref: aca_ecash.customer.id rows: 1 Extra: Using where Is there some way I can make it use the index? 
I've thought about redesigning the query to select from the customer_id table first, if a row is found.. just return the matching customer_id from the customer table.. but I wanted to see if maybe i'm going about this the wrong way before I engineer some way around this. Thanks in advance, - Johnny Withers 601.209.4985 joh...@pixelated.net -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql?unsub=arch...@jab.org
Re: Query Optimization
Do you have an index on id_num? What sort of explain output do you get when you don't use a query hint? Your USE INDEX hint may be causing MySQL to ignore a better strategy. If you have separate indexes on ssn and id_num, MySQL may be able to efficiently use an index merge optimization . See http://dev.mysql.com/doc/refman/5.0/en/index-merge-optimization.html. This is only in 5.0+ - on older versions of MySQL you may find a union more efficient. On Mon, Jan 12, 2009 at 9:43 AM, Johnny Withers joh...@pixelated.net wrote: I have the following tables: Customer: id,ssn Customer_Id: id,customer_id,id_num The customer table holds customers along with their SSN and the customer_id table holds identifications for each customer (Driver's License, State Issued ID, Student ID, etc). The SSN column from the customer table is VARCHAR(9) and the id_num column from the customer_id table is VARCHAR(32). Both of these columns have an index on them. The following query uses the index on customer.ssn and executes in 0ms: SELECT SQL_NO_CACHE customer.id,customer.ssn,customer_id,id_num FROM customer USE INDEX(idx_ssn) LEFT JOIN customer_id ON customer.id=customer_id.customer_id WHERE ssn='123456789'; Explain output: *** 1. row *** id: 1 select_type: SIMPLE table: customer type: ref possible_keys: idx_ssn key: idx_ssn key_len: 35 ref: const rows: 1 Extra: Using where; Using index *** 2. row *** id: 1 select_type: SIMPLE table: customer_id type: ref possible_keys: customer_key key: customer_key key_len: 5 ref: aca_ecash.customer.id rows: 1 Extra: Now, this is the query I have trouble with, it does not use the index (or says it does but doesn't?) and on a busy system (200+ queries per sec) can take up to 20 seconds or more to execute: SELECT SQL_NO_CACHE customer.id,customer.ssn,customer_id,id_num FROM customer USE INDEX(idx_ssn) LEFT JOIN customer_id ON customer.id=customer_id.customer_id WHERE ssn='123456789' OR id_num='123456789'; Explain output: *** 1. 
row ***
           id: 1
  select_type: SIMPLE
        table: customer
         type: index
possible_keys: idx_ssn
          key: idx_ssn
      key_len: 35
          ref: NULL
         rows: 165843
        Extra: Using index
*** 2. row ***
           id: 1
  select_type: SIMPLE
        table: customer_id
         type: ref
possible_keys: customer_key
          key: customer_key
      key_len: 5
          ref: aca_ecash.customer.id
         rows: 1
        Extra: Using where

Is there some way I can make it use the index? I've thought about redesigning the query to select from the customer_id table first; if a row is found, just return the matching customer_id from the customer table. But I wanted to see if maybe I'm going about this the wrong way before I engineer some way around it. Thanks in advance, - Johnny Withers 601.209.4985 joh...@pixelated.net -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql?unsub=arch...@jab.org
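For pre-5.0 servers (or when the index merge plan is rejected), the OR across the two differently indexed columns can be rewritten as a union of two single-index lookups, as suggested above. A sketch against the tables described in the post — the exact rewrite is an assumption, not Johnny's final query:

```sql
-- Each branch can use one index (idx_ssn on customer, the id_num index on
-- customer_id); UNION removes customers matched by both predicates.
SELECT customer.id, customer.ssn, customer_id.customer_id, customer_id.id_num
FROM customer
LEFT JOIN customer_id ON customer.id = customer_id.customer_id
WHERE customer.ssn = '123456789'
UNION
SELECT customer.id, customer.ssn, customer_id.customer_id, customer_id.id_num
FROM customer
JOIN customer_id ON customer.id = customer_id.customer_id
WHERE customer_id.id_num = '123456789';
```

UNION (not UNION ALL) is deliberate here: a customer whose ssn and id_num both match would otherwise appear twice.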
Re: Query Optimization
If you have separate indexes on ssn and id_num, MySQL may be able to efficiently use an index merge optimization. See http://dev.mysql.com/doc/refman/5.0/en/index-merge-optimization.html. This is only in 5.0+ - on older versions of MySQL you may find a union more efficient. And in newer versions, too. The optimizer frequently underestimates the cost of the merge operation and the required random I/O for row lookups. So, yes, it can use an index merge, but... efficiency is another question. I've seen table scans outperform a two-way index merge by orders of magnitude.
Re: Query Optimization
On Tue, Jan 13, 2009 at 7:07 PM, Baron Schwartz ba...@xaprb.com wrote: If you have separate indexes on ssn and id_num, MySQL may be able to efficiently use an index merge optimization. See http://dev.mysql.com/doc/refman/5.0/en/index-merge-optimization.html. This is only in 5.0+ - on older versions of MySQL you may find a union more efficient. And in newer versions, too. The optimizer frequently underestimates the cost of the merge operation and the required random I/O for row lookups. So, yes, it can use an index merge, but... efficiency is another question. I've seen table scans outperform a two-way index merge by orders of magnitude. These appeared to be high-selectivity indexes, but perhaps I assumed too much. :)
Re: Query optimization help
First, you might want to move the WHERE ... t3.int_a = some integer condition into the join condition for t3. You're not using anything from t4, so I'm not sure why you have that table in your query. You can suggest or force MySQL to use an index if it's using the wrong one: http://dev.mysql.com/doc/refman/5.0/en/index-hints.html. On very rare occasions I've had to do this. What's happening is that MySQL is analyzing the information it has about the data and indexes and coming to the wrong conclusion, perhaps even opting for an entire table scan. You can run ANALYZE TABLE to force MySQL to update the information it has about the data. This may actually solve your problem. Try SHOW INDEX FROM t1 to see what data MySQL has about the indexes. Sometimes the CARDINALITY (uniqueness) column will be NULL, which can indicate a problem. Posting the result of your EXPLAIN will actually be helpful. Hope that helps. Brent Baisley

On Aug 11, 2008, at 8:26 AM, Jonathan Terhorst wrote:

I have this query:

SELECT DISTINCT t1.string FROM t1
LEFT JOIN t2 ON t1.string=t2.string
LEFT JOIN t3 ON t1.int_a=t3.int_a
LEFT JOIN t4 ON t1.int_b=t4.int_b
WHERE t1.string != '' AND t2.string IS NULL AND t3.int_a = some integer
ORDER BY t1.string ASC

This query is executing slower than it should. EXPLAIN has it using temporary and using filesort. I have indexes on every column in the query, but I think the problem is the one-index-per-table limitation. According to EXPLAIN, there are two possible indices in use for t1 (int_a and string), but only int_a is being used. So I tried constructing a compound index on int_a and string. Although this new index appears in possible_keys, EXPLAIN still shows the key actually being used as int_a. I tried building the compound key in both orders and had the same results. How do I get MySQL to use all possible keys on t1 when running the query? Thanks!
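Brent's first suggestion might look like the sketch below. One subtlety he doesn't spell out: with the t3 filter in the WHERE clause, the LEFT JOIN on t3 already behaves like an inner join, so writing it as an INNER JOIN states the intent directly. The literal 42 merely stands in for "some integer", and dropping t4 assumes it really is unused:

```sql
SELECT DISTINCT t1.string
FROM t1
LEFT JOIN t2 ON t1.string = t2.string   -- anti-join: keep t1 rows with no t2 match
INNER JOIN t3 ON t1.int_a = t3.int_a
             AND t3.int_a = 42          -- filter moved into the join condition
WHERE t1.string != '' AND t2.string IS NULL
ORDER BY t1.string ASC;
```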
Re: Query optimization
-What I'm trying to do: Bit hard to explain. I've got a table consisting of ip addresses (ipv4_src), destination addresses (ipv4_dst), and port numbers (port_dst) and some other irrelevant columns. Ultimately my goal is to find a linear order in a subset of ports. For example, host A connects to B on port 1, 2, 3, 4, ... 20; I would like to find this linear relation. To achieve this, I have written some mathematical formulas. Unfortunately there is one downside to my formula: it can be fooled by adding some extreme values. It cannot find a linear relation in this list: 1, 2, 3, 4, ... 20, 45000. Although there are 20 numbers lined up, the 45000 ruins the result of the algorithm. So the query I've submitted is meant to remove extreme values. The methods boxplot(pF.port_dst,LOW) and boxplot(pF.port_dst,HIGH) calculate in linear time the allowed range of numbers. Extreme values won't be included in this range. So in the example, the range would be [1,20], thereby omitting the value 45000. Finally I would like to filter my table of port numbers for every ipv4_src/ipv4_dst tuple and remove all the numbers not fitting in the range.

-In human readable pseudo code this is the query:

SELECT source,dest,port,octets FROM
(
  SELECT source,dest,boxplot(port,LOW) AS low,boxplot(port,HIGH) AS high
  FROM -- Calculate the LOW and HIGH values for each source,dest pair.
  (
    SELECT source,dest,port,octets FROM ...
    GROUP BY source,dest,port -- This removes the duplicate entries.
  ) pF
  GROUP BY source,dest
) boxplot,
(
  SELECT source,dest,port,octets FROM ...
  GROUP BY source,dest,port -- This removes the duplicate entries (again!).
) filter
WHERE filter.source=boxplot.source AND filter.dest=boxplot.dest
AND filter.port>=boxplot.low AND filter.port<=boxplot.high
-- Relate the tables 'boxplot' and 'filter' to each other AND select only the source,dest,port tuples where port is in the range [LOW,HIGH] from the filter table.
-Here is the original query I would like to optimize again:

SELECT filter.ipv4_src, filter.ipv4_dst, filter.port_dst, filter.octets
FROM
(
  SELECT pF.ipv4_src, pF.ipv4_dst,
         boxplot(pF.port_dst,LOW) AS low, boxplot(pF.port_dst,HIGH) AS high
  FROM
  (
    SELECT ipv4_src, ipv4_dst, port_dst, octets FROM ...
    GROUP BY ipv4_src, ipv4_dst, port_dst
  ) pF
  GROUP BY pF.ipv4_src, pF.ipv4_dst
) boxplot,
(
  SELECT ipv4_src, ipv4_dst, port_dst, octets FROM ...
  GROUP BY ipv4_src, ipv4_dst, port_dst
) filter
WHERE filter.ipv4_src=boxplot.ipv4_src AND filter.ipv4_dst=boxplot.ipv4_dst
AND filter.port_dst>=boxplot.low AND filter.port_dst<=boxplot.high

I'll try the 'EXPLAIN' query tonight, but for that I've got to create a small database first to speed up the results. Thanks in advance for your help.

On Jan 22, 2008 8:15 AM, Sebastian Mendel [EMAIL PROTECTED] wrote: Joris Kinable schrieb: Optimize query I've got one query, which I would like to improve a lot since it takes very long (24 hours) to execute. Here is the idea: 1. Take the table columns ipv4_src, ipv4_dst, port_dst (other columns in this table are not mentioned for clarity) and remove all duplicate tuples. This is done by subquery 'filter'. 2. The same query is performed by the boxplot query, but this time an additional GROUP BY is executed, thereby calculating a user-defined function boxplot(row,type) which returns a double value. 3. Finally the results of the query in step 2 are used to select a subset of results from the 'filter' table. 4. As you can see, the subquery 'pF' used in step 2 is identical to the query 'filter'. It's an extreme waste to calculate the same table twice. I've tried to create a temporary table from filter, but unfortunately MySQL doesn't allow you to access a temporary table twice in the same query. I prefer a 1-query answer, instead of creating views or temporary tables. Is there a way to improve this query, thereby improving the execution time?
Query:

SELECT filter.ipv4_src, filter.ipv4_dst, filter.port_dst, filter.octets
FROM
(
  SELECT pF.ipv4_src, pF.ipv4_dst,
         boxplot(pF.port_dst,LOW) AS low, boxplot(pF.port_dst,HIGH) AS high
  FROM
  (
    SELECT ipv4_src, ipv4_dst, port_dst, octets FROM mytable WHERE prot=6
    GROUP BY ipv4_src, ipv4_dst, port_dst
    ORDER BY ipv4_src, ipv4_dst, port_dst ASC
  ) pF
  GROUP BY pF.ipv4_src, pF.ipv4_dst
  HAVING COUNT(pF.port_dst) > 10
) boxplot,
(
  SELECT ipv4_src, ipv4_dst, port_dst, octets FROM mytable WHERE prot=6
  GROUP BY ipv4_src, ipv4_dst, port_dst
  ORDER BY ipv4_src, ipv4_dst, port_dst ASC
) filter
WHERE filter.ipv4_src=boxplot.ipv4_src AND filter.ipv4_dst=boxplot.ipv4_dst
AND filter.port_dst>=boxplot.low AND filter.port_dst<=boxplot.high

What are you trying to do? And how about formatting your query in a human-readable way? Did you try EXPLAIN? What type of syntax is this: boxplot(pF.port_dst,LOW)? -- Sebastian --
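Joris's "compute the same derived table twice" problem has a direct answer on later servers: from MySQL 8.0, a common table expression lets one query name the deduplicated subquery once and reference it twice. A hedged sketch — boxplot() is the poster's own UDF, so its argument syntax here is an assumption:

```sql
WITH filter AS (
  SELECT ipv4_src, ipv4_dst, port_dst, octets
  FROM mytable
  WHERE prot = 6
  GROUP BY ipv4_src, ipv4_dst, port_dst          -- dedup, computed once
),
bp AS (
  SELECT ipv4_src, ipv4_dst,
         boxplot(port_dst, LOW)  AS low,
         boxplot(port_dst, HIGH) AS high
  FROM filter
  GROUP BY ipv4_src, ipv4_dst
  HAVING COUNT(port_dst) > 10
)
SELECT f.ipv4_src, f.ipv4_dst, f.port_dst, f.octets
FROM filter AS f
JOIN bp ON f.ipv4_src = bp.ipv4_src AND f.ipv4_dst = bp.ipv4_dst
WHERE f.port_dst BETWEEN bp.low AND bp.high;
```

At the time of this thread (MySQL 5.x), the practical choices were a regular scratch table or simply accepting the duplicated subquery.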
Re: Query optimization
Joris Kinable schrieb: Optimize query I've got one query, which I would like to improve a lot since it takes very long (24 hours) to execute. Here is the idea: 1. Take the table columns ipv4_src, ipv4_dst, port_dst (other columns in this table are not mentioned for clarity) and remove all duplicate tuples. This is done by subquery 'filter'. 2. The same query is performed by the boxplot query, but this time an additional GROUP BY is executed, thereby calculating a user-defined function boxplot(row,type) which returns a double value. 3. Finally the results of the query in step 2 are used to select a subset of results from the 'filter' table. 4. As you can see, the subquery 'pF' used in step 2 is identical to the query 'filter'. It's an extreme waste to calculate the same table twice. I've tried to create a temporary table from filter, but unfortunately MySQL doesn't allow you to access a temporary table twice in the same query. I prefer a 1-query answer, instead of creating views or temporary tables. Is there a way to improve this query, thereby improving the execution time?

Query:

SELECT filter.ipv4_src, filter.ipv4_dst, filter.port_dst, filter.octets
FROM
(
  SELECT pF.ipv4_src, pF.ipv4_dst,
         boxplot(pF.port_dst,LOW) AS low, boxplot(pF.port_dst,HIGH) AS high
  FROM
  (
    SELECT ipv4_src, ipv4_dst, port_dst, octets FROM mytable WHERE prot=6
    GROUP BY ipv4_src, ipv4_dst, port_dst
    ORDER BY ipv4_src, ipv4_dst, port_dst ASC
  ) pF
  GROUP BY pF.ipv4_src, pF.ipv4_dst
  HAVING COUNT(pF.port_dst) > 10
) boxplot,
(
  SELECT ipv4_src, ipv4_dst, port_dst, octets FROM mytable WHERE prot=6
  GROUP BY ipv4_src, ipv4_dst, port_dst
  ORDER BY ipv4_src, ipv4_dst, port_dst ASC
) filter
WHERE filter.ipv4_src=boxplot.ipv4_src AND filter.ipv4_dst=boxplot.ipv4_dst
AND filter.port_dst>=boxplot.low AND filter.port_dst<=boxplot.high

What are you trying to do? And how about formatting your query in a human-readable way? Did you try EXPLAIN? What type of syntax is this: boxplot(pF.port_dst,LOW)?
-- Sebastian
Re: Query Optimization Question
[EMAIL PROTECTED] wrote: Yes, a ranged query should respond faster than a negation. In some cases you can seriously improve query performance for a negation query if you split it into two range queries unioned together. Here is a pseudo example: This query should be slow due to the table scan it takes to test the condition:

SELECT ... FROM ... WHERE indexfield != 16

This query will use the index if the number of rows with indexfield != 16 is small enough (less than about 30%). This query should be noticeably faster due to using ranged queries (partial index ranged matches):

( SELECT ... FROM ... WHERE indexfield < 16 )
UNION
( SELECT ... FROM ... WHERE indexfield > 16 )

At the very least, you'll want to make that UNION ALL, rather than just UNION, so MySQL doesn't waste time looking for duplicates to remove. But even then, this may be slower. First, these two unioned queries may still not use the index. If the number of rows with indexfield < 16 is too large (more than about 30%) the first query will be a table scan. If the number of rows with indexfield > 16 is too large (more than about 30%) the second query will be a table scan. In fact, if the number of rows with indexfield = 16 is less than about 40% of the table, then at least one of the two unioned queries is guaranteed to be a table scan. Worse yet, this query stands a good chance of being twice as long as the single != query, because it may require 2 table scans! Second, even if both unioned queries use the index, the result still may not be faster. If the combined number of matching rows is too high, the full table scan should be faster than the indexed lookups. For example:

SELECT cat, COUNT(*) FROM inits GROUP BY cat;
+------+----------+
| cat  | COUNT(*) |
+------+----------+
|    0 |     5743 |
|    1 |     3792 |
|    2 |    30727 |
|    3 |     1926 |
|    4 |     7812 |
+------+----------+

19273 rows (38.55%) match cat != 2, with roughly half (~19%) on either side. First, the != case:

EXPLAIN SELECT * FROM inits WHERE cat != 2 \G
*** 1.
row ***
           id: 1
  select_type: SIMPLE
        table: inits
         type: ALL
possible_keys: cat_idx
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 5
        Extra: Using where

As expected, a full table scan.

SELECT * FROM inits WHERE cat != 2;
...
19273 rows in set (0.37 sec)

Now, the unioned range queries:

EXPLAIN SELECT * FROM inits WHERE cat < 2 UNION ALL SELECT * FROM inits WHERE cat > 2 \G
*** 1. row ***
           id: 1
  select_type: PRIMARY
        table: inits
         type: range
possible_keys: cat_idx
          key: cat_idx
      key_len: 5
          ref: NULL
         rows: 5680
        Extra: Using where
*** 2. row ***
           id: 2
  select_type: UNION
        table: inits
         type: range
possible_keys: cat_idx
          key: cat_idx
      key_len: 5
          ref: NULL
         rows: 6543
        Extra: Using where
*** 3. row ***
           id: NULL
  select_type: UNION RESULT
        table: union1,2
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra:

As hoped, MySQL plans to use the index for each query.

SELECT * FROM inits WHERE cat < 2 UNION ALL SELECT * FROM inits WHERE cat > 2;
...
19273 rows in set (0.78 sec)

Despite (because of) using the index, this takes more than twice as long! Of course, if you have to do a table scan ANYWAY (because a value you have in a constraint is not in an index) then this won't help. This optimization is all about getting the engine to use an index whenever it can instead of performing a table scan. Of course, that is not always possible. Even when using an index is possible, it is not always desirable. I'd suggest not trying to outfox the optimizer until you've first determined it is making bad choices, and then test to make sure the solution is actually an improvement. Michael
Re: Query Optimization Question
Robert DiFalco [EMAIL PROTECTED] wrote on 03/11/2006 12:43:43 PM: In a previous database engine I was using, an IN was more optimal than a <>. So, for example: SELECT * FROM table WHERE table.type IN (1,2,3); where the possible values of type are 0-3, was appreciably faster than: SELECT * FROM table WHERE table.type <> 0; I've been playing with the Query Browser and checking out the optimization documents and haven't been able to make a clear call on whether or not this is also the case with MySQL/InnoDB. TIA, R. YES, YES, YES! This is definitely an optimization. When you say IN or =, you are asking for matching values. Matches can come from indexes. When you say <> or NOT IN, you are asking for everything BUT matches. In order to evaluate a negative, the database engine (and this is usually true regardless of database server) almost always performs a full table scan to test every row to make sure it is either <> or NOT IN. At the very best, they have to perform a full index scan, which is still less efficient than ranged or values-based lookups. It's when you get into the situation where you are matching against dozens of IN-clause items that you may run into slowdowns again. Until you reach 2 or 3 dozen terms (depending on your hardware) you should be faster with an IN comparison than a <> or a NOT IN comparison. An optimization to search for everything BUT a term or two is to create a temporary table of all of your terms and delete the exact ones you want to exclude. Put an index on your temp table, then JOIN that back into your query again (replacing the huge IN clause). The database will match index to index and things will get fast again. This technique can scale up to some really big queries. Always try to code for the affirmative tests. Your users will thank you. Shawn Green Database Administrator Unimin Corporation - Spruce Pine
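Shawn's temporary-table trick, sketched out. The table and column names here are illustrative assumptions, not from the thread:

```sql
-- Goal: everything BUT type 0, without a negation predicate.
CREATE TEMPORARY TABLE wanted_types (type INT NOT NULL PRIMARY KEY);
INSERT INTO wanted_types SELECT DISTINCT type FROM t;   -- all terms...
DELETE FROM wanted_types WHERE type = 0;                -- ...minus the excluded ones

-- The join matches index to index instead of scanning for non-matches.
SELECT t.* FROM t JOIN wanted_types w ON t.type = w.type;

DROP TEMPORARY TABLE wanted_types;
```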
RE: Query Optimization Question
Shawn, Any performance gains for specifying type > 0 rather than type <> 0? R. From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Monday, March 13, 2006 6:37 AM To: Robert DiFalco Cc: mysql@lists.mysql.com Subject: Re: Query Optimization Question Robert DiFalco [EMAIL PROTECTED] wrote on 03/11/2006 12:43:43 PM: In a previous database engine I was using, an IN was more optimal than a <>. So, for example: SELECT * FROM table WHERE table.type IN (1,2,3); where the possible values of type are 0-3, was appreciably faster than: SELECT * FROM table WHERE table.type <> 0; I've been playing with the Query Browser and checking out the optimization documents and haven't been able to make a clear call on whether or not this is also the case with MySQL/InnoDB. TIA, R. YES, YES, YES! This is definitely an optimization. When you say IN or =, you are asking for matching values. Matches can come from indexes. When you say <> or NOT IN, you are asking for everything BUT matches. In order to evaluate a negative, the database engine (and this is usually true regardless of database server) almost always performs a full table scan to test every row to make sure it is either <> or NOT IN. At the very best, they have to perform a full index scan, which is still less efficient than ranged or values-based lookups. It's when you get into the situation where you are matching against dozens of IN-clause items that you may run into slowdowns again. Until you reach 2 or 3 dozen terms (depending on your hardware) you should be faster with an IN comparison than a <> or a NOT IN comparison. An optimization to search for everything BUT a term or two is to create a temporary table of all of your terms and delete the exact ones you want to exclude. Put an index on your temp table, then JOIN that back into your query again (replacing the huge IN clause). The database will match index to index and things will get fast again. This technique can scale up to some really big queries. Always try to code for the affirmative tests.
Your users will thank you. Shawn Green Database Administrator Unimin Corporation - Spruce Pine
RE: Query Optimization Question
Yes, a ranged query should respond faster than a negation. In some cases you can seriously improve query performance for a negation query if you split it into two range queries unioned together. Here is a pseudo example: This query should be slow due to the table scan it takes to test the condition:

SELECT ... FROM ... WHERE indexfield <> 16

This query should be noticeably faster due to using ranged queries (partial index ranged matches):

( SELECT ... FROM ... WHERE indexfield < 16 )
UNION
( SELECT ... FROM ... WHERE indexfield > 16 )

Of course, if you have to do a table scan ANYWAY (because a value you have in a constraint is not in an index) then this won't help. This optimization is all about getting the engine to use an index whenever it can instead of performing a table scan. Of course, that is not always possible. Shawn Green Database Administrator Unimin Corporation - Spruce Pine

Robert DiFalco [EMAIL PROTECTED] wrote on 03/13/2006 10:48:29 AM: Shawn, Any performance gains for specifying type > 0 rather than type <> 0? R. From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Monday, March 13, 2006 6:37 AM To: Robert DiFalco Cc: mysql@lists.mysql.com Subject: Re: Query Optimization Question Robert DiFalco [EMAIL PROTECTED] wrote on 03/11/2006 12:43:43 PM: In a previous database engine I was using, an IN was more optimal than a <>. So, for example: SELECT * FROM table WHERE table.type IN (1,2,3); where the possible values of type are 0-3, was appreciably faster than: SELECT * FROM table WHERE table.type <> 0; I've been playing with the Query Browser and checking out the optimization documents and haven't been able to make a clear call on whether or not this is also the case with MySQL/InnoDB. TIA, R. YES, YES, YES! This is definitely an optimization. When you say IN or =, you are asking for matching values. Matches can come from indexes. When you say <> or NOT IN, you are asking for everything BUT matches.
In order to evaluate a negative, the database engine (and this is usually true regardless of database server) almost always performs a full table scan to test every row to make sure it is either <> or NOT IN. At the very best, they have to perform a full index scan, which is still less efficient than ranged or values-based lookups. It's when you get into the situation where you are matching against dozens of IN-clause items that you may run into slowdowns again. Until you reach 2 or 3 dozen terms (depending on your hardware) you should be faster with an IN comparison than a <> or a NOT IN comparison. An optimization to search for everything BUT a term or two is to create a temporary table of all of your terms and delete the exact ones you want to exclude. Put an index on your temp table, then JOIN that back into your query again (replacing the huge IN clause). The database will match index to index and things will get fast again. This technique can scale up to some really big queries. Always try to code for the affirmative tests. Your users will thank you. Shawn Green Database Administrator Unimin Corporation - Spruce Pine
RE: Query Optimization Question
Interesting, that seems like an optimization the query optimizer could do itself when it sees a <> operator on an indexed numeric. From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Monday, March 13, 2006 8:01 AM To: Robert DiFalco Cc: mysql@lists.mysql.com Subject: RE: Query Optimization Question Yes, a ranged query should respond faster than a negation. In some cases you can seriously improve query performance for a negation query if you split it into two range queries unioned together. Here is a pseudo example: This query should be slow due to the table scan it takes to test the condition: SELECT ... FROM ... WHERE indexfield <> 16 This query should be noticeably faster due to using ranged queries (partial index ranged matches): ( SELECT ... FROM ... WHERE indexfield < 16 ) UNION ( SELECT ... FROM ... WHERE indexfield > 16 ) Of course, if you have to do a table scan ANYWAY (because a value you have in a constraint is not in an index) then this won't help. This optimization is all about getting the engine to use an index whenever it can instead of performing a table scan. Of course, that is not always possible. Shawn Green Database Administrator Unimin Corporation - Spruce Pine Robert DiFalco [EMAIL PROTECTED] wrote on 03/13/2006 10:48:29 AM: Shawn, Any performance gains for specifying type > 0 rather than type <> 0? R. From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Monday, March 13, 2006 6:37 AM To: Robert DiFalco Cc: mysql@lists.mysql.com Subject: Re: Query Optimization Question Robert DiFalco [EMAIL PROTECTED] wrote on 03/11/2006 12:43:43 PM: In a previous database engine I was using, an IN was more optimal than a <>.
So, for example: SELECT * FROM table WHERE table.type IN (1,2,3); where the possible values of type are 0-3, was appreciably faster than: SELECT * FROM table WHERE table.type <> 0; I've been playing with the Query Browser and checking out the optimization documents and haven't been able to make a clear call on whether or not this is also the case with MySQL/InnoDB. TIA, R. YES, YES, YES! This is definitely an optimization. When you say IN or =, you are asking for matching values. Matches can come from indexes. When you say <> or NOT IN, you are asking for everything BUT matches. In order to evaluate a negative, the database engine (and this is usually true regardless of database server) almost always performs a full table scan to test every row to make sure it is either <> or NOT IN. At the very best, they have to perform a full index scan, which is still less efficient than ranged or values-based lookups. It's when you get into the situation where you are matching against dozens of IN-clause items that you may run into slowdowns again. Until you reach 2 or 3 dozen terms (depending on your hardware) you should be faster with an IN comparison than a <> or a NOT IN comparison. An optimization to search for everything BUT a term or two is to create a temporary table of all of your terms and delete the exact ones you want to exclude. Put an index on your temp table, then JOIN that back into your query again (replacing the huge IN clause). The database will match index to index and things will get fast again. This technique can scale up to some really big queries. Always try to code for the affirmative tests. Your users will thank you. Shawn Green Database Administrator Unimin Corporation - Spruce Pine
Re: Query Optimization Question
Robert DiFalco wrote: In a previous database engine I was using, an IN was more optimal than a <>. So, for example: SELECT * FROM table WHERE table.type IN (1,2,3); where the possible values of type are 0-3, was appreciably faster than: SELECT * FROM table WHERE table.type <> 0; I've been playing with the Query Browser and checking out the optimization documents and haven't been able to make a clear call on whether or not this is also the case with MySQL/InnoDB. Mladen Adamovic wrote: IN should be faster implemented with both hash tables and BTREEs, so nowadays it should also be faster than <>, as all MySQL implementations AFAIK use those well-known data structures for indexes. [EMAIL PROTECTED] wrote: YES, YES, YES! This is definitely an optimization. When you say IN or =, you are asking for matching values. Matches can come from indexes. When you say <> or NOT IN, you are asking for everything BUT matches. In order to evaluate a negative, the database engine (and this is usually true regardless of database server) almost always performs a full table scan to test every row to make sure it is either <> or NOT IN. At the very best, they have to perform a full index scan, which is still less efficient than ranged or values-based lookups. It's when you get into the situation where you are matching against dozens of IN-clause items that you may run into slowdowns again. Until you reach 2 or 3 dozen terms (depending on your hardware) you should be faster with an IN comparison than a <> or a NOT IN comparison. An optimization to search for everything BUT a term or two is to create a temporary table of all of your terms and delete the exact ones you want to exclude. Put an index on your temp table then JOIN that back into your query again (replacing the huge IN clause). The database will match index to index and things will get fast again. This technique can scale up to some really big queries. Always try to code for the affirmative tests. Your users will thank you.
Implicit in Mladen and Shawn's answers, but never actually mentioned in the original post, is the presence of an index on the type column. This is probably obvious to all concerned, but I mention it for completeness: without an index on type, there is no difference between type IN (1,2,3) and type != 0. That is, the question is not whether IN is better than !=, but rather which will allow the optimizer to make good use of the index on type. I find mysql's optimizer is pretty good with well-written queries, as long as subqueries aren't involved, so my initial reaction was to expect no difference. After all, as the optimizer considers the WHERE conditions and the available indexes, it is certainly possible, at least theoretically, for it to notice that type IN (1,2,3) and type != 0 are identical conditions. That is, a clever optimizer could treat them identically. Shawn's and Mladen's answers gave me pause, however, and aroused my curiosity, so I decided to test:

SELECT VERSION();
+-----------+
| VERSION() |
+-----------+
| 4.1.15    |
+-----------+

SELECT cat, COUNT(*) FROM inits GROUP BY cat;
+------+----------+
| cat  | COUNT(*) |
+------+----------+
|    0 |    44224 |
|    1 |     1919 |
|    2 |     1931 |
|    3 |     1926 |
+------+----------+

mysql> EXPLAIN SELECT * FROM inits WHERE cat IN (1,2,3) \G
*** 1. row ***
           id: 1
  select_type: SIMPLE
        table: inits
         type: range
possible_keys: cat_idx
          key: cat_idx
      key_len: 5
          ref: NULL
         rows: 8117
        Extra: Using where

mysql> EXPLAIN SELECT * FROM inits WHERE cat != 0 \G
*** 1. row ***
           id: 1
  select_type: SIMPLE
        table: inits
         type: range
possible_keys: cat_idx
          key: cat_idx
      key_len: 5
          ref: NULL
         rows: 8120
        Extra: Using where

As you can see, the optimizer plans to use the index in both cases, examining 8117 rows in the IN case and 8120 rows in the != case, to get 5776 matching rows out of 50,000 (about 11.55%). On the other hand, it makes a difference how many rows will match. What is the distribution of values of type?
If the number of matching rows is more than about 30% of the table, the optimizer won't use an available index in any case. For example,

mysql> EXPLAIN SELECT * FROM inits WHERE cat IN (0,1,2) \G
*** 1. row ***
           id: 1
  select_type: SIMPLE
        table: inits
         type: ALL
possible_keys: cat_idx
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 5
        Extra: Using where

mysql> EXPLAIN SELECT * FROM inits WHERE cat != 3 \G
*** 1. row ***
           id: 1
  select_type: SIMPLE
        table:
Re: Query Optimization Question
Robert DiFalco wrote: In a previous database engine I was using, an IN was more optimal than a <>. So, for example: SELECT * FROM table WHERE table.type IN (1,2,3); where the possible values of type are 0-3, was appreciably faster than: SELECT * FROM table WHERE table.type <> 0; IN should be faster implemented with both hash tables and BTREEs, so nowadays it should also be faster than <>, as all MySQL implementations AFAIK use those well-known data structures for indexes. I've been playing with the Query Browser and checking out the optimization documents and haven't been able to make a clear call on whether or not this is also the case with MySQL/InnoDB. TIA, R. -- Mladen Adamovic http://home.blic.net/adamm http://www.shortopedia.com http://www.froola.com
Re: Query optimization
Hi Tripp, all! Tripp Bishop wrote: [[...]] Why would adding extra fields to the query slow the query down so much? In either case it looks like it's using the same index. If a query is to return only fields contained in the index, it can be executed as an index-only query, and the true data need not be accessed. The moment you ask for a field not contained in the index, then for each index entry matching the search condition the base data must be accessed to get that field. This can add very significant cost. HTH, Joerg PS: In my reader, it looked like your original question hijacked another thread: Question regarding running 'mysql' from a bash script. :-( -- Joerg Bruehe, Senior Production Engineer MySQL AB, www.mysql.com
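Joerg's index-only (covering) point can be sketched against Tripp's listing table. The index definition here is an illustrative assumption, not the thread's actual schema:

```sql
CREATE INDEX idx_grp_price ON listing (group_id, price);

-- Every referenced column is in the index: MySQL can answer from the
-- index alone (EXPLAIN shows "Using index"), never touching the row data.
SELECT price FROM listing WHERE group_id = 2;

-- sqft is not in the index: each matching index entry now costs an
-- extra lookup into the base table to fetch it.
SELECT price, sqft FROM listing WHERE group_id = 2;
```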
Re: Query optimization
It might be a good idea if you could post the results of an EXPLAIN SELECT ... for a fast query and a slow query along with their actual SQL statements. James Harvard At 11:37 am -0800 12/1/06, Tripp Bishop wrote: I've got a question regarding optimizing a query.
Re: Query optimization
Jim, here's an example query:

SELECT COUNT(listing.listing_id) AS listing_count
FROM listing_city_xref
INNER JOIN listing
   ON listing.listing_id = listing_city_xref.listing_id
  AND listing.group_id = listing_city_xref.group_id
WHERE listing_city_xref.group_id = 2
  AND listing_city_xref.city_id IN (83,88)
  AND listing.price BETWEEN 189000.00 AND 23.00
  AND tot_finished_sqft >= 1100
  AND sqft >= 1000
  AND baths >= 1
  AND bedrooms >= 3
  AND garage_spaces >= 1
  AND property_type IN ('RES');

I created an index on the following fields in the listing table: group_id, price, tot_finished_sqft, baths, bedrooms, garage_spaces, property_type. Here's the result of the explain on the query above:

| 1 | SIMPLE | listing | ref | PRIMARY,idx_search | idx_search | 4 | const | 8708 | Using where |
| 1 | SIMPLE | listing_wiz_city_xref | ref | PRIMARY,idx_listing,idx_wiz_city | PRIMARY | 36 | const,ahf_test.listing.listing_id | 1 | Using where; Using index |

This query took 12.91 seconds. Running it multiple times produces similar run times. Now, if I remove the sqft >= 1100 condition from the where clause I get the following explain output:

| 1 | SIMPLE | listing | range | PRIMARY,idx_search | idx_search | 15 | NULL | 8688 | Using where; Using index |
| 1 | SIMPLE | listing_wiz_city_xref | ref | PRIMARY,idx_listing,idx_wiz_city | PRIMARY | 36 | const,ahf_test.listing.listing_id | 1 | Using where; Using index |

The query executes in 0.09 seconds. Running it multiple times produces similar runtimes. One interesting difference between the explain outputs is that in the second case the ref is NULL. In the first scenario it is const. Why would adding extra fields to the query slow the query down so much? In either case it looks like it's using the same index. Thanks, Tripp
Re: Query optimization question
Gerald Taylor wrote: Query optimization question. I am selecting from a single table but it has a lot of rows and it has a very involved calculation. What I really want to do is FIRST restrict the number of rows so that the big calculation is only performed on the ones that are within 3 degrees. Using 4.0.20. A sample query is given here: The application interpolates variable values such as 44.6 into the query string, so from mysql's point of view they are constants, right? And the explain doc says it optimizes constants, but it is looking at all the rows and I see why.

SELECT city, state, country, latitude, longitude,
  IF(latitude REGEXP '[0-9]+$' AND longitude REGEXP '[0-9]+$',
     ROUND(DEGREES(ACOS(SIN(RADIANS(44.6)) * SIN(RADIANS(latitude))
           + COS(RADIANS(44.6)) * COS(RADIANS(latitude))
           * COS(RADIANS(-123.28 - longitude)))) * 111),
     NULL) AS distance
FROM londata
WHERE ABS(44.6 - latitude) <= 3.0
  AND ABS(-123.28 - longitude) <= 3.0
ORDER BY distance;

I guess I can't do a subselect with my version... If I could, what would it look like? Something like below? (I might be able to talk the powers that be into an upgrade.) And if I can't, is it more horrible to manually create a temporary table and perform the calculations on it than it is to just do what I am doing?

SELECT city, state, country, latitude, longitude,
  IF(latitude REGEXP '[0-9]+$' AND longitude REGEXP '[0-9]+$',
     ROUND(DEGREES(ACOS(SIN(RADIANS(44.6)) * SIN(RADIANS(latitude))
           + COS(RADIANS(44.6)) * COS(RADIANS(latitude))
           * COS(RADIANS(-123.28 - longitude)))) * 111),
     NULL) AS distance
FROM (SELECT * FROM londata
      WHERE ABS(44.6 - latitude) <= 3.0
        AND ABS(-123.28 - longitude) <= 3.0) AS sublon
ORDER BY distance;

Thanks. GT

A subselect won't help. You still have to scan the whole table to do the calculation for the WHERE clause. Pre-calculate your min and max lat. and lon.
WHERE latitude <= @maxlat AND latitude >= @minlat AND longitude <= @maxlon AND longitude >= @minlon
Re: Query optimization question
When you don't have subselects, you have two options: temporary tables or JOINed queries. In your case, I think the temporary table is the better way to go. I would also eliminate the ABS() check so that I can compare values directly against the index. I know the math is correct your way, but this way you are comparing values directly against the column, which means that indexes can come into play.

SET @targetLat = 44.6, @targetLon = -123.8, @Delta = 3;

CREATE TEMPORARY TABLE tmpDeltaData
SELECT city, state, country, latitude, longitude
FROM Londata
WHERE latitude BETWEEN (@targetLat - @Delta) AND (@targetLat + @Delta)
  AND longitude BETWEEN (@targetLon - @Delta) AND (@targetLon + @Delta);

SELECT city, state, country, latitude, longitude,
  IF(latitude REGEXP '[0-9\\.]+$' AND longitude REGEXP '[0-9\\.]+$',
     ROUND(DEGREES(ACOS(SIN(RADIANS(@targetLat)) * SIN(RADIANS(latitude))
           + COS(RADIANS(@targetLat)) * COS(RADIANS(latitude))
           * COS(RADIANS(@targetLon - longitude)))) * 111),
     NULL) AS distance
FROM tmpDeltaData
ORDER BY distance;

DROP TEMPORARY TABLE tmpDeltaData;

I would also test the WHERE clause from tmpDeltaData with your original query to compare speeds of the two methods. Shawn Green Database Administrator Unimin Corporation - Spruce Pine

Gerald Taylor [EMAIL PROTECTED] wrote on 10/04/2004 10:33:22 AM: Query optimization question. I am selecting from a single table but it has a lot of rows and it has a very involved calculation. What I really want to do is FIRST restrict the number of rows so that the big calculation is only performed on the ones that are within 3 degrees. Using 4.0.20. A sample query is given here: The application interpolates variable values such as 44.6 into the query string, so from mysql's point of view they are constants, right? And the explain doc says it optimizes constants, but it is looking at all the rows and I see why.
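Shawn's two-phase plan (cheap indexed bounding box first, expensive trig only on the survivors) can be sketched outside SQL too. A hedged Python sketch using the thread's constants and its 111 km-per-degree factor; the city rows are invented sample data:

```python
# Bounding-box pre-filter, then the spherical-law-of-cosines distance.
import math

target_lat, target_lon, delta = 44.6, -123.28, 3.0

cities = [
    ("Corvallis", 44.56, -123.26),
    ("Portland", 45.52, -122.68),
    ("Denver", 39.74, -104.99),   # well outside the 3-degree box
]

def distance_km(lat, lon):
    # Same formula as the thread's ACOS expression, times 111 km/degree.
    return math.degrees(math.acos(
        math.sin(math.radians(target_lat)) * math.sin(math.radians(lat))
        + math.cos(math.radians(target_lat)) * math.cos(math.radians(lat))
        * math.cos(math.radians(target_lon - lon)))) * 111

# The rectangular test stands in for the index-friendly WHERE clause.
in_box = [c for c in cities
          if abs(target_lat - c[1]) <= delta and abs(target_lon - c[2]) <= delta]
results = sorted((distance_km(lat, lon), name) for name, lat, lon in in_box)
print([name for _, name in results])
```

Denver never reaches the trig; that is the whole saving when the box test maps onto indexed latitude/longitude columns.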
Re: Query Optimization [sorted]
I'm trying to sort out a query that identifies images that are not in the story table (structures below). Peter Brawley has kindly pointed me in the right direction with the sql structure (which does work), but I'm getting benchmarks of 5+ seconds on the test data, whereas the live site has 50x the number of rows and I'm concerned about trying to go live with it. Went back to the manual and realised that the table wasn't properly indexed. Makes a *big* difference! Sorry to waste your time/bandwidth. Kris
Re: Query Optimization
At 21:55 -0300 3/7/04, Volnei Galbino wrote: Hi, Does anybody know where I can find information about query optimization in MySQL? Which techniques are used? Regards, Volnei Galbino Yes, there's a chapter on optimization in the MySQL Reference Manual. http://www.mysql.com/doc/en/MySQL_Optimisation.html -- Paul DuBois, MySQL Documentation Team Madison, Wisconsin, USA MySQL AB, www.mysql.com MySQL Users Conference: April 14-16, 2004 http://www.mysql.com/uc2004/
Re: Query optimization help
Chuck Gadd wrote: I've got a query that I can't seem to get optimized, so I'm hoping someone here can spot something I'm missing! Table has three columns: CoordID int unsigned, Zip_Lo char(9), Zip_Hi char(9). Table has 3 million records. Indexes: acg_lo (Zip_Lo), acg_hi (Zip_Hi), acg_combined (Zip_Lo, Zip_Hi). Here's the query:

select * from acg
where zip4_lo_pot <= '80128'
  and zip4_hi_pot >= '80128'

Explain shows: type: ALL, possible keys: acg_lo,acg_hi,acg_combined, rows: 3022309, extra: Using where. This kind of query cannot be efficiently optimized on a pre-4.1 version. With 4.1, if you are using MyISAM tables you could make (zip4_lo_pot,zip4_hi_pot) a spatial column with a spatial index. See http://www.mysql.com/doc/en/Spatial_extensions_in_MySQL.html -- Sasha Pachev Create online surveys at http://www.surveyz.com/
Re: Query optimization help
Without seeing the data, I am assuming that you are going over the 30% threshold with your less-than-or-equal/greater-than-or-equal WHERE clauses. What sort of criteria are you asking the database engine to search for? Original Message On 2/25/04, 9:44:02 PM, [EMAIL PROTECTED] wrote regarding Re: Query optimization help: Maybe i'm wrong here, someone correct me: if it's just ints you are gonna use, set the field types to bigint, it may search faster; you are doing a character search, and to get there quicker in a text-search scenario i'd suggest mysql4 and full-text searching with MATCH ... AGAINST. I've got a query that I can't seem to get optimized, so I'm hoping someone here can spot something I'm missing! Table has three columns: CoordID int unsigned, Zip_Lo char(9), Zip_Hi char(9). Table has 3 million records. Indexes: acg_lo (Zip_Lo), acg_hi (Zip_Hi), acg_combined (Zip_Lo, Zip_Hi). Here's the query: select * from acg where zip4_lo_pot <= '80128' and zip4_hi_pot >= '80128' Explain shows: type: ALL possible keys: acg_lo,acg_hi,acg_combined rows: 3022309 extra: Using where So, how can I optimize this?
Re: Query optimization help
Maybe i'm wrong here, someone correct me: if it's just ints you are gonna use, set the field types to bigint, it may search faster; you are doing a character search, and to get there quicker in a text-search scenario i'd suggest mysql4 and full-text searching with MATCH ... AGAINST. I've got a query that I can't seem to get optimized, so I'm hoping someone here can spot something I'm missing! Table has three columns: CoordID int unsigned, Zip_Lo char(9), Zip_Hi char(9). Table has 3 million records. Indexes: acg_lo (Zip_Lo), acg_hi (Zip_Hi), acg_combined (Zip_Lo, Zip_Hi). Here's the query: select * from acg where zip4_lo_pot <= '80128' and zip4_hi_pot >= '80128' Explain shows: type: ALL possible keys: acg_lo,acg_hi,acg_combined rows: 3022309 extra: Using where So, how can I optimize this?
Re: query optimization with CONCAT query
Mysql 4.0.1+ supports searches IN BOOLEAN MODE, so if mysql 4 is an option, I think that would work for you. http://www.mysql.com/doc/en/Fulltext_Search.html olinux --- Andy Ingham [EMAIL PROTECTED] wrote: Folks -- We have built a table with bibliographic information that contains a field for initial articles and for titles (sans initial articles). Some sample data (with all other fields suppressed):

+---------+---------------------------------------------------------+
| article | title                                                   |
+---------+---------------------------------------------------------+
| The     | '60s (mini)                                             |
|         | À Nous la Liberté [AKA: Freedom for Us; Liberty for Us] |
|         | Full Speed [AKA: À toute vitesse]                       |
| El      | Ángel exterminador [AKA: The Exterminating Angel]       |
| The     | Farm: Angola, USA [AKA: The Farm]                       |
+---------+---------------------------------------------------------+

For searching, we want to check article (+) title. So, we've used CONCAT, as shown below from an example query from our slow query log:

SELECT FILM.length, FILM.film_id, FILM.article, FILM.title,
       FILM.release_date, FILM.description, FILM.displayFlag
FROM (FILM LEFT JOIN FILM_DIRECTOR ON FILM.film_id = FILM_DIRECTOR.film_id)
LEFT JOIN DIRECTOR ON FILM_DIRECTOR.director_id = DIRECTOR.director_id
WHERE FILM.displayFlag = 'yes'
  AND (CONCAT(article,title) LIKE '%paris is burning%')
ORDER BY title, release_date;

I am under the impression (from testing and reading previous posts) that it is not possible to construct an index on the CONCATENATION of two fields. A multi-column index on the article and title fields only helps if the query is like: ... article LIKE '%something%' and title LIKE '%something%' ... which doesn't help us. Likewise, a FULLTEXT index only finds a string that is *fully contained* within any of the fields defined in that index. Any ideas of how to address this issue?
TIA for any insights, Andy. Andy Ingham Systems Librarian Academic Affairs Library UNC-Chapel Hill 919-962-1288 [EMAIL PROTECTED]
Re: query optimization with CONCAT query
Actually, I'm not convinced it would, because I want to be able to find a string that is *NOT fully contained* within any given field. In other words, I want a search for The catcher in% to match the row:

+---------+--------------------+
| article | title              |
+---------+--------------------+
| The     | catcher in the rye |
+---------+--------------------+

CONCAT allows me to do this, but I'm hoping to optimize that operation somehow, perhaps with an index. Andy

olinux wrote: Mysql 4.0.1+ supports searches IN BOOLEAN MODE, so if mysql 4 is an option, I think that would work for you. http://www.mysql.com/doc/en/Fulltext_Search.html olinux
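One common workaround for Andy's problem is to materialize the concatenation into its own column at insert/update time and index that, so a left-anchored LIKE 'prefix%' can use the index. A hedged sketch of the idea in Python (the space separator and the sample rows are assumptions, not Andy's actual schema):

```python
# Simulate an indexed, precomputed CONCAT(article, ' ', title) column:
# a sorted list of combined strings stands in for the B-tree index.
films = [
    ("The", "catcher in the rye"),
    ("", "Full Speed [AKA: À toute vitesse]"),
    ("El", "Ángel exterminador"),
]

# Stand-in for the stored combined column, kept in index (sorted) order.
combined = sorted((article + " " + title).strip() for article, title in films)

prefix = "The catcher in"
matches = [c for c in combined if c.startswith(prefix)]
print(matches)
```

A prefix search over a sorted structure like this is exactly what a B-tree index on the stored column would give; the original CONCAT-inside-WHERE form recomputes the expression per row and can use no index at all.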
Re: query optimization
That is not valid SQL; a table can't be LIKE something. I think you omitted the field name. I also don't see why you need a join, unless you have duplicate ids. Gabe wrote: First off, please excuse me if this is the wrong list to post this sort of question to (and please direct me to the appropriate list). My problem: I have a database of approx. 170,000 records and 1.2 million keywords. Most of my searches happen fairly quickly using the format below. However, the group by is causing a temporary table when I explain my statement, which significantly lengthens query time. If I eliminate the group by, queries are substantially sped up and a temp table is not being generated. Only problem with not using group by is that sometimes the same ID is duplicated in the results due to some of the keywords being very similar (i.e. Richard, Rich, Rick) for the same ID. Can anyone help with a different Select strategy or something else to eliminate the temp table being generated for this query. Tables: main - has unique ID (primary) column and main data columns. kw_table - has ID column and a keyword column. Indexes are created on the ID columns and the keyword column. Query:

SELECT main.* FROM kw_table, kw_table AS kw_table1, main
WHERE kw_table LIKE 'first word%'
  AND kw_table1 LIKE '2nd word%'
  AND main.id = kw_table.id
  AND kw_table1.id = kw_table.id
GROUP BY main.id Limit 0,11

Thanks Gabe
Re: query optimization
I'm sorry, I guess I should make sure the query syntax is correct when asking a question... Here it is with the correct field names:

SELECT main.* FROM kw_table, kw_table AS kw_table1, main
WHERE kw_table.keyword LIKE 'first word%'
  AND kw_table1.keyword LIKE '2nd word%'
  AND main.id = kw_table.id
  AND kw_table1.id = kw_table.id
GROUP BY main.id Limit 0,11

Yes, there can be duplicate id's in the keyword table (but not the main table). For instance:

kw_table
  id  keyword
  --  -------
   1  Rich
   1  Richard
   1  Rick
   1  Bob
   2  Leonard
   3  Bob
   3  Larry
   3  Mike

Thanks Gabe

Gerald Clark wrote: That is not valid SQL; a table can't be LIKE something. I think you omitted the field name. I also don't see why you need a join, unless you have duplicate ids.
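Gabe's duplicates come from the join itself: one output row per matching keyword, so id 1 appears once each for Rich, Richard, and Rick. GROUP BY is one fix; rewriting the keyword test as an EXISTS subquery avoids producing the duplicates in the first place (subqueries need MySQL 4.1+, which is why the thread reaches for GROUP BY). A hedged SQLite sketch using the thread's table names with invented main-table data:

```python
# Show the duplicate-id blowup from the keyword join, and the EXISTS fix.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE main (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE kw_table (id INT, keyword TEXT);
INSERT INTO main VALUES (1,'rec1'), (2,'rec2'), (3,'rec3');
INSERT INTO kw_table VALUES (1,'Rich'),(1,'Richard'),(1,'Rick'),(1,'Bob'),
                            (2,'Leonard'),(3,'Bob'),(3,'Larry'),(3,'Mike');
""")

joined = conn.execute("""
  SELECT main.id FROM kw_table, main
  WHERE kw_table.keyword LIKE 'Ri%' AND main.id = kw_table.id
""").fetchall()          # id 1 comes back once per matching keyword

exists = conn.execute("""
  SELECT main.id FROM main
  WHERE EXISTS (SELECT 1 FROM kw_table
                WHERE kw_table.id = main.id AND kw_table.keyword LIKE 'Ri%')
""").fetchall()          # each qualifying id exactly once, no GROUP BY

print(len(joined), exists)
```

The EXISTS form scans main once and probes the keyword index per row, so no temporary table is needed for deduplication.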
RE: Query Optimization
Hi, I did some more research on my problem. (Sorry for not doing it before I posted the other message.) I think perhaps my question could be summarized as follows: can I use an index to speed up a select max() or select min()? I read what the manual has to say about a query like this: SELECT MIN(key_part2), MAX(key_part2) FROM table_name WHERE key_part1 = 10 However I want to do this: SELECT MAX(key_part1) FROM table_name WHERE key_part1 < 10 Also, what I want to do is basically equivalent to this: SELECT key_part1 FROM table_name WHERE key_part1 < 10 ORDER BY key_part1 DESC LIMIT 1 In testing, the second statement seems slower than the max() for some reason though. Is there any way to get close to an O(logN) search time on these queries? Thanks again, Dave -Original Message- From: Dave Dutcher [mailto:[EMAIL PROTECTED]] Sent: Wednesday, July 24, 2002 1:04 PM To: [EMAIL PROTECTED] Subject: Query Optimization I was wondering if somebody could give me any suggestions on how to optimize a query I am working on. This is my table:

CREATE TABLE testdata (
  Begin char(9) NOT NULL default '',
  End char(9) NOT NULL default '',
  UNIQUE KEY BeginEndIndex (Begin,End)
) TYPE=MyISAM;

It is a table of ranges, i.e. 1-5, 7-11, 2-24000. None of the ranges overlap, and I'm trying to write a query to find a range that contains a specific number such as 500,000. So this is what I've written for a query: select Begin, End from testdata where Begin <= '00500' and End >= '00500' On a table with 100,000 records the explain command tells me it is using the BeginEndIndex, it says the key length is 9, and that it has to look through about 27,000 records. I would like to be able to configure the query or indexes so it will only have to look through a couple records if possible.
MySQL is super fast at a query like this: select Begin from testdata where Begin <= '00500'; So I had the idea of trying this: select max(Begin) from testdata where Begin <= '00500'; I was hoping that it would do the same as the simple select, and then because it has a sorted index it wouldn't have to search the previous records to find the max, but explain still puts this at about 27,000 records. Has anyone else tried writing a query similar to this? Does anybody have any suggestions? Thanks in advance, Dave
Re: Query Optimization
In the last episode (Jul 24), Dave Dutcher said: select Begin, End from testdata where Begin <= '00500' and End >= '00500' Put an index on (begin,end). It will only have to do a range scan on that index to find the matching records. You might also want to make those columns integers instead of chars; they'll take up a bit less space that way. -- Dan Nelson [EMAIL PROTECTED]
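Because Dave's ranges never overlap, the O(log N) lookup he wants is exactly a binary search: find the last Begin at or below the value, then check that range's End. A hedged Python sketch of the algorithm (the range data is invented; in SQL this corresponds to MAX(Begin) <= value followed by one End comparison):

```python
# Range-containment lookup over sorted, non-overlapping [Begin, End] pairs.
import bisect

ranges = [(1, 5), (7, 11), (200, 24000), (500000, 500999)]  # sorted by Begin
begins = [b for b, _ in ranges]

def find_range(value):
    i = bisect.bisect_right(begins, value) - 1   # last Begin <= value
    if i >= 0 and ranges[i][1] >= value:         # does its End cover value?
        return ranges[i]
    return None                                  # value falls in a gap

print(find_range(500000))   # inside (500000, 500999)
print(find_range(6))        # between ranges -> None
```

This is why the B-tree should be able to answer in a couple of probes: the non-overlap guarantee means only one candidate range ever needs checking.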
Re: Query Optimization
Pada Mon, 8 Jul 2002 16:07:04 +0530 Arul [EMAIL PROTECTED] menulis : Hi All The Query below took around 175 Secs to return 22 Rows..Any way this query can be optimized try to use group by instead of distinct i usually found group by is faster ;-) anyway ... performance of query is more related to : - table definition - indexing - mysql parameter values - OS and hardware spec/limitation -- Let's call it an accidental feature. -- Larry Wall MySQL 3.23.51 : up 17 days, Queries : 365.147 per second (avg). - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: Query Optimization
Hi I Found Group By to be a bit faster...but still it takes 30 secs to execute a query... What could be done..I have Indexing on almost all the tables.. -Arul - Original Message - From: Dicky Wahyu Purnomo [EMAIL PROTECTED] To: Arul [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Sent: Monday, July 08, 2002 4:25 PM Subject: Re: Query Optimization Pada Mon, 8 Jul 2002 16:07:04 +0530 Arul [EMAIL PROTECTED] menulis : Hi All The Query below took around 175 Secs to return 22 Rows..Any way this query can be optimized try to use group by instead of distinct i usually found group by is faster ;-) anyway ... performance of query is more related to : - table definition - indexing - mysql parameter values - OS and hardware spec/limitation -- Let's call it an accidental feature. -- Larry Wall MySQL 3.23.51 : up 17 days, Queries : 365.147 per second (avg). - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: Query Optimization
On Monday, 8. July 2002 12:37, Arul wrote: Hi All The Query below took around 175 Secs to return 22 Rows..Any way this query can be optimized For analyzing your problem, you should send your table definition and an output of the EXPLAIN statement. Regards Georg mysql, query - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: Query Optimization
On 8 Jul 2002, at 16:07, Arul [EMAIL PROTECTED] wrote: FROM User_Type_Details UTD, User_Type_Details B , Users U ,TempGeneralSearch1 T1,Users A LEFT JOIN Company C on (U.CompanyID = C.CompanyID) LEFT JOIN Expert_Info EI on (U.UserID = EI.UserID) LEFT JOIN User_Horizontal_Market UHM on (U.UserID = UHM.UserID) LEFT JOIN Industry I on (U.IndustryID = I.IndustryID) LEFT JOIN Technology_Area T on (U.Technology_AreaID = T.Technology_AreaID) LEFT JOIN Role R on (U.RoleID = R.RoleID ) WHERE U.UserID = UTD.UserID AND U.User_Status='ACT' AND U.User_App_Status='APP' AND UTD.User_TypeID IN (4,0,2,3) AND T1.TempID=117 You seem to have some tables in the FROM part of your query that aren't being used (Users aliased to A, User_Type_Details aliased to B) and another that isn't joined to anything (T1, but that may be okay since it looks like you may be getting only one record from it because of the TempID=117). Eliminate the unused tables (which could be resulting in a huge number of duplicated records), and your speed should improve greatly. -- Keith C. Ivey [EMAIL PROTECTED] Tobacco Documents Online http://tobaccodocuments.org - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
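Keith's diagnosis (tables in FROM that are never joined) is worth seeing concretely: each unjoined table multiplies the result by its row count, a Cartesian product. A minimal SQLite sketch with invented tables:

```python
# An unjoined table in FROM multiplies the row count.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY);
CREATE TABLE extra (id INTEGER PRIMARY KEY);
INSERT INTO users VALUES (1),(2),(3);
INSERT INTO extra VALUES (10),(20),(30),(40);
""")

plain = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
cartesian = conn.execute("SELECT COUNT(*) FROM users, extra").fetchone()[0]
print(plain, cartesian)   # 3 rows vs 3*4 = 12 rows
```

With two unused tables of realistic size, Arul's query could easily be producing millions of duplicated intermediate rows before DISTINCT/GROUP BY collapses them, which matches the 175-second runtime for 22 result rows.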
Re: Query optimization
Jeff, Let's do some math here first. Your table has 47 Million rows. You have a primary key on an int, and indexes across 22 chars and 10 chars. You have 512MB of key buffer. Now, assuming the indexes just have information on relevant fields (they're actually bigger than this since they have a row pointer too) you need 47M*4 + 47M*22 + 47M*10 bytes of memory for your keys. This works out to 47M * (4 + 22 + 10) = 47M * (36) = 1.6GB of memory to store your keys. You only have 512MB. Therefore, since MySQL can't keep the key in memory, it has to swap parts out to disk as it works on your query, which causes a huge performance decrease. If you don't need to search on inequalities or partial matches, I would recommend adding two columns -- an integer hash value for each of the char columns. Index those, instead of the char columns, and do your lookups on those. In this scenario, you will need 47M * (4 + 8 + 4) = 717MB of memory, which is much closer to what you have. If you aren't actively using the primary key, you may be able to squeeze the other two indexes in your available RAM. For a table that big, you should probably buy a little bit more RAM, if it will fit in your machine. RAM is cheap :) Steve Meyers On Fri, 2001-11-09 at 18:08, Jeff Isom wrote: I'm trying to figure out how to optimize a query on a fairly large table. I've been reading the MySQL documentation and have tried a few of the suggestions, but not seem to have much effect on the query. The table contains three columns: 1) int unsigned 2) char(12) 3) char(10) The table has 47,000,000 + rows. I have my key_buffer set to 512M. I have a primary key index on column 1, a multicolumn index (col2,col3) and another single column index for col3. The columns are not unique, in fact, some column 2 has as may as 1,000,000 rows with the same value. I am running MySQL 3.23.37 on RedHat 6.0 with 1 Gig memory and a single 700 Mhz processor. The hard drive is a single 18 Gig SCSI drive. 
I am searching on col 2 only, col 3 only, or col 2 and col 3. The queries on single columns can take as long as 6 minutes to execute. Any suggestions on how I can further optimize the setup to get better results? Thanks, Jeff
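Steve's hash-column idea can be sketched roughly as follows. The table and column names are hypothetical stand-ins for Jeff's schema, and CRC32() is only available in later MySQL versions (on 3.23 the hash would have to be computed client-side or with another function):

```sql
-- Sketch of the hash-column technique: replace wide char indexes with
-- narrow integer indexes. Names here are hypothetical.
ALTER TABLE bigtable
    ADD COLUMN col2_hash INT UNSIGNED NOT NULL,
    ADD COLUMN col3_hash INT UNSIGNED NOT NULL,
    ADD INDEX (col2_hash, col3_hash),
    ADD INDEX (col3_hash);

UPDATE bigtable SET col2_hash = CRC32(col2), col3_hash = CRC32(col3);

-- Lookups filter on the narrow hash index first, then re-check the
-- original column to weed out hash collisions:
SELECT * FROM bigtable
WHERE col2_hash = CRC32('somevalue') AND col2 = 'somevalue';
```

The re-check on the original column is required because different strings can hash to the same 32-bit value; the index merely narrows the candidate rows.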
Re: Query optimization issue.
On Thu, Aug 23, 2001 at 11:11:38AM -0700, Steven Roussey wrote:

mysql> explain select * from dominf.domip order by ip desc;

'ORDER BY something DESC' does not use an index. It must be ascending. See the manual. Also, I forgot, if you do this often and it needs to be in DESC order, you can always alter the app to create and use an extra column 'ip_desc unsigned int not null', set it on insert to (0xFFFFFFFF - ip), and have an index on ip_desc. (If the column were signed you would use 0x7FFFFFFF - the variable, but IP numbers need to be stored as unsigned.)

And it's also worth knowing that this is fixed in 4.0.

Jeremy
--
Jeremy D. Zawodny, [EMAIL PROTECTED] Technical Yahoo - Yahoo Finance
Desk: (408) 349-7878 Fax: (408) 349-5454 Cell: (408) 685-5936
MySQL 3.23.41-max: up 8 days, processed 107,141,973 queries (140/sec. avg)
Re: Query optimization issue.
mysql> explain select * from dominf.domip order by ip desc;

'ORDER BY something DESC' does not use an index. It must be ascending. See the manual. Also, I forgot, if you do this often and it needs to be in DESC order, you can always alter the app to create and use an extra column 'ip_desc unsigned int not null', set it on insert to (0xFFFFFFFF - ip), and have an index on ip_desc. (If the column were signed you would use 0x7FFFFFFF - the variable, but IP numbers need to be stored as unsigned.)

Sincerely, Steven Roussey http://Network54.com/?pp=e

PS: I'm trying to set aside a week in September to augment the MySQL code to use an index even with DESC in these situations. MySQL AB has some unfinished code to do this that I'm working from. I'm just familiarizing myself with the context of the code right now.
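Steven's workaround can be sketched as follows, using the column names from the thread. Note the values 0xFFFFFFFF and 0x7FFFFFFF are the unsigned and signed 32-bit maximums; the app must keep ip_desc in sync on every insert and update.

```sql
-- Mirror column that inverts the sort order, so an ASCENDING index scan
-- on ip_desc returns rows in DESCENDING ip order.
ALTER TABLE domip
    ADD COLUMN ip_desc INT UNSIGNED NOT NULL,
    ADD INDEX (ip_desc);

-- Backfill existing rows (new rows must set it the same way):
UPDATE domip SET ip_desc = 0xFFFFFFFF - ip;

-- Equivalent to ORDER BY ip DESC, but eligible for an index scan
-- on pre-4.0 servers:
SELECT * FROM domip ORDER BY ip_desc;
```

Because ip_desc is a strictly decreasing function of ip, ascending ip_desc order and descending ip order are the same ordering.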
Re: Query optimization issue.
Peter Zaitsev wrote: Hello mysql, I was trying to dump a very huge table (5GB, 45 mil rows) to a text dump in sorted order (according to the primary key). The problem is mysql somehow does not want to scan the table by primary key to produce sorted output row by row, but prefers to use filesort, which would take quite a long time in this case: mysql> explain select * from dominf.domip order by ip desc;

Try using a limit clause

--
_/_/ _/_/_/ - Rafał Jank [EMAIL PROTECTED] -
_/ _/ _/ _/ _/ Wirtualna Polska SA http://www.wp.pl
_/_/_/_/ _/_/_/ ul. Uphagena 2, 80-237 Gdansk, tel/fax. (58) 5215625
_/ _/ _/ ==* http://szukaj.wp.pl *==
Re: Query optimization issue.
Rafal Jank wrote: Peter Zaitsev wrote: Hello mysql, I was trying to dump a very huge table (5GB, 45 mil rows) to a text dump in sorted order (according to the primary key). The problem is mysql somehow does not want to scan the table by primary key to produce sorted output row by row, but prefers to use filesort, which would take quite a long time in this case: mysql> explain select * from dominf.domip order by ip desc; Try using a limit clause

I mean: check the plan of

select * from dominf.domip order by ip desc limit (number of records - 1);

That's strange, but it works for me... (on a smaller table). You can get the last record with another query.

--
_/_/ _/_/_/ - Rafał Jank [EMAIL PROTECTED] -
_/ _/ _/ _/ _/ Wirtualna Polska SA http://www.wp.pl
_/_/_/_/ _/_/_/ ul. Uphagena 2, 80-237 Gdansk, tel/fax. (58) 5215625
_/ _/ _/ ==* http://szukaj.wp.pl *==
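Rafal's suggestion amounts to comparing the two plans with EXPLAIN; on some 3.23 servers adding a LIMIT can nudge the optimizer away from filesort. This is a sketch of what to check, not a guarantee — whether the plan actually changes depends on the server version, as Steven's and Jeremy's replies in this thread note:

```sql
-- Plan without LIMIT: typically shows "Using filesort" on 3.23.
EXPLAIN SELECT * FROM dominf.domip ORDER BY ip DESC;

-- Plan with a LIMIT (a value near the table's row count effectively
-- covers the whole table) -- compare whether filesort disappears:
EXPLAIN SELECT * FROM dominf.domip ORDER BY ip DESC LIMIT 45000000;
```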
Re: Query optimization issue.
mysql> explain select * from dominf.domip order by ip desc;

'ORDER BY something DESC' does not use an index. It must be ascending. See the manual.

Sincerely, Steven Roussey http://Network54.com/?pp=e
Re: Query optimization
Hello

On Sat, 17 Mar 2001, Bård Farstad wrote: I have a query in my program which is taking too long when I get data in the database. I'm currently testing with ~40,000 articles in the system. [...] Is it possible to optimize this query further?

Have you played with EXPLAIN SELECT and friends? See also the optimization tips in chapter 12 of the manual.

cheers
--
TS
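The EXPLAIN advice above is applied by prefixing the slow statement with the keyword; the table and column names below are hypothetical, since the original query was elided from the quoted message:

```sql
-- EXPLAIN shows, per table, which index (if any) MySQL chose and
-- roughly how many rows it expects to examine. Names are illustrative.
EXPLAIN SELECT a.id, a.title
FROM articles a, categories c
WHERE a.category_id = c.id
  AND c.name = 'news';
```

A row with key = NULL and a large rows estimate usually marks the table scan that is costing the time.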
RE: Query Optimization
Hi,

Without a where clause the optimizer can use an index for in your initial query, you will never get to use an index. How many rows are there in sessobjs? From the start of the sample output, it looks like there are several thousand, so asking a complex question may, indeed, take a long time. You could create a result table and use several cell_id-limited queries to populate it, if the time taken to run the query for each cell_id turns out to be less than the time to run the one-off query. Hope this helps.

Quentin

-----Original Message-----
From: CLEMENTS,BILL (HP-Roseville,ex1) [mailto:[EMAIL PROTECTED]]
Sent: Thursday, 8 March 2001 1:27 p.m.
To: '[EMAIL PROTECTED]'
Subject: Query Optimization

Hi All, This query takes 3-4 minutes to run from the mysql command line:

select DAYOFMONTH(started_date), object_status, count(*) from sessobjs where MONTH(started_date)=3 group by started_date, object_status

Sample Output:

mysql> select DAYOFMONTH(started_date), object_status, count(*) from sessobjs where MONTH(started_date)=3 group by started_date, object_status;
+--------------------------+---------------+----------+
| DAYOFMONTH(started_date) | object_status | count(*) |
+--------------------------+---------------+----------+
|                        1 |             1 |      185 |
|                        1 |             2 |    23647 |
|                        1 |             3 |      257 |
|                        1 |             4 |      304 |
|                        1 |             5 |       69 |
|                        1 |             6 |        9 |
|                        1 |             7 |       13 |
|                        2 |             1 |      403 |
|                        2 |             3 |      271 |
|                        2 |             4 |      283 |
|                        2 |             5 |       32 |
|                        2 |             7 |       14 |
|                        3 |             1 |      811 |
|                        3 |             2 |    20582 |
|                        3 |             3 |      243 |
|                        3 |             4 |      287 |
|                        3 |             5 |      124 |
|                        3 |             6 |        1 |
|                        3 |             7 |       43 |

However, with the addition of a where clause, this query takes only a few seconds to run:

select DAYOFMONTH(started_date), object_status, count(*) from sessobjs where cell_id=1 AND MONTH(started_date)=3 group by started_date, object_status

Sample output:

mysql> select DAYOFMONTH(started_date), object_status, count(*) from sessobjs
    -> where cell_id=29 and MONTH(started_date)=3 group by started_date, object_status;
+--------------------------+---------------+----------+
| DAYOFMONTH(started_date) | object_status | count(*) |
+--------------------------+---------------+----------+
|                        1 |             2 |      205 |
|                        2 |             2 |      195 |
|                        3 |             2 |      177 |
|                        4 |             2 |      223 |
|                        4 |             3 |        1 |
|                        5 |             2 |      194 |
|                        6 |             2 |      195 |
|                        7 |             2 |       12 |
+--------------------------+---------------+----------+
8 rows in set (2.45 sec)

Here is my index structure for the table SESSOBJS:

mysql> show index from sessobjs;
+----------+------------+---------------+--------------+---------------+-----------+-------------+----------+
| Table    | Non_unique | Key_name      | Seq_in_index | Column_name   | Collation | Cardinality | Sub_part |
+----------+------------+---------------+--------------+---------------+-----------+-------------+----------+
| sessobjs |          0 | PRIMARY       |            1 | sobj_id       | A         |     3449788 |     NULL |
| sessobjs |          1 | sess_id       |            1 | sess_id       | A         |        NULL |     NULL |
| sessobjs |          1 | object_status |            1 | object_status | A         |        NULL |     NULL |
| sessobjs |          1 | objda         |            1 | started_date  | A         |        NULL |     NULL |
| sessobjs |          1 | objda         |            2 | object_status | A         |        NULL |     NULL |
| sessobjs |          1 | cell_id       |            1 | cell_id       | A         |        NULL |     NULL |
| sessobjs |          1 | cell_id       |            2 | started_date  | A         |        NULL |     NULL |
| sessobjs |          1 | cell_id       |            3 | object_status | A         |        NULL |     NULL |
| sessobjs |          1 | site_id       |            1 | site_id       | A         |        NULL |     NULL |
| sessobjs |          1 | site_id       |            2 | started_date  | A         |        NULL |     NULL |
| sessobjs |          1 | site_id       |            3 | object_status | A         |        NULL |     NULL |
| sessobjs |          1 | host_id       |
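One likely reason the slow query cannot use the objda (started_date, object_status) index is that MONTH(started_date)=3 wraps the indexed column in a function, which MySQL cannot match against an index. A hedged sketch of a range rewrite that keeps the column bare (the year is assumed from the thread's date):

```sql
-- Hypothetical rewrite: express the month filter as a range on the raw
-- started_date column so an index whose first part is started_date
-- (or cell_id, started_date for the per-cell query) can be used.
SELECT DAYOFMONTH(started_date), object_status, COUNT(*)
FROM sessobjs
WHERE started_date >= '2001-03-01'
  AND started_date <  '2001-04-01'
GROUP BY started_date, object_status;
```

The half-open range (>= first of the month, < first of the next month) covers exactly one month without any function applied to the column.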
Re: query optimization suggestion
Shouldn't the first query have parentheses? As in:

DELETE FROM test WHERE userID=X AND (testID<20 OR testID>80);

Even if it works the other way, parentheses make it clearer what you're trying to accomplish.

-Angela

Michael Griffith wrote: When using MySQL I've learned to avoid OR in any queries as much as possible. Almost always this causes a major speed decrease. Consider this table:

CREATE TABLE test(
  userID int,  # (non-unique)
  testID int,  # (non-unique)
  PRIMARY KEY (testid, userid)
);

Suppose this table is populated with 1,000,000 rows. Then do this:

DELETE FROM test WHERE userID=X AND testID<20 OR testID>80;

This query is EXTREMELY slow because it looks at every row in the DB. A significant improvement can be achieved by splitting it into 2 statements:

DELETE FROM test WHERE userID=X AND testID<20;
DELETE FROM test WHERE userID=X AND testID>80;

On real data I've achieved at least a ten-fold increase doing this. This is easy to optimize from the client side; however, I don't see any reason why this optimization can't or shouldn't be built into the server. Whenever an OR can be split into two separate queries on the same index, this optimization should work. Food for thought.

--
Angela Curtis
Innovative Business Consultants
http://www.ibc2001.com
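Angela's point is about operator precedence: AND binds tighter than OR in SQL, so the unparenthesized form groups differently than intended. A small sketch (the literal 5 is a hypothetical user id):

```sql
-- Without parentheses, AND binds tighter than OR, so this:
DELETE FROM test WHERE userID=5 AND testID<20 OR testID>80;
-- is parsed as:
DELETE FROM test WHERE (userID=5 AND testID<20) OR testID>80;
-- which also deletes OTHER users' rows with testID>80.
-- The intended predicate needs explicit grouping:
DELETE FROM test WHERE userID=5 AND (testID<20 OR testID>80);
```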
Re: query optimization suggestion
Shouldn't the first query have parentheses? As in:

DELETE FROM test WHERE userID=X AND (testID<20 OR testID>80);

Even if it works the other way, parentheses make it clearer what you're trying to accomplish. -Angela

Actually, yes. Sorry about the poor example. My point is that sometimes applications can be sped up by splitting OR queries into two separate queries, and I believe the server could often recognize this, although I am wrong in this particular example. I'll have to check my code for an exact situation.

Michael
Re: query optimization suggestion
In the last episode (Feb 01), Michael Griffith said:

CREATE TABLE test(
  userID int,  # (non-unique)
  testID int,  # (non-unique)
  PRIMARY KEY (testid, userid)
);

Suppose this table is populated with 1,000,000 rows. Then do this:

DELETE FROM test WHERE userID=X AND testID<20 OR testID>80;

This query is EXTREMELY slow because it looks at every row in the DB. A significant improvement can be achieved by splitting it into 2 statements:

DELETE FROM test WHERE userID=X AND testID<20;
DELETE FROM test WHERE userID=X AND testID>80;

Avoiding ORs is a good idea, but I don't think your example demonstrates it. The real problem is ORs on two different columns. You have "(testID<20 AND userID=X) OR testID>80", which can be optimized with a two-column index on (testID,userID). A better example would be "DELETE FROM test WHERE userID=X OR testID=Y".

On real data I've achieved at least a ten-fold increase doing this. This is easy to optimize from the client side; however, I don't see any reason why this optimization can't or shouldn't be built into the server. Whenever an OR can be split into two separate queries on the same index, this optimization should work.

What you're suggesting here is splitting a query into a UNION of two smaller queries, essentially. But this is not always the best way. Keep in mind that your DELETE example doesn't have to worry about records that are true for both halves of your OR (since the record would get deleted by the first statement).
If it were really a SELECT query, mysql would have to run the first half of the query, remember all the record numbers, then run the second query and make sure that no duplicate records are returned:

SELECT * FROM test WHERE userID=X OR testID=Y;
  i.e. full table scan

would become (if mysql supported UNION):

SELECT DISTINCT * FROM (
  SELECT * FROM test WHERE userID=X
  UNION
  SELECT * FROM test WHERE testID=Y
);
  i.e. random record pull based on userID index + random record pull based on testID index + mergesort + remove duplicate records

If there is a lot of overlap between the two OR clauses, or the percentage of returned records is high relative to the size of the table, doing a full table scan would be faster.

--
Dan Nelson [EMAIL PROTECTED]
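Dan's distinction between the DELETE and SELECT cases can be sketched concretely (the literal values 5 and 7 are hypothetical; UNION itself arrived in MySQL 4.0, after this thread):

```sql
-- The OR-on-two-columns case, split into two statements so each can use
-- its own index. This is safe for DELETE: a row removed by the first
-- statement cannot be matched again by the second, so no double handling.
DELETE FROM test WHERE userID = 5;
DELETE FROM test WHERE testID = 7;

-- For a SELECT, the same split needs duplicate elimination, since a row
-- with userID = 5 AND testID = 7 satisfies both halves. UNION (without
-- ALL) removes those duplicates:
SELECT * FROM test WHERE userID = 5
UNION
SELECT * FROM test WHERE testID = 7;
```

As Dan notes, when the two halves overlap heavily or return a large fraction of the table, the deduplication and merge cost can exceed that of a single table scan.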