Re: Newbie: problem installing on Mac OS 10.6

2011-11-05 Thread Brandon Phelps

On 11/5/2011 12:32 PM, Roger CPL wrote:

I installed MySQL using the .dmg package.  I can use System Preferences to 
start and stop the program.  When I open a terminal window, which I think is the 
way to use the program, any MySQL command I enter is not recognized.

Thanks
Roger


You will need to first start the mysql CLI (command line interface) 
using the mysql command.  You will likely need to supply the correct 
credentials as well; try mysql --help for details.


In short, you cannot simply enter SQL commands from a shell and expect 
them to work.
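
For example, with the DMG install the client binary typically lives in 
/usr/local/mysql/bin and is not on the shell's PATH by default (the exact 
path may differ on your system):

```
$ /usr/local/mysql/bin/mysql -u root -p
Enter password:
mysql> SELECT VERSION();
mysql> SHOW DATABASES;
mysql> QUIT;
```

SQL statements are only recognized once you are inside the mysql> prompt, 
not at the shell prompt itself.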


-Brandon

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=arch...@jab.org



Re: Initial install instructions for mac lion/macports

2011-10-11 Thread Brandon Phelps

Is there any reason why you are using ports and not the native 64-bit DMG from 
mysql.com?

http://www.mysql.com/downloads/mysql/#downloads

I run the latest version (5.5.15) on my MacBook running Lion and the install 
goes without a hitch.

Brandon

On 10/10/2011 07:34 PM, Tim Johnson wrote:

I am familiar with mysql on Linux - as of late, Ubuntu.
I am installing mysql on a Mac mini with the Lion (OS X 10.7)
operating system.

mysql-server was downloaded and built with macports.
Following the included instructions has not resulted in a successful
startup.
I have recorded error messages, but it may suffice if I were pointed
to initial setup instructions for Mac Lion. So far, google has not
edified me.

TIA





Re: Initial install instructions for mac lion/macports

2011-10-11 Thread Brandon Phelps

I only use ports for a small number of apps, such as irssi, newer versions 
of perl, etc.

I was recommending the OP use the DMG installer, as it comes with everything 
necessary, including startup scripts.

-Brandon

On 10/11/2011 10:05 AM, Vladislav Geller wrote:

Hi Brandon,

MacPorts is practically dead. I think you will have more luck with
http://mxcl.github.com/homebrew/

I have not built mysql with it though on lion.

regards,
Vladislav

On Tue, Oct 11, 2011 at 3:56 PM, Brandon Phelps <bphe...@gls.com> wrote:


Is there any reason why you are using ports and not the native 64-bit DMG
from mysql.com?

http://www.mysql.com/downloads/mysql/#downloads

I run the latest version (5.5.15) on my macbook running lion and the
install goes without a hitch.

Brandon


On 10/10/2011 07:34 PM, Tim Johnson wrote:


I am familiar with mysql on Linux - as of late, Ubuntu.
I am installing mysql on a Mac mini with the Lion (OS X 10.7)
operating system.

mysql-server was downloaded and built with macports.
Following the included instructions has not resulted in a successful
startup.
I have recorded error messages, but it may suffice if I were pointed
to initial setup instructions for Mac Lion. So far, google has not
edified me.

TIA












Re: Initial install instructions for mac lion/macports

2011-10-11 Thread Brandon Phelps

Tim,

Just a reminder, as I am not sure whether it is documented: after you get 
MySQL up and running via the DMG package, be sure to install the System 
Preferences pane (it didn't used to be installed by default; not sure if it is 
now), which should be one of the icons you see when the DMG first opens, if I 
recall.  Once the preference pane is installed you can open it and enable the 
Startup option so that MySQL starts whenever the system is booted.  I am not 
sure what your requirements are, but without the previously mentioned steps, 
if your machine loses power while you are away and is subsequently rebooted, 
you will not be able to access the database until you manually start the 
service.

Hope this helps.

-Brandon

On 10/11/2011 11:29 AM, Tim Johnson wrote:

* Tim Johnson <t...@akwebsoft.com> [111010 15:37]:

I am familiar with mysql on Linux - as of late, Ubuntu.
I am installing mysql on a Mac mini with the Lion (OS X 10.7)
operating system.

mysql-server was downloaded and built with macports.
Following the included instructions has not resulted in a successful
startup.

  :) Last night before turning in, I thought that maybe I should have
  investigated the DMG option. It is clear from the respondents that
  I should do this.
  Thanks a lot. I will uninstall via macports and use the package as
  recommended.





Re: Weird UPDATE Problem

2011-10-10 Thread Brandon Phelps

That query looks fine.  What error do you get if you execute the query 
from the CLI?  Also, is it possible that the s_id or owed columns are no longer 
numeric data types?  If either column is now a character type, you would need 
to put the values in quotes.
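
For example (a sketch - the s_id value here is made up, and the table name 
comes from the message below):

```sql
-- If owed and s_id are still numeric, unquoted literals are fine:
UPDATE collection_41_students SET owed = 0 WHERE s_id = 42;

-- If either column has become a character type, quote the literals:
UPDATE collection_41_students SET owed = '0' WHERE s_id = '42';
```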

-Brandon

On 10/10/2011 09:14 AM, D. Marshall Lemcoe Jr. wrote:

Hello all.
I have recently finished my migration from an older server to a newer
server running RHEL 6. The MySQL version went from 5.0.77 to 5.1.52.
In my application, this query used to work just fine:

$paid_query = mysql_query("UPDATE $table_name SET owed = 0 WHERE s_id
= $student");

Where $table_name was
mysql_real_escape_string("collection_41_students"). With the new MySQL
version, the UPDATE query does not work, and echo mysql_error();
returns nothing.

Is my syntax correct going from version to version?

Thanks in advance,

David






Re: MySQL Indexes

2011-10-07 Thread Brandon Phelps

This thread has sparked my interest. What is the difference between an index on 
(field_a, field_b) and an index on (field_b, field_a)?

On 10/06/2011 07:43 PM, Nuno Tavares wrote:

Neil, whenever you see multiple fields you'd like to index, you should
consider, at least:

* The frequency of each query;
* The occurrences of the same field in multiple queries;
* The cardinality of each field;

There is a tool, Index Analyzer, that may give you some hints, and I
think it's maatkit that has a tool to run over a query log to find good
candidates - I've seen it somewhere, I believe.

Just remember that idx_a(field_a,field_b) is not the same as
idx_b(field_b,field_a), and is not considered for use in the same way.

-NT


On 07-10-2011 00:22, Michael Dykman wrote:

Only one index at a time can be used per query, so neither strategy is
optimal.  You need to look at the queries you intend to run against the
system and construct indexes which support them.

  - md

On Thu, Oct 6, 2011 at 2:35 PM, Neil Tompkins
<neil.tompk...@googlemail.com> wrote:


Maybe that was a bad example.  If the query was name = 'Red', what index
should I create?

Should I create an index of all columns used in each query, or an index
on each individual column?


On 6 Oct 2011, at 17:28, Michael Dykman <mdyk...@gmail.com> wrote:

For the first query, the obvious index on score will give you optimal
results.

The second query is founded on this phrase: LIKE '%Red%', and no index
will help you there.  This is an anti-pattern, I am afraid.  The only way
your database can satisfy that expression is to test each and every record
in the database (the test itself being expensive, as infix matching is
iterative).  Perhaps you should consider this approach instead:
http://dev.mysql.com/doc/refman/5.5/en/fulltext-natural-language.html
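
A minimal sketch of that full-text approach, using the table from the question 
below (note that in MySQL 5.5, FULLTEXT indexes require MyISAM tables):

```sql
-- One-time index creation:
ALTER TABLE test_table ADD FULLTEXT INDEX ft_name (name);

-- Indexed natural-language search instead of LIKE '%Red%':
SELECT auto_id, name, score
FROM test_table
WHERE MATCH(name) AGAINST('Red')
ORDER BY score DESC;
```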

On Thu, Oct 6, 2011 at 10:59 AM, Tompkins Neil
<neil.tompk...@googlemail.com> wrote:


Hi,

Can anyone help and offer some advice with regard to MySQL indexes?
Basically we have a number of different tables, all of which have the
obvious primary keys.  We then have some queries using JOIN statements
that run more slowly than we wanted.  How many indexes are recommended
per table?  For example, should I have an index on all fields that will
be used in a WHERE statement?  Should the indexes be created with
multiple fields?  An example of two basic queries:

SELECT auto_id, name, score
FROM test_table
WHERE score > 10
ORDER BY score DESC


SELECT auto_id, name, score
FROM test_table
WHERE score > 10
AND name LIKE '%Red%'
ORDER BY score DESC

How many indexes should be created for these two queries ?

Thanks,
Neil





--
  - michael dykman
  - mdyk...@gmail.com

  May the Source be with you.













Re: FW: MySQL Indexes

2011-10-07 Thread Brandon Phelps

That cleared it up for me.  Thanks!

On 10/07/2011 03:06 PM, Jerry Schwartz wrote:

-Original Message-
From: Reindl Harald [mailto:h.rei...@thelounge.net]
Sent: Friday, October 07, 2011 12:21 PM
To: mysql@lists.mysql.com
Subject: Re: MySQL Indexes

but could this not be called a bug?


[JS] No.

Think of two telephone books: one is sorted by first name, last name and the
other is sorted by last name, first name. (Those are like your two keys, f1/f2
and f2/f1.)

If you want to find someone by their first name, you use the first book. If
you want to find somebody by their last name, you use the second book.

If you want to find someone by their last name, the first book (key f1/f2) is
useless. If you want to find someone by their first name, the second book
(f2/f1) is useless.

Does that help explain it?
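
The two "books" in SQL terms (table and column names are hypothetical):

```sql
CREATE INDEX idx_f1_f2 ON phonebook (first_name, last_name);
CREATE INDEX idx_f2_f1 ON phonebook (last_name, first_name);

-- Served by idx_f1_f2 (leftmost column of the index matches):
SELECT * FROM phonebook WHERE first_name = 'Jerry';

-- idx_f1_f2 is useless here; idx_f2_f1 is needed:
SELECT * FROM phonebook WHERE last_name = 'Schwartz';
```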

Regards,

Jerry Schwartz
Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032

860.674.8796 / FAX: 860.674.8341
E-mail: je...@gii.co.jp
Web site: www.giiresearch.com





On 07.10.2011 18:08, Michael Dykman wrote:

When a query selects on field_a and field_b, that index can be used.  If
querying on field_a alone, the index again is useful.  Query on field_b
alone, however, and that first index is of no use to you.

On Fri, Oct 7, 2011 at 10:49 AM, Brandon Phelps <bphe...@gls.com> wrote:


This thread has sparked my interest. What is the difference between an
index on (field_a, field_b) and an index on (field_b, field_a)?


On 10/06/2011 07:43 PM, Nuno Tavares wrote:


Neil, whenever you see multiple fields you'd like to index, you should
consider, at least:

* The frequency of each query;
* The occurrences of the same field in multiple queries;
* The cardinality of each field;

There is a tool Index Analyzer that may give you some hints, and I
think it's maatkit that has a tool to run a query log to find good
candidates - I've seen it somewhere, I believe

Just remember that idx_a(field_a,field_b) is not the same as
idx_b(field_b,field_a), and is not considered for use in the same way.

-NT


On 07-10-2011 00:22, Michael Dykman wrote:


Only one index at a time can be used per query, so neither strategy is
optimal.  You need to look at the queries you intend to run against the
system and construct indexes which support them.

  - md

On Thu, Oct 6, 2011 at 2:35 PM, Neil Tompkins
<neil.tompk...@googlemail.com> wrote:

Maybe that was a bad example.  If the query was name = 'Red', what index
should I create?

Should I create an index of all columns used in each query, or an index
on each individual column?


On 6 Oct 2011, at 17:28, Michael Dykman <mdyk...@gmail.com> wrote:

For the first query, the obvious index on score will give you optimal
results.

The second query is founded on this phrase: LIKE '%Red%', and no index
will help you there.  This is an anti-pattern, I am afraid.  The only way
your database can satisfy that expression is to test each and every record
in the database (the test itself being expensive, as infix matching is
iterative).  Perhaps you should consider this approach instead:
http://dev.mysql.com/doc/refman/5.5/en/fulltext-natural-language.html


On Thu, Oct 6, 2011 at 10:59 AM, Tompkins Neil
<neil.tompk...@googlemail.com> wrote:

Hi,

Can anyone help and offer some advice with regard to MySQL indexes?
Basically we have a number of different tables, all of which have the
obvious primary keys.  We then have some queries using JOIN statements
that run more slowly than we wanted.  How many indexes are recommended
per table?  For example, should I have an index on all fields that will
be used in a WHERE statement?  Should the indexes be created with
multiple fields?  An example of two basic queries:

SELECT auto_id, name, score
FROM test_table
WHERE score > 10
ORDER BY score DESC


SELECT auto_id, name, score
FROM test_table
WHERE score > 10
AND name LIKE '%Red%'
ORDER BY score DESC

How many indexes should be created for these two queries ?

Thanks,
Neil





--
  - michael dykman
  - mdyk...@gmail.com

  May the Source be with you.


















--

Mit besten Grüßen, Reindl Harald
the lounge interactive design GmbH
A-1060 Vienna, Hofmühlgasse 17
CTO / software-development / cms-solutions
p: +43 (1) 595 3999 33, m: +43 (676) 40 221 40
icq: 154546673, http://www.thelounge.net/

http://www.thelounge.net/signature.asc.what.htm









Stored Procedure Question

2011-09-21 Thread Brandon Phelps

Hello all,

I would like to create a stored procedure that does the following:

1. Accepts 4 values as parameters
2. SELECTS 1 record (LIMIT 1) from a table where the 4 parameters match fields 
in that table
a. If a record was returned then UPDATE the table
b. If a record was not returned then INSERT into a different table

My main question here is: how can I execute a SELECT id FROM ... LIMIT 1 
statement within a stored procedure, then use the returned id field later in 
the procedure?

Something like this:

@row_id = SELECT id FROM myTable WHERE blah blah LIMIT 1;

IF @row_id != nothing THEN
UPDATE myTable ...
ELSE
INSERT INTO anotherTable ...
END IF

So if no rows were returned from the select I perform the ELSE block, otherwise 
I perform the main IF block.

Thanks!




Re: Stored Procedure Question [SOLVED]

2011-09-21 Thread Brandon Phelps

Ah ha!  Thanks Derek.  I thought INTO was used strictly for inserting the 
selected records into another table.

Much appreciated.
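
Putting it together, a sketch of the resulting procedure (the table, column, 
and parameter names are placeholders, not from the original post):

```sql
DELIMITER //
CREATE PROCEDURE match_or_insert(IN p1 INT, IN p2 INT, IN p3 INT, IN p4 INT)
BEGIN
    DECLARE v_id INT DEFAULT NULL;

    -- Fetch at most one matching row's id into a local variable;
    -- if no row matches, v_id stays NULL.
    SELECT id INTO v_id FROM myTable
    WHERE f1 = p1 AND f2 = p2 AND f3 = p3 AND f4 = p4
    LIMIT 1;

    IF v_id IS NOT NULL THEN
        UPDATE myTable SET updated = NOW() WHERE id = v_id;
    ELSE
        INSERT INTO anotherTable (f1, f2, f3, f4) VALUES (p1, p2, p3, p4);
    END IF;
END //
DELIMITER ;
```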


On 09/21/2011 02:34 PM, Derek Downey wrote:

SELECT id INTO @row_id FROM myTable WHERE blah blah LIMIT 1;

Source
http://dev.mysql.com/doc/refman/5.5/en/select-into-statement.html

On Sep 21, 2011, at 2:23 PM, Brandon Phelps wrote:


Hello all,

I would like to create a stored procedure that does the following:

1. Accepts 4 values as parameters
2. SELECTS 1 record (LIMIT 1) from a table where the 4 parameters match fields 
in that table
a. If a record was returned then UPDATE the table
b. If a record was not returned then INSERT into a different table

My main question here is: how can I execute a SELECT id FROM ... LIMIT 1 
statement within a stored procedure, then use the returned id field later in 
the procedure?

Something like this:

@row_id = SELECT id FROM myTable WHERE blah blah LIMIT 1;

IF @row_id != nothing THEN
UPDATE myTable ...
ELSE
INSERT INTO anotherTable ...
END IF

So if no rows were returned from the select I perform the ELSE block, otherwise 
I perform the main IF block.

Thanks!










Re: Quotes around INSERT and SELECT statements' arguments from the mysql CLI and PHP

2011-09-18 Thread Brandon Phelps

Personally I don't use any quotes for the numeric types, and single quotes for 
everything else, i.e.:

UPDATE mytable SET int_field = 5 WHERE id = 3;
SELECT id FROM mytable WHERE int_field = 5;
UPDATE mytable SET varchar_field = 'Test' WHERE id = 3;
SELECT id FROM mytable WHERE varchar_field = 'Test';
UPDATE mytable SET datetime_field = '2011-09-18 00:00:00' WHERE id = 3;

If you are using PHP you may need to escape the single quotes if your PHP 
string is in single quotes:
$query = 'UPDATE mytable SET varchar_field = \'Test\' WHERE id = 3';

But if you are doing interpolation and your string is in double quotes, you 
should not need to escape:
$query = "UPDATE $table_name SET varchar_field = 'Test' WHERE id = 3";

Some people prefer to use backquotes on field names, such as:
$query = "UPDATE `mytable` SET `varchar_field` = 'Test' WHERE `id` = 3";

And some people prefer to put numeric fields in quotes as well, although it is 
not necessary:
UPDATE mytable SET int_field = '5' WHERE id = '3';

On 9/18/11 5:00 AM, Dotan Cohen wrote:

I am somewhat confused as to the proper way to place quotes around
arguments in INSERT and SELECT statements. I also don't see where this
is made explicit in the fine manual.

If the column is type int, is it preferable to use single, double, or
no quotes on INSERT from the mysql cli?
If the column is type int, is it preferable to use single, double, or
no quotes on SELECT from the mysql cli?
If the column is type int, is it preferable to use single, double, or
no quotes on INSERT from PHP?
If the column is type int, is it preferable to use single, double, or
no quotes on SELECT from PHP?
Is it the same for decimal and float?

If the column is type varchar, is it preferable to use single or
double quotes on INSERT from the mysql cli?
If the column is type varchar, is it preferable to use single or
double quotes on SELECT from the mysql cli?
If the column is type varchar, is it preferable to use single or
double quotes on INSERT from PHP?
If the column is type varchar, is it preferable to use single or
double quotes on SELECT from PHP?
Is it the same for text and blob?
Also, in PHP I often see code examples with the variable wrapped in
curly brackets, inside single quotes. What is the purpose of the curly
brackets? Here is such an example:
$query = "INSERT INTO names (name) VALUE ('{$userName}')";

If the column is type datetime, is it preferable to use single or
double quotes on INSERT from the mysql cli?
If the column is type datetime, is it preferable to use single or
double quotes on SELECT from the mysql cli?
If the column is type datetime, is it preferable to use single or
double quotes on INSERT from PHP?
If the column is type datetime, is it preferable to use single or
double quotes on SELECT from PHP?
What if I am using the NOW() function?

If the column is type set, is it preferable to use single or double
quotes on INSERT from the mysql cli?
If the column is type set, is it preferable to use single or double
quotes on SELECT from the mysql cli?
If the column is type set, is it preferable to use single or double
quotes on INSERT from PHP?
If the column is type set, is it preferable to use single or double
quotes on SELECT from PHP?

Thanks.






Re: Query Optimization

2011-09-08 Thread Brandon Phelps

On 09/01/2011 01:32 PM, Brandon Phelps wrote:

On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote:

On 9/1/2011 09:42, Brandon Phelps wrote:

On 09/01/2011 04:59 AM, Jochem van Dieten wrote:
  ...

WHERE
(open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00')
AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')

In that case your logic here simplifies to:
WHERE
open_dt >= '2011-08-30 00:00:00'
AND
close_dt <= '2011-08-30 12:36:53'

Now add an index over open_dt and close_dt and see what happens.

 Jochem

Jochem,

I can't really use your WHERE logic because I also need to retrieve
results where the open_dt time is out of the range specified. For
example, a very large file download might span multiple days so given
your logic if the connection was started 2 days ago and I want to pull 1
days worth of connections, I would miss that entry. Basically I want to
SELECT all of the records that were opened OR closed during the
specified time period, ie. if any activity happened between my start and
end dates, I need to see that record.

Any other ideas?



I believe Jochem was on the right track but he got his dates reversed.

Let's try a little ASCII art to show the situation. I will set up a query window 
with two markers, (s) and (e). Events will be marked by |---| bars showing 
their durations.

a)          (s)                    (e)
b) |---|
c)       |---------|
d)               |-------|
e)        |----------------------------|
f)                          |----------|
g)                                       |---|

To describe these situations:
a) is the window for which you want to query (s) is the starting time and (e) 
is the ending time for the date range you are interested in.
b) the event starts and stops before your window exists. It won't be part of 
your results.
c) the event starts before the window but ends within the window - include this
d) the event starts and ends within the window - include this
e) the event starts before the window and ends after the window - include this
f) the event starts inside the window but ends beyond the window - include this.
g) the event starts and ends beyond the window - exclude this.

In order to get every event in the range of c-f, here is what you need for a 
WHERE clause

WHERE start <= (ending time) AND end >= (starting time)

Try that and let us know the results.
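
The composite index referred to above could be created like this (the table 
and index names are taken from the queries quoted later in the thread):

```sql
CREATE INDEX ndx_open_close_dt ON firewall_connections (open_dt, close_dt);
```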


Thanks Jochem and Shawn, however the following two queries result in the exact 
same EXPLAIN output (I hope the tables don't wrap too early for you):

Old method:
SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM firewall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE
(open_dt >= '2011-08-31 09:53:31' OR close_dt >= '2011-08-31 09:53:31')
AND (open_dt <= '2011-09-01 09:53:31' OR close_dt <= '2011-09-01 09:53:31')
ORDER BY rcvd DESC
LIMIT 0, 10;

New method with BTREE index on open_dt, close_dt (index name is 
ndx_open_close_dt):
SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM firewall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE
open_dt <= '2011-09-01 09:53:31' AND close_dt >= '2011-08-31 09:53:31'
ORDER BY rcvd DESC
LIMIT 0, 10;

EXPLAIN output for old method:
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
| id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
|  1 | SIMPLE      | sc    | index  | open_dt,ndx_open_close_dt | ndx_rcvd | 4       | NULL               |   10 | Using where |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.src_port |    1 |             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.dst_port |    1 |             |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+

EXPLAIN output for new method with new index:
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
| id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
|  1 | SIMPLE      | sc    | index  | open_dt,ndx_open_close_dt | ndx_rcvd | 4       | NULL               |   10 | Using where |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.src_port |    1 |             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.dst_port |    1 |             |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+

Re: Query Optimization

2011-09-08 Thread Brandon Phelps

Thanks for the reply Andy.  Unfortunately the users will be selecting varying 
date ranges and new data is constantly coming in, so I am not sure how I could 
archive/cache the necessary data that would be any more efficient than simply 
using the database directly.


On 09/08/2011 02:16 PM, Andrew Moore wrote:

Thinking outside the query, is there any archiving that could happen to make
your large tables kinder in the range scan?

Andy

On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelpsbphe...@gls.com  wrote:


On 09/01/2011 01:32 PM, Brandon Phelps wrote:


On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote:


On 9/1/2011 09:42, Brandon Phelps wrote:


On 09/01/2011 04:59 AM, Jochem van Dieten wrote:

...



WHERE
(open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00')
AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')

In that case your logic here simplifies to:
WHERE
open_dt >= '2011-08-30 00:00:00'
AND
close_dt <= '2011-08-30 12:36:53'

Now add an index over open_dt and close_dt and see what happens.

Jochem


Jochem,

I can't really use your WHERE logic because I also need to retrieve
results where the open_dt time is out of the range specified. For
example, a very large file download might span multiple days so given
your logic if the connection was started 2 days ago and I want to pull 1
days worth of connections, I would miss that entry. Basically I want to
SELECT all of the records that were opened OR closed during the
specified time period, ie. if any activity happened between my start and
end dates, I need to see that record.

Any other ideas?



I believe Jochem was on the right track but he got his dates reversed.

Let's try a little ASCII art to show the situation. I will setup a query
window with two markers (s) and (e). Events will be marked by || markers
showing their durations.

a)          (s)                    (e)
b) |---|
c)       |---------|
d)               |-------|
e)        |----------------------------|
f)                          |----------|
g)                                       |---|

To describe these situations:
a) is the window for which you want to query (s) is the starting time and
(e) is the ending time for the date range you are interested in.
b) the event starts and stops before your window exists. It won't be part
of your results.
c) the event starts before the window but ends within the window -
include this
d) the event starts and ends within the window - include this
e) the event starts before the window and ends after the window - include
this
f) the event starts inside the window but ends beyond the window -
include this.
g) the event starts and ends beyond the window - exclude this.

In order to get every event in the range of c-f, here is what you need
for a WHERE clause

WHERE start <= (ending time) AND end >= (starting time)

Try that and let us know the results.



Thanks Jochem and Shawn, however the following two queries result in the
exact same EXPLAIN output: (I hope the tables don't wrap too early for you)

Old method:
SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM firewall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE
(open_dt >= '2011-08-31 09:53:31' OR close_dt >= '2011-08-31 09:53:31')
AND (open_dt <= '2011-09-01 09:53:31' OR close_dt <= '2011-09-01 09:53:31')
ORDER BY rcvd DESC
LIMIT 0, 10;

New method with BTREE index on open_dt, close_dt (index name is
ndx_open_close_dt):
SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM firewall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE
open_dt <= '2011-09-01 09:53:31' AND close_dt >= '2011-08-31 09:53:31'
ORDER BY rcvd DESC
LIMIT 0, 10;

EXPLAIN output for old method:
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
| id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
|  1 | SIMPLE      | sc    | index  | open_dt,ndx_open_close_dt | ndx_rcvd | 4       | NULL               |   10 | Using where |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.src_port |    1 |             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.dst_port |    1 |             |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+

EXPLAIN output for new method with new index

Re: Query Optimization

2011-09-08 Thread Brandon Phelps

Andy,

The queries take minutes to run.  MySQL is 5.1.54 and it's running on Ubuntu 
server 11.04.  Unfortunately the machine only has 2GB of RAM but no other major 
daemons are running on the machine.  We are running RAID 1 (mirroring) with 1TB 
drives.  The tables in question here are all MyISAM.  When running with the 
LIMIT 10 my EXPLAIN is:

+----+-------------+-------+--------+----------------------+---------------------+---------+--------------------+----------+-----------------------------+
| id | select_type | table | type   | possible_keys        | key                 | key_len | ref                | rows     | Extra                       |
+----+-------------+-------+--------+----------------------+---------------------+---------+--------------------+----------+-----------------------------+
|  1 | SIMPLE      | sc    | range  | ndx_open_close_rcvd  | ndx_open_close_rcvd | 8       | NULL               | 32393316 | Using where; Using filesort |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY              | PRIMARY             | 2       | syslog.sc.src_port |        1 |                             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY              | PRIMARY             | 2       | syslog.sc.dst_port |        1 |                             |
+----+-------------+-------+--------+----------------------+---------------------+---------+--------------------+----------+-----------------------------+

When I remove the LIMIT 10 I get:

+----+-------------+-------+--------+----------------------+---------+---------+--------------------+----------+-----------------------------+
| id | select_type | table | type   | possible_keys        | key     | key_len | ref                | rows     | Extra                       |
+----+-------------+-------+--------+----------------------+---------+---------+--------------------+----------+-----------------------------+
|  1 | SIMPLE      | sc    | ALL    | ndx_open_close_rcvd  | NULL    | NULL    | NULL               | 32393330 | Using where; Using filesort |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY              | PRIMARY | 2       | syslog.sc.src_port |        1 |                             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY              | PRIMARY | 2       | syslog.sc.dst_port |        1 |                             |
+----+-------------+-------+--------+----------------------+---------+---------+--------------------+----------+-----------------------------+

Thanks for all your help thus far.

On 09/08/2011 02:38 PM, Andrew Moore wrote:

I don't think I saw any query timings in the emails (maybe I missed them).

What version of MySQL are you currently using?
What does the explain look like when your remove the limit 10?
Is your server tuned for MyISAM or InnoDB?
What kind of disk setup is in use?
How much memory is in your machine?


On Thu, Sep 8, 2011 at 7:27 PM, Brandon Phelpsbphe...@gls.com  wrote:


Thanks for the reply Andy.  Unfortunately the users will be selecting
varying date ranges and new data is constantly coming in, so I am not sure
how I could archive/cache the necessary data that would be any more
efficient than simply using the database directly.



On 09/08/2011 02:16 PM, Andrew Moore wrote:


Thinking outside the query, is there any archiving that could happen to
make
your large tables kinder in the range scan?

Andy

On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelpsbphe...@gls.com   wrote:

  On 09/01/2011 01:32 PM, Brandon Phelps wrote:


  On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote:


  On 9/1/2011 09:42, Brandon Phelps wrote:


  On 09/01/2011 04:59 AM, Jochem van Dieten wrote:



...





WHERE
(open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00')
AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')

In that case your logic here simplifies to:
WHERE
open_dt >= '2011-08-30 00:00:00'
AND
close_dt <= '2011-08-30 12:36:53'

Now add an index over open_dt and close_dt and see what happens.

Jochem




Jochem,

I can't really use your WHERE logic because I also need to retrieve
results where the open_dt time is out of the range specified. For
example, a very large file download might span multiple days so given
your logic if the connection was started 2 days ago and I want to pull
1 days worth of connections, I would miss that entry. Basically I want
to SELECT all of the records that were opened OR closed during the
specified time period, ie. if any activity happened between my start
and end dates, I need to see that record.

Any other ideas?


  I believe Jochem was on the right track but he got his dates

reversed.

Let's try a little ASCII art to show the situation. I will setup a query
window with two markers (s) and (e). Events will be marked by |---| markers
showing their durations.

a)        (s)              (e)
b) |---|
c)     |------|
d)            |-----|
e)   |--------------------------|
f)                    |--------|
g)                             |------|

To describe these situations:
a) is the window for which you want to query (s) is the starting

Re: Query Optimization

2011-09-08 Thread Brandon Phelps

Mihail,

I have considered this but have not yet determined how best to go about 
partitioning the table.  I don't think partitioning by dst_address or 
src_address would help because most of the queries do not filter on IP address 
(except very specific queries where the end-user is searching the table for 
history on a particular employee).

I could potentially partition the table based on the day of week the connection 
was opened on which may improve performance for a while since this would take 
me from a single 32million record table down to roughly 4.5 million records per 
partition (32 / 7) however we are looking to eventually store up to 2 months 
worth of data in the table, and so far that 32 million records is only for 1 
month, so I estimate another 32 million-ish before the month is out, bringing 
me to roughly 70 million records total (it would be nice if I could store even 
more than 2 months, but given my currently performance dilemma I don't expect 
that to happen).  Also this does not take into account that the end-user will 
often be pulling data for multiple days at a time, meaning that multiple 
partitions in this scheme will need to be accessed anyway.

The only other logical partitioning scheme I can think of would be to partition 
based on dst_port (the port the log relates to) but the majority of records are 
all to port 80 (standard web traffic) so I don't think this would be all that 
helpful.

I have never messed with partitioning so it is totally possible that I am not 
thinking of something, so if you have any ideas on a decent partitioning scheme 
based on my criteria and queries below, please let me know.

Thanks,
Brandon
 


On 09/08/2011 02:47 PM, Mihail Manolov wrote:

If you're running version 5.1+ you may wanna take a look at table partitioning 
options you may have.

On Sep 8, 2011, at 2:27 PM, Brandon Phelps wrote:


Thanks for the reply Andy.  Unfortunately the users will be selecting varying 
date ranges and new data is constantly coming in, so I am not sure how I could 
archive/cache the necessary data that would be any more efficient than simply 
using the database directly.


On 09/08/2011 02:16 PM, Andrew Moore wrote:

Thinking outside the query, is there any archiving that could happen to make
your large tables kinder in the range scan?

Andy

On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelpsbphe...@gls.com   wrote:


On 09/01/2011 01:32 PM, Brandon Phelps wrote:


On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote:


On 9/1/2011 09:42, Brandon Phelps wrote:


On 09/01/2011 04:59 AM, Jochem van Dieten wrote:

...



WHERE
(open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00')
AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')


In that case your logic here simplifies to:
WHERE
open_dt >= '2011-08-30 00:00:00'
AND
close_dt <= '2011-08-30 12:36:53'



Now add an index over open_dt and close_dt and see what happens.



Jochem


Jochem,

I can't really use your WHERE logic because I also need to retrieve
results where the open_dt time is out of the range specified. For
example, a very large file download might span multiple days so given
your logic if the connection was started 2 days ago and I want to pull 1
days worth of connections, I would miss that entry. Basically I want to
SELECT all of the records that were opened OR closed during the
specified time period, ie. if any activity happened between my start and
end dates, I need to see that record.

Any other ideas?



I believe Jochem was on the right track but he got his dates reversed.

Let's try a little ASCII art to show the situation. I will setup a query
window with two markers (s) and (e). Events will be marked by |---| markers
showing their durations.

a)        (s)              (e)
b) |---|
c)     |------|
d)            |-----|
e)   |--------------------------|
f)                    |--------|
g)                             |------|

To describe these situations:
a) is the window for which you want to query (s) is the starting time and (e) is the ending time for the date range you are interested in.
b) the event starts and stops before your window exists. It won't be part of your results.
c) the event starts before the window but ends within the window - include this
d) the event starts and ends within the window - include this
e) the event starts before the window and ends after the window - include this
f) the event starts inside the window but ends beyond the window - include this.
g) the event starts and ends beyond the window - exclude this.

In order to get every event in the range of c-f, here is what you need
for a WHERE clause

WHERE start <= (ending time) and end >= (starting time)

Try that and let us know the results.



Thanks Jochem and Shawn, however the following two queries result in the
exact same EXPLAIN output: (I hope the tables don't wrap too early for you)

Old method:
SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc

Re: Query Optimization

2011-09-08 Thread Brandon Phelps

Thanks for the idea Derek, however given the following query my EXPLAIN output 
is identical:

SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE ('2011-09-07 13:18:58' <= open_dt <= '2011-09-08 13:18:58') OR
('2011-09-07 13:18:58' <= close_dt <= '2011-09-08 13:18:58');


+----+-------------+-------+--------+---------------+---------+---------+--------------------+----------+-------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref                | rows     | Extra       |
+----+-------------+-------+--------+---------------+---------+---------+--------------------+----------+-------------+
|  1 | SIMPLE      | sc    | ALL    | NULL          | NULL    | NULL    | NULL               | 32393330 | Using where |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY       | PRIMARY | 2       | syslog.sc.src_port |        1 |             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY       | PRIMARY | 2       | syslog.sc.dst_port |        1 |             |
+----+-------------+-------+--------+---------------+---------+---------+--------------------+----------+-------------+

I did create indexes on open_dt and close_dt (2 separate indexes).



On 09/08/2011 02:55 PM, Derek Downey wrote:

Correct me if I'm wrong. You're wanting to get all records that have an 
open_date or a close_date between two times.

If that's correct, you might be able to get an index_merge by doing a query 
like:

WHERE ((starting time) <= open_dt <= (ending time)) OR ((starting time) <= close_dt <= (ending time))

and creating two indexes (one on 'open_dt' and the other on 'close_dt')

http://dev.mysql.com/doc/refman/5.1/en/index-merge-optimization.html
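
For reference, the two single-column indexes suggested above could be created along these lines (a sketch only; the index names are illustrative, though `ndx_close_dt` does appear in later EXPLAIN output in this thread):

```sql
-- Hypothetical DDL for the two single-column indexes described above.
ALTER TABLE sonicwall_connections
  ADD INDEX ndx_open_dt  (open_dt),
  ADD INDEX ndx_close_dt (close_dt);
```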

Regards,
Derek

On Sep 8, 2011, at 2:50 PM, Brandon Phelps wrote:


Andy,

The queries take minutes to run.  MySQL is 5.1.54 and it's running on Ubuntu 
server 11.04.  Unfortunately the machine only has 2GB of RAM but no other major 
daemons are running on the machine.  We are running RAID 1 (mirroring) with 1TB 
drives.  The tables in question here are all MyISAM.  When running with the 
LIMIT 10 my EXPLAIN is:

+----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
| id | select_type | table | type   | possible_keys       | key                 | key_len | ref                | rows     | Extra                       |
+----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
|  1 | SIMPLE      | sc    | range  | ndx_open_close_rcvd | ndx_open_close_rcvd | 8       | NULL               | 32393316 | Using where; Using filesort |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.src_port |        1 |                             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.dst_port |        1 |                             |
+----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+

When I remove the LIMIT 10 I get:

+----+-------------+-------+--------+---------------------+---------+---------+--------------------+----------+-----------------------------+
| id | select_type | table | type   | possible_keys       | key     | key_len | ref                | rows     | Extra                       |
+----+-------------+-------+--------+---------------------+---------+---------+--------------------+----------+-----------------------------+
|  1 | SIMPLE      | sc    | ALL    | ndx_open_close_rcvd | NULL    | NULL    | NULL               | 32393330 | Using where; Using filesort |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY             | PRIMARY | 2       | syslog.sc.src_port |        1 |                             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY             | PRIMARY | 2       | syslog.sc.dst_port |        1 |                             |
+----+-------------+-------+--------+---------------------+---------+---------+--------------------+----------+-----------------------------+

Thanks for all your help thus far.

On 09/08/2011 02:38 PM, Andrew Moore wrote:

I don't think I saw any query timings in the emails (maybe I missed them).

What version of MySQL are you currently using?
What does the explain look like when you remove the limit 10?
Is your server tuned for MyISAM or InnoDB?
What kind of disk setup is in use?
How much memory is in your machine?

Re: Query Optimization

2011-09-08 Thread Brandon Phelps

Andrew,

Generally there is only 1 user performing the complicated SELECT query at a time, however 
the background process that fills the table is constantly doing a fast SELECT (0.3 
seconds) and a subsequent UPDATE.  Basically whenever a connection is closed on the 
firewall, the bg process SELECTS from the table the last record that was opened (and 
hasn't already been closed) for a given source IP, source port, dest ip, and protocol.  
So for example whenever the firewall logs a CONNECTION CLOSED message, the bg 
process does:

select id from sonicwall_connections where src_address = w.x.y.z and src_port = 
x and dst_address = w.x.y.z and protocol = x ORDER BY open_dt LIMIT 0, 1

then it UPDATES the close_dt column for the record with the selected ID.  These 
select/update statements being run by the background process generally take 
0.000x seconds each.
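
A hedged sketch of that select/update pair, with made-up literal values standing in for the logged w.x.y.z / x placeholders:

```sql
-- Illustrative only: the real values come from the firewall's
-- CONNECTION CLOSED log line, not these constants.
SELECT id
FROM sonicwall_connections
WHERE src_address = INET_ATON('10.1.2.3')
  AND src_port = 52000
  AND dst_address = INET_ATON('10.4.5.6')
  AND protocol = 6
ORDER BY open_dt
LIMIT 0, 1;

-- Then close out the matched record (id taken from the SELECT above):
UPDATE sonicwall_connections SET close_dt = NOW() WHERE id = 42;
```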

The only slowdown we see is with the complicated SELECT I have below, and this 
is only executed once in a while throughout the day.  For example, if an IT 
manager sees a huge bandwidth spike on the network, he may access the web gui 
to determine what the source IP is so he can determine who is downloading a 
large file, etc.

I think what I will try to do is create 60 partitions for the table in question 
based on month and day.  This way each day will go into its own partition and 
if someone runs the query for a date range such as 01/01/2011 - 01/07/2011 it 
will only need to access 7 partitions instead of the entire table.

My only question with this is how would I go about creating the table with 60 
partitions in such a way that I won't need to constantly add/remove new/old 
partitions every day?  Is there any partitioning function I can use that would 
not require me to update the partitions schema every day?  I already plan to 
have a cron run each night to purge records older than 60 days from the 
database.
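
One possibility, sketched here as an untested assumption for MySQL 5.1+: HASH over TO_DAYS() cycles through a fixed set of partitions as the day number advances, so no daily DDL is required. Note that the partitioning column must be part of every unique key, which changes the primary key:

```sql
-- Sketch only: day-based rotation without daily schema changes.
-- Column list is abbreviated; the real table keeps all original columns.
CREATE TABLE sonicwall_connections_p (
  id int unsigned NOT NULL AUTO_INCREMENT,
  open_dt datetime NOT NULL,
  close_dt datetime NOT NULL,
  -- ... remaining columns as in the original table ...
  PRIMARY KEY (id, open_dt)   -- partition column must be in every unique key
) ENGINE=MyISAM
PARTITION BY HASH (TO_DAYS(open_dt))
PARTITIONS 60;
```

Whether the optimizer prunes partitions for a multi-day date range under HASH partitioning is version-dependent, so this would need testing before relying on it.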

On 09/08/2011 03:26 PM, Andrew Moore wrote:

Partitioning isn't a bad idea for this however I'm still thinking about your
dataset size and possible hardware limitations. It's not likely going to fit
into relevant buffers/memory so you're going to be on disk more then you
want. You're probably creating temporary tables like crazy and I would bet
that there are a good load of them heading to disk too. With your IO
performance limited to a small amount of disks as you describe, you're not
going to be able to get much more from these queries. Although a dedicated
DB server are there other databases been accessed on the server? When
looking at the scope of your data, are you capturing more then you need? How
often and how far back are the users querying? How many users concurrently
performing queries on the 32m record table?

On Thu, Sep 8, 2011 at 8:04 PM, Brandon Phelpsbphe...@gls.com  wrote:


Mihail,

I have considered this but have not yet determined how best to go about
partitioning the table.  I don't think partitioning by dst_address or
src_address would help because most of the queries do not filter on IP
address (except very specific queries where the end-user is searching the
table for history on a particular employee).

I could potentially partition the table based on the day of week the
connection was opened on which may improve performance for a while since
this would take me from a single 32million record table down to roughly 4.5
million records per partition (32 / 7) however we are looking to eventually
store up to 2 months worth of data in the table, and so far that 32 million
records is only for 1 month, so I estimate another 32 million-ish before the
month is out, bringing me to roughly 70 million records total (it would be
nice if I could store even more than 2 months, but given my currently
performance dilemma I don't expect that to happen).  Also this does not take
into account that the end-user will often be pulling data for multiple days
at a time, meaning that multiple partitions in this scheme will need to be
accessed anyway.

The only other logical partitioning scheme I can think of would be to
partition based on dst_port (the port the log relates to) but the majority
of records are all to port 80 (standard web traffic) so I don't think this
would be all that helpful.

I have never messed with partitioning so it is totally possible that I am
not thinking of something, so if you have any ideas on a decent partitioning
scheme based on my criteria and queries below, please let me know.

Thanks,
Brandon


On 09/08/2011 02:47 PM, Mihail Manolov wrote:


If you're running version 5.1+ you may wanna take a look at table
partitioning options you may have.

On Sep 8, 2011, at 2:27 PM, Brandon Phelps wrote:

  Thanks for the reply Andy.  Unfortunately the users will be selecting

varying date ranges and new data is constantly coming in, so I am not sure
how I could archive/cache the necessary data that would be any more
efficient than simply using the database directly.


On 09/08/2011 02:16 PM, Andrew Moore wrote

Re: Query Optimization

2011-09-08 Thread Brandon Phelps

Mihail,

Thanks so much!  I modified your example to include the proper ORDER BY and 
LIMIT clauses and this, so far, is running super fast (0.0007 seconds).  
Question, if a record's open_dt is between the range AND the close_dt is 
between the range as well, will the UNION output the record twice?  If so, is 
there any way to prevent that?

(SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58')
UNION
(SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58')
ORDER BY rcvd DESC LIMIT 10;

+------+--------------+------------+--------+--------------------------------------------+--------------+---------+--------------------+------+----------------+
| id   | select_type  | table      | type   | possible_keys                              | key          | key_len | ref                | rows | Extra          |
+------+--------------+------------+--------+--------------------------------------------+--------------+---------+--------------------+------+----------------+
|  1   | PRIMARY      | sc         | range  | open_dt,ndx_open_close_rcvd,ndx_open_close | open_dt      | 8       | NULL               | 1057 | Using where    |
|  1   | PRIMARY      | spm        | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.src_port |    1 |                |
|  1   | PRIMARY      | dpm        | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.dst_port |    1 |                |
|  2   | UNION        | sc         | range  | ndx_close_dt                               | ndx_close_dt | 8       | NULL               | 1131 | Using where    |
|  2   | UNION        | spm        | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.src_port |    1 |                |
|  2   | UNION        | dpm        | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.dst_port |    1 |                |
| NULL | UNION RESULT | <union1,2> | ALL    | NULL                                       | NULL         | NULL    | NULL               | NULL | Using filesort |
+------+--------------+------------+--------+--------------------------------------------+--------------+---------+--------------------+------+----------------+



On 09/08/2011 03:45 PM, Mihail Manolov wrote:

How about:

SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'

UNION

SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'



On Sep 8, 2011, at 3:25 PM, Brandon Phelps wrote:


Thanks for the idea Derek, however given the following query my EXPLAIN output 
is identical:

SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings

Re: Query Optimization

2011-09-08 Thread Brandon Phelps

Ah I see.  Well thanks for your assistance!

-Brandon

On 09/08/2011 05:21 PM, Mihail Manolov wrote:

 From the manual: The default behavior for UNION is that duplicate rows are removed 
from the result.
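
(An editorial aside: if that implicit duplicate-removal pass ever became a bottleneck, UNION ALL skips it. A hedged sketch, trimmed to the id column for brevity, with the second arm excluding rows the first arm already matched:)

```sql
-- UNION ALL keeps duplicates, so rows whose open_dt AND close_dt both
-- fall in the range must be excluded from one arm; illustrative only.
(SELECT id FROM sonicwall_connections
 WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58')
UNION ALL
(SELECT id FROM sonicwall_connections
 WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'
   AND open_dt NOT BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58')
ORDER BY id DESC LIMIT 10;
```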

On Sep 8, 2011, at 4:50 PM, Brandon Phelps wrote:


Mihail,

Thanks so much!  I modified your example to include the proper ORDER BY and 
LIMIT clauses and this, so far, is running super fast (0.0007 seconds).  
Question, if a record's open_dt is between the range AND the close_dt is 
between the range as well, will the UNION output the record twice?  If so, is 
there any way to prevent that?

(SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58')
UNION
(SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58')
ORDER BY rcvd DESC LIMIT 10;

+------+--------------+------------+--------+--------------------------------------------+--------------+---------+--------------------+------+----------------+
| id   | select_type  | table      | type   | possible_keys                              | key          | key_len | ref                | rows | Extra          |
+------+--------------+------------+--------+--------------------------------------------+--------------+---------+--------------------+------+----------------+
|  1   | PRIMARY      | sc         | range  | open_dt,ndx_open_close_rcvd,ndx_open_close | open_dt      | 8       | NULL               | 1057 | Using where    |
|  1   | PRIMARY      | spm        | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.src_port |    1 |                |
|  1   | PRIMARY      | dpm        | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.dst_port |    1 |                |
|  2   | UNION        | sc         | range  | ndx_close_dt                               | ndx_close_dt | 8       | NULL               | 1131 | Using where    |
|  2   | UNION        | spm        | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.src_port |    1 |                |
|  2   | UNION        | dpm        | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.dst_port |    1 |                |
| NULL | UNION RESULT | <union1,2> | ALL    | NULL                                       | NULL         | NULL    | NULL               | NULL | Using filesort |
+------+--------------+------------+--------+--------------------------------------------+--------------+---------+--------------------+------+----------------+



On 09/08/2011 03:45 PM, Mihail Manolov wrote:

How about:

SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'

UNION

SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'



On Sep 8, 2011, at 3:25 PM, Brandon Phelps wrote:


Thanks for the idea Derek, however given the following query my EXPLAIN output 
is identical:

SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address

Re: Query Optimization

2011-09-01 Thread Brandon Phelps

On 09/01/2011 04:59 AM, Jochem van Dieten wrote:
  SELECT
 sc.open_dt,
 sc.close_dt,
 sc.protocol,
 INET_NTOA( sc.src_address ) AS src_address,
 sc.src_port,
 INET_NTOA( sc.dst_address ) AS dst_address,
 sc.dst_port,
 sc.sent,
 sc.rcvd,
 spm.desc AS src_port_desc,
 dpm.desc AS dst_port_desc
  FROM firewall_connections AS sc

 If this is a firewall connection log I presume open_dt is the time a
 connection was opened and is always going to be less than close_dt.
 Right?

  WHERE
 (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00')
 AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')


 In that case your logic here simplifies to:
 WHERE
   open_dt >= '2011-08-30 00:00:00'
   AND
   close_dt <= '2011-08-30 12:36:53'

 Now add an index over open_dt and close_dt and see what happens.

 Jochem

Jochem,

I can't really use your WHERE logic because I also need to retrieve 
results where the open_dt time is out of the range specified.  For 
example, a very large file download might span multiple days so given 
your logic if the connection was started 2 days ago and I want to pull 1 
days worth of connections, I would miss that entry.  Basically I want to 
SELECT all of the records that were opened OR closed during the 
specified time period, ie. if any activity happened between my start and 
end dates, I need to see that record.


Any other ideas?

Thanks again,
Brandon

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql?unsub=arch...@jab.org



Re: Query Optimization

2011-09-01 Thread Brandon Phelps

On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote:

On 9/1/2011 09:42, Brandon Phelps wrote:

On 09/01/2011 04:59 AM, Jochem van Dieten wrote:
  ...

  WHERE
  (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00')
  AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')

 In that case your logic here simplifies to:
 WHERE
 open_dt >= '2011-08-30 00:00:00'
 AND
 close_dt <= '2011-08-30 12:36:53'

 Now add an index over open_dt and close_dt and see what happens.

 Jochem

Jochem,

I can't really use your WHERE logic because I also need to retrieve
results where the open_dt time is out of the range specified. For
example, a very large file download might span multiple days so given
your logic if the connection was started 2 days ago and I want to pull 1
days worth of connections, I would miss that entry. Basically I want to
SELECT all of the records that were opened OR closed during the
specified time period, ie. if any activity happened between my start and
end dates, I need to see that record.

Any other ideas?



I believe Jochem was on the right track but he got his dates reversed.

Let's try a little ASCII art to show the situation. I will setup a query
window with two markers (s) and (e). Events will be marked by |---| markers
showing their durations.

a)        (s)              (e)
b) |---|
c)     |------|
d)            |-----|
e)   |--------------------------|
f)                    |--------|
g)                             |------|

To describe these situations:
a) is the window for which you want to query (s) is the starting time and (e) 
is the ending time for the date range you are interested in.
b) the event starts and stops before your window exists. It won't be part of 
your results.
c) the event starts before the window but ends within the window - include this
d) the event starts and ends within the window - include this
e) the event starts before the window and ends after the window - include this
f) the event starts inside the window but ends beyond the window - include this.
g) the event starts and ends beyond the window - exclude this.

In order to get every event in the range of c-f, here is what you need for a 
WHERE clause

WHERE start <= (ending time) and end >= (starting time)

Try that and let us know the results.
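
Written out against the poster's schema, that overlap test would look something like the following sketch (dates illustrative):

```sql
-- Events c through f all satisfy: started before the window ends
-- AND ended after the window starts.
SELECT open_dt, close_dt
FROM firewall_connections
WHERE open_dt  <= '2011-08-30 12:36:53'
  AND close_dt >= '2011-08-30 00:00:00';
```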


Thanks Jochem and Shawn, however the following two queries result in the exact 
same EXPLAIN output:   (I hope the tables don't wrap too early for you)

Old method:
SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM firewall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE
(open_dt >= '2011-08-31 09:53:31' OR close_dt >= '2011-08-31 09:53:31')
AND (open_dt <= '2011-09-01 09:53:31' OR close_dt <= '2011-09-01 09:53:31')
ORDER BY rcvd DESC
LIMIT 0, 10;

New method with BTREE index on open_dt, close_dt (index name is 
ndx_open_close_dt):
SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM firewall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE
open_dt <= '2011-09-01 09:53:31' AND close_dt >= '2011-08-31 09:53:31'
ORDER BY rcvd DESC
LIMIT 0, 10;

EXPLAIN output for old method:
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
| id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
|  1 | SIMPLE      | sc    | index  | open_dt,ndx_open_close_dt | ndx_rcvd | 4       | NULL               |   10 | Using where |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.src_port |    1 |             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.dst_port |    1 |             |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+

EXPLAIN output for new method with new index:
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
| id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra

Query Optimization

2011-08-30 Thread Brandon Phelps

Hello,

I have the following query I'd like to optimize a bit:

SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA( sc.src_address ) AS src_address,
sc.src_port,
INET_NTOA( sc.dst_address ) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM firewall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE
(open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00')
AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')
ORDER BY rcvd DESC
LIMIT 0 , 10

Currently I have an index on the rcvd column which seems to be working 
based on the output of EXPLAIN:


id  select_type  table  type    possible_keys  key       key_len  ref                 rows  Extra
1   SIMPLE       sc     index   open_dt        ndx_rcvd  4        NULL                10    Using where
1   SIMPLE       spm    eq_ref  PRIMARY        PRIMARY   2        syslog.sc.src_port  1
1   SIMPLE       dpm    eq_ref  PRIMARY        PRIMARY   2        syslog.sc.dst_port  1

However the query is still fairly slow for some reason, any ideas how I 
could speed it up with additional indexes, etc?


The values I am using in the WHERE clause are variable and change each time.

The table has around 23 million records right now but is expected to 
continue to grow up to a potential 150 million.


Here is the table schema:
CREATE TABLE IF NOT EXISTS `firewall_connections` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `open_dt` datetime NOT NULL,
  `close_dt` datetime NOT NULL,
  `protocol` smallint(6) NOT NULL,
  `src_address` int(10) unsigned NOT NULL,
  `src_port` smallint(5) unsigned NOT NULL,
  `dst_address` int(10) unsigned NOT NULL,
  `dst_port` smallint(5) unsigned NOT NULL,
  `sent` int(10) unsigned NOT NULL,
  `rcvd` int(10) unsigned NOT NULL,
  PRIMARY KEY (`id`),
  KEY `ndx_rcvd` (`rcvd`),
  KEY `ndx_sent` (`sent`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
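
(For readers unfamiliar with the address columns above: they store IPv4 addresses as unsigned integers, and MySQL's built-in INET_ATON()/INET_NTOA() pair, used throughout the queries in this thread, converts in each direction:)

```sql
SELECT INET_ATON('192.168.1.10');  -- 3232235786
SELECT INET_NTOA(3232235786);      -- '192.168.1.10'
```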

Thanks in advance!

--
Brandon




Query Optimization

2011-08-10 Thread Brandon Phelps

Hello all,

I am using the query below and variations of it to query a database with 
a TON of records.  Currently the database has around 11 million records 
but it grows every day and should cap out at around 150 million.


I am curious if there is any way I can better optimize the below query, 
as currently it takes this query around 10 seconds to run but I am sure 
this will get slower and slower as the database grows.


SELECT
open_dt,
close_dt,
protocol,
INET_NTOA(src_address) AS src_address,
src_port,
INET_NTOA(dst_address) AS dst_address,
dst_port,
sent,
rcvd
FROM connections
WHERE
dst_port = 80
ORDER BY close_dt  DESC
LIMIT 0, 30

I do have an index on the dst_port column, as you can see by the output 
of EXPLAIN:


id             1
select_type    SIMPLE
table          connections
type           ref
possible_keys  dst_port
key            dst_port
key_len        2
ref            const
rows           1109273
Extra          Using where; Using filesort

Thanks in advance,

--
Brandon




Re: Query Optimization

2011-08-10 Thread Brandon Phelps
Thanks Singer,  this took my query down to 0.0007, perfect!  I wasn't 
aware a single index of multiple columns would work when one of the 
columns was in the WHERE clause and the other in the ORDER BY clause. 
Learn something new every day I guess!
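
The fix being described can be sketched as a single composite index on the `connections` table (the index name here is made up):

```sql
-- WHERE column first, ORDER BY column second, so MySQL can satisfy
-- both the filter and the sort from the index and skip the filesort.
ALTER TABLE connections ADD INDEX ndx_dst_port_close_dt (dst_port, close_dt);
```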



On 08/10/2011 02:03 PM, Singer X.J. Wang wrote:

Try a index on (dst_port,close_dt)

On Wed, Aug 10, 2011 at 14:01, Brandon Phelps bphe...@gls.com wrote:

Hello all,

I am using the query below and variations of it to query a database
with a TON of records.  Currently the database has around 11 million
records but it grows every day and should cap out at around 150 million.

I am curious if there is any way I can better optimize the below
query, as currently it takes this query around 10 seconds to run but
I am sure this will get slower and slower as the database grows.

SELECT
open_dt,
close_dt,
protocol,
INET_NTOA(src_address) AS src_address,
src_port,
INET_NTOA(dst_address) AS dst_address,
dst_port,
sent,
rcvd
FROM connections
WHERE
dst_port = 80
ORDER BY close_dt  DESC
LIMIT 0, 30

I do have an index on the dst_port column, as you can see by the
output of EXPLAIN:

id             1
select_type    SIMPLE
table          connections
type           ref
possible_keys  dst_port
key            dst_port
key_len        2
ref            const
rows           1109273
Extra          Using where; Using filesort

Thanks in advance,

--
Brandon

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=w...@singerwang.com


--
The best compliment you could give Pythian for our service is a referral.


