Re: Newbie: problem installing on Mac OS X 10.6
On 11/5/2011 12:32 PM, Roger CPL wrote:

I installed MySQL using the .dmg package. I can use System Preferences to start and stop the program. When I open a terminal window, which I think is the way to use the program, any MySQL command I enter is not recognized. Thanks, Roger

You will need to first start the mysql CLI (command line interface) using the mysql command. You will likely need to supply correct credentials as well; try mysql --help for details. In short, you cannot simply enter SQL commands from a shell and expect them to work.

-Brandon

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=arch...@jab.org
Re: Initial install instructions for mac lion/macports
Is there any reason why you are using ports and not the native 64-bit DMG from mysql.com? http://www.mysql.com/downloads/mysql/#downloads

I run the latest version (5.5.15) on my MacBook running Lion and the install goes without a hitch.

Brandon

On 10/10/2011 07:34 PM, Tim Johnson wrote:

I am familiar with mysql on linux, as of late - ubuntu. I am installing mysql on a Mac mini with the Lion (OS X 10.7) operating system. mysql-server was downloaded and built with MacPorts. Following the included instructions has not resulted in a successful startup. I have recorded error messages, but it may suffice if I were pointed to initial setup instructions for Mac Lion. So far, google has not edified me. TIA
Re: Initial install instructions for mac lion/macports
I only use ports for a very small number of apps, such as irssi, newer versions of perl, etc. I was recommending the OP use the DMG installer, as it comes with everything necessary, including startup scripts.

-Brandon

On 10/11/2011 10:05 AM, Vladislav Geller wrote:

Hi Brandon,

MacPorts is practically dead. I think you will have more luck with http://mxcl.github.com/homebrew/ - I have not built mysql with it on Lion, though.

Regards,
Vladislav

On Tue, Oct 11, 2011 at 3:56 PM, Brandon Phelps bphe...@gls.com wrote:

[earlier messages in the thread quoted; trimmed]
Re: Initial install instructions for mac lion/macports
Tim,

Just a reminder, as I am not sure whether it is documented: after you get MySQL up and running via the DMG package, be sure to install the System Preferences pane (it didn't used to be installed by default; not sure if it is now), which should be one of the icons you see when the DMG first opens, if I recall. Once the preference pane is installed you can open it and enable the Startup option so that MySQL starts whenever the system is booted. I am not sure what your requirements are, but without these steps, if your machine loses power while you are away and is subsequently rebooted, you will not be able to access the database until you manually start the service.

Hope this helps.

-Brandon

On 10/11/2011 11:29 AM, Tim Johnson wrote:

* Tim Johnson t...@akwebsoft.com [111010 15:37]:

[earlier question quoted; trimmed]

:) Last night before turning in, I thought that maybe I should have investigated the DMG option. It is clear from the respondents that I should do this. Thanks a lot. I will uninstall via MacPorts and use the package as recommended.
Re: Weird UPDATE Problem
That query looks fine. What error are you getting if you execute the query from the CLI? Also, is it possible that the s_id or owed columns are no longer numeric data types? If the column(s) are now a character type, you would need to put the values in quotes.

-Brandon

On 10/10/2011 09:14 AM, D. Marshall Lemcoe Jr. wrote:

Hello all. I have recently finished my migration from an older server to a newer server running RHEL 6. The MySQL version went from 5.0.77 to 5.1.52. In my application, this query used to work just fine:

$paid_query = mysql_query("UPDATE $table_name SET owed = 0 WHERE s_id = $student");

where $table_name was mysql_real_escape_string("collection_41_students"). With the new MySQL version, the UPDATE query does not work, and echo mysql_error(); outputs nothing. Is my syntax correct going from version to version?

Thanks in advance,
David
Re: MySQL Indexes
This thread has sparked my interest. What is the difference between an index on (field_a, field_b) and an index on (field_b, field_a)?

On 10/06/2011 07:43 PM, Nuno Tavares wrote:

Neil, whenever you see multiple fields you'd like to index, you should consider, at least:

* The frequency of each query;
* The occurrences of the same field in multiple queries;
* The cardinality of each field;

There is a tool, Index Analyzer, that may give you some hints, and I think it's maatkit that has a tool to run against a query log to find good candidates - I've seen it somewhere, I believe. Just remember that idx_a(field_a,field_b) is not the same as, and is not considered for use the same way as, idx_b(field_b,field_a).

-NT

On 07-10-2011 00:22, Michael Dykman wrote:

Only one index at a time can be used per query, so neither strategy is optimal. You need to look at the queries you intend to run against the system and construct indexes which support them.

- md

On Thu, Oct 6, 2011 at 2:35 PM, Neil Tompkins neil.tompk...@googlemail.com wrote:

Maybe that was a bad example. If the query was name = 'Red', what index should I create? Should I create an index of all columns used in each query, or have an index on each individual column?

On 6 Oct 2011, at 17:28, Michael Dykman mdyk...@gmail.com wrote:

For the first query, the obvious index on score will give you optimal results. The second query is founded on this phrase: LIKE '%Red%', and no index will help you there. This is an anti-pattern, I am afraid. The only way your database can satisfy that expression is to test each and every record in the database (the test itself being expensive, as infix matching is iterative).
Perhaps you should consider this approach instead: http://dev.mysql.com/doc/refman/5.5/en/fulltext-natural-language.html

On Thu, Oct 6, 2011 at 10:59 AM, Tompkins Neil neil.tompk...@googlemail.com wrote:

Hi, can anyone help and offer some advice regarding MySQL indexes? Basically we have a number of different tables, all of which have the obvious primary keys. We then have some queries using JOIN statements that run more slowly than we wanted. How many indexes are recommended per table? For example, should I have an index on all fields that will be used in a WHERE clause? Should the indexes be created with multiple fields? An example of two basic queries:

SELECT auto_id, name, score FROM test_table WHERE score > 10 ORDER BY score DESC

SELECT auto_id, name, score FROM test_table WHERE score > 10 AND name LIKE '%Red%' ORDER BY score DESC

How many indexes should be created for these two queries?

Thanks,
Neil

--
- michael dykman - mdyk...@gmail.com

May the Source be with you.
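[Editor's note] The left-prefix behavior discussed in this thread can be made concrete with a small runnable sketch. All table, column, and index names here are invented for illustration, and SQLite is used only because its EXPLAIN QUERY PLAN makes the effect easy to observe; MySQL's B-tree indexes obey the same leftmost-prefix rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (field_a INTEGER, field_b INTEGER, payload TEXT)")
conn.execute("CREATE INDEX idx_a ON t (field_a, field_b)")
conn.executemany("INSERT INTO t VALUES (?, ?, 'x')",
                 [(i % 100, i % 97) for i in range(1000)])

def plan(sql):
    # EXPLAIN QUERY PLAN reports whether an index is searched
    # or the whole table is scanned.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Filtering on the leading column: the (field_a, field_b) index is usable.
print(plan("SELECT payload FROM t WHERE field_a = 3"))
# Filtering on the trailing column only: the index cannot be seeked, so the
# engine falls back to scanning every row.
print(plan("SELECT payload FROM t WHERE field_b = 3"))
```

This is why idx_a(field_a, field_b) and idx_b(field_b, field_a) are not interchangeable: each one only helps queries that constrain its leading column.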
Re: FW: MySQL Indexes
That cleared it up for me. Thanks!

On 10/07/2011 03:06 PM, Jerry Schwartz wrote:

-----Original Message-----
From: Reindl Harald [mailto:h.rei...@thelounge.net]
Sent: Friday, October 07, 2011 12:21 PM
To: mysql@lists.mysql.com
Subject: Re: MySQL Indexes

but could this not be called a bug?

[JS] No. Think of two telephone books: one is sorted by first name, last name, and the other is sorted by last name, first name. (Those are like your two keys, f1/f2 and f2/f1.) If you want to find someone by their first name, you use the first book. If you want to find somebody by their last name, you use the second book. If you want to find someone by their last name, the first book (key f1/f2) is useless. If you want to find someone by their first name, the second book (f2/f1) is useless. Does that help explain it?

Regards,

Jerry Schwartz
Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032
860.674.8796 / FAX: 860.674.8341
E-mail: je...@gii.co.jp
Web site: www.giiresearch.com

On 07.10.2011 18:08, Michael Dykman wrote:

When a query selects on field_a and field_b, that index can be used. If querying on field_a alone, the index again is useful. Query on field_b alone, however, and that first index is of no use to you.

On Fri, Oct 7, 2011 at 10:49 AM, Brandon Phelps bphe...@gls.com wrote:

This thread has sparked my interest. What is the difference between an index on (field_a, field_b) and an index on (field_b, field_a)?
[remainder of quoted thread trimmed]
Stored Procedure Question
Hello all,

I would like to create a stored procedure that does the following:

1. Accepts 4 values as parameters
2. SELECTs 1 record (LIMIT 1) from a table where the 4 parameters match fields in that table
   a. If a record was returned, then UPDATE the table
   b. If a record was not returned, then INSERT into a different table

My main question here is: how can I execute a SELECT id FROM ... LIMIT 1 statement within a stored procedure, then use the returned id field later in the procedure? Something like this:

@row_id = SELECT id FROM myTable WHERE blah blah LIMIT 1;
IF @row_id != nothing THEN
    UPDATE myTable ...
ELSE
    INSERT INTO anotherTable ...
END IF

So if no rows were returned from the SELECT I perform the ELSE block; otherwise I perform the main IF block. Thanks!
Re: Stored Procedure Question [SOLVED]
Ah ha! Thanks Derek. I thought INTO was used strictly for inserting the selected records into another table. Much appreciated.

On 09/21/2011 02:34 PM, Derek Downey wrote:

SELECT id INTO @row_id FROM myTable WHERE blah blah LIMIT 1;

Source: http://dev.mysql.com/doc/refman/5.5/en/select-into-statement.html

On Sep 21, 2011, at 2:23 PM, Brandon Phelps wrote:

[original question quoted; trimmed]
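[Editor's note] The check-then-branch flow of the procedure can be sketched in client code as well. This is an illustrative Python/sqlite3 translation of the same logic, not the MySQL stored-procedure syntax (which uses SELECT ... INTO @var and IF ... THEN as the thread shows); all table and column names below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER PRIMARY KEY, a INT, b INT, hits INT DEFAULT 0)")
conn.execute("CREATE TABLE anotherTable (a INT, b INT)")
conn.execute("INSERT INTO myTable (a, b, hits) VALUES (1, 2, 0)")

def select_then_branch(a, b):
    # Same shape as the procedure: fetch at most one matching id,
    # then branch on whether a row came back.
    row = conn.execute(
        "SELECT id FROM myTable WHERE a = ? AND b = ? LIMIT 1", (a, b)).fetchone()
    if row is not None:
        conn.execute("UPDATE myTable SET hits = hits + 1 WHERE id = ?", (row[0],))
    else:
        conn.execute("INSERT INTO anotherTable (a, b) VALUES (?, ?)", (a, b))

select_then_branch(1, 2)  # matches: the existing row gets updated
select_then_branch(9, 9)  # no match: a row lands in anotherTable
```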
Re: Quotes around INSERT and SELECT statements' arguments from the mysql CLI and PHP
Personally I don't use any quotes for the numeric types, and single quotes for everything else, i.e.:

UPDATE mytable SET int_field = 5 WHERE id = 3;
SELECT id FROM mytable WHERE int_field = 5;
UPDATE mytable SET varchar_field = 'Test' WHERE id = 3;
SELECT id FROM mytable WHERE varchar_field = 'Test';
UPDATE mytable SET datetime_field = '2011-09-18 00:00:00' WHERE id = 3;

If you are using PHP you may need to escape the single quotes if your PHP string is in single quotes:

$query = 'UPDATE mytable SET varchar_field = \'Test\' WHERE id = 3';

But if you are doing interpolation and your string is in double quotes, you should not need to escape:

$query = "UPDATE $table_name SET varchar_field = 'Test' WHERE id = 3";

Some people prefer to use backquotes around field names, such as:

$query = "UPDATE `mytable` SET `varchar_field` = 'Test' WHERE `id` = 3";

And some people prefer to put numeric values in quotes as well, although it is not necessary:

UPDATE mytable SET int_field = '5' WHERE id = '3';

On 9/18/11 5:00 AM, Dotan Cohen wrote:

I am somewhat confused as to the proper way to place quotes around arguments in INSERT and SELECT statements. I also don't see where this is made explicit in the fine manual.

If the column is type int, is it preferable to use single, double, or no quotes on INSERT from the mysql CLI? On SELECT from the mysql CLI? On INSERT from PHP? On SELECT from PHP? Is it the same for decimal and float?

If the column is type varchar, is it preferable to use single or double quotes on INSERT from the mysql CLI? On SELECT from the mysql CLI? On INSERT from PHP?
If the column is type varchar, is it preferable to use single or double quotes on SELECT from PHP? Is it the same for text and blob?

Also, in PHP I often see code examples with the variable wrapped in curly brackets, inside single quotes. What is the purpose of the curly brackets? Here is such an example:

$query = "INSERT INTO names (name) VALUE ('{$userName}')";

If the column is type datetime, is it preferable to use single or double quotes on INSERT from the mysql CLI? On SELECT from the mysql CLI? On INSERT from PHP? On SELECT from PHP? What if I am using the NOW() function?

If the column is type set, is it preferable to use single or double quotes on INSERT from the mysql CLI? On SELECT from the mysql CLI? On INSERT from PHP? On SELECT from PHP?

Thanks.
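[Editor's note] An approach that sidesteps the quoting question entirely is to use parameterized queries, where the driver quotes and escapes each value for you based on its type (in PHP this would mean mysqli or PDO prepared statements rather than string interpolation). A minimal sketch using Python's sqlite3 driver, with an invented table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (id INTEGER PRIMARY KEY, name TEXT, age INT)")

# With placeholders the single-vs-double-vs-no-quotes question never comes up,
# and a value containing a quote character is handled safely.
conn.execute("INSERT INTO names (name, age) VALUES (?, ?)", ("O'Brien", 42))
row = conn.execute("SELECT name, age FROM names WHERE name = ?", ("O'Brien",)).fetchone()
print(row)  # ("O'Brien", 42)
```

This also removes the injection risk that manual quoting and escaping tries to paper over.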
Re: Query Optimization
On 09/01/2011 01:32 PM, Brandon Phelps wrote:

On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote:

On 9/1/2011 09:42, Brandon Phelps wrote:

On 09/01/2011 04:59 AM, Jochem van Dieten wrote:

... WHERE (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00') AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')

In that case your logic here simplifies to:

WHERE open_dt >= '2011-08-30 00:00:00' AND close_dt <= '2011-08-30 12:36:53'

Now add an index over open_dt and close_dt and see what happens.

Jochem

Jochem, I can't really use your WHERE logic because I also need to retrieve results where the open_dt time is out of the range specified. For example, a very large file download might span multiple days, so given your logic, if the connection was started 2 days ago and I want to pull 1 day's worth of connections, I would miss that entry. Basically I want to SELECT all of the records that were opened OR closed during the specified time period, i.e. if any activity happened between my start and end dates, I need to see that record. Any other ideas?

I believe Jochem was on the right track but he got his dates reversed. Let's try a little ASCII art to show the situation. I will set up a query window with two markers, (s) and (e). Events will be marked by |---| markers showing their durations.

a)           (s)                    (e)
b)  |---|
c)         |---|
d)                 |---|
e)       |------------------------------|
f)                                   |---|
g)                                          |---|

To describe these situations:

a) is the window for which you want to query. (s) is the starting time and (e) is the ending time of the date range you are interested in.
b) the event starts and stops before your window exists. It won't be part of your results.
c) the event starts before the window but ends within the window - include this
d) the event starts and ends within the window - include this
e) the event starts before the window and ends after the window - include this
f) the event starts inside the window but ends beyond the window - include this
g) the event starts and ends beyond the window - exclude this

In order to get every event in the range of c-f, here is what you need for a WHERE clause:

WHERE start <= (ending time) AND end >= (starting time)

Try that and let us know the results.

Thanks Jochem and Shawn, however the following two queries result in the exact same EXPLAIN output: (I hope the tables don't wrap too early for you)

Old method:

SELECT
    sc.open_dt, sc.close_dt, sc.protocol,
    INET_NTOA(sc.src_address) AS src_address, sc.src_port,
    INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port,
    sc.sent, sc.rcvd,
    spm.desc AS src_port_desc, dpm.desc AS dst_port_desc
FROM firewall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE (open_dt >= '2011-08-31 09:53:31' OR close_dt >= '2011-08-31 09:53:31')
  AND (open_dt <= '2011-09-01 09:53:31' OR close_dt <= '2011-09-01 09:53:31')
ORDER BY rcvd DESC
LIMIT 0, 10;

New method with BTREE index on (open_dt, close_dt) (index name is ndx_open_close_dt):

SELECT
    sc.open_dt, sc.close_dt, sc.protocol,
    INET_NTOA(sc.src_address) AS src_address, sc.src_port,
    INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port,
    sc.sent, sc.rcvd,
    spm.desc AS src_port_desc, dpm.desc AS dst_port_desc
FROM firewall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE open_dt <= '2011-09-01 09:53:31' AND close_dt >= '2011-08-31 09:53:31'
ORDER BY rcvd DESC
LIMIT 0, 10;

EXPLAIN output for old method:

+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
| id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
|  1 | SIMPLE      | sc    | index  | open_dt,ndx_open_close_dt | ndx_rcvd | 4       | NULL               |   10 | Using where |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.src_port |    1 |             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.dst_port |    1 |             |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+

EXPLAIN output for new method with new index:

+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
| id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
|  1 | SIMPLE      | sc    | index  | open_dt,ndx_open_close_dt | ndx_rcvd | 4       | NULL               |   10 | Using where |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.src_port |    1 |             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.dst_port |    1 |             |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
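[Editor's note] Shawn's window logic is easy to sanity-check outside MySQL. The sketch below uses an in-memory SQLite database with invented sample connections purely to demonstrate that the predicate open_dt <= (window end) AND close_dt >= (window start) selects exactly the overlapping events, cases c through f in the ASCII art above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE conns (id INTEGER PRIMARY KEY, open_dt TEXT, close_dt TEXT)")
rows = [
    (1, "2011-08-29 00:00", "2011-08-29 01:00"),  # ends before window  -> excluded (case b)
    (2, "2011-08-30 23:00", "2011-08-31 10:00"),  # straddles the start -> included (case c)
    (3, "2011-08-31 12:00", "2011-08-31 13:00"),  # inside the window   -> included (case d)
    (4, "2011-08-30 00:00", "2011-09-02 00:00"),  # spans the window    -> included (case e)
    (5, "2011-09-01 12:00", "2011-09-01 13:00"),  # starts after window -> excluded (case g)
]
conn.executemany("INSERT INTO conns VALUES (?, ?, ?)", rows)

start, end = "2011-08-31 09:53:31", "2011-09-01 09:53:31"
# Overlap test: the event began before the window ended
# AND ended after the window began.
hits = [r[0] for r in conn.execute(
    "SELECT id FROM conns WHERE open_dt <= ? AND close_dt >= ? ORDER BY id",
    (end, start))]
print(hits)  # [2, 3, 4]
```

Note that string comparison works here only because the timestamps are in a fixed, sortable format; in MySQL the same predicate applies directly to DATETIME columns.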
Re: Query Optimization
Thanks for the reply Andy. Unfortunately the users will be selecting varying date ranges and new data is constantly coming in, so I am not sure how I could archive/cache the necessary data in a way that would be any more efficient than simply using the database directly.

On 09/08/2011 02:16 PM, Andrew Moore wrote:

Thinking outside the query, is there any archiving that could happen to make your large tables kinder in the range scan?

Andy

On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelps bphe...@gls.com wrote:

[earlier messages in the thread quoted; trimmed]
Re: Query Optimization
Andy,

The queries take minutes to run. MySQL is 5.1.54 and it's running on Ubuntu Server 11.04. Unfortunately the machine only has 2GB of RAM, but no other major daemons are running on it. We are running RAID 1 (mirroring) with 1TB drives. The tables in question here are all MyISAM.

When running with the LIMIT 10 my EXPLAIN is:

+----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
| id | select_type | table | type   | possible_keys       | key                 | key_len | ref                | rows     | Extra                       |
+----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
|  1 | SIMPLE      | sc    | range  | ndx_open_close_rcvd | ndx_open_close_rcvd | 8       | NULL               | 32393316 | Using where; Using filesort |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.src_port |        1 |                             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.dst_port |        1 |                             |
+----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+

When I remove the LIMIT 10 I get:

+----+-------------+-------+--------+---------------------+---------+---------+--------------------+----------+-----------------------------+
| id | select_type | table | type   | possible_keys       | key     | key_len | ref                | rows     | Extra                       |
+----+-------------+-------+--------+---------------------+---------+---------+--------------------+----------+-----------------------------+
|  1 | SIMPLE      | sc    | ALL    | ndx_open_close_rcvd | NULL    | NULL    | NULL               | 32393330 | Using where; Using filesort |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY             | PRIMARY | 2       | syslog.sc.src_port |        1 |                             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY             | PRIMARY | 2       | syslog.sc.dst_port |        1 |                             |
+----+-------------+-------+--------+---------------------+---------+---------+--------------------+----------+-----------------------------+

Thanks for all your help thus far.

On 09/08/2011 02:38 PM, Andrew Moore wrote:

I don't think I saw any query timings in the emails (maybe I missed them). What version of MySQL are you currently using? What does the explain look like when you remove the LIMIT 10? Is your server tuned for MyISAM or InnoDB? What kind of disk setup is in use? How much memory is in your machine?

On Thu, Sep 8, 2011 at 7:27 PM, Brandon Phelps bphe...@gls.com wrote:

Thanks for the reply Andy. Unfortunately the users will be selecting varying date ranges and new data is constantly coming in, so I am not sure how I could archive/cache the necessary data that would be any more efficient than simply using the database directly.
[remainder of quoted thread trimmed]
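[Editor's note] The "Using filesort" in the EXPLAIN above, versus the earlier plan that read the ndx_rcvd index and stopped at the LIMIT, reflects a general trade-off: an index on the ORDER BY column lets the engine return rows already sorted and quit after 10, while any other access path forces a sort of every matching row first. The same effect can be illustrated with SQLite, whose planner reports a temp B-tree when it must sort; table and index names here are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sc (id INTEGER PRIMARY KEY, rcvd INTEGER, open_dt TEXT)")
conn.executemany("INSERT INTO sc (rcvd, open_dt) VALUES (?, ?)",
                 [(i * 37 % 1000, "2011-09-01") for i in range(500)])

def plan(sql):
    return " ".join(r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT id, rcvd FROM sc ORDER BY rcvd DESC LIMIT 10"
before = plan(query)  # no index on rcvd: a sort step (SQLite's "filesort") is required
conn.execute("CREATE INDEX ndx_rcvd ON sc (rcvd)")
after = plan(query)   # the index hands back rows already ordered; no sort step
print(before)
print(after)
```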
Re: Query Optimization
Mihail, I have considered this but have not yet determined how best to go about partitioning the table. I don't think partitioning by dst_address or src_address would help because most of the queries do not filter on IP address (except very specific queries where the end-user is searching the table for history on a particular employee). I could potentially partition the table based on the day of the week the connection was opened on, which may improve performance for a while since this would take me from a single 32 million record table down to roughly 4.5 million records per partition (32 / 7), however we are looking to eventually store up to 2 months worth of data in the table, and so far that 32 million records is only for 1 month, so I estimate another 32 million-ish before the month is out, bringing me to roughly 70 million records total (it would be nice if I could store even more than 2 months, but given my current performance dilemma I don't expect that to happen). Also this does not take into account that the end-user will often be pulling data for multiple days at a time, meaning that multiple partitions in this scheme will need to be accessed anyway. The only other logical partitioning scheme I can think of would be to partition based on dst_port (the port the log relates to), but the majority of records are all to port 80 (standard web traffic) so I don't think this would be all that helpful. I have never messed with partitioning so it is totally possible that I am not thinking of something, so if you have any ideas on a decent partitioning scheme based on my criteria and queries below, please let me know. Thanks, Brandon On 09/08/2011 02:47 PM, Mihail Manolov wrote: If you're running version 5.1+ you may wanna take a look at table partitioning options you may have. On Sep 8, 2011, at 2:27 PM, Brandon Phelps wrote: Thanks for the reply Andy.
Unfortunately the users will be selecting varying date ranges and new data is constantly coming in, so I am not sure how I could archive/cache the necessary data in any way that would be more efficient than simply using the database directly. On 09/08/2011 02:16 PM, Andrew Moore wrote: Thinking outside the query, is there any archiving that could happen to make your large tables kinder in the range scan? Andy On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelps bphe...@gls.com wrote: On 09/01/2011 01:32 PM, Brandon Phelps wrote: On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote: On 9/1/2011 09:42, Brandon Phelps wrote: On 09/01/2011 04:59 AM, Jochem van Dieten wrote: ... WHERE (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00') AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53') In that case your logic here simplifies to: WHERE open_dt >= '2011-08-30 00:00:00' AND close_dt <= '2011-08-30 12:36:53' Now add an index over open_dt and close_dt and see what happens. Jochem Jochem, I can't really use your WHERE logic because I also need to retrieve results where the open_dt time is out of the range specified. For example, a very large file download might span multiple days, so given your logic, if the connection was started 2 days ago and I want to pull 1 day's worth of connections, I would miss that entry. Basically I want to SELECT all of the records that were opened OR closed during the specified time period, i.e. if any activity happened between my start and end dates, I need to see that record. Any other ideas? I believe Jochem was on the right track but he got his dates reversed. Let's try a little ASCII art to show the situation. I will set up a query window with two markers (s) and (e). Events will be marked by |---| markers showing their durations.
a)           (s)              (e)
b)  |---|
c)        |-------|
d)                  |---|
e)     |---------------------------|
f)                         |-------|
g)                                   |---|

To describe these situations:
a) is the window for which you want to query (s) is the starting time and (e) is the ending time for the date range you are interested in.
b) the event starts and stops before your window exists. It won't be part of your results.
c) the event starts before the window but ends within the window - include this
d) the event starts and ends within the window - include this
e) the event starts before the window and ends after the window - include this
f) the event starts inside the window but ends beyond the window - include this.
g) the event starts and ends beyond the window - exclude this.
In order to get every event in the range of c-f, here is what you need for a WHERE clause: WHERE start <= (ending time) AND end >= (starting time) Try that and let us know the results. Thanks Jochem and Shawn, however the following two queries result in the exact same EXPLAIN output: (I hope the tables don't wrap too early for you) Old method: SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc
Re: Query Optimization
Thanks for the idea Derek, however given the following query my EXPLAIN output is identical:

SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE ('2011-09-07 13:18:58' <= open_dt <= '2011-09-08 13:18:58')
OR ('2011-09-07 13:18:58' <= close_dt <= '2011-09-08 13:18:58');

+----+-------------+-------+--------+---------------+---------+---------+--------------------+----------+-------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref                | rows     | Extra       |
+----+-------------+-------+--------+---------------+---------+---------+--------------------+----------+-------------+
|  1 | SIMPLE      | sc    | ALL    | NULL          | NULL    | NULL    | NULL               | 32393330 | Using where |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY       | PRIMARY | 2       | syslog.sc.src_port |        1 |             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY       | PRIMARY | 2       | syslog.sc.dst_port |        1 |             |
+----+-------------+-------+--------+---------------+---------+---------+--------------------+----------+-------------+

I did create indexes on open_dt and close_dt (2 separate indexes). On 09/08/2011 02:55 PM, Derek Downey wrote: Correct me if I'm wrong. You're wanting to get all records that have an open_date or a close_date between two times. If that's correct, you might be able to get an index_merge by doing a query like: WHERE ((starting time) <= open_dt <= (ending time)) OR ((starting time) <= close_dt <= (ending time)) and creating two indexes (one on 'open_dt' and the other on 'close_dt') http://dev.mysql.com/doc/refman/5.1/en/index-merge-optimization.html Regards, Derek On Sep 8, 2011, at 2:50 PM, Brandon Phelps wrote: Andy, The queries take minutes to run. MySQL is 5.1.54 and it's running on Ubuntu server 11.04. Unfortunately the machine only has 2GB of RAM, but no other major daemons are running on the machine. We are running RAID 1 (mirroring) with 1TB drives. The tables in question here are all MyISAM.
When running with the LIMIT 10 my EXPLAIN is:

+----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
| id | select_type | table | type   | possible_keys       | key                 | key_len | ref                | rows     | Extra                       |
+----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
|  1 | SIMPLE      | sc    | range  | ndx_open_close_rcvd | ndx_open_close_rcvd | 8       | NULL               | 32393316 | Using where; Using filesort |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.src_port |        1 |                             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.dst_port |        1 |                             |
+----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+

When I remove the LIMIT 10 I get:

+----+-------------+-------+--------+---------------------+------+---------+--------------------+----------+-----------------------------+
| id | select_type | table | type   | possible_keys       | key  | key_len | ref                | rows     | Extra                       |
+----+-------------+-------+--------+---------------------+------+---------+--------------------+----------+-----------------------------+
|  1 | SIMPLE      | sc    | ALL    | ndx_open_close_rcvd | NULL | NULL    | NULL               | 32393330 | Using where; Using filesort |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY             | PRIMARY | 2    | syslog.sc.src_port |        1 |                             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY             | PRIMARY | 2    | syslog.sc.dst_port |        1 |                             |
+----+-------------+-------+--------+---------------------+------+---------+--------------------+----------+-----------------------------+

Thanks for all your help thus far. On 09/08/2011 02:38 PM, Andrew Moore wrote: I don't think I saw any query timings in the emails (maybe I missed them). What version of MySQL are you currently using? What does the EXPLAIN look like when you remove the LIMIT 10? Is your server tuned for MyISAM or InnoDB? What kind of disk setup is in use? How much memory
Re: Query Optimization
Andrew, Generally there is only 1 user performing the complicated SELECT query at a time, however the background process that fills the table is constantly doing a fast SELECT (0.3 seconds) and a subsequent UPDATE. Basically whenever a connection is closed on the firewall, the bg process SELECTS from the table the last record that was opened (and hasn't already been closed) for a given source IP, source port, dest IP, and protocol. So for example whenever the firewall logs a CONNECTION CLOSED message, the bg process does: select id from sonicwall_connections where src_address = w.x.y.z and src_port = x and dst_address = w.x.y.z and protocol = x ORDER BY open_dt LIMIT 0, 1 then it UPDATES the close_dt column for the record with the selected ID. These select/update statements being run by the background process generally take 0.000x seconds each. The only slowdown we see is with the complicated SELECT I have below, and this is only executed once in a while throughout the day. For example, if an IT manager sees a huge bandwidth spike on the network, he may access the web gui to determine what the source IP is so he can determine who is downloading a large file, etc. I think what I will try to do is create 60 partitions for the table in question based on month and day. This way each day will go into its own partition and if someone runs the query for a date range such as 01/01/2011 - 01/07/2011 it will only need to access 7 partitions instead of the entire table. My only question with this is how would I go about creating the table with 60 partitions in such a way that I won't need to constantly add/remove new/old partitions every day? Is there any partitioning function I can use that would not require me to update the partitions schema every day? I already plan to have a cron run each night to purge records older than 60 days from the database.
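One maintenance-free way to get the "60 rotating day buckets" effect is to hash the day number rather than enumerate dates, so rows cycle through a fixed set of partitions as days pass. This is only a hedged sketch (MySQL 5.1+ assumed, table and column names taken from this thread); note that 5.1 partitioning requires every unique key, including the primary key, to contain the partitioning column, so a bare PRIMARY KEY (id) would first have to become (id, open_dt):

```sql
-- Sketch only: 60 fixed buckets keyed on the day number of open_dt.
-- TO_DAYS() is one of the functions 5.1 can use for partition pruning.
ALTER TABLE sonicwall_connections
    PARTITION BY HASH ( TO_DAYS(open_dt) )
    PARTITIONS 60;

-- A one-day range such as
--   WHERE open_dt BETWEEN '2011-01-01 00:00:00' AND '2011-01-01 23:59:59'
-- should then touch a single partition, and a 7-day range seven of them,
-- with no nightly ADD/DROP PARTITION maintenance required.
```

The nightly purge of 60-day-old rows would still need its DELETE (HASH partitions cannot be dropped by date the way RANGE partitions can), so RANGE on TO_DAYS(open_dt) with a rotating maintenance job is the usual alternative when cheap purges matter more than zero maintenance.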
On 09/08/2011 03:26 PM, Andrew Moore wrote: Partitioning isn't a bad idea for this, however I'm still thinking about your dataset size and possible hardware limitations. It's not likely going to fit into the relevant buffers/memory, so you're going to be on disk more than you want. You're probably creating temporary tables like crazy and I would bet that there are a good load of them heading to disk too. With your IO performance limited to a small number of disks as you describe, you're not going to be able to get much more from these queries. Although it is a dedicated DB server, are there other databases being accessed on the server? When looking at the scope of your data, are you capturing more than you need? How often and how far back are the users querying? How many users are concurrently performing queries on the 32m record table? On Thu, Sep 8, 2011 at 8:04 PM, Brandon Phelps bphe...@gls.com wrote: Mihail, I have considered this but have not yet determined how best to go about partitioning the table. I don't think partitioning by dst_address or src_address would help because most of the queries do not filter on IP address (except very specific queries where the end-user is searching the table for history on a particular employee). I could potentially partition the table based on the day of the week the connection was opened on, which may improve performance for a while since this would take me from a single 32 million record table down to roughly 4.5 million records per partition (32 / 7), however we are looking to eventually store up to 2 months worth of data in the table, and so far that 32 million records is only for 1 month, so I estimate another 32 million-ish before the month is out, bringing me to roughly 70 million records total (it would be nice if I could store even more than 2 months, but given my current performance dilemma I don't expect that to happen).
Also this does not take into account that the end-user will often be pulling data for multiple days at a time, meaning that multiple partitions in this scheme will need to be accessed anyway. The only other logical partitioning scheme I can think of would be to partition based on dst_port (the port the log relates to) but the majority of records are all to port 80 (standard web traffic) so I don't think this would be all that helpful. I have never messed with partitioning so it is totally possible that I am not thinking of something, so if you have any ideas on a decent partitioning scheme based on my criteria and queries below, please let me know. Thanks, Brandon On 09/08/2011 02:47 PM, Mihail Manolov wrote: If you're running version 5.1+ you may wanna take a look at table partitioning options you may have. On Sep 8, 2011, at 2:27 PM, Brandon Phelps wrote: Thanks for the reply Andy. Unfortunately the users will be selecting varying date ranges and new data is constantly coming in, so I am not sure how I could archive/cache the necessary data that would be any more efficient than simply using the database directly. On 09/08/2011 02:16 PM, Andrew Moore wrote
Re: Query Optimization
Mihail, Thanks so much! I modified your example to include the proper ORDER BY and LIMIT clauses and this, so far, is running super fast (0.0007 seconds). Question: if a record's open_dt is between the range AND the close_dt is between the range as well, will the UNION output the record twice? If so, is there any way to prevent that?

(SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58')
UNION
(SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58')
ORDER BY rcvd DESC LIMIT 10;

+------+--------------+------------+--------+--------------------------------------------+--------------+---------+--------------------+------+----------------+
| id   | select_type  | table      | type   | possible_keys                              | key          | key_len | ref                | rows | Extra          |
+------+--------------+------------+--------+--------------------------------------------+--------------+---------+--------------------+------+----------------+
|    1 | PRIMARY      | sc         | range  | open_dt,ndx_open_close_rcvd,ndx_open_close | open_dt      | 8       | NULL               | 1057 | Using where    |
|    1 | PRIMARY      | spm        | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.src_port |    1 |                |
|    1 | PRIMARY      | dpm        | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.dst_port |    1 |                |
|    2 | UNION        | sc         | range  | ndx_close_dt                               | ndx_close_dt | 8       | NULL               | 1131 | Using where    |
|    2 | UNION        | spm        | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.src_port |    1 |                |
|    2 | UNION        | dpm        | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.dst_port |    1 |                |
| NULL | UNION RESULT | <union1,2> | ALL    | NULL                                       | NULL         | NULL    | NULL               | NULL | Using filesort |
+------+--------------+------------+--------+--------------------------------------------+--------------+---------+--------------------+------+----------------+

On 09/08/2011 03:45 PM, Mihail Manolov wrote: How about: SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM sonicwall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58' UNION SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM sonicwall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58' On Sep 8, 2011, at 3:25 PM, Brandon Phelps wrote: Thanks for the idea Derek, however given the following query my EXPLAIN output is identical: SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM sonicwall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings
Re: Query Optimization
Ah I see. Well thanks for your assistance! -Brandon On 09/08/2011 05:21 PM, Mihail Manolov wrote: From the manual: "The default behavior for UNION is that duplicate rows are removed from the result." On Sep 8, 2011, at 4:50 PM, Brandon Phelps wrote: Mihail, Thanks so much! I modified your example to include the proper ORDER BY and LIMIT clauses and this, so far, is running super fast (0.0007 seconds). Question: if a record's open_dt is between the range AND the close_dt is between the range as well, will the UNION output the record twice? If so, is there any way to prevent that?

(SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58')
UNION
(SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58')
ORDER BY rcvd DESC LIMIT 10;

+------+--------------+------------+--------+--------------------------------------------+--------------+---------+--------------------+------+----------------+
| id   | select_type  | table      | type   | possible_keys                              | key          | key_len | ref                | rows | Extra          |
+------+--------------+------------+--------+--------------------------------------------+--------------+---------+--------------------+------+----------------+
|    1 | PRIMARY      | sc         | range  | open_dt,ndx_open_close_rcvd,ndx_open_close | open_dt      | 8       | NULL               | 1057 | Using where    |
|    1 | PRIMARY      | spm        | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.src_port |    1 |                |
|    1 | PRIMARY      | dpm        | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.dst_port |    1 |                |
|    2 | UNION        | sc         | range  | ndx_close_dt                               | ndx_close_dt | 8       | NULL               | 1131 | Using where    |
|    2 | UNION        | spm        | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.src_port |    1 |                |
|    2 | UNION        | dpm        | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.dst_port |    1 |                |
| NULL | UNION RESULT | <union1,2> | ALL    | NULL                                       | NULL         | NULL    | NULL               | NULL | Using filesort |
+------+--------------+------------+--------+--------------------------------------------+--------------+---------+--------------------+------+----------------+

On 09/08/2011 03:45 PM, Mihail Manolov wrote: How about: SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM sonicwall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58' UNION SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM sonicwall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58' On Sep 8, 2011, at 3:25 PM, Brandon Phelps wrote: Thanks for the idea Derek, however given the following query my EXPLAIN output is identical: SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address
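Mihail's manual quote is easy to demonstrate in isolation: a row produced by both halves of a UNION comes out once, while UNION ALL would keep both copies (at the cost of skipping the duplicate-removal pass). A minimal illustration:

```sql
-- UNION applies an implicit DISTINCT across the combined result:
SELECT 1 AS n UNION SELECT 1;        -- returns a single row: 1
-- UNION ALL skips duplicate removal (and its extra sort/temp table):
SELECT 1 AS n UNION ALL SELECT 1;    -- returns two rows: 1, 1
```

For the query in this thread, UNION ALL would be slightly cheaper but could show a connection twice when both its open_dt and close_dt fall inside the window, so plain UNION is the right choice here.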
Re: Query Optimization
On 09/01/2011 04:59 AM, Jochem van Dieten wrote: SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA( sc.src_address ) AS src_address, sc.src_port, INET_NTOA( sc.dst_address ) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM firewall_connections AS sc If this is a firewall connection log, I presume open_dt is the time a connection was opened and is always going to be less than close_dt. Right? WHERE (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00') AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53') In that case your logic here simplifies to: WHERE open_dt >= '2011-08-30 00:00:00' AND close_dt <= '2011-08-30 12:36:53' Now add an index over open_dt and close_dt and see what happens. Jochem Jochem, I can't really use your WHERE logic because I also need to retrieve results where the open_dt time is out of the range specified. For example, a very large file download might span multiple days, so given your logic, if the connection was started 2 days ago and I want to pull 1 day's worth of connections, I would miss that entry. Basically I want to SELECT all of the records that were opened OR closed during the specified time period, i.e. if any activity happened between my start and end dates, I need to see that record. Any other ideas? Thanks again, Brandon -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql?unsub=arch...@jab.org
Re: Query Optimization
On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote: On 9/1/2011 09:42, Brandon Phelps wrote: On 09/01/2011 04:59 AM, Jochem van Dieten wrote: ... WHERE (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00') AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53') In that case your logic here simplifies to: WHERE open_dt >= '2011-08-30 00:00:00' AND close_dt <= '2011-08-30 12:36:53' Now add an index over open_dt and close_dt and see what happens. Jochem Jochem, I can't really use your WHERE logic because I also need to retrieve results where the open_dt time is out of the range specified. For example, a very large file download might span multiple days, so given your logic, if the connection was started 2 days ago and I want to pull 1 day's worth of connections, I would miss that entry. Basically I want to SELECT all of the records that were opened OR closed during the specified time period, i.e. if any activity happened between my start and end dates, I need to see that record. Any other ideas? I believe Jochem was on the right track but he got his dates reversed. Let's try a little ASCII art to show the situation. I will set up a query window with two markers (s) and (e). Events will be marked by |---| markers showing their durations.

a)           (s)              (e)
b)  |---|
c)        |-------|
d)                  |---|
e)     |---------------------------|
f)                         |-------|
g)                                   |---|

To describe these situations:
a) is the window for which you want to query (s) is the starting time and (e) is the ending time for the date range you are interested in.
b) the event starts and stops before your window exists. It won't be part of your results.
c) the event starts before the window but ends within the window - include this
d) the event starts and ends within the window - include this
e) the event starts before the window and ends after the window - include this
f) the event starts inside the window but ends beyond the window - include this.
g) the event starts and ends beyond the window - exclude this.
In order to get every event in the range of c-f, here is what you need for a WHERE clause: WHERE start <= (ending time) AND end >= (starting time) Try that and let us know the results. Thanks Jochem and Shawn, however the following two queries result in the exact same EXPLAIN output: (I hope the tables don't wrap too early for you)

Old method:

SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc
FROM firewall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE (open_dt >= '2011-08-31 09:53:31' OR close_dt >= '2011-08-31 09:53:31')
AND (open_dt <= '2011-09-01 09:53:31' OR close_dt <= '2011-09-01 09:53:31')
ORDER BY rcvd DESC LIMIT 0, 10;

New method with BTREE index on open_dt, close_dt (index name is ndx_open_close_dt):

SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc
FROM firewall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE open_dt <= '2011-09-01 09:53:31' AND close_dt >= '2011-08-31 09:53:31'
ORDER BY rcvd DESC LIMIT 0, 10;

EXPLAIN output for old method:

+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
| id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
|  1 | SIMPLE      | sc    | index  | open_dt,ndx_open_close_dt | ndx_rcvd | 4       | NULL               |   10 | Using where |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.src_port |    1 |             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.dst_port |    1 |             |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+

EXPLAIN output for new method with new index:

| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
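Shawn's rule can be sanity-checked numerically. In this hypothetical snippet the query window is s = 10, e = 20, each lettered event from the ASCII art gets a (start, end) pair, and every row evaluates `start <= e AND end >= s`; only cases b and g come out false:

```sql
-- Window: s = 10, e = 20. "included" must be 1 exactly for cases c-f.
SELECT 'b' AS ev, ( 2 <= 20 AND  8 >= 10) AS included   -- 0: ends before (s)
UNION ALL SELECT 'c', ( 8 <= 20 AND 12 >= 10)           -- 1: ends inside
UNION ALL SELECT 'd', (12 <= 20 AND 18 >= 10)           -- 1: fully inside
UNION ALL SELECT 'e', ( 5 <= 20 AND 25 >= 10)           -- 1: spans the window
UNION ALL SELECT 'f', (15 <= 20 AND 25 >= 10)           -- 1: starts inside
UNION ALL SELECT 'g', (22 <= 20 AND 28 >= 10);          -- 0: starts after (e)
```

The same predicate with open_dt as "start" and close_dt as "end" is exactly the "new method" WHERE clause above, which is why it captures multi-day connections that a simple BETWEEN on either column alone would miss.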
Query Optimization
Hello, I have the following query I'd like to optimize a bit:

SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA( sc.src_address ) AS src_address, sc.src_port, INET_NTOA( sc.dst_address ) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc
FROM firewall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00')
AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')
ORDER BY rcvd DESC LIMIT 0, 10

Currently I have an index on the rcvd column, which seems to be working based on the output of EXPLAIN:

+----+-------------+-------+--------+---------------+----------+---------+--------------------+------+-------------+
| id | select_type | table | type   | possible_keys | key      | key_len | ref                | rows | Extra       |
+----+-------------+-------+--------+---------------+----------+---------+--------------------+------+-------------+
|  1 | SIMPLE      | sc    | index  | open_dt       | ndx_rcvd | 4       | NULL               |   10 | Using where |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY       | PRIMARY  | 2       | syslog.sc.src_port |    1 |             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY       | PRIMARY  | 2       | syslog.sc.dst_port |    1 |             |
+----+-------------+-------+--------+---------------+----------+---------+--------------------+------+-------------+

However the query is still fairly slow for some reason; any ideas how I could speed it up with additional indexes, etc.? The values I am using in the WHERE clause are variable and change each time. The table has around 23 million records right now but is expected to continue to grow up to a potential 150 million. Here is the table schema:

CREATE TABLE IF NOT EXISTS `firewall_connections` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `open_dt` datetime NOT NULL,
  `close_dt` datetime NOT NULL,
  `protocol` smallint(6) NOT NULL,
  `src_address` int(10) unsigned NOT NULL,
  `src_port` smallint(5) unsigned NOT NULL,
  `dst_address` int(10) unsigned NOT NULL,
  `dst_port` smallint(5) unsigned NOT NULL,
  `sent` int(10) unsigned NOT NULL,
  `rcvd` int(10) unsigned NOT NULL,
  PRIMARY KEY (`id`),
  KEY `ndx_rcvd` (`rcvd`),
  KEY `ndx_sent` (`sent`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

Thanks in advance!
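One thing worth trying before restructuring the query, and what later replies in this thread converge on, is giving each datetime column its own index; the CREATE TABLE above only declares keys on rcvd and sent (the EXPLAIN's possible_keys suggests an open_dt index may also exist already). A hedged sketch, with hypothetical index names:

```sql
-- Separate single-column indexes give the optimizer a range scan per
-- datetime column and, on MySQL 5.1+, a chance at an index_merge
-- (union) when the open_dt and close_dt conditions are ORed together.
ALTER TABLE firewall_connections
    ADD KEY ndx_open_dt  (open_dt),
    ADD KEY ndx_close_dt (close_dt);
```

Whether the optimizer actually chooses a merge over the ndx_rcvd scan depends on the selectivity of the date range, so checking the EXPLAIN after adding the indexes is still necessary.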
-- Brandon
Query Optimization
Hello all, I am using the query below and variations of it to query a database with a TON of records. Currently the database has around 11 million records but it grows every day and should cap out at around 150 million. I am curious if there is any way I can better optimize the below query, as currently it takes around 10 seconds to run, but I am sure this will get slower and slower as the database grows.

SELECT open_dt, close_dt, protocol, INET_NTOA(src_address) AS src_address, src_port, INET_NTOA(dst_address) AS dst_address, dst_port, sent, rcvd
FROM connections
WHERE dst_port = 80
ORDER BY close_dt DESC LIMIT 0, 30

I do have an index on the dst_port column, as you can see by the output of EXPLAIN:

id: 1
select_type: SIMPLE
table: connections
type: ref
possible_keys: dst_port
key: dst_port
key_len: 2
ref: const
rows: 1109273
Extra: Using where; Using filesort

Thanks in advance, -- Brandon
Re: Query Optimization
Thanks Singer, this took my query down to 0.0007 seconds, perfect! I wasn't aware a single index on multiple columns would work when one of the columns was in the WHERE clause and the other in the ORDER BY clause. Learn something new every day I guess! On 08/10/2011 02:03 PM, Singer X.J. Wang wrote: Try an index on (dst_port, close_dt) On Wed, Aug 10, 2011 at 14:01, Brandon Phelps bphe...@gls.com wrote: Hello all, I am using the query below and variations of it to query a database with a TON of records. Currently the database has around 11 million records but it grows every day and should cap out at around 150 million. I am curious if there is any way I can better optimize the below query, as currently it takes around 10 seconds to run, but I am sure this will get slower and slower as the database grows.

SELECT open_dt, close_dt, protocol, INET_NTOA(src_address) AS src_address, src_port, INET_NTOA(dst_address) AS dst_address, dst_port, sent, rcvd
FROM connections
WHERE dst_port = 80
ORDER BY close_dt DESC LIMIT 0, 30

I do have an index on the dst_port column, as you can see by the output of EXPLAIN:

id: 1
select_type: SIMPLE
table: connections
type: ref
possible_keys: dst_port
key: dst_port
key_len: 2
ref: const
rows: 1109273
Extra: Using where; Using filesort

Thanks in advance, -- Brandon -- The best compliment you could give Pythian for our service is a referral.
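The fix that worked here generalizes: put the equality-tested column first in the index and the ORDER BY column second, so the server can seek straight to the dst_port = 80 entries and read them already sorted by close_dt, eliminating the "Using filesort". A sketch of Singer's suggestion (the index name is hypothetical):

```sql
-- Equality column first, sort column second: within dst_port = 80 the
-- index entries are stored in close_dt order, so ORDER BY close_dt DESC
-- becomes a reverse index scan and the LIMIT 30 stops it after 30 rows.
ALTER TABLE connections
    ADD KEY ndx_dst_port_close_dt (dst_port, close_dt);
```

The same composite-index pattern would not have helped the earlier open_dt/close_dt query in this archive, since that one has no single equality predicate to anchor the left edge of the index.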