Re: MySQL optimization for testing purpose
On 04/03/2018 06:15 PM, Sebastien FLAESCH wrote:
> [original question trimmed]

FYI, I have tried the following settings, but they did not help:

innodb_stats_auto_recalc=0
innodb_file_per_table=0
innodb_stats_persistent=0

I recreated my database after restarting the server.

Server version: 8.0.4-rc-log MySQL Community Server (GPL)

Seb

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql
Re: MySQL optimization for testing purpose
On 04/03/2018 05:59 PM, Sebastien FLAESCH wrote:
> Some tests behave a bit strange, sometimes one of our tests takes
> 1 min 30 secs, sometimes it takes only 15 seconds...

In fact this test creates and drops the same table ~150 times, with a single column using a different type each time. I can clearly see that table creations are sometimes fast, but from time to time they slow down to about a second per table. How can this happen? Any log I could enable and provide here?

Thanks in advance!
Seb
MySQL optimization for testing purpose
Hi all,

Looking for some configuration tips to speed up execution of our non-regression tests with MySQL. We do pure functional testing; data persistence is not critical. Our tests do not use a lot of data, but we create and drop a lot of test tables. We also have concurrency tests (processes locking rows), so that feature matters. Are there some InnoDB parameters to enable or disable to speed up our tests?

Some tests behave a bit strangely: sometimes one of our tests takes 1 min 30 secs, sometimes it takes only 15 seconds... How can this happen? Any log I could enable and provide here?

Thanks in advance!
Seb
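For a disposable test database like this, a few server settings are commonly relaxed to trade crash safety for speed. The fragment below is only a sketch of that idea - these values sacrifice durability and must never be used in production, and each option should be checked against your server version's documentation:

```ini
# my.cnf sketch for throwaway test instances (NOT for production):
[mysqld]
# Flush the redo log roughly once per second instead of at every commit.
innodb_flush_log_at_trx_commit = 0
# Let the OS decide when binary log writes reach disk.
sync_binlog = 0
# Skip the doublewrite buffer; a crash may corrupt data files.
innodb_doublewrite = 0
# Keep the working set in memory so table create/drop churn stays cheap.
innodb_buffer_pool_size = 1G
# Optionally point the data directory at a RAM disk (e.g. tmpfs on Linux):
# datadir = /dev/shm/mysql-test
```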
inconsistent optimization
Without going into specific details on queries...

Using MySQL 5.1 as provided with CentOS 6, I've noticed some queries exhibiting what I can best describe as inconsistent optimization. The database can be quieted down to just controlled queries, and at times the same query will return very quickly, while at other times it may take minutes. I don't see the same behavior with MySQL 5.0 under CentOS 5: there, the same queries on the same data consistently return quickly.

When the queries run slowly, they show in the process list in either a "copy to temp table" or "sending data" state. At first I thought restructuring the queries to avoid the temp table copy was a path to a solution, but now I don't think so, since the same query, changed so that it no longer needs a temp table, will sit in the "sending data" state for a long time. The queries do eventually come back with correct results, but they take minutes rather than milliseconds (sometimes slow, sometimes fast).

Have others seen this behavior? Any explanations? Any reading to point to for further understanding?

Thanks.
Re: inconsistent optimization
Well, try to start by checking the IOPS vs. the disk. Check your iowait and the cache size. Could you send us the CREATE TABLE and the query?

Regards,
Eduardo Fontinelle
Chief Technology Officer | Gerencianet
Phone: +55 (31) 3603-0812

2014-08-20 12:04 GMT-03:00 Jim j...@lowcarbfriends.com:
> [original question trimmed]
RE: inconsistent optimization
Jim/Jaime,

What engine are you implementing?

Greetings from South America,
Martín

Date: Wed, 20 Aug 2014 13:54:46 -0300
Subject: Re: inconsistent optimization
From: edua...@gerencianet.com.br
To: j...@lowcarbfriends.com
CC: mysql@lists.mysql.com
> [quoted replies trimmed]
Re: inconsistent optimization
InnoDB.

On 8/20/2014 1:22 PM, Martin Gainty wrote:
> What engine are you implementing?
> [rest of quoted thread trimmed]
Re: inconsistent optimization
Hi Jim,

On 8/20/2014 11:04 AM, Jim wrote:
> [original question trimmed]

Fluctuations in query times can be the result of configuration mistakes (like creating a 1GB query cache or a tiny InnoDB buffer pool), of data changes (did you add, remove, or change a bunch of rows?), or of query patterns (did you add or remove terms from your WHERE clauses, or change which columns were in your SELECT clause?).

To know why a query is doing what it is doing, you need to ask the Optimizer. The Optimizer is the part of the server that works out the most efficient way to go get the data you are asking for and how to process that data once it is pulled from disk or cache. This is the purpose of the EXPLAIN operator. Just put that word before SELECT and see what you get. An explanation of how to interpret an EXPLAIN report is here in the manual (you are reading the manual, right?):

http://dev.mysql.com/doc/refman/5.1/en/explain.html
http://dev.mysql.com/doc/refman/5.1/en/execution-plan-information.html

That will give you a starting place. After that, you can refer to the other sections of the Optimization chapter to see what you can or should be changing to improve your performance:

http://dev.mysql.com/doc/refman/5.1/en/optimization.html

You will also need to learn a little bit about index statistics, as those are what the Optimizer uses to develop its execution plans:

http://dev.mysql.com/doc/refman/5.1/en/analyze-table.html
http://dev.mysql.com/doc/refman/5.1/en/show-index.html
http://dev.mysql.com/doc/refman/5.1/en/innodb-restrictions.html (search for "ANALYZE TABLE determines index cardinality...")
http://dev.mysql.com/doc/refman/5.1/en/innodb-parameters.html#sysvar_innodb_stats_sample_pages
http://dev.mysql.com/doc/refman/5.1/en/optimizer-issues.html

Feel free to ask the list any questions that may arise in your research.

Regards,
--
Shawn Green
MySQL Senior Principal Technical Support Engineer
Oracle USA, Inc. - Hardware and Software, Engineered to Work Together.
Office: Blountville, TN
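As a minimal illustration of the workflow Shawn describes (the table and column names below are made up for the example, not taken from Jim's schema):

```sql
-- Ask the optimizer for its plan instead of guessing:
EXPLAIN SELECT o.id, o.total
FROM   orders o
WHERE  o.customer_id = 42;

-- If the report shows type=ALL (full table scan) and possible_keys=NULL,
-- an index on the filtered column is usually the first thing to try:
ALTER TABLE orders ADD INDEX idx_customer (customer_id);

-- Refresh the index statistics the optimizer bases its plan on:
ANALYZE TABLE orders;
```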
Re: LIKE sql optimization
On 12/02/2014 13:16, Morgan Tocker wrote:
> Hi Zhigang,
>
> I think I understand the question - you are asking why MySQL will not
> index scan, find matching records, and then look them up rather than
> table scan?
>
> I believe the answer is that there is no way of knowing if 1 row
> matches, or all rows match. In the worst case (where all rows match),
> it is much more expensive to traverse between index and data rows
> for each record. So a table scan is a "safe" choice / has less variance.

In addition to what Morgan writes: with an index scan you will end up doing a lot of random I/O. Even if the index scan itself is one sequential scan (which is not guaranteed), then for each match it will be necessary to look up the actual row. A table scan, on the other hand, will generally be more of a sequential read, as you already have all the data available for each match. Random I/O is more expensive than sequential I/O - particularly on spinning disks - so in general the optimizer will try to reduce the amount of random I/O.

In some cases, though, you may see the index scan preferred. Assume you have a query like:

SELECT val FROM table WHERE condition LIKE '%abcd';

and you have an index (condition, val) or (val, condition). Then the whole query can be satisfied from the index (this is called a covering index), and the index scan is usually preferred over the table scan.

To use an index for actual look-ups - rather than either a table or index scan - with WHERE clauses like LIKE '%abcd', you can do a couple of things:

* Duplicate the column used in the WHERE clause, but reverse the string. That way the above WHERE clause becomes:

  WHERE condition_reverse LIKE 'dcba%'

  This can use an index, as it is a left prefix.

* If you always look for around the same number of characters at the end in your WHERE clause, you can create a column with just those last characters, so the WHERE clause becomes:

  WHERE condition_suffix = 'abcd'

Do be careful, however, to ensure you have enough selectivity that way. If, for example, 90% of the rows end in 'abcd', an index will not do you much good (unless you are looking for rows in the remaining 10%).

Best regards,
Jesper Krogh
MySQL Support
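A sketch of the reversed-column technique Jesper describes (the table and column names are hypothetical). On MySQL 5.7 and later a generated column can keep the reversed copy in sync automatically; on older versions you would maintain it from the application or with triggers:

```sql
-- Hypothetical table with a reversed copy of the searched column.
CREATE TABLE t (
  id       INT PRIMARY KEY,
  cond     VARCHAR(100),
  cond_rev VARCHAR(100) AS (REVERSE(cond)) STORED,  -- MySQL 5.7+ generated column
  KEY idx_cond_rev (cond_rev)
);

-- Instead of:  WHERE cond LIKE '%abcd'
-- query the reversed column with a left prefix, which can use the index:
SELECT id, cond
FROM   t
WHERE  cond_rev LIKE CONCAT(REVERSE('abcd'), '%');
```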
RE: LIKE sql optimization
Done. Thank you very much!

Zhigang

From: Jesper Wisborg Krogh [mailto:my...@wisborg.dk]
Sent: Wednesday, February 12, 2014 5:30 PM
To: Morgan Tocker; Zhigang Zhang
Cc: mysql@lists.mysql.com
Subject: Re: LIKE sql optimization
> [Jesper's reply trimmed]
LIKE sql optimization
For example:

Select * from T where col like '%abcd';

The table T is a MyISAM table, and we created an index on col. As we know, this LIKE query does not use the index created on col. It confuses me - why? I would think that in the MyISAM engine the index data is smaller, so it could use the index linked list to reduce the disk scan compared to a whole-table scan.

Thanks.
Zhigang
Re: LIKE sql optimization
On 12.02.2014 02:23, Zhigang Zhang wrote:
> For example:
> Select * from T where col like '%abcd';
> [rest of question trimmed]

Because '%abcd' can't work in the case of an index - how do you imagine that would work? Read up on how an index works technically. 'abcd%' may work, but '%abcd' is impossible: independent of the engine, this doesn't work and won't ever work.

You may have luck with fulltext search (and its other drawbacks):
https://dev.mysql.com/doc/refman/5.5/en/fulltext-search.html
Re: LIKE sql optimization
An SQL database doesn't use an index for a LIKE condition if it starts with '%'; LIKE 'abcd%' would work, though. To use an index, you can store your value using the REVERSE function and index that; then your LIKE can use the index.

2014-02-11 20:23 GMT-05:00 Zhigang Zhang zzgang2...@gmail.com:
> [question trimmed]
RE: LIKE sql optimization
I want to know the reason. In my opinion, scanning the smaller index data should perform better than scanning the whole table data.

Zhigang

From: Mathieu Desharnais [mailto:mdesharn...@diffusion.cc]
Sent: Wednesday, February 12, 2014 9:41 AM
To: Zhigang Zhang; mysql@lists.mysql.com
Subject: Re: LIKE sql optimization
> [reply trimmed]
Re: LIKE sql optimization
*read how an index works technically*

On Wed, Feb 12, 2014 at 12:48 PM, Zhigang Zhang zzgang2...@gmail.com wrote:
> I want to know the reason. In my opinion, scanning the smaller index
> data should perform better than scanning the whole table data.
> [rest of quoted thread trimmed]
Re: LIKE sql optimization
Because an index is not just a dumb copy of the whole field, and you simply can't seek into the middle of it:

http://en.wikipedia.org/wiki/B-tree
http://mattfleming.com/node/192

On 12.02.2014 02:48, Zhigang Zhang wrote:
> I want to know the reason. In my opinion, scanning the smaller index
> data should perform better than scanning the whole table data.
> [rest of quoted thread trimmed]
Re: LIKE sql optimization
MySQL can't use an index when a leading-'%' condition is given - neither can Oracle. You can try full-text search.

2014-02-12 9:55 GMT+08:00 kitlenv kitl...@gmail.com:
> [quoted thread trimmed]

--
Phone: +86 1868061
Email & Gtalk: yloui...@gmail.com
Personal Blog: http://www.vmcd.org
Re: LIKE sql optimization
Same reason why a composite index works only if you supply the first field or fields. Example: with an index on (a, b, c), a query like

select * from tbl where a = 'whatever' and b = 'something'

will use the index, but a query like this one:

select * from tbl where b = 'something' and c = 'something else'

won't use the index - just like an index in a book.

2014-02-11 21:03 GMT-05:00 louis liu yloui...@gmail.com:
> [quoted thread trimmed]
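The leftmost-prefix rule Mathieu describes can be checked directly with EXPLAIN (the table here is hypothetical, built just for the demonstration):

```sql
CREATE TABLE tbl (
  a VARCHAR(20),
  b VARCHAR(20),
  c VARCHAR(20),
  KEY idx_abc (a, b, c)
);

-- Can use idx_abc: the WHERE clause covers a leftmost prefix (a, b) of the index.
EXPLAIN SELECT * FROM tbl WHERE a = 'whatever' AND b = 'something';

-- Cannot use idx_abc: the leading column a is absent, so matching (b, c)
-- entries are scattered throughout the index tree.
EXPLAIN SELECT * FROM tbl WHERE b = 'something' AND c = 'something else';
```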
RE: LIKE sql optimization
I checked a MyISAM table index; the index is a copy of the whole field.

Zhigang

-----Original Message-----
From: Reindl Harald [mailto:h.rei...@thelounge.net]
Sent: Wednesday, February 12, 2014 10:02 AM
To: mysql@lists.mysql.com
Subject: Re: LIKE sql optimization
> [reply and quoted thread trimmed]
Re: LIKE sql optimization
Hi Zhigang,

On Feb 11, 2014, at 8:48 PM, Zhigang Zhang zzgang2...@gmail.com wrote:
> I want to know the reason. In my opinion, scanning the smaller index
> data has better performance than scanning the whole table data.

I think I understand the question - you are asking why MySQL will not index scan, find matching records, and then look them up rather than table scan?

I believe the answer is that there is no way of knowing if 1 row matches, or all rows match. In the worst case (where all rows match), it is much more expensive to traverse between index and data rows for each record. So a table scan is a "safe" choice / has less variance.
RE: LIKE sql optimization
Thank you very much!

Zhigang

-----Original Message-----
From: Morgan Tocker [mailto:morgan.toc...@oracle.com]
Sent: Wednesday, February 12, 2014 10:16 AM
To: Zhigang Zhang
Cc: mysql@lists.mysql.com
Subject: Re: LIKE sql optimization
> [reply trimmed]
Re: Query Optimization
SELECT * FROM XYZ WHERE VAL IN ('BLUE','RED','GREEN');

This query will be faster if the field is an ENUM.

On Fri, Nov 16, 2012 at 12:36 AM, Anupam Karmarkar sb_akarmar...@yahoo.com wrote:
> Hi All,
>
> Consider a scenario: I have a table XYZ which contains the following values:
>
> BLUE
> RED
> GREEN
> NULL
>
> The following queries can be used to get these values:
>
> 1. SELECT * FROM XYZ WHERE VAL IN ('BLUE','RED','GREEN');
> 2. SELECT * FROM XYZ WHERE VAL IS NOT NULL;
> 3. SELECT * FROM XYZ WHERE VAL = 'BLUE' OR VAL='RED' OR VAL='GREEN';
>
> and more. So which one is good in terms of optimization? I guess 1 and 3
> are similar in terms of formation.
>
> --Anupam

--
Thanks & Regards,
P. Benaya Paul
http://www.codeasearch.com
http://www.iwannasearch.com
RE: Query Optimization
It depends on the distribution of the 4 'values' in that field. If the cardinality is poor, then INDEX(VAL) won't be used, and they will all do a table scan.

-----Original Message-----
From: Anupam Karmarkar [mailto:sb_akarmar...@yahoo.com]
Sent: Friday, November 16, 2012 12:36 AM
To: mysql@lists.mysql.com
Subject: Query Optimization
> [question trimmed]
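To the optimizer, the IN list and the OR chain are equivalent forms of the same predicate, so forms 1 and 3 normally get the same plan, and whether INDEX(VAL) is used at all depends on cardinality. This is easy to verify on your own server version with EXPLAIN:

```sql
-- Compare the plans; on most versions, queries 1 and 3 are optimized identically.
EXPLAIN SELECT * FROM XYZ WHERE VAL IN ('BLUE','RED','GREEN');
EXPLAIN SELECT * FROM XYZ WHERE VAL = 'BLUE' OR VAL = 'RED' OR VAL = 'GREEN';
EXPLAIN SELECT * FROM XYZ WHERE VAL IS NOT NULL;
```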
RE: forcing mysql to use batched key access (BKA) optimization for joins
INDEX(o_orderdate, o_custkey, o_orderkey) would probably help a bunch.

I assume you have indexes (PKs?) on c_custkey, l_orderkey, n_nationkey.

Please provide SHOW CREATE TABLE and SHOW TABLE STATUS.

-----Original Message-----
From: Halász Sándor [mailto:h...@tbbs.net]
Sent: Tuesday, April 10, 2012 5:20 PM
To: mysql@lists.mysql.com
Subject: Re: forcing mysql to use batched key access (BKA) optimization for joins

2012/04/10 15:58 -0400, Stephen Tu:
> select c_custkey, c_name,
>        sum(l_extendedprice * (100 - l_discount)) as revenue,
>        c_acctbal, n_name, c_address, c_phone, c_comment
> from CUSTOMER_INT, ORDERS_INT, LINEITEM_INT, NATION_INT
> where c_custkey = o_custkey
>   and l_orderkey = o_orderkey
>   and o_orderdate >= date '1994-08-01'
>   and o_orderdate < date '1994-08-01' + interval '3' month
>   and l_returnflag = 'R'
>   and c_nationkey = n_nationkey
> group by c_custkey, c_name, c_acctbal, c_phone, n_name, c_address, c_comment
> order by revenue desc
> limit 20

I understand that MySQL works better if the inner join is stated explicitly, not implicitly as you have it. What are your keys and indexes? Going by what I have read and heard, you want an index, if not a key, on every field named in this query's WHERE clause - and, of course, every one of your fields named '...key' is a key, right?
Re: forcing mysql to use batched key access (BKA) optimization for joins
Hi Stephen,

2012/04/10 15:58 -0400, Stephen Tu:

+----+-------------+--------------+--------+-------------------+-----------+---------+-------------------------------------+---------+-----------------------------------------------------+
| id | select_type | table        | type   | possible_keys     | key       | key_len | ref                                 | rows    | Extra                                               |
+----+-------------+--------------+--------+-------------------+-----------+---------+-------------------------------------+---------+-----------------------------------------------------+
|  1 | SIMPLE      | CUSTOMER_INT | ALL    | PRIMARY           | NULL      | NULL    | NULL                                | 1501528 | Using temporary; Using filesort                     |
|  1 | SIMPLE      | NATION_INT   | eq_ref | PRIMARY           | PRIMARY   | 4       | tpch-10.00.CUSTOMER_INT.C_NATIONKEY |       1 | NULL                                                |
|  1 | SIMPLE      | ORDERS_INT   | ref    | PRIMARY,O_CUSTKEY | O_CUSTKEY | 4       | tpch-10.00.CUSTOMER_INT.C_CUSTKEY   |       7 | Using where; Using join buffer (Batched Key Access) |
|  1 | SIMPLE      | LINEITEM_INT | ref    | PRIMARY           | PRIMARY   | 4       | tpch-10.00.ORDERS_INT.O_ORDERKEY    |       1 | Using where                                         |
+----+-------------+--------------+--------+-------------------+-----------+---------+-------------------------------------+---------+-----------------------------------------------------+
4 rows in set (0.00 sec)

I'm wondering why, in this particular query, mysql doesn't use BKA to join the LINEITEM_INT table, but uses it for the ORDERS_INT table? It seems like it should also use BKA to batch key fetches from the LINEITEM_INT table (this, I believe, is the reason for the slow performance).

The basis for Batched Key Access (BKA) is the Disk-Sweep Multi-Range Read (DS-MRR) strategy. The basic idea of DS-MRR is to accumulate primary keys from a batch of secondary index look-ups and access the rows in the base table in primary key order. In other words, DS-MRR (and BKA) does not apply to look-ups by primary key. Hence, since the ref access into LINEITEM_INT is by primary key, BKA will not be used.

Maybe you will get a more optimal plan if you add more indexes. In my case, where I have an index on orders(o_orderdate), the join will start with a range scan on the orders table. If I enable DS-MRR for this range scan, query time is reduced from 455 seconds (without DS-MRR) to 90 seconds on a scale 1 database.

Hope this helps,

--
Øystein
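For anyone wanting to reproduce this, DS-MRR and BKA are controlled through the optimizer_switch system variable (MySQL 5.6 and later). A sketch - the index on o_orderdate is the one described above, and the buffer size is an arbitrary example:

```sql
-- BKA builds on MRR; mrr_cost_based=off is usually needed as well,
-- so the optimizer does not reject MRR on its cost estimate.
SET optimizer_switch = 'mrr=on,mrr_cost_based=off,batched_key_access=on';

-- BKA batches keys in the join buffer, so a larger buffer helps.
SET SESSION join_buffer_size = 32 * 1024 * 1024;

-- Secondary index that lets a range scan on orders drive the join.
ALTER TABLE ORDERS_INT ADD INDEX idx_odate (o_orderdate);
```

After this, EXPLAIN should show "Using join buffer (Batched Key Access)" on the joins where BKA applies.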
Re: forcing mysql to use batched key access (BKA) optimization for joins
2012/04/10 15:58 -0400, Stephen Tu:

select c_custkey, c_name, sum(l_extendedprice * (100 - l_discount)) as revenue, c_acctbal, n_name, c_address, c_phone, c_comment
from CUSTOMER_INT, ORDERS_INT, LINEITEM_INT, NATION_INT
where c_custkey = o_custkey
  and l_orderkey = o_orderkey
  and o_orderdate >= date '1994-08-01'
  and o_orderdate < date '1994-08-01' + interval '3' month
  and l_returnflag = 'R'
  and c_nationkey = n_nationkey
group by c_custkey, c_name, c_acctbal, c_phone, n_name, c_address, c_comment
order by revenue desc
limit 20

I understand that MySQL works better if the inner join is stated explicitly, not implicitly as you have it. What are your keys and indices? Going by what I have read and heard, you want every field named after this query's 'where' to be an index, if not a key -- and, of course, every field named '...key' is a key, right?
query optimization
I have a table with 24 million rows and I need to figure out how to optimize a query. It has to do with mac addresses and radius packets - I want to see the # of connections and the min/max date. So I basically want all this data:

select cpe_mac, count(*) c, min(r3_dt) mindt, max(r3_dt) maxdt, max(rec_num) recn
from radiuscap
where r3_dt >= SUBDATE(NOW(), INTERVAL 30 DAY)
  and r3_type = 'Access'
group by cpe_mac
order by cpe_mac;

This piece of the query takes 30 seconds to run and produces 3500 rows. I have r3_dt indexed. I also want the status field of the row with the highest r3_dt:

select rec_num, cpe_mac, req_status
from rad_r3cap
where r3_type = 'Access'
  and (cpe_mac, r3_dt) in (select cpe_mac, max(r3_dt) from rad_r3cap group by cpe_mac);

This piece of the query takes forever; I let it run for an hour and it still didn't finish, so it's obviously not using indexes, and I have no idea how far along it got. I wrote a php script to run the 1st query, then do 3500 individual lookups for the status using the max(rec_num) field from the 1st query, and I can get the data in 31 seconds. So I CAN produce this data, but very slowly, and not in 1 sql query. I want to consolidate this into 1 sql statement so I can make a view. If anyone can point me in the right direction, I'd appreciate it!
mysql> desc rad_r3cap;
+-------------+-------------+------+-----+---------+----------------+
| Field       | Type        | Null | Key | Default | Extra          |
+-------------+-------------+------+-----+---------+----------------+
| rec_num     | int(11)     | NO   | PRI | NULL    | auto_increment |
| r3_dt       | datetime    | YES  | MUL | NULL    |                |
| r3_micros   | int(11)     | YES  |     | NULL    |                |
| r3_type     | varchar(16) | YES  |     | NULL    |                |
| req_status  | varchar(16) | YES  |     | NULL    |                |
| req_comment | varchar(64) | YES  |     | NULL    |                |
| asn_ip      | varchar(16) | YES  | MUL | NULL    |                |
| asn_name    | varchar(16) | YES  |     | NULL    |                |
| bsid        | varchar(12) | YES  | MUL | NULL    |                |
| cpe_ip      | varchar(16) | YES  |     | NULL    |                |
| cpe_mac     | varchar(12) | YES  | MUL | NULL    |                |
| filename    | varchar(32) | YES  |     | NULL    |                |
| linenum     | int(11)     | YES  |     | NULL    |                |
| r3_hour     | datetime    | YES  | MUL | NULL    |                |
| user_name   | varchar(64) | YES  |     | NULL    |                |
+-------------+-------------+------+-----+---------+----------------+

mysql> show indexes in rad_r3cap;
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table     | Non_unique | Key_name     | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| rad_r3cap |          0 | PRIMARY      |            1 | rec_num     | A         |    23877677 | NULL     | NULL   |      | BTREE      |         |
| rad_r3cap |          0 | r3cap_dt     |            1 | r3_dt       | A         |        NULL | NULL     | NULL   | YES  | BTREE      |         |
| rad_r3cap |          0 | r3cap_dt     |            2 | r3_micros   | A         |        NULL | NULL     | NULL   | YES  | BTREE      |         |
| rad_r3cap |          1 | r3cap_bsid   |            1 | bsid        | A         |         346 | NULL     | NULL   | YES  | BTREE      |         |
| rad_r3cap |          1 | r3cap_asnip  |            1 | asn_ip      | A         |          55 | NULL     | NULL   | YES  | BTREE      |         |
| rad_r3cap |          1 | r3cap_cpemac |            1 | cpe_mac     | A         |        4758 | NULL     | NULL   | YES  | BTREE      |         |
| rad_r3cap |          1 | r3cap_date   |            1 | r3_hour     | A         |        1548 | NULL     | NULL   | YES  | BTREE      |         |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
7 rows in set (0.00 sec)
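One way to get the whole result in a single statement (the usual "row holding the per-group maximum" pattern - a sketch, untested against this schema) is to join the aggregate back as a derived table instead of using IN:

```sql
SELECT t.cpe_mac, t.c, t.mindt, t.maxdt, r.req_status
FROM (
    SELECT cpe_mac,
           COUNT(*)     AS c,
           MIN(r3_dt)   AS mindt,
           MAX(r3_dt)   AS maxdt,
           MAX(rec_num) AS recn
    FROM rad_r3cap
    WHERE r3_dt >= SUBDATE(NOW(), INTERVAL 30 DAY)
      AND r3_type = 'Access'
    GROUP BY cpe_mac
) AS t
JOIN rad_r3cap AS r
  ON r.rec_num = t.recn        -- look up the status row via its primary key
ORDER BY t.cpe_mac;
```

This mirrors what the php script does (3500 primary-key lookups keyed on max(rec_num)), but lets MySQL do the lookups inside one query, so it could also back a view.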
Re: query optimization
Your outer query, select cpe_mac, max(r3_dt) from rad_r3cap, is doing a full table scan; you might want to check on this and use a WHERE condition on an indexed column.

On Fri, Sep 23, 2011 at 12:14 AM, supr_star suprstar1...@yahoo.com wrote:
I have a table with 24 million rows, I need to figure out how to optimize a query.
[snip - remainder of the quoted message and table definitions unchanged from the previous post]
Re: Query Optimization
On 09/01/2011 01:32 PM, Brandon Phelps wrote:
On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote:
On 9/1/2011 09:42, Brandon Phelps wrote:
On 09/01/2011 04:59 AM, Jochem van Dieten wrote:

... WHERE (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00') AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')

In that case your logic here simplifies to: WHERE open_dt >= '2011-08-30 00:00:00' AND close_dt <= '2011-08-30 12:36:53'. Now add an index over open_dt and close_dt and see what happens.

Jochem

Jochem, I can't really use your WHERE logic because I also need to retrieve results where the open_dt time is out of the range specified. For example, a very large file download might span multiple days, so given your logic, if the connection was started 2 days ago and I want to pull 1 day's worth of connections, I would miss that entry. Basically I want to SELECT all of the records that were opened OR closed during the specified time period, i.e. if any activity happened between my start and end dates, I need to see that record. Any other ideas?

I believe Jochem was on the right track but he got his dates reversed. Let's try a little ASCII art to show the situation. I will set up a query window with two markers (s) and (e). Events will be marked by |---| markers showing their durations.

a)              (s)                      (e)
b)   |---|
c)         |---|
d)                    |---|
e)       |----------------------------------|
f)                                   |---|
g)                                             |---|

To describe these situations:
a) is the window for which you want to query. (s) is the starting time and (e) is the ending time for the date range you are interested in.
b) the event starts and stops before your window exists. It won't be part of your results.
c) the event starts before the window but ends within the window - include this
d) the event starts and ends within the window - include this
e) the event starts before the window and ends after the window - include this
f) the event starts inside the window but ends beyond the window - include this.
g) the event starts and ends beyond the window - exclude this.

In order to get every event in the range of c-f, here is what you need for a WHERE clause:

WHERE start <= (ending time) AND end >= (starting time)

Try that and let us know the results.

Thanks Jochem and Shawn, however the following two queries result in the exact same EXPLAIN output: (I hope the tables don't wrap too early for you)

Old method:

SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc
FROM firewall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE (open_dt >= '2011-08-31 09:53:31' OR close_dt >= '2011-08-31 09:53:31')
  AND (open_dt <= '2011-09-01 09:53:31' OR close_dt <= '2011-09-01 09:53:31')
ORDER BY rcvd DESC LIMIT 0, 10;

New method with BTREE index on open_dt, close_dt (index name is ndx_open_close_dt):

SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc
FROM firewall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE open_dt <= '2011-09-01 09:53:31' AND close_dt >= '2011-08-31 09:53:31'
ORDER BY rcvd DESC LIMIT 0, 10;

EXPLAIN output for old method:

+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
| id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
|  1 | SIMPLE      | sc    | index  | open_dt,ndx_open_close_dt | ndx_rcvd | 4       | NULL               |   10 | Using where |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.src_port |    1 |             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.dst_port |    1 |             |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+

EXPLAIN output for new method with new index:

+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
| id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
|  1 | SIMPLE      | sc    | index  | open_dt,ndx_open_close_dt | ndx_rcvd | 4       | NULL               |   10 | Using where |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.src_port |    1 |             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.dst_port |    1 |             |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
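As a side note on why the two plans can come out identical: a B-tree range scan can only use the leading column of a composite index for the range, so the close_dt bound ends up applied as a row filter either way. A sketch (index and column names as used in the thread, untested against this schema):

```sql
-- Composite index to support the overlap test.
ALTER TABLE firewall_connections
  ADD INDEX ndx_open_close_dt (open_dt, close_dt);

-- Overlap test: a connection touches the window if it opened before
-- the window ends AND closed after the window starts.
EXPLAIN SELECT open_dt, close_dt, rcvd
FROM firewall_connections
WHERE open_dt  <= '2011-09-01 09:53:31'   -- started before window end
  AND close_dt >= '2011-08-31 09:53:31'   -- ended after window start
ORDER BY rcvd DESC
LIMIT 0, 10;
-- Only the open_dt bound can drive the range scan here; the close_dt
-- predicate is checked per row ("Using where").
```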
Re: Query Optimization
Thinking outside the query, is there any archiving that could happen to make your large tables kinder in the range scan?

Andy

On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelps bphe...@gls.com wrote:
[snip - quoted text unchanged from the previous message]
Re: Query Optimization
Thanks for the reply Andy. Unfortunately the users will be selecting varying date ranges and new data is constantly coming in, so I am not sure how I could archive/cache the necessary data in a way that would be any more efficient than simply using the database directly.

On 09/08/2011 02:16 PM, Andrew Moore wrote:
Thinking outside the query, is there any archiving that could happen to make your large tables kinder in the range scan?

Andy
[snip - remainder of quoted thread unchanged]
Re: Query Optimization
I don't think I saw any query timings in the emails (maybe I missed them). What version of MySQL are you currently using? What does the explain look like when you remove the LIMIT 10? Is your server tuned for MyISAM or InnoDB? What kind of disk setup is in use? How much memory is in your machine?

On Thu, Sep 8, 2011 at 7:27 PM, Brandon Phelps bphe...@gls.com wrote:
Thanks for the reply Andy. Unfortunately the users will be selecting varying date ranges and new data is constantly coming in, so I am not sure how I could archive/cache the necessary data that would be any more efficient than simply using the database directly.
[snip - remainder of quoted thread unchanged]
Re: Query Optimization
If you're running version 5.1+ you may want to take a look at the table partitioning options you have.

On Sep 8, 2011, at 2:27 PM, Brandon Phelps wrote:
Thanks for the reply Andy. Unfortunately the users will be selecting varying date ranges and new data is constantly coming in, so I am not sure how I could archive/cache the necessary data that would be any more efficient than simply using the database directly.
[snip - remainder of quoted thread unchanged]
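The partitioning suggestion above might be sketched like this - RANGE partitioning by month on the open_dt column, so a date-bounded scan only touches the relevant partitions. Names are assumed from the thread, and note that MySQL requires every unique key (including the primary key, if any) to include the partitioning column, so the key layout may need adjusting first:

```sql
-- Sketch: monthly RANGE partitions on open_dt (MySQL 5.1+ syntax).
ALTER TABLE firewall_connections
PARTITION BY RANGE (TO_DAYS(open_dt)) (
    PARTITION p201107 VALUES LESS THAN (TO_DAYS('2011-08-01')),
    PARTITION p201108 VALUES LESS THAN (TO_DAYS('2011-09-01')),
    PARTITION p201109 VALUES LESS THAN (TO_DAYS('2011-10-01')),
    PARTITION pmax    VALUES LESS THAN MAXVALUE
);

-- EXPLAIN PARTITIONS (5.1) shows which partitions a query touches.
EXPLAIN PARTITIONS
SELECT COUNT(*) FROM firewall_connections
WHERE open_dt >= '2011-08-31 09:53:31';
```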
Re: Query Optimization
Andy, The queries take minutes to run. MySQL is 5.1.54 and it's running on Ubuntu server 11.04. Unfortunately the machine only has 2GB of RAM but no other major daemons are running on the machine. We are running RAID 1 (mirroring) with 1TB drives. The tables in question here are all MyISAM. When running with the LIMIT 10 my EXPLAIN is: ++-+---++-+-+-++--+-+ | id | select_type | table | type | possible_keys | key | key_len | ref| rows | Extra | ++-+---++-+-+-++--+-+ | 1 | SIMPLE | sc| range | ndx_open_close_rcvd | ndx_open_close_rcvd | 8 | NULL | 32393316 | Using where; Using filesort | | 1 | SIMPLE | spm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.src_port |1 | | | 1 | SIMPLE | dpm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.dst_port |1 | | ++-+---++-+-+-++--+-+ When I remove the LIMIT 10 I get: +-+---++-+-+-++--+-+ | id | select_type | table | type | possible_keys | key | key_len | ref| rows | Extra | ++-+---++-+-+-++--+-+ | 1 | SIMPLE | sc| ALL| ndx_open_close_rcvd | NULL| NULL| NULL | 32393330 | Using where; Using filesort | | 1 | SIMPLE | spm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.src_port |1 | | | 1 | SIMPLE | dpm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.dst_port |1 | | ++-+---++-+-+-++--+-+ Thanks for all your help thus far. On 09/08/2011 02:38 PM, Andrew Moore wrote: I don't think I saw any query timings in the emails (maybe I missed them). What version of MySQL are you currently using? What does the explain look like when your remove the limit 10? Is your server tuned for MyISAM or InnoDB? What kind of disk setup is in use? How much memory is in your machine? On Thu, Sep 8, 2011 at 7:27 PM, Brandon Phelpsbphe...@gls.com wrote: Thanks for the reply Andy. Unfortunately the users will be selecting varying date ranges and new data is constantly coming in, so I am not sure how I could archive/cache the necessary data that would be any more efficient than simply using the database directly. 
On 09/08/2011 02:16 PM, Andrew Moore wrote: Thinking outside the query, is there any archiving that could happen to make your large tables kinder in the range scan? Andy On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelps bphe...@gls.com wrote: On 09/01/2011 01:32 PM, Brandon Phelps wrote: On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote: On 9/1/2011 09:42, Brandon Phelps wrote: On 09/01/2011 04:59 AM, Jochem van Dieten wrote: ... WHERE (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00') AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53') In that case your logic here simplifies to: WHERE open_dt >= '2011-08-30 00:00:00' AND close_dt <= '2011-08-30 12:36:53' Now add an index over open_dt and close_dt and see what happens. Jochem Jochem, I can't really use your WHERE logic because I also need to retrieve results where the open_dt time is out of the range specified. For example, a very large file download might span multiple days, so given your logic, if the connection was started 2 days ago and I want to pull 1 day's worth of connections, I would miss that entry. Basically I want to SELECT all of the records that were opened OR closed during the specified time period, ie. if any activity happened between my start and end dates, I need to see that record. Any other ideas? I believe Jochem was on the right track but he got his dates reversed. Let's try a little ASCII art to show the situation. I will setup a query window with two markers (s) and (e). Events will be marked by |---| markers showing their durations.

a)        (s)          (e)
b) |---|
c)      |------|
d)            |---|
e)    |---------------------|
f)                   |------|
g)                          |---|

To describe these situations: a) is the window for which you want to query; (s) is the starting
Re: Query Optimization
Correct me if I'm wrong. You're wanting to get all records that have an open_date or a close_date between two times. If that's correct, you might be able to get an index_merge by doing a query like: WHERE ((starting time) <= open_dt AND open_dt <= (ending time)) OR ((starting time) <= close_dt AND close_dt <= (ending time)) and creating two indexes (one on 'open_dt' and the other on 'close_dt') http://dev.mysql.com/doc/refman/5.1/en/index-merge-optimization.html Regards, Derek On Sep 8, 2011, at 2:50 PM, Brandon Phelps wrote: Andy, The queries take minutes to run. MySQL is 5.1.54 and it's running on Ubuntu server 11.04. Unfortunately the machine only has 2GB of RAM but no other major daemons are running on the machine. We are running RAID 1 (mirroring) with 1TB drives. The tables in question here are all MyISAM. When running with the LIMIT 10 my EXPLAIN is:

| id | select_type | table | type   | possible_keys       | key                 | key_len | ref                | rows     | Extra                       |
| 1  | SIMPLE      | sc    | range  | ndx_open_close_rcvd | ndx_open_close_rcvd | 8       | NULL               | 32393316 | Using where; Using filesort |
| 1  | SIMPLE      | spm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.src_port | 1        |                             |
| 1  | SIMPLE      | dpm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.dst_port | 1        |                             |

When I remove the LIMIT 10 I get:

| id | select_type | table | type   | possible_keys       | key     | key_len | ref                | rows     | Extra                       |
| 1  | SIMPLE      | sc    | ALL    | ndx_open_close_rcvd | NULL    | NULL    | NULL               | 32393330 | Using where; Using filesort |
| 1  | SIMPLE      | spm   | eq_ref | PRIMARY             | PRIMARY | 2       | syslog.sc.src_port | 1        |                             |
| 1  | SIMPLE      | dpm   | eq_ref | PRIMARY             | PRIMARY | 2       | syslog.sc.dst_port | 1        |                             |

Thanks for all your help thus far. On 09/08/2011 02:38 PM, Andrew Moore wrote: I don't think I saw any query timings in the emails (maybe I missed them). What version of MySQL are you currently using? What does the explain look like when you remove the limit 10?
Is your server tuned for MyISAM or InnoDB? What kind of disk setup is in use? How much memory is in your machine? On Thu, Sep 8, 2011 at 7:27 PM, Brandon Phelps bphe...@gls.com wrote: Thanks for the reply Andy. Unfortunately the users will be selecting varying date ranges and new data is constantly coming in, so I am not sure how I could archive/cache the necessary data that would be any more efficient than simply using the database directly. On 09/08/2011 02:16 PM, Andrew Moore wrote: Thinking outside the query, is there any archiving that could happen to make your large tables kinder in the range scan? Andy On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelps bphe...@gls.com wrote: On 09/01/2011 01:32 PM, Brandon Phelps wrote: On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote: On 9/1/2011 09:42, Brandon Phelps wrote: On 09/01/2011 04:59 AM, Jochem van Dieten wrote: ... WHERE (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00') AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53') In that case your logic here simplifies to: WHERE open_dt >= '2011-08-30 00:00:00' AND close_dt <= '2011-08-30 12:36:53' Now add an index over open_dt and close_dt and see what happens. Jochem Jochem, I can't really use your WHERE logic because I also need to retrieve results where the open_dt time is out of the range specified. For example, a very large file download might span multiple days, so given your logic, if the connection was started 2 days ago and I want to pull 1 day's worth of connections, I would miss that entry. Basically I want to SELECT all of the records that were opened
Re: Query Optimization
Mihail, I have considered this but have not yet determined how best to go about partitioning the table. I don't think partitioning by dst_address or src_address would help because most of the queries do not filter on IP address (except very specific queries where the end-user is searching the table for history on a particular employee). I could potentially partition the table based on the day of week the connection was opened on, which may improve performance for a while since this would take me from a single 32 million record table down to roughly 4.5 million records per partition (32 / 7), however we are looking to eventually store up to 2 months worth of data in the table, and so far that 32 million records is only for 1 month, so I estimate another 32 million-ish before the month is out, bringing me to roughly 70 million records total (it would be nice if I could store even more than 2 months, but given my current performance dilemma I don't expect that to happen). Also this does not take into account that the end-user will often be pulling data for multiple days at a time, meaning that multiple partitions in this scheme will need to be accessed anyway. The only other logical partitioning scheme I can think of would be to partition based on dst_port (the port the log relates to) but the majority of records are all to port 80 (standard web traffic) so I don't think this would be all that helpful. I have never messed with partitioning so it is totally possible that I am not thinking of something, so if you have any ideas on a decent partitioning scheme based on my criteria and queries below, please let me know. Thanks, Brandon On 09/08/2011 02:47 PM, Mihail Manolov wrote: If you're running version 5.1+ you may wanna take a look at table partitioning options you may have. On Sep 8, 2011, at 2:27 PM, Brandon Phelps wrote: Thanks for the reply Andy.
Unfortunately the users will be selecting varying date ranges and new data is constantly coming in, so I am not sure how I could archive/cache the necessary data that would be any more efficient than simply using the database directly. On 09/08/2011 02:16 PM, Andrew Moore wrote: Thinking outside the query, is there any archiving that could happen to make your large tables kinder in the range scan? Andy On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelps bphe...@gls.com wrote: On 09/01/2011 01:32 PM, Brandon Phelps wrote: On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote: On 9/1/2011 09:42, Brandon Phelps wrote: On 09/01/2011 04:59 AM, Jochem van Dieten wrote: ... WHERE (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00') AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53') In that case your logic here simplifies to: WHERE open_dt >= '2011-08-30 00:00:00' AND close_dt <= '2011-08-30 12:36:53' Now add an index over open_dt and close_dt and see what happens. Jochem Jochem, I can't really use your WHERE logic because I also need to retrieve results where the open_dt time is out of the range specified. For example, a very large file download might span multiple days, so given your logic, if the connection was started 2 days ago and I want to pull 1 day's worth of connections, I would miss that entry. Basically I want to SELECT all of the records that were opened OR closed during the specified time period, ie. if any activity happened between my start and end dates, I need to see that record. Any other ideas? I believe Jochem was on the right track but he got his dates reversed. Let's try a little ASCII art to show the situation. I will setup a query window with two markers (s) and (e). Events will be marked by |---| markers showing their durations.
a)        (s)          (e)
b) |---|
c)      |------|
d)            |---|
e)    |---------------------|
f)                   |------|
g)                          |---|

To describe these situations: a) is the window for which you want to query; (s) is the starting time and (e) is the ending time for the date range you are interested in. b) the event starts and stops before your window exists. It won't be part of your results. c) the event starts before the window but ends within the window - include this d) the event starts and ends within the window - include this e) the event starts before the window and ends after the window - include this f) the event starts inside the window but ends beyond the window - include this. g) the event starts and ends beyond the window - exclude this. In order to get every event in the range of c-f, here is what you need for a WHERE clause: WHERE start <= (ending time) AND end >= (starting time) Try that and let us know the results. Thanks Jochem and Shawn, however the following two queries result in the exact same EXPLAIN output: (I hope the tables don't wrap too early for you) Old method: SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS
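Shawn's interval-overlap rule can be sanity-checked outside SQL. The sketch below is illustrative only (the event tuples and integer "times" are invented to mirror cases b through g of the ASCII diagram), but it shows that the single predicate `start <= (ending time) AND end >= (starting time)` selects exactly the events that touch the window:

```python
def overlaps(start, end, s, e):
    """True when event [start, end] intersects the query window [s, e]."""
    return start <= e and end >= s

# Query window and events as (name, start, end); integers stand in for datetimes.
s, e = 10, 20
events = [
    ("b", 1, 5),    # ends before the window       -> exclude
    ("c", 8, 12),   # straddles the window start   -> include
    ("d", 12, 15),  # entirely inside the window   -> include
    ("e", 5, 25),   # spans the whole window       -> include
    ("f", 18, 25),  # straddles the window end     -> include
    ("g", 22, 30),  # starts after the window      -> exclude
]

matched = [name for name, start, end in events if overlaps(start, end, s, e)]
print(matched)  # ['c', 'd', 'e', 'f']
```

The two comparisons in `overlaps` correspond directly to `open_dt <= (ending time) AND close_dt >= (starting time)` in the thread's schema.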
Re: Query Optimization
Thanks for the idea Derek, however given the following query my EXPLAIN output is identical: SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM sonicwall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port WHERE ('2011-09-07 13:18:58' <= open_dt AND open_dt <= '2011-09-08 13:18:58') OR ('2011-09-07 13:18:58' <= close_dt AND close_dt <= '2011-09-08 13:18:58');

| id | select_type | table | type   | possible_keys | key     | key_len | ref                | rows     | Extra       |
| 1  | SIMPLE      | sc    | ALL    | NULL          | NULL    | NULL    | NULL               | 32393330 | Using where |
| 1  | SIMPLE      | spm   | eq_ref | PRIMARY       | PRIMARY | 2       | syslog.sc.src_port | 1        |             |
| 1  | SIMPLE      | dpm   | eq_ref | PRIMARY       | PRIMARY | 2       | syslog.sc.dst_port | 1        |             |

I did create indexes on open_dt and close_dt (2 separate indexes). On 09/08/2011 02:55 PM, Derek Downey wrote: Correct me if I'm wrong. You're wanting to get all records that have an open_date or a close_date between two times. If that's correct, you might be able to get an index_merge by doing a query like: WHERE ((starting time) <= open_dt AND open_dt <= (ending time)) OR ((starting time) <= close_dt AND close_dt <= (ending time)) and creating two indexes (one on 'open_dt' and the other on 'close_dt') http://dev.mysql.com/doc/refman/5.1/en/index-merge-optimization.html Regards, Derek On Sep 8, 2011, at 2:50 PM, Brandon Phelps wrote: Andy, The queries take minutes to run. MySQL is 5.1.54 and it's running on Ubuntu server 11.04. Unfortunately the machine only has 2GB of RAM but no other major daemons are running on the machine. We are running RAID 1 (mirroring) with 1TB drives. The tables in question here are all MyISAM.
When running with the LIMIT 10 my EXPLAIN is:

| id | select_type | table | type   | possible_keys       | key                 | key_len | ref                | rows     | Extra                       |
| 1  | SIMPLE      | sc    | range  | ndx_open_close_rcvd | ndx_open_close_rcvd | 8       | NULL               | 32393316 | Using where; Using filesort |
| 1  | SIMPLE      | spm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.src_port | 1        |                             |
| 1  | SIMPLE      | dpm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.dst_port | 1        |                             |

When I remove the LIMIT 10 I get:

| id | select_type | table | type   | possible_keys       | key     | key_len | ref                | rows     | Extra                       |
| 1  | SIMPLE      | sc    | ALL    | ndx_open_close_rcvd | NULL    | NULL    | NULL               | 32393330 | Using where; Using filesort |
| 1  | SIMPLE      | spm   | eq_ref | PRIMARY             | PRIMARY | 2       | syslog.sc.src_port | 1        |                             |
| 1  | SIMPLE      | dpm   | eq_ref | PRIMARY             | PRIMARY | 2       | syslog.sc.dst_port | 1        |                             |

Thanks for all your help thus far. On 09/08/2011 02:38 PM, Andrew Moore wrote: I don't think I saw any query timings in the emails (maybe I missed them). What version of MySQL are you currently using? What does the explain look like when you remove the limit 10? Is your server tuned for MyISAM or InnoDB? What kind of disk setup is in use? How much memory is
Re: Query Optimization
Partitioning isn't a bad idea for this, however I'm still thinking about your dataset size and possible hardware limitations. It's not likely going to fit into relevant buffers/memory, so you're going to be on disk more than you want. You're probably creating temporary tables like crazy and I would bet that there are a good load of them heading to disk too. With your IO performance limited to a small amount of disks as you describe, you're not going to be able to get much more from these queries. Although it's a dedicated DB server, are there other databases being accessed on the server? When looking at the scope of your data, are you capturing more than you need? How often and how far back are the users querying? How many users concurrently performing queries on the 32m record table? On Thu, Sep 8, 2011 at 8:04 PM, Brandon Phelps bphe...@gls.com wrote: Mihail, I have considered this but have not yet determined how best to go about partitioning the table. I don't think partitioning by dst_address or src_address would help because most of the queries do not filter on IP address (except very specific queries where the end-user is searching the table for history on a particular employee). I could potentially partition the table based on the day of week the connection was opened on, which may improve performance for a while since this would take me from a single 32 million record table down to roughly 4.5 million records per partition (32 / 7), however we are looking to eventually store up to 2 months worth of data in the table, and so far that 32 million records is only for 1 month, so I estimate another 32 million-ish before the month is out, bringing me to roughly 70 million records total (it would be nice if I could store even more than 2 months, but given my current performance dilemma I don't expect that to happen).
Also this does not take into account that the end-user will often be pulling data for multiple days at a time, meaning that multiple partitions in this scheme will need to be accessed anyway. The only other logical partitioning scheme I can think of would be to partition based on dst_port (the port the log relates to) but the majority of records are all to port 80 (standard web traffic) so I don't think this would be all that helpful. I have never messed with partitioning so it is totally possible that I am not thinking of something, so if you have any ideas on a decent partitioning scheme based on my criteria and queries below, please let me know. Thanks, Brandon On 09/08/2011 02:47 PM, Mihail Manolov wrote: If you're running version 5.1+ you may wanna take a look at table partitioning options you may have. On Sep 8, 2011, at 2:27 PM, Brandon Phelps wrote: Thanks for the reply Andy. Unfortunately the users will be selecting varying date ranges and new data is constantly coming in, so I am not sure how I could archive/cache the necessary data that would be any more efficient than simply using the database directly. On 09/08/2011 02:16 PM, Andrew Moore wrote: Thinking outside the query, is there any archiving that could happen to make your large tables kinder in the range scan? Andy On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelps bphe...@gls.com wrote: On 09/01/2011 01:32 PM, Brandon Phelps wrote: On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote: On 9/1/2011 09:42, Brandon Phelps wrote: On 09/01/2011 04:59 AM, Jochem van Dieten wrote: ... WHERE (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00') AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53') In that case your logic here simplifies to: WHERE open_dt >= '2011-08-30 00:00:00' AND close_dt <= '2011-08-30 12:36:53' Now add an index over open_dt and close_dt and see what happens.
Jochem Jochem, I can't really use your WHERE logic because I also need to retrieve results where the open_dt time is out of the range specified. For example, a very large file download might span multiple days, so given your logic, if the connection was started 2 days ago and I want to pull 1 day's worth of connections, I would miss that entry. Basically I want to SELECT all of the records that were opened OR closed during the specified time period, ie. if any activity happened between my start and end dates, I need to see that record. Any other ideas? I believe Jochem was on the right track but he got his dates reversed. Let's try a little ASCII art to show the situation. I will setup a query window with two markers (s) and (e). Events will be marked by |---| markers showing their durations.

a)        (s)          (e)
b) |---|
c)      |------|
d)            |---|
e)    |---------------------|
f)                   |------|
g)                          |---|

To describe these situations: a) is the window for which you want to query; (s) is the starting time and (e) is the ending time for the date range you are interested in. b) the event starts and stops before your window exists. It won't be
Re: Query Optimization
How about: SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM sonicwall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58' UNION SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM sonicwall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58' On Sep 8, 2011, at 3:25 PM, Brandon Phelps wrote: Thanks for the idea Derek, however given the following query my EXPLAIN output is identical: SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM sonicwall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port WHERE ('2011-09-07 13:18:58' <= open_dt AND open_dt <= '2011-09-08 13:18:58') OR ('2011-09-07 13:18:58' <= close_dt AND close_dt <= '2011-09-08 13:18:58');

| id | select_type | table | type   | possible_keys | key     | key_len | ref                | rows     | Extra       |
| 1  | SIMPLE      | sc    | ALL    | NULL          | NULL    | NULL    | NULL               | 32393330 | Using where |
| 1  | SIMPLE      | spm   | eq_ref | PRIMARY       | PRIMARY | 2       | syslog.sc.src_port | 1        |             |
| 1  | SIMPLE      | dpm   | eq_ref | PRIMARY       | PRIMARY | 2       | syslog.sc.dst_port | 1        |             |

I did create indexes on open_dt and close_dt (2 separate
indexes). On 09/08/2011 02:55 PM, Derek Downey wrote: Correct me if I'm wrong. You're wanting to get all records that have an open_date or a close_date between two times. If that's correct, you might be able to get an index_merge by doing a query like: WHERE ((starting time) <= open_dt AND open_dt <= (ending time)) OR ((starting time) <= close_dt AND close_dt <= (ending time)) and creating two indexes (one on 'open_dt' and the other on 'close_dt') http://dev.mysql.com/doc/refman/5.1/en/index-merge-optimization.html Regards, Derek On Sep 8, 2011, at 2:50 PM, Brandon Phelps wrote: Andy, The queries take minutes to run. MySQL is 5.1.54 and it's running on Ubuntu server 11.04. Unfortunately the machine only has 2GB of RAM but no other major daemons are running on the machine. We are running RAID 1 (mirroring) with 1TB drives. The tables in question here are all MyISAM. When running with the LIMIT 10 my EXPLAIN is:

| id | select_type | table | type   | possible_keys       | key                 | key_len | ref                | rows     | Extra                       |
| 1  | SIMPLE      | sc    | range  | ndx_open_close_rcvd | ndx_open_close_rcvd | 8       | NULL               | 32393316 | Using where; Using filesort |
| 1  | SIMPLE      | spm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.src_port | 1        |                             |
| 1  | SIMPLE      | dpm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.dst_port | 1        |                             |

When I remove the LIMIT 10 I get:
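The point of Mihail's rewrite in the next message is that the OR across two columns forces a full scan, while each branch of a UNION can use its own single-column index, and the two result sets are the same rows. A toy sqlite3 stand-in (not MySQL; table trimmed to three columns, rows invented) checks that equivalence:

```python
import sqlite3

# In-memory stand-in for sonicwall_connections with ISO-8601 text datetimes.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sonicwall_connections (id INTEGER, open_dt TEXT, close_dt TEXT)")
con.executemany("INSERT INTO sonicwall_connections VALUES (?, ?, ?)", [
    (1, "2011-09-06 08:00:00", "2011-09-06 09:00:00"),  # neither dt in range
    (2, "2011-09-07 14:00:00", "2011-09-07 15:00:00"),  # both dts in range
    (3, "2011-09-06 14:00:00", "2011-09-07 15:00:00"),  # only close_dt in range
    (4, "2011-09-08 10:00:00", "2011-09-09 15:00:00"),  # only open_dt in range
])
lo, hi = "2011-09-07 13:18:58", "2011-09-08 13:18:58"

# One query with OR across both columns...
or_rows = con.execute(
    "SELECT id FROM sonicwall_connections "
    "WHERE open_dt BETWEEN ? AND ? OR close_dt BETWEEN ? AND ? ORDER BY id",
    (lo, hi, lo, hi)).fetchall()

# ...versus a UNION of two single-column range queries.
union_rows = con.execute(
    "SELECT id FROM sonicwall_connections WHERE open_dt BETWEEN ? AND ? "
    "UNION "
    "SELECT id FROM sonicwall_connections WHERE close_dt BETWEEN ? AND ? "
    "ORDER BY id",
    (lo, hi, lo, hi)).fetchall()

print(or_rows == union_rows, or_rows)  # True [(2,), (3,), (4,)]
```

Row 2 satisfies both branches but still appears once, since UNION deduplicates; the semantics match the OR form exactly.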
Re: Query Optimization
Andrew, Generally there is only 1 user performing the complicated SELECT query at a time, however the background process that fills the table is constantly doing a fast SELECT (0.3 seconds) and a subsequent UPDATE. Basically whenever a connection is closed on the firewall, the bg process SELECTS from the table the last record that was opened (and hasn't already been closed) for a given source IP, source port, dest ip, and protocol. So for example whenever the firewall logs a CONNECTION CLOSED message, the bg process does: select id from sonicwall_connections where src_address = w.x.y.z and src_port = x and dst_address = w.x.y.z and protocol = x ORDER BY open_dt LIMIT 0, 1 then it UPDATES the close_dt column for the record with the selected ID. These select/update statements being run by the background process generally take 0.000x seconds each. The only slowdown we see is with the complicated SELECT I have below, and this is only executed once in a while throughout the day. For example, if an IT manager sees a huge bandwidth spike on the network, he may access the web gui to determine what the source IP is so he can determine who is downloading a large file, etc. I think what I will try to do is create 60 partitions for the table in question based on month and day. This way each day will go into its own partition and if someone runs the query for a date range such as 01/01/2011 - 01/07/2011 it will only need to access 7 partitions instead of the entire table. My only question with this is how would I go about creating the table with 60 partitions in such a way that I won't need to constantly add/remove new/old partitions every day? Is there any partitioning function I can use that would not require me to update the partitions schema every day? I already plan to have a cron run each night to purge records older than 60 days from the database.
On 09/08/2011 03:26 PM, Andrew Moore wrote: Partitioning isn't a bad idea for this however I'm still thinking about your dataset size and possible hardware limitations. It's not likely going to fit into relevant buffers/memory so you're going to be on disk more then you want. You're probably creating temporary tables like crazy and I would bet that there are a good load of them heading to disk too. With your IO performance limited to a small amount of disks as you describe, you're not going to be able to get much more from these queries. Although a dedicated DB server are there other databases been accessed on the server? When looking at the scope of your data, are you capturing more then you need? How often and how far back are the users querying? How many users concurrently performing queries on the 32m record table? On Thu, Sep 8, 2011 at 8:04 PM, Brandon Phelpsbphe...@gls.com wrote: Mihail, I have considered this but have not yet determined how best to go about partitioning the table. I don't think partitioning by dst_address or src_address would help because most of the queries do not filter on IP address (except very specific queries where the end-user is searching the table for history on a particular employee). I could potentially partition the table based on the day of week the connection was opened on which may improve performance for a while since this would take me from a single 32million record table down to roughly 4.5 million records per partition (32 / 7) however we are looking to eventually store up to 2 months worth of data in the table, and so far that 32 million records is only for 1 month, so I estimate another 32 million-ish before the month is out, bringing me to roughly 70 million records total (it would be nice if I could store even more than 2 months, but given my currently performance dilemma I don't expect that to happen). 
Also this does not take into account that the end-user will often be pulling data for multiple days at a time, meaning that multiple partitions in this scheme will need to be accessed anyway. The only other logical partitioning scheme I can think of would be to partition based on dst_port (the port the log relates to) but the majority of records are all to port 80 (standard web traffic) so I don't think this would be all that helpful. I have never messed with partitioning so it is totally possible that I am not thinking of something, so if you have any ideas on a decent partitioning scheme based on my criteria and queries below, please let me know. Thanks, Brandon On 09/08/2011 02:47 PM, Mihail Manolov wrote: If you're running version 5.1+ you may wanna take a look at table partitioning options you may have. On Sep 8, 2011, at 2:27 PM, Brandon Phelps wrote: Thanks for the reply Andy. Unfortunately the users will be selecting varying date ranges and new data is constantly coming in, so I am not sure how I could archive/cache the necessary data that would be any more efficient than simply using the database directly. On 09/08/2011 02:16 PM, Andrew Moore wrote:
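For the daily-partition idea discussed above, here is a hedged sketch of what MySQL 5.1 RANGE partitioning on the open date could look like (the table is trimmed to a few columns and the partition names and dates are invented). RANGE partitions on TO_DAYS() cannot rotate themselves in 5.1, so the nightly cron that purges old rows would also DROP the oldest partition and split off tomorrow's from the catch-all:

```sql
-- Sketch only: daily RANGE partitions on open_dt for a trimmed-down table.
CREATE TABLE sonicwall_connections (
    open_dt     DATETIME,
    close_dt    DATETIME,
    src_address INT UNSIGNED,
    dst_port    SMALLINT UNSIGNED
)
PARTITION BY RANGE (TO_DAYS(open_dt)) (
    PARTITION p20110907 VALUES LESS THAN (TO_DAYS('2011-09-08')),
    PARTITION p20110908 VALUES LESS THAN (TO_DAYS('2011-09-09')),
    PARTITION pmax      VALUES LESS THAN MAXVALUE
);

-- Nightly rotation, instead of a fixed 60-partition schema:
-- drop the oldest day, then carve tomorrow's partition out of pmax.
ALTER TABLE sonicwall_connections DROP PARTITION p20110907;
ALTER TABLE sonicwall_connections
    REORGANIZE PARTITION pmax INTO (
        PARTITION p20110909 VALUES LESS THAN (TO_DAYS('2011-09-10')),
        PARTITION pmax      VALUES LESS THAN MAXVALUE
    );
```

DROP PARTITION also deletes the rows in it, which would replace the planned row-by-row 60-day purge; note that pruning here only helps queries that constrain open_dt, not ones that filter solely on close_dt.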
Re: Query Optimization
Mihail, Thanks so much! I modified your example to include the proper ORDER BY and LIMIT clauses and this, so far, is running super fast (0.0007 seconds). Question, if a record's open_dt is between the range AND the close_dt is between the range as well, will the UNION output the record twice? If so, is there any way to prevent that? (SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM sonicwall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58') UNION (SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM sonicwall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58') ORDER BY rcvd DESC LIMIT 10;

| id   | select_type  | table    | type   | possible_keys                              | key          | key_len | ref                | rows | Extra          |
| 1    | PRIMARY      | sc       | range  | open_dt,ndx_open_close_rcvd,ndx_open_close | open_dt      | 8       | NULL               | 1057 | Using where    |
| 1    | PRIMARY      | spm      | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.src_port | 1    |                |
| 1    | PRIMARY      | dpm      | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.dst_port | 1    |                |
| 2    | UNION        | sc       | range  | ndx_close_dt                               | ndx_close_dt | 8       | NULL               | 1131 | Using where    |
| 2    | UNION        | spm      | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.src_port | 1    |                |
| 2    | UNION        | dpm      | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.dst_port | 1    |                |
| NULL | UNION RESULT | union1,2 | ALL    | NULL                                       | NULL         | NULL    | NULL               | NULL | Using filesort |

On 09/08/2011 03:45 PM, Mihail Manolov wrote: How about: SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM sonicwall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58' UNION SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM sonicwall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58' On Sep 8, 2011, at 3:25 PM, Brandon Phelps wrote: Thanks for the idea Derek, however given the following query my EXPLAIN output is identical: SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM sonicwall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings
Re: Query Optimization
From the manual: The default behavior for UNION is that duplicate rows are removed from the result. On Sep 8, 2011, at 4:50 PM, Brandon Phelps wrote: Mihail, Thanks so much! I modified your example to include the proper ORDER BY and LIMIT clauses and this, so far, is running super fast (0.0007 seconds). Question, if a record's open_dt is between the range AND the close_dt is between the range as well, will the UNION output the record twice? If so, is there any way to prevent that? (SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM sonicwall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58') UNION (SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM sonicwall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58') ORDER BY rcvd DESC LIMIT 10;

| id   | select_type  | table    | type   | possible_keys                              | key          | key_len | ref                | rows | Extra          |
| 1    | PRIMARY      | sc       | range  | open_dt,ndx_open_close_rcvd,ndx_open_close | open_dt      | 8       | NULL               | 1057 | Using where    |
| 1    | PRIMARY      | spm      | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.src_port | 1    |                |
| 1    | PRIMARY      | dpm      | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.dst_port | 1    |                |
| 2    | UNION        | sc       | range  | ndx_close_dt                               | ndx_close_dt | 8       | NULL               | 1131 | Using where    |
| 2    | UNION        | spm      | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.src_port | 1    |                |
| 2    | UNION        | dpm      | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.dst_port | 1    |                |
| NULL | UNION RESULT | union1,2 | ALL    | NULL                                       | NULL         | NULL    | NULL               | NULL | Using filesort |

On 09/08/2011 03:45 PM, Mihail Manolov wrote: How about: SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM sonicwall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58' UNION SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM sonicwall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58' On Sep 8, 2011, at 3:25 PM, Brandon Phelps wrote: Thanks for the idea Derek, however given the following query my EXPLAIN output is identical: SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM
Re: Query Optimization
Ah I see. Well thanks for your assistance!

-Brandon

On 09/08/2011 05:21 PM, Mihail Manolov wrote: From the manual: The default behavior for UNION is that duplicate rows are removed from the result. snip
Re: Query Optimization
On Aug 30, 2011 6:46 PM, Brandon Phelps wrote: SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA( sc.src_address ) AS src_address, sc.src_port, INET_NTOA( sc.dst_address ) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM firewall_connections AS sc

If this is a firewall connection log I presume open_dt is the time a connection was opened and is always going to be less than close_dt. Right?

WHERE (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00') AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')

In that case your logic here simplifies to:

WHERE open_dt >= '2011-08-30 00:00:00' AND close_dt <= '2011-08-30 12:36:53'

Now add an index over open_dt and close_dt and see what happens.

Jochem
Re: Query Optimization
On 09/01/2011 04:59 AM, Jochem van Dieten wrote: SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA( sc.src_address ) AS src_address, sc.src_port, INET_NTOA( sc.dst_address ) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM firewall_connections AS sc If this is a firewall connection log I presume open_dt is the time a connection was opened and is always going to be less than close_dt. Right? WHERE (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00') AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53') In that case your logic here simplifies to: WHERE open_dt >= '2011-08-30 00:00:00' AND close_dt <= '2011-08-30 12:36:53' Now add an index over open_dt and close_dt and see what happens. Jochem

Jochem, I can't really use your WHERE logic because I also need to retrieve results where the open_dt time is out of the range specified. For example, a very large file download might span multiple days, so given your logic, if the connection was started 2 days ago and I want to pull 1 day's worth of connections, I would miss that entry. Basically I want to SELECT all of the records that were opened OR closed during the specified time period, i.e. if any activity happened between my start and end dates, I need to see that record.

Any other ideas?

Thanks again, Brandon

-- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql?unsub=arch...@jab.org
Re: Query Optimization
On 9/1/2011 09:42, Brandon Phelps wrote: On 09/01/2011 04:59 AM, Jochem van Dieten wrote: ... WHERE (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00') AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53') In that case your logic here simplifies to: WHERE open_dt >= '2011-08-30 00:00:00' AND close_dt <= '2011-08-30 12:36:53' Now add an index over open_dt and close_dt and see what happens. Jochem

Jochem, I can't really use your WHERE logic because I also need to retrieve results where the open_dt time is out of the range specified. For example, a very large file download might span multiple days so given your logic if the connection was started 2 days ago and I want to pull 1 days worth of connections, I would miss that entry. Basically I want to SELECT all of the records that were opened OR closed during the specified time period, ie. if any activity happened between my start and end dates, I need to see that record. Any other ideas?

I believe Jochem was on the right track but he got his dates reversed. Let's try a little ASCII art to show the situation. I will set up a query window with two markers (s) and (e). Events will be marked by |---| markers showing their durations.

a)           (s)                  (e)
b)   |---|
c)         |-----|
d)                  |-----|
e)         |--------------------------|
f)                             |-----|
g)                                       |---|

To describe these situations:

a) is the window for which you want to query. (s) is the starting time and (e) is the ending time for the date range you are interested in.
b) the event starts and stops before your window exists. It won't be part of your results.
c) the event starts before the window but ends within the window - include this
d) the event starts and ends within the window - include this
e) the event starts before the window and ends after the window - include this
f) the event starts inside the window but ends beyond the window - include this.
g) the event starts and ends beyond the window - exclude this.

In order to get every event in the range of c-f, here is what you need for a WHERE clause:

WHERE start <= (ending time) AND end >= (starting time)

Try that and let us know the results.

--
Shawn Green
MySQL Principal Technical Support Engineer
Oracle USA, Inc. - Hardware and Software, Engineered to Work Together.
Office: Blountville, TN
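Shawn's overlap rule can be sanity-checked outside MySQL. A minimal sketch in Python (the events are hypothetical sample rows, one per case b-g from the diagram, not data from this thread), applying `start <= window_end AND end >= window_start`:

```python
from datetime import datetime

# Query window: (s) and (e) markers.
s = datetime(2011, 8, 30, 0, 0, 0)
e = datetime(2011, 8, 30, 12, 36, 53)

# One hypothetical event per case b..g as (open_dt, close_dt).
events = {
    "b": (datetime(2011, 8, 29, 1, 0), datetime(2011, 8, 29, 2, 0)),    # before window
    "c": (datetime(2011, 8, 29, 23, 0), datetime(2011, 8, 30, 1, 0)),   # ends inside
    "d": (datetime(2011, 8, 30, 2, 0), datetime(2011, 8, 30, 3, 0)),    # fully inside
    "e": (datetime(2011, 8, 29, 23, 0), datetime(2011, 8, 30, 13, 0)),  # spans window
    "f": (datetime(2011, 8, 30, 12, 0), datetime(2011, 8, 30, 14, 0)),  # starts inside
    "g": (datetime(2011, 8, 30, 13, 0), datetime(2011, 8, 30, 14, 0)),  # after window
}

# The overlap predicate: start <= (ending time) AND end >= (starting time).
included = sorted(k for k, (open_dt, close_dt) in events.items()
                  if open_dt <= e and close_dt >= s)
print(included)  # -> ['c', 'd', 'e', 'f']
```

Cases b and g fall out, c through f stay in, which is exactly the set the diagram says a "any activity in the window" query should return.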
Re: Query Optimization
On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote: On 9/1/2011 09:42, Brandon Phelps wrote: snip I believe Jochem was on the right track but he got his dates reversed. snip In order to get every event in the range of c-f, here is what you need for a WHERE clause: WHERE start <= (ending time) AND end >= (starting time) Try that and let us know the results.

Thanks Jochem and Shawn, however the following two queries result in the exact same EXPLAIN output: (I hope the tables don't wrap too early for you)

Old method:

SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM firewall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port WHERE (open_dt >= '2011-08-31 09:53:31' OR close_dt >= '2011-08-31 09:53:31') AND (open_dt <= '2011-09-01 09:53:31' OR close_dt <= '2011-09-01 09:53:31') ORDER BY rcvd DESC LIMIT 0, 10;

New method with BTREE index on open_dt, close_dt (index name is ndx_open_close_dt):

SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM firewall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port WHERE open_dt <= '2011-09-01 09:53:31' AND close_dt >= '2011-08-31 09:53:31' ORDER BY rcvd DESC LIMIT 0, 10;

EXPLAIN output for old method:

+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
| id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
|  1 | SIMPLE      | sc    | index  | open_dt,ndx_open_close_dt | ndx_rcvd | 4       | NULL               |   10 | Using where |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.src_port |    1 |             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.dst_port |    1 |             |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+

EXPLAIN output for new method with new index:

+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
| id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
Query Optimization
Hello, I have the following query I'd like to optimize a bit:

SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA( sc.src_address ) AS src_address, sc.src_port, INET_NTOA( sc.dst_address ) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM firewall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port WHERE (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00') AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53') ORDER BY rcvd DESC LIMIT 0, 10

Currently I have an index on the rcvd column which seems to be working based on the output of EXPLAIN:

id  select_type  table  type    possible_keys  key       key_len  ref                 rows  Extra
1   SIMPLE       sc     index   open_dt        ndx_rcvd  4        NULL                10    Using where
1   SIMPLE       spm    eq_ref  PRIMARY        PRIMARY   2        syslog.sc.src_port  1
1   SIMPLE       dpm    eq_ref  PRIMARY        PRIMARY   2        syslog.sc.dst_port  1

However the query is still fairly slow for some reason, any ideas how I could speed it up with additional indexes, etc? The values I am using in the WHERE clause are variable and change each time. The table has around 23 million records right now but is expected to continue to grow up to a potential 150 million.

Here is the table schema:

CREATE TABLE IF NOT EXISTS `firewall_connections` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `open_dt` datetime NOT NULL,
  `close_dt` datetime NOT NULL,
  `protocol` smallint(6) NOT NULL,
  `src_address` int(10) unsigned NOT NULL,
  `src_port` smallint(5) unsigned NOT NULL,
  `dst_address` int(10) unsigned NOT NULL,
  `dst_port` smallint(5) unsigned NOT NULL,
  `sent` int(10) unsigned NOT NULL,
  `rcvd` int(10) unsigned NOT NULL,
  PRIMARY KEY (`id`),
  KEY `ndx_rcvd` (`rcvd`),
  KEY `ndx_sent` (`sent`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

Thanks in advance!

-- Brandon
Re: Query Optimization
There are a few things going on, but mainly it is the ORs that are killing you. As you require OR to examine two distinct columns, both of equal relevance to the query, MySQL is left with no choice but to do a full table scan on what might be (at a guess) a very large table. No amount of indexing will fix this for the query presented.

You would be better off writing it as two distinct queries, each concerned with conditions on a single column (open_dt and close_dt), and then UNIONing the results. In this form, the indexes have a chance of being engaged.

Once the indexes are engaged, you probably want to convert your DATE/DATETIME strings into actual DATEs or DATETIMEs, thus: ... (open_dt >= DATE('2011-08-30 00:00:00')) ... In its current state, the DATE fields are being converted to strings implicitly for every row tested, which further frustrates index usage, as the index is against the quasi-numeric DATE, not the string representation which your current implementation appears to expect. This query would also suddenly begin to fail entirely if the DEFAULT_DATE_FORMAT gets modified by an admin or a future release of MySQL. The explicit casting I have suggested will protect you against that.

- michael dykman

On Tue, Aug 30, 2011 at 12:45 PM, Brandon Phelps bphe...@gls.com wrote: Hello, I have the following query I'd like to optimize a bit: snip

--
 - michael dykman
 - mdyk...@gmail.com

May the Source be with you.
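The OR-into-UNION rewrite suggested above is easy to demonstrate in miniature. A sketch using SQLite from Python (table and rows invented for illustration; MySQL's optimizer behaves differently, but the result-set equivalence of the two forms is the point):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE firewall_connections (
    id INTEGER PRIMARY KEY,
    open_dt TEXT NOT NULL,
    close_dt TEXT NOT NULL
);
CREATE INDEX ndx_open_dt  ON firewall_connections (open_dt);
CREATE INDEX ndx_close_dt ON firewall_connections (close_dt);
INSERT INTO firewall_connections (open_dt, close_dt) VALUES
    ('2011-08-29 01:00:00', '2011-08-29 02:00:00'),
    ('2011-08-29 23:00:00', '2011-08-30 01:00:00'),
    ('2011-08-30 02:00:00', '2011-08-30 03:00:00'),
    ('2011-08-30 13:00:00', '2011-08-30 14:00:00');
""")

s, e = '2011-08-30 00:00:00', '2011-08-30 12:36:53'

# Original form: ORs across two columns, which defeats single-column indexes.
with_or = conn.execute(
    "SELECT id FROM firewall_connections "
    "WHERE (open_dt BETWEEN ? AND ?) OR (close_dt BETWEEN ? AND ?) "
    "ORDER BY id", (s, e, s, e)).fetchall()

# Rewrite: one indexable range per SELECT; UNION removes duplicate rows.
with_union = conn.execute(
    "SELECT id FROM firewall_connections WHERE open_dt BETWEEN ? AND ? "
    "UNION "
    "SELECT id FROM firewall_connections WHERE close_dt BETWEEN ? AND ? "
    "ORDER BY id", (s, e, s, e)).fetchall()

print(with_or == with_union)  # -> True: same rows either way
```

A row matching both branches (opened and closed inside the window) appears once, because UNION without ALL deduplicates, which is the behavior Mihail later quotes from the manual.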
Query Optimization
Hello all, I am using the query below and variations of it to query a database with a TON of records. Currently the database has around 11 million records but it grows every day and should cap out at around 150 million. I am curious if there is any way I can better optimize the below query, as currently it takes this query around 10 seconds to run but I am sure this will get slower and slower as the database grows.

SELECT open_dt, close_dt, protocol, INET_NTOA(src_address) AS src_address, src_port, INET_NTOA(dst_address) AS dst_address, dst_port, sent, rcvd FROM connections WHERE dst_port = 80 ORDER BY close_dt DESC LIMIT 0, 30

I do have an index on the dst_port column, as you can see by the output of EXPLAIN:

id             1
select_type    SIMPLE
table          connections
type           ref
possible_keys  dst_port
key            dst_port
key_len        2
ref            const
rows           1109273
Extra          Using where; Using filesort

Thanks in advance,

-- Brandon
Re: Query Optimization
Thanks Singer, this took my query down to 0.0007, perfect! I wasn't aware a single index of multiple columns would work when one of the columns was in the WHERE clause and the other in the ORDER BY clause. Learn something new every day I guess!

On 08/10/2011 02:03 PM, Singer X.J. Wang wrote: Try a index on (dst_port,close_dt) On Wed, Aug 10, 2011 at 14:01, Brandon Phelps bphe...@gls.com wrote: Hello all, I am using the query below and variations of it to query a database with a TON of records. snip

--
The best compliment you could give Pythian for our service is a referral.
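The effect of Singer's (dst_port, close_dt) index can be reproduced in miniature with SQLite (schema trimmed down to the two relevant columns; MySQL's EXPLAIN looks different, but the principle, one composite index covering both the WHERE equality and the ORDER BY, is the same):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE connections (
    id INTEGER PRIMARY KEY,
    dst_port INTEGER NOT NULL,
    close_dt TEXT NOT NULL
);
CREATE INDEX ndx_port_close ON connections (dst_port, close_dt);
""")

plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT id, close_dt FROM connections "
    "WHERE dst_port = 80 ORDER BY close_dt DESC LIMIT 30").fetchall()
plan_text = " ".join(row[-1] for row in plan)

# The dst_port = 80 equality pins the index prefix; within that prefix the
# close_dt suffix is already sorted, so the plan shows a search on
# ndx_port_close and no "USE TEMP B-TREE FOR ORDER BY" (SQLite's analogue
# of MySQL's filesort).
print(plan_text)
```

With only the single-column dst_port index of the original post, the same query needs a separate sort step over every matching row, which is the "Using filesort" in Brandon's EXPLAIN.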
Re: Query Optimization
On 8/10/2011 1:01 PM, Brandon Phelps wrote: Hello all, I am using the query below and variations of it to query a database with a TON of records. snip

Did you try adding your ORDER BY argument close_dt to the index?

PB
Re: Query Optimization
From: Brandon Phelps bphe...@gls.com I am curious if there is any way I can better optimize the below query, as currently it takes this query around 10 seconds to run but I am sure this will get slower and slower as the database grows. You need an index on `close_dt`. SELECT open_dt, close_dt, protocol, INET_NTOA(src_address) AS src_address, src_port, INET_NTOA(dst_address) AS dst_address, dst_port, sent, rcvd FROM connections WHERE dst_port = 80 ORDER BY close_dt DESC LIMIT 0, 30 Current farmers, who have become mere operators of machines and mixers of chemicals, may not have the skills to develop a local, sustainable agriculture. A new generation of farmers, numbering in the tens of millions, will need to be trained and relocated to rural communities. -- Pat Murphy Jan Steinman, EcoReality Co-op
Re: optimization strategies based on file-level storage
At 10:47 PM 6/16/2011, Johan De Meersman wrote: - Original Message - From: Bennett Haselton benn...@peacefire.org Do you happen to know the answer to my other problem -- if I have TEXT and BLOB columns but all my other columns are fixed-length, can I still get the benefit of faster lookups resulting from fixed-length rows, if each row just contains a fixed-length reference to the actual TEXT or BLOB data which is stored elsewhere? entirely unsure, but given the single-datafile structure of MyISAM tables, I strongly suspect BLOBS get stored inline. I can't find a source that says for sure. http://dev.mysql.com/doc/refman/5.0/en/storage-requirements.html does say: The maximum size of a row in a MyISAM table is 65,535 bytes. This figure excludes BLOB or TEXT columns, which contribute only 9 to 12 bytes toward this size... For BLOB and TEXT data, the information is stored internally in a different area of memory than the row buffer. But that's talking about memory, not disk. When people talk about performance improvements from using fixed-length rows, are they talking primarily about memory or hard disk? Hold up though, I just got this reply from posting the question in a forum: http://forums.mysql.com/read.php?21,423433,423846 which says Almost always the discussions recommending Fixed length records in MyISAM are myths. The apparent performance improvement is swamped by the performance loss of shoveling around the wasted padding bytes and goes on to give reasons. Actually, that does make sense that it's a myth. I was surprised to hear so many sources claiming that there was a big performance increase from being able to find row N by jumping to position N*rowlength. Because even with variable-length rows, you can just store a table associating row numbers with the position of the row in the file, can't you -- which would mean it would only take one near-instantaneous lookup to be able to jump to the row you're looking for. 
What I was really trying to figure out was why it takes me 4 hours to add a new column to my 22-million-row table, and whether a different table design can avoid that problem. That reply in the forum says, ALTER TABLE ... ADD COLUMN will always copy the entire table over, and rebuild all the indexes. (And, effectively, do an OPTIMIZE.) Do _not_ leave space for extra columns, it won't help. I'm about to reply and point out the trick that you suggested to me: create dummy columns early and then just rename them later :) -Bennett
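The "row N lives at offset N*rowlength" idea the forum thread debates is easy to demonstrate in isolation. A sketch of fixed-length record access in pure Python (nothing MySQL-specific; record layout is invented for the example):

```python
import os
import struct
import tempfile

# Hypothetical fixed-length record: 4-byte id + 16-byte zero-padded name.
REC = struct.Struct("<I16s")  # REC.size == 20 bytes per row

path = os.path.join(tempfile.mkdtemp(), "rows.dat")
with open(path, "wb") as f:
    for i, name in enumerate([b"alpha", b"beta", b"gamma", b"delta"]):
        f.write(REC.pack(i, name.ljust(16, b"\x00")))

def read_row(n):
    # Row N sits at byte offset N * rowlength: one seek, no scan.
    with open(path, "rb") as f:
        f.seek(n * REC.size)
        rid, raw = REC.unpack(f.read(REC.size))
        return rid, raw.rstrip(b"\x00").decode()

print(read_row(2))  # -> (2, 'gamma')
```

With variable-length rows this arithmetic breaks down, which is why the discussion turns to keeping a separate row-position table (an index, effectively) instead of relying on fixed widths.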
RE: optimization strategies based on file-level storage
-Original Message- snip What I was really trying to figure out was why it takes me 4 hours to add a new column to my 22-million-row table, and whether a different table design can avoid that problem. That reply in the forum says, ALTER TABLE ... ADD COLUMN will always copy the entire table over, and rebuild all the indexes. (And, effectively, do an OPTIMIZE.) Do _not_ leave space for extra columns, it won't help. I'm about to reply and point out the trick that you suggested to me: create dummy columns early and then just rename them later :) -Bennett [JS] They will be too small, or the wrong type, or there won't be enough of them. Based upon 30+ years of database design, I'd bet money on it. ;-) Regards, Jerry Schwartz Global Information Incorporated 195 Farmington Ave. Farmington, CT 06032 860.674.8796 / FAX: 860.674.8341 E-mail: je...@gii.co.jp Web site: www.the-infoshop.com
RE: optimization strategies based on file-level storage
On Fri, June 17, 2011 07:11, Jerry Schwartz wrote: -Original Message- snip What I was really trying to figure out was why it takes me 4 hours to add a new column to my 22-million-row table, and whether a different table design can avoid that problem. That reply in the forum says, ALTER TABLE ... ADD COLUMN will always copy the entire table over, and rebuild all the indexes. (And, effectively, do an OPTIMIZE.) Do _not_ leave space for extra columns, it won't help. I'm about to reply and point out the trick that you suggested to me: create dummy columns early and then just rename them later :) -Bennett [JS] They will be too small, or the wrong type, or there won't be enough of them. Based upon 30+ years of database design, I'd bet money on it. ;-) Regards, Jerry Schwartz Global Information Incorporated The only alternative design would be to create another table with the added columns and a common key field and then lock the primary table and populate it with the keys from the original table, and I'm not convinced that would be any faster or less disruptive. -- William R. Mussatto Systems Engineer http://www.csz.com 909-920-9154
Re: optimization strategies based on file-level storage
- Original Message - From: Bennett Haselton benn...@peacefire.org Do you happen to know the answer to my other problem -- if I have TEXT and BLOB columns but all my other columns are fixed-length, can I still get the benefit of faster lookups resulting from fixed-length rows, if each row just contains a fixed-length reference to the actual TEXT or BLOB data which is stored elsewhere? entirely unsure, but given the single-datafile structure of MyISAM tables, I strongly suspect BLOBS get stored inline. For InnoDB, the answer appears to be it varies: If a row is less than half a page long, all of it is stored locally within the page. If it exceeds half a page, variable-length columns are chosen for external off-page storage until the row fits within half a page. For a column chosen for off-page storage, InnoDB stores the first 768 bytes locally in the row, and the rest externally into overflow pages. See http://dev.mysql.com/doc/refman/5.0/en/innodb-file-space.html for more on that. Also, I *think* the concept of fixed-length rows is only applicable to MyISAM, InnoDB has index-organised tables - that is to say, it stores all row data in the leaves of the primary index. The consequence, of course, is that no additional pointer lookup gets done for primary key selects; the tradeoff is that all nonprimary key lookups get detoured through the primary key. The online documentation is really pretty good; but for the really low-level things, I guess the best documentation may be the source. -- Bier met grenadyn Is als mosterd by den wyn Sy die't drinkt, is eene kwezel Hy die't drinkt, is ras een ezel
Re: optimization strategies based on file-level storage
At 11:45 AM 6/14/2011, Johan De Meersman wrote: - Original Message - From: Bennett Haselton benn...@peacefire.org modifications. (For example, the question I asked earlier about whether you can declare extra space at the end of each row that is reserved for future columns.) That question I can answer: you can't reserve space, but if you know what kind of rows you'll want to add later you can pre-add them (and incur the accompanying storage cost), and simply rename them appropriately later. Thanks. It would be more flexible if I could declare, say, 50 bytes, and decide later if I wanted to use them for a datetime, a char(n), or an int, but this is still helpful :) Do you happen to know the answer to my other problem -- if I have TEXT and BLOB columns but all my other columns are fixed-length, can I still get the benefit of faster lookups resulting from fixed-length rows, if each row just contains a fixed-length reference to the actual TEXT or BLOB data which is stored elsewhere? -Bennett
optimization strategies based on file-level storage
I'm looking for some tips & tricks documentation that explains how different data types in rows are stored at the file level (in MyISAM tables, at least), and how to optimize tables for faster queries, updates, table definition modification, etc. based on this knowledge.

For example, I've heard that if all of your columns are fixed-length, that makes it faster to find individual rows since row N is located at position N*rowlength in the file. (And, presumably, having just one variable-length column will slow things down considerably.) But I've also read that having TEXT and BLOB columns will also slow down row-finding operations. This seems to make no sense because I thought TEXT was not actually stored in the row, but the row just stored a constant-length reference to the TEXT whose actual data was stored somewhere else. Is this correct? Then is it incorrect to say that a TEXT column will slow down the locate-row-N operation, assuming all other columns are fixed-length? This is the kind of thing I'm looking for a document to explain.

Another example: It sometimes takes me an extremely long time to add new columns to a table. What if I had a table with all fixed-length columns, and I reserved some space at the end of each row to be used for columns to be added in the future. Would it then be possible to add new columns much more quickly? You wouldn't have to move around the existing row data to make room for the new column (although presumably you would still have to *write* to the place in each row where the new column had just been defined, to fill it in with its default value).

In particular, I'm not looking for a list of optimization tricks, so much as a document that explains how the rows are stored at the file level, and thereby explains how the optimization tricks *follow logically from* this information.
The reason is that if I just have a grab-bag of optimization hints (of which I've found many on the Web), some of them will not be applicable to my situation, or just plain wrong, and I'll have no way of knowing which ones. But if you know *why* something works, you can more easily figure out if it applies to your situation. -Bennett
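The row-N-at-position-N*rowlength claim above is easy to demonstrate with a toy fixed-length record format. This is a hypothetical byte layout for illustration, not MyISAM's actual on-disk record format (which also carries per-row flag bytes):

```python
# Why fixed-length rows make row lookup O(1): row N starts at exactly
# N * row_length, so no preceding rows need to be scanned or parsed.
# Toy byte layout, not MyISAM's real on-disk format.
import struct

ROW_FMT = "<i10s"                   # one INT plus a CHAR(10)
ROW_LEN = struct.calcsize(ROW_FMT)  # every row has the same length

def write_rows(rows):
    buf = bytearray()
    for n, name in rows:
        buf += struct.pack(ROW_FMT, n, name.encode().ljust(10, b"\0"))
    return bytes(buf)

def read_row(data, n):
    # Seek straight to the row: position = n * ROW_LEN.
    off = n * ROW_LEN
    num, name = struct.unpack_from(ROW_FMT, data, off)
    return num, name.rstrip(b"\0").decode()

data = write_rows([(1, "alpha"), (2, "beta"), (3, "gamma")])
print(read_row(data, 2))   # jumps directly to the third row
```

With even one variable-length column this arithmetic breaks down, which is exactly the distinction the thread is circling around.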
RE: optimization strategies based on file-level storage
At 05:46 AM 6/14/2011, Carlos Eduardo Caldi wrote: Hello Bennett. On the MySQL developer site there is great documentation; try the links below. http://dev.mysql.com/doc/refman/5.0/en/optimizing-database-structure.html http://dev.mysql.com/doc/refman/5.0/en/data-size.html

Thanks, this gets me a little closer to the answer but doesn't really provide the level of detail that I'm looking for. For example, it says: "For MyISAM tables, if you do not have any variable-length columns (VARCHAR, TEXT, or BLOB columns), a fixed-size row format is used. This is faster..."

I still don't understand: If TEXT and BLOB columns are stored not by putting the data in the row but by having the row store a reference to the TEXT/BLOB data stored somewhere else, then can't a row with TEXT and BLOB data types *still* be a fixed-size row, with the resulting increased speed?

My main motivation is that I have a table with 22 million records and it takes a few hours for me to add a new column to it. I suspect this would be faster if I designed the table correctly from the beginning, and I want to change it to a smarter design, but I only want to do it once. So I want to understand really thoroughly *why* a different design would make it faster to complete the table modifications. (For example, the question I asked earlier about whether you can declare extra space at the end of each row that is reserved for future columns.)

Att. Carlos, Date: Tue, 14 Jun 2011 01:44:47 -0700 To: mysql@lists.mysql.com From: benn...@peacefire.org Subject: optimization strategies based on file-level storage

I'm looking for some tips & tricks documentation that explains how different data types in rows are stored at the file level (in MyISAM tables, at least), and how to optimize tables for faster queries, updates, table definition modification, etc. based on this knowledge.
For example, I've heard that if all of your columns are fixed-length, that makes it faster to find individual rows since row N is located at position N*rowlength in the file. (And, presumably, having just one variable-length column will slow things down considerably.) But I've also read that having TEXT and BLOB columns will also slow down row-finding operations. This seems to make no sense because I thought TEXT was not actually stored in the row, but the row just stored a constant-length reference to the TEXT whose actual data was stored somewhere else. Is this correct? Then is it incorrect to say that a TEXT column will slow down the locate-row-N operation, assuming all other columns are fixed-length? This is the kind of thing I'm looking for a document to explain.

Another example: It sometimes takes me an extremely long time to add new columns to a table. What if I had a table with all fixed-length columns, and I reserved some space at the end of each row to be used for columns to be added in the future. Would it then be possible to add new columns much more quickly? You wouldn't have to move around the existing row data to make room for the new column (although presumably you would still have to *write* to the place in each row where the new column had just been defined, to fill it in with its default value).

In particular, I'm not looking for a list of optimization tricks, so much as a document that explains how the rows are stored at the file level, and thereby explains how the optimization tricks *follow logically from* this information. The reason is that if I just have a grab-bag of optimization hints (of which I've found many on the Web), some of them will not be applicable to my situation, or just plain wrong, and I'll have no way of knowing which ones. But if you know *why* something works, you can more easily figure out if it applies to your situation.
-Bennett
Re: Data missing after field optimization
On Jun 7, 2011, at 10:43 PM, Johan De Meersman wrote: Where did you find the advice about setting columns NOT NULL?

It took me awhile, but I just found it again, in case anyone is interested: http://dev.mysql.com/doc/refman/5.0/en/data-size.html

7.8.1. Make Your Data as Small as Possible: "Declare columns to be NOT NULL if possible. It makes everything faster and you save one bit per column. If you really need NULL in your application, you should definitely use it. Just avoid having it on all columns by default."

Marc
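The manual's one-bit-per-column figure quoted above can be turned into concrete numbers. A sketch, assuming the per-row NULL bitmap is rounded up to whole bytes (a simplification of the real MyISAM/InnoDB storage formats):

```python
# Rough arithmetic for the "one bit per column" NULL cost quoted from
# the manual: nullable columns need a per-row NULL bitmap, assumed here
# to be rounded up to whole bytes. Simplified model, not exact format.
def null_bitmap_bytes(nullable_columns):
    return (nullable_columns + 7) // 8

def bitmap_overhead(nullable_columns, rows):
    """Total bytes spent on NULL bitmaps across a table."""
    return null_bitmap_bytes(nullable_columns) * rows

print(null_bitmap_bytes(0))             # all NOT NULL: 0 bytes per row
print(null_bitmap_bytes(9))             # 9 nullable columns: 2 bytes per row
print(bitmap_overhead(9, 22000000))     # over a 22-million-row table
```

So the saving is real but small; the bigger point in this thread is that changing existing columns to NOT NULL has behavioral consequences that matter far more than the bytes.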
Re: optimization strategies based on file-level storage
- Original Message - From: Bennett Haselton benn...@peacefire.org modifications. (For example, the question I asked earlier about whether you can declare extra space at the end of each row that is reserved for future columns.)

That question I can answer: you can't reserve space, but if you know what kind of columns you'll want to add later you can pre-add them (and incur the accompanying storage cost), and simply rename them appropriately later. ALTER TABLE will do a full re-write of your table when that is necessary (like adding or deleting columns), but will (likely) not do so when you're just renaming an existing column.

Look hard enough on the web and you can find info on how to hack the .frm files, too, so some operations that would do a full re-write don't - but that's fishy business, and always at your own risk.

-- Bier met grenadyn Is als mosterd by den wyn Sy die't drinkt, is eene kwezel Hy die't drinkt, is ras een ezel
Re: Data missing after field optimization
Heeh. That's not a random blog post, that's the official manual :-p If it's in there, it's pretty much trustworthy; and I've learned something new :-) So apparently there's a dedicated NULL bit for columns... Yes, then there would be a small performance benefit. I stand corrected.

Still, as you've noticed, don't change the design of an existing application without thoroughly testing the consequences :-p

- Original Message - From: sono...@fannullone.us To: mysql@lists.mysql.com Sent: Tuesday, 14 June, 2011 7:34:04 PM Subject: Re: Data missing after field optimization

It took me awhile, but I just found it again, in case anyone is interested: http://dev.mysql.com/doc/refman/5.0/en/data-size.html 7.8.1. Make Your Data as Small as Possible: "Declare columns to be NOT NULL if possible. It makes everything faster and you save one bit per column. If you really need NULL in your application, you should definitely use it. Just avoid having it on all columns by default." Marc

-- Bier met grenadyn Is als mosterd by den wyn Sy die't drinkt, is eene kwezel Hy die't drinkt, is ras een ezel
RE: Data missing after field optimization
snip If that's all you did, you indeed 'removed the default NULL' but did not specify another default. Hence, if you don't explicitly specify a value in your insert statement, the insert can not happen as the server doesn't know what to put there and is explicitly disallowed from leaving the value empty. That makes sense. So then why does phpMyAdmin allow you to choose None as a default?

[JS] There are times when you want to enforce certain conditions even though they might sometimes generate an error. Two examples are required fields (no default value) or referential integrity. The goal is not to lose data, but to make sure your applications are doing what they are supposed to do. You would program your applications to trap and report errors.

Regards, Jerry Schwartz Global Information Incorporated 195 Farmington Ave. Farmington, CT 06032 860.674.8796 / FAX: 860.674.8341 E-mail: je...@gii.co.jp Web site: www.the-infoshop.com
Re: Data missing after field optimization
On Jun 7, 2011, at 10:43 PM, Johan De Meersman wrote: If that's all you did, you indeed 'removed the default NULL' but did not specify another default. Hence, if you don't explicitly specify a value in your insert statement, the insert can not happen as the server doesn't know what to put there and is explicitly disallowed from leaving the value empty.

That makes sense. So then why does phpMyAdmin allow you to choose None as a default?

Where did you find the advice about setting columns NOT NULL?

I can't find them today, but I read on a couple of web sites that not populating fields with NULL would make the tables more efficient. I have a shopping cart that has every field as type TEXT and I'm going through it with a fine-tooth comb trying to make it more efficient.

You really shouldn't take everything you read on the internet as truth :-)

NOW you tell me! =;) I guess I learned my lesson the hard way. Thanks again, Marc
Re: Data missing after field optimization
On Jun 6, 2011, at 10:06 PM, Johan De Meersman wrote: What exactly do you mean by removing the NULL default? Did you set your columns NOT NULL?

Yes. That's all I did. Marc
Re: Data missing after field optimization
On Jun 6, 2011, at 10:06 PM, Johan De Meersman wrote: What exactly do you mean by removing the NULL default? Did you set your columns NOT NULL? Yes. That's all I did.

Instead of getting info drop-by-drop, you might want to share the output of SHOW CREATE TABLE ..., but my guess is you had a UNIQUE key somewhere that got violated when converting NULLs to empty strings. It would require an ALTER IGNORE TABLE ... instead of a normal ALTER TABLE though. That, or an outside source (code we cannot see querying for NOT NULLs etc.)

-- Rik Wasmus
Re: Data missing after field optimization
- Original Message - From: sono...@fannullone.us Yes. That's all I did.

If that's all you did, you indeed 'removed the default NULL' but did not specify another default. Hence, if you don't explicitly specify a value in your insert statement, the insert can not happen as the server doesn't know what to put there and is explicitly disallowed from leaving the value empty.

Where did you find the advice about setting columns NOT NULL? You really shouldn't take everything you read on the internet as truth :-)

-- Bier met grenadyn Is als mosterd by den wyn Sy die't drinkt, is eene kwezel Hy die't drinkt, is ras een ezel
Data missing after field optimization
Hopefully I won't look like too much of a numbskull here but after reading some sites on table optimization, I decided to remove the NULL as default on the fields in my products table. I thought everything went well until I realized that we hadn't received any orders for 2 days. That's when I realized that my products table was empty! Luckily this happened over the weekend. =:\ I've made changes to field types before and never had a problem like this. Does anyone know what went wrong from my limited description? Why would removing the NULL default cause data to be lost? Thanks, Marc
Re: Data missing after field optimization
- Original Message - From: sono...@fannullone.us description? Why would removing the NULL default cause data to be lost?

What exactly do you mean by removing the NULL default? Did you set your columns NOT NULL?

-- Bier met grenadyn Is als mosterd by den wyn Sy die't drinkt, is eene kwezel Hy die't drinkt, is ras een ezel
Super active table optimization
I have an innodb table with 200 million rows and growing. It is a highly active table with tons of inserts and updates at all times. I notice a select query I test on that table is 0.01 seconds or less when all the inserts/updates are OFF. But when I throttle up the writes to the table, the select sql then takes like 1-3 full seconds or more. This is a serious bottleneck in our application. Any basic innodb wisdom for this scenario?

+-----------+---------------------+------+-----+---------+-------+
| Field     | Type                | Null | Key | Default | Extra |
+-----------+---------------------+------+-----+---------+-------+
| id        | bigint(20) unsigned | NO   | PRI | 0       |       |
| clock     | int(11)             | NO   | PRI | 0       |       |
| type      | int(11)             | NO   | PRI | 3       |       |
| num       | int(11)             | NO   |     | 0       |       |
| value_min | double(20,4)        | YES  |     | NULL    |       |
| value_avg | double(20,4)        | YES  |     | NULL    |       |
| value_max | double(20,4)        | YES  |     | NULL    |       |
+-----------+---------------------+------+-----+---------+-------+

# cat /etc/my.cnf|grep -i innodb
default-storage-engine=InnoDB
innodb_data_file_path = ibdata1:256M:autoextend
innodb_buffer_pool_size = 768M
innodb_additional_mem_pool_size = 32M
innodb_log_file_size = 192M
innodb_log_buffer_size = 64M
innodb_flush_log_at_trx_commit = 2
innodb_lock_wait_timeout = 50
innodb_flush_method=O_DIRECT
innodb_table_locks=0;
innodb_use_legacy_cardinality_algorithm=0;

16G memory
16G swap
8 CPU
Re: Super active table optimization
1. Which version of MySQL? Judging from your my.cnf it's not MariaDB 5.2 or Percona Server? They have very good enhancements, particularly in the high-concurrency InnoDB atmosphere. InnoDB is a lot more tunable in those versions. Switching from a stock MySQL version is a drop-in replacement, so if tested properly it should be a very simple operation.

2. Try making all fields NOT NULL, that should increase performance as well, although I'm not sure how much that would help.

On Sat, Feb 19, 2011 at 16:59, Bryan Cantwell bcantw...@firescope.com wrote:

# cat /etc/my.cnf|grep -i innodb

Just because I like nagging: uuoc :)

innodb_buffer_pool_size = 768M

With 16G RAM you should set this as high as you can, up to 60-75% of your memory. This of course depends on the storage engine the rest of your db uses. Aim to be mostly InnoDB and you can set this very high.

innodb_flush_log_at_trx_commit = 2

Have you read the consequences of setting this to 2? Look up the manual and make an informed decision, as it can severely affect what happens if your server dies.

innodb_flush_method=O_DIRECT

This depends on what kind of hardware you have. With a RAID array with write cache, you can set it to O_DIRECT. In other environments it can adversely affect performance.

16G swap

This is insane :) If MySQL starts swapping, you're dead in the water anyway. Typically, anything over 1G or so is too much.

8 CPU

How many cores? The more cores, the more use you will get from using one of the non-stock MySQL versions, as they have been optimised for concurrency and are much more tunable.

Shameless plug: if you want more on-topic help, you should look into letting us (I work for Open Query) help you. We can then research your whole environment and give much better advice that is not limited to this scope. Other companies that do similar services include Percona, SkySQL and FromDual, to name a few :) Have fun!
-- Walter Heck Engineer @ Open Query (http://openquery.com) Exceptional services for MariaDB and MySQL at a fixed budget -- Follow @openquery on Twitter to stay up to date
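Walter's 60-75% guideline, applied to the 16G box from the original post, works out as follows (plain arithmetic; real sizing also depends on what else runs on the host):

```python
# Walter's rule of thumb from above: on a mostly-InnoDB box, aim the
# buffer pool at roughly 60-75% of RAM. Purely arithmetic sketch.
GB = 1024 ** 3

def buffer_pool_range(total_ram_bytes, low=0.60, high=0.75):
    return (int(total_ram_bytes * low), int(total_ram_bytes * high))

low, high = buffer_pool_range(16 * GB)
print("suggested innodb_buffer_pool_size: %d-%d GB" % (low // GB, high // GB))
# The 768M in the posted my.cnf is far below this range.
```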
Re: Super active table optimization
Am 19.02.2011 16:59, schrieb Bryan Cantwell: I have an innodb table with 200 million rows and growing.

More interesting is the db-size!

innodb_buffer_pool_size = 768M

Too small; in the best case it is as large as the database, or as large as possible to avoid swapping.

innodb_log_file_size = 192M

1/4 of buffer_pool is ok, but if your buffer-pool is too small it is a little difficult to increase this; I would never set this lower than 512 MB, which is also ok for larger databases.

These are our InnoDB settings for a dbmail instance and MySQL 5.5, and we are speaking about 15 GB data and 3 million rows; the backend is a SAN storage. In this setup we can go up to 1,000 inserts per second:

innodb_buffer_pool_size = 5120M
innodb_buffer_pool_instances = 5
innodb_purge_threads = 1
innodb_max_purge_lag = 20
innodb_max_dirty_pages_pct = 60
innodb_additional_mem_pool_size = 64M
innodb_log_file_size = 512M
innodb_log_buffer_size = 256M
innodb_thread_concurrency = 32
innodb_thread_sleep_delay = 10
innodb_flush_log_at_trx_commit = 2
innodb_support_xa = 1
innodb_lock_wait_timeout = 50
innodb_table_locks = 0
innodb_file_format = barracuda
innodb_file_per_table = 1
innodb_open_files = 600
innodb_io_capacity = 800
innodb_read_io_threads = 8
innodb_write_io_threads = 8
transaction-isolation = READ-COMMITTED
Re: some problem of InnoDB performance optimization
Hi, let's see if I can help.

"Overly long queries (transactions in general) are bad for performance as a lot of unpurged versions accumulate." In this sentence, I don't know the meaning of 'unpurged versions accumulate'.

When rows are updated, new versions are created. They are later removed by the purge thread - but only once no active transactions may need them. This is why long open transactions are expensive.

And I don't know how to understand 'open transaction'?

This is a transaction which is started (opened) but not yet ended by commit or rollback.

"Required for logical level replication to work properly." What's the meaning of logical level replication?

MySQL has statement level, also called logical, as well as row based replication. Statement level replication requires updates to be serializable to work.

"Can give problems for portable applications if you port from MySQL-4.0 to later." What's the meaning of this sentence?

This means you can run into problems if you upgrade from MySQL 4.0 to a later version. It is probably not very relevant any more.

-- Peter Zaitsev, CEO, Percona Inc. Tel: +1 888 401 3401 ext 501 Skype: peter_zaitsev 24/7 Emergency Line +1 888 401 3401 ext 911 Percona Training Workshops http://www.percona.com/training/
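Peter's purge explanation can be modeled in a few lines. This is a toy model of the accumulation effect only, not InnoDB's actual undo-log mechanics: a version may be purged only when no open transaction predates it, so one long-open transaction pins every version created after it began:

```python
# Toy model of "unpurged versions accumulate": old row versions can be
# purged only when no open transaction might still need them, so a
# single long-running transaction pins all versions created after it
# began. Not InnoDB's real undo-log mechanics.

open_txn_start = {}   # txn name -> logical start time
versions = []         # creation times of retained old row versions
clock = 0

def begin(txn):
    open_txn_start[txn] = clock

def commit(txn):
    del open_txn_start[txn]

def update_row():
    global clock
    clock += 1
    versions.append(clock)   # the old version must be kept around

def purge():
    # Versions older than every open transaction can be removed.
    oldest = min(open_txn_start.values(), default=clock + 1)
    versions[:] = [v for v in versions if v >= oldest]

begin("long_report")         # a long-open transaction...
for _ in range(100):
    update_row()
purge()
pinned = len(versions)       # ...pins all 100 old versions

commit("long_report")
purge()
print(pinned, len(versions)) # after commit, purge can clean up
```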
Re: some problem of InnoDB performance optimization
HI Peter: Thanks for your answer. I have understood your answer. Thank you very much.

―― Best regards, Timo Seven blog: http://zauc.wordpress.com #Please browse via a proxy, or point your hosts file to 74.200.243.252# UNIX System Admin

2010/3/23 Peter Zaitsev p...@percona.com Hi, let's see if I can help. "Overly long queries (transactions in general) are bad for performance as a lot of unpurged versions accumulate." In this sentence, I don't know the meaning of 'unpurged versions accumulate'. When rows are updated, new versions are created. They are later removed by the purge thread - but only once no active transactions may need them. This is why long open transactions are expensive. And I don't know how to understand 'open transaction'? This is a transaction which is started (opened) but not yet ended by commit or rollback. "Required for logical level replication to work properly." What's the meaning of logical level replication? MySQL has statement level, also called logical, as well as row based replication. Statement level replication requires updates to be serializable to work. "Can give problems for portable applications if you port from MySQL-4.0 to later." What's the meaning of this sentence? This means you can run into problems if you upgrade from MySQL 4.0 to a later version. It is probably not very relevant any more. -- Peter Zaitsev, CEO, Percona Inc. Tel: +1 888 401 3401 ext 501 Skype: peter_zaitsev 24/7 Emergency Line +1 888 401 3401 ext 911 Percona Training Workshops http://www.percona.com/training/
some problem of InnoDB performance optimization
Hi everyone: I read the presentation about InnoDB performance optimization that Heikki Tuuri wrote on April 23, 2007. But now there are some sentences I don't know how to understand. Can you help me?

"Overly long queries (transactions in general) are bad for performance as a lot of unpurged versions accumulate." In this sentence, I don't know the meaning of 'unpurged versions accumulate'. And I don't know how to understand 'open transaction'?

"Required for logical level replication to work properly." What's the meaning of logical level replication?

"Can give problems for portable applications if you port from MySQL-4.0 to later." What's the meaning of this sentence?

―― Best regards, Timo Seven blog: http://zauc.wordpress.com #Please browse via a proxy, or point your hosts file to 74.200.243.252# UNIX System Admin
Re: Need help with query optimization
It may only be returning 51 rows but it's having to read significantly more. Get rid of the derived table join if possible. Something like:

SELECT TAP.ID, M.UID, TAP.FirstName, TAP.MI, TAP.LastName, TAP.State,
       TAP.SchoolName, TAP.StateApproved, TAP.DiplomatApproved,
       C.ChapterType, S.Region, S.District, Sum(Pts.Points) Total_Points
FROM TorchAwardParticipants TAP
JOIN Members M On M.ID=TAP.CurrentMemberID
JOIN Chapters C On C.ID=M.ChapterID
JOIN Schools S On S.ID=C.SchoolID
JOIN TorchAwardSelAct Pts ON Pts.AchievementID=TAP.ID
WHERE TAP.State='OH' AND TAP.Sub='1' AND TAP.SubmittedToDiplomat IS NOT NULL
  AND Pts.LocalApproveStatus='A'
GROUP BY TAP.ID, M.UID, TAP.FirstName, TAP.MI, TAP.LastName, TAP.State,
         TAP.SchoolName, TAP.StateApproved, TAP.DiplomatApproved,
         C.ChapterType, S.Region, S.District
ORDER BY TAP.LastName, TAP.FirstName, Sum(Pts.Points)

Regards John

On Tue, Mar 16, 2010 at 6:17 PM, Jesse j...@msdlg.com wrote: I have the following query:

SELECT TAP.ID, M.UID, TAP.FirstName, TAP.MI, TAP.LastName, TAP.State,
       TAP.SchoolName, TAP.StateApproved, TAP.DiplomatApproved,
       C.ChapterType, S.Region, S.District, Pts.TotPoints
FROM TorchAwardParticipants TAP
JOIN Members M On M.ID=TAP.CurrentMemberID
JOIN Chapters C On C.ID=M.ChapterID
JOIN Schools S On S.ID=C.SchoolID
JOIN (SELECT AchievementID, Sum(Points) As TotPoints
      FROM TorchAwardSelAct TASA
      WHERE LocalApproveStatus='A'
      GROUP BY AchievementID) Pts ON Pts.AchievementID=TAP.ID
WHERE TAP.State='OH' AND TAP.Sub='1' AND TAP.SubmittedToDiplomat IS NOT NULL
ORDER BY TAP.LastName, TAP.FirstName, Pts.TotPoints

The TorchAwardParticipants table has about 84,000 records in it. The query takes almost 40 seconds to return the data, which is only 51 rows.
An EXPLAIN returns the following:

+----+-------------+------------+--------+-------------------------+---------------+---------+-------------------------+--------+---------------------------------+
| id | select_type | table      | type   | possible_keys           | key           | key_len | ref                     | rows   | Extra                           |
+----+-------------+------------+--------+-------------------------+---------------+---------+-------------------------+--------+---------------------------------+
|  1 | PRIMARY     | <derived2> | ALL    | NULL                    | NULL          | NULL    | NULL                    |   4382 | Using temporary; Using filesort |
|  1 | PRIMARY     | TAP        | eq_ref | PRIMARY,CurrentMemberID | PRIMARY       | 4       | Pts.AchievementID       |      1 | Using where                     |
|  1 | PRIMARY     | M          | eq_ref | PRIMARY,IX_Members_3    | PRIMARY       | 4       | bpa.TAP.CurrentMemberID |      1 |                                 |
|  1 | PRIMARY     | C          | eq_ref | PRIMARY,IX_Chapters_1   | PRIMARY       | 4       | bpa.M.ChapterID         |      1 |                                 |
|  1 | PRIMARY     | S          | eq_ref | PRIMARY                 | PRIMARY       | 4       | bpa.C.SchoolID          |      1 |                                 |
|  2 | DERIVED     | TASA       | index  | NULL                    | AchievementID | 5       | NULL                    | 161685 | Using where                     |
+----+-------------+------------+--------+-------------------------+---------------+---------+-------------------------+--------+---------------------------------+

What is the best way to optimize this query so that it doesn't take 40 seconds to return the dataset? Jesse
Need help with query optimization
I have the following query:

SELECT TAP.ID, M.UID, TAP.FirstName, TAP.MI, TAP.LastName, TAP.State,
       TAP.SchoolName, TAP.StateApproved, TAP.DiplomatApproved,
       C.ChapterType, S.Region, S.District, Pts.TotPoints
FROM TorchAwardParticipants TAP
JOIN Members M On M.ID=TAP.CurrentMemberID
JOIN Chapters C On C.ID=M.ChapterID
JOIN Schools S On S.ID=C.SchoolID
JOIN (SELECT AchievementID, Sum(Points) As TotPoints
      FROM TorchAwardSelAct TASA
      WHERE LocalApproveStatus='A'
      GROUP BY AchievementID) Pts ON Pts.AchievementID=TAP.ID
WHERE TAP.State='OH' AND TAP.Sub='1' AND TAP.SubmittedToDiplomat IS NOT NULL
ORDER BY TAP.LastName, TAP.FirstName, Pts.TotPoints

The TorchAwardParticipants table has about 84,000 records in it. The query takes almost 40 seconds to return the data, which is only 51 rows. An EXPLAIN returns the following:

+----+-------------+------------+--------+-------------------------+---------------+---------+-------------------------+--------+---------------------------------+
| id | select_type | table      | type   | possible_keys           | key           | key_len | ref                     | rows   | Extra                           |
+----+-------------+------------+--------+-------------------------+---------------+---------+-------------------------+--------+---------------------------------+
|  1 | PRIMARY     | <derived2> | ALL    | NULL                    | NULL          | NULL    | NULL                    |   4382 | Using temporary; Using filesort |
|  1 | PRIMARY     | TAP        | eq_ref | PRIMARY,CurrentMemberID | PRIMARY       | 4       | Pts.AchievementID       |      1 | Using where                     |
|  1 | PRIMARY     | M          | eq_ref | PRIMARY,IX_Members_3    | PRIMARY       | 4       | bpa.TAP.CurrentMemberID |      1 |                                 |
|  1 | PRIMARY     | C          | eq_ref | PRIMARY,IX_Chapters_1   | PRIMARY       | 4       | bpa.M.ChapterID         |      1 |                                 |
|  1 | PRIMARY     | S          | eq_ref | PRIMARY                 | PRIMARY       | 4       | bpa.C.SchoolID          |      1 |                                 |
|  2 | DERIVED     | TASA       | index  | NULL                    | AchievementID | 5       | NULL                    | 161685 | Using where                     |
+----+-------------+------------+--------+-------------------------+---------------+---------+-------------------------+--------+---------------------------------+

What is the best way to optimize this query so that it doesn't take 40 seconds to return the dataset? Jesse
Re: Need help with query optimization
Can you please show us the indexes on both tables. regards anandkl

On Tue, Mar 16, 2010 at 11:47 PM, Jesse j...@msdlg.com wrote: I have the following query:

SELECT TAP.ID, M.UID, TAP.FirstName, TAP.MI, TAP.LastName, TAP.State,
       TAP.SchoolName, TAP.StateApproved, TAP.DiplomatApproved,
       C.ChapterType, S.Region, S.District, Pts.TotPoints
FROM TorchAwardParticipants TAP
JOIN Members M On M.ID=TAP.CurrentMemberID
JOIN Chapters C On C.ID=M.ChapterID
JOIN Schools S On S.ID=C.SchoolID
JOIN (SELECT AchievementID, Sum(Points) As TotPoints
      FROM TorchAwardSelAct TASA
      WHERE LocalApproveStatus='A'
      GROUP BY AchievementID) Pts ON Pts.AchievementID=TAP.ID
WHERE TAP.State='OH' AND TAP.Sub='1' AND TAP.SubmittedToDiplomat IS NOT NULL
ORDER BY TAP.LastName, TAP.FirstName, Pts.TotPoints

The TorchAwardParticipants table has about 84,000 records in it. The query takes almost 40 seconds to return the data, which is only 51 rows. An EXPLAIN returns the following:

+----+-------------+------------+--------+-------------------------+---------------+---------+-------------------------+--------+---------------------------------+
| id | select_type | table      | type   | possible_keys           | key           | key_len | ref                     | rows   | Extra                           |
+----+-------------+------------+--------+-------------------------+---------------+---------+-------------------------+--------+---------------------------------+
|  1 | PRIMARY     | <derived2> | ALL    | NULL                    | NULL          | NULL    | NULL                    |   4382 | Using temporary; Using filesort |
|  1 | PRIMARY     | TAP        | eq_ref | PRIMARY,CurrentMemberID | PRIMARY       | 4       | Pts.AchievementID       |      1 | Using where                     |
|  1 | PRIMARY     | M          | eq_ref | PRIMARY,IX_Members_3    | PRIMARY       | 4       | bpa.TAP.CurrentMemberID |      1 |                                 |
|  1 | PRIMARY     | C          | eq_ref | PRIMARY,IX_Chapters_1   | PRIMARY       | 4       | bpa.M.ChapterID         |      1 |                                 |
|  1 | PRIMARY     | S          | eq_ref | PRIMARY                 | PRIMARY       | 4       | bpa.C.SchoolID          |      1 |                                 |
|  2 | DERIVED     | TASA       | index  | NULL                    | AchievementID | 5       | NULL                    | 161685 | Using where                     |
+----+-------------+------------+--------+-------------------------+---------------+---------+-------------------------+--------+---------------------------------+

What is the best way to optimize this query so that it doesn't take 40 seconds to return the dataset? Jesse
Complex SQL optimization vs. general-purpose language
Any SQL rewriting gurus know how I might be able to optimize this query?

The schema:

mysql> show columns from transactionlog;
+---------------+-------------------------------------------+------+-----+---------+----------------+
| Field         | Type                                      | Null | Key | Default | Extra          |
+---------------+-------------------------------------------+------+-----+---------+----------------+
| id            | int(11)                                   | NO   | PRI | NULL    | auto_increment |
| transactionid | varchar(10)                               | NO   | MUL | NULL    |                |
| queryid       | tinyint(4)                                | NO   |     | NULL    |                |
| tableid       | varchar(30)                               | NO   | MUL | NULL    |                |
| tupleid       | int(11)                                   | NO   |     | NULL    |                |
| querytype     | enum('select','insert','delete','update') | NO   |     | NULL    |                |
| schemaname    | varchar(20)                               | YES  |     | NULL    |                |
| partition     | tinyint(3) unsigned                       | YES  |     | NULL    |                |
+---------------+-------------------------------------------+------+-----+---------+----------------+
8 rows in set (0.04 sec)

The query:

select concat(weight, ' ', ids, '\n')
from (
  select tableid, tupleid,
         group_concat(id separator ' ') as ids,
         ( select count(distinct transactionid)
           from transactionlog
           where transactionid in (
             select transactionid
             from transactionlog
             where (tableid, tupleid, querytype) = (t.tableid, t.tupleid, 'update')
             group by transactionid
             having count(*) > 0
           )
         ) weight
  from transactionlog t
  group by tableid, tupleid
  having weight > 0 and count(*) > 1
) u;

This is the output of EXPLAIN and mk-visual-explain:

+----+--------------------+----------------+-------+---------------+---------------+---------+-----------+------+----------------------------------------------+
| id | select_type        | table          | type  | possible_keys | key           | key_len | ref       | rows | Extra                                        |
+----+--------------------+----------------+-------+---------------+---------------+---------+-----------+------+----------------------------------------------+
|  1 | PRIMARY            | <derived2>     | ALL   | NULL          | NULL          | NULL    | NULL      |   13 |                                              |
|  2 | DERIVED            | t              | ALL   | NULL          | NULL          | NULL    | NULL      |   68 | Using filesort                               |
|  3 | DEPENDENT SUBQUERY | transactionlog | index | NULL          | transactionid | 12      | NULL      |   68 | Using where; Using index                     |
|  4 | DEPENDENT SUBQUERY | transactionlog | ref   | tableid       | tableid       | 36      | func,func |    2 | Using where; Using temporary; Using filesort |
+----+--------------------+----------------+-------+---------------+---------------+---------+-----------+------+----------------------------------------------+

Table scan
rows           13
+- DERIVED
   table          derived(t,transactionlog,temporary(transactionlog))
   +- DEPENDENT SUBQUERY
   |  +- Filesort
   |  |  +- TEMPORARY
   |  |     table          temporary(transactionlog)
   |  |     +- Filter with WHERE
   |  |        +- Bookmark lookup
   |  |           +- Table
   |  |           |  table          transactionlog
   |  |           |  possible_keys  tableid
   |  |           +- Index lookup
   |  |              key            transactionlog->tableid
   |  |              possible_keys  tableid
   |  |              key_len        36
   |  |              ref            func,func
   |  |              rows           2
   +- DEPENDENT SUBQUERY
   |  +- Filter with WHERE
   |     +- Index scan
   |        key            transactionlog->transactionid
   |        key_len        12
   |        rows           68
   +- Filesort
      +- Table scan
         rows           68
         +- Table
            table          t

That is a lot of work. I can write the equivalent logic in Python while making a single pass:

results = query('''
    select tableid, tupleid, transactionid, id, querytype
    from transactionlog_2warehouse
    order by tableid, tupleid, transactionid
''')
_tab, _tup, _txn = None, None, None
ids = []
weight = 0
saw_upd = False
for tab, tup, txn, id, qt in results:
    if (_tab, _tup) != (tab, tup):
        if len(ids) > 1 and weight > 0:
            print weight, ids
        weight = 0
        ids = []
        _tab, _tup = tab, tup
        _txn = None
    if _txn != txn:
        saw_upd = False
        _txn = txn
    if qt == 'update' and not saw_upd:
        weight += 1
        saw_upd = True
    ids += [id]
if len(ids) > 1 and weight > 0:
    print weight, ids

Is it possible to achieve the Python single-pass performance using pure SQL? Thanks in advance!
--
Yang Zhang
http://www.mit.edu/~y_z/
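[Editor's sketch] The dependent subquery can be decorrelated: the weight is just the number of distinct transactions that ran an update against each (tableid, tupleid), so it can be computed once in a derived table and joined back, removing both DEPENDENT SUBQUERY steps from the plan. A sketch of the rewrite, exercised here against SQLite with a reduced schema and made-up data (in MySQL the separator is written `GROUP_CONCAT(id SEPARATOR ' ')` instead):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- reduced version of the transactionlog schema from the post
    CREATE TABLE transactionlog (
        id            INTEGER PRIMARY KEY,
        transactionid TEXT    NOT NULL,
        tableid       TEXT    NOT NULL,
        tupleid       INTEGER NOT NULL,
        querytype     TEXT    NOT NULL
    );
    -- toy data: two transactions update ('orders', 1); ('orders', 2)
    -- has only one row, so it should be filtered by HAVING COUNT(*) > 1
    INSERT INTO transactionlog (transactionid, tableid, tupleid, querytype)
    VALUES ('t1', 'orders', 1, 'update'),
           ('t2', 'orders', 1, 'update'),
           ('t2', 'orders', 1, 'select'),
           ('t3', 'orders', 2, 'select');
""")

# weight = distinct transactions that updated (tableid, tupleid);
# the original "IN (... HAVING COUNT(*) > 0)" collapses to this filter,
# and the inner join drops weight-0 groups, replacing "HAVING weight > 0".
rows = conn.execute("""
    SELECT t.tableid, t.tupleid,
           GROUP_CONCAT(t.id, ' ') AS ids,
           w.weight
    FROM transactionlog t
    JOIN (SELECT tableid, tupleid,
                 COUNT(DISTINCT transactionid) AS weight
          FROM transactionlog
          WHERE querytype = 'update'
          GROUP BY tableid, tupleid) w
      ON w.tableid = t.tableid AND w.tupleid = t.tupleid
    GROUP BY t.tableid, t.tupleid
    HAVING COUNT(*) > 1
""").fetchall()
print(rows)
```

The derived table is materialized once with a single aggregation pass, so the plan should no longer re-run the weight computation per group; worth verifying with EXPLAIN on the real data.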
Re: optimization
From: Jaime Crespo Rincón jcre...@warp.es
Sent: Monday, January 25, 2010 5:30 PM

2010/1/25 John G. Heim jh...@math.wisc.edu:

I have a server with 16Gb of RAM and a dual-core 2Ghz processor. It is running the latest mysql-server from debian lenny (5.0.1). I have databases for drupal, moodle, spamassassin, horde3, and a small database for departmental stuff. The problem is that inserts/updates are sometimes very slow, on the order of a minute. I am hoping somebody can spot something wrong in my config. Here's the optimization settings section (for your convenience). The whole my.cnf is reproduced below that:

Are your databases using MyISAM or InnoDB?

Both. Maybe that's the problem? I started creating database tables for my own web apps with the default mysql configuration. I believe the default database engine is MyISAM. But then I wanted to use foreign keys and I saw that it required me to use InnoDB. So I converted some tables to InnoDB but not all. Maybe it was a mistake not to convert all of them. After that, I installed drupal, moodle, and mediawiki. I haven't looked at what kind of tables those packages create. They may not specify it and the default is still whatever it is when you install mysql, MyISAM I think.

* If MyISAM, you could be suffering contention problems on writes because of full table locks. No easy solution but engine change or database sharding. Also key_buffer (and the other buffers) could be too small for 16GB of RAM. Are you really using more than 10% of it? You could also disable other engines if unused.

* If InnoDB, you have not set innodb_buffer_pool_size nor log size. You could increase the pool to 50% of ram available.

Those are very general suggestions. It depends a lot on your hardware (slow storage?), other apps installed on the same machine or the load of the server, among others.

Well, it could be that the disks aren't real fast. The server is also running a print server (lprng). I don't think that's very CPU intensive but it might be slowing down writes.

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=arch...@jab.org
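[Editor's sketch] Since the engine mix across the drupal, moodle, and mediawiki schemas is unknown, a quick inventory answers the "MyISAM or InnoDB?" question directly (information_schema is available from MySQL 5.0 on):

```sql
SELECT table_schema, engine, COUNT(*) AS tables
FROM information_schema.tables
WHERE table_schema NOT IN ('mysql', 'information_schema')
GROUP BY table_schema, engine
ORDER BY table_schema, engine;
```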
Re: optimization
Get yourself a copy of the book High Performance MySQL 2nd Edition
Publisher: O'Reilly Media; 2 edition (June 18, 2008)
Language: English
ISBN-10: 0596101716
ISBN-13: 978-0596101718

Here is a brief preview of the first edition:
http://books.google.ca/books?id=iaCCQ13_zMIC&printsec=frontcover&dq=high+performance+mysql&cd=1#v=onepage&q&f=false

Mike

At 10:19 AM 1/26/2010, John G. Heim wrote:
[...]
Re: optimization
I would recommend the same to you about reading High Perf. MySQL as Baron, et al wrote a great book about performance on MySQL.

That being said, it has been my experience that in 99% of client cases they don't really need to run two different types of tables. If I were you, I would use InnoDB exclusively unless there is legitimate reason to do otherwise. In an environment that is running 25% writes and a decent query rate you are bound to have contention issues with MyISAM. While there are always going to be edge cases for MyISAM, your default should be innodb and your config should reflect this.

Changing your tables to InnoDB is a simple ALTER TABLE which you can script if there are a number of tables to convert. Allocate as much of your available RAM as possible to the innodb_buffer_pool (typically 50 - 80% of total RAM) and I bet you would see a dramatic difference. That is simplifying things somewhat, but should give an idea.

keith

On Tue, Jan 26, 2010 at 11:53 AM, mos mo...@fastmail.fm wrote:
[...]

--
Chief Training Officer
Paragon Consulting Services
850-637-3877
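[Editor's sketch] The "script it" step can be generated straight from information_schema: the query below prints one ALTER statement per MyISAM table, which you can review and then feed back into the mysql client.

```sql
SELECT CONCAT('ALTER TABLE `', table_schema, '`.`', table_name,
              '` ENGINE=InnoDB;')
FROM information_schema.tables
WHERE engine = 'MyISAM'
  AND table_schema NOT IN ('mysql', 'information_schema');
```

Each ALTER TABLE rebuilds the table, so run the statements during a maintenance window on large tables.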
Re: optimization
Just to be clear, you're suggesting I convert all of the spamassassin, drupal, and mediawiki tables to innodb too? Or just my own database? What about the mysql database itself? I wouldn't convert those tables, would I?

- Original Message -
From: Keith Murphy bmur...@paragon-cs.com
To: mysql@lists.mysql.com
Sent: Tuesday, January 26, 2010 11:06 AM
Subject: Re: optimization
[...]
Re: optimization
You absolutely *should not* convert the mysql database to InnoDB. Read the above sentence again :)

All others, unless you had a specific reason not to do so, yes, I would convert them.

keith

On Tue, Jan 26, 2010 at 2:18 PM, John G. Heim jh...@math.wisc.edu wrote:
[...]

--
Chief Training Officer
Paragon Consulting Services
850-637-3877
Re: optimization
On Tue, Jan 26, 2010 at 11:23 AM, Keith Murphy bmur...@paragon-cs.com wrote:
[...]
optimization
I have a server with 16Gb of RAM and a dual-core 2Ghz processor. It is running the latest mysql-server from debian lenny (5.0.1). I have databases for drupal, moodle, spamassassin, horde3, and a small database for departmental stuff. The problem is that inserts/updates are sometimes very slow, on the order of a minute. I am hoping somebody can spot something wrong in my config. Here's the optimization settings section (for your convenience). The whole my.cnf is reproduced below that:

key_buffer = 256M
max_allowed_packet = 16M
max_connections = 2000
myisam_sort_buffer_size = 64M
open_files_limit = 8192
query_cache_min_res_unit = 2K
query_cache_size = 36M
read_buffer_size = 1M
read_rnd_buffer_size = 4M
sort_buffer_size = 1M
table_cache = 512
thread_cache = 32
thread_cache_size = 8
thread_concurrency = 8

---
# Example MySQL config file for large systems.
#
# This is for a large system with memory = 512M where the system runs mainly
# MySQL.
#
# You can copy this file to
# /etc/my.cnf to set global options,
# mysql-data-dir/my.cnf to set server-specific options (in this
# installation this directory is /var/lib/mysql) or
# ~/.my.cnf to set user-specific options.
#
# In this file, you can use all long options that a program supports.
# If you want to know which options a program supports, run the program
# with the --help option.

# The following options will be passed to all MySQL clients
[client]
#password = your_password
port = 3306
socket = /var/run/mysqld/mysqld.sock

# Here follows entries for some specific programs

# The MySQL server
[mysqld]
log_slow_queries = /var/log/mysql/mysql-slow.log
long_query_time = 2
#log-queries-not-using-indexes
port = 3306
socket = /var/run/mysqld/mysqld.sock
skip-locking
#wait_timeout = 14400
key_buffer = 256M
max_allowed_packet = 16M
max_connections = 2000
myisam_sort_buffer_size = 64M
open_files_limit = 8192
query_cache_min_res_unit = 2K
query_cache_size = 36M
read_buffer_size = 1M
read_rnd_buffer_size = 4M
sort_buffer_size = 1M
table_cache = 512
thread_cache = 32
thread_cache_size = 8
thread_concurrency = 8

# Don't listen on a TCP/IP port at all. This can be a security enhancement,
# if all processes that need to connect to mysqld run on the same host.
# All interaction with mysqld must be made via Unix sockets or named pipes.
# Note that using this option without enabling named pipes on Windows
# (via the enable-named-pipe option) will render mysqld useless!
#
#skip-networking

# Replication Master Server (default)
# binary logging is required for replication
log-bin=mysql-bin
expire_logs_days = 10
max_binlog_size = 100M

# required unique id between 1 and 2^32 - 1
# defaults to 1 if master-host is not set
# but will not function as a master if omitted
server-id = 1

# Replication Slave (comment out master section to use this)
#
# To configure this host as a replication slave, you can choose between
# two methods :
#
# 1) Use the CHANGE MASTER TO command (fully described in our manual) -
#    the syntax is:
#
#    CHANGE MASTER TO MASTER_HOST=<host>, MASTER_PORT=<port>,
#    MASTER_USER=<user>, MASTER_PASSWORD=<password>;
#
#    where you replace <host>, <user>, <password> by quoted strings and
#    <port> by the master's port number (3306 by default).
#
#    Example:
#
#    CHANGE MASTER TO MASTER_HOST='125.564.12.1', MASTER_PORT=3306,
#    MASTER_USER='joe', MASTER_PASSWORD='secret';
#
# OR
#
# 2) Set the variables below. However, in case you choose this method, then
#    start replication for the first time (even unsuccessfully, for example
#    if you mistyped the password in master-password and the slave fails to
#    connect), the slave will create a master.info file, and any later
#    change in this file to the variables' values below will be ignored and
#    overridden by the content of the master.info file, unless you shutdown
#    the slave server, delete master.info and restart the slave server.
#    For that reason, you may want to leave the lines below untouched
#    (commented) and instead use CHANGE MASTER TO (see above)
#
# required unique id between 2 and 2^32 - 1
# (and different from the master)
# defaults to 2 if master-host is set
# but will not function as a slave if omitted
#server-id = 2
#
# The replication master for this slave - required
#master-host = <hostname>
#
# The username the slave will use for authentication when connecting
# to the master - required
#master-user = <username>
#
# The password the slave will authenticate with when connecting to
# the master - required
#master-password = <password>
#
# The port the master is listening on.
# optional - defaults to 3306
#master-port = <port>
#
# binary logging - not required for slaves, but recommended
#log-bin=mysql-bin

# Point the following paths to different dedicated disks
#tmpdir = /tmp/
#log-update = /path-to-dedicated-directory/hostname

# Uncomment the following if you are using BDB tables
#bdb_cache_size = 64M
#bdb_max_lock = 10

# Uncomment
Re: optimization
2010/1/25 John G. Heim jh...@math.wisc.edu:

I have a server with 16Gb of RAM and a dual-core 2Ghz processor. It is running the latest mysql-server from debian lenny (5.0.1). I have databases for drupal, moodle, spamassassin, horde3, and a small database for departmental stuff. The problem is that inserts/updates are sometimes very slow, on the order of a minute. I am hoping somebody can spot something wrong in my config. Here's the optimization settings section (for your convenience). The whole my.cnf is reproduced below that:

Are your databases using MyISAM or InnoDB?

* If MyISAM, you could be suffering contention problems on writes because of full table locks. No easy solution but engine change or database sharding. Also key_buffer (and the other buffers) could be too small for 16GB of RAM. Are you really using more than 10% of it? You could also disable other engines if unused.

* If InnoDB, you have not set innodb_buffer_pool_size nor log size. You could increase the pool to 50% of ram available.

Those are very general suggestions. It depends a lot on your hardware (slow storage?), other apps installed on the same machine or the load of the server, among others.

There also exists a tool to get introduced into MySQL server variables tuning:

Tuning primer - https://launchpad.net/mysql-tuning-primer

It is also a very general tool, but it could be helpful for a starting point.

Regards,
--
Jaime Crespo
MySQL & Java Instructor
Warp Networks
http://warp.es
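[Editor's sketch] For a 16 GB box along the lines of the two bullet points above, a starting config might look like this. The numbers are illustrative assumptions, not measured recommendations, and note that on MySQL 5.0 changing innodb_log_file_size requires a clean shutdown and removal of the old ib_logfile* files before restart:

```ini
[mysqld]
# InnoDB: if most tables end up in InnoDB, give it the bulk of the RAM
innodb_buffer_pool_size = 8G
innodb_log_file_size    = 256M

# MyISAM: key_buffer caches only indexes; keep it modest as MyISAM shrinks
key_buffer = 512M
```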
RE: mysql server optimization
What kind of queries are being run and what type of data is stored? There are a number of factors which cause MySQL to use on-disk temporary tables instead of in-memory tables (if there are BLOB or TEXT columns in the table, for example). As a starting point you could (if you have the memory in your box) try increasing the values for tmp_table_size and max_heap_table_size; these control how large an in-memory temp table can grow before it is converted to an on-disk temp table.

Regards

John
===
John Daisley
MySQL 5.0 Certified Database Administrator (CMDBA)
MySQL 5.0 Certified Developer
Cognos BI Developer
Telephone: +44(0)1283 537111
Mobile: +44(0)7812 451238
Email: john.dais...@butterflysystems.co.uk
===
Sent via HP IPAQ mobile device

-Original Message-
From: TianJing jingtian.seu...@gmail.com
Sent: 15 December 2009 03:08
To: mysql@lists.mysql.com
Subject: mysql server optimization

Dear all,

I am now having a problem with MySQL server optimization. I have 20 databases on a server, each database about 80Gb. The SQL seems very slow, almost 5s, and the server I/O is very high; when I check the processlist, the 'copying to tmp table' state takes a long time.

I have already used indexes, but the SQL uses lots of 'and', 'or', 'order by', and for some reason I cannot optimize the SQL itself. I hope to do some optimization on the MySQL server to mitigate this; could anyone give me some suggestions? Thanks.

My server is Linux, 8 CPUs and 4G memory. The my.cnf is:

[mysqld]
port = 3306
skip-locking
skip-name-resolve
key_buffer_size = 16M
max_allowed_packet = 1M
table_open_cache = 64
sort_buffer_size = 512M
net_buffer_length = 8K
read_buffer_size = 512K
read_rnd_buffer_size = 512M
myisam_sort_buffer_size = 8M
table_cache = 1024
log-bin=mysql-bin
binlog_format=mixed

--
Tianjing
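[Editor's sketch] Before raising the temp-table limits, note that sort_buffer_size and read_rnd_buffer_size in the my.cnf above are allocated per connection, not shared; at 512M each, a handful of concurrent sorting sessions can exhaust RAM, which matches the out-of-memory crash reported later in this thread. A safer starting point for a 4G box (values are illustrative assumptions):

```ini
[mysqld]
# per-session buffers: allocated per connection, keep them small
sort_buffer_size     = 2M
read_rnd_buffer_size = 2M
read_buffer_size     = 512K

# in-memory temp tables: the smaller of these two limits applies,
# so raise them together
tmp_table_size       = 256M
max_heap_table_size  = 256M

# MyISAM index cache: shared, allocated once per server
key_buffer_size      = 512M
```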
Re: mysql server optimization
i use MyISAM storage,the MYI file is more than 500Gb. most of the sql is select with many 'and','or','order by',for example: SELECT fref,fstart,fstop,fsource,fmethod,fscore,fstrand,fphase,gclass,gname,ftarget_start,ftarget_stop,fdata.fid,fdata.gid FROM fdata,ftype,fgroup WHERE fgroup.gid = fdata.gid AND ftype.ftypeid = fdata.ftypeid AND fdata.fref='chr18' AND (fbin='100' OR fbin between '10' and '10' OR fbin between '.999' and '1' OR fbin between '1000.049' and '1000.051' OR fbin between '100.549' and '100.551' OR fbin between '10.0005529' and '10.0005531' OR fbin between '1.0055379' and '1.0055381' OR fbin between '1000.0553839' and '1000.0553841') AND fdata.fstop='55384910' AND fdata.fstart='55384989' AND ((fmethod = 'genotype' AND fsource = 'SoapSNP') OR (fmethod = 'gt_novel' AND fsource = 'SoapSNP') OR (fmethod = 'SNP' AND fsource = 'MutaGeneSys_06JUNE07') OR (fmethod = 'snp' AND fsource = 'HapMap_gt') OR (fmethod = 'similarity') OR (fmethod = 'HSP') OR (fmethod = 'match') OR (fmethod = 'CDS' AND fsource = 'UCSC_1') OR (fmethod = '5\'-UTR' AND fsource = 'UCSC_1') OR (fmethod = '3\'-UTR' AND fsource = 'UCSC_1') OR (fmethod = 'transcription_start_site' AND fsource = 'UCSC_1') OR (fmethod = 'polyA_site' AND fsource = 'UCSC_1') OR (fmethod = 'UTR' AND fsource = 'UCSC_1') OR (fmethod = 'five_prime_untranslated_region' AND fsource = 'UCSC_1') OR (fmethod = 'three_prime_untranslated_region' AND fsource = 'UCSC_1') OR (fmethod = 'five_prime_UTR' AND fsource = 'UCSC_1') OR (fmethod = 'three_prime_UTR' AND fsource = 'UCSC_1') OR (fmethod = 'exon' AND fsource = 'UCSC_1') OR (fmethod = 'mRNA' AND fsource = 'UCSC_1')) ORDER BY fgroup.gname; there is a longblob column in the table, but even i do not select this blob column, the i/o is still higher. i have 16G memery in total, the oracle database take 1/4 of them, i try to increasing the key_buffer_size to 2048M and tmp_table_size to 1024M,but the machine crashed out of memery in one day,the mysqld takes over 18G memery! 
I used SHOW PROFILES and found that the 'Sending data' and 'Copying to tmp table' states take most of the time.

On Tue, Dec 15, 2009 at 5:36 PM, John Daisley mg_s...@hotmail.com wrote:

What kind of queries are being run, and what type of data is stored? There are a number of factors which cause MySQL to use on-disk temporary tables instead of in-memory tables (if there are BLOB or TEXT columns in the table, for example). As a starting point you could, if you have the memory in your box, try increasing the values for tmp_table_size and max_heap_table_size; these control how large an in-memory temp table can grow before it is converted to an on-disk temp table.

Regards
John
===
John Daisley
MySQL 5.0 Certified Database Administrator (CMDBA)
MySQL 5.0 Certified Developer
Cognos BI Developer
Telephone: +44(0)1283 537111
Mobile: +44(0)7812 451238
Email: john.dais...@butterflysystems.co.uk
===
Sent via HP IPAQ mobile device

-----Original Message-----
From: TianJing jingtian.seu...@gmail.com
Sent: 15 December 2009 03:08
To: mysql@lists.mysql.com
Subject: mysql server optimization

Dear all, I am having a problem with MySQL server optimization. I have 20 databases on a server, each about 80GB. The SQL seems very slow, almost 5s per query, and the server I/O is very high; when I check the processlist, the 'Copying to tmp table' state lasts a long time. I already use indexes, but the SQL uses lots of AND, OR and ORDER BY, and for some reasons I cannot optimize the SQL itself. I hope to do some optimization on the MySQL server to mitigate this; could anyone give me some suggestions? Thanks.
My server is Linux, 8 CPUs and 4GB of memory. The my.cnf is:

[mysqld]
port = 3306
skip-locking
skip-name-resolve
key_buffer_size = 16M
max_allowed_packet = 1M
table_open_cache = 64
sort_buffer_size = 512M
net_buffer_length = 8K
read_buffer_size = 512K
read_rnd_buffer_size = 512M
myisam_sort_buffer_size = 8M
table_cache = 1024
log-bin=mysql-bin
binlog_format=mixed

--
Tianjing
Tel: 0755-2527-3851
MSN: tianjing...@hotmail.com
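Note that sort_buffer_size and read_rnd_buffer_size in this config are allocated per connection, so 512M values can exhaust memory under even modest concurrency, which matches the out-of-memory crash described earlier in the thread. A more conservative sketch for a 4GB MyISAM-only box (values are illustrative starting points, not tuned figures):

```ini
[mysqld]
# MyISAM index cache: the one global buffer worth making large
key_buffer_size      = 1G
# Per-connection buffers: kept small, since each connection may
# allocate its own copy
sort_buffer_size     = 4M
read_buffer_size     = 1M
read_rnd_buffer_size = 4M
# In-memory temp table ceiling; the effective limit is the smaller
# of these two values, so raise them together
tmp_table_size       = 256M
max_heap_table_size  = 256M
```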
RE: mysql server optimization
I'm fairly sure that the longblob column will prevent MySQL from using in-memory temp tables, regardless of whether it is included in the SELECT. In an ideal world I would move that longblob to a separate table.

How big are the tables fdata, ftype and fgroup? Can you post the results of EXPLAIN EXTENDED for the query you posted? Can you also post the output of 'SHOW CREATE TABLE' for the tables fdata, ftype and fgroup?

John Daisley
Business Intelligence Developer - MySQL Database Administrator
Inspired Gaming Group Plc
Direct Dial +44 (0)1283 519244
Telephone +44 (0)1283 512777 ext 2244
Mobile +44 (0)7812 451238
Email john.dais...@llg.co.uk
www.inspiredgaminggroup.com

-----Original Message-----
From: TianJing tianj...@genomics.org.cn
Sent: 15 December 2009 10:28
To: mg_s...@hotmail.com
Cc: mysql@lists.mysql.com
Subject: Re: mysql server optimization
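The diagnostics requested above can be gathered like this (a sketch using the table names from the thread; EXPLAIN EXTENDED is the MySQL 5.x-era syntax):

```sql
-- Optimizer plan and row estimates for a query shaped like the problem one
EXPLAIN EXTENDED
SELECT fref, fstart, fstop, gclass, gname
FROM fdata, ftype, fgroup
WHERE fgroup.gid = fdata.gid
  AND ftype.ftypeid = fdata.ftypeid
  AND fdata.fref = 'chr18'
ORDER BY fgroup.gname;
-- After EXPLAIN EXTENDED, this prints the query as rewritten by the optimizer
SHOW WARNINGS;

-- Full table definitions, including all indexes
SHOW CREATE TABLE fdata\G
SHOW CREATE TABLE ftype\G
SHOW CREATE TABLE fgroup\G
```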
Re: mysql server optimization
; Using filesort |
| 1 | SIMPLE | ftype  | eq_ref | PRIMARY,ftype,fmethod,fsource | PRIMARY | 4 | yhchr1.fdata.ftypeid | 1 | 100.00 | Using where |
| 1 | SIMPLE | fgroup | eq_ref | PRIMARY                       | PRIMARY | 4 | yhchr1.fdata.gid     | 1 | 100.00 |             |
+---+--------+--------+--------+-------------------------------+---------+---+----------------------+---+--------+-------------+

On Tue, Dec 15, 2009 at 6:54 PM, Daisley, John (Burton) john.dais...@llg.co.uk wrote:

I'm fairly sure that the longblob column will prevent MySQL from using in-memory temp tables, regardless of whether it is included in the SELECT. In an ideal world I would move that longblob to a separate table. How big are the tables fdata, ftype and fgroup? Can you post the results of EXPLAIN EXTENDED for the query you posted? Can you also post the output of 'SHOW CREATE TABLE' for the tables fdata, ftype and fgroup?
RE: mysql server optimization
The 'order by' clause is forcing MySQL to use a temporary table; as a test, could you try running the query without the ORDER BY clause? Does it run quicker? MySQL must use a temporary table if you GROUP BY or ORDER BY a column not contained in the first table of the join; sometimes you can get around this limitation just by rewriting the SELECT. Give it a try, it can make a huge difference.

Do the tables have a lot of INSERT, UPDATE and DELETE operations? If so, it may be worth running OPTIMIZE TABLE to rebuild the indexes. Be warned: it can take some time to complete, and the tables are locked while it runs. If you have slaves running, you may want to add LOCAL to the OPTIMIZE TABLE statement so the operation does not run on your slaves.

I'm stuck doing some other stuff at the moment, but I will try to have a proper look at this later and will get back to you.

Regards

John Daisley
Business Intelligence Developer - MySQL Database Administrator
Inspired Gaming Group Plc
Direct Dial +44 (0)1283 519244
Telephone +44 (0)1283 512777 ext 2244
Mobile +44 (0)7812 451238

From: jingtian.seu...@gmail.com [mailto:jingtian.seu...@gmail.com] On Behalf Of TianJing
Sent: 15 December 2009 12:43
To: Daisley, John (Burton)
Cc: mysql@lists.mysql.com
Subject: Re: mysql server optimization

Yes, you are right; the longblob has already been moved to a separate table, fdna, which is about 10GB per database. The fdata table is about 30GB per database and fgroup about 10GB per database. The MYI file is almost the same size as, or much bigger than, the MYD file.
show create table for fdna:

CREATE TABLE `fdna` (
  `fref` varchar(100) NOT NULL,
  `foffset` int(10) unsigned NOT NULL,
  `fdna` longblob,
  PRIMARY KEY (`fref`,`foffset`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

show create table for fdata:

CREATE TABLE `fdata` (
  `fid` int(11) NOT NULL AUTO_INCREMENT,
  `fref` varchar(100) NOT NULL,
  `fstart` int(10) unsigned NOT NULL,
  `fstop` int(10) unsigned NOT NULL,
  `fbin` double(20,6) NOT NULL,
  `ftypeid` int(11) NOT NULL,
  `fscore` float DEFAULT NULL,
  `fstrand` enum('+','-') DEFAULT NULL,
  `fphase` enum('0','1','2') DEFAULT NULL,
  `gid` int(11) NOT NULL,
  `ftarget_start` int(10) unsigned DEFAULT NULL,
  `ftarget_stop` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`fid`),
  UNIQUE KEY `fref` (`fref`,`fbin`,`fstart`,`fstop`,`ftypeid`,`gid`),
  KEY `ftypeid` (`ftypeid`),
  KEY `gid` (`gid`)
) ENGINE=MyISAM AUTO_INCREMENT=381002371 DEFAULT CHARSET=latin1

show create table for fgroup:

CREATE TABLE `fgroup` (
  `gid` int(11) NOT NULL AUTO_INCREMENT,
  `gclass` varchar(100) DEFAULT NULL,
  `gname` varchar(100) DEFAULT NULL,
  PRIMARY KEY (`gid`),
  UNIQUE KEY `gclass` (`gclass`,`gname`)
) ENGINE=MyISAM AUTO_INCREMENT=240501186 DEFAULT CHARSET=latin1

show create table for ftype:

CREATE TABLE `ftype` (
  `ftypeid` int(11) NOT NULL AUTO_INCREMENT,
  `fmethod` varchar(100) NOT NULL,
  `fsource` varchar(100) DEFAULT NULL,
  PRIMARY KEY (`ftypeid`),
  UNIQUE KEY `ftype` (`fmethod`,`fsource`),
  KEY `fmethod` (`fmethod`),
  KEY `fsource` (`fsource`)
) ENGINE=MyISAM AUTO_INCREMENT=16 DEFAULT CHARSET=latin1

the indexes on fdata are:

| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Index_type |
| fdata | 0          | PRIMARY  | 1            | fid         | A         |   463619315 | BTREE      |
| fdata | 0          | fref     | 1            | fref        | A         |           1 | BTREE      |
| fdata | 0          | fref     | 2            | fbin        | A         |      229060 | BTREE      |
| fdata | 0          | fref     | 3            | fstart      | A         |   231809657 | BTREE      |
| fdata | 0          | fref     | 4            | fstop       | A         |   463619315 | BTREE      |
| fdata | 0          | fref     | 5            | ftypeid     | A         |   463619315 | BTREE      |
| fdata | 0          | fref     | 6            | gid         | A         |   463619315 | BTREE      |
| fdata | 1          | ftypeid  | 1            | ftypeid     | A         |          15 | BTREE      |
| fdata | 1          | gid      | 1            | gid         | A         |   231809657 | BTREE      |

the index listing for fgroup is:

| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null
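John's two suggestions above can be expressed concretely (a sketch; whether the sort actually disappears depends on the join order the optimizer chooses):

```sql
-- 1. The same query without ORDER BY, to isolate the cost of the
--    filesort / temporary table
SELECT fref, fstart, fstop, gclass, gname
FROM fdata, ftype, fgroup
WHERE fgroup.gid = fdata.gid
  AND ftype.ftypeid = fdata.ftypeid
  AND fdata.fref = 'chr18';

-- 2. Rebuild the indexes; LOCAL keeps the statement out of the binary
--    log so it is not replicated to any slaves
OPTIMIZE LOCAL TABLE fdata, ftype, fgroup;
```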
sql file system + optimization
Ok... in the file system, my MySQL files are located in /var/lib/mysql/. I have a database (let's say 'test'), so it gets its own directory, /var/lib/mysql/test/, and all the tables go in this folder. (I know, we *should* all know this...)

If the database were on another disk, could that increase performance for a large database? If so, could I just symlink the test/ directory to another RAID array to increase performance? Or would the increase be negligible?

Steven Staples

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=arch...@jab.org
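A sketch of the symlink approach, using throwaway directories in place of the live datadir and RAID mount (all paths here are illustrative; on a real server you would stop mysqld first):

```shell
# Stand-ins for /var/lib/mysql and the RAID array mount point
mkdir -p demo/mysql/test demo/raid
touch demo/mysql/test/messages.MYD demo/mysql/test/messages.MYI

# Move the database directory to the fast disk, then link it back
mv demo/mysql/test demo/raid/test
ln -s ../raid/test demo/mysql/test

# The server still finds the tables at the old path
ls demo/mysql/test/
```

On a real system the link target would be an absolute path on the other array, and ownership must stay with the mysql user (chown -h mysql:mysql on the link).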
RE: sql file system + optimization
Yes, you can symlink it. How much performance benefit you get will depend on hardware and traffic.

===
John Daisley
MySQL 5.0 Certified Database Administrator (CMDBA)
MySQL 5.0 Certified Developer
Cognos BI Developer
Telephone: +44(0)1283 537111
Mobile: +44(0)7812 451238
Email: john.dais...@butterflysystems.co.uk
===
Sent via HP IPAQ mobile device

-----Original Message-----
From: Steven Staples sstap...@mnsi.net
Sent: 15 December 2009 14:37
To: mysql@lists.mysql.com
Subject: sql file system + optimization
Re: mysql server optimization
Yes, it runs faster without the ORDER BY clause, since it does not use filesort then. Because of the huge amount of data, it takes a lot of time to copy it to the tmp table (or to an on-disk tmp table) when filesort is used, which also leads to the high I/O wait.

I am trying to increase the variable max_length_for_sort_data to 8096, hoping this helps MySQL use the more efficient sorting algorithm; theoretically this can reduce the I/O. I am also trying to use a tmpfs filesystem as tmpdir, which can greatly improve read and write speed on disk. I hope these help; do you have any suggestions on this?

There are only SELECT operations in my database, and I have done some optimization with 'myisamchk -aS', but the effect was not very good. I am sure it would be greatly improved if we rewrote the SELECT, but unfortunately this SQL is inside packaged software and I have no rights to rewrite it, so tuning the MySQL server is my only option.

Thanks again for your help. All the best.

On Tue, Dec 15, 2009 at 9:59 PM, Daisley, John (Burton) john.dais...@llg.co.uk wrote:

The 'order by' clause is forcing MySQL to use a temporary table; as a test, could you try running the query without the ORDER BY clause?
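The tmpfs idea above looks like this in practice (an illustrative sketch; the mount point and size are assumptions, not values from the thread):

```ini
# /etc/fstab entry: a RAM-backed filesystem for MySQL's on-disk temp tables
# tmpfs  /var/mysqltmp  tmpfs  rw,uid=mysql,gid=mysql,size=2G  0  0

# my.cnf fragment pointing MySQL's temporary files at that mount
[mysqld]
tmpdir = /var/mysqltmp
```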
Optimization suggestions
I have the following table structure; I have to use the MERGE storage engine. Please have a look, and provide feedback if there's something wrong or if there's room for optimization.

/* Table structure for table `messages2009` */
CREATE TABLE `messages2009` (
  `id` varchar(36) NOT NULL default '',
  `folderid` varchar(36) NOT NULL default '',
  `fromid` int(11) NOT NULL default '0',
  `fromtype` varchar(10) NOT NULL default '',
  `toid` int(11) NOT NULL default '0',
  `totype` varchar(10) NOT NULL default '',
  `subject` varchar(255) default NULL,
  `body` text,
  `readbyrecipient` tinyint(1) NOT NULL default '0',
  `deletedbyauthor` tinyint(1) NOT NULL default '0',
  `deletedbyrecipient` tinyint(1) NOT NULL default '0',
  `threadid` varchar(36) NOT NULL default '',
  `senttime` timestamp NOT NULL default CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  KEY `folderid` (`folderid`),
  KEY `threadid` (`threadid`),
  KEY `inboxfolderindex` (`folderid`,`toid`,`totype`),
  KEY `sentitemsindex` (`fromid`,`fromtype`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

/* Table structure for table `messages` */
/* Merge table definition that covers all message tables */
CREATE TABLE `messages` (
  `id` varchar(36) NOT NULL default '',
  `folderid` varchar(36) NOT NULL default '',
  `fromid` int(11) NOT NULL default '0',
  `fromtype` varchar(10) NOT NULL default '',
  `toid` int(11) NOT NULL default '0',
  `totype` varchar(10) NOT NULL default '',
  `subject` varchar(255) default NULL,
  `body` text,
  `readbyrecipient` tinyint(1) NOT NULL default '0',
  `deletedbyauthor` tinyint(1) NOT NULL default '0',
  `deletedbyrecipient` tinyint(1) NOT NULL default '0',
  `threadid` varchar(36) NOT NULL default '',
  `senttime` timestamp NOT NULL default CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  KEY `folderid` (`folderid`),
  KEY `threadid` (`threadid`),
  KEY `inboxfolderindex` (`folderid`,`toid`,`totype`),
  KEY `sentitemsindex` (`fromid`,`fromtype`)
) ENGINE=MRG_MyISAM DEFAULT CHARSET=latin1 INSERT_METHOD=LAST UNION=(`messages2009`);

Sudhir Nimavat
Senior software engineer
Quick start global PVT LTD, Baroda - 390007, Gujarat, India

Personally I'm always ready to learn, although I do not always like being taught

The INTERNET now has a personality. YOURS! See your Yahoo! Homepage. http://in.yahoo.com/
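One maintenance task this per-year layout implies is adding a new underlying table each year and extending the merge definition; a sketch (the `messages2010` name is an assumption following the naming pattern above):

```sql
-- Create next year's table with the same columns, keys and engine
CREATE TABLE messages2010 LIKE messages2009;

-- Point the MERGE table at both years; new rows go to the last table
ALTER TABLE messages
  UNION = (messages2009, messages2010),
  INSERT_METHOD = LAST;
```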
Re: Optimization suggestions
After one very quick look: the index on folderid alone is unnecessary, since you have another index (inboxfolderindex) in which that field is the leading, high-order column.

On Mon, Dec 14, 2009 at 12:31 PM, Sudhir N sudhir_nima...@yahoo.com wrote:

I have the following table structure; I have to use the MERGE storage engine. Please have a look, and provide feedback if there's something wrong or if there's room for optimization.

--
Jim Lyons
Web developer / Database administrator
http://www.weblyons.com
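Jim's point, stated as a schema change (a sketch; it would be run against each underlying MyISAM table):

```sql
-- folderid is the leading column of inboxfolderindex (folderid, toid, totype),
-- so queries filtering on folderid alone can use that index instead
ALTER TABLE messages2009 DROP INDEX folderid;
```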
RE: Optimization suggestions
id should probably be an auto-incrementing INT; if you still need a unique text identifier, I would make that a separate field. My opinion isn't the only way, though; there is much debate on natural vs. surrogate keys. I would normalize folderid into a lookup in another table and make folderid an INT value. threadid is another field that would probably be better as an INT.

As for your indexes, they depend completely on the type of queries you're going to run. Once you know that, you can test them using sample data and EXPLAIN.

http://dev.mysql.com/doc/refman/5.0/en/explain.html
http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html

About normalization:
http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html

Regards,
Gavin Towey

-----Original Message-----
From: Sudhir N [mailto:sudhir_nima...@yahoo.com]
Sent: Monday, December 14, 2009 10:31 AM
To: Mysql
Subject: Optimization suggestions

I have the following table structure; I have to use the MERGE storage engine. Please have a look, and provide feedback if there's something wrong or if there's room for optimization.
This message contains confidential information and is intended only for the individual named. If you are not the named addressee, you are notified that reviewing, disseminating, disclosing, copying or distributing this e-mail is strictly prohibited. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. E-mail transmission cannot be guaranteed to be secure or error-free, as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any loss or damage caused by viruses or errors or omissions in the contents of this message, which arise as a result of e-mail transmission. [FriendFinder Networks, Inc., 220 Humbolt court, Sunnyvale, CA 94089, USA, FriendFinder.com
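A sketch of the folder lookup table Gavin describes (the table and column names here are illustrative assumptions, not part of the original schema):

```sql
-- Lookup table with a 4-byte INT surrogate key replacing the varchar(36) id
CREATE TABLE folders (
  folderid   INT NOT NULL AUTO_INCREMENT,
  foldername VARCHAR(100) NOT NULL,
  PRIMARY KEY (folderid),
  UNIQUE KEY (foldername)
) ENGINE=MyISAM;

-- messages2009 would then declare:
--   `folderid` INT NOT NULL,
--   KEY `inboxfolderindex` (`folderid`, `toid`, `totype`)
-- making every index on folderid considerably smaller
```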
mysql server optimization
Dear all, I am having a problem with MySQL server optimization. I have 20 databases on a server, each about 80GB. The SQL seems very slow, almost 5s per query, and the server I/O is very high; when I check the processlist, the 'Copying to tmp table' state takes a long time. I already use indexes, but the SQL uses lots of AND, OR and ORDER BY, and for some reasons I cannot optimize the SQL itself. I hope to do some optimization on the MySQL server to mitigate this; could anyone give me some suggestions? Thanks.

My server is Linux, 8 CPUs and 4GB of memory. The my.cnf is:

[mysqld]
port = 3306
skip-locking
skip-name-resolve
key_buffer_size = 16M
max_allowed_packet = 1M
table_open_cache = 64
sort_buffer_size = 512M
net_buffer_length = 8K
read_buffer_size = 512K
read_rnd_buffer_size = 512M
myisam_sort_buffer_size = 8M
table_cache = 1024
log-bin=mysql-bin
binlog_format=mixed

--
Tianjing