Re: MySQL optimization for testing purpose
On 04/03/2018 06:15 PM, Sebastien FLAESCH wrote:
> [original question trimmed]

FYI, I have tried the following settings, but they did not help:

innodb_stats_auto_recalc=0
innodb_file_per_table=0
innodb_stats_persistent=0

I recreated my database after restarting the server.

Server version: 8.0.4-rc-log MySQL Community Server (GPL)

Seb

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql
Re: MySQL optimization for testing purpose
On 04/03/2018 05:59 PM, Sebastien FLAESCH wrote:
> Some tests behave a bit strange, sometimes one of our tests takes
> 1 min 30 secs, sometimes it takes only 15 seconds...

In fact this test creates and drops the same table ~150 times, with a single column using a different type each time. I can clearly see that table creations are sometimes fast, but from time to time they slow down to about a second per table. How can this happen? Any log I could enable and provide here?

Thanks in advance!
Seb
MySQL optimization for testing purpose
Hi all,

Looking for some configuration tips to speed up execution of our non-regression tests with MySQL. We do pure functional testing; data persistence is not critical. Our tests do not use a lot of data, but we create and drop a lot of test tables. We also have concurrency tests (processes locking rows), so that feature matters. Are there some InnoDB parameters to enable or disable to speed up our tests?

Some tests behave a bit strangely: sometimes one of our tests takes 1 min 30 secs, sometimes it takes only 15 seconds... How can this happen? Any log I could enable and provide here?

Thanks in advance!
Seb
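For a disposable test database like this, a few server settings are commonly relaxed to trade crash safety for speed. The fragment below is only a sketch of that idea - these values sacrifice durability and must never be used in production, and each option should be checked against your server version's documentation:

```ini
# my.cnf sketch for throwaway test instances (NOT for production):
[mysqld]
# Flush the redo log roughly once per second instead of at every commit.
innodb_flush_log_at_trx_commit = 0
# Let the OS decide when binary log writes reach disk.
sync_binlog = 0
# Skip the doublewrite buffer; a crash may corrupt data files.
innodb_doublewrite = 0
# Keep the working set in memory so table create/drop churn stays cheap.
innodb_buffer_pool_size = 1G
# Optionally point the data directory at a RAM disk (e.g. tmpfs on Linux):
# datadir = /dev/shm/mysql-test
```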
inconsistent optimization
Without going into specific details on queries...

Using MySQL 5.1 as provided with CentOS 6, I've noticed some queries exhibiting what I can best describe as inconsistent optimization. The database can be quieted down to just controlled queries, and at times the same query will return very quickly, while at other times it may take minutes. I don't see the same behavior with MySQL 5.0 under CentOS 5: there, the same queries on the same data consistently return quickly.

When the queries run slowly, they show in the process list in either a "copy to temp table" or "sending data" state. At first I thought restructuring the queries to avoid the temp table copy was a path to a solution, but now I don't think so, since the same query, changed so that it no longer needs a temp table, will sit in the "sending data" state for a long time. The queries do eventually come back with correct results, but they take minutes rather than milliseconds (sometimes slow, sometimes fast).

Have others seen this behavior? Any explanations? Any reading to point to for further understanding?

Thanks.
Re: inconsistent optimization
Well, try to start by checking the IOPS vs. the disk. Check your iowait and the cache size. Could you send us the CREATE TABLE and the query?

Regards,
Eduardo Fontinelle
Chief Technology Officer | Gerencianet
Phone: +55 (31) 3603-0812

2014-08-20 12:04 GMT-03:00 Jim j...@lowcarbfriends.com:
> [original question trimmed]
RE: inconsistent optimization
Jim/Jaime,

What engine are you implementing?

Greetings from South America,
Martín

Date: Wed, 20 Aug 2014 13:54:46 -0300
Subject: Re: inconsistent optimization
From: edua...@gerencianet.com.br
To: j...@lowcarbfriends.com
CC: mysql@lists.mysql.com
> [quoted replies trimmed]
Re: inconsistent optimization
InnoDB.

On 8/20/2014 1:22 PM, Martin Gainty wrote:
> What engine are you implementing?
> [rest of quoted thread trimmed]
Re: inconsistent optimization
Hi Jim,

On 8/20/2014 11:04 AM, Jim wrote:
> [original question trimmed]

Fluctuations in query times can be the result of configuration mistakes (like creating a 1GB query cache or a tiny InnoDB buffer pool), of data changes (did you add, remove, or change a bunch of rows?), or of query patterns (did you add or remove terms from your WHERE clauses, or change which columns were in your SELECT clause?).

To know why a query is doing what it is doing, you need to ask the Optimizer. The Optimizer is the part of the server that works out the most efficient way to go get the data you are asking for and how to process that data once it is pulled from disk or cache. This is the purpose of the EXPLAIN operator. Just put that word before SELECT and see what you get. An explanation of how to interpret an EXPLAIN report is here in the manual (you are reading the manual, right?):

http://dev.mysql.com/doc/refman/5.1/en/explain.html
http://dev.mysql.com/doc/refman/5.1/en/execution-plan-information.html

That will give you a starting place. After that, you can refer to the other sections of the Optimization chapter to see what you can or should be changing to improve your performance:

http://dev.mysql.com/doc/refman/5.1/en/optimization.html

You will also need to learn a little bit about index statistics, as those are what the Optimizer uses to develop its execution plans:

http://dev.mysql.com/doc/refman/5.1/en/analyze-table.html
http://dev.mysql.com/doc/refman/5.1/en/show-index.html
http://dev.mysql.com/doc/refman/5.1/en/innodb-restrictions.html (search for "ANALYZE TABLE determines index cardinality...")
http://dev.mysql.com/doc/refman/5.1/en/innodb-parameters.html#sysvar_innodb_stats_sample_pages
http://dev.mysql.com/doc/refman/5.1/en/optimizer-issues.html

Feel free to ask the list any questions that may arise in your research.

Regards,
--
Shawn Green
MySQL Senior Principal Technical Support Engineer
Oracle USA, Inc. - Hardware and Software, Engineered to Work Together.
Office: Blountville, TN
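As a minimal illustration of the workflow Shawn describes (the table and column names below are made up for the example, not taken from Jim's schema):

```sql
-- Ask the optimizer for its plan instead of guessing:
EXPLAIN SELECT o.id, o.total
FROM   orders o
WHERE  o.customer_id = 42;

-- If the report shows type=ALL (full table scan) and possible_keys=NULL,
-- an index on the filtered column is usually the first thing to try:
ALTER TABLE orders ADD INDEX idx_customer (customer_id);

-- Refresh the index statistics the optimizer bases its plan on:
ANALYZE TABLE orders;
```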
Re: LIKE sql optimization
On 12/02/2014 13:16, Morgan Tocker wrote:
> Hi Zhigang,
>
> I think I understand the question - you are asking why MySQL will not
> index scan, find matching records, and then look them up rather than
> table scan?
>
> I believe the answer is that there is no way of knowing if 1 row
> matches, or all rows match. In the worst case (where all rows match),
> it is much more expensive to traverse between index and data rows
> for each record. So a table scan is a "safe" choice / has less variance.

In addition to what Morgan writes: with an index scan you will end up doing a lot of random I/O. Even if the index scan itself is one sequential scan (which is not guaranteed), then for each match it will be necessary to look up the actual row. A table scan, on the other hand, will generally be more of a sequential read, as you already have all the data available for each match. Random I/O is more expensive than sequential I/O - particularly on spinning disks - so in general the optimizer will try to reduce the amount of random I/O.

In some cases, though, you may see the index scan preferred. Assume you have a query like:

SELECT val FROM table WHERE condition LIKE '%abcd';

and you have an index (condition, val) or (val, condition). Then the whole query can be satisfied from the index (this is called a covering index), and the index scan is usually preferred over the table scan.

To use an index for actual look-ups - rather than either a table or index scan - with WHERE clauses like LIKE '%abcd', you can do a couple of things:

* Duplicate the column used in the WHERE clause, but reverse the string. That way the above WHERE clause becomes:

  WHERE condition_reverse LIKE 'dcba%'

  This can use an index, as it is a left prefix.

* If you always look for around the same number of characters at the end in your WHERE clause, you can create a column with just those last characters, so the WHERE clause becomes:

  WHERE condition_suffix = 'abcd'

Do be careful, however, to ensure you have enough selectivity that way. If, for example, 90% of the rows end in 'abcd', an index will not do you much good (unless you are looking for rows in the remaining 10%).

Best regards,
Jesper Krogh
MySQL Support
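A sketch of the reversed-column technique Jesper describes (the table and column names are hypothetical). On MySQL 5.7 and later a generated column can keep the reversed copy in sync automatically; on older versions you would maintain it from the application or with triggers:

```sql
-- Hypothetical table with a reversed copy of the searched column.
CREATE TABLE t (
  id       INT PRIMARY KEY,
  cond     VARCHAR(100),
  cond_rev VARCHAR(100) AS (REVERSE(cond)) STORED,  -- MySQL 5.7+ generated column
  KEY idx_cond_rev (cond_rev)
);

-- Instead of:  WHERE cond LIKE '%abcd'
-- query the reversed column with a left prefix, which can use the index:
SELECT id, cond
FROM   t
WHERE  cond_rev LIKE CONCAT(REVERSE('abcd'), '%');
```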
RE: LIKE sql optimization
Done. Thank you very much!

Zhigang

From: Jesper Wisborg Krogh [mailto:my...@wisborg.dk]
Sent: Wednesday, February 12, 2014 5:30 PM
To: Morgan Tocker; Zhigang Zhang
Cc: mysql@lists.mysql.com
Subject: Re: LIKE sql optimization
> [Jesper's reply trimmed]
LIKE sql optimization
For example:

Select * from T where col like '%abcd';

The table T is a MyISAM table, and we created an index on col. As we know, this LIKE query does not use the index created on col. It confuses me - why? I would think that in the MyISAM engine the index data is smaller, so it could use the index linked list to reduce the disk scan compared to a whole-table scan.

Thanks.
Zhigang
Re: LIKE sql optimization
On 12.02.2014 02:23, Zhigang Zhang wrote:
> For example:
> Select * from T where col like '%abcd';
> [rest of question trimmed]

Because '%abcd' can't work in the case of an index - how do you imagine that would work? Read up on how an index works technically. 'abcd%' may work, but '%abcd' is impossible: independent of the engine, this doesn't work and won't ever work.

You may have luck with fulltext search (and its other drawbacks):
https://dev.mysql.com/doc/refman/5.5/en/fulltext-search.html
Re: LIKE sql optimization
An SQL database doesn't use an index for a LIKE condition if it starts with '%'; LIKE 'abcd%' would work, though. To use an index, you can store your value using the REVERSE function and index that; then your LIKE can use the index.

2014-02-11 20:23 GMT-05:00 Zhigang Zhang zzgang2...@gmail.com:
> [question trimmed]
RE: LIKE sql optimization
I want to know the reason. In my opinion, scanning the smaller index data should perform better than scanning the whole table data.

Zhigang

From: Mathieu Desharnais [mailto:mdesharn...@diffusion.cc]
Sent: Wednesday, February 12, 2014 9:41 AM
To: Zhigang Zhang; mysql@lists.mysql.com
Subject: Re: LIKE sql optimization
> [reply trimmed]
Re: LIKE sql optimization
*read how an index works technically*

On Wed, Feb 12, 2014 at 12:48 PM, Zhigang Zhang zzgang2...@gmail.com wrote:
> I want to know the reason. In my opinion, scanning the smaller index
> data should perform better than scanning the whole table data.
> [rest of quoted thread trimmed]
Re: LIKE sql optimization
Because an index is not just a dumb copy of the whole field, and you simply can't seek into the middle of it:

http://en.wikipedia.org/wiki/B-tree
http://mattfleming.com/node/192

On 12.02.2014 02:48, Zhigang Zhang wrote:
> I want to know the reason. In my opinion, scanning the smaller index
> data should perform better than scanning the whole table data.
> [rest of quoted thread trimmed]
Re: LIKE sql optimization
MySQL can't use an index when a leading-'%' condition is given - neither can Oracle. You can try full-text search.

2014-02-12 9:55 GMT+08:00 kitlenv kitl...@gmail.com:
> [quoted thread trimmed]

--
Phone: +86 1868061
Email & Gtalk: yloui...@gmail.com
Personal Blog: http://www.vmcd.org
Re: LIKE sql optimization
Same reason why a composite index works only if you supply the first field or fields. Example: with an index on (a, b, c), a query like

select * from tbl where a = 'whatever' and b = 'something'

will use the index, but a query like this one:

select * from tbl where b = 'something' and c = 'something else'

won't use the index - just like an index in a book.

2014-02-11 21:03 GMT-05:00 louis liu yloui...@gmail.com:
> [quoted thread trimmed]
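The leftmost-prefix rule Mathieu describes can be checked directly with EXPLAIN (the table here is hypothetical, built just for the demonstration):

```sql
CREATE TABLE tbl (
  a VARCHAR(20),
  b VARCHAR(20),
  c VARCHAR(20),
  KEY idx_abc (a, b, c)
);

-- Can use idx_abc: the WHERE clause covers a leftmost prefix (a, b) of the index.
EXPLAIN SELECT * FROM tbl WHERE a = 'whatever' AND b = 'something';

-- Cannot use idx_abc: the leading column a is absent, so matching (b, c)
-- entries are scattered throughout the index tree.
EXPLAIN SELECT * FROM tbl WHERE b = 'something' AND c = 'something else';
```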
RE: LIKE sql optimization
I checked a MyISAM table index; the index is a copy of the whole field.

Zhigang

-----Original Message-----
From: Reindl Harald [mailto:h.rei...@thelounge.net]
Sent: Wednesday, February 12, 2014 10:02 AM
To: mysql@lists.mysql.com
Subject: Re: LIKE sql optimization
> [reply and quoted thread trimmed]
Re: LIKE sql optimization
Hi Zhigang,

On Feb 11, 2014, at 8:48 PM, Zhigang Zhang zzgang2...@gmail.com wrote:
> I want to know the reason. In my opinion, scanning the smaller index
> data has better performance than scanning the whole table data.

I think I understand the question - you are asking why MySQL will not index scan, find matching records, and then look them up rather than table scan?

I believe the answer is that there is no way of knowing if 1 row matches, or all rows match. In the worst case (where all rows match), it is much more expensive to traverse between index and data rows for each record. So a table scan is a "safe" choice / has less variance.
RE: LIKE sql optimization
Thank you very much!

Zhigang

-----Original Message-----
From: Morgan Tocker [mailto:morgan.toc...@oracle.com]
Sent: Wednesday, February 12, 2014 10:16 AM
To: Zhigang Zhang
Cc: mysql@lists.mysql.com
Subject: Re: LIKE sql optimization
> [reply trimmed]
Re: Query Optimization
SELECT * FROM XYZ WHERE VAL IN ('BLUE','RED','GREEN');

This query will be faster if the field is an ENUM.

On Fri, Nov 16, 2012 at 12:36 AM, Anupam Karmarkar sb_akarmar...@yahoo.com wrote:
> Hi All,
>
> Consider a scenario: I have a table XYZ which contains the following values:
>
> BLUE
> RED
> GREEN
> NULL
>
> The following queries can be used to get these values:
>
> 1. SELECT * FROM XYZ WHERE VAL IN ('BLUE','RED','GREEN');
> 2. SELECT * FROM XYZ WHERE VAL IS NOT NULL;
> 3. SELECT * FROM XYZ WHERE VAL = 'BLUE' OR VAL='RED' OR VAL='GREEN';
>
> and more. So which one is good in terms of optimization? I guess 1 and 3
> are similar in terms of formation.
>
> --Anupam

--
Thanks & Regards,
P. Benaya Paul
http://www.codeasearch.com
http://www.iwannasearch.com
RE: Query Optimization
It depends on the distribution of the 4 'values' in that field. If the cardinality is poor, then INDEX(VAL) won't be used, and they will all do a table scan.

-----Original Message-----
From: Anupam Karmarkar [mailto:sb_akarmar...@yahoo.com]
Sent: Friday, November 16, 2012 12:36 AM
To: mysql@lists.mysql.com
Subject: Query Optimization
> [question trimmed]
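To the optimizer, the IN list and the OR chain are equivalent forms of the same predicate, so forms 1 and 3 normally get the same plan, and whether INDEX(VAL) is used at all depends on cardinality. This is easy to verify on your own server version with EXPLAIN:

```sql
-- Compare the plans; on most versions, queries 1 and 3 are optimized identically.
EXPLAIN SELECT * FROM XYZ WHERE VAL IN ('BLUE','RED','GREEN');
EXPLAIN SELECT * FROM XYZ WHERE VAL = 'BLUE' OR VAL = 'RED' OR VAL = 'GREEN';
EXPLAIN SELECT * FROM XYZ WHERE VAL IS NOT NULL;
```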
RE: forcing mysql to use batched key access (BKA) optimization for joins
INDEX(o_orderdate, o_custkey, o_orderkey) would probably help a bunch.

I assume you have indexes (PKs?) on c_custkey, l_orderkey, n_nationkey.

Please provide SHOW CREATE TABLE and SHOW TABLE STATUS.

-----Original Message-----
From: Halász Sándor [mailto:h...@tbbs.net]
Sent: Tuesday, April 10, 2012 5:20 PM
To: mysql@lists.mysql.com
Subject: Re: forcing mysql to use batched key access (BKA) optimization for joins

2012/04/10 15:58 -0400, Stephen Tu:
> select c_custkey, c_name,
>        sum(l_extendedprice * (100 - l_discount)) as revenue,
>        c_acctbal, n_name, c_address, c_phone, c_comment
> from CUSTOMER_INT, ORDERS_INT, LINEITEM_INT, NATION_INT
> where c_custkey = o_custkey
>   and l_orderkey = o_orderkey
>   and o_orderdate >= date '1994-08-01'
>   and o_orderdate < date '1994-08-01' + interval '3' month
>   and l_returnflag = 'R'
>   and c_nationkey = n_nationkey
> group by c_custkey, c_name, c_acctbal, c_phone, n_name, c_address, c_comment
> order by revenue desc
> limit 20

I understand that MySQL works better if the inner join is stated explicitly, not implicitly as you have it. What are your keys and indexes? Going by what I have read and heard, you want an index, if not a key, on every field named in this query's WHERE clause - and, of course, every one of your fields named '...key' is a key, right?
Re: forcing mysql to use batched key access (BKA) optimization for joins
Hi Stephen,

2012/04/10 15:58 -0400, Stephen Tu:

+----+-------------+--------------+--------+-------------------+-----------+---------+-------------------------------------+---------+-----------------------------------------------------+
| id | select_type | table        | type   | possible_keys     | key       | key_len | ref                                 | rows    | Extra                                               |
+----+-------------+--------------+--------+-------------------+-----------+---------+-------------------------------------+---------+-----------------------------------------------------+
|  1 | SIMPLE      | CUSTOMER_INT | ALL    | PRIMARY           | NULL      | NULL    | NULL                                | 1501528 | Using temporary; Using filesort                     |
|  1 | SIMPLE      | NATION_INT   | eq_ref | PRIMARY           | PRIMARY   | 4       | tpch-10.00.CUSTOMER_INT.C_NATIONKEY |       1 | NULL                                                |
|  1 | SIMPLE      | ORDERS_INT   | ref    | PRIMARY,O_CUSTKEY | O_CUSTKEY | 4       | tpch-10.00.CUSTOMER_INT.C_CUSTKEY   |       7 | Using where; Using join buffer (Batched Key Access) |
|  1 | SIMPLE      | LINEITEM_INT | ref    | PRIMARY           | PRIMARY   | 4       | tpch-10.00.ORDERS_INT.O_ORDERKEY    |       1 | Using where                                         |
+----+-------------+--------------+--------+-------------------+-----------+---------+-------------------------------------+---------+-----------------------------------------------------+
4 rows in set (0.00 sec)

I'm wondering why, in this particular query, mysql doesn't use BKA to join the LINEITEM_INT table, but uses it for the ORDERS_INT table? It seems like it should also use BKA to batch key fetches from the LINEITEM_INT table (this, I believe, is the reason for the slow performance).

The basis for Batched Key Access (BKA) is the Disk-Sweep Multi-Range Read (DS-MRR) strategy. The basic idea of DS-MRR is to accumulate primary keys from a batch of secondary index look-ups and access the rows in the base table in primary key order. In other words, DS-MRR (and BKA) does not apply to look-ups by primary key. Hence, since the ref access into LINEITEM_INT is by primary key, BKA will not be used.

Maybe you will get a more optimal plan if you add more indexes. In my case, where I have an index on orders(o_orderdate), the join will start with a range scan on the orders table. If I enable DS-MRR for this range scan, query time is reduced from 455 seconds (without DS-MRR) to 90 seconds on a scale 1 database.

Hope this helps,

--
Øystein
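For anyone wanting to reproduce this, DS-MRR and BKA are controlled through the optimizer_switch system variable (MySQL 5.6 and later). A sketch - the index on o_orderdate is the one described above, and the buffer size is an arbitrary example:

```sql
-- BKA builds on MRR; mrr_cost_based=off is usually needed as well,
-- so the optimizer does not reject MRR on its cost estimate.
SET optimizer_switch = 'mrr=on,mrr_cost_based=off,batched_key_access=on';

-- BKA batches keys in the join buffer, so a larger buffer helps.
SET SESSION join_buffer_size = 32 * 1024 * 1024;

-- Secondary index that lets a range scan on orders drive the join.
ALTER TABLE ORDERS_INT ADD INDEX idx_odate (o_orderdate);
```

After this, EXPLAIN should show "Using join buffer (Batched Key Access)" on the joins where BKA applies.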
Re: forcing mysql to use batched key access (BKA) optimization for joins
2012/04/10 15:58 -0400, Stephen Tu:

select c_custkey, c_name, sum(l_extendedprice * (100 - l_discount)) as revenue, c_acctbal, n_name, c_address, c_phone, c_comment
from CUSTOMER_INT, ORDERS_INT, LINEITEM_INT, NATION_INT
where c_custkey = o_custkey
  and l_orderkey = o_orderkey
  and o_orderdate >= date '1994-08-01'
  and o_orderdate < date '1994-08-01' + interval '3' month
  and l_returnflag = 'R'
  and c_nationkey = n_nationkey
group by c_custkey, c_name, c_acctbal, c_phone, n_name, c_address, c_comment
order by revenue desc
limit 20

I understand that MySQL works better if the inner join is stated explicitly, not implicitly as you have it. What are your keys and indices? Going by what I have read and heard, you want every field named after this query's 'where' to be an index, if not a key -- and, of course, every field named '...key' is a key, right?
query optimization
I have a table with 24 million rows and I need to figure out how to optimize a query. It has to do with mac addresses and radius packets - I want to see the # of connections and the min/max date. So I basically want all this data:

select cpe_mac, count(*) c, min(r3_dt) mindt, max(r3_dt) maxdt, max(rec_num) recn
from radiuscap
where r3_dt >= SUBDATE(NOW(), INTERVAL 30 DAY)
  and r3_type = 'Access'
group by cpe_mac
order by cpe_mac;

This piece of the query takes 30 seconds to run and produces 3500 rows. I have r3_dt indexed. I also want the status field of the row with the highest r3_dt:

select rec_num, cpe_mac, req_status
from rad_r3cap
where r3_type = 'Access'
  and (cpe_mac, r3_dt) in (select cpe_mac, max(r3_dt) from rad_r3cap group by cpe_mac);

This piece of the query takes forever; I let it run for an hour and it still didn't finish, so it's obviously not using indexes, and I have no idea how far along it got. I wrote a php script to run the 1st query, then do 3500 individual lookups for the status using the max(rec_num) field from the 1st query, and I can get the data in 31 seconds. So I CAN produce this data, but very slowly, and not in 1 sql query. I want to consolidate this into 1 sql statement so I can make a view. If anyone can point me in the right direction, I'd appreciate it!
mysql> desc rad_r3cap;
+-------------+-------------+------+-----+---------+----------------+
| Field       | Type        | Null | Key | Default | Extra          |
+-------------+-------------+------+-----+---------+----------------+
| rec_num     | int(11)     | NO   | PRI | NULL    | auto_increment |
| r3_dt       | datetime    | YES  | MUL | NULL    |                |
| r3_micros   | int(11)     | YES  |     | NULL    |                |
| r3_type     | varchar(16) | YES  |     | NULL    |                |
| req_status  | varchar(16) | YES  |     | NULL    |                |
| req_comment | varchar(64) | YES  |     | NULL    |                |
| asn_ip      | varchar(16) | YES  | MUL | NULL    |                |
| asn_name    | varchar(16) | YES  |     | NULL    |                |
| bsid        | varchar(12) | YES  | MUL | NULL    |                |
| cpe_ip      | varchar(16) | YES  |     | NULL    |                |
| cpe_mac     | varchar(12) | YES  | MUL | NULL    |                |
| filename    | varchar(32) | YES  |     | NULL    |                |
| linenum     | int(11)     | YES  |     | NULL    |                |
| r3_hour     | datetime    | YES  | MUL | NULL    |                |
| user_name   | varchar(64) | YES  |     | NULL    |                |
+-------------+-------------+------+-----+---------+----------------+

mysql> show indexes in rad_r3cap;
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table     | Non_unique | Key_name     | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| rad_r3cap |          0 | PRIMARY      |            1 | rec_num     | A         |    23877677 | NULL     | NULL   |      | BTREE      |         |
| rad_r3cap |          0 | r3cap_dt     |            1 | r3_dt       | A         |        NULL | NULL     | NULL   | YES  | BTREE      |         |
| rad_r3cap |          0 | r3cap_dt     |            2 | r3_micros   | A         |        NULL | NULL     | NULL   | YES  | BTREE      |         |
| rad_r3cap |          1 | r3cap_bsid   |            1 | bsid        | A         |         346 | NULL     | NULL   | YES  | BTREE      |         |
| rad_r3cap |          1 | r3cap_asnip  |            1 | asn_ip      | A         |          55 | NULL     | NULL   | YES  | BTREE      |         |
| rad_r3cap |          1 | r3cap_cpemac |            1 | cpe_mac     | A         |        4758 | NULL     | NULL   | YES  | BTREE      |         |
| rad_r3cap |          1 | r3cap_date   |            1 | r3_hour     | A         |        1548 | NULL     | NULL   | YES  | BTREE      |         |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
7 rows in set (0.00 sec)
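One way to get the whole result in a single statement (the usual "row holding the per-group maximum" pattern - a sketch, untested against this schema) is to join the aggregate back as a derived table instead of using IN:

```sql
SELECT t.cpe_mac, t.c, t.mindt, t.maxdt, r.req_status
FROM (
    SELECT cpe_mac,
           COUNT(*)     AS c,
           MIN(r3_dt)   AS mindt,
           MAX(r3_dt)   AS maxdt,
           MAX(rec_num) AS recn
    FROM rad_r3cap
    WHERE r3_dt >= SUBDATE(NOW(), INTERVAL 30 DAY)
      AND r3_type = 'Access'
    GROUP BY cpe_mac
) AS t
JOIN rad_r3cap AS r
  ON r.rec_num = t.recn        -- look up the status row via its primary key
ORDER BY t.cpe_mac;
```

This mirrors what the php script does (3500 primary-key lookups keyed on max(rec_num)), but lets MySQL do the lookups inside one query, so it could also back a view.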
Re: query optimization
Your outer query, select cpe_mac, max(r3_dt) from rad_r3cap, is doing a full table scan; you might want to check on this and use a WHERE condition on an indexed column.

On Fri, Sep 23, 2011 at 12:14 AM, supr_star suprstar1...@yahoo.com wrote:
I have a table with 24 million rows, I need to figure out how to optimize a query.
[snip - remainder of the quoted message and table definitions unchanged from the previous post]
Re: Query Optimization
On 09/01/2011 01:32 PM, Brandon Phelps wrote:
On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote:
On 9/1/2011 09:42, Brandon Phelps wrote:
On 09/01/2011 04:59 AM, Jochem van Dieten wrote:

... WHERE (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00') AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')

In that case your logic here simplifies to: WHERE open_dt >= '2011-08-30 00:00:00' AND close_dt <= '2011-08-30 12:36:53'. Now add an index over open_dt and close_dt and see what happens.

Jochem

Jochem, I can't really use your WHERE logic because I also need to retrieve results where the open_dt time is out of the range specified. For example, a very large file download might span multiple days, so given your logic, if the connection was started 2 days ago and I want to pull 1 day's worth of connections, I would miss that entry. Basically I want to SELECT all of the records that were opened OR closed during the specified time period, i.e. if any activity happened between my start and end dates, I need to see that record. Any other ideas?

I believe Jochem was on the right track but he got his dates reversed. Let's try a little ASCII art to show the situation. I will set up a query window with two markers (s) and (e). Events will be marked by |---| markers showing their durations.

a)              (s)                      (e)
b)   |---|
c)         |---|
d)                    |---|
e)       |----------------------------------|
f)                                   |---|
g)                                             |---|

To describe these situations:
a) is the window for which you want to query. (s) is the starting time and (e) is the ending time for the date range you are interested in.
b) the event starts and stops before your window exists. It won't be part of your results.
c) the event starts before the window but ends within the window - include this
d) the event starts and ends within the window - include this
e) the event starts before the window and ends after the window - include this
f) the event starts inside the window but ends beyond the window - include this.
g) the event starts and ends beyond the window - exclude this.

In order to get every event in the range of c-f, here is what you need for a WHERE clause:

WHERE start <= (ending time) AND end >= (starting time)

Try that and let us know the results.

Thanks Jochem and Shawn, however the following two queries result in the exact same EXPLAIN output: (I hope the tables don't wrap too early for you)

Old method:

SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc
FROM firewall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE (open_dt >= '2011-08-31 09:53:31' OR close_dt >= '2011-08-31 09:53:31')
  AND (open_dt <= '2011-09-01 09:53:31' OR close_dt <= '2011-09-01 09:53:31')
ORDER BY rcvd DESC LIMIT 0, 10;

New method with BTREE index on open_dt, close_dt (index name is ndx_open_close_dt):

SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc
FROM firewall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE open_dt <= '2011-09-01 09:53:31' AND close_dt >= '2011-08-31 09:53:31'
ORDER BY rcvd DESC LIMIT 0, 10;

EXPLAIN output for old method:

+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
| id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
|  1 | SIMPLE      | sc    | index  | open_dt,ndx_open_close_dt | ndx_rcvd | 4       | NULL               |   10 | Using where |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.src_port |    1 |             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.dst_port |    1 |             |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+

EXPLAIN output for new method with new index:

+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
| id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
|  1 | SIMPLE      | sc    | index  | open_dt,ndx_open_close_dt | ndx_rcvd | 4       | NULL               |   10 | Using where |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.src_port |    1 |             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.dst_port |    1 |             |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
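As a side note on why the two plans can come out identical: a B-tree range scan can only use the leading column of a composite index for the range, so the close_dt bound ends up applied as a row filter either way. A sketch (index and column names as used in the thread, untested against this schema):

```sql
-- Composite index to support the overlap test.
ALTER TABLE firewall_connections
  ADD INDEX ndx_open_close_dt (open_dt, close_dt);

-- Overlap test: a connection touches the window if it opened before
-- the window ends AND closed after the window starts.
EXPLAIN SELECT open_dt, close_dt, rcvd
FROM firewall_connections
WHERE open_dt  <= '2011-09-01 09:53:31'   -- started before window end
  AND close_dt >= '2011-08-31 09:53:31'   -- ended after window start
ORDER BY rcvd DESC
LIMIT 0, 10;
-- Only the open_dt bound can drive the range scan here; the close_dt
-- predicate is checked per row ("Using where").
```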
Re: Query Optimization
Thinking outside the query, is there any archiving that could happen to make your large tables kinder in the range scan?

Andy

On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelps bphe...@gls.com wrote:
[snip - quoted text unchanged from the previous message]
Re: Query Optimization
Thanks for the reply Andy. Unfortunately the users will be selecting varying date ranges and new data is constantly coming in, so I am not sure how I could archive/cache the necessary data in a way that would be any more efficient than simply using the database directly.

On 09/08/2011 02:16 PM, Andrew Moore wrote:
Thinking outside the query, is there any archiving that could happen to make your large tables kinder in the range scan?

Andy
[snip - remainder of quoted thread unchanged]
Re: Query Optimization
I don't think I saw any query timings in the emails (maybe I missed them). What version of MySQL are you currently using? What does the explain look like when you remove the LIMIT 10? Is your server tuned for MyISAM or InnoDB? What kind of disk setup is in use? How much memory is in your machine?

On Thu, Sep 8, 2011 at 7:27 PM, Brandon Phelps bphe...@gls.com wrote:
Thanks for the reply Andy. Unfortunately the users will be selecting varying date ranges and new data is constantly coming in, so I am not sure how I could archive/cache the necessary data that would be any more efficient than simply using the database directly.
[snip - remainder of quoted thread unchanged]
Re: Query Optimization
If you're running version 5.1+ you may want to take a look at the table partitioning options you have.

On Sep 8, 2011, at 2:27 PM, Brandon Phelps wrote:
Thanks for the reply Andy. Unfortunately the users will be selecting varying date ranges and new data is constantly coming in, so I am not sure how I could archive/cache the necessary data that would be any more efficient than simply using the database directly.
[snip - remainder of quoted thread unchanged]
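The partitioning suggestion above might be sketched like this - RANGE partitioning by month on the open_dt column, so a date-bounded scan only touches the relevant partitions. Names are assumed from the thread, and note that MySQL requires every unique key (including the primary key, if any) to include the partitioning column, so the key layout may need adjusting first:

```sql
-- Sketch: monthly RANGE partitions on open_dt (MySQL 5.1+ syntax).
ALTER TABLE firewall_connections
PARTITION BY RANGE (TO_DAYS(open_dt)) (
    PARTITION p201107 VALUES LESS THAN (TO_DAYS('2011-08-01')),
    PARTITION p201108 VALUES LESS THAN (TO_DAYS('2011-09-01')),
    PARTITION p201109 VALUES LESS THAN (TO_DAYS('2011-10-01')),
    PARTITION pmax    VALUES LESS THAN MAXVALUE
);

-- EXPLAIN PARTITIONS (5.1) shows which partitions a query touches.
EXPLAIN PARTITIONS
SELECT COUNT(*) FROM firewall_connections
WHERE open_dt >= '2011-08-31 09:53:31';
```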
Re: Query Optimization
Andy, The queries take minutes to run. MySQL is 5.1.54 and it's running on Ubuntu server 11.04. Unfortunately the machine only has 2GB of RAM but no other major daemons are running on the machine. We are running RAID 1 (mirroring) with 1TB drives. The tables in question here are all MyISAM. When running with the LIMIT 10 my EXPLAIN is: ++-+---++-+-+-++--+-+ | id | select_type | table | type | possible_keys | key | key_len | ref| rows | Extra | ++-+---++-+-+-++--+-+ | 1 | SIMPLE | sc| range | ndx_open_close_rcvd | ndx_open_close_rcvd | 8 | NULL | 32393316 | Using where; Using filesort | | 1 | SIMPLE | spm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.src_port |1 | | | 1 | SIMPLE | dpm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.dst_port |1 | | ++-+---++-+-+-++--+-+ When I remove the LIMIT 10 I get: +-+---++-+-+-++--+-+ | id | select_type | table | type | possible_keys | key | key_len | ref| rows | Extra | ++-+---++-+-+-++--+-+ | 1 | SIMPLE | sc| ALL| ndx_open_close_rcvd | NULL| NULL| NULL | 32393330 | Using where; Using filesort | | 1 | SIMPLE | spm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.src_port |1 | | | 1 | SIMPLE | dpm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.dst_port |1 | | ++-+---++-+-+-++--+-+ Thanks for all your help thus far. On 09/08/2011 02:38 PM, Andrew Moore wrote: I don't think I saw any query timings in the emails (maybe I missed them). What version of MySQL are you currently using? What does the explain look like when your remove the limit 10? Is your server tuned for MyISAM or InnoDB? What kind of disk setup is in use? How much memory is in your machine? On Thu, Sep 8, 2011 at 7:27 PM, Brandon Phelpsbphe...@gls.com wrote: Thanks for the reply Andy. Unfortunately the users will be selecting varying date ranges and new data is constantly coming in, so I am not sure how I could archive/cache the necessary data that would be any more efficient than simply using the database directly. 
On 09/08/2011 02:16 PM, Andrew Moore wrote: Thinking outside the query, is there any archiving that could happen to make your large tables kinder in the range scan? Andy On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelps bphe...@gls.com wrote: On 09/01/2011 01:32 PM, Brandon Phelps wrote: On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote: On 9/1/2011 09:42, Brandon Phelps wrote: On 09/01/2011 04:59 AM, Jochem van Dieten wrote: ... WHERE (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00') AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53') In that case your logic here simplifies to: WHERE open_dt >= '2011-08-30 00:00:00' AND close_dt <= '2011-08-30 12:36:53' Now add an index over open_dt and close_dt and see what happens. Jochem Jochem, I can't really use your WHERE logic because I also need to retrieve results where the open_dt time is out of the range specified. For example, a very large file download might span multiple days, so given your logic, if the connection was started 2 days ago and I want to pull 1 day's worth of connections, I would miss that entry. Basically I want to SELECT all of the records that were opened OR closed during the specified time period, ie. if any activity happened between my start and end dates, I need to see that record. Any other ideas? I believe Jochem was on the right track but he got his dates reversed. Let's try a little ASCII art to show the situation. I will setup a query window with two markers (s) and (e). Events will be marked by |---| markers showing their durations.

a)        (s)          (e)
b) |---|
c)      |------|
d)            |---|
e)    |---------------------|
f)                   |------|
g)                          |---|

To describe these situations: a) is the window for which you want to query; (s) is the starting
Re: Query Optimization
Correct me if I'm wrong. You're wanting to get all records that have an open_date or a close_date between two times. If that's correct, you might be able to get an index_merge by doing a query like: WHERE ((starting time) <= open_dt AND open_dt <= (ending time)) OR ((starting time) <= close_dt AND close_dt <= (ending time)) and creating two indexes (one on 'open_dt' and the other on 'close_dt') http://dev.mysql.com/doc/refman/5.1/en/index-merge-optimization.html Regards, Derek On Sep 8, 2011, at 2:50 PM, Brandon Phelps wrote: Andy, The queries take minutes to run. MySQL is 5.1.54 and it's running on Ubuntu server 11.04. Unfortunately the machine only has 2GB of RAM but no other major daemons are running on the machine. We are running RAID 1 (mirroring) with 1TB drives. The tables in question here are all MyISAM. When running with the LIMIT 10 my EXPLAIN is:

| id | select_type | table | type   | possible_keys       | key                 | key_len | ref                | rows     | Extra                       |
| 1  | SIMPLE      | sc    | range  | ndx_open_close_rcvd | ndx_open_close_rcvd | 8       | NULL               | 32393316 | Using where; Using filesort |
| 1  | SIMPLE      | spm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.src_port | 1        |                             |
| 1  | SIMPLE      | dpm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.dst_port | 1        |                             |

When I remove the LIMIT 10 I get:

| id | select_type | table | type   | possible_keys       | key     | key_len | ref                | rows     | Extra                       |
| 1  | SIMPLE      | sc    | ALL    | ndx_open_close_rcvd | NULL    | NULL    | NULL               | 32393330 | Using where; Using filesort |
| 1  | SIMPLE      | spm   | eq_ref | PRIMARY             | PRIMARY | 2       | syslog.sc.src_port | 1        |                             |
| 1  | SIMPLE      | dpm   | eq_ref | PRIMARY             | PRIMARY | 2       | syslog.sc.dst_port | 1        |                             |

Thanks for all your help thus far. On 09/08/2011 02:38 PM, Andrew Moore wrote: I don't think I saw any query timings in the emails (maybe I missed them). What version of MySQL are you currently using? What does the explain look like when you remove the limit 10?
Is your server tuned for MyISAM or InnoDB? What kind of disk setup is in use? How much memory is in your machine? On Thu, Sep 8, 2011 at 7:27 PM, Brandon Phelps bphe...@gls.com wrote: Thanks for the reply Andy. Unfortunately the users will be selecting varying date ranges and new data is constantly coming in, so I am not sure how I could archive/cache the necessary data that would be any more efficient than simply using the database directly. On 09/08/2011 02:16 PM, Andrew Moore wrote: Thinking outside the query, is there any archiving that could happen to make your large tables kinder in the range scan? Andy On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelps bphe...@gls.com wrote: On 09/01/2011 01:32 PM, Brandon Phelps wrote: On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote: On 9/1/2011 09:42, Brandon Phelps wrote: On 09/01/2011 04:59 AM, Jochem van Dieten wrote: ... WHERE (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00') AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53') In that case your logic here simplifies to: WHERE open_dt >= '2011-08-30 00:00:00' AND close_dt <= '2011-08-30 12:36:53' Now add an index over open_dt and close_dt and see what happens. Jochem Jochem, I can't really use your WHERE logic because I also need to retrieve results where the open_dt time is out of the range specified. For example, a very large file download might span multiple days, so given your logic, if the connection was started 2 days ago and I want to pull 1 day's worth of connections, I would miss that entry. Basically I want to SELECT all of the records that were opened
Re: Query Optimization
Mihail, I have considered this but have not yet determined how best to go about partitioning the table. I don't think partitioning by dst_address or src_address would help because most of the queries do not filter on IP address (except very specific queries where the end-user is searching the table for history on a particular employee). I could potentially partition the table based on the day of week the connection was opened on, which may improve performance for a while since this would take me from a single 32 million record table down to roughly 4.5 million records per partition (32 / 7), however we are looking to eventually store up to 2 months worth of data in the table, and so far that 32 million records is only for 1 month, so I estimate another 32 million-ish before the month is out, bringing me to roughly 70 million records total (it would be nice if I could store even more than 2 months, but given my current performance dilemma I don't expect that to happen). Also this does not take into account that the end-user will often be pulling data for multiple days at a time, meaning that multiple partitions in this scheme will need to be accessed anyway. The only other logical partitioning scheme I can think of would be to partition based on dst_port (the port the log relates to) but the majority of records are all to port 80 (standard web traffic) so I don't think this would be all that helpful. I have never messed with partitioning so it is totally possible that I am not thinking of something, so if you have any ideas on a decent partitioning scheme based on my criteria and queries below, please let me know. Thanks, Brandon On 09/08/2011 02:47 PM, Mihail Manolov wrote: If you're running version 5.1+ you may wanna take a look at table partitioning options you may have. On Sep 8, 2011, at 2:27 PM, Brandon Phelps wrote: Thanks for the reply Andy.
Unfortunately the users will be selecting varying date ranges and new data is constantly coming in, so I am not sure how I could archive/cache the necessary data that would be any more efficient than simply using the database directly. On 09/08/2011 02:16 PM, Andrew Moore wrote: Thinking outside the query, is there any archiving that could happen to make your large tables kinder in the range scan? Andy On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelps bphe...@gls.com wrote: On 09/01/2011 01:32 PM, Brandon Phelps wrote: On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote: On 9/1/2011 09:42, Brandon Phelps wrote: On 09/01/2011 04:59 AM, Jochem van Dieten wrote: ... WHERE (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00') AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53') In that case your logic here simplifies to: WHERE open_dt >= '2011-08-30 00:00:00' AND close_dt <= '2011-08-30 12:36:53' Now add an index over open_dt and close_dt and see what happens. Jochem Jochem, I can't really use your WHERE logic because I also need to retrieve results where the open_dt time is out of the range specified. For example, a very large file download might span multiple days, so given your logic, if the connection was started 2 days ago and I want to pull 1 day's worth of connections, I would miss that entry. Basically I want to SELECT all of the records that were opened OR closed during the specified time period, ie. if any activity happened between my start and end dates, I need to see that record. Any other ideas? I believe Jochem was on the right track but he got his dates reversed. Let's try a little ASCII art to show the situation. I will setup a query window with two markers (s) and (e). Events will be marked by |---| markers showing their durations.
a)        (s)          (e)
b) |---|
c)      |------|
d)            |---|
e)    |---------------------|
f)                   |------|
g)                          |---|

To describe these situations: a) is the window for which you want to query; (s) is the starting time and (e) is the ending time for the date range you are interested in. b) the event starts and stops before your window exists. It won't be part of your results. c) the event starts before the window but ends within the window - include this d) the event starts and ends within the window - include this e) the event starts before the window and ends after the window - include this f) the event starts inside the window but ends beyond the window - include this. g) the event starts and ends beyond the window - exclude this. In order to get every event in the range of c-f, here is what you need for a WHERE clause: WHERE start <= (ending time) AND end >= (starting time) Try that and let us know the results. Thanks Jochem and Shawn, however the following two queries result in the exact same EXPLAIN output: (I hope the tables don't wrap too early for you) Old method: SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS
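Shawn's interval-overlap rule can be sanity-checked outside SQL. The sketch below is illustrative only (the event tuples and integer "times" are invented to mirror cases b through g of the ASCII diagram), but it shows that the single predicate `start <= (ending time) AND end >= (starting time)` selects exactly the events that touch the window:

```python
def overlaps(start, end, s, e):
    """True when event [start, end] intersects the query window [s, e]."""
    return start <= e and end >= s

# Query window and events as (name, start, end); integers stand in for datetimes.
s, e = 10, 20
events = [
    ("b", 1, 5),    # ends before the window       -> exclude
    ("c", 8, 12),   # straddles the window start   -> include
    ("d", 12, 15),  # entirely inside the window   -> include
    ("e", 5, 25),   # spans the whole window       -> include
    ("f", 18, 25),  # straddles the window end     -> include
    ("g", 22, 30),  # starts after the window      -> exclude
]

matched = [name for name, start, end in events if overlaps(start, end, s, e)]
print(matched)  # ['c', 'd', 'e', 'f']
```

The two comparisons in `overlaps` correspond directly to `open_dt <= (ending time) AND close_dt >= (starting time)` in the thread's schema.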
Re: Query Optimization
Thanks for the idea Derek, however given the following query my EXPLAIN output is identical: SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM sonicwall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port WHERE ('2011-09-07 13:18:58' <= open_dt AND open_dt <= '2011-09-08 13:18:58') OR ('2011-09-07 13:18:58' <= close_dt AND close_dt <= '2011-09-08 13:18:58');

| id | select_type | table | type   | possible_keys | key     | key_len | ref                | rows     | Extra       |
| 1  | SIMPLE      | sc    | ALL    | NULL          | NULL    | NULL    | NULL               | 32393330 | Using where |
| 1  | SIMPLE      | spm   | eq_ref | PRIMARY       | PRIMARY | 2       | syslog.sc.src_port | 1        |             |
| 1  | SIMPLE      | dpm   | eq_ref | PRIMARY       | PRIMARY | 2       | syslog.sc.dst_port | 1        |             |

I did create indexes on open_dt and close_dt (2 separate indexes). On 09/08/2011 02:55 PM, Derek Downey wrote: Correct me if I'm wrong. You're wanting to get all records that have an open_date or a close_date between two times. If that's correct, you might be able to get an index_merge by doing a query like: WHERE ((starting time) <= open_dt AND open_dt <= (ending time)) OR ((starting time) <= close_dt AND close_dt <= (ending time)) and creating two indexes (one on 'open_dt' and the other on 'close_dt') http://dev.mysql.com/doc/refman/5.1/en/index-merge-optimization.html Regards, Derek On Sep 8, 2011, at 2:50 PM, Brandon Phelps wrote: Andy, The queries take minutes to run. MySQL is 5.1.54 and it's running on Ubuntu server 11.04. Unfortunately the machine only has 2GB of RAM but no other major daemons are running on the machine. We are running RAID 1 (mirroring) with 1TB drives. The tables in question here are all MyISAM.
When running with the LIMIT 10 my EXPLAIN is:

| id | select_type | table | type   | possible_keys       | key                 | key_len | ref                | rows     | Extra                       |
| 1  | SIMPLE      | sc    | range  | ndx_open_close_rcvd | ndx_open_close_rcvd | 8       | NULL               | 32393316 | Using where; Using filesort |
| 1  | SIMPLE      | spm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.src_port | 1        |                             |
| 1  | SIMPLE      | dpm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.dst_port | 1        |                             |

When I remove the LIMIT 10 I get:

| id | select_type | table | type   | possible_keys       | key     | key_len | ref                | rows     | Extra                       |
| 1  | SIMPLE      | sc    | ALL    | ndx_open_close_rcvd | NULL    | NULL    | NULL               | 32393330 | Using where; Using filesort |
| 1  | SIMPLE      | spm   | eq_ref | PRIMARY             | PRIMARY | 2       | syslog.sc.src_port | 1        |                             |
| 1  | SIMPLE      | dpm   | eq_ref | PRIMARY             | PRIMARY | 2       | syslog.sc.dst_port | 1        |                             |

Thanks for all your help thus far. On 09/08/2011 02:38 PM, Andrew Moore wrote: I don't think I saw any query timings in the emails (maybe I missed them). What version of MySQL are you currently using? What does the explain look like when you remove the limit 10? Is your server tuned for MyISAM or InnoDB? What kind of disk setup is in use? How much memory is
Re: Query Optimization
Partitioning isn't a bad idea for this, however I'm still thinking about your dataset size and possible hardware limitations. It's not likely going to fit into relevant buffers/memory, so you're going to be on disk more than you want. You're probably creating temporary tables like crazy and I would bet that there are a good load of them heading to disk too. With your IO performance limited to a small amount of disks as you describe, you're not going to be able to get much more from these queries. Although it's a dedicated DB server, are there other databases being accessed on the server? When looking at the scope of your data, are you capturing more than you need? How often and how far back are the users querying? How many users concurrently performing queries on the 32m record table? On Thu, Sep 8, 2011 at 8:04 PM, Brandon Phelps bphe...@gls.com wrote: Mihail, I have considered this but have not yet determined how best to go about partitioning the table. I don't think partitioning by dst_address or src_address would help because most of the queries do not filter on IP address (except very specific queries where the end-user is searching the table for history on a particular employee). I could potentially partition the table based on the day of week the connection was opened on, which may improve performance for a while since this would take me from a single 32 million record table down to roughly 4.5 million records per partition (32 / 7), however we are looking to eventually store up to 2 months worth of data in the table, and so far that 32 million records is only for 1 month, so I estimate another 32 million-ish before the month is out, bringing me to roughly 70 million records total (it would be nice if I could store even more than 2 months, but given my current performance dilemma I don't expect that to happen).
Also this does not take into account that the end-user will often be pulling data for multiple days at a time, meaning that multiple partitions in this scheme will need to be accessed anyway. The only other logical partitioning scheme I can think of would be to partition based on dst_port (the port the log relates to) but the majority of records are all to port 80 (standard web traffic) so I don't think this would be all that helpful. I have never messed with partitioning so it is totally possible that I am not thinking of something, so if you have any ideas on a decent partitioning scheme based on my criteria and queries below, please let me know. Thanks, Brandon On 09/08/2011 02:47 PM, Mihail Manolov wrote: If you're running version 5.1+ you may wanna take a look at table partitioning options you may have. On Sep 8, 2011, at 2:27 PM, Brandon Phelps wrote: Thanks for the reply Andy. Unfortunately the users will be selecting varying date ranges and new data is constantly coming in, so I am not sure how I could archive/cache the necessary data that would be any more efficient than simply using the database directly. On 09/08/2011 02:16 PM, Andrew Moore wrote: Thinking outside the query, is there any archiving that could happen to make your large tables kinder in the range scan? Andy On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelps bphe...@gls.com wrote: On 09/01/2011 01:32 PM, Brandon Phelps wrote: On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote: On 9/1/2011 09:42, Brandon Phelps wrote: On 09/01/2011 04:59 AM, Jochem van Dieten wrote: ... WHERE (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00') AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53') In that case your logic here simplifies to: WHERE open_dt >= '2011-08-30 00:00:00' AND close_dt <= '2011-08-30 12:36:53' Now add an index over open_dt and close_dt and see what happens.
Jochem Jochem, I can't really use your WHERE logic because I also need to retrieve results where the open_dt time is out of the range specified. For example, a very large file download might span multiple days, so given your logic, if the connection was started 2 days ago and I want to pull 1 day's worth of connections, I would miss that entry. Basically I want to SELECT all of the records that were opened OR closed during the specified time period, ie. if any activity happened between my start and end dates, I need to see that record. Any other ideas? I believe Jochem was on the right track but he got his dates reversed. Let's try a little ASCII art to show the situation. I will setup a query window with two markers (s) and (e). Events will be marked by |---| markers showing their durations.

a)        (s)          (e)
b) |---|
c)      |------|
d)            |---|
e)    |---------------------|
f)                   |------|
g)                          |---|

To describe these situations: a) is the window for which you want to query; (s) is the starting time and (e) is the ending time for the date range you are interested in. b) the event starts and stops before your window exists. It won't be
Re: Query Optimization
How about: SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM sonicwall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58' UNION SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM sonicwall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58' On Sep 8, 2011, at 3:25 PM, Brandon Phelps wrote: Thanks for the idea Derek, however given the following query my EXPLAIN output is identical: SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM sonicwall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port WHERE ('2011-09-07 13:18:58' <= open_dt AND open_dt <= '2011-09-08 13:18:58') OR ('2011-09-07 13:18:58' <= close_dt AND close_dt <= '2011-09-08 13:18:58');

| id | select_type | table | type   | possible_keys | key     | key_len | ref                | rows     | Extra       |
| 1  | SIMPLE      | sc    | ALL    | NULL          | NULL    | NULL    | NULL               | 32393330 | Using where |
| 1  | SIMPLE      | spm   | eq_ref | PRIMARY       | PRIMARY | 2       | syslog.sc.src_port | 1        |             |
| 1  | SIMPLE      | dpm   | eq_ref | PRIMARY       | PRIMARY | 2       | syslog.sc.dst_port | 1        |             |

I did create indexes on open_dt and close_dt (2 separate
indexes). On 09/08/2011 02:55 PM, Derek Downey wrote: Correct me if I'm wrong. You're wanting to get all records that have an open_date or a close_date between two times. If that's correct, you might be able to get an index_merge by doing a query like: WHERE ((starting time) <= open_dt AND open_dt <= (ending time)) OR ((starting time) <= close_dt AND close_dt <= (ending time)) and creating two indexes (one on 'open_dt' and the other on 'close_dt') http://dev.mysql.com/doc/refman/5.1/en/index-merge-optimization.html Regards, Derek On Sep 8, 2011, at 2:50 PM, Brandon Phelps wrote: Andy, The queries take minutes to run. MySQL is 5.1.54 and it's running on Ubuntu server 11.04. Unfortunately the machine only has 2GB of RAM but no other major daemons are running on the machine. We are running RAID 1 (mirroring) with 1TB drives. The tables in question here are all MyISAM. When running with the LIMIT 10 my EXPLAIN is:

| id | select_type | table | type   | possible_keys       | key                 | key_len | ref                | rows     | Extra                       |
| 1  | SIMPLE      | sc    | range  | ndx_open_close_rcvd | ndx_open_close_rcvd | 8       | NULL               | 32393316 | Using where; Using filesort |
| 1  | SIMPLE      | spm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.src_port | 1        |                             |
| 1  | SIMPLE      | dpm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.dst_port | 1        |                             |

When I remove the LIMIT 10 I get:
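The point of Mihail's rewrite in the next message is that the OR across two columns forces a full scan, while each branch of a UNION can use its own single-column index, and the two result sets are the same rows. A toy sqlite3 stand-in (not MySQL; table trimmed to three columns, rows invented) checks that equivalence:

```python
import sqlite3

# In-memory stand-in for sonicwall_connections with ISO-8601 text datetimes.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sonicwall_connections (id INTEGER, open_dt TEXT, close_dt TEXT)")
con.executemany("INSERT INTO sonicwall_connections VALUES (?, ?, ?)", [
    (1, "2011-09-06 08:00:00", "2011-09-06 09:00:00"),  # neither dt in range
    (2, "2011-09-07 14:00:00", "2011-09-07 15:00:00"),  # both dts in range
    (3, "2011-09-06 14:00:00", "2011-09-07 15:00:00"),  # only close_dt in range
    (4, "2011-09-08 10:00:00", "2011-09-09 15:00:00"),  # only open_dt in range
])
lo, hi = "2011-09-07 13:18:58", "2011-09-08 13:18:58"

# One query with OR across both columns...
or_rows = con.execute(
    "SELECT id FROM sonicwall_connections "
    "WHERE open_dt BETWEEN ? AND ? OR close_dt BETWEEN ? AND ? ORDER BY id",
    (lo, hi, lo, hi)).fetchall()

# ...versus a UNION of two single-column range queries.
union_rows = con.execute(
    "SELECT id FROM sonicwall_connections WHERE open_dt BETWEEN ? AND ? "
    "UNION "
    "SELECT id FROM sonicwall_connections WHERE close_dt BETWEEN ? AND ? "
    "ORDER BY id",
    (lo, hi, lo, hi)).fetchall()

print(or_rows == union_rows, or_rows)  # True [(2,), (3,), (4,)]
```

Row 2 satisfies both branches but still appears once, since UNION deduplicates; the semantics match the OR form exactly.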
Re: Query Optimization
Andrew, Generally there is only 1 user performing the complicated SELECT query at a time, however the background process that fills the table is constantly doing a fast SELECT (0.3 seconds) and a subsequent UPDATE. Basically whenever a connection is closed on the firewall, the bg process SELECTS from the table the last record that was opened (and hasn't already been closed) for a given source IP, source port, dest ip, and protocol. So for example whenever the firewall logs a CONNECTION CLOSED message, the bg process does: select id from sonicwall_connections where src_address = w.x.y.z and src_port = x and dst_address = w.x.y.z and protocol = x ORDER BY open_dt LIMIT 0, 1 then it UPDATES the close_dt column for the record with the selected ID. These select/update statements being run by the background process generally take 0.000x seconds each. The only slowdown we see is with the complicated SELECT I have below, and this is only executed once in a while throughout the day. For example, if an IT manager sees a huge bandwidth spike on the network, he may access the web gui to determine what the source IP is so he can determine who is downloading a large file, etc. I think what I will try to do is create 60 partitions for the table in question based on month and day. This way each day will go into its own partition and if someone runs the query for a date range such as 01/01/2011 - 01/07/2011 it will only need to access 7 partitions instead of the entire table. My only question with this is how would I go about creating the table with 60 partitions in such a way that I won't need to constantly add/remove new/old partitions every day? Is there any partitioning function I can use that would not require me to update the partitions schema every day? I already plan to have a cron run each night to purge records older than 60 days from the database.
On 09/08/2011 03:26 PM, Andrew Moore wrote: Partitioning isn't a bad idea for this however I'm still thinking about your dataset size and possible hardware limitations. It's not likely going to fit into relevant buffers/memory so you're going to be on disk more then you want. You're probably creating temporary tables like crazy and I would bet that there are a good load of them heading to disk too. With your IO performance limited to a small amount of disks as you describe, you're not going to be able to get much more from these queries. Although a dedicated DB server are there other databases been accessed on the server? When looking at the scope of your data, are you capturing more then you need? How often and how far back are the users querying? How many users concurrently performing queries on the 32m record table? On Thu, Sep 8, 2011 at 8:04 PM, Brandon Phelpsbphe...@gls.com wrote: Mihail, I have considered this but have not yet determined how best to go about partitioning the table. I don't think partitioning by dst_address or src_address would help because most of the queries do not filter on IP address (except very specific queries where the end-user is searching the table for history on a particular employee). I could potentially partition the table based on the day of week the connection was opened on which may improve performance for a while since this would take me from a single 32million record table down to roughly 4.5 million records per partition (32 / 7) however we are looking to eventually store up to 2 months worth of data in the table, and so far that 32 million records is only for 1 month, so I estimate another 32 million-ish before the month is out, bringing me to roughly 70 million records total (it would be nice if I could store even more than 2 months, but given my currently performance dilemma I don't expect that to happen). 
Also this does not take into account that the end-user will often be pulling data for multiple days at a time, meaning that multiple partitions in this scheme will need to be accessed anyway. The only other logical partitioning scheme I can think of would be to partition based on dst_port (the port the log relates to) but the majority of records are all to port 80 (standard web traffic) so I don't think this would be all that helpful. I have never messed with partitioning so it is totally possible that I am not thinking of something, so if you have any ideas on a decent partitioning scheme based on my criteria and queries below, please let me know. Thanks, Brandon On 09/08/2011 02:47 PM, Mihail Manolov wrote: If you're running version 5.1+ you may wanna take a look at table partitioning options you may have. On Sep 8, 2011, at 2:27 PM, Brandon Phelps wrote: Thanks for the reply Andy. Unfortunately the users will be selecting varying date ranges and new data is constantly coming in, so I am not sure how I could archive/cache the necessary data that would be any more efficient than simply using the database directly. On 09/08/2011 02:16 PM, Andrew Moore wrote:
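For the daily-partition idea discussed above, here is a hedged sketch of what MySQL 5.1 RANGE partitioning on the open date could look like (the table is trimmed to a few columns and the partition names and dates are invented). RANGE partitions on TO_DAYS() cannot rotate themselves in 5.1, so the nightly cron that purges old rows would also DROP the oldest partition and split off tomorrow's from the catch-all:

```sql
-- Sketch only: daily RANGE partitions on open_dt for a trimmed-down table.
CREATE TABLE sonicwall_connections (
    open_dt     DATETIME,
    close_dt    DATETIME,
    src_address INT UNSIGNED,
    dst_port    SMALLINT UNSIGNED
)
PARTITION BY RANGE (TO_DAYS(open_dt)) (
    PARTITION p20110907 VALUES LESS THAN (TO_DAYS('2011-09-08')),
    PARTITION p20110908 VALUES LESS THAN (TO_DAYS('2011-09-09')),
    PARTITION pmax      VALUES LESS THAN MAXVALUE
);

-- Nightly rotation, instead of a fixed 60-partition schema:
-- drop the oldest day, then carve tomorrow's partition out of pmax.
ALTER TABLE sonicwall_connections DROP PARTITION p20110907;
ALTER TABLE sonicwall_connections
    REORGANIZE PARTITION pmax INTO (
        PARTITION p20110909 VALUES LESS THAN (TO_DAYS('2011-09-10')),
        PARTITION pmax      VALUES LESS THAN MAXVALUE
    );
```

DROP PARTITION also deletes the rows in it, which would replace the planned row-by-row 60-day purge; note that pruning here only helps queries that constrain open_dt, not ones that filter solely on close_dt.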
Re: Query Optimization
Mihail, Thanks so much! I modified your example to include the proper ORDER BY and LIMIT clauses and this, so far, is running super fast (0.0007 seconds). Question, if a record's open_dt is between the range AND the close_dt is between the range as well, will the UNION output the record twice? If so, is there any way to prevent that? (SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM sonicwall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58') UNION (SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM sonicwall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58') ORDER BY rcvd DESC LIMIT 10;

| id   | select_type  | table    | type   | possible_keys                              | key          | key_len | ref                | rows | Extra          |
| 1    | PRIMARY      | sc       | range  | open_dt,ndx_open_close_rcvd,ndx_open_close | open_dt      | 8       | NULL               | 1057 | Using where    |
| 1    | PRIMARY      | spm      | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.src_port | 1    |                |
| 1    | PRIMARY      | dpm      | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.dst_port | 1    |                |
| 2    | UNION        | sc       | range  | ndx_close_dt                               | ndx_close_dt | 8       | NULL               | 1131 | Using where    |
| 2    | UNION        | spm      | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.src_port | 1    |                |
| 2    | UNION        | dpm      | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.dst_port | 1    |                |
| NULL | UNION RESULT | union1,2 | ALL    | NULL                                       | NULL         | NULL    | NULL               | NULL | Using filesort |

On 09/08/2011 03:45 PM, Mihail Manolov wrote: How about: SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM sonicwall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58' UNION SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM sonicwall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58' On Sep 8, 2011, at 3:25 PM, Brandon Phelps wrote: Thanks for the idea Derek, however given the following query my EXPLAIN output is identical: SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM sonicwall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings
Re: Query Optimization
From the manual: The default behavior for UNION is that duplicate rows are removed from the result. On Sep 8, 2011, at 4:50 PM, Brandon Phelps wrote: Mihail, Thanks so much! I modified your example to include the proper ORDER BY and LIMIT clauses and this, so far, is running super fast (0.0007 seconds). Question, if a record's open_dt is between the range AND the close_dt is between the range as well, will the UNION output the record twice? If so, is there any way to prevent that? (SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM sonicwall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58') UNION (SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM sonicwall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58') ORDER BY rcvd DESC LIMIT 10;

| id   | select_type  | table    | type   | possible_keys                              | key          | key_len | ref                | rows | Extra          |
| 1    | PRIMARY      | sc       | range  | open_dt,ndx_open_close_rcvd,ndx_open_close | open_dt      | 8       | NULL               | 1057 | Using where    |
| 1    | PRIMARY      | spm      | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.src_port | 1    |                |
| 1    | PRIMARY      | dpm      | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.dst_port | 1    |                |
| 2    | UNION        | sc       | range  | ndx_close_dt                               | ndx_close_dt | 8       | NULL               | 1131 | Using where    |
| 2    | UNION        | spm      | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.src_port | 1    |                |
| 2    | UNION        | dpm      | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.dst_port | 1    |                |
| NULL | UNION RESULT | union1,2 | ALL    | NULL                                       | NULL         | NULL    | NULL               | NULL | Using filesort |

On 09/08/2011 03:45 PM, Mihail Manolov wrote: How about: SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM sonicwall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58' UNION SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM sonicwall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58' On Sep 8, 2011, at 3:25 PM, Brandon Phelps wrote: Thanks for the idea Derek, however given the following query my EXPLAIN output is identical: SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM
Re: Query Optimization
Ah I see. Well thanks for your assistance!

-Brandon

On 09/08/2011 05:21 PM, Mihail Manolov wrote: From the manual: The default behavior for UNION is that duplicate rows are removed from the result. snip
Re: Query Optimization
On Aug 30, 2011 6:46 PM, Brandon Phelps wrote: SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA( sc.src_address ) AS src_address, sc.src_port, INET_NTOA( sc.dst_address ) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM firewall_connections AS sc

If this is a firewall connection log I presume open_dt is the time a connection was opened and is always going to be less than close_dt. Right?

WHERE (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00') AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')

In that case your logic here simplifies to:

WHERE open_dt >= '2011-08-30 00:00:00' AND close_dt <= '2011-08-30 12:36:53'

Now add an index over open_dt and close_dt and see what happens.

Jochem
Re: Query Optimization
On 09/01/2011 04:59 AM, Jochem van Dieten wrote: SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA( sc.src_address ) AS src_address, sc.src_port, INET_NTOA( sc.dst_address ) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM firewall_connections AS sc If this is a firewall connection log I presume open_dt is the time a connection was opened and is always going to be less than close_dt. Right? WHERE (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00') AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53') In that case your logic here simplifies to: WHERE open_dt >= '2011-08-30 00:00:00' AND close_dt <= '2011-08-30 12:36:53' Now add an index over open_dt and close_dt and see what happens. Jochem

Jochem, I can't really use your WHERE logic because I also need to retrieve results where the open_dt time is out of the range specified. For example, a very large file download might span multiple days, so given your logic, if the connection was started 2 days ago and I want to pull 1 day's worth of connections, I would miss that entry. Basically I want to SELECT all of the records that were opened OR closed during the specified time period, i.e. if any activity happened between my start and end dates, I need to see that record.

Any other ideas?

Thanks again, Brandon

-- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql?unsub=arch...@jab.org
Re: Query Optimization
On 9/1/2011 09:42, Brandon Phelps wrote: On 09/01/2011 04:59 AM, Jochem van Dieten wrote: ... WHERE (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00') AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53') In that case your logic here simplifies to: WHERE open_dt >= '2011-08-30 00:00:00' AND close_dt <= '2011-08-30 12:36:53' Now add an index over open_dt and close_dt and see what happens. Jochem

Jochem, I can't really use your WHERE logic because I also need to retrieve results where the open_dt time is out of the range specified. For example, a very large file download might span multiple days so given your logic if the connection was started 2 days ago and I want to pull 1 days worth of connections, I would miss that entry. Basically I want to SELECT all of the records that were opened OR closed during the specified time period, ie. if any activity happened between my start and end dates, I need to see that record. Any other ideas?

I believe Jochem was on the right track but he got his dates reversed. Let's try a little ASCII art to show the situation. I will set up a query window with two markers (s) and (e). Events will be marked by |---| markers showing their durations.

a)           (s)                  (e)
b)   |---|
c)         |-----|
d)                  |-----|
e)         |--------------------------|
f)                             |-----|
g)                                       |---|

To describe these situations:

a) is the window for which you want to query. (s) is the starting time and (e) is the ending time for the date range you are interested in.
b) the event starts and stops before your window exists. It won't be part of your results.
c) the event starts before the window but ends within the window - include this
d) the event starts and ends within the window - include this
e) the event starts before the window and ends after the window - include this
f) the event starts inside the window but ends beyond the window - include this.
g) the event starts and ends beyond the window - exclude this.

In order to get every event in the range of c-f, here is what you need for a WHERE clause:

WHERE start <= (ending time) AND end >= (starting time)

Try that and let us know the results.

--
Shawn Green
MySQL Principal Technical Support Engineer
Oracle USA, Inc. - Hardware and Software, Engineered to Work Together.
Office: Blountville, TN
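Shawn's overlap rule can be sanity-checked outside MySQL. A minimal sketch in Python (the events are hypothetical sample rows, one per case b-g from the diagram, not data from this thread), applying `start <= window_end AND end >= window_start`:

```python
from datetime import datetime

# Query window: (s) and (e) markers.
s = datetime(2011, 8, 30, 0, 0, 0)
e = datetime(2011, 8, 30, 12, 36, 53)

# One hypothetical event per case b..g as (open_dt, close_dt).
events = {
    "b": (datetime(2011, 8, 29, 1, 0), datetime(2011, 8, 29, 2, 0)),    # before window
    "c": (datetime(2011, 8, 29, 23, 0), datetime(2011, 8, 30, 1, 0)),   # ends inside
    "d": (datetime(2011, 8, 30, 2, 0), datetime(2011, 8, 30, 3, 0)),    # fully inside
    "e": (datetime(2011, 8, 29, 23, 0), datetime(2011, 8, 30, 13, 0)),  # spans window
    "f": (datetime(2011, 8, 30, 12, 0), datetime(2011, 8, 30, 14, 0)),  # starts inside
    "g": (datetime(2011, 8, 30, 13, 0), datetime(2011, 8, 30, 14, 0)),  # after window
}

# The overlap predicate: start <= (ending time) AND end >= (starting time).
included = sorted(k for k, (open_dt, close_dt) in events.items()
                  if open_dt <= e and close_dt >= s)
print(included)  # -> ['c', 'd', 'e', 'f']
```

Cases b and g fall out, c through f stay in, which is exactly the set the diagram says a "any activity in the window" query should return.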
Re: Query Optimization
On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote: On 9/1/2011 09:42, Brandon Phelps wrote: snip I believe Jochem was on the right track but he got his dates reversed. snip In order to get every event in the range of c-f, here is what you need for a WHERE clause: WHERE start <= (ending time) AND end >= (starting time) Try that and let us know the results.

Thanks Jochem and Shawn, however the following two queries result in the exact same EXPLAIN output: (I hope the tables don't wrap too early for you)

Old method:

SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM firewall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port WHERE (open_dt >= '2011-08-31 09:53:31' OR close_dt >= '2011-08-31 09:53:31') AND (open_dt <= '2011-09-01 09:53:31' OR close_dt <= '2011-09-01 09:53:31') ORDER BY rcvd DESC LIMIT 0, 10;

New method with BTREE index on open_dt, close_dt (index name is ndx_open_close_dt):

SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA(sc.src_address) AS src_address, sc.src_port, INET_NTOA(sc.dst_address) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM firewall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port WHERE open_dt <= '2011-09-01 09:53:31' AND close_dt >= '2011-08-31 09:53:31' ORDER BY rcvd DESC LIMIT 0, 10;

EXPLAIN output for old method:

+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
| id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
|  1 | SIMPLE      | sc    | index  | open_dt,ndx_open_close_dt | ndx_rcvd | 4       | NULL               |   10 | Using where |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.src_port |    1 |             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.dst_port |    1 |             |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+

EXPLAIN output for new method with new index:

+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
| id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
Query Optimization
Hello, I have the following query I'd like to optimize a bit:

SELECT sc.open_dt, sc.close_dt, sc.protocol, INET_NTOA( sc.src_address ) AS src_address, sc.src_port, INET_NTOA( sc.dst_address ) AS dst_address, sc.dst_port, sc.sent, sc.rcvd, spm.desc AS src_port_desc, dpm.desc AS dst_port_desc FROM firewall_connections AS sc LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port WHERE (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00') AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53') ORDER BY rcvd DESC LIMIT 0, 10

Currently I have an index on the rcvd column which seems to be working based on the output of EXPLAIN:

id  select_type  table  type    possible_keys  key       key_len  ref                 rows  Extra
1   SIMPLE       sc     index   open_dt        ndx_rcvd  4        NULL                10    Using where
1   SIMPLE       spm    eq_ref  PRIMARY        PRIMARY   2        syslog.sc.src_port  1
1   SIMPLE       dpm    eq_ref  PRIMARY        PRIMARY   2        syslog.sc.dst_port  1

However the query is still fairly slow for some reason, any ideas how I could speed it up with additional indexes, etc? The values I am using in the WHERE clause are variable and change each time. The table has around 23 million records right now but is expected to continue to grow up to a potential 150 million.

Here is the table schema:

CREATE TABLE IF NOT EXISTS `firewall_connections` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `open_dt` datetime NOT NULL,
  `close_dt` datetime NOT NULL,
  `protocol` smallint(6) NOT NULL,
  `src_address` int(10) unsigned NOT NULL,
  `src_port` smallint(5) unsigned NOT NULL,
  `dst_address` int(10) unsigned NOT NULL,
  `dst_port` smallint(5) unsigned NOT NULL,
  `sent` int(10) unsigned NOT NULL,
  `rcvd` int(10) unsigned NOT NULL,
  PRIMARY KEY (`id`),
  KEY `ndx_rcvd` (`rcvd`),
  KEY `ndx_sent` (`sent`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

Thanks in advance!

-- Brandon
Re: Query Optimization
There are a few things going on, but mainly it is the ORs that are killing you. As you require OR to examine two distinct columns, both of equal relevance to the query, MySQL is left with no choice but to do a full table scan on what might be (at a guess) a very large table. No amount of indexing will fix this for the query presented.

You would be better off writing it as two distinct queries, each concerned with conditions on a single column (open_dt and close_dt), and then UNIONing the results. In this form, the indexes have a chance of being engaged.

Once the indexes are engaged, you probably want to convert your DATE/DATETIME strings into actual DATEs or DATETIMEs, thus: ... (open_dt >= DATE('2011-08-30 00:00:00')) ... In its current state, the DATE fields are being converted to strings implicitly for every row tested, which further frustrates index usage, as the index is against the quasi-numeric DATE, not the string representation which your current implementation appears to expect. This query would also suddenly begin to fail entirely if the DEFAULT_DATE_FORMAT gets modified by an admin or a future release of MySQL. The explicit casting I have suggested will protect you against that.

- michael dykman

On Tue, Aug 30, 2011 at 12:45 PM, Brandon Phelps bphe...@gls.com wrote: Hello, I have the following query I'd like to optimize a bit: snip

--
 - michael dykman
 - mdyk...@gmail.com

May the Source be with you.
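The OR-into-UNION rewrite suggested above is easy to demonstrate in miniature. A sketch using SQLite from Python (table and rows invented for illustration; MySQL's optimizer behaves differently, but the result-set equivalence of the two forms is the point):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE firewall_connections (
    id INTEGER PRIMARY KEY,
    open_dt TEXT NOT NULL,
    close_dt TEXT NOT NULL
);
CREATE INDEX ndx_open_dt  ON firewall_connections (open_dt);
CREATE INDEX ndx_close_dt ON firewall_connections (close_dt);
INSERT INTO firewall_connections (open_dt, close_dt) VALUES
    ('2011-08-29 01:00:00', '2011-08-29 02:00:00'),
    ('2011-08-29 23:00:00', '2011-08-30 01:00:00'),
    ('2011-08-30 02:00:00', '2011-08-30 03:00:00'),
    ('2011-08-30 13:00:00', '2011-08-30 14:00:00');
""")

s, e = '2011-08-30 00:00:00', '2011-08-30 12:36:53'

# Original form: ORs across two columns, which defeats single-column indexes.
with_or = conn.execute(
    "SELECT id FROM firewall_connections "
    "WHERE (open_dt BETWEEN ? AND ?) OR (close_dt BETWEEN ? AND ?) "
    "ORDER BY id", (s, e, s, e)).fetchall()

# Rewrite: one indexable range per SELECT; UNION removes duplicate rows.
with_union = conn.execute(
    "SELECT id FROM firewall_connections WHERE open_dt BETWEEN ? AND ? "
    "UNION "
    "SELECT id FROM firewall_connections WHERE close_dt BETWEEN ? AND ? "
    "ORDER BY id", (s, e, s, e)).fetchall()

print(with_or == with_union)  # -> True: same rows either way
```

A row matching both branches (opened and closed inside the window) appears once, because UNION without ALL deduplicates, which is the behavior Mihail later quotes from the manual.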
Query Optimization
Hello all, I am using the query below and variations of it to query a database with a TON of records. Currently the database has around 11 million records but it grows every day and should cap out at around 150 million. I am curious if there is any way I can better optimize the below query, as currently it takes this query around 10 seconds to run but I am sure this will get slower and slower as the database grows.

SELECT open_dt, close_dt, protocol, INET_NTOA(src_address) AS src_address, src_port, INET_NTOA(dst_address) AS dst_address, dst_port, sent, rcvd FROM connections WHERE dst_port = 80 ORDER BY close_dt DESC LIMIT 0, 30

I do have an index on the dst_port column, as you can see by the output of EXPLAIN:

id             1
select_type    SIMPLE
table          connections
type           ref
possible_keys  dst_port
key            dst_port
key_len        2
ref            const
rows           1109273
Extra          Using where; Using filesort

Thanks in advance,

-- Brandon
Re: Query Optimization
Thanks Singer, this took my query down to 0.0007, perfect! I wasn't aware a single index of multiple columns would work when one of the columns was in the WHERE clause and the other in the ORDER BY clause. Learn something new every day I guess!

On 08/10/2011 02:03 PM, Singer X.J. Wang wrote: Try a index on (dst_port,close_dt) On Wed, Aug 10, 2011 at 14:01, Brandon Phelps bphe...@gls.com wrote: Hello all, I am using the query below and variations of it to query a database with a TON of records. snip

--
The best compliment you could give Pythian for our service is a referral.
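The effect of Singer's (dst_port, close_dt) index can be reproduced in miniature with SQLite (schema trimmed down to the two relevant columns; MySQL's EXPLAIN looks different, but the principle, one composite index covering both the WHERE equality and the ORDER BY, is the same):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE connections (
    id INTEGER PRIMARY KEY,
    dst_port INTEGER NOT NULL,
    close_dt TEXT NOT NULL
);
CREATE INDEX ndx_port_close ON connections (dst_port, close_dt);
""")

plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT id, close_dt FROM connections "
    "WHERE dst_port = 80 ORDER BY close_dt DESC LIMIT 30").fetchall()
plan_text = " ".join(row[-1] for row in plan)

# The dst_port = 80 equality pins the index prefix; within that prefix the
# close_dt suffix is already sorted, so the plan shows a search on
# ndx_port_close and no "USE TEMP B-TREE FOR ORDER BY" (SQLite's analogue
# of MySQL's filesort).
print(plan_text)
```

With only the single-column dst_port index of the original post, the same query needs a separate sort step over every matching row, which is the "Using filesort" in Brandon's EXPLAIN.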
Re: Query Optimization
On 8/10/2011 1:01 PM, Brandon Phelps wrote: Hello all, I am using the query below and variations of it to query a database with a TON of records. snip

Did you try adding your ORDER BY argument close_dt to the index?

PB
Re: Query Optimization
From: Brandon Phelps bphe...@gls.com I am curious if there is any way I can better optimize the below query, as currently it takes this query around 10 seconds to run but I am sure this will get slower and slower as the database grows. You need an index on `close_dt`. SELECT open_dt, close_dt, protocol, INET_NTOA(src_address) AS src_address, src_port, INET_NTOA(dst_address) AS dst_address, dst_port, sent, rcvd FROM connections WHERE dst_port = 80 ORDER BY close_dt DESC LIMIT 0, 30 Current farmers, who have become mere operators of machines and mixers of chemicals, may not have the skills to develop a local, sustainable agriculture. A new generation of farmers, numbering in the tens of millions, will need to be trained and relocated to rural communities. -- Pat Murphy Jan Steinman, EcoReality Co-op
Re: optimization strategies based on file-level storage
At 10:47 PM 6/16/2011, Johan De Meersman wrote: - Original Message - From: Bennett Haselton benn...@peacefire.org Do you happen to know the answer to my other problem -- if I have TEXT and BLOB columns but all my other columns are fixed-length, can I still get the benefit of faster lookups resulting from fixed-length rows, if each row just contains a fixed-length reference to the actual TEXT or BLOB data which is stored elsewhere? entirely unsure, but given the single-datafile structure of MyISAM tables, I strongly suspect BLOBS get stored inline. I can't find a source that says for sure. http://dev.mysql.com/doc/refman/5.0/en/storage-requirements.html does say: The maximum size of a row in a MyISAM table is 65,535 bytes. This figure excludes BLOB or TEXT columns, which contribute only 9 to 12 bytes toward this size... For BLOB and TEXT data, the information is stored internally in a different area of memory than the row buffer. But that's talking about memory, not disk. When people talk about performance improvements from using fixed-length rows, are they talking primarily about memory or hard disk? Hold up though, I just got this reply from posting the question in a forum: http://forums.mysql.com/read.php?21,423433,423846 which says Almost always the discussions recommending Fixed length records in MyISAM are myths. The apparent performance improvement is swamped by the performance loss of shoveling around the wasted padding bytes and goes on to give reasons. Actually, that does make sense that it's a myth. I was surprised to hear so many sources claiming that there was a big performance increase from being able to find row N by jumping to position N*rowlength. Because even with variable-length rows, you can just store a table associating row numbers with the position of the row in the file, can't you -- which would mean it would only take one near-instantaneous lookup to be able to jump to the row you're looking for. 
What I was really trying to figure out was why it takes me 4 hours to add a new column to my 22-million-row table, and whether a different table design can avoid that problem. That reply in the forum says, ALTER TABLE ... ADD COLUMN will always copy the entire table over, and rebuild all the indexes. (And, effectively, do an OPTIMIZE.) Do _not_ leave space for extra columns, it won't help. I'm about to reply and point out the trick that you suggested to me: create dummy columns early and then just rename them later :) -Bennett
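The "row N lives at offset N*rowlength" idea the forum thread debates is easy to demonstrate in isolation. A sketch of fixed-length record access in pure Python (nothing MySQL-specific; record layout is invented for the example):

```python
import os
import struct
import tempfile

# Hypothetical fixed-length record: 4-byte id + 16-byte zero-padded name.
REC = struct.Struct("<I16s")  # REC.size == 20 bytes per row

path = os.path.join(tempfile.mkdtemp(), "rows.dat")
with open(path, "wb") as f:
    for i, name in enumerate([b"alpha", b"beta", b"gamma", b"delta"]):
        f.write(REC.pack(i, name.ljust(16, b"\x00")))

def read_row(n):
    # Row N sits at byte offset N * rowlength: one seek, no scan.
    with open(path, "rb") as f:
        f.seek(n * REC.size)
        rid, raw = REC.unpack(f.read(REC.size))
        return rid, raw.rstrip(b"\x00").decode()

print(read_row(2))  # -> (2, 'gamma')
```

With variable-length rows this arithmetic breaks down, which is why the discussion turns to keeping a separate row-position table (an index, effectively) instead of relying on fixed widths.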
RE: optimization strategies based on file-level storage
-Original Message- snip What I was really trying to figure out was why it takes me 4 hours to add a new column to my 22-million-row table, and whether a different table design can avoid that problem. That reply in the forum says, ALTER TABLE ... ADD COLUMN will always copy the entire table over, and rebuild all the indexes. (And, effectively, do an OPTIMIZE.) Do _not_ leave space for extra columns, it won't help. I'm about to reply and point out the trick that you suggested to me: create dummy columns early and then just rename them later :) -Bennett [JS] They will be too small, or the wrong type, or there won't be enough of them. Based upon 30+ years of database design, I'd bet money on it. ;-) Regards, Jerry Schwartz Global Information Incorporated 195 Farmington Ave. Farmington, CT 06032 860.674.8796 / FAX: 860.674.8341 E-mail: je...@gii.co.jp Web site: www.the-infoshop.com
RE: optimization strategies based on file-level storage
On Fri, June 17, 2011 07:11, Jerry Schwartz wrote: -Original Message- snip What I was really trying to figure out was why it takes me 4 hours to add a new column to my 22-million-row table, and whether a different table design can avoid that problem. That reply in the forum says, ALTER TABLE ... ADD COLUMN will always copy the entire table over, and rebuild all the indexes. (And, effectively, do an OPTIMIZE.) Do _not_ leave space for extra columns, it won't help. I'm about to reply and point out the trick that you suggested to me: create dummy columns early and then just rename them later :) -Bennett [JS] They will be too small, or the wrong type, or there won't be enough of them. Based upon 30+ years of database design, I'd bet money on it. ;-) Regards, Jerry Schwartz Global Information Incorporated The only alternative design would be to create another table with the added columns and a common key field and then lock the primary table and populate it with the keys from the original table, and I'm not convinced that would be any faster or less disruptive. -- William R. Mussatto Systems Engineer http://www.csz.com 909-920-9154
Re: optimization strategies based on file-level storage
- Original Message - From: Bennett Haselton benn...@peacefire.org Do you happen to know the answer to my other problem -- if I have TEXT and BLOB columns but all my other columns are fixed-length, can I still get the benefit of faster lookups resulting from fixed-length rows, if each row just contains a fixed-length reference to the actual TEXT or BLOB data which is stored elsewhere? entirely unsure, but given the single-datafile structure of MyISAM tables, I strongly suspect BLOBS get stored inline. For InnoDB, the answer appears to be it varies: If a row is less than half a page long, all of it is stored locally within the page. If it exceeds half a page, variable-length columns are chosen for external off-page storage until the row fits within half a page. For a column chosen for off-page storage, InnoDB stores the first 768 bytes locally in the row, and the rest externally into overflow pages. See http://dev.mysql.com/doc/refman/5.0/en/innodb-file-space.html for more on that. Also, I *think* the concept of fixed-length rows is only applicable to MyISAM, InnoDB has index-organised tables - that is to say, it stores all row data in the leaves of the primary index. The consequence, of course, is that no additional pointer lookup gets done for primary key selects; the tradeoff is that all nonprimary key lookups get detoured through the primary key. The online documentation is really pretty good; but for the really low-level things, I guess the best documentation may be the source. -- Bier met grenadyn Is als mosterd by den wyn Sy die't drinkt, is eene kwezel Hy die't drinkt, is ras een ezel
Re: optimization strategies based on file-level storage
At 11:45 AM 6/14/2011, Johan De Meersman wrote: - Original Message - From: Bennett Haselton benn...@peacefire.org modifications. (For example, the question I asked earlier about whether you can declare extra space at the end of each row that is reserved for future columns.) That question I can answer: you can't reserve space, but if you know what kind of rows you'll want to add later you can pre-add them (and incur the accompanying storage cost), and simply rename them appropriately later. Thanks. It would be more flexible if I could declare, say, 50 bytes, and decide later if I wanted to use them for a datetime, a char(n), or an int, but this is still helpful :) Do you happen to know the answer to my other problem -- if I have TEXT and BLOB columns but all my other columns are fixed-length, can I still get the benefit of faster lookups resulting from fixed-length rows, if each row just contains a fixed-length reference to the actual TEXT or BLOB data which is stored elsewhere? -Bennett
optimization strategies based on file-level storage
I'm looking for some tips & tricks documentation that explains how different data types in rows are stored at the file level (in MyISAM tables, at least), and how to optimize tables for faster queries, updates, table definition modification, etc. based on this knowledge.

For example, I've heard that if all of your columns are fixed-length, that makes it faster to find individual rows since row N is located at position N*rowlength in the file. (And, presumably, having just one variable-length column will slow things down considerably.) But I've also read that having TEXT and BLOB columns will also slow down row-finding operations. This seems to make no sense because I thought TEXT was not actually stored in the row, but the row just stored a constant-length reference to the TEXT whose actual data was stored somewhere else. Is this correct? Then is it incorrect to say that a TEXT column will slow down the locate-row-N operation, assuming all other columns are fixed-length? This is the kind of thing I'm looking for a document to explain.

Another example: It sometimes takes me an extremely long time to add new columns to a table. What if I had a table with all fixed-length columns, and I reserved some space at the end of each row to be used for columns to be added in the future. Would it then be possible to add new columns much more quickly? You wouldn't have to move around the existing row data to make room for the new column (although presumably you would still have to *write* to the place in each row where the new column had just been defined, to fill it in with its default value).

In particular, I'm not looking for a list of optimization tricks, so much as a document that explains how the rows are stored at the file level, and thereby explains how the optimization tricks *follow logically from* this information.
The reason is that if I just have a grab-bag of optimization hints (of which I've found many on the Web), some of them will not be applicable to my situation, or just plain wrong, and I'll have no way of knowing which ones. But if you know *why* something works, you can more easily figure out if it applies to your situation. -Bennett
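The row-N-at-position-N*rowlength claim above is easy to demonstrate with a toy fixed-length record format. This is a hypothetical byte layout for illustration, not MyISAM's actual on-disk record format (which also carries per-row flag bytes):

```python
# Why fixed-length rows make row lookup O(1): row N starts at exactly
# N * row_length, so no preceding rows need to be scanned or parsed.
# Toy byte layout, not MyISAM's real on-disk format.
import struct

ROW_FMT = "<i10s"                   # one INT plus a CHAR(10)
ROW_LEN = struct.calcsize(ROW_FMT)  # every row has the same length

def write_rows(rows):
    buf = bytearray()
    for n, name in rows:
        buf += struct.pack(ROW_FMT, n, name.encode().ljust(10, b"\0"))
    return bytes(buf)

def read_row(data, n):
    # Seek straight to the row: position = n * ROW_LEN.
    off = n * ROW_LEN
    num, name = struct.unpack_from(ROW_FMT, data, off)
    return num, name.rstrip(b"\0").decode()

data = write_rows([(1, "alpha"), (2, "beta"), (3, "gamma")])
print(read_row(data, 2))   # jumps directly to the third row
```

With even one variable-length column this arithmetic breaks down, which is exactly the distinction the thread is circling around.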
RE: optimization strategies based on file-level storage
At 05:46 AM 6/14/2011, Carlos Eduardo Caldi wrote: Hello Bennett. On the MySQL developer site there is great documentation; try the links below. http://dev.mysql.com/doc/refman/5.0/en/optimizing-database-structure.html http://dev.mysql.com/doc/refman/5.0/en/data-size.html

Thanks, this gets me a little closer to the answer but doesn't really provide the level of detail that I'm looking for. For example, it says: "For MyISAM tables, if you do not have any variable-length columns (VARCHAR, TEXT, or BLOB columns), a fixed-size row format is used. This is faster..."

I still don't understand: If TEXT and BLOB columns are stored not by putting the data in the row but by having the row store a reference to the TEXT/BLOB data stored somewhere else, then can't a row with TEXT and BLOB data types *still* be a fixed-size row, with the resulting increased speed?

My main motivation is that I have a table with 22 million records and it takes a few hours for me to add a new column to it. I suspect this would be faster if I designed the table correctly from the beginning, and I want to change it to a smarter design, but I only want to do it once. So I want to understand really thoroughly *why* a different design would make it faster to complete the table modifications. (For example, the question I asked earlier about whether you can declare extra space at the end of each row that is reserved for future columns.)

Att. Carlos, Date: Tue, 14 Jun 2011 01:44:47 -0700 To: mysql@lists.mysql.com From: benn...@peacefire.org Subject: optimization strategies based on file-level storage

I'm looking for some tips & tricks documentation that explains how different data types in rows are stored at the file level (in MyISAM tables, at least), and how to optimize tables for faster queries, updates, table definition modification, etc. based on this knowledge.
For example, I've heard that if all of your columns are fixed-length, that makes it faster to find individual rows since row N is located at position N*rowlength in the file. (And, presumably, having just one variable-length column will slow things down considerably.) But I've also read that having TEXT and BLOB columns will also slow down row-finding operations. This seems to make no sense because I thought TEXT was not actually stored in the row, but the row just stored a constant-length reference to the TEXT whose actual data was stored somewhere else. Is this correct? Then is it incorrect to say that a TEXT column will slow down the locate-row-N operation, assuming all other columns are fixed-length? This is the kind of thing I'm looking for a document to explain.

Another example: It sometimes takes me an extremely long time to add new columns to a table. What if I had a table with all fixed-length columns, and I reserved some space at the end of each row to be used for columns to be added in the future. Would it then be possible to add new columns much more quickly? You wouldn't have to move around the existing row data to make room for the new column (although presumably you would still have to *write* to the place in each row where the new column had just been defined, to fill it in with its default value).

In particular, I'm not looking for a list of optimization tricks, so much as a document that explains how the rows are stored at the file level, and thereby explains how the optimization tricks *follow logically from* this information. The reason is that if I just have a grab-bag of optimization hints (of which I've found many on the Web), some of them will not be applicable to my situation, or just plain wrong, and I'll have no way of knowing which ones. But if you know *why* something works, you can more easily figure out if it applies to your situation.
-Bennett
Re: Data missing after field optimization
On Jun 7, 2011, at 10:43 PM, Johan De Meersman wrote: Where did you find the advice about setting columns NOT NULL?

It took me awhile, but I just found it again, in case anyone is interested: http://dev.mysql.com/doc/refman/5.0/en/data-size.html

7.8.1. Make Your Data as Small as Possible: "Declare columns to be NOT NULL if possible. It makes everything faster and you save one bit per column. If you really need NULL in your application, you should definitely use it. Just avoid having it on all columns by default."

Marc
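The manual's one-bit-per-column figure quoted above can be turned into concrete numbers. A sketch, assuming the per-row NULL bitmap is rounded up to whole bytes (a simplification of the real MyISAM/InnoDB storage formats):

```python
# Rough arithmetic for the "one bit per column" NULL cost quoted from
# the manual: nullable columns need a per-row NULL bitmap, assumed here
# to be rounded up to whole bytes. Simplified model, not exact format.
def null_bitmap_bytes(nullable_columns):
    return (nullable_columns + 7) // 8

def bitmap_overhead(nullable_columns, rows):
    """Total bytes spent on NULL bitmaps across a table."""
    return null_bitmap_bytes(nullable_columns) * rows

print(null_bitmap_bytes(0))             # all NOT NULL: 0 bytes per row
print(null_bitmap_bytes(9))             # 9 nullable columns: 2 bytes per row
print(bitmap_overhead(9, 22000000))     # over a 22-million-row table
```

So the saving is real but small; the bigger point in this thread is that changing existing columns to NOT NULL has behavioral consequences that matter far more than the bytes.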
Re: optimization strategies based on file-level storage
- Original Message - From: Bennett Haselton benn...@peacefire.org modifications. (For example, the question I asked earlier about whether you can declare extra space at the end of each row that is reserved for future columns.)

That question I can answer: you can't reserve space, but if you know what kind of columns you'll want to add later you can pre-add them (and incur the accompanying storage cost), and simply rename them appropriately later. ALTER TABLE will do a full re-write of your table when that is necessary (like adding or deleting columns), but will (likely) not do so when you're just renaming an existing column.

Look hard enough on the web and you can find info on how to hack the .frm files, too, so some operations that would do a full re-write don't - but that's fishy business, and always at your own risk.

-- Bier met grenadyn Is als mosterd by den wyn Sy die't drinkt, is eene kwezel Hy die't drinkt, is ras een ezel
Re: Data missing after field optimization
Heeh. That's not a random blog post, that's the official manual :-p If it's in there, it's pretty much trustworthy; and I've learned something new :-) So apparently there's a dedicated NULL bit for columns... Yes, then there would be a small performance benefit. I stand corrected.

Still, as you've noticed, don't change the design of an existing application without thoroughly testing the consequences :-p

- Original Message - From: sono...@fannullone.us To: mysql@lists.mysql.com Sent: Tuesday, 14 June, 2011 7:34:04 PM Subject: Re: Data missing after field optimization

It took me awhile, but I just found it again, in case anyone is interested: http://dev.mysql.com/doc/refman/5.0/en/data-size.html 7.8.1. Make Your Data as Small as Possible: "Declare columns to be NOT NULL if possible. It makes everything faster and you save one bit per column. If you really need NULL in your application, you should definitely use it. Just avoid having it on all columns by default." Marc

-- Bier met grenadyn Is als mosterd by den wyn Sy die't drinkt, is eene kwezel Hy die't drinkt, is ras een ezel
RE: Data missing after field optimization
snip If that's all you did, you indeed 'removed the default NULL' but did not specify another default. Hence, if you don't explicitly specify a value in your insert statement, the insert can not happen as the server doesn't know what to put there and is explicitly disallowed from leaving the value empty. That makes sense. So then why does phpMyAdmin allow you to choose None as a default?

[JS] There are times when you want to enforce certain conditions even though they might sometimes generate an error. Two examples are required fields (no default value) or referential integrity. The goal is not to lose data, but to make sure your applications are doing what they are supposed to do. You would program your applications to trap and report errors.

Regards, Jerry Schwartz Global Information Incorporated 195 Farmington Ave. Farmington, CT 06032 860.674.8796 / FAX: 860.674.8341 E-mail: je...@gii.co.jp Web site: www.the-infoshop.com
Re: Data missing after field optimization
On Jun 7, 2011, at 10:43 PM, Johan De Meersman wrote: If that's all you did, you indeed 'removed the default NULL' but did not specify another default. Hence, if you don't explicitly specify a value in your insert statement, the insert can not happen as the server doesn't know what to put there and is explicitly disallowed from leaving the value empty.

That makes sense. So then why does phpMyAdmin allow you to choose None as a default?

Where did you find the advice about setting columns NOT NULL?

I can't find them today, but I read on a couple of web sites that not populating fields with NULL would make the tables more efficient. I have a shopping cart that has every field as type TEXT and I'm going through it with a fine-tooth comb trying to make it more efficient.

You really shouldn't take everything you read on the internet as truth :-)

NOW you tell me! =;) I guess I learned my lesson the hard way. Thanks again, Marc
Re: Data missing after field optimization
On Jun 6, 2011, at 10:06 PM, Johan De Meersman wrote: What exactly do you mean by removing the NULL default? Did you set your columns NOT NULL?

Yes. That's all I did. Marc
Re: Data missing after field optimization
On Jun 6, 2011, at 10:06 PM, Johan De Meersman wrote: What exactly do you mean by removing the NULL default? Did you set your columns NOT NULL? Yes. That's all I did.

Instead of getting info drop-by-drop, you might want to share the output of SHOW CREATE TABLE ..., but my guess is you had a UNIQUE key somewhere that got violated when converting NULLs to empty strings. It would require an ALTER IGNORE TABLE ... instead of a normal ALTER TABLE though. That, or an outside source (code we cannot see querying for NOT NULLs etc.)

-- Rik Wasmus
Re: Data missing after field optimization
- Original Message - From: sono...@fannullone.us Yes. That's all I did.

If that's all you did, you indeed 'removed the default NULL' but did not specify another default. Hence, if you don't explicitly specify a value in your insert statement, the insert can not happen as the server doesn't know what to put there and is explicitly disallowed from leaving the value empty.

Where did you find the advice about setting columns NOT NULL? You really shouldn't take everything you read on the internet as truth :-)

-- Bier met grenadyn Is als mosterd by den wyn Sy die't drinkt, is eene kwezel Hy die't drinkt, is ras een ezel
Data missing after field optimization
Hopefully I won't look like too much of a numbskull here but after reading some sites on table optimization, I decided to remove the NULL as default on the fields in my products table. I thought everything went well until I realized that we hadn't received any orders for 2 days. That's when I realized that my products table was empty! Luckily this happened over the weekend. =:\ I've made changes to field types before and never had a problem like this. Does anyone know what went wrong from my limited description? Why would removing the NULL default cause data to be lost? Thanks, Marc
Re: Data missing after field optimization
- Original Message - From: sono...@fannullone.us description? Why would removing the NULL default cause data to be lost?

What exactly do you mean by removing the NULL default? Did you set your columns NOT NULL?

-- Bier met grenadyn Is als mosterd by den wyn Sy die't drinkt, is eene kwezel Hy die't drinkt, is ras een ezel
Super active table optimization
I have an innodb table with 200 million rows and growing. It is a highly active table with tons of inserts and updates at all times. I notice a select query I test on that table is 0.01 seconds or less when all the inserts/updates are OFF. But when I throttle up the writes to the table, the select sql then takes like 1-3 full seconds or more. This is a serious bottleneck in our application. Any basic innodb wisdom for this scenario?

+-----------+---------------------+------+-----+---------+-------+
| Field     | Type                | Null | Key | Default | Extra |
+-----------+---------------------+------+-----+---------+-------+
| id        | bigint(20) unsigned | NO   | PRI | 0       |       |
| clock     | int(11)             | NO   | PRI | 0       |       |
| type      | int(11)             | NO   | PRI | 3       |       |
| num       | int(11)             | NO   |     | 0       |       |
| value_min | double(20,4)        | YES  |     | NULL    |       |
| value_avg | double(20,4)        | YES  |     | NULL    |       |
| value_max | double(20,4)        | YES  |     | NULL    |       |
+-----------+---------------------+------+-----+---------+-------+

# cat /etc/my.cnf|grep -i innodb
default-storage-engine=InnoDB
innodb_data_file_path = ibdata1:256M:autoextend
innodb_buffer_pool_size = 768M
innodb_additional_mem_pool_size = 32M
innodb_log_file_size = 192M
innodb_log_buffer_size = 64M
innodb_flush_log_at_trx_commit = 2
innodb_lock_wait_timeout = 50
innodb_flush_method=O_DIRECT
innodb_table_locks=0;
innodb_use_legacy_cardinality_algorithm=0;

16G memory
16G swap
8 CPU
Re: Super active table optimization
1. Which version of MySQL? Judging from your my.cnf it's not MariaDB 5.2 or Percona Server? They have very good enhancements, particularly in the high-concurrency InnoDB atmosphere. InnoDB is a lot more tunable in those versions. Switching from a stock MySQL version is a drop-in replacement, so if tested properly it should be a very simple operation.

2. Try making all fields NOT NULL, that should increase performance as well, although I'm not sure how much that would help.

On Sat, Feb 19, 2011 at 16:59, Bryan Cantwell bcantw...@firescope.com wrote:

# cat /etc/my.cnf|grep -i innodb

Just because I like nagging: uuoc :)

innodb_buffer_pool_size = 768M

With 16G RAM you should set this as high as you can, up to 60-75% of your memory. This of course depends on the storage engine the rest of your db uses. Aim to be mostly InnoDB and you can set this very high.

innodb_flush_log_at_trx_commit = 2

Have you read the consequences of setting this to 2? Look up the manual and make an informed decision, as it can severely affect what happens if your server dies.

innodb_flush_method=O_DIRECT

This depends on what kind of hardware you have. With a RAID array with write cache, you can set it to O_DIRECT. In other environments it can adversely affect performance.

16G swap

This is insane :) If MySQL starts swapping, you're dead in the water anyway. Typically, anything over 1G or so is too much.

8 CPU

How many cores? The more cores, the more use you will get from using one of the non-stock MySQL versions, as they have been optimised for concurrency and are much more tunable.

Shameless plug: if you want more on-topic help, you should look into letting us (I work for Open Query) help you. We can then research your whole environment and give much better advice that is not limited to this scope. Other companies that do similar services include Percona, SkySQL and FromDual, to name a few :) Have fun!
-- Walter Heck Engineer @ Open Query (http://openquery.com) Exceptional services for MariaDB and MySQL at a fixed budget -- Follow @openquery on Twitter to stay up to date
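Walter's 60-75% guideline, applied to the 16G box from the original post, works out as follows (plain arithmetic; real sizing also depends on what else runs on the host):

```python
# Walter's rule of thumb from above: on a mostly-InnoDB box, aim the
# buffer pool at roughly 60-75% of RAM. Purely arithmetic sketch.
GB = 1024 ** 3

def buffer_pool_range(total_ram_bytes, low=0.60, high=0.75):
    return (int(total_ram_bytes * low), int(total_ram_bytes * high))

low, high = buffer_pool_range(16 * GB)
print("suggested innodb_buffer_pool_size: %d-%d GB" % (low // GB, high // GB))
# The 768M in the posted my.cnf is far below this range.
```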
Re: Super active table optimization
Am 19.02.2011 16:59, schrieb Bryan Cantwell: I have an innodb table with 200 million rows and growing.

More interesting is the db-size!

innodb_buffer_pool_size = 768M

Too small; in the best case it is as large as the database, or as large as possible to avoid swapping.

innodb_log_file_size = 192M

1/4 of buffer_pool is ok, but if your buffer-pool is too small it is a little difficult to increase this; I would never set this lower than 512 MB, which is also ok for larger databases.

These are our InnoDB settings for a dbmail instance and MySQL 5.5, and we are speaking about 15 GB data and 3 million rows; the backend is a SAN storage. In this setup we can go up to 1,000 inserts per second:

innodb_buffer_pool_size = 5120M
innodb_buffer_pool_instances = 5
innodb_purge_threads = 1
innodb_max_purge_lag = 20
innodb_max_dirty_pages_pct = 60
innodb_additional_mem_pool_size = 64M
innodb_log_file_size = 512M
innodb_log_buffer_size = 256M
innodb_thread_concurrency = 32
innodb_thread_sleep_delay = 10
innodb_flush_log_at_trx_commit = 2
innodb_support_xa = 1
innodb_lock_wait_timeout = 50
innodb_table_locks = 0
innodb_file_format = barracuda
innodb_file_per_table = 1
innodb_open_files = 600
innodb_io_capacity = 800
innodb_read_io_threads = 8
innodb_write_io_threads = 8
transaction-isolation = READ-COMMITTED
Re: some problem of InnoDB performance optimization
Hi, let's see if I can help.

"Overly long queries (transactions in general) are bad for performance as a lot of unpurged versions accumulate." In this sentence, I don't know the meaning of 'unpurged versions accumulate'.

When rows are updated, new versions are created. They are later removed by the purge thread - but only once no active transactions may need them. This is why long open transactions are expensive.

And I don't know how to understand 'open transaction'?

This is a transaction which is started (opened) but not yet ended by commit or rollback.

"Required for logical level replication to work properly." What's the meaning of logical level replication?

MySQL has statement level, also called logical, as well as row based replication. Statement level replication requires updates to be serializable to work.

"Can give problems for portable applications if you port from MySQL-4.0 to later." What's the meaning of this sentence?

This means you can run into problems if you upgrade from MySQL 4.0 to a later version. It is probably not very relevant any more.

-- Peter Zaitsev, CEO, Percona Inc. Tel: +1 888 401 3401 ext 501 Skype: peter_zaitsev 24/7 Emergency Line +1 888 401 3401 ext 911 Percona Training Workshops http://www.percona.com/training/
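Peter's purge explanation can be modeled in a few lines. This is a toy model of the accumulation effect only, not InnoDB's actual undo-log mechanics: a version may be purged only when no open transaction predates it, so one long-open transaction pins every version created after it began:

```python
# Toy model of "unpurged versions accumulate": old row versions can be
# purged only when no open transaction might still need them, so a
# single long-running transaction pins all versions created after it
# began. Not InnoDB's real undo-log mechanics.

open_txn_start = {}   # txn name -> logical start time
versions = []         # creation times of retained old row versions
clock = 0

def begin(txn):
    open_txn_start[txn] = clock

def commit(txn):
    del open_txn_start[txn]

def update_row():
    global clock
    clock += 1
    versions.append(clock)   # the old version must be kept around

def purge():
    # Versions older than every open transaction can be removed.
    oldest = min(open_txn_start.values(), default=clock + 1)
    versions[:] = [v for v in versions if v >= oldest]

begin("long_report")         # a long-open transaction...
for _ in range(100):
    update_row()
purge()
pinned = len(versions)       # ...pins all 100 old versions

commit("long_report")
purge()
print(pinned, len(versions)) # after commit, purge can clean up
```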
Re: some problem of InnoDB performance optimization
HI Peter: Thanks for your answer. I have understood your answer. Thank you very much.

―― Best regards, Timo Seven blog: http://zauc.wordpress.com #Please browse via a proxy, or point your hosts file to 74.200.243.252# UNIX System Admin

2010/3/23 Peter Zaitsev p...@percona.com Hi, let's see if I can help. "Overly long queries (transactions in general) are bad for performance as a lot of unpurged versions accumulate." In this sentence, I don't know the meaning of 'unpurged versions accumulate'. When rows are updated, new versions are created. They are later removed by the purge thread - but only once no active transactions may need them. This is why long open transactions are expensive. And I don't know how to understand 'open transaction'? This is a transaction which is started (opened) but not yet ended by commit or rollback. "Required for logical level replication to work properly." What's the meaning of logical level replication? MySQL has statement level, also called logical, as well as row based replication. Statement level replication requires updates to be serializable to work. "Can give problems for portable applications if you port from MySQL-4.0 to later." What's the meaning of this sentence? This means you can run into problems if you upgrade from MySQL 4.0 to a later version. It is probably not very relevant any more. -- Peter Zaitsev, CEO, Percona Inc. Tel: +1 888 401 3401 ext 501 Skype: peter_zaitsev 24/7 Emergency Line +1 888 401 3401 ext 911 Percona Training Workshops http://www.percona.com/training/
some problem of InnoDB performance optimization
Hi everyone: I read the presentation about InnoDB performance optimization that Heikki Tuuri wrote on April 23, 2007. But now there are some sentences I don't know how to understand. Can you help me?

"Overly long queries (transactions in general) are bad for performance as a lot of unpurged versions accumulate." In this sentence, I don't know the meaning of 'unpurged versions accumulate'. And I don't know how to understand 'open transaction'?

"Required for logical level replication to work properly." What's the meaning of logical level replication?

"Can give problems for portable applications if you port from MySQL-4.0 to later." What's the meaning of this sentence?

―― Best regards, Timo Seven blog: http://zauc.wordpress.com #Please browse via a proxy, or point your hosts file to 74.200.243.252# UNIX System Admin
Re: Need help with query optimization
It may only be returning 51 rows but it's having to read significantly more. Get rid of the derived table join if possible. Something like:

SELECT TAP.ID, M.UID, TAP.FirstName, TAP.MI, TAP.LastName, TAP.State,
       TAP.SchoolName, TAP.StateApproved, TAP.DiplomatApproved,
       C.ChapterType, S.Region, S.District, Sum(Pts.Points) Total_Points
FROM TorchAwardParticipants TAP
JOIN Members M On M.ID=TAP.CurrentMemberID
JOIN Chapters C On C.ID=M.ChapterID
JOIN Schools S On S.ID=C.SchoolID
JOIN TorchAwardSelAct Pts ON Pts.AchievementID=TAP.ID
WHERE TAP.State='OH' AND TAP.Sub='1' AND TAP.SubmittedToDiplomat IS NOT NULL
  AND Pts.LocalApproveStatus='A'
GROUP BY TAP.ID, M.UID, TAP.FirstName, TAP.MI, TAP.LastName, TAP.State,
         TAP.SchoolName, TAP.StateApproved, TAP.DiplomatApproved,
         C.ChapterType, S.Region, S.District
ORDER BY TAP.LastName, TAP.FirstName, Sum(Pts.Points)

Regards John

On Tue, Mar 16, 2010 at 6:17 PM, Jesse j...@msdlg.com wrote: I have the following query:

SELECT TAP.ID, M.UID, TAP.FirstName, TAP.MI, TAP.LastName, TAP.State,
       TAP.SchoolName, TAP.StateApproved, TAP.DiplomatApproved,
       C.ChapterType, S.Region, S.District, Pts.TotPoints
FROM TorchAwardParticipants TAP
JOIN Members M On M.ID=TAP.CurrentMemberID
JOIN Chapters C On C.ID=M.ChapterID
JOIN Schools S On S.ID=C.SchoolID
JOIN (SELECT AchievementID, Sum(Points) As TotPoints
      FROM TorchAwardSelAct TASA
      WHERE LocalApproveStatus='A'
      GROUP BY AchievementID) Pts ON Pts.AchievementID=TAP.ID
WHERE TAP.State='OH' AND TAP.Sub='1' AND TAP.SubmittedToDiplomat IS NOT NULL
ORDER BY TAP.LastName, TAP.FirstName, Pts.TotPoints

The TorchAwardParticipants table has about 84,000 records in it. The query takes almost 40 seconds to return the data, which is only 51 rows.
An EXPLAIN returns the following:

+----+-------------+------------+--------+-------------------------+---------------+---------+-------------------------+--------+---------------------------------+
| id | select_type | table      | type   | possible_keys           | key           | key_len | ref                     | rows   | Extra                           |
+----+-------------+------------+--------+-------------------------+---------------+---------+-------------------------+--------+---------------------------------+
|  1 | PRIMARY     | <derived2> | ALL    | NULL                    | NULL          | NULL    | NULL                    |   4382 | Using temporary; Using filesort |
|  1 | PRIMARY     | TAP        | eq_ref | PRIMARY,CurrentMemberID | PRIMARY       | 4       | Pts.AchievementID       |      1 | Using where                     |
|  1 | PRIMARY     | M          | eq_ref | PRIMARY,IX_Members_3    | PRIMARY       | 4       | bpa.TAP.CurrentMemberID |      1 |                                 |
|  1 | PRIMARY     | C          | eq_ref | PRIMARY,IX_Chapters_1   | PRIMARY       | 4       | bpa.M.ChapterID         |      1 |                                 |
|  1 | PRIMARY     | S          | eq_ref | PRIMARY                 | PRIMARY       | 4       | bpa.C.SchoolID          |      1 |                                 |
|  2 | DERIVED     | TASA       | index  | NULL                    | AchievementID | 5       | NULL                    | 161685 | Using where                     |
+----+-------------+------------+--------+-------------------------+---------------+---------+-------------------------+--------+---------------------------------+

What is the best way to optimize this query so that it doesn't take 40 seconds to return the dataset? Jesse
Need help with query optimization
I have the following query:

SELECT TAP.ID, M.UID, TAP.FirstName, TAP.MI, TAP.LastName, TAP.State,
       TAP.SchoolName, TAP.StateApproved, TAP.DiplomatApproved,
       C.ChapterType, S.Region, S.District, Pts.TotPoints
FROM TorchAwardParticipants TAP
JOIN Members M On M.ID=TAP.CurrentMemberID
JOIN Chapters C On C.ID=M.ChapterID
JOIN Schools S On S.ID=C.SchoolID
JOIN (SELECT AchievementID, Sum(Points) As TotPoints
      FROM TorchAwardSelAct TASA
      WHERE LocalApproveStatus='A'
      GROUP BY AchievementID) Pts ON Pts.AchievementID=TAP.ID
WHERE TAP.State='OH' AND TAP.Sub='1' AND TAP.SubmittedToDiplomat IS NOT NULL
ORDER BY TAP.LastName, TAP.FirstName, Pts.TotPoints

The TorchAwardParticipants table has about 84,000 records in it. The query takes almost 40 seconds to return the data, which is only 51 rows. An EXPLAIN returns the following:

+----+-------------+------------+--------+-------------------------+---------------+---------+-------------------------+--------+---------------------------------+
| id | select_type | table      | type   | possible_keys           | key           | key_len | ref                     | rows   | Extra                           |
+----+-------------+------------+--------+-------------------------+---------------+---------+-------------------------+--------+---------------------------------+
|  1 | PRIMARY     | <derived2> | ALL    | NULL                    | NULL          | NULL    | NULL                    |   4382 | Using temporary; Using filesort |
|  1 | PRIMARY     | TAP        | eq_ref | PRIMARY,CurrentMemberID | PRIMARY       | 4       | Pts.AchievementID       |      1 | Using where                     |
|  1 | PRIMARY     | M          | eq_ref | PRIMARY,IX_Members_3    | PRIMARY       | 4       | bpa.TAP.CurrentMemberID |      1 |                                 |
|  1 | PRIMARY     | C          | eq_ref | PRIMARY,IX_Chapters_1   | PRIMARY       | 4       | bpa.M.ChapterID         |      1 |                                 |
|  1 | PRIMARY     | S          | eq_ref | PRIMARY                 | PRIMARY       | 4       | bpa.C.SchoolID          |      1 |                                 |
|  2 | DERIVED     | TASA       | index  | NULL                    | AchievementID | 5       | NULL                    | 161685 | Using where                     |
+----+-------------+------------+--------+-------------------------+---------------+---------+-------------------------+--------+---------------------------------+

What is the best way to optimize this query so that it doesn't take 40 seconds to return the dataset? Jesse
Re: Need help with query optimization
Can you please show us the indexes on both tables. regards anandkl

On Tue, Mar 16, 2010 at 11:47 PM, Jesse j...@msdlg.com wrote: I have the following query:

SELECT TAP.ID, M.UID, TAP.FirstName, TAP.MI, TAP.LastName, TAP.State,
       TAP.SchoolName, TAP.StateApproved, TAP.DiplomatApproved,
       C.ChapterType, S.Region, S.District, Pts.TotPoints
FROM TorchAwardParticipants TAP
JOIN Members M On M.ID=TAP.CurrentMemberID
JOIN Chapters C On C.ID=M.ChapterID
JOIN Schools S On S.ID=C.SchoolID
JOIN (SELECT AchievementID, Sum(Points) As TotPoints
      FROM TorchAwardSelAct TASA
      WHERE LocalApproveStatus='A'
      GROUP BY AchievementID) Pts ON Pts.AchievementID=TAP.ID
WHERE TAP.State='OH' AND TAP.Sub='1' AND TAP.SubmittedToDiplomat IS NOT NULL
ORDER BY TAP.LastName, TAP.FirstName, Pts.TotPoints

The TorchAwardParticipants table has about 84,000 records in it. The query takes almost 40 seconds to return the data, which is only 51 rows. An EXPLAIN returns the following:

+----+-------------+------------+--------+-------------------------+---------------+---------+-------------------------+--------+---------------------------------+
| id | select_type | table      | type   | possible_keys           | key           | key_len | ref                     | rows   | Extra                           |
+----+-------------+------------+--------+-------------------------+---------------+---------+-------------------------+--------+---------------------------------+
|  1 | PRIMARY     | <derived2> | ALL    | NULL                    | NULL          | NULL    | NULL                    |   4382 | Using temporary; Using filesort |
|  1 | PRIMARY     | TAP        | eq_ref | PRIMARY,CurrentMemberID | PRIMARY       | 4       | Pts.AchievementID       |      1 | Using where                     |
|  1 | PRIMARY     | M          | eq_ref | PRIMARY,IX_Members_3    | PRIMARY       | 4       | bpa.TAP.CurrentMemberID |      1 |                                 |
|  1 | PRIMARY     | C          | eq_ref | PRIMARY,IX_Chapters_1   | PRIMARY       | 4       | bpa.M.ChapterID         |      1 |                                 |
|  1 | PRIMARY     | S          | eq_ref | PRIMARY                 | PRIMARY       | 4       | bpa.C.SchoolID          |      1 |                                 |
|  2 | DERIVED     | TASA       | index  | NULL                    | AchievementID | 5       | NULL                    | 161685 | Using where                     |
+----+-------------+------------+--------+-------------------------+---------------+---------+-------------------------+--------+---------------------------------+

What is the best way to optimize this query so that it doesn't take 40 seconds to return the dataset? Jesse
Complex SQL optimization vs. general-purpose language
Any SQL rewriting gurus know how I might be able to optimize this query?

The schema:

mysql> show columns from transactionlog;
+---------------+-------------------------------------------+------+-----+---------+----------------+
| Field         | Type                                      | Null | Key | Default | Extra          |
+---------------+-------------------------------------------+------+-----+---------+----------------+
| id            | int(11)                                   | NO   | PRI | NULL    | auto_increment |
| transactionid | varchar(10)                               | NO   | MUL | NULL    |                |
| queryid       | tinyint(4)                                | NO   |     | NULL    |                |
| tableid       | varchar(30)                               | NO   | MUL | NULL    |                |
| tupleid       | int(11)                                   | NO   |     | NULL    |                |
| querytype     | enum('select','insert','delete','update') | NO   |     | NULL    |                |
| schemaname    | varchar(20)                               | YES  |     | NULL    |                |
| partition     | tinyint(3) unsigned                       | YES  |     | NULL    |                |
+---------------+-------------------------------------------+------+-----+---------+----------------+
8 rows in set (0.04 sec)

The query:

select concat(weight, ' ', ids, '\n')
from (
  select tableid, tupleid,
         group_concat(id separator ' ') as ids,
         ( select count(distinct transactionid)
           from transactionlog
           where transactionid in (
             select transactionid
             from transactionlog
             where (tableid, tupleid, querytype) = (t.tableid, t.tupleid, 'update')
             group by transactionid
             having count(*) > 0
           )
         ) weight
  from transactionlog t
  group by tableid, tupleid
  having weight > 0 and count(*) > 1
) u;

This is the output of EXPLAIN and mk-visual-explain:

+----+--------------------+----------------+-------+---------------+---------------+---------+-----------+------+----------------------------------------------+
| id | select_type        | table          | type  | possible_keys | key           | key_len | ref       | rows | Extra                                        |
+----+--------------------+----------------+-------+---------------+---------------+---------+-----------+------+----------------------------------------------+
|  1 | PRIMARY            | <derived2>     | ALL   | NULL          | NULL          | NULL    | NULL      |   13 |                                              |
|  2 | DERIVED            | t              | ALL   | NULL          | NULL          | NULL    | NULL      |   68 | Using filesort                               |
|  3 | DEPENDENT SUBQUERY | transactionlog | index | NULL          | transactionid | 12      | NULL      |   68 | Using where; Using index                     |
|  4 | DEPENDENT SUBQUERY | transactionlog | ref   | tableid       | tableid       | 36      | func,func |    2 | Using where; Using temporary; Using filesort |
+----+--------------------+----------------+-------+---------------+---------------+---------+-----------+------+----------------------------------------------+

Table scan
rows           13
+- DERIVED
   table          derived(t,transactionlog,temporary(transactionlog))
   +- DEPENDENT SUBQUERY
   |  +- Filesort
   |  |  +- TEMPORARY
   |  |     table          temporary(transactionlog)
   |  |     +- Filter with WHERE
   |  |        +- Bookmark lookup
   |  |           +- Table
   |  |           |  table          transactionlog
   |  |           |  possible_keys  tableid
   |  |           +- Index lookup
   |  |              key            transactionlog->tableid
   |  |              possible_keys  tableid
   |  |              key_len        36
   |  |              ref            func,func
   |  |              rows           2
   +- DEPENDENT SUBQUERY
   |  +- Filter with WHERE
   |     +- Index scan
   |        key            transactionlog->transactionid
   |        key_len        12
   |        rows           68
   +- Filesort
      +- Table scan
         rows           68
         +- Table
            table          t

That is a lot of work. I can write the equivalent logic in Python while making a single pass:

results = query('''
    select tableid, tupleid, transactionid, id, querytype
    from transactionlog_2warehouse
    order by tableid, tupleid, transactionid
''')
_tab, _tup, _txn = None, None, None
ids = []
weight = 0
saw_upd = False
for tab, tup, txn, id, qt in results:
    if (_tab, _tup) != (tab, tup):
        if len(ids) > 1 and weight > 0:
            print weight, ids
        weight = 0
        ids = []
        _tab, _tup = tab, tup
        _txn = None
    if _txn != txn:
        saw_upd = False
        _txn = txn
    if qt == 'update' and not saw_upd:
        weight += 1
        saw_upd = True
    ids += [id]
if len(ids) > 1 and weight > 0:
    print weight, ids

Is it possible to achieve the Python single-pass performance using pure SQL? Thanks in advance!
--
Yang Zhang
http://www.mit.edu/~y_z/
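[Editor's sketch] The dependent subquery can be decorrelated: the weight is just the number of distinct transactions that ran an update against each (tableid, tupleid), so it can be computed once in a derived table and joined back, removing both DEPENDENT SUBQUERY steps from the plan. A sketch of the rewrite, exercised here against SQLite with a reduced schema and made-up data (in MySQL the separator is written `GROUP_CONCAT(id SEPARATOR ' ')` instead):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- reduced version of the transactionlog schema from the post
    CREATE TABLE transactionlog (
        id            INTEGER PRIMARY KEY,
        transactionid TEXT    NOT NULL,
        tableid       TEXT    NOT NULL,
        tupleid       INTEGER NOT NULL,
        querytype     TEXT    NOT NULL
    );
    -- toy data: two transactions update ('orders', 1); ('orders', 2)
    -- has only one row, so it should be filtered by HAVING COUNT(*) > 1
    INSERT INTO transactionlog (transactionid, tableid, tupleid, querytype)
    VALUES ('t1', 'orders', 1, 'update'),
           ('t2', 'orders', 1, 'update'),
           ('t2', 'orders', 1, 'select'),
           ('t3', 'orders', 2, 'select');
""")

# weight = distinct transactions that updated (tableid, tupleid);
# the original "IN (... HAVING COUNT(*) > 0)" collapses to this filter,
# and the inner join drops weight-0 groups, replacing "HAVING weight > 0".
rows = conn.execute("""
    SELECT t.tableid, t.tupleid,
           GROUP_CONCAT(t.id, ' ') AS ids,
           w.weight
    FROM transactionlog t
    JOIN (SELECT tableid, tupleid,
                 COUNT(DISTINCT transactionid) AS weight
          FROM transactionlog
          WHERE querytype = 'update'
          GROUP BY tableid, tupleid) w
      ON w.tableid = t.tableid AND w.tupleid = t.tupleid
    GROUP BY t.tableid, t.tupleid
    HAVING COUNT(*) > 1
""").fetchall()
print(rows)
```

The derived table is materialized once with a single aggregation pass, so the plan should no longer re-run the weight computation per group; worth verifying with EXPLAIN on the real data.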
Re: optimization
From: Jaime Crespo Rincón jcre...@warp.es
Sent: Monday, January 25, 2010 5:30 PM

2010/1/25 John G. Heim jh...@math.wisc.edu:

I have a server with 16Gb of RAM and a dual-core 2Ghz processor. It is running the latest mysql-server from debian lenny (5.0.1). I have databases for drupal, moodle, spamassassin, horde3, and a small database for departmental stuff. The problem is that inserts/updates are sometimes very slow, on the order of a minute. I am hoping somebody can spot something wrong in my config. Here's the optimization settings section (for your convenience). The whole my.cnf is reproduced below that:

Are your databases using MyISAM or InnoDB?

Both. Maybe that's the problem? I started creating database tables for my own web apps with the default mysql configuration. I believe the default database engine is MyISAM. But then I wanted to use foreign keys and I saw that it required me to use InnoDB. So I converted some tables to InnoDB but not all. Maybe it was a mistake not to convert all of them. After that, I installed drupal, moodle, and mediawiki. I haven't looked at what kind of tables those packages create. They may not specify it and the default is still whatever it is when you install mysql, MyISAM I think.

* If MyISAM, you could be suffering contention problems on writes because of full table locks. No easy solution but engine change or database sharding. Also key_buffer (and the other buffers) could be too small for 16GB of RAM. Are you really using more than 10% of it? You could also disable other engines if unused.

* If InnoDB, you have not set innodb_buffer_pool_size nor log size. You could increase the pool to 50% of ram available.

Those are very general suggestions. It depends a lot on your hardware (slow storage?), other apps installed on the same machine or the load of the server, among others.

Well, it could be that the disks aren't real fast. The server is also running a print server (lprng). I don't think that's very CPU intensive but it might be slowing down writes.

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=arch...@jab.org
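[Editor's sketch] Since the engine mix across the drupal, moodle, and mediawiki schemas is unknown, a quick inventory answers the "MyISAM or InnoDB?" question directly (information_schema is available from MySQL 5.0 on):

```sql
SELECT table_schema, engine, COUNT(*) AS tables
FROM information_schema.tables
WHERE table_schema NOT IN ('mysql', 'information_schema')
GROUP BY table_schema, engine
ORDER BY table_schema, engine;
```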
Re: optimization
Get yourself a copy of the book High Performance MySQL 2nd Edition
Publisher: O'Reilly Media; 2 edition (June 18, 2008)
Language: English
ISBN-10: 0596101716
ISBN-13: 978-0596101718

Here is a brief preview of the first edition:
http://books.google.ca/books?id=iaCCQ13_zMIC&printsec=frontcover&dq=high+performance+mysql&cd=1#v=onepage&q&f=false

Mike

At 10:19 AM 1/26/2010, John G. Heim wrote:
[...]
Re: optimization
I would recommend the same to you about reading High Perf. MySQL as Baron, et al wrote a great book about performance on MySQL.

That being said, it has been my experience that in 99% of client cases they don't really need to run two different types of tables. If I were you, I would use InnoDB exclusively unless there is legitimate reason to do otherwise. In an environment that is running 25% writes and a decent query rate you are bound to have contention issues with MyISAM. While there are always going to be edge cases for MyISAM, your default should be innodb and your config should reflect this.

Changing your tables to InnoDB is a simple ALTER TABLE which you can script if there are a number of tables to convert. Allocate as much of your available RAM as possible to the innodb_buffer_pool (typically 50 - 80% of total RAM) and I bet you would see a dramatic difference. That is simplifying things somewhat, but should give an idea.

keith

On Tue, Jan 26, 2010 at 11:53 AM, mos mo...@fastmail.fm wrote:
[...]

--
Chief Training Officer
Paragon Consulting Services
850-637-3877
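[Editor's sketch] The "script it" step can be generated straight from information_schema: the query below prints one ALTER statement per MyISAM table, which you can review and then feed back into the mysql client.

```sql
SELECT CONCAT('ALTER TABLE `', table_schema, '`.`', table_name,
              '` ENGINE=InnoDB;')
FROM information_schema.tables
WHERE engine = 'MyISAM'
  AND table_schema NOT IN ('mysql', 'information_schema');
```

Each ALTER TABLE rebuilds the table, so run the statements during a maintenance window on large tables.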
Re: optimization
Just to be clear, you're suggesting I convert all of the spamassassin, drupal, and mediawiki tables to innodb too? Or just my own database? What about the mysql database itself? I wouldn't convert those tables, would I?

- Original Message -
From: Keith Murphy bmur...@paragon-cs.com
To: mysql@lists.mysql.com
Sent: Tuesday, January 26, 2010 11:06 AM
Subject: Re: optimization
[...]
Re: optimization
You absolutely *should not* convert the mysql database to InnoDB. Read the above sentence again :)

All others, unless you had a specific reason not to do so, yes, I would convert them.

keith

On Tue, Jan 26, 2010 at 2:18 PM, John G. Heim jh...@math.wisc.edu wrote:
[...]

--
Chief Training Officer
Paragon Consulting Services
850-637-3877
Re: optimization
On Tue, Jan 26, 2010 at 11:23 AM, Keith Murphy bmur...@paragon-cs.com wrote:
[...]
optimization
I have a server with 16Gb of RAM and a dual-core 2Ghz processor. It is running the latest mysql-server from debian lenny (5.0.1). I have databases for drupal, moodle, spamassassin, horde3, and a small database for departmental stuff. The problem is that inserts/updates are sometimes very slow, on the order of a minute. I am hoping somebody can spot something wrong in my config. Here's the optimization settings section (for your convenience). The whole my.cnf is reproduced below that:

key_buffer = 256M
max_allowed_packet = 16M
max_connections = 2000
myisam_sort_buffer_size = 64M
open_files_limit = 8192
query_cache_min_res_unit = 2K
query_cache_size = 36M
read_buffer_size = 1M
read_rnd_buffer_size = 4M
sort_buffer_size = 1M
table_cache = 512
thread_cache = 32
thread_cache_size = 8
thread_concurrency = 8

---
# Example MySQL config file for large systems.
#
# This is for a large system with memory = 512M where the system runs mainly
# MySQL.
#
# You can copy this file to
# /etc/my.cnf to set global options,
# mysql-data-dir/my.cnf to set server-specific options (in this
# installation this directory is /var/lib/mysql) or
# ~/.my.cnf to set user-specific options.
#
# In this file, you can use all long options that a program supports.
# If you want to know which options a program supports, run the program
# with the --help option.

# The following options will be passed to all MySQL clients
[client]
#password = your_password
port = 3306
socket = /var/run/mysqld/mysqld.sock

# Here follows entries for some specific programs

# The MySQL server
[mysqld]
log_slow_queries = /var/log/mysql/mysql-slow.log
long_query_time = 2
#log-queries-not-using-indexes
port = 3306
socket = /var/run/mysqld/mysqld.sock
skip-locking
#wait_timeout = 14400
key_buffer = 256M
max_allowed_packet = 16M
max_connections = 2000
myisam_sort_buffer_size = 64M
open_files_limit = 8192
query_cache_min_res_unit = 2K
query_cache_size = 36M
read_buffer_size = 1M
read_rnd_buffer_size = 4M
sort_buffer_size = 1M
table_cache = 512
thread_cache = 32
thread_cache_size = 8
thread_concurrency = 8

# Don't listen on a TCP/IP port at all. This can be a security enhancement,
# if all processes that need to connect to mysqld run on the same host.
# All interaction with mysqld must be made via Unix sockets or named pipes.
# Note that using this option without enabling named pipes on Windows
# (via the enable-named-pipe option) will render mysqld useless!
#
#skip-networking

# Replication Master Server (default)
# binary logging is required for replication
log-bin=mysql-bin
expire_logs_days = 10
max_binlog_size = 100M

# required unique id between 1 and 2^32 - 1
# defaults to 1 if master-host is not set
# but will not function as a master if omitted
server-id = 1

# Replication Slave (comment out master section to use this)
#
# To configure this host as a replication slave, you can choose between
# two methods :
#
# 1) Use the CHANGE MASTER TO command (fully described in our manual) -
#    the syntax is:
#
#    CHANGE MASTER TO MASTER_HOST=<host>, MASTER_PORT=<port>,
#    MASTER_USER=<user>, MASTER_PASSWORD=<password>;
#
#    where you replace <host>, <user>, <password> by quoted strings and
#    <port> by the master's port number (3306 by default).
#
#    Example:
#
#    CHANGE MASTER TO MASTER_HOST='125.564.12.1', MASTER_PORT=3306,
#    MASTER_USER='joe', MASTER_PASSWORD='secret';
#
# OR
#
# 2) Set the variables below. However, in case you choose this method, then
#    start replication for the first time (even unsuccessfully, for example
#    if you mistyped the password in master-password and the slave fails to
#    connect), the slave will create a master.info file, and any later
#    change in this file to the variables' values below will be ignored and
#    overridden by the content of the master.info file, unless you shutdown
#    the slave server, delete master.info and restart the slave server.
#    For that reason, you may want to leave the lines below untouched
#    (commented) and instead use CHANGE MASTER TO (see above)
#
# required unique id between 2 and 2^32 - 1
# (and different from the master)
# defaults to 2 if master-host is set
# but will not function as a slave if omitted
#server-id = 2
#
# The replication master for this slave - required
#master-host = <hostname>
#
# The username the slave will use for authentication when connecting
# to the master - required
#master-user = <username>
#
# The password the slave will authenticate with when connecting to
# the master - required
#master-password = <password>
#
# The port the master is listening on.
# optional - defaults to 3306
#master-port = <port>
#
# binary logging - not required for slaves, but recommended
#log-bin=mysql-bin

# Point the following paths to different dedicated disks
#tmpdir = /tmp/
#log-update = /path-to-dedicated-directory/hostname

# Uncomment the following if you are using BDB tables
#bdb_cache_size = 64M
#bdb_max_lock = 10

# Uncomment
Re: optimization
2010/1/25 John G. Heim jh...@math.wisc.edu:

I have a server with 16Gb of RAM and a dual-core 2Ghz processor. It is running the latest mysql-server from debian lenny (5.0.1). I have databases for drupal, moodle, spamassassin, horde3, and a small database for departmental stuff. The problem is that inserts/updates are sometimes very slow, on the order of a minute. I am hoping somebody can spot something wrong in my config. Here's the optimization settings section (for your convenience). The whole my.cnf is reproduced below that:

Are your databases using MyISAM or InnoDB?

* If MyISAM, you could be suffering contention problems on writes because of full table locks. No easy solution but engine change or database sharding. Also key_buffer (and the other buffers) could be too small for 16GB of RAM. Are you really using more than 10% of it? You could also disable other engines if unused.

* If InnoDB, you have not set innodb_buffer_pool_size nor log size. You could increase the pool to 50% of ram available.

Those are very general suggestions. It depends a lot on your hardware (slow storage?), other apps installed on the same machine or the load of the server, among others.

There also exists a tool to get introduced into MySQL server variables tuning:

Tuning primer - https://launchpad.net/mysql-tuning-primer

It is also a very general tool, but it could be helpful for a starting point.

Regards,
--
Jaime Crespo
MySQL & Java Instructor
Warp Networks
http://warp.es
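[Editor's sketch] For a 16 GB box along the lines of the two bullet points above, a starting config might look like this. The numbers are illustrative assumptions, not measured recommendations, and note that on MySQL 5.0 changing innodb_log_file_size requires a clean shutdown and removal of the old ib_logfile* files before restart:

```ini
[mysqld]
# InnoDB: if most tables end up in InnoDB, give it the bulk of the RAM
innodb_buffer_pool_size = 8G
innodb_log_file_size    = 256M

# MyISAM: key_buffer caches only indexes; keep it modest as MyISAM shrinks
key_buffer = 512M
```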
RE: mysql server optimization
What kind of queries are being run and what type of data is stored? There are a number of factors which cause MySQL to use on-disk temporary tables instead of in-memory tables (if there are BLOB or TEXT columns in the table, for example). As a starting point you could (if you have the memory in your box) try increasing the values for tmp_table_size and max_heap_table_size; these control how large an in-memory temp table can grow before it is converted to an on-disk temp table.

Regards

John
===
John Daisley
MySQL 5.0 Certified Database Administrator (CMDBA)
MySQL 5.0 Certified Developer
Cognos BI Developer
Telephone: +44(0)1283 537111
Mobile: +44(0)7812 451238
Email: john.dais...@butterflysystems.co.uk
===
Sent via HP IPAQ mobile device

-Original Message-
From: TianJing jingtian.seu...@gmail.com
Sent: 15 December 2009 03:08
To: mysql@lists.mysql.com
Subject: mysql server optimization

Dear all,

I am now having a problem with MySQL server optimization. I have 20 databases on a server, each database about 80Gb. The SQL seems very slow, almost 5s, and the server I/O is very high; when I check the processlist, the 'copying to tmp table' state takes a long time.

I have already used indexes, but the SQL uses lots of 'and', 'or', 'order by', and for some reason I cannot optimize the SQL itself. I hope to do some optimization on the MySQL server to mitigate this; could anyone give me some suggestions? Thanks.

My server is Linux, 8 CPUs and 4G memory. The my.cnf is:

[mysqld]
port = 3306
skip-locking
skip-name-resolve
key_buffer_size = 16M
max_allowed_packet = 1M
table_open_cache = 64
sort_buffer_size = 512M
net_buffer_length = 8K
read_buffer_size = 512K
read_rnd_buffer_size = 512M
myisam_sort_buffer_size = 8M
table_cache = 1024
log-bin=mysql-bin
binlog_format=mixed

--
Tianjing
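[Editor's sketch] Before raising the temp-table limits, note that sort_buffer_size and read_rnd_buffer_size in the my.cnf above are allocated per connection, not shared; at 512M each, a handful of concurrent sorting sessions can exhaust RAM, which matches the out-of-memory crash reported later in this thread. A safer starting point for a 4G box (values are illustrative assumptions):

```ini
[mysqld]
# per-session buffers: allocated per connection, keep them small
sort_buffer_size     = 2M
read_rnd_buffer_size = 2M
read_buffer_size     = 512K

# in-memory temp tables: the smaller of these two limits applies,
# so raise them together
tmp_table_size       = 256M
max_heap_table_size  = 256M

# MyISAM index cache: shared, allocated once per server
key_buffer_size      = 512M
```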
Re: mysql server optimization
i use MyISAM storage,the MYI file is more than 500Gb. most of the sql is select with many 'and','or','order by',for example: SELECT fref,fstart,fstop,fsource,fmethod,fscore,fstrand,fphase,gclass,gname,ftarget_start,ftarget_stop,fdata.fid,fdata.gid FROM fdata,ftype,fgroup WHERE fgroup.gid = fdata.gid AND ftype.ftypeid = fdata.ftypeid AND fdata.fref='chr18' AND (fbin='100' OR fbin between '10' and '10' OR fbin between '.999' and '1' OR fbin between '1000.049' and '1000.051' OR fbin between '100.549' and '100.551' OR fbin between '10.0005529' and '10.0005531' OR fbin between '1.0055379' and '1.0055381' OR fbin between '1000.0553839' and '1000.0553841') AND fdata.fstop='55384910' AND fdata.fstart='55384989' AND ((fmethod = 'genotype' AND fsource = 'SoapSNP') OR (fmethod = 'gt_novel' AND fsource = 'SoapSNP') OR (fmethod = 'SNP' AND fsource = 'MutaGeneSys_06JUNE07') OR (fmethod = 'snp' AND fsource = 'HapMap_gt') OR (fmethod = 'similarity') OR (fmethod = 'HSP') OR (fmethod = 'match') OR (fmethod = 'CDS' AND fsource = 'UCSC_1') OR (fmethod = '5\'-UTR' AND fsource = 'UCSC_1') OR (fmethod = '3\'-UTR' AND fsource = 'UCSC_1') OR (fmethod = 'transcription_start_site' AND fsource = 'UCSC_1') OR (fmethod = 'polyA_site' AND fsource = 'UCSC_1') OR (fmethod = 'UTR' AND fsource = 'UCSC_1') OR (fmethod = 'five_prime_untranslated_region' AND fsource = 'UCSC_1') OR (fmethod = 'three_prime_untranslated_region' AND fsource = 'UCSC_1') OR (fmethod = 'five_prime_UTR' AND fsource = 'UCSC_1') OR (fmethod = 'three_prime_UTR' AND fsource = 'UCSC_1') OR (fmethod = 'exon' AND fsource = 'UCSC_1') OR (fmethod = 'mRNA' AND fsource = 'UCSC_1')) ORDER BY fgroup.gname; there is a longblob column in the table, but even i do not select this blob column, the i/o is still higher. i have 16G memery in total, the oracle database take 1/4 of them, i try to increasing the key_buffer_size to 2048M and tmp_table_size to 1024M,but the machine crashed out of memery in one day,the mysqld takes over 18G memery! 
I used SHOW PROFILES and found that the 'Sending data' and 'Copying to tmp table' states take most of the time.

On Tue, Dec 15, 2009 at 5:36 PM, John Daisley mg_s...@hotmail.com wrote:

What kind of queries are being run, and what type of data is stored? There are a number of factors which cause MySQL to use on-disk temporary tables instead of in-memory tables (if there are BLOB or TEXT columns in the table, for example). As a starting point you could, if you have the memory in your box, try increasing the values for tmp_table_size and max_heap_table_size; these control how large an in-memory temp table can grow before it is converted to an on-disk temp table.

Regards
John
===
John Daisley
MySQL 5.0 Certified Database Administrator (CMDBA)
MySQL 5.0 Certified Developer
Cognos BI Developer
Telephone: +44(0)1283 537111
Mobile: +44(0)7812 451238
Email: john.dais...@butterflysystems.co.uk
===
Sent via HP IPAQ mobile device

-----Original Message-----
From: TianJing jingtian.seu...@gmail.com
Sent: 15 December 2009 03:08
To: mysql@lists.mysql.com
Subject: mysql server optimization

Dear all, I am having a problem with MySQL server optimization. I have 20 databases on a server, each about 80GB. The SQL seems very slow, almost 5s per query, and the server I/O is very high; when I check the processlist, the 'Copying to tmp table' state lasts a long time. I already use indexes, but the SQL uses lots of AND, OR and ORDER BY, and for some reasons I cannot optimize the SQL itself. I hope to do some optimization on the MySQL server to mitigate this; could anyone give me some suggestions? Thanks.
My server is Linux, 8 CPUs and 4GB of memory. The my.cnf is:

[mysqld]
port = 3306
skip-locking
skip-name-resolve
key_buffer_size = 16M
max_allowed_packet = 1M
table_open_cache = 64
sort_buffer_size = 512M
net_buffer_length = 8K
read_buffer_size = 512K
read_rnd_buffer_size = 512M
myisam_sort_buffer_size = 8M
table_cache = 1024
log-bin=mysql-bin
binlog_format=mixed

--
Tianjing
Tel: 0755-2527-3851
MSN: tianjing...@hotmail.com
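Note that sort_buffer_size and read_rnd_buffer_size in this config are allocated per connection, so 512M values can exhaust memory under even modest concurrency, which matches the out-of-memory crash described earlier in the thread. A more conservative sketch for a 4GB MyISAM-only box (values are illustrative starting points, not tuned figures):

```ini
[mysqld]
# MyISAM index cache: the one global buffer worth making large
key_buffer_size      = 1G
# Per-connection buffers: kept small, since each connection may
# allocate its own copy
sort_buffer_size     = 4M
read_buffer_size     = 1M
read_rnd_buffer_size = 4M
# In-memory temp table ceiling; the effective limit is the smaller
# of these two values, so raise them together
tmp_table_size       = 256M
max_heap_table_size  = 256M
```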
RE: mysql server optimization
I'm fairly sure that the longblob column will prevent MySQL from using in-memory temp tables, regardless of whether it is included in the SELECT. In an ideal world I would move that longblob to a separate table.

How big are the tables fdata, ftype and fgroup? Can you post the results of EXPLAIN EXTENDED for the query you posted? Can you also post the output of 'SHOW CREATE TABLE' for the tables fdata, ftype and fgroup?

John Daisley
Business Intelligence Developer - MySQL Database Administrator
Inspired Gaming Group Plc
Direct Dial +44 (0)1283 519244
Telephone +44 (0)1283 512777 ext 2244
Mobile +44 (0)7812 451238
Email john.dais...@llg.co.uk
www.inspiredgaminggroup.com

-----Original Message-----
From: TianJing tianj...@genomics.org.cn
Sent: 15 December 2009 10:28
To: mg_s...@hotmail.com
Cc: mysql@lists.mysql.com
Subject: Re: mysql server optimization
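The diagnostics requested above can be gathered like this (a sketch using the table names from the thread; EXPLAIN EXTENDED is the MySQL 5.x-era syntax):

```sql
-- Optimizer plan and row estimates for a query shaped like the problem one
EXPLAIN EXTENDED
SELECT fref, fstart, fstop, gclass, gname
FROM fdata, ftype, fgroup
WHERE fgroup.gid = fdata.gid
  AND ftype.ftypeid = fdata.ftypeid
  AND fdata.fref = 'chr18'
ORDER BY fgroup.gname;
-- After EXPLAIN EXTENDED, this prints the query as rewritten by the optimizer
SHOW WARNINGS;

-- Full table definitions, including all indexes
SHOW CREATE TABLE fdata\G
SHOW CREATE TABLE ftype\G
SHOW CREATE TABLE fgroup\G
```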
Re: mysql server optimization
; Using filesort |
| 1 | SIMPLE | ftype  | eq_ref | PRIMARY,ftype,fmethod,fsource | PRIMARY | 4 | yhchr1.fdata.ftypeid | 1 | 100.00 | Using where |
| 1 | SIMPLE | fgroup | eq_ref | PRIMARY                       | PRIMARY | 4 | yhchr1.fdata.gid     | 1 | 100.00 |             |
+---+--------+--------+--------+-------------------------------+---------+---+----------------------+---+--------+-------------+

On Tue, Dec 15, 2009 at 6:54 PM, Daisley, John (Burton) john.dais...@llg.co.uk wrote:

I'm fairly sure that the longblob column will prevent MySQL from using in-memory temp tables, regardless of whether it is included in the SELECT. In an ideal world I would move that longblob to a separate table. How big are the tables fdata, ftype and fgroup? Can you post the results of EXPLAIN EXTENDED for the query you posted? Can you also post the output of 'SHOW CREATE TABLE' for the tables fdata, ftype and fgroup?
RE: mysql server optimization
The 'order by' clause is forcing MySQL to use a temporary table; as a test, could you try running the query without the ORDER BY clause? Does it run quicker? MySQL must use a temporary table if you GROUP BY or ORDER BY a column not contained in the first table of the join; sometimes you can get around this limitation just by rewriting the SELECT. Give it a try, it can make a huge difference.

Do the tables have a lot of INSERT, UPDATE and DELETE operations? If so, it may be worth running OPTIMIZE TABLE to rebuild the indexes. Be warned: it can take some time to complete, and the tables are locked while it runs. If you have slaves running, you may want to add LOCAL to the OPTIMIZE TABLE statement so the operation does not run on your slaves.

I'm stuck doing some other stuff at the moment, but I will try to have a proper look at this later and will get back to you.

Regards

John Daisley
Business Intelligence Developer - MySQL Database Administrator
Inspired Gaming Group Plc
Direct Dial +44 (0)1283 519244
Telephone +44 (0)1283 512777 ext 2244
Mobile +44 (0)7812 451238

From: jingtian.seu...@gmail.com [mailto:jingtian.seu...@gmail.com] On Behalf Of TianJing
Sent: 15 December 2009 12:43
To: Daisley, John (Burton)
Cc: mysql@lists.mysql.com
Subject: Re: mysql server optimization

Yes, you are right; the longblob has already been moved to a separate table, fdna, which is about 10GB per database. The fdata table is about 30GB per database and fgroup about 10GB per database. The MYI file is almost the same size as, or much bigger than, the MYD file.
show create table for fdna:

CREATE TABLE `fdna` (
  `fref` varchar(100) NOT NULL,
  `foffset` int(10) unsigned NOT NULL,
  `fdna` longblob,
  PRIMARY KEY (`fref`,`foffset`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

show create table for fdata:

CREATE TABLE `fdata` (
  `fid` int(11) NOT NULL AUTO_INCREMENT,
  `fref` varchar(100) NOT NULL,
  `fstart` int(10) unsigned NOT NULL,
  `fstop` int(10) unsigned NOT NULL,
  `fbin` double(20,6) NOT NULL,
  `ftypeid` int(11) NOT NULL,
  `fscore` float DEFAULT NULL,
  `fstrand` enum('+','-') DEFAULT NULL,
  `fphase` enum('0','1','2') DEFAULT NULL,
  `gid` int(11) NOT NULL,
  `ftarget_start` int(10) unsigned DEFAULT NULL,
  `ftarget_stop` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`fid`),
  UNIQUE KEY `fref` (`fref`,`fbin`,`fstart`,`fstop`,`ftypeid`,`gid`),
  KEY `ftypeid` (`ftypeid`),
  KEY `gid` (`gid`)
) ENGINE=MyISAM AUTO_INCREMENT=381002371 DEFAULT CHARSET=latin1

show create table for fgroup:

CREATE TABLE `fgroup` (
  `gid` int(11) NOT NULL AUTO_INCREMENT,
  `gclass` varchar(100) DEFAULT NULL,
  `gname` varchar(100) DEFAULT NULL,
  PRIMARY KEY (`gid`),
  UNIQUE KEY `gclass` (`gclass`,`gname`)
) ENGINE=MyISAM AUTO_INCREMENT=240501186 DEFAULT CHARSET=latin1

show create table for ftype:

CREATE TABLE `ftype` (
  `ftypeid` int(11) NOT NULL AUTO_INCREMENT,
  `fmethod` varchar(100) NOT NULL,
  `fsource` varchar(100) DEFAULT NULL,
  PRIMARY KEY (`ftypeid`),
  UNIQUE KEY `ftype` (`fmethod`,`fsource`),
  KEY `fmethod` (`fmethod`),
  KEY `fsource` (`fsource`)
) ENGINE=MyISAM AUTO_INCREMENT=16 DEFAULT CHARSET=latin1

the indexes on fdata are:

| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Index_type |
| fdata | 0          | PRIMARY  | 1            | fid         | A         |   463619315 | BTREE      |
| fdata | 0          | fref     | 1            | fref        | A         |           1 | BTREE      |
| fdata | 0          | fref     | 2            | fbin        | A         |      229060 | BTREE      |
| fdata | 0          | fref     | 3            | fstart      | A         |   231809657 | BTREE      |
| fdata | 0          | fref     | 4            | fstop       | A         |   463619315 | BTREE      |
| fdata | 0          | fref     | 5            | ftypeid     | A         |   463619315 | BTREE      |
| fdata | 0          | fref     | 6            | gid         | A         |   463619315 | BTREE      |
| fdata | 1          | ftypeid  | 1            | ftypeid     | A         |          15 | BTREE      |
| fdata | 1          | gid      | 1            | gid         | A         |   231809657 | BTREE      |

the index listing for fgroup is:

| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null
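John's two suggestions above can be expressed concretely (a sketch; whether the sort actually disappears depends on the join order the optimizer chooses):

```sql
-- 1. The same query without ORDER BY, to isolate the cost of the
--    filesort / temporary table
SELECT fref, fstart, fstop, gclass, gname
FROM fdata, ftype, fgroup
WHERE fgroup.gid = fdata.gid
  AND ftype.ftypeid = fdata.ftypeid
  AND fdata.fref = 'chr18';

-- 2. Rebuild the indexes; LOCAL keeps the statement out of the binary
--    log so it is not replicated to any slaves
OPTIMIZE LOCAL TABLE fdata, ftype, fgroup;
```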
sql file system + optimization
Ok... in the file system, my MySQL files are located in /var/lib/mysql/. I have a database (let's say 'test'), so it gets its own directory, /var/lib/mysql/test/, and all the tables go in this folder. (I know, we *should* all know this...)

If the database were on another disk, could that increase performance for a large database? If so, could I just symlink the test/ directory to another RAID array to increase performance? Or would the increase be negligible?

Steven Staples

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=arch...@jab.org
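A sketch of the symlink approach, using throwaway directories in place of the live datadir and RAID mount (all paths here are illustrative; on a real server you would stop mysqld first):

```shell
# Stand-ins for /var/lib/mysql and the RAID array mount point
mkdir -p demo/mysql/test demo/raid
touch demo/mysql/test/messages.MYD demo/mysql/test/messages.MYI

# Move the database directory to the fast disk, then link it back
mv demo/mysql/test demo/raid/test
ln -s ../raid/test demo/mysql/test

# The server still finds the tables at the old path
ls demo/mysql/test/
```

On a real system the link target would be an absolute path on the other array, and ownership must stay with the mysql user (chown -h mysql:mysql on the link).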
RE: sql file system + optimization
Yes, you can symlink it. How much performance benefit you get will depend on hardware and traffic.

===
John Daisley
MySQL 5.0 Certified Database Administrator (CMDBA)
MySQL 5.0 Certified Developer
Cognos BI Developer
Telephone: +44(0)1283 537111
Mobile: +44(0)7812 451238
Email: john.dais...@butterflysystems.co.uk
===
Sent via HP IPAQ mobile device

-----Original Message-----
From: Steven Staples sstap...@mnsi.net
Sent: 15 December 2009 14:37
To: mysql@lists.mysql.com
Subject: sql file system + optimization
Re: mysql server optimization
Yes, it runs faster without the ORDER BY clause, since it does not use filesort then. Because of the huge amount of data, it takes a lot of time to copy it to the tmp table (or to an on-disk tmp table) when filesort is used, which also leads to the high I/O wait.

I am trying to increase the variable max_length_for_sort_data to 8096, hoping this helps MySQL use the more efficient sorting algorithm; theoretically this can reduce the I/O. I am also trying to use a tmpfs filesystem as tmpdir, which can greatly improve read and write speed on disk. I hope these help; do you have any suggestions on this?

There are only SELECT operations in my database, and I have done some optimization with 'myisamchk -aS', but the effect was not very good. I am sure it would be greatly improved if we rewrote the SELECT, but unfortunately this SQL is inside packaged software and I have no rights to rewrite it, so tuning the MySQL server is my only option.

Thanks again for your help. All the best.

On Tue, Dec 15, 2009 at 9:59 PM, Daisley, John (Burton) john.dais...@llg.co.uk wrote:

The 'order by' clause is forcing MySQL to use a temporary table; as a test, could you try running the query without the ORDER BY clause?
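The tmpfs idea above looks like this in practice (an illustrative sketch; the mount point and size are assumptions, not values from the thread):

```ini
# /etc/fstab entry: a RAM-backed filesystem for MySQL's on-disk temp tables
# tmpfs  /var/mysqltmp  tmpfs  rw,uid=mysql,gid=mysql,size=2G  0  0

# my.cnf fragment pointing MySQL's temporary files at that mount
[mysqld]
tmpdir = /var/mysqltmp
```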
Optimization suggestions
I have the following table structure; I have to use the MERGE storage engine. Please have a look, and provide feedback if there's something wrong or if there's room for optimization.

/* Table structure for table `messages2009` */
CREATE TABLE `messages2009` (
  `id` varchar(36) NOT NULL default '',
  `folderid` varchar(36) NOT NULL default '',
  `fromid` int(11) NOT NULL default '0',
  `fromtype` varchar(10) NOT NULL default '',
  `toid` int(11) NOT NULL default '0',
  `totype` varchar(10) NOT NULL default '',
  `subject` varchar(255) default NULL,
  `body` text,
  `readbyrecipient` tinyint(1) NOT NULL default '0',
  `deletedbyauthor` tinyint(1) NOT NULL default '0',
  `deletedbyrecipient` tinyint(1) NOT NULL default '0',
  `threadid` varchar(36) NOT NULL default '',
  `senttime` timestamp NOT NULL default CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  KEY `folderid` (`folderid`),
  KEY `threadid` (`threadid`),
  KEY `inboxfolderindex` (`folderid`,`toid`,`totype`),
  KEY `sentitemsindex` (`fromid`,`fromtype`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

/* Table structure for table `messages` */
/* Merge table definition that covers all message tables */
CREATE TABLE `messages` (
  `id` varchar(36) NOT NULL default '',
  `folderid` varchar(36) NOT NULL default '',
  `fromid` int(11) NOT NULL default '0',
  `fromtype` varchar(10) NOT NULL default '',
  `toid` int(11) NOT NULL default '0',
  `totype` varchar(10) NOT NULL default '',
  `subject` varchar(255) default NULL,
  `body` text,
  `readbyrecipient` tinyint(1) NOT NULL default '0',
  `deletedbyauthor` tinyint(1) NOT NULL default '0',
  `deletedbyrecipient` tinyint(1) NOT NULL default '0',
  `threadid` varchar(36) NOT NULL default '',
  `senttime` timestamp NOT NULL default CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  KEY `folderid` (`folderid`),
  KEY `threadid` (`threadid`),
  KEY `inboxfolderindex` (`folderid`,`toid`,`totype`),
  KEY `sentitemsindex` (`fromid`,`fromtype`)
) ENGINE=MRG_MyISAM DEFAULT CHARSET=latin1 INSERT_METHOD=LAST UNION=(`messages2009`);

Sudhir Nimavat
Senior software engineer
Quick start global PVT LTD, Baroda - 390007, Gujarat, India

Personally I'm always ready to learn, although I do not always like being taught

The INTERNET now has a personality. YOURS! See your Yahoo! Homepage. http://in.yahoo.com/
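One maintenance task this per-year layout implies is adding a new underlying table each year and extending the merge definition; a sketch (the `messages2010` name is an assumption following the naming pattern above):

```sql
-- Create next year's table with the same columns, keys and engine
CREATE TABLE messages2010 LIKE messages2009;

-- Point the MERGE table at both years; new rows go to the last table
ALTER TABLE messages
  UNION = (messages2009, messages2010),
  INSERT_METHOD = LAST;
```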
Re: Optimization suggestions
After one very quick look: the index on folderid alone is unnecessary, since you have another index (inboxfolderindex) in which that field is the leading, high-order column.

On Mon, Dec 14, 2009 at 12:31 PM, Sudhir N sudhir_nima...@yahoo.com wrote:

I have the following table structure; I have to use the MERGE storage engine. Please have a look, and provide feedback if there's something wrong or if there's room for optimization.

--
Jim Lyons
Web developer / Database administrator
http://www.weblyons.com
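Jim's point, stated as a schema change (a sketch; it would be run against each underlying MyISAM table):

```sql
-- folderid is the leading column of inboxfolderindex (folderid, toid, totype),
-- so queries filtering on folderid alone can use that index instead
ALTER TABLE messages2009 DROP INDEX folderid;
```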
RE: Optimization suggestions
id should probably be an auto-incrementing INT; if you still need a unique text identifier, I would make that a separate field. My opinion isn't the only way, though; there is much debate on natural vs. surrogate keys. I would normalize folderid into a lookup in another table and make folderid an INT value. threadid is another field that would probably be better as an INT.

As for your indexes, they depend completely on the type of queries you're going to run. Once you know that, you can test them using sample data and EXPLAIN.

http://dev.mysql.com/doc/refman/5.0/en/explain.html
http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html

About normalization:
http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html

Regards,
Gavin Towey

-----Original Message-----
From: Sudhir N [mailto:sudhir_nima...@yahoo.com]
Sent: Monday, December 14, 2009 10:31 AM
To: Mysql
Subject: Optimization suggestions

I have the following table structure; I have to use the MERGE storage engine. Please have a look, and provide feedback if there's something wrong or if there's room for optimization.
This message contains confidential information and is intended only for the individual named. If you are not the named addressee, you are notified that reviewing, disseminating, disclosing, copying or distributing this e-mail is strictly prohibited. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. E-mail transmission cannot be guaranteed to be secure or error-free, as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any loss or damage caused by viruses or errors or omissions in the contents of this message, which arise as a result of e-mail transmission. [FriendFinder Networks, Inc., 220 Humbolt court, Sunnyvale, CA 94089, USA, FriendFinder.com
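A sketch of the folder lookup table Gavin describes (the table and column names here are illustrative assumptions, not part of the original schema):

```sql
-- Lookup table with a 4-byte INT surrogate key replacing the varchar(36) id
CREATE TABLE folders (
  folderid   INT NOT NULL AUTO_INCREMENT,
  foldername VARCHAR(100) NOT NULL,
  PRIMARY KEY (folderid),
  UNIQUE KEY (foldername)
) ENGINE=MyISAM;

-- messages2009 would then declare:
--   `folderid` INT NOT NULL,
--   KEY `inboxfolderindex` (`folderid`, `toid`, `totype`)
-- making every index on folderid considerably smaller
```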
mysql server optimization
Dear all, I am having a problem with MySQL server optimization. I have 20 databases on a server, each about 80GB. The SQL seems very slow, almost 5s per query, and the server I/O is very high; when I check the processlist, the 'Copying to tmp table' state takes a long time. I already use indexes, but the SQL uses lots of AND, OR and ORDER BY, and for some reasons I cannot optimize the SQL itself. I hope to do some optimization on the MySQL server to mitigate this; could anyone give me some suggestions? Thanks.

My server is Linux, 8 CPUs and 4GB of memory. The my.cnf is:

[mysqld]
port = 3306
skip-locking
skip-name-resolve
key_buffer_size = 16M
max_allowed_packet = 1M
table_open_cache = 64
sort_buffer_size = 512M
net_buffer_length = 8K
read_buffer_size = 512K
read_rnd_buffer_size = 512M
myisam_sort_buffer_size = 8M
table_cache = 1024
log-bin=mysql-bin
binlog_format=mixed

--
Tianjing