RE: NoSQL help
I tried with the MyISAM engine also, but it also takes more time to generate the report. My database has 8 InnoDB tables, and I am joining 4 of them to get the report. I am keeping 60 days of records, because users may generate reports over those 60 days at second, minute, hourly, weekly, and monthly granularity.

From: Ananda Kumar [mailto:anan...@gmail.com]
Sent: Thursday, June 14, 2012 12:32 AM
To: Rick James
Cc: Johan De Meersman; Manivannan S.; mysql@lists.mysql.com
Subject: Re: NoSQL help

Did you try with MyISAM tables? They are supposed to be good for reporting requirements.

On Wed, Jun 13, 2012 at 11:52 PM, Rick James rja...@yahoo-inc.com wrote:

I'll second Johan's comments. Count the disk hits!

One minor change: don't store averages in the summary table; instead store the SUM(). That lets you get the mathematically correct average over any time range via SUM(sum_foo) / SUM(count_foo).

Switching between MySQL and Mongo requires rewriting _all_ of the relevant code. <opinion> NoSQL will be no better than MySQL for 150GB. </opinion> Count the disk hits!

I recently built a system that topped out at 350GB (90 days' data). It involved hourly ingestion of a few GB of data and a variety of reports. The prototype showed that most reports would take about an hour to run. Not good. The final product, with summary tables, lets the reports be run on demand and online, and each takes only a few seconds. By careful use of MEMORY tables, LOAD DATA, etc., the ingestion takes 5 minutes (each hour) for the raw data and 2 minutes (total) for the 7 summary tables.

PARTITIONing was vital for the design. Once an hour a new partition is populated; once a day, 24 hourly partitions are rolled into a new daily partition and the 90-day-old partition is DROPped.

-----Original Message-----
From: Johan De Meersman [mailto:vegiv...@tuxera.be]
Sent: Wednesday, June 13, 2012 6:20 AM
To: Manivannan S.
Cc: mysql@lists.mysql.com
Subject: Re: NoSQL help

----- Original Message -----
From: Manivannan S. manivanna...@spanservices.com

> Hi all,
> [lots of data] [slow reports] [wooo NoSQL magic]

Not that I want to discourage you, but my standard first question is: why do you think NoSQL (let alone any specific product) is the right solution? :-) Don't get me wrong, it might be; but from what little I now know about your environment, it sounds like applying some data warehousing techniques might suffice - and being the cynical dinosaur that I am, I have a healthy reluctance about welding new technology onto a stable environment.

To speed up reporting (and note that these techniques are often applied even when implementing NoSQL solutions, too), it is usually a good first step to set up a process of data summarization. Basically, you pre-calculate averages, medians, groupings, whatever you need for your reports; your job also saves the last record ID it has processed; then on the next run, you only read the new records and update your summary tables to incorporate the new data.
Suppose I have a table like this:

  ID | Val
  ---+----
   1 |   1
   2 |   7
   3 |   5
   4 |  13

I want to report the average on a daily basis, and calculating that over those rows is unbearably slow because I'm running the process on a wristwatch from 1860 :-) So I get a summary table, calculate (1+7+5+13)/4 = 6.5, and that then gets a record saying this:

  Avg | elementCount | lastSeen
  ----+--------------+---------
  6.5 |            4 |        4

Now, over the course of the day, the elements 4, 17 and 2 get added with sequential row numbers. Instead of calculating (1+7+5+13+4+17+2)/7, which would be slow, I can substitute the already summarized data by Avg*elementCount. Thus, I calculate (6.5*4 + 4+17+2)/7 = 7, which is a lot faster, and my summary table now looks like this:

  Avg | elementCount | lastSeen
  ----+--------------+---------
    7 |            7 |        7

This is of course a stupid example, but it saves you a lot of time if you already have the summary of several thousand elements and only need to update it for a handful. Similar tricks are possible for a lot of typical reporting stuff - you don't need to re-calculate data for past months over and over again, for instance - and that's what makes your reports run fast.

Just my 2 cents :-)

/johan
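To make the SUM()/COUNT() approach concrete, here is a minimal sketch in MySQL. All table and column names (raw_data, summary_daily, ts, val) are hypothetical, not from the thread:

  -- Hypothetical raw table: one row per measurement.
  CREATE TABLE raw_data (
    id  BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    ts  DATETIME NOT NULL,
    val INT NOT NULL
  );

  -- The summary table stores SUM and COUNT, never the average itself,
  -- so averages over arbitrary ranges stay mathematically correct.
  CREATE TABLE summary_daily (
    day       DATE   NOT NULL PRIMARY KEY,
    sum_val   BIGINT NOT NULL,
    count_val BIGINT NOT NULL
  );

  -- Incremental refresh: fold in only the rows added since the last run
  -- (here, yesterday's rows).
  INSERT INTO summary_daily (day, sum_val, count_val)
  SELECT DATE(ts), SUM(val), COUNT(val)
    FROM raw_data
   WHERE ts >= CURDATE() - INTERVAL 1 DAY
     AND ts <  CURDATE()
   GROUP BY DATE(ts)
  ON DUPLICATE KEY UPDATE
    sum_val   = sum_val   + VALUES(sum_val),
    count_val = count_val + VALUES(count_val);

  -- Mathematically correct average over any date range:
  SELECT SUM(sum_val) / SUM(count_val) AS avg_val
    FROM summary_daily
   WHERE day BETWEEN '2012-05-01' AND '2012-05-31';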
Re: NoSQL help
Can you share the SQL, the EXPLAIN plan, the indexes, etc., and the SHOW FULL PROCESSLIST output from when the queries are running?
RE: NoSQL help
id  select_type  table  type  possible_keys  key  key_len  ref  rows  Extra
1  SIMPLE  ibf  ALL  ibf_MsgId  NULL  NULL  NULL  160944
1  SIMPLE  pl  ref  idx_unique_key_ib_msg,index_message_id,index_message_processing_status  idx_unique_key_ib_msg  180  reports.ibf.Message_Id,const  1  Using where; Using index
1  SIMPLE  tl  ref  idx_unique_key_ib_text,index_message_id  idx_unique_key_ib_text  153  reports.pl.Message_Id  1
1  SIMPLE  xl  ref  idx_unique_key_ib_xml,index_message_id  idx_unique_key_ib_xml  153  reports.pl.Message_Id  1  Using where

This is the execution plan for 1.5 million records.
ANN: Data Wizard for MySQL 12.6 released
Hi!

SQL Maestro Group announces the release of Data Wizard for MySQL 12.6, a powerful Windows GUI solution for MySQL data management. The new version is immediately available at http://www.sqlmaestro.com/products/mysql/datawizard/

Data Wizard for MySQL provides you with a number of easy-to-use wizards to convert any ADO-compatible database to a MySQL database, import data into MySQL tables, export data from tables, views and queries to the most popular file formats, as well as generate data-driven ASP.NET pages for your MySQL database.

Top 10 new features:

1. The SQL Dump wizard. This tool allows you to export the data from selected tables and views as a set of INSERT statements.
2. Data Pump: user-defined data type mappings.
3. Data Pump: recognizing autoincrement columns in the source database and converting them in the best possible way.
4. Data Pump: data import using bulk-loading commands.
5. Data Import: auto-arranging target tables by dependencies.
6. Data Import: import data from CSV files that contain multi-line strings.
7. Data Export: decimal and thousand separators for the generated files (when applicable).
8. All tools: correct exit status (0 on success, 1 on error) for the command-line mode. This return code can be checked in batch files using the ERRORLEVEL variable.
9. All tools: progress windows are redesigned in the modern style and are more informative.
10. All tools: log files are more structured and detailed.

In addition to this, several bugs have been fixed and some other minor improvements and corrections have been made.

The full press release is available at: http://www.sqlmaestro.com/news/company/data_wizards_12_6_released/

Background information:
SQL Maestro Group offers complete database admin, development and management tools for MySQL, SQL Server, PostgreSQL, Oracle, DB2, SQLite, SQL Anywhere, Firebird and MaxDB, providing the highest performance, scalability and reliability to meet the requirements of today's database applications.

Sincerely yours,
The SQL Maestro Group Team
http://www.sqlmaestro.com
Re: NoSQL help
As the plan shows, a full table scan is happening on table ibf. Can you share the indexes on this table, and also the complete SQL?
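For reference, a sketch of how one might chase this down. The join column Message_Id appears in the plan, but the actual query and index definitions were never posted, so the index below is purely hypothetical:

  SHOW INDEX FROM ibf;

  -- If the optimizer ignores ibf_MsgId (e.g. because it matches too many
  -- rows, or the join column is not its leading column), a hypothetical
  -- index on the join column may remove the full scan:
  ALTER TABLE ibf ADD INDEX idx_ibf_message_id (Message_Id);

  -- Then re-check the plan (the actual SELECT was not shared on the list):
  EXPLAIN SELECT ... \G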
Re: Minor collation question
----- Original Message -----
From: Rick James rja...@yahoo-inc.com

> A warning about ß -- There was a change in the collation at 5.1.24.
> Search http://mysql.rjweb.org/doc.php/charcoll for 'German'; there is a
> brief discussion near the end.

Aha, also a very good bit of information, thank you. The unix boys had started their own upgrade cycle in the meantime, from Debian Lenny to Squeeze, so were unwittingly already upgrading to 5.1. A swift explanation of the problem, and the subliminal threat of a swift kick in the valuables, got them to suspend that trajectory until I build myself Debian-compliant 5.5 packages for Squeeze :-D

Any pointers towards the latter (no, Dotdeb's will not do - they don't magically replace the native Squeeze ones) would also be appreciated; if not, I'll be rolling my own.
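A quick way to check how a given server treats the sharp s, as a sketch: the results differ between collations and changed across the 5.1.24 boundary, which is exactly the upgrade hazard being discussed, so run it on both versions and compare.

  SELECT _utf8'ß' = _utf8's'  COLLATE utf8_general_ci AS general_ci_eq_s,
         _utf8'ß' = _utf8'ss' COLLATE utf8_unicode_ci AS unicode_ci_eq_ss;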
Re: Minor collation question
http://bugs.mysql.com/bug.php?id=27877

W.T.
RE: NoSQL help
SHOW CREATE TABLE
SHOW TABLE STATUS
EXPLAIN SELECT ... \G
RE: NoSQL help
> the report out of 60 days in terms of second, minute, hourly, weekly
> and monthly report also

1-second reports?? Wouldn't that have millions of rows? Is there anything useful to summarize?

For second and minute reports over narrow ranges, you might do fine reading the raw data. For hourly reports, an hourly summary table might be 'right'. For daily, weekly, and monthly reports, have a daily summary table. There would be one row per widget per day.
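A minimal sketch of that daily summary table ('one row per widget per day'); widget_id and the value columns are hypothetical stand-ins for whatever the real reports measure:

  CREATE TABLE summary_day (
    day       DATE   NOT NULL,
    widget_id INT    NOT NULL,
    sum_val   BIGINT NOT NULL,
    count_val BIGINT NOT NULL,
    PRIMARY KEY (widget_id, day)
  );

  -- Daily, weekly and monthly reports all aggregate the same daily rows,
  -- e.g. a monthly average per widget:
  SELECT widget_id, SUM(sum_val) / SUM(count_val) AS avg_val
    FROM summary_day
   WHERE day >= '2012-06-01' AND day < '2012-07-01'
   GROUP BY widget_id;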
RE: Is there any performance difference, maintaining separate ibdata files for each and every table instead of having one single tablespace for all databases.
There should be little or no difference.

If you are using thousands of tables, you might encounter overhead in opening the .ibd files. If you are tight on disk space, a single ibdata1 might be more efficient at reusing free blocks. OTOH, if you shrink or drop a big table, the freed space is not returned to the OS if you have a single ibdata1.

In most cases, I recommend innodb_file_per_table=1.

-----Original Message-----
From: Pothanaboyina Trimurthy [mailto:skd.trimur...@gmail.com]
Sent: Tuesday, May 15, 2012 4:58 AM
To: mysql@lists.mysql.com
Subject: Is there any performance difference, maintaining separate ibdata files for each and every table instead of having one single tablespace for all databases.

> Hi everyone, is there any performance difference between maintaining
> separate ibdata files for each and every table and having one single
> tablespace for all databases, for the InnoDB storage engine? Please let
> me know the difference.
> --
> 3murthy
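For reference, a sketch of checking and enabling the setting. Note that tables created while the shared tablespace was in use stay inside ibdata1 until they are rebuilt; mydb.mytable below is a placeholder:

  -- Check the current value:
  SHOW VARIABLES LIKE 'innodb_file_per_table';

  -- Enable it in my.cnf so it survives restarts:
  --   [mysqld]
  --   innodb_file_per_table = 1

  -- Existing tables move into their own .ibd file only after a rebuild:
  ALTER TABLE mydb.mytable ENGINE=InnoDB;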
Re: Is there any performance difference, maintaining separate ibdata files for each and every table instead of having one single tablespace for all databases.
There are performance issues with a large number of data files compared to a single one; that is one reason innodb_file_per_table is not the default option. Also, with innodb_file_per_table you will use more resources, and there can be a problem if you have _many_ tables: you have to manage the number of open files (open tables / table_cache / table_definition_cache). (Of course, MyISAM tables always had that issue.)

Next, with a single file, tablespace can be shared between rows of different tables and schemas, which means less wasted space. With innodb_file_per_table, each table may have unused tablespace, which can only be used by rows of the same table; this means (sometimes much) more wasted space.

Still, innodb_file_per_table is preferable. For us the shared tablespace led to a disk space issue, since the ibdata1 file grew to 90 GB+ and it was not possible to defragment the tablespace using the ALTER method. (In fact this is hardly possible in a production environment, since it leads to huge downtime.)

--
Best Regards,
Prabhat Kumar
MySQL DBA
My Blog: http://adminlinux.blogspot.com
My LinkedIn: http://www.linkedin.com/in/profileprabhat
Re: i need advice on redundancy of mysql server.
Guys - thanks for the replies. Are any of you guys on odesk or elancer.com? Thanks.

On Wed, Jun 13, 2012 at 7:28 PM, Rick James rja...@yahoo-inc.com wrote:

I prefer:
* Master-Master (dual master), but write to only one of them. (Writing to both can lead to duplicate keys, etc., unless you are very careful in your code.)
* Have the two Masters geographically separate. (Think tornadoes, floods, earthquakes, etc.)
* Have Slave(s) hanging off each master -- (1) for read scaling, and (2) to avoid a major outage when one Master goes down and you need to take the other one down to clone it.

Another thing to consider: backing up via an LVM snapshot requires only a minute or so of downtime, regardless of dataset size. Percona's XtraBackup is also very good.

I also agree that MyISAM is not best. But, caution: InnoDB's disk footprint is 2x-3x bigger than MyISAM's.

You can load-balance reads (among slaves and, optionally, masters); you cannot load-balance writes. Any number of Apache servers can talk to MySQL. But watch out -- MaxClients should not be so large that it swamps max_connections.

Load balancing: DNS is the simple way to load balance Apache. There are low-impact software solutions. There are hardware solutions. (This is what I am used to at work; it is severe overkill for most users.)

Bottom line: there is no best or perfect solution. First decide what 'keeps you up at night'.

-----Original Message-----
From: Joey L [mailto:mjh2...@gmail.com]
Sent: Monday, June 11, 2012 7:26 AM
To: mysql@lists.mysql.com
Subject: i need advice on redundancy of mysql server.

> I am running a site with about 50 GB of MyISAM databases which are the
> backend to different websites. I cannot afford any downtime, and the
> data is realtime. What is the best method for this setup:
> master-master or master-slave? What are the best utilities to create
> and maintain this setup, as far as load balancing between the two
> physical servers that I am running? I am currently working with the
> Percona utilities - is there something better? What would you use to
> load balance MySQL? What would you use to load balance Apache? Thanks.
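One common way to enforce "write to only one of them" is to keep the passive master read-only; a sketch (note that read_only does not restrict replication threads or users with the SUPER privilege):

  -- On the passive master:
  SET GLOBAL read_only = 1;

  -- And in my.cnf, so it survives a restart:
  --   [mysqld]
  --   read_only = 1

  -- Check replication health on each side:
  SHOW SLAVE STATUS\G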
console input
Hi all,

I've looked high and low for what I hope is a trivial answer. I was trying to load a table using LOAD DATA INFILE. Unfortunately, it craps out because there are some duplicate primary keys. Not surprising, as the source did not enforce uniqueness. My problem is that the load simply dies without indicating which line of the input file was in error; the error message refers to line 3, which is not even the line of the LOAD DATA statement.

I can get the table loaded by specifying REPLACE INTO TABLE, but that still leaves me not knowing where the duplicate records are. So... I wanted to read the data a line at a time and use a plain INSERT statement. That way I could check for duplicate keys and discover where the duplicate records are. However, I can't find a way to read input from the console or a file. What am I missing? I know I could write a Java or C++ program to do this, but it seems like overkill for what should be a trivial task.

Thanks for any pointers,

Gary
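One way to find the duplicates without leaving SQL, sketched with hypothetical names (stage, target, id as the primary key, /tmp/data.txt as the input file): load into a key-less staging table first, so LOAD DATA cannot fail, then let GROUP BY expose the offending keys.

  -- Staging table: same columns as the target, but no PRIMARY KEY,
  -- so LOAD DATA accepts duplicate ids.
  -- (If id is AUTO_INCREMENT, change it to a plain column first.)
  CREATE TABLE stage LIKE target;
  ALTER TABLE stage DROP PRIMARY KEY;

  LOAD DATA INFILE '/tmp/data.txt' INTO TABLE stage;

  -- Which keys are duplicated, and how many times:
  SELECT id, COUNT(*) AS cnt
    FROM stage
   GROUP BY id
  HAVING cnt > 1;

  -- After resolving them, copy the rows across; INSERT IGNORE keeps
  -- the first row seen for each id and skips the rest.
  INSERT IGNORE INTO target SELECT * FROM stage;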
Re: console input
On 2012-06-14, Gary Aitken my...@dreamchaser.org wrote:

> So... I wanted to read the data a line at a time and use a plain INSERT
> statement. That way I could check for duplicate keys and discover where
> the duplicate records are. However, I can't find a way to read input
> from the console or a file. What am I missing? I know I could write a
> Java or C++ program to do this, but it seems like overkill for what
> should be a trivial task.

I don't know of any stock mysql program that does such a thing. You could write a Perl or Python program for this task; this program would probably be fairly short.

--keith

--
kkel...@wombat.san-francisco.ca.us