RE: NoSQL help
I tried with the MyISAM engine also, but it is also taking more time to generate the report. My database has 8 InnoDB tables, and I am joining 4 of them to get the report. I am maintaining 60 days of records, because users generate reports over those 60 days at second, minute, hourly, weekly and monthly granularity.

From: Ananda Kumar [mailto:anan...@gmail.com]
Sent: Thursday, June 14, 2012 12:32 AM
To: Rick James
Cc: Johan De Meersman; Manivannan S.; mysql@lists.mysql.com
Subject: Re: NoSQL help

Did you try with MyISAM tables? They are supposed to be good for reporting requirements.

On Wed, Jun 13, 2012 at 11:52 PM, Rick James rja...@yahoo-inc.com wrote:

I'll second Johan's comments. Count the disk hits!

One minor change: Don't store averages in the summary table; instead store the SUM(). That lets you get the mathematically correct AVERAGE over any time range via SUM(sum_foo) / SUM(count_foo).

Switching between MySQL and Mongo requires rewriting _all_ of the relevant code. <opinion>NoSQL will be no better than MySQL for 150GB.</opinion> Count the disk hits!

I recently built a system that topped out at 350GB (90 days' data). It involved hourly ingestion of a few GB of data and a variety of reports. The prototype showed that most reports would take about an hour to run. Not good. The final product, with summary tables, lets the reports be run on demand and online, and each takes only a few seconds. By careful use of MEMORY tables, LOAD DATA, etc., the ingestion takes 5 minutes (each hour) for the raw data and 2 minutes (total) for the 7 summary tables. PARTITIONing was vital for the design: once an hour a new partition is populated; once a day, 24 hourly partitions are rolled into a new daily partition and the 90-day-old partition is DROPped.

-----Original Message-----
From: Johan De Meersman [mailto:vegiv...@tuxera.be]
Sent: Wednesday, June 13, 2012 6:20 AM
To: Manivannan S.
Cc: mysql@lists.mysql.com
Subject: Re: NoSQL help

----- Original Message -----
From: Manivannan S. manivanna...@spanservices.com

Hi all,
[lots of data] [slow reports] [wooo NoSQL magic]

Not that I want to discourage you, but my standard first question is: why do you think NoSQL (let alone any specific product) is the right solution? :-) Don't get me wrong, it might be; but from what little I now know about your environment, it sounds like applying some data warehousing techniques might suffice - and being the cynical dinosaur that I am, I have a healthy reluctance about welding new technology onto a stable environment.

To speed up reporting (and note that these techniques are often applied even when implementing NoSQL solutions, too), it is usually a good first step to set up a process of data summarization. Basically, you pre-calculate averages, medians, groupings, whatever you need for your reports; your job also saves the last record ID it has processed, so on the next run you only read the new records and update your summary tables to incorporate the new data.

Suppose I have a table like this:

ID | Val
 1 |   1
 2 |   7
 3 |   5
 4 |  13

I want to report the average on a daily basis, and calculating that over those rows is unbearably slow because I'm running the process on a wristwatch from 1860 :-) So I get a summary table, calculate (1+7+5+13)/4 = 6.5, and that then gets a record saying this:

Avg | elementCount | lastSeen
6.5 |            4 |        4

Now, over the course of the day, the elements 4, 17 and 2 get added with sequential row numbers. Instead of calculating (1+7+5+13+4+17+2)/7, which would be slow, I can substitute the already-summarized data by Avg*elementCount. Thus, I calculate (6.5*4 + 4+17+2)/7 = 7, which is a lot faster, and my summary table now looks like this:

Avg | elementCount | lastSeen
  7 |            7 |        7

This is of course a trivial example, but it saves you a lot of time if you already have the summary of several thousand elements and only need to update it for a handful. Similar tricks are possible for a lot of typical reporting stuff - you don't need to re-calculate data for past months over and over again, for instance - and that's what makes your reports run fast.

Just my 2 cents :-)
/johan

--
Bier met grenadyn
Is als mosterd by den wyn
Sy die't drinkt, is eene kwezel
Hy die't drinkt, is ras een ezel

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql
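Rick's "store SUM(), not AVG()" point and Johan's incremental update fit together naturally. A minimal Python sketch of both ideas (the `Summary` class and its fields are invented for illustration, not anyone's actual schema), reproducing the 6.5 → 7 numbers from Johan's example:

```python
# Summary-table idea from the thread: store SUM and COUNT per bucket
# (not the average), update incrementally using the last-seen row ID,
# and derive averages on demand. Structures are illustrative only.

class Summary:
    def __init__(self):
        self.total = 0      # SUM(val) over rows incorporated so far
        self.count = 0      # COUNT(*) over rows incorporated so far
        self.last_seen = 0  # highest row ID already incorporated

    def ingest(self, rows):
        """Incorporate only rows newer than last_seen (incremental update)."""
        for row_id, val in rows:
            if row_id > self.last_seen:
                self.total += val
                self.count += 1
                self.last_seen = row_id

    @property
    def avg(self):
        return self.total / self.count

s = Summary()
s.ingest([(1, 1), (2, 7), (3, 5), (4, 13)])  # first batch: avg is 6.5
s.ingest([(5, 4), (6, 17), (7, 2)])          # only the new rows are read
print(s.avg)                                 # 7.0, matching Johan's arithmetic

# Rick's point: with SUM and COUNT stored per bucket, the average over any
# range is sum-of-sums / sum-of-counts. Averaging stored averages weights
# every bucket equally and gives the wrong answer when counts differ:
day1 = Summary(); day1.ingest([(1, 10)])           # 1 row, avg 10
day2 = Summary(); day2.ingest([(2, 0), (3, 0)])    # 2 rows, avg 0
correct = (day1.total + day2.total) / (day1.count + day2.count)  # 10/3
naive = (day1.avg + day2.avg) / 2                                # 5.0, wrong
```

The same reason is why the summary table should also carry the count: a re-derivable average composes across buckets, a stored average does not.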
Re: NoSQL help
Can you share the SQL, the explain plan, the indexes, etc., and the SHOW FULL PROCESSLIST output while the SQLs are running?
RE: NoSQL help
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
 1 | SIMPLE | ibf | ALL | ibf_MsgId | \N | \N | \N | 160944 |
 1 | SIMPLE | pl  | ref | idx_unique_key_ib_msg,index_message_id,index_message_processing_status | idx_unique_key_ib_msg | 180 | reports.ibf.Message_Id,const | 1 | Using where; Using index
 1 | SIMPLE | tl  | ref | idx_unique_key_ib_text,index_message_id | idx_unique_key_ib_text | 153 | reports.pl.Message_Id | 1 |
 1 | SIMPLE | xl  | ref | idx_unique_key_ib_xml,index_message_id | idx_unique_key_ib_xml | 153 | reports.pl.Message_Id | 1 | Using where

This is the execution plan for 1.5 million records.
Re: NoSQL help
Your execution plan shows that a full table scan is happening on table ibf. Can you share the indexes on this table and also the complete SQL?
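The `type=ALL` row in the plan above means every one of the 160944 rows in ibf is read on each query. As an illustration of what indexing the join column changes (using SQLite's EXPLAIN QUERY PLAN as a stand-in for MySQL's EXPLAIN; the table and column names here are invented for the demo, not the poster's schema):

```python
import sqlite3

# Scan-vs-index demo. SQLite's "SCAN" corresponds roughly to MySQL's
# type=ALL, and "SEARCH ... USING INDEX" to type=ref.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ibf (Message_Id TEXT, payload TEXT)")
con.execute("CREATE TABLE pl  (Message_Id TEXT, status INTEGER)")

query = "SELECT * FROM ibf JOIN pl ON pl.Message_Id = ibf.Message_Id"

def plan(q):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail)
    return [row[3] for row in con.execute("EXPLAIN QUERY PLAN " + q)]

print(plan(query))  # both tables are scanned: no usable index

# Index the join column; the inner table becomes an indexed lookup.
con.execute("CREATE INDEX idx_pl_msg ON pl (Message_Id)")
print(plan(query))
```

The exact plan strings differ between SQLite versions, but the shift from a scan of the joined table to an index search is the same effect an appropriate index on ibf's join/filter columns would aim for in MySQL.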
RE: NoSQL help
SHOW CREATE TABLE
SHOW TABLE STATUS
EXPLAIN SELECT ... \G
RE: NoSQL help
> the report out of 60 days in terms of second, minute, hourly, weekly and Monthly report also

1-second reports?? Wouldn't that have millions of rows? Is there anything useful to summarize? For second and minute reports over narrow ranges, you might do fine reading the raw data. For hourly reports, an hourly summary table might be 'right'. For daily, weekly, and monthly reports, have a daily summary table; there would be one row per widget per day.
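Rick's layered-summary suggestion (raw rows for second/minute drill-downs, an hourly bucket, and a daily bucket with one row per widget per day) can be sketched as follows; the event fields and widget names are hypothetical, and each bucket stores sum + count so averages compose correctly over any range:

```python
from collections import defaultdict
from datetime import datetime

# Layered summaries: raw rows stay for fine-grained queries; hourly and
# daily buckets are pre-aggregated. Field names invented for illustration.
raw = [
    ("widgetA", datetime(2012, 6, 14, 10, 15, 3), 5.0),
    ("widgetA", datetime(2012, 6, 14, 10, 47, 9), 7.0),
    ("widgetA", datetime(2012, 6, 14, 11, 2, 0), 3.0),
    ("widgetB", datetime(2012, 6, 14, 10, 30, 0), 10.0),
]

hourly = defaultdict(lambda: [0.0, 0])  # (widget, day, hour) -> [sum, count]
daily  = defaultdict(lambda: [0.0, 0])  # (widget, day)       -> [sum, count]

for widget, ts, val in raw:
    h = hourly[(widget, ts.date(), ts.hour)]
    h[0] += val; h[1] += 1
    d = daily[(widget, ts.date())]      # one row per widget per day
    d[0] += val; d[1] += 1

# Daily average for widgetA on 2012-06-14: (5 + 7 + 3) / 3
s, c = daily[("widgetA", datetime(2012, 6, 14).date())]
print(s / c)  # 5.0
```

Weekly and monthly reports then read only the daily rows (at most 60 per widget here), never the raw data, which is what keeps them fast regardless of ingest volume.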
Re: NoSQL help
- Original Message -
From: Manivannan S. manivanna...@spanservices.com

Hi all,
[lots of data] [slow reports] [wooo NoSQL magic]

Not that I want to discourage you, but my standard first question is: why do you think NoSQL (let alone any specific product) is the right solution? :-) Don't get me wrong, it might be; but from what little I know about your environment so far, it sounds like applying some data-warehousing techniques might suffice - and being the cynical dinosaur that I am, I have a healthy reluctance about welding new technology onto a stable environment.

To speed up reporting (and note that these techniques are often applied even when implementing NoSQL solutions, too), it is usually a good first step to set up a process of data summarization. Basically, you pre-calculate averages, medians, groupings, whatever you need for your reports; your job also saves the last record ID it has processed; then on the next run, you only read the new records and update your summary tables to incorporate the new data.

Suppose I have a table like this:

ID | Val
---+----
 1 |  1
 2 |  7
 3 |  5
 4 | 13

I want to report the average on a daily basis, and calculating that over those rows is unbearably slow because I'm running the process on a wristwatch from 1860 :-) So I get a summary table, calculate (1+7+5+13)/4 = 6.5, and that then gets a record saying this:

Avg | elementCount | lastSeen
----+--------------+---------
6.5 |            4 |        4

Now, over the course of the day, the elements 4, 17 and 2 get added with sequential row numbers. Instead of calculating (1+7+5+13+4+17+2)/7, which would be slow, I can substitute the already-summarized data with Avg*elementCount. Thus, I calculate (6.5*4 + 4+17+2)/7 = 7, which is a lot faster, and my summary table now looks like this:

Avg | elementCount | lastSeen
----+--------------+---------
  7 |            7 |        7

This is of course a stupid example, but it saves you a lot of time if you already have the summary of several thousand elements and only need to update it for a handful.

Similar tricks are possible for a lot of typical reporting tasks - you don't need to re-calculate data for past months over and over again, for instance - and that's what makes your reports run fast.

Just my 2 cents :-)

/johan
--
Bier met grenadyn Is als mosterd by den wyn Sy die't drinkt, is eene kwezel Hy die't drinkt, is ras een ezel
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql
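Johan's incremental summarization can be sketched in MySQL. All table and column names here are hypothetical, and the summary row stores the running SUM rather than the average so the update stays mathematically exact:

```sql
-- Hypothetical schema: raw rows arrive with an ever-increasing id.
CREATE TABLE raw_data (
    id  INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    val INT NOT NULL
);

CREATE TABLE summary (
    sum_val      BIGINT       NOT NULL,
    elementCount INT UNSIGNED NOT NULL,
    lastSeen     INT UNSIGNED NOT NULL   -- highest raw_data.id folded in so far
);

-- Each run folds in only the rows added since the previous run.
SELECT lastSeen INTO @last FROM summary;

UPDATE summary
SET sum_val      = sum_val      + (SELECT IFNULL(SUM(val), 0)    FROM raw_data WHERE id > @last),
    elementCount = elementCount + (SELECT COUNT(*)               FROM raw_data WHERE id > @last),
    lastSeen     =                (SELECT IFNULL(MAX(id), @last) FROM raw_data WHERE id > @last);

-- The report reads the tiny pre-aggregated row instead of scanning raw_data.
SELECT sum_val / elementCount AS avg_val FROM summary;
```

With a date column added to the summary table, the exact average over any range becomes SUM(sum_val) / SUM(elementCount), which is why storing sums composes cleanly where stored averages do not.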
Re: NoSQL help
At 07:27 AM 6/13/2012, Manivannan S. wrote:

Hi all,
I am using MySQL 5.1, and I am inserting 5GB of data into my database every two days. I am trying to generate a report by processing the data available in my database. Our clients are planning to keep the records for 60 days, which will exceed 150GB of data. To generate a report I have to use all of this accumulated 150GB of data. I have done all kinds of optimizations in my procedure and I have tuned my MySQL server parameters as well. But with MySQL, getting the reports for this amount of data within a short time is not possible. I have seen the concept of NoSQL and I am planning to implement this NoSQL concept in my database. Does anyone have any experience with NoSQL, especially the MongoDB technology, and how to use it? Thanks in advance.
Regards, Manivannan S

You didn't say what the problem was when you tried to produce a report on this data.

1) Did the sorting take too long?
2) Did traversing the data take too long?
3) Were the report's tables locked by other processes?
4) Was it using too many resources, like memory, CPU, or disk?
5) Did joining tables take too long?

You may want to look into HANDLER. I've used it often when I needed to traverse very large MyISAM tables. HANDLER requires no physical sorting of the table because it traverses the table using an index.
It also ignores any locks on the table (which you may or may not like), and HANDLER will start traversing the table immediately. It will solve problems 1, 3 and 4, but not #2, because traversing a large table with an index will be slower than if the table were already sorted. One nice thing about HANDLER is that it uses virtually no additional memory regardless of table size, and very little disk activity, because there is no sorting. You can run it any time and it won't degrade other MySQL processes. I don't think HANDLER will join tables together; I have used it only to traverse a single table.

One other tip: when loading the data into the table, LOAD DATA INFILE will be much faster on an empty table, so recreate your tables from scratch before loading the data. Also build all of the indexes after the data has been loaded, using one ALTER TABLE command, and if possible reduce the number of unique indexes in the table.

http://dev.mysql.com/doc/refman/5.5/en/handler.html

Mike
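A minimal HANDLER traversal along the lines Mike describes might look like this (the table, file and index names are hypothetical):

```sql
-- Walk a large MyISAM table via its primary key, in batches,
-- with no filesort and no waiting on table locks.
HANDLER big_table OPEN;

HANDLER big_table READ `PRIMARY` FIRST LIMIT 1000;   -- first batch
HANDLER big_table READ `PRIMARY` NEXT  LIMIT 1000;   -- repeat until no rows return

HANDLER big_table CLOSE;

-- Mike's loading tip: fill the empty table first, then build all
-- secondary indexes in a single ALTER TABLE pass.
LOAD DATA INFILE '/tmp/report_raw.csv' INTO TABLE big_table;
ALTER TABLE big_table ADD INDEX idx_a (col_a), ADD INDEX idx_b (col_b);
```

The application loops on the READ ... NEXT statement until it gets an empty result, so memory use stays flat no matter how large the table is.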
RE: NoSQL help
I'll second Johan's comments. Count the disk hits!

One minor change: don't store averages in the summary table; instead store the SUM(). That lets you get the mathematically correct AVERAGE over any time range via SUM(sum_foo) / SUM(count_foo).

Switching between MySQL and Mongo requires rewriting _all_ of the relevant code. <opinion> NoSQL will be no better than MySQL for 150GB. </opinion> Count the disk hits!

I recently built a system that topped out at 350GB (90 days' data). It involved hourly ingestion of a few GB of data and a variety of reports. The prototype showed that most reports would take about an hour to run. Not good. The final product, with summary tables, lets the reports be run on-demand and online, and each takes only a few seconds. By careful use of MEMORY tables, LOAD DATA, etc., the ingestion takes 5 minutes (each hour) for the raw data and 2 minutes (total) for the 7 summary tables.

PARTITIONing was vital for the design. Once an hour, a new partition is populated; once a day, 24 hourly partitions are rolled into a new daily partition and the 90-day-old partition is DROPped.

-Original Message-
From: Johan De Meersman [mailto:vegiv...@tuxera.be]
Sent: Wednesday, June 13, 2012 6:20 AM
To: Manivannan S.
Cc: mysql@lists.mysql.com
Subject: Re: NoSQL help

[quoted text snipped - Johan's message appears in full above]
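Rick's daily partition rotation can be sketched with MySQL 5.1 RANGE partitioning; the table name, partition names and dates below are hypothetical:

```sql
-- Fact table partitioned by day, one partition per day plus a catch-all.
CREATE TABLE raw_data (
    ts  DATETIME NOT NULL,
    val INT      NOT NULL
)
PARTITION BY RANGE (TO_DAYS(ts)) (
    PARTITION p20120313 VALUES LESS THAN (TO_DAYS('2012-03-14')),
    -- ... one partition per day ...
    PARTITION p20120613 VALUES LESS THAN (TO_DAYS('2012-06-14')),
    PARTITION p_future  VALUES LESS THAN MAXVALUE
);

-- Daily rotation: split tomorrow's partition off the catch-all,
-- then drop the 90-day-old partition.
ALTER TABLE raw_data REORGANIZE PARTITION p_future INTO (
    PARTITION p20120614 VALUES LESS THAN (TO_DAYS('2012-06-15')),
    PARTITION p_future  VALUES LESS THAN MAXVALUE
);
ALTER TABLE raw_data DROP PARTITION p20120313;  -- near-instant, unlike a bulk DELETE
```

DROP PARTITION is what makes the 60- or 90-day retention cheap: expiring old data is a metadata operation rather than a row-by-row delete.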
Re: NoSQL help
Did you try MyISAM tables? They are supposed to be good for reporting requirements.

On Wed, Jun 13, 2012 at 11:52 PM, Rick James rja...@yahoo-inc.com wrote:

[quoted text snipped - Rick's message appears in full above]