RE: NoSQL help

2012-06-14 Thread Manivannan S.
I tried with the MyISAM engine also, but it also takes more time to generate the 
report. My database has 8 InnoDB tables, and I am joining 4 of those tables to 
get the report.

I am maintaining 60 days of records because the user may generate reports over 
those 60 days at per-second, per-minute, hourly, weekly and monthly 
granularity.


Re: NoSQL help

2012-06-14 Thread Ananda Kumar
Can you share the SQL, the explain plan, the indexes, etc., and the
SHOW FULL PROCESSLIST output while the SQLs are running?


RE: NoSQL help

2012-06-14 Thread Manivannan S.
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ibf
         type: ALL
possible_keys: ibf_MsgId
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 160944
        Extra:
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: pl
         type: ref
possible_keys: idx_unique_key_ib_msg,index_message_id,index_message_processing_status
          key: idx_unique_key_ib_msg
      key_len: 180
          ref: reports.ibf.Message_Id,const
         rows: 1
        Extra: Using where; Using index
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: tl
         type: ref
possible_keys: idx_unique_key_ib_text,index_message_id
          key: idx_unique_key_ib_text
      key_len: 153
          ref: reports.pl.Message_Id
         rows: 1
        Extra:
*************************** 4. row ***************************
           id: 1
  select_type: SIMPLE
        table: xl
         type: ref
possible_keys: idx_unique_key_ib_xml,index_message_id
          key: idx_unique_key_ib_xml
      key_len: 153
          ref: reports.pl.Message_Id
         rows: 1
        Extra: Using where


This is the execution plan for 1.5 million records.




RE: NoSQL help

2012-06-14 Thread Manivannan S.

Sorry for the previous mail. This is my execution plan for 1.5 million records 
(the same plan as shown above).


Re: NoSQL help

2012-06-14 Thread Ananda Kumar
In your execution plan, a full table scan is happening on table ibf.
Can you share the indexes on this table, and also the complete SQL?
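
As a first thing to check (a sketch; only ibf and ibf_MsgId are taken from
the plan above, nothing else is known from this thread):

  -- List the indexes on ibf and their estimated cardinality:
  SHOW INDEX FROM ibf;

  -- Stale statistics can make the optimizer skip ibf_MsgId and fall
  -- back to a full scan; rebuilding them is cheap to try:
  ANALYZE TABLE ibf;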


RE: NoSQL help

2012-06-14 Thread Rick James
SHOW CREATE TABLE
SHOW TABLE STATUS
EXPLAIN SELECT ... \G



RE: NoSQL help

2012-06-14 Thread Rick James
 the user may generate reports over those 60 days at per-second, per-minute, 
 hourly, weekly and monthly granularity

1-second reports??  Wouldn't that have millions of rows?  Is there anything 
useful to summarize?

For Second and Minute over narrow ranges, you might do fine reading the raw 
data.

For hourly reports an hourly summary table might be 'right'.

For daily, weekly, and monthly, have a daily summary table.  There would be one 
row per widget per day.
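
As a sketch, that daily summary could look like this (all names are made up;
one row per widget per day, storing SUM and COUNT rather than the average):

  CREATE TABLE summary_daily (
    day        DATE   NOT NULL,
    widget_id  INT    NOT NULL,
    sum_foo    BIGINT NOT NULL,
    count_foo  INT    NOT NULL,
    PRIMARY KEY (widget_id, day)
  );

  -- Run once per day to roll up yesterday's raw rows:
  INSERT INTO summary_daily (day, widget_id, sum_foo, count_foo)
  SELECT DATE(created_at), widget_id, SUM(foo), COUNT(*)
    FROM raw_data
   WHERE created_at >= CURDATE() - INTERVAL 1 DAY
     AND created_at <  CURDATE()
   GROUP BY DATE(created_at), widget_id;

Weekly and monthly reports then read summary_daily instead of the raw data.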



NoSQL help

2012-06-13 Thread Manivannan S.
Hi all,

I am using MySQL 5.1, and I am inserting 5 GB of data into my database every 
two days. I am trying to generate a report by processing the data available in 
my database. Our clients are planning to keep the records for 60 days, and by 
then the data will exceed 150 GB. To generate a report I have to use all of 
this accumulated 150 GB of data. I have done all kinds of optimizations in my 
procedure and I have tuned my MySQL server parameters as well, but with MySQL 
it is not possible to generate the reports over this amount of data in a short 
time.

I have read about the NoSQL concept and I am planning to implement it for my 
database.

Does anyone have experience with NoSQL, especially the MongoDB technology, and 
with how to use it?

Thanks in advance.

Regards,
Manivannan S



Re: NoSQL help

2012-06-13 Thread Johan De Meersman

- Original Message -
 From: Manivannan S. manivanna...@spanservices.com
 
 Hi all,
 
 [lots of data]
 [slow reports]
 [wooo NoSQL magic]

Not that I want to discourage you, but my standard first question is why do 
you think NoSQL (let alone any specific product) is the right solution? :-)

Don't get me wrong, it might be; but from what little I now know about your 
environment, it sounds like applying some data warehousing techniques might 
suffice - and being the cynical dinosaur that I am, I have a healthy reluctance 
about welding new technology onto a stable environment.

To speed up reporting (and note that these techniques are often applied even 
when implementing NoSQL solutions, too) it is usually a good first step to set 
up a process of data summarization.

Basically, you pre-calculate averages, medians, groupings, whatever you need 
for your reports; and your job also saves the last record IDs it's processed; 
then on the next run, you only read the new records and update your summary 
tables to incorporate the new data.

Suppose I have a table like this:

ID | Val
--------
 1 |   1
 2 |   7
 3 |   5
 4 |  13

I want to report the average on a daily basis, and calculating that over those 
rows is unbearably slow because I'm running the process on a wristwatch from 
1860 :-)

So I get a summary table, calculate (1+7+5+13)/4 = 6.5 and that then gets a 
record saying this:

Avg | elementCount | lastSeen
-----------------------------
6.5 |            4 |        4

Now, over the course of the day, the elements 4, 17 and 2 get added with 
sequential row numbers. Instead of calculating (1+7+5+13+4+17+2)/7, which would 
be slow; I can substitute the already summarized data by Avg*elementCount. 
Thus, I calculate (6.5*4 + 4+17+2)/7 = 7, which is a lot faster, and my summary 
table now looks like this:

Avg | elementCount | lastSeen
-----------------------------
  7 |            7 |        7

This is of course a stupid example, but it saves you a lot of time if you 
already have the summary of several thousand elements and only need to update 
it for a handful. Similar tricks are possible for a lot of typical reporting 
stuff - you don't need to re-calculate data for past months over and over 
again, for instance - and that's what makes your reports run fast.
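
A minimal MySQL sketch of that incremental update, assuming a raw table
vals(id, val) and a one-row summary table
summary(avg_val, element_count, last_seen) (all names invented):

  -- Read the current summary into user variables:
  SELECT avg_val, element_count, last_seen
    INTO @avg, @cnt, @last
    FROM summary;

  -- Aggregate only the rows added since the last run:
  SELECT IFNULL(SUM(val), 0), COUNT(*), MAX(id)
    INTO @nsum, @ncnt, @nlast
    FROM vals
   WHERE id > @last;

  -- Fold them in: (avg * count + new_sum) / (count + new_count):
  UPDATE summary
     SET avg_val       = (@avg * @cnt + @nsum) / (@cnt + @ncnt),
         element_count = @cnt + @ncnt,
         last_seen     = IFNULL(@nlast, last_seen);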


Just my 2 cents :-)
/johan

-- 
Beer with grenadine
Is like mustard with wine
She who drinks it is a prude
He who drinks it is soon an ass




Re: NoSQL help

2012-06-13 Thread mos



You didn't say what the problem was when you tried to produce a 
report on this data.


1) Did the sorting take too long?
2) Did traversing the data take too long?
3) Were the report tables locked by other processes?
4) Did it use too many resources, like memory, CPU, or disk?
5) Did joining tables take too long?

You may want to look into HANDLER. I've used it often when I needed 
to traverse very large MyISAM tables. HANDLER requires no physical 
sorting of the table because it traverses the table using an index. 
It also ignores any locks on the table (which you may or may not 
like), so the HANDLER will start traversing the table immediately. 
It will solve problems 1, 3 and 4, but not #2, because traversing a 
large table with an index will be slower than if the table were 
already sorted. One nice thing about HANDLER is that it uses 
virtually no additional memory regardless of table size, and very 
little disk activity, because there is no sorting. You can run it 
any time and it won't degrade other MySQL processes. I don't think 
HANDLER will join tables together; I have used it only to traverse 
a single table.
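
For illustration, a minimal HANDLER pass over a large table by its 
primary key (the table name is invented; syntax per the manual page 
linked below):

  HANDLER big_log OPEN;
  HANDLER big_log READ `PRIMARY` FIRST LIMIT 1000;  -- first batch, in index order
  HANDLER big_log READ `PRIMARY` NEXT  LIMIT 1000;  -- repeat until no rows return
  HANDLER big_log CLOSE;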


One other tip: when loading the data into the table, LOAD DATA INFILE 
will be much faster on an empty table, so recreate your tables from 
scratch before loading the data. Also build all of the indexes after 
the data has been loaded, using a single ALTER TABLE command, and if 
possible reduce the number of unique indexes in the table.
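
A sketch of that load-then-index pattern (the file path, table and 
column names are invented):

  CREATE TABLE report_raw (
    message_id BIGINT   NOT NULL,
    created_at DATETIME NOT NULL,
    val        INT      NOT NULL
  ) ENGINE=MyISAM;              -- freshly created, no secondary indexes yet

  LOAD DATA INFILE '/tmp/raw.csv' INTO TABLE report_raw
    FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';

  -- Build all the indexes in one pass after the load:
  ALTER TABLE report_raw
    ADD INDEX idx_created (created_at),
    ADD INDEX idx_msg (message_id);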


http://dev.mysql.com/doc/refman/5.5/en/handler.html

Mike 






RE: NoSQL help

2012-06-13 Thread Rick James
I'll second Johan's comments.

Count the disk hits!

One minor change:  Don't store averages in the summary table; instead store the 
SUM().  That lets you get the mathematically correct AVERAGE over any time 
range via
SUM(sum_foo) / SUM(count_foo)
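
Concretely, with a summary table that stores sum_foo and count_foo per 
period (names invented), the exact average over any range is:

  SELECT SUM(sum_foo) / SUM(count_foo) AS avg_foo
    FROM summary_daily
   WHERE day BETWEEN '2012-04-15' AND '2012-06-13';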

Switching between MySQL and Mongo requires rewriting _all_ of the relevant code.

<opinion> NoSQL will be no better than MySQL for 150GB. </opinion>  Count the 
disk hits!

I recently built a system that topped out at 350GB (90 days' data).  It 
involved hourly ingestion of a few GB of data and a variety of reports.  The 
prototype showed that most reports would take about an hour to run.  Not good.  
The final product, with summary tables, lets the reports be run on-demand and 
online and each takes only a few seconds.  By careful use of MEMORY tables, 
LOAD DATA, etc, the ingestion takes 5 minutes (each hour) for the raw data and 
2 minutes (total) for the 7 summary tables.  PARTITIONing was vital for the 
design.  Once an hour a new partition is populated; once a day, 24 hourly 
partitions are rolled into a new daily partition and the 90-day old partition 
is DROPped.
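
A sketch of that rotation, simplified to daily partitions only (table, 
partition names and dates are invented):

  CREATE TABLE metrics_raw (
    created_at DATETIME NOT NULL,
    val        INT      NOT NULL
  )
  PARTITION BY RANGE (TO_DAYS(created_at)) (
    PARTITION p20120314 VALUES LESS THAN (TO_DAYS('2012-03-15')),
    PARTITION p20120315 VALUES LESS THAN (TO_DAYS('2012-03-16')),
    PARTITION pmax      VALUES LESS THAN MAXVALUE
  );

  -- Daily maintenance: drop the oldest partition, open tomorrow's:
  ALTER TABLE metrics_raw DROP PARTITION p20120314;
  ALTER TABLE metrics_raw REORGANIZE PARTITION pmax INTO (
    PARTITION p20120316 VALUES LESS THAN (TO_DAYS('2012-03-17')),
    PARTITION pmax      VALUES LESS THAN MAXVALUE
  );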





Re: NoSQL help

2012-06-13 Thread Ananda Kumar
Did you try with MyISAM tables?
They are supposed to be good for reporting requirements.
