RE: NoSQL help

2012-06-14 Thread Manivannan S.
I tried with the MyISAM engine also, but it is also taking more time to generate 
the report. In my database I have 8 InnoDB tables, and I am joining 4 of them to 
get the report.

I am keeping 60 days of records because the user may generate the report over 
any part of those 60 days, at second, minute, hourly, weekly and monthly 
granularity.

From: Ananda Kumar [mailto:anan...@gmail.com]
Sent: Thursday, June 14, 2012 12:32 AM
To: Rick James
Cc: Johan De Meersman; Manivannan S.; mysql@lists.mysql.com
Subject: Re: NoSQL help

Did you try with myisam tables.
They are supposed to be good for reporting requirement
On Wed, Jun 13, 2012 at 11:52 PM, Rick James 
rja...@yahoo-inc.com wrote:
I'll second Johan's comments.

Count the disk hits!

One minor change:  Don't store averages in the summary table; instead store the 
SUM().  That lets you get the mathematically correct AVERAGE over any time 
range via
SUM(sum_foo) / SUM(count_foo)
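
A minimal sketch of that idea (table and column names here are invented for
illustration; they are not from the thread):

    -- the summary table stores SUM and COUNT, never a pre-computed average
    CREATE TABLE summary_hourly (
      hr        DATETIME NOT NULL,   -- start of the hour
      widget_id INT      NOT NULL,
      sum_foo   BIGINT   NOT NULL,
      count_foo INT      NOT NULL,
      PRIMARY KEY (widget_id, hr)
    ) ENGINE=InnoDB;

    -- mathematically correct average over any range, e.g. one day:
    SELECT SUM(sum_foo) / SUM(count_foo) AS avg_foo
      FROM summary_hourly
     WHERE hr >= '2012-06-01' AND hr < '2012-06-02';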

Switching between MySQL and Mongo requires rewriting _all_ of the relevant code.

<opinion> NoSQL will be no better than MySQL for 150GB. </opinion>  Count the 
disk hits!

I recently built a system that topped out at 350GB (90 days' data).  It 
involved hourly ingestion of a few GB of data and a variety of reports.  The 
prototype showed that most reports would take about an hour to run.  Not good.  
The final product, with summary tables, lets the reports be run on-demand and 
online and each takes only a few seconds.  By careful use of MEMORY tables, 
LOAD DATA, etc, the ingestion takes 5 minutes (each hour) for the raw data and 
2 minutes (total) for the 7 summary tables.  PARTITIONing was vital for the 
design.  Once an hour a new partition is populated; once a day, 24 hourly 
partitions are rolled into a new daily partition and the 90-day old partition 
is DROPped.
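
The partition rotation described above is typically done with ALTER TABLE; a
rough sketch, with invented table and partition names (the real schema is not
shown in the thread):

    -- drop the partition that has aged out (here, the 90-day-old one)
    ALTER TABLE raw_data DROP PARTITION p20120314;

    -- add a new RANGE partition for the coming day
    ALTER TABLE raw_data
      ADD PARTITION (PARTITION p20120616 VALUES LESS THAN (TO_DAYS('2012-06-17')));

    -- rolling 24 hourly partitions into one daily partition is usually done
    -- with ALTER TABLE ... REORGANIZE PARTITION ... INTO (...)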


 -Original Message-
 From: Johan De Meersman [mailto:vegiv...@tuxera.be]
 Sent: Wednesday, June 13, 2012 6:20 AM
 To: Manivannan S.
 Cc: mysql@lists.mysql.com
 Subject: Re: NoSQL help


 - Original Message -
  From: Manivannan S. 
  manivanna...@spanservices.com
 
  Hi all,
 
  [lots of data]
  [slow reports]
  [wooo NoSQL magic]

 Not that I want to discourage you, but my standard first question is
 why do you think NoSQL (let alone any specific product) is the right
 solution? :-)

 Don't get me wrong, it might be; but from what little I now know about
 your environment, it sounds like applying some data warehousing
 techniques might suffice - and being the cynical dinosaur that I am, I
 have a healthy reluctance about welding new technology onto a stable
 environment.

 To speed up reporting (and note that these techniques are often applied
 even when implementing NoSQL solutions, too) it is usually a good first
 step to set up a process of data summarization.

 Basically, you pre-calculate averages, medians, groupings, whatever you
 need for your reports; and your job also saves the last record IDs it's
 processed; then on the next run, you only read the new records and
 update your summary tables to incorporate the new data.

 Suppose I have a table like this:

 ID | Val
 
  1 1
  2 7
  3 5
  413

 I want to report the average on a daily basis, and calculating that
 over those rows is unbearably slow because I'm running the process on a
 wristwatch from 1860 :-)

 So I get a summary table, calculate (1+7+5+13)/4 = 6.5 and that then
 gets a record saying this:

 Avg | elementCount | lastSeen
 -
 6.5  4  4

 Now, over the course of the day, the elements 4, 17 and 2 get added
 with sequential row numbers. Instead of calculating
 (1+7+5+13+4+17+2)/7, which would be slow; I can substitute the already
 summarized data by Avg*elementCount. Thus, I calculate (6.5*4 +
 4+17+2)/7 = 7, which is a lot faster, and my summary table now looks
 like this:

 Avg | elementCount | lastSeen
 -
   7  7  7

 This is of course a stupid example, but it saves you a lot of time if
 you already have the summary of several thousand elements and only need
 to update it for a handful. Similar tricks are possible for a lot of
 typical reporting stuff - you don't need to re-calculate data for past
 months over and over again, for instance - and that's what makes your
 reports run fast.
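
 A small SQL sketch of that incremental update, storing the running SUM and
 COUNT as Rick suggests (all names are made up for the example; the guard for
 the "no new rows" case is omitted for brevity):

     -- 1) aggregate only the rows added since the last run
     SELECT SUM(val), COUNT(*), MAX(id)
       INTO @new_sum, @new_cnt, @new_last
       FROM raw_rows
      WHERE id > (SELECT lastSeen FROM daily_summary WHERE dy = CURRENT_DATE);

     -- 2) fold them into the stored summary row
     UPDATE daily_summary
        SET sum_val  = sum_val  + @new_sum,
            row_cnt  = row_cnt  + @new_cnt,
            lastSeen = @new_last
      WHERE dy = CURRENT_DATE;

     -- the average is then computed on demand as sum_val / row_cnt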


 Just my 2 cents :-)
 /johan

 --
 Bier met grenadyn
 Is als mosterd by den wyn
 Sy die't drinkt, is eene kwezel
 Hy die't drinkt, is ras een ezel

 --
 MySQL General Mailing List
 For list archives: http://lists.mysql.com/mysql
 To unsubscribe:http://lists.mysql.com/mysql


Re: NoSQL help

2012-06-14 Thread Ananda Kumar
Can you share the SQL, the explain plan, the indexes, etc., and the
SHOW FULL PROCESSLIST output while the SQLs are running?

On Thu, Jun 14, 2012 at 3:03 PM, Manivannan S. 
manivanna...@spanservices.com wrote:

 I tried with myisam engine also. But it also taking more time to generate
 the report. In my database I am having 8 innodb tables and at the same time
 I am joining 4 tables to get the report.

 I am maintaining 60days records because the user will try to generate the
 report out of 60 days in terms of second, minute, hourly, weekly and
 Monthly report also.

 From: Ananda Kumar [mailto:anan...@gmail.com]
 Sent: Thursday, June 14, 2012 12:32 AM
 To: Rick James
 Cc: Johan De Meersman; Manivannan S.; mysql@lists.mysql.com
 Subject: Re: NoSQL help

 Did you try with myisam tables.
 They are supposed to be good for reporting requirement
 On Wed, Jun 13, 2012 at 11:52 PM, Rick James rja...@yahoo-inc.com wrote:
 I'll second Johan's comments.

 Count the disk hits!

 One minor change:  Don't store averages in the summary table; instead
 store the SUM().  That lets you get the mathematically correct AVERAGE over
 any time range via
 SUM(sum_foo) / SUM(count_foo)

 Switching between MySQL and Mongo requires rewriting _all_ of the relevant
 code.

 <opinion> NoSQL will be no better than MySQL for 150GB. </opinion>  Count
 the disk hits!

 I recently built a system that topped out at 350GB (90 days' data).  It
 involved hourly ingestion of a few GB of data and a variety of reports.
  The prototype showed that most reports would take about an hour to run.
  Not good.  The final product, with summary tables, lets the reports be run
 on-demand and online and each takes only a few seconds.  By careful use of
 MEMORY tables, LOAD DATA, etc, the ingestion takes 5 minutes (each hour)
 for the raw data and 2 minutes (total) for the 7 summary tables.
  PARTITIONing was vital for the design.  Once an hour a new partition is
 populated; once a day, 24 hourly partitions are rolled into a new daily
 partition and the 90-day old partition is DROPped.


  -Original Message-
  From: Johan De Meersman [mailto:vegiv...@tuxera.be]
  Sent: Wednesday, June 13, 2012 6:20 AM
  To: Manivannan S.
   Cc: mysql@lists.mysql.com
  Subject: Re: NoSQL help
 
 
  - Original Message -
    From: Manivannan S. manivanna...@spanservices.com
  
   Hi all,
  
   [lots of data]
   [slow reports]
   [wooo NoSQL magic]
 
  Not that I want to discourage you, but my standard first question is
  why do you think NoSQL (let alone any specific product) is the right
  solution? :-)
 
  Don't get me wrong, it might be; but from what little I now know about
  your environment, it sounds like applying some data warehousing
  techniques might suffice - and being the cynical dinosaur that I am, I
  have a healthy reluctance about welding new technology onto a stable
  environment.
 
  To speed up reporting (and note that these techniques are often applied
  even when implementing NoSQL solutions, too) it is usually a good first
  step to set up a process of data summarization.
 
  Basically, you pre-calculate averages, medians, groupings, whatever you
  need for your reports; and your job also saves the last record IDs it's
  processed; then on the next run, you only read the new records and
  update your summary tables to incorporate the new data.
 
  Suppose I have a table like this:
 
  ID | Val
  
   1 1
   2 7
   3 5
   413
 
  I want to report the average on a daily basis, and calculating that
  over those rows is unbearably slow because I'm running the process on a
  wristwatch from 1860 :-)
 
  So I get a summary table, calculate (1+7+5+13)/4 = 6.5 and that then
  gets a record saying this:
 
  Avg | elementCount | lastSeen
  -
  6.5  4  4
 
  Now, over the course of the day, the elements 4, 17 and 2 get added
  with sequential row numbers. Instead of calculating
  (1+7+5+13+4+17+2)/7, which would be slow; I can substitute the already
  summarized data by Avg*elementCount. Thus, I calculate (6.5*4 +
  4+17+2)/7 = 7, which is a lot faster, and my summary table now looks
  like this:
 
  Avg | elementCount | lastSeen
  -
7  7  7
 
  This is of course a stupid example, but it saves you a lot of time if
  you already have the summary of several thousand elements and only need
  to update it for a handful. Similar tricks are possible for a lot of
  typical reporting stuff - you don't need to re-calculate data for past
  months over and over again, for instance - and that's what makes your
  reports run fast.
 
 
  Just my 2 cents :-)
  /johan
 
  --
  Bier met grenadyn
  Is als mosterd by den wyn
  Sy die't drinkt, is eene kwezel
  Hy die't drinkt, is ras een ezel
 
  --
  MySQL General Mailing List
  For list archives: 

RE: NoSQL help

2012-06-14 Thread Manivannan S.
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ibf
         type: ALL
possible_keys: ibf_MsgId
          key: \N
      key_len: \N
          ref: \N
         rows: 160944
        Extra:
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: pl
         type: ref
possible_keys: idx_unique_key_ib_msg,index_message_id,index_message_processing_status
          key: idx_unique_key_ib_msg
      key_len: 180
          ref: reports.ibf.Message_Id,const
         rows: 1
        Extra: Using where; Using index
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: tl
         type: ref
possible_keys: idx_unique_key_ib_text,index_message_id
          key: idx_unique_key_ib_text
      key_len: 153
          ref: reports.pl.Message_Id
         rows: 1
        Extra:
*************************** 4. row ***************************
           id: 1
  select_type: SIMPLE
        table: xl
         type: ref
possible_keys: idx_unique_key_ib_xml,index_message_id
          key: idx_unique_key_ib_xml
      key_len: 153
          ref: reports.pl.Message_Id
         rows: 1
        Extra: Using where


This is the execution plan for 1.5 million records.



From: Ananda Kumar [mailto:anan...@gmail.com]
Sent: Thursday, June 14, 2012 3:33 PM
To: Manivannan S.
Cc: mysql@lists.mysql.com
Subject: Re: NoSQL help

can u share the sql, explain plan, indexes etc,
show full processlist out put when the sql's are running
On Thu, Jun 14, 2012 at 3:03 PM, Manivannan S. 
manivanna...@spanservices.com wrote:
I tried with myisam engine also. But it also taking more time to generate the 
report. In my database I am having 8 innodb tables and at the same time I am 
joining 4 tables to get the report.

I am maintaining 60days records because the user will try to generate the 
report out of 60 days in terms of second, minute, hourly, weekly and Monthly 
report also.

From: Ananda Kumar [mailto:anan...@gmail.com]
Sent: Thursday, June 14, 2012 12:32 AM
To: Rick James
Cc: Johan De Meersman; Manivannan S.; 
mysql@lists.mysql.com
Subject: Re: NoSQL help

Did you try with myisam tables.
They are supposed to be good for reporting requirement
On Wed, Jun 13, 2012 at 11:52 PM, Rick James 
rja...@yahoo-inc.com wrote:
I'll second Johan's comments.

Count the disk hits!

One minor change:  Don't store averages in the summary table; instead store the 
SUM().  That lets you get the mathematically correct AVERAGE over any time 
range via
SUM(sum_foo) / SUM(count_foo)

Switching between MySQL and Mongo requires rewriting _all_ of the relevant code.

<opinion> NoSQL will be no better than MySQL for 150GB. </opinion>  Count the 
disk hits!

I recently built a system that topped out at 350GB (90 days' data).  It 
involved hourly ingestion of a few GB of data and a variety of reports.  The 
prototype showed that most reports would take about an hour to run.  Not good.  
The final product, with summary tables, lets the reports be run on-demand and 
online and each takes only a few seconds.  By careful use of MEMORY tables, 
LOAD DATA, etc, the ingestion takes 5 minutes (each hour) for the raw data and 
2 minutes (total) for the 7 summary tables.  PARTITIONing was vital for the 
design.  Once an hour a new partition is populated; once a day, 24 hourly 
partitions are rolled into a new daily partition and the 90-day old partition 
is DROPped.


 -Original Message-
 From: Johan De Meersman 
 [mailto:vegiv...@tuxera.be]
 Sent: Wednesday, June 13, 2012 6:20 AM
 To: Manivannan S.
 Cc: 
 mysql@lists.mysql.com
 Subject: Re: NoSQL help


 - Original Message -
  From: Manivannan S. 
  manivanna...@spanservices.com
 
  Hi all,
 
  [lots of data]
  [slow reports]
  [wooo NoSQL magic]

 Not that I want to discourage you, but my standard first question is
 why do you think NoSQL (let alone any specific product) is the right
 solution? :-)

 Don't get me wrong, it might be; but from what little I now know about
 your environment, it sounds like applying some data warehousing
 techniques might suffice - and being the cynical dinosaur that I am, I
 have a healthy reluctance about welding new technology onto a stable
 environment.

 To speed up reporting (and note that these techniques are often applied
 even when implementing NoSQL solutions, too) it is usually a good first
 step to set up a process of data summarization.

 Basically, you pre-calculate averages, medians, groupings, whatever you
 need for your reports; and your job also saves the last record IDs it's
 processed; then on the next run, you only read the new records and
 update your summary tables to incorporate the new data.

 Suppose I have a table like this:

 ID | Val
 
  1 1
  2 7
  3 5
  413

 I want to report the average on a daily basis, and calculating that
 over those rows is unbearably slow because I'm running the process on a
 wristwatch from 1860 :-)

 So I get a summary table, calculate (1+7+5+13)/4 = 6.5 and that then
 gets a record saying this:

 Avg | elementCount | lastSeen
 

RE: NoSQL help

2012-06-14 Thread Manivannan S.
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ibf
         type: ALL
possible_keys: ibf_MsgId
          key: \N
      key_len: \N
          ref: \N
         rows: 160944
        Extra:
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: pl
         type: ref
possible_keys: idx_unique_key_ib_msg,index_message_id,index_message_processing_status
          key: idx_unique_key_ib_msg
      key_len: 180
          ref: reports.ibf.Message_Id,const
         rows: 1
        Extra: Using where; Using index
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: tl
         type: ref
possible_keys: idx_unique_key_ib_text,index_message_id
          key: idx_unique_key_ib_text
      key_len: 153
          ref: reports.pl.Message_Id
         rows: 1
        Extra:
*************************** 4. row ***************************
           id: 1
  select_type: SIMPLE
        table: xl
         type: ref
possible_keys: idx_unique_key_ib_xml,index_message_id
          key: idx_unique_key_ib_xml
      key_len: 153
          ref: reports.pl.Message_Id
         rows: 1
        Extra: Using where


Sorry for the previous mail. This is my execution plan for 1.5 million 
records.

From: Ananda Kumar [mailto:anan...@gmail.com]
Sent: Thursday, June 14, 2012 3:33 PM
To: Manivannan S.
Cc: mysql@lists.mysql.com
Subject: Re: NoSQL help

can u share the sql, explain plan, indexes etc,
show full processlist out put when the sql's are running
On Thu, Jun 14, 2012 at 3:03 PM, Manivannan S. 
manivanna...@spanservices.com wrote:
I tried with myisam engine also. But it also taking more time to generate the 
report. In my database I am having 8 innodb tables and at the same time I am 
joining 4 tables to get the report.

I am maintaining 60days records because the user will try to generate the 
report out of 60 days in terms of second, minute, hourly, weekly and Monthly 
report also.

From: Ananda Kumar [mailto:anan...@gmail.com]
Sent: Thursday, June 14, 2012 12:32 AM
To: Rick James
Cc: Johan De Meersman; Manivannan S.; 
mysql@lists.mysql.com
Subject: Re: NoSQL help

Did you try with myisam tables.
They are supposed to be good for reporting requirement
On Wed, Jun 13, 2012 at 11:52 PM, Rick James 
rja...@yahoo-inc.com wrote:
I'll second Johan's comments.

Count the disk hits!

One minor change:  Don't store averages in the summary table; instead store the 
SUM().  That lets you get the mathematically correct AVERAGE over any time 
range via
SUM(sum_foo) / SUM(count_foo)

Switching between MySQL and Mongo requires rewriting _all_ of the relevant code.

<opinion> NoSQL will be no better than MySQL for 150GB. </opinion>  Count the 
disk hits!

I recently built a system that topped out at 350GB (90 days' data).  It 
involved hourly ingestion of a few GB of data and a variety of reports.  The 
prototype showed that most reports would take about an hour to run.  Not good.  
The final product, with summary tables, lets the reports be run on-demand and 
online and each takes only a few seconds.  By careful use of MEMORY tables, 
LOAD DATA, etc, the ingestion takes 5 minutes (each hour) for the raw data and 
2 minutes (total) for the 7 summary tables.  PARTITIONing was vital for the 
design.  Once an hour a new partition is populated; once a day, 24 hourly 
partitions are rolled into a new daily partition and the 90-day old partition 
is DROPped.


 -Original Message-
 From: Johan De Meersman 
 [mailto:vegiv...@tuxera.be]
 Sent: Wednesday, June 13, 2012 6:20 AM
 To: Manivannan S.
 Cc: 
 mysql@lists.mysql.com
 Subject: Re: NoSQL help


 - Original Message -
  From: Manivannan S. 
  manivanna...@spanservices.com
 
  Hi all,
 
  [lots of data]
  [slow reports]
  [wooo NoSQL magic]

 Not that I want to discourage you, but my standard first question is
 why do you think NoSQL (let alone any specific product) is the right
 solution? :-)

 Don't get me wrong, it might be; but from what little I now know about
 your environment, it sounds like applying some data warehousing
 techniques might suffice - and being the cynical dinosaur that I am, I
 have a healthy reluctance about welding new technology onto a stable
 environment.

 To speed up reporting (and note that these techniques are often applied
 even when implementing NoSQL solutions, too) it is usually a good first
 step to set up a process of data summarization.

 Basically, you pre-calculate averages, medians, groupings, whatever you
 need for your reports; and your job also saves the last record IDs it's
 processed; then on the next run, you only read the new records and
 update your summary tables to incorporate the new data.

 Suppose I have a table like this:

 ID | Val
 
  1 1
  2 7
  3 5
  413

 I want to report the average on a daily basis, and calculating that
 over those rows is unbearably slow because I'm running the process on a
 wristwatch from 1860 :-)

 So I get a summary table, calculate (1+7+5+13)/4 = 6.5 and that then
 gets a record saying this:

 Avg | elementCount | lastSeen
 -
 6.5  4  4

 

ANN: Data Wizard for MySQL 12.6 released

2012-06-14 Thread SQL Maestro Team
Hi!

SQL Maestro Group announces the release of Data Wizard for MySQL 12.6,
a powerful Windows GUI solution for MySQL data management.

The new version is immediately available at
http://www.sqlmaestro.com/products/mysql/datawizard/

Data Wizard for MySQL provides you with a number of easy-to-use
wizards to convert any ADO-compatible database to the MySQL database,
import data into MySQL tables, export data from tables, views and
queries to most popular file formats as well as generate data-driven
ASP.NET pages for your MySQL database.

Top 10 new features
=

1. The SQL Dump wizard. This tool allows you to export the data from
selected tables and views as a set of INSERT statements.

2. Data Pump: User-defined data type mappings.

3. Data Pump: Recognizing autoincrement columns in the source database
and converting them in the best possible way.

4. Data Pump: Data import using bulk-loading commands.

5. Data Import: Auto arranging target tables by dependencies.

6. Data Import: Import data from CSV files that contain multi-line strings.

7. Data Export: Decimal and thousand separators for the generated
files (when applicable).

8. All tools: Correct exit status (0 on success, 1 on error) for the
command-line mode. This return code can be checked in batch files
using the ERRORLEVEL variable.

9. All tools: Progress windows are redesigned in the modern style and
become more informative.

10. All tools: Log files become more structured and detailed.

In addition to this, several bugs have been fixed and some other minor
improvements and corrections have been made. Full press-release is
available at:
http://www.sqlmaestro.com/news/company/data_wizards_12_6_released/

Background information:
---
SQL Maestro Group offers complete database admin, development and
management tools for MySQL, SQL Server, PostgreSQL, Oracle, DB2,
SQLite, SQL Anywhere, Firebird and MaxDB providing the highest
performance, scalability and reliability to meet the requirements of
today's database applications.

Sincerely yours,
The SQL Maestro Group Team
http://www.sqlmaestro.com

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: NoSQL help

2012-06-14 Thread Ananda Kumar
As seen below, a full table scan is happening on table ibf.
Can you share the indexes on this table and also the complete SQL?
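
For anyone following the thread, the usual way to gather what is being asked
for looks like this ("ibf" is the alias from the EXPLAIN output; substitute the
real table name):

    SHOW CREATE TABLE ibf\G   -- table definition, including its indexes
    SHOW INDEX FROM ibf;      -- per-index cardinality
    EXPLAIN SELECT ... \G     -- the full query's plan in vertical format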

On Thu, Jun 14, 2012 at 4:17 PM, Manivannan S. 
manivanna...@spanservices.com wrote:

 [EXPLAIN output quoted from the previous message; see the plan shown there]

 Sorry for the previous mail….. this is my execution plan for 1.5 million
 records….

  From: Ananda Kumar [mailto:anan...@gmail.com]
  Sent: Thursday, June 14, 2012 3:33 PM
  To: Manivannan S.
  Cc: mysql@lists.mysql.com
  Subject: Re: NoSQL help

 can u share the sql, explain plan, indexes etc,

 show full processlist out put when the sql's are running

 On Thu, Jun 14, 2012 at 3:03 PM, Manivannan S. 
 manivanna...@spanservices.com wrote:

 I tried with myisam engine also. But it also taking more time to generate
 the report. In my database I am having 8 innodb tables and at the same time
 I am joining 4 tables to get the report.

 I am maintaining 60days records because the user will try to generate the
 report out of 60 days in terms of second, minute, hourly, weekly and
 Monthly report also.

 From: Ananda Kumar [mailto:anan...@gmail.com]
 Sent: Thursday, June 14, 2012 12:32 AM
 To: Rick James
 Cc: Johan De Meersman; Manivannan S.; mysql@lists.mysql.com
 Subject: Re: NoSQL help


 Did you try with myisam tables.
 They are supposed to be good for reporting requirement

  On Wed, Jun 13, 2012 at 11:52 PM, Rick James rja...@yahoo-inc.com wrote:
 I'll second Johan's comments.

 Count the disk hits!

 One minor change:  Don't store averages in the summary table; instead
 store the SUM().  That lets you get the mathematically correct AVERAGE over
 any time range via
 SUM(sum_foo) / SUM(count_foo)

 Switching between MySQL and Mongo requires rewriting _all_ of the relevant
 code.

 opinion NoSQL will be no better than MySQL for 150GB. /opinion  Count
 the disk hits!

 I recently built a system that topped out at 350GB (90 days' data).  It
 involved hourly ingestion of a few GB of data and a variety of reports.
  The prototype showed that most reports would take about an hour to run.
  Not good.  The final product, with summary tables, lets the reports be run
 on-demand and online and each takes only a few seconds.  By careful use of
 MEMORY tables, LOAD DATA, etc, the ingestion takes 5 minutes (each hour)
 for the raw data and 2 minutes (total) for the 7 summary tables.
  PARTITIONing was vital for the design.  Once an hour a new partition is
 populated; once a day, 24 hourly partitions are rolled into a new daily
 partition and the 90-day old partition is DROPped.


  -Original Message-

   From: Johan De Meersman [mailto:vegiv...@tuxera.be]
  Sent: Wednesday, June 13, 2012 6:20 AM
  To: Manivannan S.

   Cc: mysql@lists.mysql.com
  Subject: Re: NoSQL help
 
 
  - Original Message -

    From: Manivannan S. manivanna...@spanservices.com
  
   Hi all,
  
   [lots of data]
   [slow reports]
   [wooo NoSQL magic]
 
  Not that I want to discourage you, but my standard first question is
  why do you think NoSQL (let alone any specific product) is the right
  solution? :-)
 
  Don't get me wrong, it might be; but from what little I now know about
  your environment, it sounds like applying some data warehousing
  techniques might suffice - and being the cynical dinosaur that I am, I
  have a healthy reluctance about welding new technology onto a stable
  environment.
 
  To speed up reporting (and note that these techniques are often applied
  even when implementing NoSQL solutions, too) it is usually a good first
  step to set up a process of data summarization.
 
  Basically, you pre-calculate averages, medians, groupings, whatever you
  need for your reports; and your job also saves the last record IDs it's
  processed; then on the next run, you only read the new records and
  update your summary tables to incorporate the new data.
 
  Suppose I have a table like this:
 
  ID | Val
  
   1 1
   2 7
   3 5
   413
 
  I want to report the average on a daily basis, and calculating that
  over those 

Re: Minor collation question

2012-06-14 Thread Johan De Meersman
- Original Message -
 From: Rick James rja...@yahoo-inc.com

 A warning about ß -- There was a change in the collation at 5.1.24.
 Search http://mysql.rjweb.org/doc.php/charcoll for 'German'; there is
 a brief discussion near the end.

Aha, also a very good bit of information, thank you.
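
For reference, a quick way to check how a given server treats the sharp s under
the two common utf8 collations (results differ across versions, which is
exactly the 5.1.24 change mentioned above):

    SET NAMES utf8;
    SELECT 'ß' = 's'  COLLATE utf8_general_ci AS eq_s_general_ci,
           'ß' = 'ss' COLLATE utf8_unicode_ci AS eq_ss_unicode_ci;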

The unix boys had started their own upgrade cycle in the meantime, from Debian 
Lenny to Squeeze, so were unwittingly already upgrading to 5.1. A swift 
explanation of the problem, and the subliminal threat of a swift kick in the 
valuables, got them to suspend that trajectory until I can build myself 
Debian-compliant 5.5 packages for Squeeze :-D

Any pointers towards the latter (no, Dotdeb's will not do - they don't 
magically replace the native Squeeze ones) would also be appreciated; if not 
I'll be rolling my own.

--
Bier met grenadyn
Is als mosterd by den wyn
Sy die't drinkt, is eene kwezel
Hy die't drinkt, is ras een ezel

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: Minor collation question

2012-06-14 Thread walter

http://bugs.mysql.com/bug.php?id=27877

W.T.

 - Original Message -
 From: Rick James rja...@yahoo-inc.com

 A warning about ß -- There was a change in the collation at 5.1.24.
 Search http://mysql.rjweb.org/doc.php/charcoll for 'German'; there is
 a brief discussion near the end.

 Aha, also a very good bit of information, thank you.

 The unix boys had started their own upgrade cycle in the mean time from
 Debian Lenny to Squeeze, so were unwittingly already upgrading to 5.1. A
 swift explanation of the problem and the subliminal thread of a swift kick
 in the valueables got them to suspend that trajectory until I build myself
 Debian-compliant 5.5 packages for Squeeze :-D

 Any pointers towards the latter (no, Dotdeb's will not do - they don't
 magically replace the native squeeze ones) would also be appreciated; if
 not I'll be rolling my own.

 --
 Bier met grenadyn
 Is als mosterd by den wyn
 Sy die't drinkt, is eene kwezel
 Hy die't drinkt, is ras een ezel

 --
 MySQL General Mailing List
 For list archives: http://lists.mysql.com/mysql
 To unsubscribe:http://lists.mysql.com/mysql





-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



RE: NoSQL help

2012-06-14 Thread Rick James
SHOW CREATE TABLE
SHOW TABLE STATUS
EXPLAIN SELECT ... \G


 -Original Message-
 From: Ananda Kumar [mailto:anan...@gmail.com]
 Sent: Thursday, June 14, 2012 4:04 AM
 To: Manivannan S.
 Cc: mysql@lists.mysql.com
 Subject: Re: NoSQL help
 
 As seen below,
 Full table scan is happening on table ibf.
 Can share the indexes on this table and also the complete sql
 
 On Thu, Jun 14, 2012 at 4:17 PM, Manivannan S. 
 manivanna...@spanservices.com wrote:
 
  [EXPLAIN output quoted from the earlier message; see the plan shown there]
 
  Sorry for the previous mail. this is my execution plan for 1.5
  million
  records
 
   From: Ananda Kumar [mailto:anan...@gmail.com]
   Sent: Thursday, June 14, 2012 3:33 PM
   To: Manivannan S.
   Cc: mysql@lists.mysql.com
   Subject: Re: NoSQL help
 
  can u share the sql, explain plan, indexes etc,
 
  show full processlist out put when the sql's are running
 
  On Thu, Jun 14, 2012 at 3:03 PM, Manivannan S. 
  manivanna...@spanservices.com wrote:
 
  I tried with myisam engine also. But it also taking more time to
  generate the report. In my database I am having 8 innodb tables and
 at
  the same time I am joining 4 tables to get the report.
 
  I am maintaining 60days records because the user will try to generate
  the report out of 60 days in terms of second, minute, hourly, weekly
  and Monthly report also.
 
  From: Ananda Kumar [mailto:anan...@gmail.com]
  Sent: Thursday, June 14, 2012 12:32 AM
  To: Rick James
  Cc: Johan De Meersman; Manivannan S.; mysql@lists.mysql.com
  Subject: Re: NoSQL help
 
 
  Did you try with myisam tables.
  They are supposed to be good for reporting requirement
 
   On Wed, Jun 13, 2012 at 11:52 PM, Rick James rja...@yahoo-inc.com wrote:
  I'll second Johan's comments.
 
  Count the disk hits!
 
  One minor change:  Don't store averages in the summary table; instead
  store the SUM().  That lets you get the mathematically correct
 AVERAGE
  over any time range via
  SUM(sum_foo) / SUM(count_foo)
 
  Switching between MySQL and Mongo requires rewriting _all_ of the
  relevant code.
 
   <opinion> NoSQL will be no better than MySQL for 150GB. </opinion>
  Count the disk hits!
 
  I recently built a system that topped out at 350GB (90 days' data).
  It involved hourly ingestion of a few GB of data and a variety of
 reports.
   The prototype showed that most reports would take about an hour to
 run.
   Not good.  The final product, with summary tables, lets the reports
  be run on-demand and online and each takes only a few seconds.  By
  careful use of MEMORY tables, LOAD DATA, etc, the ingestion takes 5
  minutes (each hour) for the raw data and 2 minutes (total) for the 7
 summary tables.
   PARTITIONing was vital for the design.  Once an hour a new partition
  is populated; once a day, 24 hourly partitions are rolled into a new
  daily partition and the 90-day old partition is DROPped.
 
 
   -Original Message-
 
    From: Johan De Meersman [mailto:vegiv...@tuxera.be]
   Sent: Wednesday, June 13, 2012 6:20 AM
   To: Manivannan S.
 
    Cc: mysql@lists.mysql.com
   Subject: Re: NoSQL help
  
  
   - Original Message -
 
     From: Manivannan S. manivanna...@spanservices.com
   
Hi all,
   
[lots of data]
[slow reports]
[wooo NoSQL magic]
  
   Not that I want to discourage you, but my standard first question
 is
   why do you think NoSQL (let alone any specific product) is the
   right solution? :-)
  
   Don't get me wrong, it might be; but from what little I now know
   about your environment, it sounds like applying some data
   warehousing techniques might suffice - and being the cynical
   dinosaur that I am, I have a healthy reluctance about welding new
   technology onto a stable environment.
  
   To speed up reporting (and note that these techniques are often
   applied even when implementing NoSQL solutions, too) it is usually
 a
   good first step to set up a process of data 

RE: NoSQL help

2012-06-14 Thread Rick James
 the report out of 60 days in terms of second, minute, hourly, weekly and 
 Monthly report also

1-second reports??  Wouldn't that have millions of rows?  Is there anything 
useful to summarize?

For Second and Minute over narrow ranges, you might do fine reading the raw 
data.

For hourly reports an hourly summary table might be 'right'.

For daily, weekly, and monthly, have a daily summary table.  There would be one 
row per widget per day.
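
A possible shape for that daily table ("widget" is a placeholder; real names
would differ):

    CREATE TABLE summary_daily (
      dy        DATE   NOT NULL,
      widget_id INT    NOT NULL,
      sum_foo   BIGINT NOT NULL,
      count_foo INT    NOT NULL,
      PRIMARY KEY (widget_id, dy)   -- one row per widget per day
    ) ENGINE=InnoDB;

Weekly and monthly reports then aggregate this table rather than the raw data.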


 -Original Message-
 From: Manivannan S. [mailto:manivanna...@spanservices.com]
 Sent: Thursday, June 14, 2012 2:33 AM
 To: mysql@lists.mysql.com
 Subject: RE: NoSQL help
 
 I tried with myisam engine also. But it also taking more time to
 generate the report. In my database I am having 8 innodb tables and at
 the same time I am joining 4 tables to get the report.
 
 I am maintaining 60days records because the user will try to generate
 the report out of 60 days in terms of second, minute, hourly, weekly
 and Monthly report also.
 
 From: Ananda Kumar [mailto:anan...@gmail.com]
 Sent: Thursday, June 14, 2012 12:32 AM
 To: Rick James
 Cc: Johan De Meersman; Manivannan S.; mysql@lists.mysql.com
 Subject: Re: NoSQL help
 
 Did you try with myisam tables.
 They are supposed to be good for reporting requirement On Wed, Jun 13,
  2012 at 11:52 PM, Rick James rja...@yahoo-inc.com wrote:
 I'll second Johan's comments.
 
 Count the disk hits!
 
 One minor change:  Don't store averages in the summary table; instead
 store the SUM().  That lets you get the mathematically correct AVERAGE
 over any time range via
 SUM(sum_foo) / SUM(count_foo)
 
 Switching between MySQL and Mongo requires rewriting _all_ of the
 relevant code.
 
  <opinion> NoSQL will be no better than MySQL for 150GB. </opinion>
 Count the disk hits!
 
 I recently built a system that topped out at 350GB (90 days' data).  It
 involved hourly ingestion of a few GB of data and a variety of
 reports.  The prototype showed that most reports would take about an
 hour to run.  Not good.  The final product, with summary tables, lets
 the reports be run on-demand and online and each takes only a few
 seconds.  By careful use of MEMORY tables, LOAD DATA, etc, the
 ingestion takes 5 minutes (each hour) for the raw data and 2 minutes
 (total) for the 7 summary tables.  PARTITIONing was vital for the
 design.  Once an hour a new partition is populated; once a day, 24
 hourly partitions are rolled into a new daily partition and the 90-day
 old partition is DROPped.
 
 
  -Original Message-
  From: Johan De Meersman
   [mailto:vegiv...@tuxera.be]
  Sent: Wednesday, June 13, 2012 6:20 AM
  To: Manivannan S.
   Cc: mysql@lists.mysql.com
  Subject: Re: NoSQL help
 
 
  - Original Message -
   From: Manivannan S.
  
  manivanna...@spanservices.com
   
  
   Hi all,
  
   [lots of data]
   [slow reports]
   [wooo NoSQL magic]
 
  Not that I want to discourage you, but my standard first question is
  why do you think NoSQL (let alone any specific product) is the right
  solution? :-)
 
  Don't get me wrong, it might be; but from what little I now know
 about
  your environment, it sounds like applying some data warehousing
  techniques might suffice - and being the cynical dinosaur that I am,
 I
  have a healthy reluctance about welding new technology onto a stable
  environment.
 
  To speed up reporting (and note that these techniques are often
  applied even when implementing NoSQL solutions, too) it is usually a
  good first step to set up a process of data summarization.
 
  Basically, you pre-calculate averages, medians, groupings, whatever
  you need for your reports; and your job also saves the last record
 IDs
  it's processed; then on the next run, you only read the new records
  and update your summary tables to incorporate the new data.
 
  Suppose I have a table like this:
 
  ID | Val
  
   1 1
   2 7
   3 5
   413
 
  I want to report the average on a daily basis, and calculating that
  over those rows is unbearably slow because I'm running the process on
  a wristwatch from 1860 :-)
 
  So I get a summary table, calculate (1+7+5+13)/4 = 6.5 and that then
  gets a record saying this:
 
  Avg | elementCount | lastSeen
  -
  6.5  4  4
 
  Now, over the course of the day, the elements 4, 17 and 2 get added
  with sequential row numbers. Instead of calculating
  (1+7+5+13+4+17+2)/7, which would be slow; I can substitute the
 already
  summarized data by Avg*elementCount. Thus, I calculate (6.5*4 +
  4+17+2)/7 = 7, which is a lot faster, and my summary table now looks
  like this:
 
  Avg | elementCount | lastSeen
  -
7  7  7
 
  This is of course a stupid example, but it saves you a lot of time if
  you already have the summary of several thousand elements and only
  

RE: Is there any performance difference, maintaining separate ibdata files for each and every table instead of having one single tablespace for all databases.

2012-06-14 Thread Rick James
There should be little or no difference.

If you are using thousands of tables, you might encounter overhead in opening 
the .ibd files.
If you are tight on disk space, a single ibdata1 might be more efficient at 
reusing free blocks.
OTOH, if you shrink or drop a big table, the freed space is not returned to the 
OS if you have a single ibdata1.

In most cases, I recommend innodb_file_per_table=1.
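
For completeness, the setting lives in my.cnf and only affects tables created
(or rebuilt) after the change; existing tables stay inside ibdata1 until they
are rebuilt:

    [mysqld]
    innodb_file_per_table = 1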

 -Original Message-
 From: Pothanaboyina Trimurthy [mailto:skd.trimur...@gmail.com]
 Sent: Tuesday, May 15, 2012 4:58 AM
 To: mysql@lists.mysql.com
 Subject: Is there any performance difference, maintaining separate
 ibdata files for each and every table insted of having one singl tabale
 for all databases.
 
 hi every one
 
    Is there any performance difference in maintaining separate ibdata
  files for each and every table instead of having one single tablespace for
  all databases, for the InnoDB Storage Engine?
 
 please let me know the difference.
 
 --
 3murthy
 
 --
 MySQL General Mailing List
 For list archives: http://lists.mysql.com/mysql
 To unsubscribe:http://lists.mysql.com/mysql


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: Is there any performance difference, maintaining separate ibdata files for each and every table instead of having one single tablespace for all databases.

2012-06-14 Thread Prabhat Kumar
There are performance issues with a larger number of datafiles compared to a
single one; that is the reason innodb_file_per_table is not the default option.

Also, with innodb_file_per_table you will use more resources, and there can be
a problem if you have _many_ tables: you obviously have to keep the number of
open files (open tables / table_cache / table_definition_cache) under control.
(Of course, MyISAM tables always had that issue.)

Next, with a single file, tablespace can be shared between rows of different
tables and schemas. This means less wasted tablespace, whereas with
innodb_file_per_table each table may have unused tablespace which can only be
utilized by rows of the same table. This means (sometimes much) more wasted
tablespace.

But innodb_file_per_table is still preferable: the shared-file approach led to
a disk space issue, since the ibdata1 file grew to 90G+ and it was not possible
to defragment the tablespace using the ALTER method (in fact that is hardly
possible in a production environment, since it leads to huge downtime).
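
As a sketch of the rebuild referred to above: once innodb_file_per_table is on,
a bloated table can be rebuilt into its own .ibd file and the freed space
returned to the filesystem (the table name is hypothetical):

    -- copies the table into a new tablespace; locks it for the duration
    ALTER TABLE big_table ENGINE=InnoDB;
    -- OPTIMIZE TABLE big_table;  -- does the same thing for InnoDB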

On Thu, Jun 14, 2012 at 12:19 PM, Rick James rja...@yahoo-inc.com wrote:

 There should be little or no difference.

 If you are using thousands of tables, you might encounter overhead in
 opening the .ibd files.
 If you are tight on disk space, a single ibdata1 might be more efficient
 at reusing free blocks.
 OTOH, if you shrink or drop a big table, the freed space is not returned
 to the OS if you have a singe ibdata1.

 In most cases, I recommend innodb_file_per_table=1.

  -Original Message-
  From: Pothanaboyina Trimurthy [mailto:skd.trimur...@gmail.com]
  Sent: Tuesday, May 15, 2012 4:58 AM
  To: mysql@lists.mysql.com
  Subject: Is there any performance difference, maintaining separate
  ibdata files for each and every table insted of having one singl tabale
   for all databases.
 
  hi every one
 
Is there any performance difference, maintaining separate ibdata
  files for each and every table insted of having one singl tabale for
  all databases, for InnoDB Storage Engine.
 
  please let me know the difference.
 
  --
  3murthy
 
  --
  MySQL General Mailing List
  For list archives: http://lists.mysql.com/mysql
  To unsubscribe:http://lists.mysql.com/mysql


 --
 MySQL General Mailing List
 For list archives: http://lists.mysql.com/mysql
 To unsubscribe:http://lists.mysql.com/mysql




-- 
Best Regards,

Prabhat Kumar
MySQL DBA

My Blog: http://adminlinux.blogspot.com
My LinkedIn: http://www.linkedin.com/in/profileprabhat


Re: i need advice on redundancy of mysql server.

2012-06-14 Thread Joey L
Guys - thanks for the replies - are any of you on odesk or elancer.com?
thanks

On Wed, Jun 13, 2012 at 7:28 PM, Rick James rja...@yahoo-inc.com wrote:
 I prefer:
 * Master-Master (dual master) but write to only one of them.  (Writing to 
 both can lead to duplicate keys, etc., unless you are very careful in your 
 code.)
 * Have the two Masters geographically separate.  (Think tornados, floods, 
 earthquakes, etc)
 * Have Slave(s) hanging off each master -- (1) for read scaling, and (2) to 
 avoid a major outage when one Master goes down and you need to take the other 
 one down to clone it.

 Another thing to consider:
 Backing up via a LVM snapshot requires only a minute or so of downtime, 
 regardless of dataset size.
 Percona's XtraBackup is also very good.

 I also agree that MyISAM is not best.  But, caution, InnoDB's disk footprint 
 is 2x-3x bigger than MyISAM's.

 You can load balance reads (among slaves and, optionally, masters); you 
 cannot load balance writes.

 Any number of Apache servers can talk to MySQL.  But watch out -- MaxClients 
 should not be so large that it swamps max_connections.

 Load balancing:
 DNS is the simple way to load balance Apache.
 There are low-impact software solutions.
 There are hardware solutions.  (This is what I am used to at work; it is 
 severe overkill for most users.)

 Bottom line:  There is no best or perfect solution.  First decide what 
 'keeps you up at night'.

 -Original Message-
 From: Joey L [mailto:mjh2...@gmail.com]
 Sent: Monday, June 11, 2012 7:26 AM
 To: mysql@lists.mysql.com
 Subject: i need advice on redundancy of mysql server.

 I am running a site with about 50gig myisam databases which are the
 backend to different websites.
 I can not afford any downtime and the data is realtime.

 What is the best method for this setup? master-master or master-slave?

 What are the best utilities to create and maintain this setup?  as far
 as load balancing between the two physical servers that i am running.
 I am currently working with percona utilities - is there something
 better ?
 what would you use to load balance mysql ? what would you use to load
 balance apache.


 thanks

 --
 MySQL General Mailing List
 For list archives: http://lists.mysql.com/mysql
 To unsubscribe:    http://lists.mysql.com/mysql


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



console input

2012-06-14 Thread Gary Aitken
Hi all,

I've looked high and low for what I hope is a trivial answer.

I was trying to load a table using LOAD DATA INFILE.  Unfortunately, it craps 
out because there are some duplicate primary keys.  Not surprising, as the 
source did not enforce uniqueness.  My problem is that LOAD DATA simply dies 
without indicating which line of the input file was in error; the error message 
refers to line 3, which is not even the line containing the LOAD DATA 
statement.

I can get the table loaded by specifying REPLACE INTO TABLE, but that still 
leaves me with not knowing where the duplicate records are.

So...  I wanted to read the data line at a time and use a plain INSERT 
statement.  That way I could check for duplicate keys and discover where the 
duplicate records are.  However, I can't find a way to read input from the 
console or a file.  What am I missing?  I know I could write a java or C++ 
program to do this, but it seems like overkill for what should be a trivial 
task.
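
One pure-SQL way to locate the duplicates, sketched here with made-up names:
load into a keyless staging table first, then group on the intended primary
key.

    CREATE TABLE staging LIKE target;
    ALTER TABLE staging DROP PRIMARY KEY;   -- assumes the key is not AUTO_INCREMENT

    LOAD DATA INFILE '/tmp/data.txt' INTO TABLE staging;

    -- the offending key values and how many times each occurs
    SELECT pk_col, COUNT(*) AS dups
      FROM staging
     GROUP BY pk_col
    HAVING COUNT(*) > 1;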

Thanks for any pointers,

Gary

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: console input

2012-06-14 Thread Keith Keller
On 2012-06-14, Gary Aitken my...@dreamchaser.org wrote:

 So...  I wanted to read the data line at a time and use a plain INSERT 
 statement.  That way I could check for duplicate keys and discover where the 
 duplicate records are.  However, I can't find a way to read input from the 
 console or a file.  What am I missing?  I know I could write a java or C++ 
 program to do this, but it seems like overkill for what should be a trivial 
 task.

I don't know of any stock mysql program that does such a thing.  You
could write a Perl or Python program for this task; this program would
probably be fairly short.

--keith


-- 
kkel...@wombat.san-francisco.ca.us



-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql