Re: Splitting data across tables

2004-07-26 Thread Egor Egorov
Andrew Hill [EMAIL PROTECTED] wrote:

 Perhaps it would be better to insert the timestamp, rather than letting
 MySQL set it. 

Sure. This gives consistency in your situation. 

 Or, is there perhaps a way to let MySQL select which table
 to perform the insert into, based on the time?

No. 






-- 
For technical support contracts, goto https://order.mysql.com/?ref=ensita
This email is sponsored by Ensita.net http://www.ensita.net/
   __  ___ ___   __
  /  |/  /_ __/ __/ __ \/ /Egor Egorov
 / /|_/ / // /\ \/ /_/ / /__   [EMAIL PROTECTED]
/_/  /_/\_, /___/\___\_\___/   MySQL AB / Ensita.net
   ___/   www.mysql.com




-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



RE: Splitting data across tables

2004-07-22 Thread Andrew Hill
 -Original Message-
 From: William H. Bowers [mailto:[EMAIL PROTECTED] 
 Sent: 22 July 2004 03:55
 To: [EMAIL PROTECTED]
 Subject: re: Splitting data across tables
 
 If indexing seems to be the cause of the slowdown, it may be 
 faster to drop the indexes, add the new data, and then 
 rebuild the indexes.  This is especially helpful with 
 physical indexing.

I don't think that this would be a good idea for us - we add the data
continuiously, and aggregate at least every hour. I'm not confident that
we could drop the indexes and then recreate them in less than 60
minutes...

Cheers,

--
Andrew Hill
Developer
Awarez Ltd.
Kirkman House, 12-14 Whitfield Street, London W1T 2RF
T: +44 (0)20 7299 7370  F: +44 (0)20 7299 7374

--
The information in this email is confidential and is intended solely
for the addressee. It may be legally privileged. The contents are not
to be disclosed to anyone other than the addressee and access to this
email by anyone else is unauthorised.
Unauthorised recipients are requested to preserve the confidentiality
of this email and to advise the sender immediately of any error in
transmission. Any disclosure, copying, distribution or action taken or
omitted to be taken in reliance upon the contents of this email by
unauthorised recipients is prohibited and may be unlawful.
-- 

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: Splitting data across tables

2004-07-22 Thread Harald Fuchs
In article [EMAIL PROTECTED],
Andrew Hill [EMAIL PROTECTED] writes:

 Hi,
 Okay, cool. A couple of things:

 1) We *think* that our bottleneck is RAM and the calculation of indexes
 when inserting into the table (about 500 million rows). There's
 certainly plenty of CPU and disk I/O left in the machine. Hopefully,
 with smaller tables, the inserts will happen faster as well, and getting
 timestamps from MySQL won't be an issue.

 2) We're lucky with the report generation, in that everything can be
 summarised into stats tables reasonably easily :-)

 So, I guess that I'm more concerned with ensuring that the times are
 correct, in case of the clocks on the clients going crazy. That's why I
 was hoping there was a way of making MySQL select the table to insert
 into based on the time...

You could try allowing some inaccuracy: an entry might end up in the
correct table, in the table before it, or in the table after it.
This would require rewriting SELECTs with UNIONs, but because the
UNIONs would generate only a few rows (if any), the performance should
not suffer much.


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



RE: Splitting data across tables

2004-07-21 Thread John McCaskey
I've had no problems partitioning data in this exact same manner.
However my timestamp column is always pre-computed in the application
code because it is neccesary to round it to the last 5 minute interval
so I would not encounter the issue you mention.  I'd recommend simply
computing the timestamp in advance.  If you have a database substantial
enough you need this partitioning, and such that you can't delete as
fast as you insert presently, then I'm guessing the database server is
your major bottleneck in overall performance anyway.  In which case
offloading the date computation to the application server is probably a
smart move regardless.

The only complication we have had since moving to this partioning scheme
is that its more complicated to create reports across tables (we have to
deal with both user local time and GMT, so sometimes a daily report will
span two tables).  

John A. McCaskey


-Original Message-
From: Andrew Hill [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, July 21, 2004 8:00 AM
To: [EMAIL PROTECTED]
Subject: Splitting data across tables


Hi all,

We have quite a large table, and are considering splitting the data into
daily tables - something like tblname_20040721, tblname_20040722, etc.
Hopefully, this will allow us to more easily delete old data through
dropping the old tables, instead of using DELETE. (At present, we're
inserting data into this table faster than we can remove it...)

I was wondering if anyone has any experience and advice to give on going
down this path, that they would be willing to share?

Specifically, I'm concerned by the possibility of inserting data into
the wrong table: Say a client notes that the date is 20040721
23:59:58, and so decides to insert the data into the tblname_20040721
table - and then, when the insert happens, it's now 20040722 00:00:01.
If the table has a timestamp column that is set by the MySQL server,
then the timestamp will be in 20040722, but the data will be inserted
into the table for 20040721.

Obviously, this could also happen should NTP get out of whack...

Perhaps it would be better to insert the timestamp, rather than letting
MySQL set it. Or, is there perhaps a way to let MySQL select which table
to perform the insert into, based on the time?

Any ideas or comments?

TIA,

--
Andrew Hill
Developer
Awarez Ltd.
Kirkman House, 12-14 Whitfield Street, London W1T 2RF
T: +44 (0)20 7299 7370  F: +44 (0)20 7299 7374

--
The information in this email is confidential and is intended solely for
the addressee. It may be legally privileged. The contents are not to be
disclosed to anyone other than the addressee and access to this email by
anyone else is unauthorised. Unauthorised recipients are requested to
preserve the confidentiality of this email and to advise the sender
immediately of any error in transmission. Any disclosure, copying,
distribution or action taken or omitted to be taken in reliance upon the
contents of this email by unauthorised recipients is prohibited and may
be unlawful.
--

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:
http://lists.mysql.com/[EMAIL PROTECTED]


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



RE: Splitting data across tables

2004-07-21 Thread Andrew Hill
Hi,

Okay, cool. A couple of things:

1) We *think* that our bottleneck is RAM and the calculation of indexes
when inserting into the table (about 500 million rows). There's
certainly plenty of CPU and disk I/O left in the machine. Hopefully,
with smaller tables, the inserts will happen faster as well, and getting
timestamps from MySQL won't be an issue.

2) We're lucky with the report generation, in that everything can be
summarised into stats tables reasonably easily :-)

So, I guess that I'm more concerned with ensuring that the times are
correct, in case of the clocks on the clients going crazy. That's why I
was hoping there was a way of making MySQL select the table to insert
into based on the time...

TIA,

--
Andrew Hill
Developer
Awarez Ltd.
Kirkman House, 12-14 Whitfield Street, London W1T 2RF
T: +44 (0)20 7299 7370  F: +44 (0)20 7299 7374

--
The information in this email is confidential and is intended solely
for the addressee. It may be legally privileged. The contents are not
to be disclosed to anyone other than the addressee and access to this
email by anyone else is unauthorised.
Unauthorised recipients are requested to preserve the confidentiality
of this email and to advise the sender immediately of any error in
transmission. Any disclosure, copying, distribution or action taken or
omitted to be taken in reliance upon the contents of this email by
unauthorised recipients is prohibited and may be unlawful.
-- 

 -Original Message-
 From: John McCaskey [mailto:[EMAIL PROTECTED] 
 Sent: 21 July 2004 16:24
 To: Andrew Hill; [EMAIL PROTECTED]
 Subject: RE: Splitting data across tables
 
 I've had no problems partitioning data in this exact same manner.
 However my timestamp column is always pre-computed in the 
 application code because it is neccesary to round it to the 
 last 5 minute interval so I would not encounter the issue you 
 mention.  I'd recommend simply computing the timestamp in 
 advance.  If you have a database substantial enough you need 
 this partitioning, and such that you can't delete as fast as 
 you insert presently, then I'm guessing the database server 
 is your major bottleneck in overall performance anyway.  In 
 which case offloading the date computation to the application 
 server is probably a smart move regardless.
 
 The only complication we have had since moving to this 
 partioning scheme is that its more complicated to create 
 reports across tables (we have to deal with both user local 
 time and GMT, so sometimes a daily report will span two tables).  
 
 John A. McCaskey
 
 
 -Original Message-
 From: Andrew Hill [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, July 21, 2004 8:00 AM
 To: [EMAIL PROTECTED]
 Subject: Splitting data across tables
 
 
 Hi all,
 
 We have quite a large table, and are considering splitting 
 the data into
 daily tables - something like tblname_20040721, tblname_20040722, etc.
 Hopefully, this will allow us to more easily delete old data through
 dropping the old tables, instead of using DELETE. (At present, we're
 inserting data into this table faster than we can remove it...)
 
 I was wondering if anyone has any experience and advice to 
 give on going
 down this path, that they would be willing to share?
 
 Specifically, I'm concerned by the possibility of inserting data into
 the wrong table: Say a client notes that the date is 20040721
 23:59:58, and so decides to insert the data into the tblname_20040721
 table - and then, when the insert happens, it's now 20040722 00:00:01.
 If the table has a timestamp column that is set by the MySQL server,
 then the timestamp will be in 20040722, but the data will be inserted
 into the table for 20040721.
 
 Obviously, this could also happen should NTP get out of whack...
 
 Perhaps it would be better to insert the timestamp, rather 
 than letting
 MySQL set it. Or, is there perhaps a way to let MySQL select 
 which table
 to perform the insert into, based on the time?
 
 Any ideas or comments?
 
 TIA,
 
 --
 Andrew Hill
 Developer
 Awarez Ltd.
 Kirkman House, 12-14 Whitfield Street, London W1T 2RF
 T: +44 (0)20 7299 7370  F: +44 (0)20 7299 7374
 
 --
 The information in this email is confidential and is intended 
 solely for
 the addressee. It may be legally privileged. The contents are 
 not to be
 disclosed to anyone other than the addressee and access to 
 this email by
 anyone else is unauthorised. Unauthorised recipients are requested to
 preserve the confidentiality of this email and to advise the sender
 immediately of any error in transmission. Any disclosure, copying,
 distribution or action taken or omitted to be taken in 
 reliance upon the
 contents of this email by unauthorised recipients is 
 prohibited and may
 be unlawful

RE: Splitting data across tables

2004-07-21 Thread John McCaskey
As far as I know there is no way to make mysql choose the table to
insert to dynamically.  However, It is probably possible to write a user
defined function that will do so.  You might want to look into that
option.

John A. McCaskey

-Original Message-
From: Andrew Hill [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, July 21, 2004 8:46 AM
To: [EMAIL PROTECTED]
Subject: RE: Splitting data across tables


Hi,

Okay, cool. A couple of things:

1) We *think* that our bottleneck is RAM and the calculation of indexes
when inserting into the table (about 500 million rows). There's
certainly plenty of CPU and disk I/O left in the machine. Hopefully,
with smaller tables, the inserts will happen faster as well, and getting
timestamps from MySQL won't be an issue.

2) We're lucky with the report generation, in that everything can be
summarised into stats tables reasonably easily :-)

So, I guess that I'm more concerned with ensuring that the times are
correct, in case of the clocks on the clients going crazy. That's why I
was hoping there was a way of making MySQL select the table to insert
into based on the time...

TIA,

--
Andrew Hill
Developer
Awarez Ltd.
Kirkman House, 12-14 Whitfield Street, London W1T 2RF
T: +44 (0)20 7299 7370  F: +44 (0)20 7299 7374

--
The information in this email is confidential and is intended solely for
the addressee. It may be legally privileged. The contents are not to be
disclosed to anyone other than the addressee and access to this email by
anyone else is unauthorised. Unauthorised recipients are requested to
preserve the confidentiality of this email and to advise the sender
immediately of any error in transmission. Any disclosure, copying,
distribution or action taken or omitted to be taken in reliance upon the
contents of this email by unauthorised recipients is prohibited and may
be unlawful.
-- 

 -Original Message-
 From: John McCaskey [mailto:[EMAIL PROTECTED]
 Sent: 21 July 2004 16:24
 To: Andrew Hill; [EMAIL PROTECTED]
 Subject: RE: Splitting data across tables
 
 I've had no problems partitioning data in this exact same manner. 
 However my timestamp column is always pre-computed in the application 
 code because it is neccesary to round it to the last 5 minute interval

 so I would not encounter the issue you mention.  I'd recommend simply 
 computing the timestamp in advance.  If you have a database 
 substantial enough you need this partitioning, and such that you can't

 delete as fast as you insert presently, then I'm guessing the database

 server is your major bottleneck in overall performance anyway.  In
 which case offloading the date computation to the application 
 server is probably a smart move regardless.
 
 The only complication we have had since moving to this
 partioning scheme is that its more complicated to create 
 reports across tables (we have to deal with both user local 
 time and GMT, so sometimes a daily report will span two tables).  
 
 John A. McCaskey
 
 
 -Original Message-
 From: Andrew Hill [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, July 21, 2004 8:00 AM
 To: [EMAIL PROTECTED]
 Subject: Splitting data across tables
 
 
 Hi all,
 
 We have quite a large table, and are considering splitting
 the data into
 daily tables - something like tblname_20040721, tblname_20040722, etc.
 Hopefully, this will allow us to more easily delete old data through
 dropping the old tables, instead of using DELETE. (At present, we're
 inserting data into this table faster than we can remove it...)
 
 I was wondering if anyone has any experience and advice to
 give on going
 down this path, that they would be willing to share?
 
 Specifically, I'm concerned by the possibility of inserting data into 
 the wrong table: Say a client notes that the date is 20040721 
 23:59:58, and so decides to insert the data into the tblname_20040721 
 table - and then, when the insert happens, it's now 20040722 00:00:01.

 If the table has a timestamp column that is set by the MySQL server, 
 then the timestamp will be in 20040722, but the data will be inserted 
 into the table for 20040721.
 
 Obviously, this could also happen should NTP get out of whack...
 
 Perhaps it would be better to insert the timestamp, rather
 than letting
 MySQL set it. Or, is there perhaps a way to let MySQL select 
 which table
 to perform the insert into, based on the time?
 
 Any ideas or comments?
 
 TIA,
 
 --
 Andrew Hill
 Developer
 Awarez Ltd.
 Kirkman House, 12-14 Whitfield Street, London W1T 2RF
 T: +44 (0)20 7299 7370  F: +44 (0)20 7299 7374
 
 --
 The information in this email is confidential and is intended
 solely for
 the addressee. It may be legally privileged. The contents are 
 not to be
 disclosed to anyone other than the addressee and access to 
 this email

re: Splitting data across tables

2004-07-21 Thread William H. Bowers
The table separation should be determined by the data retrieval
requirements.  I've done this before for sales data and partitioned the
tables by sales region since most of the reporting was within region.
As the database was updated only overnight, we also made aggregate
tables to make reporting easier.

If indexing seems to be the cause of the slowdown, it may be faster to
drop the indexes, add the new data, and then rebuild the indexes.  This
is especially helpful with physical indexing.

Bill
 
William H. Bowers
Lecturer, Information Sciences and Technology
Penn State Berks-Lehigh Valley College
Franco F119
Tulpehocken Road
P.O. Box 7009
Reading, PA  19610-6009
 
Voice:   (610) 396-6276
Fax: (610) 396-6026
 
Text Messages (100 char max)   [EMAIL PROTECTED]
Email:   [EMAIL PROTECTED]
AIM:  psuwhb108
MSN Messenger:  [EMAIL PROTECTED]
Home Page:  http://www.bklv.psu.edu/faculty/bowers
 

-Original Message-
From: John McCaskey [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, July 21, 2004 11:50
To: Andrew Hill; [EMAIL PROTECTED]
Subject: RE: Splitting data across tables

As far as I know there is no way to make mysql choose the table to
insert to dynamically.  However, It is probably possible to write a user
defined function that will do so.  You might want to look into that
option.

John A. McCaskey

-Original Message-
From: Andrew Hill [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, July 21, 2004 8:46 AM
To: [EMAIL PROTECTED]
Subject: RE: Splitting data across tables


Hi,

Okay, cool. A couple of things:

1) We *think* that our bottleneck is RAM and the calculation of indexes
when inserting into the table (about 500 million rows). There's
certainly plenty of CPU and disk I/O left in the machine. Hopefully,
with smaller tables, the inserts will happen faster as well, and getting
timestamps from MySQL won't be an issue.

2) We're lucky with the report generation, in that everything can be
summarised into stats tables reasonably easily :-)

So, I guess that I'm more concerned with ensuring that the times are
correct, in case of the clocks on the clients going crazy. That's why I
was hoping there was a way of making MySQL select the table to insert
into based on the time...

TIA,

--
Andrew Hill
Developer
Awarez Ltd.
Kirkman House, 12-14 Whitfield Street, London W1T 2RF
T: +44 (0)20 7299 7370  F: +44 (0)20 7299 7374

--
The information in this email is confidential and is intended solely for
the addressee. It may be legally privileged. The contents are not to be
disclosed to anyone other than the addressee and access to this email by
anyone else is unauthorised. Unauthorised recipients are requested to
preserve the confidentiality of this email and to advise the sender
immediately of any error in transmission. Any disclosure, copying,
distribution or action taken or omitted to be taken in reliance upon the
contents of this email by unauthorised recipients is prohibited and may
be unlawful.
-- 

 -Original Message-
 From: John McCaskey [mailto:[EMAIL PROTECTED]
 Sent: 21 July 2004 16:24
 To: Andrew Hill; [EMAIL PROTECTED]
 Subject: RE: Splitting data across tables
 
 I've had no problems partitioning data in this exact same manner. 
 However my timestamp column is always pre-computed in the application 
 code because it is neccesary to round it to the last 5 minute interval

 so I would not encounter the issue you mention.  I'd recommend simply 
 computing the timestamp in advance.  If you have a database 
 substantial enough you need this partitioning, and such that you can't

 delete as fast as you insert presently, then I'm guessing the database

 server is your major bottleneck in overall performance anyway.  In
 which case offloading the date computation to the application 
 server is probably a smart move regardless.
 
 The only complication we have had since moving to this
 partioning scheme is that its more complicated to create 
 reports across tables (we have to deal with both user local 
 time and GMT, so sometimes a daily report will span two tables).  
 
 John A. McCaskey
 
 
 -Original Message-
 From: Andrew Hill [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, July 21, 2004 8:00 AM
 To: [EMAIL PROTECTED]
 Subject: Splitting data across tables
 
 
 Hi all,
 
 We have quite a large table, and are considering splitting
 the data into
 daily tables - something like tblname_20040721, tblname_20040722, etc.
 Hopefully, this will allow us to more easily delete old data through
 dropping the old tables, instead of using DELETE. (At present, we're
 inserting data into this table faster than we can remove it...)
 
 I was wondering if anyone has any experience and advice to
 give on going
 down this path, that they would be willing to share?
 
 Specifically, I'm concerned by the possibility of inserting data into 
 the wrong table: Say a client