Re: Logging Data: Should I use MyIsam or InnoDB?

2005-01-19 Thread Martijn Tonies
Hello Jeremy,

  I have a curious issue here, maybe someone can help.
 
  I have a single process that inserts data into tables that contain
  purely logging information.  This table is then searched by our Care
  department to troubleshoot issues.  I am looking for the best way to
  store this data, and the structure on the backend.
 
  There are 50 million inserts into table LOG a day.  The primary index
  on the table is seconds from 1971.  I only need to keep 60 days worth
  of data, and the table is only used for read purposes.  This is my
  design criteria, but my problem is how to delete old data without
  crashing the log writer that is attached to the table.

 OK, how about this:

 Use MyISAM and MERGE tables.  Keep one table per day.  E.g.:

 log_2005_01_15
 log_2005_01_16
 log_2005_01_17
 log_2005_01_18

 etc.

 Use MERGE tables to give you the 60 day (and perhaps e.g. 14 day, 30
 day, 7 day, etc.) read views that you need, like so:

 CREATE TABLE log_view_7day (
...
 ) TYPE=MERGE UNION=(
log_2005_01_12,
...
 );

 You can then do all of your inserts from the log writer into the today
 table, and do your reads against the various MERGEs.

 Every day at exactly midnight, you would use ALTER TABLE (which is
 atomic) to redefine the UNION of the MERGE definition of the various
tables.

Modifying metadata because you need a different view of your
data.

Am I the only one to whom this sounds ugly?

With regards,

Martijn Tonies
Database Workbench - developer tool for InterBase, Firebird, MySQL & MS SQL
Server
Upscene Productions
http://www.upscene.com


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]



Re: Logging Data: Should I use MyIsam or InnoDB?

2005-01-19 Thread Javier Armendáriz
Martijn Tonies wrote:
Hello Jeremy,
 

I have a curious issue here, maybe someone can help.
I have a single process that inserts data into tables that contain
purely logging information.  This table is then searched by our Care
department to troubleshoot issues.  I am looking for the best way to
store this data, and the structure on the backend.
There are 50 million inserts into table LOG a day.  The primary index
on the table is seconds from 1971.  
 

Sorry, but as I understand it, there are only 86,400 seconds in a day, so 
with 50 million inserts a day that column cannot be the primary key.


OK, how about this:
Use MyISAM and MERGE tables.  Keep one table per day.  E.g.:
log_2005_01_15
log_2005_01_16
log_2005_01_17
log_2005_01_18
etc.
Use MERGE tables to give you the 60 day (and perhaps e.g. 14 day, 30
day, 7 day, etc.) read views that you need, like so:
CREATE TABLE log_view_7day (
  ...
) TYPE=MERGE UNION=(
  log_2005_01_12,
  ...
);

You can then do all of your inserts from the log writer into the today
table, and do your reads against the various MERGEs.
Every day at exactly midnight, you would use ALTER TABLE (which is
atomic) to redefine the UNION of the MERGE definition of the various
tables.
Modifying metadata because you need a different view of your
data.
Am I the only one to whom this sounds ugly?
With regards,
Martijn Tonies
Database Workbench - developer tool for InterBase, Firebird, MySQL  MS SQL
Server
Upscene Productions
http://www.upscene.com
 


--

When everything is under control,
it means we are not going fast enough.

Javier Armendáriz
[EMAIL PROTECTED]



Re: Logging Data: Should I use MyIsam or InnoDB?

2005-01-19 Thread SGreen
Martijn Tonies [EMAIL PROTECTED] wrote on 01/19/2005 03:33:32 AM:

 Hello Jeremy,
 
   I have a curious issue here, maybe someone can help.
  
   I have a single process that inserts data into tables that contain
   purely logging information.  This table is then searched by our Care
   department to troubleshoot issues.  I am looking for the best way to
   store this data, and the structure on the backend.
  
   There are 50 million inserts into table LOG a day.  The primary index
   on the table is seconds from 1971.  I only need to keep 60 days worth
   of data, and the table is only used for read purposes.  This is my
   design criteria, but my problem is how to delete old data without
   crashing the log writer that is attached to the table.
 
  OK, how about this:
 
  Use MyISAM and MERGE tables.  Keep one table per day.  E.g.:
 
  log_2005_01_15
  log_2005_01_16
  log_2005_01_17
  log_2005_01_18
 
  etc.
 
  Use MERGE tables to give you the 60 day (and perhaps e.g. 14 day, 30
  day, 7 day, etc.) read views that you need, like so:
 
  CREATE TABLE log_view_7day (
 ...
  ) TYPE=MERGE UNION=(
 log_2005_01_12,
 ...
  );

  You can then do all of your inserts from the log writer into the today
  table, and do your reads against the various MERGEs.
 
  Every day at exactly midnight, you would use ALTER TABLE (which is
  atomic) to redefine the UNION of the MERGE definition of the various
 tables.
 
 Modifying metadata because you need a different view of your
 data.
 
 Am I the only one to whom this sounds ugly?
 
 With regards,
 
 Martijn Tonies
 Database Workbench - developer tool for InterBase, Firebird, MySQL & MS SQL
 Server
 Upscene Productions
 http://www.upscene.com
 

It sounds just fine to me. IMHO this is an excellent usage of MySQL 
features to solve a technical issue. The original poster wants to maintain 
only 60 days worth of data. He also needs really fast inserts. MyISAM can 
perform concurrent, non-blocking inserts so long as there are no deletion 
gaps in the table to which it is inserting. One solution to this is to 
create what is in effect a view that spans 60 days worth of data, each 
day's data in its own table. However, since views won't be officially 
available until later, a MERGE table is an effective substitute.

This gives him many kinds of granularity options when performing queries, 
too. He can query the daily table directly if he knows that the query does 
not span days. He could use multiple MERGE tables to aggregate various 
ranges of daily data. He could UNION several single-day queries together 
to cover a span or list of dates. All of these have various advantages. 
The biggest of which is that when it comes time to move the out of date 
data off-line, he won't have to lock the current day's data to do it.

One alternative could be a VIEW built against a single, larger table with 
a query specifying a date range of 60 days. Something like:

CREATE VIEW umptyfratz AS SELECT * FROM datatable
  WHERE logdate > (NOW() - INTERVAL 60 DAY);

However this introduces several performance hits:
1) the WHERE clause has to be recomputed every time the VIEW is 
called so that you return only the data within the specified date range. 
This may not scale well to larger tables.
2) If we delete rows from logdate older than 60 days, we create 
deletion gaps in the data. This prevents MyISAM from simply appending data 
to the end of the table (non-blocking inserts) and forces a table lock for 
each insert so that the deletion gaps can be filled in first. This limits 
concurrency severely.

However, there is a dark side to the MERGE table solution, too. As is 
explained in the manual (
http://dev.mysql.com/doc/mysql/en/MERGE_storage_engine.html), the merge 
table AND each table participating in the merge will consume an operating 
system file handle each time the merge table is opened.  For a merge table 
covering 60 daily tables, that means that the OS has to issue 61 new file 
handles PER USER ACCESSING THE TABLE. Depending on the limits of your OS, 
you may not be able to open very many files. I would recommend using 
weekly tables so that a MERGE table covering 60 days worth of data would 
only need 10 file handles from the OS (9 weeks worth of data + 1 for the 
MERGE itself).

Of course, it is imperative that before putting such a solution into 
production that any DBA must test, test, and retest to make sure they are 
getting optimal performance and resource usage.

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine


Re: Logging Data: Should I use MyIsam or InnoDB?

2005-01-19 Thread Martijn Tonies

I have a curious issue here, maybe someone can help.
   
I have a single process that inserts data into tables that contain
purely logging information.  This table is then searched by our Care
department to troubleshoot issues.  I am looking for the best way to
store this data, and the structure on the backend.
   
There are 50 million inserts into table LOG a day.  The primary index
on the table is seconds from 1971.  I only need to keep 60 days worth
of data, and the table is only used for read purposes.  This is my
design criteria, but my problem is how to delete old data without
crashing the log writer that is attached to the table.
  
   OK, how about this:
  
   Use MyISAM and MERGE tables.  Keep one table per day.  E.g.:
  
   log_2005_01_15
   log_2005_01_16
   log_2005_01_17
   log_2005_01_18
  
   etc.
  
   Use MERGE tables to give you the 60 day (and perhaps e.g. 14 day, 30
   day, 7 day, etc.) read views that you need, like so:
  
   CREATE TABLE log_view_7day (
  ...
   ) TYPE=MERGE UNION=(
  log_2005_01_12,
  ...
   );

   You can then do all of your inserts from the log writer into the today
   table, and do your reads against the various MERGEs.
  
   Every day at exactly midnight, you would use ALTER TABLE (which is
   atomic) to redefine the UNION of the MERGE definition of the various
  tables.
 
  Modifying metadata because you need a different view of your
  data.
 
  Am I the only one to whom this sounds ugly?

 It sounds just fine to me. IMHO this is an excellent usage of MySQL
 features to solve a technical issue. The original poster wants to maintain
 only 60 days worth of data. He also needs really fast inserts. MyISAM can
 perform concurrent, non-blocking inserts so long as there are no deletion
 gaps in the table to which it is inserting. One solution to this is to
 create what is in effect a view that spans 60 days worth of data, each
 day's data in its own table. However, since views won't be officially
 available until later, a MERGE table is an effective substitute.

I understand the usage of MERGE here, but I have my doubts
about the table-per-day approach ...

Besides:
"There are 50 million inserts into table LOG a day"

This doesn't sound right -- how can this be alright:
"The primary index on the table is seconds from 1971"

?

With regards,

Martijn Tonies
Database Workbench - developer tool for InterBase, Firebird, MySQL & MS SQL
Server
Upscene Productions
http://www.upscene.com


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]



Re: Logging Data: Should I use MyIsam or InnoDB?

2005-01-18 Thread Jeremy Cole
Hi Thomas,
I have a curious issue here, maybe someone can help.
I have a single process that inserts data into tables that contain
purely logging information.  This table is then searched by our Care
department to troubleshoot issues.  I am looking for the best way to
store this data, and the structure on the backend.
There are 50 million inserts into table LOG a day.  The primary index
on the table is seconds from 1971.  I only need to keep 60 days worth
of data, and the table is only used for read purposes.  This is my
design criteria, but my problem is how to delete old data without
crashing the log writer that is attached to the table.
OK, how about this:
Use MyISAM and MERGE tables.  Keep one table per day.  E.g.:
log_2005_01_15
log_2005_01_16
log_2005_01_17
log_2005_01_18
etc.
Use MERGE tables to give you the 60 day (and perhaps e.g. 14 day, 30 
day, 7 day, etc.) read views that you need, like so:

CREATE TABLE log_view_7day (
  ...
) TYPE=MERGE UNION=(
  log_2005_01_12,
  log_2005_01_13,
  log_2005_01_14,
  log_2005_01_15,
  log_2005_01_16,
  log_2005_01_17,
  log_2005_01_18
);
Create another MERGE table for today using INSERT_METHOD:
CREATE TABLE log_view_today (
  ...
) TYPE=MERGE INSERT_METHOD=FIRST UNION=(
  log_2005_01_18
);
You can then do all of your inserts from the log writer into the today 
table, and do your reads against the various MERGEs.

Every day at exactly midnight, you would use ALTER TABLE (which is 
atomic) to redefine the UNION of the MERGE definition of the various tables.

When you want to delete your old data, it's simply a matter of doing 
an ALTER TABLE to remove them from the MERGE, and using DROP TABLE to 
drop the log_yyyy_mm_dd table after that.

Also note that you could compress the old data and leave it live (but 
read only) in case you ever need it.  myisampack can help you with that.

I hope that helps!
Regards,
Jeremy
--
Jeremy Cole
Technical Yahoo - MySQL (Database) Geek
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]


RE: Logging Data: Should I use MyIsam or InnoDB?

2005-01-18 Thread Thomas Lekai
Jeremy,

Thanks, this is what I was originally thinking of, but how am I getting rid of 
the data in log_view_today?  Or are you saying that log_view_today is a merge 
table for only the current day?  That table def is redefined every night?  Then I 
would go about dropping everything in whatever fashion I need.

When I recreate the merge table for just the current day, don't I have to drop 
the merge table, or does it just get recreated automatically?  I am not sure why 
you reference atomic on ALTER TABLE . . . ; if there is a log writer attached 
to that table, won't I have to wait for a lock?  What do you mean by atomic?  I 
understand the term atomic transaction, just not sure of your context in this 
example.

Thanks for the idea, I was already in this neck of the woods, but the MERGE 
table just for today, I was not sure about that.  My problems in the past dealt 
with the locking of the table by the log writer, hence the need to truncate the 
table.  But while it was truncating, the table hung, hence the need for InnoDB.

Thomas.

-Original Message-
From: Jeremy Cole [mailto:[EMAIL PROTECTED]
Sent: Tuesday, January 18, 2005 2:28 PM
To: Thomas Lekai
Cc: mysql@lists.mysql.com
Subject: Re: Logging Data: Should I use MyIsam or InnoDB?


Hi Thomas,

 I have a curious issue here, maybe someone can help.
 
 I have a single process that inserts data into tables that contain
 purely logging information.  This table is then searched by our Care
 department to troubleshoot issues.  I am looking for the best way to
 store this data, and the structure on the backend.
 
 There are 50 million inserts into table LOG a day.  The primary index
 on the table is seconds from 1971.  I only need to keep 60 days worth
 of data, and the table is only used for read purposes.  This is my
 design criteria, but my problem is how to delete old data without
 crashing the log writer that is attached to the table.

OK, how about this:

Use MyISAM and MERGE tables.  Keep one table per day.  E.g.:

log_2005_01_15
log_2005_01_16
log_2005_01_17
log_2005_01_18

etc.

Use MERGE tables to give you the 60 day (and perhaps e.g. 14 day, 30 
day, 7 day, etc.) read views that you need, like so:

CREATE TABLE log_view_7day (
   ...
) TYPE=MERGE UNION=(
   log_2005_01_12,
   log_2005_01_13,
   log_2005_01_14,
   log_2005_01_15,
   log_2005_01_16,
   log_2005_01_17,
   log_2005_01_18
);

Create another MERGE table for today using INSERT_METHOD:

CREATE TABLE log_view_today (
   ...
) TYPE=MERGE INSERT_METHOD=FIRST UNION=(
   log_2005_01_18
);

You can then do all of your inserts from the log writer into the today 
table, and do your reads against the various MERGEs.

Every day at exactly midnight, you would use ALTER TABLE (which is 
atomic) to redefine the UNION of the MERGE definition of the various tables.

When you want to delete your old data, it's simply a matter of doing 
an ALTER TABLE to remove them from the MERGE, and using DROP TABLE to 
drop the log_yyyy_mm_dd table after that.

Also note that you could compress the old data and leave it live (but 
read only) in case you ever need it.  myisampack can help you with that.

I hope that helps!

Regards,

Jeremy

-- 
Jeremy Cole
Technical Yahoo - MySQL (Database) Geek

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]



Re: Logging Data: Should I use MyIsam or InnoDB?

2005-01-18 Thread Jeremy Cole
Hi Thomas,
Thanks, this is what I was originally thinking of, but how am I
getting rid of the data in log_view_today?  Or are you saying that
log_view_today is a merge table for only the current day?  That table
def is redefined every night?  Then I would go about dropping
everything in whatever fashion I need.
Basically, at midnight you would do a sequence like this:
CREATE TABLE log_2005_01_19 ( ... );
ALTER TABLE log_view_today UNION=(log_2005_01_19);
ALTER TABLE log_view_7day UNION=(..., log_2005_01_19);
etc. etc.
You could actually create the tables beforehand, and only do the ALTER 
TABLEs themselves at midnight.
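For concreteness, that nightly sequence could be generated by a small script. The helper names, the placeholder column list, and emitting the SQL as strings are illustrative assumptions, not something specified in the thread:

```python
# Illustrative sketch: build the SQL statements for the midnight rotation
# described above. The column list is a hypothetical placeholder; the
# table/MERGE names follow the examples used in the thread.
from datetime import date, timedelta

def day_table(d):
    return "log_%04d_%02d_%02d" % (d.year, d.month, d.day)

def rotation_sql(today, window_days=7):
    """SQL to run at midnight: create today's table and repoint the MERGEs."""
    union = ", ".join(day_table(today - timedelta(days=i))
                      for i in reversed(range(window_days)))
    return [
        "CREATE TABLE %s (logdate DATETIME, msg TEXT) TYPE=MyISAM;" % day_table(today),
        "ALTER TABLE log_view_today UNION=(%s);" % day_table(today),
        "ALTER TABLE log_view_7day UNION=(%s);" % union,
    ]

for stmt in rotation_sql(date(2005, 1, 19)):
    print(stmt)
```

A cron job could pipe this output into the mysql client at the switch time; removing day 61 from the 60-day MERGE and then dropping its table follows the same pattern.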

Note that this procedure has a race condition in that, depending on when 
the table actually switches over, you may have some records on either 
side of the split that don't belong.  You can always move those back 
manually with something like:

INSERT INTO log_2005_01_18 SELECT * FROM log_2005_01_19
  WHERE datefield < '2005-01-19 00:00:00';
DELETE FROM log_2005_01_19 WHERE datefield < '2005-01-19 00:00:00';

or
INSERT INTO log_2005_01_19 SELECT * FROM log_2005_01_18
  WHERE datefield >= '2005-01-19 00:00:00';
DELETE FROM log_2005_01_18 WHERE datefield >= '2005-01-19 00:00:00';

In some cases it might be easier to always do the switch a few seconds, say 5, 
before midnight, so that any records falling on the wrong side of the 
edge will always be on the same side.  That makes things easier sometimes.
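One way to model the early-switch idea: treat the cutover as shifting every timestamp forward by the early margin, so boundary records always land on the new day's side. The function and naming below are illustrative, not from the thread:

```python
# Illustrative sketch of switching a few seconds early: any record stamped
# within SWITCH_EARLY of midnight is routed to the next day's table, so no
# rows need to be moved back across the boundary afterwards.
from datetime import datetime, timedelta

SWITCH_EARLY = timedelta(seconds=5)

def target_table(ts):
    effective = ts + SWITCH_EARLY  # shift the boundary back by 5 seconds
    return "log_%04d_%02d_%02d" % (effective.year, effective.month, effective.day)

print(target_table(datetime(2005, 1, 18, 23, 59, 57)))  # log_2005_01_19
print(target_table(datetime(2005, 1, 18, 12, 0, 0)))    # log_2005_01_18
```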

When I recreate the merge table for just the current day, don't I
have to drop the merge table, or does it just get recreated
automatically?  I am not sure why you reference atomic on ALTER
TABLE . . . ; if there is a log writer attached to that table, won't
I have to wait for a lock?  What do you mean by atomic?  I understand
the term atomic transaction, just not sure of your context in this
example.
What I mean is, you can use ALTER TABLE to change the definition (e.g. 
which tables it contains) of the MERGE table.  This happens atomically 
(no INSERTs will error, and no records could conceivably be split by 
the sudden change).

Does that all make sense?
Regards,
Jeremy
--
Jeremy Cole
Technical Yahoo - MySQL (Database) Geek
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]


RE: Logging Data: Should I use MyIsam or InnoDB?

2005-01-18 Thread Thomas Lekai
Jeremy,

Thanks a million, this makes perfect sense, I will test this out asap.  In 
theory it sounds like the plan, I just need to see if it will work here.  I 
really appreciate the help . . . 

Regards,

Thomas.

-Original Message-
From: Jeremy Cole [mailto:[EMAIL PROTECTED]
Sent: Tuesday, January 18, 2005 3:30 PM
To: Thomas Lekai
Cc: mysql@lists.mysql.com
Subject: Re: Logging Data: Should I use MyIsam or InnoDB?


Hi Thomas,

 Thanks, this is what I was originally thinking of, but how am I
 getting rid of the data in log_view_today?  Or are you saying that
 log_view_today is a merge table for only the current day?  That table
 def is redefined every night?  Then I would go about dropping
 everything in whatever fashion I need.

Basically, at midnight you would do a sequence like this:

CREATE TABLE log_2005_01_19 ( ... );
ALTER TABLE log_view_today UNION=(log_2005_01_19);
ALTER TABLE log_view_7day UNION=(..., log_2005_01_19);

etc. etc.

You could actually create the tables beforehand, and only do the ALTER 
TABLEs themselves at midnight.

Note that this procedure has a race condition in that, depending on when 
the table actually switches over, you may have some records on either 
side of the split that don't belong.  You can always move those back 
manually with something like:

INSERT INTO log_2005_01_18 SELECT * FROM log_2005_01_19
  WHERE datefield < '2005-01-19 00:00:00';
DELETE FROM log_2005_01_19 WHERE datefield < '2005-01-19 00:00:00';

or

INSERT INTO log_2005_01_19 SELECT * FROM log_2005_01_18
  WHERE datefield >= '2005-01-19 00:00:00';
DELETE FROM log_2005_01_18 WHERE datefield >= '2005-01-19 00:00:00';

In some cases it might be easier to always do the switch a few seconds, say 5, 
before midnight, so that any records falling on the wrong side of the 
edge will always be on the same side.  That makes things easier sometimes.

 When I recreate the merge table for just the current day, don't I
 have to drop the merge table, or does it just get recreated
 automatically?  I am not sure why you reference atomic on ALTER
 TABLE . . . ; if there is a log writer attached to that table, won't
 I have to wait for a lock?  What do you mean by atomic?  I understand
 the term atomic transaction, just not sure of your context in this
 example.

What I mean is, you can use ALTER TABLE to change the definition (e.g. 
which tables it contains) of the MERGE table.  This happens atomically 
(no INSERTs will error, and no records could conceivably be split by 
the sudden change).

Does that all make sense?

Regards,

Jeremy

-- 
Jeremy Cole
Technical Yahoo - MySQL (Database) Geek

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]