Re: Logging Data: Should I use MyIsam or InnoDB?
Hello Jeremy, I have a curious issue here, maybe someone can help. I have a single process that inserts data into tables that contain purely logging information. This table is then searched by our Care department to troubleshoot issues. I am looking for the best way to store this data, and the structure on the backend. There are 50 million inserts into table LOG a day. The primary index on the table is seconds from 1971. I only need to keep 60 days worth of data, and the table is only used for read purposes. This is my design criteria, but my problem is how to delete old data without crashing the log writer that is atteched to the table. OK, how about this: Use MyISAM and MERGE tables. Keep one table per day. E.g.: log_2005_01_15 log_2005_01_16 log_2005_01_17 log_2005_01_18 etc. Use MERGE tables to give you the 60 day (and perhaps e.g. 14 day, 30 day, 7 day, etc.) read views that you need, like so: CREATE TABLE log_view_7day ( ... ) TYPE=MERGE UNION=( log_2005_01_12, You can then do all of your inserts from the log writer into the today table, and do your reads against the various MERGEs. Every day at exactly midnight, you would use ALTER TABLE (which is atomic) to redefine the UNION of the MERGE definition of the various tables. Modifying metadata because you need a different view at your data. Am I the only one to which this sound ugly? With regards, Martijn Tonies Database Workbench - developer tool for InterBase, Firebird, MySQL MS SQL Server Upscene Productions http://www.upscene.com -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Logging Data: Should I use MyIsam or InnoDB?
Martijn Tonies wrote: Hello Jeremy, I have a curious issue here, maybe someone can help. I have a single process that inserts data into tables that contain purely logging information. This table is then searched by our Care department to troubleshoot issues. I am looking for the best way to store this data, and the structure on the backend. There are 50 million inserts into table LOG a day. The primary index on the table is seconds from 1971. Sorry, as I understand, there are not 5000 seconds in a day, so it can not be primary key. OK, how about this: Use MyISAM and MERGE tables. Keep one table per day. E.g.: log_2005_01_15 log_2005_01_16 log_2005_01_17 log_2005_01_18 etc. Use MERGE tables to give you the 60 day (and perhaps e.g. 14 day, 30 day, 7 day, etc.) read views that you need, like so: CREATE TABLE log_view_7day ( ... ) TYPE=MERGE UNION=( log_2005_01_12, You can then do all of your inserts from the log writer into the today table, and do your reads against the various MERGEs. Every day at exactly midnight, you would use ALTER TABLE (which is atomic) to redefine the UNION of the MERGE definition of the various tables. Modifying metadata because you need a different view at your data. Am I the only one to which this sound ugly? With regards, Martijn Tonies Database Workbench - developer tool for InterBase, Firebird, MySQL MS SQL Server Upscene Productions http://www.upscene.com -- Cuando todo esta bajo control, es que no vamos suficientemente deprisa Javier Armendáriz [EMAIL PROTECTED]
Re: Logging Data: Should I use MyIsam or InnoDB?
Martijn Tonies [EMAIL PROTECTED] wrote on 01/19/2005 03:33:32 AM: Hello Jeremy, I have a curious issue here, maybe someone can help. I have a single process that inserts data into tables that contain purely logging information. This table is then searched by our Care department to troubleshoot issues. I am looking for the best way to store this data, and the structure on the backend. There are 50 million inserts into table LOG a day. The primary index on the table is seconds from 1971. I only need to keep 60 days worth of data, and the table is only used for read purposes. This is my design criteria, but my problem is how to delete old data without crashing the log writer that is atteched to the table. OK, how about this: Use MyISAM and MERGE tables. Keep one table per day. E.g.: log_2005_01_15 log_2005_01_16 log_2005_01_17 log_2005_01_18 etc. Use MERGE tables to give you the 60 day (and perhaps e.g. 14 day, 30 day, 7 day, etc.) read views that you need, like so: CREATE TABLE log_view_7day ( ... ) TYPE=MERGE UNION=( log_2005_01_12, You can then do all of your inserts from the log writer into the today table, and do your reads against the various MERGEs. Every day at exactly midnight, you would use ALTER TABLE (which is atomic) to redefine the UNION of the MERGE definition of the various tables. Modifying metadata because you need a different view at your data. Am I the only one to which this sound ugly? With regards, Martijn Tonies Database Workbench - developer tool for InterBase, Firebird, MySQL MS SQL Server Upscene Productions http://www.upscene.com It sounds just fine to me. IMHO this is an excellent usage of MySQL features to solve a technical issue. The original poster wants to maintain only 60 days worth of data. He also needs really fast inserts. MyISAM can perform concurrent, non-blocking inserts so long as there are no deletion gaps in the table to which it is inserting. One solution to this is to create what is in effect a view that spans 60 days worth of data, each day's data in its own table. However, since views wont be officially available until later, a MERGE table is an effective substitute. This gives him many kinds of granularity options when performing queries, too. He can query the daily table directly if he knows that the query does not span days. He could use multiple MERGE tables to aggregate various ranges of daily data. He could UNION several single-day queries together to cover a span or list of dates. All of these have various advantages. The biggest of which is that when it comes time to move the out of date data off-line, he won't have to lock the current day's data to do it. One alternative could be a VIEW built against a single, larger table with a query specifying a date range of 60 days. Something like: CREATE VIEW umptyfratz AS SELECT * FROM datatable WHERE logdate (NOW()- INTERVAL 60 days) However this introduces several performance hits: 1) the WHERE Clause has to be recomputed every time the VIEW is called so that you return only the data within the specified date range. This may not scale well to larger tables. 2) If we delete rows from logdate older than 60 days, we create deletion gaps in the data. This prevents MyISAM from simply appending data to the end of the table (non-blocking inserts) and forces a table lock for each insert so that the deletion gaps can be filled in first. This limits concurrency severely. However, there is a dark side to the MERGE table solution, too. As is explained in the manual ( http://dev.mysql.com/doc/mysql/en/MERGE_storage_engine.html), the merge table AND each table participating in the merge will consume an operating system file handle each time the merge table is opened. For a merge table covering 60 daily tables, that means that the OS has to issue 61 new file handles PER USER ACCESSING THE TABLE. Depending on the limits of your OS, you may not be able to open very many files. I would recommend using weekly tables so that a MERGE table covering 60 days worth of data would only need 10 file handles from the OS (9 weeks worth of data + 1 for the MERGE itself) Of course, it is imperative that before putting such a solution into production that any DBA must test, test, and retest to make sure they are getting optimal performance and resource usage. Shawn Green Database Administrator Unimin Corporation - Spruce Pine
Re: Logging Data: Should I use MyIsam or InnoDB?
I have a curious issue here, maybe someone can help. I have a single process that inserts data into tables that contain purely logging information. This table is then searched by our Care department to troubleshoot issues. I am looking for the best way to store this data, and the structure on the backend. There are 50 million inserts into table LOG a day. The primary index on the table is seconds from 1971. I only need to keep 60 days worth of data, and the table is only used for read purposes. This is my design criteria, but my problem is how to delete old data without crashing the log writer that is atteched to the table. OK, how about this: Use MyISAM and MERGE tables. Keep one table per day. E.g.: log_2005_01_15 log_2005_01_16 log_2005_01_17 log_2005_01_18 etc. Use MERGE tables to give you the 60 day (and perhaps e.g. 14 day, 30 day, 7 day, etc.) read views that you need, like so: CREATE TABLE log_view_7day ( ... ) TYPE=MERGE UNION=( log_2005_01_12, You can then do all of your inserts from the log writer into the today table, and do your reads against the various MERGEs. Every day at exactly midnight, you would use ALTER TABLE (which is atomic) to redefine the UNION of the MERGE definition of the various tables. Modifying metadata because you need a different view at your data. Am I the only one to which this sound ugly? It sounds just fine to me. IMHO this is an excellent usage of MySQL features to solve a technical issue. The original poster wants to maintain only 60 days worth of data. He also needs really fast inserts. MyISAM can perform concurrent, non-blocking inserts so long as there are no deletion gaps in the table to which it is inserting. One solution to this is to create what is in effect a view that spans 60 days worth of data, each day's data in its own table. However, since views wont be officially available until later, a MERGE table is an effective substitute. I understand the usage or MERGE here, but I have my doubts at the table per day ... Besides: There are 50 million inserts into table LOG a day This doesn't sound right -- how can this be alright: The primary index on the table is seconds from 1971 ? With regards, Martijn Tonies Database Workbench - developer tool for InterBase, Firebird, MySQL MS SQL Server Upscene Productions http://www.upscene.com -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Logging Data: Should I use MyIsam or InnoDB?
Hi Thomas, I have a curious issue here, maybe someone can help. I have a single process that inserts data into tables that contain purely logging information. This table is then searched by our Care department to troubleshoot issues. I am looking for the best way to store this data, and the structure on the backend. There are 50 million inserts into table LOG a day. The primary index on the table is seconds from 1971. I only need to keep 60 days worth of data, and the table is only used for read purposes. This is my design criteria, but my problem is how to delete old data without crashing the log writer that is atteched to the table. OK, how about this: Use MyISAM and MERGE tables. Keep one table per day. E.g.: log_2005_01_15 log_2005_01_16 log_2005_01_17 log_2005_01_18 etc. Use MERGE tables to give you the 60 day (and perhaps e.g. 14 day, 30 day, 7 day, etc.) read views that you need, like so: CREATE TABLE log_view_7day ( ... ) TYPE=MERGE UNION=( log_2005_01_12, log_2005_01_13, log_2005_01_14, log_2005_01_15, log_2005_01_16, log_2005_01_17, log_2005_01_18 ); Create another MERGE table for today using INSERT_METHOD: CREATE TABLE log_view_today ( ... ) TYPE=MERGE INSERT_METHOD=FIRST UNION=( log_2005_01_18 ); You can then do all of your inserts from the log writer into the today table, and do your reads against the various MERGEs. Every day at exactly midnight, you would use ALTER TABLE (which is atomic) to redefine the UNION of the MERGE definition of the various tables. When you want to delete your old data, it's simply a matter of doing an ALTER TABLE to remove them from the MERGE, and using DROP TABLE to drop the log__mm_dd table after that. Also note that you could compress the old data and leave it live (but read only) in case you ever need it. myisampack can help you with that. I hope that helps! Regards, Jeremy -- Jeremy Cole Technical Yahoo - MySQL (Database) Geek -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
RE: Logging Data: Should I use MyIsam or InnoDB?
Jeremy, Thanks, this is what I was originally thinking of, but how I am getting rid of the data in log_view_today? OR, are you saying that log_view_today is a merge table for only the current day? That table def is defined every night? Then I would go about dropping everything in whatever fashion I need. When I recreate the merge table for just the current day, don't I have to drop the merge table, or it just gets recreated automatically. I am not sure why you reference atomic on ALTER TABLE . . . , if there is a log writer attached to that table, won't I have to wait for a lock? What do you mean by atomic? I understand the term atomic transaction, just not sure of your context to this example. Thanks for the idea, I was already in this neck of the woods, but the MERGE table just for today, I was not sure about that. My problems in the past deal with the locking of the table by the logwriter, hence the need to truncate the table. But while it is truncating, the table hung, hence the need for InnoDB. Thomas. -Original Message- From: Jeremy Cole [mailto:[EMAIL PROTECTED] Sent: Tuesday, January 18, 2005 2:28 PM To: Thomas Lekai Cc: mysql@lists.mysql.com Subject: Re: Logging Data: Should I use MyIsam or InnoDB? Hi Thomas, I have a curious issue here, maybe someone can help. I have a single process that inserts data into tables that contain purely logging information. This table is then searched by our Care department to troubleshoot issues. I am looking for the best way to store this data, and the structure on the backend. There are 50 million inserts into table LOG a day. The primary index on the table is seconds from 1971. I only need to keep 60 days worth of data, and the table is only used for read purposes. This is my design criteria, but my problem is how to delete old data without crashing the log writer that is atteched to the table. OK, how about this: Use MyISAM and MERGE tables. Keep one table per day. E.g.: log_2005_01_15 log_2005_01_16 log_2005_01_17 log_2005_01_18 etc. Use MERGE tables to give you the 60 day (and perhaps e.g. 14 day, 30 day, 7 day, etc.) read views that you need, like so: CREATE TABLE log_view_7day ( ... ) TYPE=MERGE UNION=( log_2005_01_12, log_2005_01_13, log_2005_01_14, log_2005_01_15, log_2005_01_16, log_2005_01_17, log_2005_01_18 ); Create another MERGE table for today using INSERT_METHOD: CREATE TABLE log_view_today ( ... ) TYPE=MERGE INSERT_METHOD=FIRST UNION=( log_2005_01_18 ); You can then do all of your inserts from the log writer into the today table, and do your reads against the various MERGEs. Every day at exactly midnight, you would use ALTER TABLE (which is atomic) to redefine the UNION of the MERGE definition of the various tables. When you want to delete your old data, it's simply a matter of doing an ALTER TABLE to remove them from the MERGE, and using DROP TABLE to drop the log__mm_dd table after that. Also note that you could compress the old data and leave it live (but read only) in case you ever need it. myisampack can help you with that. I hope that helps! Regards, Jeremy -- Jeremy Cole Technical Yahoo - MySQL (Database) Geek -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Logging Data: Should I use MyIsam or InnoDB?
Hi Thomas, Thanks, this is what I was originally thinking of, but how I am getting rid of the data in log_view_today? OR, are you saying that log_view_today is a merge table for only the current day? That table def is defined every night? Then I would go about dropping everything in whatever fashion I need. Basically, at midnight you would do a sequence like this: CREATE TABLE log_2005_01_19 ( ... ); ALTER TABLE log_view_today UNION=(log_2005_01_19); ALTER TABLE log_view_7day UNION=(..., log_2005_01_19); etc. etc. You could actually create the tables beforehand, and only do the ALTER TABLEs themselves at midnight. Note that this procedure has a race condition in that, depending on when the table actually switches over, you may have some records on either side of the split that don't belong. You can always move those back manually with something like: INSERT INTO log_2005_01_18 SELECT * from log_2005_01_19 WHERE datefield 2005-01-19 00:00:00; DELETE FROM log_2005_01_19 WHERE datefield 2005-01-19 00:00:00; or INSERT INTO log_2005_01_19 SELECT * from log_2005_01_18 WHERE datefield = 2005-01-19 00:00:00; DELETE FROM log_2005_01_18 WHERE datefield = 2005-01-19 00:00:00; In some cases it might be easier to do the switch always e.g. 5 seconds before midnight, so that any records falling on the wrong side of the edge will always be on the same side. That makes things easier sometimes. When I recreate the merge table for just the current day, don't I have to drop the merge table, or it just gets recreated automatically. I am not sure why you reference atomic on ALTER TABLE . . . , if there is a log writer attached to that table, won't I have to wait for a lock? What do you mean by atomic? I understand the term atomic transaction, just not sure of your context to this example. What I mean is, you can use ALTER TABLE to change the definition (e.g. which tables it contains) of the MERGE table. This happens atomically (no INSERTs will error, and no records could conceivably be split by the sudden change). Does that all make sense? Regards, Jeremy -- Jeremy Cole Technical Yahoo - MySQL (Database) Geek -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
RE: Logging Data: Should I use MyIsam or InnoDB?
Jeremy, Thanks a million, this makes perfect sense, I will test this out asap. In theory it sounds like the plan, I just need to see if it will work here. I really appreciate the help . . . Regards, Thomas. -Original Message- From: Jeremy Cole [mailto:[EMAIL PROTECTED] Sent: Tuesday, January 18, 2005 3:30 PM To: Thomas Lekai Cc: mysql@lists.mysql.com Subject: Re: Logging Data: Should I use MyIsam or InnoDB? Hi Thomas, Thanks, this is what I was originally thinking of, but how I am getting rid of the data in log_view_today? OR, are you saying that log_view_today is a merge table for only the current day? That table def is defined every night? Then I would go about dropping everything in whatever fashion I need. Basically, at midnight you would do a sequence like this: CREATE TABLE log_2005_01_19 ( ... ); ALTER TABLE log_view_today UNION=(log_2005_01_19); ALTER TABLE log_view_7day UNION=(..., log_2005_01_19); etc. etc. You could actually create the tables beforehand, and only do the ALTER TABLEs themselves at midnight. Note that this procedure has a race condition in that, depending on when the table actually switches over, you may have some records on either side of the split that don't belong. You can always move those back manually with something like: INSERT INTO log_2005_01_18 SELECT * from log_2005_01_19 WHERE datefield 2005-01-19 00:00:00; DELETE FROM log_2005_01_19 WHERE datefield 2005-01-19 00:00:00; or INSERT INTO log_2005_01_19 SELECT * from log_2005_01_18 WHERE datefield = 2005-01-19 00:00:00; DELETE FROM log_2005_01_18 WHERE datefield = 2005-01-19 00:00:00; In some cases it might be easier to do the switch always e.g. 5 seconds before midnight, so that any records falling on the wrong side of the edge will always be on the same side. That makes things easier sometimes. When I recreate the merge table for just the current day, don't I have to drop the merge table, or it just gets recreated automatically. I am not sure why you reference atomic on ALTER TABLE . . . , if there is a log writer attached to that table, won't I have to wait for a lock? What do you mean by atomic? I understand the term atomic transaction, just not sure of your context to this example. What I mean is, you can use ALTER TABLE to change the definition (e.g. which tables it contains) of the MERGE table. This happens atomically (no INSERTs will error, and no records could conceivably be split by the sudden change). Does that all make sense? Regards, Jeremy -- Jeremy Cole Technical Yahoo - MySQL (Database) Geek -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]