"Martijn Tonies" <[EMAIL PROTECTED]> wrote on 01/19/2005 03:33:32 AM:

> Hello Jeremy,
> 
> > > I have a curious issue here, maybe someone can help.
> > >
> > > I have a single process that inserts data into tables that contain
> > > purely logging information.  This table is then searched by our Care
> > > department to troubleshoot issues.  I am looking for the best way to
> > > store this data, and the structure on the backend.
> > >
> > > There are 50 million inserts into table LOG a day.  The primary
> > > index on the table is seconds from 1971.  I only need to keep 60
> > > days worth of data, and the table is only used for read purposes.
> > > These are my design criteria, but my problem is how to delete old
> > > data without crashing the log writer that is attached to the table.
> >
> > OK, how about this:
> >
> > Use MyISAM and MERGE tables.  Keep one table per day.  E.g.:
> >
> > log_2005_01_15
> > log_2005_01_16
> > log_2005_01_17
> > log_2005_01_18
> >
> > etc.
> >
> > Use MERGE tables to give you the 60 day (and perhaps e.g. 14 day, 30
> > day, 7 day, etc.) read views that you need, like so:
> >
> > CREATE TABLE log_view_7day (
> >    ...
> > ) TYPE=MERGE UNION=(
> >    log_2005_01_12, log_2005_01_13, log_2005_01_14, log_2005_01_15,
> >    log_2005_01_16, log_2005_01_17, log_2005_01_18
> > );
> >
> > You can then do all of your inserts from the log writer into the
> > "today" table, and do your reads against the various MERGEs.
> >
> > Every day at exactly midnight, you would use ALTER TABLE (which is
> > atomic) to redefine the UNION of the MERGE definition of the various
> > tables.
> 
> Modifying metadata because you need a different view of your
> data.
> 
> Am I the only one to whom this sounds ugly?
> 
> With regards,
> 
> Martijn Tonies
> Database Workbench - developer tool for InterBase, Firebird, MySQL &
> MS SQL Server
> Upscene Productions
> http://www.upscene.com
> 

It sounds just fine to me. IMHO this is an excellent use of MySQL 
features to solve a technical problem. The original poster wants to keep 
only 60 days worth of data, and he also needs really fast inserts. MyISAM 
can perform concurrent, non-blocking inserts so long as there are no 
deletion gaps in the table being inserted into. One solution is to create 
what is in effect a "view" that spans 60 days worth of data, with each 
day's data in its own table. Since real views won't be officially 
available until MySQL 5.0, a MERGE table is an effective substitute.
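
To make the nightly rotation concrete, the ALTER TABLE against the 7-day 
view quoted above would look something like this (a sketch only; the new 
daily table must already exist and share the structure of the others):

ALTER TABLE log_view_7day UNION=(
   log_2005_01_13, log_2005_01_14, log_2005_01_15, log_2005_01_16,
   log_2005_01_17, log_2005_01_18, log_2005_01_19
);

Because the ALTER is atomic, readers of log_view_7day never see a 
half-rotated view, and the log writer inserting into the "today" table 
is never disturbed.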

This also gives him several granularity options when performing queries. 
He can query a daily table directly if he knows that the query does not 
span days. He could use multiple MERGE tables to aggregate various ranges 
of daily data. He could UNION several single-day queries together to 
cover a span or list of dates (see the sketch below). All of these have 
various advantages, the biggest of which is that when it comes time to 
move the out-of-date data off-line, he won't have to lock the current 
day's table to do it.
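
For example, stitching two daily tables together with UNION ALL might 
look like this (the column names and search term here are hypothetical):

SELECT log_time, message
  FROM log_2005_01_17
 WHERE message LIKE '%session 1234%'
UNION ALL
SELECT log_time, message
  FROM log_2005_01_18
 WHERE message LIKE '%session 1234%';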

One alternative could be a VIEW built against a single, larger table, 
with a query specifying a date range of 60 days. Something like:

CREATE VIEW umptyfratz AS
SELECT * FROM datatable
WHERE logdate > (NOW() - INTERVAL 60 DAY);

However, this introduces several performance hits:
        1) The WHERE clause has to be recomputed every time the VIEW is 
queried so that only rows within the specified date range are returned. 
This may not scale well to larger tables.
        2) If we delete rows whose logdate value is older than 60 days, 
we create deletion gaps in the data file. These gaps prevent MyISAM from 
simply appending data to the end of the table (non-blocking inserts) and 
force a table lock for each insert so that the gaps can be filled in 
first. This limits concurrency severely (see the sketch after this list).
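
A quick sketch of the purge that creates those gaps, and the repair step 
it then forces (table and column names follow the VIEW example above):

DELETE FROM datatable WHERE logdate < (NOW() - INTERVAL 60 DAY);

OPTIMIZE TABLE datatable;

The DELETE punches holes into the middle of the data file; OPTIMIZE 
TABLE reclaims them so that appends become lock-free again, but it locks 
the entire table while it rewrites it.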

However, there is a dark side to the MERGE table solution, too. As 
explained in the manual (
http://dev.mysql.com/doc/mysql/en/MERGE_storage_engine.html), the MERGE 
table AND each table participating in the merge consume an operating 
system file handle each time the MERGE table is opened. For a MERGE 
table covering 60 daily tables, that means the OS has to issue 61 new 
file handles PER USER ACCESSING THE TABLE. Depending on the limits of 
your OS, you may not be able to open very many files. I would recommend 
using weekly tables instead, so that a MERGE table covering 60 days 
worth of data needs only 10 file handles from the OS (9 weeks' worth of 
data + 1 for the MERGE itself), as sketched below.
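
A sketch of the weekly variant (the table names and columns here are 
hypothetical; nine weekly tables cover at least 60 days, and all 
underlying tables must share an identical structure):

CREATE TABLE log_view_60day (
   log_time INT UNSIGNED NOT NULL,
   message VARCHAR(255) NOT NULL,
   KEY (log_time)
) TYPE=MERGE UNION=(
   log_2004_w47, log_2004_w48, log_2004_w49,
   log_2004_w50, log_2004_w51, log_2004_w52,
   log_2005_w01, log_2005_w02, log_2005_w03
);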

Of course, before putting such a solution into production, it is 
imperative that any DBA test, test, and retest to make sure they are 
getting optimal performance and resource usage.

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine
