Hi Folks --

Ran into a couple of performance issues, and I'm looking for some optimization
tips :)

I'm currently using MySQL 4.1.5-gamma, built from the BitKeeper tree a month or
so ago.  I have a table that is roughly 2500 columns by 91 million rows (the
incoming data produces 4 of these tables a month).  The tables are MyISAM, and
look roughly like:

create table raddata_2004_10 (
  granID               integer not null,
  scanID               tinyint unsigned not null,
  fpID                 tinyint not null,
  c1                   float not null,
  c2                   float not null,
  c3                   float not null,
           [BIG SNIP]
  c2499                float not null,
  c2500                float not null,
  primary key (granID, scanID, fpID)
) engine=MyISAM;

The first issue is indexing performance.  Data arrives in granules, each of
which adds just over 12,000 rows to this table.  With the keys disabled,
inserting a granule takes about 50 seconds.  With keys enabled, the insert
takes 120-125 seconds.  I can escape some of the pain by just not "promoting" a
table for the users (adding it to the larger merge tables that the users issue
queries against) until the entire month's data has been added, and then
enabling the keys.  On a table with a full month's data, ALTER TABLE
raddata_2004_10 ENABLE KEYS takes about 22 hours to complete.  Fortunately,
it's a write-once setup for the most part, and most of the database engine's
work is selecting/subsetting the data once it's in.  The full monthly cycle is
sketched below.
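
A rough sketch of that cycle, for concreteness -- the merge-table name and the
file path are placeholders, and the actual loads may be batched INSERTs rather
than LOAD DATA:

alter table raddata_2004_10 disable keys;

-- repeated for each incoming granule (~12,000 rows apiece)
load data infile '/path/to/granule.dat' into table raddata_2004_10;

-- once the month is complete, rebuild the indexes (~22 hours)
alter table raddata_2004_10 enable keys;

-- "promote" the finished table into the merge table the users query
-- (raddata_all is a hypothetical name for that merge table)
alter table raddata_all union=(raddata_2004_09, raddata_2004_10);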

So, is there a faster way to insert/index the data?  Would a different table or
index type improve performance?

The second issue is query performance.  It seems that regardless of which
fields are selected, MySQL reads the entire row.  Since a monthly table
averages 840GB, this takes a while, even on a well-organized query like
'select c1 from raddata_2004_10 where granID between 147568 and 150000'
(granID tracks the granules from the start of the mission, and is a calculated
value based on the date/time of the data in the file, incrementing over time;
the roughly 2500 granules in the example represent about 1/3 of a month -- 10
days or so).  Even making sure it uses the index for the query (which it does
anyhow for the above query, according to explain), the current behavior makes
it read 29,548,800 complete rows, which is 275.4 GB of data, even though the
desired return is about 1/2500th of that (112.7 MB).
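
For reference, explain output for that query looks roughly like this
(abbreviated to the relevant columns; the row count is the server's estimate):

mysql> explain select c1 from raddata_2004_10
    ->   where granID between 147568 and 150000 \G
        table: raddata_2004_10
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
         rows: 29548800
        Extra: Using where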

All of these tables use fixed-format rows.  Is there a configuration or code
tweak that would let MySQL read just the requested data pieces, or even a
smaller subset of each row?  With a fixed row format, each field offset should
be a relatively simple calculation within a row.
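
To illustrate the arithmetic (assuming MyISAM's usual 1-byte record header on
fixed-length rows -- all columns here are NOT NULL, so no null bitmap):

  row length = 1 (header) + 4 (granID) + 1 (scanID) + 1 (fpID) + 2500 * 4 (floats)
             = 10,007 bytes
  offset(cN) = 7 + 4 * (N - 1)     -- c1 at byte 7, c2500 at byte 10,003

which matches the numbers above: 29,548,800 rows * 10,007 bytes is the 275.4 GB
read, while 29,548,800 * 4 bytes for c1 alone is the 112.7 MB actually wanted.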

Any/all suggestions, comments, even flames are welcome :)  Thanks in advance!

ken
===========================================================================
              "Diplomacy is the weapon of the Civilized Warrior"
                                    - Hun, A.T.

Ken Gieselman                                      [EMAIL PROTECTED]
System Administrator                   http://www.endlessknot.com/~kgieselm
Endlessknot Communications                       http://www.endlessknot.com
===========================================================================




