I'm trying to understand the most efficient load sequence 
possible.

I've got 2 large tables. Each is about 50 million rows,
taking about 20GB of disk space.  My disk is accessed
via fibre channel, and can read/write about 40 MB per
second.

My system is a dual PIII 850, running Linux 2.4.2-2smp #1 SMP.
Memory: 1GB.

I've "downgraded" my compiler to GCC 2.95-3, and can compile
MySQL.

The actual load of a single table takes about 1/2 hour, but 
indexing takes DAYS.  I've recompiled '--with-raid' which allows
me to create a table big enough.

The queries are very fast, which is why I am attempting this.  
I currently use Oracle, but would like to offload to MySQL as
much as possible.  Once I prove this out, I figure the support
contract is WAY cheaper than my Oracle licenses for new projects.
Total Oracle load, index, analyze for this data is about 1/2 a day.

Here's my start MySQL command:
nohup nice --10 /usr/local/libexec/mysqld \
        --basedir=/usr/local \
        --datadir=/var/lib/mysql \
        --user=mysql \
        --pid-file=/var/lib/mysql/ramius.xxx.com.pid \
        --skip-locking  \
        -O key_buffer=512M \
        -O max_allowed_packet=1M \
        -O table_cache=256 \
        -O sort_buffer=64M \
        -O record_buffer=1M \
        -O myisam_sort_buffer_size=128M \
        -O thread_cache=8 \
        -O thread_concurrency=4    \
        --port=3306 \
        --skip-networking \
        --tmpdir=/xy/local/tmp/     \
        --warnings      \
        --log=/xy/local/tmp/mysql.log \
        &

It eats about 512 MB.

Here's the critical part of the table create script:

create table ind_1101 (
FINDER_NUMBER_IND VARCHAR(10),
FINDER_NUMBER_HH VARCHAR(10),
DATE_ON_BASE DATE ,
...
100 fields later
)
TYPE=MyISAM
RAID_TYPE=STRIPED
RAID_CHUNKS=30
RAID_CHUNKSIZE=10000
MIN_ROWS=60000000
MAX_ROWS=500000000

I do the load of the data and then index via a single
alter statement rather than a series of creates, to avoid
many table copies.  Note:  I get my best load performance 
by allowing 2 source files to be loaded at the same time.
If I do 1, then I don't fill the pipeline, and if I do 3,
the disk seems to thrash and wait time goes up.  I never
use more than a single CPU during this process.

I've tried InnoDB, but it loads too slow.

According to the docs, it seems that the following 
sequence should be faster:

Create WITH indexes defined.
Alter table disable indexes.
Load data.
Alter table enable indexes.

The 'enable' and 'disable' seem to be strictly a MySQL 4.0
command, which I didn't initially DL, so I'm setting that
up now.

Is there anything else I should be looking at?
Is there a large table FAQ?
Will I cut the time in 1/2 buy putting it on a 
Dual 2GHZ Xeon (waiting for a test box right now)?

Thanks for reading this far.

Barry

---------------------------------------------------------------------
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/           (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php

Reply via email to