Hi!

On Dec 04, Barry Roomberg wrote:
> I'm trying to understand the most efficient load sequence 
> possible.
> 
> I've got 2 large tables. Each is about 50 million rows,
> taking about 20GB of disk space.
> 
> My system is a dual PIII 850, running Linux 2.4.2-2smp #1 SMP.
> Memory: 1GB.
> 
> The actual load of a single table takes about 1/2 hour, but 
> indexing takes DAYS.  I've recompiled '--with-raid' which allows
> me to create a table big enough.

Why do you need RAID ? Are you using ext2fs ?
With 2.4.2 you can use reiserfs, which does not have 2GB file limit.

Are you sure MySQL uses "repair by sorting" ?
SHOW PROCESSLIST can be used to verify (after first 1/2 hour that is,
when all the data are loaded).

Below, I assume it's "repair by sort"

> Here's my start MySQL command:
> nohup nice --10 /usr/local/libexec/mysqld \
>         --basedir=/usr/local \
>         --datadir=/var/lib/mysql \
>         --user=mysql \
>         --pid-file=/var/lib/mysql/ramius.xxx.com.pid \
>         --skip-locking  \
>         -O key_buffer=512M \

It's not that important for bulk loading.

>         -O max_allowed_packet=1M \
>         -O table_cache=256 \
>         -O sort_buffer=64M \
>         -O record_buffer=1M \
>         -O myisam_sort_buffer_size=128M \

This one you'd like to increase to 512M or even more -
you can devote all your free memory to that.

> Here's the critical part of the table create script:
> 
> create table ind_1101 (
> FINDER_NUMBER_IND VARCHAR(10),
> FINDER_NUMBER_HH VARCHAR(10),
> DATE_ON_BASE DATE ,
> ...
> 100 fields later
> )
> TYPE=MyISAM
> RAID_TYPE=STRIPED
> RAID_CHUNKS=30
> RAID_CHUNKSIZE=10000
> MIN_ROWS=60000000
> MAX_ROWS=500000000

It's not all critical parts.
What indexes are in ?

> Create WITH indexes defined.
> Alter table disable indexes.
> Load data.
> Alter table enable indexes.

If you use LOAD DATA INFILE (which is recommended) into empty table,
then you need not ALTER TABLE. Having one INSERT per row is the most
inefficient approach.  INSERT ... VALUES (...),(...),(...)... is better,
but you cannot put all your 20GB in one INSERT - it's limited by packet
size.  LOAD DATA INFILE allows you to load all the 20GB in one statement.

> The 'enable' and 'disable' seem to be strictly a MySQL 4.0
> command, which I didn't initially DL, so I'm setting that
> up now.

Yes, but they are simply convenient shortcuts to

myisamchk --keys-used=0 -rq

You can enable/disable keys from command line in 3.23.

> Is there anything else I should be looking at?
> Is there a large table FAQ?

No. Only "Speed of `INSERT' Queries" section in the manual,
but you've read it already.

Note that myisam_bulk_insert_tree_size variable,
as mentioned in the 4.0 manual, does nothing for "repair-by-sort".

> Will I cut the time in 1/2 buy putting it on a 
> Dual 2GHZ Xeon (waiting for a test box right now)?

It depends. If the speed in CPU-bound (`top' shows 99% CPU load),
then adding some MHz to CPU will help. If CPU is mostly idle,
then ... you have to tune up mysqld variables :-)

Regards,
Sergei

-- 
MySQL Development Team
   __  ___     ___ ____  __
  /  |/  /_ __/ __/ __ \/ /   Sergei Golubchik <[EMAIL PROTECTED]>
 / /|_/ / // /\ \/ /_/ / /__  MySQL AB, http://www.mysql.com/
/_/  /_/\_, /___/\___\_\___/  Osnabrueck, Germany
       <___/

---------------------------------------------------------------------
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/           (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php

Reply via email to