Max Nitribitt wrote:

> Hi Holger,
> 
> thank you for your help. I'm running the database on
> a:
> OS: LINUX (Suse 7.2)
> 768 MB DDR-RAM 
> ATHLON 1400 MHz (256 K full speed L2 Cache)
> HD: 160 GB (Raid5)
> 
> The table has 560 Million rows (not 5 600 000 000 as I
> stated before (sorry ;))) with about 450 Million
> distinct values in the field I'm trying to Index.
> TABLE: TOF
> Column a: INTEGER <- this is what I want to Index
> Column b: CHAR (9)
> (No Primary Key specified; single candidate key is
> combination of both columns)
> 
> There are no other users connected to the database nor
> to the machine it's running on yet. 
> What would be a sensible Value for DATA_CACHE and
> _IDX_FILE_SIZE under these circumstances?
> 


>>1. DATA_CACHE should be as big as possible to
>>minimize disk I/O.
>> ...
>>algorithm. A larger number of
>>buffers decreases the number of merge steps, and
>>thus increases performance.
>> ...
>>3. Spread your database over several disks to
>>increase performance by
>>parallel I/O.



Hi Max, Holger,

The very first thing to consider is upping the amount of RAM. Can you
cram 2, 4 or 6+ GB in that box? The second thing is minimising the load by
other apps (if any) on the CPU but especially the RAM and disks.

Another thing to consider is the L2 cache, since that's even faster
than main memory. Besides increasing the number of spindles you can
also increase the number of channels. As well as making sure you've
got the latest and greatest SCSI goodies. If the disks are on the same
bus they still can get in each others way.

What would be the difference betwixt having but one candidate key
and a primary key? The primary key is a candidate key picked more
or less at random, using arcane arts or sometimes by divination. It's
still a candidate key.

Anyway, having but the int for the value and another one to point to
the physical row, you already have to juggle 4480 M bytes of information
around. That is without the balanced tree pointer rigmarole. So you got
a problem. You can't fit that into main memory.

That means virtual memory which means disk access. Better have a
separate and very snappy disk for that. And controller, naturally.
Else you got one very fast processor twiddling tumbs, waiting for the
next batch of data from whatever disk it's coming from.

My 2 cents.

Roelof

PS good thing hardware prices keep going down ;).

-- 
_______________________________________________________________________
eBOA®                                               est. 1982
http://eBOA.com/                                    tel. +31-58-2123014
mailto:[EMAIL PROTECTED]?subject=Information_request    fax. +31-58-2160293

Reply via email to