Tyrrill, Ed wrote:
I have a two-column table with over 160 million rows in it.  As the size
of the table grows, queries on this table get dramatically slower.  I am
using version 8.1.5 32-bit on Red Hat Enterprise Linux 3.  The hardware
is a 3 GHz Intel Xeon with 4GB RAM, and 6 disks in a RAID 5
configuration.  For current testing I am running a single database
connection with no other applications running on the machine, and the
swap is not being used at all.

Here is the table definition:

mdsdb=# \d backup_location
 Table "public.backup_location"
  Column   |  Type   | Modifiers
-----------+---------+-----------
 record_id | bigint  | not null
 backup_id | integer | not null
Indexes:
    "backup_location_pkey" PRIMARY KEY, btree (record_id, backup_id)
    "backup_location_rid" btree (record_id)
Foreign-key constraints:
    "backup_location_bfk" FOREIGN KEY (backup_id) REFERENCES
backups(backup_id) ON DELETE CASCADE
Here is the table size:

mdsdb=# select count(*) from backup_location;
   count
-----------
 162101296
(1 row)

And here is a simple query on this table that takes nearly 20 minutes to
return less than 3000 rows.  I ran ANALYZE immediately before running
this query:

mdsdb=# explain analyze select record_id from backup_location where
backup_id = 1070;
                              QUERY PLAN
------------------------------------------------------------------------
 Index Scan using backup_location_pkey on backup_location  (cost=0.00..1475268.53 rows=412394 width=8) (actual time=3318.057..1196723.915 rows=2752 loops=1)
   Index Cond: (backup_id = 1070)
 Total runtime: 1196725.617 ms
(3 rows)

I've got a few points. Firstly, is your data amenable to partitioning? If so, that might be a big winner. Secondly, it might be more efficient for the planner to use the backup_location_rid index than the composite primary key index. You can test that theory with this cool PostgreSQL trick:

begin;
-- temporarily drop the composite primary key (and its index) inside a transaction
alter table backup_location drop constraint backup_location_pkey;
explain analyze select record_id from backup_location where backup_id = 1070;
-- rolling back puts the constraint right back
rollback;

to see if it's faster.

Obviously at this point the application is not usable.  If possible we
would like to grow this table to the 3-5 billion row range, but I don't
know if that is realistic.

Any guidance would be greatly appreciated.

Without knowing more about your usage patterns, it's hard to say. But partitioning seems like your best choice at the moment.
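
For what it's worth, here is a minimal sketch of what partitioning could look like on 8.1, using table inheritance plus constraint exclusion, split by backup_id. The child table names and the range boundaries below are just placeholders; you'd pick ranges that match how your backup_ids actually grow:

-- parent keeps the definition; children hold the data
create table backup_location_p01 (
    check (backup_id >= 1000 and backup_id < 2000)
) inherits (backup_location);

create table backup_location_p02 (
    check (backup_id >= 2000 and backup_id < 3000)
) inherits (backup_location);

-- indexes are not inherited, so each child needs its own
create index backup_location_p01_bid on backup_location_p01 (backup_id);
create index backup_location_p02_bid on backup_location_p02 (backup_id);

-- route inserts to the right child (one rule per child, or do it in the application)
create rule backup_location_ins_p01 as
    on insert to backup_location
    where (backup_id >= 1000 and backup_id < 2000)
    do instead insert into backup_location_p01 values (new.record_id, new.backup_id);

-- let the planner skip children whose check constraints can't match
set constraint_exclusion = on;

With constraint_exclusion on, a query like the one above should only have to scan the one child whose check constraint covers backup_id = 1070, rather than an index over all 160 million rows.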
