Hi, I am running a large and mature mission-critical PostgreSQL implementation with multiple tables that are growing at a rate of several million rows per month (the rate of addition is growing as well). It is a 24x7 operation, so downtime is a party foul and apparent performance has to be reasonably good all the time.
The problem: My working set is typically several million rows (and growing) at any one time, which has a tendency to thrash the buffers mercilessly. Records are inserted in an order that does not reflect typical retrieval, so a typical range query has to hit many, many blocks to collect all the rows in its range. CLUSTER isn't cutting it (more below). Definitely sub-optimal. (Sketch 1 at the end of this mail shows the access pattern I mean.)

Now, I've actually hacked commercial MVCC engines back in the day, and am comfortable playing around in database internals. I have an "itch to scratch": improving the scalability of really large tables by allowing explicit control of table layout as an optional property of the table. I would like some feedback as to whether this is practical given the current architecture; it appears that it is, but I'm not intimately familiar with it (yet).

1.) B-tree organized tables. The primary key index also contains the entire row. Upside: very fast retrieval for indexed queries over large data sets, and very efficient buffer usage. Downside: in practice, the primary key index has to be the only index on the table, and the locking behaves more like index locking than data-page locking. I've used this feature a lot for very large tables in Oracle implementations, and you can get substantial speed improvements over normal indexed tables if it is used correctly. It is also vastly more convenient for a 24x7 operation than the CLUSTER command, which creates nasty contention, doesn't keep the table ordered on insert/update (i.e. it has to be re-run regularly), and leaves you with an index that is basically redundant. (Sketch 2 below puts the Oracle syntax next to today's CLUSTER workaround.)

2.) Function/hash organized tables (aka partitioning). Distributing tuples across blocks according to a user-provided function is the second target on my short list. Doing this well (i.e. being able to take individual partitions offline or set them read-only) takes a hell of a lot of work, but a table that internally clusters its records according to a user-provided hash function is a good start. This isn't a beginner feature (carelessly designed transactions can be quite costly on partitioned tables), but it is a very nice tool to have. True partitioning would be enterprise-grade gold, but that target may be a bit high for now. (Sketch 3 below is a strawman of the syntax.)

Both of these are attempts to address the same basic problem: minimizing the number of buffers a given query touches by making the table's layout reflect typical queries. Given the size of our typical working sets, optimizing the layout greatly improves performance by reducing buffer thrashing.

I would very much like to work on this type of feature (automatic table layout optimization), but I thought I would check with the people currently hacking on the system first, to see if there is a showstopper or if someone is already working on this.

Cheers,

-James Rogers
[EMAIL PROTECTED]
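P.S. Three sketches for concreteness. All table and column names below are invented purely for illustration.

Sketch 1: the access pattern. Rows arrive interleaved across thousands of sources, so the heap stores them in arrival order, while the typical query wants one source's recent history:

  -- Hypothetical schema: samples(src_id, stamp, val), rows inserted
  -- in arrival order across many sources. The typical query:
  SELECT *
    FROM samples
   WHERE src_id = 17
     AND stamp >= now() - interval '1 day';

  -- Because consecutive heap blocks hold rows from many different
  -- sources, nearly every row this query needs lives in a different
  -- block, so the scan drags a large number of blocks through the
  -- buffer cache even though the result set itself is small.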
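Sketch 2: what I mean by a B-tree organized table, in Oracle's syntax (the feature I used there), next to today's closest PostgreSQL workaround:

  -- Oracle: the table *is* its primary key B-tree, so rows for one
  -- src_id are physically adjacent and stay that way on insert:
  CREATE TABLE samples (
      src_id  INTEGER NOT NULL,
      stamp   DATE NOT NULL,
      val     NUMBER,
      CONSTRAINT samples_pk PRIMARY KEY (src_id, stamp)
  ) ORGANIZATION INDEX;

  -- PostgreSQL today: heap and index are separate; CLUSTER sorts the
  -- heap once and must be re-run as new inserts scatter rows again:
  CREATE TABLE samples (
      src_id  integer NOT NULL,
      stamp   timestamp NOT NULL,
      val     real,
      PRIMARY KEY (src_id, stamp)
  );
  CLUSTER samples_pkey ON samples;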
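Sketch 3: a strawman of user syntax for function/hash organized tables. None of this exists; it is purely to show the shape of the idea:

  -- Hypothetical: tuples are routed to block groups by a
  -- user-supplied expression, so a query filtering on src_id only
  -- touches the blocks of that bucket:
  CREATE TABLE samples (
      src_id  integer,
      stamp   timestamp,
      val     real
  ) ORGANIZE BY FUNCTION (src_id % 64);

What can be faked today is the storage split, via inheritance, though the planner cannot prune buckets and inserts have to be routed by hand (or by a trigger):

  CREATE TABLE samples (src_id integer, stamp timestamp, val real);
  CREATE TABLE samples_b0 () INHERITS (samples);
  CREATE TABLE samples_b1 () INHERITS (samples);
  -- the application routes by src_id % 2; 17 % 2 = 1, so bucket 1:
  INSERT INTO samples_b1 VALUES (17, now(), 1.0);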