Hi, I am running a large and mature mission-critical PostgreSQL implementation with multiple tables that are growing at a rate of several million rows per month (the rate of addition is growing as well). It is a 24x7 operation, so downtime is a party foul and apparent performance has to be reasonably good all the time.
The problem: My working set is typically several million rows (and growing) at any one time, which has a tendency to thrash the buffers mercilessly. Records are inserted in an order that does not reflect typical retrieval, so a typical range query has to hit many, many blocks to collect all the rows in its range. CLUSTER isn't cutting it (more below). Definitely sub-optimal. (Sketch 1 at the end of this mail shows the access pattern I mean.)

Now, I've actually hacked commercial MVCC engines back in the day, and am comfortable playing around in database internals. I have an "itch to scratch": improving the scalability of really large tables by allowing explicit control of table layout as an optional property of the table. I would like some feedback as to whether this is practical given the current architecture; it appears that it is, but I'm not intimately familiar with it (yet).

1.) B-tree organized tables. The primary key index also contains the entire row. Upside: very fast retrieval for indexed queries over large data sets, and very efficient buffer usage. Downside: in practice, the primary key index has to be the only index on the table, and the locking behaves more like index locking than data-page locking. I've used this feature a lot for very large tables in Oracle implementations, and you can get substantial speed improvements over normal indexed tables if it is used correctly. It is also vastly more convenient for a 24x7 operation than the CLUSTER command, which creates nasty contention, doesn't keep the table ordered on insert/update (i.e. it has to be re-run regularly), and leaves you with an index that is basically redundant. (Sketch 2 below puts the Oracle syntax next to today's CLUSTER workaround.)

2.) Function/hash organized tables (aka partitioning). Distributing tuples across blocks according to a user-provided function is the second target on my short list. Doing this well (i.e. being able to take individual partitions offline or set them read-only) takes a hell of a lot of work, but a table that internally clusters its records according to a user-provided hash function is a good start. This isn't a beginner feature (carelessly designed transactions can be quite costly on partitioned tables), but it is a very nice tool to have. True partitioning would be enterprise-grade gold, but that target may be a bit high for now. (Sketch 3 below is a strawman of the syntax.)

Both of these are attempts to address the same basic problem: minimizing the number of buffers a given query touches by making the table's layout reflect typical queries. Given the size of our typical working sets, optimizing the layout greatly improves performance by reducing buffer thrashing.

I would very much like to work on this type of feature (automatic table layout optimization), but I thought I would check with the people currently hacking on the system first, to see if there is a showstopper or if someone is already working on this.

Cheers,

-James Rogers
[EMAIL PROTECTED]
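P.S. Three sketches for concreteness. All table and column names below are invented purely for illustration.

Sketch 1: the access pattern. Rows arrive interleaved across thousands of sources, so the heap stores them in arrival order, while the typical query wants one source's recent history:

  -- Hypothetical schema: samples(src_id, stamp, val), rows inserted
  -- in arrival order across many sources. The typical query:
  SELECT *
    FROM samples
   WHERE src_id = 17
     AND stamp >= now() - interval '1 day';

  -- Because consecutive heap blocks hold rows from many different
  -- sources, nearly every row this query needs lives in a different
  -- block, so the scan drags a large number of blocks through the
  -- buffer cache even though the result set itself is small.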
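Sketch 2: what I mean by a B-tree organized table, in Oracle's syntax (the feature I used there), next to today's closest PostgreSQL workaround:

  -- Oracle: the table *is* its primary key B-tree, so rows for one
  -- src_id are physically adjacent and stay that way on insert:
  CREATE TABLE samples (
      src_id  INTEGER NOT NULL,
      stamp   DATE NOT NULL,
      val     NUMBER,
      CONSTRAINT samples_pk PRIMARY KEY (src_id, stamp)
  ) ORGANIZATION INDEX;

  -- PostgreSQL today: heap and index are separate; CLUSTER sorts the
  -- heap once and must be re-run as new inserts scatter rows again:
  CREATE TABLE samples (
      src_id  integer NOT NULL,
      stamp   timestamp NOT NULL,
      val     real,
      PRIMARY KEY (src_id, stamp)
  );
  CLUSTER samples_pkey ON samples;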
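Sketch 3: a strawman of user syntax for function/hash organized tables. None of this exists; it is purely to show the shape of the idea:

  -- Hypothetical: tuples are routed to block groups by a
  -- user-supplied expression, so a query filtering on src_id only
  -- touches the blocks of that bucket:
  CREATE TABLE samples (
      src_id  integer,
      stamp   timestamp,
      val     real
  ) ORGANIZE BY FUNCTION (src_id % 64);

What can be faked today is the storage split, via inheritance, though the planner cannot prune buckets and inserts have to be routed by hand (or by a trigger):

  CREATE TABLE samples (src_id integer, stamp timestamp, val real);
  CREATE TABLE samples_b0 () INHERITS (samples);
  CREATE TABLE samples_b1 () INHERITS (samples);
  -- the application routes by src_id % 2; 17 % 2 = 1, so bucket 1:
  INSERT INTO samples_b1 VALUES (17, now(), 1.0);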