On Tue, Sep 30, 2003 at 11:31:26PM -0700, James Rogers wrote:

> The problem: My working set is typically several million rows (and growing)
> at any one time, which has a tendency to thrash the buffers mercilessly.
> Records are inserted in an order that does not reflect typical retrieval
> such that a typical query has to hit many, many blocks to collect all the
> rows in a given range query.  CLUSTER isn't cutting it (more below).
> Definitely sub-optimal.

I have a situation that is very similar to yours, and my problems are
very similar to yours.  I hope some of your ideas are implementable in
some way, because I think they would solve some of my problems too.

> 1.) B-tree organized tables.  The primary key index also contains the
> entire row.

I think this is called a clustered index on some other database systems.
Basically you want to replace the content of the btree item with the
whole tuple, instead of the pointer to the heap element which contains
the tuple.  One thing to keep in mind is that index tuples are limited
to BLCKSZ/3 IIRC, so this limits the size of tuples that can be put in
such a "table".  TOASTing may help alleviate this problem.

As for other indexes, I'm not sure why you say this precludes the use of
other indexes.  The only thing they have to do is keep pointers to index
elements, instead of heap elements.  Doesn't sound impossible to me.

Another thing to keep in mind:  L&Y requires unique keys.  In the
current btree implementation this is worked around using the
pointers-to-heap (ctid?) to differentiate items that have the same key.
If you use a clustered index you won't have this pointer; maybe it will
be required that this index is marked UNIQUE.  This may not be a too
onerous restriction, given that the index is a primary key after all.


I don't have anything to say about 2).

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"No deja de ser humillante para una persona de ingenio saber
que no hay tonto que no le pueda enseņar algo." (Jean B. Say)

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
      joining column's datatypes do not match

Reply via email to