On 10/2/03 11:34 PM, "Hannu Krosing" <[EMAIL PROTECTED]> wrote: > James Rogers kirjutas N, 02.10.2003 kell 23:44: >> Not exactly. What you are describing is more akin to partitioning or >> hash-organized tables i.e. sorting insert/update tuples to various pages >> according to some hash function. > > What I actually thought I was describing is how CLUSTER should work in a > postgres flavour of MVCC storage ;). Not the CLUSTER command, but the > whole feature.
Yeah, I can see this. "Clustering" doesn't really imply ordering, just tuple proximity in the heap. Loosely implemented by default (i.e. only grouping a tuple if it is cheap to do so at insert/update time) on a table's primary key might give measurable query performance improvement in the typical case without impacting the average performance of queries that use other indexes. Even if the tuples were not ordered in the heap, tuple proximity in the heap would solve a significant percentage of the performance issue that index-organized tables solve i.e. greatly improved cache efficiency. > AFAICS we could resolve this problem (querying indexes only) by keeping > a copy of visibility info (tmin,tmax,...) in index tuples. This would > make index updates bigger and thus slower, so this should be optional. It wouldn't be a feature you would want to use anyway if actually indexing a table. But yes, adding the necessary support to make an index page masquerade as a proper heap page would be most of the effort. That and making the SQL execution efficiently make use of the fact that the index and the table are the same relation with two different interfaces, such that you don't have to modify or access "both" when using either one. > If you then put all fields in primary key, then the main table could be > dropped. If there is no "data" table then no other indexes would then be > allowed, or they must be "double-indexes" referencing the primary key, > not tuple and thus even bigger ... Putting an index on an index is pretty horrible. But quite frankly, if you actually require a second index on a table, then B-Tree organized tables are not what you need. > But if we had clustered the table on (id, timestamp), then the data > would be in right order for queries, if cluster worked well. Right, and it doesn't even have to be in perfect index order really, as what I'm really trying to do is significantly decrease the amount of buffers required for a typical query. Having to order the tuples later is a small matter. > Yeah, index-organized tables seems exact fit for your problem, but then > my abstract idea of what clustering should do is exactly that - keep > tuples in roughly the same order as an index ;) It would be a good approximation of what index-organizing aims to accomplish. > So what really is needed is a smart tuple-placer which can keep tuples > that are close (as defined by index) together in a small number of > pages. These pages themselves need not be coninuous, they can be > sprinkled around in the whole data table, but they need to stay clusters > of index-close tuples. More or less, yes. Cheers, -James Rogers [EMAIL PROTECTED] ---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])