This idea has been rejected due to poor performance results reported later in the thread.
---------------------------------------------------------------------------

Heikki Linnakangas wrote:
> While thinking about index-organized-tables and similar ideas, it
> occurred to me that there's some low-hanging-fruit: maintaining cluster
> order on inserts by trying to place new heap tuples close to other
> similar tuples. That involves asking the index am where on the heap the
> new tuple should go, and trying to insert it there before using the FSM.
> Using the new fillfactor parameter makes it more likely that there's
> room on the page. We don't worry about the order within the page.
>
> The API I'm thinking of introduces a new optional index am function,
> amsuggestblock (suggestions for a better name are welcome). It gets the
> same parameters as aminsert, and returns the heap block number that
> would be the optimal place to put the new tuple. It would be called from
> ExecInsert before inserting the heap tuple, and the suggestion is passed
> on to heap_insert and RelationGetBufferForTuple.
>
> I wrote a little patch to implement this for btree, attached.
>
> This could be optimized by changing the existing aminsert API, because
> as it is, an insert will have to descend the btree twice. Once in
> amsuggestblock and then in aminsert. amsuggestblock could keep the right
> index page pinned so aminsert could locate it quicker. But I wanted to
> keep this simple for now. Another improvement might be to allow
> amsuggestblock to return a list of suggestions, but that makes it more
> expensive to insert if there isn't room in the suggested pages, since
> heap_insert will have to try them all before giving up.
>
> Comments regarding the general idea or the patch? There should probably
> be an index option to turn the feature on and off. You'll want to turn it
> off when you first load a table, and turn it on after CLUSTER to keep it
> clustered.
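
To make the proposal above concrete, here is a minimal, self-contained sketch of the idea in plain C. It is not PostgreSQL code and not the attached patch: ToyHeap, ToyIndexEntry, toy_suggest_block and toy_place_tuple are hypothetical names, the "index" is just a sorted array, and a linear scan stands in for the FSM. It only illustrates the order in which pages are tried: the index's suggestion first (used whenever the tuple fits, even into the space fillfactor reserves), then the cached insert target, then the free space lookup, then extending the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for PostgreSQL types; not the real definitions. */
typedef uint32_t BlockNo;
#define INVALID_BLOCK ((BlockNo) 0xFFFFFFFF)

/* Toy heap: only the free bytes of each page are tracked. */
typedef struct ToyHeap
{
    size_t  *free_space;    /* free bytes per page */
    BlockNo  npages;        /* pages currently in the table */
    BlockNo  capacity;      /* slots allocated in free_space */
    size_t   page_size;     /* usable bytes in an empty page */
    size_t   reserve;       /* bytes per page kept free by fillfactor */
    BlockNo  cached_target; /* page a previous insert settled on */
} ToyHeap;

/* Toy index entry: a key and the heap block holding that row. */
typedef struct ToyIndexEntry
{
    int     key;
    BlockNo heap_block;
} ToyIndexEntry;

/*
 * Analogue of amsuggestblock: find where the key would go in the
 * sorted index and return the heap block of the entry found there.
 */
BlockNo
toy_suggest_block(const ToyIndexEntry *idx, size_t n, int key)
{
    size_t lo = 0, hi = n;

    if (n == 0)
        return INVALID_BLOCK;       /* empty index: no preference */
    while (lo < hi)                 /* first entry with key >= search key */
    {
        size_t mid = lo + (hi - lo) / 2;

        if (idx[mid].key < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == n)
        lo = n - 1;                 /* past the end: use the last entry */
    return idx[lo].heap_block;
}

/* Choose a page for a tuple of len bytes and charge the space to it. */
BlockNo
toy_place_tuple(ToyHeap *h, size_t len, BlockNo suggested)
{
    BlockNo blk;

    /* 1. Suggested page: taken whenever the tuple fits, ignoring the reserve. */
    if (suggested != INVALID_BLOCK && suggested < h->npages &&
        len <= h->free_space[suggested])
    {
        h->free_space[suggested] -= len;
        return suggested;
    }

    /* 2. Cached target page, honoring the fillfactor reserve. */
    blk = h->cached_target;
    if (blk != INVALID_BLOCK && blk < h->npages &&
        len + h->reserve <= h->free_space[blk])
    {
        h->free_space[blk] -= len;
        return blk;
    }

    /* 3. Stand-in for the FSM: scan for any page with enough room. */
    for (blk = 0; blk < h->npages; blk++)
    {
        if (len + h->reserve <= h->free_space[blk])
        {
            h->free_space[blk] -= len;
            h->cached_target = blk;
            return blk;
        }
    }

    /* 4. Give up and extend the table by one page, if the toy heap allows it. */
    if (h->npages == h->capacity || len > h->page_size)
        return INVALID_BLOCK;
    blk = h->npages++;
    h->free_space[blk] = h->page_size - len;
    h->cached_target = blk;
    return blk;
}
```

A caller would compute `toy_suggest_block(idx, n, key)` for the new row and pass the result to `toy_place_tuple`. Note that a successful hit on the suggested page deliberately leaves the cached target alone, so unrelated inserts keep appending where they were.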
>
> Since there's been discussion on keeping the TODO list more up-to-date,
> I hereby officially claim the "Automatically maintain clustering on a
> table" TODO item :). Feel free to bombard me with requests for status
> reports. And just to be clear, I'm not trying to sneak this into 8.2
> anymore, this is 8.3 stuff.
>
> I won't be implementing a background daemon described on the TODO item,
> since that would essentially be an online version of CLUSTER. Which sure
> would be nice, but that's a different story.
>
> - Heikki
> [ text/x-patch is unsupported, treating like TEXT/PLAIN ]
> Index: doc/src/sgml/catalogs.sgml
> ===================================================================
> RCS file: /home/hlinnaka/pgcvsrepository/pgsql/doc/src/sgml/catalogs.sgml,v
> retrieving revision 2.129
> diff -c -r2.129 catalogs.sgml
> *** doc/src/sgml/catalogs.sgml 31 Jul 2006 20:08:55 -0000 2.129
> --- doc/src/sgml/catalogs.sgml 8 Aug 2006 16:17:21 -0000
> ***************
> *** 499,504 ****
> --- 499,511 ----
>   <entry>Function to parse and validate reloptions for an index</entry>
>   </row>
>
> + <row>
> + <entry><structfield>amsuggestblock</structfield></entry>
> + <entry><type>regproc</type></entry>
> + <entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
> + <entry>Get the best place in the heap to put a new tuple</entry>
> + </row>
> +
>   </tbody>
>   </tgroup>
>   </table>
> Index: doc/src/sgml/indexam.sgml
> ===================================================================
> RCS file: /home/hlinnaka/pgcvsrepository/pgsql/doc/src/sgml/indexam.sgml,v
> retrieving revision 2.16
> diff -c -r2.16 indexam.sgml
> *** doc/src/sgml/indexam.sgml 31 Jul 2006 20:08:59 -0000 2.16
> --- doc/src/sgml/indexam.sgml 8 Aug 2006 17:15:25 -0000
> ***************
> *** 391,396 ****
> --- 391,414 ----
>   <function>amoptions</> to test validity of options settings.
> </para> > > + <para> > + <programlisting> > + BlockNumber > + amsuggestblock (Relation indexRelation, > + Datum *values, > + bool *isnull, > + Relation heapRelation); > + </programlisting> > + Gets the optimal place in the heap for a new tuple. The parameters > + correspond the parameters for <literal>aminsert</literal>. > + This function is called on the clustered index before a new tuple > + is inserted to the heap, and it should choose the optimal insertion > + target page on the heap in such manner that the heap stays as close > + as possible to the index order. > + <literal>amsuggestblock</literal> can return InvalidBlockNumber if > + the index am doesn't have a suggestion. > + </para> > + > </sect1> > > <sect1 id="index-scanning"> > Index: src/backend/access/heap/heapam.c > =================================================================== > RCS file: > /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/heap/heapam.c,v > retrieving revision 1.218 > diff -c -r1.218 heapam.c > *** src/backend/access/heap/heapam.c 31 Jul 2006 20:08:59 -0000 1.218 > --- src/backend/access/heap/heapam.c 8 Aug 2006 16:17:21 -0000 > *************** > *** 1325,1330 **** > --- 1325,1335 ---- > * use_fsm is passed directly to RelationGetBufferForTuple, which see for > * more info. > * > + * suggested_blk can be set by the caller to hint heap_insert which > + * block would be the best place to put the new tuple in. heap_insert can > + * ignore the suggestion, if there's not enough room on that block. > + * InvalidBlockNumber means no preference. > + * > * The return value is the OID assigned to the tuple (either here or by the > * caller), or InvalidOid if no OID. The header fields of *tup are updated > * to match the stored tuple; in particular tup->t_self receives the actual > *************** > *** 1333,1339 **** > */ > Oid > heap_insert(Relation relation, HeapTuple tup, CommandId cid, > ! 
bool use_wal, bool use_fsm) > { > TransactionId xid = GetCurrentTransactionId(); > HeapTuple heaptup; > --- 1338,1344 ---- > */ > Oid > heap_insert(Relation relation, HeapTuple tup, CommandId cid, > ! bool use_wal, bool use_fsm, BlockNumber suggested_blk) > { > TransactionId xid = GetCurrentTransactionId(); > HeapTuple heaptup; > *************** > *** 1386,1392 **** > > /* Find buffer to insert this tuple into */ > buffer = RelationGetBufferForTuple(relation, heaptup->t_len, > ! > InvalidBuffer, use_fsm); > > /* NO EREPORT(ERROR) from here till changes are logged */ > START_CRIT_SECTION(); > --- 1391,1397 ---- > > /* Find buffer to insert this tuple into */ > buffer = RelationGetBufferForTuple(relation, heaptup->t_len, > ! > InvalidBuffer, use_fsm, suggested_blk); > > /* NO EREPORT(ERROR) from here till changes are logged */ > START_CRIT_SECTION(); > *************** > *** 1494,1500 **** > Oid > simple_heap_insert(Relation relation, HeapTuple tup) > { > ! return heap_insert(relation, tup, GetCurrentCommandId(), true, true); > } > > /* > --- 1499,1506 ---- > Oid > simple_heap_insert(Relation relation, HeapTuple tup) > { > ! return heap_insert(relation, tup, GetCurrentCommandId(), true, > ! true, InvalidBlockNumber); > } > > /* > *************** > *** 2079,2085 **** > { > /* Assume there's no chance to put heaptup on same > page. */ > newbuf = RelationGetBufferForTuple(relation, > heaptup->t_len, > ! > buffer, true); > } > else > { > --- 2085,2092 ---- > { > /* Assume there's no chance to put heaptup on same > page. */ > newbuf = RelationGetBufferForTuple(relation, > heaptup->t_len, > ! > buffer, true, > ! > InvalidBlockNumber); > } > else > { > *************** > *** 2096,2102 **** > */ > LockBuffer(buffer, BUFFER_LOCK_UNLOCK); > newbuf = RelationGetBufferForTuple(relation, > heaptup->t_len, > ! > buffer, true); > } > else > { > --- 2103,2110 ---- > */ > LockBuffer(buffer, BUFFER_LOCK_UNLOCK); > newbuf = RelationGetBufferForTuple(relation, > heaptup->t_len, > ! 
> buffer, true, > ! > InvalidBlockNumber); > } > else > { > Index: src/backend/access/heap/hio.c > =================================================================== > RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/heap/hio.c,v > retrieving revision 1.63 > diff -c -r1.63 hio.c > *** src/backend/access/heap/hio.c 3 Jul 2006 22:45:37 -0000 1.63 > --- src/backend/access/heap/hio.c 9 Aug 2006 18:03:01 -0000 > *************** > *** 93,98 **** > --- 93,100 ---- > * any committed data of other transactions. (See heap_insert's comments > * for additional constraints needed for safe usage of this behavior.) > * > + * If the caller has a suggestion, it's passed in suggestedBlock. > + * > * We always try to avoid filling existing pages further than the > fillfactor. > * This is OK since this routine is not consulted when updating a tuple and > * keeping it on the same page, which is the scenario fillfactor is meant > *************** > *** 103,109 **** > */ > Buffer > RelationGetBufferForTuple(Relation relation, Size len, > ! Buffer otherBuffer, bool > use_fsm) > { > Buffer buffer = InvalidBuffer; > Page pageHeader; > --- 105,112 ---- > */ > Buffer > RelationGetBufferForTuple(Relation relation, Size len, > ! Buffer otherBuffer, bool > use_fsm, > ! BlockNumber suggestedBlock) > { > Buffer buffer = InvalidBuffer; > Page pageHeader; > *************** > *** 135,142 **** > otherBlock = InvalidBlockNumber; /* just to keep > compiler quiet */ > > /* > ! * We first try to put the tuple on the same page we last inserted a > tuple > ! * on, as cached in the relcache entry. If that doesn't work, we ask > the > * shared Free Space Map to locate a suitable page. Since the FSM's > info > * might be out of date, we have to be prepared to loop around and retry > * multiple times. (To insure this isn't an infinite loop, we must > update > --- 138,147 ---- > otherBlock = InvalidBlockNumber; /* just to keep > compiler quiet */ > > /* > ! 
* We first try to put the tuple on the page suggested by the caller, if > ! * any. Then we try to put the tuple on the same page we last inserted a > ! * tuple on, as cached in the relcache entry. If that doesn't work, we > ! * ask the > * shared Free Space Map to locate a suitable page. Since the FSM's > info > * might be out of date, we have to be prepared to loop around and retry > * multiple times. (To insure this isn't an infinite loop, we must > update > *************** > *** 144,152 **** > * not to be suitable.) If the FSM has no record of a page with enough > * free space, we give up and extend the relation. > * > ! * When use_fsm is false, we either put the tuple onto the existing > target > ! * page or extend the relation. > */ > if (len + saveFreeSpace <= MaxTupleSize) > targetBlock = relation->rd_targblock; > else > --- 149,167 ---- > * not to be suitable.) If the FSM has no record of a page with enough > * free space, we give up and extend the relation. > * > ! * When use_fsm is false, we skip the fsm lookup if neither the > suggested > ! * nor the cached last insertion page has enough room, and extend the > ! * relation. > ! * > ! * The fillfactor is taken into account when calculating the free space > ! * on the cached target block, and when using the FSM. The suggested > page > ! * is used whenever there's enough room in it, regardless of the > fillfactor, > ! * because that's exactly the purpose the space is reserved for in the > ! * first place. 
> */ > + if (suggestedBlock != InvalidBlockNumber) > + targetBlock = suggestedBlock; > + else > if (len + saveFreeSpace <= MaxTupleSize) > targetBlock = relation->rd_targblock; > else > *************** > *** 219,224 **** > --- 234,244 ---- > */ > pageHeader = (Page) BufferGetPage(buffer); > pageFreeSpace = PageGetFreeSpace(pageHeader); > + > + /* If we're trying the suggested block, don't care about > fillfactor */ > + if (targetBlock == suggestedBlock && len <= pageFreeSpace) > + return buffer; > + > if (len + saveFreeSpace <= pageFreeSpace) > { > /* use this page as future insert target, too */ > *************** > *** 241,246 **** > --- 261,275 ---- > ReleaseBuffer(buffer); > } > > + /* If we just tried the suggested block, try the cached target > + * block next, before consulting the FSM. */ > + if(suggestedBlock == targetBlock) > + { > + targetBlock = relation->rd_targblock; > + suggestedBlock = InvalidBlockNumber; > + continue; > + } > + > /* Without FSM, always fall out of the loop and extend */ > if (!use_fsm) > break; > Index: src/backend/access/index/genam.c > =================================================================== > RCS file: > /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/index/genam.c,v > retrieving revision 1.58 > diff -c -r1.58 genam.c > *** src/backend/access/index/genam.c 31 Jul 2006 20:08:59 -0000 1.58 > --- src/backend/access/index/genam.c 8 Aug 2006 16:17:21 -0000 > *************** > *** 259,261 **** > --- 259,275 ---- > > pfree(sysscan); > } > + > + /* > + * This is a dummy implementation of amsuggestblock, to be used for index > + * access methods that don't or can't support it. It just returns > + * InvalidBlockNumber, which means "no preference". > + * > + * This is probably not a good best place for this function, but it doesn't > + * fit naturally anywhere else either. 
> + */ > + Datum > + dummysuggestblock(PG_FUNCTION_ARGS) > + { > + PG_RETURN_UINT32(InvalidBlockNumber); > + } > Index: src/backend/access/index/indexam.c > =================================================================== > RCS file: > /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/index/indexam.c,v > retrieving revision 1.94 > diff -c -r1.94 indexam.c > *** src/backend/access/index/indexam.c 31 Jul 2006 20:08:59 -0000 > 1.94 > --- src/backend/access/index/indexam.c 8 Aug 2006 16:17:21 -0000 > *************** > *** 18,23 **** > --- 18,24 ---- > * index_rescan - restart a scan of an index > * index_endscan - end a scan > * index_insert - insert an index tuple into a relation > + * index_suggestblock - get desired insert location for a > heap tuple > * index_markpos - mark a scan position > * index_restrpos - restore a scan position > * index_getnext - get the next tuple from a scan > *************** > *** 202,207 **** > --- 203,237 ---- > > BoolGetDatum(check_uniqueness))); > } > > + /* ---------------- > + * index_suggestblock - get desired insert location for a heap > tuple > + * > + * The returned BlockNumber is the *heap* page that is the best place > + * to insert the given tuple to, according to the index am. The best > + * place is usually one that maintains the cluster order. > + * ---------------- > + */ > + BlockNumber > + index_suggestblock(Relation indexRelation, > + Datum *values, > + bool *isnull, > + Relation heapRelation) > + { > + FmgrInfo *procedure; > + > + RELATION_CHECKS; > + GET_REL_PROCEDURE(amsuggestblock); > + > + /* > + * have the am's suggestblock proc do all the work. 
> + */ > + return DatumGetUInt32(FunctionCall4(procedure, > + > PointerGetDatum(indexRelation), > + > PointerGetDatum(values), > + > PointerGetDatum(isnull), > + > PointerGetDatum(heapRelation))); > + } > + > /* > * index_beginscan - start a scan of an index with amgettuple > * > Index: src/backend/access/nbtree/nbtinsert.c > =================================================================== > RCS file: > /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/nbtree/nbtinsert.c,v > retrieving revision 1.142 > diff -c -r1.142 nbtinsert.c > *** src/backend/access/nbtree/nbtinsert.c 25 Jul 2006 19:13:00 -0000 > 1.142 > --- src/backend/access/nbtree/nbtinsert.c 9 Aug 2006 17:51:33 -0000 > *************** > *** 146,151 **** > --- 146,221 ---- > } > > /* > + * _bt_suggestblock() -- Find the heap block of the closest index tuple. > + * > + * The logic to find the target should match _bt_doinsert, otherwise > + * we'll be making bad suggestions. > + */ > + BlockNumber > + _bt_suggestblock(Relation rel, IndexTuple itup, Relation heapRel) > + { > + int natts = rel->rd_rel->relnatts; > + OffsetNumber offset; > + Page page; > + BTPageOpaque opaque; > + > + ScanKey itup_scankey; > + BTStack stack; > + Buffer buf; > + IndexTuple curitup; > + BlockNumber suggestion = InvalidBlockNumber; > + > + /* we need an insertion scan key to do our search, so build one */ > + itup_scankey = _bt_mkscankey(rel, itup); > + > + /* find the first page containing this key */ > + stack = _bt_search(rel, natts, itup_scankey, false, &buf, BT_READ); > + if(!BufferIsValid(buf)) > + { > + /* The index was completely empty. No suggestion then. */ > + return InvalidBlockNumber; > + } > + /* we don't need the stack, so free it right away */ > + _bt_freestack(stack); > + > + page = BufferGetPage(buf); > + opaque = (BTPageOpaque) PageGetSpecialPointer(page); > + > + /* Find the location in the page where the new index tuple would go to. 
> */ > + > + offset = _bt_binsrch(rel, buf, natts, itup_scankey, false); > + if (offset > PageGetMaxOffsetNumber(page)) > + { > + /* _bt_binsrch returned pointer to end-of-page. It means that > + * there was no equal items on the page, and the new item > should > + * be inserted as the last tuple of the page. There could be > equal > + * items on the next page, however. > + * > + * At the moment, we just ignore the potential equal items on > the > + * right, and pretend there isn't any. We could instead walk > right > + * to the next page to check that, but let's keep it simple for > now. > + */ > + offset = OffsetNumberPrev(offset); > + } > + if(offset < P_FIRSTDATAKEY(opaque)) > + { > + /* We landed on an empty page. We could step left or right until > + * we find some items, but let's keep it simple for now. > + */ > + } else { > + /* We're now positioned at the index tuple that we're > interested in. */ > + > + curitup = (IndexTuple) PageGetItem(page, PageGetItemId(page, > offset)); > + suggestion = ItemPointerGetBlockNumber(&curitup->t_tid); > + } > + > + _bt_relbuf(rel, buf); > + _bt_freeskey(itup_scankey); > + > + return suggestion; > + } > + > + /* > * _bt_check_unique() -- Check for violation of unique index constraint > * > * Returns InvalidTransactionId if there is no conflict, else an xact ID > Index: src/backend/access/nbtree/nbtree.c > =================================================================== > RCS file: > /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/nbtree/nbtree.c,v > retrieving revision 1.149 > diff -c -r1.149 nbtree.c > *** src/backend/access/nbtree/nbtree.c 10 May 2006 23:18:39 -0000 > 1.149 > --- src/backend/access/nbtree/nbtree.c 9 Aug 2006 18:04:02 -0000 > *************** > *** 228,233 **** > --- 228,265 ---- > } > > /* > + * btsuggestblock() -- find the best place in the heap to put a new tuple. 
> + * > + * This uses the same logic as btinsert to find the place where > the index > + * tuple would go if this was a btinsert call. > + * > + * There's room for improvement here. An insert operation will > descend > + * the tree twice, first by btsuggestblock, then by btinsert. > Things > + * might have changed in between, so that the heap tuple is > actually > + * not inserted in the optimal page, but since this is just an > + * optimization, it's ok if it happens sometimes. > + */ > + Datum > + btsuggestblock(PG_FUNCTION_ARGS) > + { > + Relation rel = (Relation) PG_GETARG_POINTER(0); > + Datum *values = (Datum *) PG_GETARG_POINTER(1); > + bool *isnull = (bool *) PG_GETARG_POINTER(2); > + Relation heapRel = (Relation) PG_GETARG_POINTER(3); > + IndexTuple itup; > + BlockNumber suggestion; > + > + /* generate an index tuple */ > + itup = index_form_tuple(RelationGetDescr(rel), values, isnull); > + > + suggestion =_bt_suggestblock(rel, itup, heapRel); > + > + pfree(itup); > + > + PG_RETURN_UINT32(suggestion); > + } > + > + /* > * btgettuple() -- Get the next tuple in the scan. > */ > Datum > Index: src/backend/executor/execMain.c > =================================================================== > RCS file: > /home/hlinnaka/pgcvsrepository/pgsql/src/backend/executor/execMain.c,v > retrieving revision 1.277 > diff -c -r1.277 execMain.c > *** src/backend/executor/execMain.c 31 Jul 2006 01:16:37 -0000 1.277 > --- src/backend/executor/execMain.c 8 Aug 2006 16:17:21 -0000 > *************** > *** 892,897 **** > --- 892,898 ---- > resultRelInfo->ri_RangeTableIndex = resultRelationIndex; > resultRelInfo->ri_RelationDesc = resultRelationDesc; > resultRelInfo->ri_NumIndices = 0; > + resultRelInfo->ri_ClusterIndex = -1; > resultRelInfo->ri_IndexRelationDescs = NULL; > resultRelInfo->ri_IndexRelationInfo = NULL; > /* make a copy so as not to depend on relcache info not changing... 
*/ > *************** > *** 1388,1394 **** > heap_insert(estate->es_into_relation_descriptor, tuple, > estate->es_snapshot->curcid, > estate->es_into_relation_use_wal, > ! false); /* never any point in > using FSM */ > /* we know there are no indexes to update */ > heap_freetuple(tuple); > IncrAppended(); > --- 1389,1396 ---- > heap_insert(estate->es_into_relation_descriptor, tuple, > estate->es_snapshot->curcid, > estate->es_into_relation_use_wal, > ! false, /* never any point in using FSM > */ > ! InvalidBlockNumber); > /* we know there are no indexes to update */ > heap_freetuple(tuple); > IncrAppended(); > *************** > *** 1419,1424 **** > --- 1421,1427 ---- > ResultRelInfo *resultRelInfo; > Relation resultRelationDesc; > Oid newId; > + BlockNumber suggestedBlock; > > /* > * get the heap tuple out of the tuple table slot, making sure we have a > *************** > *** 1467,1472 **** > --- 1470,1479 ---- > if (resultRelationDesc->rd_att->constr) > ExecConstraints(resultRelInfo, slot, estate); > > + /* Ask the index am of the clustered index for the > + * best place to put it */ > + suggestedBlock = ExecSuggestBlock(slot, estate); > + > /* > * insert the tuple > * > *************** > *** 1475,1481 **** > */ > newId = heap_insert(resultRelationDesc, tuple, > estate->es_snapshot->curcid, > ! true, true); > > IncrAppended(); > (estate->es_processed)++; > --- 1482,1488 ---- > */ > newId = heap_insert(resultRelationDesc, tuple, > estate->es_snapshot->curcid, > ! 
true, true, suggestedBlock); > > IncrAppended(); > (estate->es_processed)++; > Index: src/backend/executor/execUtils.c > =================================================================== > RCS file: > /home/hlinnaka/pgcvsrepository/pgsql/src/backend/executor/execUtils.c,v > retrieving revision 1.139 > diff -c -r1.139 execUtils.c > *** src/backend/executor/execUtils.c 4 Aug 2006 21:33:36 -0000 1.139 > --- src/backend/executor/execUtils.c 9 Aug 2006 18:05:05 -0000 > *************** > *** 31,36 **** > --- 31,37 ---- > * ExecOpenIndices \ > * ExecCloseIndices | referenced by InitPlan, > EndPlan, > * ExecInsertIndexTuples / ExecInsert, ExecUpdate > + * ExecSuggestBlock Referenced by ExecInsert > * > * RegisterExprContextCallback Register function shutdown > callback > * UnregisterExprContextCallback Deregister function shutdown > callback > *************** > *** 874,879 **** > --- 875,881 ---- > IndexInfo **indexInfoArray; > > resultRelInfo->ri_NumIndices = 0; > + resultRelInfo->ri_ClusterIndex = -1; > > /* fast path if no indexes */ > if (!RelationGetForm(resultRelation)->relhasindex) > *************** > *** 913,918 **** > --- 915,925 ---- > /* extract index key information from the index's pg_index info > */ > ii = BuildIndexInfo(indexDesc); > > + /* Remember which index is the clustered one. > + * It's used to call the suggestblock-method on inserts */ > + if(indexDesc->rd_index->indisclustered) > + resultRelInfo->ri_ClusterIndex = i; > + > relationDescs[i] = indexDesc; > indexInfoArray[i] = ii; > i++; > *************** > *** 1062,1067 **** > --- 1069,1137 ---- > } > } > > + /* ---------------------------------------------------------------- > + * ExecSuggestBlock > + * > + * This routine asks the index am where a new heap tuple > + * should be placed. 
> + * ---------------------------------------------------------------- > + */ > + BlockNumber > + ExecSuggestBlock(TupleTableSlot *slot, > + EState *estate) > + { > + ResultRelInfo *resultRelInfo; > + int i; > + Relation relationDesc; > + Relation heapRelation; > + ExprContext *econtext; > + Datum values[INDEX_MAX_KEYS]; > + bool isnull[INDEX_MAX_KEYS]; > + IndexInfo *indexInfo; > + > + /* > + * Get information from the result relation info structure. > + */ > + resultRelInfo = estate->es_result_relation_info; > + i = resultRelInfo->ri_ClusterIndex; > + if(i == -1) > + return InvalidBlockNumber; /* there was no clustered index */ > + > + heapRelation = resultRelInfo->ri_RelationDesc; > + relationDesc = resultRelInfo->ri_IndexRelationDescs[i]; > + indexInfo = resultRelInfo->ri_IndexRelationInfo[i]; > + > + /* You can't cluster on a partial index */ > + Assert(indexInfo->ii_Predicate == NIL); > + > + /* > + * We will use the EState's per-tuple context for evaluating > + * index expressions (creating it if it's not already there). > + */ > + econtext = GetPerTupleExprContext(estate); > + > + /* Arrange for econtext's scan tuple to be the tuple under test */ > + econtext->ecxt_scantuple = slot; > + > + /* > + * FormIndexDatum fills in its values and isnull parameters with the > + * appropriate values for the column(s) of the index. > + */ > + FormIndexDatum(indexInfo, > + slot, > + estate, > + values, > + isnull); > + > + /* > + * The index AM does the rest. 
> + */ > + return index_suggestblock(relationDesc, /* index relation */ > + values, /* array of index Datums */ > + isnull, /* null flags */ > + heapRelation); > + } > + > /* > * UpdateChangedParamSet > * Add changed parameters to a plan node's chgParam set > Index: src/include/access/genam.h > =================================================================== > RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/include/access/genam.h,v > retrieving revision 1.65 > diff -c -r1.65 genam.h > *** src/include/access/genam.h 31 Jul 2006 20:09:05 -0000 1.65 > --- src/include/access/genam.h 9 Aug 2006 17:53:44 -0000 > *************** > *** 93,98 **** > --- 93,101 ---- > ItemPointer heap_t_ctid, > Relation heapRelation, > bool check_uniqueness); > + extern BlockNumber index_suggestblock(Relation indexRelation, > + Datum *values, bool *isnull, > + Relation heapRelation); > > extern IndexScanDesc index_beginscan(Relation heapRelation, > Relation indexRelation, > *************** > *** 123,128 **** > --- 126,133 ---- > extern FmgrInfo *index_getprocinfo(Relation irel, AttrNumber attnum, > uint16 procnum); > > + extern Datum dummysuggestblock(PG_FUNCTION_ARGS); > + > /* > * index access method support routines (in genam.c) > */ > Index: src/include/access/heapam.h > =================================================================== > RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/include/access/heapam.h,v > retrieving revision 1.114 > diff -c -r1.114 heapam.h > *** src/include/access/heapam.h 3 Jul 2006 22:45:39 -0000 1.114 > --- src/include/access/heapam.h 8 Aug 2006 16:17:21 -0000 > *************** > *** 156,162 **** > extern void setLastTid(const ItemPointer tid); > > extern Oid heap_insert(Relation relation, HeapTuple tup, CommandId cid, > ! 
bool use_wal, bool use_fsm); > extern HTSU_Result heap_delete(Relation relation, ItemPointer tid, > ItemPointer ctid, TransactionId *update_xmax, > CommandId cid, Snapshot crosscheck, bool wait); > --- 156,162 ---- > extern void setLastTid(const ItemPointer tid); > > extern Oid heap_insert(Relation relation, HeapTuple tup, CommandId cid, > ! bool use_wal, bool use_fsm, BlockNumber suggestedblk); > extern HTSU_Result heap_delete(Relation relation, ItemPointer tid, > ItemPointer ctid, TransactionId *update_xmax, > CommandId cid, Snapshot crosscheck, bool wait); > Index: src/include/access/hio.h > =================================================================== > RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/include/access/hio.h,v > retrieving revision 1.32 > diff -c -r1.32 hio.h > *** src/include/access/hio.h 13 Jul 2006 17:47:01 -0000 1.32 > --- src/include/access/hio.h 8 Aug 2006 16:17:21 -0000 > *************** > *** 21,26 **** > extern void RelationPutHeapTuple(Relation relation, Buffer buffer, > HeapTuple tuple); > extern Buffer RelationGetBufferForTuple(Relation relation, Size len, > ! Buffer otherBuffer, bool use_fsm); > > #endif /* HIO_H */ > --- 21,26 ---- > extern void RelationPutHeapTuple(Relation relation, Buffer buffer, > HeapTuple tuple); > extern Buffer RelationGetBufferForTuple(Relation relation, Size len, > ! 
Buffer otherBuffer, bool use_fsm,
>                          BlockNumber suggestedblk);
>
>   #endif /* HIO_H */
> Index: src/include/access/nbtree.h
> ===================================================================
> RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/include/access/nbtree.h,v
> retrieving revision 1.103
> diff -c -r1.103 nbtree.h
> *** src/include/access/nbtree.h 7 Aug 2006 16:57:57 -0000 1.103
> --- src/include/access/nbtree.h 8 Aug 2006 16:17:21 -0000
> ***************
> *** 467,472 ****
> --- 467,473 ----
>   extern Datum btbulkdelete(PG_FUNCTION_ARGS);
>   extern Datum btvacuumcleanup(PG_FUNCTION_ARGS);
>   extern Datum btoptions(PG_FUNCTION_ARGS);
> + extern Datum btsuggestblock(PG_FUNCTION_ARGS);
>
>   /*
>    * prototypes for functions in nbtinsert.c
> ***************
> *** 476,481 ****
> --- 477,484 ----
>   extern Buffer _bt_getstackbuf(Relation rel, BTStack stack, int access);
>   extern void _bt_insert_parent(Relation rel, Buffer buf, Buffer rbuf,
>                     BTStack stack, bool is_root, bool is_only);
> + extern BlockNumber _bt_suggestblock(Relation rel, IndexTuple itup,
> +                   Relation heapRel);
>
>   /*
>    * prototypes for functions in nbtpage.c
> Index: src/include/catalog/pg_am.h
> ===================================================================
> RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/include/catalog/pg_am.h,v
> retrieving revision 1.46
> diff -c -r1.46 pg_am.h
> *** src/include/catalog/pg_am.h 31 Jul 2006 20:09:05 -0000 1.46
> --- src/include/catalog/pg_am.h 8 Aug 2006 16:17:21 -0000
> ***************
> *** 65,70 ****
> --- 65,71 ----
>       regproc amvacuumcleanup; /* post-VACUUM cleanup function */
>       regproc amcostestimate; /* estimate cost of an indexscan */
>       regproc amoptions; /* parse AM-specific parameters */
> +     regproc amsuggestblock; /* suggest a block where to put heap tuple */
>   } FormData_pg_am;
>
>   /* ----------------
> ***************
> *** 78,84 ****
>    * compiler constants for pg_am
>    * ----------------
>    */
> ! #define Natts_pg_am 23
>   #define Anum_pg_am_amname 1
>   #define Anum_pg_am_amstrategies 2
>   #define Anum_pg_am_amsupport 3
> --- 79,85 ----
>    * compiler constants for pg_am
>    * ----------------
>    */
> ! #define Natts_pg_am 24
>   #define Anum_pg_am_amname 1
>   #define Anum_pg_am_amstrategies 2
>   #define Anum_pg_am_amsupport 3
> ***************
> *** 102,123 ****
>   #define Anum_pg_am_amvacuumcleanup 21
>   #define Anum_pg_am_amcostestimate 22
>   #define Anum_pg_am_amoptions 23
>
>   /* ----------------
>    * initial contents of pg_am
>    * ----------------
>    */
>
> ! DATA(insert OID = 403 ( btree 5 1 1 t t t t f t btinsert btbeginscan btgettuple btgetmulti btrescan btendscan btmarkpos btrestrpos btbuild btbulkdelete btvacuumcleanup btcostestimate btoptions ));
>   DESCR("b-tree index access method");
>   #define BTREE_AM_OID 403
> ! DATA(insert OID = 405 ( hash 1 1 0 f f f f f f hashinsert hashbeginscan hashgettuple hashgetmulti hashrescan hashendscan hashmarkpos hashrestrpos hashbuild hashbulkdelete hashvacuumcleanup hashcostestimate hashoptions ));
>   DESCR("hash index access method");
>   #define HASH_AM_OID 405
> ! DATA(insert OID = 783 ( gist 100 7 0 f t t t t t gistinsert gistbeginscan gistgettuple gistgetmulti gistrescan gistendscan gistmarkpos gistrestrpos gistbuild gistbulkdelete gistvacuumcleanup gistcostestimate gistoptions ));
>   DESCR("GiST index access method");
>   #define GIST_AM_OID 783
> ! DATA(insert OID = 2742 ( gin 100 4 0 f f f f t f gininsert ginbeginscan gingettuple gingetmulti ginrescan ginendscan ginmarkpos ginrestrpos ginbuild ginbulkdelete ginvacuumcleanup gincostestimate ginoptions ));
>   DESCR("GIN index access method");
>   #define GIN_AM_OID 2742
>
> --- 103,125 ----
>   #define Anum_pg_am_amvacuumcleanup 21
>   #define Anum_pg_am_amcostestimate 22
>   #define Anum_pg_am_amoptions 23
> + #define Anum_pg_am_amsuggestblock 24
>
>   /* ----------------
>    * initial contents of pg_am
>    * ----------------
>    */
>
> ! DATA(insert OID = 403 ( btree 5 1 1 t t t t f t btinsert btbeginscan btgettuple btgetmulti btrescan btendscan btmarkpos btrestrpos btbuild btbulkdelete btvacuumcleanup btcostestimate btoptions btsuggestblock));
>   DESCR("b-tree index access method");
>   #define BTREE_AM_OID 403
> ! DATA(insert OID = 405 ( hash 1 1 0 f f f f f f hashinsert hashbeginscan hashgettuple hashgetmulti hashrescan hashendscan hashmarkpos hashrestrpos hashbuild hashbulkdelete hashvacuumcleanup hashcostestimate hashoptions dummysuggestblock));
>   DESCR("hash index access method");
>   #define HASH_AM_OID 405
> ! DATA(insert OID = 783 ( gist 100 7 0 f t t t t t gistinsert gistbeginscan gistgettuple gistgetmulti gistrescan gistendscan gistmarkpos gistrestrpos gistbuild gistbulkdelete gistvacuumcleanup gistcostestimate gistoptions dummysuggestblock));
>   DESCR("GiST index access method");
>   #define GIST_AM_OID 783
> ! DATA(insert OID = 2742 ( gin 100 4 0 f f f f t f gininsert ginbeginscan gingettuple gingetmulti ginrescan ginendscan ginmarkpos ginrestrpos ginbuild ginbulkdelete ginvacuumcleanup gincostestimate ginoptions dummysuggestblock ));
>   DESCR("GIN index access method");
>   #define GIN_AM_OID 2742
>
> Index: src/include/catalog/pg_proc.h
> ===================================================================
> RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/include/catalog/pg_proc.h,v
> retrieving revision 1.420
> diff -c -r1.420 pg_proc.h
> *** src/include/catalog/pg_proc.h 6 Aug 2006 03:53:44 -0000 1.420
> --- src/include/catalog/pg_proc.h 9 Aug 2006 18:06:44 -0000
> ***************
> *** 682,687 ****
> --- 682,689 ----
>   DESCR("btree(internal)");
>   DATA(insert OID = 2785 ( btoptions PGNSP PGUID 12 f f t f s 2 17 "1009 16" _null_ _null_ _null_ btoptions - _null_ ));
>   DESCR("btree(internal)");
> + DATA(insert OID = 2852 ( btsuggestblock PGNSP PGUID 12 f f t f v 4 23 "2281 2281 2281 2281" _null_ _null_ _null_ btsuggestblock - _null_ ));
> + DESCR("btree(internal)");
>
>   DATA(insert OID = 339 ( poly_same PGNSP PGUID 12 f f t f i 2 16 "604 604" _null_ _null_ _null_ poly_same - _null_ ));
>   DESCR("same as?");
> ***************
> *** 3936,3941 ****
> --- 3938,3946 ----
>   DATA(insert OID = 2749 ( arraycontained PGNSP PGUID 12 f f t f i 2 16 "2277 2277" _null_ _null_ _null_ arraycontained - _null_ ));
>   DESCR("anyarray contained");
>
> + DATA(insert OID = 2853 ( dummysuggestblock PGNSP PGUID 12 f f t f v 4 23 "2281 2281 2281 2281" _null_ _null_ _null_ dummysuggestblock - _null_ ));
> + DESCR("dummy amsuggestblock implementation (internal)");
> +
>   /*
>    * Symbolic values for provolatile column: these indicate whether the result
>    * of a function is dependent *only* on the values of its explicit arguments,
> Index: src/include/executor/executor.h
> ===================================================================
> RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/include/executor/executor.h,v
> retrieving revision 1.128
> diff -c -r1.128 executor.h
> *** src/include/executor/executor.h 4 Aug 2006 21:33:36 -0000 1.128
> --- src/include/executor/executor.h 8 Aug 2006 16:17:21 -0000
> ***************
> *** 271,276 ****
> --- 271,277 ----
>   extern void ExecCloseIndices(ResultRelInfo *resultRelInfo);
>   extern void ExecInsertIndexTuples(TupleTableSlot *slot, ItemPointer tupleid,
>                         EState *estate, bool is_vacuum);
> + extern BlockNumber ExecSuggestBlock(TupleTableSlot *slot, EState *estate);
>
>   extern void RegisterExprContextCallback(ExprContext *econtext,
>                         ExprContextCallbackFunction function,
> Index: src/include/nodes/execnodes.h
> ===================================================================
> RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/include/nodes/execnodes.h,v
> retrieving revision 1.158
> diff -c -r1.158 execnodes.h
> *** src/include/nodes/execnodes.h 4 Aug 2006 21:33:36 -0000 1.158
> --- src/include/nodes/execnodes.h 8 Aug 2006 16:17:21 -0000
> ***************
> *** 257,262 ****
> --- 257,264 ----
>    * NumIndices           # of indices existing on result relation
>    * IndexRelationDescs   array of relation descriptors for indices
>    * IndexRelationInfo    array of key/attr info for indices
> +  * ClusterIndex         index to the IndexRelationInfo array of the
> +  *                      clustered index, or -1 if there's none
>    * TrigDesc             triggers to be fired, if any
>    * TrigFunctions        cached lookup info for trigger functions
>    * TrigInstrument       optional runtime measurements for triggers
> ***************
> *** 272,277 ****
> --- 274,280 ----
>       int ri_NumIndices;
>       RelationPtr ri_IndexRelationDescs;
>       IndexInfo **ri_IndexRelationInfo;
> +     int ri_ClusterIndex;
>       TriggerDesc *ri_TrigDesc;
>       FmgrInfo *ri_TrigFunctions;
>       struct Instrumentation *ri_TrigInstrument;
> Index: src/include/utils/rel.h
> ===================================================================
> RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/include/utils/rel.h,v
> retrieving revision 1.91
> diff -c -r1.91 rel.h
> *** src/include/utils/rel.h 3 Jul 2006 22:45:41 -0000 1.91
> --- src/include/utils/rel.h 8 Aug 2006 16:17:21 -0000
> ***************
> *** 116,121 ****
> --- 116,122 ----
>       FmgrInfo amvacuumcleanup;
>       FmgrInfo amcostestimate;
>       FmgrInfo amoptions;
> +     FmgrInfo amsuggestblock;
>   } RelationAmInfo;
>
>

--
Bruce Momjian <[EMAIL PROTECTED]> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +