I'm taking a look at doing the refactoring Tom Lane and Simon Riggs discussed here:
http://archives.postgresql.org/pgsql-patches/2008-02/msg00155.php

In terms of the buffer manager, I think we can simply introduce a new strategy type BAS_BULKWRITE and make it behave identically to BAS_VACUUM. Anyone see a reason to do anything else?

The trickier part is to handle the communication between CopyFrom (or the CTAS machinery), heap_insert, and RelationGetBufferForTuple. There are basically three things we need to keep track of here:

(1) a BufferAccessStrategy (that is, the ring of buffers we're using for this bulk insert)
(2) the last-pinned page (to implement Simon Riggs's proposed optimization of keeping the most-recently-written page pinned)
(3) use_wal and use_fsm (to implement Tom Lane's suggestion of reducing the number of options to heap_insert by rolling everything into an options object)

Tom's email seemed to suggest that we might want to roll everything into the BufferAccessStrategy itself, but that seems to require quite a few things to know about the internals of BufferAccessStrategy that currently don't, so I think that's a bad idea.

I am kind of inclined to define flags like this:

    #define HEAP_INSERT_SKIP_WAL 0x0001
    #define HEAP_INSERT_SKIP_FSM 0x0002
    #define HEAP_INSERT_BULK     0x0004 /* do we even need this one? */

And then:

    Oid heap_insert(Relation relation, HeapTuple tup, CommandId cid,
                    unsigned options, BulkInsertState *bistate);
    BulkInsertState *GetBulkInsertState(void);
    void FreeBulkInsertState(BulkInsertState *);

I'm always wary of reversing the sense of a boolean, but I think it makes sense here; it doesn't really matter whether you call heap_insert(relation, tup, cid, true, true) or heap_insert(relation, tup, cid, false, false), but

    heap_insert(relation, tup, cid, HEAP_INSERT_USE_WAL|HEAP_INSERT_USE_FSM, NULL)

is a lot uglier than

    heap_insert(relation, tup, cid, 0, NULL)

and there aren't that many places that need to be checked for correctness in making the change.
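
For illustration, here is a rough standalone sketch of how the flags and the bulk-insert state above might fit together. It is not PostgreSQL code: the Buffer typedef, the fields of BulkInsertState, and the InsertPlan/plan_insert helper are hypothetical stand-ins. It also assumes that passing a non-NULL bistate is enough to signal a bulk insert, which would be one way to drop HEAP_INSERT_BULK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Flag names from the proposal; an options word of 0 means "defaults". */
#define HEAP_INSERT_SKIP_WAL 0x0001
#define HEAP_INSERT_SKIP_FSM 0x0002

/* Hypothetical stand-ins for the real buffer manager types. */
typedef int Buffer;
#define InvalidBuffer 0

/* Assumed contents: items (1) and (2) from the list above. */
typedef struct BulkInsertState
{
    void   *strategy;   /* stand-in for the BufferAccessStrategy ring */
    Buffer  last_pin;   /* most recently written page, kept pinned */
} BulkInsertState;

BulkInsertState *
GetBulkInsertState(void)
{
    BulkInsertState *bistate = malloc(sizeof(BulkInsertState));

    if (bistate == NULL)
        abort();
    bistate->strategy = NULL;           /* a real version would allocate a ring */
    bistate->last_pin = InvalidBuffer;  /* nothing pinned yet */
    return bistate;
}

void
FreeBulkInsertState(BulkInsertState *bistate)
{
    assert(bistate != NULL);
    /* a real version would also release last_pin and the strategy here */
    free(bistate);
}

/* Hypothetical helper: what an insert would decide from its arguments. */
typedef struct InsertPlan
{
    bool use_wal;
    bool use_fsm;
    bool bulk;
} InsertPlan;

InsertPlan
plan_insert(unsigned options, const BulkInsertState *bistate)
{
    InsertPlan plan;

    /* Reversed sense: the flags say what to skip, so 0 keeps everything on. */
    plan.use_wal = (options & HEAP_INSERT_SKIP_WAL) == 0;
    plan.use_fsm = (options & HEAP_INSERT_SKIP_FSM) == 0;
    /* Assumption: a non-NULL state by itself marks a bulk insert. */
    plan.bulk = (bistate != NULL);
    return plan;
}
```

With this shape, ordinary callers pass 0 and NULL and get the default WAL and FSM behavior, while a bulk loader would create one state with GetBulkInsertState(), pass it on every call, and release it with FreeBulkInsertState() at the end.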

Admittedly, we could make the calling sequence for heap_insert shorter by putting the options (and maybe even the CommandId) into BulkInsertState and calling it HeapInsertOptions, but that forces several callers of heap_insert who don't care at all about bulk inserts to uselessly create and destroy a HeapInsertOptions object just to pass a couple of boolean flags (and maybe the CommandId), which seems like a loser.

Comments?

...Robert