I wrote:
> The problem here is that RelationSetNewRelfilenode is aggressively
> changing the index's relcache entry before it's written out the
> updated tuple, so that the tuple update tries to make an index
> entry in the new storage which isn't filled yet.  I think we can
> fix it by *not* doing that, but leaving it to the relcache inval
> during the CommandCounterIncrement call to update the relcache
> entry.  However, it looks like that will take some API refactoring,
> because the storage-creation functions expect to get the new
> relfilenode out of the relcache entry, and they'll have to be
> changed to not do it that way.

So looking at that, it seems like the table_relation_set_new_filenode
API is pretty darn ill-designed.  It assumes that it's passed an
already-entirely-valid relcache entry, but it also supposes that it can
pass back information that needs to go into the relation's pg_class
entry.  One or the other side of that has to give, unless you want to
doom everything to updating pg_class twice.

I'm not really sure what's the point of giving the tableam control
of relfrozenxid+relminmxid at all, and I notice that index_create
for one is just Asserting that constant values are returned.

I think we need to do one or possibly both of these things:

* split table_relation_set_new_filenode into two functions,
one that doesn't take a relcache entry at all and returns
appropriate relfrozenxid+relminmxid for a new rel, and then
one that just creates storage without dealing with the xid
values;

* change table_relation_set_new_filenode so that it is told
the relfilenode etc to use without assuming that it has a
valid relcache entry to work with.

Thoughts?

			regards, tom lane
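
PS: to make the two options a bit more concrete, here is a rough,
self-contained sketch of the shape I have in mind.  Everything named
Sketch*/sketch_* is hypothetical and uses stand-in types rather than the
real PostgreSQL definitions, so this is not a proposed patch.  The point
is only the ordering: the xid values come from a function that needs no
relcache entry, storage is created from an explicitly passed relfilenode,
pg_class is written once, and the relcache entry changes only at the end
(standing in for the inval at CommandCounterIncrement).

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in types; hypothetical, not PostgreSQL's real definitions. */
typedef uint32_t SketchXid;

typedef struct SketchRelFileNode
{
	uint32_t	spcNode;
	uint32_t	dbNode;
	uint32_t	relNode;
} SketchRelFileNode;

/* A relcache entry and a pg_class row, reduced to what matters here. */
typedef struct SketchRelcacheEntry
{
	SketchRelFileNode rnode;
} SketchRelcacheEntry;

typedef struct SketchPgClassRow
{
	uint32_t	relfilenode;
	SketchXid	relfrozenxid;
	SketchXid	relminmxid;
	int			update_count;	/* how many times the row was written */
} SketchPgClassRow;

/* First half of the split: no relcache entry, only initial xid values. */
static void
sketch_new_rel_xids(SketchXid cur_xid, SketchXid cur_multi,
					SketchXid *freezeXid, SketchXid *minmulti)
{
	*freezeXid = cur_xid;
	*minmulti = cur_multi;
}

/* Second half: create storage for an explicitly passed relfilenode. */
static bool
sketch_create_storage(const SketchRelFileNode *newrnode, char persistence)
{
	/* Real code would create the file; here we only validate the inputs. */
	return newrnode->relNode != 0 && persistence != '\0';
}

/*
 * Caller: pg_class is written once, and the relcache entry is left alone
 * until the simulated invalidation at the end.
 */
static bool
sketch_set_new_filenode(SketchRelcacheEntry *rel, SketchPgClassRow *row,
						SketchRelFileNode newrnode, char persistence,
						SketchXid cur_xid, SketchXid cur_multi)
{
	SketchXid	freezeXid;
	SketchXid	minmulti;

	sketch_new_rel_xids(cur_xid, cur_multi, &freezeXid, &minmulti);
	if (!sketch_create_storage(&newrnode, persistence))
		return false;

	/* Single pg_class update; rel->rnode still names the old storage here. */
	row->relfilenode = newrnode.relNode;
	row->relfrozenxid = freezeXid;
	row->relminmxid = minmulti;
	row->update_count++;

	/* Stand-in for the relcache inval processed at CommandCounterIncrement. */
	rel->rnode = newrnode;
	return true;
}
```

Note that neither helper looks at the relcache entry, which is what lets
the caller postpone touching it until after the pg_class row is written.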