We have seen a few reports (e.g., from Hervé Piedvache) of VACUUM FULL in 7.2 producing messages like:

dbfr=# VACUUM FULL VERBOSE ANALYZE pg_class ;
NOTICE:  --Relation pg_class--
NOTICE:  Rel pg_class: Uninitialized page 9 - fixing
NOTICE:  Rel pg_class: Uninitialized page 10 - fixing
NOTICE:  Rel pg_class: Uninitialized page 11 - fixing
NOTICE:  Rel pg_class: Uninitialized page 12 - fixing
NOTICE:  Rel pg_class: Uninitialized page 13 - fixing
NOTICE:  Rel pg_class: Uninitialized page 14 - fixing
NOTICE:  Rel pg_class: Uninitialized page 15 - fixing
NOTICE:  Rel pg_class: Uninitialized page 16 - fixing
NOTICE:  Rel pg_class: Uninitialized page 17 - fixing
NOTICE:  Rel pg_class: Uninitialized page 18 - fixing
NOTICE:  Rel pg_class: Uninitialized page 19 - fixing
NOTICE:  Rel pg_class: Uninitialized page 20 - fixing
NOTICE:  Rel pg_class: Uninitialized page 21 - fixing
NOTICE:  Rel pg_class: Uninitialized page 22 - fixing
NOTICE:  Rel pg_class: Uninitialized page 23 - fixing
...

I had originally suspected hardware problems, but Hervé told me today that he was still seeing this behavior after moving to a new machine. So I went digging for an explanation --- and I found one.

I've been able to reproduce the above behavior by issuing repeated table creations in one backend while another backend does occasional VACUUM FULLs on pg_class.

The fundamental problem is that for nailed-in-cache relations like pg_class, RelationClearRelation() does not want to release the cache entry. In 7.2 it doesn't do anything except close the smgr file for the relation and return. But RelationClearRelation is what gets called to implement a relcache flush from an SI message. This means that nothing much happens in other backends when a VACUUM transmits a relcache flush message for a nailed-in-cache relation. In particular, they fail to update their rd_targblock and rd_nblocks fields.

So the scenario goes like this:

1. Backend A has done a lot of inserts/deletes in pg_class. Its rd_targblock field points out somewhere near the end of the table.

2. Backend B does a VACUUM FULL, gets rid of lots of space, and shrinks pg_class.

3. Backend A does nothing in response to B's SI message, so its rd_targblock field now points past the end of the table.

4. Backend A now tries to insert another pg_class row. In RelationGetBufferForTuple(), it reads the rd_targblock page, locks it, checks it for free space. md.c will allow the read to occur even though it's past current EOF of the table; it will return a zeroed page. The check for free space will act as though there is zero free space available, so RelationGetBufferForTuple releases the buffer and goes to find another page where there's space. No problem ... yet.

5. The trouble is that the bufmgr now has a live buffer for a page that's past the end of pg_class. What's more, it thinks the page is dirty (because the mere act of obtaining an exclusive buffer lock on the page sets cntxDirty). Eventually, the bufmgr will want to recycle that buffer for some other use, and at that point it writes out the buffer. Presto, a page of zeroes. In fact possibly many pages of zeroes --- if the rd_targblock was more than one block past the new actual EOF, standard Unixen will accept the write and will silently fill the intervening file space with zeroes (or make it look like they did, anyway).

There isn't any serious consequence of this problem, other than that the next VACUUM will issue some "Uninitialized page" messages, so I'm not feeling that we need a 7.2.4 to fix it in the 7.2 series. But it needs to be fixed.

The good news is that it is partly fixed already in 7.3, because in 7.3 RelationClearRelation does reset rd_targblock for nailed-in relations. So I believe the problem cannot occur in this form anymore. But I am also thinking that it's a really bad idea for mdread to allow reads from beyond EOF --- that's just asking for trouble. Can anyone see a reason not to remove the special-case at line 440 in md.c?

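For illustration, steps 1 to 5 of the scenario can be mimicked outside PostgreSQL with a toy model in Python. This is only a sketch under stated assumptions: ToyRelation, probe_target_for_space, flush, uninitialized_pages and run_scenario are hypothetical names, the 24-page and 9-page table sizes are invented to echo the report above, and none of this is actual backend code. It relies only on the ordinary filesystem behavior that a write past EOF leaves the skipped range reading as zeroes.

```python
import os
import tempfile

BLCKSZ = 8192  # PostgreSQL's default page size


class ToyRelation:
    """Hypothetical stand-in for one backend's cached view of a table file."""

    def __init__(self, path, targblock):
        self.path = path
        self.targblock = targblock  # plays the role of rd_targblock
        self.buffers = {}           # blockno -> (page bytes, dirty flag)

    def nblocks(self):
        return os.path.getsize(self.path) // BLCKSZ

    def read_page(self, blockno):
        """Read one page; past EOF, hand back a zeroed page (as 7.2's mdread does)."""
        with open(self.path, "rb") as f:
            f.seek(blockno * BLCKSZ)
            data = f.read(BLCKSZ)
        return data.ljust(BLCKSZ, b"\0")

    def probe_target_for_space(self):
        """Step 4: read and lock the target page, then find no room on it."""
        page = self.read_page(self.targblock)
        # Assumption modelled here: taking the exclusive lock marks the
        # buffer dirty, which is the cntxDirty effect described above.
        self.buffers[self.targblock] = (page, True)
        return False  # a zeroed page reports no free space

    def flush(self):
        """Step 5: the buffer manager recycles buffers and writes dirty ones."""
        with open(self.path, "r+b") as f:
            for blockno, (page, dirty) in self.buffers.items():
                if dirty:
                    f.seek(blockno * BLCKSZ)
                    f.write(page)
        self.buffers.clear()


def uninitialized_pages(path):
    """Return the block numbers whose contents are entirely zero."""
    zero = bytes(BLCKSZ)
    found = []
    with open(path, "rb") as f:
        blockno = 0
        while True:
            page = f.read(BLCKSZ)
            if not page:
                break
            if page == zero:
                found.append(blockno)
            blockno += 1
    return found


def run_scenario():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "toy_pg_class")
        # Step 1: a 24-page table; backend A's target is the last block.
        with open(path, "wb") as f:
            f.write(b"\x01" * (24 * BLCKSZ))
        backend_a = ToyRelation(path, targblock=23)
        # Step 2: backend B's VACUUM FULL shrinks the table to 9 pages.
        os.truncate(path, 9 * BLCKSZ)
        # Step 3: backend A ignores the flush message; targblock stays at 23.
        # Steps 4 and 5: probe the stale target, then recycle the buffer.
        backend_a.probe_target_for_space()
        backend_a.flush()
        return backend_a.nblocks(), uninitialized_pages(path)
```

Calling run_scenario() should show the file grown back to 24 blocks, with blocks 9 through 23 all zero. In the toy model, resetting targblock after the truncate (as 7.3 does for rd_targblock) or refusing the read past EOF in read_page would each keep the zero pages from appearing.
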
It'd probably also be a good idea to decouple setting cntxDirty from acquiring exclusive buffer lock. As things stand, when RelationGetBufferForTuple finds there's not enough space on a target page, it's still set cntxDirty, thereby triggering an unnecessary write of that page. In many cases the page would be dirty already, but it's ugly nonetheless ... and it is a contributing factor in this bug.

regards, tom lane