[HACKERS] Duplicating transaction information in indexes and performing in memory vacuum
Hi,

Last week there was a thread on whether a purely in-memory vacuum can be performed. (OK, that was only part of the thread, but anyway.) I suggested that a page be vacuumed when it is pushed out of the buffer cache. Tom pointed out that this cannot be done, because index tuples store only the heap tuple id and depend on the heap tuple for transaction information.

I asked whether it is feasible to add transaction information to the index tuple, and the answer was no. I searched the hackers archive, and the following is all I could come up with in this context:

http://archives.postgresql.org/pgsql-hackers/2000-09/msg00513.php
http://archives.postgresql.org/pgsql-hackers/2001-09/msg00409.php

The discussion there does not consider vacuum at all.

What are the (other) reasons for not adding transaction information to the index tuple, in addition to the heap tuple?

The con is bloated indexes: the index tuple size will be close to 30 bytes minimum.

On the pro side, no more vacuum is required (at least for the part of the data that is being used; if data isn't used, it does not need vacuum anyway), and space bloat is stopped right in memory, without incurring the additional I/O that vacuum demands.

Given the recent trend of pushing PG higher and higher in scale (judging from performance list traffic, that is), I think this could be a worthwhile addition.

So what are the cons I have missed so far?

Bye
 Shridhar

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faqs/FAQ.html
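
The "close to 30 bytes" estimate can be roughly reproduced with a back-of-the-envelope calculation. The sketch below is illustrative only. The field list and byte widths are assumptions (an 8-byte index tuple header, four 4-byte transaction/command ids, a 2-byte status mask, one 4-byte integer key), not figures taken from the thread, and alignment padding is ignored.

```python
# Hypothetical field widths in bytes; assumptions for illustration only.
INDEX_HEADER = 8                      # heap tuple pointer (6) + info bits (2)
XACT_FIELDS = {"xmin": 4, "xmax": 4, "cmin": 4, "cmax": 4}
STATUS_MASK = 2                       # assumed per-tuple status bits
INT4_KEY = 4                          # a single 4-byte integer key


def index_tuple_size(with_xact_info: bool, key_size: int = INT4_KEY) -> int:
    """Return the assumed size of one index tuple, ignoring alignment."""
    size = INDEX_HEADER + key_size
    if with_xact_info:
        # Duplicate the heap tuple's transaction fields in the index tuple.
        size += sum(XACT_FIELDS.values()) + STATUS_MASK
    return size


print(index_tuple_size(False))  # 12
print(index_tuple_size(True))   # 30
```

Under these assumptions an index entry on a single integer column grows from 12 to 30 bytes, which is the kind of bloat the message has in mind.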
Re: [HACKERS] Duplicating transaction information in indexes and performing in memory vacuum
Shridhar Daithankar [EMAIL PROTECTED] writes:
> What are (more) reasons for not adding transaction information to
> index tuple, in addition to heap tuple?
> Cons are bloated indexes. The index tuple size will be close to 30
> bytes minimum.

And extra time to perform an update or delete, and extra time for readers of the index to process and perhaps update the extra copies of the row's state. And atomicity concerns, since you can't possibly update the row and all its index entries simultaneously. I'm not certain that the latter issue is insoluble, but it surely is a big risk.

> On pro* side of this, no more vacuum required (at least for part of
> data that is being used. If data isn't used, it does not need vacuum
> anyway) and space bloat is stopped right in memory, without incurring
> overhead of additional IO vacuum demands.

I do not believe either of those claims. For starters, if you don't remove a row's index entries when the row itself is removed, won't that make index bloat a lot worse? When exactly *will* you remove the index entries ... and won't that process look a lot like VACUUM?

regards, tom lane
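
The two objections (extra work per delete, and index entries that still have to be removed by something) can be illustrated with a toy model. This is only a sketch under stated assumptions and not PostgreSQL's storage code. `ToyTable`, `duplicate_xact_info`, and `delete_cost` are hypothetical names, each index holds exactly one entry per row, and a "write" is counted per tuple touched.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyTable:
    """Toy model: one heap plus n_indexes indexes, each keyed by row id."""
    n_indexes: int
    duplicate_xact_info: bool   # hypothetical: copy xmax into index entries
    heap_xmax: Dict[int, Optional[int]] = field(default_factory=dict)
    index_xmax: List[Dict[int, Optional[int]]] = field(default_factory=list)
    writes: int = 0             # counts tuple-level writes

    def __post_init__(self) -> None:
        self.index_xmax = [dict() for _ in range(self.n_indexes)]

    def insert(self, row_id: int) -> None:
        self.heap_xmax[row_id] = None   # None means "not deleted"
        for idx in self.index_xmax:
            idx[row_id] = None
        self.writes += 1 + self.n_indexes

    def delete(self, row_id: int, xid: int) -> None:
        self.heap_xmax[row_id] = xid    # mark the heap tuple deleted by xid
        self.writes += 1
        if self.duplicate_xact_info:
            # Each copy is a separate write; between these writes the copies
            # disagree, which is the atomicity concern.
            for idx in self.index_xmax:
                idx[row_id] = xid
                self.writes += 1

    def dead_index_entries(self) -> int:
        # Entries that still occupy index space although their row is deleted.
        return sum(
            1
            for idx in self.index_xmax
            for row_id in idx
            if self.heap_xmax[row_id] is not None
        )


def delete_cost(duplicate: bool, n_indexes: int = 3):
    """Return (writes caused by one delete, dead index entries left behind)."""
    table = ToyTable(n_indexes=n_indexes, duplicate_xact_info=duplicate)
    table.insert(1)
    before = table.writes
    table.delete(1, xid=100)
    return table.writes - before, table.dead_index_entries()


print(delete_cost(False))  # (1, 3)
print(delete_cost(True))   # (4, 3)
```

In this model, duplicating the transaction information raises the cost of a delete from one write to one plus the number of indexes, while the dead index entries remain in place either way until some separate pass removes them.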