[HACKERS] Duplicating transaction information in indexes and performing in memory vacuum

2003-10-27 Thread Shridhar Daithankar
Hi,

Last week, there was a thread on whether a solely in-memory vacuum can be
performed or not. (OK, that was only part of the thread, but anyway.)

I suggested that a page be vacuumed when it is pushed out of the buffer cache.
Tom pointed out that this cannot be done, as index tuples store the heap tuple
id and depend on the heap tuple to find out transaction information.
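
The dependency described above can be illustrated with a toy model. This is
only a sketch, not PostgreSQL code: HeapTuple, heap, index, is_visible and
index_lookup are hypothetical names, and visibility is reduced to a simple
check against a set of committed transaction ids.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeapTuple:
    value: str
    xmin: int                   # id of the transaction that inserted the row
    xmax: Optional[int] = None  # id of the transaction that deleted it, if any


# Heap: tuple id (page, offset) -> tuple, including its transaction information.
heap = {
    (0, 1): HeapTuple("alice", xmin=10),
    (0, 2): HeapTuple("bob", xmin=11, xmax=12),
}

# Index: key -> heap tuple id only. No transaction information is stored here.
index = {"alice": (0, 1), "bob": (0, 2)}


def is_visible(tup, committed):
    """Simplified rule: inserted by a committed xact, not deleted by one."""
    inserted = tup.xmin in committed
    deleted = tup.xmax is not None and tup.xmax in committed
    return inserted and not deleted


def index_lookup(key, committed):
    """Follow the index entry to the heap to decide whether the row is visible."""
    tid = index.get(key)
    if tid is None:
        return None
    tup = heap[tid]  # raises KeyError if the heap tuple is gone
    return tup.value if is_visible(tup, committed) else None


committed = {10, 11, 12}
print(index_lookup("alice", committed))
print(index_lookup("bob", committed))

# Clean the heap page alone, as an eviction-time vacuum would, without
# touching the index. The index entry for "bob" now points at nothing.
del heap[(0, 2)]
try:
    index_lookup("bob", committed)
except KeyError:
    print("dangling index entry for 'bob'")
```

In this model the failure shows up as a KeyError. In a real system the freed
slot could presumably be reused by an unrelated row, which would be harder to
detect.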

I asked whether it is feasible to add transaction information to the index
tuple, and the answer was no.

I searched the hackers archive, and the following is the only thread I could
come up with in this context.

http://archives.postgresql.org/pgsql-hackers/2000-09/msg00513.php
http://archives.postgresql.org/pgsql-hackers/2001-09/msg00409.php
The thread does not consider vacuum at all.

What are the (other) reasons for not adding transaction information to the
index tuple, in addition to the heap tuple?

The con is bloated indexes. The index tuple size will be close to 30 bytes at
minimum.
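
The message does not say which fields make up that figure. One way to arrive
at a number in that range is sketched below, purely as an assumption: a 6-byte
heap tuple id plus a 2-byte info word as the existing index tuple header, four
hypothetical 4-byte transaction fields (xmin, xmax, cmin, cmax), and a 4-byte
integer key, all packed without alignment padding.

```python
import struct

# Assumed layouts; "<" tells struct to pack without alignment padding.
ITEM_POINTER = "6s"   # heap tuple id, assumed 6 bytes
T_INFO = "H"          # assumed 2-byte info word
XACT_FIELDS = "IIII"  # hypothetical added fields: xmin, xmax, cmin, cmax
INT4_KEY = "i"        # a single 4-byte integer key, chosen for illustration

plain_header = struct.calcsize("<" + ITEM_POINTER + T_INFO)
widened_header = struct.calcsize("<" + ITEM_POINTER + T_INFO + XACT_FIELDS)
widened_entry = struct.calcsize(
    "<" + ITEM_POINTER + T_INFO + XACT_FIELDS + INT4_KEY
)

print(plain_header, widened_header, widened_entry)  # 8 24 28
```

Alignment padding or a different choice of fields would change the totals, so
the numbers are indicative only.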

On the pro side of this, no more vacuum is required (at least for the part of
the data that is being used; if data isn't used, it does not need vacuum
anyway), and space bloat is stopped right in memory, without incurring the
overhead of the additional I/O that vacuum demands.

Given the recent trend of pushing PG higher and higher in scale (judging from
performance list traffic, that is), I think this could be a worthwhile
addition.

So what are the cons I missed so far?

 Bye
  Shridhar
---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] Duplicating transaction information in indexes and performing in memory vacuum

2003-10-27 Thread Tom Lane
Shridhar Daithankar writes:
> What are (more) reasons for not adding transaction information to
> index tuple, in addition to heap tuple?

> Cons are bloated indexes. The index tuple size will be close to 30
> bytes minimum.

And extra time to perform an update or delete, and extra time for
readers of the index to process and perhaps update the extra copies
of the row's state.  And atomicity concerns, since you can't possibly
update the row and all its index entries simultaneously.  I'm not
certain that the latter issue is insoluble, but it surely is a big risk.
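
The atomicity concern can be made concrete with a toy model. This is a sketch
under stated assumptions, not how PostgreSQL works: the row's state is reduced
to a single xmax field, duplicated in the heap copy and in each index copy,
and delete_row and copies_agree are hypothetical helpers. The generator yields
after every individual write, so a reader can observe the state in between.

```python
def delete_row(heap_state, index_states, xid):
    """Stamp the deleting xact id on the heap copy, then on each index copy.

    Yields after every single write to expose the intermediate states.
    """
    heap_state["xmax"] = xid
    yield "heap"
    for name, entry in index_states.items():
        entry["xmax"] = xid
        yield name


def copies_agree(heap_state, index_states):
    """True when every index copy carries the same xmax as the heap copy."""
    return all(e["xmax"] == heap_state["xmax"] for e in index_states.values())


heap_state = {"xmax": None}
index_states = {"idx_a": {"xmax": None}, "idx_b": {"xmax": None}}

observations = []
for step in delete_row(heap_state, index_states, xid=42):
    observations.append((step, copies_agree(heap_state, index_states)))

print(observations)
# [('heap', False), ('idx_a', False), ('idx_b', True)]
```

Until the last copy is written, a reader that trusts an index copy can reach a
different conclusion from one that consults the heap.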

> On pro* side of this, no more vacuum required (at least for part of
> data that is being used. If data isn't used, it does not need vacuum
> anyway) and space bloat is stopped right in memory, without incurring
> overhead of additional IO vacuum demands.

I do not believe either of those claims.  For starters, if you don't
remove a row's index entries when the row itself is removed, won't that
make index bloat a lot worse?  When exactly *will* you remove the index
entries ... and won't that process look a lot like VACUUM?
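
A toy model of that question, offered as a sketch only: evict_page and
sweep_index are hypothetical helpers, the heap and index are plain
dictionaries, and which tuples are dead is simply given as input. Cleaning one
heap page in memory leaves the index entries behind, and removing them takes a
pass that visits every index entry.

```python
def evict_page(heap, page, dead_tids):
    """Drop the dead tuples of one heap page (hypothetical in-memory cleanup)."""
    for tid in [t for t in heap if t[0] == page and t in dead_tids]:
        del heap[tid]


def sweep_index(index, heap):
    """Remove index entries whose heap tuple is gone; return how many.

    Note that the sweep has to look at every entry in the index.
    """
    stale = [key for key, tid in index.items() if tid not in heap]
    for key in stale:
        del index[key]
    return len(stale)


heap = {(0, 1): "alice", (0, 2): "bob", (1, 1): "carol"}
index = {"alice": (0, 1), "bob": (0, 2), "carol": (1, 1)}

evict_page(heap, page=0, dead_tids={(0, 2)})
entries_before = len(index)  # the index is unchanged by the heap cleanup
removed = sweep_index(index, heap)

print(entries_before, removed, len(index))  # 3 1 2
```

In this model the index stays bloated until sweep_index runs, and that sweep
is a full scan, much like the index cleanup that vacuum would perform.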

regards, tom lane
