Re: [HACKERS] Proposal: Another attempt at vacuum improvements

2011-06-08 Thread Robert Haas
On Wed, Jun 8, 2011 at 1:19 AM, Pavan Deolasee pavan.deola...@gmail.com wrote:
 I went on to create a WIP patch based on our discussion. There are a
 couple of issues that I stumbled upon while testing it.

 1. The start-of-index-vacuum LSN that we want to track must be noted
 even before the heap scan is started. This is because we must be
 absolutely sure that the index vacuum removes index pointers to all
 dead line pointers generated by any operation with an LSN less than the
 start-of-index-vacuum LSN. If we don't remember the LSN before the heap
 scan starts and instead delay it until the start of the index vacuum,
 new dead line pointers may get generated on a page which has already
 been scanned by the heap scan but before the start of the index scan.
 Since the index pointers to these new dead line pointers haven't been
 vacuumed, we should really not be removing them.

 But as a consequence of using an LSN from the start of the heap scan,
 at the end of vacuum all pruned pages will have a vacuum LSN greater
 than the index vacuum LSN that we are going to remember in pg_class.
 And by our design, we can't remove dead line pointers on those pages
 because we don't know if the index pointers have been vacuumed or not.
 We might not be able to reclaim any dead line pointers if the page is
 again HOT pruned before the next vacuum cycle, because that will
 overwrite the page vacuum LSN with a newer value.

Oh.  That sucks.

 I think we definitely need to track the dead line pointers that a heap
 scan has collected. The index pointers to them will be removed if the
 vacuum completes successfully. That gets us back to the original idea
 that we had discussed a while back about marking such dead line
 pointers as LP_DEAD_RECLAIMED or something like that. When vacuum runs
 the heap scan, it would collect all dead line pointers, mark them
 dead-reclaimed, and also store an identifier of the vacuum operation
 that would remove the associated index pointers. During HOT cleanup or
 the next vacuum, we can safely remove the LP_DEAD_RECLAIMED line
 pointers if we can check whether the vacuum completed successfully.
 We don't have any free flags in ItemIdData, but we can use a special
 lp_off value to recognize a dead and dead-reclaimed line pointer. The
 identifier itself can be an LSN or XID or anything else. Also, since
 we just need one identifier, I think this technique would work for
 unlogged and temp relations, with minor adjustments.

OK.  So we have a Boolean some place.  At the beginning of VACUUM, we
read and remember the old value, and set it to false.  At the end of
VACUUM, after everything has succeeded, we set it to true.  During HOT
cleanup, we can free dead-reclaimed line pointers if the value is
currently true.  During VACUUM, we can free dead-reclaimed line
pointers if the value was true when we started.
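For concreteness, a rough sketch of that flow; every function name below
is invented for illustration and only the overall shape follows the
proposal above:

/* Sketch only: none of these helpers exist in PostgreSQL. */
void
vacuum_with_completion_flag(Relation rel)
{
    /* Remember whether the previous vacuum's index pass completed. */
    bool    prev_vacuum_ok = get_index_vacuum_completed(rel);

    /* Pessimistically clear the flag before doing any work. */
    set_index_vacuum_completed(rel, false);

    /*
     * During the heap scan, dead-vacuumed line pointers left over from
     * the previous vacuum may be freed only if prev_vacuum_ok is true.
     */
    scan_heap(rel, prev_vacuum_ok);
    vacuum_indexes(rel);

    /* Everything succeeded; HOT cleanup may now free dead-vacuumed LPs. */
    set_index_vacuum_completed(rel, true);
}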

The name dead-reclaimed doesn't inspire me very much.  Dead vs.
dead-vacuumed?  Morbid vs. dead?

 2. Another issue is with analyze counting dead line pointers as dead
 rows. While that's correct in principle, because a vacuum is needed to
 remove these dead line pointers, the overhead of a dead line pointer
 is much less than that of a dead tuple. Also, with single-pass vacuum,
 there will be many dead line pointers waiting to be cleaned up in the
 next vacuum or HOT prune. We should not really count them as dead rows,
 because they don't require a vacuum per se, and counting them as dead
 will force more vacuum cycles than required. If we go by the idea
 described above, we can skip the dead-reclaimed line pointers, at least
 when we know that the index vacuum completed successfully.

 Thoughts?

I think we should count dead line pointers and dead tuples, but in two
separate counters.  I agree that a dead line pointer is a lot less
expensive than a dead tuple, but it's not free either.
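As a sketch, the per-table stats could carry both numbers so that
autovacuum can weight them differently; the struct and field names here
are invented, not existing pgstat fields, and the 0.1 weight is purely
illustrative:

/* Sketch: hypothetical additions to the per-table statistics. */
typedef struct TableDeadSpaceStats
{
    int64   n_dead_tuples;          /* dead tuples still holding data */
    int64   n_dead_line_pointers;   /* LP_DEAD stubs, cheap but not free */
} TableDeadSpaceStats;

/* Cost a stub far below a full dead tuple when deciding to vacuum. */
static double
dead_space_cost(const TableDeadSpaceStats *s)
{
    return (double) s->n_dead_tuples + 0.1 * (double) s->n_dead_line_pointers;
}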

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Proposal: Another attempt at vacuum improvements

2011-06-07 Thread Pavan Deolasee
On Thu, May 26, 2011 at 4:10 PM, Pavan Deolasee
pavan.deola...@gmail.com wrote:


 So are there any other objections/suggestions? Does anyone else care
 to look at the brief design that we discussed above? Otherwise, I will
 go ahead and work on this in the coming days. Of course, I will keep
 the list posted about any new issues that I see.


I went on to create a WIP patch based on our discussion. There are a
couple of issues that I stumbled upon while testing it.

1. The start-of-index-vacuum LSN that we want to track must be noted
even before the heap scan is started. This is because we must be
absolutely sure that the index vacuum removes index pointers to all
dead line pointers generated by any operation with an LSN less than the
start-of-index-vacuum LSN. If we don't remember the LSN before the heap
scan starts and instead delay it until the start of the index vacuum,
new dead line pointers may get generated on a page which has already
been scanned by the heap scan but before the start of the index scan.
Since the index pointers to these new dead line pointers haven't been
vacuumed, we should really not be removing them.

But as a consequence of using an LSN from the start of the heap scan,
at the end of vacuum all pruned pages will have a vacuum LSN greater
than the index vacuum LSN that we are going to remember in pg_class.
And by our design, we can't remove dead line pointers on those pages
because we don't know if the index pointers have been vacuumed or not.
We might not be able to reclaim any dead line pointers if the page is
again HOT pruned before the next vacuum cycle, because that will
overwrite the page vacuum LSN with a newer value.

I think we definitely need to track the dead line pointers that a heap
scan has collected. The index pointers to them will be removed if the
vacuum completes successfully. That gets us back to the original idea
that we had discussed a while back about marking such dead line
pointers as LP_DEAD_RECLAIMED or something like that. When vacuum runs
the heap scan, it would collect all dead line pointers, mark them
dead-reclaimed, and also store an identifier of the vacuum operation
that would remove the associated index pointers. During HOT cleanup or
the next vacuum, we can safely remove the LP_DEAD_RECLAIMED line
pointers if we can check whether the vacuum completed successfully.
We don't have any free flags in ItemIdData, but we can use a special
lp_off value to recognize a dead and dead-reclaimed line pointer. The
identifier itself can be an LSN or XID or anything else. Also, since
we just need one identifier, I think this technique would work for
unlogged and temp relations, with minor adjustments.
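For illustration, the lp_off trick could look roughly like this. The
ItemIdData layout is the existing one from itemid.h; the sentinel value
and the two macros are hypothetical:

/* ItemIdData as it exists today: 15 + 2 + 15 bits. */
typedef struct ItemIdData
{
    unsigned    lp_off:15,      /* offset to tuple (from start of page) */
                lp_flags:2,     /* LP_UNUSED/LP_NORMAL/LP_REDIRECT/LP_DEAD */
                lp_len:15;      /* byte length of tuple */
} ItemIdData;

/*
 * Hypothetical: ItemIdSetDead() currently leaves lp_off = 0, so a nonzero
 * offset on an LP_DEAD pointer is free to mean "dead, and collected by a
 * vacuum whose index pass will remove the index pointers"; whether that
 * vacuum completed must still be checked before reuse.
 */
#define LP_OFF_DEAD_RECLAIMED   1

#define ItemIdMarkDeadReclaimed(itemId) \
( \
    (itemId)->lp_flags = LP_DEAD, \
    (itemId)->lp_off = LP_OFF_DEAD_RECLAIMED, \
    (itemId)->lp_len = 0 \
)

#define ItemIdIsDeadReclaimed(itemId) \
    ((itemId)->lp_flags == LP_DEAD && \
     (itemId)->lp_off == LP_OFF_DEAD_RECLAIMED)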

2. Another issue is with analyze counting dead line pointers as dead
rows. While that's correct in principle, because a vacuum is needed to
remove these dead line pointers, the overhead of a dead line pointer
is much less than that of a dead tuple. Also, with single-pass vacuum,
there will be many dead line pointers waiting to be cleaned up in the
next vacuum or HOT prune. We should not really count them as dead rows,
because they don't require a vacuum per se, and counting them as dead
will force more vacuum cycles than required. If we go by the idea
described above, we can skip the dead-reclaimed line pointers, at least
when we know that the index vacuum completed successfully.

Thoughts?

Thanks,
Pavan



-- 
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com



Re: [HACKERS] Proposal: Another attempt at vacuum improvements

2011-05-26 Thread Pavan Deolasee
On Thu, May 26, 2011 at 9:40 AM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, May 25, 2011 at 11:51 PM, Pavan Deolasee
  Having said that, it doesn't excite me too much because I
 think we should do the dead line pointer reclaim operation during page
 pruning, and we are already holding a cleanup lock at that time and
 will most likely do a reshuffle anyway.

 I'll give that a firm maybe.  If there is no reshuffle, then you can
 do this with just an exclusive content lock.  Maybe that's worthless,
 but I'm not certain of it.  I guess we might need to see how the code
 shakes out.


Yeah, once we start working on it, we might have a better idea.

 Also, reshuffling might be more expensive.  I agree that if there are
 new dead tuples on the page, then you're going to be paying that price
 anyway; but if not, it might be avoidable.


Yeah. We can tackle this later. As you suggested, maybe we can start
with something simpler and then see if we need to do more.


 There are some other issues that we should think about too, like
 recording free space and managing the visibility map. The free space is
 recorded in the second pass today, but I don't see any reason why
 that can't be moved to the first pass. It's not clear though whether we
 should also record free space after retail page vacuum or leave it as
 it is.

 Not sure.  Any idea why it's like that, or why we might want to change it?


I think it predates the HOT days, when the dead space was reclaimed
only during the second scan. Even post-HOT, if we know we would
revisit the page anyway during the second scan, it makes sense to
delay recording free space because the dead line pointers can add to
it (if they are towards the end of the line pointer array). I remember
discussing this briefly during HOT, but can't recollect why we decided
not to update the FSM after retail vacuum. But the entire focus then
was to keep things simple, and that could be one reason.

 Currently, I believe the only way a page can get marked all-visible is
 by vacuum.  But if we make this change, then it would be possible for
 a HOT cleanup to encounter a situation where all-visible could be set.
  We probably want to make that work.


Yes. That's certainly an option.

We did not discuss where to store the information about the start LSN
of the last successful index vacuum. I am thinking about a new
pg_class attribute, just because I can't think of anything better. Any
suggestions?

Also, for the first version, I wonder if we should let the unlogged
and temp tables be handled by the usual two-pass vacuum. Once we have
proven that one pass is better, we can extend that to other tables as
discussed on this thread.

Do we need a modified syntax for vacuum, like VACUUM mytab SKIP
INDEX or something similar? That way, a user can vacuum just the heap
if she wishes, and it can also help us with testing.

Do we need more autovacuum tuning parameters to control when to vacuum
just the heap and when to vacuum the index as well? Again, we can
discuss and decide this later, but I just wanted to mention it here.

So are there any other objections/suggestions? Does anyone else care
to look at the brief design that we discussed above? Otherwise, I will
go ahead and work on this in the coming days. Of course, I will keep
the list posted about any new issues that I see.

Thanks,
Pavan


-- 
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com



Re: [HACKERS] Proposal: Another attempt at vacuum improvements

2011-05-26 Thread Pavan Deolasee
On Thu, May 26, 2011 at 4:10 PM, Pavan Deolasee
pavan.deola...@gmail.com wrote:
 On Thu, May 26, 2011 at 9:40 AM, Robert Haas robertmh...@gmail.com wrote:

 Currently, I believe the only way a page can get marked all-visible is
 by vacuum.  But if we make this change, then it would be possible for
 a HOT cleanup to encounter a situation where all-visible could be set.
  We probably want to make that work.


 Yes. That's certainly an option.


BTW, I just realized that this design would expect the visibility map
to always be correct, or at least to always correctly report a
page having dead line pointers. We would expect the index vacuum to
clean index pointers to *all* dead line pointers, because once the
index vacuum is complete, other backends or the next heap vacuum may
remove any of those old dead line pointers, assuming that the index
vacuum would have taken care of the index pointers.

IOW, the visibility map bit must always be clear when there are dead
line pointers on the page. Do we guarantee that today? I think we do,
but the comment in the source file is not affirmative.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com



Re: [HACKERS] Proposal: Another attempt at vacuum improvements

2011-05-26 Thread Robert Haas
On Thu, May 26, 2011 at 6:40 AM, Pavan Deolasee
pavan.deola...@gmail.com wrote:
 There are some other issues that we should think about too, like
 recording free space and managing the visibility map. The free space is
 recorded in the second pass today, but I don't see any reason why
 that can't be moved to the first pass. It's not clear though whether we
 should also record free space after retail page vacuum or leave it as
 it is.

 Not sure.  Any idea why it's like that, or why we might want to change it?

 I think it predates the HOT days, when the dead space was reclaimed
 only during the second scan. Even post-HOT, if we know we would
 revisit the page anyway during the second scan, it makes sense to
 delay recording free space because the dead line pointers can add to
 it (if they are towards the end of the line pointer array). I remember
 discussing this briefly during HOT, but can't recollect why we decided
 not to update the FSM after retail vacuum. But the entire focus then
 was to keep things simple, and that could be one reason.

It's important to keep in mind that page-at-a-time vacuum is happening
in the middle of a routine INSERT/UPDATE/DELETE operation, so we don't
want to do anything too expensive there.  Whether updating the FSM
falls into that category or not, I am not sure.

 Currently, I believe the only way a page can get marked all-visible is
 by vacuum.  But if we make this change, then it would be possible for
 a HOT cleanup to encounter a situation where all-visible could be set.
  We probably want to make that work.

 Yes. That's certainly an option.

 We did not discuss where to store the information about the start LSN
 of the last successful index vacuum. I am thinking about a new
 pg_class attribute, just because I can't think of anything better. Any
 suggestions?

That seems fairly grotty, but I don't have a lot of brilliant ideas.
One possibility that occurred to me was to stick it in the special
space on the first page of the relation.  But that would mean that
every HOT cleanup would need to look at that page, which seems poor.
Even if we cached it after the first access, it still seems kinda
poor.  But it would make the unlogged case easier to handle...  and we
have thought previously about including some metadata in the relation
file itself to help with forensics (which table was this, anyway?).
So I don't know.

 Also, for the first version, I wonder if we should let the unlogged
 and temp tables be handled by the usual two-pass vacuum. Once we have
 proven that one pass is better, we can extend that to other tables as
 discussed on this thread.

We can certainly do that for testing.  Whether we want to commit it
that way, I'm not sure.

 Do we need a modified syntax for vacuum, like VACUUM mytab SKIP
 INDEX or something similar? That way, a user can vacuum just the heap
 if she wishes, and it can also help us with testing.

There's an extensible-options syntax you can use... VACUUM (index off) mytab.

 Do we need more autovacuum tuning parameters to control when to vacuum
 just the heap and when to vacuum the index as well? Again, we can
 discuss and decide this later, but I just wanted to mention it here.

Let's make tuning that a separate effort.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Proposal: Another attempt at vacuum improvements

2011-05-26 Thread Robert Haas
On Thu, May 26, 2011 at 8:57 AM, Pavan Deolasee
pavan.deola...@gmail.com wrote:
 On Thu, May 26, 2011 at 4:10 PM, Pavan Deolasee
 pavan.deola...@gmail.com wrote:
 On Thu, May 26, 2011 at 9:40 AM, Robert Haas robertmh...@gmail.com wrote:

 Currently, I believe the only way a page can get marked all-visible is
 by vacuum.  But if we make this change, then it would be possible for
 a HOT cleanup to encounter a situation where all-visible could be set.
  We probably want to make that work.


 Yes. That's certainly an option.

 BTW, I just realized that this design would expect the visibility map
 to always be correct, or at least to always correctly report a
 page having dead line pointers. We would expect the index vacuum to
 clean index pointers to *all* dead line pointers, because once the
 index vacuum is complete, other backends or the next heap vacuum may
 remove any of those old dead line pointers, assuming that the index
 vacuum would have taken care of the index pointers.

 IOW, the visibility map bit must always be clear when there are dead
 line pointers on the page. Do we guarantee that today? I think we do,
 but the comment in the source file is not affirmative.

It can end up in the wrong state after a crash.  I have a patch to try
to fix that, but I need someone to review it.  (*looks meaningfully at
Heikki, coughs loudly*)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Proposal: Another attempt at vacuum improvements

2011-05-25 Thread Pavan Deolasee
On Tue, May 24, 2011 at 10:59 PM, Robert Haas robertmh...@gmail.com wrote:

 So, first of all, thanks for putting some effort and thought into
 this.  Despite the large number of improvements in this area in 8.3
 and 8.4, this is still a pain point, and it would be really nice to
 find a way to make some further improvements.


Thanks for bringing up the idea during PGCon. That helped get me
interested in this again. I hope we will be able to take this to a
logical conclusion this time and do something to alleviate the pain.


 On Tue, May 24, 2011 at 2:58 AM, Pavan Deolasee
 pavan.deola...@gmail.com wrote:
  So the idea is to separate the index vacuum (removing index pointers
  to dead tuples) from the heap vacuum. When we do heap vacuum (either
  by HOT-pruning or using regular vacuum), we can spool the dead line
  pointers somewhere. To avoid any hot-spots during normal processing,
  the spooling can be done periodically like the stats collection.

 What happens if the system crashes after a line pointer becomes dead
 but before the record of its death is safely on disk?  The fact that a
 previous index vacuum has committed is only sufficient justification
 for reclaiming the dead line pointers if you're positive that the
 index vacuum killed the index pointers for *every* dead line pointer.
 I'm not sure we want to go there; any operation that wants to make a
 line pointer dead will need to be XLOG'd.  Instead, I think we should
 stick with your original idea and just try to avoid the second heap
 pass.


I would not mind keeping the design simple for the first release. So even
if we only avoid the second heap scan in vacuum, that would be a big win.
In the long term though, I think it will pay off to keep track of dead
line pointers as they are generated. The only way they are generated,
though, is while cleaning up the page under the cleanup lock, and the
operation is WAL-logged. So spooling dead line pointers during WAL replay
should be possible.

Anyway, I would rather not pursue that idea now, and I am OK with a
simplified version to start with, where every heap vacuum is followed by
an index vacuum, collecting and holding the dead line pointers in
maintenance memory.


 So to do that, as you say, we can have every operation that creates a
 dead line pointer note the LSN of the operation in the page.


Yes.


 But instead of allocating permanent space in the page header, which would
 both reduce (admittedly only by 8 bytes) the amount of space available
 for tuples, and more significantly have the effect of breaking on-disk
 compatibility, I'm wondering if we could get by with making space for
 that extra LSN only when it's actually present. In other words, when
 it's present, we set a bit PD_HAS_DEAD_LINE_PTR_LSN or somesuch,
 increment pd_lower, and use the extra space to store the LSN.  There
 is an alignment problem to worry about there but that shouldn't be a
 huge issue.


That might work but would require us to move tuples around when the first
dead line pointer gets generated in the page. You may argue that we should
be holding a cleanup lock when that happens and the dead line pointer
creation is always followed by a call to PageRepairFragmentation(), so it
should be easier to make space for the LSN.

Instead of storing the LSN after the page header, would it be easier to set
pd_special and store the LSN at the end of the page?
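To make the pd_special alternative concrete: heap pages are initialized
today with zero special space, so this route would be an on-disk format
change. A rough sketch, where PageInit() and PageGetSpecialPointer() are
existing routines and the accessor macro is hypothetical:

/* Reserve just enough special space on heap pages for one LSN. */
PageInit(page, BLCKSZ, MAXALIGN(sizeof(XLogRecPtr)));

/* Hypothetical accessor for the stored LSN: */
#define PageGetDeadLinePtrLSN(page) \
    ((XLogRecPtr *) PageGetSpecialPointer(page))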


 When we vacuum, we remember the LSN before we start.  When we finish,
 if we scanned the indexes and everything completed without error, then
 we bump the heap's notion (wherever we store it) of the last
 successful index vacuum.  When we vacuum or do HOT cleanup on a page,
 if the page has a most-recent-dead-line-pointer LSN and it precedes
 the start-of-last-successful-index-vacuum LSN, then we mark all the
 LP_DEAD tuples as LP_UNUSED and throw away the
 most-recent-dead-line-pointer LSN.


Right. And if the cleanup generates new dead line pointers, the LSN will be
replaced with the LSN of the current operation.


 One downside of this approach is that, if we do something like this,
 it'll become slightly more complicated to figure out where the item
 pointer array ends.  Another issue is that we might find ourselves
 wanting to extend the item pointer array to add a new item, and unable
 to do so easily because this most-recent-dead-line-pointer LSN is in
 the way.


I think that should not be too difficult to handle. Handling this by the
special space mechanism might be less complicated.


 If the LSN stored in the page precedes the
 start-of-last-successful-index-vacuum LSN, and if, further, we can get
 a buffer cleanup lock on the page, then we can do a HOT cleanup and
 life is good.  Otherwise, we can either (1) just forget about the
 most-recent-dead-line-pointer LSN - not ideal but not catastrophic
 either - or (2) if the start-of-last-successful-vacuum-LSN is old
 enough, we could overwrite an LP_DEAD line pointer in place.


I don't think we need the cleanup lock to turn the LP_DEAD line pointers
to LP_UNUSED, since that does not involve moving tuples around. So a
simple EXCLUSIVE lock should be enough. But we would need to WAL-log the
operation of turning DEAD to UNUSED, so it would be simpler to
consolidate this in HOT pruning. There could be exceptions: say, a large
number of DEAD line pointers accumulate towards the end, and reclaiming
those would free up substantial space in the page. But maybe we can use
those conditions to invoke HOT prune instead of handling them separately.

Re: [HACKERS] Proposal: Another attempt at vacuum improvements

2011-05-25 Thread Simon Riggs
On Tue, May 24, 2011 at 7:58 AM, Pavan Deolasee
pavan.deola...@gmail.com wrote:

 The biggest gripe today is that vacuum needs two heap scans and each scan
 dirties the buffer.

That's not that clear to me. The debate usually stalls because we
don't have sufficient info from real world analysis of where the time
goes.

 So the idea is to separate the index vacuum (removing index pointers to dead
 tuples) from the heap vacuum. When we do heap vacuum (either by HOT-pruning
 or using regular vacuum), we can spool the dead line pointers somewhere.

ISTM it will be complex to attempt to store the exact list of TIDs
between VACUUMs.

At the moment we scan indexes if we have > 0 rows to remove, which is
probably wasteful. Perhaps it would be better to keep a running total
of rows to remove, by updating pg_stats, then when we hit a certain
threshold in total we can do the index scan. So we don't need to
remember the TIDs, just remember how many there were and use that to
avoid cleaning too vigorously.
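In a sketch, the bookkeeping would reduce to a single running counter and
a threshold check; neither name below exists today:

/* Sketch of the running-total idea: remember how many index entries
 * are waiting to be cleaned, not which TIDs they are. */
static bool
need_index_scan(int64 index_entries_to_clean, int64 cleanup_threshold)
{
    return index_entries_to_clean >= cleanup_threshold;
}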

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Proposal: Another attempt at vacuum improvements

2011-05-25 Thread Robert Haas
On Wed, May 25, 2011 at 7:31 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Tue, May 24, 2011 at 7:58 AM, Pavan Deolasee
 pavan.deola...@gmail.com wrote:

 The biggest gripe today is that vacuum needs two heap scans and each scan
 dirties the buffer.

 That's not that clear to me. The debate usually stalls because we
 don't have sufficient info from real world analysis of where the time
 goes.

It probably wouldn't be too hard to write a simple patch to measure
time spent during the first heap pass, time spent scanning indexes,
and time spent on the second heap pass.  But it's not just about where
the time goes: as Pavan says, the second heap pass feels like a waste.
 Actually, the time spent scanning the indexes kinda feels like a
waste too, if the number of tuples being removed is small.  Which
brings me to your second point:

 So the idea is to separate the index vacuum (removing index pointers to dead
 tuples) from the heap vacuum. When we do heap vacuum (either by HOT-pruning
 or using regular vacuum), we can spool the dead line pointers somewhere.

 ISTM it will be complex to attempt to store the exact list of TIDs
 between VACUUMs.

 At the moment we scan indexes if we have > 0 rows to remove, which is
 probably wasteful. Perhaps it would be better to keep a running total
 of rows to remove, by updating pg_stats, then when we hit a certain
 threshold in total we can do the index scan. So we don't need to
 remember the TIDs, just remember how many there were and use that to
 avoid cleaning too vigorously.

That occurred to me, too.  If we're being launched by autovacuum then
we know that a number of updates and deletes equal to ~20% (or whatever
autovacuum_vacuum_scale_factor is set to) of the table size have
occurred since the last autovacuum.  But it's possible that many of
those were HOT updates, in which case the number of index entries to
be cleaned up might be much less than 20% of the table size.
Alternatively, it's possible that we'd be better off vacuuming the
table more often (say, autovacuum_vacuum_scale_factor=0.10 or 0.08 or
something) but only doing the index scans every once in a while when
enough dead line pointers have accumulated.  After all, it's the first
heap pass that frees up most of the space; cleaning dead line pointers
seems a bit less urgent.  But not having done any real analysis of how
this would work out in practice, I'm not sure whether it's a good idea
or not.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Proposal: Another attempt at vacuum improvements

2011-05-25 Thread Simon Riggs
On Wed, May 25, 2011 at 1:27 PM, Robert Haas robertmh...@gmail.com wrote:

 At the moment we scan indexes if we have > 0 rows to remove, which is
 probably wasteful. Perhaps it would be better to keep a running total
 of rows to remove, by updating pg_stats, then when we hit a certain
 threshold in total we can do the index scan. So we don't need to
 remember the TIDs, just remember how many there were and use that to
 avoid cleaning too vigorously.

 That occurred to me, too.  If we're being launched by autovacuum then
 we know that a number of updates and deletes equal to ~20% (or whatever
 autovacuum_vacuum_scale_factor is set to) of the table size have
 occurred since the last autovacuum.  But it's possible that many of
 those were HOT updates, in which case the number of index entries to
 be cleaned up might be much less than 20% of the table size.
 Alternatively, it's possible that we'd be better off vacuuming the
 table more often (say, autovacuum_vacuum_scale_factor=0.10 or 0.08 or
 something) but only doing the index scans every once in a while when
 enough dead line pointers have accumulated.  After all, it's the first
 heap pass that frees up most of the space; cleaning dead line pointers
 seems a bit less urgent.  But not having done any real analysis of how
 this would work out in practice, I'm not sure whether it's a good idea
 or not.

We know whether a TID was once in the index or not, so we can keep an
exact count. HOT doesn't come into it.

We can also remove TIDs from the index without VACUUM, during btree
split avoidance. We can optimise the second scan by skipping htids no
longer present in the index, though we'd need a spare bit to mark that
usage, which I'm not sure we have.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Proposal: Another attempt at vacuum improvements

2011-05-25 Thread Robert Haas
On Wed, May 25, 2011 at 7:07 AM, Pavan Deolasee
pavan.deola...@gmail.com wrote:
 But instead of allocating permanent space in the page header, which would
 both reduce (admittedly only by 8 bytes) the amount of space available
 for tuples, and more significantly have the effect of breaking on-disk
 compatibility, I'm wondering if we could get by with making space for
 that extra LSN only when it's actually present. In other words, when
 it's present, we set a bit PD_HAS_DEAD_LINE_PTR_LSN or somesuch,
 increment pd_lower, and use the extra space to store the LSN.  There
 is an alignment problem to worry about there but that shouldn't be a
 huge issue.

 That might work but would require us to move tuples around when the first
 dead line pointer gets generated in the page.

I'm confused.  A major point of the approach I was proposing was to
avoid having to move tuples around.

 You may argue that we should
 be holding a cleanup lock when that happens and the dead line pointer
 creation is always followed by a call to PageRepairFragmentation(), so it
 should be easier to make space for the LSN.

I'm not sure if this is the same thing you're saying, but certainly
the only time we need to make space for this value is when we've just
removed tuples from the page and defragmented it, and at that point there
should certainly be 8 bytes free somewhere.

 Instead of storing the LSN after the page header, would it be easier to set
 pd_special and store the LSN at the end of the page?

I was proposing storing it after the line pointer array, not after the
page header.  If we store it at the end of the page, I suspect we're
going to basically end up allocating permanent space for it, because
otherwise we'll have to shift all the tuple data forward and backward
by 8 bytes when we allocate or deallocate space for this.  Now, maybe
that's OK: I'm not sure.  But it's something to think about carefully.
 If we are going to allocate permanent space, the special space seems
better than the page header, because we should be able to make that
work without breaking on-disk compatibility, and because AFAIUI we only need
the space for heap pages, not index pages.

 I think that should not be too difficult to handle. Handling this by the
 special space mechanism might be less complicated.

A permanent space allocation will certainly be simpler.  I'm just not
sure how much we care about giving up 8 bytes that could potentially
be used to store tuple data.

 If the LSN stored in the page precedes the
 start-of-last-successful-index-vacuum LSN, and if, further, we can get
 a buffer cleanup lock on the page, then we can do a HOT cleanup and
 life is good.  Otherwise, we can either (1) just forget about the
 most-recent-dead-line-pointer LSN - not ideal but not catastrophic
 either - or (2) if the start-of-last-successful-vacuum-LSN is old
 enough, we could overwrite an LP_DEAD line pointer in place.

 I don't think we need the cleanup lock to turn the LP_DEAD line pointers to
 LP_UNUSED, since that does not involve moving tuples around. So a simple
 EXCLUSIVE lock should be enough. But we would need to WAL-log the operation
 of turning DEAD to UNUSED, so it would be simpler to consolidate this in HOT
 pruning. There could be exceptions: say, a large number of DEAD line
 pointers accumulate towards the end, and reclaiming those would free up
 substantial space in the page. But maybe we can use those conditions to
 invoke HOT prune instead of handling them separately.

Makes sense.

 Another issue is that this causes problems for temporary and unlogged
 tables, because no WAL records are generated and, therefore, the LSN
 does not advance.  This is also a problem for GIST indexes; Heikki
 fixed temporary GIST indexes by generating fake LSNs off of a
 backend-local counter.  Unlogged GIST indexes are currently not
 supported.  I think what we need to do is create an API to which you
 can pass a relation and get an LSN.  If it's a permanent relation, you
 get a regular LSN.  If it's a temporary relation, you get a fake LSN
 based on a backend-local counter.  If it's an unlogged relation, you
 get a fake LSN based on a shared-memory counter that is reset on
 restart.  If we can encapsulate that properly, it should provide both
 what we need to make this idea work and allow a somewhat graceful fix
 for the GIST-vs-unlogged problem.

 Can you explain more how things would work for unlogged tables? Do we use
 the same shared-memory counter for tracking the last successful index vacuum?

Yes.

 If so, how do we handle the case where, after restart, the page may get an
 LSN less than the index vacuum LSN if the index vacuum happened before the
 crash/stop?

Well, on a crash, the unlogged relations get truncated, and their
indexes also, so no problem.  On a clean shutdown, I guess we need to
arrange to save the counter across restarts.

Take a look at the existing logic around GetXLogRecPtrForTemp().
That's slightly different, because there we don't even need to be
consistent across backends.  We just need an increasing sequence of
values.  For unlogged relations things are a bit more complex - but it
seems manageable.
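In rough outline, the wrapper could look like this; GetXLogRecPtrForTemp()
and GetXLogInsertRecPtr() exist, while GetLSNForCleanup() and
GetUnloggedLSN() are the hypothetical parts that would have to be written:

/* Sketch of the proposed API for obtaining a real or fake LSN. */
XLogRecPtr
GetLSNForCleanup(Relation rel)
{
    if (rel->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT)
        return GetXLogInsertRecPtr();   /* real WAL insert position */
    else if (rel->rd_rel->relpersistence == RELPERSISTENCE_TEMP)
        return GetXLogRecPtrForTemp();  /* backend-local fake counter */
    else
        return GetUnloggedLSN();        /* hypothetical shared-memory
                                         * counter: reset along with the
                                         * relation after a crash, saved
                                         * across clean shutdowns */
}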

Re: [HACKERS] Proposal: Another attempt at vacuum improvements

2011-05-25 Thread Pavan Deolasee
On Wed, May 25, 2011 at 7:20 PM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, May 25, 2011 at 7:07 AM, Pavan Deolasee
 pavan.deola...@gmail.com wrote:
 But instead of allocating permanent space in the page header, which would
 both reduce (admittedly only by 8 bytes) the amount of space available
 for tuples, and more significantly have the effect of breaking on-disk
 compatibility, I'm wondering if we could get by with making space for
 that extra LSN only when it's actually present. In other words, when
 it's present, we set a bit PD_HAS_DEAD_LINE_PTR_LSN or somesuch,
 increment pd_lower, and use the extra space to store the LSN.  There
 is an alignment problem to worry about there but that shouldn't be a
 huge issue.

 That might work but would require us to move tuples around when the first
 dead line pointer gets generated in the page.

 I'm confused.  A major point of the approach I was proposing was to
 avoid having to move tuples around.


Well, I am not sure how you can always guarantee to make space
available for the LSN without moving tuples, irrespective of where
you store it.  But probably it's not important, as we discussed below.

 You may argue that we should
 be holding a cleanup lock when that happens and the dead line pointer
 creation is always followed by a call to PageRepairFragmentation(), so it
 should be easier to make space for the LSN.

 I'm not sure if this is the same thing you're saying, but certainly
 the only time we need to make space for this value is when we've just
 removed tuples from the page and defragmented it, and at that point there
 should certainly be 8 bytes free somewhere.


Agree.

 Instead of storing the LSN after the page header, would it be easier to set
 pd_special and store the LSN at the end of the page?

 I was proposing storing it after the line pointer array, not after the
 page header.  If we store it at the end of the page, I suspect we're
 going to basically end up allocating permanent space for it, because
 otherwise we'll have to shift all the tuple data forward and backward
 by 8 bytes when we allocate or deallocate space for this.  Now, maybe
 that's OK: I'm not sure.  But it's something to think about carefully.
  If we are going to allocate permanent space, the special space seems
 better than the page header, because we should be able to make that
  work without breaking on-disk compatibility, and because AFAIUI we only need
 the space for heap pages, not index pages.


I think if we are reclaiming LP_DEAD line pointers only while
defragmenting the page, we can always reclaim the space for the LSN,
irrespective of where we store it. So maybe we should decide
depending on what would matter for on-disk compatibility and whatever
requires the least invasive changes. I don't know what that is yet.



 If so, how do we handle the case where, after restart, the page may get an
 LSN less than the index vacuum LSN if the index vacuum happened before the
 crash/stop?

 Well, on a crash, the unlogged relations get truncated, and their
 indexes also, so no problem.  On a clean shutdown, I guess we need to
 arrange to save the counter across restarts.

Oh, ok. I was not aware that unlogged tables get truncated. I think we
can just restore from the value stored for the last successful index
vacuum (after incrementing it, maybe). That should be possible.


 Take a look at the existing logic around GetXLogRecPtrForTemp().
 That's slightly different, because there we don't even need to be
 consistent across backends.  We just need an increasing sequence of
 values.  For unlogged relations things are a bit more complex - but it
 seems manageable.

Ok. Will look at it.

Thanks,
Pavan


-- 
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com



Re: [HACKERS] Proposal: Another attempt at vacuum improvements

2011-05-25 Thread Pavan Deolasee
On Wed, May 25, 2011 at 5:57 PM, Robert Haas robertmh...@gmail.com wrote:


 Alternatively, it's possible that we'd be better off vacuuming the
 table more often (say, autovacuum_vacuum_scale_factor=0.10 or 0.08 or
 something) but only doing the index scans every once in a while when
 enough dead line pointers have accumulated.

That's precisely the reason I suggested separating heap and index
vacuums instead of the tight integration we have now. If we don't
spool the dead line pointers in a separate area, though, we would need
to make sure that the index vacuum runs through the heap first to
collect the dead line pointers and then removes the corresponding
index pointers. We would also need to take into consideration the
implications for the visibility map for any such scheme to work
correctly and efficiently.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com



Re: [HACKERS] Proposal: Another attempt at vacuum improvements

2011-05-25 Thread Robert Haas
On Wed, May 25, 2011 at 10:07 AM, Pavan Deolasee
pavan.deola...@gmail.com wrote:
 I'm confused.  A major point of the approach I was proposing was to
 avoid having to move tuples around.

 Well, I am not sure how you can always guarantee to make space
 available for the LSN without moving tuples, irrespective of where
 you store it.  But probably it's not important, as we discussed below.

Well, if we just defragged the page, then we should be guaranteed that
pd_lower + 8 < pd_upper.  No?

 I think if we are reclaiming LP_DEAD line pointers only while
 defragmenting the page, we can always reclaim the space for the LSN,
 irrespective of where we store it. So maybe we should decide
 depending on what would matter for on-disk compatibility and whatever
 requires the least invasive changes. I don't know what that is yet.

If we store the LSN at the beginning (page header) or end (special
space) of the page, then we'll have to adjust the location of the data
which follows it or precedes it if and when we want to reclaim the
space it occupies.  But if we store it in the hole in the middle
somewhere (probably by reducing the size of the hole, not by actually
referencing the area between pd_lower and pd_upper) then no tuples or
item pointers have to move around when we want to reclaim the space.
That way, we need only an exclusive lock (not a cleanup lock), and we
don't need to memmove() anything.
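Concretely, the access could look like this; pd_lower, pd_flags, and
MAXALIGN are real, while PD_HAS_DEAD_LINE_PTR_LSN and both helpers are
hypothetical:

/* Sketch: park the LSN just above the line pointer array, inside what
 * is normally the hole, and bump pd_lower past it while it exists. */
static XLogRecPtr *
PageGetDeadLinePtrLSN(Page page)
{
    PageHeader  ph = (PageHeader) page;

    Assert(ph->pd_flags & PD_HAS_DEAD_LINE_PTR_LSN);
    return (XLogRecPtr *) ((char *) page + ph->pd_lower -
                           MAXALIGN(sizeof(XLogRecPtr)));
}

/* Reclaiming it moves nothing: shrink pd_lower and clear the flag. */
static void
PageClearDeadLinePtrLSN(Page page)
{
    PageHeader  ph = (PageHeader) page;

    ph->pd_lower -= MAXALIGN(sizeof(XLogRecPtr));
    ph->pd_flags &= ~PD_HAS_DEAD_LINE_PTR_LSN;
}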

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Proposal: Another attempt at vacuum improvements

2011-05-25 Thread Alvaro Herrera
Excerpts from Robert Haas's message of Wed May 25 12:27:38 -0400 2011:
 On Wed, May 25, 2011 at 10:07 AM, Pavan Deolasee
 pavan.deola...@gmail.com wrote:

  I think if we are reclaiming LP_DEAD line pointers only while
  defragmenting the page, we can always reclaim the space for the LSN,
  irrespective of where we store it. So maybe we should decide
  depending on what would matter for on-disk compatibility and whatever
  requires the least invasive changes. I don't know what that is yet.
 
 If we store the LSN at the beginning (page header) or end (special
 space) of the page, then we'll have to adjust the location of the data
 which follows it or precedes it if and when we want to reclaim the
 space it occupies.  But if we store it in the hole in the middle
 somewhere (probably by reducing the size of the hole, not by actually
 referencing the area between pd_lower and pd_upper) then no tuples or
 item pointers have to move around when we want to reclaim the space.
 That way, we need only an exclusive lock (not a cleanup lock), and we
 don't need to memmove() anything.

You can store anything in the hole in the data area (currently we only
store tuple data), but then you'd have to store the offset to where you
stored it in some fixed location, and make sure you adjust pd_upper/lower
so that the next tuple does not overwrite it.  You'd still need space
in either page header or special space.  I don't see how you'd store
anything in the hole without it being in a fixed place, where it would
eventually be hit by either the line pointer array or tuple data.


As far as the general idea of this thread goes, I remember thinking
about exactly this topic a couple of months ago -- I didn't get this far
though, so thanks for fleshing out some details and getting the ball
rolling.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Proposal: Another attempt at vacuum improvements

2011-05-25 Thread Robert Haas
On Wed, May 25, 2011 at 12:48 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Excerpts from Robert Haas's message of Wed May 25 12:27:38 -0400 2011:
 On Wed, May 25, 2011 at 10:07 AM, Pavan Deolasee
 pavan.deola...@gmail.com wrote:

  I think if we are reclaiming LP_DEAD line pointers only while
  defragmenting the page, we can always reclaim the space for the LSN,
  irrespective of where we store it. So maybe we should decide
  depending on what would matter for on-disk compatibility and whatever
  requires the least invasive changes. I don't know what that is yet.

 If we store the LSN at the beginning (page header) or end (special
 space) of the page, then we'll have to adjust the location of the data
 which follows it or precedes it if and when we want to reclaim the
 space it occupies.  But if we store it in the hole in the middle
 somewhere (probably by reducing the size of the hole, not by actually
 referencing the area between pd_lower and pd_upper) then no tuples or
 item pointers have to move around when we want to reclaim the space.
 That way, we need only an exclusive lock (not a cleanup lock), and we
 don't need to memmove() anything.

 You can store anything in the hole in the data area (currently we only
 store tuple data), but then you'd have to store the offset to where you
 stored it in some fixed location, and make sure you adjust pd_upper/lower
 so that the next tuple does not overwrite it.  You'd still need space
 in either page header or special space.

You only need one bit of space in the page header, to indicate whether
the LSN is present or not.  And we have unused bit space there.

 I don't see how you'd store
 anything in the hole without it being in a fixed place, where it would
 eventually be hit by either the line pointer array or tuple data.

The point is that it doesn't matter.  Suppose we put it just after the
line pointer array.  Any time we're thinking about extending the line
pointer array, we already have an exclusive lock on the buffer.  And
if we already have an exclusive lock on the buffer, then we can reclaim
the dead line pointers and now we no longer need the saved LSN, so
writing over it is perfectly fine.

OK, I lied: if we have an exclusive buffer lock, but the last vacuum
either failed, or is still in progress, then the LSN might not be old
enough for us to reclaim the dead line pointers yet.  So ideally we'd
like to hold onto it.  We can do that by either (a) moving the LSN out
another 6 bytes, if there's enough room; or (b) deciding not to put
the new tuple on this page, after all.  There's no situation in which
we absolutely HAVE to get another tuple onto this particular page.  We
can just decide that the effective size of a page that contains dead
line pointers is 8 bytes less.  The alternative is to eat
up 8 bytes of space on ALL pages, whether they contain dead line
pointers or not.
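That decision could live in the free-space check used when placing a new
tuple; PageGetFreeSpace() is real, the rest is invented:

/* Sketch: a page carrying the extra LSN advertises 8 bytes less room. */
static Size
PageGetEffectiveFreeSpace(Page page)
{
    Size    space = PageGetFreeSpace(page);
    Size    lsnSize = MAXALIGN(sizeof(XLogRecPtr));

    if (PageHasDeadLinePtrLSN(page))    /* hypothetical flag test */
        space = (space > lsnSize) ? space - lsnSize : 0;

    return space;
}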

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Proposal: Another attempt at vacuum improvements

2011-05-25 Thread Pavan Deolasee
On Wed, May 25, 2011 at 10:36 PM, Robert Haas robertmh...@gmail.com wrote:


 I don't see how you'd store
 anything in the hole without it being in a fixed place, where it would
 eventually be hit by either the line pointer array or tuple data.

 The point is that it doesn't matter.  Suppose we put it just after the
  line pointer array.

I think the point is you cannot *always* put it just after the line
pointer array without possibly shuffling the tuples. Remember we need
to put the LSN when the dead line pointer is generated because we
decided to prune away the dead tuple. Say, for example, the page is
completely full and there are no dead line pointers and hence no LSN
on the page. Also there is no free space after the line pointer array.
Now say we prune dead tuples and generate dead line pointers, but the
last line pointer in the array is still in use and the first tuple
immediately after the line pointer array is live. Since you generated
dead line pointers you want to store the LSN on the page. Now, there
is no way you can store it after the line pointer array without moving
the live tuple somewhere else.

That's the point Alvaro and I are making. Do you agree with that logic?
Now, that does not matter, because we would always generate dead line
pointers while holding a buffer cleanup lock and hence are free to
shuffle tuples around. Maybe we are digressing on a trivial detail
here, but I hope I got it right.

 Any time we're thinking about extending the line
 pointer array, we already have an exclusive lock on the buffer.  And
 if we already have an exclusive lock on the buffer, then we can reclaim
 the dead line pointers and now we no longer need the saved LSN, so
 writing over it is perfectly fine.


The trouble is you may not be able to shrink the line pointer array.
But of course, you can reuse the reclaimed dead line pointers. I would
still advocate doing that during the pruning operation because we want
to emit WAL records for the operation.

 OK, I lied: if we have an exclusive buffer lock, but the last vacuum
 either failed, or is still in progress, then the LSN might not be old
 enough for us to reclaim the dead line pointers yet.  So ideally we'd
 like to hold onto it.  We can do that by either (a) moving the LSN out
 another 6 bytes, if there's enough room; or (b) deciding not to put
 the new tuple on this page, after all.  There's no situation in which
 we absolutely HAVE to get another tuple onto this particular page.  We
 can just decide that the effective size of a page that contains dead
 line pointers is 8 bytes less.  The alternative is to eat
 up 8 bytes of space on ALL pages, whether they contain dead line
 pointers or not.


I think we are on the same page as far as storing the LSN only when it's
required. But what was not convincing is the argument that you can
*always* find free space for the LSN without moving things around.

Let me summarize the sequence of operations and let me know if you
still disagree with the general principle:

1. There are no dead line pointers in the page - we are good.
2. A few tuples become dead, and HOT pruning is invoked either during
normal operation or heap vacuum. The dead tuples are pruned away and
truncated to dead line pointers. We already hold a cleanup lock on the
buffer. We set the flag in the page header and store the LSN (either
at the end of the line pointer array or at the end of the page).
3. Someday an index vacuum is run and it removes the index pointers to
the dead line pointers. We remember the start LSN of the index vacuum
somewhere, maybe as a pg_class attribute (how the index vacuum gets
the list of dead line pointers is not material in the general scheme
of things).
4. When the page is again chosen for pruning, we check if the flag is
set in the header. If so, we get the LSN stored in the page and check
it against the last successful index vacuum LSN; if it precedes the
index vacuum LSN, we turn the LP_DEAD line pointers to LP_UNUSED. The
special LSN can be removed, unless new LP_DEAD line pointers get
generated during the pruning, in which case it is overwritten with the
current LSN. Since we hold the buffer cleanup lock, the special LSN
storage can be reclaimed by shuffling things around.
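As a sketch of step 4, using the hypothetical page-level helpers from the
sketches earlier in the thread (last_index_vacuum_lsn stands for the
imagined pg_class value; XLByteLT() and the ItemId routines are real):

/* Sketch of step 4; assumes a buffer cleanup lock is already held. */
static void
reclaim_dead_line_pointers(Page page, XLogRecPtr last_index_vacuum_lsn)
{
    OffsetNumber    off,
                    maxoff = PageGetMaxOffsetNumber(page);

    if (!PageHasDeadLinePtrLSN(page) ||
        !XLByteLT(*PageGetDeadLinePtrLSN(page), last_index_vacuum_lsn))
        return;     /* index pointers may still exist; leave LP_DEAD alone */

    for (off = FirstOffsetNumber; off <= maxoff; off++)
    {
        ItemId  lp = PageGetItemId(page, off);

        if (ItemIdIsDead(lp))
            ItemIdSetUnused(lp);    /* index pointers are already gone */
    }

    /* Drop the LSN unless this prune creates new dead line pointers. */
    PageClearDeadLinePtrLSN(page);
}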

Thanks,
Pavan


-- 
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com



Re: [HACKERS] Proposal: Another attempt at vacuum improvements

2011-05-25 Thread Robert Haas
On Wed, May 25, 2011 at 1:43 PM, Pavan Deolasee
pavan.deola...@gmail.com wrote:
 I think the point is you cannot *always* put it just after the line
 pointer array without possibly shuffling the tuples. Remember we need
 to put the LSN when the dead line pointer is generated because we
 decided to prune away the dead tuple. Say, for example, the page is
 completely full and there are no dead line pointers and hence no LSN
 on the page. Also there is no free space after the line pointer array.
 Now say we prune dead tuples and generate dead line pointers, but the
 last line pointer in the array is still in use and the first tuple
 immediately after the line pointer array is live. Since you generated
 dead line pointers you want to store the LSN on the page. Now, there
 is no way you can store it after the line pointer array without moving
 the live tuple somewhere else.

So far I agree.  But don't we always defragment immediately after
pruning dead tuples to line pointers?  The removal of even one tuple
will give us more than enough space to store the LSN.

 Let me summarize the sequence of operations and let me know if you
 still disagree with the general principle:

 1. There are no dead line pointers in the page - we are good.
 2. A few tuples become dead, and HOT pruning is invoked either during
 normal operation or heap vacuum. The dead tuples are pruned away and
 truncated to dead line pointers. We already hold a cleanup lock on the
 buffer. We set the flag in the page header and store the LSN (either
 at the end of the line pointer array or at the end of the page).
 3. Someday an index vacuum is run and it removes the index pointers to
 the dead line pointers. We remember the start LSN of the index vacuum
 somewhere, maybe as a pg_class attribute (how the index vacuum gets
 the list of dead line pointers is not material in the general scheme
 of things).
 4. When the page is again chosen for pruning, we check if the flag is
 set in the header. If so, we get the LSN stored in the page and check
 it against the last successful index vacuum LSN; if it precedes the
 index vacuum LSN, we turn the LP_DEAD line pointers to LP_UNUSED. The
 special LSN can be removed, unless new LP_DEAD line pointers get
 generated during the pruning, in which case it is overwritten with the
 current LSN. Since we hold the buffer cleanup lock, the special LSN
 storage can be reclaimed by shuffling things around.

Agreed.  The only thing I'm trying to do further is to avoid the need
for a reshuffle when the special LSN storage is reclaimed.  For
example, consider:

1. There are three tuples on the page.  We are good.
2. Tuple #2 becomes dead.  The tuple is pruned to a line pointer.  The
page is defragmented.  At this point, it doesn't matter WHERE we put
the LSN - we are rearranging the whole page anyway.
3. Index vacuum is run.
4. Now we want to make the dead line pointer unused, and reclaim the
LSN storage.  If the LSN is stored at the end of the page, then we now
have to move all of the tuple data forward by 8 bytes.  But if it's
stored adjacent to the hole in the middle of the page, we need only
clear the page-header bits saying it's there (and maybe adjust
pd_lower).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Proposal: Another attempt at vacuum improvements

2011-05-25 Thread Pavan Deolasee
On Wed, May 25, 2011 at 11:39 PM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, May 25, 2011 at 1:43 PM, Pavan Deolasee
 Now, there
 is no way you can store it after the line pointer array without moving
 the live tuple somewhere else.

 So far I agree.  But don't we always defragment immediately after
 pruning dead tuples to line pointers?  The removal of even one tuple
 will give us more than enough space to store the LSN.


Yes, we do. But defragment means shuffling tuples around. So we agree
that to find space for the LSN, we might need to move the tuples
around.


 Agreed.  The only thing I'm trying to do further is to avoid the need
 for a reshuffle when the special LSN storage is reclaimed.

Ah, ok. That was never clear from your initial emails, or maybe I
misread. So what you are saying is that by storing the LSN after the
line pointer array, we might be able to reclaim the LSN storage without
shuffling. That makes sense. Having said that, it doesn't excite me too
much because I think we should do the dead line pointer reclaim
operation during page pruning, and we are already holding a cleanup
lock at that time and will most likely do a reshuffle anyway.

Also, a downside of storing the LSN after the line pointer array is
that you may waste space because of alignment issues. I also thought
that the LSN might get in the way of extending the line pointer array,
but probably that's not a big deal, since if there is free space in the
page (and there should be if we are adding a new tuple), it should be
available immediately after the LSN.

There are some other issues that we should think about too, like
recording free space and managing the visibility map. The free space is
recorded in the second pass today, but I don't see any reason why
that can't be moved to the first pass. It's not clear, though, whether we
should also record free space after a retail page vacuum or leave it as
it is. As for visibility maps, we should not update them while there are
LP_DEAD line pointers on the page. Now, that's not good, because all
tuples in the page may be visible, so we may lose some advantage, at
least for a while; but if we mark the page all-visible, the vacuum scan
would not find the dead line pointers in it, and that would leave
dangling index pointers after an index vacuum.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com



Re: [HACKERS] Proposal: Another attempt at vacuum improvements

2011-05-25 Thread Robert Haas
On Wed, May 25, 2011 at 11:51 PM, Pavan Deolasee
pavan.deola...@gmail.com wrote:
 Agreed.  The only thing I'm trying to do further is to avoid the need
 for a reshuffle when the special LSN storage is reclaimed.

 Ah, ok. That was never clear from your initial emails, or maybe I
 misread.

Sorry, I must not have explained it very well.  :-(

 So what you are saying is that by storing the LSN after the line pointer
 array, we might be able to reclaim the LSN storage without shuffling. That
 makes sense. Having said that, it doesn't excite me too much, because I
 think we should do the dead line pointer reclaim operation during page
 pruning, and we are already holding the cleanup lock at that time and most
 likely doing a reshuffle anyway.

I'll give that a firm maybe.  If there is no reshuffle, then you can
do this with just an exclusive content lock.  Maybe that's worthless,
but I'm not certain of it.  I guess we might need to see how the code
shakes out.

Also, reshuffling might be more expensive.  I agree that if there are
new dead tuples on the page, then you're going to be paying that price
anyway; but if not, it might be avoidable.

 Also, a downside of storing the LSN after the line pointer array is
 that you may waste space because of alignment issues.

We could possibly store it unaligned and read it back two bytes at a
time.  Granted, that's not free.
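
Or, more portably, copy through a char pointer instead of literally
reading two bytes at a time.  A minimal sketch, assuming the LSN lives
at some byte offset lsn_off within the page (writing is symmetric):

    #include "postgres.h"
    #include "access/xlogdefs.h"
    #include "storage/bufpage.h"

    /* memcpy() keeps this safe on strict-alignment platforms, at the
     * cost of a few extra cycles. */
    static XLogRecPtr
    read_unaligned_lsn(Page page, Size lsn_off)
    {
        XLogRecPtr  lsn;

        memcpy(&lsn, (char *) page + lsn_off, sizeof(XLogRecPtr));
        return lsn;
    }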

 I also thought that the
 LSN might get in the way of extending the line pointer array, but that's
 probably not a big deal, since if there is free space in the page (and
 there should be if we are adding a new tuple), it should be available
 immediately after the LSN.

Yeah.  I'm not sure how icky that is, though.

 There are some other issues that we should think about too, like
 recording free space and managing the visibility map. The free space is
 recorded in the second pass today, but I don't see any reason why
 that can't be moved to the first pass. It's not clear, though, whether we
 should also record free space after a retail page vacuum or leave it as
 it is.

Not sure.  Any idea why it's like that, or why we might want to change it?

 For visibility maps, we should not update them while there are
 LP_DEAD line pointers on the page. Now, that's not good, because all
 tuples in the page may be visible, so we may lose some advantage, at
 least for a while; but if we mark the page all-visible, the vacuum scan
 would not find the dead line pointers in it, and that would leave
 dangling index pointers after an index vacuum.

Also, an index-only scan might return index tuples that are pointing
to dead line pointers.

Currently, I believe the only way a page can get marked all-visible is
by vacuum.  But if we make this change, then it would be possible for
a HOT cleanup to encounter a situation where all-visible could be set.
We probably want to make that work.
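
If we do, the cleanup code would need a guard along these lines (a
sketch with invented names; the real test must also verify that every
tuple on the page is visible to all transactions):

    #include "postgres.h"
    #include "storage/bufpage.h"
    #include "storage/itemid.h"
    #include "storage/off.h"

    /*
     * A page with any LP_DEAD line pointers must not be marked
     * all-visible: the vacuum scan would then skip it and leave
     * dangling index pointers behind after an index vacuum.
     */
    static bool
    no_dead_line_pointers(Page page)
    {
        OffsetNumber off,
                    maxoff = PageGetMaxOffsetNumber(page);

        for (off = FirstOffsetNumber; off <= maxoff; off = OffsetNumberNext(off))
        {
            if (ItemIdIsDead(PageGetItemId(page, off)))
                return false;
        }
        return true;
    }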

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] Proposal: Another attempt at vacuum improvements

2011-05-24 Thread Pavan Deolasee
Hi All,

Some of the ideas regarding vacuum improvements were discussed here:
http://archives.postgresql.org/pgsql-hackers/2008-05/msg00863.php
http://archives.postgresql.org/pgsql-patches/2008-06/msg00059.php

A recent thread was started by Robert Haas, but I don't think we reached a
conclusion there either.
http://archives.postgresql.org/pgsql-hackers/2011-03/msg00946.php

This was once again brought up by Robert Haas in a discussion with Tom and
me during PGCon, and we agreed there are a few things we can do to make
vacuum more performant. One of the things that Tom mentioned is that vacuum
today is not aware of the fact that it's a periodic operation, and there
might be ways to utilize that.

The biggest gripe today is that vacuum needs two heap scans, and each scan
dirties buffers. While the visibility map ensures that not all blocks are
read and written during the scan, for a very large table even a small
percentage of blocks can be significant. Further, post-HOT, the second scan
of the heap does not really reclaim any significant space, except for dead
line pointers. So there is a good reason to avoid it. I wanted to start a
discussion about just that. I am proposing one solution below, but I am not
married to the idea.

So the idea is to separate the index vacuum (removing index pointers to dead
tuples) from the heap vacuum. When we do heap vacuum (either by HOT-pruning
or using regular vacuum), we can spool the dead line pointers somewhere. To
avoid any hot spots during normal processing, the spooling can be done
periodically, like the stats collection. One obvious choice for spooling dead
line pointers is to use a relation fork. The index vacuum will be kicked off
periodically, depending on the number of spooled dead line pointers. When
that happens, the index vacuum will remove all index pointers pointing to
those dead line pointers and forget the spooled line pointers.
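
Just to illustrate the shape of the spooled data (the fork layout
below is invented, not something I am proposing as-is): each entry is
only a heap TID, so a single 8K fork page can batch well over a
thousand of them for the next index vacuum.

    #include "postgres.h"
    #include "storage/itemptr.h"

    /* Invented layout for one page of the dead-line-pointer fork:
     * a count followed by a dense array of 6-byte heap TIDs. */
    typedef struct DeadPtrSpoolPageData
    {
        uint16          nentries;                    /* TIDs in use */
        ItemPointerData tids[FLEXIBLE_ARRAY_MEMBER]; /* dead slots */
    } DeadPtrSpoolPageData;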

The dead line pointers themselves will be removed whenever a heap page is
later vacuumed, either as part of HOT pruning or the next heap vacuum. We
would need some mechanism, though, to know that the index pointers to the
existing dead line pointers have been vacuumed and it's safe to remove them
now. Maybe we can track the last operation that generated a dead line
pointer in the page using an LSN in the page header, and also keep track of
the LSN of the last successful index vacuum. If the index vacuum LSN is
greater than the page-header vacuum LSN, we can safely remove the existing
dead line pointers. I am deliberately not suggesting how to track the index
vacuum LSN, since my last proposal to do something similar through a pg_class
column was shot down by Tom :-)
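
As a rough sketch of that check (names invented; I am also assuming
LSNs that compare as plain integers), pruning could do something like:

    #include "postgres.h"
    #include "access/xlogdefs.h"
    #include "storage/bufpage.h"
    #include "storage/itemid.h"
    #include "storage/off.h"

    /*
     * If the last successful index vacuum started after the operation
     * that created this page's dead line pointers, their index
     * pointers are gone and LP_DEAD slots can become LP_UNUSED.
     */
    static void
    maybe_reclaim_dead_ptrs(Page page, XLogRecPtr page_dead_lsn,
                            XLogRecPtr last_index_vacuum_lsn)
    {
        OffsetNumber off,
                    maxoff = PageGetMaxOffsetNumber(page);

        if (page_dead_lsn >= last_index_vacuum_lsn)
            return;            /* index pointers may still exist */

        for (off = FirstOffsetNumber; off <= maxoff; off = OffsetNumberNext(off))
        {
            ItemId      itemid = PageGetItemId(page, off);

            if (ItemIdIsDead(itemid))
                ItemIdSetUnused(itemid);
        }
        /* caller repairs fragmentation and clears the stored LSN */
    }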

In a nutshell, what I am suggesting is to do heap and index vacuuming
independently. The heap will be vacuumed either by HOT pruning or a periodic
heap vacuum, and the dead line pointers will be collected. An index vacuum
will remove the index pointers to those dead line pointers. And at some
later point, the dead line pointers will be removed, either as part of a
retail or complete heap vacuum. It's not clear if it's useful, but a single
index vacuum can follow multiple heap vacuums or vice versa.

Another advantage of this technique would be that we could then support
start/stop heap vacuum, or vacuuming a range of blocks at a time, or even
vacuuming only those blocks which are already cached in the buffer cache.
Just hand-waving at this point, but it seems possible.

Suggestions/comments/criticism all welcome, but please don't shoot down the
idea on implementation details, since I have really not spent time on those;
it will be easy to find holes and corner cases. That can be worked out if we
believe something like this will be useful.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] Proposal: Another attempt at vacuum improvements

2011-05-24 Thread Robert Haas
So, first of all, thanks for putting some effort and thought into
this.  Despite the large number of improvements in this area in 8.3
and 8.4, this is still a pain point, and it would be really nice to
find a way to make some further improvements.

On Tue, May 24, 2011 at 2:58 AM, Pavan Deolasee
pavan.deola...@gmail.com wrote:
 So the idea is to separate the index vacuum (removing index pointers to dead
 tuples) from the heap vacuum. When we do heap vacuum (either by HOT-pruning
 or using regular vacuum), we can spool the dead line pointers somewhere. To
 avoid any hot spots during normal processing, the spooling can be done
 periodically, like the stats collection.

What happens if the system crashes after a line pointer becomes dead
but before the record of its death is safely on disk?  The fact that a
previous index vacuum has committed is only sufficient justification
for reclaiming the dead line pointers if you're positive that the
index vacuum killed the index pointers for *every* dead line pointer.
I'm not sure we want to go there; any operation that wants to make a
line pointer dead will need to be XLOG'd.  Instead, I think we should
stick with your original idea and just try to avoid the second heap
pass.

So to do that, as you say, we can have every operation that creates a
dead line pointer note the LSN of the operation in the page.  But
instead of allocating permanent space in the page header, which would
both reduce (admittedly only by 8 bytes) the amount of space available
for tuples, and more significantly have the effect of breaking on-disk
compatibility, I'm wondering if we could get by with making space for
that extra LSN only when it's actually present.  In other words, when
it's present, we set a bit PD_HAS_DEAD_LINE_PTR_LSN or some such,
bump pd_lower, and use the extra space to store the LSN.  There
is an alignment problem to worry about there, but that shouldn't be a
huge issue.
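
Roughly, the set side might look like this (a sketch only; the bit
value, the names, and the MAXALIGN'd slot are invented and untested):

    #include "postgres.h"
    #include "access/xlogdefs.h"
    #include "storage/bufpage.h"

    #define PD_HAS_DEAD_LINE_PTR_LSN 0x0008    /* invented bit value */

    /* Record the LSN of an operation that created dead line pointers,
     * claiming an aligned slot from the hole just past the line
     * pointer array; overwrite the slot if it already exists. */
    static void
    set_dead_ptr_lsn(Page page, XLogRecPtr lsn)
    {
        PageHeader  ph = (PageHeader) page;
        Size        slot = MAXALIGN(sizeof(XLogRecPtr));

        if (!(ph->pd_flags & PD_HAS_DEAD_LINE_PTR_LSN))
        {
            Assert(ph->pd_upper - ph->pd_lower >= slot);
            ph->pd_lower += slot;
            ph->pd_flags |= PD_HAS_DEAD_LINE_PTR_LSN;
        }
        memcpy((char *) page + ph->pd_lower - slot, &lsn, sizeof(XLogRecPtr));
    }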

When we vacuum, we remember the LSN before we start.  When we finish,
if we scanned the indexes and everything completed without error, then
we bump the heap's notion (wherever we store it) of the last
successful index vacuum.  When we vacuum or do HOT cleanup on a page,
if the page has a most-recent-dead-line-pointer LSN and it precedes
the start-of-last-successful-index-vacuum LSN, then we mark all the
LP_DEAD tuples as LP_UNUSED and throw away the
most-recent-dead-line-pointer LSN.

One downside of this approach is that it'll become slightly more
complicated to figure out where the item pointer array ends.  Another
issue is that we might find ourselves
wanting to extend the item pointer array to add a new item, and unable
to do so easily because this most-recent-dead-line-pointer LSN is in
the way.  If the LSN stored in the page precedes the
start-of-last-successful-index-vacuum LSN, and if, further, we can get
a buffer cleanup lock on the page, then we can do a HOT cleanup and
life is good.  Otherwise, we can either (1) just forget about the
most-recent-dead-line-pointer LSN - not ideal but not catastrophic
either - or (2) if the start-of-last-successful-vacuum-LSN is old
enough, we could overwrite an LP_DEAD line pointer in place.

Another issue is that this causes problems for temporary and unlogged
tables, because no WAL records are generated and, therefore, the LSN
does not advance.  This is also a problem for GIST indexes; Heikki
fixed temporary GIST indexes by generating fake LSNs off of a
backend-local counter.  Unlogged GIST indexes are currently not
supported.  I think what we need to do is create an API to which you
can pass a relation and get an LSN.  If it's a permanent relation, you
get a regular LSN.  If it's a temporary relation, you get a fake LSN
based on a backend-local counter.  If it's an unlogged relation, you
get a fake LSN based on a shared-memory counter that is reset on
restart.  If we can encapsulate that properly, it should provide both
what we need to make this idea work and allow a somewhat graceful fix
for the GIST-vs-unlogged problem.
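
In code, the API might look about like this (a sketch: the function
and both counters are invented, GetXLogInsertRecPtr stands in for
"current WAL insert position", and integer-valued LSNs are assumed):

    #include "postgres.h"
    #include "access/xlog.h"
    #include "catalog/pg_class.h"
    #include "utils/rel.h"

    static XLogRecPtr fake_temp_lsn = 0;    /* backend-local counter */

    /* Hypothetical helper around a shared-memory counter for unlogged
     * relations; the counter is reset to zero at every restart. */
    extern XLogRecPtr advance_unlogged_fake_lsn(void);

    static XLogRecPtr
    GetLSNForRelation(Relation rel)
    {
        switch (rel->rd_rel->relpersistence)
        {
            case RELPERSISTENCE_TEMP:
                return ++fake_temp_lsn;             /* fake, per-backend */
            case RELPERSISTENCE_UNLOGGED:
                return advance_unlogged_fake_lsn(); /* fake, shared */
            default:
                return GetXLogInsertRecPtr();       /* real WAL position */
        }
    }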

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
