Re: [HACKERS] Unexpected VACUUM FULL failure

2007-08-10 Thread Heikki Linnakangas
Gregory Stark wrote:
 Tom Lane [EMAIL PROTECTED] writes:
 
 Simon Riggs [EMAIL PROTECTED] writes:

 ISTM that if we call HeapTupleSatisfiesVacuum() with an additional
 boolean parameter, force, we can tell VF to always set the hint bits in
 every case, not just HEAP_MOVED_IN and HEAP_MOVED_OUT.
 Surely this approach is no good: won't it allow hint bits to reach disk
 in advance of their transaction?
 
 I don't think so since it sounds like he's saying to still sync the log and
 VACUUM FULL has an exclusive lock on the table. So any committed (or aborted)
 changes it sees in the table must have been committed or aborted before the
 log sync.

Hint bit updates are not WAL-logged, so there's no mechanism to keep the
data page from hitting the disk before the COMMIT record. That's the
reason why we can't just set the hint bits for async committed
transactions in the first place.
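
To make that concrete, here's a toy, standalone model of the rule (not
PostgreSQL source; the type, the flush pointer, and the LSN values are made
up for illustration). A hint bit may only be set once the transaction's
commit record is known to be on disk, and an unflushed async commit is by
definition not:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;            /* a WAL byte position */

#define HINT_XMIN_COMMITTED 0x0100      /* stand-in for HEAP_XMIN_COMMITTED */

typedef struct
{
    uint16_t    infomask;               /* hint bits live here */
} TupleHeaderModel;

/* Pretend this is how far the WAL has actually been fsync'd so far. */
static XLogRecPtr wal_flushed_to = 1000;

/*
 * Set the xmin-committed hint only if the inserting transaction's commit
 * record is already on disk.  Hint-bit-only page writes generate no WAL,
 * so otherwise the page could reach disk ahead of the commit record and a
 * crash would leave a "committed" hint for a lost transaction.
 */
static bool
try_set_xmin_committed(TupleHeaderModel *tup, XLogRecPtr commit_record_lsn)
{
    if (commit_record_lsn > wal_flushed_to)
        return false;                   /* async commit not flushed yet */
    tup->infomask |= HINT_XMIN_COMMITTED;
    return true;
}

int
main(void)
{
    TupleHeaderModel tup = {0};

    /* commit record at LSN 900 is already flushed: safe to hint */
    printf("flushed commit:   %s\n",
           try_set_xmin_committed(&tup, 900) ? "hint set" : "skipped");

    /* async commit at LSN 1500 is not flushed yet: must skip the hint */
    printf("unflushed commit: %s\n",
           try_set_xmin_committed(&tup, 1500) ? "hint set" : "skipped");
    return 0;
}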

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Unexpected VACUUM FULL failure

2007-08-10 Thread Simon Riggs
On Fri, 2007-08-10 at 11:11 -0400, Tom Lane wrote:
 Heikki Linnakangas [EMAIL PROTECTED] writes:
  Gregory Stark wrote:
  I don't think so since it sounds like he's saying to still sync the log and
  VACUUM FULL has an exclusive lock on the table. So any committed (or aborted)
  changes it sees in the table must have been committed or aborted before the
  log sync.
 
  Hint bit updates are not WAL-logged, so there's no mechanism to keep the
  data page from hitting the disk before the COMMIT record. That's the
  reason why we can't just set the hint bits for async committed
  transactions in the first place.
 
 Well the theory was that the flush at the start would ensure that VF's
 first scan could always set the hint bits.  But I remembered this
 morning why I felt uneasy about that: it's not guaranteed true for
 system catalogs.  We sometimes release locks on catalogs before
 committing.  (Whether that is a good idea is a different discussion.)

Yes, I see comments in the VF code along those lines.

Seems like we should have code comments explaining exactly where we do
this, why and which things it causes issues for.

 What I'm now thinking we should do is to have the first scan check
 whether XMIN_COMMITTED is actually set (after HeapTupleSatisfiesVacuum
 claims it's good), and abandon defrag if not, the same as we already do
 in the other corner cases where we find not-guaranteed-committed tuples
 (see INSERT_IN_PROGRESS/DELETE_IN_PROGRESS cases in scan_heap).

I now think we *must* do this for system catalogs, or something else at
least. Personally I would prefer preventing async commits from occurring
when a system catalog has been touched at all, which would make the VF
situation and a few other situations go away entirely.

However, I see no reason to do as you suggest for other tables. It seems
like it would be an annoying feature for VF to sometimes fail, depending
upon the timing of other transactions. There would be a failure if an
async commit touched an early block in a table immediately prior to a VF;
that's rare, but possible. It would be extremely annoying if a DBA ran a
job to VF a table overnight and then came back in to see the ERROR
message. It is fairly easy to code it this way though, if you really
think this is the best way.

 If we do that, we don't actually need XLogAsyncCommitFlush() at the start.
 I'm inclined to remove it just because it's ugly.  Comments?

I'm not clear why XLogAsyncCommitFlush() is ugly. It's one line of code
that doesn't make anything else any harder. Apart from system catalogs,
doing it that way would be error free for all normal VFs.

 BTW, something else that just penetrated my brain is that this failure
 can only happen with synchronous_commit = off.  In the sync-commit case,
 even though we have inexact bookkeeping for clog LSNs, it's always true
 that every LSN recorded for a clog page will have been flushed first.
 So we will always think we can set hint bits even though we might be
 testing an LSN later than the particular transaction in question.
 I just rechecked and verified that I'd been (without really thinking
 about it) running my test build with synchronous_commit = off for the
 past few days.  If I happened to see this in one of the relatively small
 number of parallel regression sets I've run since then, then it should
 have been obvious in the buildfarm.  The reason it wasn't was that none
 of the buildfarm machines are testing async-commit.  We need to do
 something about that.

Yes, this issue affects async commit only.

-- 
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com




Re: [HACKERS] Unexpected VACUUM FULL failure

2007-08-10 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 On Fri, 2007-08-10 at 11:11 -0400, Tom Lane wrote:
 If we do that, we don't actually need XLogAsyncCommitFlush() at the start.
 I'm inclined to remove it just because it's ugly.  Comments?

 I'm not clear why XLogAsyncCommitFlush() is ugly. It's one line of code
 that doesn't make anything else any harder. Apart from system catalogs,
 doing it that way would be error free for all normal VFs.

Please recall that the failure that started this thread was on a regular
user table.  To do what you want will introduce what I'm now thinking
is an unacceptable amount of fragility.  In particular your patch of
yesterday to force hint-bit setting in VF scares the heck out of me.

The reason why XLogAsyncCommitFlush() is ugly is that it doesn't
actually accomplish a darn thing, as we now see from this failure:
it does not guarantee that hint bits will get set, because of the
inexact bookkeeping in clog.c.  It might marginally improve the
probability that they get set, but that's all.  The reason I want
to take it out is mainly that its very existence tempts people to make
the same mistake that was made here.

 I now think we *must* do this for system catalogs, or something else at
 least. Personally I would prefer preventing async commits from occurring
 when a system catalog has been touched at all, which would make the VF
 situation and a few other situations go away entirely.

I think that that is completely the wrong line of thought: we should be
making async commit work everywhere, or as nearly so as possible, not
inventing arbitrary restrictions to place on it.  The restriction
you suggest here would probably cost more performance (by forcing sync
commits) than could ever be gained back by being able to assume hint
bits are set.  In the patch as committed, I took out most of the
restrictions you had on async commit --- the only utility commands that
force sync commit are the ones that have direct effects on the filesystem.

Another argument is that VACUUM FULL is a dinosaur that should probably
go away entirely someday (a view I believe you share); it should
therefore not be allowed to drive the design of other parts of the
system.

regards, tom lane



Re: [HACKERS] Unexpected VACUUM FULL failure

2007-08-10 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 Gregory Stark wrote:
 I don't think so since it sounds like he's saying to still sync the log and
 VACUUM FULL has an exclusive lock on the table. So any committed (or aborted)
 changes it sees in the table must have been committed or aborted before the
 log sync.

 Hint bit updates are not WAL-logged, so there's no mechanism to keep the
 data page from hitting the disk before the COMMIT record. That's the
 reason why we can't just set the hint bits for async committed
 transactions in the first place.

Well the theory was that the flush at the start would ensure that VF's
first scan could always set the hint bits.  But I remembered this
morning why I felt uneasy about that: it's not guaranteed true for
system catalogs.  We sometimes release locks on catalogs before
committing.  (Whether that is a good idea is a different discussion.)

What I'm now thinking we should do is to have the first scan check
whether XMIN_COMMITTED is actually set (after HeapTupleSatisfiesVacuum
claims it's good), and abandon defrag if not, the same as we already do
in the other corner cases where we find not-guaranteed-committed tuples
(see INSERT_IN_PROGRESS/DELETE_IN_PROGRESS cases in scan_heap).
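
Roughly, the check would look something like this (a simplified standalone
model, not the real scan_heap code; the do_shrinking flag is just a stand-in
name for whatever scan_heap uses to abandon the defrag step):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define HINT_XMIN_COMMITTED 0x0100      /* stand-in for HEAP_XMIN_COMMITTED */

typedef enum
{
    TUPLE_LIVE,                         /* visibility check says "good" */
    TUPLE_INSERT_IN_PROGRESS            /* existing "don't shrink" case */
} TupleStatus;

typedef struct
{
    uint16_t    infomask;
    TupleStatus status;                 /* what the visibility check reported */
} TupleModel;

/*
 * First-pass rule being proposed: even when the visibility check says the
 * tuple is good, refuse to defragment unless the xmin hint bit really got
 * set, just as we already refuse for in-progress tuples.
 */
static bool
scan_decides_to_shrink(const TupleModel *tuples, int ntuples)
{
    bool        do_shrinking = true;    /* stand-in for scan_heap's flag */

    for (int i = 0; i < ntuples; i++)
    {
        if (tuples[i].status == TUPLE_INSERT_IN_PROGRESS)
            do_shrinking = false;       /* existing corner case */
        else if (tuples[i].status == TUPLE_LIVE &&
                 (tuples[i].infomask & HINT_XMIN_COMMITTED) == 0)
            do_shrinking = false;       /* new check: hint bit never got set */
    }
    return do_shrinking;
}

int
main(void)
{
    TupleModel  page[] = {
        {HINT_XMIN_COMMITTED, TUPLE_LIVE},  /* ordinary committed tuple */
        {0, TUPLE_LIVE}                     /* async-committed, hint unset */
    };

    printf("defragment this table? %s\n",
           scan_decides_to_shrink(page, 2) ? "yes" : "no");     /* "no" */
    return 0;
}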

If we do that, we don't actually need XLogAsyncCommitFlush() at the start.
I'm inclined to remove it just because it's ugly.  Comments?

BTW, something else that just penetrated my brain is that this failure
can only happen with synchronous_commit = off.  In the sync-commit case,
even though we have inexact bookkeeping for clog LSNs, it's always true
that every LSN recorded for a clog page will have been flushed first.
So we will always think we can set hint bits even though we might be
testing an LSN later than the particular transaction in question.
I just rechecked and verified that I'd been (without really thinking
about it) running my test build with synchronous_commit = off for the
past few days.  If I happened to see this in one of the relatively small
number of parallel regression sets I've run since then, then it should
have been obvious in the buildfarm.  The reason it wasn't was that none
of the buildfarm machines are testing async-commit.  We need to do
something about that.

regards, tom lane



Re: [HACKERS] Unexpected VACUUM FULL failure

2007-08-10 Thread Gregory Stark

Tom Lane [EMAIL PROTECTED] writes:

 The reason why XLogAsyncCommitFlush() is ugly is that it doesn't
 actually accomplish a darn thing, as we now see from this failure:
 it does not guarantee that hint bits will get set, because of the
 inexact bookkeeping in clog.c.  It might marginally improve the
 probability that they get set, but that's all.  The reason I want
 to take it out is mainly that its very existence tempts people to make
 the same mistake that was made here.

I don't understand your reasoning here and I would like to understand it so if
you don't mind I would like to see if I can work out what you're talking
about. Regardless of this point, I think the impact of what you were proposing
to do to VF instead was much less than Simon seemed to think it was, so it
seems like a perfectly acceptable solution.

As far as I understand it, the Xlog flush, in combination with keeping an
exclusive lock on the table and always holding locks until the end of the
transaction, makes forcing the hint bits entirely safe.

The fragility you see comes from depending on how those three things interact
and the difficulty in ensuring that all of those properties are always true.
By "marginally improve the probability" you're making a judgement about the
probability that programmers will manage to maintain all those properties
consistently, not about the probability that the race will occur at run-time?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [HACKERS] Unexpected VACUUM FULL failure

2007-08-10 Thread Gregory Stark

Tom Lane [EMAIL PROTECTED] writes:

 Another argument is that VACUUM FULL is a dinosaur that should probably
 go away entirely someday (a view I believe you share); it should
 therefore not be allowed to drive the design of other parts of the
 system.

Incidentally, every time it comes up we recommend using CLUSTER or ALTER
TABLE. And explaining the syntax for ALTER TABLE is always a bit fiddly. I
wonder if it would make sense to add a VACUUM REWRITE which just did the
same as the noop ALTER TABLE we're recommending people do anyways. Then we
could have a HINT from VACUUM FULL which suggests considering VACUUM REWRITE.

I would think this would be 8.4 stuff, except that if all we want it to do is
a straight no-op ALTER TABLE, it's pretty trivial. The hardest part is coming
up with a name for it.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [HACKERS] Unexpected VACUUM FULL failure

2007-08-10 Thread Decibel!
On Fri, Aug 10, 2007 at 07:53:06PM +0100, Gregory Stark wrote:
 
 Tom Lane [EMAIL PROTECTED] writes:
 
  Another argument is that VACUUM FULL is a dinosaur that should probably
  go away entirely someday (a view I believe you share); it should
  therefore not be allowed to drive the design of other parts of the
  system.
 
 Incidentally, every time it comes up we recommend using CLUSTER or ALTER
 TABLE. And explaining the syntax for ALTER TABLE is always a bit fiddly. I
 wonder if it would make sense to add a VACUUM REWRITE which just did the
 same as the noop ALTER TABLE we're recommending people do anyways. Then we
 could have a HINT from VACUUM FULL which suggests considering VACUUM REWRITE.
 
 I would think this would be 8.4 stuff, except that if all we want it to do is
 a straight no-op ALTER TABLE, it's pretty trivial. The hardest part is coming
 up with a name for it.

One question... should we have a vacuum variant that also reindexes? Or
does that just naturally fall out of the rewrite?

BTW, "rewrite" sounds fine to me... anything but "full", which is constantly
confused with a full database vacuum.
-- 
Decibel!, aka Jim Nasby    [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [HACKERS] Unexpected VACUUM FULL failure

2007-08-10 Thread Simon Riggs
On Fri, 2007-08-10 at 13:47 -0400, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  On Fri, 2007-08-10 at 11:11 -0400, Tom Lane wrote:
  If we do that, we don't actually need XLogAsyncCommitFlush() at the start.
  I'm inclined to remove it just because it's ugly.  Comments?
 
  I'm not clear why XLogAsyncCommitFlush() is ugly. It's one line of code
  that doesn't make anything else any harder. Apart from system catalogs,
  doing it that way would be error free for all normal VFs.
 
 Please recall that the failure that started this thread was on a regular
 user table.  To do what you want will introduce what I'm now thinking
 is an unacceptable amount of fragility.  In particular your patch of
 yesterday to force hint-bit setting in VF scares the heck out of me.
 
 The reason why XLogAsyncCommitFlush() is ugly is that it doesn't
 actually accomplish a darn thing, as we now see from this failure:
 it does not guarantee that hint bits will get set, because of the
 inexact bookkeeping in clog.c.  It might marginally improve the
 probability that they get set, but that's all.  The reason I want
 to take it out is mainly that its very existence tempts people to make
 the same mistake that was made here.

I think this *can* work, but I accept you don't like it, even if I'm not
sure why. The bug was in the assumption that all flushed async commits
can later be confirmed to be flushed, which isn't true because of the
inexact clog bookkeeping; it was not in the flush itself.

If VACUUM FULL has problems with catalog tables, then that is an
existing bug, not one introduced by the async patch. Catalog tables
might be unlocked and yet have changes from uncommitted transactions in
them, which violates the assumption in the current VF code that all hint
bits will be set prior to repair_frag(). But the comments in vacuum.c
line 1821 say "A tuple is either: (a) a tuple in a system catalog,
inserted or deleted by a not yet committed transaction..."; in case (a)
we would not be in repair_frag() at all (it doesn't give a reason).

If the current comments are correct, then we're OK to fix this by having
a special-case HeapTupleSatisfiesVacuum variant, maybe coded slightly
differently.

If the current comments are false, in that (a) is possible, then VF has
a pre-existing bug, that *must* be fixed in the way you suggest, but
that has nothing to do with async.

...but...

 Another argument is that VACUUM FULL is a dinosaur that should probably
 go away entirely someday (a view I believe you share); it should
 therefore not be allowed to drive the design of other parts of the
 system.

You're right. Let's make that day today.

I vote to completely replace VF now with a cluster-style rebuild of the
table. That way we won't have to keep fixing something we're gonna kill
anyway.

It would be better to release 8.3 with a new, clean, fast implementation
of VF, than to release it with code that we *think* is right, but has so
far proved a source of some truly obscure bugs.

-- 
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com




Re: [HACKERS] Unexpected VACUUM FULL failure

2007-08-10 Thread Gregory Stark

Simon Riggs [EMAIL PROTECTED] writes:

 It would be better to release 8.3 with a new, clean, fast implementation
 of VF, than to release it with code that we *think* is right, but has so
 far proved a source of some truly obscure bugs.

Well building a suitable replacement for VACUUM FULL is more work than I was
proposing. The no-op ALTER TABLE rebuild still has the disadvantage of
requiring 2x the space taken by the table (and potentially more if you have
large indexes).

I also think it's a safer course of action to create a new command, spend one
or two releases improving it until we're sure it meets everyone's needs, then
drop the old command.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [HACKERS] Unexpected VACUUM FULL failure

2007-08-10 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Simon Riggs [EMAIL PROTECTED] writes:
 It would be better to release 8.3 with a new, clean, fast implementation
 of VF, than to release it with code that we *think* is right, but has so
 far proved a source of some truly obscure bugs.

 Well building a suitable replacement for VACUUM FULL is more work than I was
 proposing.

Reimplementing VACUUM FULL for 8.3 is right out :-(.  I do want to ship
a release in calendar 2007 ...

 I also think it's a safer course of action to create a new command, spend one
 or two releases improving it until we're sure it meets everyone's needs, then
 drop the old command.

Agreed.  While I'd like to see V.F. dead, we need a proven replacement
first.  (At the same time, we shouldn't spend more effort on V.F. than
we have to.)

regards, tom lane



Re: [HACKERS] Unexpected VACUUM FULL failure

2007-08-10 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Incidentally, every time it comes up we recommend using CLUSTER or ALTER
 TABLE. And explaining the syntax for ALTER TABLE is always a bit fiddly. I
 wonder if it would make sense to add a VACUUM REWRITE which just did the
 same as the noop ALTER TABLE we're recommending people do anyways. Then we
 could have a HINT from VACUUM FULL which suggests considering VACUUM REWRITE.

Not that syntax, please :-(.  The trouble with VACUUM [adjective] is
that the adjective has to become a fully reserved keyword, else the parser
can't tell it from a table name.  This is all right for FULL because
that's a reserved word anyway due to the outer join syntax, but I really
don't want to do it for any words that aren't otherwise reserved.

regards, tom lane



Re: [HACKERS] Unexpected VACUUM FULL failure

2007-08-10 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] writes:
 The reason why XLogAsyncCommitFlush() is ugly is that it doesn't
 actually accomplish a darn thing, as we now see from this failure:
 it does not guarantee that hint bits will get set, because of the
 inexact bookkeeping in clog.c.  It might marginally improve the
 probability that they get set, but that's all.  The reason I want
 to take it out is mainly that its very existence tempts people to make
 the same mistake that was made here.

 I don't understand your reasoning here and I would like to understand it so if
 you don't mind I would like to see if I can work out what you're talking
 about.

Well, both Simon and I thought that XLogAsyncCommitFlush would allow
VACUUM FULL to assume that hint bits were good.  We were both wrong
about that, and in hindsight that goes against the whole trend of PG
development on this topic: hint bits are hints, full stop.  I think
that having XLogAsyncCommitFlush in the API will just encourage people
to use it when they shouldn't.

 As far as I understand it, the Xlog flush, in combination with keeping an
 exclusive lock on the table and always holding locks until the end of the
 transaction, makes forcing the hint bits entirely safe.

I don't currently see a hole in that line of reasoning, but it's a bit
longer chain of reasoning than I like for a critical correctness
property.  Especially when we have a boatload of code that violates the
last assumption, including deeply-embedded APIs (heap_close) that make
it easy to violate the assumption.  We could maybe go out next week and
fix all the core code to not release any locks early, but what of
third-party backend add-ons?  Worse, might we not regret the change
later due to increased deadlock risks?

 By "marginally improve the probability" you're making a judgement about the
 probability that programmers will manage to maintain all those properties
 consistently, not about the probability that the race will occur at run-time?

No, I was thinking about the latter.  The current CVS tip code doesn't
have a huge window between logically committing a transaction and being
able to set hint bits for it.  In situations where you think "hmm, I
just made a big update, maybe a VACUUM FULL would be a good idea",
you'll be quite safe by the time you've managed to type VACUUM FULL.
A V.F. that is automatically issued while a lot of other stuff is
going on will be at larger risk of having the defragment step disabled,
but at the same time it's not obvious that anyone will care much.

regards, tom lane



Re: [HACKERS] Unexpected VACUUM FULL failure

2007-08-09 Thread Simon Riggs
On Wed, 2007-08-08 at 23:23 -0400, Tom Lane wrote:
 I wrote:
  ... Since we've whacked the tqual.c logic around recently,
  the problem might actually lie there...
 
 In fact, I bet this is a result of the async-commit patch.  The places
 where vacuum.c bleats "HEAP_MOVED_OFF was expected" are all places where
 it is looking at a tuple not marked XMIN_COMMITTED; it expects that
 after its first pass over the table, *every* tuple is either
 XMIN_COMMITTED or one that it moved.  Async commit changed tqual.c
 so that tuples that are in fact known committed might not get marked
 XMIN_COMMITTED right away.  The patch tries to prevent this from
 happening within VACUUM FULL by means of
 
 /* 
  * VACUUM FULL assumes that all tuple states are well-known prior to
  * moving tuples around --- see comment "known dead" in repair_frag(),
  * as well as simplifications in tqual.c.  So before we start we must
  * ensure that any asynchronously-committed transactions with changes
  * against this table have been flushed to disk.  It's sufficient to do
  * this once after we've acquired AccessExclusiveLock.
  */
 XLogAsyncCommitFlush();
 
 but I bet lunch that that's not good enough.  I still haven't reproduced
 it, but I'm thinking that the inexact bookkeeping that we created for
 clog page LSNs allows tuples to not get marked if the right sort of
 timing of concurrent transactions happens.
 
 Not sure about the best solution for this.

Good hunch. I plugged this hole earlier, but on further inspection I can
see the plug wasn't wide enough. XLogAsyncCommitFlush() is good enough,
but HeapTupleSatisfiesVacuum() still allowed the inexact bookkeeping to
sometimes skip hint bit setting, when executed with concurrent
transactions touching other tables.

ISTM that if we call HeapTupleSatisfiesVacuum() with an additional
boolean parameter, force, we can tell VF to always set the hint bits in
every case, not just HEAP_MOVED_IN and HEAP_MOVED_OUT.

Patch enclosed, but a little crufty. Gotta run now, talk later.

-- 
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com
Index: src/backend/commands/vacuum.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/commands/vacuum.c,v
retrieving revision 1.354
diff -c -r1.354 vacuum.c
*** src/backend/commands/vacuum.c	1 Aug 2007 22:45:08 -	1.354
--- src/backend/commands/vacuum.c	9 Aug 2007 07:24:41 -
***
*** 1384,1390 
  			tuple.t_len = ItemIdGetLength(itemid);
  			ItemPointerSet((tuple.t_self), blkno, offnum);
  
! 			switch (HeapTupleSatisfiesVacuum(tuple.t_data, OldestXmin, buf))
  			{
  case HEAPTUPLE_DEAD:
  	tupgone = true;		/* we can delete the tuple */
--- 1384,1390 
  			tuple.t_len = ItemIdGetLength(itemid);
  			ItemPointerSet((tuple.t_self), blkno, offnum);
  
! 			switch (HeapTupleSatisfiesVacuumFull(tuple.t_data, OldestXmin, buf))
  			{
  case HEAPTUPLE_DEAD:
  	tupgone = true;		/* we can delete the tuple */
***
*** 1998,2004 
  		break;
  	}
  	/* must check for DEAD or MOVED_IN tuple, too */
! 	nextTstatus = HeapTupleSatisfiesVacuum(nextTdata,
  		   OldestXmin,
  		   nextBuf);
  	if (nextTstatus == HEAPTUPLE_DEAD ||
--- 1998,2004 
  		break;
  	}
  	/* must check for DEAD or MOVED_IN tuple, too */
! 	nextTstatus = HeapTupleSatisfiesVacuumFull(nextTdata,
  		   OldestXmin,
  		   nextBuf);
  	if (nextTstatus == HEAPTUPLE_DEAD ||
Index: src/backend/utils/time/tqual.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/utils/time/tqual.c,v
retrieving revision 1.103
diff -c -r1.103 tqual.c
*** src/backend/utils/time/tqual.c	1 Aug 2007 22:45:09 -	1.103
--- src/backend/utils/time/tqual.c	9 Aug 2007 07:24:43 -
***
*** 77,82 
--- 77,85 
  TransactionId RecentGlobalXmin = InvalidTransactionId;
  
  /* local functions */
+ static HTSV_Result HeapTupleSatisfiesVacuumInternal(HeapTupleHeader tuple, 
+ 		TransactionId OldestXmin, Buffer buffer, bool locked);
+ 
  static bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot);
  
  
***
*** 1045,1054 
   * deleted by XIDs >= OldestXmin are deemed "recently dead"; they might
   * still be visible to some open transaction, so we can't remove them,
   * even if we see that the deleting transaction has committed.
   */
  HTSV_Result
! HeapTupleSatisfiesVacuum(HeapTupleHeader tuple, TransactionId OldestXmin,
! 		 Buffer buffer)
  {
  	/*
  	 * Has inserting transaction committed?
--- 1048,1060 
   * deleted by XIDs >= OldestXmin are deemed "recently dead"; they might
   * still be visible to some open transaction, so we can't remove them,
   * even if we see that the deleting transaction has committed.
+  *
+  * If the heap we are checking is exclusively locked, we can skip 

Re: [HACKERS] Unexpected VACUUM FULL failure

2007-08-09 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 Good hunch. I plugged this hole earlier, but on further inspection I can
 see the plug wasn't wide enough. XLogAsyncCommitFlush() is good enough,
 but HeapTupleSatisfiesVacuum() still allowed the inexact bookkeeping to
 sometimes skip hint bit setting, when executed with concurrent
 transactions touching other tables.

 ISTM that if we call HeapTupleSatisfiesVacuum() with an additional
 boolean parameter, force, we can tell VF to always set the hint bits in
 every case, not just HEAP_MOVED_IN and HEAP_MOVED_OUT.

Surely this approach is no good: won't it allow hint bits to reach disk
in advance of their transaction?

I think it'd be safer, and a lot less ugly, to recast the tests in
VACUUM FULL.  If we make the first pass clear any old MOVED_IN/MOVED_OUT
bits then the last pass can key off those instead of assuming that
XMIN_COMMITTED is set everywhere.  Then we'd not need
XLogAsyncCommitFlush, which is a kluge anyway.
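
In outline, the recast would work something like this (a standalone model
only, not a patch against vacuum.c; the bit names here just stand in for the
real infomask flags):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define HINT_XMIN_COMMITTED 0x0100      /* stand-ins for the real infomask bits */
#define MOVED_OFF_BIT       0x0200
#define MOVED_IN_BIT        0x0400

typedef struct
{
    uint16_t    infomask;
} TupleModel;

/* First pass: wipe any stale MOVED bits left over from an old VACUUM FULL. */
static void
first_pass_clear_moved(TupleModel *tuples, int n)
{
    for (int i = 0; i < n; i++)
        tuples[i].infomask &= ~(MOVED_OFF_BIT | MOVED_IN_BIT);
}

/* The defrag step marks the tuples it relocates. */
static void
move_tuple(TupleModel *old_copy, TupleModel *new_copy)
{
    old_copy->infomask |= MOVED_OFF_BIT;
    new_copy->infomask |= MOVED_IN_BIT;
}

/*
 * The last pass then keys off the MOVED bits that this very VACUUM FULL
 * set, rather than assuming every tuple it did not move has its
 * XMIN_COMMITTED hint set.
 */
static bool
last_pass_should_remove(const TupleModel *tup)
{
    return (tup->infomask & MOVED_OFF_BIT) != 0;
}

int
main(void)
{
    TupleModel  page[3] = {{HINT_XMIN_COMMITTED}, {0}, {0}};

    first_pass_clear_moved(page, 3);
    move_tuple(&page[1], &page[2]);     /* relocate tuple 1 into slot 2 */

    for (int i = 0; i < 3; i++)
        printf("slot %d: remove old copy? %s\n",
               i, last_pass_should_remove(&page[i]) ? "yes" : "no");
    return 0;
}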

regards, tom lane



Re: [HACKERS] Unexpected VACUUM FULL failure

2007-08-09 Thread Gregory Stark

Tom Lane [EMAIL PROTECTED] writes:

 Simon Riggs [EMAIL PROTECTED] writes:

 ISTM that if we call HeapTupleSatisfiesVacuum() with an additional
 boolean parameter, force, we can tell VF to always set the hint bits in
 every case, not just HEAP_MOVED_IN and HEAP_MOVED_OUT.

 Surely this approach is no good: won't it allow hint bits to reach disk
 in advance of their transaction?

I don't think so since it sounds like he's saying to still sync the log and
VACUUM FULL has an exclusive lock on the table. So any committed (or aborted)
changes it sees in the table must have been committed or aborted before the
log sync.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [HACKERS] Unexpected VACUUM FULL failure

2007-08-08 Thread Alvaro Herrera
Tom Lane wrote:
 This is a bit disturbing:
 
 *** ./expected/vacuum.out Sat Jul 20 00:58:14 2002
 --- ./results/vacuum.out  Wed Aug  8 20:16:45 2007
 ***************
 *** 50,55 ****
 --- 50,56 ----
   
   DELETE FROM vactst WHERE i != 0;
   VACUUM FULL vactst;
 + ERROR:  HEAP_MOVED_OFF was expected
   DELETE FROM vactst;
   SELECT * FROM vactst;
i 
 
 ==
 
 This is today's CVS HEAD, plus some startup/shutdown logic changes in
 postmaster.c that hardly seem like they could be related.
 
 I couldn't reproduce it in a few tries.  A reasonable guess is that
 it's triggered by autovacuum deciding to vacuum the table sometime
 before the VACUUM FULL starts.  Anyone want to try to reproduce it?

Hum, aren't vacuums supposed to be blocked by each other?  Maybe this is
about a toast table not being locked enough against concurrent vacuuming
of the main table.

I'm currently away on vacation, which is why I've missed all the stuff
going on here.  Sorry for not letting people know.

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] Unexpected VACUUM FULL failure

2007-08-08 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 I couldn't reproduce it in a few tries.  A reasonable guess is that
 it's triggered by autovacuum deciding to vacuum the table sometime
 before the VACUUM FULL starts.  Anyone want to try to reproduce it?

 Hum, aren't vacuums supposed to be blocked by each other?

Sure.  I'm not thinking it's a case of concurrent vacuums (if it is,
we've got worse problems than anyone imagined), but rather that the
autovac left the table in a state that exposes a bug in the subsequent
VACUUM FULL.  Since we've whacked the tqual.c logic around recently,
the problem might actually lie there...

regards, tom lane



Re: [HACKERS] Unexpected VACUUM FULL failure

2007-08-08 Thread Tom Lane
I wrote:
 ... Since we've whacked the tqual.c logic around recently,
 the problem might actually lie there...

In fact, I bet this is a result of the async-commit patch.  The places
where vacuum.c bleats "HEAP_MOVED_OFF was expected" are all places where
it is looking at a tuple not marked XMIN_COMMITTED; it expects that
after its first pass over the table, *every* tuple is either
XMIN_COMMITTED or one that it moved.  Async commit changed tqual.c
so that tuples that are in fact known committed might not get marked
XMIN_COMMITTED right away.  The patch tries to prevent this from
happening within VACUUM FULL by means of

/* 
 * VACUUM FULL assumes that all tuple states are well-known prior to
 * moving tuples around --- see comment "known dead" in repair_frag(),
 * as well as simplifications in tqual.c.  So before we start we must
 * ensure that any asynchronously-committed transactions with changes
 * against this table have been flushed to disk.  It's sufficient to do
 * this once after we've acquired AccessExclusiveLock.
 */
XLogAsyncCommitFlush();

but I bet lunch that that's not good enough.  I still haven't reproduced
it, but I'm thinking that the inexact bookkeeping that we created for
clog page LSNs allows tuples to not get marked if the right sort of
timing of concurrent transactions happens.

Not sure about the best solution for this.
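
To illustrate the sort of thing I mean, here's a toy model of the bookkeeping
(invented names, granularity, and numbers; not the real clog.c code): only one
commit LSN is tracked per group of transactions, so a later async commit in
the same group can make an earlier, already-flushed transaction look
unflushed when we go to set its hint bits:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;
typedef uint32_t TransactionId;

#define XIDS_PER_GROUP  32              /* invented granularity for the model */
#define MAX_GROUPS      64

/* One tracked commit LSN per *group* of transactions, not per transaction. */
static XLogRecPtr group_commit_lsn[MAX_GROUPS];

/* How far the WAL has actually been flushed to disk. */
static XLogRecPtr wal_flushed_to = 0;

static void
record_async_commit(TransactionId xid, XLogRecPtr commit_lsn)
{
    int         group = xid / XIDS_PER_GROUP;

    if (commit_lsn > group_commit_lsn[group])
        group_commit_lsn[group] = commit_lsn;   /* only the max is kept */
}

/* Hint bits are allowed only when the tracked LSN is known to be on disk.
 * With synchronous commit the commit record is flushed before this is ever
 * consulted, so the test always passes; the trouble is async-only. */
static bool
hint_bit_allowed(TransactionId xid)
{
    return group_commit_lsn[xid / XIDS_PER_GROUP] <= wal_flushed_to;
}

int
main(void)
{
    record_async_commit(100, 500);      /* xact 100 commits at LSN 500 */
    wal_flushed_to = 600;               /* its commit record reaches disk */
    printf("xact 100, before xact 101: %s\n",
           hint_bit_allowed(100) ? "hint ok" : "hint refused");

    /* A later async commit in the same group bumps the shared LSN ... */
    record_async_commit(101, 900);
    /* ... so xact 100 now *looks* unflushed even though it is flushed. */
    printf("xact 100, after xact 101:  %s\n",
           hint_bit_allowed(100) ? "hint ok" : "hint refused");
    return 0;
}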

regards, tom lane
