Re: [HACKERS] Make relation_openrv atomic wrt DDL

2011-07-08 Thread Noah Misch
On Thu, Jul 07, 2011 at 09:30:47PM -0400, Robert Haas wrote:
 On Thu, Jul 7, 2011 at 3:54 PM, Noah Misch n...@2ndquadrant.com wrote:
  Yes.  DDL-DDL concurrency is a much smaller practical concern, but it is a
  quality-of-implementation hole.
 
 Agreed, although I'm not too pleased about the fact that this doesn't
 fix nextval().  That seems like a fairly significant hole (but one to
 be addressed by a later patch).

True.

  The third sentence is true for all lock levels.  The fourth sentence is true
  for lock levels _except_ NoLock.
 
 I rewrote this whole blurb.  See what you think.

Seems better:

 +  * DDL operations can change the results of a name lookup.  Since all
 +  * such operations will generate invalidation messages, we keep track
 +  * of whether any such messages show up while we're performing the
 +  * operation, and retry until either (1) no more invalidation messages
 +  * show up or (2) the answer doesn't change.
 +  *
 +  * But if lockmode = NoLock, then we assume that either the caller is OK
 +  * with the answer changing under them, or that they already hold some
 +  * appropriate lock, and therefore return the first answer we get without
 +  * checking for invalidation messages.  Also, if the requested lock is
 +  * already held, no LockRelationOid will not AcceptInvalidationMessages,

Extra word in that sentence.

 +  * so we may fail to notice a change.  We could protect against that case

I failed to note it last time, but it might be worth mentioning that failing to
notice a change only happens due to search_path interposition.

 +  * by calling AcceptInvalidationMessages() before beginning this loop,
 +  * but that would add a significant amount of overhead, so for now we don't.


  +             /*
  +              * If no lock requested, we assume the caller knows what they're
  +              * doing.  They should have already acquired a heavyweight lock on
  +              * this relation earlier in the processing of this same statement,
  +              * so it wouldn't be appropriate to AcceptInvalidationMessages()
  +              * here, as that might pull the rug out from under them.
  +              */
 
  What sort of rug-pullout do you have in mind?  Also, I don't think it matters
  if the caller acquired the lock during this _statement_.  It just needs to
  hold a lock, somehow.
 
 What I'm specifically concerned about here is the possibility that we
 have code which does RangeVarGetRelid() multiple times and expects to
 get the same relation every time.  Now, granted, any such places are
 bugs.  But I am not eager to change the logic here without looking
 harder for them (and also for performance reasons).

Yeah, I think a key point is that we're not promising to avoid calling
AcceptInvalidationMessages() to prop up code that relies on it not getting
called.  We just know it's not needed in this case, so we save that expense.
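
[For readers following along, the loop being discussed has roughly this shape
-- a paraphrase, not the committed text; the patch folds the loop into
RangeVarGetRelid() itself, the name-lookup step is shown as a hypothetical
LookupRelidByName() call, and SharedInvalidMessageCounter is the global the
patch introduces:

        uint64  inval_count;
        Oid     relOid;
        Oid     oldRelOid = InvalidOid;
        bool    retry = false;

        for (;;)
        {
                inval_count = SharedInvalidMessageCounter;
                relOid = LookupRelidByName(relation);   /* hypothetical lookup step */
                if (lockmode == NoLock)
                        break;          /* trust the caller's existing lock */
                if (retry && relOid == oldRelOid)
                        break;          /* same answer as last time: stable */
                if (OidIsValid(relOid))
                        LockRelationOid(relOid, lockmode);      /* may process inval */
                else
                        AcceptInvalidationMessages();   /* catch a late CREATE */
                if (inval_count == SharedInvalidMessageCounter)
                        break;          /* no new messages: answer is final */
                retry = true;
                oldRelOid = relOid;     /* lock on a stale OID stays held */
        }
]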

  ... also making it cleaner to preserve this optimization.
 
 That optimization is now gone anyway.

Okay.

  Incidentally, you've added in many places this pattern of commenting that a
  lock level must match another lock level used elsewhere.  Would it be better
  to migrate away from looking up a relation oid in one function and opening the
  relation in another function, instead passing either a Relation or a RangeVar?
  You did substitute passing a Relation in a couple of places.

[reasons this is a can of worms]
 At any rate, I'm not inventing this problem; there are plenty of
 existing instances where this same phenomenon occurs.  At least I'm
 documenting the ones I'm adding.  There's probably room for further
 improvement and restructuring of this code, but right at the moment I
 feel like the reasonable alternatives are (a) to pass a lock level
 that matches what will ultimately be taken later on or (b) pass
 NoLock.  I'm voting for the former.

Works for me.



Re: [HACKERS] Make relation_openrv atomic wrt DDL

2011-07-07 Thread Robert Haas
On Wed, Jul 6, 2011 at 10:44 PM, Noah Misch n...@2ndquadrant.com wrote:
 Attached.  I made the counter 64 bits wide, handled the nothing-found case per
 your idea, and improved a few comments cosmetically.  I have not attempted to
 improve the search_path interposition case.  We can recommend the workaround
 above, and doing better looks like an excursion much larger than the one
 represented by this patch.

I looked at this some more and started to get uncomfortable with the
whole idea of having RangeVarLockRelid() be a wrapper around
RangeVarGetRelid().  This hazard exists everywhere the latter function
gets called, not just in relation_open().  So it doesn't seem right to
fix the problem only in those places.

So I went through and incorporated the logic proposed for
RangeVarLockRelid() into RangeVarGetRelid() itself, and then went
through and examined all the callers of RangeVarGetRelid().  There are
some, such as has_table_privilege(), where it's really impractical to
take any lock, first because we might have no privileges at all on
that table and second because that could easily lead to a massive
amount of locking for no particular good reason.  I believe Tom
suggested that the right fix for these functions is to have them
index-scan the system catalogs using the caller's MVCC snapshot, which
would be right at least for pg_dump.  And there are other callers that
cannot acquire the lock as part of RangeVarGetRelid() for a variety of
other reasons.  However, having said that, there do appear to be a
number of cases that can be fixed fairly easily.

So here's a (heavily) updated patch that tries to do that, along with
adding comments to the places where things still need more fixing.  In
addition to the problems corrected by your last version, this fixes
LOCK TABLE, ALTER SEQUENCE, ALTER TABLE .. RENAME, the whole-table
variant of REINDEX, CREATE CONSTRAINT TRIGGER (which is flat-out wrong
as it stands, since it acquires *no lock at all* on the table
specified in the FROM clause, never mind the question of doing so
atomically), CREATE RULE, and (partially) DROP TRIGGER and DROP RULE.

Regardless of exactly how we decide to proceed here, it strikes me
that there is a heck of a lot more work that could stand to be done in
this area...  :-(

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


atomic-rangevargetrelid.patch
Description: Binary data



Re: [HACKERS] Make relation_openrv atomic wrt DDL

2011-07-07 Thread Noah Misch
On Thu, Jul 07, 2011 at 11:43:30AM -0400, Robert Haas wrote:
 On Wed, Jul 6, 2011 at 10:44 PM, Noah Misch n...@2ndquadrant.com wrote:
  Attached.  I made the counter 64 bits wide, handled the nothing-found case per
  your idea, and improved a few comments cosmetically.  I have not attempted to
  improve the search_path interposition case.  We can recommend the workaround
  above, and doing better looks like an excursion much larger than the one
  represented by this patch.
 
 I looked at this some more and started to get uncomfortable with the
 whole idea of having RangeVarLockRelid() be a wrapper around
 RangeVarGetRelid().  This hazard exists everywhere the latter function
 gets called, not just in relation_open().  So it doesn't seem right to
 fix the problem only in those places.

Yes; I wished to focus on the major case for this round, then address the
other callers later.  We can do it this way, though.

It does make long-term sense to expose only the lock-taking form, making
otherwise-unaffected callers say NoLock explicitly.
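
[Concretely, otherwise-unaffected call sites would change from something like

        relOid = RangeVarGetRelid(relation, false);

to an explicit

        relOid = RangeVarGetRelid(relation, NoLock, false);

-- assuming, as a sketch, that the lock mode slots in as a new second
parameter.]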

 So I went through and incorporated the logic proposed for
 RangeVarLockRelid() into RangeVarGetRelid() itself, and then went
 through and examined all the callers of RangeVarGetRelid().  There are
 some, such as has_table_privilege(), where it's really impractical to
 take any lock, first because we might have no privileges at all on
 that table and second because that could easily lead to a massive
 amount of locking for no particular good reason.  I believe Tom
 suggested that the right fix for these functions is to have them
 index-scan the system catalogs using the caller's MVCC snapshot, which
 would be right at least for pg_dump.  And there are other callers that
 cannot acquire the lock as part of RangeVarGetRelid() for a variety of
 other reasons.  However, having said that, there do appear to be a
 number of cases that can be fixed fairly easily.
 
 So here's a (heavily) updated patch that tries to do that, along with
 adding comments to the places where things still need more fixing.  In
 addition to the problems corrected by your last version, this fixes
 LOCK TABLE, ALTER SEQUENCE, ALTER TABLE .. RENAME, the whole-table
 variant of REINDEX, CREATE CONSTRAINT TRIGGER (which is flat-out wrong
 as it stands, since it acquires *no lock at all* on the table
 specified in the FROM clause, never mind the question of doing so
 atomically), CREATE RULE, and (partially) DROP TRIGGER and DROP RULE.

Looks basically sound; see a few code comments below.

 Regardless of exactly how we decide to proceed here, it strikes me
 that there is a heck of a lot more work that could stand to be done in
 this area...  :-(

Yes.  DDL-DDL concurrency is a much smaller practical concern, but it is a
quality-of-implementation hole.

 --- a/src/backend/catalog/namespace.c
 +++ b/src/backend/catalog/namespace.c

 @@ -238,37 +246,121 @@ RangeVarGetRelid(const RangeVar *relation, bool failOK)
   }
  
   /*
 -  * Some non-default relpersistence value may have been specified.  The
 -  * parser never generates such a RangeVar in simple DML, but it can happen
 -  * in contexts such as CREATE TEMP TABLE foo (f1 int PRIMARY KEY).  Such
 -  * a command will generate an added CREATE INDEX operation, which must be
 -  * careful to find the temp table, even when pg_temp is not first in the
 -  * search path.
 +  * If lockmode = NoLock, the caller is assumed to already hold some sort
 +  * of heavyweight lock on the target relation.  Otherwise, we're proceeding
 +  * here with no lock at all, which means that any answers we get must be
 +  * viewed with a certain degree of suspicion.  In particular, any time we
 +  * AcceptInvalidationMessages(), the answer might change.  We handle that
 +  * case by retrying the operation until either (1) no more invalidation
 +  * messages show up or (2) the answer doesn't change.

The third sentence is true for all lock levels.  The fourth sentence is true
for lock levels _except_ NoLock.

 + /*
 +  * If no lock requested, we assume the caller knows what they're
 +  * doing.  They should have already acquired a heavyweight lock on
 +  * this relation earlier in the processing of this same statement,
 +  * so it wouldn't be appropriate to AcceptInvalidationMessages()
 +  * here, as that might pull the rug out from under them.
 +  */

What sort of rug-pullout do you have in mind?  Also, I don't think it matters
if the caller acquired the lock during this _statement_.  It just needs to
hold a lock, somehow.

 --- a/src/backend/commands/lockcmds.c
 +++ b/src/backend/commands/lockcmds.c

 @@ -69,67 +70,10 @@ LockTableCommand(LockStmt *lockstmt)
   * rv is NULL and we should just silently ignore any dropped child rel.

This comment refers to a now-removed argument.

   */
  static void
 -LockTableRecurse(Oid 

Re: [HACKERS] Make relation_openrv atomic wrt DDL

2011-07-07 Thread Robert Haas
On Thu, Jul 7, 2011 at 3:54 PM, Noah Misch n...@2ndquadrant.com wrote:
 Yes.  DDL-DDL concurrency is a much smaller practical concern, but it is a
 quality-of-implementation hole.

Agreed, although I'm not too pleased about the fact that this doesn't
fix nextval().  That seems like a fairly significant hole (but one to
be addressed by a later patch).

 The third sentence is true for all lock levels.  The fourth sentence is true
 for lock levels _except_ NoLock.

I rewrote this whole blurb.  See what you think.

 +             /*
 +              * If no lock requested, we assume the caller knows what they're
 +              * doing.  They should have already acquired a heavyweight lock on
 +              * this relation earlier in the processing of this same statement,
 +              * so it wouldn't be appropriate to AcceptInvalidationMessages()
 +              * here, as that might pull the rug out from under them.
 +              */

 What sort of rug-pullout do you have in mind?  Also, I don't think it matters
 if the caller acquired the lock during this _statement_.  It just needs to
 hold a lock, somehow.

What I'm specifically concerned about here is the possibility that we
have code which does RangeVarGetRelid() multiple times and expects to
get the same relation every time.  Now, granted, any such places are
bugs.  But I am not eager to change the logic here without looking
harder for them (and also for performance reasons).

 ... also making it cleaner to preserve this optimization.

That optimization is now gone anyway.

 Incidentally, you've added in many places this pattern of commenting that a
 lock level must match another lock level used elsewhere.  Would it be better
 to migrate away from looking up a relation oid in one function and opening the
 relation in another function, instead passing either a Relation or a RangeVar?
 You did substitute passing a Relation in a couple of places.

Well, I've got these:

- ReindexTable() must match reindex_relation()
- ExecRenameStmt() must match RenameRelation()
- DropTrigger() must match RemoveTriggerById()
- DefineRule() must match DefineQueryRewrite()
- RemoveRewriteRule() must match RemoveRewriteRuleById()

(Whoever came up with these names was clearly a genius...)

RemoveTriggerById() and RemoveRewriteRuleById() are part of the
drop-statement machinery - they can be called either because that
rule/trigger itself gets dropped, or because some other object gets
dropped and it cascades to the rule/trigger.  I'm pretty sure I don't
want to go start mucking with that machinery.

On the other hand, RenameRelation() is only called in one place other
than ExecRenameStmt(), and it looks to me like the other caller could
just as well use RenameRelationInternal().  If we change that, then we
can redefine RenameRelation() to take a RangeVar instead of an Oid and
push the rest of the logic from ExecRenameStmt() into it.  That would
probably be better all around.
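
[Roughly -- signatures illustrative only -- that would turn

        void RenameRelation(Oid myrelid, const char *newrelname, ObjectType reltype);

into

        void RenameRelation(RangeVar *relation, const char *newrelname, ObjectType reltype);

with ExecRenameStmt() reduced to a thin wrapper around it.]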

The situation with DefineRule and DefineQueryRewrite is a bit more
nuanced.  As you suggest, those could probably be merged into one
function taking both an Oid and a RangeVar.  If the Oid is not
InvalidOid, we do heap_open(); else, we do heap_openrv().  That's
slightly ugly, but there are only two callers, so maybe it's not so
bad.
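
[Something like this hypothetical helper, shared by both callers -- the name
is made up, and AccessExclusiveLock is assumed to be the lock level the rule
code needs:

        static Relation
        open_rule_event_relation(Oid event_relid, RangeVar *event_rv)
        {
                if (OidIsValid(event_relid))
                        return heap_open(event_relid, AccessExclusiveLock);
                return heap_openrv(event_rv, AccessExclusiveLock);
        }
]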

Offhand, I don't see any good way to rearrange reindex_relation()
along those lines.  ReindexTable() wants to check permissions, not
just open the relation.  And passing a Relation to reindex_relation()
rather than an Oid or RangeVar is no good either; you still end up
with multiple people needing to know what lock level to use.

At any rate, I'm not inventing this problem; there are plenty of
existing instances where this same phenomenon occurs.  At least I'm
documenting the ones I'm adding.  There's probably room for further
improvement and restructuring of this code, but right at the moment I
feel like the reasonable alternatives are (a) to pass a lock level
that matches what will ultimately be taken later on or (b) pass
NoLock.  I'm voting for the former.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


atomic-rangevargetrelid-v2.patch
Description: Binary data



Re: [HACKERS] Make relation_openrv atomic wrt DDL

2011-07-06 Thread Robert Haas
On Sun, Jun 19, 2011 at 11:42 PM, Noah Misch n...@leadboat.com wrote:
 On Sun, Jun 19, 2011 at 11:44:35PM +0100, Greg Stark wrote:
 So I was the victim assigned to review this patch.

 Thanks for doing so.

This discussion seems to have died off.  Let's see if we can drive
this forward to some conclusion.

I took a look at this patch and found that it had bit-rotted slightly.  I am
attaching a rebased version.

Maybe this is a stupid idea, but what about changing the logic so
that, if we get back InvalidOid, we AcceptInvalidationMessages() and
retry if the counter has advanced?  ISTM that might cover the example
you mentioned in your last post, where we fail to detect a relation
that has come into existence since our last call to
AcceptInvalidationMessages().  It would cost an extra
AcceptInvalidationMessages() only in the case where we haven't found
the relation, which (a) seems like a good time to worry about whether
we're missing something, since users generally try not to reference
nonexistent tables and (b) should be rare enough to be ignorable from
a performance perspective.

In the department of minor nitpicking, why not use a 64-bit counter
for SharedInvalidMessageCounter?  Then we don't have to think very
hard about whether overflow can ever pose a problem.

It strikes me that, even with this patch, there is a fair amount of
room for wonky behavior.  For example, as your comment notes, if
search_path = foo, bar, and we've previously referenced x, getting
bar.x, the creation of foo.x will generate invalidation messages,
but a subsequent reference - within the same transaction - to x will
not cause us to read them.  It would be nice to
AcceptInvalidationMessages() unconditionally at the beginning of
RangeVarGetRelid() [and then redo as necessary to get a stable
answer], but that might have some performance consequence for
transactions that repeatedly read the same tables.
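
[To make the hazard concrete:

        -- session 1
        SET search_path = foo, bar;
        BEGIN;
        SELECT * FROM x;        -- resolves to bar.x; foo.x doesn't exist yet

        -- session 2
        CREATE TABLE foo.x (c int);

        -- session 1, same transaction
        SELECT * FROM x;        -- may still resolve to bar.x
        COMMIT;
]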

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


atomic-openrv-v3.patch
Description: Binary data



Re: [HACKERS] Make relation_openrv atomic wrt DDL

2011-07-06 Thread Noah Misch
On Wed, Jul 06, 2011 at 03:06:40PM -0400, Robert Haas wrote:
 On Sun, Jun 19, 2011 at 11:42 PM, Noah Misch n...@leadboat.com wrote:
  On Sun, Jun 19, 2011 at 11:44:35PM +0100, Greg Stark wrote:
  So I was the victim assigned to review this patch.
 
  Thanks for doing so.
 
 This discussion seems to have died off.  Let's see if we can drive
 this forward to some conclusion.
 
 I took a look at this patch and found that it had bit-rotted slightly.  I am
 attaching a rebased version.

Thanks.

 Maybe this is a stupid idea, but what about changing the logic so
 that, if we get back InvalidOid, we AcceptInvalidationMessages() and
 retry if the counter has advanced?  ISTM that might cover the example
 you mentioned in your last post, where we fail to detect a relation
 that has come into existence since our last call to
 AcceptInvalidationMessages().  It would cost an extra
 AcceptInvalidationMessages() only in the case where we haven't found
 the relation, which (a) seems like a good time to worry about whether
 we're missing something, since users generally try not to reference
 nonexistent tables and (b) should be rare enough to be ignorable from
 a performance perspective.

Agreed on all points.  Good idea.  That improves our guarantee from "any
client-issued command will see tables committed before its submission" to
"_any command_ will see tables committed before its _parsing_".  In
particular, commands submitted using SPI will no longer be subject to this
source of déjà vu.  I, too, doubt that looking up nonexistent relations is a
performance-critical operation for anyone.

 In the department of minor nitpicking, why not use a 64-bit counter
 for SharedInvalidMessageCounter?  Then we don't have to think very
 hard about whether overflow can ever pose a problem.

Overflow is fine because I only ever compare values for equality, and I use an
unsigned int to give defined behavior at overflow.  However, the added cost of
a 64-bit counter should be negligible, and future use cases (including
external code) might appreciate it.  No strong preference.

 It strikes me that, even with this patch, there is a fair amount of
 room for wonky behavior.  For example, as your comment notes, if
 search_path = foo, bar, and we've previously referenced x, getting
 bar.x, the creation of foo.x will generate invalidation messages,
 but a subsequent reference - within the same transaction - to x will
 not cause us to read them.  It would be nice to
 AcceptInvalidationMessages() unconditionally at the beginning of
 RangeVarGetRelid() [and then redo as necessary to get a stable
 answer], but that might have some performance consequence for
 transactions that repeatedly read the same tables.

A user doing that should LOCK bar.x in the transaction that creates foo.x,
giving a clean cutover.  (I thought of documenting that somewhere, but it
seemed a tad esoteric.)  In the absence of such a lock, an extra unconditional
call to AcceptInvalidationMessages() narrows the window in which his commands
parse as using the wrong table.  However, commands that have already parsed
will still use the old table without interruption, with no particular bound on
when they may finish.  I've failed to come up with a use case where the
narrower window for parse inconsistencies is valuable but the remaining
exposure is acceptable.  There may very well be one I'm missing, though.

While a mere LOCK bar.x is sufficient to get a clean cutover with respect to
parsing, it fails to invalidate plans.  To really cover all bases, you need
some no-op action that invalidates bar.x.  For actual practical use, I'd
recommend something like:

BEGIN;
ALTER TABLE bar.x RENAME TO x0;
ALTER TABLE bar.x0 RENAME TO x;
CREATE TABLE foo.x ...
COMMIT;

Probably worth making it more intuitive to DTRT here.

Thanks,
nm



Re: [HACKERS] Make relation_openrv atomic wrt DDL

2011-07-06 Thread Robert Haas
On Wed, Jul 6, 2011 at 6:32 PM, Noah Misch n...@2ndquadrant.com wrote:
 Maybe this is a stupid idea, but what about changing the logic so
 that, if we get back InvalidOid, we AcceptInvalidationMessages() and
 retry if the counter has advanced?  ISTM that might cover the example
 you mentioned in your last post, where we fail to detect a relation
 that has come into existence since our last call to
 AcceptInvalidationMessages().  It would cost an extra
 AcceptInvalidationMessages() only in the case where we haven't found
 the relation, which (a) seems like a good time to worry about whether
 we're missing something, since users generally try not to reference
 nonexistent tables and (b) should be rare enough to be ignorable from
 a performance perspective.

 Agreed on all points.  Good idea.  That improves our guarantee from "any
 client-issued command will see tables committed before its submission" to
 "_any command_ will see tables committed before its _parsing_".  In
 particular, commands submitted using SPI will no longer be subject to this
 source of déjà vu.  I, too, doubt that looking up nonexistent relations is a
 performance-critical operation for anyone.

 In the department of minor nitpicking, why not use a 64-bit counter
 for SharedInvalidMessageCounter?  Then we don't have to think very
 hard about whether overflow can ever pose a problem.

 Overflow is fine because I only ever compare values for equality, and I use an
 unsigned int to give defined behavior at overflow.  However, the added cost of
 a 64-bit counter should be negligible, and future use cases (including
 external code) might appreciate it.  No strong preference.

Yeah, that's what I was thinking.  I have a feeling we may want to use
this mechanism in other places, including places where it would be
nice to be able to assume that < has sensible semantics.
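
[That is, with 64 bits a consumer could safely write something like

        bool    advanced = (SharedInvalidMessageCounter > my_saved_count);

where my_saved_count is a hypothetical local snapshot -- whereas a 32-bit
counter is only safe for the equality tests the current patch uses, since
wraparound there is a realistic possibility.]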

 It strikes me that, even with this patch, there is a fair amount of
 room for wonky behavior.  For example, as your comment notes, if
 search_path = foo, bar, and we've previously referenced x, getting
 bar.x, the creation of foo.x will generate invalidation messages,
 but a subsequent reference - within the same transaction - to x will
 not cause us to read them.  It would be nice to
 AcceptInvalidationMessages() unconditionally at the beginning of
 RangeVarGetRelid() [and then redo as necessary to get a stable
 answer], but that might have some performance consequence for
 transactions that repeatedly read the same tables.

 A user doing that should LOCK bar.x in the transaction that creates foo.x,
 giving a clean cutover.  (I thought of documenting that somewhere, but it
 seemed a tad esoteric.)  In the absence of such a lock, an extra unconditional
 call to AcceptInvalidationMessages() narrows the window in which his commands
 parse as using the wrong table.  However, commands that have already parsed
 will still use the old table without interruption, with no particular bound on
 when they may finish.  I've failed to come up with a use case where the
 narrower window for parse inconsistencies is valuable but the remaining
 exposure is acceptable.  There may very well be one I'm missing, though.

 While a mere LOCK bar.x is sufficient to get a clean cutover with respect to
 parsing, it fails to invalidate plans.  To really cover all bases, you need
 some no-op action that invalidates bar.x.  For actual practical use, I'd
 recommend something like:

        BEGIN;
        ALTER TABLE bar.x RENAME TO x0;
        ALTER TABLE bar.x0 RENAME TO x;
        CREATE TABLE foo.x ...
        COMMIT;

 Probably worth making it more intuitive to DTRT here.

Well, what would be really nice is if it just worked.

Care to submit an updated patch?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Make relation_openrv atomic wrt DDL

2011-07-06 Thread Noah Misch
On Wed, Jul 06, 2011 at 08:35:55PM -0400, Robert Haas wrote:
 On Wed, Jul 6, 2011 at 6:32 PM, Noah Misch n...@2ndquadrant.com wrote:
  While a mere LOCK bar.x is sufficient to get a clean cutover with respect to
  parsing, it fails to invalidate plans.  To really cover all bases, you need
  some no-op action that invalidates bar.x.  For actual practical use, I'd
  recommend something like:
 
         BEGIN;
         ALTER TABLE bar.x RENAME TO x0;
         ALTER TABLE bar.x0 RENAME TO x;
         CREATE TABLE foo.x ...
         COMMIT;
 
  Probably worth making it more intuitive to DTRT here.
 
 Well, what would be really nice is if it just worked.

Yes.

 Care to submit an updated patch?

Attached.  I made the counter 64 bits wide, handled the nothing-found case per
your idea, and improved a few comments cosmetically.  I have not attempted to
improve the search_path interposition case.  We can recommend the workaround
above, and doing better looks like an excursion much larger than the one
represented by this patch.

Thanks,
nm
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index c9b1d5f..a345e39 100644
*** a/src/backend/access/heap/heapam.c
--- b/src/backend/access/heap/heapam.c
***
*** 975,1000  relation_openrv(const RangeVar *relation, LOCKMODE lockmode)
  {
Oid relOid;
  
!   /*
!* Check for shared-cache-inval messages before trying to open the
!* relation.  This is needed to cover the case where the name identifies a
!* rel that has been dropped and recreated since the start of our
!* transaction: if we don't flush the old syscache entry then we'll latch
!* onto that entry and suffer an error when we do RelationIdGetRelation.
!* Note that relation_open does not need to do this, since a relation's
!* OID never changes.
!*
!* We skip this if asked for NoLock, on the assumption that the caller has
!* already ensured some appropriate lock is held.
!*/
!   if (lockmode != NoLock)
!   AcceptInvalidationMessages();
! 
!   /* Look up the appropriate relation using namespace search */
!   relOid = RangeVarGetRelid(relation, false);
  
/* Let relation_open do the rest */
!   return relation_open(relOid, lockmode);
  }
  
  /* 
--- 975,985 
  {
Oid relOid;
  
!   /* Look up and lock the appropriate relation using namespace search */
!   relOid = RangeVarLockRelid(relation, lockmode, false);
  
/* Let relation_open do the rest */
!   return relation_open(relOid, NoLock);
  }
  
  /* 
***
*** 1012,1041  relation_openrv_extended(const RangeVar *relation, LOCKMODE lockmode,
  {
Oid relOid;
  
!   /*
!* Check for shared-cache-inval messages before trying to open the
!* relation.  This is needed to cover the case where the name identifies a
!* rel that has been dropped and recreated since the start of our
!* transaction: if we don't flush the old syscache entry then we'll latch
!* onto that entry and suffer an error when we do RelationIdGetRelation.
!* Note that relation_open does not need to do this, since a relation's
!* OID never changes.
!*
!* We skip this if asked for NoLock, on the assumption that the caller has
!* already ensured some appropriate lock is held.
!*/
!   if (lockmode != NoLock)
!   AcceptInvalidationMessages();
! 
!   /* Look up the appropriate relation using namespace search */
!   relOid = RangeVarGetRelid(relation, missing_ok);
  
/* Return NULL on not-found */
if (!OidIsValid(relOid))
return NULL;
  
/* Let relation_open do the rest */
!   return relation_open(relOid, lockmode);
  }
  
  /* 
--- 997,1011 
  {
Oid relOid;
  
!   /* Look up and lock the appropriate relation using namespace search */
!   relOid = RangeVarLockRelid(relation, lockmode, missing_ok);
  
/* Return NULL on not-found */
if (!OidIsValid(relOid))
return NULL;
  
/* Let relation_open do the rest */
!   return relation_open(relOid, NoLock);
  }
  
  /* 
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index ce795a6..f75fcef 100644
*** a/src/backend/catalog/namespace.c
--- b/src/backend/catalog/namespace.c
***
*** 44,49 
--- 44,51 
  #include parser/parse_func.h
  #include storage/backendid.h
  #include storage/ipc.h
+ #include storage/lmgr.h
+ #include storage/sinval.h
  #include utils/acl.h
  #include utils/builtins.h
  #include utils/guc.h
***
*** 285,290  RangeVarGetRelid(const RangeVar *relation, bool failOK)
--- 287,372 
  }
  
  /*
+  * RangeVarLockRelid
+  *Like 

Re: [HACKERS] Make relation_openrv atomic wrt DDL

2011-06-19 Thread Greg Stark
So I was the victim assigned to review this patch.

The code is pretty much impeccable as usual from Noah. But I have
questions about the semantics of it.

Firstly this bit makes me wonder:

+   /* Not-found is always final. */
+   if (!OidIsValid(relOid1))
+   return relOid1;

If someone does

BEGIN; DROP TABLE foo; CREATE TABLE foo; COMMIT;

Then what prevents this logic from finding the doomed relation,
blocking until the transaction commits, then finding it's deleted and
returning InvalidOid?
RangeVarGetRelid is just going to complete its index scan of pg_class
and may not come across the newly inserted row.

Am I missing something? I would have expected to have to loop around
and retry if no valid record is found. But this raises the question --
if no lock was acquired then what would have triggered an
AcceptInvalidationMessages and how would we know we waited long
enough to find out about the newly created table?

As a side note, if there is a long stream of such concurrent DDL then
this code will leave all the old versions locked. This is consistent
with our hold locks until end of transaction semantics but it seems
weird for tables that we locked accidentally and didn't really end
up using at all. I'm not sure it's really bad though.



Re: [HACKERS] Make relation_openrv atomic wrt DDL

2011-06-19 Thread Noah Misch
Hi Greg,

On Sun, Jun 19, 2011 at 11:44:35PM +0100, Greg Stark wrote:
 So I was the victim assigned to review this patch.

Thanks for doing so.

 The code is pretty much impeccable as usual from Noah. But I have
 questions about the semantics of it.
 
 Firstly this bit makes me wonder:
 
 +   /* Not-found is always final. */
 +   if (!OidIsValid(relOid1))
 +   return relOid1;
 
 If someone does
 
 BEGIN; DROP TABLE foo; CREATE TABLE foo; COMMIT;
 
 Then what prevents this logic from finding the doomed relation,
 blocking until the transaction commits, then finding it's deleted and
 returning InvalidOid?
 RangeVarGetRelid is just going to complete its index scan of pg_class
 and may not come across the newly inserted row.

RangeVarGetRelid() always runs its index scan to completion, and the blocking
happens in LockRelationOid().  You will get a sequence like this:

RangeVarGetRelid(foo) = 2
LockRelationOid(2) [... blocks ...]
AcceptInvalidationMessages() [process a message]
RangeVarGetRelid(foo) = 20001
[restart loop]
LockRelationOid(20001)
AcceptInvalidationMessages() [no new messages - done]

RangeVarGetRelid() *is* vulnerable to the problem Simon just reported in the
"ALTER TABLE lock strength reduction patch is unsafe" thread, which arises
when the DDL transaction actually commits in the middle of a concurrent system
table scan.  I don't think this patch makes things better or worse in that
regard, but I haven't thought it through in great depth.

 Am I missing something? I would have expected to have to loop around
 and retry if no valid record is found. But this raises the question --
 if no lock was acquired then what would have triggered an
 AcceptInvalidatationMessages and how would we know we waited long
 enough to find out about the newly created table?

Good question.  I think characterizing "long enough" quickly leads to defining
one or more sequence points after which all backends must recognize a new
table as existing.  My greenfield definition would be "a command should see
precisely the tables visible to its MVCC snapshot", but that has practical
problems.  Let's see what implementation concerns would suggest...

This leads to a case I had not considered explicitly: CREATE TABLE on a name
that has not recently mapped to any table.  If the catcache has a negative
entry on the key in question, we will rely on that and miss the new table
until we call AcceptInvalidationMessages() somehow.  To hit this, you need a
command that dynamically chooses to query a table that has been created since
the command started running.  DROP/CREATE of the same name in a single
transaction can't hit the problem.  Consider this test script:

psql -X <<\_EOSQL &
-- Cleanup from last run
DROP TABLE IF EXISTS public.foo;

BEGIN;

-- Create the neg catcache entry.
SAVEPOINT q;
SELECT 1 FROM public.foo;
ROLLBACK to q;

--SET client_min_messages = debug5; -- use with CACHEDEBUG for insight
DO $$
BEGIN
EXECUTE 'SELECT 1 FROM pg_am'; -- prime basic catcache entries
PERFORM pg_sleep(11);
EXECUTE 'SELECT 1 FROM public.foo';
END
$$;
_EOSQL

sleep 1
psql -Xc 'CREATE TABLE public.foo ()'

wait

The first backend fails to see the new table despite its creating transaction
having committed ~10s ago.  Starting a transaction, beginning to process a new
client-issued command, or successfully locking any relation prevents the miss.
We could narrow the window in most cases by re-adding a call to
AcceptInvalidationMessages() before RangeVarLockRelid()'s first call to
RangeVarGetRelid().  My current thinking is that it's not worth adding that
cost to every RangeVarLockRelid().  Thus, specify that, minimally, each
client-issued command will see all tables whose names were occupied at the
time the command started.  I would add a comment to that effect.  Thoughts?

 As a side note, if there is a long stream of such concurrent DDL then
 this code will leave all the old versions locked. This is consistent
 with our hold locks until end of transaction semantics but it seems
 weird for tables that we locked accidentally and didn't really end
 up using at all. I'm not sure it's really bad though.

Yes.  If that outcome were more common, this would be a good place to try
relaxing the rule.

nm



Re: [HACKERS] Make relation_openrv atomic wrt DDL

2011-06-14 Thread Robert Haas
On Mon, Jun 13, 2011 at 4:04 PM, Noah Misch n...@leadboat.com wrote:
 On Mon, Jun 13, 2011 at 08:21:05AM -0400, Robert Haas wrote:
 On Mon, Jun 13, 2011 at 1:12 AM, Noah Misch n...@leadboat.com wrote:
  This probably would not replace a backend-local counter of processed messages
  for RangeVarLockRelid()'s purposes.  It's quite possibly a good way to reduce
  SInvalReadLock traffic, though.

 I was imagining one shared global counter, not one per backend, and
 thinking that each backend could do something like:

 volatile uint32 *the_global_counter = &global_counter;
 uint32 latest_counter;
 mfence();
 latest_counter = *the_global_counter;
 if (latest_counter != previous_value_of_global_counter || myprocstate->isReset)
    really_do_it();
 previous_value_of_global_counter = latest_counter;

 I'm not immediately seeing why that wouldn't work for your purposes as well.

 That takes us back to the problem of answering the (somewhat rephrased) question
 "Did any call to AcceptInvalidationMessages() between code point A and code
 point B call really_do_it()?" in a way not prone to breaking when new calls to
 AcceptInvalidationMessages(), perhaps indirectly, get added.  That's what the
 local counter achieved.  To achieve that, previous_value_of_global_counter would
 need to be exposed outside sinval.c.  That leaves us with a backend-local
 counter updated in a different fashion.  I might be missing something...

I see your point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Make relation_openrv atomic wrt DDL

2011-06-13 Thread Robert Haas
On Mon, Jun 13, 2011 at 1:12 AM, Noah Misch n...@leadboat.com wrote:
 That might be a start, but it's not a complete replacement for the global
 counter.  AcceptInvalidationMessages() is actually called in LockRelationOid(),
 but the comparison needs to happen a level up in RangeVarLockRelid().  So, we
 would be adding encapsulation in one place to lose it in another.  Also, in the
 uncontended case, the patch only calls AcceptInvalidationMessages() once per
 relation_openrv.  It compares the counter after that call with a counter as the
 last caller left it -- RangeVarLockRelid() doesn't care who that caller was.

Hmm, OK.

 Taking that a bit further, what if we put that counter in
 shared-memory?  After writing new messages into the queue, a writer
 would bump this count (only one process can be doing this at a time
 because SInvalWriteLock is held) and memory-fence.  Readers would
 memory-fence and then read the count before acquiring the lock.  If it
 hasn't changed since we last read it, then don't bother acquiring
 SInvalReadLock, because no new messages have arrived.  Or maybe an
 exact multiple of 2^32 messages have arrived, but there's probably
 someway to finesse around that issue, like maybe also using some kind
 of memory barrier to allow resetState to be checked without the lock.

 This probably would not replace a backend-local counter of processed messages
 for RangeVarLockRelid()'s purposes.  It's quite possibly a good way to reduce
 SInvalReadLock traffic, though.

 Exact multiples of 2^32 messages need not be a problem, because the queue is
 limited to MAXNUMMESSAGES (4096, currently).  I think you will need to pack into
 one 32-bit value all data each backend needs to decide whether to proceed with
 the full process.  Given that queue offsets fit into 13 bits (easily reduced to
 12) and resetState is a bit, that seems practical enough at first glance.

I was imagining one shared global counter, not one per backend, and
thinking that each backend could do something like:

volatile uint32 *the_global_counter = &global_counter;
uint32 latest_counter;
mfence();
latest_counter = *the_global_counter;
if (latest_counter != previous_value_of_global_counter || myprocstate->isReset)
   really_do_it();
previous_value_of_global_counter = latest_counter;

I'm not immediately seeing why that wouldn't work for your purposes as well.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Make relation_openrv atomic wrt DDL

2011-06-13 Thread Noah Misch
On Mon, Jun 13, 2011 at 08:21:05AM -0400, Robert Haas wrote:
 On Mon, Jun 13, 2011 at 1:12 AM, Noah Misch n...@leadboat.com wrote:
  This probably would not replace a backend-local counter of processed messages
  for RangeVarLockRelid()'s purposes.  It's quite possibly a good way to reduce
  SInvalReadLock traffic, though.

 I was imagining one shared global counter, not one per backend, and
 thinking that each backend could do something like:
 
 volatile uint32 *the_global_counter = &global_counter;
 uint32 latest_counter;
 mfence();
 latest_counter = *the_global_counter;
 if (latest_counter != previous_value_of_global_counter || myprocstate->isReset)
    really_do_it();
 previous_value_of_global_counter = latest_counter;
 
 I'm not immediately seeing why that wouldn't work for your purposes as well.

That takes us back to the problem of answering the (somewhat rephrased) question
"Did any call to AcceptInvalidationMessages() between code point A and code
point B call really_do_it()?" in a way not prone to breaking when new calls to
AcceptInvalidationMessages(), perhaps indirectly, get added.  That's what the
local counter achieved.  To achieve that, previous_value_of_global_counter would
need to be exposed outside sinval.c.  That leaves us with a backend-local
counter updated in a different fashion.  I might be missing something...



[HACKERS] Make relation_openrv atomic wrt DDL

2011-06-12 Thread Noah Misch
On Thu, Jan 20, 2011 at 09:48:12PM -0500, Robert Haas wrote:
 On Thu, Jan 20, 2011 at 6:19 PM, Noah Misch n...@leadboat.com wrote:
  On Thu, Jan 20, 2011 at 09:36:11PM +, Simon Riggs wrote:
  I agree that the DDL behaviour is wrong and should be fixed. Thank you
  for championing that alternative view.
 
  Swapping based upon names only works and is very flexible, much more so
  than EXCHANGE could be.
 
  A separate utility might be worth it, but the feature set of that should
  be defined in terms of correctly-working DDL behaviour. It's possible
  that no further requirement exists. I remove my own patch from
  consideration for this release.
 
  I'll review your patch and commit it, problems or objections excepted. I
  haven't looked at it in any detail.
 
  Thanks.  I wouldn't be very surprised if that patch is even the wrong way to
  achieve these semantics, but it's great that we're on the same page as to which
  semantics they are.
 
 I think Noah's patch is not a good idea, because it will result in
 calling RangeVarGetRelid twice even in the overwhelmingly common case
 where no relevant invalidation occurs.  That'll add several syscache
 lookups per table to very common operations.

Valid concern.

[Refresher: this was a patch to improve behavior for this test case:

psql -c "CREATE TABLE t AS VALUES ('old'); CREATE TABLE new_t AS VALUES ('new')"
psql -c 'SELECT pg_sleep(2) FROM t' &   # block the ALTER or DROP briefly
sleep 1                                 # give prev time to take AccessShareLock

# Do it this way, and the next SELECT gets data from the old table.
psql -c 'ALTER TABLE t RENAME TO old_t; ALTER TABLE new_t RENAME TO t'
# Do it this way, and get: ERROR:  could not open relation with OID 41380
#psql -c 'DROP TABLE t; ALTER TABLE new_t RENAME TO t'

psql -c 'SELECT * FROM t'               # I get 'old' or an error, never 'new'.
psql -c 'DROP TABLE IF EXISTS t, old_t, new_t'

It did so by rechecking the RangeVar-oid resolution after locking the found
relation, by which time concurrent DDL could no longer be interfering.]

I benchmarked the original patch with this function:

Datum
nmtest(PG_FUNCTION_ARGS)
{
        int32           n = PG_GETARG_INT32(0);
        int             i;
        RangeVar   *rv = makeRangeVar(NULL, "pg_am", 0);

        for (i = 0; i < n; ++i)
        {
                Relation        r = relation_openrv(rv, AccessShareLock);

                relation_close(r, AccessShareLock);
        }
        PG_RETURN_INT32(4);
}

(Releasing the lock before transaction end makes for an unrealistic benchmark,
but so would opening the same relation millions of times in a single
transaction.  I'm trying to isolate the cost that would be spread over
millions of transactions opening relations a handful of times.  See attached
shar archive for a complete extension wrapping that test function.)

Indeed, the original patch slowed it by about 50%.  I improved the patch,
adding a global SharedInvalidMessageCounter to increment as we process
messages.  If this counter does not change between the RangeVarGetRelid() call
and the post-lock AcceptInvalidationMessages() call, we can skip the second
RangeVarGetRelid() call.  With the updated patch, I get these timings (in ms)
for runs of SELECT nmtest(1000):

master: 19697.642, 20087.477, 19748.995
patched: 19723.716, 19879.538, 20257.671

In other words, no significant difference.  Since the patch removes the
no-longer-needed pre-lock call to AcceptInvalidationMessages(), changing to
relation_close(r, NoLock) in the test case actually reveals a 15%
performance improvement.  Granted, nothing to get excited about in light of
the artificiality.

This semantic improvement would be hard to test with the current pg_regress
suite, so I do not include any test case addition in the patch.  If
sufficiently important, it could be done with isolationtester.

Thanks,
nm
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 01a492e..63537fd 100644
*** a/src/backend/access/heap/heapam.c
--- b/src/backend/access/heap/heapam.c
***
*** 979,1004  relation_openrv(const RangeVar *relation, LOCKMODE lockmode)
  {
Oid relOid;
  
!   /*
!* Check for shared-cache-inval messages before trying to open the
!* relation.  This is needed to cover the case where the name identifies a
!* rel that has been dropped and recreated since the start of our
!* transaction: if we don't flush the old syscache entry then we'll latch
!* onto that entry and suffer an error when we do RelationIdGetRelation.
!* Note that relation_open does not need to do this, since a relation's
!* OID never changes.
!*
!* 

Re: [HACKERS] Make relation_openrv atomic wrt DDL

2011-06-12 Thread Robert Haas
On Sun, Jun 12, 2011 at 3:18 PM, Noah Misch n...@leadboat.com wrote:
 Indeed, the original patch slowed it by about 50%.  I improved the patch,
 adding a global SharedInvalidMessageCounter to increment as we process
 messages.  If this counter does not change between the RangeVarGetRelid() call
 and the post-lock AcceptInvalidationMessages() call, we can skip the second
 RangeVarGetRelid() call.  With the updated patch, I get these timings (in ms)
 for runs of SELECT nmtest(1000):

 master: 19697.642, 20087.477, 19748.995
 patched: 19723.716, 19879.538, 20257.671

 In other words, no significant difference.  Since the patch removes the
 no-longer-needed pre-lock call to AcceptInvalidationMessages(), changing to
 relation_close(r, NoLock) in the test case actually reveals a 15%
 performance improvement.  Granted, nothing to get excited about in light of
 the artificiality.

In point of fact, given the not-so-artificial results I just posted on
another thread (lazy vxid locks) I'm *very* excited about trying to
reduce the cost of AcceptInvalidationMessages().  I haven't reviewed
your patch in detail, but is there a way we can encapsulate the
knowledge of the invalidation system down inside the sinval machinery,
rather than letting the heap code have to know directly about the
counter?  Perhaps AIV() could return true or false depending on
whether any invalidation messages were processed, or somesuch.

 This semantic improvement would be hard to test with the current pg_regress
 suite, so I do not include any test case addition in the patch.  If
 sufficiently important, it could be done with isolationtester.

I haven't had a chance to look closely at the isolation tester yet,
but I'm excited about the possibilities for testing this sort of
thing.  Not sure whether it's worth including this or not, but it
doesn't seem like a bad idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Make relation_openrv atomic wrt DDL

2011-06-12 Thread Noah Misch
On Sun, Jun 12, 2011 at 06:20:53PM -0400, Robert Haas wrote:
 On Sun, Jun 12, 2011 at 3:18 PM, Noah Misch n...@leadboat.com wrote:
  Indeed, the original patch slowed it by about 50%.  I improved the patch,
  adding a global SharedInvalidMessageCounter to increment as we process
  messages.  If this counter does not change between the RangeVarGetRelid() call
  and the post-lock AcceptInvalidationMessages() call, we can skip the second
  RangeVarGetRelid() call.  With the updated patch, I get these timings (in ms)
  for runs of SELECT nmtest(1000):
 
  master: 19697.642, 20087.477, 19748.995
  patched: 19723.716, 19879.538, 20257.671
 
  In other words, no significant difference.  Since the patch removes the
  no-longer-needed pre-lock call to AcceptInvalidationMessages(), changing to
  relation_close(r, NoLock) in the test case actually reveals a 15%
  performance improvement.  Granted, nothing to get excited about in light of
  the artificiality.
 
 In point of fact, given the not-so-artificial results I just posted on
 another thread (lazy vxid locks) I'm *very* excited about trying to
 reduce the cost of AcceptInvalidationMessages().

Quite interesting.  A quick look suggests there is room for optimization there.

 I haven't reviewed
 your patch in detail, but is there a way we can encapsulate the
 knowledge of the invalidation system down inside the sinval machinery,
 rather than letting the heap code have to know directly about the
 counter?  Perhaps AIV() could return true or false depending on
 whether any invalidation messages were processed, or somesuch.

I actually did it exactly that way originally.  The problem was the return value
only applying to the given call, while I wished to answer a question like "Did
any call to AcceptInvalidationMessages() between code point A and code point B
process a message?"  Adding AcceptInvalidationMessages() calls to code between A
and B would make the return value test yield a false negative.  A global counter
was the best thing I could come up with that avoided this hazard.
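
[The pattern the counter supports, sketched with a hypothetical snapshot
variable:

        uint64  count_at_A = SharedInvalidMessageCounter;       /* code point A */

        /* ... any code; every AcceptInvalidationMessages() call in between,
         * direct or indirect, bumps the counter as it processes messages ... */

        if (SharedInvalidMessageCounter != count_at_A)          /* code point B */
        {
                /* at least one message was processed between A and B */
        }
]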



Re: [HACKERS] Make relation_openrv atomic wrt DDL

2011-06-12 Thread Robert Haas
On Sun, Jun 12, 2011 at 9:23 PM, Noah Misch n...@leadboat.com wrote:
 I haven't reviewed
 your patch in detail, but is there a way we can encapsulate the
 knowledge of the invalidation system down inside the sinval machinery,
 rather than letting the heap code have to know directly about the
 counter?  Perhaps AIV() could return true or false depending on
 whether any invalidation messages were processed, or somesuch.

 I actually did it exactly that way originally.  The problem was the return value
 only applying to the given call, while I wished to answer a question like "Did
 any call to AcceptInvalidationMessages() between code point A and code point B
 process a message?"  Adding AcceptInvalidationMessages() calls to code between A
 and B would make the return value test yield a false negative.  A global counter
 was the best thing I could come up with that avoided this hazard.

Oh, interesting point.  What if AcceptInvalidationMessages returns the
counter?  Maybe with typedef uint32 InvalidationPositionId or
something like that, to make it partially self-documenting, and
greppable.

Taking that a bit further, what if we put that counter in
shared-memory?  After writing new messages into the queue, a writer
would bump this count (only one process can be doing this at a time
because SInvalWriteLock is held) and memory-fence.  Readers would
memory-fence and then read the count before acquiring the lock.  If it
hasn't changed since we last read it, then don't bother acquiring
SInvalReadLock, because no new messages have arrived.  Or maybe an
exact multiple of 2^32 messages have arrived, but there's probably
someway to finesse around that issue, like maybe also using some kind
of memory barrier to allow resetState to be checked without the lock.
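
[Sketched out, with illustrative names and mfence() standing in for whatever
barrier primitive we would actually use:

        /* writer, already holding SInvalWriteLock: */
        SIInsertDataEntries(msgs, n);   /* append messages to the queue */
        mfence();                       /* publish them before bumping the count */
        shared_msg_count++;

        /* reader, before acquiring SInvalReadLock: */
        mfence();
        if (shared_msg_count == my_last_seen_count)
                return;                 /* no new messages: skip SInvalReadLock */
        my_last_seen_count = shared_msg_count;
]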

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Make relation_openrv atomic wrt DDL

2011-06-12 Thread Noah Misch
On Sun, Jun 12, 2011 at 10:56:41PM -0400, Robert Haas wrote:
 On Sun, Jun 12, 2011 at 9:23 PM, Noah Misch n...@leadboat.com wrote:
  I haven't reviewed
  your patch in detail, but is there a way we can encapsulate the
  knowledge of the invalidation system down inside the sinval machinery,
  rather than letting the heap code have to know directly about the
  counter?  Perhaps AIV() could return true or false depending on
  whether any invalidation messages were processed, or somesuch.
 
  I actually did it exactly that way originally.  The problem was the return value
  only applying to the given call, while I wished to answer a question like "Did
  any call to AcceptInvalidationMessages() between code point A and code point B
  process a message?"  Adding AcceptInvalidationMessages() calls to code between A
  and B would make the return value test yield a false negative.  A global counter
  was the best thing I could come up with that avoided this hazard.
 
 Oh, interesting point.  What if AcceptInvalidationMessages returns the
 counter?  Maybe with typedef uint32 InvalidationPositionId or
 something like that, to make it partially self-documenting, and
 greppable.

That might be a start, but it's not a complete replacement for the global
counter.  AcceptInvalidationMessages() is actually called in LockRelationOid(),
but the comparison needs to happen a level up in RangeVarLockRelid().  So, we
would be adding encapsulation in one place to lose it in another.  Also, in the
uncontended case, the patch only calls AcceptInvalidationMessages() once per
relation_openrv.  It compares the counter after that call with a counter as the
last caller left it -- RangeVarLockRelid() doesn't care who that caller was.

 Taking that a bit further, what if we put that counter in
 shared-memory?  After writing new messages into the queue, a writer
 would bump this count (only one process can be doing this at a time
 because SInvalWriteLock is held) and memory-fence.  Readers would
 memory-fence and then read the count before acquiring the lock.  If it
 hasn't changed since we last read it, then don't bother acquiring
 SInvalReadLock, because no new messages have arrived.  Or maybe an
 exact multiple of 2^32 messages have arrived, but there's probably
 someway to finesse around that issue, like maybe also using some kind
 of memory barrier to allow resetState to be checked without the lock.

This probably would not replace a backend-local counter of processed messages
for RangeVarLockRelid()'s purposes.  It's quite possibly a good way to reduce
SInvalReadLock traffic, though.

Exact multiples of 2^32 messages need not be a problem, because the queue is
limited to MAXNUMMESSAGES (4096, currently).  I think you will need to pack into
one 32-bit value all data each backend needs to decide whether to proceed with
the full process.  Given that queue offsets fit into 13 bits (easily reduced to
12) and resetState is a bit, that seems practical enough at first glance.
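
[For concreteness, the packing might look like this -- names and exact layout
illustrative only:

        #define SINVAL_OFFSET_BITS      12      /* offsets 0..MAXNUMMESSAGES-1 fit in 12 bits */
        #define SINVAL_OFFSET_MASK      ((1U << SINVAL_OFFSET_BITS) - 1)
        #define SINVAL_RESET_FLAG       (1U << SINVAL_OFFSET_BITS)

        static inline uint32
        pack_sinval_state(uint32 queue_offset, bool resetState)
        {
                return (queue_offset & SINVAL_OFFSET_MASK) |
                           (resetState ? SINVAL_RESET_FLAG : 0);
        }
]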

nm
