Re: [HACKERS] ALTER TABLE ... REPLACE WITH

Noah Misch Wed, 19 Jan 2011 14:47:24 -0800

Hi Simon,

I'm reviewing this patch for CommitFest 2011-01.

On Sat, Jan 15, 2011 at 10:02:03PM +0000, Simon Riggs wrote:
> On Tue, 2010-12-14 at 19:48 +0000, Simon Riggs wrote:
> > REPLACE TABLE ying WITH yang
> Patch. Needs work.

First, I'd like to note that the thread for this patch had *four* "me-too"
responses to the use case.  That's extremely unusual; the subject is definitely
compelling to people.  It addresses the bad behavior of natural attempts to
atomically swap two tables in the namespace:

        psql -c "CREATE TABLE t AS VALUES ('old'); CREATE TABLE new_t AS VALUES 
('new')"
        psql -c 'SELECT pg_sleep(2) FROM t' & # block the ALTER or DROP briefly
        sleep 1                                                           # 
give prev time to take AccessShareLock

        # Do it this way, and the next SELECT gets data from the old table.
        #psql -c 'ALTER TABLE t RENAME TO old_t; ALTER TABLE new_t RENAME TO t' 
&
        # Do it this way, and get: ERROR:  could not open relation with OID 
41380
        psql -c 'DROP TABLE t; ALTER TABLE new_t RENAME TO t' &

        psql -c 'SELECT * FROM t'               # I get 'old' or an error, 
never 'new'.
        psql -c 'DROP TABLE IF EXISTS t, old_t, new_t'

by letting you do this instead:

        psql -c "CREATE TABLE t AS VALUES ('old'); CREATE TABLE new_t AS VALUES 
('new')"
        psql -c 'SELECT pg_sleep(2) FROM t' & # block the ALTER or DROP briefly
        sleep 1                                                           # 
give prev time to take AccessShareLock

        psql -c 'EXCHANGE TABLE new_t TO t &

        psql -c 'SELECT * FROM t'               # I get 'new', finally!
        psql -c 'DROP TABLE IF EXISTS t, new_t'

I find Heikki's (4d07c6ec.2030...@enterprisedb.com) suggestion from the thread
interesting: can we just make the first example work?  Even granting that the
second syntax may be a useful addition, the existing behavior of the first
example is surely worthless, even actively harmful.  I tossed together a
proof-of-concept patch, attached, that makes the first example DTRT.  Do you see
any value in going down that road?

As for your patch itself:

> +     /*
> +      * Exchange table contents
> +      *
> +      * Swap heaps, toast tables, toast indexes
> +      * all forks
> +      * all indexes

For indexes, would you basically swap yin<->yang in pg_index.indrelid, update
pg_class.relnamespace as needed, and check for naming conflicts (when yin and
yang differ in schema)?  Is there more?

> +      *
> +      * Checks:
> +      * * table definitions must match

Is there a particular reason to require this, or is it just a simplification to
avoid updating things to match?

> +      * * constraints must match

Wouldn't CHECK constraints have no need to match?

> +      * * indexes need not match
> +      * * outbound FKs don't need to match
> +      * * inbound FKs will be set to not validated

We would need to ensure that inbound FOREIGN KEY constraints still have indexes
suitable to implement them.  The easiest thing there might be to internally drop
and recreate the constraint, so we get all that verification.

> +      *
> +      * No Trigger behaviour
> +      *
> +      * How is it WAL logged? By locks and underlying catalog updates
> +      */

I see that the meat of the patch is yet to be written, so this ended up as more
of a design review based on your in-patch comments.  Hopefully it's of some
value.  I'll go ahead and mark the patch Returned with Feedback.

Thanks,
nm

*** a/src/backend/access/heap/heapam.c
--- b/src/backend/access/heap/heapam.c
***************
*** 970,995 **** relation_openrv(const RangeVar *relation, LOCKMODE lockmode)
  {
        Oid                     relOid;
  
!       /*
!        * Check for shared-cache-inval messages before trying to open the
!        * relation.  This is needed to cover the case where the name 
identifies a
!        * rel that has been dropped and recreated since the start of our
!        * transaction: if we don't flush the old syscache entry then we'll 
latch
!        * onto that entry and suffer an error when we do RelationIdGetRelation.
!        * Note that relation_open does not need to do this, since a relation's
!        * OID never changes.
!        *
!        * We skip this if asked for NoLock, on the assumption that the caller 
has
!        * already ensured some appropriate lock is held.
!        */
!       if (lockmode != NoLock)
!               AcceptInvalidationMessages();
! 
!       /* Look up the appropriate relation using namespace search */
!       relOid = RangeVarGetRelid(relation, false);
  
        /* Let relation_open do the rest */
!       return relation_open(relOid, lockmode);
  }
  
  /* ----------------
--- 970,980 ----
  {
        Oid                     relOid;
  
!       /* Look up and lock the appropriate relation using namespace search */
!       relOid = RangeVarLockRelid(relation, lockmode, false);
  
        /* Let relation_open do the rest */
!       return relation_open(relOid, NoLock);
  }
  
  /* ----------------
***************
*** 1005,1034 **** try_relation_openrv(const RangeVar *relation, LOCKMODE 
lockmode)
  {
        Oid                     relOid;
  
!       /*
!        * Check for shared-cache-inval messages before trying to open the
!        * relation.  This is needed to cover the case where the name 
identifies a
!        * rel that has been dropped and recreated since the start of our
!        * transaction: if we don't flush the old syscache entry then we'll 
latch
!        * onto that entry and suffer an error when we do RelationIdGetRelation.
!        * Note that relation_open does not need to do this, since a relation's
!        * OID never changes.
!        *
!        * We skip this if asked for NoLock, on the assumption that the caller 
has
!        * already ensured some appropriate lock is held.
!        */
!       if (lockmode != NoLock)
!               AcceptInvalidationMessages();
! 
!       /* Look up the appropriate relation using namespace search */
!       relOid = RangeVarGetRelid(relation, true);
  
        /* Return NULL on not-found */
        if (!OidIsValid(relOid))
                return NULL;
  
        /* Let relation_open do the rest */
!       return relation_open(relOid, lockmode);
  }
  
  /* ----------------
--- 990,1004 ----
  {
        Oid                     relOid;
  
!       /* Look up and lock the appropriate relation using namespace search */
!       relOid = RangeVarLockRelid(relation, lockmode, true);
  
        /* Return NULL on not-found */
        if (!OidIsValid(relOid))
                return NULL;
  
        /* Let relation_open do the rest */
!       return relation_open(relOid, NoLock);
  }
  
  /* ----------------
*** a/src/backend/catalog/namespace.c
--- b/src/backend/catalog/namespace.c
***************
*** 42,47 ****
--- 42,48 ----
  #include "parser/parse_func.h"
  #include "storage/backendid.h"
  #include "storage/ipc.h"
+ #include "storage/lmgr.h"
  #include "utils/acl.h"
  #include "utils/builtins.h"
  #include "utils/guc.h"
***************
*** 282,287 **** RangeVarGetRelid(const RangeVar *relation, bool failOK)
--- 283,340 ----
  }
  
  /*
+  * RangeVarLockRelid
+  *            Like RangeVarGetRelid, but simulatenously acquire the specified 
lock on
+  *            the relation.  This works properly in the face of concurrent 
DDL that
+  *            may drop, create or rename relations.
+  *
+  * If the relation is not found and failOK = true, take no lock and return
+  * InvalidOid.  Otherwise, raise an error.
+  */
+ Oid
+ RangeVarLockRelid(const RangeVar *relation, LOCKMODE lockmode,
+                                 bool failOK)
+ {
+       Oid                     relOid1,
+                               relOid2;
+ 
+       /*
+        * First attempt.  If the caller requested NoLock, it already acquired 
an
+        * appropriate lock and has called AcceptInvalidationMessages() since 
doing
+        * so.  In this case, our first search is always correct, and we 
degenerate
+        * to behave exactly like RangeVarGetRelid().
+        */
+       relOid1 = RangeVarGetRelid(relation, failOK);
+       if (lockmode == NoLock)
+               return relOid1;
+ 
+       /*
+        * By the time we acquire the lock, our RangeVar may denote a different
+        * relation or no relation at all.  In particular, this can happen when 
the
+        * lock acquisition blocks on a transaction performing DROP or ALTER 
TABLE
+        * RENAME.  However, once and while we do hold a lock of any level, we 
can
+        * count on the name correspondence remaining stable.
+        */
+       do
+       {
+               /* Not-found is always final. */
+               if (!OidIsValid(relOid1))
+                       return relOid1;
+ 
+               LockRelationOid(relOid1, lockmode);
+ 
+               /* Make recent DDL effects visible.  Names are stable; search 
again. */
+               AcceptInvalidationMessages();
+               relOid2 = relOid1;
+               relOid1 = RangeVarGetRelid(relation, failOK);
+ 
+               /* Done when our RangeVar denotes the same relation we locked. 
*/
+       } while (relOid1 != relOid2);
+ 
+       return relOid1;
+ }
+ 
+ /*
   * RangeVarGetCreationNamespace
   *            Given a RangeVar describing a to-be-created relation,
   *            choose which namespace to create it in.
*** a/src/include/catalog/namespace.h
--- b/src/include/catalog/namespace.h
***************
*** 48,53 **** typedef struct OverrideSearchPath
--- 48,55 ----
  
  
  extern Oid    RangeVarGetRelid(const RangeVar *relation, bool failOK);
+ extern Oid    RangeVarLockRelid(const RangeVar *relation, LOCKMODE lockmode,
+                                 bool failOK);
  extern Oid    RangeVarGetCreationNamespace(const RangeVar *newRelation);
  extern Oid    RelnameGetRelid(const char *relname);
  extern bool RelationIsVisible(Oid relid);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ALTER TABLE ... REPLACE WITH

Reply via email to