subject:"\\\\\\\[HACKERS\\\\\\\] Should I implement DROP INDEX CONCURRENTLY\\\\\\\?"

Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-03-28 Thread Robert Haas

On Fri, Feb 3, 2012 at 10:28 AM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Feb 1, 2012 at 2:39 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Wed, Feb 1, 2012 at 7:31 PM, Peter Eisentraut pete...@gmx.net wrote:
 On sön, 2012-01-29 at 22:01 +, Simon Riggs wrote:
 Patch now locks index in AccessExclusiveLock in final stage of drop.

 Doesn't that defeat the point of doing the CONCURRENTLY business in the
 first place?

 That was my initial reaction.

 We lock the index in AccessExclusiveLock only once we are certain
 nobody else is looking at it any more.

 So its a Kansas City Shuffle, with safe locking in case of people
 doing strange low level things.

 Yeah, I think this is much safer, and in this version that doesn't
 seem to harm concurrency.

 Given our previous experiences in this area, I wouldn't like to bet my
 life savings on this having no remaining bugs - but if it does, I
 can't find them.

 I'll mark this Ready for Committer.

I don't think this has been committed - does that mean you've decided
against doing so, or just haven't had the round tuits?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-02-12 Thread Jeff Janes

On Sun, Jan 8, 2012 at 8:19 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Wed, Jan 4, 2012 at 11:14 AM, Simon Riggs si...@2ndquadrant.com wrote:

 Not having a freelist at all is probably a simpler way of avoiding the
 lock contention, so I'll happily back that suggestion instead. Patch
 attached, previous patch revoked.

 v2 attached with cleanup of some random stuff that crept onto patch.

Hi Simon,

Based on the way this patch leaves the old code behind (using #ifdef),
this looks more like a WIP patch which you want people to do
performance testing with, rather than  patch proposed for committing.
If that is the case, could you outline the type of performance testing
where you think it would make a difference (and whether it should be
done on top of the main patch from this thread, the concurrent index
drop one)?

Also, it would be much easier to do the performance testing if this
behavior was controlled by a temporarily added GUC, rather than an
#ifdef.  Do you think it is feasible to do that, or would the overhead
of a single if (some_guc) per StrategyGetBuffer and
StrategyFreeBuffer call distort things too much?

Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-02-03 Thread Robert Haas

On Wed, Feb 1, 2012 at 2:39 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Wed, Feb 1, 2012 at 7:31 PM, Peter Eisentraut pete...@gmx.net wrote:
 On sön, 2012-01-29 at 22:01 +, Simon Riggs wrote:
 Patch now locks index in AccessExclusiveLock in final stage of drop.

 Doesn't that defeat the point of doing the CONCURRENTLY business in the
 first place?

 That was my initial reaction.

 We lock the index in AccessExclusiveLock only once we are certain
 nobody else is looking at it any more.

 So its a Kansas City Shuffle, with safe locking in case of people
 doing strange low level things.

Yeah, I think this is much safer, and in this version that doesn't
seem to harm concurrency.

Given our previous experiences in this area, I wouldn't like to bet my
life savings on this having no remaining bugs - but if it does, I
can't find them.

I'll mark this Ready for Committer.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-02-01 Thread Simon Riggs

On Wed, Feb 1, 2012 at 2:56 AM, Robert Haas robertmh...@gmail.com wrote:

 I improved the grammar issues in the attached version of the patch -
 the syntax is now simpler and more consistent, IF EXISTS now works,
 and RESTRICT is accepted (without changing the behavior) while CASCADE
 fails with a nicer error message.  I also fixed a bug in
 RangeVarCallbackForDropRelation.

Plus tests as well. Many thanks.

I fixed the main bug you observed and your test case now works
perfectly. I used pgbench to continuously drop/create an index, so a
little more than manual testing.

v5 Attached.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
diff --git a/doc/src/sgml/ref/drop_index.sgml b/doc/src/sgml/ref/drop_index.sgml
index 7177ef2..343f7ac 100644
--- a/doc/src/sgml/ref/drop_index.sgml
+++ b/doc/src/sgml/ref/drop_index.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  refsynopsisdiv
 synopsis
-DROP INDEX [ IF EXISTS ] replaceable class=PARAMETERname/replaceable [, ...] [ CASCADE | RESTRICT ]
+DROP INDEX [ CONCURRENTLY ] [ IF EXISTS ] replaceable class=PARAMETERname/replaceable [, ...] [ CASCADE | RESTRICT ]
 /synopsis
  /refsynopsisdiv
 
@@ -50,6 +50,29 @@ DROP INDEX [ IF EXISTS ] replaceable class=PARAMETERname/replaceable [, ..
/varlistentry
 
varlistentry
+termliteralCONCURRENTLY/literal/term
+listitem
+ para
+  When this option is used, productnamePostgreSQL/ will drop the
+  index without taking any locks that prevent concurrent selects, inserts,
+  updates, or deletes on the table; whereas a standard index drop
+  waits for a lock that locks out everything on the table until it's done.
+  Concurrent drop index is a two stage process. First, we mark the index
+  both invalid and not ready then commit the change. Next we wait until
+  there are no users locking the table who can see the index.
+ /para
+ para
+  There are several caveats to be aware of when using this option.
+  Only one index name can be specified if the literalCONCURRENTLY/literal
+  parameter is specified.  Regular commandDROP INDEX/ command can be
+  performed within a transaction block, but
+  commandDROP INDEX CONCURRENTLY/ cannot.
+  The CASCADE option is not supported when dropping an index concurrently.
+ /para
+/listitem
+   /varlistentry
+
+   varlistentry
 termreplaceable class=PARAMETERname/replaceable/term
 listitem
  para
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index db86262..58d8f5c 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -173,8 +173,8 @@ static void reportDependentObjects(const ObjectAddresses *targetObjects,
 	   const ObjectAddress *origObject);
 static void deleteOneObject(const ObjectAddress *object,
 			Relation depRel, int32 flags);
-static void doDeletion(const ObjectAddress *object);
-static void AcquireDeletionLock(const ObjectAddress *object);
+static void doDeletion(const ObjectAddress *object, int flags);
+static void AcquireDeletionLock(const ObjectAddress *object, int flags);
 static void ReleaseDeletionLock(const ObjectAddress *object);
 static bool find_expr_references_walker(Node *node,
 			find_expr_references_context *context);
@@ -232,7 +232,7 @@ performDeletion(const ObjectAddress *object,
 	 * Acquire deletion lock on the target object.	(Ideally the caller has
 	 * done this already, but many places are sloppy about it.)
 	 */
-	AcquireDeletionLock(object);
+	AcquireDeletionLock(object, 0);
 
 	/*
 	 * Construct a list of objects to delete (ie, the given object plus
@@ -316,7 +316,7 @@ performMultipleDeletions(const ObjectAddresses *objects,
 		 * Acquire deletion lock on each target object.  (Ideally the caller
 		 * has done this already, but many places are sloppy about it.)
 		 */
-		AcquireDeletionLock(thisobj);
+		AcquireDeletionLock(thisobj, flags);
 
 		findDependentObjects(thisobj,
 			 DEPFLAG_ORIGINAL,
@@ -350,7 +350,11 @@ performMultipleDeletions(const ObjectAddresses *objects,
 	/* And clean up */
 	free_object_addresses(targetObjects);
 
-	heap_close(depRel, RowExclusiveLock);
+	/*
+	 * We closed depRel earlier in deleteOneObject if doing a drop concurrently
+	 */
+	if ((flags  PERFORM_DELETION_CONCURRENTLY) != PERFORM_DELETION_CONCURRENTLY)
+		heap_close(depRel, RowExclusiveLock);
 }
 
 /*
@@ -380,7 +384,7 @@ deleteWhatDependsOn(const ObjectAddress *object,
 	 * Acquire deletion lock on the target object.	(Ideally the caller has
 	 * done this already, but many places are sloppy about it.)
 	 */
-	AcquireDeletionLock(object);
+	AcquireDeletionLock(object, 0);
 
 	/*
 	 * Construct a list of objects to delete (ie, the given object plus
@@ -611,7 +615,7 @@ findDependentObjects(const ObjectAddress *object,
  * deletion of the owning object.)
  */
 ReleaseDeletionLock(object);
-

Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-02-01 Thread Peter Eisentraut

On sön, 2012-01-29 at 22:01 +, Simon Riggs wrote:
 Patch now locks index in AccessExclusiveLock in final stage of drop.

Doesn't that defeat the point of doing the CONCURRENTLY business in the
first place?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-02-01 Thread Simon Riggs

On Wed, Feb 1, 2012 at 7:31 PM, Peter Eisentraut pete...@gmx.net wrote:
 On sön, 2012-01-29 at 22:01 +, Simon Riggs wrote:
 Patch now locks index in AccessExclusiveLock in final stage of drop.

 Doesn't that defeat the point of doing the CONCURRENTLY business in the
 first place?

That was my initial reaction.

We lock the index in AccessExclusiveLock only once we are certain
nobody else is looking at it any more.

So its a Kansas City Shuffle, with safe locking in case of people
doing strange low level things.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-31 Thread Robert Haas

On Sun, Jan 29, 2012 at 5:01 PM, Simon Riggs si...@2ndquadrant.com wrote:
 I can't see any way that situation can occur. The patch *explicitly*
 waits until all people that can see the index as usable have dropped
 their lock. So I don't think this is necessary. Having said that,
 since we are talking about the index and not the whole table, if I
 believe the above statement then I can't have any reasonable objection
 to doing as you suggest.

 Patch now locks index in AccessExclusiveLock in final stage of drop.

 v3 attached.

 If you have suggestions to improve grammar issues, they;re most
 welcome. Otherwise this seems good to go.

I improved the grammar issues in the attached version of the patch -
the syntax is now simpler and more consistent, IF EXISTS now works,
and RESTRICT is accepted (without changing the behavior) while CASCADE
fails with a nicer error message.  I also fixed a bug in
RangeVarCallbackForDropRelation.

However, testing reveals that this doesn't really work.  I found that
by playing around with a couple of concurrent sessions, I could get a
SELECT statement to hang waiting on the AccessExclusiveLock in the
second phase, because we're really still opening the relation (even
though it's invalid) and thus still locking it, and thus still waiting
for the AccessExclusiveLock.  And I managed to get this:

rhaas=# select * from foo where a = 1;
ERROR:  could not open relation with OID 16388

So then I tried changing the lock level back to
ShareUpdateExclusiveLock.  It appears that that merely narrows the
window for errors of this type, rather than getting rid of them.  I
ran this in one window:

pgbench -c 30 -j 30 -f f -n -T 180

where the file f contained this:

select * from foo where a = 1

and then in the other window I repeatedly did this:

rhaas=# create index foo_a on foo (a);
CREATE INDEX
rhaas=# drop index concurrently foo_a;
DROP INDEX

and pgbench started issuing messages like this:

Client 2 aborted in state 0: ERROR:  could not open relation with OID 16394
Client 9 aborted in state 0: ERROR:  could not open relation with OID 16397
Client 18 aborted in state 0: ERROR:  could not open relation with OID 16397

My conclusion is that regardless of whether ShareUpdateExclusiveLock
or AccessExclusiveLock is used for the final phase of the drop, this
isn't safe.  I haven't tracked down exactly where the wheels are
coming off, but the problem does not occur when using the non-current
form of DROP INDEX.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


drop_index_concurrently.v4.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-29 Thread Simon Riggs

On Thu, Jan 26, 2012 at 2:25 AM, Robert Haas robertmh...@gmail.com wrote:
 On Sat, Jan 21, 2012 at 10:11 AM, Simon Riggs si...@2ndquadrant.com wrote:
 Your caution is wise. All users of an index have already checked
 whether the index is usable at plan time, so although there is much
 code that assumes they can look at the index itself, that is not
 executed until after the correct checks.

 I'll look at VACUUM and other utilities, so thanks for that.

 I don't see much point in having the higher level lock, except perhaps
 as a test this code works.

 I thought of another way this can cause a problem: suppose that while
 we're dropping the relation with only ShareUpdateExclusiveLock, we get
 as far as calling DropRelFileNodeBuffers.  Just after we finish, some
 other process that holds AccessShareLock or RowShareLock or
 RowExclusiveLock reads and perhaps even dirties a page in the
 relation.

I can't see any way that situation can occur. The patch *explicitly*
waits until all people that can see the index as usable have dropped
their lock. So I don't think this is necessary. Having said that,
since we are talking about the index and not the whole table, if I
believe the above statement then I can't have any reasonable objection
to doing as you suggest.

Patch now locks index in AccessExclusiveLock in final stage of drop.

v3 attached.

If you have suggestions to improve grammar issues, they;re most
welcome. Otherwise this seems good to go.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
diff --git a/doc/src/sgml/ref/drop_index.sgml b/doc/src/sgml/ref/drop_index.sgml
index 7177ef2..aeb1531 100644
--- a/doc/src/sgml/ref/drop_index.sgml
+++ b/doc/src/sgml/ref/drop_index.sgml
@@ -21,7 +21,9 @@ PostgreSQL documentation
 
  refsynopsisdiv
 synopsis
-DROP INDEX [ IF EXISTS ] replaceable class=PARAMETERname/replaceable [, ...] [ CASCADE | RESTRICT ]
+DROP INDEX
+	[ IF EXISTS ] replaceable class=PARAMETERname/replaceable [, ...] [ CASCADE | RESTRICT ]
+	CONCURRENTLY replaceable class=PARAMETERname/replaceable
 /synopsis
  /refsynopsisdiv
 
@@ -50,6 +52,30 @@ DROP INDEX [ IF EXISTS ] replaceable class=PARAMETERname/replaceable [, ..
/varlistentry
 
varlistentry
+termliteralCONCURRENTLY/literal/term
+listitem
+ para
+  When this option is used, productnamePostgreSQL/ will drop the
+  index without taking any locks that prevent concurrent selects, inserts,
+  updates, or deletes on the table; whereas a standard index drop
+  waits for a lock that locks out everything on the table until it's done.
+  Concurrent drop index is a two stage process. First, we mark the index
+  both invalid and not ready then commit the change. Next we wait until
+  there are no users locking the table who can see the index.
+ /para
+ para
+  There are several caveats to be aware of when using this option.
+  Only one index name can be specified if the literalCONCURRENTLY/literal
+  parameter is specified. Only one concurrent index drop can occur on a
+  table at a time and no modifications on the table are allowed meanwhile.
+  Regular commandDROP INDEX/ command can be performed within
+  a transaction block, but commandDROP INDEX CONCURRENTLY/ cannot.
+  There is no CASCADE option when dropping an index concurrently.
+ /para
+/listitem
+   /varlistentry
+
+   varlistentry
 termreplaceable class=PARAMETERname/replaceable/term
 listitem
  para
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index db86262..58d8f5c 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -173,8 +173,8 @@ static void reportDependentObjects(const ObjectAddresses *targetObjects,
 	   const ObjectAddress *origObject);
 static void deleteOneObject(const ObjectAddress *object,
 			Relation depRel, int32 flags);
-static void doDeletion(const ObjectAddress *object);
-static void AcquireDeletionLock(const ObjectAddress *object);
+static void doDeletion(const ObjectAddress *object, int flags);
+static void AcquireDeletionLock(const ObjectAddress *object, int flags);
 static void ReleaseDeletionLock(const ObjectAddress *object);
 static bool find_expr_references_walker(Node *node,
 			find_expr_references_context *context);
@@ -232,7 +232,7 @@ performDeletion(const ObjectAddress *object,
 	 * Acquire deletion lock on the target object.	(Ideally the caller has
 	 * done this already, but many places are sloppy about it.)
 	 */
-	AcquireDeletionLock(object);
+	AcquireDeletionLock(object, 0);
 
 	/*
 	 * Construct a list of objects to delete (ie, the given object plus
@@ -316,7 +316,7 @@ performMultipleDeletions(const ObjectAddresses *objects,
 		 * Acquire deletion lock on each target object.  (Ideally the caller
 		 * has done this already, but many places are sloppy about it.)
 		 */
-		AcquireDeletionLock(thisobj);
+

Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-25 Thread Robert Haas

On Sat, Jan 21, 2012 at 10:11 AM, Simon Riggs si...@2ndquadrant.com wrote:
 Your caution is wise. All users of an index have already checked
 whether the index is usable at plan time, so although there is much
 code that assumes they can look at the index itself, that is not
 executed until after the correct checks.

 I'll look at VACUUM and other utilities, so thanks for that.

 I don't see much point in having the higher level lock, except perhaps
 as a test this code works.

I thought of another way this can cause a problem: suppose that while
we're dropping the relation with only ShareUpdateExclusiveLock, we get
as far as calling DropRelFileNodeBuffers.  Just after we finish, some
other process that holds AccessShareLock or RowShareLock or
RowExclusiveLock reads and perhaps even dirties a page in the
relation.  Now we've got pages in the buffer pool that might even be
dirty, but the backing file is truncated or perhaps removed
altogether.  If the page is dirty, I think the background writer will
eventually choke trying to write out the page.  If the page is not
dirty, I'm less certain whether anything is going to explode
violently, but it seems mighty sketchy to have pages in the buffer
pool with a buffer tag that don't exist any more.  As the comment
says:

 *  It is the responsibility of higher-level code to ensure that the
 *  deletion or truncation does not lose any data that could be needed
 *  later.  It is also the responsibility of higher-level code to ensure
 *  that no other process could be trying to load more pages of the
 *  relation into buffers.

...and of course, the intended mechanism is an AccessExclusiveLock.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Removing freelist (was Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?)

2012-01-24 Thread Jeff Janes

On Sat, Jan 21, 2012 at 2:29 PM, Jim Nasby j...@nasby.net wrote:
 On Jan 20, 2012, at 11:54 AM, Heikki Linnakangas wrote:

 So, you're proposing that we remove freelist altogether? Sounds reasonable, 
 but that needs to be performance tested somehow. I'm not sure what exactly 
 the test should look like, but it probably should involve an OLTP workload, 
 and large tables being created, populated to fill the cache with pages from 
 the table, and dropped, while the OLTP workload is running. I'd imagine that 
 the worst case scenario looks something like that.

 We should also look at having the freelist do something useful, instead of 
 just dropping it completely. Unfortunately that's probably more work...

If the head and tail are both protected by BufFreelistLock, I'm pretty
sure this will make things worse, not better.

If we change to having head and tail protected by separate spinlocks,
then I don't see how you can remove the last buffer from the list, or
add a buffer to an empty list, without causing havoc.

Does anyone have ideas for implementing a cross-platform, lock-free,
FIFO linked list?  Without that, I don't see how we are going to get
anywhere on this approach.

Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Removing freelist (was Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?)

2012-01-24 Thread Jeff Janes

On Mon, Jan 23, 2012 at 5:06 AM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Jan 23, 2012 at 12:12 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Sat, Jan 21, 2012 at 5:29 PM, Jim Nasby j...@nasby.net wrote:
 We should also look at having the freelist do something useful, instead of 
 just dropping it completely. Unfortunately that's probably more work...

 That's kinda my feeling as well.  The free list in its current form is
 pretty much useless, but I don't think we'll save much by getting rid
 of it, because that's just a single test.  The expensive part of what
 we do while holding BufFreelistLock is, I think, iterating through
 buffers taking and releasing a spinlock on each one (!).

 Yeah ... spinlocks that, by definition, will be uncontested.

 What makes you think that they are uncontested?

It is automatically partitioned over 131,072 spinlocks if
shared_buffers=1GB, so the chances of contention seem pretty low.
If a few very hot index root blocks are heavily contested, still that
would only be
encountered a few times out of the 131,072.  I guess we could count
them, similar
to your LWLOCK_STATS changes.

 Or for that matter,
 that even an uncontested spinlock operation is cheap enough to do
 while holding a badly contended LWLock?

Is the concern that getting a uncontested spinlock has lots of
instructions, or that the associated fences are expensive?


 So I think
 it would be advisable to prove rather than just assume that that's a
 problem.

 It's pretty trivial to prove that there is a very serious problem with
 BufFreelistLock.  I'll admit I can't prove what the right fix is just
 yet, and certainly measurement is warranted.


From my own analysis and experiments, the mere act of getting the
BufFreelistLock when it is contended is far more important than
anything actually done while holding that lock.  When the clock-sweep
is done often enough that the lock is contended, then it is done often
enough to keep the average usagecount low and so the average number of
buffers visited during a clock sweep before finding a usable one was
around 2.0.  YMMV.

Can we get some people who run busy high-CPU servers that churn the
cache and think they may be getting hit with this problem post the
results of this query:

select usagecount, count(*) from pg_buffercache group by usagecount;


Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Removing freelist (was Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?)

2012-01-24 Thread Simon Riggs

On Mon, Jan 23, 2012 at 4:01 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 The real
 problem there is that BufFreelistLock is also used to protect the
 clock sweep pointer.

Agreed

 I think basically we gotta find a way to allow
 multiple backends to run clock sweeps concurrently.  Or else fix
 things so that the freelist never (well, hardly ever) runs dry.

Agreed.

The only question is what do we do now. I'm happy with the thought of
jam tomorrow, I'd like to see some minor improvements now, given that
when we say we'll do that even better in the next release it often
doesn't happen. Which is where those patches come in.

I've posted an improved lock wait analysis patch and have scheduled
some runs on heavily used systems to see what this tells us. Results
from live machines also greatly appreciated, since test systems seem
likely to be inappropriate tests. Patch copied here again.

This is a key issue since RAM is cheap enough now that people are
swamped by it, so large shared_buffers are very common.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Removing freelist (was Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?)

2012-01-23 Thread Robert Haas

On Mon, Jan 23, 2012 at 12:12 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Sat, Jan 21, 2012 at 5:29 PM, Jim Nasby j...@nasby.net wrote:
 We should also look at having the freelist do something useful, instead of 
 just dropping it completely. Unfortunately that's probably more work...

 That's kinda my feeling as well.  The free list in its current form is
 pretty much useless, but I don't think we'll save much by getting rid
 of it, because that's just a single test.  The expensive part of what
 we do while holding BufFreelistLock is, I think, iterating through
 buffers taking and releasing a spinlock on each one (!).

 Yeah ... spinlocks that, by definition, will be uncontested.

What makes you think that they are uncontested?  Or for that matter,
that even an uncontested spinlock operation is cheap enough to do
while holding a badly contended LWLock?

 So I think
 it would be advisable to prove rather than just assume that that's a
 problem.

It's pretty trivial to prove that there is a very serious problem with
BufFreelistLock.  I'll admit I can't prove what the right fix is just
yet, and certainly measurement is warranted.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Removing freelist (was Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?)

2012-01-23 Thread Simon Riggs

On Mon, Jan 23, 2012 at 1:06 PM, Robert Haas robertmh...@gmail.com wrote:

 It's pretty trivial to prove that there is a very serious problem with
 BufFreelistLock.  I'll admit I can't prove what the right fix is just
 yet, and certainly measurement is warranted.

I agree there is a problem with BufFreelistLock (so please share your
results with Heikki, who seems not to).

As a result, I've published patches which reduce contention on that
lock in various ways, all of which seem valid to me.

There are lots of things we could have done for 9.2 but didn't, yet we
have some things that did get done on the table right now so it would
be useful to push those through immediately or at least defer
discussion on other things until we get back around to this.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Removing freelist (was Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?)

2012-01-23 Thread Tom Lane

Robert Haas robertmh...@gmail.com writes:
 On Mon, Jan 23, 2012 at 12:12 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 The expensive part of what
 we do while holding BufFreelistLock is, I think, iterating through
 buffers taking and releasing a spinlock on each one (!).

 Yeah ... spinlocks that, by definition, will be uncontested.

 What makes you think that they are uncontested?

Ah, never mind.  I was thinking that we'd only be touching buffers that
were *on* the freelist, but of course this is incorrect.  The real
problem there is that BufFreelistLock is also used to protect the
clock sweep pointer.  I think basically we gotta find a way to allow
multiple backends to run clock sweeps concurrently.  Or else fix
things so that the freelist never (well, hardly ever) runs dry.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Removing freelist (was Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?)

2012-01-23 Thread Robert Haas

On Mon, Jan 23, 2012 at 11:01 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Mon, Jan 23, 2012 at 12:12 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 The expensive part of what
 we do while holding BufFreelistLock is, I think, iterating through
 buffers taking and releasing a spinlock on each one (!).

 Yeah ... spinlocks that, by definition, will be uncontested.

 What makes you think that they are uncontested?

 Ah, never mind.  I was thinking that we'd only be touching buffers that
 were *on* the freelist, but of course this is incorrect.  The real
 problem there is that BufFreelistLock is also used to protect the
 clock sweep pointer.  I think basically we gotta find a way to allow
 multiple backends to run clock sweeps concurrently.  Or else fix
 things so that the freelist never (well, hardly ever) runs dry.

I'd come to the same conclusion myself.  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Removing freelist (was Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?)

2012-01-22 Thread Jim Nasby

On Jan 20, 2012, at 11:54 AM, Heikki Linnakangas wrote:
 On 04.01.2012 13:14, Simon Riggs wrote:
 On Tue, Jan 3, 2012 at 11:28 PM, Tom Lanet...@sss.pgh.pa.us  wrote:
 Jim Nasbyj...@nasby.net  writes:
 On Jan 3, 2012, at 12:11 PM, Simon Riggs wrote:
 This could well be related to the fact that DropRelFileNodeBuffers()
 does a scan of shared_buffers, which is an O(N) approach no matter the
 size of the index.
 
 Couldn't we just leave the buffers alone? Once an index is dropped and 
 that's pushed out through the catalog then nothing should be trying to 
 access them and they'll eventually just get aged out.
 
 No, we can't, because if they're still dirty then the bgwriter would
 first try to write them to the no-longer-existing storage file.  It's
 important that we kill the buffers immediately during relation drop.
 
 I'm still thinking that it might be sufficient to mark the buffers
 invalid and let the clock sweep find them, thereby eliminating the need
 for a freelist.
 
 My patch puts things on the freelist only when it is free to do so.
 Not having a freelist at all is probably a simpler way of avoiding the
 lock contention, so I'll happily back that suggestion instead. Patch
 attached, previous patch revoked.
 
 So, you're proposing that we remove freelist altogether? Sounds reasonable, 
 but that needs to be performance tested somehow. I'm not sure what exactly 
 the test should look like, but it probably should involve an OLTP workload, 
 and large tables being created, populated to fill the cache with pages from 
 the table, and dropped, while the OLTP workload is running. I'd imagine that 
 the worst case scenario looks something like that.

We should also look at having the freelist do something useful, instead of just 
dropping it completely. Unfortunately that's probably more work...
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Removing freelist (was Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?)

2012-01-22 Thread Robert Haas

On Sat, Jan 21, 2012 at 5:29 PM, Jim Nasby j...@nasby.net wrote:
 We should also look at having the freelist do something useful, instead of 
 just dropping it completely. Unfortunately that's probably more work...

That's kinda my feeling as well.  The free list in its current form is
pretty much useless, but I don't think we'll save much by getting rid
of it, because that's just a single test.  The expensive part of what
we do while holding BufFreelistLock is, I think, iterating through
buffers taking and releasing a spinlock on each one (!).  So I guess
my vote would be to leave it alone pending further study, and maybe
remove it later if we can't find a way to rejigger it to be more
useful.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Removing freelist (was Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?)

2012-01-22 Thread Tom Lane

Robert Haas robertmh...@gmail.com writes:
 On Sat, Jan 21, 2012 at 5:29 PM, Jim Nasby j...@nasby.net wrote:
 We should also look at having the freelist do something useful, instead of 
 just dropping it completely. Unfortunately that's probably more work...

 That's kinda my feeling as well.  The free list in its current form is
 pretty much useless, but I don't think we'll save much by getting rid
 of it, because that's just a single test.  The expensive part of what
 we do while holding BufFreelistLock is, I think, iterating through
 buffers taking and releasing a spinlock on each one (!).

Yeah ... spinlocks that, by definition, will be uncontested.  So I think
it would be advisable to prove rather than just assume that that's a
problem.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-21 Thread Simon Riggs

On Sat, Jan 21, 2012 at 1:53 AM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Jan 19, 2012 at 10:02 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Wed, Jan 18, 2012 at 11:12 PM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Jan 18, 2012 at 5:49 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Can I just check with you that the only review comment is a one line
 change? Seems better to make any additional review comments in one go.

 No, I haven't done a full review yet - I just noticed that right at
 the outset and wanted to be sure that got addressed.  I can look it
 over more carefully, and test it.

 Corrected your point, and another found during retest.

 Works as advertised, docs corrected.

 This comment needs updating:

    /*
     * To drop an index safely, we must grab exclusive lock on its parent
     * table.  Exclusive lock on the index alone is insufficient because
     * another backend might be about to execute a query on the parent table.
     * If it relies on a previously cached list of index OIDs, then it could
     * attempt to access the just-dropped index.  We must therefore take a
     * table lock strong enough to prevent all queries on the table from
     * proceeding until we commit and send out a shared-cache-inval notice
     * that will make them update their index lists.
     */

OK, I did reword some but clearly not enough.

 Speaking of that comment, why does a concurrent index drop need any
 lock at all on the parent table?  If, as the current comment text
 says, the point of locking the parent table is to make sure that no
 new backends can create relcache entries until our transaction
 commits, then it's not needed here, or at least not for that purpose.
 We're going to out-wait anyone who has the index open *after* we've
 committed our changes to the pg_index entry, so there shouldn't be a
 race condition there.  There may very well be a reason why we need a
 lock on the parent, or there may not, but that's not it.

I feel happier locking the parent as well. I'd rather avoid strange
weirdness later as has happened before in this type of discussion.

 On the flip side, it strikes me that there's considerable danger that
 ShareUpdateExclusiveLock on the index might not be strong enough.  In
 particular, it would fall down if there's anyone who takes an
 AccessShareLock, RowShareLock, or RowExclusiveLock and then proceeds
 to access the index data despite its being marked !indisready and
 !indisvalid.  There are certainly instances of this in contrib, such
 as in pgstatindex(), which I'm pretty sure will abort with a nastygram
 if somebody truncates away the file under it.  That might not be
 enough reason to reject the patch, but I do think it would be the only
 place in the system where we allow something like that to happen.  I'm
 not sure whether there are any similar risks in core - I am a bit
 worried about get_actual_variable_range() and gincostestimate(), but I
 can't tell for sure whether there is any actual problem there, or
 elsewhere.  I wonder if we should only do the *first* phase of the
 drop with ShareUpdateExcusliveLock, and then after outwaiting
 everyone, take an AccessExclusiveLock on the index (but not the table)
 to do the rest of the work.  If that doesn't allow the drop to proceed
 concurrently, then we go find the places that block against it and
 teach them not to lock/use/examine indexes that are !indisready.  That
 way, the worst case is that D-I-C is less concurrent than we'd like,
 rather than making random other things fall over - specifically,
 anything that count on our traditional guarantee that AccessShareLock
 is enough to keep the file from disappearing.

Your caution is wise. All users of an index have already checked
whether the index is usable at plan time, so although there is much
code that assumes they can look at the index itself, that is not
executed until after the correct checks.

I'll look at VACUUM and other utilities, so thanks for that.

I don't see much point in having the higher level lock, except perhaps
as a test this code works.


 In the department of minor nitpicks, the syntax synopsis you've added
 looks wrong:

 DROP INDEX
       [ IF EXISTS ] replaceable class=PARAMETERname/replaceable
 [, ...] [ CASCADE | RESTRICT ]
       CONCURRENTLY replaceable class=PARAMETERname/replaceable

 That implies that you may write if exists, followed by 1 or more
 names, followed by cascade or restrict.  After that you must write
 CONCURRENTLY, followed by exactly one name.  I think what you want is:

 DROP INDEX [ IF EXISTS ] replaceable
 class=PARAMETERname/replaceable [, ...] [ CASCADE | RESTRICT ]
 DROP INDEX CONCURRENTLY replaceable class=PARAMETERname/replaceable

 ...but I also wonder if we shouldn't support DROP INDEX [ IF EXISTS ]
 CONCURRENTLY.  It could also be very useful to be able to concurrently
 drop multiple indexes at once, since that would require waiting out
 all concurrent transactions only once

Removing freelist (was Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?)

2012-01-20 Thread Heikki Linnakangas


On 04.01.2012 13:14, Simon Riggs wrote:

On Tue, Jan 3, 2012 at 11:28 PM, Tom Lanet...@sss.pgh.pa.us  wrote:

Jim Nasbyj...@nasby.net  writes:

On Jan 3, 2012, at 12:11 PM, Simon Riggs wrote:

This could well be related to the fact that DropRelFileNodeBuffers()
does a scan of shared_buffers, which is an O(N) approach no matter the
size of the index.



Couldn't we just leave the buffers alone? Once an index is dropped and that's 
pushed out through the catalog then nothing should be trying to access them and 
they'll eventually just get aged out.


No, we can't, because if they're still dirty then the bgwriter would
first try to write them to the no-longer-existing storage file.  It's
important that we kill the buffers immediately during relation drop.

I'm still thinking that it might be sufficient to mark the buffers
invalid and let the clock sweep find them, thereby eliminating the need
for a freelist.


My patch puts things on the freelist only when it is free to do so.
Not having a freelist at all is probably a simpler way of avoiding the
lock contention, so I'll happily back that suggestion instead. Patch
attached, previous patch revoked.


So, you're proposing that we remove freelist altogether? Sounds 
reasonable, but that needs to be performance tested somehow. I'm not 
sure what exactly the test should look like, but it probably should 
involve an OLTP workload, and large tables being created, populated to 
fill the cache with pages from the table, and dropped, while the OLTP 
workload is running. I'd imagine that the worst case scenario looks 
something like that.


Removing the freelist should speed up the drop, but the question is how 
costly are the extra clock sweeps that the backends have to do.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-20 Thread Robert Haas

On Thu, Jan 19, 2012 at 10:02 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Wed, Jan 18, 2012 at 11:12 PM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Jan 18, 2012 at 5:49 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Can I just check with you that the only review comment is a one line
 change? Seems better to make any additional review comments in one go.

 No, I haven't done a full review yet - I just noticed that right at
 the outset and wanted to be sure that got addressed.  I can look it
 over more carefully, and test it.

 Corrected your point, and another found during retest.

 Works as advertised, docs corrected.

This comment needs updating:

/*
 * To drop an index safely, we must grab exclusive lock on its parent
 * table.  Exclusive lock on the index alone is insufficient because
 * another backend might be about to execute a query on the parent table.
 * If it relies on a previously cached list of index OIDs, then it could
 * attempt to access the just-dropped index.  We must therefore take a
 * table lock strong enough to prevent all queries on the table from
 * proceeding until we commit and send out a shared-cache-inval notice
 * that will make them update their index lists.
 */

Speaking of that comment, why does a concurrent index drop need any
lock at all on the parent table?  If, as the current comment text
says, the point of locking the parent table is to make sure that no
new backends can create relcache entries until our transaction
commits, then it's not needed here, or at least not for that purpose.
We're going to out-wait anyone who has the index open *after* we've
committed our changes to the pg_index entry, so there shouldn't be a
race condition there.  There may very well be a reason why we need a
lock on the parent, or there may not, but that's not it.

On the flip side, it strikes me that there's considerable danger that
ShareUpdateExclusiveLock on the index might not be strong enough.  In
particular, it would fall down if there's anyone who takes an
AccessShareLock, RowShareLock, or RowExclusiveLock and then proceeds
to access the index data despite its being marked !indisready and
!indisvalid.  There are certainly instances of this in contrib, such
as in pgstatindex(), which I'm pretty sure will abort with a nastygram
if somebody truncates away the file under it.  That might not be
enough reason to reject the patch, but I do think it would be the only
place in the system where we allow something like that to happen.  I'm
not sure whether there are any similar risks in core - I am a bit
worried about get_actual_variable_range() and gincostestimate(), but I
can't tell for sure whether there is any actual problem there, or
elsewhere.  I wonder if we should only do the *first* phase of the
drop with ShareUpdateExcusliveLock, and then after outwaiting
everyone, take an AccessExclusiveLock on the index (but not the table)
to do the rest of the work.  If that doesn't allow the drop to proceed
concurrently, then we go find the places that block against it and
teach them not to lock/use/examine indexes that are !indisready.  That
way, the worst case is that D-I-C is less concurrent than we'd like,
rather than making random other things fall over - specifically,
anything that count on our traditional guarantee that AccessShareLock
is enough to keep the file from disappearing.

In the department of minor nitpicks, the syntax synopsis you've added
looks wrong:

DROP INDEX
   [ IF EXISTS ] replaceable class=PARAMETERname/replaceable
[, ...] [ CASCADE | RESTRICT ]
   CONCURRENTLY replaceable class=PARAMETERname/replaceable

That implies that you may write if exists, followed by 1 or more
names, followed by cascade or restrict.  After that you must write
CONCURRENTLY, followed by exactly one name.  I think what you want is:

DROP INDEX [ IF EXISTS ] replaceable
class=PARAMETERname/replaceable [, ...] [ CASCADE | RESTRICT ]
DROP INDEX CONCURRENTLY replaceable class=PARAMETERname/replaceable

...but I also wonder if we shouldn't support DROP INDEX [ IF EXISTS ]
CONCURRENTLY.  It could also be very useful to be able to concurrently
drop multiple indexes at once, since that would require waiting out
all concurrent transactions only once instead of once per drop.
Either way, I think we should have only one syntax, and then reject
combinations we can't handle in the code.  So I think we should have
the parser accept:

DROP INDEX [ IF EXISTS ] [ CONCURRENTLY ] replaceable
class=PARAMETERname/replaceable [, ...] [ CASCADE | RESTRICT ]

And then, if you try to use cascade, or try to drop multiple indexes
at once, we should throw an error saying that those options aren't
supported in combination with CONCURRENTLY, rather than rejecting them
as a syntax error at parse time.  That gives a better error message
and will make it easier for anyone who wants to extend this in the
future - plus it will allow things like DROP INDEX CONCURRENTLY bob

Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-19 Thread Simon Riggs

On Wed, Jan 18, 2012 at 11:12 PM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Jan 18, 2012 at 5:49 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Can I just check with you that the only review comment is a one line
 change? Seems better to make any additional review comments in one go.

 No, I haven't done a full review yet - I just noticed that right at
 the outset and wanted to be sure that got addressed.  I can look it
 over more carefully, and test it.

Corrected your point, and another found during retest.

Works as advertised, docs corrected.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
diff --git a/doc/src/sgml/ref/drop_index.sgml b/doc/src/sgml/ref/drop_index.sgml
index 7177ef2..aeb1531 100644
--- a/doc/src/sgml/ref/drop_index.sgml
+++ b/doc/src/sgml/ref/drop_index.sgml
@@ -21,7 +21,9 @@ PostgreSQL documentation
 
  refsynopsisdiv
 synopsis
-DROP INDEX [ IF EXISTS ] replaceable class=PARAMETERname/replaceable [, ...] [ CASCADE | RESTRICT ]
+DROP INDEX
+	[ IF EXISTS ] replaceable class=PARAMETERname/replaceable [, ...] [ CASCADE | RESTRICT ]
+	CONCURRENTLY replaceable class=PARAMETERname/replaceable
 /synopsis
  /refsynopsisdiv
 
@@ -50,6 +52,30 @@ DROP INDEX [ IF EXISTS ] replaceable class=PARAMETERname/replaceable [, ..
/varlistentry
 
varlistentry
+termliteralCONCURRENTLY/literal/term
+listitem
+ para
+  When this option is used, productnamePostgreSQL/ will drop the
+  index without taking any locks that prevent concurrent selects, inserts,
+  updates, or deletes on the table; whereas a standard index drop
+  waits for a lock that locks out everything on the table until it's done.
+  Concurrent drop index is a two stage process. First, we mark the index
+  both invalid and not ready then commit the change. Next we wait until
+  there are no users locking the table who can see the index.
+ /para
+ para
+  There are several caveats to be aware of when using this option.
+  Only one index name can be specified if the literalCONCURRENTLY/literal
+  parameter is specified. Only one concurrent index drop can occur on a
+  table at a time and no modifications on the table are allowed meanwhile.
+  Regular commandDROP INDEX/ command can be performed within
+  a transaction block, but commandDROP INDEX CONCURRENTLY/ cannot.
+  There is no CASCADE option when dropping an index concurrently.
+ /para
+/listitem
+   /varlistentry
+
+   varlistentry
 termreplaceable class=PARAMETERname/replaceable/term
 listitem
  para
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index 0b3d489..6884fdb 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -171,9 +171,10 @@ static void reportDependentObjects(const ObjectAddresses *targetObjects,
 	   DropBehavior behavior,
 	   int msglevel,
 	   const ObjectAddress *origObject);
-static void deleteOneObject(const ObjectAddress *object, Relation depRel);
-static void doDeletion(const ObjectAddress *object);
-static void AcquireDeletionLock(const ObjectAddress *object);
+static void deleteOneObject(const ObjectAddress *object, Relation depRel,
+			int option_flags);
+static void doDeletion(const ObjectAddress *object, int option_flags);
+static void AcquireDeletionLock(const ObjectAddress *object, int option_flags);
 static void ReleaseDeletionLock(const ObjectAddress *object);
 static bool find_expr_references_walker(Node *node,
 			find_expr_references_context *context);
@@ -224,7 +225,7 @@ performDeletion(const ObjectAddress *object,
 	 * Acquire deletion lock on the target object.	(Ideally the caller has
 	 * done this already, but many places are sloppy about it.)
 	 */
-	AcquireDeletionLock(object);
+	AcquireDeletionLock(object, 0);
 
 	/*
 	 * Construct a list of objects to delete (ie, the given object plus
@@ -254,7 +255,7 @@ performDeletion(const ObjectAddress *object,
 	{
 		ObjectAddress *thisobj = targetObjects-refs + i;
 
-		deleteOneObject(thisobj, depRel);
+		deleteOneObject(thisobj, depRel, 0);
 	}
 
 	/* And clean up */
@@ -276,6 +277,13 @@ void
 performMultipleDeletions(const ObjectAddresses *objects,
 		 DropBehavior behavior)
 {
+	performMultipleDeletionsWithOptions(objects, behavior, 0);
+}
+
+void
+performMultipleDeletionsWithOptions(const ObjectAddresses *objects,
+		 DropBehavior behavior, int option_flags)
+{
 	Relation	depRel;
 	ObjectAddresses *targetObjects;
 	int			i;
@@ -308,7 +316,7 @@ performMultipleDeletions(const ObjectAddresses *objects,
 		 * Acquire deletion lock on each target object.  (Ideally the caller
 		 * has done this already, but many places are sloppy about it.)
 		 */
-		AcquireDeletionLock(thisobj);
+		AcquireDeletionLock(thisobj, option_flags);
 
 		findDependentObjects(thisobj,
 			 DEPFLAG_ORIGINAL,
@@ -336,13 +344,17 @@

Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-18 Thread Simon Riggs

On Wed, Jan 18, 2012 at 1:28 AM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Jan 17, 2012 at 12:06 PM, Peter Eisentraut pete...@gmx.net wrote:
 On mån, 2012-01-16 at 14:46 -0500, Robert Haas wrote:
 On Mon, Jan 16, 2012 at 2:06 PM, Peter Eisentraut pete...@gmx.net wrote:
  On mån, 2012-01-16 at 11:17 -0500, Robert Haas wrote:
  I don't see how setting indisvalid to false helps with this, because
  IIUC when a session sees indisvalid = false, it is supposed to avoid
  using the index for queries but still make new index entries when a
  write operation happens - but to drop an index, I think you'd need to
  get into a state where no one was using the index for anything at all.
 
  ISTM that one would need to set indisready to false instead.

 Maybe we should set both to false?

 Well, ready = false and valid = true doesn't make any sense.  There is
 only just-created - ready - valid.  We might as well convert that to a
 single char column, as you had indicated in your earlier email.  But
 that's independent of the proposed patch.

 Sure, but the point is that I think if you want everyone to stop
 touching the index, you ought to mark it both not-valid and not-ready,
 which the current patch doesn't do.

Thanks for the review and the important observation. I agree that I've
changed the wrong column. indisready must be set false. Also agree
setting both false makes most sense.

Can I just check with you that the only review comment is a one line
change? Seems better to make any additional review comments in one go.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-18 Thread Robert Haas

On Wed, Jan 18, 2012 at 5:49 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Can I just check with you that the only review comment is a one line
 change? Seems better to make any additional review comments in one go.

No, I haven't done a full review yet - I just noticed that right at
the outset and wanted to be sure that got addressed.  I can look it
over more carefully, and test it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-17 Thread Peter Eisentraut

On mån, 2012-01-16 at 14:46 -0500, Robert Haas wrote:
 On Mon, Jan 16, 2012 at 2:06 PM, Peter Eisentraut pete...@gmx.net wrote:
  On mån, 2012-01-16 at 11:17 -0500, Robert Haas wrote:
  I don't see how setting indisvalid to false helps with this, because
  IIUC when a session sees indisvalid = false, it is supposed to avoid
  using the index for queries but still make new index entries when a
  write operation happens - but to drop an index, I think you'd need to
  get into a state where no one was using the index for anything at all.
 
  ISTM that one would need to set indisready to false instead.
 
 Maybe we should set both to false?

Well, ready = false and valid = true doesn't make any sense.  There is
only just-created - ready - valid.  We might as well convert that to a
single char column, as you had indicated in your earlier email.  But
that's independent of the proposed patch.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-17 Thread Robert Haas

On Tue, Jan 17, 2012 at 12:06 PM, Peter Eisentraut pete...@gmx.net wrote:
 On mån, 2012-01-16 at 14:46 -0500, Robert Haas wrote:
 On Mon, Jan 16, 2012 at 2:06 PM, Peter Eisentraut pete...@gmx.net wrote:
  On mån, 2012-01-16 at 11:17 -0500, Robert Haas wrote:
  I don't see how setting indisvalid to false helps with this, because
  IIUC when a session sees indisvalid = false, it is supposed to avoid
  using the index for queries but still make new index entries when a
  write operation happens - but to drop an index, I think you'd need to
  get into a state where no one was using the index for anything at all.
 
  ISTM that one would need to set indisready to false instead.

 Maybe we should set both to false?

 Well, ready = false and valid = true doesn't make any sense.  There is
 only just-created - ready - valid.  We might as well convert that to a
 single char column, as you had indicated in your earlier email.  But
 that's independent of the proposed patch.

Sure, but the point is that I think if you want everyone to stop
touching the index, you ought to mark it both not-valid and not-ready,
which the current patch doesn't do.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-16 Thread Robert Haas

On Sat, Dec 31, 2011 at 8:26 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Fri, Dec 30, 2011 at 10:20 PM, Peter Eisentraut pete...@gmx.net wrote:
 On ons, 2011-08-24 at 11:24 -0700, Daniel Farina wrote:
 I was poking around at tablecmds and index.c and wonder if a similar
 two-pass approach as used by CREATE INDEX CONCURRENTLY can be used to
 create a DROP INDEX CONCURRENTLY, and if there would be any interest
 in accepting such a patch.

 Hmm, it seems I just independently came up with this same concept.  My
 problem is that if a CREATE INDEX CONCURRENTLY fails, you need an
 exclusive lock on the table just to clean that up.  If the table is
 under constant load, you can't easily do that.  So a two-pass DROP INDEX
 CONCURRENTLY might have been helpful for me.

 Here's a patch for this. Please review.

I don't see how setting indisvalid to false helps with this, because
IIUC when a session sees indisvalid = false, it is supposed to avoid
using the index for queries but still make new index entries when a
write operation happens - but to drop an index, I think you'd need to
get into a state where no one was using the index for anything at all.

Maybe we need to change indisvalid to a char and make it three
valued: c = being created currently, v = valid, d = being dropped
concurrently, or something like that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-16 Thread Peter Eisentraut

On mån, 2012-01-16 at 11:17 -0500, Robert Haas wrote:
 I don't see how setting indisvalid to false helps with this, because
 IIUC when a session sees indisvalid = false, it is supposed to avoid
 using the index for queries but still make new index entries when a
 write operation happens - but to drop an index, I think you'd need to
 get into a state where no one was using the index for anything at all.

ISTM that one would need to set indisready to false instead.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-16 Thread Robert Haas

On Mon, Jan 16, 2012 at 2:06 PM, Peter Eisentraut pete...@gmx.net wrote:
 On mån, 2012-01-16 at 11:17 -0500, Robert Haas wrote:
 I don't see how setting indisvalid to false helps with this, because
 IIUC when a session sees indisvalid = false, it is supposed to avoid
 using the index for queries but still make new index entries when a
 write operation happens - but to drop an index, I think you'd need to
 get into a state where no one was using the index for anything at all.

 ISTM that one would need to set indisready to false instead.

Maybe we should set both to false?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-08 Thread Simon Riggs

On Wed, Jan 4, 2012 at 11:14 AM, Simon Riggs si...@2ndquadrant.com wrote:

 Not having a freelist at all is probably a simpler way of avoiding the
 lock contention, so I'll happily back that suggestion instead. Patch
 attached, previous patch revoked.

v2 attached with cleanup of some random stuff that crept onto patch.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
diff --git a/src/backend/storage/buffer/buf_init.c b/src/backend/storage/buffer/buf_init.c
index 94cefba..9332a74 100644
--- a/src/backend/storage/buffer/buf_init.c
+++ b/src/backend/storage/buffer/buf_init.c
@@ -115,7 +115,7 @@ InitBufferPool(void)
 			 * Initially link all the buffers together as unused. Subsequent
 			 * management of this list is done by freelist.c.
 			 */
-			buf-freeNext = i + 1;
+			StrategyInitFreelistBuffer(buf);
 
 			buf-io_in_progress_lock = LWLockAssign();
 			buf-content_lock = LWLockAssign();
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 3e62448..6b49cae 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -27,6 +27,7 @@ typedef struct
 	/* Clock sweep hand: index of next buffer to consider grabbing */
 	int			nextVictimBuffer;
 
+#ifdef USE_BUFMGR_FREELIST
 	int			firstFreeBuffer;	/* Head of list of unused buffers */
 	int			lastFreeBuffer; /* Tail of list of unused buffers */
 
@@ -34,7 +35,7 @@ typedef struct
 	 * NOTE: lastFreeBuffer is undefined when firstFreeBuffer is -1 (that is,
 	 * when the list is empty)
 	 */
-
+#endif
 	/*
 	 * Statistics.	These counters should be wide enough that they can't
 	 * overflow during a single bgwriter cycle.
@@ -134,6 +135,7 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
 	 */
 	StrategyControl-numBufferAllocs++;
 
+#ifdef USE_BUFMGR_FREELIST
 	/*
 	 * Try to get a buffer from the freelist.  Note that the freeNext fields
 	 * are considered to be protected by the BufFreelistLock not the
@@ -165,8 +167,9 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
 		}
 		UnlockBufHdr(buf);
 	}
+#endif
 
-	/* Nothing on the freelist, so run the clock sweep algorithm */
+	/* Run the clock sweep algorithm */
 	trycounter = NBuffers;
 	for (;;)
 	{
@@ -223,6 +229,7 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
 void
 StrategyFreeBuffer(volatile BufferDesc *buf)
 {
+#ifdef USE_BUFMGR_FREELIST
 	LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE);
 
 	/*
@@ -238,6 +245,24 @@ StrategyFreeBuffer(volatile BufferDesc *buf)
 	}
 
 	LWLockRelease(BufFreelistLock);
+#endif
+}
+
+/*
+ * StrategyInitFreelist: put a buffer on the freelist during InitBufferPool
+ */
+void
+StrategyInitFreelistBuffer(volatile BufferDesc *buf)
+{
+#ifdef USE_BUFMGR_FREELIST
+	/*
+	 * Initially link all the buffers together as unused. Subsequent
+	 * management of this list is done by freelist.c.
+	 */
+	buf-freeNext = i + 1;
+#else
+	buf-freeNext = FREENEXT_NOT_IN_LIST;
+#endif
 }
 
 /*
@@ -331,12 +356,14 @@ StrategyInitialize(bool init)
 		 */
 		Assert(init);
 
+#ifdef USE_BUFMGR_FREELIST
 		/*
 		 * Grab the whole linked list of free buffers for our strategy. We
 		 * assume it was previously set up by InitBufferPool().
 		 */
 		StrategyControl-firstFreeBuffer = 0;
 		StrategyControl-lastFreeBuffer = NBuffers - 1;
+#endif
 
 		/* Initialize the clock sweep pointer */
 		StrategyControl-nextVictimBuffer = 0;
diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index e43719e..6a03d5d 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -190,6 +190,8 @@ extern bool StrategyRejectBuffer(BufferAccessStrategy strategy,
 extern int	StrategySyncStart(uint32 *complete_passes, uint32 *num_buf_alloc);
 extern Size StrategyShmemSize(void);
 extern void StrategyInitialize(bool init);
+extern void StrategyInitFreelistBuffer(volatile BufferDesc *buf);
+
 
 /* buf_table.c */
 extern Size BufTableShmemSize(int size);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-04 Thread Simon Riggs

On Tue, Jan 3, 2012 at 11:28 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Jim Nasby j...@nasby.net writes:
 On Jan 3, 2012, at 12:11 PM, Simon Riggs wrote:
 This could well be related to the fact that DropRelFileNodeBuffers()
 does a scan of shared_buffers, which is an O(N) approach no matter the
 size of the index.

 Couldn't we just leave the buffers alone? Once an index is dropped and 
 that's pushed out through the catalog then nothing should be trying to 
 access them and they'll eventually just get aged out.

 No, we can't, because if they're still dirty then the bgwriter would
 first try to write them to the no-longer-existing storage file.  It's
 important that we kill the buffers immediately during relation drop.

 I'm still thinking that it might be sufficient to mark the buffers
 invalid and let the clock sweep find them, thereby eliminating the need
 for a freelist.

My patch puts things on the freelist only when it is free to do so.
Not having a freelist at all is probably a simpler way of avoiding the
lock contention, so I'll happily back that suggestion instead. Patch
attached, previous patch revoked.

 Simon is after a different solution involving getting
 rid of the clock sweep...

err, No, he isn't. Not sure where that came from since I'm advocating
only minor changes there to curb worst case behaviour.

But lets discuss that on the main freelist thread.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
diff --git a/src/backend/storage/buffer/buf_init.c b/src/backend/storage/buffer/buf_init.c
index 94cefba..9332a74 100644
--- a/src/backend/storage/buffer/buf_init.c
+++ b/src/backend/storage/buffer/buf_init.c
@@ -115,7 +115,7 @@ InitBufferPool(void)
 			 * Initially link all the buffers together as unused. Subsequent
 			 * management of this list is done by freelist.c.
 			 */
-			buf-freeNext = i + 1;
+			StrategyInitFreelistBuffer(buf);
 
 			buf-io_in_progress_lock = LWLockAssign();
 			buf-content_lock = LWLockAssign();
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 3e62448..6b49cae 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -27,6 +27,7 @@ typedef struct
 	/* Clock sweep hand: index of next buffer to consider grabbing */
 	int			nextVictimBuffer;
 
+#ifdef USE_BUFMGR_FREELIST
 	int			firstFreeBuffer;	/* Head of list of unused buffers */
 	int			lastFreeBuffer; /* Tail of list of unused buffers */
 
@@ -34,7 +35,7 @@ typedef struct
 	 * NOTE: lastFreeBuffer is undefined when firstFreeBuffer is -1 (that is,
 	 * when the list is empty)
 	 */
-
+#endif
 	/*
 	 * Statistics.	These counters should be wide enough that they can't
 	 * overflow during a single bgwriter cycle.
@@ -134,6 +135,7 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
 	 */
 	StrategyControl-numBufferAllocs++;
 
+#ifdef USE_BUFMGR_FREELIST
 	/*
 	 * Try to get a buffer from the freelist.  Note that the freeNext fields
 	 * are considered to be protected by the BufFreelistLock not the
@@ -165,8 +167,9 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
 		}
 		UnlockBufHdr(buf);
 	}
+#endif
 
-	/* Nothing on the freelist, so run the clock sweep algorithm */
+	/* Run the clock sweep algorithm */
 	trycounter = NBuffers;
 	for (;;)
 	{
@@ -182,20 +185,25 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
 		 * If the buffer is pinned or has a nonzero usage_count, we cannot use
 		 * it; decrement the usage_count (unless pinned) and keep scanning.
 		 */
-		LockBufHdr(buf);
 		if (buf-refcount == 0)
 		{
-			if (buf-usage_count  0)
+			if (buf-usage_count  StrategyControl-completePasses)
 			{
 buf-usage_count--;
 trycounter = NBuffers;
 			}
 			else
 			{
-/* Found a usable buffer */
-if (strategy != NULL)
-	AddBufferToRing(strategy, buf);
-return buf;
+LockBufHdr(buf);
+if (buf-refcount == 0)
+{
+	UnlockBufHdr(buf);
+	/* Found a usable buffer */
+	if (strategy != NULL)
+		AddBufferToRing(strategy, buf);
+	return buf;
+}
+UnlockBufHdr(buf);
 			}
 		}
 		else if (--trycounter == 0)
@@ -207,10 +215,8 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
 			 * probably better to fail than to risk getting stuck in an
 			 * infinite loop.
 			 */
-			UnlockBufHdr(buf);
 			elog(ERROR, no unpinned buffers available);
 		}
-		UnlockBufHdr(buf);
 	}
 
 	/* not reached */
@@ -223,6 +229,7 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
 void
 StrategyFreeBuffer(volatile BufferDesc *buf)
 {
+#ifdef USE_BUFMGR_FREELIST
 	LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE);
 
 	/*
@@ -238,6 +245,24 @@ StrategyFreeBuffer(volatile BufferDesc *buf)
 	}
 
 	LWLockRelease(BufFreelistLock);
+#endif
+}
+
+/*
+ * StrategyInitFreelist: put a buffer on the freelist during InitBufferPool
+ */
+void

Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-03 Thread Simon Riggs

On Fri, Sep 9, 2011 at 11:02 PM, Daniel Farina dan...@heroku.com wrote:
 On Wed, Aug 24, 2011 at 1:04 PM, Daniel Farina dan...@heroku.com wrote:
 On Wed, Aug 24, 2011 at 12:38 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Assuming the issue really is the physical unlinks (which I agree I'd
 like to see some evidence for), I wonder whether the problem could be
 addressed by moving smgrDoPendingDeletes() to after locks are released,
 instead of before, in CommitTransaction/AbortTransaction.  There does
 not seem to be any strong reason why we have to do that before lock
 release, since incoming potential users of a table should not be trying
 to access the old physical storage after that anyway.

 Alright, since this concern about confirming the expensive part of
 index dropping has come up a few times but otherwise the waters are
 warm, I'll go ahead and do some work to pin things down a bit before
 we continue working on those assumptions.


 This suspicion seems to be proven correct; there came an opportunity
 where we were removing some indexes on a live system and I took the
 opportunity to carefully control and time the process.  There's not
 much relationship between size of the index and the delay, but the
 pauses are still very real. On the other hand, the first time this was
 noticed there was significantly higher load.

 I'd still like to do something to solve this problem, though: even if
 the time-consuming part of the process is not file unlinking, it's
 clearly something after the AccessExclusiveLock is acquired based on
 our other measurements.

This could well be related to the fact that DropRelFileNodeBuffers()
does a scan of shared_buffers, which is an O(N) approach no matter the
size of the index.

On top of that, taking what Robert Haas mentioned on another thread,
InvalidateBuffer currently calls StretegyFreeBuffer(), which waits for
an ExclusiveLock on the BufFreelistLock. On a busy system this will be
heavily contended, so adding blocks to the freelist only if the lock
is free seems warranted.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 3e62448..af7215f 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -218,12 +218,21 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
 }
 
 /*
- * StrategyFreeBuffer: put a buffer on the freelist
+ * StrategyFreeBuffer: put a buffer on the freelist, unless we're busy
  */
 void
 StrategyFreeBuffer(volatile BufferDesc *buf)
 {
-	LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE);
+	/*
+	 * The buffer is already invalidated and is now an allocation target.
+	 * Adding buffers back onto the freelist is an optimisation only,
+	 * so we can decide to skip this step if the lock is busy.
+	 * This improves the speed of dropping indexes and tables on a busy system.
+	 * If the system is busy the newly invalidated buffers will be reallocated
+	 * within one clock sweep.
+	 */
+	if (!LWLockConditionalAcquire(BufFreelistLock, LW_EXCLUSIVE))
+		return;
 
 	/*
 	 * It is possible that we are told to put something in the freelist that

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-03 Thread Jim Nasby

On Jan 3, 2012, at 12:11 PM, Simon Riggs wrote:
 This could well be related to the fact that DropRelFileNodeBuffers()
 does a scan of shared_buffers, which is an O(N) approach no matter the
 size of the index.
 
 On top of that, taking what Robert Haas mentioned on another thread,
 InvalidateBuffer currently calls StretegyFreeBuffer(), which waits for
 an ExclusiveLock on the BufFreelistLock. On a busy system this will be
 heavily contended, so adding blocks to the freelist only if the lock
 is free seems warranted.

Couldn't we just leave the buffers alone? Once an index is dropped and that's 
pushed out through the catalog then nothing should be trying to access them and 
they'll eventually just get aged out.

In fact, IIRC the function that scans for buffers actually checks to see if a 
rel still exists before it returns the buffer...
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-03 Thread Tom Lane

Jim Nasby j...@nasby.net writes:
 On Jan 3, 2012, at 12:11 PM, Simon Riggs wrote:
 This could well be related to the fact that DropRelFileNodeBuffers()
 does a scan of shared_buffers, which is an O(N) approach no matter the
 size of the index.

 Couldn't we just leave the buffers alone? Once an index is dropped and that's 
 pushed out through the catalog then nothing should be trying to access them 
 and they'll eventually just get aged out.

No, we can't, because if they're still dirty then the bgwriter would
first try to write them to the no-longer-existing storage file.  It's
important that we kill the buffers immediately during relation drop.

I'm still thinking that it might be sufficient to mark the buffers
invalid and let the clock sweep find them, thereby eliminating the need
for a freelist.  Simon is after a different solution involving getting
rid of the clock sweep, but he has failed to explain how that's not
going to end up being the same type of contention-prone coding that we
got rid of by adopting the clock sweep, some years ago.  Yeah, the sweep
takes a lot of spinlocks, but that only matters if there is contention
for them, and the sweep approach avoids the need for a centralized data
structure.

(BTW, do we have a separate clock sweep hand for each backend?  If not,
there might be some low hanging fruit there.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-03 Thread Jim Nasby

On Jan 3, 2012, at 5:28 PM, Tom Lane wrote:
 Jim Nasby j...@nasby.net writes:
 On Jan 3, 2012, at 12:11 PM, Simon Riggs wrote:
 This could well be related to the fact that DropRelFileNodeBuffers()
 does a scan of shared_buffers, which is an O(N) approach no matter the
 size of the index.
 
 Couldn't we just leave the buffers alone? Once an index is dropped and 
 that's pushed out through the catalog then nothing should be trying to 
 access them and they'll eventually just get aged out.
 
 No, we can't, because if they're still dirty then the bgwriter would
 first try to write them to the no-longer-existing storage file.  It's
 important that we kill the buffers immediately during relation drop.
 
 I'm still thinking that it might be sufficient to mark the buffers
 invalid and let the clock sweep find them, thereby eliminating the need
 for a freelist.  Simon is after a different solution involving getting
 rid of the clock sweep, but he has failed to explain how that's not
 going to end up being the same type of contention-prone coding that we
 got rid of by adopting the clock sweep, some years ago.  Yeah, the sweep
 takes a lot of spinlocks, but that only matters if there is contention
 for them, and the sweep approach avoids the need for a centralized data
 structure.

Yeah, but the problem we run into is that with every backend trying to run the 
clock on it's own we end up with high contention again... it's just in a 
different place than when we had a true LRU. The clock sweep might be cheaper 
than the linked list was, but it's still awfully expensive. I believe our best 
bet is to have a free list that is actually useful in normal operations, and 
then optimize the cost of pulling buffers out of that list as much as possible 
(and let the bgwriter deal with keeping enough pages in that list to satisfy 
demand).

Heh, it occurs to me that the SQL analogy for how things work right now is that 
backends currently have to run a SeqScan (or 5) to find a free page... what we 
need to do is CREATE INDEX free ON buffers(buffer_id) WHERE count = 0;.

 (BTW, do we have a separate clock sweep hand for each backend?  If not,
 there might be some low hanging fruit there.)

No... having multiple clock hands is an interesting idea, but I'm worried that 
it could potentially get us into trouble if scores of backends were suddenly 
decrementing usage counts all over the place. For example, what if 5 backends 
all had their hands in basically the same place, all pointing at a very heavily 
used buffer. All 5 backends go for free space, they each grab the spinlock on 
that buffer in succession and suddenly this highly used buffer that started 
with a count of 5 has now been freed. We could potentially use more than one 
hand, but I think the relation between the number of hands and the maximum 
usage count has to be tightly controlled.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-03 Thread Tom Lane

Jim Nasby j...@nasby.net writes:
 Yeah, but the problem we run into is that with every backend trying to run 
 the clock on it's own we end up with high contention again... it's just in a 
 different place than when we had a true LRU. The clock sweep might be cheaper 
 than the linked list was, but it's still awfully expensive. I believe our 
 best bet is to have a free list that is actually useful in normal operations, 
 and then optimize the cost of pulling buffers out of that list as much as 
 possible (and let the bgwriter deal with keeping enough pages in that list to 
 satisfy demand).

Well, maybe, but I think the historical evidence suggests that that
approach will be a loser, simply because no matter how cheap, the
freelist will remain a centralized and heavily contended data structure.
IMO we need to be looking for a mechanism that has no single point of
contention, and modifying the clock sweep rules looks like the best way
to get there.

Still, he who wants to do the work can try whatever approach he feels
like.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-03 Thread Robert Haas

On Tue, Jan 3, 2012 at 7:11 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Jim Nasby j...@nasby.net writes:
 Yeah, but the problem we run into is that with every backend trying to run 
 the clock on it's own we end up with high contention again... it's just in a 
 different place than when we had a true LRU. The clock sweep might be 
 cheaper than the linked list was, but it's still awfully expensive. I 
 believe our best bet is to have a free list that is actually useful in 
 normal operations, and then optimize the cost of pulling buffers out of that 
 list as much as possible (and let the bgwriter deal with keeping enough 
 pages in that list to satisfy demand).

 Well, maybe, but I think the historical evidence suggests that that
 approach will be a loser, simply because no matter how cheap, the
 freelist will remain a centralized and heavily contended data structure.
 IMO we need to be looking for a mechanism that has no single point of
 contention, and modifying the clock sweep rules looks like the best way
 to get there.

 Still, he who wants to do the work can try whatever approach he feels
 like.

It might be possible to partition the free list.  So imagine, for
example, 8 free lists.  The background writer runs the clock sweep and
finds some buffers that are about ready to be reallocated and
distributes one-eighth of them to each free list.  Then, when a
backend wants to allocate a buffer, it picks a free list somehow
(round robin?) and pulls a buffer off it.  If the buffer turns out to
have been used since it was added to the free list, we give up on it
and try again.  This hopefully shouldn't happen too often, though, as
long as we only add enough buffers to the free list to satisfy the
requests that we expect to get over the next
small-fraction-of-a-second.

Of course you have to think about what happens if you find that your
chosen free list is empty.  In that case you probably want to try a
different one, and if in the worst case where they're all empty, run
the clock sweep in the foreground.  You probably also want to kick the
background writer in the pants if even a single one is empty (or
nearly empty?) and tell it to hurry up and add some more buffers to
the freelist.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-03 Thread Jim Nasby

On Jan 3, 2012, at 7:34 PM, Robert Haas wrote:
 On Tue, Jan 3, 2012 at 7:11 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Jim Nasby j...@nasby.net writes:
 Yeah, but the problem we run into is that with every backend trying to run 
 the clock on it's own we end up with high contention again... it's just in 
 a different place than when we had a true LRU. The clock sweep might be 
 cheaper than the linked list was, but it's still awfully expensive. I 
 believe our best bet is to have a free list that is actually useful in 
 normal operations, and then optimize the cost of pulling buffers out of 
 that list as much as possible (and let the bgwriter deal with keeping 
 enough pages in that list to satisfy demand).
 
 Well, maybe, but I think the historical evidence suggests that that
 approach will be a loser, simply because no matter how cheap, the
 freelist will remain a centralized and heavily contended data structure.
 IMO we need to be looking for a mechanism that has no single point of
 contention, and modifying the clock sweep rules looks like the best way
 to get there.
 
 Still, he who wants to do the work can try whatever approach he feels
 like.
 
 It might be possible to partition the free list.  So imagine, for
 example, 8 free lists.  The background writer runs the clock sweep and
 finds some buffers that are about ready to be reallocated and
 distributes one-eighth of them to each free list.  Then, when a
 backend wants to allocate a buffer, it picks a free list somehow
 (round robin?) and pulls a buffer off it.  If the buffer turns out to
 have been used since it was added to the free list, we give up on it
 and try again.  This hopefully shouldn't happen too often, though, as
 long as we only add enough buffers to the free list to satisfy the
 requests that we expect to get over the next
 small-fraction-of-a-second.
 
 Of course you have to think about what happens if you find that your
 chosen free list is empty.  In that case you probably want to try a
 different one, and if in the worst case where they're all empty, run
 the clock sweep in the foreground.  You probably also want to kick the
 background writer in the pants if even a single one is empty (or
 nearly empty?) and tell it to hurry up and add some more buffers to
 the freelist.

If it comes down to it, we can look at partitioning the free list. But here's 
the thing: this is the strategy that FreeBSD (and I think now Linux as well) 
use to service memory requests, be they for free memory or for reading data 
from disk. If it's good enough for an OS to use, I would expect we could make 
it work as well.

I would expect that pulling a page off the free list would be an extremely fast 
operation... lock the read lock on the list (we should probably have separate 
locks for putting stuff on the list vs taking it off), pin the buffer (which 
shouldn't be contentious), update the read pointer and drop the read lock. Even 
in C code that's probably less than 100 machine instructions, and no looping. 
Compared to the cost of running a clock sweep it should be significantly 
faster. So while there will be contention on the free list read lock, none of 
that contention should last very long.

Do we have any other places where we take extremely short locks like this and 
still run into problems?
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2011-12-31 Thread Simon Riggs

On Fri, Dec 30, 2011 at 10:20 PM, Peter Eisentraut pete...@gmx.net wrote:
 On ons, 2011-08-24 at 11:24 -0700, Daniel Farina wrote:
 I was poking around at tablecmds and index.c and wonder if a similar
 two-pass approach as used by CREATE INDEX CONCURRENTLY can be used to
 create a DROP INDEX CONCURRENTLY, and if there would be any interest
 in accepting such a patch.

 Hmm, it seems I just independently came up with this same concept.  My
 problem is that if a CREATE INDEX CONCURRENTLY fails, you need an
 exclusive lock on the table just to clean that up.  If the table is
 under constant load, you can't easily do that.  So a two-pass DROP INDEX
 CONCURRENTLY might have been helpful for me.

Here's a patch for this. Please review.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
diff --git a/doc/src/sgml/ref/drop_index.sgml b/doc/src/sgml/ref/drop_index.sgml
index 7177ef2..a5abb5b 100644
--- a/doc/src/sgml/ref/drop_index.sgml
+++ b/doc/src/sgml/ref/drop_index.sgml
@@ -21,7 +21,9 @@ PostgreSQL documentation
 
  refsynopsisdiv
 synopsis
-DROP INDEX [ IF EXISTS ] replaceable class=PARAMETERname/replaceable [, ...] [ CASCADE | RESTRICT ]
+DROP INDEX
+	[ IF EXISTS ] replaceable class=PARAMETERname/replaceable [, ...] [ CASCADE | RESTRICT ]
+	CONCURRENTLY replaceable class=PARAMETERname/replaceable
 /synopsis
  /refsynopsisdiv
 
@@ -50,6 +52,30 @@ DROP INDEX [ IF EXISTS ] replaceable class=PARAMETERname/replaceable [, ..
/varlistentry
 
varlistentry
+termliteralCONCURRENTLY/literal/term
+listitem
+ para
+  When this option is used, productnamePostgreSQL/ will drop the
+  index without taking any locks that prevent concurrent selects, inserts,
+  updates, or deletes on the table; whereas a standard index drop
+  waits for a lock that locks out everything on the table until it's done.
+  Concurrent drop index is a two stage process. First, we mark the index
+  invalid and then commit the change. Next we wait until there are no
+  users locking the table who can see the invalid index.
+ /para
+ para
+  There are several caveats to be aware of when using this option.
+  Only one index name can be specified if the literalCONCURRENTLY/literal
+  parameter is specified. Only one concurrent index drop can occur on a
+  table at a time and no modifications on the table are allowed meanwhile.
+  Regular commandDROP INDEX/ command can be performed within
+  a transaction block, but commandDROP INDEX CONCURRENTLY/ cannot.
+  There is no CASCADE option when dropping an index concurrently.
+ /para
+/listitem
+   /varlistentry
+
+   varlistentry
 termreplaceable class=PARAMETERname/replaceable/term
 listitem
  para
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index 223c097..656af21 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -171,8 +171,9 @@ static void reportDependentObjects(const ObjectAddresses *targetObjects,
 	   DropBehavior behavior,
 	   int msglevel,
 	   const ObjectAddress *origObject);
-static void deleteOneObject(const ObjectAddress *object, Relation depRel);
-static void doDeletion(const ObjectAddress *object);
+static void deleteOneObject(const ObjectAddress *object, Relation depRel,
+			int option_flags);
+static void doDeletion(const ObjectAddress *object, int option_flags);
 static void AcquireDeletionLock(const ObjectAddress *object);
 static void ReleaseDeletionLock(const ObjectAddress *object);
 static bool find_expr_references_walker(Node *node,
@@ -254,7 +255,7 @@ performDeletion(const ObjectAddress *object,
 	{
 		ObjectAddress *thisobj = targetObjects-refs + i;
 
-		deleteOneObject(thisobj, depRel);
+		deleteOneObject(thisobj, depRel, 0);
 	}
 
 	/* And clean up */
@@ -276,6 +277,13 @@ void
 performMultipleDeletions(const ObjectAddresses *objects,
 		 DropBehavior behavior)
 {
+	performMultipleDeletionsWithOptions(objects, behavior, 0);
+}
+
+void
+performMultipleDeletionsWithOptions(const ObjectAddresses *objects,
+		 DropBehavior behavior, int option_flags)
+{
 	Relation	depRel;
 	ObjectAddresses *targetObjects;
 	int			i;
@@ -336,13 +344,17 @@ performMultipleDeletions(const ObjectAddresses *objects,
 	{
 		ObjectAddress *thisobj = targetObjects-refs + i;
 
-		deleteOneObject(thisobj, depRel);
+		deleteOneObject(thisobj, depRel, option_flags);
 	}
 
 	/* And clean up */
 	free_object_addresses(targetObjects);
 
-	heap_close(depRel, RowExclusiveLock);
+	/*
+	 * We closed depRel earlier if doing a drop concurrently
+	 */
+	if ((option_flags  OPT_CONCURRENTLY) != OPT_CONCURRENTLY)
+		heap_close(depRel, RowExclusiveLock);
 }
 
 /*
@@ -407,7 +419,7 @@ deleteWhatDependsOn(const ObjectAddress *object,
 		if (thisextra-flags  DEPFLAG_ORIGINAL)
 			continue;
 
-		deleteOneObject(thisobj, depRel);
+		deleteOneObject(thisobj, depRel,

Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2011-12-30 Thread Peter Eisentraut

On ons, 2011-08-24 at 11:24 -0700, Daniel Farina wrote:
 At Heroku we use CREATE INDEX CONCURRENTLY with great success, but
 recently when frobbing around some indexes I realized that there is no
 equivalent for DROP INDEX, and this is a similar but lesser problem
 (as CREATE INDEX takes much longer), as DROP INDEX takes an ACCESS
 EXCLUSIVE lock on the parent table while doing the work to unlink
 files, which nominally one would think to be trivial, but I assure you
 it is not at times for even indexes that are a handful of gigabytes
 (let's say ~= a dozen).  By non-trivial, I mean it can take 30+
 seconds, but less than a couple of minutes.  The storage layer
 (starting from the higher levels of abstraction) are XFS, a somewhat
 trivial lvm setup, mdraid (8-ways), Amazon EBS (NBD?).
 
 I was poking around at tablecmds and index.c and wonder if a similar
 two-pass approach as used by CREATE INDEX CONCURRENTLY can be used to
 create a DROP INDEX CONCURRENTLY, and if there would be any interest
 in accepting such a patch.

Hmm, it seems I just independently came up with this same concept.  My
problem is that if a CREATE INDEX CONCURRENTLY fails, you need an
exclusive lock on the table just to clean that up.  If the table is
under constant load, you can't easily do that.  So a two-pass DROP INDEX
CONCURRENTLY might have been helpful for me.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2011-09-09 Thread Daniel Farina

On Wed, Aug 24, 2011 at 1:04 PM, Daniel Farina dan...@heroku.com wrote:
 On Wed, Aug 24, 2011 at 12:38 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Assuming the issue really is the physical unlinks (which I agree I'd
 like to see some evidence for), I wonder whether the problem could be
 addressed by moving smgrDoPendingDeletes() to after locks are released,
 instead of before, in CommitTransaction/AbortTransaction.  There does
 not seem to be any strong reason why we have to do that before lock
 release, since incoming potential users of a table should not be trying
 to access the old physical storage after that anyway.

 Alright, since this concern about confirming the expensive part of
 index dropping has come up a few times but otherwise the waters are
 warm, I'll go ahead and do some work to pin things down a bit before
 we continue working on those assumptions.


This suspicion seems to be proven correct; there came an opportunity
where we were removing some indexes on a live system and I took the
opportunity to carefully control and time the process.  There's not
much relationship between size of the index and the delay, but the
pauses are still very real. On the other hand, the first time this was
noticed there was significantly higher load.

I'd still like to do something to solve this problem, though: even if
the time-consuming part of the process is not file unlinking, it's
clearly something after the AccessExclusiveLock is acquired based on
our other measurements.

Back to the drawing board...

-- 
fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2011-08-24 Thread Daniel Farina

Hello list,

At Heroku we use CREATE INDEX CONCURRENTLY with great success, but
recently when frobbing around some indexes I realized that there is no
equivalent for DROP INDEX, and this is a similar but lesser problem
(as CREATE INDEX takes much longer), as DROP INDEX takes an ACCESS
EXCLUSIVE lock on the parent table while doing the work to unlink
files, which nominally one would think to be trivial, but I assure you
it is not at times for even indexes that are a handful of gigabytes
(let's say ~= a dozen).  By non-trivial, I mean it can take 30+
seconds, but less than a couple of minutes.  The storage layer
(starting from the higher levels of abstraction) are XFS, a somewhat
trivial lvm setup, mdraid (8-ways), Amazon EBS (NBD?).

I was poking around at tablecmds and index.c and wonder if a similar
two-pass approach as used by CREATE INDEX CONCURRENTLY can be used to
create a DROP INDEX CONCURRENTLY, and if there would be any interest
in accepting such a patch.

Quoth index.c:

/*
 * To drop an index safely, we must grab exclusive lock on its parent
 * table.  Exclusive lock on the index alone is insufficient because
 * another backend might be about to execute a query on the parent 
table.
 * If it relies on a previously cached list of index OIDs, then it could
 * attempt to access the just-dropped index.  We must therefore take a
 * table lock strong enough to prevent all queries on the table from
 * proceeding until we commit and send out a shared-cache-inval notice
 * that will make them update their index lists.
 */

Could I make the ACCESS EXCLUSIVE section just long enough to commit
catalog updates, and then have the bulk of the work happen afterwards?

The general idea is:

1) set an index as invalid, to ensure no backend will use it in planning
2) wait for the xmin horizon to advance to ensure no open snapshots
that may not see the invalidation of the index are gone (is there a
way to tighten that up? although even this conservative version would
be 80-90% of the value for us...)
3) then use performDeletions without taking a lock on the parent
table, similar to what's in tablecmds.c already.

A DROP INDEX CONCURRENTLY may leave an invalid index if aborted
instead of waiting for statement confirmation, just like CREATE INDEX
CONCURRENTLY.

--
fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2011-08-24 Thread Merlin Moncure

On Wed, Aug 24, 2011 at 1:24 PM, Daniel Farina dan...@heroku.com wrote:
 Hello list,

 At Heroku we use CREATE INDEX CONCURRENTLY with great success, but
 recently when frobbing around some indexes I realized that there is no
 equivalent for DROP INDEX, and this is a similar but lesser problem
 (as CREATE INDEX takes much longer), as DROP INDEX takes an ACCESS
 EXCLUSIVE lock on the parent table while doing the work to unlink
 files, which nominally one would think to be trivial, but I assure you
 it is not at times for even indexes that are a handful of gigabytes
 (let's say ~= a dozen).  By non-trivial, I mean it can take 30+
 seconds, but less than a couple of minutes.  The storage layer
 (starting from the higher levels of abstraction) are XFS, a somewhat
 trivial lvm setup, mdraid (8-ways), Amazon EBS (NBD?).

Are you sure that you are really waiting on the time to unlink the
file?  there's other stuff going on in there like waiting for lock,
plan invalidation, etc.  Point being, maybe the time consuming stuff
can't really be deferred which would make the proposal moot.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2011-08-24 Thread Robert Haas

On Wed, Aug 24, 2011 at 2:24 PM, Daniel Farina dan...@heroku.com wrote:
 Could I make the ACCESS EXCLUSIVE section just long enough to commit
 catalog updates, and then have the bulk of the work happen afterwards?

 The general idea is:

 1) set an index as invalid, to ensure no backend will use it in planning
 2) wait for the xmin horizon to advance to ensure no open snapshots
 that may not see the invalidation of the index are gone (is there a
 way to tighten that up? although even this conservative version would
 be 80-90% of the value for us...)
 3) then use performDeletions without taking a lock on the parent
 table, similar to what's in tablecmds.c already.

 A DROP INDEX CONCURRENTLY may leave an invalid index if aborted
 instead of waiting for statement confirmation, just like CREATE INDEX
 CONCURRENTLY.

This might be a dumb idea, but could we rearrange CommitTransaction()
so that smgrDoPendingDeletes() happens just a bit further down, after
those ResourceOwnerRelease() calls?  It seems like that might
accomplish what you're trying to do here without needing a new
command.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2011-08-24 Thread Tom Lane

Merlin Moncure mmonc...@gmail.com writes:
 On Wed, Aug 24, 2011 at 1:24 PM, Daniel Farina dan...@heroku.com wrote:
 At Heroku we use CREATE INDEX CONCURRENTLY with great success, but
 recently when frobbing around some indexes I realized that there is no
 equivalent for DROP INDEX, and this is a similar but lesser problem
 (as CREATE INDEX takes much longer), as DROP INDEX takes an ACCESS
 EXCLUSIVE lock on the parent table while doing the work to unlink
 files, which nominally one would think to be trivial, but I assure you
 it is not at times for even indexes that are a handful of gigabytes
 (let's say ~= a dozen).

 Are you sure that you are really waiting on the time to unlink the
 file?  there's other stuff going on in there like waiting for lock,
 plan invalidation, etc.  Point being, maybe the time consuming stuff
 can't really be deferred which would make the proposal moot.

Assuming the issue really is the physical unlinks (which I agree I'd
like to see some evidence for), I wonder whether the problem could be
addressed by moving smgrDoPendingDeletes() to after locks are released,
instead of before, in CommitTransaction/AbortTransaction.  There does
not seem to be any strong reason why we have to do that before lock
release, since incoming potential users of a table should not be trying
to access the old physical storage after that anyway.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2011-08-24 Thread Daniel Farina

On Wed, Aug 24, 2011 at 12:38 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Merlin Moncure mmonc...@gmail.com writes:
 On Wed, Aug 24, 2011 at 1:24 PM, Daniel Farina dan...@heroku.com wrote:
 At Heroku we use CREATE INDEX CONCURRENTLY with great success, but
 recently when frobbing around some indexes I realized that there is no
 equivalent for DROP INDEX, and this is a similar but lesser problem
 (as CREATE INDEX takes much longer), as DROP INDEX takes an ACCESS
 EXCLUSIVE lock on the parent table while doing the work to unlink
 files, which nominally one would think to be trivial, but I assure you
 it is not at times for even indexes that are a handful of gigabytes
 (let's say ~= a dozen).

 Are you sure that you are really waiting on the time to unlink the
 file?  there's other stuff going on in there like waiting for lock,
 plan invalidation, etc.  Point being, maybe the time consuming stuff
 can't really be deferred which would make the proposal moot.

 Assuming the issue really is the physical unlinks (which I agree I'd
 like to see some evidence for), I wonder whether the problem could be
 addressed by moving smgrDoPendingDeletes() to after locks are released,
 instead of before, in CommitTransaction/AbortTransaction.  There does
 not seem to be any strong reason why we have to do that before lock
 release, since incoming potential users of a table should not be trying
 to access the old physical storage after that anyway.

Alright, since this concern about confirming the expensive part of
index dropping has come up a few times but otherwise the waters are
warm, I'll go ahead and do some work to pin things down a bit before
we continue working on those assumptions.

-- 
fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

47 matches

Mail list logo