On 16.05.2013 00:18, Robert Haas wrote:
On Wed, May 15, 2013 at 11:35 AM, Heikki Linnakangas
<hlinnakan...@vmware.com>  wrote:
Shared memory space is limited, but we only need the watermarks for any
in-progress truncations. Let's keep them in shared memory, in a small
fixed-size array. That limits the number of concurrent truncations that can
be in-progress, but that should be ok.

Would it only limit the number of concurrent transactions that can be
in progress *due to vacuum*?  Or would it limit the TOTAL number of
concurrent truncations?  Because a table could have arbitrarily
many inheritance children, and you might try to truncate the whole
thing at once...

It would only limit the number of concurrent *truncations*. Vacuums in general would not count; only a vacuum that has reached the end of the vacuum process and is trying to truncate the heap would count.

To not slow down common backend
operations, the values (or lack thereof) are cached in relcache. To sync the
relcache when the values change, there will be a new shared cache
invalidation event to force backends to refresh the cached watermark values.
A backend (vacuum) can ensure that all backends see the new value by first
updating the value in shared memory, sending the sinval message, and waiting
until everyone has received it.

AFAIK, the sinval mechanism isn't really well-designed to ensure that
these kinds of notifications arrive in a timely fashion.  There's no
particular bound on how long you might have to wait.  Pretty much all
inner loops have CHECK_FOR_INTERRUPTS(), but they definitely do not
all have AcceptInvalidationMessages(), nor would that be safe or
practical.  The sinval code sends catchup interrupts, but only for the
purpose of preventing sinval overflow, not for timely receipt.

Currently, vacuum will have to wait for all transactions that have touched the relation to finish, to get the AccessExclusiveLock. If we don't change anything in the sinval mechanism, the wait would be similar - until all currently in-progress transactions have finished. It's not quite the same; you'd have to wait for all in-progress transactions to finish, not only those that have actually touched the relation. But on the plus side, you would not block new transactions from accessing the relation, so it's not too bad if it takes a long time.
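
Just to illustrate what I mean by that wait: it could be done the same way CREATE INDEX CONCURRENTLY waits out old transactions, with GetCurrentVirtualXIDs and VirtualXactLock. A rough sketch (those two functions exist today, the wrapper around them is hypothetical):

#include "postgres.h"
#include "access/transam.h"
#include "storage/lmgr.h"
#include "storage/lock.h"
#include "storage/procarray.h"

/*
 * Rough sketch only: wait for all transactions currently in progress to
 * commit or abort, like CREATE INDEX CONCURRENTLY does.  This wrapper
 * function itself is hypothetical.
 */
static void
WaitForInProgressTransactions(void)
{
	VirtualTransactionId *vxids;
	int			nvxids;
	int			i;

	/* Collect the vxids of all transactions currently in progress. */
	vxids = GetCurrentVirtualXIDs(InvalidTransactionId,
								  false,	/* include backends with no xmin */
								  false,	/* only our own database */
								  0,		/* don't exclude any vacuums */
								  &nvxids);

	/* Sleep until each of them has finished. */
	for (i = 0; i < nvxids; i++)
	{
		if (VirtualTransactionIdIsValid(vxids[i]))
			VirtualXactLock(vxids[i], true);
	}
}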

If we could use the catchup interrupts to speed that up though, that would be much better. I think vacuum could simply send a catchup interrupt, and wait until everyone has caught up. That would significantly increase the traffic in the sinval queue and the number of catchup interrupts, compared to what we have today, but I think it would still be ok. It would still only be a few sinval messages and catchup interrupts per truncation (ie. per vacuum).
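
Spelling out the whole sequence in vacuum, it would look something like this (all of the helper functions below are made up, they just name the steps):

/*
 * Hypothetical sketch only: none of these helpers exist, they just name
 * the steps of the protocol described above.
 */
static void
lazy_truncate_with_watermark(Relation onerel, BlockNumber new_rel_pages)
{
	/* 1. Publish the watermark in the shared memory array. */
	SetTruncationWatermark(RelationGetRelid(onerel), new_rel_pages);

	/*
	 * 2. Send the new kind of sinval message, so that backends refresh
	 * the watermark value cached in their relcache entries.
	 */
	SendTruncationWatermarkInval(RelationGetRelid(onerel));

	/*
	 * 3. Wait until every backend has processed the sinval queue past
	 * that message.  Sending catchup interrupts here keeps the wait
	 * short; otherwise we'd wait until all in-progress transactions end.
	 */
	WaitForSinvalCatchup();

	/*
	 * 4. Everyone now sees the watermark.  The rest of the truncation -
	 * checking that the pages above the watermark are still unused, and
	 * the actual truncation of the heap - would go here.
	 */

	/* 5. Finally, clear the watermark so the slot can be reused. */
	ClearTruncationWatermark(RelationGetRelid(onerel));
}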

Another problem is that sinval resets are bad for performance, and
anything we do that pushes more messages through sinval will increase
the frequency of resets.  Now if those operations are things that
are relatively uncommon, it's not worth worrying about - but if it's
something that happens on every relation extension, I think that's
likely to cause problems.

It would not be on every relation extension, only on truncation.

With the watermarks, truncation works like this:

1. Set soft watermark to the point where we think we can truncate the
relation. Wait until everyone sees it (send sinval message, wait).

I'm also concerned about how you plan to synchronize access to this
shared memory arena.

I was thinking of a simple lwlock, or perhaps one lwlock per slot in the arena. It would not be accessed very frequently, because the watermark values would be cached in the relcache. It would only need to be accessed when:

1. Truncating the relation, by vacuum, to set the watermark values.
2. By backends, to update the relcache, when they receive the sinval message sent by vacuum.
3. By backends, when writing above the cached watermark value. IOW, when extending a relation that's being truncated at the same time.

In particular, it would definitely not be accessed every time a backend currently needs to do an lseek. Nor every time a backend needs to extend a relation.
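
To be concrete, the shared memory structure I'm imagining is along these lines (the names and the number of slots are made up):

#include "postgres.h"
#include "storage/block.h"
#include "storage/lwlock.h"

/*
 * Sketch of the shared memory watermark array.  The names and the slot
 * count are made up; the point is just a small fixed-size array with one
 * lwlock per slot.
 */
#define MAX_CONCURRENT_TRUNCATIONS 8

typedef struct TruncationWatermarkSlot
{
	LWLockId	lock;			/* protects the fields below */
	Oid			reloid;			/* relation being truncated, or InvalidOid */
	BlockNumber soft_watermark;	/* point vacuum is trying to truncate to */
} TruncationWatermarkSlot;

typedef struct TruncationWatermarkArray
{
	TruncationWatermarkSlot slots[MAX_CONCURRENT_TRUNCATIONS];
} TruncationWatermarkArray;

static TruncationWatermarkArray *truncationWatermarks;	/* in shared memory */

/*
 * Look up the current watermark for a relation.  A backend would call
 * this only when it's about to write above the value cached in its
 * relcache entry, so it should be rare.
 */
static BlockNumber
GetTruncationWatermark(Oid reloid)
{
	BlockNumber result = InvalidBlockNumber;
	int			i;

	for (i = 0; i < MAX_CONCURRENT_TRUNCATIONS; i++)
	{
		TruncationWatermarkSlot *slot = &truncationWatermarks->slots[i];

		LWLockAcquire(slot->lock, LW_SHARED);
		if (slot->reloid == reloid)
			result = slot->soft_watermark;
		LWLockRelease(slot->lock);

		if (result != InvalidBlockNumber)
			break;
	}
	return result;
}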

- Heikki

