On Wed, May 15, 2013 at 11:35 AM, Heikki Linnakangas <hlinnakan...@vmware.com> wrote:
> Shared memory space is limited, but we only need the watermarks for any
> in-progress truncations. Let's keep them in shared memory, in a small
> fixed-size array. That limits the number of concurrent truncations that can
> be in-progress, but that should be ok.
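
For concreteness, here's a minimal sketch of what such a fixed-size array might look like. Every name in it is hypothetical, and a pthread mutex stands in for whatever lock the real shared-memory version would use:

#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_CONCURRENT_TRUNCATIONS 8    /* hard cap on in-progress truncations */

typedef struct TruncationWatermark
{
    bool        in_use;
    uint32_t    relid;              /* relation being truncated */
    uint32_t    soft_watermark;     /* "point where we think we can truncate" */
    uint32_t    hard_watermark;     /* later, firmer cutoff (hypothetical here) */
} TruncationWatermark;

static TruncationWatermark watermarks[MAX_CONCURRENT_TRUNCATIONS];
static pthread_mutex_t watermark_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Claim a slot for a new truncation.  Returns false when every slot is
 * already in use, i.e. the "limits the number of concurrent truncations"
 * case from the quoted design.
 */
bool
watermark_claim(uint32_t relid, uint32_t soft, uint32_t hard)
{
    bool        claimed = false;

    pthread_mutex_lock(&watermark_lock);
    for (int i = 0; i < MAX_CONCURRENT_TRUNCATIONS; i++)
    {
        if (!watermarks[i].in_use)
        {
            watermarks[i].in_use = true;
            watermarks[i].relid = relid;
            watermarks[i].soft_watermark = soft;
            watermarks[i].hard_watermark = hard;
            claimed = true;
            break;
        }
    }
    pthread_mutex_unlock(&watermark_lock);
    return claimed;
}

When every slot is busy, watermark_claim() just reports failure, which is where the question below about tables with many inheritance children comes in.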
Would it only limit the number of concurrent truncations that can be in progress *due to vacuum*? Or would it limit the total number of concurrent truncations? Because a table could have arbitrarily many inheritance children, and you might try to truncate the whole thing at once...

> To not slow down common backend
> operations, the values (or lack thereof) are cached in relcache. To sync the
> relcache when the values change, there will be a new shared cache
> invalidation event to force backends to refresh the cached watermark values.
> A backend (vacuum) can ensure that all backends see the new value by first
> updating the value in shared memory, sending the sinval message, and waiting
> until everyone has received it.

AFAIK, the sinval mechanism isn't really well-designed to ensure that these kinds of notifications arrive in a timely fashion. There's no particular bound on how long you might have to wait. Pretty much all inner loops have CHECK_FOR_INTERRUPTS(), but they definitely do not all have AcceptInvalidationMessages(), nor would that be safe or practical. The sinval code sends catchup interrupts, but only for the purpose of preventing sinval overflow, not for timely receipt.

Another problem is that sinval resets are bad for performance, and anything we do that pushes more messages through sinval will increase the frequency of resets. Now, if those operations are relatively uncommon, it's not worth worrying about - but if it's something that happens on every relation extension, I think that's likely to cause problems. That could lead to wrapping the sinval queue around in a fraction of a second, causing system-wide sinval resets. Ouch.

> With the watermarks, truncation works like this:
>
> 1. Set soft watermark to the point where we think we can truncate the
> relation. Wait until everyone sees it (send sinval message, wait).

I'm also concerned about how you plan to synchronize access to this shared memory arena. I thought about implementing a relation size cache during the 9.2 cycle, to avoid the overhead of the approximately 1 gazillion lseek calls we do under e.g. a pgbench workload. But the thing is, at least on Linux, those system calls are pretty cheap, and on kernels >= 3.2 they are lock-free. On earlier kernels, there's a spinlock acquire/release cycle for every lseek, and performance tanks with >= 44 cores. That spinlock protects a single memory fetch, so a spinlock or lwlock around the entire array would presumably be a lot worse.

It seems to me that under this system, everyone who currently invokes lseek() would first have to query this shared memory area, and then on a miss (which is likely, since most of the time there won't be a truncation in progress) they'd still have to do the lseek. So even if there's no contention problem, there could still be a raw loss of performance. I feel like I might be missing a trick, though; it seems like somehow we ought to be able to cache the relation size for long periods of time when no extension is in progress.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
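
To make the probe-then-fallback cost discussed above concrete, here is a simplified sketch of that path. The names and the stub lookup are hypothetical, and PostgreSQL's real size lookup goes through smgr/md.c and fd.c rather than a raw lseek():

#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ 8192     /* PostgreSQL's default block size */

/*
 * Stand-in for probing the shared watermark area.  In this sketch no
 * truncation is ever in progress, so callers always fall through to lseek().
 */
static bool
shared_watermark_lookup(uint32_t relid, uint32_t *nblocks)
{
    (void) relid;
    (void) nblocks;
    return false;
}

/*
 * The path described above: first query the shared area, and on a miss
 * (the common case) still pay the lseek we would have done anyway.
 */
uint32_t
relation_nblocks(uint32_t relid, int fd)
{
    uint32_t    nblocks;
    off_t       len;

    if (shared_watermark_lookup(relid, &nblocks))
        return nblocks;                     /* truncation in progress */

    len = lseek(fd, 0, SEEK_END);           /* today's syscall */
    return (uint32_t) (len / BLCKSZ);
}

On the common miss, the shared-memory probe is pure overhead on top of the lseek, which is the raw loss of performance worried about above.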