Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
On 06.06.2011 07:12, Robert Haas wrote:
> I did some further investigation of this. It appears that more than 99% of the lock manager lwlock traffic that remains with this patch applied has locktag_type == LOCKTAG_VIRTUALTRANSACTION. Every SELECT statement runs in a separate transaction, and for each new transaction we run VirtualXactLockTableInsert(), which takes a lock on the vxid of that transaction, so that other processes can wait for it. That requires acquiring and releasing a lock manager partition lock, and we have to do the same thing a moment later at transaction end to dump the lock.
>
> A quick grep seems to indicate that the only places where we actually make use of those VXID locks are in DefineIndex(), when CREATE INDEX CONCURRENTLY is in use, and during Hot Standby, when max_standby_delay expires. Considering that these are not commonplace events, it seems tremendously wasteful to incur the overhead for every transaction. It might be possible to make the lock entry spring into existence "on demand" - i.e. if a backend wants to wait on a vxid entry, it creates the LOCK and PROCLOCK objects for that vxid. That presents a few synchronization challenges, and plus we have to make sure that the backend that's just been "given" a lock knows that it needs to release it, but those seem like they might be manageable problems, especially given the new infrastructure introduced by the current patch, which already has to deal with some of those issues. I'll look into this further.

Ah, I remember I saw that vxid lock pop up quite high in an oprofile profile recently. I think it was the case of executing a lot of very simple prepared queries. So it would be nice to address that, even from a single CPU point of view.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] storing TZ along timestamps
On Jun 4, 2011, at 3:56 AM, Greg Stark wrote:
> On Thu, Jun 2, 2011 at 8:58 PM, Jim Nasby wrote:
>> I'm torn between whether the type should store the original time or the original time converted to GMT.
>
> This is the wrong way to think about it. We *never* store time "converted to GMT". When we want to represent a point in time we represent it as seconds since the epoch.

Right. Sorry, my bad.

> The question here is how to represent more complex concepts than simply points in time. I think the two concepts under discussion are a) a composite type representing a point in time and a timezone it should be interpreted in for operations and display and b) the original input provided which is a text string with the constraint that it's a valid input which can be interpreted as a point in time.

My fear with A is that something could change that would make it impossible to actually get back to the time that was originally entered. For example, a new version of the timezone database could change something. Though, that problem also exists for timestamptz today, so presumably if it was much of an issue we'd have gotten complaints by now.

--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell)   http://jim.nasby.net
Re: [HACKERS] heap vacuum & cleanup locks
On Sun, Jun 5, 2011 at 8:33 AM, Robert Haas wrote:
> We've occasionally seen problems with VACUUM getting stuck for failure to acquire a cleanup lock due to, for example, a cursor holding a pin on the buffer page. In the worst case, this can cause an undetected deadlock, if the backend holding the buffer pin blocks trying to acquire a heavyweight lock that is in turn blocked by VACUUM. A while back, someone (Greg Stark? me?) floated the idea of not waiting for the cleanup lock. If we can't get it immediately, or within some short period of time, then we just skip the page and continue on.

Do we know if this is really a problem, though? The deadlock, for example, can happen only when a backend tries to get a table-level conflicting lock while holding the buffer pin, and I am not sure if we do that.

The contention issue would probably make sense for small tables, because for large to very large tables the probability that a backend and vacuum would process the same page would be quite low. With the current default for vac_threshold, small tables can get vacuumed very frequently, and if they are also heavily accessed, the cleanup lock can become a bottleneck.

Another issue that might be worth paying attention to is the single-pass vacuum that I am currently working on. The design that we agreed upon assumes that the index vacuum must clear index pointers to all the dead line pointers. If we skip any page, we must at least collect the existing dead line pointers and remove those index pointers. If we create dead line pointers and we want to vacuum them later, we store the LSN in the page and that may require defrag. Of course, we can work around that, but I think it will be useful if we do some tests to show that the cleanup lock is indeed a major bottleneck.
Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] heap vacuum & cleanup locks
On Jun 6, 2011, at 1:00 AM, Robert Haas wrote:
> On Mon, Jun 6, 2011 at 12:19 AM, Itagaki Takahiro wrote:
>> On Sun, Jun 5, 2011 at 12:03, Robert Haas wrote:
>>> If other buffer pins do exist, then we can't defragment the page, but that doesn't mean no useful work can be done: we can still mark used line pointers dead, or dead line pointers unused. We cannot defragment, but that can be done either by the next VACUUM or by a HOT cleanup.
>>
>> This is just an idea -- Is it possible to have copy-on-write techniques? VACUUM allocates a duplicated page for the pinned page, and copy valid tuples into the new page. Following buffer readers after the VACUUM will see the cloned page instead of the old pinned one.
>
> Heikki suggested the same thing, and it's not a bad idea, but I think it would be more work to implement than what I proposed. The caller would need to be aware that, if it tries to re-acquire a content lock on the same page, the offset of the tuple within the page might change. I'm not sure how much work would be required to cope with that possibility.

I've had a related idea that I haven't looked into... if you're scanning a relation (ie: index scan, seq scan), I've wondered if it would be more efficient to deal with the entire page at once, possibly by making a copy of it. This would reduce the number of times you pin the page (often quite dramatically). I realize that means copying the entire page, but I suspect that would occur entirely in the L1 cache, which would be fast.

So perhaps instead of copy-on-write we should try for copy-on-read in all appropriate plan nodes. On a related note, I've also wondered if it would be useful to allow nodes to deal with more than one tuple at a time; the idea being that it's better to execute a smaller chunk of code over a bigger chunk of data instead of dribbling tuples through an entire execution tree one at a time.
Perhaps that will only be useful if nodes are executing in parallel...

--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell)   http://jim.nasby.net
Re: [HACKERS] heap vacuum & cleanup locks
On Mon, Jun 6, 2011 at 12:19 AM, Itagaki Takahiro wrote:
> On Sun, Jun 5, 2011 at 12:03, Robert Haas wrote:
>> If other buffer pins do exist, then we can't defragment the page, but that doesn't mean no useful work can be done: we can still mark used line pointers dead, or dead line pointers unused. We cannot defragment, but that can be done either by the next VACUUM or by a HOT cleanup.
>
> This is just an idea -- Is it possible to have copy-on-write techniques? VACUUM allocates a duplicated page for the pinned page, and copy valid tuples into the new page. Following buffer readers after the VACUUM will see the cloned page instead of the old pinned one.

Heikki suggested the same thing, and it's not a bad idea, but I think it would be more work to implement than what I proposed. The caller would need to be aware that, if it tries to re-acquire a content lock on the same page, the offset of the tuple within the page might change. I'm not sure how much work would be required to cope with that possibility.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] SIREAD lock versus ACCESS EXCLUSIVE lock
On Sun, Jun 05, 2011 at 12:45:41PM -0500, Kevin Grittner wrote:
> Is this possible? If a transaction gets its snapshot while OID of N is assigned to relation X, can that transaction wind up seeing an OID of N as a reference to relation Y? If not, there aren't any false positives possible.

This ought to be possible, assuming that the transaction doesn't try to read (and take an AccessShareLock on) X.

On the other hand, given all the timing constraints you've noted, I think it's reasonable to ignore the risk of false positives here. What you're saying is that we might inadvertently roll back a transaction, if a table gets dropped and a new table created and assigned the same OID, and there's an uncommitted transaction that started before the DROP, *and* certain other conditions hold about the reads and timing of overlapping transactions. This combination of conditions seems quite unlikely and I have a hard time getting too worked up about it. Occasional false positives are already a fact of life when using SSI.

Dan

--
Dan R. K. Ports   MIT CSAIL   http://drkp.net/
Re: [HACKERS] Range Types and extensions
Pavel Stehule wrote:
> 2011/6/6 Darren Duncan :
>> Jeff Davis wrote:
>>> I'd like to take another look at Range Types and whether part of it should be an extension. Some of these issues relate to extensions in general, not just range types.
>>>
>>> First of all, what are the advantages to being in core?
>
> it should be supported by FOREACH statement in PL/pgSQL

Yes, absolutely. I know this feature is loved in Perl. But this usage would only work for a more limited range of data types, namely those over which one can build a sequence generator, such as integers, because they have a next-value/prev-value function defined. In other words, while range types in general would work for any ordered type, FOREACH would only work for the subset of those that are ordinal types.

-- Darren Duncan
Re: [HACKERS] Range Types and extensions
2011/6/6 Darren Duncan :
> Jeff Davis wrote:
>> I'd like to take another look at Range Types and whether part of it should be an extension. Some of these issues relate to extensions in general, not just range types.
>>
>> First of all, what are the advantages to being in core?

it should be supported by FOREACH statement in PL/pgSQL

Pavel
Re: [HACKERS] Auto adjust send buffer size to congestion window
Tom Lane wrote on Monday 06 of June 2011 04:07:41:
> Radosław Smogura writes:
>> I've got an idea to auto-adjust the send buffer size to the size of the TCP congestion window. Will this increase performance, and in which way? I suppose throughput may be increased, but latency decreased. What do you think?
>
> I think if that makes any difference at all, it would be proof that the TCP stack on your machine was implemented by incompetents. The kernel should already be handling such considerations.
>
> regards, tom lane

I was thinking about the send buffer size in libpq.

Regards,
Radek
Re: [HACKERS] Range Types and extensions
Jeff Davis wrote:
> I'd like to take another look at Range Types and whether part of it should be an extension. Some of these issues relate to extensions in general, not just range types.
>
> First of all, what are the advantages to being in core?

I believe that ranges aka intervals are widely useful generic types, next after relations/tuples/arrays, and they *should* be supported in core, same as arrays are. In particular, the usefulness of ranges/intervals is often orthogonal to many other things, and applies to many types, including numbers, strings, and temporals.

Now, assuming that a range/interval value is generally defined in terms of a pair of endpoints of some ordered type (that is, a type for which ORDER BY or RANK or {<,>,<=,>=} etc. or LIMIT makes sense), it is essential that this value be capable of distinguishing open and closed intervals. For example, a range value can be represented by a tuple with 4 attributes, where two of those are the endpoint values, and two of those are booleans saying whether each of the endpoints is inside or outside the range/interval. Also, if Postgres has some concept of type-generic special values -Inf and +Inf (which always sort before or after any other value in the type system), those can be used as endpoints to indicate that the interval is unbounded.

Unless you have some other syntax in mind, I suggest lifting the range literal syntax from Perl 6, where ".." is an infix operator building a range between its arguments, and a "^" on either side means that side is open, I think; so there are 4 variants: {..,^..,..^,^..^}.

Now as to the general usefulness of intervals ... Any operation that wants to deal with a range somehow, such as the BETWEEN syntax, could instead use a range/interval; for example, both of:

  foo in 1..10
  foo between 1 and 10

... would mean the same thing, but the 1..10 can be replaced by an arbitrary value expression or variable reference. Likewise with:

  date in start ..^ end
  date >= start and date < end

... mean the same thing. The LIMIT clause could take a range to specify take and skip counts at once. Array slicing can be done using foo[first..last] or such. A random number generator that takes endpoints can take a range argument. An array or relation of these ranges can represent ranges with holes, and the general results of range union operations.

-- Darren Duncan
Re: [HACKERS] heap vacuum & cleanup locks
On Sun, Jun 5, 2011 at 12:03, Robert Haas wrote:
> If other buffer pins do exist, then we can't defragment the page, but that doesn't mean no useful work can be done: we can still mark used line pointers dead, or dead line pointers unused. We cannot defragment, but that can be done either by the next VACUUM or by a HOT cleanup.

This is just an idea -- is it possible to use copy-on-write techniques? VACUUM allocates a duplicate page for the pinned page, and copies the valid tuples into the new page. Buffer readers arriving after the VACUUM will see the cloned page instead of the old pinned one.

Of course, copy-on-write is more complex than skipping pinned pages, but I worry that we cannot vacuum at all in some edge cases with the skipping method.

--
Itagaki Takahiro
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
On Sun, Jun 5, 2011 at 10:16 PM, Robert Haas wrote:
> I'm definitely interested in investigating what to do about that, but I don't think it's this patch's problem to fix all of our lock manager bottlenecks.

I did some further investigation of this. It appears that more than 99% of the lock manager lwlock traffic that remains with this patch applied has locktag_type == LOCKTAG_VIRTUALTRANSACTION. Every SELECT statement runs in a separate transaction, and for each new transaction we run VirtualXactLockTableInsert(), which takes a lock on the vxid of that transaction, so that other processes can wait for it. That requires acquiring and releasing a lock manager partition lock, and we have to do the same thing a moment later at transaction end to dump the lock.

A quick grep seems to indicate that the only places where we actually make use of those VXID locks are in DefineIndex(), when CREATE INDEX CONCURRENTLY is in use, and during Hot Standby, when max_standby_delay expires. Considering that these are not commonplace events, it seems tremendously wasteful to incur the overhead for every transaction. It might be possible to make the lock entry spring into existence "on demand" - i.e. if a backend wants to wait on a vxid entry, it creates the LOCK and PROCLOCK objects for that vxid. That presents a few synchronization challenges, and plus we have to make sure that the backend that's just been "given" a lock knows that it needs to release it, but those seem like they might be manageable problems, especially given the new infrastructure introduced by the current patch, which already has to deal with some of those issues. I'll look into this further.

It's likely that if we lick this problem, the BufFreelistLock and BufMappingLocks are going to be the next hot spot. Of course, we're ignoring the ten-thousand-pound gorilla in the corner, which is that on write workloads we have a pretty bad contention problem with WALInsertLock, which I fear will not be so easily addressed. But one problem at a time, I guess.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Assert failure when rechecking an exclusion constraint
Noah Misch writes:
> On Sun, Jun 05, 2011 at 02:17:00PM -0400, Tom Lane wrote:
>> Attached are two versions of a patch to fix this. The second one modifies the code that tracks what's "pending" as per the above thought. I'm not entirely sure which one I like better ... any comments?
>
> +1 for the second approach. It had bothered me that SetReindexProcessing() and ResetReindexProcessing() manipulated one thing, but ReindexIsProcessingIndex() checked something else besides. (That's still technically true, but the overall effect seems more natural.)

OK, done that way.

regards, tom lane
Re: [HACKERS] Assert failure when rechecking an exclusion constraint
On Sun, 2011-06-05 at 15:09 -0400, Tom Lane wrote:
> so once we've set the index as the currentlyReindexedIndex, there's no need for it still to be in pendingReindexedIndexes.

OK. The second version of the patch looks good to me.

Regards,
Jeff Davis
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
On Sun, Jun 5, 2011 at 5:46 PM, Robert Haas wrote:
> Could you compile with LWLOCK_STATS, rerun these tests, total up the "blk" numbers by LWLockId, and post the results? (Actually, totalling up the shacq and exacq numbers would be useful as well, if you wouldn't mind.)

I did this on the loaner 24-core box from Nate Boley and got the following results. This is just the LWLocks that had blk > 0.

lwlock 0: shacq 0 exacq 200625 blk 24044
lwlock 4: shacq 80101430 exacq 196 blk 28
lwlock 33: shacq 8333673 exacq 11977 blk 864
lwlock 34: shacq 7092293 exacq 11890 blk 803
lwlock 35: shacq 7893875 exacq 11909 blk 848
lwlock 36: shacq 7567514 exacq 11912 blk 830
lwlock 37: shacq 7427774 exacq 11930 blk 745
lwlock 38: shacq 7120108 exacq 11989 blk 853
lwlock 39: shacq 7584952 exacq 11982 blk 782
lwlock 40: shacq 7949867 exacq 12056 blk 821
lwlock 41: shacq 6612240 exacq 11929 blk 746
lwlock 42: shacq 47512112 exacq 11844 blk 4503
lwlock 43: shacq 7943511 exacq 11871 blk 878
lwlock 44: shacq 7534558 exacq 12033 blk 800
lwlock 45: shacq 7128256 exacq 12045 blk 856
lwlock 46: shacq 7575339 exacq 12015 blk 818
lwlock 47: shacq 6745173 exacq 12094 blk 806
lwlock 48: shacq 8410348 exacq 12104 blk 977
lwlock 49: shacq 0 exacq 5007594 blk 172533
lwlock 50: shacq 0 exacq 5011704 blk 172282
lwlock 51: shacq 0 exacq 5003356 blk 172802
lwlock 52: shacq 0 exacq 5009020 blk 174648
lwlock 53: shacq 0 exacq 5010808 blk 172080
lwlock 54: shacq 0 exacq 5004908 blk 169934
lwlock 55: shacq 0 exacq 5009324 blk 170281
lwlock 56: shacq 0 exacq 5005904 blk 171001
lwlock 57: shacq 0 exacq 5006984 blk 169942
lwlock 58: shacq 0 exacq 5000346 blk 170001
lwlock 59: shacq 0 exacq 5004884 blk 170484
lwlock 60: shacq 0 exacq 5006304 blk 171325
lwlock 61: shacq 0 exacq 5008421 blk 170866
lwlock 62: shacq 0 exacq 5008162 blk 170868
lwlock 63: shacq 0 exacq 5002238 blk 170291
lwlock 64: shacq 0 exacq 5005348 blk 169764
lwlock 307: shacq 0 exacq 2 blk 1
lwlock 315: shacq 0 exacq 3 blk 2
lwlock 337: shacq 0 exacq 4 blk 3
lwlock 345: shacq 0 exacq 2 blk 1
lwlock 349: shacq 0 exacq 2 blk 1
lwlock 231251: shacq 0 exacq 2 blk 1
lwlock 253831: shacq 0 exacq 2 blk 1

So basically, even with the patch, at 24 cores the lock manager locks are still under tremendous pressure. But note that there's a big difference between what's happening here and what's happening without the patch. Here's without the patch:

lwlock 0: shacq 0 exacq 191613 blk 17591
lwlock 4: shacq 21543085 exacq 102 blk 20
lwlock 33: shacq 2237938 exacq 11976 blk 463
lwlock 34: shacq 1907344 exacq 11890 blk 458
lwlock 35: shacq 2125308 exacq 11908 blk 442
lwlock 36: shacq 2038220 exacq 11912 blk 430
lwlock 37: shacq 1998059 exacq 11927 blk 449
lwlock 38: shacq 1916179 exacq 11953 blk 409
lwlock 39: shacq 2042173 exacq 12019 blk 479
lwlock 40: shacq 2140002 exacq 12056 blk 448
lwlock 41: shacq 1776772 exacq 11928 blk 392
lwlock 42: shacq 12777368 exacq 11842 blk 2451
lwlock 43: shacq 2132240 exacq 11869 blk 478
lwlock 44: shacq 2026845 exacq 12031 blk 446
lwlock 45: shacq 1918618 exacq 12045 blk 449
lwlock 46: shacq 2038437 exacq 12011 blk 472
lwlock 47: shacq 1814660 exacq 12089 blk 401
lwlock 48: shacq 2261208 exacq 12105 blk 478
lwlock 49: shacq 0 exacq 1347524 blk 17020
lwlock 50: shacq 0 exacq 1350678 blk 16888
lwlock 51: shacq 0 exacq 1346260 blk 16744
lwlock 52: shacq 0 exacq 1348432 blk 16864
lwlock 53: shacq 0 exacq 22216779 blk 4914363
lwlock 54: shacq 0 exacq 22217309 blk 4525381
lwlock 55: shacq 0 exacq 1348406 blk 13438
lwlock 56: shacq 0 exacq 1345996 blk 13299
lwlock 57: shacq 0 exacq 1347890 blk 13654
lwlock 58: shacq 0 exacq 1343486 blk 13349
lwlock 59: shacq 0 exacq 1346198 blk 13471
lwlock 60: shacq 0 exacq 1346236 blk 13532
lwlock 61: shacq 0 exacq 1343688 blk 13547
lwlock 62: shacq 0 exacq 1350068 blk 13614
lwlock 63: shacq 0 exacq 1345302 blk 13420
lwlock 64: shacq 0 exacq 1348858 blk 13635
lwlock 321: shacq 0 exacq 2 blk 1
lwlock 329: shacq 0 exacq 4 blk 3
lwlock 337: shacq 0 exacq 6 blk 4
lwlock 347: shacq 0 exacq 5 blk 4
lwlock 357: shacq 0 exacq 3 blk 2
lwlock 363: shacq 0 exacq 3 blk 2
lwlock 369: shacq 0 exacq 4 blk 3
lwlock 379: shacq 0 exacq 2 blk 1
lwlock 383: shacq 0 exacq 2 blk 1
lwlock 445: shacq 0 exacq 2 blk 1
lwlock 449: shacq 0 exacq 2 blk 1
lwlock 451: shacq 0 exacq 2 blk 1
lwlock 1023: shacq 0 exacq 2 blk 1
lwlock 11401: shacq 0 exacq 2 blk 1
lwlock 115591: shacq 0 exacq 2 blk 1
lwlock 117177: shacq 0 exacq 2 blk 1
lwlock 362839: shacq 0 exacq 2 blk 1

In the unpatched case, two lock manager locks are getting beaten to death, and the others are all about equally contended. By eliminating the portion of the lock manager contention that pertains specifically to the two heavily trafficked locks, system throughput improves by about 3.5x - and, not surprisingly, traffic on the lock manager locks increases by approximately the same multiple. Those locks now become the contention bottleneck, with about 12x the blocking they had pre-patch. I'm definitely interested in investigating what to do about that, but I don't think it's this patch's problem to fix all of our lock manager bottlenecks.
Re: [HACKERS] SIREAD lock versus ACCESS EXCLUSIVE lock
"Kevin Grittner" wrote:
> Maybe I should submit a patch without added complexity of the scheduled cleanup and we can discuss that as a separate patch?

Here's a patch which adds the missing support for DDL. Cleanup of predicate locks at commit time for transactions which ran DROP TABLE or TRUNCATE TABLE can be added as a separate patch. I consider those to be optimizations which are of dubious benefit, especially compared to the complexity of the extra code required. Also, Heikki pointed out that explicitly releasing the predicate locks after a DROP DATABASE is an optimization, which on reflection also seems to be of dubious value compared to the code needed.

Unless someone can find a way in which any of these early cleanups are needed for correctness, I suggest they are better left as enhancements in some future release, where there can be some demonstration that they matter enough for performance to justify the extra code, and there can be more testing to ensure the new code doesn't break anything. I stashed the partial work on the more aggressive cleanup, so if there's a consensus that we want it, I can post a patch pretty quickly.

In making sure that the new code for this patch was in pgindent format, I noticed that the ASCII art and bullet lists recently added to OnConflict_CheckForSerializationFailure() were mangled badly by pgindent, so I added the dashes to protect those and included a pgindent form of that function. That should save someone some trouble sorting things out after the next global pgindent run.

-Kevin

ssi-ddl-3.patch
Description: Binary data
Re: [HACKERS] Auto adjust send buffer size to congestion window
Radosław Smogura writes:
> I've got idea to auto adjust send buffer size to size of TCP congestion window. Will this increase performance and in which way. I suppose throughput may be increased, but latency decreased. What do You think?

I think if that makes any difference at all, it would be proof that the TCP stack on your machine was implemented by incompetents. The kernel should already be handling such considerations.

regards, tom lane
Re: [HACKERS] Assert failure when rechecking an exclusion constraint
On Sun, Jun 05, 2011 at 02:17:00PM -0400, Tom Lane wrote:
> I wrote:
>> Noah Misch writes:
>>> Sounds reasonable. Need to remove the index from pendingReindexedIndexes, not just call ResetReindexProcessing().
>>
>> [ looks again... ] Uh, right. I was thinking that the pending list was just "pending" and not "in progress" indexes. I wonder if we should rejigger things so that that's actually true, ie, remove an index's OID from the pending list when we mark it as the current one?
>
> Attached are two versions of a patch to fix this. The second one modifies the code that tracks what's "pending" as per the above thought. I'm not entirely sure which one I like better ... any comments?

+1 for the second approach. It had bothered me that SetReindexProcessing() and ResetReindexProcessing() manipulated one thing, but ReindexIsProcessingIndex() checked something else besides. (That's still technically true, but the overall effect seems more natural.)

Thanks,
nm
Re: [HACKERS] Review: psql include file using relative path
On Sun, Jun 5, 2011 at 1:06 PM, Josh Kupershmidt wrote:
> On Sun, Jun 5, 2011 at 10:21 AM, Gurjeet Singh wrote:
>> On Sat, May 21, 2011 at 11:59 AM, Josh Kupershmidt wrote:
>> Tweaks applied, but omitted the C variable names as I don't think that adds much value.
>
> Your rewordings are fine, but the article "the" is missing in a few spots, e.g.
> * "uses \ir command" -> "uses the \ir command"
> * "to currently processing file" -> "to the currently processing file"
> * "same as \i command" -> "same as the \i command"
>
> I think "processing" is better (and consistent with the rest of the comments) than "processed" here:
> + * the file from where the currently processed file (if any) is located.
>
>> New version of the patch attached. Thanks for the review.
>
> I think the patch is in pretty good shape now. The memory leak is gone AFAICT, and the comments and documentation updates look good.

Attached an updated patch. If you find it ready for committer, please mark it so in the commitfest app.

Thanks,
--
Gurjeet Singh
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index eaf901d..9bd6d93 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -1626,6 +1626,30 @@ Tue Oct 26 21:40:57 CEST 1999 +\ir filename + + +The \ir command is similar to \i, but is useful for files which +include and execute other files. + + + +When used within a file loaded via \i, \ir, or +-f, if the filename +is specified using a relative path, then the location of the file will be determined +relative to the currently executing file's location. + + + +If filename is given as an absolute path, +or if this command is used in interactive mode, then \ir behaves the same +as the \i command. 
+ + + + + + \l (or \list) \l+ (or \list+) diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index 378330b..faa3087 100644 --- a/src/bin/psql/command.c +++ b/src/bin/psql/command.c @@ -785,10 +785,15 @@ exec_command(const char *cmd, /* \i is include file */ - else if (strcmp(cmd, "i") == 0 || strcmp(cmd, "include") == 0) + /* \ir is include file relative to file currently being processed */ + else if (strcmp(cmd, "i") == 0 || strcmp(cmd, "include") == 0 + || strcmp(cmd, "ir") == 0 || strcmp(cmd, "include_relative") == 0) { - char *fname = psql_scan_slash_option(scan_state, - OT_NORMAL, NULL, true); + char *fname = psql_scan_slash_option(scan_state, + OT_NORMAL, NULL, true); + + bool include_relative = (strcmp(cmd, "ir") == 0 + || strcmp(cmd, "include_relative") == 0); if (!fname) { @@ -798,7 +803,7 @@ exec_command(const char *cmd, else { expand_tilde(&fname); - success = (process_file(fname, false) == EXIT_SUCCESS); + success = (process_file(fname, false, include_relative) == EXIT_SUCCESS); free(fname); } } @@ -1969,15 +1974,19 @@ do_edit(const char *filename_arg, PQExpBuffer query_buf, * process_file * * Read commands from filename and then them to the main processing loop - * Handler for \i, but can be used for other things as well. Returns + * Handler for \i and \ir, but can be used for other things as well. Returns * MainLoop() error code. + * + * If use_relative_path is true and filename is not an absolute path, then open + * the file from where the currently processed file (if any) is located. 
*/ int -process_file(char *filename, bool single_txn) +process_file(char *filename, bool single_txn, bool use_relative_path) { FILE *fd; int result; char *oldfilename; + char *relative_file = NULL; PGresult *res; if (!filename) @@ -1986,6 +1995,39 @@ process_file(char *filename, bool single_txn) if (strcmp(filename, "-") != 0) { canonicalize_path(filename); + + /* + * If the currently processing file uses the \ir command, and the + * 'filename' parameter to the command is given as a relative file path, + * then we resolve this path relative to the currently processing file. + * + * If the \ir command was executed in interactive mode (i.e. not via \i, + * \ir or -f) the we treat it the same as the \i command. + */ + if (use_relative_path && pset.inputfile) + { + char *last_slash; + + /* find the / that splits the file from its path */ + last_slash = strrchr(pset.inputfile, '/'); + + if (last_slash && !is_absolute_path(filename)) + { +size_t dir_len = (last_slash - pset.inputfile) + 1; +size_t file_len = strlen(filename); + +relative_file = pg_malloc(dir_len + 1 + file_len + 1); + +relative_file[0] = '\0'; +strncat(relative_file, pset.inputfile, dir_len); +strcat(relative_file, filename); + +canonicalize_path(relative_fi
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
On Sun, Jun 5, 2011 at 4:01 PM, Stefan Kaltenbrunner wrote: > On 06/05/2011 09:12 PM, Heikki Linnakangas wrote: >> On 05.06.2011 22:04, Stefan Kaltenbrunner wrote: >>> and one for the -j80 case(also patched). >>> >>> >>> 485798 48.9667 postgres s_lock >>> 60327 6.0808 postgres LWLockAcquire >>> 57049 5.7503 postgres LWLockRelease >>> 18357 1.8503 postgres hash_search_with_hash_value >>> 17033 1.7169 postgres GetSnapshotData >>> 14763 1.4881 postgres base_yyparse >>> 14460 1.4575 postgres SearchCatCache >>> 13975 1.4086 postgres AllocSetAlloc >>> 6416 0.6467 postgres PinBuffer >>> 5024 0.5064 postgres SIGetDataEntries >>> 4704 0.4741 postgres core_yylex >>> 4625 0.4662 postgres _bt_compare >> >> Hmm, does that mean that it's spending 50% of the time spinning on a >> spinlock? That's bad. It's one thing to be contended on a lock, and have >> a lot of idle time because of that, but it's even worse to spend a lot >> of time spinning because that CPU time won't be spent on doing more >> useful work, even if there is some other process on the system that >> could make use of that CPU time. > > well yeah - we are broken right now with only being able to use ~20% of > CPU on a modern mid-range box, but using 80% CPU (or 4x like in the > above case) and only getting less than 2x the performance seems wrong as > well. I also wonder if we are still missing something fundamental - > because even with the current patch we are quite far away from linear > scaling and light-years from some of our competitors... Could you compile with LWLOCK_STATS, rerun these tests, total up the "blk" numbers by LWLockId, and post the results? (Actually, totalling up the shacq and exacq numbers would be useful as well, if you wouldn't mind.) Unless I very much miss my guess, we're going to see zero contention on the new structures introduced by this patch. 
Rather, I suspect what we're going to find is that, with the hideous contention on one particular lock manager partition lock removed, there's a more spread-out contention problem, likely involving the lock manager partition lock, the buffer mapping locks, and possibly other LWLocks as well. The fact that the system is busy-waiting rather than just not using the CPU at all probably means that the remaining contention is more spread out than that which is removed by this patch. We don't actually have everything pile up on a single LWLock (as happens in git master), but we do spend a lot of time fighting cache lines away from other CPUs. Or at any rate, that's my guess: we need some real numbers to know for sure. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BLOB support
Tom Lane writes: > Yes. I think the appropriate problem statement is "provide streaming > access to large field values, as an alternative to just fetching/storing > the entire value at once". I see no good reason to import the entire > messy notion of LOBS/CLOBS. (The fact that other databases have done it > is not a good reason.) Spent some time in the archive to confirm a certain “déjà vu” impression. Couldn't find it. Had to manually search in closed commit fests… but here we are, I think: https://commitfest.postgresql.org/action/patch_view?id=70 http://archives.postgresql.org/message-id/17891.1246301...@sss.pgh.pa.us http://archives.postgresql.org/message-id/4a4bf87e.7010...@ak.jp.nec.com Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
On 06/05/2011 09:12 PM, Heikki Linnakangas wrote: > On 05.06.2011 22:04, Stefan Kaltenbrunner wrote: >> and one for the -j80 case(also patched). >> >> >> 485798 48.9667 postgres s_lock >> 60327 6.0808 postgres LWLockAcquire >> 57049 5.7503 postgres LWLockRelease >> 18357 1.8503 postgres hash_search_with_hash_value >> 17033 1.7169 postgres GetSnapshotData >> 14763 1.4881 postgres base_yyparse >> 14460 1.4575 postgres SearchCatCache >> 13975 1.4086 postgres AllocSetAlloc >> 6416 0.6467 postgres PinBuffer >> 5024 0.5064 postgres SIGetDataEntries >> 4704 0.4741 postgres core_yylex >> 4625 0.4662 postgres _bt_compare > > Hmm, does that mean that it's spending 50% of the time spinning on a > spinlock? That's bad. It's one thing to be contended on a lock, and have > a lot of idle time because of that, but it's even worse to spend a lot > of time spinning because that CPU time won't be spent on doing more > useful work, even if there is some other process on the system that > could make use of that CPU time. well yeah - we are broken right now with only being able to use ~20% of CPU on a modern mid-range box, but using 80% CPU (or 4x like in the above case) and only getting less than 2x the performance seems wrong as well. I also wonder if we are still missing something fundamental - because even with the current patch we are quite far away from linear scaling and light-years from some of our competitors... Stefan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Auto adjust send buffer size to congestion window
Hi, I've got an idea: auto-adjust the send buffer size to the size of the TCP congestion window. Would this increase performance, and in which way? I suppose throughput may be increased, but latency decreased. What do you think? Regards, Radek -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
On 05.06.2011 22:04, Stefan Kaltenbrunner wrote: and one for the -j80 case(also patched). 485798 48.9667 postgres s_lock 60327 6.0808 postgres LWLockAcquire 57049 5.7503 postgres LWLockRelease 18357 1.8503 postgres hash_search_with_hash_value 17033 1.7169 postgres GetSnapshotData 14763 1.4881 postgres base_yyparse 14460 1.4575 postgres SearchCatCache 13975 1.4086 postgres AllocSetAlloc 6416 0.6467 postgres PinBuffer 5024 0.5064 postgres SIGetDataEntries 4704 0.4741 postgres core_yylex 4625 0.4662 postgres _bt_compare Hmm, does that mean that it's spending 50% of the time spinning on a spinlock? That's bad. It's one thing to be contended on a lock, and have a lot of idle time because of that, but it's even worse to spend a lot of time spinning because that CPU time won't be spent on doing more useful work, even if there is some other process on the system that could make use of that CPU time. I like the overall improvement on the throughput, of course, but we have to find a way to avoid the busy-wait. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Assert failure when rechecking an exclusion constraint
Jeff Davis writes:
> On Sun, 2011-06-05 at 14:17 -0400, Tom Lane wrote:
>> Attached are two versions of a patch to fix this.  The second one
>> modifies the code that tracks what's "pending" as per the above thought.
>> I'm not entirely sure which one I like better ... any comments?

> I think I'm missing something simple: if it's not in the pending list,
> what prevents systable_beginscan() from using it?

The test that it uses is

/*
 * ReindexIsProcessingIndex
 *		True if index specified by OID is currently being reindexed,
 *		or should be treated as invalid because it is awaiting reindex.
 */
bool
ReindexIsProcessingIndex(Oid indexOid)
{
	return indexOid == currentlyReindexedIndex ||
		list_member_oid(pendingReindexedIndexes, indexOid);
}

so once we've set the index as the currentlyReindexedIndex, there's no
need for it still to be in pendingReindexedIndexes.

(Arguably, this function should have been renamed when it was changed to
look at pendingReindexedIndexes too ...)

regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
On 06/03/2011 03:17 PM, Robert Haas wrote:
[...]
>
> As you can see, this works out to a bit more than a 4% improvement on
> this two-core box.  I also got access (thanks to Nate Boley) to a
> 24-core box and ran the same test with scale factor 100 and
> shared_buffers=8GB.  Here are the results of alternating runs without
> and with the patch on that machine:
>
> tps = 36291.996228 (including connections establishing)
> tps = 129242.054578 (including connections establishing)
> tps = 36704.393055 (including connections establishing)
> tps = 128998.648106 (including connections establishing)
> tps = 36531.208898 (including connections establishing)
> tps = 131341.367344 (including connections establishing)
>
> That's an improvement of about ~3.5x.  According to the vmstat output,
> when running without the patch, the CPU state was about 40% idle.
> With the patch, it dropped down to around 6%.

nice - but let's see it on real hardware...

Testing this on a brand new E7-4850 4-socket/10 cores+HT box - so 80
hardware threads. First some numbers with -HEAD (-T 120; runtimes at
lower -c counts have fairly high variation in the results; the first
number is the number of connections/threads):

-j1:  tps = 7928.965493 (including connections establishing)
-j8:  tps = 53610.572347 (including connections establishing)
-j16: tps = 80835.446118 (including connections establishing)
-j32: tps = 75666.731883 (including connections establishing)
-j40: tps = 74628.568388 (including connections establishing)
-j64: tps = 68268.081973 (including connections establishing)
-c80: tps = 66704.216166 (including connections establishing)

PostgreSQL is completely lock-limited in this test; anything beyond
around -j10 is basically not able to push the box to more than 80%
IDLE(!)
and now with the patch applied:

-j1:  tps = 7783.295587 (including connections establishing)
-j8:  tps = 44361.661947 (including connections establishing)
-j16: tps = 92270.464541 (including connections establishing)
-j24: tps = 108259.524782 (including connections establishing)
-j32: tps = 183337.422612 (including connections establishing)
-j40: tps = 209616.052430 (including connections establishing)
-j48: tps = 229621.292382 (including connections establishing)
-j56: tps = 218690.391603 (including connections establishing)
-j64: tps = 188028.348501 (including connections establishing)
-j80: tps = 118814.741609 (including connections establishing)

so much better - but I still think there is some headroom left,
although pgbench itself is a CPU hog in those benchmarks, eating up to
10 cores in the worst-case scenario - will retest with sysbench, which
in the past showed more reasonable CPU usage for me.

and a profile (patched code) for the -j48 (aka fastest) case:

731535  11.8408  postgres  s_lock
291878   4.7244  postgres  LWLockAcquire
242373   3.9231  postgres  AllocSetAlloc
239083   3.8698  postgres  LWLockRelease
202341   3.2751  postgres  SearchCatCache
190055   3.0763  postgres  hash_search_with_hash_value
187148   3.0292  postgres  base_yyparse
173265   2.8045  postgres  GetSnapshotData
 75700   1.2253  postgres  core_yylex
 74974   1.2135  postgres  MemoryContextAllocZeroAligned
 61404   0.9939  postgres  _bt_compare
 57529   0.9312  postgres  MemoryContextAlloc

and one for the -j80 case (also patched):
485798  48.9667  postgres  s_lock
 60327   6.0808  postgres  LWLockAcquire
 57049   5.7503  postgres  LWLockRelease
 18357   1.8503  postgres  hash_search_with_hash_value
 17033   1.7169  postgres  GetSnapshotData
 14763   1.4881  postgres  base_yyparse
 14460   1.4575  postgres  SearchCatCache
 13975   1.4086  postgres  AllocSetAlloc
  6416   0.6467  postgres  PinBuffer
  5024   0.5064  postgres  SIGetDataEntries
  4704   0.4741  postgres  core_yylex
  4625   0.4662  postgres  _bt_compare

Stefan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Range Types and extensions
I'd like to take another look at Range Types and whether part of it should be an extension. Some of these issues relate to extensions in general, not just range types. First of all, what are the advantages to being in core? 1. ANYRANGE + CREATE TYPE ... AS RANGE -- This is the most compelling, in my opinion. People can define new range functions and new range types independently and each one gets the benefit of the other automatically. Without this, there will be an explosion of functions and a bunch of inconsistencies like functions that support most range types but not all (merely because the function author didn't know that the type existed). In the several talks that I've given, a common question is related to "multiranges" (ranges with holes). These get a little complex, and I don't have a complete answer. However, multiranges can be approximated with ordered arrays of non-overlapping, non-adjacent ranges. If someone wants to take it upon themselves to develop a set of operators here, that would be great -- but without ANYRANGE the operators would be unmanageable. 2. Documentation and Tests -- Let's say we take a minimalist view, and only have ANYRANGE and CREATE TYPE ... AS RANGE in core; and leave the rest as an extension. What exactly would the documentation say? I think it would be even more hypothetical and abstract than the documentation for Exclusion Constraints. So, there is a certain documentation advantage to having at least enough functionality to allow someone to try out the feature. And the tests for such a minimalist feature would be a significant challenge -- what do we do there? Get pg_regress to load the extension from PGXN? 3. Quality -- PostgreSQL has a great reputation for quality, and for good reason. But extensions don't follow the same quality-control standards; and even if some do, there is no visible stamp of approval. 
So, to ask someone to use an extension means that they have to evaluate the quality for themselves, which is a pretty high barrier. Since PGXN (thanks David Wheeler) and EXTENSIONs (thanks Dmitri) solve many of the other issues, quality control is one of the biggest ones remaining. I still get questions about when the temporal type will be "in core", and I think this is why. I don't think this is a good excuse to put it in core though. We need to solve this problem, and the best way to start is by getting well-reviewed, high-quality extensions out there.

4. Future work -- RANGE KEY, RANGE FOREIGN KEY, RANGE MERGE JOIN, etc. - There are a few aspects of range types that aren't in the first patch, but are fairly obvious follow-up additions. These will require some knowledge about ranges in the backend, like finding the "overlaps" operator for a range. The current patch provides this knowledge by providing a built-in overlaps operator for ANYRANGE. This would be a non-issue if we had a good type interface system (that works on polymorphic types) -- we could just have a built-in "range" interface, and the range extension could add "&&" as the range interface's overlaps operator for the type ANYRANGE.

=

So, where on this spectrum should range types fall? I think the most minimalist would be to only support #1 (and the necessary type IO functions); and leave all other functions, operators, and opclasses to an extension. That has a lot of appeal, but I don't think we can ignore the challenges above. On the other hand, trying to make it a complete feature in core has challenges as well. For instance, even with Range Types, Exclusion Constraints aren't practical out-of-the-box unless we also have BTree-GiST in core. So there's a snowball effect. There might also be some middle ground, where it's like the minimalist approach, but with a few very basic constructors and accessors.
That would at least make it easier to test, but then to be actually useful (with index support, operators, fancy functions, etc.) you'd need the extension. Thoughts? Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Assert failure when rechecking an exclusion constraint
On Sun, 2011-06-05 at 14:17 -0400, Tom Lane wrote: > Attached are two versions of a patch to fix this. The second one > modifies the code that tracks what's "pending" as per the above thought. > I'm not entirely sure which one I like better ... any comments? I think I'm missing something simple: if it's not in the pending list, what prevents systable_beginscan() from using it? Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Assert failure when rechecking an exclusion constraint
I wrote: > Noah Misch writes: >> Sounds reasonable. Need to remove the index from pendingReindexedIndexes, >> not >> just call ResetReindexProcessing(). > [ looks again... ] Uh, right. I was thinking that the pending list was > just "pending" and not "in progress" indexes. I wonder if we should > rejigger things so that that's actually true, ie, remove an index's OID > from the pending list when we mark it as the current one? Attached are two versions of a patch to fix this. The second one modifies the code that tracks what's "pending" as per the above thought. I'm not entirely sure which one I like better ... any comments? regards, tom lane diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c index 53b4c3c59bf78cacf8def18ac4d46940a3a29b71..b046f38ecb8a47538cb12e036a6fca6c3509aec1 100644 *** a/src/backend/catalog/index.c --- b/src/backend/catalog/index.c *** static void validate_index_heapscan(Rela *** 115,120 --- 115,121 Snapshot snapshot, v_i_state *state); static Oid IndexGetRelation(Oid indexId); + static bool ReindexIsCurrentlyProcessingIndex(Oid indexOid); static void SetReindexProcessing(Oid heapOid, Oid indexOid); static void ResetReindexProcessing(void); static void SetReindexPending(List *indexes); *** index_build(Relation heapRelation, *** 1758,1776 } /* - * If it's for an exclusion constraint, make a second pass over the heap - * to verify that the constraint is satisfied. - */ - if (indexInfo->ii_ExclusionOps != NULL) - IndexCheckExclusion(heapRelation, indexRelation, indexInfo); - - /* Roll back any GUC changes executed by index functions */ - AtEOXact_GUC(false, save_nestlevel); - - /* Restore userid and security context */ - SetUserIdAndSecContext(save_userid, save_sec_context); - - /* * If we found any potentially broken HOT chains, mark the index as not * being usable until the current transaction is below the event horizon. * See src/backend/access/heap/README.HOT for discussion. 
--- 1759,1764 *** index_build(Relation heapRelation, *** 1824,1831 InvalidOid, stats->index_tuples); ! /* Make the updated versions visible */ CommandCounterIncrement(); } --- 1812,1834 InvalidOid, stats->index_tuples); ! /* Make the updated catalog row versions visible */ CommandCounterIncrement(); + + /* + * If it's for an exclusion constraint, make a second pass over the heap + * to verify that the constraint is satisfied. We must not do this until + * the index is fully valid. (Broken HOT chains shouldn't matter, though; + * see comments for IndexCheckExclusion.) + */ + if (indexInfo->ii_ExclusionOps != NULL) + IndexCheckExclusion(heapRelation, indexRelation, indexInfo); + + /* Roll back any GUC changes executed by index functions */ + AtEOXact_GUC(false, save_nestlevel); + + /* Restore userid and security context */ + SetUserIdAndSecContext(save_userid, save_sec_context); } *** IndexCheckExclusion(Relation heapRelatio *** 2270,2275 --- 2273,2290 ExprContext *econtext; /* + * If we are reindexing the target index, mark it as no longer being + * reindexed, to forestall an Assert in index_beginscan when we try to + * use the index for probes. This is OK because the index is now + * fully valid. + */ + if (ReindexIsCurrentlyProcessingIndex(RelationGetRelid(indexRelation))) + { + ResetReindexProcessing(); + RemoveReindexPending(RelationGetRelid(indexRelation)); + } + + /* * Need an EState for evaluation of index expressions and partial-index * predicates. Also a slot to hold the current tuple. */ *** reindex_relation(Oid relid, int flags) *** 3030,3036 * System index reindexing support * * When we are busy reindexing a system index, this code provides support ! * for preventing catalog lookups from using that index. * */ --- 3045,3053 * System index reindexing support * * When we are busy reindexing a system index, this code provides support ! * for preventing catalog lookups from using that index. We also make use ! 
* of this to catch attempted uses of user indexes during reindexing of ! * those indexes. * */ *** ReindexIsProcessingHeap(Oid heapOid) *** 3049,3054 --- 3066,3081 } /* + * ReindexIsCurrentlyProcessingIndex + * True if index specified by OID is currently being reindexed. + */ + static bool + ReindexIsCurrentlyProcessingIndex(Oid indexOid) + { + return indexOid == currentlyReindexedIndex; + } + + /* * ReindexIsProcessingIndex * True if index specified by OID is currently being reindexed, * or should be treated as invalid because it i
[HACKERS] VIP: enhanced errors
Hello all I am working on new diagnostics fields in errors - CONSTRAINT_NAME, SCHEMA_NAME, TABLE_NAME and COLUMN_NAME. These fields is shown when verbosity mode is active. Actually this works for table constraints, not null constraint and for RI constraints. postgres=# delete from xxx; ERROR: 23503: update or delete on table "xxx" violates foreign key constraint "yyy_b_fkey" on table "yyy" DETAIL: Key (a)=(10) is still referenced from table "yyy". LOCATION: ri_ReportViolation, ri_triggers.c:3593 CONSTRAINT: yyy_b_fkey SCHEMA: public TABLE: xxx COLUMN: a These fields should be available from GET DIAGNOSTICS statement too - It could be next step after support for stacked diagnostics. I looked on column name identification for column constraints - and there is not any possible identification - so I need a new column to pg_constraint table - "attrid", that should to identify column for column's constraint. This patch is just concept. Final patch will be significantly longer - we need to check any "ereport" call, as this most important constraints are table and ri constraints. Regards Pavel Stehule *** ./src/backend/executor/execMain.c.orig 2011-06-05 16:09:09.0 +0200 --- ./src/backend/executor/execMain.c 2011-06-05 19:46:41.469227198 +0200 *** *** 1565,1571 ereport(ERROR, (errcode(ERRCODE_NOT_NULL_VIOLATION), errmsg("null value in column \"%s\" violates not-null constraint", ! NameStr(rel->rd_att->attrs[attrChk - 1]->attname; } } --- 1565,1575 ereport(ERROR, (errcode(ERRCODE_NOT_NULL_VIOLATION), errmsg("null value in column \"%s\" violates not-null constraint", ! NameStr(rel->rd_att->attrs[attrChk - 1]->attname)), ! errconstraintname("not_null_constraint"), ! errschemaname(get_namespace_name(RelationGetNamespace(rel))), ! errtablename(RelationGetRelationName(rel)), ! 
errcolumnname(NameStr(rel->rd_att->attrs[attrChk - 1]->attname; } } *** *** 1577,1583 ereport(ERROR, (errcode(ERRCODE_CHECK_VIOLATION), errmsg("new row for relation \"%s\" violates check constraint \"%s\"", ! RelationGetRelationName(rel), failed))); } } --- 1581,1590 ereport(ERROR, (errcode(ERRCODE_CHECK_VIOLATION), errmsg("new row for relation \"%s\" violates check constraint \"%s\"", ! RelationGetRelationName(rel), failed), ! errconstraintname(failed), ! errschemaname(get_namespace_name(RelationGetNamespace(rel))), ! errtablename(RelationGetRelationName(rel; } } *** ./src/backend/utils/adt/ri_triggers.c.orig 2011-06-05 16:09:10.0 +0200 --- ./src/backend/utils/adt/ri_triggers.c 2011-06-05 19:57:28.489835680 +0200 *** *** 3537,3543 errmsg("insert or update on table \"%s\" violates foreign key constraint \"%s\"", RelationGetRelationName(fk_rel), constrname), errdetail("No rows were found in \"%s\".", ! RelationGetRelationName(pk_rel; } /* Get printable versions of the keys involved */ --- 3537,3546 errmsg("insert or update on table \"%s\" violates foreign key constraint \"%s\"", RelationGetRelationName(fk_rel), constrname), errdetail("No rows were found in \"%s\".", ! RelationGetRelationName(pk_rel)), ! errconstraintname(constrname), ! errschemaname(get_namespace_name(RelationGetNamespace(fk_rel))), ! errtablename(RelationGetRelationName(fk_rel; } /* Get printable versions of the keys involved */ *** *** 3570,3576 RelationGetRelationName(fk_rel), constrname), errdetail("Key (%s)=(%s) is not present in table \"%s\".", key_names.data, key_values.data, ! RelationGetRelationName(pk_rel; else ereport(ERROR, (errcode(ERRCODE_FOREIGN_KEY_VIOLATION), --- 3573,3583 RelationGetRelationName(fk_rel), constrname), errdetail("Key (%s)=(%s) is not present in table \"%s\".", key_names.data, key_values.data, ! RelationGetRelationName(pk_rel)), ! errconstraintname(constrname), ! errschemaname(get_namespace_name(RelationGetNamespace(fk_rel))), ! 
errtablename(RelationGetRelationName(fk_rel)), ! errcolumnname(key_names.data))); else ereport(ERROR, (errcode(ERRCODE_FOREIGN_KEY_VIOLATION), *** *** 3579,3585 constrname, RelationGetRelationName(fk_rel)), errdetail("Key (%s)=(%s) is still referenced from table \"%s\".", key_names.data, key_values.data, ! RelationGetRelationName(fk_rel; } /* -- --- 3586,3596 constrname, RelationGetRelationName(fk_rel)), errdetail("Key (%s)=(%s) is still referenced from table \"%s\".", key_names.data, key_values.data, ! RelationGetRelationName(fk_rel)), ! errconstraintname(constrname), ! errschem
Re: [HACKERS] SIREAD lock versus ACCESS EXCLUSIVE lock
> Heikki Linnakangas wrote: > On 03.06.2011 23:44, Kevin Grittner wrote: >> Heikki Linnakangas wrote: >> >>> I think you'll need to just memorize the lock deletion command in >>> a backend-local list, and perform the deletion in a post-commit >>> function. >> >> Hmm. As mentioned earlier in the thread, cleaning these up doesn't >> actually have any benefit beyond freeing space in the predicate >> locking collections. I'm not sure that benefit is enough to >> justify this much new mechanism. Maybe I should just leave them >> alone and let them get cleaned up in due course with the rest of >> the locks. Any opinions on that? > > Is there a chance of false positives if oid wraparound happens and > a new table gets the same oid as the old one? It's also possible > for a heap to get the OID of an old index or vice versa, will that > confuse things? Some additional thoughts after working on the new code... The risk we're trying to cover is that a table can be dropped while committed transactions still have predicate locks against it, which won't be cleared until overlapping transactions which are still active, and which can form a dangerous structure with the committed transactions, commit. If we just leave those predicate locks to be cleaned up in the normal course of transaction completion, there could be a new object created with the old OID which, if a complicated set of conditions are met, could lead to a serialization failure of a transaction which would otherwise succeed. We can't clean the predicate locks up immediately when the DROP TABLE appears to succeed, because the enclosing transaction (or subtransaction) could still be rolled back. That last bit seems pretty hard to dodge, because premature predicate lock cleanup could lead to false negatives -- which would allow data inconsistencies. So cleanup before commit is definitely out. 
But some additional time meditating on the dangerous structure diagram today has me wondering whether we might be OK just ignoring cleanup prior to what would normally happen as transactions complete. Tin - - -> Tpivot - - -> Tout As mentioned above, this whole issue is based around concern about false positives when a transaction has read from a table and committed before the table is dropped. We don't need to worry about that for Tout -- its position in the dangerous structure is just based around what it *writes*; it can't be responsible for the read-and-commit before the table is dropped. To have a problem, Tpivot can't be the transaction which commits before the DROP TABLE either -- since it reads what Tout writes and Tout has to be first to commit, then it cannot read and commit before the DROP TABLE and still have Tout write afterward. That means that the transaction which did the read and commit would have to be Tin, and Tpivot would need to be the transaction which later wrote to an object with the same OID as the dropped table. For a problem to manifest itself, this sequence must occur: - The timing of transaction starts doesn't matter (other than the RO optimizations) as long as Tpivot overlaps both Tin and Tout. - Tout commits first - Any time before Tin commits, it reads from relation X. - Tin commits second - Tpivot develops a rw-conflict out to Tout on a relation *other than* X, any time after both have started and before Tpivot commits. - Tpivot writes to a relation which has the same OID as relation X, but which isn't relation X. This requires that a new relation be created with the same OID which had been used by X, and that this new relation is visible to Tpivot -- which acquired its snapshot *before X was dropped*. Is this possible? If a transaction gets its snapshot while OID of N is assigned to relation X, can that transaction wind up seeing an OID of N as a reference to relation Y? If not, there aren't any false positives possible. 
Given the quantity and complexity of code which I've needed to generate to cover this, I'm wondering again whether it is justified *even if* such a false positive is possible. It certainly seems like it would be a *very* infrequent occurrence. There's also the question of what shows in pg_locks, but I'm wondering whether even that is enough to justify this much code. Thoughts? Maybe I should submit a patch without the added complexity of the scheduled cleanup and we can discuss that as a separate patch? -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: psql include file using relative path
On Sun, Jun 5, 2011 at 10:21 AM, Gurjeet Singh wrote: > On Sat, May 21, 2011 at 11:59 AM, Josh Kupershmidt > wrote: > Tweaks applied, but omitted the C variable names as I don't think that adds > much value. Your rewordings are fine, but the article "the" is missing in a few spots, e.g. * "uses \ir command" -> "uses the \ir command" * "to currently processing file" -> "to the currently processing file" * "same as \i command" -> "same as the \i command" I think "processing" is better (and consistent with the rest of the comments) than "processed" here: + * the file from where the currently processed file (if any) is located. > New version of the patch attached. Thanks for the review. I think the patch is in pretty good shape now. The memory leak is gone AFAICT, and the comments and documentation updates look good. Josh -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Assert failure when rechecking an exclusion constraint
Noah Misch writes: > On Sat, Jun 04, 2011 at 05:49:31PM -0400, Tom Lane wrote: >> So in short, I'm thinking move lines 1760-1772 (in HEAD) of index.c to >> the end of index_build(), then insert a ResetReindexProcessing() call in >> front of them; or maybe only do ResetReindexProcessing there if we >> actually do call IndexCheckExclusion. > Sounds reasonable. Need to remove the index from pendingReindexedIndexes, not > just call ResetReindexProcessing(). [ looks again... ] Uh, right. I was thinking that the pending list was just "pending" and not "in progress" indexes. I wonder if we should rejigger things so that that's actually true, ie, remove an index's OID from the pending list when we mark it as the current one? > Also, wouldn't that specific construction > make the catalog updates fail due to running in the table owner's security > context? AFAIR there's no security checks happening below this level. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
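The rejiggering Tom suggests -- keeping the pending list strictly "pending" by removing an index's OID at the moment it becomes the current one -- can be sketched as below. This is illustrative only; the names and the array-backed list are invented (the real code in index.c uses a List of OIDs):

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;

static Oid pending[8];			/* indexes awaiting reindex */
static size_t npending;
static Oid current_reindex;		/* 0 plays the role of InvalidOid here */

/*
 * Mark an index as the one currently being reindexed, and drop it from
 * the pending list so "pending" never includes the in-progress index.
 */
static void
set_current_reindex(Oid indexOid)
{
	size_t		i;

	for (i = 0; i < npending; i++)
	{
		if (pending[i] == indexOid)
		{
			/* order doesn't matter, so swap in the last entry */
			pending[i] = pending[--npending];
			break;
		}
	}
	current_reindex = indexOid;
}
```

With this invariant, a later check of the pending list cannot mistake the in-progress index for one still awaiting processing.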
Re: [HACKERS] Review: psql include file using relative path
On Sat, May 21, 2011 at 11:59 AM, Josh Kupershmidt wrote: > On Fri, May 20, 2011 at 2:35 PM, Gurjeet Singh > wrote: > > On Sat, May 14, 2011 at 5:03 PM, Josh Kupershmidt > > wrote: > > Thanks a lot for the review. My responses are inline below. > > Thanks for the fixes. Your updated patch is sent as a > patch-upon-a-patch, it'll probably be easier for everyone > (particularly the final committer) if you send inclusive patches > instead. > My Bad. I did not intend to do that. > > >> == Documentation == > >> The patch includes the standard psql help output description for the > >> new \ir command. I think ./doc/src/sgml/ref/psql-ref.sgml needs to be > >> patched as well, though. > > > > Done. > > This is a decent description from a technical standpoint: > > >When used within a script, if the class="parameter">filename >uses relative path notation, then the file will be looked up > relative to currently >executing file's location. > >If the filename > uses an absolute path >notation, or if this command is being used in interactive > mode, then it behaves the >same as \i command. > > > but I think these paragraphs could be made a little more clear, by > making a suggestion about why someone would be interested in \ir. How > about this: > > >The \ir command is similar to \i, but > is useful for files which >load in other files. > >When used within a file loaded via \i, > \ir, or >-f, if the class="parameter">filename >is specified with a relative path, then the location of the > file will be determined >relative to the currently executing file's location. > > > >If filename is given as > an >absolute path, or if this command is used in interactive mode, then >\ir behaves the same as the \i command. > > > The sentence "When used within a file ..." is now a little > clunky/verbose, but I was trying to avoid potential confusion from > someone trying \ir via 'cat ../filename.sql | psql', which would be > "used within a script", but \ir wouldn't know that. 
> Although a bit winded, I think that sounds much clearer. > > > >> == Code == > >> 1.) I have some doubts about whether the memory allocated here: > >>char *relative_file = pg_malloc(dir_len + 1 + file_len + 1); > >> is always free()'d, particularly if this condition is hit: > >> > >>if (!fd) > >>{ > >>psql_error("%s: %s\n", filename, strerror(errno)); > >>return EXIT_FAILURE; > >>} > > > > Fixed. > > Well, this fix: > >if (!fd) >{ > + if (relative_path != NULL) > + free(relative_path); > + >psql_error("%s: %s\n", filename, strerror(errno)); > > uses the wrong variable name (relative_path instead of relative_file), > and the subsequent psql_error() call will then reference freed memory, > since relative_file was assigned to filename. > > But even after fixing this snippet to get it to compile, like so: > >if (!fd) >{ >psql_error("%s: %s\n", filename, strerror(errno)); > if (relative_file != NULL) >free(relative_file); > >return EXIT_FAILURE; >} > > I was still seeing memory leaks in valgrind growing with the number of > \ir calls between files (see valgrind_bad_report.txt attached). I > think that relative_file needs to be freed even in the non-error case, > like so: > > error: >if (fd != stdin) >fclose(fd); > >if (relative_file != NULL) >free(relative_file); > >pset.inputfile = oldfilename; >return result; > > At least, that fix seemed to get rid of the ballooning valgrind > reports for me. I then see a constant-sized < 500 byte leak complaint > from valgrind, the same as with unpatched psql. > Yup, that fixes it. Thanks. > > >> 4.) I think the changes to process_file() merit another comment or > >> two, e.g. describing what relative_file is supposed to be. > > > Added. > > Some cleanup for this comment: > > + /* > +* If the currently processing file uses \ir command, and > the parameter > +* to the command is a relative file path, then we resolve > this path > +* relative to currently processing file. 
> > suggested tweak: > >If the currently processing file uses the \ir command, and the filename >parameter is given as a relative path, then we resolve this path > relative >to the currently processing file (pset.inputfile). > > +* > +* If the \ir command was executed in interactive mode > (i.e. not in a > +* script) the we treat it the same as \i command. > +*/ > > suggested tweak: > >If the \ir command was executed in interactive mode (i.e. not in a >script, and pset.inputfile will be NULL) then we treat the filename >the same as the \i command does. > Tweak
Re: [HACKERS] patch for new feature: Buffer Cache Hibernation
Hi, > On 05/07/2011 03:32 AM, Mitsuru IWASAKI wrote: > > For 1, I've just finished my work. The latest patch is available at: > > http://people.freebsd.org/~iwasaki/postgres/buffer-cache-hibernation-postgresql-20110507.patch > > > > Reminder here--we can't accept code based on it being published to a web > page. You'll need to e-mail it to the pgsql-hackers mailing list to be > considered for the next PostgreSQL CommitFest, which is starting in a > few weeks. Code submitted to the mailing list is considered a release > of it to the project under the PostgreSQL license, which we can't just > assume for things when given only a URL to them. Sorry about that, but I had enough time to revise my patches this weekend. I have attached the patches to this mail, and will update the CommitFest page soon. > Also, you suggested you were out of time to work on this. If that's the > case, we'd like to know that so we don't keep cc'ing you about things in > expectation of an answer. Someone else may pick this up as a project to > continue working on. But it's going to need a fair amount of revision > before it matches what people want here, and I'm not sure how much of > what you've written is going to end up in any commit that may happen > from this idea. It seems that I don't have enough time to complete this work. You don't need to keep cc'ing me, and I'd be very happy if Postgres became the first DBMS to support a buffer cache hibernation feature. Thanks!
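The core idea in the patches below is to register each shared-memory region to hibernate as a base pointer plus element size and count, so it can be dumped to and restored from a hibernation file as one contiguous blob. A conceptual sketch (the type and function names here are invented for illustration, not the patch's actual API):

```c
#include <assert.h>
#include <stddef.h>

/* One registered region of shared memory to save/restore. */
typedef struct
{
	int		type;			/* e.g. descriptors vs. data blocks */
	char   *base;			/* start of the region */
	size_t	elem_size;		/* size of one element */
	size_t	nelems;			/* number of elements */
} HibernationRegion;

#define MAX_REGIONS 4

static HibernationRegion regions[MAX_REGIONS];
static int nregions;

/*
 * Record a region for hibernation; returns the number of bytes the
 * region will contribute to the hibernation file.
 */
static size_t
register_region(int type, char *base, size_t elem_size, size_t nelems)
{
	HibernationRegion *r = &regions[nregions++];

	r->type = type;
	r->base = base;
	r->elem_size = elem_size;
	r->nelems = nelems;

	return elem_size * nelems;
}
```

This matches the shape of the two registration calls in buf_init.c below: one region for the buffer descriptor array and one for the buffer blocks themselves.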
diff --git src/backend/access/transam/xlog.c src/backend/access/transam/xlog.c
index b0e4c41..7a3a207 100644
--- src/backend/access/transam/xlog.c
+++ src/backend/access/transam/xlog.c
@@ -4834,6 +4834,19 @@ ReadControlFile(void)
 #endif
 }
 
+bool
+GetControlFile(ControlFileData *controlFile)
+{
+	if (ControlFile == NULL)
+	{
+		return false;
+	}
+
+	memcpy(controlFile, ControlFile, sizeof(ControlFileData));
+
+	return true;
+}
+
 void
 UpdateControlFile(void)
 {
diff --git src/backend/bootstrap/bootstrap.c src/backend/bootstrap/bootstrap.c
index fc093cc..7ecf6bb 100644
--- src/backend/bootstrap/bootstrap.c
+++ src/backend/bootstrap/bootstrap.c
@@ -360,6 +360,15 @@ AuxiliaryProcessMain(int argc, char *argv[])
 	BaseInit();
 
 	/*
+	 * Only StartupProcess can call ResumeBufferCacheHibernation() after
+	 * InitFileAccess() and smgrinit().
+	 */
+	if (auxType == StartupProcess && BufferCacheHibernationLevel > 0)
+	{
+		ResumeBufferCacheHibernation();
+	}
+
+	/*
 	 * When we are an auxiliary process, we aren't going to do the full
 	 * InitPostgres pushups, but there are a couple of things that need to get
 	 * lit up even in an auxiliary process.
diff --git src/backend/storage/buffer/buf_init.c src/backend/storage/buffer/buf_init.c
index dadb49d..52eb51a 100644
--- src/backend/storage/buffer/buf_init.c
+++ src/backend/storage/buffer/buf_init.c
@@ -127,6 +127,14 @@ InitBufferPool(void)
 
 	/* Init other shared buffer-management stuff */
 	StrategyInitialize(!foundDescs);
+
+	if (BufferCacheHibernationLevel > 0)
+	{
+		ResisterBufferCacheHibernation(BUFFER_CACHE_HIBERNATION_TYPE_DESCRIPTORS,
+			(char *) BufferDescriptors, sizeof(BufferDesc), NBuffers);
+		ResisterBufferCacheHibernation(BUFFER_CACHE_HIBERNATION_TYPE_BLOCKS,
+			(char *) BufferBlocks, BLCKSZ, NBuffers);
+	}
 }
 
 /*
diff --git src/backend/storage/buffer/bufmgr.c src/backend/storage/buffer/bufmgr.c
index f96685d..dba8ebf 100644
--- src/backend/storage/buffer/bufmgr.c
+++ src/backend/storage/buffer/bufmgr.c
@@ -31,6 +31,7 @@
 #include "postgres.h"
 
 #include
+#include
 #include
 
 #include "catalog/catalog.h"
@@ -61,6 +62,13 @@
 #define BUF_WRITTEN		0x01
 #define BUF_REUSABLE	0x02
 
+/*
+ * Buffer Cache Hibernation stuff.
+ */
+/* enable this to debug buffer cache hibernation. */
+#if 0
+#define DEBUG_BUFFER_CACHE_HIBERNATION
+#endif
 
 /* GUC variables */
 bool		zero_damaged_pages = false;
@@ -765,6 +773,16 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 		}
 	}
 
+#ifdef DEBUG_BUFFER_CACHE_HIBERNATION
+	elog(DEBUG5,
+		 "alloc [%d]\t%03x,%d,%d,%d,%d\t%08x,%d,%d,%d,%d,%d",
+		 buf->buf_id, buf->flags, buf->usage_count, buf->refcount,
+		 buf->wait_backend_pid, buf->freeNext,
+		 newHash, newTag.rnode.spcNode,
+		 newTag.rnode.dbNode, newTag.rnode.relNode,
+		 newTag.forkNum, newTag.blockNum);
+#endif
+
 	return buf;
 }
 
@@ -800,6 +818,16 @@ BufferAlloc(SMgrRelation