Re: [HACKERS] foreign key locks, 2nd attempt
On 25 March 2012 09:17, Simon Riggs wrote:
> The main thing we're waiting on are the performance tests to confirm
> the lack of regression.

I have extensively benchmarked the latest revision of the patch (tpc-b.sql), which I pulled from Alvaro's github account. The benchmark was of the feature branch's then-current HEAD, "Don't follow update chains unless caller requests it". I've had to split these numbers out into two separate reports. Incidentally, at some future point I hope that pgbench-tools can handle testing across feature branches, initdb'ing and suchlike automatically and as needed. That's something that's likely to happen sooner rather than later.

The server used was kindly supplied by the University of Oregon open source lab.

Server (it's virtualised, but apparently this is purely for sandboxing purposes and the virtualisation technology is rather good):

IBM,8231-E2B POWER7 processor (8 cores)
Fedora 16
8GB RAM
Dedicated RAID1 disks. Exact configuration unknown.

postgresql.conf (this information is available when you drill down into each test too, fwiw):

max_connections = 200
shared_buffers = 2GB
checkpoint_segments = 30
checkpoint_completion_target = 0.8
effective_cache_size = 6GB

Reports:

http://results_fklocks.staticloud.com/
http://results_master_for_fks.staticloud.com/

Executive summary: There is a clear regression of less than 10%. There also appears to be a new source of contention at higher client counts.

I realise that the likely upshot of this, and of other concerns generally held at this late stage, is that this patch will not make it into 9.2. For what it's worth, that comes as a big disappointment to me. I would like to thank both Alvaro and Noah for their hard work here.
-- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
On Sat, Mar 17, 2012 at 10:45 PM, Alvaro Herrera wrote:
> Here is v11. This version is mainly updated to add pg_upgrade support,
> as discussed. It also contains the README file that was posted earlier
> (plus wording fixes per Bruce), a couple of bug fixes, and some comment
> updates.

The main thing we're waiting on are the performance tests to confirm the lack of regression. You are working on that, right?

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Simon Riggs's message of mar mar 06 17:28:12 -0300 2012:
> On Tue, Mar 6, 2012 at 7:39 PM, Alvaro Herrera wrote:
>
> > We provide four levels of tuple locking strength: SELECT FOR KEY UPDATE is
> > super-exclusive locking (used to delete tuples and more generally to update
> > tuples modifying the values of the columns that make up the key of the
> > tuple); SELECT FOR UPDATE is a standards-compliant exclusive lock; SELECT
> > FOR SHARE implements shared locks; and finally SELECT FOR KEY SHARE is a
> > super-weak mode that does not conflict with exclusive mode, but conflicts
> > with SELECT FOR KEY UPDATE. This last mode implements a mode just strong
> > enough to implement RI checks, i.e. it ensures that tuples do not go away
> > from under a check, without blocking some other transaction that wants to
> > update the tuple without changing its key.
>
> So there are 4 lock types, but we only have room for 3 on the tuple
> header, so we store the least common/deprecated of the 4 types as a
> multixactid. Some rewording would help there.

Hmm, I rewrote that paragraph two times. I'll try to adjust it a bit more.

> My understanding is that all of these workloads will change:
>
> * Users of explicit SHARE lockers will be slightly worse in the case
> of the 1st locker, but then after that they'll be the same as before.

Right. (We're assuming that there *are* users of SHARE locks, which I'm not sure is a given.)

> * Updates against an RI locked table will be dramatically faster
> because of reduced lock waits

Correct.

> ...and that these previous workloads are effectively unchanged:
>
> * Stream of RI checks causes mxacts

Yes.

> * Multi row deadlocks still possible

Yes.

> * Queues of writers still wait in the same way

Yes.

> * Deletes don't cause mxacts unless by same transaction

Yeah ... there's no way for anyone to not conflict with a FOR KEY UPDATE lock (the strength grabbed by a delete) unless you're the same transaction.
> > The possibility of having an update within a MultiXact means that they must
> > persist across crashes and restarts: a future reader of the tuple needs to
> > figure out whether the update committed or aborted. So we have a
> > requirement that pg_multixact needs to retain pages of its data until we're
> > certain that the MultiXacts in them are no longer of interest.
>
> I think the "no longer of interest" aspect needs to be tracked more
> closely because it will necessarily lead to more I/O.

Not sure what you mean here.

> If we store the LSN on each mxact page, as I think we need to, we can
> get rid of pages more quickly if we know they don't have an LSN set.
> So its possible we can optimise that more.

Hmm, I had originally thought that this was rather pointless, because it seemed unlikely that a segment would ever contain only multis with no updates. But then, maybe Robert is right and there are users out there that run a lot of RI checks and never update the masters ... Hm. I'm not sure that LSN tracking is the right tool for that optimization, however -- a single multi containing an update is enough to prevent a whole segment from being considered useless.

> > VACUUM is in charge of removing old MultiXacts at the time of tuple
> > freezing.
>
> You mean mxact segments?

Well, both. When a tuple is frozen, we remove both its Xmin/Xmax and any multi that it might have in Xmax. That's what I really meant above. But also, vacuum will remove pg_multixact segments just as it removes pg_clog segments. (It is possible, and probably desirable, to remove a multi much earlier than freezing the tuple. The patch does not (yet) do that, however.)

> Surely we set hint bits on tuples same as now? Hope so.

We set hint bits, but if a multi contains an update, we don't set HEAP_XMAX_COMMITTED even when the update is known committed. I think we could do this in some cases.

-- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Simon Riggs's message of mar mar 06 18:33:13 -0300 2012:
> The lock modes are correct, appropriate and IMHO have meaningful
> names. No redesign required here.
>
> Not sure about the naming of some of the flag bits however.

Feel free to suggest improvements ... I've probably seen them for too long to find them anything but what I intended them to mean.

-- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Simon Riggs's message of lun mar 05 15:28:59 -0300 2012:
>
> On Mon, Feb 27, 2012 at 2:47 AM, Robert Haas wrote:
> >> Regarding performance, the good thing about this patch is that if you
> >> have an operation that used to block, it might now not block. So maybe
> >> multixact-related operation is a bit slower than before, but if it
> >> allows you to continue operating rather than sit waiting until some
> >> other transaction releases you, it's much better.
> >
> > That's probably true, although there is some deferred cost that is
> > hard to account for. You might not block immediately, but then later
> > somebody might block either because the mxact SLRU now needs fsyncs or
> > because they've got to decode an mxid long after the relevant segment
> > has been evicted from the SLRU buffers. In general, it's hard to
> > bound that latter cost, because you only avoid blocking once (when the
> > initial update happens) but you might pay the extra cost of decoding
> > the mxid as many times as the row is read, which could be arbitrarily
> > many. How much of a problem that is in practice, I'm not completely
> > sure, but it has worried me before and it still does. In the worst
> > case scenario, a handful of frequently-accessed rows with MXIDs all of
> > whose members are dead except for the UPDATE they contain could result
> > in continual SLRU cache-thrashing.
>
> Cases I regularly see involve wait times of many seconds.
>
> When this patch helps, it will help performance by algorithmic gains,
> so perhaps x10-100.
>
> That can and should be demonstrated though, I agree.

BTW, the isolation tester suite has a few cases that die with deadlocks in the unpatched code but continue without dying with the patched code. Others block in unpatched master and continue without blocking when patched. This should be enough proof that there are "algorithmic gains" here.
There's also a test case that demonstrates a fix for the problem (pointed out in the docs) that if you acquire a row lock, a subxact then upgrades it (say by deleting the row), and the subxact aborts, the original row lock is lost. With the patched code, the original lock is no longer lost.

I completely agree with the idea that we need some mitigation against repeated lookups of mxids that contain committed updates.

-- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Bruce Momjian's message of vie mar 16 15:22:05 -0300 2012:
>
> On Fri, Mar 16, 2012 at 10:08:07AM -0400, Robert Haas wrote:
> > On Thu, Mar 15, 2012 at 11:09 PM, Bruce Momjian wrote:
> > > On Tue, Mar 13, 2012 at 01:46:24PM -0400, Robert Haas wrote:
> > >> On Mon, Mar 12, 2012 at 3:28 PM, Simon Riggs wrote:
> > >> > I agree with you that some worst case performance tests should be
> > >> > done. Could you please say what you think the worst cases would be, so
> > >> > those can be tested? That would avoid wasting time or getting anything
> > >> > backwards.
> > >>
> > >> I've thought about this some and here's what I've come up with so far:
> > >
> > > I question whether we are in a position to do the testing necessary to
> > > commit this for 9.2.
> >
> > Is anyone even working on testing it?
>
> No one I know of. I am just trying to set expectations that this still
> has a long way to go.

A Command Prompt colleague is on it.

-- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] foreign key locks, 2nd attempt
On Fri, Mar 16, 2012 at 10:08:07AM -0400, Robert Haas wrote:
> On Thu, Mar 15, 2012 at 11:09 PM, Bruce Momjian wrote:
> > On Tue, Mar 13, 2012 at 01:46:24PM -0400, Robert Haas wrote:
> >> On Mon, Mar 12, 2012 at 3:28 PM, Simon Riggs wrote:
> >> > I agree with you that some worst case performance tests should be
> >> > done. Could you please say what you think the worst cases would be, so
> >> > those can be tested? That would avoid wasting time or getting anything
> >> > backwards.
> >>
> >> I've thought about this some and here's what I've come up with so far:
> >
> > I question whether we are in a position to do the testing necessary to
> > commit this for 9.2.
>
> Is anyone even working on testing it?

No one I know of. I am just trying to set expectations that this still has a long way to go.

-- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
Re: [HACKERS] foreign key locks, 2nd attempt
On Fri, Mar 16, 2012 at 10:40:01AM -0300, Alvaro Herrera wrote:
>
> Excerpts from Alvaro Herrera's message of vie mar 16 10:36:11 -0300 2012:
> > > Now I am confused. Where do you see the word "hint" used by
> > > HEAP_XMAX_EXCL_LOCK and HEAP_XMAX_SHARED_LOCK? These are tuple infomask
> > > bits, not hints, meaning they are not optional or there just for
> > > performance.
> >
> > Okay, I think this is just a case of confusing terminology. I have
> > always assumed (because I have not seen any evidence to the contrary)
> > that anything in t_infomask and t_infomask2 is a "hint bit" --
> > regardless of it being actually a hint or something with a stronger
> > significance.
>
> Maybe this is just my mistake. I see in
> http://wiki.postgresql.org/wiki/Hint_Bits that we only call the
> COMMITTED/INVALID infomask bits "hints".
>
> I think it's easy enough to correct the README to call them "infomask
> bits" rather than hints .. I'll go do that.

OK, thanks. I only brought it up so people would not be confused by thinking these were optional pieces of information, and that the real information is stored somewhere else.

-- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
Re: [HACKERS] foreign key locks, 2nd attempt
On Thu, Mar 15, 2012 at 11:09 PM, Bruce Momjian wrote:
> On Tue, Mar 13, 2012 at 01:46:24PM -0400, Robert Haas wrote:
>> On Mon, Mar 12, 2012 at 3:28 PM, Simon Riggs wrote:
>> > I agree with you that some worst case performance tests should be
>> > done. Could you please say what you think the worst cases would be, so
>> > those can be tested? That would avoid wasting time or getting anything
>> > backwards.
>>
>> I've thought about this some and here's what I've come up with so far:
>
> I question whether we are in a position to do the testing necessary to
> commit this for 9.2.

Is anyone even working on testing it?

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Alvaro Herrera's message of vie mar 16 10:36:11 -0300 2012:
> > Now I am confused. Where do you see the word "hint" used by
> > HEAP_XMAX_EXCL_LOCK and HEAP_XMAX_SHARED_LOCK? These are tuple infomask
> > bits, not hints, meaning they are not optional or there just for
> > performance.
>
> Okay, I think this is just a case of confusing terminology. I have
> always assumed (because I have not seen any evidence to the contrary)
> that anything in t_infomask and t_infomask2 is a "hint bit" --
> regardless of it being actually a hint or something with a stronger
> significance.

Maybe this is just my mistake. I see in http://wiki.postgresql.org/wiki/Hint_Bits that we only call the COMMITTED/INVALID infomask bits "hints".

I think it's easy enough to correct the README to call them "infomask bits" rather than hints .. I'll go do that.

-- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Bruce Momjian's message of vie mar 16 00:04:06 -0300 2012:
>
> On Tue, Mar 13, 2012 at 02:35:02PM -0300, Alvaro Herrera wrote:
> >
> > Excerpts from Bruce Momjian's message of mar mar 13 14:00:52 -0300 2012:
> > >
> > > On Tue, Mar 06, 2012 at 04:39:32PM -0300, Alvaro Herrera wrote:
> > >
> > > > When there is a single locker in a tuple, we can just store the locking
> > > > info in the tuple itself. We do this by storing the locker's Xid in
> > > > XMAX, and setting hint bits specifying the locking strength. There is
> > > > one exception here: since hint bit space is limited, we do not provide
> > > > a separate hint bit for SELECT FOR SHARE, so we have to use the
> > > > extended info in a MultiXact in that case. (The other cases, SELECT
> > > > FOR UPDATE and SELECT FOR KEY SHARE, are presumably more commonly used
> > > > due to being the standards-mandated locking mechanism, or heavily used
> > > > by the RI code, so we want to provide fast paths for those.)
> > >
> > > Are those tuple bits actually "hint" bits? They seem quite a bit more
> > > powerful than a "hint".
> >
> > I'm not sure what your point is. We've had a "hint" bit for SELECT FOR
> > UPDATE for ages. Even 8.2 had HEAP_XMAX_EXCL_LOCK and
> > HEAP_XMAX_SHARED_LOCK. Maybe they are misnamed and aren't really
> > "hints", but it's not the job of this patch to fix that problem.
>
> Now I am confused. Where do you see the word "hint" used by
> HEAP_XMAX_EXCL_LOCK and HEAP_XMAX_SHARED_LOCK? These are tuple infomask
> bits, not hints, meaning they are not optional or there just for
> performance.

Okay, I think this is just a case of confusing terminology. I have always assumed (because I have not seen any evidence to the contrary) that anything in t_infomask and t_infomask2 is a "hint bit" -- regardless of it being actually a hint or something with a stronger significance.

HEAP_XMAX_EXCL_LOCK and HEAP_XMAX_SHARED_LOCK are certainly not "optional" in the sense that if they are missing, the meaning of the Xmax field is completely different. So in all correctness they are not "hints", though we call them that. Now, if we want to differentiate infomask bits that are just hints from those that are something else, we can do that, but I'm not sure it's useful -- AFAICS only XMAX_COMMITTED and XMIN_COMMITTED are proper hints.

-- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] foreign key locks, 2nd attempt
On Tue, Mar 13, 2012 at 01:46:24PM -0400, Robert Haas wrote:
> On Mon, Mar 12, 2012 at 3:28 PM, Simon Riggs wrote:
> > I agree with you that some worst case performance tests should be
> > done. Could you please say what you think the worst cases would be, so
> > those can be tested? That would avoid wasting time or getting anything
> > backwards.
>
> I've thought about this some and here's what I've come up with so far:

I question whether we are in a position to do the testing necessary to commit this for 9.2.

-- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
Re: [HACKERS] foreign key locks, 2nd attempt
On Thu, Mar 15, 2012 at 11:04:06PM -0400, Bruce Momjian wrote:
> On Tue, Mar 13, 2012 at 02:35:02PM -0300, Alvaro Herrera wrote:
> >
> > Excerpts from Bruce Momjian's message of mar mar 13 14:00:52 -0300 2012:
> > >
> > > On Tue, Mar 06, 2012 at 04:39:32PM -0300, Alvaro Herrera wrote:
> > >
> > > > When there is a single locker in a tuple, we can just store the locking
> > > > info in the tuple itself. We do this by storing the locker's Xid in
> > > > XMAX, and setting hint bits specifying the locking strength. There is
> > > > one exception here: since hint bit space is limited, we do not provide
> > > > a separate hint bit for SELECT FOR SHARE, so we have to use the
> > > > extended info in a MultiXact in that case. (The other cases, SELECT
> > > > FOR UPDATE and SELECT FOR KEY SHARE, are presumably more commonly used
> > > > due to being the standards-mandated locking mechanism, or heavily used
> > > > by the RI code, so we want to provide fast paths for those.)
> > >
> > > Are those tuple bits actually "hint" bits? They seem quite a bit more
> > > powerful than a "hint".
> >
> > I'm not sure what your point is. We've had a "hint" bit for SELECT FOR
> > UPDATE for ages. Even 8.2 had HEAP_XMAX_EXCL_LOCK and
> > HEAP_XMAX_SHARED_LOCK. Maybe they are misnamed and aren't really
> > "hints", but it's not the job of this patch to fix that problem.
>
> Now I am confused. Where do you see the word "hint" used by
> HEAP_XMAX_EXCL_LOCK and HEAP_XMAX_SHARED_LOCK? These are tuple infomask
> bits, not hints, meaning they are not optional or there just for
> performance.

Are you saying that the bit is only a guide and is there only for performance? If so, I understand why it is called "hint".

-- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
Re: [HACKERS] foreign key locks, 2nd attempt
On Tue, Mar 13, 2012 at 02:35:02PM -0300, Alvaro Herrera wrote:
>
> Excerpts from Bruce Momjian's message of mar mar 13 14:00:52 -0300 2012:
> >
> > On Tue, Mar 06, 2012 at 04:39:32PM -0300, Alvaro Herrera wrote:
> >
> > > When there is a single locker in a tuple, we can just store the locking
> > > info in the tuple itself. We do this by storing the locker's Xid in
> > > XMAX, and setting hint bits specifying the locking strength. There is
> > > one exception here: since hint bit space is limited, we do not provide a
> > > separate hint bit for SELECT FOR SHARE, so we have to use the extended
> > > info in a MultiXact in that case. (The other cases, SELECT FOR UPDATE
> > > and SELECT FOR KEY SHARE, are presumably more commonly used due to being
> > > the standards-mandated locking mechanism, or heavily used by the RI
> > > code, so we want to provide fast paths for those.)
> >
> > Are those tuple bits actually "hint" bits? They seem quite a bit more
> > powerful than a "hint".
>
> I'm not sure what your point is. We've had a "hint" bit for SELECT FOR
> UPDATE for ages. Even 8.2 had HEAP_XMAX_EXCL_LOCK and
> HEAP_XMAX_SHARED_LOCK. Maybe they are misnamed and aren't really
> "hints", but it's not the job of this patch to fix that problem.

Now I am confused. Where do you see the word "hint" used by HEAP_XMAX_EXCL_LOCK and HEAP_XMAX_SHARED_LOCK? These are tuple infomask bits, not hints, meaning they are not optional or there just for performance.

-- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
Re: [HACKERS] foreign key locks, 2nd attempt
On Thu, Mar 15, 2012 at 08:37:36PM -0400, Robert Haas wrote:
> On Thu, Mar 15, 2012 at 5:07 PM, Simon Riggs wrote:
> > On Wed, Mar 14, 2012 at 5:23 PM, Robert Haas wrote:
> >>> You still have HEAP_XMAX_{INVALID,COMMITTED} to reduce the pressure on
> >>> mxid lookups, so I think something more sophisticated is needed to
> >>> exercise that cost. Not sure what.
> >>
> >> I don't think HEAP_XMAX_COMMITTED is much help, because committed !=
> >> all-visible.
> >
> > So because committed does not equal all visible there will be
> > additional lookups on mxids? That's complete rubbish.
>
> Noah seemed to be implying that once the updating transaction
> committed, HEAP_XMAX_COMMITTED would get set and save the mxid lookup.
> But I think that's not true, because anyone who looks at the tuple
> afterward will still need to know the exact xmax, to test it against
> their snapshot.

Yeah, my comment above was wrong. I agree that we'll need to retrieve the mxid members during every MVCC scan until we either mark the page all-visible or have occasion to simplify the mxid xmax to the updater xid.
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Robert Haas's message of jue mar 15 21:37:36 -0300 2012:
>
> On Thu, Mar 15, 2012 at 5:07 PM, Simon Riggs wrote:
> > On Wed, Mar 14, 2012 at 5:23 PM, Robert Haas wrote:
> >>> You still have HEAP_XMAX_{INVALID,COMMITTED} to reduce the pressure on
> >>> mxid lookups, so I think something more sophisticated is needed to
> >>> exercise that cost. Not sure what.
> >>
> >> I don't think HEAP_XMAX_COMMITTED is much help, because committed !=
> >> all-visible.
> >
> > So because committed does not equal all visible there will be
> > additional lookups on mxids? That's complete rubbish.
>
> Noah seemed to be implying that once the updating transaction
> committed, HEAP_XMAX_COMMITTED would get set and save the mxid lookup.
> But I think that's not true, because anyone who looks at the tuple
> afterward will still need to know the exact xmax, to test it against
> their snapshot.

Yeah, we don't set HEAP_XMAX_COMMITTED on multis, even when there's an update in it and it committed. I think we could handle it, or at least some of the cases, but that'd require careful re-examination of all the tqual.c code, which is not something I want to do right now.

-- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] foreign key locks, 2nd attempt
On Thu, Mar 15, 2012 at 5:07 PM, Simon Riggs wrote:
> On Wed, Mar 14, 2012 at 5:23 PM, Robert Haas wrote:
>>> You still have HEAP_XMAX_{INVALID,COMMITTED} to reduce the pressure on
>>> mxid lookups, so I think something more sophisticated is needed to
>>> exercise that cost. Not sure what.
>>
>> I don't think HEAP_XMAX_COMMITTED is much help, because committed !=
>> all-visible.
>
> So because committed does not equal all visible there will be
> additional lookups on mxids? That's complete rubbish.

Noah seemed to be implying that once the updating transaction committed, HEAP_XMAX_COMMITTED would get set and save the mxid lookup. But I think that's not true, because anyone who looks at the tuple afterward will still need to know the exact xmax, to test it against their snapshot.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] foreign key locks, 2nd attempt
On Thu, Mar 15, 2012 at 10:13 PM, Alvaro Herrera wrote:
>
> Excerpts from Simon Riggs's message of jue mar 15 19:04:41 -0300 2012:
>>
>> On Thu, Mar 15, 2012 at 9:54 PM, Alvaro Herrera wrote:
>> >
>> > Excerpts from Simon Riggs's message of jue mar 15 18:38:53 -0300 2012:
>> >> On Thu, Mar 15, 2012 at 2:26 AM, Robert Haas wrote:
>> >
>> >> > But that would only make sense if
>> >> > we thought that getting rid of the fsyncs would be more valuable than
>> >> > avoiding the blocking here, and I don't.
>> >>
>> >> You're right that the existing code could use some optimisation.
>> >>
>> >> I'm a little tired, but I can't see a reason to fsync this except at
>> >> checkpoint.
>> >
>> > Hang on. What fsyncs are we talking about? I don't see that the
>> > multixact code calls any fsync except at checkpoint and shutdown.
>>
>> If a dirty page is evicted it will fsync.
>
> Ah, right.
>
>> >> Also seeing that we issue 2 WAL records for each RI check. We issue
>> >> one during MultiXactIdCreate/MultiXactIdExpand and then immediately
>> >> afterwards issue a XLOG_HEAP_LOCK record. The comments on both show
>> >> that each thinks it is doing it for the same reason and is the only
>> >> place it's being done. Alvaro, any ideas why that is?
>> >
>> > AFAIR the XLOG_HEAP_LOCK log entry only records the fact that the row is
>> > being locked by a multixact -- it doesn't record the contents (member
>> > xids) of said multixact, which is what the other log entry records.
>>
>> Agreed. But issuing two records when we could issue just one seems a
>> little strange, especially when the two record types follow one
>> another so closely - so we end up queuing for the lock twice while
>> holding the lock on the data block.
>
> Hmm, that seems like an optimization that could be done separately.

Oh yes, definitely not something for you to add to the main patch. Just some additional tuning to alleviate Robert's concerns.
-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Simon Riggs's message of jue mar 15 19:04:41 -0300 2012:
>
> On Thu, Mar 15, 2012 at 9:54 PM, Alvaro Herrera wrote:
> >
> > Excerpts from Simon Riggs's message of jue mar 15 18:38:53 -0300 2012:
> >> On Thu, Mar 15, 2012 at 2:26 AM, Robert Haas wrote:
> >
> >> > But that would only make sense if
> >> > we thought that getting rid of the fsyncs would be more valuable than
> >> > avoiding the blocking here, and I don't.
> >>
> >> You're right that the existing code could use some optimisation.
> >>
> >> I'm a little tired, but I can't see a reason to fsync this except at
> >> checkpoint.
> >
> > Hang on. What fsyncs are we talking about? I don't see that the
> > multixact code calls any fsync except at checkpoint and shutdown.
>
> If a dirty page is evicted it will fsync.

Ah, right.

> >> Also seeing that we issue 2 WAL records for each RI check. We issue
> >> one during MultiXactIdCreate/MultiXactIdExpand and then immediately
> >> afterwards issue a XLOG_HEAP_LOCK record. The comments on both show
> >> that each thinks it is doing it for the same reason and is the only
> >> place it's being done. Alvaro, any ideas why that is?
> >
> > AFAIR the XLOG_HEAP_LOCK log entry only records the fact that the row is
> > being locked by a multixact -- it doesn't record the contents (member
> > xids) of said multixact, which is what the other log entry records.
>
> Agreed. But issuing two records when we could issue just one seems a
> little strange, especially when the two record types follow one
> another so closely - so we end up queuing for the lock twice while
> holding the lock on the data block.

Hmm, that seems like an optimization that could be done separately.

-- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] foreign key locks, 2nd attempt
On Thu, Mar 15, 2012 at 9:54 PM, Alvaro Herrera wrote:
> Excerpts from Simon Riggs's message of Thu Mar 15 18:38:53 -0300 2012:
>> On Thu, Mar 15, 2012 at 2:26 AM, Robert Haas wrote:
>> > But that would only make sense if we thought that getting rid of the
>> > fsyncs would be more valuable than avoiding the blocking here, and I don't.
>>
>> You're right that the existing code could use some optimisation.
>>
>> I'm a little tired, but I can't see a reason to fsync this except at
>> checkpoint.
>
> Hang on. What fsyncs are we talking about? I don't see that the
> multixact code calls any fsync except at checkpoint and shutdown.

If a dirty page is evicted it will fsync.

>> Also seeing that we issue 2 WAL records for each RI check. We issue
>> one during MultiXactIdCreate/MultiXactIdExpand and then immediately
>> afterwards issue an XLOG_HEAP_LOCK record. The comments on both show
>> that each thinks it is doing it for the same reason and is the only
>> place it's being done. Alvaro, any ideas why that is?
>
> AFAIR the XLOG_HEAP_LOCK log entry only records the fact that the row is
> being locked by a multixact -- it doesn't record the contents (member
> xids) of said multixact, which is what the other log entry records.

Agreed. But issuing two records when we could issue just one seems a
little strange, especially when the two record types follow one
another so closely - so we end up queuing for the lock twice while
holding the lock on the data block.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
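Simon's point (the multixact SLRU fsyncs not only at checkpoint, but whenever a dirty page must be evicted to make room) can be illustrated with a toy LRU buffer model. This is my sketch for illustration, not slru.c's actual replacement logic:

```python
# Toy model of an SLRU-style buffer pool (not PostgreSQL's slru.c):
# a small set of in-memory page slots over a large logical file.
# Writing to a page dirties it; evicting a dirty page to make room
# forces a write plus fsync, so a working set larger than the slot
# count turns nearly every page replacement into synchronous I/O.
from collections import OrderedDict

class ToySlru:
    def __init__(self, num_slots):
        self.num_slots = num_slots
        self.slots = OrderedDict()  # pageno -> dirty flag, in LRU order
        self.fsyncs = 0

    def write(self, pageno):
        if pageno in self.slots:
            self.slots.move_to_end(pageno)       # already resident
        elif len(self.slots) >= self.num_slots:
            _, dirty = self.slots.popitem(last=False)  # evict LRU page
            if dirty:
                self.fsyncs += 1                 # dirty eviction => fsync
        self.slots[pageno] = True                # mark page dirty

# 8 slots, sequential writes across 100 pages: every replacement after
# the first 8 pages evicts a dirty page.
slru = ToySlru(8)
for page in range(100):
    slru.write(page)
print(slru.fsyncs)  # 92
```

When the working set fits in the slots, no eviction fsyncs occur at all, which is why the cost only shows up under sustained mxid allocation.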
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Simon Riggs's message of Thu Mar 15 18:46:44 -0300 2012:
> On Thu, Mar 15, 2012 at 1:17 AM, Alvaro Herrera wrote:
> > As things stand today
>
> Can I confirm where we are now? Is there another version of the patch
> coming out soon?

Yes, another version is coming soon.

-- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Simon Riggs's message of Thu Mar 15 18:38:53 -0300 2012:
> On Thu, Mar 15, 2012 at 2:26 AM, Robert Haas wrote:
> > But that would only make sense if we thought that getting rid of the
> > fsyncs would be more valuable than avoiding the blocking here, and I don't.
>
> You're right that the existing code could use some optimisation.
>
> I'm a little tired, but I can't see a reason to fsync this except at
> checkpoint.

Hang on. What fsyncs are we talking about? I don't see that the
multixact code calls any fsync except at checkpoint and shutdown.

> Also seeing that we issue 2 WAL records for each RI check. We issue
> one during MultiXactIdCreate/MultiXactIdExpand and then immediately
> afterwards issue an XLOG_HEAP_LOCK record. The comments on both show
> that each thinks it is doing it for the same reason and is the only
> place it's being done. Alvaro, any ideas why that is?

AFAIR the XLOG_HEAP_LOCK log entry only records the fact that the row is
being locked by a multixact -- it doesn't record the contents (member
xids) of said multixact, which is what the other log entry records.

-- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] foreign key locks, 2nd attempt
On Thu, Mar 15, 2012 at 1:17 AM, Alvaro Herrera wrote:
> As things stand today

Can I confirm where we are now? Is there another version of the patch
coming out soon?

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] foreign key locks, 2nd attempt
On Thu, Mar 15, 2012 at 2:26 AM, Robert Haas wrote:
> On Wed, Mar 14, 2012 at 9:17 PM, Alvaro Herrera wrote:
>>> > Agreed. But speaking of that, why exactly do we fsync the multixact SLRU
>>> > today?
>>>
>>> Good question. So far, I can't think of a reason. "nextMulti" is critical,
>>> but we already fsync it with pg_control. We could delete the other
>>> multixact state data at every startup and set OldestVisibleMXactId
>>> accordingly.
>>
>> Hmm, yeah.
>
> In a way, the fact that we don't do that is kind of fortuitous in this
> situation. I had just assumed that we were not fsyncing it because
> there seems to be no reason to do so. But since we are, we already
> know that the fsyncs resulting from frequent mxid allocation aren't a
> huge pain point. If they were, somebody would have presumably
> complained about it and fixed it before now. So that means that what
> we're really worrying about here is the overhead of fsyncing a little
> more often, which is a lot less scary than starting to do it when we
> weren't previously.

Good.

> Now, we could look at this as an opportunity to optimize the existing
> implementation by removing the fsyncs, rather than adding the new
> infrastructure Alvaro is proposing.

This is not an exercise in tuning mxact code. There is a serious
algorithmic problem that is causing real world problems. Removing the
fsync will *not* provide a solution to the problem, so there is no
"opportunity" here.

> But that would only make sense if we thought that getting rid of the
> fsyncs would be more valuable than avoiding the blocking here, and I don't.

You're right that the existing code could use some optimisation.

I'm a little tired, but I can't see a reason to fsync this except at
checkpoint.

Also seeing that we issue 2 WAL records for each RI check. We issue
one during MultiXactIdCreate/MultiXactIdExpand and then immediately
afterwards issue an XLOG_HEAP_LOCK record. The comments on both show
that each thinks it is doing it for the same reason and is the only
place it's being done. Alvaro, any ideas why that is?

> I still think that someone needs to do some benchmarking here, because
> this is a big complicated performance patch, and we can't predict the
> impact of it on real-world scenarios without testing. There is
> clearly some additional overhead, and it makes sense to measure it and
> hopefully discover that it isn't excessive. Still, I'm a bit
> relieved.

Very much agreed.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] foreign key locks, 2nd attempt
On Thu, Mar 15, 2012 at 2:15 AM, Robert Haas wrote:
> On Wed, Mar 14, 2012 at 6:10 PM, Noah Misch wrote:
>>> Well, post-release, the cat is out of the bag: we'll be stuck with
>>> this whether the performance characteristics are acceptable or not.
>>> That's why we'd better be as sure as possible before committing to
>>> this implementation that there's nothing we can't live with. It's not
>>> like there's any reasonable way to turn this off if you don't like it.
>>
>> I disagree; we're only carving in stone the FOR KEY SHARE and FOR KEY UPDATE
>> syntax additions. We could even avoid doing that by not documenting them. A
>> later major release could implement them using a completely different
>> mechanism or even reduce them to aliases, KEY SHARE = SHARE and KEY UPDATE =
>> UPDATE. To be sure, let's still do a good job the first time.
>
> What I mean is really that, once the release is out, we don't get to
> take it back. Sure, the next release can fix things, but any
> regressions will become obstacles to upgrading and pain points for new
> users.

This comment is completely superfluous. It's a complete waste of time
to turn up on a thread and remind people that if they commit something
and it doesn't actually work that it would be a bad thing. Why, we
might ask, do you think that thought needs to be expressed here?
Please, don't answer; let's spend the time on actually reviewing the
patch.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] foreign key locks, 2nd attempt
On Wed, Mar 14, 2012 at 5:23 PM, Robert Haas wrote:
>> You still have HEAP_XMAX_{INVALID,COMMITTED} to reduce the pressure on mxid
>> lookups, so I think something more sophisticated is needed to exercise that
>> cost. Not sure what.
>
> I don't think HEAP_XMAX_COMMITTED is much help, because committed !=
> all-visible.

So because committed does not equal all visible there will be
additional lookups on mxids? That's complete rubbish.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] foreign key locks, 2nd attempt
On Wed, Mar 14, 2012 at 9:17 PM, Alvaro Herrera wrote:
>> > Agreed. But speaking of that, why exactly do we fsync the multixact SLRU
>> > today?
>>
>> Good question. So far, I can't think of a reason. "nextMulti" is critical,
>> but we already fsync it with pg_control. We could delete the other multixact
>> state data at every startup and set OldestVisibleMXactId accordingly.
>
> Hmm, yeah.

In a way, the fact that we don't do that is kind of fortuitous in this
situation. I had just assumed that we were not fsyncing it because
there seems to be no reason to do so. But since we are, we already
know that the fsyncs resulting from frequent mxid allocation aren't a
huge pain point. If they were, somebody would have presumably
complained about it and fixed it before now. So that means that what
we're really worrying about here is the overhead of fsyncing a little
more often, which is a lot less scary than starting to do it when we
weren't previously.

Now, we could look at this as an opportunity to optimize the existing
implementation by removing the fsyncs, rather than adding the new
infrastructure Alvaro is proposing. But that would only make sense if
we thought that getting rid of the fsyncs would be more valuable than
avoiding the blocking here, and I don't.

I still think that someone needs to do some benchmarking here, because
this is a big complicated performance patch, and we can't predict the
impact of it on real-world scenarios without testing. There is
clearly some additional overhead, and it makes sense to measure it and
hopefully discover that it isn't excessive. Still, I'm a bit relieved.

> I have noticed that this code is not correct, because we don't know that
> we're holding an appropriate lock on the page, so we can't simply change
> the Xmax and reset those hint bits. As things stand today, mxids
> persist longer. (We could do some cleanup at HOT-style page prune, for
> example, though the lock we need is even weaker than that.) Overall
> this means that coming up with a test case demonstrating this pressure
> probably isn't that hard.

What would such a test case look like?

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] foreign key locks, 2nd attempt
On Wed, Mar 14, 2012 at 6:10 PM, Noah Misch wrote:
>> Well, post-release, the cat is out of the bag: we'll be stuck with
>> this whether the performance characteristics are acceptable or not.
>> That's why we'd better be as sure as possible before committing to
>> this implementation that there's nothing we can't live with. It's not
>> like there's any reasonable way to turn this off if you don't like it.
>
> I disagree; we're only carving in stone the FOR KEY SHARE and FOR KEY UPDATE
> syntax additions. We could even avoid doing that by not documenting them. A
> later major release could implement them using a completely different
> mechanism or even reduce them to aliases, KEY SHARE = SHARE and KEY UPDATE =
> UPDATE. To be sure, let's still do a good job the first time.

What I mean is really that, once the release is out, we don't get to
take it back. Sure, the next release can fix things, but any
regressions will become obstacles to upgrading and pain points for new
users.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Noah Misch's message of Wed Mar 14 19:10:00 -0300 2012:
> On Wed, Mar 14, 2012 at 01:23:14PM -0400, Robert Haas wrote:
> > On Tue, Mar 13, 2012 at 11:42 PM, Noah Misch wrote:
> > > More often than that; each 2-member mxid takes 4 bytes in an offsets file
> > > and 10 bytes in a members file. So, more like one fsync per ~580 mxids.
> > > Note that we already fsync the multixact SLRUs today, so any increase
> > > will arise from the widening of member entries from 4 bytes to 5. The
> > > realism of this test is attractive. Nearly-static parent tables are
> > > plenty common, and this test will illustrate the impact on those designs.
> >
> > Agreed. But speaking of that, why exactly do we fsync the multixact SLRU
> > today?
>
> Good question. So far, I can't think of a reason. "nextMulti" is critical,
> but we already fsync it with pg_control. We could delete the other multixact
> state data at every startup and set OldestVisibleMXactId accordingly.

Hmm, yeah.

> > > You still have HEAP_XMAX_{INVALID,COMMITTED} to reduce the pressure on
> > > mxid lookups, so I think something more sophisticated is needed to
> > > exercise that cost. Not sure what.
> >
> > I don't think HEAP_XMAX_COMMITTED is much help, because committed !=
> > all-visible. HEAP_XMAX_INVALID will obviously help, when it happens.
>
> True. The patch (see ResetMultiHintBit()) also replaces a multixact xmax with
> the updater xid when all transactions of the multixact have ended.

I have noticed that this code is not correct, because we don't know that
we're holding an appropriate lock on the page, so we can't simply change
the Xmax and reset those hint bits. As things stand today, mxids
persist longer. (We could do some cleanup at HOT-style page prune, for
example, though the lock we need is even weaker than that.) Overall
this means that coming up with a test case demonstrating this pressure
probably isn't that hard.

-- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] foreign key locks, 2nd attempt
On Wed, Mar 14, 2012 at 01:23:14PM -0400, Robert Haas wrote:
> On Tue, Mar 13, 2012 at 11:42 PM, Noah Misch wrote:
> > More often than that; each 2-member mxid takes 4 bytes in an offsets file
> > and 10 bytes in a members file. So, more like one fsync per ~580 mxids.
> > Note that we already fsync the multixact SLRUs today, so any increase will
> > arise from the widening of member entries from 4 bytes to 5. The realism
> > of this test is attractive. Nearly-static parent tables are plenty common,
> > and this test will illustrate the impact on those designs.
>
> Agreed. But speaking of that, why exactly do we fsync the multixact SLRU
> today?

Good question. So far, I can't think of a reason. "nextMulti" is critical,
but we already fsync it with pg_control. We could delete the other multixact
state data at every startup and set OldestVisibleMXactId accordingly.

> > You still have HEAP_XMAX_{INVALID,COMMITTED} to reduce the pressure on mxid
> > lookups, so I think something more sophisticated is needed to exercise that
> > cost. Not sure what.
>
> I don't think HEAP_XMAX_COMMITTED is much help, because committed !=
> all-visible. HEAP_XMAX_INVALID will obviously help, when it happens.

True. The patch (see ResetMultiHintBit()) also replaces a multixact xmax with
the updater xid when all transactions of the multixact have ended. You would
need a test workload with long-running multixacts that delay such replacement.
However, the workloads that come to mind are the very workloads for which this
patch eliminates lock waits; they wouldn't illustrate a worst-case.

> >> This isn't exactly a test case, but from Noah's previous comments I
> >> gather that there is a theoretical risk of mxid consumption running
> >> ahead of xid consumption. We should try to think about whether there
> >> are any realistic workloads where that might actually happen. I'm
> >> willing to believe that there aren't, but not just because somebody
> >> asserts it. The reason I'm concerned about this is because, if it
> >> should happen, the result will be more frequent anti-wraparound
> >> vacuums on every table in the cluster. Those are already quite
> >> painful for some users.
> >
> > Yes. Pre-release, what can we really do here other than have more people
> > thinking about ways it might happen in practice? Post-release, we could
> > suggest monitoring methods or perhaps have VACUUM emit a WARNING when a
> > table is using more mxid space than xid space.
>
> Well, post-release, the cat is out of the bag: we'll be stuck with
> this whether the performance characteristics are acceptable or not.
> That's why we'd better be as sure as possible before committing to
> this implementation that there's nothing we can't live with. It's not
> like there's any reasonable way to turn this off if you don't like it.

I disagree; we're only carving in stone the FOR KEY SHARE and FOR KEY UPDATE
syntax additions. We could even avoid doing that by not documenting them. A
later major release could implement them using a completely different
mechanism or even reduce them to aliases, KEY SHARE = SHARE and KEY UPDATE =
UPDATE. To be sure, let's still do a good job the first time.
Re: [HACKERS] foreign key locks, 2nd attempt
On Tue, Mar 13, 2012 at 11:42 PM, Noah Misch wrote:
> More often than that; each 2-member mxid takes 4 bytes in an offsets file and
> 10 bytes in a members file. So, more like one fsync per ~580 mxids. Note
> that we already fsync the multixact SLRUs today, so any increase will arise
> from the widening of member entries from 4 bytes to 5. The realism of this
> test is attractive. Nearly-static parent tables are plenty common, and this
> test will illustrate the impact on those designs.

Agreed. But speaking of that, why exactly do we fsync the multixact SLRU
today?

> You still have HEAP_XMAX_{INVALID,COMMITTED} to reduce the pressure on mxid
> lookups, so I think something more sophisticated is needed to exercise that
> cost. Not sure what.

I don't think HEAP_XMAX_COMMITTED is much help, because committed !=
all-visible. HEAP_XMAX_INVALID will obviously help, when it happens.

>> This isn't exactly a test case, but from Noah's previous comments I
>> gather that there is a theoretical risk of mxid consumption running
>> ahead of xid consumption. We should try to think about whether there
>> are any realistic workloads where that might actually happen. I'm
>> willing to believe that there aren't, but not just because somebody
>> asserts it. The reason I'm concerned about this is because, if it
>> should happen, the result will be more frequent anti-wraparound
>> vacuums on every table in the cluster. Those are already quite
>> painful for some users.
>
> Yes. Pre-release, what can we really do here other than have more people
> thinking about ways it might happen in practice? Post-release, we could
> suggest monitoring methods or perhaps have VACUUM emit a WARNING when a table
> is using more mxid space than xid space.

Well, post-release, the cat is out of the bag: we'll be stuck with
this whether the performance characteristics are acceptable or not.
That's why we'd better be as sure as possible before committing to
this implementation that there's nothing we can't live with. It's not
like there's any reasonable way to turn this off if you don't like it.

> Also consider a benchmark that does plenty of non-key updates on a parent
> table with no activity on the child table. We'll pay the overhead of
> determining that the key column(s) have not changed, but it will never pay
> off by preventing a lock wait.

Good idea.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] foreign key locks, 2nd attempt
On Tue, Mar 13, 2012 at 01:46:24PM -0400, Robert Haas wrote:
> On Mon, Mar 12, 2012 at 3:28 PM, Simon Riggs wrote:
> > I agree with you that some worst case performance tests should be
> > done. Could you please say what you think the worst cases would be, so
> > those can be tested? That would avoid wasting time or getting anything
> > backwards.
>
> I've thought about this some and here's what I've come up with so far:
>
> 1. SELECT FOR SHARE on a large table on a system with no write cache.

Easy enough that we may as well check it. Share-locking an entire large
table is impractical in a real application, so I would not worry if this
shows a substantial regression.

> 2. A small parent table (say 30 rows or so) and a larger child table
> with a many-to-one FK relationship to the parent (say 100 child rows
> per parent row), with heavy update activity on the child table, on a
> system where fsyncs are very slow. This should generate lots of mxid
> consumption, and every 1600 or so mxids (I think) we've got to fsync;
> does that generate a noticeable performance hit?

More often than that; each 2-member mxid takes 4 bytes in an offsets file and
10 bytes in a members file. So, more like one fsync per ~580 mxids. Note
that we already fsync the multixact SLRUs today, so any increase will arise
from the widening of member entries from 4 bytes to 5. The realism of this
test is attractive. Nearly-static parent tables are plenty common, and this
test will illustrate the impact on those designs.

> 3. It would be nice to test the impact of increased mxid lookups in
> the parent, but I've realized that the visibility map will probably
> mask a good chunk of that effect, which is a good thing. Still, maybe
> something like this: a fairly large parent table, say a million rows,
> but narrow rows, so that many of them fit on a page, with frequent
> reads and occasional updates (if there are only reads, autovacuum
> might end with all the visibility map bits set); plus a child table
> with one or a few rows per parent which is heavily updated. In theory
> this ought to be good for the patch, since the more fine-grained
> locking will avoid blocking, but in this case the parent table is
> large enough that you shouldn't get much blocking anyway, yet you'll
> still pay the cost of mxid lookups because the occasional updates on
> the parent will clear VM bits. This might not be the exactly right
> workload to measure this effect, but if it's not maybe someone can
> devote a little time to thinking about what would be.

You still have HEAP_XMAX_{INVALID,COMMITTED} to reduce the pressure on mxid
lookups, so I think something more sophisticated is needed to exercise that
cost. Not sure what.

> 4. A plain old pgbench run or two, to see whether there's any
> regression when none of this matters at all...

Might as well.

> This isn't exactly a test case, but from Noah's previous comments I
> gather that there is a theoretical risk of mxid consumption running
> ahead of xid consumption. We should try to think about whether there
> are any realistic workloads where that might actually happen. I'm
> willing to believe that there aren't, but not just because somebody
> asserts it. The reason I'm concerned about this is because, if it
> should happen, the result will be more frequent anti-wraparound
> vacuums on every table in the cluster. Those are already quite
> painful for some users.

Yes. Pre-release, what can we really do here other than have more people
thinking about ways it might happen in practice? Post-release, we could
suggest monitoring methods or perhaps have VACUUM emit a WARNING when a table
is using more mxid space than xid space.

Also consider a benchmark that does plenty of non-key updates on a parent
table with no activity on the child table. We'll pay the overhead of
determining that the key column(s) have not changed, but it will never pay
off by preventing a lock wait. Granted, this is barely representative of
application behavior. Perhaps, too, we already have a good sense of this
cost from the HOT benchmarking efforts and have no cause to revisit it.

Thanks,
nm
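Noah's "one fsync per ~580 mxids" figure is straightforward page-fill arithmetic, and it can be sanity-checked quickly. The sketch below assumes the standard 8 kB block size and the per-entry sizes stated above:

```python
# Back-of-the-envelope check of the "one fsync per ~580 mxids" figure:
# each 2-member multixact consumes 4 bytes of an offsets-SLRU page plus
# 10 bytes (2 members x 5 bytes, with the patch's widened entries) of a
# members-SLRU page, i.e. 14/8192 of a page per mxid. One page fills --
# and eventually needs an fsync -- per 8192/14 mxids.
BLCKSZ = 8192             # standard PostgreSQL block size
offsets_bytes = 4         # per multixact, offsets file
members_bytes = 2 * 5     # 2 members x 5 bytes each (patched width)

mxids_per_page = BLCKSZ / (offsets_bytes + members_bytes)
print(int(mxids_per_page))  # 585, i.e. roughly one fsync per ~580 mxids

# For comparison, with the pre-patch 4-byte member entries:
print(int(BLCKSZ / (4 + 2 * 4)))  # 682
```

So the patch's widening from 4- to 5-byte member entries moves the fill rate from one page per ~680 mxids to one per ~585, which is the "fsyncing a little more often" being discussed.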
Re: [HACKERS] foreign key locks, 2nd attempt
On Mon, Mar 12, 2012 at 3:28 PM, Simon Riggs wrote:
> I agree with you that some worst case performance tests should be
> done. Could you please say what you think the worst cases would be, so
> those can be tested? That would avoid wasting time or getting anything
> backwards.

I've thought about this some and here's what I've come up with so far:

1. SELECT FOR SHARE on a large table on a system with no write cache.

2. A small parent table (say 30 rows or so) and a larger child table
with a many-to-one FK relationship to the parent (say 100 child rows
per parent row), with heavy update activity on the child table, on a
system where fsyncs are very slow. This should generate lots of mxid
consumption, and every 1600 or so mxids (I think) we've got to fsync;
does that generate a noticeable performance hit?

3. It would be nice to test the impact of increased mxid lookups in
the parent, but I've realized that the visibility map will probably
mask a good chunk of that effect, which is a good thing. Still, maybe
something like this: a fairly large parent table, say a million rows,
but narrow rows, so that many of them fit on a page, with frequent
reads and occasional updates (if there are only reads, autovacuum
might end with all the visibility map bits set); plus a child table
with one or a few rows per parent which is heavily updated. In theory
this ought to be good for the patch, since the more fine-grained
locking will avoid blocking, but in this case the parent table is
large enough that you shouldn't get much blocking anyway, yet you'll
still pay the cost of mxid lookups because the occasional updates on
the parent will clear VM bits. This might not be the exactly right
workload to measure this effect, but if it's not maybe someone can
devote a little time to thinking about what would be.

4. A plain old pgbench run or two, to see whether there's any
regression when none of this matters at all...

This isn't exactly a test case, but from Noah's previous comments I
gather that there is a theoretical risk of mxid consumption running
ahead of xid consumption. We should try to think about whether there
are any realistic workloads where that might actually happen. I'm
willing to believe that there aren't, but not just because somebody
asserts it. The reason I'm concerned about this is because, if it
should happen, the result will be more frequent anti-wraparound
vacuums on every table in the cluster. Those are already quite
painful for some users.

It would be nice if Noah or someone else who has reviewed this patch
in detail could comment further. I am shooting from the hip here, a
bit.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
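Scenario #2 above can be sketched concretely as a pgbench custom run. The generator below is my illustration only (table, column, and file names are invented, and `\setrandom` is the pgbench syntax of that era); the key property is that concurrent inserts into the child make the RI check share-lock one of only 30 parent rows, so overlapping transactions force multixact creation:

```python
# Hypothetical generator for benchmark scenario #2: a 30-row parent
# table, ~100 child rows per parent, and heavy concurrent child-table
# activity. Each child INSERT makes the FK check take a share lock on
# one of only 30 parent rows, so overlapping transactions pile multiple
# lockers onto the same rows and allocate multixacts.
PARENTS, CHILDREN_PER_PARENT = 30, 100

init_sql = f"""
CREATE TABLE parent (id int PRIMARY KEY, info text);
CREATE TABLE child (
    id serial PRIMARY KEY,
    parent_id int NOT NULL REFERENCES parent (id),
    payload int DEFAULT 0
);
INSERT INTO parent SELECT g, 'p' || g FROM generate_series(1, {PARENTS}) g;
INSERT INTO child (parent_id)
    SELECT (g % {PARENTS}) + 1
    FROM generate_series(1, {PARENTS * CHILDREN_PER_PARENT}) g;
"""

# pgbench custom transaction: insert a child row for a random parent.
txn_sql = rf"""\setrandom pid 1 {PARENTS}
INSERT INTO child (parent_id) VALUES (:pid);
"""

# Many clients, long enough to see steady-state mxid/fsync behaviour,
# on a box with slow fsyncs (no write cache):
print("pgbench -n -c 16 -j 4 -T 300 -f child_txn.sql  # after running init_sql")
```

Running the same script against patched and unpatched builds, ideally alongside a plain pgbench baseline (scenario #4), would separate the multixact fsync overhead from general regression.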
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Bruce Momjian's message of Tue Mar 13 14:00:52 -0300 2012:
> On Tue, Mar 06, 2012 at 04:39:32PM -0300, Alvaro Herrera wrote:
> > When there is a single locker in a tuple, we can just store the locking info
> > in the tuple itself. We do this by storing the locker's Xid in XMAX, and
> > setting hint bits specifying the locking strength. There is one exception
> > here: since hint bit space is limited, we do not provide a separate hint bit
> > for SELECT FOR SHARE, so we have to use the extended info in a MultiXact in
> > that case. (The other cases, SELECT FOR UPDATE and SELECT FOR KEY SHARE,
> > are presumably more commonly used due to being the standards-mandated
> > locking mechanism, or heavily used by the RI code, so we want to provide
> > fast paths for those.)
>
> Are those tuple bits actually "hint" bits? They seem quite a bit more
> powerful than a "hint".

I'm not sure what your point is. We've had a "hint" bit for SELECT FOR
UPDATE for ages. Even 8.2 had HEAP_XMAX_EXCL_LOCK and
HEAP_XMAX_SHARED_LOCK. Maybe they are misnamed and aren't really
"hints", but it's not the job of this patch to fix that problem.

-- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] foreign key locks, 2nd attempt
On Mon, Mar 12, 2012 at 9:24 PM, Noah Misch wrote:
> When we lock an update-in-progress row, we walk the t_ctid chain and lock all
> descendant tuples. They may all have uncommitted xmins. This is essential to
> ensure that the final outcome of the updating transaction does not affect
> whether the locking transaction has its KEY SHARE lock. Similarly, when we
> update a previously-locked tuple, we copy any locks (always KEY SHARE locks)
> to the new version. That new tuple is both uncommitted and has locks, and we
> cannot easily sacrifice either property. Do you see a way to extend your
> scheme to cover these needs?

No, I think that sinks it. Good analysis.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] foreign key locks, 2nd attempt
On Tue, Mar 06, 2012 at 04:39:32PM -0300, Alvaro Herrera wrote:
> Here's a first attempt at a README illustrating this. I intend this to
> be placed in src/backend/access/heap/README.tuplock; the first three
> paragraphs are stolen from the comment in heap_lock_tuple, so I'd remove
> those from there, directing people to this new file instead. Is there
> something that you think should be covered more extensively (or at all)
> here?
...
> When there is a single locker in a tuple, we can just store the locking info
> in the tuple itself. We do this by storing the locker's Xid in XMAX, and
> setting hint bits specifying the locking strength. There is one exception
> here: since hint bit space is limited, we do not provide a separate hint bit
> for SELECT FOR SHARE, so we have to use the extended info in a MultiXact in
> that case. (The other cases, SELECT FOR UPDATE and SELECT FOR KEY SHARE, are
> presumably more commonly used due to being the standards-mandated locking
> mechanism, or heavily used by the RI code, so we want to provide fast paths
> for those.)

Are those tuple bits actually "hint" bits? They seem quite a bit more
powerful than a "hint".

-- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
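The locking strengths the README excerpt describes are easiest to grasp as a conflict table. The sketch below uses the four row-lock modes as they ultimately shipped in PostgreSQL 9.3 (the thread's "FOR KEY UPDATE" corresponds to the strongest mode, what plain FOR UPDATE and DELETE ended up taking; "FOR NO KEY UPDATE" was named later), so treat it as an after-the-fact summary rather than the patch revision under discussion:

```python
# Conflict table for the row-lock modes introduced by this work, per
# the semantics that shipped in 9.3. Listed weakest to strongest; two
# modes conflict iff the pair appears below.
MODES = ["KEY SHARE", "SHARE", "NO KEY UPDATE", "UPDATE"]

CONFLICTS = {
    ("KEY SHARE", "UPDATE"),
    ("SHARE", "NO KEY UPDATE"), ("SHARE", "UPDATE"),
    ("NO KEY UPDATE", "NO KEY UPDATE"), ("NO KEY UPDATE", "UPDATE"),
    ("UPDATE", "UPDATE"),
}

def conflicts(a, b):
    return (a, b) in CONFLICTS or (b, a) in CONFLICTS

# The payoff for foreign keys: an RI check (KEY SHARE on the referenced
# row) no longer blocks a non-key update of that row...
print(conflicts("KEY SHARE", "NO KEY UPDATE"))  # False
# ...while still blocking DELETE and key-changing updates:
print(conflicts("KEY SHARE", "UPDATE"))  # True
# Share-style locks never conflict with themselves:
print(conflicts("SHARE", "SHARE"))  # False
```

The single-locker fast path Bruce asks about is just the common case where one of these modes is recorded directly in the tuple's xmax plus infomask bits, instead of via a MultiXact.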
Re: [HACKERS] foreign key locks, 2nd attempt
On Mon, Mar 12, 2012 at 01:28:11PM -0400, Robert Haas wrote: > I spent some time thinking about this over the weekend, and I have an > observation, and an idea. Here's the observation: I believe that > locking a tuple whose xmin is uncommitted is always a noop, because if > it's ever possible for a transaction to wait for an XID that is part > of its own transaction (exact same XID, or sub-XIDs of the same top > XID), then a transaction could deadlock against itself. I believe > that this is not possible: if a transaction were to wait for an XID > assigned to that same backend, then the lock manager would observe > that an ExclusiveLock on the xid is already held, so the request for a > ShareLock would be granted immediately. I also don't believe there's > any situation in which the existence of an uncommitted tuple fails to > block another backend, but a lock on that same uncommitted tuple would > have caused another backend to block. If any of that sounds wrong, > you can stop reading here (but please tell me why it sounds wrong). When we lock an update-in-progress row, we walk the t_ctid chain and lock all descendant tuples. They may all have uncommitted xmins. This is essential to ensure that the final outcome of the updating transaction does not affect whether the locking transaction has its KEY SHARE lock. Similarly, when we update a previously-locked tuple, we copy any locks (always KEY SHARE locks) to the new version. That new tuple is both uncommitted and has locks, and we cannot easily sacrifice either property. Do you see a way to extend your scheme to cover these needs?
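Noah's two requirements above can be modelled in miniature. The following is an illustrative sketch only, not PostgreSQL code, and every name in it (HeapTuple, lock_key_share, update_tuple) is invented: locking walks the t_ctid chain so every descendant version carries the KEY SHARE lock, and an update of a locked tuple copies existing locks to the new version.

```python
from dataclasses import dataclass, field

@dataclass
class HeapTuple:
    xmin: int                              # creating transaction id
    ctid_next: "HeapTuple | None" = None   # newer version, if updated
    key_share_lockers: set = field(default_factory=set)

def lock_key_share(tup, locker_xid):
    """Lock a row: walk the t_ctid chain so every descendant version is locked."""
    while tup is not None:
        tup.key_share_lockers.add(locker_xid)
        tup = tup.ctid_next

def update_tuple(old, new_xmin):
    """Update a previously-locked row: the new version inherits KEY SHARE locks."""
    new = HeapTuple(xmin=new_xmin,
                    key_share_lockers=set(old.key_share_lockers))
    old.ctid_next = new
    return new

v1 = HeapTuple(xmin=100)
lock_key_share(v1, locker_xid=200)   # an RI check locks the visible version
v2 = update_tuple(v1, new_xmin=101)  # a concurrent update copies the lock forward
assert 200 in v2.key_share_lockers   # the lock holds whatever xid 101's outcome is
```

The point of the model is Noah's invariant: whether xid 101 commits or aborts, the locker (xid 200) holds its KEY SHARE lock on whichever version survives.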
Re: [HACKERS] foreign key locks, 2nd attempt
On Mon, Mar 12, 2012 at 6:14 PM, Robert Haas wrote: > Considering that nobody's done any work to resolve the uncertainty > about whether the worst-case performance characteristics of this patch > are acceptable, and considering further that it was undergoing massive > code churn for more than a month after the final CommitFest, I think > it's not that unreasonable to think it might not be ready for prime > time at this point. Thank you for cutting to the chase. The "uncertainty" of which you speak is a theoretical point you raised. It has been explained, but nobody has yet shown performance numbers to illustrate the point, only because the conclusions seemed so clear. I would point out that you haven't demonstrated the existence of a problem either, so redesigning something without any proof of a problem seems strange. Let me explain again what this patch does and why it has such a major performance benefit. This feature gives us a step change in lock reductions from FKs. A real-world "best case" might be to examine the benefit this patch has on a large batch load that inserts many new orders for existing customers. In my example case the orders table has a FK to the customer table. At the same time as the data load, we attempt to update a customer's additional details, address or current balance etc. The large load takes locks on the customer table and keeps them for the whole transaction, so the customer updates are locked out for multiple seconds, minutes or maybe hours, depending upon how far you want to stretch the example. With this patch the customer updates don't cause lock conflicts, though they require mxact lookups in *some* cases, so they might take 1-10ms extra, rather than 1-10 minutes more. That is >1000x faster. The only case that causes the additional lookups is the case that otherwise would have been locked, so producing "best case" results is trivial and the benefit can be as enormous as you like. I agree with you that some worst case performance tests should be done. 
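To make the orders/customer scenario concrete, here is a toy model of the conflict rule involved. The function and its names are invented for illustration; the behaviour encoded is my reading of the thread, not patch source: the RI trigger's row lock blocks a concurrent update of the parent row only when that update changes the key.

```python
def ri_check_blocks(parent_update_changes_key, patched):
    """Does the RI trigger's row lock conflict with a concurrent parent update?"""
    ri_lock = "KEY SHARE" if patched else "SHARE"
    update_lock = "KEY UPDATE" if parent_update_changes_key else "UPDATE"
    conflicts = {
        ("SHARE", "UPDATE"): True,          # old behaviour: any update blocks
        ("SHARE", "KEY UPDATE"): True,
        ("KEY SHARE", "UPDATE"): False,     # patched: non-key update proceeds
        ("KEY SHARE", "KEY UPDATE"): True,  # key change / delete still blocks
    }
    return conflicts[(ri_lock, update_lock)]

# Updating a customer's address/balance while the batch load holds RI locks:
assert ri_check_blocks(parent_update_changes_key=False, patched=False)
assert not ri_check_blocks(parent_update_changes_key=False, patched=True)
# Changing the customer's key still blocks, which preserves integrity:
assert ri_check_blocks(parent_update_changes_key=True, patched=True)
```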
Could you please say what you think the worst cases would be, so those can be tested? That would avoid wasting time or getting anything backwards. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] foreign key locks, 2nd attempt
On Mon, Mar 12, 2012 at 1:50 PM, Simon Riggs wrote: > On Mon, Mar 12, 2012 at 5:28 PM, Robert Haas wrote: >> In other words, we'd entirely avoid needing to make mxacts crash-safe, >> and we'd avoid most of the extra SLRU lookups that the current >> implementation requires; they'd only be needed when (and for as long >> as) the locked tuple was not yet all-visible. > > The current implementation only requires additional lookups in the > update/check case, which is the case that doesn't do anything other > than block right now. Since we're replacing lock contention with > physical access contention even the worst case situation is still > better than what we have now. Please feel free to point out worst case > situations and show that isn't true. I think I already have: http://archives.postgresql.org/pgsql-hackers/2012-02/msg01258.php The case I'm worried about is where we are allocating mxids quickly, and we end up having to wait for fsyncs on mxact segments. That might be very slow, but you could argue that it could *possibly* be still worthwhile if it avoids blocking. That doesn't strike me as a slam-dunk, though, because we've already seen and fixed cases where too many fsyncs causes the performance of the entire system to go down the tubes (cf. commit 7f242d880b5b5d9642675517466d31373961cf98). But it's really bad if there are no updates on the parent table - then, whatever extra overhead there is will be all for naught, since the more fine-grained locking doesn't help anyway. > I've also pointed out how to avoid overhead of making mxacts crash > safe when the new facilities are not in use, so I don't see problems > with the proposed mechanism. Given that I am still myself reviewing > the actual code. The closest thing I can find to a proposal from you in that regard is this comment: # I was really thinking we could skip the fsync of a page if we've not # persisted anything important on that page, since that was one of # Robert's performance points. 
It might be possible to do something with that idea, but at the moment I'm not seeing how to make it work. > So those things are not something we need to avoid. > > My feeling is that overwriting xmin is a clever idea, but arrives too > late to require sensible analysis in this stage of the CF. It's not > solving a problem, its just an alternate mechanism and at best an > optimisation of the mechanism. Were we to explore it now, it seems > certain that another person would observe that design were taking > place and so the patch should be rejected, which would be unnecessary > and wasteful. Considering that nobody's done any work to resolve the uncertainty about whether the worst-case performance characteristics of this patch are acceptable, and considering further that it was undergoing massive code churn for more than a month after the final CommitFest, I think it's not that unreasonable to think it might not be ready for prime time at this point. In any event, your argument is exactly backwards: we need to first decide whether the patch needs a redesign and then, if it does, postpone it. Deciding that we don't want to postpone it first, and therefore we're not going to redesign it even if that is what's really needed makes no sense. > I also think it would alter our ability to diagnose > problems, not least the normal test that xmax matches xmin across an > update. There's nothing stopping the new tuple from being frozen before the old one, even today. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] foreign key locks, 2nd attempt
On Mon, Mar 12, 2012 at 5:28 PM, Robert Haas wrote: > In other words, we'd entirely avoid needing to make mxacts crash-safe, > and we'd avoid most of the extra SLRU lookups that the current > implementation requires; they'd only be needed when (and for as long > as) the locked tuple was not yet all-visible. The current implementation only requires additional lookups in the update/check case, which is the case that doesn't do anything other than block right now. Since we're replacing lock contention with physical access contention, even the worst-case situation is still better than what we have now. Please feel free to point out worst-case situations and show that isn't true. I've also pointed out how to avoid the overhead of making mxacts crash-safe when the new facilities are not in use, so I don't see problems with the proposed mechanism, though I am still reviewing the actual code myself. So those things are not something we need to avoid. My feeling is that overwriting xmin is a clever idea, but it arrives too late to receive sensible analysis at this stage of the CF. It's not solving a problem; it's just an alternate mechanism, and at best an optimisation of the mechanism. Were we to explore it now, it seems certain that another person would observe that design was taking place and so the patch should be rejected, which would be unnecessary and wasteful. I also think it would alter our ability to diagnose problems, not least the normal test that xmax matches xmin across an update. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] foreign key locks, 2nd attempt
On Sun, Feb 26, 2012 at 9:47 PM, Robert Haas wrote: >> Regarding performance, the good thing about this patch is that if you >> have an operation that used to block, it might now not block. So maybe >> multixact-related operation is a bit slower than before, but if it >> allows you to continue operating rather than sit waiting until some >> other transaction releases you, it's much better. > > That's probably true, although there is some deferred cost that is > hard to account for. You might not block immediately, but then later > somebody might block either because the mxact SLRU now needs fsyncs or > because they've got to decode an mxid long after the relevant segment > has been evicted from the SLRU buffers. In general, it's hard to > bound that latter cost, because you only avoid blocking once (when the > initial update happens) but you might pay the extra cost of decoding > the mxid as many times as the row is read, which could be arbitrarily > many. How much of a problem that is in practice, I'm not completely > sure, but it has worried me before and it still does. In the worst > case scenario, a handful of frequently-accessed rows with MXIDs all of > whose members are dead except for the UPDATE they contain could result > in continual SLRU cache-thrashing. > > From a performance standpoint, we really need to think not only about > the cases where the patch wins, but also, and maybe more importantly, > the cases where it loses. There are some cases where the current > mechanism, use SHARE locks for foreign keys, is adequate. In > particular, it's adequate whenever the parent table is not updated at > all, or only very lightly. 
I believe that those people will pay > somewhat more with this patch, and especially in any case where > backends end up waiting for fsyncs in order to create new mxids, but > also just because I think this patch will have the effect of > increasing the space consumed by each individual mxid, which imposes a > distributed cost of its own. I spent some time thinking about this over the weekend, and I have an observation, and an idea. Here's the observation: I believe that locking a tuple whose xmin is uncommitted is always a noop, because if it's ever possible for a transaction to wait for an XID that is part of its own transaction (exact same XID, or sub-XIDs of the same top XID), then a transaction could deadlock against itself. I believe that this is not possible: if a transaction were to wait for an XID assigned to that same backend, then the lock manager would observe that an ExclusiveLock on the xid is already held, so the request for a ShareLock would be granted immediately. I also don't believe there's any situation in which the existence of an uncommitted tuple fails to block another backend, but a lock on that same uncommitted tuple would have caused another backend to block. If any of that sounds wrong, you can stop reading here (but please tell me why it sounds wrong). If it's right, then here's the idea: what if we stored mxids using xmin rather than xmax? This would mean that, instead of making mxids contain the tuple's original xmax, they'd need to instead contain the tuple's original xmin. This might seem like rearranging the deck chairs on the titanic, but I think it actually works out substantially better, because if we can assume that the xmin is committed, then we only need to know its exact value until it becomes older than RecentGlobalXmin. 
This means that a tuple can be both updated and locked at the same time without the MultiXact SLRU needing to be crash-safe, because if we crash and restart, any mxids that are still around from before the crash are known to contain only xmins that are now all-visible. We therefore don't need their exact values, so it doesn't matter if that data actually made it to disk. Furthermore, in the case where a previously-locked tuple is read repeatedly, we only need to keep doing SLRU lookups until the xmin becomes all-visible; after that, we can set a hint bit indicating that the tuple's xmin is all-visible, and any future readers (or writers) can use that to skip the SLRU lookup. In the degenerate (and probably common) case where a tuple is already all-visible at the time it's locked, we don't really need to record the original xmin at all; we can still do so if convenient, but we can set the xmin-all-visible hint right away, so nobody needs to probe the SLRU just to get xmin. In other words, we'd entirely avoid needing to make mxacts crash-safe, and we'd avoid most of the extra SLRU lookups that the current implementation requires; they'd only be needed when (and for as long as) the locked tuple was not yet all-visible. This also seems like it would make the anti-wraparound issues simpler to handle - once an mxid is old enough that any xmin it contains must be all-visible, we can simply overwrite the tuple's xmin with FrozenXID, which is pretty much what we're already doing anyway. It's already the c
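The read path Robert sketches above can be illustrated in miniature. This is a toy model with invented names and structures, not the actual proposal's code; FROZEN_XID stands in for PostgreSQL's FrozenTransactionId. The idea: probe the SLRU only until the stored xmin is known all-visible, then set a hint bit so every later reader skips the lookup.

```python
RECENT_GLOBAL_XMIN = 1000   # oldest xid any live snapshot could still care about
FROZEN_XID = 2              # stand-in for PostgreSQL's FrozenTransactionId

slru = {7: 950}             # mxid -> original xmin; losing this at crash is fine
lookups = 0                 # count the SLRU probes we perform

def get_xmin(tup):
    """Fetch a locked tuple's xmin, probing the SLRU only until all-visible."""
    global lookups
    if tup["xmin_all_visible_hint"]:
        return FROZEN_XID                     # exact value no longer matters
    lookups += 1
    xmin = slru[tup["mxid"]]                  # the SLRU probe we want to avoid
    if xmin < RECENT_GLOBAL_XMIN:
        tup["xmin_all_visible_hint"] = True   # future readers skip the SLRU
    return xmin

tup = {"mxid": 7, "xmin_all_visible_hint": False}
assert get_xmin(tup) == 950          # first read probes the SLRU, sets the hint
assert get_xmin(tup) == FROZEN_XID   # later reads never touch the SLRU again
assert lookups == 1
```

This is why crash-safety falls out: once the hint is settable, the SLRU entry's exact contents are disposable, so it never needs to reach disk.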
Re: [HACKERS] foreign key locks, 2nd attempt
On Wed, Mar 7, 2012 at 11:37 AM, Gokulakannan Somasundaram wrote: >> >> Please explain in detail your idea of how it will work. > So we will take some kind of lock, which will stop such a happening. ... > May be someone can come up with better ideas than this. With respect, I don't call this a detailed explanation of an idea. For consideration here, come up with a very detailed design of how your suggestion will work. Think about it carefully, spend hours and days thinking it through, and when you are personally sure it is better than what we have now, please raise it on list at an appropriate time. Bear in mind that most people throw away 90% of their ideas before even mentioning them here. I hope that helps you to contribute. At the moment we're trying to review patches for specific code to include or exclude, not discuss huge redesigns of internal mechanisms using broad-brush descriptions. It is possible you may find an improvement, and if you do, people will be interested, but that seems an unlikely thing to happen here and now. If you have specific comments or tests on this patch those are very welcome. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] foreign key locks, 2nd attempt
> > > Please explain in detail your idea of how it will work. > > OK, I will try to explain the abstract idea I have. a) Referential integrity gets violated when there are referencing key values not present in the referenced key values. We maintain the integrity by taking a SELECT FOR SHARE lock during the foreign key checks, so that the referred value is not updated/deleted during the operation. b) We can do the same in the reverse way. When there is an update/delete of the referred value, we don't want any new inserts of that value into the referring table, or any update that would set a value to the referred value being updated/deleted. So we would take some kind of lock which prevents that from happening. This could be achieved through i) the predicate locking infrastructure already present, or ii) a temporary B-Tree index (no WAL protection) that gets created only for updates of referred values and holds the values being updated/deleted (if we are scared of predicate locking). So whenever we do foreign key checks, we just need to make sure there is no such referential-integrity lock in our own PGPROC structure (if implemented with predicate locking), or check the temporary B-Tree index for any entry matching the one we are going to insert/update to (the empty tree can be tracked with a flag to optimize). Maybe someone can come up with better ideas than this. Gokul.
Re: [HACKERS] foreign key locks, 2nd attempt
On Wed, Mar 7, 2012 at 10:18 AM, Gokulakannan Somasundaram wrote: >> >> Insert, Update and Delete don't take locks they simply mark the tuples >> they change with an xid. Anybody else wanting to "wait on the lock" >> just waits on the xid. We do insert a lock row for each xid, but not >> one per row changed. > > I mean the foreign key checks here. They take a Select for Share Lock right. > That's what we are trying to optimize here. Or am i missing something? So by > following the suggested methodology, the foreign key checks won't take any > locks. Please explain in detail your idea of how it will work. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] foreign key locks, 2nd attempt
> > > Insert, Update and Delete don't take locks they simply mark the tuples > they change with an xid. Anybody else wanting to "wait on the lock" > just waits on the xid. We do insert a lock row for each xid, but not > one per row changed. > I mean the foreign key checks here. They take a Select for Share Lock, right? That's what we are trying to optimize here. Or am I missing something? So by following the suggested methodology, the foreign key checks won't take any locks. > It's worked that way for 5 years, so its too late to modify it now and > this patch won't change that. > > The way we do RI locking is designed to prevent holding that in memory > and then having the lock table overflow, which then either requires us > to revert to the current design or upgrade to table level locks to > save space in the lock table - which is a total disaster, if you've > ever worked with DB2. > > What you're suggesting is that we store the locks in memory only as a > way of avoiding updating the row. > > But that memory would be consumed only when someone updates the referenced column (which will usually be the primary key of the referenced table). Any normal database programmer knows that updating a primary key is not good for performance. So we go by the same logic. > No, updates of referenced columns are exactly the same as now when no > RI checks happening. > > If the update occurs when an RI check takes place there is more work > to do, but previously it would have just blocked and done nothing. So > that path is relatively heavyweight but much better than nothing. > > As I have already said, that path is definitely heavyweight (like how Robert has made the DDL path heavyweight). If we assume that DDLs are going to be a rare phenomenon, then we can also assume that updates of primary keys are a rare phenomenon in a normal database. > > The most useful way to help with this patch right now is to run > performance investigations and see if there are non-optimal cases. 
We > can then see how the patch handles those. Theory is good, but it needs > to drive experimentation, as I myself re-discover continually. > > I understand. I just wanted to know, whether the developer considered that line of thought. Thanks, Gokul.
Re: [HACKERS] foreign key locks, 2nd attempt
On Wed, Mar 7, 2012 at 9:24 AM, Gokulakannan Somasundaram wrote: > I feel sad, that i followed this topic very late. But i still want to put > forward my views. > Have we thought on the lines of how Robert has implemented relation level > locks. In short it should go like this > > a) The locks for enforcing Referential integrity should be taken only when > the rarest of the events( that would cause the integrity failure) occur. > That would be the update of the referenced column. Other cases of update, > delete and insert should not be required to take locks. In this way, we can > reduce a lot of lock traffic. Insert, Update and Delete don't take locks; they simply mark the tuples they change with an xid. Anybody else wanting to "wait on the lock" just waits on the xid. We do insert a lock row for each xid, but not one per row changed. > So if we have a table like employee( empid, empname, ... depid references > dept(deptid)) and table dept(depid depname). > > Currently we are taking shared locks on referenced rows in dept table, > whenever we are updating something in the employee table. This should not > happen. Instead any insert / update of referenced column / delete should > check for some lock in its PGPROC structure, which will only get created > when the depid gets updated / deleted( rare event ) It's worked that way for 5 years, so it's too late to modify it now and this patch won't change that. The way we do RI locking is designed to prevent holding that in memory and then having the lock table overflow, which then either requires us to revert to the current design or upgrade to table-level locks to save space in the lock table - which is a total disaster, if you've ever worked with DB2. What you're suggesting is that we store the locks in memory only as a way of avoiding updating the row. My understanding is we have two optimisation choices. A single set of xids can be used in many places, since the same set of transactions may do roughly the same thing. 
1. We could assign a new mxactid every time we touch a new row. That way there is no correspondence between sets of xids, and we may hold the same set many times. OTOH since each set is unique we can expand it easily and we don't need to touch each row once for each lock. That saves on row touches but it also greatly increases the mxactid creation rate, which causes cache scrolling. 2. We assign a new mxactid each time we create a new unique set of xids. We have a separate cache for local sets. This way reduces the mxactid creation rate but causes row updates each time we lock the row, which then needs WAL. (2) is how we currently handle the very difficult decision of how to optimise this for the general case. I'm not sure that is right in all cases, but it is at least scalable and it is the devil we know. > b) But the operation of update of the referenced column will be made more > costly. May be it can create something like a predicate lock(used for > enforcing serializable) and keep it in all the PG_PROC structures. No, updates of referenced columns are exactly the same as now when no RI checks are happening. If the update occurs when an RI check takes place there is more work to do, but previously it would have just blocked and done nothing. So that path is relatively heavyweight but much better than nothing. > I know this is a abstract idea, but just wanted to know, whether we have > thought on those lines. Thanks for your thoughts. The most useful way to help with this patch right now is to run performance investigations and see if there are non-optimal cases. We can then see how the patch handles those. Theory is good, but it needs to drive experimentation, as I myself re-discover continually. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] foreign key locks, 2nd attempt
I feel sad that I followed this topic so late, but I still want to put forward my views. Have we thought along the lines of how Robert has implemented relation-level locks? In short it should go like this: a) The locks for enforcing referential integrity should be taken only when the rarest of the events (those that would cause an integrity failure) occur. That would be the update of the referenced column. Other cases of update, delete and insert should not be required to take locks. In this way, we can reduce a lot of lock traffic. So if we have a table like employee( empid, empname, ... depid references dept(deptid)) and table dept(depid depname). Currently we are taking shared locks on referenced rows in the dept table whenever we are updating something in the employee table. This should not happen. Instead any insert / update of the referenced column / delete should check for some lock in its PGPROC structure, which will only get created when the depid gets updated/deleted (a rare event). b) But the operation of updating the referenced column will be made more costly. Maybe it can create something like a predicate lock (used for enforcing serializable) and keep it in all the PGPROC structures. I know this is an abstract idea, but I just wanted to know whether we have thought along those lines. Thanks, Gokul.
Re: [HACKERS] foreign key locks, 2nd attempt
On Tue, Mar 6, 2012 at 4:27 PM, Alvaro Herrera wrote: > Excerpts from Robert Haas's message of mar mar 06 18:10:16 -0300 2012: >> >> Preliminary comment: >> >> This README is very helpful. > > Thanks. I feel silly that I didn't write it earlier. > >> On Tue, Mar 6, 2012 at 2:39 PM, Alvaro Herrera >> wrote: >> > We provide four levels of tuple locking strength: SELECT FOR KEY UPDATE is >> > super-exclusive locking (used to delete tuples and more generally to update >> > tuples modifying the values of the columns that make up the key of the >> > tuple); >> > SELECT FOR UPDATE is a standards-compliant exclusive lock; SELECT FOR SHARE >> > implements shared locks; and finally SELECT FOR KEY SHARE is a super-weak >> > mode >> > that does not conflict with exclusive mode, but conflicts with SELECT FOR >> > KEY >> > UPDATE. This last mode implements a mode just strong enough to implement >> > RI >> > checks, i.e. it ensures that tuples do not go away from under a check, >> > without >> > blocking when some other transaction that want to update the tuple without >> > changing its key. >> >> I feel like there is a naming problem here. The semantics that have >> always been associated with SELECT FOR UPDATE are now attached to >> SELECT FOR KEY UPDATE; and SELECT FOR UPDATE itself has been weakened. >> I think users will be surprised to find that SELECT FOR UPDATE >> doesn't block all concurrent updates. > > I'm not sure why you say that. Certainly SELECT FOR UPDATE continues to > block all updates. It continues to block SELECT FOR SHARE as well. > The things that it doesn't block are the new SELECT FOR KEY SHARE locks; > since those didn't exist before, it doesn't seem correct to consider > that SELECT FOR UPDATE changed in any way. > > The main difference in the UPDATE behavior is that an UPDATE is regarded > as though it might acquire two different lock modes -- it either > acquires SELECT FOR KEY UPDATE if the key is modified, or SELECT FOR > UPDATE if not. 
Since SELECT FOR KEY UPDATE didn't exist before, we can > consider that previous to this patch, what UPDATE did was always acquire > a lock of strength SELECT FOR UPDATE. So UPDATE also hasn't been > weakened. Ah, I see. My mistake. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] foreign key locks, 2nd attempt
On Tue, Mar 6, 2012 at 9:10 PM, Robert Haas wrote: > Preliminary comment: > > This README is very helpful. > > On Tue, Mar 6, 2012 at 2:39 PM, Alvaro Herrera > wrote: >> We provide four levels of tuple locking strength: SELECT FOR KEY UPDATE is >> super-exclusive locking (used to delete tuples and more generally to update >> tuples modifying the values of the columns that make up the key of the >> tuple); >> SELECT FOR UPDATE is a standards-compliant exclusive lock; SELECT FOR SHARE >> implements shared locks; and finally SELECT FOR KEY SHARE is a super-weak >> mode >> that does not conflict with exclusive mode, but conflicts with SELECT FOR KEY >> UPDATE. This last mode implements a mode just strong enough to implement RI >> checks, i.e. it ensures that tuples do not go away from under a check, >> without >> blocking when some other transaction that want to update the tuple without >> changing its key. > > I feel like there is a naming problem here. The semantics that have > always been associated with SELECT FOR UPDATE are now attached to > SELECT FOR KEY UPDATE; and SELECT FOR UPDATE itself has been weakened. > I think users will be surprised to find that SELECT FOR UPDATE > doesn't block all concurrent updates. > > It seems to me that SELECT FOR KEY UPDATE should be called SELECT FOR > UPDATE, and what you're calling SELECT FOR UPDATE should be called > something else - essentially NONKEY UPDATE, though I don't much like > that name. No, because that would stop it from doing what it is designed to do. The lock modes are correct, appropriate and IMHO have meaningful names. No redesign required here. Not sure about the naming of some of the flag bits however. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Robert Haas's message of mar mar 06 18:10:16 -0300 2012: > > Preliminary comment: > > This README is very helpful. Thanks. I feel silly that I didn't write it earlier. > On Tue, Mar 6, 2012 at 2:39 PM, Alvaro Herrera > wrote: > > We provide four levels of tuple locking strength: SELECT FOR KEY UPDATE is > > super-exclusive locking (used to delete tuples and more generally to update > > tuples modifying the values of the columns that make up the key of the > > tuple); > > SELECT FOR UPDATE is a standards-compliant exclusive lock; SELECT FOR SHARE > > implements shared locks; and finally SELECT FOR KEY SHARE is a super-weak > > mode > > that does not conflict with exclusive mode, but conflicts with SELECT FOR > > KEY > > UPDATE. This last mode implements a mode just strong enough to implement RI > > checks, i.e. it ensures that tuples do not go away from under a check, > > without > > blocking when some other transaction that want to update the tuple without > > changing its key. > > I feel like there is a naming problem here. The semantics that have > always been associated with SELECT FOR UPDATE are now attached to > SELECT FOR KEY UPDATE; and SELECT FOR UPDATE itself has been weakened. > I think users will be surprised to find that SELECT FOR UPDATE > doesn't block all concurrent updates. I'm not sure why you say that. Certainly SELECT FOR UPDATE continues to block all updates. It continues to block SELECT FOR SHARE as well. The things that it doesn't block are the new SELECT FOR KEY SHARE locks; since those didn't exist before, it doesn't seem correct to consider that SELECT FOR UPDATE changed in any way. The main difference in the UPDATE behavior is that an UPDATE is regarded as though it might acquire two different lock modes -- it either acquires SELECT FOR KEY UPDATE if the key is modified, or SELECT FOR UPDATE if not. 
Since SELECT FOR KEY UPDATE didn't exist before, we can consider that previous to this patch, what UPDATE did was always acquire a lock of strength SELECT FOR UPDATE. So UPDATE also hasn't been weakened. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
Preliminary comment: This README is very helpful. On Tue, Mar 6, 2012 at 2:39 PM, Alvaro Herrera wrote: > We provide four levels of tuple locking strength: SELECT FOR KEY UPDATE is > super-exclusive locking (used to delete tuples and more generally to update > tuples modifying the values of the columns that make up the key of the tuple); > SELECT FOR UPDATE is a standards-compliant exclusive lock; SELECT FOR SHARE > implements shared locks; and finally SELECT FOR KEY SHARE is a super-weak mode > that does not conflict with exclusive mode, but conflicts with SELECT FOR KEY > UPDATE. This last mode implements a mode just strong enough to implement RI > checks, i.e. it ensures that tuples do not go away from under a check, without > blocking when some other transaction that want to update the tuple without > changing its key. I feel like there is a naming problem here. The semantics that have always been associated with SELECT FOR UPDATE are now attached to SELECT FOR KEY UPDATE; and SELECT FOR UPDATE itself has been weakened. I think users will be surprised to find that SELECT FOR UPDATE doesn't block all concurrent updates. It seems to me that SELECT FOR KEY UPDATE should be called SELECT FOR UPDATE, and what you're calling SELECT FOR UPDATE should be called something else - essentially NONKEY UPDATE, though I don't much like that name. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
On Tue, Mar 6, 2012 at 7:39 PM, Alvaro Herrera wrote: > We provide four levels of tuple locking strength: SELECT FOR KEY UPDATE is > super-exclusive locking (used to delete tuples and more generally to update > tuples modifying the values of the columns that make up the key of the tuple); > SELECT FOR UPDATE is a standards-compliant exclusive lock; SELECT FOR SHARE > implements shared locks; and finally SELECT FOR KEY SHARE is a super-weak mode > that does not conflict with exclusive mode, but conflicts with SELECT FOR KEY > UPDATE. This last mode implements a mode just strong enough to implement RI > checks, i.e. it ensures that tuples do not go away from under a check, without > blocking when some other transaction that want to update the tuple without > changing its key.

So there are 4 lock types, but we only have room for 3 on the tuple header, so we store the least common/deprecated of the 4 types as a MultiXactId. Some rewording would help there. Neat scheme!

My understanding is that all of these workloads will change:

* Explicit SHARE lockers will be slightly worse off in the case of the 1st locker, but after that they'll be the same as before.
* Updates against an RI-locked table will be dramatically faster because of reduced lock waits.

...and that these previous workloads are effectively unchanged:

* A stream of RI checks still causes mxacts.
* Multi-row deadlocks are still possible.
* Queues of writers still wait in the same way.
* Deletes don't cause mxacts unless by the same transaction.

> In earlier PostgreSQL releases, a MultiXact always meant that the tuple was > locked in shared mode by multiple transactions. This is no longer the case; a > MultiXact may contain an update or delete Xid. (Keep in mind that tuple locks > in a transaction do not conflict with other tuple locks in the same > transaction, so it's possible to have otherwise conflicting locks in a > MultiXact if they belong to the same transaction).

Somewhat confusing, but am getting there.
> Note that each lock is attributed to the subtransaction that acquires it. > This means that a subtransaction that aborts is seen as though it releases the > locks it acquired; concurrent transactions can then proceed without having to > wait for the main transaction to finish. It also means that a subtransaction > can upgrade to a stronger lock level than an earlier transaction had, and if > the subxact aborts, the earlier, weaker lock is kept. OK > The possibility of having an update within a MultiXact means that they must > persist across crashes and restarts: a future reader of the tuple needs to > figure out whether the update committed or aborted. So we have a requirement > that pg_multixact needs to retain pages of its data until we're certain that > the MultiXacts in them are no longer of interest. I think the "no longer of interest" aspect needs to be tracked more closely because it will necessarily lead to more I/O. If we store the LSN on each mxact page, as I think we need to, we can get rid of pages more quickly if we know they don't have an LSN set. So its possible we can optimise that more. > VACUUM is in charge of removing old MultiXacts at the time of tuple freezing. You mean mxact segments? Surely we set hint bits on tuples same as now? Hope so. > This works in the same way that pg_clog segments are removed: we have a > pg_class column that stores the earliest multixact that could possibly be > stored in the table; the minimum of all such values is stored in a pg_database > column. VACUUM computes the minimum across all pg_database values, and > removes pg_multixact segments older than the minimum. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
On Mon, Mar 5, 2012 at 8:35 PM, Simon Riggs wrote: >>> * Why do we need multixact to be persistent? Do we need every page of >>> multixact to be persistent, or just particular pages in certain >>> circumstances? >> >> Any page that contains at least one multi with an update as a member >> must persist. It's possible that some pages contain no update (and this >> is even likely in some workloads, if updates are rare), but I'm not sure >> it's worth complicating the code to cater for early removal of some >> pages. > > If the multixact contains an xid and that is being persisted then you > need to set an LSN to ensure that a page write causes an XLogFlush() > before the multixact write. And you need to set do_fsync, no? Or > explain why not in comments... > > I was really thinking we could skip the fsync of a page if we've not > persisted anything important on that page, since that was one of > Robert's performance points.

We need to increase these values to 32 as well:

#define NUM_MXACTOFFSET_BUFFERS		8
#define NUM_MXACTMEMBER_BUFFERS		16

using the same logic as for clog. We're using 25% more space and we already know clog benefits from increasing them, so there's little doubt we need it here also, since we are increasing the access rate and potentially the longevity. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Simon Riggs's message of lun mar 05 16:34:10 -0300 2012: > It does however, illustrate my next review comment which is that the > comments and README items are sorely lacking here. It's quite hard to > see how it works, let alone comment on major design decisions. It > would help myself and others immensely if we could improve that. Here's a first attempt at a README illustrating this. I intend this to be placed in src/backend/access/heap/README.tuplock; the first three paragraphs are stolen from the comment in heap_lock_tuple, so I'd remove those from there, directing people to this new file instead. Is there something that you think should be covered more extensively (or at all) here? -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Locking tuples
--------------

Because the shared-memory lock table is of finite size, but users could reasonably want to lock large numbers of tuples, we do not rely on the standard lock manager to store tuple-level locks over the long term. Instead, a tuple is marked as locked by setting the current transaction's XID as its XMAX, and setting additional infomask bits to distinguish this usage from the more normal case of having deleted the tuple. When multiple transactions concurrently lock a tuple, a MultiXact is used; see below.

When it is necessary to wait for a tuple-level lock to be released, the basic delay is provided by XactLockTableWait or MultiXactIdWait on the contents of the tuple's XMAX. However, that mechanism will release all waiters concurrently, so there would be a race condition as to which waiter gets the tuple, potentially leading to indefinite starvation of some waiters. The possibility of share-locking makes the problem much worse --- a steady stream of share-lockers can easily block an exclusive locker forever. To provide more reliable semantics about who gets a tuple-level lock first, we use the standard lock manager.
The protocol for waiting for a tuple-level lock is really:

	LockTuple()
	XactLockTableWait()
	mark tuple as locked by me
	UnlockTuple()

When there are multiple waiters, arbitration of who is to get the lock next is provided by LockTuple(). However, at most one tuple-level lock will be held or awaited per backend at any time, so we don't risk overflow of the lock table. Note that incoming share-lockers are required to do LockTuple as well, if there is any conflict, to ensure that they don't starve out waiting exclusive-lockers. However, if there is not any active conflict for a tuple, we don't incur any extra overhead.

We provide four levels of tuple locking strength: SELECT FOR KEY UPDATE is super-exclusive locking (used to delete tuples and more generally to update tuples modifying the values of the columns that make up the key of the tuple); SELECT FOR UPDATE is a standards-compliant exclusive lock; SELECT FOR SHARE implements shared locks; and finally SELECT FOR KEY SHARE is a super-weak mode that does not conflict with exclusive mode, but conflicts with SELECT FOR KEY UPDATE. This last mode implements a mode just strong enough for RI checks, i.e. it ensures that tuples do not go away from under a check, without blocking some other transaction that wants to update the tuple without changing its key.

The conflict table is:

	            KEY UPDATE   UPDATE     SHARE      KEY SHARE
	KEY UPDATE  conflict     conflict   conflict   conflict
	UPDATE      conflict     conflict   conflict
	SHARE       conflict     conflict
	KEY SHARE   conflict

When there is a single locker in a tuple, we can just store the locking info in the tuple itself. We do this by storing the locker's Xid in XMAX, and setting hint bits specifying the locking strength. There is one exception here: since hint bit space is limited, we do not provide a separate hint bit for SELECT FOR SHARE, so we have to use the extended info in a MultiXact in that case.
(The other cases, SELECT FOR UPDATE and SELECT FOR KEY SHARE, are presumably more commonly used due to being the standards-mandated locking mechanism, or heavily used by the RI code, so we want to provide fast paths for those.)

MultiXacts
----------

A tuple header provides very limited space for storing information about tuple locking and updates: there is room only for a single Xid and a small number of hint bits. Whenever we need to store more than one lock, we replace the first locker's Xid with a new MultiXactId. Each MultiXact provides extended locking data; it comprises an array of Xids plus some flag bits for each one. The flags are currently used to store the locking strength of each member transaction. (The flags also distinguish a pure locker from an actual updater.)

In earlier PostgreSQL releases, a MultiXact always meant that the tuple was locked in shared mode by multiple transactions. This is no longer the case; a MultiXact may contain an update or delete Xid.
Re: [HACKERS] foreign key locks, 2nd attempt
On Mon, Mar 5, 2012 at 7:53 PM, Alvaro Herrera wrote: >> My other comments so far are >> >> * some permutations commented out - no comments as to why >> Something of a fault with the isolation tester that it just shows >> output, there's no way to record expected output in the spec > > The reason they are commented out is that they are "invalid", that is, > it requires running a command on a session that's blocked in the > previous command. Obviously, that cannot happen in real life. > > isolationtester now has support for detecting such conditions; if the > spec specifies running a command in a locked session, the permutation is > killed with an error message "invalid permutation" and just continues > with the next permutation. It used to simply die, aborting the test. > Maybe we could just modify the specs so that all permutations are there > (this can be done by simply removing the permutation lines), and the > "invalid permutation" messages are part of the expected file. Would > that be better? It would be better to have an isolation tester mode that checks whether it was invalid and, if not, reports that. At the moment we can't say why you commented something out. There's no comment or explanation, and we need something, otherwise 3 years from now we'll be completely in the dark. >> Comments required for these points >> >> * Why do we need multixact to be persistent? Do we need every page of >> multixact to be persistent, or just particular pages in certain >> circumstances? > > Any page that contains at least one multi with an update as a member > must persist. It's possible that some pages contain no update (and this > is even likely in some workloads, if updates are rare), but I'm not sure > it's worth complicating the code to cater for early removal of some > pages. If the multixact contains an xid and that is being persisted then you need to set an LSN to ensure that a page write causes an XLogFlush() before the multixact write. 
And you need to set do_fsync, no? Or explain why not in comments... I was really thinking we could skip the fsync of a page if we've not persisted anything important on that page, since that was one of Robert's performance points. >> * Why do we need to expand multixact with flags? Can we avoid that in >> some cases? > > Did you read my blog post? > http://www.commandprompt.com/blogs/alvaro_herrera/2011/08/fixing_foreign_key_deadlocks_part_three/ > This explains the reason -- the point is that we need to distinguish the > lock strength acquired by each locker. Thanks, I will, but it all belongs in a README please. >> * Why do we need to store just single xids in multixact members? >> Didn't understand comments, no explanation > > This is just for SELECT FOR SHARE. We don't have a hint bit to indicate > "this tuple has a for-share lock", so we need to create a multi for it. > Since FOR SHARE is probably going to be very uncommon, this isn't likely > to be a problem. We're mainly catering for users of SELECT FOR SHARE so > that it continues to work, i.e. maintain backwards compatibility. Good, thanks. Are we actively recommending people use FOR KEY SHARE rather than FOR SHARE, in explicit use? > (Maybe I misunderstood your question -- what I think you're asking is, > "why are there some multixacts that have a single member?") > > I'll try to come up with a good place to add some paragraphs about all > this. Please let me know if answers here are unclear and/or you have > further questions. Thanks I think we need to define some test workloads to measure the performance impact of this patch. We need to be certain that it has a good impact in target cases, plus a known impact in other cases. 
Suggest:

* basic pgbench - no RI
* inserts into large table, RI checks to small table, no activity on small table
* large table parent, large table child: 20 child rows per parent, FK from child to parent; updates of multiple children at same time; low/medium/heavy locking
* large table parent, large table child: 20 child rows per parent, FK from child to parent; updates of parent and child at same time; low/medium/heavy locking

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
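The third and fourth workloads might be set up along these lines (a hypothetical schema; table and column names are invented, and the concurrent-session scripting is left to pgbench custom scripts):

```sql
-- Parent/child schema for the FK-contention workloads (names invented).
CREATE TABLE parent (
    id   int PRIMARY KEY,
    info text
);
CREATE TABLE child (
    id        int PRIMARY KEY,
    parent_id int NOT NULL REFERENCES parent (id),
    info      text
);

-- Populate with ~20 child rows per parent:
INSERT INTO parent SELECT g, 'p' FROM generate_series(1, 100000) g;
INSERT INTO child  SELECT g, (g % 100000) + 1, 'c'
  FROM generate_series(1, 2000000) g;

-- Workload 3: concurrent sessions update several children of one parent,
-- each implicitly taking FK locks on the parent row.
UPDATE child SET info = 'x' WHERE parent_id = 42;

-- Workload 4: simultaneously, another session makes a non-key update of
-- the parent row; before the patch this blocks behind the SHARE locks.
UPDATE parent SET info = 'y' WHERE id = 42;
```

Varying the number of concurrent sessions hitting the same parent would give the low/medium/heavy locking variants.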
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Simon Riggs's message of lun mar 05 16:34:10 -0300 2012: > On Mon, Mar 5, 2012 at 6:37 PM, Alvaro Herrera > wrote: > It does however, illustrate my next review comment which is that the > comments and README items are sorely lacking here. It's quite hard to > see how it works, let along comment on major design decisions. It > would help myself and others immensely if we could improve that. Hm. Okay. > Is there a working copy on a git repo? Easier than waiting for next > versions of a patch. No, I don't have an external mirror of my local repo. > My other comments so far are > > * some permutations commented out - no comments as to why > Something of a fault with the isolation tester that it just shows > output, there's no way to record expected output in the spec The reason they are commented out is that they are "invalid", that is, it requires running a command on a session that's blocked in the previous command. Obviously, that cannot happen in real life. isolationtester now has support for detecting such conditions; if the spec specifies running a command in a locked session, the permutation is killed with an error message "invalid permutation" and just continues with the next permutation. It used to simply die, aborting the test. Maybe we could just modify the specs so that all permutations are there (this can be done by simply removing the permutation lines), and the "invalid permutation" messages are part of the expected file. Would that be better? > Comments required for these points > > * Why do we need multixact to be persistent? Do we need every page of > multixact to be persistent, or just particular pages in certain > circumstances? Any page that contains at least one multi with an update as a member must persist. It's possible that some pages contain no update (and this is even likely in some workloads, if updates are rare), but I'm not sure it's worth complicating the code to cater for early removal of some pages. 
> * Why do we need to expand multixact with flags? Can we avoid that in > some cases? Did you read my blog post? http://www.commandprompt.com/blogs/alvaro_herrera/2011/08/fixing_foreign_key_deadlocks_part_three/ This explains the reason -- the point is that we need to distinguish the lock strength acquired by each locker. > * Why do we need to store just single xids in multixact members? > Didn't understand comments, no explanation This is just for SELECT FOR SHARE. We don't have a hint bit to indicate "this tuple has a for-share lock", so we need to create a multi for it. Since FOR SHARE is probably going to be very uncommon, this isn't likely to be a problem. We're mainly catering for users of SELECT FOR SHARE so that it continues to work, i.e. maintain backwards compatibility. (Maybe I misunderstood your question -- what I think you're asking is, "why are there some multixacts that have a single member?") I'll try to come up with a good place to add some paragraphs about all this. Please let me know if answers here are unclear and/or you have further questions. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
On Mon, Mar 5, 2012 at 6:37 PM, Alvaro Herrera wrote: > > Excerpts from Simon Riggs's message of lun mar 05 15:28:59 -0300 2012: >> >> On Mon, Feb 27, 2012 at 2:47 AM, Robert Haas wrote: > >> > From a performance standpoint, we really need to think not only about >> > the cases where the patch wins, but also, and maybe more importantly, >> > the cases where it loses. There are some cases where the current >> > mechanism, use SHARE locks for foreign keys, is adequate. In >> > particular, it's adequate whenever the parent table is not updated at >> > all, or only very lightly. I believe that those people will pay >> > somewhat more with this patch, and especially in any case where >> > backends end up waiting for fsyncs in order to create new mxids, but >> > also just because I think this patch will have the effect of >> > increasing the space consumed by each individual mxid, which imposes a >> > distributed cost of its own. >> >> That is a concern also. >> >> It's taken me a while reviewing the patch to realise that space usage >> is actually 4 times worse than before. > > Eh. You're probably misreading something. Previously each member of a > multixact used 4 bytes (the size of an Xid). With the current patch a > member uses 5 bytes (same plus a flags byte). An earlier version used > 4.25 bytes per multi, which I increased to leave space for future > expansion. > > So it's 1.25x worse, not 4x worse. Thanks for correcting me. That sounds better. It does, however, illustrate my next review comment, which is that the comments and README items are sorely lacking here. It's quite hard to see how it works, let alone comment on major design decisions. It would help myself and others immensely if we could improve that. Is there a working copy on a git repo? Easier than waiting for next versions of a patch. 
My other comments so far are * some permutations commented out - no comments as to why Something of a fault with the isolation tester that it just shows output, there's no way to record expected output in the spec Comments required for these points * Why do we need multixact to be persistent? Do we need every page of multixact to be persistent, or just particular pages in certain circumstances? * Why do we need to expand multixact with flags? Can we avoid that in some cases? * Why do we need to store just single xids in multixact members? Didn't understand comments, no explanation -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Simon Riggs's message of lun mar 05 15:28:59 -0300 2012: > > On Mon, Feb 27, 2012 at 2:47 AM, Robert Haas wrote: > > From a performance standpoint, we really need to think not only about > > the cases where the patch wins, but also, and maybe more importantly, > > the cases where it loses. There are some cases where the current > > mechanism, use SHARE locks for foreign keys, is adequate. In > > particular, it's adequate whenever the parent table is not updated at > > all, or only very lightly. I believe that those people will pay > > somewhat more with this patch, and especially in any case where > > backends end up waiting for fsyncs in order to create new mxids, but > > also just because I think this patch will have the effect of > > increasing the space consumed by each individual mxid, which imposes a > > distributed cost of its own. > > That is a concern also. > > It's taken me a while reviewing the patch to realise that space usage > is actually 4 times worse than before. Eh. You're probably misreading something. Previously each member of a multixact used 4 bytes (the size of an Xid). With the current patch a member uses 5 bytes (same plus a flags byte). An earlier version used 4.25 bytes per multi, which I increased to leave space for future expansion. So it's 1.25x worse, not 4x worse. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
On Mon, Feb 27, 2012 at 2:47 AM, Robert Haas wrote: > On Thu, Feb 23, 2012 at 11:01 AM, Alvaro Herrera > wrote: >>> This >>> seems like a horrid mess that's going to be unsustainable both from a >>> complexity and a performance standpoint. The only reason multixacts >>> were tolerable at all was that they had only one semantics. Changing >>> it so that maybe a multixact represents an actual updater and maybe >>> it doesn't is not sane. >> >> As far as complexity, yeah, it's a lot more complex now -- no question >> about that. >> >> Regarding performance, the good thing about this patch is that if you >> have an operation that used to block, it might now not block. So maybe >> multixact-related operation is a bit slower than before, but if it >> allows you to continue operating rather than sit waiting until some >> other transaction releases you, it's much better. > > That's probably true, although there is some deferred cost that is > hard to account for. You might not block immediately, but then later > somebody might block either because the mxact SLRU now needs fsyncs or > because they've got to decode an mxid long after the relevant segment > has been evicted from the SLRU buffers. In general, it's hard to > bound that latter cost, because you only avoid blocking once (when the > initial update happens) but you might pay the extra cost of decoding > the mxid as many times as the row is read, which could be arbitrarily > many. How much of a problem that is in practice, I'm not completely > sure, but it has worried me before and it still does. In the worst > case scenario, a handful of frequently-accessed rows with MXIDs all of > whose members are dead except for the UPDATE they contain could result > in continual SLRU cache-thrashing. Cases I regularly see involve wait times of many seconds. When this patch helps, it will help performance by algorithmic gains, so perhaps x10-100. That can and should be demonstrated though, I agree. 
> From a performance standpoint, we really need to think not only about > the cases where the patch wins, but also, and maybe more importantly, > the cases where it loses. There are some cases where the current > mechanism, use SHARE locks for foreign keys, is adequate. In > particular, it's adequate whenever the parent table is not updated at > all, or only very lightly. I believe that those people will pay > somewhat more with this patch, and especially in any case where > backends end up waiting for fsyncs in order to create new mxids, but > also just because I think this patch will have the effect of > increasing the space consumed by each individual mxid, which imposes a > distributed cost of its own. That is a concern also. It's taken me a while reviewing the patch to realise that space usage is actually 4 times worse than before. > I think we should avoid having a theoretical argument about how > serious these problems are; instead, you should try to construct > somewhat-realistic worst case scenarios and benchmark them. Tom's > complaint about code complexity is basically a question of opinion, so > I don't know how to evaluate that objectively, but performance is > something we can measure. We might still disagree on the > interpretation of the results, but I still think having some real > numbers to talk about based on carefully-thought-out test cases would > advance the debate. It's a shame that the isolation tester can't be used directly by pgbench - I think we need something similar for performance regression testing. So yes, performance testing is required. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
On Tue, Feb 28, 2012 at 12:28 AM, Noah Misch wrote: > On Mon, Feb 27, 2012 at 02:13:32PM +0200, Heikki Linnakangas wrote: >> On 23.02.2012 18:01, Alvaro Herrera wrote: >>> As far as complexity, yeah, it's a lot more complex now -- no question >>> about that. >> >> How about assigning a new, real, transaction id, to represent the group >> of transaction ids. The new transaction id would be treated as a >> subtransaction of the updater, and the xids of the lockers would be >> stored in the multixact-members slru. That way the multixact structures >> wouldn't need to survive a crash; you don't care about the shared >> lockers after a crash, and the xid of the updater would be safely stored >> as is in the xmax field. >> >> That way you wouldn't need to handle multixact wraparound, because we >> already handle xid wraparound, and you wouldn't need to make multixact >> slrus crash-safe. >> >> Not sure what the performance implications would be. You would use up >> xids more quickly, which would require more frequent anti-wraparound >> vacuuming. And if we just start using real xids as the key to >> multixact-offsets slru, we would need to extend that a lot more often. >> But I feel it would probably be acceptable. > > When a key locker arrives after the updater and creates this implicit > subtransaction of the updater, how might you arrange for the xid's clog status > to eventually get updated in accordance with the updater's outcome? Somewhat off-topic, but just seen another bad case of FK lock contention. Thanks for working on this everybody. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
On Mon, Feb 27, 2012 at 02:13:32PM +0200, Heikki Linnakangas wrote: > On 23.02.2012 18:01, Alvaro Herrera wrote: >> As far as complexity, yeah, it's a lot more complex now -- no question >> about that. > > How about assigning a new, real, transaction id, to represent the group > of transaction ids. The new transaction id would be treated as a > subtransaction of the updater, and the xids of the lockers would be > stored in the multixact-members slru. That way the multixact structures > wouldn't need to survive a crash; you don't care about the shared > lockers after a crash, and the xid of the updater would be safely stored > as is in the xmax field. > > That way you wouldn't need to handle multixact wraparound, because we > already handle xid wraparound, and you wouldn't need to make multixact > slrus crash-safe. > > Not sure what the performance implications would be. You would use up > xids more quickly, which would require more frequent anti-wraparound > vacuuming. And if we just start using real xids as the key to > multixact-offsets slru, we would need to extend that a lot more often. > But I feel it would probably be acceptable. When a key locker arrives after the updater and creates this implicit subtransaction of the updater, how might you arrange for the xid's clog status to eventually get updated in accordance with the updater's outcome? Thanks, nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
On 23.02.2012 18:01, Alvaro Herrera wrote:
> Excerpts from Tom Lane's message of jue feb 23 12:28:20 -0300 2012:
>> Alvaro Herrera writes:
>>> Sure. The problem is that we are allowing updated rows to be locked (and locked rows to be updated). This means that we need to store extended Xmax information in tuples that goes beyond mere locks, which is what we were doing previously -- they may now have locks and updates simultaneously.
>>>
>>> (In the previous code, a multixact never meant an update, it always signified only shared locks. After a crash, all backends that could have been holding locks must necessarily be gone, so the multixact info is not interesting and can be treated like the tuple is simply live.)
>>
>> Ugh. I had not been paying attention to what you were doing in this patch, and now that I read this I wish I had objected earlier.
>
> Uhm, yeah, a lot earlier -- I initially blogged about this in August last year:
> http://www.commandprompt.com/blogs/alvaro_herrera/2011/08/fixing_foreign_key_deadlocks_part_three/
> and in several posts in pgsql-hackers.
>
>> This seems like a horrid mess that's going to be unsustainable both from a complexity and a performance standpoint. The only reason multixacts were tolerable at all was that they had only one semantics. Changing it so that maybe a multixact represents an actual updater and maybe it doesn't is not sane.
>
> As far as complexity, yeah, it's a lot more complex now -- no question about that.

How about assigning a new, real, transaction id, to represent the group of transaction ids. The new transaction id would be treated as a subtransaction of the updater, and the xids of the lockers would be stored in the multixact-members slru. That way the multixact structures wouldn't need to survive a crash; you don't care about the shared lockers after a crash, and the xid of the updater would be safely stored as is in the xmax field.

That way you wouldn't need to handle multixact wraparound, because we already handle xid wraparound, and you wouldn't need to make multixact slrus crash-safe.

Not sure what the performance implications would be. You would use up xids more quickly, which would require more frequent anti-wraparound vacuuming. And if we just start using real xids as the key to multixact-offsets slru, we would need to extend that a lot more often. But I feel it would probably be acceptable.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
On Thu, Feb 23, 2012 at 11:01 AM, Alvaro Herrera wrote:
>> This seems like a horrid mess that's going to be unsustainable both from a complexity and a performance standpoint. The only reason multixacts were tolerable at all was that they had only one semantics. Changing it so that maybe a multixact represents an actual updater and maybe it doesn't is not sane.
>
> As far as complexity, yeah, it's a lot more complex now -- no question about that.
>
> Regarding performance, the good thing about this patch is that if you have an operation that used to block, it might now not block. So maybe multixact-related operation is a bit slower than before, but if it allows you to continue operating rather than sit waiting until some other transaction releases you, it's much better.

That's probably true, although there is some deferred cost that is hard to account for. You might not block immediately, but then later somebody might block either because the mxact SLRU now needs fsyncs or because they've got to decode an mxid long after the relevant segment has been evicted from the SLRU buffers. In general, it's hard to bound that latter cost, because you only avoid blocking once (when the initial update happens) but you might pay the extra cost of decoding the mxid as many times as the row is read, which could be arbitrarily many. How much of a problem that is in practice, I'm not completely sure, but it has worried me before and it still does. In the worst case scenario, a handful of frequently-accessed rows with MXIDs all of whose members are dead except for the UPDATE they contain could result in continual SLRU cache-thrashing.

From a performance standpoint, we really need to think not only about the cases where the patch wins, but also, and maybe more importantly, the cases where it loses. There are some cases where the current mechanism (use SHARE locks for foreign keys) is adequate. In particular, it's adequate whenever the parent table is not updated at all, or only very lightly. I believe that those people will pay somewhat more with this patch, and especially in any case where backends end up waiting for fsyncs in order to create new mxids, but also just because I think this patch will have the effect of increasing the space consumed by each individual mxid, which imposes a distributed cost of its own.

I think we should avoid having a theoretical argument about how serious these problems are; instead, you should try to construct somewhat-realistic worst case scenarios and benchmark them. Tom's complaint about code complexity is basically a question of opinion, so I don't know how to evaluate that objectively, but performance is something we can measure. We might still disagree on the interpretation of the results, but I still think having some real numbers to talk about based on carefully-thought-out test cases would advance the debate.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
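[Editorial note: Robert's suggestion to benchmark constructed worst cases can be sketched as a custom pgbench workload. The table and file names below are illustrative, not from the thread; the idea is a small set of hot parent rows whose xmax would become a multixact under the patch, read repeatedly by FK-checking inserts.]

```sql
-- Setup (run once): one hot parent row, many FK-checking writers.
CREATE TABLE parent (id int PRIMARY KEY, padding text);
CREATE TABLE child  (id serial PRIMARY KEY,
                     parent_id int NOT NULL REFERENCES parent (id));
INSERT INTO parent VALUES (1, 'x');

-- insert_child.sql: each client's FK check locks the hot parent row.
INSERT INTO child (parent_id) VALUES (1);

-- update_parent.sql: concurrent non-key updates of that same row,
-- which is what produces the mixed lock-plus-update xmax case.
UPDATE parent SET padding = md5(random()::text) WHERE id = 1;
```

Running something like `pgbench -n -c 32 -T 300 -f insert_child.sql -f update_parent.sql` against both branches would measure the read-side mxid-decoding cost Robert describes; since the mxid stays in the tuple until vacuum removes it, a long run should also exercise the SLRU eviction path.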
Re: [HACKERS] foreign key locks, 2nd attempt
Vik Reykja wrote: > Kevin Grittner wrote: > >> One of the problems that Florian was trying to address is that >> people often have a need to enforce something with a lot of >> similarity to a foreign key, but with more subtle logic than >> declarative foreign keys support. One example would be the case >> Robert has used in some presentations, where the manager column >> in each row in a project table must contain the id of a row in a >> person table *which has the project_manager boolean column set to >> TRUE*. Short of using the new serializable transaction isolation >> level in all related transactions, hand-coding enforcement of >> this useful invariant through trigger code (or application code >> enforced through some framework) is very tricky. The change to >> SELECT FOR UPDATE that Florian was working on would make it >> pretty straightforward. > > I'm not sure what Florian's patch does, but I've been trying to > advocate syntax like the following for this exact scenario: > > foreign key (manager_id, true) references person (id, is_manager) > > Basically, allow us to use constants instead of field names as > part of foreign keys. Interesting. IMV, a declarative approach like that is almost always better than the alternatives, so something like this (possibly with different syntax) would be another step in the right direction. I suspect that there will always be a few corner cases where the business logic required is too esoteric to be handled by a generalized declarative construct, so I think Florian's idea still has merit -- especially if we want to ease the transition to PostgreSQL for large shops using other products. > I have no idea what the implementation aspect of this is, > but I need the user aspect of it and don't know the best way to > get it. There are those in the community who make their livings by helping people get the features they want. 
If you have some money to fund development, I would bet you could get this addressed -- it sure sounds reasonable to me. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
On Thu, Feb 23, 2012 at 19:44, Kevin Grittner wrote:
> One of the problems that Florian was trying to address is that people often have a need to enforce something with a lot of similarity to a foreign key, but with more subtle logic than declarative foreign keys support. One example would be the case Robert has used in some presentations, where the manager column in each row in a project table must contain the id of a row in a person table *which has the project_manager boolean column set to TRUE*. Short of using the new serializable transaction isolation level in all related transactions, hand-coding enforcement of this useful invariant through trigger code (or application code enforced through some framework) is very tricky. The change to SELECT FOR UPDATE that Florian was working on would make it pretty straightforward.

I'm not sure what Florian's patch does, but I've been trying to advocate syntax like the following for this exact scenario:

foreign key (manager_id, true) references person (id, is_manager)

Basically, allow us to use constants instead of field names as part of foreign keys. I have no idea what the implementation aspect of this is, but I need the user aspect of it and don't know the best way to get it.
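[Editorial note: the constant-in-foreign-key syntax proposed above can already be approximated today by materializing the constant as a real column, at the cost of some schema clutter. A sketch, using the example's table names; the `manager_check` column is invented here for illustration.]

```sql
-- Make the (id, is_manager) pair referenceable by a foreign key.
-- (Redundant with the PK on id alone, but required as an FK target.)
ALTER TABLE person ADD UNIQUE (id, is_manager);

-- Pin the second FK column to TRUE on the referencing side.
ALTER TABLE project
    ADD COLUMN manager_check boolean NOT NULL DEFAULT true
        CHECK (manager_check),
    ADD FOREIGN KEY (manager_id, manager_check)
        REFERENCES person (id, is_manager);
```

Since `manager_check` can only ever be true, the constraint can only match person rows where `is_manager` is true, and flipping a referenced manager's flag to false is rejected like any other referenced-key change. The proposed syntax would express the same thing without the dummy column.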
Re: [HACKERS] foreign key locks, 2nd attempt
On 2012-02-23 22:12, Noah Misch wrote:
> That alone would not simplify the patch much. INSERT/UPDATE/DELETE on the foreign side would still need to take some kind of tuple lock on the primary side to prevent primary-side DELETE. You then still face the complicated case of a tuple that's both locked and updated (non-key/immutable columns only). Updates that change keys are relatively straightforward, following what we already do today. It's the non-key updates that complicate things.

Ah, so there's the technical hitch. From previous discussion I was under the impression that:

1. Foreign-key updates only inherently conflict with _key_ updates on the foreign side, and that non-key updates on the foreign side were just caught in the locking cross-fire, so to speak. And

2. The DELETE case was somehow trivially accounted for.

But, for instance, there does not seem to be a lighter lock type that DELETE conflicts with but UPDATE does not. Bummer.

> By then, though, that change would share little or no code with the current patch. It may have its own value, but it's not a means for carving a subset from the current patch.

No, to be clear, it was never meant to be. Only a possible way to give users a way out of foreign-key locks more quickly. Not a way to get some of the branch out to users more quickly. At any rate, that seems to be moot then. And to be honest, mechanisms designed for more than one purpose rarely pan out.

Thanks for explaining!

Jeroen

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
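[Editorial note: the "no lighter lock type" point is exactly what the patch's two new row lock strengths address. A two-session sketch of the intended behavior, per the patch's design (syntax from the patch, not released PostgreSQL at the time; table and column names are illustrative):]

```sql
-- Session 1: what an FK check would take under the patch.
BEGIN;
SELECT id FROM parent WHERE id = 1 FOR KEY SHARE;

-- Session 2: a non-key update takes the new FOR NO KEY UPDATE lock,
-- which does not conflict with FOR KEY SHARE, so it proceeds:
UPDATE parent SET padding = 'y' WHERE id = 1;

-- ...whereas DELETE (and key-changing updates) still take FOR UPDATE,
-- which does conflict, so this blocks until session 1 commits:
DELETE FROM parent WHERE id = 1;
```

Without the patch, the FK triggers' SELECT FOR SHARE conflicts with every UPDATE of the parent row, key or not, which is the "locking cross-fire" described above.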
Re: [HACKERS] foreign key locks, 2nd attempt
On Thu, Feb 23, 2012 at 12:43:11PM -0300, Alvaro Herrera wrote:
> Excerpts from Simon Riggs's message of jue feb 23 06:18:57 -0300 2012:
>> On Wed, Feb 22, 2012 at 5:00 PM, Noah Misch wrote:
>>
>>>> All in all, I think this is in pretty much final shape. Only pg_upgrade bits are still missing. If sharp eyes could give this a critical look and knuckle-cracking testers could give it a spin, that would be helpful.
>>>
>>> Lack of pg_upgrade support leaves this version incomplete, because that omission would constitute a blocker for beta 2. This version changes as much code compared to the version I reviewed at the beginning of the CommitFest as that version changed overall. In that light, it's time to close the books on this patch for the purpose of this CommitFest; I'm marking it Returned with Feedback. Thanks for your efforts thus far.
>
> Now this is an interesting turn of events. I must thank you for your extensive review effort in the current version of the patch, and also thank you and credit you for the idea that initially kicked this patch from the older, smaller, simpler version I wrote during the 9.1 timeline (which you also reviewed exhaustively). Without your and Simon's brilliant ideas, this patch wouldn't exist at all.
>
> I completely understand that you don't want to review this latest version of the patch; it's a lot of effort and I wouldn't inflict it on anybody who hasn't volunteered. However, it doesn't seem to me that this is reason to boot the patch from the commitfest. I think the thing to do would be to remove yourself from the reviewers column and set it back to "needs review", so that other reviewers can pick it up.

It would indeed be wrong to change any patch from Needs Review to Returned with Feedback on account of a personal distaste for reviewing the patch. I hope I did not harbor such a motive here. Rather, this CommitFest has given your patch its fair shake, and I and other reviewers would better serve the CF's needs by reviewing, say, "ECPG FETCH readahead" instead of your latest submission. Likewise, you would better serve the CF by evaluating one of the four non-committer patches that have been Ready for Committer since January. That's not to imply that the goals of the CF align with my goals, your goals, or broader PGDG goals. The patch status on commitfest.postgresql.org does exist solely for the advancement of the CF, and I have set it accordingly.

> As for the late code churn, it mostly happened as a result of your own feedback; I would have left most of it in the original state, but as I went ahead it seemed much better to refactor things. This is mostly in heapam.c. As for multixact.c, it also had a lot of churn, but that was mostly to restore it to the state it has in the master branch, dropping much of the code I had written to handle multixact truncation. The new code there and in the vacuum code path (relminmxid and so on) is a lot smaller than that other code was, and it's closely based on relfrozenxid which is a known piece of technology.

I appreciate that.

>> However, review of such a large patch should not be simply pass or fail. We should be looking back at the original problem and ask ourselves whether some subset of the patch could solve a useful subset of the problem. For me, that seems quite likely and this is very definitely an important patch.

Incidentally, I find it harmful to think of "Returned with Feedback" as "fail". For large patches, it's healthier to think of a CF as a bimonthly project status meeting with stakeholders. When the project is done, wonderful! When there's work left, that's no great surprise.

>> Even if we can't solve some part of the problem we can at least commit some useful parts of infrastructure to allow later work to happen more smoothly and quickly.
>>
>> So please let's not focus on the 100%, lets focus on 80/20.
>
> Well, we have the patch I originally posted in the 9.1 timeframe. That's a lot smaller and simpler. However, that solves only part of the blocking problem, and in particular it doesn't fix the initial deadlock reports from Joel Jacobson at Glue Finance (now renamed Trustly, in case you wonder about his change of email address) that started this effort in the first place. I don't think we can cut down to that and still satisfy the users that requested this; and Glue was just the first one, because after I started blogging about this, some more people started asking for it.
>
> I don't think there's any useful middlepoint between that one and the current one, but maybe I'm wrong.

Nothing additional comes to my mind, either. This patch is monolithic.

Thanks, nm

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref
Re: [HACKERS] foreign key locks, 2nd attempt
On Thu, Feb 23, 2012 at 06:36:42PM -0300, Alvaro Herrera wrote: > > Excerpts from Noah Misch's message of mié feb 22 14:00:07 -0300 2012: > > > > On Mon, Feb 13, 2012 at 07:16:58PM -0300, Alvaro Herrera wrote: > > > On Mon, Jan 30, 2012 at 06:48:47PM -0500, Noah Misch wrote: > > > On Tue, Jan 24, 2012 at 03:47:16PM -0300, Alvaro Herrera wrote: > > > > * Columns that are part of the key > > > > Noah thinks the set of columns should only consider those actually > > > > referenced > > > > by keys, not those that *could* be referenced. > > > > > > Well, do you disagree? To me it's low-hanging fruit, because it isolates > > > the > > > UPDATE-time overhead of this patch to FK-referenced tables rather than all > > > tables having a PK or PK-like index (often just "all tables"). > > > > You have not answered my question above. > > Sorry. The reason I didn't research this is that at the very start of > the discussion it was said that having heapam.c figure out whether > columns are being used as FK destinations or not would be more of a > modularity violation than "indexed columns" already are for HOT support > (this was a contentious issue for HOT, so I don't take it lightly. I > don't think I need any more reasons for Tom to object to this patch, or > more bulk into it. Both are already serious issues.) That's fair. > In any case, with the way we've defined FOR KEY SHARE locks (in the docs > it's explicitly said that the set of columns considered could vary in > the future), it's a relatively easy patch to add on top of what I've > submitted. Just as the ALTER TABLE bits to add columns to the set of > columns considered, it could be left for a second pass on the issue. Agreed. Let's have that debate another day, as a follow-on patch. Thanks for shedding this light. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Noah Misch's message of mié feb 22 14:00:07 -0300 2012: > > On Mon, Feb 13, 2012 at 07:16:58PM -0300, Alvaro Herrera wrote: > On Mon, Jan 30, 2012 at 06:48:47PM -0500, Noah Misch wrote: > > On Tue, Jan 24, 2012 at 03:47:16PM -0300, Alvaro Herrera wrote: > > > * Columns that are part of the key > > > Noah thinks the set of columns should only consider those actually > > > referenced > > > by keys, not those that *could* be referenced. > > > > Well, do you disagree? To me it's low-hanging fruit, because it isolates > > the > > UPDATE-time overhead of this patch to FK-referenced tables rather than all > > tables having a PK or PK-like index (often just "all tables"). > > You have not answered my question above. Sorry. The reason I didn't research this is that at the very start of the discussion it was said that having heapam.c figure out whether columns are being used as FK destinations or not would be more of a modularity violation than "indexed columns" already are for HOT support (this was a contentious issue for HOT, so I don't take it lightly. I don't think I need any more reasons for Tom to object to this patch, or more bulk into it. Both are already serious issues.) In any case, with the way we've defined FOR KEY SHARE locks (in the docs it's explicitly said that the set of columns considered could vary in the future), it's a relatively easy patch to add on top of what I've submitted. Just as the ALTER TABLE bits to add columns to the set of columns considered, it could be left for a second pass on the issue. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
On Thu, Feb 23, 2012 at 02:08:28PM +0100, Jeroen Vermeulen wrote: > On 2012-02-23 10:18, Simon Riggs wrote: > >> However, review of such a large patch should not be simply pass or >> fail. We should be looking back at the original problem and ask >> ourselves whether some subset of the patch could solve a useful subset >> of the problem. For me, that seems quite likely and this is very >> definitely an important patch. >> >> Even if we can't solve some part of the problem we can at least commit >> some useful parts of infrastructure to allow later work to happen more >> smoothly and quickly. >> >> So please let's not focus on the 100%, lets focus on 80/20. > > The suggested immutable-column constraint was meant as a potential > "80/20 workaround." Definitely not a full solution, helpful to some, > probably easier to do. I don't know if an immutable key would actually > be enough to elide foreign-key locks though. That alone would not simplify the patch much. INSERT/UPDATE/DELETE on the foreign side would still need to take some kind of tuple lock on the primary side to prevent primary-side DELETE. You then still face the complicated case of a tuple that's both locked and updated (non-key/immutable columns only). Updates that change keys are relatively straightforward, following what we already do today. It's the non-key updates that complicate things. If you had both an immutable column constraint and a never-deleted table constraint, that combination would be sufficient to simplify the picture. (Directly or indirectly, it would not actually be a never-deleted constraint so much as a "you must take AccessExclusiveLock to DELETE" constraint.) Foreign-side DML would then take an AccessShareLock on the parent table with no tuple lock at all. By then, though, that change would share little or no code with the current patch. It may have its own value, but it's not a means for carving a subset from the current patch. 
Thanks, nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
Alvaro Herrera wrote: > Excerpts from Kevin Grittner's message: >> Since the limitation on what can be stored in xmax was the killer >> for Florian's attempt to support SELECT FOR UPDATE in a form >> which was arguably more useful (and certainly more convenient for >> those converting from other database products), I'm wondering >> whether anyone has determined whether this new scheme would allow >> Florian's work to be successfully completed. The issues seem >> very similar. If this approach also provides a basis for the >> other work, I think it helps bolster the argument that this is a >> good design; if not, I think it suggests that maybe it should be >> made more general or extensible in some way. Once this has to be >> supported by pg_upgrade it will be harder to change the format, >> if that is needed for some other feature. > > I have no idea what improvements Florian was seeking, but > multixacts now have plenty of bit flag space to indicate whatever > we want for each member transaction, so most likely the answer is > yes. However we need to make clear that a single SELECT FOR > UPDATE in a tuple does not currently use a multixact; if we wish > to always store flags then we are forced to use one which incurs a > performance hit. Well, his effort really started to go into a tailspin on the related issues here: http://archives.postgresql.org/pgsql-hackers/2010-12/msg01743.php ... with a summary of the problem and possible directions for a solution here: http://archives.postgresql.org/pgsql-hackers/2010-12/msg01833.php One of the problems that Florian was trying to address is that people often have a need to enforce something with a lot of similarity to a foreign key, but with more subtle logic than declarative foreign keys support. 
One example would be the case Robert has used in some presentations, where the manager column in each row in a project table must contain the id of a row in a person table *which has the project_manager boolean column set to TRUE*. Short of using the new serializable transaction isolation level in all related transactions, hand-coding enforcement of this useful invariant through trigger code (or application code enforced through some framework) is very tricky. The change to SELECT FOR UPDATE that Florian was working on would make it pretty straightforward. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
On Thu, Feb 23, 2012 at 4:01 PM, Alvaro Herrera wrote: > As far as complexity, yeah, it's a lot more complex now -- no question > about that. As far as complexity goes, would it be easier if we treated the UPDATE of a primary key column as a DELETE plus an INSERT? There's not really a logical reason why updating a primary key has meaning, so allowing an ExecPlanQual to follow the chain across primary key values doesn't seem valid to me. That would make all primary keys IMMUTABLE to updates. No primary key, no problem. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
On 02/23/2012 01:04 PM, Alvaro Herrera wrote: "manual vacuum is teh sux0r" I think you've just named my next conference talk submission. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Greg Smith's message of jue feb 23 14:48:13 -0300 2012: > On 02/23/2012 10:43 AM, Alvaro Herrera wrote: > > I completely understand that you don't want to review this latest > > version of the patch; it's a lot of effort and I wouldn't inflict it on > > anybody who hasn't volunteered. However, it doesn't seem to me that > > this is reason to boot the patch from the commitfest. I think the thing > > to do would be to remove yourself from the reviewers column and set it > > back to "needs review", so that other reviewers can pick it up. > > This feature made Robert's list of serious CF concerns, too, and the > idea that majorly revised patches might be punted isn't a new one. Well, this patch (or rather, a previous incarnation of it) got punted from 9.1's fourth commitfest; I intended to have the new version in 9.2's first CF, but business reasons (which I will not discuss in public) forced me otherwise. So here we are again -- as I said to Tom, I don't intend to let go of this one easily, though of course I will concede to whatever the community decides. > Noah > is certainly justified in saying you're off his community support list, > after all the review work he's been doing for this CF. Yeah, I can't blame him. I've been trying to focus most of my review availability on his own patches precisely due to that, but it's very clear to me that his effort is larger than mine. > We here think it would be a shame for all of these other performance > bits to be sorted but still have this one loose though, if it's possible > to keep going on it. It's well known as something on Simon's peeve list > for some time now. I was just reading someone else ranting about how > this foreign key locking issue proves Postgres isn't "enterprise scale" > yesterday, it was part of an article proving why DB2 is worth paying for > I think. This change crosses over into the advocacy area due to that, > albeit only for the people who have been burned by this already. 
Yeah, Simon's been on this particular issue for quite some time -- which is probably why the initial idea that kickstarted this patch was his. Personally I've been in the "not enterprise strength" camp for a long time, mostly unintentionally; you can see that by tracing how my major patches close holes in that kind of area ("cluster loses indexes", "we don't have subtransactions", "foreign key concurrency sucks" (--> SELECT FOR SHARE), "manual vacuum is teh sux0r", and now this one about FKs again). -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
On 02/23/2012 10:43 AM, Alvaro Herrera wrote:
> I completely understand that you don't want to review this latest version of the patch; it's a lot of effort and I wouldn't inflict it on anybody who hasn't volunteered. However, it doesn't seem to me that this is reason to boot the patch from the commitfest. I think the thing to do would be to remove yourself from the reviewers column and set it back to "needs review", so that other reviewers can pick it up.

This feature made Robert's list of serious CF concerns, too, and the idea that majorly revised patches might be punted isn't a new one. Noah is certainly justified in saying you're off his community support list, after all the review work he's been doing for this CF. We here think it would be a shame for all of these other performance bits to be sorted but still have this one loose though, if it's possible to keep going on it. It's well known as something on Simon's peeve list for some time now. I was just reading someone else ranting about how this foreign key locking issue proves Postgres isn't "enterprise scale" yesterday, it was part of an article proving why DB2 is worth paying for I think. This change crosses over into the advocacy area due to that, albeit only for the people who have been burned by this already.

If the main problem is pg_upgrade complexity, eventually progress on that front needs to be made. I'm surprised the project has survived this long without needing anything beyond catalog conversion for in-place upgrade. That luck won't hold forever.

-- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Kevin Grittner's message of jue feb 23 13:31:36 -0300 2012: > > Alvaro Herrera wrote: > > > As for sanity -- I regard multixacts as a way to store extended > > Xmax information. The original idea was obviously much more > > limited in that the extended info was just locking info. We've > > extended it but I don't think it's such a stretch. > > Since the limitation on what can be stored in xmax was the killer > for Florian's attempt to support SELECT FOR UPDATE in a form which > was arguably more useful (and certainly more convenient for those > converting from other database products), I'm wondering whether > anyone has determined whether this new scheme would allow Florian's > work to be successfully completed. The issues seem very similar. > If this approach also provides a basis for the other work, I think > it helps bolster the argument that this is a good design; if not, I > think it suggests that maybe it should be made more general or > extensible in some way. Once this has to be supported by pg_upgrade > it will be harder to change the format, if that is needed for some > other feature. I have no idea what improvements Florian was seeking, but multixacts now have plenty of bit flag space to indicate whatever we want for each member transaction, so most likely the answer is yes. However we need to make clear that a single SELECT FOR UPDATE in a tuple does not currently use a multixact; if we wish to always store flags then we are forced to use one which incurs a performance hit. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
Alvaro Herrera wrote: > As for sanity -- I regard multixacts as a way to store extended > Xmax information. The original idea was obviously much more > limited in that the extended info was just locking info. We've > extended it but I don't think it's such a stretch. Since the limitation on what can be stored in xmax was the killer for Florian's attempt to support SELECT FOR UPDATE in a form which was arguably more useful (and certainly more convenient for those converting from other database products), I'm wondering whether anyone has determined whether this new scheme would allow Florian's work to be successfully completed. The issues seem very similar. If this approach also provides a basis for the other work, I think it helps bolster the argument that this is a good design; if not, I think it suggests that maybe it should be made more general or extensible in some way. Once this has to be supported by pg_upgrade it will be harder to change the format, if that is needed for some other feature. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Tom Lane's message of jue feb 23 12:28:20 -0300 2012: > > Alvaro Herrera writes: > > Sure. The problem is that we are allowing updated rows to be locked (and > > locked rows to be updated). This means that we need to store extended > > Xmax information in tuples that goes beyond mere locks, which is what we > > were doing previously -- they may now have locks and updates simultaneously. > > > (In the previous code, a multixact never meant an update, it always > > signified only shared locks. After a crash, all backends that could > > have been holding locks must necessarily be gone, so the multixact info > > is not interesting and can be treated like the tuple is simply live.) > > Ugh. I had not been paying attention to what you were doing in this > patch, and now that I read this I wish I had objected earlier. Uhm, yeah, a lot earlier -- I initially blogged about this in August last year: http://www.commandprompt.com/blogs/alvaro_herrera/2011/08/fixing_foreign_key_deadlocks_part_three/ and in several posts in pgsql-hackers. > This > seems like a horrid mess that's going to be unsustainable both from a > complexity and a performance standpoint. The only reason multixacts > were tolerable at all was that they had only one semantics. Changing > it so that maybe a multixact represents an actual updater and maybe > it doesn't is not sane. As far as complexity, yeah, it's a lot more complex now -- no question about that. Regarding performance, the good thing about this patch is that if you have an operation that used to block, it might now not block. So maybe multixact-related operation is a bit slower than before, but if it allows you to continue operating rather than sit waiting until some other transaction releases you, it's much better. As for sanity -- I regard multixacts as a way to store extended Xmax information. The original idea was obviously much more limited in that the extended info was just locking info. 
We've extended it, but I don't think it's such a stretch. I have been posting about (most? all of?) the ideas that I've been following to make this work at all, so that people had plenty of chances to disagree with them -- and Noah and others did disagree with many of them, so I changed the patch accordingly. I'm not closed to further rework, but I'm not going to abandon the idea too lightly, either. I'm sure there are bugs too, but hopefully they will prove as shallow as the interested reviewer eyeballs are many. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Simon Riggs's message of jue feb 23 12:12:13 -0300 2012: > On Thu, Feb 23, 2012 at 3:04 PM, Alvaro Herrera > wrote: > > Sure. The problem is that we are allowing updated rows to be locked (and > > locked rows to be updated). This means that we need to store extended > > Xmax information in tuples that goes beyond mere locks, which is what we > > were doing previously -- they may now have locks and updates simultaneously. > OK, thanks. > > So why do we need pg_upgrade support? Two reasons. One is that in upgrades from a version that contains this patch to another version that also contains this patch (i.e. future upgrades), we need to copy the multixact files from the old cluster to the new. The other is that in upgrades from a version that doesn't contain this patch to a version that does, we need to set the multixact limit values so that values that were used in the old cluster are returned as empty values (keeping the old semantics); otherwise they would cause errors trying to read the member Xids from disk. > If pg_multixact is not persistent now, surely there is no requirement > to have pg_upgrade do any form of upgrade? The only time we'll need to > do this is from 9.2 to 9.3, which can of course occur any time in next > year. That doesn't sound like a reason to block a patch now, because > of something that will be needed a year from now. I think there's a policy that we must allow upgrades from one beta to the next, which is why Noah says this is a blocker starting from beta2. The pg_upgrade code for this is rather simple however. There's no rocket science there. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
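The second pg_upgrade duty Alvaro describes, setting the multixact limits so that pre-upgrade values read back as empty, can be sketched like this (illustrative Python only; the function names are made up and XID wraparound is ignored for clarity):

```python
def limits_after_upgrade(next_mxid_in_old_cluster):
    """Hypothetical sketch of pg_upgrade's job when the old cluster
    predates this patch: record the old cluster's next multixact value
    as the new cluster's oldest valid one."""
    return {"oldest_multi": next_mxid_in_old_cluster}

def multi_has_members(mxid, limits):
    # Values below the limit predate the upgrade: report no members
    # (preserving the old semantics) instead of erroring while trying
    # to read member XIDs from files that were never copied.
    return mxid >= limits["oldest_multi"]
```

Any multixact stamped before the upgrade is then treated exactly like the old code treated it after a crash: as if it had no interesting contents.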
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Simon Riggs's message of jue feb 23 06:18:57 -0300 2012: > > On Wed, Feb 22, 2012 at 5:00 PM, Noah Misch wrote: > > >> All in all, I think this is in pretty much final shape. Only pg_upgrade >> bits are still missing. If sharp eyes could give this a critical look >> and knuckle-cracking testers could give it a spin, that would be >> helpful. > > > > Lack of pg_upgrade support leaves this version incomplete, because that > > omission would constitute a blocker for beta 2. This version changes as much > > code compared to the version I reviewed at the beginning of the CommitFest as > > that version changed overall. In that light, it's time to close the books on > > this patch for the purpose of this CommitFest; I'm marking it Returned with > > Feedback. Thanks for your efforts thus far. Now this is an interesting turn of events. I must thank you for your extensive review effort on the current version of the patch, and also thank and credit you for the idea that initially kicked this patch off from the older, smaller, simpler version I wrote during the 9.1 timeline (which you also reviewed exhaustively). Without your and Simon's brilliant ideas, this patch wouldn't exist at all. I completely understand that you don't want to review this latest version of the patch; it's a lot of effort and I wouldn't inflict it on anybody who hasn't volunteered. However, it doesn't seem to me that this is reason to boot the patch from the commitfest. I think the thing to do would be to remove yourself from the reviewers column and set it back to "needs review", so that other reviewers can pick it up. As for the late code churn, it mostly happened as a result of your own feedback; I would have left most of it in the original state, but as I went ahead it seemed much better to refactor things. This is mostly in heapam.c.
As for multixact.c, it also had a lot of churn, but that was mostly to restore it to the state it has in the master branch, dropping much of the code I had written to handle multixact truncation. The new code there and in the vacuum code path (relminmxid and so on) is a lot smaller than that other code was, and it's closely based on relfrozenxid, which is a known piece of technology. > My view would be that with 90 files touched this is a very large > patch, so that alone makes me wonder whether we should commit this > patch, so I agree with Noah and compliment him on an excellent > detailed review. I note, however, that the bulk of the patch is in three files -- multixact.c, tqual.c, heapam.c -- as is clearly illustrated in the diff stats I posted. The rest of them are touched mostly to follow their new APIs (and of course to add tests and docs). To summarize, of 94 files touched in total:
* 22 files are in src/test/isolation/ (new and updated tests and expected files)
* 19 files are in src/include/
* 10 files are in contrib/
* 39 files are in src/backend;
* in that subdir, there are 3097 insertions and 1006 deletions
* 3047 (83%) of which are in heapam.c, multixact.c, and tqual.c
* one is a README
> However, review of such a large patch should not be simply pass or > fail. We should be looking back at the original problem and ask > ourselves whether some subset of the patch could solve a useful subset > of the problem. For me, that seems quite likely and this is very > definitely an important patch. > > Even if we can't solve some part of the problem we can at least commit > some useful parts of infrastructure to allow later work to happen more > smoothly and quickly. > > So please let's not focus on the 100%, let's focus on 80/20. Well, we have the patch I originally posted in the 9.1 timeframe. That's a lot smaller and simpler.
However, that solves only part of the blocking problem, and in particular it doesn't fix the initial deadlock reports from Joel Jacobson at Glue Finance (now renamed Trustly, in case you wonder about his change of email address) that started this effort in the first place. I don't think we can cut down to that and still satisfy the users that requested this; and Glue was just the first one, because after I started blogging about this, some more people started asking for it. I don't think there's any useful middle ground between that one and the current one, but maybe I'm wrong. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] foreign key locks, 2nd attempt
Alvaro Herrera writes: > Sure. The problem is that we are allowing updated rows to be locked (and > locked rows to be updated). This means that we need to store extended > Xmax information in tuples that goes beyond mere locks, which is what we > were doing previously -- they may now have locks and updates simultaneously. > (In the previous code, a multixact never meant an update, it always > signified only shared locks. After a crash, all backends that could > have been holding locks must necessarily be gone, so the multixact info > is not interesting and can be treated like the tuple is simply live.) Ugh. I had not been paying attention to what you were doing in this patch, and now that I read this I wish I had objected earlier. This seems like a horrid mess that's going to be unsustainable both from a complexity and a performance standpoint. The only reason multixacts were tolerable at all was that they had only one semantics. Changing it so that maybe a multixact represents an actual updater and maybe it doesn't is not sane. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
On Thu, Feb 23, 2012 at 3:04 PM, Alvaro Herrera wrote: > > Excerpts from Simon Riggs's message of jue feb 23 11:15:45 -0300 2012: >> On Sun, Dec 4, 2011 at 12:20 PM, Noah Misch wrote: >> >> > Making pg_multixact persistent across clean shutdowns is no bridge to cross >> > lightly, since it means committing to an on-disk format for an indefinite >> > period. We should do it; the benefits of this patch justify it, and I >> > haven't >> > identified a way to avoid it without incurring worse problems. >> >> I can't actually see anything in the patch that explains why this is >> required. (That is something we should reject more patches on, since >> it creates a higher maintenance burden). >> >> Can someone explain? We might think of a way to avoid that. > > Sure. The problem is that we are allowing updated rows to be locked (and > locked rows to be updated). This means that we need to store extended > Xmax information in tuples that goes beyond mere locks, which is what we > were doing previously -- they may now have locks and updates simultaneously. > > (In the previous code, a multixact never meant an update, it always > signified only shared locks. After a crash, all backends that could > have been holding locks must necessarily be gone, so the multixact info > is not interesting and can be treated like the tuple is simply live.) > > This means that this extended Xmax info needs to be able to survive, so > that it's possible to retrieve it after a crash; because even if the > lockers are all gone, the updater might have committed and this means > the tuple is dead. If we failed to keep this, the tuple would be > considered live which would be wrong because the other version of the > tuple, which was created by the update, is also live. OK, thanks. So why do we need pg_upgrade support? If pg_multixact is not persistent now, surely there is no requirement to have pg_upgrade do any form of upgrade? 
The only time we'll need to do this is from 9.2 to 9.3, which can of course occur any time in the next year. That doesn't sound like a reason to block a patch now because of something that will be needed a year from now. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Simon Riggs's message of jue feb 23 11:15:45 -0300 2012: > On Sun, Dec 4, 2011 at 12:20 PM, Noah Misch wrote: > > > Making pg_multixact persistent across clean shutdowns is no bridge to cross > > lightly, since it means committing to an on-disk format for an indefinite > > period. We should do it; the benefits of this patch justify it, and I > > haven't > > identified a way to avoid it without incurring worse problems. > > I can't actually see anything in the patch that explains why this is > required. (That is something we should reject more patches on, since > it creates a higher maintenance burden). > > Can someone explain? We might think of a way to avoid that. Sure. The problem is that we are allowing updated rows to be locked (and locked rows to be updated). This means that we need to store extended Xmax information in tuples that goes beyond mere locks, which is what we were doing previously -- they may now have locks and updates simultaneously. (In the previous code, a multixact never meant an update, it always signified only shared locks. After a crash, all backends that could have been holding locks must necessarily be gone, so the multixact info is not interesting and can be treated like the tuple is simply live.) This means that this extended Xmax info needs to be able to survive, so that it's possible to retrieve it after a crash; because even if the lockers are all gone, the updater might have committed and this means the tuple is dead. If we failed to keep this, the tuple would be considered live which would be wrong because the other version of the tuple, which was created by the update, is also live. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
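The visibility argument Alvaro makes here, that a committed updater hidden inside a multixact must survive a crash even though the lockers need not, can be sketched as follows (illustrative Python; this is a conceptual model, not PostgreSQL's actual HeapTupleSatisfies* code):

```python
def tuple_version_is_dead(xmax_members, committed_xids):
    """Each member is an (xid, is_updater) pair.  After a crash all
    lockers are necessarily gone, but a committed updater recorded in
    the multixact still makes this tuple version dead.  If the member
    data were lost, this version would wrongly be considered live
    alongside the newer version created by the update."""
    for xid, is_updater in xmax_members:
        if is_updater and xid in committed_xids:
            return True    # the update committed: this version is dead
    return False           # only lockers: the tuple remains live

# Locker 200 vanished in the crash, but updater 201 committed first:
members = [(200, False), (201, True)]
```

With the pre-patch semantics (lockers only), discarding pg_multixact at restart was always safe; with an updater possibly present, it no longer is, hence the need for persistence and pg_upgrade support.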
Re: [HACKERS] foreign key locks, 2nd attempt
On Thu, Feb 23, 2012 at 1:08 PM, Jeroen Vermeulen wrote: > Simon, I think you had a reason why it couldn't work, but I didn't quite get > your meaning and didn't want to distract things further at that stage. You > wrote that it "doesn't do what KEY LOCKS are designed to do"... any chance > you might recall what the problem was? The IMMUTABLE idea would work, but it requires all users to recode their apps. By the time they've done that we'll probably have fixed the problem in full anyway, so then we have to ask them to stop again, which is hard, so we'll be stuck with a performance tweak that applies to just one release. So it's the fully automatic solution we're looking for. I don't object to someone implementing IMMUTABLE, I'm just saying it's not a way to make this patch simpler and therefore acceptable. If people are willing to recode apps to avoid this then hire me and I'll tell you how ;-) -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] foreign key locks, 2nd attempt
On Sun, Dec 4, 2011 at 12:20 PM, Noah Misch wrote: > Making pg_multixact persistent across clean shutdowns is no bridge to cross > lightly, since it means committing to an on-disk format for an indefinite > period. We should do it; the benefits of this patch justify it, and I haven't > identified a way to avoid it without incurring worse problems. I can't actually see anything in the patch that explains why this is required. (That is something we should reject more patches on, since it creates a higher maintenance burden). Can someone explain? We might think of a way to avoid that. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
On 2012-02-23 10:18, Simon Riggs wrote: However, review of such a large patch should not be simply pass or fail. We should be looking back at the original problem and ask ourselves whether some subset of the patch could solve a useful subset of the problem. For me, that seems quite likely and this is very definitely an important patch. Even if we can't solve some part of the problem we can at least commit some useful parts of infrastructure to allow later work to happen more smoothly and quickly. So please let's not focus on the 100%, lets focus on 80/20. The suggested immutable-column constraint was meant as a potential "80/20 workaround." Definitely not a full solution, helpful to some, probably easier to do. I don't know if an immutable key would actually be enough to elide foreign-key locks though. Simon, I think you had a reason why it couldn't work, but I didn't quite get your meaning and didn't want to distract things further at that stage. You wrote that it "doesn't do what KEY LOCKS are designed to do"... any chance you might recall what the problem was? I don't mean to be pushy about my pet idea, and heaven knows I don't have time to implement it, but it'd be good to know whether I should put the whole thought to rest. Jeroen -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
On Wed, Feb 22, 2012 at 5:00 PM, Noah Misch wrote: >> All in all, I think this is in pretty much final shape. Only pg_upgrade >> bits are still missing. If sharp eyes could give this a critical look >> and knuckle-cracking testers could give it a spin, that would be >> helpful. > > Lack of pg_upgrade support leaves this version incomplete, because that > omission would constitute a blocker for beta 2. This version changes as much > code compared to the version I reviewed at the beginning of the CommitFest as > that version changed overall. In that light, it's time to close the books on > this patch for the purpose of this CommitFest; I'm marking it Returned with > Feedback. Thanks for your efforts thus far. My view would be that with 90 files touched this is a very large patch; that alone makes me wonder whether we should commit it, so I agree with Noah and compliment him on an excellent, detailed review. However, review of such a large patch should not be simply pass or fail. We should be looking back at the original problem and asking ourselves whether some subset of the patch could solve a useful subset of the problem. For me, that seems quite likely, and this is very definitely an important patch. Even if we can't solve some part of the problem we can at least commit some useful parts of infrastructure to allow later work to happen more smoothly and quickly. So please let's not focus on the 100%, let's focus on 80/20. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] foreign key locks, 2nd attempt
On Mon, Feb 13, 2012 at 07:16:58PM -0300, Alvaro Herrera wrote: > Okay, so this patch fixes the truncation and wraparound issues through a > mechanism much like pg_clog's: it keeps track of the oldest possibly > existing multis on each and every table, and then during tuple freezing > those are removed. I also took the liberty to make the code remove > multis altogether (i.e. resetting the IS_MULTI hint bit) when only the > update remains and lockers are all gone. > > I also cleaned up the code in heapam so that there's a couple of tables > mapping MultiXactStatus to LockTupleMode and back, and to heavyweight > lock modes (the older patches used functions to do this, which was > pretty ugly). I had to add a little helper function to lock.c to make > this work. I made a rather large bunch of other minor changes to close > minor bugs here and there. > > Docs have been added, as have new tests for the isolation harness, which > I've ensured pass in both read committed and serializable modes; WAL > logging for locking updated versions of a tuple, when an old one is > locked due to an old snapshot, was also added; there's plenty of room > for growth in the MultiXact flag bits; the bit that made tables with no > keys lock the entire row all the time was removed; multiple places in > code comments were cleaned up that referred to this feature as "FOR KEY > LOCK" and ensured that it also mentions FOR KEY UPDATE; the pg_rowlocks, > pageinspect, pg_controldata, pg_resetxlog utilities have been updated. All of the above sounds great. I especially like the growing test coverage. > All in all, I think this is in pretty much final shape. Only pg_upgrade > bits are still missing. If sharp eyes could give this a critical look > and knuckle-cracking testers could give it a spin, that would be > helpful. Lack of pg_upgrade support leaves this version incomplete, because that omission would constitute a blocker for beta 2. 
This version changes as much code compared to the version I reviewed at the beginning of the CommitFest as that version changed overall. In that light, it's time to close the books on this patch for the purpose of this CommitFest; I'm marking it Returned with Feedback. Thanks for your efforts thus far. On Mon, Jan 30, 2012 at 06:48:47PM -0500, Noah Misch wrote: > On Tue, Jan 24, 2012 at 03:47:16PM -0300, Alvaro Herrera wrote: > > * Columns that are part of the key > > Noah thinks the set of columns should only consider those actually > > referenced > > by keys, not those that *could* be referenced. > > Well, do you disagree? To me it's low-hanging fruit, because it isolates the > UPDATE-time overhead of this patch to FK-referenced tables rather than all > tables having a PK or PK-like index (often just "all tables"). You have not answered my question above. nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
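The "tables mapping MultiXactStatus to LockTupleMode and back" that Alvaro mentions suggest a conflict matrix among row-lock strengths. A sketch of one plausible matrix (illustrative Python; the mode names and exact conflict set evolved during review, so treat this as the design direction rather than the committed behavior):

```python
# Row-lock modes, weakest to strongest (names are illustrative).
MODES = ["KEY SHARE", "SHARE", "NO KEY UPDATE", "UPDATE"]

# Which modes block each other: the weakest lock, taken by foreign-key
# checks, conflicts only with updates that may change key columns.
CONFLICTS = {
    "KEY SHARE":     {"UPDATE"},
    "SHARE":         {"NO KEY UPDATE", "UPDATE"},
    "NO KEY UPDATE": {"SHARE", "NO KEY UPDATE", "UPDATE"},
    "UPDATE":        {"KEY SHARE", "SHARE", "NO KEY UPDATE", "UPDATE"},
}

def conflicts(a, b):
    return b in CONFLICTS[a]
```

The point of the whole patch falls out of the first row: a foreign-key check (KEY SHARE) no longer blocks a non-key UPDATE of the same tuple, which is exactly the deadlock-producing wait the old code forced.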
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Jim Nasby's message of mié feb 01 21:33:47 -0300 2012: > > On Jan 31, 2012, at 10:58 AM, Alvaro Herrera wrote: > >> I think it's butt-ugly, but it's only slightly uglier than > >> relfrozenxid which we're already stuck with. The slight amount of > >> additional ugliness is that you're going to use an XID column to store > >> a uint4 that is not an XID - but I don't have a great idea how to fix > >> that. You could mislabel it as an OID or a (signed) int4, but I'm not > >> sure that either of those is any better. We could also create an mxid > >> data type, but that seems like it might be overkill. > > > > Well, we're already storing a multixact in Xmax, so it's not like we > > don't assume that we can store multis in space normally reserved for > > Xids. What I've been wondering is not how ugly it is, but rather about the fact that we're bloating pg_class some more. > > FWIW, users have been known to request uint datatypes; so if this really is a > uint perhaps we should just create a uint datatype... Yeah. This is just for internal consumption, though, not a full-blown datatype. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] foreign key locks, 2nd attempt
On Jan 31, 2012, at 10:58 AM, Alvaro Herrera wrote: >> I think it's butt-ugly, but it's only slightly uglier than >> relfrozenxid which we're already stuck with. The slight amount of >> additional ugliness is that you're going to use an XID column to store >> a uint4 that is not an XID - but I don't have a great idea how to fix >> that. You could mislabel it as an OID or a (signed) int4, but I'm not >> sure that either of those is any better. We could also create an mxid >> data type, but that seems like it might be overkill. > > Well, we're already storing a multixact in Xmax, so it's not like we > don't assume that we can store multis in space normally reserved for > Xids. What I've been wondering is not how ugly it is, but rather of the > fact that we're bloating pg_class some more. FWIW, users have been known to request uint datatypes; so if this really is a uint perhaps we should just create a uint datatype... -- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
On Tue, Jan 31, 2012 at 11:58 AM, Alvaro Herrera wrote: > Well, we're already storing a multixact in Xmax, so it's not like we > don't assume that we can store multis in space normally reserved for > Xids. What I've been wondering is not how ugly it is, but rather of the > fact that we're bloating pg_class some more. I don't think another 4 bytes in pg_class is that big a deal. We don't do relcache rebuilds frequently enough for that to really matter much. The bigger cost of this patch seems to me to be that we're going to have to carry around multi-xact IDs for a long time, and probably fsync and/or WAL-log them moreso than now. I'm not sure how much you've worried about that, but a new column in pg_class seems like line noise by comparison. >> What about SELECT FOR UPDATE? That's a pretty common case, I think. >> If that's now going to force a multixact to get created and >> additionally force multixact lookups when the row is subsequently >> examined, that seems, well, actually pretty scary at first glance. >> SELECT FOR UPDATE is fairly expensive as it stands, and is commonly >> used. > > SELECT FOR UPDATE is still going to work without a multi in the simple > cases. The case where it's different is when somebody else grabs a KEY > SHARE lock on the same tuple; it's now going to get a multi, where it > previously blocked. So other transactions later checking the tuple will > have a bit of a larger cost. That's okay considering that it meant > the other transaction did not have to wait anymore. OK. I assume that the different treatment of SELECT FOR SHARE is due to lack of bit space? >> >> 2. What algorithm did we end up using do fix the set of key columns, >> >> and is there any user configuration that can or needs to happen there? >> > >> > Currently we just use all columns indexed by unique indexes (excluding >> > expressional and partial ones). Furthermore we consider "key column" >> > all columns in a table without unique indexes. 
Noah disagrees with this >> > choice; he says we should drop this last point, and that we should relax >> > the first to "columns actually used by foreign key constraints". I >> > expect that this is a rather simple change. >> >> Why the special case for tables without unique indexes? Like Noah, I >> don't see the point. Unless there's some trade-off I'm not seeing, we >> should want the number of key columns to be as minimal as possible, so >> that as many updates as possible can use the "cheap" path that doesn't >> involve locking the whole tuple. > > No trade-off. I just thought it was safer: my thought was that if > there's no nominated key column, the safer bet was that any of them > could have been. But then, in reality there cannot be any foreign key > here anyway. I'll revert that bit. OK. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
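The key-column rule under debate above can be sketched as follows (illustrative Python; the field names are made up, and the real computation walks the relation's index list in the backend):

```python
def key_columns(indexes):
    """The rule Alvaro describes: "key columns" are all columns indexed
    by unique indexes, excluding expressional and partial indexes.
    This is before applying Noah's suggested narrowing to columns
    actually referenced by foreign-key constraints, and after dropping
    the special case for tables without unique indexes."""
    cols = set()
    for idx in indexes:
        if idx["unique"] and not idx["expressional"] and not idx["partial"]:
            cols.update(idx["columns"])
    return cols

indexes = [
    {"unique": True,  "expressional": False, "partial": False, "columns": ["id"]},
    {"unique": True,  "expressional": False, "partial": True,  "columns": ["email"]},  # partial: excluded
    {"unique": False, "expressional": False, "partial": False, "columns": ["name"]},   # not unique: excluded
]
```

The smaller this set, the more updates qualify as "non-key" and can take the cheap path that doesn't lock the whole tuple, which is why both Noah and Robert want it minimal.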
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Robert Haas's message of mar ene 31 13:18:30 -0300 2012: > > On Tue, Jan 31, 2012 at 9:19 AM, Alvaro Herrera > wrote: > > Excerpts from Robert Haas's message of mar ene 31 10:17:40 -0300 2012: > >> I suspect you are right that it is unlikely, but OTOH that sounds like > >> an extremely painful recovery procedure. We probably don't need to > >> put a ton of thought into handling this case as efficiently as > >> possible, but I think we would do well to avoid situations that could > >> lead to, basically, a full-cluster shutdown. If that happens to one > >> of my customers I expect to lose the customer. > > > > Okay, so the worst case here is really bad and we should do something > > about it. Are you okay with a new pg_class column of type xid? The > > advantage is not only that we would be able to track it with high > > precision; we would also get rid of a lot of code in which I have little > > confidence. > > I think it's butt-ugly, but it's only slightly uglier than > relfrozenxid which we're already stuck with. The slight amount of > additional ugliness is that you're going to use an XID column to store > a uint4 that is not an XID - but I don't have a great idea how to fix > that. You could mislabel it as an OID or a (signed) int4, but I'm not > sure that either of those is any better. We could also create an mxid > data type, but that seems like it might be overkill. Well, we're already storing a multixact in Xmax, so it's not like we don't assume that we can store multis in space normally reserved for Xids. What I've been wondering is not how ugly it is, but rather of the fact that we're bloating pg_class some more. > >> 1. I think it's probably fair to assume that this is going to be a > >> huge win in cases where it avoids deadlocks or lock waits. But is > >> there a worst case where we don't avoid that but still add a lot of > >> extra multi-xact lookups? 
What's the worst case we can imagine and > >> how pathological does the workload have to be to tickle that case? > > > > Hm. I haven't really thought about this. There are some code paths > > that now have to resolve Multixacts that previously did not; things like > > vacuum. I don't think there's any case in which we previously did not > > block and now block, but there might be things that got slower without > > blocking. One thing that definitely got slower is use of SELECT FOR > > SHARE. (This command previously used hint bits to mark the row as > > locked; now it is always going to create a multixact). However, I > > expect that with foreign keys switching to FOR KEY SHARE, the use of FOR > > SHARE is going to decline, maybe disappear completely, so it shouldn't > > be a problem. > > What about SELECT FOR UPDATE? That's a pretty common case, I think. > If that's now going to force a multixact to get created and > additionally force multixact lookups when the row is subsequently > examined, that seems, well, actually pretty scary at first glance. > SELECT FOR UPDATE is fairly expensive as it stands, and is commonly > used. SELECT FOR UPDATE is still going to work without a multi in the simple cases. The case where it's different is when somebody else grabs a KEY SHARE lock on the same tuple; it's now going to get a multi, where it previously blocked. So other transactions later checking the tuple will have a bit of a larger cost. That's okay considering that it meant the other transaction did not have to wait anymore. > >> 2. What algorithm did we end up using do fix the set of key columns, > >> and is there any user configuration that can or needs to happen there? > > > > Currently we just use all columns indexed by unique indexes (excluding > > expressional and partial ones). Furthermore we consider "key column" > > all columns in a table without unique indexes. 
Noah disagrees with this > > choice; he says we should drop this last point, and that we should relax > > the first to "columns actually used by foreign key constraints". I > > expect that this is a rather simple change. > > Why the special case for tables without unique indexes? Like Noah, I > don't see the point. Unless there's some trade-off I'm not seeing, we > should want the number of key columns to be as minimal as possible, so > that as many updates as possible can use the "cheap" path that doesn't > involve locking the whole tuple. No trade-off. I just thought it was safer: my thought was that if there's no nominated key column, the safer bet was that any of them could have been. But then, in reality there cannot be any foreign key here anyway. I'll revert that bit. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
On Tue, Jan 31, 2012 at 9:19 AM, Alvaro Herrera wrote: > Excerpts from Robert Haas's message of mar ene 31 10:17:40 -0300 2012: >> I suspect you are right that it is unlikely, but OTOH that sounds like >> an extremely painful recovery procedure. We probably don't need to >> put a ton of thought into handling this case as efficiently as >> possible, but I think we would do well to avoid situations that could >> lead to, basically, a full-cluster shutdown. If that happens to one >> of my customers I expect to lose the customer. > > Okay, so the worst case here is really bad and we should do something > about it. Are you okay with a new pg_class column of type xid? The > advantage is not only that we would be able to track it with high > precision; we would also get rid of a lot of code in which I have little > confidence. I think it's butt-ugly, but it's only slightly uglier than relfrozenxid which we're already stuck with. The slight amount of additional ugliness is that you're going to use an XID column to store a uint4 that is not an XID - but I don't have a great idea how to fix that. You could mislabel it as an OID or a (signed) int4, but I'm not sure that either of those is any better. We could also create an mxid data type, but that seems like it might be overkill. >> I have a couple of other concerns about this patch: >> >> 1. I think it's probably fair to assume that this is going to be a >> huge win in cases where it avoids deadlocks or lock waits. But is >> there a worst case where we don't avoid that but still add a lot of >> extra multi-xact lookups? What's the worst case we can imagine and >> how pathological does the workload have to be to tickle that case? > > Hm. I haven't really thought about this. There are some code paths > that now have to resolve Multixacts that previously did not; things like > vacuum. 
I don't think there's any case in which we previously did not > block and now block, but there might be things that got slower without > blocking. One thing that definitely got slower is use of SELECT FOR > SHARE. (This command previously used hint bits to mark the row as > locked; now it is always going to create a multixact). However, I > expect that with foreign keys switching to FOR KEY SHARE, the use of FOR > SHARE is going to decline, maybe disappear completely, so it shouldn't > be a problem. What about SELECT FOR UPDATE? That's a pretty common case, I think. If that's now going to force a multixact to get created and additionally force multixact lookups when the row is subsequently examined, that seems, well, actually pretty scary at first glance. SELECT FOR UPDATE is fairly expensive as it stands, and is commonly used. >> 2. What algorithm did we end up using do fix the set of key columns, >> and is there any user configuration that can or needs to happen there? > > Currently we just use all columns indexed by unique indexes (excluding > expressional and partial ones). Furthermore we consider "key column" > all columns in a table without unique indexes. Noah disagrees with this > choice; he says we should drop this last point, and that we should relax > the first to "columns actually used by foreign key constraints". I > expect that this is a rather simple change. Why the special case for tables without unique indexes? Like Noah, I don't see the point. Unless there's some trade-off I'm not seeing, we should want the number of key columns to be as minimal as possible, so that as many updates as possible can use the "cheap" path that doesn't involve locking the whole tuple. >> Do we handle cleanly the case where the set of key columns is changed >> by DDL? > > Hmm, I remember thinking about this at some point, but now I'm not 100% > sure. I think it doesn't matter due to multis being so ephemeral. Let > me try and figure it out. 
I thought part of the point here was that multixacts aren't so ephemeral any more: they're going to stick around until the table gets frozen. I'm worried that's going to turn out to be a problem somehow. With respect to this particular issue, what I'm worried about is something like this:

1. Transaction A begins.
2. Transaction B begins and does some updating or locking of table T, and then commits.
3. Transaction C begins and does DDL on table T, acquiring AccessExclusiveLock while it does so, and changes the set of key columns. It then commits.
4A. Transaction A now accesses table T, and/or
4B. Transaction D begins and accesses table T.

At step 4, A and/or D have up-to-date relcache entries that correctly describe the current set of key columns in T. But the work done by transaction B was done with a different set of key columns (could be more or less), and A and/or D mustn't get confused on that basis. Also, in the case of A, there is the further possibility that A's snapshot can't see B as committed yet (even though C subsequently held an AccessExclusiveLock on the table). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Robert Haas's message of mar ene 31 10:17:40 -0300 2012: > I suspect you are right that it is unlikely, but OTOH that sounds like > an extremely painful recovery procedure. We probably don't need to > put a ton of thought into handling this case as efficiently as > possible, but I think we would do well to avoid situations that could > lead to, basically, a full-cluster shutdown. If that happens to one > of my customers I expect to lose the customer. Okay, so the worst case here is really bad and we should do something about it. Are you okay with a new pg_class column of type xid? The advantage is not only that we would be able to track it with high precision; we would also get rid of a lot of code in which I have little confidence. > I have a couple of other concerns about this patch: > > 1. I think it's probably fair to assume that this is going to be a > huge win in cases where it avoids deadlocks or lock waits. But is > there a worst case where we don't avoid that but still add a lot of > extra multi-xact lookups? What's the worst case we can imagine and > how pathological does the workload have to be to tickle that case? Hm. I haven't really thought about this. There are some code paths that now have to resolve Multixacts that previously did not; things like vacuum. I don't think there's any case in which we previously did not block and now block, but there might be things that got slower without blocking. One thing that definitely got slower is use of SELECT FOR SHARE. (This command previously used hint bits to mark the row as locked; now it is always going to create a multixact). However, I expect that with foreign keys switching to FOR KEY SHARE, the use of FOR SHARE is going to decline, maybe disappear completely, so it shouldn't be a problem. > 2. What algorithm did we end up using do fix the set of key columns, > and is there any user configuration that can or needs to happen there? 
Currently we just use all columns indexed by unique indexes (excluding expressional and partial ones). Furthermore we consider "key column" all columns in a table without unique indexes. Noah disagrees with this choice; he says we should drop this last point, and that we should relax the first to "columns actually used by foreign key constraints". I expect that this is a rather simple change. Currently there's nothing that the user can do to add more columns to the set considered (other than creating more unique indexes, of course). We discussed having an ALTER TABLE command to do it, but this isn't seen as essential. > Do we handle cleanly the case where the set of key columns is changed > by DDL? Hmm, I remember thinking about this at some point, but now I'm not 100% sure. I think it doesn't matter due to multis being so ephemeral. Let me try and figure it out. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
On Mon, Jan 30, 2012 at 6:48 PM, Noah Misch wrote: > On Tue, Jan 24, 2012 at 03:47:16PM -0300, Alvaro Herrera wrote: >> The biggest item remaining is the point you raised about multixactid >> wraparound. This is closely related to multixact truncation and the way >> checkpoints are to be handled. If we think that MultiXactId wraparound >> is possible, and we need to involve autovacuum to keep it at bay, then I > > To prove it possible, we need prove there exists some sequence of operations > consuming N xids and M > N multixactids. Have N transactions key-lock N-1 > rows apiece, then have each of them key-lock one of the rows locked by each > other transaction. This consumes N xids and N(N-1) multixactids. I believe > you could construct a workload with N! multixact usage, too. > > Existence proofs are one thing, real workloads another. My unsubstantiated > guess is that multixactid use will not overtake xid use in bulk on a workload > not specifically tailored to do so. So, I think it's enough to notice it, > refuse to assign a new multixactid, and tell the user to clear the condition > with a VACUUM FREEZE of all databases. Other opinions would indeed be useful. I suspect you are right that it is unlikely, but OTOH that sounds like an extremely painful recovery procedure. We probably don't need to put a ton of thought into handling this case as efficiently as possible, but I think we would do well to avoid situations that could lead to, basically, a full-cluster shutdown. If that happens to one of my customers I expect to lose the customer. I have a couple of other concerns about this patch: 1. I think it's probably fair to assume that this is going to be a huge win in cases where it avoids deadlocks or lock waits. But is there a worst case where we don't avoid that but still add a lot of extra multi-xact lookups? What's the worst case we can imagine and how pathological does the workload have to be to tickle that case? 2. 
What algorithm did we end up using to fix the set of key columns, and is there any user configuration that can or needs to happen there? Do we handle cleanly the case where the set of key columns is changed by DDL?

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
On Tue, Jan 24, 2012 at 03:47:16PM -0300, Alvaro Herrera wrote:
> The biggest item remaining is the point you raised about multixactid
> wraparound. This is closely related to multixact truncation and the way
> checkpoints are to be handled. If we think that MultiXactId wraparound
> is possible, and we need to involve autovacuum to keep it at bay, then I

To prove it possible, we need to prove there exists some sequence of operations consuming N xids and M > N multixactids. Have N transactions key-lock N-1 rows apiece, then have each of them key-lock one of the rows locked by each other transaction. This consumes N xids and N(N-1) multixactids. I believe you could construct a workload with N! multixact usage, too.

Existence proofs are one thing, real workloads another. My unsubstantiated guess is that multixactid use will not overtake xid use in bulk on a workload not specifically tailored to do so. So, I think it's enough to notice it, refuse to assign a new multixactid, and tell the user to clear the condition with a VACUUM FREEZE of all databases. Other opinions would indeed be useful.

> think the only way to make that work is to add another column to
> pg_class so that each table's oldest multixact is tracked, same as we do
> with relfrozenxid for Xids. If we do that, I think we can do away with
> most of the MultiXactTruncate junk I added -- things would become a lot
> simpler. The cost would be bloating pg_class a bit more. Are we okay
> with paying that cost? I asked this question some months ago and I
> decided that I would not add the column, but I am starting to lean the
> other way. I would like some opinions on this.

That's not the only way; autovacuum could just accelerate normal freezing to advance the multixactid horizon indirectly. Your proposal could make it far more efficient, though.
> You asked two questions about WAL-logging locks: one was about the level > of detail we log for each lock we grab; the other was about > heap_xlog_update logging the right info. AFAICS, the main thing that > makes detailed WAL logging necessary is hot standbys. That is, the > standby must have all the locking info so that concurrent transactions > are similarly locked as in the master ... or am I wrong in that? (ISTM > this means I need to fix heap_xlog_update so that it faithfully > registers the lock info we're storing, not just the current Xid). Standby servers do not need accurate tuple locks, because it's not possible to wait on a tuple lock during recovery. (By the way, the question about log detail was just from my own curiosity. We don't especially need an answer to move forward with this patch, unless you want one.) > * Columns that are part of the key > Noah thinks the set of columns should only consider those actually > referenced > by keys, not those that *could* be referenced. Well, do you disagree? To me it's low-hanging fruit, because it isolates the UPDATE-time overhead of this patch to FK-referenced tables rather than all tables having a PK or PK-like index (often just "all tables"). > Also, in a table without columns, are all columns part of the key, or is the > key the empty set? I changed HeapSatisfiesHOTUpdate but that seems > arbitrary. It does seem arbitrary. What led you to switch in a later version? Thanks, nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Alvaro Herrera's message of mar ene 24 15:47:16 -0300 2012: > Need more code changes for the following: > * export FOR KEY UPDATE lock mode in SQL While doing this, I realized that there's an open item here regarding a transaction that locks a tuple, and then in an aborted savepoint deletes it. As things stand, what happens is that the original tuple lock is forgotten entirely, which was one of the things I wanted to fix (and in fact is fixed for all other cases AFAICS). So what we need is to be able to store a MultiXactId that includes a member for KeyUpdate locks, which will represent an UPDATE that touches key columns as well as DELETEs. That closes the hole. However, the problem with this is that we have no more bits left in the flag bitmask, which is another concern you had raised. I chose the easy way out and added a full byte of flags per transaction. This means that we now have 1636 xacts per members page rather than 1900+, but I'm not too concerned about that. (We could cut back to 4 flag bits per xact -- but then, having some room for future growth is probably a good plan anyway). So DELETEs can also create multis. I'm going to submit an updated patch shortly. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
On Wed, Jan 18, 2012 at 05:18:31PM -0300, Alvaro Herrera wrote: > Excerpts from Heikki Linnakangas's message of mar ene 17 03:21:28 -0300 2012: > > On 16.01.2012 21:52, Alvaro Herrera wrote: > > > I was initially thinking that pg_multixact should return the > > > empty set if requested members of a multi that preceded the freeze > > > point. That way, I thought, we would just never try to access a page > > > originated in the older version (assuming the freeze point is set to > > > "current" whenever pg_upgrade runs). However, as things currently > > > stand, accessing an old multi raises an error. So maybe we need a > > > scheme a bit more complex to handle this. > > > > Hmm, could we create new multixact files filled with zeros, covering the > > range that was valid in the old cluster? > > Hm, we could do something like that I guess. I'm not sure that just > zeroes is the right pattern, but it should be something simple. PostgreSQL 9.1 can have all ~4B MultiXactId on disk at any given time. We could silently ignore the lookup miss when HEAP_XMAX_LOCK_ONLY is also set. That makes existing data files acceptable while still catching data loss scenarios going forward. (It's tempting to be stricter when we know the cluster data files originated in PostgreSQL 9.2+, but I'm not sure whether that's worth its weight.) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
Excerpts from Heikki Linnakangas's message of mar ene 17 03:21:28 -0300 2012: > > On 16.01.2012 21:52, Alvaro Herrera wrote: > > > > Excerpts from Heikki Linnakangas's message of lun ene 16 16:17:42 -0300 > > 2012: > >> > >> On 15.01.2012 06:49, Alvaro Herrera wrote: > >>> - pg_upgrade bits are missing. > >> > >> I guess we'll need to rewrite pg_multixact contents in pg_upgrade. Is > >> the page format backwards-compatible? > > > > It's not. > > > > I haven't worked out what pg_upgrade needs to do, honestly. I think we > > should just not copy old pg_multixact files when upgrading across this > > patch. > > Sorry, I meant whether the *data* page format is backwards-compatible? > the multixact page format clearly isn't. It's supposed to be, yes. The HEAP_XMAX_IS_NOT_UPDATE bit (now renamed) was chosen so that it'd take the place of the old HEAP_XMAX_SHARE_LOCK bit, so any pages with that bit set should continue to work similarly. The other infomask bits I used were previously unused. > > I was initially thinking that pg_multixact should return the > > empty set if requested members of a multi that preceded the freeze > > point. That way, I thought, we would just never try to access a page > > originated in the older version (assuming the freeze point is set to > > "current" whenever pg_upgrade runs). However, as things currently > > stand, accessing an old multi raises an error. So maybe we need a > > scheme a bit more complex to handle this. > > Hmm, could we create new multixact files filled with zeros, covering the > range that was valid in the old cluster? Hm, we could do something like that I guess. I'm not sure that just zeroes is the right pattern, but it should be something simple. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. 
PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
On Mon, Jan 16, 2012 at 04:52:36PM -0300, Alvaro Herrera wrote:
> Excerpts from Heikki Linnakangas's message of lun ene 16 16:17:42 -0300 2012:
> > On 15.01.2012 06:49, Alvaro Herrera wrote:
> > > --- 164,178
> > >   #define HEAP_HASVARWIDTH        0x0002  /* has variable-width attribute(s) */
> > >   #define HEAP_HASEXTERNAL        0x0004  /* has external stored attribute(s) */
> > >   #define HEAP_HASOID             0x0008  /* has an object-id field */
> > > ! #define HEAP_XMAX_KEYSHR_LOCK   0x0010  /* xmax is a key-shared locker */
> > >   #define HEAP_COMBOCID           0x0020  /* t_cid is a combo cid */
> > >   #define HEAP_XMAX_EXCL_LOCK     0x0040  /* xmax is exclusive locker */
> > > ! #define HEAP_XMAX_IS_NOT_UPDATE 0x0080  /* xmax, if valid, is only a locker.
> > > !                                          * Note this is not set unless
> > > !                                          * XMAX_IS_MULTI is also set. */
> > > !
> > > ! #define HEAP_LOCK_BITS (HEAP_XMAX_EXCL_LOCK | HEAP_XMAX_IS_NOT_UPDATE | \
> > > !                         HEAP_XMAX_KEYSHR_LOCK)
> > >   #define HEAP_XMIN_COMMITTED     0x0100  /* t_xmin committed */
> > >   #define HEAP_XMIN_INVALID       0x0200  /* t_xmin invalid/aborted */
> > >   #define HEAP_XMAX_COMMITTED     0x0400  /* t_xmax committed */
> >
> > HEAP_XMAX_IS_NOT_UPDATE is a pretty opaque name for that. From the name,
> > I'd say that a DELETE would set that, but the comment says it has to do
> > with locking. After going through all the combinations in my mind, I
> > think I finally understood it: HEAP_XMAX_IS_NOT_UPDATE is set if xmax is
> > a multi-xact that represents only locking xids, no updates. How about
> > calling it "HEAP_XMAX_LOCK_ONLY", and setting it whether or not
> > xmax is a multi-xid?
>
> Hm, sounds like a good idea. Will do.

+1

> > Why are you renaming HeapTupleHeaderGetXmax() into
> > HeapTupleHeaderGetRawXmax()? Any current callers of
> > HeapTupleHeaderGetXmax() should already check that HEAP_XMAX_IS_MULTI is
> > not set.
> I had this vague impression that it'd be better to break existing
> callers so that they ensure they decide between
> HeapTupleHeaderGetRawXmax and HeapTupleHeaderGetUpdateXid. Noah
> suggested changing the macro name, too. It's up to each caller to
> decide what semantics they want. Most want the latter; and callers
> outside core are more likely to want that one. If we kept the old name,
> they would get the wrong value.

My previous suggestion was to have both macros:

#define HeapTupleHeaderGetXmax(tup) \
( \
    AssertMacro(!((tup)->t_infomask & HEAP_XMAX_IS_MULTI)), \
    HeapTupleHeaderGetRawXmax(tup) \
)

#define HeapTupleHeaderGetRawXmax(tup) \
( \
    (tup)->t_choice.t_heap.t_xmax \
)

No strong preference, though.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers