Re: [HACKERS] race condition in sync rep
On Sun, Mar 27, 2011 at 7:46 AM, Simon Riggs wrote: > Are the master and standby on same system or are they separated by a network? > > I'm surprised that a network roundtrip takes less time than the > backend takes to mark clog and then queue for the SyncRepLock. When I first noticed that it was slow (really hanging, though I failed to realize it) with fsync=off, I had two clusters on the system. I didn't know what was going on at that point, and wasn't specifically looking for this bug - I was actually testing some other aspect of the behavior and hit upon it by accident. Then while I was at PG East I realized there was a race condition. (I think I actually realized it while I was dreaming about PostgreSQL; if you think dreaming about PostgreSQL is a sign that something is seriously wrong with me, you are likely correct.) Just to convince myself that I wasn't making things up I then stuck a sleep(1) in right before the sync rep wait, for testing purposes, which of course made it trivial to demonstrate the hang; I again did that on the same system (different one) but of course with the sleep in there it wouldn't have mattered. Then later I realized that the race condition and the fsync=off were probably the same problem, so I wrote up the email that way. If your point is that I never demonstrated with sync rep between two different systems, I agree. I suspect it could be done, but you'd probably have to load the master down pretty heavily while keeping the load on the standby very light - or possibly it would work to just run a single-threaded test for a really long time, but I don't know because I haven't tried it. I'm actually not that interested in quantifying the exact probability of this happening under any given set of circumstances; it seems like enough that it's been found and fixed. If something in the phrasing of my original email gave offence, it wasn't intended to: in particular, the use of the word "nasty" was intended to convey "difficult to find; tricky". I think my fear that it would prove difficult to fix also may have affected that word choice; I didn't anticipate it being resolved so quickly and with such a small patch. I am doing my best to help fix the things that I believe to be bugs in the code without pissing anybody off. Clearly, at least in your case, that doesn't seem to have been entirely successful, but in all honesty it's not for lack of trying. I really, really want to get this release out the door and get back to writing code and doing CommitFests; but I also want it to be good (as I'm sure you do as well) and I think we're not quite there yet. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] race condition in sync rep
On Sun, Mar 27, 2011 at 6:07 PM, Kevin Grittner wrote: > Simon Riggs wrote: > >> I'm surprised that a network roundtrip takes less time than the >> backend takes to mark clog and then queue for the SyncRepLock. > > I'm not surprised by that at all. Some of our replication involves > Gb or faster connections on the same LAN segment (sometimes on the > same switch). > > That's rather beside the point though -- on a busy system a process > can get starved long enough to make some seemingly improbably timings > occur. PostgreSQL shouldn't fall over just because the load gets > heavy enough to cause such timings. A race condition is a race > condition. That has been fixed, without discussion. The discussion was about ..., well, something else. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] race condition in sync rep
Simon Riggs wrote: > I'm surprised that a network roundtrip takes less time than the > backend takes to mark clog and then queue for the SyncRepLock. I'm not surprised by that at all. Some of our replication involves Gb or faster connections on the same LAN segment (sometimes on the same switch). That's rather beside the point though -- on a busy system a process can get starved long enough to make some seemingly improbably timings occur. PostgreSQL shouldn't fall over just because the load gets heavy enough to cause such timings. A race condition is a race condition. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] race condition in sync rep
On Sat, Mar 26, 2011 at 5:51 PM, Robert Haas wrote: > On Sat, Mar 26, 2011 at 12:41 PM, Simon Riggs wrote: >> Is it? Sync rep requires fsync on the standby. If you then explicitly >> turn off fsync on the standby then it has a performance impact, as >> documented. > > Actually, it doesn't, now that you fixed this. Before: > > [rhaas@office ~]$ pgbench -T 10 > starting vacuum...end. > transaction type: TPC-B (sort of) > scaling factor: 25 > query mode: simple > number of clients: 1 > number of threads: 1 > duration: 10 s > number of transactions actually processed: 27 > tps = 0.099386 (including connections establishing) > tps = 0.099389 (excluding connections establishing) > [rhaas@office ~]$ pgbench -T 10 > starting vacuum...end. > transaction type: TPC-B (sort of) > scaling factor: 25 > query mode: simple > number of clients: 1 > number of threads: 1 > duration: 10 s > number of transactions actually processed: 425 > tps = 42.442185 (including connections establishing) > tps = 42.468972 (excluding connections establishing) > > The first one - run with code from a few weeks ago - hung up on the > 27th transaction and was stuck there until the next checkpoint > completed. The second one was run with the latest code and no longer > hangs - and in fact it's now faster than running with fsync=on, > exactly as one would expect. Or at least as *I* expected. Are the master and standby on same system or are they separated by a network? I'm surprised that a network roundtrip takes less time than the backend takes to mark clog and then queue for the SyncRepLock. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] race condition in sync rep
Simon Riggs writes: > On Sat, Mar 26, 2011 at 4:24 PM, Tom Lane wrote: >> Well, in that case, it should be on the open-items list. If the system >> is still behaving that way, it's a bug. > Is it? Sync rep requires fsync on the standby. If you then explicitly > turn off fsync on the standby then it has a performance impact, as > documented. No, that's utter nonsense. If the system is behaving that way, then it's a bug. If you don't think it's a bug, then you misunderstand what the fsync GUC is supposed to do. What fsync=off is supposed to do is cause every attempted fsync to succeed instantly. It is *not possible* for that to result in a performance slowdown compared to fsyncs that take nonzero time. Unless someone's broken it. The documented behavior is broken. Period. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] race condition in sync rep
On Sat, Mar 26, 2011 at 12:41 PM, Simon Riggs wrote: > Is it? Sync rep requires fsync on the standby. If you then explicitly > turn off fsync on the standby then it has a performance impact, as > documented. Actually, it doesn't, now that you fixed this. Before: [rhaas@office ~]$ pgbench -T 10 starting vacuum...end. transaction type: TPC-B (sort of) scaling factor: 25 query mode: simple number of clients: 1 number of threads: 1 duration: 10 s number of transactions actually processed: 27 tps = 0.099386 (including connections establishing) tps = 0.099389 (excluding connections establishing) [rhaas@office ~]$ pgbench -T 10 starting vacuum...end. transaction type: TPC-B (sort of) scaling factor: 25 query mode: simple number of clients: 1 number of threads: 1 duration: 10 s number of transactions actually processed: 425 tps = 42.442185 (including connections establishing) tps = 42.468972 (excluding connections establishing) The first one - run with code from a few weeks ago - hung up on the 27th transaction and was stuck there until the next checkpoint completed. The second one was run with the latest code and no longer hangs - and in fact it's now faster than running with fsync=on, exactly as one would expect. Or at least as *I* expected. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] race condition in sync rep
On Sat, Mar 26, 2011 at 4:24 PM, Tom Lane wrote: > Simon Riggs writes: >> On Sat, Mar 26, 2011 at 4:04 PM, Tom Lane wrote: >>> In particular, in view of today's fix, shouldn't this commit be reverted? >>> >>> http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=6e8e7cc580665ddd43c8ca2acc6d60f345570a57 >>> >>> I thought at the time that that was nothing more than documenting a >>> known bug, and now it is documenting a dead bug. > >> No, that doc change is still accurate. > > Well, in that case, it should be on the open-items list. If the system > is still behaving that way, it's a bug. Is it? Sync rep requires fsync on the standby. If you then explicitly turn off fsync on the standby then it has a performance impact, as documented. Setting shared_buffers very low also reduces performance. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] race condition in sync rep
On Mar 26, 2011, at 12:24 PM, Tom Lane wrote: > Simon Riggs writes: >> On Sat, Mar 26, 2011 at 4:04 PM, Tom Lane wrote: >>> In particular, in view of today's fix, shouldn't this commit be reverted? >>> >>> http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=6e8e7cc580665ddd43c8ca2acc6d60f345570a57 >>> >>> I thought at the time that that was nothing more than documenting a >>> known bug, and now it is documenting a dead bug. > >> No, that doc change is still accurate. > > Well, in that case, it should be on the open-items list. If the system > is still behaving that way, it's a bug. I suspect it's fixed, but I haven't checked yet. And contrary to Simon's contention on a nearby email, it was not a "slight" performance degradation. It was running at like 1 tps. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] race condition in sync rep
On Sat, Mar 26, 2011 at 3:00 PM, Robert Haas wrote: > On Mar 26, 2011, at 6:16 AM, Simon Riggs wrote: >> That seems very unlikely even with fsync=off in a real config where we >> have network path to consider. >> >> Your definition of a "nasty" race condition seems off. >> >> I've added code for you. > > Your skepticism seems out of place. I actually hit this problem in testing. > We could debate how realistic my test setup was, but why? It is not our > policy - and bad practice in general - to ship code with race conditions. > The code you added is not "for me"; it's "to fix a race condition". A race that will not be lost with normal and sensible settings and results in a slight performance degradation only in a contrived worse case scenario. There is no question that it caused any other problem. > Thanks for the quick fix. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] race condition in sync rep
Simon Riggs writes: > On Sat, Mar 26, 2011 at 4:04 PM, Tom Lane wrote: >> In particular, in view of today's fix, shouldn't this commit be reverted? >> >> http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=6e8e7cc580665ddd43c8ca2acc6d60f345570a57 >> >> I thought at the time that that was nothing more than documenting a >> known bug, and now it is documenting a dead bug. > No, that doc change is still accurate. Well, in that case, it should be on the open-items list. If the system is still behaving that way, it's a bug. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] race condition in sync rep
On Sat, Mar 26, 2011 at 4:04 PM, Tom Lane wrote: > Robert Haas writes: >>> I've added code for you. > >> Your skepticism seems out of place. I actually hit this problem in testing. >> We could debate how realistic my test setup was, but why? It is not our >> policy - and bad practice in general - to ship code with race conditions. >> The code you added is not "for me"; it's "to fix a race condition". > > In particular, in view of today's fix, shouldn't this commit be reverted? > > http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=6e8e7cc580665ddd43c8ca2acc6d60f345570a57 > > I thought at the time that that was nothing more than documenting a > known bug, and now it is documenting a dead bug. No, that doc change is still accurate. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] race condition in sync rep
Robert Haas writes: >> I've added code for you. > Your skepticism seems out of place. I actually hit this problem in testing. > We could debate how realistic my test setup was, but why? It is not our > policy - and bad practice in general - to ship code with race conditions. > The code you added is not "for me"; it's "to fix a race condition". In particular, in view of today's fix, shouldn't this commit be reverted? http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=6e8e7cc580665ddd43c8ca2acc6d60f345570a57 I thought at the time that that was nothing more than documenting a known bug, and now it is documenting a dead bug. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] race condition in sync rep
On Mar 26, 2011, at 6:16 AM, Simon Riggs wrote: > That seems very unlikely even with fsync=off in a real config where we > have network path to consider. > > Your definition of a "nasty" race condition seems off. > > I've added code for you. Your skepticism seems out of place. I actually hit this problem in testing. We could debate how realistic my test setup was, but why? It is not our policy - and bad practice in general - to ship code with race conditions. The code you added is not "for me"; it's "to fix a race condition". Thanks for the quick fix. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] race condition in sync rep
On Sat, Mar 26, 2011 at 1:11 AM, Robert Haas wrote: > I believe I've figured out why synchronous replication has such > terrible performance with fsync=off: it has a nasty race condition. > It may happen - if the standby responds very quickly - that the > standby acks the commit record and awakens waiters before the > committing backend actually begins to wait. There's no cross-check > for this: the committing backend waits unconditionally, with no regard > to whether the necessary ACK has already arrived. At this point we > may be in for a very long wait: another ACK will be required to > release waiters, and that may not be immediately forthcoming. I had > thought that the next ACK (after at most wal_receiver_status_interval) > would do the trick, but it appears to be even worse than that: by > making the standby win the race, I was easily able to get the master > to hang for over a minute, and it only got released when I committed > another transaction. Had I been sufficiently patient, the next > checkpoint probably would have done the trick. > > Of course, with fsync=off on the standby, it's much easier for the > standby to win the race. That seems very unlikely even with fsync=off in a real config where we have network path to consider. Your definition of a "nasty" race condition seems off. I've added code for you. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] race condition in sync rep
I believe I've figured out why synchronous replication has such terrible performance with fsync=off: it has a nasty race condition. It may happen - if the standby responds very quickly - that the standby acks the commit record and awakens waiters before the committing backend actually begins to wait. There's no cross-check for this: the committing backend waits unconditionally, with no regard to whether the necessary ACK has already arrived. At this point we may be in for a very long wait: another ACK will be required to release waiters, and that may not be immediately forthcoming. I had thought that the next ACK (after at most wal_receiver_status_interval) would do the trick, but it appears to be even worse than that: by making the standby win the race, I was easily able to get the master to hang for over a minute, and it only got released when I committed another transaction. Had I been sufficiently patient, the next checkpoint probably would have done the trick. Of course, with fsync=off on the standby, it's much easier for the standby to win the race. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers