Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
Robert Haas writes:
> No, that's not right. Now that you mention it, I realize that tuple
> locks can definitely cause deadlocks. Example:

Yeah. Foreign-key-related tuple locks are another rich source of examples.

> ... So I don't think we can remove speculative insertion locks from
> the deadlock detector either.

That scares me too. I think that relation extension can safely be transferred to some lower-level mechanism, because what has to be done while holding the lock is circumscribed and below the level of database operations (which might need other locks). These other ideas seem a lot riskier.

(But see recent conversation where I discouraged Alvaro from holding extension locks across BRIN summarization activity. We'll need to look and make sure that nobody else has had creative ideas like that.)

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
On Wed, Nov 8, 2017 at 9:40 AM, Masahiko Sawada wrote:
> Speaking of acquiring these four lock types and heavyweight locks,
> there obviously is a call path to acquire any of the four lock types
> while holding a heavyweight lock. In reverse, there also is a call path
> where we acquire a heavyweight lock while holding any of the four lock
> types. The call path I found is that in heap_delete we acquire a tuple
> lock and call XactLockTableWait or MultiXactIdWait, which eventually
> could acquire LOCKTAG_TRANSACTION in order to wait for the concurrent
> transactions to finish. But IIUC, since these functions acquire the
> lock for the concurrent transaction's transaction id, deadlocks don't
> happen.

No, that's not right. Now that you mention it, I realize that tuple locks can definitely cause deadlocks. Example:

setup:
rhaas=# create table foo (a int, b text);
CREATE TABLE
rhaas=# create table bar (a int, b text);
CREATE TABLE
rhaas=# insert into foo values (1, 'hoge');
INSERT 0 1

session 1:
rhaas=# begin;
BEGIN
rhaas=# update foo set b = 'hogehoge' where a = 1;
UPDATE 1

session 2:
rhaas=# begin;
BEGIN
rhaas=# update foo set b = 'quux' where a = 1;

session 3:
rhaas=# begin;
BEGIN
rhaas=# lock bar;
LOCK TABLE
rhaas=# update foo set b = 'blarfle' where a = 1;

back to session 1:
rhaas=# select * from bar;
ERROR:  deadlock detected
LINE 1: select * from bar;
               ^
DETAIL:  Process 88868 waits for AccessShareLock on relation 16391 of database 16384; blocked by process 88845.
Process 88845 waits for ExclusiveLock on tuple (0,1) of relation 16385 of database 16384; blocked by process 88840.
Process 88840 waits for ShareLock on transaction 1193; blocked by process 88868.
HINT:  See server log for query details.

So what I said before was wrong: we definitely cannot exclude tuple locks from deadlock detection.
However, we might be able to handle the problem in another way: introduce a separate, parallel-query-specific mechanism to avoid having two participants try to update and/or delete the same tuple at the same time -- e.g. advertise the BufferTag + offset within the page in DSM, and if somebody else already has that same combination advertised, wait until they no longer do. That shouldn't ever deadlock, because the other worker shouldn't be able to find itself waiting for us while it's busy updating a tuple.

After some further study, speculative insertion locks look problematic too. I'm worried about the code path ExecInsert() [taking the speculative insertion lock] -> heap_insert() -> heap_prepare_insert() -> toast_insert_or_update() -> toast_save_datum() -> heap_open(rel->rd_rel->reltoastrelid, RowExclusiveLock). That sure looks like we can end up waiting for a relation lock while holding a speculative insertion lock, which seems to mean that speculative insertion locks are subject to at least theoretical deadlock hazards as well.

Note that even if we were guaranteed to be holding the lock on the TOAST relation already at this point, it wouldn't fix the problem, because we might still have to build or refresh a relcache entry at this point, which could end up scanning (and thus locking) system catalogs. Any syscache lookup can theoretically take a lock, even though most of the time it doesn't, and thus taking a lock that has been removed from the deadlock detector (or, say, an LWLock) and then performing a syscache lookup with it held is not OK. So I don't think we can remove speculative insertion locks from the deadlock detector either.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
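The DSM-advertisement idea above can be modeled in miniature. The following is a hypothetical single-process sketch only: the names advertise_tuple/clear_advertisement and the fixed-size slot array are invented for illustration, and the real mechanism would live in dynamic shared memory and sleep on a condition variable instead of returning false.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_WORKERS 8

/* One advertised tuple per parallel worker; stand-in for a DSM array. */
typedef struct
{
    uint32_t blkno;     /* block within the relation (BufferTag, simplified) */
    uint16_t offnum;    /* offset within the page */
    bool     valid;
} TupleSlot;

static TupleSlot slots[MAX_WORKERS];

/*
 * Try to advertise (blkno, offnum) for worker `me`.  Returns false if some
 * other worker already advertises the same tuple; the caller would then
 * wait (in real code, on a condition variable) until it is cleared.
 */
static bool
advertise_tuple(int me, uint32_t blkno, uint16_t offnum)
{
    for (int i = 0; i < MAX_WORKERS; i++)
        if (i != me && slots[i].valid &&
            slots[i].blkno == blkno && slots[i].offnum == offnum)
            return false;       /* somebody else is updating it: wait */
    slots[me] = (TupleSlot){ blkno, offnum, true };
    return true;
}

static void
clear_advertisement(int me)
{
    slots[me].valid = false;    /* real code: also wake any waiters */
}
```

Because a worker never waits while its own advertisement is set, the "other worker shouldn't be able to find itself waiting for us while it's busy updating a tuple" property holds by construction.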
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
On Wed, Nov 8, 2017 at 5:41 AM, Robert Haas wrote:
> On Mon, Nov 6, 2017 at 4:42 AM, Masahiko Sawada wrote:
>>>> I suggest that a good thing to do more or less immediately, regardless
>>>> of when this patch ends up being ready, would be to insert an
>>>> assertion that LockAcquire() is never called while holding a lock of
>>>> one of these types. If that assertion ever fails, then the whole
>>>> theory that these lock types don't need deadlock detection is wrong,
>>>> and we'd like to find out about that sooner rather than later.
>>>
>>> I understood. I'll check that first.
>>
>> I've checked whether LockAcquire is called while holding a lock of one
>> of four types: LOCKTAG_RELATION_EXTENSION, LOCKTAG_PAGE,
>> LOCKTAG_TUPLE, and LOCKTAG_SPECULATIVE_TOKEN. To summarize, I think
>> that we cannot move these four lock types together out of the
>> heavy-weight lock manager, but can move only the relation extension
>> lock with tricks.
>>
>> Here is the detail of the survey.
>
> Thanks for these details, but I'm not sure I fully understand.
>
>> * LOCKTAG_RELATION_EXTENSION
>> There is a path where LockRelationForExtension() could be called while
>> holding another relation extension lock. In brin_getinsertbuffer(), we
>> acquire a relation extension lock for an index relation and could
>> initialize a new buffer (brin_initialize_empty_new_buffer()). During
>> initialization of a new buffer, we call RecordPageWithFreeSpace(),
>> which eventually can call fsm_readbuf(rel, addr, true), where the
>> third argument is "extend". We can handle this problem by keeping a
>> list (or local hash) of acquired locks and skipping the acquisition if
>> the lock is already held. For other call paths calling
>> LockRelationForExtension, I don't see any problem.
>
> Does calling fsm_readbuf(rel, addr, true) take some heavyweight lock?

No, I meant that fsm_readbuf(rel, addr, true) can acquire a relation extension lock. So it's not a problem.
> Basically, what matters here in the end is whether we can articulate a
> deadlock-proof rule around the order in which these locks are
> acquired.

You're right; my survey was not enough to make a decision. As far as acquiring these four lock types goes, there are two call paths that acquire one of these locks while holding another. One is acquiring a relation extension lock and then acquiring a relation extension lock for the same relation again. As explained before, this can be resolved by remembering the held lock (perhaps remembering only the last one is enough). The other is acquiring either a tuple lock, a page lock, or a speculative insertion lock and then acquiring a relation extension lock. In the second case, we always acquire these two locks in the same order: one of the three lock types first, then the extension lock. So it's not a problem if we apply the rule that we disallow acquiring any of these three lock types while holding a relation extension lock. Also, as far as I surveyed, there is no path that acquires a relation lock while holding any of the other three lock types.

> The simplest such rule would be "you can only acquire one
> lock of any of these types at a time, and you can't subsequently
> acquire a heavyweight lock". But a more complicated rule would be OK
> too, e.g. "you can acquire as many heavyweight locks as you want, and
> after that you can optionally acquire one page, tuple, or speculative
> token lock, and after that you can acquire a relation extension lock".
> The latter rule, although more complex, is still deadlock-proof,
> because the heavyweight locks still use the deadlock detector, and the
> rest has a consistent order of lock acquisition that precludes one
> backend taking A then B while another backend takes B then A. I'm not
> entirely clear whether your survey leads us to a place where we can
> articulate such a deadlock-proof rule.
Speaking of acquiring these four lock types and heavyweight locks, there obviously is a call path that acquires any of the four lock types while holding a heavyweight lock. In reverse, there also is a call path where we acquire a heavyweight lock while holding any of the four lock types. The call path I found is that in heap_delete we acquire a tuple lock and call XactLockTableWait or MultiXactIdWait, which eventually could acquire LOCKTAG_TRANSACTION in order to wait for the concurrent transactions to finish. But IIUC, since these functions acquire the lock for the concurrent transaction's transaction id, deadlocks don't happen.

However, there might be other similar call paths if I'm missing something. For example, we do some operations that might acquire heavyweight locks other than LOCKTAG_TRANSACTION while holding a page lock (in ginInsertCleanup) or holding a speculative insertion lock (in nodeModifyTable).

To summarize, I think we can put the following rules in place in order to move the four lock types out of the heavyweight lock manager.

1. Do not acquire either a tuple lock, a page lock, or a speculative insertion lock while holding a relation extension lock.
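The "remembering the held lock" trick for the re-entrant relation extension case discussed above can be sketched as follows. This is an illustrative model only: relext_lock/relext_unlock and the single cached Oid are invented names matching the "perhaps remembering only the last one is enough" idea, not any actual PostgreSQL API.

```c
#include <stdbool.h>

typedef unsigned int Oid;       /* stand-in for PostgreSQL's Oid */

static Oid held_relext = 0;     /* 0 = no extension lock held; we remember
                                 * only the last (and only) one */
static int real_acquires = 0;   /* counts non-skipped acquisitions */

/*
 * Returns true if a real acquisition happened, false if it was skipped
 * because we already hold the extension lock on this relation (the
 * brin_getinsertbuffer -> ... -> fsm_readbuf re-entry case).
 */
static bool
relext_lock(Oid relid)
{
    if (held_relext == relid)
        return false;           /* already held: skip, avoiding self-deadlock */
    /* real code: actually take the lock here */
    held_relext = relid;
    real_acquires++;
    return true;
}

static void
relext_unlock(Oid relid)
{
    if (held_relext == relid)
        held_relext = 0;        /* real code: actually release it */
}
```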
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
On Mon, Nov 6, 2017 at 4:42 AM, Masahiko Sawada wrote:
>>> I suggest that a good thing to do more or less immediately, regardless
>>> of when this patch ends up being ready, would be to insert an
>>> assertion that LockAcquire() is never called while holding a lock of
>>> one of these types. If that assertion ever fails, then the whole
>>> theory that these lock types don't need deadlock detection is wrong,
>>> and we'd like to find out about that sooner rather than later.
>>
>> I understood. I'll check that first.
>
> I've checked whether LockAcquire is called while holding a lock of one
> of four types: LOCKTAG_RELATION_EXTENSION, LOCKTAG_PAGE,
> LOCKTAG_TUPLE, and LOCKTAG_SPECULATIVE_TOKEN. To summarize, I think
> that we cannot move these four lock types together out of the
> heavy-weight lock manager, but can move only the relation extension
> lock with tricks.
>
> Here is the detail of the survey.

Thanks for these details, but I'm not sure I fully understand.

> * LOCKTAG_RELATION_EXTENSION
> There is a path where LockRelationForExtension() could be called while
> holding another relation extension lock. In brin_getinsertbuffer(), we
> acquire a relation extension lock for an index relation and could
> initialize a new buffer (brin_initialize_empty_new_buffer()). During
> initialization of a new buffer, we call RecordPageWithFreeSpace(),
> which eventually can call fsm_readbuf(rel, addr, true), where the
> third argument is "extend". We can handle this problem by keeping a
> list (or local hash) of acquired locks and skipping the acquisition if
> the lock is already held. For other call paths calling
> LockRelationForExtension, I don't see any problem.

Does calling fsm_readbuf(rel, addr, true) take some heavyweight lock? Basically, what matters here in the end is whether we can articulate a deadlock-proof rule around the order in which these locks are acquired. The simplest such rule would be "you can only acquire one lock of any of these types at a time, and you can't subsequently acquire a heavyweight lock".
But a more complicated rule would be OK too, e.g. "you can acquire as many heavyweight locks as you want, and after that you can optionally acquire one page, tuple, or speculative token lock, and after that you can acquire a relation extension lock". The latter rule, although more complex, is still deadlock-proof, because the heavyweight locks still use the deadlock detector, and the rest has a consistent order of lock acquisition that precludes one backend taking A then B while another backend takes B then A. I'm not entirely clear whether your survey leads us to a place where we can articulate such a deadlock-proof rule.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
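The "more complicated rule" quoted above amounts to a monotone lock-level ordering, which could be checked by a small assertion helper. A minimal sketch (the LockLevel enum and lock_order_ok are hypothetical, not PostgreSQL code): any number of heavyweight locks first, then at most one page/tuple/speculative-token lock, then a relation extension lock.

```c
#include <stdbool.h>

/* Hypothetical lock "levels"; acquisition must never go backwards. */
typedef enum
{
    LEVEL_NONE = 0,
    LEVEL_HEAVYWEIGHT = 1,   /* ordinary heavyweight locks */
    LEVEL_INTERMEDIATE = 2,  /* page, tuple, or speculative token lock */
    LEVEL_RELEXT = 3         /* relation extension lock */
} LockLevel;

/*
 * Would acquiring `next` be legal, given the highest level already held?
 * Heavyweight locks may stack on each other; only one intermediate lock
 * may be held at a time; a relation extension lock must come last.
 */
static bool
lock_order_ok(LockLevel held_max, LockLevel next)
{
    switch (next)
    {
        case LEVEL_HEAVYWEIGHT:
            return held_max <= LEVEL_HEAVYWEIGHT;
        case LEVEL_INTERMEDIATE:
            return held_max < LEVEL_INTERMEDIATE;    /* at most one */
        case LEVEL_RELEXT:
            return held_max <= LEVEL_INTERMEDIATE;
        default:
            return false;
    }
}
```

In a real assertion build, each backend would track its current maximum level and an Assert(lock_order_ok(...)) in the acquire paths would catch any violation of the rule, which is exactly the "insert an assertion" suggestion from earlier in the thread.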
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
On Mon, Oct 30, 2017 at 3:17 PM, Masahiko Sawada wrote:
> On Fri, Oct 27, 2017 at 12:03 AM, Robert Haas wrote:
>> On Thu, Oct 26, 2017 at 12:36 PM, Masahiko Sawada wrote:
>>> Since the previous patch conflicts with current HEAD, I attached the
>>> updated patch for next CF.
>>
>> I think we should back up here and ask ourselves a couple of questions:
>
> Thank you for summarizing the purpose and discussion of this patch.
>
>> 1. What are we trying to accomplish here?
>>
>> 2. Is this the best way to accomplish it?
>>
>> To the first question, the problem as I understand it is as follows:
>> heavyweight locks don't conflict between members of a parallel group.
>> However, this is wrong for LOCKTAG_RELATION_EXTENSION, LOCKTAG_PAGE,
>> LOCKTAG_TUPLE, and LOCKTAG_SPECULATIVE_TOKEN. Currently, those cases
>> don't arise, because parallel operations are strictly read-only
>> (except for inserts by the leader into a just-created table, when only
>> one member of the group can be taking the lock anyway). However, once
>> we allow writes, they become possible, so some solution is needed.
>>
>> To the second question, there are a couple of ways we could fix this.
>> First, we could continue to allow these locks to be taken in the
>> heavyweight lock manager, but make them conflict even between members
>> of the same lock group. This is, however, complicated. A significant
>> problem (or so I think) is that the deadlock detector logic, which is
>> already quite hard to test, will become even more complicated, since
>> wait edges between members of a lock group need to exist at some times
>> and not at other times. Moreover, to the best of my knowledge, the
>> increased complexity would have no benefit, because it doesn't look to
>> me like we ever take any other heavyweight lock while holding one of
>> these four kinds of locks.
>> Therefore, no deadlock can occur: if we're
>> waiting for one of these locks, the process that holds it is not
>> waiting for any other heavyweight lock. This gives rise to a second
>> idea: move these locks out of the heavyweight lock manager and handle
>> them with separate code that does not have deadlock detection and
>> doesn't need as many lock modes. I think that idea is basically
>> sound, although it's possibly not the only sound idea.
>
> I'm on the same page.
>
>> However, that makes me wonder whether we shouldn't be a bit more
>> aggressive with this patch: why JUST relation extension locks? Why
>> not all four types of locks listed above? Actually, tuple locks are a
>> bit sticky, because they have four lock modes. The other three kinds
>> are very similar -- all you can do is "take it" (implicitly, in
>> exclusive mode), "try to take it" (again, implicitly, in exclusive
>> mode), or "wait for it to be released" (i.e. share lock and then
>> release). Another idea is to try to handle those three types and
>> leave the tuple locking problem for another day.
>>
>> I suggest that a good thing to do more or less immediately, regardless
>> of when this patch ends up being ready, would be to insert an
>> assertion that LockAcquire() is never called while holding a lock of
>> one of these types. If that assertion ever fails, then the whole
>> theory that these lock types don't need deadlock detection is wrong,
>> and we'd like to find out about that sooner rather than later.
>
> I understood. I'll check that first.

I've checked whether LockAcquire is called while holding a lock of one of four types: LOCKTAG_RELATION_EXTENSION, LOCKTAG_PAGE, LOCKTAG_TUPLE, and LOCKTAG_SPECULATIVE_TOKEN. To summarize, I think that we cannot move these four lock types together out of the heavy-weight lock manager, but can move only the relation extension lock with tricks.

Here is the detail of the survey.
* LOCKTAG_RELATION_EXTENSION
There is a path where LockRelationForExtension() could be called while holding another relation extension lock. In brin_getinsertbuffer(), we acquire a relation extension lock for an index relation and could initialize a new buffer (brin_initialize_empty_new_buffer()). During initialization of a new buffer, we call RecordPageWithFreeSpace(), which eventually can call fsm_readbuf(rel, addr, true), where the third argument is "extend". We can handle this problem by keeping a list (or local hash) of acquired locks and skipping the acquisition if the lock is already held. For other call paths calling LockRelationForExtension, I don't see any problem.

* LOCKTAG_PAGE, LOCKTAG_TUPLE, LOCKTAG_SPECULATIVE_TOKEN
There are paths where we can acquire a relation extension lock while holding these locks. For LOCKTAG_PAGE, in ginInsertCleanup() we acquire a page lock for the meta page and process the pending list, which could acquire a relation extension lock for an index relation. For LOCKTAG_TUPLE, in heap_update() we acquire a tuple lock and could call RelationGetBufferForTuple(). For LOCKTAG_SPECULATIVE_TOKEN, in ExecInsert() we acquire a speculative insertion lock
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
On Fri, Oct 27, 2017 at 12:03 AM, Robert Haas wrote:
> On Thu, Oct 26, 2017 at 12:36 PM, Masahiko Sawada wrote:
>> Since the previous patch conflicts with current HEAD, I attached the
>> updated patch for next CF.
>
> I think we should back up here and ask ourselves a couple of questions:

Thank you for summarizing the purpose and discussion of this patch.

> 1. What are we trying to accomplish here?
>
> 2. Is this the best way to accomplish it?
>
> To the first question, the problem as I understand it is as follows:
> heavyweight locks don't conflict between members of a parallel group.
> However, this is wrong for LOCKTAG_RELATION_EXTENSION, LOCKTAG_PAGE,
> LOCKTAG_TUPLE, and LOCKTAG_SPECULATIVE_TOKEN. Currently, those cases
> don't arise, because parallel operations are strictly read-only
> (except for inserts by the leader into a just-created table, when only
> one member of the group can be taking the lock anyway). However, once
> we allow writes, they become possible, so some solution is needed.
>
> To the second question, there are a couple of ways we could fix this.
> First, we could continue to allow these locks to be taken in the
> heavyweight lock manager, but make them conflict even between members
> of the same lock group. This is, however, complicated. A significant
> problem (or so I think) is that the deadlock detector logic, which is
> already quite hard to test, will become even more complicated, since
> wait edges between members of a lock group need to exist at some times
> and not at other times. Moreover, to the best of my knowledge, the
> increased complexity would have no benefit, because it doesn't look to
> me like we ever take any other heavyweight lock while holding one of
> these four kinds of locks. Therefore, no deadlock can occur: if we're
> waiting for one of these locks, the process that holds it is not
> waiting for any other heavyweight lock.
> This gives rise to a second
> idea: move these locks out of the heavyweight lock manager and handle
> them with separate code that does not have deadlock detection and
> doesn't need as many lock modes. I think that idea is basically
> sound, although it's possibly not the only sound idea.

I'm on the same page.

> However, that makes me wonder whether we shouldn't be a bit more
> aggressive with this patch: why JUST relation extension locks? Why
> not all four types of locks listed above? Actually, tuple locks are a
> bit sticky, because they have four lock modes. The other three kinds
> are very similar -- all you can do is "take it" (implicitly, in
> exclusive mode), "try to take it" (again, implicitly, in exclusive
> mode), or "wait for it to be released" (i.e. share lock and then
> release). Another idea is to try to handle those three types and
> leave the tuple locking problem for another day.
>
> I suggest that a good thing to do more or less immediately, regardless
> of when this patch ends up being ready, would be to insert an
> assertion that LockAcquire() is never called while holding a lock of
> one of these types. If that assertion ever fails, then the whole
> theory that these lock types don't need deadlock detection is wrong,
> and we'd like to find out about that sooner rather than later.

I understood. I'll check that first.

If this direction has no problem and we change these three locks to use the new lock mechanism, we will not be able to use these locks at the same time. Since that also means we impose a limitation on the future, we should think carefully about it. We could implement a deadlock detection mechanism for it again, but that wouldn't make sense.

> On the details of the patch, it appears that RelExtLockAcquire()
> executes the wait-for-lock code with the partition lock held, and then
> continues to hold the partition lock for the entire time that the
> relation extension lock is held.
> That not only makes all the code that
> runs while holding the lock non-interruptible but also makes a lot of
> the rest of this code pointless. How is any of this atomics code going
> to be reached by more than one process at the same time if the entire
> bucket is exclusive-locked? I would guess that the concurrency is not
> very good here for the same reason. Of course, just releasing the
> bucket lock wouldn't be right either, because then ext_lock might go
> away while we've got a pointer to it, which wouldn't be good. I think
> you could make this work if each lock had both a locker count and a
> pin count, and the object can only be removed when the pin_count is 0.
> So the lock algorithm would look like this:
>
> - Acquire the partition LWLock.
> - Find the item of interest, creating it if necessary. If out of
>   memory for more elements, sweep through the table and reclaim
>   0-pin-count entries, then retry.
> - Increment the pin count.
> - Attempt to acquire the lock atomically; if we succeed, release the
>   partition lock and return.
> - If this was a conditional-acquire, then decrement the pin count,
>   release the partition lock and return.
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
On Thu, Oct 26, 2017 at 12:36 PM, Masahiko Sawada wrote:
> Since the previous patch conflicts with current HEAD, I attached the
> updated patch for next CF.

I think we should back up here and ask ourselves a couple of questions:

1. What are we trying to accomplish here?

2. Is this the best way to accomplish it?

To the first question, the problem as I understand it is as follows: heavyweight locks don't conflict between members of a parallel group. However, this is wrong for LOCKTAG_RELATION_EXTENSION, LOCKTAG_PAGE, LOCKTAG_TUPLE, and LOCKTAG_SPECULATIVE_TOKEN. Currently, those cases don't arise, because parallel operations are strictly read-only (except for inserts by the leader into a just-created table, when only one member of the group can be taking the lock anyway). However, once we allow writes, they become possible, so some solution is needed.

To the second question, there are a couple of ways we could fix this. First, we could continue to allow these locks to be taken in the heavyweight lock manager, but make them conflict even between members of the same lock group. This is, however, complicated. A significant problem (or so I think) is that the deadlock detector logic, which is already quite hard to test, will become even more complicated, since wait edges between members of a lock group need to exist at some times and not at other times. Moreover, to the best of my knowledge, the increased complexity would have no benefit, because it doesn't look to me like we ever take any other heavyweight lock while holding one of these four kinds of locks. Therefore, no deadlock can occur: if we're waiting for one of these locks, the process that holds it is not waiting for any other heavyweight lock. This gives rise to a second idea: move these locks out of the heavyweight lock manager and handle them with separate code that does not have deadlock detection and doesn't need as many lock modes.
I think that idea is basically sound, although it's possibly not the only sound idea.

However, that makes me wonder whether we shouldn't be a bit more aggressive with this patch: why JUST relation extension locks? Why not all four types of locks listed above? Actually, tuple locks are a bit sticky, because they have four lock modes. The other three kinds are very similar -- all you can do is "take it" (implicitly, in exclusive mode), "try to take it" (again, implicitly, in exclusive mode), or "wait for it to be released" (i.e. share lock and then release). Another idea is to try to handle those three types and leave the tuple locking problem for another day.

I suggest that a good thing to do more or less immediately, regardless of when this patch ends up being ready, would be to insert an assertion that LockAcquire() is never called while holding a lock of one of these types. If that assertion ever fails, then the whole theory that these lock types don't need deadlock detection is wrong, and we'd like to find out about that sooner rather than later.

On the details of the patch, it appears that RelExtLockAcquire() executes the wait-for-lock code with the partition lock held, and then continues to hold the partition lock for the entire time that the relation extension lock is held. That not only makes all the code that runs while holding the lock non-interruptible but also makes a lot of the rest of this code pointless. How is any of this atomics code going to be reached by more than one process at the same time if the entire bucket is exclusive-locked? I would guess that the concurrency is not very good here for the same reason. Of course, just releasing the bucket lock wouldn't be right either, because then ext_lock might go away while we've got a pointer to it, which wouldn't be good. I think you could make this work if each lock had both a locker count and a pin count, and the object can only be removed when the pin_count is 0.
So the lock algorithm would look like this:

- Acquire the partition LWLock.
- Find the item of interest, creating it if necessary. If out of memory for more elements, sweep through the table and reclaim 0-pin-count entries, then retry.
- Increment the pin count.
- Attempt to acquire the lock atomically; if we succeed, release the partition lock and return.
- If this was a conditional-acquire, then decrement the pin count, release the partition lock and return.
- Release the partition lock.
- Sleep on the condition variable until we manage to atomically acquire the lock.

The unlock algorithm would just decrement the pin count and, if the resulting value is non-zero, broadcast on the condition variable.

Although I think this will work, I'm not sure this is actually a great algorithm. Every lock acquisition has to take and release the partition lock, use at least two more atomic ops (to take the pin and the lock), and search a hash table. I don't think that's going to be staggeringly fast. Maybe it's OK. It's not that much worse, possibly not any worse, than
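The pin/lock bookkeeping in the algorithm above can be modeled single-threaded to check its invariants. RelExtLock, relext_try_acquire, and relext_release are invented names for this sketch; in real code the updates would be atomic operations under the partition LWLock, and the final branch of release would broadcast on a condition variable.

```c
#include <stdbool.h>

/* Toy model of one hash-table entry for a relation extension lock. */
typedef struct
{
    int  pin_count;   /* entry may be reclaimed only when this reaches 0 */
    bool locked;      /* the extension lock itself */
} RelExtLock;

/* Conditional acquire: pin the entry, try the lock, unpin on failure. */
static bool
relext_try_acquire(RelExtLock *l)
{
    l->pin_count++;              /* keeps the entry from being reclaimed */
    if (!l->locked)
    {
        l->locked = true;        /* atomic test-and-set in real code */
        return true;             /* pin is kept for the lock's lifetime */
    }
    l->pin_count--;              /* conditional acquire gives the pin back */
    return false;                /* blocking version would sleep on a CV */
}

/* Release: drop the lock, unpin; returns true if waiters remain. */
static bool
relext_release(RelExtLock *l)
{
    l->locked = false;
    if (--l->pin_count > 0)
        return true;             /* real code: ConditionVariableBroadcast() */
    return false;                /* pin count 0: entry is now reclaimable */
}
```

The key property being modeled is that the pin count, not the lock bit, governs reclamation, so a sleeping waiter's pointer into the table stays valid even after the holder releases.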
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
On Fri, Sep 8, 2017 at 4:32 AM, Masahiko Sawada wrote:
> On Fri, Sep 8, 2017 at 7:24 AM, Thomas Munro wrote:
>> On Wed, Aug 16, 2017 at 2:13 PM, Masahiko Sawada wrote:
>>> The previous patch conflicts with current HEAD, I rebased the patch to
>>> current HEAD.
>>
>> Hi Masahiko-san,
>>
>> FYI this doesn't build anymore. I think it's just because the wait
>> event enumerators were re-alphabetised in pgstat.h:
>>
>> ../../../../src/include/pgstat.h:820:2: error: redeclaration of
>> enumerator ‘WAIT_EVENT_LOGICAL_SYNC_DATA’
>>   WAIT_EVENT_LOGICAL_SYNC_DATA,
>>   ^
>> ../../../../src/include/pgstat.h:806:2: note: previous definition of
>> ‘WAIT_EVENT_LOGICAL_SYNC_DATA’ was here
>>   WAIT_EVENT_LOGICAL_SYNC_DATA,
>>   ^
>> ../../../../src/include/pgstat.h:821:2: error: redeclaration of
>> enumerator ‘WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE’
>>   WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE,
>>   ^
>> ../../../../src/include/pgstat.h:807:2: note: previous definition of
>> ‘WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE’ was here
>>   WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE,
>>   ^
>
> Thank you for the information! Attached rebased patch.

Since the previous patch conflicts with current HEAD, I attached the updated patch for next CF.
Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

diff --git a/src/backend/access/brin/brin_pageops.c b/src/backend/access/brin/brin_pageops.c
index 80f803e..b928c1a 100644
--- a/src/backend/access/brin/brin_pageops.c
+++ b/src/backend/access/brin/brin_pageops.c
@@ -609,8 +609,8 @@ brin_page_cleanup(Relation idxrel, Buffer buf)
 	 */
 	if (PageIsNew(page))
 	{
-		LockRelationForExtension(idxrel, ShareLock);
-		UnlockRelationForExtension(idxrel, ShareLock);
+		LockRelationForExtension(idxrel, RELEXT_SHARED);
+		UnlockRelationForExtension(idxrel, RELEXT_SHARED);
 		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
 		if (PageIsNew(page))
@@ -702,7 +702,7 @@ brin_getinsertbuffer(Relation irel, Buffer oldbuf, Size itemsz,
 	 */
 	if (!RELATION_IS_LOCAL(irel))
 	{
-		LockRelationForExtension(irel, ExclusiveLock);
+		LockRelationForExtension(irel, RELEXT_EXCLUSIVE);
 		extensionLockHeld = true;
 	}
 	buf = ReadBuffer(irel, P_NEW);
@@ -754,7 +754,7 @@ brin_getinsertbuffer(Relation irel, Buffer oldbuf, Size itemsz,
 	}
 	if (extensionLockHeld)
-		UnlockRelationForExtension(irel, ExclusiveLock);
+		UnlockRelationForExtension(irel, RELEXT_EXCLUSIVE);
 	ReleaseBuffer(buf);
 	return InvalidBuffer;
@@ -764,7 +764,7 @@ brin_getinsertbuffer(Relation irel, Buffer oldbuf, Size itemsz,
 	LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
 	if (extensionLockHeld)
-		UnlockRelationForExtension(irel, ExclusiveLock);
+		UnlockRelationForExtension(irel, RELEXT_EXCLUSIVE);
 	page = BufferGetPage(buf);
diff --git a/src/backend/access/brin/brin_revmap.c b/src/backend/access/brin/brin_revmap.c
index 22f2076..4c15b45 100644
--- a/src/backend/access/brin/brin_revmap.c
+++ b/src/backend/access/brin/brin_revmap.c
@@ -570,7 +570,7 @@ revmap_physical_extend(BrinRevmap *revmap)
 	else
 	{
 		if (needLock)
-			LockRelationForExtension(irel, ExclusiveLock);
+			LockRelationForExtension(irel, RELEXT_EXCLUSIVE);
 		buf = ReadBuffer(irel, P_NEW);
 		if (BufferGetBlockNumber(buf) != mapBlk)
@@ -582,7 +582,7 @@
revmap_physical_extend(BrinRevmap *revmap)
 		 * page from under whoever is using it.
 		 */
 		if (needLock)
-			UnlockRelationForExtension(irel, ExclusiveLock);
+			UnlockRelationForExtension(irel, RELEXT_EXCLUSIVE);
 		LockBuffer(revmap->rm_metaBuf, BUFFER_LOCK_UNLOCK);
 		ReleaseBuffer(buf);
 		return;
@@ -591,7 +591,7 @@ revmap_physical_extend(BrinRevmap *revmap)
 	page = BufferGetPage(buf);
 	if (needLock)
-		UnlockRelationForExtension(irel, ExclusiveLock);
+		UnlockRelationForExtension(irel, RELEXT_EXCLUSIVE);
 }
 /* Check that it's a regular block (or an empty page) */
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index 136ea27..1690d21 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -325,13 +325,13 @@ GinNewBuffer(Relation index)
 	/* Must extend the file */
 	needLock = !RELATION_IS_LOCAL(index);
 	if (needLock)
-		LockRelationForExtension(index, ExclusiveLock);
+		LockRelationForExtension(index, RELEXT_EXCLUSIVE);
 	buffer = ReadBuffer(index, P_NEW);
 	LockBuffer(buffer, GIN_EXCLUSIVE);
 	if (needLock)
-		UnlockRelationForExtension(index, ExclusiveLock);
+		UnlockRelationForExtension(index, RELEXT_EXCLUSIVE);
 	return buffer;
 }
diff --git a/src/backend/access/gin/ginvacuum.c b/src/backend/access/gin/ginvacuum.c
index 31425e9..e9f84bc 100644
--- a/src/backend/access/gin/ginvacuum.c
+++ b/src/backend/access/gin/ginvacuum.c
@@ -716,10 +716,10 @@ ginvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
 	needLock = !RELATION_IS_LOCAL(index);
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
On Fri, Sep 8, 2017 at 7:24 AM, Thomas Munro wrote: > On Wed, Aug 16, 2017 at 2:13 PM, Masahiko Sawada > wrote: >> The previous patch conflicts with current HEAD, I rebased the patch to >> current HEAD. > > Hi Masahiko-san, > > FYI this doesn't build anymore. I think it's just because the wait > event enumerators were re-alphabetised in pgstat.h:
>
> ../../../../src/include/pgstat.h:820:2: error: redeclaration of enumerator ‘WAIT_EVENT_LOGICAL_SYNC_DATA’
>  WAIT_EVENT_LOGICAL_SYNC_DATA,
>  ^
> ../../../../src/include/pgstat.h:806:2: note: previous definition of ‘WAIT_EVENT_LOGICAL_SYNC_DATA’ was here
>  WAIT_EVENT_LOGICAL_SYNC_DATA,
>  ^
> ../../../../src/include/pgstat.h:821:2: error: redeclaration of enumerator ‘WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE’
>  WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE,
>  ^
> ../../../../src/include/pgstat.h:807:2: note: previous definition of ‘WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE’ was here
>  WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE,
>  ^
Thank you for the information! Attached rebased patch. -- Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center Moving_extension_lock_out_of_heavyweight_lock_v4.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
On Fri, Sep 8, 2017 at 8:25 AM, Thomas Munro wrote: > On Fri, Sep 8, 2017 at 10:24 AM, Thomas Munro > wrote: >> On Wed, Aug 16, 2017 at 2:13 PM, Masahiko Sawada >> wrote: >>> The previous patch conflicts with current HEAD, I rebased the patch to >>> current HEAD. >> >> Hi Masahiko-san, > > Hi Sawada-san, > > I have just learned from a colleague who is knowledgeable about > Japanese customs and kind enough to correct me that the appropriate > term of address for our colleagues in Japan on this mailing list is > <last name>-san. I was confused about that -- apologies for my > clumsiness. Don't worry about it, either is ok. In Japan there is a custom of writing <last name>-san but <first name>-san is also not incorrect :-) (also I think it's hard to distinguish between last name and first name of Japanese name). Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
On Fri, Sep 8, 2017 at 10:24 AM, Thomas Munro wrote: > On Wed, Aug 16, 2017 at 2:13 PM, Masahiko Sawada > wrote: >> The previous patch conflicts with current HEAD, I rebased the patch to >> current HEAD. > > Hi Masahiko-san, Hi Sawada-san, I have just learned from a colleague who is knowledgeable about Japanese customs and kind enough to correct me that the appropriate term of address for our colleagues in Japan on this mailing list is <last name>-san. I was confused about that -- apologies for my clumsiness. -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
On Wed, Aug 16, 2017 at 2:13 PM, Masahiko Sawada wrote: > The previous patch conflicts with current HEAD, I rebased the patch to > current HEAD. Hi Masahiko-san, FYI this doesn't build anymore. I think it's just because the wait event enumerators were re-alphabetised in pgstat.h:

../../../../src/include/pgstat.h:820:2: error: redeclaration of enumerator ‘WAIT_EVENT_LOGICAL_SYNC_DATA’
 WAIT_EVENT_LOGICAL_SYNC_DATA,
 ^
../../../../src/include/pgstat.h:806:2: note: previous definition of ‘WAIT_EVENT_LOGICAL_SYNC_DATA’ was here
 WAIT_EVENT_LOGICAL_SYNC_DATA,
 ^
../../../../src/include/pgstat.h:821:2: error: redeclaration of enumerator ‘WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE’
 WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE,
 ^
../../../../src/include/pgstat.h:807:2: note: previous definition of ‘WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE’ was here
 WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE,
 ^

-- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
On Thu, Jun 22, 2017 at 12:03 AM, Masahiko Sawada wrote: > On Fri, May 19, 2017 at 11:12 AM, Masahiko Sawada > wrote: >> On Wed, May 17, 2017 at 1:30 AM, Robert Haas wrote: >>> On Sat, May 13, 2017 at 7:27 AM, Amit Kapila >>> wrote: On Fri, May 12, 2017 at 9:14 AM, Tom Lane wrote: > Robert Haas writes: >> On Wed, May 10, 2017 at 8:39 PM, Masahiko Sawada >> wrote: >>> ... I'd like to propose to change relation >>> extension lock management so that it works using LWLock instead. > >> That's not a good idea because it'll make the code that executes while >> holding that lock noninterruptible. > > Is that really a problem? We typically only hold it over one kernel call, > which ought to be noninterruptible anyway. During parallel bulk load operations, I think we hold it over multiple kernel calls. >>> >>> We do. Also, RelationGetNumberOfBlocks() is not necessarily only one >>> kernel call, no? Nor is vm_extend. >> >> Yeah, these functions could call more than one kernel calls while >> holding extension lock. >> >>> Also, it's not just the backend doing the filesystem operation that's >>> non-interruptible, but also any waiters, right? >>> >>> Maybe this isn't a big problem, but it does seem to be that it would >>> be better to avoid it if we can. >>> >> >> I agree to change it to be interruptible for more safety. >> > > Attached updated version patch. To use the lock mechanism similar to > LWLock but interrupt-able, I introduced new lock manager for extension > lock. A lot of code especially locking and unlocking, is inspired by > LWLock but it uses the condition variables to wait for acquiring lock. > Other part is not changed from previous patch. This is still a PoC > patch, lacks documentation. The following is the measurement result > with test script same as I used before.
> * Copy test script
>
>          HEAD    Patched
> 4        436.6   436.1
> 8        561.8   561.8
> 16       580.7   579.4
> 32       588.5   597.0
> 64       596.1   599.0
>
> * Insert test script
>
>          HEAD    Patched
> 4        156.5   156.0
> 8        167.0   167.9
> 16       176.2   175.6
> 32       181.1   181.0
> 64       181.5   183.0
>
> Since I replaced heavyweight lock with lightweight lock I expected the performance slightly improves from HEAD but it was almost same result. I'll continue to look at more detail.

The previous patch conflicts with current HEAD, I rebased the patch to current HEAD. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center Moving_extension_lock_out_of_heavyweight_lock_v3.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
On Fri, May 19, 2017 at 11:12 AM, Masahiko Sawada wrote: > On Wed, May 17, 2017 at 1:30 AM, Robert Haas wrote: >> On Sat, May 13, 2017 at 7:27 AM, Amit Kapila wrote: >>> On Fri, May 12, 2017 at 9:14 AM, Tom Lane wrote: Robert Haas writes: > On Wed, May 10, 2017 at 8:39 PM, Masahiko Sawada > wrote: >> ... I'd like to propose to change relation >> extension lock management so that it works using LWLock instead. > That's not a good idea because it'll make the code that executes while > holding that lock noninterruptible. Is that really a problem? We typically only hold it over one kernel call, which ought to be noninterruptible anyway. >>> >>> During parallel bulk load operations, I think we hold it over multiple >>> kernel calls. >> >> We do. Also, RelationGetNumberOfBlocks() is not necessarily only one >> kernel call, no? Nor is vm_extend. > > Yeah, these functions could call more than one kernel calls while > holding extension lock. > >> Also, it's not just the backend doing the filesystem operation that's >> non-interruptible, but also any waiters, right? >> >> Maybe this isn't a big problem, but it does seem to be that it would >> be better to avoid it if we can. >> > > I agree to change it to be interruptible for more safety. > Attached updated version patch. To use the lock mechanism similar to LWLock but interrupt-able, I introduced new lock manager for extension lock. A lot of code especially locking and unlocking, is inspired by LWLock but it uses the condition variables to wait for acquiring lock. Other part is not changed from previous patch. This is still a PoC patch, lacks documentation. The following is the measurement result with test script same as I used before.
* Copy test script

         HEAD    Patched
4        436.6   436.1
8        561.8   561.8
16       580.7   579.4
32       588.5   597.0
64       596.1   599.0

* Insert test script

         HEAD    Patched
4        156.5   156.0
8        167.0   167.9
16       176.2   175.6
32       181.1   181.0
64       181.5   183.0

Since I replaced heavyweight lock with lightweight lock I expected the performance slightly improves from HEAD but it was almost same result. I'll continue to look at more detail. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center Moving_extension_lock_out_of_heavyweight_lock_v2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
On Wed, May 17, 2017 at 1:30 AM, Robert Haas wrote: > On Sat, May 13, 2017 at 7:27 AM, Amit Kapila wrote: >> On Fri, May 12, 2017 at 9:14 AM, Tom Lane wrote: >>> Robert Haas writes: On Wed, May 10, 2017 at 8:39 PM, Masahiko Sawada wrote: > ... I'd like to propose to change relation > extension lock management so that it works using LWLock instead. >>> That's not a good idea because it'll make the code that executes while holding that lock noninterruptible. >>> >>> Is that really a problem? We typically only hold it over one kernel call, >>> which ought to be noninterruptible anyway. >> >> During parallel bulk load operations, I think we hold it over multiple >> kernel calls. > > We do. Also, RelationGetNumberOfBlocks() is not necessarily only one > kernel call, no? Nor is vm_extend. Yeah, these functions could call more than one kernel calls while holding extension lock. > Also, it's not just the backend doing the filesystem operation that's > non-interruptible, but also any waiters, right? > > Maybe this isn't a big problem, but it does seem to be that it would > be better to avoid it if we can. > I agree to change it to be interruptible for more safety. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
On Sat, May 13, 2017 at 7:27 AM, Amit Kapila wrote: > On Fri, May 12, 2017 at 9:14 AM, Tom Lane wrote: >> Robert Haas writes: >>> On Wed, May 10, 2017 at 8:39 PM, Masahiko Sawada >>> wrote: ... I'd like to propose to change relation extension lock management so that it works using LWLock instead. >> >>> That's not a good idea because it'll make the code that executes while >>> holding that lock noninterruptible. >> >> Is that really a problem? We typically only hold it over one kernel call, >> which ought to be noninterruptible anyway. > > During parallel bulk load operations, I think we hold it over multiple > kernel calls. We do. Also, RelationGetNumberOfBlocks() is not necessarily only one kernel call, no? Nor is vm_extend. Also, it's not just the backend doing the filesystem operation that's non-interruptible, but also any waiters, right? Maybe this isn't a big problem, but it does seem to be that it would be better to avoid it if we can. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
On Sat, May 13, 2017 at 8:19 PM, Amit Kapila wrote: > On Thu, May 11, 2017 at 6:09 AM, Masahiko Sawada > wrote: >> This work would be helpful not only for existing workload but also >> future works like some parallel utility commands, which is discussed >> on other threads[1]. At least for parallel vacuum, this feature helps >> to solve issue that the implementation of parallel vacuum has. >> >> I ran pgbench for 10 min three times(scale factor is 5000), here is a >> performance measurement result.
>>
>> clients  TPS(HEAD)  TPS(Patched)
>> 4        2092.612   2031.277
>> 8        3153.732   3046.789
>> 16       4562.072   4625.419
>> 32       6439.391   6479.526
>> 64       7767.364   7779.636
>> 100      7917.173   7906.567
>>
>> * 16 core Xeon E5620 2.4GHz >> * 32 GB RAM >> * ioDrive >> >> In current implementation, it seems there is no performance degradation so >> far. >> > > I think it is good to check pgbench, but we should do tests of the > bulk load as this lock is stressed during such a workload. Some of > the tests we have done when we have improved the performance of bulk > load can be found in an e-mail [1]. > Thank you for sharing. I've measured using two test scripts attached on that thread. Here is result.

* Copy test script

Client   HEAD     Patched
4        452.60   455.53
8        561.74   561.09
16       592.50   592.21
32       602.53   599.53
64       605.01   606.42

* Insert test script

Client   HEAD     Patched
4        159.04   158.44
8        169.41   169.69
16       177.11   178.14
32       182.14   181.99
64       182.11   182.73

It seems there is no performance degradation so far. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
On Fri, May 12, 2017 at 9:14 AM, Tom Lane wrote: > Robert Haas writes: >> On Wed, May 10, 2017 at 8:39 PM, Masahiko Sawada >> wrote: >>> ... I'd like to propose to change relation >>> extension lock management so that it works using LWLock instead. > >> That's not a good idea because it'll make the code that executes while >> holding that lock noninterruptible. > > Is that really a problem? We typically only hold it over one kernel call, > which ought to be noninterruptible anyway. > During parallel bulk load operations, I think we hold it over multiple kernel calls. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
On Thu, May 11, 2017 at 6:09 AM, Masahiko Sawada wrote: > This work would be helpful not only for existing workload but also > future works like some parallel utility commands, which is discussed > on other threads[1]. At least for parallel vacuum, this feature helps > to solve issue that the implementation of parallel vacuum has. > > I ran pgbench for 10 min three times(scale factor is 5000), here is a > performance measurement result.
>
> clients  TPS(HEAD)  TPS(Patched)
> 4        2092.612   2031.277
> 8        3153.732   3046.789
> 16       4562.072   4625.419
> 32       6439.391   6479.526
> 64       7767.364   7779.636
> 100      7917.173   7906.567
>
> * 16 core Xeon E5620 2.4GHz > * 32 GB RAM > * ioDrive > > In current implementation, it seems there is no performance degradation so far. > I think it is good to check pgbench, but we should do tests of the bulk load as this lock is stressed during such a workload. Some of the tests we have done when we have improved the performance of bulk load can be found in an e-mail [1]. [1] - https://www.postgresql.org/message-id/CAFiTN-tkX6gs-jL8VrPxg6OG9VUAKnObUq7r7pWQqASzdF5OwA%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
Robert Haas writes: > On Wed, May 10, 2017 at 8:39 PM, Masahiko Sawada > wrote: >> ... I'd like to propose to change relation >> extension lock management so that it works using LWLock instead. > That's not a good idea because it'll make the code that executes while > holding that lock noninterruptible. Is that really a problem? We typically only hold it over one kernel call, which ought to be noninterruptible anyway. Also, the CheckpointLock is held for far longer, and we've not heard complaints about that one. I'm slightly suspicious of the claim that we don't need deadlock detection. There are places that e.g. touch FSM while holding this lock. It might be all right but it needs close review, not just an assertion that it's not a problem. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
On Wed, May 10, 2017 at 8:39 PM, Masahiko Sawada wrote: > Currently, the relation extension lock is implemented using > heavyweight lock manager and almost all functions (except for > brin_page_cleanup) using LockRelationForExtension use it with > ExclusiveLock mode. But actually it doesn't need multiple lock modes > or deadlock detection or any of the other functionality that the > heavyweight lock manager provides. I think it's enough to use > something like LWLock. So I'd like to propose to change relation > extension lock management so that it works using LWLock instead. That's not a good idea because it'll make the code that executes while holding that lock noninterruptible. Possibly something based on condition variables would work better. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers