Re: Add client connection check during the execution of the query
> On 5 Jul 2019, at 11:46, Thomas Munro wrote:
>
> On Fri, Jul 5, 2019 at 6:28 PM Tatsuo Ishii wrote:
>>> The purpose of this patch is to stop the execution of continuous
>>> requests in case of a disconnection from the client.
>>
>> Pgpool-II already does this by sending a parameter status message to
>> the client. It is expected that clients are always prepared to receive
>> the parameter status message. This way I believe we could reliably
>> detect whether the connection to the client is broken or not.
>
> Hmm. If you send a message, it's basically application-level
> keepalive. But it's a lot harder to be sure that the protocol and
> socket are in the right state to insert a message at every possible
> CHECK_FOR_INTERRUPT() location. Sergey's proposal of recv(MSG_PEEK)
> doesn't require any knowledge of the protocol at all, though it
> probably does need TCP keepalive to be configured to be useful for
> remote connections.

Well, indeed, in the case of a cable disconnect the only way to detect it with the proposed approach is to have TCP keepalive configured. However, if the disconnection happens because the client application shut down, then the client OS should itself properly close the connection, and therefore this patch will detect such a situation without keepalives configured.

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
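The recv(MSG_PEEK) idea discussed above can be sketched in a few lines (a hypothetical Python illustration, not the actual patch code, which is C inside the server): peek at the socket without consuming data; an empty result means the peer performed an orderly close, while "would block" means the connection still looks alive. As noted in the thread, a pulled cable is not detected this way without TCP keepalive.

```python
import errno
import socket

def client_connection_lost(sock: socket.socket) -> bool:
    """Peek at the socket without consuming any protocol data.

    Returns True if the peer performed an orderly close (recv reports
    EOF) or hard-reset the connection, False if the connection still
    looks alive. A silently dead network path (e.g. cable pull) is NOT
    detected here; that case needs TCP keepalive.
    """
    sock.setblocking(False)
    try:
        data = sock.recv(1, socket.MSG_PEEK)
    except BlockingIOError:
        return False              # no data pending, connection alive
    except OSError as e:
        if e.errno in (errno.ECONNRESET, errno.EPIPE):
            return True           # hard reset from the peer
        raise
    finally:
        sock.setblocking(True)
    return data == b""            # b"" means orderly shutdown by peer
```

Because the check never consumes bytes (MSG_PEEK) and needs no knowledge of the wire protocol, it is safe to call from any interrupt-check point, which is exactly the property Thomas contrasts with an application-level keepalive message.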
Re: Read-only access to temp tables for 2PC transactions
> On 14 May 2019, at 12:53, Stas Kelvich wrote:
>
> Hi,
>
> That is attempt number N+1 to relax the checks for temporary table access
> in a transaction that is going to be prepared.

Konstantin Knizhnik made an off-list review of this patch and spotted a few problems:

* Incorrect reasoning that the ON COMMIT DELETE truncate mechanism should be changed in order to allow preparing transactions with read-only access to temp relations. It can actually be left as is. The changes done to ON COMMIT DELETE in the previous patch may be a performance win, but they are not directly related to this topic, so I've deleted that part.

* A copy-paste error in the check conditions in relation_open/relation_try_open.

Fixed version attached.

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

2PC-ro-temprels-v2.patch
Description: Binary data
Read-only access to temp tables for 2PC transactions
Hi,

This is attempt number N+1 to relax the checks for temporary table access in a transaction that is going to be prepared.

One of the problems with using temporary tables in prepared transactions is that such a transaction will hold locks on the temporary table after being prepared. Those locks will prevent the backend from exiting, since it will fail to acquire the lock needed to delete the temp table during exit. Also, re-acquiring such a lock after a server restart seems like an ill-defined operation.

I tried to allow prepared transactions that opened a temporary relation only in AccessShare mode, and then neither transfer this lock to a dummy PGPROC nor include it in the 'prepare' record. Such a prepared transaction will not prevent the backend from exiting and can be committed from another backend or after a restart. However, that modification allows a new DDL-related serialization anomaly: it becomes possible to prepare a transaction that read table A, then drop A, then commit the transaction. I'm not sure whether that is worse than not being able to access temp relations at all. On the other hand, it is possible to drop AccessShare locks only for temporary relations and leave the behavior for ordinary tables unchanged (in the attached patch this is done for all tables).

Also, I slightly modified the ON COMMIT DELETE code path. Right now all ON COMMIT DELETE temp tables are linked into a static list, and if a transaction accessed any temp table in any mode, then during commit all tables from that list will be truncated. For the given patch that means that even if a transaction only read from a temp table, it can still end up accessing other temp tables with a high lock mode during commit. I've added a hash table that tracks higher-than-AccessShare actions on temp tables during the current transaction, so during commit only tables from that hash will be truncated. That way, ON COMMIT DELETE tables in the backend will not prevent read-only access to some other table in the same backend.

Any thoughts?
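The ON COMMIT DELETE change described above can be sketched roughly as follows (a hypothetical Python illustration of the bookkeeping only; the patch itself is C inside the backend): instead of truncating every ON COMMIT DELETE ROWS table whenever any temp table was touched, track which tables were locked above AccessShare and truncate only those at commit.

```python
# Lock modes, simplified: anything above ACCESS_SHARE counts as
# a "modifying" access for the purpose of commit-time truncation.
ACCESS_SHARE = 1
ROW_EXCLUSIVE = 3

class Backend:
    def __init__(self, on_commit_delete_tables):
        # static list: all ON COMMIT DELETE ROWS temp tables in the backend
        self.on_commit_delete = set(on_commit_delete_tables)
        # per-transaction hash of tables locked above AccessShare
        self.modified_this_xact = set()

    def open_relation(self, table, lock_mode):
        if lock_mode > ACCESS_SHARE:
            self.modified_this_xact.add(table)

    def commit(self):
        # truncate only tables actually modified, not the whole list
        to_truncate = self.on_commit_delete & self.modified_this_xact
        self.modified_this_xact = set()
        return sorted(to_truncate)
```

With this bookkeeping, a transaction that only read a temp table returns an empty truncation list at commit, so a read-only prepared transaction never needs high lock modes on other temp tables.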
--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

2PC-ro-temprels.patch
Description: Binary data
XLogInsert() of dangling pointer while logging replica identity
Hi, hackers.

It seems that heapam.c:3082 calls XLogRegisterData() with an argument allocated on the stack, but the following call to XLogInsert() happens after the end of scope for that variable.

The issue was spotted by clang's AddressSanitizer. Fix attached.

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

0001-Fix-use-after-scope.patch
Description: Binary data
Re: Global snapshots
> On 31 Jan 2019, at 18:42, Andres Freund wrote:
>
> Hi,
>
> On 2018-11-30 16:00:17 +0300, Stas Kelvich wrote:
>>> On 29 Nov 2018, at 18:21, Dmitry Dolgov <9erthali...@gmail.com> wrote:
>>> Is there any resulting patch where the ideas how to implement this are
>>> outlined?
>>
>> Not yet. I'm going to continue work on this in January. And probably try to
>> force some of the nearby committers to make a line-by-line review.
>
> This hasn't happened yet, so I think this ought to be marked as returned
> with feedback?

No objections. I don't think this will realistically go in during the last CF, so I will reopen it during the next release cycle.

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: Global snapshots
…behaviour, but that probably should be used as protection from unusually long transactions rather than as the standard behavior.

> For example, if you constructed the happens-after graph between
> transactions in shared memory, including actions on all nodes, and
> looked for cycles, you could abort transactions that would complete a
> cycle. (We say A happens-after B if A reads or writes data previously
> written by B.) If no cycle exists then all is well.

Well, again, it seems to me that any kind of transaction scheduler that guarantees that an RO transaction will not abort (even if it is a special kind of RO, like read-only deferred) needs to keep old versions.

Speaking about alternative approaches, a good evaluation of algorithms can be found in [HARD17]. The Postgres model is close to the MVCC described in the article, and if we enable STO with a small timeout then it becomes close to the TIMESTAMP algorithm in the article. The results show that both MVCC and TIMESTAMP are less performant than the CALVIN approach =) But that one is quite different from what has been done in Postgres (and probably in all other databases except Calvin/Fauna itself) in the last 20-30 years.

Also, looking through a bunch of articles I found that one of the first articles about MVCC, [REED78] (I thought the first was [BERN83], but he actually references a bunch of previous articles, and [REED78] is one of them), was actually about distributed transactions and uses more or less the same approach, with "pseudo-time" in their terminology, to order transactions and assign snapshots.

[HARD17] https://dl.acm.org/citation.cfm?id=3055548
[REED78] https://dl.acm.org/citation.cfm?id=889815
[BERN83] https://dl.acm.org/citation.cfm?id=319998

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
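The quoted happens-after idea can be sketched as a cycle check over a dependency graph (a hypothetical Python illustration; the real shared-memory, cross-node bookkeeping would be far more involved): record an edge A → B whenever A reads or writes data previously written by B, and abort a transaction whose new edge would close a cycle.

```python
from collections import defaultdict

class HappensAfterGraph:
    def __init__(self):
        # txn -> set of txns it happens-after (reads/writes their writes)
        self.after = defaultdict(set)

    def _reachable(self, src, dst):
        """Depth-first search: is dst reachable from src via edges?"""
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.after[node])
        return False

    def add_dependency(self, a, b):
        """Record 'a happens-after b'.

        Returns False (meaning: abort a) if the edge would complete a
        cycle, i.e. b already (transitively) happens-after a.
        """
        if self._reachable(b, a):
            return False
        self.after[a].add(b)
        return True
```

For example, with T2 after T1 and T3 after T2 already recorded, an attempt to record T1 after T3 closes the cycle T1 → T3 → T2 → T1 and is rejected, which is exactly the abort case Tom describes.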
Re: Global snapshots
> On 11 May 2018, at 04:05, Masahiko Sawada <sawada.m...@gmail.com> wrote:
>
> If I understand correctly, simple writes with ordinary 2PC doesn't
> block read who reads that writes. For example, an insertion on a node
> doesn't block readers who reads the inserted tuple. But in this global
> transaction, the read will be blocked during the global transaction is
> InDoubt state. Is that right? InDoubt state will be short-live state
> if it's local transaction but I'm not sure in global transaction.
> Because during InDoubt state the coordinator has to prepare on all
> participant nodes and to assign the global csn to them (and end global
> transaction) the global transaction could be in InDoubt state for a
> relatively long time.

What I meant by "short-lived" is that InDoubt is set after the transaction is prepared, so its duration doesn't depend on the size of the transaction, only on network/commit latency. So you can have a transaction that did a bulk load for several hours, and still the InDoubt state will last only for a network round-trip and possibly an fsync that can happen during logging of the commit record.

> Also, it could be more longer if the
> commit/rollback prepared never be performed due to a failure of any
> nodes of them.

In this case it definitely can. An individual node cannot know whether that InDoubt transaction was committed or aborted elsewhere, so it should wait until somebody finishes this tx. The particular time to recover depends on how failures are handled.

Speaking more generally, in the presence of failures we can unlock tuples only when consensus on the transaction commit is reached. And the FLP theorem states that this can take an indefinitely long time in a fully asynchronous network. However, from a more practical point of view, the probability of such behaviour in a real network becomes negligible after several iterations of the voting process (some evaluations can be found in https://ieeexplore.ieee.org/document/1352999/). So several round-trips can be a decent approximation of how long it should take to recover from the InDoubt state in case of failure.

> With this patch, can we start a remote transaction at READ COMMITTED
> with imported a global snapshot if the local transaction started at
> READ COMMITTED?

In theory it is possible; one just needs to send a new snapshot before each statement. With some amount of careful work it is possible to achieve READ COMMITTED with postgres_fdw using global snapshots.

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: Global snapshots
> On 9 May 2018, at 17:51, Robert Haas <robertmh...@gmail.com> wrote:
>
> Ouch. That's not so bad at READ COMMITTED, but at higher isolation
> levels failure becomes extremely likely. Any multi-statement
> transaction that lasts longer than global_snapshot_defer_time is
> pretty much doomed.

Ouch indeed. The current xmin-holding scheme has two major drawbacks: it introduces a timeout between export and import of a snapshot, and it holds xmin pessimistically, so old versions are preserved even when there are no global transactions. On the positive side is simplicity: that is the only way I can think of that doesn't require distributed calculation of a global xmin, which, in turn, would probably require a permanent connection to the remote postgres_fdw node. It is not hard to add a background worker to postgres_fdw that holds a permanent connection, but I'm afraid that is a very discussion-prone topic, which is why I tried to avoid it.

> I don't think holding back xmin is a very good strategy. Maybe it
> won't be so bad if and when we get zheap, since only the undo log will
> bloat rather than the table. But as it stands, holding back xmin
> means everything bloats and you have to CLUSTER or VACUUM FULL the
> table in order to fix it.

Well, an open local transaction in Postgres holds globalXmin for the whole Postgres instance (with the exception of STO). Also, an active global transaction should hold the globalXmin of the participating nodes to be able to read the right versions (again, with the exception of STO). However, the xmin-holding scheme itself can be different. For example, we could periodically check (let's say every 1-2 seconds) the oldest GlobalCSN on each node and delay globalXmin advancement only if some long transaction really exists. So the period of bloat would be limited by these 1-2 seconds, and it would not impose a timeout between export and import.

Also, I want to note that global_snapshot_defer_time values of about tens of seconds don't change much in terms of bloat compared to logical replication. An active logical slot holds globalXmin by setting replication_slot_xmin, which is advanced on every RunningXacts, which in turn is logged every 15 seconds (hardcoded in LOG_SNAPSHOT_INTERVAL_MS).

> If the behavior were really analogous to our existing "snapshot too
> old" feature, it wouldn't be so bad. Old snapshots continue to work
> without error so long as they only read unmodified data, and only
> error out if they hit modified pages.

That is actually a good idea that I missed, thanks. Since all the logic for checking modified pages is already present, it is possible to reuse it and not raise a "Global STO" error right when an old snapshot is imported, but only when the global transaction reads a modified page. I will implement that and update the patch set.

Summarising, I think that introducing permanent connections to postgres_fdw nodes would put too much burden on this patch set, and that it will be possible to address this later (in the long run such a connection will be needed anyway, at least for deadlock detection). However, if you think that the current behavior plus an STO analog isn't good enough, then I'm ready to pursue that track.

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: Global snapshots
> On 7 May 2018, at 20:04, Robert Haas <robertmh...@gmail.com> wrote:
>
> But what happens if a transaction starts on node A at time T0 but
> first touches node B at a much later time T1, such that T1 - T0 >
> global_snapshot_defer_time?

Such a transaction will get a "global snapshot too old" error. In principle such behaviour can be avoided by calculating the oldest global CSN among all cluster nodes, so that the oldest xmin on a particular node is held back only when there is some open old transaction on another node. That is easy to do from the global snapshot point of view, but it's not obvious how to integrate it into postgres_fdw. Probably that would require a bi-directional connection between postgres_fdw nodes (distributed deadlock detection would also be easy with such a connection).

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: Global snapshots
> On 4 May 2018, at 22:09, Robert Haas <robertmh...@gmail.com> wrote:
>
> So, is the idea that we'll definitely find out about any remote
> transactions within 30 seconds, and then after we know about remote
> transactions, we'll hold back OldestXmin some other way?

Yes, kind of. There is a procArray->global_snapshot_xmin variable which, when set, acts as a barrier to xmin calculations in GetOldestXmin and GetSnapshotData. Also, each second GetSnapshotData writes globalxmin (as it was before procArray->global_snapshot_xmin was taken into account) into a circular buffer whose size equals the global_snapshot_defer_time value. That is more or less the same as the Snapshot Too Old feature, but with a bucket size of 1 second instead of 1 minute. procArray->global_snapshot_xmin is always set to the oldest value in the circular buffer. This way the xmin calculation always yields the value from global_snapshot_defer_time seconds ago, and we have a mapping from time (or GlobalCSN) to globalxmin for each second in this range. So when some backend imports a global snapshot with some GlobalCSN, that GlobalCSN is mapped to an xmin, and that xmin is set as Proc->xmin.

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
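The per-second circular buffer described above can be sketched as follows (a hypothetical Python illustration, not the patch's C code): each second the current globalxmin is recorded in a ring of global_snapshot_defer_time buckets; the barrier value is the oldest entry, and importing a snapshot taken at time t maps t back to the xmin that was current then.

```python
class XminRingBuffer:
    def __init__(self, defer_time_secs):
        self.defer = defer_time_secs
        self.buf = [None] * defer_time_secs   # one bucket per second

    def tick(self, now_secs, globalxmin):
        """Called once a second: record globalxmin for this second,
        overwriting the bucket from defer_time_secs ago."""
        self.buf[now_secs % self.defer] = globalxmin

    def global_snapshot_xmin(self):
        """Barrier value: the oldest xmin recorded in the window."""
        vals = [v for v in self.buf if v is not None]
        return min(vals) if vals else None

    def xmin_for_import(self, snapshot_secs):
        """Map an imported snapshot's timestamp back to the xmin
        that was current in that second."""
        return self.buf[snapshot_secs % self.defer]
```

Note how the ring makes the trade-off explicit: advancing the clock evicts the oldest bucket, so the barrier (and hence the bloat window) is bounded by exactly defer_time_secs seconds.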
Re: Global snapshots
> On 3 May 2018, at 18:28, Masahiko Sawada <sawada.m...@gmail.com> wrote:
>
> On Wed, May 2, 2018 at 1:27 AM, Stas Kelvich <s.kelv...@postgrespro.ru> wrote:
>> 1) To achieve commit atomicity of different nodes an intermediate step is
>> introduced: at first the running transaction is marked as InDoubt on all nodes,
>> and only after that each node commits it and stamps it with a given GlobalCSN.
>> All readers who run into tuples of an InDoubt transaction should wait until
>> it ends and recheck visibility.
>
> I'm concerned that long-running transaction could keep other
> transactions waiting and then the system gets stuck. Can this happen?
> and is there any workaround?

The InDoubt state is set just before commit, so it is short-lived. During transaction execution a global tx looks like an ordinary running transaction. A failure during 2PC, with the coordinator not being able to commit/abort this prepared transaction, can result in a situation where InDoubt tuples are locked for reading, but in such a situation the coordinator is to blame. The same problems arise if a prepared transaction holds locks, for example.

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: Global snapshots
> On 2 May 2018, at 05:58, Peter Eisentraut <peter.eisentr...@2ndquadrant.com> wrote:
>
> On 5/1/18 12:27, Stas Kelvich wrote:
>> Clock-SI is described in [5] and here I provide a small overview, which
>> supposedly should be enough to catch the idea. Assume that each node runs
>> Commit Sequence Number (CSN) based visibility: the database tracks one counter
>> for each transaction start (xid) and another counter for each transaction
>> commit (csn). In such a setting, a snapshot is just a single number -- a copy
>> of the current CSN at the moment when the snapshot was taken. Visibility rules
>> boil down to checking whether the current tuple's CSN is less than our
>> snapshot's csn. It is also worth mentioning that for the last 5 years there has
>> been an active proposal to switch Postgres to CSN-based visibility [6].
>
> But that proposal has so far not succeeded. How are you overcoming the
> reasons for that?

Well, the CSN proposal [6] aims to switch all of the Postgres visibility machinery to CSNs; that proposal is far more ambitious. Here, the original Postgres visibility, with snapshots being arrays of XIDs, is preserved. In this patch CSNs are written to an SLRU during commit (much as commit_ts does), but they are read in only two cases:

1) When the local transaction encounters an XID that is in progress according to the XIP-based snapshot, its CSN needs to be checked, as it may already be InDoubt. XIDs that are viewed as committed don't need that check (in [6] they also need to be checked through the SLRU).

2) If we are in a backend that imported a global snapshot, then only CSN-based visibility can be used. But that happens only for global transactions.

So I hope that local transaction performance will be affected only by the in-progress check, and there are ways to circumvent this check. Also, all this behaviour is optional and can be switched off by not enabling track_global_snapshots.

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
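The CSN-based visibility rule quoted above fits in a few lines (a hypothetical Python illustration): a snapshot is a single CSN, a tuple is visible iff its creating transaction committed with a smaller CSN, and an InDoubt writer forces the reader to wait and recheck.

```python
# Markers for transactions that have no commit CSN yet.
IN_PROGRESS = object()
IN_DOUBT = object()

def tuple_visible(xid_csn, snapshot_csn):
    """xid_csn: commit CSN of the tuple's creating transaction,
    or one of the IN_PROGRESS / IN_DOUBT markers.

    Visibility rule from the overview: visible iff the writer
    committed strictly before the snapshot was taken.
    """
    if xid_csn is IN_PROGRESS:
        return False                      # writer hasn't committed yet
    if xid_csn is IN_DOUBT:
        # reader must wait for the 2PC outcome, then recheck
        raise RuntimeError("InDoubt: wait for transaction end, recheck")
    return xid_csn < snapshot_csn
```

The InDoubt branch is what makes distributed commit atomic from a reader's point of view: during the short window between prepare and the CSN stamp, readers block instead of guessing either way.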
Re: Global snapshots
> On 1 May 2018, at 22:43, Robert Haas <robertmh...@gmail.com> wrote:
>
> I'm concerned about the provisioning aspect of this problem. Suppose
> I have two existing database systems with, perhaps, wildly different
> XID counters. On a certain date, I want to start using this system.

Yes, that is totally possible. On both systems you need to:

* Set track_global_snapshots='on' -- this starts writing each transaction's commit sequence number to the SLRU.
* Set global_snapshot_defer_time to, for example, 30 seconds -- this delays oldestXmin advancement for the specified amount of time, preserving old tuples.
* Restart the database.
* Optionally enable NTPd if it wasn't enabled.

It is also possible to avoid the restart, but that would require some careful work: after enabling track_global_snapshots it is safe to start global transactions only once all concurrently running transactions have finished. More or less the equivalent thing happens during logical slot creation.

> Or conversely, I have two systems that are bonded together using this
> system from the beginning, and then, as of a certain date, I want to
> break them apart into two standalone systems. In your proposed
> design, are things like this possible? Can you describe the setup
> involved?

Well, they are not actually "bonded" in any persistent way. If there are no distributed transactions, there will be no logical or physical connection between the nodes at all. And returning to your original concern about "wildly different XID counters", I want to emphasise that the only thing that travels between nodes is a GlobalCSN, during the start and commit of a distributed transaction. And that GlobalCSN is actually the timestamp of the commit, the real one, from clock_gettime(). Clock time is supposedly more or less the same on different nodes under normal conditions. But correctness does not depend on the degree of clock synchronisation; only the performance of global transactions does.
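For reference, the two settings from the list above would look like this in postgresql.conf on both nodes (these GUCs exist only with the global snapshots patch set applied; the names are taken from this thread):

```
# GUCs added by the global snapshots patch set (not in stock PostgreSQL)
track_global_snapshots = on        # write commit CSNs to the SLRU
global_snapshot_defer_time = 30    # seconds to delay oldestXmin advancement
```

After editing, restart the server (or, as noted above, carefully wait out all concurrently running transactions before starting global ones).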
--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
FinishPreparedTransaction missing HOLD_INTERRUPTS section
Hello.

It seems that during COMMIT PREPARED, FinishPreparedTransaction() doesn't hold interrupts around writing to the WAL and cleaning up the ProcArray and GXact entries. At least RemoveTwoPhaseFile (which is called in between) can print a warning with ereport(), which in turn will check for interrupts and can therefore cancel the backend or throw an error before the GXact clean-up. Other similar places, like CommitTransaction and PrepareTransaction, have such hold-interrupts sections.

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

0001-Add-HOLD_INTERRUPTS-section-into-FinishPreparedTrans.patch
Description: Binary data
Re: unused_oids script is broken with bsd sed
> On 25 Apr 2018, at 17:55, John Naylor <jcnay...@gmail.com> wrote:
>
> On 4/25/18, Stas Kelvich <s.kelv...@postgrespro.ru> wrote:
>>> On 25 Apr 2018, at 17:18, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> I think we should rewrite
>>> both of them to use the Catalog.pm infrastructure.
>>
>> Okay, seems reasonable. I'll put shared code in Catalog.pm and
>> update patch.
>
> I don't think you need any new code in Catalog.pm, I believe the
> suggestion was just to use that module as a stable interface to the
> data. Looking at your patch, I'll mention that we have an idiom for
> extracting #define'd OID symbols, e.g.:
>
> my $FirstBootstrapObjectId = Catalog::FindDefinedSymbol(
>     'access/transam.h', \@include_path, 'FirstBootstrapObjectId');
>
> This is preferred over using awk, which would have its own portability
> issues (Windows for starters).
>
> While I'm thinking out loud, it might be worthwhile to patch genbki.pl
> for the duplicate test, since they're run at the same time anyway (see
> catalog/Makefile), and we've already read all the data.
>
> -John Naylor

New version is attached. I've put an iterator into Catalog.pm to avoid copy-pasting it across the two scripts. Also, instead of grepping through the *.dat and *.h files, I've used the parsers provided by the Catalog module. That should be more resilient to format changes. Anyone who wants to help with the scripts -- feel free to rewrite.

0001-Rewrite-unused_oids-in-perl.patch
Description: Binary data

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
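The heart of an unused_oids-style report is just a gap computation over the set of assigned OIDs (the real script is Perl and reads the catalog data via Catalog.pm; this is a hypothetical Python sketch of only that computation, with FirstBootstrapObjectId as the upper bound):

```python
def unused_oid_ranges(assigned, first_bootstrap_oid):
    """Yield (lo, hi) inclusive ranges of unassigned OIDs in the
    range 1 .. first_bootstrap_oid - 1."""
    prev = 0
    for oid in sorted(set(assigned)):
        if oid > prev + 1:
            yield (prev + 1, oid - 1)     # gap between consecutive OIDs
        prev = oid
    if prev + 1 < first_bootstrap_oid:
        yield (prev + 1, first_bootstrap_oid - 1)   # trailing free range
```

Parsing the .dat files through the Catalog module instead of regexes, as the patch does, means this gap computation keeps working even if the data file format changes.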
Re: unused_oids script is broken with bsd sed
> On 25 Apr 2018, at 17:18, Tom Lane <t...@sss.pgh.pa.us> wrote:
> I think we should rewrite
> both of them to use the Catalog.pm infrastructure.

Okay, seems reasonable. I'll put shared code in Catalog.pm and update the patch.

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: unused_oids script is broken with bsd sed
> On 25 Apr 2018, at 17:18, Tom Lane <t...@sss.pgh.pa.us> wrote:
>
> Andrew Dunstan <andrew.duns...@2ndquadrant.com> writes:
>> +many for rewriting in perl. Do you want to have a go at that? If not I
>> will.
>
> +1 ... I was mildly astonished that this didn't already have to happen
> as part of the bootstrap data conversion effort. It's certainly not
> hard to imagine future extensions to the .dat file format that would
> break this script, and duplicate_oids too. I think we should rewrite
> both of them to use the Catalog.pm infrastructure.
>
> regards, tom lane

Hm, I attached the patch in the first message, but it seems that my mail client again messed with the attachment. However, the archive caught it: https://www.postgresql.org/message-id/attachment/60920/0001-Rewrite-unused_oids-in-perl.patch

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] logical decoding of two-phase transactions
> On 3 Apr 2018, at 16:56, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:
>
> So I think we need a subscription parameter to enable/disable this,
> defaulting to 'disabled'.

+1

Also, the current value of LOGICALREP_IS_COMMIT is 1, but previous code expected the flags to be zero, so this way logical replication between postgres-10 and postgres-with-2pc-decoding will be broken. So ISTM it's better to set LOGICALREP_IS_COMMIT to zero and change the flag-checking rules to accommodate that.

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] logical decoding of two-phase transactions
Hi! Thanks for working on this patch.

Reading through the patch, I've noticed that you deleted the call to SnapBuildCommitTxn() in DecodePrepare(). As you correctly spotted upthread, there was unnecessary code that marked the transaction as running after decoding of the prepare. However, the call marking it as committed before decoding of the prepare is IMHO still needed, as SnapBuildCommitTxn does some useful things, like setting the base snapshot for parent transactions which were skipped because of SnapBuildXactNeedsSkip(). E.g. the current code will crash in an assert for the following transaction:

BEGIN;
SAVEPOINT one;
CREATE TABLE test_prepared_savepoints (a int);
PREPARE TRANSACTION 'x';
COMMIT PREPARED 'x';
:get_with2pc_nofilter
:get_with2pc_nofilter  <- second call will crash the decoder

with the following backtrace:

frame #3: 0x00010dc47b40 postgres`ExceptionalCondition(conditionName="!(txn->ninvalidations == 0)", errorType="FailedAssertion", fileName="reorderbuffer.c", lineNumber=1944) at assert.c:54
frame #4: 0x00010d9ff4dc postgres`ReorderBufferForget(rb=0x7fe1ab832318, xid=816, lsn=35096144) at reorderbuffer.c:1944
frame #5: 0x00010d9f055c postgres`DecodePrepare(ctx=0x7fe1ab81b918, buf=0x7ffee2650408, parsed=0x7ffee2650088) at decode.c:703
frame #6: 0x00010d9ef718 postgres`DecodeXactOp(ctx=0x7fe1ab81b918, buf=0x7ffee2650408) at decode.c:310

That can be fixed by calling SnapBuildCommitTxn() in DecodePrepare(), which I believe is safe because during normal work a prepared transaction holds relation locks until commit/abort, and in between nobody can access the altered relations (or I just don't know of such situations -- that was the reason why I had marked those xids as running in previous versions).

> On 6 Feb 2018, at 15:20, Nikhil Sontakke <nikh...@2ndquadrant.com> wrote:
>
> Hi all,
>
>> PFA, patch which applies cleanly against latest git head. I also
>> removed unwanted newlines and took care of the cleanup TODO about
>> making ReorderBufferTXN structure using a txn_flags field instead of
>> separate booleans for various statuses like has_catalog_changes,
>> is_subxact, is_serialized etc. The patch uses this txn_flags field for
>> the newer prepare related info as well.
>>
>> "make check-world" passes ok, including the additional regular and tap
>> tests that we have added as part of this patch.
>
> PFA, latest version of this patch.
>
> This latest version takes care of the abort-while-decoding issue along
> with additional test cases and documentation changes.

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] Issues with logical replication
> On 15 Nov 2017, at 23:09, Robert Haas <robertmh...@gmail.com> wrote:
>
> Ouch. This seems like a bug that needs to be fixed, but do you think
> it's related to Petr's proposed fix to set es_output_cid? That fix
> looks reasonable, since we shouldn't try to lock tuples without a
> valid CommandId.
>
> Now, having said that, I understand how the lack of that fix could cause:
>
> 2017-10-02 18:40:26.101 MSK [2954] ERROR: attempted to lock invisible tuple
>
> But I do not understand how it could cause:
>
> #3 0x0086ac1d in XactLockTableWait (xid=0, rel=0x0, ctid=0x0,
>    oper=XLTW_None) at lmgr.c:582

I think those are two separate bugs. The second one is due to a race between XactLockTableWait() and the acquisition of the heavyweight transaction lock in AssignTransactionId(). XactLockTableWait assumes that the transaction has already set its heavyweight transaction lock, but with the current machinery of RunningXacts and snapshot building we can start such a wait before the transaction gets its heavyweight lock.

I can see two ways out of this, and both of them are quite unattractive:

1. Create a flag in PGPROC indicating that the heavyweight lock has been acquired, and include in RunningXacts only transactions with this flag set. That's probably quite expensive, since we would need to hold one more exclusive lwlock in AssignTransactionId.

2. Rewrite XactLockTableWait() so that it is aware of the fact that a valid xid can be a topmost transaction that does not yet have a heavyweight lock entry. In such a situation we can just wait and try to lock again. A good stop condition for this loop is TransactionIdIsInProgress(xid) becoming false.

I did a sketch of the first approach just to confirm that it solves the problem. But there I hold ProcArrayLock during the update of the flag. Since the only reader is GetRunningTransactionData, it is possible to have a custom lock there. In this case GetRunningTransactionData will hold three locks simultaneously, since it already holds ProcArrayLock and XidGenLock =)

Any better ideas?
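Option 2 above amounts to a wait-and-retry loop (a hypothetical Python sketch of the control flow only, with the lock primitives abstracted away as callables; the real fix would be C inside lmgr.c): keep retrying while the xid is still in progress, because "no lock entry yet" no longer implies "transaction finished".

```python
import time

def xact_lock_table_wait(xid, try_lock_xact, xact_in_progress,
                         sleep_secs=0.001):
    """Wait for a transaction to finish, tolerating the window where
    the xid is assigned but its heavyweight lock is not yet taken.

    try_lock_xact(xid) -> bool: if the xact's lock entry exists, block
    on it until released and return True; return False if there is no
    lock entry yet. xact_in_progress(xid) -> bool.
    """
    while xact_in_progress(xid):
        if try_lock_xact(xid):
            return                 # waited on the real lock; xact is done
        time.sleep(sleep_secs)     # xact hasn't taken its lock yet: retry
    # stop condition: xact no longer in progress, nothing to wait for
```

The loop terminates either because the transaction finally published its lock (we then block on it normally) or because TransactionIdIsInProgress-style checking reports it finished, which closes the race window described above.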
--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

xltw_fix.diff
Description: Binary data