Re: Add client connection check during the execution of the query

2019-07-05 Thread Stas Kelvich



> On 5 Jul 2019, at 11:46, Thomas Munro  wrote:
> 
> On Fri, Jul 5, 2019 at 6:28 PM Tatsuo Ishii  wrote:
>>> The purpose of this patch is to stop the execution of continuous
>>> requests in case of a disconnection from the client.
>> 
>> Pgpool-II already does this by sending a parameter status message to
>> the client. It is expected that clients are always prepared to receive
>> the parameter status message. This way I believe we can reliably
>> detect whether the connection to the client is broken.
> 
> Hmm.  If you send a message, it's basically application-level
> keepalive.  But it's a lot harder to be sure that the protocol and
> socket are in the right state to insert a message at every possible
> CHECK_FOR_INTERRUPTS() location.  Sergey's proposal of recv(MSG_PEEK)
> doesn't require any knowledge of the protocol at all, though it
> probably does need TCP keepalive to be configured to be useful for
> remote connections.


Well, indeed, in the case of a cable disconnect the only way to detect it with
the proposed approach is to have TCP keepalive configured. However, if the
disconnection happens because the client application shut down, then the client
OS should itself properly close the connection, and therefore this patch will
detect such a situation without keepalives configured.
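
For reference, the check boils down to something like this (a minimal
standalone sketch of the recv(MSG_PEEK) idea, not the patch itself):

#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Returns true if the client socket is known to be dead.  Peeking does not
 * consume protocol data, and MSG_DONTWAIT keeps the probe non-blocking;
 * the proposal would run such a check from CHECK_FOR_INTERRUPTS()
 * processing.
 */
static bool
client_connection_lost(int sock)
{
    char        c;
    ssize_t     r = recv(sock, &c, 1, MSG_PEEK | MSG_DONTWAIT);

    if (r == 0)
        return true;            /* orderly shutdown by the peer */
    if (r < 0 && errno != EAGAIN && errno != EWOULDBLOCK)
        return true;            /* hard error, e.g. ECONNRESET */
    return false;               /* data pending, or simply no data yet */
}

A pulled cable produces neither of the first two cases until TCP keepalive
(or a write) eventually times the connection out, which is exactly the
limitation discussed above.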

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company






Re: Read-only access to temp tables for 2PC transactions

2019-05-22 Thread Stas Kelvich


> On 14 May 2019, at 12:53, Stas Kelvich  wrote:
> 
> Hi,
> 
> That is attempt number N+1 to relax the checks for temporary table access
> in a transaction that is going to be prepared.
> 

Konstantin Knizhnik made an off-list review of this patch and spotted
a few problems.

* Incorrect reasoning that the ON COMMIT DELETE ROWS truncate mechanism
should be changed in order to allow preparing transactions with
read-only access to temp relations. It can actually be left as is.
The ON COMMIT DELETE ROWS changes in the previous patch may be a
performance win, but they are not directly related to this topic, so
I've deleted that part.

* A copy-paste error in the check conditions in
relation_open/relation_try_open.

Fixed version attached.

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



2PC-ro-temprels-v2.patch
Description: Binary data


Read-only access to temp tables for 2PC transactions

2019-05-14 Thread Stas Kelvich
Hi,

That is attempt number N+1 to relax the checks for temporary table access
in a transaction that is going to be prepared.

One of the problems with using temporary tables in prepared transactions is
that such a transaction will hold locks on a temporary table after being
prepared. Those locks will prevent the backend from exiting, since it will
fail to acquire the lock needed to delete the temp table during exit. Also,
re-acquiring such a lock after a server restart seems like an ill-defined
operation.

I tried to allow prepared transactions that opened a temporary relation only
in AccessShare mode, and then to neither transfer this lock to a dummy PGPROC
nor include it in the 'prepare' record. Such a prepared transaction will not
prevent the backend from exiting and can be committed from another backend or
after a restart.
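
The skipping itself would look roughly like this inside the loop that
transfers locks at PREPARE time (illustrative only; is_temp_relation() is an
assumed helper, not an existing function, and the actual patch may do this
differently):

/* in the AtPrepare_Locks() loop over the backend's LOCALLOCK table */
if (locallock->tag.lock.locktag_type == LOCKTAG_RELATION &&
    locallock->tag.mode == AccessShareLock &&
    is_temp_relation(locallock->tag.lock.locktag_field2))
    continue;               /* keep the lock in the backend itself: neither
                             * moved to the dummy PGPROC nor written into
                             * the prepare record */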

However, that modification allows a new DDL-related serialization anomaly: it
becomes possible to prepare a transaction that read table A, then drop A, then
commit the transaction. I'm not sure whether that is worse than not being able
to access temp relations at all. On the other hand, it is possible to drop
AccessShare locks only for temporary relations and leave the behavior for
ordinary tables unchanged (in the attached patch this is done for all tables).

Also, I slightly modified the ON COMMIT DELETE ROWS code path. Right now all
ON COMMIT DELETE ROWS temp tables are linked into a static list, and if a
transaction accessed any temp table in any mode, then during commit all tables
from that list will be truncated. For this patch that means that even if a
transaction only read from a temp table, it may still access other temp tables
with a high lock mode during commit. I've added a hashtable that tracks
higher-than-AccessShare access to temp tables during the current transaction,
so that during commit only tables from that hash will be truncated. That way
ON COMMIT DELETE ROWS tables in the backend will not prevent read-only access
to some other table in that backend.
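
The tracking amounts to roughly the following (a sketch; the names are mine,
not the patch's):

#include "postgres.h"
#include "storage/lockdefs.h"
#include "utils/hsearch.h"

/* temp relations touched with more than AccessShareLock in this xact */
static HTAB *temp_rels_to_truncate = NULL;

static void
remember_temp_rel_access(Oid relid, LOCKMODE lockmode)
{
    if (lockmode <= AccessShareLock)
        return;                 /* read-only access: no truncate at commit */

    if (temp_rels_to_truncate == NULL)
    {
        HASHCTL     ctl;

        MemSet(&ctl, 0, sizeof(ctl));
        ctl.keysize = sizeof(Oid);
        ctl.entrysize = sizeof(Oid);
        temp_rels_to_truncate = hash_create("temp rels to truncate",
                                            16, &ctl,
                                            HASH_ELEM | HASH_BLOBS);
    }
    hash_search(temp_rels_to_truncate, &relid, HASH_ENTER, NULL);
}

At commit, the ON COMMIT machinery would then truncate only the relations
present in this hash rather than every ON COMMIT DELETE ROWS table.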

Any thoughts?

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



2PC-ro-temprels.patch
Description: Binary data


XLogInsert() of dangling pointer while logging replica identity

2019-01-31 Thread Stas Kelvich
Hi, hackers.

  It seems that heapam.c:3082 calls XLogRegisterData() with an argument
allocated on the stack, but the following call to XLogInsert() happens after
the end of scope for that variable.
  The issue was spotted by clang's AddressSanitizer. Fix attached.
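
Reduced to its essence, it is a classic use-after-scope: XLogRegisterData()
only records a pointer into the rdata chain, and XLogInsert() dereferences it
later. A standalone demonstration of the pattern (my reduction, not the
heapam.c code itself):

#include <stdio.h>
#include <string.h>

static const char *registered;          /* stands in for the WAL rdata chain */

static void
xlog_register_data(const char *data)    /* cf. XLogRegisterData(): stores ptr */
{
    registered = data;
}

static void
xlog_insert(void)                       /* cf. XLogInsert(): reads it later */
{
    char        copy[16];

    memcpy(copy, registered, sizeof(copy));     /* UB if the pointee is dead */
    printf("%.16s\n", copy);
}

int
main(void)
{
    {
        char        hdr[16] = "tuple header";   /* block-scoped stack variable */

        xlog_register_data(hdr);
    }                                   /* hdr's lifetime ends here ... */
    xlog_insert();                      /* ... but it is dereferenced here */
    return 0;
}

Moving the variable declaration up to the enclosing scope, so that it outlives
the XLogInsert() call, is enough to fix it.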

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



0001-Fix-use-after-scope.patch
Description: Binary data


Re: Global snapshots

2019-01-31 Thread Stas Kelvich



> On 31 Jan 2019, at 18:42, Andres Freund  wrote:
> 
> Hi,
> 
> On 2018-11-30 16:00:17 +0300, Stas Kelvich wrote:
>>> On 29 Nov 2018, at 18:21, Dmitry Dolgov <9erthali...@gmail.com> wrote:
>>> Is there any resulting patch where the ideas for how to implement this
>>> are outlined?
>> 
>> Not yet. I'm going to continue work on this in January, and will probably
>> try to rope some of the nearby committers into a line-by-line review.
> 
> This hasn't happened yet, so I think this ought to be marked as returned
> with feedback?
> 

No objections. I don't think this will realistically make it into the last CF,
so I will reopen it during the next release cycle.

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: Global snapshots

2018-05-16 Thread Stas Kelvich
…behaviour, but that probably should be used as protection from
unusually long transactions rather than as standard behaviour.

> For example, if you constructed the happens-after graph between
> transactions in shared memory, including actions on all nodes, and
> looked for cycles, you could abort transactions that would complete a
> cycle.  (We say A happens-after B if A reads or writes data previously
> written by B.)  If no cycle exists then all is well.

Well, again, it seems to me that any kind of transaction scheduler that
guarantees that an RO transaction will not abort (even a special kind of RO
like read-only deferred) needs to keep old versions.

Speaking of alternative approaches, a good evaluation of algorithms can be
found in [HARD17]. The Postgres model is close to the MVCC described in the
article, and if we enable STO with a small timeout then it becomes close to
the article's TIMESTAMP algorithm. The results show that both MVCC and
TIMESTAMP are less performant than the CALVIN approach =) But that one is
quite different from what has been done in Postgres (and probably in all other
databases except Calvin/Fauna itself) over the last 20-30 years.

Also, looking through a bunch of articles, I found that one of the first
articles about MVCC, [REED78] (I thought the first was [BERN83], but it
actually references a bunch of earlier articles, and [REED78] is one of them),
was actually about distributed transactions, and it uses more or less the same
approach, with "pseudo-time" in its terminology, to order transactions and
assign snapshots.

[HARD17] https://dl.acm.org/citation.cfm?id=3055548
[REED78] https://dl.acm.org/citation.cfm?id=889815
[BERN83] https://dl.acm.org/citation.cfm?id=319998

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: Global snapshots

2018-05-14 Thread Stas Kelvich


> On 11 May 2018, at 04:05, Masahiko Sawada <sawada.m...@gmail.com> wrote:
> 
> If I understand correctly, simple writes with ordinary 2PC don't block
> readers of those writes. For example, an insertion on a node doesn't block
> readers who read the inserted tuple. But with this global transaction,
> reads will be blocked while the global transaction is in the InDoubt state.
> Is that right? The InDoubt state is short-lived for a local transaction,
> but I'm not sure about a global transaction. Because during the InDoubt
> state the coordinator has to prepare on all participant nodes and assign
> the global csn to them (and end the global transaction), the global
> transaction could be in the InDoubt state for a relatively long time.

What I meant by "short-lived" is that InDoubt is set after the transaction is
prepared, so it doesn't depend on the size of the transaction, only on
network/commit latency. So you can have a transaction that did a bulk load for
several hours, and its InDoubt state will still last only for a network
round-trip plus possibly an fsync during logging of the commit record.

> Also, it could be much longer if the commit/rollback prepared is never
> performed due to a failure of any of the nodes.

In this case it definitely can. An individual node cannot know whether that
InDoubt transaction was committed or aborted elsewhere, so it has to wait
until somebody finishes that tx. The particular time to recover depends on how
failures are handled.

Speaking more generally, in the presence of failures we can unlock tuples only
when consensus on the transaction commit is reached. And the FLP theorem
states that this can take an indefinitely long time in a fully asynchronous
network. However, from a more practical point of view, the probability of such
behaviour in a real network becomes negligible after several iterations of the
voting process (some evaluations can be found in
https://ieeexplore.ieee.org/document/1352999/).
So several round-trips are a decent approximation of how long it should take
to recover from the InDoubt state in case of failure.

> With this patch, can we start a remote transaction at READ COMMITTED
> with an imported global snapshot if the local transaction started at
> READ COMMITTED?

In theory it is possible; one just needs to send a new snapshot before each
statement. With some amount of careful work it is possible to achieve
READ COMMITTED with postgres_fdw using global snapshots.


--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: Global snapshots

2018-05-14 Thread Stas Kelvich


> On 9 May 2018, at 17:51, Robert Haas <robertmh...@gmail.com> wrote:
> 
> Ouch.  That's not so bad at READ COMMITTED, but at higher isolation
> levels failure becomes extremely likely.  Any multi-statement
> transaction that lasts longer than global_snapshot_defer_time is
> pretty much doomed.

Ouch indeed. The current xmin-holding scheme has two major drawbacks: it
introduces a timeout between export and import of a snapshot, and it holds
xmin pessimistically, so old versions are preserved even when there are no
global transactions. On the positive side is simplicity: it is the only way I
can think of that doesn't require distributed calculation of a global xmin,
which, in turn, would probably require a permanent connection to the remote
postgres_fdw node. It is not hard to add a background worker to postgres_fdw
that holds a permanent connection, but I'm afraid that is a very
discussion-prone topic, and that's why I tried to avoid it.

> I don't think holding back xmin is a very good strategy.  Maybe it
> won't be so bad if and when we get zheap, since only the undo log will
> bloat rather than the table.  But as it stands, holding back xmin
> means everything bloats and you have to CLUSTER or VACUUM FULL the
> table in order to fix it.

Well, an open local transaction in Postgres holds globalXmin for the whole
Postgres instance (with the exception of STO). Likewise, an active global
transaction has to hold the globalXmin of the participating nodes to be able
to read the right versions (again, with the exception of STO).
  However, the xmin-holding scheme itself can be different. For example, we
can periodically check (let's say every 1-2 seconds) the oldest GlobalCSN on
each node and delay globalXmin advancement only if some long transaction
really exists. Then the period of bloat would be limited to those 1-2 seconds,
and no timeout would be imposed between export and import.
  Also, I want to note that global_snapshot_defer_time values of about tens of
seconds don't change much in terms of bloat compared to logical replication.
An active logical slot holds globalXmin by setting replication_slot_xmin,
which is advanced on every RunningXacts record, which in turn is logged every
15 seconds (hardcoded in LOG_SNAPSHOT_INTERVAL_MS).

> If the behavior were really analogous to our existing "snapshot too
> old" feature, it wouldn't be so bad.  Old snapshots continue to work
> without error so long as they only read unmodified data, and only
> error out if they hit modified pages.

That is actually a good idea that I missed, thanks. Indeed, since all the
logic for checking modified pages is already present, it is possible to reuse
it and not raise a "Global STO" error right when an old snapshot is imported,
but only when the global transaction reads a modified page. I will implement
that and update the patch set.

Summarising, I think that introducing permanent connections to postgres_fdw
nodes would put too much burden on this patch set, and that it will be
possible to address that later (in the long run such a connection will be
needed anyway, at least for deadlock detection). However, if you think that
the current behavior plus an STO analog isn't good enough, then I'm ready to
pursue that track.


--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: Global snapshots

2018-05-08 Thread Stas Kelvich


> On 7 May 2018, at 20:04, Robert Haas <robertmh...@gmail.com> wrote:
> 
> But what happens if a transaction starts on node A at time T0 but
> first touches node B at a much later time T1, such that T1 - T0 >
> global_snapshot_defer_time?
> 

Such a transaction will get a "global snapshot too old" error.

In principle such behaviour can be avoided by calculating the oldest global
CSN among all cluster nodes, so that the oldest xmin on a particular node is
held back only when there is some open old transaction on another node. That's
easy to do from the global snapshot point of view, but it's not obvious how to
integrate it into postgres_fdw. It would probably require a bi-directional
connection between postgres_fdw nodes (distributed deadlock detection would
also be easy with such a connection).

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: Global snapshots

2018-05-06 Thread Stas Kelvich


> On 4 May 2018, at 22:09, Robert Haas <robertmh...@gmail.com> wrote:
> 
> So, is the idea that we'll definitely find out about any remote
> transactions within 30 seconds, and then after we know about remote
> transactions, we'll hold back OldestXmin some other way?

Yes, kind of. There is a procArray->global_snapshot_xmin variable which,
when set, acts as a barrier to xmin calculations in GetOldestXmin and
GetSnapshotData.

Also, each second GetSnapshotData writes globalxmin (as it was before
procArray->global_snapshot_xmin was taken into account) into a circular buffer
whose size equals the global_snapshot_defer_time value. That is more or less
the same thing as the Snapshot Too Old feature, but with a bucket size of
1 second instead of 1 minute. procArray->global_snapshot_xmin is always set to
the oldest value in the circular buffer.

This way the xmin calculation always gives the value from
global_snapshot_defer_time seconds ago, and we have a mapping from time (or
GlobalCSN) to globalxmin for each second in this range. So when some backend
imports a global snapshot with some GlobalCSN, that GlobalCSN is mapped to an
xmin and this xmin is set as Proc->xmin.
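
In code, the mapping amounts to roughly this (a standalone sketch with
illustrative names, not the patch itself):

#include <stdint.h>

typedef uint32_t TransactionId;

#define DEFER_TIME 30       /* stands in for global_snapshot_defer_time, secs */

static TransactionId xmin_ring[DEFER_TIME];     /* one slot per second */
static int           ring_pos = 0;
static TransactionId global_snapshot_xmin = 0;  /* the barrier value */

/*
 * Called once per second with the globalxmin computed before the barrier is
 * applied.  The slot we are about to overwrite next holds the oldest recorded
 * value, i.e. the xmin from DEFER_TIME seconds ago.
 */
static void
advance_xmin_ring(TransactionId globalxmin)
{
    xmin_ring[ring_pos] = globalxmin;
    ring_pos = (ring_pos + 1) % DEFER_TIME;
    global_snapshot_xmin = xmin_ring[ring_pos];
}

/* Map "seconds ago" (derived from an imported GlobalCSN) back to the
 * globalxmin recorded then, to be used as the importing backend's xmin. */
static TransactionId
xmin_for_seconds_ago(int ago)
{
    return xmin_ring[(ring_pos - 1 - ago + 2 * DEFER_TIME) % DEFER_TIME];
}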

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: Global snapshots

2018-05-03 Thread Stas Kelvich


> On 3 May 2018, at 18:28, Masahiko Sawada <sawada.m...@gmail.com> wrote:
> 
> On Wed, May 2, 2018 at 1:27 AM, Stas Kelvich <s.kelv...@postgrespro.ru> wrote:
>> 1) To achieve commit atomicity across nodes an intermediate step is
>>   introduced: first the running transaction is marked as InDoubt on all
>>   nodes, and only after that does each node commit it and stamp it with a
>>   given GlobalCSN. All readers who run into tuples of an InDoubt
>>   transaction should wait until it ends and recheck visibility.
> 
> I'm concerned that a long-running transaction could keep other
> transactions waiting and then the system gets stuck. Can this happen?
> And is there any workaround?

The InDoubt state is set just before commit, so it is a short-lived state.
During transaction execution the global tx looks like an ordinary running
transaction. A failure during 2PC where the coordinator is unable to
commit/abort the prepared transaction can result in a situation where InDoubt
tuples stay locked for reading, but in such a situation the coordinator should
be blamed. The same problems arise if a prepared transaction holds locks, for
example.

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: Global snapshots

2018-05-02 Thread Stas Kelvich


> On 2 May 2018, at 05:58, Peter Eisentraut <peter.eisentr...@2ndquadrant.com> wrote:
> 
> On 5/1/18 12:27, Stas Kelvich wrote:
>> Clock-SI is described in [5] and here I provide a small overview, which
>> supposedly should be enough to catch the idea. Assume that each node runs
>> Commit Sequence Number (CSN) based visibility: the database tracks one
>> counter for each transaction start (xid) and another counter for each
>> transaction commit (csn). In such a setting, a snapshot is just a single
>> number -- a copy of the current CSN at the moment when the snapshot was
>> taken. Visibility rules boil down to checking whether the current tuple's
>> CSN is less than our snapshot's csn. It is also worth mentioning that for
>> the last 5 years there has been an active proposal to switch Postgres to
>> CSN-based visibility [6].
> 
> But that proposal has so far not succeeded.  How are you overcoming the
> reasons for that?

Well, the CSN proposal aims to replace all of Postgres's visibility machinery
with CSNs. That proposal is far more ambitious; here the original Postgres
visibility, with snapshots being arrays of XIDs, is preserved. In this patch
CSNs are written to an SLRU during commit (much as commit_ts does) but are
read only in two cases:

1) When a local transaction encounters an XID that is in progress according
to the XIP-based snapshot, its CSN needs to be checked, as the transaction may
already be InDoubt. XIDs that appear committed don't need that check (in [6]
they also need to be checked through the SLRU).

2) If we are in a backend that imported a global snapshot, then only CSN-based
visibility can be used. But that happens only for global transactions.

So I hope that local transaction performance will be affected only by the
in-progress check, and there are ways to circumvent this check.

Also, all this behaviour is optional and can be switched off by not enabling
track_global_snapshots.
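
For reference, the core of the CSN visibility rule quoted above is as small as
this (an illustrative sketch; the real check must also handle in-progress and
InDoubt transactions):

#include <stdbool.h>
#include <stdint.h>

typedef uint64_t GlobalCSN;

/*
 * A snapshot is a single number: the CSN at the moment it was taken.
 * A tuple is visible if its writer committed before that moment;
 * csn == 0 here means "not (yet) committed".
 */
static bool
csn_visible(GlobalCSN tuple_csn, GlobalCSN snapshot_csn)
{
    return tuple_csn != 0 && tuple_csn < snapshot_csn;
}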


--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: Global snapshots

2018-05-01 Thread Stas Kelvich


> On 1 May 2018, at 22:43, Robert Haas <robertmh...@gmail.com> wrote:
> 
> I'm concerned about the provisioning aspect of this problem.  Suppose
> I have two existing database systems with, perhaps, wildly different
> XID counters.  On a certain date, I want to start using this system.

Yes, that is totally possible. On both systems you need to:

* set track_global_snapshots='on' -- this will start writing each
  transaction's commit sequence number to an SLRU.
* set global_snapshot_defer_time to 30 seconds, for example -- this
  will delay oldestXmin advancement for the specified amount of time,
  preserving old tuples.
* restart the database.
* optionally enable NTPd if it wasn't enabled.

It is also possible to avoid the restart, but that would require some careful
work: after enabling track_global_snapshots it is safe to start global
transactions only once all concurrently running transactions have finished. A
more or less equivalent thing happens during logical slot creation.

> Or conversely, I have two systems that are bonded together using this
> system from the beginning, and then, as of a certain date, I want to
> break them apart into two standalone systems.  In your proposed
> design, are things like this possible?  Can you describe the setup
> involved?

Well, they are not actually "bonded" in any persistent way. If there are no
distributed transactions, there will be no logical or physical connection
between those nodes.

Returning to your original concern about "wildly different XID counters", I
want to emphasise that the only thing that flows between nodes is GlobalCSNs,
during the start and commit of a distributed transaction. And a GlobalCSN is
actually a timestamp of commit, a real one, from clock_gettime(). Clock time
is supposedly more or less the same on different nodes under normal
conditions. But correctness here does not depend on the degree of clock
synchronisation; only the performance of global transactions does.

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





FinishPreparedTransaction missing HOLD_INTERRUPTS section

2018-04-27 Thread Stas Kelvich
Hello.

It seems that during COMMIT PREPARED, FinishPreparedTransaction() doesn't hold
interrupts around writing to WAL and cleaning up the ProcArray and GXact
entries. At least RemoveTwoPhaseFile() (which is called in between) can print
a warning with ereport(), which in turn will check for interrupts and can
therefore cancel the backend or throw an error before the GXact clean-up.

Other similar places, like CommitTransaction() and PrepareTransaction(), have
such hold-interrupts sections.
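
The shape of the fix is roughly this (a sketch of where the section belongs in
FinishPreparedTransaction(); see the attached patch for the exact placement):

HOLD_INTERRUPTS();

/*
 * Nothing between writing the commit record and removing the GXact entry
 * should process a cancel: RemoveTwoPhaseFile() can ereport(WARNING), and
 * errfinish() ends with a CHECK_FOR_INTERRUPTS().
 */
RecordTransactionCommitPrepared(...);   /* WAL commit record; args elided */
RemoveTwoPhaseFile(xid, true);          /* may ereport(WARNING) -- now safe */
ProcArrayRemove(proc, latestXid);       /* drop the dummy PGPROC entry */

RESUME_INTERRUPTS();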

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



0001-Add-HOLD_INTERRUPTS-section-into-FinishPreparedTrans.patch
Description: Binary data


Re: unused_oids script is broken with bsd sed

2018-04-25 Thread Stas Kelvich


> On 25 Apr 2018, at 17:55, John Naylor <jcnay...@gmail.com> wrote:
> 
> On 4/25/18, Stas Kelvich <s.kelv...@postgrespro.ru> wrote:
>>> On 25 Apr 2018, at 17:18, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> I think we should rewrite
>>> both of them to use the Catalog.pm infrastructure.
>> 
>> Okay, seems reasonable. I'll put the shared code in Catalog.pm and
>> update the patch.
> 
> I don't think you need any new code in Catalog.pm; I believe the
> suggestion was just to use that module as a stable interface to the
> data. Looking at your patch, I'll mention that we have an idiom for
> extracting #define'd OID symbols, e.g.:
> 
> my $FirstBootstrapObjectId = Catalog::FindDefinedSymbol(
>   'access/transam.h', \@include_path, 'FirstBootstrapObjectId');
> 
> This is preferred over using awk, which would have its own portability
> issues (Windows for starters).
> 
> While I'm thinking out loud, it might be worthwhile to patch genbki.pl
> for the duplicate test, since they're run at the same time anyway (see
> catalog/Makefile), and we've already read all the data.
> 
> -John Naylor
> 


New version is attached. I've put the iterator into Catalog.pm to avoid
copy-pasting it across the two scripts. Also, instead of grepping through the
*.dat and *.h files, I've used the parsers provided by the Catalog module.
That should be more resilient to format changes.

Anyone who wants to help with the scripts -- feel free to rewrite.




0001-Rewrite-unused_oids-in-perl.patch
Description: Binary data



--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: unused_oids script is broken with bsd sed

2018-04-25 Thread Stas Kelvich


> On 25 Apr 2018, at 17:18, Tom Lane <t...@sss.pgh.pa.us> wrote:
>  I think we should rewrite
> both of them to use the Catalog.pm infrastructure.

Okay, seems reasonable. I'll put the shared code in Catalog.pm and
update the patch.

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: unused_oids script is broken with bsd sed

2018-04-25 Thread Stas Kelvich


> On 25 Apr 2018, at 17:18, Tom Lane <t...@sss.pgh.pa.us> wrote:
> 
> Andrew Dunstan <andrew.duns...@2ndquadrant.com> writes:
>> +many for rewriting in perl. Do you want to have a go at that? If not I
>> will.
> 
> +1 ... I was mildly astonished that this didn't already have to happen
> as part of the bootstrap data conversion effort.  It's certainly not
> hard to imagine future extensions to the .dat file format that would
> break this script, and duplicate_oids too.  I think we should rewrite
> both of them to use the Catalog.pm infrastructure.
> 
>   regards, tom lane

Hm, I attached the patch to the first message, but it seems that my mail
client again messed with the attachment. However, the archive caught it:

https://www.postgresql.org/message-id/attachment/60920/0001-Rewrite-unused_oids-in-perl.patch

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] logical decoding of two-phase transactions

2018-04-03 Thread Stas Kelvich


> On 3 Apr 2018, at 16:56, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:
> 
> 
> So I think we need a subscription parameter to enable/disable this,
> defaulting to 'disabled'.

+1

Also, the current value of LOGICALREP_IS_COMMIT is 1, but the previous code
expected the flags to be zero, so as it stands logical replication between
postgres-10 and postgres-with-2pc-decoding will be broken. So ISTM it's better
to set LOGICALREP_IS_COMMIT to zero and change the flag-checking rules to
accommodate that.
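
In other words, something like this (a sketch; the prepare flag name here is
mine, for illustration):

#define LOGICALREP_IS_COMMIT    0x00    /* plain commit: flags stay zero, so
                                         * a postgres-10 subscriber still
                                         * accepts the message */
#define LOGICALREP_IS_PREPARE   0x01    /* two-phase prepare */

/* reader side: reject only bits we genuinely don't know */
if ((flags & ~LOGICALREP_IS_PREPARE) != 0)
    elog(ERROR, "unrecognized flags %u in commit message", flags);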

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] logical decoding of two-phase transactions

2018-02-08 Thread Stas Kelvich
Hi!

Thanks for working on this patch.

Reading through the patch, I've noticed that you deleted the call to
SnapBuildCommitTxn() in DecodePrepare(). As you correctly spotted upthread,
there was unnecessary code that marked the transaction as running after
decoding of the prepare. However, the call marking it as committed before
decoding of the prepare is IMHO still needed, as SnapBuildCommitTxn() does
useful things like setting the base snapshot for parent transactions that were
skipped because of SnapBuildXactNeedsSkip().

E.g. the current code will hit an assertion failure for the following
transaction:

BEGIN;
SAVEPOINT one;
CREATE TABLE test_prepared_savepoints (a int);
PREPARE TRANSACTION 'x';
COMMIT PREPARED 'x';
:get_with2pc_nofilter
:get_with2pc_nofilter  <- second call will crash decoder


With the following backtrace:

    frame #3: 0x00010dc47b40 postgres`ExceptionalCondition(conditionName="!(txn->ninvalidations == 0)", errorType="FailedAssertion", fileName="reorderbuffer.c", lineNumber=1944) at assert.c:54
    frame #4: 0x00010d9ff4dc postgres`ReorderBufferForget(rb=0x7fe1ab832318, xid=816, lsn=35096144) at reorderbuffer.c:1944
    frame #5: 0x00010d9f055c postgres`DecodePrepare(ctx=0x7fe1ab81b918, buf=0x7ffee2650408, parsed=0x7ffee2650088) at decode.c:703
    frame #6: 0x00010d9ef718 postgres`DecodeXactOp(ctx=0x7fe1ab81b918, buf=0x7ffee2650408) at decode.c:310

That can be fixed by calling SnapBuildCommitTxn() in DecodePrepare(), which I
believe is safe because during normal operation a prepared transaction holds
relation locks until commit/abort, and in between nobody can access the
altered relations (or at least I don't know of such situations -- that was the
reason why I had marked those xids as running in previous versions).
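
Concretely, that would be along these lines in DecodePrepare() (a sketch,
assuming the usual snapbuild.c signature):

/* treat the prepared transaction as committed for snapshot building,
 * so skipped parent transactions still get their base snapshot set */
SnapBuildCommitTxn(ctx->snapshot_builder, buf->origptr, xid,
                   parsed->nsubxacts, parsed->subxacts);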


> On 6 Feb 2018, at 15:20, Nikhil Sontakke <nikh...@2ndquadrant.com> wrote:
> 
> Hi all,
> 
>> 
>> PFA, patch which applies cleanly against latest git head. I also
>> removed unwanted newlines and took care of the cleanup TODO about
>> making ReorderBufferTXN structure using a txn_flags field instead of
>> separate booleans for various statuses like has_catalog_changes,
>> is_subxact, is_serialized etc. The patch uses this txn_flags field for
>> the newer prepare related info as well.
>> 
>> "make check-world" passes ok, including the additional regular and tap
>> tests that we have added as part of this patch.
>> 
> 
> PFA, latest version of this patch.
> 
> This latest version takes care of the abort-while-decoding issue along
> with additional test cases and documentation changes.
> 
> 


--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Issues with logical replication

2017-11-16 Thread Stas Kelvich

> On 15 Nov 2017, at 23:09, Robert Haas <robertmh...@gmail.com> wrote:
> 
> Ouch.  This seems like a bug that needs to be fixed, but do you think
> it's related to Petr's proposed fix to set es_output_cid?  That fix
> looks reasonable, since we shouldn't try to lock tuples without a
> valid CommandId.
> 
> Now, having said that, I understand how the lack of that fix could cause:
> 
> 2017-10-02 18:40:26.101 MSK [2954] ERROR:  attempted to lock invisible tuple
> 
> But I do not understand how it could cause:
> 
> #3  0x0086ac1d in XactLockTableWait (xid=0, rel=0x0, ctid=0x0,
> oper=XLTW_None) at lmgr.c:582
> 

I think those are two separate bugs. The second one is due to a race between
XactLockTableWait() and the acquisition of the heavyweight transaction lock in
AssignTransactionId(). XactLockTableWait() assumes that the transaction has
already taken its hw transaction lock, but with the current machinery of
RunningXacts and snapshot building we can start such a wait before the
transaction takes its hw lock.

I can see two ways out of this, and both of them are quite unattractive:

1. Create a flag in PGPROC indicating that the hw lock is acquired, and
include in RunningXacts only transactions with this flag set.
That's probably quite expensive, since we need to hold one more exclusive
lwlock in AssignTransactionId().

2. Rewrite XactLockTableWait() so that it is aware that a valid xid can
belong to a topmost transaction that doesn't have a hw lock entry yet.
In such a situation we can just wait and try to lock again, as sketched
below. A good stop condition for this loop is TransactionIdIsInProgress(xid)
becoming false.
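
A rough sketch of that loop (hypothetical: wait_for_xact_lock() is an assumed
helper that returns false when the xid has no heavyweight lock entry yet,
rather than a function that exists today):

for (;;)
{
    if (wait_for_xact_lock(xid))
        break;                      /* waited on a real lock entry */
    if (!TransactionIdIsInProgress(xid))
        break;                      /* xact finished; nothing to wait for */
    pg_usleep(1000L);               /* xid assigned, hw lock not taken yet */
}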

I did a sketch of the first approach just to confirm that it solves the
problem, but there I hold ProcArrayLock during the update of the flag. Since
the only reader is GetRunningTransactionData(), it would be possible to have a
custom lock there. In that case GetRunningTransactionData() would hold three
locks simultaneously, since it already holds ProcArrayLock and XidGenLock =)

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



xltw_fix.diff
Description: Binary data