Re: [HACKERS] Exclude pg_internal.init from base backup

2017-11-03 Thread Petr Jelinek
Hi,

On 02/09/17 21:08, David Steele wrote:
> Hackers,
> 
> The cache in pg_internal.init was reused in days of yore but has been
> rebuilt on postmaster startup since v8.1.  It appears there is no reason
> for this file to be backed up.
> 

Makes sense.

> I also moved the RELCACHE_INIT_FILENAME constant to relcache.h to avoid
> duplicating the string.

+1

> +++ b/doc/src/sgml/protocol.sgml
> @@ -2384,6 +2384,11 @@ The commands accepted in walsender mode are:
> 
> 
>  
> + pg_internal.init
> +
> +   
> +   
> +

Not a problem specific to this patch, but I wonder if it should be made
clearer that those files (there are a couple above the one you added)
are skipped no matter which directory they reside in.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Walsender timeouts and large transactions

2017-11-03 Thread Petr Jelinek
Hi,

thanks for checking.

On 02/11/17 10:00, Robert Haas wrote:
> On Wed, Nov 1, 2017 at 8:19 PM, Petr Jelinek
> <petr.jeli...@2ndquadrant.com> wrote:
>> sorry for the delay but I didn't have much time in past weeks to follow
>> this thread.
> 
> +TimestampTz now = GetCurrentTimestamp();
> +
>  /* output previously gathered data in a CopyData packet */
>  pq_putmessage_noblock('d', ctx->out->data, ctx->out->len);
> 
>  /*
>   * Fill the send timestamp last, so that it is taken as late as possible.
>   * This is somewhat ugly, but the protocol is set as it's already used 
> for
>   * several releases by streaming physical replication.
>   */
>  resetStringInfo(&tmpbuf);
> -pq_sendint64(&tmpbuf, GetCurrentTimestamp());
> +pq_sendint64(&tmpbuf, now);
>  memcpy(&ctx->out->data[1 + sizeof(int64) + sizeof(int64)],
> tmpbuf.data, sizeof(int64));
> 
> This change falsifies the comments.  Maybe initialize now just after
> resetStringInfo() is done.

Eh, right, I can do that.

> 
> -/* fast path */
> -/* Try to flush pending output to the client */
> -if (pq_flush_if_writable() != 0)
> -WalSndShutdown();
> +/* Try taking fast path unless we get too close to walsender timeout. */
> +if (now < TimestampTzPlusMilliseconds(last_reply_timestamp,
> +  wal_sender_timeout / 2))
> +{
> +CHECK_FOR_INTERRUPTS();
> 
> -if (!pq_is_send_pending())
> -return;
> +/* Try to flush pending output to the client */
> +if (pq_flush_if_writable() != 0)
> +WalSndShutdown();
> +
> +if (!pq_is_send_pending())
> +return;
> +}
> 
> I think it's only the if (!pq_is_send_pending()) return; that needs to
> be conditional here, isn't it?  The pq_flush_if_writable() can be done
> unconditionally.
> 

Well, even the CHECK_FOR_INTERRUPTS() can be called unconditionally, yes.
It just seems like a needless call, as we'll call both in the for loop
anyway if we take the "slow" path. I admit it's not exactly a big win
though. If you think it would improve readability I can move it.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Walsender timeouts and large transactions

2017-11-01 Thread Petr Jelinek
Hi,

sorry for the delay but I didn't have much time in past weeks to follow
this thread.

On 02/10/17 05:44, Kyotaro HORIGUCHI wrote:
> Hello Sokolov.
> 
> At Fri, 29 Sep 2017 15:19:23 +0300, Sokolov Yura 
> <funny.fal...@postgrespro.ru> wrote in 
> <d076dae18b437be89c787a854034f...@postgrespro.ru>
>> I don't want to make test to lasts so long and generate so many data.
>> That is why I used such small timeouts for tests.
> 
> I understand your point, but still *I* think such a short timeout
> is out of expectation by design. (But it can be set.)
> 
> Does anyone have opinions on this?

Yes, it's not practically useful to have such a small wal_sender_timeout
given that its main purpose is to detect network issues.

> 
>> Test is failing if there is "short quit" after
>> `!pq_is_send_pending()`,
>> so I doubt your patch will pass the test.
> 
> It is because I think that the test "should" fail since the
> timeout is out of expected range. I (and perhaps also Petr) is
> thinking that the problem is just that a large transaction causes
> a timeout with an ordinary timeout. My test case is based on the
> assumption.
> 
> Your test is for a timeout during replication-startup with
> extremely short timeout. This may be a different problem to
> discuss, but perhaps better to be solved together.
> 
> I'd like to have opnions from others on this point.

I think it's a different problem and, because of what I wrote above, it does
not seem to me like something we should spend our time trying to fix.
> 
> Uggh! I misunderstood there. It waits for writing socket so the
> sleep is wrong and WaitLatchOrSocket is right.
> 
> After all, I put +1 for Petr's latest patch. Sorry for my
> carelessness.
> 

Great, I attached version 3, which is just rebased on current master and
also does not move the GetCurrentTimestamp() call, because the concern
about skewing the timestamp during config reload (and also network flush,
as I realized later) is valid.

It's rather hard to test all the scenarios that this patch fixes in an
automated fashion without generating a lot of WAL or adding sleeps to
the processing. That's why I didn't produce a usable TAP test.

Since it seems like some of my reasoning is unclear I will try to
describe it once more.

The main problem we have is that unless we call ProcessRepliesIfAny()
before wal_sender_timeout expires, we'll eventually die because of the
timeout. The current coding will skip that call if there is a long
transaction being processed (and the network is fast enough). This is
what the first part (first 2 hunks) of the patch solves. There is also
the issue that while this is happening the walsender ignores signals, so
it's impossible to stop it, which is why I added CHECK_FOR_INTERRUPTS()
to the fast path.

The second problem is that if we solved just the first one, then if the
downstream (and again the network) is fast enough, ProcessRepliesIfAny()
will not do anything useful, because the downstream is not going to send
any response while the network buffer contains data. This is caused by
the fact that we normally code the receiver side to keep receiving as
long as there is something to receive, and only send a reply when there
is a "pause" in the stream. To get around this problem we also need to
make sure that we call WalSndKeepaliveIfNecessary() periodically, and
that will not happen on a fast network unless we do the second part of
the patch (3rd hunk), i.e., move the pq_is_send_pending() check after
the keepalive handling.
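
For illustration, here is a minimal standalone sketch of the timing rule
described above ("only take the fast path while we are under half of
wal_sender_timeout"). This is not the actual walsender code; the function
name is made up and TimestampTz is simplified to plain microseconds:

/* build: cc -o fastpath_demo fastpath_demo.c */
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

typedef int64_t TimestampTz;        /* microseconds, simplified */

static bool
can_take_fast_path(TimestampTz now, TimestampTz last_reply_timestamp,
                   int wal_sender_timeout_ms)
{
    /* same idea as TimestampTzPlusMilliseconds(last_reply_timestamp,
     * wal_sender_timeout / 2) in the patch hunk quoted earlier */
    TimestampTz cutoff = last_reply_timestamp +
        (TimestampTz) (wal_sender_timeout_ms / 2) * 1000;

    return now < cutoff;
}

int
main(void)
{
    TimestampTz last_reply = 0;     /* last standby reply at t = 0 */
    int         timeout_ms = 60000; /* default wal_sender_timeout of 60s */

    /* 20s into a large transaction: still allowed to skip the checks */
    printf("t=20s fast path: %d\n",
           can_take_fast_path(20 * 1000000LL, last_reply, timeout_ms));

    /* 40s in: past half the timeout, so the slow path runs, processes
     * replies and sends a keepalive before the timeout can fire */
    printf("t=40s fast path: %d\n",
           can_take_fast_path(40 * 1000000LL, last_reply, timeout_ms));
    return 0;
}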

This code is specific to the logical decoding walsender interface, so it
only happens for logical decoding/replication (which means it should be
backported all the way to 9.4). The physical walsender is not affected.

These two issues happen quite normally in the wild, as all it takes is a
big data load in a single transaction, or an update of a large part of a
table, or something similar, for this to happen with default settings.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
From df2d5b09a74cb31537e2bb74895a8e31febce5f8 Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmo...@pjmodos.net>
Date: Tue, 31 Oct 2017 14:00:37 +0100
Subject: [PATCH] Fix walsender timeouts when decoding large transaction

The logical slots have a fast code path for sending data in order to not
impose too high a per-message overhead. The fast path skips checks for
interrupts and timeouts. However, the existing coding failed to consider
the fact that a transaction with a large number of changes may take a very
long time to be processed and sent to the client. This causes the walsender
to ignore interrupts for a potentially long time and, more importantly, it
will cause the walsender to be killed due to timeout at the end of such a
transaction.

This commit changes the fast path to also check for interrupts, and only
allows taking the fast path when we are not yet close to hitting
wal_sender_timeout; otherwise the regular path, which processes replies
and sends keepalives, is taken.

[HACKERS] Subscriber resets additional columns to NULL on UPDATE

2017-10-26 Thread Petr Jelinek
Hi,

I found a bug in logical replication where extra (nullable) columns on
the subscriber are reset to NULL when an update comes from the provider.

The issue is apparently that we /points finger at himself/ forgot to
specifically check for columns that are not part of the attribute map in
slot_modify_cstrings(), so the extra columns fall through to the
else block, which sets the value to NULL.

The attached patch fixes it and adds a couple of tests for this scenario.

This is a rather serious issue, so it would be good if we could get it
fixed in 10.1.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
From c641eb6170f3dec26cf52264e1f931393aee434e Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmo...@pjmodos.net>
Date: Thu, 26 Oct 2017 11:11:42 +0200
Subject: [PATCH] Don't reset additional columns on subscriber to NULL after
 UPDATE

When a publisher table has fewer columns than the subscriber's, an update
of a row on the publisher should result in an update of only the common
columns. The previous coding mistakenly reset the values of additional
columns on the subscriber to NULL because it failed to skip updates of
columns not found in the attribute map.
---
 src/backend/replication/logical/worker.c   |  7 ++-
 src/test/subscription/t/008_diff_schema.pl | 77 ++
 2 files changed, 82 insertions(+), 2 deletions(-)
 create mode 100644 src/test/subscription/t/008_diff_schema.pl

diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index bc6d8246a7..0e68670767 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -391,10 +391,13 @@ slot_modify_cstrings(TupleTableSlot *slot, LogicalRepRelMapEntry *rel,
 		Form_pg_attribute att = TupleDescAttr(slot->tts_tupleDescriptor, i);
 		int			remoteattnum = rel->attrmap[i];
 
-		if (remoteattnum >= 0 && !replaces[remoteattnum])
+		if (remoteattnum < 0)
 			continue;
 
-		if (remoteattnum >= 0 && values[remoteattnum] != NULL)
+		if (!replaces[remoteattnum])
+			continue;
+
+		if (values[remoteattnum] != NULL)
 		{
 			Oid			typinput;
 			Oid			typioparam;
diff --git a/src/test/subscription/t/008_diff_schema.pl b/src/test/subscription/t/008_diff_schema.pl
new file mode 100644
index 00..a6b4597403
--- /dev/null
+++ b/src/test/subscription/t/008_diff_schema.pl
@@ -0,0 +1,77 @@
+# Basic logical replication test
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 3;
+
+# Initialize publisher node
+my $node_publisher = get_new_node('publisher');
+$node_publisher->init(allows_streaming => 'logical');
+$node_publisher->start;
+
+# Create subscriber node
+my $node_subscriber = get_new_node('subscriber');
+$node_subscriber->init(allows_streaming => 'logical');
+$node_subscriber->start;
+
+# Create some preexisting content on publisher
+$node_publisher->safe_psql('postgres',
+	"CREATE TABLE test_tab (a int primary key, b varchar)");
+$node_publisher->safe_psql('postgres',
+	"INSERT INTO test_tab VALUES (1, 'foo'), (2, 'bar')");
+
+# Setup structure on subscriber
+$node_subscriber->safe_psql('postgres', "CREATE TABLE test_tab (a int primary key, b text, c timestamptz default now(), d bigint default 999)");
+
+# Setup logical replication
+my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
+$node_publisher->safe_psql('postgres', "CREATE PUBLICATION tap_pub FOR TABLE test_tab");
+
+my $appname = 'tap_sub';
+$node_subscriber->safe_psql('postgres',
+"CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr application_name=$appname' PUBLICATION tap_pub"
+);
+
+# Wait for subscriber to finish initialization
+my $caughtup_query =
+"SELECT pg_current_wal_lsn() <= replay_lsn FROM pg_stat_replication WHERE application_name = '$appname';";
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+  or die "Timed out while waiting for subscriber to catch up";
+
+# Also wait for initial table sync to finish
+my $synced_query =
+"SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT IN ('r', 's');";
+$node_subscriber->poll_query_until('postgres', $synced_query)
+  or die "Timed out while waiting for subscriber to synchronize data";
+
+my $result =
+  $node_subscriber->safe_psql('postgres', "SELECT count(*), count(c), count(d = 999) FROM test_tab");
+is($result, qq(2|2|2), 'check initial data was copied to subscriber');
+
+# update the tuples on publisher and check the additional columns on
+# subscriber didn't change
+$node_publisher->safe_psql('postgres', "UPDATE test_tab SET b = md5(b)");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+  or die "Timed out while waiting for subscriber to catch up";
+
+$result =
+  $node_subscr

Re: [HACKERS] Fix a typo in execReplication.c

2017-10-12 Thread Petr Jelinek
On 12/10/17 00:52, Masahiko Sawada wrote:
> On Thu, Oct 12, 2017 at 5:02 AM, Robert Haas <robertmh...@gmail.com> wrote:
>> On Mon, Oct 9, 2017 at 10:59 PM, Masahiko Sawada <sawada.m...@gmail.com> 
>> wrote:
>>> Attached a patch for $subject. ISTM these are mistakes of copy-and-paste.
>>
>> Committed, but isn't the code itself wrong as well in the DELETE case?
>>
>> /* BEFORE ROW DELETE Triggers */
>> if (resultRelInfo->ri_TrigDesc &&
>> resultRelInfo->ri_TrigDesc->trig_update_before_row)
>> {
>> skip_tuple = !ExecBRDeleteTriggers(estate, epqstate, resultRelInfo,
>>>tts_tuple->t_self,
>>NULL);
>> }
>>
>> Why not trig_delete_before_row?
>>

Indeed, that's another copy-pasto.

> 
> Thank you!
> I think you're right. It should be trig_delete_before_row and be
> back-patched to 10. Attached patch fixes it.
> 

Thanks for the patch, looks correct to me.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Help required to debug pg_repack breaking logical replication

2017-10-10 Thread Petr Jelinek
On 08/10/17 15:21, Craig Ringer wrote:
> On 8 October 2017 at 02:37, Daniele Varrazzo <daniele.varra...@gmail.com> 
> wrote:
>> Hello,
>>
>> we have been reported, and I have experienced a couple of times,
>> pg_repack breaking logical replication.
>>
>> - https://github.com/reorg/pg_repack/issues/135
>> - https://github.com/2ndQuadrant/pglogical/issues/113
> 
> Yeah, I was going to say I've seen reports of this with pglogical, but
> I see you've linked to them.
> 
> I haven't had a chance to look into it though, and haven't had a
> suitable reproducible test case.
> 
>> In the above issue #113, Petr Jelinek commented:
>>
>>> From quick look at pg_repack, the way it does table rewrite is almost 
>>> guaranteed
>>> to break logical decoding unless there is zero unconsumed changes for a 
>>> given table
>>> as it does not build the necessary mappings info for logical decoding that 
>>> standard
>>> heap rewrite in postgres does.
>>
>> unfortunately he didn't follow up to further details requests.
> 
> At a guess he's referring to src/backend/access/heap/rewriteheap.c .
> 
> I'd explain better if I understood what was going on myself, but I
> haven't really understood the logical decoding parts of that code.
> 
>> - Is Petr diagnosis right and freezing of logical replication is to be
>> blamed to missing mapping?
>> - Can you suggest a test to reproduce the issue reliably?
>> - What are mapped relations anyway?
> 
> I can't immediately give you the answers you seek, but start by
> studying src/backend/access/heap/rewriteheap.c . Notably
> logical_end_heap_rewrite, logical_rewrite_heap_tuple,
> logical_begin_heap_rewrite.
> 

Yes, that's exactly it. When a table is rewritten, we need to create a
mapping for every tuple that was created or removed (i.e., an insert,
update or delete operation happened on it) since the oldest replication
slot xmin, in order for logical decoding to continue to work on that
table after the rewrite. And pg_repack doesn't create that mapping.
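
Conceptually (this is only an illustrative sketch, not the actual struct
or file format; the real machinery lives in
src/backend/access/heap/rewriteheap.c and writes mapping files under
pg_logical/mappings/), each mapping entry just records where a tuple that
existed before the rewrite ended up afterwards, so decoding an old change
can still find it:

#include <stdint.h>

typedef struct DemoItemPointer
{
    uint32_t    block;              /* heap block number */
    uint16_t    offset;             /* line pointer offset in the block */
} DemoItemPointer;

/* Illustrative shape of one logical rewrite mapping entry. */
typedef struct DemoRewriteMapping
{
    uint32_t        xid;                /* xact that created/removed the tuple */
    uint32_t        old_relfilenode;    /* relation file before the rewrite */
    uint32_t        new_relfilenode;    /* relation file after the rewrite */
    DemoItemPointer old_tid;            /* tuple position before the rewrite */
    DemoItemPointer new_tid;            /* tuple position after the rewrite */
} DemoRewriteMapping;

Without such entries for tuples changed since the oldest slot's xmin,
decoding still points at the pre-rewrite heap and has nothing to map it
to the rewritten heap, which matches the breakage seen with pg_repack.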

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] Re: [BUGS] 10.0: Logical replication doesn't execute BEFORE UPDATE OF trigger

2017-10-10 Thread Petr Jelinek
On 10/10/17 17:22, Aleksander Alekseev wrote:
> Hi Petr,
> 
>> let me start by saying that my view is that this is simply a
>> documentation bug. Meaning that I didn't document that it does not work,
>> but I also never intended it to work. Main reason is that we can't
>> support the semantics of "UPDATE OF" correctly (see bellow). And I think
>> it's better to not support something at all rather than making it behave
>> differently in different cases.
>>
>> Now about the proposed patch, I don't think this is correct way to
>> support this as it will only work when either PRIMARY KEY column was
>> changed or when REPLICA IDENTITY is set to FULL for the table. And even
>> then it will have very different semantics from how it works when the
>> row is updated by SQL statement (all non-toasted columns will be
>> reported as changed regardless of actually being changed or not).
>>
>> The more proper way to do this would be to run data comparison of the
>> new tuple and the existing tuple values which a) will have different
>> behavior than normal "UPDATE OF" triggers have because we still can't
>> tell what columns were mentioned in the original query and b) will not
>> exactly help performance (and perhaps c) one can write the trigger to
>> behave this way already without impacting all other use-cases).
> 
> I personally think that solution proposed by Masahiko is much better
> than current behavior.


Well, the proposed patch adds inconsistent behavior: if in your test
you changed the trigger to be UPDATE OF v instead of k and updated v,
it would still not be triggered even with this patch. That's IMHO worse
than the current behavior, which at least consistently doesn't work.

> We could document current limitations you've
> mentioned and probably add a corresponding warning during execution of
> ALTER TABLE ... ENABLE REPLICA TRIGGER. I don't insist on this
> particular approach though.
> 
> What I really don't like is that PostgreSQL allows to create something
> that supposedly should work but in fact doesn't. Such behavior is
> obviously a bug. So as an alternative we could just return an error in
> response to ALTER TABLE ... ENABLE REPLICA TRIGGER query for triggers
> that we know will never be executed.
> 

Well, ENABLE REPLICA is not specific to built-in logical replication
though. It only depends on the value of the session_replication_role GUC.
Any session can set that, and if that session executes SQL, the UPDATE OF
triggers will work as expected. It's a similar situation to statement
triggers IMHO: we allow ENABLE REPLICA statement triggers, but they are
not triggered by logical replication because there is no statement. And
given that UPDATE OF also depends on the statement to work as expected
(it's specified to be triggered for columns listed in the statement, not
for what was actually changed by the execution), I don't really see the
difference between this and statement triggers, except that the statement
trigger behavior is documented.

I understand that it's somewhat confusing and I am not saying it's
ideal, but I don't see any other behavior that would work for what your
test tries to do and still be consistent with the rest of the system.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] Re: [BUGS] 10.0: Logical replication doesn't execute BEFORE UPDATE OF trigger

2017-10-10 Thread Petr Jelinek
> In my environment this bug seems to be fixed by the patch.
>
Hi,

let me start by saying that my view is that this is simply a
documentation bug. Meaning that I didn't document that it does not work,
but I also never intended it to work. The main reason is that we can't
support the semantics of "UPDATE OF" correctly (see below). And I think
it's better to not support something at all rather than making it behave
differently in different cases.

Now, about the proposed patch, I don't think this is the correct way to
support this, as it will only work when either a PRIMARY KEY column was
changed or when REPLICA IDENTITY is set to FULL for the table. And even
then it will have very different semantics from how it works when the
row is updated by an SQL statement (all non-toasted columns will be
reported as changed regardless of whether they actually changed or not).

The more proper way to do this would be to run a data comparison of the
new tuple and the existing tuple values, which a) will have different
behavior than normal "UPDATE OF" triggers have, because we still can't
tell what columns were mentioned in the original query, and b) will not
exactly help performance (and perhaps c) one can write the trigger to
behave this way already without impacting all other use-cases).

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Discussion on missing optimizations

2017-10-07 Thread Petr Jelinek
On 07/10/17 19:59, Tom Lane wrote:
> Petr Jelinek <petr.jeli...@2ndquadrant.com> writes:
>> On 07/10/17 18:15, Tom Lane wrote:
>>> No, I'm afraid you didn't read that comment closely enough.  This will
>>> flat out fail for cases like "select ... where x=x order by x", because
>>> there will already be a single-element EC for x and so the clause will
>>> just disappear entirely.  If that doesn't happen, then instead you're
>>> creating an EC with duplicate entries, which is an idea I seriously
>>> dislike; the semantics of that don't seem clear at all to me.
> 
>> Hmm it did not disappear (and worked fine in SQL level tests).
> 
> I may not be correctly remembering what the query would need to look
> like for there to be single-element ECs in existence at this point; but
> I surely would not like this code to assume that none will exist till
> later.
> 
>> I don't
>> think I fully understand the "EC with duplicate entries" part and what's
>> the problem with it so I'll defer to your wisdom there.
> 
> Well, as one example, assume that we use your patch and consider what
> happens with
>   where x = y and x = x
> vs
>   where x = x and x = y
> 
> In the first case we end up with an EC that's just {x,y}, because the
> second process_equivalence() will find both sides in the same EC and
> conclude that the second clause is redundant.  (Which it is, if the
> equality operators have the same notion of what to do with nulls.)
> In the second case we end up with an EC containing {x,x,y}, which
> at minimum will result in emitting redundant generated quals.  I'm
> not sure if it could have any worse consequences than that, but I'm
> not sure it couldn't either.  But this is bogus in any case, because
> those WHERE clauses surely should produce identical results.
> 
> Looking further ahead, if ECs have to support being multisets rather
> than pure sets, that would put a crimp in ever improving this logic to
> use a smarter UNION-FIND algorithm.  (I've not yet seen queries where the
> number of EC members is large enough to make that a serious issue, but
> I think it could happen, particularly with inheritance/partitioning.)
> 

Okay, that makes sense, thanks for the explanation. Your patch is the way
to go then.

>>> This passes the smell test for me in the sense of not adding any
>>> significant number of planner cycles except when the weird case occurs.
>>> It leaves something on the table in that there are some situations
>>> where X=X could be converted, but we won't do it because we don't see
>>> the clause as being a potential EC (because it's not at top level),
>>> as in the second new regression test case below.  I think that's
>>> probably all right; I don't see any way to be more thorough without
>>> adding a lot of new cycles all the time, and I don't believe this is
>>> worth that.
> 
>> My code had same issue. I think going deeper would require quite a bit
>> of new code (and cycles) for something that's even less likely to happen
>> than simple X=X while the current patch is quite cheap win.
> 
> Yeah.  I'm not really convinced it's a big win, but it's so cheap we
> might as well do it.  The only case where it would expend cycles and
> fail to give an improvement is if we have X=X with a non-strict operator,
> which I think is a case that never arises in practice at present,
> because btree effectively assumes that all btree operators are strict.
> (Someday we might want to relax that, though, which is why I don't
> want to let the planner just assume it.)
> 

In real life probably not, but it seems like these small bits can be
useful marketing-wise, given articles like the one which started this
thread. And it should not hurt anything either.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Discussion on missing optimizations

2017-10-07 Thread Petr Jelinek
On 07/10/17 18:15, Tom Lane wrote:
> Petr Jelinek <petr.jeli...@2ndquadrant.com> writes:
>> On 07/10/17 04:19, Tom Lane wrote:
>>> (edit: a few minutes later, I seem to remember that equivclass.c has
>>> to do something special with the X=X case, so maybe it could do
>>> something else special instead, with little new overhead.)
> 
>> So I wrote prototype of achieving this optimization and it seems to be
>> really quite simple code-wise (see attached). I did only a limited
>> manual testing of this but I don't see any negative impact on planning time.
> 
> No, I'm afraid you didn't read that comment closely enough.  This will
> flat out fail for cases like "select ... where x=x order by x", because
> there will already be a single-element EC for x and so the clause will
> just disappear entirely.  If that doesn't happen, then instead you're
> creating an EC with duplicate entries, which is an idea I seriously
> dislike; the semantics of that don't seem clear at all to me.

Hmm, it did not disappear (and worked fine in SQL-level tests). I don't
think I fully understand the "EC with duplicate entries" part and what
the problem with it is, so I'll defer to your wisdom there.

> What I was imagining was that having detected X=X, instead of "throwing
> back" the clause as-is we could throw back an X IS NOT NULL clause,
> along the lines of the attached.

Right, I wanted to avoid messing with the incoming result info, but if
we don't want to call the code below or write tons of code for this, I
guess it's the best we can do.

> This passes the smell test for me in the sense of not adding any
> significant number of planner cycles except when the weird case occurs.
> It leaves something on the table in that there are some situations
> where X=X could be converted, but we won't do it because we don't see
> the clause as being a potential EC (because it's not at top level),
> as in the second new regression test case below.  I think that's
> probably all right; I don't see any way to be more thorough without
> adding a lot of new cycles all the time, and I don't believe this is
> worth that.
> 

My code had the same issue. I think going deeper would require quite a bit
of new code (and cycles) for something that's even less likely to happen
than a simple X=X, while the current patch is quite a cheap win.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Issue with logical replication: MyPgXact->xmin already is valid

2017-10-07 Thread Petr Jelinek
On 07/10/17 18:23, Konstantin Knizhnik wrote:
> On 10/07/2017 04:26 PM, Petr Jelinek wrote:
>>
>> Hmm so you start transaction (you have to when running
>> CREATE_REPLICATION_SLOT with USE_SNAPSHOT parameter). And while the slot
>> is being created the config is reloaded. And since now you are in
>> transaction the tsearch hook for GUC processing tries to access catalogs
>> which sets the xmin for the transaction.
> 
> Actually this slot is implicitly created by LogicalRepSyncTableStart to
> perform initial data sync.
> 

That does not change what I said above.

>>
>> That's not good, but I can't really say I have idea about what to do
>> with it other than to set some kind of flag saying that logical decoding
>> snapshot is being built and using that to skip catalog access which does
>> not seem very pretty.
>>
> It is not quite clear from the comment why it is not possible to
> overwrite MyPgXact->xmin or just use existed value.
> 

We can't use the existing value because the snapshot used to read data
would no longer correspond to the one we just built as the consistent
start point.

About overwriting, well, we may be able to do that if there was no active
snapshot in the transaction, i.e., we could try doing something like
SnapshotResetXmin() or possibly rather InvalidateCatalogSnapshot(), since
the xmin was set for reading a catalog snapshot, before checking
MyPgXact->xmin.
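
A minimal sketch of that idea (untested, purely to illustrate the
suggestion; the error check is the existing one in SnapBuildInitialSnapshot()
in snapbuild.c, while dropping the catalog snapshot first is the
hypothetical part):

/* Hypothetical sketch only, not a proposed patch.  InvalidateCatalogSnapshot()
 * lives in snapmgr.c and also resets MyPgXact->xmin if no other snapshots
 * remain registered or active; whether calling it here is safe would need
 * to be verified. */
if (TransactionIdIsValid(MyPgXact->xmin))
{
    InvalidateCatalogSnapshot();
    if (TransactionIdIsValid(MyPgXact->xmin))
        elog(ERROR, "cannot build an initial slot snapshot when MyPgXact->xmin already is valid");
}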

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Issue with logical replication: MyPgXact->xmin already is valid

2017-10-07 Thread Petr Jelinek
On 06/10/17 16:46, Konstantin Knizhnik wrote:
> 
> 
> On 06.10.2017 15:29, Petr Jelinek wrote:
>> On 06/10/17 12:16, Konstantin Knizhnik wrote:
>>> When creating logical replication slots we quite often get the following
>>> error:
>>>
>>> ERROR:  cannot build an initial slot snapshot when MyPgXact->xmin
>>> already is valid
>>>
>>> which cause restart of WAL sender.
>>> The comment to this line doesn't clarify much:
>>>
>>>  /* so we don't overwrite the existing value */
>>>  if (TransactionIdIsValid(MyPgXact->xmin))
>>>  elog(ERROR, "cannot build an initial slot snapshot when
>>> MyPgXact->xmin already is valid");
>>> I wonder if it is normal situation or something goes wrong?
>>>
>> Hi,
>>
>> no it's not normal situation, it seems you are doing something that
>> assigns xid before you run the CREATE_REPLICATION_SLOT command on that
>> connection.
> 
> I have not doing something in this connection: it is wal sender
> executing CREATE_REPLICATION_SLOT replication command.
> Please look at the stack in my original e-mail. It shows who and when is
> setting MyPgXact->xmin.
> It is GetSnapshotData called because of reloading configuration settings.
> And configuration setting are reloaded because our application is
> updating "synchronous_standby_names".
> 
> 

Hmm, so you start a transaction (you have to when running
CREATE_REPLICATION_SLOT with the USE_SNAPSHOT parameter). And while the
slot is being created the config is reloaded. And since you are now in a
transaction, the tsearch hook for GUC processing tries to access the
catalogs, which sets the xmin for the transaction.

That's not good, but I can't really say I have an idea about what to do
with it other than to set some kind of flag saying that a logical decoding
snapshot is being built and use that to skip catalog access, which does
not seem very pretty.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Discussion on missing optimizations

2017-10-07 Thread Petr Jelinek
On 07/10/17 04:19, Tom Lane wrote:
> Andres Freund <and...@anarazel.de> writes:
>> On 2017-10-06 21:33:16 -0400, Adam Brusselback wrote:
>>> The article in question is here:
>>> https://blog.jooq.org/2017/09/28/10-cool-sql-optimisations-that-do-not-depend-on-the-cost-model/
> 
>> That's interesting.
> 
> The impression I have in a quick scan is that probably hardly any of these
> are cases that any of the DB designers think are important in themselves.
> Rather, they fall out of more general optimization attempts, or not,
> depending on the optimization mechanisms in use in a particular DB.
> For example, reducing "WHERE 1=1" to "WHERE TRUE" and then to nothing
> comes out of a constant-subexpression-precalculation mechanism for us,
> whereas "WHERE column=column" doesn't fall to that approach.  ISTM it
> would be really dumb to expend planner cycles looking specifically for
> that case, so I guess that DB2 et al are finding it as a side-effect of
> some more general optimization ... I wonder what that is?
> 
> (edit: a few minutes later, I seem to remember that equivclass.c has
> to do something special with the X=X case, so maybe it could do
> something else special instead, with little new overhead.)
> 

What it actually does is specifically skip the processing for X=X
(the const expression will be simplified by
estimate_expression_value/eval_const_expressions separately). There is a
comment there that specifically states that it's not worth it to process
this, as it's a rare clause which is equivalent to X IS NOT NULL.

I don't actually agree with the argument of that comment, since in
practice, if the "silly" equality is not there, we'll just waste an
equal() call, and if it is there, the optimization seems worth it as it
will lead to orders-of-magnitude better estimation in many cases.

So I wrote a prototype achieving this optimization and it seems to be
really quite simple code-wise (see attached). I did only limited manual
testing of this, but I don't see any negative impact on planning time.

Thoughts?

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
From a7d6f3fe0a5d42a96c711cc33e6962b9a2f6511f Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmo...@pjmodos.net>
Date: Sat, 7 Oct 2017 15:10:54 +0200
Subject: [PATCH] Transform X=X expressions into X IS NOT NULL

---
 src/backend/optimizer/path/equivclass.c | 12 
 src/backend/optimizer/plan/initsplan.c  | 11 +++
 2 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 7997f50c18..d4ffdf66f2 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -154,18 +154,6 @@ process_equivalence(PlannerInfo *root, RestrictInfo *restrictinfo,
 	   collation);
 
 	/*
-	 * Reject clauses of the form X=X.  These are not as redundant as they
-	 * might seem at first glance: assuming the operator is strict, this is
-	 * really an expensive way to write X IS NOT NULL.  So we must not risk
-	 * just losing the clause, which would be possible if there is already a
-	 * single-element EquivalenceClass containing X.  The case is not common
-	 * enough to be worth contorting the EC machinery for, so just reject the
-	 * clause and let it be processed as a normal restriction clause.
-	 */
-	if (equal(item1, item2))
-		return false;			/* X=X is not a useful equivalence */
-
-	/*
 	 * If below outer join, check for strictness, else reject.
 	 */
 	if (below_outer_join)
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 9931dddba4..1c3672a978 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -2347,6 +2347,17 @@ process_implied_equality(PlannerInfo *root,
 return;
 		}
 	}
+	else if (equal(item1, item2))
+	{
+		NullTest	   *ntest;
+		ntest = makeNode(NullTest);
+		ntest->arg = item1;
+		ntest->nulltesttype = IS_NOT_NULL;
+		ntest->argisrow = false;
+		ntest->location = exprLocation((Node *) clause);
+
+		clause = (Expr *) ntest;
+	}
 
 	/*
 	 * Push the new clause into all the appropriate restrictinfo lists.
-- 
2.11.0




Re: [HACKERS] Issue with logical replication: MyPgXact->xmin already is valid

2017-10-06 Thread Petr Jelinek
On 06/10/17 12:16, Konstantin Knizhnik wrote:
> When creating logical replication slots we quite often get the following
> error:
> 
> ERROR:  cannot build an initial slot snapshot when MyPgXact->xmin
> already is valid
> 
> which cause restart of WAL sender.
> The comment to this line doesn't clarify much:
> 
>     /* so we don't overwrite the existing value */
>     if (TransactionIdIsValid(MyPgXact->xmin))
>     elog(ERROR, "cannot build an initial slot snapshot when
> MyPgXact->xmin already is valid");
>>
> I wonder if it is normal situation or something goes wrong?
> 

Hi,

no it's not normal situation, it seems you are doing something that
assigns xid before you run the CREATE_REPLICATION_SLOT command on that
connection.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Issues with logical replication

2017-10-03 Thread Petr Jelinek
On 02/10/17 18:59, Petr Jelinek wrote:
>>
>> Now fix the trigger function:
>> CREATE OR REPLACE FUNCTION replication_trigger_proc() RETURNS TRIGGER AS $$
>> BEGIN
>>   RETURN NEW;
>> END $$ LANGUAGE plpgsql;
>>
>> And manually perform at master two updates inside one transaction:
>>
>> postgres=# begin;
>> BEGIN
>> postgres=# update pgbench_accounts set abalance=abalance+1 where aid=26;
>> UPDATE 1
>> postgres=# update pgbench_accounts set abalance=abalance-1 where aid=26;
>> UPDATE 1
>> postgres=# commit;
>> 
>>
>> and in replica log we see:
>> 2017-10-02 18:40:26.094 MSK [2954] LOG:  logical replication apply
>> worker for subscription "sub" has started
>> 2017-10-02 18:40:26.101 MSK [2954] ERROR:  attempted to lock invisible
>> tuple
>> 2017-10-02 18:40:26.102 MSK [2882] LOG:  worker process: logical
>> replication worker for subscription 16403 (PID 2954) exited with exit
>> code 1
>>
>> Error happens in trigger.c:
>>
>> #3  0x0069bddb in GetTupleForTrigger (estate=0x2e36b50,
>> epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tid=0x2dd08ac,
>>     lockmode=LockTupleNoKeyExclusive, newSlot=0x7ffc4420ec40) at
>> trigger.c:3103
>> #4  0x0069b259 in ExecBRUpdateTriggers (estate=0x2e36b50,
>> epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tupleid=0x2dd08ac,
>>     fdw_trigtuple=0x0, slot=0x2dd0240) at trigger.c:2748
>> #5  0x006d2395 in ExecSimpleRelationUpdate (estate=0x2e36b50,
>> epqstate=0x7ffc4420eda0, searchslot=0x2dd0358, slot=0x2dd0240)
>>     at execReplication.c:461
>> #6  0x00820894 in apply_handle_update (s=0x7ffc442163b0) at
>> worker.c:736
> 
> We have locked the same tuple in RelationFindReplTupleByIndex() just
> before this gets called and didn't get the same error. I guess we do
> something wrong with snapshots. Will need to investigate more.
> 

Okay, so it's not the snapshot. It's the fact that we don't set
es_output_cid in the replication worker, which GetTupleForTrigger uses
when locking the tuple. The attached one-liner fixes it.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index b58d2e1008..d6a5cf7088 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -204,6 +204,8 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel)
 	estate->es_num_result_relations = 1;
 	estate->es_result_relation_info = resultRelInfo;
 
+	estate->es_output_cid = GetCurrentCommandId(true);
+
 	/* Triggers might need a slot */
 	if (resultRelInfo->ri_TrigDesc)
 		estate->es_trig_tuple_slot = ExecInitExtraTupleSlot(estate);



Re: [HACKERS] [sqlsmith] stuck spinlock in pg_stat_get_wal_receiver after OOM

2017-10-03 Thread Petr Jelinek
On 03/10/17 22:01, Thomas Munro wrote:
> On Wed, Oct 4, 2017 at 7:32 AM, Petr Jelinek
> <petr.jeli...@2ndquadrant.com> wrote:
>> On 03/10/17 19:44, Tom Lane wrote:
>>> I wrote:
>>>>> So that's trouble waiting to happen, for sure.  At the very least we
>>>>> need to do a single fetch of WalRcv->latch, not two.  I wonder whether
>>>>> even that is sufficient, though: this coding requires an atomic fetch of
>>>>> a pointer, which is something we generally don't assume to be safe.
>>>
>>> BTW, I had supposed that this bug was of long standing, but actually it's
>>> new in v10, dating to 597a87ccc9a6fa8af7f3cf280b1e24e41807d555.  Before
>>> that walreceiver start/stop just changed the owner of a long-lived shared
>>> latch, and there was no question of stale pointers.
>>>
>>> I considered reverting that decision, but the reason for it seems to have
>>> been to let libpqwalreceiver.c manipulate MyProc->procLatch rather than
>>> having to know about a custom latch.  That's probably a sufficient reason
>>> to justify some squishiness in the wakeup logic.  Still, we might want to
>>> revisit it if we find any other problems here.
>> That's correct, and because the other users of that library don't have
>> special latch it seemed feasible to standardize on the procLatch. If we
>> indeed wanted to change the decision about this I think we can simply
>> give latch as a parameter to libpqrcv_connect and store it in the
>> WalReceiverConn struct. The connection does not need to live past the
>> latch (although it currently does, but that's just a matter of
>> reordering the code in WalRcvDie() a little AFAICS).
> 
> I wonder if the new ConditionVariable mechanism would be worth
> considering in future cases like this, where the signalling process
> doesn't know the identity of the process to be woken up (or even how
> many waiting processes there are), but instead any interested waiters
> block on a particular ConditionVariable that models a specific
> interesting condition.  In the end it's just latches anyway, but it
> may be a better abstraction.  On the other hand I'm not sure how waits
> on a ConditionVariable would be multiplexed with IO (a distinct wait
> event, or leaky abstraction where the caller relies on the fact that
> it's built on MyProc->latch, something else?).
> 

In this specific case the problem is that what we are waiting for is
indeed the latch, because we want the libpqwalreceiver calls to be
interruptible when waiting for I/O.

Otherwise, yes, ConditionVariable is useful for generic signaling; we use
it in the slot and origin code for "lock" waits (they don't have a normal
catalog so normal locking does not work).
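
For readers unfamiliar with that API, the canonical wait loop looks
roughly like this (a sketch, not the actual slot/origin code; the shared
struct, the condition flag and the choice of wait event are placeholders):

/* Waiter side: sleep until some shared condition becomes true.
 * API from src/include/storage/condition_variable.h. */
ConditionVariablePrepareToSleep(&shared->cv);
while (!shared->condition_is_satisfied)
    ConditionVariableSleep(&shared->cv, PG_WAIT_EXTENSION); /* wait event illustrative */
ConditionVariableCancelSleep();

/* Signaller side: after changing the shared state under its lock. */
ConditionVariableBroadcast(&shared->cv);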

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [sqlsmith] stuck spinlock in pg_stat_get_wal_receiver after OOM

2017-10-03 Thread Petr Jelinek
On 03/10/17 19:44, Tom Lane wrote:
> I wrote:
>>> So that's trouble waiting to happen, for sure.  At the very least we
>>> need to do a single fetch of WalRcv->latch, not two.  I wonder whether
>>> even that is sufficient, though: this coding requires an atomic fetch of
>>> a pointer, which is something we generally don't assume to be safe.
> 
> BTW, I had supposed that this bug was of long standing, but actually it's
> new in v10, dating to 597a87ccc9a6fa8af7f3cf280b1e24e41807d555.  Before
> that walreceiver start/stop just changed the owner of a long-lived shared
> latch, and there was no question of stale pointers.
> 
> I considered reverting that decision, but the reason for it seems to have
> been to let libpqwalreceiver.c manipulate MyProc->procLatch rather than
> having to know about a custom latch.  That's probably a sufficient reason
> to justify some squishiness in the wakeup logic.  Still, we might want to
> revisit it if we find any other problems here.
That's correct, and because the other users of that library don't have a
special latch it seemed feasible to standardize on the procLatch. If we
indeed wanted to change that decision, I think we could simply pass the
latch as a parameter to libpqrcv_connect and store it in the
WalReceiverConn struct. The connection does not need to live past the
latch (it currently does, but that's just a matter of reordering the code
in WalRcvDie() a little, AFAICS).
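
To sketch what that alternative might look like (purely hypothetical;
the extra parameter and the wakeup_latch field are made up, the current
libpqrcv_connect takes no latch argument):

/* Hypothetical variant of the connect call: the caller supplies the latch
 * that libpqwalreceiver should wait on, instead of the library assuming
 * MyProc->procLatch. */
static WalReceiverConn *
libpqrcv_connect(const char *conninfo, bool logical, const char *appname,
                 Latch *wakeup_latch, char **err)
{
    WalReceiverConn *conn = palloc0(sizeof(WalReceiverConn));

    conn->wakeup_latch = wakeup_latch;  /* hypothetical new field */
    /* ... existing connection setup using conninfo/appname ... */
    return conn;
}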

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Issues with logical replication

2017-10-02 Thread Petr Jelinek
On 02/10/17 17:49, Konstantin Knizhnik wrote:
> I have faced two issues with logical replication.
> Reproducing scenario:
> 
> 1. start two Postgres instances (I start both at local machine).
> 2. Initialize pgbench tables at both instances:
>     pgbench -i -s 10 postgres
> 3. Create publication of pgbench_accounts table at one node:
>     create publication pub for table pgbench_accounts;
> 4. Create subscription at another node:
>     delete from pgbench_accounts;
>     CREATE SUBSCRIPTION sub connection 'dbname=postgres host=localhost
> port=5432 sslmode=disable' publication pub;
>     CREATE OR REPLACE FUNCTION replication_trigger_proc() RETURNS
> TRIGGER AS $$
>     BEGIN
>     --  RETURN NEW;
>     END $$ LANGUAGE plpgsql;
>    CREATE TRIGGER replication_trigger BEFORE INSERT OR UPDATE OR DELETE
> ON pgbench_accounts FOR EACH ROW EXECUTE PROCEDURE
> replication_trigger_proc();
>    ALTER TABLE pgbench_accounts ENABLE ALWAYS TRIGGER replication_trigger;
> 5. Start pgbench at primary node:
>     pgbench -T 1000 -P 2 -c 10 postgres
> 
> 
> Please notice commented "return new" statement. Invocation of this
> function cause error and in log we see repeated messages:
> 
> 2017-10-02 16:47:46.764 MSK [32129] LOG:  logical replication table
> synchronization worker for subscription "sub", table "pgbench_accounts"
> has started
> 2017-10-02 16:47:46.771 MSK [32129] ERROR:  control reached end of
> trigger procedure without RETURN
> 2017-10-02 16:47:46.771 MSK [32129] CONTEXT:  PL/pgSQL function
> replication_trigger_proc()
>     COPY pgbench_accounts, line 1: "1    1    0 "
> 2017-10-02 16:47:46.772 MSK [28020] LOG:  worker process: logical
> replication worker for subscription 17264 sync 17251 (PID 32129) exited
> with exit code 1
> ...
> 
> 
> After few minutes of work primary node (with publication) is crashed
> with the following stack trace:
> 
> Program terminated with signal SIGABRT, Aborted.
> #0  0x7f3608f8ec37 in __GI_raise (sig=sig@entry=6) at
> ../nptl/sysdeps/unix/sysv/linux/raise.c:56
> 56    ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
> (gdb) bt
> #0  0x7f3608f8ec37 in __GI_raise (sig=sig@entry=6) at
> ../nptl/sysdeps/unix/sysv/linux/raise.c:56
> #1  0x7f3608f92028 in __GI_abort () at abort.c:89
> #2  0x009f5740 in ExceptionalCondition (conditionName=0xbf6b30
> "!(((xid) != ((TransactionId) 0)))",
>     errorType=0xbf69af "FailedAssertion", fileName=0xbf69a8 "lmgr.c",
> lineNumber=582) at assert.c:54
> #3  0x0086ac1d in XactLockTableWait (xid=0, rel=0x0, ctid=0x0,
> oper=XLTW_None) at lmgr.c:582
> #4  0x0081c9f7 in SnapBuildWaitSnapshot (running=0x2749198,
> cutoff=898498) at snapbuild.c:1400
> #5  0x0081c7c7 in SnapBuildFindSnapshot (builder=0x2807910,
> lsn=798477760, running=0x2749198) at snapbuild.c:1311
> #6  0x0081c339 in SnapBuildProcessRunningXacts
> (builder=0x2807910, lsn=798477760, running=0x2749198) at snapbuild.c:1106


Hmm, this would suggest that the XLOG_RUNNING_XACTS record contains
invalid transaction ids, but we specifically skip those in
GetRunningTransactionData(). Can you check the pg_waldump output for that
record (the LSN is shown in the backtrace)?

> 
> Now fix the trigger function:
> CREATE OR REPLACE FUNCTION replication_trigger_proc() RETURNS TRIGGER AS $$
> BEGIN
>   RETURN NEW;
> END $$ LANGUAGE plpgsql;
> 
> And manually perform at master two updates inside one transaction:
> 
> postgres=# begin;
> BEGIN
> postgres=# update pgbench_accounts set abalance=abalance+1 where aid=26;
> UPDATE 1
> postgres=# update pgbench_accounts set abalance=abalance-1 where aid=26;
> UPDATE 1
> postgres=# commit;
> 
> 
> and in replica log we see:
> 2017-10-02 18:40:26.094 MSK [2954] LOG:  logical replication apply
> worker for subscription "sub" has started
> 2017-10-02 18:40:26.101 MSK [2954] ERROR:  attempted to lock invisible
> tuple
> 2017-10-02 18:40:26.102 MSK [2882] LOG:  worker process: logical
> replication worker for subscription 16403 (PID 2954) exited with exit
> code 1
> 
> Error happens in trigger.c:
> 
> #3  0x0069bddb in GetTupleForTrigger (estate=0x2e36b50,
> epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tid=0x2dd08ac,
>     lockmode=LockTupleNoKeyExclusive, newSlot=0x7ffc4420ec40) at
> trigger.c:3103
> #4  0x0069b259 in ExecBRUpdateTriggers (estate=0x2e36b50,
> epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tupleid=0x2dd08ac,
>     fdw_trigtuple=0x0, slot=0x2dd0240) at trigger.c:2748
> #5  0x006d2395 in ExecSimpleRelationUpdate (estate=0x2e36b50,
> epqstate=0x7ffc4420eda0, searchslot=0x2

Re: [HACKERS] Built-in plugin for logical decoding output

2017-09-26 Thread Petr Jelinek
On 26/09/17 09:26, Alvaro Hernandez wrote:
> On 26/09/17 10:03, Craig Ringer wrote:
>> On 26 September 2017 at 14:08, Alvaro Hernandez <a...@ongres.com> wrote:
>> - If you stick to in-core plugins, then you need to support at
>> least three different output formats if you want to support 9.4+:
>> test_decoding (and pray it works!), pgoutput, and the "new"
>> in-core plugin that was proposed at the beginning of this thread,
>> if that would see the light.
>>
>>
>> The only practical way will IMO be to have whatever new plugin it also
>> have an out-of-core version maintained for older Pg versions, where it
>> can be installed.
>>   
>>
>> But only in-core plugins help for general-purpose solutions.
>>
>>
>> I still don't agree there. If there's enough need/interest/adoption
>> you can get cloud vendors on board, they'll feel the customer
>> pressure. It's not our job to create that pressure and do their work
>> for them.
> 
>     Don't want to get into a loop, but as I said before it's
> chicken-and-egg. But nobody is asking core to do their work. As much as
> I love it, I think logical decoding is a bit half-baked until there is a
> single, quality, in-core plugin, as it discourages its usage, because of
> the reasons I stated.
> 

Well, in that case it's all good as PG10 has that.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Built-in plugin for logical decoding output

2017-09-25 Thread Petr Jelinek
On 25/09/17 19:48, Joshua D. Drake wrote:
> On 09/25/2017 10:43 AM, Andres Freund wrote:
>> On 2017-09-25 10:38:52 -0700, Joshua D. Drake wrote:
>>> On 09/25/2017 10:32 AM, Petr Jelinek wrote:
>>>> On 25/09/17 19:26, Tom Lane wrote:
>>>>> Alvaro Hernandez <a...@ongres.com> writes:
>>>
>>>>
>>>> There is already about 3 million output plugins out there so I think we
>>>> did reasonable job there. The fact that vast majority of that are
>>>> various json ones gives reasonable hint that we should have that one in
>>>> core though.
>>>
>>> And I am sure that 2ndQuadrant would be happy to add it to their
>>> version of
>>> Postgres and maintain it themselves.
>>>
>>> https://www.2ndquadrant.com/en/resources/2ndqpostgres/
>>
>> This doesn't seem like a good way to argue.
>>
> 
> Sorry, that wasn't supposed to be negative. My point was that
> 2ndQuadrant has a distribution of Postgres that they support. If
> 2ndQuadrant wants the feature, they could add it to their own without
> burdening the wider community further. It provides 2ndQuadrant what they
> are arguing for, benefits 2ndQuadrant as it increases the visibility and
> opportunity of their distribution for wider use.
> 

I am not sure I follow; we do have the pglogical extension, which works
just fine for this, thanks. You might have noticed that it's not what I
was suggesting we use for core.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Built-in plugin for logical decoding output

2017-09-25 Thread Petr Jelinek
On 25/09/17 19:26, Tom Lane wrote:
> Alvaro Hernandez <a...@ongres.com> writes:
>> In my opinion, logical decoding plugins that don't come with core 
>> are close to worthless (don't get me wrong):
> 
>> - They very unlikely will be installed in managed environments (an area 
>> growing significantly).
>> - As anything that is not in core, raises concerns by users.
>> - Distribution and testing are non-trivial: many OS/archs combinations.
> 
> The problem with this type of argument is that it leads directly to the
> conclusion that every feature users want must be in core.  We can't
> accept that conclusion, because we simply do not have the manpower or
> infrastructure to maintain a kitchen-sink, Oracle-sized code base.
> I think we're already more or less at the limit of the feature set we can
> deal with :-(.  The entire point of the output-plugin design was to allow
> useful replication stuff to be done outside of core; we need to make use
> of that.  (If that means we need better docs, then yeah, we'd better get
> that part done.)
> 

There are already about 3 million output plugins out there, so I think we
did a reasonable job there. The fact that the vast majority of them are
various JSON ones gives a reasonable hint that we should have that one in
core though.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] logical replication and statistics

2017-09-25 Thread Petr Jelinek
On 25/09/17 19:19, Tom Lane wrote:
> Pavel Stehule <pavel.steh...@gmail.com> writes:
>> I had two instances on one server with different port. I am sure, so
>> replication was functional. Only one issue is statistics
> 
>> Master:
> 
>> CREATE TABLE foo(id int primary key, a int);
>> CREATE PUBLICATION test_pub FOR TABLE foo;
>> INSERT INTO foo VALUES(1, 200);
> 
>> slave
> 
>> CREATE TABLE foo(id int primary key, a int);
>> CREATE SUBSCRIPTION test_sub CONNECTION 'port=5432' PUBLICATION test_pub;
> 
>> That was all
> 
> In this example, nothing's been done yet by the actual replication
> apply process, only by the initial table sync.  Maybe that accounts
> for your not seeing stats?
> 

The main replication worker should still be running though. The output
of pg_stat_replication should only be empty if there is nothing running.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Built-in plugin for logical decoding output

2017-09-25 Thread Petr Jelinek
On 25/09/17 18:48, Alvaro Hernandez wrote:
> 
> 
> On 25/09/17 19:39, Petr Jelinek wrote:
>>
>> Well, test_decoding is not meant for production use anyway, no need for
>> middleware to support it. The pgoutput is primarily used for internal
>> replication purposes, which is why we need something with more
>> interoperability in mind in the first place. The new plugin should still
>> support publications etc though IMHO.
>>
>>>  However, having said that, and while json is a great output format
>>> for interoperability, if there's a discussion on which plugin to include
>>> next, I'd also favor one that has some more compact representation
>>> format (or that supports several formats, not only json).
>>>
>> JSON is indeed great for interoperability, if you want more compact
>> format, use either pgoutput or write something of your own or do
>> conversion to something else in your consumer. I don't think postgres
>> needs to provide 100 different formats out of the box when there is an
>> API. The JSON output does not have to be extremely chatty either btw.
>>
> 
>     In my opinion, logical decoding plugins that don't come with core
> are close to worthless (don't get me wrong):
> 

I respectfully disagree.

> - They very unlikely will be installed in managed environments (an area
> growing significantly).
> - As anything that is not in core, raises concerns by users.
> - Distribution and testing are non-trivial: many OS/archs combinations.
> 
>     Given the above, I believe having a general-purpose output plugin
> in-core is critical to the use of logical decoding. As for 9.4-9.6 there
> is test_decoding, and given that AWS uses it for production, that's kind
> of fine. For 10 there is at least pgoutput, which could be used (even
> though it was meant for replication). But if a new plugin is to be
> developed for 11+, one really general purpose one, I'd say json is not a
> good choice if it is the only output it would support. json is too
> verbose, and replication, if anything, needs performance (it is both
> network heavy and serialization/deserialization is quite expensive). Why
> not, if one and only one plugin would be developed for 11+, general
> purpose, do something that is, indeed, more general, i.e., that supports
> high-performance scenarios too?
> 

IMO, for general-purpose use, adoption and ease of parsing are the more
important criteria, which is why JSON is a good option. Again, if you need
something more tuned for performance and you are fine with some hardship
in terms of parsing, what's stopping you from using pgoutput?

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Built-in plugin for logical decoding output

2017-09-25 Thread Petr Jelinek
On 23/09/17 19:01, Alvaro Hernandez wrote:
> On 23/09/17 18:42, Euler Taveira wrote:
>> 2017-09-22 19:28 GMT-03:00 Gregory Brail <gregbr...@google.com>:
>>> We have been working on a project that makes extensive use of logical
>>> replication for use inside Apigee (which is a very small part of
>>> Google):
>>>
>>> https://github.com/apigee-labs/transicator
>>>
>>> In order to do this, we had to write our own logical replication plugin
>>> because the supplied "test_decoding" plugin from the "contrib"
>>> directory was
>>> hard for us to work with. Primarily:
>>>
>> test_decoding is a proof of concept to illustrate the logical decoding
>> potential. It is not intended for production.
> 
>     However, AFAIK, AWS's DMS uses it for production purposes (see
> http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_PostgreSQL.html).
> 
> 
>> I developed wal2json [1]
>> for general use. It outputs changes in JSON. It was one of the first
>> logical decoding plugins.
>>
>>> 1) It doesn't include all the fields that we need for Transicator (most
>>> importantly we need the LSN and the 64-bit transaction ID).
>>>
>> wal2json includes both.
>>
>>> 2) It outputs a text format that is hard to parse.
>>>
>> There are a lot of JSON parsers.
>>
>>> I imagine that other users of logical decoding are facing similar
>>> problems.
>>>
>>> Would the community support the development of another plugin that is
>>> distributed as part of "contrib" that addresses these issues? I'd be
>>> happy
>>> to submit a patch, or GitHub repo, or whatever works best as an example.
>>> (Also, although Transicator uses protobuf, I'm happy to have it output a
>>> simple binary format as well.)
>>>
>> There was a prior discussion and it was suggestted that we have a
>> ready-for-production plugin in core (besides pgoutput). It was
>> suggested [1] that I submit wal2json for 11. I'm in process to clean
>> up the code and hope to submit it to CF2.

Thanks, I'll be happy to review that.

> 
>     I would be happy to see another logical decoding plugin into core
> starting on 11. However, this also poses a bit of a challenge for
> middleware implementors: you need to support one for 9.4-9.5
> (test_decoding), another for 10 (pgoutput) and maybe another for 11
> onwards. The idea of asking users to install a binary plugin is very
> unsexy, so these are the options available.

Well, test_decoding is not meant for production use anyway, no need for
middleware to support it. The pgoutput is primarily used for internal
replication purposes, which is why we need something with more
interoperability in mind in the first place. The new plugin should still
support publications etc though IMHO.

>     However, having said that, and while json is a great output format
> for interoperability, if there's a discussion on which plugin to include
> next, I'd also favor one that has some more compact representation
> format (or that supports several formats, not only json).
> 

JSON is indeed great for interoperability, if you want more compact
format, use either pgoutput or write something of your own or do
conversion to something else in your consumer. I don't think postgres
needs to provide 100 different formats out of the box when there is an
API. The JSON output does not have to be extremely chatty either btw.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: bgw_type (was Re: [HACKERS] Why does logical replication launcher set application_name?)

2017-09-25 Thread Petr Jelinek
On 25/09/17 16:45, Peter Eisentraut wrote:
> On 8/31/17 23:22, Michael Paquier wrote:
>>> One open question is how to treat a missing (empty) bgw_type.  I
>>> currently fill in bgw_name as a fallback.  We could also treat it as an
>>> error or a warning as a transition measure.
>>
>> Hm. Why not reporting an empty type string as NULL at SQL level and
>> just let it empty them? I tend to like more interfaces that report
>> exactly what is exactly registered at memory-level, because that's
>> easier to explain to users and in the documentation, as well as easier
>> to interpret and easier for module developers.
> 
> But then background workers that are not updated for, say, PG11 will not
> show anything useful in pg_stat_activity.  We should have some amount of
> backward compatibility here.
> 

Maybe the empty bgw_type could mean just "bgworker"?
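
Something along these lines is what I have in mind - just a sketch, assuming
the new bgw_type field from your patch; the helper name is made up and not
part of any posted patch:

/*
 * Sketch only: pick the label to display for a worker, falling back to a
 * generic "bgworker" when an unmodified extension never filled bgw_type.
 */
static const char *
bgworker_display_type(const BackgroundWorker *worker)
{
    if (worker->bgw_type[0] != '\0')
        return worker->bgw_type;

    return "bgworker";
}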

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication and statistics

2017-09-25 Thread Petr Jelinek
On 25/09/17 13:33, Pavel Stehule wrote:
> 
> 
> 2017-09-25 13:12 GMT+02:00 Masahiko Sawada <sawada.m...@gmail.com
> <mailto:sawada.m...@gmail.com>>:
> 
> On Mon, Sep 25, 2017 at 12:58 AM, Pavel Stehule
> <pavel.steh...@gmail.com <mailto:pavel.steh...@gmail.com>> wrote:
> > Hi
> >
> > I did trivial example of logical replication (one table, one 
> publication,
> > one subscription)
> >
> > I am little bit surprised so after some work - the replication is 
> working,
> > the statistics are empty
> >
> > #master
> > postgres=# select * from pg_stat_replication ;
> > (0 rows)
> >
> > #slave
> > postgres=# select * from pg_stat_subscription ;
> > -[ RECORD 1 ]-+-
> > subid                 | 16472
> > subname               | test_sub
> > pid                   |
> > relid                 |
> > received_lsn          |
> > last_msg_send_time    |
> > last_msg_receipt_time |
> > latest_end_lsn        |
> > latest_end_time       |
> >
> > Should be some enabled?
> >
> 
> If the subscription is disabled, the statistics of subscription is
> empty and no wal sender processes launch. The test_sub can start the
> replication by ALTER SUBSCRIPTION test_sub ENABLE.
> 
> 
> I used this subscriptions for and it was warking.


If there is no pid, the worker is not running. And if there is nothing
in pg_stat_replication on master, the walsender is not running either,
so it seems like it's not actually working.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] src/test/subscription/t/002_types.pl hanging on particular environment

2017-09-20 Thread Petr Jelinek
On 20/09/17 05:53, Tom Lane wrote:
> Craig Ringer <cr...@2ndquadrant.com> writes:
>> On 19 September 2017 at 18:04, Petr Jelinek <petr.jeli...@2ndquadrant.com>
>> wrote:
>>> If you are asking why they are not identified by the
>>> BackgroundWorkerHandle, then it's because it's private struct and can't
>>> be shared with other processes so there is no way to link the logical
>>> worker info with bgworker directly.
> 
>> I really want BackgroundWorkerHandle to be public, strong +1 from me.
> 
> I'm confused about what you think that would accomplish.  AFAICS, the
> point of BackgroundWorkerHandle is to allow the creator/requestor of
> a bgworker to verify whether or not the slot in question is still
> "owned" by its request.  This is necessarily not useful to any other
> process, since they didn't make the request.

It works pretty well as a unique identifier of the bgworker, which can be
used to check that a specific bgworker process is the one we think it is,
and that would work even across processes if it could be shared.
Bgworker slots don't help with that, as they can be replaced by a different
worker between calls; the same goes for the PID. It technically does not
have to be a public struct, just serializable/copyable, so knowing the size
of the struct ought to be enough for that.

> The thought I had in mind upthread was to get rid of logicalrep slots
> in favor of expanding the underlying bgworker slot with some additional
> fields that would carry whatever extra info we need about a logicalrep
> worker.  Such fields could also be repurposed for additional info about
> other kinds of bgworkers, when (not if) we find out we need that.
> 

Well, that would be nice, but those slots are in shared memory, so it's
not exactly easy to allow other modules to add arbitrary data there.
Maybe we could add just one field that holds a DSM handle (or DSA handle)
pointing to whatever dynamic shared memory is allocated for that slot. I
don't know enough about the inner workings of DSM to say for sure whether
that's feasible or not though.
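
To be clear, I mean something as small as this - purely illustrative, the
real BackgroundWorkerSlot in bgworker.c has a different layout and the extra
field is hypothetical:

typedef struct BackgroundWorkerSlot
{
    bool        in_use;
    pid_t       pid;
    BackgroundWorker worker;
    /* hypothetical: per-worker dynamic shared memory with module data */
    dsm_handle  extra_data;
} BackgroundWorkerSlot;

Modules like logical replication could then keep whatever they need in that
segment instead of maintaining their own array of slots.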

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] src/test/subscription/t/002_types.pl hanging on particular environment

2017-09-19 Thread Petr Jelinek
On 19/09/17 16:30, Amit Kapila wrote:
> On Tue, Sep 19, 2017 at 3:34 PM, Petr Jelinek
> <petr.jeli...@2ndquadrant.com> wrote:
>> n 18/09/17 18:42, Tom Lane wrote:
>>> Amit Kapila <amit.kapil...@gmail.com> writes:
>>>> On Mon, Sep 18, 2017 at 7:46 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> So, frankly, I think we would be best off losing the "logical rep
>>> worker slot" business altogether, and making do with just bgworker
>>> slots.  The alternative is getting the postmaster involved in cleaning
>>> up lrep slots as well as bgworker slots, and I'm going to resist
>>> any such idea strongly.  That way madness lies, or at least an
>>> unreliable postmaster --- if we need lrep slots, what other forty-seven
>>> data structures are we going to need the postmaster to clean up
>>> in the future?
>>>
>>> I haven't studied the code enough to know why it grew lrep worker
>>> slots in the first place.  Maybe someone could explain?
>>>
>>
>> I am not quite sure I understand this question, we need to store
>> additional info about workers in shared memory so we need slots for that.
>>
>> If you are asking why they are not identified by the
>> BackgroundWorkerHandle, then it's because it's private struct and can't
>> be shared with other processes so there is no way to link the logical
>> worker info with bgworker directly.
>>
> 
> Not sure, but can't we identify the actual worker slot if we just have
> pid of background worker?  IIUC, you need access to
> BackgroundWorkerHandle by other processes, so that you can stop them
> like in case of drop subscription command.  If so, then, might be
> storing pid can serve the purpose.
> 

I don't think the PID can be trusted to belong to the same process between
calls; if it could, we would not need BackgroundWorkerHandle.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] src/test/subscription/t/002_types.pl hanging on particular environment

2017-09-19 Thread Petr Jelinek
On 19/09/17 15:08, Amit Kapila wrote:
> On Tue, Sep 19, 2017 at 6:29 PM, Petr Jelinek
> <petr.jeli...@2ndquadrant.com> wrote:
>> On 19/09/17 14:33, Amit Kapila wrote:
>>> On Tue, Sep 19, 2017 at 3:34 PM, Petr Jelinek
>>> <petr.jeli...@2ndquadrant.com> wrote:
>>>> n 18/09/17 18:42, Tom Lane wrote:
>>>>
>>>>> So, frankly, I think we would be best off losing the "logical rep
>>>>> worker slot" business altogether, and making do with just bgworker
>>>>> slots.
>>>
>>> I think that would be cleaner as compared to what we have now.
>>>
>>>>>  The alternative is getting the postmaster involved in cleaning
>>>>> up lrep slots as well as bgworker slots, and I'm going to resist
>>>>> any such idea strongly.  That way madness lies, or at least an
>>>>> unreliable postmaster --- if we need lrep slots, what other forty-seven
>>>>> data structures are we going to need the postmaster to clean up
>>>>> in the future?
>>>>>
>>>>> I haven't studied the code enough to know why it grew lrep worker
>>>>> slots in the first place.  Maybe someone could explain?
>>>>>
>>>>
>>>> I am not quite sure I understand this question, we need to store
>>>> additional info about workers in shared memory so we need slots for that.
>>>>
>>>
>>> Yeah, but you could have used the way we do for parallel query where
>>> we setup dsm and share all such information.  You can check the logic
>>> of execparallel.c and parallel.c to see how we do all such stuff for
>>> parallel query.
>>>
>>
>> I don't understand how DSM helps in this case (except perhaps the memory
>> allocation being dynamic rather than static). We still need this shared
>> memory area to be accessible from other places than launcher (the
>> paralllel query has one lead which manages everything, that's not true
>> for logical replication)
>>
> 
> I am not much aware of this area.  Can you explain what other usages
> it has apart from in the process that has launched the worker and in
> worker itself?
> 

The subscription "DDL" commands and monitoring functions need access to
that info. Note that the synchronization workers are not even started by
the launcher (I experimented with doing that, but it slows down the
synchronization process quite considerably), so it can't manage them unless
the handle is accessible via IPC.

>> and we need it to survive restart of launcher
>> for example, so all the current issues will stay.
>>
> 
> Do you mean to say that you want to save this part of shared memory
> across restart of launcher?
> 

Yes. There is no reason why replication should stop because of a launcher
restart.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] src/test/subscription/t/002_types.pl hanging on particular environment

2017-09-19 Thread Petr Jelinek
On 19/09/17 14:33, Amit Kapila wrote:
> On Tue, Sep 19, 2017 at 3:34 PM, Petr Jelinek
> <petr.jeli...@2ndquadrant.com> wrote:
>> n 18/09/17 18:42, Tom Lane wrote:
>>
>>> So, frankly, I think we would be best off losing the "logical rep
>>> worker slot" business altogether, and making do with just bgworker
>>> slots.
> 
> I think that would be cleaner as compared to what we have now.
> 
>>>  The alternative is getting the postmaster involved in cleaning
>>> up lrep slots as well as bgworker slots, and I'm going to resist
>>> any such idea strongly.  That way madness lies, or at least an
>>> unreliable postmaster --- if we need lrep slots, what other forty-seven
>>> data structures are we going to need the postmaster to clean up
>>> in the future?
>>>
>>> I haven't studied the code enough to know why it grew lrep worker
>>> slots in the first place.  Maybe someone could explain?
>>>
>>
>> I am not quite sure I understand this question, we need to store
>> additional info about workers in shared memory so we need slots for that.
>>
> 
> Yeah, but you could have used the way we do for parallel query where
> we setup dsm and share all such information.  You can check the logic
> of execparallel.c and parallel.c to see how we do all such stuff for
> parallel query.
> 

I don't understand how DSM helps in this case (except perhaps that the
memory allocation is dynamic rather than static). We still need this shared
memory area to be accessible from places other than the launcher (parallel
query has one leader which manages everything; that's not true for logical
replication), and we need it to survive a restart of the launcher, for
example, so all the current issues will stay.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] src/test/subscription/t/002_types.pl hanging on particular environment

2017-09-19 Thread Petr Jelinek
On 18/09/17 18:42, Tom Lane wrote:
> Amit Kapila <amit.kapil...@gmail.com> writes:
>> On Mon, Sep 18, 2017 at 7:46 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> The subscriber log includes
>>> 2017-09-18 08:43:08.240 UTC [15672] WARNING:  out of background worker slots
>>> Maybe that's harmless, but I'm suspicious that it's a smoking gun.
> 
>> I have noticed while fixing the issue you are talking that this path
>> is also susceptible to such problems.  In
>> WaitForReplicationWorkerAttach(), it relies on
>> GetBackgroundWorkerPid() to know the status of the worker which won't
>> give the correct status in case of fork failure.  The patch I have
>> posted has a fix for the issue,
> 
> Your GetBackgroundWorkerPid fix looks good as far as it goes, but
> I feel that WaitForReplicationWorkerAttach's problems go way deeper
> than that --- in fact, that function should not exist at all IMO.
> 
> The way that launcher.c is set up, it's relying on either the calling
> process or the called process to free the logicalrep slot when done.
> The called process might never exist at all, so the second half of
> that is obviously unreliable; but WaitForReplicationWorkerAttach
> can hardly be relied on either, seeing it's got a big fat exit-on-
> SIGINT in it.  I thought about putting a PG_TRY, or more likely
> PG_ENSURE_ERROR_CLEANUP, into it, but that doesn't fix the basic
> problem: you can't assume that the worker isn't ever going to start,
> just because you got a signal that means you shouldn't wait anymore.
> 
> Now, this rickety business might be fine if it were only about the
> states of the caller and called processes, but we've got long-lived
> shared data involved (ie the worker slots); failing to clean it up
> is not an acceptable outcome.
> 

We'll clean it up eventually if somebody requests creation of a new
logical replication worker and that slot is needed. See the "garbage
collection" part in logicalrep_worker_launch(). I know it's not an ideal
solution, but it's the best one I could think of given the below.

> So, frankly, I think we would be best off losing the "logical rep
> worker slot" business altogether, and making do with just bgworker
> slots.  The alternative is getting the postmaster involved in cleaning
> up lrep slots as well as bgworker slots, and I'm going to resist
> any such idea strongly.  That way madness lies, or at least an
> unreliable postmaster --- if we need lrep slots, what other forty-seven
> data structures are we going to need the postmaster to clean up
> in the future?
> 
> I haven't studied the code enough to know why it grew lrep worker
> slots in the first place.  Maybe someone could explain?
> 

I am not quite sure I understand this question; we need to store
additional info about workers in shared memory, so we need slots for that.

If you are asking why they are not identified by the
BackgroundWorkerHandle, it's because that is a private struct and can't
be shared with other processes, so there is no way to link the logical
worker info with the bgworker directly. I mentioned that I was not happy
about this during the original development, but it didn't gather any
attention, which I took to mean that nobody minds.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] More replication race conditions

2017-08-27 Thread Petr Jelinek
On 28/08/17 01:36, Michael Paquier wrote:
> On Sun, Aug 27, 2017 at 6:32 PM, Petr Jelinek
> <petr.jeli...@2ndquadrant.com> wrote:
>> Attached should fix this.
> 
> +$node_master->poll_query_until('postgres',
> +"SELECT EXISTS (SELECT 1 FROM pg_replication_slots WHERE slot_name =
> 'test_slot' AND active_pid IS NULL)"
> +)
> +  or die "slot never became inactive";
> +
>  $stdout_recv = $node_master->pg_recvlogical_upto(
> I am wondering if a similar check should actually go into
> pg_recvlogical_upto() instead. I tend to think that we've learned
> things the hard way with 3043c1dd and such.
> 

Putting it there can lead to tests that take forever to finish if
written incorrectly. This way they at least just fail.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] More replication race conditions

2017-08-27 Thread Petr Jelinek
On 27/08/17 04:32, Noah Misch wrote:
> On Fri, Aug 25, 2017 at 12:09:00PM +0200, Petr Jelinek wrote:
>> On 24/08/17 19:54, Tom Lane wrote:
>>> sungazer just failed with
>>>
>>> pg_recvlogical exited with code '256', stdout '' and stderr 
>>> 'pg_recvlogical: could not send replication command "START_REPLICATION SLOT 
>>> "test_slot" LOGICAL 0/0 ("include-xids" '0', "skip-empty-xacts" '1')": 
>>> ERROR:  replication slot "test_slot" is active for PID 8913148
>>> pg_recvlogical: disconnected
>>> ' at 
>>> /home/nm/farm/gcc64/HEAD/pgsql.build/src/test/recovery/../../../src/test/perl/PostgresNode.pm
>>>  line 1657.
>>>
>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer=2017-08-24%2015%3A16%3A10
>>>
>>> Looks like we're still not there on preventing replication startup
>>> race conditions.
>>
>> Hmm, that looks like "by design" behavior. Slot acquiring will throw
>> error if the slot is already used by somebody else (slots use their own
>> locking mechanism that does not wait). In this case it seems the
>> walsender which was using slot for previous previous step didn't finish
>> releasing the slot by the time we start new command. We can work around
>> this by changing the test to wait perhaps.
> 
> [Action required within three days.  This is a generic notification.]
> 
> The above-described topic is currently a PostgreSQL 10 open item.  Simon,
> since you committed the patch believed to have created it, you own this open
> item.  If some other commit is more relevant or if this does not belong as a
> v10 open item, please let us know.  Otherwise, please observe the policy on
> open item ownership[1] and send a status update within three calendar days of
> this message.  Include a date for your subsequent status update.  Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping v10.  Consequently, I will appreciate your efforts
> toward speedy resolution.  Thanks.
> 
> [1] 
> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
> 

Attached should fix this.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
>From d66caa3532558828ecb89b4f153cc1f29332b918 Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmo...@pjmodos.net>
Date: Sun, 27 Aug 2017 11:28:49 +0200
Subject: [PATCH] Fix race condition in logical decoding tap test

---
 src/test/recovery/t/006_logical_decoding.pl | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/test/recovery/t/006_logical_decoding.pl b/src/test/recovery/t/006_logical_decoding.pl
index 4a90e9a..8b35bc8 100644
--- a/src/test/recovery/t/006_logical_decoding.pl
+++ b/src/test/recovery/t/006_logical_decoding.pl
@@ -78,6 +78,11 @@ chomp($stdout_recv);
 is($stdout_recv, $expected,
 	'got same expected output from pg_recvlogical decoding session');
 
+$node_master->poll_query_until('postgres',
+"SELECT EXISTS (SELECT 1 FROM pg_replication_slots WHERE slot_name = 'test_slot' AND active_pid IS NULL)"
+)
+  or die "slot never became inactive";
+
 $stdout_recv = $node_master->pg_recvlogical_upto(
 	'postgres', 'test_slot', $endpos, 10,
 	'include-xids' => '0',
-- 
2.7.4


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: global index

2017-08-25 Thread Petr Jelinek
On 25/08/17 10:28, Chris Travers wrote:
> 
> 
> On Thu, Aug 24, 2017 at 9:44 PM, Andres Freund <and...@anarazel.de
> <mailto:and...@anarazel.de>> wrote:
> 
> Hi,
> 
> On 2017-08-18 12:12:58 +0300, Ildar Musin wrote:
> > While we've been developing pg_pathman extension one of the most 
> frequent
> > questions we got from our users was about global index support. We 
> cannot
> > provide it within an extension. And I couldn't find any recent 
> discussion
> > about someone implementing it. So I'm thinking about giving it a shot 
> and
> > start working on a patch for postgres.
> 
> FWIW, I personally think for constraints the better approach is to make
> the constraint checking code cope with having to check multiple
> indexes. Initially by just checking all indexes, over the longer term
> perhaps pruning the set of to-be-checked indexes based on the values in
> the partition key if applicable.   The problem with creating huge global
> indexes is that you give away some the major advantages of partitioning:
> - dropping partitions now is slow / leaves a lof of garbage again
> - there's no way you can do this with individual partitions being remote
>   or such
> - there's a good chunk of locality loss in global indexes
> 
> The logic we have for exclusion constraints checking can essentially be
> extended to do uniqueness checking over multiple partitions. Depending
> on the desired deadlock behaviour one might end up doing speculative
> insertions in addition.  The foreign key constraint checking is fairly
> simple, essentially one "just" need to remove the ONLY from the
> generated check query.
> 

+1 (or +as much as I am allowed to get away with really ;) )

> 
> To be clear, this would still require a high-level concept of a global
> index and the only question is whether it gets stored as multiple
> partitions against partitioned tables vs stored in one giant index, right?
> 
No, just global constraints. For example, if you consider a unique index
to be an implementation detail of a unique constraint, there is nothing
stopping us from using multiple such indexes (one per partition) as the
implementation detail of a single global unique constraint. No need for a
global index at all.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] More replication race conditions

2017-08-25 Thread Petr Jelinek
On 24/08/17 19:54, Tom Lane wrote:
> sungazer just failed with
> 
> pg_recvlogical exited with code '256', stdout '' and stderr 'pg_recvlogical: 
> could not send replication command "START_REPLICATION SLOT "test_slot" 
> LOGICAL 0/0 ("include-xids" '0', "skip-empty-xacts" '1')": ERROR:  
> replication slot "test_slot" is active for PID 8913148
> pg_recvlogical: disconnected
> ' at 
> /home/nm/farm/gcc64/HEAD/pgsql.build/src/test/recovery/../../../src/test/perl/PostgresNode.pm
>  line 1657.
> 
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer=2017-08-24%2015%3A16%3A10
> 
> Looks like we're still not there on preventing replication startup
> race conditions.

Hmm, that looks like "by design" behavior. Slot acquisition will throw an
error if the slot is already used by somebody else (slots use their own
locking mechanism that does not wait). In this case it seems the
walsender which was using the slot for the previous step didn't finish
releasing it by the time we started the new command. We can perhaps work
around this by changing the test to wait.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [COMMITTERS] pgsql: Don't force-assign transaction id when exporting a snapshot.

2017-08-12 Thread Petr Jelinek
On 14/06/17 20:57, Andres Freund wrote:
> Don't force-assign transaction id when exporting a snapshot.
> 
> Previously we required every exported transaction to have an xid
> assigned. That was used to check that the exporting transaction is
> still running, which in turn is needed to guarantee that that
> necessary rows haven't been removed in between exporting and importing
> the snapshot.
> 
> The exported xid caused unnecessary problems with logical decoding,
> because slot creation has to wait for all concurrent xid to finish,
> which in turn serializes concurrent slot creation.   It also
> prohibited snapshots to be exported on hot-standby replicas.
> 
> Instead export the virtual transactionid, which avoids the unnecessary
> serialization and the inability to export snapshots on standbys. This
> changes the file name of the exported snapshot, but since we never
> documented what that one means, that seems ok.
> 

This commit has the side effect of making it possible to export
snapshots on standbys. That makes it possible to do pg_dump -j on a
standby with a consistent snapshot. Here is a one-line patch (+ doc update)
which allows doing that when pg_dumping from PG10+.

I also wonder if it should be mentioned in the release notes. If the
attached patch makes it into PG10, it would be a no-brainer to mention
it as a feature under the pg_dump section, but for exporting snapshots
alone I am not sure.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
From e2bca9b5271dbf29e5ab68d99c409c4f7dd752db Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmo...@pjmodos.net>
Date: Sat, 12 Aug 2017 13:16:48 +0200
Subject: [PATCH] Allow pg_dump -j on standby when dumping from PG10+

PG10 adds support for exporting snapshots on a standby; allow pg_dump to
use this feature when a consistent parallel dump has been requested.
---
 doc/src/sgml/ref/pg_dump.sgml | 18 ++
 src/bin/pg_dump/pg_dump.c |  2 +-
 2 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index bafa031..3c088f3 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -337,14 +337,16 @@ PostgreSQL documentation
 but to abort the dump.


-For a consistent backup, the database server needs to support synchronized snapshots,
-a feature that was introduced in PostgreSQL 9.2. With this
-feature, database clients can ensure they see the same data set even though they use
-different connections. pg_dump -j uses multiple database
-connections; it connects to the database once with the master process and
-once again for each worker job. Without the synchronized snapshot feature, the
-different worker jobs wouldn't be guaranteed to see the same data in each connection,
-which could lead to an inconsistent backup.
+For a consistent backup, the database server needs to support
+synchronized snapshots, a feature that was introduced in
+PostgreSQL 9.2 for primary servers and 10
+for standbys. With this feature, database clients can ensure they see
+the same data set even though they use different connections.
+pg_dump -j uses multiple database connections; it
+connects to the database once with the master process and once again
+for each worker job. Without the synchronized snapshot feature, the
+different worker jobs wouldn't be guaranteed to see the same data in
+each connection, which could lead to an inconsistent backup.


 If you want to run a parallel dump of a pre-9.2 server, you need to make sure that the
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 37cb7cd..8ad9773 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -1131,7 +1131,7 @@ setup_connection(Archive *AH, const char *dumpencoding,
 			 AH->remoteVersion >= 90200 &&
 			 !dopt->no_synchronized_snapshots)
 	{
-		if (AH->isStandby)
+		if (AH->isStandby && AH->remoteVersion < 10)
 			exit_horribly(NULL,
 		  "Synchronized snapshots are not supported on standby servers.\n"
 		  "Run with --no-synchronized-snapshots instead if you do not need\n"
-- 
2.7.4


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Walsender timeouts and large transactions

2017-08-09 Thread Petr Jelinek
On 02/08/17 19:35, Yura Sokolov wrote:
> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:   not tested
> Spec compliant:   not tested
> Documentation:not tested
> 
> There is no check for (last_reply_timestamp <= 0 || wal_sender_timeout <= 0) 
> as in other places
> (in WalSndKeepaliveIfNecessary for example).
> 
> I don't think moving update of 'now' down to end of loop body is correct:
> there are calls to ProcessConfigFile with SyncRepInitConfig, 
> ProcessRepliesIfAny that can
> last non-negligible time. It could lead to over sleeping due to larger 
> computed sleeptime.
> Though I could be mistaken.
> 
> I'm not sure about moving `if (!pg_is_send_pending())` in a body loop after 
> WalSndKeepaliveIfNecessary.
> Is it necessary? But it looks harmless at least.
> 

We also need to do the actual timeout handling so that the timeout is not
deferred to the end of the transaction (which is why I moved the `if
(!pq_is_send_pending())` check under the WalSndCheckTimeOut() and
WalSndKeepaliveIfNecessary() calls).

> Could patch be reduced to check after first `if (!pg_is_sendpending())` ? 
> like:
> 
>   if (!pq_is_send_pending())
> - return;
> + {
> + if (last_reply_timestamp <= 0 || wal_sender_timeout <= 0)
> + {
> + CHECK_FOR_INTERRUPTS();
> + return;
> + }
> + if (now <= TimestampTzPlusMilliseconds(last_reply_timestamp, 
> wal_sender_timeout / 2))
> + return;
> + }
> 
> If not, what problem prevents?

We should do CHECK_FOR_INTERRUPTS() independently of pq_is_send_pending()
so that it's possible to stop the walsender while it's processing a large
transaction.
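
To be concrete, the fast path I'm aiming for looks roughly like this
(simplified sketch, not the exact patch; last_reply_timestamp,
wal_sender_timeout and the pq/WalSnd helpers are the existing walsender
ones):

    TimestampTz now = GetCurrentTimestamp();

    /* Always reachable, even mid-transaction with nothing pending. */
    CHECK_FOR_INTERRUPTS();

    /* Try to flush pending output to the client */
    if (pq_flush_if_writable() != 0)
        WalSndShutdown();

    /*
     * Take the fast exit only if nothing is pending and we are not within
     * half of wal_sender_timeout since the last reply; otherwise fall
     * through to the slow loop which does WalSndCheckTimeOut() and
     * WalSndKeepaliveIfNecessary() instead of deferring them to the end
     * of the transaction.
     */
    if (!pq_is_send_pending() &&
        (wal_sender_timeout <= 0 ||
         now < TimestampTzPlusMilliseconds(last_reply_timestamp,
                                           wal_sender_timeout / 2)))
        return;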

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] More race conditions in logical replication

2017-07-25 Thread Petr Jelinek
On 25/07/17 01:33, Alvaro Herrera wrote:
> Alvaro Herrera wrote:
>> I'm back at looking into this again, after a rather exhausting week.  I
>> think I have a better grasp of what is going on in this code now,
> 
> Here's an updated patch, which I intend to push tomorrow.
> 

I think the ConditionVariablePrepareToSleep() call in
ReplicationSlotAcquire() needs to be done inside the mutex lock;
otherwise there is a tiny race condition where the process that has the
slot acquired might release it between the mutex unlock and the
ConditionVariablePrepareToSleep() call, which means we'll never get
signaled and ConditionVariableSleep() will wait forever.

Otherwise the patch looks good to me.

As a side note, ConditionVariablePrepareToSleep()'s comment could be
improved: currently it says the only advantage is that we skip the
double test at the beginning of ConditionVariableSleep(). But that's not
true; it's essential for preventing race conditions like the one above,
because it puts the current process onto the wait list, so we can be sure
it will be signaled on broadcast once ConditionVariablePrepareToSleep()
has been called.
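
To make the ordering concrete, what I mean is roughly this (sketch only,
not the actual patch; active_cv and the wait event name follow the patch
being reviewed, and the fragment assumes the surrounding retry label and
variables from ReplicationSlotAcquire()):

    /*
     * Be on the wait list before we can miss the broadcast: whether this
     * happens under the mutex or just before taking it, the essential
     * thing is that it happens no later than the active_pid check, so a
     * concurrent ReplicationSlotRelease() cannot clear active_pid and
     * broadcast in between our check and our sleep.
     */
    ConditionVariablePrepareToSleep(&slot->active_cv);

    SpinLockAcquire(&slot->mutex);
    active_pid = slot->active_pid;
    if (active_pid == 0)
        slot->active_pid = MyProcPid;
    SpinLockRelease(&slot->mutex);

    if (active_pid != 0 && active_pid != MyProcPid)
    {
        /* somebody else still owns the slot; wait for release and retry */
        ConditionVariableSleep(&slot->active_cv,
                               WAIT_EVENT_REPLICATION_SLOT_DROP);
        ConditionVariableCancelSleep();
        goto retry;
    }

    ConditionVariableCancelSleep();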

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] More race conditions in logical replication

2017-07-07 Thread Petr Jelinek
On 06/07/17 18:20, Petr Jelinek wrote:
> On 06/07/17 17:33, Petr Jelinek wrote:
>> On 03/07/17 01:54, Tom Lane wrote:
>>> I noticed a recent failure that looked suspiciously like a race condition:
>>>
>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet=2017-07-02%2018%3A02%3A07
>>>
>>> The critical bit in the log file is
>>>
>>> error running SQL: 'psql::1: ERROR:  could not drop the replication 
>>> slot "tap_sub" on publisher
>>> DETAIL:  The error was: ERROR:  replication slot "tap_sub" is active for 
>>> PID 3866790'
>>> while running 'psql -XAtq -d port=59543 host=/tmp/QpCJtafT7R 
>>> dbname='postgres' -f - -v ON_ERROR_STOP=1' with sql 'DROP SUBSCRIPTION 
>>> tap_sub' at 
>>> /home/nm/farm/xlc64/HEAD/pgsql.build/src/test/subscription/../../../src/test/perl/PostgresNode.pm
>>>  line 1198.
>>>
>>> After poking at it a bit, I found that I can cause several different
>>> failures of this ilk in the subscription tests by injecting delays at
>>> the points where a slot's active_pid is about to be cleared, as in the
>>> attached patch (which also adds some extra printouts for debugging
>>> purposes; none of that is meant for commit).  It seems clear that there
>>> is inadequate interlocking going on when we kill and restart a logical
>>> rep worker: we're trying to start a new one before the old one has
>>> gotten out of the slot.
>>>
>>
>> Thanks for the test case.
>>
>> It's not actually that we start new worker fast. It's that we try to
>> drop the slot right after worker process was killed but if the code that
>> clears slot's active_pid takes too long, it still looks like it's being
>> used. I am quite sure it's possible to make this happen for physical
>> replication as well when using slots.
>>
>> This is not something that can be solved by locking on subscriber. ISTM
>> we need to make pg_drop_replication_slot behave more nicely, like making
>> it wait for the slot to become available (either by default or as an
>> option).
>>
>> I'll have to think about how to do it without rewriting half of
>> replication slots or reimplementing lock queue though because the
>> replication slots don't use normal catalog access so there is no object
>> locking with wait queue. We could use some latch wait with small timeout
>> but that seems ugly as that function can be called by user without
>> having dropped the slot before so the wait can be quite long (as in
>> "forever").
>>
> 
> Naive fix would be something like attached. But as I said, it's not
> exactly pretty.
> 

So the best idea I could come up with is to make use of the new condition
variable API. That lets us wait on a variable which can be set per slot.

It's not backportable, however; I am not sure how big a problem that is
considering the lack of complaints until now (maybe we could backport
using the ugly timeout version?).

The attached patch is a prototype of such a solution and there are still
some race conditions (the variable can get signaled before the waiting
process starts sleeping on it). I am mainly sending it to get feedback on
the approach.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
From b72ea52c865b2d7f0d7d29d0834d71e1ec33d54a Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmo...@pjmodos.net>
Date: Thu, 6 Jul 2017 18:16:44 +0200
Subject: [PATCH] Wait for slot to become free when dropping it

---
 src/backend/replication/logical/logicalfuncs.c |  2 +-
 src/backend/replication/slot.c | 43 +-
 src/backend/replication/slotfuncs.c|  2 +-
 src/backend/replication/walsender.c|  6 ++--
 src/include/replication/slot.h |  8 +++--
 5 files changed, 46 insertions(+), 15 deletions(-)

diff --git a/src/backend/replication/logical/logicalfuncs.c b/src/backend/replication/logical/logicalfuncs.c
index 363ca82..a3ba2b1 100644
--- a/src/backend/replication/logical/logicalfuncs.c
+++ b/src/backend/replication/logical/logicalfuncs.c
@@ -244,7 +244,7 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo fcinfo, bool confirm, bool bin
 	else
 		end_of_wal = GetXLogReplayRecPtr();
 
-	ReplicationSlotAcquire(NameStr(*name));
+	ReplicationSlotAcquire(NameStr(*name), true);
 
 	PG_TRY();
 	{
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index dc7de20..2993bb9 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -46,6 +46,7 @@
 #include "pgstat.h"
 #include "replication/slot.h"
 #include "storage/fd.h"
+#

[HACKERS] Subscription code improvements

2017-07-07 Thread Petr Jelinek
Hi,

I have done some review of subscription handling (well, self-review) and
here is the result (it's a slightly improved version from another
thread [1]).

I split it into several patches:

0001 - Makes the subscription worker exit nicely when the subscription is
missing (i.e. it was removed while the worker was starting) and also makes
the disabled message look the same as the message emitted when the disabled
state is detected while the worker is running. This is mostly just nicer
behavior for the user (i.e. no unexpected errors in the log when you have
just disabled the subscription).

0002 - This is a bugfix - the sync worker should exit when it is waiting
for apply and the apply worker dies; otherwise there is a possibility of
not being correctly synchronized.

0003 - Splits the schizophrenic SetSubscriptionRelState function into two
functions which don't try to do a broken upsert and instead report proper
errors.

0004 - Solves the issue which Masahiko Sawada reported [2] about ALTER
SUBSCRIPTION's handling of workers not being transactional - we now kill
workers transactionally (and we do the same when possible in
DROP SUBSCRIPTION).

0005 - Bugfix to allow syscache access to subscriptions without being
connected to a database - this should work since pg_subscription is a
pinned catalog; it wasn't caught before because nothing in core is using
it (yet).

0006 - Makes the locking of subscriptions more granular (this depends on
0005). This removes the ugly AccessExclusiveLock on the pg_subscription
catalog from DROP SUBSCRIPTION and also solves some potential race
conditions between the launcher, ALTER SUBSCRIPTION and
process_syncing_tables_for_apply(). It also simplifies memory handling in
the launcher as well as the logicalrep_worker_stop() function. This
basically makes subscriptions behave like every other object in the
database in terms of locking.

Only 0002, 0004 and 0005 are actual bug fixes, but I'd still like to get
it all into PG10, especially since the locking now behaves really
differently from everything else, and that does not seem like a good idea.

[1]
https://www.postgresql.org/message-id/flat/3e06c16c-f441-c606-985c-6d6239097...@2ndquadrant.com
[2]
https://www.postgresql.org/message-id/flat/cad21aobd4t2rwtiwd8yfxd+q+m9swx50yjqt5ibj2mzebvn...@mail.gmail.com

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
From 4c1ef64493ee930dfde3aa787c454a5d68ac3efd Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmo...@pjmodos.net>
Date: Thu, 6 Jul 2017 23:42:56 +0200
Subject: [PATCH 1/6] Improve messaging during logical replication worker
 startup

---
 src/backend/replication/logical/worker.c | 23 +++
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 0d48dfa..2e4099c 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -1527,24 +1527,31 @@ ApplyWorkerMain(Datum main_arg)
 		 ALLOCSET_DEFAULT_SIZES);
 	StartTransactionCommand();
 	oldctx = MemoryContextSwitchTo(ApplyContext);
-	MySubscription = GetSubscription(MyLogicalRepWorker->subid, false);
+	MySubscription = GetSubscription(MyLogicalRepWorker->subid, true);
+	if (!MySubscription)
+	{
+		ereport(LOG,
+(errmsg("logical replication apply worker for subscription %u will "
+		"stop because the subscription was removed",
+		MyLogicalRepWorker->subid)));
+		proc_exit(0);
+	}
 	MySubscriptionValid = true;
 	MemoryContextSwitchTo(oldctx);
 
-	/* Setup synchronous commit according to the user's wishes */
-	SetConfigOption("synchronous_commit", MySubscription->synccommit,
-	PGC_BACKEND, PGC_S_OVERRIDE);
-
 	if (!MySubscription->enabled)
 	{
 		ereport(LOG,
-(errmsg("logical replication apply worker for subscription \"%s\" will not "
-		"start because the subscription was disabled during startup",
+(errmsg("logical replication apply worker for subscription \"%s\" will "
+		"stop because the subscription was disabled",
 		MySubscription->name)));
-
 		proc_exit(0);
 	}
 
+	/* Setup synchronous commit according to the user's wishes */
+	SetConfigOption("synchronous_commit", MySubscription->synccommit,
+	PGC_BACKEND, PGC_S_OVERRIDE);
+
 	/* Keep us informed about subscription changes. */
 	CacheRegisterSyscacheCallback(SUBSCRIPTIONOID,
 				  subscription_change_cb,
-- 
2.7.4

From b5d4d9a130658859bcf6e21ca3bed131dbdddb57 Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmo...@pjmodos.net>
Date: Fri, 7 Jul 2017 00:04:43 +0200
Subject: [PATCH 2/6] Exit in sync worker if relation was removed during
 startup

---
 src/backend/replication/logical/tablesync.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 32ab

Re: [HACKERS] More race conditions in logical replication

2017-07-06 Thread Petr Jelinek
On 06/07/17 17:33, Petr Jelinek wrote:
> On 03/07/17 01:54, Tom Lane wrote:
>> I noticed a recent failure that looked suspiciously like a race condition:
>>
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet=2017-07-02%2018%3A02%3A07
>>
>> The critical bit in the log file is
>>
>> error running SQL: 'psql::1: ERROR:  could not drop the replication 
>> slot "tap_sub" on publisher
>> DETAIL:  The error was: ERROR:  replication slot "tap_sub" is active for PID 
>> 3866790'
>> while running 'psql -XAtq -d port=59543 host=/tmp/QpCJtafT7R 
>> dbname='postgres' -f - -v ON_ERROR_STOP=1' with sql 'DROP SUBSCRIPTION 
>> tap_sub' at 
>> /home/nm/farm/xlc64/HEAD/pgsql.build/src/test/subscription/../../../src/test/perl/PostgresNode.pm
>>  line 1198.
>>
>> After poking at it a bit, I found that I can cause several different
>> failures of this ilk in the subscription tests by injecting delays at
>> the points where a slot's active_pid is about to be cleared, as in the
>> attached patch (which also adds some extra printouts for debugging
>> purposes; none of that is meant for commit).  It seems clear that there
>> is inadequate interlocking going on when we kill and restart a logical
>> rep worker: we're trying to start a new one before the old one has
>> gotten out of the slot.
>>
> 
> Thanks for the test case.
> 
> It's not actually that we start new worker fast. It's that we try to
> drop the slot right after worker process was killed but if the code that
> clears slot's active_pid takes too long, it still looks like it's being
> used. I am quite sure it's possible to make this happen for physical
> replication as well when using slots.
> 
> This is not something that can be solved by locking on subscriber. ISTM
> we need to make pg_drop_replication_slot behave more nicely, like making
> it wait for the slot to become available (either by default or as an
> option).
> 
> I'll have to think about how to do it without rewriting half of
> replication slots or reimplementing lock queue though because the
> replication slots don't use normal catalog access so there is no object
> locking with wait queue. We could use some latch wait with small timeout
> but that seems ugly as that function can be called by user without
> having dropped the slot before so the wait can be quite long (as in
> "forever").
> 

A naive fix would be something like the attached. But as I said, it's not
exactly pretty.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
>From 6d016a3fad8635c2366bcf5438d49f41c905621c Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmo...@pjmodos.net>
Date: Thu, 6 Jul 2017 18:16:44 +0200
Subject: [PATCH] Wait for slot to become free when dropping it

---
 src/backend/replication/logical/logicalfuncs.c |  2 +-
 src/backend/replication/slot.c | 51 ++
 src/backend/replication/slotfuncs.c|  2 +-
 src/backend/replication/walsender.c|  6 +--
 src/include/replication/slot.h |  4 +-
 5 files changed, 50 insertions(+), 15 deletions(-)

diff --git a/src/backend/replication/logical/logicalfuncs.c b/src/backend/replication/logical/logicalfuncs.c
index 363ca82..a3ba2b1 100644
--- a/src/backend/replication/logical/logicalfuncs.c
+++ b/src/backend/replication/logical/logicalfuncs.c
@@ -244,7 +244,7 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo fcinfo, bool confirm, bool bin
 	else
 		end_of_wal = GetXLogReplayRecPtr();
 
-	ReplicationSlotAcquire(NameStr(*name));
+	ReplicationSlotAcquire(NameStr(*name), true);
 
 	PG_TRY();
 	{
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index dc7de20..ab115f4 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -46,6 +46,7 @@
 #include "pgstat.h"
 #include "replication/slot.h"
 #include "storage/fd.h"
+#include "storage/ipc.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/builtins.h"
@@ -323,7 +324,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
  * Find a previously created slot and mark it as used by this backend.
  */
 void
-ReplicationSlotAcquire(const char *name)
+ReplicationSlotAcquire(const char *name, bool nowait)
 {
 	ReplicationSlot *slot = NULL;
 	int			i;
@@ -331,6 +332,8 @@ ReplicationSlotAcquire(const char *name)
 
 	Assert(MyReplicationSlot == NULL);
 
+retry:
+
 	/* Search for the named slot and mark it active if we find it. */
 	LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
 	for (i = 0; i < max_replication_slots; i++)
@@ -350,16 +353,48 @@ ReplicationSlotA

Re: [HACKERS] More race conditions in logical replication

2017-07-06 Thread Petr Jelinek
On 03/07/17 01:54, Tom Lane wrote:
> I noticed a recent failure that looked suspiciously like a race condition:
> 
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet=2017-07-02%2018%3A02%3A07
> 
> The critical bit in the log file is
> 
> error running SQL: 'psql::1: ERROR:  could not drop the replication 
> slot "tap_sub" on publisher
> DETAIL:  The error was: ERROR:  replication slot "tap_sub" is active for PID 
> 3866790'
> while running 'psql -XAtq -d port=59543 host=/tmp/QpCJtafT7R 
> dbname='postgres' -f - -v ON_ERROR_STOP=1' with sql 'DROP SUBSCRIPTION 
> tap_sub' at 
> /home/nm/farm/xlc64/HEAD/pgsql.build/src/test/subscription/../../../src/test/perl/PostgresNode.pm
>  line 1198.
> 
> After poking at it a bit, I found that I can cause several different
> failures of this ilk in the subscription tests by injecting delays at
> the points where a slot's active_pid is about to be cleared, as in the
> attached patch (which also adds some extra printouts for debugging
> purposes; none of that is meant for commit).  It seems clear that there
> is inadequate interlocking going on when we kill and restart a logical
> rep worker: we're trying to start a new one before the old one has
> gotten out of the slot.
> 

Thanks for the test case.

It's not actually that we start the new worker too fast. It's that we try
to drop the slot right after the worker process was killed, but if the
code that clears the slot's active_pid takes too long, the slot still
looks like it's being used. I am quite sure it's possible to make this
happen for physical replication as well when using slots.

This is not something that can be solved by locking on the subscriber.
ISTM we need to make pg_drop_replication_slot behave more nicely, like
making it wait for the slot to become available (either by default or as
an option).

I'll have to think about how to do that without rewriting half of the
replication slot code or reimplementing the lock queue, though, because
replication slots don't use normal catalog access, so there is no object
locking with a wait queue. We could use a latch wait with a small timeout,
but that seems ugly, as that function can be called by a user without
having dropped the slot first, so the wait could be quite long (as in
"forever").

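The naive timeout version would look roughly like this (sketch only, to
show why I don't like it; slot->mutex and active_pid are the existing
ReplicationSlot fields):

    for (;;)
    {
        pid_t       active_pid;

        SpinLockAcquire(&slot->mutex);
        active_pid = slot->active_pid;
        if (active_pid == 0)
            slot->active_pid = MyProcPid;
        SpinLockRelease(&slot->mutex);

        if (active_pid == 0)
            break;              /* slot is ours now */

        /* Someone still holds it; nap briefly and retry, possibly forever. */
        CHECK_FOR_INTERRUPTS();
        WaitLatch(MyLatch, WL_LATCH_SET | WL_TIMEOUT,
                  10L /* ms */ , PG_WAIT_IPC);
        ResetLatch(MyLatch);
    }
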
-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix doc of DROP SUBSCRIPTION

2017-06-30 Thread Petr Jelinek
On 30/06/17 15:17, Yugo Nagata wrote:
> On Fri, 30 Jun 2017 20:17:39 +0900
> Masahiko Sawada <sawada.m...@gmail.com> wrote:
> 
>> On Fri, Jun 30, 2017 at 7:01 PM, Yugo Nagata <nag...@sraoss.co.jp> wrote:
>>> Hi,
>>>
>>> The documentation says that a subscription that has a replication slot
>>> cannot be dropped  in a transaction block, but it is not allowed even
>>> outside of a transaction block.
>>
>> Hmm, I think we can drop a subscription outside of a transaction block
>> even if the subscription associates with a replication.
> 
> Sorry, I was wrong and missing something... I confirmaed it.
> The documentation is right. Sorry for the noise.
> 

It will only fail if the remote site is not reachable during the drop
(but then it should hint about using ALTER); maybe that's what happened
to you?

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Apparent walsender bug triggered by logical replication

2017-06-30 Thread Petr Jelinek
On 30/06/17 04:46, Tom Lane wrote:
> Petr Jelinek <petr.jeli...@2ndquadrant.com> writes:
>> On 30/06/17 02:07, Tom Lane wrote:
>>> I'm also kind of wondering why the "behind the apply" path out of
>>> LogicalRepSyncTableStart exists at all; as far as I can tell we'd be much
>>> better off if we just let the sync worker exit always as soon as it's done
>>> the initial sync, letting any extra catchup happen later.  The main thing
>>> the current behavior seems to be accomplishing is to monopolize one of the
>>> scarce max_sync_workers_per_subscription slots for the benefit of a single
>>> table, for longer than necessary.  Plus it adds additional complicated
>>> interprocess signaling.
> 
>> Hmm, I don't understand what you mean here. The "letting any extra
>> catchup happen later" would never happen if the sync is behind apply as
>> apply has already skipped relevant transactions.
> 
> Once the sync worker has exited, we have to have some other way of dealing
> with that.  I'm wondering why we can't let that other way take over

We make the apply worker wait for the sync worker to get to the expected
position if it was behind, and only then continue; we can't exactly do
that if the apply has already skipped some changes.

> immediately.  The existing approach is inefficient, according to the
> traces I've been poring over all day, and frankly I am very far from
> convinced that it's bug-free either.
-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Apparent walsender bug triggered by logical replication

2017-06-29 Thread Petr Jelinek
On 30/06/17 02:07, Tom Lane wrote:
> I'm also kind of wondering why the "behind the apply" path out of
> LogicalRepSyncTableStart exists at all; as far as I can tell we'd be much
> better off if we just let the sync worker exit always as soon as it's done
> the initial sync, letting any extra catchup happen later.  The main thing
> the current behavior seems to be accomplishing is to monopolize one of the
> scarce max_sync_workers_per_subscription slots for the benefit of a single
> table, for longer than necessary.  Plus it adds additional complicated
> interprocess signaling.
> 

Hmm, I don't understand what you mean here. The "letting any extra
catchup happen later" would never happen if the sync is behind apply as
apply has already skipped relevant transactions.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Get stuck when dropping a subscription during synchronizing table

2017-06-27 Thread Petr Jelinek

On 27/06/17 10:51, Masahiko Sawada wrote:
> On Mon, Jun 26, 2017 at 12:12 PM, Masahiko Sawada <sawada.m...@gmail.com> 
> wrote:
> 
> I've reviewed this patch briefly.

Thanks!

> 
> @@ -515,6 +533,31 @@ logicalrep_worker_stop(Oid subid, Oid relid)
>  }
> 
>  /*
> + * Request worker to be stopped on commit.
> + */
> +void
> +logicalrep_worker_stop_at_commit(Oid subid, Oid relid)
> +{
> +   LogicalRepWorkerId *wid;
> +   MemoryContext old;
> +
> +   old = MemoryContextSwitchTo(TopTransactionContext);
> +
> +   wid = (LogicalRepWorkerId *) palloc(sizeof(LogicalRepWorkerId));
> +
> +   /*
> +   wid = MemoryContextAlloc(TopTransactionContext,
> +
> sizeof(LogicalRepWorkerId));
> +   */
> +   wid->subid = subid;
> +   wid->relid = relid;
> +
> +   on_commit_stop_workers = lappend(on_commit_stop_workers, wid);
> +
> +   MemoryContextSwitchTo(old);
> +}
> 
> logicalrep_worker_stop_at_commit() has a problem that new_list()
> called by lappend() allocates the memory from current memory context.
> It should switch the memory context and then allocate the memory for
> wid and append it to the list.
> 

Right, fixed (I see you did that locally as well based on the above
excerpt ;) ).

> 
> @@ -754,9 +773,25 @@ ApplyLauncherShmemInit(void)
>  void
>  AtEOXact_ApplyLauncher(bool isCommit)
>  {
> -   if (isCommit && on_commit_launcher_wakeup)
> -   ApplyLauncherWakeup();
> +   ListCell *lc;
> +
> +   if (isCommit)
> +   {
> +   foreach (lc, on_commit_stop_workers)
> +   {
> +   LogicalRepWorkerId *wid = lfirst(lc);
> +   logicalrep_worker_stop(wid->subid, wid->relid);
> +   }
> +
> +   if (on_commit_launcher_wakeup)
> +   ApplyLauncherWakeup();
> 
> Stopping the logical rep worker in AtEOXact_ApplyLauncher doesn't
> support the prepared transaction. Since we allocate the list
> on_commit_stop_workers in TopTransactionContext the postgres crashes
> if we execute any query after prepared transaction that removes
> subscription relation state. Also after fixed this issue, we still
> need to something: the list of on_commit_stop_workers is not
> associated the prepared transaction.  A query next to "preapre
> transaction" tries to stop workers at the commit. There was similar
> discussion before.
> 

Hmm, good point. I think for now it makes sense to simply disallow
PREPARE for transactions that manipulate workers, similar to what we do
when there are exported snapshots. Done it that way in the attached patch.

> 
> +
> +   ensure_transaction();
> +
> UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
> +
> rstate->relid, rstate->state,
> +
> rstate->lsn);
> 
> 
> Should we commit the transaction if we started a new transaction
> before update the subscription relation state, or it could be deadlock
> risk.

We only lock the whole subscription (and the only conflicting things are
DROP and ALTER SUBSCRIPTION), not the individual subscription-relation
mapping, so it doesn't seem to me like there is any higher risk of
deadlocks than for anything else which works with multiple tables (or
compared to the previous behavior).

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
From 88da433110aa3bde3dcce33ebf62d41a08c191b9 Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmo...@pjmodos.net>
Date: Sat, 24 Jun 2017 19:38:21 +0200
Subject: [PATCH] Rework subscription worker and relation status handling

---
 src/backend/access/transam/xact.c   |   9 +
 src/backend/catalog/pg_subscription.c   | 137 ++--
 src/backend/commands/subscriptioncmds.c |  98 -
 src/backend/replication/logical/launcher.c  | 309 
 src/backend/replication/logical/tablesync.c |  97 +
 src/backend/replication/logical/worker.c|  23 ++-
 src/backend/utils/cache/catcache.c  |   6 +-
 src/include/catalog/pg_subscription_rel.h   |   6 +-
 src/include/replication/logicallauncher.h   |   1 +
 src/include/replication/worker_internal.h   |   6 +-
 10 files changed, 393 insertions(+), 299 deletions(-)

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index b0aa69f..322502d 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2277,6 +2277,15 @@ PrepareTransaction(void)
 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
  errmsg("cannot PREPARE a transaction that has exported snapshots")));
 
+	/*
+	 * Similar to above, don't 

Re: [HACKERS] Walsender timeouts and large transactions

2017-06-25 Thread Petr Jelinek
On 30/05/17 15:44, Petr Jelinek wrote:
> On 30/05/17 11:02, Kyotaro HORIGUCHI wrote:
>>
>> +TimestampTz now = GetCurrentTimestamp();
>>
>> I think it is not recommended to read the current time too
>> frequently, especially within a loop that hates slowness. (I
>> suppose that a loop that can fill up a send queue falls into that
> 
> Yeah that was my main worry for the patch as well, although given that
> the loop does tuple processing it might not be very noticeable.
> 

I realized we actually call GetCurrentTimestamp() there anyway (for the
pq_sendint64). So I just modified the patch to use the now variable
there instead, which means there are no additional GetCurrentTimestamp()
calls compared to the state before the patch.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
>From c6e9cdb81311371662c580db42a165ab0575d951 Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmo...@pjmodos.net>
Date: Thu, 25 May 2017 17:35:24 +0200
Subject: [PATCH] Fix walsender timeouts when decoding large transaction

The logical slots have a fast code path for sending data in order to not
impose too high a per-message overhead. The fast path skips checks for
interrupts and timeouts. However, the fast path failed to consider the
fact that a transaction with a large number of changes may take a very
long time to be processed and sent to the client. This causes the
walsender to ignore interrupts for a potentially long time, but more
importantly it will cause the walsender to be killed due to timeout at
the end of such a transaction.

This commit changes the fast path to also check for interrupts and only
allows taking the fast path when the last keepalive check happened less
than half of the walsender timeout ago; otherwise the slower code path
will be taken.
---
 src/backend/replication/walsender.c | 36 ++--
 1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index f845180..278f882 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1160,6 +1160,8 @@ static void
 WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,
 bool last_write)
 {
+	TimestampTz	now = GetCurrentTimestamp();
+
 	/* output previously gathered data in a CopyData packet */
 	pq_putmessage_noblock('d', ctx->out->data, ctx->out->len);
 
@@ -1169,23 +1171,28 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,
 	 * several releases by streaming physical replication.
 	 */
 	resetStringInfo();
-	pq_sendint64(, GetCurrentTimestamp());
+	pq_sendint64(, now);
 	memcpy(>out->data[1 + sizeof(int64) + sizeof(int64)],
 		   tmpbuf.data, sizeof(int64));
 
-	/* fast path */
-	/* Try to flush pending output to the client */
-	if (pq_flush_if_writable() != 0)
-		WalSndShutdown();
+	/* Try taking fast path unless we get too close to walsender timeout. */
+	if (now < TimestampTzPlusMilliseconds(last_reply_timestamp,
+		  wal_sender_timeout / 2))
+	{
+		CHECK_FOR_INTERRUPTS();
 
-	if (!pq_is_send_pending())
-		return;
+		/* Try to flush pending output to the client */
+		if (pq_flush_if_writable() != 0)
+			WalSndShutdown();
+
+		if (!pq_is_send_pending())
+			return;
+	}
 
 	for (;;)
 	{
 		int			wakeEvents;
 		long		sleeptime;
-		TimestampTz now;
 
 		/*
 		 * Emergency bailout if postmaster has died.  This is to avoid the
@@ -1214,18 +1221,16 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,
 		if (pq_flush_if_writable() != 0)
 			WalSndShutdown();
 
-		/* If we finished clearing the buffered data, we're done here. */
-		if (!pq_is_send_pending())
-			break;
-
-		now = GetCurrentTimestamp();
-
 		/* die if timeout was reached */
 		WalSndCheckTimeOut(now);
 
 		/* Send keepalive if the time has come */
 		WalSndKeepaliveIfNecessary(now);
 
+		/* If we finished clearing the buffered data, we're done here. */
+		if (!pq_is_send_pending())
+			break;
+
 		sleeptime = WalSndComputeSleeptime(now);
 
 		wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH |
@@ -1235,6 +1240,9 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,
 		WaitLatchOrSocket(MyLatch, wakeEvents,
 		  MyProcPort->sock, sleeptime,
 		  WAIT_EVENT_WAL_SENDER_WRITE_DATA);
+
+		/* Update our idea of current time for the next cycle. */
+		now = GetCurrentTimestamp();
 	}
 
 	/* reactivate latch so WalSndLoop knows to continue */
-- 
2.7.4




Re: [HACKERS] Logical replication: stuck spinlock at ReplicationSlotRelease

2017-06-25 Thread Petr Jelinek
On 24/06/17 04:50, Tom Lane wrote:
> Peter Eisentraut <peter.eisentr...@2ndquadrant.com> writes:
>> Do you want to take a look at move those elog calls around a bit?  That
>> should do it.
> 
> It would be a good idea to have some clarity on *why* that should do it.
> 
> Looking at the original report's log, but without having actually
> reproduced the problem, I guess what is happening is this:
> 
> 1. Subscription worker process (23117) gets a duplicate key conflict while
> trying to apply an update, and in consequence it exits.  (Is that supposed
> to happen?)
> 
> 2. Publication server process (23124) doesn't notice client connection
> loss right away.  By chance, the next thing it tries to send to the client
> is the debug output from LogicalIncreaseRestartDecodingForSlot.  Then it
> detects loss of connection (at 2017-06-21 14:55:12.033) and FATAL's out.
> But since the spinlock stuff has no tracking infrastructure, we don't
> know we are still holding the replication slot mutex.
> 
> 3. Process exit cleanup does know that it's supposed to release the
> replication slot, so it tries to take the mutex spinlock ... again.
> Eventually that times out and we get the "stuck spinlock" panic.
> 
> All correct so far?

Sounds about right.

> So, okay, the proximate cause of the crash is a blatant violation of the
> rule that spinlocks may only be held across straight-line code segments.
> But I'm wondering about the client exit having occurred in the first
> place.  Why is that, and how would one ever recover?  It sure looks
> like this isn't the first subscription worker process that has tried
> and failed to apply the update.  If our attitude towards this situation is
> that it's okay to fork-bomb your server with worker processes continually
> respawning and making no progress, well, I don't think that's good enough.
> 

Well, we don't have conflict detection/handling in PG10 like, for
example, pglogical does. Even once we have that, it will probably not be
able to resolve multiple unique index violations (there is no obvious way
to do that automatically). And we can't really make progress while there
is an unresolved constraint violation. To recover, one has to either
remove the conflicting row on the downstream or remove the transaction
from replication on the upstream by manually consuming it using
pg_logical_slot_get_binary_changes. Now that's arguably a somewhat ugly
interface for doing it; we might want to invent a nicer interface for
that even for PG10, but it would mean a catalog bump, so it should happen
rather soon if we go there.

As for fork-bombing, it should be a very slow fork bomb (we rate-limit
the worker starting), but I agree it's not an ideal situation. I am open
to suggestions about what we can do there; if we had some kind of list of
non-recoverable errors we could automatically disable the subscription on
them (although we need to be able to modify the catalog for that, which
may not be possible after an unrecoverable error), but it's not clear to
me how to reasonably produce such a list.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] CREATE SUBSCRIPTION log noise

2017-06-25 Thread Petr Jelinek
On 21/06/17 23:32, Euler Taveira wrote:
> 2017-06-21 15:14 GMT-03:00 Peter Eisentraut
> <peter.eisentr...@2ndquadrant.com
> <mailto:peter.eisentr...@2ndquadrant.com>>:
> 
> > It doesn't appear to be contingent on anything other than the content of
> > the command you just gave it.  I don't think we need a NOTICE saying
> > that it did that thing I just told it to do--that should be implicit by
> > the lack of an ERROR.
> 
> I'm appreciative of this complaint.  The point is that we are allocating
> resources on a remote host that should not be forgotten.  I could go
> either way.
> 
> 
> It is documented that subscription can create a replication slot on
> remote host as mentioned in section "Replication Slot Management". If we
> want to maintain the message let's downgrade it to DEBUG1 (as we do with
> "create implicit index for table") but I prefer to rip it out.
> 

I agree with Peter here; the reason why we show this is that there is an
implicit action which reserves resources on a different server and is not
transactional. If we only did it when explicitly asked for, I would agree
it's not needed, but given that it's the default behavior I prefer to
inform the user.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Get stuck when dropping a subscription during synchronizing table

2017-06-25 Thread Petr Jelinek
(was away for a while, got some time now for this again)

On 22/06/17 04:43, Peter Eisentraut wrote:
> The alternative is that we use the LockSharedObject() approach that was
> already alluded to, like in the attached patch.  Both approaches would
> work equally fine AFAICT.

I agree, but I think we need a bigger overhaul of the locking/management
in general. So here is a patch which makes many more changes.

The patch does several important things (in no particular order):
- Split SetSubscriptionRelState into AddSubscriptionRelState and
UpdateSubscriptionRelState for the reasons given upthread: it's cleaner,
there is no half-broken upsert logic, and it has proper error checking
for each action.

- Do LockSharedObject in ALTER SUBSCRIPTION, DROP SUBSCRIPTION (this one
is preexisting but mentioning it for context), SetSubscriptionRelState,
AddSubscriptionRelState, and in the logicalrep_worker_launch. This means
we use granular per object locks to deal with concurrency.

- Because of the above, the AccessExclusiveLock on pg_subscription is no
longer needed; just a normal RowExclusiveLock is used now.

- logicalrep_worker_stop is also simplified due to the proper locking

- There is new interface logicalrep_worker_stop_at_commit which is used
by ALTER SUBSCRIPTION ... REFRESH PUBLICATION and by transactional
variant of DROP SUBSCRIPTION to only kill workers at the end of transaction.

- Locking/reading of subscription info is unified between DROP and ALTER
SUBSCRIPTION commands.

- DROP SUBSCRIPTION will kill all workers associated with subscription,
not just apply.

- The sync worker checks during startup if the relation is still subscribed.

- The sync worker will exit when it is waiting for apply and the apply
worker has shut down.

- The log messages around workers and removed or disabled subscription
are now more consistent between startup and normal runtime of the worker.

- Some code deduplication and stylistic changes/simplification in
related areas.

- Fixed catcache's IndexScanOK() handling of the subscription catalog.

It's a bit bigger patch, but it solves issues from multiple threads
around the handling of ALTER/DROP SUBSCRIPTION.

A lot of the locking that I added is normally done transparently by
dependency handling, but subscriptions and subscription relation status
mostly don't use that, as it was deemed to bloat pg_depend needlessly
during the original patch review (which is also probably why this has
slipped through).

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
From d7038474012769c9c3b50231af76dd7796fe593f Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmo...@pjmodos.net>
Date: Sat, 24 Jun 2017 19:38:21 +0200
Subject: [PATCH] Rework subscription worker and relation status handling

---
 src/backend/catalog/pg_subscription.c   | 137 +++--
 src/backend/commands/subscriptioncmds.c |  98 +-
 src/backend/replication/logical/launcher.c  | 293 +++-
 src/backend/replication/logical/tablesync.c |  97 +
 src/backend/replication/logical/worker.c|  23 ++-
 src/backend/utils/cache/catcache.c  |   6 +-
 src/include/catalog/pg_subscription_rel.h   |   6 +-
 src/include/replication/worker_internal.h   |   6 +-
 8 files changed, 367 insertions(+), 299 deletions(-)

diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index c69c461..b643e54 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -28,6 +28,8 @@
 
 #include "nodes/makefuncs.h"
 
+#include "storage/lmgr.h"
+
 #include "utils/array.h"
 #include "utils/builtins.h"
 #include "utils/fmgroids.h"
@@ -225,84 +227,101 @@ textarray_to_stringlist(ArrayType *textarray)
 }
 
 /*
- * Set the state of a subscription table.
- *
- * If update_only is true and the record for given table doesn't exist, do
- * nothing.  This can be used to avoid inserting a new record that was deleted
- * by someone else.  Generally, subscription DDL commands should use false,
- * workers should use true.
- *
- * The insert-or-update logic in this function is not concurrency safe so it
- * might raise an error in rare circumstances.  But if we took a stronger lock
- * such as ShareRowExclusiveLock, we would risk more deadlocks.
+ * Add new state record for a subscription table.
  */
 Oid
-SetSubscriptionRelState(Oid subid, Oid relid, char state,
-		XLogRecPtr sublsn, bool update_only)
+AddSubscriptionRelState(Oid subid, Oid relid, char state,
+		XLogRecPtr sublsn)
 {
 	Relation	rel;
 	HeapTuple	tup;
-	Oid			subrelid = InvalidOid;
+	Oid			subrelid;
 	bool		nulls[Natts_pg_subscription_rel];
 	Datum		values[Natts_pg_subscription_rel];
 
+	LockSharedObject(SubscriptionRelationId, subid, 0, AccessShareLock);
+
 	rel = heap_open(SubscriptionRelRelationId, RowExclusiveLock);
 
 	/* Try finding existing mapping. */
 	tup = Sea

Re: [HACKERS] Get stuck when dropping a subscription during synchronizing table

2017-06-15 Thread Petr Jelinek
On 15/06/17 18:36, Peter Eisentraut wrote:
> On 6/15/17 12:22, Petr Jelinek wrote:
>> On 15/06/17 17:53, Peter Eisentraut wrote:
>>> On 6/14/17 18:35, Petr Jelinek wrote:
>>>> Attached fixes it (it was mostly about order of calls).
>>>
>>> So do I understand this right that the actual fix is just moving up the
>>> logicalrep_worker_stop() call in DropSubscription().
>>>
>>
>> No the fix is heap_open before SearchSysCache().
> 
> Right.  Is there a reason for moving the _stop() call then?
> 

Nothing specific, just felt it's better there when I was messing with
the function.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Get stuck when dropping a subscription during synchronizing table

2017-06-15 Thread Petr Jelinek
On 15/06/17 17:53, Peter Eisentraut wrote:
> On 6/14/17 18:35, Petr Jelinek wrote:
>> Attached fixes it (it was mostly about order of calls).
> 
> So do I understand this right that the actual fix is just moving up the
> logicalrep_worker_stop() call in DropSubscription().
>

No the fix is heap_open before SearchSysCache().

>> I also split the
>> SetSubscriptionRelState into 2 separate interface while I was changing
>> it, because now that the update_only bool was added it has become quite
>> strange to have single interface for what is basically two separate
>> functions.
> 
> makes sense
> 
>> Other related problem is locking of subscriptions during operations on
>> them, especially AlterSubscription seems like it should lock the
>> subscription itself. I did that in 0002.
> 
> More detail here please.  AlterSubscription() does locking via
> heap_open().  This introduces a new locking method.  What are the
> implications?
> 

I don't think heap_open will be enough once we remove the
AccessExclusiveLock on the catalog in DropSubscription, because a
concurrent AlterSubscription might happily add tables to a subscription
that has been dropped if we don't lock it. But you made me realize that
even my patch is not enough, because we then reread the subscription info
only from the syscache without any kind of invalidation attempt.
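
In other words, ALTER SUBSCRIPTION would have to do something along these
lines before acting on the cached info (sketch only, not a patch):

    HeapTuple   tup;

    /* serialize against concurrent DROP/ALTER of the same subscription */
    LockSharedObject(SubscriptionRelationId, subid, 0, AccessExclusiveLock);

    /* make sure we see concurrent changes before re-reading the syscache */
    AcceptInvalidationMessages();

    tup = SearchSysCacheCopy1(SUBSCRIPTIONOID, ObjectIdGetDatum(subid));
    if (!HeapTupleIsValid(tup))
        ereport(ERROR,
                (errcode(ERRCODE_UNDEFINED_OBJECT),
                 errmsg("subscription with OID %u does not exist", subid)));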

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Refreshing subscription relation state inside a transaction block

2017-06-15 Thread Petr Jelinek
On 15/06/17 15:52, Peter Eisentraut wrote:
> On 6/15/17 02:41, Petr Jelinek wrote:
>> Hmm, forcibly stopping currently running table sync is not what was
>> intended, I'll have to look into it. We should not be forcibly stopping
>> anything except the main apply worker during drop subscription (and we
>> do that only because we can't drop the remote replication slot otherwise).
> 
> The change being complained about was specifically to address the
> problem described in the commit message:
> 
> Stop table sync workers when subscription relation entry is removed
> 
> When a table sync worker is in waiting state and the subscription table
> entry is removed because of a concurrent subscription refresh, the
> worker could be left orphaned.  To avoid that, explicitly stop the
> worker when the pg_subscription_rel entry is removed.
> 
> 
> Maybe that wasn't the best solution.  Alternatively, the tablesync
> worker has to check itself whether the subscription relation entry has
> disappeared, or we need a post-commit check to remove orphaned workers.
> 

Ah, I missed that commit/discussion, otherwise I would probably have
complained. I don't like killing workers in the DDL command; as I said,
the only reason we do it in DropSubscription is to free the slot for
dropping.

I think that either of the options you suggested now would be better.
We'll need that for stopping a tablesync which is in progress during
DropSubscription as well, as those will currently still keep running. I
guess we could simply register a syscache callback; the only problem with
that is we'd need to AcceptInvalidationMessages regularly while we do the
COPY, which is not exactly free, so maybe killing at the end of the
transaction would be better (both for refresh and drop)?
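
For completeness, the syscache-callback variant would look roughly like
this in the tablesync worker (sketch; the callback and flag names are
invented), with the obvious cost that the COPY loop has to call
AcceptInvalidationMessages() often enough for the flag to be noticed:

    static volatile sig_atomic_t subscription_rel_changed = false;

    static void
    subscription_rel_change_cb(Datum arg, int cacheid, uint32 hashvalue)
    {
        /* fires on any pg_subscription_rel invalidation; recheck our entry */
        subscription_rel_changed = true;
    }

    /* once at worker startup */
    CacheRegisterSyscacheCallback(SUBSCRIPTIONRELMAP,
                                  subscription_rel_change_cb, (Datum) 0);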

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Refreshing subscription relation state inside a transaction block

2017-06-15 Thread Petr Jelinek
On 13/06/17 18:33, Masahiko Sawada wrote:
> On Wed, Jun 14, 2017 at 1:02 AM, Masahiko Sawada <sawada.m...@gmail.com> 
> wrote:
>> On Tue, Jun 13, 2017 at 4:53 PM, Petr Jelinek
>> <petr.jeli...@2ndquadrant.com> wrote:
>>> On 13/06/17 09:06, Masahiko Sawada wrote:
>>>> Hi,
>>>>
>>>> The commit ddd7b22b225ae41d16ceb218b387645cb9becfdc makes table sync
>>>> workers stop when subscription relation entry is removed. It doesn't
>>>> work fine inside transaction block. I think we should disallow to use
>>>> the following subscription DDLs inside a transaction block. Attached
>>>> patch.
>>>>
>>>
>>> Can you be more specific than "It doesn't work fine inside transaction
>>> block", what do you expect to happen and what actually happens?
>>>
>>
>> If we do ALTER SUBSCRIPTION SET PUBLICATION during executing table
>> sync then it forcibly stops concurrently running table sync worker for
>> a table that had been removed from pg_subscription_rel.
> 

Hmm, forcibly stopping currently running table sync is not what was
intended, I'll have to look into it. We should not be forcibly stopping
anything except the main apply worker during drop subscription (and we
do that only because we can't drop the remote replication slot otherwise).

> Also, until commit the transaction the worker cannot launch new table
> sync worker due to conflicting tuple lock on pg_subscription_rel.
> 

That part is correct; we don't know whether the transaction will be
rolled back or not, so we can't start workers, as the state of the data
in the table is unknown.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] logical replication busy-waiting on a lock

2017-06-14 Thread Petr Jelinek
On 14/06/17 20:57, Andres Freund wrote:
> Hi,
> 
> On 2017-06-13 00:50:20 -0700, Andres Freund wrote:
>> Just to be clear: The patch, after the first point above (which I did),
>> looks ok.  I'm just looking for comments.
> 
> And pushed.
> 

Thanks!

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Get stuck when dropping a subscription during synchronizing table

2017-06-14 Thread Petr Jelinek
On 13/06/17 21:49, Peter Eisentraut wrote:
> On 6/13/17 02:33, Noah Misch wrote:
>>> Steps to reproduce -
>>> X cluster -> create 100 tables , publish all tables  (create publication pub
>>> for all tables);
>>> Y Cluster -> create 100 tables ,create subscription(create subscription sub
>>> connection 'user=centos host=localhost' publication pub;
>>> Y cluster ->drop subscription - drop subscription sub;
>>>
>>> check the log file on Y cluster.
>>>
>>> Sometime , i have seen this error on psql prompt and drop subscription
>>> operation got failed at first attempt.
>>>
>>> postgres=# drop subscription sub;
>>> ERROR:  tuple concurrently updated
>>> postgres=# drop subscription sub;
>>> NOTICE:  dropped replication slot "sub" on publisher
>>> DROP SUBSCRIPTION
>>
>> [Action required within three days.  This is a generic notification.]
> 
> It's being worked on.  Let's see by Thursday.
> 

Attached fixes it (it was mostly about the order of calls). I also split
SetSubscriptionRelState into two separate interfaces while I was changing
it, because now that the update_only bool was added it has become quite
strange to have a single interface for what is basically two separate
functions.

There are still a couple of remaining issues from this thread though.
Namely the AccessExclusiveLock on the pg_subscription catalog, which is
not very pretty, but we need a way to block the launcher from accessing a
subscription which is being dropped and make sure it will not start new
workers for it afterwards. The question is how, however, as by the time
the launcher can lock an individual subscription it is already processing
it. So it looks to me like we'd need to reread the catalog with a new
snapshot after the lock was acquired, which seems a bit wasteful (I
wonder if we could just AcceptInvalidationMessages and refetch from the
syscache). Any better ideas?

Another related problem is the locking of subscriptions during operations
on them; especially AlterSubscription seems like it should lock the
subscription itself. I did that in 0002.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
From 005eeb820f4e0528513744136582c4489e2429e3 Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmo...@pjmodos.net>
Date: Wed, 14 Jun 2017 08:14:20 +0200
Subject: [PATCH 2/2] Lock subscription when altering it

---
 src/backend/commands/subscriptioncmds.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 14c8f3f..e0ec8ea 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -642,6 +642,13 @@ AlterSubscription(AlterSubscriptionStmt *stmt)
 	   stmt->subname);
 
 	subid = HeapTupleGetOid(tup);
+
+	/*
+	 * Lock the subscription so nobody else can do anything with it (including
+	 * the replication workers).
+	 */
+	LockSharedObject(SubscriptionRelationId, subid, 0, AccessExclusiveLock);
+
 	sub = GetSubscription(subid, false);
 
 	/* Form a new tuple. */
-- 
2.7.4

From 9011698ae800e0f45f960e91f6b16eab15634fac Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmo...@pjmodos.net>
Date: Tue, 13 Jun 2017 19:26:51 +0200
Subject: [PATCH 1/2] Improve the pg_subscription_rel handling

Split the SetSubscriptionRelState into separate Add and Update
functions, removing the unsafe upsert logic as callers are supposed to
know whether they are updating or adding new row.

Reorder the code in the above mentioned functions to avoid "tuple
updated concurrently" warnings.
---
 src/backend/catalog/pg_subscription.c   | 131 +++-
 src/backend/commands/subscriptioncmds.c |  14 +--
 src/backend/replication/logical/tablesync.c |  33 ---
 src/include/catalog/pg_subscription_rel.h   |   6 +-
 4 files changed, 98 insertions(+), 86 deletions(-)

diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index c5b2541..fd19675 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -225,24 +225,15 @@ textarray_to_stringlist(ArrayType *textarray)
 }
 
 /*
- * Set the state of a subscription table.
- *
- * If update_only is true and the record for given table doesn't exist, do
- * nothing.  This can be used to avoid inserting a new record that was deleted
- * by someone else.  Generally, subscription DDL commands should use false,
- * workers should use true.
- *
- * The insert-or-update logic in this function is not concurrency safe so it
- * might raise an error in rare circumstances.  But if we took a stronger lock
- * such as ShareRowExclusiveLock, we would risk more deadlocks.
+ * Add new state record for a subscription table.
  */
 Oid
-SetSubscriptio

Re: [HACKERS] Refreshing subscription relation state inside a transaction block

2017-06-13 Thread Petr Jelinek
On 13/06/17 09:06, Masahiko Sawada wrote:
> Hi,
> 
> The commit ddd7b22b225ae41d16ceb218b387645cb9becfdc makes table sync
> workers stop when subscription relation entry is removed. It doesn't
> work fine inside transaction block. I think we should disallow to use
> the following subscription DDLs inside a transaction block. Attached
> patch.
> 

Can you be more specific than "It doesn't work fine inside transaction
block"? What do you expect to happen, and what actually happens?

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_subscription_rel entry can be updated concurrently

2017-06-12 Thread Petr Jelinek
On 13/06/17 02:52, Masahiko Sawada wrote:
> Hi,
> 
> I often get an error "ERROR:  tuple concurrently updated" when
> changing subscription state(ALTER SUBSCRIPTION or DROP SUBSCRIPTION).
> The cause of this error is that table sync worker and apply worker can
> try to update the same tuple in pg_subscription_rel. Especially it
> often happens when we do initial copy for many tables and change it
> during executing.
> 
> I think that touching the same tuple by two worker processes happens
> when aborting replication for a table or a subscription, so it would
> be the same result as when the worker ends up with an error. But I
> think since it's not an appropriate behavior we should deal with it.
> Any thoughts?
> 

This has already been reported by tushar in a different thread and it's
still on my list to fix.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] logical replication busy-waiting on a lock

2017-06-09 Thread Petr Jelinek
On 09/06/17 03:20, Petr Jelinek wrote:
> On 09/06/17 01:06, Andres Freund wrote:
>> Hi,
>>
>> On 2017-06-03 04:50:00 +0200, Petr Jelinek wrote:
>>> One thing I don't like is the GetLastLocalTransactionId() that I had to
>>> add because we clear the proc->lxid before we get to AtEOXact_Snapshot()
>>> but I don't have better solution there.
>>
>> I dislike that quite a bit - how about we instead just change the
>> information we keep in exportedSnapshots?  I.e. store a pointer to an
>> struct ExportedSnapshot
>> {
>> char *exportedAs;
>> Snapshot snapshot;
>> }
>>
>> Then we can get rid of that relatively ugly list_length() business too.
>>
> 
> That sounds reasonable, I'll do that.
> 

And here it is; it seems better (the 0002 is the same as before).

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
From f6df47e8257cbce4a78ceeeac504c9fe86527b05 Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmo...@pjmodos.net>
Date: Sat, 3 Jun 2017 02:07:49 +0200
Subject: [PATCH 2/2] Don't assign xid to logical decoding snapshots

---
 src/backend/replication/logical/snapbuild.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 8848f5b..e06aa09 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -262,7 +262,7 @@ static bool ExportInProgress = false;
 static void SnapBuildPurgeCommittedTxn(SnapBuild *builder);
 
 /* snapshot building/manipulation/distribution functions */
-static Snapshot SnapBuildBuildSnapshot(SnapBuild *builder, TransactionId xid);
+static Snapshot SnapBuildBuildSnapshot(SnapBuild *builder);
 
 static void SnapBuildFreeSnapshot(Snapshot snap);
 
@@ -463,7 +463,7 @@ SnapBuildSnapDecRefcount(Snapshot snap)
  * and ->subxip/subxcnt values.
  */
 static Snapshot
-SnapBuildBuildSnapshot(SnapBuild *builder, TransactionId xid)
+SnapBuildBuildSnapshot(SnapBuild *builder)
 {
 	Snapshot	snapshot;
 	Size		ssize;
@@ -562,7 +562,7 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 	if (TransactionIdIsValid(MyPgXact->xmin))
 		elog(ERROR, "cannot build an initial slot snapshot when MyPgXact->xmin already is valid");
 
-	snap = SnapBuildBuildSnapshot(builder, GetTopTransactionId());
+	snap = SnapBuildBuildSnapshot(builder);
 
 	/*
 	 * We know that snap->xmin is alive, enforced by the logical xmin
@@ -679,7 +679,7 @@ SnapBuildGetOrBuildSnapshot(SnapBuild *builder, TransactionId xid)
 	/* only build a new snapshot if we don't have a prebuilt one */
 	if (builder->snapshot == NULL)
 	{
-		builder->snapshot = SnapBuildBuildSnapshot(builder, xid);
+		builder->snapshot = SnapBuildBuildSnapshot(builder);
 		/* increase refcount for the snapshot builder */
 		SnapBuildSnapIncRefcount(builder->snapshot);
 	}
@@ -743,7 +743,7 @@ SnapBuildProcessChange(SnapBuild *builder, TransactionId xid, XLogRecPtr lsn)
 		/* only build a new snapshot if we don't have a prebuilt one */
 		if (builder->snapshot == NULL)
 		{
-			builder->snapshot = SnapBuildBuildSnapshot(builder, xid);
+			builder->snapshot = SnapBuildBuildSnapshot(builder);
 			/* increase refcount for the snapshot builder */
 			SnapBuildSnapIncRefcount(builder->snapshot);
 		}
@@ -1061,7 +1061,7 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
 		if (builder->snapshot)
 			SnapBuildSnapDecRefcount(builder->snapshot);
 
-		builder->snapshot = SnapBuildBuildSnapshot(builder, xid);
+		builder->snapshot = SnapBuildBuildSnapshot(builder);
 
 		/* we might need to execute invalidations, add snapshot */
 		if (!ReorderBufferXidHasBaseSnapshot(builder->reorder, xid))
@@ -1831,7 +1831,7 @@ SnapBuildRestore(SnapBuild *builder, XLogRecPtr lsn)
 	{
 		SnapBuildSnapDecRefcount(builder->snapshot);
 	}
-	builder->snapshot = SnapBuildBuildSnapshot(builder, InvalidTransactionId);
+	builder->snapshot = SnapBuildBuildSnapshot(builder);
 	SnapBuildSnapIncRefcount(builder->snapshot);
 
 	ReorderBufferSetRestartPoint(builder->reorder, lsn);
-- 
2.7.4

From aaf6a332bd5010a6f4f6fd113c703a26533d5737 Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmo...@pjmodos.net>
Date: Sat, 3 Jun 2017 02:06:22 +0200
Subject: [PATCH 1/2] Use virtual transaction instead of normal ones in
 exported snapshots

---
 doc/src/sgml/ref/set_transaction.sgml |   8 +--
 src/backend/storage/ipc/procarray.c   |  11 ++--
 src/backend/storage/lmgr/predicate.c  |  26 
 src/backend/utils/time/snapmgr.c  | 121 ++
 src/include/storage/predicate.h   |   4 +-
 src/include/storage/procarray.h   |   2 +-
 6 files changed, 109 insertions(+), 63 deletions(-)

diff --git a/doc/src/s

Re: [HACKERS] walsender & parallelism

2017-06-09 Thread Petr Jelinek
On 09/06/17 20:56, Andres Freund wrote:
> On 2017-06-08 23:04:31 -0700, Noah Misch wrote:
>> On Wed, Jun 07, 2017 at 10:54:57PM -0700, Noah Misch wrote:
>>> On Fri, Jun 02, 2017 at 11:06:29PM -0700, Andres Freund wrote:
>>>> On 2017-06-02 22:12:46 -0700, Noah Misch wrote:
>>>>> On Fri, Jun 02, 2017 at 11:27:55PM -0400, Peter Eisentraut wrote:
>>>>>> On 5/31/17 23:54, Peter Eisentraut wrote:
>>>>>>> On 5/29/17 22:01, Noah Misch wrote:
>>>>>>>> On Tue, May 23, 2017 at 01:45:59PM -0400, Andres Freund wrote:
>>>>>>>>> On May 23, 2017 1:42:41 PM EDT, Petr Jelinek 
>>>>>>>>> <petr.jeli...@2ndquadrant.com> wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> so this didn't really move anywhere AFAICS, do we think the approach
>>>>>>>>>> I've chosen is good or do we want to do something else here?
>>>>>>>>>
>>>>>>>>> Can you add it to the open items list?
>>>>>>>>
>>>>>>>> [Action required within three days.  This is a generic notification.]
>>>>>>>
>>>>>>> I have posted an update.  The next update will be Friday.
>>>>>>
>>>>>> Andres is now working on this.  Let's give him a bit of time. ;-)
>>>>>
>>>>> If you intended this as your soon-due status update, it is missing a 
>>>>> mandatory
>>>>> bit.
>>>>
>>>> I'm now owning this item.  I've posted patches, await review.  If none
>>>> were to be forthcoming, I'll do another pass Monday, and commit.
>>>
>>> This PostgreSQL 10 open item is past due for your status update.  Kindly 
>>> send
>>> a status update within 24 hours, and include a date for your subsequent 
>>> status
>>> update.  Refer to the policy on open item ownership:
>>> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
>>
>> IMMEDIATE ATTENTION REQUIRED.  This PostgreSQL 10 open item is long past due
>> for your status update.  Please reacquaint yourself with the policy on open
>> item ownership[1] and then reply immediately.  If I do not hear from you by
>> 2017-06-10 07:00 UTC, I will transfer this item to release management team
>> ownership without further notice.
> 
> The issue starting this thread is resolved, including several issues
> found in its wake, with 47fd420fb4d3e77dde960312f8672c82b14ecbad
> 6e1dd2773eb60a6ab87b27b8d9391b756e904ac3 and 
> c1abe6c786d8f00643de8519140d77644b474163
> 
> There's a remaining testing patch, and some related things left though.
> Basically patches 0001 and 0003 from
> http://archives.postgresql.org/message-id/81eaca60-3f7b-6a21-5531-fd51ffbf3f63%402ndquadrant.com
> I'd personally rather see a separate open-items entry for those, since
> this isn't related to paralellism, signal handler or such.
> 
> Petr, Peter (henceforth P^2), do you agree?

Makes sense; one could argue that 0003 could even be two open items with
two patches - one with the fixes for the parser and one for the other
type of message support (I am not even sure whether we want to add the
other message support into PG10).

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] logical replication busy-waiting on a lock

2017-06-08 Thread Petr Jelinek
On 09/06/17 01:06, Andres Freund wrote:
> Hi,
> 
> On 2017-06-03 04:50:00 +0200, Petr Jelinek wrote:
>> One thing I don't like is the GetLastLocalTransactionId() that I had to
>> add because we clear the proc->lxid before we get to AtEOXact_Snapshot()
>> but I don't have better solution there.
> 
> I dislike that quite a bit - how about we instead just change the
> information we keep in exportedSnapshots?  I.e. store a pointer to an
> struct ExportedSnapshot
> {
> char *exportedAs;
> Snapshot snapshot;
> }
> 
> Then we can get rid of that relatively ugly list_length() business too.
> 

That sounds reasonable, I'll do that.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] walsender termination error messages worse in v10

2017-06-08 Thread Petr Jelinek
On 08/06/17 23:57, Andres Freund wrote:
> On 2017-06-03 00:55:22 +0200, Petr Jelinek wrote:
>> On 02/06/17 23:45, Andres Freund wrote:
>>> Hi Petr,
>>>
>>> On 2017-06-02 22:57:37 +0200, Petr Jelinek wrote:
>>>> On 02/06/17 20:51, Andres Freund wrote:
>>>>> I don't understand why the new block is there, nor does the commit
>>>>> message explain it.
>>>>>
>>>>
>>>> Hmm, that particular change can actually be reverted. It was needed for
>>>> one those custom replication commands which were replaced by normal
>>>> query support. I have missed it during the rewrite.
>>>
>>> Doesn't appear to be quite that simple, I get regression test failures
>>> in that case.
>>>
>>
>> Hmm, looks like we still use it for normal COPY handling. So basically
>> the problem is that if we run COPY TO STDOUT and then consume it using
>> the libpqrcv_receive it will end with normal PGRES_COMMAND_OK but we
>> need to call PQgetResult() in that case otherwise libpq thinks the
>> command is still active and any following command will fail, but if we
>> call PQgetResult on dead connection we get that error you complained about.
> 
> Should this possibly handled at the caller level?  This is a bit too
> much magic for my taste.

It is, I agree with that. I guess the main issue with handling it in the
callers is that callers can't access the libpq API, so there would need
to be some additional abstraction (we did some of the necessary parts of
that for libpqrcv_exec).

> 
> Looking at the callers, the new code isn't super-obvious either:
> 
> len = walrcv_receive(wrconn, , );
> 
> if (len != 0)
> {
> /* Process the data */
> for (;;)
> {
> CHECK_FOR_INTERRUPTS();
> 
> if (len == 0)
> {
> break;
> }
> else if (len < 0)
> {
> ereport(LOG,
> (errmsg("data stream from publisher has ended")));
> endofstream = true;
> break;
> }
> 
> The len < 0, hidden inside a len != 0, which in the loop again chcks if
> len == 0 (because it's decremented in loop)?  And there's two different[5~
>   len = walrcv_receive(wrconn, , );
> statements?
> 

The logic there is that you have one main loop and one busy loop. When
data is flowing, things happen as the data is processed and we don't
need to do the bookkeeping of timeouts, latching, and the other
maintenance tasks that need to be done (like checking table
synchronization in logical replication), because it all happens
naturally; that's what the inner loop is for. But when the upstream goes
idle and does not send anything (or there is a network issue, etc.) we
need to do these things (otherwise we'd miss timeouts, the table
synchronization worker would wait forever on apply, etc.), and that's
what the outer loop is for.

We only enter the inner loop if there was some data; that's why there is
the first len != 0. But then the inner loop bases its decisions on the
return code of walrcv_receive(), so the if is repeated again (it will
also keep calling walrcv_receive() as long as there is data, so it needs
to check those calls too).

It took me a while to understand why the walreceiver originally does
this, but it made sense to me once I understood the split between the
main (idle) loop and the busy loop, so I did it the same way.
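
Schematically it looks like this (heavily simplified; process_message()
here just stands in for the real per-message handling):

    int         len;
    char       *buf;
    pgsocket    fd;
    bool        endofstream = false;

    for (;;)                            /* outer, "idle" loop */
    {
        len = walrcv_receive(wrconn, &buf, &fd);

        if (len != 0)
        {
            for (;;)                    /* inner, "busy" loop */
            {
                CHECK_FOR_INTERRUPTS();

                if (len == 0)
                    break;              /* drained, go do the idle work */
                else if (len < 0)
                {
                    endofstream = true; /* stream ended, leave both loops */
                    break;
                }
                else
                    process_message(buf, len);

                len = walrcv_receive(wrconn, &buf, &fd);
            }
        }

        if (endofstream)
            break;

        /*
         * Only reached when there is nothing left to read: timeouts,
         * keepalives, table sync checks and the latch wait happen here.
         */
    }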

>> I guess it would make sense to do conditional exit on
>> (PQstatus(streamConn) == CONNECTION_BAD) like libpqrcv_PQexec does. It's
>> quite ugly code-wise though.
> 
> I think that's fine for now.  It'd imo be a good idea to improve matters
> here a bit, but for now I've just applied your patch.
> 

Okay, thanks.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Notes on testing Postgres 10b1

2017-06-07 Thread Petr Jelinek
On 08/06/17 03:50, Josh Berkus wrote:
> On 06/07/2017 06:25 PM, Petr Jelinek wrote:
>> On 08/06/17 03:19, Josh Berkus wrote:
>>>
>>> Peter and Petr:
>>>
>>> On 06/07/2017 05:24 PM, Peter Eisentraut wrote:
>>>> On 6/7/17 01:01, Josh Berkus wrote:
>>>>> * Having defaults on the various _workers all devolve from max_workers
>>>>> is also great.
>>>>
>>>> I'm not aware of anything like that happening.
>>>>
>>>>> P1. On the publishing node, logical replication relies on the *implied*
>>>>> correspondence of the application_name and the replication_slot both
>>>>> being named the same as the publication in order to associate a
>>>>> particular publication with a particular replication connection.
>>>>> However, there's absolutely nothing preventing me from also creating a
>>>>> binary replication connection by the same name  It really seems like we
>>>>> need a field in pg_stat_replication or pg_replication_slots which lists
>>>>> the publication.
>>>>
>>>> I'm not quite sure what you are getting at here.  The application_name
>>>> seen on the publisher side is the subscription name.  You can create a
>>>> binary replication connection using the same application_name, but
>>>> that's already been possible before.  But the publications don't care
>>>> about any of this.
>>>
>>> My point is that there is no system view where I can see, on the origin
>>> node, what subscribers are subscribing to which publications.  You can
>>> kinda guess that from pg_stat_replication etc., but it's not dependable
>>> information.
>>>
>>
>> That's like wanting the foreign server to show you which foreign tables
>> exist on the local server. This is not a tightly coupled system and you
>> are able to setup both sides without them being connected to each other
>> at the time of setup, so there is no way publisher can know anything.
> 
> Why wouldn't the publisher know who's connected once the replication
> connection as been made and the subscription has started?  Or is it just
> a log position, and the publisher really has no idea how many
> publications are being consumed?
> 

The plugin knows while the connection exists, but that's the thing: it
goes through a pluggable interface (that can be used by other plugins,
without publications), so there would have to be some abstracted way for
plugins to provide extra information for pg_stat_replication or a similar
view. I am afraid it's a bit too late to design something like that in
the PG10 cycle.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Notes on testing Postgres 10b1

2017-06-07 Thread Petr Jelinek
On 08/06/17 03:19, Josh Berkus wrote:
> 
> Peter and Petr:
> 
> On 06/07/2017 05:24 PM, Peter Eisentraut wrote:
>> On 6/7/17 01:01, Josh Berkus wrote:
>>> * Having defaults on the various _workers all devolve from max_workers
>>> is also great.
>>
>> I'm not aware of anything like that happening.
>>
>>> P1. On the publishing node, logical replication relies on the *implied*
>>> correspondence of the application_name and the replication_slot both
>>> being named the same as the publication in order to associate a
>>> particular publication with a particular replication connection.
>>> However, there's absolutely nothing preventing me from also creating a
>>> binary replication connection by the same name  It really seems like we
>>> need a field in pg_stat_replication or pg_replication_slots which lists
>>> the publication.
>>
>> I'm not quite sure what you are getting at here.  The application_name
>> seen on the publisher side is the subscription name.  You can create a
>> binary replication connection using the same application_name, but
>> that's already been possible before.  But the publications don't care
>> about any of this.
> 
> My point is that there is no system view where I can see, on the origin
> node, what subscribers are subscribing to which publications.  You can
> kinda guess that from pg_stat_replication etc., but it's not dependable
> information.
> 

That's like wanting the foreign server to show you which foreign tables
exist on the local server. This is not a tightly coupled system, and you
are able to set up both sides without them being connected to each other
at the time of setup, so there is no way the publisher can know anything.

> 
>>> P2: If I create a subscription on a table with no primary key, I do not
>>> recieve a warning.  There should be a warning, since in most cases such
>>> a subscription will not work.  I suggest the text:
>>>
>>> "logical replication target relation "public.fines" has no primary key.
>>> Either create one, or set REPLICA IDENTITY index and set the published
>>> relation to REPLICA IDENTITY FULL."
>>
>> At that point, we don't know what is being published.  If only inserts
>> are being published or REPLICA IDENTITY FULL is set, then it will work.
>> We don't want to give warnings about things that might not be true.
>>
>> More guidance on some of the potential failure cases would be good, but
>> it would need more refinement.
> 
> Hmmm, yah, I see.  Let me explain why this is a UX issue as-is though:
> 
> 1. User forgets to create a PK on the subscriber node.
> 
> 2. User starts a subscription to the tables.
> 
> 3. Subscription is successful.
> 
> 4. First update hits the publisher node.
> 
> 5. Subscription fails and disconnects.
> 
> The user's first thought is going to be a network issue, or a bug, or
> some other problem, not a missing PK.  Yeah, they can find that
> information in the logs, but only if they think to look for it in the
> first place, and in some environments (AWS, containers, etc.) logs can
> be very hard to access.
> 
> We really need the subscription to fail at step (2), not wait for the
> first update to fail.  And if it doesn't fail at step 2, then we should
> at least give a warning.
> 

Yes, I actually mentioned somewhere at some point that we should also
call the checks we run during replication from the appropriate DDL
commands when possible (the information might not be available when the
DDL is executed), but I never got around to actually implementing it.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Use of snapshot in logical replication

2017-06-07 Thread Petr Jelinek
On 08/06/17 03:08, Craig Ringer wrote:
> On 7 June 2017 at 18:16, sanyam jain <sanyamjai...@live.in> wrote:
>> Hi,
>>
>> Can someone explain the usage of exporting snapshot when a logical
>> replication slot is created?
> 
> It's used to pg_dump the schema at a consistent point in history where
> all xacts are known to be in the snapshot (and thus dumped) or known
> not to be (so they'll be streamed out on the slot).
> 

Not just the schema; it can also be used to get the existing data at a
consistent point in history, so that the changes will follow without gaps
or duplicates.

That being said, the built-in logical replication isn't using the
exported snapshots at all.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Notes on testing Postgres 10b1

2017-06-07 Thread Petr Jelinek
Hi,

On 07/06/17 07:01, Josh Berkus wrote:
> Folks,
> 
> I've put together some demos on PostgreSQL 10beta1.  Here's a few
> feedback notes based on my experience with it.
> [...snip...]
> 
> Problems
> 
> 
> P1. On the publishing node, logical replication relies on the *implied*
> correspondence of the application_name and the replication_slot both
> being named the same as the publication in order to associate a
> particular publication with a particular replication connection.
> However, there's absolutely nothing preventing me from also creating a
> binary replication connection by the same name  It really seems like we
> need a field in pg_stat_replication or pg_replication_slots which lists
> the publication.
> 

What do you mean by implied correspondence of application_name and the
replication_slot? We only use the subscription name as the default value
for those when the user does not specify something else; all three of
them can have different values if the user sets it up that way. And there
is no correspondence whatsoever to the names of publications. The
upstream only knows which publications to replicate because the
subscription gives the list of requested publications as an option to the
START_REPLICATION walsender command. The list of publications associated
with a subscription is only stored on the subscriber and the publisher
has no idea what those are.
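
For illustration, this is roughly the command the apply worker ends up
sending (simplified from libpqrcv_startstreaming(); the slot name, start
position and publication names below are made-up examples):

    StringInfoData cmd;
    XLogRecPtr  startpos = 0;   /* made-up start position for the example */

    initStringInfo(&cmd);
    appendStringInfo(&cmd,
                     "START_REPLICATION SLOT \"%s\" LOGICAL %X/%X"
                     " (proto_version '1', publication_names '%s')",
                     "mysub", (uint32) (startpos >> 32), (uint32) startpos,
                     "\"pub_a\",\"pub_b\"");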

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication - possible remaining problem

2017-06-07 Thread Petr Jelinek
Hi,

On 07/06/17 22:49, Erik Rijkers wrote:
> I am not sure whether what I found here amounts to a bug, I might be
> doing something dumb.
> 
> During the last few months I did tests by running pgbench over logical
> replication.  Earlier emails have details.
> 
> The basic form of that now works well (and the fix has been comitted)
> but as I looked over my testing program I noticed one change I made to
> it, already many weeks ago:
> 
> In the cleanup during startup (pre-flight check you might say) and also
> before the end, instead of
> 
>   echo "delete from pg_subscription;" | psql -qXp $port2 -- (1)
> 
> I changed that (as I say, many weeks ago) to:
> 
>   echo "delete from pg_subscription;
> delete from pg_subscription_rel;
> delete from pg_replication_origin; " | psql -qXp $port2   -- (2)
> 
> This occurs (2x) inside the bash function clean_pubsub(), in main test
> script pgbench_detail2.sh
> 
> This change was an effort to ensure to arrive at a 'clean' start (and
> end-) state which would always be the same.
> 
> All my more recent testing (and that of Mark, I have to assume) was thus
> done with (2).
> 
> Now, looking at the script again I am thinking that it would be
> reasonable to expect that after issuing
>delete from pg_subscription;
> 
> the other 2 tables are /also/ cleaned, automatically, as a consequence. 
> (Is this reasonable? this is really the main question of this email).
> 

Hmm, they are not cleaned automatically; deleting from system catalogs
manually like this never propagates to related tables, as we don't use
FKs there.

> So I removed the latter two delete statements again, and ran the tests
> again with the form in  (1)
> 
> I have established that (after a number of successful cycles) the test
> stops succeeding with in the replica log repetitions of:
> 
> 2017-06-07 22:10:29.057 CEST [2421] LOG:  logical replication apply
> worker for subscription "sub1" has started
> 2017-06-07 22:10:29.057 CEST [2421] ERROR:  could not find free
> replication state slot for replication origin with OID 11
> 2017-06-07 22:10:29.057 CEST [2421] HINT:  Increase
> max_replication_slots and try again.
> 2017-06-07 22:10:29.058 CEST [2061] LOG:  worker process: logical
> replication worker for subscription 29235 (PID 2421) exited with exit
> code 1
> 
> when I manually 'clean up' by doing:
>delete from pg_replication_origin;
> 

Yeah, that's because you consumed all the origins (I am still not a huge
fan of how that limit works, but that's a separate discussion).

> then, and only then, does the session finish and succeed ('replica ok').
> 
> So to me it looks as if there is an omission of
> pg_replication_origin-cleanup when pg_description is deleted.
> 

There is no omission; the origin is not supposed to be deleted
automatically unless you use DROP SUBSCRIPTION.


-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Get stuck when dropping a subscription during synchronizing table

2017-06-07 Thread Petr Jelinek
On 07/06/17 03:00, Andres Freund wrote:
> On 2017-06-06 19:36:13 +0200, Petr Jelinek wrote:
> 
>> As a side note, we are starting to have several IsSomeTypeOfProcess
>> functions for these kind of things. I wonder if bgworker infrastructure
>> should somehow provide this type of stuff (the proposed bgw_type might
>> help there) as well as maybe being able to register interrupt handler
>> for the worker (it's really hard to do it via custom SIGTERM handler as
>> visible on worker, launcher and walsender issues we are fixing).
>> Obviously this is PG11 related thought.
> 
> Maybe it's also a sign that the bgworker infrastructure isn't really the
> best thing to use for internal processes like parallel workers and lrep
> processes - it's really built so core code *doesn't* know anything
> specific about them, which isn't really what we want in some of these
> cases.  That's not to say that bgworkers and parallelism/lrep shouldn't
> share infrastructure, don't get me wrong.
>

Well, the nice thing about bgworkers is that they provide the basic
infrastructure. The main reason the lrep stuff uses them is that I didn't
want to add another special backend kind to 20 different places, but it
turns out it still needs to be in some. So either we need to add more
support for these kinds of things to bgworkers, or write something for
internal workers that shares the infrastructure with bgworkers, as you say.

> 
>>  
>> -LogicalRepCtx->launcher_pid = 0;
>> -
>> -/* ... and if it returns, we're done */
>> -ereport(DEBUG1,
>> -(errmsg("logical replication launcher shutting down")));
>> +/* Not reachable */
>> +}
> 
> Maybe put a pg_unreachable() there?
> 

Ah didn't know it existed.

> 
>> @@ -2848,6 +2849,14 @@ ProcessInterrupts(void)
>>  ereport(FATAL,
>>  (errcode(ERRCODE_ADMIN_SHUTDOWN),
>>   errmsg("terminating logical 
>> replication worker due to administrator command")));
>> +else if (IsLogicalLauncher())
>> +{
>> +ereport(DEBUG1,
>> +(errmsg("logical replication launcher 
>> shutting down")));
>> +
>> +/* The logical replication launcher can be stopped at 
>> any time. */
>> +proc_exit(0);
>> +}
> 
> We could use PgBackendStatus->st_backendType for these, merging these
> different paths.
> 

Hmm, that's not that easy; st_backendType will be the generic bgworker
type since the bgw_type patch hasn't landed yet (it is discussed in yet
another thread [1]). That seems like an argument for going forward with
it (the bgw_type patch) in PG10.

> I can take care of this one, if you/Peter want.
> 

I don't mind; it has some overlap with your proposed fixes for latching,
so if you are willing, go ahead.

[1]
https://www.postgresql.org/message-id/flat/0d795703-a885-2193-2331-f00d7a3a4...@2ndquadrant.com#0d795703-a885-2193-2331-f00d7a3a4...@2ndquadrant.com

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpqrcv_PQexec() seems to violate latch protocol

2017-06-06 Thread Petr Jelinek
On 06/06/17 23:42, Andres Freund wrote:
> On 2017-06-06 23:24:50 +0200, Petr Jelinek wrote:
>> On 06/06/17 23:17, Andres Freund wrote:
>>> Right.  I found a couple more instances of similarly iffy, although not
>>> quite as broken, patterns in launcher.c.  It's easy to get this wrong,
>>> but it's a lot easier if you do it differently everywhere you use a
>>> latch.  It's not good if code in the same file, by the same author(s),
>>> has different ways of using latches.
>>
>> Huh? I see the same pattern everywhere in launcher.c, what am I missing?
> 
> WaitForReplicationWorkerAttach:
> while (...)
> CHECK_FOR_INTERRUPTS();
> /* other stuff including returns */
> WaitLatch()
> WL_POSTMASTER_DEATH
> ResetLatch()
> 
> logicalrep_worker_stop loop 1:
> while (...)
> /* other stuff */
> CHECK_FOR_INTERRUPTS()
> WaitLatch()
> POSTMASTER_DEATH
> ResetLatch()
> /* other stuff including returns */
> logicalrep_worker_stop loop 1:
> while (...)
> /* other stuff including returns */
> CHECK_FOR_INTERRUPTS();
> WaitLatch()
> WL_POSTMASTER_DEATH
> ResetLatch()
> 
> ApplyLauncherMain:
> while (!got_SIGTERM)
> /* lots other stuff */
> WaitLatch()
> WL_POSTMASTER_DEATH
> /* some other stuff */
> ResetLatch()
> (note no CFI)
> 
> they're not hugely different, but subtely there are differences.
> Sometimes you're guaranteed to check for interrupts after resetting the
> latch, in other cases not. Sometimes expensive-ish things happen before
> a CFI...
> 

Ah, that's because signals in the launcher are broken; see
https://www.postgresql.org/message-id/fe072153-babd-3b5d-8052-73527a6eb...@2ndquadrant.com
which also includes a patch to fix it.

We originally had custom signal handling everywhere; then I realized it
was a mistake for workers because of the locking interaction, but missed
the same issue with the launcher (the CFI in the current launcher doesn't work).
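
For the record, the loop shape everything should converge on once the
launcher is on standard interrupt handling is roughly this (just a
sketch; the wait-event name and naptime are placeholders):

    for (;;)
    {
        int         rc;

        CHECK_FOR_INTERRUPTS();

        /* ... do one round of work, possibly return ... */

        rc = WaitLatch(MyLatch,
                       WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
                       naptime,
                       WAIT_EVENT_SOMETHING);

        if (rc & WL_POSTMASTER_DEATH)
            proc_exit(1);

        if (rc & WL_LATCH_SET)
        {
            /* reset the latch before acting on whatever it signalled */
            ResetLatch(MyLatch);
            CHECK_FOR_INTERRUPTS();
        }
    }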

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpqrcv_PQexec() seems to violate latch protocol

2017-06-06 Thread Petr Jelinek
On 06/06/17 23:17, Andres Freund wrote:
> On 2017-06-06 17:14:59 -0400, Tom Lane wrote:
>> Andres Freund <and...@anarazel.de> writes:
>>> The function  in $subject does:
>>
>>> ResetLatch(>procLatch);
>>> rc = WaitLatchOrSocket(>procLatch,
>>>WL_POSTMASTER_DEATH | WL_SOCKET_READABLE 
>>> |
>>>WL_LATCH_SET,
>>>PQsocket(streamConn),
>>>0,
>>>WAIT_EVENT_LIBPQWALRECEIVER);
>>
>> Yeah, this is certainly broken.
>>
>>> Afaict, the ResetLatch() really should just instead be in the if (rc & 
>>> WL_LATCH_SET) block.
>>
>> And, to be specific, it should be before the CHECK_FOR_INTERRUPTS call,
>> since that is the useful work that we want to be sure occurs after
>> any latch-setting event.
> 
> Right.  I found a couple more instances of similarly iffy, although not
> quite as broken, patterns in launcher.c.  It's easy to get this wrong,
> but it's a lot easier if you do it differently everywhere you use a
> latch.  It's not good if code in the same file, by the same author(s),
> has different ways of using latches.

Huh? I see the same pattern everywhere in launcher.c, what am I missing?

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication - still unstable after all these months

2017-06-06 Thread Petr Jelinek
On 06/06/17 21:09, Robert Haas wrote:
> On Tue, Jun 6, 2017 at 3:01 PM, Erik Rijkers <e...@xs4all.nl> wrote:
>> Belated apologies all round for the somewhat provocative $subject; but I
>> felt at that moment that this item needed some extra attention.
> 
> FWIW, it seemed like a pretty fair subject line to me given your test
> results.  I think it's in everyone's interest to get this feature
> stabilized before we ship a final release - people will rely on it,
> and if it drops even one row even occasionally, it's going to be a big
> problem for our users.
> 

+1, I considered the issues solved when the snapshot builder issues were
fixed, so it was a good reminder to check again.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Get stuck when dropping a subscription during synchronizing table

2017-06-06 Thread Petr Jelinek
On 03/06/17 18:53, Peter Eisentraut wrote:
> On 6/2/17 14:52, Peter Eisentraut wrote:
>> On 5/24/17 15:14, Petr Jelinek wrote:
>>> All the locking works just fine the way it is in master. The issue with
>>> deadlock with apply comes from the wrong handling of the SIGTERM in the
>>> apply (we didn't set InterruptPending). I changed the SIGTERM handler in
>>> patch 0001 just to die which is actually the correct behavior for apply
>>> workers. I also moved the connection cleanup code to the
>>> before_shmem_exit callback (similar to walreceiver) and now that part
>>> works correctly.
>>
>> I have committed this, in two separate parts.  This should fix the
>> originally reported issue.
>>
>> I will continue to work through your other patches.  I notice there is
>> still a bit of discussion about another patch, so please let me know if
>> there is anything else I should be looking for.
> 
> I have committed the remaining two patches.  I believe this fixes the
> originally reported issue.
> 

So the fact that we moved workers to standard interrupt handling broke
the launcher in subtle ways, because it still uses its own SIGTERM
handling but some functions it calls use CHECK_FOR_INTERRUPTS (they are
used by workers as well). I think we need to move the launcher to
standard interrupt handling as well. It's not the same as other processes
though, as it's allowed to be terminated at any time (just like the
autovacuum launcher), so we just proc_exit(0) instead of FATALing out.

This is related to the nightjar failures btw.

As a side note, we are starting to have several IsSomeTypeOfProcess
functions for these kinds of things. I wonder if the bgworker
infrastructure should somehow provide this type of stuff (the proposed
bgw_type might help there), as well as maybe being able to register an
interrupt handler for the worker (it's really hard to do via a custom
SIGTERM handler, as is visible in the worker, launcher and walsender
issues we are fixing). Obviously this is a PG11-related thought.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
From 3fb7ed57d669beba3feba894a11c9dcff8e60414 Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmo...@pjmodos.net>
Date: Tue, 6 Jun 2017 19:26:12 +0200
Subject: [PATCH] Use standard interrupt handling in logical replication
 launcher

---
 src/backend/replication/logical/launcher.c | 41 --
 src/backend/tcop/postgres.c|  9 +++
 src/include/replication/logicallauncher.h  |  2 ++
 3 files changed, 27 insertions(+), 25 deletions(-)

diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 5aaf24b..52169f1 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -81,7 +81,6 @@ static void logicalrep_worker_cleanup(LogicalRepWorker *worker);
 
 /* Flags set by signal handlers */
 static volatile sig_atomic_t got_SIGHUP = false;
-static volatile sig_atomic_t got_SIGTERM = false;
 
 static bool on_commit_launcher_wakeup = false;
 
@@ -623,20 +622,6 @@ logicalrep_worker_onexit(int code, Datum arg)
 	ApplyLauncherWakeup();
 }
 
-/* SIGTERM: set flag to exit at next convenient time */
-static void
-logicalrep_launcher_sigterm(SIGNAL_ARGS)
-{
-	int			save_errno = errno;
-
-	got_SIGTERM = true;
-
-	/* Waken anything waiting on the process latch */
-	SetLatch(MyLatch);
-
-	errno = save_errno;
-}
-
 /* SIGHUP: set flag to reload configuration at next convenient time */
 static void
 logicalrep_launcher_sighup(SIGNAL_ARGS)
@@ -798,13 +783,14 @@ ApplyLauncherMain(Datum main_arg)
 
 	before_shmem_exit(logicalrep_launcher_onexit, (Datum) 0);
 
+	Assert(LogicalRepCtx->launcher_pid == 0);
+	LogicalRepCtx->launcher_pid = MyProcPid;
+
 	/* Establish signal handlers. */
 	pqsignal(SIGHUP, logicalrep_launcher_sighup);
-	pqsignal(SIGTERM, logicalrep_launcher_sigterm);
+	pqsignal(SIGTERM, die);
 	BackgroundWorkerUnblockSignals();
 
-	LogicalRepCtx->launcher_pid = MyProcPid;
-
 	/*
 	 * Establish connection to nailed catalogs (we only ever access
 	 * pg_subscription).
@@ -812,7 +798,7 @@ ApplyLauncherMain(Datum main_arg)
 	BackgroundWorkerInitializeConnection(NULL, NULL);
 
 	/* Enter main loop */
-	while (!got_SIGTERM)
+	for (;;)
 	{
 		int			rc;
 		List	   *sublist;
@@ -822,6 +808,8 @@ ApplyLauncherMain(Datum main_arg)
 		TimestampTz now;
 		long		wait_time = DEFAULT_NAPTIME_PER_CYCLE;
 
+		CHECK_FOR_INTERRUPTS();
+
 		now = GetCurrentTimestamp();
 
 		/* Limit the start retry to once a wal_retrieve_retry_interval */
@@ -894,13 +882,16 @@ ApplyLauncherMain(Datum main_arg)
 		ResetLatch(>procLatch);
 	}
 
-	LogicalRepCtx->launcher_pid = 0;
-
-	/* ... and if it returns, we're done */
-	ereport(DEBUG1,
-			(errmsg("logical replication launcher shutting down")));
+	/*

Re: [HACKERS] inconsistent application_name use in logical workers

2017-06-06 Thread Petr Jelinek
On 06/06/17 15:07, Peter Eisentraut wrote:
> On 6/6/17 06:51, Petr Jelinek wrote:
>> On 06/06/17 04:19, Peter Eisentraut wrote:
>>> The logical replication code is supposed to use the subscription name as
>>> the fallback_application_name, but in some cases it uses the slot name,
>>> which could be different.  See attached patch to correct this.
>>
>> Hmm, well the differentiation has a reason though. The application_name
>> is used for sync rep and having synchronization connection using same
>> application_name might have adverse effects there because
>> synchronization connection can be in-front of main apply one, so sync
>> rep will think something is consumed while it's not.
> 
> True, we should use a different name for tablesync.c.  But the one in
> worker.c appears to be a mistake then?
> 

Yes.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] inconsistent application_name use in logical workers

2017-06-06 Thread Petr Jelinek
On 06/06/17 04:19, Peter Eisentraut wrote:
> The logical replication code is supposed to use the subscription name as
> the fallback_application_name, but in some cases it uses the slot name,
> which could be different.  See attached patch to correct this.
> 

Hmm, well, the differentiation has a reason though. The application_name
is used for sync rep, and having the synchronization connection use the
same application_name might have adverse effects there, because the
synchronization connection can be in front of the main apply one, so sync
rep will think something is consumed while it's not.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why does logical replication launcher set application_name?

2017-06-05 Thread Petr Jelinek
On 03/06/17 05:18, Peter Eisentraut wrote:
> On 6/2/17 16:44, Petr Jelinek wrote:
>> However, I am not sure about the bgw_name_extra. I think I would have
>> preferred keeping full bgw_name field which would be used where full
>> name is needed and bgw_type where only the worker type is used. The
>> concatenation just doesn't sit well with me, especially if it requires
>> the bgw_name_extra to start with space.
> 
> I see your point.  There are also some i18n considerations to think through.
> 

So thinking a bit more, I wonder if we could simply do the following:
- remove the application_name from logical workers
- add bgw_type and use it for the worker type (if empty, use 'bgworker'
like now); it would probably be nice if parallel workers added something
there to indicate they are parallel workers as well
- remove the 'bgworker:' prefix for the ps display and just use the bgw_name

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication - still unstable after all these months

2017-06-04 Thread Petr Jelinek
On 03/06/17 16:12, Jeff Janes wrote:
> 
> On Fri, Jun 2, 2017 at 4:10 PM, Petr Jelinek
> <petr.jeli...@2ndquadrant.com <mailto:petr.jeli...@2ndquadrant.com>> wrote:
> 
> 
> While I was testing something for different thread I noticed that I
> manage transactions incorrectly in this patch. Fixed here, I didn't test
> it much yet (it takes a while as you know :) ). Not sure if it's related
> to the issue you've seen though.
> 
> 
> This conflicts again with Peter E's recent commit
> 3c9bc2157a4f465b3c070d1250597568d2dc285f
> 

Rebased on top of current HEAD.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
>From 067edf450967c76d000c0b4a2cddf719fae45f7f Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmo...@pjmodos.net>
Date: Wed, 31 May 2017 01:51:45 +0200
Subject: [PATCH] Improve handover logic between sync and apply workers

Make apply busy wait check the catalog instead of shmem state to ensure
that next transaction will see the expected table synchronization state.

Also make the handover always go through same set of steps to make the
overall process easier to understand and debug.
---
 src/backend/replication/logical/tablesync.c | 215 
 1 file changed, 123 insertions(+), 92 deletions(-)

diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 6e268f3..8926914 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -27,27 +27,21 @@
  *		 synchronization has finished.
  *
  *	  The stream position synchronization works in multiple steps.
- *	   - Sync finishes copy and sets table state as SYNCWAIT and waits
+ *	   - Sync finishes copy and sets worker state as SYNCWAIT and waits
  *		 for state to change in a loop.
  *	   - Apply periodically checks tables that are synchronizing for SYNCWAIT.
- *		 When the desired state appears it will compare its position in the
- *		 stream with the SYNCWAIT position and based on that changes the
- *		 state to based on following rules:
- *		  - if the apply is in front of the sync in the WAL stream the new
- *			state is set to CATCHUP and apply loops until the sync process
- *			catches up to the same LSN as apply
- *		  - if the sync is in front of the apply in the WAL stream the new
- *			state is set to SYNCDONE
- *		  - if both apply and sync are at the same position in the WAL stream
- *			the state of the table is set to READY
- *	   - If the state was set to CATCHUP sync will read the stream and
- *		 apply changes until it catches up to the specified stream
- *		 position and then sets state to READY and signals apply that it
- *		 can stop waiting and exits, if the state was set to something
- *		 else than CATCHUP the sync process will simply end.
- *	   - If the state was set to SYNCDONE by apply, the apply will
- *		 continue tracking the table until it reaches the SYNCDONE stream
- *		 position at which point it sets state to READY and stops tracking.
+ *		 When the desired state appears it will set the worker state to
+ *		 CATCHUP and starts loop waiting until either table state is set to
+ *		 SYNCDONE or worker exits
+ *	   - After worker seen the state change to CATCHUP, it will read the
+ *	 stream and apply changes until it catches up to the specified stream
+ *		 position and then sets state to SYNCDONE. Note that there might be
+ *		 zero changes applied betweem CATCHUP and SYNCDONE states as the sync
+ *		 worker might be ahead of the apply.
+ *	   - Once the state was set to SYNCDONE, the apply will continue tracking
+ *	 the table until it reaches the SYNCDONE stream position at which
+ *	 point it sets state to READY and stops tracking. Again there might
+ *	 be zero changes inbetween.
  *
  *	  The catalog pg_subscription_rel is used to keep information about
  *	  subscribed tables and their state and some transient state during
@@ -56,26 +50,29 @@
  *	  Example flows look like this:
  *	   - Apply is in front:
  *		  sync:8
- *			-> set SYNCWAIT
+ *			-> set in memory SYNCWAIT
  *		  apply:10
- *			-> set CATCHUP
+ *			-> set in memory CATCHUP
  *			-> enter wait-loop
  *		  sync:10
- *			-> set READY
+ *			-> set in catalog SYNCDONE
  *			-> exit
  *		  apply:10
  *			-> exit wait-loop
  *			-> continue rep
+ *		  apply:11
+ *		-> set in catalog READY
  *	   - Sync in front:
  *		  sync:10
- *			-> set SYNCWAIT
+ *			-> set in memory SYNCWAIT
  *		  apply:8
- *			-> set SYNCDONE
+ *			-> set in memory CATCHUP
  *			-> continue per-table filtering
- *		  sync:10
+  *		  sync:10
+ *			-> set in catalog SYNCDONE
  *			-> exit
  *		  apply:10
- *			-> set READY
+ *			-> set in catalog READY
  *			-> stop per-table filtering
  *			-> continue rep
  *

Re: [HACKERS] logical replication - still unstable after all these months

2017-06-02 Thread Petr Jelinek
On 03/06/17 04:45, Mark Kirkwood wrote:
> On 03/06/17 11:10, Petr Jelinek wrote:
> 
>> On 02/06/17 22:29, Petr Jelinek wrote:
>>> On 02/06/17 08:55, Mark Kirkwood wrote:
>>>> On 02/06/17 17:11, Erik Rijkers wrote:
>>>>
>>>>> On 2017-06-02 00:46, Mark Kirkwood wrote:
>>>>>> On 31/05/17 21:16, Petr Jelinek wrote:
>>>>>>
>>>>>> I'm seeing a new failure with the patch applied - this time the
>>>>>> history table has missing rows. Petr, I'll put back your access :-)
>>>>> Is this error during 1-minute runs?
>>>>>
>>>>> I'm asking because I've moved back to longer (1-hour) runs (no errors
>>>>> so far), and I'd like to keep track of what the most 'vulnerable'
>>>>> parameters are.
>>>>>
>>>> Yeah, still using your test config (with my minor modifications).
>>>>
>>>> When I got the error the 1st time, I did a complete make clean and
>>>> rebuildbut it is still possible I've 'done it wrong' - so
>>>> independent confirmation would be good!
>>> Well, I've seen this issue as well while I was developing the fix, but
>>> the patch I proposed fixed it for me as well as the original issue.
>>>
>> While I was testing something for different thread I noticed that I
>> manage transactions incorrectly in this patch. Fixed here, I didn't test
>> it much yet (it takes a while as you know :) ). Not sure if it's related
>> to the issue you've seen though.
>>
>>
> Ok - I've applied this version, and running tests again. I needed to do
> a git pull to apply the patch, so getting some other changes too!
> 

Thanks, yes, I forgot to mention that I rebased it against the current
HEAD as well.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication busy-waiting on a lock

2017-06-02 Thread Petr Jelinek
On 03/06/17 03:25, Andres Freund wrote:
> 
>> That should solve the original problem reported here.
> 
> Did you verify that?
>

Yes, I tried to manually create multiple exported logical decoding
snapshots in parallel and read data from them, and it worked fine,
whereas before it blocked.

> 
>> @@ -1741,17 +1741,17 @@ GetSerializableTransactionSnapshotInt(Snapshot 
>> snapshot,
>>  } while (!sxact);
>>  
>>  /* Get the snapshot, or check that it's safe to use */
>> -if (!TransactionIdIsValid(sourcexid))
>> +if (!sourcevxid)
>>  snapshot = GetSnapshotData(snapshot);
>> -else if (!ProcArrayInstallImportedXmin(snapshot->xmin, sourcexid))
>> +else if (!ProcArrayInstallImportedXmin(snapshot->xmin, sourcevxid))
>>  {
>>  ReleasePredXact(sxact);
>>  LWLockRelease(SerializableXactHashLock);
>>  ereport(ERROR,
>>  
>> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>>   errmsg("could not import the requested 
>> snapshot"),
>> -   errdetail("The source transaction %u is not running 
>> anymore.",
>> - sourcexid)));
>> +   errdetail("The source virtual transaction %d/%u is not running 
>> anymore.",
>> + sourcevxid->backendId, 
>> sourcevxid->localTransactionId)));
> 
> Hm, this is a harder to read.  Wonder if we should add a pid field, that'd
> make it a bit easier to interpret?
> 

Agreed, see attached. We have to pass the pid around a bit, but I don't
think it's too bad.

One thing I don't like is the GetLastLocalTransactionId() that I had to
add because we clear proc->lxid before we get to AtEOXact_Snapshot(),
but I don't have a better solution there.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


0002-Don-t-assign-xid-to-logical-decoding-snapshots.patch
Description: invalid/octet-stream


0001-Use-virtual-transaction-instead-of-normal-ones-in-ex.patch
Description: invalid/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication busy-waiting on a lock

2017-06-02 Thread Petr Jelinek
On 02/06/17 03:38, Robert Haas wrote:
> On Thu, Jun 1, 2017 at 2:28 PM, Andres Freund <and...@anarazel.de> wrote:
>> On 2017-06-01 14:17:44 +0200, Petr Jelinek wrote:
>>> Thinking more about this, I am not convinced it's a good idea to change
>>> exports this late in the cycle. I still think it's best to do the xid
>>> assignment only when the snapshot is actually exported but don't assign
>>> xid when the export is only used by the local session (the new option in
>>> PG10). That's one line change which impacts only logical
>>> replication/decoding as opposed to everything else which uses exported
>>> snapshots.
>>
>> I'm not quite convinced by this argument.  Exported snapshot contents
>> are ephemeral, we can change the format at any time.  The wait is fairly
>> annoying for every user of logical decoding.  For me the combination of
>> those two fact implies that we should rather fix this properly.
> 
> +1.  The change Andres is proposing doesn't sound like it will be
> terribly high-impact, and I think we'll be happier in the long run if
> we install real fixes instead of kludging it.
> 

All right, here is my rough attempt at doing what Andres has suggested,
going straight from xid to vxid; it seems to work fine in my tests and
parallel pg_dump seems happy with it as well.

The second patch removes the xid parameter from SnapBuildBuildSnapshot,
as it's not used for anything, and skips the GetTopTransactionId() call
as well. That should solve the original problem reported here.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


0002-Don-t-assign-xid-to-logical-decoding-snapshots.patch
Description: invalid/octet-stream


0001-Use-virtual-transaction-instead-of-normal-ones-in-ex.patch
Description: invalid/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication - still unstable after all these months

2017-06-02 Thread Petr Jelinek
On 02/06/17 22:29, Petr Jelinek wrote:
> On 02/06/17 08:55, Mark Kirkwood wrote:
>> On 02/06/17 17:11, Erik Rijkers wrote:
>>
>>> On 2017-06-02 00:46, Mark Kirkwood wrote:
>>>> On 31/05/17 21:16, Petr Jelinek wrote:
>>>>
>>>> I'm seeing a new failure with the patch applied - this time the
>>>> history table has missing rows. Petr, I'll put back your access :-)
>>>
>>> Is this error during 1-minute runs?
>>>
>>> I'm asking because I've moved back to longer (1-hour) runs (no errors
>>> so far), and I'd like to keep track of what the most 'vulnerable'
>>> parameters are.
>>>
>>
>> Yeah, still using your test config (with my minor modifications).
>>
>> When I got the error the 1st time, I did a complete make clean and
>> rebuildbut it is still possible I've 'done it wrong' - so
>> independent confirmation would be good!
> 
> Well, I've seen this issue as well while I was developing the fix, but
> the patch I proposed fixed it for me as well as the original issue.
> 

While I was testing something for a different thread I noticed that I
managed transactions incorrectly in this patch. Fixed here; I didn't test
it much yet (it takes a while as you know :) ). Not sure if it's related
to the issue you've seen though.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


Improve-handover-logic-between-sync-and-apply-worker-v2.patch
Description: invalid/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] walsender termination error messages worse in v10

2017-06-02 Thread Petr Jelinek
On 02/06/17 23:45, Andres Freund wrote:
> Hi Petr,
> 
> On 2017-06-02 22:57:37 +0200, Petr Jelinek wrote:
>> On 02/06/17 20:51, Andres Freund wrote:
>>> I don't understand why the new block is there, nor does the commit
>>> message explain it.
>>>
>>
>> Hmm, that particular change can actually be reverted. It was needed for
>> one those custom replication commands which were replaced by normal
>> query support. I have missed it during the rewrite.
> 
> Doesn't appear to be quite that simple, I get regression test failures
> in that case.
> 

Hmm, looks like we still use it for normal COPY handling. So basically
the problem is that if we run COPY TO STDOUT and then consume it using
libpqrcv_receive, it will end with a normal PGRES_COMMAND_OK, but we need
to call PQgetResult() in that case, otherwise libpq thinks the command is
still active and any following command will fail. However, if we call
PQgetResult() on a dead connection, we get the error you complained about.

I guess it would make sense to do a conditional exit on
(PQstatus(streamConn) == CONNECTION_BAD) like libpqrcv_PQexec does. It's
quite ugly code-wise though.
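
Something along these lines is what I have in mind (a rough sketch of the
idea only, not necessarily what the attached diff does):

    res = PQgetResult(conn->streamConn);
    if (PQresultStatus(res) == PGRES_COMMAND_OK)
    {
        PQclear(res);

        /*
         * If the connection is already gone, calling PQgetResult() again
         * just yields the confusing "unexpected result after
         * CommandComplete" error, so report the disconnect directly.
         */
        if (PQstatus(conn->streamConn) == CONNECTION_BAD)
            ereport(ERROR,
                    (errmsg("could not receive data from WAL stream: %s",
                            pchomp(PQerrorMessage(conn->streamConn)))));

        /* Verify that there are no more results. */
        res = PQgetResult(conn->streamConn);
        if (res != NULL)
            ereport(ERROR,
                    (errmsg("unexpected result after CommandComplete: %s",
                            PQerrorMessage(conn->streamConn))));
        return -1;
    }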

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


libpqwalsender-disconnect-fix.diff
Description: invalid/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] walsender termination error messages worse in v10

2017-06-02 Thread Petr Jelinek
Hi Andres,

On 02/06/17 20:51, Andres Freund wrote:
> Hi,
> 
> commit 7c4f52409a8c7d85ed169bbbc1f6092274d03920
> Author: Peter Eisentraut <pete...@gmx.net>
> Date:   2017-03-23 08:36:36 -0400
> 
> Logical replication support for initial data copy
> 
> made walreceiver emit worse messages in v10 than before when the master
> gets shut down.  Before 10 you'll get something like:
> 
> 2017-06-02 11:46:07.173 PDT [16143][] LOG:  replication terminated by primary 
> server
> 2017-06-02 11:46:07.173 PDT [16143][] DETAIL:  End of WAL reached on timeline 
> 1 at 0/1B7FB8C8.
> 2017-06-02 11:46:07.173 PDT [16143][] FATAL:  could not send end-of-streaming 
> message to primary: no COPY in progress
> 
> the last bit is a bit superflous, but it still kinda makes sense.  Now
> you get:
> 2017-06-02 11:47:09.362 PDT [17061][] FATAL:  unexpected result after 
> CommandComplete: server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> 
> the reason is that
> 
> static int
> libpqrcv_receive(char **buffer, pgsocket *wait_fd)
> {
> 
> previously did:
> if (rawlen == -1)   /* end-of-streaming or error */
> {
> PGresult   *res;
> 
> res = PQgetResult(streamConn);
> if (PQresultStatus(res) == PGRES_COMMAND_OK ||
> PQresultStatus(res) == PGRES_COPY_IN)
> {
> PQclear(res);
> return -1;
> }
> else
> {
> PQclear(res);
> ereport(ERROR,
> (errmsg("could not receive data from WAL stream: %s",
> PQerrorMessage(streamConn;
> }
> }
> 
> and now does
> 
> if (rawlen == -1)   /* end-of-streaming or error */
> {
> PGresult   *res;
> 
> res = PQgetResult(conn->streamConn);
> if (PQresultStatus(res) == PGRES_COMMAND_OK)
> {
> PQclear(res);
> 
> /* Verify that there are no more results */
> res = PQgetResult(conn->streamConn);
> if (res != NULL)
> ereport(ERROR,
> (errmsg("unexpected result after CommandComplete: %s",
> PQerrorMessage(conn->streamConn;
> return -1;
> }
> else if (PQresultStatus(res) == PGRES_COPY_IN)
> {
> PQclear(res);
> return -1;
> }
> else
> {
> PQclear(res);
> ereport(ERROR,
> (errmsg("could not receive data from WAL stream: %s",
> pchomp(PQerrorMessage(conn->streamConn);
> }
> }
> 
> note the new /* Verify that there are no more results */ bit.
> 
> I don't understand why the new block is there, nor does the commit
> message explain it.
> 

Hmm, that particular change can actually be reverted. It was needed for
one of those custom replication commands which were replaced by normal
query support. I missed it during the rewrite.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why does logical replication launcher set application_name?

2017-06-02 Thread Petr Jelinek
On 02/06/17 21:05, Peter Eisentraut wrote:
> On 6/2/17 02:31, Masahiko Sawada wrote:
>> I'd say current patch makes the user difficult to
>> distinguish between apply worker and table sync worker.
> 
> We could arguably make apply workers and sync workers have different
> bgw_type values.  But if you are interested in that level of detail, you
> should perhaps look at pg_stat_subscription.  pg_stat_activity only
> contains the "common" data, and the process-specific data is in other views.
> 

Agreed with this.

However, I am not sure about the bgw_name_extra. I think I would have
preferred keeping a full bgw_name field which would be used where the
full name is needed, and bgw_type where only the worker type is used. The
concatenation just doesn't sit well with me, especially if it requires
the bgw_name_extra to start with a space.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication - still unstable after all these months

2017-06-02 Thread Petr Jelinek
On 02/06/17 08:55, Mark Kirkwood wrote:
> On 02/06/17 17:11, Erik Rijkers wrote:
> 
>> On 2017-06-02 00:46, Mark Kirkwood wrote:
>>> On 31/05/17 21:16, Petr Jelinek wrote:
>>>
>>> I'm seeing a new failure with the patch applied - this time the
>>> history table has missing rows. Petr, I'll put back your access :-)
>>
>> Is this error during 1-minute runs?
>>
>> I'm asking because I've moved back to longer (1-hour) runs (no errors
>> so far), and I'd like to keep track of what the most 'vulnerable'
>> parameters are.
>>
> 
> Yeah, still using your test config (with my minor modifications).
> 
> When I got the error the 1st time, I did a complete make clean and
> rebuildbut it is still possible I've 'done it wrong' - so
> independent confirmation would be good!

Well, I've seen this issue as well while I was developing the fix, but
the patch I proposed fixed it for me as well as the original issue.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: retry shm attach for windows (WAS: Re: [HACKERS] OK, so culicidae is *still* broken)

2017-06-02 Thread Petr Jelinek
On 02/06/17 15:37, Amit Kapila wrote:
> On Thu, Jun 1, 2017 at 10:36 PM, Petr Jelinek
> <petr.jeli...@2ndquadrant.com> wrote:
>> On 01/06/17 15:25, Tom Lane wrote:
>>> Robert Haas <robertmh...@gmail.com> writes:
>>>> So, are you going to, perhaps, commit this?  Or who is picking this up?
>>>
>>>> /me knows precious little about Windows.
>>>
>>> I'm not going to be the one to commit this either, but seems like someone
>>> should.
>>>
>>
>> The new code does not use any windows specific APIs or anything, it just
>> adds retry logic for reattaching when we do EXEC_BACKEND which seems to
>> be agreed way of solving this. I do have couple of comments about the
>> code though.
>>
>> The new parameter retry_count in PGSharedMemoryReAttach() seems to be
>> only used to decide if to log reattach issues so that we don't spam log
>> when retrying, but this fact is not mentioned anywhere.
>>
> 
> No, it is to avoid calling free of memory which is not reserved on
> retry.  See the comment:
> + * On the first try, release memory region reservation that was made by
> + * the postmaster.
> 
> Are you referring to the same function in sysv_shm.c, if so probably I
> can say refer the same API in win32_shmem.c or maybe add a similar
> comment there as well?
> 

Yeah, something like that would help, but my main confusion comes from
the fact that there is a counter (and it is even named as such) while the
only relevant distinction is zero versus non-zero. I'd like a mention of
that, mainly since I was confused by it on the first read.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Get stuck when dropping a subscription during synchronizing table

2017-06-01 Thread Petr Jelinek
On 01/06/17 17:32, Masahiko Sawada wrote:
> On Thu, May 25, 2017 at 5:29 PM, tushar <tushar.ah...@enterprisedb.com> wrote:
>> On 05/25/2017 12:44 AM, Petr Jelinek wrote:
>>>
>>> There is still outstanding issue that sync worker will keep running
>>> inside the long COPY because the invalidation messages are also not
>>> processed until it finishes but all the original issues reported here
>>> disappear for me with the attached patches applied.
>>
>> After applying all your patches, drop subscription no  more hangs while
>> dropping  subscription but there is an error   "ERROR:  tuple concurrently
>> updated" in the log file of
>> standby.
>>
> 
> I tried to reproduce this issue with latest four patches I submit but
> it didn't happen. I guess this issue doesn't related to the issues
> reported on this thread and another factor  might be cause of this
> issue. It would be good to test the same again with latest four
> patches or after solved current some open items.
> 

That's because your additional patch kills the workers that do the
concurrent update. While that's probably okay, I still plan to look into
making the subscription and state locking more robust.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] Concurrent ALTER SEQUENCE RESTART Regression

2017-06-01 Thread Petr Jelinek
On 01/06/17 16:51, Robert Haas wrote:
> On Wed, May 31, 2017 at 8:07 PM, Andres Freund <and...@anarazel.de> wrote:
>> Here's a patch doing what I suggested above.  The second patch addresses
>> an independent oversight where the post alter hook was invoked before
>> doing the catalog update.
> 
> 0002 is a slam-dunk.  I don't object to 0001 but haven't reviewed it 
> carefully.
> 

I did go over the code (i.e. didn't do actual testing), and I like it
much better than the current state, both in terms of making the behavior
more consistent and the fact that the code is simpler.

So +1 to go ahead with both patches.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: retry shm attach for windows (WAS: Re: [HACKERS] OK, so culicidae is *still* broken)

2017-06-01 Thread Petr Jelinek
On 01/06/17 15:25, Tom Lane wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> So, are you going to, perhaps, commit this?  Or who is picking this up?
> 
>> /me knows precious little about Windows.
> 
> I'm not going to be the one to commit this either, but seems like someone
> should.
> 

The new code does not use any Windows-specific APIs or anything; it just
adds retry logic for reattaching when we do EXEC_BACKEND, which seems to
be the agreed way of solving this. I do have a couple of comments about
the code though.

The new parameter retry_count in PGSharedMemoryReAttach() seems to be
used only to decide whether to log reattach issues so that we don't spam
the log when retrying, but this fact is not mentioned anywhere.

Also, I am not excited about the following coding style:
> + if (!pgwin32_ReserveSharedMemoryRegion(pi.hProcess))
> + continue;
> + else
> + {

Amit, if you want to avoid having to add the curly braces for a single
line while still having an else, I'd invert the expression in the if ()
statement so that the true case comes first. It's much less ugly to have
the curly-brace part first and the continue statement in the else block, IMHO.
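
In other words, something like this (just sketching the shape I mean):

    if (pgwin32_ReserveSharedMemoryRegion(pi.hProcess))
    {
        /* ... the multi-line success path goes here ... */
    }
    else
        continue;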

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Create subscription with `create_slot=false` and incorrect slot name

2017-06-01 Thread Petr Jelinek
On 01/06/17 04:44, Peter Eisentraut wrote:
> On 5/31/17 09:40, Robert Haas wrote:
>> On Tue, May 30, 2017 at 3:01 PM, Peter Eisentraut
>> <peter.eisentr...@2ndquadrant.com> wrote:
>>> On 5/25/17 17:26, Peter Eisentraut wrote:
>>>> Another way to fix this particular issue is to not verify the
>>>> replication slot name before doing the drop.  After all, if the name is
>>>> not valid, then you can also just report that it doesn't exist.
>>>
>>> Here is a possible patch along these lines.
>>
>> I don't see how this solves the problem.  Don't you still end up with
>> an error message telling you that you can't drop the subscription, and
>> no guidance as to how to fix it?
> 
> Well, the idea was to make the error message less cryptic.
> 
> But I notice that there is really little documentation about this.  So
> how about the attached documentation patch as well?
> 
> As mentioned earlier, if we want to do HINT messages, that will be a bit
> more involved and probably PG11 material.
> 

I think the combination of those patches is probably a good enough
solution for PG10 (I never understood the need for name validation in
ReplicationSlotAcquire() anyway).

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication busy-waiting on a lock

2017-06-01 Thread Petr Jelinek
On 31/05/17 11:21, Petr Jelinek wrote:
> On 31/05/17 09:24, Andres Freund wrote:
>> On 2017-05-29 23:49:33 +0200, Petr Jelinek wrote:
>>> I am not quite sure I understand (both the vxid suggestion and for the
>>> session dying badly). Maybe we can discuss bit more when you get to
>>> computer so it's easier for you to expand on what you mean.
>>
>> The xid interlock when exporting a snapshot is required because
>> otherwise it's not generally guaranteed that all resourced required for
>> the snapshot are reserved.  In the logical replication case we could
>> guarantee that otherwise, but there'd be weird-ish edge cases when
>> erroring out just after exporting a snapshot.
>>
>> The problem with using the xid as that interlock is that it requires
>> acquiring an xid - which is something we're going to block against when
>> building a new catalog snapshot.  Afaict that's not entirely required -
>> all that we need to verify is that the snapshot in the source
>> transaction is still running.  The easiest way for the importer to check
>> that the source is still alive seems to be export the virtual
>> transaction id instead of the xid.  Normally we can't store things like
>> virtual xids on disk, but that concern isn't valid here because exported
>> snapshots are ephemeral, there's also already a precedent in
>> predicate.c.
>>
>> It seems like it'd be fairly easy to change things around that way, but
>> maybe I'm missing something.
>>
> 
> Okay, thanks for explanation. Code-wise it does seem simple enough
> indeed. I admit I don't know enough about the exported snapshots and
> snapshot management in general to be able to answer the question of
> safety here. That said, it does seem to me like it should work as the
> exported snapshots are just on disk representation of in-memory state
> that becomes invalid once the in-memory state does.
> 

Thinking more about this, I am not convinced it's a good idea to change
exports this late in the cycle. I still think it's best to do the xid
assignment only when the snapshot is actually exported, and not assign an
xid when the export is only used by the local session (the new option in
PG10). That's a one-line change which impacts only logical
replication/decoding, as opposed to everything else which uses exported
snapshots.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] walsender & parallelism

2017-06-01 Thread Petr Jelinek
On 01/06/17 06:06, Andres Freund wrote:
> On 2017-05-31 23:51:08 -0400, Peter Eisentraut wrote:
>> I think the easiest and safest thing to do now is to just prevent
>> parallel plans in the walsender.  See attached patch.  This prevents the
>> hang in the select_parallel tests run under your new test setup.
> 
> I'm not quite sure I can buy this.  The lack of wired up signals has
> more problems than just hurting parallelism.  In fact, the USR1 thing
> seems like something that we actually should backpatch, rather than
> defer to v11.  I think there's some fair arguments to be made that we
> shouldn't do the refactoring right now - although I'm not sure about it
> - but just not fixing the bugs seems like a bad plan.
> 

I think the signal handling needs to be fixed. It does not have to be
done via a large refactoring, but signals should be handled properly
(i.e. we need to share the SIGHUP/SIGUSR1 handling between postgres.c and
walsender.c).

The rest can wait for PG11.
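
To be concrete, the SIGUSR1 part essentially means installing the same
handler regular backends use, e.g. in WalSndSignals() (just a sketch of
the direction, not a final patch):

    pqsignal(SIGHUP, WalSndSigHupHandler);          /* set flag to re-read config file */
    pqsignal(SIGUSR1, procsignal_sigusr1_handler);  /* shared with normal backends:
                                                     * sets the latch and handles
                                                     * procsignals */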

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication busy-waiting on a lock

2017-05-31 Thread Petr Jelinek
On 31/05/17 09:24, Andres Freund wrote:
> On 2017-05-29 23:49:33 +0200, Petr Jelinek wrote:
>> I am not quite sure I understand (both the vxid suggestion and for the
>> session dying badly). Maybe we can discuss bit more when you get to
>> computer so it's easier for you to expand on what you mean.
> 
> The xid interlock when exporting a snapshot is required because
> otherwise it's not generally guaranteed that all resourced required for
> the snapshot are reserved.  In the logical replication case we could
> guarantee that otherwise, but there'd be weird-ish edge cases when
> erroring out just after exporting a snapshot.
> 
> The problem with using the xid as that interlock is that it requires
> acquiring an xid - which is something we're going to block against when
> building a new catalog snapshot.  Afaict that's not entirely required -
> all that we need to verify is that the snapshot in the source
> transaction is still running.  The easiest way for the importer to check
> that the source is still alive seems to be export the virtual
> transaction id instead of the xid.  Normally we can't store things like
> virtual xids on disk, but that concern isn't valid here because exported
> snapshots are ephemeral, there's also already a precedent in
> predicate.c.
> 
> It seems like it'd be fairly easy to change things around that way, but
> maybe I'm missing something.
> 

Okay, thanks for the explanation. Code-wise it does seem simple enough
indeed. I admit I don't know enough about exported snapshots and snapshot
management in general to be able to answer the question of safety here.
That said, it does seem to me like it should work, as the exported
snapshots are just an on-disk representation of in-memory state that
becomes invalid once the in-memory state does.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication - still unstable after all these months

2017-05-31 Thread Petr Jelinek
On 29/05/17 23:06, Mark Kirkwood wrote:
> On 29/05/17 23:14, Petr Jelinek wrote:
> 
>> On 29/05/17 03:33, Jeff Janes wrote:
>>
>>> What would you want to look at?  Would saving the WAL from the master be
>>> helpful?
>>>
>> Useful info is, logs from provider (mainly the logical decoding logs
>> that mention LSNs), logs from subscriber (the lines about when sync
>> workers finished), contents of the pg_subscription_rel (with srrelid
>> casted to regclass so we know which table is which), and pg_waldump
>> output around the LSNs found in the logs and in the pg_subscription_rel
>> (+ few lines before and some after to get context). It's enough to only
>> care about LSNs for the table(s) that are out of sync.
>>
> 
> I have a run that aborted with failure (accounts table md5 mismatch).
> Petr - would you like to have access to the machine ? If so send me you
> public key and I'll set it up.

Thanks to Mark's offer I was able to study the issue as it happened and
found the cause of this.

The busy loop in apply stops at the point when the worker shmem state
indicates that table synchronization has finished, but that might not be
visible in the next transaction if it takes long to flush the final
commit to disk, so we might ignore a couple of transactions for a given
table in the main apply because we think it's still being synchronized.
This also explains why I could not reproduce it on my testing machine
(fast SSD disk array where flushes were always fast) and why it happens
relatively rarely, because it's one specific commit during the whole
synchronization process that needs to be slow.

So as a solution I changed the busy loop in the apply to wait for the
in-catalog status rather than the in-memory status, to make sure things
are really there and flushed.
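
Roughly, the idea is to poll pg_subscription_rel in a fresh transaction
instead of looking at the worker's shmem state; a simplified sketch (the
expected_state/relid variables, naptime and the wait-event name are
placeholders, see the patch for the real thing):

    for (;;)
    {
        XLogRecPtr  statelsn;
        char        state;

        CHECK_FOR_INTERRUPTS();

        /* Fresh transaction, so we only see committed catalog state. */
        StartTransactionCommand();
        state = GetSubscriptionRelState(MyLogicalRepWorker->subid, relid,
                                        &statelsn, true);
        CommitTransactionCommand();

        if (state == expected_state)
            break;

        /* Wait a bit and recheck; the other worker pokes our latch. */
        (void) WaitLatch(MyLatch,
                         WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
                         naptime, WAIT_EVENT_SOMETHING);
        ResetLatch(MyLatch);
    }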

While working on this I realized that the handover itself is a bit more
complex than necessary (especially for debugging and for other people
trying to understand it), so I made some small changes as part of this
patch so that the sequence of states a table goes through during the
synchronization process is always the same. This might cause an
unnecessary update per table synchronization in some cases, but that
seems like a small enough price to pay for clearer logic. And it also
fixes another potential bug I identified where we might write the wrong
state to the catalog if the main apply crashed while the sync worker was
waiting for a status update.

I've been running tests on this overnight on another machine where I was
able to reproduce the original issue within a few runs (once I found what
causes it), and so far it looks good.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


0001-Improve-handover-logic-between-sync-and-apply-worker.patch
Description: invalid/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Walsender timeouts and large transactions

2017-05-30 Thread Petr Jelinek
On 30/05/17 11:02, Kyotaro HORIGUCHI wrote:
> At Thu, 25 May 2017 17:52:50 +0200, Petr Jelinek 
> <petr.jeli...@2ndquadrant.com> wrote in 
> <e082a56a-fd95-a250-3bae-0fff93832...@2ndquadrant.com>
>> Hi,
>>
>> We have had issue with walsender timeout when used with logical decoding
>> and the transaction is taking long time to be decoded (because it
>> contains many changes)
>>
>> I was looking today at the walsender code and realized that it's because
>> if the network and downstream are fast enough, we'll always take fast
>> path in WalSndWriteData which does not do reply or keepalive processing
>> and is only reached once the transaction has finished by other code. So
>> paradoxically we die of timeout because everything was fast enough to
>> never fall back to slow code path.
>>
>> I propose we only use fast path if the last processed reply is not older
>> than half of walsender timeout, if it is then we'll force the slow code
>> path to process the replies again. This is similar logic that we use to
>> determine whether to send a keepalive message. I also added a
>> CHECK_FOR_INTERRUPTS() call to the fast code path because otherwise
>> walsender might ignore interrupts for too long on large transactions.
>>
>> Thoughts?
> 
> + TimestampTz now = GetCurrentTimestamp();
> 
> I think it is not recommended to read the current time too
> frequently, especially within a loop that hates slowness. (I
> suppose that a loop that can fill up a send queue falls into that

Yeah, that was my main worry about the patch as well, although given
that the loop does tuple processing it might not be very noticeable.

> category.)  If you don't mind a certain amount of additional
> complexity for eliminating the possible slowdown by the check,
> timeout would be usable. Attached patch does almost the same
> thing with your patch but without busy time check.
> 
> What do you think about this?
> 

I think we could do it that way.
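
For the archives, here is a minimal sketch of the sort of cheap gating
I had in mind when worrying about the per-row clock reads (illustration
only, not your attached patch; the helper name and the constant are
made up):

#include "postgres.h"

#include "utils/timestamp.h"

/*
 * Illustrative only: sample the clock every SAMPLE_EVERY calls so the
 * hot per-row path doesn't pay for GetCurrentTimestamp() each time.
 */
#define SAMPLE_EVERY 1000

static TimestampTz
approx_now(void)
{
    static TimestampTz cached = 0;
    static int calls_since_sample = 0;

    if (cached == 0 || ++calls_since_sample >= SAMPLE_EVERY)
    {
        cached = GetCurrentTimestamp();
        calls_since_sample = 0;
    }
    return cached;
}

The obvious downside is that the timeout checks then work with a
timestamp that can be up to SAMPLE_EVERY rows stale, which is why the
half-of-walsender-timeout margin matters.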

> # I saw that SIGQUIT doesn't work for an active publisher, which I
> # think I mentioned in another thread.

Ah, I missed that email I guess, but that's the missing
CHECK_FOR_INTERRUPTS() in the fast path, which btw your updated patch
is missing as well.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication busy-waiting on a lock

2017-05-29 Thread Petr Jelinek
On 29/05/17 21:44, Andres Freund wrote:
> 
> 
> On May 29, 2017 12:41:26 PM PDT, Petr Jelinek <petr.jeli...@2ndquadrant.com> 
> wrote:
>> On 29/05/17 21:28, Andres Freund wrote:
>>>
>>> On May 29, 2017 12:25:35 PM PDT, Petr Jelinek
>> <petr.jeli...@2ndquadrant.com> wrote:
>>>>
>>>> Looking at the code more, the xid is only used as parameter for
>>>> SnapBuildBuildSnapshot() which never does anything with that
>> parameter,
>>>> I wonder if it's really needed then.
>>>
>>> Not at a computer, but by memory that'll trigger the snapshot export
>> routine to include it.  Import in turn requires the xid to check if the
>> source is still alive.  But there's better ways, e.g. using the virtual
>> xactid.
>>>
>>
>> It does, and while that's unfortunate the logical replication does not
>> actually export the snapshots. It uses the USE_SNAPSHOT option where
>> the
>> snapshot is just installed into current transaction but not exported.
>> So
>> not calling the GetTopTransactionId() would solve it at least for that
>> in-core use-case. I don't see any bad side effects from doing so yet,
>> so
>> it might be good enough solution for PG10.
> 
> In the general case you can't do so, because of vacuum and such.  Even for LR 
> we need to make sure the exporting session didn't die badly, deleting the 
> slot.  Hence suggestion to use the virtual xid.
> 

I am not quite sure I understand (both the vxid suggestion and the
part about the session dying badly). Maybe we can discuss it a bit more
when you get to a computer so it's easier for you to expand on what you
mean.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication busy-waiting on a lock

2017-05-29 Thread Petr Jelinek
On 29/05/17 21:28, Andres Freund wrote:
> 
> On May 29, 2017 12:25:35 PM PDT, Petr Jelinek <petr.jeli...@2ndquadrant.com> 
> wrote:
>>
>> Looking at the code more, the xid is only used as parameter for
>> SnapBuildBuildSnapshot() which never does anything with that parameter,
>> I wonder if it's really needed then.
> 
> Not at a computer, but by memory that'll trigger the snapshot export routine 
> to include it.  Import in turn requires the xid to check if the source is 
> still alive.  But there's better ways, e.g. using the virtual xactid.
> 

It does, and while that's unfortunate, logical replication does not
actually export the snapshots. It uses the USE_SNAPSHOT option, where
the snapshot is just installed into the current transaction but not
exported. So not calling GetTopTransactionId() would solve it at least
for that in-core use-case. I don't see any bad side effects from doing
so yet, so it might be a good enough solution for PG10.
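
To sketch what I mean (illustration only, not the actual snapbuild.c
code; the wrapper name and the will_export flag are made up, and it
assumes it lives in snapbuild.c where SnapBuildBuildSnapshot() and the
usual includes are visible):

/*
 * Hypothetical sketch: only force an xid assignment when the snapshot
 * is actually going to be exported, since the exported snapshot is
 * what carries the creator's xid for the importer's liveness check.
 */
static Snapshot
build_initial_snapshot(SnapBuild *builder, bool will_export)
{
    TransactionId xid = InvalidTransactionId;

    if (will_export)
        xid = GetTopTransactionId();    /* forces xid assignment */

    return SnapBuildBuildSnapshot(builder, xid);
}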

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication busy-waiting on a lock

2017-05-29 Thread Petr Jelinek
On 29/05/17 21:23, Andres Freund wrote:
> 
> 
> On May 29, 2017 12:21:50 PM PDT, Petr Jelinek <petr.jeli...@2ndquadrant.com> 
> wrote:
>> On 29/05/17 20:59, Andres Freund wrote:
>>>
>>>
>>> On May 29, 2017 11:58:05 AM PDT, Petr Jelinek
>> <petr.jeli...@2ndquadrant.com> wrote:
>>>> On 27/05/17 17:17, Andres Freund wrote:
>>>>>
>>>>>
>>>>> On May 27, 2017 9:48:22 AM EDT, Petr Jelinek
>>>> <petr.jeli...@2ndquadrant.com> wrote:
>>>>>> Actually, I guess it's the pid 47457 (COPY process) who is
>> actually
>>>>>> running the xid 73322726. In that case that's the same thing
>>>> Masahiko
>>>>>> Sawada reported [1]. Which basically is result of snapshot builder
>>>>>> waiting for transaction to finish, that's normal if there is a
>> long
>>>>>> transaction running when the snapshot is being created (and the
>> COPY
>>>> is
>>>>>> a long transaction).
>>>>>
>>>>> Hm.  I suspect the issue is that the exported snapshot needs an xid
>>>> for some crosscheck, and that's what we're waiting for.  Could you
>>>> check what happens if you don't assign one and just content the
>> error
>>>> checks out?   Not at my computer, just theorizing.
>>>>>
>>>>
>>>> I don't think that's it, in my opinion it's the parallelization of
>>>> table
>>>> data copy where we create snapshot for one process but then the next
>>>> one
>>>> has to wait for the first one to finish. Before we fixed the
>>>> snapshotting, the second one would just use the ondisk snapshot so
>> it
>>>> would work fine (except the snapshot was corrupted of course). I
>> wonder
>>>> if we could somehow give it a hint to ignore the read-only txes, but
>>>> then we have no way to enforce the txes to stay read-only so it does
>>>> not
>>>> seem safe.
>>>
>>> Read-only txs have no xid ...
>>>
>>
>> That's what I mean by hinting, normally they don't but building initial
>> snapshot in snapshot builder calls GetTopTransactionId() (see
>> SnapBuildInitialSnapshot()) which will assign it xid.
> 
> That's precisely what I pointed out a few emails above, and what I suggest 
> changing.
> 

Ah, I didn't realize that's what you meant. I can try playing with it.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication busy-waiting on a lock

2017-05-29 Thread Petr Jelinek
On 29/05/17 21:21, Petr Jelinek wrote:
> On 29/05/17 20:59, Andres Freund wrote:
>>
>>
>> On May 29, 2017 11:58:05 AM PDT, Petr Jelinek <petr.jeli...@2ndquadrant.com> 
>> wrote:
>>> On 27/05/17 17:17, Andres Freund wrote:
>>>>
>>>>
>>>> On May 27, 2017 9:48:22 AM EDT, Petr Jelinek
>>> <petr.jeli...@2ndquadrant.com> wrote:
>>>>> Actually, I guess it's the pid 47457 (COPY process) who is actually
>>>>> running the xid 73322726. In that case that's the same thing
>>> Masahiko
>>>>> Sawada reported [1]. Which basically is result of snapshot builder
>>>>> waiting for transaction to finish, that's normal if there is a long
>>>>> transaction running when the snapshot is being created (and the COPY
>>> is
>>>>> a long transaction).
>>>>
>>>> Hm.  I suspect the issue is that the exported snapshot needs an xid
>>> for some crosscheck, and that's what we're waiting for.  Could you
>>> check what happens if you don't assign one and just content the error
>>> checks out?   Not at my computer, just theorizing.
>>>>
>>>
>>> I don't think that's it, in my opinion it's the parallelization of
>>> table
>>> data copy where we create snapshot for one process but then the next
>>> one
>>> has to wait for the first one to finish. Before we fixed the
>>> snapshotting, the second one would just use the ondisk snapshot so it
>>> would work fine (except the snapshot was corrupted of course). I wonder
>>> if we could somehow give it a hint to ignore the read-only txes, but
>>> then we have no way to enforce the txes to stay read-only so it does
>>> not
>>> seem safe.
>>
>> Read-only txs have no xid ...
>>
> 
> That's what I mean by hinting, normally they don't but building initial
> snapshot in snapshot builder calls GetTopTransactionId() (see
> SnapBuildInitialSnapshot()) which will assign it xid.
> 

Looking at the code more, the xid is only used as a parameter for
SnapBuildBuildSnapshot(), which never does anything with that
parameter, so I wonder if it's really needed then.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication busy-waiting on a lock

2017-05-29 Thread Petr Jelinek
On 29/05/17 20:59, Andres Freund wrote:
> 
> 
> On May 29, 2017 11:58:05 AM PDT, Petr Jelinek <petr.jeli...@2ndquadrant.com> 
> wrote:
>> On 27/05/17 17:17, Andres Freund wrote:
>>>
>>>
>>> On May 27, 2017 9:48:22 AM EDT, Petr Jelinek
>> <petr.jeli...@2ndquadrant.com> wrote:
>>>> Actually, I guess it's the pid 47457 (COPY process) who is actually
>>>> running the xid 73322726. In that case that's the same thing
>> Masahiko
>>>> Sawada reported [1]. Which basically is result of snapshot builder
>>>> waiting for transaction to finish, that's normal if there is a long
>>>> transaction running when the snapshot is being created (and the COPY
>> is
>>>> a long transaction).
>>>
>>> Hm.  I suspect the issue is that the exported snapshot needs an xid
>> for some crosscheck, and that's what we're waiting for.  Could you
>> check what happens if you don't assign one and just content the error
>> checks out?   Not at my computer, just theorizing.
>>>
>>
>> I don't think that's it, in my opinion it's the parallelization of
>> table
>> data copy where we create snapshot for one process but then the next
>> one
>> has to wait for the first one to finish. Before we fixed the
>> snapshotting, the second one would just use the ondisk snapshot so it
>> would work fine (except the snapshot was corrupted of course). I wonder
>> if we could somehow give it a hint to ignore the read-only txes, but
>> then we have no way to enforce the txes to stay read-only so it does
>> not
>> seem safe.
> 
> Read-only txs have no xid ...
> 

That's what I mean by hinting; normally they don't, but building the
initial snapshot in the snapshot builder calls GetTopTransactionId()
(see SnapBuildInitialSnapshot()), which will assign it an xid.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication busy-waiting on a lock

2017-05-29 Thread Petr Jelinek
On 27/05/17 17:17, Andres Freund wrote:
> 
> 
> On May 27, 2017 9:48:22 AM EDT, Petr Jelinek <petr.jeli...@2ndquadrant.com> 
> wrote:
>> Actually, I guess it's the pid 47457 (COPY process) who is actually
>> running the xid 73322726. In that case that's the same thing Masahiko
>> Sawada reported [1]. Which basically is result of snapshot builder
>> waiting for transaction to finish, that's normal if there is a long
>> transaction running when the snapshot is being created (and the COPY is
>> a long transaction).
> 
> Hm.  I suspect the issue is that the exported snapshot needs an xid for some 
> crosscheck, and that's what we're waiting for.  Could you check what happens 
> if you don't assign one and just content the error checks out?   Not at my 
> computer, just theorizing.
> 

I don't think that's it; in my opinion it's the parallelization of the
table data copy, where we create a snapshot for one process but then
the next one has to wait for the first one to finish. Before we fixed
the snapshotting, the second one would just use the on-disk snapshot so
it would work fine (except that the snapshot was corrupted, of course).
I wonder if we could somehow give it a hint to ignore the read-only
txes, but then we have no way to enforce that the txes stay read-only,
so it does not seem safe.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

