Re: [HACKERS] More replication race conditions

2017-09-01 Thread Michael Paquier
On Sat, Sep 2, 2017 at 12:03 AM, Alvaro Herrera  wrote:
> Michael Paquier wrote:
>> On Mon, Aug 28, 2017 at 8:25 AM, Michael Paquier
>>  wrote:
>> > Today's run has finished with the same failure:
>> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dangomushi=2017-08-27%2018%3A00%3A13
>> > Attached is a patch to make this code path wait that the transaction
>> > has been replayed. We could use as well synchronous_commit = apply,
>> > but I prefer the solution of this patch with a wait query.
>>
>> I have added an open item for this issue for PG10. The 2PC tests in
>> src/test/recovery are new in PG10, introduced by 3082098.
>
> Thanks, pushed.

Thanks, the error has now not happened for 5 runs. Let's see if it
ever shows up again.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] More replication race conditions

2017-09-01 Thread Alvaro Herrera
Michael Paquier wrote:
> On Mon, Aug 28, 2017 at 8:25 AM, Michael Paquier
>  wrote:
> > Today's run has finished with the same failure:
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dangomushi=2017-08-27%2018%3A00%3A13
> > Attached is a patch to make this code path wait that the transaction
> > has been replayed. We could use as well synchronous_commit = apply,
> > but I prefer the solution of this patch with a wait query.
> 
> I have added an open item for this issue for PG10. The 2PC tests in
> src/test/recovery are new in PG10, introduced by 3082098.

Thanks, pushed.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] More replication race conditions

2017-09-01 Thread Simon Riggs
On 31 August 2017 at 12:54, Simon Riggs  wrote:

>> The above-described topic is currently a PostgreSQL 10 open item.  Simon,
>> since you committed the patch believed to have created it, you own this open
>> item.  If some other commit is more relevant or if this does not belong as a
>> v10 open item, please let us know.  Otherwise, please observe the policy on
>> open item ownership[1] and send a status update within three calendar days of
>> this message.  Include a date for your subsequent status update.  Testers may
>> discover new open items at any time, and I want to plan to get them all fixed
>> well in advance of shipping v10.  Consequently, I will appreciate your 
>> efforts
>> toward speedy resolution.  Thanks.
>>
>> [1] 
>> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
>
> I acknowledge receipt of the reminder and will fix by end of day tomorrow.

Applied. Thanks for the patch, Petr.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] More replication race conditions

2017-08-31 Thread Simon Riggs
On 27 August 2017 at 03:32, Noah Misch  wrote:
> On Fri, Aug 25, 2017 at 12:09:00PM +0200, Petr Jelinek wrote:
>> On 24/08/17 19:54, Tom Lane wrote:
>> > sungazer just failed with
>> >
>> > pg_recvlogical exited with code '256', stdout '' and stderr 
>> > 'pg_recvlogical: could not send replication command "START_REPLICATION 
>> > SLOT "test_slot" LOGICAL 0/0 ("include-xids" '0', "skip-empty-xacts" 
>> > '1')": ERROR:  replication slot "test_slot" is active for PID 8913148
>> > pg_recvlogical: disconnected
>> > ' at 
>> > /home/nm/farm/gcc64/HEAD/pgsql.build/src/test/recovery/../../../src/test/perl/PostgresNode.pm
>> >  line 1657.
>> >
>> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer=2017-08-24%2015%3A16%3A10
>> >
>> > Looks like we're still not there on preventing replication startup
>> > race conditions.
>>
>> Hmm, that looks like "by design" behavior. Slot acquiring will throw
>> error if the slot is already used by somebody else (slots use their own
>> locking mechanism that does not wait). In this case it seems the
>> walsender which was using slot for previous previous step didn't finish
>> releasing the slot by the time we start new command. We can work around
>> this by changing the test to wait perhaps.
>
> [Action required within three days.  This is a generic notification.]
>
> The above-described topic is currently a PostgreSQL 10 open item.  Simon,
> since you committed the patch believed to have created it, you own this open
> item.  If some other commit is more relevant or if this does not belong as a
> v10 open item, please let us know.  Otherwise, please observe the policy on
> open item ownership[1] and send a status update within three calendar days of
> this message.  Include a date for your subsequent status update.  Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping v10.  Consequently, I will appreciate your efforts
> toward speedy resolution.  Thanks.
>
> [1] 
> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

I acknowledge receipt of the reminder and will fix by end of day tomorrow.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] More replication race conditions

2017-08-30 Thread Noah Misch
On Tue, Aug 29, 2017 at 08:44:42PM +0900, Michael Paquier wrote:
> On Mon, Aug 28, 2017 at 8:25 AM, Michael Paquier
>  wrote:
> > Today's run has finished with the same failure:
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dangomushi=2017-08-27%2018%3A00%3A13
> > Attached is a patch to make this code path wait that the transaction
> > has been replayed. We could use as well synchronous_commit = apply,
> > but I prefer the solution of this patch with a wait query.
> 
> I have added an open item for this issue for PG10. The 2PC tests in
> src/test/recovery are new in PG10, introduced by 3082098.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Álvaro,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] 
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] More replication race conditions

2017-08-30 Thread Noah Misch
On Sun, Aug 27, 2017 at 02:32:49AM +, Noah Misch wrote:
> On Fri, Aug 25, 2017 at 12:09:00PM +0200, Petr Jelinek wrote:
> > On 24/08/17 19:54, Tom Lane wrote:
> > > sungazer just failed with
> > > 
> > > pg_recvlogical exited with code '256', stdout '' and stderr 
> > > 'pg_recvlogical: could not send replication command "START_REPLICATION 
> > > SLOT "test_slot" LOGICAL 0/0 ("include-xids" '0', "skip-empty-xacts" 
> > > '1')": ERROR:  replication slot "test_slot" is active for PID 8913148
> > > pg_recvlogical: disconnected
> > > ' at 
> > > /home/nm/farm/gcc64/HEAD/pgsql.build/src/test/recovery/../../../src/test/perl/PostgresNode.pm
> > >  line 1657.
> > > 
> > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer=2017-08-24%2015%3A16%3A10
> > > 
> > > Looks like we're still not there on preventing replication startup
> > > race conditions.
> > 
> > Hmm, that looks like "by design" behavior. Slot acquiring will throw
> > error if the slot is already used by somebody else (slots use their own
> > locking mechanism that does not wait). In this case it seems the
> > walsender which was using slot for previous previous step didn't finish
> > releasing the slot by the time we start new command. We can work around
> > this by changing the test to wait perhaps.
> 
> [Action required within three days.  This is a generic notification.]
> 
> The above-described topic is currently a PostgreSQL 10 open item.  Simon,
> since you committed the patch believed to have created it, you own this open
> item.  If some other commit is more relevant or if this does not belong as a
> v10 open item, please let us know.  Otherwise, please observe the policy on
> open item ownership[1] and send a status update within three calendar days of
> this message.  Include a date for your subsequent status update.  Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping v10.  Consequently, I will appreciate your efforts
> toward speedy resolution.  Thanks.
> 
> [1] 
> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

This PostgreSQL 10 open item is past due for your status update.  Kindly send
a status update within 24 hours, and include a date for your subsequent status
update.  Refer to the policy on open item ownership:
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] More replication race conditions

2017-08-29 Thread Michael Paquier
On Mon, Aug 28, 2017 at 8:25 AM, Michael Paquier
 wrote:
> Today's run has finished with the same failure:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dangomushi=2017-08-27%2018%3A00%3A13
> Attached is a patch to make this code path wait that the transaction
> has been replayed. We could use as well synchronous_commit = apply,
> but I prefer the solution of this patch with a wait query.

I have added an open item for this issue for PG10. The 2PC tests in
src/test/recovery are new in PG10, introduced by 3082098.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] More replication race conditions

2017-08-27 Thread Petr Jelinek
On 28/08/17 01:36, Michael Paquier wrote:
> On Sun, Aug 27, 2017 at 6:32 PM, Petr Jelinek
>  wrote:
>> Attached should fix this.
> 
> +$node_master->poll_query_until('postgres',
> +"SELECT EXISTS (SELECT 1 FROM pg_replication_slots WHERE slot_name =
> 'test_slot' AND active_pid IS NULL)"
> +)
> +  or die "slot never became inactive";
> +
>  $stdout_recv = $node_master->pg_recvlogical_upto(
> I am wondering if a similar check should actually go into
> pg_recvlogical_upto() instead. I tend to think that we've learned
> things the hard way with 3043c1dd and such.
> 

Putting it there can lead to tests that take forever to finish if
written incorrectly. This way they at least just fail.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] More replication race conditions

2017-08-27 Thread Michael Paquier
On Mon, Aug 28, 2017 at 8:33 AM, Tom Lane  wrote:
> Michael Paquier  writes:
>> Attached is a patch to make this code path wait that the transaction
>> has been replayed. We could use as well synchronous_commit = apply,
>> but I prefer the solution of this patch with a wait query.
>
> Petr proposed a different patch to fix the same problem at
> https://www.postgresql.org/message-id/1636c52e-c144-993a-6665-9358f322deda%402ndquadrant.com
>
> Which one is better?

Petr's patch is here to fix the race condition in test 006 for logical
decoding. My patch involves the failures that dangomushi sees since
yesterday with 2PC test 009. (Note: It took me a while to put this
animal in an environment with a more stable connection, so now I hope
that we'll get again daily reports)
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] More replication race conditions

2017-08-27 Thread Michael Paquier
On Sun, Aug 27, 2017 at 6:32 PM, Petr Jelinek
 wrote:
> Attached should fix this.

+$node_master->poll_query_until('postgres',
+"SELECT EXISTS (SELECT 1 FROM pg_replication_slots WHERE slot_name =
'test_slot' AND active_pid IS NULL)"
+)
+  or die "slot never became inactive";
+
 $stdout_recv = $node_master->pg_recvlogical_upto(
I am wondering if a similar check should actually go into
pg_recvlogical_upto() instead. I tend to think that we've learned
things the hard way with 3043c1dd and such.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] More replication race conditions

2017-08-27 Thread Tom Lane
Michael Paquier  writes:
> Attached is a patch to make this code path wait that the transaction
> has been replayed. We could use as well synchronous_commit = apply,
> but I prefer the solution of this patch with a wait query.

Petr proposed a different patch to fix the same problem at
https://www.postgresql.org/message-id/1636c52e-c144-993a-6665-9358f322deda%402ndquadrant.com

Which one is better?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] More replication race conditions

2017-08-27 Thread Michael Paquier
On Sun, Aug 27, 2017 at 3:34 PM, Michael Paquier
 wrote:
> On Sun, Aug 27, 2017 at 12:03 PM, Tom Lane  wrote:
>> contains exactly no means of ensuring that the master's transaction has
>> been replayed on the standby before we check for its results.  It's not
>> real clear why it seems to work 99.99% of the time, because, well, there
>> isn't any frickin' interlock there ...
>
> I have noticed this one this morning, and I am planning to address it
> with a proper patch soonishly. (I am still fighting a bit to get
> dangomushi in a more stable stable, and things run slow on it, so it
> is good at catching race conditions of this kind).

Today's run has finished with the same failure:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dangomushi=2017-08-27%2018%3A00%3A13
Attached is a patch to make this code path wait that the transaction
has been replayed. We could use as well synchronous_commit = apply,
but I prefer the solution of this patch with a wait query.
-- 
Michael


tap-2pc-fix.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] More replication race conditions

2017-08-27 Thread Petr Jelinek
On 27/08/17 04:32, Noah Misch wrote:
> On Fri, Aug 25, 2017 at 12:09:00PM +0200, Petr Jelinek wrote:
>> On 24/08/17 19:54, Tom Lane wrote:
>>> sungazer just failed with
>>>
>>> pg_recvlogical exited with code '256', stdout '' and stderr 
>>> 'pg_recvlogical: could not send replication command "START_REPLICATION SLOT 
>>> "test_slot" LOGICAL 0/0 ("include-xids" '0', "skip-empty-xacts" '1')": 
>>> ERROR:  replication slot "test_slot" is active for PID 8913148
>>> pg_recvlogical: disconnected
>>> ' at 
>>> /home/nm/farm/gcc64/HEAD/pgsql.build/src/test/recovery/../../../src/test/perl/PostgresNode.pm
>>>  line 1657.
>>>
>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer=2017-08-24%2015%3A16%3A10
>>>
>>> Looks like we're still not there on preventing replication startup
>>> race conditions.
>>
>> Hmm, that looks like "by design" behavior. Slot acquiring will throw
>> error if the slot is already used by somebody else (slots use their own
>> locking mechanism that does not wait). In this case it seems the
>> walsender which was using slot for previous previous step didn't finish
>> releasing the slot by the time we start new command. We can work around
>> this by changing the test to wait perhaps.
> 
> [Action required within three days.  This is a generic notification.]
> 
> The above-described topic is currently a PostgreSQL 10 open item.  Simon,
> since you committed the patch believed to have created it, you own this open
> item.  If some other commit is more relevant or if this does not belong as a
> v10 open item, please let us know.  Otherwise, please observe the policy on
> open item ownership[1] and send a status update within three calendar days of
> this message.  Include a date for your subsequent status update.  Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping v10.  Consequently, I will appreciate your efforts
> toward speedy resolution.  Thanks.
> 
> [1] 
> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
> 

Attached should fix this.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
>From d66caa3532558828ecb89b4f153cc1f29332b918 Mon Sep 17 00:00:00 2001
From: Petr Jelinek 
Date: Sun, 27 Aug 2017 11:28:49 +0200
Subject: [PATCH] Fix race condition in logical decoding tap test

---
 src/test/recovery/t/006_logical_decoding.pl | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/test/recovery/t/006_logical_decoding.pl b/src/test/recovery/t/006_logical_decoding.pl
index 4a90e9a..8b35bc8 100644
--- a/src/test/recovery/t/006_logical_decoding.pl
+++ b/src/test/recovery/t/006_logical_decoding.pl
@@ -78,6 +78,11 @@ chomp($stdout_recv);
 is($stdout_recv, $expected,
 	'got same expected output from pg_recvlogical decoding session');
 
+$node_master->poll_query_until('postgres',
+"SELECT EXISTS (SELECT 1 FROM pg_replication_slots WHERE slot_name = 'test_slot' AND active_pid IS NULL)"
+)
+  or die "slot never became inactive";
+
 $stdout_recv = $node_master->pg_recvlogical_upto(
 	'postgres', 'test_slot', $endpos, 10,
 	'include-xids' => '0',
-- 
2.7.4


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] More replication race conditions

2017-08-27 Thread Michael Paquier
On Sun, Aug 27, 2017 at 12:03 PM, Tom Lane  wrote:
> And *another* replication test race condition just now:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dangomushi=2017-08-26%2019%3A37%3A08
>
> As best I can interpret this, it's pointing out that this bit in
> src/test/recovery/t/009_twophase.pl:
>
> $cur_master->psql(
> 'postgres', "
> BEGIN;
> CREATE TABLE t_009_tbl2 (id int, msg text);
> SAVEPOINT s1;
> INSERT INTO t_009_tbl2 VALUES (27, 'issued to ${cur_master_name}');
> PREPARE TRANSACTION 'xact_009_13';
> -- checkpoint will issue XLOG_STANDBY_LOCK that can conflict with lock
> -- held by 'create table' statement
> CHECKPOINT;
> COMMIT PREPARED 'xact_009_13';");
>
> $cur_standby->psql(
> 'postgres',
> "SELECT count(*) FROM t_009_tbl2",
> stdout => \$psql_out);
> is($psql_out, '1', "Replay prepared transaction with DDL");
>
> contains exactly no means of ensuring that the master's transaction has
> been replayed on the standby before we check for its results.  It's not
> real clear why it seems to work 99.99% of the time, because, well, there
> isn't any frickin' interlock there ...

I have noticed this one this morning, and I am planning to address it
with a proper patch soonishly. (I am still fighting a bit to get
dangomushi in a more stable stable, and things run slow on it, so it
is good at catching race conditions of this kind).
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] More replication race conditions

2017-08-26 Thread Tom Lane
And *another* replication test race condition just now:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dangomushi=2017-08-26%2019%3A37%3A08

As best I can interpret this, it's pointing out that this bit in
src/test/recovery/t/009_twophase.pl:

$cur_master->psql(
'postgres', "
BEGIN;
CREATE TABLE t_009_tbl2 (id int, msg text);
SAVEPOINT s1;
INSERT INTO t_009_tbl2 VALUES (27, 'issued to ${cur_master_name}');
PREPARE TRANSACTION 'xact_009_13';
-- checkpoint will issue XLOG_STANDBY_LOCK that can conflict with lock
-- held by 'create table' statement
CHECKPOINT;
COMMIT PREPARED 'xact_009_13';");

$cur_standby->psql(
'postgres',
"SELECT count(*) FROM t_009_tbl2",
stdout => \$psql_out);
is($psql_out, '1', "Replay prepared transaction with DDL");

contains exactly no means of ensuring that the master's transaction has
been replayed on the standby before we check for its results.  It's not
real clear why it seems to work 99.99% of the time, because, well, there
isn't any frickin' interlock there ...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] More replication race conditions

2017-08-26 Thread Noah Misch
On Fri, Aug 25, 2017 at 12:09:00PM +0200, Petr Jelinek wrote:
> On 24/08/17 19:54, Tom Lane wrote:
> > sungazer just failed with
> > 
> > pg_recvlogical exited with code '256', stdout '' and stderr 
> > 'pg_recvlogical: could not send replication command "START_REPLICATION SLOT 
> > "test_slot" LOGICAL 0/0 ("include-xids" '0', "skip-empty-xacts" '1')": 
> > ERROR:  replication slot "test_slot" is active for PID 8913148
> > pg_recvlogical: disconnected
> > ' at 
> > /home/nm/farm/gcc64/HEAD/pgsql.build/src/test/recovery/../../../src/test/perl/PostgresNode.pm
> >  line 1657.
> > 
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer=2017-08-24%2015%3A16%3A10
> > 
> > Looks like we're still not there on preventing replication startup
> > race conditions.
> 
> Hmm, that looks like "by design" behavior. Slot acquiring will throw
> error if the slot is already used by somebody else (slots use their own
> locking mechanism that does not wait). In this case it seems the
> walsender which was using slot for previous previous step didn't finish
> releasing the slot by the time we start new command. We can work around
> this by changing the test to wait perhaps.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Simon,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] 
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] More replication race conditions

2017-08-25 Thread Petr Jelinek
On 24/08/17 19:54, Tom Lane wrote:
> sungazer just failed with
> 
> pg_recvlogical exited with code '256', stdout '' and stderr 'pg_recvlogical: 
> could not send replication command "START_REPLICATION SLOT "test_slot" 
> LOGICAL 0/0 ("include-xids" '0', "skip-empty-xacts" '1')": ERROR:  
> replication slot "test_slot" is active for PID 8913148
> pg_recvlogical: disconnected
> ' at 
> /home/nm/farm/gcc64/HEAD/pgsql.build/src/test/recovery/../../../src/test/perl/PostgresNode.pm
>  line 1657.
> 
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer=2017-08-24%2015%3A16%3A10
> 
> Looks like we're still not there on preventing replication startup
> race conditions.

Hmm, that looks like "by design" behavior. Slot acquiring will throw
error if the slot is already used by somebody else (slots use their own
locking mechanism that does not wait). In this case it seems the
walsender which was using slot for previous previous step didn't finish
releasing the slot by the time we start new command. We can work around
this by changing the test to wait perhaps.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] More replication race conditions

2017-08-24 Thread Tom Lane
sungazer just failed with

pg_recvlogical exited with code '256', stdout '' and stderr 'pg_recvlogical: 
could not send replication command "START_REPLICATION SLOT "test_slot" LOGICAL 
0/0 ("include-xids" '0', "skip-empty-xacts" '1')": ERROR:  replication slot 
"test_slot" is active for PID 8913148
pg_recvlogical: disconnected
' at 
/home/nm/farm/gcc64/HEAD/pgsql.build/src/test/recovery/../../../src/test/perl/PostgresNode.pm
 line 1657.

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer=2017-08-24%2015%3A16%3A10

Looks like we're still not there on preventing replication startup
race conditions.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers