On 05/06/2017 08:54 PM, Andrew Dunstan wrote:
>
> On 05/06/2017 07:41 PM, Craig Ringer wrote:
>>
>> On 7 May 2017 4:24 am, "Andrew Dunstan"
>> <andrew.duns...@2ndquadrant.com> wrote:
>>
>>
>>     I have been working on enabling the remaining TAP tests on MSVC
>>     build in the buildfarm client, but I have come across an odd
>>     problem. The bin tests all run fine, but the recover tests crash
>>     and in such a way as to crash the buildfarm client itself and
>>     require some manual cleanup. This happens at some stage after the
>>     tests have run (the final "ok" is output) but before the END
>>     handler in PostgresNode.pm (I put some traces in there to see if
>>     I could narrow down where there were problems).
>>
>>     The symptom is that this appears at the end of the output when the
>>     client calls "vcregress.pl taptest src/test/recover":
>>
>>         Terminating on signal SIGBREAK(21)
>>         Terminating on signal SIGBREAK(21)
>>         Terminate batch job (Y/N)?
>>
>>     And at that point there is nothing at all apparently running,
>>     according to Sysinternals Process Explorer, including the
>>     buildfarm client.
>>
>>     It's 100% repeatable on bowerbird, and I'm a bit puzzled about how
>>     to fix it.
>>
>>
>>     Anyone have any clues?
>>
>>
>> That looks like we've upset CMD.exe itself. I'm not sure how ...
>> leaking a signal to the parent proc?
>>
>> I suspect this could be something to do with console process groups.
>>
>> Bowerbird is win8. So this isn't going to be related to the support
>> for ANSI escapes added in win10.
>>
>> A search for the error turns up a complaint about IPC::Run as the
>> first hit. Probably not coincidence.
>>
>>
>> http://stackoverflow.com/q/40924750
>>
>> See this bug
>>
>> https://rt.cpan.org/Public/Bug/Display.html?id=101093
>>
>>
>>
>
>
> Actually, it's Win10, looks like I forgot to update the personality, my bad.
>
> I had a feeling it was probably something to do with timeout. That RT
> ticket looks like it's on the money.
>

(After extensive trial and error) It turns out it's not quite that; it's
the kill_kill stuff. I think for now we should just disable it on the
platform. That means not running tests 7 and 8 of the logical_decoding
tests and all of the crash_recovery test. Test::More has nice facilities
for skipping tests cleanly. See attached patch.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From f3ffdad568e9fbce6b8cc3c6ffc4490842b1b5fb Mon Sep 17 00:00:00 2001
From: Andrew Dunstan <and...@dunslane.net>
Date: Tue, 9 May 2017 13:03:41 -0400
Subject: [PATCH] Avoid tests which crash the calling process on Windows

Certain recovery tests use the Perl IPC::Run module's start/kill_kill
method of processing. On at least some versions of perl this causes the
whole process and its caller to crash. If we ever find a better way of
doing these tests they can be re-enabled.
---
 src/test/recovery/t/006_logical_decoding.pl | 22 +++++++++++++++-------
 src/test/recovery/t/011_crash_recovery.pl   | 12 +++++++++++-
 2 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/src/test/recovery/t/006_logical_decoding.pl b/src/test/recovery/t/006_logical_decoding.pl
index bf9b50a..095cfa8 100644
--- a/src/test/recovery/t/006_logical_decoding.pl
+++ b/src/test/recovery/t/006_logical_decoding.pl
@@ -8,6 +8,7 @@ use warnings;
 use PostgresNode;
 use TestLib;
 use Test::More tests => 16;
+use Config;
 
 # Initialize master node
 my $node_master = get_new_node('master');
@@ -72,13 +73,20 @@ is($node_master->psql('otherdb', "SELECT location FROM pg_logical_slot_peek_chan
 $node_master->safe_psql('otherdb', qq[SELECT pg_create_logical_replication_slot('otherdb_slot', 'test_decoding');]);
 
 # make sure you can't drop a slot while active
-my $pg_recvlogical = IPC::Run::start(['pg_recvlogical', '-d', $node_master->connstr('otherdb'), '-S', 'otherdb_slot', '-f', '-', '--start']);
-$node_master->poll_query_until('otherdb', "SELECT EXISTS (SELECT 1 FROM pg_replication_slots WHERE slot_name = 'otherdb_slot' AND active_pid IS NOT NULL)");
-is($node_master->psql('postgres', 'DROP DATABASE otherdb'), 3,
-    'dropping a DB with inactive logical slots fails');
-$pg_recvlogical->kill_kill;
-is($node_master->slot('otherdb_slot')->{'slot_name'}, undef,
-    'logical slot still exists');
+#
+SKIP:
+{
+    # some Windows Perls at least don't like IPC::Run's start/kill_kill regime.
+    skip "Test fails on Windows perl", 2 if $Config{osname} eq 'MSWin32';
+
+    my $pg_recvlogical = IPC::Run::start(['pg_recvlogical', '-d', $node_master->connstr('otherdb'), '-S', 'otherdb_slot', '-f', '-', '--start']);
+    $node_master->poll_query_until('otherdb', "SELECT EXISTS (SELECT 1 FROM pg_replication_slots WHERE slot_name = 'otherdb_slot' AND active_pid IS NOT NULL)");
+    is($node_master->psql('postgres', 'DROP DATABASE otherdb'), 3,
+       'dropping a DB with inactive logical slots fails');
+    $pg_recvlogical->kill_kill;
+    is($node_master->slot('otherdb_slot')->{'slot_name'}, undef,
+       'logical slot still exists');
+}
 
 $node_master->poll_query_until('otherdb', "SELECT EXISTS (SELECT 1 FROM pg_replication_slots WHERE slot_name = 'otherdb_slot' AND active_pid IS NULL)");
 is($node_master->psql('postgres', 'DROP DATABASE otherdb'), 0,
diff --git a/src/test/recovery/t/011_crash_recovery.pl b/src/test/recovery/t/011_crash_recovery.pl
index 3c3718e..8d8ae03 100644
--- a/src/test/recovery/t/011_crash_recovery.pl
+++ b/src/test/recovery/t/011_crash_recovery.pl
@@ -5,7 +5,17 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 3;
+use Test::More;
+use Config;
+if ($Config{osname} eq 'MSWin32')
+{
+    # some Windows Perls at least don't like IPC::Run's start/kill_kill regime.
+    plan skip_all => "Test fails on Windows perl";
+}
+else
+{
+    plan tests => 3;
+}
 
 my $node = get_new_node('master');
 $node->init(allows_streaming => 1);
-- 
2.9.3
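
For anyone who wants to try the block-level skipping idiom from the patch without a PostgreSQL tree, here is a minimal standalone sketch. It is an illustration and not part of the patch: the helper kill_kill_skip_reason and the placeholder ok() tests are hypothetical, and it assumes only the core Config and Test::More modules.

```perl
use strict;
use warnings;
use Config;
use Test::More tests => 3;

# Hypothetical helper (not in the patch): returns a skip reason on a
# native Windows Perl and undef on every other platform.
sub kill_kill_skip_reason
{
    my ($osname) = @_;
    return $osname eq 'MSWin32' ? 'Test fails on Windows perl' : undef;
}

my $reason = kill_kill_skip_reason($Config{osname});

ok(1, 'placeholder test that is safe on every platform');

SKIP:
{
    # skip() reports the next two tests as skipped and leaves the block,
    # so the planned count of 3 still adds up on Windows.
    skip $reason, 2 if defined $reason;

    ok(1, 'placeholder for a test that needs start/kill_kill');
    ok(1, 'placeholder for a second such test');
}
```

Run under prove or plain perl, this should report the last two tests as skipped on a native Windows Perl and run all three elsewhere. When every test in a file is affected, as with the crash_recovery test, declaring plan skip_all before any test runs is the simpler choice, which is what the second hunk of the patch does.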