Re: [HACKERS] [bug fix] "pg_ctl stop" times out when it should respond quickly
On Mon, Feb 17, 2014 at 11:29 AM, Alvaro Herrera wrote: > The pg_regress part is ugly. However, pg_regress is doing something > unusual when starting postmaster itself, so the ugly coding to stop it > seems to match. If we wanted to avoid the ugliness here, the right fix > would be to use pg_ctl to start postmaster as well as to stop it. I wonder if this would change the behavior in cases where we hit ^C during the regression tests. Right now I think that kills the postmaster as well as pg_regress, but if we used pg_ctl, it might not, because pg_regress uses fork()+exec(), but pg_ctl uses system() to launch a shell which is in turn instructed to background the postmaster. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] "pg_ctl stop" times out when it should respond quickly
On Tue, Feb 18, 2014 at 1:29 AM, Alvaro Herrera wrote: > MauMau escribió: > The pg_regress part is ugly. However, pg_regress is doing something > unusual when starting postmaster itself, so the ugly coding to stop it > seems to match. If we wanted to avoid the ugliness here, the right fix > would be to use pg_ctl to start postmaster as well as to stop it. But > that'd come at a price, because we would need more ugly code to figure > out postmaster's PID. All in all, the compromise proposed by this patch > seems acceptable. If we really wanted to make all this real pretty, we > could provide a "libpg_ctl" library to start and stop postmaster, as > well as query the PID. Probably not worth the trouble. This might not be worth the trouble for this bug, but actually it could be useful to many third-part tools and extensions to have a common and generic way to do things. I have seen many utilities using a copy/paste of pg_ctl functions and still maintain some of them... Regards, -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] "pg_ctl stop" times out when it should respond quickly
MauMau escribió: > pg_ctl timed out waiting for the zombie postgres. > > maumau 19621 18849 0 15:21 pts/900:00:00 [postgres] > maumau 20253 18849 0 15:22 pts/900:00:00 > /maumau/postgresql-9.4/src/test/regress/./tmp_check/install//maumau/pgsql/bin/pg_ctl > stop -D /maumau/postgresql-9.4/src/test/regress/./tmp_check/data -s > -m fast > > pg_regress must wait for postgres to terminate by calling waitpid(), > because it invoked postgres directly. The attached > pg_regress_pg_stop.patch does this. If you like the combination of > this and the original fix for pg_ctl in one patch, please use > pg_stop_fail_v3.patch. The pg_regress part is ugly. However, pg_regress is doing something unusual when starting postmaster itself, so the ugly coding to stop it seems to match. If we wanted to avoid the ugliness here, the right fix would be to use pg_ctl to start postmaster as well as to stop it. But that'd come at a price, because we would need more ugly code to figure out postmaster's PID. All in all, the compromise proposed by this patch seems acceptable. If we really wanted to make all this real pretty, we could provide a "libpg_ctl" library to start and stop postmaster, as well as query the PID. Probably not worth the trouble. I would apply this patch to all supported branches after this week's release. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] "pg_ctl stop" times out when it should respond quickly
From: "Ronan Dunklau" There is no regression tests covering this bugfix, althought I don't know if it would be practical to implement them. Thanks for reviewing the patch. I'm glad to know that it seems OK. Unfortunately, the current regression test system cannot handle the testing of pg_ctl. Regards MauMau -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] "pg_ctl stop" times out when it should respond quickly
Le mardi 7 janvier 2014 17:05:03 Michael Paquier a écrit : > On Sun, Jan 5, 2014 at 3:49 PM, MauMau wrote: > > Could you confirm again and tell me what problem is happening? > > FWIW, I just quickly tested those two patches independently and got > them correctly applied with patch -p1 < $PATCH on master at edc4345. > They compiled and passed as well make check. > Regards, Both patches apply cleanly, I'll focus on the pg_stop_fail_v3 patch. Tests are running correctly. The bug this patch is supposed to fix has been reproduced on current HEAD, and cannot be reproduced using the patch. Previous concerns about using both get_pgpid and postmaster_is_alive are adressed. There is no regression tests covering this bugfix, althought I don't know if it would be practical to implement them. -- Ronan Dunklau http://dalibo.com - http://dalibo.org signature.asc Description: This is a digitally signed message part.
Re: [HACKERS] [bug fix] "pg_ctl stop" times out when it should respond quickly
On Sun, Jan 5, 2014 at 3:49 PM, MauMau wrote: > Could you confirm again and tell me what problem is happening? FWIW, I just quickly tested those two patches independently and got them correctly applied with patch -p1 < $PATCH on master at edc4345. They compiled and passed as well make check. Regards, -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] "pg_ctl stop" times out when it should respond quickly
From: "Peter Eisentraut" On 12/25/13, 6:40 AM, MauMau wrote: pg_regress must wait for postgres to terminate by calling waitpid(), because it invoked postgres directly. The attached pg_regress_pg_stop.patch does this. If you like the combination of this and the original fix for pg_ctl in one patch, please use pg_stop_fail_v3.patch. This patch doesn't apply. Is that true? Today, I downloaded the following file (whose timestamp was 2014-Jan-04), and could apply pg_stop_fail_v3.patch cleanly. The "make check" succeeded. http://ftp.postgresql.org/pub/snapshot/dev/postgresql-snapshot.tar.gz Could you confirm again and tell me what problem is happening? Regards MauMau -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] "pg_ctl stop" times out when it should respond quickly
On 12/25/13, 6:40 AM, MauMau wrote: > pg_regress must wait for postgres to terminate by calling waitpid(), > because it invoked postgres directly. The attached > pg_regress_pg_stop.patch does this. If you like the combination of this > and the original fix for pg_ctl in one patch, please use > pg_stop_fail_v3.patch. This patch doesn't apply. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] "pg_ctl stop" times out when it should respond quickly
From: "Peter Eisentraut" This patch breaks the regression tests: xml ... ok test stats... ok == shutting down postmaster == pg_ctl: server does not shut down pg_regress: could not stop postmaster: exit code was 256 pg_ctl timed out waiting for the zombie postgres. maumau 19621 18849 0 15:21 pts/900:00:00 [postgres] maumau 20253 18849 0 15:22 pts/900:00:00 /maumau/postgresql-9.4/src/test/regress/./tmp_check/install//maumau/pgsql/bin/pg_ctl stop -D /maumau/postgresql-9.4/src/test/regress/./tmp_check/data -s -m fast pg_regress must wait for postgres to terminate by calling waitpid(), because it invoked postgres directly. The attached pg_regress_pg_stop.patch does this. If you like the combination of this and the original fix for pg_ctl in one patch, please use pg_stop_fail_v3.patch. Sorry for causing trouble. Regards MauMau pg_stop_fail_v3.patch Description: Binary data pg_regress_pg_stop.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] "pg_ctl stop" times out when it should respond quickly
On 12/5/13, 7:07 AM, MauMau wrote: > From: "Tom Lane" >> If you're going to do a postmaster_is_alive check, why bother with >> repeated get_pgpid()? > > As I said yesterday, I removed get_pgpid() calls. I'll add this patch > to 2014-1 commitfest this weekend if it is not committed until then. This patch breaks the regression tests: xml ... ok test stats... ok == shutting down postmaster == pg_ctl: server does not shut down pg_regress: could not stop postmaster: exit code was 256 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] "pg_ctl stop" times out when it should respond quickly
From: "Tom Lane" If you're going to do a postmaster_is_alive check, why bother with repeated get_pgpid()? As I said yesterday, I removed get_pgpid() calls. I'll add this patch to 2014-1 commitfest this weekend if it is not committed until then. Regards MauMau pg_stop_fail_v2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] "pg_ctl stop" times out when it should respond quickly
From: "Tom Lane" I think the reason why it was coded like that was that we hadn't written postmaster_is_alive() yet, or maybe we had but didn't want to trust it. However, with the coding you have here, we're fully exposed to any failure modes postmaster_is_alive() may have; so there's not a lot of value in accepting those and get_pgpid's failure modes too. Thank you for reviewing the patch so quickly. You are right, I don't think get_pgpid() here is no longer necessary. If the pid changes in one second, i.e. the original postgres terminates and "pg_ctl start" starts another one, "pg_ctl stop" can terminate successfully because the original postgres it was waiting for actually terminated. I'll submit the revised patch tomorrow. Regards MauMau -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] "pg_ctl stop" times out when it should respond quickly
"MauMau" writes: > The problem occurs in the sequence below: > 1. postmaster creates $PGDATA/postmaster.pid. > 2. postmaster tries to resolve the value of listen_addresses to IP > addresses. This took about 15 seconds in my failure scenario. > 3. During 2, pg_ctl sends SIGTERM to postmaster. > 4. postmaster terminates immediately without deleting > $PGDATA/postmaster.pid. This is because it hasn't set signal handlers yet. > 5. "pg_ctl stop" waits in a loop until $PGDATA/postmaster.pid disappears. > But the file does not disappear and it times out. Hm. I wonder if we shouldn't block SIGTERM etc. earlier. It hardly seems improbable that such signals would arrive during a slow startup. > *** 907,913 > > for (cnt = 0; cnt < wait_seconds; cnt++) > { > ! if ((pid = get_pgpid()) != 0) > { > print_msg("."); > pg_usleep(100); /* 1 sec */ > --- 907,914 > > for (cnt = 0; cnt < wait_seconds; cnt++) > { > ! if ((pid = get_pgpid()) != 0 && > ! postmaster_is_alive((pid_t) pid)) > { > print_msg("."); > pg_usleep(100); /* 1 sec */ If you're going to do a postmaster_is_alive check, why bother with repeated get_pgpid()? I think the reason why it was coded like that was that we hadn't written postmaster_is_alive() yet, or maybe we had but didn't want to trust it. However, with the coding you have here, we're fully exposed to any failure modes postmaster_is_alive() may have; so there's not a lot of value in accepting those and get_pgpid's failure modes too. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [bug fix] "pg_ctl stop" times out when it should respond quickly
Hello, I've encountered a small bug and fixed it. I guess this occurs on all major releases. I saw this happen on 9.2 and 9.4devel. Please find attached the patch and commit this. [Problem] If I mistakenly set an invalid value to listen_addresses, say '-1', and start the database server, it fails to start as follows. In my environment (RHEL6 for Intel64), it takes about 15 seconds before postgres prints the messages. This is OK. [maumau@myhost pgdata]$ pg_ctl -w start waiting for server to startLOG: could not translate host name "-1", service "5450" to address: Temporary failure in name resolution WARNING: could not create listen socket for "-1" FATAL: could not create any TCP/IP sockets stopped waiting pg_ctl: could not start server Examine the log output. [maumau@myhost pgdata]$ When I start the server without -w and try to stop it, "pg_ctl stop" waits for 60 seconds and timed out before it fails. This is what I'm seeing as a problem. I expected "pg_ctl stop" to respond quickly with success or failure depending on the timing. [maumau@myhost pgdata]$ pg_ctl start server starting ...(a few seconds later) [maumau@myhost ~]$ pg_ctl stop waiting for server to shut down. .. failed pg_ctl: server does not shut down HINT: The "-m fast" option immediately disconnects sessions rather than waiting for session-initiated disconnection. [maumau@myhost ~]$ [Cause] The problem occurs in the sequence below: 1. postmaster creates $PGDATA/postmaster.pid. 2. postmaster tries to resolve the value of listen_addresses to IP addresses. This took about 15 seconds in my failure scenario. 3. During 2, pg_ctl sends SIGTERM to postmaster. 4. postmaster terminates immediately without deleting $PGDATA/postmaster.pid. This is because it hasn't set signal handlers yet. 5. "pg_ctl stop" waits in a loop until $PGDATA/postmaster.pid disappears. But the file does not disappear and it times out. [Fix] Make pg_ctl check if postmaster is still alive, because postmaster might have crashed unexpectedly. Regards MauMau pg_stop_fail.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers