"MauMau" <maumau...@gmail.com> writes: > The problem occurs in the sequence below:
> 1. postmaster creates $PGDATA/postmaster.pid.
> 2. postmaster tries to resolve the value of listen_addresses to IP
> addresses.  This took about 15 seconds in my failure scenario.
> 3. During 2, pg_ctl sends SIGTERM to postmaster.
> 4. postmaster terminates immediately without deleting
> $PGDATA/postmaster.pid.  This is because it hasn't set signal handlers yet.
> 5. "pg_ctl stop" waits in a loop until $PGDATA/postmaster.pid disappears.
> But the file does not disappear and it times out.

Hm.  I wonder if we shouldn't block SIGTERM etc. earlier.  It hardly seems
improbable that such signals would arrive during a slow startup.

> *** 907,913 ****
>
>       for (cnt = 0; cnt < wait_seconds; cnt++)
>       {
> !         if ((pid = get_pgpid()) != 0)
>           {
>               print_msg(".");
>               pg_usleep(1000000);     /* 1 sec */
> --- 907,914 ----
>
>       for (cnt = 0; cnt < wait_seconds; cnt++)
>       {
> !         if ((pid = get_pgpid()) != 0 &&
> !             postmaster_is_alive((pid_t) pid))
>           {
>               print_msg(".");
>               pg_usleep(1000000);     /* 1 sec */

If you're going to do a postmaster_is_alive check, why bother with
repeated get_pgpid()?  I think the reason why it was coded like that
was that we hadn't written postmaster_is_alive() yet, or maybe we had
but didn't want to trust it.  However, with the coding you have here,
we're fully exposed to any failure modes postmaster_is_alive() may have;
so there's not a lot of value in accepting those and get_pgpid's failure
modes too.

            regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
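
For illustration only, here is a rough sketch of what the wait loop might
look like if it took the PID once and then relied purely on a liveness
check, as suggested above.  This is not the pg_ctl source: process_is_alive()
and wait_for_exit() are hypothetical names, and the liveness probe is
assumed to be a plain kill(pid, 0), which may differ from what
postmaster_is_alive() really does.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for postmaster_is_alive(): probe the process with
 * signal 0, which performs the existence and permission checks without
 * delivering any signal.
 */
static bool
process_is_alive(pid_t pid)
{
	if (kill(pid, 0) == 0)
		return true;
	/* EPERM means the process exists but we are not allowed to signal it. */
	return errno == EPERM;
}

/*
 * Wait up to wait_seconds for pid to exit.  The caller supplies the PID
 * once; the PID file is not re-read inside the loop.  Returns true if the
 * process is gone, false if it is still alive when the wait ends.
 */
static bool
wait_for_exit(pid_t pid, int wait_seconds)
{
	int			cnt;

	assert(wait_seconds >= 0);

	for (cnt = 0; cnt < wait_seconds; cnt++)
	{
		if (!process_is_alive(pid))
			return true;
		sleep(1);				/* 1 sec */
	}
	return !process_is_alive(pid);
}
```

A caller would read the PID file once, send SIGTERM, and then call
wait_for_exit() with that PID and the configured timeout.  Note that this
accepts the probe's own failure modes: a zombie that has not been reaped
still counts as alive, and a recycled PID would be mistaken for the old
process.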