On Wed, Oct 30, 2013 at 09:07:43AM -0400, Robert Haas wrote:
> On Wed, Oct 30, 2013 at 8:47 AM, Andres Freund <and...@2ndquadrant.com> wrote:
> > On 2013-10-30 08:45:03 -0400, Robert Haas wrote:
> >> If I'm reading this correctly, the last three runs on frogmouth have
> >> all failed, and all of them have failed with a complaint about,
> >> specifically, Global/PostgreSQL.851401618.  Now, that really shouldn't
> >> be happening, because the code to choose that number looks like this:
> >>
> >>         dsm_control_handle = random();

> > Could it be that we haven't primed the random number generator with the
> > time or something like that yet?
> 
> Yeah, I think that's probably what it is.

I experienced a variation of this, namely a RHEL 7 system where initdb always
says "selecting dynamic shared memory implementation ... sysv".  Each initdb
run rejects posix shm after probing the same ten segment names, all of which
already exist:

$ strace initdb -D scratch 2>&1 | grep /dev/shm/P
open("/dev/shm/PostgreSQL.1804289383", O_RDWR|O_CREAT|O_EXCL|O_NOFOLLOW|O_CLOEXEC, 0600) = -1 EEXIST (File exists)
open("/dev/shm/PostgreSQL.846930886", O_RDWR|O_CREAT|O_EXCL|O_NOFOLLOW|O_CLOEXEC, 0600) = -1 EEXIST (File exists)
open("/dev/shm/PostgreSQL.1681692777", O_RDWR|O_CREAT|O_EXCL|O_NOFOLLOW|O_CLOEXEC, 0600) = -1 EEXIST (File exists)
open("/dev/shm/PostgreSQL.1714636915", O_RDWR|O_CREAT|O_EXCL|O_NOFOLLOW|O_CLOEXEC, 0600) = -1 EEXIST (File exists)
open("/dev/shm/PostgreSQL.1957747793", O_RDWR|O_CREAT|O_EXCL|O_NOFOLLOW|O_CLOEXEC, 0600) = -1 EEXIST (File exists)
open("/dev/shm/PostgreSQL.424238335", O_RDWR|O_CREAT|O_EXCL|O_NOFOLLOW|O_CLOEXEC, 0600) = -1 EEXIST (File exists)
open("/dev/shm/PostgreSQL.719885386", O_RDWR|O_CREAT|O_EXCL|O_NOFOLLOW|O_CLOEXEC, 0600) = -1 EEXIST (File exists)
open("/dev/shm/PostgreSQL.1649760492", O_RDWR|O_CREAT|O_EXCL|O_NOFOLLOW|O_CLOEXEC, 0600) = -1 EEXIST (File exists)
open("/dev/shm/PostgreSQL.596516649", O_RDWR|O_CREAT|O_EXCL|O_NOFOLLOW|O_CLOEXEC, 0600) = -1 EEXIST (File exists)
open("/dev/shm/PostgreSQL.1189641421", O_RDWR|O_CREAT|O_EXCL|O_NOFOLLOW|O_CLOEXEC, 0600) = -1 EEXIST (File exists)

Regular postmaster runs choose a random segment, but initdb, bootstrap
postgres, and single-user postgres never call srandom(), so they all start
with the same segment.  These segments are months old.  Perhaps I was testing
something that caused a bootstrap postgres to crash.  After ten such crashes,
future initdb runs considered posix shm unusable.
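
To make the failure mode concrete, here is a minimal sketch that models the
probe loop in plain C.  It is not the initdb code: StaleSegments, can_create()
and probe_posix_shm() are hypothetical names, and can_create() only stands in
for shm_open() with O_CREAT|O_EXCL.  The one fact it relies on is that POSIX
specifies random() without a prior srandom() to behave as if srandom(1) had
been called, so every unseeded process walks the same handle sequence.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700       /* expose random() and srandom() */
#endif
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define NTRIES 10               /* same retry budget as the probe above */

/* Hypothetical model of /dev/shm: handles left behind by crashed runs. */
typedef struct
{
    uint32_t    handles[NTRIES];
    int         count;
} StaleSegments;

/* Record the first NTRIES handles that follow srandom(seed). */
void
collect_handles(unsigned int seed, StaleSegments *stale)
{
    assert(stale != NULL);
    srandom(seed);
    for (int i = 0; i < NTRIES; i++)
        stale->handles[i] = (uint32_t) random();
    stale->count = NTRIES;
}

/* Stand-in for shm_open(O_CREAT | O_EXCL): fails if the name is taken. */
bool
can_create(const StaleSegments *stale, uint32_t handle)
{
    for (int i = 0; i < stale->count; i++)
        if (stale->handles[i] == handle)
            return false;       /* models EEXIST */
    return true;
}

/* Try up to NTRIES handles; report whether any segment could be created. */
bool
probe_posix_shm(unsigned int seed, const StaleSegments *stale)
{
    srandom(seed);
    for (int i = 0; i < NTRIES; i++)
        if (can_create(stale, (uint32_t) random()))
            return true;
    return false;
}
```

In this model, filling the stale set with collect_handles(1, &stale) and then
calling probe_posix_shm(1, &stale) reports failure, because seed 1 reproduces
the unseeded sequence and all ten names are taken.  A probe seeded with some
other value would be expected to find a free name on its first try.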

> There's PostmasterRandom()
> to initialize the random-number generator on first use, but that
> doesn't help if some other module calls random().  I wonder if we
> ought to just get rid of PostmasterRandom() and instead have the
> postmaster run that initialization code very early in startup.

Usually, the first srandom() call happens early in PostmasterMain().  I plan
to add one to InitStandaloneProcess(), which substitutes for several tasks
otherwise done in PostmasterMain().  That seems like a good thing even if DSM
weren't in the picture.  Also, initdb needs an srandom() somewhere;
choose_dsm_implementation() itself seems fine.  Attached.  With this, "make
-j20 check-world" selected posix shm and passed even when I forced DSM
creation to fail on unseeded random():

--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -249,2 +249,5 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
 
+       if (handle == 1804289383)
+               elog(ERROR, "generated handle with no randomness");
+
        snprintf(name, 64, "/PostgreSQL.%u", handle);
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 865119d..f003831 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -325,6 +325,8 @@ InitStandaloneProcess(const char *argv0)
 
        MyStartTime = time(NULL);       /* set our start time in case we call elog */
 
+       srandom((unsigned int) (MyProcPid ^ MyStartTime));
+
        /* Initialize process-local latch support */
        InitializeLatchSupport();
        MyLatch = &LocalLatchData;
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 32746c7..83f4b0b 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -922,6 +922,8 @@ choose_dsm_implementation(void)
 #ifdef HAVE_SHM_OPEN
        int                     ntries = 10;
 
+       srandom((unsigned int) (getpid() ^ time(NULL)));
+
        while (ntries > 0)
        {
                uint32          handle;
