On Wed, Oct 30, 2013 at 09:07:43AM -0400, Robert Haas wrote:
> On Wed, Oct 30, 2013 at 8:47 AM, Andres Freund <and...@2ndquadrant.com> wrote:
> > On 2013-10-30 08:45:03 -0400, Robert Haas wrote:
> >> If I'm reading this correctly, the last three runs on frogmouth have
> >> all failed, and all of them have failed with a complaint about,
> >> specifically, Global/PostgreSQL.851401618.  Now, that really shouldn't
> >> be happening, because the code to choose that number looks like this:
> >>
> >>         dsm_control_handle = random();
> >
> > Could it be that we haven't primed the random number generator with the
> > time or something like that yet?
>
> Yeah, I think that's probably what it is.

I experienced a variation of this, namely a RHEL 7 system where initdb always
says "selecting dynamic shared memory implementation ... sysv".  Each initdb is
rejecting posix shm by probing the same ten segments:

$ strace initdb -D scratch 2>&1 | grep /dev/shm/P
open("/dev/shm/PostgreSQL.1804289383", O_RDWR|O_CREAT|O_EXCL|O_NOFOLLOW|O_CLOEXEC, 0600) = -1 EEXIST (File exists)
open("/dev/shm/PostgreSQL.846930886", O_RDWR|O_CREAT|O_EXCL|O_NOFOLLOW|O_CLOEXEC, 0600) = -1 EEXIST (File exists)
open("/dev/shm/PostgreSQL.1681692777", O_RDWR|O_CREAT|O_EXCL|O_NOFOLLOW|O_CLOEXEC, 0600) = -1 EEXIST (File exists)
open("/dev/shm/PostgreSQL.1714636915", O_RDWR|O_CREAT|O_EXCL|O_NOFOLLOW|O_CLOEXEC, 0600) = -1 EEXIST (File exists)
open("/dev/shm/PostgreSQL.1957747793", O_RDWR|O_CREAT|O_EXCL|O_NOFOLLOW|O_CLOEXEC, 0600) = -1 EEXIST (File exists)
open("/dev/shm/PostgreSQL.424238335", O_RDWR|O_CREAT|O_EXCL|O_NOFOLLOW|O_CLOEXEC, 0600) = -1 EEXIST (File exists)
open("/dev/shm/PostgreSQL.719885386", O_RDWR|O_CREAT|O_EXCL|O_NOFOLLOW|O_CLOEXEC, 0600) = -1 EEXIST (File exists)
open("/dev/shm/PostgreSQL.1649760492", O_RDWR|O_CREAT|O_EXCL|O_NOFOLLOW|O_CLOEXEC, 0600) = -1 EEXIST (File exists)
open("/dev/shm/PostgreSQL.596516649", O_RDWR|O_CREAT|O_EXCL|O_NOFOLLOW|O_CLOEXEC, 0600) = -1 EEXIST (File exists)
open("/dev/shm/PostgreSQL.1189641421", O_RDWR|O_CREAT|O_EXCL|O_NOFOLLOW|O_CLOEXEC, 0600) = -1 EEXIST (File exists)

Regular postmaster runs choose a random segment, but initdb, bootstrap postgres,
and single-user postgres all start with the same segment.  These segments are
months old.  Perhaps I was testing something that caused a bootstrap postgres to
crash.  After ten such crashes, future initdb runs considered posix shm
unusable.

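To illustrate why every unseeded process probes the same ten names: POSIX says
that random() called without a prior srandom() behaves as if srandom(1) had
been called, so the sequence is identical in every process.  The following is
only a standalone sketch, not PostgreSQL code; first_handles(),
same_sequence(), pid_time_seed() and NTRIES are hypothetical names made up for
this example.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700		/* expose random(), srandom(), getpid() */
#endif
#include <assert.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define NTRIES 10				/* hypothetical; mirrors initdb's ten probes */

/* Fill out[] with the first NTRIES values random() yields after srandom(seed). */
static void
first_handles(unsigned int seed, long out[NTRIES])
{
	assert(out != NULL);
	srandom(seed);
	for (int i = 0; i < NTRIES; i++)
		out[i] = random();
}

/* Return 1 if the two sequences are identical, 0 otherwise. */
static int
same_sequence(const long a[NTRIES], const long b[NTRIES])
{
	for (int i = 0; i < NTRIES; i++)
	{
		if (a[i] != b[i])
			return 0;
	}
	return 1;
}

/* Build a seed by mixing the PID with the current time. */
static unsigned int
pid_time_seed(void)
{
	return (unsigned int) (getpid() ^ time(NULL));
}
```

Calling first_handles(1, a) twice gives the same ten values both times, which
is the situation an unseeded initdb is in on every run.  Seeding with
pid_time_seed() instead should make the sequence differ between processes and
between runs, although a collision is still possible in principle.
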
> There's PostmasterRandom()
> to initialize the random-number generator on first use, but that
> doesn't help if some other module calls random().  I wonder if we
> ought to just get rid of PostmasterRandom() and instead have the
> postmaster run that initialization code very early in startup.

Usually, the first srandom() call happens early in PostmasterMain().  I plan to
add one to InitStandaloneProcess(), which substitutes for several tasks
otherwise done in PostmasterMain().  That seems like a good thing even if DSM
weren't in the picture.  Also, initdb needs an srandom() somewhere;
choose_dsm_implementation() itself seems fine.  Attached.

With this, "make -j20 check-world" selected posix shm and passed even when I
forced DSM creation to fail on unseeded random():

--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -249,2 +249,5 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
 
+	if (handle == 1804289383)
+		elog(ERROR, "generated handle with no randomness");
+
 	snprintf(name, 64, "/PostgreSQL.%u", handle);

diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 865119d..f003831 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -325,6 +325,8 @@ InitStandaloneProcess(const char *argv0)
 
 	MyStartTime = time(NULL);	/* set our start time in case we call elog */
 
+	srandom((unsigned int) (MyProcPid ^ MyStartTime));
+
 	/* Initialize process-local latch support */
 	InitializeLatchSupport();
 	MyLatch = &LocalLatchData;
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 32746c7..83f4b0b 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -922,6 +922,8 @@ choose_dsm_implementation(void)
 #ifdef HAVE_SHM_OPEN
 	int			ntries = 10;
 
+	srandom((unsigned int) (getpid() ^ time(NULL)));
+
 	while (ntries > 0)
 	{
 		uint32		handle;