Hi,

Recently, one of our clients reported that Windows 10 sometimes
(approximately once in 300 tries) hangs during OS startup while the
PostgreSQL 9.3.x service is starting. My co-worker analyzed this and found
that a PostgreSQL auxiliary process and the Windows logon processes are in a
deadlock.

Although this problem has been found only with PostgreSQL 9.3.x on Windows 10
in our client's environment so far, the same problem may occur with other
versions of PostgreSQL.

He reported this problem to the pgsql-general list, as forwarded below. He
also created a patch that adds a build-time option to insert a delay of 0.5
or 3.0 seconds after each subprocess starts; the same patch is attached here.
Our client confirmed that this patch resolves the deadlock. Would it be
acceptable to add this option to PostgreSQL? Any comments would be
appreciated.

Regards,




Begin forwarded message:

Date: Fri, 29 Jun 2018 15:03:10 +0900
From: TAKATSUKA Haruka <haru...@sraoss.co.jp>
To: pgsql-gene...@postgresql.org
Subject: Windows 10 got stuck with PostgreSQL at starting up. Adding delay lets it avoid.


I ran into trouble with PostgreSQL 9.3.x on Windows 10.
I would like to add new delay code as an official build option.

Windows 10 sometimes (approximately once in 300 tries) hung during
OS startup. The logs say it happened while the PostgreSQL
service was starting. When the OS stopped, some postgres auxiliary
processes had been started and some had not been started yet.

The Windows dump says some threads of the postgres auxiliary process
are waiting for OS-level locks, and the logon processes' threads are
also waiting for a lock. The MS help desk said that PostgreSQL's OS-level
deadlock caused the OS freeze. I think this is a strange story. But,
in fact, the hang did not happen in repeated tests once I removed
PostgreSQL from the services that start automatically at boot.

I tweaked PostgreSQL 9.3.x (the newest from the repository) to add a
0.5 or 3.0 second delay after each subprocess starts, and then the
hang was gone. This test patch is attached.
It is implemented only for Windows. Also, I did not use the existing
pg_usleep because it contains locking code (e.g. WaitForSingleObject
and Enter/LeaveCriticalSection).

Although the Windows OS may have some problems of its own, I think we
should have a means to avoid the hang. Could PostgreSQL accept such delay
code as a build-time option controlled by preprocessor variables?
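
For illustration, an opt-in form of such a build-time option might look
like the sketch below. It assumes that USE_AFTER_AUX_FORK_SLEEP (in
milliseconds) is supplied by the builder, for example as a compiler
definition, rather than being defined in the source. The function
start_child_sketch and its placeholder pid are hypothetical and only
stand in for the postmaster code path that runs after a child process
has been started.

```c
/*
 * Assumption: USE_AFTER_AUX_FORK_SLEEP (milliseconds) is passed in by the
 * builder, e.g. as a compiler definition. If it is absent, or the target
 * is not Windows, the macro expands to a no-op.
 */
#if defined(USE_AFTER_AUX_FORK_SLEEP) && defined(WIN32)
#include <windows.h>
#define AFTER_AUX_FORK_SLEEP() \
	do { SleepEx(USE_AFTER_AUX_FORK_SLEEP, FALSE); } while (0)
#else
#define AFTER_AUX_FORK_SLEEP() ((void) 0)
#endif

/* Hypothetical stand-in for the parent-side path after a successful fork. */
static int
start_child_sketch(void)
{
	int			pid = 1234;		/* placeholder, not a real process id */

	AFTER_AUX_FORK_SLEEP();		/* sleeps only when the option is enabled */
	return pid;
}
```

With this shape, a default build would contain no delay at all, and only
a Windows build that sets the variable would pay the startup cost.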


Thanks,
Takatsuka Haruka


-- 
Yugo Nagata <nag...@sraoss.co.jp>
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index d6fc2ed..ff03ebd 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -398,6 +398,30 @@ extern int optreset;                       /* might not be declared by system headers */
 static DNSServiceRef bonjour_sdref = NULL;
 #endif
 
+#define USE_AFTER_AUX_FORK_SLEEP 3000
+
+#ifdef USE_AFTER_AUX_FORK_SLEEP
+#ifndef WIN32
+#define AFTER_AUX_FORK_SLEEP()
+#else
+#define AFTER_AUX_FORK_SLEEP() do { SleepEx(USE_AFTER_AUX_FORK_SLEEP, FALSE); } while(0)
+#endif
+#else
+#define AFTER_AUX_FORK_SLEEP()
+#endif
+
+#define USE_AFTER_BACKEND_FORK_SLEEP 500
+
+#ifdef USE_AFTER_BACKEND_FORK_SLEEP
+#ifndef WIN32
+#define AFTER_BACKEND_FORK_SLEEP()
+#else
+#define AFTER_BACKEND_FORK_SLEEP() do { SleepEx(USE_AFTER_BACKEND_FORK_SLEEP, FALSE); } while(0)
+#endif
+#else
+#define AFTER_BACKEND_FORK_SLEEP()
+#endif
+
 /*
  * postmaster.c - function prototypes
  */
@@ -1709,6 +1733,7 @@ ServerLoop(void)
                                                 */
                                                StreamClose(port->sock);
                                                ConnFree(port);
+                                               AFTER_BACKEND_FORK_SLEEP();
                                        }
                                }
                        }
@@ -2801,11 +2826,20 @@ reaper(SIGNAL_ARGS)
                         * situation, some of them may be alive already.
                         */
                        if (!IsBinaryUpgrade && AutoVacuumingActive() && AutoVacPID == 0)
+                       {
                                AutoVacPID = StartAutoVacLauncher();
+                               AFTER_AUX_FORK_SLEEP(); 
+                       }
                        if (XLogArchivingActive() && PgArchPID == 0)
+                       {
                                PgArchPID = pgarch_start();
+                               AFTER_AUX_FORK_SLEEP();
+                       }
                        if (PgStatPID == 0)
+                       {
                                PgStatPID = pgstat_start();
+                               AFTER_AUX_FORK_SLEEP();
+                       }
 
                        /* some workers may be scheduled to start now */
                        maybe_start_bgworker();
@@ -5259,6 +5293,7 @@ StartChildProcess(AuxProcType type)
        /*
         * in parent, successful fork
         */
+       AFTER_AUX_FORK_SLEEP();
        return pid;
 }
 
