[moving to -hackers]

On Thu, Aug 19, 2010 at 9:43 PM, Robert Haas <robertmh...@gmail.com> wrote:
> I suspect this is the same problem as bug #4897, and probably also the
> same problem as this:
> http://archives.postgresql.org/pgsql-bugs/2009-08/msg00114.php
>
> and maybe also this and this:
> http://archives.postgresql.org/pgsql-bugs/2010-02/msg00179.php
> http://archives.postgresql.org/pgsql-admin/2009-05/msg00105.php
>
> Unfortunately, it seems that no one has been able to get a stack trace yet.
Bruce pointed out yet another report of this problem to me:

http://archives.postgresql.org/pgsql-general/2010-08/msg00550.php

After some discussion with Magnus, I think what is going on here is that the postmaster kicks off a new child process, which terminates before it actually starts running our code: it dies either in OS-supplied code or in some sort of "filter", such as anti-spam or anti-virus software. It's presumably NOT dying in our code because, at least AFAICS, we don't call exit(128) anywhere.

One way we could possibly improve the situation is to not treat this as a child crash: that is, don't do a crash-and-restart cycle; just treat that backend as having done elog(FATAL). The trick is that you need a reliable way to distinguish between a regular child crash and an "early" child crash. Magnus suggested that perhaps we could create a mutex that the child grabs before mapping shared memory; the postmaster could then check whether the mutex had been taken. If so, we handle the crash normally; if not, we just chalk it up to experience and continue on.

This isn't really a "fix" for the bug, in the sense that the nicest thing of all would be to prevent the child from exiting abnormally in the first place. But it's far from clear that we can control that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
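For illustration only, here is a minimal sketch of the "early crash vs. regular crash" distinction, written as a POSIX stand-in rather than the Windows mutex Magnus suggested. Instead of a mutex, the child sets a flag in a small anonymous shared-memory page at the point where it would begin attaching to the main shared memory segment, and the parent inspects that flag after waitpid(). All names here (ChildExitKind, classify_child_exit, run_child) are hypothetical and are not the actual postmaster code; exit code 128 is used only because it matches the reports above.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical outcome classes for a child process exit. */
typedef enum
{
    CHILD_EXIT_OK,      /* exited with status 0: nothing to do */
    CHILD_EXIT_EARLY,   /* died before reaching our code: treat like elog(FATAL) */
    CHILD_EXIT_CRASH    /* died after reaching our code: crash-and-restart */
} ChildExitKind;

/*
 * Decide how the parent should react to a child's exit.  "started" is
 * true if the child got far enough to set the marker before it died.
 */
static ChildExitKind
classify_child_exit(int status, bool started)
{
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
        return CHILD_EXIT_OK;
    return started ? CHILD_EXIT_CRASH : CHILD_EXIT_EARLY;
}

/*
 * Fork a child that optionally sets the "reached our code" marker and
 * then exits with exit_code; return how the parent classifies the exit.
 * The marker lives in its own anonymous MAP_SHARED page, which stays
 * visible to the parent across fork().
 */
static ChildExitKind
run_child(bool reach_our_code, int exit_code)
{
    volatile int *started = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                                 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (started == MAP_FAILED)
        abort();
    *started = 0;

    pid_t pid = fork();
    if (pid < 0)
        abort();
    if (pid == 0)
    {
        /* In a real child this would happen just before attaching to
         * the main shared memory segment. */
        if (reach_our_code)
            *started = 1;
        _exit(exit_code);
    }

    int status;
    if (waitpid(pid, &status, 0) != pid)
        abort();

    ChildExitKind kind = classify_child_exit(status, *started != 0);
    munmap((void *) started, sizeof(int));
    return kind;
}
```

For example, run_child(false, 128) should classify as CHILD_EXIT_EARLY, while run_child(true, 128) should classify as CHILD_EXIT_CRASH. Placing the marker before shared memory is attached is presumably what makes skipping the crash-and-restart cycle safe: a child that dies earlier cannot have touched shared state. A Win32 version might instead rely on the mutex being reported as abandoned (WAIT_ABANDONED) once its owner exits; that variant is not shown.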