Re: [HACKERS] [BUGS] BUG #5305: Postgres service stops when closing Windows session

On Tue, Aug 24, 2010 at 15:58, Tom Lane t...@sss.pgh.pa.us wrote:
 Bruce Momjian br...@momjian.us writes:
 Robert Haas wrote:
 Yeah, that seems very plausible, although exactly how to verify I don't 
 know.

 And here is confirmation from the Microsoft web site:

       In some instances, calling GetExitCode() against the failed process
       indicates the following exit code:
       128L ERROR_WAIT_NO_CHILDREN - There are no child processes to wait for.

 Given the existence of the deadman switch mechanism (which I hadn't
 remembered when this thread started), I'm coming around to the idea that
 we could just treat exit(128) as nonfatal on Windows.  If for some
 reason the child hadn't died instantly at startup, the deadman switch
 would distinguish that from the case described here.

Just to be clear, do you mean something as simple as this?


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


win32_128.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [BUGS] BUG #5305: Postgres service stops when closing Windows session

Magnus Hagander mag...@hagander.net writes:
 On Tue, Aug 24, 2010 at 15:58, Tom Lane t...@sss.pgh.pa.us wrote:
 Given the existence of the deadman switch mechanism (which I hadn't
 remembered when this thread started), I'm coming around to the idea that
 we could just treat exit(128) as nonfatal on Windows.  If for some
 reason the child hadn't died instantly at startup, the deadman switch
 would distinguish that from the case described here.

 Just to be clear, do you mean something as simple as this?

That seems like a rather klugy place and way to insert the fix.  One
complaint about it is that the notice won't get logged nicely.  It'd be
better if the main reaper() code was responsible for ignoring 128 so
that it could log the fact that it'd done so in the regular postmaster
log.

Another issue is that nonfatal doesn't mean successful.  In
particular, if this happened for the startup process, or probably some
other cases, taking the exit code as 0 would cause seriously wrong
things to happen.

On balance I think I'd suggest an #ifdef WIN32 in CleanupBackend that
made it accept 128 as a normal exit case.  That would allow normal
processing to continue only when this happens to a regular backend,
which is probably sufficient for the purpose.

regards, tom lane

Re: [HACKERS] [BUGS] BUG #5305: Postgres service stops when closing Windows session

On Thu, Sep 9, 2010 at 19:48, Tom Lane t...@sss.pgh.pa.us wrote:
 Magnus Hagander mag...@hagander.net writes:
 On Tue, Aug 24, 2010 at 15:58, Tom Lane t...@sss.pgh.pa.us wrote:
 Given the existence of the deadman switch mechanism (which I hadn't
 remembered when this thread started), I'm coming around to the idea that
 we could just treat exit(128) as nonfatal on Windows.  If for some
 reason the child hadn't died instantly at startup, the deadman switch
 would distinguish that from the case described here.

 Just to be clear, do you mean something as simple as this?

 That seems like a rather klugy place and way to insert the fix.  One
 complaint about it is that the notice won't get logged nicely.  It'd be
 better if the main reaper() code was responsible for ignoring 128 so
 that it could log the fact that it'd done so in the regular postmaster
 log.

Agreed - I just wanted to throw it in somewhere for testing. Should've
mentioned that.


 Another issue is that nonfatal doesn't mean successful.  In
 particular, if this happened for the startup process, or probably some
 other cases, taking the exit code as 0 would cause seriously wrong
 things to happen.

 On balance I think I'd suggest an #ifdef WIN32 in CleanupBackend that
 made it accept 128 as a normal exit case.  That would allow normal
 processing to continue only when this happens to a regular backend,
 which is probably sufficient for the purpose.

Seems reasonable. I'll whack it around for that - see attached.

Dave has a reasonably reproducible test environment. Unfortunately it's
on 8.3, so this patch will be completely unsafe there (it doesn't have
the deadman switch). But hopefully it can be used to see whether it fixes
this problem (while introducing others).


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


win32_128.patch
Description: Binary data

Re: [HACKERS] [BUGS] BUG #5305: Postgres service stops when closing Windows session

Magnus Hagander mag...@hagander.net writes:
 On Thu, Sep 9, 2010 at 19:48, Tom Lane t...@sss.pgh.pa.us wrote:
 On balance I think I'd suggest an #ifdef WIN32 in CleanupBackend that
 made it accept 128 as a normal exit case.

 Seems reasonable. I'll whack it around for that - see attached.

Hm, still doesn't log, which I think it should, even for testing
purposes (how will you know the case occurred?).  Maybe like this:

    /*
     * If a backend dies in an ugly way then we must signal all other backends
     * to quickdie.  If exit status is zero (normal) or one (FATAL exit), we
     * assume everything is all right and proceed to remove the backend from
     * the active backend list.
+    *
+    * On Windows, also treat ERROR_WAIT_NO_CHILDREN (128) as a nonfatal
+    * case, since that sometimes happens under load.
     */
+#ifdef WIN32
+   if (exitstatus == ERROR_WAIT_NO_CHILDREN)
+   {
+       LogChildExit(LOG, _("server process"), pid, exitstatus);
+       exitstatus = 0;
+   }
+#endif
+
    if (!EXIT_STATUS_0(exitstatus) && !EXIT_STATUS_1(exitstatus))
    {
        HandleChildCrash(pid, exitstatus, _("server process"));
        return;
    }


 Dave has a reasonably reproducible test environment. Unfortunately it's
 on 8.3, so this patch will be completely unsafe there (it doesn't have
 the deadman switch). But hopefully it can be used to see whether it fixes
 this problem (while introducing others).

Sounds like a plan.

We're not so worried about this case that we'd want to backport the
deadman switch into 8.3 or 8.2 to have a fix there, are we?

regards, tom lane

Re: [HACKERS] [BUGS] BUG #5305: Postgres service stops when closing Windows session

2010-09-09 Thread Robert Haas

On Thu, Sep 9, 2010 at 2:23 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 We're not so worried about this case that we'd want to backport the
 deadman switch into 8.3 or 8.2 to have a fix there, are we?

I think we should consider backporting the deadman switch to 8.3 and 8.2.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Re: [HACKERS] [BUGS] BUG #5305: Postgres service stops when closing Windows session

Robert Haas robertmh...@gmail.com writes:
 On Thu, Sep 9, 2010 at 2:23 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 We're not so worried about this case that we'd want to backport the
 deadman switch into 8.3 or 8.2 to have a fix there, are we?

 I think we should consider backporting the deadman switch to 8.3 and 8.2.

[ raised eyebrow... ]  Weren't you the one just lecturing me about
minimizing changes in back branches?

That was a fairly large patch, and I *don't* want to back-port it.
The thrust of my question was more along the lines of whether we should
look for a different solution to the current problem, so that we would
have something that could be back-ported into 8.2 and 8.3.  Personally
I'm satisfied with only fixing it in 8.4 and up, but then again I don't
use Windows.

regards, tom lane

Re: [HACKERS] [BUGS] BUG #5305: Postgres service stops when closing Windows session

On Thu, Sep 9, 2010 at 21:00, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Thu, Sep 9, 2010 at 2:23 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 We're not so worried about this case that we'd want to backport the
 deadman switch into 8.3 or 8.2 to have a fix there, are we?

 I think we should consider backporting the deadman switch to 8.3 and 8.2.

 [ raised eyebrow... ]  Weren't you the one just lecturing me about
 minimizing changes in back branches?

 That was a fairly large patch, and I *don't* want to back-port it.
 The thrust of my question was more along the lines of whether we should
 look for a different solution to the current problem, so that we would
 have something that could be back-ported into 8.2 and 8.3.  Personally
 I'm satisfied with only fixing it in 8.4 and up, but then again I don't
 use Windows.

Once we've shown that it works, I think we should look at doing
something for <= 8.3 as well.

How about something along the lines of my previous patch (with the
event) for 8.2 and 8.3, and then this simplified one for 8.4+?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: [HACKERS] [BUGS] BUG #5305: Postgres service stops when closing Windows session

Magnus Hagander mag...@hagander.net writes:
 On Thu, Sep 9, 2010 at 21:00, Tom Lane t...@sss.pgh.pa.us wrote:
 The thrust of my question was more along the lines of whether we should
 look for a different solution to the current problem, so that we would
 have something that could be back-ported into 8.2 and 8.3.  Personally
 I'm satisfied with only fixing it in 8.4 and up, but then again I don't
 use Windows.

 Once we've shown that it works, I think we should look at doing
 something for <= 8.3 as well.

 How about something along the lines of my previous patch (with the
 event) for 8.2 and 8.3, and then this simplified one for 8.4+?

Actually, I was just wondering how much we really need the dead-man
switch for this patch.  If we don't have it, then what we risk is that
exit(128) will be taken as successful exit when it shouldn't be.  But
how likely is it that such a call will ever be made?  I think accepting
that small risk might be reasonable in the old branches.  It's not like
the other possible fixes are zero-risk in themselves; especially not
patches that are only meant for the old branches and will never get
testing in HEAD.
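
To make the trade-off concrete, here is a minimal sketch of the general dead-man switch idea, assuming a per-child slot that the parent assigns before launch, the child marks active once it starts doing real work, and the child marks inactive again on a clean exit. The thread does not show the real mechanism, so every name below (sketch_slots, sketch_assign_slot, sketch_mark_active, sketch_mark_inactive, sketch_release_slot, sketch_exit_128_is_harmless) is hypothetical, and a plain array stands in for shared memory.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_SLOTS 8
#define SKETCH_EXIT_128  128

/* Hypothetical per-child slot states. */
typedef enum { SLOT_UNUSED, SLOT_ASSIGNED, SLOT_ACTIVE } sketch_slot_state;

/* Stand-in for a shared-memory array visible to parent and children. */
static sketch_slot_state sketch_slots[SKETCH_MAX_SLOTS];

/* Parent: reserve a slot before launching the child. */
static void sketch_assign_slot(int slot)
{
    assert(slot >= 0 && slot < SKETCH_MAX_SLOTS);
    sketch_slots[slot] = SLOT_ASSIGNED;
}

/* Child: arm the switch once it begins touching shared state. */
static void sketch_mark_active(int slot)
{
    assert(slot >= 0 && slot < SKETCH_MAX_SLOTS);
    sketch_slots[slot] = SLOT_ACTIVE;
}

/* Child: disarm the switch as the last step of a clean exit. */
static void sketch_mark_inactive(int slot)
{
    assert(slot >= 0 && slot < SKETCH_MAX_SLOTS);
    sketch_slots[slot] = SLOT_ASSIGNED;
}

/* Parent: free the slot after reaping the child; returns true if the
 * switch was not armed (child never started, or it exited cleanly). */
static bool sketch_release_slot(int slot)
{
    bool was_safe;

    assert(slot >= 0 && slot < SKETCH_MAX_SLOTS);
    was_safe = (sketch_slots[slot] == SLOT_ASSIGNED);
    sketch_slots[slot] = SLOT_UNUSED;
    return was_safe;
}

/* Parent: exit 128 is only ignorable if the child never armed the switch. */
static bool sketch_exit_128_is_harmless(int slot, int exitstatus)
{
    bool was_safe = sketch_release_slot(slot);

    return exitstatus == SKETCH_EXIT_128 && was_safe;
}
```

Without such a switch, the parent has only the exit status to go on, which is the risk discussed above: a child that really called exit(128) after touching shared state would be indistinguishable from one that never started.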

regards, tom lane

Re: [HACKERS] [BUGS] BUG #5305: Postgres service stops when closing Windows session

2010-09-09 Thread Robert Haas

On Thu, Sep 9, 2010 at 3:00 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Thu, Sep 9, 2010 at 2:23 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 We're not so worried about this case that we'd want to backport the
 deadman switch into 8.3 or 8.2 to have a fix there, are we?

 I think we should consider backporting the deadman switch to 8.3 and 8.2.

 [ raised eyebrow... ]  Weren't you the one just lecturing me about
 minimizing changes in back branches?

They call me Professor Haas?

I believe the specific nature of my complaint was that we should only
back-patch important bug or security fixes.  I think that there is a
credible argument that unnecessary database PANICs fall into that
category and wonky whitespace in the ps output does not.  YMMV, of
course.

 That was a fairly large patch, and I *don't* want to back-port it.
 The thrust of my question was more along the lines of whether we should
 look for a different solution to the current problem, so that we would
 have something that could be back-ported into 8.2 and 8.3.  Personally
 I'm satisfied with only fixing it in 8.4 and up, but then again I don't
 use Windows.

I'm a bit surprised that you don't think this is back-patchable
material, considering the last paragraph of the commit message, which
seems to imply that you at least gave the matter some brief
consideration before deciding against it:

Although this problem is of long standing, the lack of field complaints
seems to mean it's not critical enough to risk back-patching; at least
not till we get some more testing of this mechanism.

We certainly now have MANY documented field complaints at least of the
exit-128-on-Windows problem, if not the more general
backend-exits-without-going-through-the-normal-cleanup-path problem.
Having said that, I'd be just as happy to go back to Magnus's original
solution, which didn't depend on the dead-man switch anyway.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Re: [HACKERS] [BUGS] BUG #5305: Postgres service stops when closing Windows session

Robert Haas robertmh...@gmail.com writes:
 We certainly now have MANY documented field complaints at least of the
 exit-128-on-Windows problem, if not the more general
 backend-exits-without-going-through-the-normal-cleanup-path problem.

Right, which is why I still don't care to risk back-porting a fix for
the latter.

regards, tom lane

Re: [HACKERS] [BUGS] BUG #5305: Postgres service stops when closing Windows session

2010-09-09 Thread Robert Haas

On Thu, Sep 9, 2010 at 3:28 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 We certainly now have MANY documented field complaints at least of the
 exit-128-on-Windows problem, if not the more general
 backend-exits-without-going-through-the-normal-cleanup-path problem.

 Right, which is why I still don't care to risk back-porting a fix for
 the latter.

It's hard to say what the safest option is, I think.  There seem to be
basically three proposals on the table:

1. Back-port the dead-man switch, and ignore exit 128.
2. Don't back-port the dead-man switch, but ignore exit 128 anyway.
3. Revert to Magnus's original solution.

Each of these has advantages and disadvantages.  The advantage of #1
is that it is safer than #2, and that is usually something we prize
fairly highly.  The disadvantage of #1 is that it involves
back-porting the dead-man switch, but on the flip side that code has
been out in the field for over a year now in 8.4, and AFAIK we haven't
had any trouble with it.  Solution #3 should be approximately as safe as
solution #1, and has the advantage of touching less code in the back
branches, but on the other hand it is also NEW code.  So I think it's
arguable which is the best solution.  I think I like option #2 least
among those choices, but it's a tough call.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Re: [HACKERS] [BUGS] BUG #5305: Postgres service stops when closing Windows session

Robert Haas robertmh...@gmail.com writes:
 It's hard to say what the safest option is, I think.  There seem to be
 basically three proposals on the table:

 1. Back-port the dead-man switch, and ignore exit 128.
 2. Don't back-port the dead-man switch, but ignore exit 128 anyway.
 3. Revert to Magnus's original solution.

 Each of these has advantages and disadvantages.  The advantage of #1
 is that it is safer than #2, and that is usually something we prize
 fairly highly.  The disadvantage of #1 is that it involves
 back-porting the dead-man switch, but on the flip side that code has
 been out in the field for over a year now in 8.4, and AFAIK we haven't
 had any trouble with it.  Solution #3 should be approximately as safe as
 solution #1, and has the advantage of touching less code in the back
 branches, but on the other hand it is also NEW code.  So I think it's
 arguable which is the best solution.  I think I like option #2 least
 among those choices, but it's a tough call.

Well, I don't want to use Magnus' original solution in 8.4 or up,
so I don't like #3 much: it's not only new code but code which would
get very limited testing.  And I don't believe that the risk of
unexpected use of exit(128) is large enough to make #1 preferable to #2.
YMMV.

regards, tom lane

Re: [HACKERS] [BUGS] BUG #5305: Postgres service stops when closing Windows session

On Thu, Sep 9, 2010 at 22:09, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 It's hard to say what the safest option is, I think.  There seem to be
 basically three proposals on the table:

 1. Back-port the dead-man switch, and ignore exit 128.
 2. Don't back-port the dead-man switch, but ignore exit 128 anyway.
 3. Revert to Magnus's original solution.

 Each of these has advantages and disadvantages.  The advantage of #1
 is that it is safer than #2, and that is usually something we prize
 fairly highly.  The disadvantage of #1 is that it involves
 back-porting the dead-man switch, but on the flip side that code has
 been out in the field for over a year now in 8.4, and AFAIK we haven't
 had any trouble with it.  Solution #3 should be approximately as safe as
 solution #1, and has the advantage of touching less code in the back
 branches, but on the other hand it is also NEW code.  So I think it's
 arguable which is the best solution.  I think I like option #2 least
 among those choices, but it's a tough call.

 Well, I don't want to use Magnus' original solution in 8.4 or up,
 so I don't like #3 much: it's not only new code but code which would
 get very limited testing.  And I don't believe that the risk of
 unexpected use of exit(128) is large enough to make #1 preferable to #2.
 YMMV.

I agree on option #3 not being good - that'd basically be dead-end
code in backbranches only, and it's significantly different.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: [HACKERS] [BUGS] BUG #5305: Postgres service stops when closing Windows session

2010-09-09 Thread Bruce Momjian

Robert Haas wrote:
 On Thu, Sep 9, 2010 at 3:28 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Robert Haas robertmh...@gmail.com writes:
  We certainly now have MANY documented field complaints at least of the
  exit-128-on-Windows problem, if not the more general
  backend-exits-without-going-through-the-normal-cleanup-path problem.
 
  Right, which is why I still don't care to risk back-porting a fix for
  the latter.
 
 It's hard to say what the safest option is, I think.  There seem to be
 basically three proposals on the table:
 
 1. Back-port the dead-man switch, and ignore exit 128.
 2. Don't back-port the dead-man switch, but ignore exit 128 anyway.
 3. Revert to Magnus's original solution.
 
 Each of these has advantages and disadvantages.  The advantage of #1
 is that it is safer than #2, and that is usually something we prize
 fairly highly.  The disadvantage of #1 is that it involves
 back-porting the dead-man switch, but on the flip side that code has
 been out in the field for over a year now in 8.4, and AFAIK we haven't
 had any trouble with it.  Solution #3 should be approximately as safe as
 solution #1, and has the advantage of touching less code in the back
 branches, but on the other hand it is also NEW code.  So I think it's
 arguable which is the best solution.  I think I like option #2 least
 among those choices, but it's a tough call.

Well, the dead-man timer is for all platforms, while the 128 return
failure is Win32-only, so I don't see why applying the dead-man timer
makes sense when it might destabilize all platforms and the bug is
just on Win32. I also don't think using defines to make the dead-man
timer Win32-only makes sense.

I think we have clear enough evidence that 128 on Win32 means
no-such-child and we can be sure the child never got started on that
platform.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

Re: [HACKERS] [BUGS] BUG #5305: Postgres service stops when closing Windows session

2010-09-01 Thread Dave Page

On Wed, Sep 1, 2010 at 3:49 PM, Cristian Bittel cbit...@gmail.com wrote:
Maybe the issue could, for the moment, be avoided by modifying the shared heap
for sessions on Windows. But I don't really have any idea how much to increase
or decrease the values. Trial and error? But inside the open Windows
sessions nothing warns of heap exhaustion, so it could be unpredictable how much
to change the values until the next PostgreSQL service crash...
32-bits: http://support.microsoft.com/kb/184802

There are several reports for other services with the same behavior,
including exit code 128, and a workaround to increase the heap on old Windows
versions, but exit code 128 seems to apply to Windows 2003 Server x64
also. It seems to be improved in Windows 2008, where the heap is not fixed.
https://fogbugz.bitvise.com/default.asp?WinSSHD.1.12888.2
http://support.microsoft.com/kb/824422

Given the unpredictability, if this is connected to desktop heap I
don't think it's running out of per-session memory, so much as the
system-wide heap (which, afaict, is fixed at 48MB). That might explain
why a desktop session could affect other sessions.

Is this a terminal server, with lots of interactive users? Can you
check the heap usage using the desktop heap monitor:
http://www.microsoft.com/downloads/details.aspx?familyid=5cfc9b74-97aa-4510-b4b9-b2dc98c8ed8b&displaylang=en

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise Postgres Company

Re: [HACKERS] [BUGS] BUG #5305: Postgres service stops when closing Windows session

2010-09-01 Thread Cristian Bittel

Maybe the issue could, for the moment, be avoided by modifying the shared heap
for sessions on Windows. But I don't really have any idea how much to increase
or decrease the values. Trial and error? But inside the open Windows
sessions nothing warns of heap exhaustion, so it could be unpredictable how much
to change the values until the next PostgreSQL service crash...
32-bits: http://support.microsoft.com/kb/184802

There are several reports for other services with the same behavior,
including exit code 128, and a workaround to increase the heap on old Windows
versions, but exit code 128 seems to apply to Windows 2003 Server x64
also. It seems to be improved in Windows 2008, where the heap is not fixed.
https://fogbugz.bitvise.com/default.asp?WinSSHD.1.12888.2
http://support.microsoft.com/kb/824422





2010/8/31 Bruce Momjian br...@momjian.us

 Dave Page wrote:
  On Tue, Aug 31, 2010 at 4:35 PM, Bruce Momjian br...@momjian.us wrote:
   Dave Page wrote:
   On Tue, Aug 31, 2010 at 4:27 PM, Bruce Momjian br...@momjian.us wrote:
We have already found that exceeding desktop heap might cause a
CreateProcess to return success but later fail with a return code of
128, which causes a server restart.
  
   That doesn't mean that this is desktop heap exhaustion though - just
   that it can cause the same effect.
  
   Right, but it is the only possible server crash cause we have come up
   with so far.
 
  Understood - I'm just unconvinced it's the cause - aside from the
  point I made earlier about heap exhaustion being very predictable and
  reproducible (which this issue apparently is not), when the server is
  run under the SCM, it creates a logon session for that service alone
  which has its own heap allocation which is entirely independent of
  the allocation used by any interactive logon sessions.
 
  So unless there's a major isolation bug in Windows, any desktop heap
  usage in an interactive session for one user should have zero effect
  on a non-interactive session for another user.

 Well, the only description that we have ever heard that makes sense is
 some kind of heap exhaustion, perhaps triggered by a Windows bug that
 doesn't properly track heap allocations sometimes.

 Of course, the cause might be aliens, but we don't have any evidence of
 that either.  :-|

 What we do know is that CreateProcess is returning success, and the
 child is exiting with 128 no_such_child, and that logging out can
 trigger it sometimes.

 --
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

Re: [HACKERS] [BUGS] BUG #5305: Postgres service stops when closing Windows session

On Sun, Aug 29, 2010 at 12:05 PM, Magnus Hagander mag...@hagander.net wrote:
 On Thu, Aug 26, 2010 at 22:59, Cristian Bittel cbit...@gmail.com wrote:
 I still believe this exit code 128 is related to pgAdmin being open during the
 closing of a session on Remote Desktop. I have a Windows user login which is not
 an administrator, just an unprivileged user; it cannot start/stop services, just
 monitor. With a pgAdmin window open inside my disconnected session, as
 Administrator if I close another disconnected session, Postgres exits
 with code 128.

 If the closing of a session on the remote desktop can affect a
 *service* then frankly that sounds like a serious isolation bug in
 Windows itself. The postmaster grabs the handle of the process when
 it's started and waits on that - that should never be affected by
 something in a different session.

 I think it's more likely that Windows just loses track when you
 terminate a lot of processes at once, and randomly kills off something
 - or at least *indicates* that something has been killed off.

 Did you reproduce this behavior?

 No, AFAIK nobody has managed to reproduce this behavior in any kind of
 consistent way. It's certainly been seen more than once in many
 places, but not consistently reproducible.

This behaviour, no - but desktop heap exhaustion is very easy to
reproduce. That's because the heap usage is caused by user32.dll which
uses a consistent amount with each process started, which is allocated
as the process is created. When I was working on the issue a couple of
years ago, it was entirely predictable - user32.dll allocates N bytes
and as soon as N * numbackends exceeds the allocated heap size, we
fall over.
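
The arithmetic described above can be written down directly. This is a minimal sketch with hypothetical function names (max_backends_for_heap, desktop_heap_exhausted); the thread does not give actual values for N or for the heap size, so any numbers plugged in are made-up illustrations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Largest number of backends that fits when each process start consumes
 * per_process_bytes ("N" in the text) out of heap_bytes of desktop heap.
 */
static size_t
max_backends_for_heap(size_t heap_bytes, size_t per_process_bytes)
{
    assert(per_process_bytes > 0);
    return heap_bytes / per_process_bytes;
}

/* True once N * numbackends exceeds the allocated heap size. */
static bool
desktop_heap_exhausted(size_t heap_bytes, size_t per_process_bytes,
                       size_t numbackends)
{
    return per_process_bytes * numbackends > heap_bytes;
}
```

For example, with an illustrative 512 KB heap (524288 bytes) and an illustrative 3 KB (3072 bytes) per process, max_backends_for_heap gives 170, and desktop_heap_exhausted first becomes true at 171 backends. That fixed threshold is what makes this failure mode predictable, in contrast to the sporadic exit-128 reports.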

It shouldn't matter as desktop heap is allocated on a per-session
basis, but are you logging on using the service account to run your
admin tasks Cristian? If so, do you see the problem if you login
interactively using a different account?

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise Postgres Company

Re: [HACKERS] [BUGS] BUG #5305: Postgres service stops when closing Windows session

On Tue, Aug 31, 2010 at 3:40 PM, Cristian Bittel cbit...@gmail.com wrote:
 To Dave's question, this behavior occurs on all Windows Server interactive
 sessions, no matter if Administrators or underprivileged users, but is
 related to closing a Windows interactive session while a pgAdmin window is
 open and connected to the service. Nobody logs on to Windows using the postgres
 service user.

Thanks Cristian.

Can you reproduce the problem if you use psql instead of pgAdmin? Both
use libpq to talk to the server, so if your theory is correct, I would
expect to see the same crash. It's hard to see what would bring the
server down though...

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise Postgres Company

Re: [HACKERS] [BUGS] BUG #5305: Postgres service stops when closing Windows session

2010-08-31 Thread Bruce Momjian

Dave Page wrote:
 On Tue, Aug 31, 2010 at 3:40 PM, Cristian Bittel cbit...@gmail.com wrote:
  To Dave's question, this behavior occurs on all Windows Server interactive
  sessions, no matter if Administrators or underprivileged users, but is
  related to closing a Windows interactive session while a pgAdmin window is
  open and connected to the service. Nobody logs on to Windows using the postgres
  service user.
 
 Thanks Cristian.
 
 Can you reproduce the problem if you use psql instead of pgAdmin? Both
 use libpq to talk to the server, so if your theory is correct, I would
 expect to see the same crash. It's hard to see what would bring the
 server down though...

We have already found that exceeding desktop heap might cause a
CreateProcess to return success but later fail with a return code of
128, which causes a server restart.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

Re: [HACKERS] [BUGS] BUG #5305: Postgres service stops when closing Windows session

On Tue, Aug 31, 2010 at 4:27 PM, Bruce Momjian br...@momjian.us wrote:
 Dave Page wrote:
 On Tue, Aug 31, 2010 at 3:40 PM, Cristian Bittel cbit...@gmail.com wrote:
  To Dave's question, this behavior occurs on all Windows Server interactive
  sessions, no matter if Administrators or underprivileged users, but is
  related to closing a Windows interactive session while a pgAdmin window is
  open and connected to the service. Nobody logs on to Windows using the postgres
  service user.

 Thanks Cristian.

 Can you reproduce the problem if you use psql instead of pgAdmin? Both
 use libpq to talk to the server, so if your theory is correct, I would
 expect to see the same crash. It's hard to see what would bring the
 server down though...

 We have already found that exceeding desktop heap might cause a
 CreateProcess to return success but later fail with a return code of
 128, which causes a server restart.

That doesn't mean that this is desktop heap exhaustion though - just
that it can cause the same effect.

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise Postgres Company

Re: [HACKERS] [BUGS] BUG #5305: Postgres service stops when closing Windows session

2010-08-31 Thread Bruce Momjian

Dave Page wrote:
 On Tue, Aug 31, 2010 at 4:27 PM, Bruce Momjian br...@momjian.us wrote:
  Dave Page wrote:
  On Tue, Aug 31, 2010 at 3:40 PM, Cristian Bittel cbit...@gmail.com wrote:
   To Dave's question: this behavior occurs in all Windows Server interactive
   sessions, whether the user is an Administrator or unprivileged, but it is
   tied to closing a Windows interactive session while a pgAdmin window is
   open and connected to the service. Nobody logs on to Windows using the
   postgres service user.
 
  Thanks Cristian.
 
  Can you reproduce the problem if you use psql instead of pgAdmin? Both
  use libpq to talk to the server, so if your theory is correct, I would
  expect to see the same crash. It's hard to see what would bring the
  server down though...
 
  We have already found that exceeding desktop heap might cause a
  CreateProcess to return success but later fail with a return code of
  128, which causes a server restart.
 
 That doesn't mean that this is desktop heap exhaustion though - just
 that it can cause the same effect.

Right, but it is the only possible server crash cause we have come up
with so far.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: [HACKERS] [BUGS] BUG #5305: Postgres service stops when closing Windows session

2010-08-31 Thread Cristian Bittel

I am the remote support person for a web application (Apache + PHP +
PostgreSQL; Postgres is isolated on one server and Apache runs on another)
installed at our client's site. Our client holds the Administrator account
on the Windows Server; I have only a limited-privilege Windows user for
monitoring. I have my own support superuser (not the postgres user) on the
Postgres database to monitor status and logs and to run statistics queries.

I can log in to the Windows Server only through Remote Desktop. My
interactive user cannot start or stop the PostgreSQL service or any other
service; only Administrators can.

From inside my unprivileged session on the Windows server I can open
pgAdmin and connect to the Postgres service. When I leave pgAdmin open and
connected to the Postgres service inside the Windows session (whether the
session is connected or disconnected) and I or someone else (an
Administrator) closes my session, that is when the PostgreSQL service
crashes. If I close pgAdmin normally inside the remote session, using the X
button, File > Exit, or Ctrl+Q, the PostgreSQL service is not affected.

This is the main reason I think the cause is pgAdmin.exe: when it is
forcibly shut down by the Windows session being terminated, it sends some
abnormal signal to the PostgreSQL service.

Apart from the abnormal signal that a forced pgAdmin shutdown might be
sending to the PostgreSQL service, the service itself could also handle
that behavior, using any of the approaches you are discussing, so that it
ignores that signal.

To Dave's question: this behavior occurs in all Windows Server interactive
sessions, whether the user is an Administrator or unprivileged, but it is
tied to closing a Windows interactive session while a pgAdmin window is
open and connected to the service. Nobody logs on to Windows using the
postgres service user.

Regards,

Cristian.



2010/8/31 Dave Page dp...@pgadmin.org

 On Sun, Aug 29, 2010 at 12:05 PM, Magnus Hagander mag...@hagander.net wrote:
  On Thu, Aug 26, 2010 at 22:59, Cristian Bittel cbit...@gmail.com wrote:
  I still believe this exit code 128 is related to pgAdmin being open
  while a Remote Desktop session is closed. I have a Windows login which
  is not an administrator, just an unprivileged user; it cannot start or
  stop services, only monitor. With a pgAdmin window open inside my
  disconnected session, if I, as Administrator, close that other
  disconnected session, Postgres exits with code 128.
 
  If the closing of a session on the remote desktop can affect a
  *service* then frankly that sounds like a serious isolation bug in
  Windows itself. The postmaster grabs the handle of the process when
  it's started and waits on that - that should never be affected by
  something in a different session.
 
  I think it's more likely that Windows just loses track when you
  terminate a lot of processes at once, and randomly kills off something
  - or at least *indicates* that something has been killed off.
 
  Did you reproduce this behavior?
 
  No, AFAIK nobody has managed to reproduce this behavior in any kind of
  consistent way. It's certainly been seen more than once in many
  places, but not consistently reproducible.

 This behaviour, no - but desktop heap exhaustion is very easy to
 reproduce. That's because the heap usage is caused by user32.dll which
 uses a consistent amount with each process started, which is allocated
 as the process is created. When I was working on the issue a couple of
 years ago, it was entirely predictable - user32.dll allocates N bytes
 and as soon as N * numbackends exceeds the allocated heap size, we
 fall over.

 It shouldn't matter as desktop heap is allocated on a per-session
 basis, but are you logging on using the service account to run your
 admin tasks Cristian? If so, do you see the problem if you login
 interactively using a different account?

 --
 Dave Page
 Blog: http://pgsnake.blogspot.com
 Twitter: @pgsnake

 EnterpriseDB UK: http://www.enterprisedb.com
 The Enterprise Postgres Company

Re: [HACKERS] [BUGS] BUG #5305: Postgres service stops when closing Windows session

On Tue, Aug 31, 2010 at 4:35 PM, Bruce Momjian br...@momjian.us wrote:
 Dave Page wrote:
 On Tue, Aug 31, 2010 at 4:27 PM, Bruce Momjian br...@momjian.us wrote:
  We have already found that exceeding desktop heap might cause a
  CreateProcess to return success but later fail with a return code of
  128, which causes a server restart.

 That doesn't mean that this is desktop heap exhaustion though - just
 that it can cause the same effect.

 Right, but it is the only possible server crash cause we have come up
 with so far.

Understood - I'm just unconvinced it's the cause - aside from the
point I made earlier about heap exhaustion being very predictable and
reproducible (which this issue apparently is not), when the server is
run under the SCM, it creates a logon session for that service alone
which has its own heap allocation which is entirely independent of
the allocation used by any interactive logon sessions.

So unless there's a major isolation bug in Windows, any desktop heap
usage in an interactive session for one user should have zero effect
on a non-interactive session for another user.
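
The predictability argument (a fixed per-process cost N, with failure as
soon as N * numbackends exceeds the heap allocated to the session) can be
sketched as plain arithmetic. This is only an illustration: the function
names are made up, and the 512 KiB heap and 4 KiB per-process figures are
placeholder assumptions, not measured Windows or PostgreSQL values.

```python
def max_backends(heap_bytes: int, per_process_bytes: int) -> int:
    """Largest backend count whose combined usage still fits in the heap."""
    return heap_bytes // per_process_bytes


def heap_exhausted(heap_bytes: int, per_process_bytes: int, numbackends: int) -> bool:
    """True once N * numbackends exceeds the heap allocated to the session."""
    return per_process_bytes * numbackends > heap_bytes


# Placeholder figures, chosen only to make the arithmetic visible.
HEAP = 512 * 1024   # assumed desktop heap for the service's session
N = 4 * 1024        # assumed per-process allocation

limit = max_backends(HEAP, N)
print(limit, heap_exhausted(HEAP, N, limit), heap_exhausted(HEAP, N, limit + 1))
```

With a fixed N the failure point is the same on every run, which is why
heap exhaustion on its own would be expected to reproduce consistently.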

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: [HACKERS] [BUGS] BUG #5305: Postgres service stops when closing Windows session

2010-08-31 Thread Bruce Momjian

Dave Page wrote:
 On Tue, Aug 31, 2010 at 4:35 PM, Bruce Momjian br...@momjian.us wrote:
  Dave Page wrote:
  On Tue, Aug 31, 2010 at 4:27 PM, Bruce Momjian br...@momjian.us wrote:
   We have already found that exceeding desktop heap might cause a
   CreateProcess to return success but later fail with a return code of
   128, which causes a server restart.
 
  That doesn't mean that this is desktop heap exhaustion though - just
  that it can cause the same effect.
 
  Right, but it is the only possible server crash cause we have come up
  with so far.
 
 Understood - I'm just unconvinced it's the cause - aside from the
 point I made earlier about heap exhaustion being very predictable and
 reproducible (which this issue apparently is not), when the server is
 run under the SCM, it creates a logon session for that service alone
 which has its own heap allocation which is entirely independent of
 the allocation used by any interactive logon sessions.
 
 So unless there's a major isolation bug in Windows, any desktop heap
 usage in an interactive session for one user should have zero effect
 on a non-interactive session for another user.

Well, the only description that we have ever heard that makes sense is
some kind of heap exhaustion, perhaps triggered by a Windows bug that
doesn't properly track heap allocations sometimes.

Of course, the cause might be aliens, but we don't have any evidence of
that either.  :-|

What we do know is that CreateProcess is returning success, and the
child is exiting with 128 no_such_child, and that logging out can
trigger it sometimes.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: [HACKERS] [BUGS] BUG #5305: Postgres service stops when closing Windows session

2010-08-29 Thread Magnus Hagander

On Thu, Aug 26, 2010 at 22:59, Cristian Bittel cbit...@gmail.com wrote:
 I still believe this exit code 128 is related to pgAdmin being open
 while a Remote Desktop session is closed. I have a Windows login which
 is not an administrator, just an unprivileged user; it cannot start or
 stop services, only monitor. With a pgAdmin window open inside my
 disconnected session, if I, as Administrator, close that other
 disconnected session, Postgres exits with code 128.

If the closing of a session on the remote desktop can affect a
*service* then frankly that sounds like a serious isolation bug in
Windows itself. The postmaster grabs the handle of the process when
it's started and waits on that - that should never be affected by
something in a different session.
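
As a rough, platform-neutral illustration of "grab the handle at start and
wait on it", here is a sketch using Python's subprocess module rather than
the Win32 calls the postmaster actually uses. The helper name
spawn_and_wait is made up for this example; the child is just a Python
one-liner that exits with the requested code.

```python
import subprocess
import sys


def spawn_and_wait(exit_code: int) -> int:
    """Start a child, keep its handle, wait on it, and return its exit code."""
    child = subprocess.Popen(
        [sys.executable, "-c", f"import sys; sys.exit({exit_code})"]
    )
    # The parent holds the Popen object (its handle to the child) from the
    # moment of creation, so the exit status it reads belongs to that child.
    return child.wait()


if __name__ == "__main__":
    print(spawn_and_wait(128))
```

The parent only ever learns the numeric exit status this way; it cannot
tell from the number alone whether the child's own code chose it or
something else terminated the process first.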

I think it's more likely that Windows just loses track when you
terminate a lot of processes at once, and randomly kills off something
- or at least *indicates* that something has been killed off.

 Did you reproduce this behavior?

No, AFAIK nobody has managed to reproduce this behavior in any kind of
consistent way. It's certainly been seen more than once in many
places, but not consistently reproducible.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] [BUGS] BUG #5305: Postgres service stops when closing Windows session

2010-08-26 Thread Robert Haas

On Tue, Aug 24, 2010 at 9:58 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Bruce Momjian br...@momjian.us writes:
 Robert Haas wrote:
 Yeah, that seems very plausible, although exactly how to verify I don't 
 know.

 And here is confirmation from the Microsoft web site:

       In some instances, calling GetExitCode() against the failed process
       indicates the following exit code:
       128L ERROR_WAIT_NO_CHILDREN - There are no child processes to wait for.

 Given the existence of the deadman switch mechanism (which I hadn't
 remembered when this thread started), I'm coming around to the idea that
 we could just treat exit(128) as nonfatal on Windows.  If for some
 reason the child hadn't died instantly at startup, the deadman switch
 would distinguish that from the case described here.

So do you want to code this up?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: [HACKERS] [BUGS] BUG #5305: Postgres service stops when closing Windows session

2010-08-26 Thread Tom Lane

Robert Haas robertmh...@gmail.com writes:
 On Tue, Aug 24, 2010 at 9:58 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Given the existence of the deadman switch mechanism (which I hadn't
 remembered when this thread started), I'm coming around to the idea that
 we could just treat exit(128) as nonfatal on Windows.  If for some
 reason the child hadn't died instantly at startup, the deadman switch
 would distinguish that from the case described here.

 So do you want to code this up?

Who, me?  I don't do Windows --- I'd have no way to test it.

regards, tom lane


Re: [HACKERS] [BUGS] BUG #5305: Postgres service stops when closing Windows session

2010-08-26 Thread Cristian Bittel

I still believe this exit code 128 is related to pgAdmin being open
while a Remote Desktop session is closed. I have a Windows login which
is not an administrator, just an unprivileged user; it cannot start or
stop services, only monitor. With a pgAdmin window open inside my
disconnected session, if I, as Administrator, close that other
disconnected session, Postgres exits with code 128.

Did you reproduce this behavior?

Cristian.

2010/8/26 Tom Lane t...@sss.pgh.pa.us

 Robert Haas robertmh...@gmail.com writes:
  On Tue, Aug 24, 2010 at 9:58 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  Given the existence of the deadman switch mechanism (which I hadn't
  remembered when this thread started), I'm coming around to the idea that
  we could just treat exit(128) as nonfatal on Windows.  If for some
  reason the child hadn't died instantly at startup, the deadman switch
  would distinguish that from the case described here.

  So do you want to code this up?

 Who, me?  I don't do Windows --- I'd have no way to test it.

regards, tom lane

Re: [HACKERS] [BUGS] BUG #5305: Postgres service stops when closing Windows session

Robert Haas wrote:
 [moving to -hackers]
 
 On Thu, Aug 19, 2010 at 9:43 PM, Robert Haas robertmh...@gmail.com wrote:
  I suspect this is the same problem as bug #4897, and probably also the
  same problem as this:
  http://archives.postgresql.org/pgsql-bugs/2009-08/msg00114.php
 
  and maybe also this and this:
  http://archives.postgresql.org/pgsql-bugs/2010-02/msg00179.php
  http://archives.postgresql.org/pgsql-admin/2009-05/msg00105.php
 
  Unfortunately, it seems that no one has been able to get a stack trace yet.
 
 Bruce pointed out yet another report of this problem to me:
 
 http://archives.postgresql.org/pgsql-general/2010-08/msg00550.php
 
 After some discussion with Magnus, I think what is going on here is
 that the postmaster kicks off a new child process, which terminates
 before it actually starts running our code, either in OS-supplied code
 or some sort of filter like anti-spam or anti-virus software.  It's
 presumably NOT dying in our code because - at least AFAICS - we don't
 exit(128) anywhere.  One way we could possibly improve the situation
 is to not treat this as a child crash - that is, don't do a
 crash-and-restart cycle; just treat that backend as having done
 elog(FATAL).  The trick is that you need a reliable way to distinguish
 between a regular child crash and an early child crash.  Magnus
 suggested perhaps we could create a mutex that the child grabs before
 mapping shared memory; the postmaster could check whether the mutex
 had been taken.  If so, we handle the crash normally; if not, we just
 chalk it up to experience and continue on.
 
 This isn't really a fix for the bug in the sense that the nicest
 thing of all would be to prevent the child from exiting abnormally in
 the first place.  But it's far from clear that we can control that.

This URL has some interesting details on our problem:

http://stackoverflow.com/questions/139090/getexitcodeprocess-returns-128

Error code 128 is identified as:

error code 128 ERROR_WAIT_NO_CHILDREN 128 0x80 There are no child
processes to wait for

and the suggested cause is:

Have a look at Desktop Heap memory.

Essentially the desktop heap issue comes down to exhausted resources (eg
starting too many processes). When your app runs out of these resources,
one of the symptoms is that you won't be able to start a new process,
and the call to CreateProcess will fail with code 128.

My guess is that at the time of CreateProcess(), there is enough desktop
heap memory, but at some later time, perhaps caused by a logout, there
isn't and the process never gets started.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: [HACKERS] [BUGS] BUG #5305: Postgres service stops when closing Windows session

On Tue, Aug 24, 2010 at 8:57 AM, Bruce Momjian br...@momjian.us wrote:
 Robert Haas wrote:
 [moving to -hackers]

 On Thu, Aug 19, 2010 at 9:43 PM, Robert Haas robertmh...@gmail.com wrote:
  I suspect this is the same problem as bug #4897, and probably also the
  same problem as this:
  http://archives.postgresql.org/pgsql-bugs/2009-08/msg00114.php
 
  and maybe also this and this:
  http://archives.postgresql.org/pgsql-bugs/2010-02/msg00179.php
  http://archives.postgresql.org/pgsql-admin/2009-05/msg00105.php
 
  Unfortunately, it seems that no one has been able to get a stack trace yet.

 Bruce pointed out yet another report of this problem to me:

 http://archives.postgresql.org/pgsql-general/2010-08/msg00550.php

 After some discussion with Magnus, I think what is going on here is
 that the postmaster kicks off a new child process, which terminates
 before it actually starts running our code, either in OS-supplied code
 or some sort of filter like anti-spam or anti-virus software.  It's
 presumably NOT dying in our code because - at least AFAICS - we don't
 exit(128) anywhere.  One way we could possibly improve the situation
 is to not treat this as a child crash - that is, don't do a
 crash-and-restart cycle; just treat that backend as having done
 elog(FATAL).  The trick is that you need a reliable way to distinguish
 between a regular child crash and an early child crash.  Magnus
 suggested perhaps we could create a mutex that the child grabs before
 mapping shared memory; the postmaster could check whether the mutex
 had been taken.  If so, we handle the crash normally; if not, we just
 chalk it up to experience and continue on.

 This isn't really a fix for the bug in the sense that the nicest
 thing of all would be to prevent the child from exiting abnormally in
 the first place.  But it's far from clear that we can control that.

 This URL has some interesting details on our problem:

        
        http://stackoverflow.com/questions/139090/getexitcodeprocess-returns-128

 Error code 128 is identified as:

        error code 128 ERROR_WAIT_NO_CHILDREN 128 0x80 There are no child
        processes to wait for

 and the suggested cause is:

        Have a look at Desktop Heap memory.

        Essentially the desktop heap issue comes down to exhausted resources
        (eg starting too many processes). When your app runs out of these
        resources, one of the symptoms is that you won't be able to start a
        new process, and the call to CreateProcess will fail with code 128.

 My guess is that at the time of CreateProcess(), there is enough desktop
 heap memory, but at some later time, perhaps caused by a logout, there
 isn't and the process never gets started.

Yeah, that seems very plausible, although exactly how to verify I don't know.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: [HACKERS] [BUGS] BUG #5305: Postgres service stops when closing Windows session

Robert Haas wrote:
  This isn't really a fix for the bug in the sense that the nicest
  thing of all would be to prevent the child from exiting abnormally in
  the first place.  But it's far from clear that we can control that.
 
  This URL has some interesting details on our problem:
 
         http://stackoverflow.com/questions/139090/getexitcodeprocess-returns-128
 
  Error code 128 is identified as:
 
         error code 128 ERROR_WAIT_NO_CHILDREN 128 0x80 There are no child
         processes to wait for
 
  and the suggested cause is:
 
         Have a look at Desktop Heap memory.
 
         Essentially the desktop heap issue comes down to exhausted resources
         (eg starting too many processes). When your app runs out of these
         resources, one of the symptoms is that you won't be able to start a
         new process, and the call to CreateProcess will fail with code 128.
 
  My guess is that at the time of CreateProcess(), there is enough desktop
  heap memory, but at some later time, perhaps caused by a logout, there
  isn't and the process never gets started.
 
 Yeah, that seems very plausible, although exactly how to verify I don't know.

And here is confirmation from the Microsoft web site:

http://support.microsoft.com/kb/156484

Cmd.exe, Perl.exe, or other console-mode applications may fail to
initialize properly and terminate prematurely when launched by a service
using the CreateProcess() or CreateProcessAsUser() APIs. The calling
process has no way of knowing that the launched console-mode application
has terminated prematurely.

In some instances, calling GetExitCode() against the failed process
indicates the following exit code:
128L ERROR_WAIT_NO_CHILDREN - There are no child processes to wait for. 
...
Internet Information Server (IIS) may exhibit this problem
intermittently when processing CGI or Perl scripts. In this case the
browser returns the following error when executing CGI scripts:

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: [HACKERS] [BUGS] BUG #5305: Postgres service stops when closing Windows session

2010-08-24 Thread Tom Lane

Bruce Momjian br...@momjian.us writes:
 Robert Haas wrote:
 Yeah, that seems very plausible, although exactly how to verify I don't know.

 And here is confirmation from the Microsoft web site:

   In some instances, calling GetExitCode() against the failed process
   indicates the following exit code:
   128L ERROR_WAIT_NO_CHILDREN - There are no child processes to wait for. 

Given the existence of the deadman switch mechanism (which I hadn't
remembered when this thread started), I'm coming around to the idea that
we could just treat exit(128) as nonfatal on Windows.  If for some
reason the child hadn't died instantly at startup, the deadman switch
would distinguish that from the case described here.
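
To make the proposal concrete, here is a minimal sketch of the decision it
implies. It is written in Python purely for illustration and is not taken
from the postmaster source; classify_child_exit, on_windows and
deadman_tripped are hypothetical names, with deadman_tripped standing in
for "the deadman switch reports that the child had started running and
did not clean up".

```python
ERROR_WAIT_NO_CHILDREN = 128  # the exit code discussed in this thread


def classify_child_exit(exit_code: int, on_windows: bool, deadman_tripped: bool) -> str:
    """Decide how the parent should react to a child's exit (illustrative only)."""
    if deadman_tripped:
        # The child got far enough to be tracked and then vanished without
        # cleaning up: treat as a real crash regardless of the exit code.
        return "crash"
    if exit_code == 0:
        return "normal"
    if on_windows and exit_code == ERROR_WAIT_NO_CHILDREN:
        # Child died before running any of our code: log it and carry on.
        return "nonfatal"
    return "crash"


print(classify_child_exit(128, on_windows=True, deadman_tripped=False))
print(classify_child_exit(128, on_windows=True, deadman_tripped=True))
```

The ordering matters in this sketch: the deadman check comes first, so an
exit code of 128 from a child that had already started work is still
handled as a crash.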

regards, tom lane


Re: [HACKERS] [BUGS] BUG #5305: Postgres service stops when closing Windows session

Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  Robert Haas wrote:
  Yeah, that seems very plausible, although exactly how to verify I don't 
  know.
 
  And here is confirmation from the Microsoft web site:
   
  In some instances, calling GetExitCode() against the failed process
  indicates the following exit code:
  128L ERROR_WAIT_NO_CHILDREN - There are no child processes to wait for. 
 
 Given the existence of the deadman switch mechanism (which I hadn't
 remembered when this thread started), I'm coming around to the idea that
 we could just treat exit(128) as nonfatal on Windows.  If for some
 reason the child hadn't died instantly at startup, the deadman switch
 would distinguish that from the case described here.

Agreed.  My guess is that there is some kind of Win32 OS race condition
in allocating desktop heap memory, and that sometimes with concurrent
CreateProcess() calls, a process gets started but can't complete its
creation.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: [HACKERS] [BUGS] BUG #5305: Postgres service stops when closing Windows session

Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  Robert Haas wrote:
  Yeah, that seems very plausible, although exactly how to verify I don't 
  know.
 
  And here is confirmation from the Microsoft web site:
   
  In some instances, calling GetExitCode() against the failed process
  indicates the following exit code:
  128L ERROR_WAIT_NO_CHILDREN - There are no child processes to wait for. 
 
 Given the existence of the deadman switch mechanism (which I hadn't
 remembered when this thread started), I'm coming around to the idea that
 we could just treat exit(128) as nonfatal on Windows.  If for some
 reason the child hadn't died instantly at startup, the deadman switch
 would distinguish that from the case described here.

Here is a more detailed explanation of the failure and its relation to
desktop heap:

http://kbalertz.com/Feedback.aspx?kbNumber=184802

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: [HACKERS] [BUGS] BUG #5305: Postgres service stops when closing Windows session

2010-08-24 Thread Magnus Hagander

On Tue, Aug 24, 2010 at 15:58, Tom Lane t...@sss.pgh.pa.us wrote:
 Bruce Momjian br...@momjian.us writes:
 Robert Haas wrote:
 Yeah, that seems very plausible, although exactly how to verify I don't 
 know.

 And here is confirmation from the Microsoft web site:

       In some instances, calling GetExitCode() against the failed process
       indicates the following exit code:
       128L ERROR_WAIT_NO_CHILDREN - There are no child processes to wait for.

 Given the existence of the deadman switch mechanism (which I hadn't
 remembered when this thread started), I'm coming around to the idea that
 we could just treat exit(128) as nonfatal on Windows.  If for some
 reason the child hadn't died instantly at startup, the deadman switch
 would distinguish that from the case described here.

Just because I had written it before you posted that, here's how the
win32-specific-set-a-flag-when-we're-in-control thing would look. But
if we're convinced that just ignoring error 128 is safe, then that's
obviously a simpler patch..
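
For readers without the attachment, the general shape of a
"set a flag when we're in control" scheme can be sketched as below. This
is not the contents of the attached patch; it is a Python illustration
with hypothetical names (ChildSlot, child_startup, handle_child_exit), and
the real mechanism would need a flag the postmaster can read from another
process.

```python
from dataclasses import dataclass


@dataclass
class ChildSlot:
    """Per-child record owned by the parent (hypothetical structure)."""
    in_control: bool = False


def child_startup(slot: ChildSlot) -> None:
    """First thing the child does once our own code is running."""
    slot.in_control = True


def handle_child_exit(slot: ChildSlot, exit_code: int) -> str:
    """Parent-side reaction when a child goes away (illustrative only)."""
    if exit_code == 0:
        return "normal"
    if not slot.in_control:
        # The child never reached our code, so it cannot have touched
        # shared state; no crash-and-restart cycle is needed.
        return "early-death"
    return "crash"


never_started = ChildSlot()
started = ChildSlot()
child_startup(started)
print(handle_child_exit(never_started, 128), handle_child_exit(started, 128))
```

Compared with simply ignoring exit code 128, this variant does not depend
on the particular code Windows reports, at the cost of extra state to
maintain per child.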

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


win32_early_death.patch
Description: Binary data


Re: [HACKERS] [BUGS] BUG #5305: Postgres service stops when closing Windows session

Magnus Hagander wrote:
 On Tue, Aug 24, 2010 at 15:58, Tom Lane t...@sss.pgh.pa.us wrote:
  Bruce Momjian br...@momjian.us writes:
  Robert Haas wrote:
  Yeah, that seems very plausible, although exactly how to verify I don't 
  know.
 
  And here is confirmation from the Microsoft web site:
 
        In some instances, calling GetExitCode() against the failed process
        indicates the following exit code:
        128L ERROR_WAIT_NO_CHILDREN - There are no child processes to wait
        for.
 
  Given the existence of the deadman switch mechanism (which I hadn't
  remembered when this thread started), I'm coming around to the idea that
  we could just treat exit(128) as nonfatal on Windows.  If for some
  reason the child hadn't died instantly at startup, the deadman switch
  would distinguish that from the case described here.
 
 Just because I had written it before you posted that, here's how the
 win32-specific-set-a-flag-when-we're-in-control thing would look. But
 if we're convinced that just ignoring error 128 is safe, then that's
 obviously a simpler patch..

Can we please link to one of those URLs I mentioned so we have
definitive information on what is happening?  I think the Microsoft URL is
best:

http://support.microsoft.com/kb/156484

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: [HACKERS] [BUGS] BUG #5305: Postgres service stops when closing Windows session

2010-08-24 Thread Magnus Hagander

On Tue, Aug 24, 2010 at 21:14, Bruce Momjian br...@momjian.us wrote:
 Magnus Hagander wrote:
 On Tue, Aug 24, 2010 at 15:58, Tom Lane t...@sss.pgh.pa.us wrote:
  Bruce Momjian br...@momjian.us writes:
  Robert Haas wrote:
  Yeah, that seems very plausible, although exactly how to verify I don't 
  know.
 
  And here is confirmation from the Microsoft web site:
 
        In some instances, calling GetExitCode() against the failed process
        indicates the following exit code:
        128L ERROR_WAIT_NO_CHILDREN - There are no child processes to wait
        for.
 
  Given the existence of the deadman switch mechanism (which I hadn't
  remembered when this thread started), I'm coming around to the idea that
  we could just treat exit(128) as nonfatal on Windows.  If for some
  reason the child hadn't died instantly at startup, the deadman switch
  would distinguish that from the case described here.

 Just because I had written it before you posted that, here's how the
 win32-specific-set-a-flag-when-we're-in-control thing would look. But
 if we're convinced that just ignoring error 128 is safe, then that's
 obviously a simpler patch..

 Can we please link to one of those URLs I mentioned so we have
 definitive information on what is happening?  I think the Microsoft URL is
 best:

        http://support.microsoft.com/kb/156484

That URL is specifically labeled to only be valid for NT4.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] [BUGS] BUG #5305: Postgres service stops when closing Windows session

On Tue, Aug 24, 2010 at 3:10 PM, Magnus Hagander mag...@hagander.net wrote:
 On Tue, Aug 24, 2010 at 15:58, Tom Lane t...@sss.pgh.pa.us wrote:
 Bruce Momjian br...@momjian.us writes:
 Robert Haas wrote:
 Yeah, that seems very plausible, although exactly how to verify I don't 
 know.

 And here is confirmation from the Microsoft web site:

       In some instances, calling GetExitCode() against the failed process
       indicates the following exit code:
       128L ERROR_WAIT_NO_CHILDREN - There are no child processes to wait 
 for.

 Given the existence of the deadman switch mechanism (which I hadn't
 remembered when this thread started), I'm coming around to the idea that
 we could just treat exit(128) as nonfatal on Windows.  If for some
 reason the child hadn't died instantly at startup, the deadman switch
 would distinguish that from the case described here.

 Just because I had written it before you posted that, here's how the
 win32-specific-set-a-flag-when-we're-in-control thing would look. But
 if we're convinced that just ignoring error 128 is safe, then that's
 obviously a simpler patch..

So, if we do this, what will happen to the client connection that was
due to be handled by the backend being spawned?  Is this going to lead
to extra fds accumulating or any such thing?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: [HACKERS] [BUGS] BUG #5305: Postgres service stops when closing Windows session

2010-08-24 Thread Magnus Hagander

On Tue, Aug 24, 2010 at 21:39, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Aug 24, 2010 at 3:10 PM, Magnus Hagander mag...@hagander.net wrote:
 On Tue, Aug 24, 2010 at 15:58, Tom Lane t...@sss.pgh.pa.us wrote:
 Bruce Momjian br...@momjian.us writes:
 Robert Haas wrote:
 Yeah, that seems very plausible, although exactly how to verify I don't 
 know.

 And here is confirmation from the Microsoft web site:

       In some instances, calling GetExitCode() against the failed process
       indicates the following exit code:
       128L ERROR_WAIT_NO_CHILDREN - There are no child processes to wait for.

 Given the existence of the deadman switch mechanism (which I hadn't
 remembered when this thread started), I'm coming around to the idea that
 we could just treat exit(128) as nonfatal on Windows.  If for some
 reason the child hadn't died instantly at startup, the deadman switch
 would distinguish that from the case described here.

 Just because I had written it before you posted that, here's how the
 win32-specific-set-a-flag-when-we're-in-control thing would look. But
 if we're convinced that just ignoring error 128 is safe, then that's
 obviously a simpler patch..

 So, if we do this, what will happen to the client connection that was
 due to be handled by the backend being spawned?  Is this going to lead
 to extra fds accumulating or any such thing?

I don't see why. The process goes away, and with it goes all the
handles. And the postmaster still closes all sockets and handles the
same way it did before.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] [BUGS] BUG #5305: Postgres service stops when closing Windows session

On Tue, Aug 24, 2010 at 9:58 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Bruce Momjian br...@momjian.us writes:
 Robert Haas wrote:
 Yeah, that seems very plausible, although exactly how to verify I don't 
 know.

 And here is confirmation from the Microsoft web site:

       In some instances, calling GetExitCode() against the failed process
       indicates the following exit code:
       128L ERROR_WAIT_NO_CHILDREN - There are no child processes to wait for.

 Given the existence of the deadman switch mechanism (which I hadn't
 remembered when this thread started), I'm coming around to the idea that
 we could just treat exit(128) as nonfatal on Windows.  If for some
 reason the child hadn't died instantly at startup, the deadman switch
 would distinguish that from the case described here.

So the options are:

(1) If running on Windows and the exit code is 128 and the deadman
switch is not engaged, don't crash-and-restart.
(2) If running on Windows, create a mutex in the parent process and
take it in the child; if the mutex has not been taken, don't
crash-and-restart.

There is some amount of user code (I'm not sure precisely how much)
that runs after shared memory is mapped and before the deadman switch
is engaged.  If we go with option #1, it would probably behoove us to
try to minimize the amount of such code (at least in HEAD).  There is
probably not a great deal of danger that we could manage to scribble
on shared memory and then exit normally (rather than via signal),
never mind the need to exit with exactly 128.  But not a great deal
is not the same as none.  If we go with option #2, the principal
danger seems to be that the code Magnus wrote will turn out to be less
robust than we might hope; for example, it might not work on all
versions of Windows, or be prone to some other installation-dependent
mischief.
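
Option (1) boils down to a small decision in the postmaster's child-exit handling. Below is a minimal sketch in C, not the actual patch: the names `classify_child_exit`, `ChildVerdict`, and `EXIT_WAIT_NO_CHILDREN` are hypothetical, the platform and deadman-switch state are passed in as plain booleans, and every other nonzero exit code is lumped together as a crash, which is a simplification of what the real postmaster does.

```c
#include <stdbool.h>

/* Exit code Windows reports as ERROR_WAIT_NO_CHILDREN (hypothetical macro name). */
#define EXIT_WAIT_NO_CHILDREN 128

/* Hypothetical outcome type for this sketch. */
typedef enum
{
    CHILD_OK,             /* normal exit, nothing to do */
    CHILD_IGNORE,         /* died before touching shared state; log and move on */
    CHILD_CRASH_RESTART   /* assume shared memory may be corrupt */
} ChildVerdict;

/*
 * Decide how to react to a child's exit status under option (1).
 * Exit code 128 is tolerated only on Windows and only when the
 * deadman switch was never engaged for that child.
 */
static ChildVerdict
classify_child_exit(int exit_code, bool on_windows, bool deadman_engaged)
{
    if (exit_code == 0)
        return CHILD_OK;

    if (on_windows &&
        exit_code == EXIT_WAIT_NO_CHILDREN &&
        !deadman_engaged)
        return CHILD_IGNORE;

    /* Simplification: any other nonzero exit counts as a crash here. */
    return CHILD_CRASH_RESTART;
}
```

Under these rules, a child that exits with 128 after engaging the deadman switch still forces a crash-and-restart, which is the residual risk discussed above.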

Another question is how far either of these fixes could be
back-patched.  I believe the dead-man switch only exists as far back
as 8.4, but the original commit message mentioned the possibility of
eventually back-patching it further:

    Although this problem is of long standing, the lack of field complaints
    seems to mean it's not critical enough to risk back-patching; at least
    not till we get some more testing of this mechanism.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: [HACKERS] [BUGS] BUG #5305: Postgres service stops when closing Windows session

2010-08-24 Thread Tom Lane

Robert Haas robertmh...@gmail.com writes:
 There is some amount of user code (I'm not sure precisely how much)
 that runs after shared memory is mapped and before the deadman switch
 is engaged.

Er ... what would you define as user code?

The deadman switch is engaged at the point where we create a PGPROC.
Before that, it's entirely impossible to take either LWLocks or
heavyweight locks, which means that practically any access to shared
memory would be illegal anyway.  If there's anything very interesting
going on in that stretch, I'd be surprised.
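
The lifecycle described above can be modelled as a per-child slot that the parent reserves before spawning and that the child flips to "active" at the point it creates its PGPROC. This is a simplified, hypothetical model for illustration only: the slot array, `MAX_CHILDREN`, and all function names are assumptions, not PostgreSQL's implementation.

```c
#include <stdbool.h>

#define MAX_CHILDREN 4          /* hypothetical fixed slot count */

typedef enum { SLOT_UNUSED, SLOT_ASSIGNED, SLOT_ACTIVE } SlotState;

/* Stands in for an array living in shared memory. */
static SlotState slots[MAX_CHILDREN];

/* Parent: reserve a slot before spawning a child; returns -1 if none is free. */
static int
assign_slot(void)
{
    for (int i = 0; i < MAX_CHILDREN; i++)
    {
        if (slots[i] == SLOT_UNUSED)
        {
            slots[i] = SLOT_ASSIGNED;
            return i;
        }
    }
    return -1;
}

/* Child: engage the switch at the point it creates its PGPROC. */
static void
mark_active(int slot)
{
    slots[slot] = SLOT_ACTIVE;
}

/* Child: disengage the switch as part of a clean shutdown. */
static void
mark_inactive(int slot)
{
    slots[slot] = SLOT_ASSIGNED;
}

/*
 * Parent: called when reaping the child.  Frees the slot and returns true
 * if the switch was not engaged at exit, i.e. the child either never got
 * as far as shared memory or shut down cleanly.
 */
static bool
release_slot(int slot)
{
    bool clean = (slots[slot] == SLOT_ASSIGNED);

    slots[slot] = SLOT_UNUSED;
    return clean;
}
```

A child that dies instantly at startup never calls `mark_active`, so `release_slot` reports it as clean regardless of the exit code it returned.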

regards, tom lane


Re: [HACKERS] [BUGS] BUG #5305: Postgres service stops when closing Windows session