[patch] recognise when an exec()d process terminates due to unhandled exception

2008-03-13 Thread Brian Dessent

As we all know, Cygwin calls SetErrorMode (SEM_FAILCRITICALERRORS) to
suppress those pop up GUI messageboxes from the operating system when a
process encounters an unhandled exception.  This has the advantage of
making things more POSIX-like, and I'm sure people that run long
testsuites or unattended headless servers appreciate not coming in after
a long weekend to find that their server has been wedged for days waiting
for someone to click on OK.

But of course if you follow the user list you also know that this is a
double edged sword, in that currently if a required DLL is missing there
is zero indication other than a curiously arbitrary exit status code of
53 decimal, or 0x35 hex, which is the low byte of 0xC0000135,
STATUS_DLL_NOT_FOUND.
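
To make the arithmetic concrete, here is a minimal sketch of how 53 falls
out of the NTSTATUS value.  The constant is the documented Windows value
for STATUS_DLL_NOT_FOUND; the helper name low_byte is made up for this
illustration, and treating the truncation as a plain 8-bit mask is an
assumption about how the status gets narrowed, not a quote from the Cygwin
sources.

```python
# Documented Windows NTSTATUS value for a missing DLL.
STATUS_DLL_NOT_FOUND = 0xC0000135


def low_byte(ntstatus: int) -> int:
    """Return the low 8 bits, assumed to be all that survives in $?."""
    return ntstatus & 0xFF


print(low_byte(STATUS_DLL_NOT_FOUND))       # 53
print(hex(low_byte(STATUS_DLL_NOT_FOUND)))  # 0x35
```

Everything that identifies the failure lives in the upper bits, which is
why the 53 seen by the shell looks arbitrary.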

Anyway, the attached patch fixes all that by adding logic to let the
actual NTSTATUS value percolate up to the waiting parent, so that it can
recognise these kinds of common(ish) faults and print a friendly message
-- or at least something other than silently dying with no output.

After printing the message, the NTSTATUS is discarded and the exit
status code is replaced with a synthetic exit status corresponding to
"killed by SIGKILL" as read by the wait()ing parent, which means the
shell will also append "Killed" to that message.  I tried 0x80 |
SIGSEGV, corresponding to "segmentation fault" and "core dumped", but
since we aren't actually generating a core file, it seemed a little
weird to see the shell say that there was one generated.  The point here
is that the exit status that the parent (in most cases the shell) sees
is totally arbitrary, so we can put whatever makes the most sense
there.  I just figured that the shell printing "Killed" most closely
corresponds to the actual situation of the OS terminating the process
due to an unhandled exception.
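
For anyone who wants to see how a wait()ing parent reads those two
candidate values, here is a rough sketch.  It assumes the traditional Unix
wait-status layout (signal number in the low 7 bits, 0x80 as the core
flag) and a POSIX Python where the os.W* helpers exist; the function name
describe is invented for the example and nothing here is taken from the
patch itself.

```python
import os
import signal


def describe(status: int) -> str:
    """Decode a raw wait() status roughly the way a shell would report it."""
    if os.WIFSIGNALED(status):
        name = signal.Signals(os.WTERMSIG(status)).name
        core = " (core dumped)" if os.WCOREDUMP(status) else ""
        return f"killed by {name}{core}"
    if os.WIFEXITED(status):
        return f"exited with {os.WEXITSTATUS(status)}"
    return "stopped or continued"


# The synthetic status chosen in the patch description.
killed = int(signal.SIGKILL)
# The alternative that was tried first: SIGSEGV plus the core-dump flag.
segv_core = 0x80 | int(signal.SIGSEGV)

print(describe(killed))     # killed by SIGKILL
print(describe(segv_core))  # killed by SIGSEGV (core dumped)
```

The second line of output is the misleading part: the status claims a core
file that was never written.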

There are three specific cases that I had in mind to handle with a
graceful message:

1. the user is missing a DLL
2. the DLL that is found is missing symbols
3. relocs in the .rdata section

In addition to catching and hopefully explaining those, it also prints a
generic default case for any other exception code.
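
The dispatch amounts to a lookup from NTSTATUS to message with a fallback.
The sketch below is only an illustration of that shape, not the patch's
C++ code: the three constants are the documented Windows values, but
pairing case 2 with STATUS_ENTRYPOINT_NOT_FOUND and case 3 with
STATUS_ACCESS_VIOLATION is my reading of the list above, the message
strings are shortened paraphrases, and the name explain is hypothetical.

```python
# Documented NTSTATUS values; the mapping to the three cases is an assumption.
STATUS_ACCESS_VIOLATION = 0xC0000005
STATUS_DLL_NOT_FOUND = 0xC0000135
STATUS_ENTRYPOINT_NOT_FOUND = 0xC0000139

MESSAGES = {
    STATUS_DLL_NOT_FOUND:
        "one or more required DLLs cannot be located",
    STATUS_ENTRYPOINT_NOT_FOUND:
        "an entry point for one or more symbols could not be found",
    STATUS_ACCESS_VIOLATION:
        "unhandled access violation (possibly relocs in .rdata)",
}


def explain(ntstatus: int) -> str:
    """Return a friendly message, or a generic one for any other code."""
    default = f"terminated by unhandled exception 0x{ntstatus:08X}"
    return MESSAGES.get(ntstatus, default)


print(explain(STATUS_DLL_NOT_FOUND))
print(explain(0xC0000096))  # not in the table, so the generic text is used
```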

Also, I'm attaching a Makefile that will create a test executable for
each of the three cases above.  It's totally standalone: you can type
"make check" and it will build and run the checks.  This is what the
current output looks like:

$ ./dll_not_found
dll_not_found.exe: one or more DLLs that this program requires cannot be
located by the system.  Make sure the PATH is correct and re-run the
setup program to install any packages indicated as necessary to satisfy
library dependencies.
Killed

$ ./missing_import
missing_import.exe: an entry point for one of more symbols could not be
found during program initialization.  Usually this means an incorrect
or out of date version of one or more DLLs is being erroniously found
on the PATH.
Killed

$ ./rdata_relocs 
rdata_relocs.exe: the process encountered an unhandled access violation
fault.
If this happens immediately and consistently at process startup,
one likely cause is relocs in the .rdata section as a result of
the runtime pseudo-reloc feature being applied to data imports
in 'const' structures.  Relinking with a linker script that marks
the .rdata section writeable can solve this problem.
Killed

In all three of these cases, the current behavior is that you would get
a GUI popup box from csrss.exe if you ran them from strace (since strace
does not call SetErrorMode() and that setting is inherited) but you get
absolutely no indication of an error if you run them from a Cygwin
process... other than $? if you know to check it.

I'm not 100% convinced that the change to the sigproc/pinfo stuff is
totally correct and safe, as it's pretty involved code and I had to
scratch my head for a while to figure out how everything interacts.  So
please do kick the tires.

BTW, when you *do* get the GUI popup messageboxes, they are helpful in
that they identify the precise DLL that's missing or the function that
isn't present, etc.  I was really hoping to figure out a cool way to get
that info, perhaps by poking around in the TEB or PEB somewhere, but I
haven't gotten that far.  If anyone has any general ideas where to look
for NTLDR's internal state, I'm all ears.  I have a hunch it would be
possible to get if we were running the exec'd process in a debugger loop
and pumping WaitForDebugEvent() messages, since those can have
parameters attached to exception codes.  But that's a little too
extreme.

Brian

2008-03-13  Brian Dessent  [EMAIL PROTECTED]

* ntdll.h: Add several missing NTSTATUS defines.
* pinfo.cc (pinfo::maybe_set_exit_code_from_windows): Recognise
and preserve the exit status of a child that terminated with
an unhandled exception.
(pinfo::exit): Make the whole NTSTATUS exit code available to
the wait()ing parent when an exec()d process fails due to
an unhandled exception.
* pinfo.h (class _pinfo): Fix 

Re: [patch] recognise when an exec()d process terminates due to unhandled exception

2008-03-13 Thread Eric Blake

According to Brian Dessent on 3/13/2008 7:45 PM:
| Anyway, the attached patch fixes all that by adding logic to let the
| actual NTSTATUS value percolate up to the waiting parent, so that it can
| recognise these kinds of common(ish) faults and print a friendly message
| -- or at least something other than silently dying with no output.

Cool!  However, I haven't looked at the patch itself, yet.

| $ ./dll_not_found
| dll_not_found.exe: one or more DLLs that this program requires cannot be
| located by the system.  Make sure the PATH is correct and re-run the
| setup program to install any packages indicated as necessary to satisfy
| library dependencies.
| Killed

Should we also mention 'cygcheck ./dll_not_found' to find out which ones
are missing?

|
| $ ./missing_import
| missing_import.exe: an entry point for one of more symbols could not be
| found during program initialization.  Usually this means an incorrect
| or out of date version of one or more DLLs is being erroniously found
| on the PATH.
| Killed

s/erroniously/erroneously/

--
Don't work too hard, make some time for fun as well!

Eric Blake [EMAIL PROTECTED]

Re: [patch] recognise when an exec()d process terminates due to unhandled exception

2008-03-13 Thread Brian Dessent

Eric Blake wrote:

> Should we also mention 'cygcheck ./dll_not_found' to find out which ones
> are missing?

It might be a good idea.  On the other hand it's kind of long already. 
I'm totally not married to what I've got for the wording though,
consider it a very rough draft.

> | missing_import.exe: an entry point for one of more symbols could not be
> | found during program initialization.  Usually this means an incorrect
> | or out of date version of one or more DLLs is being erroniously found
> | on the PATH.
> | Killed
>
> s/erroniously/erroneously/

Drat, and s/one of more/one or more/ as well.

Brian