Re: [patch] recognise when an exec()d process terminates due to unhandled exception

2008-03-23 Thread Brian Dessent
Christopher Faylor wrote:

 After poking at this a little, I think it would be better to issue a
 linux-like error message.
 
 In my sandbox, I now have this:
 
 bash-3.2$ ./libtest
 /cygdrive/s/test/libtest.exe: error while loading shared libraries: liba.dll: 
 cannot open shared object file: No such file or directory
 
 I haven't done the work to report on missing symbols yet but I think
 that's a much less common error condition.

Excellent.  The wording isn't really that important to me.   But I think
what is important is that we don't allow the situation where something
was unable to start and we are totally silent.  That leads to confusion
because people start to try to debug or blame the program being run when
in fact the program never saw the light of day in the first place.  It's
totally baffling when it happens and you're not aware to look for it. 
So even if we can't give the name of the symbol in the case of a missing
import, I think it's still important to say something.

Brian


Re: [patch] recognise when an exec()d process terminates due to unhandled exception

2008-03-23 Thread Christopher Faylor
On Sun, Mar 23, 2008 at 01:48:15PM -0700, Brian Dessent wrote:
Christopher Faylor wrote:
After poking at this a little, I think it would be better to issue a
linux-like error message.

In my sandbox, I now have this:

bash-3.2$ ./libtest /cygdrive/s/test/libtest.exe: error while loading
shared libraries: liba.dll: cannot open shared object file: No such
file or directory

I haven't done the work to report on missing symbols yet but I think
that's a much less common error condition.

Excellent.  The wording isn't really that important to me.  But I think
what is important is that we don't allow the situation where something
was unable to start and we are totally silent.  That leads to confusion
because people start to try to debug or blame the program being run
when in fact the program never saw the light of day in the first place.
It's totally baffling when it happens and you're not aware to look for
it.  So even if we can't give the name of the symbol in the case of a
missing import, I think it's still important to say something.

Yes.  I really have been meaning to fix this for a long time.  It's my
fault that cygwin has this bug.  I appreciate your point the way to
how this could be solved.

cgf


Re: [patch] recognise when an exec()d process terminates due to unhandled exception

2008-03-23 Thread Christopher Faylor
On Sun, Mar 23, 2008 at 08:09:06PM -0400, Christopher Faylor wrote:
On Sun, Mar 23, 2008 at 01:48:15PM -0700, Brian Dessent wrote:
Christopher Faylor wrote:
After poking at this a little, I think it would be better to issue a
linux-like error message.

In my sandbox, I now have this:

bash-3.2$ ./libtest /cygdrive/s/test/libtest.exe: error while loading
shared libraries: liba.dll: cannot open shared object file: No such
file or directory

I haven't done the work to report on missing symbols yet but I think
that's a much less common error condition.

Excellent.  The wording isn't really that important to me.  But I think
what is important is that we don't allow the situation where something
was unable to start and we are totally silent.  That leads to confusion
because people start to try to debug or blame the program being run
when in fact the program never saw the light of day in the first place.
It's totally baffling when it happens and you're not aware to look for
it.  So even if we can't give the name of the symbol in the case of a
missing import, I think it's still important to say something.

Yes.  I really have been meaning to fix this for a long time.  It's my
fault that cygwin has this bug.  I appreciate your point the way to
 ing
how this could be solved.

cgf


Re: [patch] recognise when an exec()d process terminates due to unhandled exception

2008-03-22 Thread Christopher Faylor
On Fri, Mar 14, 2008 at 11:11:43PM -0700, Brian Dessent wrote:
Christopher Faylor wrote:
That was going to be my first observation, actually.  I'm still trying
to digest the patch but it seems like it wouldn't work well with the
fork retry code.

The patch doesn't change any behavior though: in current Cygwin if the
thing we're exec()ing returns a Win32 exit code of C135 (or
whatever) then we retry creating it 5 times.  With the patch, we still
retry starting it 5 times but after the last one fails we recognise the
situation and print a message.

After poking at this a little, I think it would be better to issue a
linux-like error message.

In my sandbox, I now have this:

bash-3.2$ ./libtest
/cygdrive/s/test/libtest.exe: error while loading shared libraries: liba.dll: 
cannot open shared object file: No such file or directory

I haven't done the work to report on missing symbols yet but I think
that's a much less common error condition.

So thanks for the patch and the concept but I think I'd rather go with a
more linux-like solution.

cgf


Re: [patch] recognise when an exec()d process terminates due to unhandled exception

2008-03-22 Thread Christopher Faylor
On Fri, Mar 14, 2008 at 04:08:46AM -0700, Brian Dessent wrote:
Brian Dessent wrote:

 isn't present, etc.  I was really hoping to figure out a cool way to get
 that info, perhaps by poking around in the TEB or PEB somewhere, but I
 haven't gotten that far.  If anyone has any general ideas where to look
 for NTLDR's internal state, I'm all ears.  I have a hunch it would be
 possible to get if we were running the exec'd process in a debugger loop
 and pumping WaitForDebugEvent() messages, since those can have
 parameters attached to exception codes.  But that's a little too
 extreme.

For anyone curious, it's absolutely possible to do the above.  For the
C139 fault (missing procedure point entry), %ebx at the time of the
fault points right at the AsciiZ name of the missing import in the
.idata section, -8(%ebp) points to the import name in UNICODE, and
-10(%ebp) points to the DLL name in UNICODE.

For the C135 fault (the unable to locate component popup), %esi at
the time of the fault points right to the missing library name in
UNICODE.

For the C005 fault (the LDR hits an access violation trying to fixup
a reloc .rdata), %ebx points to an AsciiZ name of the symbol it was
relocating and 24(%ebp) points to an AsciiZ filename of the module which
that symbol is supposed to be pointing into.

Now I'm sure a lot of those above offsets are just coincidental, as I
haven't done much testing to see if it's reliably set as above.  However
it does mean that it would be relatively easy to use the debug API to
step a process through its initialization and find out exactly why it's
faulting.  I've been working on something along those lines for cygcheck
which will also give dynamic process tracing, i.e. runtime LoadLibrary
stuff.  Combined with enabling the LDR snaps debug output, there is a
tremendous amount of debug capability hidden here.

I took a more low-tech and undoubtedly less foolproof approach and just
stepped through the list of dlls for the failing executable, trying to
load them one at a time, reporting on the first dll to fail.  This is
not necessarily the same scenario as what the Windows dynamic linker
is doing but it should report an *a* missing dll if not necessarily the
dll that was causing the problem.  It also won't necessarily get the
search order right if the executable being started isn't in the same
directory as the parent executable.  That's fixable, though.

cgf


Re: [patch] recognise when an exec()d process terminates due to unhandled exception

2008-03-15 Thread Brian Dessent
Christopher Faylor wrote:

 That was going to be my first observation, actually.  I'm still trying
 to digest the patch but it seems like it wouldn't work well with the
 fork retry code.

The patch doesn't change any behavior though: in current Cygwin if the
thing we're exec()ing returns a Win32 exit code of C135 (or
whatever) then we retry creating it 5 times.  With the patch, we still
retry starting it 5 times but after the last one fails we recognise the
situation and print a message.

Brian


Re: [patch] recognise when an exec()d process terminates due to unhandled exception

2008-03-14 Thread Brian Dessent
Brian Dessent wrote:

 isn't present, etc.  I was really hoping to figure out a cool way to get
 that info, perhaps by poking around in the TEB or PEB somewhere, but I
 haven't gotten that far.  If anyone has any general ideas where to look
 for NTLDR's internal state, I'm all ears.  I have a hunch it would be
 possible to get if we were running the exec'd process in a debugger loop
 and pumping WaitForDebugEvent() messages, since those can have
 parameters attached to exception codes.  But that's a little too
 extreme.

For anyone curious, it's absolutely possible to do the above.  For the
C139 fault (missing procedure point entry), %ebx at the time of the
fault points right at the AsciiZ name of the missing import in the
.idata section, -8(%ebp) points to the import name in UNICODE, and
-10(%ebp) points to the DLL name in UNICODE.

For the C135 fault (the unable to locate component popup), %esi at
the time of the fault points right to the missing library name in
UNICODE.

For the C005 fault (the LDR hits an access violation trying to fixup
a reloc .rdata), %ebx points to an AsciiZ name of the symbol it was
relocating and 24(%ebp) points to an AsciiZ filename of the module which
that symbol is supposed to be pointing into.

Now I'm sure a lot of those above offsets are just coincidental, as I
haven't done much testing to see if it's reliably set as above.  However
it does mean that it would be relatively easy to use the debug API to
step a process through its initialization and find out exactly why it's
faulting.  I've been working on something along those lines for cygcheck
which will also give dynamic process tracing, i.e. runtime LoadLibrary
stuff.  Combined with enabling the LDR snaps debug output, there is a
tremendous amount of debug capability hidden here.

Brian


Re: [patch] recognise when an exec()d process terminates due to unhandled exception

2008-03-14 Thread Corinna Vinschen
On Mar 14 04:08, Brian Dessent wrote:
 Brian Dessent wrote:
 
  isn't present, etc.  I was really hoping to figure out a cool way to get
  that info, perhaps by poking around in the TEB or PEB somewhere, but I
  haven't gotten that far.  If anyone has any general ideas where to look
  for NTLDR's internal state, I'm all ears.  I have a hunch it would be
  possible to get if we were running the exec'd process in a debugger loop
  and pumping WaitForDebugEvent() messages, since those can have
  parameters attached to exception codes.  But that's a little too
  extreme.
 
 For anyone curious, it's absolutely possible to do the above.  For the
 C139 fault (missing procedure point entry), %ebx at the time of the
 fault points right at the AsciiZ name of the missing import in the
 .idata section, -8(%ebp) points to the import name in UNICODE, and
 -10(%ebp) points to the DLL name in UNICODE.
 
 For the C135 fault (the unable to locate component popup), %esi at
 the time of the fault points right to the missing library name in
 UNICODE.
 
 For the C005 fault (the LDR hits an access violation trying to fixup
 a reloc .rdata), %ebx points to an AsciiZ name of the symbol it was
 relocating and 24(%ebp) points to an AsciiZ filename of the module which
 that symbol is supposed to be pointing into.
 
 Now I'm sure a lot of those above offsets are just coincidental, as I
 haven't done much testing to see if it's reliably set as above.  However
 it does mean that it would be relatively easy to use the debug API to
 step a process through its initialization and find out exactly why it's
 faulting.  I've been working on something along those lines for cygcheck
 which will also give dynamic process tracing, i.e. runtime LoadLibrary
 stuff.  Combined with enabling the LDR snaps debug output, there is a
 tremendous amount of debug capability hidden here.

That's really cool.  Your patch looks good, but it's Chris' code so
he will have the final say.

What we also could do instead of adding this to the DLL is to add this
to cygcheck and/or strace only.  If somebody complains on the list that
a process just exits, we can point him to run it under cygcheck and it
will tell you what's wrong.  That would be already quite nice, imho.


Corinna

-- 
Corinna Vinschen  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader  cygwin AT cygwin DOT com
Red Hat


Re: [patch] recognise when an exec()d process terminates due to unhandled exception

2008-03-14 Thread Christopher Faylor
On Thu, Mar 13, 2008 at 07:46:37PM -0700, Brian Dessent wrote:
Brian Dessent wrote:

 As we all know, Cygwin calls SetErrorMode (SEM_FAILCRITICALERRORS) to
 suppress those pop up GUI messageboxes from the operating system when 

Oh, I forgot to mention:

In the course of testing this I came to realize that because of some
sort of retry if fork doesn't seem to be working code (not sure of the
details), every time that this situation comes up we are actually
launching five copies of the binary.

That was going to be my first observation, actually.  I'm still trying
to digest the patch but it seems like it wouldn't work well with the
fork retry code.

cgf


Re: [patch] recognise when an exec()d process terminates due to unhandled exception

2008-03-13 Thread Eric Blake

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

According to Brian Dessent on 3/13/2008 7:45 PM:
| Anyway, the attached patch fixes all that by adding logic to let the
| actual NTSTATUS logic percolate up to the waiting parent, so that it can
| recognise these kinds of common(ish) faults and print a friendly message
| -- or at least something other than silently dieing with no output.

Cool!  However, I haven't looked at the patch itself, yet.

| $ ./dll_not_found
| dll_not_found.exe: one or more DLLs that this program requires cannot be
| located by the system.  Make sure the PATH is correct and re-run the
| setup program to install any packages indicated as necessary to satisfy
| library dependencies.
| Killed

Should we also mention 'cygcheck ./dll_not_found' to find out which ones
are missing?

|
| $ ./missing_import
| missing_import.exe: an entry point for one of more symbols could not be
| found during program initialization.  Usually this means an incorrect
| or out of date version of one or more DLLs is being erroniously found
| on the PATH.
| Killed

s/erroniously/erroneously/

- --
Don't work too hard, make some time for fun as well!

Eric Blake [EMAIL PROTECTED]
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.8 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkfZ21gACgkQ84KuGfSFAYCjQwCfdmAdES3oXUTF0rf9eMFCvDBJ
SbIAn1xfTEwKHZDUAloRo4VdvEt99xWJ
=W9DL
-END PGP SIGNATURE-


Re: [patch] recognise when an exec()d process terminates due to unhandled exception

2008-03-13 Thread Brian Dessent
Eric Blake wrote:

 Should we also mention 'cygcheck ./dll_not_found' to find out which ones
 are missing?

It might be a good idea.  On the other hand it's kind of long already. 
I'm totally not married to what I've got for the wording though,
consider it a very rough draft.

 | missing_import.exe: an entry point for one of more symbols could not be
 | found during program initialization.  Usually this means an incorrect
 | or out of date version of one or more DLLs is being erroniously found
 | on the PATH.
 | Killed
 
 s/erroniously/erroneously/

Drat and s/one of more/one or more/ as well.

Brian