Re: [PATCH] better stackdumps

2008-03-20 Thread Corinna Vinschen
On Mar 19 08:56, Brian Dessent wrote:
 Christopher Faylor wrote:
 
  Sorry, but I don't like this concept.  This bloats the cygwin DLL for a
  condition that would be better served by either using gdb or generating
  a real coredump.
 
 I hear you, but part of the motivation for writing this was a recent
 thread the other week on the gdb list where the poster asked how to get
 symbols in a Cygwin stackdump file.  I suggested the same thing, setting
 error_start=dumper to get a real core dump.  They did, and the result
 was completely useless.  Here is what dumper gives you for the same
 simple testcase:
 [...]
 addr2line also seems to be totally unequipped to deal with separate .dbg
 information, as I can't get it to output a thing even though both a.exe
 and cygwin1.dll have full debug symbols:
 
 $ addr2line -e a.exe 0x610F74B1
 ??:0

Is it a big problem to fix addr2line to deal with .dbg files?

I like your idea to add names to the stackdump especially because of
addr2line's brokenness.  But, actually, if addr2line would work with
.dbg files, there would be no reason to add this to the stackdump file.


Corinna

-- 
Corinna Vinschen  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader  cygwin AT cygwin DOT com
Red Hat


Re: [PATCH] better stackdumps

2008-03-20 Thread Christopher Faylor
On Thu, Mar 20, 2008 at 11:35:32AM +0100, Corinna Vinschen wrote:
On Mar 19 08:56, Brian Dessent wrote:
 Christopher Faylor wrote:
 
  Sorry, but I don't like this concept.  This bloats the cygwin DLL for a
  condition that would be better served by either using gdb or generating
  a real coredump.
 
 I hear you, but part of the motivation for writing this was a recent
 thread the other week on the gdb list where the poster asked how to get
 symbols in a Cygwin stackdump file.  I suggested the same thing, setting
 error_start=dumper to get a real core dump.  They did, and the result
 was completely useless.  Here is what dumper gives you for the same
 simple testcase:
 [...]
 addr2line also seems to be totally unequipped to deal with separate .dbg
 information, as I can't get it to output a thing even though both a.exe
 and cygwin1.dll have full debug symbols:
 
 $ addr2line -e a.exe 0x610F74B1
 ??:0

Is it a big problem to fix addr2line to deal with .dbg files?

I like your idea to add names to the stackdump especially because of
addr2line's brokenness.  But, actually, if addr2line would work with
.dbg files, there would be no reason to add this to the stackdump file.

There's still the issue of dealing with the separate signal stack.  That
makes stack dumps less than useful.

However, I would really love it if gdb was able to decode this information
automatically.

The bottom line is that I think that rather than modifying cygwin to
work around the limitations of the tools we should be fixing the tools.

But, then, that puts the problem back on my shoulders as the gdb and
binutils maintainer.

PTC.

cgf


addr2line [ Was: better stackdumps ]

2008-03-20 Thread Brian Dessent
Corinna Vinschen wrote:

 Is it a big problem to fix addr2line to deal with .dbg files?
 
 I like your idea to add names to the stackdump especially because of
 addr2line's brokenness.  But, actually, if addr2line would work with
 .dbg files, there would be no reason to add this to the stackdump file.

I absolutely agree that addr2line and/or dumper and/or gdb should be
fixed, regardless of this patch.  I never meant to imply an either/or
situation, and in fact I have debugged addr2line and here are the
reasons it's broken:

First, it's got nothing to do with the .gnu_debuglink separate debug file;
that part works just fine.  Second, addr2line only loads the debug
information for the module that you supply with -e, meaning that if you
give -e a.exe it will look at symbols for a.exe, but it doesn't know
that a.exe is dynamically linked to cygwin1.dll and it won't try to load
symbols for cygwin1.dll.  This means to use it you need to know
beforehand which module the address is in, which right there makes it
kind of a pain to use for DLLs, and to me it rather dilutes the argument
that you can just postprocess a stackdump file with it since you need
more information than what's there.
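To make the "which module is this address in" step concrete, here is a minimal sketch of the lookup that has to happen before addr2line can be pointed at the right file with -e.  It is not code from the patch: the Module struct, the function name, and any base addresses or sizes fed to it are assumptions for illustration only.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical description of one loaded module: name, load address, image size.
struct Module {
    std::string name;
    std::uint32_t base;
    std::uint32_t size;
};

// Return the name of the module whose image range contains addr,
// or an empty string if no listed module covers it.
std::string module_for_address(const std::vector<Module>& mods, std::uint32_t addr) {
    for (const Module& m : mods) {
        if (addr >= m.base && addr - m.base < m.size)
            return m.name;
    }
    return std::string();
}
```

A plain stackdump only lists addresses, so this table of modules is exactly the extra information the reader would have to supply by hand.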

The next problem is that addr2line first tries to read STABS, and if
that fails it falls back to DWARF-2.  I always build Cygwin and most
other things with DWARF-2 debug symbols, mainly to make sure they work
but really aren't we eventually hoping to get rid of STABS?  Anyway,
this exposed another problem in that even if you build all of Newlib and
Cygwin with -gdwarf-2 or -ggdb3, you still get a handful of STABS symbols
which are hardcoded in various assembler files:

mktemp.cc:20:  asm (".stabs \"" msg "\",30,0,0,0\n\t" \
mktemp.cc:21:  ".stabs \"_" #symbol "\",1,0,0,0\n");

This is used to insert a linktime warning for using mktemp().

sigfe.s:3:  .stabs  "_sigfe:F(0,1)",36,0,0,__sigfe
sigfe.s:44: .stabs  "_sigbe:F(0,1)",36,0,0,__sigbe
sigfe.s:70: .stabs  "sigreturn:F(0,1)",36,0,0,_sigreturn
sigfe.s:108:.stabs  "sigdelayed:F(0,1)",36,0,0,_sigdelayed

This becomes a problem in that when bfd tries to find an address in the
debug data it sees these minimal STABS and considers them a match --
even though they are mostly irrelevant, they are present and since it's
only got an address to go by it doesn't know that there is a much better
match in the DWARF-2 data.  It just sees that it has gotten a (bad)
match, so it doesn't bother looking in the DWARF-2 data.  And since
those hand-coded .stabs above only give symbol name locations, not line
number information, that means that regardless of what you ask addr2line
it's going to return nothing because it only cares about line number
info.

I see two potential fixes here, the first being that Cygwin could be
adapted to not hardcode .stabs but rather detect whether it's being
built with DWARF-2 or STABS and use the appropriate kind.  The other fix
is to teach BFD to try DWARF-2 first before STABS.  The attached patch
does this, for the purposes of illustration -- I don't really claim this
is correct.
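The effect of the lookup order can be shown with a toy model.  This is only a sketch of the "first table that matches wins" behaviour described above, not BFD's actual code; the DebugRecord and DebugTable types, the function names, and the nearest-preceding-record matching rule are all assumptions made for illustration.

```cpp
#include <cstdint>
#include <map>
#include <string>

// One debug record: a function name plus optional line information.
struct DebugRecord {
    std::string function;
    std::string file;   // empty when the record has no line information
    unsigned line;      // 0 when the record has no line information
};

// Hypothetical table keyed by the start address of each record.
typedef std::map<std::uint32_t, DebugRecord> DebugTable;

// Return the record with the greatest start address <= addr, or nullptr.
const DebugRecord* nearest_at_or_below(const DebugTable& table, std::uint32_t addr) {
    DebugTable::const_iterator it = table.upper_bound(addr);
    if (it == table.begin())
        return nullptr;
    --it;
    return &it->second;
}

// Consult the tables in the given order; the first one that yields any
// match wins, even if that match carries no line information.
const DebugRecord* lookup(const DebugTable& first, const DebugTable& second,
                          std::uint32_t addr) {
    const DebugRecord* hit = nearest_at_or_below(first, addr);
    return hit ? hit : nearest_at_or_below(second, addr);
}
```

Calling lookup(stabs, dwarf, addr) reproduces the failure mode: a lone hand-coded STABS record shadows the better DWARF-2 record.  Swapping the arguments to lookup(dwarf, stabs, addr) models the "try DWARF-2 first" ordering.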

Once that is applied, here is the result of running the patched
addr2line on the addresses in the stackdump of this testcase:

$ for F in 610F74B1 610FDD3B 6110A310 610AA4A8 61006094; do \
    /build/combined/binutils/.libs/addr2line.exe -e /bin/cygwin1.dll -f \
    0x$F; done
??
??:0
_vfprintf_r
/usr/src/sourceware/newlib/libc/stdio/vfprintf.c:1197
printf
/usr/src/sourceware/newlib/libc/stdio/printf.c:55
??
??:0
_Z10dll_crt0_1Pv
/usr/src/sourceware/winsup/cygwin/dcrt0.cc:930

It now gets 3 out of 5 correct.  It got tripped up on _sigbe because
again addr2line only cares about line number info, not general address
information, and while there is information for the location of _sigbe,
it doesn't contain line number info:

(gdb) i ad _sigbe
Symbol "_sigbe" is at 0x610aa4a8 in a file compiled without debugging.

For the top frame (strlen), addr2line could not print anything because
while there is location information, there is no line number
information:

(gdb) i li *0x610F74B1
No line number information available for address 0x610f74b1 <strlen+17>

This is due to the fact that strlen is implemented in newlib as
libc/machine/i386/strlen.S which is a straight assembler version, and
hence no line number debug records.



*** To summarize thus far:

1. addr2line can be made to work again by one of a) dictating the use of
STABS (boo!), b) modifying Cygwin to not emit hardcoded .stabs
directives directly, c) modifying BFD to prefer DWARF-2 to STABS when
reading COFF files.

2. addr2line requires the user to know beforehand which DLL a symbol is
in, because it can't resolve runtime dependencies.

3. addr2line only cares about line number debug records, which means it
will be incapable of representing many symbols.

4. As an implication of 3), addr2line is totally useless on DLLs/EXEs
without debug information available.



I think point number 4 is worth repeating: we as developers take for
granted having debug 

Re: addr2line [ Was: better stackdumps ]

2008-03-20 Thread Christopher Faylor
On Thu, Mar 20, 2008 at 11:23:05AM -0700, Brian Dessent wrote:
[...]

Re: addr2line [ Was: better stackdumps ]

2008-03-20 Thread Brian Dessent
Brian Dessent wrote:

 I think I see what's going on here though, the Cygwin fault handler took
 the first chance exception and wrote the stackdump file, and only then
 passed it on to the debugger, so that by the time gdb got notice of the
 fault the stack was all fubar.  This could be the reason why dumper is
 not working too.  I thought there was a IsBeingDebugged() check in the

Silly me, this is good old "set cygwin-exceptions" defaulting to off...
of course gdb was ignoring the fault and letting Cygwin handle it.  With
it set to on everything works as expected, and the issue of why the
process state that dumper records is so trashed is unrelated.

Brian


[PATCH] stackdump rev2

2008-03-20 Thread Brian Dessent
Brian Dessent wrote:

 Yes, it means there is one frame that says sigbe instead of the actual
 return location somewhere else.  I don't think that's impossible to fix
 either: the fault handler gets the context of the faulting thread so it
 can look up its tls area through %fs:4 and peek at the top of the signal
 stack for the value.  I will investigate if this is workable.

In fact, since the fault handler runs in the context of the thread that
faulted, this turns out to be trivial.  I started with an implementation
that calls GetThreadSelectorEntry to resolve the %fs:4 in the CONTEXT,
but it turned out to always be equal to simply _my_tls.  This updated
version of the patch simply substitutes _my_tls.retaddr() in place of
thestack.sf.AddrPC.Offset if it is equal to _sigbe, allowing for proper
unwinding through these frames.  I tested this when the faulting thread
is the main thread as well as a user created thread and it seems to do
the right thing.  Am I missing something hideous here?
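The substitution itself can be sketched in isolation.  The following is a simplified model, not the patch: fix_sigbe_frames and its parameters are hypothetical names, the frame addresses are plain integers rather than a real stack walk, and a single saved return address stands in for what _my_tls.retaddr() would supply.

```cpp
#include <cstdint>
#include <vector>

// Replace every frame PC that equals the address of _sigbe with the return
// address saved on the signal stack, so the unwinder reports the real caller.
// saved_retaddr is a stand-in for the value _my_tls.retaddr() would return.
std::vector<std::uint32_t> fix_sigbe_frames(std::vector<std::uint32_t> pcs,
                                            std::uint32_t sigbe_addr,
                                            std::uint32_t saved_retaddr) {
    for (std::uint32_t& pc : pcs) {
        if (pc == sigbe_addr)
            pc = saved_retaddr;
    }
    return pcs;
}
```

This models only the single-value case described above; it makes no attempt to handle several stacked signal-wrapper frames with distinct saved addresses.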

Example when it's the main thread:

Exception: STATUS_ACCESS_VIOLATION at eip=610F7501
eax= ebx= ecx= edx=FEED esi= edi=FEED
ebp=0022C568 esp=0022C564 
program=\\?\C:\cygwin\home\brian\testcases\backtrace\a.exe, pid 7720, thread 
main
cs=001B ds=0023 es=0023 fs=003B gs= ss=0023
Stack trace:
Frame Function  Args
0022C568  610F7501  (FEED, 0022C676, 00402008, 0001) 
cygwin1.dll!_strlen+0x11
0022CC98  610FDD8B  (0022D008, 6111C668, 00402000, 0022CCC8) 
cygwin1.dll!_fputc+0x34EB
0022CCB8  6110A360  (00402000, FEED, 0009, 00401065) 
cygwin1.dll!_printf+0x30
0022CCE8  00401084  (0001, 61290908, 00680098, ) a.exe+0x84
0022CD98  61006094  (, 0022CDD0, 61005430, 0022CDD0) 
cygwin1.dll!_dll_crt0_1+0xC64
End of stack trace

Example when it's a user created thread:

Exception: STATUS_ACCESS_VIOLATION at eip=610F7501
eax= ebx= ecx= edx=FEED esi= edi=FEED
ebp=1886C5E8 esp=1886C5E4 
program=\\?\C:\cygwin\home\brian\testcases\backtrace\tc2.exe, pid 8108, thread 
unknown (0x1BA4)
cs=001B ds=0023 es=0023 fs=003B gs= ss=0023
Stack trace:
Frame Function  Args
1886C5E8  610F7501  (FEED, 1886C6F6, 0040200F, 0001) 
cygwin1.dll!_strlen+0x11
1886CD18  610FDD8B  (1886D008, 6111C668, 00402005, 1886CD48) 
cygwin1.dll!_fputc+0x34EB
1886CD38  6110A360  (00402005, FEED, 1886CD58, 611004C0) 
cygwin1.dll!_printf+0x30
1886CD58  0040106C  (00402012, 000A, 1886CD78, 0040109E) tc2.exe+0x6C
1886CD68  00401085  (00402017, 1886CE64, 1886CDB8, 610C85AB) tc2.exe+0x85
1886CD78  0040109E  (, , 611FD6B0, 6100415A) tc2.exe+0x9E
1886CDB8  610C85AB  (006901F0, 1886CDF0, 610C8530, 1886CDF0) 
cygwin1.dll!pthread::thread_init_wrapper+0x7B
End of stack trace

As you can see I also added special-casing for the thread_init_wrapper
function that forms the bottom of the stack for user created threads.
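For readers wondering how labels such as cygwin1.dll!_strlen+0x11 can be produced without debug information, here is a minimal sketch of a nearest-preceding-export lookup.  It is an illustration under assumptions, not the patch's prettyprint_va: the symbolize function and the pre-built export map are hypothetical, and the real code would read the export table from the loaded module.

```cpp
#include <cstdint>
#include <cstdio>
#include <map>
#include <string>

// Format addr as "module!symbol+0xOFFSET" using the export with the greatest
// address <= addr.  The exports map (address -> name) is assumed to have been
// filled in from the module's export table beforehand.  If no export precedes
// addr, only the module name is returned.
std::string symbolize(const std::string& module,
                      const std::map<std::uint32_t, std::string>& exports,
                      std::uint32_t addr) {
    std::map<std::uint32_t, std::string>::const_iterator it = exports.upper_bound(addr);
    if (it == exports.begin())
        return module;
    --it;
    char buf[32];
    std::snprintf(buf, sizeof buf, "+0x%X", static_cast<unsigned>(addr - it->first));
    return module + "!" + it->second + buf;
}
```

One consequence of this approach is that an address inside a non-exported function gets attributed to whichever export precedes it, which may be why a frame can show up with a large offset such as _fputc+0x34EB.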

Christopher Faylor wrote:

 That's not a patch that I can approve, unfortunately.

That's okay, it was just illustrative anyway.  I think I can fix this
purely in Cygwin by simply not emitting any .stabs if the effective
CFLAGS indicates DWARF-2 is desired.  That's next on my plate.

Brian

2008-03-20  Brian Dessent  [EMAIL PROTECTED]

* exceptions.cc (maybe_adjust_va_for_sigfe): New function to cope
with signal wrappers.
(prettyprint_va): New function that attempts to find a symbolic
name for a memory location by walking the export sections of all
modules.
(stackdump): Call it.  Use the signal frame return address
instead of _sigbe.
* gendef: Mark __sigfe as a global so that its address can be
used by the backtrace code.
* ntdll.h (struct _PEB_LDR_DATA): Declare.
(struct _LDR_MODULE): Declare.
(struct _PEB): Use actual LDR_DATA type for LdrData.
(RtlImageDirectoryEntryToData): Declare.

Index: exceptions.cc
===
RCS file: /cvs/src/src/winsup/cygwin/exceptions.cc,v
retrieving revision 1.319
diff -u -p -r1.319 exceptions.cc
--- exceptions.cc   12 Mar 2008 12:41:49 -  1.319
+++ exceptions.cc   20 Mar 2008 21:06:07 -
@@ -284,6 +284,159 @@ stack_info::walk ()
   return 1;
 }
 
+/* These symbols are used by the below functions to put a prettier face
+   on a stack backtrace.  */
+extern u_char etext asm ("etext");  /* End of .text */
+extern u_char thread_init_wrapper asm ([EMAIL PROTECTED]);
+extern u_char _sigfe, _sigbe;
+void dll_crt0_1 (void *);
+
+const struct {
+  DWORD va;
+  const char *label;
+} hints[] = {
+  { (DWORD) thread_init_wrapper, "pthread::thread_init_wrapper" },
+  { (DWORD) dll_crt0_1, "_dll_crt0_1" }
+};
+
+/* Helper function to assist with backtraces.  This tries to detect if
+   an entrypoint is really a sigfe wrapper and returns the actual address
+   of the function.  Here's an example:
+
+   610ab9f0 __sigfe_printf:
+