Re: [PATCH] better stackdumps
On Mar 19 08:56, Brian Dessent wrote:
> Christopher Faylor wrote:
> > Sorry, but I don't like this concept. This bloats the cygwin DLL for
> > a condition that would be better served by either using gdb or
> > generating a real coredump.
>
> I hear you, but part of the motivation for writing this was a recent
> thread the other week on the gdb list where the poster asked how to
> get symbols in a Cygwin stackdump file. I suggested the same thing,
> setting error_start=dumper to get a real core dump. They did, and the
> result was completely useless. Here is what dumper gives you for the
> same simple testcase:
> [...]
> addr2line also seems to be totally unequipped to deal with separate
> .dbg information, as I can't get it to output a thing even though both
> a.exe and cygwin1.dll have full debug symbols:
>
>   $ addr2line -e a.exe 0x610F74B1
>   ??:0

Is it a big problem to fix addr2line to deal with .dbg files?

I like your idea to add names to the stackdump especially because of
addr2line's brokenness. But, actually, if addr2line would work with
.dbg files, there would be no reason to add this to the stackdump file.

Corinna

--
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
Re: [PATCH] better stackdumps
On Thu, Mar 20, 2008 at 11:35:32AM +0100, Corinna Vinschen wrote:
> On Mar 19 08:56, Brian Dessent wrote:
> > [...]
> >   $ addr2line -e a.exe 0x610F74B1
> >   ??:0
>
> Is it a big problem to fix addr2line to deal with .dbg files?
>
> I like your idea to add names to the stackdump especially because of
> addr2line's brokenness. But, actually, if addr2line would work with
> .dbg files, there would be no reason to add this to the stackdump file.

There's still the issue of dealing with the separate signal stack. That
makes stack dumps less than useful.

However, I would really love it if gdb was able to decode this
information automatically.

The bottom line is that I think that rather than modifying cygwin to
work around the limitations of the tools we should be fixing the tools.
But, then, that puts the problem back on my shoulders as the gdb and
binutils maintainer.

PTC.

cgf
addr2line [ Was: better stackdumps ]
Corinna Vinschen wrote:
> Is it a big problem to fix addr2line to deal with .dbg files?
>
> I like your idea to add names to the stackdump especially because of
> addr2line's brokenness. But, actually, if addr2line would work with
> .dbg files, there would be no reason to add this to the stackdump file.

I absolutely agree that addr2line and/or dumper and/or gdb should be
fixed, regardless of this patch. I never meant to imply an either/or
situation, and in fact I have debugged addr2line and here are the
reasons it's broken:

Firstly, it's got nothing to do with the .gnu_debuglink separate debug
file; that part works just fine.

And secondly, addr2line only loads the debug information for the module
that you supply with -e, meaning that if you give -e a.exe it will look
at symbols for a.exe, but it doesn't know that a.exe is dynamically
linked to cygwin1.dll and it won't try to load symbols for cygwin1.dll.
This means that to use it you need to know beforehand which module the
address is in, which right there makes it kind of a pain to use for
DLLs, and to me it rather dilutes the argument that you can just
postprocess a stackdump file with it, since you need more information
than what's there.

The next problem is that addr2line first tries to read STABS, and if
that fails it falls back to DWARF-2. I always build Cygwin and most
other things with DWARF-2 debug symbols, mainly to make sure they work,
but really, aren't we eventually hoping to get rid of STABS? Anyway,
this exposed another problem in that even if you build all of Newlib
and Cygwin with -gdwarf-2 or -ggdb3, you still get a handful of STABS
symbols which are hardcoded in various assembler files:

  mktemp.cc:20:  asm (".stabs \"" msg "\",30,0,0,0\n\t" \
  mktemp.cc:21:  ".stabs \"_" #symbol "\",1,0,0,0\n");

This is used to insert a link-time warning for using mktemp().
  sigfe.s:3:  .stabs "_sigfe:F(0,1)",36,0,0,__sigfe
  sigfe.s:44: .stabs "_sigbe:F(0,1)",36,0,0,__sigbe
  sigfe.s:70: .stabs "sigreturn:F(0,1)",36,0,0,_sigreturn
  sigfe.s:108:.stabs "sigdelayed:F(0,1)",36,0,0,_sigdelayed

This becomes a problem in that when bfd tries to find an address in the
debug data it sees these minimal STABS and considers them a match --
even though they are mostly irrelevant, they are present and since it's
only got an address to go by it doesn't know that there is a much better
match in the DWARF-2 data. It just sees that it has gotten a (bad)
match, so it doesn't bother looking in the DWARF-2 data. And since those
hand-coded .stabs above only give symbol name locations, not line number
information, that means that regardless of what you ask addr2line it's
going to return nothing because it only cares about line number info.

I see two potential fixes here, the first being that Cygwin could be
adapted to not hardcode .stabs but rather detect whether it's being
built with DWARF-2 or STABS and use the appropriate kind. The other fix
is to teach BFD to try DWARF-2 first before STABS. The attached patch
does this, for the purposes of illustration -- I don't really claim this
is correct. Once that is applied, here is the result of running the
patched addr2line on the addresses in the stackdump of this testcase:

  $ for F in 610F74B1 610FDD3B 6110A310 610AA4A8 61006094; do \
      /build/combined/binutils/.libs/addr2line.exe -e /bin/cygwin1.dll -f 0x$F; done
  ??
  ??:0
  _vfprintf_r
  /usr/src/sourceware/newlib/libc/stdio/vfprintf.c:1197
  printf
  /usr/src/sourceware/newlib/libc/stdio/printf.c:55
  ??
  ??:0
  _Z10dll_crt0_1Pv
  /usr/src/sourceware/winsup/cygwin/dcrt0.cc:930

It now gets 3 out of 5 correct.
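The lookup-order problem described above can be sketched in a few lines
of C++. This is only an illustration of the idea and not BFD's actual
code: the Match struct, the DebugTable type, the function names and the
"nearest preceding entry" matching rule are all assumptions made for the
example. It shows how a symbol-only STABS hit can mask a DWARF-2 entry
that carries line information, and how consulting DWARF-2 first avoids
that.

```cpp
#include <cstdint>
#include <iterator>
#include <map>
#include <optional>
#include <string>

// Hypothetical lookup result; line == 0 means "no line number info".
struct Match {
    std::string function;
    std::string file;
    unsigned line;
};

// Hypothetical debug table: start address of an entry -> what is known.
using DebugTable = std::map<std::uint32_t, Match>;

// Assumed matching rule: the nearest entry at or below the address wins.
std::optional<Match> find_in(const DebugTable &table, std::uint32_t addr) {
    auto it = table.upper_bound(addr);
    if (it == table.begin()) return std::nullopt;
    return std::prev(it)->second;
}

// Order described in the message: STABS first, DWARF-2 only if STABS
// found nothing at all. A symbol-only STABS hit ends the search.
std::optional<Match> lookup_stabs_first(const DebugTable &stabs,
                                        const DebugTable &dwarf,
                                        std::uint32_t addr) {
    if (auto m = find_in(stabs, addr)) return m;
    return find_in(dwarf, addr);
}

// Order proposed as a fix: DWARF-2 first, STABS only as a fallback.
std::optional<Match> lookup_dwarf_first(const DebugTable &stabs,
                                        const DebugTable &dwarf,
                                        std::uint32_t addr) {
    if (auto m = find_in(dwarf, addr)) return m;
    return find_in(stabs, addr);
}
```

With a STABS table holding only a line-less _sigbe entry and a DWARF-2
table holding an entry for printf at a higher address, the first
function reports _sigbe with no line for an address inside printf, while
the second reports the printf entry.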
It got tripped up on _sigbe because again addr2line only cares about
line number info, not general address information, and while there is
information for the location of _sigbe, it doesn't contain line number
info:

  (gdb) i ad _sigbe
  Symbol "_sigbe" is at 0x610aa4a8 in a file compiled without debugging.

For the top frame (strlen), addr2line could not print anything because
while there is location information, there is no line number
information:

  (gdb) i li *0x610F74B1
  No line number information available for address 0x610f74b1 <strlen+17>

This is due to the fact that strlen is implemented in newlib as
libc/machine/i386/strlen.S, which is a straight assembler version and
hence has no line number debug records.

***

To summarize thus far:

1. addr2line can be made to work again by one of
   a) dictating the use of STABS (boo!),
   b) modifying Cygwin to not emit hardcoded .stabs directives directly,
   c) modifying BFD to prefer DWARF-2 to STABS when reading COFF files.

2. addr2line requires the user to know beforehand which DLL a symbol is
   in, because it can't resolve runtime dependencies.

3. addr2line only cares about line number debug records, which means it
   will be incapable of representing many symbols.

4. As an implication of 3), addr2line is totally useless on DLLs/EXEs
   without debug information available.

I think point number 4 is worth repeating: we as developers take for granted having debug
Re: addr2line [ Was: better stackdumps ]
On Thu, Mar 20, 2008 at 11:23:05AM -0700, Brian Dessent wrote:
> [...]
Re: addr2line [ Was: better stackdumps ]
Brian Dessent wrote:
> I think I see what's going on here though, the Cygwin fault handler
> took the first chance exception and wrote the stackdump file, and only
> then passed it on to the debugger, so that by the time gdb got notice
> of the fault the stack was all fubar. This could be the reason why
> dumper is not working too. I thought there was a IsBeingDebugged()
> check in the

Silly me, this is good old set cygwin-exceptions defaulting to off... of
course gdb was ignoring the fault and letting Cygwin handle it. With it
set to on everything works as expected, and the issue of why the process
state that dumper records is so trashed is unrelated.

Brian
[PATCH] stackdump rev2
Brian Dessent wrote:
> Yes, it means there is one frame that says sigbe instead of the actual
> return location somewhere else. I don't think that's impossible to fix
> either: the fault handler gets the context of the faulting thread so
> it can look up its tls area through %fs:4 and peek at the top of the
> signal stack for the value. I will investigate if this is workable.

In fact, since the fault handler runs in the context of the thread that
faulted, this turns out to be trivial. I started with an implementation
that calls GetThreadSelectorEntry to resolve the %fs:4 in the CONTEXT,
but it turned out to always be equal to simply _my_tls. This updated
version of the patch simply substitutes _my_tls.retaddr() in place of
thestack.sf.AddrPC.Offset if it is equal to _sigbe, allowing for proper
unwinding through these frames.

I tested this when the faulting thread is the main thread as well as a
user created thread and it seems to do the right thing. Am I missing
something hideous here?

Example when it's the main thread:

Exception: STATUS_ACCESS_VIOLATION at eip=610F7501
eax=00000000 ebx=00000000 ecx=00000000 edx=0000FEED esi=00000000 edi=0000FEED
ebp=0022C568 esp=0022C564 program=\\?\C:\cygwin\home\brian\testcases\backtrace\a.exe, pid 7720, thread main
cs=001B ds=0023 es=0023 fs=003B gs=0000 ss=0023
Stack trace:
Frame     Function  Args
0022C568  610F7501  (0000FEED, 0022C676, 00402008, 00000001) cygwin1.dll!_strlen+0x11
0022CC98  610FDD8B  (0022D008, 6111C668, 00402000, 0022CCC8) cygwin1.dll!_fputc+0x34EB
0022CCB8  6110A360  (00402000, 0000FEED, 00000009, 00401065) cygwin1.dll!_printf+0x30
0022CCE8  00401084  (00000001, 61290908, 00680098, 00000000) a.exe+0x84
0022CD98  61006094  (00000000, 0022CDD0, 61005430, 0022CDD0) cygwin1.dll!_dll_crt0_1+0xC64
End of stack trace

Example when it's a user created thread:

Exception: STATUS_ACCESS_VIOLATION at eip=610F7501
eax=00000000 ebx=00000000 ecx=00000000 edx=0000FEED esi=00000000 edi=0000FEED
ebp=1886C5E8 esp=1886C5E4 program=\\?\C:\cygwin\home\brian\testcases\backtrace\tc2.exe, pid 8108, thread unknown (0x1BA4)
cs=001B ds=0023 es=0023 fs=003B gs=0000 ss=0023
Stack trace:
Frame     Function  Args
1886C5E8  610F7501  (0000FEED, 1886C6F6, 0040200F, 00000001) cygwin1.dll!_strlen+0x11
1886CD18  610FDD8B  (1886D008, 6111C668, 00402005, 1886CD48) cygwin1.dll!_fputc+0x34EB
1886CD38  6110A360  (00402005, 0000FEED, 1886CD58, 611004C0) cygwin1.dll!_printf+0x30
1886CD58  0040106C  (00402012, 0000000A, 1886CD78, 0040109E) tc2.exe+0x6C
1886CD68  00401085  (00402017, 1886CE64, 1886CDB8, 610C85AB) tc2.exe+0x85
1886CD78  0040109E  (00000000, 00000000, 611FD6B0, 6100415A) tc2.exe+0x9E
1886CDB8  610C85AB  (006901F0, 1886CDF0, 610C8530, 1886CDF0) cygwin1.dll!pthread::thread_init_wrapper+0x7B
End of stack trace

As you can see I also added special-casing for the thread_init_wrapper
function that forms the bottom of the stack for user created threads.

Christopher Faylor wrote:
> That's not a patch that I can approve, unfortunately.

That's okay, it was just illustrative anyway. I think I can fix this
purely in Cygwin by simply not emitting any .stabs if the effective
CFLAGS indicates DWARF-2 is desired. That's next on my plate.

Brian

2008-03-20  Brian Dessent  [EMAIL PROTECTED]

	* exceptions.cc (maybe_adjust_va_for_sigfe): New function to cope
	with signal wrappers.
	(prettyprint_va): New function that attempts to find a symbolic
	name for a memory location by walking the export sections of all
	modules.
	(stackdump): Call it. Use the signal frame return address instead
	of _sigbe.
	* gendef: Mark __sigfe as a global so that its address can be used
	by the backtrace code.
	* ntdll.h (struct _PEB_LDR_DATA): Declare.
	(struct _LDR_MODULE): Declare.
	(struct _PEB): Use actual LDR_DATA type for LdrData.
	(RtlImageDirectoryEntryToData): Declare.

Index: exceptions.cc
===================================================================
RCS file: /cvs/src/src/winsup/cygwin/exceptions.cc,v
retrieving revision 1.319
diff -u -p -r1.319 exceptions.cc
--- exceptions.cc	12 Mar 2008 12:41:49 -0000	1.319
+++ exceptions.cc	20 Mar 2008 21:06:07 -0000
@@ -284,6 +284,159 @@ stack_info::walk ()
   return 1;
 }
 
+/* These symbols are used by the below functions to put a prettier face
+   on a stack backtrace.  */
+extern u_char etext asm ("etext");  /* End of .text */
+extern u_char thread_init_wrapper asm ([EMAIL PROTECTED]);
+extern u_char _sigfe, _sigbe;
+void dll_crt0_1 (void *);
+
+const struct {
+  DWORD va;
+  const char *label;
+} hints[] = {
+  { (DWORD) thread_init_wrapper, "pthread::thread_init_wrapper" },
+  { (DWORD) dll_crt0_1, "_dll_crt0_1" }
+};
+
+/* Helper function to assist with backtraces.  This tries to detect if
+   an entrypoint is really a sigfe wrapper and returns the actual address
+   of the function.  Here's an example:
+
+   610ab9f0 __sigfe_printf:
+
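The ChangeLog entry describes prettyprint_va as finding a symbolic name
for an address by walking the export sections of all modules. A minimal
sketch of that idea, assuming the export tables have already been read
into memory, might look like the following. The Module struct and
describe_va are hypothetical names for this example and are not taken
from the patch; the real code walks the loader's module list and the PE
export directory instead. Computing the fallback offset from the module
base is also an assumption of the sketch and may not match what the
patch prints for modules without a matching export.

```cpp
#include <cstdint>
#include <cstdio>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory view of one loaded module.
struct Module {
    std::string name;                              // e.g. "cygwin1.dll"
    std::uint32_t base;                            // load address
    std::uint32_t size;                            // image size in bytes
    std::map<std::uint32_t, std::string> exports;  // export address -> name
};

// Returns "module!symbol+0xOFF" using the nearest export at or below va,
// "module+0xOFF" if the module has no export at or below va, or an empty
// string if va lies outside every known module.
std::string describe_va(const std::vector<Module> &modules, std::uint32_t va) {
    for (const Module &m : modules) {
        if (va < m.base || va - m.base >= m.size)
            continue;  // address is not inside this module's image

        char off[32];
        auto it = m.exports.upper_bound(va);
        if (it == m.exports.begin()) {
            // No export precedes va: fall back to an offset from the base.
            std::snprintf(off, sizeof off, "+0x%X",
                          static_cast<unsigned>(va - m.base));
            return m.name + off;
        }
        --it;  // nearest export whose address is <= va
        std::snprintf(off, sizeof off, "+0x%X",
                      static_cast<unsigned>(va - it->first));
        return m.name + "!" + it->second + off;
    }
    return "";
}
```

Because only exported names are available, an address inside a
non-exported function is attributed to the nearest preceding export with
a large offset, which would explain labels such as _fputc+0x34EB in the
traces above.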