Re: [PATCH] better stackdumps
On Mar 19 08:56, Brian Dessent wrote:
> Christopher Faylor wrote:
> > Sorry, but I don't like this concept. This bloats the cygwin DLL for a
> > condition that would be better served by either using gdb or generating
> > a real coredump.
>
> I hear you, but part of the motivation for writing this was a recent
> thread the other week on the gdb list where the poster asked how to get
> symbols in a Cygwin stackdump file. I suggested the same thing, setting
> error_start=dumper to get a real core dump. They did, and the result was
> completely useless. Here is what dumper gives you for the same simple
> testcase:
> [...]
> addr2line also seems to be totally unequipped to deal with separate .dbg
> information, as I can't get it to output a thing even though both a.exe
> and cygwin1.dll have full debug symbols:
>
> $ addr2line -e a.exe 0x610F74B1
> ??:0

Is it a big problem to fix addr2line to deal with .dbg files?

I like your idea to add names to the stackdump especially because of
addr2line's brokenness. But, actually, if addr2line would work with
.dbg files, there would be no reason to add this to the stackdump file.

Corinna

--
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
Re: [PATCH] better stackdumps
On Thu, Mar 20, 2008 at 11:35:32AM +0100, Corinna Vinschen wrote:
> On Mar 19 08:56, Brian Dessent wrote:
> > Christopher Faylor wrote:
> > > Sorry, but I don't like this concept. This bloats the cygwin DLL for
> > > a condition that would be better served by either using gdb or
> > > generating a real coredump.
> >
> > I hear you, but part of the motivation for writing this was a recent
> > thread the other week on the gdb list where the poster asked how to get
> > symbols in a Cygwin stackdump file. I suggested the same thing, setting
> > error_start=dumper to get a real core dump. They did, and the result
> > was completely useless. Here is what dumper gives you for the same
> > simple testcase:
> > [...]
> > addr2line also seems to be totally unequipped to deal with separate
> > .dbg information, as I can't get it to output a thing even though both
> > a.exe and cygwin1.dll have full debug symbols:
> >
> > $ addr2line -e a.exe 0x610F74B1
> > ??:0
>
> Is it a big problem to fix addr2line to deal with .dbg files?
>
> I like your idea to add names to the stackdump especially because of
> addr2line's brokenness. But, actually, if addr2line would work with
> .dbg files, there would be no reason to add this to the stackdump file.

There's still the issue of dealing with the separate signal stack. That
makes stack dumps less than useful. However, I would really love it if gdb
was able to decode this information automatically.

The bottom line is that I think that rather than modifying cygwin to work
around the limitations of the tools we should be fixing the tools. But,
then, that puts the problem back on my shoulders as the gdb and binutils
maintainer.

PTC.

cgf
Re: [PATCH] better stackdumps
Christopher Faylor wrote:
> Sorry, but I don't like this concept. This bloats the cygwin DLL for a
> condition that would be better served by either using gdb or generating
> a real coredump.

I hear you, but part of the motivation for writing this was a recent
thread the other week on the gdb list where the poster asked how to get
symbols in a Cygwin stackdump file. I suggested the same thing, setting
error_start=dumper to get a real core dump. They did, and the result was
completely useless. Here is what dumper gives you for the same simple
testcase:

$ gdb
(gdb) core a.exe.core
[New process 1]
[New process 0]
[New process 0]
#0  0x7c90eb94 in ?? ()
(gdb) thr apply all bt

Thread 3 (process 0):
#0  0x7c90eb94 in ?? ()

Thread 2 (process 0):
#0  0x7c90eb94 in ?? ()

Thread 1 (process 1):
#0  0x7c90eb94 in ?? ()

You can't even make out the names of any of the loaded modules from the
core:

(gdb) i tar
Local core dump file:
        `/home/brian/testcases/backtrace/a.exe.core', file type elf32-i386.
        0x0001 - 0x00011000 is load11
        0x0002 - 0x00021000 is load12
        0x001ff000 - 0x00233000 is load13
        0x0024 - 0x00248000 is load14
        0x00253000 - 0x00254000 is load15
        0x0034 - 0x00346000 is load16
        0x0035 - 0x00353000 is load17
        0x0036 - 0x00376000 is load18
        0x0038 - 0x003bd000 is load19
        0x003c - 0x003c6000 is load20
        0x003d - 0x003d1000 is load21
        0x003e - 0x003ee000 is load22
        0x003f - 0x003f1000 is load23
        0x0040 - 0x00401000 is load24
        0x004013d0 - 0x00405000 is load25
        0x0042 - 0x00461000 is load26
        0x0066c000 - 0x006a is load27
        0x1867f000 - 0x1868 is load28
        0x60fd - 0x60fd5000 is load29
        0x60ff - 0x60ff9000 is load30
        0x6100 - 0x61001000 is load31
        0x61118994 - 0x61119000 is load32
        0x6111a3d8 - 0x611fa000 is load33
        0x611fb000 - 0x612a is load34
        0x77b4 - 0x77b41000 is load35
        0x77b5d60c - 0x77b62000 is load36
        0x77dd - 0x77dd1000 is load37
        0x77e452d9 - 0x77e6b000 is load38
        0x77e7 - 0x77e71000 is load39
        0x77ef3353 - 0x77ef4000 is load40
        0x77efa90d - 0x77f02000 is load41
        0x77fe - 0x77fe1000 is load42
        0x77fed1dc - 0x77ff1000 is load43
        0x7c80 - 0x7c801000 is load44
        0x7c883111 - 0x7c8f5000 is load45
        0x7c90 - 0x7c901000 is load46
        0x7c97b6fe - 0x7c9b is load47
        0x7f6f - 0x7f6f7000 is load48
        0x7ffb - 0x7ffd4000 is load49
        0x7ffdc000 - 0x7ffe1000 is load50

addr2line also seems to be totally unequipped to deal with separate .dbg
information, as I can't get it to output a thing even though both a.exe
and cygwin1.dll have full debug symbols:

$ addr2line -e a.exe 0x610F74B1
??:0
$ addr2line -e a.exe 0x610FDD3B
??:0
$ addr2line -e a.exe 0x6110A310
??:0
$ addr2line -e a.exe 0x610AA4A8
??:0

The situation with error_start=gdb isn't really all that much better:

(gdb) thr apply all bt

Thread 3 (thread 4552.0x16a8):
#0  0x7c901231 in ntdll!DbgUiConnectToDbg () from /winxp/system32/ntdll.dll
#1  0x7c9507a8 in ntdll!KiIntSystemCall () from /winxp/system32/ntdll.dll
#2  0x0005 in ~cygheap_fdget (this=0x1) at /usr/src/sourceware/winsup/cygwin/times.cc:518
#3  0x in ?? ()

Thread 2 (thread 4552.0x132c):
#0  0x7c90eb94 in ntdll!LdrAccessResource () from /winxp/system32/ntdll.dll
#1  0x7c90e288 in ntdll!ZwReadFile () from /winxp/system32/ntdll.dll
#2  0x7c801875 in ReadFile () from /winxp/system32/kernel32.dll
#3  0x0754 in ?? () at /usr/src/sourceware/winsup/cygwin/dtable.h:33
#4  0x in ?? ()

Thread 1 (thread 4552.0x18b0):
#0  0x7c90eb94 in ntdll!LdrAccessResource () from /winxp/system32/ntdll.dll
#1  0x7c90e21f in ntdll!ZwQueryVirtualMemory () from /winxp/system32/ntdll.dll
#2  0x7c937b93 in ntdll!RtlUpcaseUnicodeChar () from /winxp/system32/ntdll.dll
#3  0x in ?? ()
#4  0x61027c20 in sigpacket::process () at /usr/src/sourceware/winsup/cygwin/exceptions.cc:1444
#5  0x7c93783a in ntdll!LdrFindCreateProcessManifest () from /winxp/system32/ntdll.dll
#6  0x61027c20 in sigpacket::process () at /usr/src/sourceware/winsup/cygwin/exceptions.cc:1444
#7  0x7c90eafa in ntdll!LdrDisableThreadCalloutsForDll () from /winxp/system32/ntdll.dll
#8  0x in ?? ()
#0  0x7c901231 in ntdll!DbgUiConnectToDbg () from /winxp/system32/ntdll.dll

None of this has anything to do with the actual call chain that triggered
the fault, which was printf-fputc-strlen. Yes, you usually have to
continue to get the fault re-triggered, but for some reason when I type
continue in this simple testcase gdb just hangs completely.

Even if the user gets this far they will still need to have debug symbols
installed for cygwin1.dll, which in and of itself is a whole other task
that most users cringe at taking on. In contrast, the
Re: [PATCH] better stackdumps
On Wednesday 19 March 2008 15:56:55, Brian Dessent wrote:
> Christopher Faylor wrote:
> > Sorry, but I don't like this concept. This bloats the cygwin DLL for a
> > condition that would be better served by either using gdb or generating
> > a real coredump.
>
> I hear you, but part of the motivation for writing this was a recent
> thread the other week on the gdb list where the poster asked how to get
> symbols in a Cygwin stackdump file. I suggested the same thing, setting
> error_start=dumper to get a real core dump. They did, and the result was
> completely useless. Here is what dumper gives you for the same simple
> testcase:
>
> $ gdb
> (gdb) core a.exe.core
> [New process 1]
> [New process 0]
> [New process 0]
> #0  0x7c90eb94 in ?? ()
> (gdb) thr apply all bt
>
> Thread 3 (process 0):
> #0  0x7c90eb94 in ?? ()
>
> Thread 2 (process 0):
> #0  0x7c90eb94 in ?? ()
>
> Thread 1 (process 1):
> #0  0x7c90eb94 in ?? ()
>
> You can't even make out the names of any of the loaded modules from the
> core:

Sorry I missed the discussion at [EMAIL PROTECTED]

What does info sharedlibrary say? The last I looked at this, it worked.
Is this broken in gdb head and on the cygwin distributed gdb?

> that this patch introduces. Plus without being able to recognise that
> signal wrappers obscure the location of the real entrypoints to many/most
> Cygwin functions, the backtrace used by this method looks very bad and
> doesn't give useful information for routines in Cygwin -- and being able
> to do that processing is much easier when you're in the actual module
> that has the wrappers as you can simply test against _sigfe.

Is this something that would be nice to have in gdb then?

--
Pedro Alves
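[Editor's sketch] The "test against _sigfe" check mentioned above can be illustrated outside of Cygwin roughly as follows. This is only a sketch under stated assumptions, not the patch's code: the function names unwrap_sigfe and le32 are made up for illustration, the wrapper bytes are passed in as a buffer rather than read from live memory, and the address of _sigfe is supplied by the caller. It recognises an x86 "push imm32" (opcode 0x68) followed by "jmp rel32" (opcode 0xe9) and checks that the jump lands on the given _sigfe address.

```cpp
#include <cassert>
#include <cstdint>

// Read a 32-bit little-endian value, independent of host byte order.
static std::uint32_t le32(const unsigned char *p)
{
  return std::uint32_t(p[0]) | std::uint32_t(p[1]) << 8 |
         std::uint32_t(p[2]) << 16 | std::uint32_t(p[3]) << 24;
}

// Hypothetical helper: 'code' holds at least 10 bytes found at address
// 'va'; 'sigfe_va' is the address of the common _sigfe entry point.
// Returns the wrapped function's address if the bytes look like a
// wrapper, otherwise returns 'va' unchanged.
std::uint32_t unwrap_sigfe(const unsigned char *code, std::uint32_t va,
                           std::uint32_t sigfe_va)
{
  assert(code != nullptr);

  // Expect "68 imm32" (push) immediately followed by "e9 rel32" (jmp).
  if (code[0] != 0x68 || code[5] != 0xe9)
    return va;

  std::uint32_t pushed = le32(code + 1);  // address of the real function
  std::uint32_t rel = le32(code + 6);     // signed displacement of the jmp

  // rel32 is relative to the instruction after the jmp, i.e. va + 10.
  // Unsigned arithmetic wraps modulo 2^32, which handles negative rel.
  if (va + 10 + rel != sigfe_va)
    return va;

  return pushed;
}
```

Fed the ten bytes of the __sigfe_printf example from the patch comment (68 40 a4 10 61 e9 bf eb ff ff at 0x610ab9f0, with _sigfe at 0x610aa5b9), this returns 0x6110a440; with any other _sigfe address it returns the input address unchanged.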
[PATCH] better stackdumps
This patch adds the ability to see functions/symbols in the .stackdump
files generated when there's a fault. It parses the export sections of
each loaded module and finds the closest exported address for each stack
frame address. This of course won't be perfect as it will show the wrong
function if the frame is in the middle of a non-exported function, but
it's better than what we have now.

This also uses a couple of tricks to make the output more sensible. It can
see through the sigfe wrappers and print the actual functions being
wrapped. It also has a set of internal symbols that it consults for
symbols in Cygwin. This allows it to get the bottom frame correct
(_dll_crt0_1) even though that function isn't exported. If there are any
other such functions they can be easily added to the 'hints' array.

Also attached is a sample output of an invalid C program and the resulting
stackdump. Note that the frame labeled _sigbe really should be a frame
somewhere inside the main .exe. I pondered trying to extract the sigbe's
return address off the signal stack and using that for the label but I
haven't quite gotten there, since I can't think of a reliable way to
figure out the correct location on the tls stack where the real return
address is stored.

Of course the labeling works for any module/dll, not just cygwin1.dll, but
I didn't have a more elaborate testcase to demonstrate.

Brian

2008-03-18  Brian Dessent  [EMAIL PROTECTED]

	* exceptions.cc (maybe_adjust_va_for_sigfe): New function to cope
	with signal wrappers.
	(prettyprint_va): New function that attempts to find a symbolic
	name for a memory location by walking the export sections of all
	modules.
	(stackdump): Call it.
	* gendef: Mark __sigfe as a global so that its address can be used
	by the backtrace code.
	* ntdll.h (struct _PEB_LDR_DATA): Declare.
	(struct _LDR_MODULE): Declare.
	(struct _PEB): Use actual LDR_DATA type for LdrData.
	(RtlImageDirectoryEntryToData): Declare.
Index: exceptions.cc
===================================================================
RCS file: /cvs/src/src/winsup/cygwin/exceptions.cc,v
retrieving revision 1.319
diff -u -p -r1.319 exceptions.cc
--- exceptions.cc	12 Mar 2008 12:41:49 -0000	1.319
+++ exceptions.cc	19 Mar 2008 00:04:13 -0000
@@ -284,6 +284,158 @@ stack_info::walk ()
   return 1;
 }
 
+/* These symbols are used by the below functions to put a prettier face
+   on a stack backtrace.  */
+extern u_char etext asm ("etext");	/* End of .text */
+extern u_char _sigfe, _sigbe;
+void dll_crt0_1 (void *);
+
+const struct {
+  DWORD va;
+  const char *label;
+} hints[] = {
+  { (DWORD) &_sigbe, "_sigbe" },
+  { (DWORD) dll_crt0_1, "_dll_crt0_1" }
+};
+
+/* Helper function to assist with backtraces.  This tries to detect if
+   an entrypoint is really a sigfe wrapper and returns the actual address
+   of the function.  Here's an example:
+
+   610ab9f0 <__sigfe_printf>:
+   610ab9f0:   68 40 a4 10 61          push   $0x6110a440
+   610ab9f5:   e9 bf eb ff ff          jmp    610aa5b9 <__sigfe>
+
+   Suppose that we are passed 0x610ab9f0.  We need to recognise the
+   push/jmp combination and return 0x6110a440 <_printf> instead.  Note
+   that this is a relative jump.  */
+static DWORD
+maybe_adjust_va_for_sigfe (DWORD va)
+{
+  if (va < (DWORD) user_data->hmodule || va > (DWORD) &etext)
+    return va;
+
+  unsigned char *x = (unsigned char *) va;
+
+  if (x[0] == 0x68 && x[5] == 0xe9)
+    {
+      DWORD jmprel = *(DWORD *)(x + 6);
+
+      if ((unsigned) va + 10 + (unsigned) jmprel == (unsigned) &_sigfe)
+        return *(DWORD *)(x + 1);
+    }
+  return va;
+}
+
+/* Walk the list of modules in the current process and parse their
+   export tables in order to find the entrypoint closest to but less
+   than 'faultva'.  This won't be perfect, such as when 'faultva'
+   actually resides in a non-exported function, but it is still better
+   than nothing.  Note that this algorithm could be made much more
+   efficient by both sorting the export tables as well as saving the
+   result between calls.  However, this implementation requires no
+   allocation of memory and minimal system calls, so it should be safe
+   in the context of an exception handler.  And we're probably about to
+   terminate the process anyway, so performance is not critical.  */
+static char *
+prettyprint_va (DWORD faultva)
+{
+  static char buf[256];
+
+  ULONG bestmatch_va = 0;
+
+  PLIST_ENTRY head = &NtCurrentTeb()->Peb->LdrData->InMemoryOrderModuleList;
+  for (PLIST_ENTRY x = head->Flink; x != head; x = x->Flink)
+    {
+      PLDR_MODULE mod = CONTAINING_RECORD (x, LDR_MODULE,
+                                           InMemoryOrderModuleList);
+      if ((DWORD) mod->BaseAddress > faultva)
+        continue;
+
+      DWORD len;
+      IMAGE_EXPORT_DIRECTORY *edata_va = (IMAGE_EXPORT_DIRECTORY *)
+        RtlImageDirectoryEntryToData
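[Editor's sketch] The lookup strategy described in the prettyprint_va comment (an unsorted linear scan that keeps the best match so far and allocates nothing) can be shown on a plain array. This is a simplified stand-in rather than the patch's implementation: the Symbol struct and closest_symbol function are hypothetical names, the table is handed in by the caller instead of being read from PE export directories, and an exact address match is accepted here, whereas the patch comment says "less than".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical record standing in for one export-table entry.
struct Symbol
{
  std::uint32_t va;   // address of the entry point
  const char *name;   // exported name
};

// Return the entry with the greatest address that is still <= addr,
// or nullptr if every entry lies above addr.  The table need not be
// sorted; nothing is allocated, so the scan is usable from a context
// where calling malloc would be unsafe.
const Symbol *closest_symbol(const Symbol *syms, std::size_t count,
                             std::uint32_t addr)
{
  const Symbol *best = nullptr;
  for (std::size_t i = 0; i < count; ++i)
    {
      if (syms[i].va > addr)
        continue;                       // entry starts after the address
      if (best == nullptr || syms[i].va > best->va)
        best = &syms[i];                // closer than the previous best
    }
  return best;
}
```

As the patch description points out, the result is only a guess: an address inside a non-exported function is attributed to whichever exported entry point happens to precede it.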
Re: [PATCH] better stackdumps
Brian Dessent wrote:
> Of course the labeling works for any module/dll, not just cygwin1.dll,
> but I didn't have a more elaborate testcase to demonstrate.

Forgot to mention... The symbols are just tacked on on the right hand side
there for now. I wasn't really sure how to handle that. I didn't want to
remove display of the actual EIP for each frame as that could be removing
useful info, but I wasn't quite sure where to put everything or how to
align it... so as it is now it wraps wider than 80 chars which is probably
pretty ugly on a default size terminal.

Brian
Re: [PATCH] better stackdumps
Igor Peshansky wrote:
> Would it make sense to force a newline before the function name and to
> display it with a small indent? That way people who want the old-style
> stackdump could just feed the new one into grep -v '^ ' or something...

Yes, that would be one way.

That actually reminds me of another issue that I forgot to mention: glibc
has a backtrace API that can be called from user-code at any time, not
just at faults. At the moment we are exporting something similar called
cygwin_stackdump but we don't declare it in any header. Would it be
worthwhile to try to match the glibc API and export it under the same
name/output format?

Brian
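[Editor's sketch] Igor's layout idea, with the symbol on its own indented line so that a filter like grep -v '^ ' recovers the old format, might look something like this. The column layout, the function names format_frame and strip_symbol_lines, and the sample values are all assumptions made for illustration; the real .stackdump format has more columns than shown here.

```cpp
#include <cassert>
#include <cstdio>
#include <sstream>
#include <string>

// Hypothetical layout: frame pointer and EIP on one line, then the
// symbol name on a following line indented by two spaces.
std::string format_frame(unsigned ebp, unsigned eip, const char *symbol)
{
  char buf[128];
  std::snprintf(buf, sizeof buf, "%08X  %08X\n  %s\n", ebp, eip, symbol);
  return buf;
}

// Rough equivalent of grep -v '^ ': drop every line that starts with a
// space, leaving only the old-style frame lines.
std::string strip_symbol_lines(const std::string &dump)
{
  std::istringstream in(dump);
  std::string line, out;
  while (std::getline(in, line))
    if (line.empty() || line[0] != ' ')
      out += line + '\n';
  return out;
}
```

Keeping the address columns untouched on the first line means the EIP is never hidden, which addresses the alignment concern raised earlier in the thread, at the cost of doubling the number of lines per frame.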
Re: [PATCH] better stackdumps
On Tue, Mar 18, 2008 at 05:24:20PM -0700, Brian Dessent wrote:
> This patch adds the ability to see functions/symbols in the .stackdump
> files generated when there's a fault. It parses the export sections of
> each loaded module and finds the closest exported address for each stack
> frame address. This of course won't be perfect as it will show the wrong
> function if the frame is in the middle of a non-exported function, but
> it's better than what we have now.
>
> This also uses a couple of tricks to make the output more sensible. It
> can see through the sigfe wrappers and print the actual functions being
> wrapped. It also has a set of internal symbols that it consults for
> symbols in Cygwin. This allows it to get the bottom frame correct
> (_dll_crt0_1) even though that function isn't exported. If there are any
> other such functions they can be easily added to the 'hints' array.
>
> Also attached is a sample output of an invalid C program and the
> resulting stackdump. Note that the frame labeled _sigbe really should be
> a frame somewhere inside the main .exe. I pondered trying to extract the
> sigbe's return address off the signal stack and using that for the label
> but I haven't quite gotten there, since I can't think of a reliable way
> to figure out the correct location on the tls stack where the real
> return address is stored.
>
> Of course the labeling works for any module/dll, not just cygwin1.dll,
> but I didn't have a more elaborate testcase to demonstrate.
>
> Brian
>
> 2008-03-18  Brian Dessent  [EMAIL PROTECTED]
>
> 	* exceptions.cc (maybe_adjust_va_for_sigfe): New function to cope
> 	with signal wrappers.
> 	(prettyprint_va): New function that attempts to find a symbolic
> 	name for a memory location by walking the export sections of all
> 	modules.
> 	(stackdump): Call it.
> 	* gendef: Mark __sigfe as a global so that its address can be used
> 	by the backtrace code.
> 	* ntdll.h (struct _PEB_LDR_DATA): Declare.
> 	(struct _LDR_MODULE): Declare.
> 	(struct _PEB): Use actual LDR_DATA type for LdrData.
> 	(RtlImageDirectoryEntryToData): Declare.

Sorry, but I don't like this concept. This bloats the cygwin DLL for a
condition that would be better served by either using gdb or generating a
real coredump.

OTOH, adding a list of loaded dlls to a stackdump might not be a bad idea
so that some postprocessing program could generate the same output as long
as that didn't add too much code to cygwin.

cgf
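[Editor's sketch] If a stackdump did carry a list of loaded DLLs, the first step of such a postprocessing program would be to map each frame address to a module and an offset within it, which a symbol tool could then resolve offline. The following is a sketch of that step only. The Module record, the find_module name, and the idea that the dump records a base and size per module are all assumptions; no such list exists in the current stackdump format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical record for one loaded module as it might be listed in a
// stackdump: name, load address, and size of the mapped image.
struct Module
{
  const char *name;
  std::uint32_t base;
  std::uint32_t size;
};

// Find the module whose [base, base + size) range contains addr.  On
// success, store the module-relative offset in *offset and return the
// module; otherwise return nullptr and leave *offset untouched.
const Module *find_module(const Module *mods, std::size_t count,
                          std::uint32_t addr, std::uint32_t *offset)
{
  for (std::size_t i = 0; i < count; ++i)
    {
      // Written as a subtraction so base + size cannot overflow.
      if (addr >= mods[i].base && addr - mods[i].base < mods[i].size)
        {
          *offset = addr - mods[i].base;
          return &mods[i];
        }
    }
  return nullptr;
}
```

This keeps the work inside the DLL down to printing a few numbers per module, with all symbol handling left to the external tool.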