On Mon, Apr 15, 2013 at 4:20 PM, Andrew Deason <adea...@sinenomine.net> wrote:
> On Thu, 28 Mar 2013 16:38:55 -0500
> Andrew Deason <adea...@sinenomine.net> wrote:
>
> > What I was after is the stack trace of all of the LWPs in the buserver
> > process. You cannot get at those easily, since LWP is a threading
> > system that is not understood by the debugger (dbx or gdb). That's
> > kinda why I was treating the 'core file' option as something where you
> > give the core file to a developer. Getting that information by
> > providing instructions to you makes this a bit more difficult... but
> > is probably doable.
>
> So, while I was waiting for some stuff to compile while trying this, I
> realized this might be fixed by
> <http://git.openafs.org/?p=openafs.git;a=patch;h=dce2d8206ecd35c96e75cc0662432c2a4f9c3d7a>.
> I'm not clear on what exactly the principal is for, but that does fix a
> bug that was introduced in the 1.6 series. Since there have not been
> many substantial changes to budb in general, and that change impacts the
> CreateDump function, that seems like a likely culprit. (To devs: the
> original change doesn't make a lot of sense to me; the commit messages
> suggest there are different structures in play, but the args and function
> parameters are all ktc_principal.)

*If* that's it, it would have to be a missing null termination. I looked at
that case specifically and couldn't create an issue, but it's conceivable I
missed it due to differences in the host I tested on. The only other
interesting thing was the potential for differences due to the offsetof
changes, but that was also a red herring.

> This actually isn't so bad if you rely on mdb to give you the stack
> traces. Attached a dbx script that can be used to get some traces. This
> should probably live in a repo or something... somewhere. Do people have
> an opinion on where this should go?

We have scripts from Robert Milkowski for dtrace which similarly lack a
home. Properly they might be in their own module instead of openafs, but I
don't think it would be particularly abusive to include them here.

> Anyway, you can use it like this. If you compiled with LWP debug turned
> on, it's more likely to work (this means running ./configure with
> --enable-debug-lwp), but it's not required. Run:
>
>     $ /opt/SUNWspro/bin/dbx /path/to/buserver /path/to/core
>     [...]
>     (dbx) source lwpstacks.ksh
>     (dbx) lwpstacks
>
> If you don't have LWP debug, this will fail (probably with something
> like "dbx: struct "lwp_pcb" is not defined[...]"). You can try running
> this without using debug symbols (we'll guess at where some data is), by
> running this instead:
>
>     (dbx) lwpstacks nodebug
>
> With the script as-is, the 'nodebug' stuff seems to work with OpenAFS
> 1.6.2 on Solaris 10 SPARC, but it may need fiddling to work anywhere
> else.
>
> If either of those works, you'll see something like:
>
>     (dbx) lwpstacks nodebug
>     !# NOT using debug symbols
>     !# looking for threads in blocked
>     ::echo stack pointer for thread 14a530: 1562d8
>     0x001562d8::stack 0 ! sed 's/^/ /'
>     ::echo
>     ::echo stack pointer for thread 180cf8: 18caa0
>     0x0018caa0::stack 0 ! sed 's/^/ /'
>     [...]
>
> To get actual stack traces out of that, pipe the output through mdb:
>
>     (dbx) lwpstacks nodebug | mdb /path/to/buserver /path/to/core
>     stack pointer for thread 14a530: 1562d8
>      LWP_WaitProcess+0x38()
>      rxi_Sleep+4()
>      rx_GetCall+0x320()
>      rxi_ServerProc+0x40()
>      rx_ServerProc+0x74()
>      Create_Process_Part2+0x40()
>      0x68388()
>      ubik_ServerInitCommon+0x23c()
>
>     stack pointer for thread 180cf8: 18caa0
>      LWP_WaitProcess+0x38()
>     [...]
>
> This output is similar enough to mdb ::findstack output that it will
> work with David Powell's "munges" script if you have that. But it's also
> pretty useful just by itself.
>
> Surprisingly, that doesn't require any manual core editing. mdb, I think,
> is the only debugger I've used that lets you get stack trace information
> from arbitrary context (at least, I haven't seen an easy way for gdb or
> dbx to do this), but the way state is stored on Solaris on SPARC
> probably helps make that easier.
>
> If you want to provide such stack output from the core you captured, it
> may say what's going on.

-- Derrick
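
P.S. For anyone without the "munges" script: the per-thread output quoted
above is regular enough to post-process by hand. What follows is only a
minimal sketch, not part of the attached dbx script and not a
reimplementation of munges. It assumes the header line looks exactly like
the "stack pointer for thread <addr>: <sp>" lines shown above, and the
names group_stacks and summarize are made up for illustration. It groups
frames by thread and counts how many threads share an identical stack.

```python
import re
from collections import Counter

# Header format copied from the example output quoted above (an assumption:
# other platforms or script versions may print something different).
HEADER = re.compile(r"^\s*stack pointer for thread ([0-9a-f]+): ([0-9a-f]+)\s*$")


def group_stacks(text):
    """Return {thread_address: [frame, ...]} parsed from the piped mdb output."""
    stacks = {}
    current = None
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            # A new thread starts; later frame lines belong to it.
            current = stacks.setdefault(match.group(1), [])
        elif current is not None and line.strip() and line.strip() != "[...]":
            # Any other non-blank line is treated as one stack frame.
            current.append(line.strip())
    return stacks


def summarize(stacks):
    """Count how many threads share each identical sequence of frames."""
    return Counter(tuple(frames) for frames in stacks.values())
```

To use it, save the mdb output to a file, read it into a string, and print
summarize(group_stacks(text)).most_common(); stacks shared by many threads
(typically idle workers) collapse into one entry, which makes the odd one
out easier to spot.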