On Mon, Apr 15, 2013 at 4:20 PM, Andrew Deason <adea...@sinenomine.net> wrote:

> On Thu, 28 Mar 2013 16:38:55 -0500
> Andrew Deason <adea...@sinenomine.net> wrote:
>
> > What I was after is the stack trace of all of the LWPs in the buserver
> > process. You cannot get at those easily, since LWP is a threading
> > system that is not understood by the debugger (dbx or gdb). That's
> > kinda why I was treating the 'core file' option as something where you
> > give the core file to a developer. Getting that information by
> > providing instructions to you makes this a bit more difficult... but
> > is probably doable.
>
> So, while I was waiting for some stuff to compile while trying this, I
> realized this might be fixed by
> <http://git.openafs.org/?p=openafs.git;a=patch;h=dce2d8206ecd35c96e75cc0662432c2a4f9c3d7a>.
> I'm not clear on what exactly the principal is for, but that does fix a
> bug that was introduced in the 1.6 series. Since there have not been
> many substantial changes to budb in general, and that change impacts the
> CreateDump function, that seems like a likely culprit. (To devs: the
> original change doesn't make a lot of sense to me; the commit messages
> suggest there are different structures in play, but the args and function
> parameters are all ktc_principal.)
>

*If* that is the cause, it would have to be a missing null termination. I
looked at that case specifically and couldn't trigger a problem, but it's
conceivable I missed it due to differences in the host I tested on. The only
other interesting thing was the potential for differences due to the offsetof
changes, but that was also a red herring.

> This actually isn't so bad if you rely on mdb to give you the stack
> traces. Attached a dbx script that can be used to get some traces. This
> should probably live in a repo or something... somewhere. Do people have
> an opinion on where this should go?
>
>
We have scripts from Robert Milkowski for dtrace which similarly lack a home;
properly they might be in their own module instead of openafs, but I don't
think it would be particularly abusive to include them here.


> Anyway, you can use it like this. If you compiled with LWP debug turned
> on, it's more likely to work (this means running ./configure with
> --enable-debug-lwp), but it's not required. Run:
>
> $ /opt/SUNWspro/bin/dbx /path/to/buserver /path/to/core
> [...]
> (dbx) source lwpstacks.ksh
> (dbx) lwpstacks
>
> If you don't have LWP debug, this will fail (probably with something
> like "dbx: struct "lwp_pcb" is not defined[...]"). You can try running
> this without using debug symbols (we'll guess at where some data is), by
> running this instead:
>
> (dbx) lwpstacks nodebug
>
> With the script as-is, the 'nodebug' stuff seems to work with OpenAFS
> 1.6.2 on Solaris 10 SPARC, but it may need fiddling to work anywhere
> else.
>
> If either of those works, you'll see something like:
>
> (dbx) lwpstacks nodebug
> !# NOT using debug symbols
> !# looking for threads in blocked
> ::echo stack pointer for thread 14a530: 1562d8
> 0x001562d8::stack 0 ! sed 's/^/  /'
> ::echo
> ::echo stack pointer for thread 180cf8: 18caa0
> 0x0018caa0::stack 0 ! sed 's/^/  /'
> [...]
>
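
Note that the listing above is a list of mdb commands, not stack traces yet. Each thread gets an ::echo header, then an "addr::stack 0" on the saved stack pointer (a count of 0 should mean no arguments are printed per frame), with the result piped through sed to indent it. A minimal sketch of how lines in that shape could be generated from a thread address and a stack pointer follows. The name emit_mdb_stack_cmds is made up for illustration, this is not the code in lwpstacks.ksh, and the trailing ::echo after each thread is only inferred from the listing.

```shell
#!/bin/sh
# Hypothetical helper (not taken from lwpstacks.ksh): print the mdb commands
# for one LWP thread.
#   $1 = thread address in hex, without a leading "0x"
#   $2 = saved stack pointer in hex, without a leading "0x"
emit_mdb_stack_cmds() {
    thread=$1
    sp=$2
    # Header line so the trace can be matched back to its thread.
    printf '::echo stack pointer for thread %s: %s\n' "$thread" "$sp"
    # Backtrace starting at the saved stack pointer, zero-padded to 8 hex
    # digits as in the listing; "! sed" indents each frame by two spaces.
    printf "0x%08x::stack 0 ! sed 's/^/  /'\n" "0x$sp"
    # Blank separator line between threads (assumed from the listing).
    printf '::echo\n'
}

# The two thread/stack-pointer pairs shown in the listing.
emit_mdb_stack_cmds 14a530 1562d8
emit_mdb_stack_cmds 180cf8 18caa0
```

Nothing in that output is a stack trace by itself; mdb has to evaluate the commands against the binary and the core.
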
> To get actual stack traces out of that, pipe the output through mdb:
>
> (dbx) lwpstacks nodebug | mdb /path/to/buserver /path/to/core
> stack pointer for thread 14a530: 1562d8
>   LWP_WaitProcess+0x38()
>   rxi_Sleep+4()
>   rx_GetCall+0x320()
>   rxi_ServerProc+0x40()
>   rx_ServerProc+0x74()
>   Create_Process_Part2+0x40()
>   0x68388()
>   ubik_ServerInitCommon+0x23c()
>
> stack pointer for thread 180cf8: 18caa0
>   LWP_WaitProcess+0x38()
> [...]
>
> This output is similar enough to mdb ::findstack output that it will
> work with David Powell's "munges" script if you have that. But it's also
> pretty useful just by itself.
>
> Surprisingly, that doesn't require any manual core editing. mdb I think
> is the only debugger I've used that lets you get stack trace information
> from arbitrary context (at least, I haven't seen an easy way for gdb or
> dbx to do this), but the way state is stored on Solaris on SPARC
> probably helps make that easier.
>
> If you want to provide such stack output from the core you captured, it
> may say what's going on.
>
>
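
With many LWPs, most threads tend to sit in the same idle stack, so it can help to count identical stacks and look at the odd ones out. A rough sketch that does this for output in the format shown above (a "stack pointer for thread" header followed by frames indented by two spaces) follows. The function names and the sample input are made up for illustration, and this is not David Powell's "munges" script.

```shell
#!/bin/sh
# Hypothetical helpers for output in the "stack pointer for thread ..." format.

# Collapse each thread's frames into one line, innermost frame first.
# Threads with no frames at all are skipped.
collapse_stacks() {
    awk '
        /^stack pointer for thread / {
            if (stack != "") print stack
            stack = ""
            next
        }
        # Frame lines are indented by exactly two spaces.
        /^  [^ ]/ { stack = (stack == "" ? $1 : stack " <- " $1) }
        END { if (stack != "") print stack }
    '
}

# Count how many threads share each distinct stack, most common first.
count_stacks() {
    collapse_stacks | sort | uniq -c | sort -rn
}

# Made-up, shortened sample input in the same format.
count_stacks <<'EOF'
stack pointer for thread 1: a0
  LWP_WaitProcess+0x38()
  rxi_Sleep+4()

stack pointer for thread 2: b0
  LWP_WaitProcess+0x38()
  rxi_Sleep+4()
EOF
```

Assuming the mdb output has been saved to a file (say stacks.txt, a hypothetical name), running "count_stacks < stacks.txt" prints one line per distinct stack, prefixed with the number of threads sharing it.
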

-- 
Derrick