Re: L4Re: Identifying the source location of a program exception

Paul Boddie Thu, 30 Jul 2020 16:29:43 -0700

Frank,

Thank you very much for your very descriptive account of how the exception 
location might be discovered using the kernel debugger. I think this may be a 
long exercise, but I wanted to respond to at least acknowledge your message.


On Wednesday, 29 July 2020 09:24:17 CEST Frank Mehnert wrote:
> 
> I want to encourage you to take the program counter value seriously. The
> message says that there was an access to the memory at address 0x38
> (sounds like an access to offset 0x38 of an object where the object pointer
> was not initialized) and the corresponding program counter in userland
> is 0x3a8bd9. From that value I guess that your host is AMD64.

Yes, that is correct. I also assumed that the error was related to a null 
reference.

> Now the question is of course: Which application triggered this exception?
> If you know the answer then you should disassemble the corresponding binary
> with
> 
>   objdump -ldC <filename> | less
> 
> and search for the program counter. If your binary was compiled with
> debugging information, you will even see the source code around the
> faulting instruction.
> 
> If your binary was not compiled with debugging information:
> 
>  1. If the application is compiled within the L4Re tree then use the
>     binary from the package build directory because that one is not
>     stripped, for example
> 
>       build-x86-64/pkg/hello/server/src/OBJ-amd64_gen-l4f/hello
> 
>     rather than
> 
>       build-x86-64/bin/amd64_gen/l4f/hello
> 
>     because the latter binary is stripped (i.e. contains no debugging
>     information) if CONFIG_BID_STRIP_PROGS is set to 'y'.

This is a useful reminder, but I think I must have experienced difficulties 
before with the bin subdirectory's contents, so I tend to access the 
appropriate binaries inside their package directories, anyway. It's probably 
just good fortune that something in my mind remembers the right kind of 
location to investigate.

>  2. If you compiled the binary yourself, make sure to add the '-g' flag
>     to the compiler options. For L4Re applications using the L4Re build
>     infrastructure this is done automatically, see 1.

I think that getting programs built outside the L4Re build framework would be 
too advanced for me.

> Next question: Is your binary linked statically or does it use dynamic
> libraries? You can find this out by doing
> 
>   objdump -p <filename>
> 
> If the output contains at least one line with 'NEEDED' then your binary
> uses dynamic libraries and looking for the program counter can be more
> difficult if the fault happens in a dynamic library because the library
> code is relocated to an unknown address when the library is loaded at
> program start.
> 
> Therefore for debugging it's always advisable to use statically linked
> binaries. If your application uses the L4Re build infrastructure, set
> 
>   MODE = static
> 
> in the Makefile. If you use your own Makefile, make sure to add
> 
>   -static
> 
> to the linker flags.
> 
> Exploring your application binary is always the first advisable strategy
> for such an exception.

Here, I was using shared libraries, so I have now switched the linking of the 
offending program to be static.
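
As a rough sketch of that check, the following Python snippet pulls the 'NEEDED' entries out of text produced by 'objdump -p'. The sample text and the library names in it are made up for illustration, and the function names are hypothetical; only the 'NEEDED' keyword comes from the advice above.

```python
# Hypothetical excerpt of 'objdump -p <filename>' output; the library
# names are placeholders, not taken from a real L4Re binary.
SAMPLE_OUTPUT = """\
Dynamic Section:
  NEEDED               libexample_a.so
  NEEDED               libexample_b.so
  INIT                 0x0000000001000000
"""


def needed_libraries(objdump_p_output):
    """Return the library names listed in 'NEEDED' entries."""
    libraries = []
    for line in objdump_p_output.splitlines():
        fields = line.split()
        # A dynamic dependency appears as: NEEDED <library name>
        if len(fields) == 2 and fields[0] == "NEEDED":
            libraries.append(fields[1])
    return libraries


def is_dynamically_linked(objdump_p_output):
    """True if at least one 'NEEDED' entry is present."""
    return bool(needed_libraries(objdump_p_output))
```

An empty result from needed_libraries() would suggest that the binary is statically linked, which is the situation recommended for debugging.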

[Details of the current thread and the return instruction address...]

> Remember: You are inspecting the region mapper thread which is != the
> thread which triggered the exception! Therefore, if you press <space>
> at the word marked as 'Return frame: IP', you will see the code for
> 'enter_kdebug()'. That doesn't help you.

This was certainly very useful advice, saving me quite some potential 
frustration, along with this:

> Now use the 'lp' view to see the list of present threads in the system. The
> cursor is placed at the current thread (the region mapper of your
> application). Look around for threads with the same 'sp' value (sp = space,
> the address space of the application). See this example:
> 
>    id  cpu    name             pr     sp  wait    to state
>    20   0     hello             2     1c    1d       ready,rcv_wait
>    1d   0     #hello           ff     1c             ready
>     d   0     moe              ff      c     -       ready,rcv_wait
>     b   0     sigma0            1      a     -       ready,rcv_wait
>     9   1     -----             0      1             ready
>     8   3     -----             0      1             ready
>     7   2     -----             0      1             ready
>     6   0     -----             0      1             ready
> 
> (this setup emulates 4 CPUs, thus there are 4 idle threads)
> 
> Thread '1d' is the region mapper thread of the hello application. 'hello'
> has 2 threads, thread 1d and thread 20. Thread 20 is currently waiting
> for an IPC from thread 1d. Therefore thread 20 is the one you want to
> inspect. Go there and press enter. Then move the TCB stack cursor down
> to 'Return frame: IP' as I told you before, see there:

OK, so following these instructions, I think I correctly identified the 
waiting thread in the same "space" as the region mapper thread. Navigating to 
the return instruction address does indeed show the reported address:

L4Re[rm]: unhandled read page fault at 0x70 pc=0x100491b

And if I look in the objdump output, at least on some occasions, I can find an 
instruction which would be causing the exception. The code looks like this:

 100490f:       49 8b 04 24             mov    (%r12),%rax
 1004913:       4c 89 ee                mov    %r13,%rsi
 1004916:       31 d2                   xor    %edx,%edx
 1004918:       4c 89 e7                mov    %r12,%rdi
 100491b:       ff 50 70                callq  *0x70(%rax)

It is at this final instruction that the exception occurs, and the offset is 
as reported, too.
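
Searching a long disassembly by hand is tedious, so here is a minimal Python sketch of the lookup, assuming the 'objdump -d' text has been captured into a string. The excerpt is the one shown above; the function name and the 'context' parameter are hypothetical.

```python
import re

# Disassembly excerpt as shown in the objdump output above.
DISASSEMBLY = """\
 100490f:       49 8b 04 24             mov    (%r12),%rax
 1004913:       4c 89 ee                mov    %r13,%rsi
 1004916:       31 d2                   xor    %edx,%edx
 1004918:       4c 89 e7                mov    %r12,%rdi
 100491b:       ff 50 70                callq  *0x70(%rax)
"""

# Instruction lines start with a hexadecimal address followed by a colon.
ADDRESS_LINE = re.compile(r"^\s*([0-9a-f]+):\s")


def find_instruction(disassembly, pc, context=2):
    """Return the line at address 'pc' preceded by up to 'context' lines.

    An empty list is returned if no instruction starts at 'pc'.
    """
    lines = disassembly.splitlines()
    for index, line in enumerate(lines):
        match = ADDRESS_LINE.match(line)
        if match and int(match.group(1), 16) == pc:
            start = max(0, index - context)
            return lines[start:index + 1]
    return []


for line in find_instruction(DISASSEMBLY, 0x100491b):
    print(line)
```

The address is compared numerically rather than as text, so leading zeros or padding in the objdump output should not matter.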

The awkward thing here, though, is that the offending instruction is a virtual 
method call within the same instance:

this->flush_flexpage(flexpage);
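
To make the relationship between that instruction and the reported fault address explicit, here is a small Python sketch. It assumes 8-byte pointers (AMD64) and the usual GCC object layout in which the first word of an object with virtual methods points to its table of virtual functions; the function name and the returned field names are hypothetical.

```python
import re

# Matches an indirect call operand such as '*0x70(%rax)'.
CALL_OPERAND = re.compile(r"\*(0x[0-9a-f]+)\((%\w+)\)")


def decode_indirect_call(instruction, fault_address, pointer_size=8):
    """Relate an indirect call operand to a page fault address.

    Returns the base register, the table slot selected by the
    displacement, and the value the register must have held for the
    memory read to hit 'fault_address'. Returns None if the instruction
    has no operand of the expected form.
    """
    match = CALL_OPERAND.search(instruction)
    if match is None:
        return None
    displacement = int(match.group(1), 16)
    return {
        "register": match.group(2),
        # Assumption: the table is an array of pointer-sized entries.
        "slot": displacement // pointer_size,
        "base": fault_address - displacement,
    }


info = decode_indirect_call(
    "100491b:  ff 50 70  callq  *0x70(%rax)", fault_address=0x70)
print(info)
```

Under these assumptions, a fault at 0x70 means that %rax held zero when the call was attempted. Since %rax was loaded from (%r12) a few instructions earlier without faulting, the object itself appears to have been readable while its first word was zero, which may fit an object that had been cleared or released elsewhere.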

As I think I noted in my previous message, concurrency issues may be involved 
here, and I rather think I may need to step back and consider whether I am 
doing things well enough.

Paul



_______________________________________________
l4-hackers mailing list
l4-hackers@os.inf.tu-dresden.de
http://os.inf.tu-dresden.de/mailman/listinfo/l4-hackers
