On 06/05/2015 09:05 PM, Jakub Jermar wrote:
> On 06/05/2015 08:15 PM, Jan Mareš wrote:
>> Concerning the second panic, I am convinced that threads can't be used
>> as execution containers. There is a race condition causing memory
>> corruption in the process. How resistant udebug is to corrupted memory
>> of the process I don't know, but first I would try to remove the race
>> condition.
> 
> I am not that convinced about this one. Udebug is in the kernel and here
> we are dealing with uspace stuff.
> 
> I can see a thread which is interrupted from futex_sleep() and so it
> executes thread_exit(), where it assumes that it has a go call (in the
> udebug parlance), but that's not the case, so the kernel panics.
> 
> udebug should probably check for the go in this case, to behave
> gracefully. On the other hand, it is not clear why taskdump (which is
> run on the crashing test) cannot finish its job in a timely manner.

I think I have figured out some of the missing bits of the mosaic.

So glib-test-rec-mutex creates two threads. One thread blocks in
SYS_FUTEX_SLEEP on a futex, most likely held by the other thread. This
second thread, however, crashes for a reason that does not concern us
here. The crash triggers a debugging attempt by taskdump and the
creation of the kbox debugging kernel thread in the glib-test-rec-mutex
task. This is our third thread, btw.

Taskdump sends UDEBUG_M_BEGIN to the debugged glib-test-rec-mutex. The
kbox kernel thread receives this call and processes it by marking all
the other threads as debugged, but does not answer it, because one of
the task's threads is still not stoppable: the one blocked in
SYS_FUTEX_SLEEP.

The two tasks (glib-test-rec-mutex and taskdump) are now waiting for
each other in a sort of deadlock. There is no one left to up the futex
on which the SYS_FUTEX_SLEEP thread is waiting, and the debugging
session cannot proceed until all threads become stoppable and the
UDEBUG_M_BEGIN call is answered.
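
To make the stall concrete, here is a toy model of the "answer
UDEBUG_M_BEGIN only once every thread is stoppable" rule described
above. This is only a sketch under my own assumptions: session_model_t
and both functions are hypothetical names made up for illustration, not
the actual kernel structures or code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a debugging session; not the kernel's type. */
typedef struct {
	size_t not_stoppable;  /* threads currently not stoppable */
	bool begin_pending;    /* BEGIN received but not yet answered */
	bool begin_answered;   /* BEGIN has been answered */
} session_model_t;

/* The debugger's BEGIN request arrives. */
static void begin_received(session_model_t *s)
{
	assert(s != NULL);
	if (s->not_stoppable == 0)
		s->begin_answered = true;   /* everyone stoppable: answer now */
	else
		s->begin_pending = true;    /* defer the answer */
}

/* One thread reaches a stoppable state. */
static void thread_became_stoppable(session_model_t *s)
{
	assert(s != NULL && s->not_stoppable > 0);
	s->not_stoppable--;
	if (s->not_stoppable == 0 && s->begin_pending) {
		s->begin_pending = false;
		s->begin_answered = true;   /* last thread unblocks BEGIN */
	}
}
```

With not_stoppable == 1 (the futex sleeper), begin_received() leaves
the request pending, and it stays pending for as long as nobody calls
thread_became_stoppable(), which is exactly the situation above.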

The only way out is to explicitly kill the glib-test-rec-mutex task.
That forcefully interrupts the thread sleeping in SYS_FUTEX_SLEEP, which
then calls thread_exit() in syscall_handler(), which in turn calls
udebug_thread_e_event(). But all this happens asynchronously to
taskdump, which does not manage to send the UDEBUG_M_GO message in
time. udebug_thread_e_event() does not check for this possibility: it
assumes a UDEBUG_M_GO call is pending and triggers the kernel panic by
dereferencing a NULL pointer.
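
A graceful version of that path could look roughly like the sketch
below. It is a standalone model under my own assumptions, not a patch:
call_t here, udebug_thread_model_t and thread_e_event_model() are
hypothetical stand-ins for illustration, and the real function would
also need the proper locking and event bookkeeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an IPC call; not the kernel's call_t. */
typedef struct {
	int retval;
	bool answered;
} call_t;

/* Hypothetical per-thread debugging state. */
typedef struct {
	bool active;      /* thread belongs to a debugging session */
	bool go;          /* a GO request is currently pending */
	call_t *go_call;  /* the pending GO request, or NULL */
} udebug_thread_model_t;

/*
 * Model of the thread-exit event. Returns true if a pending GO was
 * answered, false if there was nothing to answer.
 */
static bool thread_e_event_model(udebug_thread_model_t *ut)
{
	assert(ut != NULL);

	if (!ut->active)
		return false;

	if (!ut->go || ut->go_call == NULL) {
		/* No GO arrived in time: leave quietly instead of panicking. */
		ut->active = false;
		return false;
	}

	/* Normal case: report the event by answering the GO request. */
	ut->go_call->retval = 0;
	ut->go_call->answered = true;
	ut->go_call = NULL;
	ut->go = false;
	ut->active = false;
	return true;
}
```

The point is just the early return: when the thread exits without a GO
pending, there is no call to answer, so the function must not touch
go_call at all.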

Jakub

_______________________________________________
HelenOS-devel mailing list
http://lists.modry.cz/listinfo/helenos-devel