On 06/05/2015 09:05 PM, Jakub Jermar wrote:
> On 06/05/2015 08:15 PM, Jan Mareš wrote:
>> Concerning the second panic, I am convinced that threads can't be used
>> as execution containers. There is a race condition causing memory
>> corruption in the process. How resistant udebug is to corrupted memory
>> of the process I don't know, but first I would try to remove the race
>> condition.
>
> I am not that convinced about this one. Udebug is in the kernel and here
> we are dealing with uspace stuff.
>
> I can see a thread which is interrupted from futex_sleep() and so it
> executes thread_exit(), where it assumes that it has a go call (in the
> udebug parlance), but that's not the case, so the kernel panics.
>
> udebug should probably check for the go in this case, to behave
> gracefully. On the other hand, it is not clear, why taskdump (which is
> run on the crashing test) cannot finish its job in a timely manner.

I think I have figured out some of the missing bits of the mosaic.

So glib-test-rec-mutex creates two threads. One thread blocks in
SYS_FUTEX_SLEEP on a futex, most likely held by the other thread. This
second thread, however, crashes for a reason that does not bother us now.
The crash results in a debugging attempt by taskdump and the creation of
the kbox debugging kernel thread in the glib-test-rec-mutex task. This is
our third thread, btw.

Taskdump sends UDEBUG_M_BEGIN to the debugged glib-test-rec-mutex. The
kbox kernel thread receives this call and processes it by marking all the
other threads as debugged, but does not answer it, because one of the
task's threads is still not stoppable: the one blocked in
SYS_FUTEX_SLEEP.

The two tasks (glib-test-rec-mutex and taskdump) are now waiting for each
other in a sort of deadlock. There is no one to up the futex on which the
SYS_FUTEX_SLEEP thread is waiting, and the debugging session cannot
proceed until all threads become stoppable and the UDEBUG_M_BEGIN call is
answered.

The only way out is to explicitly kill the glib-test-rec-mutex task. That
results in the SYS_FUTEX_SLEEP thread being forcefully interrupted from
its sleep and in a call to thread_exit() in syscall_handler(), which in
turn calls udebug_thread_e_event(). But all this happens asynchronously to
taskdump, which does not manage to send the UDEBUG_M_GO message in time.
udebug_thread_e_event() does not check for this possibility: it assumes
the UDEBUG_M_GO call is there and triggers the kernel panic by
dereferencing a NULL pointer.

Jakub
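
To illustrate the missing check described above, here is a minimal,
stand-alone C sketch. It is a user-space model only, not the HelenOS
kernel code: the types fake_call_t and fake_udebug_thread_t and the
function thread_exit_event() are hypothetical names invented for this
example, and the real udebug_thread_e_event() takes care of locking and
IPC details that are left out here. It only shows the idea of testing for
a pending GO call before touching it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a pending IPC call (not the kernel's type). */
typedef struct {
    int answered;            /* set to 1 once the call has been answered */
} fake_call_t;

/* Hypothetical per-thread debugging state, modelled on the description. */
typedef struct {
    bool active;             /* thread belongs to a debugging session */
    fake_call_t *go_call;    /* pending GO request, NULL if none arrived */
} fake_udebug_thread_t;

typedef enum {
    EVENT_DELIVERED,         /* a GO call was pending and got answered */
    EVENT_SKIPPED            /* no session or no GO call: nothing to do */
} event_result_t;

/*
 * Model of the thread-exit event. A thread killed while the debugger is
 * still waiting for its BEGIN answer has active == true but no GO call
 * yet, so the pointer must be tested before it is dereferenced.
 */
event_result_t thread_exit_event(fake_udebug_thread_t *ut)
{
    assert(ut != NULL);

    if (!ut->active)
        return EVENT_SKIPPED;

    /* The guard: the debugger may not have sent GO in time. */
    if (ut->go_call == NULL)
        return EVENT_SKIPPED;

    /* Normal path: answer the pending GO call and forget it. */
    ut->go_call->answered = 1;
    ut->go_call = NULL;
    return EVENT_DELIVERED;
}
```

Without the second test, the killed-thread case would fall through to the
dereference of go_call, which corresponds to the panic seen here. Whether
the event should be silently dropped or reported some other way in the
real kernel is a separate design question that this sketch does not
settle.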
