Hi,

are you saying that a thread is not stoppable while it's blocked waiting for
a futex? That would be obviously wrong. Threads must be stoppable while
blocked in any syscall that's blocking by design - otherwise you would
get a poor debugging experience.

As for what happened when the sleep was forcefully interrupted - I don't
really follow you. I'd probably need to refresh my memory and look through
the code. We can discuss this at the meeting.

Cheers,
Jiri


From: Jakub Jermar <[email protected]>


On 06/10/2015 10:50 AM, Jan Mareš wrote:
> Wow, that's one hell of a setup. So the right solution is to be prepared
> for not receiving UDEBUG_M_GO. If I understand correctly and this
> happens asynchronously to taskdump, then this could be also reproduced
> by killing a task while taskdump is in the middle of dumping it. It
> would have to be a very precise moment, but thanks to glib-test-rec-mutex
> this moment was a bit prolonged :).
> 
> Shouldn't taskdump wait for the response to UDEBUG_M_BEGIN with some
> timeout then? And if it doesn't receive the response it could kill it
> automatically, especially when taskdump was called as a result of a crash.

The fact that the thread was not in the debug state when it entered the
syscall, but only became debugged while it was blocking in it, also plays
a role. Otherwise I think it would have waited for the go properly. So it
appears there are two possible solutions. Either detect that the go is not
there and simply abort the udebug processing, in which case we would
lose the ability to debug this thread, or, after the interruption from
the syscall, somehow detect that the thread became debugged in the
meantime, make sure the UDEBUG_M_BEGIN is answered and wait for the go.
And of course, there is the question of whether blocking threads should
be interrupted from their sleep as they are marked for debugging.
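
As a rough illustration of the first option (notice that the go is missing
and abort instead of dereferencing it), here is a small user-space model in
C. It is only a sketch: go_request_t, thread_debug_t, report_thread_exit()
and the REPORT_* values are hypothetical names made up for this example and
are not the kernel's udebug structures or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a pending GO request; not the kernel's type. */
typedef struct {
    int answered;   /* set to 1 once an event has been reported */
    int event;      /* which event was reported */
} go_request_t;

/* Hypothetical, simplified per-thread debugging state. */
typedef struct {
    bool debugged;      /* thread has been marked for debugging */
    go_request_t *go;   /* pending GO request, or NULL if none arrived */
} thread_debug_t;

enum { EVENT_THREAD_EXIT = 1 };

typedef enum {
    REPORT_SENT,        /* event delivered through the pending GO */
    REPORT_NO_SESSION,  /* thread is not being debugged at all */
    REPORT_NO_GO        /* debugged, but there is no GO to answer */
} report_result_t;

/* Report a thread-exit event without assuming that a GO is pending. */
report_result_t report_thread_exit(thread_debug_t *t)
{
    assert(t != NULL);

    if (!t->debugged)
        return REPORT_NO_SESSION;

    if (t->go == NULL) {
        /* Option one: give up on debugging this thread, do not crash. */
        t->debugged = false;
        return REPORT_NO_GO;
    }

    /* Normal path: answer the pending GO with the exit event. */
    t->go->event = EVENT_THREAD_EXIT;
    t->go->answered = 1;
    t->go = NULL;
    return REPORT_SENT;
}
```

The second option would replace the REPORT_NO_GO branch with code that
answers the outstanding begin request and then waits for a go; that part is
not modelled here.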

> When you have time, have a look at that file with fibrils and threads. It
> would be really interesting to know why it doesn't work, or
> indeed beneficial to be able to use fibrils and threads like that in the future.

I have spent some time on this during the weekend and even found one bug
that may have been causing some of the crashes: the fibril_list was
being accessed unprotected by any futex. But there appear to be more
issues and I haven't found them yet.
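
To illustrate the kind of fix meant here, the sketch below guards every
access to a shared list with one lock. It is an assumption-laden stand-in:
a POSIX mutex takes the place of the futex, and demo_node_t, demo_list_add(),
demo_list_remove() and demo_list_count() are hypothetical names, not the
actual fibril_list code.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical list node; the real fibril structure is not shown here. */
typedef struct demo_node {
    struct demo_node *next;
    int id;
} demo_node_t;

static demo_node_t *demo_head = NULL;
/* A POSIX mutex stands in for the futex that should guard the list. */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Insert a node at the head of the list while holding the lock. */
void demo_list_add(demo_node_t *n)
{
    pthread_mutex_lock(&demo_lock);
    n->next = demo_head;
    demo_head = n;
    pthread_mutex_unlock(&demo_lock);
}

/* Unlink a node while holding the lock; returns false if it was not found. */
bool demo_list_remove(demo_node_t *n)
{
    bool found = false;

    pthread_mutex_lock(&demo_lock);
    for (demo_node_t **pp = &demo_head; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == n) {
            *pp = n->next;
            n->next = NULL;
            found = true;
            break;
        }
    }
    pthread_mutex_unlock(&demo_lock);
    return found;
}

/* Count the nodes; even read-only traversal takes the lock. */
size_t demo_list_count(void)
{
    size_t count = 0;

    pthread_mutex_lock(&demo_lock);
    for (demo_node_t *cur = demo_head; cur != NULL; cur = cur->next)
        count++;
    pthread_mutex_unlock(&demo_lock);
    return count;
}
```

The point is only that readers take the lock as well as writers; a traversal
that races with an unlink is enough to corrupt or crash.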

Jakub

> 
> 2015-06-09 23:27 GMT+02:00 Jakub Jermar <[email protected]>:
> 
> On 06/05/2015 09:05 PM, Jakub Jermar wrote:
> > On 06/05/2015 08:15 PM, Jan Mareš wrote:
> >> Concerning the second panic, I am convinced that threads can't be used
> >> as execution containers. There is a race condition causing memory
> >> corruption in the process. How resistant udebug is to corrupted memory
> >> of the process I don't know, but first I would try to remove the race
> >> condition.
> >
> > I am not that convinced about this one. Udebug is in the kernel and here
> > we are dealing with uspace stuff.
> >
> > I can see a thread which is interrupted from futex_sleep() and so it
> > executes thread_exit(), where it assumes that it has a go call (in the
> > udebug parlance), but that's not the case, so the kernel panics.
> >
> > udebug should probably check for the go in this case, to behave
> > gracefully. On the other hand, it is not clear why taskdump (which is
> > run on the crashing test) cannot finish its job in a timely manner.
> 
> I think I have figured out some of the missing bits of the mosaic.
> 
> So glib-test-rec-mutex creates two threads. One thread blocks in
> SYS_FUTEX_SLEEP on a futex, most likely held by the other thread. This
> second thread however crashes for a reason that does not bother us now.
> The crash results in a debugging attempt by taskdump and the creation of
> the kbox debugging kernel thread in the glib-test-rec-mutex task. This
> is our third thread, btw.
> 
> Taskdump sends UDEBUG_M_BEGIN to the debugged glib-test-rec-mutex. The
> kbox kernel thread receives this call and processes it by marking all
> the other threads as debugged, but does not answer it, because one of
> the task's threads is still not stoppable: the one blocked in
> SYS_FUTEX_SLEEP.
> 
> The two tasks (glib-test-rec-mutex and taskdump) are now waiting for
> each other in a sort of a deadlock. There is no-one to up the futex for
> which the SYS_FUTEX_SLEEP thread is waiting and the debugging session
> cannot proceed until all threads become stoppable and the UDEBUG_M_BEGIN
> call is answered.
> 
> The only way out is to explicitly kill the glib-test-rec-mutex task.
> That results in the SYS_FUTEX_SLEEP being forcefully interrupted from
> the sleep and a call to thread_exit() in syscall_handler(), which in
> turn calls udebug_thread_e_event(). But all this happens asynchronously
> to taskdump, which does not manage to send the UDEBUG_M_GO message in
> time. udebug_thread_e_event() does not check for this possibility: it
> assumes UDEBUG_M_GO is ready and triggers the kernel panic by
> dereferencing a NULL pointer.
> 
> Jakub
> 
> _______________________________________________
> HelenOS-devel mailing list
> [email protected] <mailto:[email protected]>
> http://lists.modry.cz/listinfo/helenos-devel
> 

