I see. Yes, I realized yesterday that my suggestion was wrong, when I had a better look at it after returning to my PC.
Anyway, I promised you I would try to reproduce the problems with threads and fibrils using plain fibrils only. I think I have it, unless there is a bug in my code. Have a look at [1]. It's the smallest extract I was able to create that reproduces the race condition I'm running into. If I set PREEMPTIVNESS to 0, everything works fine; if I set it to 1, I start getting page faults. Given the stack traces from taskdump, it seems to me that async_futex is not doing its job. But stack traces can be misleading as well.

[1] http://bazaar.launchpad.net/~maresja1/helenos/qemu_porting/view/2212/uspace/app/posixtest/posixtest.c

2015-06-04 15:03 GMT+02:00 Jakub Jermar <[email protected]>:
> On 3.6.2015 23:32, Jan Mareš wrote:
> > Isn't it because the udebug kernel module is trying to dump all the threads
> > and the third one is simply not there?
> >
> > Just a guess - I've only seen the code briefly.
>
> No, there rather seems to be some problem with udebug_thread_e_event()
> for an interrupted thread that calls thread_exit() from syscall_handler().
>
> We have a thread which is udebug.active, but its go_call is NULL. And
> then in udebug_thread_e_event() we do:
>
>     call_t *call = THREAD->udebug.go_call;
>
>     THREAD->udebug.go_call = NULL;
>     IPC_SET_RETVAL(call->data, 0);
>
> But since call is NULL, the kernel panics.
>
> For more context:
>
> The thread becomes interrupted as a result of our attempt to kill the
> task and was probably interrupted from a sleep in some other syscall. We
> killed the task because it appeared to be hung. In fact, it was not hung,
> but had crashed and taskdump was run on it, which explains why its
> refcount was up by one (it was effectively being debugged by taskdump).
> For some reason, however, taskdump could not finish its job.
>
> Jakub
>
> >
> > On 3 Jun 2015 21:41, "Jakub Jermar" <[email protected]
> > <mailto:[email protected]>> wrote:
> >
> > On 06/03/2015 05:54 PM, Jan Mareš wrote:
> > > Thank you very much too :).
> > > Any luck with the second one?
> >
> > The out-of-sync number of threads can IMHO be explained by the fact that
> > the statistics source is task_t::refcount, which can be temporarily
> > incremented in sections between task_hold() and task_release(); that is
> > probably what happens here. You would see the off-by-one number if the
> > thread lingered in such a section for some reason. In any case, the
> > increased thread count is nothing to worry about. Maybe we should try
> > using task_t::lifecount for the statistics instead.
> >
> > As for the second panic itself, the kernel encountered the page fault
> > exception while it was executing udebug_thread_e_event(). Not sure why.
> > Will have to debug this.
> >
> > Best,
> > Jakub
> >
> >
> > _______________________________________________
> > HelenOS-devel mailing list
> > [email protected] <mailto:[email protected]>
> > http://lists.modry.cz/listinfo/helenos-devel
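For illustration only, here is a minimal, self-contained sketch of a guard that would avoid the NULL dereference in the quoted udebug_thread_e_event() snippet. The types call_t, call_data_t and thread_udebug_t and the function answer_go_call() are hypothetical stand-ins, not the real HelenOS kernel definitions, and a plain assignment to retval stands in for IPC_SET_RETVAL(). Whether simply skipping the reply is the right fix, rather than finding out why go_call is NULL for an active thread in the first place, is an open question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel types;
 * they model only the fields used in the quoted snippet. */
typedef struct {
    int retval;
} call_data_t;

typedef struct {
    call_data_t data;
} call_t;

typedef struct {
    bool active;      /* the thread is being debugged */
    call_t *go_call;  /* pending GO request; may be NULL */
} thread_udebug_t;

/*
 * Guarded variant of the quoted sequence: consume the pending GO call
 * and answer it only if one exists. Returns true if a call was
 * answered, false if there was nothing to answer.
 */
static bool answer_go_call(thread_udebug_t *ud)
{
    assert(ud != NULL);

    call_t *call = ud->go_call;
    ud->go_call = NULL;

    if (call == NULL) {
        /* No GO call pending (e.g. the thread was interrupted while
         * its task was being killed), so there is no one to reply to. */
        return false;
    }

    call->data.retval = 0;  /* stands in for IPC_SET_RETVAL(call->data, 0) */
    return true;
}
```

Called on a thread whose go_call is NULL, answer_go_call() returns false and leaves go_call NULL instead of dereferencing it.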
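A toy model of why reporting task_t::refcount as the thread count can be off by one. It assumes, as an illustration rather than a description of the actual task_t, that refcount counts live threads plus temporary task_hold() references, while lifecount counts live threads only. The names task_model_t and model_*() are hypothetical.

```c
#include <assert.h>

/* Hypothetical model of the two task counters; not the real task_t.
 * Assumption: refcount = live threads + temporary holders,
 *             lifecount = live threads only. */
typedef struct {
    int refcount;
    int lifecount;
} task_model_t;

/* A new thread adds both a reference and a "life". */
static void model_thread_create(task_model_t *t)
{
    t->refcount++;
    t->lifecount++;
}

/* An exiting thread drops both counters. */
static void model_thread_exit(task_model_t *t)
{
    assert(t->lifecount > 0 && t->refcount > 0);
    t->lifecount--;
    t->refcount--;
}

/* Mirrors the task_hold()/task_release() pattern: a temporary
 * reference that keeps the task alive but is not a thread. */
static void model_task_hold(task_model_t *t)
{
    t->refcount++;
}

static void model_task_release(task_model_t *t)
{
    assert(t->refcount > 0);
    t->refcount--;
}
```

With two threads and one extra holder (for example a debugger such as taskdump), refcount reads 3 while lifecount reads 2, which matches the off-by-one thread count discussed in this thread.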
