Hello Jakub,

Any input on that piece of code I sent in the previous message?

On 4 June 2015 at 15:24, "Jan Mareš" <[email protected]> wrote:
> I see. Yes, I realized yesterday that my suggestion was wrong when I had a
> better look at it after I returned to my PC.
>
> Anyway, I promised you that I would try to reproduce the problems with
> threads and fibrils on plain fibrils. I think I have it, unless there is a
> bug in my code. Have a look at [1]. It's the smallest extract I was able to
> create that reproduces the race condition I'm running into. If I set
> PREEMPTIVNESS to 0, everything works fine; if I set it to 1, I start to get
> page faults. Given the stack traces from taskdump, it seems to me that
> async_futex is not doing its job. But stack traces can be misleading as well.
>
> [1]
> http://bazaar.launchpad.net/~maresja1/helenos/qemu_porting/view/2212/uspace/app/posixtest/posixtest.c
>
> 2015-06-04 15:03 GMT+02:00 Jakub Jermar <[email protected]>:
>
>> On 3.6.2015 23:32, Jan Mareš wrote:
>> > Isn't it because the udebug kernel module is trying to dump all the
>> > threads and the third one is simply not there?
>> >
>> > Just a guess - I've only looked at the code briefly.
>>
>> No, there rather seems to be some problem with udebug_thread_e_event()
>> for an interrupted thread that calls thread_exit() from syscall_handler().
>>
>> We have a thread which is udebug.active, but its go_call is NULL. And
>> then in udebug_thread_e_event() we do:
>>
>>     call_t *call = THREAD->udebug.go_call;
>>
>>     THREAD->udebug.go_call = NULL;
>>     IPC_SET_RETVAL(call->data, 0);
>>
>> But since call is NULL, the kernel panics.
>>
>> For more context:
>>
>> The thread becomes interrupted as a result of our attempt to kill the
>> task and probably was interrupted from a sleep in some other syscall. We
>> killed the task because it appeared to be hung. In fact, it was not hung,
>> but had crashed and taskdump was run on it. That explains why its refcount
>> was up by one (it was effectively being debugged by taskdump), but, for
>> some reason, taskdump could not finish its job.
>>
>> Jakub
>>
>> >
>> > On 3 June 2015 at 21:41, "Jakub Jermar" <[email protected]> wrote:
>> >
>> >     On 06/03/2015 05:54 PM, Jan Mareš wrote:
>> >     > Thank you very much too :). Any luck with the second one?
>> >
>> >     The out-of-sync number of threads can IMHO be explained by the fact
>> >     that the statistics source is task_t::refcount, which can be
>> >     temporarily incremented in sections between task_hold() and
>> >     task_release(). That is probably what happens here. You would see
>> >     the off-by-one number if the thread lingered in such a section for
>> >     some reason. In any case, the increased thread count is nothing to
>> >     worry about. Maybe we should try using task_t::lifecount for the
>> >     statistics instead.
>> >
>> >     As for the second panic itself, the kernel encountered the page
>> >     fault exception while it was executing udebug_thread_e_event(). Not
>> >     sure why. Will have to debug this.
>> >
>> >     Best,
>> >     Jakub
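
For illustration, the NULL go_call failure described in the quoted message can be reproduced in miniature outside the kernel. The following is only a minimal sketch under stated assumptions: call_t and udebug_thread_t are simplified stand-ins, finish_go_call() is a hypothetical helper, and none of this is the actual HelenOS code or a proposed fix. It only shows the same read-then-clear pattern with a NULL check added before the dereference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the kernel's call structure (assumption: the real
 * call_t carries much more than a return value). */
typedef struct {
    int retval;
} call_t;

/* Stand-in for the per-thread udebug state mentioned in the thread. */
typedef struct {
    bool active;      /* thread is being debugged */
    call_t *go_call;  /* pending GO call, or NULL if there is none */
} udebug_thread_t;

/*
 * Hypothetical helper: takes the pending GO call, clears the pointer,
 * and sets the return value only if a call was actually pending.
 * Returns true if a call was answered, false if go_call was NULL.
 */
bool finish_go_call(udebug_thread_t *ut)
{
    assert(ut != NULL);

    call_t *call = ut->go_call;
    ut->go_call = NULL;

    if (call == NULL) {
        /* The quoted code dereferences call here unconditionally. */
        return false;
    }

    call->retval = 0;
    return true;
}
```

With a thread state whose go_call is NULL but whose active flag is set, finish_go_call() returns false instead of dereferencing a NULL pointer. Whether skipping the answer is the right behaviour in the kernel is a separate question that this sketch does not settle.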
_______________________________________________
HelenOS-devel mailing list
[email protected]
http://lists.modry.cz/listinfo/helenos-devel
