I see. Yes, I realized yesterday that my suggestion was wrong, once I had a
better look at it back at my PC.

Anyway, I promised you that I would try to reproduce the problems with
threads and fibrils on plain fibrils. I think I have it, unless there is a
bug in my code. Have a look at [1]. It's the smallest extract I was able to
create that reproduces the race condition I'm running into. If I set
PREEMPTIVNESS to 0, everything works fine; if I set it to 1, I start to get
page faults. Judging by the stack traces from taskdump, it seems to me that
async_futex is not doing its job, but stack traces can be misleading as well.

[1]
http://bazaar.launchpad.net/~maresja1/helenos/qemu_porting/view/2212/uspace/app/posixtest/posixtest.c

2015-06-04 15:03 GMT+02:00 Jakub Jermar <[email protected]>:

> On 3.6.2015 23:32, Jan Mareš wrote:
> > Isn't it because udebug kernel module is trying to dump all the threads
> > and the third one is simply not there?
> >
> > Just a guess - I've seen the code briefly.
>
> No, there rather seems to be some problem with udebug_thread_e_event()
> for an interrupted thread that calls thread_exit() from syscall_handler().
>
> We have a thread, which is udebug.active, but its go_call is NULL. And
> then in udebug_thread_e_event() we do:
>
>             call_t *call = THREAD->udebug.go_call;
>
>             THREAD->udebug.go_call = NULL;
>             IPC_SET_RETVAL(call->data, 0);
>
> But since call is NULL, the kernel panics.
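
To make the failure mode concrete, here is a minimal user-space sketch of the
sequence quoted above. The types call_sketch_t and udebug_sketch_t and the
function thread_e_event_sketch() are simplified stand-ins invented for
illustration, not the kernel definitions, and the NULL check only shows where
the dereference goes wrong. Whether a guard is the right fix, or whether
go_call should never be NULL for an active thread in the first place, is still
open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins; NOT the real HelenOS kernel types. */
typedef struct {
	int retval;
} call_sketch_t;

typedef struct {
	bool active;            /* thread is in a debugging session */
	call_sketch_t *go_call; /* pending GO request, may be NULL */
} udebug_sketch_t;

/*
 * Mirrors the quoted sequence: fetch go_call, clear it, set the return
 * value. Returns false instead of dereferencing when go_call is NULL.
 */
static bool thread_e_event_sketch(udebug_sketch_t *ud)
{
	call_sketch_t *call = ud->go_call;

	ud->go_call = NULL;
	if (call == NULL)
		return false; /* the quoted code dereferences NULL at this point */

	call->retval = 0; /* stands in for IPC_SET_RETVAL(call->data, 0) */
	return true;
}
```

With active set and go_call left NULL, the sketch returns false where the
quoted code would fault.
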
>
> For more context:
>
> The thread becomes interrupted as a result of our attempt to kill the
> task and was probably interrupted from a sleep in some other syscall. We
> killed the task because it appeared hung. In fact, it was not hung but had
> crashed, and taskdump was run on it. That explains why its refcount
> was up by one (it was effectively being debugged by taskdump), but for
> some reason taskdump could not finish its job.
>
> Jakub
>
> >
> > On 3. 6. 2015 at 21:41, "Jakub Jermar" <[email protected]
> > <mailto:[email protected]>> wrote:
> >
> >     On 06/03/2015 05:54 PM, Jan Mareš wrote:
> >     > Thank you very much too :). Any luck with the second one?
> >
> >     The out-of-sync number of threads can IMHO be explained by the fact
> >     that the statistics source is task_t::refcount, which can be
> >     temporarily incremented in sections between task_hold() and
> >     task_release(); that is probably what happens here. You would see the
> >     off-by-one number if the thread lingered in such a section for some
> >     reason. In any case, the increased thread count is nothing to worry
> >     about. Maybe we should try using task_t::lifecount for the statistics
> >     instead.
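
The off-by-one effect described above can be sketched in a few lines. This is
only an illustration under stated assumptions: task_sketch_t and the *_sketch
helpers are hypothetical stand-ins, and lifecount is assumed here to count
live threads only, without temporary holds.

```c
#include <assert.h>

/* Hypothetical stand-in for task_t; only the two counters matter here. */
typedef struct {
	int refcount;  /* threads plus temporary holds */
	int lifecount; /* assumed: live threads only */
} task_sketch_t;

/* Stand-ins for task_hold() / task_release(): they touch refcount only. */
static void task_hold_sketch(task_sketch_t *t)
{
	t->refcount++;
}

static void task_release_sketch(task_sketch_t *t)
{
	t->refcount--;
}

/* Thread count as the statistics would report it from each source. */
static int threads_by_refcount(const task_sketch_t *t)
{
	return t->refcount;
}

static int threads_by_lifecount(const task_sketch_t *t)
{
	return t->lifecount;
}
```

Between a hold and the matching release, threads_by_refcount() reports one
more than threads_by_lifecount(), which is the discrepancy seen in the
statistics.
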
> >
> >     As for the second panic itself, the kernel encountered the page fault
> >     exception while it was executing udebug_thread_e_event(). Not sure
> >     why. Will have to debug this.
> >
> >     Best,
> >     Jakub
> >
> >
> >     _______________________________________________
> >     HelenOS-devel mailing list
> >     [email protected] <mailto:[email protected]>
> >     http://lists.modry.cz/listinfo/helenos-devel