Re: [HelenOS-devel] Fibrils and threads

Jan Mareš Fri, 05 Jun 2015 00:40:48 -0700

Hello Jakub,

Any input on that piece of code I sent in the previous message?
On 4. 6. 2015 15:24, "Jan Mareš" <[email protected]> wrote:


> I see, yes I realized my suggestion was wrong yesterday when I had a
> better look at it after I returned to my PC.
>
> Anyway, I promised you that I would try to reproduce the problems with
> threads and fibrils on plain fibrils. I think I have it, unless there is a
> bug in my code. Have a look at [1]. It's the smallest extract I was able to
> create to reproduce the race condition I'm running into. If I set
> PREEMPTIVNESS to 0, everything works fine; if I set it to 1, I start to get
> page faults. Judging by the stack traces from taskdump, it seems to me that
> async_futex is not doing its job, though stack traces can be misleading.
>
> [1]
> http://bazaar.launchpad.net/~maresja1/helenos/qemu_porting/view/2212/uspace/app/posixtest/posixtest.c
>
> 2015-06-04 15:03 GMT+02:00 Jakub Jermar <[email protected]>:
>
>> On 3.6.2015 23:32, Jan Mareš wrote:
>> > Isn't it because udebug kernel module is trying to dump all the threads
>> > and the third one is simply not there?
>> >
>> > Just a guess - I've seen the code briefly.
>>
>> No, there rather seems to be some problem with udebug_thread_e_event()
>> for an interrupted thread that calls thread_exit() from syscall_handler().
>>
>> We have a thread, which is udebug.active, but its go_call is NULL. And
>> then in udebug_thread_e_event() we do:
>>
>>         call_t *call = THREAD->udebug.go_call;
>>
>>         THREAD->udebug.go_call = NULL;
>>         IPC_SET_RETVAL(call->data, 0);
>>
>> But since call is NULL, the kernel panics.
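To make the failure mode above concrete, here is a minimal user-space sketch of the same pattern with a NULL check added. It is not the HelenOS kernel code and not a proposed fix: call_t and udebug_state_t are simplified stand-ins, answer_go_call() is a hypothetical helper, and the retval field stands in for what IPC_SET_RETVAL() sets. It only shows that taking go_call, clearing it, and testing it before use avoids the dereference.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the real HelenOS definitions. */
typedef struct {
	int retval;  /* stands in for the value IPC_SET_RETVAL() would set */
} call_t;

typedef struct {
	bool active;      /* thread takes part in a debugging session */
	call_t *go_call;  /* pending GO request, or NULL if there is none */
} udebug_state_t;

/*
 * Answer the pending GO call, if there is one.
 * Returns true if a call was answered, false if go_call was NULL.
 */
static bool answer_go_call(udebug_state_t *ud)
{
	call_t *call = ud->go_call;

	ud->go_call = NULL;
	if (call == NULL)
		return false;  /* nothing to answer; skip the dereference */

	call->retval = 0;
	return true;
}
```

Whether silently skipping the answer is the right behaviour for an active thread with no go_call is a separate question; the sketch only illustrates the guard.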
>>
>> For more context:
>>
>> The thread becomes interrupted as a result of our attempt to kill the
>> task and probably was interrupted from a sleep in some other syscall. We
>> killed the task because it became hung. In fact, it was not hung, but
>> crashed and taskdump was run on it, so that explains why its refcount
>> was up by one (it was effectively being debugged by taskdump), but, for
>> some reason, taskdump could not finish its job.
>>
>> Jakub
>>
>> >
>> > On 3. 6. 2015 21:41, "Jakub Jermar" <[email protected]> wrote:
>> >
>> >     On 06/03/2015 05:54 PM, Jan Mareš wrote:
>> >     > Thank you very much too :). Any luck with the second one?
>> >
>> >     The out-of-sync number of threads can IMHO be explained by the
>> >     fact that the statistics source is task_t::refcount, which can be
>> >     temporarily incremented in sections between task_hold() and
>> >     task_release(); that is probably what happens here. You would see
>> >     the off-by-one number if the thread lingered in such a section for
>> >     some reason. In any case, the increased thread count is nothing to
>> >     worry about. Maybe we should try using task_t::lifecount for the
>> >     statistics instead.
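The off-by-one described above can be illustrated with a small model. This is a sketch under assumptions, not the kernel's task_t: the struct layout and the model_* helpers are invented for illustration, and the real counters are updated atomically. It assumes refcount counts threads plus temporary holds, while lifecount counts only threads that have not exited.

```c
#include <stddef.h>

/* Hypothetical model; field names follow the thread, the layout is invented. */
typedef struct {
	size_t refcount;   /* threads plus temporary task_hold() references */
	size_t lifecount;  /* threads that have not exited yet */
} task_model_t;

/* A new thread is counted in both fields. */
static void model_thread_create(task_model_t *t)
{
	t->refcount++;
	t->lifecount++;
}

/* A temporary hold bumps only refcount. */
static void model_task_hold(task_model_t *t)
{
	t->refcount++;
}

/* Dropping the hold brings refcount back in line with lifecount. */
static void model_task_release(task_model_t *t)
{
	t->refcount--;
}
```

With two threads and one outstanding hold, a statistic read from refcount reports three while lifecount still reports two, which matches the observed discrepancy.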
>> >
>> >     As for the second panic itself, the kernel encountered the page
>> >     fault exception while it was executing udebug_thread_e_event().
>> >     Not sure why. Will have to debug this.
>> >
>> >     Best,
>> >     Jakub
>> >
>> >
>> >     _______________________________________________
>> >     HelenOS-devel mailing list
>> >     [email protected] <mailto:[email protected]>
>> >     http://lists.modry.cz/listinfo/helenos-devel
>> >
>>
>
>

