Re: [ECOS] Scheduler startup question

Michael Jones Tue, 04 Mar 2014 09:09:40 -0800

Christophe,

If you are looking at my SourceForge code, the integration branch has my latest 
code with the trylock in Vectors.S.


Mike

On Mar 4, 2014, at 9:51 AM, Michael Jones <[email protected]> wrote:

> Christophe,
> 
> What I mean is that the lock call shown in the code you quoted below is not in 
> the eCos code base. So when I said I added code, I meant the code you quoted below.
> 
> I removed that code and moved it to Vectors.S, where it is now a trylock 
> rather than the main lock call. (My latest code on SourceForge does not have 
> the lock call shown below.)
> 
> When the lock was taken in interrupt_end, it did not deadlock. When it was 
> taken in Vectors.S, it deadlocked.
> 
> The functional difference is that when the lock was taken in Vectors.S, it 
> was taken before the ISR was called.
> 
> But as I said, I have not tried to find the root cause of the deadlock.
> 
> Perhaps I can try the kernel instrumentation when I have some time this 
> weekend.
> 
> Mike
> 
> On Mar 4, 2014, at 9:16 AM, christophe <[email protected]> wrote:
> 
>> Michael,
>> 
>> I am not sure what you mean by adding code in interrupt_end to take the 
>> lock. The locking mechanism is already present for SMP targets; no change is required:
>> 
>> externC void
>> interrupt_end(
>>   cyg_uint32          isr_ret,
>>   Cyg_Interrupt       *intr,
>>   HAL_SavedRegisters  *regs
>>   )
>> {
>> //    CYG_REPORT_FUNCTION();
>> 
>> #ifdef CYGPKG_KERNEL_SMP_SUPPORT
>>   Cyg_Scheduler::lock();
>> #endif
>> 
>> The macro for incrementing the lock in SMP looks at the current owner of the 
>> lock and spins when required.
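As a rough illustration of the owner-aware increment Christophe describes, here is a minimal C11 model of an SMP scheduler lock: a nesting count plus the ID of the CPU that holds it. This is not the eCos macro. The names `sched_lock_t`, `sched_lock_inc`, `sched_lock_dec` and `NO_OWNER` are hypothetical, and CPU IDs are passed in explicitly instead of being read from hardware. It shows why a core that does not own the lock has to spin rather than simply bump the count.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model, not eCos code: at most one CPU owns the lock,
 * and only that CPU may change the nesting count. */
#define NO_OWNER (-1)

typedef struct {
    atomic_int owner;   /* ID of the CPU holding the lock, or NO_OWNER */
    unsigned   count;   /* nesting depth, only touched by the owner */
} sched_lock_t;

static void sched_lock_init(sched_lock_t *l)
{
    atomic_init(&l->owner, NO_OWNER);
    l->count = 0;
}

/* Increment on behalf of `cpu`: nest if this CPU already owns the lock,
 * otherwise spin until the current owner releases it. */
static void sched_lock_inc(sched_lock_t *l, int cpu)
{
    if (atomic_load(&l->owner) == cpu) {
        l->count++;
        return;
    }
    int expected = NO_OWNER;
    while (!atomic_compare_exchange_weak(&l->owner, &expected, cpu))
        expected = NO_OWNER;    /* a failed CAS stored the current owner here */
    l->count = 1;
}

/* Decrement on behalf of `cpu`. Returns false if `cpu` is not the owner,
 * which is the situation that trips assertions when the count is bumped blindly. */
static bool sched_lock_dec(sched_lock_t *l, int cpu)
{
    if (atomic_load(&l->owner) != cpu)
        return false;
    assert(l->count > 0);
    if (--l->count == 0)
        atomic_store(&l->owner, NO_OWNER);   /* release to other CPUs */
    return true;
}
```

The model also hints at one possible deadlock: if the owning CPU is itself waiting for something from the spinning core (for example, an inter-CPU interrupt that core cannot service while it spins), neither side makes progress.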
>> 
>> I found the kernel instrumentation option very useful for debugging 
>> deadlocks. I used the CodeConfidence plugin in Eclipse to analyze the trace, 
>> which makes debugging pretty efficient.
>> 
>> Christophe
>> 
>> On 3/4/2014 4:58 PM, Michael Jones wrote:
>>> Christophe,
>>> 
>>> When I first got SMP to work, I added some code in interrupt_end to take the 
>>> lock, but I moved it back to Vectors.S because I was trying to reduce 
>>> changes to the kernel. Functionally, the only difference is whether the 
>>> lock is taken before or after the ISR executes.
>>> 
>>> My bigger concern is how the lock is taken. When I increment the lock count, 
>>> the core doing so (core 0) may not be the holder of the lock, which leads 
>>> to assertions. And if it spins while taking the lock, it deadlocks. I have 
>>> not tracked down the deadlock, but I think the problem is in the scheduler, 
>>> where some secondary CPU is waiting.
>>> 
>>> My current solution is to use a trylock in Vectors.S and live with the 
>>> fact that when it fails, it will take another real-time clock interrupt to 
>>> try again. So interrupt_end is not guaranteed to be called on each 
>>> interrupt. This keeps things simple. All interrupts go to core 0 except 
>>> inter-CPU interrupts. Some latency is added because taking the lock is not 
>>> guaranteed.
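The trylock approach described above can be modeled roughly as follows. This is a hypothetical C11 sketch, not the actual Vectors.S code: `on_interrupt`, `isr`, `sched_lock_try` and the counters are illustrative stand-ins. It assumes the ISR itself still runs when the trylock fails, and that only the interrupt_end step (DSR posting and lock release) is skipped.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock: owner CPU ID (-1 = free) plus an owner-only nesting count. */
typedef struct {
    atomic_int owner;
    unsigned   count;
} sched_lock_t;

/* Never spins: succeeds if the lock is free or already held by `cpu`. */
static bool sched_lock_try(sched_lock_t *l, int cpu)
{
    if (atomic_load(&l->owner) == cpu) {
        l->count++;
        return true;
    }
    int expected = -1;
    if (atomic_compare_exchange_strong(&l->owner, &expected, cpu)) {
        l->count = 1;
        return true;
    }
    return false;   /* another CPU holds it; give up immediately */
}

static void sched_lock_release(sched_lock_t *l)
{
    assert(l->count > 0);
    if (--l->count == 0)
        atomic_store(&l->owner, -1);
}

/* Counters standing in for real work so the control flow can be checked. */
static unsigned isr_runs, interrupt_end_runs;

static void isr(void) { isr_runs++; }   /* e.g. acknowledge the device */

static void interrupt_end(sched_lock_t *l)
{
    interrupt_end_runs++;               /* post DSRs, maybe reschedule */
    sched_lock_release(l);
}

/* Interrupt entry: try the lock before the ISR, call interrupt_end only on success. */
static void on_interrupt(sched_lock_t *l, int cpu)
{
    bool locked = sched_lock_try(l, cpu);
    isr();
    if (locked)
        interrupt_end(l);
    /* else: interrupt_end is skipped for this interrupt */
}
```

The trade-off is the one described in the message: the interrupt path never blocks on another core, at the cost of occasionally skipping interrupt_end.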
>>> 
>>> Other ways to handle this are to send interrupts to all cores, use 
>>> inter-core interrupts, etc., in an effort to guarantee that the lock is 
>>> incremented by the core that holds it.
>>> 
>>> I was not able to figure out how i386 handled this. Does anyone know how 
>>> the i386 SMP port incremented the lock if the core that got the interrupt 
>>> did not hold the lock?
>>> 
>>> Mike
>>> 
>>> 
>>> On Mar 4, 2014, at 8:37 AM, christophe <[email protected]> wrote:
>>> 
>>>> Hi Michael,
>>>> 
>>>> I might remember wrong, but I think that for an SMP target the lock is not 
>>>> taken in Vectors.S but directly after entering interrupt_end. Of course, 
>>>> this is spinlock-based, so it might delay posting/scheduling of the DSR.
>>>> 
>>>> Christophe
>>>> 
>>>> On 3/2/2014 9:19 PM, Michael Jones wrote:
>>>>> Jurgen,
>>>>> 
>>>>> I think I now fully understand how scheduler locking works during an 
>>>>> interrupt. Vectors.S takes the lock, and interrupt_end clears it. 
>>>>> However, the normal technique of incrementing the lock count does not 
>>>>> work with SMP. The problem is that another CPU may hold the lock. 
>>>>> Incrementing anyway leads to assertions. Spinning until the lock is free 
>>>>> can lead to deadlocks or an unresponsive network application.
>>>>> 
>>>>> So I changed Vectors.S so that, during an interrupt, it only attempts to 
>>>>> take the lock: it tries the spinlock, and the attempt might fail. 
>>>>> If the lock is taken, interrupt_end is called. If the attempt fails, 
>>>>> interrupt_end is not called.
>>>>> 
>>>>> This means that a DSR may not be posted on that interrupt. This can add 
>>>>> latency that depends on the real-time clock interrupt rate or the time 
>>>>> until the next thread switch. However, it is stable and assertion-free. 
>>>>> Also, a HAL could implement a timeout on the try spinlock, which might 
>>>>> reduce latency.
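One way to read the timeout idea is a trylock that retries a bounded number of times before giving up. This is a minimal C11 sketch with hypothetical names (`spin_trylock_bounded`, `spin_unlock`); a real HAL would more likely bound the wait with a hardware timer and add an architecture-specific wait hint (such as WFE on ARM) inside the loop.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Try to take `lock`, making at most `attempts` tries.
 * attempts == 1 behaves like a plain trylock; larger values trade
 * interrupt latency for a better chance of getting the lock. */
static bool spin_trylock_bounded(atomic_bool *lock, unsigned attempts)
{
    for (unsigned i = 0; i < attempts; i++) {
        /* Cheap read first, so contended spinning does not keep writing the line. */
        if (!atomic_load_explicit(lock, memory_order_relaxed) &&
            !atomic_exchange_explicit(lock, true, memory_order_acquire))
            return true;    /* lock was free and is now ours */
    }
    return false;           /* still held elsewhere after `attempts` tries */
}

static void spin_unlock(atomic_bool *lock)
{
    assert(atomic_load_explicit(lock, memory_order_relaxed));  /* must be held */
    atomic_store_explicit(lock, false, memory_order_release);
}
```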
>>>>> 
>>>>> To support trying the lock and testing whether it was taken, I had to add 
>>>>> some functions to the kernel. The following wiki page has been updated to 
>>>>> reflect the kernel changes:
>>>>> 
>>>>> https://sourceforge.net/p/ecosfreescale/wiki/SMP%20Kernel/
>>>>> 
>>>>> Anyone with SMP knowledge might want to take a look. There may be better 
>>>>> solutions to some of these problems. But at least for now, the IMX6 SMP 
>>>>> HAL seems stable and I can run IO intensive Lua scripts over telnet 
>>>>> reliably, even when the client aborts.
>>>>> 
>>>>> The client abort means telnet has to kill a thread. This was quite a 
>>>>> challenge. Telnet creates a separate heap for Lua so it can kill the 
>>>>> thread and reclaim its memory. The remaining problem is closing file 
>>>>> handles. I still sometimes get assertions when a handle is killed by a 
>>>>> thread that does not own it. I don't think that can be solved without 
>>>>> adding some new functions dedicated to cleaning up file handles from an 
>>>>> outside thread.
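The separate-heap trick can be sketched as a region (arena) allocator: everything the script thread allocates comes from one block, so whoever kills the thread can reclaim all of it with a single free(). This is a hypothetical C11 sketch (`arena_t`, `arena_init`, `arena_alloc`, `arena_destroy` are illustrative names). A real Lua heap also needs free and realloc, since Lua routes allocation through the `lua_Alloc` function passed to `lua_newstate`, so an actual implementation would run a complete allocator inside the region rather than this bump allocator. File handles live outside the region, so freeing it does not close them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical region: one block owned by the script thread. */
typedef struct {
    unsigned char *base;
    size_t size;
    size_t used;
} arena_t;

static int arena_init(arena_t *a, size_t size)
{
    a->base = malloc(size);
    a->size = a->base ? size : 0;
    a->used = 0;
    return a->base != NULL;
}

/* Bump allocation, aligned for any object type; NULL when the region is full. */
static void *arena_alloc(arena_t *a, size_t n)
{
    assert(a->base != NULL);   /* region must be initialized */
    const size_t align = _Alignof(max_align_t);
    size_t start = (a->used + align - 1) & ~(align - 1);
    if (start > a->size || n > a->size - start)
        return NULL;
    a->used = start + n;
    return a->base + start;
}

/* Called by the thread doing the kill: reclaims every allocation at once. */
static void arena_destroy(arena_t *a)
{
    free(a->base);
    a->base = NULL;
    a->size = a->used = 0;
}
```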
>>>>> 
>>>>> Mike
>>>>> 
>>>>> 
>>>>> 
>>>>> On Feb 26, 2014, at 11:40 PM, Lambrecht Jürgen <[email protected]> 
>>>>> wrote:
>>>>> 
>>>>>> As far as I know, the scheduler is started after cyg_user_start(), which 
>>>>>> your application uses to initialize everything. Do you use cyg_user_start()?
>>>>>> 
>>>>>> 
>>>>>> Sent from Samsung Mobile
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> -------- Original message --------
>>>>>> From: Michael Jones <[email protected]>
>>>>>> Date:
>>>>>> To: ecos discuss <[email protected]>
>>>>>> Subject: [ECOS] Scheduler startup question
>>>>>> 
>>>>>> 
>>>>>> I have a question about proper scheduler locking startup behavior.
>>>>>> 
>>>>>> The context is that I am cleaning up my iMX6 HAL and trying to make it 
>>>>>> work without a couple of kernel hacks I had added.
>>>>>> 
>>>>>> The question has to do with sched_lock. By default this has a value of 
>>>>>> 1, so during startup the scheduler is locked.
>>>>>> 
>>>>>> When there is an interrupt, sched_lock is incremented in Vectors.S, and 
>>>>>> decremented in interrupt_end.
>>>>>> 
>>>>>> However, I am getting an assert in sync.h, which is part of the BSD 
>>>>>> stack. The assert fires because it expects the lock to be zero.
>>>>>> 
>>>>>> The question is, during the startup process, how does the lock get set 
>>>>>> to zero after initialization? Is it supposed to stay 1 while hardware is 
>>>>>> initialized and through all the constructors, etc? Is it cleared by the 
>>>>>> scheduler somehow? Is the HAL supposed to zero it at some point during 
>>>>>> startup?
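To make the counting concrete, here is a simplified single-CPU model of the behavior described in this message: the lock starts at 1, each interrupt increments and then decrements it, and deferred work (standing in for DSRs) only runs when the count drops to zero. It is a hypothetical sketch, not eCos code; all names are illustrative. In particular, `scheduler_start()` simply assumes that starting the scheduler (after cyg_user_start(), per Jürgen's reply) is what drops the count to zero, and `scheduler_unlocked()` stands in for the kind of check the network stack makes.

```c
#include <assert.h>

/* Hypothetical single-CPU model; names are illustrative, not eCos identifiers. */
static unsigned sched_lock = 1;     /* locked from reset, as described above */
static unsigned dsrs_pending;       /* deferred work posted by interrupts */
static unsigned dsrs_run;           /* deferred work actually executed */

static void lock_inc(void) { sched_lock++; }

/* Deferred work only runs when the count returns to zero. */
static void lock_dec(void)
{
    assert(sched_lock > 0);
    if (--sched_lock == 0) {
        dsrs_run += dsrs_pending;
        dsrs_pending = 0;
    }
}

/* Vectors.S increments; interrupt_end posts a DSR and decrements. */
static void fake_interrupt(void)
{
    lock_inc();
    dsrs_pending++;
    lock_dec();
}

/* Assumption: starting the scheduler is what clears the initial lock. */
static void scheduler_start(void)
{
    sched_lock = 0;
}

/* Hypothetical stand-in for a "lock must be zero" check in the stack. */
static int scheduler_unlocked(void) { return sched_lock == 0; }
```

In this model an interrupt during startup leaves the count at 1 and its DSR pending; only after the count reaches zero does deferred work run.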
>>>>>> 
>>>>>> My HAL is part of the ARM HAL, so if this is device-specific, it is the 
>>>>>> ARM HAL I am working with.
>>>>>> 
>>>>>> Mike
>>>>>> --
>>>>>> Before posting, please read the FAQ: http://ecos.sourceware.org/fom/ecos
>>>>>> and search the list archive: http://ecos.sourceware.org/ml/ecos-discuss
>>>>>> 
>>>> 
>> 
>> 
> 

