Christophe,

If you are looking at my SourceForge code, the integration branch is my latest code, with the trylock in Vectors.S.

Mike

On Mar 4, 2014, at 9:51 AM, Michael Jones <[email protected]> wrote:

> Christophe,
>
> What I mean is the lock shown in the code you put below is not in the eCos
> code database. So when I said I added code, I added the code you put below.
>
> I removed that code and moved it to Vectors.S, where it is now a trylock,
> rather than the main lock call. (My latest code on SourceForge does not have
> this lock call shown below.)
>
> When the lock was called in interrupt_end, it did not deadlock. When it was
> called in Vectors.S, it deadlocked.
>
> The functional difference is that when the lock was called in Vectors.S, it
> was called before the ISR was called.
>
> But as I said, I have not tried to find the root cause of the deadlock.
>
> Perhaps I can try the kernel instrumentation when I have some time this
> weekend.
>
> Mike
>
> On Mar 4, 2014, at 9:16 AM, christophe <[email protected]> wrote:
>
>> Michael,
>>
>> I am not sure what you mean by adding code in interrupt_end to take the
>> lock. The locking mechanism is present for SMP targets, no change required:
>>
>> externC void
>> interrupt_end(
>>     cyg_uint32 isr_ret,
>>     Cyg_Interrupt *intr,
>>     HAL_SavedRegisters *regs
>> )
>> {
>> // CYG_REPORT_FUNCTION();
>>
>> #ifdef CYGPKG_KERNEL_SMP_SUPPORT
>>     Cyg_Scheduler::lock();
>> #endif
>>
>> The macro for incrementing the lock in SMP looks at the current owner of the
>> lock and spins when required.
>>
>> I found the kernel instrumentation option very useful for debugging
>> deadlocks. I was using the CodeConfidence plugin in Eclipse to analyze the
>> trace, which makes debugging pretty efficient.
>>
>> Christophe
>>
>> On 3/4/2014 4:58 PM, Michael Jones wrote:
>>> Christophe,
>>>
>>> When I first got SMP to work I added some code in interrupt_end to take the
>>> lock, but I moved it back to Vectors.S because I was trying to reduce
>>> changes to the kernel. Functionally, the only difference is getting the
>>> lock before the ISR is executed or not.
>>>
>>> My bigger concern is how the lock is taken. When I increase the lock count,
>>> the core doing so (core 0) may not be the holder of the lock, which leads
>>> to assertions. And if it spins while taking the lock, it deadlocks. I have
>>> not traced down the deadlock, but I think the problem is in the scheduler,
>>> where some secondary CPU is waiting.
>>>
>>> My current solution is to use a trylock in Vectors.S and live with the
>>> fact that when it fails, it will take another real time clock interrupt to
>>> try again. So interrupt_end is not guaranteed to be called on each
>>> interrupt. This keeps things simple. All interrupts go to core 0 except
>>> inter-CPU interrupts. Some latency is added because taking the lock is not
>>> guaranteed.
>>>
>>> Other ways to handle this are to send interrupts to all cores, use
>>> inter-core interrupts, etc., in an effort to guarantee a lock is
>>> incremented by the core that holds the lock.
>>>
>>> I was not able to figure out how i386 handled this. Does anyone know how
>>> the i386 SMP incremented the lock if the core that got the interrupt did
>>> not hold the lock?
>>>
>>> Mike
>>>
>>>
>>> On Mar 4, 2014, at 8:37 AM, christophe <[email protected]> wrote:
>>>
>>>> Hi Michael,
>>>>
>>>> I might remember wrong, but I think in the case of an SMP target the lock
>>>> is not taken in Vectors.S but directly after entering interrupt_end. Of
>>>> course this is spinlock based, so it might delay posting/scheduling of
>>>> the DSR.
>>>>
>>>> Christophe
>>>>
>>>> On 3/2/2014 9:19 PM, Michael Jones wrote:
>>>>> Jurgen,
>>>>>
>>>>> I think I fully understand how the scheduler locking works during
>>>>> interrupt now. Vectors.S takes the lock, and interrupt_end clears it.
>>>>> However, the normal technique of incrementing the lock count does not
>>>>> work with SMP. The problem is that another CPU may have the lock.
>>>>> Incrementing anyway leads to assertions.
>>>>> Attempting to take the lock with
>>>>> the spinlock can lead to deadlocks or an unresponsive network application.
>>>>>
>>>>> So I changed things so that in Vectors.S, during an interrupt, an attempt
>>>>> at locking is made. This means trying to take a spinlock that might fail.
>>>>> If the lock is taken, interrupt_end is called. If the lock fails,
>>>>> interrupt_end is not called.
>>>>>
>>>>> This means that a DSR may not be posted on that interrupt. This can cause
>>>>> some latency based on the real time clock interrupt rate, or time until a
>>>>> thread switch. However, it is stable and assertion free. Also, a HAL
>>>>> could implement a timeout on the try spinlock which might reduce latency.
>>>>>
>>>>> To support the try and testing if the lock was taken, I had to add some
>>>>> functions to the kernel. The following wiki page has been updated to
>>>>> reflect the kernel changes.
>>>>>
>>>>> https://sourceforge.net/p/ecosfreescale/wiki/SMP%20Kernel/
>>>>>
>>>>> Anyone with SMP knowledge might want to take a look. There may be better
>>>>> solutions to some of these problems. But at least for now, the IMX6 SMP
>>>>> HAL seems stable and I can run IO intensive Lua scripts over telnet
>>>>> reliably, even when the client aborts.
>>>>>
>>>>> The client abort means telnet has to kill a thread. This was quite a
>>>>> challenge. Telnet is creating a separate heap for Lua so it can kill the
>>>>> thread and reclaim memory. The remaining problem is closing file handles.
>>>>> I still sometimes get assertions when a handle is killed by a thread
>>>>> that does not own it. I don't think that can be solved without adding
>>>>> some new functions dedicated to clean up of file handles by an outside
>>>>> thread.
>>>>>
>>>>> Mike
>>>>>
>>>>>
>>>>>
>>>>> On Feb 26, 2014, at 11:40 PM, Lambrecht Jürgen <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> As far as I know the scheduler is started after cyg_user_start(), used
>>>>>> by your application to initialize everything.
>>>>>> Do you use cyg_user_start?
>>>>>>
>>>>>>
>>>>>> Sent from Samsung Mobile
>>>>>>
>>>>>>
>>>>>>
>>>>>> -------- Original message --------
>>>>>> From: Michael Jones <[email protected]>
>>>>>> Date:
>>>>>> To: ecos discuss <[email protected]>
>>>>>> Subject: [ECOS] Scheduler startup question
>>>>>>
>>>>>>
>>>>>> I have a question about proper scheduler locking startup behavior.
>>>>>>
>>>>>> The context is I am cleaning up my iMX6 HAL and attempting to make
>>>>>> things work without a couple of kernel hacks I added to make it work.
>>>>>>
>>>>>> The question has to do with sched_lock. By default this has a value of
>>>>>> 1, so during startup the scheduler is locked.
>>>>>>
>>>>>> When there is an interrupt, sched_lock is incremented in Vectors.S, and
>>>>>> decremented in interrupt_end.
>>>>>>
>>>>>> However, I am getting an assert in sync.h which is part of the BSD
>>>>>> stack. The assert is because it expects the lock to be zero.
>>>>>>
>>>>>> The question is, during the startup process, how does the lock get set
>>>>>> to zero after initialization? Is it supposed to stay 1 while hardware is
>>>>>> initialized and through all the constructors, etc? Is it cleared by the
>>>>>> scheduler somehow? Is the HAL supposed to zero it at some point during
>>>>>> startup?
>>>>>>
>>>>>> My HAL is part of the ARM HAL, so if this is device specific, it is the
>>>>>> ARM HAL I am working with.
>>>>>>
>>>>>> Mike
>>>>>> --
>>>>>> Before posting, please read the FAQ: http://ecos.sourceware.org/fom/ecos
>>>>>> and search the list archive: http://ecos.sourceware.org/ml/ecos-discuss
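
The trylock approach discussed in this thread can be sketched in C roughly as follows. This is only an illustration, not the eCos kernel code or the iMX6 HAL code: the names sched_try_lock, sched_unlock, on_interrupt and NO_OWNER are hypothetical, and C11 atomics stand in for whatever spinlock primitive the HAL really uses. It assumes an owner-tracked, nestable lock where the owning CPU may increment the count, any other CPU fails immediately instead of spinning, and a failed attempt simply skips the interrupt_end-style work until a later interrupt.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NO_OWNER (-1)   /* hypothetical marker: no CPU holds the lock */

static atomic_int lock_owner = NO_OWNER; /* CPU currently holding the lock */
static unsigned   lock_count = 0;        /* nesting depth, touched only by the owner */

/* Try to take (or re-enter) the scheduler lock for `cpu` without spinning. */
static bool sched_try_lock(int cpu)
{
    if (atomic_load(&lock_owner) == cpu) {
        lock_count++;                    /* owner may nest freely */
        return true;
    }
    int expected = NO_OWNER;
    if (atomic_compare_exchange_strong(&lock_owner, &expected, cpu)) {
        lock_count = 1;                  /* lock was free, we now own it */
        return true;
    }
    return false;                        /* another CPU owns it: give up, do not spin */
}

/* Drop one nesting level; release ownership when the count reaches zero. */
static void sched_unlock(int cpu)
{
    /* Mirrors the kind of assertion hit when a non-owner touches the count. */
    assert(atomic_load(&lock_owner) == cpu && lock_count > 0);
    if (--lock_count == 0)
        atomic_store(&lock_owner, NO_OWNER);
}

/* Interrupt path: returns false when the lock was busy and the
 * interrupt_end-style work (posting DSRs) had to be deferred. */
static bool on_interrupt(int cpu)
{
    if (!sched_try_lock(cpu))
        return false;                    /* retried on a later interrupt */
    /* ... run the ISR and post any DSR here ... */
    sched_unlock(cpu);
    return true;
}
```

The trade-off is the one described above: no deadlock and no assertion from a non-owner incrementing the count, at the cost of extra latency whenever the attempt fails.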
