On October 22, 2015 1:37:18 PM CDT, Isaac Gutekunst <isaac.guteku...@vecna.com> wrote:
>I think I may have some information that's actually useful.
>
>I've managed to actually execute some tests.... and lots of them are
>failing.
>
>sp01 and sp02 fail quite quickly, as an assertion fails.
>
>assertion "first != _Chain_Tail( &ready_queues[ index ] )" failed: file
>"../../cpukit/../../../stm32f7x/lib/include/rtems/score/schedulerpriorityimpl.h",
>line 166, function: _Scheduler_priority_Ready_queue_first
>
>This failure is common to many of the failed tests so far. What does
>this mean?
>
Does hello run?

>Isaac
>
>On 10/22/2015 09:16 AM, Jay Doyle wrote:
>>
>>
>> On 10/22/2015 01:40 AM, Sebastian Huber wrote:
>>>
>>>
>>> On 21/10/15 15:48, Jay Doyle wrote:
>>>>
>>>>
>>>> On 10/21/2015 09:35 AM, Sebastian Huber wrote:
>>>>>
>>>>>
>>>>> On 21/10/15 15:08, Isaac Gutekunst wrote:
>>>>>>
>>>>>>
>>>>>> On 10/21/2015 09:00 AM, Sebastian Huber wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 21/10/15 14:56, Isaac Gutekunst wrote:
>>>>>>>> On 10/21/2015 08:24 AM, Sebastian Huber wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 21/10/15 14:13, Isaac Gutekunst wrote:
>>>>>>>>>> Thanks for the reply.
>>>>>>>>>>
>>>>>>>>>> On 10/21/2015 01:50 AM, Sebastian Huber wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 20/10/15 16:02, Isaac Gutekunst wrote:
>>>>>>>>> [...]
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> As far as I can tell this would only occur if the caller of
>>>>>>>>>>>> pthread_mutex_lock was in a "bad" state. I don't believe it is in
>>>>>>>>>>>> an interrupt context, and don't know what other bad states could
>>>>>>>>>>>> exist.
>>>>>>>>>>>
>>>>>>>>>>> We have
>>>>>>>>>>>
>>>>>>>>>>> #define _CORE_mutex_Check_dispatch_for_seize(_wait) \
>>>>>>>>>>>   (!_Thread_Dispatch_is_enabled() \
>>>>>>>>>>>    && (_wait) \
>>>>>>>>>>>    && (_System_state_Get() >= SYSTEM_STATE_UP))
>>>>>>>>>>>
>>>>>>>>>>> What is the thread dispatch disable level and the system state
>>>>>>>>>>> at this point?
>>>>>>>>>>>
>>>>>>>>>>> In case the thread dispatch disable level is not zero, then
>>>>>>>>>>> something is probably broken in the operating system code which
>>>>>>>>>>> is difficult to find. Could be a general memory corruption
>>>>>>>>>>> problem too. Which RTEMS version do you use?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The thread dispatch disable level is usually -1 or -2.
>>>>>>>>>> (0xFFFFFFFE or 0xFFFFFFD).
>>>>>>>>>
>>>>>>>>> A negative value is very bad, but easy to detect via manual
>>>>>>>>> instrumentation (only a handful of spots touch this variable) or
>>>>>>>>> hardware breakpoints/watchpoints. Does the rest of
>>>>>>>>> _Per_CPU_Information look all right?
>>>>>>>>>
>>>>>>>> It looks like it's only the thread_dispatch_disable_level that's
>>>>>>>> broken.
>>>>>>>>
>>>>>>>> We'll go and grep for all the places it's touched, and look for
>>>>>>>> something.
>>>>>>>>
>>>>>>>> The problem with watchpoints is they fire exceptionally often, and
>>>>>>>> putting in a conditional watchpoint slows the code to a crawl, but
>>>>>>>> that may be worth it.
>>>>>>>>
>>>>>>>> Here are some printouts of the relevant structs right after a crash:
>>>>>>>>
>>>>>>>> $4 = {
>>>>>>>>   cpu_per_cpu = {<No data fields>},
>>>>>>>>   isr_nest_level = 0,
>>>>>>>>   thread_dispatch_disable_level = 4294967295,
>>>>>>>>   executing = 0xc01585c8,
>>>>>>>>   heir = 0xc0154038,
>>>>>>>>   dispatch_necessary = true,
>>>>>>>>   time_of_last_context_switch = {
>>>>>>>>     sec = 2992,
>>>>>>>>     frac = 10737447432380511034
>>>>>>>>   },
>>>>>>>>   Stats = {<No data fields>}
>>>>>>>> }
>>>>>>>
>>>>>>> No, this doesn't look good. According to the stack trace you are in
>>>>>>> thread context. However, we have executing != heir and
>>>>>>> dispatch_necessary == true. This is a broken state itself. I guess
>>>>>>> something is wrong with the interrupt level so that a context
>>>>>>> switch is blocked. On ARMv7-M this is done via the system call
>>>>>>> exception.
>>>>>>>
>>>>>> This is a bit beyond my RTEMS knowledge. What would you advise
>>>>>> looking into?
>>>>>
>>>>> I would try to instrument the code to figure out where the thread
>>>>> dispatch disable level goes negative.
>>>>>
>>>>
>>>> We just did. I added a check in _ARMV7M_Interrupt_service_leave to
>>>> see if the _Thread_Dispatch_disable_level is positive before
>>>> decrementing it and this eventually fails.
>>>>
>>>> I'm not sure if this tells us much because I think the call itself
>>>> is correct. In this particular case it is processing an I2C interrupt.
>>>> I will try to see if we can capture information about the sequence of
>>>> changes to the _Thread_Dispatch_disable_level just before the point
>>>> at which we know something is clearly wrong (i.e., decreasing it
>>>> below zero).
>>>
>>> Since the isr_nest_level is 0, I don't think it's a problem with the
>>> spots that use _ARMV7M_Interrupt_service_leave(). Did you check the
>>> interrupt priorities? See also
>>>
>>> https://lists.rtems.org/pipermail/users/2015-June/029155.html
>>>
>> Thanks for the pointer to this posting. It seems like a very similar
>> situation to what we are experiencing -- especially considering that
>> we invoke an RTEMS call in our Ethernet ISR. Unfortunately, all our
>> interrupts use the default interrupt priority level set in the BSP
>> header file as:
>>
>> #define BSP_ARMV7M_IRQ_PRIORITY_DEFAULT (13 << 4)
>>
>> which should mean that they are all non-NMIs unless we explicitly
>> set their interrupt level lower.
>>
>>
>>
>>
>> _______________________________________________
>> devel mailing list
>> devel@rtems.org
>> http://lists.rtems.org/mailman/listinfo/devel

--joel
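
For reference, the instrumentation discussed in this thread (recording the sequence of changes to the thread dispatch disable level and catching the decrement from zero) could be sketched roughly as below. This is a standalone illustration under assumptions, not RTEMS code: dispatch_trace, trace_disable, trace_enable, level_as_signed and TRACE_DEPTH are hypothetical names, and the level field only stands in for thread_dispatch_disable_level. The sketch refuses to decrement below zero so the trace stays readable, and level_as_signed shows why the debugger value 4294967295 corresponds to -1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical trace depth: how many recent changes are kept. */
#define TRACE_DEPTH 8u

typedef struct {
  const char *site;   /* caller-supplied label, e.g. __func__ */
  uint32_t    before; /* level value before the change was attempted */
  int32_t     delta;  /* +1 for a disable, -1 for an enable */
} trace_entry;

typedef struct {
  uint32_t    level;             /* stand-in for the disable level */
  trace_entry ring[TRACE_DEPTH]; /* most recent changes, overwritten in turn */
  uint32_t    count;             /* total number of recorded changes */
  bool        underflow;         /* latched once a decrement from zero is seen */
} dispatch_trace;

/* Store one attempted change in the ring buffer. */
static void trace_record(dispatch_trace *t, const char *site, int32_t delta)
{
  assert(t != NULL && site != NULL);
  trace_entry *e = &t->ring[t->count % TRACE_DEPTH];
  e->site = site;
  e->before = t->level;
  e->delta = delta;
  ++t->count;
}

/* Record and apply an increment of the level. */
static void trace_disable(dispatch_trace *t, const char *site)
{
  trace_record(t, site, +1);
  ++t->level;
}

/*
 * Record a decrement. If the level is already zero, latch the underflow
 * flag and return false instead of wrapping around to 0xFFFFFFFF.
 */
static bool trace_enable(dispatch_trace *t, const char *site)
{
  trace_record(t, site, -1);
  if (t->level == 0u) {
    t->underflow = true;
    return false;
  }
  --t->level;
  return true;
}

/* Interpret an unsigned 32-bit level as a two's complement signed value. */
static int32_t level_as_signed(uint32_t level)
{
  if (level > (uint32_t)INT32_MAX) {
    return -(int32_t)(UINT32_MAX - level) - 1;
  }
  return (int32_t)level;
}
```

In real firmware, the point where trace_enable returns false would be the natural place to halt or set a breakpoint, since the ring buffer then holds the last few changes leading up to the bad decrement.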