Re: Port of project from NuttX 7.30 to 10.1 RC1: Unexpected IRQ

Abdelatif Guettouche Wed, 26 May 2021 09:43:10 -0700

Maybe this one could help:
https://cwiki.apache.org/confluence/display/NUTTX/NuttX+9.1#NuttX9.1-CompatibilityConcerns


> I am using the flat (monolithic build) and I see no place that define
>this flag, at all.
>I dont even see a place in the codebase that defines this flag.

__KERNEL__ is defined in tools/Config.mk (line:100)

> The fact that mm_initialize only shows one region is weird... where is
the heap for the main RAM at 0x20000000?

CONFIG_MM_REGIONS needs to be set up correctly if you have multiple
heap regions.

On Wed, May 26, 2021 at 5:22 PM Sebastien Lorquet <[email protected]> wrote:
>
> Hello,
>
> Thanks for the remarks.
>
> I am using the flat (monolithic build) and I see no place that define
> this flag, at all.
>
> I dont even see a place in the codebase that defines this flag.
>
> I see nothing related to mm, nor anything outdated in my Make.defs,
> which is from my old setup, yes, but still similar to a recent one.
>
> Sebastien
>
> Le 26/05/2021 à 18:08, raiden00pl a écrit :
> > If you use CONFIG_BUILD_FLAT=y, make sure that __KERNEL__ flag is set here:
> > https://github.com/apache/incubator-nuttx/blob/master/include/nuttx/mm/mm.h#L85
> > I remember that at some point I had a similar hardfault in mm which doesn't
> > make sense and it was due to outdated board Make.defs.
> >
> > śr., 26 maj 2021 o 17:21 Sebastien Lorquet <[email protected]>
> > napisał(a):
> >
> >> Update: stack dump and register analysis are in fact pointing to a crash
> >> in mm_alloc
> >>
> >> I have enabled memory management debug:
> >>
> >> mm_initialize: Heap: start=0x10000000 size=65536
> >> mm_addregion: Region 1: base=0x10000154 size=65184
> >> stm32_netinitialize: Enabling PHY power
> >> stm32_netinitialize: PHY reset...
> >> stm32_netinitialize: PHY reset done.
> >> stm32_netinitialize: Configuring PHY int
> >> F
> >> mm_free: Freeing 0x70fb460b
> >> irq_unexpected_isr: ERROR irq: 3
> >> up_assert: Assertion failed at file:irq/irq_unexpectedisr.c line: 50
> >> up_registerdump: R0: 00000001 2000737c c00000f2 08000101 00000000
> >> 00000000 00000000 200073c8
> >> up_registerdump: R8: 00000000 00000000 00000000 00000000 00000000
> >> 200073c8 080126ad 080126f8
> >> up_registerdump: xPSR: 21000000 PRIMASK: 00000000 CONTROL: 00000000
> >> up_registerdump: EXC_RETURN: fffffff9
> >> up_dumpstate: sp:         200072c8
> >> up_dumpstate: stack base: 20007078
> >> up_dumpstate: stack size: 00000400
> >>
> >> The fact that mm_initialize only shows one region is weird... where is
> >> the heap for the main RAM at 0x20000000?
> >>
> >> the mm_free(0x70fb460b) is not what causes the hardfault (it comes
> >> later), but what the hell is is this invalid address!
> >>
> >> This is the first call to mm_free, here is the backtrace:
> >>
> >> Breakpoint 1, mm_free (heap=0x200060b4 <g_mmheap>, mem=0x70fb460b) at
> >> mm_heap/mm_free.c:85
> >> 85        if (!mem)
> >> (gdb) bt
> >> #0  mm_free (heap=0x200060b4 <g_mmheap>, mem=0x70fb460b) at
> >> mm_heap/mm_free.c:85
> >> #1  0x0801264a in mm_free_delaylist (heap=0x200060b4 <g_mmheap>) at
> >> mm_heap/mm_malloc.c:82
> >> #2  0x08012672 in mm_malloc (heap=0x200060b4 <g_mmheap>, size=24) at
> >> mm_heap/mm_malloc.c:115
> >> #3  0x08012a32 in mm_zalloc (heap=0x200060b4 <g_mmheap>, size=24) at
> >> mm_heap/mm_zalloc.c:45
> >> #4  0x080123ac in zalloc (size=24) at umm_heap/umm_zalloc.c:68
> >> #5  0x080399fa in inode_alloc (name=0x8059a78 "") at
> >> inode/fs_inodereserve.c:78
> >> #6  0x08039a5c in inode_root_reserve () at inode/fs_inodereserve.c:129
> >> #7  0x080398cc in inode_initialize () at inode/fs_inode.c:92
> >> #8  0x08039284 in fs_initialize () at fs_initialize.c:47
> >> #9  0x08007eb4 in nx_start () at init/nx_start.c:600
> >> #10 0x0800421e in __start () at chip/stm32_start.c:338
> >>
> >> As previously analyzed, this happens in fs_initialize through
> >> inode_root_reserve, so I was on the right track.
> >>
> >> Caller shows mm_free called with that weird address:
> >>
> >> (gdb) f 1
> >> #1  0x0801264a in mm_free_delaylist (heap=0x200060b4 <g_mmheap>) at
> >> mm_heap/mm_malloc.c:82
> >> 82            mm_free(heap, address);
> >> (gdb) list
> >> 77
> >> 78            /* The address should always be non-NULL since that was
> >> checked in the
> >> 79             * 'while' condition above.
> >> 80             */
> >> 81
> >> 82            mm_free(heap, address); <-- address == 0x70fb460b
> >> 83          }
> >> 84      #endif
> >> 85      }
> >> 86
> >>
> >> (gdb) print &g_mmheap
> >> $3 = (struct mm_heap_s *) 0x200060b4 <g_mmheap>
> >> (gdb) print g_mmheap
> >> $4 = {mm_impl = 0x0}
> >>
> >> this is not good!
> >>
> >> This is not a timing or IRQ related issue but a heap issue.
> >>
> >> R15 = 080126f8 translates to here:
> >>
> >>
> >> https://github.com/apache/incubator-nuttx/blob/master/mm/mm_heap/mm_malloc.c#L199
> >>
> >> => this free() has corrupted a badly initialized heap, and the next
> >> malloc fails, giving a hardfault because that address is invalid.
> >>
> >> Horrific mess!
> >>
> >> ==>
> >>
> >> I think that my old board code does not initialize the board properly, I
> >> probably have to check for differences between my code and the
> >> stm32f429i-disco built-in board (on which I based my board).
> >>
> >> Sebastien
> >>
> >> Le 25/05/2021 à 21:26, Nathan Hartman a écrit :
> >>> On Tue, May 25, 2021 at 12:02 PM Sebastien Lorquet <[email protected]
> >>>
> >>> wrote:
> >>>
> >>>> Back to the business
> >>>>>> After this we managed to recompile our project using the latest NuttX
> >>>>>> sources, but it fails when trying to init the PHY irq on our STM32F427
> >>>>>> board: We get "unexpected IRQ".
> >>>>>>
> >>>>>> Yes I know that's pretty vague :-)
> >>>>>>
> >>>>>> Is there anything obvious I should have been careful with in this
> >>>>>> domain, before I dig the jtag probe to fix it (tomorrow) ?
> >>>>> I would first start by looking through the Release Notes between v7.30
> >>>>> and v10.1. Many big improvements and bug fixes happened and some of
> >>>>> them are mentioned in Compatibility Concerns along with some changes
> >>>>> you might need to make to configuration etc.
> >>>>>
> >>>>> Also another thing you can try: Has this board and PHY worked
> >>>>> correctly with v7.30? If so, you can bisect and with very few tests
> >>>>> (I'm guessing fewer than 20) find the exact commit that broke it.
> >>>> Release notes are hard to read but I did not find anything special about
> >>>> phy interrupts.
> >>>>
> >>>> Note that it may not be the phy interrupt. Here is my log:
> >>>>
> >>>> stm32_netinitialize: Enabling PHY power
> >>>> stm32_netinitialize: PHY reset...
> >>>> stm32_netinitialize: PHY reset done.
> >>>> stm32_netinitialize: Configuring PHY int
> >>>> F
> >>>> irq_unexpected_isr: ERROR irq: 3
> >>>> up_assert: Assertion failed at file:irq/irq_unexpectedisr.c line: 50
> >>>> up_registerdump: R0: 00000001 2000737c c00000f2 08000101 00000000
> >>>> 00000000 00000000 200073c8
> >>>> up_registerdump: R8: 00000000 00000000 00000000 00000000 00000000
> >>>> 200073c8 080126ad 080126f8
> >>>> up_registerdump: xPSR: 21000000 PRIMASK: 00000000 CONTROL: 00000000
> >>>> up_registerdump: EXC_RETURN: fffffff9
> >>>>
> >>>> A lot of OS initialization things happen at the point, marked by the
> >>>> letter F.
> >>>>
> >>>> It seems that an unexpected IRQ happens in this interval, around the
> >>>> time the filesystem is initialized. The backtrace goes down to memory
> >>>> allocation routines through the initialization of the root inode.
> >>>>
> >>>> My guess is that AN external IRQ is triggered (possibly not the PHY IRQ)
> >>>> but the ISR handler for that one is not ready yet. I will add debug
> >>>> messages.
> >>>>
> >>>>
> >>>> I would expect that situation to be a simple NOP, but it seems that
> >>>> undefined handlers are set to this function "irq_unexpected_isr"
> >>>>
> >>>> Is that a new behaviour? a default config that I did not set properly
> >>>> when porting our old defconfig?
> >>>>
> >>>> Sebastien
> >>>>
> >>>>> Nathan
> >>> Did you try disabling the PHY (or networking) in Kconfig to see if
> >> removing
> >>> it from the build will eliminate the hardfault?
> >>>
> >>> Have you seen this about hardfault debugging:
> >>>
> >> https://cwiki.apache.org/confluence/plugins/servlet/mobile?contentId=139629445#content/view/139629445
> >>> Nathan
> >>>

Re: Port of project from NuttX 7.30 to 10.1 RC1: Unexpected IRQ

Reply via email to