There have been numerous fixes for Cortex-A SMP since 7.31.  You might want to upgrade.

On 7/29/2020 10:19 AM, Mansfield, Ryan wrote:
Hi,

We are trying to enable SMP for a dual core ARM Cortex-R8. We are using a 
kernel based on NuttX-7.31, with added proprietary support for our embedded 
Cortex-R8 SOC with separate instruction and data caches. We modeled the R-core 
SMP implementation after the A-core implementation and the IMX6 as recently as 
February 2020.

Our current method of stress testing is to enable our features, which consist 
of several threads (around 6) and low-priority work queue work, and to run 
ostest many times.

We would like to run with both I-cache and D-cache enabled, but we spent some 
time trying to enable coherent D-caches with the SCU with no luck, and noticed 
some stability issues called out in the Sabre6 README. So we settled on I-cache 
only. However, even with just the I-cache enabled, we have been experiencing 
issues booting into the operating system. When we enable more features to 
start on boot, both cores appear to get caught in the ldrex/strex assembly 
loop of the up_testset function (up_testset.S). We cannot reproduce this with 
the I-cache disabled. After we added DSB instructions before and after the 
ldrex/strex pair, we no longer see that issue. Is there anything you are aware 
of that would cause this?
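
For reference, here is a minimal C11 sketch of the test-and-set semantics we 
expect from up_testset. It is not the NuttX assembly, and the names 
spinlock_model_t, testset_model and unlock_model are hypothetical, chosen only 
for this illustration. On ARMv7 a compiler typically lowers the exchange to an 
ldrex/strex loop plus a barrier, so it may be a useful comparison point for 
where the barriers land.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Model values only; assumed to mirror the unlocked/locked spinlock states. */
#define SP_UNLOCKED 0
#define SP_LOCKED   1

/* Hypothetical model type, not the NuttX spinlock_t. */
typedef atomic_uchar spinlock_model_t;

/* Atomically write SP_LOCKED and return the previous value.
 * A return of SP_UNLOCKED means the caller now owns the lock.
 * Acquire ordering keeps later accesses from moving above the exchange.
 */
static inline uint8_t testset_model(spinlock_model_t *lock)
{
  return atomic_exchange_explicit(lock, SP_LOCKED, memory_order_acquire);
}

/* Release ordering makes earlier writes visible before the lock is freed. */
static inline void unlock_model(spinlock_model_t *lock)
{
  atomic_store_explicit(lock, SP_UNLOCKED, memory_order_release);
}
```

A caller would spin with while (testset_model(&lock) == SP_LOCKED) { } and 
later call unlock_model(&lock).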

However, with that change in place, we start hitting system asserts related to 
critical sections and sched_lock/sched_unlock calls during "stress testing". 
For example, we hit an assert in enter_critical_section where the irqlock on 
the tcb is 0 while g_cpu_irqset indicates the calling CPU holds the irq lock. 
We saw this happen recently when calling nxsem_post from ramlog_read.

DEBUGASSERT((g_cpu_irqset & (1 << cpu)) == 0);

We have also hit system asserts during ostest, in the waiter_action function 
of the signest test. During the sched_unlock after we increment the nest_count 
variable, we hit the assert below: the lockcount of the tcb is 1, as expected, 
but the locks are not held by the CPU in question.

DEBUGASSERT(g_cpu_schedlock == SP_LOCKED &&
            (g_cpu_lockset & (1 << cpu)) != 0);
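
To make the expected invariant explicit, here is a small stand-alone C model 
of the two conditions as we understand them. It is only a sketch: struct 
tcb_model, its fields, g_irqset_model, g_lockset_model and both functions are 
hypothetical names for this illustration, not the NuttX definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-task counters discussed above. */
struct tcb_model
{
  int16_t irqcount;   /* nesting depth of critical sections (assumed) */
  int16_t lockcount;  /* nesting depth of sched_lock (assumed) */
};

/* Hypothetical stand-ins for the per-CPU bit sets, one bit per CPU. */
static uint32_t g_irqset_model;
static uint32_t g_lockset_model;

/* On entry to a critical section: a task with a zero count should not
 * already have its CPU's bit set in the irq set.
 */
static bool irq_entry_consistent(const struct tcb_model *tcb, int cpu)
{
  bool held = (g_irqset_model & (UINT32_C(1) << cpu)) != 0;
  return tcb->irqcount > 0 || !held;
}

/* On sched_unlock: a task with a non-zero lock count should have its
 * CPU's bit set in the lock set.
 */
static bool sched_unlock_consistent(const struct tcb_model *tcb, int cpu)
{
  bool held = (g_lockset_model & (UINT32_C(1) << cpu)) != 0;
  return tcb->lockcount == 0 || held;
}
```

Both failures we see correspond to one of these checks returning false, which 
is why we suspect the counter and the bit set are being updated non-atomically 
with respect to each other.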

We think the issue might be one of two things: either the changes to 
up_testset are somehow breaking the atomic access to the irq/sched locks and 
their corresponding set bits via the set_locks, or we pulled 
architecture-specific changes that require some generic OS code to be updated 
as well.

Two questions:
1) What conditions have you seen that trigger those specific asserts that we 
   should look out for?
2) Do any of the current working SMP implementations support I-cache?

Thank you,

Ryan
