On 3/8/2022 02:52, Sebastian Huber wrote:
On 28/02/2022 20:18, Kinsey Moore wrote:
On 2/28/2022 12:19, Sebastian Huber wrote:
On 26/02/2022 08:03, Kinsey Moore wrote:
On 2/26/2022 00:53, Sebastian Huber wrote:
On 26/02/2022 00:41, Kinsey Moore wrote:
This may also be an issue for ARM, RISC-V, and others: ARM does not
appear to save the CPSR during a context switch, and I couldn't tell
whether RISC-V does either, though I'm less familiar with it.
This doesn't look like the right way to fix this issue.
There is currently the assumption that all processors start
multitasking with a context switch to _Thread_Handler() which sets
the interrupt level. It is possible to construct a scenario in
which we start multitasking with a migration of a thread which
already executed the _Thread_Handler() prologue. This would result
in an execution with disabled interrupts. I think the proper fix
for this scenario is to enable interrupts in
_CPU_SMP_Prepare_start_multitasking().
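A toy model may make the scenario above easier to follow. It is a minimal sketch, not RTEMS code: every name (toy_cpu, toy_thread, toy_start_multitasking) is hypothetical, and it assumes, as suspected earlier in the thread, that the saved thread context does not include the interrupt mask. A new thread gets interrupts enabled by the _Thread_Handler() prologue, while a migrated thread resumes after that prologue and inherits the disabled state of the freshly started CPU.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: the interrupt state belongs to the CPU and is NOT
 * part of the saved thread context (an assumption for illustration). */
typedef struct {
  bool interrupts_enabled;
} toy_cpu;

typedef struct {
  bool prologue_done; /* has the thread already run the _Thread_Handler() prologue? */
} toy_thread;

/* Stands in for the _Thread_Handler() prologue, which sets the interrupt
 * level (modeled here as simply enabling interrupts). */
static void toy_thread_handler_prologue(toy_cpu *cpu, toy_thread *t)
{
  cpu->interrupts_enabled = true;
  t->prologue_done = true;
}

/* Models a CPU starting multitasking: interrupts are still disabled and the
 * CPU switches to its first thread. Returns the resulting interrupt state. */
static bool toy_start_multitasking(toy_cpu *cpu, toy_thread *first)
{
  cpu->interrupts_enabled = false; /* boot state of the starting CPU */

  if (!first->prologue_done) {
    /* New thread: the prologue fixes up the interrupt level. */
    toy_thread_handler_prologue(cpu, first);
  }

  /* Migrated thread: it resumes after its prologue, so nothing
   * re-enables interrupts on this CPU. */
  return cpu->interrupts_enabled;
}
```

Starting CPU a with a new thread yields enabled interrupts; if that same thread then migrates to a second CPU b that is just starting multitasking, b keeps running with interrupts disabled.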
Doing a context switch with interrupts disabled is a fatal
application error on all architectures with
#define CPU_ENABLE_ROBUST_THREAD_DISPATCH TRUE
or enabled SMP support.
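The rule above can be sketched as a simple guard in front of the dispatch. This is a hypothetical model, not the RTEMS implementation: toy_thread_dispatch and its status codes are invented for illustration, and the robust/smp flags merely stand in for CPU_ENABLE_ROBUST_THREAD_DISPATCH and SMP support. Where the real system treats the condition as a fatal error, the model just reports it.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
  TOY_DISPATCH_OK,
  TOY_DISPATCH_FATAL_INTERRUPTS_DISABLED
} toy_dispatch_status;

/* Hypothetical guard: with robust thread dispatch or SMP enabled, a
 * context switch with interrupts disabled is an error. */
static toy_dispatch_status toy_thread_dispatch(bool interrupts_enabled,
                                               bool robust, bool smp)
{
  if ((robust || smp) && !interrupts_enabled) {
    /* Described in the thread as a fatal application error. */
    return TOY_DISPATCH_FATAL_INTERRUPTS_DISABLED;
  }

  /* ... the actual context switch would happen here ... */
  return TOY_DISPATCH_OK;
}
```

Combined with the previous model, a migrated thread running with interrupts disabled would trip this guard at its next dispatch.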
Ok, great. I was wondering if that was the case and this is
definitely the kind of feedback I was looking for. I'll adjust the
patch set to reflect that. I still wonder if this is an issue on
other SMP CPU ports, though, since most of them don't implement
that hook, either.
I would like to have a closer look at this next week when I am back
from holiday.
Enabling interrupts in _CPU_SMP_Prepare_start_multitasking() would
not work since we use the interrupt stack at this point. We should
add a ticket and a test case for this (I can do this next week). How
did you observe this bug?
I was only able to observe this bug with patch 2/2 applied: that
optimization opens a race condition in the sppercpudata01 test on SMP
configurations (adding a few no-ops to the Per_CPU_Control accessor
prevents it from appearing), because the task migrates across CPUs
while the CPUs are coming online. The race condition resolves
nominally in 90% of cases, so while it's not a frequent failure, it is
reproducible.
I added a ticket and a test case:
http://devel.rtems.org/ticket/4627
Could you please check whether the test case currently fails on your
aarch64 target?
I have verified that this test case fails under QEMU and on the hardware
target.
_______________________________________________
devel mailing list
devel@rtems.org
http://lists.rtems.org/mailman/listinfo/devel