tmedicci commented on PR #13278: URL: https://github.com/apache/nuttx/pull/13278#issuecomment-2326414854
Thanks @yf13, just to add to the discussion on https://github.com/apache/nuttx/pull/12864#issuecomment-2325779041. Long story short: since #12864, our internal CI has been failing the `iperf` test on all of Espressif's RISC-V devices (ESP32-C3 and ESP32-C6). The `iperf` output either drops to 0 or the device halts. When halted, it keeps looping over the list inside the [`list_for_every_entry`](https://github.com/apache/nuttx/blob/59fd10000eb0a3e87d345e5bc4fff5934af447fd/sched/mqueue/mq_sndinternal.c#L346) in [`nxmq_do_send`](https://github.com/apache/nuttx/blob/59fd10000eb0a3e87d345e5bc4fff5934af447fd/sched/mqueue/mq_sndinternal.c#L324C5-L324C17): our Wi-Fi driver uses `mqueue` to pass data from the Wi-Fi ISR to the Wi-Fi task.

After some debugging, I found that the list was being corrupted when a message was received in [`file_mq_receive`](https://github.com/apache/incubator-nuttx/blob/391bf7b37c11b3d52e6f17cd8e1ff1c95c7e0e77/sched/mqueue/mq_receive.c#L103). This function should run inside a critical section (although [`nxmq_wait_receive`](https://github.com/apache/nuttx/blob/391bf7b37c11b3d52e6f17cd8e1ff1c95c7e0e77/sched/mqueue/mq_rcvinternal.c#L134) and [`nxmq_do_receive`](https://github.com/apache/nuttx/blob/391bf7b37c11b3d52e6f17cd8e1ff1c95c7e0e77/sched/mqueue/mq_rcvinternal.c#L269) can re-enable interrupts if a context switch is needed, the critical section is restored whenever the `mqueue` list is being manipulated).

Finally, I added some global variables that mark the places where the `mqueue` list is read or written (and where we expect to be in a critical section). I expected these variables to be `false` whenever an interrupt is about to be dispatched (this is what the [NuttX Patch](https://github.com/user-attachments/files/16843821/isrmq-patch.gz) in https://github.com/apache/nuttx/pull/12864#issuecomment-2325779041 is about).
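For anyone who has not opened the patch, the idea behind the check can be sketched roughly as below. This is a host-side simulation under stated assumptions, not the patch itself: every name (`g_irq_masked`, `g_mq_list_busy`, `sim_irq_dispatch`, and so on) is hypothetical, and the real interrupt masking done by the critical section is replaced by a plain flag.

```c
#include <assert.h>
#include <stdbool.h>

/* All names below are hypothetical; none of them come from the real patch. */

static volatile bool g_irq_masked;    /* Simulated "interrupts disabled" state */
static volatile bool g_mq_list_busy;  /* True while the list is being touched */
static unsigned int g_violations;     /* Interrupts dispatched during list access */

/* Stand-ins for entering and leaving a critical section. */

static void sim_enter_critical(void)
{
  g_irq_masked = true;
}

static void sim_leave_critical(void)
{
  g_irq_masked = false;
}

/* Simulated interrupt dispatch. Returns true if the interrupt was delivered. */

static bool sim_irq_dispatch(void)
{
  if (g_irq_masked)
    {
      return false;                   /* Masked: the interrupt stays pending */
    }

  if (g_mq_list_busy)
    {
      g_violations++;                 /* This is where a breakpoint would go */
    }

  return true;
}

/* Simulated list update with an interrupt arriving in the middle.
 * `protect` selects whether the critical section is actually honored.
 */

static void sim_touch_list(bool protect)
{
  assert(!g_mq_list_busy);            /* The sketch does not support nesting */

  if (protect)
    {
      sim_enter_critical();
    }

  g_mq_list_busy = true;
  sim_irq_dispatch();                 /* Interrupt request arrives mid-update */
  g_mq_list_busy = false;

  if (protect)
    {
      sim_leave_critical();
    }
}
```

Calling `sim_touch_list(true)` leaves `g_violations` untouched because the simulated interrupt is held off, while `sim_touch_list(false)` increments it, which corresponds to the breakpoint being hit.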
With a breakpoint added in this check, it is hit on ESP32-C3/ESP32-C6 as soon as `iperf` starts, which shows that the critical section is not being respected. @yf13 created a test application that simulates our Wi-Fi driver for `rv-virt`, and the same behavior can be seen when the application runs on QEMU: the breakpoint is hit. Finally, reverting #12864 fixes the `iperf` test and, as expected, the breakpoint is no longer hit (which confirms that no interrupt occurs while the list is being manipulated).