On 6/26/24 9:39 AM, Dominic Rath wrote:
On 13.06.2024 14:22, Andrew Davis wrote:
We looked into this some time ago, and noticed that the IRQ approach caused 
problems in the virtio/rpmsg code. I'd like to understand if your change was 
for the same reason, or something else we missed before.


It is most likely the same reason. Seems despite its name, rproc_vq_interrupt() 
cannot
be called from an IRQ/atomic context. As the following backtrace shows, that 
function
calls down into functions which are not IRQ safe. So we needed to keep it 
threaded:

[    5.389374] BUG: scheduling while atomic: (udev-worker)/232/0x00010002
[    5.395917] Modules linked in: videobuf2_dma_contig videobuf2_memops 
videobuf2_v4l2 phy_j721e_wiz display_connector omap_mailbox(+) videodev 
tps6594_i2c(+) videobuf2_common phy_can_transceiver at24 cd6
[    5.433562] CPU: 0 PID: 232 Comm: (udev-worker) Not tainted 
6.10.0-rc1-next-20240528-dirty #10
[    5.442158] Hardware name: Texas Instruments AM69 SK (DT)
[    5.447540] Call trace:
[    5.449976]  dump_backtrace+0x94/0xec
[    5.453640]  show_stack+0x18/0x24
[    5.456944]  dump_stack_lvl+0x78/0x90
[    5.460598]  dump_stack+0x18/0x24
[    5.463900]  __schedule_bug+0x50/0x68
[    5.467552]  __schedule+0x80c/0xb0c
[    5.471029]  schedule+0x34/0x104
[    5.474243]  schedule_preempt_disabled+0x24/0x40
[    5.478845]  rwsem_down_write_slowpath+0x31c/0x56c
[    5.483622]  down_write+0x90/0x94
[    5.486924]  kernfs_add_one+0x3c/0x148
[    5.490661]  kernfs_create_dir_ns+0x50/0x94
[    5.494830]  sysfs_create_dir_ns+0x70/0x10c
[    5.498999]  kobject_add_internal+0x98/0x26c
[    5.503254]  kobject_add+0x9c/0x10c
[    5.506729]  device_add+0xc0/0x790
[    5.510120]  rpmsg_register_device_override+0x10c/0x1c0
[    5.515333]  rpmsg_register_device+0x14/0x20
[    5.519590]  virtio_rpmsg_create_channel+0xb0/0x104
[    5.524452]  rpmsg_create_channel+0x28/0x60
[    5.528622]  rpmsg_ns_cb+0x100/0x1dc
[    5.532185]  rpmsg_recv_done+0x114/0x2e4
[    5.536094]  vring_interrupt+0x68/0xa4
[    5.539833]  rproc_vq_interrupt+0x2c/0x48
[    5.543830]  k3_r5_rproc_mbox_callback+0x84/0x90 [ti_k3_r5_remoteproc]
[    5.550348]  mbox_chan_received_data+0x1c/0x2c
[    5.554779]  mbox_interrupt+0xa0/0x17c [omap_mailbox]
[    5.559820]  __handle_irq_event_percpu+0x48/0x13c
[    5.564511]  handle_irq_event+0x4c/0xac


I looked into this a bit more closely, together with the colleague who 
implemented our internal workaround, and we came up with one more concern:

Have you considered that this synchronous path from the (threaded) IRQ draining 
the mailbox down to the (potentially blocking) rpmsg_* calls might let the 
hardware mailbox grow full?

 From what I remember the vring (?) has room for 512 messages, but the hardware 
mailbox on e.g. the AM64x can only handle four messages. At least with the 
current implementation on TI's MCU+ SDK running on the R5f that could cause the 
R5f to block, waiting for room in the hardware mailbox, while there are plenty 
of vring buffers available.


We would like to switch back to the non-threaded handler at some point. As you 
mention doing this
in a threaded way increase the risk of the hardware message queue filling and 
blocking the remote side.
(Plus the threaded handling can add latency to the message handling which 
should be avoided for real-time
reasons)

The fix might be to have rpmsg_recv_done() callback simply queue the message 
and then schedule another
worker to actually do the next stage processing. That way complex actions on 
messages do not block
vring_interrupt() which should be treated like an atomic context call.

Anyway for now, I'd expect the much faster host core (2xA53 @ 1GHZ in AM64) to 
be able to outpace the
R5s and keep the mailbox drained. Are you actually running into this issue or 
is the concern based on
ensuring RT performance(not blocking on mailbox queue filled) on the R5 side?

Andrew

Best Regards,

Dominic

Reply via email to