On Wed, 11 Oct 2023 19:57:56 +0000 "Bly, Mike" <m...@ciena.com> wrote:
> Hello,
>
> We have run into a timing issue between threads when using the memif
> interface type and need some guidance.
>
> Our application has a DPDK based process operating (among other
> things) a memif server interface. The problem is exposed when this
> memif interface receives a memif.disconnect message from the remote
> client, while in the middle of an rte_eth_rx_burst() on this same
> memif interface. As the IRQ message handling is on its own thread as
> compared to the DPDK worker thread doing the rx_burst, this resulted
> in a crash. The backtraces for which have been shared below. How does
> one ensure there are guard rails in place to gracefully exit the
> rx-burst when a disconnect occurs? Or, how do we properly modify the
> code such that we defer responding to the disconnect CB after the
> rx-burst operation has completed?
>
> We are utilizing DPDK 21.11.2. I have diff'd dpdks-stable:22.11.3 in
> ./drivers/net/memif, but I do not see anything obvious that would
> address this. I did a similar diff for dpdk:23.07, but do not see
> anything obvious there either.
>
> -Mike
>
> (gdb) thread 1
> [Switching to thread 1 (Thread 0x7f17e2813600 (LWP 470))]
> #0  0x00007f17e374d225 in eth_memif_rx (queue=0x1189023b00,
>     bufs=0x7f17e28100e8, nb_pkts=32)
>     at ../git/drivers/net/memif/rte_eth_memif.c:338
> 338             last_slot = __atomic_load_n(&ring->head, __ATOMIC_ACQUIRE);
> (gdb) bt
> #0  0x00007f17e374d225 in eth_memif_rx (queue=0x1189023b00,
>     bufs=0x7f17e28100e8, nb_pkts=32)
>     at ../git/drivers/net/memif/rte_eth_memif.c:338
> #1  0x000000000047e6fb in rte_eth_rx_burst (nb_pkts=32,
>     rx_pkts=0x7f17e28100e8, queue_id=0, port_id=<optimized out>)
>     at /usr/include/rte_ethdev.h:5368
> #2  pmd_main_loop () at ../git/swfw/api/src/swfwPmd.c:1086
> #3  0x000000000047f309 in pmd_launch_one_lcore (dummy=<optimized out>)
>     at ../git/my_process.c:1157
> #4  0x00007f17f7070e7c in eal_thread_loop (arg=<optimized out>)
>     at ../git/lib/eal/linux/eal_thread.c:146
> #5  0x00007f17f4c3da72 in start_thread (arg=<optimized out>)
>     at pthread_create.c:442
> #6  0x00007f17f4cbf930 in clone3 ()
>     at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
> (gdb) l
> 333             ring_size = 1 << mq->log2_ring_size;
> 334             mask = ring_size - 1;
> 335
> 336             if (type == MEMIF_RING_C2S) {
> 337                     cur_slot = mq->last_head;
> 338                     last_slot = __atomic_load_n(&ring->head, __ATOMIC_ACQUIRE);
> 339             } else {
> 340                     cur_slot = mq->last_tail;
> 341                     last_slot = __atomic_load_n(&ring->tail, __ATOMIC_ACQUIRE);
> 342             }
> (gdb) p ring->head
> Cannot access memory at address 0x7f17d8e58006
>
> (gdb) thread 19
> [Switching to thread 19 (Thread 0x7f17f0804600 (LWP 468))]
> #0  0x00007f17f4caf97b in __GI___close (fd=494)
>     at ../sysdeps/unix/sysv/linux/close.c:27
> 27        return SYSCALL_CANCEL (close, fd);
> (gdb) bt
> #0  0x00007f17f4caf97b in __GI___close (fd=494)
>     at ../sysdeps/unix/sysv/linux/close.c:27
> #1  0x00007f17e374f01f in memif_free_regions
>     (dev=dev@entry=0x7f17f727f000 <rte_eth_devices+99072>)
>     at ../git/drivers/net/memif/rte_eth_memif.c:882
> #2  0x00007f17e37475d0 in memif_disconnect
>     (dev=0x7f17f727f000 <rte_eth_devices+99072>)
>     at ../git/drivers/net/memif/memif_socket.c:623
> #3  0x00007f17f7091bd2 in eal_intr_process_interrupts
>     (nfds=<optimized out>, events=<optimized out>)
>     at ../git/lib/eal/linux/eal_interrupts.c:1026
> #4  eal_intr_handle_interrupts (totalfds=<optimized out>, pfd=20)
>     at ../git/lib/eal/linux/eal_interrupts.c:1100
> #5  eal_intr_thread_main (arg=<optimized out>)
>     at ../git/lib/eal/linux/eal_interrupts.c:1172
> #6  0x00007f17f4c3da72 in start_thread (arg=<optimized out>)
>     at pthread_create.c:442
> #7  0x00007f17f4cbf930 in clone3 ()
>     at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

I don't think the memif maintainer has been very active.

The two backtraces show the race: the interrupt thread is inside
memif_disconnect() -> memif_free_regions(), tearing down the shared
regions, while the worker is still reading ring->head in eth_memif_rx().

One possibility would be for the memif driver to support the device
removal event interrupt. This would require changes to both the driver
and the application.
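
To illustrate the "defer the disconnect until after the burst" idea from
the question, here is a minimal, self-contained sketch. It is not memif
or ethdev code: struct demo_port and every demo_* function are
hypothetical stand-ins, and the malloc'd array only plays the role of the
mapped shared region. The assumption is that the interrupt thread merely
records the request, and the worker thread, which is the only reader of
the ring, does the actual teardown between bursts.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a port whose ring lives in a shared region. */
struct demo_port {
    atomic_bool disconnect_pending; /* set by the interrupt thread */
    int *ring;                      /* plays the role of the mapped region */
    size_t ring_len;
};

/* Interrupt-thread side: record the request only, never free here. */
static void demo_on_disconnect(struct demo_port *p)
{
    atomic_store_explicit(&p->disconnect_pending, true,
                          memory_order_release);
}

/* Worker side: stand-in for an rx burst that reads the ring. */
static size_t demo_rx_burst(struct demo_port *p, int *out, size_t max)
{
    if (p->ring == NULL)
        return 0; /* already torn down, nothing to read */

    size_t n = p->ring_len < max ? p->ring_len : max;
    for (size_t i = 0; i < n; i++)
        out[i] = p->ring[i];
    return n;
}

/*
 * Worker side: call between bursts. Because the same thread runs the
 * bursts, no burst can be in flight while the ring is freed.
 * Returns true if a pending disconnect was handled.
 */
static bool demo_poll_disconnect(struct demo_port *p)
{
    if (!atomic_exchange_explicit(&p->disconnect_pending, false,
                                  memory_order_acq_rel))
        return false;

    free(p->ring);
    p->ring = NULL;
    p->ring_len = 0;
    return true;
}
```

In a real application the worker loop would call something like
demo_poll_disconnect() after each rte_eth_rx_burst(). With memif today the
teardown happens inside the PMD on the EAL interrupt thread, so getting
this behaviour would still need a driver change, for example having the
driver raise RTE_ETH_EVENT_INTR_RMV to a callback registered with
rte_eth_dev_callback_register() instead of freeing the regions directly.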