On Thu, Jan 16, 2020 at 8:50 PM Aaron Conole <acon...@redhat.com> wrote:
>
> I've noticed an occasional segfault from the build system in the
> service_autotest and after talking with David (CC'd), it seems like it's
> due to the rte_service_finalize deleting the lcore_states object while
> active lcores are running.
>
> The below patch is an attempt to solve it by first reassigning all the
> lcores back to ROLE_RTE before releasing the memory.  There is probably
> a larger question for DPDK proper about actually closing the pending
> lcore threads, but that's a separate issue.  I've been running with the
> patch for a while, and haven't seen the crash anymore on my system.
>
> Thoughts?  Is it acceptable as-is?
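
To make the suspected pattern concrete outside of DPDK, here is a minimal
pthread sketch of the teardown ordering being discussed: stop the worker,
wait for it to leave its loop, and only then free the state it uses. This
is an illustration under assumptions, not the rte_service code and not a
proposed fix. struct core_state, runner(), finalize() and run_sketch() are
hypothetical names made up for the example.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-lcore service state (not the DPDK struct). */
struct core_state {
    atomic_int running; /* nonzero while the worker may keep looping */
    uint64_t loops;     /* written by the worker on every iteration */
};

/* Worker loop: every iteration dereferences cs. */
static void *runner(void *arg)
{
    struct core_state *cs = arg;

    while (atomic_load(&cs->running))
        cs->loops++; /* use-after-free if cs was released while we spin */
    return NULL;
}

/*
 * Teardown in the safe order: ask the worker to stop, wait until it has
 * really left the loop, and only then free the state it was using.
 * Freeing before the join would recreate the race described above.
 */
static int finalize(struct core_state *cs, pthread_t tid)
{
    int ret;

    atomic_store(&cs->running, 0);
    ret = pthread_join(tid, NULL);
    if (ret != 0)
        return ret; /* worker state unknown: leak rather than free under it */
    free(cs);
    return 0;
}

/* Start one worker, then tear it down; returns 0 on success. */
int run_sketch(void)
{
    pthread_t tid;
    struct core_state *cs = calloc(1, sizeof(*cs));

    if (cs == NULL)
        return -1;
    atomic_init(&cs->running, 1);
    if (pthread_create(&tid, NULL, runner, cs) != 0) {
        free(cs);
        return -1;
    }
    return finalize(cs, tid);
}
```

Calling run_sketch() from a small main() and building with -pthread is
enough to exercise it. The point of the sketch is only the ordering in
finalize(): changing a flag or role without waiting for the thread does
not guarantee the thread has stopped touching the memory.
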

Added this patch to my env, still reproducing the same issue after ~10-20
tries.
I added a breakpoint to service_lcore_uninit that is indeed caught when
exiting the test application (just wanted to make sure your change was in
my binary).

To reproduce:

I modified app/test/meson.build to have an explicit "-l 0-1" + compiled
with your patch.
Then, I started a dummy busyloop "while true; do true; done" in a shell
that I had pinned to core 1 (taskset -pc 1 $$).
Finally, started another shell (as root), pinned to cores 0-1 on my
laptop (taskset -pc 0,1 $$) and ran
meson test --gdb --repeat=10000 service_autotest

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff4922700 (LWP 8572)]
rte_service_runner_func (arg=<optimized out>) at ../lib/librte_eal/common/rte_service.c:458
458             cs->loops++;
A debugging session is active.

        Inferior 1 [process 8566] will be killed.

Quit anyway? (y or n) n
Not confirmed.
Missing separate debuginfos, use: debuginfo-install
elfutils-libelf-0.172-2.el7.x86_64 glibc-2.17-260.el7_6.6.x86_64
libgcc-4.8.5-36.el7_6.2.x86_64 libibverbs-17.2-3.el7.x86_64
libnl3-3.2.28-4.el7.x86_64 libpcap-1.5.3-11.el7.x86_64
numactl-libs-2.0.9-7.el7.x86_64 openssl-libs-1.0.2k-16.el7_6.1.x86_64
zlib-1.2.7-18.el7.x86_64

(gdb) info threads
  Id   Target Id         Frame
* 4    Thread 0x7ffff4922700 (LWP 8572) "lcore-slave-1" rte_service_runner_func (arg=<optimized out>) at ../lib/librte_eal/common/rte_service.c:458
  3    Thread 0x7ffff5123700 (LWP 8571) "rte_mp_handle" 0x00007ffff63a4b4d in recvmsg () from /lib64/libpthread.so.0
  2    Thread 0x7ffff5924700 (LWP 8570) "eal-intr-thread" 0x00007ffff60c7603 in epoll_wait () from /lib64/libc.so.6
  1    Thread 0x7ffff7fd2c00 (LWP 8566) "dpdk-test" 0x00007ffff7deb96f in _dl_name_match_p () from /lib64/ld-linux-x86-64.so.2

(gdb) bt
#0  rte_service_runner_func (arg=<optimized out>) at ../lib/librte_eal/common/rte_service.c:458
#1  0x0000000000b2c84f in eal_thread_loop (arg=<optimized out>) at ../lib/librte_eal/linux/eal/eal_thread.c:153
#2  0x00007ffff639ddd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007ffff60c702d in clone () from /lib64/libc.so.6

(gdb) f 0
#0  rte_service_runner_func (arg=<optimized out>) at ../lib/librte_eal/common/rte_service.c:458
458             cs->loops++;

(gdb) p *cs
$1 = {service_mask = 0, runstate = 0 '\000', is_service_core = 0 '\000',
  service_active_on_lcore = '\000' <repeats 63 times>, loops = 0,
  calls_per_service = {0 <repeats 64 times>}}

(gdb) p lcore_config[1]
$2 = {thread_id = 140737296606976, pipe_master2slave = {14, 20},
  pipe_slave2master = {21, 22}, f = 0xb26ec0 <rte_service_runner_func>,
  arg = 0x0, ret = 0, state = RUNNING, socket_id = 0, core_id = 1,
  core_index = 1, core_role = 0 '\000', detected = 1 '\001',
  cpuset = {__bits = {2, 0 <repeats 15 times>}}}

(gdb) p lcore_config[0]
$3 = {thread_id = 0, pipe_master2slave = {0, 0}, pipe_slave2master = {0, 0},
  f = 0x0, arg = 0x0, ret = 0, state = WAIT, socket_id = 0, core_id = 0,
  core_index = 0, core_role = 0 '\000', detected = 1 '\001',
  cpuset = {__bits = {1, 0 <repeats 15 times>}}}

(gdb) thread 1
[Switching to thread 1 (Thread 0x7ffff7fd2c00 (LWP 8566))]
#0  0x00007ffff7deb96f in _dl_name_match_p () from /lib64/ld-linux-x86-64.so.2

(gdb) bt
#0  0x00007ffff7deb96f in _dl_name_match_p () from /lib64/ld-linux-x86-64.so.2
#1  0x00007ffff7de4756 in do_lookup_x () from /lib64/ld-linux-x86-64.so.2
#2  0x00007ffff7de4fcf in _dl_lookup_symbol_x () from /lib64/ld-linux-x86-64.so.2
#3  0x00007ffff7de9d1e in _dl_fixup () from /lib64/ld-linux-x86-64.so.2
#4  0x00007ffff7df19da in _dl_runtime_resolve_xsavec () from /lib64/ld-linux-x86-64.so.2
#5  0x00007ffff7deafba in _dl_fini () from /lib64/ld-linux-x86-64.so.2
#6  0x00007ffff6002c29 in __run_exit_handlers () from /lib64/libc.so.6
#7  0x00007ffff6002c77 in exit () from /lib64/libc.so.6
#8  0x00007ffff5feb49c in __libc_start_main () from /lib64/libc.so.6
#9  0x00000000004fa126 in _start ()


--
David Marchand