On 18.06.2019 12:45, Eelco Chaudron wrote:
>
>
> On 17 Jun 2019, at 22:32, William Tu wrote:
>
>> On Mon, Jun 17, 2019 at 11:23 AM William Tu <u9012...@gmail.com> wrote:
>>>
>>> Hi Eelco,
>>>
>>> On Mon, Jun 17, 2019 at 3:12 AM Eelco Chaudron <echau...@redhat.com> wrote:
>>>>
>>>> Hi William,
>>>>
>>>> See below parts of an offline email discussion I had with Magnus before,
>>>> and some research I did in the end, which explains that by design you
>>>> might not get all the descriptors ready.
>>>
>>> I think these are different issues. The behavior you described is a hiccup
>>> while waiting for 16 rx packets to be queued. Here, in afxdp_complete_tx,
>>> xsk_ring_cons__peek returns descs that have already been released, causing
>>> OVS to push more elems and thus crash.
>>>
>>>> Hope this helps change your design…
>>>>
>>>> In addition, the Point to Point test is working with your change,
>>>> however, the PVP test is still failing due to buffer starvation (see my
>>>> comments in Patchv8 for a possible cause).
>>>>
>>> Thanks, looking back at v8
>>> https://patchwork.ozlabs.org/patch/1097740/
>>> Hopefully the next version will fix this issue.
>>>
>>>> Also on OVS restart the system crashes in the following part:
>>>>
>>>> #0  netdev_afxdp_rxq_recv (rxq_=0x173c080, batch=0x7fe1397f80d0,
>>>> qfill=0x0) at lib/netdev-afxdp.c:583
>>>> #1  0x0000000000907f21 in netdev_rxq_recv (rx=<optimized out>,
>>>> batch=batch@entry=0x7fe1397f80d0, qfill=<optimized out>) at
>>>> lib/netdev.c:710
>>>> #2  0x00000000008dd1c3 in dp_netdev_process_rxq_port
>>>> (pmd=pmd@entry=0x175d990, rxq=0x175a460, port_no=2) at
>>>> lib/dpif-netdev.c:4257
>>>> #3  0x00000000008dd63d in pmd_thread_main (f_=<optimized out>) at
>>>> lib/dpif-netdev.c:5449
>>>> #4  0x000000000095e94d in ovsthread_wrapper (aux_=<optimized out>) at
>>>> lib/ovs-thread.c:352
>>>> #5  0x00007fe1633872de in start_thread () from /lib64/libpthread.so.0
>>>> #6  0x00007fe162b2ca63 in clone () from /lib64/libc.so.6
>>>>
>>> How do you restart the system? I have two afxdp ports:
>>>     Port "eth3"
>>>         Interface "eth3"
>>>             type: afxdp
>>>             options: {n_rxq="1", xdpmode=drv}
>>>     Port "eth5"
>>>         Interface "eth5"
>>>             type: afxdp
>>>             options: {n_rxq="1", xdpmode=drv}
>>>
>>> I tested using
>>> # ovs-vsctl del-port eth3
>>> # ovs-vsctl del-port eth5
>>> # ovs-vsctl del-br br0
>>> # ovs-appctl -t ovs-vswitchd exit
>>> Looks ok.
>>>
>>> <snip>
>>>
>>>>> This means, that if you rely on (the naive :-)) code in the sample
>>>>> application, you can end up in a situation where you can receive from
>>>>> the Rx ring, but not post to the fill ring.
>>>>>
>>>>> So, the reason for the 16 packet hiccup is as follows:
>>>>>
>>>>> 1. Userland: The fill ring is completely filled.
>>>>> 2. Kernel: One packet is received, one entry picked from the fill
>>>>>    ring, but the consumer pointer is not bumped, and packet is placed
>>>>>    on the Rx ring.
>>>>> 3. Userland: One packet is picked from the Rx ring.
>>>>> 4. Userland: Tries to put an entry on fill ring. The fill ring is
>>>>>    full, so userland spins.
>>>>> 5. Kernel: When 16 packets have been picked from the fill ring the
>>>>>    consumer ptr is released.
>>>>> 6. Userland: Exits the while loop.
>>>
>>> Based on the above, there is no starvation problem here if there are more
>>> than 16 packets, correct? And at step 4, we can skip spinning and try to
>>> process more of the rx ring.
>>>
>>> For the next version, I will first check the fill ring by using
>>> xsk_prod_nb_free(), to avoid step 4.
>>>
>>> Thanks
>>> William
>>
>> Hi Eelco,
>>
>> I have some fixes with commit "prepare for v12" at
>> https://github.com/williamtu/ovs-ebpf/commits/afxdp-v11
>>
>> I tested PVP and it works ok (using tap and also veth namespaces)
>> Can you give it a try?
>
> The PVP test seems to work fine, however after a while it stops forwarding:
>
> $ ovs-ofctl dump-flows ovs_pvp_br0
>  cookie=0x0, duration=8.510s, table=0, n_packets=1, n_bytes=1020,
> in_port=eno1 actions=output:tapVM
>  cookie=0x0, duration=8.504s, table=0, n_packets=1, n_bytes=252,
> in_port=tapVM actions=output:eno1
>
> Results:
>
> "Physical port, ""eno1"", speed 10 Gbit/s, traffic rate 100%"
> "Physical to Virtual to Physical test, L3 flows[port redirect]"
> ,Packet size
> Number of flows,64,256,1024
> 10,13448,131687,0
> 100,596,0,0
> 1000,596,0,0
>
> Rather low compared to the kernel, note the above is using a single queue:
>
> "Physical port, ""eno1"", speed 10 Gbit/s, traffic rate 100%"
> "Physical to Virtual to Physical test, L3 flows[port redirect]"
> ,Packet size
> Number of flows,64,256,1024
> 10,502411,451579,421558
> 100,525439,440637,422051
> 1000,463875,419996,402010
>
> However, I cannot restart OVS (see other email on how I restart); even if I
> clear the XDP programs before a restart it fails, and cores.
> The only way to recover is to reboot the box and start from scratch:
>
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x00007f455919a9b5 in xsk_clear_bpf_maps (xsk=0x21) at xsk.c:462
> 462             bpf_map_update_elem(xsk->qidconf_map_fd, &xsk->queue_id, &qid, 0);
> [Current thread is 1 (Thread 0x7f4559f1c000 (LWP 4898))]
> Missing separate debuginfos, use: dnf debuginfo-install
> elfutils-libelf-0.174-6.el8.x86_64 glibc-2.28-42.el8_0.1.x86_64
> libatomic-8.2.1-3.5.el8.x86_64 libcap-ng-0.7.9-4.el8.x86_64
> numactl-libs-2.0.12-2.el8.x86_64 openssl-libs-1.1.1-8.el8.x86_64
> zlib-1.2.11-10.el8.x86_64
> (gdb) bt
> #0  0x00007f455919a9b5 in xsk_clear_bpf_maps (xsk=0x21) at xsk.c:462
> #1  0x00007f455919b278 in xsk_socket__delete (xsk=0x21) at xsk.c:711
> #2  0x00000000009b3af1 in xsk_destroy (xsk_info=<optimized out>) at
> lib/netdev-afxdp.c:313
> #3  xsk_destroy_all (netdev=0x1df49a0) at lib/netdev-afxdp.c:313
> #4  0x00000000009b4fe9 in netdev_afxdp_destruct (netdev_=0x1df49a0) at
> lib/netdev-afxdp.c:845
> #5  0x0000000000906e53 in netdev_unref (dev=0x1df49a0) at lib/netdev.c:573
> #6  0x00000000008739b1 in iface_do_create (errp=0x7ffe4fc5b588,
> netdevp=0x7ffe4fc5b580, ofp_portp=0x7ffe4fc5b578, iface_cfg=0x1cde5d0,
> br=0x1ce1690) at vswitchd/bridge.c:1825
> #7  iface_create (port_cfg=0x1cb3690, iface_cfg=0x1cde5d0, br=0x1ce1690) at
> vswitchd/bridge.c:1848
> #8  bridge_add_ports__ (br=br@entry=0x1ce1690,
> wanted_ports=wanted_ports@entry=0x1ce1770,
> with_requested_port=with_requested_port@entry=false) at vswitchd/bridge.c:936
> #9  0x0000000000875ef7 in bridge_add_ports (wanted_ports=0x1ce1770,
> br=0x1ce1690) at vswitchd/bridge.c:952
> #10 bridge_reconfigure (ovs_cfg=ovs_cfg@entry=0x1cb4b90) at
> vswitchd/bridge.c:666
> #11 0x0000000000879521 in bridge_run () at vswitchd/bridge.c:3043
> #12 0x00000000004ef545 in main (argc=<optimized out>, argv=<optimized out>)
> at vswitchd/ovs-vswitchd.c:127
>
> Jun 18 03:52:06 wsfd-netdev76.ntdv.lab.eng.bos.redhat.com ovs-vswitchd[5861]:
> ovs|00051|netdev_afxdp|ERR|xsk_socket__create failed (Device or resource
> busy) mode: SKB qid: 0
> Jun 18 03:52:06 wsfd-netdev76.ntdv.lab.eng.bos.redhat.com ovs-vswitchd[5861]:
> ovs|00052|netdev_afxdp|ERR|failed to create AF_XDP socket on queue 0
> Jun 18 03:52:06 wsfd-netdev76.ntdv.lab.eng.bos.redhat.com ovs-vswitchd[5861]:
> ovs|00055|netdev_afxdp|ERR|AF_XDP device tapVM reconfig fails
> Jun 18 03:52:06 wsfd-netdev76.ntdv.lab.eng.bos.redhat.com ovs-vswitchd[5861]:
> ovs|00056|dpif_netdev|ERR|Failed to set interface tapVM new configuration
> Jun 18 03:52:06 wsfd-netdev76.ntdv.lab.eng.bos.redhat.com ovs-vswitchd[5861]:
> ovs|00062|netdev_afxdp|ERR|xsk_socket__create failed (Device or resource
> busy) mode: DRV qid: 0
> Jun 18 03:52:06 wsfd-netdev76.ntdv.lab.eng.bos.redhat.com ovs-vswitchd[5861]:
> ovs|00063|netdev_afxdp|ERR|failed to create AF_XDP socket on queue 0
> Jun 18 03:52:06 wsfd-netdev76.ntdv.lab.eng.bos.redhat.com ovs-vswitchd[5861]:
> ovs|00066|netdev_afxdp|ERR|AF_XDP device eno1 reconfig fails
> Jun 18 03:52:06 wsfd-netdev76.ntdv.lab.eng.bos.redhat.com ovs-vswitchd[5861]:
> ovs|00067|dpif_netdev|ERR|Failed to set interface eno1 new configuration
> Jun 18 03:52:06 wsfd-netdev76.ntdv.lab.eng.bos.redhat.com kernel:
> ovs-vswitchd[5861]: segfault at 123 ip 00000000009b3afd sp 00007ffff954a770
> error 4 in ovs-vswitchd[400000+899000]
>
I guess this crash is caused by trying to destroy an unallocated queue.
The following change could help:
---
diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
index a6543e8f5..6e1431dce 100644
--- a/lib/netdev-afxdp.c
+++ b/lib/netdev-afxdp.c
@@ -249,7 +249,7 @@ xsk_configure_all(struct netdev *netdev)
     ifindex = linux_get_ifindex(netdev_get_name(netdev));
 
     n_rxq = netdev_n_rxq(netdev);
-    dev->xsks = xmalloc(n_rxq * sizeof(struct xsk_socket_info *));
+    dev->xsks = xzalloc(n_rxq * sizeof(struct xsk_socket_info *));
 
     /* configure each queue */
     for (i = 0; i < n_rxq; i++) {
---

This should prevent OVS from crashing; however, I don't know why socket
creation fails in your case.

Best regards, Ilya Maximets.
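
For illustration only, here is a minimal standalone sketch of why the
zeroing matters. It is not the OVS code: sock_info, sock_create(),
destroy_all() and configure_all() are hypothetical stand-ins, and calloc()
plays the role of xzalloc(). The assumption is that cleanup walks all
n_rxq slots after one queue fails to configure. With an uninitialized
array the untouched slots hold garbage, which would be consistent with the
xsk=0x21 seen in the backtrace; with a zeroed array they are NULL and can
be skipped.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct xsk_socket_info. */
struct sock_info {
    int qid;
};

/* Hypothetical per-queue setup.  Fails for queue 'fail_at' to mimic
 * socket creation returning an error such as "Device or resource busy". */
static struct sock_info *
sock_create(int qid, int fail_at)
{
    struct sock_info *s;

    if (qid == fail_at) {
        return NULL;
    }
    s = malloc(sizeof *s);
    if (s) {
        s->qid = qid;
    }
    return s;
}

/* Frees every allocated slot and returns how many were freed.  Skipping
 * NULL slots is only safe if the array was zero-initialized. */
static int
destroy_all(struct sock_info **socks, int n)
{
    int freed = 0;

    for (int i = 0; i < n; i++) {
        if (socks[i]) {
            free(socks[i]);
            socks[i] = NULL;
            freed++;
        }
    }
    return freed;
}

/* Configures 'n' queues.  On failure, cleans up whatever was created,
 * sets '*out' to NULL and returns -1. */
static int
configure_all(struct sock_info ***out, int n, int fail_at)
{
    struct sock_info **socks;

    assert(n > 0);
    *out = NULL;

    /* Zeroed allocation, like xzalloc(): slots not reached yet are NULL. */
    socks = calloc(n, sizeof *socks);
    if (!socks) {
        return -1;
    }

    for (int i = 0; i < n; i++) {
        socks[i] = sock_create(i, fail_at);
        if (!socks[i]) {
            /* Slots i..n-1 are NULL here, so destroy_all() skips them. */
            destroy_all(socks, n);
            free(socks);
            return -1;
        }
    }
    *out = socks;
    return 0;
}
```

With a plain malloc() in configure_all(), the destroy_all() call on the
error path would read indeterminate pointers for the queues that were never
configured, which is the kind of failure the one-line change is meant to
avoid.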