On 08.08.2019 17:53, Eelco Chaudron wrote:
>
> On 8 Aug 2019, at 14:09, Ilya Maximets wrote:
>
>> On 08.08.2019 14:42, Eelco Chaudron wrote:
>>>
>>> On 19 Jul 2019, at 16:54, Ilya Maximets wrote:
>>>
>>>> On 18.07.2019 23:11, William Tu wrote:
>>>>> The patch introduces experimental AF_XDP support for OVS netdev.
>>>>> AF_XDP, the Address Family of the eXpress Data Path, is a new Linux
>>>>> socket type built upon the eBPF and XDP technology. It aims to have
>>>>> performance comparable to DPDK while cooperating better with the
>>>>> existing kernel networking stack. An AF_XDP socket receives and
>>>>> sends packets from an eBPF/XDP program attached to the netdev,
>>>>> by-passing a couple of the Linux kernel's subsystems. As a result,
>>>>> an AF_XDP socket shows much better performance than AF_PACKET.
>>>>> For more details about AF_XDP, please see the Linux kernel's
>>>>> Documentation/networking/af_xdp.rst. Note that by default, this
>>>>> feature is not compiled in.
>>>>>
>>>>> Signed-off-by: William Tu <u9012...@gmail.com>
>>>>
>>>> Thanks, William, Eelco and Ben!
>>>>
>>>> I fixed a couple of things and applied to master!
>>>
>>> Good to see this got merged into master while I was on PTO. However,
>>> when I got back I decided to test it once more…
>>>
>>> When testing PVP I got a couple of packets through, and then it would
>>> stall. I thought it might be my kernel, so I updated to yesterday's
>>> latest, no luck…
>>>
>>> I did see a bunch of “eno1: send failed due to exhausted memory pool.”
>>> messages in the log. Putting back patch v14 made my problems go away…
>>>
>>> After some debugging, I noticed the problem was with the “continue”
>>> case in the afxdp_complete_tx() function.
>>> Applying the following patch made it work again:
>>>
>>> diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
>>> index b7cc0d988..9b335ddf0 100644
>>> --- a/lib/netdev-afxdp.c
>>> +++ b/lib/netdev-afxdp.c
>>> @@ -823,16 +823,21 @@ afxdp_complete_tx(struct xsk_socket_info *xsk_info)
>>>
>>>          if (tx_to_free == BATCH_SIZE || j == tx_done - 1) {
>>>              umem_elem_push_n(&umem->mpool, tx_to_free, elems_push);
>>>              xsk_info->outstanding_tx -= tx_to_free;
>>>              tx_to_free = 0;
>>>          }
>>>      }
>>>
>>> +    if (tx_to_free) {
>>> +        umem_elem_push_n(&umem->mpool, tx_to_free, elems_push);
>>> +        xsk_info->outstanding_tx -= tx_to_free;
>>> +    }
>>> +
>>>      if (tx_done > 0) {
>>>          xsk_ring_cons__release(&umem->cq, tx_done);
>>>      } else {
>>>          COVERAGE_INC(afxdp_cq_empty);
>>>      }
>>>  }
>>
>> Good catch! Will you submit a patch?
>> BTW, to reduce the code duplication I'd suggest removing the 'continue'
>> like this:
>>
>>     if (*addr != UINT64_MAX) {
>>         Do work;
>>     } else {
>>         COVERAGE_INC(afxdp_cq_skip);
>>     }
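Spelling that suggestion out a bit further, a rough sketch of how the recycle
loop could look with the 'continue' removed and with the leftover flush from
the diff above. Names come from the diff and from libbpf's xsk.h; the element
lookup and the surrounding declarations are assumptions for illustration, not
the actual lib/netdev-afxdp.c code:

    /* Sketch only: elems_push[], tx_to_free, tx_done, idx_cq, umem and
     * xsk_info are assumed to be set up as in the real function. */
    for (j = 0; j < tx_done; j++) {
        uint64_t *addr = (uint64_t *) xsk_ring_cons__comp_addr(&umem->cq,
                                                               idx_cq++);
        if (*addr != UINT64_MAX) {
            /* Assumption: map the umem offset back to its element. */
            void *elem = xsk_umem__get_data(umem->buffer, *addr);

            elems_push[tx_to_free++] = elem;
            *addr = UINT64_MAX;            /* Mark the slot as recycled. */
        } else {
            COVERAGE_INC(afxdp_cq_skip);   /* Already recycled: unexpected. */
        }

        if (tx_to_free == BATCH_SIZE) {
            umem_elem_push_n(&umem->mpool, tx_to_free, elems_push);
            xsk_info->outstanding_tx -= tx_to_free;
            tx_to_free = 0;
        }
    }

    /* Flush whatever is left over, so a skipped last descriptor can no
     * longer strand elements (the bug the diff above fixes). */
    if (tx_to_free) {
        umem_elem_push_n(&umem->mpool, tx_to_free, elems_push);
        xsk_info->outstanding_tx -= tx_to_free;
    }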
>
> Done, patch has been sent out…
>
>>>
>>> Which made me wonder why we mark elements as being used. To my
>>> knowledge (and looking at some of the code and examples), after the
>>> xsk_ring_cons__release() function a xsk_ring_cons__peek() should not
>>> receive any duplicate slots.
>>>
>>> I see a rather high number of afxdp_cq_skip, which to my knowledge
>>> should never happen?
>>
>> I tried to investigate this previously, but didn't find anything
>> suspicious. So, to my knowledge, this should never happen either.
>> However, I only looked at the code without actually running it,
>> because I had no HW available for testing.
>>
>> While investigating and stress-testing virtual ports I found a few
>> issues with missing locking inside the kernel, so there is no trust
>> for the kernel part of the XDP implementation from my side. I'm
>> suspecting that there are some other bugs in kernel/libbpf that can
>> only be reproduced with driver mode.
>>
>> This never happens for virtual ports with SKB mode, so I never saw
>> this coverage counter being non-zero.
>
> Did some quick debugging, as something else has come up that needs my
> attention :)
>
> But once I'm in a faulty state and send a single packet, causing
> afxdp_complete_tx() to be called, it tells me 2048 descriptors are
> ready, which is XSK_RING_PROD__DEFAULT_NUM_DESCS. So I guess that there
> might be some ring management bug. Maybe the consumer and producer are
> equal, meaning 0 buffers, but it returns the max? I did not look at the
> kernel code, so this is just a wild guess :)
>
> (gdb) p tx_done
> $3 = 2048
>
> (gdb) p umem->cq
> $4 = {cached_prod = 3830466864, cached_cons = 3578066899, mask = 2047,
>       size = 2048, producer = 0x7f08486b8000, consumer = 0x7f08486b8040,
>       ring = 0x7f08486b8080}
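For reference, xsk_ring_cons__peek() derives the number of available
completions from exactly those cached counters. A paraphrased sketch of the
libbpf logic from xsk.h of that era (not the verbatim source, and the helper
name here is made up):

    /* struct xsk_ring_cons comes from libbpf's xsk.h. */
    static inline uint32_t
    cons_nb_avail(struct xsk_ring_cons *r, uint32_t nb)
    {
        uint32_t entries = r->cached_prod - r->cached_cons;

        if (entries == 0) {
            /* Only when the cache looks empty is the real, shared
             * producer index re-read. */
            r->cached_prod = *r->producer;
            entries = r->cached_prod - r->cached_cons;
        }
        return entries > nb ? nb : entries;   /* Cap at the request. */
    }

    /* xsk_ring_cons__peek() returns this count, hands out slots starting
     * at cached_cons and advances cached_cons by the same amount; nothing
     * cross-checks the cached values against the real ring indices. */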
Thanks for debugging!

xsk_ring_cons__peek() just returns the difference between cached_prod and
cached_cons, but these values are too far apart:

    3830466864 - 3578066899 = 252399965

Since this value is larger than the requested number, it returns the
requested number (2048). So, the ring is broken; at least its 'cached' part
is. It'll be good to look at the *consumer and *producer values to verify
the state of the actual ring.

>
>>>
>>> $ ovs-appctl coverage/show | grep xdp
>>> afxdp_cq_empty   0.0/sec       339.600/sec        5.6606/sec     total: 20378
>>> afxdp_tx_full    0.0/sec        29.967/sec        0.4994/sec     total: 1798
>>> afxdp_cq_skip    0.0/sec  61884770.167/sec  1174238.3644/sec     total: 4227258112
>>>
>>> You mentioned you saw this high number in your v15 change notes, did
>>> you do any research on why?
>>>
>>> Cheers,
>>>
>>> Eelco
>

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev