On 17 Jun 2019, at 20:23, William Tu wrote:

Hi Eelco,

On Mon, Jun 17, 2019 at 3:12 AM Eelco Chaudron <echau...@redhat.com> wrote:

Hi William,

See below parts of an offline email discussion I had with Magnus before,
and some research I did in the end, which explains that by design you
might not get all the descriptors ready.

I think these are different issues. The behavior you described is a hiccup while
waiting for 16 rx packets to be queued. Here, in afxdp_complete_tx,
xsk_ring_cons__peek returns descriptors that have already been released,
causing OVS to push more elements than exist and crash.

You are right, I did not read it thoroughly… Looks like a bug to me; after __release() I would assume it will not return the same elements in __peek().
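
For reference, a minimal sketch of how I read the completion-ring contract, using libbpf's xsk.h helpers (BATCH_SIZE and umem_elem_push() are hypothetical stand-ins for illustration, not the actual OVS code):

    #include <bpf/xsk.h>

    /* Recycle completed Tx frames back to the umem element pool.
     * Assumption: once xsk_ring_cons__release() has been called on a
     * set of entries, a later xsk_ring_cons__peek() must not return
     * the same entries again.  If it does, the same umem addresses
     * get pushed to the pool twice, eventually overflowing it. */
    uint32_t idx_cq = 0;
    unsigned int done = xsk_ring_cons__peek(&umem->cq, BATCH_SIZE, &idx_cq);

    for (unsigned int i = 0; i < done; i++) {
        uint64_t addr = *xsk_ring_cons__comp_addr(&umem->cq, idx_cq++);
        umem_elem_push(umem, addr);    /* hypothetical pool helper */
    }
    xsk_ring_cons__release(&umem->cq, done);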


Hope this helps change your design…

In addition, the Point-to-Point test is working with your change;
however, the PVP test is still failing due to buffer starvation (see my
comments on Patch v8 for a possible cause).

Thanks, looking back at v8:
https://patchwork.ozlabs.org/patch/1097740/
Hopefully the next version will fix this issue.

Also, on OVS restart the system crashes with the following backtrace:

#0  netdev_afxdp_rxq_recv (rxq_=0x173c080, batch=0x7fe1397f80d0,
qfill=0x0) at lib/netdev-afxdp.c:583
#1  0x0000000000907f21 in netdev_rxq_recv (rx=<optimized out>,
batch=batch@entry=0x7fe1397f80d0, qfill=<optimized out>) at
lib/netdev.c:710
#2  0x00000000008dd1c3 in dp_netdev_process_rxq_port
(pmd=pmd@entry=0x175d990, rxq=0x175a460, port_no=2) at
lib/dpif-netdev.c:4257
#3  0x00000000008dd63d in pmd_thread_main (f_=<optimized out>) at
lib/dpif-netdev.c:5449
#4  0x000000000095e94d in ovsthread_wrapper (aux_=<optimized out>) at
lib/ovs-thread.c:352
#5  0x00007fe1633872de in start_thread () from /lib64/libpthread.so.0
#6  0x00007fe162b2ca63 in clone () from /lib64/libc.so.6

How do you restart the system? I have two afxdp ports:
        Port "eth3"
            Interface "eth3"
                type: afxdp
                options: {n_rxq="1", xdpmode=drv}
        Port "eth5"
            Interface "eth5"
                type: afxdp
                options: {n_rxq="1", xdpmode=drv}

I tested using:
# ovs-vsctl del-port eth3
# ovs-vsctl del-port eth5
# ovs-vsctl del-br br0
# ovs-appctl -t ovs-vswitchd exit
Looks ok.

I’m using an RHEL7 instance and use systemd to restart Open vSwitch with “systemctl restart openvswitch”.
It uses ovs-ctl to start/stop; see here for some details:

https://github.com/openvswitch/ovs/blob/master/rhel/usr_lib_systemd_system_ovs-vswitchd.service.in


<snip>

This means that if you rely on the (naive :-)) code in the sample
application, you can end up in a situation where you can receive from the
Rx ring, but not post to the fill ring.

So, the reason for the 16-packet hiccup is as follows (see the sketch below):

1. Userland: The fill ring is completely filled.
2. Kernel: One packet is received, one entry is picked from the fill ring,
   but the consumer pointer is not bumped, and the packet is placed on the
   Rx ring.
3. Userland: One packet is picked from the Rx ring.
4. Userland: Tries to put an entry on the fill ring. The fill ring is full,
   so userland spins.
5. Kernel: Only when 16 packets have been picked from the fill ring is the
   consumer pointer released.
6. Userland: Exits the while loop.
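
To make step 4 concrete, here is roughly what the naive refill loop in the sample application looks like (a simplified sketch; rcvd and addrs[] are assumed to come from the preceding Rx-ring peek):

    /* Naive refill: reserve one fill-ring slot per frame just taken
     * off the Rx ring, spinning until the reservation succeeds.  When
     * the kernel only bumps the fill ring's consumer pointer every 16
     * entries, the ring looks full even though entries have already
     * been consumed, so this loop spins until the batch completes. */
    uint32_t idx_fq = 0;
    while (xsk_ring_prod__reserve(&umem->fq, rcvd, &idx_fq) != rcvd) {
        /* busy-wait on an apparently full fill ring */
    }
    for (unsigned int i = 0; i < rcvd; i++) {
        *xsk_ring_prod__fill_addr(&umem->fq, idx_fq++) = addrs[i];
    }
    xsk_ring_prod__submit(&umem->fq, rcvd);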

Based on the above, there is no starvation problem here if there are more than 16 packets, correct? And at step 4, we can skip spinning and try to
process more of the Rx ring.

For the next version, I will first check the fill ring using xsk_prod_nb_free(),
to avoid step 4.

Yes, a __free() check here will skip this problem. I was running a single-ping test and it would spin forever…
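
For the record, a sketch of what that check could look like, assuming libbpf's xsk_prod_nb_free() (the deferral branch is illustrative only):

    /* Check free fill-ring space up front; if the kernel's consumer
     * pointer has not been published yet (the 16-entry batching
     * described above), skip the refill instead of spinning and
     * retry on the next rx call. */
    if (xsk_prod_nb_free(&umem->fq, rcvd) >= rcvd) {
        uint32_t idx_fq = 0;
        xsk_ring_prod__reserve(&umem->fq, rcvd, &idx_fq); /* cannot fail now */
        for (unsigned int i = 0; i < rcvd; i++) {
            *xsk_ring_prod__fill_addr(&umem->fq, idx_fq++) = addrs[i];
        }
        xsk_ring_prod__submit(&umem->fq, rcvd);
    } else {
        /* defer: keep the addresses and refill on a later call */
    }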

Thanks
William