Few more bits.

On 19.06.2019 22:51, William Tu wrote:
> The patch introduces experimental AF_XDP support for OVS netdev.
> AF_XDP, the Address Family of the eXpress Data Path, is a new Linux socket
> type built upon the eBPF and XDP technology.  It is aims to have comparable
> performance to DPDK but cooperate better with existing kernel's networking
> stack.  An AF_XDP socket receives and sends packets from an eBPF/XDP program
> attached to the netdev, by-passing a couple of Linux kernel's subsystems
> As a result, AF_XDP socket shows much better performance than AF_PACKET
> For more details about AF_XDP, please see linux kernel's
> Documentation/networking/af_xdp.rst. Note that by default, this feature is
> not compiled in.
> 
> Signed-off-by: William Tu <[email protected]>
> ---

<snip>

> +int
> +netdev_afxdp_batch_send(struct netdev *netdev, int qid,
> +                        struct dp_packet_batch *batch,
> +                        bool concurrent_txq)
> +{
> +    struct netdev_linux *dev = netdev_linux_cast(netdev);
> +    struct xsk_socket_info *xsk_info = dev->xsks[qid];

You're remapping 'qid' below, but using old 'xsk_info'.

> +    struct umem_elem *elems_pop[BATCH_SIZE];
> +    struct xsk_umem_info *umem;
> +    struct dp_packet *packet;
> +    bool free_batch = true;

This must be 'false' by default.

> +    uint32_t idx = 0;
> +    int error = 0;
> +    int ret;
> +
> +    if (OVS_UNLIKELY(concurrent_txq)) {
> +        qid = qid % dev->up.n_txq;
> +        ovs_spin_lock(&dev->tx_locks[qid]);
> +    }
> +
> +    if (!xsk_info || !xsk_info->xsk) {
> +        goto out;
> +    }
> +
> +    afxdp_complete_tx(xsk_info);
> +
> +    free_batch = check_free_batch(batch);
> +
> +    umem = xsk_info->umem;
> +    ret = umem_elem_pop_n(&umem->mpool, batch->count, (void **)elems_pop);
> +    if (OVS_UNLIKELY(ret)) {
> +        xsk_info->tx_dropped += batch->count;
> +        error = ENOMEM;
> +        goto out;
> +    }
> +
> +    /* Make sure we have enough TX descs */
> +    ret = xsk_ring_prod__reserve(&xsk_info->tx, batch->count, &idx);
> +    if (OVS_UNLIKELY(ret == 0)) {
> +        umem_elem_push_n(&umem->mpool, batch->count, (void **)elems_pop);
> +        xsk_info->tx_dropped += batch->count;
> +        error = ENOMEM;
> +        goto out;
> +    }
> +
> +    DP_PACKET_BATCH_FOR_EACH (i, packet, batch) {
> +        struct umem_elem *elem;
> +        uint64_t index;
> +
> +        elem = elems_pop[i];
> +        /* Copy the packet to the umem we just pop from umem pool.
> +         * TODO: avoid this copy if the packet and the pop umem
> +         * are located in the same umem.
> +         */
> +        memcpy(elem, dp_packet_data(packet), dp_packet_size(packet));
> +
> +        index = (uint64_t)((char *)elem - (char *)umem->buffer);
> +        xsk_ring_prod__tx_desc(&xsk_info->tx, idx + i)->addr = index;
> +        xsk_ring_prod__tx_desc(&xsk_info->tx, idx + i)->len
> +            = dp_packet_size(packet);
> +    }
> +    xsk_ring_prod__submit(&xsk_info->tx, batch->count);
> +    xsk_info->outstanding_tx += batch->count;
> +
> +    ret = kick_tx(xsk_info);
> +    if (OVS_UNLIKELY(ret)) {
> +        VLOG_WARN_RL(&rl, "error sending AF_XDP packet: %s",
> +                     ovs_strerror(ret));
> +    }
> +
> +out:
> +    if (free_batch) {
> +        free_afxdp_buf_batch(batch);
> +    } else {
> +        dp_packet_delete_batch(batch, true);
> +    }
> +
> +    if (OVS_UNLIKELY(concurrent_txq)) {
> +        ovs_spin_unlock(&dev->tx_locks[qid]);
> +    }
> +    return error;
> +}

<snip>

> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index bf4b6f8dc621..1f020e1c3825 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -3106,6 +3106,36 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 
> type=patch options:peer=p1 \
>          </p>
>        </column>
>  
> +      <column name="other_config" key="xdpmode"
> +              type='{"type": "string",
> +                     "enum": ["set", ["skb", "drv"]]}'>
> +        <p>
> +          Specifies the operational mode of the XDP program.
> +          If "drv", the XDP program is loaded into the device driver with
> +          zero-copy RX and TX enabled. This mode requires device driver with
> +          AF_XDP support and has the best performance.
> +          If "skb", the XDP program is using generic XDP mode in kernel with
> +          extra data copying between userspace and kernel. No device driver
> +          support is needed. Note that this is afxdp netdev type only.
> +          Defaults to "skb" mode.
> +        </p>
> +      </column>
> +
> +      <column name="other_config" key="xdpmode"
> +              type='{"type": "string",
> +                     "enum": ["set", ["skb", "drv"]]}'>
> +        <p>
> +          Specifies the operational mode of the XDP program.
> +          If "drv", the XDP program is loaded into the device driver with
> +          zero-copy RX and TX enabled. This mode requires device driver with
> +          AF_XDP support and has the best performance.
> +          If "skb", the XDP program is using generic XDP mode in kernel with
> +          extra data copying between userspace and kernel. No device driver
> +          support is needed. Note that this is afxdp netdev type only.
> +          Defaults to "skb" mode.
> +        </p>
> +      </column>
> +

Duplicated docs.


One more thing I noticed is the same issue as you had with completion queue, but
with rx queue. When I'm trying to send traffic from 2 threads to the same port,
I'm starting receiving same pointers from rx ring. Not only the same ring 
entries,
but there was cases where two identical pointers was stored sequentially in rx 
ring.
I'm more and more thinking that it's a kernel/libbpf bug. The last bit that left
for checking is the pointers inside the fill queue. All other parts in OVS 
seems to
work correctly. I'll send more information about the testcase later after 
re-checking
with the most recent bpf-next.

Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to