RE: [PATCH v2] net/af_xdp: fix umem map size for zero copy

2024-05-29 Thread Loftus, Ciara


> -Original Message-
> From: Du, Frank 
> Sent: Thursday, May 23, 2024 8:56 AM
> To: Morten Brørup ; Ferruh Yigit
> ; dev@dpdk.org; Andrew Rybchenko
> ; Burakov, Anatoly
> 
> Cc: Loftus, Ciara 
> Subject: RE: [PATCH v2] net/af_xdp: fix umem map size for zero copy
> 
> > -Original Message-
> > From: Morten Brørup 
> > Sent: Thursday, May 23, 2024 3:41 PM
> > To: Du, Frank ; Ferruh Yigit ;
> > dev@dpdk.org; Andrew Rybchenko ;
> Burakov,
> > Anatoly 
> > Cc: Loftus, Ciara 
> > Subject: RE: [PATCH v2] net/af_xdp: fix umem map size for zero copy
> >
> > > From: Du, Frank [mailto:frank...@intel.com]
> > > Sent: Thursday, 23 May 2024 08.56
> > >
> > > > From: Morten Brørup 
> > > > Sent: Wednesday, May 22, 2024 3:27 PM
> > > >
> > > > > From: Du, Frank [mailto:frank...@intel.com]
> > > > > Sent: Wednesday, 22 May 2024 03.25
> > > > >
> > > > > > From: Ferruh Yigit 
> > > > > > Sent: Wednesday, May 22, 2024 1:58 AM
> > > > > >
> > > > > > On 5/11/2024 6:26 AM, Frank Du wrote:
> > > > > > > The current calculation assumes that the mbufs are contiguous.
> > > > > > > However, this assumption is incorrect when the memory spans
> > > > > > > across a huge
> > > > > > page.
> >
> > What does "the memory spans across a huge page" mean?
> >
> > Should it be "the memory spans across multiple memory chunks"?
> 
> This does not pertain to multiple memory chunks but rather to mbuf memory.
> The scenario involves a single memory chunk utilizing multiple 2M pages. To
> ensure that each mbuf resides exclusively within a single page, there are
> deliberate spacing gaps when allocating mbufs across the 2M page
> boundaries.
> 
> >
> > > > > > > Correct to directly read the size from the mempool memory chunks.
> > > > > > >
> > > > > > > Signed-off-by: Frank Du 
> > > > > > >
> > > > > > > ---
> > > > > > > v2:
> > > > > > > * Add virtual contiguous detect for for multiple memhdrs.
> > > > > > > ---
> > > > > > >  drivers/net/af_xdp/rte_eth_af_xdp.c | 34
> > > > > > > -
> > > > > > >  1 file changed, 28 insertions(+), 6 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> > > > > > > b/drivers/net/af_xdp/rte_eth_af_xdp.c
> > > > > > > index 268a130c49..7456108d6d 100644
> > > > > > > --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> > > > > > > +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> > > > > > > @@ -1039,16 +1039,35 @@ eth_link_update(struct rte_eth_dev
> > > > > > > *dev __rte_unused,  }
> > > > > > >
> > > > > > >  #if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
> > > > > > > -static inline uintptr_t get_base_addr(struct rte_mempool *mp,
> > > > > > > uint64_t *align)
> > > > > > > +static inline uintptr_t get_memhdr_info(struct rte_mempool
> > > > > > > +*mp, uint64_t *align, size_t *len)
> > > > > > >  {
> > > > > > > - struct rte_mempool_memhdr *memhdr;
> > > > > > > + struct rte_mempool_memhdr *memhdr, *next;
> > > > > > >   uintptr_t memhdr_addr, aligned_addr;
> > > > > > > + size_t memhdr_len = 0;
> > > > > > >
> > > > > > > + /* get the mempool base addr and align */
> > > > > > >   memhdr = STAILQ_FIRST(&mp->mem_list);
> > > > > > >   memhdr_addr = (uintptr_t)memhdr->addr;
> > > >
> > > > This is not a new bug; but if the mempool is not populated, memhdr
> > > > is NULL
> > > here.
> > >
> > > Thanks, will add a check later.
> > >
> > > >
> > > > > > >   aligned_addr = memhdr_addr & ~(getpagesize() - 1);
> > > > > > >   *align = memhdr_addr - aligned_addr;
> > > > > > >
> > > > > >
> > > > > > I am aware this is not part of this patch, but as note, can't we
> > > > > > use 'RTE_ALIGN_FLOOR' to calcula

RE: [PATCH v6 4/9] net/af_xdp: use generic SW stats

2024-05-17 Thread Loftus, Ciara
> Subject: [PATCH v6 4/9] net/af_xdp: use generic SW stats
> 
> Use common code for all SW stats.
> 
> Signed-off-by: Stephen Hemminger 
> ---
>  drivers/net/af_xdp/rte_eth_af_xdp.c | 98 -
>  1 file changed, 25 insertions(+), 73 deletions(-)
> 
> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> b/drivers/net/af_xdp/rte_eth_af_xdp.c
> index 268a130c49..65fc2f478f 100644
> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c

[snip]

> @@ -541,6 +521,7 @@ af_xdp_tx_zc(void *queue, struct rte_mbuf **bufs,
> uint16_t nb_pkts)
> 
>   for (i = 0; i < nb_pkts; i++) {
>   mbuf = bufs[i];
> + pkt_len = rte_pktmbuf_pkt_len(mbuf);
> 
>   if (mbuf->pool == umem->mb_pool) {
>   if (!xsk_ring_prod__reserve(&txq->tx, 1, &idx_tx)) {
> @@ -589,17 +570,13 @@ af_xdp_tx_zc(void *queue, struct rte_mbuf **bufs,
> uint16_t nb_pkts)
>   count++;
>   }
> 
> - tx_bytes += mbuf->pkt_len;
> + rte_eth_count_packet(&txq->stats, pkt_len);

This change resolves the bugzilla you reported recently (1440 - use after free 
in af_xdp). Should this be mentioned in the commit message? We probably still 
need a separate patch for backporting that can be used without this entire 
series.

>   }
> 
>  out:
>   xsk_ring_prod__submit(&txq->tx, count);
>   kick_tx(txq, cq);
> 
> - txq->stats.tx_pkts += count;
> - txq->stats.tx_bytes += tx_bytes;
> - txq->stats.tx_dropped += nb_pkts - count;
> -
>   return count;
>  }
>  #else
> @@ -610,7 +587,6 @@ af_xdp_tx_cp(void *queue, struct rte_mbuf **bufs,
> uint16_t nb_pkts)
>   struct xsk_umem_info *umem = txq->umem;
>   struct rte_mbuf *mbuf;
>   void *addrs[ETH_AF_XDP_TX_BATCH_SIZE];
> - unsigned long tx_bytes = 0;
>   int i;
>   uint32_t idx_tx;
>   struct xsk_ring_cons *cq = &txq->pair->cq;
> @@ -640,7 +616,8 @@ af_xdp_tx_cp(void *queue, struct rte_mbuf **bufs,
> uint16_t nb_pkts)
>   pkt = xsk_umem__get_data(umem->mz->addr,
>desc->addr);
>   rte_memcpy(pkt, rte_pktmbuf_mtod(mbuf, void *), desc-
> >len);
> - tx_bytes += mbuf->pkt_len;
> + rte_eth_qsw_update(&txq->stats, mbuf);

Typo? Assume this should be rte_eth_count_packet
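i.e. something like the following (untested, mirroring how the zero-copy path
above counts the packet before the mbuf is released):

	rte_eth_count_packet(&txq->stats, rte_pktmbuf_pkt_len(mbuf));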

> +
>   rte_pktmbuf_free(mbuf);
>   }
> 
> @@ -648,9 +625,6 @@ af_xdp_tx_cp(void *queue, struct rte_mbuf **bufs,
> uint16_t nb_pkts)
> 
>   kick_tx(txq, cq);
> 
> - txq->stats.tx_pkts += nb_pkts;
> - txq->stats.tx_bytes += tx_bytes;
> -
>   return nb_pkts;
>  }
> 
> @@ -847,39 +821,26 @@ eth_dev_info(struct rte_eth_dev *dev, struct
> rte_eth_dev_info *dev_info)
>  static int
>  eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
>  {
> - struct pmd_internals *internals = dev->data->dev_private;
>   struct pmd_process_private *process_private = dev->process_private;
> - struct xdp_statistics xdp_stats;
> - struct pkt_rx_queue *rxq;
> - struct pkt_tx_queue *txq;
> - socklen_t optlen;
> - int i, ret, fd;
> + unsigned int i;
> 
> - for (i = 0; i < dev->data->nb_rx_queues; i++) {
> - optlen = sizeof(struct xdp_statistics);
> - rxq = &internals->rx_queues[i];
> - txq = rxq->pair;
> - stats->q_ipackets[i] = rxq->stats.rx_pkts;
> - stats->q_ibytes[i] = rxq->stats.rx_bytes;
> + rte_eth_counters_stats_get(dev, offsetof(struct pkt_tx_queue, stats),
> +offsetof(struct pkt_rx_queue, stats), stats);
> 
> - stats->q_opackets[i] = txq->stats.tx_pkts;
> - stats->q_obytes[i] = txq->stats.tx_bytes;
> + for (i = 0; i < dev->data->nb_rx_queues; i++) {
> + struct xdp_statistics xdp_stats;
> + socklen_t optlen = sizeof(xdp_stats);
> + int fd;
> 
> - stats->ipackets += stats->q_ipackets[i];
> - stats->ibytes += stats->q_ibytes[i];
> - stats->imissed += rxq->stats.rx_dropped;
> - stats->oerrors += txq->stats.tx_dropped;
>   fd = process_private->rxq_xsk_fds[i];
> - ret = fd >= 0 ? getsockopt(fd, SOL_XDP, XDP_STATISTICS,
> -&xdp_stats, &optlen) : -1;
> - if (ret != 0) {
> + if (fd < 0)
> + continue;
> + if (getsockopt(fd, SOL_XDP, XDP_STATISTICS,
> +&xdp_stats, &optlen)  < 0) {
>   AF_XDP_LOG(ERR, "getsockopt() failed for
> XDP_STATISTICS.\n");
>   return -1;
>   }
>   stats->imissed += xdp_stats.rx_dropped;
> -
> - stats->opackets += stats->q_opackets[i];
> - stats->obytes += stats->q_obytes[i];
>   }
> 
>   return 0;
> @@ -888,17 +849,8 @@ eth_stats_get(struct rt

RE: [PATCH v2] net/af_xdp: fix umem map size for zero copy

2024-05-17 Thread Loftus, Ciara
> 
> The current calculation assumes that the mbufs are contiguous. However,
> this assumption is incorrect when the memory spans across a huge page.
> Correct to directly read the size from the mempool memory chunks.
> 
> Signed-off-by: Frank Du 

Hi Frank,

Thanks for the patch.

Before your patch the umem_size was calculated as
mb_pool->populated_size * rte_mempool_calc_obj_size(mb_pool->elt_size, ..).
With your patch we sum up the lens of all memhdrs in the mempool.

When debugging I see the new calculation can yield a larger value, presumably
because the chunk lengths also cover the padding left at page boundaries. The
new logic looks good and more thorough to me, so I'm happy to go with the new
approach.
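
Roughly, the before/after comes down to the following (untested sketch, with
"frame_size" standing in for usr_config.frame_size in the PMD):

	/* before: assumes the populated objects are packed back to back */
	umem_size = (uint64_t)mb_pool->populated_size * frame_size + align;

	/* after: trust the mempool's own chunk accounting, which also
	 * includes any gap left so that no mbuf crosses a page boundary
	 */
	struct rte_mempool_memhdr *memhdr;
	size_t len = 0;

	STAILQ_FOREACH(memhdr, &mb_pool->mem_list, next)
		len += memhdr->len;
	umem_size = (uint64_t)len + align;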

Acked-by: Ciara Loftus 

Thanks,
Ciara

> 
> ---
> v2:
> * Add virtual contiguous detect for for multiple memhdrs.
> ---
>  drivers/net/af_xdp/rte_eth_af_xdp.c | 34 
> -
>  1 file changed, 28 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> b/drivers/net/af_xdp/rte_eth_af_xdp.c
> index 268a130c49..7456108d6d 100644
> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> @@ -1039,16 +1039,35 @@ eth_link_update(struct rte_eth_dev *dev
> __rte_unused,
>  }
> 
>  #if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
> -static inline uintptr_t get_base_addr(struct rte_mempool *mp, uint64_t
> *align)
> +static inline uintptr_t get_memhdr_info(struct rte_mempool *mp, uint64_t
> *align, size_t *len)
>  {
> - struct rte_mempool_memhdr *memhdr;
> + struct rte_mempool_memhdr *memhdr, *next;
>   uintptr_t memhdr_addr, aligned_addr;
> + size_t memhdr_len = 0;
> 
> + /* get the mempool base addr and align */
>   memhdr = STAILQ_FIRST(&mp->mem_list);
>   memhdr_addr = (uintptr_t)memhdr->addr;
>   aligned_addr = memhdr_addr & ~(getpagesize() - 1);
>   *align = memhdr_addr - aligned_addr;
> + memhdr_len += memhdr->len;
> +
> + /* check if virtual contiguous memory for multiple memhdrs */
> + next = STAILQ_NEXT(memhdr, next);
> + while (next != NULL) {
> + if ((uintptr_t)next->addr != (uintptr_t)memhdr->addr +
> memhdr->len) {
> + AF_XDP_LOG(ERR, "memory chunks not virtual
> contiguous, "
> + "next: %p, cur: %p(len: %" PRId64 "
> )\n",
> + next->addr, memhdr->addr, memhdr-
> >len);
> + return 0;
> + }
> + /* virtual contiguous */
> + memhdr = next;
> + memhdr_len += memhdr->len;
> + next = STAILQ_NEXT(memhdr, next);
> + }
> 
> + *len = memhdr_len;
>   return aligned_addr;
>  }
> 
> @@ -1125,6 +1144,7 @@ xsk_umem_info *xdp_umem_configure(struct
> pmd_internals *internals,
>   void *base_addr = NULL;
>   struct rte_mempool *mb_pool = rxq->mb_pool;
>   uint64_t umem_size, align = 0;
> + size_t len = 0;
> 
>   if (internals->shared_umem) {
>   if (get_shared_umem(rxq, internals->if_name, &umem) < 0)
> @@ -1156,10 +1176,12 @@ xsk_umem_info *xdp_umem_configure(struct
> pmd_internals *internals,
>   }
> 
>   umem->mb_pool = mb_pool;
> - base_addr = (void *)get_base_addr(mb_pool, &align);
> - umem_size = (uint64_t)mb_pool->populated_size *
> - (uint64_t)usr_config.frame_size +
> - align;
> + base_addr = (void *)get_memhdr_info(mb_pool, &align,
> &len);
> + if (!base_addr) {
> + AF_XDP_LOG(ERR, "Failed to parse memhdr info from
> pool\n");
> + goto err;
> + }
> + umem_size = (uint64_t)len + align;
> 
>   ret = xsk_umem__create(&umem->umem, base_addr,
> umem_size,
>   &rxq->fq, &rxq->cq, &usr_config);
> --
> 2.34.1



RE: [PATCH 2/3] net/af_xdp: Fix mbuf alloc failed statistic

2024-05-14 Thread Loftus, Ciara
> 
> On Fri, 10 May 2024 10:03:57 +
> Ciara Loftus  wrote:
> 
> > diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> b/drivers/net/af_xdp/rte_eth_af_xdp.c
> > index fee0d5d5f3..968bbf6d45 100644
> > --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> > +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> > @@ -124,6 +124,7 @@ struct rx_stats {
> > uint64_t rx_pkts;
> > uint64_t rx_bytes;
> > uint64_t rx_dropped;
> > +   uint64_t alloc_failed;
> >  };
> 
> You don't have to use local statistic for this, there already is one in the 
> dev
> struct
> i.e dev->data->rx_mbuf_alloc_failed. The problem is you need the DPDK port
> number to find
> what dev is.

We now have the port number from the first patch in this series so that's no 
longer an issue.
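
With the port id available, the queue-local counter could indeed be dropped in
favour of something along these lines (rough sketch; the field holding the
port id comes from the first patch in the series, so the names below are only
illustrative):

	/* on mbuf allocation failure in the rx path */
	struct rte_eth_dev *dev = &rte_eth_devices[rxq->port_id];

	dev->data->rx_mbuf_alloc_failed += nb_failed;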

> 
> And the code in ethdev for stats get will put it in the right place.
> 
> 
> PS: what is the point of rxq->stats.rx_dropped? It is never incremented.

Looks pointless indeed. Will add another patch to the series and remove it.

> 
> PPS: Looks like AF_XDP considers kernel full as an error (ie tx_dropped gets
> counted as error).
> This is not what real hardware does.


RE: [PATCH] net/af_xdp: fix umem map size for zero copy

2024-04-30 Thread Loftus, Ciara
> >
> > > Subject: [PATCH] net/af_xdp: fix umem map size for zero copy
> > >
> > > The current calculation assumes that the mbufs are contiguous.
> > > However, this assumption is incorrect when the memory spans across a
> huge
> > page.
> > > Correct to directly read the size from the mempool memory chunks.
> > >
> > > Signed-off-by: Frank Du 
> > > ---
> > >  drivers/net/af_xdp/rte_eth_af_xdp.c | 10 +-
> > >  1 file changed, 5 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> > > b/drivers/net/af_xdp/rte_eth_af_xdp.c
> > > index 268a130c49..cb95d17d13 100644
> > > --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> > > +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> > > @@ -1039,7 +1039,7 @@ eth_link_update(struct rte_eth_dev *dev
> > > __rte_unused,  }
> > >
> > >  #if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
> > > -static inline uintptr_t get_base_addr(struct rte_mempool *mp,
> > > uint64_t
> > > *align)
> > > +static inline uintptr_t get_memhdr_info(struct rte_mempool *mp,
> > > +uint64_t
> > > *align, size_t *len)
> > >  {
> > >   struct rte_mempool_memhdr *memhdr;
> > >   uintptr_t memhdr_addr, aligned_addr; @@ -1048,6 +1048,7 @@
> static
> > > inline uintptr_t get_base_addr(struct rte_mempool *mp, uint64_t
> > > *align)
> > >   memhdr_addr = (uintptr_t)memhdr->addr;
> > >   aligned_addr = memhdr_addr & ~(getpagesize() - 1);
> > >   *align = memhdr_addr - aligned_addr;
> > > + *len = memhdr->len;
> > >
> > >   return aligned_addr;
> > >  }
> > > @@ -1125,6 +1126,7 @@ xsk_umem_info *xdp_umem_configure(struct
> > > pmd_internals *internals,
> > >   void *base_addr = NULL;
> > >   struct rte_mempool *mb_pool = rxq->mb_pool;
> > >   uint64_t umem_size, align = 0;
> > > + size_t len = 0;
> > >
> > >   if (internals->shared_umem) {
> > >   if (get_shared_umem(rxq, internals->if_name, &umem) < 0)
> @@
> > > -1156,10 +1158,8 @@ xsk_umem_info *xdp_umem_configure(struct
> > > pmd_internals *internals,
> > >   }
> > >
> > >   umem->mb_pool = mb_pool;
> > > - base_addr = (void *)get_base_addr(mb_pool, &align);
> > > - umem_size = (uint64_t)mb_pool->populated_size *
> > > - (uint64_t)usr_config.frame_size +
> > > - align;
> > > + base_addr = (void *)get_memhdr_info(mb_pool, &align,
> > > &len);
> > > + umem_size = (uint64_t)len + align;
> >
> > len is set to the length of the first memhdr of the mempool. There may be
> many
> > other memhdrs in the mempool. So I don't think this is the correct value to
> use for
> > calculating the entire umem size.
> 
> Current each xdp rx ring is bonded to one single umem region, it can't reuse
> the memory
> if there are multiple memhdrs in the mempool. How about adding a check on
> the number
> of the memory chunks to only allow one single memhdr mempool can be used
> here?

The UMEM needs to be a region of virtually contiguous memory. I think this can
still be the case even if the mempool has multiple memhdrs.
If we detect >1 memhdrs, perhaps we need to verify that the
RTE_MEMPOOL_F_NO_IOVA_CONTIG flag is not set, which I think would mean that the
mempool may not be virtually contiguous.
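
Something along these lines in xdp_umem_configure(), perhaps (untested sketch
of that idea; whether the flag check alone is a strong enough guarantee would
need to be confirmed):

	/* only trust a multi-chunk pool if it was not created with
	 * RTE_MEMPOOL_F_NO_IOVA_CONTIG */
	if (mb_pool->nb_mem_chunks > 1 &&
	    (mb_pool->flags & RTE_MEMPOOL_F_NO_IOVA_CONTIG)) {
		AF_XDP_LOG(ERR, "mempool cannot back a single umem region\n");
		goto err;
	}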

> 
> >
> > >
> > >   ret = xsk_umem__create(&umem->umem, base_addr,
> > umem_size,
> > >   &rxq->fq, &rxq->cq, &usr_config);
> > > --
> > > 2.34.1



RE: [PATCH] net/af_xdp: fix umem map size for zero copy

2024-04-26 Thread Loftus, Ciara
> Subject: [PATCH] net/af_xdp: fix umem map size for zero copy
> 
> The current calculation assumes that the mbufs are contiguous. However,
> this assumption is incorrect when the memory spans across a huge page.
> Correct to directly read the size from the mempool memory chunks.
> 
> Signed-off-by: Frank Du 
> ---
>  drivers/net/af_xdp/rte_eth_af_xdp.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> b/drivers/net/af_xdp/rte_eth_af_xdp.c
> index 268a130c49..cb95d17d13 100644
> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> @@ -1039,7 +1039,7 @@ eth_link_update(struct rte_eth_dev *dev
> __rte_unused,
>  }
> 
>  #if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
> -static inline uintptr_t get_base_addr(struct rte_mempool *mp, uint64_t
> *align)
> +static inline uintptr_t get_memhdr_info(struct rte_mempool *mp, uint64_t
> *align, size_t *len)
>  {
>   struct rte_mempool_memhdr *memhdr;
>   uintptr_t memhdr_addr, aligned_addr;
> @@ -1048,6 +1048,7 @@ static inline uintptr_t get_base_addr(struct
> rte_mempool *mp, uint64_t *align)
>   memhdr_addr = (uintptr_t)memhdr->addr;
>   aligned_addr = memhdr_addr & ~(getpagesize() - 1);
>   *align = memhdr_addr - aligned_addr;
> + *len = memhdr->len;
> 
>   return aligned_addr;
>  }
> @@ -1125,6 +1126,7 @@ xsk_umem_info *xdp_umem_configure(struct
> pmd_internals *internals,
>   void *base_addr = NULL;
>   struct rte_mempool *mb_pool = rxq->mb_pool;
>   uint64_t umem_size, align = 0;
> + size_t len = 0;
> 
>   if (internals->shared_umem) {
>   if (get_shared_umem(rxq, internals->if_name, &umem) < 0)
> @@ -1156,10 +1158,8 @@ xsk_umem_info *xdp_umem_configure(struct
> pmd_internals *internals,
>   }
> 
>   umem->mb_pool = mb_pool;
> - base_addr = (void *)get_base_addr(mb_pool, &align);
> - umem_size = (uint64_t)mb_pool->populated_size *
> - (uint64_t)usr_config.frame_size +
> - align;
> + base_addr = (void *)get_memhdr_info(mb_pool, &align,
> &len);
> + umem_size = (uint64_t)len + align;

len is set to the length of the first memhdr of the mempool. There may be many 
other memhdrs in the mempool. So I don't think this is the correct value to use 
for calculating the entire umem size.

> 
>   ret = xsk_umem__create(&umem->umem, base_addr,
> umem_size,
>   &rxq->fq, &rxq->cq, &usr_config);
> --
> 2.34.1



RE: [v1 1/1] docs: af_xdp device plugin repo update

2024-04-26 Thread Loftus, Ciara
> Subject: [v1 1/1] docs: af_xdp device plugin repo update
> 
> Fixup the references to the AF_XDP Device Plugin repo.
> 
> Fixes: 66a2aca4f512 ("docs: fix AF_XDP device plugin howto")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Maryam Tahhan 

Acked-by: Ciara Loftus 

> ---
>  doc/guides/howto/af_xdp_cni.rst | 12 ++--
>  doc/guides/nics/af_xdp.rst  |  2 +-
>  2 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/doc/guides/howto/af_xdp_cni.rst
> b/doc/guides/howto/af_xdp_cni.rst
> index a1a6d5b99c..63345ec79c 100644
> --- a/doc/guides/howto/af_xdp_cni.rst
> +++ b/doc/guides/howto/af_xdp_cni.rst
> @@ -16,7 +16,7 @@ to redirect packets to a memory buffer in userspace.
>  This document explains how to enable the `AF_XDP Plugin for Kubernetes`_
> within
>  a DPDK application using the :doc:`../nics/af_xdp` to connect and use these
> technologies.
> 
> -.. _AF_XDP Plugin for Kubernetes: https://github.com/intel/afxdp-plugins-
> for-kubernetes
> +.. _AF_XDP Plugin for Kubernetes: https://github.com/redhat-et/afxdp-
> plugins-for-kubernetes
> 
> 
>  Background
> @@ -91,7 +91,7 @@ Howto run dpdk-testpmd with CNI plugin:
> 
>.. code-block:: console
> 
> - # git clone https://github.com/intel/afxdp-plugins-for-kubernetes.git
> + # git clone 
> https://github.com/redhat-et/afxdp-plugins-for-kubernetes.git
> 
>  * Build the CNI plugin
> 
> @@ -128,7 +128,7 @@ Howto run dpdk-testpmd with CNI plugin:
> 
>For further reference please use the `config.json`_
> 
> -  .. _config.json: https://github.com/intel/afxdp-plugins-for-
> kubernetes/blob/v0.0.2/test/e2e/config.json
> +  .. _config.json: https://github.com/redhat-et/afxdp-plugins-for-
> kubernetes/blob/v0.0.2/test/e2e/config.json
> 
>  * Create the Network Attachment definition
> 
> @@ -167,7 +167,7 @@ Howto run dpdk-testpmd with CNI plugin:
> 
>For further reference please use the `nad.yaml`_
> 
> -  .. _nad.yaml: https://github.com/intel/afxdp-plugins-for-
> kubernetes/blob/v0.0.2/test/e2e/nad.yaml
> +  .. _nad.yaml: https://github.com/redhat-et/afxdp-plugins-for-
> kubernetes/blob/v0.0.2/test/e2e/nad.yaml
> 
>  * Build the Docker image
> 
> @@ -237,7 +237,7 @@ Howto run dpdk-testpmd with CNI plugin:
> 
>For further reference please use the `pod.yaml`_
> 
> -  .. _pod.yaml: https://github.com/intel/afxdp-plugins-for-
> kubernetes/blob/v0.0.2/test/e2e/pod-1c1d.yaml
> +  .. _pod.yaml: https://github.com/redhat-et/afxdp-plugins-for-
> kubernetes/blob/v0.0.2/test/e2e/pod-1c1d.yaml
> 
>  * Run DPDK with a command like the following:
> 
> @@ -250,4 +250,4 @@ Howto run dpdk-testpmd with CNI plugin:
> 
>  For further reference please use the `e2e`_ test case in `AF_XDP Plugin for
> Kubernetes`_
> 
> -  .. _e2e: https://github.com/intel/afxdp-plugins-for-
> kubernetes/tree/v0.0.2/test/e2e
> +  .. _e2e: https://github.com/redhat-et/afxdp-plugins-for-
> kubernetes/tree/v0.0.2/test/e2e
> diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
> index 1932525d4d..4612168122 100644
> --- a/doc/guides/nics/af_xdp.rst
> +++ b/doc/guides/nics/af_xdp.rst
> @@ -157,7 +157,7 @@ use_cni
>  The EAL vdev argument ``use_cni`` is used to indicate that the user wishes to
>  enable the `AF_XDP Plugin for Kubernetes`_ within a DPDK application.
> 
> -.. _AF_XDP Plugin for Kubernetes: https://github.com/intel/afxdp-plugins-
> for-kubernetes
> +.. _AF_XDP Plugin for Kubernetes: https://github.com/redhat-et/afxdp-
> plugins-for-kubernetes
> 
>  .. code-block:: console
> 
> --
> 2.41.0



RE: [v1 1/1] MAINTAINERS: add another AF_XDP maintainer

2024-04-25 Thread Loftus, Ciara
> Subject: [v1 1/1] MAINTAINERS: add another AF_XDP maintainer
> 
> Add Maryam Tahhan as an additional maintainer for AF_XDP
> PMD and it's documentation.
> 
> Signed-off-by: Maryam Tahhan 

Thanks Maryam.

Acked-by: Ciara Loftus 

> ---
>  MAINTAINERS | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 7abb3aee49..f0d6a36abd 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -647,6 +647,7 @@ F: doc/guides/nics/features/afpacket.ini
> 
>  Linux AF_XDP
>  M: Ciara Loftus 
> +M: Maryam Tahhan 
>  F: drivers/net/af_xdp/
>  F: doc/guides/nics/af_xdp.rst
>  F: doc/guides/nics/features/af_xdp.ini
> --
> 2.41.0



RE: [v14 0/3] net/af_xdp: fix multi interface support for K8s

2024-04-08 Thread Loftus, Ciara
> 
> The original `use_cni` implementation was limited to
> supporting only a single netdev in a DPDK pod. This patchset
> aims to fix this limitation transparently to the end user.
> It will also enable compatibility with the latest AF_XDP
> Device Plugin.
> 
> Signed-off-by: Maryam Tahhan 

Thanks Maryam.

For the series,
Acked-by: Ciara Loftus 

> ---
> v14:
> * Fixup bpf_map_update_elem() in compat.h to use xsk fd as the
>   third argument.
> 
> v13:
> * Fixup checkpatch issues.
> 
> v12:
> * Ensure backwards compability with libbpf versions that don't support
>   xsk_socket__update_xskmap().
> 
> v11:
> * Fixed up typos picked up by checkpatch.
> 
> v10:
> * Add UDS acronym
> * Update `use_cni` in docs with ``use_cni``
> * Remove reference to limitations and simply document behaviour
>   before and after DPDK 23.11.
> 
> v9:
> * Fixup checkpatch issues.
> 
> v8:
> * Go back to using `use_cni` vdev argument
> * Introduce `use_map_pinning` vdev param.
> * Rename `uds_path` to `dp_path` so that it can be used
>   with map pinning as well as `use_cni`.
> * Set `dp_path` internally in the AF_XDP PMD if it's
>   not configured by the user.
> * Clean up the original `use_cni` documentation separately
>   to coding changes.
> 
> v7:
> * Give a more descriptive commit msg headline.
> * Fixup typos in documentation.
> 
> v6:
> * Add link to PR 81 in commit message
> * Add release notes changes to this patchset
> 
> v5:
> * Fix alignment for ETH_AF_XDP_USE_DP_UDS_PATH_ARG
> * Remove use_cni references in af_xdp.rst
> 
> v4:
> * Rename af_xdp_cni.rst to af_xdp_dp.rst
> * Removed all incorrect references to CNI throughout af_xdp
>   PMD file.
> * Fixed Typos in af_xdp_dp.rst
> 
> v3:
> * Remove `use_cni` vdev argument as it's no longer needed.
> * Update incorrect CNI references for the AF_XDP DP in the
>   documentation.
> * Update the documentation to run a simple example with the
>   AF_XDP DP plugin in K8s.
> 
> v2:
> * Rename sock_path to uds_path.
> * Update documentation to reflect when CAP_BPF is needed.
> * Fix testpmd arguments in the provided example for Pods.
> * Use AF_XDP API to update the xskmap entry.
> ---
> 
> Maryam Tahhan (3):
>   docs: AF_XDP Device Plugin
>   net/af_xdp: fix multi interface support for K8s
>   net/af_xdp: support AF_XDP DP pinned maps
> 
>  doc/guides/howto/af_xdp_cni.rst| 253 --
>  doc/guides/howto/af_xdp_dp.rst | 340 +
>  doc/guides/howto/index.rst |   2 +-
>  doc/guides/nics/af_xdp.rst |  44 +++-
>  doc/guides/rel_notes/release_24_07.rst |  17 ++
>  drivers/net/af_xdp/compat.h|  15 ++
>  drivers/net/af_xdp/meson.build |   4 +
>  drivers/net/af_xdp/rte_eth_af_xdp.c| 170 +
>  8 files changed, 543 insertions(+), 302 deletions(-)
>  delete mode 100644 doc/guides/howto/af_xdp_cni.rst
>  create mode 100644 doc/guides/howto/af_xdp_dp.rst
> 
> --
> 2.41.0



RE: [v13 2/3] net/af_xdp: fix multi interface support for K8s

2024-04-08 Thread Loftus, Ciara


> 
> +#ifdef ETH_AF_XDP_UPDATE_XSKMAP
> +static __rte_always_inline int
> +update_xskmap(struct xsk_socket *xsk, int map_fd, int xsk_queue_idx
> __rte_unused)
> +{
> + return xsk_socket__update_xskmap(xsk, map_fd);
> +}
> +#else
> +static __rte_always_inline int
> +update_xskmap(struct xsk_socket *xsk, int map_fd, int xsk_queue_idx)
> +{
> + int fd = xsk_socket__fd(xsk);

'fd' computed here is not used in this function, so it generates an unused
variable warning.
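
Also, the call below passes &map_fd as the value; presumably the intent was to
insert the xsk fd instead, i.e. something like (untested):

	return bpf_map_update_elem(map_fd, &xsk_queue_idx, &fd, 0);

which would also take care of the unused variable.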

> + return bpf_map_update_elem(map_fd, &xsk_queue_idx, &map_fd,
> 0);
> +}
> +#endif
> +



RE: [v11 2/3] net/af_xdp: fix multi interface support for K8s

2024-03-01 Thread Loftus, Ciara
snip

> @@ -1695,17 +1699,16 @@ xsk_configure(struct pmd_internals *internals,
> struct pkt_rx_queue *rxq,
>   }
> 
>   if (internals->use_cni) {
> - int err, fd, map_fd;
> + int err, map_fd;
> 
> - /* get socket fd from CNI plugin */
> - map_fd = get_cni_fd(internals->if_name);
> + /* get socket fd from AF_XDP Device Plugin */
> + map_fd = uds_get_xskmap_fd(internals->if_name, internals-
> >dp_path);
>   if (map_fd < 0) {
> - AF_XDP_LOG(ERR, "Failed to receive CNI plugin fd\n");
> + AF_XDP_LOG(ERR, "Failed to receive xskmap fd from
> AF_XDP Device Plugin\n");
>   goto out_xsk;
>   }
> - /* get socket fd */
> - fd = xsk_socket__fd(rxq->xsk);
> - err = bpf_map_update_elem(map_fd, &rxq->xsk_queue_idx,
> &fd, 0);
> +
> + err = xsk_socket__update_xskmap(rxq->xsk, map_fd);

Hi Maryam,

I've reviewed the series again. I haven't tested the device-plugin specific 
functionality as I don't have that environment set up, but outside of that I am 
happy that the new functionality doesn't break anything else. The doc updates 
look good to me now, thank you for the fixes.

I have just spotted one issue and I apologise for only catching it now.
Patch 2 introduces a dependency on the xsk_socket__update_xskmap function which 
is available in:
libbpf >= v0.3.0 and <= v0.6.0
libxdp > v1.2.0

The af_xdp.rst guide states we are compatible with libbpf (on its own) <=
v0.6.0. So users using libbpf < v0.3.0 will get an undefined reference warning
for the xsk_socket__update_xskmap function.

Is it possible to implement fallback functionality (or if that's not possible, 
bail out) if that function is not available? See how this is done for the 
xsk_socket__create_shared function in meson.build and compat.h. 
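
The compat.h side of that pattern might look roughly like this (sketch only;
the macro would be set by a matching cc.has_function() check in meson.build,
and the names here are just illustrative):

	#ifdef ETH_AF_XDP_UPDATE_XSKMAP
	static __rte_always_inline int
	update_xskmap(struct xsk_socket *xsk, int map_fd)
	{
		return xsk_socket__update_xskmap(xsk, map_fd);
	}
	#else
	static __rte_always_inline int
	update_xskmap(struct xsk_socket *xsk __rte_unused, int map_fd __rte_unused)
	{
		/* bail out: the linked libbpf/libxdp does not provide
		 * xsk_socket__update_xskmap()
		 */
		return -1;
	}
	#endif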

Thanks,
Ciara

>   if (err) {
>   AF_XDP_LOG(ERR, "Failed to insert unprivileged xsk in
> map.\n");
>   goto out_xsk;
> @@ -1881,13 +1884,13 @@ static const struct eth_dev_ops ops = {
>   .get_monitor_addr = eth_get_monitor_addr,
>  };
> 
> -/* CNI option works in unprivileged container environment
> - * and ethernet device functionality will be reduced. So
> - * additional customiszed eth_dev_ops struct is needed
> - * for cni. Promiscuous enable and disable functionality
> - * is removed.
> +/* AF_XDP Device Plugin option works in unprivileged
> + * container environments and ethernet device functionality
> + * will be reduced. So additional customised eth_dev_ops
> + * struct is needed for the Device Plugin. Promiscuous
> + * enable and disable functionality is removed.
>   **/
> -static const struct eth_dev_ops ops_cni = {
> +static const struct eth_dev_ops ops_afxdp_dp = {
>   .dev_start = eth_dev_start,
>   .dev_stop = eth_dev_stop,
>   .dev_close = eth_dev_close,
> @@ -2023,7 +2026,8 @@ xdp_get_channels_info(const char *if_name, int
> *max_queues,
>  static int
>  parse_parameters(struct rte_kvargs *kvlist, char *if_name, int *start_queue,
>int *queue_cnt, int *shared_umem, char *prog_path,
> -  int *busy_budget, int *force_copy, int *use_cni)
> +  int *busy_budget, int *force_copy, int *use_cni,
> +  char *dp_path)
>  {
>   int ret;
> 
> @@ -2069,6 +2073,11 @@ parse_parameters(struct rte_kvargs *kvlist, char
> *if_name, int *start_queue,
>   if (ret < 0)
>   goto free_kvlist;
> 
> + ret = rte_kvargs_process(kvlist, ETH_AF_XDP_DP_PATH_ARG,
> +  &parse_prog_arg, dp_path);
> + if (ret < 0)
> + goto free_kvlist;
> +
>  free_kvlist:
>   rte_kvargs_free(kvlist);
>   return ret;
> @@ -2108,7 +2117,7 @@ static struct rte_eth_dev *
>  init_internals(struct rte_vdev_device *dev, const char *if_name,
>  int start_queue_idx, int queue_cnt, int shared_umem,
>  const char *prog_path, int busy_budget, int force_copy,
> -int use_cni)
> +int use_cni, const char *dp_path)
>  {
>   const char *name = rte_vdev_device_name(dev);
>   const unsigned int numa_node = dev->device.numa_node;
> @@ -2138,6 +2147,7 @@ init_internals(struct rte_vdev_device *dev, const
> char *if_name,
>   internals->shared_umem = shared_umem;
>   internals->force_copy = force_copy;
>   internals->use_cni = use_cni;
> + strlcpy(internals->dp_path, dp_path, PATH_MAX);
> 
>   if (xdp_get_channels_info(if_name, &internals->max_queue_cnt,
> &internals->combined_queue_cnt)) {
> @@ -2199,7 +2209,7 @@ init_internals(struct rte_vdev_device *dev, const
> char *if_name,
>   if (!internals->use_cni)
>   eth_dev->dev_ops = &ops;
>   else
> - eth_dev->dev_ops = &ops_cni;
> + eth_dev->dev_ops = &ops_afxdp_dp;
> 
>   eth_dev-

RE: [PATCH v2] net/af_xdp: fix resources leak when xsk configure fails

2024-02-22 Thread Loftus, Ciara
> Subject: [PATCH v2] net/af_xdp: fix resources leak when xsk configure fails
> 
> In xdp_umem_configure() allocated some resources for the
> xsk umem, we should delete them when xsk configure fails,
> otherwise it will lead to resources leak.
> 
> Fixes: f1debd77efaf ("net/af_xdp: introduce AF_XDP PMD")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Yunjian Wang 

Thanks!

Reviewed-by: Ciara Loftus 

> ---
> v2: update code style as suggested by Maryam Tahhan
> ---
>  drivers/net/af_xdp/rte_eth_af_xdp.c | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> b/drivers/net/af_xdp/rte_eth_af_xdp.c
> index 2d151e45c7..b52513bd7e 100644
> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> @@ -960,6 +960,11 @@ remove_xdp_program(struct pmd_internals
> *internals)
>  static void
>  xdp_umem_destroy(struct xsk_umem_info *umem)
>  {
> + if (umem->umem) {
> + (void)xsk_umem__delete(umem->umem);
> + umem->umem = NULL;
> + }
> +
>  #if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
>   umem->mb_pool = NULL;
>  #else
> @@ -992,11 +997,8 @@ eth_dev_close(struct rte_eth_dev *dev)
>   break;
>   xsk_socket__delete(rxq->xsk);
> 
> - if (__atomic_fetch_sub(&rxq->umem->refcnt, 1,
> __ATOMIC_ACQUIRE) - 1
> - == 0) {
> - (void)xsk_umem__delete(rxq->umem->umem);
> + if (__atomic_fetch_sub(&rxq->umem->refcnt, 1,
> __ATOMIC_ACQUIRE) - 1 == 0)
>   xdp_umem_destroy(rxq->umem);
> - }
> 
>   /* free pkt_tx_queue */
>   rte_free(rxq->pair);
> --
> 2.41.0



RE: [v9 2/3] net/af_xdp: fix multi interface support for K8s

2024-02-22 Thread Loftus, Ciara
> Subject: [v9 2/3] net/af_xdp: fix multi interface support for K8s
> 
> The original 'use_cni' implementation, was added
> to enable support for the AF_XDP PMD in a K8s env
> without any escalated privileges.
> However 'use_cni' used a hardcoded socket rather
> than a configurable one. If a DPDK pod is requesting
> multiple net devices and these devices are from
> different pools, then the AF_XDP PMD attempts to
> mount all the netdev UDSes in the pod as /tmp/afxdp.sock.
> Which means that at best only 1 netdev will handshake
> correctly with the AF_XDP DP. This patch addresses
> this by making the socket parameter configurable using
> a new vdev param called 'dp_path' alongside the
> original 'use_cni' param. If the 'dp_path' parameter
> is not set alongside the 'use_cni' parameter, then
> it's configured inside the AF_XDP PMD (transparently
> to the user). This change has been tested
> with the AF_XDP DP PR 81[1], with both single and

[1] does not point to any reference.

> multiple interfaces.
> 
> Fixes: 7fc6ae50369d ("net/af_xdp: support CNI Integration")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Maryam Tahhan 
> ---
>  doc/guides/howto/af_xdp_dp.rst | 43 ++--
>  doc/guides/nics/af_xdp.rst | 14 
>  doc/guides/rel_notes/release_24_03.rst |  7 ++
>  drivers/net/af_xdp/rte_eth_af_xdp.c| 94 --
>  4 files changed, 116 insertions(+), 42 deletions(-)
> 
> diff --git a/doc/guides/howto/af_xdp_dp.rst
> b/doc/guides/howto/af_xdp_dp.rst
> index 657fc8d52c..8a64ec5599 100644
> --- a/doc/guides/howto/af_xdp_dp.rst
> +++ b/doc/guides/howto/af_xdp_dp.rst
> @@ -52,13 +52,18 @@ should be used when creating the socket
>  to instruct libbpf not to load the default libbpf program on the netdev.
>  Instead the loading is handled by the AF_XDP Device Plugin.
> 
> +The EAL vdev argument ``dp_path`` is used alongside the ``use_cni``
> argument
> +to explicitly tell the AF_XDP PMD where to find the UDS to interact with the
> +AF_XDP Device Plugin. If this argument is not passed alongside the 
> ``use_cni``
> +argument then the AF_XDP PMD configures it internally.
> +
>  Limitations
>  ---
> 
>  For DPDK versions <= v23.11 the Unix Domain Socket file path appears in
>  the pod at "/tmp/afxdp.sock". The handshake implementation in the AF_XDP
> PMD
> -is only compatible with the AF_XDP Device Plugin up to commit id `38317c2`_
> -and the pod is limited to a single netdev.
> +is only compatible with the `AF_XDP Device Plugin for Kubernetes`_  up to
> +commit id `38317c2`_ and the pod is limited to a single netdev.
> 
>  .. note::
> 
> @@ -75,6 +80,14 @@ in the PMD alongside the `use_cni` parameter.
> 
>  .. _38317c2: https://github.com/intel/afxdp-plugins-for-
> kubernetes/commit/38317c256b5c7dfb39e013a0f76010c2ded03669
> 
> +.. note::
> +
> +The introduction of the ``dp_path`` EAL vdev argument fixes the 
> limitation
> above. If a
> +user doesn't explicitly set the ``dp_path``parameter when using 
> ``use_cni``
> then that
> +path is transparently configured in the AF_XDP PMD to the default
> +`AF_XDP Device Plugin for Kubernetes`_ mount point path. This is
> compatible with the latest
> +AF_XDP Device Plugin. For backwards compatibility with versions of the
> AF_XDP DP <= commit
> +id `38317c2`_ please explicitly set ``dp_path`` to ``/tmp/afxdp.sock``.

I think instead of adding a note here we can simply remove the limitation.
When the user has this patch, they most likely will not care about limitations
that were previously present.
Just make sure that the information about the behaviour when dp_path is not set,
and how to set dp_path to be backwards compatible, is still captured in the docs
somewhere.
I just think it's confusing to state a limitation followed by a note that it is
resolved.

The remainder of the changes in this series LGTM.

Thanks,
Ciara

> 
>  Prerequisites
>  -
> @@ -105,10 +118,10 @@ Device Plugin and DPDK container prerequisites:
> 
>.. code-block:: console
> 
> - cat << EOF | sudo tee
> /etc/systemd/system/containerd.service.d/limits.conf
> - [Service]
> - LimitMEMLOCK=infinity
> - EOF
> +cat << EOF | sudo tee
> /etc/systemd/system/containerd.service.d/limits.conf
> +[Service]
> +LimitMEMLOCK=infinity
> +EOF
> 
>  * dpdk-testpmd application should have AF_XDP feature enabled.
> 
> @@ -284,7 +297,7 @@ Run dpdk-testpmd with the AF_XDP Device Plugin +
> CNI
>  emptyDir:
>medium: HugePages
> 
> -  For further reference please use the `pod.yaml`_
> +  For further reference please see the `pod.yaml`_
> 
>.. _pod.yaml: https://github.com/intel/afxdp-plugins-for-
> kubernetes/blob/main/examples/pod-spec.yaml
> 
> @@ -297,3 +310,19 @@ Run dpdk-testpmd with the AF_XDP Device Plugin +
> CNI
> --vdev=net_af_xdp0,use_cni=1,iface= \
> --no-mlockall --in-memory \
> -- -i --a --nb-cores=2 --rxq=1 --txq=1 --forward

RE: [PATCH] net/af_xdp: fix resources leak when xsk configure fails

2024-02-22 Thread Loftus, Ciara
> 
> On 22/02/2024 03:07, Yunjian Wang wrote:
> In xdp_umem_configure() allocated some resources for the
> xsk umem, we should delete them when xsk configure fails,
> otherwise it will lead to resources leak.
> 
> Fixes: f1debd77efaf ("net/af_xdp: introduce AF_XDP PMD")
> Cc: mailto:sta...@dpdk.org
> 
> Signed-off-by: Yunjian Wang mailto:wangyunj...@huawei.com
> ---
>  drivers/net/af_xdp/rte_eth_af_xdp.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> b/drivers/net/af_xdp/rte_eth_af_xdp.c
> index 2d151e45c7..8b8b2cff9f 100644
> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> @@ -1723,8 +1723,10 @@ xsk_configure(struct pmd_internals *internals,
> struct pkt_rx_queue *rxq,
>  out_xsk:
>   xsk_socket__delete(rxq->xsk);
>  out_umem:
> - if (__atomic_fetch_sub(&rxq->umem->refcnt, 1,
> __ATOMIC_ACQUIRE) - 1 == 0)
> + if (__atomic_fetch_sub(&rxq->umem->refcnt, 1,
> __ATOMIC_ACQUIRE) - 1 == 0) {
> + (void)xsk_umem__delete(rxq->umem->umem);
>   xdp_umem_destroy(rxq->umem);
> + }
> 
>   return ret;
>  }
> 
> Does it make sense to: move `xsk_umem__delete()` inside
> `xdp_umem_destroy()` to be invoked after a NULL check for `umem->umem`
> and then fixup the places where both functions are called to only invoke
> `xdp_umem_destroy()`? (Keeping all the umem cleanup code
> in one place)
> @Yunjian WDYT?
> 
> @Ciara WDYT?

Thanks for the patch Yunjian.

@Maryam +1 for the suggestion; I think it would be a good optimisation for the
cleanup code.

Thanks,
Ciara

> 



RE: [v7 1/1] net/af_xdp: fix multi interface support for K8s

2024-02-09 Thread Loftus, Ciara
> 
> On 1/11/2024 2:21 PM, Ferruh Yigit wrote:
> > On 1/11/2024 12:21 PM, Maryam Tahhan wrote:
> >> On 11/01/2024 11:35, Ferruh Yigit wrote:
> >>> Devarg is user interface, changing it impacts the user.
> >>>
> >>> Assume that user of '22.11.3' using 'use_cni' dev_arg, it will be broken
> >>> when user upgrades DPDK to '22.11.4', which is not expected.
> >>>
> >>> dev_arg is not API/ABI but as it impacts the user, it is in the gray
> >>> area to backport to the LTS release.
> >> Fair enough
> >>> Current patch doesn't have Fixes tag or stable tag, so it doesn't
> >>> request to be backported to LTS release. I took this as an improvement,
> >>> more than a fix.
> >>
> >> This was overlooked by me apologies. It's been a while since I've
> >> contributed to DPDK and I must've missed this detail in the contribution
> >> guide.
> >>> As far as I understand existing code (that use 'use_cni' dev_arg)
> >>> supports only single netdev, this patch adds support for multiple netdevs.
> >>
> >> The use_cni implementation will no longer work with the AF_XDP DP as the
> >> use_cni was originally implemented as it has hard coded what's now an
> >> incorrect path for the UDS.
> >>
> >>> So what do you think keep LTS with 'use_cni' dev_arg, is there a
> >>> requirement to update LTS release?
> >>> If so, can it be an option to keep 'use_cni' for backward compatibility
> >>> but add only add 'uds_path' and remove 'use_cni' in next LTS?
> >>
> >>
> >> Yeah we can go back to the version of the patch that had the 'use_cni'
> >> flag that was used in combination with the path argument. We can add
> >> better documentation re the "use_cni" misnomer... What we can then do is
> >> if no path argument is set by the user assume their intent and and
> >> generate the path internally in the AF_XDP PMD (which was suggested by
> >> Shibin at some stage). That way there should be no surprises to the End
> >> User.
> >>
> >
> > Ack, this keeps backward compatibility,
> >
> > BUT if 'use_cni' is already broken in v23.11 (that is what I understand
> > from your above comment), means there is no user of it in LTS, and we
> > can be more pragmatic and replace the dev_args, by backporting this
> > patch, assuming LTS maintainer is also OK with it.
> >
> 
> Hi Maryam,
> 
> How do you want to continue with the patch, I think options we considered:
> 
> 1. Fix 'use_cni' documentation (which we can backport to LTS) and
> overload the argument for new purpose. This will enable new feature by
> keeping backward compatibility. And requires new version of this patch.
> 
> 2. If the 'use_cni' is completely broken in the 23.11 LTS, which means
> there is no user or backward compatibility to worry about, we can merge
> this patch and backport it to LTS.
> 
> 3. Don't backport this fix to LTS, merge only to current release, which
> means your new feature won't be available to some users as long as a few
> years.
> 
> 
> (1.) is most user friendly, but if 'use_cni' already broken in LTS we
> can go with option (2.). What do you think?
> 
> 
> 
> btw, @Ciara, @Maryam, if (2.) is true, how we end up having a feature
> ('use_cni' dev_args) completely broken in an LTS release?

My understanding is that the use_cni implementation that is available in the 
23.11 LTS is compatible with a particular version of the 
afxdp-plugins-for-kubernetes source. Maryam's change makes it compatible with 
the latest version. @Maryam can you confirm this?
If my understanding is correct then I think we should document the
version/tag/commit-id of afxdp-plugins-for-kubernetes that the code is
compatible with. This would include backporting a patch to LTS to specify which
version that code is compatible with.

> 
> 
> 
> >
> >> Long term I would like to keep a (renamed) path argument (in case the
> >> path does ever change from the AF_XDP DP POV) and use it also in
> >> combination with another (maybe boolean) param for passing pinned bpf
> >> maps rather than another separate path.
> >>
> >> WDYT? Would this work for the LTS release?
> >>
> >>
> >



RE: [v5] net/af_xdp: enable uds_path instead of use_cni

2023-12-15 Thread Loftus, Ciara
> 
> With the original 'use_cni' implementation, (using a
> hardcoded socket rather than a configurable one),
> if a DPDK pod is requesting multiple net devices
> and these devices are from different pools, then
> the container attempts to mount all the netdev UDSes
> in the pod as /tmp/afxdp.sock. Which means that at best
> only 1 netdev will handshake correctly with the AF_XDP
> DP. This patch addresses this by making the socket
> parameter configurable using a new vdev param called
> 'uds_path' and removing the previous 'use_cni' param.
> This patch also fixes incorrect references to the
> AF_XDP DP as CNI and updates the documentation with a
> working example. This change has been tested with the
> AF_XDP DP PR 81, with both single and multiple interfaces.
> 
> v5:
> * Fix alignment for ETH_AF_XDP_USE_DP_UDS_PATH_ARG
> * Remove use_cni references in af_xdp.rst
> 
> v4:
> * Rename af_xdp_cni.rst to af_xdp_dp.rst
> * Removed all incorrect references to CNI throughout af_xdp
>   PMD file.
> * Fixed Typos in af_xdp_dp.rst
> 
> v3:
> * Remove `use_cni` vdev argument as it's no longer needed.
> * Update incorrect CNI references for the AF_XDP DP in the
>   documentation.
> * Update the documentation to run a simple example with the
>   AF_XDP DP plugin in K8s.
> 
> v2:
> * Rename sock_path to uds_path.
> * Update documentation to reflect when CAP_BPF is needed.
> * Fix testpmd arguments in the provided example for Pods.
> * Use AF_XDP API to update the xskmap entry.
> 
> Signed-off-by: Maryam Tahhan 

Thanks for the changes. LGTM.

Reviewed-by: Ciara Loftus 

Would like to wait for Shibin to review this latest patch before it's 
considered for merging as I know he had some feedback on earlier versions.

Thanks,
Ciara

> ---
>  doc/guides/howto/af_xdp_cni.rst | 253 -
>  doc/guides/howto/af_xdp_dp.rst  | 278
> 
>  doc/guides/howto/index.rst  |   2 +-
>  doc/guides/nics/af_xdp.rst  |  27 ++-
>  drivers/net/af_xdp/rte_eth_af_xdp.c | 100 +-
>  5 files changed, 345 insertions(+), 315 deletions(-)
>  delete mode 100644 doc/guides/howto/af_xdp_cni.rst
>  create mode 100644 doc/guides/howto/af_xdp_dp.rst
> 
> diff --git a/doc/guides/howto/af_xdp_cni.rst
> b/doc/guides/howto/af_xdp_cni.rst
> deleted file mode 100644
> index a1a6d5b99c..00
> --- a/doc/guides/howto/af_xdp_cni.rst
> +++ /dev/null
> @@ -1,253 +0,0 @@
> -.. SPDX-License-Identifier: BSD-3-Clause
> -   Copyright(c) 2023 Intel Corporation.
> -
> -Using a CNI with the AF_XDP driver
> -==
> -
> -Introduction
> -
> -
> -CNI, the Container Network Interface, is a technology for configuring
> -container network interfaces
> -and which can be used to setup Kubernetes networking.
> -AF_XDP is a Linux socket Address Family that enables an XDP program
> -to redirect packets to a memory buffer in userspace.
> -
> -This document explains how to enable the `AF_XDP Plugin for Kubernetes`_
> within
> -a DPDK application using the :doc:`../nics/af_xdp` to connect and use these
> technologies.
> -
> -.. _AF_XDP Plugin for Kubernetes: https://github.com/intel/afxdp-plugins-
> for-kubernetes
> -
> -
> -Background
> ---
> -
> -The standard :doc:`../nics/af_xdp` initialization process involves loading an
> eBPF program
> -onto the kernel netdev to be used by the PMD.
> -This operation requires root or escalated Linux privileges
> -and thus prevents the PMD from working in an unprivileged container.
> -The AF_XDP CNI plugin handles this situation
> -by providing a device plugin that performs the program loading.
> -
> -At a technical level the CNI opens a Unix Domain Socket and listens for a 
> client
> -to make requests over that socket.
> -A DPDK application acting as a client connects and initiates a configuration
> "handshake".
> -The client then receives a file descriptor which points to the XSKMAP
> -associated with the loaded eBPF program.
> -The XSKMAP is a BPF map of AF_XDP sockets (XSK).
> -The client can then proceed with creating an AF_XDP socket
> -and inserting that socket into the XSKMAP pointed to by the descriptor.
> -
> -The EAL vdev argument ``use_cni`` is used to indicate that the user wishes
> -to run the PMD in unprivileged mode and to receive the XSKMAP file
> descriptor
> -from the CNI.
> -When this flag is set,
> -the ``XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD`` libbpf flag
> -should be used when creating the socket
> -to instruct libbpf not to load the default libbpf program on the netdev.
> -Instead the loading is handled by the CNI.
> -
> -.. note::
> -
> -   The Unix Domain Socket file path appear in the end user is
> "/tmp/afxdp.sock".
> -
> -
> -Prerequisites
> --
> -
> -Docker and container prerequisites:
> -
> -* Set up the device plugin
> -  as described in the instructions for `AF_XDP Plugin for Kubernetes`_.
> -
> -* The Docker image should contain the libbpf and libxdp libraries,
> -  which are 

RE: [v4] net/af_xdp: enable uds_path instead of use_cni

2023-12-15 Thread Loftus, Ciara
Thanks for the latest patch Maryam. I have one minor suggestion inline.
Also, there are still some references to "use_cni" in af_xdp.rst which should 
be removed/replaced with uds_path.
Once that's done I think the patch should be good to go. Perhaps also consider 
adding a note to the release notes mentioning the new functionality.

Thanks,
Ciara

> 
> With the original 'use_cni' implementation, (using a
> hardcoded socket rather than a configurable one),
> if a DPDK pod is requesting multiple net devices
> and these devices are from different pools, then
> the container attempts to mount all the netdev UDSes
> in the pod as /tmp/afxdp.sock. Which means that at best
> only 1 netdev will handshake correctly with the AF_XDP
> DP. This patch addresses this by making the socket
> parameter configurable using a new vdev param called
> 'uds_path' and removing the previous 'use_cni' param.
> This patch also fixes incorrect references to the
> AF_XDP DP as CNI and updates the documentation with a
> working example. This change has been tested with the
> AF_XDP DP PR 81, with both single and multiple interfaces.
> 
> v4:
> * Rename af_xdp_cni.rst to af_xdp_dp.rst
> * Removed all incorrect references to CNI throughout af_xdp
>   PMD file.
> * Fixed Typos in af_xdp_dp.rst
> 
> v3:
> * Remove `use_cni` vdev argument as it's no longer needed.
> * Update incorrect CNI references for the AF_XDP DP in the
>   documentation.
> * Update the documentation to run a simple example with the
>   AF_XDP DP plugin in K8s.
> 
> v2:
> * Rename sock_path to uds_path.
> * Update documentation to reflect when CAP_BPF is needed.
> * Fix testpmd arguments in the provided example for Pods.
> * Use AF_XDP API to update the xskmap entry.
> 
> Signed-off-by: Maryam Tahhan 
> ---
>  doc/guides/howto/af_xdp_cni.rst | 253 -
>  doc/guides/howto/af_xdp_dp.rst  | 278
> 
>  doc/guides/howto/index.rst  |   2 +-
>  drivers/net/af_xdp/rte_eth_af_xdp.c | 100 +-
>  4 files changed, 328 insertions(+), 305 deletions(-)
>  delete mode 100644 doc/guides/howto/af_xdp_cni.rst
>  create mode 100644 doc/guides/howto/af_xdp_dp.rst
> 



> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> b/drivers/net/af_xdp/rte_eth_af_xdp.c
> index 353c8688ec..6caad58e60 100644
> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> @@ -88,7 +88,6 @@ RTE_LOG_REGISTER_DEFAULT(af_xdp_logtype, NOTICE);
>  #define UDS_MAX_CMD_LEN  64
>  #define UDS_MAX_CMD_RESP 128
>  #define UDS_XSK_MAP_FD_MSG   "/xsk_map_fd"
> -#define UDS_SOCK "/tmp/afxdp.sock"
>  #define UDS_CONNECT_MSG  "/connect"
>  #define UDS_HOST_OK_MSG  "/host_ok"
>  #define UDS_HOST_NAK_MSG "/host_nak"
> @@ -170,7 +169,7 @@ struct pmd_internals {
>   char prog_path[PATH_MAX];
>   bool custom_prog_configured;
>   bool force_copy;
> - bool use_cni;
> + char uds_path[PATH_MAX];
>   struct bpf_map *map;
> 
>   struct rte_ether_addr eth_addr;
> @@ -190,7 +189,7 @@ struct pmd_process_private {
>  #define ETH_AF_XDP_PROG_ARG  "xdp_prog"
>  #define ETH_AF_XDP_BUDGET_ARG"busy_budget"
>  #define ETH_AF_XDP_FORCE_COPY_ARG"force_copy"
> -#define ETH_AF_XDP_USE_CNI_ARG   "use_cni"
> +#define ETH_AF_XDP_USE_DP_UDS_PATH_ARG   "uds_path"

Use the same alignment for "uds_path" as the strings above it.

> 
>  static const char * const valid_arguments[] = {
>   ETH_AF_XDP_IFACE_ARG,
> @@ -200,7 +199,7 @@ static const char * const valid_arguments[] = {
>   ETH_AF_XDP_PROG_ARG,
>   ETH_AF_XDP_BUDGET_ARG,
>   ETH_AF_XDP_FORCE_COPY_ARG,
> - ETH_AF_XDP_USE_CNI_ARG,
> + ETH_AF_XDP_USE_DP_UDS_PATH_ARG,
>   NULL
>  };
> 
> @@ -1351,7 +1350,7 @@ configure_preferred_busy_poll(struct
> pkt_rx_queue *rxq)
>  }
> 
>  static int
> -init_uds_sock(struct sockaddr_un *server)
> +init_uds_sock(struct sockaddr_un *server, const char *uds_path)
>  {
>   int sock;
> 
> @@ -1362,7 +1361,7 @@ init_uds_sock(struct sockaddr_un *server)
>   }
> 
>   server->sun_family = AF_UNIX;
> - strlcpy(server->sun_path, UDS_SOCK, sizeof(server->sun_path));
> + strlcpy(server->sun_path, uds_path, sizeof(server->sun_path));
> 
>   if (connect(sock, (struct sockaddr *)server, sizeof(struct
> sockaddr_un)) < 0) {
>   close(sock);
> @@ -1382,7 +1381,7 @@ struct msg_internal {
>  };
> 
>  static int
> -send_msg(int sock, char *request, int *fd)
> +send_msg(int sock, char *request, int *fd, const char *uds_path)
>  {
>   int snd;
>   struct iovec iov;
> @@ -1393,7 +1392,7 @@ send_msg(int sock, char *request, int *fd)
> 
>   memset(&dst, 0, sizeof(dst));
>   dst.sun_family = AF_UNIX;
> - strlcpy(dst.sun_path, UDS_SOCK, sizeof(dst.sun_path));
> + strlcpy(dst.sun_path, uds_path,

RE: [v2] net/af_xdp: enable a sock path alongside use_cni

2023-12-05 Thread Loftus, Ciara



> -Original Message-
> From: Maryam Tahhan 
> Sent: Monday, December 4, 2023 10:31 AM
> To: ferruh.yi...@amd.com; step...@networkplumber.org;
> lihuis...@huawei.com; fengcheng...@huawei.com;
> liuyongl...@huawei.com; Koikkara Reeny, Shibin
> 
> Cc: dev@dpdk.org; Tahhan, Maryam 
> Subject: [v2] net/af_xdp: enable a sock path alongside use_cni
> 
> With the original 'use_cni' implementation, (using a
> hardcoded socket rather than a configurable one),
> if a single pod is requesting multiple net devices
> and these devices are from different pools, then
> the container attempts to mount all the netdev UDSes
> in the pod as /tmp/afxdp.sock. Which means that at best
> only 1 netdev will handshake correctly with the AF_XDP
> DP. This patch addresses this by making the socket
> parameter configurable alongside the 'use_cni' param.
> Tested with the AF_XDP DP CNI PR 81.
> 
> v2:
> * Rename sock_path to uds_path.
> * Update documentation to reflect when CAP_BPF is needed.
> * Fix testpmd arguments in the provided example for Pods.
> * Use AF_XDP API to update the xskmap entry.
> 
> Signed-off-by: Maryam Tahhan 
> ---
>  doc/guides/howto/af_xdp_cni.rst | 24 ++-
>  drivers/net/af_xdp/rte_eth_af_xdp.c | 62 ++---
>  2 files changed, 54 insertions(+), 32 deletions(-)
> 
> diff --git a/doc/guides/howto/af_xdp_cni.rst
> b/doc/guides/howto/af_xdp_cni.rst
> index a1a6d5b99c..7829526b40 100644
> --- a/doc/guides/howto/af_xdp_cni.rst
> +++ b/doc/guides/howto/af_xdp_cni.rst
> @@ -38,9 +38,10 @@ The XSKMAP is a BPF map of AF_XDP sockets (XSK).
>  The client can then proceed with creating an AF_XDP socket
>  and inserting that socket into the XSKMAP pointed to by the descriptor.
> 
> -The EAL vdev argument ``use_cni`` is used to indicate that the user wishes
> -to run the PMD in unprivileged mode and to receive the XSKMAP file
> descriptor
> -from the CNI.
> +The EAL vdev arguments ``use_cni`` and ``uds_path`` are used to indicate that
> +the user wishes to run the PMD in unprivileged mode and to receive the
> XSKMAP
> +file descriptor from the CNI.
> +
>  When this flag is set,
>  the ``XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD`` libbpf flag
>  should be used when creating the socket
> @@ -49,7 +50,7 @@ Instead the loading is handled by the CNI.
> 
>  .. note::
> 
> -   The Unix Domain Socket file path appear in the end user is
> "/tmp/afxdp.sock".
> +   The Unix Domain Socket file path appears to the end user at
> "/tmp/afxdp_dp//afxdp.sock".
> 
> 
>  Prerequisites
> @@ -223,8 +224,7 @@ Howto run dpdk-testpmd with CNI plugin:
>   securityContext:
>capabilities:
>   add:
> -   - CAP_NET_RAW
> -   - CAP_BPF
> +   - NET_RAW
>   resources:
> requests:
>   hugepages-2Mi: 2Gi
> @@ -239,14 +239,20 @@ Howto run dpdk-testpmd with CNI plugin:
> 
>.. _pod.yaml: https://github.com/intel/afxdp-plugins-for-
> kubernetes/blob/v0.0.2/test/e2e/pod-1c1d.yaml
> 
> +.. note::
> +
> +   For Kernel versions older than 5.19 `CAP_BPF` is also required in
> +   the container capabilities stanza.
> +
>  * Run DPDK with a command like the following:
> 
>.. code-block:: console
> 
>   kubectl exec -i  --container  -- \
> -   //dpdk-testpmd -l 0,1 --no-pci \
> -   --vdev=net_af_xdp0,use_cni=1,iface= \
> -   -- --no-mlockall --in-memory
> +   //dpdk-testpmd -l 0-2 --no-pci --main-lcore=2 \
> +   --vdev net_af_xdp0,iface= name>,use_cni=1,uds_path=/tmp/afxdp_dp//afxdp.sock \
> +   --vdev net_af_xdp1,iface=e name>,use_cni=1,uds_path=/tmp/afxdp_dp//afxdp.sock \
> +   -- -i --a --nb-cores=2 --rxq=1 --txq=1 --forward-mode=macswap;
> 
>  For further reference please use the `e2e`_ test case in `AF_XDP Plugin for
> Kubernetes`_
> 
> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> b/drivers/net/af_xdp/rte_eth_af_xdp.c
> index 353c8688ec..505ed6cf1e 100644
> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> @@ -88,7 +88,6 @@ RTE_LOG_REGISTER_DEFAULT(af_xdp_logtype, NOTICE);
>  #define UDS_MAX_CMD_LEN  64
>  #define UDS_MAX_CMD_RESP 128
>  #define UDS_XSK_MAP_FD_MSG   "/xsk_map_fd"
> -#define UDS_SOCK "/tmp/afxdp.sock"
>  #define UDS_CONNECT_MSG  "/connect"
>  #define UDS_HOST_OK_MSG  "/host_ok"
>  #define UDS_HOST_NAK_MSG "/host_nak"
> @@ -171,6 +170,7 @@ struct pmd_internals {
>   bool custom_prog_configured;
>   bool force_copy;
>   bool use_cni;
> + char uds_path[PATH_MAX];
>   struct bpf_map *map;
> 
>   struct rte_ether_addr eth_addr;
> @@ -191,6 +191,7 @@ struct pmd_process_private {
>  #define ETH_AF_XDP_BUDGET_ARG"busy_budget"
>  #define ETH_AF_XDP_FORCE_COPY_ARG"force_copy"
>  #define ETH_AF_XDP_USE_CNI_ARG   "use_cni"
> +

RE: [PATCH v2] net/af_xdp: fix memzone leak in error path

2023-12-05 Thread Loftus, Ciara
> 
> In xdp_umem_configure() allocated memzone for the 'umem', we should
> free it when xsk_umem__create() call fails, otherwise it will lead to
> memory zone leak. To fix it move 'umem->mz = mz;' assignment after
> 'mz == NULL' check.
> 
> Fixes: f1debd77efaf ("net/af_xdp: introduce AF_XDP PMD")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Yunjian Wang 
> ---
> v2: update code suggested by Ferruh Yigit
> ---
>  drivers/net/af_xdp/rte_eth_af_xdp.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> b/drivers/net/af_xdp/rte_eth_af_xdp.c
> index 353c8688ec..9f0f751d4a 100644
> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> @@ -1235,6 +1235,7 @@ xsk_umem_info *xdp_umem_configure(struct
> pmd_internals *internals,
>   goto err;
>   }
> 
> + umem->mz = mz;
>   ret = xsk_umem__create(&umem->umem, mz->addr,
>  ETH_AF_XDP_NUM_BUFFERS *
> ETH_AF_XDP_FRAME_SIZE,
>  &rxq->fq, &rxq->cq,
> @@ -1244,7 +1245,6 @@ xsk_umem_info *xdp_umem_configure(struct
> pmd_internals *internals,
>   AF_XDP_LOG(ERR, "Failed to create umem\n");
>   goto err;
>   }
> - umem->mz = mz;
> 
>   return umem;
> 
> --
> 2.33.0

Thank you for the patch.

Acked-by: Ciara Loftus 
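
For anyone reading the archive, the reason the one-line move fixes the leak is
that the shared error path frees whatever has already been recorded in the umem
info. A simplified sketch of that cleanup (field and helper names follow the
driver code quoted in these threads; the exact code in the tree differs):

/* Sketch only: once umem->mz is assigned before xsk_umem__create(), the
 * common error path can release the memzone as well as the umem info.
 */
#include <rte_malloc.h>
#include <rte_memzone.h>

struct xsk_umem;                        /* opaque handle from libbpf/libxdp */

struct xsk_umem_info {
	struct xsk_umem *umem;          /* set by xsk_umem__create() */
	const struct rte_memzone *mz;   /* backing memory for the umem */
};

static void
xdp_umem_destroy(struct xsk_umem_info *umem)
{
	/* rte_memzone_free() simply returns an error for NULL, so this is
	 * safe whether or not the assignment happened before the failure. */
	rte_memzone_free(umem->mz);
	umem->mz = NULL;
	rte_free(umem);
}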



RE: [PATCH] tap: fix build of tap_bpf_program

2023-07-26 Thread Loftus, Ciara
> On 7/20/2023 8:45 AM, Ferruh Yigit wrote:
> > On 7/19/2023 5:12 PM, Stephen Hemminger wrote:
> >> On Wed, 19 Jul 2023 11:03:36 +0100
> >> Ferruh Yigit  wrote:
> >>
> >>> On 7/19/2023 11:00 AM, Ferruh Yigit wrote:
>  On 7/17/2023 8:15 PM, Stephen Hemminger wrote:
> > The tap_bpf_program.c is not built as part of normal DPDK
> > EAL environment. It is intended to be built standalone
> > and does not use rte_common.h.
> >
> > This reverts the related change from
> > commit ef5baf3486e0 ("replace packed attributes")
> >
> > Note: this patch will cause expected warnings from checkpatch
> > because the code involved is not used directly in DPDK environment.
> >
> > Signed-off-by: Stephen Hemminger 
> >
> 
>  Agree, this seems done by mistake as part of batch update,
> 
>  Acked-by: Ferruh Yigit 
> 
> 
>  But I can't update the bpf file at all, if I am not missing something I
> >>>
> >>> * I can't *compile* the bpf file ...
> >>>
>  am not sure if we should get just this update or have a patch/patchset
>  that fixes the build.
> 
>  @Ophir, how the bpf file is compiled? And did you test it recently?
> 
>  I am using command from the documentation:
>  `clang -O2 -emit-llvm -c tap_bpf_program.c -o - | llc -march=bpf
>  -filetype=obj -o tap_bpf_program.o`
> >>
> >> It looks like this won't work because it was expecting to be able
> >> to find header files from older version of iproute2.  These are not
> >> distributed, and the change to support libbpf in iproute2 makes the
> >> current versions not work.
> >>
> >> As a stopgap, will look back in history and see what version of header
> >> files will at least get a working build.
> >>
> >> From there, need to replace how the conversion of .o to array works.
> >> Would prefer to use dlopen() to read the ELF file rather than expecting
> >> developers to hack together their own tools.
> >>
> >> Not sure how much effort is really needed here. This is only being
> >> used for the case of rte_flow with multiq RSS. Probably, no one ever
> >> used it.
> >>
> >
> > Should we remove the file, instead of fixing '__rte_packed'?
> >
> 
> +Long, and af_xdp maintainers,
> 
> @Long, do you know if this bfp code is still in use somewhere, if so is
> the user interested in fixing/maintaining the code?
> 
> 
> @Ciara, @Qi, do you see any benefit to keep/extend this kind of bfp file
> usage? Do you think is this something to invest more?

If the code is still being used I would agree with Stephen that using dlopen or 
libbpf to load the eBPF code would be preferable. The current steps are 
difficult to follow. 
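
As a rough illustration of that suggestion, loading a compiled BPF object at
runtime with libbpf could look something like the sketch below. The object path
and program name are placeholders rather than the actual tap BPF layout, and
the error handling is minimal:

/* Hedged sketch: load a BPF object with libbpf instead of embedding a
 * hand-converted byte array. "prog.o" and "classifier" are placeholders.
 */
#include <stdio.h>
#include <bpf/libbpf.h>

static int
load_bpf_prog_fd(const char *path, const char *prog_name)
{
	struct bpf_object *obj;
	struct bpf_program *prog;

	/* Older libbpf returns an error pointer rather than NULL;
	 * libbpf_get_error() covers both conventions. */
	obj = bpf_object__open_file(path, NULL);
	if (libbpf_get_error(obj)) {
		fprintf(stderr, "failed to open %s\n", path);
		return -1;
	}

	if (bpf_object__load(obj)) {
		fprintf(stderr, "failed to load %s\n", path);
		bpf_object__close(obj);
		return -1;
	}

	prog = bpf_object__find_program_by_name(obj, prog_name);
	if (prog == NULL) {
		fprintf(stderr, "program %s not found in %s\n", prog_name, path);
		bpf_object__close(obj);
		return -1;
	}

	/* The object is deliberately kept open while the fd is in use;
	 * the fd can then be attached, e.g. as a TC classifier. */
	return bpf_program__fd(prog);
}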


RE: [PATCH] net/af_xdp: make compatible with libbpf v0.8.0

2022-07-21 Thread Loftus, Ciara
> 
> > 
> >  On 6/24/22 13:23, Ciara Loftus wrote:
> > > libbpf v0.8.0 deprecates the bpf_get_link_xdp_id and
> > >> bpf_set_link_xdp_fd
> > > functions. Use meson to detect if libbpf >= v0.7.0 is linked and if 
> > > so,
> > use
> > > the recommended replacement functions bpf_xdp_query_id,
> >  bpf_xdp_attach
> > > and bpf_xdp_detach which are available to use since libbpf v0.7.0.
> > >
> > > Also prevent linking with libbpf versions > v0.8.0.
> > >
> > > Signed-off-by: Ciara Loftus 
> > > ---
> > > doc/guides/nics/af_xdp.rst  |  3 ++-
> > > drivers/net/af_xdp/compat.h | 36
> >  -
> > > drivers/net/af_xdp/meson.build  |  7 ++
> > > drivers/net/af_xdp/rte_eth_af_xdp.c | 19 +++
> > > 4 files changed, 42 insertions(+), 23 deletions(-)
> > 
> >  Don't we need to mention these changes in release notes?
> > 
> > >
> > > diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
> > > index 56681c8365..9edb48df67 100644
> > > --- a/doc/guides/nics/af_xdp.rst
> > > +++ b/doc/guides/nics/af_xdp.rst
> > > @@ -43,7 +43,8 @@ Prerequisites
> > > This is a Linux-specific PMD, thus the following prerequisites
> apply:
> > >
> > > *  A Linux Kernel (version > v4.18) with XDP sockets configuration
> > >> enabled;
> > > -*  Both libxdp >=v1.2.2 and libbpf libraries installed, or, libbpf
> > <=v0.6.0
> > > +*  Both libxdp >=v1.2.2 and libbpf <=v0.8.0 libraries installed, or,
> > libbpf
> > > +   <=v0.6.0.
> > > *  If using libxdp, it requires an environment variable called
> > >LIBXDP_OBJECT_PATH to be set to the location of where libxdp
> > >> placed its
> >  bpf
> > >object files. This is usually in /usr/local/lib/bpf or
> > /usr/local/lib64/bpf.
> > > diff --git a/drivers/net/af_xdp/compat.h
> > >> b/drivers/net/af_xdp/compat.h
> > > index 28ea64aeaa..8f4ac8b5ea 100644
> > > --- a/drivers/net/af_xdp/compat.h
> > > +++ b/drivers/net/af_xdp/compat.h
> > > @@ -60,7 +60,7 @@ tx_syscall_needed(struct xsk_ring_prod *q
> >  __rte_unused)
> > > }
> > > #endif
> > >
> > > -#ifdef RTE_NET_AF_XDP_LIBBPF_OBJ_OPEN
> > > +#ifdef RTE_NET_AF_XDP_LIBBPF_V070
> > 
> >  Typically version-based checks are considered as bad. Isn't it
> >  better use feature-based checks/defines?
> > >>>
> > >>> Hi Andrew,
> > >>>
> > >>> Thank you for the feedback. Is the feature-based checking something
> > that
> > >> we can push to the next release?
> > >>>
> > >>> We are already using the pkg-config version-check method for other
> > >> libraries/features in the meson.build file:
> > >>> * libxdp >= v1.2.2 # earliest compatible libxdp release
> > >>> * libbpf >= v0.7.0 # bpf_object__* functions
> > >>> * libbpf >= v0.2.0 # shared umem feature
> > >>>
> > >>> If we change to your suggested method I think we should change
> them
> > all
> > >> in one patch. IMO it's probably too close to the release to change them
> all
> > >> right now. What do you think?
> > >>>
> > >>> Thanks,
> > >>> Ciara
> > >>
> > >> Hi Ciara,
> > >>
> > >> yes, ideally we should avoid usage of version-based check everywhere,
> > >> but I don't think that it is critical to switch at once. We can use it
> > >> for new checks right now and rewrite old/existing checks a bit later in
> > >> the next release.
> > >>
> > >> Please, note that my notes are related to review notes from Thomas
> who
> > >> asked by file_library() method is removed. Yes, it is confusing and it
> > >> is better to avoid it. Usage of feature-based checks would allow to
> > >> preserve find_library() as well.
> > >
> > > Thank you for the explanation.
> > > In this case we want to check that the libbpf library is <=v0.8.0. At this
> > moment in time v0.8.0 is the latest version of libbpf so we cannot check for
> a
> > symbol that tells us the library is > v0.8.0. Can you think of a way to
> approach
> > this without using the pkg-config version check method?
> > >
> > > I've introduced this check to future-proof the PMD and ensure we only
> > ever link with versions of libbpf that we've validated to be compatible with
> > the PMD. When say v0.9.0 is released we can patch the PMD allowing for
> > libbpf <= v0.9.0 and make any necessary API changes as part of that patch.
> > This should hopefully help avoid the scenario Thomas encountered.
> >
> > Personally I'd consider such checks which limit version as a drawback.
> > I think checks on build should not be used to reject future versions.
> > Otherwise, introduction of any further even minor version would require
> > a patch to allow it. Documentation is the place for information about
> > validated versions. Build should not enforce it.
> 
> Got it. I'll submit a v2 which removes the version-limiting and reinstates the
> cc.find_li

RE: [PATCH] net/af_xdp: make compatible with libbpf v0.8.0

2022-06-28 Thread Loftus, Ciara
> 
>  On 6/24/22 13:23, Ciara Loftus wrote:
> > libbpf v0.8.0 deprecates the bpf_get_link_xdp_id and
> >> bpf_set_link_xdp_fd
> > functions. Use meson to detect if libbpf >= v0.7.0 is linked and if so,
> use
> > the recommended replacement functions bpf_xdp_query_id,
>  bpf_xdp_attach
> > and bpf_xdp_detach which are available to use since libbpf v0.7.0.
> >
> > Also prevent linking with libbpf versions > v0.8.0.
> >
> > Signed-off-by: Ciara Loftus 
> > ---
> > doc/guides/nics/af_xdp.rst  |  3 ++-
> > drivers/net/af_xdp/compat.h | 36
>  -
> > drivers/net/af_xdp/meson.build  |  7 ++
> > drivers/net/af_xdp/rte_eth_af_xdp.c | 19 +++
> > 4 files changed, 42 insertions(+), 23 deletions(-)
> 
>  Don't we need to mention these changes in release notes?
> 
> >
> > diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
> > index 56681c8365..9edb48df67 100644
> > --- a/doc/guides/nics/af_xdp.rst
> > +++ b/doc/guides/nics/af_xdp.rst
> > @@ -43,7 +43,8 @@ Prerequisites
> > This is a Linux-specific PMD, thus the following prerequisites 
> > apply:
> >
> > *  A Linux Kernel (version > v4.18) with XDP sockets configuration
> >> enabled;
> > -*  Both libxdp >=v1.2.2 and libbpf libraries installed, or, libbpf
> <=v0.6.0
> > +*  Both libxdp >=v1.2.2 and libbpf <=v0.8.0 libraries installed, or,
> libbpf
> > +   <=v0.6.0.
> > *  If using libxdp, it requires an environment variable called
> >LIBXDP_OBJECT_PATH to be set to the location of where libxdp
> >> placed its
>  bpf
> >object files. This is usually in /usr/local/lib/bpf or
> /usr/local/lib64/bpf.
> > diff --git a/drivers/net/af_xdp/compat.h
> >> b/drivers/net/af_xdp/compat.h
> > index 28ea64aeaa..8f4ac8b5ea 100644
> > --- a/drivers/net/af_xdp/compat.h
> > +++ b/drivers/net/af_xdp/compat.h
> > @@ -60,7 +60,7 @@ tx_syscall_needed(struct xsk_ring_prod *q
>  __rte_unused)
> > }
> > #endif
> >
> > -#ifdef RTE_NET_AF_XDP_LIBBPF_OBJ_OPEN
> > +#ifdef RTE_NET_AF_XDP_LIBBPF_V070
> 
>  Typically version-based checks are considered as bad. Isn't it
>  better use feature-based checks/defines?
> >>>
> >>> Hi Andrew,
> >>>
> >>> Thank you for the feedback. Is the feature-based checking something
> that
> >> we can push to the next release?
> >>>
> >>> We are already using the pkg-config version-check method for other
> >> libraries/features in the meson.build file:
> >>> * libxdp >= v1.2.2 # earliest compatible libxdp release
> >>> * libbpf >= v0.7.0 # bpf_object__* functions
> >>> * libbpf >= v0.2.0 # shared umem feature
> >>>
> >>> If we change to your suggested method I think we should change them
> all
> >> in one patch. IMO it's probably too close to the release to change them all
> >> right now. What do you think?
> >>>
> >>> Thanks,
> >>> Ciara
> >>
> >> Hi Ciara,
> >>
> >> yes, ideally we should avoid usage of version-based check everywhere,
> >> but I don't think that it is critical to switch at once. We can use it
> >> for new checks right now and rewrite old/existing checks a bit later in
> >> the next release.
> >>
> >> Please, note that my notes are related to review notes from Thomas who
> >> asked by file_library() method is removed. Yes, it is confusing and it
> >> is better to avoid it. Usage of feature-based checks would allow to
> >> preserve find_library() as well.
> >
> > Thank you for the explanation.
> > In this case we want to check that the libbpf library is <=v0.8.0. At this
> moment in time v0.8.0 is the latest version of libbpf so we cannot check for a
> symbol that tells us the library is > v0.8.0. Can you think of a way to 
> approach
> this without using the pkg-config version check method?
> >
> > I've introduced this check to future-proof the PMD and ensure we only
> ever link with versions of libbpf that we've validated to be compatible with
> the PMD. When say v0.9.0 is released we can patch the PMD allowing for
> libbpf <= v0.9.0 and make any necessary API changes as part of that patch.
> This should hopefully help avoid the scenario Thomas encountered.
> 
> Personally I'd consider such checks which limit version as a drawback.
> I think checks on build should not be used to reject future versions.
> Otherwise, introduction of any further even minor version would require
> a patch to allow it. Documentation is the place for information about
> validated versions. Build should not enforce it.

Got it. I'll submit a v2 which removes the version-limiting and reinstates the 
cc.find_library() method. I'll update the documentation to indicate only 
versions up to v0.8.0 are supported and add a note to the release notes.
Although if it's too late in the release cycle we can postpone this patch until 
after, and simply 

RE: [PATCH] net/af_xdp: make compatible with libbpf v0.8.0

2022-06-27 Thread Loftus, Ciara
> 
> On 6/27/22 17:17, Loftus, Ciara wrote:
> >>
> >> On 6/24/22 13:23, Ciara Loftus wrote:
> >>> libbpf v0.8.0 deprecates the bpf_get_link_xdp_id and
> bpf_set_link_xdp_fd
> >>> functions. Use meson to detect if libbpf >= v0.7.0 is linked and if so, 
> >>> use
> >>> the recommended replacement functions bpf_xdp_query_id,
> >> bpf_xdp_attach
> >>> and bpf_xdp_detach which are available to use since libbpf v0.7.0.
> >>>
> >>> Also prevent linking with libbpf versions > v0.8.0.
> >>>
> >>> Signed-off-by: Ciara Loftus 
> >>> ---
> >>>doc/guides/nics/af_xdp.rst  |  3 ++-
> >>>drivers/net/af_xdp/compat.h | 36
> >> -
> >>>drivers/net/af_xdp/meson.build  |  7 ++
> >>>drivers/net/af_xdp/rte_eth_af_xdp.c | 19 +++
> >>>4 files changed, 42 insertions(+), 23 deletions(-)
> >>
> >> Don't we need to mention these changes in release notes?
> >>
> >>>
> >>> diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
> >>> index 56681c8365..9edb48df67 100644
> >>> --- a/doc/guides/nics/af_xdp.rst
> >>> +++ b/doc/guides/nics/af_xdp.rst
> >>> @@ -43,7 +43,8 @@ Prerequisites
> >>>This is a Linux-specific PMD, thus the following prerequisites apply:
> >>>
> >>>*  A Linux Kernel (version > v4.18) with XDP sockets configuration
> enabled;
> >>> -*  Both libxdp >=v1.2.2 and libbpf libraries installed, or, libbpf 
> >>> <=v0.6.0
> >>> +*  Both libxdp >=v1.2.2 and libbpf <=v0.8.0 libraries installed, or, 
> >>> libbpf
> >>> +   <=v0.6.0.
> >>>*  If using libxdp, it requires an environment variable called
> >>>   LIBXDP_OBJECT_PATH to be set to the location of where libxdp
> placed its
> >> bpf
> >>>   object files. This is usually in /usr/local/lib/bpf or 
> >>> /usr/local/lib64/bpf.
> >>> diff --git a/drivers/net/af_xdp/compat.h
> b/drivers/net/af_xdp/compat.h
> >>> index 28ea64aeaa..8f4ac8b5ea 100644
> >>> --- a/drivers/net/af_xdp/compat.h
> >>> +++ b/drivers/net/af_xdp/compat.h
> >>> @@ -60,7 +60,7 @@ tx_syscall_needed(struct xsk_ring_prod *q
> >> __rte_unused)
> >>>}
> >>>#endif
> >>>
> >>> -#ifdef RTE_NET_AF_XDP_LIBBPF_OBJ_OPEN
> >>> +#ifdef RTE_NET_AF_XDP_LIBBPF_V070
> >>
> >> Typically version-based checks are considered as bad. Isn't it
> >> better use feature-based checks/defines?
> >
> > Hi Andrew,
> >
> > Thank you for the feedback. Is the feature-based checking something that
> we can push to the next release?
> >
> > We are already using the pkg-config version-check method for other
> libraries/features in the meson.build file:
> > * libxdp >= v1.2.2 # earliest compatible libxdp release
> > * libbpf >= v0.7.0 # bpf_object__* functions
> > * libbpf >= v0.2.0 # shared umem feature
> >
> > If we change to your suggested method I think we should change them all
> in one patch. IMO it's probably too close to the release to change them all
> right now. What do you think?
> >
> > Thanks,
> > Ciara
> 
> Hi Ciara,
> 
> yes, ideally we should avoid usage of version-based check everywhere,
> but I don't think that it is critical to switch at once. We can use it
> for new checks right now and rewrite old/existing checks a bit later in
> the next release.
> 
> Please, note that my notes are related to review notes from Thomas who
> asked by file_library() method is removed. Yes, it is confusing and it
> is better to avoid it. Usage of feature-based checks would allow to
> preserve find_library() as well.

Thank you for the explanation.
In this case we want to check that the libbpf library is <=v0.8.0. At this 
moment in time v0.8.0 is the latest version of libbpf so we cannot check for a 
symbol that tells us the library is > v0.8.0. Can you think of a way to 
approach this without using the pkg-config version check method?

I've introduced this check to future-proof the PMD and ensure we only ever link 
with versions of libbpf that we've validated to be compatible with the PMD. 
When say v0.9.0 is released we can patch the PMD allowing for libbpf <= v0.9.0 
and make any necessary API changes as part of that patch. This should hopefully 
help avoid the scenario Thomas encountered.

Ciara

> 
> Andrew.
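
For reference, the compat shim discussed in this thread boils down to wrappers
of the following shape. RTE_NET_AF_XDP_LIBBPF_V070 is the build-time define
added by the patch; the rest is an illustrative sketch rather than the exact
driver code.

/* Illustrative sketch: map the deprecated libbpf XDP calls onto their
 * >= v0.7.0 replacements behind the define introduced by the patch.
 */
#include <linux/if_link.h>
#include <bpf/libbpf.h>

static int
remove_xdp_program_compat(int if_index)
{
	__u32 prog_id = 0;
	__u32 flags = XDP_FLAGS_UPDATE_IF_NOEXIST;

#ifdef RTE_NET_AF_XDP_LIBBPF_V070
	if (bpf_xdp_query_id(if_index, flags, &prog_id))
		return -1;
	if (prog_id == 0)
		return 0;                       /* nothing attached */
	return bpf_xdp_detach(if_index, flags, NULL);
#else
	if (bpf_get_link_xdp_id(if_index, &prog_id, flags))
		return -1;
	if (prog_id == 0)
		return 0;                       /* nothing attached */
	return bpf_set_link_xdp_fd(if_index, -1, flags); /* fd -1 detaches */
#endif
}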



RE: [PATCH] net/af_xdp: make compatible with libbpf v0.8.0

2022-06-27 Thread Loftus, Ciara
> 
> On 6/24/22 13:23, Ciara Loftus wrote:
> > libbpf v0.8.0 deprecates the bpf_get_link_xdp_id and bpf_set_link_xdp_fd
> > functions. Use meson to detect if libbpf >= v0.7.0 is linked and if so, use
> > the recommended replacement functions bpf_xdp_query_id,
> bpf_xdp_attach
> > and bpf_xdp_detach which are available to use since libbpf v0.7.0.
> >
> > Also prevent linking with libbpf versions > v0.8.0.
> >
> > Signed-off-by: Ciara Loftus 
> > ---
> >   doc/guides/nics/af_xdp.rst  |  3 ++-
> >   drivers/net/af_xdp/compat.h | 36
> -
> >   drivers/net/af_xdp/meson.build  |  7 ++
> >   drivers/net/af_xdp/rte_eth_af_xdp.c | 19 +++
> >   4 files changed, 42 insertions(+), 23 deletions(-)
> 
> Don't we need to mention these changes in release notes?
> 
> >
> > diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
> > index 56681c8365..9edb48df67 100644
> > --- a/doc/guides/nics/af_xdp.rst
> > +++ b/doc/guides/nics/af_xdp.rst
> > @@ -43,7 +43,8 @@ Prerequisites
> >   This is a Linux-specific PMD, thus the following prerequisites apply:
> >
> >   *  A Linux Kernel (version > v4.18) with XDP sockets configuration 
> > enabled;
> > -*  Both libxdp >=v1.2.2 and libbpf libraries installed, or, libbpf <=v0.6.0
> > +*  Both libxdp >=v1.2.2 and libbpf <=v0.8.0 libraries installed, or, libbpf
> > +   <=v0.6.0.
> >   *  If using libxdp, it requires an environment variable called
> >  LIBXDP_OBJECT_PATH to be set to the location of where libxdp placed its
> bpf
> >  object files. This is usually in /usr/local/lib/bpf or 
> > /usr/local/lib64/bpf.
> > diff --git a/drivers/net/af_xdp/compat.h b/drivers/net/af_xdp/compat.h
> > index 28ea64aeaa..8f4ac8b5ea 100644
> > --- a/drivers/net/af_xdp/compat.h
> > +++ b/drivers/net/af_xdp/compat.h
> > @@ -60,7 +60,7 @@ tx_syscall_needed(struct xsk_ring_prod *q
> __rte_unused)
> >   }
> >   #endif
> >
> > -#ifdef RTE_NET_AF_XDP_LIBBPF_OBJ_OPEN
> > +#ifdef RTE_NET_AF_XDP_LIBBPF_V070
> 
> Typically version-based checks are considered as bad. Isn't it
> better use feature-based checks/defines?

Hi Andrew,

Thank you for the feedback. Is the feature-based checking something that we can 
push to the next release?

We are already using the pkg-config version-check method for other 
libraries/features in the meson.build file:
* libxdp >= v1.2.2 # earliest compatible libxdp release
* libbpf >= v0.7.0 # bpf_object__* functions
* libbpf >= v0.2.0 # shared umem feature

If we change to your suggested method I think we should change them all in one 
patch. IMO it's probably too close to the release to change them all right now. 
What do you think?

Thanks,
Ciara


RE: [PATCH] net/af_xdp: limit libbpf version to <= v0.7.0

2022-06-24 Thread Loftus, Ciara
> 
> 24/06/2022 08:06, Ciara Loftus:
> > Linking with libbpf v0.8.0 causes deprication warnings. As a temporary
> > measure, prevent linking with libbpf versions v0.8.0 and greater. This
> > limitation should be removed in the future when appropriate
> > compatibility measures are introduced.
> >
> > Signed-off-by: Ciara Loftus 
> > ---
> > -bpf_dep = dependency('libbpf', required: false, method: 'pkg-config')
> > -if not bpf_dep.found()
> > -bpf_dep = cc.find_library('bpf', required: false)
> > -endif
> > +bpf_dep = dependency('libbpf', version : '<=0.7.0', required: false,
> method: 'pkg-config')
> 
> It is also removing the find_library() method.
> Any reason it was there?
> 

My understanding is that one can't check the library version using that method.
So it was a valid way of picking up the library until now, when we always need
to check the version before linking.


RE: af_xdp + libbpf 0.8

2022-06-23 Thread Loftus, Ciara
> 
> 24/06/2022 00:18, Ferruh Yigit:
> > On 6/23/2022 10:58 PM, Thomas Monjalon wrote:
> > > Hi,
> > >
> > > It seems DPDK is not compatible with libbpf 0.8:
> > >
> > > drivers/net/af_xdp/rte_eth_af_xdp.c:871:6: error: 'bpf_get_link_xdp_id'
> is deprecated: libbpf v0.8+: use bpf_xdp_query_id() instead
> > > /usr/include/bpf/libbpf.h:1168:1: note: 'bpf_get_link_xdp_id' has been
> explicitly marked deprecated here
> > > LIBBPF_DEPRECATED_SINCE(0, 8, "use bpf_xdp_query_id() instead")
> > >
> > > drivers/net/af_xdp/rte_eth_af_xdp.c:876:2: error: 'bpf_set_link_xdp_fd'
> is deprecated: libbpf v0.8+: use bpf_xdp_attach() instead
> > > /usr/include/bpf/libbpf.h:1163:1: note: 'bpf_set_link_xdp_fd' has been
> explicitly marked deprecated here
> > > LIBBPF_DEPRECATED_SINCE(0, 8, "use bpf_xdp_attach() instead")
> > >
> > > dpdk/drivers/net/af_xdp/rte_eth_af_xdp.c:1198:8: error:
> 'bpf_set_link_xdp_fd' is deprecated: libbpf v0.8+: use bpf_xdp_attach()
> instead
> > > /usr/include/bpf/libbpf.h:1163:1: note: 'bpf_set_link_xdp_fd' has been
> explicitly marked deprecated here
> > > LIBBPF_DEPRECATED_SINCE(0, 8, "use bpf_xdp_attach() instead")
> >
> >  From af_xdp documentation (doc.dpdk.org/guides/nics/af_xdp.html):
> >
> > 5.2. Prerequisites
> > This is a Linux-specific PMD, thus the following prerequisites apply:
> >
> > Both libxdp >=v1.2.2 and libbpf libraries installed, or, libbpf <=v0.6.0
> 
> I am in the first case: libxdp-1.2.3 and libbpf-0.8
> According to the documentation, it should work.
> 

I submitted a fix to prevent linking with >=v0.8.0.
It can be used until the more robust solution, which requires a bit more
validation, is ready.

Thanks,
Ciara


RE: [PATCH] net/af_xdp: fix custom program loading with multiple queues

2022-03-11 Thread Loftus, Ciara
> 
> 10/03/2022 09:49, Loftus, Ciara:
> > > When the PMD is configured to load a custom XDP program, it sets
> > > XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD flag to prevent libbpf from
> > > loading its default XDP program. However, when queue_count is set to
> > > greater than 1, this flag is only set for the first XSK socket but not
> > > for subsequent XSK sockets. This causes XSK socket creation failure.
> > >
> > > This commit ensures that XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD flag
> is
> > > set for all XSK socket creations when custom XDP program is being used.
> > >
> > > Fixes: 01fa83c94d7e ("net/af_xdp: workaround custom program loading")
> > >
> > > Signed-off-by: Junxiao Shi 
> >
> > Thanks for the patch!
> > It's probably too late to make it into 22.03 but cc-ing stable as it should 
> > be
> backported to 21.11.x.
> >
> > Acked-by: Ciara Loftus 
> 
> I can take it in -rc4. How much are you confident there is no regression?
> 

Thanks. I am confident there is no regression.


RE: [PATCH] net/af_xdp: fix custom program loading with multiple queues

2022-03-10 Thread Loftus, Ciara
> When the PMD is configured to load a custom XDP program, it sets
> XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD flag to prevent libbpf from
> loading its default XDP program. However, when queue_count is set to
> greater than 1, this flag is only set for the first XSK socket but not
> for subsequent XSK sockets. This causes XSK socket creation failure.
> 
> This commit ensures that XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD flag is
> set for all XSK socket creations when custom XDP program is being used.
> 
> Fixes: 01fa83c94d7e ("net/af_xdp: workaround custom program loading")
> 
> Signed-off-by: Junxiao Shi 

Thanks for the patch!
It's probably too late to make it into 22.03 but cc-ing stable as it should be 
backported to 21.11.x.

Acked-by: Ciara Loftus 

> ---
>  drivers/net/af_xdp/rte_eth_af_xdp.c | 23 ---
>  1 file changed, 12 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> b/drivers/net/af_xdp/rte_eth_af_xdp.c
> index 65479138d3..9920f49870 100644
> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> @@ -1307,18 +1307,19 @@ xsk_configure(struct pmd_internals *internals,
> struct pkt_rx_queue *rxq,
>   cfg.bind_flags |= XDP_USE_NEED_WAKEUP;
>  #endif
> 
> - if (strnlen(internals->prog_path, PATH_MAX) &&
> - !internals->custom_prog_configured) {
> - ret = load_custom_xdp_prog(internals->prog_path,
> -internals->if_index,
> -&internals->map);
> - if (ret) {
> - AF_XDP_LOG(ERR, "Failed to load custom XDP
> program %s\n",
> - internals->prog_path);
> - goto out_umem;
> + if (strnlen(internals->prog_path, PATH_MAX)) {
> + if (!internals->custom_prog_configured) {
> + ret = load_custom_xdp_prog(internals->prog_path,
> + internals->if_index,
> + &internals->map);
> + if (ret) {
> + AF_XDP_LOG(ERR, "Failed to load custom
> XDP program %s\n",
> + internals->prog_path);
> + goto out_umem;
> + }
> + internals->custom_prog_configured = 1;
>   }
> - internals->custom_prog_configured = 1;
> - cfg.libbpf_flags =
> XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD;
> + cfg.libbpf_flags |=
> XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD;
>   }
> 
>   if (internals->shared_umem)
> --
> 2.17.1



RE: [PATCH] net/af_xdp: make the PMD compatible with libbpf >= v0.7.0

2022-02-17 Thread Loftus, Ciara
> Subject: Re: [PATCH] net/af_xdp: make the PMD compatible with libbpf >=
> v0.7.0
> 
> On Thu, Feb 17, 2022 at 12:14:30PM +, Ciara Loftus wrote:
> > libbpf v0.7.0 deprecates the bpf_prog_load function. Use meson to detect
> > if libbpf >= v0.7.0 is linked and if so, use the recommended replacement
> > functions bpf_object__open_file and bpf_oject__load.
> >
> > Signed-off-by: Ciara Loftus 
> > ---
> >  drivers/net/af_xdp/compat.h | 39
> +
> >  drivers/net/af_xdp/meson.build  |  5 
> >  drivers/net/af_xdp/rte_eth_af_xdp.c |  9 +++
> >  3 files changed, 48 insertions(+), 5 deletions(-)
> >
> 
> > diff --git a/drivers/net/af_xdp/meson.build
> b/drivers/net/af_xdp/meson.build
> > index 93e895eab9..9fe4063b99 100644
> > --- a/drivers/net/af_xdp/meson.build
> > +++ b/drivers/net/af_xdp/meson.build
> > @@ -22,6 +22,11 @@ if cc.has_header('linux/if_xdp.h')
> >  cflags += ['-DRTE_NET_AF_XDP_SHARED_UMEM']
> >  ext_deps += xdp_dep
> >  ext_deps += bpf_dep
> > +bpf_ver_dep = dependency('libbpf', version : '>=0.6.0',
> 
> typo? Commit log refers to v0.7.

Indeed. Thanks for the catch! v2 on the way.

Ciara



RE: [PATCH] net/af_xdp: add missing trailing newline in logs

2022-02-17 Thread Loftus, Ciara
> Subject: [PATCH] net/af_xdp: add missing trailing newline in logs
> 
> Caught while trying --in-memory mode, some log messages in this driver
> are not terminated with a newline:
> rte_pmd_af_xdp_probe(): net_af_xdp: Failed to register multi-process IPC
> callback: Operation not supportedvdev_probe(): failed to initialize
> net_af_xdp device
> 
> Other locations in this driver had the same issue, fix all at once.
> 
> Fixes: f1debd77efaf ("net/af_xdp: introduce AF_XDP PMD")
> Fixes: d8a210774e1d ("net/af_xdp: support unaligned umem chunks")
> Fixes: 9876cf8316b3 ("net/af_xdp: re-enable secondary process support")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: David Marchand 

Thanks David, LGTM.

Acked-by: Ciara Loftus 

> ---
>  drivers/net/af_xdp/rte_eth_af_xdp.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> b/drivers/net/af_xdp/rte_eth_af_xdp.c
> index 6ac710c6bd..8bdc2920cf 100644
> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> @@ -1071,7 +1071,7 @@ xsk_umem_info *xdp_umem_configure(struct
> pmd_internals *internals,
>   umem = rte_zmalloc_socket("umem", sizeof(*umem), 0,
> rte_socket_id());
>   if (umem == NULL) {
> - AF_XDP_LOG(ERR, "Failed to allocate umem info");
> + AF_XDP_LOG(ERR, "Failed to allocate umem info\n");
>   return NULL;
>   }
> 
> @@ -1084,7 +1084,7 @@ xsk_umem_info *xdp_umem_configure(struct
> pmd_internals *internals,
>   ret = xsk_umem__create(&umem->umem, base_addr,
> umem_size,
>   &rxq->fq, &rxq->cq, &usr_config);
>   if (ret) {
> - AF_XDP_LOG(ERR, "Failed to create umem");
> + AF_XDP_LOG(ERR, "Failed to create umem\n");
>   goto err;
>   }
>   umem->buffer = base_addr;
> @@ -1124,7 +1124,7 @@ xsk_umem_info *xdp_umem_configure(struct
> pmd_internals *internals,
> 
>   umem = rte_zmalloc_socket("umem", sizeof(*umem), 0,
> rte_socket_id());
>   if (umem == NULL) {
> - AF_XDP_LOG(ERR, "Failed to allocate umem info");
> + AF_XDP_LOG(ERR, "Failed to allocate umem info\n");
>   return NULL;
>   }
> 
> @@ -1160,7 +1160,7 @@ xsk_umem_info *xdp_umem_configure(struct
> pmd_internals *internals,
>  &usr_config);
> 
>   if (ret) {
> - AF_XDP_LOG(ERR, "Failed to create umem");
> + AF_XDP_LOG(ERR, "Failed to create umem\n");
>   goto err;
>   }
>   umem->mz = mz;
> @@ -1847,7 +1847,7 @@ afxdp_mp_request_fds(const char *name, struct
> rte_eth_dev *dev)
>   AF_XDP_LOG(DEBUG, "Sending multi-process IPC request for %s\n",
> name);
>   ret = rte_mp_request_sync(&request, &replies, &timeout);
>   if (ret < 0 || replies.nb_received != 1) {
> - AF_XDP_LOG(ERR, "Failed to request fds from primary: %d",
> + AF_XDP_LOG(ERR, "Failed to request fds from primary:
> %d\n",
>  rte_errno);
>   return -1;
>   }
> @@ -1996,7 +1996,7 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device
> *dev)
>   if (!afxdp_dev_count) {
>   ret = rte_mp_action_register(ETH_AF_XDP_MP_KEY,
> afxdp_mp_send_fds);
>   if (ret < 0) {
> - AF_XDP_LOG(ERR, "%s: Failed to register multi-
> process IPC callback: %s",
> + AF_XDP_LOG(ERR, "%s: Failed to register multi-
> process IPC callback: %s\n",
>  name, strerror(rte_errno));
>   return -1;
>   }
> --
> 2.23.0



RE: [PATCH v2] net/af_xdp: allow operation when multiprocess is disabled

2022-02-17 Thread Loftus, Ciara
> Subject: [PATCH v2] net/af_xdp: allow operation when multiprocess is
> disabled
> 
> If EAL multiprocess feature has been disabled via rte_mp_disable()
> function, AF_XDP driver may not be able to register its IPC callback.
> Previously this leads to probe failure.
> This commit adds a check for this condition so that AF_XDP can still be
> used even if multiprocess is disabled.
> 
> Fixes: 9876cf8316b3 ("net/af_xdp: re-enable secondary process support")
> 
> Signed-off-by: Junxiao Shi 

Thanks for the patch!

Acked-by: Ciara Loftus 

> ---
>  drivers/net/af_xdp/rte_eth_af_xdp.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> b/drivers/net/af_xdp/rte_eth_af_xdp.c
> index 6ac710c6bd..2163df7c5c 100644
> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> @@ -1995,7 +1995,7 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device
> *dev)
>   /* Register IPC callback which shares xsk fds from primary to
> secondary */
>   if (!afxdp_dev_count) {
>   ret = rte_mp_action_register(ETH_AF_XDP_MP_KEY,
> afxdp_mp_send_fds);
> - if (ret < 0) {
> + if (ret < 0 && rte_errno != ENOTSUP) {
>   AF_XDP_LOG(ERR, "%s: Failed to register multi-
> process IPC callback: %s",
>  name, strerror(rte_errno));
>   return -1;
> --
> 2.17.1



RE: [PATCH v4] net/af_xdp: re-enable secondary process support

2022-02-16 Thread Loftus, Ciara
> Subject: RE: [PATCH v4] net/af_xdp: re-enable secondary process support
> 
> >
> > On 2/11/2022 1:01 PM, Loftus, Ciara wrote:
> > >>
> > >> On 2/11/2022 9:26 AM, Loftus, Ciara wrote:
> > >>>>>
> > >>>>> On 2/10/2022 5:47 PM, Loftus, Ciara wrote:
> > >>>>>>> Subject: Re: [PATCH v4] net/af_xdp: re-enable secondary process
> > >>>> support
> > >>>>>>>
> > >>>>>>> On 2/10/2022 3:40 PM, Loftus, Ciara wrote:
> > >>>>>>>>> Subject: Re: [PATCH v4] net/af_xdp: re-enable secondary
> > process
> > >>>>> support
> > >>>>>>>>>
> > >>>>>>>>> On 2/9/2022 9:48 AM, Ciara Loftus wrote:
> > >>>>>>>>>> Secondary process support had been disabled for the
> AF_XDP
> > >> PMD
> > >>>>>>>>> because
> > >>>>>>>>>> there was no logic in place to share the AF_XDP socket file
> > >>>> descriptors
> > >>>>>>>>>> between the processes. This commit introduces this logic
> using
> > >> the
> > >>>>> IPC
> > >>>>>>>>>> APIs.
> > >>>>>>>>>>
> > >>>>>>>>>> Rx and Tx are disabled in the secondary process due to
> memory
> > >>>>> mapping
> > >>>>>>> of
> > >>>>>>>>>> the AF_XDP rings being assigned by the kernel in the primary
> > >>>> process
> > >>>>>>> only.
> > >>>>>>>>>> However other operations including retrieval of stats are
> > >> permitted.
> > >>>>>>>>>>
> > >>>>>>>>>> Signed-off-by: Ciara Loftus 
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Hi Ciara,
> > >>>>>>>>>
> > >>>>>>>>> When I tried to test the patch getting following error [1], it
> > doesn't
> > >>>> look
> > >>>>>>>>> related to this patch but can you help to fix the issue, thanks.
> > >>>>>>>>>
> > >>>>>>>>> [1]
> > >>>>>>>>> libxdp: Couldn't find a BPF file with name xsk_def_xdp_prog.o
> > >>>>>>>>> xsk_configure(): Failed to create xsk socket.
> > >>>>>>>>> eth_rx_queue_setup(): Failed to configure xdp socket
> > >>>>>>>>> Fail to configure port 2 rx queues
> > >>>>>>>>> EAL: Error - exiting with code: 1
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> Hi Ferruh,
> > >>>>>>>>
> > >>>>>>>> This file should be generated when libxdp is compiled.
> > >>>>>>>> Mine is located @ /usr/local/lib/bpf/xsk_def_xdp_prog.o
> > >>>>>>>> Can you check if that file is there for you? It could be in
> > >>>>>>> /usr/local/lib64/bpf/ on your machine.
> > >>>>>>>> What kernel are you running on?
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>> It is in: /usr/local/lib64/bpf/xsk_def_xdp_prog.o
> > >>>>>>>
> > >>>>>>> I had to compile libxdp from source because OS package version
> > was
> > >> old
> > >>>>>>> to work with af_xdp.
> > >>>>>>> Is something required to point location of this file to af_xdp
> PMD?
> > >>>>>>>
> > >>>>>>> I run kernel:
> > >>>>>>> 5.15.16-200.fc35.x86_64
> > >>>>>>
> > >>>>>> I read through the libxdp code to figure out what happens when
> > >>>> searching
> > >>>>> for the file:
> > >>>>>> https://github.com/xdp-project/xdp-
> > >>>>> tools/blob/v1.2.2/lib/libxdp/libxdp.c#L1055
> > >>>>>>
> > >>>>>> secure_g

RE: [PATCH v4] net/af_xdp: re-enable secondary process support

2022-02-11 Thread Loftus, Ciara
> 
> On 2/11/2022 1:01 PM, Loftus, Ciara wrote:
> >>
> >> On 2/11/2022 9:26 AM, Loftus, Ciara wrote:
> >>>>>
> >>>>> On 2/10/2022 5:47 PM, Loftus, Ciara wrote:
> >>>>>>> Subject: Re: [PATCH v4] net/af_xdp: re-enable secondary process
> >>>> support
> >>>>>>>
> >>>>>>> On 2/10/2022 3:40 PM, Loftus, Ciara wrote:
> >>>>>>>>> Subject: Re: [PATCH v4] net/af_xdp: re-enable secondary
> process
> >>>>> support
> >>>>>>>>>
> >>>>>>>>> On 2/9/2022 9:48 AM, Ciara Loftus wrote:
> >>>>>>>>>> Secondary process support had been disabled for the AF_XDP
> >> PMD
> >>>>>>>>> because
> >>>>>>>>>> there was no logic in place to share the AF_XDP socket file
> >>>> descriptors
> >>>>>>>>>> between the processes. This commit introduces this logic using
> >> the
> >>>>> IPC
> >>>>>>>>>> APIs.
> >>>>>>>>>>
> >>>>>>>>>> Rx and Tx are disabled in the secondary process due to memory
> >>>>> mapping
> >>>>>>> of
> >>>>>>>>>> the AF_XDP rings being assigned by the kernel in the primary
> >>>> process
> >>>>>>> only.
> >>>>>>>>>> However other operations including retrieval of stats are
> >> permitted.
> >>>>>>>>>>
> >>>>>>>>>> Signed-off-by: Ciara Loftus 
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Hi Ciara,
> >>>>>>>>>
> >>>>>>>>> When I tried to test the patch getting following error [1], it
> doesn't
> >>>> look
> >>>>>>>>> related to this patch but can you help to fix the issue, thanks.
> >>>>>>>>>
> >>>>>>>>> [1]
> >>>>>>>>> libxdp: Couldn't find a BPF file with name xsk_def_xdp_prog.o
> >>>>>>>>> xsk_configure(): Failed to create xsk socket.
> >>>>>>>>> eth_rx_queue_setup(): Failed to configure xdp socket
> >>>>>>>>> Fail to configure port 2 rx queues
> >>>>>>>>> EAL: Error - exiting with code: 1
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Hi Ferruh,
> >>>>>>>>
> >>>>>>>> This file should be generated when libxdp is compiled.
> >>>>>>>> Mine is located @ /usr/local/lib/bpf/xsk_def_xdp_prog.o
> >>>>>>>> Can you check if that file is there for you? It could be in
> >>>>>>> /usr/local/lib64/bpf/ on your machine.
> >>>>>>>> What kernel are you running on?
> >>>>>>>>
> >>>>>>>
> >>>>>>> It is in: /usr/local/lib64/bpf/xsk_def_xdp_prog.o
> >>>>>>>
> >>>>>>> I had to compile libxdp from source because OS package version
> was
> >> old
> >>>>>>> to work with af_xdp.
> >>>>>>> Is something required to point location of this file to af_xdp PMD?
> >>>>>>>
> >>>>>>> I run kernel:
> >>>>>>> 5.15.16-200.fc35.x86_64
> >>>>>>
> >>>>>> I read through the libxdp code to figure out what happens when
> >>>> searching
> >>>>> for the file:
> >>>>>> https://github.com/xdp-project/xdp-
> >>>>> tools/blob/v1.2.2/lib/libxdp/libxdp.c#L1055
> >>>>>>
> >>>>>> secure_getenv(XDP_OBJECT_ENVVAR) is called which according to
> the
> >>>>> README "defaults to /usr/lib/bpf (or /usr/lib64/bpf on systems using
> a
> >> split
> >>>>> library path)".
> >>>>>> If that fails, BPF_OBJECT_PATH will be searched, which points to
> >>>>> /usr/lib/bpf
> >>>>>>
> >>>>>> I discovered that on my system the getenv() call fails, but the file is
> >>

RE: [PATCH v4] net/af_xdp: re-enable secondary process support

2022-02-11 Thread Loftus, Ciara
> 
> On 2/11/2022 9:26 AM, Loftus, Ciara wrote:
> >>>
> >>> On 2/10/2022 5:47 PM, Loftus, Ciara wrote:
> >>>>> Subject: Re: [PATCH v4] net/af_xdp: re-enable secondary process
> >> support
> >>>>>
> >>>>> On 2/10/2022 3:40 PM, Loftus, Ciara wrote:
> >>>>>>> Subject: Re: [PATCH v4] net/af_xdp: re-enable secondary process
> >>> support
> >>>>>>>
> >>>>>>> On 2/9/2022 9:48 AM, Ciara Loftus wrote:
> >>>>>>>> Secondary process support had been disabled for the AF_XDP
> PMD
> >>>>>>> because
> >>>>>>>> there was no logic in place to share the AF_XDP socket file
> >> descriptors
> >>>>>>>> between the processes. This commit introduces this logic using
> the
> >>> IPC
> >>>>>>>> APIs.
> >>>>>>>>
> >>>>>>>> Rx and Tx are disabled in the secondary process due to memory
> >>> mapping
> >>>>> of
> >>>>>>>> the AF_XDP rings being assigned by the kernel in the primary
> >> process
> >>>>> only.
> >>>>>>>> However other operations including retrieval of stats are
> permitted.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Ciara Loftus 
> >>>>>>>>
> >>>>>>>
> >>>>>>> Hi Ciara,
> >>>>>>>
> >>>>>>> When I tried to test the patch getting following error [1], it doesn't
> >> look
> >>>>>>> related to this patch but can you help to fix the issue, thanks.
> >>>>>>>
> >>>>>>> [1]
> >>>>>>> libxdp: Couldn't find a BPF file with name xsk_def_xdp_prog.o
> >>>>>>> xsk_configure(): Failed to create xsk socket.
> >>>>>>> eth_rx_queue_setup(): Failed to configure xdp socket
> >>>>>>> Fail to configure port 2 rx queues
> >>>>>>> EAL: Error - exiting with code: 1
> >>>>>>
> >>>>>>
> >>>>>> Hi Ferruh,
> >>>>>>
> >>>>>> This file should be generated when libxdp is compiled.
> >>>>>> Mine is located @ /usr/local/lib/bpf/xsk_def_xdp_prog.o
> >>>>>> Can you check if that file is there for you? It could be in
> >>>>> /usr/local/lib64/bpf/ on your machine.
> >>>>>> What kernel are you running on?
> >>>>>>
> >>>>>
> >>>>> It is in: /usr/local/lib64/bpf/xsk_def_xdp_prog.o
> >>>>>
> >>>>> I had to compile libxdp from source because OS package version was
> old
> >>>>> to work with af_xdp.
> >>>>> Is something required to point location of this file to af_xdp PMD?
> >>>>>
> >>>>> I run kernel:
> >>>>> 5.15.16-200.fc35.x86_64
> >>>>
> >>>> I read through the libxdp code to figure out what happens when
> >> searching
> >>> for the file:
> >>>> https://github.com/xdp-project/xdp-
> >>> tools/blob/v1.2.2/lib/libxdp/libxdp.c#L1055
> >>>>
> >>>> secure_getenv(XDP_OBJECT_ENVVAR) is called which according to the
> >>> README "defaults to /usr/lib/bpf (or /usr/lib64/bpf on systems using a
> split
> >>> library path)".
> >>>> If that fails, BPF_OBJECT_PATH will be searched, which points to
> >>> /usr/lib/bpf
> >>>>
> >>>> I discovered that on my system the getenv() call fails, but the file is
> >>> eventually found because luckily BPF_OBJECT_PATH points to the
> >>> appropriate place for me (lib):
> >>>> https://github.com/xdp-project/xdp-
> tools/blob/v1.2.2/lib/util/util.h#L24
> >>>> I suspect the same failure is happening for you, but since
> >>> BPF_OBJECT_PATH points to lib and not lib64, the file is not found.
> >>>> As a temporary measure can you create a symlink in /usr/local/lib/bpf/
> to
> >>> point to /usr/local/lib/bpf/xsk_def_xdp_prog.o
> >>>> I will investigate the libxdp issue further. Maybe a change is needed in
> >> the
> >>> library. If a change or setup recommendation is needed

RE: [PATCH v4] net/af_xdp: re-enable secondary process support

2022-02-11 Thread Loftus, Ciara
> >
> > On 2/10/2022 5:47 PM, Loftus, Ciara wrote:
> > >> Subject: Re: [PATCH v4] net/af_xdp: re-enable secondary process
> support
> > >>
> > >> On 2/10/2022 3:40 PM, Loftus, Ciara wrote:
> > >>>> Subject: Re: [PATCH v4] net/af_xdp: re-enable secondary process
> > support
> > >>>>
> > >>>> On 2/9/2022 9:48 AM, Ciara Loftus wrote:
> > >>>>> Secondary process support had been disabled for the AF_XDP PMD
> > >>>> because
> > >>>>> there was no logic in place to share the AF_XDP socket file
> descriptors
> > >>>>> between the processes. This commit introduces this logic using the
> > IPC
> > >>>>> APIs.
> > >>>>>
> > >>>>> Rx and Tx are disabled in the secondary process due to memory
> > mapping
> > >> of
> > >>>>> the AF_XDP rings being assigned by the kernel in the primary
> process
> > >> only.
> > >>>>> However other operations including retrieval of stats are permitted.
> > >>>>>
> > >>>>> Signed-off-by: Ciara Loftus 
> > >>>>>
> > >>>>
> > >>>> Hi Ciara,
> > >>>>
> > >>>> When I tried to test the patch getting following error [1], it doesn't
> look
> > >>>> related to this patch but can you help to fix the issue, thanks.
> > >>>>
> > >>>> [1]
> > >>>> libxdp: Couldn't find a BPF file with name xsk_def_xdp_prog.o
> > >>>> xsk_configure(): Failed to create xsk socket.
> > >>>> eth_rx_queue_setup(): Failed to configure xdp socket
> > >>>> Fail to configure port 2 rx queues
> > >>>> EAL: Error - exiting with code: 1
> > >>>
> > >>>
> > >>> Hi Ferruh,
> > >>>
> > >>> This file should be generated when libxdp is compiled.
> > >>> Mine is located @ /usr/local/lib/bpf/xsk_def_xdp_prog.o
> > >>> Can you check if that file is there for you? It could be in
> > >> /usr/local/lib64/bpf/ on your machine.
> > >>> What kernel are you running on?
> > >>>
> > >>
> > >> It is in: /usr/local/lib64/bpf/xsk_def_xdp_prog.o
> > >>
> > >> I had to compile libxdp from source because OS package version was old
> > >> to work with af_xdp.
> > >> Is something required to point location of this file to af_xdp PMD?
> > >>
> > >> I run kernel:
> > >> 5.15.16-200.fc35.x86_64
> > >
> > > I read through the libxdp code to figure out what happens when
> searching
> > for the file:
> > > https://github.com/xdp-project/xdp-
> > tools/blob/v1.2.2/lib/libxdp/libxdp.c#L1055
> > >
> > > secure_getenv(XDP_OBJECT_ENVVAR) is called which according to the
> > README "defaults to /usr/lib/bpf (or /usr/lib64/bpf on systems using a split
> > library path)".
> > > If that fails, BPF_OBJECT_PATH will be searched, which points to
> > /usr/lib/bpf
> > >
> > > I discovered that on my system the getenv() call fails, but the file is
> > eventually found because luckily BPF_OBJECT_PATH points to the
> > appropriate place for me (lib):
> > > https://github.com/xdp-project/xdp-tools/blob/v1.2.2/lib/util/util.h#L24
> > > I suspect the same failure is happening for you, but since
> > BPF_OBJECT_PATH points to lib and not lib64, the file is not found.
> > > As a temporary measure can you create a symlink in /usr/local/lib/bpf/ to
> > point to /usr/local/lib/bpf/xsk_def_xdp_prog.o
> > > I will investigate the libxdp issue further. Maybe a change is needed in
> the
> > library. If a change or setup recommendation is needed in DPDK I will
> create a
> > patch.
> > >
> >
> >
> > I don't have XDP_OBJECT_ENVVAR or BPF_OBJECT_PATH environment
> > variables set,
> > if they should be we should document them.
> >
> > When I created '/usr/local/lib/bpf/' link, the BPF file found.
> > This should be clarified/documented for users.
> 
> Ok. Ideally we shouldn't have to create the symlink. I will look for a better
> solution and submit a patch.
> The symlink might be a temporary solution if another solution is not found.

Can you please try setting the environment variable 
LIBXDP_OBJECT_PATH=/usr/local/lib64/bpf/
And see if your 

RE: [PATCH v4] net/af_xdp: re-enable secondary process support

2022-02-10 Thread Loftus, Ciara
> 
> On 2/10/2022 5:47 PM, Loftus, Ciara wrote:
> >> Subject: Re: [PATCH v4] net/af_xdp: re-enable secondary process support
> >>
> >> On 2/10/2022 3:40 PM, Loftus, Ciara wrote:
> >>>> Subject: Re: [PATCH v4] net/af_xdp: re-enable secondary process
> support
> >>>>
> >>>> On 2/9/2022 9:48 AM, Ciara Loftus wrote:
> >>>>> Secondary process support had been disabled for the AF_XDP PMD
> >>>> because
> >>>>> there was no logic in place to share the AF_XDP socket file descriptors
> >>>>> between the processes. This commit introduces this logic using the
> IPC
> >>>>> APIs.
> >>>>>
> >>>>> Rx and Tx are disabled in the secondary process due to memory
> mapping
> >> of
> >>>>> the AF_XDP rings being assigned by the kernel in the primary process
> >> only.
> >>>>> However other operations including retrieval of stats are permitted.
> >>>>>
> >>>>> Signed-off-by: Ciara Loftus 
> >>>>>
> >>>>
> >>>> Hi Ciara,
> >>>>
> >>>> When I tried to test the patch getting following error [1], it doesn't 
> >>>> look
> >>>> related to this patch but can you help to fix the issue, thanks.
> >>>>
> >>>> [1]
> >>>> libxdp: Couldn't find a BPF file with name xsk_def_xdp_prog.o
> >>>> xsk_configure(): Failed to create xsk socket.
> >>>> eth_rx_queue_setup(): Failed to configure xdp socket
> >>>> Fail to configure port 2 rx queues
> >>>> EAL: Error - exiting with code: 1
> >>>
> >>>
> >>> Hi Ferruh,
> >>>
> >>> This file should be generated when libxdp is compiled.
> >>> Mine is located @ /usr/local/lib/bpf/xsk_def_xdp_prog.o
> >>> Can you check if that file is there for you? It could be in
> >> /usr/local/lib64/bpf/ on your machine.
> >>> What kernel are you running on?
> >>>
> >>
> >> It is in: /usr/local/lib64/bpf/xsk_def_xdp_prog.o
> >>
> >> I had to compile libxdp from source because OS package version was old
> >> to work with af_xdp.
> >> Is something required to point location of this file to af_xdp PMD?
> >>
> >> I run kernel:
> >> 5.15.16-200.fc35.x86_64
> >
> > I read through the libxdp code to figure out what happens when searching
> for the file:
> > https://github.com/xdp-project/xdp-
> tools/blob/v1.2.2/lib/libxdp/libxdp.c#L1055
> >
> > secure_getenv(XDP_OBJECT_ENVVAR) is called which according to the
> README "defaults to /usr/lib/bpf (or /usr/lib64/bpf on systems using a split
> library path)".
> > If that fails, BPF_OBJECT_PATH will be searched, which points to
> /usr/lib/bpf
> >
> > I discovered that on my system the getenv() call fails, but the file is
> eventually found because luckily BPF_OBJECT_PATH points to the
> appropriate place for me (lib):
> > https://github.com/xdp-project/xdp-tools/blob/v1.2.2/lib/util/util.h#L24
> > I suspect the same failure is happening for you, but since
> BPF_OBJECT_PATH points to lib and not lib64, the file is not found.
> > As a temporary measure can you create a symlink in /usr/local/lib/bpf/ to
> point to /usr/local/lib/bpf/xsk_def_xdp_prog.o
> > I will investigate the libxdp issue further. Maybe a change is needed in the
> library. If a change or setup recommendation is needed in DPDK I will create a
> patch.
> >
> 
> 
> I don't have XDP_OBJECT_ENVVAR or BPF_OBJECT_PATH environment
> variables set,
> if they should be we should document them.
> 
> When I created '/usr/local/lib/bpf/' link, the BPF file found.
> This should be clarified/documented for users.

Ok. Ideally we shouldn't have to create the symlink. I will look for a better 
solution and submit a patch.
The symlink might be a temporary solution if another solution is not found.

> 
> 
> And still observing following two:
> 
> 1) I don't know what following log means:
> Configuring Port 2 (socket 0)
> libbpf: elf: skipping unrecognized data section(7) .xdp_run_config
> libbpf: elf: skipping unrecognized data section(8) xdp_metadata
> libxdp: XDP flag not supported by libxdp.
> libbpf: elf: skipping unrecognized data section(8) xdp_metadata
> libbpf: elf: skipping unrecognized data section(8) xdp_metadata

I reported this and a patch was submitted to libbpf to demote those logs:
https://www.spinics.net/lists/bpf/msg49140.html
It looks like the patch never made it. I'll chase it up.
Anyway, the logs can be ignored as they are not errors.

> 
> 2) When I try to create two af_xdp interface, I only got one:
> "--vdev net_af_xdp,iface=enp24s0f1 --vdev net_af_xdp,iface=enp24s0f0"

This is also expected as you haven't given each vdev a unique name. Try:
"--vdev net_af_xdp0,iface=enp24s0f1 --vdev net_af_xdp1,iface=enp24s0f0"

Thank you for the testing.

Ciara

> 
> 
> Thanks,
> ferruh


RE: [PATCH v4] net/af_xdp: re-enable secondary process support

2022-02-10 Thread Loftus, Ciara
> Subject: Re: [PATCH v4] net/af_xdp: re-enable secondary process support
> 
> On 2/10/2022 3:40 PM, Loftus, Ciara wrote:
> >> Subject: Re: [PATCH v4] net/af_xdp: re-enable secondary process support
> >>
> >> On 2/9/2022 9:48 AM, Ciara Loftus wrote:
> >>> Secondary process support had been disabled for the AF_XDP PMD
> >> because
> >>> there was no logic in place to share the AF_XDP socket file descriptors
> >>> between the processes. This commit introduces this logic using the IPC
> >>> APIs.
> >>>
> >>> Rx and Tx are disabled in the secondary process due to memory mapping
> of
> >>> the AF_XDP rings being assigned by the kernel in the primary process
> only.
> >>> However other operations including retrieval of stats are permitted.
> >>>
> >>> Signed-off-by: Ciara Loftus 
> >>>
> >>
> >> Hi Ciara,
> >>
> >> When I tried to test the patch getting following error [1], it doesn't look
> >> related to this patch but can you help to fix the issue, thanks.
> >>
> >> [1]
> >> libxdp: Couldn't find a BPF file with name xsk_def_xdp_prog.o
> >> xsk_configure(): Failed to create xsk socket.
> >> eth_rx_queue_setup(): Failed to configure xdp socket
> >> Fail to configure port 2 rx queues
> >> EAL: Error - exiting with code: 1
> >
> >
> > Hi Ferruh,
> >
> > This file should be generated when libxdp is compiled.
> > Mine is located @ /usr/local/lib/bpf/xsk_def_xdp_prog.o
> > Can you check if that file is there for you? It could be in
> /usr/local/lib64/bpf/ on your machine.
> > What kernel are you running on?
> >
> 
> It is in: /usr/local/lib64/bpf/xsk_def_xdp_prog.o
> 
> I had to compile libxdp from source because OS package version was old
> to work with af_xdp.
> Is something required to point location of this file to af_xdp PMD?
> 
> I run kernel:
> 5.15.16-200.fc35.x86_64

I read through the libxdp code to figure out what happens when searching for 
the file:
https://github.com/xdp-project/xdp-tools/blob/v1.2.2/lib/libxdp/libxdp.c#L1055

secure_getenv(XDP_OBJECT_ENVVAR) is called which according to the README 
"defaults to /usr/lib/bpf (or /usr/lib64/bpf on systems using a split library 
path)". 
If that fails, BPF_OBJECT_PATH will be searched, which points to /usr/lib/bpf

I discovered that on my system the getenv() call fails, but the file is 
eventually found because luckily BPF_OBJECT_PATH points to the appropriate 
place for me (lib):
https://github.com/xdp-project/xdp-tools/blob/v1.2.2/lib/util/util.h#L24
I suspect the same failure is happening for you, but since BPF_OBJECT_PATH 
points to lib and not lib64, the file is not found.
As a temporary measure can you create a symlink in /usr/local/lib/bpf/ pointing 
to /usr/local/lib64/bpf/xsk_def_xdp_prog.o
I will investigate the libxdp issue further. Maybe a change is needed in the 
library. If a change or setup recommendation is needed in DPDK I will create a 
patch.
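
To make that lookup order concrete, it boils down to something like the sketch
below (a simplified illustration of the linked libxdp code, not the library
itself; the fallback directory is whatever BPF_OBJECT_PATH was set to when
libxdp was built):

/* Simplified sketch of the search order described above. */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define XDP_OBJECT_ENVVAR "LIBXDP_OBJECT_PATH"
#define BPF_OBJECT_PATH   "/usr/lib/bpf"  /* build-time default: lib vs lib64 */

static int
find_bpf_file(char *buf, size_t len, const char *name)
{
	const char *dir = secure_getenv(XDP_OBJECT_ENVVAR);

	if (dir == NULL)
		dir = BPF_OBJECT_PATH;  /* fallback when the env var is unset */

	snprintf(buf, len, "%s/%s", dir, name);
	return access(buf, R_OK);       /* 0 if the object file is readable */
}

/* e.g. find_bpf_file(path, sizeof(path), "xsk_def_xdp_prog.o") */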

Thanks,
Ciara

> 



RE: [PATCH v4] net/af_xdp: re-enable secondary process support

2022-02-10 Thread Loftus, Ciara
> Subject: Re: [PATCH v4] net/af_xdp: re-enable secondary process support
> 
> On 2/9/2022 9:48 AM, Ciara Loftus wrote:
> > Secondary process support had been disabled for the AF_XDP PMD
> because
> > there was no logic in place to share the AF_XDP socket file descriptors
> > between the processes. This commit introduces this logic using the IPC
> > APIs.
> >
> > Rx and Tx are disabled in the secondary process due to memory mapping of
> > the AF_XDP rings being assigned by the kernel in the primary process only.
> > However other operations including retrieval of stats are permitted.
> >
> > Signed-off-by: Ciara Loftus 
> >
> 
> Hi Ciara,
> 
> When I tried to test the patch getting following error [1], it doesn't look
> related to this patch but can you help to fix the issue, thanks.
> 
> [1]
> libxdp: Couldn't find a BPF file with name xsk_def_xdp_prog.o
> xsk_configure(): Failed to create xsk socket.
> eth_rx_queue_setup(): Failed to configure xdp socket
> Fail to configure port 2 rx queues
> EAL: Error - exiting with code: 1


Hi Ferruh,

This file should be generated when libxdp is compiled.
Mine is located @ /usr/local/lib/bpf/xsk_def_xdp_prog.o
Can you check if that file is there for you? It could be in 
/usr/local/lib64/bpf/ on your machine.
What kernel are you running on?

Thanks,
Ciara


RE: [PATCH v1] net/af_xdp: make umem configure code more readable

2022-02-10 Thread Loftus, Ciara
> Subject: [PATCH v1] net/af_xdp: make umem configure code more readable
> 
> The below compile time defined style make the code not so readable, the
> first function end block is after "#endif" segment.
> 
>   #if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
> 
>   xdp_umem_configure()
>   {
> 
>   #else
>   xdp_umem_configure()
>   {
> 
>   #endif
>   'shared code block'
>   }
> 
> Signed-off-by: Haiyue Wang 

Thanks for the patch!

Acked-by: Ciara Loftus 

> ---
>  drivers/net/af_xdp/rte_eth_af_xdp.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> b/drivers/net/af_xdp/rte_eth_af_xdp.c
> index 1b6192fa44..802f912cb7 100644
> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> @@ -1078,6 +1078,12 @@ xsk_umem_info *xdp_umem_configure(struct
> pmd_internals *internals,
>   __atomic_store_n(&umem->refcnt, 1, __ATOMIC_RELEASE);
>   }
> 
> + return umem;
> +
> +err:
> + xdp_umem_destroy(umem);
> + return NULL;
> +}
>  #else
>  static struct
>  xsk_umem_info *xdp_umem_configure(struct pmd_internals *internals,
> @@ -1138,13 +1144,13 @@ xsk_umem_info *xdp_umem_configure(struct
> pmd_internals *internals,
>   }
>   umem->mz = mz;
> 
> -#endif
>   return umem;
> 
>  err:
>   xdp_umem_destroy(umem);
>   return NULL;
>  }
> +#endif
> 
>  static int
>  load_custom_xdp_prog(const char *prog_path, int if_index, struct bpf_map
> **map)
> --
> 2.35.1



RE: [PATCH] ethdev: introduce generic dummy packet burst function

2022-02-09 Thread Loftus, Ciara
> Subject: [PATCH] ethdev: introduce generic dummy packet burst function
> 
> Multiple PMDs have dummy/noop Rx/Tx packet burst functions.
> 
> These dummy functions are very simple, introduce a common function in
> the ethdev and update drivers to use it instead of each driver having
> its own functions.
> 
> Signed-off-by: Ferruh Yigit 
> ---
> Cc: Ciara Loftus 
> ---

[snip]

> diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
> index 3f3c4a7c7214..910b76a92c42 100644
> --- a/drivers/net/mlx4/mlx4.c
> +++ b/drivers/net/mlx4/mlx4.c
> @@ -350,8 +350,8 @@ mlx4_dev_stop(struct rte_eth_dev *dev)
>   return 0;
>   DEBUG("%p: detaching flows from all RX queues", (void *)dev);
>   priv->started = 0;
> - dev->tx_pkt_burst = mlx4_tx_burst_removed;
> - dev->rx_pkt_burst = mlx4_rx_burst_removed;
> + dev->tx_pkt_burst = rte_eth_pkt_burst_dummy;
> + dev->rx_pkt_burst = rte_eth_pkt_burst_dummy;
>   rte_wmb();
>   /* Disable datapath on secondary process. */
>   mlx4_mp_req_stop_rxtx(dev);
> @@ -383,8 +383,8 @@ mlx4_dev_close(struct rte_eth_dev *dev)
>   DEBUG("%p: closing device \"%s\"",
> (void *)dev,
> ((priv->ctx != NULL) ? priv->ctx->device->name : ""));
> - dev->rx_pkt_burst = mlx4_rx_burst_removed;
> - dev->tx_pkt_burst = mlx4_tx_burst_removed;
> + dev->rx_pkt_burst = rte_eth_pkt_burst_dummy;
> + dev->tx_pkt_burst = rte_eth_pkt_burst_dummy;
>   rte_wmb();
>   /* Disable datapath on secondary process. */
>   mlx4_mp_req_stop_rxtx(dev);
> diff --git a/drivers/net/mlx4/mlx4_mp.c b/drivers/net/mlx4/mlx4_mp.c
> index 8fcfb5490ee9..1da64910aadd 100644
> --- a/drivers/net/mlx4/mlx4_mp.c
> +++ b/drivers/net/mlx4/mlx4_mp.c
> @@ -150,8 +150,8 @@ mp_secondary_handle(const struct rte_mp_msg
> *mp_msg, const void *peer)
>   break;
>   case MLX4_MP_REQ_STOP_RXTX:
>   INFO("port %u stopping datapath", dev->data->port_id);
> - dev->tx_pkt_burst = mlx4_tx_burst_removed;
> - dev->rx_pkt_burst = mlx4_rx_burst_removed;
> + dev->tx_pkt_burst = rte_eth_pkt_burst_dummy;
> + dev->rx_pkt_burst = rte_eth_pkt_burst_dummy;
>   rte_mb();
>   mp_init_msg(dev, &mp_res, param->type);
>   res->result = 0;
> diff --git a/drivers/net/mlx4/mlx4_rxtx.c b/drivers/net/mlx4/mlx4_rxtx.c
> index ed9e41fcdea9..059e432a63fc 100644
> --- a/drivers/net/mlx4/mlx4_rxtx.c
> +++ b/drivers/net/mlx4/mlx4_rxtx.c
> @@ -1338,55 +1338,3 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf
> **pkts, uint16_t pkts_n)
>   rxq->stats.ipackets += i;
>   return i;
>  }
> -
> -/**
> - * Dummy DPDK callback for Tx.
> - *
> - * This function is used to temporarily replace the real callback during
> - * unsafe control operations on the queue, or in case of error.
> - *
> - * @param dpdk_txq
> - *   Generic pointer to Tx queue structure.
> - * @param[in] pkts
> - *   Packets to transmit.
> - * @param pkts_n
> - *   Number of packets in array.
> - *
> - * @return
> - *   Number of packets successfully transmitted (<= pkts_n).
> - */
> -uint16_t
> -mlx4_tx_burst_removed(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t
> pkts_n)
> -{
> - (void)dpdk_txq;
> - (void)pkts;
> - (void)pkts_n;
> - rte_mb();

The mlx4 and mlx5 PMDs lose a call to rte_mb() when switching over to the new 
dummy functions. Maybe the maintainer can comment on whether that's an issue or 
not? Other than that LGTM.

Ciara

> - return 0;
> -}
> -
> -/**
> - * Dummy DPDK callback for Rx.
> - *
> - * This function is used to temporarily replace the real callback during
> - * unsafe control operations on the queue, or in case of error.
> - *
> - * @param dpdk_rxq
> - *   Generic pointer to Rx queue structure.
> - * @param[out] pkts
> - *   Array to store received packets.
> - * @param pkts_n
> - *   Maximum number of packets in array.
> - *
> - * @return
> - *   Number of packets successfully received (<= pkts_n).
> - */
> -uint16_t
> -mlx4_rx_burst_removed(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t
> pkts_n)
> -{
> - (void)dpdk_rxq;
> - (void)pkts;
> - (void)pkts_n;
> - rte_mb();
> - return 0;
> -}
> diff --git a/drivers/net/mlx4/mlx4_rxtx.h b/drivers/net/mlx4/mlx4_rxtx.h
> index 83e9534cd0a7..70f3cd868058 100644
> --- a/drivers/net/mlx4/mlx4_rxtx.h
> +++ b/drivers/net/mlx4/mlx4_rxtx.h
> @@ -149,10 +149,6 @@ uint16_t mlx4_tx_burst(void *dpdk_txq, struct
> rte_mbuf **pkts,
>  uint16_t pkts_n);
>  uint16_t mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts,
>  uint16_t pkts_n);
> -uint16_t mlx4_tx_burst_removed(void *dpdk_txq, struct rte_mbuf **pkts,
> -uint16_t pkts_n);
> -uint16_t mlx4_rx_burst_removed(void *dpdk_rxq, struct rte_mbuf **pkts,
> -uint16_t pkts_n);
> 
>  /* mlx4_txq.c */
> 
> diff --git a/drivers/net/mlx5/linux/mlx5_mp_os.c
> b/dri

RE: [PATCH v3] net/af_xdp: re-enable secondary process support

2022-02-08 Thread Loftus, Ciara
> 
> On 2/8/2022 6:42 PM, Stephen Hemminger wrote:
> > On Tue, 8 Feb 2022 18:00:27 +
> > Ferruh Yigit  wrote:
> >
> >> On 2/8/2022 5:45 PM, Stephen Hemminger wrote:
> >>> On Tue,  8 Feb 2022 13:48:00 +
> >>> Ciara Loftus  wrote:
> >>>
>  +- **Secondary Processes**
>  +
>  +  Rx and Tx are not supported for secondary processes due to the
> single-producer
>  +  single-consumer nature of the AF_XDP rings. However other
> operations including
>  +  statistics retrieval are permitted.
>  +  The maximum number of queues permitted for PMDs operating in
> this model is 8
>  +  as this is the maximum number of fds that can be sent through the
> IPC APIs as
>  +  defined by RTE_MP_MAX_FD_NUM.
>  +
> >>>
> >>> This seems like a restriction that is true for most devices in DPDK.
> >>> Most other devices also have a restriction on queues;
> >>> the hardware descriptor ring can only be used by one thread at a time.
> >>> Is this different with AF_XDP?
> >>
> >> I asked the same on v2 :) and Ciara explained the reason; it is in the v2
> >> discussion thread.
> >
> > The wording of the message is what confused me.
> > It would be better to change:
> >  due to the single-producer single-consumer nature of the AF_XDP rings
> > to
> >  due to memory mapping of the AF_XDP rings being assigned by the
> kernel
> >  in the primary process only.
> 
> +1

Agree, I worded this poorly! Will submit a v4 with a more accurate explanation 
of the limitation.

Thanks,
Ciara


RE: [PATCH v2] net/af_xdp: re-enable secondary process support

2022-02-07 Thread Loftus, Ciara
> >>
> >> On 2/4/2022 12:54 PM, Ciara Loftus wrote:
> >>> Secondary process support had been disabled for the AF_XDP PMD
> >>> because there was no logic in place to share the AF_XDP socket
> >>> file descriptors between the processes. This commit introduces
> >>> this logic using the IPC APIs.
> >>>
> >>> Since AF_XDP rings are single-producer single-consumer, rx/tx
> >>> in the secondary process is disabled. However other operations
> >>> including retrieval of stats are permitted.
> >>>
> >>> Signed-off-by: Ciara Loftus 
> >>>
> >>> ---
> >>> v1 -> v2:
> >>> * Rebase to next-net
> >>>
> >>> RFC -> v1:
> >>> * Added newline to af_xdp.rst
> >>> * Fixed spelling errors
> >>> * Fixed potential NULL dereference in init_internals
> >>> * Fixed potential free of address-of expression in
> afxdp_mp_request_fds
> >>> ---
> >>>doc/guides/nics/af_xdp.rst |   9 ++
> >>>doc/guides/nics/features/af_xdp.ini|   1 +
> >>>doc/guides/rel_notes/release_22_03.rst |   1 +
> >>>drivers/net/af_xdp/rte_eth_af_xdp.c| 210
> >> +++--
> >>>4 files changed, 207 insertions(+), 14 deletions(-)
> >>>
> >>> diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
> >>> index db02ea1984..eb4eab28a8 100644
> >>> --- a/doc/guides/nics/af_xdp.rst
> >>> +++ b/doc/guides/nics/af_xdp.rst
> >>> @@ -141,4 +141,13 @@ Limitations
> >>>  NAPI context from a watchdog timer instead of from softirqs. More
> >> information
> >>>  on this feature can be found at [1].
> >>>
> >>> +- **Secondary Processes**
> >>> +
> >>> +  Rx and Tx are not supported for secondary processes due to the
> single-
> >> producer
> >>> +  single-consumer nature of the AF_XDP rings. However other
> operations
> >> including
> >>> +  statistics retrieval are permitted.
> >>
> >> Hi Ciara,
> >>
> >> Isn't this limitation the same for all PMDs, in that the primary & secondary
> >> can't both Rx/Tx from the same queue at the same time?
> >> But the primary can initialize the PMD and the secondary can do the datapath,
> >> or doesn't af_xdp support multiple queues? If so, some queues can be used
> >> by the primary and some by the secondary for the datapath.
> >>
> >> Is there anything special about af_xdp that prevents it?
> >
> > Hi Ferruh,
> >
> > Thanks for the review.
> > Each queue of the PMD corresponds to a new AF_XDP socket.
> > Each socket has an RX and TX ring that is mmapped from the kernel to
> userspace and this mapping is only valid for the primary process.
> > I did not figure out a way to share that mapping with the secondary process
> successfully. Can you think of anything that might work?
> >
> 
> Does the application know the buffer address for the Rx/Tx, or is it
> abstracted behind the 'fd'?

The application knows the buffer address of the Rx/Tx rings.
We pass a pointer to these rings to the libbpf xsk_socket__create API, which 
sets up the mappings:
http://code.dpdk.org/dpdk/v21.11/source/drivers/net/af_xdp/rte_eth_af_xdp.c#L1291
Then later on in the datapath we operate directly on those rings:
http://code.dpdk.org/dpdk/v21.11/source/drivers/net/af_xdp/rte_eth_af_xdp.c#L268
The fd is used in the datapath, but just for the syscalls (recvfrom/poll/send).

> If only the 'fd' is used, this patch already converts the 'fd' between
> processes.
> cc'ed Anatoly, but what I understand is that after the MP fd conversion:
> Primary process: FD=x
> Secondary process: FD=y
> And both x & y point to the exact same socket on the kernel side.
> 
> At least this is how it works for the 'tap' interface, and that is
> why the fds are in the process_private area and converted between primary
> and secondary; I thought it would be the same for the xdp socket.
> 
> Did you test Rx/Tx in the secondary process after this patch?


RE: [PATCH v2] net/af_xdp: re-enable secondary process support

2022-02-06 Thread Loftus, Ciara
> 
> On 2/4/2022 12:54 PM, Ciara Loftus wrote:
> > Secondary process support had been disabled for the AF_XDP PMD
> > because there was no logic in place to share the AF_XDP socket
> > file descriptors between the processes. This commit introduces
> > this logic using the IPC APIs.
> >
> > Since AF_XDP rings are single-producer single-consumer, rx/tx
> > in the secondary process is disabled. However other operations
> > including retrieval of stats are permitted.
> >
> > Signed-off-by: Ciara Loftus 
> >
> > ---
> > v1 -> v2:
> > * Rebase to next-net
> >
> > RFC -> v1:
> > * Added newline to af_xdp.rst
> > * Fixed spelling errors
> > * Fixed potential NULL dereference in init_internals
> > * Fixed potential free of address-of expression in afxdp_mp_request_fds
> > ---
> >   doc/guides/nics/af_xdp.rst |   9 ++
> >   doc/guides/nics/features/af_xdp.ini|   1 +
> >   doc/guides/rel_notes/release_22_03.rst |   1 +
> >   drivers/net/af_xdp/rte_eth_af_xdp.c| 210
> +++--
> >   4 files changed, 207 insertions(+), 14 deletions(-)
> >
> > diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
> > index db02ea1984..eb4eab28a8 100644
> > --- a/doc/guides/nics/af_xdp.rst
> > +++ b/doc/guides/nics/af_xdp.rst
> > @@ -141,4 +141,13 @@ Limitations
> > NAPI context from a watchdog timer instead of from softirqs. More
> information
> > on this feature can be found at [1].
> >
> > +- **Secondary Processes**
> > +
> > +  Rx and Tx are not supported for secondary processes due to the single-
> producer
> > +  single-consumer nature of the AF_XDP rings. However other operations
> including
> > +  statistics retrieval are permitted.
> 
> Hi Ciara,
> 
> Isn't this limitation the same for all PMDs, in that the primary & secondary can't
> both Rx/Tx from the same queue at the same time?
> But the primary can initialize the PMD and the secondary can do the datapath,
> or doesn't af_xdp support multiple queues? If so, some queues can be used by
> the primary and some by the secondary for the datapath.
> 
> Is there anything special about af_xdp that prevents it?

Hi Ferruh,

Thanks for the review.
Each queue of the PMD corresponds to a new AF_XDP socket.
Each socket has an RX and TX ring that is mmapped from the kernel to userspace 
and this mapping is only valid for the primary process.
I did not figure out a way to share that mapping with the secondary process 
successfully. Can you think of anything that might work?

> 
> > +  The maximum number of queues permitted for PMDs operating in this
> model is 8
> > +  as this is the maximum number of fds that can be sent through the IPC
> APIs as
> > +  defined by RTE_MP_MAX_FD_NUM.
> > +
> > [1] https://lwn.net/Articles/837010/
> > diff --git a/doc/guides/nics/features/af_xdp.ini
> b/doc/guides/nics/features/af_xdp.ini
> > index 54b738e616..8e7e075aaf 100644
> > --- a/doc/guides/nics/features/af_xdp.ini
> > +++ b/doc/guides/nics/features/af_xdp.ini
> > @@ -9,4 +9,5 @@ Power mgmt address monitor = Y
> >   MTU update   = Y
> >   Promiscuous mode = Y
> >   Stats per queue  = Y
> > +Multiprocess aware   = Y
> >   x86-64   = Y
> > diff --git a/doc/guides/rel_notes/release_22_03.rst
> b/doc/guides/rel_notes/release_22_03.rst
> > index bf2e3f78a9..dfd2cbbccf 100644
> > --- a/doc/guides/rel_notes/release_22_03.rst
> > +++ b/doc/guides/rel_notes/release_22_03.rst
> > @@ -58,6 +58,7 @@ New Features
> >   * **Updated AF_XDP PMD**
> >
> > * Added support for libxdp >=v1.2.2.
> > +  * Re-enabled secondary process support. RX/TX is not supported.
> >
> >   * **Updated Cisco enic driver.**
> >
> > diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> b/drivers/net/af_xdp/rte_eth_af_xdp.c
> > index 1b6192fa44..407f6d8dbe 100644
> > --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> > +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> > @@ -80,6 +80,18 @@ RTE_LOG_REGISTER_DEFAULT(af_xdp_logtype,
> NOTICE);
> >
> >   #define ETH_AF_XDP_ETH_OVERHEAD
>   (RTE_ETHER_HDR_LEN + RTE_ETHER_CRC_LEN)
> >
> > +#define ETH_AF_XDP_MP_KEY "afxdp_mp_send_fds"
> > +
> > +static int afxdp_dev_count;
> > +
> > +/* Message header to synchronize fds via IPC */
> > +struct ipc_hdr {
> > +   char port_name[RTE_DEV_NAME_MAX_LEN];
> > +   /* The file descriptors are in the dedicated part
> > +* of the Unix message to be translated by the kernel.
> > +*/
> > +};
> > +
> >   struct xsk_umem_info {
> > struct xsk_umem *umem;
> > struct rte_ring *buf_ring;
> > @@ -147,6 +159,10 @@ struct pmd_internals {
> > struct pkt_tx_queue *tx_queues;
> >   };
> >
> > +struct pmd_process_private {
> > +   int rxq_xsk_fds[RTE_MAX_QUEUES_PER_PORT];
> > +};
> > +
> >   #define ETH_AF_XDP_IFACE_ARG  "iface"
> >   #define ETH_AF_XDP_START_QUEUE_ARG"start_queue"
> >   #define ETH_AF_XDP_QUEUE_COUNT_ARG
>   "queue_count"
> > @@ -795,11 +811,12 @@ static int
> >   eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
> >   {
> > 

RE: [PATCH v2] net/af_xdp: use libxdp if available

2022-01-27 Thread Loftus, Ciara
> 
> On Tue, Jan 25, 2022 at 07:20:43AM +, Ciara Loftus wrote:
> > AF_XDP support is deprecated in libbpf since v0.7.0 [1]. The
> > libxdp library now provides the functionality which once was in
> > libbpf and which the AF_XDP PMD relies on. This commit updates the
> > AF_XDP meson build to use the libxdp library if a version >= v1.2.2 is
> > available. If it is not available, only versions of libbpf prior to v0.7.0
> > are allowed, as they still contain the required AF_XDP functionality.
> >
> > libbpf still remains a dependency even if libxdp is present, as we
> > use libbpf APIs for program loading.
> >
> > The minimum required kernel version for libxdp for use with AF_XDP is
> v5.3.
> > For the library to be fully-featured, a kernel v5.10 or newer is
> > recommended. The full compatibility information can be found in the
> libxdp
> > README.
> >
> > v1.2.2 of libxdp includes an important fix required for linking with
> > DPDK which is why this version or greater is required. Meson uses
> > pkg-config to verify the version of libxdp on the system, so it is
> > necessary that the library is discoverable using pkg-config in order for
> > the PMD to use it. To verify this, you can run:
> > pkg-config --modversion libxdp
> >
> > [1] https://github.com/libbpf/libbpf/commit/277846bc6c15
> >
> > Signed-off-by: Ciara Loftus 
> 
> Hi Ciara,
> 
> couple of comments inline below.
> 
> /Bruce
> 
> > ---
> > v2:
> > * Set minimum libxdp version at v1.2.2
> >
> > RFC -> v1:
> > * Set minimum libxdp version at v1.3.0
> > * Don't provide alternative to discovery via pkg-config
> > * Add missing newline to end of file
> > ---
> >  doc/guides/nics/af_xdp.rst |  6 ++--
> >  doc/guides/rel_notes/release_22_03.rst |  4 +++
> >  drivers/net/af_xdp/compat.h|  4 +++
> >  drivers/net/af_xdp/meson.build | 39 +-
> >  drivers/net/af_xdp/rte_eth_af_xdp.c|  1 -
> >  5 files changed, 42 insertions(+), 12 deletions(-)
> >
> > diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
> > index c9d0e1ad6c..db02ea1984 100644
> > --- a/doc/guides/nics/af_xdp.rst
> > +++ b/doc/guides/nics/af_xdp.rst
> > @@ -43,9 +43,7 @@ Prerequisites
> >  This is a Linux-specific PMD, thus the following prerequisites apply:
> >
> >  *  A Linux Kernel (version > v4.18) with XDP sockets configuration enabled;
> > -*  libbpf (within kernel version > v5.1-rc4) with latest af_xdp support
> installed,
> > -   User can install libbpf via `make install_lib` && `make 
> > install_headers` in
> > -   /tools/lib/bpf;
> > +*  Both libxdp >=v1.2.2 and libbpf libraries installed, or, libbpf <=v0.6.0
> >  *  A Kernel bound interface to attach to;
> >  *  For need_wakeup feature, it requires kernel version later than v5.3-rc1;
> >  *  For PMD zero copy, it requires kernel version later than v5.4-rc1;
> > @@ -143,4 +141,4 @@ Limitations
> >NAPI context from a watchdog timer instead of from softirqs. More
> information
> >on this feature can be found at [1].
> >
> > -  [1] https://lwn.net/Articles/837010/
> > \ No newline at end of file
> > +  [1] https://lwn.net/Articles/837010/
> > diff --git a/doc/guides/rel_notes/release_22_03.rst
> b/doc/guides/rel_notes/release_22_03.rst
> > index 8a202ec4f4..ad7283df65 100644
> > --- a/doc/guides/rel_notes/release_22_03.rst
> > +++ b/doc/guides/rel_notes/release_22_03.rst
> > @@ -55,6 +55,10 @@ New Features
> >   Also, make sure to start the actual text at the margin.
> >   ===
> >
> > +* **Update AF_XDP PMD**
> > +
> > +  * Added support for libxdp >=v1.2.2.
> > +
> >
> >  Removed Items
> >  -
> > diff --git a/drivers/net/af_xdp/compat.h b/drivers/net/af_xdp/compat.h
> > index 3880dc7dd7..245df1b109 100644
> > --- a/drivers/net/af_xdp/compat.h
> > +++ b/drivers/net/af_xdp/compat.h
> > @@ -2,7 +2,11 @@
> >   * Copyright(c) 2020 Intel Corporation.
> >   */
> >
> > +#ifdef RTE_LIBRTE_AF_XDP_PMD_LIBXDP
> 
> This is a really long macro name. With meson builds we have largely moved
> away from using "RTE_LIBRTE_" as a prefix, and also have dropped "PMD"
> from
> names too. The global enable macro for AF_XDP driver is now
> "RTE_NET_AF_XDP" so I'd suggest this macro could be shortened to
> "RTE_NET_AF_XDP_LIBXDP".

+1

> 
> > +#include 
> > +#else
> >  #include 
> > +#endif
> >  #include 
> >  #include 
> >
> > diff --git a/drivers/net/af_xdp/meson.build
> b/drivers/net/af_xdp/meson.build
> > index 3ed2b29784..981d4c6087 100644
> > --- a/drivers/net/af_xdp/meson.build
> > +++ b/drivers/net/af_xdp/meson.build
> > @@ -9,19 +9,44 @@ endif
> >
> >  sources = files('rte_eth_af_xdp.c')
> >
> > +xdp_dep = dependency('libxdp', version : '>=1.2.2', required: false,
> method: 'pkg-config')
> >  bpf_dep = dependency('libbpf', required: false, method: 'pkg-config')
> >  if not bpf_dep.found()
> >  bpf_dep = cc.find_library('bpf', required: false)
> >  endif
> >
> > -if bpf_dep.found() and cc.h

Re: [dpdk-dev] [PATCH] net/af_xdp: use bpf link for XDP programs

2021-10-22 Thread Loftus, Ciara
> 
> On 10/14/2021 10:50 AM, Ciara Loftus wrote:
> > --- a/drivers/net/af_xdp/compat.h
> > +++ b/drivers/net/af_xdp/compat.h
> > @@ -2,9 +2,11 @@
> >* Copyright(c) 2020 Intel Corporation.
> >*/
> >
> > +#include 
> >   #include 
> >   #include 
> >   #include 
> > +#include 
> 
> Hi Ciara,
> 
> I am getting a build error because xdp/filter.h is missing [1]; where
> should that header be?
> And should meson recognize the missing header/library and behave
> accordingly, or is the build error expected?

My mistake. This header shouldn't be included. Thank you for catching this.
It was there to provide the macros BPF_MOV64_IMM and BPF_EXIT_INSN, but we can 
derive these from the bpf/bpf.h header and remove this dependency. I will 
implement this in the v2.
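
For illustration, deriving the two macros locally could look something like the 
sketch below. This is just a sketch on my part (not the actual v2 change), 
assuming struct bpf_insn and the BPF_* opcode constants from <linux/bpf.h>:

  #include <linux/bpf.h>

  #ifndef BPF_MOV64_IMM
  #define BPF_MOV64_IMM(DST, IMM)                         \
          ((struct bpf_insn) {                            \
                  .code  = BPF_ALU64 | BPF_MOV | BPF_K,   \
                  .dst_reg = DST,                         \
                  .src_reg = 0,                           \
                  .off   = 0,                             \
                  .imm   = IMM })
  #endif

  #ifndef BPF_EXIT_INSN
  #define BPF_EXIT_INSN()                                 \
          ((struct bpf_insn) {                            \
                  .code  = BPF_JMP | BPF_EXIT,            \
                  .dst_reg = 0,                           \
                  .src_reg = 0,                           \
                  .off   = 0,                             \
                  .imm   = 0 })
  #endif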

PS I noticed the PMD will not initialize since the "fix max Rx packet length" 
series. Probably an incorrect value for max_rx_pktlen in the PMD. I will look 
into it and provide a patch.

Thanks,
Ciara

> 
> 
> [1]
> In file included from ../drivers/net/af_xdp/rte_eth_af_xdp.c:42:
> ../drivers/net/af_xdp/compat.h:9:10: fatal error: xdp/filter.h: No such file 
> or
> directory
>  9 | #include 
>|


Re: [dpdk-dev] [PATCH] net/af_xdp: fix support of secondary process

2021-09-23 Thread Loftus, Ciara
> > > > >
> > > > > Doing basic operations like info_get or get_stats was broken
> > > > > in af_xdp PMD. The info_get would crash because dev->device
> > > > > was NULL in secondary process. Fix this by doing same initialization
> > > > > as af_packet and tap devices.
> > > > >
> > > > > The get_stats would crash because the XDP socket is not open in
> > > > > primary process. As a workaround don't query kernel for dropped
> > > > > packets when called from secondary process.
> > > > >
> > > > > Note: this does not address the other bug which is that transmitting
> > > > > in secondary process is broken because the send() in tx_kick
> > > > > will fail because XDP socket fd is not valid in secondary process.
> > > >
> > > > Hi Stephen,
> > > >
> > > > Apologies for the delayed reply, I was on vacation.
> > > >
> > > > In the Bugzilla report you suggest we:
> > > > "mark AF_XDP as broken in with primary/secondary
> > > > and return an error in probe in secondary process".
> > > > I agree with this suggestion. However with this patch we still permit
> > > secondary, and just make sure it doesn't crash for get_stats. Did you
> change
> > > your mind?
> > > > Personally, I would prefer to have primary/secondary either working
> 100%
> > > or else not allowed at all by throwing an error during probe. What do you
> > > think? Do you have a reason/use case to permit secondary processes
> despite
> > > some features not being available eg. full stats, tx?
> > > >
> > > > Thanks,
> > > > Ciara
> > >
> > > There are two cases where secondary is useful even if send/receive can't
> > > work from secondary process.
> > > The pdump and proc-info applications can work with these patches.
> > >
> > > I am using XDP over pdump as an easy way to get packets into the code
> for
> > > testing.
> > >
> > > The flag in the documentation doesn't have a "limited" version.
> > > If you want, I will send another patch to disable secondary support.
> >
> > Thanks for explaining. Since there are use cases for secondary, even if the
> functionality is limited, I don't think it should be disabled.
> > Since we can't flag it as 'limited' in the feature matrix, could you please 
> > add
> a note about the send/receive limitation in the AF_XDP PMD documentation
> in a v2? There are already a number of limitations listed, which you can add
> to.
> >
> > Thanks,
> > Ciara
> >
> > >
> > > Supporting secondary means adding a mechanism to pass the socket
> > > around.
> 
> Looking at this in more detail, my recommendation is:
> For the 21.11 release (marked as Fixes so it gets backported): have the
> driver return -ENOTSUP in the secondary process.
> 
> For 22.03, add real secondary support using rte_mp_msg to pass the necessary
> state to the secondary process. This includes the socket (and other memory
> regions?).
> 
> The pdump and proc-info cases are only useful for developer testing, and
> there are
> other ways to do the same thing.


+1 that sounds reasonable to me.
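
For the 21.11 change, I imagine the probe-time check being as simple as the 
sketch below (illustrative only; exact placement in the probe function to be 
worked out in the actual patch):

    /* Illustrative sketch: bail out of probe in a secondary process
     * until proper fd sharing is in place.
     */
    if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
        AF_XDP_LOG(ERR, "Secondary processes are not supported\n");
        return -ENOTSUP;
    }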

Thanks,
Ciara


Re: [dpdk-dev] [PATCH] net/af_xdp: fix support of secondary process

2021-09-20 Thread Loftus, Ciara
> 
> On Mon, 20 Sep 2021 13:23:57 +0000
> "Loftus, Ciara"  wrote:
> 
> > > -Original Message-
> > > From: dev  On Behalf Of Stephen Hemminger
> > > Sent: Friday 3 September 2021 17:15
> > > To: dev@dpdk.org
> > > Cc: Stephen Hemminger ;
> > > sta...@dpdk.org; xiaolong...@intel.com
> > > Subject: [dpdk-dev] [PATCH] net/af_xdp: fix support of secondary
> process
> > >
> > > Doing basic operations like info_get or get_stats was broken
> > > in af_xdp PMD. The info_get would crash because dev->device
> > > was NULL in secondary process. Fix this by doing same initialization
> > > as af_packet and tap devices.
> > >
> > > The get_stats would crash because the XDP socket is not open in
> > > primary process. As a workaround don't query kernel for dropped
> > > packets when called from secondary process.
> > >
> > > Note: this does not address the other bug which is that transmitting
> > > in secondary process is broken because the send() in tx_kick
> > > will fail because XDP socket fd is not valid in secondary process.
> >
> > Hi Stephen,
> >
> > Apologies for the delayed reply, I was on vacation.
> >
> > In the Bugzilla report you suggest we:
> > "mark AF_XDP as broken in with primary/secondary
> > and return an error in probe in secondary process".
> > I agree with this suggestion. However with this patch we still permit
> secondary, and just make sure it doesn't crash for get_stats. Did you change
> your mind?
> > Personally, I would prefer to have primary/secondary either working 100%
> or else not allowed at all by throwing an error during probe. What do you
> think? Do you have a reason/use case to permit secondary processes despite
> some features not being available eg. full stats, tx?
> >
> > Thanks,
> > Ciara
> 
> There are two cases where secondary is useful even if send/receive can't
> work from secondary process.
> The pdump and proc-info applications can work with these patches.
> 
> I am using XDP over pdump as an easy way to get packets into the code for
> testing.
> 
> The flag in the documentation doesn't have a "limited" version.
> If you want, I will send another patch to disable secondary support.

Thanks for explaining. Since there are use cases for secondary, even if the 
functionality is limited, I don't think it should be disabled.
Since we can't flag it as 'limited' in the feature matrix, could you please add 
a note about the send/receive limitation in the AF_XDP PMD documentation in a 
v2? There are already a number of limitations listed, which you can add to.

Thanks,
Ciara

> 
> Supporting secondary means adding a mechanism to pass the socket
> around.


Re: [dpdk-dev] [PATCH] net/af_xdp: fix support of secondary process

2021-09-20 Thread Loftus, Ciara



> -Original Message-
> From: dev  On Behalf Of Stephen Hemminger
> Sent: Friday 3 September 2021 17:15
> To: dev@dpdk.org
> Cc: Stephen Hemminger ;
> sta...@dpdk.org; xiaolong...@intel.com
> Subject: [dpdk-dev] [PATCH] net/af_xdp: fix support of secondary process
> 
> Doing basic operations like info_get or get_stats was broken
> in af_xdp PMD. The info_get would crash because dev->device
> was NULL in secondary process. Fix this by doing same initialization
> as af_packet and tap devices.
> 
> The get_stats would crash because the XDP socket is not open in
> primary process. As a workaround don't query kernel for dropped
> packets when called from secondary process.
> 
> Note: this does not address the other bug which is that transmitting
> in secondary process is broken because the send() in tx_kick
> will fail because XDP socket fd is not valid in secondary process.

Hi Stephen,

Apologies for the delayed reply, I was on vacation.

In the Bugzilla report you suggest we:
"mark AF_XDP as broken in with primary/secondary
and return an error in probe in secondary process".
I agree with this suggestion. However with this patch we still permit 
secondary, and just make sure it doesn't crash for get_stats. Did you change 
your mind?
Personally, I would prefer to have primary/secondary either working 100% or 
else not allowed at all by throwing an error during probe. What do you think? 
Do you have a reason/use case to permit secondary processes despite some 
features not being available eg. full stats, tx?

Thanks,
Ciara

> 
> Bugzilla ID: 805
> Fixes: f1debd77efaf ("net/af_xdp: introduce AF_XDP PMD")
> Cc: sta...@dpdk.org
> Cc: xiaolong...@intel.com
> Ciara Loftus 
> Qi Zhang 
> Anatoly Burakov 
> 
> Signed-off-by: Stephen Hemminger 
> ---
>  drivers/net/af_xdp/rte_eth_af_xdp.c | 17 +
>  1 file changed, 9 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> b/drivers/net/af_xdp/rte_eth_af_xdp.c
> index 74ffa4511284..70abc14fa753 100644
> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> @@ -860,7 +860,7 @@ eth_stats_get(struct rte_eth_dev *dev, struct
> rte_eth_stats *stats)
>   struct pkt_rx_queue *rxq;
>   struct pkt_tx_queue *txq;
>   socklen_t optlen;
> - int i, ret;
> + int i;
> 
>   for (i = 0; i < dev->data->nb_rx_queues; i++) {
>   optlen = sizeof(struct xdp_statistics);
> @@ -876,13 +876,12 @@ eth_stats_get(struct rte_eth_dev *dev, struct
> rte_eth_stats *stats)
>   stats->ibytes += stats->q_ibytes[i];
>   stats->imissed += rxq->stats.rx_dropped;
>   stats->oerrors += txq->stats.tx_dropped;
> - ret = getsockopt(xsk_socket__fd(rxq->xsk), SOL_XDP,
> - XDP_STATISTICS, &xdp_stats, &optlen);
> - if (ret != 0) {
> - AF_XDP_LOG(ERR, "getsockopt() failed for
> XDP_STATISTICS.\n");
> - return -1;
> - }
> - stats->imissed += xdp_stats.rx_dropped;
> +
> + /* The socket fd is not valid in secondary process */
> + if (rte_eal_process_type() != RTE_PROC_SECONDARY &&
> + getsockopt(xsk_socket__fd(rxq->xsk), SOL_XDP,
> +XDP_STATISTICS, &xdp_stats, &optlen) == 0)
> + stats->imissed += xdp_stats.rx_dropped;
> 
>   stats->opackets += stats->q_opackets[i];
>   stats->obytes += stats->q_obytes[i];
> @@ -1799,7 +1798,9 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device
> *dev)
>   AF_XDP_LOG(ERR, "Failed to probe %s\n", name);
>   return -EINVAL;
>   }
> + /* TODO: reconnect socket from primary */
>   eth_dev->dev_ops = &ops;
> + eth_dev->device = &dev->device;
>   rte_eth_dev_probing_finish(eth_dev);
>   return 0;
>   }
> --
> 2.30.2



Re: [dpdk-dev] [PATCH v2] net/af_xdp: fix zero copy Tx queue drain

2021-08-25 Thread Loftus, Ciara
> 
> Call xsk_ring_prod__submit() before kick_tx() so that the kernel
> consumer sees the updated state of Tx ring. Otherwise, Tx packets are
> stuck in the ring until the next call to af_xdp_tx_zc().
> 
> Fixes: d8a210774e1d ("net/af_xdp: support unaligned umem chunks")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Baruch Siach 

Thanks for respinning. I tested it out and it looks good to me.

Acked-by: Ciara Loftus 

> --
> v2:
> 
>   Don't call xsk_ring_prod__submit() when kick_tx() is only used to
>   drain the completion queue (Ciara Loftus)
> ---
>  drivers/net/af_xdp/rte_eth_af_xdp.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> b/drivers/net/af_xdp/rte_eth_af_xdp.c
> index 74ffa4511284..9bea0a895a3e 100644
> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> @@ -527,7 +527,6 @@ af_xdp_tx_zc(void *queue, struct rte_mbuf **bufs,
> uint16_t nb_pkts)
> 
>   if (!xsk_ring_prod__reserve(&txq->tx, 1, &idx_tx)) {
>   rte_pktmbuf_free(local_mbuf);
> - kick_tx(txq, cq);
>   goto out;
>   }
> 
> @@ -551,10 +550,9 @@ af_xdp_tx_zc(void *queue, struct rte_mbuf **bufs,
> uint16_t nb_pkts)
>   tx_bytes += mbuf->pkt_len;
>   }
> 
> - kick_tx(txq, cq);
> -
>  out:
>   xsk_ring_prod__submit(&txq->tx, count);
> + kick_tx(txq, cq);
> 
>   txq->stats.tx_pkts += count;
>   txq->stats.tx_bytes += tx_bytes;
> --
> 2.32.0



Re: [dpdk-dev] [PATCH] net/af_xdp: fix zero copy Tx queue drain

2021-08-25 Thread Loftus, Ciara
> 
> Call xsk_ring_prod__submit() before kick_tx() so that the kernel
> consumer sees the updated state of Tx ring. Otherwise, Tx packets are
> stuck in the ring until the next call to af_xdp_tx_zc().
> 
> Fixes: d8a210774e1d ("net/af_xdp: support unaligned umem chunks")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Baruch Siach 
> ---
>  drivers/net/af_xdp/rte_eth_af_xdp.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> b/drivers/net/af_xdp/rte_eth_af_xdp.c
> index 74ffa4511284..81998d86b4aa 100644
> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> @@ -502,10 +502,11 @@ af_xdp_tx_zc(void *queue, struct rte_mbuf
> **bufs, uint16_t nb_pkts)
> 
>   if (mbuf->pool == umem->mb_pool) {
>   if (!xsk_ring_prod__reserve(&txq->tx, 1, &idx_tx)) {
> + xsk_ring_prod__submit(&txq->tx, count);

We cannot submit 'count' to the tx ring both here and at 'out'. We could end up 
submitting too many.

>   kick_tx(txq, cq);

The purpose of this kick is not necessarily to transmit new descriptors but to 
drain the completion queue. A lack of space in the completion queue may be what 
is preventing the kernel from transmitting the existing tx buffers, which in 
turn causes the xsk_ring_prod__reserve() in userspace to fail.
We are not trying to transmit new descriptors here.

>   if (!xsk_ring_prod__reserve(&txq->tx, 1,
>   &idx_tx))
> - goto out;
> + goto out_skip_tx;
>   }
>   desc = xsk_ring_prod__tx_desc(&txq->tx, idx_tx);
>   desc->len = mbuf->pkt_len;
> @@ -527,7 +528,6 @@ af_xdp_tx_zc(void *queue, struct rte_mbuf **bufs,
> uint16_t nb_pkts)
> 
>   if (!xsk_ring_prod__reserve(&txq->tx, 1, &idx_tx)) {
>   rte_pktmbuf_free(local_mbuf);
> - kick_tx(txq, cq);
>   goto out;
>   }
> 
> @@ -551,11 +551,11 @@ af_xdp_tx_zc(void *queue, struct rte_mbuf
> **bufs, uint16_t nb_pkts)
>   tx_bytes += mbuf->pkt_len;
>   }
> 
> - kick_tx(txq, cq);
> -
>  out:
>   xsk_ring_prod__submit(&txq->tx, count);
> + kick_tx(txq, cq);

I think this change is valid. We should kick here after the submit.
Thanks for the patch. Could you please spin a v2 if you agree with the above?

Thanks,
Ciara

> 
> +out_skip_tx:
>   txq->stats.tx_pkts += count;
>   txq->stats.tx_bytes += tx_bytes;
>   txq->stats.tx_dropped += nb_pkts - count;
> --
> 2.32.0



Re: [dpdk-dev] dpdk-stable-20.11.1 compile with all examples fails on openSUSE Leap 15.3

2021-06-23 Thread Loftus, Ciara
> 
> Hi,
> 
> I can't compile dpdk-stable-20.11.1 on openSUSE Leap 15.3.
> 
> kernel: 5.3.18-59.5-default
> 
> Configuration options:
> Message:
> =
> Libraries Enabled
> =
> 
> libs:
> kvargs, telemetry, eal, ring, rcu, mempool, mbuf, net,
> meter, ethdev, pci, cmdline, metrics, hash, timer, acl,
> bbdev, bitratestats, cfgfile, compressdev, cryptodev, distributor,
> efd, eventdev,
> gro, gso, ip_frag, jobstats, kni, latencystats, lpm, member,
> power, pdump, rawdev, regexdev, rib, reorder, sched, security,
> stack, vhost, ipsec, fib, port, table, pipeline, flow_classify,
> bpf, graph, node,
> 
> Message:
> ===
> Drivers Enabled
> ===
> 
> common:
> cpt, dpaax, iavf, octeontx, octeontx2, sfc_efx, mlx5, qat,
> 
> bus:
> dpaa, fslmc, ifpga, pci, vdev, vmbus,
> mempool:
> bucket, dpaa, dpaa2, octeontx, octeontx2, ring, stack,
> net:
> af_packet, af_xdp, ark, atlantic, avp, axgbe, bond, bnx2x,
> bnxt, cxgbe, dpaa, dpaa2, e1000, ena, enetc, enic,
> failsafe, fm10k, i40e, hinic, hns3, iavf, ice, igc,
> ixgbe, kni, liquidio, memif, mlx4, mlx5, netvsc, nfp,
> null, octeontx, octeontx2, pcap, pfe, qede, ring, sfc,
> softnic, tap, thunderx, txgbe, vdev_netvsc, vhost, virtio, vmxnet3,
> 
> raw:
> dpaa2_cmdif, dpaa2_qdma, ioat, ntb, octeontx2_dma, octeontx2_ep,
> skeleton,
> crypto:
> bcmfs, caam_jr, ccp, dpaa_sec, dpaa2_sec, nitrox, null, octeontx,
> octeontx2, openssl, scheduler, virtio,
> compress:
> octeontx, zlib,
> regex:
> mlx5, octeontx2,
> vdpa:
> ifc, mlx5,
> event:
> dlb, dlb2, dpaa, dpaa2, octeontx2, opdl, skeleton, sw,
> dsw, octeontx,
> baseband:
> null, turbo_sw, fpga_lte_fec, fpga_5gnr_fec, acc100,
> 
> Message:
> =
> Content Skipped
> =
> 
> libs:
> 
> drivers:
> common/mvep:missing dependency, "libmusdk"
> net/ipn3ke: missing dependency, "libfdt"
> net/mvneta: missing dependency, "libmusdk"
> net/mvpp2:  missing dependency, "libmusdk"
> net/nfb:missing dependency, "libnfb"
> net/szedata2:   missing dependency, "libsze2"
> raw/ifpga:  missing dependency, "libfdt"
> crypto/aesni_gcm:   missing dependency, "libIPSec_MB"
> crypto/aesni_mb:missing dependency, "libIPSec_MB"
> crypto/armv8:   missing dependency, "libAArch64crypto"
> crypto/kasumi:  missing dependency, "libIPSec_MB"
> crypto/mvsam:   missing dependency, "libmusdk"
> crypto/snow3g:  missing dependency, "libIPSec_MB"
> crypto/zuc: missing dependency, "libIPSec_MB"
> compress/isal:  missing dependency, "libisal"
> 
> 
> Build targets in project: 1116
> 
> Found ninja-1.10.0 at /usr/bin/ninja
> 
> The ninja build fails:
> ninja: Entering directory `build'
> [896/2816] Compiling C object
> 'drivers/a715181@@tmp_rte_net_af_xdp@sta
> /net_af_xdp_rte_eth_af_xdp.c.o'
> FAILED:
> drivers/a715181@@tmp_rte_net_af_xdp@sta/net_af_xdp_rte_eth_af_xdp
> .c.o
> 
> cc -Idrivers/a715181@@tmp_rte_net_af_xdp@sta -Idrivers -I../drivers
> -Idrivers/net/af_xdp -I../drivers/net/af_xdp -Ilib/librte_ethdev
> -I../lib/librte_ethdev -I. -I.. -Iconfig -I../config
> -Ilib/librte_eal/include -I../lib/librte_eal/include
> -Ilib/librte_eal/linux/include -I../lib/librte_eal/linux/include
> -Ilib/librte_eal/x86/include -I../lib/librte_eal/x86/include
> -Ilib/librte_eal/common -I../lib/librte_eal/common -Ilib/librte_eal
> -I../lib/librte_eal -Ilib/librte_kvargs -I../lib/librte_kvargs
> -Ilib/librte_metrics -I../lib/librte_metrics -Ilib/librte_telemetry
> -I../lib/librte_telemetry -Ilib/librte_net -I../lib/librte_net
> -Ilib/librte_mbuf -I../lib/librte_mbuf -Ilib/librte_mempool
> -I../lib/librte_mempool -Ilib/librte_ring -I../lib/librte_ring
> -Ilib/librte_meter -I../lib/librte_meter -Idrivers/bus/pci
> -I../drivers/bus/pci -I../drivers/bus/pci/linux -Ilib/librte_pci
> -I../lib/librte_pci -Idrivers/bus/vdev -I../drivers/bus/vdev
> -I/usr/local/include -fdiagnostics-color=always -pipe
> -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -O3 -include rte_config.h
> -Wextra -Wcast-qual -Wdeprecated -Wformat -Wformat-nonliteral
> -Wformat-security -Wmissing-declarations -Wmissing-prototypes
> -Wnested-externs -Wold-style-definition -Wpointer-arith -Wsign-compare
> -Wstrict-prototypes -Wundef -Wwrite-strings -Wno-missing-field-initializers
> -D_GNU_SOURCE -fPIC -march=native -DALLOW_EXPERIMENTAL_API
> -DALLOW_INTERNAL_API -Wno-format-truncation -MD -MQ
> 'drivers/a715181@
> @tmp_rte_net_af_xdp@sta/net_af_xdp_rte_eth_af_xdp.c.o' -MF
> 'drivers/a715181@
> @tmp_rte_net_af_xdp@sta/net_af_xdp_rte_eth_af_xdp.c.o.d' -o
> 'drivers/a715181@@tmp_rte_net_af_xdp@sta/net_af_xdp_rte_eth_af_xd
> p.c.o' -c
> ../drivers/net/af_xdp/rte_eth_af_xdp.c
> In file included

Re: [dpdk-dev] [PATCH v1 2/7] net/af_xdp: add power monitor support

2021-06-02 Thread Loftus, Ciara
> Subject: [PATCH v1 2/7] net/af_xdp: add power monitor support
> 
> Implement support for .get_monitor_addr in AF_XDP driver.
> 
> Signed-off-by: Anatoly Burakov 

Thanks Anatoly. LGTM.

Acked-by: Ciara Loftus 

> ---
>  drivers/net/af_xdp/rte_eth_af_xdp.c | 25 +
>  1 file changed, 25 insertions(+)
> 
> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> b/drivers/net/af_xdp/rte_eth_af_xdp.c
> index eb5660a3dc..dfbf74ea53 100644
> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> @@ -37,6 +37,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  #include "compat.h"
> 
> @@ -788,6 +789,29 @@ eth_dev_configure(struct rte_eth_dev *dev)
>   return 0;
>  }
> 
> +static int
> +eth_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond
> *pmc)
> +{
> + struct pkt_rx_queue *rxq = rx_queue;
> + unsigned int *prod = rxq->rx.producer;
> + const uint32_t cur_val = rxq->rx.cached_prod; /* use cached value */
> +
> + /* watch for changes in producer ring */
> + pmc->addr = (void*)prod;
> +
> + /* store current value */
> + pmc->val = cur_val;
> + pmc->mask = (uint32_t)~0; /* mask entire uint32_t value */
> +
> + /* AF_XDP producer ring index is 32-bit */
> + pmc->size = sizeof(uint32_t);
> +
> + /* this requires an inverted check */
> + pmc->invert = 1;
> +
> + return 0;
> +}
> +
>  static int
>  eth_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
>  {
> @@ -1448,6 +1472,7 @@ static const struct eth_dev_ops ops = {
>   .link_update = eth_link_update,
>   .stats_get = eth_stats_get,
>   .stats_reset = eth_stats_reset,
> + .get_monitor_addr = eth_get_monitor_addr
>  };
> 
>  /** parse busy_budget argument */
> --
> 2.25.1



Re: [dpdk-dev] [21.08 PATCH v1 2/2] net/af_xdp: add power monitor support

2021-05-12 Thread Loftus, Ciara
> 
> Implement support for .get_monitor_addr in AF_XDP driver.
> 
> Signed-off-by: Anatoly Burakov 
> ---
>  drivers/net/af_xdp/rte_eth_af_xdp.c | 52 -
>  1 file changed, 37 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> b/drivers/net/af_xdp/rte_eth_af_xdp.c
> index 0c91a40c4a..a4b4a4b75d 100644
> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> @@ -37,6 +37,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  #include "compat.h"
> 
> @@ -778,6 +779,26 @@ eth_dev_configure(struct rte_eth_dev *dev)
>   return 0;
>  }
> 
> +static int
> +eth_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond
> *pmc)
> +{
> + struct pkt_rx_queue *rxq = rx_queue;
> + unsigned int *prod = rxq->fq.producer;
> + const uint32_t cur_val = rxq->fq.cached_prod; /* use cached value

The above two lines should use the producer from the rx ring 'rx' instead of 
the fill queue 'fq': rxq->fq.* should instead be rxq->rx.*
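
In other words, the corrected lines would read (this is how they appear in the 
later v1 2/7 revision of this patch):

    struct pkt_rx_queue *rxq = rx_queue;
    unsigned int *prod = rxq->rx.producer;
    const uint32_t cur_val = rxq->rx.cached_prod; /* use cached value */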

Other than that, and once the additional whitespace you identified below is 
removed, it LGTM.

Thanks,
Ciara

> */
> +
> + /* watch for changes in producer ring */
> + pmc->addr = (void*)prod;
> +
> + /* store current value */
> + pmc->val = cur_val;
> + pmc->mask = (uint32_t)~0; /* mask entire uint32_t value */
> +
> + /* AF_XDP producer ring index is 32-bit */
> + pmc->size = sizeof(uint32_t);
> +
> + return 0;
> +}
> +
>  static int
>  eth_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
>  {
> @@ -1423,21 +1444,22 @@ eth_dev_promiscuous_disable(struct
> rte_eth_dev *dev)
>  }
> 
>  static const struct eth_dev_ops ops = {
> - .dev_start = eth_dev_start,
> - .dev_stop = eth_dev_stop,
> - .dev_close = eth_dev_close,
> - .dev_configure = eth_dev_configure,
> - .dev_infos_get = eth_dev_info,
> - .mtu_set = eth_dev_mtu_set,
> - .promiscuous_enable = eth_dev_promiscuous_enable,
> - .promiscuous_disable = eth_dev_promiscuous_disable,
> - .rx_queue_setup = eth_rx_queue_setup,
> - .tx_queue_setup = eth_tx_queue_setup,
> - .rx_queue_release = eth_queue_release,
> - .tx_queue_release = eth_queue_release,
> - .link_update = eth_link_update,
> - .stats_get = eth_stats_get,
> - .stats_reset = eth_stats_reset,
> +.dev_start = eth_dev_start,
> +.dev_stop = eth_dev_stop,
> +.dev_close = eth_dev_close,
> +.dev_configure = eth_dev_configure,
> +.dev_infos_get = eth_dev_info,
> +.mtu_set = eth_dev_mtu_set,
> +.promiscuous_enable = eth_dev_promiscuous_enable,
> +.promiscuous_disable = eth_dev_promiscuous_disable,
> +.rx_queue_setup = eth_rx_queue_setup,
> +.tx_queue_setup = eth_tx_queue_setup,
> +.rx_queue_release = eth_queue_release,
> +.tx_queue_release = eth_queue_release,
> +.link_update = eth_link_update,
> +.stats_get = eth_stats_get,
> +.stats_reset = eth_stats_reset,
> +.get_monitor_addr = eth_get_monitor_addr
>  };
> 
>  /** parse busy_budget argument */
> --
> 2.25.1



Re: [dpdk-dev] [PATCH v2 3/3] net/af_xdp: preferred busy polling

2021-03-09 Thread Loftus, Ciara
> On 3/9/2021 10:19 AM, Ciara Loftus wrote:
> > This commit introduces support for preferred busy polling
> > to the AF_XDP PMD. This feature aims to improve single-core
> > performance for AF_XDP sockets under heavy load.
> >
> > A new vdev arg is introduced called 'busy_budget' whose default
> > value is 64. busy_budget is the value supplied to the kernel
> > with the SO_BUSY_POLL_BUDGET socket option and represents the
> > busy-polling NAPI budget. To set the budget to a different value
> > eg. 256:
> >
> > --vdev=net_af_xdp0,iface=eth0,busy_budget=256
> >
> > Preferred busy polling is enabled by default provided a kernel with
> > version >= v5.11 is in use. To disable it, set the budget to zero.
> >
> > The following settings are also strongly recommended to be used in
> > conjunction with this feature:
> >
> > echo 2 | sudo tee /sys/class/net/eth0/napi_defer_hard_irqs
> > echo 20 | sudo tee /sys/class/net/eth0/gro_flush_timeout
> >
> > .. where eth0 is the interface being used by the PMD.
> >
> > Signed-off-by: Ciara Loftus 
> 
> <...>
> 
> > --- a/doc/guides/rel_notes/release_21_05.rst
> > +++ b/doc/guides/rel_notes/release_21_05.rst
> > @@ -70,6 +70,10 @@ New Features
> > * Added command to display Rx queue used descriptor count.
> >   ``show port (port_id) rxq (queue_id) desc used count``
> >
> > +* **Updated the AF_XDP driver.**
> > +
> > +  * Added support for preferred busy polling.
> > +
> >
> 
> Can you please move the update after above the testpmd updates?
> For more details the expected order is in the section comment.
> 
> > +static int
> > +configure_preferred_busy_poll(struct pkt_rx_queue *rxq)
> > +{
> > +   int sock_opt = 1;
> > +   int fd = xsk_socket__fd(rxq->xsk);
> > +   int ret = 0;
> > +
> > +   ret = setsockopt(fd, SOL_SOCKET, SO_PREFER_BUSY_POLL,
> > +   (void *)&sock_opt, sizeof(sock_opt));
> > +   if (ret < 0) {
> > +   AF_XDP_LOG(DEBUG, "Failed to set
> SO_PREFER_BUSY_POLL\n");
> > +   goto err_prefer;
> > +   }
> > +
> > +   sock_opt = ETH_AF_XDP_DFLT_BUSY_TIMEOUT;
> > +   ret = setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL, (void
> *)&sock_opt,
> > +   sizeof(sock_opt));
> > +   if (ret < 0) {
> > +   AF_XDP_LOG(DEBUG, "Failed to set SO_BUSY_POLL\n");
> 
> [1]
> 
> > +   goto err_timeout;
> > +   }
> > +
> > +   sock_opt = rxq->busy_budget;
> > +   ret = setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL_BUDGET,
> > +   (void *)&sock_opt, sizeof(sock_opt));
> > +   if (ret < 0) {
> > +   AF_XDP_LOG(DEBUG, "Failed to set
> SO_BUSY_POLL_BUDGET\n");
> 
> In [1] above and here, shouldn't the function return an error even if the
> rollback is successful?
> I am thinking of a case where an invalid 'busy_budget' is provided, like a
> very big number or a negative value.

How about introducing a check when parsing the argument at init and failing 
then, instead of here?
In that case, if we fail here it should not be due to an invalid value; it 
would be due to insufficient permissions.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/core/sock.c?h=v5.11#n1150
In that case I think issuing a log, rolling back and continuing with setup 
would be best, instead of returning an error, aborting and forcing the user 
to explicitly disable busy polling via busy_budget=0 in order to get the PMD 
to initialize.
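
For illustration, the parse-time check could look something like the sketch 
below. The handler name and the upper bound are just placeholders on my part, 
not the final implementation (strtol() needs <stdlib.h>):

    /* Hypothetical kvargs handler: validate busy_budget while parsing so an
     * out-of-range value fails at init rather than at setsockopt() time.
     */
    static int
    parse_budget_arg(const char *key __rte_unused, const char *value,
                     void *extra_args)
    {
        int *budget = extra_args;
        char *end;
        long val = strtol(value, &end, 10);

        if (*end != '\0' || val < 0 || val > UINT16_MAX) {
            AF_XDP_LOG(ERR, "Invalid busy_budget %s, must be 0 to %d\n",
                       value, UINT16_MAX);
            return -EINVAL;
        }

        *budget = (int)val;
        return 0;
    }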

> 
> > +   } else {
> > +   AF_XDP_LOG(INFO, "Busy polling budget set to: %u\n",
> > +   rxq->busy_budget);
> > +   return 0;
> > +   }
> > +
> > +   /* setsockopt failure - attempt to restore xsk to default state and
> > +* proceed without busy polling support.
> > +*/
> > +   sock_opt = 0;
> > +   ret = setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL, (void
> *)&sock_opt,
> > +   sizeof(sock_opt));
> > +   if (ret < 0) {
> > +   AF_XDP_LOG(ERR, "Failed to unset SO_BUSY_POLL\n");
> > +   return -1;
> > +   }
> > +
> > +err_timeout:
> > +   sock_opt = 0;
> > +   ret = setsockopt(fd, SOL_SOCKET, SO_PREFER_BUSY_POLL,
> > +   (void *)&sock_opt, sizeof(sock_opt));
> > +   if (ret < 0) {
> > +   AF_XDP_LOG(ERR, "Failed to unset
> SO_PREFER_BUSY_POLL\n");
> > +   return -1;
> > +   }
> > +
> > +err_prefer:
> > +   rxq->busy_budget = 0;
> > +   return 0;
> > +}
> > +
> 
> <...>


Re: [dpdk-dev] [PATCH v2 1/3] net/af_xdp: allow bigger batch sizes

2021-03-09 Thread Loftus, Ciara
> 
> On 3/9/2021 10:19 AM, Ciara Loftus wrote:
> > Prior to this commit, the maximum batch sizes for zero-copy and copy-
> mode
> > rx and copy-mode tx were set to 32. Apart from zero-copy tx, the user
> > could never rx/tx any more than 32 packets at a time and without
> inspecting
> > the code the user wouldn't be aware of this.
> >
> > This commit removes these upper limits placed on the user and instead
> > sets an internal batch size equal to the default ring size (2048). Batches
> > larger than this are still processed, however they are split into smaller
> > batches similar to how it's done in other drivers. This is necessary
> > because some arrays used during rx/tx need to be sized at compile-time.
> >
> > Allowing a larger batch size allows for fewer batches and thus larger bulk
> > operations, fewer ring accesses and fewer syscalls which should yield
> > improved performance.
> >
> > Signed-off-by: Ciara Loftus 
> > ---
> >   drivers/net/af_xdp/rte_eth_af_xdp.c | 67
> -
> >   1 file changed, 57 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> b/drivers/net/af_xdp/rte_eth_af_xdp.c
> > index 3957227bf0..be524e4784 100644
> > --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> > +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> > @@ -66,8 +66,8 @@ RTE_LOG_REGISTER(af_xdp_logtype,
> pmd.net.af_xdp, NOTICE);
> >   #define ETH_AF_XDP_DFLT_START_QUEUE_IDX   0
> >   #define ETH_AF_XDP_DFLT_QUEUE_COUNT   1
> >
> > -#define ETH_AF_XDP_RX_BATCH_SIZE   32
> > -#define ETH_AF_XDP_TX_BATCH_SIZE   32
> > +#define ETH_AF_XDP_RX_BATCH_SIZE
>   XSK_RING_CONS__DEFAULT_NUM_DESCS
> > +#define ETH_AF_XDP_TX_BATCH_SIZE
>   XSK_RING_CONS__DEFAULT_NUM_DESCS
> >
> 
> Just to double check, could there be a library version where these macros are
> not defined, and should that be checked?

There can't be a library version with AF_XDP support without these macros, as 
they've been around since the very beginning:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1cad078842396
We only build the PMD if xsk.h is available, and since these macros have been 
in the file since it has existed, we're safe.



Re: [dpdk-dev] [PATCH 3/3] net/af_xdp: preferred busy polling

2021-03-08 Thread Loftus, Ciara
> >
> > On 2/24/2021 11:18 AM, Ciara Loftus wrote:
> > > This commit introduces support for preferred busy polling
> > > to the AF_XDP PMD. This feature aims to improve single-core
> > > performance for AF_XDP sockets under heavy load.
> > >
> > > A new vdev arg is introduced called 'busy_budget' whose default
> > > value is 64. busy_budget is the value supplied to the kernel
> > > with the SO_BUSY_POLL_BUDGET socket option and represents the
> > > busy-polling NAPI budget. To set the budget to a different value
> > > eg. 256:
> > >
> > > --vdev=net_af_xdp0,iface=eth0,busy_budget=256
> > >
> > > Preferred busy polling is enabled by default provided a kernel with
> > > version >= v5.11 is in use. To disable it, set the budget to zero.
> > >
> > > The following settings are also strongly recommended to be used in
> > > conjunction with this feature:
> > >
> > > echo 2 | sudo tee /sys/class/net/eth0/napi_defer_hard_irqs
> > > echo 20 | sudo tee /sys/class/net/eth0/gro_flush_timeout
> > >
> > > .. where eth0 is the interface being used by the PMD.
> > >
> > > Signed-off-by: Ciara Loftus 
> > > ---
> > >   doc/guides/nics/af_xdp.rst  | 38 -
> > >   drivers/net/af_xdp/compat.h | 13 +
> > >   drivers/net/af_xdp/rte_eth_af_xdp.c | 85
> > -
> > >   3 files changed, 121 insertions(+), 15 deletions(-)
> >
> > Can you please update the release notes too to announce the feature?
> 
> Will do.
> 
> >
> > <...>
> >
> >
> > > @@ -39,3 +39,16 @@ create_shared_socket(struct xsk_socket **xsk_ptr
> > __rte_unused,
> > >   return -1;
> > >   }
> > >   #endif
> > > +
> > > +#ifdef XDP_USE_NEED_WAKEUP
> > > +static int
> > > +syscall_needed(struct xsk_ring_prod *q, uint32_t busy_budget)
> > > +{
> > > + return xsk_ring_prod__needs_wakeup(q) | busy_budget;
> > > +}
> > > +#else
> > > +syscall_needed(struct xsk_ring_prod *q __rte_unused, uint32_t
> > busy_budget)
> > > +{
> > > + return busy_budget;
> > > +}
> >
> > Is the return type missing in the definition?
> 
> Yes. Thanks for spotting this.
> 
> >
> > Also, for the case when both 'XDP_USE_NEED_WAKEUP' & 'SO_PREFER_BUSY_POLL'
> > are undefined, this function will always return '0', but the current
> > implementation doesn't know this at compile time and the compiler can't
> > optimize for it. Do you think it makes sense to do this optimization?
> 
> It makes sense assuming the compile environment and run environment are
> the same.
> However you make a valid point below. If the environments are different,
> we can't make this optimization because we can’t rely on the presence of the
> flags alone to tell us if these features are supported. More below.
> 
> >
> > <...>
> >
> > > @@ -1628,8 +1670,22 @@ rte_pmd_af_xdp_probe(struct
> rte_vdev_device
> > *dev)
> > >   return -EINVAL;
> > >   }
> > >
> > > +#ifdef SO_PREFER_BUSY_POLL
> > > + busy_budget = busy_budget == -1 ?
> > ETH_AF_XDP_DFLT_BUSY_BUDGET :
> > > + busy_budget;
> > > + if (!busy_budget)
> > > + AF_XDP_LOG(ERR, "Preferred busy polling disabled\n");
> >
> > Is this an error case? What do you think changing the log level to DEBUG or
> > INFO?
> 
> +1 for INFO
> 
> >
> > Also, how will these compile-time flags work if the compile environment and
> > run environment kernel versions are different and incompatible?
> 
> This is a valid point. Right now if XDP_USE_NEED_WAKEUP is defined we
> assume the functionality is available in the kernel. If it's not, socket 
> creation
> will fail and we abort. Perhaps we should retry socket creation without the
> flag if we get this failure. And record if support is available in a runtime
> variable. I'll look at adding this as another patch to the v2 series.

Hi Ferruh,

I looked at this a little more. For the v2 I'll make sure busy poll can work in 
these environments with different compile-time and run-time kernels and use 
setsockopt() to detect support in the kernel.
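
Roughly, I imagine the runtime detection being a probe like the sketch below 
(illustrative only, not the final v2 code; SO_PREFER_BUSY_POLL may need a local 
fallback definition when building against older headers):

    /* Illustrative probe: ask the kernel whether it recognises
     * SO_PREFER_BUSY_POLL on this socket.
     */
    static int
    busy_poll_supported(int fd)
    {
        int sock_opt = 1;

        if (setsockopt(fd, SOL_SOCKET, SO_PREFER_BUSY_POLL,
                       (void *)&sock_opt, sizeof(sock_opt)) < 0)
            return 0; /* older kernel: option not recognised */

        /* undo the probe so the socket is left in its default state */
        sock_opt = 0;
        setsockopt(fd, SOL_SOCKET, SO_PREFER_BUSY_POLL,
                   (void *)&sock_opt, sizeof(sock_opt));
        return 1;
    }
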
Since it will require significant changes and validation I'll submit a separate 
series ensuring the same for the other existing flags (XDP_USE_NEED_WAKEUP / 
XDP_UMEM_UNALIGNED_CHUNK_FLAG / shared umem).

Thanks,
Ciara

> 
> >
> > Overall, could it be possible to detect the support at runtime via
> > 'setsockopt()' without compile-time macros and eliminate the compile-time
> > flags? Does it make sense?
> 
> I think this can be done. It should allow applications compiled on older
> kernels without SO_PREFER_BUSY_POLL to run on newer kernels with the
> feature.
> I will tackle this in the v2.
> 
> Thanks for your feedback!
> 
> Ciara



Re: [dpdk-dev] [PATCH 3/3] net/af_xdp: preferred busy polling

2021-03-04 Thread Loftus, Ciara
> 
> On 2/24/2021 11:18 AM, Ciara Loftus wrote:
> > This commit introduces support for preferred busy polling
> > to the AF_XDP PMD. This feature aims to improve single-core
> > performance for AF_XDP sockets under heavy load.
> >
> > A new vdev arg is introduced called 'busy_budget' whose default
> > value is 64. busy_budget is the value supplied to the kernel
> > with the SO_BUSY_POLL_BUDGET socket option and represents the
> > busy-polling NAPI budget. To set the budget to a different value
> > eg. 256:
> >
> > --vdev=net_af_xdp0,iface=eth0,busy_budget=256
> >
> > Preferred busy polling is enabled by default provided a kernel with
> > version >= v5.11 is in use. To disable it, set the budget to zero.
> >
> > The following settings are also strongly recommended to be used in
> > conjunction with this feature:
> >
> > echo 2 | sudo tee /sys/class/net/eth0/napi_defer_hard_irqs
> > echo 20 | sudo tee /sys/class/net/eth0/gro_flush_timeout
> >
> > .. where eth0 is the interface being used by the PMD.
> >
> > Signed-off-by: Ciara Loftus 
> > ---
> >   doc/guides/nics/af_xdp.rst  | 38 -
> >   drivers/net/af_xdp/compat.h | 13 +
> >   drivers/net/af_xdp/rte_eth_af_xdp.c | 85
> -
> >   3 files changed, 121 insertions(+), 15 deletions(-)
> 
> Can you please update the release notes too to announce the feature?

Will do.

> 
> <...>
> 
> 
> > @@ -39,3 +39,16 @@ create_shared_socket(struct xsk_socket **xsk_ptr
> __rte_unused,
> > return -1;
> >   }
> >   #endif
> > +
> > +#ifdef XDP_USE_NEED_WAKEUP
> > +static int
> > +syscall_needed(struct xsk_ring_prod *q, uint32_t busy_budget)
> > +{
> > +   return xsk_ring_prod__needs_wakeup(q) | busy_budget;
> > +}
> > +#else
> > +syscall_needed(struct xsk_ring_prod *q __rte_unused, uint32_t
> busy_budget)
> > +{
> > +   return busy_budget;
> > +}
> 
> Is the return type missing in the definition?

Yes. Thanks for spotting this.

> 
> Also for the case when both 'XDP_USE_NEED_WAKEUP' &
> 'SO_PREFER_BUSY_POLL' this
> function will always return '0', but current implementation doesn't know this
> in
> the compile time and compiler can't optimize for it, do you think does it make
> sense to do this optimization?

It makes sense assuming the compile environment and run environment are the 
same.
However you make a valid point below. If the environments are different, we 
can't make this optimization because we can’t rely on the presence of the flags 
alone to tell us if these features are supported. More below.

> 
> <...>
> 
> > @@ -1628,8 +1670,22 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device
> *dev)
> > return -EINVAL;
> > }
> >
> > +#ifdef SO_PREFER_BUSY_POLL
> > +   busy_budget = busy_budget == -1 ?
> ETH_AF_XDP_DFLT_BUSY_BUDGET :
> > +   busy_budget;
> > +   if (!busy_budget)
> > +   AF_XDP_LOG(ERR, "Preferred busy polling disabled\n");
> 
> Is this an error case? What do you think changing the log level to DEBUG or
> INFO?

+1 for INFO

> 
> Also how these compile time flags will work if the compiled environment and
> run
> environment kernel version are different and incompatible?

This is a valid point. Right now if XDP_USE_NEED_WAKEUP is defined we assume 
the functionality is available in the kernel. If it's not, socket creation will 
fail and we abort. Perhaps we should retry socket creation without the flag if 
we get this failure. And record if support is available in a runtime variable. 
I'll look at adding this as another patch to the v2 series.
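Something along these lines (sketch only, untested; the runtime flag name below 
is just a placeholder):

	ret = xsk_socket__create(&rxq->xsk, internals->if_name,
			rxq->xsk_queue_idx, rxq->umem->umem,
			&rxq->rx, &txq->tx, &cfg);
	if (ret && (cfg.bind_flags & XDP_USE_NEED_WAKEUP)) {
		/* the running kernel may not support need_wakeup:
		 * retry without the flag and remember the outcome */
		cfg.bind_flags &= ~XDP_USE_NEED_WAKEUP;
		ret = xsk_socket__create(&rxq->xsk, internals->if_name,
				rxq->xsk_queue_idx, rxq->umem->umem,
				&rxq->rx, &txq->tx, &cfg);
		if (ret == 0)
			internals->use_need_wakeup = 0; /* placeholder */
	}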

> 
> Overall can it be possible to detect the support on runtime via 'setsockopt()'
> without compile time macros and eliminate the compile time flags? Does it
> make
> sense?

I think this can be done. It should allow applications compiled on older 
kernels without SO_PREFER_BUSY_POLL to run on newer kernels with the feature.
I will tackle this in the v2.

Thanks for your feedback!

Ciara



Re: [dpdk-dev] [PATCH 1/3] net/af_xdp: Increase max batch size to 512

2021-03-03 Thread Loftus, Ciara
> 
> On 2/24/2021 11:18 AM, Ciara Loftus wrote:
> > Prior to this the max size was 32 which was unnecessarily
> > small.
> 
> Can you please describe the impact? Why changed from 32 to 512?
> I assume this is to improve the performance but can you please explicitly
> document it in the commit log?

Indeed - improved performance due to bulk operations and fewer ring accesses 
and syscalls.
The value 512 was arbitrary. I will change this to the default ring size as 
defined by libbpf (2048) in v2.
Will update the commit log with this info.

> 
> > Also enforce the max batch size for TX for both
> > copy and zero copy modes. Prior to this only copy mode
> > enforced the max size.
> >
> 
> By enforcing, the PMD ignores the user provided burst value if it is more than
> PMS supported MAX, and this ignoring is done in silent. Also there is no way
> to
> discover this MAX value without checking the code.
> 
> Overall, why this max values are required at all? After quick check I can see
> they are used for some bulk operations, which I assume can be eliminated,
> what
> do you think?

We need to size some arrays at compile time with this max value.

Instead of removing the bulk operations, which may impact performance, how about 
taking an approach where we split batches that are > 2048 into smaller batches 
and still handle all the packets, instead of discarding those > 2048? Something 
like what's done in ixgbe, for example:
http://code.dpdk.org/dpdk/v21.02/source/drivers/net/ixgbe/ixgbe_rxtx.c#L318
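Roughly like this (sketch only, untested; ETH_AF_XDP_TX_BATCH_SIZE and 
af_xdp_tx_batch() are placeholders for whatever max and burst function we end 
up with):

static uint16_t
eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
{
	uint16_t nb_tx = 0;

	while (nb_tx < nb_pkts) {
		uint16_t n = RTE_MIN(nb_pkts - nb_tx,
				ETH_AF_XDP_TX_BATCH_SIZE);
		uint16_t ret = af_xdp_tx_batch(queue, &bufs[nb_tx], n);

		nb_tx += ret;
		/* stop early if the ring is full and report what was sent */
		if (ret < n)
			break;
	}

	return nb_tx;
}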

Thanks,
Ciara

> 
> > Signed-off-by: Ciara Loftus 
> 
> <...>


Re: [dpdk-dev] [PATCH][v4] net/af_xdp: optimize RX path by removing the unneeded allocation mbuf

2020-11-25 Thread Loftus, Ciara
> Subject: [dpdk-dev] [PATCH][v4] net/af_xdp: optimize RX path by removing
> the unneeded allocation mbuf
> 
> when receive packets, the max bunch number of mbuf are allocated
> if hardware does not receive the max bunch number packets, it
> will free redundancy mbuf, this is low performance
> 
> so optimize rx performance, by allocating number of mbuf based on
> result of xsk_ring_cons__peek, to avoid to redundancy allocation,
> and free mbuf when receive packets
> 
> and rx cached_cons must be rollbacked if fail to allocating mbuf,
> found by Ciara Loftus
> 
> Signed-off-by: Li RongQing 
> Signed-off-by: Dongsheng Rong 

Thanks for the v4.

Acked-by: Ciara Loftus 

> ---
> V2: rollback rx cached_cons if mbuf failed to be allocated
> V3: add comment when rollback rx cached_cons
> V4: fix the comment
> 
>  drivers/net/af_xdp/rte_eth_af_xdp.c | 73 ++---
>  1 file changed, 35 insertions(+), 38 deletions(-)
> 
> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> b/drivers/net/af_xdp/rte_eth_af_xdp.c
> index 2c7892bd7..d7dd7d125 100644
> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> @@ -255,28 +255,32 @@ af_xdp_rx_zc(void *queue, struct rte_mbuf
> **bufs, uint16_t nb_pkts)
>   struct xsk_umem_info *umem = rxq->umem;
>   uint32_t idx_rx = 0;
>   unsigned long rx_bytes = 0;
> - int rcvd, i;
> + int i;
>   struct rte_mbuf *fq_bufs[ETH_AF_XDP_RX_BATCH_SIZE];
> 
> - /* allocate bufs for fill queue replenishment after rx */
> - if (rte_pktmbuf_alloc_bulk(umem->mb_pool, fq_bufs, nb_pkts)) {
> - AF_XDP_LOG(DEBUG,
> - "Failed to get enough buffers for fq.\n");
> - return 0;
> - }
> + nb_pkts = xsk_ring_cons__peek(rx, nb_pkts, &idx_rx);
> 
> - rcvd = xsk_ring_cons__peek(rx, nb_pkts, &idx_rx);
> -
> - if (rcvd == 0) {
> + if (nb_pkts == 0) {
>  #if defined(XDP_USE_NEED_WAKEUP)
>   if (xsk_ring_prod__needs_wakeup(fq))
>   (void)poll(rxq->fds, 1, 1000);
>  #endif
> 
> - goto out;
> + return 0;
> + }
> +
> + /* allocate bufs for fill queue replenishment after rx */
> + if (rte_pktmbuf_alloc_bulk(umem->mb_pool, fq_bufs, nb_pkts)) {
> + AF_XDP_LOG(DEBUG,
> + "Failed to get enough buffers for fq.\n");
> + /* rollback cached_cons which is added by
> +  * xsk_ring_cons__peek
> +  */
> + rx->cached_cons -= nb_pkts;
> + return 0;
>   }
> 
> - for (i = 0; i < rcvd; i++) {
> + for (i = 0; i < nb_pkts; i++) {
>   const struct xdp_desc *desc;
>   uint64_t addr;
>   uint32_t len;
> @@ -301,20 +305,14 @@ af_xdp_rx_zc(void *queue, struct rte_mbuf
> **bufs, uint16_t nb_pkts)
>   rx_bytes += len;
>   }
> 
> - xsk_ring_cons__release(rx, rcvd);
> -
> - (void)reserve_fill_queue(umem, rcvd, fq_bufs, fq);
> + xsk_ring_cons__release(rx, nb_pkts);
> + (void)reserve_fill_queue(umem, nb_pkts, fq_bufs, fq);
> 
>   /* statistics */
> - rxq->stats.rx_pkts += rcvd;
> + rxq->stats.rx_pkts += nb_pkts;
>   rxq->stats.rx_bytes += rx_bytes;
> 
> -out:
> - if (rcvd != nb_pkts)
> - rte_mempool_put_bulk(umem->mb_pool, (void
> **)&fq_bufs[rcvd],
> -  nb_pkts - rcvd);
> -
> - return rcvd;
> + return nb_pkts;
>  }
>  #else
>  static uint16_t
> @@ -326,7 +324,7 @@ af_xdp_rx_cp(void *queue, struct rte_mbuf **bufs,
> uint16_t nb_pkts)
>   struct xsk_ring_prod *fq = &rxq->fq;
>   uint32_t idx_rx = 0;
>   unsigned long rx_bytes = 0;
> - int rcvd, i;
> + int i;
>   uint32_t free_thresh = fq->size >> 1;
>   struct rte_mbuf *mbufs[ETH_AF_XDP_RX_BATCH_SIZE];
> 
> @@ -334,20 +332,24 @@ af_xdp_rx_cp(void *queue, struct rte_mbuf
> **bufs, uint16_t nb_pkts)
>   (void)reserve_fill_queue(umem,
> ETH_AF_XDP_RX_BATCH_SIZE,
>NULL, fq);
> 
> - if (unlikely(rte_pktmbuf_alloc_bulk(rxq->mb_pool, mbufs, nb_pkts)
> != 0))
> - return 0;
> -
> - rcvd = xsk_ring_cons__peek(rx, nb_pkts, &idx_rx);
> - if (rcvd == 0) {
> + nb_pkts = xsk_ring_cons__peek(rx, nb_pkts, &idx_rx);
> + if (nb_pkts == 0) {
>  #if defined(XDP_USE_NEED_WAKEUP)
>   if (xsk_ring_prod__needs_wakeup(fq))
>   (void)poll(rxq->fds, 1, 1000);
>  #endif
> + return 0;
> + }
> 
> - goto out;
> + if (unlikely(rte_pktmbuf_alloc_bulk(rxq->mb_pool, mbufs,
> nb_pkts))) {
> + /* rollback cached_cons which is added by
> +  * xsk_ring_cons__peek
> +  */
> + rx->cached_cons -= nb_pkts;
> + return 0;
>   }
> 
> - for (i = 0; i < rcvd; i++) {
> + for (i = 0; i < nb_pkts; i++) {
>   const struct xdp_desc *desc;
> 

Re: [dpdk-dev] [PATCH][v3] net/af_xdp: optimize RX path by removing the unneeded allocation mbuf

2020-11-24 Thread Loftus, Ciara
> 
> when receive packets, the max bunch number of mbuf are allocated
> if hardware does not receive the max bunch number packets, it
> will free redundancy mbuf, this is low performance
> 
> so optimize rx performance, by allocating number of mbuf based on
> result of xsk_ring_cons__peek, to avoid to redundancy allocation,
> and free mbuf when receive packets
> 
> and rx cached_cons must be rollbacked if fail to allocating mbuf,
> found by Ciara Loftus
> 
> Signed-off-by: Li RongQing 
> Signed-off-by: Dongsheng Rong 
> ---
> 
> V2: rollback rx cached_cons if mbuf failed to be allocated
> V3: add comment when rollback rx cached_cons
> we should create a function for rollback as suggested by Ciara Loftus,
> like xsk_ring_cons__cancel, but this function should be in kernel,
> and I will send it to kernel
> 
>  drivers/net/af_xdp/rte_eth_af_xdp.c | 73 ++---
>  1 file changed, 35 insertions(+), 38 deletions(-)
> 
> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> b/drivers/net/af_xdp/rte_eth_af_xdp.c
> index 2c7892bd7..69a4d54a3 100644
> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> @@ -255,28 +255,32 @@ af_xdp_rx_zc(void *queue, struct rte_mbuf
> **bufs, uint16_t nb_pkts)
>   struct xsk_umem_info *umem = rxq->umem;
>   uint32_t idx_rx = 0;
>   unsigned long rx_bytes = 0;
> - int rcvd, i;
> + int i;
>   struct rte_mbuf *fq_bufs[ETH_AF_XDP_RX_BATCH_SIZE];
> 
> - /* allocate bufs for fill queue replenishment after rx */
> - if (rte_pktmbuf_alloc_bulk(umem->mb_pool, fq_bufs, nb_pkts)) {
> - AF_XDP_LOG(DEBUG,
> - "Failed to get enough buffers for fq.\n");
> - return 0;
> - }
> + nb_pkts = xsk_ring_cons__peek(rx, nb_pkts, &idx_rx);
> 
> - rcvd = xsk_ring_cons__peek(rx, nb_pkts, &idx_rx);
> -
> - if (rcvd == 0) {
> + if (nb_pkts == 0) {
>  #if defined(XDP_USE_NEED_WAKEUP)
>   if (xsk_ring_prod__needs_wakeup(fq))
>   (void)poll(rxq->fds, 1, 1000);
>  #endif
> 
> - goto out;
> + return 0;
> + }
> +
> + /* allocate bufs for fill queue replenishment after rx */
> + if (rte_pktmbuf_alloc_bulk(umem->mb_pool, fq_bufs, nb_pkts)) {
> + AF_XDP_LOG(DEBUG,
> + "Failed to get enough buffers for fq.\n");
> + /* rollback cached_cons which is added by
> +  * xsk_ring_prod__needs_wakeup
> +  */

Thanks for adding the comment.
There's a small mistake here.
The function in which cached_cons is added is xsk_ring_cons__peek.
Could you please submit a v4 with this change?
Thanks!
Ciara


> + rx->cached_cons -= nb_pkts;
> + return 0;
>   }
> 
> - for (i = 0; i < rcvd; i++) {
> + for (i = 0; i < nb_pkts; i++) {
>   const struct xdp_desc *desc;
>   uint64_t addr;
>   uint32_t len;
> @@ -301,20 +305,14 @@ af_xdp_rx_zc(void *queue, struct rte_mbuf
> **bufs, uint16_t nb_pkts)
>   rx_bytes += len;
>   }
> 
> - xsk_ring_cons__release(rx, rcvd);
> -
> - (void)reserve_fill_queue(umem, rcvd, fq_bufs, fq);
> + xsk_ring_cons__release(rx, nb_pkts);
> + (void)reserve_fill_queue(umem, nb_pkts, fq_bufs, fq);
> 
>   /* statistics */
> - rxq->stats.rx_pkts += rcvd;
> + rxq->stats.rx_pkts += nb_pkts;
>   rxq->stats.rx_bytes += rx_bytes;
> 
> -out:
> - if (rcvd != nb_pkts)
> - rte_mempool_put_bulk(umem->mb_pool, (void
> **)&fq_bufs[rcvd],
> -  nb_pkts - rcvd);
> -
> - return rcvd;
> + return nb_pkts;
>  }
>  #else
>  static uint16_t
> @@ -326,7 +324,7 @@ af_xdp_rx_cp(void *queue, struct rte_mbuf **bufs,
> uint16_t nb_pkts)
>   struct xsk_ring_prod *fq = &rxq->fq;
>   uint32_t idx_rx = 0;
>   unsigned long rx_bytes = 0;
> - int rcvd, i;
> + int i;
>   uint32_t free_thresh = fq->size >> 1;
>   struct rte_mbuf *mbufs[ETH_AF_XDP_RX_BATCH_SIZE];
> 
> @@ -334,20 +332,24 @@ af_xdp_rx_cp(void *queue, struct rte_mbuf
> **bufs, uint16_t nb_pkts)
>   (void)reserve_fill_queue(umem,
> ETH_AF_XDP_RX_BATCH_SIZE,
>NULL, fq);
> 
> - if (unlikely(rte_pktmbuf_alloc_bulk(rxq->mb_pool, mbufs, nb_pkts)
> != 0))
> - return 0;
> -
> - rcvd = xsk_ring_cons__peek(rx, nb_pkts, &idx_rx);
> - if (rcvd == 0) {
> + nb_pkts = xsk_ring_cons__peek(rx, nb_pkts, &idx_rx);
> + if (nb_pkts == 0) {
>  #if defined(XDP_USE_NEED_WAKEUP)
>   if (xsk_ring_prod__needs_wakeup(fq))
>   (void)poll(rxq->fds, 1, 1000);
>  #endif
> + return 0;
> + }
> 
> - goto out;
> + if (unlikely(rte_pktmbuf_alloc_bulk(rxq->mb_pool, mbufs,
> nb_pkts))) {
> + /* rollback cached_cons which is added by
> +  * xsk_ring_prod__needs_wakeup
> +  */
> +  

Re: [dpdk-dev] [dpdk-stable] [PATCH] net/af_xdp: fix 32-bit build for older kernels

2020-11-16 Thread Loftus, Ciara
> 
> On 11/12/2020 4:35 PM, Ciara Loftus wrote:
> > 'uint64_t' is used to hold pointers in multiple locations in the
> > copy-mode code (used for kernels before 5.4). For a 32-bit build
> > this assumption is wrong and results in build errors. This commit
> > replaces such instances of 'uint64_t' with 'uintptr_t'.
> >
> > While the copy-mode code will now compile for 32-bit, the PMD is
> > not expected to work and will fail at initialisation due to some
> > limitations in the kernel that were subsequently removed in v5.4.
> > Add a note to the docs to flag this limitation.
> >
> > Fixes: f1debd77efaf ("net/af_xdp: introduce AF_XDP PMD")
> > Fixes: d8a210774e1d ("net/af_xdp: support unaligned umem chunks")
> > Cc: sta...@dpdk.org
> >
> > Signed-off-by: Ciara Loftus 
> > ---
> >   doc/guides/nics/af_xdp.rst  | 1 +
> >   drivers/net/af_xdp/rte_eth_af_xdp.c | 6 +++---
> >   2 files changed, 4 insertions(+), 3 deletions(-)
> >
> > diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
> > index 052e59a3ae..5ed24374f8 100644
> > --- a/doc/guides/nics/af_xdp.rst
> > +++ b/doc/guides/nics/af_xdp.rst
> > @@ -50,6 +50,7 @@ This is a Linux-specific PMD, thus the following
> prerequisites apply:
> >   *  For PMD zero copy, it requires kernel version later than v5.4-rc1;
> >   *  For shared_umem, it requires kernel version v5.10 or later and libbpf
> version
> >  v0.2.0 or later.
> > +*  For 32-bit OS, a kernel with version 5.4 or later is required.
> >
> 
> +1 to doc update
> 
> >   Set up an af_xdp interface
> >   -
> > diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> b/drivers/net/af_xdp/rte_eth_af_xdp.c
> > index 4076ff797c..75ff1c00b2 100644
> > --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> > +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> > @@ -349,7 +349,7 @@ af_xdp_rx_cp(void *queue, struct rte_mbuf
> **bufs, uint16_t nb_pkts)
> >
> > for (i = 0; i < rcvd; i++) {
> > const struct xdp_desc *desc;
> > -   uint64_t addr;
> > +   uintptr_t addr;
> > uint32_t len;
> > void *pkt;
> >
> > @@ -402,7 +402,7 @@ pull_umem_cq(struct xsk_umem_info *umem, int
> size, struct xsk_ring_cons *cq)
> > n = xsk_ring_cons__peek(cq, size, &idx_cq);
> >
> > for (i = 0; i < n; i++) {
> > -   uint64_t addr;
> > +   uintptr_t addr;
> > addr = *xsk_ring_cons__comp_addr(cq, idx_cq++);
> 
> Hi Ciara,
> 
> As far as I can see the API 'xsk_ring_cons__comp_addr()' returns fixed size
> variable ('__u64'),
> and when the PMD is compiled for 32bit, won't it be assigning a 64bit variable
> to the 32bit storage.

Correct. However, we can assume the higher 32 bits are zero in this case.
The 'addr' we are consuming via this API will be one which we previously 
enqueued to the buf_ring, and we always cast to (void *) on enqueue.
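For reference, the enqueue side currently looks roughly like this when the 
buf_ring is first populated in xdp_umem_configure() (paraphrased from the 
existing code), so the stored values are pointer-sized to begin with:

	for (i = 0; i < ETH_AF_XDP_NUM_BUFFERS; i++)
		rte_ring_enqueue(umem->buf_ring,
				 (void *)(i * ETH_AF_XDP_FRAME_SIZE));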

> 
> I guess libbpf also needs to be adjusted for the 32bit support, what about
> making PMD changes after libbpf changed?

I'm not sure whether this is planned but maybe it makes sense to wait and see 
rather than relying on assumptions above.

> 
> >   #if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
> > addr = xsk_umem__extract_addr(addr);
> > @@ -1005,7 +1005,7 @@ xsk_umem_info *xdp_umem_configure(struct
> pmd_internals *internals,
> > char ring_name[RTE_RING_NAMESIZE];
> > char mz_name[RTE_MEMZONE_NAMESIZE];
> > int ret;
> > -   uint64_t i;
> > +   uintptr_t i;
> >
> 
> Not sure on this one, 'i' seems not to hold a pointer but index, and result of
> calculation cast to "void *", I assume intention is to prevent calculation
> result to be 64 bit to cover the case "void *" is 4 bytes, for that what do 
> you
> think making variable uint32_t?

Do you suggest something like:
#ifdef RTE_ARCH_64
   uint64_t i;
#else
   uint32_t i;
#endif

I can submit a v2 with just the doc update and hold off on the other changes 
until the necessary changes to libbpf are in place. Let me know what you think.

Thanks,
Ciara


Re: [dpdk-dev] [PATCH][v2] net/af_xdp: avoid to unnecessary allocation and free mbuf in rx path

2020-11-15 Thread Loftus, Ciara
> 
> On 10/14/2020 1:15 PM, Li,Rongqing wrote:
> >
> >
> >> -Original Message-----
> >> From: Loftus, Ciara [mailto:ciara.lof...@intel.com]
> >> Sent: Friday, October 02, 2020 12:24 AM
> >> To: Li,Rongqing 
> >> Cc: dev@dpdk.org
> >> Subject: RE: [PATCH][v2] net/af_xdp: avoid to unnecessary allocation and
> free
> >> mbuf in rx path
> >>
> >>>
> >>> when receive packets, the max bunch number of mbuf are allocated if
> >>> hardware does not receive the max bunch number packets, it will free
> >>> redundancy mbuf, that is low-performance
> >>>
> >>> so optimize rx performance, by allocating number of mbuf based on
> >>> result of xsk_ring_cons__peek, to avoid to redundancy allocation, and
> >>> free mbuf when receive packets
> >>
> >> Hi,
> >>
> >> Thanks for the patch and fixing the issue I raised.
> >
> > Thanks for your finding
> >
> >> With my testing so far I haven't measured an improvement in
> performance
> >> with the patch.
> >> Do you have data to share which shows the benefit of your patch?
> >>
> >> I agree the potential excess allocation of mbufs for the fill ring is not 
> >> the
> most
> >> optimal, but if doing it does not significantly impact the performance I
> would be
> >> in favour of keeping that approach versus touching the cached_cons
> outside of
> >> libbpf which is unconventional.
> >>
> >> If a benefit can be shown and we proceed with the approach, I would
> suggest
> >> creating a new function for the cached consumer rollback eg.
> >> xsk_ring_cons_cancel() or similar, and add a comment describing what it
> does.
> >>
> >
> > Thanks for your test.
> >
> > Yes, it has benefit
> >
> > We first see this issue when do some send performance, topo is like below
> >
> > Qemu with vhost-user ->ovs--->xdp interface
> >
> > Qemu sends udp packets, xdp has not packets to receive, but it must be
> polled by ovs, and xdp must allocated/free mbuf unnecessary, with this
> packet, we has about 5% benefit for sending, this depends on flow table
> complexity
> >
> >
> > When do rx benchmark, if packets per batch is reaching about 32, the
> benefit is very little.
> > If packets per batch is far less than 32, we can see the cycle per packet is
> reduced obviously
> >
> 
> Hi Li, Ciara,
> 
> What is the status of this patch, is the patch justified and is a new versions
> requested/expected?


Apologies for the delay, I missed your reply Li.
With the data you've provided I think the patch is justified.
I think the rollback requires some explanation in the code as it may not be 
immediately clear what is happening.
I suggest a v3 with either a comment above the rollback, or a new function as 
described in my previous mail, also with a comment.

Thanks for the patch.

Ciara



Re: [dpdk-dev] DPDK AF_XDP test results.

2020-11-11 Thread Loftus, Ciara
> 
> Hi All,
>   There is a test plan for the performance tests of different 
> scenarios
> with AF_XDP in the dpdk docs
> https://doc.dpdk.org/dts/test_plans/af_xdp_test_plan.html , but I am not
> able to find the test  results of the same. Can anyone please help in sharing
> the results link for these tests. We are starting to look at AF_XDP with DPDK
> and these results will really help us in deciding few points. Also it will be 
> really
> appreciated if we have some comparative study of using SR-IOV + DPDK PMD
> vs XDP + AF_XDP DPDK PMD. Thanks for the help in this regard.

Hi Souvik,

I'm not aware of any published results either; however, the plan outlines the 
steps to gather the results on your own platform, if that's a possibility for 
you. With some small changes, comparative tests can be set up for SR-IOV.
If you choose to run some tests locally I'd be happy to help if you encounter 
any issues.

Thanks,
Ciara

> 
> --
> Thanks,
> Souvik


Re: [dpdk-dev] [PATCH] net/af_xdp: do not use fixed size storage for pointer

2020-11-09 Thread Loftus, Ciara
> 
> 'uint64_t' is used to hold the pointer, for 32-bits build this
> assumption is wrong and giving following build error:
> 
> rte_eth_af_xdp.c: In function ‘xdp_umem_configure’:
> rte_eth_af_xdp.c:970:15:
> error: cast to pointer from integer of different size
>[-Werror=int-to-pointer-cast]
>   970 |   base_addr = (void *)get_base_addr(mb_pool, &align);
>   |   ^
> 
> Replacing the 'uint64_t' return type of the 'get_base_addr()' to the
> 'uintptr_t'.
> Although not sure if the overall logic supports the 32-bits, using
> 'uintptr_t' should be safe both for 64/32 bits.
> 
> Fixes: d8a210774e1d ("net/af_xdp: support unaligned umem chunks")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Ferruh Yigit 
> 
> ---
> 
> Hi Ciara,
> 
> I am not sure if 32-bit is supported for the af_xdp, but even not does
> this change make sense for the 64-bits?

Hi Ferruh,

LGTM. I've tested it for 64bit and all looks good to me.

Tested-by: Ciara Loftus 

I've been looking into 32-bit compatibility and will submit a patch for at 
least the docs when I've verified what works.

Thanks,
Ciara

> ---
>  drivers/net/af_xdp/rte_eth_af_xdp.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> b/drivers/net/af_xdp/rte_eth_af_xdp.c
> index 4076ff797c..2c7892bd7e 100644
> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> @@ -910,13 +910,13 @@ eth_link_update(struct rte_eth_dev *dev
> __rte_unused,
>  }
> 
>  #if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
> -static inline uint64_t get_base_addr(struct rte_mempool *mp, uint64_t
> *align)
> +static inline uintptr_t get_base_addr(struct rte_mempool *mp, uint64_t
> *align)
>  {
>   struct rte_mempool_memhdr *memhdr;
> - uint64_t memhdr_addr, aligned_addr;
> + uintptr_t memhdr_addr, aligned_addr;
> 
>   memhdr = STAILQ_FIRST(&mp->mem_list);
> - memhdr_addr = (uint64_t)memhdr->addr;
> + memhdr_addr = (uintptr_t)memhdr->addr;
>   aligned_addr = memhdr_addr & ~(getpagesize() - 1);
>   *align = memhdr_addr - aligned_addr;
> 
> --
> 2.26.2



Re: [dpdk-dev] [PATCH] net/af_xdp: Don't allow umem sharing for xsks with same netdev, qid

2020-10-13 Thread Loftus, Ciara
> On 10/8/2020 10:17 AM, Ciara Loftus wrote:
> > Supporting this would require locks, which would impact the performance
> of
> > the more typical cases - xsks with different qids and netdevs.
> >
> > Signed-off-by: Ciara Loftus 
> > Fixes: 74b46340e2d4 ("net/af_xdp: support shared UMEM")
> 
> Hi Ciara,
> 
> 'check-git-log.sh' script is giving some errors, can you please fix them, you
> can run script as: "./devtools/check-git-log.sh -n1"
> 
> And can you please give some sample in the description what is not
> supported now?
> 
> Also does this require any documentation update, as kind of limitation or
> known
> issues?

Thanks for the suggestions Ferruh, all valid. I've implemented them in the v2.

Ciara


Re: [dpdk-dev] [PATCH][v2] net/af_xdp: avoid to unnecessary allocation and free mbuf in rx path

2020-10-01 Thread Loftus, Ciara
> 
> when receive packets, the max bunch number of mbuf are allocated
> if hardware does not receive the max bunch number packets, it
> will free redundancy mbuf, that is low-performance
> 
> so optimize rx performance, by allocating number of mbuf based on
> result of xsk_ring_cons__peek, to avoid to redundancy allocation,
> and free mbuf when receive packets

Hi,

Thanks for the patch and fixing the issue I raised.
With my testing so far I haven't measured an improvement in performance with 
the patch.
Do you have data to share which shows the benefit of your patch?

I agree the potential excess allocation of mbufs for the fill ring is not the 
most optimal, but if doing it does not significantly impact the performance I 
would be in favour of keeping that approach versus touching the cached_cons 
outside of libbpf which is unconventional.

If a benefit can be shown and we proceed with the approach, I would suggest 
creating a new function for the cached consumer rollback eg. 
xsk_ring_cons_cancel() or similar, and add a comment describing what it does.
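Something like the below, ideally living in libbpf alongside 
xsk_ring_cons__peek()/__release() (sketch only):

static inline void
xsk_ring_cons__cancel(struct xsk_ring_cons *cons, uint32_t nb)
{
	/* undo the effect of xsk_ring_cons__peek() when the peeked
	 * descriptors cannot be consumed after all */
	cons->cached_cons -= nb;
}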

Thanks,
Ciara

> 
> V2: rollback rx cached_cons if mbuf failed to be allocated
> 
> Signed-off-by: Li RongQing 
> Signed-off-by: Dongsheng Rong 
> ---
>  drivers/net/af_xdp/rte_eth_af_xdp.c | 67 ---
> --
>  1 file changed, 29 insertions(+), 38 deletions(-)
> 
> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> b/drivers/net/af_xdp/rte_eth_af_xdp.c
> index 01f462b46..e04fa43f6 100644
> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> @@ -251,28 +251,29 @@ af_xdp_rx_zc(void *queue, struct rte_mbuf
> **bufs, uint16_t nb_pkts)
>   struct xsk_umem_info *umem = rxq->umem;
>   uint32_t idx_rx = 0;
>   unsigned long rx_bytes = 0;
> - int rcvd, i;
> + int i;
>   struct rte_mbuf *fq_bufs[ETH_AF_XDP_RX_BATCH_SIZE];
> 
> - /* allocate bufs for fill queue replenishment after rx */
> - if (rte_pktmbuf_alloc_bulk(umem->mb_pool, fq_bufs, nb_pkts)) {
> - AF_XDP_LOG(DEBUG,
> - "Failed to get enough buffers for fq.\n");
> - return 0;
> - }
> + nb_pkts = xsk_ring_cons__peek(rx, nb_pkts, &idx_rx);
> 
> - rcvd = xsk_ring_cons__peek(rx, nb_pkts, &idx_rx);
> -
> - if (rcvd == 0) {
> + if (nb_pkts == 0) {
>  #if defined(XDP_USE_NEED_WAKEUP)
>   if (xsk_ring_prod__needs_wakeup(fq))
>   (void)poll(rxq->fds, 1, 1000);
>  #endif
> 
> - goto out;
> + return 0;
> + }
> +
> + /* allocate bufs for fill queue replenishment after rx */
> + if (rte_pktmbuf_alloc_bulk(umem->mb_pool, fq_bufs, nb_pkts)) {
> + AF_XDP_LOG(DEBUG,
> + "Failed to get enough buffers for fq.\n");
> + rx->cached_cons -= nb_pkts;
> + return 0;
>   }
> 
> - for (i = 0; i < rcvd; i++) {
> + for (i = 0; i < nb_pkts; i++) {
>   const struct xdp_desc *desc;
>   uint64_t addr;
>   uint32_t len;
> @@ -297,20 +298,14 @@ af_xdp_rx_zc(void *queue, struct rte_mbuf
> **bufs, uint16_t nb_pkts)
>   rx_bytes += len;
>   }
> 
> - xsk_ring_cons__release(rx, rcvd);
> -
> - (void)reserve_fill_queue(umem, rcvd, fq_bufs, fq);
> + xsk_ring_cons__release(rx, nb_pkts);
> + (void)reserve_fill_queue(umem, nb_pkts, fq_bufs, fq);
> 
>   /* statistics */
> - rxq->stats.rx_pkts += rcvd;
> + rxq->stats.rx_pkts += nb_pkts;
>   rxq->stats.rx_bytes += rx_bytes;
> 
> -out:
> - if (rcvd != nb_pkts)
> - rte_mempool_put_bulk(umem->mb_pool, (void
> **)&fq_bufs[rcvd],
> -  nb_pkts - rcvd);
> -
> - return rcvd;
> + return nb_pkts;
>  }
>  #else
>  static uint16_t
> @@ -322,7 +317,7 @@ af_xdp_rx_cp(void *queue, struct rte_mbuf **bufs,
> uint16_t nb_pkts)
>   struct xsk_ring_prod *fq = &rxq->fq;
>   uint32_t idx_rx = 0;
>   unsigned long rx_bytes = 0;
> - int rcvd, i;
> + int i;
>   uint32_t free_thresh = fq->size >> 1;
>   struct rte_mbuf *mbufs[ETH_AF_XDP_RX_BATCH_SIZE];
> 
> @@ -330,20 +325,21 @@ af_xdp_rx_cp(void *queue, struct rte_mbuf
> **bufs, uint16_t nb_pkts)
>   (void)reserve_fill_queue(umem,
> ETH_AF_XDP_RX_BATCH_SIZE,
>NULL, fq);
> 
> - if (unlikely(rte_pktmbuf_alloc_bulk(rxq->mb_pool, mbufs, nb_pkts)
> != 0))
> - return 0;
> -
> - rcvd = xsk_ring_cons__peek(rx, nb_pkts, &idx_rx);
> - if (rcvd == 0) {
> + nb_pkts = xsk_ring_cons__peek(rx, nb_pkts, &idx_rx);
> + if (nb_pkts == 0) {
>  #if defined(XDP_USE_NEED_WAKEUP)
>   if (xsk_ring_prod__needs_wakeup(fq))
>   (void)poll(rxq->fds, 1, 1000);
>  #endif
> + return 0;
> + }
> 
> - goto out;
> + if (unlikely(rte_pktmbuf_alloc_bulk(rxq->mb_pool, mbufs,
> nb_pkts))) {
> + rx->cached_co

Re: [dpdk-dev] [PATCH 2/2] af_xdp: avoid to unnecessary allocation and free mbuf

2020-09-18 Thread Loftus, Ciara
> 
> optimize rx performance, by allocating mbuf based on result
> of xsk_ring_cons__peek, to avoid to redundancy allocation,
> and free mbuf when receive packets
> 
> Signed-off-by: Li RongQing 
> Signed-off-by: Dongsheng Rong 
> ---
>  drivers/net/af_xdp/rte_eth_af_xdp.c | 64 ---
> --
>  1 file changed, 27 insertions(+), 37 deletions(-)
> 
> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> b/drivers/net/af_xdp/rte_eth_af_xdp.c
> index 7ce4ad04a..48824050e 100644
> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> @@ -229,28 +229,29 @@ af_xdp_rx_zc(void *queue, struct rte_mbuf
> **bufs, uint16_t nb_pkts)
>   struct xsk_umem_info *umem = rxq->umem;
>   uint32_t idx_rx = 0;
>   unsigned long rx_bytes = 0;
> - int rcvd, i;
> + int i;
>   struct rte_mbuf *fq_bufs[ETH_AF_XDP_RX_BATCH_SIZE];
> 
> - /* allocate bufs for fill queue replenishment after rx */
> - if (rte_pktmbuf_alloc_bulk(umem->mb_pool, fq_bufs, nb_pkts)) {
> - AF_XDP_LOG(DEBUG,
> - "Failed to get enough buffers for fq.\n");
> - return 0;
> - }
> 
> - rcvd = xsk_ring_cons__peek(rx, nb_pkts, &idx_rx);
> + nb_pkts = xsk_ring_cons__peek(rx, nb_pkts, &idx_rx);
> 
> - if (rcvd == 0) {
> + if (nb_pkts == 0) {
>  #if defined(XDP_USE_NEED_WAKEUP)
>   if (xsk_ring_prod__needs_wakeup(&umem->fq))
>   (void)poll(rxq->fds, 1, 1000);
>  #endif
> 
> - goto out;
> + return 0;
>   }
> 
> - for (i = 0; i < rcvd; i++) {
> + /* allocate bufs for fill queue replenishment after rx */
> + if (rte_pktmbuf_alloc_bulk(umem->mb_pool, fq_bufs, nb_pkts)) {
> + AF_XDP_LOG(DEBUG,
> + "Failed to get enough buffers for fq.\n");

Thanks for this patch. I've considered this in the past.
There is a problem if we hit this condition.
We advance the rx ring's cached consumer index @ xsk_ring_cons__peek.
But if we have no mbufs to hold the rx data, it is lost.
That's why we allocate the mbufs up front now.
Agree that we might have wasteful allocations and it's not the most optimal, 
but we don't drop packets due to failed mbuf allocs.

> + return 0;
> + }
> +
> + for (i = 0; i < nb_pkts; i++) {
>   const struct xdp_desc *desc;
>   uint64_t addr;
>   uint32_t len;
> @@ -275,20 +276,15 @@ af_xdp_rx_zc(void *queue, struct rte_mbuf
> **bufs, uint16_t nb_pkts)
>   rx_bytes += len;
>   }
> 
> - xsk_ring_cons__release(rx, rcvd);
> + xsk_ring_cons__release(rx, nb_pkts);
> 
> - (void)reserve_fill_queue(umem, rcvd, fq_bufs);
> + (void)reserve_fill_queue(umem, nb_pkts, fq_bufs);
> 
>   /* statistics */
> - rxq->stats.rx_pkts += rcvd;
> + rxq->stats.rx_pkts += nb_pkts;
>   rxq->stats.rx_bytes += rx_bytes;
> 
> -out:
> - if (rcvd != nb_pkts)
> - rte_mempool_put_bulk(umem->mb_pool, (void
> **)&fq_bufs[rcvd],
> -  nb_pkts - rcvd);
> -
> - return rcvd;
> + return nb_pkts;
>  }
>  #else
>  static uint16_t
> @@ -300,27 +296,26 @@ af_xdp_rx_cp(void *queue, struct rte_mbuf
> **bufs, uint16_t nb_pkts)
>   struct xsk_ring_prod *fq = &umem->fq;
>   uint32_t idx_rx = 0;
>   unsigned long rx_bytes = 0;
> - int rcvd, i;
> + int i;
>   uint32_t free_thresh = fq->size >> 1;
>   struct rte_mbuf *mbufs[ETH_AF_XDP_RX_BATCH_SIZE];
> 
> - if (unlikely(rte_pktmbuf_alloc_bulk(rxq->mb_pool, mbufs, nb_pkts)
> != 0))
> - return 0;
> -
> - rcvd = xsk_ring_cons__peek(rx, nb_pkts, &idx_rx);
> - if (rcvd == 0) {
> + nb_pkts = xsk_ring_cons__peek(rx, nb_pkts, &idx_rx);
> + if (nb_pkts == 0) {
>  #if defined(XDP_USE_NEED_WAKEUP)
>   if (xsk_ring_prod__needs_wakeup(fq))
>   (void)poll(rxq->fds, 1, 1000);
>  #endif
> -
> - goto out;
> + return 0;
>   }
> 
> + if (unlikely(rte_pktmbuf_alloc_bulk(rxq->mb_pool, mbufs, nb_pkts)
> != 0))
> + return 0;
> +
>   if (xsk_prod_nb_free(fq, free_thresh) >= free_thresh)
>   (void)reserve_fill_queue(umem,
> ETH_AF_XDP_RX_BATCH_SIZE, NULL);
> 
> - for (i = 0; i < rcvd; i++) {
> + for (i = 0; i < nb_pkts; i++) {
>   const struct xdp_desc *desc;
>   uint64_t addr;
>   uint32_t len;
> @@ -339,18 +334,13 @@ af_xdp_rx_cp(void *queue, struct rte_mbuf
> **bufs, uint16_t nb_pkts)
>   bufs[i] = mbufs[i];
>   }
> 
> - xsk_ring_cons__release(rx, rcvd);
> + xsk_ring_cons__release(rx, nb_pkts);
> 
>   /* statistics */
> - rxq->stats.rx_pkts += rcvd;
> + rxq->stats.rx_pkts += nb_pkts;
>   rxq->stats.rx_bytes += rx_bytes;
> 
> -out:
> - if (rcvd != nb_pkts)
> - rte_mempool_put_bulk(rxq->mb_pool, (void
> **)&mbufs[rcvd],
> -  nb_

Re: [dpdk-dev] [PATCH] af_xdp: avoid deadlock due to empty fill queue

2020-09-18 Thread Loftus, Ciara
> when receive packets, it is possible to fail to reserve
> fill queue, since buffer ring is shared between tx and rx,
> and maybe not available temporary. at last, both fill
> queue and rx queue are empty.
> 
> then kernel side will be unable to receive packets due to
> empty fill queue, and dpdk will be unable to reserve fill
> queue because dpdk has not pakcets to receive, at last
> deadlock will happen
> 
> so move reserve fill queue before xsk_ring_cons__peek
> to fix it
> 
> Signed-off-by: Li RongQing 

Thanks for the fix. I tested and saw no significant performance drop.

Minor: the first line of the commit should read "net/af_xdp: "

Acked-by: Ciara Loftus 

CC-ing stable as I think this fix should be considered for inclusion.

Thanks,
Ciara

> ---
>  drivers/net/af_xdp/rte_eth_af_xdp.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> b/drivers/net/af_xdp/rte_eth_af_xdp.c
> index 7ce4ad04a..2dc9cab27 100644
> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> @@ -304,6 +304,10 @@ af_xdp_rx_cp(void *queue, struct rte_mbuf **bufs,
> uint16_t nb_pkts)
>   uint32_t free_thresh = fq->size >> 1;
>   struct rte_mbuf *mbufs[ETH_AF_XDP_RX_BATCH_SIZE];
> 
> + if (xsk_prod_nb_free(fq, free_thresh) >= free_thresh)
> + (void)reserve_fill_queue(umem,
> ETH_AF_XDP_RX_BATCH_SIZE, NULL);
> +
> +
>   if (unlikely(rte_pktmbuf_alloc_bulk(rxq->mb_pool, mbufs, nb_pkts)
> != 0))
>   return 0;
> 
> @@ -317,9 +321,6 @@ af_xdp_rx_cp(void *queue, struct rte_mbuf **bufs,
> uint16_t nb_pkts)
>   goto out;
>   }
> 
> - if (xsk_prod_nb_free(fq, free_thresh) >= free_thresh)
> - (void)reserve_fill_queue(umem,
> ETH_AF_XDP_RX_BATCH_SIZE, NULL);
> -
>   for (i = 0; i < rcvd; i++) {
>   const struct xdp_desc *desc;
>   uint64_t addr;
> --
> 2.16.2



Re: [dpdk-dev] [PATCH 1/1] net/af_xdp: shared UMEM support

2020-09-17 Thread Loftus, Ciara
> >
> > Kernel v5.10 will introduce the ability to efficiently share a UMEM between
> > AF_XDP sockets bound to different queue ids on the same or different
> > devices. This patch integrates that functionality into the AF_XDP PMD.
> >
> > A PMD will attempt to share a UMEM with others if the shared_umem=1
> > vdev arg is set. UMEMs can only be shared across PMDs with the same
> > mempool, up to a limited number of PMDs goverened by the size of the
> > given mempool.
> > Sharing UMEMs is not supported for non-zero-copy (aligned) mode.
> >
> > The benefit of sharing UMEM across PMDs is a saving in memory due to not
> > having to register the UMEM multiple times. Throughput was measured to
> > remain within 2% of the default mode (not sharing UMEM).
> >
> > A version of libbpf >= v0.2.0 is required and the appropriate pkg-config 
> > file
> > for libbpf must be installed such that meson can determine the version.
> >
> > Signed-off-by: Ciara Loftus 
> 
> 
> 
> >
> > +/* List which tracks PMDs to facilitate sharing UMEMs across them. */
> > +struct internal_list {
> > +TAILQ_ENTRY(internal_list) next;
> > +struct rte_eth_dev *eth_dev;
> > +};
> > +
> > +TAILQ_HEAD(internal_list_head, internal_list); static struct
> > +internal_list_head internal_list =
> > +TAILQ_HEAD_INITIALIZER(internal_list);
> > +
> > +static pthread_mutex_t internal_list_lock =
> PTHREAD_MUTEX_INITIALIZER;
> 
> [Tahhan, Maryam] do multiple threads typically initialize and ethdev/invoke
> the underlying driver?
> Most apps I've seen initialize the ports one after the other in the starting
> thread - so if there's not multiple threads doing initialization - we may want
> to consider removing this mutex...
> Or maybe do you see something potentially removing a port while a port is
> being added?

Hi Maryam,

Yes. Although unlikely, I'm not aware of any guarantee that port A cannot be 
removed while port B is being added, and since both operations can touch the 
tailq I'm inclined to keep the mutex. But I'm open to correction.

Thanks,
Ciara

> 
> 



Re: [dpdk-dev] [PATCH 1/2] af_xdp: not return a negative value in af_xdp_rx_zc

2020-09-17 Thread Loftus, Ciara
> 
> af_xdp_rx_zc should always return the number of received packets,
> and negative value will be as number of received packets, and
> confuse the caller
> 
> Fixes: d8a210774e1d ("net/af_xdp: support unaligned umem chunks")
> Cc: sta...@dpdk.org
> Signed-off-by: Li RongQing 
> ---
>  drivers/net/af_xdp/rte_eth_af_xdp.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Hi,

Thank you for the patch. The same fix was submitted and accepted into the 
next-net tree a few weeks ago:
http://git.dpdk.org/next/dpdk-next-net/commit/?id=e85f60b4e282695ca0e457dc1ad21479b4bd7479
It should hopefully hit the main tree soon.

Ciara

> 
> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> b/drivers/net/af_xdp/rte_eth_af_xdp.c
> index 936d4a7d5..7ce4ad04a 100644
> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> @@ -236,7 +236,7 @@ af_xdp_rx_zc(void *queue, struct rte_mbuf **bufs,
> uint16_t nb_pkts)
>   if (rte_pktmbuf_alloc_bulk(umem->mb_pool, fq_bufs, nb_pkts)) {
>   AF_XDP_LOG(DEBUG,
>   "Failed to get enough buffers for fq.\n");
> - return -1;
> + return 0;
>   }
> 
>   rcvd = xsk_ring_cons__peek(rx, nb_pkts, &idx_rx);
> --
> 2.16.2



Re: [dpdk-dev] [PATCH] net/af_xdp: custom XDP program loading

2020-09-14 Thread Loftus, Ciara
> >
> > The new 'xdp_prog=' vdev arg allows the user to specify the path
> to
> > a custom XDP program to be set on the device, instead of the default libbpf
> > one. The program must have an XSK_MAP of name 'xsks_map' which will
> > allow for the redirection of some packets to userspace and thus the PMD,
> > using some criteria defined in the program.
> > Note: a netdev may only load one program.
> >
> > Signed-off-by: Ciara Loftus 
> > ---
> 
> [MT] Stupid question :), AF_XDP is specifically related to loading an XDP
> program that allows you to redirect packets to an XSK...
> why would you want to allow a custom XDP program to be loaded?
> 
> Other than that the code itself looked GTM

Hi Maryam,

Thanks for your feedback. It's a good question, and I will update the commit 
message in the v2 with the reasoning.

Sometimes it might be desired to redirect some, but not all, packets to the 
xsk/PMD, e.g. if the user wishes to drop or process a certain type of packet in 
the kernel. That logic can be put in the custom program. The key is that the 
custom program still allows *some* packets to reach the xsk, and we verify that 
by checking for the presence of the xsks_map.
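To make that concrete, a custom program could look roughly like the below (a 
minimal, untested sketch using BTF-style map definitions; the ARP-drop policy, 
program name and map size are purely illustrative - the only hard requirement 
is the XSKMAP named 'xsks_map'):

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

struct {
	__uint(type, BPF_MAP_TYPE_XSKMAP);
	__uint(max_entries, 64);
	__uint(key_size, sizeof(int));
	__uint(value_size, sizeof(int));
} xsks_map SEC(".maps");

SEC("xdp")
int xdp_custom_prog(struct xdp_md *ctx)
{
	void *data = (void *)(long)ctx->data;
	void *data_end = (void *)(long)ctx->data_end;
	struct ethhdr *eth = data;

	if ((void *)(eth + 1) > data_end)
		return XDP_DROP;

	/* example policy: drop ARP in the kernel ... */
	if (eth->h_proto == bpf_htons(ETH_P_ARP))
		return XDP_DROP;

	/* ... and redirect everything else to the AF_XDP socket bound to
	 * this rx queue (fall back to XDP_PASS if no socket is attached) */
	return bpf_redirect_map(&xsks_map, ctx->rx_queue_index, XDP_PASS);
}

char _license[] SEC("license") = "GPL";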

Thanks,
Ciara

> 
> 



Re: [dpdk-dev] [PATCH RFC 1/1] net/af_xdp: shared UMEM support

2020-08-26 Thread Loftus, Ciara

> 
> On 8/11/2020 10:50 AM, Ciara Loftus wrote:
> > A future kernel will introduce the ability to efficiently share a UMEM
> > between AF_XDP sockets bound to different queue ids on the same or
> > different devices. This patch integrates that functionality into the AF_XDP
> > PMD.
> >
> > A PMD will attempt to share a UMEM with others if the shared_umem=1
> vdev
> > arg is set. UMEMs can only be shared across PMDs with the same
> mempool, up
> > to a limited number of PMDs goverened by the size of the given mempool.
> >
> > The benefit of sharing UMEM across PMDs is a saving in memory due to not
> > having to register the UMEM multiple times.
> >
> > Signed-off-by: Ciara Loftus 
> 
> <...>
> 
> > @@ -1,5 +1,5 @@
> >  /* SPDX-License-Identifier: BSD-3-Clause
> > - * Copyright(c) 2019 Intel Corporation.
> > + * Copyright(c) 2020 Intel Corporation.
> >   */
> >  #include 
> >  #include 
> > @@ -15,6 +15,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include "af_xdp_deps.h"
> >  #include 
> >
> > @@ -37,6 +38,11 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> > +
> > +#if KERNEL_VERSION(5, 7, 0) < LINUX_VERSION_CODE
> > +#define ETH_AF_XDP_SHARED_UMEM 1
> > +#endif
> 
> I think better to separate these version checks from the actual code, what do
> you think creating a compat.h under 'net/af_xdp' and move above logic
> there?

Good suggestion. Much cleaner. I'll implement this.
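i.e. something along these lines in a new drivers/net/af_xdp/compat.h 
(sketch only, reusing the version check from the patch above):

#include <linux/version.h>

#if KERNEL_VERSION(5, 7, 0) < LINUX_VERSION_CODE
#define ETH_AF_XDP_SHARED_UMEM 1
#endif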

> 
> <...>
> 
> > @@ -888,9 +1048,15 @@ xsk_configure(struct pmd_internals *internals,
> struct pkt_rx_queue *rxq,
> > cfg.bind_flags |= XDP_USE_NEED_WAKEUP;
> >  #endif
> >
> > -   ret = xsk_socket__create(&rxq->xsk, internals->if_name,
> > -   rxq->xsk_queue_idx, rxq->umem->umem, &rxq->rx,
> > -   &txq->tx, &cfg);
> > +   if (!internals->shared_umem_configured) {
> > +   ret = xsk_socket__create(&rxq->xsk, internals->if_name,
> > +   rxq->xsk_queue_idx, rxq->umem->umem,
> &rxq->rx,
> > +   &txq->tx, &cfg);
> > +   } else {
> > +   ret = xsk_socket__create_shared(&rxq->xsk, internals-
> >if_name,
> > +   rxq->xsk_queue_idx, rxq->umem->umem,
> &rxq->rx,
> > +   &txq->tx, &rxq->fq, &rxq->cq, &cfg);
> > +   }
> 
> Is the above dependency (ETH_AF_XDP_SHARED_UMEM) for the kernel
> 'af_xdp' code,
> or for 'libbpf.so'?
> 
> The 'xsk_socket__create_shared()' API is not available in the latest
> 'libbpf.so', I wonder if the kernel version check is to align with the correct
> 'libbpf.so' version.
> If not how the dependent version of the 'libbpf.so' managed for DPDK?

Good point.
For the RFC I'm assuming the user is using the libbpf packaged with the kernel, 
with both having the appropriate support.
In the next RFC/v1 I'll introduce a check for both the correct libbpf version 
and underlying kernel support, as both are required.

Thanks,
Ciara



Re: [dpdk-dev] [PATCH RFC 0/1] net/af_xdp: shared UMEM support

2020-08-26 Thread Loftus, Ciara
> 
> On 8/11/2020 10:50 AM, Ciara Loftus wrote:
> > This RFC integrates shared UMEM support into the AF_XDP PMD. It is
> based on the
> > WIP kernel series [1] by Magnus Karlsson.
> >
> > Detailed information on the shared UMEM feature can be found in the final
> patch
> > in the aforementioned series.
> >
> > Support for the kernel feature can eventually be detected in DPDK by
> querying
> > the LINUX_KERNEL_VERSION. As of now the feature is not yet merged
> upstream, so
> > for this RFC it is assumed the user is using a patched version of v5.8.
> 
> Hi Ciara,
> 
> I don't see this feature in kernel in 5.9.0-rc2, is there any update in the
> kernel side?

Hi Ferruh,

Some performance quirks have been identified with the latest version of the 
kernel patch.
Once those are resolved we should expect a new revision, hopefully within the 
5.9 window.

Thanks,
Ciara

> 
> >
> > Shared UMEM is only available for zero copy mode.
> >
> > In order to share UMEM information between PMDs, the ethdevs wishing
> to share
> > must be tracked somehow. The method chosen to do so is similar to
> methods used
> > in the vHost [2] and vDPA drivers, where pointers to the ethdevs are
> maintained
> > in an internal list. Proposals for alternate solutions are welcome.
> >
> > Performance data to follow with the v1.
> >
> > [1] https://patchwork.ozlabs.org/project/netdev/cover/1595307848-
> 20719-1-git-send-email-magnus.karls...@intel.com/
> > [2]
> https://git.dpdk.org/dpdk/commit/?id=ee584e9710b9abd60ee9faef664e106
> dcea10085
> >
> > Ciara Loftus (1):
> >   net/af_xdp: shared UMEM support
> >
> >  doc/guides/nics/af_xdp.rst  |   5 +-
> >  drivers/net/af_xdp/rte_eth_af_xdp.c | 315 ++--
> 
> >  2 files changed, 252 insertions(+), 68 deletions(-)
> >



Re: [dpdk-dev] [PATCH] doc: announce Vhost dequeue zero-copy removal

2020-08-07 Thread Loftus, Ciara
> 
> Vhost-user dequeue zero-copy support will be removed in
> 20.11. The only known user is OVS where the feature is
> still experimental, and has not received any update for
> several years. This feature faces reliability issues and
> is often conflicting with new features being implemented.
> 
> Signed-off-by: Maxime Coquelin 
> ---
> 
> Hi, the topic was discussed during OVS-DPDK public meeting
> on July 22nd. Ian, if you had time to discuss the topic with
> your team and agree with the removal, please ack the patch.
> If, not please let me know.
> 
>  doc/guides/rel_notes/deprecation.rst | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst
> b/doc/guides/rel_notes/deprecation.rst
> index ea4cfa7a48..a923849419 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -151,3 +151,8 @@ Deprecation Notices
>Python 2 support will be completely removed in 20.11.
>In 20.08, explicit deprecation warnings will be displayed when running
>scripts with Python 2.
> +
> +* vhost: Vhost-user dequeue zero-copy support will be removed in 20.11.
> The
> +  only known user is OVS where the feature is still experimental, and has not
> +  received any update for 2.5 years. This feature faces reliability issues 
> and
> +  is often conflicting with new features being implemented.
> --
> 2.26.2

Acked-by: Ciara Loftus 



Re: [dpdk-dev] [PATCH] net/af_xdp: optimisations to improve packet loss

2020-06-23 Thread Loftus, Ciara
> 
> On Fri, 12 Jun 2020 14:17:46 +
> Ciara Loftus  wrote:
> 
> > This commit makes some changes to the AF_XDP PMD in an effort to
> improve
> > its packet loss characteristics.
> >
> > 1. In the case of failed transmission due to inability to reserve a tx
> > descriptor, the PMD now pulls from the completion ring, issues a syscall
> > in which the kernel attempts to complete outstanding tx operations, then
> > tries to reserve the tx descriptor again. Prior to this we dropped the
> > packet after the syscall and didn't try to re-reserve.
> >
> > 2. During completion ring cleanup, always pull as many entries as possible
> > from the ring as opposed to the batch size or just how many packets
> > we're going to attempt to send. Keeping the completion ring emptier
> should
> > reduce failed transmissions in the kernel, as the kernel requires space in
> > the completion ring to successfully tx.
> >
> > 3. Size the fill ring as twice the receive ring size which may help reduce
> > allocation failures in the driver.
> >
> > With these changes, a benchmark which measured the packet rate at
> which
> > 0.01% packet loss could be reached improved from ~0.1G to ~3Gbps.
> >
> > Signed-off-by: Ciara Loftus 
> 
> You might want to add the ability to emulate a tx_free threshold
> by pulling more completions earlier.

Thanks for the suggestion. I've implemented it in the v2.

Ciara



Re: [dpdk-dev] [PATCH v3 3/3] net/af_xdp: fix maximum MTU value

2020-02-13 Thread Loftus, Ciara
> Subject: Re: [dpdk-dev] [PATCH v3 3/3] net/af_xdp: fix maximum MTU value
> 
> On 02/10, Ciara Loftus wrote:
> >The maximum MTU for af_xdp zero copy is equal to the page size less the
> >frame overhead introduced by AF_XDP (XDP HR = 256) and DPDK (frame
> headroom
> >= 320). The patch updates this value to reflect this.
> >
> >This change also makes it possible to remove unneeded constants for both
> >zero-copy and copy mode.
> >
> >Fixes: d8a210774e1d ("net/af_xdp: support unaligned umem chunks")
> >Cc: sta...@dpdk.org
> >
> >Signed-off-by: Ciara Loftus 
> >---
> > drivers/net/af_xdp/rte_eth_af_xdp.c | 23 +++
> > 1 file changed, 11 insertions(+), 12 deletions(-)
> >
> >diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> b/drivers/net/af_xdp/rte_eth_af_xdp.c
> >index 1e98cd44f..75f037c3e 100644
> >--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> >+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> >@@ -59,13 +59,6 @@ static int af_xdp_logtype;
> >
> > #define ETH_AF_XDP_FRAME_SIZE   2048
> > #define ETH_AF_XDP_NUM_BUFFERS  4096
> >-#ifdef XDP_UMEM_UNALIGNED_CHUNK_FLAG
> >-#define ETH_AF_XDP_MBUF_OVERHEAD128 /* sizeof(struct
> rte_mbuf) */
> >-#define ETH_AF_XDP_DATA_HEADROOM \
> >-(ETH_AF_XDP_MBUF_OVERHEAD + RTE_PKTMBUF_HEADROOM)
> >-#else
> >-#define ETH_AF_XDP_DATA_HEADROOM0
> >-#endif
> > #define ETH_AF_XDP_DFLT_NUM_DESCS
>   XSK_RING_CONS__DEFAULT_NUM_DESCS
> > #define ETH_AF_XDP_DFLT_START_QUEUE_IDX 0
> > #define ETH_AF_XDP_DFLT_QUEUE_COUNT 1
> >@@ -602,7 +595,14 @@ eth_dev_info(struct rte_eth_dev *dev, struct
> rte_eth_dev_info *dev_info)
> > dev_info->max_tx_queues = internals->queue_cnt;
> >
> > dev_info->min_mtu = RTE_ETHER_MIN_MTU;
> >-dev_info->max_mtu = ETH_AF_XDP_FRAME_SIZE -
> ETH_AF_XDP_DATA_HEADROOM;
> >+#if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
> >+dev_info->max_mtu = getpagesize() -
> >+sizeof(struct rte_mempool_objhdr) -
> >+sizeof(struct rte_mbuf) -
> >+RTE_PKTMBUF_HEADROOM -
> XDP_PACKET_HEADROOM;
> >+#else
> >+dev_info->max_mtu = ETH_AF_XDP_FRAME_SIZE;
> 
> Do we need to subtract XDP_PACKET_HEADROOM for copy mode as well?

Good catch. I'll add this and spin a v4.
Thanks for the reviews.

Ciara

> 
> Thanks,
> Xiaolong
> 
> >+#endif
> >
> > dev_info->default_rxportconf.nb_queues = 1;
> > dev_info->default_txportconf.nb_queues = 1;
> >@@ -804,7 +804,7 @@ xsk_umem_info *xdp_umem_configure(struct
> pmd_internals *internals,
> > .fill_size = ETH_AF_XDP_DFLT_NUM_DESCS,
> > .comp_size = ETH_AF_XDP_DFLT_NUM_DESCS,
> > .frame_size = ETH_AF_XDP_FRAME_SIZE,
> >-.frame_headroom = ETH_AF_XDP_DATA_HEADROOM };
> >+.frame_headroom = 0 };
> > char ring_name[RTE_RING_NAMESIZE];
> > char mz_name[RTE_MEMZONE_NAMESIZE];
> > int ret;
> >@@ -829,8 +829,7 @@ xsk_umem_info *xdp_umem_configure(struct
> pmd_internals *internals,
> >
> > for (i = 0; i < ETH_AF_XDP_NUM_BUFFERS; i++)
> > rte_ring_enqueue(umem->buf_ring,
> >- (void *)(i * ETH_AF_XDP_FRAME_SIZE +
> >-  ETH_AF_XDP_DATA_HEADROOM));
> >+ (void *)(i * ETH_AF_XDP_FRAME_SIZE));
> >
> > snprintf(mz_name, sizeof(mz_name), "af_xdp_umem_%s_%u",
> >internals->if_name, rxq->xsk_queue_idx);
> >@@ -939,7 +938,7 @@ eth_rx_queue_setup(struct rte_eth_dev *dev,
> > /* Now get the space available for data in the mbuf */
> > buf_size = rte_pktmbuf_data_room_size(mb_pool) -
> > RTE_PKTMBUF_HEADROOM;
> >-data_size = ETH_AF_XDP_FRAME_SIZE -
> ETH_AF_XDP_DATA_HEADROOM;
> >+data_size = ETH_AF_XDP_FRAME_SIZE;
> >
> > if (data_size > buf_size) {
> > AF_XDP_LOG(ERR, "%s: %d bytes will not fit in mbuf (%d
> bytes)\n",
> >--
> >2.17.1
> >


Re: [dpdk-dev] [PATCH] net/af_xdp: use single-prod-and-cons ring

2020-01-13 Thread Loftus, Ciara
> 
> The ring is used only by af_xdp pmd itself, so no need to support
> multi-producer and multi-consumer mode. This patch changes the ring
> to single-producer and single-consumer mode, which could yield better
> performance for addr enqueue and dequeue.
> 
> Signed-off-by: Xiao Wang 

LGTM.

I ran some rough numbers and measured a +~6.8% improvement for single-core 
single-pmd testpmd loopback (IRQs pinned to app core) and +~15.9% for the two-core case 
(IRQs and app pinned to separate cores).

Tested-by: Ciara Loftus 

Thanks,
Ciara

> ---
>  drivers/net/af_xdp/rte_eth_af_xdp.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> b/drivers/net/af_xdp/rte_eth_af_xdp.c
> index d903e6c28..683e2a559 100644
> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> @@ -809,7 +809,7 @@ xsk_umem_info *xdp_umem_configure(struct
> pmd_internals *internals,
>   umem->buf_ring = rte_ring_create(ring_name,
>ETH_AF_XDP_NUM_BUFFERS,
>rte_socket_id(),
> -  0x0);
> +  RING_F_SP_ENQ |
> RING_F_SC_DEQ);
>   if (umem->buf_ring == NULL) {
>   AF_XDP_LOG(ERR, "Failed to create rte_ring\n");
>   goto err;
> --
> 2.15.1



Re: [dpdk-dev] [PATCH] net/af_xdp: fix redundant check for NEED WAKEUP

2020-01-08 Thread Loftus, Ciara
> 
> Function kick_tx() has built-in detection on NEED_WAKEUP flag, so just
> call it directly, like elsewhere in the driver.
> 
> Fixes: d8a210774e1d ("net/af_xdp: support unaligned umem chunks")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Xiao Wang 
> ---
>  drivers/net/af_xdp/rte_eth_af_xdp.c | 5 +
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> b/drivers/net/af_xdp/rte_eth_af_xdp.c
> index 2b1245ee4..d903e6c28 100644
> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> @@ -480,10 +480,7 @@ af_xdp_tx_zc(void *queue, struct rte_mbuf **bufs,
> uint16_t nb_pkts)
>   tx_bytes += mbuf->pkt_len;
>   }
> 
> -#if defined(XDP_USE_NEED_WAKEUP)
> - if (xsk_ring_prod__needs_wakeup(&txq->tx))
> -#endif
> - kick_tx(txq);
> + kick_tx(txq);
> 
>  out:
>   xsk_ring_prod__submit(&txq->tx, count);
> --
> 2.15.1


Thanks for the patch.

Tested-by: Ciara Loftus 



Re: [dpdk-dev] [PATCH v3] net/i40e: fix TSO pkt exceeds allowed buf size issue

2020-01-02 Thread Loftus, Ciara
> > -Original Message-
> > From: Li, Xiaoyun 
> > Sent: Thursday, December 26, 2019 2:46 PM
> > To: Zhang, Qi Z ; Xing, Beilei
> ;
> > Loftus, Ciara ; dev@dpdk.org
> > Cc: Li, Xiaoyun ; sta...@dpdk.org
> > Subject: [PATCH v3] net/i40e: fix TSO pkt exceeds allowed buf size issue
> >
> > Hardware limits that max buffer size per tx descriptor should be (16K-1)B.
> So
> > when TSO enabled, the mbuf data size may exceed the limit and cause
> > malicious behavior to the NIC. This patch fixes this issue by using more tx
> descs
> > for this kind of large buffer.
> >
> > Fixes: 4861cde46116 ("i40e: new poll mode driver")
> > Cc: sta...@dpdk.org
> >
> > Signed-off-by: Xiaoyun Li 
> 
> Acked-by: Qi Zhang 

Tested-by: Ciara Loftus 



Re: [dpdk-dev] [PATCH v2 2/3] net/af_xdp: support pinning of IRQs

2019-10-21 Thread Loftus, Ciara
> > On Mon, 30 Sep 2019 16:42:04 +
> > Ciara Loftus  wrote:
> >
> > > +/* drivers supported for the queue_irq option */
> > > +enum supported_drivers {
> > > + I40E_DRIVER,
> > > + IXGBE_DRIVER,
> > > + MLX5_DRIVER,
> > > + NUM_DRIVERS
> > > +};
> >
> > Anything device specific like this raises a red flag to me.
> >
> > This regex etc, seems like a huge hack. Is there a better way  using
> > irqbalance and smp_affinity in kernel drivers?
> >
> > NACK
> 
> Hi Stephen,
> 
> Thanks for looking at the patch. I understand your concern however
> unfortunately I haven't been able to identify a way to achieve the desired
> outcome by using your suggestions of irqbalance and smp_affinity. Did you
> have something specific in mind or are aware of any generic way of retrieving
> interrupt numbers for NICs regardless of vendor or range?
> 
> I think this feature is really important for the usability of this PMD. 
> Without it,
> to configure the IRQs the user has to open up /proc/interrupts, trawl through
> it and identify the correct IRQ number for their given NIC and qid (the format
> for which is unlikely to be known off-hand), and manually pin them by writing
> the appropriate values in the appropriate format to the appropriate file -
> prone to error if not automated IMO.
> If the user fails to set the affinity it's probably fine for a single pmd, 
> however
> with multiple pmds all irqs will by default land on core 0 and lead to 
> terrible
> performance.

Hi,

Following this up with some performance data which shows the impact of no 
pinning.

The test case is N instances of testpmd macswap, where N = the number of 
interfaces.

ifaces  no pinning  pinning
1       9059100     9171612
2       9261635     18376552
3       9332804     27696702

For the no-pinning case, all IRQs are landing on the default core 0, which 
results in very poor scaling versus the pinned case where scaling is linear.

Thanks,
Ciara

> 
> It should be possible to rework the code to remove the regexes and use a
> direct string compare. Would that make the solution more palatable?
> 
> Let me know what you think.
> 
> Thanks,
> Ciara


Re: [dpdk-dev] [PATCH v2 2/3] net/af_xdp: support pinning of IRQs

2019-10-03 Thread Loftus, Ciara



> -Original Message-
> From: Stephen Hemminger 
> Sent: Monday 30 September 2019 18:12
> To: Loftus, Ciara 
> Cc: dev@dpdk.org; Ye, Xiaolong ; Laatz, Kevin
> ; Richardson, Bruce 
> Subject: Re: [dpdk-dev] [PATCH v2 2/3] net/af_xdp: support pinning of IRQs
> 
> On Mon, 30 Sep 2019 16:42:04 +
> Ciara Loftus  wrote:
> 
> > +/* drivers supported for the queue_irq option */
> > +enum supported_drivers {
> > +   I40E_DRIVER,
> > +   IXGBE_DRIVER,
> > +   MLX5_DRIVER,
> > +   NUM_DRIVERS
> > +};
> 
> Anything device specific like this raises a red flag to me.
> 
> This regex etc, seems like a huge hack. Is there a better way  using
> irqbalance and smp_affinity in kernel drivers?
> 
> NACK

Hi Stephen,
 
Thanks for looking at the patch. I understand your concern however 
unfortunately I haven't been able to identify a way to achieve the desired 
outcome by using your suggestions of irqbalance and smp_affinity. Did you have 
something specific in mind or are aware of any generic way of retrieving 
interrupt numbers for NICs regardless of vendor or range?
 
I think this feature is really important for the usability of this PMD. Without 
it, to configure the IRQs the user has to open up /proc/interrupts, trawl 
through it and identify the correct IRQ number for their given NIC and qid (the 
format for which is unlikely to be known off-hand), and manually pin them by 
writing the appropriate values in the appropriate format to the appropriate 
file - prone to error if not automated IMO.
If the user fails to set the affinity it's probably fine for a single pmd, 
however with multiple pmds all irqs will by default land on core 0 and lead to 
terrible performance.
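
To make the manual alternative concrete, this is roughly what ends up being done
by hand once the IRQ number has been dug out of /proc/interrupts - a minimal C
sketch, assuming a standard Linux /proc layout and that the IRQ number has
already been discovered (the discovery is the vendor-specific part the patch
automates):

	#include <stdio.h>

	/* Pin a kernel interrupt to a single core by writing a hex CPU mask
	 * to /proc/irq/<irq>/smp_affinity. Illustrative only. */
	static int
	pin_irq_to_core(int irq, int coreid)
	{
		char path[64];
		FILE *f;

		snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity", irq);
		f = fopen(path, "w");
		if (f == NULL)
			return -1;
		/* smp_affinity takes a hex CPU mask, e.g. core 3 -> "8" */
		fprintf(f, "%llx\n", 1ULL << coreid);
		fclose(f);
		return 0;
	}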

It should be possible to rework the code to remove the regexes and use a direct 
string compare. Would that make the solution more palatable?
 
Let me know what you think.
 
Thanks,
Ciara


Re: [dpdk-dev] [PATCH 2/3] net/af_xdp: support pinning of IRQs

2019-09-27 Thread Loftus, Ciara
[snip]

> >+
> >+static void
> >+configure_irqs(struct pmd_internals *internals, uint16_t rx_queue_id)
> >+{
> >+int coreid = internals->queue_irqs[rx_queue_id];
> >+char driver[NAME_MAX];
> >+uint16_t netdev_qid = rx_queue_id + internals->start_queue_idx;
> >+regex_t r;
> >+int interrupt;
> >+
> >+if (coreid < 0)
> >+return;
> >+
> >+if (coreid > (get_nprocs() - 1)) {
> >+AF_XDP_LOG(ERR, "Affinitisation failed - invalid coreid %i\n",
> >+coreid);
> >+return;
> >+}
> 
> I think we can combine above 2 sanity checks together.
> 

Hi Xiaolong,

Thanks for your review. I agree with all of your feedback except this one.

configure_irqs() is called for every queue. The queues with no affinity have a 
coreid initialized to -1. So coreid < 0 is a valid value and we should return 
with no error. However for the case where coreid > nprocs, this is an actual 
error and we should report that with a log.
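
The intent is roughly the following (just a commented sketch mirroring the code
quoted above):

	/* coreid == -1 means no pinning was requested for this queue: not an error */
	if (coreid < 0)
		return;

	/* a coreid beyond the last online core is a genuine user error, so log it */
	if (coreid > (get_nprocs() - 1)) {
		AF_XDP_LOG(ERR, "Affinitisation failed - invalid coreid %i\n", coreid);
		return;
	}
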
What do you think?

Thanks,
Ciara

[snip]

> >@@ -697,6 +996,8 @@ eth_rx_queue_setup(struct rte_eth_dev *dev,
> > goto err;
> > }
> >
> >+configure_irqs(internals, rx_queue_id);
> >+
> > rxq->fds[0].fd = xsk_socket__fd(rxq->xsk);
> > rxq->fds[0].events = POLLIN;
> >
> >@@ -834,6 +1135,39 @@ parse_name_arg(const char *key __rte_unused,
> > return 0;
> > }
> >



Re: [dpdk-dev] [PATCH] net/af_xdp: fix Tx halt when no recv packets

2019-09-17 Thread Loftus, Ciara
> 
> The kernel only consumes Tx packets if we have some Rx traffic on the specified
> queue or we have called send(). So we need to issue a send() even when the
> allocation fails so that the kernel will start to consume packets again.
> 
> Commit 45bba02c95b0 ("net/af_xdp: support need wakeup feature") breaks the
> above rule by adding a condition to the send; this patch fixes it while still
> keeping the need_wakeup feature for Tx.
> 
> Fixes: 45bba02c95b0 ("net/af_xdp: support need wakeup feature")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Xiaolong Ye 

Thanks for the patch Xiaolong.

Verified that this resolves an issue whereby, when transmitting in one direction 
from a NIC PMD to the AF_XDP PMD, the AF_XDP PMD would stop transmitting after 
a short time.

Tested-by: Ciara Loftus 

Thanks,
Ciara

> ---
>  drivers/net/af_xdp/rte_eth_af_xdp.c | 28 ++--
>  1 file changed, 14 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> b/drivers/net/af_xdp/rte_eth_af_xdp.c
> index 41ed5b2af..e496e9aaa 100644
> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> @@ -286,19 +286,16 @@ kick_tx(struct pkt_tx_queue *txq)  {
>   struct xsk_umem_info *umem = txq->pair->umem;
> 
> -#if defined(XDP_USE_NEED_WAKEUP)
> - if (xsk_ring_prod__needs_wakeup(&txq->tx))
> -#endif
> - while (send(xsk_socket__fd(txq->pair->xsk), NULL,
> - 0, MSG_DONTWAIT) < 0) {
> - /* some thing unexpected */
> - if (errno != EBUSY && errno != EAGAIN && errno !=
> EINTR)
> - break;
> -
> - /* pull from completion queue to leave more space
> */
> - if (errno == EAGAIN)
> - pull_umem_cq(umem,
> ETH_AF_XDP_TX_BATCH_SIZE);
> - }
> + while (send(xsk_socket__fd(txq->pair->xsk), NULL,
> + 0, MSG_DONTWAIT) < 0) {
> + /* some thing unexpected */
> + if (errno != EBUSY && errno != EAGAIN && errno != EINTR)
> + break;
> +
> + /* pull from completion queue to leave more space */
> + if (errno == EAGAIN)
> + pull_umem_cq(umem,
> ETH_AF_XDP_TX_BATCH_SIZE);
> + }
>   pull_umem_cq(umem, ETH_AF_XDP_TX_BATCH_SIZE);  }
> 
> @@ -367,7 +364,10 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf
> **bufs, uint16_t nb_pkts)
> 
>   xsk_ring_prod__submit(&txq->tx, nb_pkts);
> 
> - kick_tx(txq);
> +#if defined(XDP_USE_NEED_WAKEUP)
> + if (xsk_ring_prod__needs_wakeup(&txq->tx))
> +#endif
> + kick_tx(txq);
> 
>   txq->stats.tx_pkts += nb_pkts;
>   txq->stats.tx_bytes += tx_bytes;
> --
> 2.17.1



Re: [dpdk-dev] [PATCH] net/af_xdp: enable support for unaligned umem chunks

2019-09-04 Thread Loftus, Ciara
> 
> Hi, Ciara
> 
> Thanks for the patch, the performance number is quite impressive.
> 
> On 08/29, Ciara Loftus wrote:
> >This patch enables the unaligned chunks feature for AF_XDP which allows
> >chunks to be placed at arbitrary places in the umem, as opposed to them
> >being required to be aligned to 2k. This allows for DPDK application
> >mempools to be mapped directly into the umem and in turn enable zero
> >copy transfer between umem and the PMD.
> >
> >This patch replaces the zero copy via external mbuf mechanism
> >introduced in commit e9ff8bb71943 ("net/af_xdp: enable zero copy by
> external mbuf").
> >The pmd_zero copy vdev argument is also removed as now the PMD will
> >auto-detect presence of the unaligned chunks feature and enable it if
> >so and otherwise fall back to copy mode if not detected.
> >
> >When enabled, this feature significantly improves single-core
> >performance of the PMD.
> >
> >Signed-off-by: Ciara Loftus 
> >Signed-off-by: Kevin Laatz 
> >---
> > doc/guides/nics/af_xdp.rst |   1 -
> > doc/guides/rel_notes/release_19_11.rst |   9 +
> > drivers/net/af_xdp/rte_eth_af_xdp.c| 304 ++---
> > 3 files changed, 231 insertions(+), 83 deletions(-)
> >
> >diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
> >index ec46f08f0..48dd788ac 100644
> >--- a/doc/guides/nics/af_xdp.rst
> >+++ b/doc/guides/nics/af_xdp.rst
> >@@ -35,7 +35,6 @@ The following options can be provided to set up an
> af_xdp port in DPDK.
> > *   ``iface`` - name of the Kernel interface to attach to (required);
> > *   ``start_queue`` - starting netdev queue id (optional, default 0);
> > *   ``queue_count`` - total netdev queue number (optional, default 1);
> >-*   ``pmd_zero_copy`` - enable zero copy or not (optional, default 0);
> >
> > Prerequisites
> > -
> >diff --git a/doc/guides/rel_notes/release_19_11.rst
> >b/doc/guides/rel_notes/release_19_11.rst
> >index 8490d897c..28a8e5372 100644
> >--- a/doc/guides/rel_notes/release_19_11.rst
> >+++ b/doc/guides/rel_notes/release_19_11.rst
> >@@ -56,6 +56,13 @@ New Features
> >  Also, make sure to start the actual text at the margin.
> >
> =
> >
> >+* **Updated the AF_XDP PMD.**
> >+
> >+  Updated the AF_XDP PMD. The new features include:
> >+
> >+  * Enabled zero copy between application mempools and UMEM by
> enabling the
> >+XDP_UMEM_UNALIGNED_CHUNKS UMEM flag.
> >+
> 
> Better to document the kernel dependency in the af_xdp.rst.

Will do.

> 
> >
> > Removed Items
> > -
> >@@ -69,6 +76,8 @@ Removed Items
> >Also, make sure to start the actual text at the margin.
> >
> =
> >
> >+* Removed AF_XDP pmd_zero copy vdev argument. Support is now auto-
> detected.
> >+
> >
> > API Changes
> > ---
> >diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> >b/drivers/net/af_xdp/rte_eth_af_xdp.c
> >index 41ed5b2af..7956d5778 100644
> >--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> >+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> [snip]
> > reserve_fill_queue(struct xsk_umem_info *umem, uint16_t reserve_size)
> >{
> > struct xsk_ring_prod *fq = &umem->fq;
> >-void *addrs[reserve_size];
> > uint32_t idx;
> > uint16_t i;
> >+#if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
> >+
> >+if (unlikely(!xsk_ring_prod__reserve(fq, reserve_size, &idx))) {
> >+AF_XDP_LOG(DEBUG, "Failed to reserve enough fq
> descs.\n");
> >+return -1;
> >+}
> >+
> >+for (i = 0; i < reserve_size; i++) {
> >+struct rte_mbuf *mbuf;
> >+__u64 *fq_addr;
> >+uint64_t addr;
> >+
> >+mbuf = rte_pktmbuf_alloc(umem->mb_pool);
> >+if (unlikely(mbuf == NULL))
> >+break;
> 
> If this rare case happens, not all of the reserved slots of the fq will be filled
> with proper mbuf addrs, yet we still call xsk_ring_prod__submit(fq,
> reserve_size) to let the kernel receive packets on those addrs, so something
> unexpected may happen.

Good catch. I'll fix this in the v2.
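
Off the top of my head, one way to avoid that window (a sketch of the idea only,
not necessarily what v2 will look like) is to allocate the whole batch up front
and only reserve fill queue slots once the allocation has succeeded:

	/* allocate all mbufs before touching the fill queue, so a failed
	 * allocation never leaves reserved-but-unfilled fq slots */
	struct rte_mbuf *bufs[reserve_size];

	if (rte_pktmbuf_alloc_bulk(umem->mb_pool, bufs, reserve_size) != 0) {
		AF_XDP_LOG(DEBUG, "Failed to get enough buffers for fq.\n");
		return -1;
	}

	if (unlikely(!xsk_ring_prod__reserve(fq, reserve_size, &idx))) {
		/* nothing was reserved, so just give the mbufs back */
		for (i = 0; i < reserve_size; i++)
			rte_pktmbuf_free(bufs[i]);
		return -1;
	}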

Thanks!
Ciara

> 
> Thanks,
> Xiaolong
> 
> >+


Re: [dpdk-dev] [PATCH] net/af_xdp: enable support for unaligned umem chunks

2019-09-02 Thread Loftus, Ciara
> > Hi Ciara,
> >
> > I haven't tried this patch but have a question.
> >
> > On Thu, Aug 29, 2019 at 8:04 AM Ciara Loftus 
> wrote:
> > >
> > > This patch enables the unaligned chunks feature for AF_XDP which
> > > allows chunks to be placed at arbitrary places in the umem, as
> > > opposed to them being required to be aligned to 2k. This allows for
> > > DPDK application mempools to be mapped directly into the umem and in
> > > turn enable zero copy transfer between umem and the PMD.
> > >
> > > This patch replaces the zero copy via external mbuf mechanism
> > > introduced in commit e9ff8bb71943 ("net/af_xdp: enable zero copy by
> > external mbuf").
> > > The pmd_zero copy vdev argument is also removed as now the PMD will
> > > auto-detect presence of the unaligned chunks feature and enable it
> > > if so and otherwise fall back to copy mode if not detected.
> > >
> > > When enabled, this feature significantly improves single-core
> > > performance of the PMD.
> >
> > Why using unaligned chunk feature improve performance?
> > Existing external mbuf already has zero copy between umem and PMD,
> and
> > your patch also does the same thing. So the improvement is from
> > somewhere else?
> 
> Hi William,
> 
> Good question.
> The external mbuf way indeed has zero copy however there's some
> additional complexity in that path in the management of the buf_ring.
> 
> For example on the fill/rx path, in the ext mbuf solution one must dequeue
> an addr from the buf_ring and add it to the fill queue, allocate an mbuf for
> the external mbuf, get a pointer to the data @ addr and attach the external
> mbuf. With the new solution, we allocate an mbuf from the mempool, derive
> the addr from the mbuf itself and add it to the fill queue, and then on rx we
> can simply cast the pointer to the data @ addr to an mbuf and return it to the
> user.
> On tx/complete, instead of dequeuing from the buf_ring to get a valid addr
> we can again just derive it from the mbuf itself.
> 
> I've performed some testing to compare the old vs new zc and found that for
> the case where the PMD and IRQs are pinned to separate cores the
> difference is ~-5%, but for single-core case where the PMD and IRQs are
> pinned to the same core (with the need_wakeup feature enabled), or when
> multiple PMDs are forwarding to one another the difference is significant.
> Please see below:
> 
> ports  queues/port  pinning  Δ old zc
> 1  1   0  -4.74%
> 1  1   1  17.99%
> 2  1   0  -5.62%
> 2  1   1  71.77%
> 1  2   0  114.24%
> 1  2   1  134.88%

Apologies, the last 4 figures above were comparing old memcpy vs zc. Corrected 
data set below:

ports  qs/port  pinning  Δ old zc
1  1   0  -4.74%
1  1   1  17.99%
2  1   0  -5.80%
2  1   1  37.24%
1  2   0  104.27%
1  2   1  136.73%

> 
> FYI the series has been now merged into the bpf-next tree:
> https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-
> next.git/commit/?id=bdb15a29cc28f8155e20f7fb58b60ffc452f2d1b
> 
> Thanks,
> Ciara
> 
> >
> > Thank you
> > William
> >
> > >
> > > Signed-off-by: Ciara Loftus 
> > > Signed-off-by: Kevin Laatz 
> > > ---
> > >  doc/guides/nics/af_xdp.rst |   1 -
> > >  doc/guides/rel_notes/release_19_11.rst |   9 +
> > >  drivers/net/af_xdp/rte_eth_af_xdp.c| 304 ++
> --
> > -
> > >  3 files changed, 231 insertions(+), 83 deletions(-)
> > >
> > 


Re: [dpdk-dev] [PATCH] net/af_xdp: enable support for unaligned umem chunks

2019-09-02 Thread Loftus, Ciara
> Hi Ciara,
> 
> I haven't tried this patch but have a question.
> 
> On Thu, Aug 29, 2019 at 8:04 AM Ciara Loftus  wrote:
> >
> > This patch enables the unaligned chunks feature for AF_XDP which
> > allows chunks to be placed at arbitrary places in the umem, as opposed
> > to them being required to be aligned to 2k. This allows for DPDK
> > application mempools to be mapped directly into the umem and in turn
> > enable zero copy transfer between umem and the PMD.
> >
> > This patch replaces the zero copy via external mbuf mechanism
> > introduced in commit e9ff8bb71943 ("net/af_xdp: enable zero copy by
> external mbuf").
> > The pmd_zero copy vdev argument is also removed as now the PMD will
> > auto-detect presence of the unaligned chunks feature and enable it if
> > so and otherwise fall back to copy mode if not detected.
> >
> > When enabled, this feature significantly improves single-core
> > performance of the PMD.
> 
> Why using unaligned chunk feature improve performance?
> Existing external mbuf already has zero copy between umem and PMD, and
> your patch also does the same thing. So the improvement is from
> somewhere else?

Hi William,

Good question.
The external mbuf way indeed has zero copy however there's some additional 
complexity in that path in the management of the buf_ring.

For example on the fill/rx path, in the ext mbuf solution one must dequeue an 
addr from the buf_ring and add it to the fill queue, allocate an mbuf for the 
external mbuf, get a pointer to the data @ addr and attach the external mbuf. 
With the new solution, we allocate an mbuf from the mempool, derive the addr 
from the mbuf itself and add it to the fill queue, and then on rx we can simply 
cast the pointer to the data @ addr to an mbuf and return it to the user.
On tx/complete, instead of dequeuing from the buf_ring to get a valid addr we 
can again just derive it from the mbuf itself.
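
To illustrate the fill-path idea above in code, a rough sketch (not the actual
driver code; umem->mb_pool and umem->buffer are assumed names for the mempool
backing the UMEM and the UMEM base address, and the headroom/header adjustments
the real driver has to make are ignored):

	static int
	fill_one_slot(struct xsk_umem_info *umem, struct xsk_ring_prod *fq,
		      uint32_t idx)
	{
		struct rte_mbuf *mbuf;
		uint64_t addr;

		mbuf = rte_pktmbuf_alloc(umem->mb_pool);
		if (mbuf == NULL)
			return -1;

		/* the umem address is derived from the mbuf itself: its offset
		 * within the umem area. No buf_ring lookup is needed. */
		addr = (uint64_t)mbuf - (uint64_t)umem->buffer;
		*xsk_ring_prod__fill_addr(fq, idx) = addr;

		return 0;
	}

On rx the reverse applies: the descriptor address plus the umem base gives back
the mbuf pointer directly, with no copy.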

I've performed some testing to compare the old vs new zc and found that for the 
case where the PMD and IRQs are pinned to separate cores the difference is 
~-5%, but for single-core case where the PMD and IRQs are pinned to the same 
core (with the need_wakeup feature enabled), or when multiple PMDs are 
forwarding to one another the difference is significant. Please see below:

ports  queues/port  pinning  Δ old zc
1  1   0  -4.74%
1  1   1  17.99%
2  1   0  -5.62%
2  1   1  71.77%
1  2   0  114.24%
1  2   1  134.88%

FYI the series has been now merged into the bpf-next tree:
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=bdb15a29cc28f8155e20f7fb58b60ffc452f2d1b

Thanks,
Ciara

> 
> Thank you
> William
> 
> >
> > Signed-off-by: Ciara Loftus 
> > Signed-off-by: Kevin Laatz 
> > ---
> >  doc/guides/nics/af_xdp.rst |   1 -
> >  doc/guides/rel_notes/release_19_11.rst |   9 +
> >  drivers/net/af_xdp/rte_eth_af_xdp.c| 304 ++--
> -
> >  3 files changed, 231 insertions(+), 83 deletions(-)
> >
> 


Re: [dpdk-dev] [PATCH] net/af_xdp: enable support for unaligned umem chunks

2019-08-30 Thread Loftus, Ciara
> 
> This patch enables the unaligned chunks feature for AF_XDP which allows
> chunks to be placed at arbitrary places in the umem, as opposed to them
> being required to be aligned to 2k. This allows for DPDK application
> mempools to be mapped directly into the umem and in turn enable zero
> copy transfer between umem and the PMD.
> 
> This patch replaces the zero copy via external mbuf mechanism introduced in
> commit e9ff8bb71943 ("net/af_xdp: enable zero copy by external mbuf").
> The pmd_zero copy vdev argument is also removed as now the PMD will
> auto-detect presence of the unaligned chunks feature and enable it if so and
> otherwise fall back to copy mode if not detected.
> 
> When enabled, this feature significantly improves single-core performance
> of the PMD.
> 
> Signed-off-by: Ciara Loftus 
> Signed-off-by: Kevin Laatz 
> ---

Apologies for omitting this detail from the original mail.
Those wishing to try out this feature first need to apply this series (currently 
under review) to their kernel tree:
https://lore.kernel.org/bpf/20190827022531.15060-1-kevin.la...@intel.com/T/#u

Thanks,
Ciara


Re: [dpdk-dev] [dpdk-users] Traffic doesn't forward on virtual devices

2018-07-11 Thread Loftus, Ciara
> > >
> > > Bala Sankaran  writes:
> > >
> > > > Perfect!
> > > >
> > > > Thanks for the help.
> > > >
> > > > - Original Message -
> > > >> From: "Keith Wiles" 
> > > >> To: "Bala Sankaran" 
> > > >> Cc: us...@dpdk.org, "Aaron Conole" 
> > > >> Sent: Thursday, July 5, 2018 11:41:46 AM
> > > >> Subject: Re: [dpdk-users] Traffic doesn't forward on virtual devices
> > > >>
> > > >>
> > > >>
> > > >> > On Jul 5, 2018, at 9:53 AM, Bala Sankaran 
> > > wrote:
> > > >> >
> > > >> > Greetings,
> > > >> >
> > > >> > I am currently using dpdk version 17.11.2. I see that there are a few
> > > other
> > > >> > revisions in 17.11.3, followed by the latest stable version of
> > > >> > 18.02.2.
> > > >> >
> > > >> > Based on the issues I have faced so far (see Original
> > > >> > Message below), would you suggest that  I go for
> > > >> > another version? If yes, which one? In essence, my question is,
> would
> > > >> > resorting to a different version of dpdk solve my current issue of
> > > >> > virtqueue id being invalid?
> > > >> >
> > > >> > Any help is much appreciated.
> > > >>
> > > >> From a support perspective using the latest version 18.05 or the long
> > > >> term
> > > >> supported version 17.11.3 is easier for most to help. I would pick the
> > > >> latest release 18.05 myself. As for fixing this problem I do not know.
> > > >> You
> > > >> can look into the MAINTAINERS file and find the maintainers of area(s)
> > > and
> > > >> include them in the CC line on your questions as sometimes they miss
> the
> > > >> emails as the volume can be high at times.
> > >
> > > Thanks Keith.
> > >
> > > I took a quick look and it seems like the queues are not setting up
> > > correctly between OvS and testpmd?  Probably there's a step missing
> > > somewhere, although nothing in either the netdev-dpdk.c from OvS nor
> the
> > > rte_ethdev was obvious to stand out to me.
> > >
> > > I've CC'd Maxime, Ian, and Ciara - maybe they have a better idea to try?
> >
> > Hi,
> >
> > I think the appropriate driver to use in this test on the test-pmd side 
> > might
> > be virtio-user.
> > Follow the same steps just change your vdev test-pmd arguments to:
> > --vdev='net_virtio_user0,path=/usr/local/var/run/openvswitch/vhu0'
> >
> > Thanks,
> > Ciara
> >
> 
> Thank you for your response.
> 
> I tried using virtio-user, but I face an error that says: Failed to prepare
> memory for vhost-user.
> The command I ran is as below:
> 
> [root@localhost openvswitch]# testpmd --socket-mem=1024 --
> vdev='net_virtio_user1,path=/usr/local/var/run/openvswitch/vhu1,server=
> 1'  --vdev='net_tap1,iface=tap1' --file-prefix page1 -- -i
> EAL: Detected 4 lcore(s)
> EAL: Detected 1 NUMA nodes
> EAL: Multi-process socket /var/run/dpdk/page1/mp_socket
> EAL: Probing VFIO support...
> EAL: WARNING: cpu flags constant_tsc=yes nonstop_tsc=no -> using
> unreliable clock cycles !
> rte_pmd_tap_probe(): Initializing pmd_tap for net_tap1 as tap1
> Interactive-mode selected
> Warning: NUMA should be configured manually by using --port-numa-config
> and --ring-numa-config parameters along with --numa.
> testpmd: create a new mbuf pool : n=171456,
> size=2176, socket=0
> testpmd: preferred mempool ops selected: ring_mp_mc
> Configuring Port 0 (socket 0)
> virtio_user_server_reconnect(): WARNING: Some features 0x1801 are not
> supported by vhost-user!
> get_hugepage_file_info(): Exceed maximum of 8
> prepare_vhost_memory_user(): Failed to prepare memory for vhost-user
> Port 0: DA:60:01:0C:4B:29
> Configuring Port 1 (socket 0)
> Port 1: D2:5A:94:68:AF:B3
> Checking link statuses...
> 
> Port 0: LSC event
> Done
> 
> I tried increasing the socket-memory, I checked /proc/meminfo and found
> there were over
> 1280 free hugepages.
> So my understanding is that this is not an issue where I don't have enough
> hugepages.
> 
> Can you provide leads on what's wrong here?

Hi,

The limitations section for Virtio User 
https://doc.dpdk.org/guides/howto/virtio_user_for_container_networking.html#limitations
 states that:

" Cannot work when there are more than VHOST_MEMORY_MAX_NREGIONS(8) hugepages. 
If you have more regions (especially when 2MB hugepages are used), the option, 
--single-file-segments, can help to reduce the number of shared files."

Suggest using the --single-file-segments option on the test-pmd command line or 
failing that increasing your hugepage size to 1G 
https://doc.dpdk.org/guides/linux_gsg/sys_reqs.html#use-of-hugepages-in-the-linux-environment
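
For example (untested here, simply your earlier command with the extra EAL option added):

testpmd --socket-mem=1024 --single-file-segments --vdev='net_virtio_user1,path=/usr/local/var/run/openvswitch/vhu1,server=1' --vdev='net_tap1,iface=tap1' --file-prefix page1 -- -i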

Thanks,
Ciara

> 
> > >
> > > >> >
> > > >> > Thanks
> > > >> >
> > > >> > - Original Message -
> > > >> >> From: "Bala Sankaran" 
> > > >> >> To: us...@dpdk.org
> > > >> >> Cc: "Aaron Conole" 
> > > >> >> Sent: Thursday, June 28, 2018 3:18:13 PM
> > > >> >> Subject: Traffic doesn't forward on virtual devices
> > > >> >>
> > > >> >>
> > > >> >> Hello team,
> > > >> >>
> > > >> >> I am working on a project to do PVP tests on dpdk. As a first step, 
> > > >> >> I
> > > >> >> would
> > > >> >> 

Re: [dpdk-dev] [dpdk-users] Traffic doesn't forward on virtual devices

2018-07-10 Thread Loftus, Ciara
> 
> Bala Sankaran  writes:
> 
> > Perfect!
> >
> > Thanks for the help.
> >
> > - Original Message -
> >> From: "Keith Wiles" 
> >> To: "Bala Sankaran" 
> >> Cc: us...@dpdk.org, "Aaron Conole" 
> >> Sent: Thursday, July 5, 2018 11:41:46 AM
> >> Subject: Re: [dpdk-users] Traffic doesn't forward on virtual devices
> >>
> >>
> >>
> >> > On Jul 5, 2018, at 9:53 AM, Bala Sankaran 
> wrote:
> >> >
> >> > Greetings,
> >> >
> >> > I am currently using dpdk version 17.11.2. I see that there are a few
> other
> >> > revisions in 17.11.3, followed by the latest stable version of 18.02.2.
> >> >
> >> > Based on the issues I have faced so far (see Original
> >> > Message below), would you suggest that  I go for
> >> > another version? If yes, which one? In essence, my question is, would
> >> > resorting to a different version of dpdk solve my current issue of
> >> > virtqueue id being invalid?
> >> >
> >> > Any help is much appreciated.
> >>
> >> From a support perspective using the latest version 18.05 or the long term
> >> supported version 17.11.3 is easier for most to help. I would pick the
> >> latest release 18.05 myself. As for fixing this problem I do not know. You
> >> can look into the MAINTAINERS file and find the maintainers of area(s)
> and
> >> include them in the CC line on your questions as sometimes they miss the
> >> emails as the volume can be high at times.
> 
> Thanks Keith.
> 
> I took a quick look and it seems like the queues are not setting up
> correctly between OvS and testpmd?  Probably there's a step missing
> somewhere, although nothing in either the netdev-dpdk.c from OvS nor the
> rte_ethdev was obvious to stand out to me.
> 
> I've CC'd Maxime, Ian, and Ciara - maybe they have a better idea to try?

Hi,

I think the appropriate driver to use in this test on the test-pmd side might 
be virtio-user.
Follow the same steps just change your vdev test-pmd arguments to:
--vdev='net_virtio_user0,path=/usr/local/var/run/openvswitch/vhu0'

Thanks,
Ciara

> 
> >> >
> >> > Thanks
> >> >
> >> > - Original Message -
> >> >> From: "Bala Sankaran" 
> >> >> To: us...@dpdk.org
> >> >> Cc: "Aaron Conole" 
> >> >> Sent: Thursday, June 28, 2018 3:18:13 PM
> >> >> Subject: Traffic doesn't forward on virtual devices
> >> >>
> >> >>
> >> >> Hello team,
> >> >>
> >> >> I am working on a project to do PVP tests on dpdk. As a first step, I
> >> >> would
> >> >> like to get traffic flow between tap devices. I'm in process of setting 
> >> >> up
> >> >> the architecture, in which I've used testpmd to forward traffic
> between
> >> >> two
> >> >> virtual devices(tap and vhost users) over a bridge.
> >> >>
> >> >> While I'm at it, I've identified that the internal dev_attached flag 
> >> >> never
> >> >> gets set to 1 from the rte_eth_vhost.c file. I've tried to manually set 
> >> >> it
> >> >> to 1 in the start routine, but I just see that the queue index being
> >> >> referenced is out of range.
> >> >>
> >> >> I'm not sure how to proceed.  Has anyone had luck using testpmd to
> >> >> communicate with vhost-user devices?  If yes, any hints on a
> workaround?
> >> >>
> >> >> Here's how I configured my setup after installing dpdk and
> openvswitch:
> >> >>
> >> >> 1. To start ovs-ctl:
> >> >> /usr/local/share/openvswitch/scripts/ovs-ctl start
> >> >>
> >> >> 2. Setup hugepages:
> >> >> echo '2048' > /proc/sys/vm/nr_hugepages
> >> >>
> >> >> 3. Add a new network namespace:
> >> >> ip netns add ns1
> >> >>
> >> >> 4. Add and set a bridge:
> >> >> ovs-vsctl add-br dpdkbr0 -- set Bridge dpdkbr0 datapath_type=netdev
> >> >> options:vhost-server-path=/usr/local/var/run/openvswitch/vhu0
> >> >> ovs-vsctl show
> >> >>
> >> >> 5. Add a vhost user to the bridge created:
> >> >> ovs-vsctl add-port dpdkbr0 vhu0 -- set Interface vhu0
> >> >> type=dpdkvhostuserclient
> >> >>
> >> >> 6. Execute bash on the network namespace:
> >> >> ip netns exec ns1 bash
> >> >>
> >> >> 7. Use testpmd and connect the namespaces:
> >> >> testpmd --socket-mem=512
> >> >> --
> vdev='eth_vhost0,iface=/usr/local/var/run/openvswitch/vhu0,queues=1'
> >> >> --vdev='net_tap0,iface=tap0' --file-prefix page0 -- -i
> >> >>
> >> >>
> >> >> I repeated steps 3 - 7 for another network namespace on the same
> bridge.
> >> >> Following this, in fresh terminals, I assigned IP addresses to the tap
> >> >> devices created and tried pinging them. From port statistics,
> >> >> I identified the above mentioned issue with the dev_attached and
> queue
> >> >> statistics.
> >> >>
> >> >> I would greatly appreciate any help from your end.
> >> >>
> >> >> Thanks.
> >> >>
> >> >> -
> >> >> Bala Sankaran
> >> >> Networking Services Intern
> >> >> Red Hat Inc .,
> >> >>
> >> > -
> >> > Bala Sankaran
> >> > Networking Services Intern
> >>
> >> Regards,
> >> Keith
> >>
> >>
> >
> > --
> > Bala Sankaran
> > Networking Servic

Re: [dpdk-dev] [PATCH] net/vhost: Initialise vid to -1

2018-05-03 Thread Loftus, Ciara
> 
> >
> > On 04/27/2018 04:19 PM, Ciara Loftus wrote:
> > > rte_eth_vhost_get_vid_from_port_id returns a value of 0 if called before
> > > the first call to the new_device callback. A vid value >=0 suggests the
> > > device is active which is not the case in this instance. Initialise vid
> > > to a negative value to prevent this.
> > >
> > > Signed-off-by: Ciara Loftus 
> > > ---
> > >   drivers/net/vhost/rte_eth_vhost.c | 1 +
> > >   1 file changed, 1 insertion(+)
> > >
> > > diff --git a/drivers/net/vhost/rte_eth_vhost.c
> > b/drivers/net/vhost/rte_eth_vhost.c
> > > index 99a7727..f47950c 100644
> > > --- a/drivers/net/vhost/rte_eth_vhost.c
> > > +++ b/drivers/net/vhost/rte_eth_vhost.c
> > > @@ -1051,6 +1051,7 @@ eth_rx_queue_setup(struct rte_eth_dev *dev,
> > uint16_t rx_queue_id,
> > >   return -ENOMEM;
> > >   }
> > >
> > > + vq->vid = -1;
> > >   vq->mb_pool = mb_pool;
> > >   vq->virtqueue_id = rx_queue_id * VIRTIO_QNUM + VIRTIO_TXQ;
> > >   dev->data->rx_queues[rx_queue_id] = vq;
> > >
> >
> > Reviewed-by: Maxime Coquelin 
> >
> > Thanks,
> > Maxime
> 
> On second thoughts, self-NACK.
> 
> We need to provision for the case where we want to call
> eth_rx_queue_setup AFTER new_device. For instance when we want to
> change the mb_pool. In this case we need to maintain the same vid and not
> reset it to -1.
> 
> Without this patch the original problem still exists and we need to find an
> alternative workaround.

Junjie's patches fix the issue I was observing. Thanks Junjie!
https://dpdk.org/browse/dpdk/commit/?id=30a701a53737a0b6f7953412cc3b3d36c1d49122
https://dpdk.org/browse/dpdk/commit/?id=e6722dee533cda3756fbc5c9ea4ddfbf30276f1b

Along with the v2 of this patch could they be considered for the 17.11 stable 
branch?

Thanks,
Ciara

> 
> Thanks,
> Ciara


Re: [dpdk-dev] [PATCH] net/vhost: Initialise vid to -1

2018-04-30 Thread Loftus, Ciara
> 
> On 04/27/2018 04:19 PM, Ciara Loftus wrote:
> > rte_eth_vhost_get_vid_from_port_id returns a value of 0 if called before
> > the first call to the new_device callback. A vid value >=0 suggests the
> > device is active which is not the case in this instance. Initialise vid
> > to a negative value to prevent this.
> >
> > Signed-off-by: Ciara Loftus 
> > ---
> >   drivers/net/vhost/rte_eth_vhost.c | 1 +
> >   1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/net/vhost/rte_eth_vhost.c
> b/drivers/net/vhost/rte_eth_vhost.c
> > index 99a7727..f47950c 100644
> > --- a/drivers/net/vhost/rte_eth_vhost.c
> > +++ b/drivers/net/vhost/rte_eth_vhost.c
> > @@ -1051,6 +1051,7 @@ eth_rx_queue_setup(struct rte_eth_dev *dev,
> uint16_t rx_queue_id,
> > return -ENOMEM;
> > }
> >
> > +   vq->vid = -1;
> > vq->mb_pool = mb_pool;
> > vq->virtqueue_id = rx_queue_id * VIRTIO_QNUM + VIRTIO_TXQ;
> > dev->data->rx_queues[rx_queue_id] = vq;
> >
> 
> Reviewed-by: Maxime Coquelin 
> 
> Thanks,
> Maxime

On second thoughts, self-NACK.

We need to provision for the case where we want to call eth_rx_queue_setup 
AFTER new_device. For instance when we want to change the mb_pool. In this case 
we need to maintain the same vid and not reset it to -1.

Without this patch the original problem still exists and we need to find an 
alternative workaround.

Thanks,
Ciara


Re: [dpdk-dev] About : Enable optional dequeue zero copy for vHost User

2018-01-23 Thread Loftus, Ciara
Hi,

For the meantime this feature is proposed as ‘experimental’ for OVS DPDK.
Unless you are transmitting to a NIC, you don’t need to set the n_txq_desc.
My testing has been only with a DPDK driver in the guest. Have you tried that 
option?

Thanks,
Ciara

From: liyang07 [mailto:liyan...@corp.netease.com]
Sent: Wednesday, January 17, 2018 10:41 AM
To: Loftus, Ciara 
Cc: dev@dpdk.org
Subject: About : Enable optional dequeue zero copy for vHost User


Hi Ciara,

I am testing the "vHost dequeue zero copy" feature for vm2vm on a host, 
and I have some problems:

1. The networking is OK before running iperf; I can ping successfully from vm1 to 
vm2, but after running iperf the networking between vm1 and vm2 is down (I think 
n_txq_desc causes the problem);

2. I know the limitation about n_txq_desc, but I cannot set n_txq_desc 
for a dpdk port while the VMs are on the same host, because no dpdk ports are 
involved in my testing;



Thus, how can I resolve it? Thanks.






Re: [dpdk-dev] [PATCH] vhost: fix dequeue zero copy not work with virtio1

2017-12-15 Thread Loftus, Ciara
> 
> Hi Junjie,
> 
> On 12/13/2017 05:50 PM, Junjie Chen wrote:
> > > This fixes dequeue zero copy not working with QEMU
> > > version >= 2.7. Since QEMU 2.7 the virtio device
> > > uses the virtio-1 protocol, and the zero copy code path
> > > forgot to add the offset to the buffer address.
> >
> > Signed-off-by: Junjie Chen 
> > ---
> >   lib/librte_vhost/virtio_net.c | 3 ++-
> >   1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
> > index 6fee16e..79d80f7 100644
> > --- a/lib/librte_vhost/virtio_net.c
> > +++ b/lib/librte_vhost/virtio_net.c
> > @@ -977,7 +977,8 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct
> vhost_virtqueue *vq,
> > desc->addr + desc_offset,
> cpy_len {
> > cur->data_len = cpy_len;
> > cur->data_off = 0;
> > -   cur->buf_addr = (void *)(uintptr_t)desc_addr;
> > +   cur->buf_addr = (void *)(uintptr_t)(desc_addr
> > +   + desc_offset);
> > cur->buf_iova = hpa;
> >
> > /*
> >
> 
> Thanks for fixing this.
> 
> Reviewed-by: Maxime Coquelin 
> 
> Maxime

Thanks for the fix. Can this be considered for the stable branch?

Thanks,
Ciara


Re: [dpdk-dev] [PATCH] vhost: fix crash on NUMA

2017-06-02 Thread Loftus, Ciara
> The queue allocation was changed from allocating one queue-pair at a
> time to one queue at a time. Most of the changes have been done, but
> one was missed: the size used when copying the old queue is still
> based on a queue-pair at numa_realloc(), which leads to an overwrite issue.
> As a result, a crash may happen.
> 
> Fix it by specifying the right copy size. Also, the net queue macros
> are not used any more. Remove them.
> 
> Fixes: ab4d7b9f1afc ("vhost: turn queue pair to vring")
> 
> Cc: sta...@dpdk.org
> Reported-by: Ciara Loftus 
> Signed-off-by: Yuanhan Liu 

Tested-by: Ciara Loftus 

> ---
>  lib/librte_vhost/vhost_user.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> index 5c8058b..e486b78 100644
> --- a/lib/librte_vhost/vhost_user.c
> +++ b/lib/librte_vhost/vhost_user.c
> @@ -238,8 +238,6 @@ numa_realloc(struct virtio_net *dev, int index)
>   struct vhost_virtqueue *old_vq, *vq;
>   int ret;
> 
> - enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM};
> -
>   old_dev = dev;
>   vq = old_vq = dev->virtqueue[index];
> 
> @@ -261,7 +259,7 @@ numa_realloc(struct virtio_net *dev, int index)
>   if (!vq)
>   return dev;
> 
> - memcpy(vq, old_vq, sizeof(*vq) * VIRTIO_QNUM);
> + memcpy(vq, old_vq, sizeof(*vq));
>   rte_free(old_vq);
>   }
> 
> --
> 2.8.1



Re: [dpdk-dev] [PATCH] vhost: support rx_queue_count

2017-05-23 Thread Loftus, Ciara
> 
> This patch implements the ops rx_queue_count for vhost PMD by adding
> a helper function rte_vhost_rx_queue_count in vhost lib.
> 
> The ops rx_queue_count gets the vhost RX queue avail count and helps
> to understand the queue fill level.
> 
> Signed-off-by: Zhihong Wang 
> ---
>  drivers/net/vhost/rte_eth_vhost.c  | 13 +
>  lib/librte_vhost/rte_vhost.h   | 12 
>  lib/librte_vhost/rte_vhost_version.map |  7 +++
>  lib/librte_vhost/vhost.c   | 23 +++
>  4 files changed, 55 insertions(+)
> 
> diff --git a/drivers/net/vhost/rte_eth_vhost.c
> b/drivers/net/vhost/rte_eth_vhost.c
> index 257bf6d..e3a3fe0 100644
> --- a/drivers/net/vhost/rte_eth_vhost.c
> +++ b/drivers/net/vhost/rte_eth_vhost.c
> @@ -973,6 +973,18 @@ eth_link_update(struct rte_eth_dev *dev
> __rte_unused,
>   return 0;
>  }
> 
> +static uint32_t
> +eth_rx_queue_count(struct rte_eth_dev *dev, uint16_t rx_queue_id)
> +{
> + struct vhost_queue *vq;
> +
> + vq = dev->data->rx_queues[rx_queue_id];
> + if (!vq)
> + return 0;
> +
> + return rte_vhost_rx_queue_count(vq->vid, vq->virtqueue_id);
> +}
> +
>  static const struct eth_dev_ops ops = {
>   .dev_start = eth_dev_start,
>   .dev_stop = eth_dev_stop,
> @@ -984,6 +996,7 @@ static const struct eth_dev_ops ops = {
>   .rx_queue_release = eth_queue_release,
>   .tx_queue_release = eth_queue_release,
>   .tx_done_cleanup = eth_tx_done_cleanup,
> + .rx_queue_count = eth_rx_queue_count,
>   .link_update = eth_link_update,
>   .stats_get = eth_stats_get,
>   .stats_reset = eth_stats_reset,
> diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
> index 605e47c..f64ed20 100644
> --- a/lib/librte_vhost/rte_vhost.h
> +++ b/lib/librte_vhost/rte_vhost.h
> @@ -432,6 +432,18 @@ int rte_vhost_get_mem_table(int vid, struct
> rte_vhost_memory **mem);
>  int rte_vhost_get_vhost_vring(int vid, uint16_t vring_idx,
> struct rte_vhost_vring *vring);
> 
> +/**
> + * Get vhost RX queue avail count.
> + *
> + * @param vid
> + *  vhost device ID
> + * @param qid
> + *  virtio queue index in mq case
> + * @return
> + *  num of desc available
> + */
> +uint32_t rte_vhost_rx_queue_count(int vid, uint16_t qid);
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/librte_vhost/rte_vhost_version.map
> b/lib/librte_vhost/rte_vhost_version.map
> index 0785873..1e70495 100644
> --- a/lib/librte_vhost/rte_vhost_version.map
> +++ b/lib/librte_vhost/rte_vhost_version.map
> @@ -45,3 +45,10 @@ DPDK_17.05 {
>   rte_vhost_log_write;
> 
>  } DPDK_16.07;
> +
> +DPDK_17.08 {
> + global:
> +
> + rte_vhost_rx_queue_count;
> +
> +} DPDK_17.05;
> diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
> index 0b19d2e..140d2ae 100644
> --- a/lib/librte_vhost/vhost.c
> +++ b/lib/librte_vhost/vhost.c
> @@ -475,3 +475,26 @@ rte_vhost_log_used_vring(int vid, uint16_t
> vring_idx,
> 
>   vhost_log_used_vring(dev, vq, offset, len);
>  }
> +
> +uint32_t
> +rte_vhost_rx_queue_count(int vid, uint16_t qid)
> +{
> + struct virtio_net *dev;
> + struct vhost_virtqueue *vq;
> +
> + dev = get_device(vid);
> + if (!dev)
> + return 0;
> +
> + if (unlikely(qid >= dev->nr_vring || (qid & 1) == 0)) {

I assume the '& 1' is to ensure it's a virtio txq? It might be clearer to use 
the VIRTIO_TXQ macro or something similar.
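
Something along these lines perhaps (sketch only, assuming VIRTIO_TXQ and 
VIRTIO_QNUM are visible in vhost.c):

	if (unlikely(qid >= dev->nr_vring || (qid % VIRTIO_QNUM) != VIRTIO_TXQ)) {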

> + RTE_LOG(ERR, VHOST_DATA, "(%d) %s: invalid virtqueue idx
> %d.\n",
> + dev->vid, __func__, qid);
> + return 0;
> + }
> +
> + vq = dev->virtqueue[qid];
> + if (unlikely(vq->enabled == 0))
> + return 0;
> +
> + return *((volatile uint16_t *)&vq->avail->idx) - vq->last_avail_idx;
> +}
> --
> 2.7.4


Acked-by: Ciara Loftus 


Re: [dpdk-dev] [PATCH] vhost: fix MQ fails to startup

2017-04-27 Thread Loftus, Ciara
> 
> vhost since DPDK 17.02 + QEMU 2.7 and above will cause failures of
> new connections when negotiating to set MQ (one queue pair works
> well), because there are some bugs in the QEMU code introducing
> VHOST_USER_PROTOCOL_F_REPLY_ACK. When dealing with the vhost
> message VHOST_USER_SET_MEM_TABLE for the second time, QEMU indeed
> doesn't send the message (it needs to be sent only once) but
> still waits for DPDK's reply ack, so QEMU freezes indefinitely.
> The DPDK code works in the right way, but the feature
> VHOST_USER_PROTOCOL_F_REPLY_ACK has to be disabled by default on the
> DPDK side in order to avoid DPDK and QEMU supporting the feature at
> the same time. With that done, MQ works well. Once the QEMU bugs
> have been fixed and upstreamed, we can enable it again.
> 
> Fixes: 73c8f9f69c6c ("vhost: introduce reply ack feature")
> 
> Reported-by: Loftus, Ciara 
> Signed-off-by: Zhiyong Yang 

Thanks for the fix Zhiyong. I tested the patch in my environment and it 
resolves the issue I was seeing.

Tested-by: Ciara Loftus 

Thanks,
Ciara

> ---
>  lib/librte_vhost/vhost_user.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
> index 2ba22db..a3d2900 100644
> --- a/lib/librte_vhost/vhost_user.h
> +++ b/lib/librte_vhost/vhost_user.h
> @@ -52,7 +52,7 @@
>  #define VHOST_USER_PROTOCOL_FEATURES ((1ULL <<
> VHOST_USER_PROTOCOL_F_MQ) | \
>(1ULL <<
> VHOST_USER_PROTOCOL_F_LOG_SHMFD) |\
>(1ULL <<
> VHOST_USER_PROTOCOL_F_RARP) | \
> -  (1ULL <<
> VHOST_USER_PROTOCOL_F_REPLY_ACK) | \
> +  (0ULL <<
> VHOST_USER_PROTOCOL_F_REPLY_ACK) | \
>(1ULL <<
> VHOST_USER_PROTOCOL_F_NET_MTU))
> 
>  typedef enum VhostUserRequest {
> --
> 2.7.4



[dpdk-dev] [PATCH v7 2/2] net/vhost: add pmd xstats

2016-09-29 Thread Loftus, Ciara
> 
> This feature adds vhost pmd extended statistics from a per-port perspective
> in order to meet the requirements of applications such as OVS.
> 
> The statistics counters are based on RFC 2819 and RFC 2863 as follows:
> 
> rx/tx_good_packets
> rx/tx_total_bytes
> rx/tx_missed_pkts
> rx/tx_broadcast_packets
> rx/tx_multicast_packets
> rx/tx_unicast_packets
> rx/tx_undersize_errors
> rx/tx_size_64_packets
> rx/tx_size_65_to_127_packets;
> rx/tx_size_128_to_255_packets;
> rx/tx_size_256_to_511_packets;
> rx/tx_size_512_to_1023_packets;
> rx/tx_size_1024_to_1522_packets;
> rx/tx_1523_to_max_packets;
> rx/tx_errors
> rx_fragmented_errors
> rx_jabber_errors
> rx_unknown_protos_packets;
> 
> No API is changed or added.
> rte_eth_xstats_get_names() to retrieve what kinds of vhost xstats are
> supported,
> rte_eth_xstats_get() to retrieve vhost extended statistics,
> rte_eth_xstats_reset() to reset vhost extended statistics.
> 
> The usage of vhost pmd xstats is the same as virtio pmd xstats.
> for example, when test-pmd application is running in interactive mode
> vhost pmd xstats will support the two following commands:
> 
> show port xstats all | port_id will show vhost xstats
> clear port xstats all | port_id will reset vhost xstats
> 
> net/virtio pmd xstats(the function virtio_update_packet_stats) is used
> as reference when implementing the feature.
> 
> Signed-off-by: Zhiyong Yang 
> ---
> 
> Changes in V7:
> 
> Removed the "_portX" prepend to the xstat names. Keep vhost xstats name
> consistent with physical NIC i40e, ixgbe, etc.
> 
> Changes in V6:
> 
> 1. Change xstats from per queue to per port. Keep vhost consistent with
> physical NIC i40e, ixgbe, etc.
> 2. Added the release note.
> 
> Changes in V5:
> for vhost_count_multicast_broadcast, passing struct rte_mbuf *buf instead
> of struct rte_mbuf **buf and remove the 3th parameter uint16_t count;.
> 
> Changes in v4:
> 1. add a member VHOST_XSTATS_MAX in enum vhost_xstats_pkts, So, we
> can
> define uint64_t xstats[VHOST_XSTATS_MAX]; instead of xstats[16].
> 2. restore unicast_packets and update it in the function
> vhost_dev_xstats_get
> 3. move the loop out of function vhost_count_multicast_broadcast in order
> to reduce the computation.
> 
> Changes in v3:
> 1. rework the vhost_update_packet_xstats and separate it into two parts.
>One function deals with the generic packets update, another one deals
>with increasing the broadcast and multicast with failure packets sent
>according to RFC2863 page42 ifHCOutMulticastPkts ifHCOutBroadcastPkts.
> 2. define enum vhost_stat_pkts to replace the magic numbers and enhance
>the code readability.
> 3. remove some unnecessary type casts and fix one format issue.
> 
> Changes in v2:
> 1. remove the compiling switch.
> 2. fix two code bugs.
> 
>  doc/guides/rel_notes/release_16_11.rst |   4 +
>  drivers/net/vhost/rte_eth_vhost.c  | 276
> -
>  2 files changed, 275 insertions(+), 5 deletions(-)
> 
> diff --git a/doc/guides/rel_notes/release_16_11.rst
> b/doc/guides/rel_notes/release_16_11.rst
> index 66916af..ae90baf 100644
> --- a/doc/guides/rel_notes/release_16_11.rst
> +++ b/doc/guides/rel_notes/release_16_11.rst
> @@ -36,6 +36,10 @@ New Features
> 
>   This section is a comment. Make sure to start the actual text at the
> margin.
> 
> +* **Added vhost pmd xstats support.**
> +
> +  Added vhost pmd extended statistics from per port perspective.
> +
> 
>  Resolved Issues
>  ---
> diff --git a/drivers/net/vhost/rte_eth_vhost.c
> b/drivers/net/vhost/rte_eth_vhost.c
> index d99d4ee..ef7b037 100644
> --- a/drivers/net/vhost/rte_eth_vhost.c
> +++ b/drivers/net/vhost/rte_eth_vhost.c
> @@ -72,10 +72,30 @@ static struct ether_addr base_eth_addr = {
>   }
>  };
> 
> +enum vhost_xstats_pkts {
> + VHOST_UNDERSIZE_PKT = 0,
> + VHOST_64_PKT,
> + VHOST_65_TO_127_PKT,
> + VHOST_128_TO_255_PKT,
> + VHOST_256_TO_511_PKT,
> + VHOST_512_TO_1023_PKT,
> + VHOST_1024_TO_1522_PKT,
> + VHOST_1523_TO_MAX_PKT,
> + VHOST_BROADCAST_PKT,
> + VHOST_MULTICAST_PKT,
> + VHOST_UNICAST_PKT,
> + VHOST_ERRORS_PKT,
> + VHOST_ERRORS_FRAGMENTED,
> + VHOST_ERRORS_JABBER,
> + VHOST_UNKNOWN_PROTOCOL,
> + VHOST_XSTATS_MAX,
> +};
> +
>  struct vhost_stats {
>   uint64_t pkts;
>   uint64_t bytes;
>   uint64_t missed_pkts;
> + uint64_t xstats[VHOST_XSTATS_MAX];
>  };
> 
>  struct vhost_queue {
> @@ -86,11 +106,7 @@ struct vhost_queue {
>   struct rte_mempool *mb_pool;
>   uint8_t port;
>   uint16_t virtqueue_id;
> - uint64_t rx_pkts;
> - uint64_t tx_pkts;
> - uint64_t missed_pkts;
> - uint64_t rx_bytes;
> - uint64_t tx_bytes;
> + struct vhost_stats stats;
>  };
> 
>  struct pmd_internal {
> @@ -133,6 +149,242 @@ struct rte_vhost_vring_state {
> 
>  static struct rte_vhost_vring_state *vring_states[RTE_MAX_ETHPORTS];
> 
> +#define VHOST_XSTATS_NAME_SIZE 64
> +
> +str

[dpdk-dev] [PATCH] net/vhost: Add function to retreive the 'vid' for a given port id

2016-09-23 Thread Loftus, Ciara
> On Fri, Sep 23, 2016 at 10:43:20AM +0200, Thomas Monjalon wrote:
> > 2016-09-23 12:26, Yuanhan Liu:
> > > On Thu, Sep 22, 2016 at 06:43:55PM +0200, Thomas Monjalon wrote:
> > > > > > > > > > There could be a similar need in other PMD.
> > > > > > > > > > If we can get an opaque identifier of the device which is 
> > > > > > > > > > not
> the port id,
> > > > > > > > > > we could call some specific functions of the driver not
> implemented in
> > > > > > > > > > the generic ethdev API.
> > > > > > > > >
> > > > > > > > > That means you have to add/export the PMD API first. Isn't it
> against what
> > > > > > > > > you are proposing -- "I think we should not add any API to the
> PMDs" ;)
> > > > > > > >
> > > > > > > > Yes you are totally right :)
> > > > > > > > Except that in vhost case, we would not have any API in the
> PMD.
> > > > > > > > But it would allow to have some specific API in other PMDs for
> the features
> > > > > > > > which do not fit in a generic API.
> > > > > > >
> > > > > > > So, does that mean you are okay with this patch now? I mean,
> okay to introduce
> > > > > > > a vhost PMD API?
> > > > > >
> > > > > > It means I would be in favor of introducing API in drivers for very
> specific
> > > > > > features.
> > > > > > In this case, I am not sure that retrieving an internal id is very
> specific.
> > > > >
> > > > > It's not, instead, it's very generic. The "internal id" is actually 
> > > > > the
> > > > > public interface to vhost-user application, like "fd" to file APIs.
> > > > >
> > > > > Instead of introducing a few specific wrappers/APIs, I'd prefer to
> > > > > introduce a generic one to get the handle, and let the application to
> > > > > call other vhost APIs.
> > > >
> > > > Yes it makes sense.
> > > > I was thinking of introducing a function to get an internal id from
> ethdev,
> > > > in order to use it with any driver or underlying library.
> > > > But it would be an opaque pointer and you need an int.
> > > > Note that we can cast an int into a pointer, so I am not sure what is
> best.
> > >
> > > Yes, that should work. But I just doubt what the "opaque pointer" could
> be
> > > for other PMD drivers, and what the application could do with it. For a
> > > typical nic PMD driver, I can think of nothing is valuable to export to
> > > user applications.
> > >
> > > But maybe it's valuable to other virtual PMD drives as well, like the TAP
> > > pmd from Keith?
> > >
> > > If so, we may go that way.
> >
> > I would like to have more opinions/votes before proceeding.
> 
> Sure, fair enough. There is no rush.

My hope would be to have this, or at least some way to access 
rte_vhost_get_queue_num(vid) from the PMD, in 16.11. We can't integrate the PMD 
into OVS until we achieve this. Is this likely at this stage given the 
uncertainty around the API?

Thanks,
Ciara

> 
> > > Another thought is that, it may be a bit weird to me to introduce an API
> > > to get an opaque pointer. I mean, it's a bit hard to document it, because
> > > it has different meaning for different drivers. Should we list all of
> > > them then?
> >
> > I think it can be documented in API using this handler how it can
> > be retrieved. In your case, the vhost lib can explain that the vid
> > is retrieved from the PMD with this generic ethdev function.
> 
> Okay.
> 
>   --yliu


[dpdk-dev] [PATCH] doc: deprecate vhost-cuse

2016-07-21 Thread Loftus, Ciara
> Subject: [dpdk-dev] [PATCH] doc: deprecate vhost-cuse
> 
> Vhost-cuse was invented before vhost-user existed. Both are actually
> doing the same thing: a vhost-net implementation in user space. But they
> are not exactly the same thing.
> 
> Firstly, vhost-cuse is harder to use; no one seems to care about it, either.
> Furthermore, since v2.1, a large majority of development effort has gone
> to vhost-user. For example, we extended the vhost-user spec to add the
> multiple queue support. We also added the vhost-user live migration at
> v16.04 and the latest one, vhost-user reconnect that allows vhost app
> restart without restarting the guest. Both of them are very important
> features for product usage and none of them works for vhost-cuse.
> 
> You can now see that the difference between vhost-user and vhost-cuse is
> big (and will only grow as time moves forward), that you
> should never use vhost-cuse, and that we should drop it completely.
> 
> The removal would also result in a much cleaner code base, allowing us
> to do all kinds of extending easier.
> 
> So this marks vhost-cuse as deprecated in this release; it will be
> removed in the next release (v16.11).
> 
> Signed-off-by: Yuanhan Liu 
> ---
>  doc/guides/rel_notes/deprecation.rst | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst
> b/doc/guides/rel_notes/deprecation.rst
> index f502f86..ee99558 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -41,3 +41,7 @@ Deprecation Notices
>  * The mempool functions for single/multi producer/consumer are
> deprecated and
>will be removed in 16.11.
>It is replaced by rte_mempool_generic_get/put functions.
> +
> +* The vhost-cuse will be removed in 16.11. Since v2.1, a large majority of
> +  development effort has gone to vhost-user, such as multiple-queue, live
> +  migration, reconnect etc. Therefore, vhost-user should be used instead.
> --
> 1.9.0

Acked-by: Ciara Loftus 



[dpdk-dev] [PATCH] vhost: fix missing flag reset on stop

2016-06-29 Thread Loftus, Ciara
> 
> Commit 550c9d27d143 ("vhost: set/reset device flags internally") moves
> the VIRTIO_DEV_RUNNING set/reset to vhost lib. But I missed one reset
> on stop; this patch fixes it.
> 
> Fixes: 550c9d27d143 ("vhost: set/reset device flags internally")
> 
> Reported-by: Loftus Ciara 
> Signed-off-by: Yuanhan Liu 
> ---
>  lib/librte_vhost/vhost_user/virtio-net-user.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c
> b/lib/librte_vhost/vhost_user/virtio-net-user.c
> index a6a48dc..e7c4347 100644
> --- a/lib/librte_vhost/vhost_user/virtio-net-user.c
> +++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
> @@ -317,8 +317,10 @@ user_get_vring_base(int vid, struct
> vhost_vring_state *state)
>   if (dev == NULL)
>   return -1;
>   /* We have to stop the queue (virtio) if it is running. */
> - if (dev->flags & VIRTIO_DEV_RUNNING)
> + if (dev->flags & VIRTIO_DEV_RUNNING) {
> + dev->flags &= ~VIRTIO_DEV_RUNNING;
>   notify_ops->destroy_device(vid);
> + }
> 
>   /* Here we are safe to get the last used index */
>   vhost_get_vring_base(vid, state->index, state);
> --
> 1.9.0

Thanks for the patch. I've tested it and it solves the issue I was seeing where 
destroy_device was being called too many times.

Tested-by: Ciara Loftus 



