Re: [PATCH] doc: announce some raw/ifpga API removal

2022-07-01 Thread David Marchand
On Fri, Jul 1, 2022 at 8:25 AM Huang, Wei  wrote:
> > >
> > > rte_pmd_ifpga_get_pci_bus() documentation is vague and it is unclear
> > > what could be done with it.
> > > On the other hand, EAL provides a standard API to retrieve a bus
> > > object by name.
> > >
> Agreed, this API is used in an external application; I can use
> rte_get_bus_by_name() to replace it.

What is the PCI bus used for, in this application?


> I will submit a patch to remove rte_pmd_ifpga_get_pci_bus() after DPDK22.07 
> is released.

I sent a patch a few days ago:
https://patches.dpdk.org/project/dpdk/patch/20220628144643.1213026-3-david.march...@redhat.com/


-- 
David Marchand



RE: [PATCH v2] vhost: fix unchecked return value

2022-07-01 Thread Hu, Jiayu
Hi Maxime,

> -Original Message-
> From: Maxime Coquelin 
> Sent: Thursday, June 30, 2022 5:57 PM
> To: Hu, Jiayu ; dev@dpdk.org
> Cc: Xia, Chenbo ; sta...@dpdk.org; David Marchand
> 
> Subject: Re: [PATCH v2] vhost: fix unchecked return value
> 
> 
> 
> On 6/29/22 11:07, Jiayu Hu wrote:
> > This patch checks the return value of rte_dma_info_get() called in
> > rte_vhost_async_dma_configure().
> >
> > Coverity issue: 379066
> > Fixes: 53d3f4778c1d ("vhost: integrate dmadev in asynchronous data-path")
> > Cc: sta...@dpdk.org
> >
> > Signed-off-by: Jiayu Hu 
> > Reviewed-by: Chenbo Xia 
> > ---
> > v2:
> > - add cc stable tag
> > ---
> >   lib/vhost/vhost.c | 6 +-
> >   1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
> > index b14521e4d1..70c04c036e 100644
> > --- a/lib/vhost/vhost.c
> > +++ b/lib/vhost/vhost.c
> > @@ -1868,7 +1868,11 @@ rte_vhost_async_dma_configure(int16_t dma_id, uint16_t vchan_id)
> > 		return -1;
> > 	}
> >
> > -	rte_dma_info_get(dma_id, &info);
> > +	if (rte_dma_info_get(dma_id, &info) != 0) {
> > +		VHOST_LOG_CONFIG(ERR, "Fail to get DMA %d information.\n", dma_id);
> > +		return -1;
> > +	}
> > +
> > 	if (vchan_id >= info.max_vchans) {
> > 		VHOST_LOG_CONFIG(ERR, "Invalid DMA %d vChannel %u.\n", dma_id, vchan_id);
> > 		return -1;
> 
> The patch itself looks good, but rte_vhost_async_dma_configure() should be
> protected by a lock, as concurrent calls of this function would lead to
> undefined behavior.

This function is expected to be called only once. Is there any use case that
would cause it to be called concurrently?

Thanks,
Jiayu
> 
> Can you cook something?
> 
> David, is that the issue you mentioned to me this week or was it another one?
> 
> Thanks,
> Maxime



Re: [PATCH 4/4] vhost: prefix logs with context

2022-07-01 Thread David Marchand
On Thu, Jun 30, 2022 at 6:13 PM Maxime Coquelin
 wrote:
> On 6/27/22 11:27, David Marchand wrote:
> > We recently improved the log messages in the vhost library, adding some
> > context that helps filtering for a given vhost-user device.
> > However, some parts of the code were missed, and some later code changes
> > broke this new convention (fixes were sent prior to this patch).
> >
> > Change the VHOST_LOG_CONFIG/DATA helpers to always take a string used as
> > context. This should help limit regressions on this topic.
> >
> > Most of the time, the context is the vhost-user device socket path.
> > For the rest, when a message cannot be tied to a vhost-user device, generic
> > names were chosen:
> > - "dma", for vhost-user async DMA operations,
> > - "device", for vhost-user device creation and lookup,
> > - "thread", for thread management.
> >
> > Signed-off-by: David Marchand 
> > ---
> >   lib/vhost/iotlb.c  |  30 +-
> >   lib/vhost/socket.c | 129 -
> >   lib/vhost/vdpa.c   |   4 +-
> >   lib/vhost/vhost.c  | 144 -
> >   lib/vhost/vhost.h  |  20 +-
> >   lib/vhost/vhost_user.c | 642 +
> >   lib/vhost/virtio_net.c | 258 +
> >   7 files changed, 634 insertions(+), 593 deletions(-)
> >
>
> > diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
> > index 810bc71c9d..310aaf88ff 100644
> > --- a/lib/vhost/vhost.h
> > +++ b/lib/vhost/vhost.h
> > @@ -625,14 +625,14 @@ vhost_log_write_iova(struct virtio_net *dev, struct vhost_virtqueue *vq,
> >   extern int vhost_config_log_level;
> >   extern int vhost_data_log_level;
> >
> > -#define VHOST_LOG_CONFIG(level, fmt, args...)\
> > +#define VHOST_LOG_CONFIG(prefix, level, fmt, args...)\
> >   rte_log(RTE_LOG_ ## level, vhost_config_log_level,  \
> > - "VHOST_CONFIG: " fmt, ##args)
> > + "VHOST_CONFIG: (%s): " fmt, prefix, ##args)
> >
> > -#define VHOST_LOG_DATA(level, fmt, args...) \
> > +#define VHOST_LOG_DATA(prefix, level, fmt, args...)  \
> >   (void)((RTE_LOG_ ## level <= RTE_LOG_DP_LEVEL) ?\
> >rte_log(RTE_LOG_ ## level,  vhost_data_log_level,  \
> > - "VHOST_DATA : " fmt, ##args) :  \
> > + "VHOST_DATA: (%s):" fmt, prefix, ##args) :  \
> >0)
>
> As discussed off-list, adding the function will break OVS tests once
> again. I propose to pick the first 3 patches for now.

The issue with OVS tests is that they match the log message content,
and this current patch changes the format.

For example, before we have:
VHOST_CONFIG: (vhost0.sock) vhost-user server: socket created, fd: 57
VHOST_CONFIG: (vhost0.sock) binding succeeded
After:
VHOST_CONFIG: (vhost0.sock): vhost-user server: socket created, fd: 56
VHOST_CONFIG: (vhost0.sock): binding succeeded

I can respin, removing the extra ':' in VHOST_* macros.
WDYT?


-- 
David Marchand



RE: [PATCH] doc: announce some raw/ifpga API removal

2022-07-01 Thread Huang, Wei


> -Original Message-
> From: David Marchand 
> Sent: Friday, July 1, 2022 15:01
> To: Huang, Wei ; Xu, Rosen ;
> Zhang, Tianfei 
> Cc: dev@dpdk.org; Ray Kinsella 
> Subject: Re: [PATCH] doc: announce some raw/ifpga API removal
> 
> On Fri, Jul 1, 2022 at 8:25 AM Huang, Wei  wrote:
> > > >
> > > > rte_pmd_ifpga_get_pci_bus() documentation is vague and it is
> > > > unclear what could be done with it.
> > > > On the other hand, EAL provides a standard API to retrieve a bus
> > > > object by name.
> > > >
> > Agreed, this API is used in an external application; I can use
> > rte_get_bus_by_name() to replace it.
> 
> What is the PCI bus used for, in this application?
> 
In this application, the target PCI device is an Intel FPGA. It supports some
special operations, such as removing it from the PCI bus and rescanning it
back, so two things need to be done directly on rte_pci_bus:
1. Rescan PCI bus
 pci_bus->bus.scan()
2. Get pci_dev by the specified PCI address, and remove it
TAILQ_FOREACH(pci_dev, &pci_bus->device_list, next) {
if (!rte_pci_addr_cmp(&pci_dev->addr, &addr))
return pci_dev;
}

pci_drv = pci_dev->driver;
pci_drv->remove(pci_dev);
> 
> > I will submit a patch to remove rte_pmd_ifpga_get_pci_bus() after DPDK22.07
> is released.
> 
> I sent a patch a few days ago:
> https://patches.dpdk.org/project/dpdk/patch/20220628144643.1213026-3-
> david.march...@redhat.com/
> 
> 
> --
> David Marchand



Re: [PATCH] doc: announce some raw/ifpga API removal

2022-07-01 Thread David Marchand
On Fri, Jul 1, 2022 at 9:16 AM Huang, Wei  wrote:
> > What is the PCI bus used for, in this application?
> >
> In this application, the target PCI device is an Intel FPGA. It supports
> some special operations, such as removing it from the PCI bus and
> rescanning it back, so two things need to be done directly on rte_pci_bus:
> 1. Rescan PCI bus
>  pci_bus->bus.scan()
> 2. Get pci_dev by specified PCI address, and remove it
> TAILQ_FOREACH(pci_dev, &pci_bus->device_list, next) {
> if (!rte_pci_addr_cmp(&pci_dev->addr, &addr))
> return pci_dev;
> }
> 
> pci_drv = pci_dev->driver;
> pci_drv->remove(pci_dev);

Can't this application use rte_dev_remove and rte_dev_probe?
If not, we should add the missing parts in the API.


-- 
David Marchand



[Bug 1047] support of run time update of dpdk qos sched pipe profiles param

2022-07-01 Thread bugzilla
https://bugs.dpdk.org/show_bug.cgi?id=1047

Bug ID: 1047
   Summary: support of run time update of dpdk qos sched pipe
profiles param
   Product: DPDK
   Version: 21.05
  Hardware: All
OS: All
Status: UNCONFIRMED
  Severity: normal
  Priority: Normal
 Component: other
  Assignee: dev@dpdk.org
  Reporter: vikash.ku...@amantyatech.com
  Target Milestone: ---

We are using the DPDK hierarchical scheduler.

I have a case where I want to update the credits for pipe and subport profiles
at run time, possibly multiple times in a running system.
I didn't find any API to update/modify subport and pipe profiles at run time.
There is an API to add new pipe profiles at run time:
int rte_sched_subport_pipe_profile_add (struct rte_sched_port *port,
uint32_t subport_id, struct rte_sched_pipe_params *params, uint32_t
*pipe_profile_id)

I want to know of an API by which we can update/modify the same profile at
runtime.
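A possible workaround, sketched against the public rte_sched API (my own
suggestion, not a confirmed solution): since profiles cannot be edited in
place, add a new profile carrying the updated credits, then re-attach the
affected pipes to it. The helper name below is made up for illustration.

```c
#include <rte_sched.h>

/* Sketch only: emulate "modify profile" by adding a profile with the new
 * credit values and re-pointing one pipe at it. Error handling is minimal. */
static int
update_pipe_credits(struct rte_sched_port *port, uint32_t subport_id,
		uint32_t pipe_id, struct rte_sched_pipe_params *new_params)
{
	uint32_t new_profile_id;
	int ret;

	ret = rte_sched_subport_pipe_profile_add(port, subport_id, new_params,
			&new_profile_id);
	if (ret != 0)
		return ret;

	/* Re-attach the pipe to the newly added profile. */
	return rte_sched_pipe_config(port, subport_id, pipe_id,
			(int32_t)new_profile_id);
}
```

Whether re-configuring a pipe while traffic is flowing is safe is a separate
question and would need checking against the scheduler's locking model.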

It's very urgent for me, so please suggest some solutions.

Thanks in advance.

-- 
You are receiving this mail because:
You are the assignee for the bug.

Re: [PATCH v2] vhost: fix unchecked return value

2022-07-01 Thread Maxime Coquelin




On 7/1/22 09:11, Hu, Jiayu wrote:

Hi Maxime,


-Original Message-
From: Maxime Coquelin 
Sent: Thursday, June 30, 2022 5:57 PM
To: Hu, Jiayu ; dev@dpdk.org
Cc: Xia, Chenbo ; sta...@dpdk.org; David Marchand

Subject: Re: [PATCH v2] vhost: fix unchecked return value



On 6/29/22 11:07, Jiayu Hu wrote:

This patch checks the return value of rte_dma_info_get() called in
rte_vhost_async_dma_configure().

Coverity issue: 379066
Fixes: 53d3f4778c1d ("vhost: integrate dmadev in asynchronous data-path")
Cc: sta...@dpdk.org

Signed-off-by: Jiayu Hu 
Reviewed-by: Chenbo Xia 
---
v2:
- add cc stable tag
---
   lib/vhost/vhost.c | 6 +-
   1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
index b14521e4d1..70c04c036e 100644
--- a/lib/vhost/vhost.c
+++ b/lib/vhost/vhost.c
@@ -1868,7 +1868,11 @@ rte_vhost_async_dma_configure(int16_t dma_id, uint16_t vchan_id)
		return -1;
	}

-	rte_dma_info_get(dma_id, &info);
+	if (rte_dma_info_get(dma_id, &info) != 0) {
+		VHOST_LOG_CONFIG(ERR, "Fail to get DMA %d information.\n", dma_id);
+		return -1;
+	}
+
	if (vchan_id >= info.max_vchans) {
		VHOST_LOG_CONFIG(ERR, "Invalid DMA %d vChannel %u.\n", dma_id, vchan_id);
		return -1;


The patch itself looks good, but rte_vhost_async_dma_configure() should be
protected by a lock, as concurrent calls of this function would lead to
undefined behavior.


This function is expected to be called only once. Is there any use case that
would cause it to be called concurrently?


Ok, so what about:


static bool dma_configured;

int
rte_vhost_async_dma_configure(int16_t dma_id, uint16_t vchan_id)
{
struct rte_dma_info info;
void *pkts_cmpl_flag_addr;
uint16_t max_desc;

if (dma_configured)
return -1;

dma_configured = true;



If this is called only once, this should be OK. ;)


Maxime


Thanks,
Jiayu


Can you cook something?

David, is that the issue you mentioned to me this week or was it another one?

Thanks,
Maxime






Re: [PATCH 4/4] vhost: prefix logs with context

2022-07-01 Thread Maxime Coquelin




On 7/1/22 09:13, David Marchand wrote:

On Thu, Jun 30, 2022 at 6:13 PM Maxime Coquelin
 wrote:

On 6/27/22 11:27, David Marchand wrote:

We recently improved the log messages in the vhost library, adding some
context that helps filtering for a given vhost-user device.
However, some parts of the code were missed, and some later code changes
broke this new convention (fixes were sent prior to this patch).

Change the VHOST_LOG_CONFIG/DATA helpers to always take a string used as
context. This should help limit regressions on this topic.

Most of the time, the context is the vhost-user device socket path.
For the rest, when a message cannot be tied to a vhost-user device, generic
names were chosen:
- "dma", for vhost-user async DMA operations,
- "device", for vhost-user device creation and lookup,
- "thread", for thread management.

Signed-off-by: David Marchand 
---
   lib/vhost/iotlb.c  |  30 +-
   lib/vhost/socket.c | 129 -
   lib/vhost/vdpa.c   |   4 +-
   lib/vhost/vhost.c  | 144 -
   lib/vhost/vhost.h  |  20 +-
   lib/vhost/vhost_user.c | 642 +
   lib/vhost/virtio_net.c | 258 +
   7 files changed, 634 insertions(+), 593 deletions(-)




diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index 810bc71c9d..310aaf88ff 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -625,14 +625,14 @@ vhost_log_write_iova(struct virtio_net *dev, struct vhost_virtqueue *vq,
   extern int vhost_config_log_level;
   extern int vhost_data_log_level;

-#define VHOST_LOG_CONFIG(level, fmt, args...)\
+#define VHOST_LOG_CONFIG(prefix, level, fmt, args...)\
   rte_log(RTE_LOG_ ## level, vhost_config_log_level,  \
- "VHOST_CONFIG: " fmt, ##args)
+ "VHOST_CONFIG: (%s): " fmt, prefix, ##args)

-#define VHOST_LOG_DATA(level, fmt, args...) \
+#define VHOST_LOG_DATA(prefix, level, fmt, args...)  \
   (void)((RTE_LOG_ ## level <= RTE_LOG_DP_LEVEL) ?\
rte_log(RTE_LOG_ ## level,  vhost_data_log_level,  \
- "VHOST_DATA : " fmt, ##args) :  \
+ "VHOST_DATA: (%s):" fmt, prefix, ##args) :  \
0)


As discussed off-list, adding the function will break OVS tests once
again. I propose to pick the first 3 patches for now.


The issue with OVS tests is that they match the log message content,
and this current patch changes the format.

For example, before we have:
VHOST_CONFIG: (vhost0.sock) vhost-user server: socket created, fd: 57
VHOST_CONFIG: (vhost0.sock) binding succeeded
After:
VHOST_CONFIG: (vhost0.sock): vhost-user server: socket created, fd: 56
VHOST_CONFIG: (vhost0.sock): binding succeeded

I can respin, removing the extra ':' in VHOST_* macros.
WDYT?



Sounds good!

Thanks,
Maxime



[PATCH v2 0/4] Vhost logs fixes and improvement

2022-07-01 Thread David Marchand
Here is a series that fixes log messages (with one regression being
fixed in patch 2) and changes the VHOST_LOG_* helpers to enforce that
vhost log messages will always have some context/prefix to help
debugging on setups with many vhost ports.

The first three patches are low risk and can probably be merged in
v22.07.

Changes since v1:
- fixed log formats in patch 4.


-- 
David Marchand

David Marchand (4):
  vhost: add some trailing newline in log messages
  vhost: restore device information in log messages
  vhost: improve some datapath log messages
  vhost: prefix logs with context

 lib/vhost/iotlb.c  |  30 +-
 lib/vhost/socket.c | 129 -
 lib/vhost/vdpa.c   |   4 +-
 lib/vhost/vhost.c  | 144 +-
 lib/vhost/vhost.h  |  20 +-
 lib/vhost/vhost_user.c | 639 +
 lib/vhost/virtio_net.c | 258 +
 7 files changed, 634 insertions(+), 590 deletions(-)

-- 
2.36.1



[PATCH v2 1/4] vhost: add some trailing newline in log messages

2022-07-01 Thread David Marchand
VHOST_LOG_* macros don't append a newline.
Add missing ones.

Fixes: e623e0c6d8a5 ("vhost: add reconnect ability")
Fixes: af1475918124 ("vhost: introduce API to start a specific driver")
Fixes: 2dfeebe26546 ("vhost: check return of mutex initialization")
Cc: sta...@dpdk.org

Signed-off-by: David Marchand 
Reviewed-by: Chenbo Xia 
---
 lib/vhost/socket.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/lib/vhost/socket.c b/lib/vhost/socket.c
index 7a0f63af14..24d60ca149 100644
--- a/lib/vhost/socket.c
+++ b/lib/vhost/socket.c
@@ -499,7 +499,7 @@ vhost_user_reconnect_init(void)
 
ret = pthread_mutex_init(&reconn_list.mutex, NULL);
if (ret < 0) {
-   VHOST_LOG_CONFIG(ERR, "%s: failed to initialize mutex", __func__);
+   VHOST_LOG_CONFIG(ERR, "%s: failed to initialize mutex\n", __func__);
return ret;
}
TAILQ_INIT(&reconn_list.head);
@@ -507,9 +507,9 @@ vhost_user_reconnect_init(void)
ret = rte_ctrl_thread_create(&reconn_tid, "vhost_reconn", NULL,
 vhost_user_client_reconnect, NULL);
if (ret != 0) {
-   VHOST_LOG_CONFIG(ERR, "failed to create reconnect thread");
+   VHOST_LOG_CONFIG(ERR, "failed to create reconnect thread\n");
if (pthread_mutex_destroy(&reconn_list.mutex))
-   VHOST_LOG_CONFIG(ERR, "%s: failed to destroy reconnect mutex", __func__);
+   VHOST_LOG_CONFIG(ERR, "%s: failed to destroy reconnect mutex\n", __func__);
}
 
return ret;
@@ -1170,8 +1170,8 @@ rte_vhost_driver_start(const char *path)
"vhost-events", NULL, fdset_event_dispatch,
&vhost_user.fdset);
if (ret != 0) {
-   VHOST_LOG_CONFIG(ERR, "(%s) failed to create fdset handling thread", path);
-
+   VHOST_LOG_CONFIG(ERR, "(%s) failed to create fdset handling thread\n",
+   path);
fdset_pipe_uninit(&vhost_user.fdset);
return -1;
}
-- 
2.36.1



[PATCH v2 2/4] vhost: restore device information in log messages

2022-07-01 Thread David Marchand
Device information in the log messages was dropped.

Fixes: 52ade97e3641 ("vhost: fix physical address mapping")

Signed-off-by: David Marchand 
Reviewed-by: Chenbo Xia 
---
 lib/vhost/vhost_user.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index 2b9a3b69fa..f324f822d6 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -144,7 +144,8 @@ async_dma_map(struct virtio_net *dev, bool do_map)
return;
 
		/* DMA mapping errors won't stop VHOST_USER_SET_MEM_TABLE. */
-		VHOST_LOG_CONFIG(ERR, "DMA engine map failed\n");
+		VHOST_LOG_CONFIG(ERR, "(%s) DMA engine map failed\n",
+			dev->ifname);
}
}
 
@@ -160,7 +161,8 @@ async_dma_map(struct virtio_net *dev, bool do_map)
if (rte_errno == EINVAL)
return;
 
-		VHOST_LOG_CONFIG(ERR, "DMA engine unmap failed\n");
+		VHOST_LOG_CONFIG(ERR, "(%s) DMA engine unmap failed\n",
+			dev->ifname);
}
}
}
@@ -945,7 +947,8 @@ add_one_guest_page(struct virtio_net *dev, uint64_t guest_phys_addr,
dev->max_guest_pages * sizeof(*page),
RTE_CACHE_LINE_SIZE);
if (dev->guest_pages == NULL) {
-   VHOST_LOG_CONFIG(ERR, "cannot realloc guest_pages\n");
+		VHOST_LOG_CONFIG(ERR, "(%s) cannot realloc guest_pages\n",
+			dev->ifname);
rte_free(old_pages);
return -1;
}
-- 
2.36.1



[PATCH v2 3/4] vhost: improve some datapath log messages

2022-07-01 Thread David Marchand
Those messages were missed when adding socket context.
Fix this.

Signed-off-by: David Marchand 
Reviewed-by: Maxime Coquelin 
---
 lib/vhost/vhost.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index 4ebcb7448a..810bc71c9d 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -652,7 +652,7 @@ extern int vhost_data_log_level;
} \
	snprintf(packet + strnlen(packet, VHOST_MAX_PRINT_BUFF), VHOST_MAX_PRINT_BUFF - strnlen(packet, VHOST_MAX_PRINT_BUFF), "\n"); \
\
-   VHOST_LOG_DATA(DEBUG, "%s", packet); \
+   VHOST_LOG_DATA(DEBUG, "(%s) %s", device->ifname, packet); \
 } while (0)
 #else
 #define PRINT_PACKET(device, addr, size, header) do {} while (0)
@@ -866,8 +866,8 @@ vhost_vring_call_split(struct virtio_net *dev, struct vhost_virtqueue *vq)
vq->signalled_used = new;
vq->signalled_used_valid = true;
 
-   VHOST_LOG_DATA(DEBUG, "%s: used_event_idx=%d, old=%d, new=%d\n",
-   __func__,
+   VHOST_LOG_DATA(DEBUG, "(%s) %s: used_event_idx=%d, old=%d, new=%d\n",
+   dev->ifname, __func__,
vhost_used_event(vq),
old, new);
 
-- 
2.36.1



[PATCH v2 4/4] vhost: prefix logs with context

2022-07-01 Thread David Marchand
We recently improved the log messages in the vhost library, adding some
context that helps filtering for a given vhost-user device.
However, some parts of the code were missed, and some later code changes
broke this new convention (fixes were sent prior to this patch).

Change the VHOST_LOG_CONFIG/DATA helpers to always take a string used as
context. This should help limit regressions on this topic.

Most of the time, the context is the vhost-user device socket path.
For the rest, when a message cannot be tied to a vhost-user device, generic
names were chosen:
- "dma", for vhost-user async DMA operations,
- "device", for vhost-user device creation and lookup,
- "thread", for thread management.

Signed-off-by: David Marchand 
---
Changes since v1:
- preserved original format for logs (removing extra ':'),

---
 lib/vhost/iotlb.c  |  30 +-
 lib/vhost/socket.c | 129 -
 lib/vhost/vdpa.c   |   4 +-
 lib/vhost/vhost.c  | 144 -
 lib/vhost/vhost.h  |  20 +-
 lib/vhost/vhost_user.c | 642 +
 lib/vhost/virtio_net.c | 258 +
 7 files changed, 634 insertions(+), 593 deletions(-)

diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
index 5a5ba8b82a..35b4193606 100644
--- a/lib/vhost/iotlb.c
+++ b/lib/vhost/iotlb.c
@@ -70,18 +70,18 @@ vhost_user_iotlb_pending_insert(struct virtio_net *dev, struct vhost_virtqueue *

	ret = rte_mempool_get(vq->iotlb_pool, (void **)&node);
	if (ret) {
-		VHOST_LOG_CONFIG(DEBUG,
-			"(%s) IOTLB pool %s empty, clear entries for pending insertion\n",
-			dev->ifname, vq->iotlb_pool->name);
+		VHOST_LOG_CONFIG(dev->ifname, DEBUG,
+			"IOTLB pool %s empty, clear entries for pending insertion\n",
+			vq->iotlb_pool->name);
		if (!TAILQ_EMPTY(&vq->iotlb_pending_list))
			vhost_user_iotlb_pending_remove_all(vq);
		else
			vhost_user_iotlb_cache_random_evict(vq);
		ret = rte_mempool_get(vq->iotlb_pool, (void **)&node);
		if (ret) {
-			VHOST_LOG_CONFIG(ERR,
-				"(%s) IOTLB pool %s still empty, pending insertion failure\n",
-				dev->ifname, vq->iotlb_pool->name);
+			VHOST_LOG_CONFIG(dev->ifname, ERR,
+				"IOTLB pool %s still empty, pending insertion failure\n",
+				vq->iotlb_pool->name);
			return;
		}
	}
@@ -169,18 +169,18 @@ vhost_user_iotlb_cache_insert(struct virtio_net *dev, struct vhost_virtqueue *vq

	ret = rte_mempool_get(vq->iotlb_pool, (void **)&new_node);
	if (ret) {
-		VHOST_LOG_CONFIG(DEBUG,
-			"(%s) IOTLB pool %s empty, clear entries for cache insertion\n",
-			dev->ifname, vq->iotlb_pool->name);
+		VHOST_LOG_CONFIG(dev->ifname, DEBUG,
+			"IOTLB pool %s empty, clear entries for cache insertion\n",
+			vq->iotlb_pool->name);
		if (!TAILQ_EMPTY(&vq->iotlb_list))
			vhost_user_iotlb_cache_random_evict(vq);
		else
			vhost_user_iotlb_pending_remove_all(vq);
		ret = rte_mempool_get(vq->iotlb_pool, (void **)&new_node);
		if (ret) {
-			VHOST_LOG_CONFIG(ERR,
-				"(%s) IOTLB pool %s still empty, cache insertion failed\n",
-				dev->ifname, vq->iotlb_pool->name);
+			VHOST_LOG_CONFIG(dev->ifname, ERR,
+				"IOTLB pool %s still empty, cache insertion failed\n",
+				vq->iotlb_pool->name);
			return;
		}
	}
@@ -320,7 +320,7 @@ vhost_user_iotlb_init(struct virtio_net *dev, int vq_index)

	snprintf(pool_name, sizeof(pool_name), "iotlb_%u_%d_%d",
		getpid(), dev->vid, vq_index);
-	VHOST_LOG_CONFIG(DEBUG, "(%s) IOTLB cache name: %s\n", dev->ifname, pool_name);
+	VHOST_LOG_CONFIG(dev->ifname, DEBUG, "IOTLB cache name: %s\n", pool_name);

	/* If already created, free it and recreate */
	vq->iotlb_pool = rte_mempool_lookup(pool_name);
@@ -332,8 +332,8 @@ vhost_user_iotlb_init(struct virtio_net *dev, int vq_index)
			RTE_MEMPOOL_F_NO_CACHE_ALIGN |
			RTE_MEMPOOL_F_SP_PUT);
	if (!vq->iotlb_pool) {
-		VHOST_LOG_CONFIG(ERR, "(%s) Failed to create IOTLB cache pool %s\n",
-			dev->ifname, pool_name);
+		VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to create IOTLB cache pool %s\n",
+

Re: [PATCH v2 4/4] vhost: prefix logs with context

2022-07-01 Thread Maxime Coquelin




On 7/1/22 09:55, David Marchand wrote:

We recently improved the log messages in the vhost library, adding some
context that helps filtering for a given vhost-user device.
However, some parts of the code were missed, and some later code changes
broke this new convention (fixes were sent prior to this patch).

Change the VHOST_LOG_CONFIG/DATA helpers to always take a string used as
context. This should help limit regressions on this topic.

Most of the time, the context is the vhost-user device socket path.
For the rest, when a message cannot be tied to a vhost-user device, generic
names were chosen:
- "dma", for vhost-user async DMA operations,
- "device", for vhost-user device creation and lookup,
- "thread", for thread management.

Signed-off-by: David Marchand 
---
Changes since v1:
- preserved original format for logs (removing extra ':'),

---
  lib/vhost/iotlb.c  |  30 +-
  lib/vhost/socket.c | 129 -
  lib/vhost/vdpa.c   |   4 +-
  lib/vhost/vhost.c  | 144 -
  lib/vhost/vhost.h  |  20 +-
  lib/vhost/vhost_user.c | 642 +
  lib/vhost/virtio_net.c | 258 +
  7 files changed, 634 insertions(+), 593 deletions(-)



Reviewed-by: Maxime Coquelin 

Thanks,
Maxime



[PATCH 2/2] net/octeontx_ep: support link status

2022-07-01 Thread Sathesh Edara
Added functionality to update link speed, duplex mode and link state.

Signed-off-by: Sathesh Edara 
---
 doc/guides/nics/features/octeontx_ep.ini |  1 +
 drivers/net/octeontx_ep/otx_ep_ethdev.c  | 17 +
 2 files changed, 18 insertions(+)

diff --git a/doc/guides/nics/features/octeontx_ep.ini b/doc/guides/nics/features/octeontx_ep.ini
index e0c469676e..1423963adc 100644
--- a/doc/guides/nics/features/octeontx_ep.ini
+++ b/doc/guides/nics/features/octeontx_ep.ini
@@ -9,4 +9,5 @@ SR-IOV   = Y
 Linux= Y
 x86-64   = Y
 Basic stats  = Y
+Link status  = Y
 Usage doc= Y
diff --git a/drivers/net/octeontx_ep/otx_ep_ethdev.c b/drivers/net/octeontx_ep/otx_ep_ethdev.c
index cb45bd7a8a..a44c8f5217 100644
--- a/drivers/net/octeontx_ep/otx_ep_ethdev.c
+++ b/drivers/net/octeontx_ep/otx_ep_ethdev.c
@@ -387,6 +387,22 @@ otx_ep_dev_stats_get(struct rte_eth_dev *eth_dev,
return 0;
 }
 
+static int
+otx_ep_link_update(struct rte_eth_dev *eth_dev, int wait_to_complete)
+{
+   RTE_SET_USED(wait_to_complete);
+
+   if (!eth_dev->data->dev_started)
+   return 0;
+   struct rte_eth_link link;
+
+   memset(&link, 0, sizeof(link));
+   link.link_status = RTE_ETH_LINK_UP;
+   link.link_speed  = RTE_ETH_SPEED_NUM_10G;
+   link.link_duplex = RTE_ETH_LINK_FULL_DUPLEX;
+   return rte_eth_linkstatus_set(eth_dev, &link);
+}
+
 /* Define our ethernet definitions */
 static const struct eth_dev_ops otx_ep_eth_dev_ops = {
.dev_configure  = otx_ep_dev_configure,
@@ -399,6 +415,7 @@ static const struct eth_dev_ops otx_ep_eth_dev_ops = {
.dev_infos_get  = otx_ep_dev_info_get,
.stats_get  = otx_ep_dev_stats_get,
.stats_reset= otx_ep_dev_stats_reset,
+   .link_update= otx_ep_link_update,
 };
 
 static int
-- 
2.36.1



RE: [PATCH] doc: announce some raw/ifpga API removal

2022-07-01 Thread Huang, Wei


> -Original Message-
> From: David Marchand 
> Sent: Friday, July 1, 2022 15:22
> To: Huang, Wei 
> Cc: Xu, Rosen ; Zhang, Tianfei ;
> dev@dpdk.org; Ray Kinsella 
> Subject: Re: [PATCH] doc: announce some raw/ifpga API removal
> 
> On Fri, Jul 1, 2022 at 9:16 AM Huang, Wei  wrote:
> > > What is the PCI bus used for, in this application?
> > >
> > In this application, the target PCI device is an Intel FPGA. It supports
> > some special operations, such as removing it from the PCI bus and
> > rescanning it back, so two things need to be done directly on rte_pci_bus:
> > 1. Rescan PCI bus
> >  pci_bus->bus.scan()
> > 2. Get pci_dev by specified PCI address, and remove it
> > TAILQ_FOREACH(pci_dev, &pci_bus->device_list, next) {
> > if (!rte_pci_addr_cmp(&pci_dev->addr, &addr))
> > return pci_dev;
> > }
> > 
> > pci_drv = pci_dev->driver;
> > pci_drv->remove(pci_dev);
> 
> Can't this application use rte_dev_remove and rte_dev_probe?
> If not, we should add the missing parts in the API.
> 
Both rte_dev_remove and rte_dev_probe need an rte_device pointer. In this
application, only the device's PCI address is known; is there an existing API
to get the rte_pci_device pointer by its PCI address?

For the PCI rescan, I know there is an API called rte_bus_scan(), which seems
able to replace pci_bus->bus.scan(). But is it reasonable to scan all buses
when I only want to scan PCI?
As for why I need to rescan the PCI bus: the current PCI scan only adds a PCI
device to rte_pci_bus when it is bound to a kernel driver. In our case, an
FPGA may not be bound to any driver; this application binds vfio-pci to it and
calls bus scan to add it to rte_pci_bus.
> 
> --
> David Marchand



Re: [PATCH] doc: announce some raw/ifpga API removal

2022-07-01 Thread David Marchand
On Fri, Jul 1, 2022 at 10:02 AM Huang, Wei  wrote:
>
>
>
> > -Original Message-
> > From: David Marchand 
> > Sent: Friday, July 1, 2022 15:22
> > To: Huang, Wei 
> > Cc: Xu, Rosen ; Zhang, Tianfei 
> > ;
> > dev@dpdk.org; Ray Kinsella 
> > Subject: Re: [PATCH] doc: announce some raw/ifpga API removal
> >
> > On Fri, Jul 1, 2022 at 9:16 AM Huang, Wei  wrote:
> > > > What is the PCI bus used for, in this application?
> > > >
> > > In this application, the target PCI device is an Intel FPGA. It supports
> > > some special operations, such as removing it from the PCI bus and
> > > rescanning it back, so two things need to be done directly on rte_pci_bus:
> > > 1. Rescan PCI bus
> > >  pci_bus->bus.scan()
> > > 2. Get pci_dev by specified PCI address, and remove it
> > > TAILQ_FOREACH(pci_dev, &pci_bus->device_list, next) {
> > > if (!rte_pci_addr_cmp(&pci_dev->addr, &addr))
> > > return pci_dev;
> > > }
> > > 
> > > pci_drv = pci_dev->driver;
> > > pci_drv->remove(pci_dev);
> >
> > Can't this application use rte_dev_remove and rte_dev_probe?
> > If not, we should add the missing parts in the API.
> >
> Both rte_dev_remove and rte_dev_probe need an rte_device pointer. In this
> application, only the device's PCI address is known; is there an existing
> API to get the rte_pci_device pointer by its PCI address?

rte_dev_probe takes a devargs string as input.
int rte_dev_probe(const char *devargs);

You need the rte_device for removing which can be retrieved from
rte_rawdev_info_get.
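Put together, the suggested flow could look roughly like this (a sketch only:
the rawdev id and devargs string are placeholders, and error handling is
minimal):

```c
#include <rte_dev.h>
#include <rte_rawdev.h>

/* Sketch: remove an already-probed device via the rte_device handle
 * recovered from its rawdev, then re-probe it by PCI devargs string
 * (e.g. "0000:3b:00.0" -- a placeholder address). */
static int
replug_fpga(uint16_t rawdev_id, const char *pci_devargs)
{
	struct rte_rawdev_info info = { 0 };

	/* rte_rawdev_info_get() fills info.device with the rte_device. */
	if (rte_rawdev_info_get(rawdev_id, &info, 0) != 0)
		return -1;
	if (rte_dev_remove(info.device) != 0)
		return -1;

	/* Re-probe: EAL rescans the relevant bus and attaches the device. */
	return rte_dev_probe(pci_devargs);
}
```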

>
> For the PCI rescan, I know there is an API called rte_bus_scan(), which
> seems able to replace pci_bus->bus.scan(). But is it reasonable to scan all
> buses when I only want to scan PCI?
> As for why I need to rescan the PCI bus: the current PCI scan only adds a
> PCI device to rte_pci_bus when it is bound to a kernel driver. In our case,
> an FPGA may not be bound to any driver; this application binds vfio-pci to
> it and calls bus scan to add it to rte_pci_bus.

Scanning is called on the relevant bus when probing, see local_dev_probe.


-- 
David Marchand



[PATCH] doc: announce rename of octeontx_ep driver

2022-07-01 Thread Veerasenareddy Burru
To enable a single unified driver to support current OcteonTX and
future Octeon PCI endpoint NICs, the octeontx_ep driver is renamed to
octeon_ep to reflect a common driver for all Octeon-based
PCI endpoint NICs.

Signed-off-by: Veerasenareddy Burru 
---
 doc/guides/rel_notes/deprecation.rst | 9 +
 1 file changed, 9 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 4e5b23c53d..d50e68aed4 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -125,3 +125,12 @@ Deprecation Notices
   applications should be updated to use the ``dmadev`` library instead,
   with the underlying HW-functionality being provided by the ``ioat`` or
   ``idxd`` dma drivers
+
+* drivers/octeontx_ep: rename octeontx_ep drivers
+
+  The current driver name "octeontx_ep" was chosen to support the OcteonTX
+  line of products. It is renamed to apply to all Octeon EP products:
+  OcteonTX plus future Octeon chipsets.
+  This deprecation notice announces the following action in DPDK v22.11:
+
+  #. Rename ``drivers/net/octeontx_ep/`` to ``drivers/net/octeon_ep/``
-- 
2.36.0



RE: [PATCH v4] net: fix checksum with unaligned buffer

2022-07-01 Thread Emil Berg


> -Original Message-
> From: Stephen Hemminger 
> Sent: den 30 juni 2022 19:46
> To: Morten Brørup 
> Cc: Emil Berg ; bruce.richard...@intel.com;
> dev@dpdk.org; sta...@dpdk.org; bugzi...@dpdk.org; hof...@lysator.liu.se;
> olivier.m...@6wind.com
> Subject: Re: [PATCH v4] net: fix checksum with unaligned buffer
> 
> On Thu, 23 Jun 2022 14:39:00 +0200
> Morten Brørup  wrote:
> 
> > +   /* if buffer is unaligned, keeping it byte order independent */
> > +   if (unlikely(unaligned)) {
> > +   uint16_t first = 0;
> > +   if (unlikely(len == 0))
> > +   return 0;
> 
> Why is length == 0 unique to unaligned case?
> 
> > +   ((unsigned char *)&first)[1] = *(const unsigned
> char *)buf;
> 
> Use a proper union instead of casting to avoid aliasing warnings.
> 
> > +   bsum += first;
> > +   buf = RTE_PTR_ADD(buf, 1);
> > +   len--;
> > +   }
> 
> Many CPU's (such as x86) won't care about alignment and therefore the
> extra code to handle this is not worth doing.
> 

x86 does care about alignment. An example is the vmovdqa instruction, where 'a' 
stands for 'aligned'. The description in the link below says: "When the source 
or destination operand is a memory operand, the operand must be aligned on a 
16-byte boundary or a general-protection exception (#GP) will be generated. "

https://www.felixcloutier.com/x86/movdqa:vmovdqa32:vmovdqa64

> Perhaps DPDK needs a macro (like Linux kernel) for efficient unaligned
> access.
> 
> In Linux kernel it is CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS


Re: [PATCH] doc: announce some raw/ifpga API removal

2022-07-01 Thread David Marchand
On Fri, Jul 1, 2022 at 10:09 AM David Marchand
 wrote:
> > > > 2. Get pci_dev by specified PCI address, and remove it
> > > > TAILQ_FOREACH(pci_dev, &pci_bus->device_list, next) {
> > > > if (!rte_pci_addr_cmp(&pci_dev->addr, &addr))
> > > > return pci_dev;
> > > > }
> > > > 
> > > > pci_drv = pci_dev->driver;
> > > > pci_drv->remove(pci_dev);
> > >
> > > Can't this application use rte_dev_remove and rte_dev_probe?
> > > If not, we should add the missing parts in the API.
> > >
> > Both rte_dev_remove and rte_dev_probe need rte_device pointer. In this 
> > application, it only knows the device's PCI address. Is there an
> > existing API to get the rte_pci_device pointer by its PCI address ?
>
> rte_dev_probe takes a devargs string as input.
> int rte_dev_probe(const char *devargs);
>
> You need the rte_device for removing which can be retrieved from
> rte_rawdev_info_get.

Additionally, rte_eal_hotplug_{add,remove} do the same job, but with
an easier(?) interface.


-- 
David Marchand



Re: [PATCH] doc: add deprecation notice for kni example

2022-07-01 Thread David Marchand
On Thu, Jun 30, 2022 at 6:50 PM Bruce Richardson
 wrote:
>
> As agreed by DPDK Technical Board, the KNI example app is due to be
> removed from the repository to discourage use in future projects, given
> that better alternatives exist.
>
> Signed-off-by: Bruce Richardson 

Acked-by: David Marchand 


-- 
David Marchand



Re: [PATCH] vdpa/mlx5: fix leak on event thread creation

2022-07-01 Thread David Marchand
On Mon, Jun 20, 2022 at 3:11 PM David Marchand
 wrote:
>
> As stated in the manual, pthread_attr_init return value should be
> checked.
> Besides, a pthread_attr_t should be destroyed once unused.
>
> In practice, we may have no leak (from what I read in glibc current code),
> but this may change in the future.
> Stick to a correct use of the API.
>
> Fixes: 5cf3fd3af4df ("vdpa/mlx5: add CPU core parameter to bind polling 
> thread")
> Cc: sta...@dpdk.org
>
> Signed-off-by: David Marchand 

Review, please?


-- 
David Marchand



RE: [PATCH] doc: announce some raw/ifpga API removal

2022-07-01 Thread Huang, Wei


> -Original Message-
> From: David Marchand 
> Sent: Friday, July 1, 2022 16:15
> To: Huang, Wei 
> Cc: Xu, Rosen ; Zhang, Tianfei ;
> dev@dpdk.org; Ray Kinsella 
> Subject: Re: [PATCH] doc: announce some raw/ifpga API removal
> 
> On Fri, Jul 1, 2022 at 10:09 AM David Marchand 
> wrote:
> > > > > 2. Get pci_dev by specified PCI address, and remove it
> > > > > TAILQ_FOREACH(pci_dev, &pci_bus->device_list, next) {
> > > > > if (!rte_pci_addr_cmp(&pci_dev->addr, &addr))
> > > > > return pci_dev;
> > > > > }
> > > > > 
> > > > > pci_drv = pci_dev->driver;
> > > > > pci_drv->remove(pci_dev);
> > > >
> > > > Can't this application use rte_dev_remove and rte_dev_probe?
> > > > If not, we should add the missing parts in the API.
> > > >
> > > Both rte_dev_remove and rte_dev_probe need rte_device pointer. In
> > > this application, it only knows the device's PCI address. Is there an
> > > existing API
> to get the rte_pci_device pointer by its PCI address ?
> >
> > rte_dev_probe takes a devargs string as input.
> > int rte_dev_probe(const char *devargs);
> >
> > You need the rte_device for removing which can be retrieved from
> > rte_rawdev_info_get.
> 
> Additionally, rte_eal_hotplug_{add,remove} do the same job, but with an
> easier(?) interface.
> 
> 
I checked rte_eal_hotplug_{add,remove}, they should meet my requirements, 
thanks a lot.
> --
> David Marchand



Re: [PATCH] vdpa/mlx5: fix leak on event thread creation

2022-07-01 Thread Maxime Coquelin




On 7/1/22 10:30, David Marchand wrote:

On Mon, Jun 20, 2022 at 3:11 PM David Marchand
 wrote:


As stated in the manual, pthread_attr_init return value should be
checked.
Besides, a pthread_attr_t should be destroyed once unused.

In practice, we may have no leak (from what I read in glibc current code),
but this may change in the future.
Stick to a correct use of the API.

Fixes: 5cf3fd3af4df ("vdpa/mlx5: add CPU core parameter to bind polling thread")
Cc: sta...@dpdk.org

Signed-off-by: David Marchand 


Review, please?




I reviewed the patch but was waiting for Nvidia to test it.


Reviewed-by: Maxime Coquelin 

Nvidia, could it be done ASAP so that it goes to -rc3?

Thanks,
Maxime



Re: [PATCH] doc: announce some raw/ifpga API removal

2022-07-01 Thread David Marchand
On Fri, Jul 1, 2022 at 10:32 AM Huang, Wei  wrote:
> > > > > > 2. Get pci_dev by specified PCI address, and remove it
> > > > > > TAILQ_FOREACH(pci_dev, &pci_bus->device_list, next) {
> > > > > > if (!rte_pci_addr_cmp(&pci_dev->addr, &addr))
> > > > > > return pci_dev;
> > > > > > }
> > > > > > 
> > > > > > pci_drv = pci_dev->driver;
> > > > > > pci_drv->remove(pci_dev);
> > > > >
> > > > > Can't this application use rte_dev_remove and rte_dev_probe?
> > > > > If not, we should add the missing parts in the API.
> > > > >
> > > > Both rte_dev_remove and rte_dev_probe need rte_device pointer. In
> > > > this application, it only knows the device's PCI address. Is there an
> > > > existing API
> > to get the rte_pci_device pointer by its PCI address ?
> > >
> > > rte_dev_probe takes a devargs string as input.
> > > int rte_dev_probe(const char *devargs);
> > >
> > > You need the rte_device for removing which can be retrieved from
> > > rte_rawdev_info_get.
> >
> > Additionally, rte_eal_hotplug_{add,remove} do the same job, but with an
> > easier(?) interface.
> >
> >
> I checked rte_eal_hotplug_{add,remove}, they should meet my requirements, 
> thanks a lot.

Cool, thanks.


-- 
David Marchand



[PATCH] crypto/qat: fix secure session check

2022-07-01 Thread Rebecca Troy
Currently when running the dpdk-perf-test with docsis
security sessions, a segmentation fault occurs. This
is due to the check that the session is not
equal to op->sym->sec_session. The check passes on the
first iteration but fails on the second, so the
build_request step never runs.

This commit fixes that error by getting the ctx first
from the private session data and then comparing ctx,
rather than op->sym->sec_session, with the sess.

Fixes: fb3b9f492205 ("crypto/qat: rework burst data path")
Cc: kai...@intel.com
Cc: sta...@dpdk.org

Signed-off-by: Rebecca Troy 
---
 drivers/crypto/qat/qat_sym.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/crypto/qat/qat_sym.c b/drivers/crypto/qat/qat_sym.c
index 3477cd89ad..e5ae670b3a 100644
--- a/drivers/crypto/qat/qat_sym.c
+++ b/drivers/crypto/qat/qat_sym.c
@@ -105,16 +105,16 @@ qat_sym_build_request(void *in_op, uint8_t *out_msg,
 
 #ifdef RTE_LIB_SECURITY
else if (op->sess_type == RTE_CRYPTO_OP_SECURITY_SESSION) {
-   if ((void *)sess != (void *)op->sym->sec_session) {
-   struct rte_cryptodev *cdev;
-   struct qat_cryptodev_private *internals;
-
-   ctx = get_sec_session_private_data(
+   ctx = get_sec_session_private_data(
op->sym->sec_session);
if (unlikely(!ctx)) {
QAT_DP_LOG(ERR, "No session for this device");
return -EINVAL;
}
+   if (sess != (uintptr_t)ctx) {
+   struct rte_cryptodev *cdev;
+   struct qat_cryptodev_private *internals;
+
if (unlikely(ctx->bpi_ctx == NULL)) {
QAT_DP_LOG(ERR, "QAT PMD only supports security"
" operation requests for"
-- 
2.25.1



[PATCH 00/17] Introduce Microsoft Azure Network Adapter (MANA) PMD

2022-07-01 Thread longli
From: Long Li 

MANA is a network interface card to be used in the Azure cloud environment.
MANA provides safe access to user memory through memory registration. It has
IOMMU built into the hardware.

MANA uses IB verbs and RDMA layer to configure hardware resources. It
requires the corresponding RDMA kernel-mode and user-mode drivers.

The MANA RDMA kernel-mode driver is being reviewed at:
https://patchwork.kernel.org/project/netdevbpf/cover/1655345240-26411-1-git-send-email-lon...@linuxonhyperv.com/

The MANA RDMA user-mode driver is being reviewed at:
https://github.com/linux-rdma/rdma-core/pull/1177

Long Li (17):
  net/mana: add basic driver, build environment and doc
  net/mana: add device configuration and stop
  net/mana: add function to report supported ptypes
  net/mana: add link update
  net/mana: add function for device removal interrupts
  net/mana: add device info
  net/mana: add function to configure RSS
  net/mana: add function to configure RX queues
  net/mana: add function to configure TX queues
  net/mana: implement memory registration
  net/mana: implement the hardware layer operations
  net/mana: add function to start/stop TX queues
  net/mana: add function to start/stop RX queues
  net/mana: add function to receive packets
  net/mana: add function to send packets
  net/mana: add function to start/stop device
  net/mana: add function to report queue stats

 MAINTAINERS   |6 +
 doc/guides/nics/features/mana.ini |   18 +
 doc/guides/nics/index.rst |1 +
 doc/guides/nics/mana.rst  |   54 ++
 drivers/net/mana/gdma.c   |  309 +++
 drivers/net/mana/mana.c   | 1388 +
 drivers/net/mana/mana.h   |  543 +++
 drivers/net/mana/meson.build  |   38 +
 drivers/net/mana/mp.c |  345 +++
 drivers/net/mana/mr.c |  339 +++
 drivers/net/mana/rx.c |  473 ++
 drivers/net/mana/tx.c |  420 +
 drivers/net/mana/version.map  |3 +
 drivers/net/meson.build   |1 +
 14 files changed, 3938 insertions(+)
 create mode 100644 doc/guides/nics/features/mana.ini
 create mode 100644 doc/guides/nics/mana.rst
 create mode 100644 drivers/net/mana/gdma.c
 create mode 100644 drivers/net/mana/mana.c
 create mode 100644 drivers/net/mana/mana.h
 create mode 100644 drivers/net/mana/meson.build
 create mode 100644 drivers/net/mana/mp.c
 create mode 100644 drivers/net/mana/mr.c
 create mode 100644 drivers/net/mana/rx.c
 create mode 100644 drivers/net/mana/tx.c
 create mode 100644 drivers/net/mana/version.map

-- 
2.17.1



[PATCH 01/17] net/mana: add basic driver, build environment and doc

2022-07-01 Thread longli
From: Long Li 

MANA is a PCI device. It uses IB verbs to access hardware through the kernel
RDMA layer. This patch introduces build environment and basic device probe
functions.

Signed-off-by: Long Li 
---
 MAINTAINERS   |   6 +
 doc/guides/nics/features/mana.ini |  10 +
 doc/guides/nics/index.rst |   1 +
 doc/guides/nics/mana.rst  |  54 +++
 drivers/net/mana/mana.c   | 729 ++
 drivers/net/mana/mana.h   | 214 +
 drivers/net/mana/meson.build  |  34 ++
 drivers/net/mana/mp.c | 257 +++
 drivers/net/mana/version.map  |   3 +
 drivers/net/meson.build   |   1 +
 10 files changed, 1309 insertions(+)
 create mode 100644 doc/guides/nics/features/mana.ini
 create mode 100644 doc/guides/nics/mana.rst
 create mode 100644 drivers/net/mana/mana.c
 create mode 100644 drivers/net/mana/mana.h
 create mode 100644 drivers/net/mana/meson.build
 create mode 100644 drivers/net/mana/mp.c
 create mode 100644 drivers/net/mana/version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 18d9edaf88..b8bda48a33 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -837,6 +837,12 @@ F: buildtools/options-ibverbs-static.sh
 F: doc/guides/nics/mlx5.rst
 F: doc/guides/nics/features/mlx5.ini
 
+Microsoft mana
+M: Long Li 
+F: drivers/net/mana
+F: doc/guides/nics/mana.rst
+F: doc/guides/nics/features/mana.ini
+
 Microsoft vdev_netvsc - EXPERIMENTAL
 M: Matan Azrad 
 F: drivers/net/vdev_netvsc/
diff --git a/doc/guides/nics/features/mana.ini 
b/doc/guides/nics/features/mana.ini
new file mode 100644
index 00..9d8676089b
--- /dev/null
+++ b/doc/guides/nics/features/mana.ini
@@ -0,0 +1,10 @@
+;
+; Supported features of the 'mana' network poll mode driver.
+;
+; Refer to default.ini for the full list of available PMD features.
+;
+[Features]
+Linux= Y
+Multiprocess aware   = Y
+Usage doc= Y
+x86-64   = Y
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 1c94caccea..2725d1d9f0 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -41,6 +41,7 @@ Network Interface Controller Drivers
 intel_vf
 kni
 liquidio
+mana
 memif
 mlx4
 mlx5
diff --git a/doc/guides/nics/mana.rst b/doc/guides/nics/mana.rst
new file mode 100644
index 00..a871db35a7
--- /dev/null
+++ b/doc/guides/nics/mana.rst
@@ -0,0 +1,54 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+Copyright 2022 Microsoft Corporation
+
+MANA poll mode driver library
+=
+
+The MANA poll mode driver library (**librte_net_mana**) implements support
+for Microsoft Azure Network Adapter VF in SR-IOV context.
+
+Features
+
+
+Features of the MANA Ethdev PMD are:
+
+Prerequisites
+-
+
+This driver relies on external libraries and kernel drivers for resource
+allocation and initialization. The following dependencies are not part of
+DPDK and must be installed separately:
+
+- **libibverbs** (provided by rdma-core package)
+
+  User space verbs framework used by librte_net_mana. This library provides
+  a generic interface between the kernel and low-level user space drivers
+  such as libmana.
+
+  It allows slow and privileged operations (context initialization, hardware
+  resource allocation) to be managed by the kernel and fast operations to
+  never leave user space.
+
+- **libmana** (provided by rdma-core package)
+
+  Low-level user space driver library for Microsoft Azure Network Adapter
+  devices, it is automatically loaded by libibverbs.
+
+- **Kernel modules**
+
+  They provide the kernel-side verbs API and low level device drivers that
+  manage actual hardware initialization and resources sharing with user
+  space processes.
+
+  Unlike most other PMDs, these modules must remain loaded and bound to
+  their devices:
+
+  - mana: Ethernet device driver that provides kernel network interfaces.
+  - mana_ib: InfiniBand device driver.
+  - ib_uverbs: user space driver for verbs (entry point for libibverbs).
+
+Driver compilation and testing
+--
+
+Refer to the document :ref:`compiling and testing a PMD for a NIC 
`
+for details.
diff --git a/drivers/net/mana/mana.c b/drivers/net/mana/mana.c
new file mode 100644
index 00..893e2b1e23
--- /dev/null
+++ b/drivers/net/mana/mana.c
@@ -0,0 +1,729 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2022 Microsoft Corporation
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#include 
+
+#include "mana.h"
+
+/* Shared memory between primary/secondary processes, per driver */
+struct mana_shared_data *mana_shared_data;
+const struct rte_memzone *mana_shared_mz;
+static co

[PATCH 02/17] net/mana: add device configuration and stop

2022-07-01 Thread longli
From: Long Li 

MANA defines its own memory allocation functions to override the IB layer
defaults when allocating device queues. This patch adds the code for device
configuration and stop.

Signed-off-by: Long Li 
---
 drivers/net/mana/mana.c | 87 -
 drivers/net/mana/mana.h |  3 --
 2 files changed, 85 insertions(+), 5 deletions(-)

diff --git a/drivers/net/mana/mana.c b/drivers/net/mana/mana.c
index 893e2b1e23..882a38d7df 100644
--- a/drivers/net/mana/mana.c
+++ b/drivers/net/mana/mana.c
@@ -57,7 +57,91 @@ static rte_spinlock_t mana_shared_data_lock = 
RTE_SPINLOCK_INITIALIZER;
 int mana_logtype_driver;
 int mana_logtype_init;
 
+static void *mana_alloc_verbs_buf(size_t size, void *data)
+{
+   void *ret;
+   size_t alignment = rte_mem_page_size();
+   int socket = (int)(uintptr_t)data;
+
+   DRV_LOG(DEBUG, "size=%lu socket=%d", size, socket);
+
+   if (alignment == (size_t)-1) {
+   DRV_LOG(ERR, "Failed to get mem page size");
+   rte_errno = ENOMEM;
+   return NULL;
+   }
+
+   ret = rte_zmalloc_socket("mana_verb_buf", size, alignment, socket);
+   if (!ret && size)
+   rte_errno = ENOMEM;
+   return ret;
+}
+
+static void mana_free_verbs_buf(void *ptr, void *data __rte_unused)
+{
+   rte_free(ptr);
+}
+
+static int mana_dev_configure(struct rte_eth_dev *dev)
+{
+   struct mana_priv *priv = dev->data->dev_private;
+   struct rte_eth_conf *dev_conf = &dev->data->dev_conf;
+   const struct rte_eth_rxmode *rxmode = &dev_conf->rxmode;
+   const struct rte_eth_txmode *txmode = &dev_conf->txmode;
+
+   if (dev_conf->rxmode.mq_mode & ETH_MQ_RX_RSS_FLAG)
+   dev_conf->rxmode.offloads |= DEV_RX_OFFLOAD_RSS_HASH;
+
+   if (txmode->offloads & ~BNIC_DEV_TX_OFFLOAD_SUPPORT) {
+   DRV_LOG(ERR, "Unsupported TX offload: %lx", txmode->offloads);
+   return -EINVAL;
+   }
+
+   if (rxmode->offloads & ~BNIC_DEV_RX_OFFLOAD_SUPPORT) {
+   DRV_LOG(ERR, "Unsupported RX offload: %lx", rxmode->offloads);
+   return -EINVAL;
+   }
+
+   if (dev->data->nb_rx_queues != dev->data->nb_tx_queues) {
+   DRV_LOG(ERR, "Only support equal number of rx/tx queues");
+   return -EINVAL;
+   }
+
+   if (!rte_is_power_of_2(dev->data->nb_rx_queues)) {
+   DRV_LOG(ERR, "number of TX/RX queues must be power of 2");
+   return -EINVAL;
+   }
+
+   priv->num_queues = dev->data->nb_rx_queues;
+
+   manadv_set_context_attr(priv->ib_ctx, MANADV_CTX_ATTR_BUF_ALLOCATORS,
+   (void *)((uintptr_t)&(struct 
manadv_ctx_allocators){
+   .alloc = &mana_alloc_verbs_buf,
+   .free = &mana_free_verbs_buf,
+   .data = 0,
+   }));
+
+   return 0;
+}
+
+static int
+mana_dev_close(struct rte_eth_dev *dev)
+{
+   struct mana_priv *priv = dev->data->dev_private;
+   int ret;
+
+   ret = ibv_close_device(priv->ib_ctx);
+   if (ret) {
+   ret = errno;
+   return ret;
+   }
+
+   return 0;
+}
+
 const struct eth_dev_ops mana_dev_ops = {
+   .dev_configure  = mana_dev_configure,
+   .dev_close  = mana_dev_close,
 };
 
 const struct eth_dev_ops mana_dev_sec_ops = {
@@ -652,8 +736,7 @@ static int mana_pci_probe(struct rte_pci_driver *pci_drv 
__rte_unused,
 
 static int mana_dev_uninit(struct rte_eth_dev *dev)
 {
-   RTE_SET_USED(dev);
-   return 0;
+   return mana_dev_close(dev);
 }
 
 static int mana_pci_remove(struct rte_pci_device *pci_dev)
diff --git a/drivers/net/mana/mana.h b/drivers/net/mana/mana.h
index dbef5420ff..9609bee4de 100644
--- a/drivers/net/mana/mana.h
+++ b/drivers/net/mana/mana.h
@@ -177,9 +177,6 @@ uint16_t mana_rx_burst_removed(void *dpdk_rxq, struct 
rte_mbuf **pkts,
 uint16_t mana_tx_burst_removed(void *dpdk_rxq, struct rte_mbuf **pkts,
   uint16_t pkts_n);
 
-void *mana_alloc_verbs_buf(size_t size, void *data);
-void mana_free_verbs_buf(void *ptr, void *data);
-
 /** Request timeout for IPC. */
 #define MANA_MP_REQ_TIMEOUT_SEC 5
 
-- 
2.17.1



[PATCH 03/17] net/mana: add function to report supported ptypes

2022-07-01 Thread longli
From: Long Li 

Report supported protocol types.

Signed-off-by: Long Li 
---
 drivers/net/mana/mana.c | 16 
 drivers/net/mana/mana.h |  2 --
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mana/mana.c b/drivers/net/mana/mana.c
index 882a38d7df..77796ce40d 100644
--- a/drivers/net/mana/mana.c
+++ b/drivers/net/mana/mana.c
@@ -139,9 +139,25 @@ mana_dev_close(struct rte_eth_dev *dev)
return 0;
 }
 
+static const uint32_t *mana_supported_ptypes(struct rte_eth_dev *dev 
__rte_unused)
+{
+   static const uint32_t ptypes[] = {
+   RTE_PTYPE_L2_ETHER,
+   RTE_PTYPE_L3_IPV4_EXT_UNKNOWN,
+   RTE_PTYPE_L3_IPV6_EXT_UNKNOWN,
+   RTE_PTYPE_L4_FRAG,
+   RTE_PTYPE_L4_TCP,
+   RTE_PTYPE_L4_UDP,
+   RTE_PTYPE_UNKNOWN
+   };
+
+   return ptypes;
+}
+
 const struct eth_dev_ops mana_dev_ops = {
.dev_configure  = mana_dev_configure,
.dev_close  = mana_dev_close,
+   .dev_supported_ptypes_get = mana_supported_ptypes,
 };
 
 const struct eth_dev_ops mana_dev_sec_ops = {
diff --git a/drivers/net/mana/mana.h b/drivers/net/mana/mana.h
index 9609bee4de..b0571a0516 100644
--- a/drivers/net/mana/mana.h
+++ b/drivers/net/mana/mana.h
@@ -169,8 +169,6 @@ extern int mana_logtype_init;
 
 #define PMD_INIT_FUNC_TRACE() PMD_INIT_LOG(DEBUG, " >>")
 
-const uint32_t *mana_supported_ptypes(struct rte_eth_dev *dev);
-
 uint16_t mana_rx_burst_removed(void *dpdk_rxq, struct rte_mbuf **pkts,
   uint16_t pkts_n);
 
-- 
2.17.1



[PATCH 04/17] net/mana: add link update

2022-07-01 Thread longli
From: Long Li 

The carrier state is managed by the Azure host. MANA runs as a VF and always
reports UP.

Signed-off-by: Long Li 
---
 doc/guides/nics/features/mana.ini |  1 +
 drivers/net/mana/mana.c   | 17 +
 2 files changed, 18 insertions(+)

diff --git a/doc/guides/nics/features/mana.ini 
b/doc/guides/nics/features/mana.ini
index 9d8676089b..b7e7cc510b 100644
--- a/doc/guides/nics/features/mana.ini
+++ b/doc/guides/nics/features/mana.ini
@@ -4,6 +4,7 @@
 ; Refer to default.ini for the full list of available PMD features.
 ;
 [Features]
+Link status  = P
 Linux= Y
 Multiprocess aware   = Y
 Usage doc= Y
diff --git a/drivers/net/mana/mana.c b/drivers/net/mana/mana.c
index 77796ce40d..7b495e1aa1 100644
--- a/drivers/net/mana/mana.c
+++ b/drivers/net/mana/mana.c
@@ -154,10 +154,27 @@ static const uint32_t *mana_supported_ptypes(struct 
rte_eth_dev *dev __rte_unuse
return ptypes;
 }
 
+static int mana_dev_link_update(struct rte_eth_dev *dev,
+   int wait_to_complete __rte_unused)
+{
+   struct rte_eth_link link;
+
+   /* MANA has no concept of carrier state, always reporting UP */
+   link = (struct rte_eth_link) {
+   .link_duplex = RTE_ETH_LINK_FULL_DUPLEX,
+   .link_autoneg = RTE_ETH_LINK_SPEED_FIXED,
+   .link_speed = RTE_ETH_SPEED_NUM_200G,
+   .link_status = RTE_ETH_LINK_UP,
+   };
+
+   return rte_eth_linkstatus_set(dev, &link);
+}
+
 const struct eth_dev_ops mana_dev_ops = {
.dev_configure  = mana_dev_configure,
.dev_close  = mana_dev_close,
.dev_supported_ptypes_get = mana_supported_ptypes,
+   .link_update= mana_dev_link_update,
 };
 
 const struct eth_dev_ops mana_dev_sec_ops = {
-- 
2.17.1



[PATCH 05/17] net/mana: add function for device removal interrupts

2022-07-01 Thread longli
From: Long Li 

MANA supports PCI hot plug events. Register this interrupt with the DPDK
core so the PMD can detect device removal during Azure servicing or live migration.

Signed-off-by: Long Li 
---
 doc/guides/nics/features/mana.ini |  1 +
 drivers/net/mana/mana.c   | 97 +++
 drivers/net/mana/mana.h   |  1 +
 3 files changed, 99 insertions(+)

diff --git a/doc/guides/nics/features/mana.ini 
b/doc/guides/nics/features/mana.ini
index b7e7cc510b..47e20754eb 100644
--- a/doc/guides/nics/features/mana.ini
+++ b/doc/guides/nics/features/mana.ini
@@ -7,5 +7,6 @@
 Link status  = P
 Linux= Y
 Multiprocess aware   = Y
+Removal event= Y
 Usage doc= Y
 x86-64   = Y
diff --git a/drivers/net/mana/mana.c b/drivers/net/mana/mana.c
index 7b495e1aa1..f03908b6e4 100644
--- a/drivers/net/mana/mana.c
+++ b/drivers/net/mana/mana.c
@@ -124,12 +124,18 @@ static int mana_dev_configure(struct rte_eth_dev *dev)
return 0;
 }
 
+static int mana_intr_uninstall(struct mana_priv *priv);
+
 static int
 mana_dev_close(struct rte_eth_dev *dev)
 {
struct mana_priv *priv = dev->data->dev_private;
int ret;
 
+   ret = mana_intr_uninstall(priv);
+   if (ret)
+   return ret;
+
ret = ibv_close_device(priv->ib_ctx);
if (ret) {
ret = errno;
@@ -364,6 +370,90 @@ static int mana_ibv_device_to_pci_addr(const struct 
ibv_device *device,
return 0;
 }
 
+static void mana_intr_handler(void *arg)
+{
+   struct mana_priv *priv = arg;
+   struct ibv_context *ctx = priv->ib_ctx;
+   struct ibv_async_event event;
+
+   /* Read and ack all messages from IB device */
+   while (true) {
+   if (ibv_get_async_event(ctx, &event))
+   break;
+
+   if (event.event_type == IBV_EVENT_DEVICE_FATAL) {
+   struct rte_eth_dev *dev;
+
+   dev = &rte_eth_devices[priv->port_id];
+   if (dev->data->dev_conf.intr_conf.rmv)
+   rte_eth_dev_callback_process(dev,
+   RTE_ETH_EVENT_INTR_RMV, NULL);
+   }
+
+   ibv_ack_async_event(&event);
+   }
+}
+
+static int mana_intr_uninstall(struct mana_priv *priv)
+{
+   int ret;
+
+   ret = rte_intr_callback_unregister(priv->intr_handle,
+  mana_intr_handler, priv);
+   if (ret <= 0) {
+   DRV_LOG(ERR, "Failed to unregister intr callback ret %d", ret);
+   return ret;
+   }
+
+   rte_intr_instance_free(priv->intr_handle);
+
+   return 0;
+}
+
+static int mana_intr_install(struct mana_priv *priv)
+{
+   int ret, flags;
+   struct ibv_context *ctx = priv->ib_ctx;
+
+   priv->intr_handle = rte_intr_instance_alloc(RTE_INTR_INSTANCE_F_SHARED);
+   if (!priv->intr_handle) {
+   DRV_LOG(ERR, "Failed to allocate intr_handle");
+   rte_errno = ENOMEM;
+   return -ENOMEM;
+   }
+
+   rte_intr_fd_set(priv->intr_handle, -1);
+
+   flags = fcntl(ctx->async_fd, F_GETFL);
+   ret = fcntl(ctx->async_fd, F_SETFL, flags | O_NONBLOCK);
+   if (ret) {
+   DRV_LOG(ERR, "Failed to change async_fd to NONBLOCK");
+   goto free_intr;
+   }
+
+   rte_intr_fd_set(priv->intr_handle, ctx->async_fd);
+   rte_intr_type_set(priv->intr_handle, RTE_INTR_HANDLE_EXT);
+
+   ret = rte_intr_callback_register(priv->intr_handle,
+mana_intr_handler, priv);
+   if (ret) {
+   DRV_LOG(ERR, "Failed to register intr callback");
+   rte_intr_fd_set(priv->intr_handle, -1);
+   goto restore_fd;
+   }
+
+   return 0;
+
+restore_fd:
+   fcntl(ctx->async_fd, F_SETFL, flags);
+
+free_intr:
+   rte_intr_instance_free(priv->intr_handle);
+   priv->intr_handle = NULL;
+
+   return ret;
+}
+
 static int mana_proc_priv_init(struct rte_eth_dev *dev)
 {
struct mana_process_priv *priv;
@@ -677,6 +767,13 @@ static int mana_pci_probe_mac(struct rte_pci_driver 
*pci_drv __rte_unused,
name, priv->max_rx_queues, priv->max_rx_desc,
priv->max_send_sge);
 
+   /* Create async interrupt handler */
+   ret = mana_intr_install(priv);
+   if (ret) {
+   DRV_LOG(ERR, "Failed to install intr handler");
+   goto failed;
+   }
+
rte_spinlock_lock(&mana_shared_data->lock);
mana_shared_data->primary_cnt++;
rte_spinlock_unlock(&mana_shared_data->lock);
diff --git a/drivers/net/mana/mana.h b/drivers/net/mana/mana.h
index b0571a0516..71c82e4bd2 100644
---

[PATCH 06/17] net/mana: add device info

2022-07-01 Thread longli
From: Long Li 

Add the function to get device info.

Signed-off-by: Long Li 
---
 doc/guides/nics/features/mana.ini |  1 +
 drivers/net/mana/mana.c   | 82 +++
 2 files changed, 83 insertions(+)

diff --git a/doc/guides/nics/features/mana.ini 
b/doc/guides/nics/features/mana.ini
index 47e20754eb..5183c6d3d0 100644
--- a/doc/guides/nics/features/mana.ini
+++ b/doc/guides/nics/features/mana.ini
@@ -8,5 +8,6 @@ Link status  = P
 Linux= Y
 Multiprocess aware   = Y
 Removal event= Y
+Speed capabilities   = P
 Usage doc= Y
 x86-64   = Y
diff --git a/drivers/net/mana/mana.c b/drivers/net/mana/mana.c
index f03908b6e4..1513d5904b 100644
--- a/drivers/net/mana/mana.c
+++ b/drivers/net/mana/mana.c
@@ -145,6 +145,86 @@ mana_dev_close(struct rte_eth_dev *dev)
return 0;
 }
 
+static int mana_dev_info_get(struct rte_eth_dev *dev,
+struct rte_eth_dev_info *dev_info)
+{
+   struct mana_priv *priv = dev->data->dev_private;
+
+   dev_info->max_mtu = RTE_ETHER_MTU;
+
+   /* RX params */
+   dev_info->min_rx_bufsize = MIN_RX_BUF_SIZE;
+   dev_info->max_rx_pktlen = MAX_FRAME_SIZE;
+
+   dev_info->max_rx_queues = priv->max_rx_queues;
+   dev_info->max_tx_queues = priv->max_tx_queues;
+
+   dev_info->max_mac_addrs = BNIC_MAX_MAC_ADDR;
+   dev_info->max_hash_mac_addrs = 0;
+
+   dev_info->max_vfs = 1;
+
+   /* Offload params */
+   dev_info->rx_offload_capa = BNIC_DEV_RX_OFFLOAD_SUPPORT;
+
+   dev_info->tx_offload_capa = BNIC_DEV_TX_OFFLOAD_SUPPORT;
+
+   /* RSS */
+   dev_info->reta_size = INDIRECTION_TABLE_NUM_ELEMENTS;
+   dev_info->hash_key_size = TOEPLITZ_HASH_KEY_SIZE_IN_BYTES;
+   dev_info->flow_type_rss_offloads = BNIC_ETH_RSS_SUPPORT;
+
+   /* Thresholds */
+   dev_info->default_rxconf = (struct rte_eth_rxconf){
+   .rx_thresh = {
+   .pthresh = 8,
+   .hthresh = 8,
+   .wthresh = 0,
+   },
+   .rx_free_thresh = 32,
+   /* If no descriptors available, pkts are dropped by default */
+   .rx_drop_en = 1,
+   };
+
+   dev_info->default_txconf = (struct rte_eth_txconf){
+   .tx_thresh = {
+   .pthresh = 32,
+   .hthresh = 0,
+   .wthresh = 0,
+   },
+   .tx_rs_thresh = 32,
+   .tx_free_thresh = 32,
+   };
+
+   /* Buffer limits */
+   dev_info->rx_desc_lim.nb_min = MIN_BUFFERS_PER_QUEUE;
+   dev_info->rx_desc_lim.nb_max = priv->max_rx_desc;
+   dev_info->rx_desc_lim.nb_align = MIN_BUFFERS_PER_QUEUE;
+   dev_info->rx_desc_lim.nb_seg_max = priv->max_recv_sge;
+   dev_info->rx_desc_lim.nb_mtu_seg_max = priv->max_recv_sge;
+
+   dev_info->tx_desc_lim.nb_min = MIN_BUFFERS_PER_QUEUE;
+   dev_info->tx_desc_lim.nb_max = priv->max_tx_desc;
+   dev_info->tx_desc_lim.nb_align = MIN_BUFFERS_PER_QUEUE;
+   dev_info->tx_desc_lim.nb_seg_max = priv->max_send_sge;
+   dev_info->rx_desc_lim.nb_mtu_seg_max = priv->max_recv_sge;
+
+   /* Speed */
+   dev_info->speed_capa = ETH_LINK_SPEED_100G;
+
+   /* RX params */
+   dev_info->default_rxportconf.burst_size = 1;
+   dev_info->default_rxportconf.ring_size = MAX_RECEIVE_BUFFERS_PER_QUEUE;
+   dev_info->default_rxportconf.nb_queues = 1;
+
+   /* TX params */
+   dev_info->default_txportconf.burst_size = 1;
+   dev_info->default_txportconf.ring_size = MAX_SEND_BUFFERS_PER_QUEUE;
+   dev_info->default_txportconf.nb_queues = 1;
+
+   return 0;
+}
+
 static const uint32_t *mana_supported_ptypes(struct rte_eth_dev *dev 
__rte_unused)
 {
static const uint32_t ptypes[] = {
@@ -179,11 +259,13 @@ static int mana_dev_link_update(struct rte_eth_dev *dev,
 const struct eth_dev_ops mana_dev_ops = {
.dev_configure  = mana_dev_configure,
.dev_close  = mana_dev_close,
+   .dev_infos_get  = mana_dev_info_get,
.dev_supported_ptypes_get = mana_supported_ptypes,
.link_update= mana_dev_link_update,
 };
 
 const struct eth_dev_ops mana_dev_sec_ops = {
+   .dev_infos_get = mana_dev_info_get,
 };
 
 uint16_t
-- 
2.17.1



[PATCH 07/17] net/mana: add function to configure RSS

2022-07-01 Thread longli
From: Long Li 

Currently this PMD supports RSS configuration when the device is stopped.
Configuring RSS in running state will be supported in the future.

Signed-off-by: Long Li 
---
 doc/guides/nics/features/mana.ini |  1 +
 drivers/net/mana/mana.c   | 61 +++
 drivers/net/mana/mana.h   |  1 +
 3 files changed, 63 insertions(+)

diff --git a/doc/guides/nics/features/mana.ini 
b/doc/guides/nics/features/mana.ini
index 5183c6d3d0..9ba4767a06 100644
--- a/doc/guides/nics/features/mana.ini
+++ b/doc/guides/nics/features/mana.ini
@@ -8,6 +8,7 @@ Link status  = P
 Linux= Y
 Multiprocess aware   = Y
 Removal event= Y
+RSS hash = Y
 Speed capabilities   = P
 Usage doc= Y
 x86-64   = Y
diff --git a/drivers/net/mana/mana.c b/drivers/net/mana/mana.c
index 1513d5904b..46b1d5502d 100644
--- a/drivers/net/mana/mana.c
+++ b/drivers/net/mana/mana.c
@@ -240,6 +240,65 @@ static const uint32_t *mana_supported_ptypes(struct 
rte_eth_dev *dev __rte_unuse
return ptypes;
 }
 
+static int mana_rss_hash_update(struct rte_eth_dev *dev,
+   struct rte_eth_rss_conf *rss_conf)
+{
+   struct mana_priv *priv = dev->data->dev_private;
+
+   /* Currently can only update RSS hash when device is stopped */
+   if (dev->data->dev_started) {
+   DRV_LOG(ERR, "Can't update RSS after device has started");
+   return -ENODEV;
+   }
+
+   if (rss_conf->rss_hf & ~BNIC_ETH_RSS_SUPPORT) {
+   DRV_LOG(ERR, "Port %u invalid RSS HF 0x%lx",
+   dev->data->port_id, rss_conf->rss_hf);
+   return -EINVAL;
+   }
+
+   if (rss_conf->rss_key && rss_conf->rss_key_len) {
+   if (rss_conf->rss_key_len != TOEPLITZ_HASH_KEY_SIZE_IN_BYTES) {
+   DRV_LOG(ERR, "Port %u key len must be %u long",
+   dev->data->port_id,
+   TOEPLITZ_HASH_KEY_SIZE_IN_BYTES);
+   return -EINVAL;
+   }
+
+   priv->rss_conf.rss_key_len = rss_conf->rss_key_len;
+   priv->rss_conf.rss_key =
+   rte_zmalloc("mana_rss", rss_conf->rss_key_len,
+   RTE_CACHE_LINE_SIZE);
+   if (!priv->rss_conf.rss_key)
+   return -ENOMEM;
+   memcpy(priv->rss_conf.rss_key, rss_conf->rss_key,
+  rss_conf->rss_key_len);
+   }
+   priv->rss_conf.rss_hf = rss_conf->rss_hf;
+
+   return 0;
+}
+
+static int mana_rss_hash_conf_get(struct rte_eth_dev *dev,
+ struct rte_eth_rss_conf *rss_conf)
+{
+   struct mana_priv *priv = dev->data->dev_private;
+
+   if (!rss_conf)
+   return -EINVAL;
+
+   if (rss_conf->rss_key &&
+   rss_conf->rss_key_len >= priv->rss_conf.rss_key_len) {
+   memcpy(rss_conf->rss_key, priv->rss_conf.rss_key,
+  priv->rss_conf.rss_key_len);
+   }
+
+   rss_conf->rss_key_len = priv->rss_conf.rss_key_len;
+   rss_conf->rss_hf = priv->rss_conf.rss_hf;
+
+   return 0;
+}
+
 static int mana_dev_link_update(struct rte_eth_dev *dev,
int wait_to_complete __rte_unused)
 {
@@ -261,6 +320,8 @@ const struct eth_dev_ops mana_dev_ops = {
.dev_close  = mana_dev_close,
.dev_infos_get  = mana_dev_info_get,
.dev_supported_ptypes_get = mana_supported_ptypes,
+   .rss_hash_update= mana_rss_hash_update,
+   .rss_hash_conf_get  = mana_rss_hash_conf_get,
.link_update= mana_dev_link_update,
 };
 
diff --git a/drivers/net/mana/mana.h b/drivers/net/mana/mana.h
index 71c82e4bd2..1efb2330ee 100644
--- a/drivers/net/mana/mana.h
+++ b/drivers/net/mana/mana.h
@@ -72,6 +72,7 @@ struct mana_priv {
uint8_t ind_table_key[40];
struct ibv_qp *rwq_qp;
void *db_page;
+   struct rte_eth_rss_conf rss_conf;
struct rte_intr_handle *intr_handle;
int max_rx_queues;
int max_tx_queues;
-- 
2.17.1



[PATCH 08/17] net/mana: add function to configure RX queues

2022-07-01 Thread longli
From: Long Li 

The RX hardware queue is allocated when the queue is started. This function
handles queue configuration before the queue is started.

Signed-off-by: Long Li 
---
 drivers/net/mana/mana.c | 68 +
 1 file changed, 68 insertions(+)

diff --git a/drivers/net/mana/mana.c b/drivers/net/mana/mana.c
index 46b1d5502d..951fc418b6 100644
--- a/drivers/net/mana/mana.c
+++ b/drivers/net/mana/mana.c
@@ -225,6 +225,16 @@ static int mana_dev_info_get(struct rte_eth_dev *dev,
return 0;
 }
 
+static void mana_dev_rx_queue_info(struct rte_eth_dev *dev, uint16_t queue_id,
+  struct rte_eth_rxq_info *qinfo)
+{
+   struct mana_rxq *rxq = dev->data->rx_queues[queue_id];
+
+   qinfo->mp = rxq->mp;
+   qinfo->nb_desc = rxq->num_desc;
+   qinfo->conf.offloads = dev->data->dev_conf.rxmode.offloads;
+}
+
 static const uint32_t *mana_supported_ptypes(struct rte_eth_dev *dev 
__rte_unused)
 {
static const uint32_t ptypes[] = {
@@ -299,6 +309,61 @@ static int mana_rss_hash_conf_get(struct rte_eth_dev *dev,
return 0;
 }
 
+static int mana_dev_rx_queue_setup(struct rte_eth_dev *dev,
+  uint16_t queue_idx, uint16_t nb_desc,
+  unsigned int socket_id,
+  const struct rte_eth_rxconf *rx_conf 
__rte_unused,
+  struct rte_mempool *mp)
+{
+   struct mana_priv *priv = dev->data->dev_private;
+   struct mana_rxq *rxq;
+   int ret;
+
+   rxq = rte_zmalloc_socket("mana_rxq", sizeof(*rxq), 0, socket_id);
+   if (!rxq) {
+   DRV_LOG(ERR, "failed to allocate rxq");
+   return -ENOMEM;
+   }
+
+   DRV_LOG(DEBUG, "idx %u nb_desc %u socket %u",
+   queue_idx, nb_desc, socket_id);
+
+   rxq->socket = socket_id;
+
+   rxq->desc_ring = rte_zmalloc_socket("mana_rx_mbuf_ring",
+   sizeof(struct mana_rxq_desc) *
+   nb_desc,
+   RTE_CACHE_LINE_SIZE, socket_id);
+
+   if (!rxq->desc_ring) {
+   DRV_LOG(ERR, "failed to allocate rxq desc_ring");
+   ret = -ENOMEM;
+   goto fail;
+   }
+
+   rxq->num_desc = nb_desc;
+
+   rxq->priv = priv;
+   rxq->num_desc = nb_desc;
+   rxq->mp = mp;
+   dev->data->rx_queues[queue_idx] = rxq;
+
+   return 0;
+
+fail:
+   rte_free(rxq->desc_ring);
+   rte_free(rxq);
+   return ret;
+}
+
+static void mana_dev_rx_queue_release(struct rte_eth_dev *dev, uint16_t qid)
+{
+   struct mana_rxq *rxq = dev->data->rx_queues[qid];
+
+   rte_free(rxq->desc_ring);
+   rte_free(rxq);
+}
+
 static int mana_dev_link_update(struct rte_eth_dev *dev,
int wait_to_complete __rte_unused)
 {
@@ -319,9 +384,12 @@ const struct eth_dev_ops mana_dev_ops = {
.dev_configure  = mana_dev_configure,
.dev_close  = mana_dev_close,
.dev_infos_get  = mana_dev_info_get,
+   .rxq_info_get   = mana_dev_rx_queue_info,
.dev_supported_ptypes_get = mana_supported_ptypes,
.rss_hash_update= mana_rss_hash_update,
.rss_hash_conf_get  = mana_rss_hash_conf_get,
+   .rx_queue_setup = mana_dev_rx_queue_setup,
+   .rx_queue_release   = mana_dev_rx_queue_release,
.link_update= mana_dev_link_update,
 };
 
-- 
2.17.1



[PATCH 09/17] net/mana: add function to configure TX queues

2022-07-01 Thread longli
From: Long Li 

The TX hardware queue is allocated when the queue is started; this function
handles pre-start configuration.

Signed-off-by: Long Li 
---
 drivers/net/mana/mana.c | 65 +
 1 file changed, 65 insertions(+)

diff --git a/drivers/net/mana/mana.c b/drivers/net/mana/mana.c
index 951fc418b6..6b1c3ee035 100644
--- a/drivers/net/mana/mana.c
+++ b/drivers/net/mana/mana.c
@@ -225,6 +225,15 @@ static int mana_dev_info_get(struct rte_eth_dev *dev,
return 0;
 }
 
+static void mana_dev_tx_queue_info(struct rte_eth_dev *dev, uint16_t queue_id,
+   struct rte_eth_txq_info *qinfo)
+{
+   struct mana_txq *txq = dev->data->tx_queues[queue_id];
+
+   qinfo->conf.offloads = dev->data->dev_conf.txmode.offloads;
+   qinfo->nb_desc = txq->num_desc;
+}
+
 static void mana_dev_rx_queue_info(struct rte_eth_dev *dev, uint16_t queue_id,
   struct rte_eth_rxq_info *qinfo)
 {
@@ -309,6 +318,59 @@ static int mana_rss_hash_conf_get(struct rte_eth_dev *dev,
return 0;
 }
 
+static int mana_dev_tx_queue_setup(struct rte_eth_dev *dev,
+  uint16_t queue_idx, uint16_t nb_desc,
+  unsigned int socket_id,
+  const struct rte_eth_txconf *tx_conf 
__rte_unused)
+
+{
+   struct mana_priv *priv = dev->data->dev_private;
+   struct mana_txq *txq;
+   int ret;
+
+   txq = rte_zmalloc_socket("mana_txq", sizeof(*txq), 0, socket_id);
+   if (!txq) {
+   DRV_LOG(ERR, "failed to allocate txq");
+   return -ENOMEM;
+   }
+
+   txq->socket = socket_id;
+
+   txq->desc_ring = rte_malloc_socket("mana_tx_desc_ring",
+  sizeof(struct mana_txq_desc) *
+   nb_desc,
+  RTE_CACHE_LINE_SIZE, socket_id);
+   if (!txq->desc_ring) {
+   DRV_LOG(ERR, "failed to allocate txq desc_ring");
+   ret = -ENOMEM;
+   goto fail;
+   }
+
+   DRV_LOG(DEBUG, "idx %u nb_desc %u socket %u txq->desc_ring %p",
+   queue_idx, nb_desc, socket_id, txq->desc_ring);
+
+   txq->desc_ring_head = 0;
+   txq->desc_ring_tail = 0;
+   txq->priv = priv;
+   txq->num_desc = nb_desc;
+   dev->data->tx_queues[queue_idx] = txq;
+
+   return 0;
+
+fail:
+   rte_free(txq->desc_ring);
+   rte_free(txq);
+   return ret;
+}
+
+static void mana_dev_tx_queue_release(struct rte_eth_dev *dev, uint16_t qid)
+{
+   struct mana_txq *txq = dev->data->tx_queues[qid];
+
+   rte_free(txq->desc_ring);
+   rte_free(txq);
+}
+
 static int mana_dev_rx_queue_setup(struct rte_eth_dev *dev,
   uint16_t queue_idx, uint16_t nb_desc,
   unsigned int socket_id,
@@ -384,10 +446,13 @@ const struct eth_dev_ops mana_dev_ops = {
.dev_configure  = mana_dev_configure,
.dev_close  = mana_dev_close,
.dev_infos_get  = mana_dev_info_get,
+   .txq_info_get   = mana_dev_tx_queue_info,
.rxq_info_get   = mana_dev_rx_queue_info,
.dev_supported_ptypes_get = mana_supported_ptypes,
.rss_hash_update= mana_rss_hash_update,
.rss_hash_conf_get  = mana_rss_hash_conf_get,
+   .tx_queue_setup = mana_dev_tx_queue_setup,
+   .tx_queue_release   = mana_dev_tx_queue_release,
.rx_queue_setup = mana_dev_rx_queue_setup,
.rx_queue_release   = mana_dev_rx_queue_release,
.link_update= mana_dev_link_update,
-- 
2.17.1



[PATCH 10/17] net/mana: implement memory registration

2022-07-01 Thread longli
From: Long Li 

The MANA hardware has a built-in IOMMU that provides safe hardware access to
user memory through memory registration. Since memory registration is an
expensive operation, this patch implements a two-level memory registration
cache mechanism, one per queue and one per port.
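
The per-queue cache level can be illustrated with a sorted-range lookup. This is a hedged sketch under assumed semantics, not the driver's actual btree code; struct mr_entry and mr_lookup() are illustrative names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative cache entry: a registered [addr, addr + len) range and the
 * lkey the hardware hands back for it (mirrors struct mana_mr_cache). */
struct mr_entry {
	uintptr_t addr;
	size_t len;
	uint32_t lkey;
};

/* Binary search over a table sorted by base address, as a lockless
 * per-queue cache level might do; in the PMD a miss would fall back to
 * the per-port tree and ultimately to a fresh, expensive registration. */
static const struct mr_entry *
mr_lookup(const struct mr_entry *tbl, int n, uintptr_t addr, size_t len)
{
	int lo = 0, hi = n - 1, best = -1;

	/* Find the entry with the largest base address <= addr. */
	while (lo <= hi) {
		int mid = lo + (hi - lo) / 2;

		if (tbl[mid].addr <= addr) {
			best = mid;
			lo = mid + 1;
		} else {
			hi = mid - 1;
		}
	}

	/* Hit only if the whole buffer fits inside the registration. */
	if (best >= 0 && addr + len <= tbl[best].addr + tbl[best].len)
		return &tbl[best];
	return NULL;
}
```

The containment check matters: a buffer straddling the end of a registered range must miss, so the caller registers a region that covers it fully.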

Signed-off-by: Long Li 
---
 drivers/net/mana/mana.c  |  20 +++
 drivers/net/mana/mana.h  |  38 
 drivers/net/mana/meson.build |   1 +
 drivers/net/mana/mp.c|  85 +
 drivers/net/mana/mr.c| 339 +++
 5 files changed, 483 insertions(+)
 create mode 100644 drivers/net/mana/mr.c

diff --git a/drivers/net/mana/mana.c b/drivers/net/mana/mana.c
index 6b1c3ee035..6c8983cd6a 100644
--- a/drivers/net/mana/mana.c
+++ b/drivers/net/mana/mana.c
@@ -132,6 +132,8 @@ mana_dev_close(struct rte_eth_dev *dev)
struct mana_priv *priv = dev->data->dev_private;
int ret;
 
+   remove_all_mr(priv);
+
ret = mana_intr_uninstall(priv);
if (ret)
return ret;
@@ -346,6 +348,13 @@ static int mana_dev_tx_queue_setup(struct rte_eth_dev *dev,
goto fail;
}
 
+   ret = mana_mr_btree_init(&txq->mr_btree,
+MANA_MR_BTREE_PER_QUEUE_N, 0);
+   if (ret) {
+   DRV_LOG(ERR, "Failed to init TXQ MR btree");
+   goto fail;
+   }
+
DRV_LOG(DEBUG, "idx %u nb_desc %u socket %u txq->desc_ring %p",
queue_idx, nb_desc, socket_id, txq->desc_ring);
 
@@ -367,6 +376,8 @@ static void mana_dev_tx_queue_release(struct rte_eth_dev 
*dev, uint16_t qid)
 {
struct mana_txq *txq = dev->data->tx_queues[qid];
 
+   mana_mr_btree_free(&txq->mr_btree);
+
rte_free(txq->desc_ring);
rte_free(txq);
 }
@@ -403,6 +414,13 @@ static int mana_dev_rx_queue_setup(struct rte_eth_dev *dev,
goto fail;
}
 
+   ret = mana_mr_btree_init(&rxq->mr_btree,
+MANA_MR_BTREE_PER_QUEUE_N, socket_id);
+   if (ret) {
+   DRV_LOG(ERR, "Failed to init RXQ MR btree");
+   goto fail;
+   }
+
rxq->num_desc = nb_desc;
 
rxq->priv = priv;
@@ -422,6 +440,8 @@ static void mana_dev_rx_queue_release(struct rte_eth_dev 
*dev, uint16_t qid)
 {
struct mana_rxq *rxq = dev->data->rx_queues[qid];
 
+   mana_mr_btree_free(&rxq->mr_btree);
+
rte_free(rxq->desc_ring);
rte_free(rxq);
 }
diff --git a/drivers/net/mana/mana.h b/drivers/net/mana/mana.h
index 1efb2330ee..b1ef9ce60b 100644
--- a/drivers/net/mana/mana.h
+++ b/drivers/net/mana/mana.h
@@ -50,6 +50,22 @@ struct mana_shared_data {
 #define MAX_RECEIVE_BUFFERS_PER_QUEUE  256
 #define MAX_SEND_BUFFERS_PER_QUEUE 256
 
+struct mana_mr_cache {
+   uint32_tlkey;
+   uintptr_t   addr;
+   size_t  len;
+   void*verb_obj;
+};
+
+#define MANA_MR_BTREE_CACHE_N  512
+struct mana_mr_btree {
+   uint16_tlen;/* Used entries */
+   uint16_tsize;   /* Total entries */
+   int overflow;
+   int socket;
+   struct mana_mr_cache *table;
+};
+
 struct mana_process_priv {
void *db_page;
 };
@@ -82,6 +98,7 @@ struct mana_priv {
int max_recv_sge;
int max_mr;
uint64_t max_mr_size;
+   struct mana_mr_btree mr_btree;
rte_rwlock_tmr_list_lock;
 };
 
@@ -132,6 +149,7 @@ struct mana_txq {
uint32_t desc_ring_head, desc_ring_tail;
 
struct mana_stats stats;
+   struct mana_mr_btree mr_btree;
unsigned int socket;
 };
 
@@ -154,6 +172,7 @@ struct mana_rxq {
struct mana_gdma_queue gdma_cq;
 
struct mana_stats stats;
+   struct mana_mr_btree mr_btree;
 
unsigned int socket;
 };
@@ -177,6 +196,24 @@ uint16_t mana_rx_burst_removed(void *dpdk_rxq, struct 
rte_mbuf **pkts,
 uint16_t mana_tx_burst_removed(void *dpdk_rxq, struct rte_mbuf **pkts,
   uint16_t pkts_n);
 
+struct mana_mr_cache *find_pmd_mr(struct mana_mr_btree *local_tree,
+ struct mana_priv *priv,
+ struct rte_mbuf *mbuf);
+int new_pmd_mr(struct mana_mr_btree *local_tree, struct mana_priv *priv,
+  struct rte_mempool *pool);
+void remove_all_mr(struct mana_priv *priv);
+void del_pmd_mr(struct mana_mr_cache *mr);
+
+void mana_mempool_chunk_cb(struct rte_mempool *mp, void *opaque,
+  struct rte_mempool_memhdr *memhdr, unsigned int idx);
+
+struct mana_mr_cache *mana_mr_btree_lookup(struct mana_mr_btree *bt,
+  uint16_t *idx,
+  uintptr_t addr, size_t len);
+int mana_mr_btree_insert(struct mana_mr_btree *bt, struct mana_mr_cache 
*entry);
+int mana_mr_btree_init(struct mana_mr_btree *bt, int n, int socket);
+void mana_mr_btree_free(struct mana_mr_btree 

[PATCH 11/17] net/mana: implement the hardware layer operations

2022-07-01 Thread longli
From: Long Li 

The hardware layer of MANA understands the device queue and doorbell formats.
These functions are implemented for use by the packet RX/TX code.
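
The wrap-around arithmetic these functions rely on can be shown in isolation. A minimal sketch, assuming a 32-byte alignment unit (standing in for GDMA_WQE_ALIGNMENT_UNIT_SIZE) and a power-of-two queue size, as gdma_get_wqe_pointer() in this patch computes it:

```c
#include <assert.h>
#include <stdint.h>

/* WQE positions advance in fixed alignment units; 32 bytes here is an
 * assumed value standing in for GDMA_WQE_ALIGNMENT_UNIT_SIZE. */
#define WQE_UNIT 32

/* With a power-of-two queue size, the byte offset of the next WQE wraps
 * via a mask instead of a modulo: (head * unit) & (size - 1). */
static uint32_t wqe_offset(uint32_t head, uint32_t queue_size)
{
	return (head * WQE_UNIT) & (queue_size - 1);
}
```

The head counter itself increases monotonically; only the derived byte offset wraps, which is why the mask form is safe.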

Signed-off-by: Long Li 
---
 drivers/net/mana/gdma.c  | 309 +++
 drivers/net/mana/mana.h  | 183 +
 drivers/net/mana/meson.build |   1 +
 3 files changed, 493 insertions(+)
 create mode 100644 drivers/net/mana/gdma.c

diff --git a/drivers/net/mana/gdma.c b/drivers/net/mana/gdma.c
new file mode 100644
index 00..c86ee69bdd
--- /dev/null
+++ b/drivers/net/mana/gdma.c
@@ -0,0 +1,309 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2022 Microsoft Corporation
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#include "mana.h"
+
+uint8_t *gdma_get_wqe_pointer(struct mana_gdma_queue *queue)
+{
+   uint32_t offset_in_bytes =
+   (queue->head * GDMA_WQE_ALIGNMENT_UNIT_SIZE) &
+   (queue->size - 1);
+
+   DRV_LOG(DEBUG, "txq sq_head %u sq_size %u offset_in_bytes %u",
+   queue->head, queue->size, offset_in_bytes);
+
+   if (offset_in_bytes + GDMA_WQE_ALIGNMENT_UNIT_SIZE > queue->size)
+   DRV_LOG(ERR, "fatal error: offset_in_bytes %u too big",
+   offset_in_bytes);
+
+   return ((uint8_t *)queue->buffer) + offset_in_bytes;
+}
+
+static uint32_t
+write_dma_client_oob(uint8_t *work_queue_buffer_pointer,
+const struct gdma_work_request *work_request,
+uint32_t client_oob_size)
+{
+   uint8_t *p = work_queue_buffer_pointer;
+
+   struct gdma_wqe_dma_oob *header = (struct gdma_wqe_dma_oob *)p;
+
+   memset(header, 0, sizeof(struct gdma_wqe_dma_oob));
+   header->num_sgl_entries = work_request->num_sgl_elements;
+   header->inline_client_oob_size_in_dwords =
+   client_oob_size / sizeof(uint32_t);
+   header->client_data_unit = work_request->client_data_unit;
+
+   DRV_LOG(DEBUG, "queue buf %p sgl %u oob_h %u du %u oob_buf %p oob_b %u",
+   work_queue_buffer_pointer, header->num_sgl_entries,
+   header->inline_client_oob_size_in_dwords,
+   header->client_data_unit, work_request->inline_oob_data,
+   work_request->inline_oob_size_in_bytes);
+
+   p += sizeof(struct gdma_wqe_dma_oob);
+   if (work_request->inline_oob_data &&
+   work_request->inline_oob_size_in_bytes > 0) {
+   memcpy(p, work_request->inline_oob_data,
+  work_request->inline_oob_size_in_bytes);
+   if (client_oob_size > work_request->inline_oob_size_in_bytes)
+   memset(p + work_request->inline_oob_size_in_bytes, 0,
+  client_oob_size -
+  work_request->inline_oob_size_in_bytes);
+   }
+
+   return sizeof(struct gdma_wqe_dma_oob) + client_oob_size;
+}
+
+static uint32_t
+write_scatter_gather_list(uint8_t *work_queue_head_pointer,
+ uint8_t *work_queue_end_pointer,
+ uint8_t *work_queue_cur_pointer,
+ struct gdma_work_request *work_request)
+{
+   struct gdma_sgl_element *sge_list;
+   struct gdma_sgl_element dummy_sgl[1];
+   uint8_t *address;
+   uint32_t size;
+   uint32_t num_sge;
+   uint32_t size_to_queue_end;
+   uint32_t sge_list_size;
+
+   DRV_LOG(DEBUG, "work_queue_cur_pointer %p work_request->flags %x",
+   work_queue_cur_pointer, work_request->flags);
+
+   num_sge = work_request->num_sgl_elements;
+   sge_list = work_request->sgl;
+   size_to_queue_end = (uint32_t)(work_queue_end_pointer -
+  work_queue_cur_pointer);
+
+   if (num_sge == 0) {
+   /* Per spec, the case of an empty SGL should be handled as
+* follows to avoid corrupted WQE errors:
+* Write one dummy SGL entry
+* Set the address to 1, leave the rest as 0
+*/
+   dummy_sgl[num_sge].address = 1;
+   dummy_sgl[num_sge].size = 0;
+   dummy_sgl[num_sge].memory_key = 0;
+   num_sge++;
+   sge_list = dummy_sgl;
+   }
+
+   sge_list_size = 0;
+   {
+   address = (uint8_t *)sge_list;
+   size = sizeof(struct gdma_sgl_element) * num_sge;
+   if (size_to_queue_end < size) {
+   memcpy(work_queue_cur_pointer, address,
+  size_to_queue_end);
+   work_queue_cur_pointer = work_queue_head_pointer;
+   address += size_to_queue_end;
+

[PATCH 12/17] net/mana: add function to start/stop TX queues

2022-07-01 Thread longli
From: Long Li 

MANA allocates device queues through the IB layer when starting TX queues. When
the device is stopped, all the queues are unmapped and freed.
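
The drain step of the stop path can be sketched on its own. This is a simplified, DPDK-free illustration of the tail-to-head loop in stop_tx_queues(); struct txq, struct txq_desc and txq_drain() are hypothetical stand-ins:

```c
#include <assert.h>
#include <stdint.h>

#define NUM_DESC 8 /* illustrative ring size */

/* Stand-in for struct mana_txq_desc: each slot pins a packet buffer
 * until the hardware completes it. */
struct txq_desc {
	int pkt;
};

struct txq {
	struct txq_desc ring[NUM_DESC];
	uint32_t head; /* producer: next slot to post */
	uint32_t tail; /* consumer: oldest posted, not yet completed */
};

/* Mirrors the drain loop in stop_tx_queues(): walk from tail to head,
 * releasing each still-posted buffer (rte_pktmbuf_free() in the driver),
 * then reset both indices. Returns how many entries were drained. */
static int txq_drain(struct txq *q)
{
	int drained = 0;

	while (q->tail != q->head) {
		q->ring[q->tail].pkt = 0;
		q->tail = (q->tail + 1) % NUM_DESC;
		drained++;
	}
	q->head = 0;
	q->tail = 0;
	return drained;
}
```

Because the loop runs tail-to-head with a modulo step, it frees exactly the in-flight entries even when the ring has wrapped.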

Signed-off-by: Long Li 
---
 doc/guides/nics/features/mana.ini |   1 +
 drivers/net/mana/mana.h   |   4 +
 drivers/net/mana/meson.build  |   1 +
 drivers/net/mana/tx.c | 180 ++
 4 files changed, 186 insertions(+)
 create mode 100644 drivers/net/mana/tx.c

diff --git a/doc/guides/nics/features/mana.ini 
b/doc/guides/nics/features/mana.ini
index 9ba4767a06..7546c99ea3 100644
--- a/doc/guides/nics/features/mana.ini
+++ b/doc/guides/nics/features/mana.ini
@@ -7,6 +7,7 @@
 Link status  = P
 Linux= Y
 Multiprocess aware   = Y
+Queue start/stop = Y
 Removal event= Y
 RSS hash = Y
 Speed capabilities   = P
diff --git a/drivers/net/mana/mana.h b/drivers/net/mana/mana.h
index 1847902054..fef646a9a7 100644
--- a/drivers/net/mana/mana.h
+++ b/drivers/net/mana/mana.h
@@ -379,6 +379,10 @@ uint16_t mana_tx_burst_removed(void *dpdk_rxq, struct 
rte_mbuf **pkts,
 int gdma_poll_completion_queue(struct mana_gdma_queue *cq,
   struct gdma_comp *comp);
 
+int start_tx_queues(struct rte_eth_dev *dev);
+
+int stop_tx_queues(struct rte_eth_dev *dev);
+
 struct mana_mr_cache *find_pmd_mr(struct mana_mr_btree *local_tree,
  struct mana_priv *priv,
  struct rte_mbuf *mbuf);
diff --git a/drivers/net/mana/meson.build b/drivers/net/mana/meson.build
index 4a80189428..34bb9c6b2f 100644
--- a/drivers/net/mana/meson.build
+++ b/drivers/net/mana/meson.build
@@ -11,6 +11,7 @@ deps += ['pci', 'bus_pci', 'net', 'eal', 'kvargs']
 
 sources += files(
'mana.c',
+   'tx.c',
'mr.c',
'gdma.c',
'mp.c',
diff --git a/drivers/net/mana/tx.c b/drivers/net/mana/tx.c
new file mode 100644
index 00..dde911e548
--- /dev/null
+++ b/drivers/net/mana/tx.c
@@ -0,0 +1,180 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2022 Microsoft Corporation
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#include "mana.h"
+
+int stop_tx_queues(struct rte_eth_dev *dev)
+{
+   struct mana_priv *priv = dev->data->dev_private;
+   int i;
+
+   for (i = 0; i < priv->num_queues; i++) {
+   struct mana_txq *txq = dev->data->tx_queues[i];
+
+   if (txq->qp) {
+   ibv_destroy_qp(txq->qp);
+   txq->qp = NULL;
+   }
+
+   if (txq->cq) {
+   ibv_destroy_cq(txq->cq);
+   txq->cq = NULL;
+   }
+
+   /* Drain and free posted WQEs */
+   while (txq->desc_ring_tail != txq->desc_ring_head) {
+   struct mana_txq_desc *desc =
+   &txq->desc_ring[txq->desc_ring_tail];
+
+   rte_pktmbuf_free(desc->pkt);
+
+   txq->desc_ring_tail =
+   (txq->desc_ring_tail + 1) % txq->num_desc;
+   }
+   txq->desc_ring_head = 0;
+   txq->desc_ring_tail = 0;
+
+   memset(&txq->gdma_sq, 0, sizeof(txq->gdma_sq));
+   memset(&txq->gdma_cq, 0, sizeof(txq->gdma_cq));
+   }
+
+   return 0;
+}
+
+int start_tx_queues(struct rte_eth_dev *dev)
+{
+   struct mana_priv *priv = dev->data->dev_private;
+   int ret, i;
+
+   /* start TX queues */
+   for (i = 0; i < priv->num_queues; i++) {
+   struct mana_txq *txq;
+   struct ibv_qp_init_attr qp_attr = { 0 };
+   struct manadv_obj obj = {};
+   struct manadv_qp dv_qp;
+   struct manadv_cq dv_cq;
+
+   txq = dev->data->tx_queues[i];
+
+   manadv_set_context_attr(priv->ib_ctx,
+   MANADV_CTX_ATTR_BUF_ALLOCATORS,
+   (void *)((uintptr_t)&(struct manadv_ctx_allocators){
+   .alloc = &mana_alloc_verbs_buf,
+   .free = &mana_free_verbs_buf,
+   .data = (void *)(uintptr_t)txq->socket,
+   }));
+
+   txq->cq = ibv_create_cq(priv->ib_ctx, txq->num_desc,
+   NULL, NULL, 0);
+   if (!txq->cq) {
+   DRV_LOG(ERR, "failed to create cq queue index %d", i);
+   ret = -errno;
+   goto fail;
+   }
+
+   qp_attr.send_cq = txq->cq;
+   qp_attr.recv_cq = txq->cq;
+   qp_attr.cap.max_send_wr = txq->num_desc;
+   

[PATCH 13/17] net/mana: add function to start/stop RX queues

2022-07-01 Thread longli
From: Long Li 

MANA allocates device queues through the IB layer when starting RX queues. When
the device is stopped, all the queues are unmapped and freed.

Signed-off-by: Long Li 
---
 drivers/net/mana/mana.h  |   5 +
 drivers/net/mana/meson.build |   1 +
 drivers/net/mana/rx.c| 369 +++
 3 files changed, 375 insertions(+)
 create mode 100644 drivers/net/mana/rx.c

diff --git a/drivers/net/mana/mana.h b/drivers/net/mana/mana.h
index fef646a9a7..5052ec9061 100644
--- a/drivers/net/mana/mana.h
+++ b/drivers/net/mana/mana.h
@@ -364,6 +364,7 @@ extern int mana_logtype_init;
 
 int mana_ring_doorbell(void *db_page, enum gdma_queue_types queue_type,
   uint32_t queue_id, uint32_t tail);
+int rq_ring_doorbell(struct mana_rxq *rxq);
 
 int gdma_post_work_request(struct mana_gdma_queue *queue,
   struct gdma_work_request *work_req,
@@ -379,10 +380,14 @@ uint16_t mana_tx_burst_removed(void *dpdk_rxq, struct 
rte_mbuf **pkts,
 int gdma_poll_completion_queue(struct mana_gdma_queue *cq,
   struct gdma_comp *comp);
 
+int start_rx_queues(struct rte_eth_dev *dev);
 int start_tx_queues(struct rte_eth_dev *dev);
 
+int stop_rx_queues(struct rte_eth_dev *dev);
 int stop_tx_queues(struct rte_eth_dev *dev);
 
+int alloc_and_post_rx_wqe(struct mana_rxq *rxq);
+
 struct mana_mr_cache *find_pmd_mr(struct mana_mr_btree *local_tree,
  struct mana_priv *priv,
  struct rte_mbuf *mbuf);
diff --git a/drivers/net/mana/meson.build b/drivers/net/mana/meson.build
index 34bb9c6b2f..8233c04eee 100644
--- a/drivers/net/mana/meson.build
+++ b/drivers/net/mana/meson.build
@@ -11,6 +11,7 @@ deps += ['pci', 'bus_pci', 'net', 'eal', 'kvargs']
 
 sources += files(
'mana.c',
+   'rx.c',
'tx.c',
'mr.c',
'gdma.c',
diff --git a/drivers/net/mana/rx.c b/drivers/net/mana/rx.c
new file mode 100644
index 00..bcc9f308f3
--- /dev/null
+++ b/drivers/net/mana/rx.c
@@ -0,0 +1,369 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2022 Microsoft Corporation
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#include "mana.h"
+
+static uint8_t mana_rss_hash_key_default[TOEPLITZ_HASH_KEY_SIZE_IN_BYTES] = {
+   0x2c, 0xc6, 0x81, 0xd1,
+   0x5b, 0xdb, 0xf4, 0xf7,
+   0xfc, 0xa2, 0x83, 0x19,
+   0xdb, 0x1a, 0x3e, 0x94,
+   0x6b, 0x9e, 0x38, 0xd9,
+   0x2c, 0x9c, 0x03, 0xd1,
+   0xad, 0x99, 0x44, 0xa7,
+   0xd9, 0x56, 0x3d, 0x59,
+   0x06, 0x3c, 0x25, 0xf3,
+   0xfc, 0x1f, 0xdc, 0x2a,
+};
+
+int rq_ring_doorbell(struct mana_rxq *rxq)
+{
+   struct mana_priv *priv = rxq->priv;
+   int ret;
+   void *db_page = priv->db_page;
+
+   if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
+   struct rte_eth_dev *dev =
+   &rte_eth_devices[priv->dev_data->port_id];
+   struct mana_process_priv *process_priv = dev->process_private;
+
+   db_page = process_priv->db_page;
+   }
+
+   ret = mana_ring_doorbell(db_page, gdma_queue_receive,
+rxq->gdma_rq.id,
+rxq->gdma_rq.head *
+   GDMA_WQE_ALIGNMENT_UNIT_SIZE);
+
+   if (ret)
+   DRV_LOG(ERR, "failed to ring RX doorbell ret %d", ret);
+
+   return ret;
+}
+
+int alloc_and_post_rx_wqe(struct mana_rxq *rxq)
+{
+   struct rte_mbuf *mbuf = NULL;
+   struct gdma_sgl_element sgl[1];
+   struct gdma_work_request request = {0};
+   struct gdma_posted_wqe_info wqe_info = {0};
+   struct mana_priv *priv = rxq->priv;
+   int ret;
+   struct mana_mr_cache *mr;
+
+   mbuf = rte_pktmbuf_alloc(rxq->mp);
+   if (!mbuf) {
+   rxq->stats.nombuf++;
+   return -ENOMEM;
+   }
+
+   mr = find_pmd_mr(&rxq->mr_btree, priv, mbuf);
+   if (!mr) {
+   DRV_LOG(ERR, "failed to register RX MR");
+   rte_pktmbuf_free(mbuf);
+   return -ENOMEM;
+   }
+
+   request.gdma_header.struct_size = sizeof(request);
+   wqe_info.gdma_header.struct_size = sizeof(wqe_info);
+
+   sgl[0].address = rte_cpu_to_le_64(rte_pktmbuf_mtod(mbuf, uint64_t));
+   sgl[0].memory_key = mr->lkey;
+   sgl[0].size =
+   rte_pktmbuf_data_room_size(rxq->mp) -
+   RTE_PKTMBUF_HEADROOM;
+
+   request.sgl = sgl;
+   request.num_sgl_elements = 1;
+   request.inline_oob_data = NULL;
+   request.inline_oob_size_in_bytes = 0;
+   request.flags = 0;
+   request.client_data_unit = NOT_USING_CLIENT_DATA_UNIT;
+
+   ret = gdma_post_

[PATCH 14/17] net/mana: add function to receive packets

2022-07-01 Thread longli
From: Long Li 

With all the RX queues created, MANA can use those queues to receive packets.

Signed-off-by: Long Li 
---
 doc/guides/nics/features/mana.ini |   2 +
 drivers/net/mana/mana.c   |   2 +
 drivers/net/mana/mana.h   |  37 +++
 drivers/net/mana/mp.c |   2 +
 drivers/net/mana/rx.c | 104 ++
 5 files changed, 147 insertions(+)

diff --git a/doc/guides/nics/features/mana.ini 
b/doc/guides/nics/features/mana.ini
index 7546c99ea3..b47860554d 100644
--- a/doc/guides/nics/features/mana.ini
+++ b/doc/guides/nics/features/mana.ini
@@ -6,6 +6,8 @@
 [Features]
 Link status  = P
 Linux= Y
+L3 checksum offload  = Y
+L4 checksum offload  = Y
 Multiprocess aware   = Y
 Queue start/stop = Y
 Removal event= Y
diff --git a/drivers/net/mana/mana.c b/drivers/net/mana/mana.c
index 6c8983cd6a..6d8a0512c1 100644
--- a/drivers/net/mana/mana.c
+++ b/drivers/net/mana/mana.c
@@ -987,6 +987,8 @@ static int mana_pci_probe_mac(struct rte_pci_driver 
*pci_drv __rte_unused,
	/* fd is not used after mapping doorbell */
close(fd);
 
+   eth_dev->rx_pkt_burst = mana_rx_burst;
+
rte_spinlock_lock(&mana_shared_data->lock);
mana_shared_data->secondary_cnt++;
mana_local_data.secondary_cnt++;
diff --git a/drivers/net/mana/mana.h b/drivers/net/mana/mana.h
index 5052ec9061..626abc431a 100644
--- a/drivers/net/mana/mana.h
+++ b/drivers/net/mana/mana.h
@@ -178,6 +178,11 @@ struct gdma_work_request {
 
 enum mana_cqe_type {
CQE_INVALID = 0,
+
+   CQE_RX_OKAY = 1,
+   CQE_RX_COALESCED_4  = 2,
+   CQE_RX_OBJECT_FENCE = 3,
+   CQE_RX_TRUNCATED= 4,
 };
 
 struct mana_cqe_header {
@@ -203,6 +208,35 @@ struct mana_cqe_header {
(NDIS_HASH_TCP_IPV4 | NDIS_HASH_UDP_IPV4 | NDIS_HASH_TCP_IPV6 |  \
 NDIS_HASH_UDP_IPV6 | NDIS_HASH_TCP_IPV6_EX | NDIS_HASH_UDP_IPV6_EX)
 
+struct mana_rx_comp_per_packet_info {
+   uint32_t packet_length  : 16;
+   uint32_t reserved0  : 16;
+   uint32_t reserved1;
+   uint32_t packet_hash;
+}; /* HW DATA */
+#define RX_COM_OOB_NUM_PACKETINFO_SEGMENTS 4
+
+struct mana_rx_comp_oob {
+   struct mana_cqe_header cqe_hdr;
+
+   uint32_t rx_vlan_id : 12;
+   uint32_t rx_vlan_tag_present: 1;
+   uint32_t rx_outer_ip_header_checksum_succeeded  : 1;
+   uint32_t rx_outer_ip_header_checksum_failed : 1;
+   uint32_t reserved   : 1;
+   uint32_t rx_hash_type   : 9;
+   uint32_t rx_ip_header_checksum_succeeded: 1;
+   uint32_t rx_ip_header_checksum_failed   : 1;
+   uint32_t rx_tcp_checksum_succeeded  : 1;
+   uint32_t rx_tcp_checksum_failed : 1;
+   uint32_t rx_udp_checksum_succeeded  : 1;
+   uint32_t rx_udp_checksum_failed : 1;
+   uint32_t reserved1  : 1;
+   struct mana_rx_comp_per_packet_info
+   packet_info[RX_COM_OOB_NUM_PACKETINFO_SEGMENTS];
+   uint32_t received_wqe_offset;
+}; /* HW DATA */
+
 struct gdma_wqe_dma_oob {
uint32_t reserved:24;
uint32_t last_Vbytes:8;
@@ -371,6 +405,9 @@ int gdma_post_work_request(struct mana_gdma_queue *queue,
   struct gdma_posted_wqe_info *wqe_info);
 uint8_t *gdma_get_wqe_pointer(struct mana_gdma_queue *queue);
 
+uint16_t mana_rx_burst(void *dpdk_rxq, struct rte_mbuf **rx_pkts,
+  uint16_t pkts_n);
+
 uint16_t mana_rx_burst_removed(void *dpdk_rxq, struct rte_mbuf **pkts,
   uint16_t pkts_n);
 
diff --git a/drivers/net/mana/mp.c b/drivers/net/mana/mp.c
index 9cb3c09d32..a4c612c1a3 100644
--- a/drivers/net/mana/mp.c
+++ b/drivers/net/mana/mp.c
@@ -160,6 +160,8 @@ static int mana_mp_secondary_handle(const struct rte_mp_msg 
*mp_msg,
case MANA_MP_REQ_START_RXTX:
DRV_LOG(INFO, "Port %u starting datapath", dev->data->port_id);
 
+   dev->rx_pkt_burst = mana_rx_burst;
+
rte_mb();
 
res->result = 0;
diff --git a/drivers/net/mana/rx.c b/drivers/net/mana/rx.c
index bcc9f308f3..4e43299144 100644
--- a/drivers/net/mana/rx.c
+++ b/drivers/net/mana/rx.c
@@ -367,3 +367,107 @@ int start_rx_queues(struct rte_eth_dev *dev)
stop_rx_queues(dev);
return ret;
 }
+
+uint16_t mana_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
+{
+   uint16_t pkt_received = 0, cqe_processed = 0;
+   struct mana_rxq *rxq = dpdk_rxq;
+   struct mana_priv *priv = rxq->priv;
+   struct gdma_comp comp;
+   struct rte_m

[PATCH 15/17] net/mana: add function to send packets

2022-07-01 Thread longli
From: Long Li 

With all the TX queues created, MANA can send packets over those queues.

Signed-off-by: Long Li 
---
 doc/guides/nics/features/mana.ini |   1 +
 drivers/net/mana/mana.c   |   1 +
 drivers/net/mana/mana.h   |  65 
 drivers/net/mana/mp.c |   1 +
 drivers/net/mana/tx.c | 240 ++
 5 files changed, 308 insertions(+)

diff --git a/doc/guides/nics/features/mana.ini 
b/doc/guides/nics/features/mana.ini
index b47860554d..bd50fe81d6 100644
--- a/doc/guides/nics/features/mana.ini
+++ b/doc/guides/nics/features/mana.ini
@@ -4,6 +4,7 @@
 ; Refer to default.ini for the full list of available PMD features.
 ;
 [Features]
+Free Tx mbuf on demand = Y
 Link status  = P
 Linux= Y
 L3 checksum offload  = Y
diff --git a/drivers/net/mana/mana.c b/drivers/net/mana/mana.c
index 6d8a0512c1..0ffa2882e0 100644
--- a/drivers/net/mana/mana.c
+++ b/drivers/net/mana/mana.c
@@ -987,6 +987,7 @@ static int mana_pci_probe_mac(struct rte_pci_driver 
*pci_drv __rte_unused,
	/* fd is not used after mapping doorbell */
close(fd);
 
+   eth_dev->tx_pkt_burst = mana_tx_burst;
eth_dev->rx_pkt_burst = mana_rx_burst;
 
rte_spinlock_lock(&mana_shared_data->lock);
diff --git a/drivers/net/mana/mana.h b/drivers/net/mana/mana.h
index 626abc431a..2a74e54007 100644
--- a/drivers/net/mana/mana.h
+++ b/drivers/net/mana/mana.h
@@ -62,6 +62,47 @@ struct mana_shared_data {
 
 #define NOT_USING_CLIENT_DATA_UNIT 0
 
+enum tx_packet_format_v2 {
+   short_packet_format = 0,
+   long_packet_format = 1
+};
+
+struct transmit_short_oob_v2 {
+   enum tx_packet_format_v2 packet_format : 2;
+   uint32_t tx_is_outer_IPv4 : 1;
+   uint32_t tx_is_outer_IPv6 : 1;
+   uint32_t tx_compute_IP_header_checksum : 1;
+   uint32_t tx_compute_TCP_checksum : 1;
+   uint32_t tx_compute_UDP_checksum : 1;
+   uint32_t suppress_tx_CQE_generation : 1;
+   uint32_t VCQ_number : 24;
+   uint32_t tx_transport_header_offset : 10;
+   uint32_t VSQ_frame_num : 14;
+   uint32_t short_vport_offset : 8;
+};
+
+struct transmit_long_oob_v2 {
+   uint32_t TxIsEncapsulatedPacket : 1;
+   uint32_t TxInnerIsIPv6 : 1;
+   uint32_t TxInnerTcpOptionsPresent : 1;
+   uint32_t InjectVlanPriorTag : 1;
+   uint32_t Reserved1 : 12;
+   uint32_t PriorityCodePoint : 3;
+   uint32_t DropEligibleIndicator : 1;
+   uint32_t VlanIdentifier : 12;
+   uint32_t TxInnerFrameOffset : 10;
+   uint32_t TxInnerIpHeaderRelativeOffset : 6;
+   uint32_t LongVportOffset : 12;
+   uint32_t Reserved3 : 4;
+   uint32_t Reserved4 : 32;
+   uint32_t Reserved5 : 32;
+};
+
+struct transmit_oob_v2 {
+   struct transmit_short_oob_v2 short_oob;
+   struct transmit_long_oob_v2 long_oob;
+};
+
 enum gdma_queue_types {
gdma_queue_type_invalid = 0,
gdma_queue_send,
@@ -183,6 +224,17 @@ enum mana_cqe_type {
CQE_RX_COALESCED_4  = 2,
CQE_RX_OBJECT_FENCE = 3,
CQE_RX_TRUNCATED= 4,
+
+   CQE_TX_OKAY = 32,
+   CQE_TX_SA_DROP  = 33,
+   CQE_TX_MTU_DROP = 34,
+   CQE_TX_INVALID_OOB  = 35,
+   CQE_TX_INVALID_ETH_TYPE = 36,
+   CQE_TX_HDR_PROCESSING_ERROR = 37,
+   CQE_TX_VF_DISABLED  = 38,
+   CQE_TX_VPORT_IDX_OUT_OF_RANGE   = 39,
+   CQE_TX_VPORT_DISABLED   = 40,
+   CQE_TX_VLAN_TAGGING_VIOLATION   = 41,
 };
 
 struct mana_cqe_header {
@@ -191,6 +243,17 @@ struct mana_cqe_header {
uint32_t vendor_err  : 24;
 }; /* HW DATA */
 
+struct mana_tx_comp_oob {
+   struct mana_cqe_header cqe_hdr;
+
+   uint32_t tx_data_offset;
+
+   uint32_t tx_sgl_offset   : 5;
+   uint32_t tx_wqe_offset   : 27;
+
+   uint32_t reserved[12];
+}; /* HW DATA */
+
 /* NDIS HASH Types */
 #define BIT(nr)(1 << (nr))
 #define NDIS_HASH_IPV4  BIT(0)
@@ -407,6 +470,8 @@ uint8_t *gdma_get_wqe_pointer(struct mana_gdma_queue 
*queue);
 
 uint16_t mana_rx_burst(void *dpdk_rxq, struct rte_mbuf **rx_pkts,
   uint16_t pkts_n);
+uint16_t mana_tx_burst(void *dpdk_txq, struct rte_mbuf **tx_pkts,
+  uint16_t pkts_n);
 
 uint16_t mana_rx_burst_removed(void *dpdk_rxq, struct rte_mbuf **pkts,
   uint16_t pkts_n);
diff --git a/drivers/net/mana/mp.c b/drivers/net/mana/mp.c
index a4c612c1a3..ea9014db8d 100644
--- a/drivers/net/mana/mp.c
+++ b/drivers/net/mana/mp.c
@@ -160,6 +160,7 @@ static int mana_mp_secondary_handle(const struct rte_mp_msg 
*mp_msg,
case MANA_MP_REQ_START_RXTX:
DRV_LOG(INFO, "Port %u starting datapath", dev->data->port_id);
 
+  

[PATCH 16/17] net/mana: add function to start/stop device

2022-07-01 Thread longli
From: Long Li 

Add support for starting/stopping the device.

Signed-off-by: Long Li 
---
 drivers/net/mana/mana.c | 70 +
 1 file changed, 70 insertions(+)

diff --git a/drivers/net/mana/mana.c b/drivers/net/mana/mana.c
index 0ffa2882e0..b919d86500 100644
--- a/drivers/net/mana/mana.c
+++ b/drivers/net/mana/mana.c
@@ -126,6 +126,74 @@ static int mana_dev_configure(struct rte_eth_dev *dev)
 
 static int mana_intr_uninstall(struct mana_priv *priv);
 
+static int
+mana_dev_start(struct rte_eth_dev *dev)
+{
+   int ret;
+   struct mana_priv *priv = dev->data->dev_private;
+
+   rte_rwlock_init(&priv->mr_list_lock);
+   ret = mana_mr_btree_init(&priv->mr_btree, MANA_MR_BTREE_CACHE_N,
+dev->device->numa_node);
+   if (ret) {
+   DRV_LOG(ERR, "Failed to init device MR btree %d", ret);
+   return ret;
+   }
+
+   ret = start_tx_queues(dev);
+   if (ret) {
+   DRV_LOG(ERR, "failed to start tx queues %d", ret);
+   return ret;
+   }
+
+   ret = start_rx_queues(dev);
+   if (ret) {
+   DRV_LOG(ERR, "failed to start rx queues %d", ret);
+   stop_tx_queues(dev);
+   return ret;
+   }
+
+   rte_wmb();
+
+   dev->tx_pkt_burst = mana_tx_burst;
+   dev->rx_pkt_burst = mana_rx_burst;
+
+   DRV_LOG(INFO, "TX/RX queues have started");
+
+   /* Enable datapath for secondary processes */
+   mana_mp_req_on_rxtx(dev, MANA_MP_REQ_START_RXTX);
+
+   return 0;
+}
+
+static int
+mana_dev_stop(struct rte_eth_dev *dev __rte_unused)
+{
+   int ret;
+
+   dev->tx_pkt_burst = mana_tx_burst_removed;
+   dev->rx_pkt_burst = mana_rx_burst_removed;
+
+   /* Stop datapath on secondary processes */
+   mana_mp_req_on_rxtx(dev, MANA_MP_REQ_STOP_RXTX);
+
+   rte_wmb();
+
+   ret = stop_tx_queues(dev);
+   if (ret) {
+   DRV_LOG(ERR, "failed to stop tx queues");
+   return ret;
+   }
+
+   ret = stop_rx_queues(dev);
+   if (ret) {
+   DRV_LOG(ERR, "failed to stop rx queues");
+   return ret;
+   }
+
+   return 0;
+}
+
 static int
 mana_dev_close(struct rte_eth_dev *dev)
 {
@@ -464,6 +532,8 @@ static int mana_dev_link_update(struct rte_eth_dev *dev,
 
 const struct eth_dev_ops mana_dev_ops = {
.dev_configure  = mana_dev_configure,
+   .dev_start  = mana_dev_start,
+   .dev_stop   = mana_dev_stop,
.dev_close  = mana_dev_close,
.dev_infos_get  = mana_dev_info_get,
.txq_info_get   = mana_dev_tx_queue_info,
-- 
2.17.1



[PATCH 17/17] net/mana: add function to report queue stats

2022-07-01 Thread longli
From: Long Li 

Report packet statistics.

Signed-off-by: Long Li 
---
 doc/guides/nics/features/mana.ini |  2 +
 drivers/net/mana/mana.c   | 77 +++
 2 files changed, 79 insertions(+)

diff --git a/doc/guides/nics/features/mana.ini 
b/doc/guides/nics/features/mana.ini
index bd50fe81d6..a77d6f2249 100644
--- a/doc/guides/nics/features/mana.ini
+++ b/doc/guides/nics/features/mana.ini
@@ -4,6 +4,7 @@
 ; Refer to default.ini for the full list of available PMD features.
 ;
 [Features]
+Basic stats  = Y
 Free Tx mbuf on demand = Y
 Link status  = P
 Linux= Y
@@ -14,5 +15,6 @@ Queue start/stop = Y
 Removal event= Y
 RSS hash = Y
 Speed capabilities   = P
+Stats per queue  = Y
 Usage doc= Y
 x86-64   = Y
diff --git a/drivers/net/mana/mana.c b/drivers/net/mana/mana.c
index b919d86500..b514a4cfef 100644
--- a/drivers/net/mana/mana.c
+++ b/drivers/net/mana/mana.c
@@ -530,6 +530,79 @@ static int mana_dev_link_update(struct rte_eth_dev *dev,
return rte_eth_linkstatus_set(dev, &link);
 }
 
+static int mana_dev_stats_get(struct rte_eth_dev *dev,
+ struct rte_eth_stats *stats)
+{
+   unsigned int i;
+
+   for (i = 0; i < dev->data->nb_tx_queues; i++) {
+   struct mana_txq *txq = dev->data->tx_queues[i];
+
+   if (!txq)
+   continue;
+
+   stats->opackets += txq->stats.packets;
+   stats->obytes += txq->stats.bytes;
+   stats->oerrors += txq->stats.errors;
+
+   if (i < RTE_ETHDEV_QUEUE_STAT_CNTRS) {
+   stats->q_opackets[i] = txq->stats.packets;
+   stats->q_obytes[i] = txq->stats.bytes;
+   }
+   }
+
+   stats->rx_nombuf = 0;
+   for (i = 0; i < dev->data->nb_rx_queues; i++) {
+   struct mana_rxq *rxq = dev->data->rx_queues[i];
+
+   if (!rxq)
+   continue;
+
+   stats->ipackets += rxq->stats.packets;
+   stats->ibytes += rxq->stats.bytes;
+   stats->ierrors += rxq->stats.errors;
+
+   /* There is no good way to get stats->imissed, not setting it */
+
+   if (i < RTE_ETHDEV_QUEUE_STAT_CNTRS) {
+   stats->q_ipackets[i] = rxq->stats.packets;
+   stats->q_ibytes[i] = rxq->stats.bytes;
+   }
+
+   stats->rx_nombuf += rxq->stats.nombuf;
+   }
+
+   return 0;
+}
+
+static int
+mana_dev_stats_reset(struct rte_eth_dev *dev __rte_unused)
+{
+   unsigned int i;
+
+   PMD_INIT_FUNC_TRACE();
+
+   for (i = 0; i < dev->data->nb_tx_queues; i++) {
+   struct mana_txq *txq = dev->data->tx_queues[i];
+
+   if (!txq)
+   continue;
+
+   memset(&txq->stats, 0, sizeof(txq->stats));
+   }
+
+   for (i = 0; i < dev->data->nb_rx_queues; i++) {
+   struct mana_rxq *rxq = dev->data->rx_queues[i];
+
+   if (!rxq)
+   continue;
+
+   memset(&rxq->stats, 0, sizeof(rxq->stats));
+   }
+
+   return 0;
+}
+
 const struct eth_dev_ops mana_dev_ops = {
.dev_configure  = mana_dev_configure,
.dev_start  = mana_dev_start,
@@ -546,9 +619,13 @@ const struct eth_dev_ops mana_dev_ops = {
.rx_queue_setup = mana_dev_rx_queue_setup,
.rx_queue_release   = mana_dev_rx_queue_release,
.link_update= mana_dev_link_update,
+   .stats_get  = mana_dev_stats_get,
+   .stats_reset= mana_dev_stats_reset,
 };
 
 const struct eth_dev_ops mana_dev_sec_ops = {
+   .stats_get = mana_dev_stats_get,
+   .stats_reset = mana_dev_stats_reset,
.dev_infos_get = mana_dev_info_get,
 };
 
-- 
2.17.1



RE: [PATCH] app/testpmd: fix secondary process cannot dump packet

2022-07-01 Thread Zhang, Yuying
Hi,

> -Original Message-
> From: Zhang, Peng1X 
> Sent: Friday, June 24, 2022 2:15 AM
> To: dev@dpdk.org
> Cc: Singh, Aman Deep ; Zhang, Yuying
> ; Zhang, Peng1X ;
> sta...@dpdk.org
> Subject: [PATCH] app/testpmd: fix secondary process cannot dump packet
> 
> From: Peng Zhang 
> 
> In the original design, whether testpmd runs as a primary process or not,
> packets are not dumped for show if the receive queue state is stopped.
> For a secondary process, the receive queues are never set up, so their
> state stays stopped even after testpmd is started, and a started secondary
> process cannot dump packets for show.
> 
> With this change, a secondary process sets its queue states to started
> once testpmd is started, so its packets can then be dumped for show.
> 
> Fixes: a550baf24af9 ("app/testpmd: support multi-process")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Peng Zhang 

Acked-by: Yuying Zhang 

> ---
>  app/test-pmd/testpmd.c | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
> 205d98ee3d..93ba7e7c9b 100644
> --- a/app/test-pmd/testpmd.c
> +++ b/app/test-pmd/testpmd.c
> @@ -3007,6 +3007,18 @@ start_port(portid_t pid)
>   if (setup_hairpin_queues(pi, p_pi, cnt_pi) != 0)
>   return -1;
>   }
> +
> + if (port->need_reconfig_queues > 0 && !is_proc_primary()) {
> + struct rte_eth_rxconf *rx_conf;
> + for (qi = 0; qi < nb_rxq; qi++) {
> + rx_conf = &(port->rxq[qi].conf);
> + ports[pi].rxq[qi].state =
> + rx_conf->rx_deferred_start ?
> + RTE_ETH_QUEUE_STATE_STOPPED :
> + RTE_ETH_QUEUE_STATE_STARTED;
> + }
> + }
> +
>   configure_rxtx_dump_callbacks(verbose_level);
>   if (clear_ptypes) {
>   diag = rte_eth_dev_set_ptypes(pi,
> RTE_PTYPE_UNKNOWN,
> --
> 2.25.1



RE: [PATCH v2] raw/ntb: add PPD status check for SPR

2022-07-01 Thread Wu, Jingjing



> -Original Message-
> From: Guo, Junfeng 
> Sent: Thursday, June 30, 2022 4:56 PM
> To: Wu, Jingjing 
> Cc: dev@dpdk.org; Guo, Junfeng 
> Subject: [PATCH v2] raw/ntb: add PPD status check for SPR
> 
> Add PPD (PCIe Port Definition) status check for SPR (Sapphire Rapids).
> 
> Note that NTB on SPR has the same device id with that on ICX, while
> the field offsets of PPD Control Register are different. Here, we use
> the PCI device revision id to distinguish the HW platform (ICX/SPR)
> and check the Port Config Status and Port Definition accordingly.
> 
> +---+++
> |  Fields   | Bit Range (on ICX) | Bit Range (on SPR) |
> +---+++
> | Port Configuration Status | 12 | 14 |
> | Port Definition   | 9:8| 10:8   |
> +---+++
> 
> v2:
> fix revision id value check logic.
> 
> Signed-off-by: Junfeng Guo 
Acked-by: Jingjing Wu 


Re: [PATCH] mbuf: add mbuf physical address field to dynamic field

2022-07-01 Thread Olivier Matz
Hi,

On Thu, Jun 30, 2022 at 05:55:21PM +0100, Bruce Richardson wrote:
> On Thu, Jun 30, 2022 at 09:55:16PM +0530, Shijith Thotton wrote:
> > If all devices are configured to run in IOVA mode as VA, physical
> > address field of mbuf (buf_iova) won't be used. In such cases, buf_iova
> > space is free to use as a dynamic field. So a new dynamic field member
> > (dynfield2) is added in mbuf structure to make use of that space.
> > 
> > A new mbuf flag RTE_MBUF_F_DYNFIELD2 is introduced to help identify the
> > mbuf that can use dynfield2.
> > 
> > Signed-off-by: Shijith Thotton 
> > ---
> I disagree with this patch. The mbuf should always record the iova of the
> buffer directly, rather than forcing the drivers to query the EAL mode.
> This will likely also break all vector drivers right now, as they are
> sensitive to the mbuf layout and the position of the IOVA address in the
> buffer.

I have the same opinion as Stephen and Bruce. This field is widely used
in DPDK; I don't think it is a good idea to disable it when certain
conditions are met.


RE: [PATCH] vhost: fix sync dequeue offload

2022-07-01 Thread Xia, Chenbo
> -Original Message-
> From: Ding, Xuan 
> Sent: Friday, June 24, 2022 1:38 PM
> To: maxime.coque...@redhat.com; Xia, Chenbo 
> Cc: dev@dpdk.org; Hu, Jiayu ; Ling, WeiX
> ; Ding, Xuan 
> Subject: [PATCH] vhost: fix sync dequeue offload
> 
> From: Xuan Ding 
> 
> This patch fixes the missing virtio net header copy in sync
> dequeue path caused by refactoring, which affects dequeue
> offloading.
> 
> Fixes: 6d823bb302c7("vhost: prepare sync for descriptor to mbuf
> refactoring")
> 
> Signed-off-by: Xuan Ding 
> ---
>  lib/vhost/virtio_net.c | 14 +++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
> index 68a26eb17d..d5a9f7c691 100644
> --- a/lib/vhost/virtio_net.c
> +++ b/lib/vhost/virtio_net.c
> @@ -2635,9 +2635,17 @@ desc_to_mbuf(struct virtio_net *dev, struct
> vhost_virtqueue *vq,
>  buf_iova + buf_offset, cpy_len, 
> false) <
> 0)
>   goto error;
>   } else {
> - sync_fill_seg(dev, vq, cur, mbuf_offset,
> -   buf_addr + buf_offset,
> -   buf_iova + buf_offset, cpy_len, false);
> + if (hdr && cur == m) {
> + rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *,
> mbuf_offset),
> + (void *)((uintptr_t)(buf_addr + 
> buf_offset)),
> + cpy_len);
> + vhost_log_cache_write_iova(dev, vq, buf_iova +
> buf_offset, cpy_len);
> + PRINT_PACKET(dev, (uintptr_t)(buf_addr +
> buf_offset), cpy_len, 0);

Although the above logic could also be folded into sync_fill_seg, that
would require adding a new dequeue-specific parameter, so I think this
patch is fine as is.

During review of this patch, I also noticed a bug that writes the dirty
page log when doing dequeue. But it's not related to this patch. So:
Reviewed-by: Chenbo Xia  

> + } else {
> + sync_fill_seg(dev, vq, cur, mbuf_offset,
> + buf_addr + buf_offset,
> + buf_iova + buf_offset, cpy_len, false);
> + }
>   }
> 
>   mbuf_avail  -= cpy_len;
> --
> 2.17.1



RE: [PATCH] net/virtio: fix socket nonblocking mode affects initialization

2022-07-01 Thread Xia, Chenbo
> -Original Message-
> From: Wang, YuanX 
> Sent: Friday, June 17, 2022 10:42 AM
> To: maxime.coque...@redhat.com; Xia, Chenbo ;
> dev@dpdk.org
> Cc: Hu, Jiayu ; He, Xingguang ;
> Wang, YuanX ; sta...@dpdk.org
> Subject: [PATCH] net/virtio: fix socket nonblocking mode affects
> initialization
> 
> The virtio-user initialization requires unix socket to receive backend
> messages in block mode. However, vhost_user_update_link_state() sets
> the same socket to nonblocking via fcntl, which affects all threads.
> Enabling the rxq interrupt can causes both of these behaviors to occur
> concurrently, with the result that the initialization may fail
> because no messages are received in nonblocking socket.
> 
> Thread 1:
> virtio_init_device()
> --> virtio_user_start_device()
>   --> vhost_user_set_memory_table()
>   --> vhost_user_check_reply_ack()
> 
> Thread 2:
> virtio_interrupt_handler()
> --> vhost_user_update_link_state()
> 
> Fix that by replacing O_NONBLOCK with the recv per-call option
> MSG_DONTWAIT.
> 
> Fixes: ef53b6030039 ("net/virtio-user: support LSC")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Yuan Wang 
> ---
>  drivers/net/virtio/virtio_user/vhost_user.c | 15 +--
>  1 file changed, 1 insertion(+), 14 deletions(-)
> 
> diff --git a/drivers/net/virtio/virtio_user/vhost_user.c
> b/drivers/net/virtio/virtio_user/vhost_user.c
> index 7d1749114d..198bd63d3c 100644
> --- a/drivers/net/virtio/virtio_user/vhost_user.c
> +++ b/drivers/net/virtio/virtio_user/vhost_user.c
> @@ -940,15 +940,8 @@ vhost_user_update_link_state(struct virtio_user_dev
> *dev)
> 
>   if (data->vhostfd >= 0) {
>   int r;
> - int flags;
> 
> - flags = fcntl(data->vhostfd, F_GETFL);
> - if (fcntl(data->vhostfd, F_SETFL, flags | O_NONBLOCK) == -1) {
> - PMD_DRV_LOG(ERR, "error setting O_NONBLOCK flag");
> - return -1;
> - }
> -
> - r = recv(data->vhostfd, buf, 128, MSG_PEEK);
> + r = recv(data->vhostfd, buf, 128, MSG_PEEK | MSG_DONTWAIT);
>   if (r == 0 || (r < 0 && errno != EAGAIN)) {
>   dev->net_status &= (~VIRTIO_NET_S_LINK_UP);
>   PMD_DRV_LOG(ERR, "virtio-user port %u is down", dev-
> >hw.port_id);
> @@ -963,12 +956,6 @@ vhost_user_update_link_state(struct virtio_user_dev
> *dev)
>   } else {
>   dev->net_status |= VIRTIO_NET_S_LINK_UP;
>   }
> -
> - if (fcntl(data->vhostfd, F_SETFL,
> - flags & ~O_NONBLOCK) == -1) {
> - PMD_DRV_LOG(ERR, "error clearing O_NONBLOCK flag");
> - return -1;
> - }
>   } else if (dev->is_server) {
>   dev->net_status &= (~VIRTIO_NET_S_LINK_UP);
>   if (virtio_user_dev_server_reconnect(dev) >= 0)
> --
> 2.25.1

Reviewed-by: Chenbo Xia 


[PATCH v2 0/2] add a fast path for memif Rx/Tx

2022-07-01 Thread Joyce Kong
For memif non-zero-copy mode, the copy loop contains a branch that
compares the mbuf and memif buffer sizes on every copy. Add a fast
memory copy path that removes this branch, with the mbuf and memif
buffer sizes defined at compile time. For the Tx fast path, also
bulk-free the mbufs when they come from the same mempool.

When mbuf == memif buffer size, both Rx/Tx would choose
the fast memcpy path. When mbuf < memif buffer size, the
Rx chooses previous memcpy path while Tx chooses fast
memcpy path. When mbuf > memif buffer size, the Rx chooses
fast memcpy path while Tx chooses previous memcpy path.

Test with 1p1q on Ampere Altra AArch64 server,
-
  buf size  | memif = mbuf | memif < mbuf | memif > mbuf
-
non-zc gain |16.95%| 3.28%|13.29%
-
   zc gain  |19.43%| 4.62%|18.14%
-

Test with 1p1q on Cascade Lake Xeon X86server,
-
  buf size  | memif = mbuf | memif < mbuf | memif > mbuf
-
non-zc gain |19.97%| 2.35%|21.43%
-
   zc gain  |14.30%|-1.21%|11.98%
-

v2:
 Rebase v1 and update commit message.

Joyce Kong (2):
  net/memif: add a Rx fast path
  net/memif: add a Tx fast path

 drivers/net/memif/rte_eth_memif.c | 257 --
 1 file changed, 175 insertions(+), 82 deletions(-)

-- 
2.25.1



[PATCH v2 1/2] net/memif: add a Rx fast path

2022-07-01 Thread Joyce Kong
For memif non-zero-copy mode, there is a branch to compare
the mbuf and memif buffer size during memory copying. Add
a fast memory copy path by removing this branch with mbuf
and memif buffer size defined at compile time. The removal
of the branch leads to considerable performance uplift.
The Rx fast path would not change mbuf's behavior of storing
memif buf.

When memif <= buffer size, Rx chooses the fast memcpy path,
otherwise it would choose the original path.

Test with 1p1q on Ampere Altra AArch64 server,
--
|  buf size   | memif <= mbuf | memif > mbuf |
--
| non-zc gain | 4.30% |-0.52%|
--
|   zc gain   | 2.46% | 0.70%|
--

Test with 1p1q on Cascade Lake Xeon X86server,
--
|   buf size  | memif <= mbuf | memif > mbuf |
--
| non-zc gain | 2.13% |-1.40%|
--
|   zc gain   | 0.18% | 0.48%|
--

Signed-off-by: Joyce Kong 
Reviewed-by: Ruifeng Wang 
Acked-by: Morten Brørup 
---
 drivers/net/memif/rte_eth_memif.c | 123 --
 1 file changed, 83 insertions(+), 40 deletions(-)

diff --git a/drivers/net/memif/rte_eth_memif.c 
b/drivers/net/memif/rte_eth_memif.c
index dd951b8296..24fc8b13fa 100644
--- a/drivers/net/memif/rte_eth_memif.c
+++ b/drivers/net/memif/rte_eth_memif.c
@@ -341,67 +341,111 @@ eth_memif_rx(void *queue, struct rte_mbuf **bufs, 
uint16_t nb_pkts)
if (cur_slot == last_slot)
goto refill;
n_slots = last_slot - cur_slot;
+   if (likely(mbuf_size >= pmd->cfg.pkt_buffer_size)) {
+   while (n_slots && n_rx_pkts < nb_pkts) {
+   mbuf_head = rte_pktmbuf_alloc(mq->mempool);
+   if (unlikely(mbuf_head == NULL))
+   goto no_free_bufs;
+   mbuf = mbuf_head;
+
+next_slot1:
+   mbuf->port = mq->in_port;
+   s0 = cur_slot & mask;
+   d0 = &ring->desc[s0];
 
-   while (n_slots && n_rx_pkts < nb_pkts) {
-   mbuf_head = rte_pktmbuf_alloc(mq->mempool);
-   if (unlikely(mbuf_head == NULL))
-   goto no_free_bufs;
-   mbuf = mbuf_head;
-   mbuf->port = mq->in_port;
-   dst_off = 0;
+   cp_len = d0->length;
 
-next_slot:
-   s0 = cur_slot & mask;
-   d0 = &ring->desc[s0];
+   rte_pktmbuf_data_len(mbuf) = cp_len;
+   rte_pktmbuf_pkt_len(mbuf) = cp_len;
+   if (mbuf != mbuf_head)
+   rte_pktmbuf_pkt_len(mbuf_head) += cp_len;
 
-   src_len = d0->length;
-   src_off = 0;
+   rte_memcpy(rte_pktmbuf_mtod(mbuf, void *),
+   (uint8_t *)memif_get_buffer(proc_private, d0), 
cp_len);
 
-   do {
-   dst_len = mbuf_size - dst_off;
-   if (dst_len == 0) {
-   dst_off = 0;
-   dst_len = mbuf_size;
+   cur_slot++;
+   n_slots--;
 
-   /* store pointer to tail */
+   if (d0->flags & MEMIF_DESC_FLAG_NEXT) {
mbuf_tail = mbuf;
mbuf = rte_pktmbuf_alloc(mq->mempool);
if (unlikely(mbuf == NULL))
goto no_free_bufs;
-   mbuf->port = mq->in_port;
ret = memif_pktmbuf_chain(mbuf_head, mbuf_tail, 
mbuf);
if (unlikely(ret < 0)) {
MIF_LOG(ERR, 
"number-of-segments-overflow");
rte_pktmbuf_free(mbuf);
goto no_free_bufs;
}
+   goto next_slot1;
}
-   cp_len = RTE_MIN(dst_len, src_len);
 
-   rte_pktmbuf_data_len(mbuf) += cp_len;
-   rte_pktmbuf_pkt_len(mbuf) = rte_pktmbuf_data_len(mbuf);
-   if (mbuf != mbuf_head)
-   rte_pktmbuf_pkt_len(mbuf_head) += cp_len;
+   mq->n_bytes += rte_pktmbuf_pkt_len(mbuf_head);
+   *bufs++ = mbuf_head;
+   n_rx_pkts++;
+   }
+   } else {
+   while (n_slots && n_rx_pkts < nb_pkts) {
+ 

[PATCH v2 2/2] net/memif: add a Tx fast path

2022-07-01 Thread Joyce Kong
For memif non-zero-copy mode, there is a branch to compare
the mbuf and memif buffer size during memory copying. If all
mbufs come from the same mempool, and memif buf size >= mbuf
size, add a fast Tx memory copy path without the comparing
branch and with mbuf bulk free, otherwise still run the
original Tx path.
The Tx fast path would not change memif's behavior of storing
mbuf.

The removal of the branch and bulk free lead to considerable
performance uplift.

Test with 1p1q on Ampere Altra AArch64 server,
--
|  buf size   | memif >= mbuf | memif < mbuf |
--
| non-zc gain |13.35% |-0.77%|
--
|  zc gain|17.15% |-0.47%|
--

Test with 1p1q on Cascade Lake Xeon X86server,
--
|  buf size   | memif >= mbuf | memif < mbuf |
--
| non-zc gain |10.10% |-0.29%|
--
|   zc gain   | 8.87% |-0.99%|
--

Signed-off-by: Joyce Kong 
Reviewed-by: Ruifeng Wang 
Acked-by: Morten Brørup 
---
 drivers/net/memif/rte_eth_memif.c | 134 --
 1 file changed, 92 insertions(+), 42 deletions(-)

diff --git a/drivers/net/memif/rte_eth_memif.c 
b/drivers/net/memif/rte_eth_memif.c
index 24fc8b13fa..bafcfd5a7c 100644
--- a/drivers/net/memif/rte_eth_memif.c
+++ b/drivers/net/memif/rte_eth_memif.c
@@ -659,62 +659,112 @@ eth_memif_tx(void *queue, struct rte_mbuf **bufs, 
uint16_t nb_pkts)
n_free = __atomic_load_n(&ring->head, __ATOMIC_ACQUIRE) - slot;
}
 
-   while (n_tx_pkts < nb_pkts && n_free) {
-   mbuf_head = *bufs++;
-   nb_segs = mbuf_head->nb_segs;
-   mbuf = mbuf_head;
+   uint8_t i;
+   struct rte_mbuf **buf_tmp = bufs;
+   mbuf_head = *buf_tmp++;
+   struct rte_mempool *mp = mbuf_head->pool;
+
+   for (i = 1; i < nb_pkts; i++) {
+   mbuf_head = *buf_tmp++;
+   if (mbuf_head->pool != mp)
+   break;
+   }
+
+   uint16_t mbuf_size = rte_pktmbuf_data_room_size(mp) - 
RTE_PKTMBUF_HEADROOM;
+   if (i == nb_pkts && pmd->cfg.pkt_buffer_size >= mbuf_size) {
+   buf_tmp = bufs;
+   while (n_tx_pkts < nb_pkts && n_free) {
+   mbuf_head = *bufs++;
+   nb_segs = mbuf_head->nb_segs;
+   mbuf = mbuf_head;
 
-   saved_slot = slot;
-   d0 = &ring->desc[slot & mask];
-   dst_off = 0;
-   dst_len = (type == MEMIF_RING_C2S) ?
-   pmd->run.pkt_buffer_size : d0->length;
+   saved_slot = slot;
 
-next_in_chain:
-   src_off = 0;
-   src_len = rte_pktmbuf_data_len(mbuf);
+next_in_chain1:
+   d0 = &ring->desc[slot & mask];
+   cp_len = rte_pktmbuf_data_len(mbuf);
 
-   while (src_len) {
-   if (dst_len == 0) {
+   rte_memcpy((uint8_t *)memif_get_buffer(proc_private, 
d0),
+   rte_pktmbuf_mtod(mbuf, void *), cp_len);
+
+   d0->length = cp_len;
+   mq->n_bytes += cp_len;
+   slot++;
+   n_free--;
+
+   if (--nb_segs > 0) {
if (n_free) {
-   slot++;
-   n_free--;
d0->flags |= MEMIF_DESC_FLAG_NEXT;
-   d0 = &ring->desc[slot & mask];
-   dst_off = 0;
-   dst_len = (type == MEMIF_RING_C2S) ?
-   pmd->run.pkt_buffer_size : 
d0->length;
-   d0->flags = 0;
+   mbuf = mbuf->next;
+   goto next_in_chain1;
} else {
slot = saved_slot;
-   goto no_free_slots;
+   goto free_mbufs;
}
}
-   cp_len = RTE_MIN(dst_len, src_len);
 
-   rte_memcpy((uint8_t *)memif_get_buffer(proc_private,
-  d0) + dst_off,
-   rte_pktmbuf_mtod_offset(mbuf, void *, src_off),
-   cp_len);
+   n_tx_pkts++;
+   }
+free_mbufs:
+   rte

[PATCH] docs: change the doc to highlight the allowed multicast addresses

2022-07-01 Thread huzaifa.rahman
Bugzilla ID: 802

The ipv4_multicast example does not work with arbitrary multicast IPs.
Only packets destined to a select few IPs are forwarded. These IPs
are listed in the mcast_group_table array along with their respective
port masks. A user would not know about this behaviour, since
there is no mention of it in the docs.

Added the mcast_group_table in the docs so user would know which
IPs are allowed.

Signed-off-by: huzaifa.rahman 
---
 doc/guides/sample_app_ug/ipv4_multicast.rst | 8 +++-
 examples/ipv4_multicast/main.c  | 2 ++
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/doc/guides/sample_app_ug/ipv4_multicast.rst 
b/doc/guides/sample_app_ug/ipv4_multicast.rst
index f87f7be57e..7c49212c64 100644
--- a/doc/guides/sample_app_ug/ipv4_multicast.rst
+++ b/doc/guides/sample_app_ug/ipv4_multicast.rst
@@ -22,7 +22,13 @@ There are two key differences from the L2 Forwarding sample 
application:
 
 The lookup method is the Four-byte Key (FBK) hash-based method.
 The lookup table is composed of pairs of destination IPv4 address (the FBK)
-and a port mask associated with that IPv4 address.
+and a port mask associated with that IPv4 address. By default, the following 
IP addresses and their respective
+port masks are added:
+
+.. literalinclude:: ../../../examples/ipv4_multicast/main.c
+:language: c
+:start-after: Create the mcast group table. 8<
+:end-before: >8 End of create mcast group table.
 
 .. note::
 
diff --git a/examples/ipv4_multicast/main.c b/examples/ipv4_multicast/main.c
index bdcaa3bcd1..c086149eca 100644
--- a/examples/ipv4_multicast/main.c
+++ b/examples/ipv4_multicast/main.c
@@ -139,6 +139,7 @@ struct mcast_group_params {
uint16_t port_mask;
 };
 
+/* Create the mcast group table. 8< */
 static struct mcast_group_params mcast_group_table[] = {
{RTE_IPV4(224,0,0,101), 0x1},
{RTE_IPV4(224,0,0,102), 0x2},
@@ -156,6 +157,7 @@ static struct mcast_group_params mcast_group_table[] = {
{RTE_IPV4(224,0,0,114), 0xE},
{RTE_IPV4(224,0,0,115), 0xF},
 };
+/* >8 End of create mcast group table. */
 
 /* Send burst of packets on an output interface */
 static void
-- 
2.25.1



RE: [PATCH] app/testpmd: fix secondary process cannot dump packet

2022-07-01 Thread Zhang, Peng1X
Hi,
In fact, this patch aims to fix the issue that a secondary process cannot
dump packets after testpmd is started. The issue was introduced by commit
3c4426db54fc ("app/testpmd: do not poll stopped queues"). After a secondary
process starts, the default Rx/Tx queue state maintained by testpmd is
'RTE_ETH_QUEUE_STATE_STOPPED', so the 'fsm[sm_id]->disabled' flag is set to
true according to the queue state, and packets can be neither forwarded nor
dumped.

The reason for not using 'dev->data->rx_queue_state' is that, in the primary
process, whether a queue starts in the started or stopped state depends on
rx_conf->rx_deferred_start when testpmd starts. And once testpmd has started,
the queue state can be changed by command, for example 'port x rxq x start'.
Should we align the queue-state behavior of primary and secondary processes
after testpmd starts?

> -Original Message-
> From: lihuisong (C) 
> Sent: Wednesday, June 29, 2022 10:55 AM
> To: Andrew Rybchenko ; Zhang, Peng1X
> ; dev@dpdk.org
> Cc: Singh, Aman Deep ; Zhang, Yuying
> ; sta...@dpdk.org
> Subject: Re: [PATCH] app/testpmd: fix secondary process cannot dump packet
> 
> 
> 在 2022/6/23 20:10, Andrew Rybchenko 写道:
> > On 6/23/22 21:15, peng1x.zh...@intel.com wrote:
> >> From: Peng Zhang 
> >>
> >> The origin design is whether testpmd is primary or not, if state of
> >> receive queue is stop, then packets will not be dumped for show.
> >> While to secondary process, receive queue will not be set up, and
> >> state will still be stop even if testpmd is started. So packets of
> >> stated secondary process cannot be dumped for show.
> >>
> >> The current design is to secondary process state of queue will be set
> >> to start after testpmd is started. Then packets of started secondary
> >> process can be dumped for show.
> >>
> >> Fixes: a550baf24af9 ("app/testpmd: support multi-process")
> >> Cc: sta...@dpdk.org
> >>
> >> Signed-off-by: Peng Zhang 
> >> ---
> >>   app/test-pmd/testpmd.c | 12 
> >>   1 file changed, 12 insertions(+)
> >>
> >> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
> >> 205d98ee3d..93ba7e7c9b 100644
> >> --- a/app/test-pmd/testpmd.c
> >> +++ b/app/test-pmd/testpmd.c
> >> @@ -3007,6 +3007,18 @@ start_port(portid_t pid)
> >>   if (setup_hairpin_queues(pi, p_pi, cnt_pi) != 0)
> >>   return -1;
> >>   }
> >> +
> >> +    if (port->need_reconfig_queues > 0 && !is_proc_primary()) {
> >> +    struct rte_eth_rxconf *rx_conf;
> >> +    for (qi = 0; qi < nb_rxq; qi++) {
> >> +    rx_conf = &(port->rxq[qi].conf);
> >> +    ports[pi].rxq[qi].state =
> >> +    rx_conf->rx_deferred_start ?
> >> +    RTE_ETH_QUEUE_STATE_STOPPED :
> >> +    RTE_ETH_QUEUE_STATE_STARTED;
> >
> > I'm not sure why it is correct to assume that deferred queue is not
> > yet started.
> +1.
> 
> We should also consider whether the queue state can be changed in secondary.
> The 'rx_conf->rx_deferred_start' is the data in secondary.
> Why not use 'dev->data->rx_queue_state[]'.
> 
> In fact, the issue you memtioned was introduced the following patch:
> Fixes: 3c4426db54fc ("app/testpmd: do not poll stopped queues")
> 
> The root cause of this issue is that the default value of Rx/Tx queue state
> maintained by testpmd is 'RTE_ETH_QUEUE_STATE_STOPPED'. As a result,
> secondary doesn't start polling thread to receive packets when start packet
> forwarding. And now, secondary cannot receive and send any packets.
> 
> Could you fix it together?
> >
> >> +    }
> >> +    }
> >> +
> >>   configure_rxtx_dump_callbacks(verbose_level);
> >>   if (clear_ptypes) {
> >>   diag = rte_eth_dev_set_ptypes(pi, RTE_PTYPE_UNKNOWN,
> >
> > .


RE: [PATCH] mbuf: add mbuf physical address field to dynamic field

2022-07-01 Thread Slava Ovsiienko
Hi,

Just to note, some PMDs do not use the physical address field at all.
As an example - the mlx5 PMD (and it is far from the only one)
could take advantage of this patch. Nonetheless, I tend to agree -
for the DPDK framework as a whole it looks risky. I had similar thoughts
about removing the iova field and did not dare to propose it 😊

With best regards,
Slava

> -Original Message-
> From: Olivier Matz 
> Sent: Friday, July 1, 2022 12:49
> To: Bruce Richardson 
> Cc: Shijith Thotton ; jer...@marvell.com; NBU-
> Contact-Thomas Monjalon (EXTERNAL) ; dev@dpdk.org
> Subject: Re: [PATCH] mbuf: add mbuf physical address field to dynamic
> field
> 
> Hi,
> 
> On Thu, Jun 30, 2022 at 05:55:21PM +0100, Bruce Richardson wrote:
> > On Thu, Jun 30, 2022 at 09:55:16PM +0530, Shijith Thotton wrote:
> > > If all devices are configured to run in IOVA mode as VA, physical
> > > address field of mbuf (buf_iova) won't be used. In such cases,
> > > buf_iova space is free to use as a dynamic field. So a new dynamic
> > > field member
> > > (dynfield2) is added in mbuf structure to make use of that space.
> > >
> > > A new mbuf flag RTE_MBUF_F_DYNFIELD2 is introduced to help identify
> > > the mbuf that can use dynfield2.
> > >
> > > Signed-off-by: Shijith Thotton 
> > > ---
> > I disagree with this patch. The mbuf should always record the iova of
> > the buffer directly, rather than forcing the drivers to query the EAL
> mode.
> > This will likely also break all vector drivers right now, as they are
> > sensitive to the mbuf layout and the position of the IOVA address in
> > the buffer.
> 
> I have the same opinion than Stephen and Bruce. This field is widely
> used in DPDK, I don't think it is a good idea to disable it if some
> conditions are met.


[PATCH] common/cnxk: allow changing PTP mode on 10k platforms

2022-07-01 Thread Tomasz Duszynski
Since firmware has added support for toggling PTP mode on 10k platforms,
userspace code should allow doing that as well.

Cc: sta...@dpdk.org

Signed-off-by: Tomasz Duszynski 
Reviewed-by: Jerin Jacob Kollanukkaran 
---
 drivers/common/cnxk/roc_bphy_cgx.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/common/cnxk/roc_bphy_cgx.c 
b/drivers/common/cnxk/roc_bphy_cgx.c
index a5df104088..e966494e21 100644
--- a/drivers/common/cnxk/roc_bphy_cgx.c
+++ b/drivers/common/cnxk/roc_bphy_cgx.c
@@ -285,9 +285,6 @@ roc_bphy_cgx_ptp_rx_ena_dis(struct roc_bphy_cgx *roc_cgx, 
unsigned int lmac,
 {
uint64_t scr1, scr0;

-   if (roc_model_is_cn10k())
-   return -ENOTSUP;
-
if (!roc_cgx)
return -EINVAL;

--
2.25.1



RE: [EXT] Re: [PATCH] mbuf: add mbuf physical address field to dynamic field

2022-07-01 Thread Shijith Thotton
>
>On Thu, Jun 30, 2022 at 05:55:21PM +0100, Bruce Richardson wrote:
>> On Thu, Jun 30, 2022 at 09:55:16PM +0530, Shijith Thotton wrote:
>> > If all devices are configured to run in IOVA mode as VA, physical
>> > address field of mbuf (buf_iova) won't be used. In such cases, buf_iova
>> > space is free to use as a dynamic field. So a new dynamic field member
>> > (dynfield2) is added in mbuf structure to make use of that space.
>> >
>> > A new mbuf flag RTE_MBUF_F_DYNFIELD2 is introduced to help identify the
>> > mbuf that can use dynfield2.
>> >
>> > Signed-off-by: Shijith Thotton 
>> > ---
>> I disagree with this patch. The mbuf should always record the iova of the
>> buffer directly, rather than forcing the drivers to query the EAL mode.
>> This will likely also break all vector drivers right now, as they are
>> sensitive to the mbuf layout and the position of the IOVA address in the
>> buffer.
>
 
Hi Bruce,

The IOVA check should have been bus specific, instead of eal.  The bus IOVA mode
will be VA, only if all devices on the bus has the flag
RTE_PCI_DRV_NEED_IOVA_AS_VA. It was our thought process, but used wrong API for
the check. It should have avoided the issue which you mentioned above.

>I have the same opinion than Stephen and Bruce. This field is widely used
>in DPDK, I don't think it is a good idea to disable it if some conditions
>are met.

Hi Olivier, 

I was under the assumption, buf_iova won't be used directly by the application
(only through wrapper). So that wrappers can check ol_flags before setting
buf_iova.


Re: [PATCH] vhost: fix virtio blk vDPA live migration IO drop

2022-07-01 Thread Maxime Coquelin




On 6/22/22 09:47, Andy Pei wrote:

In the virtio blk vDPA live migration use case, before the live
migration process, QEMU will set call fd to vDPA back-end. QEMU
and vDPA back-end stand by until live migration starts.
During live migration process, QEMU sets kick fd and a new call
fd. However, after the kick fd is set to the vDPA back-end, the
vDPA back-end configures device and data path starts. The new
call fd will cause some kind of "re-configuration", this kind
of "re-configuration" cause IO drop.
After this patch, vDPA back-end configures device after kick fd
and call fd are well set and make sure no IO drops.
This patch only impact virtio blk vDPA device and does not impact
net device.

Fixes: 7015b6577178 ("vdpa/ifc: add block device SW live-migration")

Signed-off-by: Andy Pei 
---
  lib/vhost/vhost_user.c | 15 +++
  1 file changed, 15 insertions(+)



Reviewed-by: Maxime Coquelin 

Thanks,
Maxime



Re: [PATCH] vdpa/ifc: fix vhost message size check issue

2022-07-01 Thread Maxime Coquelin




On 6/21/22 15:46, Andy Pei wrote:

For vhost message VHOST_USER_GET_CONFIG, we do not check
payload size in vhost lib, we check payload size in driver
specific ops.
For ifc vdpa driver, we just need to make sure payload size
is not smaller than sizeof(struct virtio_blk_config).

Fixes: 856d03bcdc54 ("vdpa/ifc: add block operations")

Signed-off-by: Andy Pei 
---
  drivers/vdpa/ifc/ifcvf_vdpa.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)



Reviewed-by: Maxime Coquelin 

Thanks,
Maxime



Re: [PATCH] mbuf: add mbuf physical address field to dynamic field

2022-07-01 Thread Shijith Thotton
>> If all devices are configured to run in IOVA mode as VA, physical
>> address field of mbuf (buf_iova) won't be used. In such cases, buf_iova
>> space is free to use as a dynamic field. So a new dynamic field member
>> (dynfield2) is added in mbuf structure to make use of that space.
>>
>> A new mbuf flag RTE_MBUF_F_DYNFIELD2 is introduced to help identify the
>> mbuf that can use dynfield2.
>>
>> Signed-off-by: Shijith Thotton 
>
> This seems like a complex and potentially error prone way to do this.
> What is the use case?
>

PCI drivers with the flag RTE_PCI_DRV_NEED_IOVA_AS_VA only works in IOVA mode as
VA. buf_iova field of mbuf is not used by those PMDs and can be used as a
dynamic area to save space.

> How much of a performance gain?

No change in performance.


RE: [PATCH v2] net/vhost: fix deadlock on vring state change

2022-07-01 Thread Xia, Chenbo
> -Original Message-
> From: Wang, YuanX 
> Sent: Monday, June 27, 2022 1:51 PM
> To: maxime.coque...@redhat.com; Xia, Chenbo ;
> dev@dpdk.org
> Cc: Hu, Jiayu ; He, Xingguang ;
> Jiang, Cheng1 ; Ling, WeiX ;
> Wang, YuanX ; sta...@dpdk.org
> Subject: [PATCH v2] net/vhost: fix deadlock on vring state change
> 
> If vring state changes after pmd starts working, the locked vring
> notifies pmd, thus calling update_queuing_status(), the latter
> will wait for pmd to finish accessing vring, while pmd is also
> waiting for vring to be unlocked, thus causing deadlock.
> 
> Actually, update_queuing_status() only needs to wait while
> destroy/stopping the device, but not in other cases.
> 
> This patch adds a flag for whether or not to wait to fix this issue.
> 
> Fixes: 1ce3c7fe149f ("net/vhost: emulate device start/stop behavior")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Yuan Wang 
> ---
> V2: rewrite the commit log.
> ---
>  drivers/net/vhost/rte_eth_vhost.c | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/net/vhost/rte_eth_vhost.c
> b/drivers/net/vhost/rte_eth_vhost.c
> index d75d256040..7e512d94bf 100644
> --- a/drivers/net/vhost/rte_eth_vhost.c
> +++ b/drivers/net/vhost/rte_eth_vhost.c
> @@ -741,7 +741,7 @@ eth_vhost_install_intr(struct rte_eth_dev *dev)
>  }
> 
>  static void
> -update_queuing_status(struct rte_eth_dev *dev)
> +update_queuing_status(struct rte_eth_dev *dev, bool wait_queuing)
>  {
>   struct pmd_internal *internal = dev->data->dev_private;
>   struct vhost_queue *vq;
> @@ -767,7 +767,7 @@ update_queuing_status(struct rte_eth_dev *dev)
>   rte_atomic32_set(&vq->allow_queuing, 1);
>   else
>   rte_atomic32_set(&vq->allow_queuing, 0);
> - while (rte_atomic32_read(&vq->while_queuing))
> + while (wait_queuing && rte_atomic32_read(&vq->while_queuing))
>   rte_pause();
>   }
> 
> @@ -779,7 +779,7 @@ update_queuing_status(struct rte_eth_dev *dev)
>   rte_atomic32_set(&vq->allow_queuing, 1);
>   else
>   rte_atomic32_set(&vq->allow_queuing, 0);
> - while (rte_atomic32_read(&vq->while_queuing))
> + while (wait_queuing && rte_atomic32_read(&vq->while_queuing))
>   rte_pause();
>   }
>  }
> @@ -868,7 +868,7 @@ new_device(int vid)
>   vhost_dev_csum_configure(eth_dev);
> 
>   rte_atomic32_set(&internal->dev_attached, 1);
> - update_queuing_status(eth_dev);
> + update_queuing_status(eth_dev, false);
> 
>   VHOST_LOG(INFO, "Vhost device %d created\n", vid);
> 
> @@ -898,7 +898,7 @@ destroy_device(int vid)
>   internal = eth_dev->data->dev_private;
> 
>   rte_atomic32_set(&internal->dev_attached, 0);
> - update_queuing_status(eth_dev);
> + update_queuing_status(eth_dev, true);
> 
>   eth_dev->data->dev_link.link_status = RTE_ETH_LINK_DOWN;
> 
> @@ -1008,7 +1008,7 @@ vring_state_changed(int vid, uint16_t vring, int
> enable)
>   state->max_vring = RTE_MAX(vring, state->max_vring);
>   rte_spinlock_unlock(&state->lock);
> 
> - update_queuing_status(eth_dev);
> + update_queuing_status(eth_dev, false);
> 
>   VHOST_LOG(INFO, "vring%u is %s\n",
>   vring, enable ? "enabled" : "disabled");
> @@ -1197,7 +1197,7 @@ eth_dev_start(struct rte_eth_dev *eth_dev)
>   }
> 
>   rte_atomic32_set(&internal->started, 1);
> - update_queuing_status(eth_dev);
> + update_queuing_status(eth_dev, false);
> 
>   return 0;
>  }
> @@ -1209,7 +1209,7 @@ eth_dev_stop(struct rte_eth_dev *dev)
> 
>   dev->data->dev_started = 0;
>   rte_atomic32_set(&internal->started, 0);
> - update_queuing_status(dev);
> + update_queuing_status(dev, true);
> 
>   return 0;
>  }
> --
> 2.25.1

Reviewed-by: Chenbo Xia 


[PATCH] net/cnxk: fix to display extended stats

2022-07-01 Thread Rakesh Kudurumalla
This fix replaces the usage of roc_nix_num_xstats_get()
which is compile time api with runtime api
roc_nix_xstats_names_get() resolving xstat count
difference for cn9k and cn10k while displaying xstats
for dpdk ports

Fixes: 825bd1d9d8e6 ("common/cnxk: update extra stats for inline device")

Signed-off-by: Rakesh Kudurumalla 
---
 drivers/net/cnxk/cnxk_stats.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/cnxk/cnxk_stats.c b/drivers/net/cnxk/cnxk_stats.c
index 4b0deac05e..f2fc89 100644
--- a/drivers/net/cnxk/cnxk_stats.c
+++ b/drivers/net/cnxk/cnxk_stats.c
@@ -172,7 +172,7 @@ cnxk_nix_xstats_get_names(struct rte_eth_dev *eth_dev,
struct roc_nix *nix = &dev->nix;
int roc_size, size, i, q;
 
-   roc_size = roc_nix_num_xstats_get(nix);
+   roc_size = roc_nix_xstats_names_get(nix, NULL, 0);
/* Per Queue statistics also returned as part of xstats */
size = roc_size + (dev->nb_rxq * CNXK_NB_RXQ_STATS) +
   (dev->nb_txq * CNXK_NB_TXQ_STATS);
@@ -232,7 +232,7 @@ cnxk_nix_xstats_get_names_by_id(struct rte_eth_dev *eth_dev,
unsigned int limit)
 {
struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
-   uint32_t nix_cnt = roc_nix_num_xstats_get(&dev->nix);
+   uint32_t nix_cnt = roc_nix_xstats_names_get(&dev->nix, NULL, 0);
uint32_t stat_cnt = nix_cnt + (dev->nb_rxq * CNXK_NB_RXQ_STATS) +
(dev->nb_txq * CNXK_NB_TXQ_STATS);
struct rte_eth_xstat_name xnames[stat_cnt];
@@ -265,7 +265,7 @@ cnxk_nix_xstats_get_by_id(struct rte_eth_dev *eth_dev, 
const uint64_t *ids,
  uint64_t *values, unsigned int n)
 {
struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
-   uint32_t nix_cnt = roc_nix_num_xstats_get(&dev->nix);
+   uint32_t nix_cnt = roc_nix_xstats_names_get(&dev->nix, NULL, 0);
uint32_t stat_cnt = nix_cnt + (dev->nb_rxq * CNXK_NB_RXQ_STATS) +
(dev->nb_txq * CNXK_NB_TXQ_STATS);
struct rte_eth_xstat xstats[stat_cnt];
-- 
2.25.1



Re: [PATCH v3 1/4] doc/howto: rework section on virtio-user as exception path

2022-07-01 Thread Maxime Coquelin




On 6/10/22 17:35, Bruce Richardson wrote:

This patch extensively reworks the howto guide on using virtio-user for
exception packets. Changes include:

* rename "exceptional path" to "exception path"
* remove references to uio and just reference vfio-pci
* simplify testpmd command-lines, giving a basic usage example first
   before adding on detail about checksum or TSO parameters
* give a complete working example showing traffic flowing through the
   whole system from a testpmd loopback using the created TAP netdev
* replace use of "ifconfig" with Linux standard "ip" command
* other general rewording.

CC: sta...@dpdk.org

Signed-off-by: Bruce Richardson 

---
V3:
* fix error reported by Chenbo on review.
* add stable on CC, since this rework could be applicable for older
   releases too, if desired for backport.
---
  .../howto/virtio_user_as_exceptional_path.rst | 159 +++---
  1 file changed, 100 insertions(+), 59 deletions(-)



Reviewed-by: Maxime Coquelin 

Thanks,
Maxime



RE: [PATCH v2] vhost: fix avail idx update error when desc copy failed

2022-07-01 Thread Xia, Chenbo
> -Original Message-
> From: Gaoxiang Liu 
> Sent: Wednesday, June 22, 2022 9:20 AM
> To: maxime.coque...@redhat.com; Xia, Chenbo 
> Cc: dev@dpdk.org; liugaoxi...@huawei.com; Gaoxiang Liu
> ; sta...@dpdk.org
> Subject: [PATCH v2] vhost: fix avail idx update error when desc copy
> failed
> 
> When copy_desc_to_mbuf function failed, i added 1.

Function name now is desc_to_mbuf

> And last_avail_idx added i, other than i - 1.
> It may cause that the first mbuf in mbuf-list is dropped,
> the second mbuf in mbuf-list is received in the following
> rx procedure.
> And The pkt_len of the second mbuf is zero, resulting in
> segment fault when parsing the mbuf.

Could you help elaborate more? Do you mean first mbuf len is zero
as it's dropped? And where does the segfault happen? APP? Please
describe more to help understand the issue.

But I do notice one problem here is if vhost APP does not handle
the mbuf array correctly, some packets will be missed in the case
of pkts got dropped in the middle of a burst.

Thanks,
Chenbo

> 
> Fixes: 0fd5608ef97f ("vhost: handle mbuf allocation failure")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Gaoxiang Liu 
> 
> ---
> v2:
> * Fixed other idx update errors.
> ---
>  lib/vhost/virtio_net.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
> index 68a26eb17d..eb254e1024 100644
> --- a/lib/vhost/virtio_net.c
> +++ b/lib/vhost/virtio_net.c
> @@ -2850,11 +2850,11 @@ virtio_dev_tx_split(struct virtio_net *dev, struct
> vhost_virtqueue *vq,
>   if (dropped)
>   rte_pktmbuf_free_bulk(&pkts[i - 1], count - i + 1);
> 
> - vq->last_avail_idx += i;
> + vq->last_avail_idx += i - dropped;
> 
>   do_data_copy_dequeue(vq);
> - if (unlikely(i < count))
> - vq->shadow_used_idx = i;
> + if (unlikely((i - dropped) < count))
> + vq->shadow_used_idx = i - dropped;
>   if (likely(vq->shadow_used_idx)) {
>   flush_shadow_used_ring_split(dev, vq);
>   vhost_vring_call_split(dev, vq);
> --
> 2.32.0



RE: [PATCH v3 2/3] doc: update vhost sample app docs

2022-07-01 Thread Xia, Chenbo
> -Original Message-
> From: Lipiec, Herakliusz 
> Sent: Tuesday, June 21, 2022 11:21 PM
> To: maxime.coque...@redhat.com; Xia, Chenbo ;
> Richardson, Bruce 
> Cc: dev@dpdk.org; Lipiec, Herakliusz 
> Subject: [PATCH v3 2/3] doc: update vhost sample app docs
> 
> Vhost sample app documentation describes parameters that are not in the
> code and omits parameters that exist. Also switching the order of
> sections on running vhost and VM, since the --client parameter in the
> sample line requires a socket to be created by VM. Removing uio
> references and updating with vfio-pci.
> 
> Signed-off-by: Herakliusz Lipiec 
> ---
> V3:
>   * fix apply issues
> V2:
>   * Rewording portmask description as suggested by Chenbo.
> ---
>  doc/guides/sample_app_ug/vhost.rst | 67 --
>  1 file changed, 35 insertions(+), 32 deletions(-)
> 
> diff --git a/doc/guides/sample_app_ug/vhost.rst
> b/doc/guides/sample_app_ug/vhost.rst
> index e034115ce9..982e19214d 100644
> --- a/doc/guides/sample_app_ug/vhost.rst
> +++ b/doc/guides/sample_app_ug/vhost.rst
> @@ -33,19 +33,7 @@ The application is located in the ``vhost`` sub-
> directory.
>  .. note::
> In this example, you need build DPDK both on the host and inside guest.
> 
> -Start the vswitch example
> -~
> -
> -.. code-block:: console
> -
> -./dpdk-vhost -l 0-3 -n 4 --socket-mem 1024  \
> - -- --socket-file /tmp/sock0 --client \
> - ...
> -
> -Check the `Parameters`_ section for the explanations on what do those
> -parameters mean.
> -
> -.. _vhost_app_run_vm:
> +. _vhost_app_run_vm:
> 
>  Start the VM
>  
> @@ -66,6 +54,19 @@ Start the VM
>  some specific features, a higher version might be need. Such as
>  QEMU 2.7 (or above) for the reconnect feature.
> 
> +
> +Start the vswitch example
> +~
> +
> +.. code-block:: console
> +
> +./dpdk-vhost -l 0-3 -n 4 --socket-mem 1024  \
> + -- --socket-file /tmp/sock0 --client \
> + ...
> +
> +Check the `Parameters`_ section for the explanations on what do those
> +parameters mean.
> +
>  .. _vhost_app_run_dpdk_inside_guest:
> 
>  Run testpmd inside guest
> @@ -77,8 +78,8 @@ could be done by:
> 
>  .. code-block:: console
> 
> -   modprobe uio_pci_generic
> -   dpdk/usertools/dpdk-devbind.py -b uio_pci_generic :00:04.0
> +   modprobe vfio-pci
> +   dpdk/usertools/dpdk-devbind.py -b vfio-pci :00:04.0
> 
>  Then start testpmd for packet forwarding testing.
> 
> @@ -87,6 +88,9 @@ Then start testpmd for packet forwarding testing.
>  .//app/dpdk-testpmd -l 0-1 -- -i
>  > start tx_first
> 
> +For more information about vIOMMU and NO-IOMMU and VFIO please refer to
> +:doc:`/../linux_gsg/linux_drivers` section of the DPDK Getting started
> guide.
> +
>  Inject packets
>  --
> 
> @@ -146,26 +150,10 @@ The rx-retry-delay option specifies the timeout (in
> micro seconds) between
>  retries on an RX burst, it takes effect only when rx retry is enabled.
> The
>  default value is 15.
> 
> -**--dequeue-zero-copy**
> -Dequeue zero copy will be enabled when this option is given. it is worth
> to
> -note that if NIC is bound to driver with iommu enabled, dequeue zero copy
> -cannot work at VM2NIC mode (vm2vm=0) due to currently we don't setup
> iommu
> -dma mapping for guest memory.
> -
> -**--vlan-strip 0|1**
> -VLAN strip option is removed, because different NICs have different
> behaviors
> -when disabling VLAN strip. Such feature, which heavily depends on
> hardware,
> -should be removed from this example to reduce confusion. Now, VLAN strip
> is
> -enabled and cannot be disabled.
> -
>  **--builtin-net-driver**
>  A very simple vhost-user net driver which demonstrates how to use the
> generic
>  vhost APIs will be used when this option is given. It is disabled by
> default.
> 
> -**--dma-type**
> -This parameter is used to specify DMA type for async vhost-user net
> driver which
> -demonstrates how to use the async vhost APIs. It's used in combination
> with dmas.
> -
>  **--dmas**
>  This parameter is used to specify the assigned DMA device of a vhost
> device.
>  Async vhost-user net driver will be used if --dmas is set. For example
> @@ -176,6 +164,20 @@ operation. The index of the device corresponds to the
> socket file in order,
>  that means vhost device 0 is created through the first socket file, vhost
>  device 1 is created through the second socket file, and so on.
> 
> +**--total-num-mbufs 0-N**
> +This parameter sets the number of mbufs to be allocated in mbuf pools,
> +the default value is 147456. This is can be used if launch of a port
> fails
> +due to shortage of mbufs.
> +
> +**--tso 0|1**
> +Disables/enables TCP segment offload.
> +
> +**--tx-csum 0|1**
> +Disables/enables TX checksum offload.
> +
> +**-p mask**
> +Port mask which specifies the ports to be used
> +
>  Common Issues
>  -
> 
> @@ -204,7 +206,8 @@ Common Issues
> 

[PATCH v3] vhost: prefix logs with context

2022-07-01 Thread David Marchand
We recently improved the log messages in the vhost library, adding some
context that helps filtering for a given vhost-user device.
However, some parts of the code were missed, and some later code changes
broke this new convention (fixes were sent previous to this patch).

Change the VHOST_LOG_CONFIG/DATA helpers and always ask for a string
used as context. This should help limit regressions on this topic.

Most of the time, the context is the vhost-user device socket path.
For the rest when a vhost-user device can not be related, generic
names were chosen:
- "dma", for vhost-user async DMA operations,
- "device", for vhost-user device creation and lookup,
- "thread", for threads management,

Signed-off-by: David Marchand 
Reviewed-by: Maxime Coquelin 
---
Changes since v2:
- rebased on next-virtio,

Changes since v1:
- preserved original format for logs (removing extra ':'),

---
 lib/vhost/iotlb.c  |  30 +-
 lib/vhost/socket.c | 129 -
 lib/vhost/vdpa.c   |   4 +-
 lib/vhost/vhost.c  | 146 +-
 lib/vhost/vhost.h  |  20 +-
 lib/vhost/vhost_user.c | 644 +
 lib/vhost/virtio_net.c | 258 +
 7 files changed, 636 insertions(+), 595 deletions(-)

diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
index 5a5ba8b82a..35b4193606 100644
--- a/lib/vhost/iotlb.c
+++ b/lib/vhost/iotlb.c
@@ -70,18 +70,18 @@ vhost_user_iotlb_pending_insert(struct virtio_net *dev, 
struct vhost_virtqueue *
 
ret = rte_mempool_get(vq->iotlb_pool, (void **)&node);
if (ret) {
-   VHOST_LOG_CONFIG(DEBUG,
-   "(%s) IOTLB pool %s empty, clear entries for 
pending insertion\n",
-   dev->ifname, vq->iotlb_pool->name);
+   VHOST_LOG_CONFIG(dev->ifname, DEBUG,
+   "IOTLB pool %s empty, clear entries for pending 
insertion\n",
+   vq->iotlb_pool->name);
if (!TAILQ_EMPTY(&vq->iotlb_pending_list))
vhost_user_iotlb_pending_remove_all(vq);
else
vhost_user_iotlb_cache_random_evict(vq);
ret = rte_mempool_get(vq->iotlb_pool, (void **)&node);
if (ret) {
-   VHOST_LOG_CONFIG(ERR,
-   "(%s) IOTLB pool %s still empty, 
pending insertion failure\n",
-   dev->ifname, vq->iotlb_pool->name);
+   VHOST_LOG_CONFIG(dev->ifname, ERR,
+   "IOTLB pool %s still empty, pending insertion 
failure\n",
+   vq->iotlb_pool->name);
return;
}
}
@@ -169,18 +169,18 @@ vhost_user_iotlb_cache_insert(struct virtio_net *dev, 
struct vhost_virtqueue *vq
 
ret = rte_mempool_get(vq->iotlb_pool, (void **)&new_node);
if (ret) {
-   VHOST_LOG_CONFIG(DEBUG,
-   "(%s) IOTLB pool %s empty, clear entries for 
cache insertion\n",
-   dev->ifname, vq->iotlb_pool->name);
+   VHOST_LOG_CONFIG(dev->ifname, DEBUG,
+   "IOTLB pool %s empty, clear entries for cache 
insertion\n",
+   vq->iotlb_pool->name);
if (!TAILQ_EMPTY(&vq->iotlb_list))
vhost_user_iotlb_cache_random_evict(vq);
else
vhost_user_iotlb_pending_remove_all(vq);
ret = rte_mempool_get(vq->iotlb_pool, (void **)&new_node);
if (ret) {
-   VHOST_LOG_CONFIG(ERR,
-   "(%s) IOTLB pool %s still empty, cache 
insertion failed\n",
-   dev->ifname, vq->iotlb_pool->name);
+   VHOST_LOG_CONFIG(dev->ifname, ERR,
+   "IOTLB pool %s still empty, cache insertion 
failed\n",
+   vq->iotlb_pool->name);
return;
}
}
@@ -320,7 +320,7 @@ vhost_user_iotlb_init(struct virtio_net *dev, int vq_index)
 
snprintf(pool_name, sizeof(pool_name), "iotlb_%u_%d_%d",
getpid(), dev->vid, vq_index);
-   VHOST_LOG_CONFIG(DEBUG, "(%s) IOTLB cache name: %s\n", dev->ifname, 
pool_name);
+   VHOST_LOG_CONFIG(dev->ifname, DEBUG, "IOTLB cache name: %s\n", 
pool_name);
 
/* If already created, free it and recreate */
vq->iotlb_pool = rte_mempool_lookup(pool_name);
@@ -332,8 +332,8 @@ vhost_user_iotlb_init(struct virtio_net *dev, int vq_index)
RTE_MEMPOOL_F_NO_CACHE_ALIGN |
RTE_MEMPOOL_F_SP_PUT);
if (!vq->iotlb_pool) {
-   VHOST_LOG_CONFIG(ERR, "(%s) Failed to create IOTLB cache pool 
%s\n",
-   dev->ifname, pool_name);
+   VHO

Re: [PATCH v3 1/3] examples/vhost: update makefile to match meson build system

2022-07-01 Thread Maxime Coquelin




On 6/21/22 17:20, Herakliusz Lipiec wrote:

Meson build system creates a vhost binary but Makefile
and docs reference same as vhost-switch. Updating makefile
to match meson and the docs accordingly.

Signed-off-by: Herakliusz Lipiec 
Acked-by: Bruce Richardson 
---
V2:
  * Moving relevant doc updates here from second patch as per
Bruces suggestion.
---
  doc/guides/sample_app_ug/vhost.rst | 10 +-
  examples/vhost/Makefile|  2 +-
  2 files changed, 6 insertions(+), 6 deletions(-)



Reviewed-by: Maxime Coquelin 

Thanks,
Maxime



Re: [PATCH] app/testpmd: fix GTP PSC raw processing

2022-07-01 Thread Singh, Aman Deep

Hi Gregory,


On 6/30/2022 6:20 PM, Gregory Etelson wrote:

Fix GTP PSP extension size initialization.
Clear input buffer.

cc: sta...@dpdk.org

Fixes: c65282c9aa31 ("app/testpmd: fix GTP PSC raw processing")
Signed-off-by: Gregory Etelson 
---
  app/test-pmd/cmdline_flow.c | 6 --
  1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
index 6cb1173385..7f50028eb7 100644
--- a/app/test-pmd/cmdline_flow.c
+++ b/app/test-pmd/cmdline_flow.c
@@ -11030,10 +11030,12 @@ cmd_set_raw_parsed(const struct buffer *in)
const struct rte_flow_item_gtp_psc
*opt = item->spec;
struct rte_gtp_psc_generic_hdr *hdr;
-
-   *total_size += RTE_ALIGN(sizeof(hdr),
+   size_t hdr_size = RTE_ALIGN(sizeof(*hdr),
 sizeof(int32_t));


we missed sizeof(*hdr), last time. Ok now.


+
+   *total_size += hdr_size;
hdr = (typeof(hdr))(data_tail - (*total_size));
+   memset(hdr, 0, hdr_size);


Is this memset adding a value here ?


*hdr = opt->hdr;
hdr->ext_hdr_len = 1;
gtp_psc = i;




Re: [PATCH v2] doc/prog_guide: fix readability in lib vhost prog guide

2022-07-01 Thread Maxime Coquelin




On 6/23/22 15:57, Herakliusz Lipiec wrote:

fix grammar issues and readability in vhost library programmer guide

Fixes: 768274ebbd5e ("vhost: avoid populate guest memory")
Cc: sta...@dpdk.org

Signed-off-by: Herakliusz Lipiec 
---
  doc/guides/prog_guide/vhost_lib.rst | 18 +-
  1 file changed, 9 insertions(+), 9 deletions(-)




Applied to dpdk-next-virtio/main.

Thanks,
Maxime



Re: [PATCH v3 1/3] examples/vhost: update makefile to match meson build system

2022-07-01 Thread Maxime Coquelin




On 6/21/22 17:20, Herakliusz Lipiec wrote:

Meson build system creates a vhost binary but Makefile
and docs reference same as vhost-switch. Updating makefile
to match meson and the docs accordingly.

Signed-off-by: Herakliusz Lipiec 
Acked-by: Bruce Richardson 
---
V2:
  * Moving relevant doc updates here from second patch as per
Bruces suggestion.
---
  doc/guides/sample_app_ug/vhost.rst | 10 +-
  examples/vhost/Makefile|  2 +-
  2 files changed, 6 insertions(+), 6 deletions(-)




Applied to dpdk-next-virtio/main.

Thanks,
Maxime



Re: [PATCH v3 2/3] doc: update vhost sample app docs

2022-07-01 Thread Maxime Coquelin




On 6/21/22 17:20, Herakliusz Lipiec wrote:

Vhost sample app documentation describes parameters that are not in the
code and omits parameters that exist. Also switching the order of
sections on running vhost and VM, since the --client parameter in the
sample line requires a socket to be created by VM. Removing uio
references and updating with vfio-pci.

Signed-off-by: Herakliusz Lipiec 
---
V3:
   * fix apply issues
V2:
   * Rewording portmask description as suggested by Chenbo.
---
  doc/guides/sample_app_ug/vhost.rst | 67 --
  1 file changed, 35 insertions(+), 32 deletions(-)




Applied to dpdk-next-virtio/main.

Thanks,
Maxime



Re: [PATCH v3 3/3] examples/vhost: update vhost usage message

2022-07-01 Thread Maxime Coquelin




On 6/21/22 17:20, Herakliusz Lipiec wrote:

updating vhost usage message to be aligned with the documentation.

Signed-off-by: Herakliusz Lipiec 
Reviewed-by: Chenbo Xia 
---
  examples/vhost/main.c | 12 ++--
  1 file changed, 6 insertions(+), 6 deletions(-)




Applied to dpdk-next-virtio/main.

Thanks,
Maxime



Re: [PATCH] net/virtio: fix socket nonblocking mode affects initialization

2022-07-01 Thread Maxime Coquelin




On 6/17/22 04:42, Yuan Wang wrote:

The virtio-user initialization requires unix socket to receive backend
messages in block mode. However, vhost_user_update_link_state() sets
the same socket to nonblocking via fcntl, which affects all threads.
Enabling the rxq interrupt can causes both of these behaviors to occur
concurrently, with the result that the initialization may fail
because no messages are received in nonblocking socket.

Thread 1:
virtio_init_device()
--> virtio_user_start_device()
--> vhost_user_set_memory_table()
--> vhost_user_check_reply_ack()

Thread 2:
virtio_interrupt_handler()
--> vhost_user_update_link_state()

Fix that by replacing O_NONBLOCK with the recv per-call option
MSG_DONTWAIT.

Fixes: ef53b6030039 ("net/virtio-user: support LSC")
Cc: sta...@dpdk.org

Signed-off-by: Yuan Wang 
---
  drivers/net/virtio/virtio_user/vhost_user.c | 15 +--
  1 file changed, 1 insertion(+), 14 deletions(-)



Applied to dpdk-next-virtio/main.

Thanks,
Maxime



Re: [PATCH v2] doc: update async enqueue API usage

2022-07-01 Thread Maxime Coquelin




On 6/21/22 09:21, xuan.d...@intel.com wrote:

From: Xuan Ding 

This patch updates the correct usage for async enqueue APIs.
The rte_vhost_poll_enqueue_completed() needs to be
called in time to notify the guest of completed packets and
avoid packet loss.

Signed-off-by: Xuan Ding 
---
v2:
* refine doc and commit log
---
  doc/guides/prog_guide/vhost_lib.rst | 8 
  1 file changed, 8 insertions(+)



Applied to dpdk-next-virtio/main.

Thanks,
Maxime



Re: [PATCH] vdpa/ifc: fix vhost message size check issue

2022-07-01 Thread Maxime Coquelin




On 6/21/22 15:46, Andy Pei wrote:

For vhost message VHOST_USER_GET_CONFIG, we do not check
payload size in vhost lib, we check payload size in driver
specific ops.
For ifc vdpa driver, we just need to make sure payload size
is not smaller than sizeof(struct virtio_blk_config).

Fixes: 856d03bcdc54 ("vdpa/ifc: add block operations")

Signed-off-by: Andy Pei 
---
  drivers/vdpa/ifc/ifcvf_vdpa.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)



Applied to dpdk-next-virtio/main.

Thanks,
Maxime



Re: [PATCH] vhost: fix virtio blk vDPA live migration IO drop

2022-07-01 Thread Maxime Coquelin




On 6/22/22 09:47, Andy Pei wrote:

In the virtio blk vDPA live migration use case, before the live
migration process, QEMU will set call fd to vDPA back-end. QEMU
and vDPA back-end stand by until live migration starts.
During live migration process, QEMU sets kick fd and a new call
fd. However, after the kick fd is set to the vDPA back-end, the
vDPA back-end configures device and data path starts. The new
call fd will cause some kind of "re-configuration", this kind
of "re-configuration" cause IO drop.
After this patch, vDPA back-end configures device after kick fd
and call fd are well set and make sure no IO drops.
This patch only impact virtio blk vDPA device and does not impact
net device.

Fixes: 7015b6577178 ("vdpa/ifc: add block device SW live-migration")

Signed-off-by: Andy Pei 
---
  lib/vhost/vhost_user.c | 15 +++
  1 file changed, 15 insertions(+)



Applied to dpdk-next-virtio/main.

Thanks,
Maxime



Re: [PATCH v3] doc: clean vhost async path doc

2022-07-01 Thread Maxime Coquelin




On 6/22/22 03:45, xuan.d...@intel.com wrote:

From: Xuan Ding 

This patch moves the 'Recommended IOVA mode in async datapath'
section under 'Vhost asynchronous data path' as a sub-section,
which makes the doc cleaner.

Signed-off-by: Xuan Ding 
Reviewed-by: Jiayu Hu 
---
v3:
* add Reviewed-by

v2:
* fix a typo in commit log
---
  doc/guides/prog_guide/vhost_lib.rst | 12 ++--
  1 file changed, 6 insertions(+), 6 deletions(-)



Applied to dpdk-next-virtio/main.

Thanks,
Maxime



Re: [PATCH] vhost: fix sync dequeue offload

2022-07-01 Thread Maxime Coquelin




On 6/24/22 07:38, xuan.d...@intel.com wrote:

From: Xuan Ding 

This patch fixes the missing virtio net header copy in sync
dequeue path caused by refactoring, which affects dequeue
offloading.

Fixes: 6d823bb302c7("vhost: prepare sync for descriptor to mbuf refactoring")

Signed-off-by: Xuan Ding 
---
  lib/vhost/virtio_net.c | 14 +++---
  1 file changed, 11 insertions(+), 3 deletions(-)



Applied to dpdk-next-virtio/main.

Thanks,
Maxime



Re: [PATCH v4] examples/vhost: fix retry logic on eth rx path

2022-07-01 Thread Maxime Coquelin




On 6/22/22 11:25, Yuan Wang wrote:

drain_eth_rx() uses rte_vhost_avail_entries() to calculate
the available entries to determine if a retry is required.
However, this function only works with split rings, and
calculating packed rings will return the wrong value and cause
unnecessary retries resulting in a significant performance penalty.

This patch fix that by using the difference between tx/rx burst
as the retry condition.

Fixes: be800696c26e ("examples/vhost: use burst enqueue and dequeue from lib")
Cc: sta...@dpdk.org

Signed-off-by: Yuan Wang 
Tested-by: Wei Ling 
---
V4: Fix fiexs tag.
V3: Fix mbuf index.
V2: Rebase to 22.07 rc1.
---
  examples/vhost/main.c | 28 +++-
  1 file changed, 11 insertions(+), 17 deletions(-)



Applied to dpdk-next-virtio/main.

Thanks,
Maxime



Re: [PATCH] vdpa/mlx5: add ConnectX-6 LX device ID

2022-07-01 Thread Maxime Coquelin




On 6/23/22 11:00, Wisam Jaddo wrote:

This adds ConnectX-6 LX to the list of supported
Mellanox devices that run the MLX5 vdpa PMD.

Signed-off-by: Wisam Jaddo 
---
  drivers/vdpa/mlx5/mlx5_vdpa.c | 4 
  1 file changed, 4 insertions(+)



Applied to dpdk-next-virtio/main.

Thanks,
Maxime



Re: [PATCH v2] net/vhost: fix deadlock on vring state change

2022-07-01 Thread Maxime Coquelin




On 6/27/22 07:51, Yuan Wang wrote:

If vring state changes after pmd starts working, the locked vring
notifies pmd, thus calling update_queuing_status(), the latter
will wait for pmd to finish accessing vring, while pmd is also
waiting for vring to be unlocked, thus causing deadlock.

Actually, update_queuing_status() only needs to wait while
destroying or stopping the device, but not in other cases.

This patch adds a flag for whether or not to wait to fix this issue.

Fixes: 1ce3c7fe149f ("net/vhost: emulate device start/stop behavior")
Cc: sta...@dpdk.org

Signed-off-by: Yuan Wang 
---
V2: rewrite the commit log.
---
  drivers/net/vhost/rte_eth_vhost.c | 16 
  1 file changed, 8 insertions(+), 8 deletions(-)
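Based on the commit message, the shape of the fix can be sketched as follows; the struct, field, and function names are illustrative, not the exact driver code:

```c
#include <assert.h>
#include <stdbool.h>

struct pmd_internal {
	bool allow_queuing;   /* datapath may touch the vring */
	int while_queuing;    /* datapath is currently inside the vring */
};

/*
 * Only spin-wait for the datapath to leave the vring when destroying or
 * stopping the device. A vring-state-change callback runs with the vring
 * locked, and the datapath may be waiting on that same vring, so waiting
 * unconditionally here is the deadlock the patch removes.
 */
static void update_queuing_status(struct pmd_internal *p, bool wait_queuing)
{
	p->allow_queuing = false;
	if (wait_queuing)
		while (p->while_queuing)
			; /* spin until the datapath quiesces */
	p->allow_queuing = true;
}
```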




Applied to dpdk-next-virtio/main.

Thanks,
Maxime



Re: [PATCH v2] vdpa/sfc: handle sync issue between qemu and vhost-user

2022-07-01 Thread Maxime Coquelin




On 6/28/22 07:29, abhimanyu.sa...@xilinx.com wrote:

From: Abhimanyu Saini 

When DPDK app is running in the VF, it sometimes rings the doorbell
before dev_config has had a chance to complete and hence it misses
the event. As workaround, ring the doorbell when vDPA reports the
notify_area to QEMU.

Fixes: 630be406dcbf ("vdpa/sfc: get queue notify area info")
Cc: sta...@dpdk.org

Signed-off-by: Vijay Kumar Srivastava 
Signed-off-by: Abhimanyu Saini 
---
v1:
* Update the commit id that this patch fixes

  drivers/vdpa/sfc/sfc_vdpa_ops.c | 14 ++
  1 file changed, 14 insertions(+)



Applied to dpdk-next-virtio/main.

Thanks,
Maxime



Re: [PATCH v2] vhost: fix unchecked return value

2022-07-01 Thread Maxime Coquelin




On 6/29/22 11:07, Jiayu Hu wrote:

This patch checks the return value of rte_dma_info_get()
called in rte_vhost_async_dma_configure().

Coverity issue: 379066
Fixes: 53d3f4778c1d ("vhost: integrate dmadev in asynchronous data-path")
Cc: sta...@dpdk.org

Signed-off-by: Jiayu Hu 
Reviewed-by: Chenbo Xia 
---
v2:
- add cc stable tag
---
  lib/vhost/vhost.c | 6 +-
  1 file changed, 5 insertions(+), 1 deletion(-)
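The pattern of the fix can be shown in a self-contained sketch; `dma_info_get()` is a stub standing in for `rte_dma_info_get()` (0 on success, negative on error), and the struct is trimmed to the one field used:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

struct dma_info { uint16_t max_vchans; };

/* Stub standing in for rte_dma_info_get(): 0 on success, <0 on error. */
static int dma_info_get(int16_t dma_id, struct dma_info *info)
{
	if (dma_id < 0)
		return -EINVAL;
	info->max_vchans = 2;
	return 0;
}

/* The fix: bail out instead of reading 'info' when the lookup failed. */
static int dma_configure(int16_t dma_id, uint16_t vchan_id)
{
	struct dma_info info;

	if (dma_info_get(dma_id, &info) != 0) {
		fprintf(stderr, "Fail to get DMA %d information.\n", dma_id);
		return -1;
	}
	if (vchan_id >= info.max_vchans)
		return -1;
	return 0;
}
```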




Applied to dpdk-next-virtio/main.

Please propose something to protect DMA channels registration & also
consider introducing a function to unregister.

Thanks,
Maxime



Re: [PATCH v3] vhost: prefix logs with context

2022-07-01 Thread Maxime Coquelin




On 7/1/22 15:20, David Marchand wrote:

We recently improved the log messages in the vhost library, adding some
context that helps filtering for a given vhost-user device.
However, some parts of the code were missed, and some later code changes
broke this new convention (fixes were sent previous to this patch).

Change the VHOST_LOG_CONFIG/DATA helpers and always ask for a string
used as context. This should help limit regressions on this topic.

Most of the time, the context is the vhost-user device socket path.
For the rest when a vhost-user device can not be related, generic
names were chosen:
- "dma", for vhost-user async DMA operations,
- "device", for vhost-user device creation and lookup,
- "thread", for threads management,

Signed-off-by: David Marchand 
Reviewed-by: Maxime Coquelin 
---
Changes since v2:
- rebased on next-virtio,

Changes since v1:
- preserved original format for logs (removing extra ':'),

---
  lib/vhost/iotlb.c  |  30 +-
  lib/vhost/socket.c | 129 -
  lib/vhost/vdpa.c   |   4 +-
  lib/vhost/vhost.c  | 146 +-
  lib/vhost/vhost.h  |  20 +-
  lib/vhost/vhost_user.c | 644 +
  lib/vhost/virtio_net.c | 258 +
  7 files changed, 636 insertions(+), 595 deletions(-)



Applied to dpdk-next-virtio/main.

Thanks,
Maxime



Re: [PATCH v2 0/4] Vhost logs fixes and improvement

2022-07-01 Thread Maxime Coquelin




On 7/1/22 09:55, David Marchand wrote:

Here is a series that fixes log messages (with one regression being
fixed in patch 2) and changes the VHOST_LOG_* helpers to enforce that
vhost log messages will always have some context/prefix to help
debugging on setups with many vhost ports.

The first three patches are low risk and can probably be merged in
v22.07.

Changes since v1:
- fixed log formats in patch4,





Applied first 3 patches to dpdk-next-virtio/main.

Thanks,
Maxime



RE: [PATCH v2] vhost: fix unchecked return value

2022-07-01 Thread Hu, Jiayu


> -Original Message-
> From: Maxime Coquelin 
> Sent: Friday, July 1, 2022 10:00 PM
> To: Hu, Jiayu ; dev@dpdk.org
> Cc: Xia, Chenbo ; sta...@dpdk.org
> Subject: Re: [PATCH v2] vhost: fix unchecked return value
> 
> 
> 
> On 6/29/22 11:07, Jiayu Hu wrote:
> > This patch checks the return value of rte_dma_info_get() called in
> > rte_vhost_async_dma_configure().
> >
> > Coverity issue: 379066
> > Fixes: 53d3f4778c1d ("vhost: integrate dmadev in asynchronous
> > data-path")
> > Cc: sta...@dpdk.org
> >
> > Signed-off-by: Jiayu Hu 
> > Reviewed-by: Chenbo Xia 
> > ---
> > v2:
> > - add cc stable tag
> > ---
> >   lib/vhost/vhost.c | 6 +-
> >   1 file changed, 5 insertions(+), 1 deletion(-)
> >
> 
> 
> Applied to dpdk-next-virtio/main.
> 
> Please propose something to protect DMA channels registration & also
> consider introducing a function to unregister.

Thanks Maxime. Will do.

Regards,
Jiayu
> 
> Thanks,
> Maxime



Re: release candidate 22.07-rc2

2022-07-01 Thread Ferruh Yigit

On 6/27/2022 3:15 AM, Thomas Monjalon wrote:

A new DPDK release candidate is ready for testing:
https://git.dpdk.org/dpdk/tag/?id=v22.07-rc2

There are 317 new patches in this snapshot.

Release notes:
https://doc.dpdk.org/guides/rel_notes/release_22_07.html

There were a lot of updates in drivers.
The driver features should be frozen now.

Please test and report issues on bugs.dpdk.org.
Do not forget to review examples and documentation updates.

DPDK 22.07-rc3 is expected in one week (targeting end of June).

Thank you everyone



The testing with DPDK 22.07-rc2 from AMD looks good.
Following functional testing is done:

* Basic NIC IO forwarding (single/multiple queues/cores)
* Basic cryptodev
* AES-CBC128 SHA1-HMAC
* AES-CBC128 SHA2-256-HMAC
* AES-GCM-128

Systems tested:
  - Platform: AMD EPYC 7713 64-Core Processor (2 Socket)
 OS: Ubuntu 20.04.3 LTS
 NICs:
  - Mellanox MLX5 CX-6


Tuning guide: AMD Milan DPDK tuning guide (BIOS and Kernel)
(https://www.amd.com/system/files/documents/data-plane-development-kit-tuning-guide-amd-epyc7003-series-processors.pdf)


Thanks to the team that did the testing, I am just a messenger ;)


RE: [PATCH] app/testpmd: fix GTP PSC raw processing

2022-07-01 Thread Gregory Etelson
Hello,

> > --- a/app/test-pmd/cmdline_flow.c
> > +++ b/app/test-pmd/cmdline_flow.c
> > @@ -11030,10 +11030,12 @@ cmd_set_raw_parsed(const struct buffer
> *in)
> >   const struct rte_flow_item_gtp_psc
> >   *opt = item->spec;
> >   struct rte_gtp_psc_generic_hdr *hdr;
> > -
> > - *total_size += RTE_ALIGN(sizeof(hdr),
> > + size_t hdr_size = RTE_ALIGN(sizeof(*hdr),
> >sizeof(int32_t));
> 
> we missed sizeof(*hdr), last time. Ok now.
> 
> > +
> > + *total_size += hdr_size;
> >   hdr = (typeof(hdr))(data_tail - 
> > (*total_size));
> > + memset(hdr, 0, hdr_size);
> 
> Is this memset adding a value here ?
> 

The size of struct rte_gtp_psc_generic_hdr is 3 bytes. In a packet, the
structure is padded with one extra byte to keep the value 32-bit aligned.
The content of that extra byte is not covered by the GTP_PSC flow item
configuration, so the application must explicitly put 0 in it.
The patch zeroes the entire 32 bits.

> >   *hdr = opt->hdr;
> >   hdr->ext_hdr_len = 1;
> >   gtp_psc = i;
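The padding behavior discussed above can be reproduced with any 3-byte header copied into a 4-byte aligned slot; `psc_hdr` is a stand-in for `rte_gtp_psc_generic_hdr`, and `ALIGN4()` mimics `RTE_ALIGN(sz, sizeof(int32_t))`:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ALIGN4(sz) (((sz) + 3) & ~(size_t)3) /* mimics RTE_ALIGN(sz, 4) */

/* 3-byte stand-in for rte_gtp_psc_generic_hdr */
struct psc_hdr {
	uint8_t ext_hdr_len;
	uint8_t type_pad;
	uint8_t next_hdr;
} __attribute__((packed));

/* Write 'hdr' into a 4-byte slot, zeroing the padding byte first. */
static void write_psc(uint8_t *slot, const struct psc_hdr *hdr)
{
	size_t hdr_size = ALIGN4(sizeof(*hdr));

	memset(slot, 0, hdr_size); /* clears the otherwise-stale 4th byte */
	memcpy(slot, hdr, sizeof(*hdr));
}
```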



[PATCH] doc: announce changes to rte_eth_set_queue_rate_limit api

2022-07-01 Thread skoteshwar
From: Satha Rao 

rte_eth_set_queue_rate_limit argument rate modified to uint64_t
to support more than 64Gbps.

Signed-off-by: Satha Rao 
---
 doc/guides/rel_notes/deprecation.rst | 5 +
 1 file changed, 5 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst 
b/doc/guides/rel_notes/deprecation.rst
index 4e5b23c..5bf2b72 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -125,3 +125,8 @@ Deprecation Notices
   applications should be updated to use the ``dmadev`` library instead,
   with the underlying HW-functionality being provided by the ``ioat`` or
   ``idxd`` dma drivers
+
+* ethdev: The function ``rte_eth_set_queue_rate_limit`` takes ``rate`` in Mbps.
+  Since this parameter is declared as ``uint16_t``, the queue rate is limited
+  to 64 Gbps. The ``rate`` parameter will be changed to ``uint64_t`` in
+  DPDK 22.11 so that rates above 64 Gbps can be configured.
-- 
1.8.3.1
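The 64 Gbps ceiling follows directly from the parameter width; `clamp_to_u16_rate()` below is an illustrative helper showing the truncation, not the ethdev API:

```c
#include <assert.h>
#include <stdint.h>

/* What happens to an Mbps rate forced through today's uint16_t. */
static uint16_t clamp_to_u16_rate(uint64_t rate_mbps)
{
	return (uint16_t)rate_mbps; /* silent modulo-65536 truncation */
}
```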



Re: [PATCH 01/17] net/mana: add basic driver, build environment and doc

2022-07-01 Thread Stephen Hemminger
On Fri,  1 Jul 2022 02:02:31 -0700
lon...@linuxonhyperv.com wrote:

> diff --git a/doc/guides/nics/features/mana.ini 
> b/doc/guides/nics/features/mana.ini
> new file mode 100644
> index 00..9d8676089b
> --- /dev/null
> +++ b/doc/guides/nics/features/mana.ini
> @@ -0,0 +1,10 @@
> +;
> +; Supported features of the 'cnxk' network poll mode driver.

Looks like cut/paste error!

> +;
> +; Refer to default.ini for the full list of available PMD features.
> +;
> +[Features]
> +Linux= Y
> +Multiprocess aware   = Y
> +Usage doc= Y
> +x86-64   = Y


Re: [PATCH 02/17] net/mana: add device configuration and stop

2022-07-01 Thread Stephen Hemminger
On Fri,  1 Jul 2022 02:02:32 -0700
lon...@linuxonhyperv.com wrote:

> +
> + if (txmode->offloads & ~BNIC_DEV_TX_OFFLOAD_SUPPORT) {
> + DRV_LOG(ERR, "Unsupported TX offload: %lx", txmode->offloads);
> + return -EINVAL;
> + }
> +
> + if (rxmode->offloads & ~BNIC_DEV_RX_OFFLOAD_SUPPORT) {
> + DRV_LOG(ERR, "Unsupported RX offload: %lx", rxmode->offloads);
> + return -EINVAL;
> + }
> +

If the device reports the correct capabilities in dev_info.tx_offload_capa
and dev_info.rx_offload_capa then these checks are unnecessary since the
flags are already checked in ethdev_configure.
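The redundancy pointed out here comes from the generic subset test that the ethdev layer already applies at configure time; a minimal sketch of that test:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of the check rte_eth_dev_configure() performs: every requested
 * offload bit must be within the capabilities the driver advertised via
 * dev_info, so a per-driver re-check of the same bits adds nothing.
 */
static int check_offloads(uint64_t requested, uint64_t capa)
{
	return (requested & ~capa) != 0 ? -1 /* -EINVAL */ : 0;
}
```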


Re: [PATCH 10/17] net/mana: implement memory registration

2022-07-01 Thread Stephen Hemminger
On Fri,  1 Jul 2022 02:02:40 -0700
lon...@linuxonhyperv.com wrote:

> +int new_pmd_mr(struct mana_mr_btree *local_tree, struct mana_priv *priv,
> +struct rte_mempool *pool);
> +void remove_all_mr(struct mana_priv *priv);
> +void del_pmd_mr(struct mana_mr_cache *mr);

Please use one prefix across all of the driver for functions.
mana_new_pmd_mr etc


Re: [PATCH 11/17] net/mana: implement the hardware layer operations

2022-07-01 Thread Stephen Hemminger
On Fri,  1 Jul 2022 02:02:41 -0700
lon...@linuxonhyperv.com wrote:

> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 

This is including everything and much of this not used here.
I see no rwlock, alarm or rte_malloc here.


Re: [PATCH 12/17] net/mana: add function to start/stop TX queues

2022-07-01 Thread Stephen Hemminger
On Fri,  1 Jul 2022 02:02:42 -0700
lon...@linuxonhyperv.com wrote:

> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +
> +#include "mana.h"
> +

Lots extra include here as well.
Try IWYU tool to find what is needed?


Re: [PATCH 13/17] net/mana: add function to start/stop RX queues

2022-07-01 Thread Stephen Hemminger
On Fri,  1 Jul 2022 02:02:43 -0700
lon...@linuxonhyperv.com wrote:

> +
> +static uint8_t mana_rss_hash_key_default[TOEPLITZ_HASH_KEY_SIZE_IN_BYTES] = {
> + 0x2c, 0xc6, 0x81, 0xd1,
> + 0x5b, 0xdb, 0xf4, 0xf7,
> + 0xfc, 0xa2, 0x83, 0x19,
> + 0xdb, 0x1a, 0x3e, 0x94,
> + 0x6b, 0x9e, 0x38, 0xd9,
> + 0x2c, 0x9c, 0x03, 0xd1,
> + 0xad, 0x99, 0x44, 0xa7,
> + 0xd9, 0x56, 0x3d, 0x59,
> + 0x06, 0x3c, 0x25, 0xf3,
> + 0xfc, 0x1f, 0xdc, 0x2a,
> +};
> +

Is this constant?
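If the key never changes at runtime, declaring it `const` documents that intent and lets the compiler reject accidental writes; this sketch reuses the bytes from the patch, with the size macro assumed from its name:

```c
#include <assert.h>
#include <stdint.h>

#define TOEPLITZ_HASH_KEY_SIZE_IN_BYTES 40 /* assumed from the macro name */

static const uint8_t mana_rss_hash_key_default[TOEPLITZ_HASH_KEY_SIZE_IN_BYTES] = {
	0x2c, 0xc6, 0x81, 0xd1,
	0x5b, 0xdb, 0xf4, 0xf7,
	0xfc, 0xa2, 0x83, 0x19,
	0xdb, 0x1a, 0x3e, 0x94,
	0x6b, 0x9e, 0x38, 0xd9,
	0x2c, 0x9c, 0x03, 0xd1,
	0xad, 0x99, 0x44, 0xa7,
	0xd9, 0x56, 0x3d, 0x59,
	0x06, 0x3c, 0x25, 0xf3,
	0xfc, 0x1f, 0xdc, 0x2a,
};
```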


Re: [PATCH 01/17] net/mana: add basic driver, build environment and doc

2022-07-01 Thread Stephen Hemminger
On Fri,  1 Jul 2022 02:02:31 -0700
lon...@linuxonhyperv.com wrote:

> + while (fgets(line, sizeof(line), file) == line) {
> + size_t len = strlen(line);
> + int ret;
> +
> + /* Truncate long lines. */
> + if (len == (sizeof(line) - 1))
> + while (line[(len - 1)] != '\n') {
> + ret = fgetc(file);
> + if (ret == EOF)
> + break;
> + line[(len - 1)] = ret;

An alternative would be to use getline(), which handles arbitrary-length input.
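A sketch of the getline() approach; `read_first_line()` is an illustrative helper, and fmemopen() appears only in the test to make the example self-checking:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/*
 * getline() grows its buffer as needed, so arbitrarily long lines are
 * read whole instead of being truncated to a fixed fgets() buffer.
 * The caller frees the returned string.
 */
static char *read_first_line(FILE *file)
{
	char *line = NULL;
	size_t cap = 0;
	ssize_t len = getline(&line, &cap, file);

	if (len < 0) {
		free(line);
		return NULL;
	}
	if (len > 0 && line[len - 1] == '\n')
		line[len - 1] = '\0';
	return line;
}
```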


Re: [PATCH 01/17] net/mana: add basic driver, build environment and doc

2022-07-01 Thread Stephen Hemminger
On Fri,  1 Jul 2022 02:02:31 -0700
lon...@linuxonhyperv.com wrote:

> +
> +struct mana_priv {
> + struct rte_eth_dev_data *dev_data;
> + struct mana_process_priv *process_priv;
> + int num_queues;
> +
> + /* DPDK port */
> + int port_id;
> +
> + /* IB device port */
> + int dev_port;

Are the port values and number of queues really signed?
Best to use unsigned value of specific size.


Re: [PATCH 01/17] net/mana: add basic driver, build environment and doc

2022-07-01 Thread Stephen Hemminger
On Fri,  1 Jul 2022 02:02:31 -0700
lon...@linuxonhyperv.com wrote:

> + uint64_t max_mr_size;
> + rte_rwlock_t mr_list_lock;
> +};

Reader/Writer locks are slower for the usual uncontended case.
Unless you have a reader holding onto the lock for a long time,
better to use spin lock.

This is Linux wisdom (thank you paulmck); Windows seems to love
reader/writer locks.


Re: [PATCH 01/17] net/mana: add basic driver, build environment and doc

2022-07-01 Thread Stephen Hemminger
On Fri,  1 Jul 2022 02:02:31 -0700
lon...@linuxonhyperv.com wrote:

> diff --git a/drivers/net/mana/meson.build b/drivers/net/mana/meson.build
> new file mode 100644
> index 00..7ab34c253c
> --- /dev/null
> +++ b/drivers/net/mana/meson.build
> @@ -0,0 +1,34 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2022 Microsoft Corporation
> +
> +if is_windows
> +build = false
> +reason = 'not supported on Windows'
> +subdir_done()
> +endif

Since the driver is listed as only supported on x86, probably best
to enforce that in meson build as well.

