Re: [dpdk-dev] [PATCH v2 1/2] ethdev: replace callback getting filter operations

2021-03-15 Thread Andrew Rybchenko
On 3/12/21 8:46 PM, Thomas Monjalon wrote:
> Since rte_flow is the only API for filtering operations,
> the legacy driver interface filter_ctrl was too complicated
> for the simple task of getting the struct rte_flow_ops.
> 
> The filter type RTE_ETH_FILTER_GENERIC and
> the filter operation RTE_ETH_FILTER_GET are removed.
> The new driver callback flow_ops_get replaces filter_ctrl.
> 
> Signed-off-by: Thomas Monjalon 

[snip]

> diff --git a/lib/librte_ethdev/rte_eth_ctrl.h b/lib/librte_ethdev/rte_eth_ctrl.h
> index 8a50dbfef9..42652f9cce 100644
> --- a/lib/librte_ethdev/rte_eth_ctrl.h
> +++ b/lib/librte_ethdev/rte_eth_ctrl.h
> @@ -339,7 +339,7 @@ struct rte_eth_fdir_action {
>  };
>  
>  /**
> - * A structure used to define the flow director filter entry by filter_ctrl API.
> + * A structure used to define the flow director filter entry.
>   */
>  struct rte_eth_fdir_filter {
>   uint32_t soft_id;
> diff --git a/lib/librte_ethdev/rte_flow.c b/lib/librte_ethdev/rte_flow.c
> index 241af6c4ca..1a896e3e64 100644
> --- a/lib/librte_ethdev/rte_flow.c
> +++ b/lib/librte_ethdev/rte_flow.c
> @@ -255,18 +255,19 @@ rte_flow_ops_get(uint16_t port_id, struct rte_flow_error *error)
>  
>   if (unlikely(!rte_eth_dev_is_valid_port(port_id)))
>   code = ENODEV;
> - else if (unlikely(!dev->dev_ops->filter_ctrl ||
> -   dev->dev_ops->filter_ctrl(dev,
> - RTE_ETH_FILTER_GENERIC,
> - RTE_ETH_FILTER_GET,
> - &ops) ||
> -   !ops))
> - code = ENOSYS;
> + else if (unlikely(dev->dev_ops->flow_ops_get == NULL))
> + code = ENOTSUP;
>   else
> - return ops;
> - rte_flow_error_set(error, code, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
> -NULL, rte_strerror(code));
> - return NULL;
> + code = dev->dev_ops->flow_ops_get(dev, &ops);
> + if (code == 0 && ops == NULL)
> + code = EACCES;

This looks like something new. I think it should be mentioned in the
flow_ops_get type documentation (similar to eth_promiscuous_enable_t)
and in the return value descriptions of rte_flow_validate() and the
other functions that use it.

> +
> + if (code != 0) {
> + rte_flow_error_set(error, code, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
> +NULL, rte_strerror(code));
> + return NULL;
> + }
> + return ops;
>  }
>  
>  /* Check whether a flow rule can be created on a given port. */

[snip]



Re: [dpdk-dev] [PATCH v2 2/2] drivers/net: remove explicit include of legacy filtering

2021-03-15 Thread Hemant Agrawal

for dpaa2

Acked-by: Hemant Agrawal 



Re: [dpdk-dev] [PATCH v2 1/2] ethdev: replace callback getting filter operations

2021-03-15 Thread Hemant Agrawal

For dpaa2

Acked-by: Hemant Agrawal 




Re: [dpdk-dev] [PATCH v2 1/2] ethdev: replace callback getting filter operations

2021-03-15 Thread Thomas Monjalon
15/03/2021 08:18, Andrew Rybchenko:
> On 3/12/21 8:46 PM, Thomas Monjalon wrote:
> > --- a/lib/librte_ethdev/rte_flow.c
> > +++ b/lib/librte_ethdev/rte_flow.c
> > @@ -255,18 +255,19 @@ rte_flow_ops_get(uint16_t port_id, struct rte_flow_error *error)
> >  
> > if (unlikely(!rte_eth_dev_is_valid_port(port_id)))
> > code = ENODEV;
> > -   else if (unlikely(!dev->dev_ops->filter_ctrl ||
> > - dev->dev_ops->filter_ctrl(dev,
> > -   RTE_ETH_FILTER_GENERIC,
> > -   RTE_ETH_FILTER_GET,
> > -   &ops) ||
> > - !ops))
> > -   code = ENOSYS;
> > +   else if (unlikely(dev->dev_ops->flow_ops_get == NULL))
> > +   code = ENOTSUP;
> > else
> > -   return ops;
> > -   rte_flow_error_set(error, code, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
> > -  NULL, rte_strerror(code));
> > -   return NULL;
> > +   code = dev->dev_ops->flow_ops_get(dev, &ops);
> > +   if (code == 0 && ops == NULL)
> > +   code = EACCES;
> 
> It looks something new. I think it should be mentioned in flow_ops_get
> type documentation (similar to eth_promiscuous_enable_t) and
> rte_flow_validate() etc functions
> return values description.

It is an internal function used only in rte_flow.c.
The real consequence is that rte_errno is set in a lot of rte_flow API functions.
I am not sure there is a good way to document such code details.
The other codes are not documented in rte_flow.h either.





[dpdk-dev] [PATCH v1 1/1] net/hinic: fix coredump when PMD used by fstack

2021-03-15 Thread Guoyang Zhou
fstack uses a secondary process to access the memory of
eth_dev_ops and to get the device info, but the hinic
driver does not initialize it in a secondary process.

Fixes: 66f64dd6dc86 ("net/hinic: fix secondary process")
Cc: sta...@dpdk.org
Signed-off-by: Guoyang Zhou 
---
 drivers/net/hinic/base/hinic_compat.h | 25 -
 drivers/net/hinic/hinic_pmd_ethdev.c  |  5 +
 2 files changed, 13 insertions(+), 17 deletions(-)

diff --git a/drivers/net/hinic/base/hinic_compat.h b/drivers/net/hinic/base/hinic_compat.h
index 6dd210e..aea3320 100644
--- a/drivers/net/hinic/base/hinic_compat.h
+++ b/drivers/net/hinic/base/hinic_compat.h
@@ -171,6 +171,7 @@ static inline u32 readl(const volatile void *addr)
 #else
 #define CLOCK_TYPE CLOCK_MONOTONIC
 #endif
+#define HINIC_MUTEX_TIMEOUT  10
 
 static inline unsigned long clock_gettime_ms(void)
 {
@@ -225,24 +226,14 @@ static inline int hinic_mutex_destroy(pthread_mutex_t *pthreadmutex)
 static inline int hinic_mutex_lock(pthread_mutex_t *pthreadmutex)
 {
int err;
+   struct timespec tout;
 
-   err = pthread_mutex_lock(pthreadmutex);
-   if (!err) {
-   return err;
-   } else if (err == EOWNERDEAD) {
-   PMD_DRV_LOG(ERR, "Mutex lock failed. (ErrorNo=%d)", errno);
-#if defined(__GLIBC__)
-#if __GLIBC_PREREQ(2, 12)
-   (void)pthread_mutex_consistent(pthreadmutex);
-#else
-   (void)pthread_mutex_consistent_np(pthreadmutex);
-#endif
-#else
-   (void)pthread_mutex_consistent(pthreadmutex);
-#endif
-   } else {
-   PMD_DRV_LOG(ERR, "Mutex lock failed. (ErrorNo=%d)", errno);
-   }
+   (void)clock_gettime(CLOCK_TYPE, &tout);
+
+   tout.tv_sec += HINIC_MUTEX_TIMEOUT;
+   err = pthread_mutex_timedlock(pthreadmutex, &tout);
+   if (err)
+   PMD_DRV_LOG(ERR, "Mutex lock failed. (ErrorNo=%d)", err);
 
return err;
 }
diff --git a/drivers/net/hinic/hinic_pmd_ethdev.c b/drivers/net/hinic/hinic_pmd_ethdev.c
index 1d6b710..057e7b1 100644
--- a/drivers/net/hinic/hinic_pmd_ethdev.c
+++ b/drivers/net/hinic/hinic_pmd_ethdev.c
@@ -3085,6 +3085,10 @@ static int hinic_dev_close(struct rte_eth_dev *dev)
.filter_ctrl   = hinic_dev_filter_ctrl,
 };
 
+static const struct eth_dev_ops hinic_dev_sec_ops = {
+   .dev_infos_get = hinic_dev_infos_get,
+};
+
 static int hinic_func_init(struct rte_eth_dev *eth_dev)
 {
struct rte_pci_device *pci_dev;
@@ -3099,6 +3103,7 @@ static int hinic_func_init(struct rte_eth_dev *eth_dev)
 
/* EAL is SECONDARY and eth_dev is already created */
if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
+   eth_dev->dev_ops = &hinic_dev_sec_ops;
PMD_DRV_LOG(INFO, "Initialize %s in secondary process",
eth_dev->data->name);
 
-- 
1.8.3.1
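
The timed lock introduced in hinic_mutex_lock() above can be sketched in isolation as follows. This is a minimal sketch, not the driver's code: timed_mutex_lock() is a hypothetical name chosen for illustration. POSIX measures the pthread_mutex_timedlock() deadline against CLOCK_REALTIME, so the sketch builds the absolute deadline from that clock; the patch reads CLOCK_TYPE instead, which the visible context defines as CLOCK_MONOTONIC in at least one configuration, so that choice may be worth double-checking.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <pthread.h>
#include <time.h>

/*
 * Hypothetical helper: lock a mutex, giving up after timeout_sec seconds.
 * Returns 0 on success or an errno value such as ETIMEDOUT.
 * The deadline is absolute and measured against CLOCK_REALTIME,
 * as pthread_mutex_timedlock() expects.
 */
static int
timed_mutex_lock(pthread_mutex_t *mutex, time_t timeout_sec)
{
	struct timespec deadline;

	if (clock_gettime(CLOCK_REALTIME, &deadline) != 0)
		return errno;

	deadline.tv_sec += timeout_sec;
	return pthread_mutex_timedlock(mutex, &deadline);
}
```

Note that pthread functions return the error number directly rather than setting errno, which is why the patch logs err instead of errno.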



Re: [dpdk-dev] [PATCH v2 1/2] ethdev: replace callback getting filter operations

2021-03-15 Thread Andrew Rybchenko
On 3/15/21 10:54 AM, Thomas Monjalon wrote:
> 15/03/2021 08:18, Andrew Rybchenko:
>> On 3/12/21 8:46 PM, Thomas Monjalon wrote:
>>> --- a/lib/librte_ethdev/rte_flow.c
>>> +++ b/lib/librte_ethdev/rte_flow.c
>>> @@ -255,18 +255,19 @@ rte_flow_ops_get(uint16_t port_id, struct rte_flow_error *error)
>>>  
>>> if (unlikely(!rte_eth_dev_is_valid_port(port_id)))
>>> code = ENODEV;
>>> -   else if (unlikely(!dev->dev_ops->filter_ctrl ||
>>> - dev->dev_ops->filter_ctrl(dev,
>>> -   RTE_ETH_FILTER_GENERIC,
>>> -   RTE_ETH_FILTER_GET,
>>> -   &ops) ||
>>> - !ops))
>>> -   code = ENOSYS;
>>> +   else if (unlikely(dev->dev_ops->flow_ops_get == NULL))
>>> +   code = ENOTSUP;
>>> else
>>> -   return ops;
>>> -   rte_flow_error_set(error, code, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
>>> -  NULL, rte_strerror(code));
>>> -   return NULL;
>>> +   code = dev->dev_ops->flow_ops_get(dev, &ops);
>>> +   if (code == 0 && ops == NULL)
>>> +   code = EACCES;
>> It looks something new. I think it should be mentioned in flow_ops_get
>> type documentation (similar to eth_promiscuous_enable_t) and
>> rte_flow_validate() etc functions
>> return values description.
> It is an internal function used only in rte_flow.c.
> The real consequence is to set rte_errno in a lot of rte_flow API.
> Not sure there is a good way to document the code details.
> Other codes are not documented in rte_flow.h

First of all, it is the behaviour of the flow_ops_get callback, and
driver developers should know that it is legal to return 0 with
ops==NULL, and what that means.

Second, it is visible as a return value of rte_flow_validate() (and of
other functions which use rte_flow_ops_get()) that has a special
meaning. So it should be documented.



Re: [dpdk-dev] [PATCH v2 1/2] ethdev: replace callback getting filter operations

2021-03-15 Thread Thomas Monjalon
15/03/2021 09:43, Andrew Rybchenko:
> On 3/15/21 10:54 AM, Thomas Monjalon wrote:
> > 15/03/2021 08:18, Andrew Rybchenko:
> >> On 3/12/21 8:46 PM, Thomas Monjalon wrote:
> >>> --- a/lib/librte_ethdev/rte_flow.c
> >>> +++ b/lib/librte_ethdev/rte_flow.c
> >>> @@ -255,18 +255,19 @@ rte_flow_ops_get(uint16_t port_id, struct rte_flow_error *error)
> >>>  
> >>>   if (unlikely(!rte_eth_dev_is_valid_port(port_id)))
> >>>   code = ENODEV;
> >>> - else if (unlikely(!dev->dev_ops->filter_ctrl ||
> >>> -   dev->dev_ops->filter_ctrl(dev,
> >>> - RTE_ETH_FILTER_GENERIC,
> >>> - RTE_ETH_FILTER_GET,
> >>> - &ops) ||
> >>> -   !ops))
> >>> - code = ENOSYS;
> >>> + else if (unlikely(dev->dev_ops->flow_ops_get == NULL))
> >>> + code = ENOTSUP;
> >>>   else
> >>> - return ops;
> >>> - rte_flow_error_set(error, code, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
> >>> -NULL, rte_strerror(code));
> >>> - return NULL;
> >>> + code = dev->dev_ops->flow_ops_get(dev, &ops);
> >>> + if (code == 0 && ops == NULL)
> >>> + code = EACCES;
> >> It looks something new. I think it should be mentioned in flow_ops_get
> >> type documentation (similar to eth_promiscuous_enable_t) and
> >> rte_flow_validate() etc functions
> >> return values description.
> > 
> > It is an internal function used only in rte_flow.c.
> > The real consequence is to set rte_errno in a lot of rte_flow API.
> > Not sure there is a good way to document the code details.
> > Other codes are not documented in rte_flow.h
> 
> First of all it is a behaviour of the flow_ops_get callback and
> driver developers should know that it is a legal to return 0 and
> ops==NULL and know what it means.

The combination of code 0 and NULL ops is not new.
Previously, it was returning ENOSYS.
I've just given it a more meaningful error code, EACCES,
while replacing ENOSYS with ENOTSUP for the other case.

> Second, it is visible as rte_flow_validate() (and other functions
> which use rte_flow_ops_get()) return value value which has
> special meaning. So, should be documented.

Yes, I should update the API doc where ENOSYS was mentioned.
Or probably better: I should keep the error code ENOSYS
and not break the API.
Preference?




[dpdk-dev] [RFC] net/i40e: change the timing of FDIR input set configuration

2021-03-15 Thread Murphy Yang
The FDIR input set should not be configured during flow
validation. It should be configured at flow creation.

Signed-off-by: Murphy Yang 
---
 drivers/net/i40e/i40e_ethdev.h |  1 +
 drivers/net/i40e/i40e_fdir.c   | 88 +++
 drivers/net/i40e/i40e_flow.c   | 95 +++---
 3 files changed, 96 insertions(+), 88 deletions(-)

diff --git a/drivers/net/i40e/i40e_ethdev.h b/drivers/net/i40e/i40e_ethdev.h
index 1e8f5d3a87..c6ec071f44 100644
--- a/drivers/net/i40e/i40e_ethdev.h
+++ b/drivers/net/i40e/i40e_ethdev.h
@@ -631,6 +631,7 @@ struct i40e_fdir_flow_ext {
uint8_t raw_id;
uint8_t is_vf;   /* 1 for VF, 0 for port dev */
uint16_t dst_id; /* VF ID, available when is_vf is 1*/
+   uint64_t input_set;
bool inner_ip;   /* If there is inner ip */
enum i40e_fdir_ip_type iip_type; /* ip type for inner ip */
enum i40e_fdir_ip_type oip_type; /* ip type for outer ip */
diff --git a/drivers/net/i40e/i40e_fdir.c b/drivers/net/i40e/i40e_fdir.c
index c572d003cb..af0c00de04 100644
--- a/drivers/net/i40e/i40e_fdir.c
+++ b/drivers/net/i40e/i40e_fdir.c
@@ -1588,6 +1588,83 @@ i40e_flow_set_fdir_flex_msk(struct i40e_pf *pf,
pf->fdir.flex_mask_flag[pctype] = 1;
 }
 
+static int
+i40e_flow_set_fdir_inset(struct i40e_pf *pf,
+enum i40e_filter_pctype pctype,
+uint64_t input_set)
+{
+   uint32_t mask_reg[I40E_INSET_MASK_NUM_REG] = {0};
+   struct i40e_hw *hw = I40E_PF_TO_HW(pf);
+   uint64_t inset_reg = 0;
+   int i, num;
+
+   /* Check if the input set is valid */
+   if (i40e_validate_input_set(pctype, RTE_ETH_FILTER_FDIR,
+   input_set) != 0) {
+   PMD_DRV_LOG(ERR, "Invalid input set");
+   return -EINVAL;
+   }
+
+   /* Check if the configuration is conflicted */
+   if (pf->fdir.inset_flag[pctype] &&
+   memcmp(&pf->fdir.input_set[pctype], &input_set, sizeof(uint64_t)))
+   return -1;
+
+   if (pf->fdir.inset_flag[pctype] &&
+   !memcmp(&pf->fdir.input_set[pctype], &input_set, sizeof(uint64_t)))
+   return 0;
+
+   num = i40e_generate_inset_mask_reg(input_set, mask_reg,
+  I40E_INSET_MASK_NUM_REG);
+   if (num < 0)
+   return -EINVAL;
+
+   if (pf->support_multi_driver) {
+   for (i = 0; i < num; i++)
+   if (i40e_read_rx_ctl(hw,
+   I40E_GLQF_FD_MSK(i, pctype)) !=
+   mask_reg[i]) {
+   PMD_DRV_LOG(ERR, "Input set setting is not"
+   " supported with"
+   " `support-multi-driver`"
+   " enabled!");
+   return -EPERM;
+   }
+   for (i = num; i < I40E_INSET_MASK_NUM_REG; i++)
+   if (i40e_read_rx_ctl(hw,
+   I40E_GLQF_FD_MSK(i, pctype)) != 0) {
+   PMD_DRV_LOG(ERR, "Input set setting is not"
+   " supported with"
+   " `support-multi-driver`"
+   " enabled!");
+   return -EPERM;
+   }
+
+   } else {
+   for (i = 0; i < num; i++)
+   i40e_check_write_reg(hw, I40E_GLQF_FD_MSK(i, pctype),
+   mask_reg[i]);
+   /*clear unused mask registers of the pctype */
+   for (i = num; i < I40E_INSET_MASK_NUM_REG; i++)
+   i40e_check_write_reg(hw,
+   I40E_GLQF_FD_MSK(i, pctype), 0);
+   }
+
+   inset_reg |= i40e_translate_input_set_reg(hw->mac.type, input_set);
+
+   i40e_check_write_reg(hw, I40E_PRTQF_FD_INSET(pctype, 0),
+(uint32_t)(inset_reg & UINT32_MAX));
+   i40e_check_write_reg(hw, I40E_PRTQF_FD_INSET(pctype, 1),
+(uint32_t)((inset_reg >>
+I40E_32_BIT_WIDTH) & UINT32_MAX));
+
+   I40E_WRITE_FLUSH(hw);
+
+   pf->fdir.input_set[pctype] = input_set;
+   pf->fdir.inset_flag[pctype] = 1;
+   return 0;
+}
+
 static inline unsigned char *
 i40e_find_available_buffer(struct rte_eth_dev *dev)
 {
@@ -1686,6 +1763,17 @@ i40e_flow_add_del_fdir_filter(struct rte_eth_dev *dev,
 
if (add) {
if (filter->input.flow_ext.is_flex_flow) {
+   ret = i40e_flow_set_fdir_inset(pf, pctype,
+   filter->input.flow_ext.input_set);
+   if (ret == -1) {
+
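
The two memcmp() checks in i40e_flow_set_fdir_inset() above compare two uint64_t values, so the decision they encode can be sketched with a plain comparison. The names inset_action and inset_check() are hypothetical, introduced only for this illustration; the real function additionally validates the input set and programs the hardware registers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; outcome of checking a requested input set. */
enum inset_action {
	INSET_PROGRAM,  /* nothing configured yet: program the registers */
	INSET_REUSE,    /* same input set already configured: nothing to do */
	INSET_CONFLICT, /* a different input set is already configured */
};

/*
 * Decide what to do with a requested input set for one pctype,
 * given whether one is configured and what its current value is.
 */
static enum inset_action
inset_check(bool configured, uint64_t current, uint64_t requested)
{
	if (!configured)
		return INSET_PROGRAM;
	return current == requested ? INSET_REUSE : INSET_CONFLICT;
}
```

In the patch, the conflict case corresponds to the -1 return and the reuse case to the early return of 0.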

Re: [dpdk-dev] [PATCH v2 1/2] ethdev: replace callback getting filter operations

2021-03-15 Thread Andrew Rybchenko
On 3/15/21 11:55 AM, Thomas Monjalon wrote:
> 15/03/2021 09:43, Andrew Rybchenko:
>> On 3/15/21 10:54 AM, Thomas Monjalon wrote:
>>> 15/03/2021 08:18, Andrew Rybchenko:
>>>> On 3/12/21 8:46 PM, Thomas Monjalon wrote:
>>>>> --- a/lib/librte_ethdev/rte_flow.c
>>>>> +++ b/lib/librte_ethdev/rte_flow.c
>>>>> @@ -255,18 +255,19 @@ rte_flow_ops_get(uint16_t port_id, struct rte_flow_error *error)
>>>>>
>>>>>   if (unlikely(!rte_eth_dev_is_valid_port(port_id)))
>>>>>   code = ENODEV;
>>>>> - else if (unlikely(!dev->dev_ops->filter_ctrl ||
>>>>> -   dev->dev_ops->filter_ctrl(dev,
>>>>> - RTE_ETH_FILTER_GENERIC,
>>>>> - RTE_ETH_FILTER_GET,
>>>>> - &ops) ||
>>>>> -   !ops))
>>>>> - code = ENOSYS;
>>>>> + else if (unlikely(dev->dev_ops->flow_ops_get == NULL))
>>>>> + code = ENOTSUP;

It is described as:
   -ENOTSUP: valid but unsupported rule specification (e.g.
   partial bit-masks are unsupported).
So, it looks different. Maybe it is really better to keep
ENOSYS.

>>>>>   else
>>>>> - return ops;
>>>>> - rte_flow_error_set(error, code, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
>>>>> -NULL, rte_strerror(code));
>>>>> - return NULL;
>>>>> + code = dev->dev_ops->flow_ops_get(dev, &ops);
>>>>> + if (code == 0 && ops == NULL)
>>>>> + code = EACCES;
>>>> It looks something new. I think it should be mentioned in flow_ops_get
>>>> type documentation (similar to eth_promiscuous_enable_t) and
>>>> rte_flow_validate() etc functions
>>>> return values description.
>>>
>>> It is an internal function used only in rte_flow.c.
>>> The real consequence is to set rte_errno in a lot of rte_flow API.
>>> Not sure there is a good way to document the code details.
>>> Other codes are not documented in rte_flow.h
>>
>> First of all it is a behaviour of the flow_ops_get callback and
>> driver developers should know that it is a legal to return 0 and
>> ops==NULL and know what it means.
> 
> The combination code 0 and ops NULL is not new.
> Previously, it was returning ENOSYS.
> I've just given a more meaningful error code: EACCES,
> while replacing ENOSYS with ENOTSUP for the other case.

Yes, exactly. What I'm trying to say is that it would be
helpful to make it a bit more transparent to PMD developers.
Yes, it was not documented before, I agree. I think it is
a good time to improve the documentation.

>> Second, it is visible as rte_flow_validate() (and other functions
>> which use rte_flow_ops_get()) return value value which has
>> special meaning. So, should be documented.
> 
> Yes, I should update the API doc where ENOSYS was mentioned.
> Or probably better: I should keep the error code ENOSYS
> and do not break API.
> Preference?

Good question. I think we should not distinguish between a NULL
callback and NULL ops returned by a non-NULL callback. So, I think
keeping ENOSYS is the best option here.


Re: [dpdk-dev] [PATCH v2 1/2] ethdev: replace callback getting filter operations

2021-03-15 Thread Thomas Monjalon
15/03/2021 10:08, Andrew Rybchenko:
> On 3/15/21 11:55 AM, Thomas Monjalon wrote:
> > 15/03/2021 09:43, Andrew Rybchenko:
> >> On 3/15/21 10:54 AM, Thomas Monjalon wrote:
> >>> 15/03/2021 08:18, Andrew Rybchenko:
> >>>> On 3/12/21 8:46 PM, Thomas Monjalon wrote:
> >>>>> --- a/lib/librte_ethdev/rte_flow.c
> >>>>> +++ b/lib/librte_ethdev/rte_flow.c
> >>>>> @@ -255,18 +255,19 @@ rte_flow_ops_get(uint16_t port_id, struct rte_flow_error *error)
> >>>>>
> >>>>> if (unlikely(!rte_eth_dev_is_valid_port(port_id)))
> >>>>> code = ENODEV;
> >>>>> -   else if (unlikely(!dev->dev_ops->filter_ctrl ||
> >>>>> - dev->dev_ops->filter_ctrl(dev,
> >>>>> -   RTE_ETH_FILTER_GENERIC,
> >>>>> -   RTE_ETH_FILTER_GET,
> >>>>> -   &ops) ||
> >>>>> - !ops))
> >>>>> -   code = ENOSYS;
> >>>>> +   else if (unlikely(dev->dev_ops->flow_ops_get == NULL))
> >>>>> +   code = ENOTSUP;
> 
> It is described as:
>-ENOTSUP: valid but unsupported rule specification (e.g.
>partial bit-masks are unsupported).
> So, it looks different. May be it is really better to keep
> ENOSYS.
> 
> >>>>> else
> >>>>> -   return ops;
> >>>>> -   rte_flow_error_set(error, code, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
> >>>>> -  NULL, rte_strerror(code));
> >>>>> -   return NULL;
> >>>>> +   code = dev->dev_ops->flow_ops_get(dev, &ops);
> >>>>> +   if (code == 0 && ops == NULL)
> >>>>> +   code = EACCES;
> >>>> It looks something new. I think it should be mentioned in flow_ops_get
> >>>> type documentation (similar to eth_promiscuous_enable_t) and
> >>>> rte_flow_validate() etc functions
> >>>> return values description.
> >>>
> >>> It is an internal function used only in rte_flow.c.
> >>> The real consequence is to set rte_errno in a lot of rte_flow API.
> >>> Not sure there is a good way to document the code details.
> >>> Other codes are not documented in rte_flow.h
> >>
> >> First of all it is a behaviour of the flow_ops_get callback and
> >> driver developers should know that it is a legal to return 0 and
> >> ops==NULL and know what it means.
> > 
> > The combination code 0 and ops NULL is not new.
> > Previously, it was returning ENOSYS.
> > I've just given a more meaningful error code: EACCES,
> > while replacing ENOSYS with ENOTSUP for the other case.
> 
> Yes, exactly. What I'm trying to say that it would be
> helpful to make it a bit more transparent to PMD developers.
> Yes, it was not documented before, I agree. I think it is
> a good time to improve documentation.
> 
> >> Second, it is visible as rte_flow_validate() (and other functions
> >> which use rte_flow_ops_get()) return value value which has
> >> special meaning. So, should be documented.
> > 
> > Yes, I should update the API doc where ENOSYS was mentioned.
> > Or probably better: I should keep the error code ENOSYS
> > and do not break API.
> > Preference?
> 
> Good question. I think we should not distinguish NULL callback
> and NULL ops returned by not-NULL callback. So, I think
> keeping ENOSYS is the best option here.

OK, thank you for the review.
So the conclusion is: keep ENOSYS and document NULL ops case.
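
The agreed behaviour can be sketched as follows. This is a minimal sketch with stand-in types, not the real ethdev code: struct flow_ops, struct dev_ops, struct eth_dev, resolve_flow_ops() and the two example callbacks are hypothetical names for illustration; only the flow_ops_get callback name and the error codes come from the thread. The normalization of a negative callback return is also an assumption.

```c
#include <errno.h>
#include <stddef.h>

/* Stand-in types: hypothetical, not the real ethdev definitions. */
struct flow_ops { int unused; };
struct eth_dev;
struct dev_ops {
	int (*flow_ops_get)(struct eth_dev *dev, const struct flow_ops **ops);
};
struct eth_dev { const struct dev_ops *dev_ops; };

/*
 * Return the flow ops, or NULL with a positive errno value in *code.
 * A missing callback and a callback that succeeds with ops == NULL
 * both yield ENOSYS, as concluded in the thread.
 */
static const struct flow_ops *
resolve_flow_ops(struct eth_dev *dev, int *code)
{
	const struct flow_ops *ops = NULL;

	if (dev->dev_ops->flow_ops_get == NULL) {
		*code = ENOSYS;
		return NULL;
	}
	*code = dev->dev_ops->flow_ops_get(dev, &ops);
	if (*code == 0 && ops == NULL)
		*code = ENOSYS;
	if (*code < 0)
		*code = -*code; /* assumption: callbacks return negative errno */
	return *code == 0 ? ops : NULL;
}

/* Example callback: flow API unavailable for this port, returns 0 and NULL ops. */
static int
cb_no_ops(struct eth_dev *dev, const struct flow_ops **ops)
{
	(void)dev;
	*ops = NULL;
	return 0;
}

/* Example callback: flow API available. */
static const struct flow_ops example_ops = { 0 };

static int
cb_with_ops(struct eth_dev *dev, const struct flow_ops **ops)
{
	(void)dev;
	*ops = &example_ops;
	return 0;
}
```

With this shape, a caller cannot tell a missing callback from a callback returning NULL ops, which matches the preference stated above.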




[dpdk-dev] [PATCH 0/2] support block cipher DIGEST_ENCRYPTED mode

2021-03-15 Thread Tejasree Kondoj
This series adds support for block cipher DIGEST_ENCRYPTED mode in
OCTEON TX, OCTEON TX2 PMDs and sample unit test application.

Tejasree Kondoj (2):
  common/cpt: support DIGEST_ENCRYPTED mode
  test/crypto: support block cipher DIGEST_ENCRYPTED mode

 app/test/test_cryptodev_aes_test_vectors.h| 589 ++
 app/test/test_cryptodev_blockcipher.c |  95 ++-
 app/test/test_cryptodev_blockcipher.h |  10 +
 doc/guides/cryptodevs/features/octeontx.ini   |   1 +
 doc/guides/cryptodevs/features/octeontx2.ini  |   1 +
 doc/guides/rel_notes/release_21_05.rst|   8 +
 drivers/common/cpt/cpt_mcode_defines.h|   7 +-
 drivers/common/cpt/cpt_ucode.h|  42 +-
 drivers/crypto/octeontx/otx_cryptodev_ops.c   |  11 +-
 drivers/crypto/octeontx2/otx2_cryptodev.c |   3 +-
 drivers/crypto/octeontx2/otx2_cryptodev_ops.c |   8 +-
 11 files changed, 749 insertions(+), 26 deletions(-)

-- 
2.27.0



[dpdk-dev] [PATCH 1/2] common/cpt: support DIGEST_ENCRYPTED mode

2021-03-15 Thread Tejasree Kondoj
Adding support for DIGEST_ENCRYPTED mode.

Signed-off-by: Tejasree Kondoj 
---
 doc/guides/cryptodevs/features/octeontx.ini   |  1 +
 doc/guides/cryptodevs/features/octeontx2.ini  |  1 +
 doc/guides/rel_notes/release_21_05.rst|  8 
 drivers/common/cpt/cpt_mcode_defines.h|  7 +++-
 drivers/common/cpt/cpt_ucode.h| 42 +++
 drivers/crypto/octeontx/otx_cryptodev_ops.c   | 11 +++--
 drivers/crypto/octeontx2/otx2_cryptodev.c |  3 +-
 drivers/crypto/octeontx2/otx2_cryptodev_ops.c |  8 +++-
 8 files changed, 67 insertions(+), 14 deletions(-)

diff --git a/doc/guides/cryptodevs/features/octeontx.ini b/doc/guides/cryptodevs/features/octeontx.ini
index 10d94e3f7b..d9776a5788 100644
--- a/doc/guides/cryptodevs/features/octeontx.ini
+++ b/doc/guides/cryptodevs/features/octeontx.ini
@@ -13,6 +13,7 @@ OOP SGL In LB  Out = Y
 OOP SGL In SGL Out = Y
 OOP LB  In LB  Out = Y
 RSA PRIV OP KEY QT = Y
+Digest encrypted   = Y
 Symmetric sessionless  = Y
 
 ;
diff --git a/doc/guides/cryptodevs/features/octeontx2.ini b/doc/guides/cryptodevs/features/octeontx2.ini
index b0d50ce984..66c5fefde6 100644
--- a/doc/guides/cryptodevs/features/octeontx2.ini
+++ b/doc/guides/cryptodevs/features/octeontx2.ini
@@ -14,6 +14,7 @@ OOP SGL In LB  Out = Y
 OOP SGL In SGL Out = Y
 OOP LB  In LB  Out = Y
 RSA PRIV OP KEY QT = Y
+Digest encrypted   = Y
 Symmetric sessionless  = Y
 
 ;
diff --git a/doc/guides/rel_notes/release_21_05.rst b/doc/guides/rel_notes/release_21_05.rst
index 23f7f0bff9..d7c65091a9 100644
--- a/doc/guides/rel_notes/release_21_05.rst
+++ b/doc/guides/rel_notes/release_21_05.rst
@@ -65,6 +65,14 @@ New Features
 
   * Added support for txgbevf PMD.
 
+* **Updated the OCTEON TX crypto PMD.**
+
+  * Added support for DIGEST_ENCRYPTED mode in OCTEON TX crypto PMD.
+
+* **Updated the OCTEON TX2 crypto PMD.**
+
+  * Added support for DIGEST_ENCRYPTED mode in OCTEON TX2 crypto PMD.
+
 * **Updated testpmd.**
 
   * Added command to display Rx queue used descriptor count.
diff --git a/drivers/common/cpt/cpt_mcode_defines.h b/drivers/common/cpt/cpt_mcode_defines.h
index 56a745f419..624bdcf3cf 100644
--- a/drivers/common/cpt/cpt_mcode_defines.h
+++ b/drivers/common/cpt/cpt_mcode_defines.h
@@ -20,6 +20,9 @@
 #define CPT_MAJOR_OP_ZUC_SNOW3G0x37
 #define CPT_MAJOR_OP_KASUMI0x38
 #define CPT_MAJOR_OP_MISC  0x01
+#define CPT_HMAC_FIRST_BIT_POS 0x4
+#define CPT_FC_MINOR_OP_ENCRYPT0x0
+#define CPT_FC_MINOR_OP_DECRYPT0x1
 
 /* AE opcodes */
 #define CPT_MAJOR_OP_MODEX 0x03
@@ -314,8 +317,10 @@ struct cpt_ctx {
uint64_t hmac   :1;
uint64_t zsk_flags  :3;
uint64_t k_ecb  :1;
+   uint64_t auth_enc   :1;
+   uint64_t dec_auth   :1;
uint64_t snow3g :2;
-   uint64_t rsvd   :21;
+   uint64_t rsvd   :19;
/* Below fields are accessed by hardware */
union {
mc_fc_context_t fctx;
diff --git a/drivers/common/cpt/cpt_ucode.h b/drivers/common/cpt/cpt_ucode.h
index 0536620710..ee6d49aae7 100644
--- a/drivers/common/cpt/cpt_ucode.h
+++ b/drivers/common/cpt/cpt_ucode.h
@@ -752,7 +752,9 @@ cpt_enc_hmac_prep(uint32_t flags,
 
/* Encryption */
vq_cmd_w0.s.opcode.major = CPT_MAJOR_OP_FC;
-   vq_cmd_w0.s.opcode.minor = 0;
+   vq_cmd_w0.s.opcode.minor = CPT_FC_MINOR_OP_ENCRYPT;
+   vq_cmd_w0.s.opcode.minor |= (cpt_ctx->auth_enc <<
+   CPT_HMAC_FIRST_BIT_POS);
 
if (hash_type == GMAC_TYPE) {
encr_offset = 0;
@@ -779,6 +781,9 @@ cpt_enc_hmac_prep(uint32_t flags,
outputlen = enc_dlen + mac_len;
}
 
+   if (cpt_ctx->auth_enc != 0)
+   outputlen = enc_dlen;
+
/* GP op header */
vq_cmd_w0.s.param1 = encr_data_len;
vq_cmd_w0.s.param2 = auth_data_len;
@@ -1112,7 +1117,9 @@ cpt_dec_hmac_prep(uint32_t flags,
 
/* Decryption */
vq_cmd_w0.s.opcode.major = CPT_MAJOR_OP_FC;
-   vq_cmd_w0.s.opcode.minor = 1;
+   vq_cmd_w0.s.opcode.minor = CPT_FC_MINOR_OP_DECRYPT;
+   vq_cmd_w0.s.opcode.minor |= (cpt_ctx->dec_auth <<
+   CPT_HMAC_FIRST_BIT_POS);
 
if (hash_type == GMAC_TYPE) {
encr_offset = 0;
@@ -1130,6 +1137,9 @@ cpt_dec_hmac_prep(uint32_t flags,
outputlen = enc_dlen;
}
 
+   if (cpt_ctx->dec_auth != 0)
+   outputlen = inputlen = enc_dlen;
+
vq_cmd_w0.s.param1 = encr_data_len;
vq_cmd_w0.s.param2 = auth_data_len;
 
@@ -2566,6 +2576,7 @@ fill_sess_cipher(struct rte_crypto_sym_xform *xform,
 struct cpt_sess_misc *sess)
 {
struct rte_crypto_cipher_xform *c_form;
+   struct cpt_ctx *ctx = SESS_PRIV(sess);
cipher_type_t enc_type = 0; /* NULL Cipher type */
uint32_t cipher_key_len = 0;
ui
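
The minor opcode composition used in cpt_enc_hmac_prep() and cpt_dec_hmac_prep() above can be sketched on its own. The three macro values are taken from the patch; fc_minor_opcode() and its parameter names are hypothetical, and reading bit 4 as a "digest before cipher" selector is an inference from the macro name CPT_HMAC_FIRST_BIT_POS rather than something the patch states.

```c
#include <stdint.h>

/* Values as defined in the patch to cpt_mcode_defines.h. */
#define CPT_HMAC_FIRST_BIT_POS  0x4
#define CPT_FC_MINOR_OP_ENCRYPT 0x0
#define CPT_FC_MINOR_OP_DECRYPT 0x1

/*
 * Hypothetical helper: combine the direction (encrypt or decrypt)
 * with the one-bit flag (auth_enc or dec_auth in struct cpt_ctx)
 * shifted to bit position 4.
 */
static uint8_t
fc_minor_opcode(uint8_t direction, unsigned int hmac_first)
{
	return (uint8_t)(direction |
			 ((hmac_first & 1u) << CPT_HMAC_FIRST_BIT_POS));
}
```

The two new one-bit fields in struct cpt_ctx also explain why the reserved field shrinks from 21 to 19 bits.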

[dpdk-dev] [PATCH 2/2] test/crypto: support block cipher DIGEST_ENCRYPTED mode

2021-03-15 Thread Tejasree Kondoj
Adding support for block cipher DIGEST_ENCRYPTED mode.

Signed-off-by: Tejasree Kondoj 
---
 app/test/test_cryptodev_aes_test_vectors.h | 589 +
 app/test/test_cryptodev_blockcipher.c  |  95 +++-
 app/test/test_cryptodev_blockcipher.h  |  10 +
 3 files changed, 682 insertions(+), 12 deletions(-)

diff --git a/app/test/test_cryptodev_aes_test_vectors.h b/app/test/test_cryptodev_aes_test_vectors.h
index c192d75a7e..7755b271c2 100644
--- a/app/test/test_cryptodev_aes_test_vectors.h
+++ b/app/test/test_cryptodev_aes_test_vectors.h
@@ -1093,6 +1093,172 @@ static const uint8_t ciphertext512_aes128cbc_aad[] = {
0x73, 0x65, 0x72, 0x73, 0x2C, 0x20, 0x73, 0x75
 };
 
+static const uint8_t plaintext_aes_common_digest_enc[] = {
+   0x57, 0x68, 0x61, 0x74, 0x20, 0x61, 0x20, 0x6C,
+   0x6F, 0x75, 0x73, 0x79, 0x20, 0x65, 0x61, 0x72,
+   0x74, 0x68, 0x21, 0x20, 0x48, 0x65, 0x20, 0x77,
+   0x6F, 0x6E, 0x64, 0x65, 0x72, 0x65, 0x64, 0x20,
+   0x68, 0x6F, 0x77, 0x20, 0x6D, 0x61, 0x6E, 0x79,
+   0x20, 0x70, 0x65, 0x6F, 0x70, 0x6C, 0x65, 0x20,
+   0x77, 0x65, 0x72, 0x65, 0x20, 0x64, 0x65, 0x73,
+   0x74, 0x69, 0x74, 0x75, 0x74, 0x65, 0x20, 0x74,
+   0x68, 0x61, 0x74, 0x20, 0x73, 0x61, 0x6D, 0x65,
+   0x20, 0x6E, 0x69, 0x67, 0x68, 0x74, 0x20, 0x65,
+   0x76, 0x65, 0x6E, 0x20, 0x69, 0x6E, 0x20, 0x68,
+   0x69, 0x73, 0x20, 0x6F, 0x77, 0x6E, 0x20, 0x70,
+   0x72, 0x6F, 0x73, 0x70, 0x65, 0x72, 0x6F, 0x75,
+   0x73, 0x20, 0x63, 0x6F, 0x75, 0x6E, 0x74, 0x72,
+   0x79, 0x2C, 0x20, 0x68, 0x6F, 0x77, 0x20, 0x6D,
+   0x61, 0x6E, 0x79, 0x20, 0x68, 0x6F, 0x6D, 0x65,
+   0x73, 0x20, 0x77, 0x65, 0x72, 0x65, 0x20, 0x73,
+   0x68, 0x61, 0x6E, 0x74, 0x69, 0x65, 0x73, 0x2C,
+   0x20, 0x68, 0x6F, 0x77, 0x20, 0x6D, 0x61, 0x6E,
+   0x79, 0x20, 0x68, 0x75, 0x73, 0x62, 0x61, 0x6E,
+   0x64, 0x73, 0x20, 0x77, 0x65, 0x72, 0x65, 0x20,
+   0x64, 0x72, 0x75, 0x6E, 0x6B, 0x20, 0x61, 0x6E,
+   0x64, 0x20, 0x77, 0x69, 0x76, 0x65, 0x73, 0x20,
+   0x73, 0x6F, 0x63, 0x6B, 0x65, 0x64, 0x2C, 0x20,
+   0x61, 0x6E, 0x64, 0x20, 0x68, 0x6F, 0x77, 0x20,
+   0x6D, 0x61, 0x6E, 0x79, 0x20, 0x63, 0x68, 0x69,
+   0x6C, 0x64, 0x72, 0x65, 0x6E, 0x20, 0x77, 0x65,
+   0x72, 0x65, 0x20, 0x62, 0x75, 0x6C, 0x6C, 0x69,
+   0x65, 0x64, 0x2C, 0x20, 0x61, 0x62, 0x75, 0x73,
+   0x65, 0x64, 0x2C, 0x20, 0x6F, 0x72, 0x20, 0x61,
+   0x62, 0x61, 0x6E, 0x64, 0x6F, 0x6E, 0x65, 0x64,
+   0x2E, 0x20, 0x48, 0x6F, 0x77, 0x20, 0x6D, 0x61,
+   0x6E, 0x79, 0x20, 0x66, 0x61, 0x6D, 0x69, 0x6C,
+   0x69, 0x65, 0x73, 0x20, 0x68, 0x75, 0x6E, 0x67,
+   0x65, 0x72, 0x65, 0x64, 0x20, 0x66, 0x6F, 0x72,
+   0x20, 0x66, 0x6F, 0x6F, 0x64, 0x20, 0x74, 0x68,
+   0x65, 0x79, 0x20, 0x63, 0x6F, 0x75, 0x6C, 0x64,
+   0x20, 0x6E, 0x6F, 0x74, 0x20, 0x61, 0x66, 0x66,
+   0x6F, 0x72, 0x64, 0x20, 0x74, 0x6F, 0x20, 0x62,
+   0x75, 0x79, 0x3F, 0x20, 0x48, 0x6F, 0x77, 0x20,
+   0x6D, 0x61, 0x6E, 0x79, 0x20, 0x68, 0x65, 0x61,
+   0x72, 0x74, 0x73, 0x20, 0x77, 0x65, 0x72, 0x65,
+   0x20, 0x62, 0x72, 0x6F, 0x6B, 0x65, 0x6E, 0x3F,
+   0x20, 0x48, 0x6F, 0x77, 0x20, 0x6D, 0x61, 0x6E,
+   0x79, 0x20, 0x73, 0x75, 0x69, 0x63, 0x69, 0x64,
+   0x65, 0x73, 0x20, 0x77, 0x6F, 0x75, 0x6C, 0x64,
+   0x20, 0x74, 0x61, 0x6B, 0x65, 0x20, 0x70, 0x6C,
+   0x61, 0x63, 0x65, 0x20, 0x74, 0x68, 0x61, 0x74,
+   0x20, 0x73, 0x61, 0x6D, 0x65, 0x20, 0x6E, 0x69,
+   0x67, 0x68, 0x74, 0x2C, 0x20, 0x68, 0x6F, 0x77,
+   0x20, 0x6D, 0x61, 0x6E, 0x79, 0x20, 0x70, 0x65,
+   0x6F, 0x70, 0x6C, 0x65, 0x20, 0x77, 0x6F, 0x75,
+   0x6C, 0x64, 0x20, 0x67, 0x6F, 0x20, 0x69, 0x6E,
+   0x73, 0x61, 0x6E, 0x65, 0x3F, 0x20, 0x48, 0x6F,
+   0x77, 0x20, 0x6D, 0x61, 0x6E, 0x79, 0x20, 0x63,
+   0x6F, 0x63, 0x6B, 0x72, 0x6F, 0x61, 0x63, 0x68,
+   0x65, 0x73, 0x20, 0x61, 0x6E, 0x64, 0x20, 0x6C,
+   0x61, 0x6E, 0x64, 0x6C, 0x6F, 0x72, 0x64, 0x73,
+   0x20, 0x77, 0x6F, 0x75, 0x6C, 0x64, 0x20, 0x74,
+   0x72, 0x69, 0x75, 0x6D, 0x70, 0x68, 0x3F, 0x20,
+   0x48, 0x6F, 0x77, 0x20, 0x6D, 0x61, 0x6E, 0x79,
+   0x20, 0x77, 0x69, 0x6E, 0x6E, 0x65, 0x72, 0x73,
+   0x20, 0x77, 0x65, 0x72, 0x65, 0x20, 0x6c, 0x6f,
+   0x73, 0x65, 0x72, 0x73, 0x2c, 0x20, 0x73, 0x75,
+   /* mac */
+   0xC4, 0xB7, 0x0E, 0x6B, 0xDE, 0xD1, 0xE7, 0x77,
+   0x7E, 0x2E, 0x8F, 0xFC, 0x48, 0x39, 0x46, 0x17,
+   0x3F, 0x91, 0x64, 0x59
+};
+
+static const uint8_t ciphertext512_aes128cbc_digest_enc[] = {
+   0x8B, 0x4D, 0xDA, 0x1B, 0xCF, 0x04, 0xA0, 0x31,
+   0xB4, 0xBF, 0xBD, 0x68, 0x43, 0x20, 0x7E, 0x76,
+   0xB1, 0x96, 0x8B, 0xA2, 0x7C, 0xA2, 0x83, 0x9E,
+   0x39, 0x5A, 0x2F, 0x7E, 0x92, 0xB4, 0x48, 0x1A,
+   0x3F, 0x6B, 0x5D, 0xDF, 0x52, 0x85, 0x5F, 0x8E,
+   0x42, 0x3C, 0xFB, 0xE9, 0x1A, 0x24, 0xD6, 0x08,
+   0xDD, 0xFD, 0x16, 0xFB, 0xE9, 0x55, 0xEF, 0xF0,
+   0xA0, 0x8D, 0x13, 0xAB, 

[dpdk-dev] [PATCH] crypto/octeontx2: remove redundant code

2021-03-15 Thread Tejasree Kondoj
Removing redundant field in a union.

Signed-off-by: Tejasree Kondoj 
---
 drivers/crypto/octeontx2/otx2_ipsec_po.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/crypto/octeontx2/otx2_ipsec_po.h b/drivers/crypto/octeontx2/otx2_ipsec_po.h
index 8a672a38ea..eda9f19738 100644
--- a/drivers/crypto/octeontx2/otx2_ipsec_po.h
+++ b/drivers/crypto/octeontx2/otx2_ipsec_po.h
@@ -203,7 +203,6 @@ struct otx2_ipsec_po_out_sa {
 
/* w8-w55 */
union {
-   uint8_t raw[384];
struct {
struct otx2_ipsec_po_ip_template template;
} aes_gcm;
-- 
2.27.0



[dpdk-dev] [PATCH 0/3] add lookaside IPsec UDP encapsulation and transport mode

2021-03-15 Thread Tejasree Kondoj
This series adds lookaside IPsec UDP encapsulation and transport mode support.
The functionality has been tested with ipsec-secgw application running in
lookaside protocol offload mode.

Tejasree Kondoj (3):
  crypto/octeontx2: add UDP encapsulation support
  examples/ipsec-secgw: add UDP encapsulation support
  crypto/octeontx2: support lookaside IPv4 transport mode

 doc/guides/cryptodevs/octeontx2.rst   |   2 +
 doc/guides/rel_notes/release_21_05.rst|  10 ++
 doc/guides/sample_app_ug/ipsec_secgw.rst  |   5 +-
 drivers/crypto/octeontx2/otx2_cryptodev_ops.c |   7 +-
 drivers/crypto/octeontx2/otx2_cryptodev_sec.c | 126 --
 drivers/crypto/octeontx2/otx2_cryptodev_sec.h |   4 +-
 drivers/crypto/octeontx2/otx2_ipsec_po.h  |   6 +
 drivers/crypto/octeontx2/otx2_ipsec_po_ops.h  |   8 +-
 examples/ipsec-secgw/ipsec-secgw.c|  33 -
 examples/ipsec-secgw/ipsec-secgw.h|   2 +
 examples/ipsec-secgw/ipsec.c  |   1 +
 examples/ipsec-secgw/ipsec.h  |   1 +
 examples/ipsec-secgw/sad.h|   5 +-
 13 files changed, 130 insertions(+), 80 deletions(-)

-- 
2.27.0



[dpdk-dev] [PATCH 1/3] crypto/octeontx2: add UDP encapsulation support

2021-03-15 Thread Tejasree Kondoj
Adding UDP encapsulation support for IPsec in
lookaside protocol mode.

Signed-off-by: Tejasree Kondoj 
---
 doc/guides/cryptodevs/octeontx2.rst   |  1 +
 doc/guides/rel_notes/release_21_05.rst|  5 +++
 drivers/crypto/octeontx2/otx2_cryptodev_sec.c | 40 ++-
 3 files changed, 18 insertions(+), 28 deletions(-)

diff --git a/doc/guides/cryptodevs/octeontx2.rst 
b/doc/guides/cryptodevs/octeontx2.rst
index d312eeb74c..b30f98180a 100644
--- a/doc/guides/cryptodevs/octeontx2.rst
+++ b/doc/guides/cryptodevs/octeontx2.rst
@@ -181,6 +181,7 @@ Features supported
 * Tunnel mode
 * ESN
 * Anti-replay
+* UDP Encapsulation
 * AES-128/192/256-GCM
 * AES-128/192/256-CBC-SHA1-HMAC
 * AES-128/192/256-CBC-SHA256-128-HMAC
diff --git a/doc/guides/rel_notes/release_21_05.rst 
b/doc/guides/rel_notes/release_21_05.rst
index 23f7f0bff9..66e28e21be 100644
--- a/doc/guides/rel_notes/release_21_05.rst
+++ b/doc/guides/rel_notes/release_21_05.rst
@@ -65,6 +65,11 @@ New Features
 
   * Added support for txgbevf PMD.
 
+* **Updated the OCTEON TX2 crypto PMD.**
+
+  * Updated the OCTEON TX2 crypto PMD lookaside protocol offload for IPsec with
+UDP encapsulation support for NAT Traversal.
+
 * **Updated testpmd.**
 
   * Added command to display Rx queue used descriptor count.
diff --git a/drivers/crypto/octeontx2/otx2_cryptodev_sec.c 
b/drivers/crypto/octeontx2/otx2_cryptodev_sec.c
index 342f089df8..8942ff1fac 100644
--- a/drivers/crypto/octeontx2/otx2_cryptodev_sec.c
+++ b/drivers/crypto/octeontx2/otx2_cryptodev_sec.c
@@ -203,6 +203,7 @@ crypto_sec_ipsec_outb_session_create(struct rte_cryptodev 
*crypto_dev,
 struct rte_security_session *sec_sess)
 {
struct rte_crypto_sym_xform *auth_xform, *cipher_xform;
+   struct otx2_ipsec_po_ip_template *template;
const uint8_t *cipher_key, *auth_key;
struct otx2_sec_session_ipsec_lp *lp;
struct otx2_ipsec_po_sa_ctl *ctl;
@@ -248,11 +249,7 @@ crypto_sec_ipsec_outb_session_create(struct rte_cryptodev 
*crypto_dev,
if (ipsec->tunnel.type == RTE_SECURITY_IPSEC_TUNNEL_IPV4) {
 
if (ctl->enc_type == OTX2_IPSEC_PO_SA_ENC_AES_GCM) {
-   if (ipsec->options.udp_encap) {
-   sa->aes_gcm.template.ip4.udp_src = 4500;
-   sa->aes_gcm.template.ip4.udp_dst = 4500;
-   }
-   ip = &sa->aes_gcm.template.ip4.ipv4_hdr;
+   template = &sa->aes_gcm.template;
ctx_len = offsetof(struct otx2_ipsec_po_out_sa,
aes_gcm.template) + sizeof(
sa->aes_gcm.template.ip4);
@@ -260,11 +257,7 @@ crypto_sec_ipsec_outb_session_create(struct rte_cryptodev 
*crypto_dev,
lp->ctx_len = ctx_len >> 3;
} else if (ctl->auth_type ==
OTX2_IPSEC_PO_SA_AUTH_SHA1) {
-   if (ipsec->options.udp_encap) {
-   sa->sha1.template.ip4.udp_src = 4500;
-   sa->sha1.template.ip4.udp_dst = 4500;
-   }
-   ip = &sa->sha1.template.ip4.ipv4_hdr;
+   template = &sa->sha1.template;
ctx_len = offsetof(struct otx2_ipsec_po_out_sa,
sha1.template) + sizeof(
sa->sha1.template.ip4);
@@ -272,11 +265,7 @@ crypto_sec_ipsec_outb_session_create(struct rte_cryptodev 
*crypto_dev,
lp->ctx_len = ctx_len >> 3;
} else if (ctl->auth_type ==
OTX2_IPSEC_PO_SA_AUTH_SHA2_256) {
-   if (ipsec->options.udp_encap) {
-   sa->sha2.template.ip4.udp_src = 4500;
-   sa->sha2.template.ip4.udp_dst = 4500;
-   }
-   ip = &sa->sha2.template.ip4.ipv4_hdr;
+   template = &sa->sha2.template;
ctx_len = offsetof(struct otx2_ipsec_po_out_sa,
sha2.template) + sizeof(
sa->sha2.template.ip4);
@@ -285,8 +274,15 @@ crypto_sec_ipsec_outb_session_create(struct rte_cryptodev 
*crypto_dev,
} else {
return -EINVAL;
}
+   ip = &template->ip4.ipv4_hdr;
+   if (ipsec->options.udp_encap) {
+   ip->next_proto_id = IPPROTO_UD

[dpdk-dev] [PATCH 2/3] examples/ipsec-secgw: add UDP encapsulation support

2021-03-15 Thread Tejasree Kondoj
Adding lookaside IPsec UDP encapsulation support
for NAT traversal.
Added --udp-encap option for the application to specify
whether UDP encapsulation needs to be enabled.
Example secgw command with UDP encapsulation enabled:
 -c 0x1 -- -P -p 0x1 --config "(0,0,0)" -f ep0.cfg --udp-encap
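
For readers unfamiliar with NAT traversal, the classification step this
option enables can be sketched in plain C. This is only an illustration
and not the code from this patch: the function name is_nat_t_packet, the
raw-buffer interface and the choice to match on either source or
destination port are assumptions. The only fixed fact is that IPsec
NAT-T traffic uses UDP port 4500 (the registered ipsec-nat-t port).

```c
#include <stddef.h>
#include <stdint.h>

#define NAT_T_PORT 4500 /* registered ipsec-nat-t UDP port */
#define PROTO_UDP  17   /* IPv4 protocol number for UDP */

/*
 * Hypothetical helper: return 1 if buf starts with an IPv4 header
 * followed by a UDP header whose source or destination port is 4500.
 */
int is_nat_t_packet(const uint8_t *buf, size_t len)
{
	size_t ihl;
	uint16_t src, dst;

	/* Need at least a minimal IPv4 header, and version must be 4. */
	if (buf == NULL || len < 20 || (buf[0] >> 4) != 4)
		return 0;

	/* IHL is counted in 32-bit words; convert to bytes. */
	ihl = (size_t)(buf[0] & 0x0f) * 4;
	if (ihl < 20 || len < ihl + 8 || buf[9] != PROTO_UDP)
		return 0;

	/* UDP ports are big-endian on the wire. */
	src = (uint16_t)((buf[ihl] << 8) | buf[ihl + 1]);
	dst = (uint16_t)((buf[ihl + 2] << 8) | buf[ihl + 3]);

	return src == NAT_T_PORT || dst == NAT_T_PORT;
}
```

Deriving the UDP offset from the IHL field rather than assuming a
20-byte header keeps the check correct for packets with IPv4 options.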

Signed-off-by: Tejasree Kondoj 
---
 doc/guides/rel_notes/release_21_05.rst   |  5 
 doc/guides/sample_app_ug/ipsec_secgw.rst |  5 +++-
 examples/ipsec-secgw/ipsec-secgw.c   | 33 ++--
 examples/ipsec-secgw/ipsec-secgw.h   |  2 ++
 examples/ipsec-secgw/ipsec.c |  1 +
 examples/ipsec-secgw/ipsec.h |  1 +
 examples/ipsec-secgw/sad.h   |  5 +++-
 7 files changed, 48 insertions(+), 4 deletions(-)

diff --git a/doc/guides/rel_notes/release_21_05.rst 
b/doc/guides/rel_notes/release_21_05.rst
index 66e28e21be..2e67038bfe 100644
--- a/doc/guides/rel_notes/release_21_05.rst
+++ b/doc/guides/rel_notes/release_21_05.rst
@@ -75,6 +75,11 @@ New Features
   * Added command to display Rx queue used descriptor count.
 ``show port (port_id) rxq (queue_id) desc used count``
 
+* **Updated ipsec-secgw sample application.**
+
+  * Updated the ``ipsec-secgw`` sample application with UDP encapsulation
+support for NAT Traversal.
+
 
 Removed Items
 -
diff --git a/doc/guides/sample_app_ug/ipsec_secgw.rst 
b/doc/guides/sample_app_ug/ipsec_secgw.rst
index 176e292d3f..099f499c18 100644
--- a/doc/guides/sample_app_ug/ipsec_secgw.rst
+++ b/doc/guides/sample_app_ug/ipsec_secgw.rst
@@ -139,6 +139,7 @@ The application has a number of command line options::
 --reassemble NUM
 --mtu MTU
 --frag-ttl FRAG_TTL_NS
+--udp-encap
 
 Where:
 
@@ -234,6 +235,8 @@ Where:
 Should be lower for low number of reassembly buckets.
 Valid values: from 1 ns to 10 s. Default value: 1000 (10 s).
 
+*   ``--udp-encap``: enables IPsec UDP Encapsulation for NAT Traversal.
+
 
 The mapping of lcores to port/queues is similar to other l3fwd applications.
 
@@ -1023,4 +1026,4 @@ Available options:
 *   ``-h`` Show usage.
 
 If  is specified, only tests for that mode will be invoked. For the
-list of available modes please refer to run_test.sh.
\ No newline at end of file
+list of available modes please refer to run_test.sh.
diff --git a/examples/ipsec-secgw/ipsec-secgw.c 
b/examples/ipsec-secgw/ipsec-secgw.c
index 20d69ba813..57c8973e9d 100644
--- a/examples/ipsec-secgw/ipsec-secgw.c
+++ b/examples/ipsec-secgw/ipsec-secgw.c
@@ -115,6 +115,7 @@ struct flow_info flow_info_tbl[RTE_MAX_ETHPORTS];
 #define CMD_LINE_OPT_REASSEMBLE"reassemble"
 #define CMD_LINE_OPT_MTU   "mtu"
 #define CMD_LINE_OPT_FRAG_TTL  "frag-ttl"
+#define CMD_LINE_OPT_UDP_ENCAP "udp-encap"
 
 #define CMD_LINE_ARG_EVENT "event"
 #define CMD_LINE_ARG_POLL  "poll"
@@ -139,6 +140,7 @@ enum {
CMD_LINE_OPT_REASSEMBLE_NUM,
CMD_LINE_OPT_MTU_NUM,
CMD_LINE_OPT_FRAG_TTL_NUM,
+   CMD_LINE_OPT_UDP_ENCAP_NUM,
 };
 
 static const struct option lgopts[] = {
@@ -152,6 +154,7 @@ static const struct option lgopts[] = {
{CMD_LINE_OPT_REASSEMBLE, 1, 0, CMD_LINE_OPT_REASSEMBLE_NUM},
{CMD_LINE_OPT_MTU, 1, 0, CMD_LINE_OPT_MTU_NUM},
{CMD_LINE_OPT_FRAG_TTL, 1, 0, CMD_LINE_OPT_FRAG_TTL_NUM},
+   {CMD_LINE_OPT_UDP_ENCAP, 0, 0, CMD_LINE_OPT_UDP_ENCAP_NUM},
{NULL, 0, 0, 0}
 };
 
@@ -360,6 +363,9 @@ prepare_one_packet(struct rte_mbuf *pkt, struct 
ipsec_traffic *t)
const struct rte_ether_hdr *eth;
const struct rte_ipv4_hdr *iph4;
const struct rte_ipv6_hdr *iph6;
+   const struct rte_udp_hdr *udp;
+   uint16_t nat_port;
+   uint16_t ip4_hdr_len;
 
eth = rte_pktmbuf_mtod(pkt, const struct rte_ether_hdr *);
if (eth->ether_type == rte_cpu_to_be_16(RTE_ETHER_TYPE_IPV4)) {
@@ -368,9 +374,26 @@ prepare_one_packet(struct rte_mbuf *pkt, struct 
ipsec_traffic *t)
RTE_ETHER_HDR_LEN);
adjust_ipv4_pktlen(pkt, iph4, 0);
 
-   if (iph4->next_proto_id == IPPROTO_ESP)
+   switch (iph4->next_proto_id) {
+   case IPPROTO_ESP:
t->ipsec.pkts[(t->ipsec.num)++] = pkt;
-   else {
+   break;
+   case IPPROTO_UDP:
+   if (app_sa_prm.udp_encap == 1) {
+   ip4_hdr_len = ((iph4->version_ihl &
+   RTE_IPV4_HDR_IHL_MASK) *
+   RTE_IPV4_IHL_MULTIPLIER);
+   udp = rte_pktmbuf_mtod_offset(pkt,
+   struct rte_udp_hdr *, ip4_hdr_len);
+   nat_port = rte_cpu_to_be_16(IPSEC_NAT_T_PORT);
+   if (udp->src_port == nat_port ||
+   

[dpdk-dev] [PATCH 3/3] crypto/octeontx2: support lookaside IPv4 transport mode

2021-03-15 Thread Tejasree Kondoj
Adding support for IPv4 lookaside IPsec transport mode.

Signed-off-by: Tejasree Kondoj 
---
 doc/guides/cryptodevs/octeontx2.rst   |   1 +
 drivers/crypto/octeontx2/otx2_cryptodev_ops.c |   7 +-
 drivers/crypto/octeontx2/otx2_cryptodev_sec.c | 110 ++
 drivers/crypto/octeontx2/otx2_cryptodev_sec.h |   4 +-
 drivers/crypto/octeontx2/otx2_ipsec_po.h  |   6 +
 drivers/crypto/octeontx2/otx2_ipsec_po_ops.h  |   8 +-
 6 files changed, 76 insertions(+), 60 deletions(-)

diff --git a/doc/guides/cryptodevs/octeontx2.rst 
b/doc/guides/cryptodevs/octeontx2.rst
index b30f98180a..811e61a1f6 100644
--- a/doc/guides/cryptodevs/octeontx2.rst
+++ b/doc/guides/cryptodevs/octeontx2.rst
@@ -179,6 +179,7 @@ Features supported
 * IPv6
 * ESP
 * Tunnel mode
+* Transport mode(IPv4)
 * ESN
 * Anti-replay
 * UDP Encapsulation
diff --git a/drivers/crypto/octeontx2/otx2_cryptodev_ops.c 
b/drivers/crypto/octeontx2/otx2_cryptodev_ops.c
index cec20b5c6d..c20170bcaa 100644
--- a/drivers/crypto/octeontx2/otx2_cryptodev_ops.c
+++ b/drivers/crypto/octeontx2/otx2_cryptodev_ops.c
@@ -928,7 +928,7 @@ otx2_cpt_sec_post_process(struct rte_crypto_op *cop, 
uintptr_t *rsp)
struct rte_mbuf *m = sym_op->m_src;
struct rte_ipv6_hdr *ip6;
struct rte_ipv4_hdr *ip;
-   uint16_t m_len;
+   uint16_t m_len = 0;
int mdata_len;
char *data;
 
@@ -938,11 +938,12 @@ otx2_cpt_sec_post_process(struct rte_crypto_op *cop, 
uintptr_t *rsp)
if (word0->s.opcode.major == OTX2_IPSEC_PO_PROCESS_IPSEC_INB) {
data = rte_pktmbuf_mtod(m, char *);
 
-   if (rsp[4] == RTE_SECURITY_IPSEC_TUNNEL_IPV4) {
+   if (rsp[4] == OTX2_IPSEC_PO_TRANSPORT ||
+   rsp[4] == OTX2_IPSEC_PO_TUNNEL_IPV4) {
ip = (struct rte_ipv4_hdr *)(data +
OTX2_IPSEC_PO_INB_RPTR_HDR);
m_len = rte_be_to_cpu_16(ip->total_length);
-   } else {
+   } else if (rsp[4] == OTX2_IPSEC_PO_TUNNEL_IPV6) {
ip6 = (struct rte_ipv6_hdr *)(data +
OTX2_IPSEC_PO_INB_RPTR_HDR);
m_len = rte_be_to_cpu_16(ip6->payload_len) +
diff --git a/drivers/crypto/octeontx2/otx2_cryptodev_sec.c 
b/drivers/crypto/octeontx2/otx2_cryptodev_sec.c
index 8942ff1fac..6493ce8370 100644
--- a/drivers/crypto/octeontx2/otx2_cryptodev_sec.c
+++ b/drivers/crypto/octeontx2/otx2_cryptodev_sec.c
@@ -25,12 +25,15 @@ ipsec_lp_len_precalc(struct rte_security_ipsec_xform *ipsec,
 {
struct rte_crypto_sym_xform *cipher_xform, *auth_xform;
 
-   if (ipsec->tunnel.type == RTE_SECURITY_IPSEC_TUNNEL_IPV4)
-   lp->partial_len = sizeof(struct rte_ipv4_hdr);
-   else if (ipsec->tunnel.type == RTE_SECURITY_IPSEC_TUNNEL_IPV6)
-   lp->partial_len = sizeof(struct rte_ipv6_hdr);
-   else
-   return -EINVAL;
+   lp->partial_len = 0;
+   if (ipsec->mode == RTE_SECURITY_IPSEC_SA_MODE_TUNNEL) {
+   if (ipsec->tunnel.type == RTE_SECURITY_IPSEC_TUNNEL_IPV4)
+   lp->partial_len = sizeof(struct rte_ipv4_hdr);
+   else if (ipsec->tunnel.type == RTE_SECURITY_IPSEC_TUNNEL_IPV6)
+   lp->partial_len = sizeof(struct rte_ipv6_hdr);
+   else
+   return -EINVAL;
+   }
 
if (ipsec->proto == RTE_SECURITY_IPSEC_SA_PROTO_ESP) {
lp->partial_len += sizeof(struct rte_esp_hdr);
@@ -203,7 +206,7 @@ crypto_sec_ipsec_outb_session_create(struct rte_cryptodev 
*crypto_dev,
 struct rte_security_session *sec_sess)
 {
struct rte_crypto_sym_xform *auth_xform, *cipher_xform;
-   struct otx2_ipsec_po_ip_template *template;
+   struct otx2_ipsec_po_ip_template *template = NULL;
const uint8_t *cipher_key, *auth_key;
struct otx2_sec_session_ipsec_lp *lp;
struct otx2_ipsec_po_sa_ctl *ctl;
@@ -229,10 +232,10 @@ crypto_sec_ipsec_outb_session_create(struct rte_cryptodev 
*crypto_dev,
memset(sa, 0, sizeof(struct otx2_ipsec_po_out_sa));
 
/* Initialize lookaside ipsec private data */
+   lp->mode_type = OTX2_IPSEC_PO_TRANSPORT;
lp->ip_id = 0;
lp->seq_lo = 1;
lp->seq_hi = 0;
-   lp->tunnel_type = ipsec->tunnel.type;
 
ret = ipsec_po_sa_ctl_set(ipsec, crypto_xform, ctl);
if (ret)
@@ -242,46 +245,47 @@ crypto_sec_ipsec_outb_session_create(struct rte_cryptodev 
*crypto_dev,
if (ret)
return ret;
 
-   if (ipsec->mode == RTE_SECURITY_IPSEC_SA_MODE_TUNNEL) {
-   /* Start ip id from 1 */
-   lp->ip_id = 1;
+   /* Start ip id from 1 */
+   lp->ip_id = 1;
+
+   if (ctl->enc_type == OTX2_IPSEC_PO_SA_ENC_AES_GCM) {
+   template = &sa->aes_gcm.template;
+   ctx_len = offsetof(struct otx2_ipsec_po_out_sa,
+

Re: [dpdk-dev] [PATCH v3 00/11] improve options help

2021-03-15 Thread Bruce Richardson
On Fri, Mar 12, 2021 at 07:17:09PM +0100, Thomas Monjalon wrote:
> The main intent of this series is to provide a nice help
> for the --log-level option.
> More patches are added to improve options help in general.
> 
> 
> v3:
> - fix use of RTE_LOG_MAX
> - accept (with warning) log level higher than RTE_LOG_MAX
> v2:
> - fix use of the new macro RTE_LOG_MAX in level parsing
> - improve parameters type and name while moving functions
> 
> 
Series-acked-by: Bruce Richardson 


Re: [dpdk-dev] [PATCH 7/7] eventdev: fix ABI breakage due to event vector

2021-03-15 Thread Kinsella, Ray



On 08/03/2021 18:44, Jerin Jacob wrote:
> On Sun, Feb 21, 2021 at 3:41 AM  wrote:
>>
>> From: Pavan Nikhilesh 
>>
>> Fix ABI breakage due to event vector configuration by moving
>> the vector configuration into a new structure and having a separate
>> function for enabling the vector config on a given ethernet device and
>> queue pair.
>> This vector config and function can be merged to queue config in
>> v21.11.
>>
>> Fixes: 44c81670cf0a ("eventdev: introduce event vector Rx capability")
> 
> Hi @Ray Kinsella @Neil Horman @Thomas Monjalon @David Marchand
> 
> The ABI breakage contract applies from release to release, right? That is,
> it does not apply between individual patches?
> 
> Summary:
> 1)  Ideal way of adding this feature is to add elements in the
> existing structure as mentioned
> in  ("eventdev: introduce event vector Rx capability") in this series.
> 2) Since this breaks the ABI, a new structure is introduced to fix it.
> I think we can remove this limitation in 21.11, as by then we can change
> the ABI as required.
> 
> So, does this patch need to be squashed into ("eventdev: introduce event
> vector Rx capability") to avoid breaking ABI compatibility between
> patches? Or is it OK to break ABI compatibility in one patch of the series
> and fix it later in the same series? (The latter is more readable, as
> we can revert this patch in 21.11.)

You are essentially writing it as you want it to appear in 21.11,
and then adding one patch at the end to preserve ABI compatibility until then.
You then have only one patch to revert in the 21.11 cycle.

Agree with David, I like the approach. 

+1 from me. 
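
The pattern can be sketched in plain C as follows. All names here
(queue_conf, vector_conf, queue_add, queue_vector_config) are
hypothetical stand-ins, not the eventdev API. The idea is only that the
existing structure keeps its layout while the new fields travel through
a separate structure and a separate function.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_QUEUES 4

/* Hypothetical structure whose layout is frozen by the ABI. */
struct queue_conf {
	uint32_t rx_queue_flags;
	uint16_t servicing_weight;
};

/* New fields live in their own structure instead of growing queue_conf. */
struct vector_conf {
	uint16_t vector_sz;
	uint64_t vector_timeout_ns;
	void *vector_mp;
};

struct adapter {
	struct queue_conf queues[MAX_QUEUES];
	struct vector_conf vectors[MAX_QUEUES];
	int vector_enabled[MAX_QUEUES];
};

/* Existing entry point: signature and structure size are unchanged. */
int queue_add(struct adapter *a, unsigned int q,
	      const struct queue_conf *conf)
{
	if (a == NULL || conf == NULL || q >= MAX_QUEUES)
		return -1;
	a->queues[q] = *conf;
	return 0;
}

/* New entry point added alongside; binaries built earlier never call it. */
int queue_vector_config(struct adapter *a, unsigned int q,
			const struct vector_conf *conf)
{
	if (a == NULL || conf == NULL || q >= MAX_QUEUES)
		return -1;
	if (conf->vector_sz == 0)
		return -1;
	a->vectors[q] = *conf;
	a->vector_enabled[q] = 1;
	return 0;
}
```

When the ABI can change, the separate structure and function can be
folded back into the main configuration by reverting that one patch.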

> 
> 
> 
>>
>> Signed-off-by: Pavan Nikhilesh 
>> ---
>>  app/test-eventdev/test_pipeline_common.c  |  16 +-
>>  lib/librte_eventdev/eventdev_pmd.h|  29 +++
>>  .../rte_event_eth_rx_adapter.c| 168 --
>>  .../rte_event_eth_rx_adapter.h|  27 +++
>>  lib/librte_eventdev/version.map   |   1 +
>>  5 files changed, 184 insertions(+), 57 deletions(-)
>>
>> diff --git a/app/test-eventdev/test_pipeline_common.c 
>> b/app/test-eventdev/test_pipeline_common.c
>> index 89f73be86..9aeefdd5f 100644
>> --- a/app/test-eventdev/test_pipeline_common.c
>> +++ b/app/test-eventdev/test_pipeline_common.c
>> @@ -331,6 +331,7 @@ pipeline_event_rx_adapter_setup(struct evt_options *opt, 
>> uint8_t stride,
>> uint16_t prod;
>> struct rte_mempool *vector_pool = NULL;
>> struct rte_event_eth_rx_adapter_queue_conf queue_conf;
>> +   struct rte_event_eth_rx_adapter_event_vector_config vec_conf;
>>
>> memset(&queue_conf, 0,
>> sizeof(struct rte_event_eth_rx_adapter_queue_conf));
>> @@ -360,12 +361,8 @@ pipeline_event_rx_adapter_setup(struct evt_options 
>> *opt, uint8_t stride,
>> }
>> if (opt->ena_vector) {
>> if (cap & RTE_EVENT_ETH_RX_ADAPTER_CAP_EVENT_VECTOR) 
>> {
>> -   queue_conf.vector_sz = opt->vector_size;
>> -   queue_conf.vector_timeout_ns =
>> -   opt->vector_tmo_nsec;
>> queue_conf.rx_queue_flags |=
>> RTE_EVENT_ETH_RX_ADAPTER_QUEUE_EVENT_VECTOR;
>> -   queue_conf.vector_mp = vector_pool;
>> } else {
>> evt_err("Rx adapter doesn't support event 
>> vector");
>> return -EINVAL;
>> @@ -385,6 +382,17 @@ pipeline_event_rx_adapter_setup(struct evt_options 
>> *opt, uint8_t stride,
>> return ret;
>> }
>>
>> +   if (opt->ena_vector) {
>> +   vec_conf.vector_sz = opt->vector_size;
>> +   vec_conf.vector_timeout_ns = opt->vector_tmo_nsec;
>> +   vec_conf.vector_mp = vector_pool;
>> +   if 
>> (rte_event_eth_rx_adapter_queue_event_vector_config(
>> +   prod, prod, -1, &vec_conf) < 0) {
>> +   evt_err("Failed to configure event 
>> vectorization for Rx adapter");
>> +   return -EINVAL;
>> +   }
>> +   }
>> +
>> if (!(cap & RTE_EVENT_ETH_RX_ADAPTER_CAP_INTERNAL_PORT)) {
>> uint32_t service_id = -1U;
>>
>> diff --git a/lib/librte_eventdev/eventdev_pmd.h 
>> b/lib/librte_eventdev/eventdev_pmd.h
>> index 60bfaebc0..d79dfd612 100644
>> --- a/lib/librte_eventdev/eventdev_pmd.h
>> +++ b/lib/librte_eventdev/eventdev_pmd.h
>> @@ -667,6 +667,32 @@ typedef int 
>> (*eventdev_eth_rx_adapter_vector_limits_get_t)(
>> const struct rte_eventdev *dev, const struct rte_eth_dev *eth_dev,
>> struct rte_event_eth_rx_adapter_vector_limits *limits);
>>
>> +struct rte_event_eth_rx_adapter_event_vector_config;
>> +/**
>> + 

Re: [dpdk-dev] [PATCH v2] ethdev: introduce enable_driver_sdk to install driver headers

2021-03-15 Thread Bruce Richardson
On Fri, Mar 12, 2021 at 02:20:06PM -0800, Tyler Retzlaff wrote:
> Introduce a meson option enable_driver_sdk which, when true, installs internal
> driver headers for ethdev. This allows drivers that do not depend on a
> stable API/ABI to be built outside the DPDK source tree.
> 
> Signed-off-by: Tyler Retzlaff 
> ---
>  lib/librte_ethdev/meson.build | 6 ++
>  lib/meson.build   | 4 
>  meson_options.txt | 2 ++
>  3 files changed, 12 insertions(+)
> 
The infrastructure looks good to me. However, you need to add changes to
cryptodev, eventdev, etc. to install the headers from there too.


Re: [dpdk-dev] [PATCH v11 2/2] bus/pci: support MMIO in PCI ioport accessors

2021-03-15 Thread David Marchand
On Thu, Mar 11, 2021 at 7:43 AM Wang, Haiyue  wrote:
> As the kernel uses a macro to handle PIO and MMIO, maybe we can do the same
> to keep the code clean:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/lib/iomap.c
>
> #define IO_COND(addr, is_pio, is_mmio) do { \
> unsigned long port = (unsigned long __force)addr;   \
> if (port >= PIO_RESERVED) { \
> is_mmio;\
> } else if (port > PIO_OFFSET) { \
> port &= PIO_MASK;   \
> is_pio; \
> } else  \
> bad_io_access(port, #is_pio );  \
> } while (0)
>
>
> Like:
>
> #if defined(RTE_ARCH_X86)
> #define IO_COND(addr, is_pio, is_mmio) do {   \
> if ((uint64_t)(uintptr_t)addr >= PIO_MAX) {   \
> is_mmio;  \
> } else {  \
> is_pio;   \
> } \
> } while (0)
> #else
> #define IO_COND(addr, is_pio, is_mmio) do {   \
> is_mmio;  \
> } while (0)
> #endif

We should not just copy/paste kernel code.

Plus here, this seems a bit overkill.
And there are other parts in this code that could use some polishing.

What do you think of merging this series as is (now that we have
non-regression reports) and doing such cleanups in follow-up patches?


-- 
David Marchand



Re: [dpdk-dev] [PATCH v3 07/11] eal: add log level help

2021-03-15 Thread Kinsella, Ray



On 12/03/2021 18:17, Thomas Monjalon wrote:
> The option --log-level was not completely described in the usage text,
> and it was difficult to guess the names of the log types and levels.
> 
> A new value "help" is accepted after --log-level to give more details
> about the syntax and listing the log types and levels.
> 
> The array "levels" used for level name parsing is replaced with
> a (modified) existing function which was used in rte_log_dump().
> 
> The new function rte_log_list_types() is exported in the API
> for allowing an application to give this info to the user
> if not exposing the EAL option --log-level.
> The list of log types cannot include all drivers if not linked in the
> application (shared object plugin case).
> 
> Signed-off-by: Thomas Monjalon 
> ---
>  lib/librte_eal/common/eal_common_log.c | 24 +---
>  lib/librte_eal/common/eal_common_options.c | 44 +++---
>  lib/librte_eal/common/eal_log.h|  5 +++
>  lib/librte_eal/include/rte_log.h   | 11 ++
>  lib/librte_eal/version.map |  3 ++
>  5 files changed, 69 insertions(+), 18 deletions(-)
> 
> diff --git a/lib/librte_eal/common/eal_common_log.c 
> b/lib/librte_eal/common/eal_common_log.c
> index 40cac36f89..d695b04068 100644
> --- a/lib/librte_eal/common/eal_common_log.c
> +++ b/lib/librte_eal/common/eal_common_log.c
> @@ -397,12 +397,12 @@ RTE_INIT_PRIO(log_init, LOG)
>   rte_logs.dynamic_types_len = RTE_LOGTYPE_FIRST_EXT_ID;
>  }
>  
> -static const char *
> -loglevel_to_string(uint32_t level)
> +const char *
> +eal_log_level2str(uint32_t level)
>  {
>   switch (level) {
>   case 0: return "disabled";
> - case RTE_LOG_EMERG: return "emerg";
> + case RTE_LOG_EMERG: return "emergency";
>   case RTE_LOG_ALERT: return "alert";
>   case RTE_LOG_CRIT: return "critical";
>   case RTE_LOG_ERR: return "error";
> @@ -414,6 +414,20 @@ loglevel_to_string(uint32_t level)
>   }
>  }
>  
> +/* Dump name of each logtype, one per line. */
> +void
> +rte_log_list_types(FILE *out, const char *prefix)
> +{
> + size_t type;
> +
> + for (type = 0; type < rte_logs.dynamic_types_len; ++type) {
> + if (rte_logs.dynamic_types[type].name == NULL)
> + continue;
> + fprintf(out, "%s%s\n",
> + prefix, rte_logs.dynamic_types[type].name);
> + }
> +}
> +
>  /* dump global level and registered log types */
>  void
>  rte_log_dump(FILE *f)
> @@ -421,14 +435,14 @@ rte_log_dump(FILE *f)
>   size_t i;
>  
>   fprintf(f, "global log level is %s\n",
> - loglevel_to_string(rte_log_get_global_level()));
> + eal_log_level2str(rte_log_get_global_level()));
>  
>   for (i = 0; i < rte_logs.dynamic_types_len; i++) {
>   if (rte_logs.dynamic_types[i].name == NULL)
>   continue;
>   fprintf(f, "id %zu: %s, level is %s\n",
>   i, rte_logs.dynamic_types[i].name,
> - loglevel_to_string(rte_logs.dynamic_types[i].loglevel));
> + eal_log_level2str(rte_logs.dynamic_types[i].loglevel));
>   }
>  }
>  
> diff --git a/lib/librte_eal/common/eal_common_options.c 
> b/lib/librte_eal/common/eal_common_options.c
> index 2df3ae04ea..1da6583d71 100644
> --- a/lib/librte_eal/common/eal_common_options.c
> +++ b/lib/librte_eal/common/eal_common_options.c
> @@ -1227,19 +1227,31 @@ eal_parse_syslog(const char *facility, struct 
> internal_config *conf)
>  }
>  #endif
>  
> +static void
> +eal_log_usage(void)
> +{
> + unsigned int level;
> +
> + printf("Log type is a pattern matching items of this list"
> + " (plugins may be missing):\n");
> + rte_log_list_types(stdout, "\t");
> + printf("\n");
> + printf("Syntax using globbing pattern: ");
> + printf("--"OPT_LOG_LEVEL" pattern:level\n");
> + printf("Syntax using regular expression:   ");
> + printf("--"OPT_LOG_LEVEL" regexp,level\n");
> + printf("Syntax for the global level:   ");
> + printf("--"OPT_LOG_LEVEL" level\n");
> + printf("Logs are emitted if allowed by both global and specific 
> levels.\n");
> + printf("\n");
> + printf("Log level can be a number or the first letters of its name:\n");
> + for (level = 1; level <= RTE_LOG_MAX; level++)
> + printf("\t%d   %s\n", level, eal_log_level2str(level));
> +}
> +
>  static int
>  eal_parse_log_priority(const char *level)
>  {
> - static const char * const levels[] = {
> - [RTE_LOG_EMERG]   = "emergency",
> - [RTE_LOG_ALERT]   = "alert",
> - [RTE_LOG_CRIT]= "critical",
> - [RTE_LOG_ERR] = "error",
> - [RTE_LOG_WARNING] = "warning",
> - [RTE_LOG_NOTICE]  = "notice",
> - [RTE_LOG_INFO]= "info",
> - [RTE_LOG_DEBUG]   = "debug",
> - };
>   size_t len = strlen(level

Re: [dpdk-dev] [PATCH v4 4/5] examples/l3fwd: implement FIB lookup method

2021-03-15 Thread Walsh, Conor
Hi Vladimir,



> > +   /* Add IPv4 and IPv6 hops to one array depending on type. */
> > +   for (i = 0; i < nb_rx; i++) {
> > +   if (type_arr[i])
> > +   nh = (uint16_t)hopsv4[ipv4_arr_assem++];
> > +   else
> > +   nh = (uint16_t)hopsv6[ipv6_arr_assem++];
> > +   hops[i] = (nh != FIB_DEFAULT_HOP && nh <=
> RTE_MAX_ETHPORTS &&
> > +   (enabled_port_mask & 1 << nh) != 0) ? nh : portid;
> 
> I think you can get rid of
> "nh <= RTE_MAX_ETHPORTS && (enabled_port_mask & 1 << nh) != 0"
> because it can be controlled during initialization when installing
> routes to the table. So you can check it just before rte_fib_add() and
> install FIB_DEFAULT_HOP if needed.

I will change this ternary to: hops[i] = nh != FIB_DEFAULT_HOP ? nh : portid;
I will also update the event code to match this.
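
As a rough sketch of the split being agreed here (names and values are
assumptions, not the v5 code): the port validity check runs once when a
route is installed, so the per-packet path only compares against the
default hop. FIB_DEFAULT_HOP is given an assumed value, and MAX_PORTS
stands in for RTE_MAX_ETHPORTS.

```c
#include <stdint.h>

#define FIB_DEFAULT_HOP 999 /* assumed sentinel meaning "no usable route" */
#define MAX_PORTS       32  /* assumed stand-in for RTE_MAX_ETHPORTS */

/*
 * Init time: decide which next hop to store for a route.
 * A port that is out of range or not enabled is replaced by the default hop.
 */
uint64_t route_next_hop(uint16_t port, uint32_t enabled_port_mask)
{
	if (port >= MAX_PORTS || (enabled_port_mask & (1u << port)) == 0)
		return FIB_DEFAULT_HOP;
	return port;
}

/*
 * Data path: the stored next hop is already known to be valid,
 * so only the default-hop case needs handling.
 */
uint16_t select_hop(uint64_t nh, uint16_t portid)
{
	return nh != FIB_DEFAULT_HOP ? (uint16_t)nh : portid;
}
```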

> 
> Apart from that LGTM.

I will push a v5 with this change.

Thanks,
Conor.

> 
> 
> > +   }
> > +
> > +#if defined FIB_SEND_MULTI
> > +   send_packets_multi(qconf, pkts_burst, hops, nb_rx);
> > +#else
> > +   fib_send_single(nb_rx, qconf, pkts_burst, hops);
> > +#endif
> > +}
> > +
> 
> 
> 
> >
> 
> --
> Regards,
> Vladimir


Re: [dpdk-dev] [PATCH v3 07/11] eal: add log level help

2021-03-15 Thread Bruce Richardson
On Mon, Mar 15, 2021 at 10:19:47AM +, Kinsella, Ray wrote:
> 
> 
> On 12/03/2021 18:17, Thomas Monjalon wrote:
> > The option --log-level was not completely described in the usage text,
> > and it was difficult to guess the names of the log types and levels.
> > 
> > A new value "help" is accepted after --log-level to give more details
> > about the syntax and listing the log types and levels.
> > 
> > The array "levels" used for level name parsing is replaced with
> > a (modified) existing function which was used in rte_log_dump().
> > 
> > The new function rte_log_list_types() is exported in the API
> > for allowing an application to give this info to the user
> > if not exposing the EAL option --log-level.
> > The list of log types cannot include all drivers if not linked in the
> > application (shared object plugin case).
> > 
> > Signed-off-by: Thomas Monjalon 
> > ---
> >  lib/librte_eal/common/eal_common_log.c | 24 +---
> >  lib/librte_eal/common/eal_common_options.c | 44 +++---
> >  lib/librte_eal/common/eal_log.h|  5 +++
> >  lib/librte_eal/include/rte_log.h   | 11 ++
> >  lib/librte_eal/version.map |  3 ++
> >  5 files changed, 69 insertions(+), 18 deletions(-)
> > 

> > @@ -1274,6 +1286,11 @@ eal_parse_log_level(const char *arg)
> > char *str, *level;
> > int priority;
> >  
> > +   if (strcmp(arg, "help") == 0) {
> 
> So I think the convention is to support both "?" and "help".
> Qemu does this at least. 
> 
I've seen "/?" used for help on windows binaries, but "-?" not so much in the
linux world, where --help (and often -h for short) seem to be the standard.


Re: [dpdk-dev] [PATCH v3 07/11] eal: add log level help

2021-03-15 Thread Kinsella, Ray



On 15/03/2021 10:31, Bruce Richardson wrote:
> On Mon, Mar 15, 2021 at 10:19:47AM +, Kinsella, Ray wrote:
>>
>>
>> On 12/03/2021 18:17, Thomas Monjalon wrote:
>>> The option --log-level was not completely described in the usage text,
>>> and it was difficult to guess the names of the log types and levels.
>>>
>>> A new value "help" is accepted after --log-level to give more details
>>> about the syntax and listing the log types and levels.
>>>
>>> The array "levels" used for level name parsing is replaced with
>>> a (modified) existing function which was used in rte_log_dump().
>>>
>>> The new function rte_log_list_types() is exported in the API
>>> for allowing an application to give this info to the user
>>> if not exposing the EAL option --log-level.
>>> The list of log types cannot include all drivers if not linked in the
>>> application (shared object plugin case).
>>>
>>> Signed-off-by: Thomas Monjalon 
>>> ---
>>>  lib/librte_eal/common/eal_common_log.c | 24 +---
>>>  lib/librte_eal/common/eal_common_options.c | 44 +++---
>>>  lib/librte_eal/common/eal_log.h|  5 +++
>>>  lib/librte_eal/include/rte_log.h   | 11 ++
>>>  lib/librte_eal/version.map |  3 ++
>>>  5 files changed, 69 insertions(+), 18 deletions(-)
>>>
> 
>>> @@ -1274,6 +1286,11 @@ eal_parse_log_level(const char *arg)
>>> char *str, *level;
>>> int priority;
>>>  
>>> +   if (strcmp(arg, "help") == 0) {
>>
>> So I think the convention is to support both "?" and "help".
>> Qemu does this at least. 
>>
> I've seen "/?" used for help on windows binaries, but "-?" not so much in the
> linux world, where --help (and often -h for short) seem to be the standard.
> 

This is slightly different: it is where you are looking to return a list of
valid values for a parameter. For instance, in the qemu case mentioned above:

 ~ > qemu-system-x86_64 -cpu ? | head -n 10
Available CPUs:
x86 486   (alias configured by machine type)
x86 486-v1
x86 Broadwell (alias configured by machine type)
x86 Broadwell-IBRS(alias of Broadwell-v3)
x86 Broadwell-noTSX   (alias of Broadwell-v2)
x86 Broadwell-noTSX-IBRS  (alias of Broadwell-v4)
x86 Broadwell-v1  Intel Core Processor (Broadwell)
x86 Broadwell-v2  Intel Core Processor (Broadwell, no TSX)
x86 Broadwell-v3  Intel Core Processor (Broadwell, IBRS)





Re: [dpdk-dev] [PATCH v3 00/11] improve options help

2021-03-15 Thread Andrew Rybchenko
On 3/15/21 12:40 PM, Bruce Richardson wrote:
> On Fri, Mar 12, 2021 at 07:17:09PM +0100, Thomas Monjalon wrote:
>> The main intent of this series is to provide a nice help
>> for the --log-level option.
>> More patches are added to improve options help in general.
>>
>>
>> v3:
>> - fix use of RTE_LOG_MAX
>> - accept (with warning) log level higher than RTE_LOG_MAX
>> v2:
>> - fix use of the new macro RTE_LOG_MAX in level parsing
>> - improve parameters type and name while moving functions
>>
>>
> Series-acked-by: Bruce Richardson 
> 

Series-acked-by: Andrew Rybchenko 


Re: [dpdk-dev] [PATCH v3 07/11] eal: add log level help

2021-03-15 Thread Thomas Monjalon
15/03/2021 11:42, Kinsella, Ray:
> 
> On 15/03/2021 10:31, Bruce Richardson wrote:
> > On Mon, Mar 15, 2021 at 10:19:47AM +, Kinsella, Ray wrote:
> >>
> >>
> >> On 12/03/2021 18:17, Thomas Monjalon wrote:
> >>> The option --log-level was not completely described in the usage text,
> >>> and it was difficult to guess the names of the log types and levels.
> >>>
> >>> A new value "help" is accepted after --log-level to give more details
> >>> about the syntax and listing the log types and levels.
> >>>
> >>> The array "levels" used for level name parsing is replaced with
> >>> a (modified) existing function which was used in rte_log_dump().
> >>>
> >>> The new function rte_log_list_types() is exported in the API
> >>> for allowing an application to give this info to the user
> >>> if not exposing the EAL option --log-level.
> >>> The list of log types cannot include all drivers if not linked in the
> >>> application (shared object plugin case).
> >>>
> >>> Signed-off-by: Thomas Monjalon 
> >>> ---
> >>>  lib/librte_eal/common/eal_common_log.c | 24 +---
> >>>  lib/librte_eal/common/eal_common_options.c | 44 +++---
> >>>  lib/librte_eal/common/eal_log.h|  5 +++
> >>>  lib/librte_eal/include/rte_log.h   | 11 ++
> >>>  lib/librte_eal/version.map |  3 ++
> >>>  5 files changed, 69 insertions(+), 18 deletions(-)
> >>>
> > 
> >>> @@ -1274,6 +1286,11 @@ eal_parse_log_level(const char *arg)
> >>>   char *str, *level;
> >>>   int priority;
> >>>  
> >>> + if (strcmp(arg, "help") == 0) {
> >>
> >> So I think the convention is to support both "?" and "help".
> >> Qemu does this at least. 
> >>
> > I've seen "/?" used for help on windows binaries, but "-?" not so much in the
> > linux world, where --help (and often -h for short) seem to be the standard.
> > 
> 
> This is slightly different - it is where you are looking to return a list of
> valid values for a parameter. So for instance in qemu mentioned above
> 
>  ~ > qemu-system-x86_64 -cpu ? | head -n 10

"?" is a special character.
In my zsh, I need to quote it to avoid globbing parsing,
so I'm not a fan.

I will let you extend the syntax in a separate patch :)




[dpdk-dev] [RFC PATCH] meson: remove unnecessary explicit link to libpcap

2021-03-15 Thread Gabriel Ganne
libpcap is already found and registered as a dependency by meson, and
the dependency is already correctly used in librte_port. This line is
just unnecessary.

It also has the side effect of messing with the meson link line: dpdk
link will be declared twice: manually and then through pkg-config. If
you configure meson to prefer static linking over dynamic, this will
cause the build to fail on librte_port, since the pcap deps are not yet
seen by the linker.

Signed-off-by: Gabriel Ganne 
---
 config/meson.build | 1 -
 1 file changed, 1 deletion(-)

diff --git a/config/meson.build b/config/meson.build
index 0fb7e1b27a0f..3eb90327dfcc 100644
--- a/config/meson.build
+++ b/config/meson.build
@@ -177,7 +177,6 @@ if not pcap_dep.found()
 endif
 if pcap_dep.found() and cc.has_header('pcap.h', dependencies: pcap_dep)
dpdk_conf.set('RTE_PORT_PCAP', 1)
-   dpdk_extra_ldflags += '-lpcap'
 endif
 
 # for clang 32-bit compiles we need libatomic for 64-bit atomic ops
-- 
2.29.2



Re: [dpdk-dev] [PATCH 2/3] net/virtio: allocate fake mbuf in Rx queue

2021-03-15 Thread Maxime Coquelin



On 1/11/21 3:50 AM, Xia, Chenbo wrote:
> Hi Maxime,
> 
>> -Original Message-
>> From: Maxime Coquelin 
>> Sent: Tuesday, December 22, 2020 12:15 AM
>> To: dev@dpdk.org; Xia, Chenbo ; amore...@redhat.com;
>> david.march...@redhat.com; olivier.m...@6wind.com
>> Cc: Maxime Coquelin 
>> Subject: [PATCH 2/3] net/virtio: allocate fake mbuf in Rx queue
>>
>> While it is worth clarifying whether the fake mbuf
>> in virtnet_rx struct is really necessary, it is sure
>> that it heavily impacts cache usage by being part of
>> the struct. Indeed, it takes uses cachelines, and
> 
> Did you mean 'uses cachelines'?

I don't know what I meant here :) I will rework it!

>> requires alignement on a cacheline.
> 
> Alignment?
> 
> With above fixed:
> 
> Reviewed-by: Chenbo Xia 
> 
>>
>> Before this series, it means it took 120 bytes in
>> virtnet_rx struct:
>>
>> struct virtnet_rx {
>>  struct virtqueue * vq;   /* 0 8 */
>>
>>  /* XXX 56 bytes hole, try to pack */
>>
>>  /* --- cacheline 1 boundary (64 bytes) --- */
>>  struct rte_mbuf fake_mbuf __attribute__((__aligned__(64))); /* 64 128 */
>>  /* --- cacheline 3 boundary (192 bytes) --- */
>>
>> This patch allocates it using malloc in order to optimize
>> virtnet_rx cache usage and so virtqueue cache usage.
>>
>> Signed-off-by: Maxime Coquelin 
>> ---
>>  drivers/net/virtio/virtio_ethdev.c | 10 ++
>>  drivers/net/virtio/virtio_rxtx.c   |  8 +++-
>>  drivers/net/virtio/virtio_rxtx.h   |  2 +-
>>  3 files changed, 14 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/net/virtio/virtio_ethdev.c
>> b/drivers/net/virtio/virtio_ethdev.c
>> index 297c01a70d..a1351b36ca 100644
>> --- a/drivers/net/virtio/virtio_ethdev.c
>> +++ b/drivers/net/virtio/virtio_ethdev.c
>> @@ -539,6 +539,7 @@ virtio_init_queue(struct rte_eth_dev *dev, uint16_t
>> queue_idx)
>>  }
>>
>>  if (queue_type == VTNET_RQ) {
>> +struct rte_mbuf *fake_mbuf;
>>  size_t sz_sw = (RTE_PMD_VIRTIO_RX_MAX_BURST + vq_size) *
>> sizeof(vq->sw_ring[0]);
>>
>> @@ -550,10 +551,18 @@ virtio_init_queue(struct rte_eth_dev *dev, uint16_t
>> queue_idx)
>>  goto fail_q_alloc;
>>  }
>>
>> +fake_mbuf = malloc(sizeof(*fake_mbuf));
>> +if (!fake_mbuf) {
>> +PMD_INIT_LOG(ERR, "can not allocate fake mbuf");
>> +ret = -ENOMEM;
>> +goto fail_q_alloc;
>> +}
>> +
>>  vq->sw_ring = sw_ring;
>>  rxvq = &vq->rxq;
>>  rxvq->port_id = dev->data->port_id;
>>  rxvq->mz = mz;
>> +rxvq->fake_mbuf = fake_mbuf;
>>  } else if (queue_type == VTNET_TQ) {
>>  txvq = &vq->txq;
>>  txvq->port_id = dev->data->port_id;
>> @@ -636,6 +645,7 @@ virtio_free_queues(struct virtio_hw *hw)
>>
>>  queue_type = virtio_get_queue_type(hw, i);
>>  if (queue_type == VTNET_RQ) {
>> +free(vq->rxq.fake_mbuf);
>>  rte_free(vq->sw_ring);
>>  rte_memzone_free(vq->rxq.mz);
>>  } else if (queue_type == VTNET_TQ) {
>> diff --git a/drivers/net/virtio/virtio_rxtx.c
>> b/drivers/net/virtio/virtio_rxtx.c
>> index 1fcce36cbd..d147d7300a 100644
>> --- a/drivers/net/virtio/virtio_rxtx.c
>> +++ b/drivers/net/virtio/virtio_rxtx.c
>> @@ -703,11 +703,9 @@ virtio_dev_rx_queue_setup_finish(struct rte_eth_dev 
>> *dev,
>> uint16_t queue_idx)
>>  virtio_rxq_vec_setup(rxvq);
>>  }
>>
>> -memset(&rxvq->fake_mbuf, 0, sizeof(rxvq->fake_mbuf));
>> -for (desc_idx = 0; desc_idx < RTE_PMD_VIRTIO_RX_MAX_BURST;
>> - desc_idx++) {
>> -vq->sw_ring[vq->vq_nentries + desc_idx] =
>> -&rxvq->fake_mbuf;
>> +memset(rxvq->fake_mbuf, 0, sizeof(*rxvq->fake_mbuf));
>> +for (desc_idx = 0; desc_idx < RTE_PMD_VIRTIO_RX_MAX_BURST; desc_idx++) {
>> +vq->sw_ring[vq->vq_nentries + desc_idx] = rxvq->fake_mbuf;
>>  }
>>
>>  if (hw->use_vec_rx && !virtio_with_packed_queue(hw)) {
>> diff --git a/drivers/net/virtio/virtio_rxtx.h
>> b/drivers/net/virtio/virtio_rxtx.h
>> index 7f1036be6f..6ce5d67d15 100644
>> --- a/drivers/net/virtio/virtio_rxtx.h
>> +++ b/drivers/net/virtio/virtio_rxtx.h
>> @@ -19,7 +19,7 @@ struct virtnet_stats {
>>
>>  struct virtnet_rx {
>>  /* dummy mbuf, for wraparound when processing RX ring. */
>> -struct rte_mbuf fake_mbuf;
>> +struct rte_mbuf *fake_mbuf;
>>  uint64_t mbuf_initializer; /**< value to init mbufs. */
>>  struct rte_mempool *mpool; /**< mempool for mbuf allocation */
>>
>> --
>> 2.29.2
> 
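
The layout effect described in the quoted commit message can be reproduced
with a small stand-alone sketch. The types below are hypothetical stand-ins
(struct fake_buf is not struct rte_mbuf, it only borrows its 128-byte size and
64-byte alignment), and the attribute syntax assumes GCC or Clang:

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64

/* Hypothetical stand-in for struct rte_mbuf: 128 bytes, cache-line aligned. */
struct fake_buf {
	uint8_t data[128];
} __attribute__((__aligned__(CACHE_LINE)));

/* Embedding the aligned member forces it onto the next cache line. */
struct rx_embedded {
	void *vq;             /* offset 0 */
	struct fake_buf fake; /* alignment pushes this to offset 64 */
};

/* Holding a pointer instead keeps the struct compact. */
struct rx_pointer {
	void *vq;
	struct fake_buf *fake;
};

/* Padding inserted between vq and the embedded buffer. */
static size_t
rx_embedded_hole(void)
{
	return offsetof(struct rx_embedded, fake) - sizeof(void *);
}
```

With 8-byte pointers the hole is 56 bytes, which matches the pahole output
quoted above; the pointer variant has no such hole.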



[dpdk-dev] [PATCH v5 0/5] examples/l3fwd: add FIB lookup method to l3fwd

2021-03-15 Thread Conor Walsh
Currently the l3fwd sample app supports LPM and EM lookup methods. This
patchset implements the FIB library as another lookup method for l3fwd.
Instead of adding an individual flag for FIB, a new flag '--lookup' has
been added that allows the user to select their desired lookup method.
The flags '-E' and '-L' have been retained for backwards compatibility.
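
As a rough illustration of the single '--lookup' selector, the value could be
mapped onto an enum along these lines. This is only a sketch: the enum and
function names are hypothetical and not taken from the patches; only the
accepted strings "em", "lpm" and "fib" come from the documentation patch.

```c
#include <string.h>

/* Hypothetical names; the enum used in the actual patches may differ. */
enum lookup_method {
	LOOKUP_EM,
	LOOKUP_LPM,
	LOOKUP_FIB,
	LOOKUP_INVALID
};

/* Map the argument of --lookup to a lookup method. */
static enum lookup_method
parse_lookup(const char *name)
{
	if (name == NULL)
		return LOOKUP_INVALID;
	if (strcmp(name, "em") == 0)
		return LOOKUP_EM;
	if (strcmp(name, "lpm") == 0)
		return LOOKUP_LPM;
	if (strcmp(name, "fib") == 0)
		return LOOKUP_FIB;
	return LOOKUP_INVALID;
}
```

Keeping one enum rather than one boolean per method means a further lookup
method would only need a new enumerator and one more string comparison.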

---

v5:
- Removed runtime checks to ensure desired port is within portmask,
  unused ports are still removed during setup

v4:
- Changed individual switches for lookup methods to an
  enum for all lookup methods
- Removed '-F' and introduced '--lookup' flag to select lookup methods
- Fixed indentation issues
- Renamed some variables for increased clarity
- Minor changes to some logic for readability
- Implemented MAC updating for FIB on non-SSE machines
- Implemented RFC1812 for FIB on non-SSE machines
- Added checks to ensure desired port is within portmask

v3: add support for NEON, PPC 64 and machines that do not support SSE,
NEON or PPC 64.

v2: added the socket header file to fix FreeBSD build.

Conor Walsh (5):
  examples/l3fwd: fix LPM IPv6 subnets
  examples/l3fwd: move l3fwd routes to common header
  examples/l3fwd: add FIB infrastructure
  examples/l3fwd: implement FIB lookup method
  doc/guides/l3_forward: update documentation for FIB

 doc/guides/sample_app_ug/l3_forward.rst | 113 -
 examples/l3fwd/Makefile |   2 +-
 examples/l3fwd/l3fwd.h  |  27 +-
 examples/l3fwd/l3fwd_common_route.h |  48 +++
 examples/l3fwd/l3fwd_event.c|   9 +
 examples/l3fwd/l3fwd_event.h|   1 +
 examples/l3fwd/l3fwd_fib.c  | 528 
 examples/l3fwd/l3fwd_lpm.c  |  68 +--
 examples/l3fwd/main.c   | 107 +++--
 examples/l3fwd/meson.build  |   4 +-
 10 files changed, 809 insertions(+), 98 deletions(-)
 create mode 100644 examples/l3fwd/l3fwd_common_route.h
 create mode 100644 examples/l3fwd/l3fwd_fib.c

-- 
2.25.1



[dpdk-dev] [PATCH v5 1/5] examples/l3fwd: fix LPM IPv6 subnets

2021-03-15 Thread Conor Walsh
The IPv6 subnets used were not within the 2001:200::/48 subnet.
Changed to 2001:200:0:{0-7}::/64 where 0-7 is the port ID.

Fixes: 37afe381bde4 ("examples/l3fwd: use reserved IP addresses")

Signed-off-by: Conor Walsh 
Acked-by: Vladimir Medvedkin 
---
 examples/l3fwd/l3fwd_lpm.c | 26 --
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/examples/l3fwd/l3fwd_lpm.c b/examples/l3fwd/l3fwd_lpm.c
index 3dcf1fef18..1cfaf36572 100644
--- a/examples/l3fwd/l3fwd_lpm.c
+++ b/examples/l3fwd/l3fwd_lpm.c
@@ -42,7 +42,10 @@ struct ipv6_l3fwd_lpm_route {
uint8_t  if_out;
 };
 
-/* 198.18.0.0/16 are set aside for RFC2544 benchmarking (RFC5735). */
+/*
+ * 198.18.0.0/16 are set aside for RFC2544 benchmarking (RFC5735).
+ * 198.18.{0-7}.0/24 = Port {0-7}
+ */
 static const struct ipv4_l3fwd_lpm_route ipv4_l3fwd_lpm_route_array[] = {
{RTE_IPV4(198, 18, 0, 0), 24, 0},
{RTE_IPV4(198, 18, 1, 0), 24, 1},
@@ -54,16 +57,19 @@ static const struct ipv4_l3fwd_lpm_route 
ipv4_l3fwd_lpm_route_array[] = {
{RTE_IPV4(198, 18, 7, 0), 24, 7},
 };
 
-/* 2001:0200::/48 is IANA reserved range for IPv6 benchmarking (RFC5180) */
+/*
+ * 2001:200::/48 is IANA reserved range for IPv6 benchmarking (RFC5180).
+ * 2001:200:0:{0-7}::/64 = Port {0-7}
+ */
 static const struct ipv6_l3fwd_lpm_route ipv6_l3fwd_lpm_route_array[] = {
-   {{32, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, 48, 0},
-   {{32, 1, 2, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0}, 48, 1},
-   {{32, 1, 2, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0}, 48, 2},
-   {{32, 1, 2, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0}, 48, 3},
-   {{32, 1, 2, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0}, 48, 4},
-   {{32, 1, 2, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0}, 48, 5},
-   {{32, 1, 2, 0, 0, 0, 0, 0, 0, 6, 0, 0, 0, 0, 0, 0}, 48, 6},
-   {{32, 1, 2, 0, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0}, 48, 7},
+   {{32, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, 64, 0},
+   {{32, 1, 2, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0}, 64, 1},
+   {{32, 1, 2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0}, 64, 2},
+   {{32, 1, 2, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0}, 64, 3},
+   {{32, 1, 2, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0}, 64, 4},
+   {{32, 1, 2, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0}, 64, 5},
+   {{32, 1, 2, 0, 0, 0, 0, 6, 0, 0, 0, 0, 0, 0, 0, 0}, 64, 6},
+   {{32, 1, 2, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0}, 64, 7},
 };
 
 #define IPV4_L3FWD_LPM_MAX_RULES 1024
-- 
2.25.1



[dpdk-dev] [PATCH v5 2/5] examples/l3fwd: move l3fwd routes to common header

2021-03-15 Thread Conor Walsh
To prevent code duplication from the addition of lookup methods
the routes specified in lpm should be moved to a common header.

Signed-off-by: Conor Walsh 
Acked-by: Konstantin Ananyev 
Acked-by: Vladimir Medvedkin 
---
 examples/l3fwd/l3fwd_common_route.h | 48 +++
 examples/l3fwd/l3fwd_lpm.c  | 74 +++--
 2 files changed, 65 insertions(+), 57 deletions(-)
 create mode 100644 examples/l3fwd/l3fwd_common_route.h

diff --git a/examples/l3fwd/l3fwd_common_route.h 
b/examples/l3fwd/l3fwd_common_route.h
new file mode 100644
index 00..7f0125a8a5
--- /dev/null
+++ b/examples/l3fwd/l3fwd_common_route.h
@@ -0,0 +1,48 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 Intel Corporation
+ */
+
+#include 
+#include 
+
+struct ipv4_l3fwd_common_route {
+   uint32_t ip;
+   uint8_t  depth;
+   uint8_t  if_out;
+};
+
+struct ipv6_l3fwd_common_route {
+   uint8_t ip[16];
+   uint8_t  depth;
+   uint8_t  if_out;
+};
+
+/*
+ * 198.18.0.0/16 are set aside for RFC2544 benchmarking (RFC5735).
+ * 198.18.{0-7}.0/24 = Port {0-7}
+ */
+static const struct ipv4_l3fwd_common_route ipv4_l3fwd_common_route_array[] = {
+   {RTE_IPV4(198, 18, 0, 0), 24, 0},
+   {RTE_IPV4(198, 18, 1, 0), 24, 1},
+   {RTE_IPV4(198, 18, 2, 0), 24, 2},
+   {RTE_IPV4(198, 18, 3, 0), 24, 3},
+   {RTE_IPV4(198, 18, 4, 0), 24, 4},
+   {RTE_IPV4(198, 18, 5, 0), 24, 5},
+   {RTE_IPV4(198, 18, 6, 0), 24, 6},
+   {RTE_IPV4(198, 18, 7, 0), 24, 7},
+};
+
+/*
+ * 2001:200::/48 is IANA reserved range for IPv6 benchmarking (RFC5180).
+ * 2001:200:0:{0-7}::/64 = Port {0-7}
+ */
+static const struct ipv6_l3fwd_common_route ipv6_l3fwd_common_route_array[] = {
+   {{32, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, 64, 0},
+   {{32, 1, 2, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0}, 64, 1},
+   {{32, 1, 2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0}, 64, 2},
+   {{32, 1, 2, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0}, 64, 3},
+   {{32, 1, 2, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0}, 64, 4},
+   {{32, 1, 2, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0}, 64, 5},
+   {{32, 1, 2, 0, 0, 0, 0, 6, 0, 0, 0, 0, 0, 0, 0, 0}, 64, 6},
+   {{32, 1, 2, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0}, 64, 7},
+};
diff --git a/examples/l3fwd/l3fwd_lpm.c b/examples/l3fwd/l3fwd_lpm.c
index 1cfaf36572..818cf717d1 100644
--- a/examples/l3fwd/l3fwd_lpm.c
+++ b/examples/l3fwd/l3fwd_lpm.c
@@ -30,47 +30,7 @@
 #include "l3fwd.h"
 #include "l3fwd_event.h"
 
-struct ipv4_l3fwd_lpm_route {
-   uint32_t ip;
-   uint8_t  depth;
-   uint8_t  if_out;
-};
-
-struct ipv6_l3fwd_lpm_route {
-   uint8_t ip[16];
-   uint8_t  depth;
-   uint8_t  if_out;
-};
-
-/*
- * 198.18.0.0/16 are set aside for RFC2544 benchmarking (RFC5735).
- * 198.18.{0-7}.0/24 = Port {0-7}
- */
-static const struct ipv4_l3fwd_lpm_route ipv4_l3fwd_lpm_route_array[] = {
-   {RTE_IPV4(198, 18, 0, 0), 24, 0},
-   {RTE_IPV4(198, 18, 1, 0), 24, 1},
-   {RTE_IPV4(198, 18, 2, 0), 24, 2},
-   {RTE_IPV4(198, 18, 3, 0), 24, 3},
-   {RTE_IPV4(198, 18, 4, 0), 24, 4},
-   {RTE_IPV4(198, 18, 5, 0), 24, 5},
-   {RTE_IPV4(198, 18, 6, 0), 24, 6},
-   {RTE_IPV4(198, 18, 7, 0), 24, 7},
-};
-
-/*
- * 2001:200::/48 is IANA reserved range for IPv6 benchmarking (RFC5180).
- * 2001:200:0:{0-7}::/64 = Port {0-7}
- */
-static const struct ipv6_l3fwd_lpm_route ipv6_l3fwd_lpm_route_array[] = {
-   {{32, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, 64, 0},
-   {{32, 1, 2, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0}, 64, 1},
-   {{32, 1, 2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0}, 64, 2},
-   {{32, 1, 2, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0}, 64, 3},
-   {{32, 1, 2, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0}, 64, 4},
-   {{32, 1, 2, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0}, 64, 5},
-   {{32, 1, 2, 0, 0, 0, 0, 6, 0, 0, 0, 0, 0, 0, 0, 0}, 64, 6},
-   {{32, 1, 2, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0}, 64, 7},
-};
+#include "l3fwd_common_route.h"
 
 #define IPV4_L3FWD_LPM_MAX_RULES 1024
 #define IPV4_L3FWD_LPM_NUMBER_TBL8S (1 << 8)
@@ -485,18 +445,18 @@ setup_lpm(const int socketid)
socketid);
 
/* populate the LPM table */
-   for (i = 0; i < RTE_DIM(ipv4_l3fwd_lpm_route_array); i++) {
+   for (i = 0; i < RTE_DIM(ipv4_l3fwd_common_route_array); i++) {
struct in_addr in;
 
/* skip unused ports */
-   if ((1 << ipv4_l3fwd_lpm_route_array[i].if_out &
+   if ((1 << ipv4_l3fwd_common_route_array[i].if_out &
enabled_port_mask) == 0)
continue;
 
ret = rte_lpm_add(ipv4_l3fwd_lpm_lookup_struct[socketid],
-   ipv4_l3fwd_lpm_route_array[i].ip,
-   ipv4_l3fwd_lpm_route_array[i].depth,
-   ipv4_l3fwd_lpm_route_array[i].if_out);
+

[dpdk-dev] [PATCH v5 3/5] examples/l3fwd: add FIB infrastructure

2021-03-15 Thread Conor Walsh
The purpose of this commit is to add the necessary function calls
and supporting infrastructure to allow the Forwarding Information Base
(FIB) library to be integrated into the l3fwd sample app.
Instead of adding an individual flag for FIB, a new flag '--lookup' has
been added that allows the user to select their desired lookup method.
The flags '-E' and '-L' have been retained for backwards compatibility.

Signed-off-by: Conor Walsh 
Acked-by: Konstantin Ananyev 
Acked-by: Vladimir Medvedkin 
---
 examples/l3fwd/Makefile  |   2 +-
 examples/l3fwd/l3fwd.h   |  27 -
 examples/l3fwd/l3fwd_event.c |   9 +++
 examples/l3fwd/l3fwd_event.h |   1 +
 examples/l3fwd/l3fwd_fib.c   |  60 
 examples/l3fwd/main.c| 107 ++-
 examples/l3fwd/meson.build   |   4 +-
 7 files changed, 176 insertions(+), 34 deletions(-)
 create mode 100644 examples/l3fwd/l3fwd_fib.c

diff --git a/examples/l3fwd/Makefile b/examples/l3fwd/Makefile
index 7e70bbd826..5f7baffbf7 100644
--- a/examples/l3fwd/Makefile
+++ b/examples/l3fwd/Makefile
@@ -5,7 +5,7 @@
 APP = l3fwd
 
 # all source are stored in SRCS-y
-SRCS-y := main.c l3fwd_lpm.c l3fwd_em.c l3fwd_event.c
+SRCS-y := main.c l3fwd_lpm.c l3fwd_fib.c l3fwd_em.c l3fwd_event.c
 SRCS-y += l3fwd_event_generic.c l3fwd_event_internal_port.c
 
 # Build using pkg-config variables if possible
diff --git a/examples/l3fwd/l3fwd.h b/examples/l3fwd/l3fwd.h
index 2cf06099e0..a808d60247 100644
--- a/examples/l3fwd/l3fwd.h
+++ b/examples/l3fwd/l3fwd.h
@@ -1,5 +1,5 @@
 /* SPDX-License-Identifier: BSD-3-Clause
- * Copyright(c) 2010-2016 Intel Corporation
+ * Copyright(c) 2010-2021 Intel Corporation
  */
 
 #ifndef __L3_FWD_H__
@@ -180,13 +180,16 @@ is_valid_ipv4_pkt(struct rte_ipv4_hdr *pkt, uint32_t 
link_len)
 int
 init_mem(uint16_t portid, unsigned int nb_mbuf);
 
-/* Function pointers for LPM or EM functionality. */
+/* Function pointers for LPM, EM or FIB functionality. */
 void
 setup_lpm(const int socketid);
 
 void
 setup_hash(const int socketid);
 
+void
+setup_fib(const int socketid);
+
 int
 em_check_ptype(int portid);
 
@@ -207,6 +210,9 @@ em_main_loop(__rte_unused void *dummy);
 int
 lpm_main_loop(__rte_unused void *dummy);
 
+int
+fib_main_loop(__rte_unused void *dummy);
+
 int
 lpm_event_main_loop_tx_d(__rte_unused void *dummy);
 int
@@ -225,8 +231,17 @@ em_event_main_loop_tx_q(__rte_unused void *dummy);
 int
 em_event_main_loop_tx_q_burst(__rte_unused void *dummy);
 
+int
+fib_event_main_loop_tx_d(__rte_unused void *dummy);
+int
+fib_event_main_loop_tx_d_burst(__rte_unused void *dummy);
+int
+fib_event_main_loop_tx_q(__rte_unused void *dummy);
+int
+fib_event_main_loop_tx_q_burst(__rte_unused void *dummy);
+
 
-/* Return ipv4/ipv6 fwd lookup struct for LPM or EM. */
+/* Return ipv4/ipv6 fwd lookup struct for LPM, EM or FIB. */
 void *
 em_get_ipv4_l3fwd_lookup_struct(const int socketid);
 
@@ -239,4 +254,10 @@ lpm_get_ipv4_l3fwd_lookup_struct(const int socketid);
 void *
 lpm_get_ipv6_l3fwd_lookup_struct(const int socketid);
 
+void *
+fib_get_ipv4_l3fwd_lookup_struct(const int socketid);
+
+void *
+fib_get_ipv6_l3fwd_lookup_struct(const int socketid);
+
 #endif  /* __L3_FWD_H__ */
diff --git a/examples/l3fwd/l3fwd_event.c b/examples/l3fwd/l3fwd_event.c
index 4d31593a0a..961860ea18 100644
--- a/examples/l3fwd/l3fwd_event.c
+++ b/examples/l3fwd/l3fwd_event.c
@@ -227,6 +227,12 @@ l3fwd_event_resource_setup(struct rte_eth_conf *port_conf)
[1][0] = em_event_main_loop_tx_q,
[1][1] = em_event_main_loop_tx_q_burst,
};
+   const event_loop_cb fib_event_loop[2][2] = {
+   [0][0] = fib_event_main_loop_tx_d,
+   [0][1] = fib_event_main_loop_tx_d_burst,
+   [1][0] = fib_event_main_loop_tx_q,
+   [1][1] = fib_event_main_loop_tx_q_burst,
+   };
uint32_t event_queue_cfg;
int ret;
 
@@ -264,4 +270,7 @@ l3fwd_event_resource_setup(struct rte_eth_conf *port_conf)
 
evt_rsrc->ops.em_event_loop = em_event_loop[evt_rsrc->tx_mode_q]
   [evt_rsrc->has_burst];
+
+   evt_rsrc->ops.fib_event_loop = fib_event_loop[evt_rsrc->tx_mode_q]
+  [evt_rsrc->has_burst];
 }
diff --git a/examples/l3fwd/l3fwd_event.h b/examples/l3fwd/l3fwd_event.h
index 0e46164170..3ad1902ab5 100644
--- a/examples/l3fwd/l3fwd_event.h
+++ b/examples/l3fwd/l3fwd_event.h
@@ -55,6 +55,7 @@ struct l3fwd_event_setup_ops {
adapter_setup_cb adapter_setup;
event_loop_cb lpm_event_loop;
event_loop_cb em_event_loop;
+   event_loop_cb fib_event_loop;
 };
 
 struct l3fwd_event_resources {
diff --git a/examples/l3fwd/l3fwd_fib.c b/examples/l3fwd/l3fwd_fib.c
new file mode 100644
index 00..0a2d02db2f
--- /dev/null
+++ b/examples/l3fwd/l3fwd_fib.c
@@ -0,0 +1,60 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 Intel Cor

[dpdk-dev] [PATCH v5 4/5] examples/l3fwd: implement FIB lookup method

2021-03-15 Thread Conor Walsh
This patch implements the Forwarding Information Base (FIB) library
in l3fwd using the function calls and infrastructure introduced in
the previous patch.

Signed-off-by: Conor Walsh 
Acked-by: Konstantin Ananyev 
---
 examples/l3fwd/l3fwd_fib.c | 480 -
 1 file changed, 474 insertions(+), 6 deletions(-)

diff --git a/examples/l3fwd/l3fwd_fib.c b/examples/l3fwd/l3fwd_fib.c
index 0a2d02db2f..a58b933f83 100644
--- a/examples/l3fwd/l3fwd_fib.c
+++ b/examples/l3fwd/l3fwd_fib.c
@@ -2,59 +2,527 @@
  * Copyright(c) 2021 Intel Corporation
  */
 
+#include 
+#include 
+#include 
+#include 
+#include 
+
 #include 
 #include 
 
 #include "l3fwd.h"
+#if defined RTE_ARCH_X86
+#include "l3fwd_sse.h"
+#elif defined __ARM_NEON
+#include "l3fwd_neon.h"
+#elif defined RTE_ARCH_PPC_64
+#include "l3fwd_altivec.h"
+#endif
 #include "l3fwd_event.h"
 #include "l3fwd_common_route.h"
 
+/* Configure how many packets ahead to prefetch for fib. */
+#define FIB_PREFETCH_OFFSET 4
+
+/* A non-existent portid is needed to denote a default hop for fib. */
+#define FIB_DEFAULT_HOP 999
+
+/*
+ * If the machine has SSE, NEON or PPC 64 then multiple packets
+ * can be sent at once if not only single packets will be sent
+ */
+#if defined RTE_ARCH_X86 || defined __ARM_NEON \
+   || defined RTE_ARCH_PPC_64
+#define FIB_SEND_MULTI
+#endif
+
+static struct rte_fib *ipv4_l3fwd_fib_lookup_struct[NB_SOCKETS];
+static struct rte_fib6 *ipv6_l3fwd_fib_lookup_struct[NB_SOCKETS];
+
+/* Parse packet type and ip address. */
+static inline void
+fib_parse_packet(struct rte_mbuf *mbuf,
+   uint32_t *ipv4, uint32_t *ipv4_cnt,
+   uint8_t ipv6[RTE_FIB6_IPV6_ADDR_SIZE],
+   uint32_t *ipv6_cnt, uint8_t *ip_type)
+{
+   struct rte_ether_hdr *eth_hdr;
+   struct rte_ipv4_hdr *ipv4_hdr;
+   struct rte_ipv6_hdr *ipv6_hdr;
+
+   eth_hdr = rte_pktmbuf_mtod(mbuf, struct rte_ether_hdr *);
+   /* IPv4 */
+   if (mbuf->packet_type & RTE_PTYPE_L3_IPV4) {
+   ipv4_hdr = (struct rte_ipv4_hdr *)(eth_hdr + 1);
+   *ipv4 = rte_be_to_cpu_32(ipv4_hdr->dst_addr);
+   /* Store type of packet in type_arr (IPv4=1, IPv6=0). */
+   *ip_type = 1;
+   (*ipv4_cnt)++;
+   }
+   /* IPv6 */
+   else {
+   ipv6_hdr = (struct rte_ipv6_hdr *)(eth_hdr + 1);
+   rte_mov16(ipv6, (const uint8_t *)ipv6_hdr->dst_addr);
+   *ip_type = 0;
+   (*ipv6_cnt)++;
+   }
+}
+
+/*
+ * If the machine does not have SSE, NEON or PPC 64 then the packets
+ * are sent one at a time using send_single_packet()
+ */
+#if !defined FIB_SEND_MULTI
+static inline void
+fib_send_single(int nb_tx, struct lcore_conf *qconf,
+   struct rte_mbuf **pkts_burst, uint16_t hops[nb_tx])
+{
+   int32_t j;
+   struct rte_ether_hdr *eth_hdr;
+
+   for (j = 0; j < nb_tx; j++) {
+   /* Run rfc1812 if packet is ipv4 and checks enabled. */
+#if defined DO_RFC_1812_CHECKS
+   rfc1812_process((struct rte_ipv4_hdr *)(rte_pktmbuf_mtod(
+   pkts_burst[j], struct rte_ether_hdr *) + 1),
+   &hops[j], pkts_burst[j]->packet_type);
+#endif
+
+   /* Set MAC addresses. */
+   eth_hdr = rte_pktmbuf_mtod(pkts_burst[j],
+   struct rte_ether_hdr *);
+   *(uint64_t *)&eth_hdr->d_addr = dest_eth_addr[hops[j]];
+   rte_ether_addr_copy(&ports_eth_addr[hops[j]],
+   &eth_hdr->s_addr);
+
+   /* Send single packet. */
+   send_single_packet(qconf, pkts_burst[j], hops[j]);
+   }
+}
+#endif
+
+/* Bulk parse, fib lookup and send. */
+static inline void
+fib_send_packets(int nb_rx, struct rte_mbuf **pkts_burst,
+   uint16_t portid, struct lcore_conf *qconf)
+{
+   uint32_t ipv4_arr[nb_rx];
+   uint8_t ipv6_arr[nb_rx][RTE_FIB6_IPV6_ADDR_SIZE];
+   uint16_t hops[nb_rx];
+   uint64_t hopsv4[nb_rx], hopsv6[nb_rx];
+   uint8_t type_arr[nb_rx];
+   uint32_t ipv4_cnt = 0, ipv6_cnt = 0;
+   uint32_t ipv4_arr_assem = 0, ipv6_arr_assem = 0;
+   uint16_t nh;
+   int32_t i;
+
+   /* Prefetch first packets. */
+   for (i = 0; i < FIB_PREFETCH_OFFSET && i < nb_rx; i++)
+   rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[i], void *));
+
+   /* Parse packet info and prefetch. */
+   for (i = 0; i < (nb_rx - FIB_PREFETCH_OFFSET); i++) {
+   /* Prefetch packet. */
+   rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[
+   i + FIB_PREFETCH_OFFSET], void *));
+   fib_parse_packet(pkts_burst[i],
+   &ipv4_arr[ipv4_cnt], &ipv4_cnt,
+   ipv6_arr[ipv6_cnt], &ipv6_cnt,
+   &type_arr[i]);
+   }
+
+   /* Parse remaining packet 

[dpdk-dev] [PATCH v5 5/5] doc/guides/l3_forward: update documentation for FIB

2021-03-15 Thread Conor Walsh
The purpose of this patch is to update the l3fwd user guide to include
the changes proposed in this patchset.

Signed-off-by: Conor Walsh 
---
 doc/guides/sample_app_ug/l3_forward.rst | 113 +---
 1 file changed, 100 insertions(+), 13 deletions(-)

diff --git a/doc/guides/sample_app_ug/l3_forward.rst 
b/doc/guides/sample_app_ug/l3_forward.rst
index e7875f8dcd..d5892bdcf8 100644
--- a/doc/guides/sample_app_ug/l3_forward.rst
+++ b/doc/guides/sample_app_ug/l3_forward.rst
@@ -11,7 +11,7 @@ The application performs L3 forwarding.
 Overview
 
 
-The application demonstrates the use of the hash and LPM libraries in the DPDK
+The application demonstrates the use of the hash, LPM and FIB libraries in DPDK
 to implement packet forwarding using poll or event mode PMDs for packet I/O.
 The initialization and run-time paths are very similar to those of the
 :doc:`l2_forward_real_virtual` and :doc:`l2_forward_event`.
@@ -22,7 +22,7 @@ decision is made based on information read from the input 
packet.
 Eventdev can optionally use S/W or H/W (if supported by platform) scheduler
 implementation for packet I/O based on run time parameters.
 
-The lookup method is either hash-based or LPM-based and is selected at run 
time. When the selected lookup method is hash-based,
+The lookup method is hash-based, LPM-based or FIB-based and is selected at run 
time. When the selected lookup method is hash-based,
 a hash object is used to emulate the flow classification stage.
 The hash object is used in correlation with a flow table to map each input 
packet to its flow at runtime.
 
@@ -30,14 +30,14 @@ The hash lookup key is represented by a DiffServ 5-tuple 
composed of the followi
 Source IP Address, Destination IP Address, Protocol, Source Port and 
Destination Port.
 The ID of the output interface for the input packet is read from the 
identified flow table entry.
 The set of flows used by the application is statically configured and loaded 
into the hash at initialization time.
-When the selected lookup method is LPM based, an LPM object is used to emulate 
the forwarding stage for IPv4 packets.
-The LPM object is used as the routing table to identify the next hop for each 
input packet at runtime.
+When the selected lookup method is LPM or FIB based, an LPM or FIB object is 
used to emulate the forwarding stage for IPv4 packets.
+The LPM or FIB object is used as the routing table to identify the next hop 
for each input packet at runtime.
 
-The LPM lookup key is represented by the Destination IP Address field read 
from the input packet.
-The ID of the output interface for the input packet is the next hop returned 
by the LPM lookup.
-The set of LPM rules used by the application is statically configured and 
loaded into the LPM object at initialization time.
+The LPM and FIB lookup keys are represented by the Destination IP Address 
field read from the input packet.
+The ID of the output interface for the input packet is the next hop returned 
by the LPM or FIB lookup.
+The set of LPM and FIB rules used by the application is statically configured 
and loaded into the LPM or FIB object at initialization time.
 
-In the sample application, hash-based forwarding supports IPv4 and IPv6. 
LPM-based forwarding supports IPv4 only.
+In the sample application, hash-based and FIB-based forwarding supports both 
IPv4 and IPv6. LPM-based forwarding supports IPv4 only.
 
 Compiling the Application
 -
@@ -53,8 +53,7 @@ The application has a number of command line options::
 
 ./dpdk-l3fwd [EAL options] -- -p PORTMASK
  [-P]
- [-E]
- [-L]
+ [--lookup LOOKUP_METHOD]
  --config(port,queue,lcore)[,(port,queue,lcore)]
  [--eth-dest=X,MM:MM:MM:MM:MM:MM]
  [--enable-jumbo [--max-pkt-len PKTLEN]]
@@ -66,6 +65,8 @@ The application has a number of command line options::
  [--mode]
  [--eventq-sched]
  [--event-eth-rxqs]
+ [-E]
+ [-L]
 
 Where,
 
@@ -74,9 +75,7 @@ Where,
 * ``-P:`` Optional, sets all ports to promiscuous mode so that packets are 
accepted regardless of the packet's Ethernet MAC destination address.
   Without this option, only packets with the Ethernet MAC destination address 
set to the Ethernet address of the port are accepted.
 
-* ``-E:`` Optional, enable exact match.
-
-* ``-L:`` Optional, enable longest prefix match.
+* ``--lookup:`` Optional, Select the lookup method. Accepted options ``em`` 
(Exact Match), ``lpm`` (Longest Prefix Match), ``fib`` (Forwarding Information 
Base). Default is ``lpm``.
 
 * ``--config (port,queue,lcore)[,(port,queue,lcore)]:`` Determines which 
queues from which ports are mapped to which cores.
 
@@ -102,6 +101,10 @@

Re: [dpdk-dev] [PATCH v3 0/4] net: replace Windows networking shim

2021-03-15 Thread Ferruh Yigit

On 3/13/2021 10:22 PM, Dmitry Kozlyuk wrote:

Networking header shim in Windows EAL conflicts with system headers and
tries to provide POSIX compatibility out of scope for DPDK.
Remove dependency on POSIX headers from libraries supported on Windows,
then replace shim with librte_net with workarounds.

A proposed deprecation notice is assumed:
http://patchwork.dpdk.org/project/dpdk/list/?series=15595

v3: Fix build on FreeBSD for real (CI).
v2: Fix build on FreeBSD (CI).

Depends-on: series-15513 ("eal/windows: do not expose POSIX symbols")

Dmitry Kozlyuk (4):
   cmdline: remove POSIX dependency
   ethdev: remove POSIX dependency
   net/mlx5: remove POSIX dependency
   net: replace Windows networking shim



Hi Dmitry,

Have you seen the CI reported build errors:
http://mails.dpdk.org/archives/test-report/2021-March/182361.html

Briefly:
./lib/librte_net/rte_net.c:132:7: error: 'IPPROTO_GRE' undeclared
./lib/librte_net/rte_net.c:163:7: error: 'IPPROTO_IPIP' undeclared



Re: [dpdk-dev] [PATCH v11 2/2] bus/pci: support MMIO in PCI ioport accessors

2021-03-15 Thread Wang, Haiyue
> -Original Message-
> From: David Marchand 
> Sent: Monday, March 15, 2021 18:20
> To: Wang, Haiyue ; 谢华伟(此时此刻) 
> 
> Cc: maxime.coque...@redhat.com; Yigit, Ferruh ; 
> dev@dpdk.org; Burakov, Anatoly
> ; xuemi...@nvidia.com; gr...@u256.net
> Subject: Re: [dpdk-dev] [PATCH v11 2/2] bus/pci: support MMIO in PCI ioport 
> accessors
> 
> On Thu, Mar 11, 2021 at 7:43 AM Wang, Haiyue  wrote:
> > Like kernel use macro to do pio and mmio, maybe we can also to do so for
> > making code clean:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/lib/iomap.c
> >
> > #define IO_COND(addr, is_pio, is_mmio) do { \
> > unsigned long port = (unsigned long __force)addr;   \
> > if (port >= PIO_RESERVED) { \
> > is_mmio;\
> > } else if (port > PIO_OFFSET) { \
> > port &= PIO_MASK;   \
> > is_pio; \
> > } else  \
> > bad_io_access(port, #is_pio );  \
> > } while (0)
> >
> >
> > Like:
> >
> > #if defined(RTE_ARCH_X86)
> > #define IO_COND(addr, is_pio, is_mmio) do {   \
> > if ((uint64_t)(uintptr_t)addr >= PIO_MAX) {   \
> > is_mmio;  \
> > } else {  \
> > is_pio;   \
> > } \
> > } while (0)
> > #else
> > #define IO_COND(addr, is_pio, is_mmio) do {   \
> > is_mmio;  \
> > } while (0)
> > #endif
> 
> We should not just copy/paste kernel code.
> 

Got it ;-)

> 
> 
> --
> David Marchand



Re: [dpdk-dev] Duplicating traffic with RTE Flow

2021-03-15 Thread Jan Viktorin
Hello Jiawei,

On Fri, 12 Mar 2021 09:32:44 +
"Jiawei(Jonny) Wang"  wrote:

> Hi Jan,
> 
> > -Original Message-
> > From: Jan Viktorin 
> > Sent: Friday, March 12, 2021 12:33 AM
> > To: Jiawei(Jonny) Wang 
> > Cc: Slava Ovsiienko ; Asaf Penso
> > ; dev@dpdk.org; Ori Kam 
> > Subject: Re: [dpdk-dev] Duplicating traffic with RTE Flow
> > 
> > On Thu, 11 Mar 2021 02:11:07 +
> > "Jiawei(Jonny) Wang"  wrote:
> >   
> > > Hi Jan,
> > >
> > > Sorry for late response,
> > >
> > > First rule is invalid, port only works on FDB domain so need
> > > 'transfer' here; Second rule should be ok, could you please check if
> > > port 1 was enabled in your DPDK application?
> > 
> > I assume that it is enabled, see full transcript:
> > 
> >  $ ofed_info
> >  MLNX_OFED_LINUX-5.2-1.0.4.0 (OFED-5.2-1.0.4):
> >  ...
> >  $ sudo dpdk-testpmd -v -- -i
> >  EAL: Detected 24 lcore(s)
> >  EAL: Detected 1 NUMA nodes
> >  EAL: RTE Version: 'DPDK 20.11.0'
> >  EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
> >  EAL: Selected IOVA mode 'PA'
> >  EAL: No available hugepages reported in hugepages-1048576kB
> >  EAL: Probing VFIO support...
> >  EAL: Probe PCI driver: mlx5_pci (15b3:1017) device: :04:00.0 (socket 0)
> >  mlx5_pci: No available register for Sampler.
> >  mlx5_pci: Size 0x is not power of 2, will be aligned to 0x1.
> >  EAL: Probe PCI driver: mlx5_pci (15b3:1017) device: :04:00.1 (socket 0)
> >  mlx5_pci: No available register for Sampler.
> >  mlx5_pci: Size 0x is not power of 2, will be aligned to 0x1.
> >  EAL: No legacy callbacks, legacy socket not created
> >  Interactive-mode selected
> >  testpmd: create a new mbuf pool : n=331456, size=2176, socket=0
> >  testpmd: preferred mempool ops selected: ring_mp_mc
> >  Configuring Port 0 (socket 0)
> >  Port 0: B8:59:9F:E2:09:F6
> >  Configuring Port 1 (socket 0)
> >  Port 1: B8:59:9F:E2:09:F7
> >  Checking link statuses...
> >  Done
> 
> It seems that you started two PF ports here; Port 1 is not a VF port.
> An FDB rule can steer packets from the PF to its VFs and vice versa. Could you
> please try to open the VF ports and start testpmd with representor=.

I did not know this, so I tried with VFs:

 # echo 2 > /sys/class/net/hge1/device/sriov_numvfs
 # echo switchdev > /sys/class/net/hge1/compat/devlink/mode

 # dpdk-testpmd -v -a ':05:00.1,representor=[0-1]' -- -i
 EAL: Detected 24 lcore(s)
 EAL: Detected 1 NUMA nodes
 EAL: RTE Version: 'DPDK 20.11.0'
 EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
 EAL: Selected IOVA mode 'VA'
 EAL: No available hugepages reported in hugepages-1048576kB
 EAL: Probing VFIO support...
 EAL: Probe PCI driver: mlx5_pci (15b3:1017) device: :05:00.1 (socket 0)
 mlx5_pci: No available register for Sampler.
 mlx5_pci: Size 0x is not power of 2, will be aligned to 0x1.
 mlx5_pci: No available register for Sampler.
 mlx5_pci: No available register for Sampler.
 EAL: No legacy callbacks, legacy socket not created
 Interactive-mode selected
 testpmd: create a new mbuf pool : n=331456, size=2176, socket=0
 testpmd: preferred mempool ops selected: ring_mp_mc

 Warning! port-topology=paired and odd forward ports number, the last port will pair with itself.

 Configuring Port 0 (socket 0)
 Port 0: B8:59:9F:E2:09:F7
 Configuring Port 1 (socket 0)
 Port 1: B2:57:D6:72:F3:31
 Configuring Port 2 (socket 0)
 Port 2: 9E:CB:D0:73:59:CE
 Checking link statuses...
 Done
 testpmd> show port summary all
 Number of available ports: 3
 Port MAC Address       Name                   Driver   Status Link
 0    B8:59:9F:E2:09:F7 :05:00.1               mlx5_pci up     100 Gbps
 1    B2:57:D6:72:F3:31 :05:00.1_representor_0 mlx5_pci up     100 Gbps
 2    9E:CB:D0:73:59:CE :05:00.1_representor_1 mlx5_pci up     100 Gbps
 testpmd> set sample_actions 0 port_id id 1 / end
 testpmd> flow validate 0 ingress transfer pattern end actions sample ratio 1 index 0 / drop / end
 port_flow_complain(): Caught PMD error type 1 (cause unspecified): sample action not supported: Operation not supported

Still no luck. However, this message appears three times in the log:

 mlx5_pci: No available register for Sampler.

It looks like it might be related. What does it mean?

Jan

> 
> Thanks.
> 
> >  testpmd> port start 1  
> >  Port 1 is now not stopped
> >  Please stop the ports first
> >  Done  
> >  testpmd> set sample_actions 0 port_id id 1 / end
> >  testpmd> flow validate 0 ingress transfer pattern end actions sample ratio 1 index 0 / drop / end
> >  port_flow_complain(): Caught PMD error type 1 (cause unspecified): (no stated reason): Operation not supported
> >  testpmd> flow create 0 ingress transfer pattern end actions sample ratio 1 index 0 / drop / end
> >  port_flow_complain(): Caught PMD error type 1 (cause unspecified): (no stated reason): Operation not supported
> >  testpmd>
> >  Stopping port 0...
> >  Stopping ports...
> >  Done
> > 
> >  Stopping port 1...
> >  Stoppi

Re: [dpdk-dev] [PATCH v3 0/4] net: replace Windows networking shim

2021-03-15 Thread Dmitry Kozlyuk
Hi Ferruh,

> Have you seen the CI reported build errors:
> http://mails.dpdk.org/archives/test-report/2021-March/182361.html
>
> Briefly:
> ./lib/librte_net/rte_net.c:132:7: error: 'IPPROTO_GRE' undeclared
> ./lib/librte_net/rte_net.c:163:7: error: 'IPPROTO_IPIP' undeclared

This is because CI doesn't apply the patches listed in Depends-on. In this case,
the missing constants would be defined when RTE_BUILD_INTERNAL is defined (so
that the symbols are only visible to DPDK), and that is introduced by the
dependency series.
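
As a rough sketch of the idea (not the actual code from the series), guarded
fallback definitions could look like the following. The guard name comes from
the explanation above; the values are the IANA-assigned protocol numbers, and
is_tunnel_proto() is a hypothetical helper added only to make the example
self-contained.

```c
#include <stdint.h>

/* Normally provided by the build system; defined here only so the sketch
 * compiles on its own. */
#define RTE_BUILD_INTERNAL 1

#ifdef RTE_BUILD_INTERNAL
/* Fallbacks for platforms whose headers lack these constants. */
#ifndef IPPROTO_IPIP
#define IPPROTO_IPIP 4
#endif
#ifndef IPPROTO_GRE
#define IPPROTO_GRE 47
#endif
#endif /* RTE_BUILD_INTERNAL */

/* Hypothetical helper: returns 1 for the two tunnel protocols above. */
static int
is_tunnel_proto(uint8_t proto)
{
	return proto == IPPROTO_GRE || proto == IPPROTO_IPIP;
}
```

Without the guard macro (as in the CI run that skipped the dependency), the
constants stay undeclared, which matches the reported errors.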



[dpdk-dev] [Bug 661] [mlx5] VLAN packets will not do RSS

2021-03-15 Thread bugzilla
https://bugs.dpdk.org/show_bug.cgi?id=661

Bug ID: 661
   Summary: [mlx5] VLAN packets will not do RSS
   Product: DPDK
   Version: 19.11
  Hardware: All
OS: All
Status: UNCONFIRMED
  Severity: normal
  Priority: Normal
 Component: ethdev
  Assignee: dev@dpdk.org
  Reporter: alia...@nvidia.com
  Target Milestone: ---

RSS doesn't seem to be working correctly with packets that have VLAN. The
packets will be received, but will always go to the first queue.

To reproduce, run testpmd and set RSS type to TCP, then send the following
packet:
Ether / Dot1Q / IP / TCP / Raw

Packets will be spread correctly when sending a packet without VLAN. Example:
Ether / IP / TCP / Raw

-- 
You are receiving this mail because:
You are the assignee for the bug.

[dpdk-dev] [Bug 661] [mlx5] VLAN packets will not do RSS

2021-03-15 Thread bugzilla
https://bugs.dpdk.org/show_bug.cgi?id=661

Ali Alnubani (alia...@nvidia.com) changed:

   What|Removed |Added

 Resolution|--- |WONTFIX
 Status|UNCONFIRMED |RESOLVED

--- Comment #1 from Ali Alnubani (alia...@nvidia.com) ---
Will not fix this in 19.11.

-- 
You are receiving this mail because:
You are the assignee for the bug.

Re: [dpdk-dev] 19.11.7 patches review and test

2021-03-15 Thread Ali Alnubani
Hi,

> -Original Message-
> From: Christian Ehrhardt 
> Sent: Wednesday, March 10, 2021 3:37 PM
> To: sta...@dpdk.org
> Cc: dev@dpdk.org; Abhishek Marathe ;
> Akhil Goyal ; Ali Alnubani ;
> benjamin.wal...@intel.com; David Christensen ;
> hariprasad.govindhara...@intel.com; Hemant Agrawal
> ; Ian Stokes ; Jerin
> Jacob ; John McNamara ;
> Ju-Hyoung Lee ; Kevin Traynor
> ; Luca Boccassi ; Pei Zhang
> ; pingx...@intel.com; qian.q...@intel.com; Raslan
> Darawsheh ; NBU-Contact-Thomas Monjalon
> ; yuan.p...@intel.com; zhaoyan.c...@intel.com
> Subject: 19.11.7 patches review and test
> 
> Hi all,
> 
> Here is a list of patches targeted for stable release 19.11.7.
> 
> The (new) planned date for the final release is 17th of March.
> 
> Please help with testing and validation of your use cases and report
> any issues/results with reply-all to this mail. For the final release
> the fixes and reported validations will be added to the release notes.
> 
> A release candidate tarball can be found at:
> 
> https://dpdk.org/browse/dpdk-stable/tag/?id=v19.11.7-rc2
> 
> These patches are located at branch 19.11 of dpdk-stable repo:
> https://dpdk.org/browse/dpdk-stable/
> 
> Thanks.
> 
> Christian Ehrhardt 
> 
> ---

Thanks Christian for creating the new release candidate.

The following covers the functional tests that we ran on Mellanox hardware for 
this release:
- Basic functionality:
  Send and receive multiple types of traffic.
- testpmd xstats counter test.
- testpmd timestamp test.
- Changing/checking link status through testpmd.
- RTE flow tests:
  Items: eth / vlan / ipv4 / ipv6 / tcp / udp / icmp / gre / nvgre / vxlan / ip in ip / mplsoudp / mplsogre
  Actions: drop / queue / rss / mark / flag / jump / count / raw_encap / raw_decap / vxlan_encap / vxlan_decap / NAT / dec_ttl
- Some RSS tests.
- VLAN filtering, stripping and insertion tests.
- Checksum and TSO tests.
- ptype tests.
- link_status_interrupt example application tests.
- l3fwd-power example application tests.
- Multi-process example applications tests.

Functional tests ran on:
- NIC: ConnectX-4 Lx / OS: RHEL7.4 / Driver: MLNX_OFED_LINUX-5.2-2.2.0.0 / Firmware: 14.29.2002
- NIC: ConnectX-5 / OS: RHEL7.4 / Driver: MLNX_OFED_LINUX-5.2-2.2.0.0 / Firmware: 16.29.2002

Compilation tests with multiple configurations in the following OS/driver 
combinations are also passing:
- Ubuntu 20.04.2 with MLNX_OFED_LINUX-5.2-2.2.0.0.
- Ubuntu 20.04.2 with rdma-core master (a1a9ffb).
- Ubuntu 20.04.2 with rdma-core v28.0.
- Ubuntu 18.04.5 with rdma-core v17.1.
- Ubuntu 18.04.5 with rdma-core master (a1a9ffb) (i386).
- Ubuntu 16.04.7 with rdma-core v22.7.
- Fedora 32 with rdma-core v33.0.
- CentOS 7 7.9.2009 with rdma-core master (a1a9ffb).
- CentOS 7 7.9.2009 with MLNX_OFED_LINUX-5.2-2.2.0.0.
- CentOS 8 8.3.2011 with rdma-core master (a1a9ffb).
- OpenSUSE Leap 15.2 with rdma-core v27.1.

We don't see any new issues in this release candidate. However, due to 
environment changes, we started seeing the following issue, which reproduces in 
older 19.11 releases as well:
https://bugs.dpdk.org/show_bug.cgi?id=661
We will not fix this issue in this release.

Regards,
Ali


Re: [dpdk-dev] DPDK 21.05 NVIDIA Mellanox Roadmap

2021-03-15 Thread Thomas Monjalon
This roadmap has been integrated in the web page
https://core.dpdk.org/roadmap/#2105


01/03/2021 20:17, Asaf Penso:
> Below is NVIDIA Mellanox's roadmap for DPDK21.05, on which we are currently 
> working:
> 
> rte_flow new APIs:
> ===
> [1] Support a new action offload which performs connection tracking window
> validation.
> Motivation:
> TCP connection tracking is needed for many applications that act as a 
> mediator and perform forwarding. The new offload connection tracking(CT) 
> window validation is used for enforcing TCP protocol adherence.
> It also enforces several sanity checks for TCP packets like the validity of 
> L3 and L4 headers as well as the accuracy of L3 and L4 checksum.
> The new offload action API will provide means to create, configure, query, 
> and modify the connection tracking object by a SW application. It will 
> support a bi-directional, cross vport TCP handshake in an optimized manner
>  
> [2]Add support for matching based on sanity checks of TCP packets. Able to 
> match the validity of L3 and L4 headers as well as L3 checksum and L4 
> checksum.
> Motivation:
> Allow a TCP connection tracking flow to intercept corrupted packets before they
> alter the connection tracking object. An application may match on such cases
> and handle them differently from the regular route (e.g. drop or pass to a SW queue).
>  
> [3]Extend meter capabilities with the concept of meter policy
> Motivation:
> Extend meter capabilities to add support for a shared meter policy. A meter 
> policy is an object that can be shared among different meters. It provides 
> the ability to associate different actions per color Red/Yellow/Green and 
> thus use a meter as a steering mechanism. The first implementation will 
> support queue, rss, jump, mark, and set_tag actions. Given the fact that the 
> policy is shared across many meter flows a performance gain is also expected. 
> rte API will be augmented with an additional create meter API to make use of 
> the new policy object.
>
> [4]Add support for writing information related to a single rte flow
> Motivation: 
> Allow finely grained debug of how flows are represented in the HW. Previously 
> support was added to dump all rte flows using 'flow dump  all 
> . Now we are extending to support single flow dump using flow 
> dump  rule  
> 
> 
> rte_mtr new APIs
> ===
> [5] add support for a meter profile that enable packet per second metering
> Motivation:
> Provide flexibility to applications that would like to meter based on packets 
> per second granularity on top of byte per second granularity that exist today 
> as part of meter profile.
>  
> 
> mlx5 PMD updates: 
> 
> mlx5 PMD will support the rte_flow update changes listed above and below
>  
> [6]Extend support for VLAN pop on egress direction and VLAN push on ingress 
> direction
> Motivation:
> Some applications, like firewalls, need to alter the routing information
> bi-directionally. Today mlx5 PMD supports VLAN pop on ingress and VLAN push
> on egress, and the intention here is to augment this with the corresponding
> pop/push actions.
> 
> [7]Add support for rte_security API
> Motivation: enable IPsec inline offload to be used in conjunction with other 
> rte flow API to enable inline encrypt/decrypt of packets. Mlx5 will support 
> Encapsulating Security Payload(ESP) with ConnectX-6 Dx and BlueField-2 
> 
> [8]Add support for power saving in rx queues
> Motivation: support for umwait command to enable reduction of power 
> consumption if no packets are received.
>  
> [9]Add support for using HW configured timestamp format 
> Motivation: modify the pmd to use the timestamp format based on HW ability - 
> either UTC or free-running
> 
> 
> New PMDs:
> ==
> [10]Implement look aside AES-XTS encryption/decryption PMD over BlueField-2 
> SmartNIC and ConnectX-6 Dx to support existing rte_cryptodev API
> 
> 
> Regex PMD updates:
> =
> [11] Add support for regex (regular expression engine in BlueField-2) with
> chained mbufs
> Motivation: Allow regex to efficiently handle jobs that require multiple
> chained mbufs
> 
> 
> testpmd updates: 
> 
> testpmd updated to support the changes listed above
>  
> 
> flow-perf updates:
> 
> enhance flow-perf application to support the connection tracking window 
> validation offload






Re: [dpdk-dev] DPDK 21.05 Wangxun Roadmap

2021-03-15 Thread Thomas Monjalon
The addition of a new driver has been added to the web page:
https://core.dpdk.org/roadmap/#2105


04/03/2021 09:30, Jiawen Wu:
> Here is Wangxun's roadmap for DPDK 21.05.
> 
> Bug fixes:
> [1] fix TXGBE Rx drop statistics
> [2] fix TXGBE packet type
> [3] fix TXGBE IPsec
> [4] fix TXGBE backplane link process, and support to control training for
> auto-negotiation
> 
> Others:
> [1] remove redundant code for TXGBE
> 
> New PMD:
> [1] add a new PMD for Wangxun 1Gb NICs





Re: [dpdk-dev] [PATCH 2/3] net/virtio: allocate fake mbuf in Rx queue

2021-03-15 Thread Maxime Coquelin



On 1/11/21 6:39 AM, Xia, Chenbo wrote:
> Hi Maxime,
> 
>> -Original Message-
>> From: Maxime Coquelin 
>> Sent: Tuesday, December 22, 2020 12:15 AM
>> To: dev@dpdk.org; Xia, Chenbo ; amore...@redhat.com;
>> david.march...@redhat.com; olivier.m...@6wind.com
>> Cc: Maxime Coquelin 
>> Subject: [PATCH 2/3] net/virtio: allocate fake mbuf in Rx queue
>>
>> While it is worth clarifying whether the fake mbuf
>> in virtnet_rx struct is really necessary, it is sure
>> that it heavily impacts cache usage by being part of
>> the struct. Indeed, it takes two cachelines, and
>> requires alignment on a cacheline.
>>
>> Before this series, it means it took 120 bytes in
>> virtnet_rx struct:
>>
>> struct virtnet_rx {
>>  struct virtqueue * vq;   /*     0     8 */
>>
>>  /* XXX 56 bytes hole, try to pack */
>>
>>  /* --- cacheline 1 boundary (64 bytes) --- */
>>  struct rte_mbuf fake_mbuf __attribute__((__aligned__(64))); /*    64   128 */
>>  /* --- cacheline 3 boundary (192 bytes) --- */
>>
>> This patch allocates it using malloc in order to optimize
>> virtnet_rx cache usage and so virtqueue cache usage.
>>
>> Signed-off-by: Maxime Coquelin 
>> ---
>>  drivers/net/virtio/virtio_ethdev.c | 10 ++
>>  drivers/net/virtio/virtio_rxtx.c   |  8 +++-
>>  drivers/net/virtio/virtio_rxtx.h   |  2 +-
>>  3 files changed, 14 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/net/virtio/virtio_ethdev.c
>> b/drivers/net/virtio/virtio_ethdev.c
>> index 297c01a70d..a1351b36ca 100644
>> --- a/drivers/net/virtio/virtio_ethdev.c
>> +++ b/drivers/net/virtio/virtio_ethdev.c
>> @@ -539,6 +539,7 @@ virtio_init_queue(struct rte_eth_dev *dev, uint16_t
>> queue_idx)
>>  }
>>
>>  if (queue_type == VTNET_RQ) {
>> +struct rte_mbuf *fake_mbuf;
>>  size_t sz_sw = (RTE_PMD_VIRTIO_RX_MAX_BURST + vq_size) *
>> sizeof(vq->sw_ring[0]);
>>
>> @@ -550,10 +551,18 @@ virtio_init_queue(struct rte_eth_dev *dev, uint16_t
>> queue_idx)
>>  goto fail_q_alloc;
>>  }
>>
>> +fake_mbuf = malloc(sizeof(*fake_mbuf));
>> +if (!fake_mbuf) {
>> +PMD_INIT_LOG(ERR, "can not allocate fake mbuf");
>> +ret = -ENOMEM;
>> +goto fail_q_alloc;
>> +}
>> +
>>  vq->sw_ring = sw_ring;
>>  rxvq = &vq->rxq;
>>  rxvq->port_id = dev->data->port_id;
>>  rxvq->mz = mz;
>> +rxvq->fake_mbuf = fake_mbuf;
>>  } else if (queue_type == VTNET_TQ) {
>>  txvq = &vq->txq;
>>  txvq->port_id = dev->data->port_id;
>> @@ -636,6 +645,7 @@ virtio_free_queues(struct virtio_hw *hw)
>>
>>  queue_type = virtio_get_queue_type(hw, i);
>>  if (queue_type == VTNET_RQ) {
>> +free(vq->rxq.fake_mbuf);
> 
> After thinking about this again, although you add the free of the fake mbuf
> here, it's better to also free it in virtio_init_queue after fail_q_alloc.
> And when setup_queue(hw, vq) fails, it's better to goto fail_q_alloc to
> free the fake mbuf. Currently there is no memory leak, as we call
> virtio_free_queues when virtio_alloc_queues fails. But inside
> virtio_init_queue, it's better to handle the errors properly. If you agree
> with the above, it may also be good to change the name 'fail_q_alloc', since
> now it may also be reached when setting up queues fails.


The error path indeed needs some rework.
I will add a preliminary patch to rework it before this patch is
applied.
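
As a rough illustration of the kind of rework discussed (this is not the actual
follow-up patch), an error path with one label per acquired resource could be
shaped like the sketch below. struct demo_queue, demo_init_queue() and the
fail_setup flag are hypothetical stand-ins, and plain malloc()/free() replace
the real allocators.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-queue resources discussed above. */
struct demo_queue {
	void *sw_ring;
	void *fake_mbuf;
};

/* fail_setup != 0 simulates a failure of the final queue setup step. */
static int
demo_init_queue(struct demo_queue *q, int fail_setup)
{
	int ret;

	q->sw_ring = NULL;
	q->fake_mbuf = NULL;

	q->sw_ring = malloc(64);
	if (q->sw_ring == NULL) {
		ret = -ENOMEM;
		goto fail;
	}

	q->fake_mbuf = malloc(128);
	if (q->fake_mbuf == NULL) {
		ret = -ENOMEM;
		goto free_sw_ring;
	}

	if (fail_setup) {
		ret = -EINVAL;
		goto free_fake_mbuf;
	}

	return 0;

	/* Labels unwind in reverse order of allocation. */
free_fake_mbuf:
	free(q->fake_mbuf);
	q->fake_mbuf = NULL;
free_sw_ring:
	free(q->sw_ring);
	q->sw_ring = NULL;
fail:
	return ret;
}
```

With this shape the init function cleans up after itself, so it no longer
relies on the caller invoking a separate free routine on failure.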

> Sorry for an extra email about this...

No worries, that's much appreciated!

Thanks,
Maxime

> Thanks,
> Chenbo
> 
>>  rte_free(vq->sw_ring);
>>  rte_memzone_free(vq->rxq.mz);
>>  } else if (queue_type == VTNET_TQ) {
>> diff --git a/drivers/net/virtio/virtio_rxtx.c
>> b/drivers/net/virtio/virtio_rxtx.c
>> index 1fcce36cbd..d147d7300a 100644
>> --- a/drivers/net/virtio/virtio_rxtx.c
>> +++ b/drivers/net/virtio/virtio_rxtx.c
>> @@ -703,11 +703,9 @@ virtio_dev_rx_queue_setup_finish(struct rte_eth_dev 
>> *dev,
>> uint16_t queue_idx)
>>  virtio_rxq_vec_setup(rxvq);
>>  }
>>
>> -memset(&rxvq->fake_mbuf, 0, sizeof(rxvq->fake_mbuf));
>> -for (desc_idx = 0; desc_idx < RTE_PMD_VIRTIO_RX_MAX_BURST;
>> - desc_idx++) {
>> -vq->sw_ring[vq->vq_nentries + desc_idx] =
>> -&rxvq->fake_mbuf;
>> +memset(rxvq->fake_mbuf, 0, sizeof(*rxvq->fake_mbuf));
>> +for (desc_idx = 0; desc_idx < RTE_PMD_VIRTIO_RX_MAX_BURST; desc_idx++) {
>> +vq->sw_ring[vq->vq_nentries + desc_idx] = rxvq->fake_mbuf;
>>  }
>>
>>  if (hw->use_vec_rx && !virtio_with_packed_queue(hw)) {
>> diff --git a/drivers/net/virtio/virtio_rxtx.h
>> b/drivers/net/virtio/virtio_rxtx.h
>> index 7f1036be6f..6ce5d67d15 100644
>> --- a/drivers/net/virtio/virtio_rxtx.h
>> +++ b/drivers/net/virtio/

[dpdk-dev] [PATCH v2 0/8] common/sfc_efx: prepare to introduce vDPA driver

2021-03-15 Thread Andrew Rybchenko
Update base driver to provide functionality required by vDPA driver.

Factor out helper functions to be shared by net and vDPA drivers.

v2:
 - fix windows build breakage - do not build common/sfc_efx in the case
   of windows
 - remove undefined efx_virtio_* functions from version.map (since
   EFSYS_OPT_VIRTIO is disabled)

Vijay Kumar Srivastava (6):
  common/sfc_efx/base: add virtio build dependency
  common/sfc_efx/base: add support to get virtio features
  common/sfc_efx/base: add support to verify virtio features
  common/sfc_efx: add support to get the device class
  net/sfc: skip driver probe for incompatible device class
  drivers: add common driver API to get efx family

Vijay Srivastava (2):
  common/sfc_efx/base: add base virtio support for vDPA
  common/sfc_efx/base: add API to get VirtQ doorbell offset

 doc/guides/nics/sfc_efx.rst|   8 +
 drivers/common/meson.build |   2 +-
 drivers/common/sfc_efx/base/efx.h  | 142 
 drivers/common/sfc_efx/base/efx_check.h|   9 +
 drivers/common/sfc_efx/base/efx_impl.h |  42 +++
 drivers/common/sfc_efx/base/efx_virtio.c   | 340 ++
 drivers/common/sfc_efx/base/meson.build|   2 +
 drivers/common/sfc_efx/base/rhead_impl.h   |  37 ++
 drivers/common/sfc_efx/base/rhead_virtio.c | 379 +
 drivers/common/sfc_efx/efsys.h |   2 +
 drivers/common/sfc_efx/meson.build |   5 +
 drivers/common/sfc_efx/sfc_efx.c   | 105 ++
 drivers/common/sfc_efx/sfc_efx.h   |  44 +++
 drivers/common/sfc_efx/version.map |   3 +
 drivers/meson.build|   1 +
 drivers/net/sfc/sfc.c  |  61 +---
 drivers/net/sfc/sfc.h  |   1 +
 drivers/net/sfc/sfc_ethdev.c   |   7 +
 drivers/net/sfc/sfc_kvargs.c   |   1 +
 19 files changed, 1133 insertions(+), 58 deletions(-)
 create mode 100644 drivers/common/sfc_efx/base/efx_virtio.c
 create mode 100644 drivers/common/sfc_efx/base/rhead_virtio.c
 create mode 100644 drivers/common/sfc_efx/sfc_efx.h

-- 
2.30.1



[dpdk-dev] [PATCH v2 2/8] common/sfc_efx/base: add API to get VirtQ doorbell offset

2021-03-15 Thread Andrew Rybchenko
From: Vijay Srivastava 

Add an API to query the virtqueue doorbell offset in the BAR for a VI.
For vDPA, the virtio net driver notifies the device directly by writing
to the doorbell. This API would be invoked from the vDPA client driver.
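
For illustration only, a client that has obtained the offset could notify the
device roughly as in the sketch below. The mapped BAR pointer, the 32-bit write
width, the written value and demo_ring_doorbell() are all assumptions of this
example and are not part of the API added by this patch.

```c
#include <stdint.h>

/*
 * Write a notification value at BAR base + doorbell offset.
 * bar_base is assumed to point at the mapped BAR, and offset is assumed
 * to be 4-byte aligned (hypothetical helper, not part of libefx).
 */
static inline void
demo_ring_doorbell(volatile uint8_t *bar_base, uint32_t offset, uint32_t value)
{
	volatile uint32_t *db = (volatile uint32_t *)(bar_base + offset);

	*db = value;
}
```

In a real driver the offset would come from efx_virtio_get_doorbell_offset()
and the base from the PCI BAR mapping.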

Signed-off-by: Vijay Srivastava 
Signed-off-by: Andrew Rybchenko 
---
 drivers/common/sfc_efx/base/efx.h  | 12 +++
 drivers/common/sfc_efx/base/efx_impl.h |  2 +
 drivers/common/sfc_efx/base/efx_virtio.c   | 41 ++
 drivers/common/sfc_efx/base/rhead_impl.h   |  6 ++
 drivers/common/sfc_efx/base/rhead_virtio.c | 93 ++
 5 files changed, 154 insertions(+)

diff --git a/drivers/common/sfc_efx/base/efx.h 
b/drivers/common/sfc_efx/base/efx.h
index c2c73bd382..d4b7d7f47e 100644
--- a/drivers/common/sfc_efx/base/efx.h
+++ b/drivers/common/sfc_efx/base/efx.h
@@ -4475,6 +4475,18 @@ extern   void
 efx_virtio_qdestroy(
    __in            efx_virtio_vq_t *evvp);
 
+/*
+ * Get the offset in the BAR of the doorbells for a VI.
+ * net device : doorbell offset of RX & TX queues
+ * block device : request doorbell offset in the BAR.
+ * For further details refer section of 4 of SF-119689
+ */
+LIBEFX_API
+extern __checkReturn   efx_rc_t
+efx_virtio_get_doorbell_offset(
+   __in            efx_virtio_vq_t *evvp,
+   __out           uint32_t *offsetp);
+
 #endif /* EFSYS_OPT_VIRTIO */
 
 #ifdef __cplusplus
diff --git a/drivers/common/sfc_efx/base/efx_impl.h 
b/drivers/common/sfc_efx/base/efx_impl.h
index f27d9fa82c..d6742f4a8c 100644
--- a/drivers/common/sfc_efx/base/efx_impl.h
+++ b/drivers/common/sfc_efx/base/efx_impl.h
@@ -316,6 +316,8 @@ typedef struct efx_virtio_ops_s {
                            efx_virtio_vq_dyncfg_t *);
    efx_rc_t        (*evo_virtio_qstop)(efx_virtio_vq_t *,
                            efx_virtio_vq_dyncfg_t *);
+   efx_rc_t        (*evo_get_doorbell_offset)(efx_virtio_vq_t *,
+                           uint32_t *);
 } efx_virtio_ops_t;
 #endif /* EFSYS_OPT_VIRTIO */
 
diff --git a/drivers/common/sfc_efx/base/efx_virtio.c 
b/drivers/common/sfc_efx/base/efx_virtio.c
index 1b7b01556e..de998fcad9 100644
--- a/drivers/common/sfc_efx/base/efx_virtio.c
+++ b/drivers/common/sfc_efx/base/efx_virtio.c
@@ -12,6 +12,7 @@
 static const efx_virtio_ops_t  __efx_virtio_rhead_ops = {
    rhead_virtio_qstart,                    /* evo_virtio_qstart */
    rhead_virtio_qstop,                     /* evo_virtio_qstop */
+   rhead_virtio_get_doorbell_offset,       /* evo_get_doorbell_offset */
 };
 #endif /* EFSYS_OPT_RIVERHEAD */
 
@@ -213,4 +214,44 @@ efx_virtio_qdestroy(
}
 }
 
+   __checkReturn   efx_rc_t
+efx_virtio_get_doorbell_offset(
+   __in            efx_virtio_vq_t *evvp,
+   __out           uint32_t *offsetp)
+{
+   efx_nic_t *enp;
+   const efx_virtio_ops_t *evop;
+   efx_rc_t rc;
+
+   if ((evvp == NULL) || (offsetp == NULL)) {
+   rc = EINVAL;
+   goto fail1;
+   }
+
+   enp = evvp->evv_enp;
+   evop = enp->en_evop;
+
+   EFSYS_ASSERT3U(enp->en_magic, ==, EFX_NIC_MAGIC);
+   EFSYS_ASSERT3U(enp->en_mod_flags, &, EFX_MOD_VIRTIO);
+
+   if (evop == NULL) {
+   rc = ENOTSUP;
+   goto fail2;
+   }
+
+   if ((rc = evop->evo_get_doorbell_offset(evvp, offsetp)) != 0)
+   goto fail3;
+
+   return (0);
+
+fail3:
+   EFSYS_PROBE(fail3);
+fail2:
+   EFSYS_PROBE(fail2);
+fail1:
+   EFSYS_PROBE1(fail1, efx_rc_t, rc);
+
+   return (rc);
+}
+
 #endif /* EFSYS_OPT_VIRTIO */
diff --git a/drivers/common/sfc_efx/base/rhead_impl.h 
b/drivers/common/sfc_efx/base/rhead_impl.h
index a15ac52a58..4304f63f4c 100644
--- a/drivers/common/sfc_efx/base/rhead_impl.h
+++ b/drivers/common/sfc_efx/base/rhead_impl.h
@@ -492,6 +492,12 @@ rhead_virtio_qstop(
    __in            efx_virtio_vq_t *evvp,
    __out_opt       efx_virtio_vq_dyncfg_t *evvdp);
 
+LIBEFX_INTERNAL
+extern __checkReturn   efx_rc_t
+rhead_virtio_get_doorbell_offset(
+   __in            efx_virtio_vq_t *evvp,
+   __out           uint32_t *offsetp);
+
 #endif /* EFSYS_OPT_VIRTIO */
 
 #ifdef __cplusplus
diff --git a/drivers/common/sfc_efx/base/rhead_virtio.c 
b/drivers/common/sfc_efx/base/rhead_virtio.c
index d1719f834e..147460c95c 100644
--- a/drivers/common/sfc_efx/base/rhead_virtio.c
+++ b/drivers/common/sfc_efx/base/rhead_virtio.c
@@ -187,4 +187,97 @@ rhead_virtio_qstop(
return (rc);
 }
 
+   __checkReturn   efx_rc_t
+rhead_virtio_get_doorbell_offset(
+   __in            efx_virtio_vq_t *evvp,
+   __out           uint32_t *offsetp)
+{
+   efx_nic_t *enp = evvp->evv_enp;
+   efx_mcdi_req_t req;
+   uint32_t type;
+   EFX_MCDI_DECLARE_BUF(payload, MC_CMD_VIRTIO_GET_DOORBELL_OFFSET_REQ_LEN,
+   MC_CMD_VIRTIO_GET_NET_DOORBELL_OFFSET_RESP_LEN);
+   efx_rc_t

[dpdk-dev] [PATCH v2 1/8] common/sfc_efx/base: add base virtio support for vDPA

2021-03-15 Thread Andrew Rybchenko
From: Vijay Srivastava 

In the vDPA mode, only data path is offloaded in the hardware and
control path still goes through the hypervisor and it configures
virtqueues via vDPA driver so new virtqueue APIs are required.

Implement virtio init/fini and virtqueue create/destroy APIs.

Signed-off-by: Vijay Srivastava 
Signed-off-by: Andrew Rybchenko 
---
 drivers/common/sfc_efx/base/efx.h  | 109 +++
 drivers/common/sfc_efx/base/efx_check.h|   6 +
 drivers/common/sfc_efx/base/efx_impl.h |  36 
 drivers/common/sfc_efx/base/efx_virtio.c   | 216 +
 drivers/common/sfc_efx/base/meson.build|   2 +
 drivers/common/sfc_efx/base/rhead_impl.h   |  17 ++
 drivers/common/sfc_efx/base/rhead_virtio.c | 190 ++
 drivers/common/sfc_efx/efsys.h |   2 +
 8 files changed, 578 insertions(+)
 create mode 100644 drivers/common/sfc_efx/base/efx_virtio.c
 create mode 100644 drivers/common/sfc_efx/base/rhead_virtio.c

diff --git a/drivers/common/sfc_efx/base/efx.h 
b/drivers/common/sfc_efx/base/efx.h
index 2c820022b2..c2c73bd382 100644
--- a/drivers/common/sfc_efx/base/efx.h
+++ b/drivers/common/sfc_efx/base/efx.h
@@ -4368,6 +4368,115 @@ efx_mae_action_rule_remove(
 
 #endif /* EFSYS_OPT_MAE */
 
+#if EFSYS_OPT_VIRTIO
+
+/* A Virtio net device can have one or more pairs of Rx/Tx virtqueues
+ * while virtio block device has a single virtqueue,
+ * for further details refer section of 4.2.3 of SF-120734
+ */
+typedef enum efx_virtio_vq_type_e {
+   EFX_VIRTIO_VQ_TYPE_NET_RXQ,
+   EFX_VIRTIO_VQ_TYPE_NET_TXQ,
+   EFX_VIRTIO_VQ_TYPE_BLOCK,
+   EFX_VIRTIO_VQ_NTYPES
+} efx_virtio_vq_type_t;
+
+typedef struct efx_virtio_vq_dyncfg_s {
+   /*
+    * If queue is being created to be migrated then this
+    * should be the FINAL_PIDX value returned by MC_CMD_VIRTIO_FINI_QUEUE
+    * of the queue being migrated from. Otherwise, it should be zero.
+    */
+   uint32_t                evvd_vq_pidx;
+   /*
+    * If this queue is being created to be migrated then this
+    * should be the FINAL_CIDX value returned by MC_CMD_VIRTIO_FINI_QUEUE
+    * of the queue being migrated from. Otherwise, it should be zero.
+    */
+   uint32_t                evvd_vq_cidx;
+} efx_virtio_vq_dyncfg_t;
+
+/*
+ * Virtqueue size must be a power of 2, maximum size is 32768
+ * (see VIRTIO v1.1 section 2.6)
+ */
+#define EFX_VIRTIO_MAX_VQ_SIZE 0x8000
+
+typedef struct efx_virtio_vq_cfg_s {
+   unsigned int            evvc_vq_num;
+   efx_virtio_vq_type_t    evvc_type;
+   /*
+    * vDPA as VF : It is target VF number if queue is being created on VF.
+    * vDPA as PF : If queue to be created on PF then it should be
+    * EFX_PCI_VF_INVALID.
+    */
+   uint16_t                evvc_target_vf;
+   /*
+    * Maximum virtqueue size is EFX_VIRTIO_MAX_VQ_SIZE and
+    * virtqueue size 0 means the queue is unavailable.
+    */
+   uint32_t                evvc_vq_size;
+   efsys_dma_addr_t        evvc_desc_tbl_addr;
+   efsys_dma_addr_t        evvc_avail_ring_addr;
+   efsys_dma_addr_t        evvc_used_ring_addr;
+   /* MSIX vector number for the virtqueue or 0x if MSIX is not used */
+   uint16_t                evvc_msix_vector;
+   /*
+    * evvc_pas_id contains a PCIe address space identifier if the queue
+    * uses PASID.
+    */
+   boolean_t               evvc_use_pasid;
+   uint32_t                evvc_pas_id;
+   /* Negotiated virtio features to be applied to this virtqueue */
+   uint64_t                evcc_features;
+} efx_virtio_vq_cfg_t;
+
+typedef struct efx_virtio_vq_s efx_virtio_vq_t;
+
+LIBEFX_API
+extern __checkReturn   efx_rc_t
+efx_virtio_init(
+   __in            efx_nic_t *enp);
+
+LIBEFX_API
+extern void
+efx_virtio_fini(
+   __in            efx_nic_t *enp);
+
+/*
+ * When virtio net driver in the guest sets VIRTIO_CONFIG_STATUS_DRIVER_OK bit,
+ * hypervisor starts configuring all the virtqueues in the device. When the
+ * vhost_user has received VHOST_USER_SET_VRING_ENABLE for all the virtqueues,
+ * then it invokes VDPA driver callback dev_conf. APIs qstart and qcreate would
+ * be invoked from dev_conf callback to create the virtqueues, For further
+ * details refer SF-122427.
+ */
+LIBEFX_API
+extern __checkReturn   efx_rc_t
+efx_virtio_qcreate(
+   __in            efx_nic_t *enp,
+   __deref_out     efx_virtio_vq_t **evvpp);
+
+LIBEFX_API
+extern __checkReturn   efx_rc_t
+efx_virtio_qstart(
+   __in            efx_virtio_vq_t *evvp,
+   __in            efx_virtio_vq_cfg_t *evvcp,
+   __in_opt        efx_virtio_vq_dyncfg_t *evvdp);
+
+LIBEFX_API
+extern __checkReturn   efx_rc_t
+efx_virtio_qstop(
+   __in            efx_virtio_vq_t *evvp,
+   __out_opt       efx_virtio_vq_dyncfg_t *evvdp);
+
+LIBEFX_API
+extern void
+efx_virtio_qdestroy(
+

[dpdk-dev] [PATCH v2 5/8] common/sfc_efx/base: add support to verify virtio features

2021-03-15 Thread Andrew Rybchenko
From: Vijay Kumar Srivastava 

Add an API to verify virtio features supported by device.

Signed-off-by: Vijay Kumar Srivastava 
Signed-off-by: Andrew Rybchenko 
---
 drivers/common/sfc_efx/base/efx.h  |  7 
 drivers/common/sfc_efx/base/efx_impl.h |  2 +
 drivers/common/sfc_efx/base/efx_virtio.c   | 38 +++
 drivers/common/sfc_efx/base/rhead_impl.h   |  7 
 drivers/common/sfc_efx/base/rhead_virtio.c | 44 ++
 5 files changed, 98 insertions(+)

diff --git a/drivers/common/sfc_efx/base/efx.h 
b/drivers/common/sfc_efx/base/efx.h
index e3ac51eae0..ff5091a36b 100644
--- a/drivers/common/sfc_efx/base/efx.h
+++ b/drivers/common/sfc_efx/base/efx.h
@@ -4501,6 +4501,13 @@ efx_virtio_get_features(
    __in            efx_virtio_device_type_t type,
    __out           uint64_t *featuresp);
 
+LIBEFX_API
+extern __checkReturn   efx_rc_t
+efx_virtio_verify_features(
+   __in            efx_nic_t *enp,
+   __in            efx_virtio_device_type_t type,
+   __in            uint64_t features);
+
 #endif /* EFSYS_OPT_VIRTIO */
 
 #ifdef __cplusplus
diff --git a/drivers/common/sfc_efx/base/efx_impl.h 
b/drivers/common/sfc_efx/base/efx_impl.h
index 758206d382..aa878014c1 100644
--- a/drivers/common/sfc_efx/base/efx_impl.h
+++ b/drivers/common/sfc_efx/base/efx_impl.h
@@ -320,6 +320,8 @@ typedef struct efx_virtio_ops_s {
                            uint32_t *);
    efx_rc_t        (*evo_get_features)(efx_nic_t *,
                            efx_virtio_device_type_t, uint64_t *);
+   efx_rc_t        (*evo_verify_features)(efx_nic_t *,
+                           efx_virtio_device_type_t, uint64_t);
 } efx_virtio_ops_t;
 #endif /* EFSYS_OPT_VIRTIO */
 
diff --git a/drivers/common/sfc_efx/base/efx_virtio.c 
b/drivers/common/sfc_efx/base/efx_virtio.c
index 20c22f02b5..b46997c09e 100644
--- a/drivers/common/sfc_efx/base/efx_virtio.c
+++ b/drivers/common/sfc_efx/base/efx_virtio.c
@@ -14,6 +14,7 @@ static const efx_virtio_ops_t __efx_virtio_rhead_ops = {
rhead_virtio_qstop, /* evo_virtio_qstop */
rhead_virtio_get_doorbell_offset,   /* evo_get_doorbell_offset */
rhead_virtio_get_features,  /* evo_get_features */
+   rhead_virtio_verify_features,   /* evo_verify_features */
 };
 #endif /* EFSYS_OPT_RIVERHEAD */
 
@@ -299,4 +300,41 @@ efx_virtio_get_features(
return (rc);
 }
 
+   __checkReturn   efx_rc_t
+efx_virtio_verify_features(
+   __in    efx_nic_t *enp,
+   __in    efx_virtio_device_type_t type,
+   __in    uint64_t features)
+{
+   const efx_virtio_ops_t *evop = enp->en_evop;
+   efx_rc_t rc;
+
+   if (type >= EFX_VIRTIO_DEVICE_NTYPES) {
+   rc = EINVAL;
+   goto fail1;
+   }
+
+   EFSYS_ASSERT3U(enp->en_magic, ==, EFX_NIC_MAGIC);
+   EFSYS_ASSERT3U(enp->en_mod_flags, &, EFX_MOD_VIRTIO);
+
+   if (evop == NULL) {
+   rc = ENOTSUP;
+   goto fail2;
+   }
+
+   if ((rc = evop->evo_verify_features(enp, type, features)) != 0)
+   goto fail3;
+
+   return (0);
+
+fail3:
+   EFSYS_PROBE(fail3);
+fail2:
+   EFSYS_PROBE(fail2);
+fail1:
+   EFSYS_PROBE1(fail1, efx_rc_t, rc);
+
+   return (rc);
+}
+
 #endif /* EFSYS_OPT_VIRTIO */
diff --git a/drivers/common/sfc_efx/base/rhead_impl.h b/drivers/common/sfc_efx/base/rhead_impl.h
index 69d701a47e..3bf9beceb0 100644
--- a/drivers/common/sfc_efx/base/rhead_impl.h
+++ b/drivers/common/sfc_efx/base/rhead_impl.h
@@ -505,6 +505,13 @@ rhead_virtio_get_features(
    __in    efx_virtio_device_type_t type,
    __out   uint64_t *featuresp);
 
+LIBEFX_INTERNAL
+extern __checkReturn   efx_rc_t
+rhead_virtio_verify_features(
+   __in    efx_nic_t *enp,
+   __in    efx_virtio_device_type_t type,
+   __in    uint64_t features);
+
 #endif /* EFSYS_OPT_VIRTIO */
 
 #ifdef __cplusplus
diff --git a/drivers/common/sfc_efx/base/rhead_virtio.c b/drivers/common/sfc_efx/base/rhead_virtio.c
index 508d03d58f..0023ea1e83 100644
--- a/drivers/common/sfc_efx/base/rhead_virtio.c
+++ b/drivers/common/sfc_efx/base/rhead_virtio.c
@@ -332,4 +332,48 @@ rhead_virtio_get_features(
return (rc);
 }
 
+   __checkReturn   efx_rc_t
+rhead_virtio_verify_features(
+   __in    efx_nic_t *enp,
+   __in    efx_virtio_device_type_t type,
+   __in    uint64_t features)
+{
+   efx_mcdi_req_t req;
+   EFX_MCDI_DECLARE_BUF(payload, MC_CMD_VIRTIO_TEST_FEATURES_IN_LEN,
+   MC_CMD_VIRTIO_TEST_FEATURES_OUT_LEN);
+   efx_rc_t rc;
+
+   EFX_STATIC_ASSERT(EFX_VIRTIO_DEVICE_TYPE_NET ==
+   MC_CMD_VIRTIO_GET_FEATURES_IN_NET);
+   EFX_STATIC_ASSERT(EFX_VIRTIO_DEVICE_TYPE_BLOCK ==
+  

[dpdk-dev] [PATCH v2 3/8] common/sfc_efx/base: add virtio build dependency

2021-03-15 Thread Andrew Rybchenko
From: Vijay Kumar Srivastava 

Add EFSYS_HAS_UINT64 build dependency on EFSYS_OPT_VIRTIO.
virtio features are represented as bitmask in 64-bit unsigned
integer.

Signed-off-by: Vijay Kumar Srivastava 
Signed-off-by: Andrew Rybchenko 
---
 drivers/common/sfc_efx/base/efx_check.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/common/sfc_efx/base/efx_check.h b/drivers/common/sfc_efx/base/efx_check.h
index 86a6d92fef..66b38eeae0 100644
--- a/drivers/common/sfc_efx/base/efx_check.h
+++ b/drivers/common/sfc_efx/base/efx_check.h
@@ -411,6 +411,9 @@
 # if !EFSYS_OPT_RIVERHEAD
 #  error "VIRTIO requires RIVERHEAD"
 # endif
+# if !EFSYS_HAS_UINT64
+#  error "VIRTIO requires UINT64"
+# endif
 #endif /* EFSYS_OPT_VIRTIO */
 
 #endif /* _SYS_EFX_CHECK_H */
-- 
2.30.1
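
As a rough illustration of why a 64-bit type is required here: virtio feature bits above bit 31 (for example VIRTIO_F_VERSION_1, bit 32 in the virtio specification) cannot be held in a 32-bit mask. The sketch below is standalone and does not use libefx; the helper names feature_bit() and features_subset() are hypothetical and exist only for this example. The actual feature checks in this series are done through the efx_virtio_* API, not by code like this.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as defined by the virtio specification. */
#define VIRTIO_NET_F_CSUM   0
#define VIRTIO_NET_F_MAC    5
#define VIRTIO_F_VERSION_1  32  /* does not fit in a 32-bit mask */

/* Hypothetical helper: build a mask with a single feature bit set. */
static inline uint64_t
feature_bit(unsigned int bit)
{
	return UINT64_C(1) << bit;
}

/* Hypothetical helper: true if every requested feature is supported. */
static inline bool
features_subset(uint64_t requested, uint64_t supported)
{
	return (requested & ~supported) == 0;
}
```

With a supported mask of CSUM | VERSION_1, a request for VERSION_1 alone would pass the subset check, while a request for MAC would not.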



[dpdk-dev] [PATCH v2 4/8] common/sfc_efx/base: add support to get virtio features

2021-03-15 Thread Andrew Rybchenko
From: Vijay Kumar Srivastava 

Add an API to get virtio features supported by device.

Signed-off-by: Vijay Kumar Srivastava 
Signed-off-by: Andrew Rybchenko 
---
 drivers/common/sfc_efx/base/efx.h  | 14 ++
 drivers/common/sfc_efx/base/efx_impl.h |  2 +
 drivers/common/sfc_efx/base/efx_virtio.c   | 45 +++
 drivers/common/sfc_efx/base/rhead_impl.h   |  7 +++
 drivers/common/sfc_efx/base/rhead_virtio.c | 52 ++
 5 files changed, 120 insertions(+)

diff --git a/drivers/common/sfc_efx/base/efx.h b/drivers/common/sfc_efx/base/efx.h
index d4b7d7f47e..e3ac51eae0 100644
--- a/drivers/common/sfc_efx/base/efx.h
+++ b/drivers/common/sfc_efx/base/efx.h
@@ -4433,6 +4433,13 @@ typedef struct efx_virtio_vq_cfg_s {
 
 typedef struct efx_virtio_vq_s efx_virtio_vq_t;
 
+typedef enum efx_virtio_device_type_e {
+   EFX_VIRTIO_DEVICE_TYPE_RESERVED,
+   EFX_VIRTIO_DEVICE_TYPE_NET,
+   EFX_VIRTIO_DEVICE_TYPE_BLOCK,
+   EFX_VIRTIO_DEVICE_NTYPES
+} efx_virtio_device_type_t;
+
 LIBEFX_API
 extern __checkReturn   efx_rc_t
 efx_virtio_init(
@@ -4487,6 +4494,13 @@ efx_virtio_get_doorbell_offset(
    __in    efx_virtio_vq_t *evvp,
    __out   uint32_t *offsetp);
 
+LIBEFX_API
+extern __checkReturn   efx_rc_t
+efx_virtio_get_features(
+   __in    efx_nic_t *enp,
+   __in    efx_virtio_device_type_t type,
+   __out   uint64_t *featuresp);
+
 #endif /* EFSYS_OPT_VIRTIO */
 
 #ifdef __cplusplus
diff --git a/drivers/common/sfc_efx/base/efx_impl.h b/drivers/common/sfc_efx/base/efx_impl.h
index d6742f4a8c..758206d382 100644
--- a/drivers/common/sfc_efx/base/efx_impl.h
+++ b/drivers/common/sfc_efx/base/efx_impl.h
@@ -318,6 +318,8 @@ typedef struct efx_virtio_ops_s {
efx_virtio_vq_dyncfg_t *);
    efx_rc_t    (*evo_get_doorbell_offset)(efx_virtio_vq_t *,
            uint32_t *);
+   efx_rc_t    (*evo_get_features)(efx_nic_t *,
+           efx_virtio_device_type_t, uint64_t *);
 } efx_virtio_ops_t;
 #endif /* EFSYS_OPT_VIRTIO */
 
diff --git a/drivers/common/sfc_efx/base/efx_virtio.c b/drivers/common/sfc_efx/base/efx_virtio.c
index de998fcad9..20c22f02b5 100644
--- a/drivers/common/sfc_efx/base/efx_virtio.c
+++ b/drivers/common/sfc_efx/base/efx_virtio.c
@@ -13,6 +13,7 @@ static const efx_virtio_ops_t __efx_virtio_rhead_ops = {
rhead_virtio_qstart,/* evo_virtio_qstart */
rhead_virtio_qstop, /* evo_virtio_qstop */
rhead_virtio_get_doorbell_offset,   /* evo_get_doorbell_offset */
+   rhead_virtio_get_features,  /* evo_get_features */
 };
 #endif /* EFSYS_OPT_RIVERHEAD */
 
@@ -254,4 +255,48 @@ efx_virtio_get_doorbell_offset(
return (rc);
 }
 
+   __checkReturn   efx_rc_t
+efx_virtio_get_features(
+   __in    efx_nic_t *enp,
+   __in    efx_virtio_device_type_t type,
+   __out   uint64_t *featuresp)
+{
+   const efx_virtio_ops_t *evop = enp->en_evop;
+   efx_rc_t rc;
+
+   if (featuresp == NULL) {
+   rc = EINVAL;
+   goto fail1;
+   }
+
+   if (type >= EFX_VIRTIO_DEVICE_NTYPES) {
+   rc = EINVAL;
+   goto fail2;
+   }
+
+   EFSYS_ASSERT3U(enp->en_magic, ==, EFX_NIC_MAGIC);
+   EFSYS_ASSERT3U(enp->en_mod_flags, &, EFX_MOD_VIRTIO);
+
+   if (evop == NULL) {
+   rc = ENOTSUP;
+   goto fail3;
+   }
+
+   if ((rc = evop->evo_get_features(enp, type, featuresp)) != 0)
+   goto fail4;
+
+   return (0);
+
+fail4:
+   EFSYS_PROBE(fail4);
+fail3:
+   EFSYS_PROBE(fail3);
+fail2:
+   EFSYS_PROBE(fail2);
+fail1:
+   EFSYS_PROBE1(fail1, efx_rc_t, rc);
+
+   return (rc);
+}
+
 #endif /* EFSYS_OPT_VIRTIO */
diff --git a/drivers/common/sfc_efx/base/rhead_impl.h b/drivers/common/sfc_efx/base/rhead_impl.h
index 4304f63f4c..69d701a47e 100644
--- a/drivers/common/sfc_efx/base/rhead_impl.h
+++ b/drivers/common/sfc_efx/base/rhead_impl.h
@@ -498,6 +498,13 @@ rhead_virtio_get_doorbell_offset(
    __in    efx_virtio_vq_t *evvp,
    __out   uint32_t *offsetp);
 
+LIBEFX_INTERNAL
+extern __checkReturn   efx_rc_t
+rhead_virtio_get_features(
+   __in    efx_nic_t *enp,
+   __in    efx_virtio_device_type_t type,
+   __out   uint64_t *featuresp);
+
 #endif /* EFSYS_OPT_VIRTIO */
 
 #ifdef __cplusplus
diff --git a/drivers/common/sfc_efx/base/rhead_virtio.c b/drivers/common/sfc_efx/base/rhead_virtio.c
index 147460c95c..508d03d58f 100644
--- a/drivers/common/sfc_efx/base/rhead_virtio.c
+++ b/drivers/common/sfc_efx/base/rhead_virtio.c
@@ -280,4 +280,56 @@ rhead_virtio_get_doorbell_offset(
return (rc);
 }
 
+   __checkReturn  

[dpdk-dev] [PATCH v2 7/8] net/sfc: skip driver probe for incompatible device class

2021-03-15 Thread Andrew Rybchenko
From: Vijay Kumar Srivastava 

Driver would be probed only for the net device class.

Signed-off-by: Vijay Kumar Srivastava 
Signed-off-by: Andrew Rybchenko 
---
 doc/guides/nics/sfc_efx.rst  | 8 
 drivers/net/sfc/sfc.h| 1 +
 drivers/net/sfc/sfc_ethdev.c | 7 +++
 drivers/net/sfc/sfc_kvargs.c | 1 +
 4 files changed, 17 insertions(+)

diff --git a/doc/guides/nics/sfc_efx.rst b/doc/guides/nics/sfc_efx.rst
index b6047cf5c7..cf1269cc03 100644
--- a/doc/guides/nics/sfc_efx.rst
+++ b/doc/guides/nics/sfc_efx.rst
@@ -357,6 +357,14 @@ allow option like "-a 02:00.0,arg1=value1,...".
 Case-insensitive 1/y/yes/on or 0/n/no/off may be used to specify
 boolean parameters value.
 
+- ``class`` [net|vdpa] (default **net**)
+
+  Choose the mode of operation of ef100 device.
+  **net** device will work as network device and will be probed by net/sfc driver.
+  **vdpa** device will work as vdpa device and will be probed by vdpa/sfc driver.
+  If this parameter is not specified then ef100 device will operate as
+  network device.
+
 - ``rx_datapath`` [auto|efx|ef10|ef10_essb] (default **auto**)
 
   Choose receive datapath implementation.
diff --git a/drivers/net/sfc/sfc.h b/drivers/net/sfc/sfc.h
index c2945b6ba2..b48a818adb 100644
--- a/drivers/net/sfc/sfc.h
+++ b/drivers/net/sfc/sfc.h
@@ -22,6 +22,7 @@
 #include "efx.h"
 
 #include "sfc_efx_mcdi.h"
+#include "sfc_efx.h"
 
 #include "sfc_debug.h"
 #include "sfc_log.h"
diff --git a/drivers/net/sfc/sfc_ethdev.c b/drivers/net/sfc/sfc_ethdev.c
index 00a0fd3d02..23828c24ff 100644
--- a/drivers/net/sfc/sfc_ethdev.c
+++ b/drivers/net/sfc/sfc_ethdev.c
@@ -2161,6 +2161,13 @@ sfc_eth_dev_init(struct rte_eth_dev *dev)
const struct rte_ether_addr *from;
int ret;
 
+   if (sfc_efx_dev_class_get(pci_dev->device.devargs) !=
+   SFC_EFX_DEV_CLASS_NET) {
+   SFC_GENERIC_LOG(DEBUG,
+   "Incompatible device class: skip probing, should be probed by other sfc driver.");
+   return 1;
+   }
+
sfc_register_dp();
 
logtype_main = sfc_register_logtype(&pci_dev->addr,
diff --git a/drivers/net/sfc/sfc_kvargs.c b/drivers/net/sfc/sfc_kvargs.c
index c42b326ab0..0efa92ed28 100644
--- a/drivers/net/sfc/sfc_kvargs.c
+++ b/drivers/net/sfc/sfc_kvargs.c
@@ -28,6 +28,7 @@ sfc_kvargs_parse(struct sfc_adapter *sa)
SFC_KVARG_TX_DATAPATH,
SFC_KVARG_FW_VARIANT,
SFC_KVARG_RXD_WAIT_TIMEOUT_NS,
+   SFC_EFX_KVARG_DEV_CLASS,
NULL,
};
 
-- 
2.30.1



[dpdk-dev] [PATCH v2 6/8] common/sfc_efx: add support to get the device class

2021-03-15 Thread Andrew Rybchenko
From: Vijay Kumar Srivastava 

Device class argument would be used to select compatible driver.
Driver probe would be skipped for incompatible device class.

Signed-off-by: Vijay Kumar Srivastava 
Signed-off-by: Andrew Rybchenko 
---
 drivers/common/sfc_efx/sfc_efx.c   | 49 ++
 drivers/common/sfc_efx/sfc_efx.h   | 34 +
 drivers/common/sfc_efx/version.map |  2 ++
 3 files changed, 85 insertions(+)
 create mode 100644 drivers/common/sfc_efx/sfc_efx.h

diff --git a/drivers/common/sfc_efx/sfc_efx.c b/drivers/common/sfc_efx/sfc_efx.c
index d7a84c9835..a3146db255 100644
--- a/drivers/common/sfc_efx/sfc_efx.c
+++ b/drivers/common/sfc_efx/sfc_efx.c
@@ -7,12 +7,61 @@
  * for Solarflare) and Solarflare Communications, Inc.
  */
 
+#include 
 #include 
+#include 
+#include 
 
 #include "sfc_efx_log.h"
+#include "sfc_efx.h"
 
 uint32_t sfc_efx_logtype;
 
+static int
+sfc_efx_kvarg_dev_class_handler(__rte_unused const char *key,
+   const char *class_str, void *opaque)
+{
+   enum sfc_efx_dev_class *dev_class = opaque;
+
+   if (class_str == NULL)
+   return *dev_class;
+
+   if (strcmp(class_str, "vdpa") == 0) {
+   *dev_class = SFC_EFX_DEV_CLASS_VDPA;
+   } else if (strcmp(class_str, "net") == 0) {
+   *dev_class = SFC_EFX_DEV_CLASS_NET;
+   } else {
+   SFC_EFX_LOG(ERR, "Unsupported class %s.", class_str);
+   *dev_class = SFC_EFX_DEV_CLASS_INVALID;
+   }
+
+   return 0;
+}
+
+enum sfc_efx_dev_class
+sfc_efx_dev_class_get(struct rte_devargs *devargs)
+{
+   struct rte_kvargs *kvargs;
+   const char *key = SFC_EFX_KVARG_DEV_CLASS;
+   enum sfc_efx_dev_class dev_class = SFC_EFX_DEV_CLASS_NET;
+
+   if (devargs == NULL)
+   return dev_class;
+
+   kvargs = rte_kvargs_parse(devargs->args, NULL);
+   if (kvargs == NULL)
+   return dev_class;
+
+   if (rte_kvargs_count(kvargs, key) != 0) {
+   rte_kvargs_process(kvargs, key, sfc_efx_kvarg_dev_class_handler,
+  &dev_class);
+   }
+
+   rte_kvargs_free(kvargs);
+
+   return dev_class;
+}
+
 RTE_INIT(sfc_efx_register_logtype)
 {
int ret;
diff --git a/drivers/common/sfc_efx/sfc_efx.h b/drivers/common/sfc_efx/sfc_efx.h
new file mode 100644
index 00..bbccd3e9e8
--- /dev/null
+++ b/drivers/common/sfc_efx/sfc_efx.h
@@ -0,0 +1,34 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright(c) 2019-2020 Xilinx, Inc.
+ * Copyright(c) 2019 Solarflare Communications Inc.
+ *
+ * This software was jointly developed between OKTET Labs (under contract
+ * for Solarflare) and Solarflare Communications, Inc.
+ */
+
+#ifndef _SFC_EFX_H_
+#define _SFC_EFX_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define SFC_EFX_KVARG_DEV_CLASS "class"
+
+enum sfc_efx_dev_class {
+   SFC_EFX_DEV_CLASS_INVALID = 0,
+   SFC_EFX_DEV_CLASS_NET,
+   SFC_EFX_DEV_CLASS_VDPA,
+
+   SFC_EFX_DEV_NCLASS
+};
+
+__rte_internal
+enum sfc_efx_dev_class sfc_efx_dev_class_get(struct rte_devargs *devargs);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _SFC_EFX_H_ */
diff --git a/drivers/common/sfc_efx/version.map b/drivers/common/sfc_efx/version.map
index 403feeaf11..a3345d34f7 100644
--- a/drivers/common/sfc_efx/version.map
+++ b/drivers/common/sfc_efx/version.map
@@ -221,6 +221,8 @@ INTERNAL {
efx_txq_nbufs;
efx_txq_size;
 
+   sfc_efx_dev_class_get;
+
sfc_efx_mcdi_init;
sfc_efx_mcdi_fini;
 
-- 
2.30.1
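
For readers without a DPDK tree at hand, the class selection above can be approximated in plain C. This is only a sketch under stated assumptions: it does not use rte_kvargs, the function name dev_class_from_args() and the enum names are hypothetical, and it stops at the first "class" key instead of invoking a handler per occurrence as rte_kvargs_process() does. It keeps the same default (net) when the key or the devargs string is absent.

```c
#include <stddef.h>
#include <string.h>

enum dev_class { DEV_CLASS_INVALID = 0, DEV_CLASS_NET, DEV_CLASS_VDPA };

/*
 * Hypothetical stand-in for sfc_efx_dev_class_get(): scan a devargs
 * string such as "class=vdpa,rx_datapath=ef10" for the "class" key.
 */
static enum dev_class
dev_class_from_args(const char *args)
{
	static const char key[] = "class=";
	const size_t klen = sizeof(key) - 1;
	const char *p = args;

	while (p != NULL && *p != '\0') {
		const char *end = strchr(p, ',');
		size_t len = (end != NULL) ? (size_t)(end - p) : strlen(p);

		if (len >= klen && strncmp(p, key, klen) == 0) {
			const char *val = p + klen;
			size_t vlen = len - klen;

			if (vlen == 4 && strncmp(val, "vdpa", 4) == 0)
				return DEV_CLASS_VDPA;
			if (vlen == 3 && strncmp(val, "net", 3) == 0)
				return DEV_CLASS_NET;
			return DEV_CLASS_INVALID;	/* unsupported class value */
		}
		p = (end != NULL) ? end + 1 : NULL;	/* next key=value pair */
	}

	return DEV_CLASS_NET;	/* key absent or no devargs: default to net */
}
```

A net driver following the series would skip probing whenever the result is not the net class, and a vdpa driver would do the opposite.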



Re: [dpdk-dev] [PATCH 8/8] drivers: add common driver API to get efx family

2021-03-15 Thread Andrew Rybchenko
On 3/14/21 3:36 AM, Ferruh Yigit wrote:
> On 3/11/2021 11:03 AM, Andrew Rybchenko wrote:
>> From: Vijay Kumar Srivastava 
>>
>> Move function to get efx family from net driver into common driver.
>>
>> Signed-off-by: Vijay Kumar Srivastava 
>> Signed-off-by: Andrew Rybchenko 
>
> <...>
>
>> diff --git a/drivers/meson.build b/drivers/meson.build
>> index fdf76120ac..9c8eded697 100644
>> --- a/drivers/meson.build
>> +++ b/drivers/meson.build
>> @@ -7,6 +7,7 @@ subdirs = [
>>   'bus',
>>   'common/mlx5', # depends on bus.
>>   'common/qat', # depends on bus.
>> +    'common/sfc_efx', # depends on bus.
>>   'mempool', # depends on common and bus.
>>   'net', # depends on common, bus, mempool
>>   'raw', # depends on common, bus and net.
>
> This enables building 'common/sfc_efx' for Windows, which fails. A Windows
> build check needs to be added to 'common/sfc_efx'.

I've sent v2, which fixes that problem and one more:
undefined functions mentioned in version.map.

Thanks,
Andrew.


[dpdk-dev] [PATCH v2 8/8] drivers: add common driver API to get efx family

2021-03-15 Thread Andrew Rybchenko
From: Vijay Kumar Srivastava 

Move function to get efx family from net driver into common driver.

Signed-off-by: Vijay Kumar Srivastava 
Signed-off-by: Andrew Rybchenko 
---
 drivers/common/meson.build |  2 +-
 drivers/common/sfc_efx/meson.build |  5 +++
 drivers/common/sfc_efx/sfc_efx.c   | 56 +++
 drivers/common/sfc_efx/sfc_efx.h   | 10 +
 drivers/common/sfc_efx/version.map |  1 +
 drivers/meson.build|  1 +
 drivers/net/sfc/sfc.c  | 61 ++
 7 files changed, 78 insertions(+), 58 deletions(-)

diff --git a/drivers/common/meson.build b/drivers/common/meson.build
index ba6325adf3..66e12143b2 100644
--- a/drivers/common/meson.build
+++ b/drivers/common/meson.build
@@ -6,4 +6,4 @@ if is_windows
 endif
 
 std_deps = ['eal']
-drivers = ['cpt', 'dpaax', 'iavf', 'mvep', 'octeontx', 'octeontx2', 'sfc_efx']
+drivers = ['cpt', 'dpaax', 'iavf', 'mvep', 'octeontx', 'octeontx2']
diff --git a/drivers/common/sfc_efx/meson.build b/drivers/common/sfc_efx/meson.build
index d9afcf3eeb..9faf5161f5 100644
--- a/drivers/common/sfc_efx/meson.build
+++ b/drivers/common/sfc_efx/meson.build
@@ -5,6 +5,10 @@
 # This software was jointly developed between OKTET Labs (under contract
 # for Solarflare) and Solarflare Communications, Inc.
 
+if is_windows
+   subdir_done()
+endif
+
 if (arch_subdir != 'x86' or not dpdk_conf.get('RTE_ARCH_64')) and (arch_subdir != 'arm' or not host_machine.cpu_family().startswith('aarch64'))
build = false
reason = 'only supported on x86_64 and aarch64'
@@ -32,6 +36,7 @@ endforeach
 subdir('base')
 objs = [base_objs]
 
+deps += ['bus_pci']
 sources = files(
'sfc_efx.c',
'sfc_efx_mcdi.c',
diff --git a/drivers/common/sfc_efx/sfc_efx.c b/drivers/common/sfc_efx/sfc_efx.c
index a3146db255..0b78933d9f 100644
--- a/drivers/common/sfc_efx/sfc_efx.c
+++ b/drivers/common/sfc_efx/sfc_efx.c
@@ -62,6 +62,62 @@ sfc_efx_dev_class_get(struct rte_devargs *devargs)
return dev_class;
 }
 
+static efx_rc_t
+sfc_efx_find_mem_bar(efsys_pci_config_t *configp, int bar_index,
+efsys_bar_t *barp)
+{
+   efsys_bar_t result;
+   struct rte_pci_device *dev;
+
+   memset(&result, 0, sizeof(result));
+
+   if (bar_index < 0 || bar_index >= PCI_MAX_RESOURCE)
+   return -EINVAL;
+
+   dev = configp->espc_dev;
+
+   result.esb_rid = bar_index;
+   result.esb_dev = dev;
+   result.esb_base = dev->mem_resource[bar_index].addr;
+
+   *barp = result;
+
+   return 0;
+}
+
+static efx_rc_t
+sfc_efx_pci_config_readd(efsys_pci_config_t *configp, uint32_t offset,
+efx_dword_t *edp)
+{
+   int rc;
+
+   rc = rte_pci_read_config(configp->espc_dev, edp->ed_u32, sizeof(*edp),
+offset);
+
+   return (rc < 0 || rc != sizeof(*edp)) ? EIO : 0;
+}
+
+int
+sfc_efx_family(struct rte_pci_device *pci_dev,
+  efx_bar_region_t *mem_ebrp, efx_family_t *family)
+{
+   static const efx_pci_ops_t ops = {
+   .epo_config_readd = sfc_efx_pci_config_readd,
+   .epo_find_mem_bar = sfc_efx_find_mem_bar,
+   };
+
+   efsys_pci_config_t espcp;
+   int rc;
+
+   espcp.espc_dev = pci_dev;
+
+   rc = efx_family_probe_bar(pci_dev->id.vendor_id,
+ pci_dev->id.device_id,
+ &espcp, &ops, family, mem_ebrp);
+
+   return rc;
+}
+
 RTE_INIT(sfc_efx_register_logtype)
 {
int ret;
diff --git a/drivers/common/sfc_efx/sfc_efx.h b/drivers/common/sfc_efx/sfc_efx.h
index bbccd3e9e8..71288b7299 100644
--- a/drivers/common/sfc_efx/sfc_efx.h
+++ b/drivers/common/sfc_efx/sfc_efx.h
@@ -10,6 +10,11 @@
 #ifndef _SFC_EFX_H_
 #define _SFC_EFX_H_
 
+#include 
+
+#include "efx.h"
+#include "efsys.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
@@ -27,6 +32,11 @@ enum sfc_efx_dev_class {
 __rte_internal
 enum sfc_efx_dev_class sfc_efx_dev_class_get(struct rte_devargs *devargs);
 
+__rte_internal
+int sfc_efx_family(struct rte_pci_device *pci_dev,
+  efx_bar_region_t *mem_ebrp,
+  efx_family_t *family);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/drivers/common/sfc_efx/version.map b/drivers/common/sfc_efx/version.map
index a3345d34f7..c3414b760b 100644
--- a/drivers/common/sfc_efx/version.map
+++ b/drivers/common/sfc_efx/version.map
@@ -222,6 +222,7 @@ INTERNAL {
efx_txq_size;
 
sfc_efx_dev_class_get;
+   sfc_efx_family;
 
sfc_efx_mcdi_init;
sfc_efx_mcdi_fini;
diff --git a/drivers/meson.build b/drivers/meson.build
index fdf76120ac..9c8eded697 100644
--- a/drivers/meson.build
+++ b/drivers/meson.build
@@ -7,6 +7,7 @@ subdirs = [
'bus',
'common/mlx5', # depends on bus.
'common/qat', # depends on bus.
+   'common/sfc_efx', # depends on bus.
'mempool', # depends on common and bus.
'net',  

Re: [dpdk-dev] [PATCH v11 0/2] support both PIO and MMIO BAR for legacy virito device

2021-03-15 Thread David Marchand
On Wed, Mar 10, 2021 at 6:37 PM 谢华伟(此时此刻)  wrote:
>
> virtio PMD assumes legacy device only supports PIO(port-mapped) BAR
> resource. This is wrong. As we need to create lots of devices, and PIO
> resource on x86 is very limited, we expose MMIO(memory-mapped I/O) BAR.
>
> Kernel supports both PIO and MMIO BAR for legacy virtio-pci device, and
> for all other pci devices. This patchset handles different type of BAR in
> the similar way.
>
> In previous implementation, under igb_uio driver we get PIO address from
> igb_uio sysfs entry; with uio_pci_generic, we get PIO address from
> /proc/ioports for x86, and for other ARCHs, we get PIO address from
> standard PCI sysfs entry. For PIO/MMIO RW, there is different path for
> different drivers and arch.
>
> All of the above is too much twisted. This patchset unifies the way to get
> both PIO and MMIO address for different driver and ARCHs, all from standard
> resource attr under pci sysfs. This is most generic.
>
> We distinguish PIO and MMIO by their address range like how kernel does.
> It is ugly but works.
>
> v2 changes:
> - add more explanation in the commit message
>
> v3 changes:
> - fix patch format issues
>
> v4 changes:
> - fixes for RTE_KDRV_UIO_GENERIC -> RTE_PCI_KDRV_UIO_GENERIC
>
> v5 changes:
> - split into three separate patches
>
> v6 changes:
> - change to DEBUG level for IO bar detection in pci_uio_ioport_map
> - rework the code in iobar branch
> - fixes commit message format issue
> - temporarily remove the 3rd patch for vfio path, leave it for future
>   discussion
> - rework against virtio_pmd_rework_v2
>
> v7 changes:
> - fix compilation issues of in/out instruction on non X86 archs
>
> v8 changes:
> - change the word fix to refactor in patch 1's commit message
>
> v9 changes:
> - keep pause version in in/out instructions
>
> v10 changes:
> - trivial fixes in commit message, like > 75 chars
>
> v11 changes:
> - commit message fix and change
>

Aligned Sob and Author to fix the last checkpatch warning.

Series applied to the main branch.
Thanks Huawei and thanks too to reviewers/testers.


-- 
David Marchand



Re: [dpdk-dev] [PATCH v2] update Intel roadmap for 21.05

2021-03-15 Thread Thomas Monjalon
10/03/2021 23:20, Ferruh Yigit:
> Signed-off-by: Ferruh Yigit 
> ---
> v2:
> * there won't be a new driver for dlb2.5
> * reword thash library support

Applied with minor updates for sorting things, thanks.




Re: [dpdk-dev] [RFC] eventdev: introduce event dispatcher

2021-03-15 Thread Mattias Rönnblom
On 2021-03-07 14:04, Jerin Jacob wrote:
> On Fri, Feb 26, 2021 at 1:31 PM Mattias Rönnblom
>  wrote:
>> On 2021-02-25 13:32, Jerin Jacob wrote:
>>> On Fri, Feb 19, 2021 at 12:00 AM Mattias Rönnblom
>>>  wrote:
>>>> The purpose of the event dispatcher is primarily to decouple different
>>>> parts of an application (e.g., processing pipeline stages), which
>>>> share the same underlying event device.
>>>>
>>>> The event dispatcher replaces the conditional logic (often, a switch
>>>> statement) that typically follows an event device dequeue operation,
>>>> where events are dispatched to different parts of the application
>>>> based on the destination queue id.
>>> # If the device has all type queue[1] this RFC would restrict to
>>> use queue ONLY as stage. A stage can be a Queue Type also.
>>> How we can abstract this in this model?
>>
>> "All queue type" is about scheduling policy. I would think that would be
>> independent of the "logical endpoint" of the event (i.e., the queue id).
>> I feel like I'm missing something here.
> Each queue type also can be represented as a stage.
> For example, If the system has only one queue, the Typical IPsec
> outbound stages can be
> Q0-Ordered(For SA lookup) -> Q0(Atomic)(For Sequence number update) ->
> Q0(Ordered)(Crypto operation)->Q0(Atomic)(Send on wire)


OK, this makes sense.


Would such an application want to add a callback
per-queue-per-sched-type, or just per-sched-type? In your example, if
you had a queue Q1 as well, would you want the option to have
different callbacks for atomic-type events on Q0 and Q1?


Would you want to dispatch based on anything else in the event? You 
could basically do it on any field (flow id, priority, etc.), but is 
there some other field that's commonly used to denote a processing stage?


>>
>>> # Also, I think, it may make sense to add this type of infrastructure as
>>> helper functions as these are built on top of existing APIs i.e There
>>> is no support
>>> required from the driver to establish this model. IMO, If we need to
>>> add such support as
>>> one fixed set of functionality, we could have helper APIs to express a 
>>> certain
>>> usage of eventdev. Rather defining the that's only way to do this.
>>> I think, A helper function can be used to as abstraction to define
>>> this kind of model.
>>>
>>> # Also, There is function pointer overhead and aggregating the events
>>> in implementation,
>>> That may be not always "the" optimized model of making it work vs switch 
>>> case in
>>> application.
>>
>> Sure, but what to do in a reasonable generic framework?
>>
>>
>> If you are very sensitive to that 20 cc or whatever function pointer
>> call, you won't use this library. Or you will, and use static linking
>> and LTO to get rid of that overhead.
>>
>>
>> Probably, you have a few queues, not many. Probably, your dequeue bursts
>> are large, if the system load is high (and otherwise, you don't care
>> about efficiency). Then, you will have at least of couple of events per
>> function call.
> I am fine with this library and exposing it as a function pointer if
> someone needs to
> have a "helper" function to model the system around this logic.
>
> This RFC looks good to me in general. I would suggest to make it as
>
> - Helper functions i.e if someone chooses to do write the stage in
> this way, it can be enabled through this helper function.
> By choosing as helper function it depicts, this is one way to do the
> stuff but the NOT ONLY WAY.
> - Abstract stages as a queue(which already added in the patch) and
> each type in the queue for all type queue cases.
> - Enhance test-eventdev to showcase the functionality and performance
> of these helpers.
>
> Thanks for the RFC.
>
>>
>>> [1]
>>> See RTE_EVENT_DEV_CAP_QUEUE_ALL_TYPES in
>>> https://protect2.fireeye.com/v1/url?k=dcf3a2b9-83689b94-dcf3e222-8692dc8284cb-5ba19813a1556a85&q=1&e=0ff1861f-8e24-453c-a93b-73fd88e0f316&u=https%3A%2F%2Fdoc.dpdk.org%2Fguides%2Fprog_guide%2Feventdev.html
>>>
>>>
>>>> The concept is similar to a UNIX file descriptor event loop library.
>>>> Instead of tying callback functions to fds as for example libevent
>>>> does, the event dispatcher binds callbacks to queue ids.
>>>>
>>>> An event dispatcher is configured to dequeue events from a specific
>>>> event device, and ties into the service core framework, to do its (and
>>>> the application's) work.
>>>>
>>>> The event dispatcher provides a convenient way for an eventdev-based
>>>> application to use service cores for application-level processing, and
>>>> thus for sharing those cores with other DPDK services.
>>>>
>>>> Signed-off-by: Mattias Rönnblom 
>>>> ---
>>>>   lib/librte_eventdev/Makefile |   2 +
>>>>   lib/librte_eventdev/meson.build  |   6 +-
>>>>   lib/librte_eventdev/rte_event_dispatcher.c   | 420 +++
>>>>   lib/librte_eventdev/rte_event_dispatcher.h   | 251 +++
>>>>   lib/librte_eventdev/rte_eventdev_version.map |

Re: [dpdk-dev] [RFC] eventdev: introduce event dispatcher

2021-03-15 Thread Van Haaren, Harry
> -Original Message-
> From: dev  On Behalf Of Mattias Rönnblom
> Sent: Monday, March 15, 2021 2:45 PM
> To: Jerin Jacob 
> Cc: Jerin Jacob ; dpdk-dev ; Richardson,
> Bruce 
> Subject: Re: [dpdk-dev] [RFC] eventdev: introduce event dispatcher
> 
> On 2021-03-07 14:04, Jerin Jacob wrote:
> > On Fri, Feb 26, 2021 at 1:31 PM Mattias Rönnblom
> >  wrote:
> >> On 2021-02-25 13:32, Jerin Jacob wrote:
> >>> On Fri, Feb 19, 2021 at 12:00 AM Mattias Rönnblom
> >>>  wrote:
> >>>> The purpose of the event dispatcher is primarily to decouple different
> >>>> parts of an application (e.g., processing pipeline stages), which
> >>>> share the same underlying event device.
> >>>>
> >>>> The event dispatcher replaces the conditional logic (often, a switch
> >>>> statement) that typically follows an event device dequeue operation,
> >>>> where events are dispatched to different parts of the application
> >>>> based on the destination queue id.
> >>> # If the device has all type queue[1] this RFC would restrict to
> >>> use queue ONLY as stage. A stage can be a Queue Type also.
> >>> How we can abstract this in this model?
> >>
> >> "All queue type" is about scheduling policy. I would think that would be
> >> independent of the "logical endpoint" of the event (i.e., the queue id).
> >> I feel like I'm missing something here.
> > Each queue type also can be represented as a stage.
> > For example, If the system has only one queue, the Typical IPsec
> > outbound stages can be
> > Q0-Ordered(For SA lookup) -> Q0(Atomic)(For Sequence number update) ->
> > Q0(Ordered)(Crypto operation)->Q0(Atomic)(Send on wire)
> 
> 
> OK, this makes sense.
> 
> 
> Would such an application want to add a callback
> per-queue-per-sched-type, or just per-sched-type? In your example, if
> you would have a queue Q1 as well, would want to have the option to have
> different callbacks for atomic-type events on Q0 and Q1?
> 
> 
> Would you want to dispatch based on anything else in the event? You
> could basically do it on any field (flow id, priority, etc.), but is
> there some other field that's commonly used to denote a processing stage?

I expect that struct rte_event::event_type and sub_event_type would regularly
be used to split out different types of "things" that would be handled
separately.

Overall, I think we could imagine the Queue number, Queue Scheduling type
(Re-Ordered, Atomic), Event type, sub event type, Flow-ID... all contributing
somehow to what function to execute in some situation.

As a somewhat extreme example to prove a point:
An RX core might use rte_flow rules to split traffic into some arbitrary
grouping, and then the rte_event::flow_id could be used to select the
function-pointer to jump to handle it?

I like the *concept* of having a table of func-ptrs, and removing a switch()
in that way, but I'm not sure that DPDK Eventdev APIs are the right place for
it. I think Jerin already suggested the "helper function" concept, which seems
a good idea to allow optional usage.

To be clear, I'm not against upstreaming of such an event-dispatcher, but I'm
not sure it's possible to build it to be generic enough for all use-cases.
Maybe focusing on an actual use-case and driving the design from that is a
good approach?
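
To make the func-ptr table idea concrete, here is a minimal standalone sketch keyed on queue id only. It is not the RFC's API: struct event is a simplified stand-in for struct rte_event, and dispatcher_register(), dispatcher_process() and count_cb() are hypothetical names. It also calls the callback once per event, whereas the RFC talks about aggregating events before the call.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_QUEUES 4

/* Hypothetical, simplified event; the real struct rte_event has more fields. */
struct event {
	uint8_t queue_id;
	uint32_t flow_id;
};

typedef void (*event_cb)(const struct event *ev, void *arg);

struct dispatcher {
	event_cb cb[MAX_QUEUES];	/* one handler slot per queue id */
	void *arg[MAX_QUEUES];		/* opaque argument passed to the handler */
};

/* Bind a callback to a queue id; returns -1 if the id is out of range. */
static int
dispatcher_register(struct dispatcher *d, uint8_t queue_id, event_cb cb,
		    void *arg)
{
	if (queue_id >= MAX_QUEUES)
		return -1;
	d->cb[queue_id] = cb;
	d->arg[queue_id] = arg;
	return 0;
}

/* Deliver a burst; returns the number of events that had a handler. */
static unsigned int
dispatcher_process(const struct dispatcher *d, const struct event *evs,
		   unsigned int n)
{
	unsigned int i, handled = 0;

	for (i = 0; i < n; i++) {
		uint8_t q = evs[i].queue_id;

		if (q < MAX_QUEUES && d->cb[q] != NULL) {
			d->cb[q](&evs[i], d->arg[q]);
			handled++;
		}
	}

	return handled;
}

/* Example callback: counts the events it receives. */
static void
count_cb(const struct event *ev, void *arg)
{
	(void)ev;
	(*(unsigned int *)arg)++;
}
```

Keying the table on a tuple such as (queue id, sched type) or on flow_id would only change how the index is computed, which is part of why it is hard to pick one generic key for all use-cases.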


Regards, -Harry




[dpdk-dev] [PATCH v2 0/4] net/virtio: make virtqueue struct cache-friendly

2021-03-15 Thread Maxime Coquelin
This series optimizes the cache usage of the virtqueue struct
by dynamically allocating the "fake" mbuf of the Rx virtnet
struct, by removing a useless virtqueue pointer from the
virtnet structs and by moving a few fields to pack holes.

With these 3 patches, the virtqueue struct size goes from
576 bytes (9 cachelines) to 248 bytes (4 cachelines).

Changes in v2:
==
- Rebase on latest main
- Improve error path in virtio_init_queue
- Fix various typos in commit messages

Maxime Coquelin (4):
  net/virtio: remove reference to virtqueue in vrings
  net/virtio: improve queue init error path
  net/virtio: allocate fake mbuf in Rx queue
  net/virtio: pack virtuqueue struct

 drivers/net/virtio/virtio_ethdev.c| 68 ---
 drivers/net/virtio/virtio_rxtx.c  | 36 +-
 drivers/net/virtio/virtio_rxtx.h  |  5 +-
 drivers/net/virtio/virtio_rxtx_packed.c   |  4 +-
 drivers/net/virtio/virtio_rxtx_packed.h   |  6 +-
 drivers/net/virtio/virtio_rxtx_packed_avx.h   |  4 +-
 drivers/net/virtio/virtio_rxtx_simple.h   |  2 +-
 .../net/virtio/virtio_rxtx_simple_altivec.c   |  2 +-
 drivers/net/virtio/virtio_rxtx_simple_neon.c  |  2 +-
 drivers/net/virtio/virtio_rxtx_simple_sse.c   |  2 +-
 .../net/virtio/virtio_user/virtio_user_dev.c  |  4 +-
 drivers/net/virtio/virtio_user_ethdev.c   |  2 +-
 drivers/net/virtio/virtqueue.h| 24 ---
 13 files changed, 88 insertions(+), 73 deletions(-)

-- 
2.29.2



[dpdk-dev] [PATCH v2 2/4] net/virtio: improve queue init error path

2021-03-15 Thread Maxime Coquelin
This patch improves the error path of virtio_init_queue()
by cleaning up, in reverse order, all resources that have
been allocated.

Suggested-by: Chenbo Xia 
Signed-off-by: Maxime Coquelin 
---
 drivers/net/virtio/virtio_ethdev.c | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index af090fdf9c..65ad71f1a6 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -507,7 +507,7 @@ virtio_init_queue(struct rte_eth_dev *dev, uint16_t 
queue_idx)
mz = rte_memzone_lookup(vq_name);
if (mz == NULL) {
ret = -ENOMEM;
-   goto fail_q_alloc;
+   goto free_vq;
}
}
 
@@ -533,7 +533,7 @@ virtio_init_queue(struct rte_eth_dev *dev, uint16_t 
queue_idx)
hdr_mz = rte_memzone_lookup(vq_hdr_name);
if (hdr_mz == NULL) {
ret = -ENOMEM;
-   goto fail_q_alloc;
+   goto free_mz;
}
}
}
@@ -547,7 +547,7 @@ virtio_init_queue(struct rte_eth_dev *dev, uint16_t 
queue_idx)
if (!sw_ring) {
PMD_INIT_LOG(ERR, "can not allocate RX soft ring");
ret = -ENOMEM;
-   goto fail_q_alloc;
+   goto free_hdr_mz;
}
 
vq->sw_ring = sw_ring;
@@ -604,15 +604,22 @@ virtio_init_queue(struct rte_eth_dev *dev, uint16_t 
queue_idx)
 
if (VIRTIO_OPS(hw)->setup_queue(hw, vq) < 0) {
PMD_INIT_LOG(ERR, "setup_queue failed");
-   return -EINVAL;
+   ret = -EINVAL;
+   goto clean_vq;
}
 
return 0;
 
-fail_q_alloc:
-   rte_free(sw_ring);
+clean_vq:
+   hw->cvq = NULL;
+
+   if (sw_ring)
+   rte_free(sw_ring);
+free_hdr_mz:
rte_memzone_free(hdr_mz);
+free_mz:
rte_memzone_free(mz);
+free_vq:
rte_free(vq);
 
return ret;
-- 
2.29.2



[dpdk-dev] [PATCH v2 1/4] net/virtio: remove reference to virtqueue in vrings

2021-03-15 Thread Maxime Coquelin
Vrings are part of the virtqueues, so we don't need
a pointer back to the virtqueue in the vring descriptions.

Instead, let's just subtract the vring's offset within
the virtqueue from its address to calculate the virtqueue address.
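
The offset subtraction can be sketched as follows. This is a minimal
stand-alone illustration, not the driver code: the struct layouts and
the names rxq_sketch, vq_sketch and rxq_to_vq_sketch are hypothetical,
and the helpers actually added by the patch (virtnet_rxq_to_vq and
friends) may be implemented differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-direction queue data, embedded in the parent below. */
struct rxq_sketch {
	uint16_t port_id;
};

/* Hypothetical parent structure. */
struct vq_sketch {
	uint16_t nentries;
	uint16_t free_cnt;
	struct rxq_sketch rxq;	/* embedded, so its offset is a constant */
};

/*
 * Recover the parent from a pointer to its embedded member by
 * subtracting the member's compile-time offset. Only valid when rxq
 * really is the rxq member of a struct vq_sketch.
 */
static inline struct vq_sketch *
rxq_to_vq_sketch(struct rxq_sketch *rxq)
{
	assert(rxq != NULL);
	return (struct vq_sketch *)((char *)rxq -
			offsetof(struct vq_sketch, rxq));
}
```

The conversion costs one constant subtraction and removes the need to
store and keep in sync a back-pointer in every embedded struct.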

Signed-off-by: Maxime Coquelin 
Reviewed-by: Chenbo Xia 
---
 drivers/net/virtio/virtio_ethdev.c| 36 +--
 drivers/net/virtio/virtio_rxtx.c  | 28 +++
 drivers/net/virtio/virtio_rxtx.h  |  3 --
 drivers/net/virtio/virtio_rxtx_packed.c   |  4 +--
 drivers/net/virtio/virtio_rxtx_packed.h   |  6 ++--
 drivers/net/virtio/virtio_rxtx_packed_avx.h   |  4 +--
 drivers/net/virtio/virtio_rxtx_simple.h   |  2 +-
 .../net/virtio/virtio_rxtx_simple_altivec.c   |  2 +-
 drivers/net/virtio/virtio_rxtx_simple_neon.c  |  2 +-
 drivers/net/virtio/virtio_rxtx_simple_sse.c   |  2 +-
 .../net/virtio/virtio_user/virtio_user_dev.c  |  4 +--
 drivers/net/virtio/virtio_user_ethdev.c   |  2 +-
 drivers/net/virtio/virtqueue.h|  6 +++-
 13 files changed, 49 insertions(+), 52 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index 333a5243a9..af090fdf9c 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -133,7 +133,7 @@ virtio_send_command_packed(struct virtnet_ctl *cvq,
   struct virtio_pmd_ctrl *ctrl,
   int *dlen, int pkt_num)
 {
-   struct virtqueue *vq = cvq->vq;
+   struct virtqueue *vq = virtnet_cq_to_vq(cvq);
int head;
struct vring_packed_desc *desc = vq->vq_packed.ring.desc;
struct virtio_pmd_ctrl *result;
@@ -229,7 +229,7 @@ virtio_send_command_split(struct virtnet_ctl *cvq,
  int *dlen, int pkt_num)
 {
struct virtio_pmd_ctrl *result;
-   struct virtqueue *vq = cvq->vq;
+   struct virtqueue *vq = virtnet_cq_to_vq(cvq);
uint32_t head, i;
int k, sum = 0;
 
@@ -316,13 +316,13 @@ virtio_send_command(struct virtnet_ctl *cvq, struct 
virtio_pmd_ctrl *ctrl,
 
ctrl->status = status;
 
-   if (!cvq || !cvq->vq) {
+   if (!cvq) {
PMD_INIT_LOG(ERR, "Control queue is not supported.");
return -1;
}
 
rte_spinlock_lock(&cvq->lock);
-   vq = cvq->vq;
+   vq = virtnet_cq_to_vq(cvq);
 
PMD_INIT_LOG(DEBUG, "vq->vq_desc_head_idx = %d, status = %d, "
"vq->hw->cvq = %p vq = %p",
@@ -552,19 +552,16 @@ virtio_init_queue(struct rte_eth_dev *dev, uint16_t 
queue_idx)
 
vq->sw_ring = sw_ring;
rxvq = &vq->rxq;
-   rxvq->vq = vq;
rxvq->port_id = dev->data->port_id;
rxvq->mz = mz;
} else if (queue_type == VTNET_TQ) {
txvq = &vq->txq;
-   txvq->vq = vq;
txvq->port_id = dev->data->port_id;
txvq->mz = mz;
txvq->virtio_net_hdr_mz = hdr_mz;
txvq->virtio_net_hdr_mem = hdr_mz->iova;
} else if (queue_type == VTNET_CQ) {
cvq = &vq->cq;
-   cvq->vq = vq;
cvq->mz = mz;
cvq->virtio_net_hdr_mz = hdr_mz;
cvq->virtio_net_hdr_mem = hdr_mz->iova;
@@ -851,7 +848,7 @@ virtio_dev_rx_queue_intr_enable(struct rte_eth_dev *dev, 
uint16_t queue_id)
 {
struct virtio_hw *hw = dev->data->dev_private;
struct virtnet_rx *rxvq = dev->data->rx_queues[queue_id];
-   struct virtqueue *vq = rxvq->vq;
+   struct virtqueue *vq = virtnet_rxq_to_vq(rxvq);
 
virtqueue_enable_intr(vq);
virtio_mb(hw->weak_barriers);
@@ -862,7 +859,7 @@ static int
 virtio_dev_rx_queue_intr_disable(struct rte_eth_dev *dev, uint16_t queue_id)
 {
struct virtnet_rx *rxvq = dev->data->rx_queues[queue_id];
-   struct virtqueue *vq = rxvq->vq;
+   struct virtqueue *vq = virtnet_rxq_to_vq(rxvq);
 
virtqueue_disable_intr(vq);
return 0;
@@ -2180,8 +2177,7 @@ static int
 virtio_dev_start(struct rte_eth_dev *dev)
 {
uint16_t nb_queues, i;
-   struct virtnet_rx *rxvq;
-   struct virtnet_tx *txvq __rte_unused;
+   struct virtqueue *vq;
struct virtio_hw *hw = dev->data->dev_private;
int ret;
 
@@ -2238,27 +2234,27 @@ virtio_dev_start(struct rte_eth_dev *dev)
PMD_INIT_LOG(DEBUG, "nb_queues=%d", nb_queues);
 
for (i = 0; i < dev->data->nb_rx_queues; i++) {
-   rxvq = dev->data->rx_queues[i];
+   vq = virtnet_rxq_to_vq(dev->data->rx_queues[i]);
/* Flush the old packets */
-   virtqueue_rxvq_flush(rxvq->vq);
-   virtqueue_notify(rxvq->vq);
+   virtqueue_rxvq_flush(vq);
+   virtqueue_notify(vq);
}
 
for (i = 0; i < dev->data->nb_tx_queues; i++) {
-   txvq = dev->data->tx_queues[i];
-   virtqueue_notify(txvq->vq);
+   vq = virtnet_t

[dpdk-dev] [PATCH v2 3/4] net/virtio: allocate fake mbuf in Rx queue

2021-03-15 Thread Maxime Coquelin
While it is worth clarifying whether the fake mbuf
in the virtnet_rx struct is really necessary, it is certain
that it heavily impacts cache usage by being part of
the struct. Indeed, it uses two cachelines, and
requires alignment on a cacheline.

Before this series, it means it took 120 bytes in
virtnet_rx struct:

struct virtnet_rx {
        struct virtqueue *         vq;                   /*     0     8 */

        /* XXX 56 bytes hole, try to pack */

        /* --- cacheline 1 boundary (64 bytes) --- */
        struct rte_mbuf            fake_mbuf __attribute__((__aligned__(64))); /*    64   128 */
        /* --- cacheline 3 boundary (192 bytes) --- */

This patch allocates it using malloc in order to optimize
virtnet_rx cache usage and so virtqueue cache usage.
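
The effect on the struct layout can be reproduced with a small C11
sketch. The types below are hypothetical stand-ins (mbuf_sketch is just
a 128-byte, 64-byte-aligned blob, not the real rte_mbuf), so the
numbers only illustrate the mechanism.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Hypothetical stand-in for a 128-byte object aligned on a cacheline. */
struct mbuf_sketch {
	alignas(64) unsigned char bytes[128];
};

/* Embedded: the member's alignment opens a hole after the pointer. */
struct rx_embedded_sketch {
	void *vq;
	struct mbuf_sketch fake_mbuf;
};

/* Pointer: the object lives elsewhere, only a pointer stays here. */
struct rx_pointer_sketch {
	void *vq;
	struct mbuf_sketch *fake_mbuf;
};

/* The embedded member is pushed to the next 64-byte boundary. */
static_assert(offsetof(struct rx_embedded_sketch, fake_mbuf) == 64,
	      "aligned member starts on the next cacheline");
static_assert(sizeof(struct rx_pointer_sketch) <
	      sizeof(struct rx_embedded_sketch),
	      "pointer variant is smaller");
```

The trade-off is one extra pointer dereference when the object is
used, in exchange for not spending cachelines on it in the hot struct.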

Signed-off-by: Maxime Coquelin 
---
 drivers/net/virtio/virtio_ethdev.c | 13 +
 drivers/net/virtio/virtio_rxtx.c   |  8 +++-
 drivers/net/virtio/virtio_rxtx.h   |  2 +-
 3 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index 65ad71f1a6..0ff0b16027 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -435,6 +435,7 @@ virtio_init_queue(struct rte_eth_dev *dev, uint16_t 
queue_idx)
int queue_type = virtio_get_queue_type(hw, queue_idx);
int ret;
int numa_node = dev->device->numa_node;
+   struct rte_mbuf *fake_mbuf = NULL;
 
PMD_INIT_LOG(INFO, "setting up queue: %u on NUMA node %d",
queue_idx, numa_node);
@@ -550,10 +551,18 @@ virtio_init_queue(struct rte_eth_dev *dev, uint16_t 
queue_idx)
goto free_hdr_mz;
}
 
+   fake_mbuf = malloc(sizeof(*fake_mbuf));
+   if (!fake_mbuf) {
+   PMD_INIT_LOG(ERR, "can not allocate fake mbuf");
+   ret = -ENOMEM;
+   goto free_sw_ring;
+   }
+
vq->sw_ring = sw_ring;
rxvq = &vq->rxq;
rxvq->port_id = dev->data->port_id;
rxvq->mz = mz;
+   rxvq->fake_mbuf = fake_mbuf;
} else if (queue_type == VTNET_TQ) {
txvq = &vq->txq;
txvq->port_id = dev->data->port_id;
@@ -613,6 +622,9 @@ virtio_init_queue(struct rte_eth_dev *dev, uint16_t 
queue_idx)
 clean_vq:
hw->cvq = NULL;
 
+   if (fake_mbuf)
+   free(fake_mbuf);
+free_sw_ring:
if (sw_ring)
rte_free(sw_ring);
 free_hdr_mz:
@@ -643,6 +655,7 @@ virtio_free_queues(struct virtio_hw *hw)
 
queue_type = virtio_get_queue_type(hw, i);
if (queue_type == VTNET_RQ) {
+   free(vq->rxq.fake_mbuf);
rte_free(vq->sw_ring);
rte_memzone_free(vq->rxq.mz);
} else if (queue_type == VTNET_TQ) {
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 32af8d3d11..c1ce15c8f5 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -703,11 +703,9 @@ virtio_dev_rx_queue_setup_finish(struct rte_eth_dev *dev, 
uint16_t queue_idx)
virtio_rxq_vec_setup(rxvq);
}
 
-   memset(&rxvq->fake_mbuf, 0, sizeof(rxvq->fake_mbuf));
-   for (desc_idx = 0; desc_idx < RTE_PMD_VIRTIO_RX_MAX_BURST;
-desc_idx++) {
-   vq->sw_ring[vq->vq_nentries + desc_idx] =
-   &rxvq->fake_mbuf;
+   memset(rxvq->fake_mbuf, 0, sizeof(*rxvq->fake_mbuf));
+   for (desc_idx = 0; desc_idx < RTE_PMD_VIRTIO_RX_MAX_BURST; desc_idx++) {
+   vq->sw_ring[vq->vq_nentries + desc_idx] = rxvq->fake_mbuf;
}
 
if (hw->use_vec_rx && !virtio_with_packed_queue(hw)) {
diff --git a/drivers/net/virtio/virtio_rxtx.h b/drivers/net/virtio/virtio_rxtx.h
index 7f1036be6f..6ce5d67d15 100644
--- a/drivers/net/virtio/virtio_rxtx.h
+++ b/drivers/net/virtio/virtio_rxtx.h
@@ -19,7 +19,7 @@ struct virtnet_stats {
 
 struct virtnet_rx {
/* dummy mbuf, for wraparound when processing RX ring. */
-   struct rte_mbuf fake_mbuf;
+   struct rte_mbuf *fake_mbuf;
uint64_t mbuf_initializer; /**< value to init mbufs. */
struct rte_mempool *mpool; /**< mempool for mbuf allocation */
 
-- 
2.29.2



[dpdk-dev] [PATCH v2 4/4] net/virtio: pack virtuqueue struct

2021-03-15 Thread Maxime Coquelin
This patch optimizes packing of the virtuqueue
struct by moving fields around to fill holes.

The offset field is not used and so can be removed.
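
How moving fields fills holes can be shown with a hypothetical pair of
structs. The field names below are illustrative only and do not match
the virtqueue struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Small fields interleaved with pointers: padding after each one. */
struct holes_sketch {
	uint16_t a;
	void *p;
	uint16_t b;
	void *q;
	uint16_t c;
};

/* Same fields, small ones grouped together: far less padding. */
struct packed_sketch {
	uint16_t a;
	uint16_t b;
	uint16_t c;
	void *p;
	void *q;
};

static_assert(sizeof(struct packed_sketch) < sizeof(struct holes_sketch),
	      "grouping the small fields shrinks the struct");
```

A tool such as pahole reports these holes directly on the compiled
struct, which is usually more reliable than counting by hand.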

Signed-off-by: Maxime Coquelin 
Reviewed-by: Chenbo Xia 
---
 drivers/net/virtio/virtqueue.h | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
index 17e76f0e8c..4536b0ef9d 100644
--- a/drivers/net/virtio/virtqueue.h
+++ b/drivers/net/virtio/virtqueue.h
@@ -244,6 +244,15 @@ struct virtqueue {
uint16_t vq_avail_idx; /**< sync until needed */
uint16_t vq_free_thresh; /**< free threshold */
 
+   /**
+* Head of the free chain in the descriptor table. If
+* there are no free descriptors, this will be set to
+* VQ_RING_DESC_CHAIN_END.
+*/
+   uint16_t  vq_desc_head_idx;
+   uint16_t  vq_desc_tail_idx;
+   uint16_t  vq_queue_index;   /**< PCI queue index */
+   
void *vq_ring_virt_mem;  /**< linear address of vring*/
unsigned int vq_ring_size;
 
@@ -256,15 +265,6 @@ struct virtqueue {
rte_iova_t vq_ring_mem; /**< physical address of vring,
 * or virtual address for virtio_user. */
 
-   /**
-* Head of the free chain in the descriptor table. If
-* there are no free descriptors, this will be set to
-* VQ_RING_DESC_CHAIN_END.
-*/
-   uint16_t  vq_desc_head_idx;
-   uint16_t  vq_desc_tail_idx;
-   uint16_t  vq_queue_index;
-   uint16_t offset; /**< relative offset to obtain addr in mbuf */
uint16_t  *notify_addr;
struct rte_mbuf **sw_ring;  /**< RX software ring. */
struct vq_desc_extra vq_descx[0];
-- 
2.29.2



Re: [dpdk-dev] [PATCH v11 2/2] bus/pci: support MMIO in PCI ioport accessors

2021-03-15 Thread 谢华伟(此时此刻)



On 2021/3/15 18:19, David Marchand wrote:

#else
#define IO_COND(addr, is_pio, is_mmio) do {   \
 is_mmio;  \
} while (0)
#endif

We should not just copy/paste kernel code.

Plus here, this seems a bit overkill.
And there are other parts in this code that could use some polishing.

What do you think of merging this series as is (now that we got non
regression reports) and doing such cleanups in followup patches?


I am OK with that. Yes, we could do some cleanup after it is merged, for
example on the vfio side, if it is really necessary for the virtio PMD to
use only vfio to access the IO port.




[dpdk-dev] [PATCH] net/mlx5: add power monitoring support

2021-03-15 Thread Alexander Kozyrev
Support the PMD power management API in the MLX5 driver.
The monitor policy of this API puts a CPU core to sleep until
the data at a monitored memory address is changed by the NIC.
Implement the get_monitor_addr function to return the address
of a CQE owner bit to monitor the arrival of a new packet.
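
The index arithmetic behind the owner bit can be sketched in isolation.
This assumes, as the masking in the patch suggests, that the ring size
is a power of two and that the expected owner value toggles each time
the consumer index wraps around the ring. The function names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Slot in the completion ring for a free-running consumer index. */
static inline uint32_t
cqe_slot_sketch(uint32_t cq_ci, unsigned int log2_cqe_n)
{
	const uint32_t cqe_num = UINT32_C(1) << log2_cqe_n;

	assert(log2_cqe_n < 32);
	return cq_ci & (cqe_num - 1);
}

/*
 * Expected owner value for that slot: the bit just above the slot bits
 * flips on every pass over the ring, giving 0, 1, 0, 1, ...
 */
static inline uint8_t
cqe_expected_owner_sketch(uint32_t cq_ci, unsigned int log2_cqe_n)
{
	const uint32_t cqe_num = UINT32_C(1) << log2_cqe_n;

	assert(log2_cqe_n < 32);
	return (cq_ci & cqe_num) != 0;
}
```

With a 4-entry ring, consumer indexes 0..3 expect owner 0, indexes 4..7
expect owner 1, and index 8 expects owner 0 again.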

Signed-off-by: Alexander Kozyrev 
---
 doc/guides/rel_notes/release_21_05.rst |  4 
 drivers/net/mlx5/mlx5.c|  2 ++
 drivers/net/mlx5/mlx5_rxtx.c   | 19 +++
 drivers/net/mlx5/mlx5_rxtx.h   |  1 +
 4 files changed, 26 insertions(+)

diff --git a/doc/guides/rel_notes/release_21_05.rst 
b/doc/guides/rel_notes/release_21_05.rst
index f262d48e82..928eafd92f 100644
--- a/doc/guides/rel_notes/release_21_05.rst
+++ b/doc/guides/rel_notes/release_21_05.rst
@@ -65,6 +65,10 @@ New Features
   * Added support for freeing Tx mbuf on demand.
   * Added support for copper port in Kunpeng930.
 
+* **Updated Mellanox mlx5 driver.**
+
+  * Added support for the monitor policy of Power Management API.
+
 * **Updated NXP DPAA driver.**
 
   * Added support for shared ethernet interface.
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index abd7ff70df..7b419deb2c 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1496,6 +1496,7 @@ const struct eth_dev_ops mlx5_dev_ops = {
.hairpin_queue_peer_update = mlx5_hairpin_queue_peer_update,
.hairpin_queue_peer_bind = mlx5_hairpin_queue_peer_bind,
.hairpin_queue_peer_unbind = mlx5_hairpin_queue_peer_unbind,
+   .get_monitor_addr = mlx5_get_monitor_addr,
 };
 
 /* Available operations from secondary process. */
@@ -1580,6 +1581,7 @@ const struct eth_dev_ops mlx5_dev_ops_isolate = {
.hairpin_queue_peer_update = mlx5_hairpin_queue_peer_update,
.hairpin_queue_peer_bind = mlx5_hairpin_queue_peer_bind,
.hairpin_queue_peer_unbind = mlx5_hairpin_queue_peer_unbind,
+   .get_monitor_addr = mlx5_get_monitor_addr,
 };
 
 /**
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index e3ce9fd224..0fc0c2096a 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -712,6 +712,25 @@ mlx5_rx_queue_count(struct rte_eth_dev *dev, uint16_t 
rx_queue_id)
return rx_queue_count(rxq);
 }
 
+int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
+{
+   struct mlx5_rxq_data *rxq = rx_queue;
+   const unsigned int cqe_num = 1 << rxq->cqe_n;
+   const unsigned int cqe_mask = cqe_num - 1;
+   const uint16_t idx = rxq->cq_ci & cqe_num;
+   volatile struct mlx5_cqe *cqe = &(*rxq->cqes)[rxq->cq_ci & cqe_mask];
+
+   if (unlikely(!cqe)) {
+   rte_errno = EINVAL;
+   return -rte_errno;
+   }
+   pmc->addr = &cqe->op_own;
+   pmc->val =  !!idx;
+   pmc->mask = MLX5_CQE_OWNER_MASK;
+   pmc->size = sizeof(uint8_t);
+   return 0;
+}
+
 #define MLX5_SYSTEM_LOG_DIR "/var/log"
 /**
  * Dump debug information to log file.
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 0fd98af9d1..35a1bba486 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -441,6 +441,7 @@ int mlx5_rx_burst_mode_get(struct rte_eth_dev *dev, 
uint16_t rx_queue_id,
   struct rte_eth_burst_mode *mode);
 int mlx5_tx_burst_mode_get(struct rte_eth_dev *dev, uint16_t tx_queue_id,
   struct rte_eth_burst_mode *mode);
+int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc);
 
 /* Vectorized version of mlx5_rxtx.c */
 int mlx5_rxq_check_vec_support(struct mlx5_rxq_data *rxq_data);
-- 
2.24.1



[dpdk-dev] [PATCH v2] eal, power: use UINT64_MAX instead of -1ULL

2021-03-15 Thread Tyler Retzlaff
Use UINT64_MAX instead of -1ULL when manipulating uint64_t masks and
initializing sentinel values.

Some compilers generate a warning when applying a '-' to an unsigned
literal, so avoid this by initializing with unsigned preprocessor
definitions where appropriate.
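
As a small illustration of the mask idiom touched by this patch, here
is a hypothetical helper (low_bits_mask_sketch is not a DPDK function)
that builds a mask of the n lowest bits from UINT64_MAX.

```c
#include <assert.h>
#include <stdint.h>

/* Return a mask with the n lowest bits set, for 0 <= n <= 64. */
static inline uint64_t
low_bits_mask_sketch(unsigned int n)
{
	assert(n <= 64);
	/* Shifting a 64-bit value by 64 is undefined, handle it apart. */
	if (n == 64)
		return UINT64_MAX;
	return ~(UINT64_MAX << n);
}
```

The n == 64 special case plays the same role as the "prevent overflow"
branches in the fbarray code.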

Signed-off-by: Tyler Retzlaff 
---
 lib/librte_eal/common/eal_common_fbarray.c | 12 ++--
 lib/librte_power/rte_power_pmd_mgmt.c  |  2 +-
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_fbarray.c 
b/lib/librte_eal/common/eal_common_fbarray.c
index 592ec5859..3a28a5324 100644
--- a/lib/librte_eal/common/eal_common_fbarray.c
+++ b/lib/librte_eal/common/eal_common_fbarray.c
@@ -138,7 +138,7 @@ find_next_n(const struct rte_fbarray *arr, unsigned int 
start, unsigned int n,
 */
last = MASK_LEN_TO_IDX(arr->len);
last_mod = MASK_LEN_TO_MOD(arr->len);
-   last_msk = ~(-1ULL << last_mod);
+   last_msk = ~(UINT64_MAX << last_mod);
 
for (msk_idx = first; msk_idx < msk->n_masks; msk_idx++) {
uint64_t cur_msk, lookahead_msk;
@@ -398,8 +398,8 @@ find_prev_n(const struct rte_fbarray *arr, unsigned int 
start, unsigned int n,
first_mod = MASK_LEN_TO_MOD(start);
/* we're going backwards, so mask must start from the top */
ignore_msk = first_mod == MASK_ALIGN - 1 ?
-   -1ULL : /* prevent overflow */
-   ~(-1ULL << (first_mod + 1));
+   UINT64_MAX : /* prevent overflow */
+   ~(UINT64_MAX << (first_mod + 1));
 
/* go backwards, include zero */
msk_idx = first;
@@ -513,7 +513,7 @@ find_prev_n(const struct rte_fbarray *arr, unsigned int 
start, unsigned int n,
 * no runs in the space we've lookbehind-scanned
 * as well, so skip that on next iteration.
 */
-   ignore_msk = -1ULL << need;
+   ignore_msk = UINT64_MAX << need;
msk_idx = lookbehind_idx;
break;
}
@@ -560,8 +560,8 @@ find_prev(const struct rte_fbarray *arr, unsigned int 
start, bool used)
first_mod = MASK_LEN_TO_MOD(start);
/* we're going backwards, so mask must start from the top */
ignore_msk = first_mod == MASK_ALIGN - 1 ?
-   -1ULL : /* prevent overflow */
-   ~(-1ULL << (first_mod + 1));
+   UINT64_MAX : /* prevent overflow */
+   ~(UINT64_MAX << (first_mod + 1));
 
/* go backwards, include zero */
idx = first;
diff --git a/lib/librte_power/rte_power_pmd_mgmt.c 
b/lib/librte_power/rte_power_pmd_mgmt.c
index 454ef7091..db03cbf42 100644
--- a/lib/librte_power/rte_power_pmd_mgmt.c
+++ b/lib/librte_power/rte_power_pmd_mgmt.c
@@ -111,7 +111,7 @@ clb_umwait(uint16_t port_id, uint16_t qidx, struct rte_mbuf 
**pkts __rte_unused,
ret = rte_eth_get_monitor_addr(port_id, qidx,
&pmc);
if (ret == 0)
-   rte_power_monitor(&pmc, -1ULL);
+   rte_power_monitor(&pmc, UINT64_MAX);
}
q_conf->umwait_in_progress = false;
 
-- 
2.30.0.vfs.0.2



Re: [dpdk-dev] [PATCH v4 1/2] eal: error number enhancement for thread TLS API

2021-03-15 Thread Tal Shnaiderman
> Subject: Re: [PATCH v4 1/2] eal: error number enhancement for thread TLS
> API
> 
> On Wed, Mar 10, 2021 at 02:48:55PM +0200, Tal Shnaiderman wrote:
> > add error number reporting to rte_errno in all functions in the
> > rte_thread_tls_* API.
> >
> > Suggested-by: Anatoly Burakov 
> > Signed-off-by: Tal Shnaiderman 
> > ---
> >  lib/librte_eal/include/rte_thread.h | 14 +++---
> >  lib/librte_eal/unix/rte_thread.c|  6 ++
> >  lib/librte_eal/windows/rte_thread.c |  6 ++
> >  3 files changed, 23 insertions(+), 3 deletions(-)
> >
> > diff --git a/lib/librte_eal/include/rte_thread.h
> > b/lib/librte_eal/include/rte_thread.h
> 
> After we introduce a translation function to map from Windows error codes
> to errno style codes (as part of EAL threads API), should we change this to
> directly return the error code from the functions?
> Or do we follow the pattern of setting rte_errno?

Sorry for the late reply,

I'd stick to reporting errors in rte_errno. Note that in cases like
rte_thread_value_get the only way to get the error is through rte_errno,
since the function returns the value itself.
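
The constraint can be sketched with a hypothetical getter whose return
value is the stored pointer, so NULL alone cannot tell "no value" from
"error". The names errno_sketch, key_sketch and value_get_sketch are
made up for this illustration and are not the EAL API.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-thread error variable, in the spirit of rte_errno. */
static _Thread_local int errno_sketch;

/* Hypothetical key holding one value. */
struct key_sketch {
	void *value;
};

/*
 * Return the stored value. NULL is a legal stored value, so an invalid
 * key is reported through errno_sketch rather than the return value.
 */
static void *
value_get_sketch(const struct key_sketch *key)
{
	if (key == NULL) {
		errno_sketch = EINVAL;
		return NULL;
	}
	return key->value;
}
```

In this sketch the caller clears errno_sketch before the call and
checks it afterwards whenever NULL comes back, much like the usual
strtol()/errno idiom.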

BTW, will you also add a translation function for the UNIX errors to get
identical errors?


Re: [dpdk-dev] [PATCH v2 2/4] net/virtio: improve queue init error path

2021-03-15 Thread David Marchand
On Mon, Mar 15, 2021 at 4:20 PM Maxime Coquelin
 wrote:
> @@ -604,15 +604,22 @@ virtio_init_queue(struct rte_eth_dev *dev, uint16_t 
> queue_idx)
>
> if (VIRTIO_OPS(hw)->setup_queue(hw, vq) < 0) {
> PMD_INIT_LOG(ERR, "setup_queue failed");
> -   return -EINVAL;
> +   ret = -EINVAL;
> +   goto clean_vq;
> }
>
> return 0;
>
> -fail_q_alloc:
> -   rte_free(sw_ring);
> +clean_vq:
> +   hw->cvq = NULL;
> +
> +   if (sw_ring)
> +   rte_free(sw_ring);

Nit: rte_free handles NULL fine, you can remove the test, the same way
it was done before.

> +free_hdr_mz:
> rte_memzone_free(hdr_mz);
> +free_mz:
> rte_memzone_free(mz);
> +free_vq:
> rte_free(vq);
>
> return ret;

-- 
David Marchand



Re: [dpdk-dev] [PATCH v2 2/4] net/virtio: improve queue init error path

2021-03-15 Thread Maxime Coquelin



On 3/15/21 4:38 PM, David Marchand wrote:
> On Mon, Mar 15, 2021 at 4:20 PM Maxime Coquelin
>  wrote:
>> @@ -604,15 +604,22 @@ virtio_init_queue(struct rte_eth_dev *dev, uint16_t 
>> queue_idx)
>>
>> if (VIRTIO_OPS(hw)->setup_queue(hw, vq) < 0) {
>> PMD_INIT_LOG(ERR, "setup_queue failed");
>> -   return -EINVAL;
>> +   ret = -EINVAL;
>> +   goto clean_vq;
>> }
>>
>> return 0;
>>
>> -fail_q_alloc:
>> -   rte_free(sw_ring);
>> +clean_vq:
>> +   hw->cvq = NULL;
>> +
>> +   if (sw_ring)
>> +   rte_free(sw_ring);
> 
> Nit: rte_free handles NULL fine, you can remove the test, the same way
> it was done before.

The API doc indeed specifies the NULL case, I'll remove it in v3.

>> +free_hdr_mz:
>> rte_memzone_free(hdr_mz);
>> +free_mz:
>> rte_memzone_free(mz);
>> +free_vq:
>> rte_free(vq);
>>
>> return ret;
> 

Thanks,
Maxime



Re: [dpdk-dev] [PATCH v2 3/4] net/virtio: allocate fake mbuf in Rx queue

2021-03-15 Thread David Marchand
On Mon, Mar 15, 2021 at 4:20 PM Maxime Coquelin
 wrote:
> @@ -550,10 +551,18 @@ virtio_init_queue(struct rte_eth_dev *dev, uint16_t 
> queue_idx)
> goto free_hdr_mz;
> }
>
> +   fake_mbuf = malloc(sizeof(*fake_mbuf));
> +   if (!fake_mbuf) {
> +   PMD_INIT_LOG(ERR, "can not allocate fake mbuf");
> +   ret = -ENOMEM;
> +   goto free_sw_ring;
> +   }
> +
> vq->sw_ring = sw_ring;
> rxvq = &vq->rxq;
> rxvq->port_id = dev->data->port_id;
> rxvq->mz = mz;
> +   rxvq->fake_mbuf = fake_mbuf;

IIRC, vq is allocated as dpdk memory (rte_malloc).
Generally speaking, storing a local pointer inside such an object is
dangerous if other processes start to look at this part.


> } else if (queue_type == VTNET_TQ) {
> txvq = &vq->txq;
> txvq->port_id = dev->data->port_id;
> @@ -613,6 +622,9 @@ virtio_init_queue(struct rte_eth_dev *dev, uint16_t 
> queue_idx)
>  clean_vq:
> hw->cvq = NULL;
>
> +   if (fake_mbuf)
> +   free(fake_mbuf);

No need for if().


-- 
David Marchand



Re: [dpdk-dev] [PATCH v2 4/4] net/virtio: pack virtuqueue struct

2021-03-15 Thread David Marchand
On Mon, Mar 15, 2021 at 4:20 PM Maxime Coquelin
 wrote:
>
> This patch optimizes packing of the virtuqueue

virtqueue ? and same typo in the title.

> struct by moving fields around to fill holes.
>
> Offset field is not used and so can be removed.
>
> Signed-off-by: Maxime Coquelin 
> Reviewed-by: Chenbo Xia 


-- 
David Marchand



[dpdk-dev] [PATCH v2] vhost: add header check in dequeue offload

2021-03-15 Thread Xiao Wang
When parsing the virtio net header and packet header for dequeue offload,
we need to perform a sanity check on the packet header to ensure:
  - No out-of-boundary memory access.
  - The packet header and virtio_net header are valid and aligned.
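
The kind of check listed above can be sketched on a raw byte buffer.
This is a stand-alone illustration, not the vhost code: the function
name ipv4_hdr_len_checked_sketch and the constants are hypothetical,
VLAN tags are ignored, and the exact comparisons differ from the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_HDR_LEN_SKETCH	14	/* Ethernet header without VLAN tag */
#define IPV4_MIN_HDR_LEN_SKETCH	20	/* IPv4 header without options */
#define ETHERTYPE_IPV4_SKETCH	0x0800

/*
 * Return the IPv4 header length in bytes, or -1 if the buffer is too
 * short or the header is malformed. No byte is read before the length
 * check that covers it.
 */
static int
ipv4_hdr_len_checked_sketch(const uint8_t *buf, size_t len)
{
	uint16_t ethertype;
	size_t ihl;

	if (buf == NULL ||
	    len < ETH_HDR_LEN_SKETCH + IPV4_MIN_HDR_LEN_SKETCH)
		return -1;

	ethertype = (uint16_t)((buf[12] << 8) | buf[13]);
	if (ethertype != ETHERTYPE_IPV4_SKETCH)
		return -1;

	/* Version is the high nibble, header length in words the low one. */
	if ((buf[ETH_HDR_LEN_SKETCH] >> 4) != 4)
		return -1;
	ihl = (size_t)(buf[ETH_HDR_LEN_SKETCH] & 0x0f) * 4;

	/* The length claimed by the header must fit in the buffer. */
	if (ihl < IPV4_MIN_HDR_LEN_SKETCH || len < ETH_HDR_LEN_SKETCH + ihl)
		return -1;

	return (int)ihl;
}
```

The important point is the second length check: the header length is
attacker-controlled data, so it has to be validated against the buffer
before it is used to locate the L4 header.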

Fixes: d0cf91303d73 ("vhost: add Tx offload capabilities")
Cc: sta...@dpdk.org

Signed-off-by: Xiao Wang 
---
v2:
Allow empty L4 payload for cksum offload.
---
 lib/librte_vhost/virtio_net.c | 49 +--
 1 file changed, 43 insertions(+), 6 deletions(-)

diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 583bf379c6..53a8ff2898 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -1821,44 +1821,64 @@ virtio_net_with_host_offload(struct virtio_net *dev)
return false;
 }
 
-static void
-parse_ethernet(struct rte_mbuf *m, uint16_t *l4_proto, void **l4_hdr)
+static int
+parse_ethernet(struct rte_mbuf *m, uint16_t *l4_proto, void **l4_hdr,
+   uint16_t *len)
 {
struct rte_ipv4_hdr *ipv4_hdr;
struct rte_ipv6_hdr *ipv6_hdr;
void *l3_hdr = NULL;
struct rte_ether_hdr *eth_hdr;
uint16_t ethertype;
+   uint16_t data_len = m->data_len;
 
eth_hdr = rte_pktmbuf_mtod(m, struct rte_ether_hdr *);
 
+   if (data_len <= sizeof(struct rte_ether_hdr))
+   return -EINVAL;
+
m->l2_len = sizeof(struct rte_ether_hdr);
ethertype = rte_be_to_cpu_16(eth_hdr->ether_type);
+   data_len -= sizeof(struct rte_ether_hdr);
 
if (ethertype == RTE_ETHER_TYPE_VLAN) {
+   if (data_len <= sizeof(struct rte_vlan_hdr))
+   return -EINVAL;
+
struct rte_vlan_hdr *vlan_hdr =
(struct rte_vlan_hdr *)(eth_hdr + 1);
 
m->l2_len += sizeof(struct rte_vlan_hdr);
ethertype = rte_be_to_cpu_16(vlan_hdr->eth_proto);
+   data_len -= sizeof(struct rte_vlan_hdr);
}
 
l3_hdr = (char *)eth_hdr + m->l2_len;
 
switch (ethertype) {
case RTE_ETHER_TYPE_IPV4:
+   if (data_len <= sizeof(struct rte_ipv4_hdr))
+   return -EINVAL;
ipv4_hdr = l3_hdr;
*l4_proto = ipv4_hdr->next_proto_id;
m->l3_len = rte_ipv4_hdr_len(ipv4_hdr);
+   if (data_len <= m->l3_len) {
+   m->l3_len = 0;
+   return -EINVAL;
+   }
*l4_hdr = (char *)l3_hdr + m->l3_len;
m->ol_flags |= PKT_TX_IPV4;
+   data_len -= m->l3_len;
break;
case RTE_ETHER_TYPE_IPV6:
+   if (data_len <= sizeof(struct rte_ipv6_hdr))
+   return -EINVAL;
ipv6_hdr = l3_hdr;
*l4_proto = ipv6_hdr->proto;
m->l3_len = sizeof(struct rte_ipv6_hdr);
*l4_hdr = (char *)l3_hdr + m->l3_len;
m->ol_flags |= PKT_TX_IPV6;
+   data_len -= m->l3_len;
break;
default:
m->l3_len = 0;
@@ -1866,6 +1886,9 @@ parse_ethernet(struct rte_mbuf *m, uint16_t *l4_proto, 
void **l4_hdr)
*l4_hdr = NULL;
break;
}
+
+   *len = data_len;
+   return 0;
 }
 
 static __rte_always_inline void
@@ -1874,24 +1897,30 @@ vhost_dequeue_offload(struct virtio_net_hdr *hdr, 
struct rte_mbuf *m)
uint16_t l4_proto = 0;
void *l4_hdr = NULL;
struct rte_tcp_hdr *tcp_hdr = NULL;
+   uint16_t len = 0;
 
if (hdr->flags == 0 && hdr->gso_type == VIRTIO_NET_HDR_GSO_NONE)
return;
 
-   parse_ethernet(m, &l4_proto, &l4_hdr);
+   if (parse_ethernet(m, &l4_proto, &l4_hdr, &len) < 0)
+   return;
+
if (hdr->flags == VIRTIO_NET_HDR_F_NEEDS_CSUM) {
if (hdr->csum_start == (m->l2_len + m->l3_len)) {
switch (hdr->csum_offset) {
case (offsetof(struct rte_tcp_hdr, cksum)):
-   if (l4_proto == IPPROTO_TCP)
+   if (l4_proto == IPPROTO_TCP &&
+   len >= sizeof(struct rte_tcp_hdr))
m->ol_flags |= PKT_TX_TCP_CKSUM;
break;
case (offsetof(struct rte_udp_hdr, dgram_cksum)):
-   if (l4_proto == IPPROTO_UDP)
+   if (l4_proto == IPPROTO_UDP &&
+   len >= sizeof(struct rte_udp_hdr))
m->ol_flags |= PKT_TX_UDP_CKSUM;
break;
case (offsetof(struct rte_sctp_hdr, cksum)):
-   if (l4_proto == IPPROTO_SCTP)
+   if (l4_proto == IPPROTO_SCTP &&
+  

Re: [dpdk-dev] [PATCH 2/2] net/mlx5: avoid unbind step to enable switchdev mode

2021-03-15 Thread Slava Ovsiienko
Hi, Jan

Yes, bullet [4] explicitly requires unbinding the VFs and detaching the
netdevs from the mlx5_core driver.
Otherwise, the kernel driver refuses to be configured with switchdev mode
in [5]. So, [4] can't be skipped.
After setting switchdev mode, the VFs can be bound back (if it is needed,
and they are not mapped to VMs):

echo -n "" > > /sys/bus/pci/drivers/mlx5_core/bind

With best regards,
Slava

> -Original Message-
> From: Jan Viktorin 
> Sent: Monday, March 15, 2021 17:34
> To: dev@dpdk.org
> Cc: Jan Viktorin ; Asaf Penso ;
> Shahaf Shuler ; Slava Ovsiienko
> ; Matan Azrad 
> Subject: [PATCH 2/2] net/mlx5: avoid unbind step to enable switchdev mode
> 
> From: Jan Viktorin 
> 
> The step 4 is a contradiction. It advices to unbind the device from the
> mlx5_core which removes the associated system network interface (e.g.
> eth0). In the step 5, the same system network interface (e.g. eth0) is
> required to exist.
> 
> Signed-off-by: Jan Viktorin 
> ---
>  doc/guides/nics/mlx5.rst | 6 +-
>  1 file changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst index
> 0a2dc3dee..122d8e0fc 100644
> --- a/doc/guides/nics/mlx5.rst
> +++ b/doc/guides/nics/mlx5.rst
> @@ -1370,11 +1370,7 @@ the DPDK application.
> 
>  echo  /sys/class/net//device/sriov_numvfs
> 
> -4. Unbind the device (can be rebind after the switchdev mode)::
> -
> -echo -n "" >
> /sys/bus/pci/drivers/mlx5_core/unbind
> -
> -5. Enable switchdev mode::
> +4. Enable switchdev mode::
> 
>  echo switchdev > /sys/class/net//compat/devlink/mode
> 
> --
> 2.30.1



Re: [dpdk-dev] [PATCH v2 0/8] common/sfc_efx: prepare to introduce vDPA driver

2021-03-15 Thread Ferruh Yigit

On 3/15/2021 1:58 PM, Andrew Rybchenko wrote:

Update base driver to provide functionality required by vDPA driver.

Factor out helper functions to be shared by net and vDPA drivers.

v2:
  - fix windows build breakage - do not build common/sfc_efx in the case
of windows
  - remove undefined efx_virtio_* functions from version.map (since
EFSYS_OPT_VIRTIO is disabled)

Vijay Kumar Srivastava (6):
   common/sfc_efx/base: add virtio build dependency
   common/sfc_efx/base: add support to get virtio features
   common/sfc_efx/base: add support to verify virtio features
   common/sfc_efx: add support to get the device class
   net/sfc: skip driver probe for incompatible device class
   drivers: add common driver API to get efx family

Vijay Srivastava (2):
   common/sfc_efx/base: add base virtio support for vDPA
   common/sfc_efx/base: add API to get VirtQ doorbell offset



The build still fails for Windows:
http://mails.dpdk.org/archives/test-report/2021-March/182539.html

I guess "build=false" is missing:
  if is_windows
+     build=false
      subdir_done()
  endif




Re: [dpdk-dev] [PATCH v3 07/11] eal: add log level help

2021-03-15 Thread Stephen Hemminger
On Mon, 15 Mar 2021 11:52:13 +0100
Thomas Monjalon  wrote:

> 15/03/2021 11:42, Kinsella, Ray:
> > 
> > On 15/03/2021 10:31, Bruce Richardson wrote:  
> > > On Mon, Mar 15, 2021 at 10:19:47AM +, Kinsella, Ray wrote:  
> > >>
> > >>
> > >> On 12/03/2021 18:17, Thomas Monjalon wrote:  
> > >>> The option --log-level was not completely described in the usage text,
> > >>> and it was difficult to guess the names of the log types and levels.
> > >>>
> > >>> A new value "help" is accepted after --log-level to give more details
> > >>> about the syntax and listing the log types and levels.
> > >>>
> > >>> The array "levels" used for level name parsing is replaced with
> > >>> a (modified) existing function which was used in rte_log_dump().
> > >>>
> > >>> The new function rte_log_list_types() is exported in the API
> > >>> for allowing an application to give this info to the user
> > >>> if not exposing the EAL option --log-level.
> > >>> The list of log types cannot include all drivers if not linked in the
> > >>> application (shared object plugin case).
> > >>>
> > >>> Signed-off-by: Thomas Monjalon 
> > >>> ---
> > >>>  lib/librte_eal/common/eal_common_log.c | 24 +---
> > >>>  lib/librte_eal/common/eal_common_options.c | 44 +++---
> > >>>  lib/librte_eal/common/eal_log.h|  5 +++
> > >>>  lib/librte_eal/include/rte_log.h   | 11 ++
> > >>>  lib/librte_eal/version.map |  3 ++
> > >>>  5 files changed, 69 insertions(+), 18 deletions(-)
> > >>>  
> > >   
> > >>> @@ -1274,6 +1286,11 @@ eal_parse_log_level(const char *arg)
> > >>> char *str, *level;
> > >>> int priority;
> > >>>  
> > >>> +   if (strcmp(arg, "help") == 0) {  
> > >>
> > >> So I think the convention is to support both "?" and "help".
> > >> Qemu does this at least. 
> > >>  
> > > I've seen "/?" used for help on windows binaries, but "-?" not so much in 
> > > the
> > > linux world, where --help (and often -h for short) seem to be the 
> > > standard.
> > >   
> > 
> > This is slightly different - it is where you are looking to return a list 
> > of valid 
> > values for a parameter. So for instance in qemu mentioned above 
> > 
> >  ~ > qemu-system-x86_64 -cpu ? | head -n 10  
> 
> "?" is a special character.
> In my zsh, I need to quote it to avoid globbing parsing,
> so I'm not a fan.
> 
> I will let you extend the syntax in a separate patch :)
> 
> 

Also, '?' is what getopt returns for an unknown option. So qemu might just be
doing that as an unintended side effect of handling any unknown option.



Re: [dpdk-dev] [PATCH 2/2] net/mlx5: avoid unbind step to enable switchdev mode

2021-03-15 Thread Jan Viktorin
Hello Slava,

On Mon, 15 Mar 2021 15:53:51 +
Slava Ovsiienko  wrote:

> Hi, Jan
> 
> Yes, bullet [4] explicitly requires unbinding the VFs and detaching the
> netdevs from the mlx5_core driver.
> Otherwise, the kernel driver refuses to be configured with switchdev mode in [5].
> So, [4] can't be skipped.
> After setting switchdev mode, VFs can be bound back (if it is needed, and
> these ones are not mapped to VMs):

OK, but I believe that it is **not possible** to follow rule [5].
The guide explicitly says in [4] "can be rebind **after** the switchdev mode".
But once you unbind the device, there is no way to configure the switchdev
mode; this is the contradiction I mentioned in the commit. You cannot configure
switchdev mode because the interface is gone and the path
/sys/class/net//compat/devlink/mode no longer exists.

So, maybe just the wording is wrong. What is the **exact right** way to
do it? I would change the commit accordingly. Let's just make it right.
Would it work this way?

 # echo -n "" > /sys/bus/pci/drivers/mlx5_core/unbind
 # echo -n "" > /sys/bus/pci/drivers/mlx5_core/bind
 # echo switchdev > /sys/class/net//compat/devlink/mode

It is good to mention that after the rebind, the  can
change.
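
Each echo command above just writes a short string to a sysfs attribute. As a minimal sketch of doing the same from a C program, the hypothetical helper below (write_attr is not an existing API) writes a value to a path supplied by the caller. It does not answer which order of unbind, bind and mode change the kernel accepts, and the paths stay the caller's responsibility.

```c
#include <stdio.h>

/*
 * Write a string to an attribute file, as "echo value > path" would.
 * Returns 0 on success and -1 if the file cannot be opened or written.
 */
static int write_attr(const char *path, const char *value)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (f == NULL)
		return -1;
	ok = fputs(value, f) >= 0;
	/* For sysfs, the write error often surfaces only when flushing. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Checking the result of fclose() matters here because a rejected value is typically reported when the buffered data is flushed to the attribute.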

Regards,
Jan

> 
> echo -n "" > > /sys/bus/pci/drivers/mlx5_core/bind
> 
> With best regards,
> Slava
> 
> > -Original Message-
> > From: Jan Viktorin 
> > Sent: Monday, March 15, 2021 17:34
> > To: dev@dpdk.org
> > Cc: Jan Viktorin ; Asaf Penso ;
> > Shahaf Shuler ; Slava Ovsiienko
> > ; Matan Azrad 
> > Subject: [PATCH 2/2] net/mlx5: avoid unbind step to enable switchdev mode
> > 
> > From: Jan Viktorin 
> > 
> > The step 4 is a contradiction. It advises to unbind the device from the
> > mlx5_core which removes the associated system network interface (e.g.
> > eth0). In the step 5, the same system network interface (e.g. eth0) is
> > required to exist.
> > 
> > Signed-off-by: Jan Viktorin 
> > ---
> >  doc/guides/nics/mlx5.rst | 6 +-
> >  1 file changed, 1 insertion(+), 5 deletions(-)
> > 
> > diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst index
> > 0a2dc3dee..122d8e0fc 100644
> > --- a/doc/guides/nics/mlx5.rst
> > +++ b/doc/guides/nics/mlx5.rst
> > @@ -1370,11 +1370,7 @@ the DPDK application.
> > 
> >  echo  /sys/class/net//device/sriov_numvfs
> > 
> > -4. Unbind the device (can be rebind after the switchdev mode)::
> > -
> > -echo -n "" >
> > /sys/bus/pci/drivers/mlx5_core/unbind
> > -
> > -5. Enable switchdev mode::
> > +4. Enable switchdev mode::
> > 
> >  echo switchdev > /sys/class/net//compat/devlink/mode
> > 
> > --
> > 2.30.1  
> 



Re: [dpdk-dev] [dpdk-stable] [PATCH v2] vhost: add header check in dequeue offload

2021-03-15 Thread David Marchand
On Mon, Mar 15, 2021 at 4:52 PM Xiao Wang  wrote:
>
> When parsing the virtio net header and packet header for dequeue offload,
> we need to perform sanity check on the packet header to ensure:
>   - No out-of-boundary memory access.
>   - The packet header and virtio_net header are valid and aligned.
>
> Fixes: d0cf91303d73 ("vhost: add Tx offload capabilities")
> Cc: sta...@dpdk.org
>
> Signed-off-by: Xiao Wang 
> ---
> v2:
> Allow empty L4 payload for cksum offload.
> ---
>  lib/librte_vhost/virtio_net.c | 49 
> +--
>  1 file changed, 43 insertions(+), 6 deletions(-)
>
> diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
> index 583bf379c6..53a8ff2898 100644
> --- a/lib/librte_vhost/virtio_net.c
> +++ b/lib/librte_vhost/virtio_net.c
> @@ -1821,44 +1821,64 @@ virtio_net_with_host_offload(struct virtio_net *dev)
> return false;
>  }
>
> -static void
> -parse_ethernet(struct rte_mbuf *m, uint16_t *l4_proto, void **l4_hdr)
> +static int
> +parse_ethernet(struct rte_mbuf *m, uint16_t *l4_proto, void **l4_hdr,
> +   uint16_t *len)
>  {
> struct rte_ipv4_hdr *ipv4_hdr;
> struct rte_ipv6_hdr *ipv6_hdr;
> void *l3_hdr = NULL;
> struct rte_ether_hdr *eth_hdr;
> uint16_t ethertype;
> +   uint16_t data_len = m->data_len;

>
> eth_hdr = rte_pktmbuf_mtod(m, struct rte_ether_hdr *);
>
> +   if (data_len <= sizeof(struct rte_ether_hdr))
> +   return -EINVAL;

On principle, the check should happen before calling rte_pktmbuf_mtod,
like what rte_pktmbuf_read does.

Looking at the rest of the patch, does this helper function only
handle mono segment mbufs?
My reading of copy_desc_to_mbuf() was that it could generate multi
segments mbufs...


[snip]

> case RTE_ETHER_TYPE_IPV4:
> +   if (data_len <= sizeof(struct rte_ipv4_hdr))
> +   return -EINVAL;
> ipv4_hdr = l3_hdr;
> *l4_proto = ipv4_hdr->next_proto_id;
> m->l3_len = rte_ipv4_hdr_len(ipv4_hdr);
> +   if (data_len <= m->l3_len) {
> +   m->l3_len = 0;
> +   return -EINVAL;
> +   }

... so here, comparing l3 length to only the first segment length
(data_len) would be invalid.

If this helper must deal with multi segments, why not use rte_pktmbuf_read?
This function returns access to mbuf data after checking offset and
length are contiguous, else copy the needed data in a passed buffer.
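
The contiguous-or-copy behaviour described above can be sketched without DPDK headers. This is only an illustration under stated assumptions: struct seg and seg_read are hypothetical stand-ins for a chained mbuf and for rte_pktmbuf_read, not the real implementation, and zero-length reads are rejected to keep the sketch short.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one segment of a chained packet buffer. */
struct seg {
	const uint8_t *data;	/* start of this segment's data */
	uint32_t len;		/* number of bytes in this segment */
	struct seg *next;	/* next segment, or NULL */
};

/*
 * Give access to len bytes starting at offset off in the chain.
 * If the range lies inside one segment, return a pointer into it.
 * Otherwise gather the bytes into buf and return buf.
 * Return NULL if the chain does not hold the requested range.
 */
static const void *seg_read(const struct seg *s, uint32_t off, uint32_t len,
			    void *buf)
{
	const struct seg *it;
	uint8_t *out = buf;
	uint32_t total = 0;

	for (it = s; it != NULL; it = it->next)
		total += it->len;
	if (len == 0 || off >= total || len > total - off)
		return NULL;

	/* Skip the segments that lie entirely before the offset. */
	while (off >= s->len) {
		off -= s->len;
		s = s->next;
	}

	/* Fast path: the whole range is contiguous, no copy needed. */
	if (len <= s->len - off)
		return s->data + off;

	/* Slow path: copy piece by piece from successive segments. */
	while (len > 0) {
		uint32_t chunk = s->len - off;

		if (chunk > len)
			chunk = len;
		memcpy(out, s->data + off, chunk);
		out += chunk;
		len -= chunk;
		off = 0;
		s = s->next;
	}
	return buf;
}
```

A header parser built on such a helper never compares a header length against the first segment only, which is the concern raised about data_len above.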


> *l4_hdr = (char *)l3_hdr + m->l3_len;
> m->ol_flags |= PKT_TX_IPV4;
> +   data_len -= m->l3_len;
> break;


-- 
David Marchand



[dpdk-dev] [PATCH 0/7] Add support for VXLAN and NVGRE encap as a sample actions

2021-03-15 Thread Salem Sol
This series adds support for VXLAN and NVGRE encap as sample actions, with the
proper documentation. The series depends on [1] for the documentation part.
  
[1] 
http://patches.dpdk.org/project/dpdk/patch/1615774238-51875-1-git-send-email-jiaw...@nvidia.com/

Jiawei Wang (1):
  app/testpmd: store VXLAN/NVGRE encap data globally

Salem Sol (6):
  net/mlx5: support VXLAN encap action in sample
  net/mlx5: support NVGRE encap action in sample
  app/testpmd: support VXLAN encap for sample action
  app/testpmd: support NVGRE encap for sample action
  doc: update sample actions support in testpmd guide
  doc: update sample actions support in mlx5 guide

 app/test-pmd/cmdline_flow.c | 90 ++---
 doc/guides/nics/mlx5.rst|  4 +-
 doc/guides/testpmd_app_ug/testpmd_funcs.rst | 21 +
 drivers/net/mlx5/mlx5_flow_dv.c | 13 +++
 4 files changed, 99 insertions(+), 29 deletions(-)

-- 
2.21.0



[dpdk-dev] [PATCH 1/7] app/testpmd: store VXLAN/NVGRE encap data globally

2021-03-15 Thread Salem Sol
From: Jiawei Wang 

With the current code, the VXLAN/NVGRE parsing routine
stores the header configuration on the stack, which
might lead to the data being overwritten.

This patch stores the external data of the VXLAN and NVGRE encap
actions in global data, as a preliminary step to supporting VXLAN
and NVGRE encap as sample actions.
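
As a rough illustration of why the storage location matters, consider a configuration whose definition pointer refers to items held in the same object: the object has to outlive every user of the configuration. The sketch below uses simplified, hypothetical types and names (encap_data, sample_encap_store, setup_encap), not the testpmd structures.

```c
#include <stddef.h>

#define MAX_CONFS 4	/* hypothetical number of stored configurations */

struct item {
	int type;
};

struct encap_conf {
	const struct item *definition;	/* points at the items below */
};

struct encap_data {
	struct encap_conf conf;
	struct item items[3];
};

/* Static storage: remains valid after the setup function returns. */
static struct encap_data sample_encap_store[MAX_CONFS];

/* Fill slot idx and return its configuration, or NULL if idx is invalid. */
static const struct encap_conf *setup_encap(unsigned int idx)
{
	struct encap_data *d;

	if (idx >= MAX_CONFS)
		return NULL;
	d = &sample_encap_store[idx];
	d->conf.definition = d->items;	/* self-reference into the slot */
	d->items[0].type = 1;
	d->items[1].type = 2;
	d->items[2].type = 0;		/* end marker */
	return &d->conf;
}
```

Had encap_data been a local variable of setup_encap, the returned pointer and its definition field would both dangle once the function returned.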

Signed-off-by: Jiawei Wang 
---
 app/test-pmd/cmdline_flow.c | 76 -
 1 file changed, 49 insertions(+), 27 deletions(-)

diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
index 49d9f9c043..84676a2e45 100644
--- a/app/test-pmd/cmdline_flow.c
+++ b/app/test-pmd/cmdline_flow.c
@@ -5244,31 +5244,14 @@ parse_vc_action_rss_queue(struct context *ctx, const 
struct token *token,
return len;
 }
 
-/** Parse VXLAN encap action. */
+/** Setup VXLAN encap configuration. */
 static int
-parse_vc_action_vxlan_encap(struct context *ctx, const struct token *token,
-   const char *str, unsigned int len,
-   void *buf, unsigned int size)
+parse_setup_vxlan_encap_data
+   (struct action_vxlan_encap_data *action_vxlan_encap_data)
 {
-   struct buffer *out = buf;
-   struct rte_flow_action *action;
-   struct action_vxlan_encap_data *action_vxlan_encap_data;
-   int ret;
-
-   ret = parse_vc(ctx, token, str, len, buf, size);
-   if (ret < 0)
-   return ret;
-   /* Nothing else to do if there is no buffer. */
-   if (!out)
-   return ret;
-   if (!out->args.vc.actions_n)
+   if (!action_vxlan_encap_data)
return -1;
-   action = &out->args.vc.actions[out->args.vc.actions_n - 1];
-   /* Point to selected object. */
-   ctx->object = out->args.vc.data;
-   ctx->objmask = NULL;
/* Set up default configuration. */
-   action_vxlan_encap_data = ctx->object;
*action_vxlan_encap_data = (struct action_vxlan_encap_data){
.conf = (struct rte_flow_action_vxlan_encap){
.definition = action_vxlan_encap_data->items,
@@ -5372,19 +5355,18 @@ parse_vc_action_vxlan_encap(struct context *ctx, const 
struct token *token,
}
memcpy(action_vxlan_encap_data->item_vxlan.vni, vxlan_encap_conf.vni,
   RTE_DIM(vxlan_encap_conf.vni));
-   action->conf = &action_vxlan_encap_data->conf;
-   return ret;
+   return 0;
 }
 
-/** Parse NVGRE encap action. */
+/** Parse VXLAN encap action. */
 static int
-parse_vc_action_nvgre_encap(struct context *ctx, const struct token *token,
+parse_vc_action_vxlan_encap(struct context *ctx, const struct token *token,
const char *str, unsigned int len,
void *buf, unsigned int size)
 {
struct buffer *out = buf;
struct rte_flow_action *action;
-   struct action_nvgre_encap_data *action_nvgre_encap_data;
+   struct action_vxlan_encap_data *action_vxlan_encap_data;
int ret;
 
ret = parse_vc(ctx, token, str, len, buf, size);
@@ -5399,8 +5381,20 @@ parse_vc_action_nvgre_encap(struct context *ctx, const 
struct token *token,
/* Point to selected object. */
ctx->object = out->args.vc.data;
ctx->objmask = NULL;
+   action_vxlan_encap_data = ctx->object;
+   parse_setup_vxlan_encap_data(action_vxlan_encap_data);
+   action->conf = &action_vxlan_encap_data->conf;
+   return ret;
+}
+
+/** Setup NVGRE encap configuration. */
+static int
+parse_setup_nvgre_encap_data
+   (struct action_nvgre_encap_data *action_nvgre_encap_data)
+{
+   if (!action_nvgre_encap_data)
+   return -1;
/* Set up default configuration. */
-   action_nvgre_encap_data = ctx->object;
*action_nvgre_encap_data = (struct action_nvgre_encap_data){
.conf = (struct rte_flow_action_nvgre_encap){
.definition = action_nvgre_encap_data->items,
@@ -5463,6 +5457,34 @@ parse_vc_action_nvgre_encap(struct context *ctx, const 
struct token *token,
RTE_FLOW_ITEM_TYPE_VOID;
memcpy(action_nvgre_encap_data->item_nvgre.tni, nvgre_encap_conf.tni,
   RTE_DIM(nvgre_encap_conf.tni));
+   return 0;
+}
+
+/** Parse NVGRE encap action. */
+static int
+parse_vc_action_nvgre_encap(struct context *ctx, const struct token *token,
+   const char *str, unsigned int len,
+   void *buf, unsigned int size)
+{
+   struct buffer *out = buf;
+   struct rte_flow_action *action;
+   struct action_nvgre_encap_data *action_nvgre_encap_data;
+   int ret;
+
+   ret = parse_vc(ctx, token, str, len, buf, size);
+   if (ret < 0)
+   return ret;
+   /* Nothing else to do if there is no buffer. */
+   if (!out)
+   return ret;
+   if (!out->args.vc.actions_n)
+   return -1;
+   action = &out->

[dpdk-dev] [PATCH 2/7] net/mlx5: support VXLAN encap action in sample

2021-03-15 Thread Salem Sol
Add support for VXLAN encap as a sample action
and validate it.

Signed-off-by: Salem Sol 
---
 drivers/net/mlx5/mlx5_flow_dv.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 1a74d5ac2b..4b2db47e39 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -5242,6 +5242,16 @@ flow_dv_validate_action_sample(uint64_t *action_flags,
return ret;
++actions_n;
break;
+   case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
+   ret = flow_dv_validate_action_l2_encap(dev,
+  sub_action_flags,
+  act, attr,
+  error);
+   if (ret < 0)
+   return ret;
+   sub_action_flags |= MLX5_FLOW_ACTION_ENCAP;
+   ++actions_n;
+   break;
default:
return rte_flow_error_set(error, ENOTSUP,
  RTE_FLOW_ERROR_TYPE_ACTION,
@@ -10407,6 +10417,7 @@ flow_dv_translate_action_sample(struct rte_eth_dev *dev,
action_flags |= MLX5_FLOW_ACTION_PORT_ID;
break;
}
+   case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
/* Save the encap resource before sample */
pre_rix = dev_flow->handle->dvh.rix_encap_decap;
-- 
2.21.0



[dpdk-dev] [PATCH 4/7] app/testpmd: support VXLAN encap for sample action

2021-03-15 Thread Salem Sol
Add support for rte_flow_action_vxlan_encap as a sample action.

The example of test-pmd command:

1.  set vxlan ip-version ... vni ... udp-src ...
set raw_encap 1 eth src.../ ipv4.../...
set sample_actions 2 vxlan_encap / port_id id 0 / end
flow create 0 ... pattern eth / end actions
   sample ratio 1 index 2 / raw_encap index 1 / port_id id 0...

With this flow, all the matched egress packets are encapsulated
and sent to the wire. They are also mirrored, and the mirrored copies
are encapsulated using the VXLAN data and sent to the wire.

Signed-off-by: Salem Sol 
---
 app/test-pmd/cmdline_flow.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
index 84676a2e45..61dfaab8fd 100644
--- a/app/test-pmd/cmdline_flow.c
+++ b/app/test-pmd/cmdline_flow.c
@@ -582,6 +582,7 @@ struct rte_flow_action_queue 
sample_queue[RAW_SAMPLE_CONFS_MAX_NUM];
 struct rte_flow_action_count sample_count[RAW_SAMPLE_CONFS_MAX_NUM];
 struct rte_flow_action_port_id sample_port_id[RAW_SAMPLE_CONFS_MAX_NUM];
 struct rte_flow_action_raw_encap sample_encap[RAW_SAMPLE_CONFS_MAX_NUM];
+struct action_vxlan_encap_data sample_vxlan_encap[RAW_SAMPLE_CONFS_MAX_NUM];
 struct action_rss_data sample_rss_data[RAW_SAMPLE_CONFS_MAX_NUM];
 struct rte_flow_action_vf sample_vf[RAW_SAMPLE_CONFS_MAX_NUM];
 
@@ -1615,6 +1616,7 @@ static const enum index next_action_sample[] = {
ACTION_COUNT,
ACTION_PORT_ID,
ACTION_RAW_ENCAP,
+   ACTION_VXLAN_ENCAP,
ACTION_NEXT,
ZERO,
 };
@@ -7949,6 +7951,11 @@ cmd_set_raw_parsed_sample(const struct buffer *in)
(const void *)action->conf, size);
action->conf = &sample_vf[idx];
break;
+   case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
+   size = sizeof(struct rte_flow_action_vxlan_encap);
+   parse_setup_vxlan_encap_data(&sample_vxlan_encap[idx]);
+   action->conf = &sample_vxlan_encap[idx].conf;
+   break;
default:
printf("Error - Not supported action\n");
return;
-- 
2.21.0



[dpdk-dev] [PATCH 3/7] net/mlx5: support NVGRE encap action in sample

2021-03-15 Thread Salem Sol
Add support for NVGRE encap as a sample action
and validate it.

Signed-off-by: Salem Sol 
---
 drivers/net/mlx5/mlx5_flow_dv.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 4b2db47e39..590abdc822 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -5243,6 +5243,7 @@ flow_dv_validate_action_sample(uint64_t *action_flags,
++actions_n;
break;
case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
+   case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
ret = flow_dv_validate_action_l2_encap(dev,
   sub_action_flags,
   act, attr,
@@ -10418,6 +10419,7 @@ flow_dv_translate_action_sample(struct rte_eth_dev *dev,
break;
}
case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
+   case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
/* Save the encap resource before sample */
pre_rix = dev_flow->handle->dvh.rix_encap_decap;
-- 
2.21.0



[dpdk-dev] [PATCH 5/7] app/testpmd: support NVGRE encap for sample action

2021-03-15 Thread Salem Sol
Add support for rte_flow_action_nvgre_encap as a sample action.

The example of test-pmd command:

1.  set nvgre ip-version ... tni ... ip-src ... ip-dst ...
set raw_encap 1 eth src... / ipv4... /...
set sample_actions 2 nvgre_encap / port_id id 0 / end
flow create 0 ... pattern eth / end actions
   sample ratio 1 index 2 / raw_encap index 1 / port_id id 0...

With this flow, all the matched egress packets are encapsulated
and sent to the wire. They are also mirrored, and the mirrored copies
are encapsulated using the NVGRE data and sent to the wire.

Signed-off-by: Salem Sol 
---
 app/test-pmd/cmdline_flow.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
index 61dfaab8fd..0e33410005 100644
--- a/app/test-pmd/cmdline_flow.c
+++ b/app/test-pmd/cmdline_flow.c
@@ -583,6 +583,7 @@ struct rte_flow_action_count 
sample_count[RAW_SAMPLE_CONFS_MAX_NUM];
 struct rte_flow_action_port_id sample_port_id[RAW_SAMPLE_CONFS_MAX_NUM];
 struct rte_flow_action_raw_encap sample_encap[RAW_SAMPLE_CONFS_MAX_NUM];
 struct action_vxlan_encap_data sample_vxlan_encap[RAW_SAMPLE_CONFS_MAX_NUM];
+struct action_nvgre_encap_data sample_nvgre_encap[RAW_SAMPLE_CONFS_MAX_NUM];
 struct action_rss_data sample_rss_data[RAW_SAMPLE_CONFS_MAX_NUM];
 struct rte_flow_action_vf sample_vf[RAW_SAMPLE_CONFS_MAX_NUM];
 
@@ -1617,6 +1618,7 @@ static const enum index next_action_sample[] = {
ACTION_PORT_ID,
ACTION_RAW_ENCAP,
ACTION_VXLAN_ENCAP,
+   ACTION_NVGRE_ENCAP,
ACTION_NEXT,
ZERO,
 };
@@ -7956,6 +7958,11 @@ cmd_set_raw_parsed_sample(const struct buffer *in)
parse_setup_vxlan_encap_data(&sample_vxlan_encap[idx]);
action->conf = &sample_vxlan_encap[idx].conf;
break;
+   case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
+   size = sizeof(struct rte_flow_action_nvgre_encap);
+   parse_setup_nvgre_encap_data(&sample_nvgre_encap[idx]);
+   action->conf = &sample_nvgre_encap[idx].conf;
+   break;
default:
printf("Error - Not supported action\n");
return;
-- 
2.21.0



[dpdk-dev] [PATCH 6/7] doc: update sample actions support in testpmd guide

2021-03-15 Thread Salem Sol
Update documentation for sample action usage in testpmd utilizing
rte_flow_action_vxlan_encap and rte_flow_action_nvgre_encap and
show the command line example.

Signed-off-by: Salem Sol 
---
 doc/guides/testpmd_app_ug/testpmd_funcs.rst | 21 +
 1 file changed, 21 insertions(+)

diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst 
b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index 3a31cc6237..5e40c9bc1c 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -4900,6 +4900,27 @@ and also mirrored the packets with encapsulation header 
and sent to port id 0.
  testpmd> set sample_actions 0 raw_encap / port_id id 0 / end
  testpmd> flow create 0 ingress transfer pattern eth / end actions
 sample ratio 1 index 0  / port_id id 2 / end
+E-Switch Mirroring rule, the matched ingress packets are sent to port id 2,
+and also mirrored the packets with VXLAN encapsulation header and sent to port 
id 0.
+
+::
+
+ testpmd> set vxlan ip-version ipv4 vni 4 udp-src 4 udp-dst 4 ip-src 127.0.0.1
+ip-dst 128.0.0.1 eth-src 11:11:11:11:11:11 eth-dst 22:22:22:22:22:22
+ testpmd> set sample_actions 0 vxlan_encap / port_id id 0 / end
+ testpmd> flow create 0 ingress transfer pattern eth / end actions
+sample ratio 1 index 0  / port_id id 2 / end
+
+E-Switch Mirroring rule, the matched ingress packets are sent to port id 2,
+and also mirrored the packets with NVGRE encapsulation header and sent to port 
id 0.
+
+::
+
+ testpmd> set nvgre ip-version ipv4 tni 4 ip-src 127.0.0.1 ip-dst 128.0.0.1
+eth-src 11:11:11:11:11:11 eth-dst 22:22:22:22:22:22
+ testpmd> set sample_actions 0 nvgre_encap / port_id id 0 / end
+ testpmd> flow create 0 ingress transfer pattern eth / end actions
+sample ratio 1 index 0  / port_id id 2 / end
 
 BPF Functions
 --
-- 
2.21.0



[dpdk-dev] [PATCH 7/7] doc: update sample actions support in mlx5 guide

2021-03-15 Thread Salem Sol
Updates the documentation with the added support for sample actions VXLAN
and NVGRE encap in E-Switch steering flow.

Signed-off-by: Salem Sol 
---
 doc/guides/nics/mlx5.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 96fce36e3c..378b7202d9 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -376,8 +376,8 @@ Limitations
 encapsulation actions.
   - For NIC Rx flow, supports ``MARK``, ``COUNT``, ``QUEUE``, ``RSS`` in the
 sample actions list.
-  - For E-Switch mirroring flow, supports ``RAW ENCAP``, ``Port ID`` in the
-sample actions list.
+  - For E-Switch mirroring flow, supports ``RAW ENCAP``, ``Port ID``,
+``VXLAN ENCAP``, ``NVGRE ENCAP`` in the sample actions list.
 
 - Modify Field flow:
 
-- 
2.21.0



[dpdk-dev] [PATCH 1/2] net/mlx5: fix typos

2021-03-15 Thread Jan Viktorin
From: Jan Viktorin 

Signed-off-by: Jan Viktorin 
---
 doc/guides/nics/mlx5.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 7c50497fb..0a2dc3dee 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -1372,9 +1372,9 @@ the DPDK application.
 
 4. Unbind the device (can be rebind after the switchdev mode)::
 
-echo -n " /sys/bus/pci/drivers/mlx5_core/unbind
+echo -n "" > /sys/bus/pci/drivers/mlx5_core/unbind
 
-5. Enbale switchdev mode::
+5. Enable switchdev mode::
 
 echo switchdev > /sys/class/net//compat/devlink/mode
 
-- 
2.30.1



[dpdk-dev] [PATCH 2/2] net/mlx5: avoid unbind step to enable switchdev mode

2021-03-15 Thread Jan Viktorin
From: Jan Viktorin 

The step 4 is a contradiction. It advises to unbind the device from the
mlx5_core which removes the associated system network interface (e.g.
eth0). In the step 5, the same system network interface (e.g. eth0) is
required to exist.

Signed-off-by: Jan Viktorin 
---
 doc/guides/nics/mlx5.rst | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 0a2dc3dee..122d8e0fc 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -1370,11 +1370,7 @@ the DPDK application.
 
 echo  /sys/class/net//device/sriov_numvfs
 
-4. Unbind the device (can be rebind after the switchdev mode)::
-
-echo -n "" > /sys/bus/pci/drivers/mlx5_core/unbind
-
-5. Enable switchdev mode::
+4. Enable switchdev mode::
 
 echo switchdev > /sys/class/net//compat/devlink/mode
 
-- 
2.30.1



Re: [dpdk-dev] [PATCH v2 3/4] net/virtio: allocate fake mbuf in Rx queue

2021-03-15 Thread Maxime Coquelin



On 3/15/21 4:50 PM, David Marchand wrote:
> On Mon, Mar 15, 2021 at 4:20 PM Maxime Coquelin
>  wrote:
>> @@ -550,10 +551,18 @@ virtio_init_queue(struct rte_eth_dev *dev, uint16_t 
>> queue_idx)
>> goto free_hdr_mz;
>> }
>>
>> +   fake_mbuf = malloc(sizeof(*fake_mbuf));
>> +   if (!fake_mbuf) {
>> +   PMD_INIT_LOG(ERR, "can not allocate fake mbuf");
>> +   ret = -ENOMEM;
>> +   goto free_sw_ring;
>> +   }
>> +
>> vq->sw_ring = sw_ring;
>> rxvq = &vq->rxq;
>> rxvq->port_id = dev->data->port_id;
>> rxvq->mz = mz;
>> +   rxvq->fake_mbuf = fake_mbuf;
> 
> IIRC, vq is allocated as dpdk memory (rte_malloc).
> Generally speaking, storing a local pointer inside such an object is
> dangerous if other processes start to look at this part.

Agree, I will change to rte_zmalloc_socket, as vq (which it was part of)
is allocated like that.


Thanks,
Maxime

> 
>> } else if (queue_type == VTNET_TQ) {
>> txvq = &vq->txq;
>> txvq->port_id = dev->data->port_id;
>> @@ -613,6 +622,9 @@ virtio_init_queue(struct rte_eth_dev *dev, uint16_t 
>> queue_idx)
>>  clean_vq:
>> hw->cvq = NULL;
>>
>> +   if (fake_mbuf)
>> +   free(fake_mbuf);
> 
> No need for if().
> 
> 
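
On the "No need for if()" remark: the C standard defines free(NULL) as a no-op, and the DPDK documentation describes rte_free() the same way. A minimal sketch with plain malloc/free and hypothetical names (queue_res, init_queue), showing a single cleanup label that relies on NULL-initialized pointers:

```c
#include <stdlib.h>

struct queue_res {
	void *ring;
	void *fake;
};

/*
 * Allocate two buffers. fail_second simulates a failed second allocation.
 * On failure, both pointers are released without testing them first.
 */
static int init_queue(struct queue_res *q, int fail_second)
{
	void *ring = NULL;
	void *fake = NULL;

	ring = malloc(64);
	if (ring == NULL)
		goto fail;
	fake = fail_second ? NULL : malloc(64);
	if (fake == NULL)
		goto fail;

	q->ring = ring;
	q->fake = fake;
	return 0;

fail:
	free(fake);	/* free(NULL) does nothing, so no if() is needed */
	free(ring);
	return -1;
}
```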



Re: [dpdk-dev] [PATCH v2] sched : Initialize tc ov watermark.

2021-03-15 Thread Coyle, David


> -Original Message-
> From: dev  On Behalf Of Savinay Dharmappa
> Sent: Tuesday, March 9, 2021 4:10 PM
> To: Singh, Jasvinder ; Dumitrescu, Cristian
> ; dev@dpdk.org
> Cc: Dharmappa, Savinay 
> Subject: [dpdk-dev] [PATCH v2] sched : Initialize tc ov watermark.
> 
> tc ov watermark is initialized with computed value of max tc ov watermark.
> 
> Signed-off-by: Savinay Dharmappa 
> ---
> v2: fix spelling error.
> ---
>  lib/librte_sched/rte_sched.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 

Tested-by: David Coyle  

[dpdk-dev] [PATCH v3 1/4] net/virtio: remove reference to virtqueue in vrings

2021-03-15 Thread Maxime Coquelin
Vrings are part of the virtqueues, so the vring descriptions
do not need a pointer back to the virtqueue.

Instead, subtract the vring's offset within the virtqueue
from its address to compute the virtqueue address.
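
The offset subtraction is the usual "container of" idiom. A minimal sketch with hypothetical stand-in types (queue, rx_part, rx_to_queue), not the virtio structures or the helpers added by the patch:

```c
#include <stddef.h>

/* Hypothetical embedded part, standing in for an Rx/Tx/control struct. */
struct rx_part {
	int port_id;
};

/* Hypothetical container, standing in for the virtqueue. */
struct queue {
	int index;
	struct rx_part rxq;
};

/* Recover the container from a pointer to its embedded member. */
static struct queue *rx_to_queue(struct rx_part *rx)
{
	return (struct queue *)((char *)rx - offsetof(struct queue, rxq));
}
```

This only works when the pointer really refers to a member embedded in the container, which is why the back pointer becomes redundant.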

Signed-off-by: Maxime Coquelin 
Reviewed-by: Chenbo Xia 
---
 drivers/net/virtio/virtio_ethdev.c| 36 +--
 drivers/net/virtio/virtio_rxtx.c  | 28 +++
 drivers/net/virtio/virtio_rxtx.h  |  3 --
 drivers/net/virtio/virtio_rxtx_packed.c   |  4 +--
 drivers/net/virtio/virtio_rxtx_packed.h   |  6 ++--
 drivers/net/virtio/virtio_rxtx_packed_avx.h   |  4 +--
 drivers/net/virtio/virtio_rxtx_simple.h   |  2 +-
 .../net/virtio/virtio_rxtx_simple_altivec.c   |  2 +-
 drivers/net/virtio/virtio_rxtx_simple_neon.c  |  2 +-
 drivers/net/virtio/virtio_rxtx_simple_sse.c   |  2 +-
 .../net/virtio/virtio_user/virtio_user_dev.c  |  4 +--
 drivers/net/virtio/virtio_user_ethdev.c   |  2 +-
 drivers/net/virtio/virtqueue.h|  6 +++-
 13 files changed, 49 insertions(+), 52 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index 333a5243a9..af090fdf9c 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -133,7 +133,7 @@ virtio_send_command_packed(struct virtnet_ctl *cvq,
   struct virtio_pmd_ctrl *ctrl,
   int *dlen, int pkt_num)
 {
-   struct virtqueue *vq = cvq->vq;
+   struct virtqueue *vq = virtnet_cq_to_vq(cvq);
int head;
struct vring_packed_desc *desc = vq->vq_packed.ring.desc;
struct virtio_pmd_ctrl *result;
@@ -229,7 +229,7 @@ virtio_send_command_split(struct virtnet_ctl *cvq,
  int *dlen, int pkt_num)
 {
struct virtio_pmd_ctrl *result;
-   struct virtqueue *vq = cvq->vq;
+   struct virtqueue *vq = virtnet_cq_to_vq(cvq);
uint32_t head, i;
int k, sum = 0;
 
@@ -316,13 +316,13 @@ virtio_send_command(struct virtnet_ctl *cvq, struct 
virtio_pmd_ctrl *ctrl,
 
ctrl->status = status;
 
-   if (!cvq || !cvq->vq) {
+   if (!cvq) {
PMD_INIT_LOG(ERR, "Control queue is not supported.");
return -1;
}
 
rte_spinlock_lock(&cvq->lock);
-   vq = cvq->vq;
+   vq = virtnet_cq_to_vq(cvq);
 
PMD_INIT_LOG(DEBUG, "vq->vq_desc_head_idx = %d, status = %d, "
"vq->hw->cvq = %p vq = %p",
@@ -552,19 +552,16 @@ virtio_init_queue(struct rte_eth_dev *dev, uint16_t 
queue_idx)
 
vq->sw_ring = sw_ring;
rxvq = &vq->rxq;
-   rxvq->vq = vq;
rxvq->port_id = dev->data->port_id;
rxvq->mz = mz;
} else if (queue_type == VTNET_TQ) {
txvq = &vq->txq;
-   txvq->vq = vq;
txvq->port_id = dev->data->port_id;
txvq->mz = mz;
txvq->virtio_net_hdr_mz = hdr_mz;
txvq->virtio_net_hdr_mem = hdr_mz->iova;
} else if (queue_type == VTNET_CQ) {
cvq = &vq->cq;
-   cvq->vq = vq;
cvq->mz = mz;
cvq->virtio_net_hdr_mz = hdr_mz;
cvq->virtio_net_hdr_mem = hdr_mz->iova;
@@ -851,7 +848,7 @@ virtio_dev_rx_queue_intr_enable(struct rte_eth_dev *dev, 
uint16_t queue_id)
 {
struct virtio_hw *hw = dev->data->dev_private;
struct virtnet_rx *rxvq = dev->data->rx_queues[queue_id];
-   struct virtqueue *vq = rxvq->vq;
+   struct virtqueue *vq = virtnet_rxq_to_vq(rxvq);
 
virtqueue_enable_intr(vq);
virtio_mb(hw->weak_barriers);
@@ -862,7 +859,7 @@ static int
 virtio_dev_rx_queue_intr_disable(struct rte_eth_dev *dev, uint16_t queue_id)
 {
struct virtnet_rx *rxvq = dev->data->rx_queues[queue_id];
-   struct virtqueue *vq = rxvq->vq;
+   struct virtqueue *vq = virtnet_rxq_to_vq(rxvq);
 
virtqueue_disable_intr(vq);
return 0;
@@ -2180,8 +2177,7 @@ static int
 virtio_dev_start(struct rte_eth_dev *dev)
 {
uint16_t nb_queues, i;
-   struct virtnet_rx *rxvq;
-   struct virtnet_tx *txvq __rte_unused;
+   struct virtqueue *vq;
struct virtio_hw *hw = dev->data->dev_private;
int ret;
 
@@ -2238,27 +2234,27 @@ virtio_dev_start(struct rte_eth_dev *dev)
PMD_INIT_LOG(DEBUG, "nb_queues=%d", nb_queues);
 
for (i = 0; i < dev->data->nb_rx_queues; i++) {
-   rxvq = dev->data->rx_queues[i];
+   vq = virtnet_rxq_to_vq(dev->data->rx_queues[i]);
/* Flush the old packets */
-   virtqueue_rxvq_flush(rxvq->vq);
-   virtqueue_notify(rxvq->vq);
+   virtqueue_rxvq_flush(vq);
+   virtqueue_notify(vq);
}
 
for (i = 0; i < dev->data->nb_tx_queues; i++) {
-   txvq = dev->data->tx_queues[i];
-   virtqueue_notify(txvq->vq);
+   vq = virtnet_t

[dpdk-dev] [PATCH v3 0/4] net/virtio: make virtqueue struct cache-friendly

2021-03-15 Thread Maxime Coquelin
This series optimizes the cache usage of the virtqueue struct
by dynamically allocating the "fake" mbuf of the Rx virtnet
struct, by removing a useless virtqueue pointer from the
virtnet structs, and by moving a few fields to pack holes.

With these 3 patches, the virtqueue struct size goes from
576 bytes (9 cachelines) to 248 bytes (4 cachelines).
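
The cacheline counts follow from rounding the struct size up to a multiple of the line size. A small sketch, assuming 64-byte cache lines (the helper name is hypothetical):

```c
#include <stddef.h>

#define CACHE_LINE_SIZE 64	/* assumed cache line size in bytes */

/* Number of cache lines touched by a line-aligned object of this size. */
static size_t cachelines_spanned(size_t struct_size)
{
	return (struct_size + CACHE_LINE_SIZE - 1) / CACHE_LINE_SIZE;
}
```

With these assumptions, 576 bytes span exactly 9 lines and 248 bytes round up to 4.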

Changes in v3:
==
- Use rte_zmalloc_socket for fake mbuf alloc (David)
- Fix typos in commit messages
- Remove superfluous pointer check before freeing (David)
- Fix checkpatch warnings

Changes in v2:
==
- Rebase on latest main
- Improve error path in virtio_init_queue
- Fix various typos in commit messages

Maxime Coquelin (4):
  net/virtio: remove reference to virtqueue in vrings
  net/virtio: improve queue init error path
  net/virtio: allocate fake mbuf in Rx queue
  net/virtio: pack virtqueue struct

 drivers/net/virtio/virtio_ethdev.c| 64 +++
 drivers/net/virtio/virtio_rxtx.c  | 37 +--
 drivers/net/virtio/virtio_rxtx.h  |  5 +-
 drivers/net/virtio/virtio_rxtx_packed.c   |  4 +-
 drivers/net/virtio/virtio_rxtx_packed.h   |  6 +-
 drivers/net/virtio/virtio_rxtx_packed_avx.h   |  4 +-
 drivers/net/virtio/virtio_rxtx_simple.h   |  2 +-
 .../net/virtio/virtio_rxtx_simple_altivec.c   |  2 +-
 drivers/net/virtio/virtio_rxtx_simple_neon.c  |  2 +-
 drivers/net/virtio/virtio_rxtx_simple_sse.c   |  2 +-
 .../net/virtio/virtio_user/virtio_user_dev.c  |  4 +-
 drivers/net/virtio/virtio_user_ethdev.c   |  2 +-
 drivers/net/virtio/virtqueue.h| 24 ---
 13 files changed, 85 insertions(+), 73 deletions(-)

-- 
2.29.2



[dpdk-dev] [PATCH v3 2/4] net/virtio: improve queue init error path

2021-03-15 Thread Maxime Coquelin
This patch improves the error path of virtio_init_queue()
by releasing, in reverse order, all resources that have
been allocated.
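
The reverse-order cleanup follows the common staged-label pattern, where each failure jumps to the label that releases only what was acquired so far. A minimal sketch with plain malloc/free and hypothetical names; the trace string exists only to make the release order observable:

```c
#include <stdlib.h>
#include <string.h>

/*
 * Acquire three resources in order. fail_at (1..3) simulates a failure
 * of that step, 0 means no failure. trace records the release order
 * and must have room for at least 3 bytes.
 */
static int setup(int fail_at, char *trace)
{
	void *a, *b, *c;
	int ret = -1;

	trace[0] = '\0';

	a = (fail_at == 1) ? NULL : malloc(16);
	if (a == NULL)
		return ret;	/* nothing to release yet */
	b = (fail_at == 2) ? NULL : malloc(16);
	if (b == NULL)
		goto free_a;
	c = (fail_at == 3) ? NULL : malloc(16);
	if (c == NULL)
		goto free_b;

	/* A real driver would keep these; the sketch releases them. */
	free(c);
	free(b);
	free(a);
	return 0;

free_b:
	free(b);
	strcat(trace, "b");
free_a:
	free(a);
	strcat(trace, "a");
	return ret;
}
```

Falling through from one label to the next gives the reverse order for free, and adding a new resource means adding one label next to its allocation.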

Suggested-by: Chenbo Xia 
Signed-off-by: Maxime Coquelin 
---
 drivers/net/virtio/virtio_ethdev.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index af090fdf9c..d5643733f7 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -507,7 +507,7 @@ virtio_init_queue(struct rte_eth_dev *dev, uint16_t 
queue_idx)
mz = rte_memzone_lookup(vq_name);
if (mz == NULL) {
ret = -ENOMEM;
-   goto fail_q_alloc;
+   goto free_vq;
}
}
 
@@ -533,7 +533,7 @@ virtio_init_queue(struct rte_eth_dev *dev, uint16_t 
queue_idx)
hdr_mz = rte_memzone_lookup(vq_hdr_name);
if (hdr_mz == NULL) {
ret = -ENOMEM;
-   goto fail_q_alloc;
+   goto free_mz;
}
}
}
@@ -547,7 +547,7 @@ virtio_init_queue(struct rte_eth_dev *dev, uint16_t 
queue_idx)
if (!sw_ring) {
PMD_INIT_LOG(ERR, "can not allocate RX soft ring");
ret = -ENOMEM;
-   goto fail_q_alloc;
+   goto free_hdr_mz;
}
 
vq->sw_ring = sw_ring;
@@ -604,15 +604,20 @@ virtio_init_queue(struct rte_eth_dev *dev, uint16_t 
queue_idx)
 
if (VIRTIO_OPS(hw)->setup_queue(hw, vq) < 0) {
PMD_INIT_LOG(ERR, "setup_queue failed");
-   return -EINVAL;
+   ret = -EINVAL;
+   goto clean_vq;
}
 
return 0;
 
-fail_q_alloc:
+clean_vq:
+   hw->cvq = NULL;
rte_free(sw_ring);
+free_hdr_mz:
rte_memzone_free(hdr_mz);
+free_mz:
rte_memzone_free(mz);
+free_vq:
rte_free(vq);
 
return ret;
-- 
2.29.2



[dpdk-dev] [PATCH v3 3/4] net/virtio: allocate fake mbuf in Rx queue

2021-03-15 Thread Maxime Coquelin
While it is worth clarifying whether the fake mbuf
in virtnet_rx struct is really necessary, it is sure
that it heavily impacts cache usage by being part of
the struct. Indeed, it uses two cachelines, and
requires alignment on a cacheline.

Before this series, it means it took 120 bytes in
virtnet_rx struct:

struct virtnet_rx {
struct virtqueue * vq;   /* 0 8 */

/* XXX 56 bytes hole, try to pack */

/* --- cacheline 1 boundary (64 bytes) --- */
struct rte_mbuf fake_mbuf __attribute__((__aligned__(64))); /* 64 128 */
/* --- cacheline 3 boundary (192 bytes) --- */

This patch allocates it using malloc in order to optimize
virtnet_rx cache usage and so virtqueue cache usage.

Signed-off-by: Maxime Coquelin 
---
 drivers/net/virtio/virtio_ethdev.c | 13 +
 drivers/net/virtio/virtio_rxtx.c   |  9 +++--
 drivers/net/virtio/virtio_rxtx.h   |  2 +-
 3 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index d5643733f7..fda6c141dd 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -435,6 +435,7 @@ virtio_init_queue(struct rte_eth_dev *dev, uint16_t 
queue_idx)
int queue_type = virtio_get_queue_type(hw, queue_idx);
int ret;
int numa_node = dev->device->numa_node;
+   struct rte_mbuf *fake_mbuf = NULL;
 
PMD_INIT_LOG(INFO, "setting up queue: %u on NUMA node %d",
queue_idx, numa_node);
@@ -550,10 +551,19 @@ virtio_init_queue(struct rte_eth_dev *dev, uint16_t 
queue_idx)
goto free_hdr_mz;
}
 
+   fake_mbuf = rte_zmalloc_socket("sw_ring", sizeof(*fake_mbuf),
+   RTE_CACHE_LINE_SIZE, numa_node);
+   if (!fake_mbuf) {
+   PMD_INIT_LOG(ERR, "can not allocate fake mbuf");
+   ret = -ENOMEM;
+   goto free_sw_ring;
+   }
+
vq->sw_ring = sw_ring;
rxvq = &vq->rxq;
rxvq->port_id = dev->data->port_id;
rxvq->mz = mz;
+   rxvq->fake_mbuf = fake_mbuf;
} else if (queue_type == VTNET_TQ) {
txvq = &vq->txq;
txvq->port_id = dev->data->port_id;
@@ -612,6 +622,8 @@ virtio_init_queue(struct rte_eth_dev *dev, uint16_t 
queue_idx)
 
 clean_vq:
hw->cvq = NULL;
+   rte_free(fake_mbuf);
+free_sw_ring:
rte_free(sw_ring);
 free_hdr_mz:
rte_memzone_free(hdr_mz);
@@ -641,6 +653,7 @@ virtio_free_queues(struct virtio_hw *hw)
 
queue_type = virtio_get_queue_type(hw, i);
if (queue_type == VTNET_RQ) {
+   free(vq->rxq.fake_mbuf);
rte_free(vq->sw_ring);
rte_memzone_free(vq->rxq.mz);
} else if (queue_type == VTNET_TQ) {
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 32af8d3d11..8df913b0ba 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -703,12 +703,9 @@ virtio_dev_rx_queue_setup_finish(struct rte_eth_dev *dev, 
uint16_t queue_idx)
virtio_rxq_vec_setup(rxvq);
}
 
-   memset(&rxvq->fake_mbuf, 0, sizeof(rxvq->fake_mbuf));
-   for (desc_idx = 0; desc_idx < RTE_PMD_VIRTIO_RX_MAX_BURST;
-desc_idx++) {
-   vq->sw_ring[vq->vq_nentries + desc_idx] =
-   &rxvq->fake_mbuf;
-   }
+   memset(rxvq->fake_mbuf, 0, sizeof(*rxvq->fake_mbuf));
+   for (desc_idx = 0; desc_idx < RTE_PMD_VIRTIO_RX_MAX_BURST; desc_idx++)
+   vq->sw_ring[vq->vq_nentries + desc_idx] = rxvq->fake_mbuf;
 
if (hw->use_vec_rx && !virtio_with_packed_queue(hw)) {
while (vq->vq_free_cnt >= RTE_VIRTIO_VPMD_RX_REARM_THRESH) {
diff --git a/drivers/net/virtio/virtio_rxtx.h b/drivers/net/virtio/virtio_rxtx.h
index 7f1036be6f..6ce5d67d15 100644
--- a/drivers/net/virtio/virtio_rxtx.h
+++ b/drivers/net/virtio/virtio_rxtx.h
@@ -19,7 +19,7 @@ struct virtnet_stats {
 
 struct virtnet_rx {
/* dummy mbuf, for wraparound when processing RX ring. */
-   struct rte_mbuf fake_mbuf;
+   struct rte_mbuf *fake_mbuf;
uint64_t mbuf_initializer; /**< value to init mbufs. */
struct rte_mempool *mpool; /**< mempool for mbuf allocation */
 
-- 
2.29.2



[dpdk-dev] [PATCH v3 4/4] net/virtio: pack virtqueue struct

2021-03-15 Thread Maxime Coquelin
This patch optimizes packing of the virtqueue
struct by moving fields around to fill holes.

Offset field is not used and so can be removed.

Signed-off-by: Maxime Coquelin 
Reviewed-by: Chenbo Xia 
---
 drivers/net/virtio/virtqueue.h | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
index 17e76f0e8c..e9992b745d 100644
--- a/drivers/net/virtio/virtqueue.h
+++ b/drivers/net/virtio/virtqueue.h
@@ -244,6 +244,15 @@ struct virtqueue {
uint16_t vq_avail_idx; /**< sync until needed */
uint16_t vq_free_thresh; /**< free threshold */
 
+   /**
+* Head of the free chain in the descriptor table. If
+* there are no free descriptors, this will be set to
+* VQ_RING_DESC_CHAIN_END.
+*/
+   uint16_t  vq_desc_head_idx;
+   uint16_t  vq_desc_tail_idx;
+   uint16_t  vq_queue_index;   /**< PCI queue index */
+
void *vq_ring_virt_mem;  /**< linear address of vring*/
unsigned int vq_ring_size;
 
@@ -256,15 +265,6 @@ struct virtqueue {
rte_iova_t vq_ring_mem; /**< physical address of vring,
 * or virtual address for virtio_user. */
 
-   /**
-* Head of the free chain in the descriptor table. If
-* there are no free descriptors, this will be set to
-* VQ_RING_DESC_CHAIN_END.
-*/
-   uint16_t  vq_desc_head_idx;
-   uint16_t  vq_desc_tail_idx;
-   uint16_t  vq_queue_index;
-   uint16_t offset; /**< relative offset to obtain addr in mbuf */
uint16_t  *notify_addr;
struct rte_mbuf **sw_ring;  /**< RX software ring. */
struct vq_desc_extra vq_descx[0];
-- 
2.29.2



Re: [dpdk-dev] [PATCH v3 3/4] net/virtio: allocate fake mbuf in Rx queue

2021-03-15 Thread David Marchand
On Mon, Mar 15, 2021 at 5:46 PM Maxime Coquelin
 wrote:
> @@ -612,6 +622,8 @@ virtio_init_queue(struct rte_eth_dev *dev, uint16_t 
> queue_idx)
>
>  clean_vq:
> hw->cvq = NULL;
> +   rte_free(fake_mbuf);
> +free_sw_ring:
> rte_free(sw_ring);
>  free_hdr_mz:
> rte_memzone_free(hdr_mz);
> @@ -641,6 +653,7 @@ virtio_free_queues(struct virtio_hw *hw)
>
> queue_type = virtio_get_queue_type(hw, i);
> if (queue_type == VTNET_RQ) {
> +   free(vq->rxq.fake_mbuf);

rte_free()


-- 
David Marchand



Re: [dpdk-dev] [PATCH 2/2] kni: fix rtnl deadlocks and race conditions v4

2021-03-15 Thread Ferruh Yigit

On 3/1/2021 4:38 PM, Stephen Hemminger wrote:

On Mon, 1 Mar 2021 11:10:01 +0300
Igor Ryzhov  wrote:


Stephen,

No, I don't have a better proposal, but I think it is not correct to change
the behavior of KNI (bringing the link down without a real response).
Even though we know that communicating with userspace under rtnl_lock is a
bad idea, it has worked as it is for many years already.

Elad,

I agree with you that KNI should be removed from the main tree if it is not
possible to fix this __dev_close_many issue.
There were discussions about this multiple times already, but no one is
working on this AFAIK.
Last time the discussion was a month ago:
https://www.mail-archive.com/dev@dpdk.org/msg196033.html

Igor


The better proposal would be to make DPDK virtio smarter.
There are already virtio devices that must handle this (VDPA, etc.).
And when you can control the link through virtio, then put a big warning
in KNI that says "Don't use this".



Hi Igor, Elad,

I think it is reasonable to do the ifdown asynchronously to solve the problem.
We can still keep sync as the default and enable async with a kernel parameter,
to cover both cases.

I will put more details on the patches.


Re: [dpdk-dev] [PATCH v3 0/4] net/virtio: make virtqueue struct cache-friendly

2021-03-15 Thread David Marchand
On Mon, Mar 15, 2021 at 5:46 PM Maxime Coquelin
 wrote:
>
> This series optimizes the cache usage of virtqueue struct,
> by making a "fake" mbuf being dynamically allocated in Rx
> virtnet struct, by removing a useless virtqueue pointer
> into the virtnet structs and by moving a few fields
> to pack holes.
>
> With these 3 patches, the virtqueue struct size goes from
> 576 bytes (9 cachelines) to 248 bytes (4 cachelines).
>
> Changes in v3:
> ==
> - Use rte_zmalloc_socket for fake mbuf alloc (David)
> - Fix typos in commit messages
> - Remove superfluous pointer check before freeing (David)
> - Fix checkpatch warnings

Once patch 3 is fixed, you can add for the series:
Reviewed-by: David Marchand 


-- 
David Marchand



Re: [dpdk-dev] [PATCH v3 07/11] eal: add log level help

2021-03-15 Thread Kinsella, Ray



On 15/03/2021 15:59, Stephen Hemminger wrote:
> On Mon, 15 Mar 2021 11:52:13 +0100
> Thomas Monjalon  wrote:
> 
>> 15/03/2021 11:42, Kinsella, Ray:
>>>
>>> On 15/03/2021 10:31, Bruce Richardson wrote:  
>>>> On Mon, Mar 15, 2021 at 10:19:47AM +, Kinsella, Ray wrote:
>
>
> On 12/03/2021 18:17, Thomas Monjalon wrote:  
>> The option --log-level was not completely described in the usage text,
>> and it was difficult to guess the names of the log types and levels.
>>
>> A new value "help" is accepted after --log-level to give more details
>> about the syntax and listing the log types and levels.
>>
>> The array "levels" used for level name parsing is replaced with
>> a (modified) existing function which was used in rte_log_dump().
>>
>> The new function rte_log_list_types() is exported in the API
>> for allowing an application to give this info to the user
>> if not exposing the EAL option --log-level.
>> The list of log types cannot include all drivers if not linked in the
>> application (shared object plugin case).
>>
>> Signed-off-by: Thomas Monjalon 
>> ---
>>  lib/librte_eal/common/eal_common_log.c | 24 +---
>>  lib/librte_eal/common/eal_common_options.c | 44 +++---
>>  lib/librte_eal/common/eal_log.h|  5 +++
>>  lib/librte_eal/include/rte_log.h   | 11 ++
>>  lib/librte_eal/version.map |  3 ++
>>  5 files changed, 69 insertions(+), 18 deletions(-)
>>  
   
>> @@ -1274,6 +1286,11 @@ eal_parse_log_level(const char *arg)
>>  char *str, *level;
>>  int priority;
>>  
>> +if (strcmp(arg, "help") == 0) {  
>
> So I think the convention is to support both "?" and "help".
> Qemu does this at least. 
>  
>>>> I've seen "/?" used for help on windows binaries, but "-?" not so much in
>>>> the linux world, where --help (and often -h for short) seem to be the
>>>> standard.
>>>>
>>>
>>> This is slightly different - it is where you are looking to return a list 
>>> of valid 
>>> values for a parameter. So for instance in qemu mentioned above 
>>>
>>>  ~ > qemu-system-x86_64 -cpu ? | head -n 10  
>>
>> "?" is a special character.
>> In my zsh, I need to quote it to avoid globbing parsing,
>> so I'm not a fan.
>>
>> I will let you extend the syntax in a separate patch :)
>>
>>
> 
> Also '?' is used by getopt to match unknown option. So qemu might just be
> doing that as unintended side effect of any unknown option
> 

for other unknowns it explicitly complains ... 

~ > qemu-system-x86_64 -cpu unknown
Unable to init server: Could not connect: Connection refused
qemu-system-x86_64: unable to find CPU model 'unknown

