Re: [dpdk-dev] [PATCH] app/testpmd: add sanity checks when retrieving xstats
Hey guys,

On Thu, Jun 7, 2018 at 10:15 AM, David Marchand wrote:
> Testpmd should not expect the xstats names and values arrays to be
> aligned: neither the arrays sizes, nor the order in which the values are.
>
> This hid some bugs where pmds would either return wrong values count or
> invalid statistics indexes.
>
> Link:
> http://dpdk.org/browse/dpdk/commit/?id=5fd4d049692b2fde8bf49c7461b18180a8fd2545
> Link: http://dpdk.org/dev/patchwork/patch/40705/
>
> Signed-off-by: David Marchand
> ---
>
> @stable: when this goes in, I recommend backporting this to all existing
> branches, as it makes it easier to show this kind of pmds bugs.

Can someone have a look, please?

Thanks.

-- 
David Marchand
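A minimal sketch of the kind of check the patch describes follows. It pairs each value with its name through the value's id field instead of by array position, and it bounds-checks that id. The two structs are simplified stand-ins for DPDK's struct rte_eth_xstat (id, value) and struct rte_eth_xstat_name. print_xstats_checked() is a hypothetical helper, not the code in the patch.

```c
#include <stdint.h>
#include <stdio.h>

/* Simplified stand-ins for struct rte_eth_xstat and
 * struct rte_eth_xstat_name; not the real DPDK headers. */
struct xstat {
	uint64_t id;    /* index into the names array */
	uint64_t value;
};

struct xstat_name {
	char name[64];
};

/*
 * Print every value by looking up its name through values[i].id, rather
 * than assuming values[i] pairs with names[i]. Entries whose id falls
 * outside the names array (a driver bug) are reported and skipped.
 * Returns the number of such invalid entries.
 */
static int
print_xstats_checked(const struct xstat *values, int nb_values,
		     const struct xstat_name *names, int nb_names)
{
	int nb_invalid = 0;

	if (nb_names < 0) /* e.g. an error code from the names query */
		nb_names = 0;

	for (int i = 0; i < nb_values; i++) {
		uint64_t id = values[i].id;

		if (id >= (uint64_t)nb_names) {
			fprintf(stderr, "xstat %d: invalid id %llu (%d names)\n",
				i, (unsigned long long)id, nb_names);
			nb_invalid++;
			continue;
		}
		printf("%s: %llu\n", names[id].name,
		       (unsigned long long)values[i].value);
	}
	return nb_invalid;
}
```

A PMD that returns more values than names, or ids that do not match the names it reported, now shows up as an error instead of a silently mislabeled counter.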
Re: [dpdk-dev] [PATCH] net/ixgbe: fix tunnel id format error for FDIR
Hi Wenzhuo, > -Original Message- > From: Lu, Wenzhuo > Sent: Tuesday, June 12, 2018 1:10 PM > To: Zhao1, Wei ; dev@dpdk.org > Cc: sta...@dpdk.org > Subject: RE: [PATCH] net/ixgbe: fix tunnel id format error for FDIR > > Hi Wei, > > > > -Original Message- > > From: Zhao1, Wei > > Sent: Tuesday, June 5, 2018 5:12 PM > > To: dev@dpdk.org > > Cc: Lu, Wenzhuo ; sta...@dpdk.org; Zhao1, Wei > > > > Subject: [PATCH] net/ixgbe: fix tunnel id format error for FDIR > > > > In cloud mode for FDIR, tunnel id should be set as protocol request, > > the lower 8 bits should be set as reserved. > To my opinion, the original implementation and this patch have different > understanding of the 'tunnel_id' in ' struct rte_eth_tunnel_flow'. Originally > it > only means the tunnel id but not including the reserved 8 bits. > This patch means it should include the reserved bits. Maybe it makes things > easier because the whole 4 bytes are big endian. > So, may I suggest to add some comments in ' struct rte_eth_tunnel_flow' to > let the users know what the 'tunnel_id' really means? The input 'tunnel_id' should be in network byte order and should not contain the reserved byte, but the hardware hash needs a format that includes the reserved byte, also in network byte order. This patch converts the input to that format. I will add comments in ixgbe_fdir_filter_to_atr_input and ixgbe_parse_fdir_filter_tunnel; this will not affect how other NICs interpret this struct member. Is that OK?
> > > > > Fixes: 82fb702077f6 ("ixgbe: support new flow director modes for > > X550") > > Fixes: 11777435c727 ("net/ixgbe: parse flow director filter") > > > > Signed-off-by: Wei Zhao > > --- > > drivers/net/ixgbe/ixgbe_fdir.c | 2 +- drivers/net/ixgbe/ixgbe_flow.c > > | 5 ++--- > > 2 files changed, 3 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/net/ixgbe/ixgbe_fdir.c > > b/drivers/net/ixgbe/ixgbe_fdir.c index d5e5179..67ab627 100644 > > --- a/drivers/net/ixgbe/ixgbe_fdir.c > > +++ b/drivers/net/ixgbe/ixgbe_fdir.c > > @@ -774,7 +774,7 @@ ixgbe_fdir_filter_to_atr_input(const struct > > rte_eth_fdir_filter *fdir_filter, > > input->formatted.tunnel_type = > > fdir_filter->input.flow.tunnel_flow.tunnel_type; > > input->formatted.tni_vni = > > - fdir_filter->input.flow.tunnel_flow.tunnel_id; > > + fdir_filter->input.flow.tunnel_flow.tunnel_id >> 8; > > } > > > > return 0; > > diff --git a/drivers/net/ixgbe/ixgbe_flow.c > > b/drivers/net/ixgbe/ixgbe_flow.c index eb0644c..64af777 100644 > > --- a/drivers/net/ixgbe/ixgbe_flow.c > > +++ b/drivers/net/ixgbe/ixgbe_flow.c > > @@ -2489,8 +2489,7 @@ ixgbe_parse_fdir_filter_tunnel(const struct > > rte_flow_attr *attr, > > rte_memcpy(((uint8_t *) > > &rule->ixgbe_fdir.formatted.tni_vni + 1), > > vxlan_spec->vni, RTE_DIM(vxlan_spec->vni)); > > - rule->ixgbe_fdir.formatted.tni_vni = > > rte_be_to_cpu_32( > > - rule->ixgbe_fdir.formatted.tni_vni); > > + rule->ixgbe_fdir.formatted.tni_vni >>= 8; > > } > > } > > > > @@ -2587,7 +2586,7 @@ ixgbe_parse_fdir_filter_tunnel(const struct > > rte_flow_attr *attr, > > /* tni is a 24-bits bit field */ > > rte_memcpy(&rule->ixgbe_fdir.formatted.tni_vni, > > nvgre_spec->tni, RTE_DIM(nvgre_spec->tni)); > > - rule->ixgbe_fdir.formatted.tni_vni <<= 8; > > + rule->ixgbe_fdir.formatted.tni_vni >>= 8; > > } > > } > > > > -- > > 2.7.5
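For reference, the VXLAN header (RFC 7348) carries a 24-bit VNI followed by 8 reserved bits. A "tunnel id including the reserved byte" therefore corresponds to that 32-bit word in network byte order. The sketch below shows that byte layout without depending on host endianness. vni_to_wire() and vni_from_bytes() are hypothetical helpers, not ixgbe driver code, and they do not settle which format rte_eth_tunnel_flow.tunnel_id is meant to carry.

```c
#include <stdint.h>

/*
 * Pack a 24-bit VNI held in a host-order integer into the 4-byte
 * layout of the second word of a VXLAN header: VNI bytes, most
 * significant first, then one reserved byte set to zero.
 */
static void
vni_to_wire(uint32_t vni, uint8_t out[4])
{
	out[0] = (uint8_t)(vni >> 16);
	out[1] = (uint8_t)(vni >> 8);
	out[2] = (uint8_t)vni;
	out[3] = 0; /* reserved */
}

/*
 * Rebuild the 24-bit VNI from a 3-byte big-endian array, such as the
 * vni[3] field of an rte_flow VXLAN item.
 */
static uint32_t
vni_from_bytes(const uint8_t vni[3])
{
	return ((uint32_t)vni[0] << 16) | ((uint32_t)vni[1] << 8) | vni[2];
}
```

Building the bytes explicitly keeps the result independent of host byte order. By contrast, the outcome of shifting a uint32_t that was filled with rte_memcpy() depends on whether the host is little- or big-endian.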
[dpdk-dev] [PATCH 0/2] maintainers: updates for Vhost and Virtio
Hi,

Since Jianfeng's and Yuanhan's resignations, I have been the only active maintainer for the Vhost lib and PMD, and I had no backup for managing the next-virtio tree.

When contacted offline, Tetsuya kindly agreed to remove himself from the Vhost PMD maintainers, as he has not had time to be active recently. Tetsuya asked me to send the patch and add his Acked-by. I'd like to thank him for his contributions, and wish him the best for his current and next adventures!

I propose that Tiwei and Zhihong officially co-maintain both the Vhost and Virtio components. They have been very helpful with their reviews in the last months, and know both the code and the spec very well. Having three co-maintainers would ensure better reviews while giving us time to develop new features.

I also propose that Tiwei co-manage the next-virtio tree.

Thanks,
Maxime

Maxime Coquelin (2):
  maintainers: update Vhost PMD maintainership
  maintainers: add Vhost and Virtio co-maintainers

 MAINTAINERS | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

-- 
2.14.3
[dpdk-dev] [PATCH 1/2] maintainers: update Vhost PMD maintainership
Tetsuya has kindly agreed to hand over the maintainership of the Vhost PMD.
Thanks to him for his contributions.

Acked-by: Tetsuya Mukawa
Signed-off-by: Maxime Coquelin
---
 MAINTAINERS | 1 -
 1 file changed, 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 4667fa7fb..1c28f6d38 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -664,7 +664,6 @@ F: doc/guides/sample_app_ug/vhost_scsi.rst
 F: examples/vhost_crypto/
 
 Vhost PMD
-M: Tetsuya Mukawa
 M: Maxime Coquelin
 T: git://dpdk.org/next/dpdk-next-virtio
 F: drivers/net/vhost/
-- 
2.14.3
[dpdk-dev] [PATCH 2/2] maintainers: add Vhost and Virtio co-maintainers
Add Tiwei and Zhihong as co-maintainers for the Vhost and Virtio
components. They have made great contributions recently, and have
been very helpful in reviewing Vhost and Virtio series.

Also, add Tiwei as backup for the Next-virtio tree.

Signed-off-by: Maxime Coquelin
---
 MAINTAINERS | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 1c28f6d38..14939f10a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -41,6 +41,7 @@ T: git://dpdk.org/next/dpdk-next-net-mlx
 
 Next-virtio Tree
 M: Maxime Coquelin
+M: Tiwei Bie
 T: git://dpdk.org/next/dpdk-next-virtio
 
 Next-crypto Tree
@@ -654,6 +655,8 @@ F: doc/guides/nics/features/vmxnet3.ini
 
 Vhost-user
 M: Maxime Coquelin
+M: Tiwei Bie
+M: Zhihong Wang
 T: git://dpdk.org/next/dpdk-next-virtio
 F: lib/librte_vhost/
 F: doc/guides/prog_guide/vhost_lib.rst
@@ -665,6 +668,8 @@ F: examples/vhost_crypto/
 
 Vhost PMD
 M: Maxime Coquelin
+M: Tiwei Bie
+M: Zhihong Wang
 T: git://dpdk.org/next/dpdk-next-virtio
 F: drivers/net/vhost/
 F: doc/guides/nics/vhost.rst
@@ -673,6 +678,7 @@ F: doc/guides/nics/features/vhost.ini
 Virtio PMD
 M: Maxime Coquelin
 M: Tiwei Bie
+M: Zhihong Wang
 T: git://dpdk.org/next/dpdk-next-virtio
 F: drivers/net/virtio/
 F: doc/guides/nics/virtio.rst
-- 
2.14.3
Re: [dpdk-dev] [PATCH 7/7] net/mlx5: add parameter for port representors
Hi Adrien, > -Original Message- > From: dev On Behalf Of Adrien Mazarguil > Sent: Saturday, May 26, 2018 12:35 AM > To: Shahaf Shuler > Cc: dev@dpdk.org > Subject: [dpdk-dev] [PATCH 7/7] net/mlx5: add parameter for port representors > > Prior to this patch, all port representors detected on a given device were > probed and Ethernet devices > instantiated for each of them. > > This patch adds support for the standard "representor" parameter, which > implies that port representors > are not probed by default anymore, except for the list provided through > device arguments. > > (Patch based on prior work from Yuanhan Liu) > > Signed-off-by: Adrien Mazarguil > --- > doc/guides/nics/mlx5.rst| 12 > doc/guides/prog_guide/poll_mode_drv.rst | 2 ++ > drivers/net/mlx5/mlx5.c | 25 + > 3 files changed, 39 insertions(+) > > diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst index > 79c982e29..5229e546c 100644 > --- a/doc/guides/nics/mlx5.rst > +++ b/doc/guides/nics/mlx5.rst > @@ -388,6 +388,18 @@ Run-time configuration > >Disabled by default. > > +- ``representor`` parameter [list] > + > + This parameter can be used to instantiate DPDK Ethernet devices from > + existing port (or VF) representors configured on the device. > + > + It is a standard parameter whose format is described in > + :ref:`ethernet_device_standard_device_arguments`. > + > + For instance, to probe port representors 0 through 2:: > + > +representor=[0-2] > + > Firmware configuration > ~~ > > diff --git a/doc/guides/prog_guide/poll_mode_drv.rst > b/doc/guides/prog_guide/poll_mode_drv.rst > index af82352a0..58d49ba0f 100644 > --- a/doc/guides/prog_guide/poll_mode_drv.rst > +++ b/doc/guides/prog_guide/poll_mode_drv.rst > @@ -365,6 +365,8 @@ Ethernet Device API > > The Ethernet device API exported by the Ethernet PMDs is described in the > *DPDK API Reference*. > > +.. 
_ethernet_device_standard_device_arguments: > + > Ethernet Device Standard Device Arguments > ~ > > diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index > 09afca63c..216753ba6 100644 > --- a/drivers/net/mlx5/mlx5.c > +++ b/drivers/net/mlx5/mlx5.c > @@ -90,6 +90,9 @@ > /* Activate Netlink support in VF mode. */ #define MLX5_VF_NL_EN "vf_nl_en" > > +/* Select port representors to instantiate. */ #define MLX5_REPRESENTOR > +"representor" > + > #ifndef HAVE_IBV_MLX5_MOD_MPW > #define MLX5DV_CONTEXT_FLAGS_MPW_ALLOWED (1 << 2) #define > MLX5DV_CONTEXT_FLAGS_ENHANCED_MPW (1 << 3) > @@ -420,6 +423,9 @@ mlx5_args_check(const char *key, const char *val, void > *opaque) > struct mlx5_dev_config *config = opaque; > unsigned long tmp; > > + /* No-op, port representors are processed in mlx5_dev_spawn(). */ > + if (!strcmp(MLX5_REPRESENTOR, key)) > + return 0; > errno = 0; > tmp = strtoul(val, NULL, 0); > if (errno) { > @@ -492,6 +498,7 @@ mlx5_args(struct mlx5_dev_config *config, struct > rte_devargs *devargs) > MLX5_RX_VEC_EN, > MLX5_L3_VXLAN_EN, > MLX5_VF_NL_EN, > + MLX5_REPRESENTOR, > NULL, > }; > struct rte_kvargs *kvlist; > @@ -1142,13 +1149,30 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev, > struct rte_eth_dev **eth_list = NULL; > struct ibv_context *ctx; > struct ibv_device_attr_ex attr; > + struct rte_eth_devargs eth_da; Not related to this patch: this data structure limits the number of representors to 32, while customers may use VFs in container environments, where 32 falls far short of the requirement. We need additional work here. A workaround is for users to call this API multiple times with different representor IDs.
> void *tmp; > unsigned int i; > unsigned int j = 0; > unsigned int n = 0; > int ret; > > + if (dpdk_dev->devargs) { > + ret = rte_eth_devargs_parse(dpdk_dev->devargs->args, &eth_da); > + if (ret) > + goto error; > + } else { > + memset(&eth_da, 0, sizeof(eth_da)); > + } > next: > + if (j) { > + unsigned int k; > + > + for (k = 0; k < eth_da.nb_representor_ports; ++k) > + if (eth_da.representor_ports[k] == j - 1) > + break; > + if (k == eth_da.nb_representor_ports) > + goto skip; > + } > errno = 0; > ctx = mlx5_glue->open_device(ibv_dev[j]); Need a range check for j here. > if (!ctx) { > @@ -1187,6 +1211,7 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev, > goto error; > ++n; > } > +skip: > if (ibv_dev[++j]) > goto next; > eth_list[n] = NULL; > -- > 2.11.0
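The selection logic in mlx5_dev_spawn(), combined with the range check the reviewer asks for, can be sketched as a standalone predicate. The sketch assumes, as the "if (j)" test in the patch implies, that index 0 is the main port and that index j > 0 maps to representor j - 1. struct repr_list and should_probe() are hypothetical stand-ins for struct rte_eth_devargs and the inline loop, and nb_devs stands for the number of IB devices available.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the representor list in struct rte_eth_devargs. */
struct repr_list {
	const uint16_t *ports;   /* requested representor ids */
	unsigned int nb_ports;
};

/*
 * Decide whether device index j should be spawned. Index 0 (the main
 * port) is always probed; index j > 0 maps to representor j - 1 and is
 * probed only if that id was requested. nb_devs bounds j, which is the
 * range check suggested in the review.
 */
static bool
should_probe(unsigned int j, unsigned int nb_devs, const struct repr_list *rl)
{
	if (j >= nb_devs)
		return false;
	if (j == 0)
		return true;
	for (unsigned int k = 0; k < rl->nb_ports; ++k)
		if ((unsigned int)rl->ports[k] == j - 1)
			return true;
	return false;
}
```

With representor=[0,2] and four devices, indexes 0, 1 and 3 are probed and index 2 is skipped.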
Re: [dpdk-dev] [PATCH] app/testpmd: add sanity checks when retrieving xstats
Was out of office, so only saw the patchset this morning. Missing fixlines (watch out for subject line wrap): Fixes: e2aae1c1ced9 ("ethdev: remove name from extended statistic fetch") Fixes: 0a5beecf466a ("ethdev: revert xstats by ID") Otherwise looks good to me. A case of implementation simplification turning into API assumption :( ..Remy On 07/06/2018 09:15, David Marchand wrote: [..] Signed-off-by: David Marchand Acked-by: Remy Horton
Re: [dpdk-dev] [PATCH] net/ixgbe: fix tunnel id format error for FDIR
Hi Wei, > -Original Message- > From: Zhao1, Wei > Sent: Tuesday, June 12, 2018 3:49 PM > To: Lu, Wenzhuo ; dev@dpdk.org > Cc: sta...@dpdk.org > Subject: RE: [PATCH] net/ixgbe: fix tunnel id format error for FDIR > > > Hi, Wenzhuo > > > -Original Message- > > From: Lu, Wenzhuo > > Sent: Tuesday, June 12, 2018 1:10 PM > > To: Zhao1, Wei ; dev@dpdk.org > > Cc: sta...@dpdk.org > > Subject: RE: [PATCH] net/ixgbe: fix tunnel id format error for FDIR > > > > Hi Wei, > > > > > > > -Original Message- > > > From: Zhao1, Wei > > > Sent: Tuesday, June 5, 2018 5:12 PM > > > To: dev@dpdk.org > > > Cc: Lu, Wenzhuo ; sta...@dpdk.org; Zhao1, Wei > > > > > > Subject: [PATCH] net/ixgbe: fix tunnel id format error for FDIR > > > > > > In cloud mode for FDIR, tunnel id should be set as protocol request, > > > the lower 8 bits should be set as reserved. > > To my opinion, the original implementation and this patch have > > different understanding of the 'tunnel_id' in ' struct > > rte_eth_tunnel_flow'. Originally it only means the tunnel id but not > including the reserved 8 bits. > > This patch means it should include the reserved bits. Maybe it makes > > things easier because the whole 4 bytes are big endian. > > So, may I suggest to add some comments in ' struct > > rte_eth_tunnel_flow' to let the users know what the 'tunnel_id' really > means? > > The format from input for 'tunnel_id' should be network network byte order > And it also should not contain reserved 1 byte, but hardware HASH need the > format include reserved 1 byte And in network byte order. So, this patch > convert to that format. > I will add some comment in function ixgbe_fdir_filter_to_atr_input and > ixgbe_parse_fdir_filter_tunnel, That will not Influence other NIC format for > this struct member. > Is that ok? It's not appropriate for different NICs to interpret the same information provided by the application differently. I also found that this tunnel_id is already used by other NICs, so we cannot change its meaning.
So I think this patch is not good. > > > > > > > > > > Fixes: 82fb702077f6 ("ixgbe: support new flow director modes for > > > X550") > > > Fixes: 11777435c727 ("net/ixgbe: parse flow director filter") > > > > > > Signed-off-by: Wei Zhao > > > --- > > > drivers/net/ixgbe/ixgbe_fdir.c | 2 +- > > > drivers/net/ixgbe/ixgbe_flow.c > > > | 5 ++--- > > > 2 files changed, 3 insertions(+), 4 deletions(-) > > > > > > diff --git a/drivers/net/ixgbe/ixgbe_fdir.c > > > b/drivers/net/ixgbe/ixgbe_fdir.c index d5e5179..67ab627 100644 > > > --- a/drivers/net/ixgbe/ixgbe_fdir.c > > > +++ b/drivers/net/ixgbe/ixgbe_fdir.c > > > @@ -774,7 +774,7 @@ ixgbe_fdir_filter_to_atr_input(const struct > > > rte_eth_fdir_filter *fdir_filter, > > > input->formatted.tunnel_type = > > > fdir_filter->input.flow.tunnel_flow.tunnel_type; > > > input->formatted.tni_vni = > > > - fdir_filter->input.flow.tunnel_flow.tunnel_id; > > > + fdir_filter->input.flow.tunnel_flow.tunnel_id >> 8; > > > } > > > > > > return 0; > > > diff --git a/drivers/net/ixgbe/ixgbe_flow.c > > > b/drivers/net/ixgbe/ixgbe_flow.c index eb0644c..64af777 100644 > > > --- a/drivers/net/ixgbe/ixgbe_flow.c > > > +++ b/drivers/net/ixgbe/ixgbe_flow.c > > > @@ -2489,8 +2489,7 @@ ixgbe_parse_fdir_filter_tunnel(const struct > > > rte_flow_attr *attr, > > > rte_memcpy(((uint8_t *) > > > &rule->ixgbe_fdir.formatted.tni_vni + 1), > > > vxlan_spec->vni, RTE_DIM(vxlan_spec->vni)); > > > - rule->ixgbe_fdir.formatted.tni_vni = > > > rte_be_to_cpu_32( > > > - rule->ixgbe_fdir.formatted.tni_vni); > > > + rule->ixgbe_fdir.formatted.tni_vni >>= 8; > > > } > > > } > > > > > > @@ -2587,7 +2586,7 @@ ixgbe_parse_fdir_filter_tunnel(const struct > > > rte_flow_attr *attr, > > > /* tni is a 24-bits bit field */ > > > rte_memcpy(&rule->ixgbe_fdir.formatted.tni_vni, > > > nvgre_spec->tni, RTE_DIM(nvgre_spec->tni)); > > > - rule->ixgbe_fdir.formatted.tni_vni <<= 8; > > > + rule->ixgbe_fdir.formatted.tni_vni >>= 8; > > > } > > > } > > > > > > -- > > > 2.7.5
Re: [dpdk-dev] [PATCH] net/ixgbe: fix tunnel id format error for FDIR
Hi Wenzhuo, > -Original Message- > From: Lu, Wenzhuo > Sent: Tuesday, June 12, 2018 4:39 PM > To: Zhao1, Wei ; dev@dpdk.org > Cc: sta...@dpdk.org > Subject: RE: [PATCH] net/ixgbe: fix tunnel id format error for FDIR > > Hi Wei, > > > -Original Message- > > From: Zhao1, Wei > > Sent: Tuesday, June 12, 2018 3:49 PM > > To: Lu, Wenzhuo ; dev@dpdk.org > > Cc: sta...@dpdk.org > > Subject: RE: [PATCH] net/ixgbe: fix tunnel id format error for FDIR > > > > > > Hi, Wenzhuo > > > > > -Original Message- > > > From: Lu, Wenzhuo > > > Sent: Tuesday, June 12, 2018 1:10 PM > > > To: Zhao1, Wei ; dev@dpdk.org > > > Cc: sta...@dpdk.org > > > Subject: RE: [PATCH] net/ixgbe: fix tunnel id format error for FDIR > > > > > > Hi Wei, > > > > > > > > > > -Original Message- > > > > From: Zhao1, Wei > > > > Sent: Tuesday, June 5, 2018 5:12 PM > > > > To: dev@dpdk.org > > > > Cc: Lu, Wenzhuo ; sta...@dpdk.org; Zhao1, > > > > Wei > > > > Subject: [PATCH] net/ixgbe: fix tunnel id format error for FDIR > > > > > > > > In cloud mode for FDIR, tunnel id should be set as protocol > > > > request, the lower 8 bits should be set as reserved. > > > To my opinion, the original implementation and this patch have > > > different understanding of the 'tunnel_id' in ' struct > > > rte_eth_tunnel_flow'. Originally it only means the tunnel id but not > > including the reserved 8 bits. > > > This patch means it should include the reserved bits. Maybe it makes > > > things easier because the whole 4 bytes are big endian. > > > So, may I suggest to add some comments in ' struct > > > rte_eth_tunnel_flow' to let the users know what the 'tunnel_id' > > > really > > means? > > > > The format from input for 'tunnel_id' should be network network byte > > order And it also should not contain reserved 1 byte, but hardware > > HASH need the format include reserved 1 byte And in network byte > > order. So, this patch convert to that format.
> > I will add some comment in function ixgbe_fdir_filter_to_atr_input > > and ixgbe_parse_fdir_filter_tunnel, That will not Influence other NIC > > format for this struct member. > > Is that ok? > It's not appropriate that different NIC has different understanding of the > same info provided by APP. > And I found this tunnel_id is already used by other NICs. We cannot change > the meaning. So I think this patch is not good. We do not change the input format at the testpmd level, but we MUST change it in the IXGBE PMD, because the hardware hash requires that format. This is the main root cause of the IXGBE VXLAN packet failures. > > > > > > > > > > > > > > > > Fixes: 82fb702077f6 ("ixgbe: support new flow director modes for > > > > X550") > > > > Fixes: 11777435c727 ("net/ixgbe: parse flow director filter") > > > > > > > > Signed-off-by: Wei Zhao > > > > --- > > > > drivers/net/ixgbe/ixgbe_fdir.c | 2 +- > > > > drivers/net/ixgbe/ixgbe_flow.c > > > > | 5 ++--- > > > > 2 files changed, 3 insertions(+), 4 deletions(-) > > > > > > > > diff --git a/drivers/net/ixgbe/ixgbe_fdir.c > > > > b/drivers/net/ixgbe/ixgbe_fdir.c index d5e5179..67ab627 100644 > > > > --- a/drivers/net/ixgbe/ixgbe_fdir.c > > > > +++ b/drivers/net/ixgbe/ixgbe_fdir.c > > > > @@ -774,7 +774,7 @@ ixgbe_fdir_filter_to_atr_input(const struct > > > > rte_eth_fdir_filter *fdir_filter, > > > > input->formatted.tunnel_type = > > > > fdir_filter->input.flow.tunnel_flow.tunnel_type; > > > > input->formatted.tni_vni = > > > > - fdir_filter->input.flow.tunnel_flow.tunnel_id; > > > > + fdir_filter->input.flow.tunnel_flow.tunnel_id > > > > >> 8; > > > > } > > > > > > > > return 0; > > > > diff --git a/drivers/net/ixgbe/ixgbe_flow.c > > > > b/drivers/net/ixgbe/ixgbe_flow.c index eb0644c..64af777 100644 > > > > --- a/drivers/net/ixgbe/ixgbe_flow.c > > > > +++ b/drivers/net/ixgbe/ixgbe_flow.c > > > > @@ -2489,8 +2489,7 @@ ixgbe_parse_fdir_filter_tunnel(const struct > > > > rte_flow_attr *attr, > > > > rte_memcpy(((uint8_t *) > >
> > &rule->ixgbe_fdir.formatted.tni_vni + > > > > 1), > > > > vxlan_spec->vni, > > > > RTE_DIM(vxlan_spec->vni)); > > > > - rule->ixgbe_fdir.formatted.tni_vni = > > > > rte_be_to_cpu_32( > > > > - rule->ixgbe_fdir.formatted.tni_vni); > > > > + rule->ixgbe_fdir.formatted.tni_vni >>= 8; > > > > } > > > > } > > > > > > > > @@ -2587,7 +2586,7 @@ ixgbe_parse_fdir_filter_tunnel(const struct > > > > rte_flow_attr *attr, > > > > /* tni is a 24-bits bit field */ > > > > rte_memcpy(&rule->ixgbe_fdir.formatted.tni_vni, > > > > nvgre_spec->tni, RTE_DIM(nvgre_spec->tni)); > > > > - rule->ixgbe_fdir.formatted.tni_vni <<= 8; > > > > +
Re: [dpdk-dev] [PATCH 0/2] maintainers: updates for Vhost and Virtio
Hi Maxime, On Tue, Jun 12, 2018 at 10:01:25AM +0200, Maxime Coquelin wrote: > Hi, > > Since Jianfeng & Yuanhan resignation, I was the only active > maintainer for Vhost lib and PMD, and I had no backup for > managing the next-virtio tree. > > Contacted offline, Tetsuya has kindly accepted to remove > himself from the Vhost PMD maintainers as he didn't had time > to be active recently. Tetsuya asked me to send the patch > and add his Acked-by. I'd like to thank him for his > contributions, and wish him the best for his current and > next adventures! > > I propose Tiwei and Zhihong to officially co-maintain both > Vhost and Virtio components. They have been very helpful with > their reviews in the last months, and know very well both the > code and the spec. Being 3 co-maintainers would ensure better > reviews while letting us time to develop new features. > > Also, I propose Tiwei to co-manage the next-virtio tree. Thank you very much for proposing us as co-maintainers! I will try my best to help you co-maintain the vhost and virtio components! :) Best regards, Tiwei Bie > > Thanks, > Maxime > > Maxime Coquelin (2): > maintainers: update Vhost PMD maintainership > maintainers: add Vhost and Virtio co-maintainers > > MAINTAINERS | 7 ++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > -- > 2.14.3 >
[dpdk-dev] [PATCH 3/3] mem: provide thread-unsafe memseg list walk variant
Sometimes, user code needs to walk memseg list while being inside a memory-related callback. Rather than making everyone copy around the same iteration code and depending on DPDK internals, provide an official way to do memseg_list_walk() inside callbacks. Also, remove existing reimplementation from memalloc code and use the new API instead. Signed-off-by: Anatoly Burakov --- lib/librte_eal/common/eal_common_memory.c | 29 + lib/librte_eal/common/include/rte_memory.h | 18 +++ lib/librte_eal/linuxapp/eal/eal_memalloc.c | 37 +- lib/librte_eal/rte_eal_version.map | 1 + 4 files changed, 43 insertions(+), 42 deletions(-) diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c index afe0d5b57..6c4a8d40b 100644 --- a/lib/librte_eal/common/eal_common_memory.c +++ b/lib/librte_eal/common/eal_common_memory.c @@ -883,14 +883,11 @@ rte_memseg_walk(rte_memseg_walk_t func, void *arg) } int __rte_experimental -rte_memseg_list_walk(rte_memseg_list_walk_t func, void *arg) +rte_memseg_list_walk_thread_unsafe(rte_memseg_list_walk_t func, void *arg) { struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; int i, ret = 0; - /* do not allow allocations/frees/init while we iterate */ - rte_rwlock_read_lock(&mcfg->memory_hotplug_lock); - for (i = 0; i < RTE_MAX_MEMSEG_LISTS; i++) { struct rte_memseg_list *msl = &mcfg->memsegs[i]; @@ -898,17 +895,23 @@ rte_memseg_list_walk(rte_memseg_list_walk_t func, void *arg) continue; ret = func(msl, arg); - if (ret < 0) { - ret = -1; - goto out; - } - if (ret > 0) { - ret = 1; - goto out; - } + if (ret) + return ret; } -out: + return 0; +} + +int __rte_experimental +rte_memseg_list_walk(rte_memseg_list_walk_t func, void *arg) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + int ret = 0; + + /* do not allow allocations/frees/init while we iterate */ + rte_rwlock_read_lock(&mcfg->memory_hotplug_lock); + ret = rte_memseg_list_walk_thread_unsafe(func, arg); 
rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock); + return ret; } diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h index c5a84c333..c4b7f4cff 100644 --- a/lib/librte_eal/common/include/rte_memory.h +++ b/lib/librte_eal/common/include/rte_memory.h @@ -299,6 +299,24 @@ rte_memseg_walk_thread_unsafe(rte_memseg_walk_t func, void *arg); int __rte_experimental rte_memseg_contig_walk_thread_unsafe(rte_memseg_contig_walk_t func, void *arg); +/** + * Walk each allocated memseg list without performing any locking. + * + * @note This function does not perform any locking, and is only safe to call + * from within memory-related callback functions. + * + * @param func + * Iterator function + * @param arg + * Argument passed to iterator + * @return + * 0 if walked over the entire list + * 1 if stopped by the user + * -1 if user function reported error + */ +int __rte_experimental +rte_memseg_list_walk_thread_unsafe(rte_memseg_list_walk_t func, void *arg); + /** * Dump the physical memory layout to a file. * diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c index 8c11f98c9..1ebc4b571 100644 --- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c +++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c @@ -171,32 +171,6 @@ get_file_size(int fd) return st.st_size; } -/* we cannot use rte_memseg_list_walk() here because we will be holding a - * write lock whenever we enter every function in this file, however copying - * the same iteration code everywhere is not ideal as well. so, use a lockless - * copy of memseg list walk here. 
- */ -static int -memseg_list_walk_thread_unsafe(rte_memseg_list_walk_t func, void *arg) -{ - struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; - int i, ret = 0; - - for (i = 0; i < RTE_MAX_MEMSEG_LISTS; i++) { - struct rte_memseg_list *msl = &mcfg->memsegs[i]; - - if (msl->base_va == NULL) - continue; - - ret = func(msl, arg); - if (ret < 0) - return -1; - if (ret > 0) - return 1; - } - return 0; -} - /* returns 1 on successful lock, 0 on unsuccessful lock, -1 on error */ static int lock(int fd, int type) { @@ -878,7 +852,8 @@ eal_memalloc_alloc_seg_bulk(struct rte_memseg **ms, int n_segs, size_t page_sz, wa.socket = socket; wa.segs_allocated = 0; - ret = memseg_list_walk_thread_unsafe(alloc_seg_walk, &wa); +
[dpdk-dev] [PATCH 2/3] mem: provide thread-unsafe memseg walk variant
Sometimes, user code needs to walk memseg list while being inside a memory-related callback. Rather than making everyone copy around the same iteration code and depending on DPDK internals, provide an official way to do memseg_walk() inside callbacks. Also, remove existing reimplementation from sPAPR VFIO code and use the new API instead. Signed-off-by: Anatoly Burakov --- lib/librte_eal/common/eal_common_memory.c | 28 -- lib/librte_eal/common/include/rte_memory.h | 18 + lib/librte_eal/linuxapp/eal/eal_vfio.c | 43 +++--- lib/librte_eal/rte_eal_version.map | 1 + 4 files changed, 40 insertions(+), 50 deletions(-) diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c index e3320a746..afe0d5b57 100644 --- a/lib/librte_eal/common/eal_common_memory.c +++ b/lib/librte_eal/common/eal_common_memory.c @@ -841,14 +841,11 @@ rte_memseg_contig_walk(rte_memseg_contig_walk_t func, void *arg) } int __rte_experimental -rte_memseg_walk(rte_memseg_walk_t func, void *arg) +rte_memseg_walk_thread_unsafe(rte_memseg_walk_t func, void *arg) { struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; int i, ms_idx, ret = 0; - /* do not allow allocations/frees/init while we iterate */ - rte_rwlock_read_lock(&mcfg->memory_hotplug_lock); - for (i = 0; i < RTE_MAX_MEMSEG_LISTS; i++) { struct rte_memseg_list *msl = &mcfg->memsegs[i]; const struct rte_memseg *ms; @@ -863,18 +860,25 @@ rte_memseg_walk(rte_memseg_walk_t func, void *arg) while (ms_idx >= 0) { ms = rte_fbarray_get(arr, ms_idx); ret = func(msl, ms, arg); - if (ret < 0) { - ret = -1; - goto out; - } else if (ret > 0) { - ret = 1; - goto out; - } + if (ret) + return ret; ms_idx = rte_fbarray_find_next_used(arr, ms_idx + 1); } } -out: + return 0; +} + +int __rte_experimental +rte_memseg_walk(rte_memseg_walk_t func, void *arg) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + int ret = 0; + + /* do not allow allocations/frees/init while we iterate */ + 
rte_rwlock_read_lock(&mcfg->memory_hotplug_lock); + ret = rte_memseg_walk_thread_unsafe(func, arg); rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock); + return ret; } diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h index aeba38bfa..c5a84c333 100644 --- a/lib/librte_eal/common/include/rte_memory.h +++ b/lib/librte_eal/common/include/rte_memory.h @@ -263,6 +263,24 @@ rte_memseg_contig_walk(rte_memseg_contig_walk_t func, void *arg); int __rte_experimental rte_memseg_list_walk(rte_memseg_list_walk_t func, void *arg); +/** + * Walk list of all memsegs without performing any locking. + * + * @note This function does not perform any locking, and is only safe to call + * from within memory-related callback functions. + * + * @param func + * Iterator function + * @param arg + * Argument passed to iterator + * @return + * 0 if walked over the entire list + * 1 if stopped by the user + * -1 if user function reported error + */ +int __rte_experimental +rte_memseg_walk_thread_unsafe(rte_memseg_walk_t func, void *arg); + /** * Walk each VA-contiguous area without performing any locking. * diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c index a2bbdfbf4..14c9332e9 100644 --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c @@ -87,42 +87,6 @@ static const struct vfio_iommu_type iommu_types[] = { }, }; -/* for sPAPR IOMMU, we will need to walk memseg list, but we cannot use - * rte_memseg_walk() because by the time we enter callback we will be holding a - * write lock, so regular rte-memseg_walk will deadlock. copying the same - * iteration code everywhere is not ideal as well. so, use a lockless copy of - * memseg walk here. 
- */ -static int -memseg_walk_thread_unsafe(rte_memseg_walk_t func, void *arg) -{ - struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; - int i, ms_idx, ret = 0; - - for (i = 0; i < RTE_MAX_MEMSEG_LISTS; i++) { - struct rte_memseg_list *msl = &mcfg->memsegs[i]; - const struct rte_memseg *ms; - struct rte_fbarray *arr; - - if (msl->memseg_arr.count == 0) - continue; - - arr = &msl->memseg_arr; - - ms_idx = rte_fbarray_find_next_used(arr, 0); - while (ms_idx >= 0) { -
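The per-segment walk visits every used slot in a memseg list and propagates any nonzero callback return unchanged, as the new _thread_unsafe functions do. A callback can therefore stop the walk early, for example once it finds what it was looking for. The model below uses a plain array. find_next_used() is a hypothetical stand-in for rte_fbarray_find_next_used() over boolean flags, not the fbarray API.

```c
#include <stdbool.h>
#include <stddef.h>

#define NB_SEGS 8

/* Hypothetical, simplified segment slot. */
struct seg {
	size_t len;
	bool used;
};

typedef int (*seg_walk_t)(const struct seg *ms, int idx, void *arg);

/* Index of the next used slot at or after start, or -1 if none. */
static int
find_next_used(const struct seg *arr, int n, int start)
{
	for (int i = start; i < n; i++)
		if (arr[i].used)
			return i;
	return -1;
}

/* Visit used slots in order; stop and return the first nonzero callback result. */
static int
seg_walk_thread_unsafe(const struct seg *arr, int n, seg_walk_t func, void *arg)
{
	int idx = find_next_used(arr, n, 0);

	while (idx >= 0) {
		int ret = func(&arr[idx], idx, arg);

		if (ret)
			return ret;
		idx = find_next_used(arr, n, idx + 1);
	}
	return 0;
}

/* Example callback: stop (return 1) at the first segment of at least min_len. */
struct find_arg {
	size_t min_len;
	int found;
};

static int
find_big(const struct seg *ms, int idx, void *arg)
{
	struct find_arg *fa = arg;

	if (ms->len >= fa->min_len) {
		fa->found = idx;
		return 1;
	}
	return 0;
}
```

A return of 0 means every used segment was visited, and 1 here means the callback stopped the walk.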
[dpdk-dev] [PATCH 1/3] mem: provide thread-unsafe contig walk variant
Sometimes, user code needs to walk memseg list while being inside a memory-related callback. Rather than making everyone copy around the same iteration code and depending on DPDK internals, provide an official way to do memseg_contig_walk() inside callbacks. Signed-off-by: Anatoly Burakov --- lib/librte_eal/common/eal_common_memory.c | 28 -- lib/librte_eal/common/include/rte_memory.h | 18 ++ lib/librte_eal/rte_eal_version.map | 1 + 3 files changed, 35 insertions(+), 12 deletions(-) diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c index 4f0688f9d..e3320a746 100644 --- a/lib/librte_eal/common/eal_common_memory.c +++ b/lib/librte_eal/common/eal_common_memory.c @@ -788,14 +788,11 @@ rte_mem_lock_page(const void *virt) } int __rte_experimental -rte_memseg_contig_walk(rte_memseg_contig_walk_t func, void *arg) +rte_memseg_contig_walk_thread_unsafe(rte_memseg_contig_walk_t func, void *arg) { struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; int i, ms_idx, ret = 0; - /* do not allow allocations/frees/init while we iterate */ - rte_rwlock_read_lock(&mcfg->memory_hotplug_lock); - for (i = 0; i < RTE_MAX_MEMSEG_LISTS; i++) { struct rte_memseg_list *msl = &mcfg->memsegs[i]; const struct rte_memseg *ms; @@ -820,19 +817,26 @@ rte_memseg_contig_walk(rte_memseg_contig_walk_t func, void *arg) len = n_segs * msl->page_sz; ret = func(msl, ms, len, arg); - if (ret < 0) { - ret = -1; - goto out; - } else if (ret > 0) { - ret = 1; - goto out; - } + if (ret) + return ret; ms_idx = rte_fbarray_find_next_used(arr, ms_idx + n_segs); } } -out: + return 0; +} + +int __rte_experimental +rte_memseg_contig_walk(rte_memseg_contig_walk_t func, void *arg) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + int ret = 0; + + /* do not allow allocations/frees/init while we iterate */ + rte_rwlock_read_lock(&mcfg->memory_hotplug_lock); + ret = rte_memseg_contig_walk_thread_unsafe(func, arg); 
rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock); + return ret; } diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h index aab9f6fe5..aeba38bfa 100644 --- a/lib/librte_eal/common/include/rte_memory.h +++ b/lib/librte_eal/common/include/rte_memory.h @@ -263,6 +263,24 @@ rte_memseg_contig_walk(rte_memseg_contig_walk_t func, void *arg); int __rte_experimental rte_memseg_list_walk(rte_memseg_list_walk_t func, void *arg); +/** + * Walk each VA-contiguous area without performing any locking. + * + * @note This function does not perform any locking, and is only safe to call + * from within memory-related callback functions. + * + * @param func + * Iterator function + * @param arg + * Argument passed to iterator + * @return + * 0 if walked over the entire list + * 1 if stopped by the user + * -1 if user function reported error + */ +int __rte_experimental +rte_memseg_contig_walk_thread_unsafe(rte_memseg_contig_walk_t func, void *arg); + /** * Dump the physical memory layout to a file. * diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map index f7dd0e7bc..98bfbe796 100644 --- a/lib/librte_eal/rte_eal_version.map +++ b/lib/librte_eal/rte_eal_version.map @@ -286,6 +286,7 @@ EXPERIMENTAL { rte_mem_virt2memseg; rte_mem_virt2memseg_list; rte_memseg_contig_walk; + rte_memseg_contig_walk_thread_unsafe; rte_memseg_list_walk; rte_memseg_walk; rte_mp_action_register; -- 2.17.1
[dpdk-dev] [PATCH v1] crypto/openssl: add support for 8 byte 3DES
Added extra case to support 8 byte key size for 3DES CBC. Also changed capabilities to reflect the change. Signed-off-by: Marko, Kovacevic --- drivers/crypto/openssl/rte_openssl_pmd.c | 3 +++ drivers/crypto/openssl/rte_openssl_pmd_ops.c | 2 +- test/test/test_cryptodev_des_test_vectors.h | 6 -- 3 files changed, 8 insertions(+), 3 deletions(-) diff --git a/drivers/crypto/openssl/rte_openssl_pmd.c b/drivers/crypto/openssl/rte_openssl_pmd.c index 93c6d7e..a1b4ca4 100644 --- a/drivers/crypto/openssl/rte_openssl_pmd.c +++ b/drivers/crypto/openssl/rte_openssl_pmd.c @@ -137,6 +137,9 @@ get_cipher_algo(enum rte_crypto_cipher_algorithm sess_algo, size_t keylen, switch (sess_algo) { case RTE_CRYPTO_CIPHER_3DES_CBC: switch (keylen) { + case 8: + *algo = EVP_des_cbc(); + break; case 16: *algo = EVP_des_ede_cbc(); break; diff --git a/drivers/crypto/openssl/rte_openssl_pmd_ops.c b/drivers/crypto/openssl/rte_openssl_pmd_ops.c index 1cb87d5..e7c5a57 100644 --- a/drivers/crypto/openssl/rte_openssl_pmd_ops.c +++ b/drivers/crypto/openssl/rte_openssl_pmd_ops.c @@ -397,7 +397,7 @@ static const struct rte_cryptodev_capabilities openssl_pmd_capabilities[] = { .algo = RTE_CRYPTO_CIPHER_3DES_CBC, .block_size = 8, .key_size = { - .min = 16, + .min = 8, .max = 24, .increment = 8 }, diff --git a/test/test/test_cryptodev_des_test_vectors.h b/test/test/test_cryptodev_des_test_vectors.h index 4217b72..7479e70 100644 --- a/test/test/test_cryptodev_des_test_vectors.h +++ b/test/test/test_cryptodev_des_test_vectors.h @@ -1232,13 +1232,15 @@ static const struct blockcipher_test_case triple_des_cipheronly_test_cases[] = { .test_descr = "3DES-64-CBC Encryption", .test_data = &triple_des64cbc_test_vector, .op_mask = BLOCKCIPHER_TEST_OP_ENCRYPT, - .pmd_mask = BLOCKCIPHER_TEST_TARGET_PMD_MB + .pmd_mask = BLOCKCIPHER_TEST_TARGET_PMD_MB | + BLOCKCIPHER_TEST_TARGET_PMD_OPENSSL }, { .test_descr = "3DES-64-CBC Decryption", .test_data = &triple_des64cbc_test_vector, .op_mask = BLOCKCIPHER_TEST_OP_DECRYPT, - 
.pmd_mask = BLOCKCIPHER_TEST_TARGET_PMD_MB + .pmd_mask = BLOCKCIPHER_TEST_TARGET_PMD_MB | + BLOCKCIPHER_TEST_TARGET_PMD_OPENSSL }, { .test_descr = "3DES-128-CBC Encryption", -- 2.9.5
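In the OpenSSL PMD, the new 8-byte case maps to `EVP_des_cbc()`, that is, single DES. This is equivalent to 3DES in EDE mode with K1 = K2 = K3, because E_K(D_K(E_K(x))) = E_K(x). It also means an 8-byte key gives only single-DES strength. A backend that only accepts a full 24-byte key can get the same behaviour by expanding shorter keys. The sketch below shows that expansion using a hypothetical `des3_expand_key()` helper. It illustrates the keying options only and is not code from this patch or from OpenSSL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DES3_KEY_LEN 24

/*
 * Expand a 3DES key into the full K1||K2||K3 form.
 *   24 bytes: K1, K2, K3 independent
 *   16 bytes: K3 = K1
 *    8 bytes: K1 = K2 = K3, which degenerates to single DES
 *             since E_K(D_K(E_K(x))) == E_K(x)
 * Returns 0 on success, -1 for an unsupported key length.
 */
static int des3_expand_key(const unsigned char *key, size_t keylen,
			   unsigned char out[DES3_KEY_LEN])
{
	assert(key != NULL && out != NULL);
	switch (keylen) {
	case 24:
		memcpy(out, key, 24);
		return 0;
	case 16:
		memcpy(out, key, 16);     /* K1 || K2 */
		memcpy(out + 16, key, 8); /* K3 = K1 */
		return 0;
	case 8:
		for (int i = 0; i < 3; i++)
			memcpy(out + i * 8, key, 8); /* K1 = K2 = K3 */
		return 0;
	default:
		return -1;
	}
}
```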
[dpdk-dev] [PATCH v1] crypto/qat: add support for 8 byte 3DES
Added extra case to support 8 byte key size for 3DES CBC. Also changed capabilities to reflect the change. Signed-off-by: Marko, Kovacevic --- drivers/crypto/qat/qat_adf/qat_algs.h| 1 + drivers/crypto/qat/qat_adf/qat_algs_build_desc.c | 1 + drivers/crypto/qat/qat_crypto_capabilities.h | 2 +- test/test/test_cryptodev_des_test_vectors.h | 6 -- 4 files changed, 7 insertions(+), 3 deletions(-) diff --git a/drivers/crypto/qat/qat_adf/qat_algs.h b/drivers/crypto/qat/qat_adf/qat_algs.h index 88bd5f00e..a3a3545ef 100644 --- a/drivers/crypto/qat/qat_adf/qat_algs.h +++ b/drivers/crypto/qat/qat_adf/qat_algs.h @@ -21,6 +21,7 @@ /* 3DES key sizes */ #define QAT_3DES_KEY_SZ_OPT1 24 /* Keys are independent */ #define QAT_3DES_KEY_SZ_OPT2 16 /* K3=K1 */ +#define QAT_3DES_KEY_SZ_OPT3 8 /* K1=K2=K3 */ #define QAT_AES_HW_CONFIG_CBC_ENC(alg) \ ICP_QAT_HW_CIPHER_CONFIG_BUILD(ICP_QAT_HW_CIPHER_CBC_MODE, alg, \ diff --git a/drivers/crypto/qat/qat_adf/qat_algs_build_desc.c b/drivers/crypto/qat/qat_adf/qat_algs_build_desc.c index c87ed40fe..e89858d22 100644 --- a/drivers/crypto/qat/qat_adf/qat_algs_build_desc.c +++ b/drivers/crypto/qat/qat_adf/qat_algs_build_desc.c @@ -994,6 +994,7 @@ int qat_alg_validate_3des_key(int key_len, enum icp_qat_hw_cipher_algo *alg) switch (key_len) { case QAT_3DES_KEY_SZ_OPT1: case QAT_3DES_KEY_SZ_OPT2: + case QAT_3DES_KEY_SZ_OPT3: *alg = ICP_QAT_HW_CIPHER_ALGO_3DES; break; default: diff --git a/drivers/crypto/qat/qat_crypto_capabilities.h b/drivers/crypto/qat/qat_crypto_capabilities.h index 001c32c5d..1d6110970 100644 --- a/drivers/crypto/qat/qat_crypto_capabilities.h +++ b/drivers/crypto/qat/qat_crypto_capabilities.h @@ -434,7 +434,7 @@ .algo = RTE_CRYPTO_CIPHER_3DES_CBC, \ .block_size = 8,\ .key_size = { \ - .min = 16, \ + .min = 8, \ .max = 24, \ .increment = 8 \ }, \ diff --git a/test/test/test_cryptodev_des_test_vectors.h b/test/test/test_cryptodev_des_test_vectors.h index 7479e70f0..103345605 100644 --- a/test/test/test_cryptodev_des_test_vectors.h +++ 
b/test/test/test_cryptodev_des_test_vectors.h @@ -1233,14 +1233,16 @@ static const struct blockcipher_test_case triple_des_cipheronly_test_cases[] = { .test_data = &triple_des64cbc_test_vector, .op_mask = BLOCKCIPHER_TEST_OP_ENCRYPT, .pmd_mask = BLOCKCIPHER_TEST_TARGET_PMD_MB | - BLOCKCIPHER_TEST_TARGET_PMD_OPENSSL + BLOCKCIPHER_TEST_TARGET_PMD_OPENSSL | + BLOCKCIPHER_TEST_TARGET_PMD_QAT }, { .test_descr = "3DES-64-CBC Decryption", .test_data = &triple_des64cbc_test_vector, .op_mask = BLOCKCIPHER_TEST_OP_DECRYPT, .pmd_mask = BLOCKCIPHER_TEST_TARGET_PMD_MB | - BLOCKCIPHER_TEST_TARGET_PMD_OPENSSL + BLOCKCIPHER_TEST_TARGET_PMD_OPENSSL | + BLOCKCIPHER_TEST_TARGET_PMD_QAT }, { .test_descr = "3DES-128-CBC Encryption", -- 2.13.6
Re: [dpdk-dev] Why rte_pipeline.c only support forwarding to same table in different entries?
Hi Hobby, Shyam, > > > > I am trying to have one ACL table 0 to forward different traffic to > > different table, but it failed. > > > > > > Is there any reason why adding this limitation? > > The current librte_pipeline implementation is limited to supporting a chain of tables as opposed to a tree of tables. Basically, all entries in table A that are connected to another table (as opposed to dropping packets or sending them to an output port) have to point to the same table B. The reason is simplicity of implementation (and therefore performance), while the same functionality can easily be achieved in a slightly different way. The current implementation has a single buffer where the current burst of packets under processing is stored: - as packets are dropped or sent to output ports, this array of packets starts to develop "holes" - after the current table's processing is completed, the remaining packets move to the next table - the current pipeline iteration is completed when no packets are left in the array. A tree-of-tables topology, or a chain where some tables can be skipped by some packets, would be more complex and probably lower performance: - each table would need an input queue of packets; as packets are sent to another table, they are stored in the input array of that table - a scheduler would be needed to track which tables have packets to process and to schedule them. The same thing can be achieved by using external recirculation through a SW queue/ring: - have table A send the packets logically destined to table B to a pipeline output port instead; this output port sits on top of a SW queue/ring (ring writer port type) - have an additional pipeline input port created on top of the same SW queue/ring (ring reader port type) - connect this pipeline input port directly to table B. The same effect can also be achieved by having table B as part of a different pipeline than table A, with the second pipeline executed by the same (or a different, your choice) CPU core as the first
pipeline. Makes sense? Regards, Cristian
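The single-buffer scheme described above can be modelled in a few lines. The sketch below is a simplified, hypothetical model, not librte_pipeline code. A burst of up to 64 packets is tracked with a 64-bit mask. Each table either drops a packet, sends it to an output port, or leaves it for the next table, and the cleared bits are the "holes". Because the next table is simply `t + 1`, every "go to table" action necessarily points at the same next table, which is exactly the chain-only limitation being discussed.

```c
#include <assert.h>
#include <stdint.h>

enum action { ACT_DROP, ACT_PORT, ACT_NEXT_TABLE };

/* Hypothetical table lookup: classifies a packet (here just an int). */
typedef enum action (*table_lookup_t)(int pkt);

/*
 * Run one burst (up to 64 packets) through a chain of tables.
 * Bit i of 'mask' stays set while packet i is still in the shared burst
 * array; dropping it or sending it to an output port clears the bit,
 * leaving a "hole". Survivors move on to the next table in the chain.
 * Packets still set after the last table are ignored in this model.
 * Returns the number of packets sent to output ports.
 */
static int run_chain(const int *pkts, int n_pkts,
		     const table_lookup_t *tables, int n_tables)
{
	uint64_t mask;
	int sent = 0;

	assert(n_pkts >= 0 && n_pkts <= 64);
	mask = n_pkts == 64 ? UINT64_MAX : ((UINT64_C(1) << n_pkts) - 1);

	for (int t = 0; t < n_tables && mask != 0; t++) {
		for (int i = 0; i < n_pkts; i++) {
			if (!(mask & (UINT64_C(1) << i)))
				continue; /* hole: packet already gone */
			switch (tables[t](pkts[i])) {
			case ACT_PORT:
				sent++;
				/* fall through: packet leaves the burst */
			case ACT_DROP:
				mask &= ~(UINT64_C(1) << i);
				break;
			case ACT_NEXT_TABLE:
				break; /* stays in the burst for table t + 1 */
			}
		}
	}
	return sent;
}

/* Example tables: A drops negatives, outputs odd keys, forwards the rest. */
static enum action tbl_a(int pkt)
{
	if (pkt < 0)
		return ACT_DROP;
	return (pkt % 2) ? ACT_PORT : ACT_NEXT_TABLE;
}

/* Example table B: sends everything it sees to an output port. */
static enum action tbl_b(int pkt)
{
	(void)pkt;
	return ACT_PORT;
}
```

Supporting a tree would require replacing the single mask with per-table input queues plus a scheduler, which is the extra cost Cristian describes.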
Re: [dpdk-dev] [PATCH 0/2] maintainers: updates for Vhost and Virtio
Hi Maxime, > -Original Message- > From: Maxime Coquelin [mailto:maxime.coque...@redhat.com] > Sent: Tuesday, June 12, 2018 4:01 PM > To: mtetsu...@gmail.com; Bie, Tiwei ; Wang, Zhihong > ; dev@dpdk.org > Cc: tho...@monjalon.net; Yigit, Ferruh ; Maxime > Coquelin > Subject: [PATCH 0/2] maintainers: updates for Vhost and Virtio > > Hi, > > Since Jianfeng's and Yuanhan's resignations, I was the only active > maintainer for Vhost lib and PMD, and I had no backup for > managing the next-virtio tree. > > Contacted offline, Tetsuya has kindly accepted to remove > himself from the Vhost PMD maintainers as he didn't have time > to be active recently. Tetsuya asked me to send the patch > and add his Acked-by. I'd like to thank him for his > contributions, and wish him the best for his current and > next adventures! > > I propose Tiwei and Zhihong to officially co-maintain both > Vhost and Virtio components. They have been very helpful with > their reviews in the last months, and know very well both the > code and the spec. Being 3 co-maintainers would ensure better > reviews while leaving us time to develop new features. Thanks for the proposal! I'm really glad to help co-maintain the Virtio and Vhost components. ;) Regards -Zhihong > > Also, I propose Tiwei to co-manage the next-virtio tree. > > Thanks, > Maxime > > Maxime Coquelin (2): > maintainers: update Vhost PMD maintainership > maintainers: add Vhost and Virtio co-maintainers > > MAINTAINERS | 7 ++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > -- > 2.14.3
[dpdk-dev] [PATCH] net/mlx5: fix pmd crash in device probe
This patch initializes the counter descriptor struct before invoking the Verbs API to avoid a segmentation fault. Fixes: 9a761de8ea14 ("net/mlx5: flow counter support") Cc: or...@mellanox.com Signed-off-by: Xueming Li --- drivers/net/mlx5/mlx5.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index 216753ba6..cf9936ae2 100644 --- a/drivers/net/mlx5/mlx5.c +++ b/drivers/net/mlx5/mlx5.c @@ -708,7 +708,7 @@ mlx5_dev_spawn_one(struct rte_device *dpdk_dev, unsigned int mprq_min_stride_num_n = 0; unsigned int mprq_max_stride_num_n = 0; #ifdef HAVE_IBV_DEVICE_COUNTERS_SET_SUPPORT - struct ibv_counter_set_description cs_desc; + struct ibv_counter_set_description cs_desc = { .counter_type = 0 }; #endif struct ether_addr mac; char name[RTE_ETH_NAME_MAX_LEN]; -- 2.13.3
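The one-line fix relies on a C language guarantee. When an aggregate has an initializer, every member not explicitly named is initialized as if it had static storage, that is, to zero. Naming a single member, as in `{ .counter_type = 0 }`, therefore zeroes all of the struct's members. The sketch below demonstrates this with a hypothetical `struct counter_desc`. The real `struct ibv_counter_set_description` layout comes from the Verbs headers and is not reproduced here. Padding bytes are not reliably covered by this guarantee, which is why the check compares members rather than raw memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct standing in for ibv_counter_set_description. */
struct counter_desc {
	int counter_type;
	uint32_t num_of_cs;
	uint16_t attributes;
	uint64_t reserved[4];
};

static struct counter_desc make_desc(void)
{
	/* Naming one member is enough: all members not listed in the
	 * initializer are zero-initialized, as for static objects. */
	struct counter_desc d = { .counter_type = 0 };
	return d;
}

/* Member-wise check (padding bytes may hold arbitrary values). */
static int desc_is_zeroed(const struct counter_desc *d)
{
	assert(d != NULL);
	if (d->counter_type != 0 || d->num_of_cs != 0 || d->attributes != 0)
		return 0;
	for (size_t i = 0; i < sizeof(d->reserved) / sizeof(d->reserved[0]); i++)
		if (d->reserved[i] != 0)
			return 0;
	return 1;
}
```

Without the initializer, `d` would have indeterminate contents, and a callee reading any member would be relying on stack garbage.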
Re: [dpdk-dev] [PATCH 3/6] cryptodev: remove max number of sessions
Hello Pablo, On Fri, Jun 08, 2018 at 11:02:31PM +0100, Pablo de Lara wrote: > Sessions are not created and stored in the crypto device > anymore, since now the session mempool is created > at the application level. > > Therefore the limitation of the maximum number of sessions > that can be created should not be dependent of the crypto device. > > Signed-off-by: Pablo de Lara > --- > config/common_base | 12 > config/rte_config.h | 14 -- > doc/guides/cryptodevs/aesni_gcm.rst | 4 +--- > doc/guides/cryptodevs/aesni_mb.rst | 4 +--- > doc/guides/cryptodevs/armv8.rst | 1 - > doc/guides/cryptodevs/ccp.rst| 2 -- > doc/guides/cryptodevs/dpaa2_sec.rst | 5 - > doc/guides/cryptodevs/dpaa_sec.rst | 5 - > doc/guides/cryptodevs/kasumi.rst | 4 +--- > doc/guides/cryptodevs/mvsam.rst | 1 - > doc/guides/cryptodevs/null.rst | 4 +--- > doc/guides/cryptodevs/openssl.rst| 1 - > doc/guides/cryptodevs/scheduler.rst | 4 > doc/guides/cryptodevs/snow3g.rst | 4 +--- > doc/guides/cryptodevs/zuc.rst| 4 +--- > doc/guides/prog_guide/cryptodev_lib.rst | 9 ++--- > doc/guides/rel_notes/deprecation.rst | 3 --- > doc/guides/rel_notes/release_18_08.rst | 3 ++- > drivers/crypto/aesni_gcm/aesni_gcm_pmd.c | 5 + > drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c | 1 - > drivers/crypto/aesni_gcm/aesni_gcm_pmd_private.h | 2 -- > drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c | 5 + > drivers/crypto/aesni_mb/rte_aesni_mb_pmd_ops.c | 1 - > .../crypto/aesni_mb/rte_aesni_mb_pmd_private.h | 2 -- > drivers/crypto/armv8/rte_armv8_pmd.c | 5 + > drivers/crypto/armv8/rte_armv8_pmd_ops.c | 1 - > drivers/crypto/armv8/rte_armv8_pmd_private.h | 2 -- > drivers/crypto/ccp/ccp_pmd_ops.c | 1 - > drivers/crypto/ccp/ccp_pmd_private.h | 1 - > drivers/crypto/ccp/rte_ccp_pmd.c | 16 +--- > drivers/crypto/dpaa2_sec/dpaa2_sec_dpseci.c | 3 ++- > drivers/crypto/dpaa_sec/dpaa_sec.c | 1 - > drivers/crypto/dpaa_sec/dpaa_sec.h | 1 + > drivers/crypto/kasumi/rte_kasumi_pmd.c | 5 + > drivers/crypto/kasumi/rte_kasumi_pmd_ops.c | 1 - > 
drivers/crypto/kasumi/rte_kasumi_pmd_private.h | 2 -- > drivers/crypto/mvsam/rte_mrvl_pmd.c | 6 -- > drivers/crypto/mvsam/rte_mrvl_pmd_ops.c | 1 - > drivers/crypto/mvsam/rte_mrvl_pmd_private.h | 1 - > drivers/crypto/null/null_crypto_pmd.c| 3 --- > drivers/crypto/null/null_crypto_pmd_ops.c| 1 - > drivers/crypto/null/null_crypto_pmd_private.h| 1 - > drivers/crypto/openssl/rte_openssl_pmd.c | 3 --- > drivers/crypto/openssl/rte_openssl_pmd_ops.c | 1 - > drivers/crypto/openssl/rte_openssl_pmd_private.h | 2 -- > drivers/crypto/qat/qat_crypto.c | 1 - > drivers/crypto/qat/qat_crypto.h | 2 -- > drivers/crypto/qat/rte_qat_cryptodev.c | 4 +--- > drivers/crypto/scheduler/scheduler_pmd.c | 13 + > drivers/crypto/scheduler/scheduler_pmd_ops.c | 7 --- > drivers/crypto/snow3g/rte_snow3g_pmd.c | 3 --- > drivers/crypto/snow3g/rte_snow3g_pmd_ops.c | 1 - > drivers/crypto/snow3g/rte_snow3g_pmd_private.h | 2 -- > drivers/crypto/virtio/virtio_cryptodev.c | 3 --- > drivers/crypto/zuc/rte_zuc_pmd.c | 5 + > drivers/crypto/zuc/rte_zuc_pmd_ops.c | 1 - > drivers/crypto/zuc/rte_zuc_pmd_private.h | 2 -- > lib/librte_cryptodev/rte_cryptodev.h | 5 - > lib/librte_cryptodev/rte_cryptodev_pmd.c | 12 ++-- > lib/librte_cryptodev/rte_cryptodev_pmd.h | 4 > test/test/test_cryptodev.c | 13 +++-- > 61 files changed, 30 insertions(+), 206 deletions(-) > > diff --git a/config/common_base b/config/common_base > index 6b0d1cbbb..db6dec335 100644 > --- a/config/common_base > +++ b/config/common_base > @@ -473,14 +473,12 @@ CONFIG_RTE_LIBRTE_PMD_ARMV8_CRYPTO_DEBUG=n > # Compile NXP DPAA2 crypto sec driver for CAAM HW > # > CONFIG_RTE_LIBRTE_PMD_DPAA2_SEC=n > -CONFIG_RTE_DPAA2_SEC_PMD_MAX_NB_SESSIONS=2048 > > # > # NXP DPAA caam - crypto driver > # > CONFIG_RTE_LIBRTE_PMD_DPAA_SEC=n > CONFIG_RTE_LIBRTE_DPAA_MAX_CRYPTODEV=4 > -CONFIG_RTE_DPAA_SEC_PMD_MAX_NB_SESSIONS=2048 > > # > # Compile PMD for QuickAssist based devices > @@ -490,11 +488,6 @@ CONFIG_RTE_LIBRTE_PMD_QAT_DEBUG_INIT=n > 
CONFIG_RTE_LIBRTE_PMD_QAT_DEBUG_TX=n > CONFIG_RTE_LIBRTE_PMD_QAT_DEBUG_RX=n > CONFIG_RTE_LIBRTE_PMD_QAT_DEBUG_DRIVER=n > -# > -# N
[dpdk-dev] [PATCH v1] net/mlx5: fix pmd crash in device probe
This patch initializes the counter descriptor struct before invoking the Verbs API to avoid a segmentation fault. Fixes: 9a761de8ea14 ("net/mlx5: flow counter support") Cc: or...@mellanox.com Cc: sta...@dpdk.org Signed-off-by: Xueming Li --- drivers/net/mlx5/mlx5.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index c933e274f..9ab965968 100644 --- a/drivers/net/mlx5/mlx5.c +++ b/drivers/net/mlx5/mlx5.c @@ -706,7 +706,7 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, int i; struct mlx5dv_context attrs_out = {0}; #ifdef HAVE_IBV_DEVICE_COUNTERS_SET_SUPPORT - struct ibv_counter_set_description cs_desc; + struct ibv_counter_set_description cs_desc = { .counter_type = 0 }; #endif /* Prepare shared data between primary and secondary process. */ -- 2.13.3
Re: [dpdk-dev] [PATCH v3 1/5] net/virtio: forbid simple Tx path by default
On 06/12/2018 08:35 AM, Tiwei Bie wrote: On Thu, Jun 07, 2018 at 11:26:12AM +0200, Maxime Coquelin wrote: Simple Tx path is not compliant with the Virtio specification, as it assumes the device will use the descriptors in order. VIRTIO_F_IN_ORDER feature has been introduced recently, but the simple Tx path is not compliant with it as VIRTIO_F_IN_ORDER requires that chained descriptors are used sequentially, which is not the case in simple Tx path. This patch introduces 'simple_tx_support' devarg to unlock Tx simple path selection. Reported-by: Tiwei Bie Signed-off-by: Maxime Coquelin --- doc/guides/nics/virtio.rst | 9 + drivers/net/virtio/virtio_ethdev.c | 73 +- drivers/net/virtio/virtio_pci.h| 1 + 3 files changed, 82 insertions(+), 1 deletion(-) diff --git a/doc/guides/nics/virtio.rst b/doc/guides/nics/virtio.rst index 8922f9c0b..53ce1c12a 100644 --- a/doc/guides/nics/virtio.rst +++ b/doc/guides/nics/virtio.rst @@ -222,6 +222,9 @@ Tx callbacks: #. ``virtio_xmit_pkts_simple``: Vector version fixes the available ring indexes to optimize performance. + This implementation does not comply with the Virtio specification, and so + is not selectable by default. "simple_tx_support=1" devarg must be passed + to unlock it. By default, the non-vector callbacks are used: @@ -331,3 +334,9 @@ The user can specify below argument in devargs. driver, and works as a HW vhost backend. This argument is used to specify a virtio device needs to work in vDPA mode. (Default: 0 (disabled)) + +#. ``simple_tx_support``: + +This argument enables support for the simple Tx path, which is not +compliant with the Virtio specification. +(Default: 0 (disabled)) I tried this patch on my server. Virtio-user will fail to probe when simple_tx_support is specified due to the check in virtio_user_pmd_probe(): PMD: Error parsing device, invalid key virtio_user_pmd_probe(): error when parsing param vdev_probe(): failed to initialize virtio_user0 device EAL: Bus (vdev) probe failed.
Thanks for the heads-up, I hadn't tried with virtio-user. I'll post a new version with this fixed soon. diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index 5833dad73..052dd056a 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -1331,6 +1331,8 @@ set_rxtx_funcs(struct rte_eth_dev *eth_dev) if (hw->use_simple_tx) { PMD_INIT_LOG(INFO, "virtio: using simple Tx path on port %u", eth_dev->data->port_id); + PMD_INIT_LOG(WARNING, + "virtio: simple Tx path does not comply with Virtio spec"); eth_dev->tx_pkt_burst = virtio_xmit_pkts_simple; } else { PMD_INIT_LOG(INFO, "virtio: using standard Tx path on port %u", @@ -1790,6 +1792,66 @@ rte_virtio_pmd_init(void) rte_pci_register(&rte_virtio_pmd); } +#define VIRTIO_SIMPLE_TX_SUPPORT "simple_tx_support" + +static int virtio_dev_args_check(const char *key, const char *val, + void *opaque) +{ + struct rte_eth_dev *dev = opaque; + struct virtio_hw *hw = dev->data->dev_private; + unsigned long tmp; + int ret = 0; + + errno = 0; + tmp = strtoul(val, NULL, 0); + if (errno) { + PMD_INIT_LOG(INFO, + "%s: \"%s\" is not a valid integer", key, val); + return errno; + } + + if (strcmp(VIRTIO_SIMPLE_TX_SUPPORT, key) == 0) + hw->support_simple_tx = !!tmp; + + return ret; +} + +static int +virtio_dev_args(struct rte_eth_dev *dev) +{ + struct rte_kvargs *kvlist; + struct rte_devargs *devargs; + const char *valid_args[] = { + VIRTIO_SIMPLE_TX_SUPPORT, + NULL, + }; checkpatch is complaining about above definition: WARNING:STATIC_CONST_CHAR_ARRAY: char * array declaration might be better as static const #96: FILE: drivers/net/virtio/virtio_ethdev.c:1824: + const char *valid_args[] = { Yeah, I missed the warning when running my build scripts and noticed it from the upstream CI mail. It will be fixed in next version. Thanks! Maxime Best regards, Tiwei Bie
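Two small points from this review can be illustrated together. First, checkpatch's `STATIC_CONST_CHAR_ARRAY` warning asks for the key list to be declared `static const char * const`. Second, the value parsing in the patch checks only `errno` after `strtoul()`, which may not reject input such as "yes" or "1abc". The sketch below shows one way to handle both, using hypothetical `is_valid_key()` and `parse_flag()` helpers. Only the `VIRTIO_SIMPLE_TX_SUPPORT` key comes from the patch. The sketch does not use `rte_kvargs` and is not the code of any later revision.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define VIRTIO_SIMPLE_TX_SUPPORT "simple_tx_support"

/* 'static const' array of const pointers, the form checkpatch suggests. */
static const char * const valid_args[] = {
	VIRTIO_SIMPLE_TX_SUPPORT,
	NULL,
};

/* Return 1 if key is listed in valid_args, 0 otherwise. */
static int is_valid_key(const char *key)
{
	for (size_t i = 0; valid_args[i] != NULL; i++)
		if (strcmp(valid_args[i], key) == 0)
			return 1;
	return 0;
}

/*
 * Parse a devarg value as an unsigned integer and reduce it to a flag.
 * Besides errno, the end pointer is checked so that input such as ""
 * or "yes" (for which strtoul() returns 0 without necessarily setting
 * errno) and trailing junk like "1abc" are rejected.
 * Returns 0 on success, a negative errno value on error.
 */
static int parse_flag(const char *val, int *flag)
{
	char *end;
	unsigned long tmp;

	assert(val != NULL && flag != NULL);
	errno = 0;
	tmp = strtoul(val, &end, 0);
	if (errno)
		return -errno;
	if (end == val || *end != '\0')
		return -EINVAL;
	*flag = !!tmp;
	return 0;
}
```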
Re: [dpdk-dev] [PATCH v5 2/2] doc: add a guide doc for cross compiling from x86
Hi Jerin, Bruce and Thomas, To fix the meson cross build issue (host clang + cross gcc), we have to decouple the clang options from the gcc ones. Currently the options for gcc and clang are tightly coupled, as they share a single meson project and are both added to the project arguments; this is the root cause. I have a patch that removes the specific options from the project arguments and adds them individually to C_FLAGS. It basically works, but it changes a lot of meson.build files. Any comments are welcome. Best Regards, Gavin > -Original Message- > From: Gavin Hu > Sent: Tuesday, June 12, 2018 9:28 AM > To: 'Jerin Jacob' > Cc: Bruce Richardson ; Thomas Monjalon > ; dev@dpdk.org; nd > Subject: RE: [dpdk-dev] [PATCH v5 2/2] doc: add a guide doc for cross > compiling from x86 > > > > > -Original Message- > > From: Jerin Jacob > > Sent: Monday, June 4, 2018 8:51 PM > > To: Gavin Hu > > Cc: Bruce Richardson ; Thomas Monjalon > > ; dev@dpdk.org > > Subject: Re: [dpdk-dev] [PATCH v5 2/2] doc: add a guide doc for cross > > compiling from x86 > > > > -Original Message- > > > Date: Mon, 4 Jun 2018 06:03:34 + > > > From: Gavin Hu > > > To: Jerin Jacob , Bruce Richardson > > > , Thomas Monjalon > > > > CC: "dev@dpdk.org" > > > Subject: RE: [dpdk-dev] [PATCH v5 2/2] doc: add a guide doc for > > > cross compiling from x86 > > > > > > See my inline comments: > > > > > > > -Original Message- > > > > From: Jerin Jacob > > > > Sent: Thursday, May 31, 2018 3:36 AM > > > > To: Gavin Hu > > > > Cc: dev@dpdk.org > > > > Subject: Re: [dpdk-dev] [PATCH v5 2/2] doc: add a guide doc for > > > > cross compiling from x86 > > > > > > > > -Original Message- > > > > > Date: Tue, 29 May 2018 18:43:36 +0800 > > > > > From: Gavin Hu > > > > > To: dev@dpdk.org > > > > > CC: gavin...@arm.com > > > > > Subject: [dpdk-dev] [PATCH v5 2/2] doc: add a guide doc for > > > > > cross compiling from x86 > > > > > X-Mailer: git-send-email 2.1.4 > > > > > > > > > > + 1.
EXTRA_CFLAGS and EXTRA_LDFLAGS should be added to include > > > > > + the > > > > NUMA headers and link the library respectively, > > > > > + if the step > > > > > + :ref:`argment_the_cross_toolcain_with_numa_support` was > > > > skipped therefore the toolchain was not > > > > > + argmented with NUMA support. > > > > > + > > > > > + 2. RTE_DEVEL_BUILD has to be disabled, otherwise the numa.h > > > > > + file gets > > > > > > > > If the warnings are from numa.h then please use -isystem > > > install dir> instead of disabling RTE_DEVEL_BUILD. > > > > > > > [Gavin Hu] This is a good advice, I verified it okay and can upload > > > a new > > patch. > > > > > > > > + a lot of compiling errors of Werror=cast-qual, > > > > > + Werror=strict-prototypes > > > > and Werror=old-style-definition. > > > > > + An example is given below: > > > > > + > > > > > + .. code-block:: console > > > > > + > > > > > + make -j CROSS=aarch64-linux-gnu- CONFIG_RTE_KNI_KMOD=n > > > > CONFIG_RTE_EAL_IGB_UIO=n > > > > > + RTE_DEVEL_BUILD=n EXTRA_CFLAGS="- > > I/include" > > > > EXTRA_LDFLAGS= > > > > > + "-L/lib -lnuma" > > > > > + > > > > > > > > As discussed earlier, meson cross build instruction is missing. > > > > > > > [Gavin Hu] I reproduced the meson build issue Bruce reported, as > > > shown > > below. > > > It was not introduced by gcc, nor clang, it was actually introduced > > > by meson.build, see line #65 of > > > http://www.dpdk.org/browse/dpdk/tree/config/meson.build > > > Even worse, "has_argument" is not reliable(refer here: > > http://mesonbuild.com/Compiler-properties.html#has-argument) for > some > > compilers. > > > This is the case of gcc and clang, which caused the 4 warning > > > options were > > included in the whole project, either the compiler is gcc or clang, > > cross or native. > > > This finally caused the unrecognized warning options. > > > > > > I tried to disable the warning options, then the compiling got lots > > > of noisy > > warnings and errors. 
> > > > > > To fix this issue, we need to create a meson subproject for > > > pmdinfogen, > > the change is not little and I am not familiar with this. > > > > > > Any comments are welcome! > > > > > > If I am not wrong, This is specific to host compiler issue with specific > version. > > Right? The build steps will remain same, so as far as this patch > > concerned, you can add the meson build step in this patch. > > > [Gavin Hu] Hi Jerin, > Sorry for late response, but I am still keeping working on that(some more > patches are in process of internal review). > The host compiler issue with specific version was fixed by one of my patch. > > With all my patches applied: > For GNU Makefile, > 1. Host clang + cross gcc works > 2. Host gcc + cross gcc works > > For Meson/Ninja: > 3. Host gcc + cross gcc works > 4. Host clang + cross gcc does NOT work > > The root cause of number 4 is clear, it is a meson build project issue. > Both gcc and clang take all the 4 pr
Re: [dpdk-dev] [PATCH v5 2/2] doc: add a guide doc for cross compiling from x86
12/06/2018 14:06, Gavin Hu: > Hi Jerin, Bruce and Thomas, > > To fix the meson cross build issue(host clang + cross gcc), we have to > decouple clang options from gcc ones. > Currently the options for gcc and clang tightly coupled as they share a > single meson project and both added to the project arguments, this is the > root cause. > > I have a patch to remove the specific options from the project arguments and > add it individually to C_FLAGS. It basically work, but it changed a lot of > meson.build files. Why is it changing many files? Can we have a fix in a common place? Can you show your patch please?
Re: [dpdk-dev] [PATCH v1] net/tap: explain how to compile eBPF C file
11/06/2018 18:35, Wiles, Keith: > > > On Jun 11, 2018, at 11:06 AM, Ophir Munk wrote: > > > > This commit explains how to manually compile the C source file > > tap_bpf_program.c into an ELF file using the clang compiler. > > The code in tap_bpf_program.c requires definitions found in iproute2 > > source code. This commit suggests cloning the iproute2 git tree and > > include its path in the clang command. It also adds inclusion of file > > bpf_api.h (required for eBPF definitions) which is located in iproute2 > > source tree. For more details refer to TAP documentation. > > This commit is related to commits [1] and [2]. > > Normally I would have suggested that eBPF be disable in the TAP driver as it requires external code and programs, but that ship has sailed. The external programs are required only to generate new instructions, changing the behaviour of the BPF program. Currently, the instructions for RSS behaviour are provided. > I would like to see building the tap_bpf_program.o as a target in the Makefile, this way the developer can just run the ‘make bpf_program’ target and it would be simpler and less error prone. For this to happen, we need to improve the tools. It is a work in progress. This is a very first step to use Linux BPF with DPDK. If there is more interest, we should really streamline its usage for all parts of DPDK which run on top of some kernel code.
Re: [dpdk-dev] [PATCH v1] net/tap: explain how to compile eBPF C file
> On Jun 12, 2018, at 7:26 AM, Thomas Monjalon wrote: > > 11/06/2018 18:35, Wiles, Keith: >> >>> On Jun 11, 2018, at 11:06 AM, Ophir Munk wrote: >>> >>> This commit explains how to manually compile the C source file >>> tap_bpf_program.c into an ELF file using the clang compiler. >>> The code in tap_bpf_program.c requires definitions found in iproute2 >>> source code. This commit suggests cloning the iproute2 git tree and >>> include its path in the clang command. It also adds inclusion of file >>> bpf_api.h (required for eBPF definitions) which is located in iproute2 >>> source tree. For more details refer to TAP documentation. >>> This commit is related to commits [1] and [2]. >> >> Normally I would have suggested that eBPF be disable in the TAP driver as it >> requires external code and programs, but that ship has sailed. > > The external programs are required only to generate new instructions, > changing the behaviour of the BPF program. > Currently, the instructions for RSS behaviour are provided. > >> I would like to see building the tap_bpf_program.o as a target in the >> Makefile, this way the developer can just run the ‘make bpf_program’ target >> and it would be simpler and less error prone. > > For this to happen, we need to improve the tools. In what way do we need to improve the tools, and which tools are we talking about? Building the .o file below appears to be a simple set of command lines. I asked about that tool in my original email. > It is a work in progress. > This is a very first step to use Linux BPF with DPDK. > If there are more interests, we should really streamline its usage > for all parts of DPDK which runs on top of some kernel code. Streamlining other parts of DPDK would be nice, but we are now talking about the tap/eBPF patch. > > > Regards, Keith
Re: [dpdk-dev] [RFC] net/tap: add queues when attaching from secondary process
> On Jun 7, 2018, at 6:24 PM, Raslan Darawsheh wrote: > > Hi, > > As you know, the TAP PMD currently supports attaching a secondary process to a primary process. > But it still lacks the ability to do Rx/Tx bursts, since it lacks the necessary fds for the Rx/Tx queues > and the setting of the Rx/Tx burst functions. > > The main purpose of this patch is to exchange the fds between the processes through IPC messages > and to set the Rx/Tx functions for the secondary. > > I hope I explained it properly; please let me know if it is still unclear. I see the code sending the FDs of the primary and secondary to each other, and the code looks fine. The problem I see is the one I asked about before in the comments below: an FD from one process cannot be used in another process unless the kernel converts the FD for the receiving process. Is this the case here or not? https://stackoverflow.com/questions/8037746/is-there-an-easier-way-to-share-file-descriptors-between-unrelated-processes-on Regards, Keith
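On the question of whether a descriptor number is meaningful in another process: a plain integer copied into a message body is not. A descriptor passed as `SCM_RIGHTS` ancillary data over a UNIX domain socket, however, is translated by the kernel, which installs a new descriptor referring to the same open file in the receiver. DPDK's multi-process IPC carries descriptors in the `fds` array of `struct rte_mp_msg`. As far as I know, that array is sent this way, but this should be confirmed in `eal_common_proc.c`. The sketch below shows the underlying mechanism with hypothetical `send_fd()`/`recv_fd()` helpers. For brevity it is exercised within one process over a `socketpair()`; in the TAP case the two ends would belong to the primary and secondary processes.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one file descriptor over a UNIX domain socket as SCM_RIGHTS
 * ancillary data. Returns 0 on success, -1 on error. */
static int send_fd(int sock, int fd)
{
	char dummy = 'F'; /* at least one byte of real data must be sent */
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union {
		struct cmsghdr align; /* ensures correct alignment */
		char buf[CMSG_SPACE(sizeof(int))];
	} u;
	struct msghdr msg = {0};
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor. The kernel allocates a new fd number in the
 * receiving process; returns it, or -1 on error. */
static int recv_fd(int sock)
{
	char dummy;
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union {
		struct cmsghdr align;
		char buf[CMSG_SPACE(sizeof(int))];
	} u;
	struct msghdr msg = {0};
	struct cmsghdr *cmsg;
	int fd;

	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS)
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

The received descriptor generally has a different number from the sender's, so any code that stores fds per queue has to use the value returned on the receiving side.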
Re: [dpdk-dev] [PATCH v1] net/tap: explain how to compile eBPF C file
12/06/2018 14:36, Wiles, Keith: > > > On Jun 12, 2018, at 7:26 AM, Thomas Monjalon wrote: > > > > 11/06/2018 18:35, Wiles, Keith: > >> > >>> On Jun 11, 2018, at 11:06 AM, Ophir Munk wrote: > >>> > >>> This commit explains how to manually compile the C source file > >>> tap_bpf_program.c into an ELF file using the clang compiler. > >>> The code in tap_bpf_program.c requires definitions found in iproute2 > >>> source code. This commit suggests cloning the iproute2 git tree and > >>> include its path in the clang command. It also adds inclusion of file > >>> bpf_api.h (required for eBPF definitions) which is located in iproute2 > >>> source tree. For more details refer to TAP documentation. > >>> This commit is related to commits [1] and [2]. > >> > >> Normally I would have suggested that eBPF be disable in the TAP driver as > >> it requires external code and programs, but that ship has sailed. > > > > The external programs are required only to generate new instructions, > > changing the behaviour of the BPF program. > > Currently, the instructions for RSS behaviour are provided. > > > >> I would like to see building the tap_bpf_program.o as a target in the > >> Makefile, this way the developer can just run the ‘make bpf_program’ > >> target and it would be simpler and less error prone. As explained in the documentation, for now there is a dependency on iproute2 for the compilation of this BPF program. So we cannot make it as simple as a "make command". We can probably rework it to change the dependency. I heard there are some good BPF libraries available now? > > For this to happen, we need to improve the tools. > > In what way do we need to improve the tools and which tools are we talking > about. Building the .o file below appears to be a simple set of command > lines. I have a question in my original email about what tool. The .o file is only an intermediate file.
The next step (numbered 5 in this patch) is to extract the section of BPF instructions to be uploaded into the kernel. This step must be done by a "tool". Ophir did it by hacking tc, but it is not upstreamed yet. There could be other ways (possibly easier) to achieve the same result. > > It is a work in progress. Contributions are welcome. > > This is a very first step to use Linux BPF with DPDK. > > If there is more interest, we should really streamline its usage > > for all parts of DPDK which run on top of some kernel code. > > Streamlining other parts of DPDK would be nice, but we are now talking about > the tap/eBPF patch.
Re: [dpdk-dev] [PATCH 0/7] Make unit tests great again
+Cc Jananee 07/06/2018 23:01, Anatoly Burakov: > Previously, unit tests were running in groups. There were > technical reasons why that was the case (mostly having to do > with limiting memory), but it was hard to maintain and update > the autotest script. > > In 18.05, limiting of memory at DPDK startup was no longer > necessary, as DPDK allocates memory at runtime as needed. This > has the implication that the old test grouping can now be > retired and replaced with a more sensible way of running unit > tests (using a multiprocessing pool of workers and a queue of > tests). This patchset accomplishes exactly that. > > This patchset conflicts with some of the earlier work on > autotests [1] [2] [3], but I think it presents a cleaner > solution for some of the problems highlighted by those patch > series. I can integrate those patches into this series if > need be. > > [1] http://dpdk.org/dev/patchwork/patch/40370/ > [2] http://dpdk.org/dev/patchwork/patch/40371/ > [3] http://dpdk.org/dev/patchwork/patch/40372/ It may be interesting to work on lists of tests as done in the following patch: http://dpdk.org/dev/patchwork/patch/40373/ The idea is to split tests into several categories: - basic and short tests - longer and lower-priority tests - performance tests As a long-term solution, could we think about making the category an attribute of the test itself?
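One way to read the "category as an attribute of the test" idea is a registry in which each test case carries its own category, so a runner can select basic, long, or performance tests without a separately maintained group list. This is a hypothetical C sketch: `struct test_case`, `enum test_category`, `run_category()` and the example test names are invented for illustration and are not the existing app/test registration macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical categories mirroring the split suggested above. */
enum test_category {
	TEST_CAT_FAST, /* basic and short tests */
	TEST_CAT_SLOW, /* longer, lower-priority tests */
	TEST_CAT_PERF, /* performance tests */
};

/* Each test carries its category instead of living in a group table. */
struct test_case {
	const char *name;
	enum test_category category;
	int (*func)(void); /* returns 0 on success */
};

/* Placeholder bodies standing in for real unit tests. */
static int test_pass(void) { return 0; }
static int test_fail(void) { return -1; }

static const struct test_case tests[] = {
	{ "example_fast_autotest", TEST_CAT_FAST, test_pass },
	{ "example_slow_autotest", TEST_CAT_SLOW, test_pass },
	{ "example_perf_autotest", TEST_CAT_PERF, test_fail },
	{ "example_fast2_autotest", TEST_CAT_FAST, test_pass },
};

/*
 * Run every test of the given category. Returns the number of tests
 * executed and stores the number of failures in *failed.
 */
static size_t
run_category(enum test_category cat, size_t *failed)
{
	size_t i, ran = 0;

	*failed = 0;
	for (i = 0; i < sizeof(tests) / sizeof(tests[0]); i++) {
		if (tests[i].category != cat)
			continue;
		ran++;
		if (tests[i].func() != 0) {
			(*failed)++;
			printf("FAIL: %s\n", tests[i].name);
		}
	}
	return ran;
}
```

A runner, or a worker pool like the one in this series, could then iterate over categories in priority order instead of over hard-coded groups.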
Re: [dpdk-dev] [PATCH 2/7] net/mlx5: remove redundant objects in probe code
On Sun, Jun 10, 2018 at 11:00:59AM +, Xueming(Steven) Li wrote: > Ack. I found a trivial issue related to another patch; not sure whether it is good to > fix it here. > > - config.tso = ((device_attr_ex.tso_caps.max_tso > 0) && > > - (device_attr_ex.tso_caps.supported_qpts & > > + config.tso = (attr.tso_caps.max_tso > 0 && > > + (attr.tso_caps.supported_qpts & > > (1 << IBV_QPT_RAW_PACKET))); > > Not related to this patch: wrong indent. No problem, I'll add an extra space for v2. -- Adrien Mazarguil 6WIND
Re: [dpdk-dev] [PATCH 1/7] net/mlx5: rename confusing object in probe code
On Sun, Jun 10, 2018 at 11:00:57AM +, Xueming(Steven) Li wrote: > Ack, except for one minor question below. > > - mlx5_glue->dv_query_device(attr_ctx, &attrs_out); > > - if (attrs_out.flags & MLX5DV_CONTEXT_FLAGS_MPW_ALLOWED) { > > - if (attrs_out.flags & MLX5DV_CONTEXT_FLAGS_ENHANCED_MPW) { > > + mlx5_glue->dv_query_device(ctx, &dv_attr); > > Should ctx be attr_ctx? Indeed, it seems I didn't validate this patch on its own after splitting it from another. Will fix in v2, thanks. -- Adrien Mazarguil 6WIND
Re: [dpdk-dev] [PATCH 3/7] net/mlx5: split PCI from generic probing code
On Sun, Jun 10, 2018 at 12:59:06PM +, Xueming(Steven) Li wrote: > Hi Adrien, > > The logic looks much clearer now with the split. > > - len = snprintf(name, sizeof(name), PCI_PRI_FMT, > > -pci_dev->addr.domain, pci_dev->addr.bus, > > -pci_dev->addr.devid, pci_dev->addr.function); > > - if (attr.orig_attr.phys_port_cnt > 1) > > - snprintf(name + len, sizeof(name), " port %u", i); > > + if (attr->orig_attr.phys_port_cnt > 1) > > + snprintf(name, sizeof(name), "%s", dpdk_dev->name); > > + else > > + snprintf(name, sizeof(name), "%s port %u", > > +dpdk_dev->name, port); > > In the previous logic, the name contains the port only if phys_port_cnt > 1; are you sure? Nice catch, will fix it for v2. This wasn't noticed because this code is replaced in a subsequent patch of the series. > > + for (i = 0; i < attr.orig_attr.phys_port_cnt; ++i) { > > + eth_list[i] = mlx5_dev_spawn_one(dpdk_dev, ibv_dev, vf, > > +&attr, i + 1); > > + if (eth_list[i]) > > + continue; > > + /* Save rte_errno and roll back in case of failure. */ > > + ret = rte_errno; > > + while (i--) { > > + mlx5_dev_close(eth_list[i]); > > + if (rte_eal_process_type() == RTE_PROC_PRIMARY) > > + rte_free(eth_list[i]->data->dev_private); > > + claim_zero(rte_eth_dev_release_port(eth_list[i])); > > + } > > + free(eth_list); > > + rte_errno = ret; > > + return NULL; > > The code is correct, but I personally prefer to move complex error handling > to > a dedicated "error:" block to make the code clearer. Since it's the only place where this failure can occur, I'll leave it as is on the basis that doing so saves a goto statement. Those should be avoided where possible. It would have been a different story if the same error handling code were called from multiple places.
> > +static int > > +mlx5_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, > > + struct rte_pci_device *pci_dev) { > > + struct ibv_device **ibv_list; > > + struct rte_eth_dev **eth_list = NULL; > > + int vf; > > + int ret; > > + > > + assert(pci_drv == &mlx5_driver); > > + switch (pci_dev->id.device_id) { > > + case PCI_DEVICE_ID_MELLANOX_CONNECTX4VF: > > + case PCI_DEVICE_ID_MELLANOX_CONNECTX4LXVF: > > + case PCI_DEVICE_ID_MELLANOX_CONNECTX5VF: > > + case PCI_DEVICE_ID_MELLANOX_CONNECTX5EXVF: > > + vf = 1; > > + break; > > + default: > > + vf = 0; > > + } > > How about using a macro for VF detection and invoking it in mlx5_dev_spawn_one()? > It doesn't seem to be used by outer callers. mlx5_dev_spawn_one() can be invoked with IB devices not backed by PCI (e.g. vdevs), for which the caller may still knowingly ask for VF behavior, either by user request or through other means. In this case, the caller happens to be a PCI probing function which, based on the device ID, easily determines whether VF behavior shall be requested. This is basically the only place where the PCI device ID can be checked. Adding a macro here would only obfuscate this check. Adding it in mlx5_dev_spawn_one() would entangle PCI and generic code again, the opposite of the purpose of this patch; therefore I'll leave it unmodified for v2. -- Adrien Mazarguil 6WIND
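The rollback style Adrien defends in the mlx5_dev_spawn() hunk above (undoing partially created ports in place with `while (i--)` instead of jumping to a shared `error:` label) can be shown in isolation. This is a generic sketch, not mlx5 code: `create_obj()`, `destroy_obj()` and `create_all()` are hypothetical stand-ins for `mlx5_dev_spawn_one()` and the close/release calls, and `errno` plays the role of `rte_errno`.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

static int live_objs;    /* number of objects currently allocated */
static int fail_at = -1; /* index at which creation is forced to fail */

/* Hypothetical constructor standing in for mlx5_dev_spawn_one(). */
static int *
create_obj(int idx)
{
	int *obj;

	if (idx == fail_at) {
		errno = ENOMEM;
		return NULL;
	}
	obj = malloc(sizeof(*obj));
	if (obj != NULL) {
		*obj = idx;
		live_objs++;
	}
	return obj;
}

/* Hypothetical destructor standing in for close + release. */
static void
destroy_obj(int *obj)
{
	free(obj);
	live_objs--;
}

/*
 * Create n objects. On failure, release those already created in
 * reverse order and return NULL with errno preserved. The unwinding
 * happens at the single failure site, so no goto is needed.
 */
static int **
create_all(unsigned int n)
{
	int **list = calloc(n, sizeof(*list));
	unsigned int i;
	int ret;

	if (list == NULL)
		return NULL;
	for (i = 0; i < n; ++i) {
		list[i] = create_obj(i);
		if (list[i] != NULL)
			continue;
		ret = errno;        /* save the error before cleanup */
		while (i--)
			destroy_obj(list[i]);
		free(list);
		errno = ret;        /* restore it for the caller */
		return NULL;
	}
	return list;
}
```

If the same cleanup were needed from several failure points, a shared `error:` label would become the more maintainable choice, which matches the condition Adrien states.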
Re: [dpdk-dev] [PATCH 6/7] net/mlx5: probe all port representors
On Tue, Jun 12, 2018 at 06:42:38AM +, Xueming(Steven) Li wrote: > Hi Adrien, > > > -Original Message- > > From: dev On Behalf Of Adrien Mazarguil > > Sent: Saturday, May 26, 2018 12:35 AM > > To: Shahaf Shuler > > Cc: dev@dpdk.org > > Subject: [dpdk-dev] [PATCH 6/7] net/mlx5: probe all port representors > > > > Probe existing port representors in addition to their master device and > > associate them automatically. > > > > To avoid name collision between Ethernet devices, their names use the same > > convention as ixgbe and > > i40e PMDs, that is, instead of only a PCI address in DBDF notation: > > > > - "net_{DBDF}_0" for master/switch devices. > > - "net_{DBDF}_representor_{rep}" with "rep" starting from 0 for port > > representors. > > > > Both optionally suffixed with "_port_{num}" instead of " port {num}" for > > devices that expose several > > Verbs ports (note this is never the case on mlx5, but kept for historical > > reasons for the time being). > > > > (Patch based on prior work from Yuanhan Liu) > > > > Signed-off-by: Adrien Mazarguil > > + /* > > +* Allocate a switch domain for master devices and share it with > > +* port representors. > > +*/ > > + if (!master) { > > + priv->representor = 0; > > + priv->domain_id = RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID; > > + priv->rep_id = 0; > > + err = rte_eth_switch_domain_alloc(&priv->domain_id); > > So domain_id is used to identify the relation between the PF and representor ports? Right, as described by the API [1]. What's missing in this patch is that this information is not reported through dev_infos_get(). I'll add it for v2.
[1] https://www.dpdk.org/doc/guides/prog_guide/switch_representation.html#port-representors > > + DRV_LOG(INFO, "%u port(s) detected on \"%s\"", > > + attr.orig_attr.phys_port_cnt, ibv_dev[j]->name); > > + tmp = realloc(eth_list, sizeof(*eth_list) * > > + (n + attr.orig_attr.phys_port_cnt + 1)); > > + if (!tmp) { > > rte_errno = errno; > > - return NULL; > > + goto error; > > } > > + eth_list = tmp; > > for (i = 0; i < attr.orig_attr.phys_port_cnt; ++i) { > > Is there any mlx5 device that supports more than one physical port on the same PCI ID? > I remember this is a major difference between mlx5 and mlx4. Unlike mlx4, no known mlx5 adapter exposes more than a single port per PCI address. This is kept for historical reasons (i.e. because it's always been there) and because, since Verbs exposes it, phys_port_cnt should at least be checked, if only to return an error when somehow multiple ports are detected. This series just makes it easier to drop this intermediate loop if deemed necessary later, simply by calling mlx5_dev_spawn_one() directly. In the meantime, separation of low-level PCI from Verbs device instantiation means there is an extra step to iterate on all possible IB ports. Same behavior as usual; modifying it is not the purpose of this series. -- Adrien Mazarguil 6WIND
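The naming convention from the PATCH 6/7 commit message can be made concrete with a small formatter. `make_port_name()` is a hypothetical helper, not the PMD's code. It takes the DBDF string, uses `rep < 0` to mean the master/switch device and `port == 0` to mean "no `_port_{num}` suffix" (both conventions are assumptions of this sketch). The suffix is written with the remaining size `size - len`, not the full buffer size.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build an ethdev name following the convention quoted above:
 *   master/switch device: "net_{DBDF}_0"
 *   port representor:     "net_{DBDF}_representor_{rep}"
 * optionally followed by "_port_{num}" when port != 0.
 * Returns the total length snprintf() would have produced.
 */
static int
make_port_name(char *buf, size_t size, const char *dbdf, int rep,
	       unsigned int port)
{
	int len;

	if (rep < 0)
		len = snprintf(buf, size, "net_%s_0", dbdf);
	else
		len = snprintf(buf, size, "net_%s_representor_%d", dbdf, rep);
	if (len < 0 || (size_t)len >= size || port == 0)
		return len;
	/* Append the suffix using only the space left in the buffer. */
	return len + snprintf(buf + len, size - len, "_port_%u", port);
}
```

For example, representor 2 on a device at `0000:03:00.0` would be named `net_0000:03:00.0_representor_2`.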
Re: [dpdk-dev] [dpdk-dev, 5/7] net/mlx5: add port representor awareness
On Mon, Jun 11, 2018 at 01:05:55PM +, Xueming(Steven) Li wrote: > Hi Adrien, > > I couldn't find your original email in my inbox, so I have to start a new > thread here. > > +static int > > +mlx5_cmp_ibv_name(const void *a, const void *b) > > +{ > > + const char *name_a = (*(const struct ibv_device *const *)a)->name; > > + const char *name_b = (*(const struct ibv_device *const *)b)->name; > > + size_t i = 0; > > + > > + while (name_a[i] && name_a[i] == name_b[i]) > > + ++i; > > + return atoi(name_a + i) - atoi(name_b + i); > > Comparing "1" and "10" here will return 0; does this matter? Sure it does! The whole point of this function is precisely to avoid this kind of issue. I'll fix it for v2, thanks. > > + if (n > 1) { > > + /* > > +* The existence of several matching entries means port > > +* representors have been instantiated. No existing Verbs > > +* call nor /sys entries can tell them apart at this point. > > +* > > +* While definitely hackish, assume their names are numbered > > +* based on order of creation with master device first, > > +* followed by first port representor, followed by the > > +* second one and so on. > > +*/ > > + DRV_LOG(WARNING, > > + "probing device with port representors involves" > > + " heuristics with uncertain outcome"); > > + qsort(ibv_match, n, sizeof(*ibv_match), mlx5_cmp_ibv_name); > > + DRV_LOG(WARNING, "assuming \"%s\" is the master device", > > + ibv_match[0]->name); > > + for (ret = 1; ret < n; ++ret) > > + DRV_LOG(WARNING, > > + "assuming \"%s\" is port representor #%d", > > + ibv_match[ret]->name, ret - 1); > > This dump will appear when attaching each representor port; how about only > doing it for the PF at DEBUG level? It occurs only once when probing the master device and detecting the presence of representors, not for each of them.
I prefer to leave it as a warning because this detection approach, while an undeniable improvement over not checking anything and ending up configuring the wrong netdevice, is unfortunately not 100% accurate. This will be improved; however, users must be warned of possible issues in the meantime. -- Adrien Mazarguil 6WIND
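A possible fix for the "1" vs "10" problem in mlx5_cmp_ibv_name() is to back up to the start of the digit run both names share before converting. Then "mlx5_1" vs "mlx5_10" compares 1 with 10, instead of an empty string with "0" (both of which `atoi()` turns into 0). This is only a sketch on plain strings; `cmp_dev_name()` and its `qsort()` adapter are hypothetical helpers, not the comparator that went into v2.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Order names such as "mlx5_1", "mlx5_2", "mlx5_10" numerically on the
 * first differing digit run, falling back to strcmp() otherwise.
 */
static int
cmp_dev_name(const char *name_a, const char *name_b)
{
	size_t i = 0;

	/* Skip the common prefix. */
	while (name_a[i] && name_a[i] == name_b[i])
		++i;
	/* Back up to the start of a digit run shared by both names. */
	while (i > 0 && isdigit((unsigned char)name_a[i - 1]))
		--i;
	if (isdigit((unsigned char)name_a[i]) &&
	    isdigit((unsigned char)name_b[i])) {
		unsigned long a = strtoul(name_a + i, NULL, 10);
		unsigned long b = strtoul(name_b + i, NULL, 10);

		/* Compare without subtracting, to avoid overflow. */
		if (a != b)
			return (a > b) - (a < b);
	}
	return strcmp(name_a + i, name_b + i);
}

/* qsort() adapter for an array of string pointers. */
static int
cmp_dev_name_qsort(const void *a, const void *b)
{
	return cmp_dev_name(*(const char *const *)a,
			    *(const char *const *)b);
}
```

Sorting `{ "mlx5_10", "mlx5_1", "mlx5_2", "mlx5_0" }` with this adapter yields `mlx5_0`, `mlx5_1`, `mlx5_2`, `mlx5_10`.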
Re: [dpdk-dev] [PATCH 7/7] net/mlx5: add parameter for port representors
On Tue, Jun 12, 2018 at 08:02:17AM +, Xueming(Steven) Li wrote: > Hi Adrien, > > > -Original Message- > > From: dev On Behalf Of Adrien Mazarguil > > Sent: Saturday, May 26, 2018 12:35 AM > > To: Shahaf Shuler > > Cc: dev@dpdk.org > > Subject: [dpdk-dev] [PATCH 7/7] net/mlx5: add parameter for port > > representors > > > > Prior to this patch, all port representors detected on a given device were > > probed and Ethernet devices > > instantiated for each of them. > > > > This patch adds support for the standard "representor" parameter, which > > implies that port representors > > are not probed by default anymore, except for the list provided through > > device arguments. > > > > (Patch based on prior work from Yuanhan Liu) > > > > Signed-off-by: Adrien Mazarguil > > @@ -1142,13 +1149,30 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev, > > struct rte_eth_dev **eth_list = NULL; > > struct ibv_context *ctx; > > struct ibv_device_attr_ex attr; > > + struct rte_eth_devargs eth_da; > > Not related to this patch: according to this data structure, the maximum representor > count is 32, > but customers might use VFs in container environments, where 32 is far from enough. > We need > additional work here. A workaround is for users to call this API multiple times > with different > representor IDs. 32 ought to be enough for anybody! Not sure I understand your concern, actually. One can't instantiate more representors than there are DPDK ports because the limit for both is RTE_MAX_ETHPORTS (i.e. 1 representor = 1 DPDK port). Users who want to spawn more than 32 DPDK ports overall must increase RTE_MAX_ETHPORTS regardless.
> > void *tmp; > > unsigned int i; > > unsigned int j = 0; > > unsigned int n = 0; > > int ret; > > > > + if (dpdk_dev->devargs) { > > + ret = rte_eth_devargs_parse(dpdk_dev->devargs->args, &eth_da); > > + if (ret) > > + goto error; > > + } else { > > + memset(&eth_da, 0, sizeof(eth_da)); > > + } > > next: > > + if (j) { > > + unsigned int k; > > + > > + for (k = 0; k < eth_da.nb_representor_ports; ++k) > > + if (eth_da.representor_ports[k] == j - 1) > > + break; > > + if (k == eth_da.nb_representor_ports) > > + goto skip; > > + } > > errno = 0; > > ctx = mlx5_glue->open_device(ibv_dev[j]); > > Need a range check for j here. I think it's properly checked. j == 0 stands for "master device", always found at index 0 and probed. Representor devices, if any, start at index 1, which triggers the previous block. This block makes sure that a given representor is indeed enabled before either spawning the related device (pass through with a valid "j") or skipping it altogether (goto skip). I intend to leave this patch as is for v2. -- Adrien Mazarguil 6WIND
Re: [dpdk-dev] [RFC] net/tap: add queues when attaching from secondary process
Hi Keith, Yes, you are right about that. This is exactly the case, and I'll need to provide a v2 for this with the kernel mapping. Kindest regards, Raslan Darawsheh -Original Message- From: Wiles, Keith [mailto:keith.wi...@intel.com] Sent: Tuesday, June 12, 2018 3:46 PM To: Raslan Darawsheh Cc: Thomas Monjalon ; dev@dpdk.org Subject: Re: [dpdk-dev] [RFC] net/tap: add queues when attaching from secondary process > On Jun 7, 2018, at 6:24 PM, Raslan Darawsheh wrote: > > Hi, > > As you know, the TAP PMD currently supports attaching a secondary process to a > primary process. > But it still lacks the ability to do Rx/Tx bursts, since it > lacks the necessary fds for the Rx/Tx queues and the setting of the Rx/Tx burst functions. > > The main purpose of this patch is to exchange the fds between the > processes through IPC messages and to set the Rx/Tx functions for the secondary. > > I hope I explained it properly; please let me know if anything is still unclear. I see the code sending the FDs of the primary and secondary to each other, and the code looks fine. The problem I see is what I asked before in the comments below: an FD from one process cannot be used in another process unless the kernel converts it for the receiving process. Is this the case here or not? https://stackoverflow.com/questions/8037746/is-there-an-easier-way-to-share-file-descriptors-between-unrelated-processes-on Regards, Keith
Re: [dpdk-dev] [PATCH v1] net/tap: explain how to compile eBPF C file
> On Jun 12, 2018, at 7:58 AM, Thomas Monjalon wrote: > > 12/06/2018 14:36, Wiles, Keith: >> >>> On Jun 12, 2018, at 7:26 AM, Thomas Monjalon wrote: >>> >>> 11/06/2018 18:35, Wiles, Keith: > On Jun 11, 2018, at 11:06 AM, Ophir Munk wrote: > > This commit explains how to manually compile the C source file > tap_bpf_program.c into an ELF file using the clang compiler. > The code in tap_bpf_program.c requires definitions found in iproute2 > source code. This commit suggests cloning the iproute2 git tree and > including its path in the clang command. It also adds inclusion of file > bpf_api.h (required for eBPF definitions) which is located in iproute2 > source tree. For more details refer to TAP documentation. > This commit is related to commits [1] and [2]. Normally I would have suggested that eBPF be disabled in the TAP driver as it requires external code and programs, but that ship has sailed. >>> >>> The external programs are required only to generate new instructions, >>> changing the behaviour of the BPF program. >>> Currently, the instructions for RSS behaviour are provided. >>> I would like to see building the tap_bpf_program.o as a target in the Makefile; this way the developer can just run the ‘make bpf_program’ target and it would be simpler and less error-prone. > > As explained in the documentation, for now there is a dependency on iproute2 > for the compilation of this BPF program. > So we cannot make it as simple as a "make command". > Perhaps we can rework it to change the dependency. > I have heard there are some good BPF libraries available now? Well, the dependency on iproute2 is really no different than requiring, say, libnuma; they just have to pull the code first and then type ‘make bpf_program’, right? If that is the case, then a make target makes sense to me. If iproute2 is not found, then it is an error, right? > >>> For this to happen, we need to improve the tools. >> >> In what way do we need to improve the tools and which tools are we talking >> about?
Building the .o file below appears to be a simple set of command >> lines. I asked in my original email which tool. > > The .o file is only an intermediate file. > The next step (numbered 5 in this patch) is to extract the section > of BPF instructions to be uploaded into the kernel. > This step must be done by a "tool". Ophir did it by hacking tc, > but it is not upstreamed yet. > There could be other ways (possibly easier) to achieve the same result. Please change the doc to reflect that the tool is not upstreamed yet and that the developer needs to figure out how to extract the data from the binary. I used objdump -j l3_l4 -s tap_bpf_program.o and got a hex dump of the l3_l4 section bf16 61681000 ... Someone schooled in the art of Python coding should be able to convert that output to a ‘C’ data array. :-) > >>> It is a work in progress. > > Contributions are welcome. > >>> This is a very first step to use Linux BPF with DPDK. >>> If there is more interest, we should really streamline its usage >>> for all parts of DPDK which run on top of some kernel code. >> >> Streamlining other parts of DPDK would be nice, but we are now talking about >> the tap/eBPF patch. Regards, Keith
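As a starting point for the conversion Keith describes, here is a small filter that reads `objdump -s` output and prints the section contents as a C byte array. It is written in C rather than Python to match the rest of the tree. It assumes the usual `objdump -s` data-line layout: a leading space, an address column, up to four groups of hex digits, two spaces, then an ASCII column. The helper names and the array name are hypothetical. The TAP driver's own instruction tables may use a different representation (for instance, 8-byte eBPF instruction entries), so the output would likely still need regrouping.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Value of one hex digit (the caller has already checked isxdigit()). */
static unsigned int
hexval(char c)
{
	if (c >= '0' && c <= '9')
		return (unsigned int)(c - '0');
	return (unsigned int)(tolower((unsigned char)c) - 'a' + 10);
}

/*
 * Parse one data line of `objdump -s` output, for example
 *   " 0000 bf160000 00000000 61681000 00000000  ........ah......"
 * and store up to max bytes in out. Returns the number of bytes read;
 * header lines (not starting with a space) yield 0.
 */
static size_t
parse_objdump_line(const char *p, unsigned char *out, size_t max)
{
	size_t n = 0;

	if (*p != ' ')
		return 0;
	while (*p == ' ')
		p++;
	while (isxdigit((unsigned char)*p)) /* skip the address column */
		p++;
	while (n < max) {
		if (p[0] == ' ' && p[1] == ' ') /* hex area ends, ASCII follows */
			break;
		if (p[0] == ' ') {              /* separator between groups */
			p++;
			continue;
		}
		if (!isxdigit((unsigned char)p[0]) ||
		    !isxdigit((unsigned char)p[1]))
			break;
		out[n++] = (unsigned char)(hexval(p[0]) << 4 | hexval(p[1]));
		p += 2;
	}
	return n;
}

/* Read objdump output from in and print it to out as a C array. */
static void
emit_c_array(FILE *in, FILE *out, const char *name)
{
	char line[256];
	unsigned char bytes[16];
	size_t i, n;

	fprintf(out, "static const unsigned char %s[] = {\n", name);
	while (fgets(line, sizeof(line), in) != NULL) {
		n = parse_objdump_line(line, bytes, sizeof(bytes));
		if (n == 0)
			continue;
		fputc('\t', out);
		for (i = 0; i < n; i++)
			fprintf(out, "0x%02x,%s", (unsigned int)bytes[i],
				i + 1 < n ? " " : "\n");
	}
	fprintf(out, "};\n");
}
```

Wrapped in a `main()` that calls `emit_c_array(stdin, stdout, "l3_l4_bytes")`, it could be used as `objdump -j l3_l4 -s tap_bpf_program.o | ./objdump2c`. The program name and array name are illustrative only.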
Re: [dpdk-dev] [PATCH 7/7] net/mlx5: add parameter for port representors
> -Original Message- > From: Adrien Mazarguil > Sent: Tuesday, June 12, 2018 9:21 PM > To: Xueming(Steven) Li > Cc: Shahaf Shuler ; dev@dpdk.org > Subject: Re: [dpdk-dev] [PATCH 7/7] net/mlx5: add parameter for port > representors > > On Tue, Jun 12, 2018 at 08:02:17AM +, Xueming(Steven) Li wrote: > > Hi Adrien, > > > > > -Original Message- > > > From: dev On Behalf Of Adrien Mazarguil > > > Sent: Saturday, May 26, 2018 12:35 AM > > > To: Shahaf Shuler > > > Cc: dev@dpdk.org > > > Subject: [dpdk-dev] [PATCH 7/7] net/mlx5: add parameter for port > > > representors > > > > > > Prior to this patch, all port representors detected on a given > > > device were probed and Ethernet devices instantiated for each of them. > > > > > > This patch adds support for the standard "representor" parameter, > > > which implies that port representors are not probed by default anymore, > > > except for the list > provided through device arguments. > > > > > > (Patch based on prior work from Yuanhan Liu) > > > > > > Signed-off-by: Adrien Mazarguil > > > > @@ -1142,13 +1149,30 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev, > > > struct rte_eth_dev **eth_list = NULL; > > > struct ibv_context *ctx; > > > struct ibv_device_attr_ex attr; > > > + struct rte_eth_devargs eth_da; > > > > Not related to this patch: according to this data structure, the maximum > > representor count is 32, but customers might use VFs in container > > environments, where 32 is far from enough. We need additional work here. > > A workaround is for users to call this API multiple times with different > > representor IDs. > > 32 ought to be enough for anybody! > > Not sure I understand your concern, actually. One can't instantiate more > representors than there are > DPDK ports because the limit for both is RTE_MAX_ETHPORTS (i.e. 1 representor > = 1 DPDK port). Users > who want to spawn more than 32 DPDK ports overall must increase > RTE_MAX_ETHPORTS regardless.
ConnectX-5 supports 127 VFs, but as you said, increasing RTE_MAX_ETHPORTS should work. > > > > void *tmp; > > > unsigned int i; > > > unsigned int j = 0; > > > unsigned int n = 0; > > > int ret; > > > > > > + if (dpdk_dev->devargs) { > > > + ret = rte_eth_devargs_parse(dpdk_dev->devargs->args, &eth_da); > > > + if (ret) > > > + goto error; > > > + } else { > > > + memset(&eth_da, 0, sizeof(eth_da)); > > > + } > > > next: > > > + if (j) { > > > + unsigned int k; > > > + > > > + for (k = 0; k < eth_da.nb_representor_ports; ++k) > > > + if (eth_da.representor_ports[k] == j - 1) > > > + break; > > > + if (k == eth_da.nb_representor_ports) > > > + goto skip; > > > + } > > > errno = 0; > > > ctx = mlx5_glue->open_device(ibv_dev[j]); > > > > Need a range check for j here. > > I think it's properly checked. j == 0 stands for "master device", always > found at index 0 and probed. > Representor devices, if any, start at index 1, which triggers the previous > block. This block makes > sure that a given representor is indeed enabled before either spawning the > related device (pass > through with a valid "j") or skipping it altogether (goto skip). Yes, this code looks good. What I wanted to ask is what happens if devargs specify an invalid representor ID, e.g. 33. This code walks through it silently without a warning; it works, but it would be better to have a warning if the input ID is out of range. > > I intend to leave this patch as is for v2. > > -- > Adrien Mazarguil > 6WIND
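The representor filter discussed above, plus the range warning Xueming asks for, could look roughly like the following standalone sketch. `struct rep_list` is a simplified, hypothetical stand-in for the representor fields of `struct rte_eth_devargs`, and `rep_enabled()` / `warn_invalid_reps()` are illustrative names. As described in the thread, index `j == 0` is the master device and representor `r` is found at index `r + 1`.

```c
#include <assert.h>
#include <stdio.h>

/* Simplified stand-in for the representor part of rte_eth_devargs. */
struct rep_list {
	unsigned int nb;       /* number of requested representor IDs */
	unsigned int ids[32];  /* requested representor IDs */
};

/*
 * Index 0 is the master device and is always probed; index j > 0 maps
 * to representor j - 1, which is probed only if listed in devargs.
 */
static int
rep_enabled(const struct rep_list *rl, unsigned int j)
{
	unsigned int k;

	if (j == 0)
		return 1;
	for (k = 0; k < rl->nb; ++k)
		if (rl->ids[k] == j - 1)
			return 1;
	return 0;
}

/*
 * Warn about requested IDs with no matching device, given n_dev
 * matching devices (master included). Returns the number of invalid IDs.
 */
static unsigned int
warn_invalid_reps(const struct rep_list *rl, unsigned int n_dev)
{
	unsigned int k, bad = 0;
	unsigned int n_reps = n_dev ? n_dev - 1 : 0;

	for (k = 0; k < rl->nb; ++k) {
		if (rl->ids[k] < n_reps)
			continue;
		fprintf(stderr,
			"warning: representor %u requested but only %u found\n",
			rl->ids[k], n_reps);
		bad++;
	}
	return bad;
}
```

Calling the warning helper once, before the probing loop, would report an ID such as 33 without changing which devices get probed.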
Re: [dpdk-dev] [PATCH v1] net/tap: explain how to compile eBPF C file
12/06/2018 15:33, Wiles, Keith: > > > On Jun 12, 2018, at 7:58 AM, Thomas Monjalon wrote: > > > > 12/06/2018 14:36, Wiles, Keith: > >> > >>> On Jun 12, 2018, at 7:26 AM, Thomas Monjalon wrote: > >>> > >>> 11/06/2018 18:35, Wiles, Keith: > > > On Jun 11, 2018, at 11:06 AM, Ophir Munk wrote: > > > > This commit explains how to manually compile the C source file > > tap_bpf_program.c into an ELF file using the clang compiler. > > The code in tap_bpf_program.c requires definitions found in iproute2 > > source code. This commit suggests cloning the iproute2 git tree and > > including its path in the clang command. It also adds inclusion of file > > bpf_api.h (required for eBPF definitions) which is located in iproute2 > > source tree. For more details refer to TAP documentation. > > This commit is related to commits [1] and [2]. > > Normally I would have suggested that eBPF be disabled in the TAP driver > as it requires external code and programs, but that ship has sailed. > >>> > >>> The external programs are required only to generate new instructions, > >>> changing the behaviour of the BPF program. > >>> Currently, the instructions for RSS behaviour are provided. > >>> > I would like to see building the tap_bpf_program.o as a target in the > Makefile; this way the developer can just run the ‘make bpf_program’ > target and it would be simpler and less error-prone. > > > > As explained in the documentation, for now there is a dependency on iproute2 > > for the compilation of this BPF program. > > So we cannot make it as simple as a "make command". > > Perhaps we can rework it to change the dependency. > > I have heard there are some good BPF libraries available now? > > Well, the dependency on iproute2 is really no different than requiring, say, > libnuma; they just have to pull the code first and then type ‘make bpf_program’, > right? The iproute2 dependency is different because it is not a library. The .h file is never packaged.
So we need to download the sources and set -I to this directory. > If that is the case, then a make target makes sense to me. If iproute2 is not > found, then it is an error, right? > >>> For this to happen, we need to improve the tools. > >> > >> In what way do we need to improve the tools and which tools are we talking > >> about? Building the .o file below appears to be a simple set of command > >> lines. I asked in my original email which tool. > > > > The .o file is only an intermediate file. > > The next step (numbered 5 in this patch) is to extract the section > > of BPF instructions to be uploaded into the kernel. > > This step must be done by a "tool". Ophir did it by hacking tc, > > but it is not upstreamed yet. > > There could be other ways (possibly easier) to achieve the same result. > > Please change the doc to reflect that the tool is not upstreamed yet and that the > developer needs to figure out how to extract the data from the binary. > > I used objdump -j l3_l4 -s tap_bpf_program.o and got a hex dump of the l3_l4 > section > > bf16 61681000 > ... > > Someone schooled in the art of Python coding should be able to convert that > output to a ‘C’ data array. :-) > > > > >>> It is a work in progress. > > > > Contributions are welcome. > > > >>> This is a very first step to use Linux BPF with DPDK. > >>> If there is more interest, we should really streamline its usage > >>> for all parts of DPDK which run on top of some kernel code. > >> > >> Streamlining other parts of DPDK would be nice, but we are now talking > >> about the tap/eBPF patch. > > Regards, > Keith > >
Re: [dpdk-dev] [PATCH v1] net/tap: explain how to compile eBPF C file
> On Jun 12, 2018, at 8:44 AM, Thomas Monjalon wrote: > > 12/06/2018 15:33, Wiles, Keith: >> >>> On Jun 12, 2018, at 7:58 AM, Thomas Monjalon wrote: >>> >>> 12/06/2018 14:36, Wiles, Keith: > On Jun 12, 2018, at 7:26 AM, Thomas Monjalon wrote: > > 11/06/2018 18:35, Wiles, Keith: >> >>> On Jun 11, 2018, at 11:06 AM, Ophir Munk wrote: >>> >>> This commit explains how to manually compile the C source file >>> tap_bpf_program.c into an ELF file using the clang compiler. >>> The code in tap_bpf_program.c requires definitions found in iproute2 >>> source code. This commit suggests cloning the iproute2 git tree and >>> including its path in the clang command. It also adds inclusion of file >>> bpf_api.h (required for eBPF definitions) which is located in iproute2 >>> source tree. For more details refer to TAP documentation. >>> This commit is related to commits [1] and [2]. >> >> Normally I would have suggested that eBPF be disabled in the TAP driver >> as it requires external code and programs, but that ship has sailed. > > The external programs are required only to generate new instructions, > changing the behaviour of the BPF program. > Currently, the instructions for RSS behaviour are provided. > >> I would like to see building the tap_bpf_program.o as a target in the >> Makefile; this way the developer can just run the ‘make bpf_program’ >> target and it would be simpler and less error-prone. >>> >>> As explained in the documentation, for now there is a dependency on iproute2 >>> for the compilation of this BPF program. >>> So we cannot make it as simple as a "make command". >>> Perhaps we can rework it to change the dependency. >>> I have heard there are some good BPF libraries available now? >> >> Well, the dependency on iproute2 is really no different than requiring, say, >> libnuma; they just have to pull the code first and then type ‘make >> bpf_program’, right? > > The iproute2 dependency is different because it is not a library. > The .h file is never packaged.
> So we need to download the sources and set -I to this directory. To eliminate the -I problem, the clone could be done inside the tap directory and -I ./iproute2/include used, right? The make target could even clone the code into the tap directory, which means we can solve the problems you are pointing out. Go ahead and do what you want here, but making things harder for the developer should not be our normal mode of operation. > > >> If that is the case, then a make target makes sense to me. If iproute2 is not >> found, then it is an error, right? > > > For this to happen, we need to improve the tools. In what way do we need to improve the tools and which tools are we talking about? Building the .o file below appears to be a simple set of command lines. I asked in my original email which tool. >>> >>> The .o file is only an intermediate file. >>> The next step (numbered 5 in this patch) is to extract the section >>> of BPF instructions to be uploaded into the kernel. >>> This step must be done by a "tool". Ophir did it by hacking tc, >>> but it is not upstreamed yet. >>> There could be other ways (possibly easier) to achieve the same result. >> >> Please change the doc to reflect that the tool is not upstreamed yet and that the >> developer needs to figure out how to extract the data from the binary. >> >> I used objdump -j l3_l4 -s tap_bpf_program.o and got a hex dump of the l3_l4 >> section >> >> bf16 61681000 >> ... >> >> Someone schooled in the art of Python coding should be able to convert that >> output to a ‘C’ data array. :-) >> >>> > It is a work in progress. >>> >>> Contributions are welcome. >>> > This is a very first step to use Linux BPF with DPDK. > If there is more interest, we should really streamline its usage > for all parts of DPDK which run on top of some kernel code. Streamlining other parts of DPDK would be nice, but we are now talking about the tap/eBPF patch. >> >> Regards, >> Keith Regards, Keith
Re: [dpdk-dev] [PATCH 3/6] cryptodev: remove max number of sessions
> -Original Message- > From: Tomasz Duszynski [mailto:t...@semihalf.com] > Sent: Tuesday, June 12, 2018 12:38 PM > To: De Lara Guarch, Pablo > Cc: Doherty, Declan ; akhil.go...@nxp.com; > ravi1.ku...@amd.com; jerin.ja...@caviumnetworks.com; Zhang, Roy Fan > ; Trahe, Fiona ; > t...@semihalf.com; jianjay.z...@huawei.com; dev@dpdk.org > Subject: Re: [PATCH 3/6] cryptodev: remove max number of sessions > > Hello Pablo, > > On Fri, Jun 08, 2018 at 11:02:31PM +0100, Pablo de Lara wrote: > > Sessions are not created and stored in the crypto device > > anymore, since now the session mempool is created > > at the application level. > > > > Therefore the limitation of the maximum number of sessions > > that can be created should not be dependent on the crypto device. > > > > Signed-off-by: Pablo de Lara ... > > diff --git a/drivers/crypto/mvsam/rte_mrvl_pmd.c > b/drivers/crypto/mvsam/rte_mrvl_pmd.c > > index 1b6029a56..822b6cac7 100644 > > --- a/drivers/crypto/mvsam/rte_mrvl_pmd.c > > +++ b/drivers/crypto/mvsam/rte_mrvl_pmd.c > > @@ -719,7 +719,6 @@ cryptodev_mrvl_crypto_create(const char *name, > > internals = dev->data->dev_private; > > > > internals->max_nb_qpairs = init_params->max_nb_queue_pairs; > > - internals->max_nb_sessions = init_params->max_nb_sessions; > > > > /* > > * ret == -EEXIST is correct, it means DMA > > @@ -734,8 +733,6 @@ cryptodev_mrvl_crypto_create(const char *name, > > "DMA memory has been already initialized by a > different driver."); > > } > > > > - sam_params.max_num_sessions = internals->max_nb_sessions; > > This will not fly, since the library maintains a separate list of sessions. > We have to initialize this number to something sane. Since we cannot > get it from userspace, perhaps make it compile-time configurable > by adding a separate CONFIG_ option?
Hi Tomasz, If you need to have an actual limit, you could define it internally (without adding an external configuration option), but bear in mind that this won't prevent an application from trying to allocate more sessions. If your PMD has a limitation on the maximum number of sessions, then maybe this change (removing the maximum number of sessions) won't work for you, so let me know and we can discuss this. Thanks, Pablo P.S. Please, next time, strip out the code that you are not commenting on, as it was hard to find this question :)
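Pablo's suggestion of a driver-internal limit might look roughly like the sketch below. It is a hypothetical illustration, not mrvl PMD code. `PMD_INTERNAL_MAX_SESSIONS`, `struct pmd_internals` and both helpers are invented names, and the value 4 is arbitrary. In the driver, such a constant could presumably also feed `sam_params.max_num_sessions`.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical PMD-private cap: a plain #define inside the driver,
 * not a CONFIG_ build option. */
#define PMD_INTERNAL_MAX_SESSIONS 4

struct pmd_internals {
	unsigned int nb_sessions; /* sessions currently configured */
};

/* Called when a session is configured on this device. The application may
 * still ask for more sessions; the PMD simply refuses past the cap. */
static int
pmd_session_reserve(struct pmd_internals *internals)
{
	if (internals->nb_sessions >= PMD_INTERNAL_MAX_SESSIONS)
		return -ENOMEM;
	internals->nb_sessions++;
	return 0;
}

/* Called when a session is cleared. */
static void
pmd_session_release(struct pmd_internals *internals)
{
	assert(internals->nb_sessions > 0);
	internals->nb_sessions--;
}
```

Because the session mempool now lives in the application, this check only rejects sessions at configure time. It cannot stop the application from allocating session objects in the first place.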
Re: [dpdk-dev] [dpdk-dev, 5/7] net/mlx5: add port representor awareness
> -Original Message- > From: Adrien Mazarguil > Sent: Tuesday, June 12, 2018 9:20 PM > To: Xueming(Steven) Li > Cc: Shahaf Shuler ; dev@dpdk.org > Subject: Re: [dpdk-dev,5/7] net/mlx5: add port representor awareness > > On Mon, Jun 11, 2018 at 01:05:55PM +, Xueming(Steven) Li wrote: > > Hi Adrien, > > > > Couldn't find your original email from inbox anyway, have to start a new > > thread here. > > > > +static int > > > +mlx5_cmp_ibv_name(const void *a, const void *b) { > > > + const char *name_a = (*(const struct ibv_device *const *)a)->name; > > > + const char *name_b = (*(const struct ibv_device *const *)b)->name; > > > + size_t i = 0; > > > + > > > + while (name_a[i] && name_a[i] == name_b[i]) > > > + ++i; > > > + return atoi(name_a + i) - atoi(name_b + i); > > > > Comparing "1" and "10" here will return 0, does this matter? > > Sure it does! The whole point of this function is precisely to avoid this > kind of issues. I'll fix it > for v2, thanks. > > > > > + if (n > 1) { > > > + /* > > > + * The existence of several matching entries means port > > > + * representors have been instantiated. No existing Verbs > > > + * call nor /sys entries can tell them apart at this point. > > > + * > > > + * While definitely hackish, assume their names are numbered > > > + * based on order of creation with master device first, > > > + * followed by first port representor, followed by the > > > + * second one and so on. > > > + */ > > > + DRV_LOG(WARNING, > > > + "probing device with port representors involves" > > > + " heuristics with uncertain outcome"); > > > + qsort(ibv_match, n, sizeof(*ibv_match), mlx5_cmp_ibv_name); > > > + DRV_LOG(WARNING, "assuming \"%s\" is the master device", > > > + ibv_match[0]->name); > > > + for (ret = 1; ret < n; ++ret) > > > + DRV_LOG(WARNING, > > > + "assuming \"%s\" is port representor #%d", > > > + ibv_match[ret]->name, ret - 1); > > > > Such dump will appear when attaching each rep port, how about just do > > it for PF in DEBUG level? 
> > It occurs only once when probing the master device and detecting the presence > of representors, not for > each of them. > > I prefer to leave it as a warning because this detection approach, while an > undeniable improvement > over not checking anything and ending up configuring the wrong netdevice, is > unfortunately not 100% > accurate. This will be improved, however users must be warned of possible > issues in the meantime. Yes, the list is different when VF number changed outside, a full dump should be helpful, how about set it to DEBUG or INFO level? Users don't need to know this, just for debug purpose. > > -- > Adrien Mazarguil > 6WIND
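The "1" vs "10" problem Xueming points out is easy to trace. For "mlx5_1" and "mlx5_10", the common prefix is "mlx5_1", so the function ends up computing atoi("") - atoi("0"), which is 0. One possible fix, assuming the names end in a decimal index, is to back up to the start of the digit run before comparing numerically. `cmp_dev_name()` and `cmp_dev_name_qsort()` are hypothetical helpers that work on plain strings rather than `struct ibv_device`. This is a sketch, not the v2 code.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Order names such as "mlx5_2" and "mlx5_10" by the value of the number
 * where they first differ, not character by character. */
static int
cmp_dev_name(const char *name_a, const char *name_b)
{
	size_t i = 0;

	assert(name_a != NULL && name_b != NULL);
	/* Skip the common prefix. */
	while (name_a[i] && name_a[i] == name_b[i])
		++i;
	/* If the prefix ended inside a number, back up to its first digit. */
	while (i > 0 && isdigit((unsigned char)name_a[i - 1]))
		--i;
	if (isdigit((unsigned char)name_a[i]) &&
	    isdigit((unsigned char)name_b[i])) {
		unsigned long va = strtoul(name_a + i, NULL, 10);
		unsigned long vb = strtoul(name_b + i, NULL, 10);

		if (va != vb)
			return va < vb ? -1 : 1;
	}
	return strcmp(name_a + i, name_b + i);
}

/* qsort() adapter for an array of const char *. */
static int
cmp_dev_name_qsort(const void *a, const void *b)
{
	return cmp_dev_name(*(const char *const *)a, *(const char *const *)b);
}
```

Sorting { "mlx5_10", "mlx5_2", "mlx5_1", "mlx5_0" } with this comparator gives mlx5_0, mlx5_1, mlx5_2, mlx5_10.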
Re: [dpdk-dev] [PATCH v1] net/tap: explain how to compile eBPF C file
Please note that, besides cloning iproute2, we also need to install the clang and llvm tools, version 3.7 or later. I am not sure whether clang and llvm packages of the required versions exist for the common distributions; I compiled the tools from source and installed them manually. > -Original Message- > From: Wiles, Keith [mailto:keith.wi...@intel.com] > Sent: Tuesday, June 12, 2018 4:53 PM > To: Thomas Monjalon > Cc: dev@dpdk.org; Ophir Munk ; Pascal Mazon > ; Olga Shern > Subject: Re: [dpdk-dev] [PATCH v1] net/tap: explain how to compile eBPF C > file > > > > > On Jun 12, 2018, at 8:44 AM, Thomas Monjalon > wrote: > > > > 12/06/2018 15:33, Wiles, Keith: > >> > >>> On Jun 12, 2018, at 7:58 AM, Thomas Monjalon > wrote: > >>> > >>> 12/06/2018 14:36, Wiles, Keith: > > > On Jun 12, 2018, at 7:26 AM, Thomas Monjalon > wrote: > > > > 11/06/2018 18:35, Wiles, Keith: > >> > >>> On Jun 11, 2018, at 11:06 AM, Ophir Munk > wrote: > >>> > >>> This commit explains how to manually compile the C source file > >>> tap_bpf_program.c into an ELF file using the clang compiler. > >>> The code in tap_bpf_program.c requires definitions found in > >>> iproute2 source code. This commit suggests cloning the iproute2 > >>> git tree and include its path in the clang command. It also adds > >>> inclusion of file bpf_api.h (required for eBPF definitions) > >>> which is located in iproute2 source tree. For more details refer to > TAP documentation. > >>> This commit is related to commits [1] and [2]. > >> > >> Normally I would have suggested that eBPF be disable in the TAP > driver as it requires external code and programs, but that ship has sailed. > > > > The external programs are required only to generate new > > instructions, changing the behaviour of the BPF program. > > Currently, the instructions for RSS behaviour are provided.
> > > >> I would like to see building the tap_bpf_program.o as a target in the > Makefile, this way the developer can just run the ‘make bpf_program’ target > and it would be simpler and less error prone. > >>> > >>> As explained in the documentation, for now there is a dependency on > >>> iproute2 for the compilation of this BPF program. > >>> So we cannot make it as simple as a "make command". > >>> Probably that we can rework it to change the dependency. > >>> I heard there are some good BPF libraries available now? > >> > >> Well the dependence of iproute2 is really no different then requiring say > libnuma, they just have to pull the code first to type the ‘make bpf_program’ > right? > > > > The iproute2 dependency is different because it is not a library. > > The .h file is never packaged. > > So we need to download the sources and set -I to this directory. > > To eliminate the -I problem the clone could be done inside the tap directory > and -I ./iproute2/include used, right? > The make target could even clone the code into the tap directory, which > means we can solve these problems you are pointing out. > > Go ahead and do what you want here, but making it harder for the developer > should not be our normally mode of operation. > > > > > > >> If that is the case then a make target make sense to me. If iproute2 is not > found then an error, right? > > > > > > For this to happen, we need to improve the tools. > > In what way do we need to improve the tools and which tools are we > talking about. Building the .o file below appears to be a simple set of > command lines. I have a question in my original email about what tool. > >>> > >>> The .o file is only the an intermediate file. > >>> The next step (numbered as 5 in this patch) is to extract the > >>> section of BPF instructions to be uploaded in the kernel. > >>> This step must be done by a "tool". Ophir did it by hacking tc, but > >>> it is not upstreamed yet. 
> >>> There could be other ways (possibly easier) to achieve the same result. > >> > >> Please change the doc to reflect the tool is not upstreamed yet and the > developer needs to figure out how to extract the data from the binary. > >> > >> I used objdump -j l3_l4 -s tap_bpf_program.o and got a hex dump of > >> the l3_l4 section > >> > >> bf16 61681000 ... > >> > >> Someone schooled in the art of Python coding should be able to > >> convert that output to a ‘C’ data array. :-) > >> > >>> > > It is a work in progress. > >>> > >>> Contributions are welcome. > >>> > > This is a very first step to use Linux BPF with DPDK. > > If there are more interests, we should really streamline its usage > > for all parts of DPDK which runs on top of some kernel code. > > streamlining other parts of DPDK would be nice, but we are now talking > about the tap/eBPF patch. > >> > >> Regards, > >> Keith > > Regards, > Keith
Re: [dpdk-dev] [dpdk-dev, 5/7] net/mlx5: add port representor awareness
On Tue, Jun 12, 2018 at 01:57:45PM +, Xueming(Steven) Li wrote: > > -Original Message- > > From: Adrien Mazarguil > > Sent: Tuesday, June 12, 2018 9:20 PM > > To: Xueming(Steven) Li > > Cc: Shahaf Shuler ; dev@dpdk.org > > Subject: Re: [dpdk-dev,5/7] net/mlx5: add port representor awareness > > > > On Mon, Jun 11, 2018 at 01:05:55PM +, Xueming(Steven) Li wrote: > > > Hi Adrien, > > > > > > Couldn't find your original email from inbox anyway, have to start a new > > > thread here. > > > > > > +static int > > > > +mlx5_cmp_ibv_name(const void *a, const void *b) { > > > > + const char *name_a = (*(const struct ibv_device *const > > > > *)a)->name; > > > > + const char *name_b = (*(const struct ibv_device *const > > > > *)b)->name; > > > > + size_t i = 0; > > > > + > > > > + while (name_a[i] && name_a[i] == name_b[i]) > > > > + ++i; > > > > + return atoi(name_a + i) - atoi(name_b + i); > > > > > > Comparing "1" and "10" here will return 0, does this matter? > > > > Sure it does! The whole point of this function is precisely to avoid this > > kind of issues. I'll fix it > > for v2, thanks. > > > > > > > > + if (n > 1) { > > > > + /* > > > > +* The existence of several matching entries means port > > > > +* representors have been instantiated. No existing > > > > Verbs > > > > +* call nor /sys entries can tell them apart at this > > > > point. > > > > +* > > > > +* While definitely hackish, assume their names are > > > > numbered > > > > +* based on order of creation with master device first, > > > > +* followed by first port representor, followed by the > > > > +* second one and so on. 
> > > > +*/ > > > > + DRV_LOG(WARNING, > > > > + "probing device with port representors involves" > > > > + " heuristics with uncertain outcome"); > > > > + qsort(ibv_match, n, sizeof(*ibv_match), > > > > mlx5_cmp_ibv_name); > > > > + DRV_LOG(WARNING, "assuming \"%s\" is the master device", > > > > + ibv_match[0]->name); > > > > + for (ret = 1; ret < n; ++ret) > > > > + DRV_LOG(WARNING, > > > > + "assuming \"%s\" is port representor > > > > #%d", > > > > + ibv_match[ret]->name, ret - 1); > > > > > > Such dump will appear when attaching each rep port, how about just do > > > it for PF in DEBUG level? > > > > It occurs only once when probing the master device and detecting the > > presence of representors, not for > > each of them. > > > > I prefer to leave it as a warning because this detection approach, while an > > undeniable improvement > > over not checking anything and ending up configuring the wrong netdevice, > > is unfortunately not 100% > > accurate. This will be improved, however users must be warned of possible > > issues in the meantime. > > Yes, the list is different when VF number changed outside, a full dump should > be helpful, how about set it to DEBUG or INFO level? > Users don't need to know this, just for debug purpose. Because by "assuming" things, there's a slight possibility for the PMD to be wrong. It's not a mere debug message. Using the wrong IB device may silently wreak havoc on some systems, therefore since the PMD can't be sure, users are warned about this fact and what IB devices will be used. This calls for extra attention and manual checks (where possible) *before* an issue is encountered, until we replace this piece of code with a safer approach. I think a WARNING level is warranted. -- Adrien Mazarguil 6WIND
Re: [dpdk-dev] [PATCH 7/7] net/mlx5: add parameter for port representors
> -Original Message- > From: dev On Behalf Of Adrien Mazarguil > Sent: Saturday, May 26, 2018 12:35 AM > To: Shahaf Shuler > Cc: dev@dpdk.org > Subject: [dpdk-dev] [PATCH 7/7] net/mlx5: add parameter for port representors > > Prior to this patch, all port representors detected on a given device were > probed and Ethernet devices > instantiated for each of them. > > This patch adds support for the standard "representor" parameter, which > implies that port representors > are not probed by default anymore, except for the list provided through > device arguments. > > (Patch based on prior work from Yuanhan Liu) > > Signed-off-by: Adrien Mazarguil > --- > doc/guides/nics/mlx5.rst| 12 > doc/guides/prog_guide/poll_mode_drv.rst | 2 ++ > drivers/net/mlx5/mlx5.c | 25 + > 3 files changed, 39 insertions(+) > > diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst index > 79c982e29..5229e546c 100644 > --- a/doc/guides/nics/mlx5.rst > +++ b/doc/guides/nics/mlx5.rst > @@ -388,6 +388,18 @@ Run-time configuration > >Disabled by default. > > +- ``representor`` parameter [list] > + > + This parameter can be used to instantiate DPDK Ethernet devices from > + existing port (or VF) representors configured on the device. > + > + It is a standard parameter whose format is described in > + :ref:`ethernet_device_standard_device_arguments`. > + > + For instance, to probe port representors 0 through 2:: > + > +representor=[0-2] > + > Firmware configuration > ~~ > > diff --git a/doc/guides/prog_guide/poll_mode_drv.rst > b/doc/guides/prog_guide/poll_mode_drv.rst > index af82352a0..58d49ba0f 100644 > --- a/doc/guides/prog_guide/poll_mode_drv.rst > +++ b/doc/guides/prog_guide/poll_mode_drv.rst > @@ -365,6 +365,8 @@ Ethernet Device API > > The Ethernet device API exported by the Ethernet PMDs is described in the > *DPDK API Reference*. > > +.. 
_ethernet_device_standard_device_arguments: > + > Ethernet Device Standard Device Arguments > ~ > > diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index > 09afca63c..216753ba6 100644 > --- a/drivers/net/mlx5/mlx5.c > +++ b/drivers/net/mlx5/mlx5.c > @@ -90,6 +90,9 @@ > /* Activate Netlink support in VF mode. */ #define MLX5_VF_NL_EN "vf_nl_en" > > +/* Select port representors to instantiate. */ #define MLX5_REPRESENTOR > +"representor" > + > #ifndef HAVE_IBV_MLX5_MOD_MPW > #define MLX5DV_CONTEXT_FLAGS_MPW_ALLOWED (1 << 2) #define > MLX5DV_CONTEXT_FLAGS_ENHANCED_MPW (1 << 3) > @@ -420,6 +423,9 @@ mlx5_args_check(const char *key, const char *val, void > *opaque) > struct mlx5_dev_config *config = opaque; > unsigned long tmp; > > + /* No-op, port representors are processed in mlx5_dev_spawn(). */ > + if (!strcmp(MLX5_REPRESENTOR, key)) > + return 0; > errno = 0; > tmp = strtoul(val, NULL, 0); > if (errno) { > @@ -492,6 +498,7 @@ mlx5_args(struct mlx5_dev_config *config, struct > rte_devargs *devargs) > MLX5_RX_VEC_EN, > MLX5_L3_VXLAN_EN, > MLX5_VF_NL_EN, > + MLX5_REPRESENTOR, > NULL, > }; > struct rte_kvargs *kvlist; > @@ -1142,13 +1149,30 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev, > struct rte_eth_dev **eth_list = NULL; > struct ibv_context *ctx; > struct ibv_device_attr_ex attr; > + struct rte_eth_devargs eth_da; > void *tmp; > unsigned int i; > unsigned int j = 0; > unsigned int n = 0; > int ret; > > + if (dpdk_dev->devargs) { > + ret = rte_eth_devargs_parse(dpdk_dev->devargs->args, &eth_da); > + if (ret) > + goto error; > + } else { > + memset(&eth_da, 0, sizeof(eth_da)); > + } > next: > + if (j) { > + unsigned int k; > + > + for (k = 0; k < eth_da.nb_representor_ports; ++k) > + if (eth_da.representor_ports[k] == j - 1) > + break; > + if (k == eth_da.nb_representor_ports) > + goto skip; > + } > errno = 0; > ctx = mlx5_glue->open_device(ibv_dev[j]); > if (!ctx) { > @@ -1187,6 +1211,7 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev, > goto error; > ++n; >
} > +skip: > if (ibv_dev[++j]) > goto next; int rte_eth_dev_attach(const char *devargs, uint16_t *port_id); The rte_eth_dev_attach() API attaches one device at a time, since it has only one *port_id parameter. The device argument "82:0.0,representor=[a-b]" will register multiple devices in one call; is this the correct behavior? I ask because this caused a crash in the testpmd CLI "port attach" command, since only the last registered port id is returned. > eth_list[n] = NULL; > -- > 2.11.0
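Steven's question hinges on the fact that one devargs string can name several representors. The sketch below expands the list syntax shown in the patch (`representor=[0-2]`) into individual indexes. Each index would become its own ethdev, yet rte_eth_dev_attach() can report only one port id. `parse_representor_list()` is a hypothetical helper, not rte_eth_devargs_parse(). The comma form `[0,2-3]` is an assumption here rather than something taken from this thread.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

#define MAX_REPRESENTORS 32

/* Expand a value such as "3", "[0-2]" or "[0,2-3]" into ports[].
 * Returns the number of indexes written, or -1 on a syntax error or
 * if more than max indexes are requested. */
static int
parse_representor_list(const char *s, unsigned int *ports, int max)
{
	int bracket = (*s == '[');
	int n = 0;
	char *end;

	assert(ports != NULL);
	if (bracket)
		s++;
	for (;;) {
		unsigned long lo, hi;

		if (!isdigit((unsigned char)*s))
			return -1;
		lo = strtoul(s, &end, 10);
		hi = lo;
		s = end;
		if (*s == '-') { /* range "lo-hi" */
			s++;
			if (!isdigit((unsigned char)*s))
				return -1;
			hi = strtoul(s, &end, 10);
			s = end;
			if (hi < lo)
				return -1;
		}
		for (unsigned long v = lo; v <= hi; v++) {
			if (n >= max)
				return -1;
			ports[n++] = (unsigned int)v;
		}
		/* Commas only make sense inside brackets, since devargs use
		 * commas to separate keys. */
		if (bracket && *s == ',') {
			s++;
			continue;
		}
		break;
	}
	if (bracket) {
		if (*s != ']')
			return -1;
		s++;
	}
	return *s == '\0' ? n : -1;
}
```

For "[0-2]", this yields three indexes, which would map to three new ports from a single attach request. That is the mismatch with the single `*port_id` out-parameter.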
Re: [dpdk-dev] [PATCH v10 0/5] Hyper-V/Azure netvsc PMD and bus support
Hi Stephen, I would like to try merging these patches soon. Below are some comments from my first check: - please fix most of the checkpatches warnings - please merge meson support in the related patch - please fix the doxygen of uuid (warnings from "make doc-api-html") 08/06/2018 18:59, Stephen Hemminger: > Yet another version of the Hyper-V native bus (VMBus) > and network device (netvsc) drivers. This virtual device > is used in Microsoft Hyper-V in Windows 10, Windows Server 2016 > and Azure. Most of this code was extracted from FreeBSD and some of > this is from earlier code donated by Brocade. > > Only Linux is supported at present, but the code is split > to allow future FreeBSD and Windows support. > > This version works with upstream kernel (4.16) but in that > mode only a single queue is supported. With additional > patches that are pending for 5.0 kernel, multi-queue > support works as well.
Re: [dpdk-dev] [PATCH v10 0/5] Hyper-V/Azure netvsc PMD and bus support
On Tue, 12 Jun 2018 17:21:28 +0200 Thomas Monjalon wrote: > Hi Stephen, > > I would like to try merging these patches soon. > Below are some comments from my first check: > - please fix most of the checkpatches warnings > - please merge meson support in the related patch > - please fix the doxygen of uuid (warnings from "make doc-api-html") > > > 08/06/2018 18:59, Stephen Hemminger: > > Yet another version of the Hyper-V native bus (VMBus) > > and network device (netvsc) drivers. This virtual device > > is used in Microsoft Hyper-V in Windows 10, Windows Server 2016 > > and Azure. Most of this code was extracted from FreeBSD and some of > > this is from earlier code donated by Brocade. > > > > Only Linux is supported at present, but the code is split > > to allow future FreeBSD and Windows support. > > > > This version works with upstream kernel (4.16) but in that > > mode only a single queue is supported. With additional > > patches that are pending for 5.0 kernel, multi-queue > > support works as well. > > > OK, will do. The remaining checkpatch warnings are mostly about SPDX and CamelCase (PRIx64).
Re: [dpdk-dev] [PATCH 2/5] crypto/qat: move to dynamic logging for non-dp trace
Hi, > -Original Message- > From: Trahe, Fiona > Sent: Friday, May 11, 2018 12:32 PM > To: dev@dpdk.org > Cc: De Lara Guarch, Pablo ; Trahe, Fiona > ; Jozwiak, TomaszX > Subject: [PATCH 2/5] crypto/qat: move to dynamic logging for non-dp trace > > From: Tomasz Jozwiak > > For all trace not on the data-path move to dynamic logging. > > Signed-off-by: Tomasz Jozwiak > Signed-off-by: Fiona Trahe ... > +++ b/drivers/crypto/qat/qat_logs.c > @@ -0,0 +1,18 @@ > +/* SPDX-License-Identifier: BSD-3-Clause > + * Copyright(c) 2018 Intel Corporation > + */ > + > +#include > + > +int qat_gen_logtype; > + > +static void > +qat_pci_init_log(void) > +{ > + /* Non-data-path logging for pci device and all services */ > + qat_gen_logtype = rte_log_register("pmd.qat_general"); > + if (qat_gen_logtype >= 0) > + rte_log_set_level(qat_gen_logtype, RTE_LOG_NOTICE); } > + > +RTE_INIT(qat_pci_init_log); I am seeing a compilation error on clang: drivers/crypto/qat/qat_logs.c:18:1: error: attribute declaration must precede definition [-Werror,-Wignored-attributes] RTE_INIT(qat_pci_init_log); ^ Thanks, Pablo
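Judging from the clang message, RTE_INIT() here expands to an attributed prototype, roughly `static void __attribute__((constructor, used)) func(void)`. Placing it after the function body therefore attaches an attribute to an already-defined function, which clang rejects. The sketch below is a minimal standalone reproduction of the working order. It uses a hypothetical MY_INIT() macro instead of the real one, and `gen_logtype`/`init_log` are illustrative names.

```c
#include <assert.h>

/* Hypothetical stand-in for an RTE_INIT() that expands to a prototype
 * carrying the constructor attribute, not to a full definition. */
#define MY_INIT(func) \
	static void __attribute__((constructor, used)) func(void)

static int gen_logtype = -1;

/* Attributed declaration first... */
MY_INIT(init_log);

/* ...then the definition. Writing MY_INIT(init_log); after this body is
 * the order clang reports as -Wignored-attributes. */
static void
init_log(void)
{
	gen_logtype = 0; /* placeholder for rte_log_register() */
}
```

In qat_logs.c, that would mean moving `RTE_INIT(qat_pci_init_log);` above the function. Alternatively, with an RTE_INIT() variant that expands to a complete function header, the body can follow the macro directly.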
[dpdk-dev] [PATCH v4 0/2] TAP TSO
v1: - Initial release v2: - Fixing cksum errors - TCP segment size refers to TCP payload size (not including l2,l3,l4 headers) v3 (8 May 2018): - Bug fixing in case input mbuf is segmented - Following review comments by Raslan Darawsheh This patch implements TAP TSO (TCP segmentation offload) in SW, using the DPDK library librte_gso, which segments large TCP payloads (e.g. 64K bytes) into smaller buffers. By supporting the TSO offload capability in software, a TAP device can be used as a failsafe sub-device and be paired with another PCI device which supports TSO in HW. This patch includes 2 commits: 1. Calculation of IP/TCP/UDP checksums for multi-segment packets. Previously, checksum offload was skipped if the number of packet segments was greater than 1. This commit removes this limitation. It is required before supporting TAP TSO, since a generated TCP TSO packet may be composed of two segments where the first segment includes the l2,l3,l4 headers. 2. TAP TSO implementation: calling rte_gso_segment() to segment large TCP packets. This commit creates a small private mbuf pool in the TAP PMD, as required by librte_gso. The number of buffers will be 64, each 128 bytes long. TSO segment size refers to TCP payload size (not including l2,l3,l4 headers). librte_gso supports TCP segmentation above IPv4. The series was marked as suppressed before the 18.05 release in order to include it in 18.08. v4 (12 Jun 2018): Updates following a rebase on top of v18.05 Ophir Munk (2): net/tap: calculate checksums of multi segs packets net/tap: support TSO (TCP Segment Offload) drivers/net/tap/Makefile | 2 +- drivers/net/tap/rte_eth_tap.c | 309 -- drivers/net/tap/rte_eth_tap.h | 3 + mk/rte.app.mk | 4 +- 4 files changed, 244 insertions(+), 74 deletions(-) -- 2.7.4
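The phrase "TSO segment size refers to TCP payload size" can be made concrete with a little arithmetic. The sketch assumes that each output packet carries a full copy of the L2/L3/L4 headers plus at most `seg_size` payload bytes. It also assumes that TCP sequence numbers advance by the payload already sent, and it takes the cap of 64 from this series. It is not librte_gso code, and `TSO_MAX_SEGS` and the three helpers are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cap mirroring the 64-segment limit stated in the series. */
#define TSO_MAX_SEGS 64

/* Packets needed for tcp_payload_len bytes of TCP payload when each packet
 * carries at most seg_size payload bytes (headers not counted). */
static unsigned int
tso_nb_segments(uint32_t tcp_payload_len, uint16_t seg_size)
{
	assert(seg_size > 0);
	return (unsigned int)(((uint64_t)tcp_payload_len + seg_size - 1) /
			      seg_size);
}

/* Payload bytes carried by segment idx (0-based); the last may be short. */
static uint32_t
tso_segment_payload(uint32_t tcp_payload_len, uint16_t seg_size,
		    unsigned int idx)
{
	uint64_t off = (uint64_t)idx * seg_size;

	if (off >= tcp_payload_len)
		return 0;
	return tcp_payload_len - off < seg_size ?
	       (uint32_t)(tcp_payload_len - off) : seg_size;
}

/* Sequence number of segment idx: the original one plus the payload
 * already sent, wrapping modulo 2^32 like TCP itself. */
static uint32_t
tso_segment_seq(uint32_t base_seq, uint16_t seg_size, unsigned int idx)
{
	return base_seq + (uint32_t)idx * seg_size;
}
```

For example, 65000 payload bytes with a segment size of 1448 need 45 packets, and the last one carries 1288 bytes. By contrast, 65536 bytes at 1000 bytes per segment would need 66 packets, which is more than the 64-segment cap.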
[dpdk-dev] [PATCH v4 1/2] net/tap: calculate checksums of multi segs packets
Prior to this commit IP/UDP/TCP checksum offload calculations were skipped in case of a multi segments packet. This commit enables TAP checksum calculations for multi segments packets. The only restriction is that the first segment must contain headers of layers 3 (IP) and 4 (UDP or TCP) Reviewed-by: Raslan Darawsheh Signed-off-by: Ophir Munk --- drivers/net/tap/rte_eth_tap.c | 158 +- 1 file changed, 110 insertions(+), 48 deletions(-) diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c index df396bf..c19f053 100644 --- a/drivers/net/tap/rte_eth_tap.c +++ b/drivers/net/tap/rte_eth_tap.c @@ -415,12 +415,43 @@ tap_tx_offload_get_queue_capa(void) DEV_TX_OFFLOAD_TCP_CKSUM; } +/* Finalize l4 checksum calculation */ static void -tap_tx_offload(char *packet, uint64_t ol_flags, unsigned int l2_len, - unsigned int l3_len) +tap_tx_l4_cksum(uint16_t *l4_cksum, uint16_t l4_phdr_cksum, + uint32_t l4_raw_cksum) { - void *l3_hdr = packet + l2_len; + if (l4_cksum) { + uint32_t cksum; + + cksum = __rte_raw_cksum_reduce(l4_raw_cksum); + cksum += l4_phdr_cksum; + + cksum = ((cksum & 0xffff0000) >> 16) + (cksum & 0xffff); + cksum = (~cksum) & 0xffff; + if (cksum == 0) + cksum = 0xffff; + *l4_cksum = cksum; + } +} +/* Accumaulate L4 raw checksums */ +static void +tap_tx_l4_add_rcksum(char *l4_data, unsigned int l4_len, uint16_t *l4_cksum, + uint32_t *l4_raw_cksum) +{ + if (l4_cksum == NULL) + return; + + *l4_raw_cksum = __rte_raw_cksum(l4_data, l4_len, *l4_raw_cksum); +} + +/* L3 and L4 pseudo headers checksum offloads */ +static void +tap_tx_l3_cksum(char *packet, uint64_t ol_flags, unsigned int l2_len, + unsigned int l3_len, unsigned int l4_len, uint16_t **l4_cksum, + uint16_t *l4_phdr_cksum, uint32_t *l4_raw_cksum) +{ + void *l3_hdr = packet + l2_len; if (ol_flags & (PKT_TX_IP_CKSUM | PKT_TX_IPV4)) { struct ipv4_hdr *iph = l3_hdr; uint16_t cksum; @@ -430,38 +461,21 @@ tap_tx_offload(char *packet, uint64_t ol_flags, unsigned int l2_len, iph->hdr_checksum = (cksum == 0xffff) ?
cksum : ~cksum; } if (ol_flags & PKT_TX_L4_MASK) { - uint16_t l4_len; - uint32_t cksum; - uint16_t *l4_cksum; void *l4_hdr; l4_hdr = packet + l2_len + l3_len; if ((ol_flags & PKT_TX_L4_MASK) == PKT_TX_UDP_CKSUM) - l4_cksum = &((struct udp_hdr *)l4_hdr)->dgram_cksum; + *l4_cksum = &((struct udp_hdr *)l4_hdr)->dgram_cksum; else if ((ol_flags & PKT_TX_L4_MASK) == PKT_TX_TCP_CKSUM) - l4_cksum = &((struct tcp_hdr *)l4_hdr)->cksum; + *l4_cksum = &((struct tcp_hdr *)l4_hdr)->cksum; else return; - *l4_cksum = 0; - if (ol_flags & PKT_TX_IPV4) { - struct ipv4_hdr *iph = l3_hdr; - - l4_len = rte_be_to_cpu_16(iph->total_length) - l3_len; - cksum = rte_ipv4_phdr_cksum(l3_hdr, 0); - } else { - struct ipv6_hdr *ip6h = l3_hdr; - - /* payload_len does not include ext headers */ - l4_len = rte_be_to_cpu_16(ip6h->payload_len) - - l3_len + sizeof(struct ipv6_hdr); - cksum = rte_ipv6_phdr_cksum(l3_hdr, 0); - } - cksum += rte_raw_cksum(l4_hdr, l4_len); - cksum = ((cksum & 0xffff0000) >> 16) + (cksum & 0xffff); - cksum = (~cksum) & 0xffff; - if (cksum == 0) - cksum = 0xffff; - *l4_cksum = cksum; + **l4_cksum = 0; + if (ol_flags & PKT_TX_IPV4) + *l4_phdr_cksum = rte_ipv4_phdr_cksum(l3_hdr, 0); + else + *l4_phdr_cksum = rte_ipv6_phdr_cksum(l3_hdr, 0); + *l4_raw_cksum = __rte_raw_cksum(l4_hdr, l4_len, 0); } } @@ -482,17 +496,27 @@ pmd_tx_burst(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) max_size = *txq->mtu + (ETHER_HDR_LEN + ETHER_CRC_LEN + 4); for (i = 0; i < nb_pkts; i++) { struct rte_mbuf *mbuf = bufs[num_tx]; - struct iovec iovecs[mbuf->nb_segs + 1]; + struct iovec iovecs[mbuf->nb_segs + 2]; struct tun_pi pi = { .flags = 0, .proto = 0x00 }; struct rte_mbuf *seg = mbuf; char m_copy[mbuf->data_len]; + int proto; int n; int j; + int
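Splitting tap_tx_offload() into pseudo-header, raw-sum and finalize steps allows the L4 ones'-complement sum to be accumulated segment by segment. The standalone sketch below illustrates that idea. It is not DPDK's __rte_raw_cksum(), and `struct cksum_state`, `cksum_add()` and `cksum_finish()` are hypothetical names. It also tracks whether the data so far has an odd length. If a segment boundary splits a 16-bit word, the byte pairing in every later segment shifts. Whether the patch handles odd-length segments cannot be determined from this excerpt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Running ones'-complement sum over data spread across several buffers.
 * The 32-bit accumulator is ample for packets up to 64 KiB. */
struct cksum_state {
	uint32_t sum; /* unfolded sum of big-endian 16-bit words */
	int odd;      /* non-zero if the data so far has an odd length */
};

/* Add one segment. Bytes at even stream offsets are the high byte of a
 * word, so an odd-length segment shifts the pairing for the next one. */
static void
cksum_add(struct cksum_state *st, const uint8_t *data, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		st->sum += st->odd ? data[i] : (uint32_t)data[i] << 8;
		st->odd = !st->odd;
	}
}

/* Fold the carries back in and complement the result. */
static uint16_t
cksum_finish(const struct cksum_state *st)
{
	uint32_t sum = st->sum;

	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

With the bytes from the RFC 1071 numeric example (`00 01 f2 03 f4 f5 f6 f7`), the folded sum is 0xddf2, so the checksum is 0x220d. The result is the same whether the bytes are added in one call or split 3 + 5. For UDP, a computed value of 0 is then sent as 0xffff, as `tap_tx_l4_cksum()` does above. This sketch folds in no pseudo-header.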
[dpdk-dev] [PATCH v4 2/2] net/tap: support TSO (TCP Segment Offload)
This commit implements TCP segmentation offload in TAP. librte_gso library is used to segment large TCP payloads (e.g. packets of 64K bytes size) into smaller MTU size buffers. By supporting TSO offload capability in software a TAP device can be used as a failsafe sub device and be paired with another PCI device which supports TSO capability in HW. For more details on librte_gso implementation please refer to dpdk documentation. The number of newly generated TCP TSO segments is limited to 64. Reviewed-by: Raslan Darawsheh Signed-off-by: Ophir Munk --- drivers/net/tap/Makefile | 2 +- drivers/net/tap/rte_eth_tap.c | 159 +++--- drivers/net/tap/rte_eth_tap.h | 3 + mk/rte.app.mk | 4 +- 4 files changed, 138 insertions(+), 30 deletions(-) diff --git a/drivers/net/tap/Makefile b/drivers/net/tap/Makefile index ccc5c5f..3243365 100644 --- a/drivers/net/tap/Makefile +++ b/drivers/net/tap/Makefile @@ -24,7 +24,7 @@ CFLAGS += -I. CFLAGS += $(WERROR_FLAGS) LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs -lrte_hash -LDLIBS += -lrte_bus_vdev +LDLIBS += -lrte_bus_vdev -lrte_gso CFLAGS += -DTAP_MAX_QUEUES=$(TAP_MAX_QUEUES) diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c index c19f053..62b931f 100644 --- a/drivers/net/tap/rte_eth_tap.c +++ b/drivers/net/tap/rte_eth_tap.c @@ -17,6 +17,7 @@ #include #include +#include #include #include #include @@ -55,6 +56,9 @@ #define ETH_TAP_CMP_MAC_FMT "0123456789ABCDEFabcdef" #define ETH_TAP_MAC_ARG_FMT ETH_TAP_MAC_FIXED "|" ETH_TAP_USR_MAC_FMT +#define TAP_GSO_MBUFS_NUM 128 +#define TAP_GSO_MBUF_SEG_SIZE 128 + static struct rte_vdev_driver pmd_tap_drv; static struct rte_vdev_driver pmd_tun_drv; @@ -412,7 +416,8 @@ tap_tx_offload_get_queue_capa(void) return DEV_TX_OFFLOAD_MULTI_SEGS | DEV_TX_OFFLOAD_IPV4_CKSUM | DEV_TX_OFFLOAD_UDP_CKSUM | - DEV_TX_OFFLOAD_TCP_CKSUM; + DEV_TX_OFFLOAD_TCP_CKSUM | + DEV_TX_OFFLOAD_TCP_TSO; } /* Finalize l4 checksum calculation */ @@ 
-479,23 +484,15 @@ tap_tx_l3_cksum(char *packet, uint64_t ol_flags, unsigned int l2_len, } } -/* Callback to handle sending packets from the tap interface - */ -static uint16_t -pmd_tx_burst(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) +static inline void +tap_write_mbufs(struct tx_queue *txq, uint16_t num_mbufs, + struct rte_mbuf **pmbufs, uint16_t l234_hlen, + uint16_t *num_packets, unsigned long *num_tx_bytes) { - struct tx_queue *txq = queue; - uint16_t num_tx = 0; - unsigned long num_tx_bytes = 0; - uint32_t max_size; int i; - if (unlikely(nb_pkts == 0)) - return 0; - - max_size = *txq->mtu + (ETHER_HDR_LEN + ETHER_CRC_LEN + 4); - for (i = 0; i < nb_pkts; i++) { - struct rte_mbuf *mbuf = bufs[num_tx]; + for (i = 0; i < num_mbufs; i++) { + struct rte_mbuf *mbuf = pmbufs[i]; struct iovec iovecs[mbuf->nb_segs + 2]; struct tun_pi pi = { .flags = 0, .proto = 0x00 }; struct rte_mbuf *seg = mbuf; @@ -503,8 +500,7 @@ pmd_tx_burst(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) int proto; int n; int j; - int k; /* first index in iovecs for copying segments */ - uint16_t l234_hlen; /* length of layers 2,3,4 headers */ + int k; /* current index in iovecs for copying segments */ uint16_t seg_len; /* length of first segment */ uint16_t nb_segs; uint16_t *l4_cksum; /* l4 checksum (pseudo header + payload) */ @@ -512,10 +508,6 @@ pmd_tx_burst(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) uint16_t l4_phdr_cksum = 0; /* TCP/UDP pseudo header checksum */ uint16_t is_cksum = 0; /* in case cksum should be offloaded */ - /* stats.errs will be incremented */ - if (rte_pktmbuf_pkt_len(mbuf) > max_size) - break; - l4_cksum = NULL; if (txq->type == ETH_TUNTAP_TYPE_TUN) { /* @@ -554,9 +546,8 @@ pmd_tx_burst(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) l234_hlen = mbuf->l2_len + mbuf->l3_len + mbuf->l4_len; if (seg_len < l234_hlen) break; - - /* To change checksums, work on a -* copy of l2, l3 l4 headers. 
+ /* To change checksums, work on a * copy of l2, l3 +* headers + l4 pseudo header */ rte_memcpy(m_copy, rte_pktmbuf_mtod(mbuf, void *),
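The `m_copy` comment describes a pattern worth spelling out. The checksums are patched in a private copy of the headers. The packet is then emitted with a single writev(): the first iovec points at that copy, and the remaining iovecs point at the untouched payload. The driver's iovec array appears to also carry a `struct tun_pi` entry, which is omitted here. Below is a POSIX-only sketch with a pipe standing in for the TAP fd. `write_with_hdr_copy()` is a hypothetical helper, and the bound of 8 extra segments is arbitrary.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define MAX_EXTRA_SEGS 8 /* arbitrary bound for the sketch */

/* Emit [hdr_copy][first_seg minus its headers][extra segments] with one
 * writev(). first_seg itself is never modified. */
static ssize_t
write_with_hdr_copy(int fd, const char *hdr_copy, size_t hdr_len,
		    const char *first_seg, size_t first_len,
		    const struct iovec *extra, int nb_extra)
{
	struct iovec iov[2 + MAX_EXTRA_SEGS];
	int k = 0;

	assert(fd >= 0);
	if (hdr_len > first_len || nb_extra < 0 || nb_extra > MAX_EXTRA_SEGS)
		return -1;
	iov[k].iov_base = (void *)hdr_copy; /* patched headers */
	iov[k++].iov_len = hdr_len;
	if (first_len > hdr_len) { /* payload left in the first segment */
		iov[k].iov_base = (void *)(first_seg + hdr_len);
		iov[k++].iov_len = first_len - hdr_len;
	}
	for (int i = 0; i < nb_extra; i++) /* remaining segments as-is */
		iov[k++] = extra[i];
	return writev(fd, iov, k);
}
```

The header bytes that reach the fd come from the copy, while the original first segment keeps its old headers. This is why the approach works even when the mbuf data must not be modified.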
Re: [dpdk-dev] [PATCH v4 1/2] net/tap: calculate checksums of multi segs packets
A few formatting problems I have noticed. We can review the code logic in a meeting. > On Jun 12, 2018, at 11:31 AM, Ophir Munk wrote: > > Prior to this commit IP/UDP/TCP checksum offload calculations > were skipped in case of a multi segments packet. > This commit enables TAP checksum calculations for multi segments > packets. > The only restriction is that the first segment must contain > headers of layers 3 (IP) and 4 (UDP or TCP) > > Reviewed-by: Raslan Darawsheh > Signed-off-by: Ophir Munk > --- > drivers/net/tap/rte_eth_tap.c | 158 +- > 1 file changed, 110 insertions(+), 48 deletions(-) > > diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c > index df396bf..c19f053 100644 > --- a/drivers/net/tap/rte_eth_tap.c > +++ b/drivers/net/tap/rte_eth_tap.c > @@ -415,12 +415,43 @@ tap_tx_offload_get_queue_capa(void) > DEV_TX_OFFLOAD_TCP_CKSUM; > } > > +/* Finalize l4 checksum calculation */ > static void > -tap_tx_offload(char *packet, uint64_t ol_flags, unsigned int l2_len, > -unsigned int l3_len) > +tap_tx_l4_cksum(uint16_t *l4_cksum, uint16_t l4_phdr_cksum, > + uint32_t l4_raw_cksum) > { > - void *l3_hdr = packet + l2_len; > + if (l4_cksum) { > + uint32_t cksum; > + > + cksum = __rte_raw_cksum_reduce(l4_raw_cksum); > + cksum += l4_phdr_cksum; > + > + cksum = ((cksum & 0xffff0000) >> 16) + (cksum & 0xffff); > + cksum = (~cksum) & 0xffff; > + if (cksum == 0) > + cksum = 0xffff; > + *l4_cksum = cksum; > + } > +} > > +/* Accumaulate L4 raw checksums */ > +static void > +tap_tx_l4_add_rcksum(char *l4_data, unsigned int l4_len, uint16_t *l4_cksum, > + uint32_t *l4_raw_cksum) > +{ > + if (l4_cksum == NULL) > + return; > + > + *l4_raw_cksum = __rte_raw_cksum(l4_data, l4_len, *l4_raw_cksum); > +} > + > +/* L3 and L4 pseudo headers checksum offloads */ > +static void > +tap_tx_l3_cksum(char *packet, uint64_t ol_flags, unsigned int l2_len, > + unsigned int l3_len, unsigned int l4_len, uint16_t **l4_cksum, > + uint16_t *l4_phdr_cksum, uint32_t *l4_raw_cksum) > +{ > + void
*l3_hdr = packet + l2_len; Needs a blank line here. > if (ol_flags & (PKT_TX_IP_CKSUM | PKT_TX_IPV4)) { > struct ipv4_hdr *iph = l3_hdr; > uint16_t cksum; > @@ -430,38 +461,21 @@ tap_tx_offload(char *packet, uint64_t ol_flags, > unsigned int l2_len, > iph->hdr_checksum = (cksum == 0xffff) ? cksum : ~cksum; > } > if (ol_flags & PKT_TX_L4_MASK) { > - uint16_t l4_len; > - uint32_t cksum; > - uint16_t *l4_cksum; > void *l4_hdr; > > l4_hdr = packet + l2_len + l3_len; > if ((ol_flags & PKT_TX_L4_MASK) == PKT_TX_UDP_CKSUM) > - l4_cksum = &((struct udp_hdr *)l4_hdr)->dgram_cksum; > + *l4_cksum = &((struct udp_hdr *)l4_hdr)->dgram_cksum; > else if ((ol_flags & PKT_TX_L4_MASK) == PKT_TX_TCP_CKSUM) > - l4_cksum = &((struct tcp_hdr *)l4_hdr)->cksum; > + *l4_cksum = &((struct tcp_hdr *)l4_hdr)->cksum; > else > return; > - *l4_cksum = 0; > - if (ol_flags & PKT_TX_IPV4) { > - struct ipv4_hdr *iph = l3_hdr; > - > - l4_len = rte_be_to_cpu_16(iph->total_length) - l3_len; > - cksum = rte_ipv4_phdr_cksum(l3_hdr, 0); > - } else { > - struct ipv6_hdr *ip6h = l3_hdr; > - > - /* payload_len does not include ext headers */ > - l4_len = rte_be_to_cpu_16(ip6h->payload_len) - > - l3_len + sizeof(struct ipv6_hdr); > - cksum = rte_ipv6_phdr_cksum(l3_hdr, 0); > - } > - cksum += rte_raw_cksum(l4_hdr, l4_len); > - cksum = ((cksum & 0xffff0000) >> 16) + (cksum & 0xffff); > - cksum = (~cksum) & 0xffff; > - if (cksum == 0) > - cksum = 0xffff; > - *l4_cksum = cksum; > + **l4_cksum = 0; > + if (ol_flags & PKT_TX_IPV4) > + *l4_phdr_cksum = rte_ipv4_phdr_cksum(l3_hdr, 0); > + else > + *l4_phdr_cksum = rte_ipv6_phdr_cksum(l3_hdr, 0); > + *l4_raw_cksum = __rte_raw_cksum(l4_hdr, l4_len, 0); > } > } > > @@ -482,17 +496,27 @@ pmd_tx_burst(void *queue, struct rte_mbuf **bufs, > uint16_t nb_pkts) > max_size = *txq->mtu + (ETHER_HDR_LEN + ETHER_CRC_LEN + 4); > for (i = 0; i < nb_pkts; i++) { > struct rte_mbuf *mbuf = bufs[num_tx]; > - struct iovec iovecs[mbuf->nb_segs + 1]; > + struct iovec iovecs[mbuf->nb_seg
Re: [dpdk-dev] [PATCH v4 2/2] net/tap: support TSO (TCP Segment Offload)
> On Jun 12, 2018, at 11:31 AM, Ophir Munk wrote: > > This commit implements TCP segmentation offload in TAP. > librte_gso library is used to segment large TCP payloads (e.g. packets > of 64K bytes size) into smaller MTU size buffers. > By supporting TSO offload capability in software a TAP device can be used > as a failsafe sub device and be paired with another PCI device which > supports TSO capability in HW. > > For more details on librte_gso implementation please refer to dpdk > documentation. > The number of newly generated TCP TSO segments is limited to 64. > > Reviewed-by: Raslan Darawsheh > Signed-off-by: Ophir Munk > --- > drivers/net/tap/Makefile | 2 +- > drivers/net/tap/rte_eth_tap.c | 159 +++--- > drivers/net/tap/rte_eth_tap.h | 3 + > mk/rte.app.mk | 4 +- > 4 files changed, 138 insertions(+), 30 deletions(-) > > diff --git a/drivers/net/tap/Makefile b/drivers/net/tap/Makefile > index ccc5c5f..3243365 100644 > --- a/drivers/net/tap/Makefile > +++ b/drivers/net/tap/Makefile > @@ -24,7 +24,7 @@ CFLAGS += -I. 
> CFLAGS += $(WERROR_FLAGS) > LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring > LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs -lrte_hash > -LDLIBS += -lrte_bus_vdev > +LDLIBS += -lrte_bus_vdev -lrte_gso > > CFLAGS += -DTAP_MAX_QUEUES=$(TAP_MAX_QUEUES) > > diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c > index c19f053..62b931f 100644 > --- a/drivers/net/tap/rte_eth_tap.c > +++ b/drivers/net/tap/rte_eth_tap.c > @@ -17,6 +17,7 @@ > #include > #include > > +#include > #include > #include > #include > @@ -55,6 +56,9 @@ > #define ETH_TAP_CMP_MAC_FMT "0123456789ABCDEFabcdef" > #define ETH_TAP_MAC_ARG_FMT ETH_TAP_MAC_FIXED "|" ETH_TAP_USR_MAC_FMT > > +#define TAP_GSO_MBUFS_NUM128 > +#define TAP_GSO_MBUF_SEG_SIZE128 > + > static struct rte_vdev_driver pmd_tap_drv; > static struct rte_vdev_driver pmd_tun_drv; > > @@ -412,7 +416,8 @@ tap_tx_offload_get_queue_capa(void) > return DEV_TX_OFFLOAD_MULTI_SEGS | > DEV_TX_OFFLOAD_IPV4_CKSUM | > DEV_TX_OFFLOAD_UDP_CKSUM | > -DEV_TX_OFFLOAD_TCP_CKSUM; > +DEV_TX_OFFLOAD_TCP_CKSUM | > +DEV_TX_OFFLOAD_TCP_TSO; > } > > /* Finalize l4 checksum calculation */ > @@ -479,23 +484,15 @@ tap_tx_l3_cksum(char *packet, uint64_t ol_flags, > unsigned int l2_len, > } > } > > -/* Callback to handle sending packets from the tap interface > - */ > -static uint16_t > -pmd_tx_burst(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) > +static inline void > +tap_write_mbufs(struct tx_queue *txq, uint16_t num_mbufs, > + struct rte_mbuf **pmbufs, uint16_t l234_hlen, > + uint16_t *num_packets, unsigned long *num_tx_bytes) > { > - struct tx_queue *txq = queue; > - uint16_t num_tx = 0; > - unsigned long num_tx_bytes = 0; > - uint32_t max_size; > int i; > > - if (unlikely(nb_pkts == 0)) > - return 0; > - > - max_size = *txq->mtu + (ETHER_HDR_LEN + ETHER_CRC_LEN + 4); > - for (i = 0; i < nb_pkts; i++) { > - struct rte_mbuf *mbuf = bufs[num_tx]; > + for (i = 0; i < num_mbufs; i++) { > + struct rte_mbuf *mbuf = pmbufs[i]; > 
struct iovec iovecs[mbuf->nb_segs + 2]; > struct tun_pi pi = { .flags = 0, .proto = 0x00 }; > struct rte_mbuf *seg = mbuf; > @@ -503,8 +500,7 @@ pmd_tx_burst(void *queue, struct rte_mbuf **bufs, > uint16_t nb_pkts) > int proto; > int n; > int j; > - int k; /* first index in iovecs for copying segments */ > - uint16_t l234_hlen; /* length of layers 2,3,4 headers */ > + int k; /* current index in iovecs for copying segments */ > uint16_t seg_len; /* length of first segment */ > uint16_t nb_segs; > uint16_t *l4_cksum; /* l4 checksum (pseudo header + payload) */ > @@ -512,10 +508,6 @@ pmd_tx_burst(void *queue, struct rte_mbuf **bufs, > uint16_t nb_pkts) > uint16_t l4_phdr_cksum = 0; /* TCP/UDP pseudo header checksum */ > uint16_t is_cksum = 0; /* in case cksum should be offloaded */ > > - /* stats.errs will be incremented */ > - if (rte_pktmbuf_pkt_len(mbuf) > max_size) > - break; > - > l4_cksum = NULL; > if (txq->type == ETH_TUNTAP_TYPE_TUN) { > /* > @@ -554,9 +546,8 @@ pmd_tx_burst(void *queue, struct rte_mbuf **bufs, > uint16_t nb_pkts) > l234_hlen = mbuf->l2_len + mbuf->l3_len + mbuf->l4_len; > if (seg_len < l234_hlen) > break; > - > - /* To change checksums, work on a > - * copy of l2, l3 l4 headers. > + /* To change checksums, wo
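The commit message caps the number of newly generated TSO segments at 64. The standalone sketch below shows roughly how that cap interacts with payload size and MTU. It does not call librte_gso. tso_segment_count() and SKETCH_MAX_GSO_SEGS are hypothetical names. The sketch assumes each output segment carries at most MTU minus the L3 and L4 header lengths of payload, with any options already counted in l3_len/l4_len.

```c
#include <stdint.h>

/* Mirrors the 64-segment limit stated in the commit message. */
#define SKETCH_MAX_GSO_SEGS 64

/* Number of TCP segments needed to carry payload_len bytes, or -1 if the
 * MTU leaves no room for payload or the count exceeds the cap. */
static int tso_segment_count(uint32_t payload_len, uint16_t mtu,
			     uint16_t l3_len, uint16_t l4_len)
{
	uint32_t hdrs = (uint32_t)l3_len + l4_len;
	uint32_t mss;
	uint32_t nsegs;

	if (mtu <= hdrs)
		return -1;                          /* no room for any payload */
	mss = mtu - hdrs;                           /* payload bytes per segment */
	nsegs = (payload_len + mss - 1) / mss;      /* round up */
	if (nsegs > SKETCH_MAX_GSO_SEGS)
		return -1;
	return (int)nsegs;
}
```

Assume a 1500-byte MTU and 20-byte IPv4 and TCP headers. A 64 KB payload then needs 45 segments, well under the cap. With a 1000-byte MTU the same payload would need 69, and the sketch returns -1.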
[dpdk-dev] [PATCH v11 0/4] Hyper-V/Azure netvsc PMD and bus support
Latest version of the Hyper-V native bus (VMBus) and network device (netvsc) drivers. This virtual device is used in Microsoft Hyper-V in Windows 10, Windows Server 2016 and Azure. Most of this code was extracted from FreeBSD and some of it is from earlier code donated by Brocade. Only Linux is supported at present, but the code is split to allow future FreeBSD and Windows support.

This version works best with the 4.17 kernel, which has all the necessary patches for multi-queue support. It can be used with 4.16, but then only a single queue is supported.

Device binding is best done via driverctl; this required some additional fixes to the kernel and to driverctl. Linux kernel vmbus support needed to support sysfs driver_override, and driverctl needed to handle a non-PCI bus from udev.

https://gitlab.com/driverctl/driverctl/merge_requests/3
http://driverdev.linuxdriverproject.org/pipermail/driverdev-devel/2018-April/118889.html

The remaining TODOs are:
- Performance testing
- Rx buffer copy avoidance (external mbuf)
- Transparent VF support

v11
- merge meson build into bus and netvsc patches
- fix docbook for rte_uuid
- fix a couple of checkpatch warnings
- reduce logging during debug for stats

v10
- resolve RSS setup
- add documentation about restart issue
- update documentation to refer to 4.17
- use same RSS key as MLX

v9
- fix places where still targeted at previous release
- add map entry for rte_uuid
- fix meson build dependencies

v8
- targeted for 18.08, move release notes
- use Ted's libuuid (not FreeBSD) because that is 3 clause BSD license versus 2 clause in FreeBSD
- minor checkpatch whitespace fixes

v7
- add EAL UUID functions from BSD to remove dependency on libuuid; this means the device can be enabled by default and eliminates build issues
- add support for latest NetVSP protocol (from hayaingz)
- add probe finish for compatibility with 18.05
- rebase to 18.05-rc3

Stephen Hemminger (4):
  eal: add rte_uuid support
  bus/vmbus: add hyper-v virtual
bus support net/netvsc: add hyper-v netvsc network device net/netvsc: add documentation MAINTAINERS | 11 + config/common_base| 13 + config/common_linuxapp|3 + doc/guides/nics/features/netvsc.ini | 23 + doc/guides/nics/index.rst |1 + doc/guides/nics/netvsc.rst| 101 ++ doc/guides/rel_notes/known_issues.rst | 20 + doc/guides/rel_notes/release_18_08.rst|6 + drivers/bus/Makefile |1 + drivers/bus/meson.build |2 +- drivers/bus/vmbus/Makefile| 36 + drivers/bus/vmbus/linux/Makefile |3 + drivers/bus/vmbus/linux/vmbus_bus.c | 355 + drivers/bus/vmbus/linux/vmbus_uio.c | 390 ++ drivers/bus/vmbus/meson.build | 18 + drivers/bus/vmbus/private.h | 132 ++ drivers/bus/vmbus/rte_bus_vmbus.h | 396 ++ drivers/bus/vmbus/rte_bus_vmbus_version.map | 28 + drivers/bus/vmbus/rte_vmbus_reg.h | 344 + drivers/bus/vmbus/vmbus_bufring.c | 241 drivers/bus/vmbus/vmbus_channel.c | 406 ++ drivers/bus/vmbus/vmbus_common.c | 286 drivers/bus/vmbus/vmbus_common_uio.c | 232 drivers/net/Makefile |1 + drivers/net/meson.build |2 +- drivers/net/netvsc/Makefile | 23 + drivers/net/netvsc/hn_ethdev.c| 755 ++ drivers/net/netvsc/hn_logs.h | 35 + drivers/net/netvsc/hn_nvs.c | 535 drivers/net/netvsc/hn_nvs.h | 245 drivers/net/netvsc/hn_rndis.c | 1097 +++ drivers/net/netvsc/hn_rndis.h | 26 + drivers/net/netvsc/hn_rxtx.c | 1216 + drivers/net/netvsc/hn_var.h | 139 ++ drivers/net/netvsc/meson.build|7 + drivers/net/netvsc/ndis.h | 378 + drivers/net/netvsc/rndis.h| 414 ++ drivers/net/netvsc/rte_pmd_netvsc_version.map |5 + lib/librte_eal/bsdapp/eal/Makefile|1 + lib/librte_eal/common/Makefile|2 +- lib/librte_eal/common/eal_common_uuid.c | 193 +++ lib/librte_eal/common/include/rte_uuid.h | 129 ++ lib/librte_eal/common/meson.build |2 + lib/librte_eal/linuxapp/eal/Makefile |1 + lib/librte_eal/rte_eal_version.map|9 + mk/rte.app.mk |2 + 46 files changed, 8262 insertions(+), 3 deletions(-) create mode 100644 doc/guides/nics/features/netvsc.ini create mode 100644 doc/guides/nics/netvsc.rst c
[dpdk-dev] [PATCH v11 1/4] eal: add rte_uuid support
Since uuid functions may not be available everywhere, implement uuid functions in DPDK. These are based on the BSD licensed libuuid in util-linux. Signed-off-by: Stephen Hemminger --- lib/librte_eal/bsdapp/eal/Makefile | 1 + lib/librte_eal/common/Makefile | 2 +- lib/librte_eal/common/eal_common_uuid.c | 193 +++ lib/librte_eal/common/include/rte_uuid.h | 129 +++ lib/librte_eal/common/meson.build| 2 + lib/librte_eal/linuxapp/eal/Makefile | 1 + lib/librte_eal/rte_eal_version.map | 9 ++ 7 files changed, 336 insertions(+), 1 deletion(-) create mode 100644 lib/librte_eal/common/eal_common_uuid.c create mode 100644 lib/librte_eal/common/include/rte_uuid.h diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile index 3fd33f1e4318..13eafca61243 100644 --- a/lib/librte_eal/bsdapp/eal/Makefile +++ b/lib/librte_eal/bsdapp/eal/Makefile @@ -58,6 +58,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_common_options.c SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_common_thread.c SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_common_proc.c SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_common_fbarray.c +SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_common_uuid.c SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += rte_malloc.c SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += malloc_elem.c SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += malloc_heap.c diff --git a/lib/librte_eal/common/Makefile b/lib/librte_eal/common/Makefile index 48f870f24bef..68a680bcb934 100644 --- a/lib/librte_eal/common/Makefile +++ b/lib/librte_eal/common/Makefile @@ -16,7 +16,7 @@ INC += rte_pci_dev_feature_defs.h rte_pci_dev_features.h INC += rte_malloc.h rte_keepalive.h rte_time.h INC += rte_service.h rte_service_component.h INC += rte_bitmap.h rte_vfio.h rte_hypervisor.h rte_test.h -INC += rte_reciprocal.h rte_fbarray.h +INC += rte_reciprocal.h rte_fbarray.h rte_uuid.h GENERIC_INC := rte_atomic.h rte_byteorder.h rte_cycles.h rte_prefetch.h GENERIC_INC += rte_spinlock.h rte_memcpy.h rte_cpuflags.h rte_rwlock.h diff --git
a/lib/librte_eal/common/eal_common_uuid.c b/lib/librte_eal/common/eal_common_uuid.c new file mode 100644 index ..1b93c5b37ea1 --- /dev/null +++ b/lib/librte_eal/common/eal_common_uuid.c @@ -0,0 +1,193 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright (C) 1996, 1997 Theodore Ts'o. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + *notice, and the entire permission notice in its entirety, + *including the disclaimer of warranties. + * 2. Redistributions in binary form must reproduce the above copyright + *notice, this list of conditions and the following disclaimer in the + *documentation and/or other materials provided with the distribution. + * 3. The name of the author may not be used to endorse or promote + *products derived from this software without specific prior + *written permission. + * + * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED + * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE, ALL OF + * WHICH ARE HEREBY DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT + * OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF + * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE + * USE OF THIS SOFTWARE, EVEN IF NOT ADVISED OF THE POSSIBILITY OF SUCH + * DAMAGE. 
+ */ + +#include +#include +#include +#include +#include + +#include + +/* UUID packed form */ +struct uuid { + uint32_t time_low; + uint16_t time_mid; + uint16_t time_hi_and_version; + uint16_t clock_seq; + uint8_t node[6]; +}; + +static void uuid_pack(const struct uuid *uu, rte_uuid_t ptr) +{ + uint32_t tmp; + uint8_t *out = ptr; + + tmp = uu->time_low; + out[3] = (uint8_t) tmp; + tmp >>= 8; + out[2] = (uint8_t) tmp; + tmp >>= 8; + out[1] = (uint8_t) tmp; + tmp >>= 8; + out[0] = (uint8_t) tmp; + + tmp = uu->time_mid; + out[5] = (uint8_t) tmp; + tmp >>= 8; + out[4] = (uint8_t) tmp; + + tmp = uu->time_hi_and_version; + out[7] = (uint8_t) tmp; + tmp >>= 8; + out[6] = (uint8_t) tmp; + + tmp = uu->clock_seq; + out[9] = (uint8_t) tmp; + tmp >>= 8; + out[8] = (uint8_t) tmp; + + memcpy(out+10, uu->node, 6); +} + +static void uuid_unpack(const rte_uuid_t in, struct uuid *uu) +{ + con
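uuid_pack() above writes each multi-byte field most significant byte first, so the 16-byte form reads in the same order as the textual UUID. The original uuid_unpack() body is cut off in this archive. What follows is a standalone sketch of a matching inverse, not the patch's code. sketch_uuid_t stands in for rte_uuid_t, and the *_sketch function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

typedef unsigned char sketch_uuid_t[16]; /* stand-in for rte_uuid_t */

struct uuid {
	uint32_t time_low;
	uint16_t time_mid;
	uint16_t time_hi_and_version;
	uint16_t clock_seq;
	uint8_t  node[6];
};

/* Serialize fields big-endian, same byte layout as uuid_pack(). */
static void uuid_pack_sketch(const struct uuid *uu, sketch_uuid_t out)
{
	out[0] = (uint8_t)(uu->time_low >> 24);
	out[1] = (uint8_t)(uu->time_low >> 16);
	out[2] = (uint8_t)(uu->time_low >> 8);
	out[3] = (uint8_t)uu->time_low;
	out[4] = (uint8_t)(uu->time_mid >> 8);
	out[5] = (uint8_t)uu->time_mid;
	out[6] = (uint8_t)(uu->time_hi_and_version >> 8);
	out[7] = (uint8_t)uu->time_hi_and_version;
	out[8] = (uint8_t)(uu->clock_seq >> 8);
	out[9] = (uint8_t)uu->clock_seq;
	memcpy(out + 10, uu->node, 6);
}

/* Inverse of the above: rebuild host-order fields from the byte array. */
static void uuid_unpack_sketch(const sketch_uuid_t in, struct uuid *uu)
{
	uu->time_low = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
		       ((uint32_t)in[2] << 8) | in[3];
	uu->time_mid = (uint16_t)((in[4] << 8) | in[5]);
	uu->time_hi_and_version = (uint16_t)((in[6] << 8) | in[7]);
	uu->clock_seq = (uint16_t)((in[8] << 8) | in[9]);
	memcpy(uu->node, in + 10, 6);
}
```

Packing {0x12345678, 0x9abc, 0x4def, 0x8123, {1, 2, 3, 4, 5, 6}} gives the bytes 12 34 56 78 9a bc 4d ef 81 23 01 02 03 04 05 06. That matches the string 12345678-9abc-4def-8123-010203040506, and unpacking restores the original fields.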
[dpdk-dev] [PATCH v11 4/4] net/netvsc: add documentation
From: Stephen Hemminger Matching documentation for new netvsc device. Includes a brief note about the restart issue. Signed-off-by: Stephen Hemminger --- doc/guides/nics/features/netvsc.ini| 23 ++ doc/guides/nics/index.rst | 1 + doc/guides/nics/netvsc.rst | 101 + doc/guides/rel_notes/known_issues.rst | 20 + doc/guides/rel_notes/release_18_08.rst | 6 ++ 5 files changed, 151 insertions(+) create mode 100644 doc/guides/nics/features/netvsc.ini create mode 100644 doc/guides/nics/netvsc.rst diff --git a/doc/guides/nics/features/netvsc.ini b/doc/guides/nics/features/netvsc.ini new file mode 100644 index ..2ff6042bf47b --- /dev/null +++ b/doc/guides/nics/features/netvsc.ini @@ -0,0 +1,23 @@ +; +; Supported features of the 'netvsc' network poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. +; +[Features] +Speed capabilities = P +Link status = Y +Queue start/stop = Y +Scattered Rx = Y +Promiscuous mode = Y +Allmulticast mode= Y +Basic stats = Y +Stats per queue = Y +Extended stats = Y +Multiprocess aware = Y +Other kdrv = Y +ARMv7= Y +ARMv8= Y +x86-32 = Y +x86-64 = Y +Usage doc= Y +MTU update = Y diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst index ddb9eb7a922e..d4eae2b8f046 100644 --- a/doc/guides/nics/index.rst +++ b/doc/guides/nics/index.rst @@ -33,6 +33,7 @@ Network Interface Controller Drivers mlx4 mlx5 mvpp2 +netvsc nfp octeontx qede diff --git a/doc/guides/nics/netvsc.rst b/doc/guides/nics/netvsc.rst new file mode 100644 index ..0f033e3a3d63 --- /dev/null +++ b/doc/guides/nics/netvsc.rst @@ -0,0 +1,101 @@ +.. SPDX-License-Identifier: BSD-3-Clause +Copyright(c) Microsoft Corporation. + +Netvsc poll mode driver +=== + +The Netvsc Poll Mode driver (PMD) provides support for the paravirtualized +network device for Microsoft Hyper-V. It can be used with +Windows Server 2008/2012/2016, Windows 10 and Azure cloud.
+The device offers multi-queue support (if kernel and host support it), +checksum and segmentation offloads. + + +Features and Limitations of Hyper-V PMD +--- + +In this release, the Hyper-V PMD provides the basic functionality of packet reception and transmission. + +* It supports mergeable buffers per packet when receiving packets and scattered buffers per packet +when transmitting packets. The supported packet size is from 64 to 65536 bytes. + +* The PMD supports multicast packets and promiscuous mode subject to restrictions on the host. +In order for this to work, the guest network configuration on Hyper-V must allow MAC address +spoofing. This option is not available on Azure. + +* The device has only a single MAC address. +The Hyper-V driver does not support MAC or VLAN filtering because the Hyper-V host does not support it. + +* VLAN tags are always stripped and presented in the mbuf tci field. + +* The Hyper-V driver does not use or support Link State or Rx interrupts. + +* The maximum number of queues is limited by the host (currently 64). +When used with the 4.16 kernel only a single queue is available. + +* This driver is intended for use with the synthetic path only. +Accelerated Networking (SR-IOV) acceleration is not supported yet. +Use the VDEV_NETVSC device for accelerated networking instead. + + +Installation + + +The Netvsc PMD is a standalone driver, similar to virtio and vmxnet3. +Using the Netvsc PMD requires that the associated VMBUS device be bound to the userspace +I/O device driver for Hyper-V (uio_hv_generic). By default, all netvsc devices +will be bound to the Linux kernel driver; in order to use the netvsc PMD the +device must first be overridden. + +The first step is to identify the network device to override. +VMBUS uses Universally Unique Identifiers +(`UUID`_) to identify devices on the bus, similar to how PCI uses Domain:Bus:Function. +The UUID associated with a Linux kernel network device can be determined +by looking at the sysfs information.
To find the UUID for eth1 and +store it in a shell variable: + +.. code-block:: console + + DEV_UUID=$(basename $(readlink /sys/class/net/eth1/device)) + + +.. _`UUID`: https://en.wikipedia.org/wiki/Universally_unique_identifier + +There are several possible ways to assign the uio device driver for a device. +The easiest way (but only on 4.18 or later) +is to use the `driverctl Device Driver control utility`_ to override +the normal kernel device. + +.. code-block:: console + + driverctl -b vmbus set-override $DEV_UUID uio_hv_generic + +.. _`driverctl Device Driver control utility`: https://gitlab.com/driverctl/driverctl + +Any settings done with driverctl are by defa
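The documentation finds the VMBus UUID of eth1 with readlink and basename. A C tool could do the same lookup as in the sketch below. Like the shell command, it assumes the device UUID is the last component of the /sys/class/net/<ifname>/device link target. netdev_vmbus_uuid() and last_component() are hypothetical helpers, not DPDK APIs.

```c
#define _POSIX_C_SOURCE 200809L /* for readlink() under strict C modes */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Copy the last path component of 'target' into out, like basename(1).
 * Returns 0 on success, -1 if empty or if out is too small. */
static int last_component(const char *target, char *out, size_t outlen)
{
	size_t len = strlen(target);
	size_t start, n;

	while (len > 1 && target[len - 1] == '/')  /* ignore trailing slashes */
		len--;
	start = len;
	while (start > 0 && target[start - 1] != '/')
		start--;
	n = len - start;
	if (n == 0 || n >= outlen)
		return -1;
	memcpy(out, target + start, n);
	out[n] = '\0';
	return 0;
}

/* Resolve /sys/class/net/<ifname>/device and return its last component,
 * assumed here to be the VMBus device UUID. Returns -1 on any error. */
static int netdev_vmbus_uuid(const char *ifname, char *out, size_t outlen)
{
	char path[256], target[256];
	ssize_t n;

	if (snprintf(path, sizeof(path), "/sys/class/net/%s/device",
		     ifname) >= (int)sizeof(path))
		return -1;
	n = readlink(path, target, sizeof(target) - 1);
	if (n < 0)
		return -1;
	target[n] = '\0'; /* readlink() does not NUL-terminate */
	return last_component(target, out, outlen);
}
```

The resulting string can then be passed to driverctl as in the documentation. If the interface does not exist, the helper simply returns -1.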
[dpdk-dev] [PATCH v11 2/4] bus/vmbus: add hyper-v virtual bus support
This patch adds support for an additional bus type, Virtual Machine BUS (VMBUS), on Microsoft Hyper-V in Windows 10, Windows Server 2016 and Azure. Most of this code was extracted from FreeBSD and some of it is from earlier code donated by Brocade. Only Linux is supported at present, but the code is split to allow future FreeBSD and Windows support. The bus support relies on the uio_hv_generic driver from Linux kernel 4.16. Multiple queue support requires additional sysfs interfaces, which are in kernel 4.17. Signed-off-by: Stephen Hemminger --- MAINTAINERS | 3 + config/common_base | 5 + drivers/bus/Makefile| 1 + drivers/bus/meson.build | 2 +- drivers/bus/vmbus/Makefile | 36 ++ drivers/bus/vmbus/linux/Makefile| 3 + drivers/bus/vmbus/linux/vmbus_bus.c | 355 + drivers/bus/vmbus/linux/vmbus_uio.c | 390 +++ drivers/bus/vmbus/meson.build | 18 + drivers/bus/vmbus/private.h | 132 +++ drivers/bus/vmbus/rte_bus_vmbus.h | 396 +++ drivers/bus/vmbus/rte_bus_vmbus_version.map | 28 ++ drivers/bus/vmbus/rte_vmbus_reg.h | 344 + drivers/bus/vmbus/vmbus_bufring.c | 241 drivers/bus/vmbus/vmbus_channel.c | 406 drivers/bus/vmbus/vmbus_common.c| 286 ++ drivers/bus/vmbus/vmbus_common_uio.c| 232 +++ mk/rte.app.mk | 1 + 18 files changed, 2878 insertions(+), 1 deletion(-) create mode 100644 drivers/bus/vmbus/Makefile create mode 100644 drivers/bus/vmbus/linux/Makefile create mode 100644 drivers/bus/vmbus/linux/vmbus_bus.c create mode 100644 drivers/bus/vmbus/linux/vmbus_uio.c create mode 100644 drivers/bus/vmbus/meson.build create mode 100644 drivers/bus/vmbus/private.h create mode 100644 drivers/bus/vmbus/rte_bus_vmbus.h create mode 100644 drivers/bus/vmbus/rte_bus_vmbus_version.map create mode 100644 drivers/bus/vmbus/rte_vmbus_reg.h create mode 100644 drivers/bus/vmbus/vmbus_bufring.c create mode 100644 drivers/bus/vmbus/vmbus_channel.c create mode 100644 drivers/bus/vmbus/vmbus_common.c create mode 100644 drivers/bus/vmbus/vmbus_common_uio.c diff --git a/MAINTAINERS
b/MAINTAINERS index 4667fa7fbcb1..e9e0f9c188fe 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -421,6 +421,9 @@ F: drivers/bus/pci/ VDEV bus driver F: drivers/bus/vdev/ +VMBUS bus driver +M: Stephen Hemminger +F: drivers/bus/vmbus/ Networking Drivers -- diff --git a/config/common_base b/config/common_base index 6b0d1cbbb76c..e4e30ba50437 100644 --- a/config/common_base +++ b/config/common_base @@ -402,6 +402,11 @@ CONFIG_RTE_LIBRTE_PMD_FAILSAFE=y CONFIG_RTE_LIBRTE_MVPP2_PMD=n # +# Compile support for VMBus library +# +CONFIG_RTE_LIBRTE_VMBUS=n + + # Compile virtual device driver for NetVSC on Hyper-V/Azure # CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD=n diff --git a/drivers/bus/Makefile b/drivers/bus/Makefile index ef7f24751974..cea3b55e60c9 100644 --- a/drivers/bus/Makefile +++ b/drivers/bus/Makefile @@ -10,5 +10,6 @@ endif DIRS-$(CONFIG_RTE_LIBRTE_IFPGA_BUS) += ifpga DIRS-$(CONFIG_RTE_LIBRTE_PCI_BUS) += pci DIRS-$(CONFIG_RTE_LIBRTE_VDEV_BUS) += vdev +DIRS-$(CONFIG_RTE_LIBRTE_VMBUS) += vmbus include $(RTE_SDK)/mk/rte.subdir.mk diff --git a/drivers/bus/meson.build b/drivers/bus/meson.build index 52c755dcfd8e..80de2d91d52d 100644 --- a/drivers/bus/meson.build +++ b/drivers/bus/meson.build @@ -1,7 +1,7 @@ # SPDX-License-Identifier: BSD-3-Clause # Copyright(c) 2017 Intel Corporation -drivers = ['dpaa', 'fslmc', 'ifpga', 'pci', 'vdev'] +drivers = ['dpaa', 'fslmc', 'ifpga', 'pci', 'vdev', 'vmbus'] std_deps = ['eal'] config_flag_fmt = 'RTE_LIBRTE_@0@_BUS' driver_name_fmt = 'rte_bus_@0@' diff --git a/drivers/bus/vmbus/Makefile b/drivers/bus/vmbus/Makefile new file mode 100644 index ..bd18a71154af --- /dev/null +++ b/drivers/bus/vmbus/Makefile @@ -0,0 +1,36 @@ +# SPDX-License-Identifier: BSD-3-Clause + +include $(RTE_SDK)/mk/rte.vars.mk + +LIB = librte_bus_vmbus.a +LIBABIVER := 1 +EXPORT_MAP := rte_bus_vmbus_version.map + +CFLAGS += -I$(SRCDIR) +CFLAGS += -O3 $(WERROR_FLAGS) +CFLAGS += -DALLOW_EXPERIMENTAL_API + +ifneq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),) +SYSTEM := linux +endif +ifneq 
($(CONFIG_RTE_EXEC_ENV_BSDAPP),) +$(error "VMBUS not implemented for BSD yet") +endif + +CFLAGS += -I$(RTE_SDK)/drivers/bus/vmbus/$(SYSTEM) +CFLAGS += -I$(RTE_SDK)/lib/librte_eal/common +CFLAGS += -I$(RTE_SDK)/lib/librte_eal/$(SYSTEM)app/eal + +LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring +LDLIBS += -lrte_ethdev -luuid + +include $(RTE_SDK)/drivers/bus/vmbus/$(SYSTEM)/Makefile +SRCS-$(CONFIG_RTE_LIBRTE_VMBUS) := $(addprefix $(SYSTEM)/,$(SRCS)) +SRCS-$(CONFIG_RTE_LIBRTE_VMBUS) += vmbus_common.c +SRC
[dpdk-dev] [PATCH v11 3/4] net/netvsc: add hyper-v netvsc network device
The driver supports Hyper-V networking directly like virtio for KVM or vmxnet3 for VMware. This code is based off of the FreeBSD driver. The file and variable names are kept the same to help with understanding (with most of the BSD style warts removed). This version supports the latest NetVSP 6.1 version and older versions. Signed-off-by: Haiyang Zhang Signed-off-by: Stephen Hemminger --- MAINTAINERS |8 + config/common_base|8 + config/common_linuxapp|3 + drivers/net/Makefile |1 + drivers/net/meson.build |2 +- drivers/net/netvsc/Makefile | 23 + drivers/net/netvsc/hn_ethdev.c| 755 ++ drivers/net/netvsc/hn_logs.h | 35 + drivers/net/netvsc/hn_nvs.c | 535 drivers/net/netvsc/hn_nvs.h | 245 drivers/net/netvsc/hn_rndis.c | 1097 +++ drivers/net/netvsc/hn_rndis.h | 26 + drivers/net/netvsc/hn_rxtx.c | 1216 + drivers/net/netvsc/hn_var.h | 139 ++ drivers/net/netvsc/meson.build|7 + drivers/net/netvsc/ndis.h | 378 + drivers/net/netvsc/rndis.h| 414 ++ drivers/net/netvsc/rte_pmd_netvsc_version.map |5 + mk/rte.app.mk |1 + 19 files changed, 4897 insertions(+), 1 deletion(-) create mode 100644 drivers/net/netvsc/Makefile create mode 100644 drivers/net/netvsc/hn_ethdev.c create mode 100644 drivers/net/netvsc/hn_logs.h create mode 100644 drivers/net/netvsc/hn_nvs.c create mode 100644 drivers/net/netvsc/hn_nvs.h create mode 100644 drivers/net/netvsc/hn_rndis.c create mode 100644 drivers/net/netvsc/hn_rndis.h create mode 100644 drivers/net/netvsc/hn_rxtx.c create mode 100644 drivers/net/netvsc/hn_var.h create mode 100644 drivers/net/netvsc/meson.build create mode 100644 drivers/net/netvsc/ndis.h create mode 100644 drivers/net/netvsc/rndis.h create mode 100644 drivers/net/netvsc/rte_pmd_netvsc_version.map diff --git a/MAINTAINERS b/MAINTAINERS index e9e0f9c188fe..10bb5444561f 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -600,6 +600,14 @@ F: drivers/net/vdev_netvsc/ F: doc/guides/nics/vdev_netvsc.rst F: doc/guides/nics/features/vdev_netvsc.ini +Microsoft Hyper-V netvsc - EXPERIMENTAL +M: 
Stephen Hemminger +M: "K. Y. Srinivasan" +M: Haiyang Zhang +F: drivers/net/hyperv/ +F: doc/guides/nics/netvsc.rst +F: doc/guides/nics/features/netvsc.ini + Netcope szedata2 M: Matej Vido F: drivers/net/szedata2/ diff --git a/config/common_base b/config/common_base index e4e30ba50437..7c825c7b250f 100644 --- a/config/common_base +++ b/config/common_base @@ -406,7 +406,15 @@ CONFIG_RTE_LIBRTE_MVPP2_PMD=n # CONFIG_RTE_LIBRTE_VMBUS=n +# +# Compile native PMD for Hyper-V/Azure +# +CONFIG_RTE_LIBRTE_NETVSC_PMD=n +CONFIG_RTE_LIBRTE_NETVSC_DEBUG_RX=n +CONFIG_RTE_LIBRTE_NETVSC_DEBUG_TX=n +CONFIG_RTE_LIBRTE_NETVSC_DEBUG_DUMP=n +# # Compile virtual device driver for NetVSC on Hyper-V/Azure # CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD=n diff --git a/config/common_linuxapp b/config/common_linuxapp index 5c68cc0ff440..b452ccd87990 100644 --- a/config/common_linuxapp +++ b/config/common_linuxapp @@ -25,6 +25,9 @@ CONFIG_RTE_LIBRTE_POWER=y CONFIG_RTE_VIRTIO_USER=y CONFIG_RTE_PROC_INFO=y +CONFIG_RTE_LIBRTE_VMBUS=y +CONFIG_RTE_LIBRTE_NETVSC_PMD=y + # NXP DPAA BUS and drivers CONFIG_RTE_LIBRTE_DPAA_BUS=y CONFIG_RTE_LIBRTE_DPAA_MEMPOOL=y diff --git a/drivers/net/Makefile b/drivers/net/Makefile index 9f9da665173a..7cfc08d6f340 100644 --- a/drivers/net/Makefile +++ b/drivers/net/Makefile @@ -33,6 +33,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_LIO_PMD) += liquidio DIRS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4 DIRS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5 DIRS-$(CONFIG_RTE_LIBRTE_MVPP2_PMD) += mvpp2 +DIRS-$(CONFIG_RTE_LIBRTE_NETVSC_PMD) += netvsc DIRS-$(CONFIG_RTE_LIBRTE_NFP_PMD) += nfp DIRS-$(CONFIG_RTE_LIBRTE_BNXT_PMD) += bnxt DIRS-$(CONFIG_RTE_LIBRTE_PMD_NULL) += null diff --git a/drivers/net/meson.build b/drivers/net/meson.build index b7d00a04c8ef..74d6c0f7cabf 100644 --- a/drivers/net/meson.build +++ b/drivers/net/meson.build @@ -3,7 +3,7 @@ drivers = ['af_packet', 'axgbe', 'bonding', 'dpaa', 'dpaa2', 'e1000', 'enic', 'fm10k', 'i40e', 'ixgbe', - 'mvpp2', 'null', 'octeontx', 'pcap', 'ring', + 'mvpp2', 'netvsc', 
'null', 'octeontx', 'pcap', 'ring', 'sfc', 'thunderx', 'virtio'] std_deps = ['ethdev', 'kvargs'] # 'ethdev' also pulls in mbuf, net, eal etc std_deps += ['bus_pci'] # very many PMDs depend on PCI, so make std diff --git a/drivers/net/netvsc/Makefile b/drivers/net/netvsc/Makefile new file mode 100644 index ..3c713af3c8fc --- /dev/null +++ b/drivers/net/netvsc/Makefile @@ -0,0 +1,23 @@ +# SPDX-Licen
Re: [dpdk-dev] [PATCH v2 08/15] net/ifc: rename to ifcvf
Hi Bruce, > -Original Message- > From: Richardson, Bruce > Sent: Saturday, June 9, 2018 5:21 AM > To: dev@dpdk.org > Cc: Richardson, Bruce ; Wang, Xiao W > > Subject: [PATCH v2 08/15] net/ifc: rename to ifcvf > > All files in the directory and the resulting driver have prefix of ifcvf, > not just ifc, so rename directory for accuracy. Also rename the map file > to standard name for meson build in the process. Compared with renaming the dir to IFCVF and renaming it back to IFC sometime in future, I think keeping the dir name as IFC is better for us, this avoids the extra effort. We can just rename below files: doc/guides/nics/ifcvf.rst => doc/guides/nics/ifc.rst drivers/net/ifc/rte_ifcvf_version.map => drivers/net/ifc/rte_pmd_ifc_version. And yes, we need to update documents which refer to ifc. Thanks! Xiao > > CC: Xiao Wang > Signed-off-by: Bruce Richardson > --- > MAINTAINERS | 4 ++-- > drivers/net/Makefile | 2 +- > drivers/net/{ifc => ifcvf}/Makefile | 2 +- > drivers/net/{ifc => ifcvf}/base/ifcvf.c | 0 > drivers/net/{ifc => ifcvf}/base/ifcvf.h | 0 > drivers/net/{ifc => ifcvf}/base/ifcvf_osdep.h | 0 > drivers/net/{ifc => ifcvf}/ifcvf_vdpa.c | 0 > .../rte_ifcvf_version.map => ifcvf/rte_pmd_ifcvf_version.map} | 0 > 8 files changed, 4 insertions(+), 4 deletions(-) > rename drivers/net/{ifc => ifcvf}/Makefile (94%) > rename drivers/net/{ifc => ifcvf}/base/ifcvf.c (100%) > rename drivers/net/{ifc => ifcvf}/base/ifcvf.h (100%) > rename drivers/net/{ifc => ifcvf}/base/ifcvf_osdep.h (100%) > rename drivers/net/{ifc => ifcvf}/ifcvf_vdpa.c (100%) > rename drivers/net/{ifc/rte_ifcvf_version.map => > ifcvf/rte_pmd_ifcvf_version.map} (100%) > > diff --git a/MAINTAINERS b/MAINTAINERS > index 4667fa7fb..4f6055590 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -559,10 +559,10 @@ T: git://dpdk.org/next/dpdk-next-net-intel > F: drivers/net/avf/ > F: doc/guides/nics/features/avf*.ini > > -Intel ifc > +Intel ifcvf > M: Xiao Wang > T: git://dpdk.org/next/dpdk-next-net-intel > -F: 
drivers/net/ifc/ > +F: drivers/net/ifcvf/ > F: doc/guides/nics/ifcvf.rst > F: doc/guides/nics/features/ifcvf.ini > > diff --git a/drivers/net/Makefile b/drivers/net/Makefile > index 9f9da6651..9308f9a7b 100644 > --- a/drivers/net/Makefile > +++ b/drivers/net/Makefile > @@ -59,7 +59,7 @@ endif # $(CONFIG_RTE_LIBRTE_SCHED) > ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y) > DIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += vhost > ifeq ($(CONFIG_RTE_EAL_VFIO),y) > -DIRS-$(CONFIG_RTE_LIBRTE_IFCVF_VDPA_PMD) += ifc > +DIRS-$(CONFIG_RTE_LIBRTE_IFCVF_VDPA_PMD) += ifcvf > endif > endif # $(CONFIG_RTE_LIBRTE_VHOST) > > diff --git a/drivers/net/ifc/Makefile b/drivers/net/ifcvf/Makefile > similarity index 94% > rename from drivers/net/ifc/Makefile > rename to drivers/net/ifcvf/Makefile > index 1011995bc..a022faaad 100644 > --- a/drivers/net/ifc/Makefile > +++ b/drivers/net/ifcvf/Makefile > @@ -22,7 +22,7 @@ BASE_DRIVER_OBJS=$(sort $(patsubst %.c,%.o,$(notdir > $(wildcard $(SRCDIR)/base/*. > > VPATH += $(SRCDIR)/base > > -EXPORT_MAP := rte_ifcvf_version.map > +EXPORT_MAP := rte_pmd_ifcvf_version.map > > LIBABIVER := 1 > > diff --git a/drivers/net/ifc/base/ifcvf.c b/drivers/net/ifcvf/base/ifcvf.c > similarity index 100% > rename from drivers/net/ifc/base/ifcvf.c > rename to drivers/net/ifcvf/base/ifcvf.c > diff --git a/drivers/net/ifc/base/ifcvf.h b/drivers/net/ifcvf/base/ifcvf.h > similarity index 100% > rename from drivers/net/ifc/base/ifcvf.h > rename to drivers/net/ifcvf/base/ifcvf.h > diff --git a/drivers/net/ifc/base/ifcvf_osdep.h > b/drivers/net/ifcvf/base/ifcvf_osdep.h > similarity index 100% > rename from drivers/net/ifc/base/ifcvf_osdep.h > rename to drivers/net/ifcvf/base/ifcvf_osdep.h > diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifcvf/ifcvf_vdpa.c > similarity index 100% > rename from drivers/net/ifc/ifcvf_vdpa.c > rename to drivers/net/ifcvf/ifcvf_vdpa.c > diff --git a/drivers/net/ifc/rte_ifcvf_version.map > b/drivers/net/ifcvf/rte_pmd_ifcvf_version.map > similarity index 
100% > rename from drivers/net/ifc/rte_ifcvf_version.map > rename to drivers/net/ifcvf/rte_pmd_ifcvf_version.map > -- > 2.17.1
[dpdk-dev] [RFC v2] eventdev: event tx adapter APIs
Add common APIs for the transmit stage of an event driven DPDK application. Also add a transmit queue field to the mbuf that is used by the adapter to transmit mbufs. Signed-off-by: Nikhil Rao --- Changelog = v1->v2: * Add the tx_adapter_enqueue function to struct rte_eventdev. It is set to the common Tx adapter function when creating the adapter if the eventdev PMD does not support it or if the DEV_TX_OFFLOAD_MT_LOCKFREE flag is NOT set on all ethernet devices. * Add the rte_event_eth_tx_adapter_enqueue() API. * Add the txq_id field to struct rte_mbuf. lib/librte_eventdev/rte_event_eth_tx_adapter.h | 380 + lib/librte_eventdev/rte_eventdev.h | 7 +- lib/librte_mbuf/rte_mbuf.h | 20 +- 3 files changed, 405 insertions(+), 2 deletions(-) create mode 100644 lib/librte_eventdev/rte_event_eth_tx_adapter.h diff --git a/lib/librte_eventdev/rte_event_eth_tx_adapter.h b/lib/librte_eventdev/rte_event_eth_tx_adapter.h new file mode 100644 index 000..a0e8505 --- /dev/null +++ b/lib/librte_eventdev/rte_event_eth_tx_adapter.h @@ -0,0 +1,380 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2017 Intel Corporation. + */ + +#ifndef _RTE_EVENT_ETH_TX_ADAPTER_ +#define _RTE_EVENT_ETH_TX_ADAPTER_ + +/** + * @file + * + * RTE Event Ethernet Tx Adapter + * + * The event ethernet Tx adapter provides configuration and data path APIs + * for the transmit stage of an event driven packet processing application. + * These APIs abstract the implementation of the transmit stage and allow + * the application to use eventdev PMD support or a common implementation. + * + * In the common implementation, the application uses the adapter API to + * enqueue mbufs to the adapter which runs as a rte_service function. The + * service function dequeues events from its event port and transmits the + * mbufs referenced by these events.
+ * + * The ethernet Tx event adapter APIs are: + * + * - rte_event_eth_tx_adapter_create() + * - rte_event_eth_tx_adapter_create_ext() + * - rte_event_eth_tx_adapter_free() + * - rte_event_eth_tx_adapter_start() + * - rte_event_eth_tx_adapter_stop() + * - rte_event_eth_tx_adapter_queue_start() + * - rte_event_eth_tx_adapter_queue_stop() + * - rte_event_eth_tx_adapter_enqueue() + * - rte_event_eth_tx_adapter_stats_get() + * - rte_event_eth_tx_adapter_stats_reset() + * + * The application creates the adapter using + * rte_event_eth_tx_adapter_create(). The adapter may internally create an event + * port using the port configuration parameter. + * The adapter is responsible for linking the queue as per its implementation, + * for example, in the case of the service function, the adapter links this queue + * to the event port it will dequeue events from. + * + * The application uses rte_event_eth_tx_adapter_enqueue() to send mbufs to the + * adapter via this event queue. The ethernet port and transmit queue index to + * transmit the mbuf on are specified in the mbuf. + * + * The application can start and stop the adapter using the + * rte_event_eth_tx_adapter_start/stop() calls. + * + * To support dynamic reconfiguration of Tx queues, the application can + * call rte_event_eth_tx_adapter_queue_start()/stop() to synchronize + * access to the Tx queue with the adapter. For example, if the application + * wants to reconfigure a Tx queue that could be concurrently + * being accessed by the adapter, it calls rte_event_eth_tx_adapter_queue_stop() + * first, reconfigures the queue and then calls + * rte_event_eth_tx_adapter_queue_start() which signals to the adapter + * that it is safe to resume access to the Tx queue. + * + * The common adapter implementation uses an EAL service function as described + * before and its execution is controlled using the rte_service APIs.
The + * rte_event_eth_tx_adapter_service_id_get() + * function can be used to retrieve the adapter's service function ID. + */ + +#ifdef __cplusplus +extern "C" { +#endif + +#include + +#include "rte_eventdev.h" + +#define RTE_EVENT_ETH_TX_ADAPTER_MAX_INSTANCE 32 + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * Adapter configuration structure + */ +struct rte_event_eth_tx_adapter_conf { + uint8_t event_port_id; + /**< Event port identifier, the adapter service function dequeues mbuf +* events from this port. +*/ + uint32_t max_nb_tx; + /**< The adapter can return early if it has processed at least +* max_nb_tx mbufs. This isn't treated as a requirement; batching may +* cause the adapter to process more than max_nb_tx mbufs. +*/ +}; + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * Function type used for adapter configuration callback. The callback is + * used to fill in members of the struct rte_event_eth_tx_adapter_conf, this + * callback is invoked when creating a SW service to transmit packets. +
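The service-function flow described in the header comment (dequeue events from the adapter's event port, transmit each mbuf on the port and Tx queue recorded in the mbuf, and return early once at least max_nb_tx mbufs have been handled) can be modelled in plain C. This is a minimal sketch under stated assumptions, not the adapter's implementation: struct pkt, struct event_src, dequeue_burst(), transmit() and BURST_SIZE are hypothetical stand-ins for the mbuf, rte_event_dequeue_burst() and rte_eth_tx_burst(), so the example builds without DPDK.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an mbuf that carries its output port and
 * Tx queue index (the RFC adds a txq_id field to struct rte_mbuf). */
struct pkt {
	uint16_t port;   /* must be < 2 in this model */
	uint16_t txq_id; /* must be < 2 in this model */
};

#define BURST_SIZE 4 /* assumed dequeue burst size */

/* Hypothetical event source: a flat array consumed burst by burst. */
struct event_src {
	const struct pkt *pkts;
	size_t nb_pkts;
	size_t next;
};

/* Stand-in for rte_event_dequeue_burst(): returns up to 'max' packets. */
static size_t
dequeue_burst(struct event_src *src, const struct pkt **out, size_t max)
{
	size_t n = 0;

	while (n < max && src->next < src->nb_pkts)
		out[n++] = &src->pkts[src->next++];
	return n;
}

/* Stand-in for rte_eth_tx_burst(): counts packets per (port, queue). */
static unsigned int tx_count[2][2];

static void
transmit(const struct pkt *p)
{
	tx_count[p->port][p->txq_id]++;
}

/* One invocation of the modelled service function. The max_nb_tx check
 * runs only between bursts, so more than max_nb_tx packets may be handled.
 * Returns the number of packets handled in this invocation. */
static uint32_t
tx_service_run(struct event_src *src, uint32_t max_nb_tx)
{
	const struct pkt *burst[BURST_SIZE];
	uint32_t done = 0;

	while (done < max_nb_tx) {
		size_t n = dequeue_burst(src, burst, BURST_SIZE);

		if (n == 0)
			break; /* no more events pending */
		for (size_t i = 0; i < n; i++)
			transmit(burst[i]); /* port and queue come from the packet */
		done += (uint32_t)n;
	}
	return done;
}
```

With 10 queued packets, a burst size of 4 and max_nb_tx = 5, the first call handles 8 packets (two full bursts) before the early-return check fires, which is the overshoot the max_nb_tx comment warns about.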
[dpdk-dev] [PATCH v3] net/i40e: workaround for Fortville performance
The GL_SWR_PM_UP_THR value is not affected by the link speed; its value is set according to the total number of ports for a better pipe-monitor configuration. All of the following relevant device IDs are considered (NICs, LOMs, Mezz and Backplane):

Device-ID  Value       Comments
0x1572     0x03030303  10G SFI
0x1581     0x03030303  10G Backplane
0x1586     0x03030303  10G BaseT
0x1589     0x03030303  10G BaseT (FortPond)
0x1580     0x06060606  40G Backplane
0x1583     0x06060606  2x40G QSFP
0x1584     0x06060606  1x40G QSFP
0x1587     0x06060606  20G Backplane (HP)
0x1588     0x06060606  20G KR2 (HP)
0x158A     0x06060606  25G Backplane
0x158B     0x06060606  25G SFP28

Fixes: c9223a2bf53c ("i40e: workaround for XL710 performance") Fixes: 75d133dd3296 ("net/i40e: enable 25G device") Cc: sta...@dpdk.org Signed-off-by: Haiyue Wang --- v2 -> v3: - Change the return type of i40e_get_swr_pm_cfg from int to bool. v1 -> v2: - The GL_SWR_PM_UP_THR register size is 4B, so change the table value type from uint64_t to uint32_t to reduce the table size. - Fix two CAMELCASE coding style errors. --- drivers/net/i40e/i40e_ethdev.c | 71 +- 1 file changed, 64 insertions(+), 7 deletions(-) diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c index 13c5d32..ef17de8 100644 --- a/drivers/net/i40e/i40e_ethdev.c +++ b/drivers/net/i40e/i40e_ethdev.c @@ -10003,6 +10003,60 @@ i40e_pctype_to_flowtype(const struct i40e_adapter *adapter, #define I40E_GL_SWR_PM_UP_THR_SF_VALUE 0x06060606 #define I40E_GL_SWR_PM_UP_THR 0x269FBC +/* + * GL_SWR_PM_UP_THR: + * The value is not affected by the link speed; its value is set according + * to the total number of ports for a better pipe-monitor configuration.
+ */ +static bool +i40e_get_swr_pm_cfg(struct i40e_hw *hw, uint32_t *value) +{ +#define I40E_GL_SWR_PM_EF_DEVICE(dev) \ + .device_id = (dev), \ + .val = I40E_GL_SWR_PM_UP_THR_EF_VALUE + +#define I40E_GL_SWR_PM_SF_DEVICE(dev) \ + .device_id = (dev), \ + .val = I40E_GL_SWR_PM_UP_THR_SF_VALUE + + static const struct { + uint16_t device_id; + uint32_t val; + } swr_pm_table[] = { + { I40E_GL_SWR_PM_EF_DEVICE(I40E_DEV_ID_SFP_XL710) }, + { I40E_GL_SWR_PM_EF_DEVICE(I40E_DEV_ID_KX_C) }, + { I40E_GL_SWR_PM_EF_DEVICE(I40E_DEV_ID_10G_BASE_T) }, + { I40E_GL_SWR_PM_EF_DEVICE(I40E_DEV_ID_10G_BASE_T4) }, + + { I40E_GL_SWR_PM_SF_DEVICE(I40E_DEV_ID_KX_B) }, + { I40E_GL_SWR_PM_SF_DEVICE(I40E_DEV_ID_QSFP_A) }, + { I40E_GL_SWR_PM_SF_DEVICE(I40E_DEV_ID_QSFP_B) }, + { I40E_GL_SWR_PM_SF_DEVICE(I40E_DEV_ID_20G_KR2) }, + { I40E_GL_SWR_PM_SF_DEVICE(I40E_DEV_ID_20G_KR2_A) }, + { I40E_GL_SWR_PM_SF_DEVICE(I40E_DEV_ID_25G_B) }, + { I40E_GL_SWR_PM_SF_DEVICE(I40E_DEV_ID_25G_SFP28) }, + }; + uint32_t i; + + if (value == NULL) { + PMD_DRV_LOG(ERR, "value is NULL"); + return false; + } + + for (i = 0; i < RTE_DIM(swr_pm_table); i++) { + if (hw->device_id == swr_pm_table[i].device_id) { + *value = swr_pm_table[i].val; + + PMD_DRV_LOG(DEBUG, "Device 0x%x with GL_SWR_PM_UP_THR " + "value - 0x%08x", + hw->device_id, *value); + return true; + } + } + + return false; +} + static int i40e_dev_sync_phy_type(struct i40e_hw *hw) { @@ -10067,13 +10121,16 @@ i40e_configure_registers(struct i40e_hw *hw) } if (reg_table[i].addr == I40E_GL_SWR_PM_UP_THR) { - if (I40E_PHY_TYPE_SUPPORT_40G(hw->phy.phy_types) || /* For XL710 */ - I40E_PHY_TYPE_SUPPORT_25G(hw->phy.phy_types)) /* For XXV710 */ - reg_table[i].val = - I40E_GL_SWR_PM_UP_THR_SF_VALUE; - else /* For X710 */ - reg_table[i].val = - I40E_GL_SWR_PM_UP_THR_EF_VALUE; + uint32_t cfg_val; + + if (!i40e_get_swr_pm_cfg(hw, &cfg_val)) { + PMD_DRV_LOG(DEBUG, "Device 0x%x skips " + "GL_SWR_PM_UP_THR value fixup", + hw->device_id); + continue; + } + + reg_table[i].val = 
cfg_val; } ret = i40e_aq_debug_read_register(hw, reg_table[i].addr, -- 2.7.4
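A standalone sketch of the same lookup, with the device IDs and values copied from the commit-message table, makes the mapping easy to check outside the driver. The function and macro names here are hypothetical; the sketch deliberately avoids struct i40e_hw, RTE_DIM(), PMD_DRV_LOG() and the I40E_DEV_ID_* macros so it builds without DPDK.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values from the commit-message table. */
#define SWR_PM_UP_THR_EF_VALUE 0x03030303u /* 10G devices */
#define SWR_PM_UP_THR_SF_VALUE 0x06060606u /* 20G/25G/40G devices */

static const struct {
	uint16_t device_id;
	uint32_t val;
} swr_pm_table[] = {
	{ 0x1572, SWR_PM_UP_THR_EF_VALUE }, /* 10G SFI */
	{ 0x1581, SWR_PM_UP_THR_EF_VALUE }, /* 10G Backplane */
	{ 0x1586, SWR_PM_UP_THR_EF_VALUE }, /* 10G BaseT */
	{ 0x1589, SWR_PM_UP_THR_EF_VALUE }, /* 10G BaseT (FortPond) */
	{ 0x1580, SWR_PM_UP_THR_SF_VALUE }, /* 40G Backplane */
	{ 0x1583, SWR_PM_UP_THR_SF_VALUE }, /* 2x40G QSFP */
	{ 0x1584, SWR_PM_UP_THR_SF_VALUE }, /* 1x40G QSFP */
	{ 0x1587, SWR_PM_UP_THR_SF_VALUE }, /* 20G Backplane (HP) */
	{ 0x1588, SWR_PM_UP_THR_SF_VALUE }, /* 20G KR2 (HP) */
	{ 0x158A, SWR_PM_UP_THR_SF_VALUE }, /* 25G Backplane */
	{ 0x158B, SWR_PM_UP_THR_SF_VALUE }, /* 25G SFP28 */
};

/* Returns true and stores the register value when device_id is listed.
 * Returns false (leaving *value untouched) for unknown IDs or a NULL
 * pointer, so the caller can skip the register fixup. */
static bool
swr_pm_cfg_lookup(uint16_t device_id, uint32_t *value)
{
	size_t i;

	if (value == NULL)
		return false;

	for (i = 0; i < sizeof(swr_pm_table) / sizeof(swr_pm_table[0]); i++) {
		if (swr_pm_table[i].device_id == device_id) {
			*value = swr_pm_table[i].val;
			return true;
		}
	}
	return false;
}
```

Keying the table on the PCI device ID rather than on the PHY types, as the removed code did, is what lets the value follow the port count instead of the link speed.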
Re: [dpdk-dev] [PATCH 3/6] cryptodev: remove max number of sessions
On Tue, Jun 12, 2018 at 01:53:36PM +, De Lara Guarch, Pablo wrote: > > > > -Original Message- > > From: Tomasz Duszynski [mailto:t...@semihalf.com] > > Sent: Tuesday, June 12, 2018 12:38 PM > > To: De Lara Guarch, Pablo > > Cc: Doherty, Declan ; akhil.go...@nxp.com; > > ravi1.ku...@amd.com; jerin.ja...@caviumnetworks.com; Zhang, Roy Fan > > ; Trahe, Fiona ; > > t...@semihalf.com; jianjay.z...@huawei.com; dev@dpdk.org > > Subject: Re: [PATCH 3/6] cryptodev: remove max number of sessions > > > > Hello Pablo, > > > > On Fri, Jun 08, 2018 at 11:02:31PM +0100, Pablo de Lara wrote: > > > Sessions are not created and stored in the crypto device > > > anymore, since now the session mempool is created > > > at the application level. > > > > > > Therefore the limitation of the maximum number of sessions > > > that can be created should not be dependent of the crypto device. > > > > > > Signed-off-by: Pablo de Lara > > ... > > > > diff --git a/drivers/crypto/mvsam/rte_mrvl_pmd.c > > b/drivers/crypto/mvsam/rte_mrvl_pmd.c > > > index 1b6029a56..822b6cac7 100644 > > > --- a/drivers/crypto/mvsam/rte_mrvl_pmd.c > > > +++ b/drivers/crypto/mvsam/rte_mrvl_pmd.c > > > @@ -719,7 +719,6 @@ cryptodev_mrvl_crypto_create(const char *name, > > > internals = dev->data->dev_private; > > > > > > internals->max_nb_qpairs = init_params->max_nb_queue_pairs; > > > - internals->max_nb_sessions = init_params->max_nb_sessions; > > > > > > /* > > >* ret == -EEXIST is correct, it means DMA > > > @@ -734,8 +733,6 @@ cryptodev_mrvl_crypto_create(const char *name, > > > "DMA memory has been already initialized by a > > different driver."); > > > } > > > > > > - sam_params.max_num_sessions = internals->max_nb_sessions; > > > > This will not fly since library maintains separate list of sessions. > > We have to initialize this number to something sane. Since we cannot > > get it from userspace perhaps make that compile-time configurable > > by adding separate CONFIG_? 
> > Hi Tomasz, > > If you need to have an actual limit, you could define it internally > (not adding an external configuration option), but bear in mind that > this won't prevent an application from trying to allocate more sessions. You can define an arbitrary number of sessions, provided you have enough memory, so there is no hard limit here. What bothers me is the case where the app wants to initialize more sessions than the library has internally. If this happens, userspace will get an error. On the other hand, requesting an arbitrarily large number of sessions from the library and hoping the app will never use that many wastes memory (which might be valuable on resource-constrained systems). That is why keeping the number of sessions in the app and the library in sync is important. Do we have any option in DPDK now to work around this? > > If your PMD has a limitation on the maximum number of sessions, then maybe > this change > won't work for you (removing the maximum number of sessions), so let me know > and we can discuss this. > > Thanks, > Pablo > > P.S. Please, next time, strip out the code that you are not commenting, as it > was hard to find this question :) > -- - Tomasz Duszyński
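Pablo's suggestion of an internal limit, together with his caveat that it cannot stop the application from allocating more sessions from its own mempool, can be illustrated with a small model: the PMD keeps a fixed-size slot table and returns an error once it is full, rather than relying on the application to stay within the limit. This is a hypothetical sketch; PMD_MAX_SESSIONS, struct pmd_sessions and both helpers are illustrative names and not part of the mvsam PMD or the cryptodev API.

```c
#include <errno.h>
#include <stdbool.h>

#define PMD_MAX_SESSIONS 4 /* hypothetical internal (compile-time) limit */

/* Hypothetical per-device bookkeeping for driver-side session slots. */
struct pmd_sessions {
	bool used[PMD_MAX_SESSIONS];
	unsigned int nb_used;
};

/* Reserve a driver-side slot. Returns the slot index, or -ENOSPC when the
 * internal table is full even if the application's mempool is not. */
static int
pmd_session_alloc(struct pmd_sessions *s)
{
	unsigned int i;

	for (i = 0; i < PMD_MAX_SESSIONS; i++) {
		if (!s->used[i]) {
			s->used[i] = true;
			s->nb_used++;
			return (int)i;
		}
	}
	return -ENOSPC;
}

/* Release a slot; out-of-range or already-free indexes are ignored. */
static void
pmd_session_free(struct pmd_sessions *s, int idx)
{
	if (idx >= 0 && idx < PMD_MAX_SESSIONS && s->used[idx]) {
		s->used[idx] = false;
		s->nb_used--;
	}
}
```

The trade-off Tomasz describes shows up directly in PMD_MAX_SESSIONS: too small and applications see -ENOSPC, too large and the driver reserves memory that may never be used.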
Re: [dpdk-dev] [PATCH 1/7] vhost: announce VIRTIO_F_IN_ORDER support
On Fri, Jun 08, 2018 at 05:07:18PM +0800, Marvin Liu wrote: [...] > @@ -853,6 +853,10 @@ rte_vhost_driver_register(const char *path, uint64_t > flags) > vsocket->supported_features = VIRTIO_NET_SUPPORTED_FEATURES; > vsocket->features = VIRTIO_NET_SUPPORTED_FEATURES; > > + /* Dequeue zero copy can't assure descriptors returned in order */ > + if (vsocket->dequeue_zero_copy) > + vsocket->features &= ~(1ULL << VIRTIO_F_IN_ORDER); You also need to clear this bit from vsocket->supported_features. Thanks
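The review point is that a feature masked only in vsocket->features is still advertised through vsocket->supported_features. A minimal sketch, assuming a cut-down stand-in for the vhost socket structure (struct vsocket_model and both helpers are hypothetical), clears the bit from both masks in one place. VIRTIO_F_IN_ORDER is feature bit 35 in the virtio 1.1 specification.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_F_IN_ORDER 35 /* virtio 1.1 feature bit */

/* Hypothetical cut-down stand-in for the vhost socket fields involved. */
struct vsocket_model {
	bool dequeue_zero_copy;
	uint64_t supported_features;
	uint64_t features;
};

/* Disable a feature in both masks so it is neither advertised as
 * supported nor enabled. */
static void
vsocket_disable_feature(struct vsocket_model *vs, unsigned int bit)
{
	vs->supported_features &= ~(1ULL << bit);
	vs->features &= ~(1ULL << bit);
}

static void
vsocket_apply_zero_copy_limits(struct vsocket_model *vs)
{
	/* Dequeue zero copy cannot guarantee in-order descriptor return. */
	if (vs->dequeue_zero_copy)
		vsocket_disable_feature(vs, VIRTIO_F_IN_ORDER);
}
```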
Re: [dpdk-dev] [PATCH 3/7] net/virtio-user: add mgr_rxbuf and in_order vdev parameters
On Fri, Jun 08, 2018 at 05:07:20PM +0800, Marvin Liu wrote: [...] > @@ -419,6 +420,12 @@ virtio_user_dev_init(struct virtio_user_dev *dev, char > *path, int queues, > dev->device_features = VIRTIO_USER_SUPPORTED_FEATURES; > } > > + if (!mrg_rxbuf) > + dev->device_features &= ~(1ull << VIRTIO_NET_F_MRG_RXBUF); > + > + if (!in_order) > + dev->device_features &= ~(1ull << VIRTIO_F_IN_ORDER); You also need to handle the server mode case. In virtio_user_server_reconnect(), dev->device_features will be overwritten. Thanks
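The concern is that the devargs masks are applied once at init, while in server mode virtio_user_server_reconnect() reloads dev->device_features and silently re-enables both features. One way to handle this, sketched here with a hypothetical cut-down device structure and helpers (not the actual virtio-user code), is to remember the user's choices and re-apply them every time the feature word is refreshed. VIRTIO_NET_F_MRG_RXBUF is bit 15 and VIRTIO_F_IN_ORDER is bit 35 in the virtio specification.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_NET_F_MRG_RXBUF 15
#define VIRTIO_F_IN_ORDER 35

/* Hypothetical stand-in for struct virtio_user_dev. */
struct vu_dev_model {
	bool mrg_rxbuf; /* devargs value, kept for later reconnects */
	bool in_order;  /* devargs value, kept for later reconnects */
	uint64_t device_features;
};

/* Clear the features the user disabled via devargs. Must run after every
 * assignment to device_features: at init and on server-mode reconnect. */
static void
vu_apply_user_feature_masks(struct vu_dev_model *dev)
{
	if (!dev->mrg_rxbuf)
		dev->device_features &= ~(1ULL << VIRTIO_NET_F_MRG_RXBUF);
	if (!dev->in_order)
		dev->device_features &= ~(1ULL << VIRTIO_F_IN_ORDER);
}

/* Model of a reconnect that reloads the feature word from the backend. */
static void
vu_reconnect(struct vu_dev_model *dev, uint64_t backend_features)
{
	dev->device_features = backend_features; /* overwritten here... */
	vu_apply_user_feature_masks(dev);         /* ...so re-apply masks */
}
```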
Re: [dpdk-dev] [PATCH v3 2/2] net/ifcvf: enable the host notifier support
Hi, > -Original Message- > From: Bie, Tiwei > Sent: Friday, June 8, 2018 11:22 AM > To: maxime.coque...@redhat.com; dev@dpdk.org > Cc: Wang, Xiao W > Subject: [PATCH v3 2/2] net/ifcvf: enable the host notifier support > > The necessary vDPA ops have already been implemented > in ifcvf driver. So just need to announce the necessary > protocol features to enable the host notifier support. > > Signed-off-by: Tiwei Bie > --- > drivers/net/ifc/ifcvf_vdpa.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c > index c6627c23a..b8e22daf3 100644 > --- a/drivers/net/ifc/ifcvf_vdpa.c > +++ b/drivers/net/ifc/ifcvf_vdpa.c > @@ -646,6 +646,9 @@ ifcvf_get_vdpa_features(int did, uint64_t *features) > > #define VDPA_SUPPORTED_PROTOCOL_FEATURES \ > (1ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK | \ > + 1ULL << VHOST_USER_PROTOCOL_F_SLAVE_REQ | \ > + 1ULL << VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD | \ > + 1ULL << VHOST_USER_PROTOCOL_F_HOST_NOTIFIER | \ >1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD) > static int > ifcvf_get_protocol_features(int did __rte_unused, uint64_t *features) > -- > 2.17.0 Acked-by: Xiao Wang BRs, Xiao
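With the patch applied, the vDPA mask advertises five vhost-user protocol features. The sketch below computes the resulting value, assuming the bit numbers from the vhost-user protocol specification (LOG_SHMFD = 1, REPLY_ACK = 3, SLAVE_REQ = 5, SLAVE_SEND_FD = 10, HOST_NOTIFIER = 11). The macro names mirror those in the patch but are defined locally so the example builds without DPDK; host_notifier_usable() is a hypothetical helper reflecting that the patch adds SLAVE_REQ, SLAVE_SEND_FD and HOST_NOTIFIER together, since the host-notifier message travels over the slave channel and carries a file descriptor.

```c
#include <stdint.h>

/* Bit numbers from the vhost-user protocol specification. */
#define VHOST_USER_PROTOCOL_F_LOG_SHMFD     1
#define VHOST_USER_PROTOCOL_F_REPLY_ACK     3
#define VHOST_USER_PROTOCOL_F_SLAVE_REQ     5
#define VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD 10
#define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER 11

/* Same expression as in the patch ('<<' binds tighter than '|'). */
#define VDPA_SUPPORTED_PROTOCOL_FEATURES \
	(1ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK | \
	 1ULL << VHOST_USER_PROTOCOL_F_SLAVE_REQ | \
	 1ULL << VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD | \
	 1ULL << VHOST_USER_PROTOCOL_F_HOST_NOTIFIER | \
	 1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD)

/* Hypothetical check: returns 1 only if all three host-notifier related
 * protocol features are present in the given mask. */
static int
host_notifier_usable(uint64_t features)
{
	const uint64_t need = 1ULL << VHOST_USER_PROTOCOL_F_SLAVE_REQ |
		1ULL << VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD |
		1ULL << VHOST_USER_PROTOCOL_F_HOST_NOTIFIER;

	return (features & need) == need;
}
```

Under these bit numbers the mask evaluates to 0xC2A (bits 1, 3, 5, 10 and 11).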
Re: [dpdk-dev] [dpdk-stable] 18.02.2 patches review and test
Luca, I ran some smoke tests on 18.02.2 / 17.11.3 and 16.11.7 using testpmd and found no issues with these snapshots. Further, I ran some tests via OvS-DPDK and VPP (DPDK accelerated) using the aforementioned snapshots and saw no issues. Hope this helps. Cheers, Marco On Fri, 2018-06-08 at 10:16 +0100, Luca Boccassi wrote: > On Mon, 2018-06-04 at 09:58 +0100, Luca Boccassi wrote: > > Hi all, > > > > Here is a list of patches targeted for stable release 18.02.2. > > Please > > help review and test. The planned date for the final release is > > Thursday, > > the 14th of June. Before that, please shout if anyone has > > objections > > with these > > patches being applied. > > > > Also for the companies committed to running regression tests, > > please run the tests and report any issue before the release date. > > > > These patches are located at branch 18.02 of dpdk-stable repo: > > https://dpdk.org/browse/dpdk-stable/ > > > > Thanks. > > > > Luca Boccassi > > Hi, > > The release date for 18.02.2 is being postponed by one day to Friday > the 15th due to unforeseen delays in some regression tests. > Apologies for the inconvenience. >