[dpdk-dev] [v3] crypto/cnxk: add telemetry endpoints to cryptodev
Add telemetry endpoints to cryptodev.

Signed-off-by: Gowrishankar Muthukrishnan
---
Depends-on: patch-19252 ("cryptodev: add telemetry endpoint for cryptodev info")
Depends-on: series-19253 ("cnxk: enable telemetry endpoints")

v3:
 - common info moved to lib/cryptodev endpoint.
---
 .../crypto/cnxk/cnxk_cryptodev_telemetry.c | 119 ++
 drivers/crypto/cnxk/meson.build            |   1 +
 2 files changed, 120 insertions(+)
 create mode 100644 drivers/crypto/cnxk/cnxk_cryptodev_telemetry.c

diff --git a/drivers/crypto/cnxk/cnxk_cryptodev_telemetry.c b/drivers/crypto/cnxk/cnxk_cryptodev_telemetry.c
new file mode 100644
index 00..657004e65a
--- /dev/null
+++ b/drivers/crypto/cnxk/cnxk_cryptodev_telemetry.c
@@ -0,0 +1,119 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2021 Marvell.
+ */
+
+#include
+#include
+#include
+
+#include
+
+#include "cnxk_cryptodev.h"
+#include "cnxk_telemetry.h"
+
+#define CRYPTO_CAPS_SZ \
+	(RTE_ALIGN_CEIL(sizeof(struct rte_cryptodev_capabilities), \
+			sizeof(uint64_t)) / \
+	 sizeof(uint64_t))
+
+#define SEC_CAPS_SZ \
+	(RTE_ALIGN_CEIL(sizeof(struct rte_security_capability), \
+			sizeof(uint64_t)) / \
+	 sizeof(uint64_t))
+
+static int
+crypto_caps_array(struct rte_tel_data *d,
+		  struct rte_cryptodev_capabilities *dev_caps,
+		  size_t dev_caps_n)
+{
+	union caps_u {
+		struct rte_cryptodev_capabilities dev_caps;
+		uint64_t val[CRYPTO_CAPS_SZ];
+	} caps;
+	unsigned int i, j, n = 0;
+
+	rte_tel_data_start_array(d, RTE_TEL_U64_VAL);
+
+	for (i = 0; i < dev_caps_n; i++) {
+		if (dev_caps[i].op == RTE_CRYPTO_OP_TYPE_UNDEFINED)
+			break;
+
+		memset(&caps, 0, sizeof(caps));
+		rte_memcpy(&caps.dev_caps, &dev_caps[i], sizeof(dev_caps[0]));
+		for (j = 0; j < CRYPTO_CAPS_SZ; j++)
+			rte_tel_data_add_array_u64(d, caps.val[j]);
+		++n;
+	}
+
+	return n;
+}
+
+static int
+sec_caps_array(struct rte_tel_data *d, struct rte_security_capability *dev_caps,
+	       size_t dev_caps_n)
+{
+	union caps_u {
+		struct rte_security_capability dev_caps;
+		uint64_t val[SEC_CAPS_SZ];
+	} caps;
+	unsigned int i, j, n = 0;
+
+	rte_tel_data_start_array(d, RTE_TEL_U64_VAL);
+
+	for (i = 0; i < dev_caps_n; i++) {
+		memset(&caps, 0, sizeof(caps));
+		rte_memcpy(&caps.dev_caps, &dev_caps[i], sizeof(dev_caps[0]));
+		for (j = 0; j < SEC_CAPS_SZ; j++)
+			rte_tel_data_add_array_u64(d, caps.val[j]);
+		++n;
+	}
+
+	return n;
+}
+
+static int
+cryptodev_tel_handle_info(const char *cmd __rte_unused, const char *params,
+			  struct rte_tel_data *d)
+{
+	struct rte_tel_data *sec_crypto_caps, *sec_caps;
+	char name[RTE_CRYPTODEV_NAME_MAX_LEN];
+	int sec_crypto_caps_n, sec_caps_n;
+	struct rte_cryptodev *dev;
+	struct cnxk_cpt_vf *vf;
+
+	if (params == NULL || strlen(params) == 0 || !isdigit(*params))
+		return -EINVAL;
+
+	rte_strlcpy(name, params, RTE_CRYPTODEV_NAME_LEN);
+	dev = rte_cryptodev_pmd_get_named_dev(name);
+	if (!dev) {
+		plt_err("No cryptodev of name %s available", name);
+		return -EINVAL;
+	}
+
+	vf = dev->data->dev_private;
+	rte_tel_data_start_dict(d);
+
+	/* Security Crypto capabilities */
+	sec_crypto_caps = rte_tel_data_alloc();
+	sec_crypto_caps_n = crypto_caps_array(
+		sec_crypto_caps, vf->sec_crypto_caps, CNXK_SEC_CRYPTO_MAX_CAPS);
+	rte_tel_data_add_dict_container(d, "sec_crypto_caps", sec_crypto_caps,
+					0);
+	rte_tel_data_add_dict_int(d, "sec_crypto_caps_n", sec_crypto_caps_n);
+
+	/* Security capabilities */
+	sec_caps = rte_tel_data_alloc();
+	sec_caps_n = sec_caps_array(sec_caps, vf->sec_caps, CNXK_SEC_MAX_CAPS);
+	rte_tel_data_add_dict_container(d, "sec_caps", sec_caps, 0);
+	rte_tel_data_add_dict_int(d, "sec_caps_n", sec_caps_n);
+
+	return 0;
+}
+
+RTE_INIT(cnxk_cryptodev_init_telemetry)
+{
+	rte_telemetry_register_cmd(
+		"/cnxk/cryptodev/info", cryptodev_tel_handle_info,
+		"Returns cryptodev info. Parameters: pci id");
+}
diff --git a/drivers/crypto/cnxk/meson.build b/drivers/crypto/cnxk/meson.build
index 437d208b5a..4350928289 100644
--- a/drivers/crypto/cnxk/meson.build
+++ b/drivers/crypto/cnxk/meson.build
@@ -19,6 +19,7 @@ sources =
Re: [dpdk-dev] [PATCH v2] examples/vhost: fix use-after-free on drain vhost
> -----Original Message-----
> From: Ma, WenwuX
> Sent: Saturday, September 25, 2021 1:23 AM
> To: dev@dpdk.org
> Cc: maxime.coque...@redhat.com; Xia, Chenbo ; Jiang,
> Cheng1 ; Hu, Jiayu ; Yang, YvonneX
> ; Ma, WenwuX ; sta...@dpdk.org
> Subject: [PATCH v2] examples/vhost: fix use-after-free on drain vhost
>
> When a vdev is removed in destroy_device function,
> the corresponding vhost TX buffer will also be freed,
> but the vhost TX buffer may still be used in the
> drain_vhost function, which will cause an error of
> heap-use-after-free. Therefore, before accessing
> vhost TX buffer, we need to check whether the vdev
> has been removed, if so, let's skip this vdev.
>
> Fixes: a68ba8e0a6b6 ("examples/vhost: refactor vhost data path")
> Cc: sta...@dpdk.org
>
> Signed-off-by: Wenwu Ma
> ---

Reviewed-by: Chenbo Xia
Re: [dpdk-dev] [PATCH v2 5/5] app/testpmd: add missing flow types in port info
On 9/22/21 13:31, Li, Xiaoyun wrote:
> Hi
>
> -----Original Message-----
> From: Maxime Coquelin
> Sent: Wednesday, September 22, 2021 17:58
> To: dev@dpdk.org; Xia, Chenbo ; amore...@redhat.com;
> david.march...@redhat.com; andrew.rybche...@oktetlabs.ru; Yigit, Ferruh
> ; michae...@nvidia.com; viachesl...@nvidia.com; Li, Xiaoyun
> Cc: sta...@dpdk.org; nelio.laranje...@6wind.com; yvuge...@redhat.com;
> ybend...@redhat.com; Maxime Coquelin
> Subject: [PATCH v2 5/5] app/testpmd: add missing flow types in port info
>
> This patch adds missing IPv6-Ex flow types to port info command.
>
> Signed-off-by: Maxime Coquelin
> ---
>  app/test-pmd/config.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
> index 9a4a0c232b..3550e0a18f 100644
> --- a/app/test-pmd/config.c
> +++ b/app/test-pmd/config.c
> @@ -4528,6 +4528,9 @@ flowtype_to_str(uint16_t flow_type)
>  	{"ipv6-sctp", RTE_ETH_FLOW_NONFRAG_IPV6_SCTP},
>  	{"ipv6-other", RTE_ETH_FLOW_NONFRAG_IPV6_OTHER},
>  	{"l2_payload", RTE_ETH_FLOW_L2_PAYLOAD},
> +	{"ipv6-ex", RTE_ETH_FLOW_IPV6_EX},
> +	{"ipv6-tcp-ex", RTE_ETH_FLOW_IPV6_TCP_EX},
> +	{"ipv6-udp-ex", RTE_ETH_FLOW_IPV6_UDP_EX},
>  	{"port", RTE_ETH_FLOW_PORT},
>  	{"vxlan", RTE_ETH_FLOW_VXLAN},
>  	{"geneve", RTE_ETH_FLOW_GENEVE},
>
> You add the missing ipv6-ex entries; why not add RTE_ETH_FLOW_GTPU too?
> It's also missing. Also, flowtype_to_str() is updated, but what about
> str2flowtype() in cmdline.c?

Both suggestions are added to the next revision.

Thanks,
Maxime

--
2.31.1
Re: [dpdk-dev] [PATCH v5 1/2] eventdev: add rx queue conf get api
On Tue, Sep 28, 2021 at 3:04 PM Jayatheerthan, Jay wrote: > > > -Original Message- > > From: Kundapura, Ganapati > > Sent: Thursday, September 16, 2021 6:21 PM > > To: Jayatheerthan, Jay ; jerinjac...@gmail.com > > Cc: dev@dpdk.org; Yigit, Ferruh > > Subject: [PATCH v5 1/2] eventdev: add rx queue conf get api > > > > Added rte_event_eth_rx_adapter_queue_conf_get() API to get rx queue > > information - event queue identifier, flags for handling received packets, > > scheduler type, event priority, polling frequency of the receive queue > > and flow identifier in rte_event_eth_rx_adapter_queue_conf structure > > > > Signed-off-by: Ganapati Kundapura > > > > --- > > v5: > > * Filled queue_conf after memzone lookup > > * PMD callback if not NULL, invoked to override queue_conf struct > > * Added memzone lookup for stats_get(), stats_reset(), service_id_get() > > api's called by secondary applications. > > > > v4: > > * squashed 1/3 and 3/3 > > * reused rte_event_eth_rx_adapter_queue_conf structure in place of > > rte_event_eth_rx_adapter_queue_info > > * renamed to rte_event_eth_rx_adapter_queue_info_get() to > > rte_event_eth_rx_adapter_queue_conf_get to align with > > rte_event_eth_rx_adapter_queue_conf structure > > > > v3: > > * Split single patch into implementaion, test and document updation > > patches separately > > > > v2: > > * Fixed build issue due to missing entry in version.map > > > > v1: > > * Initial patch with implementaion, test and doc together > > --- > > --- > > .../prog_guide/event_ethernet_rx_adapter.rst | 8 ++ > > lib/eventdev/eventdev_pmd.h| 28 +++ > > lib/eventdev/rte_event_eth_rx_adapter.c| 91 > > +- > > lib/eventdev/rte_event_eth_rx_adapter.h| 27 +++ > > lib/eventdev/version.map | 1 + > > 5 files changed, 154 insertions(+), 1 deletion(-) > > > > diff --git a/doc/guides/prog_guide/event_ethernet_rx_adapter.rst > > b/doc/guides/prog_guide/event_ethernet_rx_adapter.rst > > index 0780b6f..ce23d8a 100644 > > --- 
a/doc/guides/prog_guide/event_ethernet_rx_adapter.rst > > +++ b/doc/guides/prog_guide/event_ethernet_rx_adapter.rst > > @@ -146,6 +146,14 @@ if the callback is supported, and the counts > > maintained by the service function, > > if one exists. The service function also maintains a count of cycles for > > which > > it was not able to enqueue to the event device. > > > > +Getting Adapter queue config > > + > > + > > +The ``rte_event_eth_rx_adapter_queue_conf_get()`` function reports > > +flags for handling received packets, event queue identifier, scheduler > > type, > > +event priority, polling frequency of the receive queue and flow identifier > > +in struct ``rte_event_eth_rx_adapter_queue_conf``. > > + > > Interrupt Based Rx Queues > > ~~ > > > > diff --git a/lib/eventdev/eventdev_pmd.h b/lib/eventdev/eventdev_pmd.h > > index 63b3bc4..e69644b 100644 > > --- a/lib/eventdev/eventdev_pmd.h > > +++ b/lib/eventdev/eventdev_pmd.h > > @@ -562,6 +562,32 @@ typedef int (*eventdev_eth_rx_adapter_queue_del_t) > > int32_t rx_queue_id); > > > > /** > > + * Retrieve Rx adapter queue config information for the specified > > + * rx queue ID. > > + * > > + * @param dev > > + * Event device pointer > > + * > > + * @param eth_dev > > + * Ethernet device pointer > > + * > > + * @param rx_queue_id > > + * Ethernet device receive queue index. > > + * > > + * @param[out] queue_conf > > + * Pointer to rte_event_eth_rx_adapter_queue_conf structure > > + * > > + * @return > > + * - 0: Success > > + * - <0: Error code on failure. > > + */ > > +typedef int (*eventdev_eth_rx_adapter_queue_conf_get_t) > > + (const struct rte_eventdev *dev, > > + const struct rte_eth_dev *eth_dev, > > + uint16_t rx_queue_id, > > + struct rte_event_eth_rx_adapter_queue_conf > > *queue_conf); > > + > > +/** > > * Start ethernet Rx adapter. 
This callback is invoked if > > * the caps returned from eventdev_eth_rx_adapter_caps_get(.., eth_port_id) > > * has RTE_EVENT_ETH_RX_ADAPTER_CAP_INTERNAL_PORT set and Rx queues > > @@ -1081,6 +1107,8 @@ struct rte_eventdev_ops { > > /**< Add Rx queues to ethernet Rx adapter */ > > eventdev_eth_rx_adapter_queue_del_t eth_rx_adapter_queue_del; > > /**< Delete Rx queues from ethernet Rx adapter */ > > + eventdev_eth_rx_adapter_queue_conf_get_t > > eth_rx_adapter_queue_conf_get; > > + /**< Get Rx adapter queue info */ > > eventdev_eth_rx_adapter_start_t eth_rx_adapter_start; > > /**< Start ethernet Rx adapter */ > > eventdev_eth_rx_adapter_stop_t eth_rx_adapter_stop; > > diff --git a/lib/eventdev/rte_event_eth_rx_adapter.c > > b/lib/eventdev/rte_event_eth_rx_adapter.c > > index f2dc695..6cc4210 100644 > > --- a/lib/eventde
Re: [dpdk-dev] [PATCH v2 0/7] Removal of PCI bus ABIs
Gentle ping for comments.. @David, could you help me understand what is the compile error in Fedora 31? DPDK_compile_spdk failure is expected as the header name for SPDK is changed, I am not sure if it's the same error... Thanks, Chenbo > -Original Message- > From: dev On Behalf Of Chenbo Xia > Sent: Saturday, September 18, 2021 10:25 AM > To: dev@dpdk.org; david.march...@redhat.com > Subject: [dpdk-dev] [PATCH v2 0/7] Removal of PCI bus ABIs > > As announced in the deprecation notice, most ABIs in PCI bus will be removed. > > As there exist some applications that want to access PCI memory resource, > two new APIs are defined in Patch 1 and corresponding changes are applied > to testpmd in Patch 2. > > Patch 3-4 clean up the unnecessary usage of PCI bus header in examples. > > Patch 5-6 clean up the unused PCI related structure in kni library and related > tests and examples. > > Patch 7 finally removes most of ABIs in PCI bus. > > --- > v2: > - Add check on call of port_id_pci_reg_write (Xiaoyun) > - Combine two clean-up patches in test and example, and backport (David) > > Chenbo Xia (7): > bus/pci: add new memory resource access APIs > app/testpmd: use PCI memory resource access APIs > examples/ethtool: use PCI library API to get PCI address > examples/kni: remove unused PCI bus header > kni: remove unused PCI info from test and example > kni: replace unused variable definition with reserved bytes > bus/pci: remove ABIs in PCI bus > > app/test-pmd/config.c | 50 +-- > app/test-pmd/testpmd.h| 54 +-- > app/test/test_kni.c | 57 --- > app/test/virtual_pmd.c| 2 +- > doc/guides/rel_notes/release_21_11.rst| 8 + > drivers/baseband/acc100/rte_acc100_pmd.c | 2 +- > .../fpga_5gnr_fec/rte_fpga_5gnr_fec.c | 2 +- > drivers/baseband/fpga_lte_fec/fpga_lte_fec.c | 2 +- > drivers/bus/pci/bsd/pci.c | 1 - > drivers/bus/pci/linux/pci.c | 1 - > drivers/bus/pci/linux/pci_uio.c | 1 - > drivers/bus/pci/linux/pci_vfio.c | 1 - > drivers/bus/pci/meson.build | 4 + > 
drivers/bus/pci/pci_common.c | 78 > drivers/bus/pci/pci_common_uio.c | 1 - > drivers/bus/pci/pci_driver.h | 402 ++ > drivers/bus/pci/pci_params.c | 1 - > drivers/bus/pci/private.h | 3 +- > drivers/bus/pci/rte_bus_pci.h | 387 ++--- > drivers/bus/pci/version.map | 28 +- > drivers/common/cnxk/roc_platform.h| 2 +- > drivers/common/mlx5/linux/mlx5_common_verbs.c | 2 +- > drivers/common/mlx5/mlx5_common_pci.c | 2 +- > drivers/common/octeontx2/otx2_dev.h | 2 +- > drivers/common/octeontx2/otx2_sec_idev.c | 2 +- > drivers/common/qat/qat_device.h | 2 +- > drivers/common/qat/qat_qp.c | 2 +- > drivers/common/sfc_efx/sfc_efx.h | 2 +- > drivers/compress/mlx5/mlx5_compress.c | 2 +- > drivers/compress/octeontx/otx_zip.h | 2 +- > drivers/compress/qat/qat_comp.c | 2 +- > drivers/crypto/ccp/ccp_dev.h | 2 +- > drivers/crypto/ccp/ccp_pci.h | 2 +- > drivers/crypto/ccp/rte_ccp_pmd.c | 2 +- > drivers/crypto/cnxk/cn10k_cryptodev.c | 2 +- > drivers/crypto/cnxk/cn9k_cryptodev.c | 2 +- > drivers/crypto/mlx5/mlx5_crypto.c | 2 +- > drivers/crypto/nitrox/nitrox_device.h | 2 +- > drivers/crypto/octeontx/otx_cryptodev.c | 2 +- > drivers/crypto/octeontx/otx_cryptodev_ops.c | 2 +- > drivers/crypto/octeontx2/otx2_cryptodev.c | 2 +- > drivers/crypto/qat/qat_sym.c | 2 +- > drivers/crypto/qat/qat_sym_pmd.c | 2 +- > drivers/crypto/virtio/virtio_cryptodev.c | 2 +- > drivers/crypto/virtio/virtio_pci.h| 2 +- > drivers/event/dlb2/pf/dlb2_main.h | 2 +- > drivers/event/dlb2/pf/dlb2_pf.c | 2 +- > drivers/event/octeontx/ssovf_probe.c | 2 +- > drivers/event/octeontx/timvf_probe.c | 2 +- > drivers/event/octeontx2/otx2_evdev.c | 2 +- > drivers/mempool/cnxk/cnxk_mempool.c | 2 +- > drivers/mempool/octeontx/octeontx_fpavf.c | 2 +- > drivers/mempool/octeontx2/otx2_mempool.c | 2 +- > drivers/mempool/octeontx2/otx2_mempool.h | 2 +- > drivers/mempool/octeontx2/otx2_mempool_irq.c | 2 +- > drivers/meson.build | 4 + > drivers/net/ark/ark_ethdev.c | 2 +- > drivers/net/avp/avp_ethdev.c | 2 +- > drivers/net/bnx2x/bnx2x.h | 2 +- 
> drivers/net/bnxt/bnxt.h | 2
[dpdk-dev] [PATCH] mbuf: remove deprecated bad outer IPv4 checksum flag on Rx
Removed offload flag PKT_RX_EIP_CKSUM_BAD.
PKT_RX_OUTER_IP_CKSUM_BAD should be used as a replacement.

Signed-off-by: Andrew Rybchenko
---
 doc/guides/rel_notes/deprecation.rst   | 7 +--
 doc/guides/rel_notes/release_21_11.rst | 3 +++
 lib/mbuf/rte_mbuf_core.h               | 7 ---
 3 files changed, 4 insertions(+), 13 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 59445a6f42..3175426dad 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -42,7 +42,7 @@ Deprecation Notices
 * mbuf: The mbuf offload flags ``PKT_*`` will be renamed as ``RTE_MBUF_F_*``.
   A compatibility layer will be kept until DPDK 22.11, except for the flags
   that are already deprecated (``PKT_RX_L4_CKSUM_BAD``, ``PKT_RX_IP_CKSUM_BAD``,
-  ``PKT_RX_EIP_CKSUM_BAD``, ``PKT_TX_QINQ_PKT``) which will be removed.
+  ``PKT_RX_OUTER_IP_CKSUM_BAD``, ``PKT_TX_QINQ_PKT``) which will be removed.

 * pci: To reduce unnecessary ABIs exposed by DPDK bus driver, "rte_bus_pci.h"
   will be made internal in 21.11 and macros/data structures/functions defined
@@ -159,11 +159,6 @@ Deprecation Notices
   will be limited to maximum 256 queues.
   Also compile time flag ``RTE_ETHDEV_QUEUE_STAT_CNTRS`` will be removed.

-* ethdev: The offload flag ``PKT_RX_EIP_CKSUM_BAD`` will be removed and
-  replaced by the new flag ``PKT_RX_OUTER_IP_CKSUM_BAD``. The new name is more
-  consistent with existing outer header checksum status flag naming, which
-  should help in reducing confusion about its usage.
-
 * i40e: As there are both i40evf and iavf pmd, the functions of them are
   duplicated. And now more and more advanced features are developed on iavf.
   To keep consistent with kernel driver's name

diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index a84c912f20..6c5a3c8981 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -126,6 +126,9 @@ Removed Items
   blacklist/whitelist are removed.
   Users must use the new block/allow list arguments.

+* mbuf: Removed offload flag ``PKT_RX_EIP_CKSUM_BAD``.
+  ``PKT_RX_OUTER_IP_CKSUM_BAD`` should be used as a replacement.
+
 API Changes
 ---

diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
index bb38d7f581..8db76d4885 100644
--- a/lib/mbuf/rte_mbuf_core.h
+++ b/lib/mbuf/rte_mbuf_core.h
@@ -79,13 +79,6 @@ extern "C" {
  */
 #define PKT_RX_OUTER_IP_CKSUM_BAD (1ULL << 5)

-/**
- * Deprecated.
- * This flag has been renamed, use PKT_RX_OUTER_IP_CKSUM_BAD instead.
- */
-#define PKT_RX_EIP_CKSUM_BAD \
-	RTE_DEPRECATED(PKT_RX_EIP_CKSUM_BAD) PKT_RX_OUTER_IP_CKSUM_BAD
-
 /**
  * A vlan has been stripped by the hardware and its tci is saved in
  * mbuf->vlan_tci. This can only happen if vlan stripping is enabled
--
2.30.2
Re: [dpdk-dev] [PATCH v1] ethdev: introduce shared Rx queue
On Tue, 2021-09-28 at 20:29 +0530, Jerin Jacob wrote: > On Tue, Sep 28, 2021 at 8:10 PM Xueming(Steven) Li > wrote: > > > > On Tue, 2021-09-28 at 13:59 +, Ananyev, Konstantin wrote: > > > > > > > > On Tue, Sep 28, 2021 at 6:55 PM Xueming(Steven) Li > > > > wrote: > > > > > > > > > > On Tue, 2021-09-28 at 18:28 +0530, Jerin Jacob wrote: > > > > > > On Tue, Sep 28, 2021 at 5:07 PM Xueming(Steven) Li > > > > > > wrote: > > > > > > > > > > > > > > On Tue, 2021-09-28 at 15:05 +0530, Jerin Jacob wrote: > > > > > > > > On Sun, Sep 26, 2021 at 11:06 AM Xueming(Steven) Li > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > On Wed, 2021-08-11 at 13:04 +0100, Ferruh Yigit wrote: > > > > > > > > > > On 8/11/2021 9:28 AM, Xueming(Steven) Li wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -Original Message- > > > > > > > > > > > > From: Jerin Jacob > > > > > > > > > > > > Sent: Wednesday, August 11, 2021 4:03 PM > > > > > > > > > > > > To: Xueming(Steven) Li > > > > > > > > > > > > Cc: dpdk-dev ; Ferruh Yigit > > > > > > > > > > > > ; NBU-Contact-Thomas > > > > > > > > > > > > Monjalon > > > > ; > > > > > > > > > > > > Andrew Rybchenko > > > > > > > > > > > > Subject: Re: [dpdk-dev] [PATCH v1] ethdev: > > > > > > > > > > > > introduce shared Rx queue > > > > > > > > > > > > > > > > > > > > > > > > On Mon, Aug 9, 2021 at 7:46 PM Xueming(Steven) Li > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > > > > > > > > > > -Original Message- > > > > > > > > > > > > > > From: Jerin Jacob > > > > > > > > > > > > > > Sent: Monday, August 9, 2021 9:51 PM > > > > > > > > > > > > > > To: Xueming(Steven) Li > > > > > > > > > > > > > > Cc: dpdk-dev ; Ferruh Yigit > > > > > > > > > > > > > > ; > > > > > > > > > > > > > > NBU-Contact-Thomas Monjalon > > > > > > > > > > > > > > ; Andrew Rybchenko > > > > > > > > > > > > > > > > > > > > > > > > > > > > Subject: Re: [dpdk-dev] [PATCH v1] ethdev: 
> > > > > > > > > > > > > > introduce shared Rx queue > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Mon, Aug 9, 2021 at 5:18 PM Xueming Li > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > In current DPDK framework, each RX queue is > > > > > > > > > > > > > > > pre-loaded with mbufs > > > > > > > > > > > > > > > for incoming packets. When number of > > > > > > > > > > > > > > > representors scale out in a > > > > > > > > > > > > > > > switch domain, the memory consumption became > > > > > > > > > > > > > > > significant. Most > > > > > > > > > > > > > > > important, polling all ports leads to high > > > > > > > > > > > > > > > cache miss, high > > > > > > > > > > > > > > > latency and low throughput. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > This patch introduces shared RX queue. Ports > > > > > > > > > > > > > > > with same > > > > > > > > > > > > > > > configuration in a switch domain could share > > > > > > > > > > > > > > > RX queue set by specifying sharing group. > > > > > > > > > > > > > > > Polling any queue using same shared RX queue > > > > > > > > > > > > > > > receives packets from > > > > > > > > > > > > > > > all member ports. Source port is identified > > > > > > > > > > > > > > > by mbuf->port. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Port queue number in a shared group should be > > > > > > > > > > > > > > > identical. Queue > > > > > > > > > > > > > > > index is > > > > > > > > > > > > > > > 1:1 mapped in shared group. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Share RX queue is supposed to be polled on > > > > > > > > > > > > > > > same thread. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Multiple groups is supported by group ID. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Is this offload specific to the representor? 
If > > > > > > > > > > > > > > so can this name be changed specifically to > > > > > > > > > > > > > > representor? > > > > > > > > > > > > > > > > > > > > > > > > > > Yes, PF and representor in switch domain could > > > > > > > > > > > > > take advantage. > > > > > > > > > > > > > > > > > > > > > > > > > > > If it is for a generic case, how the flow > > > > > > > > > > > > > > ordering will be maintained? > > > > > > > > > > > > > > > > > > > > > > > > > > Not quite sure that I understood your question. > > > > > > > > > > > > > The control path of is > > > > > > > > > > > > > almost same as before, PF and representor port > > > > > > > > > > > > > still needed, rte flows not impacted. > > > > > > > > > > > > > Queues still needed for each member port, > > > > > > > > > > > > > descriptors(mbuf) will be > > > > > > > > > > > > > supplied from shared Rx queue in my PMD > > > > > > > > > > > > > implementation. > > > > > > > > > > > > > > > > > > > > > > > > My question was if create a generic > > > > > > > > > > > > RTE_ETH_
Re: [dpdk-dev] [EXT] [PATCH] cryptodev: add telemetry callbacks
On Wed, Sep 29, 2021 at 05:14:38AM +, Gowrishankar Muthukrishnan wrote:
> > An example usage can be seen below:
> >
> > Connecting to /var/run/dpdk/rte/dpdk_telemetry.v2
> > {"version": "DPDK 21.11.0-rc0", "pid": 1135019, "max_output_len": 16384}
> > --> /
> > {"/": ["/", "/cryptodev/list", "/cryptodev/stats", ...]}
> > --> /cryptodev/list
> > {"/cryptodev/list": {":1a:01.0_qat_sym": 0, ":1a:01.0_qat_asym": \
> >   1}}
>
> Will this be better if we list keys by port ID, as port ID is used as param
> in appropriate endpoints? Just a suggestion.
>
+1 to that suggestion
Re: [dpdk-dev] [PATCH v4] efd: change data type of parameter
Hi

> -----Original Message-----
> From: David Christensen
> Sent: Tuesday, September 28, 2021 4:53 PM
> To: De Lara Guarch, Pablo ; Wang, Yipeng1
> ; Marohn, Byron
> Cc: dev@dpdk.org; Mcnamara, John
> Subject: Re: [PATCH v4] efd: change data type of parameter
>
> On 9/28/21 6:58 AM, Pablo de Lara wrote:
> > rte_efd_create() function was using uint8_t for a socket bitmask, for
> > one of its parameters.
> > This limits the maximum number of NUMA sockets to 8.
> > Changing it to uint64_t increases the limit to 64, which should be more
> > future-proof.
> >
> > Coverity issue: 366390
> > Fixes: 56b6ef874f8 ("efd: new Elastic Flow Distributor library")
> >
> > Signed-off-by: Pablo de Lara
> > Acked-by: John McNamara
> > ---
> >
> > v4: Set socket id in EFD tests
> >
> > v3: Fixed commit message
> >
> > v2: Fixed EFD tests
>
> Results with v4 on a non-consecutive NUMA system: ...
> Test OK

Great! Thanks a lot for checking. Would you mind adding a Tested-by tag to
the patch?

Pablo
Re: [dpdk-dev] [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free
Hi Olivier, I wanted to retest the patch on latest main, but it no longer applies, could you please rebase it? Thanks, Ali > -Original Message- > From: Morten Brørup > Sent: Tuesday, September 28, 2021 12:40 PM > To: Slava Ovsiienko ; NBU-Contact-Thomas > Monjalon ; Olivier Matz ; > Ali Alnubani > Cc: dev@dpdk.org; David Marchand ; Alexander > Kozyrev ; Ferruh Yigit ; > zhaoyan.c...@intel.com; Andrew Rybchenko > ; Ananyev, Konstantin > ; Ajit Khaparde > ; jer...@marvell.com > Subject: RE: [dpdk-dev] [dpdk-stable] [PATCH v4] mbuf: fix reset on mbuf free > > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Slava Ovsiienko > > Sent: Tuesday, 28 September 2021 11.01 > > > > Hi, > > > > I've re-read the entire thread. > > If I understand correctly, the root problem was (in initial patch): > > > > > m1 = rte_pktmbuf_alloc(mp); > > > rte_pktmbuf_append(m1, 500); > > > m2 = rte_pktmbuf_alloc(mp); > > > rte_pktmbuf_append(m2, 500); > > > rte_pktmbuf_chain(m1, m2); > > > m0 = rte_pktmbuf_alloc(mp); > > > rte_pktmbuf_append(m0, 500); > > > rte_pktmbuf_chain(m0, m1); > > > > > > As rte_pktmbuf_chain() does not reset nb_seg in the initial m1 > > segment > > > (this is not required), after this code the mbuf chain have 3 > > > segments: > > > - m0: next=m1, nb_seg=3 > > > - m1: next=m2, nb_seg=2 > > > - m2: next=NULL, nb_seg=1 > > > > > The proposed fix was to ALWAYS set next and nb_seg fields on > > mbuf_free(), regardless next field content. That would perform > > unconditional write to mbuf, and might affect the configurations, > > where are no multi- segment packets at al. mbuf_free() is "backbone" > > API, it is used by all cases, all scenaries are affected. 
> > > > As far as I know, the current approach for nb_seg field - it contains > > other value than 1 only in the first mbuf , for the following > > segments, it should not be considered at all (only the first segment > > fields are valid), and it is supposed to contain 1, as it was > > initially allocated from the pool. > > > > In the example above the problem was introduced by > > rte_pktmbuf_chain(). Could we consider fixing the rte_pktmbuf_chain() > > (used in potentially fewer common sceneries) instead of touching the > > extremely common rte_mbuf_free() ? > > > > With best regards, > > Slava > > Great idea, Slava! > > Changing the invariant for 'nb_segs', so it must be 1, except in the first > segment > of a segmented packet. > > Thinking further about it, perhaps we can achieve even higher performance by a > minor additional modification: Use 0 instead of 1? Or offset 'nb_segs' by -1, > so it > reflects the number of additional segments? > > And perhaps combining the invariants for 'nb_segs' and 'next' could provide > even > more performance improvements. I don't know, just sharing a thought. > > Anyway, I vote for fixing the bug. One way or the other! > > -Morten > > > > > > -Original Message- > > > From: Thomas Monjalon > > > Sent: Tuesday, September 28, 2021 11:29 > > > > > > Follow-up again: > > > We have added a note in 21.08, we should fix it in 21.11. > > > If there are no counter proposal, I suggest applying this patch, no > > matter the > > > performance regression. > > > > > > > > > 30/07/2021 16:54, Thomas Monjalon: > > > > 30/07/2021 16:35, Morten Brørup: > > > > > > From: Olivier Matz [mailto:olivier.m...@6wind.com] > > > > > > Sent: Friday, 30 July 2021 14.37 > > > > > > > > > > > > Hi Thomas, > > > > > > > > > > > > On Sat, Jul 24, 2021 at 10:47:34AM +0200, Thomas Monjalon > > wrote: > > > > > > > What's the follow-up for this patch? > > > > > > > > > > > > Unfortunatly, I still don't have the time to work on this > > > > > > topic > > yet. 
> > > > > > > > > > > > In my initial tests, in our lab, I didn't notice any > > performance > > > > > > regression, but Ali has seen an impact (0.5M PPS, but I don't > > know > > > > > > how much in percent). > > > > > > > > > > > > > > > > > > > 19/01/2021 15:04, Slava Ovsiienko: > > > > > > > > Hi, All > > > > > > > > > > > > > > > > Could we postpose this patch at least to rc2? We would > > > > > > > > like > > to > > > > > > conduct more investigations? > > > > > > > > > > > > > > > > With best regards, Slava > > > > > > > > > > > > > > > > From: Olivier Matz > > > > > > > > > On Mon, Jan 18, 2021 at 05:52:32PM +, Ali Alnubani > > wrote: > > > > > > > > > > Hi, > > > > > > > > > > (Sorry had to resend this to some recipients due to > > mail > > > > > > > > > > server > > > > > > problems). > > > > > > > > > > > > > > > > > > > > Just confirming that I can still reproduce the > > regression > > > > > > > > > > with > > > > > > single core and > > > > > > > > > 64B frames on other servers. > > > > > > > > > > > > > > > > > > Many thanks for the feedback. Can you please detail what > > is > > > > > > > > > the > > > > > > amount of > > > > > > > > > performance loss in percent, and confirm the test case? > > (I > > > > > > suppose it
Re: [dpdk-dev] [PATCH v1] ethdev: introduce shared Rx queue
On Wed, Sep 29, 2021 at 1:11 PM Xueming(Steven) Li wrote: > > On Tue, 2021-09-28 at 20:29 +0530, Jerin Jacob wrote: > > On Tue, Sep 28, 2021 at 8:10 PM Xueming(Steven) Li > > wrote: > > > > > > On Tue, 2021-09-28 at 13:59 +, Ananyev, Konstantin wrote: > > > > > > > > > > On Tue, Sep 28, 2021 at 6:55 PM Xueming(Steven) Li > > > > > wrote: > > > > > > > > > > > > On Tue, 2021-09-28 at 18:28 +0530, Jerin Jacob wrote: > > > > > > > On Tue, Sep 28, 2021 at 5:07 PM Xueming(Steven) Li > > > > > > > wrote: > > > > > > > > > > > > > > > > On Tue, 2021-09-28 at 15:05 +0530, Jerin Jacob wrote: > > > > > > > > > On Sun, Sep 26, 2021 at 11:06 AM Xueming(Steven) Li > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > On Wed, 2021-08-11 at 13:04 +0100, Ferruh Yigit wrote: > > > > > > > > > > > On 8/11/2021 9:28 AM, Xueming(Steven) Li wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -Original Message- > > > > > > > > > > > > > From: Jerin Jacob > > > > > > > > > > > > > Sent: Wednesday, August 11, 2021 4:03 PM > > > > > > > > > > > > > To: Xueming(Steven) Li > > > > > > > > > > > > > Cc: dpdk-dev ; Ferruh Yigit > > > > > > > > > > > > > ; NBU-Contact-Thomas > > > > > > > > > > > > > Monjalon > > > > > ; > > > > > > > > > > > > > Andrew Rybchenko > > > > > > > > > > > > > Subject: Re: [dpdk-dev] [PATCH v1] ethdev: > > > > > > > > > > > > > introduce shared Rx queue > > > > > > > > > > > > > > > > > > > > > > > > > > On Mon, Aug 9, 2021 at 7:46 PM Xueming(Steven) Li > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -Original Message- > > > > > > > > > > > > > > > From: Jerin Jacob > > > > > > > > > > > > > > > Sent: Monday, August 9, 2021 9:51 PM > > > > > > > > > > > > > > > To: Xueming(Steven) Li > > > > > > > > > > > > > > > Cc: dpdk-dev ; Ferruh Yigit > > > > > > > > > > > > > > > ; > > > > > > > > > > > > > > > NBU-Contact-Thomas 
Monjalon > > > > > > > > > > > > > > > ; Andrew Rybchenko > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Subject: Re: [dpdk-dev] [PATCH v1] ethdev: > > > > > > > > > > > > > > > introduce shared Rx queue > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Mon, Aug 9, 2021 at 5:18 PM Xueming Li > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > In current DPDK framework, each RX queue is > > > > > > > > > > > > > > > > pre-loaded with mbufs > > > > > > > > > > > > > > > > for incoming packets. When number of > > > > > > > > > > > > > > > > representors scale out in a > > > > > > > > > > > > > > > > switch domain, the memory consumption became > > > > > > > > > > > > > > > > significant. Most > > > > > > > > > > > > > > > > important, polling all ports leads to high > > > > > > > > > > > > > > > > cache miss, high > > > > > > > > > > > > > > > > latency and low throughput. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > This patch introduces shared RX queue. Ports > > > > > > > > > > > > > > > > with same > > > > > > > > > > > > > > > > configuration in a switch domain could share > > > > > > > > > > > > > > > > RX queue set by specifying sharing group. > > > > > > > > > > > > > > > > Polling any queue using same shared RX queue > > > > > > > > > > > > > > > > receives packets from > > > > > > > > > > > > > > > > all member ports. Source port is identified > > > > > > > > > > > > > > > > by mbuf->port. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Port queue number in a shared group should be > > > > > > > > > > > > > > > > identical. Queue > > > > > > > > > > > > > > > > index is > > > > > > > > > > > > > > > > 1:1 mapped in shared group. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Share RX queue is supposed to be polled on > > > > > > > > > > > > > > > > same thread. 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Multiple groups is supported by group ID. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Is this offload specific to the representor? If > > > > > > > > > > > > > > > so can this name be changed specifically to > > > > > > > > > > > > > > > representor? > > > > > > > > > > > > > > > > > > > > > > > > > > > > Yes, PF and representor in switch domain could > > > > > > > > > > > > > > take advantage. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > If it is for a generic case, how the flow > > > > > > > > > > > > > > > ordering will be maintained? > > > > > > > > > > > > > > > > > > > > > > > > > > > > Not quite sure that I understood your question. > > > > > > > > > > > > > > The control path of is > > > > > > > > > > > > > > almost same as before, PF and representor port > > > > > > > > > > > > > > still needed, rte flows not impacted. > > > > > > > > > > > > > > Queues still needed for each member port, > > > > > > > > > > > > > > descriptors(mb
Re: [dpdk-dev] [PATCH] ethdev: remove legacy mirroring API
Hi Andrew, Thanks a lot for working on this. I had sent the deprecation notice and forgot about it, apologies. 28/09/2021 18:16, Andrew Rybchenko: > A more file grain flow API should be used instead of it. Do you mean "fine-grain"? I think we can reference the rte_flow action to be used: RTE_FLOW_ACTION_TYPE_SAMPLE > --- a/doc/guides/rel_notes/release_21_11.rst > +++ b/doc/guides/rel_notes/release_21_11.rst > +* ethdev: Removed the port mirroring API. A more fine grain flow API may > + be used instead. The structures ``rte_eth_mirror_conf`` and > + ``rte_eth_vlan_mirror`` and the functions ``rte_eth_mirror_rule_set`` > + and ``rte_eth_mirror_rule_reset`` along with the associated macros > + ``ETH_MIRROR_*`` are removed. Here as well we can mention the replacement RTE_FLOW_ACTION_TYPE_SAMPLE.
Re: [dpdk-dev] [PATCH v4 06/13] net/bnxt: add support for tunnel offload API
28/09/2021 23:32, Ajit Khaparde: > On Tue, Sep 28, 2021 at 5:43 AM Ferruh Yigit wrote: > > Following warning is reported by Jerin & Thomas: > > > > $ ./devtools/check-doc-vs-code.sh > > rte_flow doc out of sync for bnxt > > action shared > > > > > > This is mainly saying 'RTE_FLOW_ACTION_TYPE_SHARED' flow action is > > supported but > > not documented. But from the code I am not sure if the shared action is used, there > > is > > only one range check using this enum in the code, can you please clarify? > > Correct. RTE_FLOW_ACTION_TYPE_SHARED is being used only for a range check. > It is not really a supported action right now as far as the code is concerned. > As of now, no document update is necessary. Thanks You need to update something to make the check OK. If it is a false positive, it can be handled in the script, but we cannot let the script return an error.
[dpdk-dev] [PATCH v2] ethdev: remove legacy mirroring API
A more fine-grain flow API action RTE_FLOW_ACTION_TYPE_SAMPLE should be used instead of it. Signed-off-by: Andrew Rybchenko --- v2: - fix description - mention RTE_FLOW_ACTION_TYPE_SAMPLE in the descriptor and release notes - fix net/txgbe broken build app/test-pmd/cmdline.c | 283 doc/guides/nics/features.rst| 11 - doc/guides/nics/features/default.ini| 1 - doc/guides/nics/features/i40e.ini | 1 - doc/guides/nics/features/ipn3ke.ini | 1 - doc/guides/nics/features/ixgbe.ini | 1 - doc/guides/rel_notes/deprecation.rst| 6 - doc/guides/rel_notes/release_21_11.rst | 7 + doc/guides/testpmd_app_ug/testpmd_funcs.rst | 25 -- drivers/net/i40e/i40e_ethdev.c | 352 drivers/net/i40e/i40e_ethdev.h | 23 -- drivers/net/ixgbe/ixgbe_ethdev.c| 192 --- drivers/net/ixgbe/ixgbe_ethdev.h| 9 - drivers/net/ixgbe/ixgbe_pf.c| 3 - drivers/net/txgbe/txgbe_ethdev.h| 8 - drivers/net/txgbe/txgbe_pf.c| 2 - lib/ethdev/ethdev_driver.h | 13 - lib/ethdev/rte_ethdev.c | 61 lib/ethdev/rte_ethdev.h | 77 - lib/ethdev/version.map | 2 - 20 files changed, 7 insertions(+), 1071 deletions(-) diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index a9efd027c3..826256b0b3 100644 --- a/app/test-pmd/cmdline.c +++ b/app/test-pmd/cmdline.c @@ -567,24 +567,6 @@ static void cmd_help_long_parsed(void *parsed_result, "queue_mask (queue_mask_value)\n" "Set rate limit for queues in VF of a port\n\n" - "set port (port_id) mirror-rule (rule_id)" - " (pool-mirror-up|pool-mirror-down|vlan-mirror)" - " (poolmask|vlanid[,vlanid]*) dst-pool (pool_id) (on|off)\n" - " Set pool or vlan type mirror rule on a port.\n" - " e.g., 'set port 0 mirror-rule 0 vlan-mirror 0,1" - " dst-pool 0 on' enable mirror traffic with vlan 0,1" - " to pool 0.\n\n" - - "set port (port_id) mirror-rule (rule_id)" - " (uplink-mirror|downlink-mirror) dst-pool" - " (pool_id) (on|off)\n" - " Set uplink or downlink type mirror rule on a port.\n" - " e.g., 'set port 0 mirror-rule 0 uplink-mirror dst-pool" - " 0 on' enable mirror income traffic to pool 
0.\n\n" - - "reset port (port_id) mirror-rule (rule_id)\n" - " Reset a mirror rule.\n\n" - "set flush_rx (on|off)\n" " Flush (default) or don't flush RX streams before" " forwarding. Mainly used with PCAP drivers.\n\n" @@ -9494,268 +9476,6 @@ cmdline_parse_inst_t cmd_cfg_tunnel_udp_port = { }, }; -/* *** CONFIGURE VM MIRROR VLAN/POOL RULE *** */ -struct cmd_set_mirror_mask_result { - cmdline_fixed_string_t set; - cmdline_fixed_string_t port; - portid_t port_id; - cmdline_fixed_string_t mirror; - uint8_t rule_id; - cmdline_fixed_string_t what; - cmdline_fixed_string_t value; - cmdline_fixed_string_t dstpool; - uint8_t dstpool_id; - cmdline_fixed_string_t on; -}; - -cmdline_parse_token_string_t cmd_mirror_mask_set = - TOKEN_STRING_INITIALIZER(struct cmd_set_mirror_mask_result, - set, "set"); -cmdline_parse_token_string_t cmd_mirror_mask_port = - TOKEN_STRING_INITIALIZER(struct cmd_set_mirror_mask_result, - port, "port"); -cmdline_parse_token_num_t cmd_mirror_mask_portid = - TOKEN_NUM_INITIALIZER(struct cmd_set_mirror_mask_result, - port_id, RTE_UINT16); -cmdline_parse_token_string_t cmd_mirror_mask_mirror = - TOKEN_STRING_INITIALIZER(struct cmd_set_mirror_mask_result, - mirror, "mirror-rule"); -cmdline_parse_token_num_t cmd_mirror_mask_ruleid = - TOKEN_NUM_INITIALIZER(struct cmd_set_mirror_mask_result, - rule_id, RTE_UINT8); -cmdline_parse_token_string_t cmd_mirror_mask_what = - TOKEN_STRING_INITIALIZER(struct cmd_set_mirror_mask_result, - what, "pool-mirror-up#pool-mirror-down" - "#vlan-mirror"); -cmdline_parse_token_string_t cmd_mirror_mask_value = - TOKEN_STRING_INITIALIZER(struct cmd_set_mirror_mask_result, - value, NULL); -cmdline_parse_token_string_t cmd_mirror_mask_dstpool = - TOKEN_STRING_INITIALIZER(struct cmd_set_mirror_mask_result, -
Re: [dpdk-dev] [PATCH v1] ethdev: introduce shared Rx queue
On Wed, 2021-09-29 at 00:26 +, Ananyev, Konstantin wrote: > > > > > > > > > > > > > > > In current DPDK framework, each RX queue > > > > > > > > > > > > > > > is > > > > > > > > > > > > > > > pre-loaded with mbufs > > > > > > > > > > > > > > > for incoming packets. When number of > > > > > > > > > > > > > > > representors scale out in a > > > > > > > > > > > > > > > switch domain, the memory consumption > > > > > > > > > > > > > > > became > > > > > > > > > > > > > > > significant. Most > > > > > > > > > > > > > > > important, polling all ports leads to > > > > > > > > > > > > > > > high > > > > > > > > > > > > > > > cache miss, high > > > > > > > > > > > > > > > latency and low throughput. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > This patch introduces shared RX queue. > > > > > > > > > > > > > > > Ports > > > > > > > > > > > > > > > with same > > > > > > > > > > > > > > > configuration in a switch domain could > > > > > > > > > > > > > > > share > > > > > > > > > > > > > > > RX queue set by specifying sharing group. > > > > > > > > > > > > > > > Polling any queue using same shared RX > > > > > > > > > > > > > > > queue > > > > > > > > > > > > > > > receives packets from > > > > > > > > > > > > > > > all member ports. Source port is > > > > > > > > > > > > > > > identified > > > > > > > > > > > > > > > by mbuf->port. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Port queue number in a shared group > > > > > > > > > > > > > > > should be > > > > > > > > > > > > > > > identical. Queue > > > > > > > > > > > > > > > index is > > > > > > > > > > > > > > > 1:1 mapped in shared group. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Share RX queue is supposed to be polled > > > > > > > > > > > > > > > on > > > > > > > > > > > > > > > same thread. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Multiple groups is supported by group ID. 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Is this offload specific to the > > > > > > > > > > > > > > representor? If > > > > > > > > > > > > > > so can this name be changed specifically to > > > > > > > > > > > > > > representor? > > > > > > > > > > > > > > > > > > > > > > > > > > Yes, PF and representor in switch domain > > > > > > > > > > > > > could > > > > > > > > > > > > > take advantage. > > > > > > > > > > > > > > > > > > > > > > > > > > > If it is for a generic case, how the flow > > > > > > > > > > > > > > ordering will be maintained? > > > > > > > > > > > > > > > > > > > > > > > > > > Not quite sure that I understood your > > > > > > > > > > > > > question. > > > > > > > > > > > > > The control path of is > > > > > > > > > > > > > almost same as before, PF and representor > > > > > > > > > > > > > port > > > > > > > > > > > > > still needed, rte flows not impacted. > > > > > > > > > > > > > Queues still needed for each member port, > > > > > > > > > > > > > descriptors(mbuf) will be > > > > > > > > > > > > > supplied from shared Rx queue in my PMD > > > > > > > > > > > > > implementation. > > > > > > > > > > > > > > > > > > > > > > > > My question was if create a generic > > > > > > > > > > > > RTE_ETH_RX_OFFLOAD_SHARED_RXQ offload, multiple > > > > > > > > > > > > ethdev receive queues land into > > > > the same > > > > > > > > > > > > receive queue, In that case, how the flow order > > > > > > > > > > > > is > > > > > > > > > > > > maintained for respective receive queues. > > > > > > > > > > > > > > > > > > > > > > I guess the question is testpmd forward stream? > > > > > > > > > > > The > > > > > > > > > > > forwarding logic has to be changed slightly in > > > > > > > > > > > case > > > > > > > > > > > of shared rxq. 
> > > > > > > > > > > basically for each packet in rx_burst result, > > > > > > > > > > > lookup > > > > > > > > > > > source stream according to mbuf->port, forwarding > > > > > > > > > > > to > > > > > > > > > > > target fs. > > > > > > > > > > > Packets from same source port could be grouped as > > > > > > > > > > > a > > > > > > > > > > > small burst to process, this will accelerates the > > > > > > > > > > > performance if traffic > > > > come from > > > > > > > > > > > limited ports. I'll introduce some common api to > > > > > > > > > > > do > > > > > > > > > > > shard rxq forwarding, call it with packets > > > > > > > > > > > handling > > > > > > > > > > > callback, so it suites for > > > > > > > > > > > all forwarding engine. Will sent patches soon. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > All ports will put the packets in to the same queue > > > > > > > > > > (share queue), right? Does > > > > > > > > > > this means only single core will poll only, what > > > > > > > > > > will > > > > > > > > > > happen if there are > > > > > > > > > > multiple cores polling, won't it cause problem? > > > > > > > > > > > > > > > > > > > > And if this requires specific changes in the > > > > > > > > > > application, I am not sure about > > > > > > > > > > t
Re: [dpdk-dev] [PATCH] ethdev: remove legacy mirroring API
Hi Thomas, On 9/29/21 11:17 AM, Thomas Monjalon wrote: > Hi Andrew, > Thanks a lot for working on this. > I had sent the deprecation notice and forgot about it, apologies. No worries. I always liked deleting code. I think it is my favorite part of programming :-) > > 28/09/2021 18:16, Andrew Rybchenko: >> A more file grain flow API should be used instead of it. > > Do you mean "fine-grain"? Thanks. Fixed. > I think we can reference the rte_flow action to be used: > RTE_FLOW_ACTION_TYPE_SAMPLE Yes, I think it is useful; added in v2. > >> --- a/doc/guides/rel_notes/release_21_11.rst >> +++ b/doc/guides/rel_notes/release_21_11.rst >> +* ethdev: Removed the port mirroring API. A more fine grain flow API may >> + be used instead. The structures ``rte_eth_mirror_conf`` and >> + ``rte_eth_vlan_mirror`` and the functions ``rte_eth_mirror_rule_set`` >> + and ``rte_eth_mirror_rule_reset`` along with the associated macros >> + ``ETH_MIRROR_*`` are removed. > > Here as well we can mention the replacement RTE_FLOW_ACTION_TYPE_SAMPLE. Yes, added in v2.
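For readers looking for the replacement in practice, a testpmd session using the SAMPLE action to mirror port 0 ingress traffic to port 1 could look roughly like the following. This is a hedged sketch: the exact syntax is the testpmd flow-command grammar, and whether it works depends on the PMD supporting the sample action.

```
testpmd> set sample_actions 0 port_id id 1 / end
testpmd> flow create 0 ingress pattern eth / end actions sample ratio 1 index 0 / end
```

The first command defines the action list executed on sampled packets (forward to port 1); the second attaches a sample action with ratio 1 (mirror every packet) to all ingress Ethernet traffic on port 0, which is the rough equivalent of an uplink mirror rule in the removed API.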
Re: [dpdk-dev] [dpdk-stable] [PATCH v5 2/2] ethdev: fix docs of drivers callbacks getting xstats by IDs
On 9/28/2021 5:53 PM, Andrew Rybchenko wrote: > On 9/28/21 7:50 PM, Ferruh Yigit wrote: >> On 9/28/2021 1:05 PM, Andrew Rybchenko wrote: >>> From: Ivan Ilchenko >>> >>> Update xstats by IDs callbacks documentation in accordance with >>> ethdev usage of these callbacks. Document valid combinations of >>> input arguments to make driver implementation simpler. >>> >>> Fixes: 79c913a42f0 ("ethdev: retrieve xstats by ID") >>> Cc: sta...@dpdk.org >>> >>> Signed-off-by: Ivan Ilchenko >>> Signed-off-by: Andrew Rybchenko >>> Reviewed-by: Andy Moreton >>> --- >>> lib/ethdev/ethdev_driver.h | 42 -- >>> 1 file changed, 40 insertions(+), 2 deletions(-) >>> >>> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h >>> index 40e474aa7e..c89eefcc42 100644 >>> --- a/lib/ethdev/ethdev_driver.h >>> +++ b/lib/ethdev/ethdev_driver.h >>> @@ -187,11 +187,28 @@ typedef int (*eth_xstats_get_t)(struct rte_eth_dev >>> *dev, >>> struct rte_eth_xstat *stats, unsigned int n); >>> /**< @internal Get extended stats of an Ethernet device. */ >>> >>> +/** >>> + * @internal >>> + * Get extended stats of an Ethernet device. >>> + * >>> + * @param dev >>> + * ethdev handle of port. >>> + * @param ids >>> + * IDs array to retrieve specific statistics. Must not be NULL. >>> + * @param values >>> + * A pointer to a table to be filled with device statistics values. >>> + * Must not be NULL. >>> + * @param n >>> + * Element count in @p ids and @p values. >>> + * >>> + * @return >>> + * - A number of filled in stats. >>> + * - A negative value on error. >>> + */ >>> typedef int (*eth_xstats_get_by_id_t)(struct rte_eth_dev *dev, >>> const uint64_t *ids, >>> uint64_t *values, >>> unsigned int n); >>> -/**< @internal Get extended stats of an Ethernet device. 
*/ >>> >>> /** >>> * @internal >>> @@ -218,10 +235,31 @@ typedef int (*eth_xstats_get_names_t)(struct >>> rte_eth_dev *dev, >>> struct rte_eth_xstat_name *xstats_names, unsigned int size); >>> /**< @internal Get names of extended stats of an Ethernet device. */ >>> >>> +/** >>> + * @internal >>> + * Get names of extended stats of an Ethernet device. >>> + * For name count, set @p xstats_names and @p ids to NULL. >> >> Why limiting this behavior to 'xstats_get_names_by_id'? >> >> Internally 'xstats_get_names_by_id' is used to get the count, but I think >> this >> is not intentionally selected, just one of the xstats_*_by_id dev_ops used. >> >> I think it is confusing to have this support for one of the '_by_id' dev_ops >> but >> not for other. Why not require both to support returning 'count'? > > Simply because it is dead code. There is no point to require > from driver to have dead code. > Let me step back a little, both ethdev APIs can be used to return xstats count by providing 'values/names' & 'ids' pointers as NULL and 'size' as 0: 'rte_eth_xstats_get_names_by_id()' 'rte_eth_xstats_get_by_id()' But internally both APIs use 'xstats_get_names_by_id' dev_ops to get the count, as said above I believe this selection is done unintentionally. I am for below two options: a) Internally use 'xstats_get_names' || 'xstats_get' dev_ops to get the xstats count, and doesn't support getting xstats count for both '_by_id' dev_ops, this simplifies driver code. As far as I remember I suggested this before, still I prefer this one. b) If we will support getting xstats count from '_by_id' dev_ops, I think both should support it, to not make it more complex to figure out which one support what. As sample both 'xstats_get_names' and 'xstats_get' supports getting xstats count, not just one.
Re: [dpdk-dev] [PATCH v2] telemetry: fix "in-memory" process socket conflicts
Hi Bruce, >-Original Message- >From: Richardson, Bruce >Sent: Friday 24 September 2021 17:19 >To: dev@dpdk.org >Cc: Power, Ciara ; Burakov, Anatoly >; Richardson, Bruce >; sta...@dpdk.org; David Marchand > >Subject: [PATCH v2] telemetry: fix "in-memory" process socket conflicts > >When DPDK is run with --in-memory mode, multiple processes can run >simultaneously using the same runtime dir. This leads to each process >removing another process' telemetry socket as it started up, giving >unexpected behaviour. > >This patch changes that behaviour to first check if the existing socket is >active. >If not, it's an old socket to be cleaned up and can be removed. If it is >active, >telemetry initialization fails and an error message is printed out giving >instructions on how to remove the error; either by using file-prefix to have a >different runtime dir (and therefore socket path) or by disabling telemetry if >it >not needed. > >Fixes: 6dd571fd07c3 ("telemetry: introduce new functionality") >Cc: sta...@dpdk.org > >Reported-by: David Marchand >Signed-off-by: Bruce Richardson >--- >v2: fix build error on FreeBSD >--- Acked-by: Ciara Power Thanks!
[dpdk-dev] [v4] crypto/cnxk: add telemetry endpoints to cryptodev
Add telemetry endpoints to cryptodev. Signed-off-by: Gowrishankar Muthukrishnan --- Depends-on: patch-19252 ("cryptodev: add telemetry endpoint for cryptodev info") Depends-on: series-19253 ("cnxk: enable telemetry endpoints") v4: - fix compilation issue. --- .../crypto/cnxk/cnxk_cryptodev_telemetry.c| 119 ++ drivers/crypto/cnxk/meson.build | 1 + 2 files changed, 120 insertions(+) create mode 100644 drivers/crypto/cnxk/cnxk_cryptodev_telemetry.c diff --git a/drivers/crypto/cnxk/cnxk_cryptodev_telemetry.c b/drivers/crypto/cnxk/cnxk_cryptodev_telemetry.c new file mode 100644 index 00..a9df7e49c3 --- /dev/null +++ b/drivers/crypto/cnxk/cnxk_cryptodev_telemetry.c @@ -0,0 +1,119 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(C) 2021 Marvell. + */ + +#include +#include +#include + +#include +#include + +#include "cnxk_cryptodev.h" + +#define CRYPTO_CAPS_SZ \ + (RTE_ALIGN_CEIL(sizeof(struct rte_cryptodev_capabilities), \ + sizeof(uint64_t)) /\ +sizeof(uint64_t)) + +#define SEC_CAPS_SZ \ + (RTE_ALIGN_CEIL(sizeof(struct rte_security_capability),\ + sizeof(uint64_t)) /\ +sizeof(uint64_t)) + +static int +crypto_caps_array(struct rte_tel_data *d, + struct rte_cryptodev_capabilities *dev_caps, + size_t dev_caps_n) +{ + union caps_u { + struct rte_cryptodev_capabilities dev_caps; + uint64_t val[CRYPTO_CAPS_SZ]; + } caps; + unsigned int i, j, n = 0; + + rte_tel_data_start_array(d, RTE_TEL_U64_VAL); + + for (i = 0; i < dev_caps_n; i++) { + if (dev_caps[i].op == RTE_CRYPTO_OP_TYPE_UNDEFINED) + break; + + memset(&caps, 0, sizeof(caps)); + rte_memcpy(&caps.dev_caps, &dev_caps[i], sizeof(dev_caps[0])); + for (j = 0; j < CRYPTO_CAPS_SZ; j++) + rte_tel_data_add_array_u64(d, caps.val[j]); + ++n; + } + + return n; +} + +static int +sec_caps_array(struct rte_tel_data *d, struct rte_security_capability *dev_caps, + size_t dev_caps_n) +{ + union caps_u { + struct rte_security_capability dev_caps; + uint64_t val[SEC_CAPS_SZ]; + } caps; + unsigned int i, j, n = 0; + + 
rte_tel_data_start_array(d, RTE_TEL_U64_VAL); + + for (i = 0; i < dev_caps_n; i++) { + memset(&caps, 0, sizeof(caps)); + rte_memcpy(&caps.dev_caps, &dev_caps[i], sizeof(dev_caps[0])); + for (j = 0; j < SEC_CAPS_SZ; j++) + rte_tel_data_add_array_u64(d, caps.val[j]); + ++n; + } + + return n; +} + +static int +cryptodev_tel_handle_info(const char *cmd __rte_unused, const char *params, + struct rte_tel_data *d) +{ + struct rte_tel_data *sec_crypto_caps, *sec_caps; + char name[RTE_CRYPTODEV_NAME_MAX_LEN]; + int sec_crypto_caps_n, sec_caps_n; + struct rte_cryptodev *dev; + struct cnxk_cpt_vf *vf; + + if (params == NULL || strlen(params) == 0 || !isdigit(*params)) + return -EINVAL; + + rte_strlcpy(name, params, RTE_CRYPTODEV_NAME_LEN); + dev = rte_cryptodev_pmd_get_named_dev(name); + if (!dev) { + plt_err("No cryptodev of name %s available", name); + return -EINVAL; + } + + vf = dev->data->dev_private; + rte_tel_data_start_dict(d); + + /* Security Crypto capabilities */ + sec_crypto_caps = rte_tel_data_alloc(); + sec_crypto_caps_n = crypto_caps_array( + sec_crypto_caps, vf->sec_crypto_caps, CNXK_SEC_CRYPTO_MAX_CAPS); + rte_tel_data_add_dict_container(d, "sec_crypto_caps", sec_crypto_caps, + 0); + rte_tel_data_add_dict_int(d, "sec_crypto_caps_n", sec_crypto_caps_n); + + /* Security capabilities */ + sec_caps = rte_tel_data_alloc(); + sec_caps_n = sec_caps_array(sec_caps, vf->sec_caps, CNXK_SEC_MAX_CAPS); + rte_tel_data_add_dict_container(d, "sec_caps", sec_caps, 0); + rte_tel_data_add_dict_int(d, "sec_caps_n", sec_caps_n); + + return 0; +} + +RTE_INIT(cnxk_cryptodev_init_telemetry) +{ + rte_telemetry_register_cmd( + "/cnxk/cryptodev/info", cryptodev_tel_handle_info, + "Returns cryptodev info. Parameters: pci id"); +} diff --git a/drivers/crypto/cnxk/meson.build b/drivers/crypto/cnxk/meson.build index 437d208b5a..4350928289 100644 --- a/drivers/crypto/cnxk/meson.build +++ b/drivers/crypto/cnxk/meson.build @@ -19,6 +19,7 @@ sources = files( 'cnxk_cryptodev_capabil
Re: [dpdk-dev] [PATCH v2] ethdev: remove legacy mirroring API
29/09/2021 10:39, Andrew Rybchenko: > A more fine-grain flow API action RTE_FLOW_ACTION_TYPE_SAMPLE should > be used instead of it. > > Signed-off-by: Andrew Rybchenko > --- > v2: > - fix description > - mention RTE_FLOW_ACTION_TYPE_SAMPLE in the descriptor and > release notes > - fix net/txgbe broken build Acked-by: Thomas Monjalon Thanks
Re: [dpdk-dev] [PATCH v2 1/5] net/virtio: add initial RSS support
On 9/23/21 09:35, Xia, Chenbo wrote: Hi Maxime, -Original Message- From: Maxime Coquelin Sent: Wednesday, September 22, 2021 5:58 PM To: dev@dpdk.org; Xia, Chenbo ; amore...@redhat.com; david.march...@redhat.com; andrew.rybche...@oktetlabs.ru; Yigit, Ferruh ; michae...@nvidia.com; viachesl...@nvidia.com; Li, Xiaoyun Cc: sta...@dpdk.org; nelio.laranje...@6wind.com; yvuge...@redhat.com; ybend...@redhat.com; Maxime Coquelin Subject: [PATCH v2 1/5] net/virtio: add initial RSS support Provide the capability to update the hash key, hash types and RETA table on the fly (without needing to stop/start the device). However, the key length and the number of RETA entries are fixed to 40B and 128 entries respectively. This is done in order to simplify the design, but may be revisited later as the Virtio spec provides this flexibility. Note that only VIRTIO_NET_F_RSS support is implemented, VIRTIO_NET_F_HASH_REPORT, which would enable reporting the packet RSS hash calculated by the device into mbuf.rss, is not yet supported. Regarding the default RSS configuration, it has been chosen to use the default Intel ixgbe key as default key, and default RETA is a simple modulo between the hash and the number of Rx queues. 
Signed-off-by: Maxime Coquelin --- doc/guides/nics/features/virtio.ini| 3 + doc/guides/nics/virtio.rst | 3 + doc/guides/rel_notes/release_21_11.rst | 6 + drivers/net/virtio/virtio.h| 31 ++- drivers/net/virtio/virtio_ethdev.c | 367 - drivers/net/virtio/virtio_ethdev.h | 3 +- drivers/net/virtio/virtqueue.h | 21 ++ 7 files changed, 428 insertions(+), 6 deletions(-) diff --git a/doc/guides/nics/features/virtio.ini b/doc/guides/nics/features/virtio.ini index 48f6f393b1..a5eab4932f 100644 --- a/doc/guides/nics/features/virtio.ini +++ b/doc/guides/nics/features/virtio.ini @@ -14,6 +14,9 @@ Promiscuous mode = Y Allmulticast mode= Y Unicast MAC filter = Y Multicast MAC filter = Y +RSS hash = P +RSS key update = Y +RSS reta update = Y VLAN filter = Y Basic stats = Y Stats per queue = Y diff --git a/doc/guides/nics/virtio.rst b/doc/guides/nics/virtio.rst index 82ce7399ce..98e0d012b7 100644 --- a/doc/guides/nics/virtio.rst +++ b/doc/guides/nics/virtio.rst @@ -73,6 +73,9 @@ In this release, the virtio PMD driver provides the basic functionality of packe * Virtio supports using port IO to get PCI resource when UIO module is not available. +* Virtio supports RSS Rx mode with 40B configurable hash key length, 128 +configurable RETA entries and configurable hash types. + Prerequisites - diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst index f5d16993db..2f9d81926b 100644 --- a/doc/guides/rel_notes/release_21_11.rst +++ b/doc/guides/rel_notes/release_21_11.rst @@ -96,6 +96,12 @@ New Features Added command-line options to specify total number of processes and current process ID. Each process owns subset of Rx and Tx queues. +* **Added initial RSS support to Virtio PMD.** + + Initial support for RSS receive mode has been added to the Virtio PMD, + with the capability for the application to configure the hash key, the + RETA and the hash types. Virtio hash reporting is yet to be added. 
+ Removed Items - diff --git a/drivers/net/virtio/virtio.h b/drivers/net/virtio/virtio.h index 525e2dad4c..b4f21dc0c7 100644 --- a/drivers/net/virtio/virtio.h +++ b/drivers/net/virtio/virtio.h @@ -30,6 +30,7 @@ #define VIRTIO_NET_F_GUEST_ANNOUNCE 21/* Guest can announce device on the network */ #define VIRTIO_NET_F_MQ 22 /* Device supports Receive Flow Steering */ #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */ +#define VIRTIO_NET_F_RSS 60 /* RSS supported */ /* * Do we get callbacks when the ring is completely used, @@ -100,6 +101,29 @@ */ #define VIRTIO_MAX_INDIRECT ((int)(rte_mem_page_size() / 16)) +/* Virtio RSS hash types */ +#define VIRTIO_NET_HASH_TYPE_IPV4 (1 << 0) +#define VIRTIO_NET_HASH_TYPE_TCPV4 (1 << 1) +#define VIRTIO_NET_HASH_TYPE_UDPV4 (1 << 2) +#define VIRTIO_NET_HASH_TYPE_IPV6 (1 << 3) +#define VIRTIO_NET_HASH_TYPE_TCPV6 (1 << 4) +#define VIRTIO_NET_HASH_TYPE_UDPV6 (1 << 5) Spec uses 'v' instead of 'V' for macro definition. Better to align it? I actually put it in uper case on purpose. I don't have a strong opinion on this, but prefer to keep defines in upper case. The spec lacks consistency on this, as VIRTIO_NET_HDR_GSO_TCPV6 is all upper case for example. +#define VIRTIO_NET_HASH_TYPE_IP_EX (1 << 6) +#define VIRTIO_NET_HASH_TYPE_TCP_EX(1 << 7) +#define VIRTIO_NET_HASH_TYPE_UDP_EX(1 << 8) + +#define VIRTIO_NET_HASH_TYPE_MASK ( \ + VIRTIO_NET_HASH_TYPE_IPV4 | \ + VIRTIO_NET_HASH_TYPE_TCPV4 | \ + VIRTIO_NET_HASH_TYPE_U
[dpdk-dev] [PATCH v2 0/3] add SA config option for inner pkt csum
Add inner packet IPv4 hdr and L4 checksum enable options in conf. These will be used in case of protocol offload. Per SA, application could specify whether the checksum(compute/verify) can be offloaded to security device. Depends on https://patches.dpdk.org/project/dpdk/list/?series=19243 Changes in v2: - Fixed release notes - Added feature flag in default.ini and cn10k.ini - Fixed test patch subject Archana Muniganti (3): security: add SA config option for inner pkt csum crypto/cnxk: add inner checksum test/crypto: add inner checksum cases app/test/test_cryptodev.c | 34 +++ app/test/test_cryptodev_security_ipsec.c | 195 ++ app/test/test_cryptodev_security_ipsec.h | 2 + ...st_cryptodev_security_ipsec_test_vectors.h | 118 +++ doc/guides/cryptodevs/features/cn10k.ini | 1 + doc/guides/cryptodevs/features/default.ini| 1 + doc/guides/rel_notes/deprecation.rst | 4 +- doc/guides/rel_notes/release_21_11.rst| 6 + drivers/crypto/cnxk/cn10k_cryptodev_ops.c | 65 -- drivers/crypto/cnxk/cn10k_ipsec.c | 49 - drivers/crypto/cnxk/cn10k_ipsec.h | 1 + drivers/crypto/cnxk/cn10k_ipsec_la_ops.h | 9 +- drivers/crypto/cnxk/cnxk_cryptodev.c | 3 + .../crypto/cnxk/cnxk_cryptodev_capabilities.c | 2 + lib/cryptodev/rte_cryptodev.h | 2 + lib/security/rte_security.h | 18 ++ 16 files changed, 490 insertions(+), 20 deletions(-) -- 2.22.0
[dpdk-dev] [PATCH v2 1/3] security: add SA config option for inner pkt csum
Add inner packet IPv4 hdr and L4 checksum enable options in conf. These will be used in case of protocol offload. Per SA, application could specify whether the checksum(compute/verify) can be offloaded to security device. Signed-off-by: Archana Muniganti --- doc/guides/cryptodevs/features/default.ini | 1 + doc/guides/rel_notes/deprecation.rst | 4 ++-- doc/guides/rel_notes/release_21_11.rst | 4 lib/cryptodev/rte_cryptodev.h | 2 ++ lib/security/rte_security.h| 18 ++ 5 files changed, 27 insertions(+), 2 deletions(-) diff --git a/doc/guides/cryptodevs/features/default.ini b/doc/guides/cryptodevs/features/default.ini index c24814de98..96d95ddc81 100644 --- a/doc/guides/cryptodevs/features/default.ini +++ b/doc/guides/cryptodevs/features/default.ini @@ -33,6 +33,7 @@ Non-Byte aligned data = Sym raw data path API = Cipher multiple data units = Cipher wrapped key = +Inner checksum = ; ; Supported crypto algorithms of a default crypto driver. diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst index 05fc2fdee7..8308e00ed4 100644 --- a/doc/guides/rel_notes/deprecation.rst +++ b/doc/guides/rel_notes/deprecation.rst @@ -232,8 +232,8 @@ Deprecation Notices IPsec payload MSS (Maximum Segment Size), and ESN (Extended Sequence Number). * security: The IPsec SA config options ``struct rte_security_ipsec_sa_options`` - will be updated with new fields to support new features like IPsec inner - checksum, TSO in case of protocol offload. + will be updated with new fields to support new features like TSO in case of + protocol offload. * ipsec: The structure ``rte_ipsec_sa_prm`` will be extended with a new field ``hdr_l3_len`` to configure tunnel L3 header length. 
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst index 8da851..93d1b36889 100644 --- a/doc/guides/rel_notes/release_21_11.rst +++ b/doc/guides/rel_notes/release_21_11.rst @@ -194,6 +194,10 @@ ABI Changes ``rte_security_ipsec_xform`` to allow applications to configure SA soft and hard expiry limits. Limits can be either in number of packets or bytes. +* security: The new options ``ip_csum_enable`` and ``l4_csum_enable`` were added + in structure ``rte_security_ipsec_sa_options`` to indicate whether inner + packet IPv4 header checksum and L4 checksum need to be offloaded to + security device. Known Issues diff --git a/lib/cryptodev/rte_cryptodev.h b/lib/cryptodev/rte_cryptodev.h index bb01f0f195..d9271a6c45 100644 --- a/lib/cryptodev/rte_cryptodev.h +++ b/lib/cryptodev/rte_cryptodev.h @@ -479,6 +479,8 @@ rte_cryptodev_asym_get_xform_enum(enum rte_crypto_asym_xform_type *xform_enum, /**< Support operations on multiple data-units message */ #define RTE_CRYPTODEV_FF_CIPHER_WRAPPED_KEY(1ULL << 26) /**< Support wrapped key in cipher xform */ +#define RTE_CRYPTODEV_FF_SECURITY_INNER_CSUM (1ULL << 27) +/**< Support inner checksum computation/verification */ /** * Get the name of a crypto device feature flag diff --git a/lib/security/rte_security.h b/lib/security/rte_security.h index ab1a6e1f65..945f45ad76 100644 --- a/lib/security/rte_security.h +++ b/lib/security/rte_security.h @@ -230,6 +230,24 @@ struct rte_security_ipsec_sa_options { * * 0: Do not match UDP ports */ uint32_t udp_ports_verify : 1; + + /** Compute/verify inner packet IPv4 header checksum in tunnel mode +* +* * 1: For outbound, compute inner packet IPv4 header checksum +* before tunnel encapsulation and for inbound, verify after +* tunnel decapsulation. +* * 0: Inner packet IP header checksum is not computed/verified. 
+*/ + uint32_t ip_csum_enable : 1; + + /** Compute/verify inner packet L4 checksum in tunnel mode +* +* * 1: For outbound, compute inner packet L4 checksum before +* tunnel encapsulation and for inbound, verify after +* tunnel decapsulation. +* * 0: Inner packet L4 checksum is not computed/verified. +*/ + uint32_t l4_csum_enable : 1; }; /** IPSec security association direction */ -- 2.22.0
[dpdk-dev] [PATCH v2 2/3] crypto/cnxk: add inner checksum
Add inner checksum support for cn10k. Signed-off-by: Archana Muniganti --- doc/guides/cryptodevs/features/cn10k.ini | 1 + doc/guides/rel_notes/release_21_11.rst| 1 + drivers/crypto/cnxk/cn10k_cryptodev_ops.c | 65 +++ drivers/crypto/cnxk/cn10k_ipsec.c | 49 +- drivers/crypto/cnxk/cn10k_ipsec.h | 1 + drivers/crypto/cnxk/cn10k_ipsec_la_ops.h | 9 ++- drivers/crypto/cnxk/cnxk_cryptodev.c | 3 + .../crypto/cnxk/cnxk_cryptodev_capabilities.c | 2 + 8 files changed, 113 insertions(+), 18 deletions(-) diff --git a/doc/guides/cryptodevs/features/cn10k.ini b/doc/guides/cryptodevs/features/cn10k.ini index f5552feca3..9d08bd5c04 100644 --- a/doc/guides/cryptodevs/features/cn10k.ini +++ b/doc/guides/cryptodevs/features/cn10k.ini @@ -15,6 +15,7 @@ OOP SGL In SGL Out = Y OOP LB In LB Out = Y Symmetric sessionless = Y Digest encrypted = Y +Inner checksum = Y ; ; Supported crypto algorithms of 'cn10k' crypto driver. diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst index 93d1b36889..163cdaa800 100644 --- a/doc/guides/rel_notes/release_21_11.rst +++ b/doc/guides/rel_notes/release_21_11.rst @@ -72,6 +72,7 @@ New Features * Added Transport mode support in lookaside protocol (IPsec) for CN10K. * Added UDP encapsulation support in lookaside protocol (IPsec) for CN10K. * Added support for lookaside protocol (IPsec) offload for CN9K. + * Added inner checksum support in lookaside protocol (IPsec) for CN10K. 
* **Added support for event crypto adapter on Marvell CN10K and CN9K.** diff --git a/drivers/crypto/cnxk/cn10k_cryptodev_ops.c b/drivers/crypto/cnxk/cn10k_cryptodev_ops.c index 3caf05aab9..c25c8e67b2 100644 --- a/drivers/crypto/cnxk/cn10k_cryptodev_ops.c +++ b/drivers/crypto/cnxk/cn10k_cryptodev_ops.c @@ -50,7 +50,7 @@ cn10k_cpt_sym_temp_sess_create(struct cnxk_cpt_qp *qp, struct rte_crypto_op *op) static __rte_always_inline int __rte_hot cpt_sec_inst_fill(struct rte_crypto_op *op, struct cn10k_sec_session *sess, - struct cpt_inst_s *inst) + struct cpt_inflight_req *infl_req, struct cpt_inst_s *inst) { struct rte_crypto_sym_op *sym_op = op->sym; union roc_ot_ipsec_sa_word2 *w2; @@ -72,8 +72,10 @@ cpt_sec_inst_fill(struct rte_crypto_op *op, struct cn10k_sec_session *sess, if (w2->s.dir == ROC_IE_SA_DIR_OUTBOUND) ret = process_outb_sa(op, sa, inst); - else + else { + infl_req->op_flags |= CPT_OP_FLAGS_IPSEC_DIR_INBOUND; ret = process_inb_sa(op, sa, inst); + } return ret; } @@ -122,7 +124,8 @@ cn10k_cpt_fill_inst(struct cnxk_cpt_qp *qp, struct rte_crypto_op *ops[], if (op->sess_type == RTE_CRYPTO_OP_SECURITY_SESSION) { sec_sess = get_sec_session_private_data( sym_op->sec_session); - ret = cpt_sec_inst_fill(op, sec_sess, &inst[0]); + ret = cpt_sec_inst_fill(op, sec_sess, infl_req, + &inst[0]); if (unlikely(ret)) return 0; w7 = sec_sess->sa.inst.w7; @@ -342,6 +345,49 @@ cn10k_cpt_sec_post_process(struct rte_crypto_op *cop, m->pkt_len = m_len; } +static inline void +cn10k_cpt_sec_ucc_process(struct rte_crypto_op *cop, + struct cpt_inflight_req *infl_req, + const uint8_t uc_compcode) +{ + struct cn10k_sec_session *sess; + struct cn10k_ipsec_sa *sa; + struct rte_mbuf *mbuf; + + if (uc_compcode == ROC_IE_OT_UCC_SUCCESS_SA_SOFTEXP_FIRST) + cop->aux_flags = RTE_CRYPTO_OP_AUX_FLAGS_IPSEC_SOFT_EXPIRY; + + if (!(infl_req->op_flags & CPT_OP_FLAGS_IPSEC_DIR_INBOUND)) + return; + + sess = get_sec_session_private_data(cop->sym->sec_session); + sa = &sess->sa; + + mbuf = 
cop->sym->m_src; + + switch (uc_compcode) { + case ROC_IE_OT_UCC_SUCCESS: + if (sa->ip_csum_enable) + mbuf->ol_flags |= PKT_RX_IP_CKSUM_GOOD; + break; + case ROC_IE_OT_UCC_SUCCESS_PKT_IP_BADCSUM: + mbuf->ol_flags |= PKT_RX_IP_CKSUM_BAD; + break; + case ROC_IE_OT_UCC_SUCCESS_PKT_L4_GOODCSUM: + mbuf->ol_flags |= PKT_RX_L4_CKSUM_GOOD; + if (sa->ip_csum_enable) + mbuf->ol_flags |= PKT_RX_IP_CKSUM_GOOD; + break; + case ROC_IE_OT_UCC_SUCCESS_PKT_L4_BADCSUM: + mbuf->ol_flags |= PKT_RX_L4_CKSUM_BAD; + if (sa->ip_csum_enable) + mbuf->ol_flags |= PKT_RX_IP_CKSUM_GOOD; + break; + default: + break; + } +} + static inline void cn10k_cpt_de
[dpdk-dev] [PATCH v2 3/3] test/crypto: add inner checksum cases
This patch adds tests for inner IP and inner L4 checksum in IPsec mode. Signed-off-by: Archana Muniganti --- app/test/test_cryptodev.c | 34 +++ app/test/test_cryptodev_security_ipsec.c | 195 ++ app/test/test_cryptodev_security_ipsec.h | 2 + ...st_cryptodev_security_ipsec_test_vectors.h | 118 +++ doc/guides/rel_notes/release_21_11.rst| 1 + 5 files changed, 350 insertions(+) diff --git a/app/test/test_cryptodev.c b/app/test/test_cryptodev.c index 5f0d023451..c127e6bc04 100644 --- a/app/test/test_cryptodev.c +++ b/app/test/test_cryptodev.c @@ -18,6 +18,8 @@ #include #include #include +#include +#include #ifdef RTE_CRYPTO_SCHEDULER #include @@ -9275,6 +9277,30 @@ test_ipsec_proto_udp_ports_verify(const void *data __rte_unused) return test_ipsec_proto_all(&flags); } +static int +test_ipsec_proto_inner_ip_csum(const void *data __rte_unused) +{ + struct ipsec_test_flags flags; + + memset(&flags, 0, sizeof(flags)); + + flags.ip_csum = true; + + return test_ipsec_proto_all(&flags); +} + +static int +test_ipsec_proto_inner_l4_csum(const void *data __rte_unused) +{ + struct ipsec_test_flags flags; + + memset(&flags, 0, sizeof(flags)); + + flags.l4_csum = true; + + return test_ipsec_proto_all(&flags); +} + static int test_PDCP_PROTO_all(void) { @@ -14231,6 +14257,14 @@ static struct unit_test_suite ipsec_proto_testsuite = { "Tunnel src and dst addr verification", ut_setup_security, ut_teardown, test_ipsec_proto_tunnel_src_dst_addr_verify), + TEST_CASE_NAMED_ST( + "Inner IP checksum", + ut_setup_security, ut_teardown, + test_ipsec_proto_inner_ip_csum), + TEST_CASE_NAMED_ST( + "Inner L4 checksum", + ut_setup_security, ut_teardown, + test_ipsec_proto_inner_l4_csum), TEST_CASES_END() /**< NULL terminate unit test array */ } }; diff --git a/app/test/test_cryptodev_security_ipsec.c b/app/test/test_cryptodev_security_ipsec.c index 764e77bbff..bcd9746c98 100644 --- a/app/test/test_cryptodev_security_ipsec.c +++ b/app/test/test_cryptodev_security_ipsec.c @@ -7,6 +7,7 @@ #include 
#include #include +#include #include #include "test.h" @@ -103,6 +104,22 @@ test_ipsec_sec_caps_verify(struct rte_security_ipsec_xform *ipsec_xform, return -ENOTSUP; } + if (ipsec_xform->options.ip_csum_enable == 1 && + sec_cap->ipsec.options.ip_csum_enable == 0) { + if (!silent) + RTE_LOG(INFO, USER1, + "Inner IP checksum is not supported\n"); + return -ENOTSUP; + } + + if (ipsec_xform->options.l4_csum_enable == 1 && + sec_cap->ipsec.options.l4_csum_enable == 0) { + if (!silent) + RTE_LOG(INFO, USER1, + "Inner L4 checksum is not supported\n"); + return -ENOTSUP; + } + return 0; } @@ -160,6 +177,56 @@ test_ipsec_td_in_from_out(const struct ipsec_test_data *td_out, } } +static bool +is_ipv4(void *ip) +{ + struct rte_ipv4_hdr *ipv4 = ip; + uint8_t ip_ver; + + ip_ver = (ipv4->version_ihl & 0xf0) >> RTE_IPV4_IHL_MULTIPLIER; + if (ip_ver == IPVERSION) + return true; + else + return false; +} + +static void +test_ipsec_csum_init(void *ip, bool l3, bool l4) +{ + struct rte_ipv4_hdr *ipv4; + struct rte_tcp_hdr *tcp; + struct rte_udp_hdr *udp; + uint8_t next_proto; + uint8_t size; + + if (is_ipv4(ip)) { + ipv4 = ip; + size = sizeof(struct rte_ipv4_hdr); + next_proto = ipv4->next_proto_id; + + if (l3) + ipv4->hdr_checksum = 0; + } else { + size = sizeof(struct rte_ipv6_hdr); + next_proto = ((struct rte_ipv6_hdr *)ip)->proto; + } + + if (l4) { + switch (next_proto) { + case IPPROTO_TCP: + tcp = (struct rte_tcp_hdr *)RTE_PTR_ADD(ip, size); + tcp->cksum = 0; + break; + case IPPROTO_UDP: + udp = (struct rte_udp_hdr *)RTE_PTR_ADD(ip, size); + udp->dgram_cksum = 0; + break; + default: + return; + } + } +} + void test_ipsec_td_prepare(const struct crypto_param *param1, const struct crypto_param *param2, @@ -194,6 +261,17 @@ test_ipsec_td_prepare(const struct crypto_param *param1, if (flags->sa_expiry_pkts_soft)
Re: [dpdk-dev] [PATCH v1] ethdev: introduce shared Rx queue
On Wed, 2021-09-29 at 00:26 +, Ananyev, Konstantin wrote: > > > > > > > > > > > > > > > In current DPDK framework, each RX queue > > > > > > > > > > > > > > > is > > > > > > > > > > > > > > > pre-loaded with mbufs > > > > > > > > > > > > > > > for incoming packets. When number of > > > > > > > > > > > > > > > representors scale out in a > > > > > > > > > > > > > > > switch domain, the memory consumption > > > > > > > > > > > > > > > became > > > > > > > > > > > > > > > significant. Most > > > > > > > > > > > > > > > important, polling all ports leads to > > > > > > > > > > > > > > > high > > > > > > > > > > > > > > > cache miss, high > > > > > > > > > > > > > > > latency and low throughput. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > This patch introduces shared RX queue. > > > > > > > > > > > > > > > Ports > > > > > > > > > > > > > > > with same > > > > > > > > > > > > > > > configuration in a switch domain could > > > > > > > > > > > > > > > share > > > > > > > > > > > > > > > RX queue set by specifying sharing group. > > > > > > > > > > > > > > > Polling any queue using same shared RX > > > > > > > > > > > > > > > queue > > > > > > > > > > > > > > > receives packets from > > > > > > > > > > > > > > > all member ports. Source port is > > > > > > > > > > > > > > > identified > > > > > > > > > > > > > > > by mbuf->port. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Port queue number in a shared group > > > > > > > > > > > > > > > should be > > > > > > > > > > > > > > > identical. Queue > > > > > > > > > > > > > > > index is > > > > > > > > > > > > > > > 1:1 mapped in shared group. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Share RX queue is supposed to be polled > > > > > > > > > > > > > > > on > > > > > > > > > > > > > > > same thread. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Multiple groups is supported by group ID. 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Is this offload specific to the > > > > > > > > > > > > > > representor? If > > > > > > > > > > > > > > so can this name be changed specifically to > > > > > > > > > > > > > > representor? > > > > > > > > > > > > > > > > > > > > > > > > > > Yes, PF and representor in switch domain > > > > > > > > > > > > > could > > > > > > > > > > > > > take advantage. > > > > > > > > > > > > > > > > > > > > > > > > > > > If it is for a generic case, how the flow > > > > > > > > > > > > > > ordering will be maintained? > > > > > > > > > > > > > > > > > > > > > > > > > > Not quite sure that I understood your > > > > > > > > > > > > > question. > > > > > > > > > > > > > The control path of is > > > > > > > > > > > > > almost same as before, PF and representor > > > > > > > > > > > > > port > > > > > > > > > > > > > still needed, rte flows not impacted. > > > > > > > > > > > > > Queues still needed for each member port, > > > > > > > > > > > > > descriptors(mbuf) will be > > > > > > > > > > > > > supplied from shared Rx queue in my PMD > > > > > > > > > > > > > implementation. > > > > > > > > > > > > > > > > > > > > > > > > My question was if create a generic > > > > > > > > > > > > RTE_ETH_RX_OFFLOAD_SHARED_RXQ offload, multiple > > > > > > > > > > > > ethdev receive queues land into > > > > the same > > > > > > > > > > > > receive queue, In that case, how the flow order > > > > > > > > > > > > is > > > > > > > > > > > > maintained for respective receive queues. > > > > > > > > > > > > > > > > > > > > > > I guess the question is testpmd forward stream? > > > > > > > > > > > The > > > > > > > > > > > forwarding logic has to be changed slightly in > > > > > > > > > > > case > > > > > > > > > > > of shared rxq. 
> > > > > > > > > > > basically for each packet in rx_burst result, > > > > > > > > > > > lookup > > > > > > > > > > > source stream according to mbuf->port, forwarding > > > > > > > > > > > to > > > > > > > > > > > target fs. > > > > > > > > > > > Packets from same source port could be grouped as > > > > > > > > > > > a > > > > > > > > > > > small burst to process, this will accelerates the > > > > > > > > > > > performance if traffic > > > > come from > > > > > > > > > > > limited ports. I'll introduce some common api to > > > > > > > > > > > do > > > > > > > > > > > shard rxq forwarding, call it with packets > > > > > > > > > > > handling > > > > > > > > > > > callback, so it suites for > > > > > > > > > > > all forwarding engine. Will sent patches soon. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > All ports will put the packets in to the same queue > > > > > > > > > > (share queue), right? Does > > > > > > > > > > this means only single core will poll only, what > > > > > > > > > > will > > > > > > > > > > happen if there are > > > > > > > > > > multiple cores polling, won't it cause problem? > > > > > > > > > > > > > > > > > > > > And if this requires specific changes in the > > > > > > > > > > application, I am not sure about > > > > > > > > > > t
[dpdk-dev] [PATCH v3 0/5] Virtio PMD RSS support & RSS fixes
This series is mainly adding support for RSS to the Virtio PMD driver. The last two patches fix an issue in testpmd that could cause an out-of-bounds access, and an issue spotted in the mlx5 driver while looking for inspiration. The first motivation for this series is to eventually support RSS down to the Vhost-user library, so that OVS can benefit from it. But it will also be useful with vDPA devices in the future. Regarding the testing, I have tested it with qemu v5.2 from Fedora 34. Since libvirt does not yet support enabling the RSS feature in the Qemu virtio-net device, and this feature is disabled by default, the tester can either rebuild the qemu package to enable it by default or use the qemu cmdline to do the same. The tester can use testpmd in icmpecho mode in the guest and scapy on the host to inject random traffic on the tap interface, e.g.: sendp(Ether(src=RandMAC()) / IP(src=RandIP(), dst='192.168.123.9') / UDP(sport=RandShort(), dport=RandShort()), loop=True, iface='vnet7') Then it can play with the RSS config in testpmd to change the RETA or hash types, and see traffic being steered accordingly by checking the Rx xstats. 
Changes in v3: == - Add applying user-specified RSS conf at device config time (Andrew) - Remove useless checks (Chenbo) - Clean control message payload dlen variable (Chenbo) - Add GTPU offload type (Xiaoyun) - Add missing types to str2flowtype() (Xiaoyun) Changes in v2: == - Rework patch 2 to keep old behaviour, but fix possible out of bounds due to key length (Andrew/Nelio/Xiaoyun) - s/reta/RETA/ (Andrew) - Applied A-by on patch 3 (Slava) - Fix display of configured hash types - Add missing flow types definition to testpmd's port info command Maxime Coquelin (5): net/virtio: add initial RSS support app/testpmd: fix RSS key length app/testpmd: fix RSS type display net/mlx5: fix RSS RETA update app/testpmd: add missing flow types in port info app/test-pmd/cmdline.c | 4 + app/test-pmd/config.c | 11 +- doc/guides/nics/features/virtio.ini| 3 + doc/guides/nics/virtio.rst | 3 + doc/guides/rel_notes/release_21_11.rst | 6 + drivers/net/mlx5/mlx5_rss.c| 2 +- drivers/net/virtio/virtio.h| 31 +- drivers/net/virtio/virtio_ethdev.c | 394 - drivers/net/virtio/virtio_ethdev.h | 3 +- drivers/net/virtio/virtqueue.h | 21 ++ 10 files changed, 466 insertions(+), 12 deletions(-) -- 2.31.1
[dpdk-dev] [PATCH v3 1/5] net/virtio: add initial RSS support
Provide the capability to update the hash key, hash types and RETA table on the fly (without needing to stop/start the device). However, the key length and the number of RETA entries are fixed to 40B and 128 entries respectively. This is done in order to simplify the design, but may be revisited later as the Virtio spec provides this flexibility. Note that only VIRTIO_NET_F_RSS support is implemented; VIRTIO_NET_F_HASH_REPORT, which would enable reporting the packet RSS hash calculated by the device into mbuf.rss, is not yet supported. Regarding the default RSS configuration, it has been chosen to use the default Intel ixgbe key as the default key, and the default RETA is a simple modulo between the hash and the number of Rx queues. Signed-off-by: Maxime Coquelin --- doc/guides/nics/features/virtio.ini| 3 + doc/guides/nics/virtio.rst | 3 + doc/guides/rel_notes/release_21_11.rst | 6 + drivers/net/virtio/virtio.h| 31 +- drivers/net/virtio/virtio_ethdev.c | 394 - drivers/net/virtio/virtio_ethdev.h | 3 +- drivers/net/virtio/virtqueue.h | 21 ++ 7 files changed, 452 insertions(+), 9 deletions(-) diff --git a/doc/guides/nics/features/virtio.ini b/doc/guides/nics/features/virtio.ini index 48f6f393b1..a5eab4932f 100644 --- a/doc/guides/nics/features/virtio.ini +++ b/doc/guides/nics/features/virtio.ini @@ -14,6 +14,9 @@ Promiscuous mode = Y Allmulticast mode= Y Unicast MAC filter = Y Multicast MAC filter = Y +RSS hash = P +RSS key update = Y +RSS reta update = Y VLAN filter = Y Basic stats = Y Stats per queue = Y diff --git a/doc/guides/nics/virtio.rst b/doc/guides/nics/virtio.rst index 82ce7399ce..98e0d012b7 100644 --- a/doc/guides/nics/virtio.rst +++ b/doc/guides/nics/virtio.rst @@ -73,6 +73,9 @@ In this release, the virtio PMD driver provides the basic functionality of packe * Virtio supports using port IO to get PCI resource when UIO module is not available. 
+* Virtio supports RSS Rx mode with 40B configurable hash key length, 128 +configurable RETA entries and configurable hash types. + Prerequisites - diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst index 19356ac53c..18591655ca 100644 --- a/doc/guides/rel_notes/release_21_11.rst +++ b/doc/guides/rel_notes/release_21_11.rst @@ -106,6 +106,12 @@ New Features Added command-line options to specify total number of processes and current process ID. Each process owns subset of Rx and Tx queues. +* **Added initial RSS support to Virtio PMD.** + + Initial support for RSS receive mode has been added to the Virtio PMD, + with the capability for the application to configure the hash key, the + RETA and the hash types. Virtio hash reporting is yet to be added. + Removed Items - diff --git a/drivers/net/virtio/virtio.h b/drivers/net/virtio/virtio.h index 525e2dad4c..b4f21dc0c7 100644 --- a/drivers/net/virtio/virtio.h +++ b/drivers/net/virtio/virtio.h @@ -30,6 +30,7 @@ #define VIRTIO_NET_F_GUEST_ANNOUNCE 21 /* Guest can announce device on the network */ #define VIRTIO_NET_F_MQ22 /* Device supports Receive Flow Steering */ #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */ +#define VIRTIO_NET_F_RSS 60 /* RSS supported */ /* * Do we get callbacks when the ring is completely used, @@ -100,6 +101,29 @@ */ #define VIRTIO_MAX_INDIRECT ((int)(rte_mem_page_size() / 16)) +/* Virtio RSS hash types */ +#define VIRTIO_NET_HASH_TYPE_IPV4 (1 << 0) +#define VIRTIO_NET_HASH_TYPE_TCPV4 (1 << 1) +#define VIRTIO_NET_HASH_TYPE_UDPV4 (1 << 2) +#define VIRTIO_NET_HASH_TYPE_IPV6 (1 << 3) +#define VIRTIO_NET_HASH_TYPE_TCPV6 (1 << 4) +#define VIRTIO_NET_HASH_TYPE_UDPV6 (1 << 5) +#define VIRTIO_NET_HASH_TYPE_IP_EX (1 << 6) +#define VIRTIO_NET_HASH_TYPE_TCP_EX(1 << 7) +#define VIRTIO_NET_HASH_TYPE_UDP_EX(1 << 8) + +#define VIRTIO_NET_HASH_TYPE_MASK ( \ + VIRTIO_NET_HASH_TYPE_IPV4 | \ + VIRTIO_NET_HASH_TYPE_TCPV4 | \ + VIRTIO_NET_HASH_TYPE_UDPV4 | \ + 
VIRTIO_NET_HASH_TYPE_IPV6 | \ + VIRTIO_NET_HASH_TYPE_TCPV6 | \ + VIRTIO_NET_HASH_TYPE_UDPV6 | \ + VIRTIO_NET_HASH_TYPE_IP_EX | \ + VIRTIO_NET_HASH_TYPE_TCP_EX | \ + VIRTIO_NET_HASH_TYPE_UDP_EX) + + /* * Maximum number of virtqueues per device. */ @@ -157,7 +181,9 @@ struct virtio_net_config { * Any other value stands for unknown. */ uint8_t duplex; - + uint8_t rss_max_key_size; + uint16_t rss_max_indirection_table_length; + uint32_t supported_hash_types; } __rte_packed; struct virtio_hw { @@ -190,6 +216,9 @@ struct virtio_hw { rte_spinlock_t state_lock; struct rte_mbuf **inject_pkts; uint16_t max_queue_pairs; + uint32_t rss_hash_types; + uint16_t *rss_reta; + uint8_t *rs
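The default RSS configuration described in the commit message above (default RETA as a simple modulo between the hash and the number of Rx queues) can be sketched as follows. This is an illustrative model only, assuming the fixed 128-entry table from the patch; it is not the PMD's actual code:

```c
#include <stdint.h>

#define RETA_SIZE 128 /* fixed table size chosen by the patch */

/* Fill the redirection table so that entry i points to queue
 * (i modulo number of Rx queues), as the commit message describes. */
static void default_reta_init(uint16_t reta[RETA_SIZE], uint16_t nb_rxq)
{
	uint16_t i;

	for (i = 0; i < RETA_SIZE; i++)
		reta[i] = i % nb_rxq;
}

/* The device then selects the Rx queue by indexing the table with
 * the low bits of the computed RSS hash. */
static uint16_t queue_from_hash(const uint16_t reta[RETA_SIZE], uint32_t hash)
{
	return reta[hash % RETA_SIZE];
}
```

With 4 Rx queues, hashes that differ only in their low 7 bits spread evenly across the queues, which is what the Rx xstats check in the cover letter observes.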
[dpdk-dev] [PATCH v3 2/5] app/testpmd: fix RSS key length
port_rss_hash_key_update() initializes rss_conf with the RSS key configuration provided by the user, but it calls rte_eth_dev_rss_hash_conf_get() before calling rte_eth_dev_rss_hash_update(), which overrides the parsed RSS config. While the RSS key value is set again afterwards, this is not the case for the key length. It could cause an out-of-bounds access if the key length parsed is smaller than the one read from rte_eth_dev_rss_hash_conf_get(). This patch restores the key length before the rte_eth_dev_rss_hash_update() call to ensure the RSS key value/length pair is consistent. Fixes: 8205e241b2b0 ("app/testpmd: add missing type to RSS hash commands") Cc: sta...@dpdk.org Signed-off-by: Maxime Coquelin Acked-by: Xiaoyun Li Reviewed-by: Chenbo Xia --- app/test-pmd/config.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c index 9c66329e96..611965769c 100644 --- a/app/test-pmd/config.c +++ b/app/test-pmd/config.c @@ -2854,7 +2854,7 @@ port_rss_hash_key_update(portid_t port_id, char rss_type[], uint8_t *hash_key, unsigned int i; rss_conf.rss_key = NULL; - rss_conf.rss_key_len = hash_key_len; + rss_conf.rss_key_len = 0; rss_conf.rss_hf = 0; for (i = 0; rss_type_table[i].str; i++) { if (!strcmp(rss_type_table[i].str, rss_type)) @@ -2863,6 +2863,7 @@ port_rss_hash_key_update(portid_t port_id, char rss_type[], uint8_t *hash_key, diag = rte_eth_dev_rss_hash_conf_get(port_id, &rss_conf); if (diag == 0) { rss_conf.rss_key = hash_key; + rss_conf.rss_key_len = hash_key_len; diag = rte_eth_dev_rss_hash_update(port_id, &rss_conf); } if (diag == 0) -- 2.31.1
[dpdk-dev] [PATCH v3 3/5] app/testpmd: fix RSS type display
This patch fixes the display of the RSS hash types configured on the port, which displayed "all" even if only a single type was configured. Fixes: 3c90743dd3b9 ("app/testpmd: support more types for flow RSS") Cc: sta...@dpdk.org Signed-off-by: Maxime Coquelin Acked-by: Xiaoyun Li Reviewed-by: Chenbo Xia --- app/test-pmd/config.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c index 611965769c..9a4a0c232b 100644 --- a/app/test-pmd/config.c +++ b/app/test-pmd/config.c @@ -2833,7 +2833,9 @@ port_rss_hash_conf_show(portid_t port_id, int show_rss_key) } printf("RSS functions:\n "); for (i = 0; rss_type_table[i].str; i++) { - if (rss_hf & rss_type_table[i].rss_type) + if (rss_type_table[i].rss_type == 0) + continue; + if ((rss_hf & rss_type_table[i].rss_type) == rss_type_table[i].rss_type) printf("%s ", rss_type_table[i].str); } printf("\n"); -- 2.31.1
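The one-line change in this patch replaces an "any bit set" test with an "all bits set" test. A combined table entry such as "all" shares bits with every individual type, so the old check matched it whenever a single type was enabled. A minimal sketch of the difference, using hypothetical flag values rather than the actual RTE_ETH_RSS_* definitions:

```c
#include <stdint.h>

/* Old, buggy test: true if the configured hash functions share ANY
 * bit with the table entry, so a multi-bit "all" entry matches even
 * when only one type is enabled. */
static int any_bit_match(uint64_t rss_hf, uint64_t entry)
{
	return (rss_hf & entry) != 0;
}

/* Corrected test: true only if ALL of the entry's bits are set in
 * the configured hash functions. */
static int subset_match(uint64_t rss_hf, uint64_t entry)
{
	return (rss_hf & entry) == entry;
}
```

The patch also skips zero-valued entries, since a zero entry would trivially pass the subset test for any configuration.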
[dpdk-dev] [PATCH v3 4/5] net/mlx5: fix RSS RETA update
This patch fixes RETA updating for entries above 64. Without this fix, these entries are never updated, as the calculated mask value will always be 0. Fixes: 634efbc2c8c0 ("mlx5: support RETA query and update") Cc: sta...@dpdk.org Cc: nelio.laranje...@6wind.com Signed-off-by: Maxime Coquelin Acked-by: Viacheslav Ovsiienko --- drivers/net/mlx5/mlx5_rss.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/mlx5/mlx5_rss.c b/drivers/net/mlx5/mlx5_rss.c index c32129cdc2..6dc52acee0 100644 --- a/drivers/net/mlx5/mlx5_rss.c +++ b/drivers/net/mlx5/mlx5_rss.c @@ -211,7 +211,7 @@ mlx5_dev_rss_reta_update(struct rte_eth_dev *dev, for (idx = 0, i = 0; (i != reta_size); ++i) { idx = i / RTE_RETA_GROUP_SIZE; pos = i % RTE_RETA_GROUP_SIZE; - if (((reta_conf[idx].mask >> i) & 0x1) == 0) + if (((reta_conf[idx].mask >> pos) & 0x1) == 0) continue; MLX5_ASSERT(reta_conf[idx].reta[pos] < priv->rxqs_n); (*priv->reta_idx)[i] = reta_conf[idx].reta[pos]; -- 2.31.1
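The RETA update API splits the table into 64-entry groups, each carrying its own 64-bit mask. The bug fixed here is that the mask was shifted by the global entry index i instead of the position inside the group, so entries 64 and above never matched (and shifting a 64-bit value by 64 or more is undefined behaviour in C). A minimal sketch of the corrected indexing, not the mlx5 driver code itself:

```c
#include <stdint.h>

#define RETA_GROUP_SIZE 64

/* Return whether RETA entry i is selected.  Each group of 64 entries
 * has its own mask; the bit to test is the position of i WITHIN its
 * group, not the global index. */
static int entry_is_set(const uint64_t *group_masks, unsigned int i)
{
	unsigned int idx = i / RETA_GROUP_SIZE; /* which group */
	unsigned int pos = i % RETA_GROUP_SIZE; /* bit inside the group */

	return (group_masks[idx] >> pos) & 0x1;
}
```

With the old `mask >> i` form, any entry with i >= 64 shifted past the end of the 64-bit mask, which is why those entries were never updated.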
[dpdk-dev] [PATCH v3 5/5] app/testpmd: add missing flow types in port info
This patch adds missing IPv6-Ex and GTPU flow types to the port info command. It also adds the same definitions to str2flowtype(), used to configure the flow director. Signed-off-by: Maxime Coquelin --- app/test-pmd/cmdline.c | 4 app/test-pmd/config.c | 4 2 files changed, 8 insertions(+) diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index a9efd027c3..259d9cee4e 100644 --- a/app/test-pmd/cmdline.c +++ b/app/test-pmd/cmdline.c @@ -10408,6 +10408,10 @@ str2flowtype(char *string) {"ipv6-sctp", RTE_ETH_FLOW_NONFRAG_IPV6_SCTP}, {"ipv6-other", RTE_ETH_FLOW_NONFRAG_IPV6_OTHER}, {"l2_payload", RTE_ETH_FLOW_L2_PAYLOAD}, + {"ipv6-ex", RTE_ETH_FLOW_IPV6_EX}, + {"ipv6-tcp-ex", RTE_ETH_FLOW_IPV6_TCP_EX}, + {"ipv6-udp-ex", RTE_ETH_FLOW_IPV6_UDP_EX}, + {"gtpu", RTE_ETH_FLOW_GTPU}, }; for (i = 0; i < RTE_DIM(flowtype_str); i++) { diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c index 9a4a0c232b..39deb02950 100644 --- a/app/test-pmd/config.c +++ b/app/test-pmd/config.c @@ -4528,11 +4528,15 @@ flowtype_to_str(uint16_t flow_type) {"ipv6-sctp", RTE_ETH_FLOW_NONFRAG_IPV6_SCTP}, {"ipv6-other", RTE_ETH_FLOW_NONFRAG_IPV6_OTHER}, {"l2_payload", RTE_ETH_FLOW_L2_PAYLOAD}, + {"ipv6-ex", RTE_ETH_FLOW_IPV6_EX}, + {"ipv6-tcp-ex", RTE_ETH_FLOW_IPV6_TCP_EX}, + {"ipv6-udp-ex", RTE_ETH_FLOW_IPV6_UDP_EX}, {"port", RTE_ETH_FLOW_PORT}, {"vxlan", RTE_ETH_FLOW_VXLAN}, {"geneve", RTE_ETH_FLOW_GENEVE}, {"nvgre", RTE_ETH_FLOW_NVGRE}, {"vxlan-gpe", RTE_ETH_FLOW_VXLAN_GPE}, + {"gtpu", RTE_ETH_FLOW_GTPU}, }; for (i = 0; i < RTE_DIM(flowtype_str_table); i++) { -- 2.31.1
Re: [dpdk-dev] [PATCH] net/bnxt: fix tunnel port accounting
On Mon, Sep 27, 2021 at 3:48 PM Ajit Khaparde wrote: > > Fix the tunnel port counting logic. > Currently we are incrementing the port count without checking > whether bnxt_hwrm_tunnel_dst_port_alloc would return success or failure. > Modify the logic to increment it only if the firmware returns success. > > Fixes: 10d074b2022d ("net/bnxt: support tunneling") > Cc: sta...@dpdk.org > > Signed-off-by: Ajit Khaparde > Reviewed-by: Lance Richardson Patch applied to dpdk-next-net-brcm.
Re: [dpdk-dev] cannot use virtio-user pmd in IOVA=PA mode anymore
Hi Olivier, On 9/24/21 10:21, Olivier Matz wrote: Hello, I recently tested a use-case with our application using the main branch of dpdk.org. - the application runs inside a standard x86 VM (no IOMMU) - there are emulated physical NICs inside the VM - we use virtio-user pmds connected to tap interfaces through the vhost-net backend for exception path This use-case works with the stable 20.11 branch, but not with the current main dpdk.org branch. The virtio-user driver refuses to start in IOVA=PA mode: vdev_probe_all_drivers(): virtio_user0 requires VA IOVA mode but current mode is PA, not initializing EAL: Driver cannot attach the device (virtio_user0) This is likely due to these commits: 8d935fff5546 ("bus/vdev: add driver IOVA VA mode requirement") 17043a2909bb ("net/virtio: force IOVA as VA mode for virtio-user") We didn't see this problem before because we are only testing the dpdk.org main branch on physical machines that have an IOMMU. I'm not sure I understand the reasons for which the ability to run a virtio-user pmd in IOVA=PA mode was removed. The commitlog of 17043a2909bb says: At least Vhost-user backend of Virtio-user PMD requires IOVA as VA mode. Until now, it was implemented as a hack by forcing to use mbuf's buf_addr field instead of buf_iova. I don't get why the vhost-user backend would require IOVA=VA. Note that we also have a use-case where a virtio-user pmd is connected to our pmd-vhost in another application, which was working with 20.11. If there is a constraint with the vhost-user backend, what about the vhost-net backend? Would it make sense to re-allow this feature by somehow reverting 17043a2909bb with some additional cleanup? I think so. I am aware that a solution can be to configure qemu to enable a vIOMMU, but it is not my preferred solution yet, as it would impact our users that do not do this currently. The initial purpose of this change was to simplify the code; I did not think about this adverse side-effect. 
While moving to VFIO with IOMMU enabled would be a wise decision :), I will work on reverting that patch to restore the old behavior as that was not something that was agreed when the patch was submitted. Thanks for reporting the issue, Maxime Thanks! Olivier
Re: [dpdk-dev] [PATCH v6 2/2] vhost: enable IOMMU for async vhost
On 29-Sep-21 7:12 AM, Hu, Jiayu wrote: Hi Xuan, -Original Message- From: Ding, Xuan Sent: Wednesday, September 29, 2021 10:41 AM To: dev@dpdk.org; Burakov, Anatoly ; maxime.coque...@redhat.com; Xia, Chenbo Cc: Hu, Jiayu ; Jiang, Cheng1 ; Richardson, Bruce ; Pai G, Sunil ; Wang, Yinan ; Yang, YvonneX ; Ding, Xuan Subject: [PATCH v6 2/2] vhost: enable IOMMU for async vhost The use of IOMMU has many advantages, such as isolation and address translation. This patch extends the capability of the DMA engine to use IOMMU if the DMA engine is bound to vfio. When the memory table is set, the guest memory will be mapped into the default container of DPDK. Signed-off-by: Xuan Ding --- +async_dma_map(struct rte_vhost_mem_region *region, bool +*dma_map_success, bool do_map) { +uint64_t host_iova; +int ret = 0; + +host_iova = rte_mem_virt2iova((void *)(uintptr_t)region- host_user_addr); +if (do_map) { +/* Add mapped region into the default container of DPDK. */ +ret = rte_vfio_container_dma_map(RTE_VFIO_DEFAULT_CONTAINER_FD, + region->host_user_addr, + host_iova, + region->size); +*dma_map_success = ret == 0; + +if (ret) { +/* + * DMA device may bind with kernel driver, in this case, + * we don't need to program IOMMU manually. However, if no + * device is bound with vfio/uio in DPDK, and vfio kernel + * module is loaded, the API will still be called and return + * with ENODEV/ENOSUP. + * + * DPDK vfio only returns ENODEV/ENOSUP in very similar + * situations(vfio either unsupported, or supported + * but no devices found). Either way, no mappings could be + * performed. We treat it as normal case in async path. + */ What do you mean by saying "vfio either unsupported"? Does it mean the platform doesn't support iommu? Unsupported as in the VFIO driver is not loaded. We don't really care if the *system* supports VFIO as much as whether it's *accessible to us*. 
I'm sure you would agree that scenario "VFIO is not supported" is exactly equivalent to "VFIO driver is not loaded", because from our perspective, in both cases the VFIO driver is not loaded :) Thanks, Jiayu -- Thanks, Anatoly
Re: [dpdk-dev] [PATCH v4 06/13] net/bnxt: add support for tunnel offload API
On 9/29/2021 9:20 AM, Thomas Monjalon wrote: > 28/09/2021 23:32, Ajit Khaparde: >> On Tue, Sep 28, 2021 at 5:43 AM Ferruh Yigit wrote: >>> The following warning is reported by Jerin & Thomas: >>> >>> $ ./devtools/check-doc-vs-code.sh >>> rte_flow doc out of sync for bnxt >>> action shared >>> >>> >>> This is mainly saying 'RTE_FLOW_ACTION_TYPE_SHARED' flow action is >>> supported but >>> not documented. But from code I am not sure if shared action is used, there >>> is >>> only one range check using this enum in the code, can you please clarify? >> >> Correct. RTE_FLOW_ACTION_TYPE_SHARED is being used only for a range check. >> It is not really a supported action right now as far as the code is >> concerned. >> As of now, no document update is necessary. Thanks > > You need to update something to make the check OK. > If it is a false positive, it can be handled in the script, > but we cannot let the script return an error. > > I will add the following exception to the script (suggested by Thomas) in the next-net: diff --git a/devtools/parse-flow-support.sh b/devtools/parse-flow-support.sh index 8462abe53603..63c0b20e234f 100755 --- a/devtools/parse-flow-support.sh +++ b/devtools/parse-flow-support.sh @@ -25,7 +25,8 @@ exclude() # $dir/tf_ulp/ulp_rte_handler_tbl.c | grep -wo "$1[[:alnum:]_]*" | sort -u | tr '\n' '|' | sed 's,.$,\n,') - grep -vE "$filter";; + exceptions='RTE_FLOW_ACTION_TYPE_SHARED' + grep -vE "$filter" | grep -vE $exceptions;; *) cat esac }
Re: [dpdk-dev] [PATCH v1] ethdev: introduce shared Rx queue
> -Original Message- > From: Xueming(Steven) Li > Sent: Wednesday, September 29, 2021 10:13 AM > To: jerinjac...@gmail.com; Ananyev, Konstantin > Cc: NBU-Contact-Thomas Monjalon ; > andrew.rybche...@oktetlabs.ru; dev@dpdk.org; Yigit, Ferruh > > Subject: Re: [dpdk-dev] [PATCH v1] ethdev: introduce shared Rx queue > > On Wed, 2021-09-29 at 00:26 +, Ananyev, Konstantin wrote: > > > > > > > > > > > > > > > > In current DPDK framework, each RX queue > > > > > > > > > > > > > > > > is > > > > > > > > > > > > > > > > pre-loaded with mbufs > > > > > > > > > > > > > > > > for incoming packets. When number of > > > > > > > > > > > > > > > > representors scale out in a > > > > > > > > > > > > > > > > switch domain, the memory consumption > > > > > > > > > > > > > > > > became > > > > > > > > > > > > > > > > significant. Most > > > > > > > > > > > > > > > > important, polling all ports leads to > > > > > > > > > > > > > > > > high > > > > > > > > > > > > > > > > cache miss, high > > > > > > > > > > > > > > > > latency and low throughput. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > This patch introduces shared RX queue. > > > > > > > > > > > > > > > > Ports > > > > > > > > > > > > > > > > with same > > > > > > > > > > > > > > > > configuration in a switch domain could > > > > > > > > > > > > > > > > share > > > > > > > > > > > > > > > > RX queue set by specifying sharing group. > > > > > > > > > > > > > > > > Polling any queue using same shared RX > > > > > > > > > > > > > > > > queue > > > > > > > > > > > > > > > > receives packets from > > > > > > > > > > > > > > > > all member ports. Source port is > > > > > > > > > > > > > > > > identified > > > > > > > > > > > > > > > > by mbuf->port. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Port queue number in a shared group > > > > > > > > > > > > > > > > should be > > > > > > > > > > > > > > > > identical. 
Queue > > > > > > > > > > > > > > > > index is > > > > > > > > > > > > > > > > 1:1 mapped in shared group. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Share RX queue is supposed to be polled > > > > > > > > > > > > > > > > on > > > > > > > > > > > > > > > > same thread. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Multiple groups is supported by group ID. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Is this offload specific to the > > > > > > > > > > > > > > > representor? If > > > > > > > > > > > > > > > so can this name be changed specifically to > > > > > > > > > > > > > > > representor? > > > > > > > > > > > > > > > > > > > > > > > > > > > > Yes, PF and representor in switch domain > > > > > > > > > > > > > > could > > > > > > > > > > > > > > take advantage. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > If it is for a generic case, how the flow > > > > > > > > > > > > > > > ordering will be maintained? > > > > > > > > > > > > > > > > > > > > > > > > > > > > Not quite sure that I understood your > > > > > > > > > > > > > > question. > > > > > > > > > > > > > > The control path of is > > > > > > > > > > > > > > almost same as before, PF and representor > > > > > > > > > > > > > > port > > > > > > > > > > > > > > still needed, rte flows not impacted. > > > > > > > > > > > > > > Queues still needed for each member port, > > > > > > > > > > > > > > descriptors(mbuf) will be > > > > > > > > > > > > > > supplied from shared Rx queue in my PMD > > > > > > > > > > > > > > implementation. > > > > > > > > > > > > > > > > > > > > > > > > > > My question was if create a generic > > > > > > > > > > > > > RTE_ETH_RX_OFFLOAD_SHARED_RXQ offload, multiple > > > > > > > > > > > > > ethdev receive queues land into > > > > > the same > > > > > > > > > > > > > receive queue, In that case, how the flow order > > > > > > > > > > > > > is > > > > > > > > > > > > > maintained for respective receive queues. 
> > > > > > > > > > > > > > > > > > > > > > > > I guess the question is testpmd forward stream? > > > > > > > > > > > > The > > > > > > > > > > > > forwarding logic has to be changed slightly in > > > > > > > > > > > > case > > > > > > > > > > > > of shared rxq. > > > > > > > > > > > > basically for each packet in rx_burst result, > > > > > > > > > > > > lookup > > > > > > > > > > > > source stream according to mbuf->port, forwarding > > > > > > > > > > > > to > > > > > > > > > > > > target fs. > > > > > > > > > > > > Packets from same source port could be grouped as > > > > > > > > > > > > a > > > > > > > > > > > > small burst to process, this will accelerates the > > > > > > > > > > > > performance if traffic > > > > > come from > > > > > > > > > > > > limited ports. I'll introduce some common api to > > > > > > > > > > > > do > > > > > > > > > > > > shard rxq forwarding, call it with packets > > > > > > > > > > > > handling > > > > > > > > > > > > callback, so it suites for > > > > > > > > > > > > all forwarding engine. Will sent patches soon. > > > > > > > > > > >
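The testpmd forwarding change described in the reply above, look up the source stream per packet via mbuf->port and hand off consecutive same-port packets as small sub-bursts, can be sketched with plain C. No rte_* types are used here; the run-splitting helper is illustrative, not the promised patches:

```c
#include <stdint.h>

/* Split a received burst into consecutive same-port runs. Each run would
 * be dispatched as a small sub-burst to the forwarding stream owning that
 * source port. Returns the number of runs; records each run's start
 * index and length. */
unsigned int
split_runs(const uint16_t ports[], unsigned int n,
	   unsigned int starts[], unsigned int lens[])
{
	unsigned int runs = 0;
	unsigned int i = 0;

	while (i < n) {
		unsigned int j = i + 1;

		/* extend the run while the source port stays the same */
		while (j < n && ports[j] == ports[i])
			j++;
		starts[runs] = i;
		lens[runs] = j - i;
		runs++;
		i = j;
	}
	return runs;
}
```

With traffic from few ports the runs are long, which is where the claimed acceleration would come from.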
[dpdk-dev] [PATCH] doc/guides: update dev list with Marvell crypto PMDs
Update list with following PMDs, - crypto_cn9k - crypto_cn10k - crypto_octeontx - crypto_octeontx2 Also made the list alphabetical. Signed-off-by: Anoob Joseph --- doc/guides/tools/cryptoperf.rst | 20 1 file changed, 12 insertions(+), 8 deletions(-) diff --git a/doc/guides/tools/cryptoperf.rst b/doc/guides/tools/cryptoperf.rst index 4159674..ce93483 100644 --- a/doc/guides/tools/cryptoperf.rst +++ b/doc/guides/tools/cryptoperf.rst @@ -146,19 +146,23 @@ The following are the application command-line options: Set device type, where ``name`` is one of the following:: - crypto_null - crypto_aesni_mb crypto_aesni_gcm + crypto_aesni_mb + crypto_armv8 + crypto_cn9k + crypto_cn10k + crypto_dpaa_sec + crypto_dpaa2_sec + crypto_kasumi + crypto_mvsam + crypto_null + crypto_octeontx + crypto_octeontx2 crypto_openss crypto_qat + crypto_scheduler crypto_snow3g - crypto_kasumi crypto_zuc - crypto_dpaa_sec - crypto_dpaa2_sec - crypto_armv8 - crypto_scheduler - crypto_mvsam * ``--optype `` -- 2.7.4
Re: [dpdk-dev] [PATCH v2] net/af_packet: fix ignoring full ring on tx
Hi Ferruh, What you described above looks like a ring buffer with single producer and > single consumer, and producer overwrites the not consumed items. Indeed. This is also my understanding of the bug. I am going to try to isolate the issue, and should probably be able to come up with a script in a few days. Out of curiosity, are you using a modified af_packet implementation in > kernel > for above described usage? We are currently using an Ubuntu-based distro with a 4.15 Linux kernel. We don't have any kernel patches for the af_packet implementation to my knowledge (probably excepting patches that are back-ported by Ubuntu maintainers from newer releases). On Mon, 20 Sept 2021 at 20:44, Ferruh Yigit wrote: > On 9/13/2021 2:45 PM, Tudor Cornea wrote: > > The poll call can return POLLERR which is ignored, or it can return > > POLLOUT, even if there are no free frames in the mmap-ed area. > > > > We can account for both of these cases by re-checking if the next > > frame is empty before writing into it. 
> > > > Signed-off-by: Mihai Pogonaru > > Signed-off-by: Tudor Cornea > > --- > > drivers/net/af_packet/rte_eth_af_packet.c | 19 +++ > > 1 file changed, 19 insertions(+) > > > > diff --git a/drivers/net/af_packet/rte_eth_af_packet.c > b/drivers/net/af_packet/rte_eth_af_packet.c > > index b73b211..087c196 100644 > > --- a/drivers/net/af_packet/rte_eth_af_packet.c > > +++ b/drivers/net/af_packet/rte_eth_af_packet.c > > @@ -216,6 +216,25 @@ eth_af_packet_tx(void *queue, struct rte_mbuf > **bufs, uint16_t nb_pkts) > > (poll(&pfd, 1, -1) < 0)) > > break; > > > > + /* > > + * Poll can return POLLERR if the interface is down > > + * > > + * It will almost always return POLLOUT, even if there > > + * are no extra buffers available > > + * > > + * This happens, because packet_poll() calls > datagram_poll() > > + * which checks the space left in the socket buffer and, > > + * in the case of packet_mmap, the default socket buffer > length > > + * doesn't match the requested size for the tx_ring. > > + * As such, there is almost always space left in socket > buffer, > > + * which doesn't seem to be correlated to the requested > size > > + * for the tx_ring in packet_mmap. > > + * > > + * This results in poll() returning POLLOUT. > > + */ > > + if (ppd->tp_status != TP_STATUS_AVAILABLE) > > + break; > > + > > If 'POLLOUT' doesn't indicate that there is space in the buffer, what is > the > point of the 'poll()' at all? > > What can we test/reproduce the mentioned behavior? Or is there a way to > fix the > behavior of poll() or use an alternative of it? > > > OK to break on the 'POLLERR', I guess it can be detected in the > 'pfd.revent'. > > > > /* copy the tx frame data */ > > pbuf = (uint8_t *) ppd + TPACKET2_HDRLEN - > > sizeof(struct sockaddr_ll); > > > >
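The fix under review boils down to: with packet_mmap, POLLOUT alone is not a trustworthy signal, so the frame's own status word is the authoritative writability check. A minimal standalone sketch of that decision (the status constants mirror <linux/if_packet.h>; the helper itself is hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors of the TPACKET_V2 status values from <linux/if_packet.h> */
#define TP_STATUS_AVAILABLE	0	/* frame free for user-space to fill */
#define TP_STATUS_SEND_REQUEST	1	/* frame queued, kernel owns it */

/* A tx slot may be written only when poll() signalled writability AND the
 * slot's status word says it is free; the patch adds the second check
 * because datagram_poll() accounts socket-buffer space, not ring slots. */
bool
tx_slot_writable(bool pollout, uint32_t tp_status)
{
	return pollout && tp_status == TP_STATUS_AVAILABLE;
}
```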
Re: [dpdk-dev] [PATCH v1 1/1] vfio: add page-by-page mapping API
On 10-Sep-21 12:27 PM, Anatoly Burakov wrote: Currently, there is no way to map memory for DMA in a way that allows unmapping it partially later, because some IOMMU's do not support partial unmapping. There is a workaround of mapping all of these segments separately, but this is inconvenient and silly, so this commit adds a proper API that does it. This commit relies on earlier infrastructure that was built out to support "chunking", as the concept of "chunks" is essentially the same as page size. Signed-off-by: Anatoly Burakov --- lib/eal/freebsd/eal.c | 10 lib/eal/include/rte_vfio.h | 33 ++ lib/eal/linux/eal_vfio.c | 93 +++--- lib/eal/version.map| 3 ++ lib/eal/windows/eal.c | 10 5 files changed, 133 insertions(+), 16 deletions(-) diff --git a/lib/eal/freebsd/eal.c b/lib/eal/freebsd/eal.c index 6cee5ae369..78e18f9765 100644 --- a/lib/eal/freebsd/eal.c +++ b/lib/eal/freebsd/eal.c @@ -1085,6 +1085,16 @@ rte_vfio_container_dma_map(__rte_unused int container_fd, return -1; } +int +rte_vfio_container_dma_map_paged(__rte_unused int container_fd, + __rte_unused uint64_t vaddr, + __rte_unused uint64_t iova, + __rte_unused uint64_t len, + __rte_unused uint64_t pagesz) +{ + return -1; +} + int rte_vfio_container_dma_unmap(__rte_unused int container_fd, __rte_unused uint64_t vaddr, diff --git a/lib/eal/include/rte_vfio.h b/lib/eal/include/rte_vfio.h index 2d90b36480..6afae2ccce 100644 --- a/lib/eal/include/rte_vfio.h +++ b/lib/eal/include/rte_vfio.h @@ -17,6 +17,8 @@ extern "C" { #include #include +#include + /* * determine if VFIO is present on the system */ @@ -331,6 +333,37 @@ int rte_vfio_container_dma_map(int container_fd, uint64_t vaddr, uint64_t iova, uint64_t len); +/** + * @warning + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice + * + * Perform DMA mapping for devices in a container, mapping memory page-by-page. + * + * @param container_fd + * the specified container fd. 
Use RTE_VFIO_DEFAULT_CONTAINER_FD to + * use the default container. + * + * @param vaddr + * Starting virtual address of memory to be mapped. + * + * @param iova + * Starting IOVA address of memory to be mapped. + * + * @param len + * Length of memory segment being mapped. + * + * @param pagesz + * Page size of the underlying memory. + * + * @return + *0 if successful + * <0 if failed + */ +__rte_experimental +int +rte_vfio_container_dma_map_paged(int container_fd, uint64_t vaddr, + uint64_t iova, uint64_t len, uint64_t pagesz); + /** * Perform DMA unmapping for devices in a container. * diff --git a/lib/eal/linux/eal_vfio.c b/lib/eal/linux/eal_vfio.c index 657c89ca58..c791730251 100644 --- a/lib/eal/linux/eal_vfio.c +++ b/lib/eal/linux/eal_vfio.c @@ -1872,11 +1872,12 @@ vfio_dma_mem_map(struct vfio_config *vfio_cfg, uint64_t vaddr, uint64_t iova, static int container_dma_map(struct vfio_config *vfio_cfg, uint64_t vaddr, uint64_t iova, - uint64_t len) + uint64_t len, uint64_t pagesz) { struct user_mem_map *new_map; struct user_mem_maps *user_mem_maps; bool has_partial_unmap; + uint64_t chunk_size; int ret = 0; user_mem_maps = &vfio_cfg->mem_maps; @@ -1887,19 +1888,37 @@ container_dma_map(struct vfio_config *vfio_cfg, uint64_t vaddr, uint64_t iova, ret = -1; goto out; } - /* map the entry */ - if (vfio_dma_mem_map(vfio_cfg, vaddr, iova, len, 1)) { - /* technically, this will fail if there are currently no devices -* plugged in, even if a device were added later, this mapping -* might have succeeded. however, since we cannot verify if this -* is a valid mapping without having a device attached, consider -* this to be unsupported, because we can't just store any old -* mapping and pollute list of active mappings willy-nilly. -*/ - RTE_LOG(ERR, EAL, "Couldn't map new region for DMA\n"); - ret = -1; - goto out; + + /* technically, mapping will fail if there are currently no devices +* plugged in, even if a device were added later, this mapping might +* have succeeded. 
however, since we cannot verify if this is a valid +* mapping without having a device attached, consider this to be +* unsupported, because we can't just store any old mapping and pollute +* list of active mappings willy-nilly. +*/ + + /* if page size was not specified, map the entire segment in one go */ + if (pagesz == 0) { + if (vfio_dma_mem_map(vfio_cfg, vaddr, iova, len, 1)) { + RTE_LOG(ERR,
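The "chunks are essentially pages" idea behind the proposed API reduces to simple chunking arithmetic that an implementation would loop over. The sketch below is a standalone illustration under that interpretation, not the eal_vfio.c code:

```c
#include <stdint.h>

/* Number of page-sized pieces needed to cover a mapping of length len.
 * pagesz == 0 keeps the legacy behaviour: one single mapping. */
uint64_t
chunk_count(uint64_t len, uint64_t pagesz)
{
	if (pagesz == 0)
		return 1;
	return (len + pagesz - 1) / pagesz;	/* round up */
}

/* Compute the (vaddr, iova, len) triple of chunk idx of a larger mapping;
 * the final chunk may be shorter than a full page. */
void
chunk_at(uint64_t vaddr, uint64_t iova, uint64_t len, uint64_t pagesz,
	 uint64_t idx, uint64_t *cvaddr, uint64_t *ciova, uint64_t *clen)
{
	uint64_t off = idx * pagesz;

	*cvaddr = vaddr + off;
	*ciova = iova + off;
	*clen = (len - off) < pagesz ? (len - off) : pagesz;
}
```

Mapping each chunk separately is what makes later partial unmap possible on IOMMUs that only unmap what was mapped as a unit.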
Re: [dpdk-dev] [PATCH v1] ethdev: introduce shared Rx queue
> > > > > > > > > > > > > > > > In current DPDK framework, each RX queue > > > > > > > > > > > > > > > > is > > > > > > > > > > > > > > > > pre-loaded with mbufs > > > > > > > > > > > > > > > > for incoming packets. When number of > > > > > > > > > > > > > > > > representors scale out in a > > > > > > > > > > > > > > > > switch domain, the memory consumption > > > > > > > > > > > > > > > > became > > > > > > > > > > > > > > > > significant. Most > > > > > > > > > > > > > > > > important, polling all ports leads to > > > > > > > > > > > > > > > > high > > > > > > > > > > > > > > > > cache miss, high > > > > > > > > > > > > > > > > latency and low throughput. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > This patch introduces shared RX queue. > > > > > > > > > > > > > > > > Ports > > > > > > > > > > > > > > > > with same > > > > > > > > > > > > > > > > configuration in a switch domain could > > > > > > > > > > > > > > > > share > > > > > > > > > > > > > > > > RX queue set by specifying sharing group. > > > > > > > > > > > > > > > > Polling any queue using same shared RX > > > > > > > > > > > > > > > > queue > > > > > > > > > > > > > > > > receives packets from > > > > > > > > > > > > > > > > all member ports. Source port is > > > > > > > > > > > > > > > > identified > > > > > > > > > > > > > > > > by mbuf->port. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Port queue number in a shared group > > > > > > > > > > > > > > > > should be > > > > > > > > > > > > > > > > identical. Queue > > > > > > > > > > > > > > > > index is > > > > > > > > > > > > > > > > 1:1 mapped in shared group. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Share RX queue is supposed to be polled > > > > > > > > > > > > > > > > on > > > > > > > > > > > > > > > > same thread. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Multiple groups is supported by group ID. 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Is this offload specific to the > > > > > > > > > > > > > > > representor? If > > > > > > > > > > > > > > > so can this name be changed specifically to > > > > > > > > > > > > > > > representor? > > > > > > > > > > > > > > > > > > > > > > > > > > > > Yes, PF and representor in switch domain > > > > > > > > > > > > > > could > > > > > > > > > > > > > > take advantage. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > If it is for a generic case, how the flow > > > > > > > > > > > > > > > ordering will be maintained? > > > > > > > > > > > > > > > > > > > > > > > > > > > > Not quite sure that I understood your > > > > > > > > > > > > > > question. > > > > > > > > > > > > > > The control path of is > > > > > > > > > > > > > > almost same as before, PF and representor > > > > > > > > > > > > > > port > > > > > > > > > > > > > > still needed, rte flows not impacted. > > > > > > > > > > > > > > Queues still needed for each member port, > > > > > > > > > > > > > > descriptors(mbuf) will be > > > > > > > > > > > > > > supplied from shared Rx queue in my PMD > > > > > > > > > > > > > > implementation. > > > > > > > > > > > > > > > > > > > > > > > > > > My question was if create a generic > > > > > > > > > > > > > RTE_ETH_RX_OFFLOAD_SHARED_RXQ offload, multiple > > > > > > > > > > > > > ethdev receive queues land into > > > > > the same > > > > > > > > > > > > > receive queue, In that case, how the flow order > > > > > > > > > > > > > is > > > > > > > > > > > > > maintained for respective receive queues. > > > > > > > > > > > > > > > > > > > > > > > > I guess the question is testpmd forward stream? > > > > > > > > > > > > The > > > > > > > > > > > > forwarding logic has to be changed slightly in > > > > > > > > > > > > case > > > > > > > > > > > > of shared rxq. 
> > > > > > > > > > > > basically for each packet in rx_burst result, > > > > > > > > > > > > lookup > > > > > > > > > > > > source stream according to mbuf->port, forwarding > > > > > > > > > > > > to > > > > > > > > > > > > target fs. > > > > > > > > > > > > Packets from same source port could be grouped as > > > > > > > > > > > > a > > > > > > > > > > > > small burst to process, this will accelerates the > > > > > > > > > > > > performance if traffic > > > > > come from > > > > > > > > > > > > limited ports. I'll introduce some common api to > > > > > > > > > > > > do > > > > > > > > > > > > shard rxq forwarding, call it with packets > > > > > > > > > > > > handling > > > > > > > > > > > > callback, so it suites for > > > > > > > > > > > > all forwarding engine. Will sent patches soon. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > All ports will put the packets in to the same queue > > > > > > > > > > > (share queue), right? Does > > > > > > > > > > > this means only single core will poll only, what > > > > > > > > > > > will > > > > > > > > > > > happen if there are > > > > > > > > > > > multiple cores polling, won't it cause problem? > > > > > > > > > > > > > > > > > > >
Re: [dpdk-dev] [PATCH v3 4/8] examples/ioat: add cmd line option to control stats print interval
On 28/09/2021 17:29, Kevin Laatz wrote: Add a command line option to control the interval between stats prints. Signed-off-by: Kevin Laatz --- Reviewed-by: Conor Walsh
Re: [dpdk-dev] [PATCH v3 5/8] examples/ioat: add signal-triggered device dumps
On 28/09/2021 17:29, Kevin Laatz wrote: Enable dumping device info via the signal handler. With this change, when a SIGUSR1 is issued, the application will print a dump of all devices being used by the application. Signed-off-by: Kevin Laatz --- Reviewed-by: Conor Walsh
Re: [dpdk-dev] [PATCH v2 1/3] security: add SA config option for inner pkt csum
> Add inner packet IPv4 hdr and L4 checksum enable options > in conf. These will be used in case of protocol offload. > Per SA, application could specify whether the > checksum(compute/verify) can be offloaded to security device. > > Signed-off-by: Archana Muniganti > --- > doc/guides/cryptodevs/features/default.ini | 1 + > doc/guides/rel_notes/deprecation.rst | 4 ++-- > doc/guides/rel_notes/release_21_11.rst | 4 > lib/cryptodev/rte_cryptodev.h | 2 ++ > lib/security/rte_security.h| 18 ++ > 5 files changed, 27 insertions(+), 2 deletions(-) > > diff --git a/doc/guides/cryptodevs/features/default.ini > b/doc/guides/cryptodevs/features/default.ini > index c24814de98..96d95ddc81 100644 > --- a/doc/guides/cryptodevs/features/default.ini > +++ b/doc/guides/cryptodevs/features/default.ini > @@ -33,6 +33,7 @@ Non-Byte aligned data = > Sym raw data path API = > Cipher multiple data units = > Cipher wrapped key = > +Inner checksum = > > ; > ; Supported crypto algorithms of a default crypto driver. > diff --git a/doc/guides/rel_notes/deprecation.rst > b/doc/guides/rel_notes/deprecation.rst > index 05fc2fdee7..8308e00ed4 100644 > --- a/doc/guides/rel_notes/deprecation.rst > +++ b/doc/guides/rel_notes/deprecation.rst > @@ -232,8 +232,8 @@ Deprecation Notices >IPsec payload MSS (Maximum Segment Size), and ESN (Extended Sequence > Number). > > * security: The IPsec SA config options ``struct > rte_security_ipsec_sa_options`` > - will be updated with new fields to support new features like IPsec inner > - checksum, TSO in case of protocol offload. > + will be updated with new fields to support new features like TSO in case of > + protocol offload. > > * ipsec: The structure ``rte_ipsec_sa_prm`` will be extended with a new field >``hdr_l3_len`` to configure tunnel L3 header length. 
> diff --git a/doc/guides/rel_notes/release_21_11.rst > b/doc/guides/rel_notes/release_21_11.rst > index 8da851..93d1b36889 100644 > --- a/doc/guides/rel_notes/release_21_11.rst > +++ b/doc/guides/rel_notes/release_21_11.rst > @@ -194,6 +194,10 @@ ABI Changes >``rte_security_ipsec_xform`` to allow applications to configure SA soft >and hard expiry limits. Limits can be either in number of packets or bytes. > > +* security: The new options ``ip_csum_enable`` and ``l4_csum_enable`` were > added > + in structure ``rte_security_ipsec_sa_options`` to indicate whether inner > + packet IPv4 header checksum and L4 checksum need to be offloaded to > + security device. > > Known Issues > > diff --git a/lib/cryptodev/rte_cryptodev.h b/lib/cryptodev/rte_cryptodev.h > index bb01f0f195..d9271a6c45 100644 > --- a/lib/cryptodev/rte_cryptodev.h > +++ b/lib/cryptodev/rte_cryptodev.h > @@ -479,6 +479,8 @@ rte_cryptodev_asym_get_xform_enum(enum > rte_crypto_asym_xform_type *xform_enum, > /**< Support operations on multiple data-units message */ > #define RTE_CRYPTODEV_FF_CIPHER_WRAPPED_KEY (1ULL << 26) > /**< Support wrapped key in cipher xform */ > +#define RTE_CRYPTODEV_FF_SECURITY_INNER_CSUM (1ULL << 27) > +/**< Support inner checksum computation/verification */ > > /** > * Get the name of a crypto device feature flag > diff --git a/lib/security/rte_security.h b/lib/security/rte_security.h > index ab1a6e1f65..945f45ad76 100644 > --- a/lib/security/rte_security.h > +++ b/lib/security/rte_security.h > @@ -230,6 +230,24 @@ struct rte_security_ipsec_sa_options { >* * 0: Do not match UDP ports >*/ > uint32_t udp_ports_verify : 1; > + > + /** Compute/verify inner packet IPv4 header checksum in tunnel mode > + * > + * * 1: For outbound, compute inner packet IPv4 header checksum > + * before tunnel encapsulation and for inbound, verify after > + * tunnel decapsulation. > + * * 0: Inner packet IP header checksum is not computed/verified. 
> + */ > + uint32_t ip_csum_enable : 1; > + > + /** Compute/verify inner packet L4 checksum in tunnel mode > + * > + * * 1: For outbound, compute inner packet L4 checksum before > + * tunnel encapsulation and for inbound, verify after > + * tunnel decapsulation. > + * * 0: Inner packet L4 checksum is not computed/verified. > + */ > + uint32_t l4_csum_enable : 1; As I understand these 2 new flags serve two purposes: 1. report HW/PMD ability to perform these offloads. 2. allow user to enable/disable this offload on SA basis. One question I have - how it will work on data-path? Would decision to perform these offloads be based on mbuf->ol_flags value (same as we doing for ethdev TX offloads)? Or some other approach is implied? > }; > > /** IPSec security association direction */ > -- > 2.22.0
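For reference, the two new per-SA bitfields under discussion can be exercised with a small standalone mock. The struct below mirrors only the fields shown in the diff (the real rte_security_ipsec_sa_options has many more), and the helper plus its gating on the new feature flag are illustrative:

```c
#include <stdint.h>

/* Local mirror of the two bitfields the patch adds to
 * struct rte_security_ipsec_sa_options (illustrative copy only). */
struct sa_csum_options {
	uint32_t ip_csum_enable : 1;	/* inner IPv4 header checksum */
	uint32_t l4_csum_enable : 1;	/* inner L4 checksum */
};

/* Value of RTE_CRYPTODEV_FF_SECURITY_INNER_CSUM from the diff (bit 27) */
#define FF_SECURITY_INNER_CSUM (1ULL << 27)

/* Enable inner-checksum offload on an SA, but only when the device
 * advertises the capability via its feature flags. */
int
sa_enable_inner_csum(struct sa_csum_options *opts, uint64_t dev_ff)
{
	if (!(dev_ff & FF_SECURITY_INNER_CSUM))
		return -1;	/* device cannot offload inner checksums */
	opts->ip_csum_enable = 1;
	opts->l4_csum_enable = 1;
	return 0;
}
```

How the offload is then requested per packet on the data path (e.g. via mbuf->ol_flags) is exactly the open question Konstantin raises below.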
Re: [dpdk-dev] [PATCH v2 1/3] security: add SA config option for inner pkt csum
Hi Konstantin, Please see inline. Thanks, Anoob > -Original Message- > From: Ananyev, Konstantin > Sent: Wednesday, September 29, 2021 4:26 PM > To: Archana Muniganti ; Akhil Goyal > ; Nicolau, Radu ; Zhang, Roy > Fan ; hemant.agra...@nxp.com > Cc: Anoob Joseph ; Tejasree Kondoj > ; Ankur Dwivedi ; Jerin Jacob > Kollanukkaran ; dev@dpdk.org > Subject: [EXT] RE: [PATCH v2 1/3] security: add SA config option for inner pkt > csum > > External Email > > -- > > Add inner packet IPv4 hdr and L4 checksum enable options in conf. > > These will be used in case of protocol offload. > > Per SA, application could specify whether the > > checksum(compute/verify) can be offloaded to security device. > > > > Signed-off-by: Archana Muniganti > > --- > > doc/guides/cryptodevs/features/default.ini | 1 + > > doc/guides/rel_notes/deprecation.rst | 4 ++-- > > doc/guides/rel_notes/release_21_11.rst | 4 > > lib/cryptodev/rte_cryptodev.h | 2 ++ > > lib/security/rte_security.h| 18 ++ > > 5 files changed, 27 insertions(+), 2 deletions(-) > > > > diff --git a/doc/guides/cryptodevs/features/default.ini > > b/doc/guides/cryptodevs/features/default.ini > > index c24814de98..96d95ddc81 100644 > > --- a/doc/guides/cryptodevs/features/default.ini > > +++ b/doc/guides/cryptodevs/features/default.ini > > @@ -33,6 +33,7 @@ Non-Byte aligned data = Sym raw data path API = > > Cipher multiple data units = > > Cipher wrapped key = > > +Inner checksum = > > > > ; > > ; Supported crypto algorithms of a default crypto driver. > > diff --git a/doc/guides/rel_notes/deprecation.rst > > b/doc/guides/rel_notes/deprecation.rst > > index 05fc2fdee7..8308e00ed4 100644 > > --- a/doc/guides/rel_notes/deprecation.rst > > +++ b/doc/guides/rel_notes/deprecation.rst > > @@ -232,8 +232,8 @@ Deprecation Notices > >IPsec payload MSS (Maximum Segment Size), and ESN (Extended Sequence > Number). 
> > > > * security: The IPsec SA config options ``struct > > rte_security_ipsec_sa_options`` > > - will be updated with new fields to support new features like IPsec > > inner > > - checksum, TSO in case of protocol offload. > > + will be updated with new fields to support new features like TSO in > > + case of protocol offload. > > > > * ipsec: The structure ``rte_ipsec_sa_prm`` will be extended with a new > > field > >``hdr_l3_len`` to configure tunnel L3 header length. > > diff --git a/doc/guides/rel_notes/release_21_11.rst > > b/doc/guides/rel_notes/release_21_11.rst > > index 8da851..93d1b36889 100644 > > --- a/doc/guides/rel_notes/release_21_11.rst > > +++ b/doc/guides/rel_notes/release_21_11.rst > > @@ -194,6 +194,10 @@ ABI Changes > >``rte_security_ipsec_xform`` to allow applications to configure SA soft > >and hard expiry limits. Limits can be either in number of packets or > > bytes. > > > > +* security: The new options ``ip_csum_enable`` and ``l4_csum_enable`` > > +were added > > + in structure ``rte_security_ipsec_sa_options`` to indicate whether > > +inner > > + packet IPv4 header checksum and L4 checksum need to be offloaded to > > + security device. 
> > > > Known Issues > > > > diff --git a/lib/cryptodev/rte_cryptodev.h > > b/lib/cryptodev/rte_cryptodev.h index bb01f0f195..d9271a6c45 100644 > > --- a/lib/cryptodev/rte_cryptodev.h > > +++ b/lib/cryptodev/rte_cryptodev.h > > @@ -479,6 +479,8 @@ rte_cryptodev_asym_get_xform_enum(enum > > rte_crypto_asym_xform_type *xform_enum, /**< Support operations on > multiple data-units message */ > > #define RTE_CRYPTODEV_FF_CIPHER_WRAPPED_KEY(1ULL << 26) > > /**< Support wrapped key in cipher xform */ > > +#define RTE_CRYPTODEV_FF_SECURITY_INNER_CSUM (1ULL > << 27) > > +/**< Support inner checksum computation/verification */ > > > > /** > > * Get the name of a crypto device feature flag diff --git > > a/lib/security/rte_security.h b/lib/security/rte_security.h index > > ab1a6e1f65..945f45ad76 100644 > > --- a/lib/security/rte_security.h > > +++ b/lib/security/rte_security.h > > @@ -230,6 +230,24 @@ struct rte_security_ipsec_sa_options { > > * * 0: Do not match UDP ports > > */ > > uint32_t udp_ports_verify : 1; > > + > > + /** Compute/verify inner packet IPv4 header checksum in tunnel mode > > +* > > +* * 1: For outbound, compute inner packet IPv4 header checksum > > +* before tunnel encapsulation and for inbound, verify after > > +* tunnel decapsulation. > > +* * 0: Inner packet IP header checksum is not computed/verified. > > +*/ > > + uint32_t ip_csum_enable : 1; > > + > > + /** Compute/verify inner packet L4 checksum in tunnel mode > > +* > > +* * 1: For outbound, compute inner packet L4 checksum before > > +* tunnel encapsulation and for inbound, verify after > > +* tunnel decapsulation. > >
Re: [dpdk-dev] [PATCH v1] ethdev: introduce shared Rx queue
On Wed, Sep 29, 2021 at 09:52:20AM +, Ananyev, Konstantin wrote: > > > > -Original Message- > > From: Xueming(Steven) Li > > Sent: Wednesday, September 29, 2021 10:13 AM > > > + /* Locate real source fs according to mbuf->port. */ > > > + for (i = 0; i < nb_rx; ++i) { > > > + rte_prefetch0(pkts_burst[i + 1]); > > > > > > you access pkt_burst[] beyond array boundaries, > > > also you ask cpu to prefetch some unknown and possibly invalid > > > address. > > > > Sorry I forgot this topic. It's too late to prefetch current packet, so > > prefetch next is better. Prefetching an invalid address at the end of a loop > > doesn't hurt, it's common in DPDK. > > First of all it is usually never 'OK' to access array beyond its bounds. > Second prefetching invalid address *does* hurt performance badly on many CPUs > (TLB misses, consumed memory bandwidth etc.). > As a reference: https://lwn.net/Articles/444346/ > If some existing DPDK code really does that - then I believe it is an issue > and has to be addressed. > More important - it is really bad attitude to submit bogus code to DPDK > community > and pretend that it is 'OK'. > The main point we need to take from all this is that when prefetching you need to measure perf impact of it. In terms of the specific case of prefetching one past the end of the array, I would take the view that this is harmless in almost all cases. Unlike any prefetch of "NULL" as in the referenced mail, reading one past the end (or other small number of elements past the end) is far less likely to cause a TLB miss - and it's basically just reproducing behaviour we would expect off a HW prefetcher (though those may explicitly never cross page boundaries). However, if you feel it's just cleaner to put in an additional condition to remove the prefetch for the end case, that's ok also - again so long as it doesn't affect performance. 
[Since prefetch is a hint, I'm not sure if compilers or CPUs may be legally allowed to skip the branch and blindly prefetch in all cases?] /Bruce
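Bruce's conservative alternative, guarding the prefetch so it never reads past the burst, looks like this in standalone form (plain C, using the compiler builtin in place of rte_prefetch0; the function is illustrative):

```c
#include <stdint.h>

/* Walk a burst, prefetching the next element's payload one iteration
 * ahead, with an explicit bound so the last iteration issues no prefetch
 * at all. The prefetch is only a hint; correctness never depends on it. */
uint64_t
sum_burst_prefetched(const uint64_t *vals[], unsigned int nb)
{
	uint64_t sum = 0;
	unsigned int i;

	for (i = 0; i < nb; i++) {
		if (i + 1 < nb)
			__builtin_prefetch(vals[i + 1]);	/* hint only */
		sum += *vals[i];
	}
	return sum;
}
```

Whether the extra branch costs more than the one-past-end prefetch it avoids is exactly the "measure the perf impact" point made above.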
Re: [dpdk-dev] [PATCH v5] ethdev: fix representor port ID search by name
On 9/13/2021 4:56 PM, Andrew Rybchenko wrote: From: Viacheslav Galaktionov Getting a list of representors from a representor does not make sense. Instead, a parent device should be used. To this end, extend the rte_eth_dev_data structure to include the port ID of the backing device for representors. Signed-off-by: Viacheslav Galaktionov Signed-off-by: Andrew Rybchenko Acked-by: Haiyue Wang Acked-by: Beilei Xing --- The new field is added into the hole in rte_eth_dev_data structure. The patch does not change ABI, but extra care is required since ABI check is disabled for the structure because of the libabigail bug [1]. It should not be a problem anyway since 21.11 is an ABI breaking release. Potentially it is bad for out-of-tree drivers which implement representors but do not fill in a new parent_port_id field in rte_eth_dev_data structure. Get ID by name will not work. Did we change the name of the new field from parent_port_id to backer_port_id? mlx5 changes should be reviewed by maintainers very carefully, since we are not sure if we patch it correctly. 
[1] https://sourceware.org/bugzilla/show_bug.cgi?id=28060 v5: - try to improve name: backer_port_id instead of parent_port_id - init new field to RTE_MAX_ETHPORTS on allocation to avoid zero port usage by default v4: - apply mlx5 review notes: remove fallback from generic ethdev code and add fallback to mlx5 code to handle legacy usecase v3: - fix mlx5 build breakage v2: - fix mlx5 review notes - try device port ID first before parent in order to address backward compatibility issue drivers/net/bnxt/bnxt_reps.c | 1 + drivers/net/enic/enic_vf_representor.c | 1 + drivers/net/i40e/i40e_vf_representor.c | 1 + drivers/net/ice/ice_dcf_vf_representor.c | 1 + drivers/net/ixgbe/ixgbe_vf_representor.c | 1 + drivers/net/mlx5/linux/mlx5_os.c | 13 + drivers/net/mlx5/windows/mlx5_os.c | 13 + lib/ethdev/ethdev_driver.h | 6 +++--- lib/ethdev/rte_class_eth.c | 2 +- lib/ethdev/rte_ethdev.c | 9 + lib/ethdev/rte_ethdev_core.h | 6 ++ 11 files changed, 46 insertions(+), 8 deletions(-) diff --git a/drivers/net/bnxt/bnxt_reps.c b/drivers/net/bnxt/bnxt_reps.c index bdbad53b7d..0d50c0f1da 100644 --- a/drivers/net/bnxt/bnxt_reps.c +++ b/drivers/net/bnxt/bnxt_reps.c @@ -187,6 +187,7 @@ int bnxt_representor_init(struct rte_eth_dev *eth_dev, void *params) eth_dev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR | RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS; eth_dev->data->representor_id = rep_params->vf_id; + eth_dev->data->backer_port_id = rep_params->parent_dev->data->port_id; rte_eth_random_addr(vf_rep_bp->dflt_mac_addr); memcpy(vf_rep_bp->mac_addr, vf_rep_bp->dflt_mac_addr, diff --git a/drivers/net/enic/enic_vf_representor.c b/drivers/net/enic/enic_vf_representor.c index 79dd6e5640..fedb09ecd6 100644 --- a/drivers/net/enic/enic_vf_representor.c +++ b/drivers/net/enic/enic_vf_representor.c @@ -662,6 +662,7 @@ int enic_vf_representor_init(struct rte_eth_dev *eth_dev, void *init_params) eth_dev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR | RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS; eth_dev->data->representor_id = 
vf->vf_id; + eth_dev->data->backer_port_id = pf->port_id; eth_dev->data->mac_addrs = rte_zmalloc("enic_mac_addr_vf", sizeof(struct rte_ether_addr) * ENIC_UNICAST_PERFECT_FILTERS, 0); diff --git a/drivers/net/i40e/i40e_vf_representor.c b/drivers/net/i40e/i40e_vf_representor.c index 0481b55381..d65b821a01 100644 --- a/drivers/net/i40e/i40e_vf_representor.c +++ b/drivers/net/i40e/i40e_vf_representor.c @@ -514,6 +514,7 @@ i40e_vf_representor_init(struct rte_eth_dev *ethdev, void *init_params) ethdev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR | RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS; ethdev->data->representor_id = representor->vf_id; + ethdev->data->backer_port_id = pf->dev_data->port_id; /* Setting the number queues allocated to the VF */ ethdev->data->nb_rx_queues = vf->vsi->nb_qps; diff --git a/drivers/net/ice/ice_dcf_vf_representor.c b/drivers/net/ice/ice_dcf_vf_representor.c index 970461f3e9..e51d0aa6b9 100644 --- a/drivers/net/ice/ice_dcf_vf_representor.c +++ b/drivers/net/ice/ice_dcf_vf_representor.c @@ -418,6 +418,7 @@ ice_dcf_vf_repr_init(struct rte_eth_dev *vf_rep_eth_dev, void *init_param) vf_rep_eth_dev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR; vf_rep_eth_dev->data->representor_id = repr->vf_id; + vf_rep_eth_dev->data->backer_port_id = repr->dcf_eth_dev->data->port_id; vf_rep_eth_dev->data->mac_addrs = &repr->mac_addr; diff --git a/drivers/net/ixgbe/ixgbe_vf_representor.c b/drivers/net/ix
[dpdk-dev] [PATCH v3 0/3] add SA config option for inner pkt csum
Add inner packet IPv4 hdr and L4 checksum enable options in conf. These will be used in case of protocol offload. Per SA, application could specify whether the checksum(compute/verify) can be offloaded to security device. Depends on https://patches.dpdk.org/project/dpdk/list/?series=19243 Changes in v3: - Removed code unrelated to this series. Changes in v2: - Fixed release notes - Added feature flag in default.ini and cn10k.ini - Fixed test patch subject Archana Muniganti (3): security: add SA config option for inner pkt csum crypto/cnxk: add inner checksum test/crypto: add inner checksum cases app/test/test_cryptodev.c | 34 +++ app/test/test_cryptodev_security_ipsec.c | 195 ++ app/test/test_cryptodev_security_ipsec.h | 2 + ...st_cryptodev_security_ipsec_test_vectors.h | 6 + doc/guides/cryptodevs/features/cn10k.ini | 1 + doc/guides/cryptodevs/features/default.ini| 1 + doc/guides/rel_notes/deprecation.rst | 4 +- doc/guides/rel_notes/release_21_11.rst| 6 + drivers/crypto/cnxk/cn10k_cryptodev_ops.c | 65 -- drivers/crypto/cnxk/cn10k_ipsec.c | 49 - drivers/crypto/cnxk/cn10k_ipsec.h | 1 + drivers/crypto/cnxk/cn10k_ipsec_la_ops.h | 9 +- drivers/crypto/cnxk/cnxk_cryptodev.c | 3 + .../crypto/cnxk/cnxk_cryptodev_capabilities.c | 2 + lib/cryptodev/rte_cryptodev.h | 2 + lib/security/rte_security.h | 18 ++ 16 files changed, 378 insertions(+), 20 deletions(-) -- 2.22.0
[dpdk-dev] [PATCH v3 1/3] security: add SA config option for inner pkt csum
Add inner packet IPv4 hdr and L4 checksum enable options in conf. These will be used in case of protocol offload. Per SA, application could specify whether the checksum(compute/verify) can be offloaded to security device. Signed-off-by: Archana Muniganti --- doc/guides/cryptodevs/features/default.ini | 1 + doc/guides/rel_notes/deprecation.rst | 4 ++-- doc/guides/rel_notes/release_21_11.rst | 4 lib/cryptodev/rte_cryptodev.h | 2 ++ lib/security/rte_security.h| 18 ++ 5 files changed, 27 insertions(+), 2 deletions(-) diff --git a/doc/guides/cryptodevs/features/default.ini b/doc/guides/cryptodevs/features/default.ini index c24814de98..96d95ddc81 100644 --- a/doc/guides/cryptodevs/features/default.ini +++ b/doc/guides/cryptodevs/features/default.ini @@ -33,6 +33,7 @@ Non-Byte aligned data = Sym raw data path API = Cipher multiple data units = Cipher wrapped key = +Inner checksum = ; ; Supported crypto algorithms of a default crypto driver. diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst index 05fc2fdee7..8308e00ed4 100644 --- a/doc/guides/rel_notes/deprecation.rst +++ b/doc/guides/rel_notes/deprecation.rst @@ -232,8 +232,8 @@ Deprecation Notices IPsec payload MSS (Maximum Segment Size), and ESN (Extended Sequence Number). * security: The IPsec SA config options ``struct rte_security_ipsec_sa_options`` - will be updated with new fields to support new features like IPsec inner - checksum, TSO in case of protocol offload. + will be updated with new fields to support new features like TSO in case of + protocol offload. * ipsec: The structure ``rte_ipsec_sa_prm`` will be extended with a new field ``hdr_l3_len`` to configure tunnel L3 header length. 
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst index 8da851..93d1b36889 100644 --- a/doc/guides/rel_notes/release_21_11.rst +++ b/doc/guides/rel_notes/release_21_11.rst @@ -194,6 +194,10 @@ ABI Changes ``rte_security_ipsec_xform`` to allow applications to configure SA soft and hard expiry limits. Limits can be either in number of packets or bytes. +* security: The new options ``ip_csum_enable`` and ``l4_csum_enable`` were added + in structure ``rte_security_ipsec_sa_options`` to indicate whether inner + packet IPv4 header checksum and L4 checksum need to be offloaded to + security device. Known Issues diff --git a/lib/cryptodev/rte_cryptodev.h b/lib/cryptodev/rte_cryptodev.h index bb01f0f195..d9271a6c45 100644 --- a/lib/cryptodev/rte_cryptodev.h +++ b/lib/cryptodev/rte_cryptodev.h @@ -479,6 +479,8 @@ rte_cryptodev_asym_get_xform_enum(enum rte_crypto_asym_xform_type *xform_enum, /**< Support operations on multiple data-units message */ #define RTE_CRYPTODEV_FF_CIPHER_WRAPPED_KEY(1ULL << 26) /**< Support wrapped key in cipher xform */ +#define RTE_CRYPTODEV_FF_SECURITY_INNER_CSUM (1ULL << 27) +/**< Support inner checksum computation/verification */ /** * Get the name of a crypto device feature flag diff --git a/lib/security/rte_security.h b/lib/security/rte_security.h index ab1a6e1f65..945f45ad76 100644 --- a/lib/security/rte_security.h +++ b/lib/security/rte_security.h @@ -230,6 +230,24 @@ struct rte_security_ipsec_sa_options { * * 0: Do not match UDP ports */ uint32_t udp_ports_verify : 1; + + /** Compute/verify inner packet IPv4 header checksum in tunnel mode +* +* * 1: For outbound, compute inner packet IPv4 header checksum +* before tunnel encapsulation and for inbound, verify after +* tunnel decapsulation. +* * 0: Inner packet IP header checksum is not computed/verified. 
+*/ + uint32_t ip_csum_enable : 1; + + /** Compute/verify inner packet L4 checksum in tunnel mode +* +* * 1: For outbound, compute inner packet L4 checksum before +* tunnel encapsulation and for inbound, verify after +* tunnel decapsulation. +* * 0: Inner packet L4 checksum is not computed/verified. +*/ + uint32_t l4_csum_enable : 1; }; /** IPSec security association direction */ -- 2.22.0
[dpdk-dev] [PATCH v3 2/3] crypto/cnxk: add inner checksum
Add inner checksum support for cn10k. Signed-off-by: Archana Muniganti --- doc/guides/cryptodevs/features/cn10k.ini | 1 + doc/guides/rel_notes/release_21_11.rst| 1 + drivers/crypto/cnxk/cn10k_cryptodev_ops.c | 65 +++ drivers/crypto/cnxk/cn10k_ipsec.c | 49 +- drivers/crypto/cnxk/cn10k_ipsec.h | 1 + drivers/crypto/cnxk/cn10k_ipsec_la_ops.h | 9 ++- drivers/crypto/cnxk/cnxk_cryptodev.c | 3 + .../crypto/cnxk/cnxk_cryptodev_capabilities.c | 2 + 8 files changed, 113 insertions(+), 18 deletions(-) diff --git a/doc/guides/cryptodevs/features/cn10k.ini b/doc/guides/cryptodevs/features/cn10k.ini index f5552feca3..9d08bd5c04 100644 --- a/doc/guides/cryptodevs/features/cn10k.ini +++ b/doc/guides/cryptodevs/features/cn10k.ini @@ -15,6 +15,7 @@ OOP SGL In SGL Out = Y OOP LB In LB Out = Y Symmetric sessionless = Y Digest encrypted = Y +Inner checksum = Y ; ; Supported crypto algorithms of 'cn10k' crypto driver. diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst index 93d1b36889..163cdaa800 100644 --- a/doc/guides/rel_notes/release_21_11.rst +++ b/doc/guides/rel_notes/release_21_11.rst @@ -72,6 +72,7 @@ New Features * Added Transport mode support in lookaside protocol (IPsec) for CN10K. * Added UDP encapsulation support in lookaside protocol (IPsec) for CN10K. * Added support for lookaside protocol (IPsec) offload for CN9K. + * Added inner checksum support in lookaside protocol (IPsec) for CN10K. 
* **Added support for event crypto adapter on Marvell CN10K and CN9K.** diff --git a/drivers/crypto/cnxk/cn10k_cryptodev_ops.c b/drivers/crypto/cnxk/cn10k_cryptodev_ops.c index 3caf05aab9..c25c8e67b2 100644 --- a/drivers/crypto/cnxk/cn10k_cryptodev_ops.c +++ b/drivers/crypto/cnxk/cn10k_cryptodev_ops.c @@ -50,7 +50,7 @@ cn10k_cpt_sym_temp_sess_create(struct cnxk_cpt_qp *qp, struct rte_crypto_op *op) static __rte_always_inline int __rte_hot cpt_sec_inst_fill(struct rte_crypto_op *op, struct cn10k_sec_session *sess, - struct cpt_inst_s *inst) + struct cpt_inflight_req *infl_req, struct cpt_inst_s *inst) { struct rte_crypto_sym_op *sym_op = op->sym; union roc_ot_ipsec_sa_word2 *w2; @@ -72,8 +72,10 @@ cpt_sec_inst_fill(struct rte_crypto_op *op, struct cn10k_sec_session *sess, if (w2->s.dir == ROC_IE_SA_DIR_OUTBOUND) ret = process_outb_sa(op, sa, inst); - else + else { + infl_req->op_flags |= CPT_OP_FLAGS_IPSEC_DIR_INBOUND; ret = process_inb_sa(op, sa, inst); + } return ret; } @@ -122,7 +124,8 @@ cn10k_cpt_fill_inst(struct cnxk_cpt_qp *qp, struct rte_crypto_op *ops[], if (op->sess_type == RTE_CRYPTO_OP_SECURITY_SESSION) { sec_sess = get_sec_session_private_data( sym_op->sec_session); - ret = cpt_sec_inst_fill(op, sec_sess, &inst[0]); + ret = cpt_sec_inst_fill(op, sec_sess, infl_req, + &inst[0]); if (unlikely(ret)) return 0; w7 = sec_sess->sa.inst.w7; @@ -342,6 +345,49 @@ cn10k_cpt_sec_post_process(struct rte_crypto_op *cop, m->pkt_len = m_len; } +static inline void +cn10k_cpt_sec_ucc_process(struct rte_crypto_op *cop, + struct cpt_inflight_req *infl_req, + const uint8_t uc_compcode) +{ + struct cn10k_sec_session *sess; + struct cn10k_ipsec_sa *sa; + struct rte_mbuf *mbuf; + + if (uc_compcode == ROC_IE_OT_UCC_SUCCESS_SA_SOFTEXP_FIRST) + cop->aux_flags = RTE_CRYPTO_OP_AUX_FLAGS_IPSEC_SOFT_EXPIRY; + + if (!(infl_req->op_flags & CPT_OP_FLAGS_IPSEC_DIR_INBOUND)) + return; + + sess = get_sec_session_private_data(cop->sym->sec_session); + sa = &sess->sa; + + mbuf = 
cop->sym->m_src; + + switch (uc_compcode) { + case ROC_IE_OT_UCC_SUCCESS: + if (sa->ip_csum_enable) + mbuf->ol_flags |= PKT_RX_IP_CKSUM_GOOD; + break; + case ROC_IE_OT_UCC_SUCCESS_PKT_IP_BADCSUM: + mbuf->ol_flags |= PKT_RX_IP_CKSUM_BAD; + break; + case ROC_IE_OT_UCC_SUCCESS_PKT_L4_GOODCSUM: + mbuf->ol_flags |= PKT_RX_L4_CKSUM_GOOD; + if (sa->ip_csum_enable) + mbuf->ol_flags |= PKT_RX_IP_CKSUM_GOOD; + break; + case ROC_IE_OT_UCC_SUCCESS_PKT_L4_BADCSUM: + mbuf->ol_flags |= PKT_RX_L4_CKSUM_BAD; + if (sa->ip_csum_enable) + mbuf->ol_flags |= PKT_RX_IP_CKSUM_GOOD; + break; + default: + break; + } +} + static inline void cn10k_cpt_de
[dpdk-dev] [PATCH v3 3/3] test/crypto: add inner checksum cases
This patch adds tests for inner IP and inner L4 checksum in IPsec mode. Signed-off-by: Archana Muniganti --- app/test/test_cryptodev.c | 34 +++ app/test/test_cryptodev_security_ipsec.c | 195 ++ app/test/test_cryptodev_security_ipsec.h | 2 + ...st_cryptodev_security_ipsec_test_vectors.h | 6 + doc/guides/rel_notes/release_21_11.rst| 1 + 5 files changed, 238 insertions(+) diff --git a/app/test/test_cryptodev.c b/app/test/test_cryptodev.c index 5f0d023451..c127e6bc04 100644 --- a/app/test/test_cryptodev.c +++ b/app/test/test_cryptodev.c @@ -18,6 +18,8 @@ #include #include #include +#include +#include #ifdef RTE_CRYPTO_SCHEDULER #include @@ -9275,6 +9277,30 @@ test_ipsec_proto_udp_ports_verify(const void *data __rte_unused) return test_ipsec_proto_all(&flags); } +static int +test_ipsec_proto_inner_ip_csum(const void *data __rte_unused) +{ + struct ipsec_test_flags flags; + + memset(&flags, 0, sizeof(flags)); + + flags.ip_csum = true; + + return test_ipsec_proto_all(&flags); +} + +static int +test_ipsec_proto_inner_l4_csum(const void *data __rte_unused) +{ + struct ipsec_test_flags flags; + + memset(&flags, 0, sizeof(flags)); + + flags.l4_csum = true; + + return test_ipsec_proto_all(&flags); +} + static int test_PDCP_PROTO_all(void) { @@ -14231,6 +14257,14 @@ static struct unit_test_suite ipsec_proto_testsuite = { "Tunnel src and dst addr verification", ut_setup_security, ut_teardown, test_ipsec_proto_tunnel_src_dst_addr_verify), + TEST_CASE_NAMED_ST( + "Inner IP checksum", + ut_setup_security, ut_teardown, + test_ipsec_proto_inner_ip_csum), + TEST_CASE_NAMED_ST( + "Inner L4 checksum", + ut_setup_security, ut_teardown, + test_ipsec_proto_inner_l4_csum), TEST_CASES_END() /**< NULL terminate unit test array */ } }; diff --git a/app/test/test_cryptodev_security_ipsec.c b/app/test/test_cryptodev_security_ipsec.c index 764e77bbff..bcd9746c98 100644 --- a/app/test/test_cryptodev_security_ipsec.c +++ b/app/test/test_cryptodev_security_ipsec.c @@ -7,6 +7,7 @@ #include #include 
#include +#include #include #include "test.h" @@ -103,6 +104,22 @@ test_ipsec_sec_caps_verify(struct rte_security_ipsec_xform *ipsec_xform, return -ENOTSUP; } + if (ipsec_xform->options.ip_csum_enable == 1 && + sec_cap->ipsec.options.ip_csum_enable == 0) { + if (!silent) + RTE_LOG(INFO, USER1, + "Inner IP checksum is not supported\n"); + return -ENOTSUP; + } + + if (ipsec_xform->options.l4_csum_enable == 1 && + sec_cap->ipsec.options.l4_csum_enable == 0) { + if (!silent) + RTE_LOG(INFO, USER1, + "Inner L4 checksum is not supported\n"); + return -ENOTSUP; + } + return 0; } @@ -160,6 +177,56 @@ test_ipsec_td_in_from_out(const struct ipsec_test_data *td_out, } } +static bool +is_ipv4(void *ip) +{ + struct rte_ipv4_hdr *ipv4 = ip; + uint8_t ip_ver; + + ip_ver = (ipv4->version_ihl & 0xf0) >> RTE_IPV4_IHL_MULTIPLIER; + if (ip_ver == IPVERSION) + return true; + else + return false; +} + +static void +test_ipsec_csum_init(void *ip, bool l3, bool l4) +{ + struct rte_ipv4_hdr *ipv4; + struct rte_tcp_hdr *tcp; + struct rte_udp_hdr *udp; + uint8_t next_proto; + uint8_t size; + + if (is_ipv4(ip)) { + ipv4 = ip; + size = sizeof(struct rte_ipv4_hdr); + next_proto = ipv4->next_proto_id; + + if (l3) + ipv4->hdr_checksum = 0; + } else { + size = sizeof(struct rte_ipv6_hdr); + next_proto = ((struct rte_ipv6_hdr *)ip)->proto; + } + + if (l4) { + switch (next_proto) { + case IPPROTO_TCP: + tcp = (struct rte_tcp_hdr *)RTE_PTR_ADD(ip, size); + tcp->cksum = 0; + break; + case IPPROTO_UDP: + udp = (struct rte_udp_hdr *)RTE_PTR_ADD(ip, size); + udp->dgram_cksum = 0; + break; + default: + return; + } + } +} + void test_ipsec_td_prepare(const struct crypto_param *param1, const struct crypto_param *param2, @@ -194,6 +261,17 @@ test_ipsec_td_prepare(const struct crypto_param *param1, if (flags->sa_expiry_pkts_soft)
Re: [dpdk-dev] [PATCH v2 1/3] security: add SA config option for inner pkt csum
Hi Anoob, > Hi Konstanin, > > Please see inline. > > Thanks, > Anoob > > > -Original Message- > > From: Ananyev, Konstantin > > Sent: Wednesday, September 29, 2021 4:26 PM > > To: Archana Muniganti ; Akhil Goyal > > ; Nicolau, Radu ; Zhang, Roy > > Fan ; hemant.agra...@nxp.com > > Cc: Anoob Joseph ; Tejasree Kondoj > > ; Ankur Dwivedi ; Jerin Jacob > > Kollanukkaran ; dev@dpdk.org > > Subject: [EXT] RE: [PATCH v2 1/3] security: add SA config option for inner > > pkt > > csum > > > > External Email > > > > -- > > > Add inner packet IPv4 hdr and L4 checksum enable options in conf. > > > These will be used in case of protocol offload. > > > Per SA, application could specify whether the > > > checksum(compute/verify) can be offloaded to security device. > > > > > > Signed-off-by: Archana Muniganti > > > --- > > > doc/guides/cryptodevs/features/default.ini | 1 + > > > doc/guides/rel_notes/deprecation.rst | 4 ++-- > > > doc/guides/rel_notes/release_21_11.rst | 4 > > > lib/cryptodev/rte_cryptodev.h | 2 ++ > > > lib/security/rte_security.h| 18 ++ > > > 5 files changed, 27 insertions(+), 2 deletions(-) > > > > > > diff --git a/doc/guides/cryptodevs/features/default.ini > > > b/doc/guides/cryptodevs/features/default.ini > > > index c24814de98..96d95ddc81 100644 > > > --- a/doc/guides/cryptodevs/features/default.ini > > > +++ b/doc/guides/cryptodevs/features/default.ini > > > @@ -33,6 +33,7 @@ Non-Byte aligned data = Sym raw data path API = > > > Cipher multiple data units = > > > Cipher wrapped key = > > > +Inner checksum = > > > > > > ; > > > ; Supported crypto algorithms of a default crypto driver. > > > diff --git a/doc/guides/rel_notes/deprecation.rst > > > b/doc/guides/rel_notes/deprecation.rst > > > index 05fc2fdee7..8308e00ed4 100644 > > > --- a/doc/guides/rel_notes/deprecation.rst > > > +++ b/doc/guides/rel_notes/deprecation.rst > > > @@ -232,8 +232,8 @@ Deprecation Notices > > >IPsec payload MSS (Maximum Segment Size), and ESN (Extended Sequence > > Number). 
> > > > > > * security: The IPsec SA config options ``struct > > > rte_security_ipsec_sa_options`` > > > - will be updated with new fields to support new features like IPsec > > > inner > > > - checksum, TSO in case of protocol offload. > > > + will be updated with new fields to support new features like TSO in > > > + case of protocol offload. > > > > > > * ipsec: The structure ``rte_ipsec_sa_prm`` will be extended with a new > > > field > > >``hdr_l3_len`` to configure tunnel L3 header length. > > > diff --git a/doc/guides/rel_notes/release_21_11.rst > > > b/doc/guides/rel_notes/release_21_11.rst > > > index 8da851..93d1b36889 100644 > > > --- a/doc/guides/rel_notes/release_21_11.rst > > > +++ b/doc/guides/rel_notes/release_21_11.rst > > > @@ -194,6 +194,10 @@ ABI Changes > > >``rte_security_ipsec_xform`` to allow applications to configure SA soft > > >and hard expiry limits. Limits can be either in number of packets or > > > bytes. > > > > > > +* security: The new options ``ip_csum_enable`` and ``l4_csum_enable`` > > > +were added > > > + in structure ``rte_security_ipsec_sa_options`` to indicate whether > > > +inner > > > + packet IPv4 header checksum and L4 checksum need to be offloaded to > > > + security device. 
> > > > > > Known Issues > > > > > > diff --git a/lib/cryptodev/rte_cryptodev.h > > > b/lib/cryptodev/rte_cryptodev.h index bb01f0f195..d9271a6c45 100644 > > > --- a/lib/cryptodev/rte_cryptodev.h > > > +++ b/lib/cryptodev/rte_cryptodev.h > > > @@ -479,6 +479,8 @@ rte_cryptodev_asym_get_xform_enum(enum > > > rte_crypto_asym_xform_type *xform_enum, /**< Support operations on > > multiple data-units message */ > > > #define RTE_CRYPTODEV_FF_CIPHER_WRAPPED_KEY (1ULL << 26) > > > /**< Support wrapped key in cipher xform */ > > > +#define RTE_CRYPTODEV_FF_SECURITY_INNER_CSUM (1ULL > > << 27) > > > +/**< Support inner checksum computation/verification */ > > > > > > /** > > > * Get the name of a crypto device feature flag diff --git > > > a/lib/security/rte_security.h b/lib/security/rte_security.h index > > > ab1a6e1f65..945f45ad76 100644 > > > --- a/lib/security/rte_security.h > > > +++ b/lib/security/rte_security.h > > > @@ -230,6 +230,24 @@ struct rte_security_ipsec_sa_options { > > >* * 0: Do not match UDP ports > > >*/ > > > uint32_t udp_ports_verify : 1; > > > + > > > + /** Compute/verify inner packet IPv4 header checksum in tunnel mode > > > + * > > > + * * 1: For outbound, compute inner packet IPv4 header checksum > > > + * before tunnel encapsulation and for inbound, verify after > > > + * tunnel decapsulation. > > > + * * 0: Inner packet IP header checksum is not computed/verified. > > > + */ > > > + uint32_t ip_csum_enable : 1; > > > + > > > + /** Compute/verify in
Re: [dpdk-dev] [PATCH v1] ethdev: introduce shared Rx queue
> -Original Message- > From: Richardson, Bruce > Sent: Wednesday, September 29, 2021 12:08 PM > To: Ananyev, Konstantin > Cc: Xueming(Steven) Li ; jerinjac...@gmail.com; > NBU-Contact-Thomas Monjalon ; > andrew.rybche...@oktetlabs.ru; dev@dpdk.org; Yigit, Ferruh > > Subject: Re: [dpdk-dev] [PATCH v1] ethdev: introduce shared Rx queue > > On Wed, Sep 29, 2021 at 09:52:20AM +, Ananyev, Konstantin wrote: > > > > > > > -Original Message- > > > From: Xueming(Steven) Li > > > Sent: Wednesday, September 29, 2021 10:13 AM > > > > > + /* Locate real source fs according to mbuf->port. */ > > > > + for (i = 0; i < nb_rx; ++i) { > > > > + rte_prefetch0(pkts_burst[i + 1]); > > > > > > > > you access pkt_burst[] beyond array boundaries, > > > > also you ask cpu to prefetch some unknown and possibly invalid > > > > address. > > > > > > Sorry I forgot this topic. It's too late to prefetch current packet, so > > > perfetch next is better. Prefetch an invalid address at end of a look > > > doesn't hurt, it's common in DPDK. > > > > First of all it is usually never 'OK' to access array beyond its bounds. > > Second prefetching invalid address *does* hurt performance badly on many > > CPUs > > (TLB misses, consumed memory bandwidth etc.). > > As a reference: https://lwn.net/Articles/444346/ > > If some existing DPDK code really does that - then I believe it is an issue > > and has to be addressed. > > More important - it is really bad attitude to submit bogus code to DPDK > > community > > and pretend that it is 'OK'. > > > > The main point we need to take from all this is that when > prefetching you need to measure perf impact of it. > > In terms of the specific case of prefetching one past the end of the array, > I would take the view that this is harmless in almost all cases. 
Unlike any > prefetch of "NULL" as in the referenced mail, reading one past the end (or > other small number of elements past the end) is far less likely to cause a > TLB miss - and it's basically just reproducing behaviour we would expect > off a HW prefetcher (though those may explicitly never cross page > boundaries). However, if you feel it's just cleaner to put in an > additional condition to remove the prefetch for the end case, that's ok > also - again so long as it doesn't affect performance. [Since prefetch is a > hint, I'm not sure if compilers or CPUs may be legally allowed to skip the > branch and blindly prefetch in all cases?] Please look at the code. It doesn't prefetch the next element beyond array boundaries. It first reads the address from the element that is beyond array boundaries (which is a bug by itself). Then it prefetches that bogus address. We simply don't know whether this address is valid and where it points to. In other words, it doesn't do: rte_prefetch0(&pkts_burst[i + 1]); It does: rte_prefetch0(pkts_burst[i + 1]);
Re: [dpdk-dev] [dpdk-stable] [PATCH v5 2/2] ethdev: fix docs of drivers callbacks getting xstats by IDs
On 9/29/21 11:44 AM, Ferruh Yigit wrote: > On 9/28/2021 5:53 PM, Andrew Rybchenko wrote: >> On 9/28/21 7:50 PM, Ferruh Yigit wrote: >>> On 9/28/2021 1:05 PM, Andrew Rybchenko wrote: From: Ivan Ilchenko Update xstats by IDs callbacks documentation in accordance with ethdev usage of these callbacks. Document valid combinations of input arguments to make driver implementation simpler. Fixes: 79c913a42f0 ("ethdev: retrieve xstats by ID") Cc: sta...@dpdk.org Signed-off-by: Ivan Ilchenko Signed-off-by: Andrew Rybchenko Reviewed-by: Andy Moreton --- lib/ethdev/ethdev_driver.h | 42 -- 1 file changed, 40 insertions(+), 2 deletions(-) diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h index 40e474aa7e..c89eefcc42 100644 --- a/lib/ethdev/ethdev_driver.h +++ b/lib/ethdev/ethdev_driver.h @@ -187,11 +187,28 @@ typedef int (*eth_xstats_get_t)(struct rte_eth_dev *dev, struct rte_eth_xstat *stats, unsigned int n); /**< @internal Get extended stats of an Ethernet device. */ +/** + * @internal + * Get extended stats of an Ethernet device. + * + * @param dev + * ethdev handle of port. + * @param ids + * IDs array to retrieve specific statistics. Must not be NULL. + * @param values + * A pointer to a table to be filled with device statistics values. + * Must not be NULL. + * @param n + * Element count in @p ids and @p values. + * + * @return + * - A number of filled in stats. + * - A negative value on error. + */ typedef int (*eth_xstats_get_by_id_t)(struct rte_eth_dev *dev, const uint64_t *ids, uint64_t *values, unsigned int n); -/**< @internal Get extended stats of an Ethernet device. */ /** * @internal @@ -218,10 +235,31 @@ typedef int (*eth_xstats_get_names_t)(struct rte_eth_dev *dev, struct rte_eth_xstat_name *xstats_names, unsigned int size); /**< @internal Get names of extended stats of an Ethernet device. */ +/** + * @internal + * Get names of extended stats of an Ethernet device. + * For name count, set @p xstats_names and @p ids to NULL. 
>>> Why limiting this behavior to 'xstats_get_names_by_id'? >>> >>> Internally 'xstats_get_names_by_id' is used to get the count, but I think >>> this >>> is not intentionally selected, just one of the xstats_*_by_id dev_ops used. >>> >>> I think it is confusing to have this support for one of the '_by_id' >>> dev_ops but >>> not for other. Why not require both to support returning 'count'? >> >> Simply because it is dead code. There is no point to require >> from driver to have dead code. >> > > Let me step back a little, both ethdev APIs can be used to return xstats count > by providing 'values/names' & 'ids' pointers as NULL and 'size' as 0: > 'rte_eth_xstats_get_names_by_id()' > 'rte_eth_xstats_get_by_id()' > > But internally both APIs use 'xstats_get_names_by_id' dev_ops to get the count, > as said above I believe this selection is done unintentionally. > > > I am for below two options: > > a) Internally use 'xstats_get_names' || 'xstats_get' dev_ops to get the xstats > count, and doesn't support getting xstats count for both '_by_id' dev_ops, this > simplifies driver code. As far as I remember I suggested this before, still I > prefer this one. > > b) If we will support getting xstats count from '_by_id' dev_ops, I think both > should support it, to not make it more complex to figure out which one support > what. As sample both 'xstats_get_names' and 'xstats_get' supports getting > xstats > count, not just one. > In (b) do you suggest to change ethdev to use the xstats_get_by_id driver op if the user asks for the number of xstats using rte_eth_xstats_get_by_id()? It will complicate ethdev and will complicate drivers. Just for the symmetry? The patch does not change the external API and does not change ethdev behaviour. It just clarifies requirements on driver ops, to allow drivers to check less and support fewer cases. If we take one more step back, frankly speaking I think we have too many functions to retrieve statistics. 
I can understand from ethdev API point of view taking API stability into account etc. But why do we have so many complicated driver callbacks? First of all I'd like to do one more cleanup: change eth_xstats_get_names_by_id_t prototype to have ids before xstats_names. I.e. eth_xstats_get_by_id_t(dev, ids, values, size) eth_xstats_get_names_by_id_t(dev, ids, names, size) Second, merge eth_xstats_get_names_t and eth_xstats_get_names_by_id_t callbacks to keep name of the first, but prototype from the second. The reason is equal functionality if ids==
Re: [dpdk-dev] [PATCH] net/mlx5: fix flow tables double release
From: Bing Zhao > In the function mlx5_alloc_shared_dr(), there are various reasons to result in > a failure and error clean up process. While in the caller of mlx5_dev_spawn(), > once there is a error occurring after the mlx5_alloc_shared_dr(), the > mlx5_os_free_shared_dr() is called to release all the resources. > > To prevent a double release, the pointers of the resources should be > checked before the releasing and set to NULL after done. > > In the mlx5_free_table_hash_list(), after the releasing, the pointer was > missed to set to NULL and a double release may cause a crash. > > By setting the tables pointer to NULL as done for other resources, the double > release and crash could be solved. > > Fixes: 54534725d2f3 ("net/mlx5: fix flow table hash list conversion") > Cc: ma...@mellanox.com > Cc: sta...@dpdk.org > > Signed-off-by: Bing Zhao Acked-by: Matan Azrad
Re: [dpdk-dev] [PATCH v1] ethdev: introduce shared Rx queue
On Wed, 2021-09-29 at 09:52 +, Ananyev, Konstantin wrote: > > > -Original Message- > > From: Xueming(Steven) Li > > Sent: Wednesday, September 29, 2021 10:13 AM > > To: jerinjac...@gmail.com; Ananyev, Konstantin > > > > Cc: NBU-Contact-Thomas Monjalon ; > > andrew.rybche...@oktetlabs.ru; dev@dpdk.org; Yigit, Ferruh > > > > Subject: Re: [dpdk-dev] [PATCH v1] ethdev: introduce shared Rx queue > > > > On Wed, 2021-09-29 at 00:26 +, Ananyev, Konstantin wrote: > > > > > > > > > > > > > > > > > In current DPDK framework, each RX queue > > > > > > > > > > > > > > > > > is > > > > > > > > > > > > > > > > > pre-loaded with mbufs > > > > > > > > > > > > > > > > > for incoming packets. When number of > > > > > > > > > > > > > > > > > representors scale out in a > > > > > > > > > > > > > > > > > switch domain, the memory consumption > > > > > > > > > > > > > > > > > became > > > > > > > > > > > > > > > > > significant. Most > > > > > > > > > > > > > > > > > important, polling all ports leads to > > > > > > > > > > > > > > > > > high > > > > > > > > > > > > > > > > > cache miss, high > > > > > > > > > > > > > > > > > latency and low throughput. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > This patch introduces shared RX queue. > > > > > > > > > > > > > > > > > Ports > > > > > > > > > > > > > > > > > with same > > > > > > > > > > > > > > > > > configuration in a switch domain could > > > > > > > > > > > > > > > > > share > > > > > > > > > > > > > > > > > RX queue set by specifying sharing group. > > > > > > > > > > > > > > > > > Polling any queue using same shared RX > > > > > > > > > > > > > > > > > queue > > > > > > > > > > > > > > > > > receives packets from > > > > > > > > > > > > > > > > > all member ports. Source port is > > > > > > > > > > > > > > > > > identified > > > > > > > > > > > > > > > > > by mbuf->port. 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Port queue number in a shared group > > > > > > > > > > > > > > > > > should be > > > > > > > > > > > > > > > > > identical. Queue > > > > > > > > > > > > > > > > > index is > > > > > > > > > > > > > > > > > 1:1 mapped in shared group. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Share RX queue is supposed to be polled > > > > > > > > > > > > > > > > > on > > > > > > > > > > > > > > > > > same thread. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Multiple groups is supported by group ID. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Is this offload specific to the > > > > > > > > > > > > > > > > representor? If > > > > > > > > > > > > > > > > so can this name be changed specifically to > > > > > > > > > > > > > > > > representor? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Yes, PF and representor in switch domain > > > > > > > > > > > > > > > could > > > > > > > > > > > > > > > take advantage. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > If it is for a generic case, how the flow > > > > > > > > > > > > > > > > ordering will be maintained? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Not quite sure that I understood your > > > > > > > > > > > > > > > question. > > > > > > > > > > > > > > > The control path of is > > > > > > > > > > > > > > > almost same as before, PF and representor > > > > > > > > > > > > > > > port > > > > > > > > > > > > > > > still needed, rte flows not impacted. > > > > > > > > > > > > > > > Queues still needed for each member port, > > > > > > > > > > > > > > > descriptors(mbuf) will be > > > > > > > > > > > > > > > supplied from shared Rx queue in my PMD > > > > > > > > > > > > > > > implementation. 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > My question was if create a generic > > > > > > > > > > > > > > RTE_ETH_RX_OFFLOAD_SHARED_RXQ offload, multiple > > > > > > > > > > > > > > ethdev receive queues land into > > > > > > the same > > > > > > > > > > > > > > receive queue, In that case, how the flow order > > > > > > > > > > > > > > is > > > > > > > > > > > > > > maintained for respective receive queues. > > > > > > > > > > > > > > > > > > > > > > > > > > I guess the question is testpmd forward stream? > > > > > > > > > > > > > The > > > > > > > > > > > > > forwarding logic has to be changed slightly in > > > > > > > > > > > > > case > > > > > > > > > > > > > of shared rxq. > > > > > > > > > > > > > basically for each packet in rx_burst result, > > > > > > > > > > > > > lookup > > > > > > > > > > > > > source stream according to mbuf->port, forwarding > > > > > > > > > > > > > to > > > > > > > > > > > > > target fs. > > > > > > > > > > > > > Packets from same source port could be grouped as > > > > > > > > > > > > > a > > > > > > > > > > > > > small burst to process, this will accelerates the > > > > > > > > > > > > > performance if traffic > > > > > > come from > > > > > > > > > > > > > limited ports. I'll introduce some common api to > >
[dpdk-dev] Questions about vm2vm vhost-user/virtio-net test
Hi, Coquelin, Chenbo, all,

I want to seek your help with a vm2vm vhost-user/virtio-net test. When I set up the test, I cannot ping vm2 from vm1; that is, ping fails between vm1 and vm2. A detailed description follows.

Host configuration: Linux localhost 5.14.0-rc4+ #1 SMP PREEMPT Sun Sep 26 16:52:13 CST 2021 aarch64 aarch64 aarch64 GNU/Linux

1. Set up vhost on the host:

cd /home/humin/virtio_test
rm -rf ./vhost-net*
/usr/app/testpmd -l 2-4 -n 4 --no-pci --file-prefix=vhost \
 --vdev 'net_vhost0,iface=vhost-net0,queues=1' \
 --vdev 'net_vhost1,iface=vhost-net1,queues=1' \
 -- -i --nb-cores=2 --txd=1024 --rxd=1024
testpmd> set fwd mac
testpmd> start

2. Create vm1 and vm2:

taskset -c 13 qemu-system-aarch64 \
 -name us-vhost-vm1 \
 -kernel /home/humin/virtio_test/Image \
 -initrd /home/humin/virtio_test/rootfs.cpio.gz \
 -machine virt,gic-version=3 -nographic \
 -cpu host -enable-kvm -m 4096 \
 -object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on \
 -numa node,memdev=mem \
 -mem-prealloc \
 -smp cores=4,sockets=1 \
 -monitor unix:/tmp/vm2_monitor.sock,server,nowait \
 -net user,hostfwd=tcp:127.0.0.1:6004-:22 \
 -chardev socket,id=char0,path=/home/humin/virtio_test/vhost-net0 \
 -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
 -device virtio-net-pci,mac=52:54:00:00:00:01,netdev=mynet1,mrg_rxbuf=on,csum=on,guest_csum=on,guest_ecn=on \
 -vnc :15

taskset -c 15 qemu-system-aarch64 \
 -name us-vhost-vm2 \
 -kernel /home/humin/virtio_test/Image \
 -initrd /home/humin/virtio_test/rootfs.cpio.gz \
 -machine virt,gic-version=3 -nographic \
 -cpu host -enable-kvm -m 4096 \
 -object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on \
 -numa node,memdev=mem \
 -mem-prealloc \
 -smp cores=4,sockets=1 \
 -monitor unix:/tmp/vm2_monitor.sock,server,nowait \
 -net user,hostfwd=tcp:127.0.0.1:6005-:22 \
 -chardev socket,id=char1,path=/home/humin/virtio_test/vhost-net1 \
 -netdev type=vhost-user,id=mynet2,chardev=char1,vhostforce \
 -device virtio-net-pci,mac=52:54:00:00:00:02,netdev=mynet2,mrg_rxbuf=on,csum=on,guest_csum=on,guest_ecn=on \
 -vnc :16

3. Do the test.

In vm1:
ifconfig eth0 up
ifconfig eth0 1.1.1.2
arp -s 1.1.1.8 52:54:00:00:00:02

In vm2:
ifconfig eth0 up
ifconfig eth0 1.1.1.8
arp -s 1.1.1.2 52:54:00:00:00:01

Then in vm1: ping 1.1.1.8. The result: no ICMP reply, ping failed. Same for vm2.

4. Try to debug. Using tcpdump to capture packets in vm2, we get only the ICMP request packet from vm1, but no ICMP reply. It looks like the ping task does not send an ICMP reply, but that is just a guess.

BTW, I also ran iperf tests.

UDP packet test:
vm1: iperf -s -i 1
vm2: iperf -c 1.1.1.2 -u -i 1 -t 60
Then vm1 can get the packets. Same result if vm1 is used as the client.

TCP packet test:
vm1: iperf -s -i 1
vm2: iperf -c 1.1.1.2 -i 1 -t 60
Then vm1 can NOT get the packets. Same result if vm1 is used as the client.

5. Reference: I referred to https://doc.dpdk.org/dts/test_plans/vm2vm_virtio_net_perf_test_plan.html. In section 199.7, Test Case 5, the scp test is expected to succeed, but my scp test also failed.

Could anyone help me? Thanks very much.
Re: [dpdk-dev] [PATCH v1] ethdev: introduce shared Rx queue
On Wed, Sep 29, 2021 at 12:46:51PM +0100, Ananyev, Konstantin wrote:
> > -----Original Message-----
> > From: Richardson, Bruce
> > Sent: Wednesday, September 29, 2021 12:08 PM
> > To: Ananyev, Konstantin
> > Cc: Xueming(Steven) Li ; jerinjac...@gmail.com;
> > NBU-Contact-Thomas Monjalon ;
> > andrew.rybche...@oktetlabs.ru; dev@dpdk.org; Yigit, Ferruh
> > Subject: Re: [dpdk-dev] [PATCH v1] ethdev: introduce shared Rx queue
> >
> > On Wed, Sep 29, 2021 at 09:52:20AM +, Ananyev, Konstantin wrote:
> > >
> > > > -----Original Message-----
> > > > From: Xueming(Steven) Li
> > > > Sent: Wednesday, September 29, 2021 10:13 AM
> > > >
> > > > > > + /* Locate real source fs according to mbuf->port. */
> > > > > > + for (i = 0; i < nb_rx; ++i) {
> > > > > > + rte_prefetch0(pkts_burst[i + 1]);
> > > > >
> > > > > you access pkt_burst[] beyond array boundaries,
> > > > > also you ask cpu to prefetch some unknown and possibly invalid
> > > > > address.
> > > >
> > > > Sorry I forgot this topic. It's too late to prefetch the current packet, so
> > > > prefetching the next is better. Prefetching an invalid address at the end of a loop
> > > > doesn't hurt, it's common in DPDK.
> > >
> > > First of all it is usually never 'OK' to access an array beyond its bounds.
> > > Second, prefetching an invalid address *does* hurt performance badly on many
> > > CPUs (TLB misses, consumed memory bandwidth etc.).
> > > As a reference: https://lwn.net/Articles/444346/
> > > If some existing DPDK code really does that - then I believe it is an
> > > issue and has to be addressed.
> > > More important - it is really bad attitude to submit bogus code to the DPDK
> > > community and pretend that it is 'OK'.
> > >
> >
> > The main point we need to take from all this is that when
> > prefetching you need to measure the perf impact of it.
> >
> > In terms of the specific case of prefetching one past the end of the array,
> > I would take the view that this is harmless in almost all cases.
> > Unlike any prefetch of "NULL" as in the referenced mail, reading one past
> > the end (or some other small number of elements past the end) is far less
> > likely to cause a TLB miss - and it's basically just reproducing behaviour
> > we would expect off a HW prefetcher (though those may explicitly never
> > cross page boundaries). However, if you feel it's just cleaner to put in
> > an additional condition to remove the prefetch for the end case, that's ok
> > also - again so long as it doesn't affect performance. [Since prefetch is
> > a hint, I'm not sure if compilers or CPUs may be legally allowed to skip
> > the branch and blindly prefetch in all cases?]
>
> Please look at the code.
> It doesn't prefetch the next element beyond array boundaries.
> It first reads an address from the element that is beyond array boundaries
> (which is a bug by itself).
> Then it prefetches that bogus address.
> We simply don't know whether this address is valid and where it points to.
>
> In other words, it doesn't do:
> rte_prefetch0(&pkts_burst[i + 1]);
>
> It does:
> rte_prefetch0(pkts_burst[i + 1]);

Apologies, yes, you are right, and that is a bug.

/Bruce
Re: [dpdk-dev] [PATCH 00/19] MLX5 FreeBSD support
27/09/2021 15:34, Srikanth Kaka:
> This patch series adds support for MLX5 PMD on FreeBSD
>
>  drivers/common/mlx5/freebsd/meson.build       |  189 ++
>  drivers/common/mlx5/freebsd/mlx5_common_os.c  |  387 +++
>  drivers/common/mlx5/freebsd/mlx5_common_os.h  |  304 ++
>  .../common/mlx5/freebsd/mlx5_common_verbs.c   |   90 +
>  drivers/common/mlx5/freebsd/mlx5_glue.c       | 1505 ++
>  drivers/common/mlx5/freebsd/mlx5_glue.h       |  374 +++
>  drivers/common/mlx5/freebsd/mlx5_inet.c       |  306 ++
>  drivers/common/mlx5/freebsd/mlx5_inet.h       |   75 +
>  drivers/common/mlx5/meson.build               |   12 +-
>  drivers/net/mlx5/freebsd/meson.build          |   14 +
>  drivers/net/mlx5/freebsd/mlx5_ethdev_os.c     | 1187
>  drivers/net/mlx5/freebsd/mlx5_flow_os.c       |   38 +
>  drivers/net/mlx5/freebsd/mlx5_flow_os.h       |  484 +++
>  drivers/net/mlx5/freebsd/mlx5_mp_os.c         |  305 ++
>  drivers/net/mlx5/freebsd/mlx5_os.c            | 2600 +
>  drivers/net/mlx5/freebsd/mlx5_os.h            |   22 +
>  drivers/net/mlx5/freebsd/mlx5_socket.c        |  249 ++
>  drivers/net/mlx5/freebsd/mlx5_verbs.c         | 1208
>  drivers/net/mlx5/freebsd/mlx5_verbs.h         |   18 +
>  drivers/net/mlx5/freebsd/mlx5_vlan_os.c       |   84 +
>  drivers/net/mlx5/meson.build                  |   14 +-
>  21 files changed, 9458 insertions(+), 7 deletions(-)
>  create mode 100644 drivers/common/mlx5/freebsd/meson.build
>  create mode 100644 drivers/common/mlx5/freebsd/mlx5_common_os.c
>  create mode 100644 drivers/common/mlx5/freebsd/mlx5_common_os.h
>  create mode 100644 drivers/common/mlx5/freebsd/mlx5_common_verbs.c
>  create mode 100644 drivers/common/mlx5/freebsd/mlx5_glue.c
>  create mode 100644 drivers/common/mlx5/freebsd/mlx5_glue.h
>  create mode 100644 drivers/common/mlx5/freebsd/mlx5_inet.c
>  create mode 100644 drivers/common/mlx5/freebsd/mlx5_inet.h
>  create mode 100644 drivers/net/mlx5/freebsd/meson.build
>  create mode 100644 drivers/net/mlx5/freebsd/mlx5_ethdev_os.c
>  create mode 100644 drivers/net/mlx5/freebsd/mlx5_flow_os.c
>  create mode 100644 drivers/net/mlx5/freebsd/mlx5_flow_os.h
>  create mode 100644 drivers/net/mlx5/freebsd/mlx5_mp_os.c
>  create mode 100644 drivers/net/mlx5/freebsd/mlx5_os.c
>  create mode 100644 drivers/net/mlx5/freebsd/mlx5_os.h
>  create mode 100644 drivers/net/mlx5/freebsd/mlx5_socket.c
>  create mode 100644 drivers/net/mlx5/freebsd/mlx5_verbs.c
>  create mode 100644 drivers/net/mlx5/freebsd/mlx5_verbs.h
>  create mode 100644 drivers/net/mlx5/freebsd/mlx5_vlan_os.c

That's a lot of new code, thanks for the effort. Please could you summarize which features are supported, what are the dependencies, and how is it tested?
Re: [dpdk-dev] [RFC 3/3] app/testpmd: fix hex string parser in flow commands
Hi Ovsiienko,

Can you please provide the flow command which causes the "segmentation fault"?

Thanks,
Aman
Re: [dpdk-dev] [PATCH v2] telemetry: fix "in-memory" process socket conflicts
Hi Bruce,

On 24/09/2021 17:18, Bruce Richardson wrote:

When DPDK is run with --in-memory mode, multiple processes can run simultaneously using the same runtime dir. This leads to each process removing another process' telemetry socket as it started up, giving unexpected behaviour.

This patch changes that behaviour to first check if the existing socket is active. If not, it's an old socket to be cleaned up and can be removed. If it is active, telemetry initialization fails and an error message is printed out giving instructions on how to remove the error; either by using file-prefix to have a different runtime dir (and therefore socket path) or by disabling telemetry if it is not needed.

Telemetry is enabled by default but it may not be used by the application. Hitting this issue will cause rte_eal_init() to fail, which will probably stop or severely limit the application. So it could change a working application to a non-working one (albeit one that doesn't interfere with other processes' sockets).

Can it just print a warning that telemetry will not be enabled and continue, so it's not returning an rte_eal_init failure?

A more minor thing: I see it changes the behaviour from "last one runs with telemetry" to "first one runs with telemetry". Though it can be figured out from the commit message, it might be worth calling that change out explicitly.

thanks,
Kevin.
Fixes: 6dd571fd07c3 ("telemetry: introduce new functionality")
Cc: sta...@dpdk.org

Reported-by: David Marchand
Signed-off-by: Bruce Richardson
---
v2: fix build error on FreeBSD
---
 lib/telemetry/telemetry.c | 25 -
 1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/lib/telemetry/telemetry.c b/lib/telemetry/telemetry.c
index 8304fbf6e9..78508c1a1d 100644
--- a/lib/telemetry/telemetry.c
+++ b/lib/telemetry/telemetry.c
@@ -457,15 +457,30 @@ create_socket(char *path)
 	struct sockaddr_un sun = {.sun_family = AF_UNIX};
 	strlcpy(sun.sun_path, path, sizeof(sun.sun_path));
-	unlink(sun.sun_path);
+
 	if (bind(sock, (void *) &sun, sizeof(sun)) < 0) {
 		struct stat st;
-		TMTY_LOG(ERR, "Error binding socket: %s\n", strerror(errno));
-		if (stat(socket_dir, &st) < 0 || !S_ISDIR(st.st_mode))
+		/* first check if we have a runtime dir */
+		if (stat(socket_dir, &st) < 0 || !S_ISDIR(st.st_mode)) {
 			TMTY_LOG(ERR, "Cannot access DPDK runtime directory: %s\n", socket_dir);
-		sun.sun_path[0] = 0;
-		goto error;
+			goto error;
+		}
+
+		/* check if current socket is active */
+		if (connect(sock, (void *)&sun, sizeof(sun)) == 0) {
+			TMTY_LOG(ERR, "Error binding telemetry socket, path already in use\n");
+			TMTY_LOG(ERR, "Use '--file-prefix' to select a different socket path, or '--no-telemetry' to disable\n");
+			path[0] = 0;
+			goto error;
+		}
+
+		/* socket is not active, delete and attempt rebind */
+		unlink(sun.sun_path);
+		if (bind(sock, (void *) &sun, sizeof(sun)) < 0) {
+			TMTY_LOG(ERR, "Error binding socket: %s\n", strerror(errno));
+			goto error;
+		}
 	}
 	if (listen(sock, 1) < 0) {
Re: [dpdk-dev] [PATCH v1] ethdev: introduce shared Rx queue
> -Original Message- > From: Xueming(Steven) Li > Sent: Wednesday, September 29, 2021 1:09 PM > To: jerinjac...@gmail.com; Ananyev, Konstantin > Cc: NBU-Contact-Thomas Monjalon ; > andrew.rybche...@oktetlabs.ru; dev@dpdk.org; Yigit, Ferruh > > Subject: Re: [dpdk-dev] [PATCH v1] ethdev: introduce shared Rx queue> > On Wed, 2021-09-29 at 09:52 +, Ananyev, Konstantin wrote: > > > > > -Original Message- > > > From: Xueming(Steven) Li > > > Sent: Wednesday, September 29, 2021 10:13 AM > > > To: jerinjac...@gmail.com; Ananyev, Konstantin > > > > > > Cc: NBU-Contact-Thomas Monjalon ; > > > andrew.rybche...@oktetlabs.ru; dev@dpdk.org; Yigit, Ferruh > > > > > > Subject: Re: [dpdk-dev] [PATCH v1] ethdev: introduce shared Rx queue > > > > > > On Wed, 2021-09-29 at 00:26 +, Ananyev, Konstantin wrote: > > > > > > > > > > > > > > > > > > In current DPDK framework, each RX queue > > > > > > > > > > > > > > > > > > is > > > > > > > > > > > > > > > > > > pre-loaded with mbufs > > > > > > > > > > > > > > > > > > for incoming packets. When number of > > > > > > > > > > > > > > > > > > representors scale out in a > > > > > > > > > > > > > > > > > > switch domain, the memory consumption > > > > > > > > > > > > > > > > > > became > > > > > > > > > > > > > > > > > > significant. Most > > > > > > > > > > > > > > > > > > important, polling all ports leads to > > > > > > > > > > > > > > > > > > high > > > > > > > > > > > > > > > > > > cache miss, high > > > > > > > > > > > > > > > > > > latency and low throughput. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > This patch introduces shared RX queue. > > > > > > > > > > > > > > > > > > Ports > > > > > > > > > > > > > > > > > > with same > > > > > > > > > > > > > > > > > > configuration in a switch domain could > > > > > > > > > > > > > > > > > > share > > > > > > > > > > > > > > > > > > RX queue set by specifying sharing group. 
> > > > > > > > > > > > > > > > > > Polling any queue using same shared RX > > > > > > > > > > > > > > > > > > queue > > > > > > > > > > > > > > > > > > receives packets from > > > > > > > > > > > > > > > > > > all member ports. Source port is > > > > > > > > > > > > > > > > > > identified > > > > > > > > > > > > > > > > > > by mbuf->port. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Port queue number in a shared group > > > > > > > > > > > > > > > > > > should be > > > > > > > > > > > > > > > > > > identical. Queue > > > > > > > > > > > > > > > > > > index is > > > > > > > > > > > > > > > > > > 1:1 mapped in shared group. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Share RX queue is supposed to be polled > > > > > > > > > > > > > > > > > > on > > > > > > > > > > > > > > > > > > same thread. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Multiple groups is supported by group ID. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Is this offload specific to the > > > > > > > > > > > > > > > > > representor? If > > > > > > > > > > > > > > > > > so can this name be changed specifically to > > > > > > > > > > > > > > > > > representor? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Yes, PF and representor in switch domain > > > > > > > > > > > > > > > > could > > > > > > > > > > > > > > > > take advantage. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > If it is for a generic case, how the flow > > > > > > > > > > > > > > > > > ordering will be maintained? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Not quite sure that I understood your > > > > > > > > > > > > > > > > question. > > > > > > > > > > > > > > > > The control path of is > > > > > > > > > > > > > > > > almost same as before, PF and representor > > > > > > > > > > > > > > > > port > > > > > > > > > > > > > > > > still needed, rte flows not impacted. 
> > > > > > > > > > > > > > > > Queues still needed for each member port, > > > > > > > > > > > > > > > > descriptors(mbuf) will be > > > > > > > > > > > > > > > > supplied from shared Rx queue in my PMD > > > > > > > > > > > > > > > > implementation. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > My question was if create a generic > > > > > > > > > > > > > > > RTE_ETH_RX_OFFLOAD_SHARED_RXQ offload, multiple > > > > > > > > > > > > > > > ethdev receive queues land into > > > > > > > the same > > > > > > > > > > > > > > > receive queue, In that case, how the flow order > > > > > > > > > > > > > > > is > > > > > > > > > > > > > > > maintained for respective receive queues. > > > > > > > > > > > > > > > > > > > > > > > > > > > > I guess the question is testpmd forward stream? > > > > > > > > > > > > > > The > > > > > > > > > > > > > > forwarding logic has to be changed slightly in > > > > > > > > > > > > > > case > > > > > > > > > > > > > > of shared rxq. > > > > > > > > > > > > > > basically for each packet in rx_burst result, > > > > > >
[dpdk-dev] [PATCH 1/2] net/i40e: fix generic build on FreeBSD
The common header file for vectorization is included in multiple files, and so must use macros for the current compilation unit, rather than the compiler-capability flag set for the whole driver. With the current, incorrect, macro, the AVX512 or AVX2 flags may be set when compiling up SSE code, leading to compilation errors. Changing from "CC_AVX*_SUPPORT" to the compiler-defined "__AVX*__" macros fixes this issue. In addition, the AVX-specific code is split into the new i40e_rxtx_common_avx.h header file to avoid such bugs.

Bugzilla ID: 788
Fixes: 0604b1f2208f ("net/i40e: fix crash in AVX512")
Cc: wenzhuo...@intel.com
Cc: sta...@dpdk.org

Signed-off-by: Leyi Rong
Signed-off-by: Bruce Richardson
---
 drivers/net/i40e/i40e_rxtx_common_avx.h | 214
 drivers/net/i40e/i40e_rxtx_vec_common.h | 200 +-
 2 files changed, 218 insertions(+), 196 deletions(-)
 create mode 100644 drivers/net/i40e/i40e_rxtx_common_avx.h

diff --git a/drivers/net/i40e/i40e_rxtx_common_avx.h b/drivers/net/i40e/i40e_rxtx_common_avx.h
new file mode 100644
index 00..cfc1e63173
--- /dev/null
+++ b/drivers/net/i40e/i40e_rxtx_common_avx.h
@@ -0,0 +1,214 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2015 Intel Corporation
+ */
+
+#ifndef _I40E_RXTX_COMMON_AVX_H_
+#define _I40E_RXTX_COMMON_AVX_H_
+#include
+#include
+#include
+
+#include "i40e_ethdev.h"
+#include "i40e_rxtx.h"
+
+#ifndef __INTEL_COMPILER
+#pragma GCC diagnostic ignored "-Wcast-qual"
+#endif
+
+#ifdef __AVX2__
+static __rte_always_inline void
+i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512)
+{
+	int i;
+	uint16_t rx_id;
+	volatile union i40e_rx_desc *rxdp;
+	struct i40e_rx_entry *rxep = &rxq->sw_ring[rxq->rxrearm_start];
+
+	rxdp = rxq->rx_ring + rxq->rxrearm_start;
+
+	/* Pull 'n' more MBUFs into the software ring */
+	if (rte_mempool_get_bulk(rxq->mp,
+				 (void *)rxep,
+				 RTE_I40E_RXQ_REARM_THRESH) < 0) {
+		if (rxq->rxrearm_nb + RTE_I40E_RXQ_REARM_THRESH >=
+		    rxq->nb_rx_desc) {
+			__m128i dma_addr0;
+
+			dma_addr0 = _mm_setzero_si128();
+			for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) {
+				rxep[i].mbuf = &rxq->fake_mbuf;
+				_mm_store_si128((__m128i *)&rxdp[i].read,
+						dma_addr0);
+			}
+		}
+		rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed +=
+			RTE_I40E_RXQ_REARM_THRESH;
+		return;
+	}
+
+#ifndef RTE_LIBRTE_I40E_16BYTE_RX_DESC
+	struct rte_mbuf *mb0, *mb1;
+	__m128i dma_addr0, dma_addr1;
+	__m128i hdr_room = _mm_set_epi64x(RTE_PKTMBUF_HEADROOM,
+			RTE_PKTMBUF_HEADROOM);
+	/* Initialize the mbufs in vector, process 2 mbufs in one loop */
+	for (i = 0; i < RTE_I40E_RXQ_REARM_THRESH; i += 2, rxep += 2) {
+		__m128i vaddr0, vaddr1;
+
+		mb0 = rxep[0].mbuf;
+		mb1 = rxep[1].mbuf;
+
+		/* load buf_addr(lo 64bit) and buf_iova(hi 64bit) */
+		RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, buf_iova) !=
+				offsetof(struct rte_mbuf, buf_addr) + 8);
+		vaddr0 = _mm_loadu_si128((__m128i *)&mb0->buf_addr);
+		vaddr1 = _mm_loadu_si128((__m128i *)&mb1->buf_addr);
+
+		/* convert pa to dma_addr hdr/data */
+		dma_addr0 = _mm_unpackhi_epi64(vaddr0, vaddr0);
+		dma_addr1 = _mm_unpackhi_epi64(vaddr1, vaddr1);
+
+		/* add headroom to pa values */
+		dma_addr0 = _mm_add_epi64(dma_addr0, hdr_room);
+		dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room);
+
+		/* flush desc with pa dma_addr */
+		_mm_store_si128((__m128i *)&rxdp++->read, dma_addr0);
+		_mm_store_si128((__m128i *)&rxdp++->read, dma_addr1);
+	}
+#else
+#ifdef __AVX512VL__
+	if (avx512) {
+		struct rte_mbuf *mb0, *mb1, *mb2, *mb3;
+		struct rte_mbuf *mb4, *mb5, *mb6, *mb7;
+		__m512i dma_addr0_3, dma_addr4_7;
+		__m512i hdr_room = _mm512_set1_epi64(RTE_PKTMBUF_HEADROOM);
+		/* Initialize the mbufs in vector, process 8 mbufs in one loop */
+		for (i = 0; i < RTE_I40E_RXQ_REARM_THRESH;
+				i += 8, rxep += 8, rxdp += 8) {
+			__m128i vaddr0, vaddr1, vaddr2, vaddr3;
+			__m128i vaddr4, vaddr5, vaddr6, vaddr7;
+			__m256i vaddr0_1, vaddr2_3;
+			__m256i vaddr4_5, vaddr6_7;
+			__m512i vaddr0_3, vaddr4_7;
+
+			mb0 = rxep[0].mbuf;
+
[dpdk-dev] [PATCH 2/2] net/ice: fix generic build on FreeBSD
The common header file for vectorization is included in multiple files, and so must use macros for the current compilation unit, rather than the compiler-capability flag set for the whole driver. With the current, incorrect, macro, the AVX512 or AVX2 flags may be set when compiling up SSE code, leading to compilation errors. Changing from "CC_AVX*_SUPPORT" to the compiler-defined "__AVX*__" macros fixes this issue. In addition, the AVX-specific code is split into the new ice_rxtx_common_avx.h header file to avoid such bugs.

Bugzilla ID: 788
Fixes: a4e480de268e ("net/ice: optimize Tx by using AVX512")
Fixes: 20daa1c978b7 ("net/ice: fix crash in AVX512")
Cc: wenzhuo...@intel.com
Cc: leyi.r...@intel.com
Cc: sta...@dpdk.org

Signed-off-by: Leyi Rong
Signed-off-by: Bruce Richardson
---
 drivers/net/ice/ice_rxtx_common_avx.h | 213 ++
 drivers/net/ice/ice_rxtx_vec_common.h | 205 +
 2 files changed, 218 insertions(+), 200 deletions(-)
 create mode 100644 drivers/net/ice/ice_rxtx_common_avx.h

diff --git a/drivers/net/ice/ice_rxtx_common_avx.h b/drivers/net/ice/ice_rxtx_common_avx.h
new file mode 100644
index 00..81e0db5dd3
--- /dev/null
+++ b/drivers/net/ice/ice_rxtx_common_avx.h
@@ -0,0 +1,213 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Intel Corporation
+ */
+
+#ifndef _ICE_RXTX_COMMON_AVX_H_
+#define _ICE_RXTX_COMMON_AVX_H_
+
+#include "ice_rxtx.h"
+
+#ifndef __INTEL_COMPILER
+#pragma GCC diagnostic ignored "-Wcast-qual"
+#endif
+
+#ifdef __AVX2__
+static __rte_always_inline void
+ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512)
+{
+	int i;
+	uint16_t rx_id;
+	volatile union ice_rx_flex_desc *rxdp;
+	struct ice_rx_entry *rxep = &rxq->sw_ring[rxq->rxrearm_start];
+
+	rxdp = rxq->rx_ring + rxq->rxrearm_start;
+
+	/* Pull 'n' more MBUFs into the software ring */
+	if (rte_mempool_get_bulk(rxq->mp,
+				 (void *)rxep,
+				 ICE_RXQ_REARM_THRESH) < 0) {
+		if (rxq->rxrearm_nb + ICE_RXQ_REARM_THRESH >=
+		    rxq->nb_rx_desc) {
+			__m128i dma_addr0;
+
+			dma_addr0 = _mm_setzero_si128();
+			for (i = 0; i < ICE_DESCS_PER_LOOP; i++) {
+				rxep[i].mbuf = &rxq->fake_mbuf;
+				_mm_store_si128((__m128i *)&rxdp[i].read,
+						dma_addr0);
+			}
+		}
+		rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed +=
+			ICE_RXQ_REARM_THRESH;
+		return;
+	}
+
+#ifndef RTE_LIBRTE_ICE_16BYTE_RX_DESC
+	struct rte_mbuf *mb0, *mb1;
+	__m128i dma_addr0, dma_addr1;
+	__m128i hdr_room = _mm_set_epi64x(RTE_PKTMBUF_HEADROOM,
+			RTE_PKTMBUF_HEADROOM);
+	/* Initialize the mbufs in vector, process 2 mbufs in one loop */
+	for (i = 0; i < ICE_RXQ_REARM_THRESH; i += 2, rxep += 2) {
+		__m128i vaddr0, vaddr1;
+
+		mb0 = rxep[0].mbuf;
+		mb1 = rxep[1].mbuf;
+
+		/* load buf_addr(lo 64bit) and buf_iova(hi 64bit) */
+		RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, buf_iova) !=
+				offsetof(struct rte_mbuf, buf_addr) + 8);
+		vaddr0 = _mm_loadu_si128((__m128i *)&mb0->buf_addr);
+		vaddr1 = _mm_loadu_si128((__m128i *)&mb1->buf_addr);
+
+		/* convert pa to dma_addr hdr/data */
+		dma_addr0 = _mm_unpackhi_epi64(vaddr0, vaddr0);
+		dma_addr1 = _mm_unpackhi_epi64(vaddr1, vaddr1);
+
+		/* add headroom to pa values */
+		dma_addr0 = _mm_add_epi64(dma_addr0, hdr_room);
+		dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room);
+
+		/* flush desc with pa dma_addr */
+		_mm_store_si128((__m128i *)&rxdp++->read, dma_addr0);
+		_mm_store_si128((__m128i *)&rxdp++->read, dma_addr1);
+	}
+#else
+#ifdef __AVX512VL__
+	if (avx512) {
+		struct rte_mbuf *mb0, *mb1, *mb2, *mb3;
+		struct rte_mbuf *mb4, *mb5, *mb6, *mb7;
+		__m512i dma_addr0_3, dma_addr4_7;
+		__m512i hdr_room = _mm512_set1_epi64(RTE_PKTMBUF_HEADROOM);
+		/* Initialize the mbufs in vector, process 8 mbufs in one loop */
+		for (i = 0; i < ICE_RXQ_REARM_THRESH;
+				i += 8, rxep += 8, rxdp += 8) {
+			__m128i vaddr0, vaddr1, vaddr2, vaddr3;
+			__m128i vaddr4, vaddr5, vaddr6, vaddr7;
+			__m256i vaddr0_1, vaddr2_3;
+			__m256i vaddr4_5, vaddr6_7;
+			__m512i vaddr0_3, vaddr4_7;
+
+			mb0 = rxep[0].mbuf;
+			mb1 = rxep[1
Re: [dpdk-dev] [PATCH 00/27] net/cnxk: support for inline ipsec
On Thu, Sep 2, 2021 at 7:46 AM Nithin Dabilpuram wrote:
>
> Support for inline ipsec in CN9K event mode and in CN10K event mode and
> poll mode.
>
> Depends-on: series-18524 ("Crypto adapter support for Marvell CNXK driver")
> Depends-on: series-18262 ("security: Improve inline fast path routines")
> Depends-on: series-18562 ("add lookaside IPsec additional features")

Now that these patches are merged to main, please rebase on main. Also, update the release notes for cnxk ethdev for this feature.
Re: [dpdk-dev] [PATCH v2] net/virtio: fix virtio-user init when using existing tap
On 9/28/2021 9:51 AM, David Marchand wrote:
> When attaching to an existing mono queue tap, the virtio-user was not
> reporting that the virtio device was not properly initialised which
> prevented from starting the port later.
>
> $ ip tuntap add test mode tap
> $ dpdk-testpmd --vdev \
>   net_virtio_user0,iface=test,path=/dev/vhost-net,queues=2 -- -i
>
> ...
> virtio_user_dev_init_mac(): (/dev/vhost-net) No valid MAC in devargs or
> device, use random
> vhost_kernel_open_tap(): TUNSETIFF failed: Invalid argument
> vhost_kernel_enable_queue_pair(): fail to open tap for vhost kernel
> virtio_user_start_device(): (/dev/vhost-net) Failed to start device
> ...
> Configuring Port 0 (socket 0)
> vhost_kernel_open_tap(): TUNSETIFF failed: Invalid argument
> vhost_kernel_enable_queue_pair(): fail to open tap for vhost kernel
> virtio_set_multiple_queues(): Multiqueue configured but send command
> failed, this is too late now...
> Fail to start port 0: Invalid argument
> Please stop the ports first
> Done
>
> The virtio-user with vhost-kernel backend was going through a lot
> of complications to initialise tap fds only when using them.
>
> For each qp enabled for the first time, a tapfd was created via
> TUNSETIFF with unneeded additional steps (see below) and then mapped to
> the right qp in the vhost-net backend.
> Unneeded steps (as long as it has been done once for the port):
> - tap features were queried while this is a constant on a running
>   system,
> - the device name in DPDK was updated,
> - the mac address of the tap was set.
>
> On subsequent qps state change, the vhost-net backend fd mapping was
> updated and the associated queue/tapfd were disabled/enabled via
> TUNSETQUEUE.
>
> Now, this patch simplifies the whole logic by keeping all tapfds opened
> and in enabled state (from the tap point of view) at all time.
>
> Unused ioctl defines are removed.
>
> Tap features are validated earlier to fail initialisation asap.
>
> Tap name discovery and mac address configuration are moved when
> configuring qp 0.
>
> To support attaching to mono queue tap, the virtio-user driver now tries
> to attach in multi queue first, then falls back to mono queue.
>
> Finally (but this is more for consistency), VIRTIO_NET_F_MQ feature is
> exposed only if the underlying tap supports multi queue.
>
> Signed-off-by: David Marchand

Do we want to backport this patch? If so can you please provide a fixes tag?
Re: [dpdk-dev] [PATCH v1] ethdev: introduce shared Rx queue
On Wed, 2021-09-29 at 10:20 +, Ananyev, Konstantin wrote: > > > > > > > > > > > > > > > > > In current DPDK framework, each RX > > > > > > > > > > > > > > > > > queue > > > > > > > > > > > > > > > > > is > > > > > > > > > > > > > > > > > pre-loaded with mbufs > > > > > > > > > > > > > > > > > for incoming packets. When number of > > > > > > > > > > > > > > > > > representors scale out in a > > > > > > > > > > > > > > > > > switch domain, the memory consumption > > > > > > > > > > > > > > > > > became > > > > > > > > > > > > > > > > > significant. Most > > > > > > > > > > > > > > > > > important, polling all ports leads to > > > > > > > > > > > > > > > > > high > > > > > > > > > > > > > > > > > cache miss, high > > > > > > > > > > > > > > > > > latency and low throughput. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > This patch introduces shared RX > > > > > > > > > > > > > > > > > queue. > > > > > > > > > > > > > > > > > Ports > > > > > > > > > > > > > > > > > with same > > > > > > > > > > > > > > > > > configuration in a switch domain > > > > > > > > > > > > > > > > > could > > > > > > > > > > > > > > > > > share > > > > > > > > > > > > > > > > > RX queue set by specifying sharing > > > > > > > > > > > > > > > > > group. > > > > > > > > > > > > > > > > > Polling any queue using same shared > > > > > > > > > > > > > > > > > RX > > > > > > > > > > > > > > > > > queue > > > > > > > > > > > > > > > > > receives packets from > > > > > > > > > > > > > > > > > all member ports. Source port is > > > > > > > > > > > > > > > > > identified > > > > > > > > > > > > > > > > > by mbuf->port. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Port queue number in a shared group > > > > > > > > > > > > > > > > > should be > > > > > > > > > > > > > > > > > identical. Queue > > > > > > > > > > > > > > > > > index is > > > > > > > > > > > > > > > > > 1:1 mapped in shared group. 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Share RX queue is supposed to be > > > > > > > > > > > > > > > > > polled > > > > > > > > > > > > > > > > > on > > > > > > > > > > > > > > > > > same thread. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Multiple groups is supported by group > > > > > > > > > > > > > > > > > ID. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Is this offload specific to the > > > > > > > > > > > > > > > > representor? If > > > > > > > > > > > > > > > > so can this name be changed > > > > > > > > > > > > > > > > specifically to > > > > > > > > > > > > > > > > representor? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Yes, PF and representor in switch domain > > > > > > > > > > > > > > > could > > > > > > > > > > > > > > > take advantage. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > If it is for a generic case, how the > > > > > > > > > > > > > > > > flow > > > > > > > > > > > > > > > > ordering will be maintained? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Not quite sure that I understood your > > > > > > > > > > > > > > > question. > > > > > > > > > > > > > > > The control path of is > > > > > > > > > > > > > > > almost same as before, PF and representor > > > > > > > > > > > > > > > port > > > > > > > > > > > > > > > still needed, rte flows not impacted. > > > > > > > > > > > > > > > Queues still needed for each member port, > > > > > > > > > > > > > > > descriptors(mbuf) will be > > > > > > > > > > > > > > > supplied from shared Rx queue in my PMD > > > > > > > > > > > > > > > implementation. 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > My question was if create a generic > > > > > > > > > > > > > > RTE_ETH_RX_OFFLOAD_SHARED_RXQ offload, > > > > > > > > > > > > > > multiple > > > > > > > > > > > > > > ethdev receive queues land into > > > > > > the same > > > > > > > > > > > > > > receive queue, In that case, how the flow > > > > > > > > > > > > > > order > > > > > > > > > > > > > > is > > > > > > > > > > > > > > maintained for respective receive queues. > > > > > > > > > > > > > > > > > > > > > > > > > > I guess the question is testpmd forward > > > > > > > > > > > > > stream? > > > > > > > > > > > > > The > > > > > > > > > > > > > forwarding logic has to be changed slightly > > > > > > > > > > > > > in > > > > > > > > > > > > > case > > > > > > > > > > > > > of shared rxq. > > > > > > > > > > > > > basically for each packet in rx_burst result, > > > > > > > > > > > > > lookup > > > > > > > > > > > > > source stream according to mbuf->port, > > > > > > > > > > > > > forwarding > > > > > > > > > > > > > to > > > > > > > > > > > > > target fs. > > > > > > > > > > > > > Packets from same source port could be > > > > > > > > > > > > > grouped as > > > > > > > > > > > > > a > > > > > > > > > > > > > small burst to process, this will accelerates > > > > > > > > > > > > > the > > > > > > > > > > > > > performance if traffic > > > > > > come from > > > > > >
Re: [dpdk-dev] [PATCH v2] telemetry: fix "in-memory" process socket conflicts
On Wed, Sep 29, 2021 at 01:28:53PM +0100, Kevin Traynor wrote: > Hi Bruce, > > On 24/09/2021 17:18, Bruce Richardson wrote: > > When DPDK is run with --in-memory mode, multiple processes can run > > simultaneously using the same runtime dir. This leads to each process > > removing another process' telemetry socket as it started up, giving > > unexpected behaviour. > > > > This patch changes that behaviour to first check if the existing socket > > is active. If not, it's an old socket to be cleaned up and can be > > removed. If it is active, telemetry initialization fails and an error > > message is printed out giving instructions on how to remove the error; > > either by using file-prefix to have a different runtime dir (and > > therefore socket path) or by disabling telemetry if it not needed. > > > > telemetry is enabled by default but it may not be used by the application. > Hitting this issue will cause rte_eal_init() to fail which will probably > stop or severely limit the application. > > So it could change a working application to a non-working one (albeit one > that doesn't interfere with other process' sockets). > > Can it just print a warning that telemetry will not be enabled and continue > so it's not returning an rte_eal_init failure? > For a backported fix, yes, that would probably be better behaviour, but for the latest branch, I think returning error and having the user explicitly choose the resolution they want to occur is best. I'll see about doing a separate backport patch for 20.11. > A more minor thing, I see it changes the behaviour from, last one runs with > telemetry, to, first one runs with telemetry. Though it can be figured from > the commit message, it might be worth calling that change out explicitly. > Sure. I'll resubmit a new version of this without stable CC'ed and include that behaviour change explicitly in the commit log. /Bruce
Re: [dpdk-dev] [PATCH v2] telemetry: fix "in-memory" process socket conflicts
On Wed, Sep 29, 2021 at 02:32:02PM +0100, Bruce Richardson wrote: > On Wed, Sep 29, 2021 at 01:28:53PM +0100, Kevin Traynor wrote: > > Hi Bruce, > > > > On 24/09/2021 17:18, Bruce Richardson wrote: > > > When DPDK is run with --in-memory mode, multiple processes can run > > > simultaneously using the same runtime dir. This leads to each process > > > removing another process' telemetry socket as it started up, giving > > > unexpected behaviour. > > > > > > This patch changes that behaviour to first check if the existing socket > > > is active. If not, it's an old socket to be cleaned up and can be > > > removed. If it is active, telemetry initialization fails and an error > > > message is printed out giving instructions on how to remove the error; > > > either by using file-prefix to have a different runtime dir (and > > > therefore socket path) or by disabling telemetry if it not needed. > > > > > > > telemetry is enabled by default but it may not be used by the application. > > Hitting this issue will cause rte_eal_init() to fail which will probably > > stop or severely limit the application. > > > > So it could change a working application to a non-working one (albeit one > > that doesn't interfere with other process' sockets). > > > > Can it just print a warning that telemetry will not be enabled and continue > > so it's not returning an rte_eal_init failure? > > > > For a backported fix, yes, that would probably be better behaviour, but for > the latest branch, I think returning error and having the user explicitly > choose the resolution they want to occur is best. I'll see about doing a > separate backport patch for 20.11. > > > A more minor thing, I see it changes the behaviour from, last one runs with > > telemetry, to, first one runs with telemetry. Though it can be figured from > > the commit message, it might be worth calling that change out explicitly. > > > > Sure. 
> I'll resubmit a new version of this without stable CC'ed and include
> that behaviour change explicitly in the commit log.
>
Actually, the subtle behaviour change will be in the backport version that
doesn't error out, so I'll note it there when doing that patch, not in the
v3 of this one.

/Bruce
[dpdk-dev] [PATCH v3] telemetry: fix "in-memory" process socket conflicts
When DPDK is run with --in-memory mode, multiple processes can run
simultaneously using the same runtime dir. This leads to each process, as
it starts up, removing the telemetry socket of another process, giving
unexpected behaviour.

This patch changes that behaviour to first check if the existing socket is
active. If not, it's an old socket to be cleaned up and can be removed. If
it is active, telemetry initialization fails and an error message is
printed out giving instructions on how to resolve the error; either by
using file-prefix to have a different runtime dir (and therefore socket
path) or by disabling telemetry if it is not needed.

Fixes: 6dd571fd07c3 ("telemetry: introduce new functionality")

Reported-by: David Marchand
Signed-off-by: Bruce Richardson
Acked-by: Ciara Power

---
V3: Drop CC stable, as there will be a separate backport patch which does
    not error out, avoiding problems for currently running applications
V2: fix build error on FreeBSD
---
 lib/telemetry/telemetry.c | 25 +++++++++++++++++++-----
 1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/lib/telemetry/telemetry.c b/lib/telemetry/telemetry.c
index 8304fbf6e9..78508c1a1d 100644
--- a/lib/telemetry/telemetry.c
+++ b/lib/telemetry/telemetry.c
@@ -457,15 +457,30 @@ create_socket(char *path)
 	struct sockaddr_un sun = {.sun_family = AF_UNIX};
 
 	strlcpy(sun.sun_path, path, sizeof(sun.sun_path));
-	unlink(sun.sun_path);
+
 	if (bind(sock, (void *) &sun, sizeof(sun)) < 0) {
 		struct stat st;
 
-		TMTY_LOG(ERR, "Error binding socket: %s\n", strerror(errno));
-		if (stat(socket_dir, &st) < 0 || !S_ISDIR(st.st_mode))
+		/* first check if we have a runtime dir */
+		if (stat(socket_dir, &st) < 0 || !S_ISDIR(st.st_mode)) {
 			TMTY_LOG(ERR, "Cannot access DPDK runtime directory: %s\n", socket_dir);
-		sun.sun_path[0] = 0;
-		goto error;
+			goto error;
+		}
+
+		/* check if current socket is active */
+		if (connect(sock, (void *)&sun, sizeof(sun)) == 0) {
+			TMTY_LOG(ERR, "Error binding telemetry socket, path already in use\n");
+			TMTY_LOG(ERR, "Use '--file-prefix' to select a different socket path, or '--no-telemetry' to disable\n");
+			path[0] = 0;
+			goto error;
+		}
+
+		/* socket is not active, delete and attempt rebind */
+		unlink(sun.sun_path);
+		if (bind(sock, (void *) &sun, sizeof(sun)) < 0) {
+			TMTY_LOG(ERR, "Error binding socket: %s\n", strerror(errno));
+			goto error;
+		}
 	}
 	if (listen(sock, 1) < 0) {
-- 
2.30.2
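The core of the fix — probing an existing Unix socket with `connect()` to distinguish a live listener from a stale file before deciding to `unlink()` it — can be reproduced as a standalone POSIX sketch with no DPDK dependency. The function name is illustrative, not part of the telemetry library's API:

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Return 1 if some process is listening on the Unix socket at `path`,
 * 0 if the path is absent or the socket file is stale (no listener).
 * This mirrors the connect() probe the patch adds before unlink().
 */
static int
socket_is_active(const char *path)
{
    struct sockaddr_un sun;
    int sock, active;

    sock = socket(AF_UNIX, SOCK_SEQPACKET, 0);
    if (sock < 0)
        return 0;
    memset(&sun, 0, sizeof(sun));
    sun.sun_family = AF_UNIX;
    strncpy(sun.sun_path, path, sizeof(sun.sun_path) - 1);
    /* connect() succeeds only if somebody has the socket open and
     * listening; ENOENT or ECONNREFUSED means it is free or stale. */
    active = connect(sock, (struct sockaddr *)&sun, sizeof(sun)) == 0;
    close(sock);
    return active;
}
```

A caller would only `unlink()` and re-`bind()` when this returns 0, which is what turns "last process wins" into "first process wins" as discussed in the thread.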
Re: [dpdk-dev] [PATCH v5 2/2] ethdev: change queue release callback
On Wed, 2021-09-22 at 12:54 +, Xueming(Steven) Li wrote:
> On Wed, 2021-09-22 at 11:57 +0100, Ferruh Yigit wrote:
> >
> > <...>
> >
> > > > >  void
> > > > > -i40e_dev_rx_queue_release(void *rxq)
> > > > > +i40e_dev_rx_queue_release(struct rte_eth_dev *dev, uint16_t qid)
> > > > > +{
> > > > > +	i40e_rx_queue_release(dev->data->rx_queues[qid]);
> > > > > +}
> > > > > +
> > > > > +void
> > > > > +i40e_dev_tx_queue_release(struct rte_eth_dev *dev, uint16_t qid)
> > > > > +{
> > > > > +	i40e_tx_queue_release(dev->data->tx_queues[qid]);
> > > > > +}
> > > > > +
> > > >
> > > > Is there any specific reason to not update the driver but add
> > > > wrappers for it?
> > >
> > > Some callers don't have the queue ID on hand; adding a wrapper seems
> > > more convenient.
> >
> > Convenient for the patch, but not sure it is convenient for the driver.
> >
> > As mentioned before, not sure about the approach of updating some
> > drivers and adding wrappers for others.
> >
> > qede, ice and i40e seem not updated; I am for synchronizing with their
> > maintainers before proceeding.
>
> For qede, qede_tx_queue_release(txq_obj) is called by
> qede_alloc_tx_queue_mem(dev, qid), while the upper caller
> qede_tx_queue_setup() doesn't always save txq_obj to dev->data->txqs[].
>
> For ice and i40e it's similar: ice_tx_queue_release() is used to free a
> txq, but some txqs aren't saved into dev; please refer to
> ice_fdir_setup(). A wrapper is needed.
>
> These 3 PMDs create rxq/txq objects that are not saved in dev->data, so
> the parameter can't be changed to dev+qid for such cases; that's why the
> wrapper was there.

Hi Ferruh,

No response from qede, ice and i40e. Basically, the original queue release
API is shared by private queues (not registered to ethdev) that can't be
accessed by index; that is why a wrapper was there. To avoid more rebasing
at the last minute for this big patch, do you think we could close it?

BTW, based on the feedback from hns3, I will post a new version to add the
macro.
[dpdk-dev] [PATCH v4] net/af_packet: reinsert the stripped vlan tag
The af_packet pmd driver binds to a raw socket and allows sending and
receiving of packets through the kernel.

Since commit [1], the kernel strips the vlan tags early in
__netif_receive_skb_core(), so we receive untagged packets while running
with the af_packet pmd. Luckily for us, the skb vlan-related fields are
still populated from the stripped vlan tags, so we end up having all the
information that we need in the mbuf.

Having the pmd driver support DEV_RX_OFFLOAD_VLAN_STRIP allows the
application to control the desired vlan stripping behavior, until we have
a way to describe offloads that can't be disabled by pmd drivers.

This patch will cause a change in the default way that the af_packet pmd
treats received vlan-tagged frames. While previously, the application was
required to check the PKT_RX_VLAN_STRIPPED flag, after this patch, the pmd
will re-insert the vlan tag transparently to the user, unless the
DEV_RX_OFFLOAD_VLAN_STRIP is enabled in rxmode.offloads.

I've attempted a preliminary benchmark to understand if the change could
cause a sizable performance hit.

Setup:
  Two virtual machines running on top of an ESXi hypervisor
  Tx: DPDK app (running on top of vmxnet3 PMD)
  Rx: af_packet (running on top of a kernel vmxnet3 interface)
  Packet size: 68 (packet contains a vlan tag)

Rates:
  Tx: 1.419 Mpps
  Rx (without vlan insertion): 1227636 pps
  Rx (with vlan insertion):    1220081 pps

At a first glance, we don't seem to have a large degradation in terms of
packet rate.
[1] https://github.com/torvalds/linux/commit/bcc6d47903612c3861201cc3a866fb604f26b8b2

Signed-off-by: Tudor Cornea
---
v4:
* Updated the af_packet documentation
v3:
* Updated release note and documentation
* Updated commit with performance measurements
v2:
* Added DEV_RX_OFFLOAD_VLAN_STRIP to rxmode->offloads
---
 doc/guides/nics/af_packet.rst             |  5 +++++
 doc/guides/rel_notes/release_21_11.rst    |  4 ++++
 drivers/net/af_packet/rte_eth_af_packet.c | 12 ++++++++++++
 3 files changed, 21 insertions(+)

diff --git a/doc/guides/nics/af_packet.rst b/doc/guides/nics/af_packet.rst
index efd6f1c..c87310b 100644
--- a/doc/guides/nics/af_packet.rst
+++ b/doc/guides/nics/af_packet.rst
@@ -65,3 +65,8 @@ framecnt=512):
 .. code-block:: console
 
     --vdev=eth_af_packet0,iface=tap0,blocksz=4096,framesz=2048,framecnt=512,qpairs=1,qdisc_bypass=0
+
+Features and Limitations of the af_packet PMD
+---------------------------------------------
+
+Af_packet PMD now works with VLAN's on Linux
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index ad7c1af..095fd5b 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -66,6 +66,10 @@ New Features
 
   * Added rte_flow support for dual VLAN insert and strip actions.
 
+* **Updated af_packet ethdev driver.**
+
+  * Added DEV_RX_OFFLOAD_VLAN_STRIP capability.
+
 * **Updated Marvell cnxk crypto PMD.**
 
   * Added AES-CBC SHA1-HMAC support in lookaside protocol (IPsec) for CN10K.
diff --git a/drivers/net/af_packet/rte_eth_af_packet.c b/drivers/net/af_packet/rte_eth_af_packet.c
index b73b211..5ed9dd6 100644
--- a/drivers/net/af_packet/rte_eth_af_packet.c
+++ b/drivers/net/af_packet/rte_eth_af_packet.c
@@ -48,6 +48,7 @@ struct pkt_rx_queue {
 	struct rte_mempool *mb_pool;
 	uint16_t in_port;
+	uint8_t vlan_strip;
 
 	volatile unsigned long rx_pkts;
 	volatile unsigned long rx_bytes;
@@ -78,6 +79,7 @@ struct pmd_internals {
 
 	struct pkt_rx_queue *rx_queue;
 	struct pkt_tx_queue *tx_queue;
+	uint8_t vlan_strip;
 };
 
 static const char *valid_arguments[] = {
@@ -148,6 +150,9 @@ eth_af_packet_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 		if (ppd->tp_status & TP_STATUS_VLAN_VALID) {
 			mbuf->vlan_tci = ppd->tp_vlan_tci;
 			mbuf->ol_flags |= (PKT_RX_VLAN | PKT_RX_VLAN_STRIPPED);
+
+			if (!pkt_q->vlan_strip && rte_vlan_insert(&mbuf))
+				PMD_LOG(ERR, "Failed to reinsert VLAN tag");
 		}
 
 		/* release incoming frame and advance ring buffer */
@@ -302,6 +307,11 @@ eth_dev_stop(struct rte_eth_dev *dev)
 static int
 eth_dev_configure(struct rte_eth_dev *dev __rte_unused)
 {
+	struct rte_eth_conf *dev_conf = &dev->data->dev_conf;
+	const struct rte_eth_rxmode *rxmode = &dev_conf->rxmode;
+	struct pmd_internals *internals = dev->data->dev_private;
+
+	internals->vlan_strip = !!(rxmode->offloads & DEV_RX_OFFLOAD_VLAN_STRIP);
 	return 0;
 }
 
@@ -318,6 +328,7 @@ eth_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->min_rx_bufsize = 0;
 	dev_info->tx_offload_capa = DEV_TX_OFFLOAD_MULTI_SEGS |
 		DEV_TX_OFFLOAD_VLAN_INSERT;
+	dev_info->rx_offload_capa = DEV_RX_OFFLOAD_VLAN_STRIP;
 
 	return 0;
 }
@@ -448,6 +459,7 @@ eth_rx_q
Re: [dpdk-dev] [PATCH v3] net/af_packet: reinsert the stripped vlan tag
Thanks Stephen, for the suggestion. I've sent v4 of the patch, which adds
the succinct description in the af_packet documentation.

On Fri, 24 Sept 2021 at 18:11, Stephen Hemminger wrote:
> On Fri, 24 Sep 2021 14:44:45 +0300
> Tudor Cornea wrote:
>
> > +Features and Limitations of the af_packet PMD
> > +---------------------------------------------
> > +
> > +Since the following commit, the Linux kernel strips the vlan tag
> > +
> > +.. code-block:: console
> > +
> > +    commit bcc6d47903612c3861201cc3a866fb604f26b8b2
> > +    Author: Jiri Pirko
> > +    Date: Thu Apr 7 19:48:33 2011 +
> > +
> > +        net: vlan: make non-hw-accel rx path similar to hw-accel
> > +
> > +Running on such a kernel results in receiving untagged frames while using
> > +the af_packet PMD. Fortunately, the stripped information is still available
> > +for use in ``mbuf->vlan_tci``, and applications could check
> > +``PKT_RX_VLAN_STRIPPED``.
> > +
> > +However, we currently don't have a way to describe offloads which can't be
> > +disabled by PMDs, and this creates an inconsistency with the way applications
> > +expect the PMD offloads to work, and requires them to be aware of which
> > +underlying driver they use.
> > +
> > +Since release 21.11 the af_packet PMD will implement support for the
> > +``DEV_RX_OFFLOAD_VLAN_STRIP`` offload, and users can control the desired vlan
> > +stripping behavior.
> > +
> > +It's important to note that the default case will change. If previously,
> > +the vlan tag was stripped, if the application now requires the same behavior,
> > +it will need to configure ``rxmode.offloads`` with ``DEV_RX_OFFLOAD_VLAN_STRIP``.
> > +
> > +The PMD driver will re-insert the vlan tag transparently to the application
> > +if the kernel strips it, as long as the ``DEV_RX_OFFLOAD_VLAN_STRIP`` is not
> > +enabled.
>
> This seems like the wrong place for this text.
> It is not a new feature, it is a bug fix.
>
> If you want to describe it as an enhancement, the text should be succinct
> and describe the situation from the user's point of view. Something like:
>
>     Af_packet PMD now works with VLAN's on Linux
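What `rte_vlan_insert()` effectively does for the af_packet PMD in the patch above — shifting the Ethernet addresses and splicing the 802.1Q tag back in front of the original EtherType — can be shown on a raw frame buffer. This is a sketch of the byte-level mechanics only, not the mbuf-based DPDK implementation:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ADDRS_LEN 12 /* destination + source MAC */
#define VLAN_TAG_LEN  4  /* TPID (0x8100) + TCI */

/*
 * Re-insert a stripped VLAN tag into a raw Ethernet frame.
 * `frame` must have VLAN_TAG_LEN bytes of headroom beyond `len`.
 * Returns the new frame length.
 */
static size_t
vlan_reinsert(uint8_t *frame, size_t len, uint16_t tci)
{
    /* make room right after the MAC addresses */
    memmove(frame + ETH_ADDRS_LEN + VLAN_TAG_LEN,
            frame + ETH_ADDRS_LEN, len - ETH_ADDRS_LEN);
    /* TPID 0x8100 and the TCI, both big-endian on the wire */
    frame[ETH_ADDRS_LEN]     = 0x81;
    frame[ETH_ADDRS_LEN + 1] = 0x00;
    frame[ETH_ADDRS_LEN + 2] = (uint8_t)(tci >> 8);
    frame[ETH_ADDRS_LEN + 3] = (uint8_t)(tci & 0xff);
    return len + VLAN_TAG_LEN;
}
```

In the real PMD this happens only when the application did *not* request `DEV_RX_OFFLOAD_VLAN_STRIP`, restoring the pre-strip appearance of the frame for the application.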
[dpdk-dev] [PATCH 0/4] net/mlx5: implicit mempool registration
From: Dmitry Kozlyuk

MLX5 hardware has its internal IOMMU where the PMD registers the memory.
On the data path, the PMD translates a VA into a key consumed by the
device IOMMU. It is impractical for the PMD to register all allocated
memory because of the increased lookup cost both in HW and SW. Most often
mbuf memory comes from mempools, so if the PMD tracks them, it can almost
always have mbuf memory registered before an mbuf hits the PMD. This
patchset adds such tracking in the PMD and an internal API to support it.

Please see [1] for a more thorough explanation of patch 2/4 and how it can
be useful outside of the MLX5 PMD.

[1]: http://inbox.dpdk.org/dev/ch0pr12mb509112fadb778ab28af3771db9...@ch0pr12mb5091.namprd12.prod.outlook.com/

v2 (internal review and testing):
1. Change the tracked mempool event from being created (CREATE) to being
   fully populated (READY), which is the state the PMD is interested in.
2. Unit test the new mempool callback API.
3. Remove bogus "error" messages in normal conditions.
4. Fixes in the PMD.
Dmitry Kozlyuk (4): mempool: add event callbacks mempool: add non-IO flag common/mlx5: add mempool registration facilities net/mlx5: support mempool registration app/test/test_mempool.c| 75 doc/guides/nics/mlx5.rst | 11 + doc/guides/rel_notes/release_21_11.rst | 9 + drivers/common/mlx5/mlx5_common_mp.c | 50 +++ drivers/common/mlx5/mlx5_common_mp.h | 14 + drivers/common/mlx5/mlx5_common_mr.c | 580 + drivers/common/mlx5/mlx5_common_mr.h | 17 + drivers/common/mlx5/version.map| 5 + drivers/net/mlx5/linux/mlx5_mp_os.c| 44 ++ drivers/net/mlx5/linux/mlx5_os.c | 4 +- drivers/net/mlx5/mlx5.c| 152 +++ drivers/net/mlx5/mlx5.h| 10 + drivers/net/mlx5/mlx5_mr.c | 120 ++--- drivers/net/mlx5/mlx5_mr.h | 2 - drivers/net/mlx5/mlx5_rx.h | 21 +- drivers/net/mlx5/mlx5_rxq.c| 13 + drivers/net/mlx5/mlx5_trigger.c| 77 +++- drivers/net/mlx5/windows/mlx5_os.c | 1 + lib/mempool/rte_mempool.c | 143 +- lib/mempool/rte_mempool.h | 60 +++ lib/mempool/version.map| 8 + 21 files changed, 1297 insertions(+), 119 deletions(-) -- 2.25.1
[dpdk-dev] [PATCH v2 1/4] mempool: add event callbacks
From: Dmitry Kozlyuk

Performance of MLX5 PMDs of different classes can benefit if the PMD knows
which memory it will need to handle in advance, before the first mbuf is
sent to the PMD. It is impractical, however, to consider all allocated
memory for this purpose. Most often mbuf memory comes from mempools that
can come and go. A PMD can enumerate existing mempools on device start,
but it also needs to track the creation and destruction of mempools after
the forwarding starts, but before an mbuf from the new mempool is sent to
the device.

Add an internal API to register a callback for mempool life cycle events,
currently RTE_MEMPOOL_EVENT_READY (after populating) and
RTE_MEMPOOL_EVENT_DESTROY (before freeing):
* rte_mempool_event_callback_register()
* rte_mempool_event_callback_unregister()

Provide a unit test for the new API.

Signed-off-by: Dmitry Kozlyuk
Acked-by: Matan Azrad
---
 app/test/test_mempool.c   |  75 +++++++++++++++
 lib/mempool/rte_mempool.c | 143 +++++++++++++++++++++-
 lib/mempool/rte_mempool.h |  56 ++++++++++
 lib/mempool/version.map   |   8 ++++
 4 files changed, 279 insertions(+), 3 deletions(-)

diff --git a/app/test/test_mempool.c b/app/test/test_mempool.c
index 7675a3e605..0c4ed7c60b 100644
--- a/app/test/test_mempool.c
+++ b/app/test/test_mempool.c
@@ -14,6 +14,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -471,6 +472,74 @@ test_mp_mem_init(struct rte_mempool *mp,
 	data->ret = 0;
 }
 
+struct test_mempool_events_data {
+	struct rte_mempool *mp;
+	enum rte_mempool_event event;
+	bool invoked;
+};
+
+static void
+test_mempool_events_cb(enum rte_mempool_event event,
+		       struct rte_mempool *mp, void *arg)
+{
+	struct test_mempool_events_data *data = arg;
+
+	data->mp = mp;
+	data->event = event;
+	data->invoked = true;
+}
+
+static int
+test_mempool_events(int (*populate)(struct rte_mempool *mp))
+{
+	struct test_mempool_events_data data;
+	struct rte_mempool *mp;
+	int ret;
+
+	ret = rte_mempool_event_callback_register(NULL, &data);
+	RTE_TEST_ASSERT_NOT_EQUAL(ret, 0, "Registered a NULL callback");
+
+	memset(&data, 0, sizeof(data));
+	ret = rte_mempool_event_callback_register(test_mempool_events_cb,
+						  &data);
+	RTE_TEST_ASSERT_EQUAL(ret, 0, "Failed to register the callback: %s",
+			      rte_strerror(rte_errno));
+
+	mp = rte_mempool_create_empty("empty", MEMPOOL_SIZE,
+				      MEMPOOL_ELT_SIZE, 0, 0,
+				      SOCKET_ID_ANY, 0);
+	RTE_TEST_ASSERT_NOT_NULL(mp, "Cannot create an empty mempool: %s",
+				 rte_strerror(rte_errno));
+	RTE_TEST_ASSERT_EQUAL(data.invoked, false,
+			      "Callback invoked on an empty mempool creation");
+
+	rte_mempool_set_ops_byname(mp, rte_mbuf_best_mempool_ops(), NULL);
+	ret = populate(mp);
+	RTE_TEST_ASSERT_EQUAL(ret, (int)mp->size, "Failed to populate the mempool: %s",
+			      rte_strerror(rte_errno));
+	RTE_TEST_ASSERT_EQUAL(data.invoked, true,
+			      "Callback not invoked on an empty mempool population");
+	RTE_TEST_ASSERT_EQUAL(data.event, RTE_MEMPOOL_EVENT_READY,
+			      "Wrong callback invoked, expected READY");
+	RTE_TEST_ASSERT_EQUAL(data.mp, mp,
+			      "Callback invoked for a wrong mempool");
+
+	memset(&data, 0, sizeof(data));
+	rte_mempool_free(mp);
+	RTE_TEST_ASSERT_EQUAL(data.invoked, true,
+			      "Callback not invoked on mempool destruction");
+	RTE_TEST_ASSERT_EQUAL(data.event, RTE_MEMPOOL_EVENT_DESTROY,
+			      "Wrong callback invoked, expected DESTROY");
+	RTE_TEST_ASSERT_EQUAL(data.mp, mp,
+			      "Callback invoked for a wrong mempool");
+
+	ret = rte_mempool_event_callback_unregister(test_mempool_events_cb,
+						    &data);
+	RTE_TEST_ASSERT_EQUAL(ret, 0, "Failed to unregister the callback: %s",
+			      rte_strerror(rte_errno));
+	return 0;
+}
+
 static int
 test_mempool(void)
 {
@@ -645,6 +714,12 @@ test_mempool(void)
 	if (test_mempool_basic(default_pool, 1) < 0)
 		GOTO_ERR(ret, err);
 
+	/* test mempool event callbacks */
+	if (test_mempool_events(rte_mempool_populate_default) < 0)
+		GOTO_ERR(ret, err);
+	if (test_mempool_events(rte_mempool_populate_anon) < 0)
+		GOTO_ERR(ret, err);
+
 	rte_mempool_list_dump(stdout);
 
 	ret = 0;
diff --git a/lib/mempool/rte_mempool.c b/lib/mempool/rte_mempool.c
index 59a588425b..c6cb99ba48 100644
--- a/lib/mempool/rte_mempool.c
+++ b/lib/mempool/rte_mempool.c
@
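The register/notify contract the unit test above exercises can be modelled without DPDK: a tiny callback registry with READY/DESTROY events. The event names mirror the patch; everything else here is a simplified, hypothetical stand-in for the mempool library's internals:

```c
#include <assert.h>
#include <stddef.h>

enum pool_event { POOL_EVENT_READY, POOL_EVENT_DESTROY };

typedef void (*pool_event_cb)(enum pool_event ev, void *pool, void *arg);

#define MAX_CBS 4

static struct { pool_event_cb fn; void *arg; } g_cbs[MAX_CBS];

/* Returns 0 on success, -1 on a NULL callback or a full table. */
static int
pool_event_register(pool_event_cb fn, void *arg)
{
    int i;

    if (fn == NULL)
        return -1;
    for (i = 0; i < MAX_CBS; i++) {
        if (g_cbs[i].fn == NULL) {
            g_cbs[i].fn = fn;
            g_cbs[i].arg = arg;
            return 0;
        }
    }
    return -1;
}

/* Called by the allocator after populating (READY) or before freeing
 * (DESTROY) a pool — the moments a PMD would (un)register DMA memory. */
static void
pool_event_notify(enum pool_event ev, void *pool)
{
    int i;

    for (i = 0; i < MAX_CBS; i++)
        if (g_cbs[i].fn != NULL)
            g_cbs[i].fn(ev, pool, g_cbs[i].arg);
}

/* Sample subscriber recording the last event, like the unit test does. */
static enum pool_event g_last_ev;
static void *g_last_pool;

static void
record_cb(enum pool_event ev, void *pool, void *arg)
{
    (void)arg;
    g_last_ev = ev;
    g_last_pool = pool;
}
```

Note that the READY event fires only after population, not at create-time — matching the v2 changelog's switch from CREATE to READY, since an unpopulated pool has no memory for a PMD to register.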
[dpdk-dev] [PATCH v2 2/4] mempool: add non-IO flag
From: Dmitry Kozlyuk

Mempool is a generic allocator that is not necessarily used for device IO
operations, nor its memory for DMA. Add the MEMPOOL_F_NON_IO flag to mark
such mempools.

Signed-off-by: Dmitry Kozlyuk
Acked-by: Matan Azrad
---
 doc/guides/rel_notes/release_21_11.rst | 3 +++
 lib/mempool/rte_mempool.h              | 4 ++++
 2 files changed, 7 insertions(+)

diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index f85dc99c8b..873beda633 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -155,6 +155,9 @@ API Changes
   the crypto/security operation. This field will be used to communicate
   events such as soft expiry with IPsec in lookaside mode.
 
+* mempool: Added ``MEMPOOL_F_NON_IO`` flag to give a hint to DPDK components
+  that objects from this pool will not be used for device IO (e.g. DMA).
+
 ABI Changes
 -----------
 
diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
index c81e488851..4d18957d6d 100644
--- a/lib/mempool/rte_mempool.h
+++ b/lib/mempool/rte_mempool.h
@@ -263,6 +263,7 @@ struct rte_mempool {
 #define MEMPOOL_F_SC_GET         0x0008 /**< Default get is "single-consumer".*/
 #define MEMPOOL_F_POOL_CREATED   0x0010 /**< Internal: pool is created. */
 #define MEMPOOL_F_NO_IOVA_CONTIG 0x0020 /**< Don't need IOVA contiguous objs. */
+#define MEMPOOL_F_NON_IO         0x0040 /**< Not used for device IO (DMA). */
 
 /**
  * @internal When debug is enabled, store some statistics.
@@ -992,6 +993,9 @@ typedef void (rte_mempool_ctor_t)(struct rte_mempool *, void *);
  *     "single-consumer". Otherwise, it is "multi-consumers".
  *   - MEMPOOL_F_NO_IOVA_CONTIG: If set, allocated objects won't
  *     necessarily be contiguous in IO memory.
+ *   - MEMPOOL_F_NON_IO: If set, the mempool is considered to be
+ *     never used for device IO, i.e. DMA operations,
+ *     which may affect some PMD behavior.
  * @return
  *   The pointer to the new allocated mempool, on success. NULL on error
  *   with rte_errno set appropriately. Possible rte_errno values include:
-- 
2.25.1
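From a consumer's perspective the new flag is just a bit test: a PMD doing implicit registration would skip pools that advertise it. A minimal sketch, with the flag value taken from the patch above and a made-up helper name:

```c
#include <assert.h>
#include <stdint.h>

#define MEMPOOL_F_NON_IO 0x0040 /* flag value from the patch above */

/* A PMD tracking mempools for DMA would skip pools marked non-IO,
 * saving HW registration resources for pools that never reach a device. */
static int
pool_needs_dma_registration(uint32_t pool_flags)
{
    return (pool_flags & MEMPOOL_F_NON_IO) == 0;
}
```

This is the check that later makes the mlx5 `mr_mempool_reg_en` feature in patch 4/4 cheap: non-IO pools are simply never registered.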
[dpdk-dev] [PATCH v2 3/4] common/mlx5: add mempool registration facilities
From: Dmitry Kozlyuk Add internal API to register mempools, that is, to create memory regions (MR) for their memory and store them in a separate database. Implementation deals with multi-process, so that class drivers don't need to. Each protection domain has its own database. Memory regions can be shared within a database if they represent a single hugepage covering one or more mempools entirely. Add internal API to lookup an MR key for an address that belongs to a known mempool. It is a responsibility of a class driver to extract the mempool from an mbuf. Signed-off-by: Dmitry Kozlyuk Acked-by: Matan Azrad --- drivers/common/mlx5/mlx5_common_mp.c | 50 +++ drivers/common/mlx5/mlx5_common_mp.h | 14 + drivers/common/mlx5/mlx5_common_mr.c | 580 +++ drivers/common/mlx5/mlx5_common_mr.h | 17 + drivers/common/mlx5/version.map | 5 + 5 files changed, 666 insertions(+) diff --git a/drivers/common/mlx5/mlx5_common_mp.c b/drivers/common/mlx5/mlx5_common_mp.c index 673a7c31de..6dfc5535e0 100644 --- a/drivers/common/mlx5/mlx5_common_mp.c +++ b/drivers/common/mlx5/mlx5_common_mp.c @@ -54,6 +54,56 @@ mlx5_mp_req_mr_create(struct mlx5_mp_id *mp_id, uintptr_t addr) return ret; } +/** + * @param mp_id + * ID of the MP process. + * @param share_cache + * Shared MR cache. + * @param pd + * Protection domain. + * @param mempool + * Mempool to register or unregister. + * @param reg + * True to register the mempool, False to unregister. 
+ */ +int +mlx5_mp_req_mempool_reg(struct mlx5_mp_id *mp_id, + struct mlx5_mr_share_cache *share_cache, void *pd, + struct rte_mempool *mempool, bool reg) +{ + struct rte_mp_msg mp_req; + struct rte_mp_msg *mp_res; + struct rte_mp_reply mp_rep; + struct mlx5_mp_param *req = (struct mlx5_mp_param *)mp_req.param; + struct mlx5_mp_arg_mempool_reg *arg = &req->args.mempool_reg; + struct mlx5_mp_param *res; + struct timespec ts = {.tv_sec = MLX5_MP_REQ_TIMEOUT_SEC, .tv_nsec = 0}; + enum mlx5_mp_req_type type; + int ret; + + MLX5_ASSERT(rte_eal_process_type() == RTE_PROC_SECONDARY); + type = reg ? MLX5_MP_REQ_MEMPOOL_REGISTER : +MLX5_MP_REQ_MEMPOOL_UNREGISTER; + mp_init_msg(mp_id, &mp_req, type); + arg->share_cache = share_cache; + arg->pd = pd; + arg->mempool = mempool; + ret = rte_mp_request_sync(&mp_req, &mp_rep, &ts); + if (ret) { + DRV_LOG(ERR, "port %u request to primary process failed", + mp_id->port_id); + return -rte_errno; + } + MLX5_ASSERT(mp_rep.nb_received == 1); + mp_res = &mp_rep.msgs[0]; + res = (struct mlx5_mp_param *)mp_res->param; + ret = res->result; + if (ret) + rte_errno = -ret; + mlx5_free(mp_rep.msgs); + return ret; +} + /** * Request Verbs queue state modification to the primary process. * diff --git a/drivers/common/mlx5/mlx5_common_mp.h b/drivers/common/mlx5/mlx5_common_mp.h index 6829141fc7..527bf3cad8 100644 --- a/drivers/common/mlx5/mlx5_common_mp.h +++ b/drivers/common/mlx5/mlx5_common_mp.h @@ -14,6 +14,8 @@ enum mlx5_mp_req_type { MLX5_MP_REQ_VERBS_CMD_FD = 1, MLX5_MP_REQ_CREATE_MR, + MLX5_MP_REQ_MEMPOOL_REGISTER, + MLX5_MP_REQ_MEMPOOL_UNREGISTER, MLX5_MP_REQ_START_RXTX, MLX5_MP_REQ_STOP_RXTX, MLX5_MP_REQ_QUEUE_STATE_MODIFY, @@ -33,6 +35,12 @@ struct mlx5_mp_arg_queue_id { uint16_t queue_id; /* DPDK queue ID. */ }; +struct mlx5_mp_arg_mempool_reg { + struct mlx5_mr_share_cache *share_cache; + void *pd; /* NULL for MLX5_MP_REQ_MEMPOOL_UNREGISTER */ + struct rte_mempool *mempool; +}; + /* Pameters for IPC. 
*/ struct mlx5_mp_param { enum mlx5_mp_req_type type; @@ -41,6 +49,8 @@ struct mlx5_mp_param { RTE_STD_C11 union { uintptr_t addr; /* MLX5_MP_REQ_CREATE_MR */ + struct mlx5_mp_arg_mempool_reg mempool_reg; + /* MLX5_MP_REQ_MEMPOOL_(UN)REGISTER */ struct mlx5_mp_arg_queue_state_modify state_modify; /* MLX5_MP_REQ_QUEUE_STATE_MODIFY */ struct mlx5_mp_arg_queue_id queue_id; @@ -91,6 +101,10 @@ void mlx5_mp_uninit_secondary(const char *name); __rte_internal int mlx5_mp_req_mr_create(struct mlx5_mp_id *mp_id, uintptr_t addr); __rte_internal +int mlx5_mp_req_mempool_reg(struct mlx5_mp_id *mp_id, + struct mlx5_mr_share_cache *share_cache, void *pd, + struct rte_mempool *mempool, bool reg); +__rte_internal int mlx5_mp_req_queue_state_modify(struct mlx5_mp_id *mp_id, struct mlx5_mp_arg_queue_state_modify *sm); __rte_internal diff --git a/drivers/common/mlx5/mlx5_common_mr.c b/drivers/common/mlx5/mlx5_common_mr.c index 98fe8698e2..2e039a4e70 100644 --- a
[dpdk-dev] [PATCH v2 4/4] net/mlx5: support mempool registration
From: Dmitry Kozlyuk When the first port in a given protection domain (PD) starts, install a mempool event callback for this PD and register all existing memory regions (MR) for it. When the last port in a PD closes, remove the callback and unregister all mempools for this PD. This behavior can be switched off with a new devarg: mr_mempool_reg_en. On TX slow path, i.e. when an MR key for the address of the buffer to send is not in the local cache, first try to retrieve it from the database of registered mempools. Supported are direct and indirect mbufs, as well as externally-attached ones from MLX5 MPRQ feature. Lookup in the database of non-mempool memory is used as the last resort. RX mempools are registered regardless of the devarg value. On RX data path only the local cache and the mempool database is used. If implicit mempool registration is disabled, these mempools are unregistered at port stop, releasing the MRs. Signed-off-by: Dmitry Kozlyuk Acked-by: Matan Azrad --- doc/guides/nics/mlx5.rst | 11 ++ doc/guides/rel_notes/release_21_11.rst | 6 + drivers/net/mlx5/linux/mlx5_mp_os.c| 44 +++ drivers/net/mlx5/linux/mlx5_os.c | 4 +- drivers/net/mlx5/mlx5.c| 152 + drivers/net/mlx5/mlx5.h| 10 ++ drivers/net/mlx5/mlx5_mr.c | 120 +-- drivers/net/mlx5/mlx5_mr.h | 2 - drivers/net/mlx5/mlx5_rx.h | 21 ++-- drivers/net/mlx5/mlx5_rxq.c| 13 +++ drivers/net/mlx5/mlx5_trigger.c| 77 +++-- drivers/net/mlx5/windows/mlx5_os.c | 1 + 12 files changed, 345 insertions(+), 116 deletions(-) diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst index bae73f42d8..58d1c5b65c 100644 --- a/doc/guides/nics/mlx5.rst +++ b/doc/guides/nics/mlx5.rst @@ -1001,6 +1001,17 @@ Driver options Enabled by default. +- ``mr_mempool_reg_en`` parameter [int] + + A nonzero value enables implicit registration of DMA memory of all mempools + except those having ``MEMPOOL_F_NON_IO``. 
The effect is that when a packet + from a mempool is transmitted, its memory is already registered for DMA + in the PMD and no registration will happen on the data path. The tradeoff is + extra work on the creation of each mempool and increased HW resource use + if some mempools are not used with MLX5 devices. + + Enabled by default. + - ``representor`` parameter [list] This parameter can be used to instantiate DPDK Ethernet devices from diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst index 873beda633..1fc09faf96 100644 --- a/doc/guides/rel_notes/release_21_11.rst +++ b/doc/guides/rel_notes/release_21_11.rst @@ -106,6 +106,12 @@ New Features * Added tests to validate packets hard expiry. * Added tests to verify tunnel header verification in IPsec inbound. +* **Updated Mellanox mlx5 driver.** + + Updated the Mellanox mlx5 driver with new features and improvements, including: + + * Added implicit mempool registration to avoid data path hiccups (opt-out). + Removed Items - diff --git a/drivers/net/mlx5/linux/mlx5_mp_os.c b/drivers/net/mlx5/linux/mlx5_mp_os.c index 3a4aa766f8..d2ac375a47 100644 --- a/drivers/net/mlx5/linux/mlx5_mp_os.c +++ b/drivers/net/mlx5/linux/mlx5_mp_os.c @@ -20,6 +20,45 @@ #include "mlx5_tx.h" #include "mlx5_utils.h" +/** + * Handle a port-agnostic message. + * + * @return + * 0 on success, 1 when message is not port-agnostic, (-1) on error. 
+ */ +static int +mlx5_mp_os_handle_port_agnostic(const struct rte_mp_msg *mp_msg, + const void *peer) +{ + struct rte_mp_msg mp_res; + struct mlx5_mp_param *res = (struct mlx5_mp_param *)mp_res.param; + const struct mlx5_mp_param *param = + (const struct mlx5_mp_param *)mp_msg->param; + const struct mlx5_mp_arg_mempool_reg *mpr; + struct mlx5_mp_id mp_id; + + switch (param->type) { + case MLX5_MP_REQ_MEMPOOL_REGISTER: + mlx5_mp_id_init(&mp_id, param->port_id); + mp_init_msg(&mp_id, &mp_res, param->type); + mpr = ¶m->args.mempool_reg; + res->result = mlx5_mr_mempool_register(mpr->share_cache, + mpr->pd, mpr->mempool, + NULL); + return rte_mp_reply(&mp_res, peer); + case MLX5_MP_REQ_MEMPOOL_UNREGISTER: + mlx5_mp_id_init(&mp_id, param->port_id); + mp_init_msg(&mp_id, &mp_res, param->type); + mpr = ¶m->args.mempool_reg; + res->result = mlx5_mr_mempool_unregister(mpr->share_cache, +mpr->mempool, NULL); + return rte_mp_reply(&mp_res, peer); + default: + return 1; + } + return -1; +} + int mlx5_m
Re: [dpdk-dev] [PATCH v2 0/3] add option to configure UDP ports verification
> Add option to indicate whether UDP encapsulation ports verification > need to be done as part of inbound IPsec processing. > CNXK PMD support and unit tests are also added for the same. > > Depends on > https://patches.dpdk.org/project/dpdk/list/?series=18755 > Series Acked-by: Akhil Goyal Applied to dpdk-next-crypto
Re: [dpdk-dev] [PATCH v1] ethdev: introduce shared Rx queue
On Wed, 2021-09-29 at 12:35 +, Ananyev, Konstantin wrote:
> -----Original Message-----
> From: Xueming(Steven) Li
> Sent: Wednesday, September 29, 2021 1:09 PM
> To: jerinjac...@gmail.com; Ananyev, Konstantin
> Cc: NBU-Contact-Thomas Monjalon; andrew.rybche...@oktetlabs.ru;
>     dev@dpdk.org; Yigit, Ferruh
> Subject: Re: [dpdk-dev] [PATCH v1] ethdev: introduce shared Rx queue
>
> > On Wed, 2021-09-29 at 09:52 +, Ananyev, Konstantin wrote:
> > > On Wed, 2021-09-29 at 00:26 +, Ananyev, Konstantin wrote:
> > > > In current DPDK framework, each RX queue is pre-loaded with
> > > > mbufs for incoming packets. When number of representors scale
> > > > out in a switch domain, the memory consumption became
> > > > significant. Most important, polling all ports leads to high
> > > > cache miss, high latency and low throughput.
> > > >
> > > > This patch introduces shared RX queue.
> > > > Ports with same configuration in a switch domain could share
> > > > RX queue set by specifying sharing group. Polling any queue
> > > > using same shared RX queue receives packets from all member
> > > > ports. Source port is identified by mbuf->port.
> > > >
> > > > Port queue number in a shared group should be identical. Queue
> > > > index is 1:1 mapped in shared group.
> > > >
> > > > Share RX queue is supposed to be polled on same thread.
> > > >
> > > > Multiple groups is supported by group ID.
> > >
> > > Is this offload specific to the representor? If so can this name
> > > be changed specifically to representor?
> >
> > Yes, PF and representor in switch domain could take advantage.
> >
> > > If it is for a generic case, how the flow ordering will be
> > > maintained?
> > Not quite sure that I understood your question. The control path is
> > almost same as before, PF and representor port still needed, rte
> > flows not impacted. Queues still needed for each member port,
> > descriptors(mbuf) will be supplied from shared Rx queue in my PMD
> > implementation.
>
> My question was if we create a generic RTE_ETH_RX_OFFLOAD_SHARED_RXQ
> offload, multiple ethdev receive queues land into the same receive
> queue. In that case, how the flow order is maintained for respective
> receive queues.

I guess the question is testpmd forward stream?
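The shared Rx queue model under discussion — several member ports feeding one polled queue, with the source port recovered from the mbuf's port field (mbuf->port in DPDK) — can be sketched with stand-in types. The `toy_mbuf` struct and `demux_burst()` helper below are illustrative, not the real rte_mbuf API:

```c
#include <stdint.h>

/* Stand-in for an rte_mbuf: only the fields the sketch needs. */
struct toy_mbuf {
	uint16_t port;	/* member port the packet arrived on */
	uint32_t len;	/* packet length in bytes */
};

/* One poll of the shared queue may return packets from any member
 * port; the consumer demultiplexes on the port field afterwards. */
static void
demux_burst(const struct toy_mbuf *burst, int n, uint32_t per_port_bytes[])
{
	int i;

	for (i = 0; i < n; i++)
		per_port_bytes[burst[i].port] += burst[i].len;
}
```

This also illustrates the flow-ordering question raised above: once packets from several ports are interleaved in one queue, only their relative order within the shared queue is preserved.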
Re: [dpdk-dev] [PATCH v2] telemetry: fix "in-memory" process socket conflicts
On 29/09/2021 14:32, Bruce Richardson wrote:
> On Wed, Sep 29, 2021 at 01:28:53PM +0100, Kevin Traynor wrote:
> > Hi Bruce,
> >
> > On 24/09/2021 17:18, Bruce Richardson wrote:
> > > When DPDK is run with --in-memory mode, multiple processes can run
> > > simultaneously using the same runtime dir. This leads to each
> > > process removing another process' telemetry socket as it started
> > > up, giving unexpected behaviour.
> > >
> > > This patch changes that behaviour to first check if the existing
> > > socket is active. If not, it's an old socket to be cleaned up and
> > > can be removed. If it is active, telemetry initialization fails
> > > and an error message is printed out giving instructions on how to
> > > remove the error; either by using file-prefix to have a different
> > > runtime dir (and therefore socket path) or by disabling telemetry
> > > if it not needed.
> >
> > telemetry is enabled by default but it may not be used by the
> > application. Hitting this issue will cause rte_eal_init() to fail
> > which will probably stop or severely limit the application.
> >
> > So it could change a working application to a non-working one
> > (albeit one that doesn't interfere with other process' sockets).
> >
> > Can it just print a warning that telemetry will not be enabled and
> > continue so it's not returning an rte_eal_init failure?
>
> For a backported fix, yes, that would probably be better behaviour,
> but for the latest branch, I think returning error and having the
> user explicitly choose the resolution they want to occur is best.
> I'll see about doing a separate backport patch for 20.11.

But this is a runtime message dependent on runtime environment. The
user may not have access or know how to change eal parameters.

In the case where the application doesn't care about telemetry, they
have gone from not having telemetry to rte_eal_init() failing, which
probably has severe consequence.

I could maybe agree if telemetry was default disable and the
application had set the --telemetry flag indicating that they
want/need it. As it is, it feels like it's possibly a worse outcome
for the user.

thanks,
Kevin.

> > A more minor thing, I see it changes the behaviour from, last one
> > runs with telemetry, to, first one runs with telemetry. Though it
> > can be figured from the commit message, it might be worth calling
> > that change out explicitly.
>
> Sure. I'll resubmit a new version of this without stable CC'ed and
> include that behaviour change explicitly in the commit log.
>
> /Bruce
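The active-socket check the patch describes boils down to probing the existing unix socket before deleting it: if a peer accepts the connection, another process is still serving it; if the connect is refused or the file is gone, it is a stale leftover. A minimal sketch with a hypothetical `socket_is_active()` helper — this is not the actual telemetry-library code:

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical probe: returns 1 if another process is listening on
 * `path`, 0 if the path is stale or absent (safe to clean up and
 * rebind), and -1 on unexpected errors. */
static int
socket_is_active(const char *path)
{
	struct sockaddr_un sun;
	int fd, ret;

	fd = socket(AF_UNIX, SOCK_SEQPACKET, 0);
	if (fd < 0)
		return -1;

	memset(&sun, 0, sizeof(sun));
	sun.sun_family = AF_UNIX;
	strncpy(sun.sun_path, path, sizeof(sun.sun_path) - 1);

	ret = connect(fd, (struct sockaddr *)&sun, sizeof(sun));
	close(fd);

	if (ret == 0)
		return 1;	/* a live peer accepted: socket in use */
	if (errno == ECONNREFUSED || errno == ENOENT)
		return 0;	/* leftover file with no listener, or gone */
	return -1;
}
```

The debated policy question is what the caller does with a return of 1: fail rte_eal_init() outright (the v2 behaviour) or log a warning and continue without telemetry (Kevin's suggestion).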
Re: [dpdk-dev] [dpdk-stable] [PATCH v2 2/2] net/i40e: fix risk in Rx descriptor read in scalar path
On 9/15/2021 9:33 AM, Ruifeng Wang wrote: > Rx descriptor is 16B/32B in size. If the DD bit is set, it indicates > that the rest of the descriptor words have valid values. Hence, the > word containing DD bit must be read first before reading the rest of > the descriptor words. > > Since the entire descriptor is not read atomically, on relaxed memory > ordered systems like Aarch64, read of the word containing DD field > could be reordered after read of other words. > > Read barrier is inserted between read of the word with DD field > and read of other words. The barrier ensures that the fetched data > is correct. > > Testpmd single core test showed no performance drop on x86 or N1SDP. > On ThunderX2, 22% performance regression was observed. > Is 22% performance drop value correct? That is a big drop, is it acceptable? Is this performance drop valid for all Arm scalar datapath, or is it specific to ThunderX2? > Fixes: 7b0cf70135d1 ("net/i40e: support ARM platform") > Cc: sta...@dpdk.org > > Signed-off-by: Ruifeng Wang > Reviewed-by: Honnappa Nagarahalli
Re: [dpdk-dev] [PATCH v2] telemetry: fix "in-memory" process socket conflicts
On Wed, Sep 29, 2021 at 03:54:48PM +0100, Kevin Traynor wrote: > On 29/09/2021 14:32, Bruce Richardson wrote: > > On Wed, Sep 29, 2021 at 01:28:53PM +0100, Kevin Traynor wrote: > > > Hi Bruce, > > > > > > On 24/09/2021 17:18, Bruce Richardson wrote: > > > > When DPDK is run with --in-memory mode, multiple processes can run > > > > simultaneously using the same runtime dir. This leads to each process > > > > removing another process' telemetry socket as it started up, giving > > > > unexpected behaviour. > > > > > > > > This patch changes that behaviour to first check if the existing socket > > > > is active. If not, it's an old socket to be cleaned up and can be > > > > removed. If it is active, telemetry initialization fails and an error > > > > message is printed out giving instructions on how to remove the error; > > > > either by using file-prefix to have a different runtime dir (and > > > > therefore socket path) or by disabling telemetry if it not needed. > > > > > > > > > > telemetry is enabled by default but it may not be used by the application. > > > Hitting this issue will cause rte_eal_init() to fail which will probably > > > stop or severely limit the application. > > > > > > So it could change a working application to a non-working one (albeit one > > > that doesn't interfere with other process' sockets). > > > > > > Can it just print a warning that telemetry will not be enabled and > > > continue > > > so it's not returning an rte_eal_init failure? > > > > > > > For a backported fix, yes, that would probably be better behaviour, but for > > the latest branch, I think returning error and having the user explicitly > > choose the resolution they want to occur is best. I'll see about doing a > > separate backport patch for 20.11. > > > > But this is a runtime message dependent on runtime environment. The user may > not have access or know how to change eal parameters. True. 
But on the other hand, this problem only occurs with non-default EAL parameters anyway, so someone must have configured this with the --in-memory flag. > > In the case where the application doesn't care about telemetry, they have > gone from not having telemetry to rte_eal_init() failing, which probably has > severe consequence. > Yes, I agree, which I why I would suggest that for any backport of this fix, the error be made non-fatal as you suggest. [Having looked into it, having it as a non-fatal error is rather awkward, so it may be best just left unfixed and the current behaviour documented as known-issue]. However, for any application being updated and rebuilt against 21.11, I would have thought it reasonable to flag this as an error, as any such application would require revalidation anyway. > I could maybe agree if telemetry was default disable and the application had > set the --telemetry flag indicating that they want/need it. As it is, it > feels like it's possibly a worse outcome for the user. > Perhaps, but I believe the only case of there being an issue would be where: 1) a user who cannot modify the EAL parameters 2) runs an application which has been updated and rebuilt against 21.11 3) where that application is hard-coded to use in-memory mode and 4) has never been verified with two or more instances of that running? Or am I missing something here? Regards, /Bruce
Re: [dpdk-dev] [dpdk-stable] [PATCH v2 2/2] net/i40e: fix risk in Rx descriptor read in scalar path
> > On 9/15/2021 9:33 AM, Ruifeng Wang wrote: > > Rx descriptor is 16B/32B in size. If the DD bit is set, it indicates > > that the rest of the descriptor words have valid values. Hence, the > > word containing DD bit must be read first before reading the rest of > > the descriptor words. > > > > Since the entire descriptor is not read atomically, on relaxed memory > > ordered systems like Aarch64, read of the word containing DD field > > could be reordered after read of other words. > > > > Read barrier is inserted between read of the word with DD field and > > read of other words. The barrier ensures that the fetched data is > > correct. > > > > Testpmd single core test showed no performance drop on x86 or N1SDP. > > On ThunderX2, 22% performance regression was observed. > > > > Is 22% performance drop value correct? That is a big drop, is it acceptable? Agree, it is a big drop. Fixing it will require using the barrier less frequently. For ex: read 4 descriptors (4 words containing the DD bits) before using the barrier. > > Is this performance drop valid for all Arm scalar datapath, or is it specific > to > ThunderX2? This is specific to ThunderX2. N1 CPU does not see any impact. A72 is not tested. Considering that the ThunderXx line of CPUs are not in further development, and it is scalar path, I would not suggest to make further changes to the code. It would be good to test this on Kunpeng servers and get some feedback. > > > Fixes: 7b0cf70135d1 ("net/i40e: support ARM platform") > > Cc: sta...@dpdk.org > > > > Signed-off-by: Ruifeng Wang > > Reviewed-by: Honnappa Nagarahalli
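The barrier placement being discussed can be illustrated with a simplified descriptor. The layout below is a stand-in, not the real i40e descriptor, and `atomic_thread_fence()` stands in for DPDK's `rte_smp_rmb()`:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Simplified 16B Rx descriptor; on i40e the DD (done) bit lives in
 * the length/status qword, modelled here as qword1. */
struct toy_rx_desc {
	uint64_t qword0;	/* e.g. packet metadata */
	uint64_t qword1;	/* status word holding the DD bit */
};

#define TOY_DD_BIT (1ULL << 0)

/* Returns 1 and copies the descriptor out if HW marked it done. */
static int
toy_read_desc(const volatile struct toy_rx_desc *d, struct toy_rx_desc *out)
{
	uint64_t qw1 = d->qword1;	/* the word with DD is read first */

	if (!(qw1 & TOY_DD_BIT))
		return 0;		/* descriptor not ready yet */

	/* Without this read barrier, a weakly ordered CPU (e.g.
	 * AArch64) may hoist the qword0 load above the qword1 load and
	 * hand back stale data even though DD was seen set. */
	atomic_thread_fence(memory_order_acquire);

	out->qword0 = d->qword0;
	out->qword1 = qw1;
	return 1;
}
```

Honnappa's suggestion to reduce the cost would amortize the fence: check the DD words of, say, four descriptors first, then issue a single barrier before reading the remaining words of all four.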
Re: [dpdk-dev] [PATCH v2] telemetry: fix "in-memory" process socket conflicts
On Wed, Sep 29, 2021 at 04:24:06PM +0100, Bruce Richardson wrote: > On Wed, Sep 29, 2021 at 03:54:48PM +0100, Kevin Traynor wrote: > > On 29/09/2021 14:32, Bruce Richardson wrote: > > > On Wed, Sep 29, 2021 at 01:28:53PM +0100, Kevin Traynor wrote: > > > > Hi Bruce, > > > > > > > > On 24/09/2021 17:18, Bruce Richardson wrote: > > > > > When DPDK is run with --in-memory mode, multiple processes can run > > > > > simultaneously using the same runtime dir. This leads to each process > > > > > removing another process' telemetry socket as it started up, giving > > > > > unexpected behaviour. > > > > > > > > > > This patch changes that behaviour to first check if the existing > > > > > socket > > > > > is active. If not, it's an old socket to be cleaned up and can be > > > > > removed. If it is active, telemetry initialization fails and an error > > > > > message is printed out giving instructions on how to remove the error; > > > > > either by using file-prefix to have a different runtime dir (and > > > > > therefore socket path) or by disabling telemetry if it not needed. > > > > > > > > > > > > > telemetry is enabled by default but it may not be used by the > > > > application. > > > > Hitting this issue will cause rte_eal_init() to fail which will probably > > > > stop or severely limit the application. > > > > > > > > So it could change a working application to a non-working one (albeit > > > > one > > > > that doesn't interfere with other process' sockets). > > > > > > > > Can it just print a warning that telemetry will not be enabled and > > > > continue > > > > so it's not returning an rte_eal_init failure? > > > > > > > > > > For a backported fix, yes, that would probably be better behaviour, but > > > for > > > the latest branch, I think returning error and having the user explicitly > > > choose the resolution they want to occur is best. I'll see about doing a > > > separate backport patch for 20.11. 
> > > > > > > But this is a runtime message dependent on runtime environment. The user may > > not have access or know how to change eal parameters. > > True. But on the other hand, this problem only occurs with non-default EAL > parameters anyway, so someone must have configured this with the > --in-memory flag. > > > > > In the case where the application doesn't care about telemetry, they have > > gone from not having telemetry to rte_eal_init() failing, which probably has > > severe consequence. > > > > Yes, I agree, which I why I would suggest that for any backport of this > fix, the error be made non-fatal as you suggest. [Having looked into it, > having it as a non-fatal error is rather awkward, so it may be best just > left unfixed and the current behaviour documented as known-issue]. > > However, for any application being updated and rebuilt against 21.11, I > would have thought it reasonable to flag this as an error, as any such > application would require revalidation anyway. > > > I could maybe agree if telemetry was default disable and the application had > > set the --telemetry flag indicating that they want/need it. As it is, it > > feels like it's possibly a worse outcome for the user. > > > > Perhaps, but I believe the only case of there being an issue would be where: > 1) a user who cannot modify the EAL parameters > 2) runs an application which has been updated and rebuilt against 21.11 > 3) where that application is hard-coded to use in-memory mode and > 4) has never been verified with two or more instances of that running? > Or am I missing something here? > Let me also go back to the drawing board on the solution here a bit, and see if I can come up with something better. If I can find a reasonable way to make it so that we can always create a socket in in-memory mode, despite other processes running, it would sidestep this problem completely. Not sure if it's possible, but let me see if I can come up with some ideas. 
[One idea I did try is using abstract sockets on linux, but with those we lose out on the permissions/protection we get from having a filesystem path, so were a no-go for me because of that] /Bruce
[dpdk-dev] [dpdk-dev v1] test/crypto: maxlen calculation update
Update the calculation of the max length needed when converting mbuf to data vec in partial digest test case. This update makes sure that enough vec buffers are allocated for the appended digest in SGL op for the QAT raw datapath API. Fixes: 4868f6591c6f ("test/crypto: add cases for raw datapath API") Cc: roy.fan.zh...@intel.com Signed-off-by: Kai Ji --- app/test/test_cryptodev.c | 35 +++ 1 file changed, 31 insertions(+), 4 deletions(-) diff --git a/app/test/test_cryptodev.c b/app/test/test_cryptodev.c index 16d770a17f..ea46911648 100644 --- a/app/test/test_cryptodev.c +++ b/app/test/test_cryptodev.c @@ -167,6 +167,10 @@ post_process_raw_dp_op(void *user_data,uint32_t index __rte_unused, RTE_CRYPTO_OP_STATUS_ERROR; } +static struct crypto_testsuite_params testsuite_params = { NULL }; +struct crypto_testsuite_params *p_testsuite_params = &testsuite_params; +static struct crypto_unittest_params unittest_params; + void process_sym_raw_dp_op(uint8_t dev_id, uint16_t qp_id, struct rte_crypto_op *op, uint8_t is_cipher, uint8_t is_auth, @@ -181,6 +185,7 @@ process_sym_raw_dp_op(uint8_t dev_id, uint16_t qp_id, struct rte_crypto_sgl sgl; uint32_t max_len; union rte_cryptodev_session_ctx sess; + uint64_t auth_end_iova; uint32_t count = 0; struct rte_crypto_raw_dp_ctx *ctx; uint32_t cipher_offset = 0, cipher_len = 0, auth_offset = 0, @@ -190,6 +195,9 @@ process_sym_raw_dp_op(uint8_t dev_id, uint16_t qp_id, int ctx_service_size; int32_t status = 0; int enqueue_status, dequeue_status; + struct crypto_unittest_params *ut_params = &unittest_params; + /* oop is not supported in raw hw dp api*/ + int is_sgl = sop->m_src->nb_segs > 1; ctx_service_size = rte_cryptodev_get_raw_dp_ctx_size(dev_id); if (ctx_service_size < 0) { @@ -255,6 +263,29 @@ process_sym_raw_dp_op(uint8_t dev_id, uint16_t qp_id, digest.va = (void *)sop->auth.digest.data; digest.iova = sop->auth.digest.phys_addr; + if (is_sgl) { + uint32_t remaining_off = auth_offset + auth_len; + struct rte_mbuf *sgl_buf = sop->m_src; + 
+ while (remaining_off >= rte_pktmbuf_data_len(sgl_buf) + && sgl_buf->next != NULL) { + remaining_off -= rte_pktmbuf_data_len(sgl_buf); + sgl_buf = sgl_buf->next; + } + + auth_end_iova = (uint64_t)rte_pktmbuf_iova_offset( + sgl_buf, remaining_off); + } else { + /* oop is not supported in raw hw dp api */ + auth_end_iova = rte_pktmbuf_iova(op->sym->m_src) + +auth_offset + auth_len; + } + /* Then check if digest-encrypted conditions are met */ + if ((auth_offset + auth_len < cipher_offset + cipher_len) && + (digest.iova == auth_end_iova) && is_sgl) + max_len = RTE_MAX(max_len, auth_offset + auth_len + + ut_params->auth_xform.auth.digest_length); + } else if (is_cipher) { cipher_offset = sop->cipher.data.offset; cipher_len = sop->cipher.data.length; @@ -477,10 +508,6 @@ process_crypto_request(uint8_t dev_id, struct rte_crypto_op *op) return op; } -static struct crypto_testsuite_params testsuite_params = { NULL }; -struct crypto_testsuite_params *p_testsuite_params = &testsuite_params; -static struct crypto_unittest_params unittest_params; - static int testsuite_setup(void) { -- 2.17.1
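The segment walk added by the patch — skipping whole mbuf segments until the remaining offset falls inside one, then taking the IOVA at that residual offset — can be modelled with stand-in types (not the real rte_mbuf):

```c
#include <stdint.h>
#include <stddef.h>

/* Stand-in for a chained rte_mbuf segment. */
struct toy_seg {
	uint64_t iova;		/* bus address of this segment's data */
	uint32_t data_len;	/* bytes of packet data in this segment */
	struct toy_seg *next;	/* next segment in the chain, or NULL */
};

/* Same loop shape as the patch: walk past whole segments while the
 * offset still lies beyond the current one, then add the residue.
 * Mirrors rte_pktmbuf_iova_offset() on the located segment. */
static uint64_t
toy_iova_at_offset(const struct toy_seg *m, uint32_t off)
{
	while (off >= m->data_len && m->next != NULL) {
		off -= m->data_len;
		m = m->next;
	}
	return m->iova + off;
}
```

This is the address the test then compares against the digest IOVA: if the digest sits exactly at the end of the authenticated region in an SGL op, the digest-encrypted path applies and `max_len` must grow by the digest length.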
Re: [dpdk-dev] [PATCH v1 1/3] common/cnxk: set key length setting for PDCP algos
> Subject: [PATCH v1 1/3] common/cnxk: set key length setting for PDCP algos > > Set proper bits in the context based on key length for PDCP > > Signed-off-by: Vidya Sagar Velumuri > Series Acked-by: Akhil Goyal Applied to dpdk-next-crypto Thanks.
Re: [dpdk-dev] [PATCH v4 10/18] net/i40e/base: fix headers to match functions
On 9/6/2021 3:02 AM, Robin Zhang wrote: > Fix several kernel-doc warnings when building with W=1. These changes > are only to comments. > Same here, why shared code drop has Linux kernel specific updates? > Fixes: 8db9e2a1b232 ("i40e: base driver") > Fixes: 842ea1996335 ("i40e/base: save link module type") > Fixes: fd72a2284a89 ("i40e/base: support LED blinking with new PHY") > Fixes: 788fc17b2dec ("i40e/base: support proxy config for X722") > Cc: sta...@dpdk.org > > Signed-off-by: Jesse Brandeburg > Signed-off-by: Robin Zhang
Re: [dpdk-dev] [PATCH v2] telemetry: fix "in-memory" process socket conflicts
On 29/09/2021 16:31, Bruce Richardson wrote:
> On Wed, Sep 29, 2021 at 04:24:06PM +0100, Bruce Richardson wrote:
> > On Wed, Sep 29, 2021 at 03:54:48PM +0100, Kevin Traynor wrote:
> > > On 29/09/2021 14:32, Bruce Richardson wrote:
> > > > On Wed, Sep 29, 2021 at 01:28:53PM +0100, Kevin Traynor wrote:
> > > > > Hi Bruce,
> > > > >
> > > > > On 24/09/2021 17:18, Bruce Richardson wrote:
> > > > > > When DPDK is run with --in-memory mode, multiple processes
> > > > > > can run simultaneously using the same runtime dir. This
> > > > > > leads to each process removing another process' telemetry
> > > > > > socket as it started up, giving unexpected behaviour.
> > > > > >
> > > > > > This patch changes that behaviour to first check if the
> > > > > > existing socket is active. If not, it's an old socket to be
> > > > > > cleaned up and can be removed. If it is active, telemetry
> > > > > > initialization fails and an error message is printed out
> > > > > > giving instructions on how to remove the error; either by
> > > > > > using file-prefix to have a different runtime dir (and
> > > > > > therefore socket path) or by disabling telemetry if it not
> > > > > > needed.
> > > > >
> > > > > telemetry is enabled by default but it may not be used by the
> > > > > application. Hitting this issue will cause rte_eal_init() to
> > > > > fail which will probably stop or severely limit the
> > > > > application.
> > > > >
> > > > > So it could change a working application to a non-working one
> > > > > (albeit one that doesn't interfere with other process'
> > > > > sockets).
> > > > >
> > > > > Can it just print a warning that telemetry will not be
> > > > > enabled and continue so it's not returning an rte_eal_init
> > > > > failure?
> > > >
> > > > For a backported fix, yes, that would probably be better
> > > > behaviour, but for the latest branch, I think returning error
> > > > and having the user explicitly choose the resolution they want
> > > > to occur is best. I'll see about doing a separate backport
> > > > patch for 20.11.
> > >
> > > But this is a runtime message dependent on runtime environment.
> > > The user may not have access or know how to change eal
> > > parameters.
> >
> > True. But on the other hand, this problem only occurs with
> > non-default EAL parameters anyway, so someone must have configured
> > this with the --in-memory flag.
> >
> > > In the case where the application doesn't care about telemetry,
> > > they have gone from not having telemetry to rte_eal_init()
> > > failing, which probably has severe consequence.
> >
> > Yes, I agree, which I why I would suggest that for any backport of
> > this fix, the error be made non-fatal as you suggest. [Having
> > looked into it, having it as a non-fatal error is rather awkward,
> > so it may be best just left unfixed and the current behaviour
> > documented as known-issue].
> >
> > However, for any application being updated and rebuilt against
> > 21.11, I would have thought it reasonable to flag this as an error,
> > as any such application would require revalidation anyway.
> >
> > > I could maybe agree if telemetry was default disable and the
> > > application had set the --telemetry flag indicating that they
> > > want/need it. As it is, it feels like it's possibly a worse
> > > outcome for the user.
> >
> > Perhaps, but I believe the only case of there being an issue would
> > be where:
> > 1) a user who cannot modify the EAL parameters
> > 2) runs an application which has been updated and rebuilt against
> >    21.11
> > 3) where that application is hard-coded to use in-memory mode and
> > 4) has never been verified with two or more instances of that
> >    running?

That's a reasonable point that if it has in-memory hardcoded you might
expect it to be tested with two or more, and if it's not hardcoded, it
is added by the user so they are able to set eal params.

I still see an extra step for the user but I agree if they can set eal
params then it is a lot less impactful. For OVS, a user could update
the dpdk-extra ovsdb entry for the additional eal flags.

> > Or am I missing something here?
>
> Let me also go back to the drawing board on the solution here a bit,
> and see if I can come up with something better. If I can find a
> reasonable way to make it so that we can always create a socket in
> in-memory mode, despite other processes running, it would sidestep
> this problem completely. Not sure if it's possible, but let me see if
> I can come up with some ideas.
>
> [One idea I did try is using abstract sockets on linux, but with
> those we lose out on the permissions/protection we get from having a
> filesystem path, so were a no-go for me because of that]

ok, thanks Bruce. I think you got the concerns anyway. I suppose a
part of it goes back to: telemetry is default, but does that imply
that it is required and dpdk should error out if it is not available
or not.

Kevin.

> /Bruce
Re: [dpdk-dev] [PATCH 00/19] MLX5 FreeBSD support
29/09/2021 17:56, Srikanth K: > Hi Thomas, > > PFA the test report. It covers all the features that were tested across > various platforms. > > The features mentioned in the test report depend on a modified FreeBSD > OFED. There is an ongoing review of these changes by the FreeBSD community. > https://reviews.freebsd.org/p/vag.singh_oneconvergence.com/ OK, this dependency is very important to notify in this patchset. For next version, please add it in the cover letter. The supported features (MTU, MAC, VLAN, RSS, fragments, jumbo, stats and trust mode) should be listed in the cover letter as well as in the mlx5 documentation. Please follow what was done for Windows. About the patches organization, please do not fix or remove something which was added in a previous patch. You need to have logical steps done in each patch without going backward. Please start with enabling compilation in first patch, so each step can be tested. Thanks
Re: [dpdk-dev] [PATCH v4 01/18] net/i40e/base: add new versions of send ASQ command functions
On 9/6/2021 3:02 AM, Robin Zhang wrote: > ASQ send command functions are returning only i40e status codes > yet some calling functions also need Admin Queue status > that is stored in hw->aq.asq_last_status. Since hw object > is stored on a heap it introduces a possibility for > a race condition in access to hw if calling function is not > fast enough to read hw->aq.asq_last_status before next > send ASQ command is executed. > > Added new versions of send ASQ command functions that return > Admin Queue status on the stack to avoid race conditions > in access to hw->aq.asq_last_status. > Added new _v2 version of i40e_aq_remove_macvlan and i40e_aq_add_macvlan > that is using new _v2 versions of ASQ send command functions and > returns the Admin Queue status on the stack. > > Signed-off-by: Sylwester Dziedziuch > Signed-off-by: Robin Zhang I assume ASQ is "Admin Send Queue" (although datasheet refers to it as ATQ), can you please give the long version of the abbreviations in the commit log in first usage as: "ASQ (Admin Send Queue) ... "
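The race the _v2 functions close can be shown with toy stand-ins for the shared admin-queue state. All names here are illustrative, not the real i40e base-code API:

```c
#include <stdint.h>

/* Stand-in for the heap-resident hw->aq object shared by callers. */
struct toy_adminq {
	int asq_last_status;	/* shared Admin Queue status field */
};

/* Old style: the caller must read aq->asq_last_status afterwards,
 * racing against any other command issued in between. */
static int
toy_send_cmd(struct toy_adminq *aq, int cmd)
{
	aq->asq_last_status = cmd;	/* pretend HW reported `cmd` */
	return 0;
}

/* _v2 style: the status is also copied out through a pointer to the
 * caller's stack, so later commands cannot clobber it. */
static int
toy_send_cmd_v2(struct toy_adminq *aq, int cmd, int *aq_status)
{
	aq->asq_last_status = cmd;
	*aq_status = cmd;		/* caller's copy survives */
	return 0;
}
```

The fix does not remove the shared field; it just gives callers that need the Admin Queue status a copy that is immune to a second command being sent before they read it.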
Re: [dpdk-dev] [PATCH v4 00/18] i40e base code update
On 9/6/2021 3:02 AM, Robin Zhang wrote: > update i40e base code. > > source code of i40e driver: > cid-i40e.2021.08.16.tar.gz > > changelog in i40e share repo: > From 59a080f4fafe ("i40e-shared: Add opcode 0x0406 and 0x0416 to Linux > support") To 2c7aab559654 ("i40e-shared: Add defines related to DDP") > > The following commits are ignored: > cb9139e3bce8 ("i40e-shared: Fix not blinking X722 with x557 PHY via > ‘ethtool -p'") > c09d4f9cf390 ("i40e-shared: i40e-shared: Fix build warning -Wformat > related to integer size") > ff8a1abc6c17 ("i40e-shared: Fix build warning with __packed") > 59a080f4fafe ("i40e-shared: Add opcode 0x0406 and 0x0416 to Linux > support") > > v4: > - update base code to cid-i40e.2021.08.16 > v3: > - there has a fix patch contains two issues, split it into two patches > v2: > - refine commit messages and macro name > > Robin Zhang (18): > net/i40e/base: add new versions of send ASQ command functions > net/i40e/base: add support for Min Rollback Revision for 4 more X722 > modules > net/i40e/base: set TSA table values when parsing CEE configuration > net/i40e/base: define new Shadow RAM pointers > net/i40e/base: fix PHY type identifiers for 2.5G and 5G adapters > net/i40e/base: fix PF reset failed > net/i40e/base: fix update link data for X722 > net/i40e/base: fix AOC media type reported by ethtool > net/i40e/base: add flags and fields for double vlan processing > net/i40e/base: fix headers to match functions > net/i40e/base: fix potentially uninitialized variables in NVM code > net/i40e/base: fix checksum is used before return value is checked > net/i40e/base: add defs for MAC frequency calculation if no link > net/i40e/base: separate kernel allocated rx_bi rings from AF_XDP rings > net/i40e/base: Update FVL FW API version to 1.15 > net/i40e/base: add defines related to DDP > net/i40e/base: update version in readme > net/i40e: fix redefinition warning > There are various './devtools/check-git-log.sh' warnings, can you please check them. 
Especially, there are many 'Headline too long' warnings.
Re: [dpdk-dev] [dpdk-stable] [PATCH v4 05/18] net/i40e/base: fix PHY type identifiers for 2.5G and 5G adapters
On 9/6/2021 3:02 AM, Robin Zhang wrote: > Unlike other supported adapters, 2.5G and 5G use different > PHY type identifiers for reading/writing PHY settings > and for reading link status. This commit intruduces s/intruduces/introduces/ > separate PHY identifiers for these two operation types. > > Fixes: 988ed63c7441 ("net/i40e/base: add support for Carlsville device") > Cc: sta...@dpdk.org > > Signed-off-by: Dawid Lukwinski > Signed-off-by: Robin Zhang
Re: [dpdk-dev] [PATCH v4 02/18] net/i40e/base: add support for Min Rollback Revision for 4 more X722 modules
On 9/6/2021 3:02 AM, Robin Zhang wrote:
> This change increments X722 API version and adds new constants related to
> the extended implementation of Security Version Opt-In.

There are new 'I40E_AQ_RREV_MODULE_PHY_*' macros; what is their relation
to "Security Version Opt-In"?

Also, the title mentions "Min Rollback Revision for 4 more X722 modules".
Are the referred four modules the 'I40E_AQ_RREV_MODULE_PHY_*' ones, and
are the added macros for "Security Version Opt-In" or for "Min Rollback
Revision"?

And what does "Min Rollback Revision" mean? And why are the first letters
of the words in upper case?

Overall, can you please add a little more clarification to the commit log?

> Signed-off-by: Stanislaw Grzeszczak
> Signed-off-by: Robin Zhang
> ---
>  drivers/net/i40e/base/i40e_adminq_cmd.h | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/i40e/base/i40e_adminq_cmd.h b/drivers/net/i40e/base/i40e_adminq_cmd.h
> index 2ca41db5d3..a96527f31c 100644
> --- a/drivers/net/i40e/base/i40e_adminq_cmd.h
> +++ b/drivers/net/i40e/base/i40e_adminq_cmd.h
> @@ -12,7 +12,7 @@
>   */
>
>  #define I40E_FW_API_VERSION_MAJOR	0x0001
> -#define I40E_FW_API_VERSION_MINOR_X722	0x000B
> +#define I40E_FW_API_VERSION_MINOR_X722	0x000C
>  #define I40E_FW_API_VERSION_MINOR_X710	0x000C
>
>  #define I40E_FW_MINOR_VERSION(_h) ((_h)->mac.type == I40E_MAC_XL710 ? \
> @@ -2425,11 +2425,15 @@ struct i40e_aqc_rollback_revision_update {
>  	u8 optin_mode; /* bool */
>  #define I40E_AQ_RREV_OPTION_MODE	0x01
>  	u8 module_selected;
> -#define I40E_AQ_RREV_MODULE_PCIE_ANALOG	0
> -#define I40E_AQ_RREV_MODULE_PHY_ANALOG	1
> -#define I40E_AQ_RREV_MODULE_OPTION_ROM	2
> -#define I40E_AQ_RREV_MODULE_EMP_IMAGE	3
> -#define I40E_AQ_RREV_MODULE_PE_IMAGE	4
> +#define I40E_AQ_RREV_MODULE_PCIE_ANALOG	0
> +#define I40E_AQ_RREV_MODULE_PHY_ANALOG	1
> +#define I40E_AQ_RREV_MODULE_OPTION_ROM	2
> +#define I40E_AQ_RREV_MODULE_EMP_IMAGE	3
> +#define I40E_AQ_RREV_MODULE_PE_IMAGE	4
> +#define I40E_AQ_RREV_MODULE_PHY_PLL_O_CONFIGURATION	5
> +#define I40E_AQ_RREV_MODULE_PHY_0_CONFIGURATION	6
> +#define I40E_AQ_RREV_MODULE_PHY_PLL_1_CONFIGURATION	7
> +#define I40E_AQ_RREV_MODULE_PHY_1_CONFIGURATION	8
>  	u8 reserved1[2];
>  	u32 min_rrev;
>  	u8 reserved2[8];
Re: [dpdk-dev] [PATCH v4 03/18] net/i40e/base: set TSA table values when parsing CEE configuration
On 9/6/2021 3:02 AM, Robin Zhang wrote:
> Driver did not Set TSA table values when parsing CEE configuration
> obtained from FW.
>
> Signed-off-by: Pawel Malinowski
> Signed-off-by: Robin Zhang

Can you please add the long versions of the 'TSA' & 'CEE' abbreviations
in the commit log?

There are more abbreviations in other commits and I won't comment on all
of them; please make the same update in all commits.
Re: [dpdk-dev] [PATCH v4 08/18] net/i40e/base: fix AOC media type reported by ethtool
On 9/6/2021 3:02 AM, Robin Zhang wrote:
> For Active Optical Cable (AOC) the correct media type is "Fibre",
> not "Direct Attach Copper".
>
> Fixes: d749d4d89969 ("i40e/base: add AOC PHY types")
> Fixes: aa153cc89ff0 ("net/i40e/base: add new PHY types for 25G AOC and ACC")
> Cc: sta...@dpdk.org
>
> Signed-off-by: Dawid Lukwinski
> Signed-off-by: Robin Zhang

I don't think the update in DPDK is related to ethtool; can you please
update the patch title accordingly?
Re: [dpdk-dev] [PATCH v4 04/18] net/i40e/base: define new Shadow RAM pointers
On 9/6/2021 3:02 AM, Robin Zhang wrote:
> Add definitions for Shadow RAM pointers: 6th FPA module, 5th FPA module
> in X722 and Preservation Rules Module.
>
> Signed-off-by: Stanislaw Grzeszczak
> Signed-off-by: Robin Zhang

Can you please update the patch title to use all lowercase except
abbreviations, i.e. s/Shadow RAM/shadow RAM/?

Also, can you add a brief explanation in the commit log of what a shadow
RAM pointer is and what adding new ones means?
Re: [dpdk-dev] [PATCH v4 14/18] net/i40e/base: separate kernel allocated rx_bi rings from AF_XDP rings
On 9/6/2021 3:02 AM, Robin Zhang wrote:
> Continuing the path to support MEM_TYPE_XSK_BUFF_POOL, the AF_XDP
> zero-copy/sk_buff rx_bi rings are now separate. Functions to properly
> allocate the different rings are added as well.
>
> Signed-off-by: Björn Töpel
> Signed-off-by: Robin Zhang

Again, the AF_XDP concern is very specific to the Linux kernel; should we
get this commit in the shared code drop for DPDK?
Re: [dpdk-dev] [dpdk-stable] [PATCH v4 06/18] net/i40e/base: fix PF reset failed
On 9/6/2021 3:02 AM, Robin Zhang wrote:
> PF has to delete all the filters during reset.
> If it is fully loaded with filters then it is possible
> that it will take more than 200 ms to finish the reset
> resulting in timeout during pf_reset and
> PF reset failed, -15 error indication.
> Increasing the timeout value for PF reset from 200 to 1000
> to give PF more time to finish reset if it is loaded with filters.
>
> Fixes: 1e32378f0774 ("i40e/base: increase PF reset max loop limit")
> Cc: sta...@dpdk.org
>
> Signed-off-by: Sylwester Dziedziuch
> Signed-off-by: Robin Zhang

What do you think about updating the patch title as:
net/i40e/base: fix PF reset
Re: [dpdk-dev] [PATCH v4 09/18] net/i40e/base: add flags and fields for double vlan processing
On 9/6/2021 3:02 AM, Robin Zhang wrote:
> Add flags for outer vlan and include set port parameters in Linux
> compilation.

Isn't this a shared code drop? Why does it have changes specific to
Linux compilation?

> Add flags, which describe port and switch state for both double vlan
> functionality and outer vlan processing.
>
> Signed-off-by: Przemyslaw Patynowski
> Signed-off-by: Robin Zhang