[dpdk-dev] mbuf: how to set data to NULL?
Hi, DPDK 1.8.0 removes the data pointer from the mbuf structure, such that the start of the data in the segment buffer must be calculated (i.e. buf_addr + data_off = 'data'). Given this, what is the best approach to set the mbuf data to NULL (previously mbuf.data = NULL)? As I see it, given an initialized mbuf, such that buf_addr is non-null, and data_off =RTE_PKTMBUF_HEADROOM, is it fair to say that the best solution is to memset to 0 from location (buf_addr + data_off) for a length of (data_len - data_off)? Thanks in advance, Mark
[dpdk-dev] Performance - linking against DPDK shared vs static libraries
Hi, I build a switching application, which links against DPDK shared libraries; when I run the application, I see throughput of X. I then build the application again, except this time I link against DPDK shared libraries, having modified the application's build parameters appropriately. In this case, I see a performance drop of around .04%, which given the high throughput of the application is significant. Is such performance degradation to be expected when using DPDK shared libraries, and if not, are there any best-known methods for preventing performance degradation, assuming that I may be constrained to using shared libraries going forward? Thanks, Mark -- Intel Shannon Limited Registered in Ireland Registered Office: Collinstown Industrial Park, Leixlip, County Kildare Registered Number: 308263 Business address: Dromore House, East Park, Shannon, Co. Clare This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies.
[dpdk-dev] Performance - linking against DPDK shared vs static libraries
-Original Message- From: Antti Kantee [mailto:po...@fixup.fi] Sent: Wednesday, July 23, 2014 5:37 PM To: Kavanagh, Mark B; dev at dpdk.org Subject: Re: [dpdk-dev] Performance - linking against DPDK shared vs static libraries On 23/07/14 15:58, Kavanagh, Mark B wrote: >> Hi, >> >> I build a switching application, which links against DPDK shared libraries; >> when I run the application, I see throughput of X. I then build the >> application again, except this time I link against DPDK shared libraries, >> having modified the application's build parameters appropriately. In this >> case, I see a performance drop of around .04%, which given the high >> throughput of the application is significant. >> >> Is such performance degradation to be expected when using DPDK shared >> libraries, and if not, are there any best-known methods for preventing >> performance degradation, assuming that I may be constrained to using shared >> libraries going forward? >Do you mean .04% or 4%? I would be more inclined to believe the latter. >Shared libraries are inherently slower due to indirection from PIC, and being >required to use them seems like a silly constraint in the context of >high-performance computing. > - antti It's actually the former, and I agree with your assertion regarding the constraint! I take it from your response then that the performance drop when using shared libraries is expected behavior? -- Intel Shannon Limited Registered in Ireland Registered Office: Collinstown Industrial Park, Leixlip, County Kildare Registered Number: 308263 Business address: Dromore House, East Park, Shannon, Co. Clare This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies.
[dpdk-dev] Fortville threshold register values
Hi, The l3fwd sample app contains the default values for the Intel(R) 82599 10 GbE controller's RX and TX Prefetch, Host, and Write-back registers; said values are used in the application to configure physical ports for optimal performance. What values could be used to configure the physical ports of a Fortville controller in a DPDK application for optimal performance? Many thanks, Mark -- Intel Shannon Limited Registered in Ireland Registered Office: Collinstown Industrial Park, Leixlip, County Kildare Registered Number: 308263 Business address: Dromore House, East Park, Shannon, Co. Clare This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies.
[dpdk-dev] Performance - linking against DPDK shared vs static libraries
Many thanks to all for your help. -Original Message- From: Matthew Hall [mailto:mh...@mhcomputing.net] Sent: Wednesday, July 23, 2014 10:56 PM To: Kavanagh, Mark B Cc: Antti Kantee; dev at dpdk.org Subject: Re: [dpdk-dev] Performance - linking against DPDK shared vs static libraries On Wed, Jul 23, 2014 at 09:43:49PM +, Kavanagh, Mark B wrote: > I take it ... that the performance drop when using shared libraries is > expected behavior? s/expected behavior/and unavoidable consequence/g ;) Matthew Hall. -- Intel Shannon Limited Registered in Ireland Registered Office: Collinstown Industrial Park, Leixlip, County Kildare Registered Number: 308263 Business address: Dromore House, East Park, Shannon, Co. Clare This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies.
[dpdk-dev] Minimum Supported x86 microarchitecture
Hi, The recent reimplementation of rte_memcpy in DPDK v2.0.0 seems to have a placed an implicit floor on the microarchitecture/Instruction set supported by DPDK. For example, I can't compile head of OVS against DPDK 2.0 with gcc without passing the 'msse3' flag; this points to an implicit minimum supported CPU of 'core2'. More discussion on same is available here: http://openvswitch.org/pipermail/dev/2015-April/053523.html Can anyone confirm or deny this, and is/should it be documented? Thanks in advance, Mark
[dpdk-dev] Minimum Supported x86 microarchitecture
>-Original Message- >From: Richardson, Bruce >Sent: Wednesday, April 15, 2015 5:05 PM >To: Kavanagh, Mark B >Cc: dev at dpdk.org >Subject: Re: [dpdk-dev] Minimum Supported x86 microarchitecture > >On Wed, Apr 15, 2015 at 03:09:39PM +, Kavanagh, Mark B wrote: >> Hi, >> >> The recent reimplementation of rte_memcpy in DPDK v2.0.0 seems to have a >> placed an >implicit floor on the microarchitecture/Instruction set supported by DPDK. >> >> For example, I can't compile head of OVS against DPDK 2.0 with gcc without >> passing the >'msse3' flag; this points to an implicit minimum supported CPU of 'core2'. >More >discussion on same is available here: >http://openvswitch.org/pipermail/dev/2015- >April/053523.html >> >> Can anyone confirm or deny this, and is/should it be documented? >> >> Thanks in advance, >> Mark > >SSE3 is the minimum necessary. However, I believe all x86_64 cpus have at least >SSE3 support, so this should only be a problem with 32-bit builds. Is this the >case for you? > >/Bruce Hey Bruce, No, I'm compiling on an IVB system.
Re: [dpdk-dev] [dpdk-announce] DPDK 17.05.1 released
>From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Yuanhan Liu >Sent: Friday, June 30, 2017 12:24 PM >To: annou...@dpdk.org >Subject: [dpdk-dev] [dpdk-announce] DPDK 17.05.1 released > >Hi all, > >Here is a new stable release: > http://fast.dpdk.org/rel/dpdk-17.05.1.tar.xz > >The git tree is at: > http://dpdk.org/browse/dpdk-stable/ > >It's an urgent release, mainly for fixing an important vhost-user bug. >Which is needed for OVS to integrate DPDK v17.05 for OVS v2.8 release. > >Thanks. > > --yliu Hi Yuanhan As you are already aware, this stable release will enable successful integration of DPDK 17.05 with OvS v2.8.0. I'd like to take this opportunity to thank you, John McNamara, and indeed all who were involved in making this early release possible - your efforts are very much appreciated. Thanks and best regards, Mark > >--- > app/test-pmd/config.c | 2 ++ > doc/guides/rel_notes/release_17_05.rst| 46 >+++ > drivers/net/af_packet/rte_eth_af_packet.c | 9 -- > drivers/net/ark/ark_ethdev.c | 18 +++ > drivers/net/ark/ark_pktchkr.c | 2 +- > drivers/net/ark/ark_pktgen.c | 2 +- > drivers/net/bnx2x/bnx2x_ethdev.c | 4 +-- > drivers/net/bnxt/bnxt_ethdev.c| 2 +- > drivers/net/bnxt/bnxt_hwrm.c | 4 +-- > drivers/net/cxgbe/base/t4_hw.c| 20 > drivers/net/cxgbe/base/t4_regs.h | 18 +++ > drivers/net/cxgbe/cxgbe.h | 3 +- > drivers/net/cxgbe/cxgbe_ethdev.c | 8 +++-- > drivers/net/cxgbe/cxgbe_main.c| 33 +++ > drivers/net/e1000/em_ethdev.c | 2 +- > drivers/net/e1000/igb_ethdev.c| 18 +++ > drivers/net/e1000/igb_rxtx.c | 6 ++-- > drivers/net/ena/ena_ethdev.c | 2 +- > drivers/net/enic/base/vnic_dev.c | 4 +-- > drivers/net/enic/enic_ethdev.c| 2 +- > drivers/net/fm10k/fm10k_ethdev.c | 2 +- > drivers/net/i40e/base/i40e_register.h | 2 +- > drivers/net/i40e/i40e_ethdev.c| 30 +++-- > drivers/net/i40e/i40e_ethdev.h| 5 +++ > drivers/net/i40e/i40e_ethdev_vf.c | 2 +- > drivers/net/ixgbe/ixgbe_ethdev.c | 4 +-- > drivers/net/ixgbe/ixgbe_flow.c| 4 +++ > drivers/net/liquidio/lio_ethdev.c | 23 -- > drivers/net/mlx5/mlx5.c | 1 + > drivers/net/mlx5/mlx5.h | 2 +- > drivers/net/mlx5/mlx5_fdir.c | 7 ++-- > drivers/net/mlx5/mlx5_flow.c | 27 +++- > drivers/net/mlx5/mlx5_rxq.c | 14 +--- > drivers/net/mlx5/mlx5_txq.c | 16 ++ > drivers/net/nfp/nfp_net.c | 2 +- > drivers/net/qede/qede_ethdev.c| 4 +-- > drivers/net/qede/qede_rxtx.c | 4 +-- > drivers/net/qede/qede_rxtx.h | 3 +- > drivers/net/ring/rte_eth_ring.c | 2 +- > drivers/net/sfc/base/ef10_ev.c| 7 +++- > drivers/net/sfc/base/ef10_rx.c| 18 +-- > drivers/net/sfc/base/ef10_tx.c| 18 +-- > drivers/net/sfc/sfc_ethdev.c | 2 +- > drivers/net/sfc/sfc_tx.c | 2 +- > drivers/net/sfc/sfc_tx.h | 2 ++ > drivers/net/tap/tap_flow.c| 7 > drivers/net/thunderx/nicvf_ethdev.c | 2 +- > drivers/net/virtio/virtio_ethdev.c| 4 +-- > drivers/net/vmxnet3/vmxnet3_ethdev.c | 2 +- > examples/vhost/virtio_net.c | 5 +-- > lib/librte_eal/common/include/rte_version.h | 2 +- > lib/librte_eal/linuxapp/eal/eal_vfio.c| 6 ++-- > lib/librte_eal/linuxapp/kni/compat.h | 4 ++- > lib/librte_eal/linuxapp/kni/ethtool/igb/igb.h | 2 +- > lib/librte_lpm/rte_lpm.c | 4 +-- > lib/librte_vhost/vhost.c | 2 +- > lib/librte_vhost/vhost_user.c | 8 +++-- > pkg/dpdk.spec | 2 +- > 58 files changed, 322 insertions(+), 136 deletions(-) > >--- >Ajit Khaparde (1): > net/bnxt: fix reporting of link status > >Alejandro Lucero (1): > vfio: fix array bounds check > >Andrew Rybchenko (1): > net/sfc: add Tx queue flush failed flag for sanity > >Andy Moreton (2): > net/sfc/base: fix error code usage in common code > net/sfc/base: let caller know that queue is already flushed > >Beilei Xing (1): > app/testpmd: fix creating E-Tag and NVGRE flow rules > >Chas Williams (3): > net/af_packet: handle possible null pointer > net/af_packet: fix packet bytes counting > net/ring: fix adding MAC addresses > >Dariusz Stojaczyk (2): > vhost
Re: [dpdk-dev] [PATCH] net/i40e: fix data segment buffer length
>From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Qi Zhang >Sent: Wednesday, August 2, 2017 8:22 AM >To: Wu, Jingjing >Cc: dev@dpdk.org; jianfeng.t...@intel.com; Zhang, Qi Z ; >sta...@dpdk.org; Tan, Jianfeng >Subject: [dpdk-dev] [PATCH] net/i40e: fix data segment buffer length > >Buffer length be configured for each data segment should not exceed >the requested value, or device may fill data that exceed the boundary >of memory that be reserved. > >Fixes: 4861cde46116 ("i40e: new poll mode driver") >Cc: sta...@dpdk.org > >Signed-off-by: Jianfeng Tan >Signed-off-by: Qi Zhang Thanks guys, this resolved an issue that I'd encountered (as discussed off-list). Reviewed-by: Mark Kavanagh Tested-by: Mark Kavanagh >--- > drivers/net/i40e/i40e_rxtx.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > >diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c >index ae64de6..d42c23c 100644 >--- a/drivers/net/i40e/i40e_rxtx.c >+++ b/drivers/net/i40e/i40e_rxtx.c >@@ -2474,7 +2474,7 @@ i40e_rx_queue_config(struct i40e_rx_queue *rxq) > case I40E_FLAG_HEADER_SPLIT_DISABLED: > default: > rxq->rx_hdr_len = 0; >- rxq->rx_buf_len = RTE_ALIGN(buf_size, >+ rxq->rx_buf_len = RTE_ALIGN_FLOOR(buf_size, > (1 << I40E_RXQ_CTX_DBUFF_SHIFT)); > rxq->hs_mode = i40e_header_split_none; > break; >-- >2.9.4
Re: [dpdk-dev] GSO/GRO support
> >We, at Juniper Opencontrail have added software support for TCP send offload >and receive >offload to DPDK. > >If the community is interested, we can publish/upstream it. > >Pl let us know what you think of it. Hi Kiran, I'd be very interested in this, with a view to integrating your APIs into Open vSwitch as a software fallback option for situations in which a particular NIC doesn't support TSO/LRO. If/when you release the code I'd be happy to review and/or test it. Thanks in advance, Mark > >Thanks, > Kiran
[dpdk-dev] [dpdk-users] How to printout PMD logs to console
> >Hi All, > > >I'm doing app debug and would like to see device PMD logs, e.g. >dpdk/drivers/net/ixgbe/ixgbe_rxtx.c:1703 > PMD_RX_LOG(...) Hi Yingzhi, PMD_RX_LOG is defined in ixgbe_logs.h (see code snippet below). #ifdef RTE_LIBRTE_IXGBE_DEBUG_RX #define PMD_RX_LOG(level, fmt, args...) \ RTE_LOG(level, PMD, "%s(): " fmt "\n", __func__, ## args) #else #define PMD_RX_LOG(level, fmt, args...) do { } while(0) #endif As you can see, the logs are only printed if RTE_LIBRTE_IXGBE_DEBUG_RX is defined. Just recompile DPDK with 'CONFIG_RTE_LIBRTE_IXGBE_DEBUG_RX=y' to resolve your issue. Thanks, Mark > > >Currently I can only see RTE logs from console. > > >Any comments is appreciated >Thanks
[dpdk-dev] [PATCH v4]net/virtio: add mtu set in virtio
> >Hi Liu, > >Yes agreed your comment. I will definitely remove the declaration as it is not >really >required. > So the latest patch will look like this . Yes I did rush a bit to submit the > patch last will >correct my suite. So sending the patch in a reply if we have more comments we >can take a look >and they re-submit the final reviewed patch. Thanks for the help though. > >--- > drivers/net/virtio/virtio_ethdev.c | 12 > 1 file changed, 12 insertions(+) > >diff --git a/drivers/net/virtio/virtio_ethdev.c >b/drivers/net/virtio/virtio_ethdev.c >index 07d6449..da16ad4 100644 >--- a/drivers/net/virtio/virtio_ethdev.c >+++ b/drivers/net/virtio/virtio_ethdev.c > >+static int >+virtio_mtu_set(struct rte_eth_dev *dev, uint16_t mtu) >+{ >+ struct virtio_hw *hw = dev->data->dev_private; >+ if (mtu < VIRTIO_MIN_RX_BUFSIZE || mtu > VIRTIO_MAX_RX_PKTLEN) { >+ PMD_INIT_LOG(ERR, "Mtu should be between 64 and 9728\n"); Hi Souvik, Why hardcode the values in the PMD_INIT_LOG? Why not do the following: PMD_INIT_LOG(ERR, "MTU should be between %d and %d", VIRTIO_MIN_RX_BUFSIZE, VIRTIO_MAX_RX_PKTLEN); That way, if the values that the macros evaluate to change, the log will update correspondingly. Also, does 'MTU' in this context relate to the L2 or L3 MTU? >+ return -EINVAL; >+ } >+ return 0; >+} >+ > /* > * dev_ops for virtio, bare necessities for basic operation > */ >@@ -664,6 +675,7 @@ static const struct eth_dev_ops virtio_eth_dev_ops = { > .promiscuous_disable = virtio_dev_promiscuous_disable, > .allmulticast_enable = virtio_dev_allmulticast_enable, > .allmulticast_disable= virtio_dev_allmulticast_disable, >+ .mtu_set = virtio_mtu_set, > > .dev_infos_get = virtio_dev_info_get, > .stats_get = virtio_dev_stats_get, >-- > >-- >Regards, >Souvik > >-Original Message- >From: Yuanhan Liu [mailto:yuanhan.liu at linux.intel.com] >Sent: Friday, September 9, 2016 3:00 AM >To: Dey, Souvik >Cc: dev at dpdk.org; stephen at networkplumber.org >Subject: Re: [PATCH v4]net/virtio: add mtu set in virtio > >On Wed, Sep 07, 2016 at 04:21:30AM +, Dey, Souvik wrote: >> Hi Liu, >> Submitted the version 4 of the patch as you suggested , > >Your patch is looking much better. But not really, you ignored few of my >comments. > >> and have removed the Reviewed by line. >> I have still kept the function definition as to follow the same suit as we >> have done for >other eth_dev_ops. > >That's because most of the method implementions are after the reference, thus >the declaration >is needed. > >For your case, I see no good reason to do that. BTW, if you disagree with my >comment, you >shoud made a reply, instead of rushing to sending a new version. > >> -- >> Regards, >> Souvik >> >> -Original Message- >> From: Dey, Souvik >> Sent: Wednesday, September 7, 2016 12:19 AM >> To: dev at dpdk.org; stephen at networkplumber.org; >> yuanhan.liu at linux.intel.com >> Cc: Dey, Souvik >> Subject: [PATCH v4]net/virtio: add mtu set in virtio >> >> >> Virtio interfaces should also support setting of mtu, as in case of cloud it >> is expected to >have the consistent mtu across the infrastructure that the dhcp server sends >and not >hardcoded to 1500(default). >> >> Changes in v4: Incorporated review comments. >> Changes in v3: Corrected few style errors as reported by sys-stv. >> Changes in v2: Incorporated review comments. > >DPDK prefers to put the change log to ... >> >> Signed-off-by: Souvik Dey >> >> --- > >... here. > >So that they will be showed in mailing list (for review), but they will be >gone after apply. >In another word, we don't like to see those change log in git history, but >we'd like to see >them while review. > >> drivers/net/virtio/virtio_ethdev.c | 12 >> 1 file changed, 12 insertions(+) >> >> diff --git a/drivers/net/virtio/virtio_ethdev.c >> b/drivers/net/virtio/virtio_ethdev.c >> index 07d6449..da16ad4 100644 >> --- a/drivers/net/virtio/virtio_ethdev.c >> +++ b/drivers/net/virtio/virtio_ethdev.c >> @@ -92,6 +92,7 @@ static void virtio_mac_addr_add(struct rte_eth_dev *dev, >> static void >virtio_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index); static void >virtio_mac_addr_set(struct rte_eth_dev *dev, >> struct ether_addr *mac_addr); >> +static int virtio_mtu_set(struct rte_eth_dev *dev, uint16_t mtu); >> >> static int virtio_dev_queue_stats_mapping_set( >> __rte_unused struct rte_eth_dev *eth_dev, @@ -652,6 +653,16 @@ >virtio_dev_allmulticast_disable(struct rte_eth_dev *dev) >> PMD_INIT_LOG(ERR, "Failed to disable allmulticast"); } >> >> +static int >> +virtio_mtu_set(struct rte_eth_dev *dev, uint16_t mtu) { >> +struct virtio_hw *hw = dev->data->dev_private;
[dpdk-dev] [PATCH v4]net/virtio: add mtu set in virtio
> >Hi Mark, > >Yes I thought I did that change. Sorry once again.. making too many mistakes. >Changed it . >Thanks. >The MTU here is L3 MTU. Setting this will help in reducing the >fragmented/multi-segmented >packets and also in case we want to reduce the MTU below 1500, to support >VXLAN or GRE tunnel >for the packets in openstack and cloud environments. > >--- > drivers/net/virtio/virtio_ethdev.c | 12 > 1 file changed, 12 insertions(+) > >diff --git a/drivers/net/virtio/virtio_ethdev.c >b/drivers/net/virtio/virtio_ethdev.c >index 07d6449..da16ad4 100644 >--- a/drivers/net/virtio/virtio_ethdev.c >+++ b/drivers/net/virtio/virtio_ethdev.c > >static int virtio_dev_queue_stats_mapping_set( > __rte_unused struct rte_eth_dev *eth_dev, >@@ -652,6 +653,16 @@ virtio_dev_allmulticast_disable(struct rte_eth_dev *dev) > PMD_INIT_LOG(ERR, "Failed to disable allmulticast"); > } > >+static int >+virtio_mtu_set(struct rte_eth_dev *dev, uint16_t mtu) >+{ >+ struct virtio_hw *hw = dev->data->dev_private; >+ if (mtu < VIRTIO_MIN_RX_BUFSIZE || mtu > VIRTIO_MAX_RX_PKTLEN) { If MTU is the L3 MTU as you've indicated, then you need to take account of L2 overhead before performing the comparison above. Say the user supplies 'mtu' as 9728 - the corresponding minimum frame size is L2_HDR_LEN + 9728 + L2_CRC_LEN = 9746 bytes, which is larger than the NIC can accommodate (note that 9728 is the largest L2 frame size allowed by the NIC, not the largest IP packet size). >+ PMD_INIT_LOG(ERR, "Mtu should be between VIRTIO_MIN_RX_BUFSIZE >and >VIRTIO_MAX_RX_PKTLEN \n"); Two things on this statement: 1) in the context of a log message, VIRTIO_XXX_RX_BUFSIZE most likely means very little, and as such is not helpful. I suggest going with the format that I included in my earlier mail, which prints the numeric value of the min and max rx defines 2) MTU is an initialization, and should be printed as such in a log (i.e. all caps) Hope this helps, Mark >+ return -EINVAL; >+ } >+ return 0; >+} >+ > /* > * dev_ops for virtio, bare necessities for basic operation > */ >@@ -664,6 +675,7 @@ static const struct eth_dev_ops virtio_eth_dev_ops = { > .promiscuous_disable = virtio_dev_promiscuous_disable, > .allmulticast_enable = virtio_dev_allmulticast_enable, > .allmulticast_disable= virtio_dev_allmulticast_disable, >+ .mtu_set = virtio_mtu_set, > > .dev_infos_get = virtio_dev_info_get, > .stats_get = virtio_dev_stats_get, >-- > >-- >Regards, >Souvik >-Original Message- >From: Kavanagh, Mark B [mailto:mark.b.kavanagh at intel.com] >Sent: Friday, September 9, 2016 11:05 AM >To: Dey, Souvik ; Yuanhan Liu linux.intel.com> >Cc: dev at dpdk.org; stephen at networkplumber.org >Subject: RE: [PATCH v4]net/virtio: add mtu set in virtio > >> >>Hi Liu, >> >>Yes agreed your comment. I will definitely remove the declaration as it >>is not really required. >> So the latest patch will look like this . Yes I did rush a bit to >>submit the patch last will correct my suite. So sending the patch in a >>reply if we have more comments we can take a look and they re-submit the >>final reviewed >patch. Thanks for the help though. >> >>--- >> drivers/net/virtio/virtio_ethdev.c | 12 >> 1 file changed, 12 insertions(+) >> >>diff --git a/drivers/net/virtio/virtio_ethdev.c >>b/drivers/net/virtio/virtio_ethdev.c >>index 07d6449..da16ad4 100644 >>--- a/drivers/net/virtio/virtio_ethdev.c >>+++ b/drivers/net/virtio/virtio_ethdev.c >> >>+static int >>+virtio_mtu_set(struct rte_eth_dev *dev, uint16_t mtu) { >>+ struct virtio_hw *hw = dev->data->dev_private; >>+ if (mtu < VIRTIO_MIN_RX_BUFSIZE || mtu > VIRTIO_MAX_RX_PKTLEN) { >>+ PMD_INIT_LOG(ERR, "Mtu should be between 64 and 9728\n"); > >Hi Souvik, > >Why hardcode the values in the PMD_INIT_LOG? > >Why not do the following: PMD_INIT_LOG(ERR, "MTU should be between %d and %d", >VIRTIO_MIN_RX_BUFSIZE, > VIRTIO_MAX_RX_PKTLEN); > >That way, if the values that the macros evaluate to change, the log will update >correspondingly. > >Also, does 'MTU' in this context relate to the L2 or L3 MTU? > >>+ return -EINVAL; >>+ } >>+ return 0; >>+} >>+ >> /* >> * dev_ops for virtio, b
[dpdk-dev] [PATCH v4]net/virtio: add mtu set in virtio
> >Hi Mark/Liu, >? Thanks to both of you for being so patient with a series of >silly errors. I >have tried to make it better this time. Also there were some un-used variable >in the function >which I have removed. Please check the new patch with all your comments >incorporated. Also >along with the L2_HDR_LEN and L2_CRC_LEN, I have taken in consideration the >VLAN_LEN too. > >One doubt though, >"9728 is the largest L2 frame size allowed by the NIC" -- this 9728 size is >some constrain >due to DPDK as the virtio driver in the kernel can support till mtu size of 68 >to 65535, >where as in DPDK pmd we are trying with 64 to 9728. Yes, I meant the NIC as controlled by a DPDK PMD (I thought this was implicit, given the context of this discussion). I'll be more explicit in future mails though. > >drivers/net/virtio/virtio_ethdev.c | 19 +++ >1 file changed, 19 insertions(+) > >diff --git a/drivers/net/virtio/virtio_ethdev.c >b/drivers/net/virtio/virtio_ethdev.c >index 07d6449..da16ad4 100644 >--- a/drivers/net/virtio/virtio_ethdev.c >+++ b/drivers/net/virtio/virtio_ethdev.c >@@ -653,12 +653,20 @@ virtio_dev_allmulticast_disable(struct rte_eth_dev *dev) >??? PMD_INIT_LOG(ERR, "Failed to disable allmulticast"); >} > >+#define VLAN_TAG_LEN?? 4??? /* 802.3ac tag (not DMA'd) */ >+ >+static int? virtio_mtu_set(struct rte_eth_dev *dev, uint16_t mtu) One thing on this function: it doesn't actually set any fields, but rather just sanitizes 'mtu'. Maybe this is acceptable, since the calling function (i.e. rte_eth_dev_set_mtu) sets rte_eth_dev->data->mtu? >+{ >+?? struct rte_eth_dev_info dev_info; >+?? uint32_t ether_hdr_len = ETHER_HDR_LEN + ETHER_CRC_LEN + VLAN_TAG_LEN; >+?? uint32_t frame_size = mtu + ether_hdr_len; >+ >+?? virtio_dev_info_get(dev, &dev_info); >+ >+?? if (mtu < dev_info.min_rx_bufsize || frame_size > >dev_info.max_rx_pktlen) { It's not clear to me whether 'mtu' in this case should be compared with ETHER_MIN_MTU, as per other DPDK drivers, or alternatively whether 'frame_size' should be compared with dev_info.min_rx_bufsize. Any thoughts? >+?? PMD_INIT_LOG(ERR, "MTU should be between %d and %d\n", >+??? ???dev_info.min_rx_bufsize, As above. >+?? (dev_info.max_rx_pktlen - ether_hdr_len)); >+?? return -EINVAL; >+?? } >+ ??return 0; >+} >@@ -677,7 +685,6 @@ static const struct eth_dev_ops virtio_eth_dev_ops = { >? ??.allmulticast_enable = virtio_dev_allmulticast_enable, >??? .allmulticast_disable??? = virtio_dev_allmulticast_disable, >+?? .mtu_set = virtio_mtu_set, >??? .dev_infos_get?? = virtio_dev_info_get, >??? .stats_get?? = virtio_dev_stats_get, >??? .xstats_get? = virtio_dev_xstats_get, > > >-- >Regards, >Souvik > >-Original Message- >From: Kavanagh, Mark B [mailto:mark.b.kavanagh at intel.com] >Sent: Friday, September 9, 2016 11:44 AM >To: Dey, Souvik ; Yuanhan Liu linux.intel.com> >Cc: dev at dpdk.org; stephen at networkplumber.org >Subject: RE: [PATCH v4]net/virtio: add mtu set in virtio > >> >>Hi Mark, >> >>Yes I thought I did that change. Sorry once again.. making too many mistakes. >>Changed it . >>Thanks. >>The MTU here is L3 MTU.? Setting this will help in reducing the >>fragmented/multi-segmented packets and also in case we want to reduce >>the MTU below 1500, to support VXLAN or GRE tunnel for the packets in >>openstack and cloud >environments. >> >>--- >> drivers/net/virtio/virtio_ethdev.c | 12 >> 1 file changed, 12 insertions(+) >> >>diff --git a/drivers/net/virtio/virtio_ethdev.c >>b/drivers/net/virtio/virtio_ethdev.c >>index 07d6449..da16ad4 100644 >>--- a/drivers/net/virtio/virtio_ethdev.c >>+++ b/drivers/net/virtio/virtio_ethdev.c >> >>static int virtio_dev_queue_stats_mapping_set( >> ?? __rte_unused struct rte_eth_dev *eth_dev, @@ -652,6 +653,16 @@ >>virtio_dev_allmulticast_disable(struct rte_eth_dev *dev) >> ? PMD_INIT_LOG(ERR, "Failed to disable >> allmulticast");? } >> >>+static int >>+virtio_mtu_set(struct rte_eth_dev *dev, uint16_t mtu) { >>+? struct virtio_hw *hw = dev->data->dev_private; >>+? if (mtu < VIRTIO_MIN_RX_BUFSIZE || mtu > VIRTIO_MAX_RX_PKTLEN) { > >If MTU is the L3 MTU as you've indicated, then you need to take account o
[dpdk-dev] [PATCH v4]net/virtio: add mtu set in virtio
> >> >>Hi Mark/Liu, >>? Thanks to both of you for being so patient with a series >>of silly errors. I have tried to make it better this time. Also there >>were some un-used variable in the function which I have removed. Please >>check the new patch with all your comments incorporated. Also along with the >>L2_HDR_LEN and >L2_CRC_LEN, I have taken in consideration the VLAN_LEN too. >> >>One doubt though, >>"9728 is the largest L2 frame size allowed by the NIC" -- this 9728 >>size is some constrain due to DPDK as the virtio driver in the kernel >>can support till mtu size of 68 to 65535, where as in DPDK pmd we are trying >>with 64 to >9728. > >Yes, I meant the NIC as controlled by a DPDK PMD (I thought this was implicit, >given the >context of this discussion). I'll be more explicit in future mails though. > >> >>drivers/net/virtio/virtio_ethdev.c | 19 +++ >>1 file changed, 19 insertions(+) >> >>diff --git a/drivers/net/virtio/virtio_ethdev.c >>b/drivers/net/virtio/virtio_ethdev.c >>index 07d6449..da16ad4 100644 >>--- a/drivers/net/virtio/virtio_ethdev.c >>+++ b/drivers/net/virtio/virtio_ethdev.c >>@@ -653,12 +653,20 @@ virtio_dev_allmulticast_disable(struct >>rte_eth_dev *dev) >>??? PMD_INIT_LOG(ERR, "Failed to disable allmulticast"); } >> >>+#define VLAN_TAG_LEN?? 4??? /* 802.3ac tag (not DMA'd) */ >>+ >>+static int? virtio_mtu_set(struct rte_eth_dev *dev, uint16_t mtu) > >One thing on this function: it doesn't actually set any fields, but rather >just sanitizes >'mtu'. >Maybe this is acceptable, since the calling function (i.e. >rte_eth_dev_set_mtu) sets >rte_eth_dev->data->mtu? >[Dey, Souvik] Yes , only the sanity check for the mtu is required here and >the setting of >the call back, else the rte_eth_dev_set_mtu() just returns without setting the >MTU in the >rte_eth_dev->data->mtu. Hi Souvik, apologies for the delayed response. That's what I thought, just wanted to verify that no additional structures should be updated here. > >>+{ >>+?? struct rte_eth_dev_info dev_info; >>+?? uint32_t ether_hdr_len = ETHER_HDR_LEN + ETHER_CRC_LEN + >>+VLAN_TAG_LEN; >>+?? uint32_t frame_size = mtu + ether_hdr_len; >>+ >>+?? virtio_dev_info_get(dev, &dev_info); >>+ >>+?? if (mtu < dev_info.min_rx_bufsize || frame_size > >>+dev_info.max_rx_pktlen) { > >It's not clear to me whether 'mtu' in this case should be compared with >ETHER_MIN_MTU, as per >other DPDK drivers, or alternatively whether 'frame_size' should be compared >with >dev_info.min_rx_bufsize. >Any thoughts? >[Dey, Souvik] I am not sure why virtio min_rx_bufsize is less than >ETHER_MIN_MTU, i think it >will be good to have the compare statement as >If(frame_size < ETHER_MIN_MTU || frame_size > dev_info.max_rx_pktlen) , then >error. What do >you suggest ? Again, this all depends on what 'mtu' means in this context. Since you mentioned previously that it relates to the packet (i.e. L3) length, and not the frame (i.e. L2) length, I would suggest that the comparison should be: if (mtu < ETHER_MIN_MTU || frame_size > dev_info.max_rx_pktlen) Yuanhan, any thoughts on this? > >>+?? PMD_INIT_LOG(ERR, "MTU should be between %d and %d\n", >>+??? ???dev_info.min_rx_bufsize, >As above. > >>+?? (dev_info.max_rx_pktlen - >>+ether_hdr_len)); >>+?? return -EINVAL; >>+?? } >>+ ??return 0; >>+} >>@@ -677,7 +685,6 @@ static const struct eth_dev_ops virtio_eth_dev_ops >>= { >>? ??.allmulticast_enable = virtio_dev_allmulticast_enable, >>??? .allmulticast_disable??? = virtio_dev_allmulticast_disable, >>+?? .mtu_set = virtio_mtu_set, >>??? .dev_infos_get?? = virtio_dev_info_get, >>??? .stats_get?? = virtio_dev_stats_get, >>??? .xstats_get? = virtio_dev_xstats_get, >> >> >>-- >>Regards, >>Souvik >> >>-Original Message- >>From: Kavanagh, Mark B [mailto:mark.b.kavanagh at intel.com] >>Sent: Friday, September 9, 2016 11:44 AM >>To: Dey, Souvik ; Yuanhan Liu >> >>Cc: dev at dpdk.org; stephen at networkplumber.org >>Subject: RE: [PATCH v4]net/virtio: add mtu set in virtio >> >>> >>>Hi Mark, >>> >>>Yes I thought I did that change. Sorry on
[dpdk-dev] [PATCH v5]net/virtio: add mtu set in virtio
> > Hi Souvik, There are some very basic errors in this patch, particularly with respect to format. Review comments are inline - please address same and resubmit the patch. I also recommend running $DPDK_DIR/utilities/checkpatch.py on any future submissions. Thanks, Mark > As a general comment, please capitalize 'MTU' throughout the commit message. >Virtio interfaces should also support setting of mtu, as in case of cloud 'also' is redundant in the above sentence - please remove it. >it is expected to have the consistent mtu across the infrastructure that >the dhcp server sends and not hardcoded to 1500(default). Try to make the problem statement here more general, before going into specifics. i.e. something along the lines of "Virtio interfaces do not currently allow the user to specify a particular Maximum Transmission Unit (MTU). Consequently, the MTU of Virtio interfaces is typically set to the Ethernet default value of 1500. This is problematic in the case of cloud deployments, in which a specific (and potentially non-standard) MTU needs to be set by a DHCP server, and honored by all interfaces across the traffic path. , etc., etc." >In case when GRE/VXLAN tunneling is used, it adds overheads to the total Please be specific in your description - what is 'it' in the above context? >size of 1518, and in that cases the DHCP server corrects the L3 MTU to 1454 >to take care of the overhead. BUt since virtio interfaces was not having the Should be lowercase 'u' in 'But' >mtu set that mtu sent by dhcp was ignored and teh instance will still send Typo - 'teh' >packets with 1500 mtu which afetr encapsulation will become more than 1518 Typo - 'afetr' >and eventually gets dropped in the infrastructure. This issue can be solved >by honoring the mtu 1454 sent by dhcp server, which this below patch will >take care. Explain how the patch resolves the issue (i.e. by adding an additional 'set_mtu' function to the Virtio driver, which can be leveraged by the dhcp server/controller). > > >Changes in v5: Incorporated review comments. It is implicit that a new patch version incorporates review comments. To make things easier for the reviewer, please provide concise descriptions of the changes made/review comments addressed. For example: v5: - Fix log message for out-of-bounds MTU parameter in virtio_mtu_set - Calculate frame size, based on 'mtu' parameter, for upper bounds check in virtio_mtu_set - - etc., etc. >Changes in v4: Incorporated review comments. >Changes in v3: Corrected few style errors as reported by sys-stv. >Changes in v2: Incorporated review comments. > >Signed-off-by: Souvik Dey > >--- Version history should not be part of the commit message - please move it to below this dashed line. > drivers/net/virtio/virtio_ethdev.c | 19 +++ > 1 file changed, 19 insertions(+) > >diff --git a/drivers/net/virtio/virtio_ethdev.c >b/drivers/net/virtio/virtio_ethdev.c >index 07d6449..da16ad4 100644 >--- a/drivers/net/virtio/virtio_ethdev.c >+++ b/drivers/net/virtio/virtio_ethdev.c >@@ -653,12 +653,20 @@ virtio_dev_allmulticast_disable(struct rte_eth_dev *dev) >PMD_INIT_LOG(ERR, "Failed to disable allmulticast"); > } > >+#define VLAN_TAG_SIZE 4/* 802.3ac tag (not DMA'd) */ >+ >+static int virtio_mtu_set(struct rte_eth_dev *dev, uint16_t mtu) Additional whitespace between 'int' and 'virtio_mtu_set' - please remove. >+{ >+ struct rte_eth_dev_info dev_info; >+ uint32_t ether_hdr_len = ETHER_HDR_LEN + ETHER_CRC_LEN + VLAN_TAG_SIZE; >+ uint32_t frame_size = mtu + ether_hdr_len; >+ >+ virtio_dev_info_get(dev, &dev_info); >+ >+ if (frame_size < ETHER_MIN_MTU || frame_size > dev_info.max_rx_pktlen) >{ >+ PMD_INIT_LOG(ERR, "MTU should be between %d and %d\n", >+ (ETHER_MIN_MTU - ether_hdr_len), >+ (dev_info.max_rx_pktlen - ether_hdr_len)); >+ return -EINVAL; >+ } >+ return 0; >+} >@@ -677,7 +685,6 @@ static const struct eth_dev_ops virtio_eth_dev_ops = { >.allmulticast_enable = virtio_dev_allmulticast_enable, >.allmulticast_disable= virtio_dev_allmulticast_disable, >+ .mtu_set = virtio_mtu_set, >.dev_infos_get = virtio_dev_info_get, >.stats_get = virtio_dev_stats_get, >.xstats_get = virtio_dev_xstats_get, >-- >2.9.3.windows.1
[dpdk-dev] [PATCH v6] net/virtio: add set_mtu in virtio
>Subject: Re: [PATCH v6] net/virtio: add set_mtu in virtio > >On Wed, Sep 21, 2016 at 06:45:05PM -0700, Stephen Hemminger wrote: >> On Thu, 22 Sep 2016 00:08:38 + >> "Dey, Souvik" wrote: >> >> > Answers inline. >> > >> > -- >> > Regards, >> > Souvik >> > >> > -Original Message- >> > From: Stephen Hemminger [mailto:stephen at networkplumber.org] >> > Sent: Wednesday, September 21, 2016 7:22 PM >> > To: Dey, Souvik >> > Cc: mark.b.kavanagh at intel.com; yuanhan.liu at linux.intel.com; dev at >> > dpdk.org >> > Subject: Re: [PATCH v6] net/virtio: add set_mtu in virtio >> > >> > On Wed, 21 Sep 2016 19:11:47 -0400 >> > Dey wrote: >> > >> > > >> > > + >> > > +#define VLAN_TAG_SIZE 4/* 802.3ac tag (not DMA'd) */ >> > > + >> > > +static int virtio_mtu_set(struct rte_eth_dev *dev, uint16_t mtu) { >> > > + struct rte_eth_dev_info dev_info; >> > > + uint32_t ether_hdr_len = ETHER_HDR_LEN + ETHER_CRC_LEN + >> > > VLAN_TAG_SIZE; >> > > + uint32_t frame_size = mtu + ether_hdr_len; >> > > + >> > > + virtio_dev_info_get(dev, &dev_info); >> > > + >> > > + if (mtu < ETHER_MIN_MTU || frame_size > dev_info.max_rx_pktlen) { >> > > + PMD_INIT_LOG(ERR, "MTU should be between %d and %d\n", >> > > + ETHER_MIN_MTU, >> > > + (dev_info.max_rx_pktlen - >> > > ether_hdr_len)); >> > > + return -EINVAL; >> > > + } >> > > + return 0; >> > > +} >> > >> > I am fine with the general idea of this patch but: >> > 1. Calling virtio_dev_info_get is needlessly wasteful when all you want >> > is to access the max packet length. Since max_rx_pktlen is always >> > VIRTIO_MAX_RX_PKTLEN, please just use that. >> > [Dey, Souvik] I am using the virtio_dev_info_get as in future can/may >> > support the >max_rx_pktlen as a variable to be set by the application. This will keep the >changes future >proof. As we need to support till 65535 instead of 9728 as the linux does. >> >> Fine, then just dereference hw->rx_max_pktlen. Driver code can/should >> reference >> its own data directly. > >Dey, maybe you could just use VIRTIO_MAX_RX_PKTLEN here, like what you >did in early versions. > >> > >> > 2. Defining VLAN_TAG_SIZE is irrelevant if doing vlan offload. >> > [Dey, Souvik] vlan offload is not mandatory. Se again still have vlan >> > being sent up to >the application. In that case we need to consider the vlan length in the >Ethernet size. >> >> The code needs to handle both vlan offload (or not), correctly. You are >> assuming >> the worst case here. > >I think we are fine here to assume worst case. > >> > >> > 3. Virtio doesn't insert CRC, therefore CRC_LEN is irrelevant >> > [Dey, Souvik] I am not sure of this. Mark commented earlier to consider >> > this length too. >Mark what do you suggest ? >> >> Actually, the thing that matters is the size of the merge rx buf header, not >> the CRC. > >Right. My comments were based on my experience with DPDK ethdev PMDs - Stephen and Yuanhan have much more experience with virtio, so I'd go with their suggestion. > > --yliu >> >> The patch is still buggy. >>
[dpdk-dev] Inconsistent behaviour of rte_eth_dev_set_mtu API between ixgbe and i40e
Hi, In OvS-DPDK, we support single mbuf-segment jumbo frames. To date, we've supported this by creating a mempool containing mbufs of size "~user-defined-MTU", and configured the NIC by crafting an rte_eth_conf structure with jumbo_frame mode enabled, and the device's max_rx_pkt_len set accordingly. e.g. static int dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq) { ... struct rte_eth_conf conf = port_conf; if (dev->mtu > ETHER_MTU) { conf.rxmode.jumbo_frame = 1; conf.rxmode.max_rx_pkt_len = dev->max_packet_len; } else { conf.rxmode.jumbo_frame = 0; conf.rxmode.max_rx_pkt_len = 0; } In an attempt to clean this up somewhat, I recently wrote a patch for OvS-DPDK that introduced the rte_eth_dev_set_mtu API, in place of the rte_eth_conf manipulation. This worked fine for Fortville/i40e PMD, but when tested on Niantic/ixgbe, it was observed that the latter balks when a non-standard MTU is specified. Examining the driver's source code, the source of the Niantic issue is traced to the following: static int ixgbe_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu) { ... struct rte_eth_rxmode *rx_conf = &dev->data->dev_conf.rxmode; ... /* refuse mtu that requires the support of scattered packets when this * feature has not been enabled before. */ if (!rx_conf->enable_scatter && (frame_size + 2 * IXGBE_VLAN_TAG_SIZE > dev->data->min_rx_buf_size - RTE_PKTMBUF_HEADROOM)) return -EINVAL; In summary, there is an additional configuration requirement for scattered_rx mode on Niantic; this is problematic from an application perspective, as the rte_eth_dev_set_mtu API's behaviour is inconsistent across disparate NICs/PMDs. Should this be classified as a bug in DPDK? Thanks in advance, Mark
Re: [dpdk-dev] 17.05.1 patches review and test
>From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Yuanhan Liu >Sent: Thursday, June 22, 2017 2:08 PM >To: dpdk stable >Cc: dev@dpdk.org >Subject: [dpdk-dev] 17.05.1 patches review and test > >Hi all, > >I'm doing an unexpected stable release, which is expected to be released >in about two months. The reason I'm doing it is there is an import bug >fix commit that is needed for OVS 2.8 release. The fix is a vhost-user >commit comes from me. Well, unfortunately, I'm also the author of the >original bug :/ > >So I'm doing an urgent release. That also means we will have another >17.05 release (17.05.2) shortly after v17.08 is out. > >And here is a list of patches targeted for this release. Please help >review and test. The planned date for the final release is 30, June. >Before that, please shout if anyone has objections. > >These patches are located at branch 17.05 of dpdk-stable repo: >http://dpdk.org/browse/dpdk-stable/ > Hi Yuanhan, Thanks for making this release available ahead of schedule - it puts OvS in a good position to upgrade the supported version of DPDK in OvS v2.8.0 (due ~mid-August). I can confirm that the OvS team has performed selective validation on the following DPDK-dependent OvS features: - Jumbo frames - vhost NUMA (broken in 17.05 release) - Rate Limiting - QoS - VxLAN tunneling - Rx checksum offload - Flow control - Addition/deletion of vdevs As such, we're happy to proceed with the 17.05.1 release. Thanks once again, Mark >Thanks. > >--yliu > >--- >Ajit Khaparde (1): > net/bnxt: fix reporting of link status > >Alejandro Lucero (1): > vfio: fix array bounds check > >Andrew Rybchenko (1): > net/sfc: add Tx queue flush failed flag for sanity > >Andy Moreton (2): > net/sfc/base: fix error code usage in common code > net/sfc/base: let caller know that queue is already flushed > >Beilei Xing (1): > app/testpmd: fix creating E-Tag and NVGRE flow rules > >Chas Williams (3): > net/af_packet: handle possible null pointer > net/af_packet: fix packet bytes counting > net/ring: fix adding MAC addresses > >Dariusz Stojaczyk (2): > vhost: fix malloc size too small > vhost: fix guest pages memory leak > >David Marchand (1): > drivers/net: fix vfio kmod dependency > >Ferruh Yigit (3): > kni: fix build with gcc 7.1 > net/enic: fix build with gcc 7.1 > net/mlx5: fix build with gcc 7.1 > >Harish Patil (1): > net/qede: fix VXLAN tunnel Tx offload flag setting > >Jerin Jacob (1): > examples/vhost: fix uninitialized descriptor indexes > >John Miller (4): > net/ark: fix buffer not null terminated > net/ark: fix return code not checked > net/ark: fix null pointer dereference > net/ark: fix return value of null not checked > >Lee Roberts (1): > kni: fix build on RHEL 7.4 > >Markus Theil (1): > net/igb: fix add/delete of flex filters > >Nélio Laranjeiro (1): > net/mlx5: fix flow application order on stop/start > >Pascal Mazon (1): > net/tap: fix some flow collision > >Qi Zhang (3): > net/ixgbe: fix fdir mask not be reset > net/i40e: exclude internal packet's byte count > net/i40e: fix VF statistics > >Rahul Lakkireddy (2): > net/cxgbe: fix port statistics > net/cxgbe: fix rxq default params for ports under same PF > >Shahaf Shuler (1): > net/mlx5: fix completion buffer size > >Shijith Thotton (1): > net/liquidio: fix MTU calculation from port configuration > >Tiwei Bie (1): > net/virtio: zero the whole memory zone > >Wei Dai (1): > lpm: fix index of tbl8 > >Wei Zhao (1): > net/igb: fix checksum valid flags > >Wenzhuo Lu (1): > net/i40e/base: fix Tx error stats on VF > >Yongseok Koh (2): > net/mlx5: fix exception handling > net/mlx5: fix redundant free of Tx buffer > >Yuanhan Liu (1): > vhost: fix crash on NUMA
Re: [dpdk-dev] [PATCH 2/5] gso/lib: add TCP/IPv4 GSO support
>From: Hu, Jiayu >Sent: Wednesday, August 30, 2017 3:56 AM >To: Ananyev, Konstantin >Cc: dev@dpdk.org; Kavanagh, Mark B ; Tan, Jianfeng > >Subject: Re: [PATCH 2/5] gso/lib: add TCP/IPv4 GSO support > >Hi Konstantin, > >Thanks for your important suggestions. My feedbacks are inline. > >On Wed, Aug 30, 2017 at 09:38:33AM +0800, Ananyev, Konstantin wrote: >> >> >> > -Original Message- >> > From: Hu, Jiayu >> > Sent: Thursday, August 24, 2017 3:16 PM >> > To: dev@dpdk.org >> > Cc: Kavanagh, Mark B ; Ananyev, Konstantin >; Tan, Jianfeng >> > ; Hu, Jiayu >> > Subject: [PATCH 2/5] gso/lib: add TCP/IPv4 GSO support >> > >> > This patch adds GSO support for TCP/IPv4 packets. Supported packets >> > may include a single VLAN tag. TCP/IPv4 GSO assumes that all input >> > packets have correct checksums, and doesn't update checksums for output >> > packets (the responsibility for this lies with the application). >> > Additionally, TCP/IPv4 GSO doesn't process IP fragmented packets. >> > >> > TCP/IPv4 GSO uses two chained MBUFs, one direct MBUF and one indrect >> > MBUF, to organize an output packet. Note that we refer to these two >> > chained MBUFs as a two-segment MBUF. The direct MBUF stores the packet >> > header, while the indirect mbuf simply points to a location within the >> > original packet's payload. Consequently, use of the GSO library requires >> > multi-segment MBUF support in the TX functions of the NIC driver. >> > >> > If a packet is GSOed, TCP/IPv4 GSO reduces its MBUF refcnt by 1. As a >> > result, when all of its GSOed segments are freed, the packet is freed >> > automatically. >> > >> > Signed-off-by: Jiayu Hu >> > Signed-off-by: Mark Kavanagh >> > --- >> > lib/librte_eal/common/include/rte_log.h | 1 + >> > lib/librte_gso/Makefile | 2 + >> > lib/librte_gso/gso_common.c | 270 > >> > lib/librte_gso/gso_common.h | 120 ++ >> > lib/librte_gso/gso_tcp.c| 82 ++ >> > lib/librte_gso/gso_tcp.h| 73 + >> > lib/librte_gso/rte_gso.c| 44 +- >> > lib/librte_gso/rte_gso.h| 3 + >> > 8 files changed, 593 insertions(+), 2 deletions(-) >> > create mode 100644 lib/librte_gso/gso_common.c >> > create mode 100644 lib/librte_gso/gso_common.h >> > create mode 100644 lib/librte_gso/gso_tcp.c >> > create mode 100644 lib/librte_gso/gso_tcp.h >> > >> > diff --git a/lib/librte_eal/common/include/rte_log.h >b/lib/librte_eal/common/include/rte_log.h >> > index ec8dba7..2fa1199 100644 >> > --- a/lib/librte_eal/common/include/rte_log.h >> > +++ b/lib/librte_eal/common/include/rte_log.h >> > @@ -87,6 +87,7 @@ extern struct rte_logs rte_logs; >> > #define RTE_LOGTYPE_CRYPTODEV 17 /**< Log related to cryptodev. */ >> > #define RTE_LOGTYPE_EFD 18 /**< Log related to EFD. */ >> > #define RTE_LOGTYPE_EVENTDEV 19 /**< Log related to eventdev. */ >> > +#define RTE_LOGTYPE_GSO 20 /**< Log related to GSO. */ >> > >> > /* these log types can be used in an application */ >> > #define RTE_LOGTYPE_USER1 24 /**< User-defined log type 1. */ >> > diff --git a/lib/librte_gso/Makefile b/lib/librte_gso/Makefile >> > index aeaacbc..0f8e38f 100644 >> > --- a/lib/librte_gso/Makefile >> > +++ b/lib/librte_gso/Makefile >> > @@ -42,6 +42,8 @@ LIBABIVER := 1 >> > >> > #source files >> > SRCS-$(CONFIG_RTE_LIBRTE_GSO) += rte_gso.c >> > +SRCS-$(CONFIG_RTE_LIBRTE_GSO) += gso_common.c >> > +SRCS-$(CONFIG_RTE_LIBRTE_GSO) += gso_tcp.c >> > >> > # install this header file >> > SYMLINK-$(CONFIG_RTE_LIBRTE_GSO)-include += rte_gso.h >> > diff --git a/lib/librte_gso/gso_common.c b/lib/librte_gso/gso_common.c >> > new file mode 100644 >> > index 000..2b54fbd >> > --- /dev/null >> > +++ b/lib/librte_gso/gso_common.c >> > @@ -0,0 +1,270 @@ >> > +/*- >> > + * BSD LICENSE >> > + * >> > + * Copyright(c) 2017 Intel Corporation. All rights reserved. >> > + * All rights reserved. >> > + * >> > + * Redistribution and use in source and binary forms, with or without >> > + * modification, are permitted provided that the following condit
Re: [dpdk-dev] [PATCH 2/5] gso/lib: add TCP/IPv4 GSO support
>From: Ananyev, Konstantin >Sent: Wednesday, August 30, 2017 10:59 AM >To: Ananyev, Konstantin ; Kavanagh, Mark B >; Hu, Jiayu >Cc: dev@dpdk.org; Tan, Jianfeng >Subject: RE: [PATCH 2/5] gso/lib: add TCP/IPv4 GSO support > > > >> -Original Message- >> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Ananyev, Konstantin >> Sent: Wednesday, August 30, 2017 10:39 AM >> To: Kavanagh, Mark B ; Hu, Jiayu > >> Cc: dev@dpdk.org; Tan, Jianfeng >> Subject: Re: [dpdk-dev] [PATCH 2/5] gso/lib: add TCP/IPv4 GSO support >> >> Hi Mark, >> >> > >> > + >> > >> > +void >> > >> > +gso_parse_packet(struct rte_mbuf *pkt) >> > >> >> > >> There is a function rte_net_get_ptype() that supposed to provide >similar >> > >functionality. >> > >> So we probably don't need to create a new SW parse function here, >instead >> > >would be better >> > >> to reuse (and update if needed) an existing one. >> > >> Again user already might have l2/l3/l4.../_len and packet_type setuped. >> > >> So better to keep SW packet parsing out of scope of that library. >> > > >> > >Hmm, I know we have discussed this design choice in the GRO library, and >I >> > >also think it's >> > >better to reuse these values. >> > > >> > >But from the perspective of OVS, it may add extra overhead, since OVS >doesn't >> > >parse every >> > >packet originally. Maybe @Mark can give us more inputs from the view of >OVS. >> > >> > Hi Jiayu, Konstantin >> > >> > For GSO, the application needs to know: >> > - the packet type (as it only currently supports TCP/IPv4, VxLAN, GRE >packets) >> > - the l2/3/4_lens, etc. (in order to replicate the original packet's >headers across outgoing segments) >> > >> > For this, we can use the rte_net_get_ptype function, as per Konstantin's >suggestion, as it provides both - thanks Konstantin! >> > >> > WRT the extra overhead in OvS: TSO is the defacto standard, and GSO is >provided purely as a fallback option. As such, and since the >> > additional packet parsing is a necessity in order to facilitate GSO, the >additional overhead is IMO acceptable. >> >> As I remember, for TSO in DPDK user still have to provide l2/l3/l4_len and >mss information to the PMD. Yes, that's correct. >> So unless user knows these value straightway (user creates a packet himself) >some packet processing will be unavailable anyway. That's correct also. Currently, packets that originate in a VM, and which have been marked for TSO, have the l2_len, etc. fields populated by the 'parse_ethernet' function, called as part of the call stack of the rte_vhost_dequeue_burst function, so that particular overhead is already implicit in the TSO case. >> Konstantin > >s/unavailable/unavoidable/ >sorry for bad typing. >Konstantin > >> > >> > Thanks, >> > Mark >> >
Re: [dpdk-dev] [PATCH 0/5] Support TCP/IPv4, VxLAN and GRE GSO in DPDK
>From: Ananyev, Konstantin >Sent: Wednesday, August 30, 2017 11:49 AM >To: Hu, Jiayu >Cc: dev@dpdk.org; Kavanagh, Mark B ; Tan, Jianfeng > >Subject: RE: [PATCH 0/5] Support TCP/IPv4, VxLAN and GRE GSO in DPDK > > > >> -Original Message- >> From: Hu, Jiayu >> Sent: Wednesday, August 30, 2017 8:37 AM >> To: Ananyev, Konstantin >> Cc: dev@dpdk.org; Kavanagh, Mark B ; Tan, >Jianfeng >> Subject: Re: [PATCH 0/5] Support TCP/IPv4, VxLAN and GRE GSO in DPDK >> >> Hi Konstantin, >> >> Thanks for your suggestions. Feedbacks are inline. >> >> Thanks, >> Jiayu >> >> On Wed, Aug 30, 2017 at 09:37:42AM +0800, Ananyev, Konstantin wrote: >> > >> > Hi Jiayu, >> > Few questions/comments from me below in in next few mails. >> > Thanks >> > Konstantin >> > >> > > >> > > Generic Segmentation Offload (GSO) is a SW technique to split large >> > > packets into small ones. Akin to TSO, GSO enables applications to >> > > operate on large packets, thus reducing per-packet processing overhead. >> > > >> > > To enable more flexibility to applications, DPDK GSO is implemented >> > > as a standalone library. Applications explicitly use the GSO library >> > > to segment packets. This patch adds GSO support to DPDK for specific >> > > packet types: specifically, TCP/IPv4, VxLAN, and GRE. >> > > >> > > The first patch introduces the GSO API framework. The second patch >> > > adds GSO support for TCP/IPv4 packets (containing an optional VLAN >> > > tag). The third patch adds GSO support for VxLAN packets that contain >> > > outer IPv4, and inner TCP/IPv4 headers (plus optional inner and/or >> > > outer VLAN tags). The fourth patch adds GSO support for GRE packets >> > > that contain outer IPv4, and inner TCP/IPv4 headers (with optional >> > > outer VLAN tag). The last patch in the series enables TCP/IPv4, VxLAN, >> > > and GRE GSO in testpmd's checksum forwarding engine. >> > > >> > > The performance of TCP/IPv4 GSO on a 10Gbps link is demonstrated using >> > > iperf. Setup for the test is described as follows: >> > > >> > > a. Connect 2 x 10Gbps physical ports (P0, P1), together physically. >> > > b. Launch testpmd with P0 and a vhost-user port, and use csum >> > >forwarding engine. >> > > c. Select IP and TCP HW checksum calculation for P0; select TCP HW >> > >checksum calculation for vhost-user port. >> > > d. Launch a VM with csum and tso offloading enabled. >> > > e. Run iperf-client on virtio-net port in the VM to send TCP packets. >> > >> > Not sure I understand the setup correctly: >> > So testpmd forwards packets between P0 and vhost-user port, right? >> >> Yes. >> >> > And who uses P1? iperf-server over linux kernel? >> >> P1 is possessed by linux kernel. >> >> > Also is P1 on another box or not? >> >> P0 and P1 are in the same machine and are connected physically. >> >> > >> > > >> > > With GSO enabled for P0 in testpmd, observed iperf throughput is ~9Gbps. >> > >> > Ok, and if GSO is disabled what is the throughput? >> > Another stupid question: if P0 is physical 10G (ixgbe?) we can just enable >a TSO on it, right? >> > If so, what would be the TSO numbers here? >> >> Here are more detailed experiment information: >> >> test1: only enable GSO for p0, GSO size is 1518, use two iperf-clients (i.e. >"-P 2") >> test2: only enable TSO for p0, TSO size is 1518, use two iperf-clients >> test3: disable TSO and GSO, use two iperf-clients >> >> test1 performance: 8.6Gpbs >> test2 throughput: 9.5Gbps >> test3 throughput: 3Mbps > >Ok thanks for detailed explanation. >I' d suggest you put it into next version cover letter. Thanks Konstantin - will do. > >> >> > >> > In fact, could you probably explain a bit more, what supposed to be a main >usage model for that library? >> >> The GSO library is just a SW segmentation method, which can be used by >applications, like OVS. >> Currently, most of NICs supports to segment TCP and UDP packets, but not for >all NICs. So current >> OVS doesn't enable TSO, as a result of lacking a SW segmentation fallback. >Besides, the protocol >> types in HW segmentation are limited. So it's necessary to provide a SW >segmentation solutio
Re: [dpdk-dev] [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
>From: Ananyev, Konstantin >Sent: Wednesday, September 13, 2017 10:38 AM >To: Hu, Jiayu >Cc: dev@dpdk.org; Kavanagh, Mark B ; Tan, Jianfeng > >Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support > > > >> > > + >> > > +int >> > > +gso_tcp4_segment(struct rte_mbuf *pkt, >> > > +uint16_t gso_size, >> > > +uint8_t ipid_delta, >> > > +struct rte_mempool *direct_pool, >> > > +struct rte_mempool *indirect_pool, >> > > +struct rte_mbuf **pkts_out, >> > > +uint16_t nb_pkts_out) >> > > +{ >> > > +struct ipv4_hdr *ipv4_hdr; >> > > +uint16_t tcp_dl; >> > > +uint16_t pyld_unit_size; >> > > +uint16_t hdr_offset; >> > > +int ret = 1; >> > > + >> > > +ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt, char *) + >> > > +pkt->l2_len); >> > > +/* Don't process the fragmented packet */ >> > > +if (unlikely((ipv4_hdr->fragment_offset & rte_cpu_to_be_16( >> > > +IPV4_HDR_DF_MASK)) == >> > > 0)) { >> > >> > >> > It is not a check for fragmented packet - it is a check that fragmentation >is allowed for that packet. >> > Should be IPV4_HDR_DF_MASK - 1, I think. > >DF bit doesn't indicate is packet fragmented or not. >It forbids to fragment packet any further. >To check is packet already fragmented or not, you have to check MF bit and >frag_offset. >Both have to be zero for un-fragmented packets. > >> >> IMO, IPV4_HDR_DF_MASK whose value is (1 << 14) is used to get DF bit. It's a >> little-endian value. But ipv4_hdr->fragment_offset is big-endian order. >> So the value of DF bit should be "ipv4_hdr->fragment_offset & >rte_cpu_to_be_16( >> IPV4_HDR_DF_MASK)". If this value is 0, the input packet is fragmented. >> >> > >> > > +pkts_out[0] = pkt; >> > > +return ret; >> > > +} >> > > + >> > > +tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - pkt->l3_len >> > > - >> > > +pkt->l4_len; >> > >> > Why not use pkt->pkt_len - pkt->l2_len -pkt_l3_len - pkt_l4_len? >> >> Yes, we can use pkt->pkt_len - pkt->l2_len -pkt_l3_len - pkt_l4_len here. >> >> > >> > > +/* Don't process the packet without data */ >> > > +if (unlikely(tcp_dl == 0)) { >> > > +pkts_out[0] = pkt; >> > > +return ret; >> > > +} >> > > + >> > > +hdr_offset = pkt->l2_len + pkt->l3_len + pkt->l4_len; >> > > +pyld_unit_size = gso_size - hdr_offset - ETHER_CRC_LEN; >> > >> > Hmm, why do we need to count CRC_LEN here? >> >> Yes, we shouldn't count ETHER_CRC_LEN here. Its length should be >> included in gso_size. > >Why? >What is the point to account crc len into this computation? >Why not just assume that gso_size is already a max_frame_size - crc_len >As I remember, when we RX packet crc bytes will be already stripped, >when user populates the packet, he doesn't care about crc bytes too. Hi Konstantin, When packet is tx'd, the 4B for CRC are added back into the packet; if the payload is already at max capacity, then the actual segment size will be 4B larger than expected (e.g. 1522B, as opposed to 1518B). To prevent that from happening, we account for the CRC len in this calculation. If I've missed anything, please do let me know! Thanks, Mark > >Konstantin
Re: [dpdk-dev] [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
>From: Hu, Jiayu >Sent: Thursday, September 14, 2017 2:00 AM >To: Ananyev, Konstantin ; Kavanagh, Mark B > >Cc: dev@dpdk.org; Tan, Jianfeng >Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support > >Hi Konstantin, > >> -Original Message- >> From: Ananyev, Konstantin >> Sent: Wednesday, September 13, 2017 11:13 PM >> To: Kavanagh, Mark B ; Hu, Jiayu >> >> Cc: dev@dpdk.org; Tan, Jianfeng >> Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support >> >> Hi Mark, >> >> > -Original Message- >> > From: Kavanagh, Mark B >> > Sent: Wednesday, September 13, 2017 3:52 PM >> > To: Ananyev, Konstantin ; Hu, Jiayu >> >> > Cc: dev@dpdk.org; Tan, Jianfeng >> > Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support >> > >> > >From: Ananyev, Konstantin >> > >Sent: Wednesday, September 13, 2017 10:38 AM >> > >To: Hu, Jiayu >> > >Cc: dev@dpdk.org; Kavanagh, Mark B ; >> Tan, Jianfeng >> > > >> > >Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support >> > > >> > > >> > > >> > >> > > + >> > >> > > +int >> > >> > > +gso_tcp4_segment(struct rte_mbuf *pkt, >> > >> > > + uint16_t gso_size, >> > >> > > + uint8_t ipid_delta, >> > >> > > + struct rte_mempool *direct_pool, >> > >> > > + struct rte_mempool *indirect_pool, >> > >> > > + struct rte_mbuf **pkts_out, >> > >> > > + uint16_t nb_pkts_out) >> > >> > > +{ >> > >> > > + struct ipv4_hdr *ipv4_hdr; >> > >> > > + uint16_t tcp_dl; >> > >> > > + uint16_t pyld_unit_size; >> > >> > > + uint16_t hdr_offset; >> > >> > > + int ret = 1; >> > >> > > + >> > >> > > + ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt, char *) >> + >> > >> > > + pkt->l2_len); >> > >> > > + /* Don't process the fragmented packet */ >> > >> > > + if (unlikely((ipv4_hdr->fragment_offset & rte_cpu_to_be_16( >> > >> > > + >> IPV4_HDR_DF_MASK)) == 0)) { >> > >> > >> > >> > >> > >> > It is not a check for fragmented packet - it is a check that >> fragmentation >> > >is allowed for that packet. >> > >> > Should be IPV4_HDR_DF_MASK - 1, I think. >> > > >> > >DF bit doesn't indicate is packet fragmented or not. >> > >It forbids to fragment packet any further. >> > >To check is packet already fragmented or not, you have to check MF bit >> and >> > >frag_offset. >> > >Both have to be zero for un-fragmented packets. >> > > >> > >> >> > >> IMO, IPV4_HDR_DF_MASK whose value is (1 << 14) is used to get DF bit. >> It's a >> > >> little-endian value. But ipv4_hdr->fragment_offset is big-endian order. >> > >> So the value of DF bit should be "ipv4_hdr->fragment_offset & >> > >rte_cpu_to_be_16( >> > >> IPV4_HDR_DF_MASK)". If this value is 0, the input packet is fragmented. >> > >> >> > >> > >> > >> > > + pkts_out[0] = pkt; >> > >> > > + return ret; >> > >> > > + } >> > >> > > + >> > >> > > + tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - pkt- >> >l3_len - >> > >> > > + pkt->l4_len; >> > >> > >> > >> > Why not use pkt->pkt_len - pkt->l2_len -pkt_l3_len - pkt_l4_len? >> > >> >> > >> Yes, we can use pkt->pkt_len - pkt->l2_len -pkt_l3_len - pkt_l4_len >here. >> > >> >> > >> > >> > >> > > + /* Don't process the packet without data */ >> > >> > > + if (unlikely(tcp_dl == 0)) { >> > >> > > + pkts_out[0] = pkt; >> > >> > > + return ret; >> > >> > > + } >> > >> > > + >> > >> > > + hdr_offset = pkt->l2_len + pkt->l3_len + pkt->l4_len; >> > >> > > + pyld
Re: [dpdk-dev] [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
>From: Hu, Jiayu >Sent: Thursday, September 14, 2017 7:07 AM >To: Ananyev, Konstantin >Cc: dev@dpdk.org; Kavanagh, Mark B ; Tan, Jianfeng > >Subject: Re: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support > >Hi Konstantin, > >On Thu, Sep 14, 2017 at 06:10:37AM +0800, Ananyev, Konstantin wrote: >> >> Hi Jiayu, >> >> > > >> > > >> > > > -Original Message- >> > > > From: Ananyev, Konstantin >> > > > Sent: Tuesday, September 12, 2017 12:18 PM >> > > > To: Hu, Jiayu ; dev@dpdk.org >> > > > Cc: Kavanagh, Mark B ; Tan, Jianfeng > >> > > > Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support >> > > > >> > > > > result, when all of its GSOed segments are freed, the packet is >freed >> > > > > automatically. >> > > > > diff --git a/lib/librte_gso/rte_gso.c b/lib/librte_gso/rte_gso.c >> > > > > index dda50ee..95f6ea6 100644 >> > > > > --- a/lib/librte_gso/rte_gso.c >> > > > > +++ b/lib/librte_gso/rte_gso.c >> > > > > @@ -33,18 +33,53 @@ >> > > > > >> > > > > #include >> > > > > >> > > > > +#include >> > > > > + >> > > > > #include "rte_gso.h" >> > > > > +#include "gso_common.h" >> > > > > +#include "gso_tcp4.h" >> > > > > >> > > > > int >> > > > > rte_gso_segment(struct rte_mbuf *pkt, >> > > > > -struct rte_gso_ctx gso_ctx __rte_unused, >> > > > > +struct rte_gso_ctx gso_ctx, >> > > > > struct rte_mbuf **pkts_out, >> > > > > uint16_t nb_pkts_out) >> > > > > { >> > > > > +struct rte_mempool *direct_pool, *indirect_pool; >> > > > > +struct rte_mbuf *pkt_seg; >> > > > > +uint16_t gso_size; >> > > > > +uint8_t ipid_delta; >> > > > > +int ret = 1; >> > > > > + >> > > > > if (pkt == NULL || pkts_out == NULL || nb_pkts_out < 1) >> > > > > return -EINVAL; >> > > > > >> > > > > -pkts_out[0] = pkt; >> > > > > +if (gso_ctx.gso_size >= pkt->pkt_len || >> > > > > +(pkt->packet_type & gso_ctx.gso_types) != >> > > > > +pkt->packet_type) { >> > > > > +pkts_out[0] = pkt; >> > > > > +return ret; >> > > > > +} >> > > > > + >> > > > > +direct_pool = gso_ctx.direct_pool; >> > > > > +indirect_pool = gso_ctx.indirect_pool; >> > > > > +gso_size = gso_ctx.gso_size; >> > > > > +ipid_delta = gso_ctx.ipid_flag == RTE_GSO_IPID_INCREASE; >> > > > > + >> > > > > +if (is_ipv4_tcp(pkt->packet_type)) { >> > > > >> > > > Probably we need here: >> > > > If (is_ipv4_tcp(pkt->packet_type) && (gso_ctx->gso_types & >DEV_TX_OFFLOAD_TCP_TSO) != 0) {... >> > > >> > > Sorry, actually it probably should be: >> > > If (pkt->ol_flags & (PKT_TX_TCP_SEG | PKT_TX_IPV4) == PKT_TX_IPV4 && >> > > (gso_ctx->gso_types & DEV_TX_OFFLOAD_TCP_TSO) != 0) {... >> > >> > I don't quite understand why the GSO library should be aware if the TSO >> > flag is set or not. Applications can query device TSO capability before >> > they call the GSO library. Do I misundertsand anything? >> > >> > Additionally, we don't need to check if the packet is a TCP/IPv4 packet >here? >> >> Well, right now PMD we doesn't rely on ptype to figure out what type of >packet and >> what TX offload have to be performed. >> Instead it looks at TX part of ol_flags, and >> My thought was that as what we doing is actually TSO in SW, it would be good >> to use the same API here too. >> Also with that approach, by setting ol_flags properly user can use the same >gso_ctx and still >> specify what segmentation to perform on a per-packet basis. >> >> Alternative way is to rely on ptype to distinguish should segmentation be >performed on that package or not. >> The only advantage I see here
Re: [dpdk-dev] [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
>From: Ananyev, Konstantin >Sent: Thursday, September 14, 2017 9:40 AM >To: Kavanagh, Mark B ; Hu, Jiayu > >Cc: dev@dpdk.org; Tan, Jianfeng >Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support > > > >> -----Original Message- >> From: Kavanagh, Mark B >> Sent: Thursday, September 14, 2017 9:35 AM >> To: Hu, Jiayu ; Ananyev, Konstantin > >> Cc: dev@dpdk.org; Tan, Jianfeng >> Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support >> >> >From: Hu, Jiayu >> >Sent: Thursday, September 14, 2017 2:00 AM >> >To: Ananyev, Konstantin ; Kavanagh, Mark B >> > >> >Cc: dev@dpdk.org; Tan, Jianfeng >> >Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support >> > >> >Hi Konstantin, >> > >> >> -Original Message- >> >> From: Ananyev, Konstantin >> >> Sent: Wednesday, September 13, 2017 11:13 PM >> >> To: Kavanagh, Mark B ; Hu, Jiayu >> >> >> >> Cc: dev@dpdk.org; Tan, Jianfeng >> >> Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support >> >> >> >> Hi Mark, >> >> >> >> > -Original Message- >> >> > From: Kavanagh, Mark B >> >> > Sent: Wednesday, September 13, 2017 3:52 PM >> >> > To: Ananyev, Konstantin ; Hu, Jiayu >> >> >> >> > Cc: dev@dpdk.org; Tan, Jianfeng >> >> > Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support >> >> > >> >> > >From: Ananyev, Konstantin >> >> > >Sent: Wednesday, September 13, 2017 10:38 AM >> >> > >To: Hu, Jiayu >> >> > >Cc: dev@dpdk.org; Kavanagh, Mark B ; >> >> Tan, Jianfeng >> >> > > >> >> > >Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support >> >> > > >> >> > > >> >> > > >> >> > >> > > + >> >> > >> > > +int >> >> > >> > > +gso_tcp4_segment(struct rte_mbuf *pkt, >> >> > >> > > +uint16_t gso_size, >> >> > >> > > +uint8_t ipid_delta, >> >> > >> > > +struct rte_mempool *direct_pool, >> >> > >> > > +struct rte_mempool *indirect_pool, >> >> > >> > > +struct rte_mbuf **pkts_out, >> >> > >> > > +uint16_t nb_pkts_out) >> >> > >> > > +{ >> >> > >> > > +struct ipv4_hdr *ipv4_hdr; >> >> > >> > > +uint16_t tcp_dl; >> >> > >> > > +uint16_t pyld_unit_size; >> >> > >> > > +uint16_t hdr_offset; >> >> > >> > > +int ret = 1; >> >> > >> > > + >> >> > >> > > +ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt, >> >> > >> > > char *) >> >> + >> >> > >> > > +pkt->l2_len); >> >> > >> > > +/* Don't process the fragmented packet */ >> >> > >> > > +if (unlikely((ipv4_hdr->fragment_offset & >> >> > >> > > rte_cpu_to_be_16( >> >> > >> > > + >> >> IPV4_HDR_DF_MASK)) == 0)) { >> >> > >> > >> >> > >> > >> >> > >> > It is not a check for fragmented packet - it is a check that >> >> fragmentation >> >> > >is allowed for that packet. >> >> > >> > Should be IPV4_HDR_DF_MASK - 1, I think. >> >> > > >> >> > >DF bit doesn't indicate is packet fragmented or not. >> >> > >It forbids to fragment packet any further. >> >> > >To check is packet already fragmented or not, you have to check MF bit >> >> and >> >> > >frag_offset. >> >> > >Both have to be zero for un-fragmented packets. >> >> > > >> >> > >> >> >> > >> IMO, IPV4_HDR_DF_MASK whose value is (1 << 14) is used to get DF >bit. >> >> It's a >> >> > >> little-endian value. But ipv4_hdr->fragment_offset is big-endian >order. >> >> > >> So the value of DF
Re: [dpdk-dev] [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
>From: Ananyev, Konstantin >Sent: Thursday, September 14, 2017 10:11 AM >To: Kavanagh, Mark B ; Hu, Jiayu > >Cc: dev@dpdk.org; Tan, Jianfeng >Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support > > > >> -----Original Message- >> From: Kavanagh, Mark B >> Sent: Thursday, September 14, 2017 10:01 AM >> To: Ananyev, Konstantin ; Hu, Jiayu > >> Cc: dev@dpdk.org; Tan, Jianfeng >> Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support >> >> >From: Ananyev, Konstantin >> >Sent: Thursday, September 14, 2017 9:40 AM >> >To: Kavanagh, Mark B ; Hu, Jiayu >> > >> >Cc: dev@dpdk.org; Tan, Jianfeng >> >Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support >> > >> > >> > >> >> -Original Message- >> >> From: Kavanagh, Mark B >> >> Sent: Thursday, September 14, 2017 9:35 AM >> >> To: Hu, Jiayu ; Ananyev, Konstantin >> > >> >> Cc: dev@dpdk.org; Tan, Jianfeng >> >> Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support >> >> >> >> >From: Hu, Jiayu >> >> >Sent: Thursday, September 14, 2017 2:00 AM >> >> >To: Ananyev, Konstantin ; Kavanagh, Mark B >> >> > >> >> >Cc: dev@dpdk.org; Tan, Jianfeng >> >> >Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support >> >> > >> >> >Hi Konstantin, >> >> > >> >> >> -Original Message- >> >> >> From: Ananyev, Konstantin >> >> >> Sent: Wednesday, September 13, 2017 11:13 PM >> >> >> To: Kavanagh, Mark B ; Hu, Jiayu >> >> >> >> >> >> Cc: dev@dpdk.org; Tan, Jianfeng >> >> >> Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support >> >> >> >> >> >> Hi Mark, >> >> >> >> >> >> > -Original Message- >> >> >> > From: Kavanagh, Mark B >> >> >> > Sent: Wednesday, September 13, 2017 3:52 PM >> >> >> > To: Ananyev, Konstantin ; Hu, Jiayu >> >> >> >> >> >> > Cc: dev@dpdk.org; Tan, Jianfeng >> >> >> > Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support >> >> >> > >> >> >> > >From: Ananyev, Konstantin >> >> >> > >Sent: Wednesday, September 13, 2017 10:38 AM >> >> >> > >To: Hu, Jiayu >> >> >> > >Cc: dev@dpdk.org; Kavanagh, Mark B ; >> >> >> Tan, Jianfeng >> >> >> > > >> >> >> > >Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support >> >> >> > > >> >> >> > > >> >> >> > > >> >> >> > >> > > + >> >> >> > >> > > +int >> >> >> > >> > > +gso_tcp4_segment(struct rte_mbuf *pkt, >> >> >> > >> > > + uint16_t gso_size, >> >> >> > >> > > + uint8_t ipid_delta, >> >> >> > >> > > + struct rte_mempool *direct_pool, >> >> >> > >> > > + struct rte_mempool *indirect_pool, >> >> >> > >> > > + struct rte_mbuf **pkts_out, >> >> >> > >> > > + uint16_t nb_pkts_out) >> >> >> > >> > > +{ >> >> >> > >> > > + struct ipv4_hdr *ipv4_hdr; >> >> >> > >> > > + uint16_t tcp_dl; >> >> >> > >> > > + uint16_t pyld_unit_size; >> >> >> > >> > > + uint16_t hdr_offset; >> >> >> > >> > > + int ret = 1; >> >> >> > >> > > + >> >> >> > >> > > + ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt, >char *) >> >> >> + >> >> >> > >> > > + pkt->l2_len); >> >> >> > >> > > + /* Don't process the fragmented packet */ >> >> >> > >> > > + if (unlikely((ipv4_hdr->fragment_offset & >rte_cpu_to_be_16( >> >> >> > >> > > + >> >> >>IPV4_HDR_DF_MASK)) == 0)) { >> >> &g
Re: [dpdk-dev] [PATCH v3 2/5] gso: add TCP/IPv4 GSO support
>From: Hu, Jiayu >Sent: Thursday, September 14, 2017 11:01 AM >To: Ananyev, Konstantin ; Kavanagh, Mark B > >Cc: dev@dpdk.org; Tan, Jianfeng >Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support > >Hi Konstantin and Mark, > >> -Original Message- >> From: Ananyev, Konstantin >> Sent: Thursday, September 14, 2017 5:36 PM >> To: Hu, Jiayu >> Cc: dev@dpdk.org; Kavanagh, Mark B ; Tan, >> Jianfeng >> Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support >> >> >> >> > -Original Message- >> > From: Hu, Jiayu >> > Sent: Thursday, September 14, 2017 10:29 AM >> > To: Ananyev, Konstantin >> > Cc: dev@dpdk.org; Kavanagh, Mark B ; Tan, >> Jianfeng >> > Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support >> > >> > Hi Konstantin, >> > >> > > -Original Message- >> > > From: Ananyev, Konstantin >> > > Sent: Thursday, September 14, 2017 4:47 PM >> > > To: Hu, Jiayu >> > > Cc: dev@dpdk.org; Kavanagh, Mark B ; >> Tan, >> > > Jianfeng >> > > Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support >> > > >> > > Hi Jiayu, >> > > >> > > > -Original Message- >> > > > From: Hu, Jiayu >> > > > Sent: Thursday, September 14, 2017 7:07 AM >> > > > To: Ananyev, Konstantin >> > > > Cc: dev@dpdk.org; Kavanagh, Mark B ; >> Tan, >> > > Jianfeng >> > > > Subject: Re: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support >> > > > >> > > > Hi Konstantin, >> > > > >> > > > On Thu, Sep 14, 2017 at 06:10:37AM +0800, Ananyev, Konstantin wrote: >> > > > > >> > > > > Hi Jiayu, >> > > > > >> > > > > > > >> > > > > > > >> > > > > > > > -Original Message- >> > > > > > > > From: Ananyev, Konstantin >> > > > > > > > Sent: Tuesday, September 12, 2017 12:18 PM >> > > > > > > > To: Hu, Jiayu ; dev@dpdk.org >> > > > > > > > Cc: Kavanagh, Mark B ; Tan, >> Jianfeng >> > > >> > > > > > > > Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support >> > > > > > > > >> > > > > > > > > result, when all of its GSOed segments are freed, the packet >is >> > > freed >> > > > > > > > > automatically. >> > > > > > > > > diff --git a/lib/librte_gso/rte_gso.c >b/lib/librte_gso/rte_gso.c >> > > > > > > > > index dda50ee..95f6ea6 100644 >> > > > > > > > > --- a/lib/librte_gso/rte_gso.c >> > > > > > > > > +++ b/lib/librte_gso/rte_gso.c >> > > > > > > > > @@ -33,18 +33,53 @@ >> > > > > > > > > >> > > > > > > > > #include >> > > > > > > > > >> > > > > > > > > +#include >> > > > > > > > > + >> > > > > > > > > #include "rte_gso.h" >> > > > > > > > > +#include "gso_common.h" >> > > > > > > > > +#include "gso_tcp4.h" >> > > > > > > > > >> > > > > > > > > int >> > > > > > > > > rte_gso_segment(struct rte_mbuf *pkt, >> > > > > > > > > -struct rte_gso_ctx gso_ctx __rte_unused, >> > > > > > > > > +struct rte_gso_ctx gso_ctx, >> > > > > > > > > struct rte_mbuf **pkts_out, >> > > > > > > > > uint16_t nb_pkts_out) >> > > > > > > > > { >> > > > > > > > > +struct rte_mempool *direct_pool, *indirect_pool; >> > > > > > > > > +struct rte_mbuf *pkt_seg; >> > > > > > > > > +uint16_t gso_size; >> > > > > > > > > +uint8_t ipid_delta; >> > > > > > > > > +int ret = 1; >> > > > > > > > > + >> > > > > > > > > if (pkt == NULL || pkts_out == NULL || nb_pkts_out &l
Re: [dpdk-dev] [PATCH v5 2/6] gso: add TCP/IPv4 GSO support
Thanks for your comments Jiayu - please find responses inline. Thanks, Mark From: Hu, Jiayu >Sent: Friday, September 29, 2017 4:13 AM >To: Kavanagh, Mark B >Cc: dev@dpdk.org; Tan, Jianfeng ; Ananyev, Konstantin >; Yigit, Ferruh ; >tho...@monjalon.net >Subject: Re: [PATCH v5 2/6] gso: add TCP/IPv4 GSO support > >Hi Mark, > >One comment is inline. > >Thanks, >Jiayu > >On Thu, Sep 28, 2017 at 11:13:49PM +0100, Mark Kavanagh wrote: >> From: Jiayu Hu >> >> This patch adds GSO support for TCP/IPv4 packets. Supported packets >> may include a single VLAN tag. TCP/IPv4 GSO doesn't check if input >> packets have correct checksums, and doesn't update checksums for >> output packets (the responsibility for this lies with the application). >> Additionally, TCP/IPv4 GSO doesn't process IP fragmented packets. >> >> TCP/IPv4 GSO uses two chained MBUFs, one direct MBUF and one indrect >> MBUF, to organize an output packet. Note that we refer to these two >> chained MBUFs as a two-segment MBUF. The direct MBUF stores the packet >> header, while the indirect mbuf simply points to a location within the >> original packet's payload. Consequently, use of the GSO library requires >> multi-segment MBUF support in the TX functions of the NIC driver. >> >> If a packet is GSO'd, TCP/IPv4 GSO reduces its MBUF refcnt by 1. As a >> result, when all of its GSOed segments are freed, the packet is freed >> automatically. >> >> Signed-off-by: Jiayu Hu >> Signed-off-by: Mark Kavanagh >> Tested-by: Lei Yao >> --- >> doc/guides/rel_notes/release_17_11.rst | 12 +++ >> lib/librte_eal/common/include/rte_log.h | 1 + >> lib/librte_gso/Makefile | 2 + >> lib/librte_gso/gso_common.c | 153 > >> lib/librte_gso/gso_common.h | 141 + >> lib/librte_gso/gso_tcp4.c | 106 ++ >> lib/librte_gso/gso_tcp4.h | 74 +++ >> lib/librte_gso/rte_gso.c| 52 ++- >> 8 files changed, 538 insertions(+), 3 deletions(-) >> create mode 100644 lib/librte_gso/gso_common.c >> create mode 100644 lib/librte_gso/gso_common.h >> create mode 100644 lib/librte_gso/gso_tcp4.c >> create mode 100644 lib/librte_gso/gso_tcp4.h >> >> diff --git a/doc/guides/rel_notes/release_17_11.rst >b/doc/guides/rel_notes/release_17_11.rst >> index 7508be7..c414f73 100644 >> --- a/doc/guides/rel_notes/release_17_11.rst >> +++ b/doc/guides/rel_notes/release_17_11.rst >> @@ -41,6 +41,18 @@ New Features >> Also, make sure to start the actual text at the margin. >> = >> >> +* **Added the Generic Segmentation Offload Library.** >> + >> + Added the Generic Segmentation Offload (GSO) library to enable >> + applications to split large packets (e.g. MTU is 64KB) into small >> + ones (e.g. MTU is 1500B). Supported packet types are: >> + >> + * TCP/IPv4 packets, which may include a single VLAN tag. >> + >> + The GSO library doesn't check if the input packets have correct >> + checksums, and doesn't update checksums for output packets. >> + Additionally, the GSO library doesn't process IP fragmented packets. >> + >> >> Resolved Issues >> --- >> diff --git a/lib/librte_eal/common/include/rte_log.h >b/lib/librte_eal/common/include/rte_log.h >> index ec8dba7..2fa1199 100644 >> --- a/lib/librte_eal/common/include/rte_log.h >> +++ b/lib/librte_eal/common/include/rte_log.h >> @@ -87,6 +87,7 @@ struct rte_logs { >> #define RTE_LOGTYPE_CRYPTODEV 17 /**< Log related to cryptodev. */ >> #define RTE_LOGTYPE_EFD 18 /**< Log related to EFD. */ >> #define RTE_LOGTYPE_EVENTDEV 19 /**< Log related to eventdev. */ >> +#define RTE_LOGTYPE_GSO 20 /**< Log related to GSO. */ >> >> /* these log types can be used in an application */ >> #define RTE_LOGTYPE_USER1 24 /**< User-defined log type 1. */ >> diff --git a/lib/librte_gso/Makefile b/lib/librte_gso/Makefile >> index aeaacbc..2be64d1 100644 >> --- a/lib/librte_gso/Makefile >> +++ b/lib/librte_gso/Makefile >> @@ -42,6 +42,8 @@ LIBABIVER := 1 >> >> #source files >> SRCS-$(CONFIG_RTE_LIBRTE_GSO) += rte_gso.c >> +SRCS-$(CONFIG_RTE_LIBRTE_GSO) += gso_common.c >> +SRCS-$(CONFIG_RTE_LIBRTE_GSO) += gso_tcp4.c >> >> # install this header file >> SYMLINK-$(CONFIG
Re: [dpdk-dev] [PATCH v6 1/6] gso: add Generic Segmentation Offload API framework
>From: Ananyev, Konstantin >Sent: Wednesday, October 4, 2017 2:11 PM >To: Kavanagh, Mark B ; dev@dpdk.org >Cc: Hu, Jiayu ; Tan, Jianfeng ; >Yigit, Ferruh ; tho...@monjalon.net >Subject: RE: [PATCH v6 1/6] gso: add Generic Segmentation Offload API >framework > > > >> -Original Message- >> From: Kavanagh, Mark B >> Sent: Monday, October 2, 2017 5:46 PM >> To: dev@dpdk.org >> Cc: Hu, Jiayu ; Tan, Jianfeng ; >Ananyev, Konstantin ; Yigit, >> Ferruh ; tho...@monjalon.net; Kavanagh, Mark B > >> Subject: [PATCH v6 1/6] gso: add Generic Segmentation Offload API framework >> >> From: Jiayu Hu >> >> Generic Segmentation Offload (GSO) is a SW technique to split large >> packets into small ones. Akin to TSO, GSO enables applications to >> operate on large packets, thus reducing per-packet processing overhead. >> >> To enable more flexibility to applications, DPDK GSO is implemented >> as a standalone library. Applications explicitly use the GSO library >> to segment packets. To segment a packet requires two steps. The first >> is to set proper flags to mbuf->ol_flags, where the flags are the same >> as that of TSO. The second is to call the segmentation API, >> rte_gso_segment(). This patch introduces the GSO API framework to DPDK. >> >> rte_gso_segment() splits an input packet into small ones in each >> invocation. The GSO library refers to these small packets generated >> by rte_gso_segment() as GSO segments. Each of the newly-created GSO >> segments is organized as a two-segment MBUF, where the first segment is a >> standard MBUF, which stores a copy of packet header, and the second is an >> indirect MBUF which points to a section of data in the input packet. >> rte_gso_segment() reduces the refcnt of the input packet by 1. Therefore, >> when all GSO segments are freed, the input packet is freed automatically. >> Additionally, since each GSO segment has multiple MBUFs (i.e. 2 MBUFs), >> the driver of the interface which the GSO segments are sent to should >> support to transmit multi-segment packets. >> >> The GSO framework clears the PKT_TX_TCP_SEG flag for both the input >> packet, and all produced GSO segments in the event of success, since >> segmentation in hardware is no longer required at that point. >> >> Signed-off-by: Jiayu Hu >> Signed-off-by: Mark Kavanagh >> --- >> config/common_base | 5 ++ >> doc/api/doxy-api-index.md | 1 + >> doc/api/doxy-api.conf | 1 + >> doc/guides/rel_notes/release_17_11.rst | 1 + >> lib/Makefile | 2 + >> lib/librte_gso/Makefile| 49 +++ >> lib/librte_gso/rte_gso.c | 52 >> lib/librte_gso/rte_gso.h | 145 >+ >> lib/librte_gso/rte_gso_version.map | 7 ++ >> mk/rte.app.mk | 1 + >> 10 files changed, 264 insertions(+) >> create mode 100644 lib/librte_gso/Makefile >> create mode 100644 lib/librte_gso/rte_gso.c >> create mode 100644 lib/librte_gso/rte_gso.h >> create mode 100644 lib/librte_gso/rte_gso_version.map >> >> diff --git a/config/common_base b/config/common_base >> index 12f6be9..58ca5c0 100644 >> --- a/config/common_base >> +++ b/config/common_base >> @@ -653,6 +653,11 @@ CONFIG_RTE_LIBRTE_IP_FRAG_TBL_STAT=n >> CONFIG_RTE_LIBRTE_GRO=y >> >> # >> +# Compile GSO library >> +# >> +CONFIG_RTE_LIBRTE_GSO=y >> + >> +# >> # Compile librte_meter >> # >> CONFIG_RTE_LIBRTE_METER=y >> diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md >> index 19e0d4f..6512918 100644 >> --- a/doc/api/doxy-api-index.md >> +++ b/doc/api/doxy-api-index.md >> @@ -101,6 +101,7 @@ The public API headers are grouped by topics: >>[TCP](@ref rte_tcp.h), >>[UDP](@ref rte_udp.h), >>[GRO](@ref rte_gro.h), >> + [GSO](@ref rte_gso.h), >>[frag/reass] (@ref rte_ip_frag.h), >>[LPM IPv4 route] (@ref rte_lpm.h), >>[LPM IPv6 route] (@ref rte_lpm6.h), >> diff --git a/doc/api/doxy-api.conf b/doc/api/doxy-api.conf >> index 823554f..408f2e6 100644 >> --- a/doc/api/doxy-api.conf >> +++ b/doc/api/doxy-api.conf >> @@ -47,6 +47,7 @@ INPUT = doc/api/doxy-api-index.md \ >>lib/librte_ether \ >>lib/librte_eventdev \ >>
Re: [dpdk-dev] [PATCH v6 2/6] gso: add TCP/IPv4 GSO support
>-Original Message- >From: Ananyev, Konstantin >Sent: Wednesday, October 4, 2017 2:36 PM >To: Kavanagh, Mark B ; dev@dpdk.org >Cc: Hu, Jiayu ; Tan, Jianfeng ; >Yigit, Ferruh ; tho...@monjalon.net >Subject: RE: [PATCH v6 2/6] gso: add TCP/IPv4 GSO support > > > >> -Original Message- >> From: Ananyev, Konstantin >> Sent: Wednesday, October 4, 2017 2:32 PM >> To: Kavanagh, Mark B ; dev@dpdk.org >> Cc: Hu, Jiayu ; Tan, Jianfeng ; >Yigit, Ferruh ; tho...@monjalon.net >> Subject: RE: [PATCH v6 2/6] gso: add TCP/IPv4 GSO support >> >> Hi Mark, >> >> > -Original Message- >> > From: Kavanagh, Mark B >> > Sent: Monday, October 2, 2017 5:46 PM >> > To: dev@dpdk.org >> > Cc: Hu, Jiayu ; Tan, Jianfeng >; Ananyev, Konstantin ; >> Yigit, >> > Ferruh ; tho...@monjalon.net; Kavanagh, Mark B > >> > Subject: [PATCH v6 2/6] gso: add TCP/IPv4 GSO support >> > >> > From: Jiayu Hu >> > >> > This patch adds GSO support for TCP/IPv4 packets. Supported packets >> > may include a single VLAN tag. TCP/IPv4 GSO doesn't check if input >> > packets have correct checksums, and doesn't update checksums for >> > output packets (the responsibility for this lies with the application). >> > Additionally, TCP/IPv4 GSO doesn't process IP fragmented packets. >> > >> > TCP/IPv4 GSO uses two chained MBUFs, one direct MBUF and one indrect >> > MBUF, to organize an output packet. Note that we refer to these two >> > chained MBUFs as a two-segment MBUF. The direct MBUF stores the packet >> > header, while the indirect mbuf simply points to a location within the >> > original packet's payload. Consequently, use of the GSO library requires >> > multi-segment MBUF support in the TX functions of the NIC driver. >> > >> > If a packet is GSO'd, TCP/IPv4 GSO reduces its MBUF refcnt by 1. As a >> > result, when all of its GSOed segments are freed, the packet is freed >> > automatically. >> > >> > Signed-off-by: Jiayu Hu >> > Signed-off-by: Mark Kavanagh >> > Tested-by: Lei Yao >> > --- >> > doc/guides/rel_notes/release_17_11.rst | 12 +++ >> > lib/librte_eal/common/include/rte_log.h | 1 + >> > lib/librte_gso/Makefile | 2 + >> > lib/librte_gso/gso_common.c | 153 > >> > lib/librte_gso/gso_common.h | 141 >+ >> > lib/librte_gso/gso_tcp4.c | 104 ++ >> > lib/librte_gso/gso_tcp4.h | 74 +++ >> > lib/librte_gso/rte_gso.c| 52 ++- >> > 8 files changed, 536 insertions(+), 3 deletions(-) >> > create mode 100644 lib/librte_gso/gso_common.c >> > create mode 100644 lib/librte_gso/gso_common.h >> > create mode 100644 lib/librte_gso/gso_tcp4.c >> > create mode 100644 lib/librte_gso/gso_tcp4.h >> > >> > diff --git a/doc/guides/rel_notes/release_17_11.rst >b/doc/guides/rel_notes/release_17_11.rst >> > index 7508be7..c414f73 100644 >> > --- a/doc/guides/rel_notes/release_17_11.rst >> > +++ b/doc/guides/rel_notes/release_17_11.rst >> > @@ -41,6 +41,18 @@ New Features >> > Also, make sure to start the actual text at the margin. >> > = >> > >> > +* **Added the Generic Segmentation Offload Library.** >> > + >> > + Added the Generic Segmentation Offload (GSO) library to enable >> > + applications to split large packets (e.g. MTU is 64KB) into small >> > + ones (e.g. MTU is 1500B). Supported packet types are: >> > + >> > + * TCP/IPv4 packets, which may include a single VLAN tag. > >As a nit: I think it doesn't matter as you are relying on mbuf->l2_len. >Konstantin > Okay, I'll remove any mention of VLAN tags in the description - thanks!
Re: [dpdk-dev] [PATCH v6 2/6] gso: add TCP/IPv4 GSO support
>-Original Message- >From: Ananyev, Konstantin >Sent: Wednesday, October 4, 2017 2:32 PM >To: Kavanagh, Mark B ; dev@dpdk.org >Cc: Hu, Jiayu ; Tan, Jianfeng ; >Yigit, Ferruh ; tho...@monjalon.net >Subject: RE: [PATCH v6 2/6] gso: add TCP/IPv4 GSO support > >Hi Mark, > >> -Original Message- >> From: Kavanagh, Mark B >> Sent: Monday, October 2, 2017 5:46 PM >> To: dev@dpdk.org >> Cc: Hu, Jiayu ; Tan, Jianfeng ; >Ananyev, Konstantin ; Yigit, >> Ferruh ; tho...@monjalon.net; Kavanagh, Mark B > >> Subject: [PATCH v6 2/6] gso: add TCP/IPv4 GSO support >> >> From: Jiayu Hu >> >> This patch adds GSO support for TCP/IPv4 packets. Supported packets >> may include a single VLAN tag. TCP/IPv4 GSO doesn't check if input >> packets have correct checksums, and doesn't update checksums for >> output packets (the responsibility for this lies with the application). >> Additionally, TCP/IPv4 GSO doesn't process IP fragmented packets. >> >> TCP/IPv4 GSO uses two chained MBUFs, one direct MBUF and one indrect >> MBUF, to organize an output packet. Note that we refer to these two >> chained MBUFs as a two-segment MBUF. The direct MBUF stores the packet >> header, while the indirect mbuf simply points to a location within the >> original packet's payload. Consequently, use of the GSO library requires >> multi-segment MBUF support in the TX functions of the NIC driver. >> >> If a packet is GSO'd, TCP/IPv4 GSO reduces its MBUF refcnt by 1. As a >> result, when all of its GSOed segments are freed, the packet is freed >> automatically. >> >> Signed-off-by: Jiayu Hu >> Signed-off-by: Mark Kavanagh >> Tested-by: Lei Yao >> --- >> doc/guides/rel_notes/release_17_11.rst | 12 +++ >> lib/librte_eal/common/include/rte_log.h | 1 + >> lib/librte_gso/Makefile | 2 + >> lib/librte_gso/gso_common.c | 153 > >> lib/librte_gso/gso_common.h | 141 + >> lib/librte_gso/gso_tcp4.c | 104 ++ >> lib/librte_gso/gso_tcp4.h | 74 +++ >> lib/librte_gso/rte_gso.c| 52 ++- >> 8 files changed, 536 insertions(+), 3 deletions(-) >> create mode 100644 lib/librte_gso/gso_common.c >> create mode 100644 lib/librte_gso/gso_common.h >> create mode 100644 lib/librte_gso/gso_tcp4.c >> create mode 100644 lib/librte_gso/gso_tcp4.h >> >> diff --git a/doc/guides/rel_notes/release_17_11.rst >b/doc/guides/rel_notes/release_17_11.rst >> index 7508be7..c414f73 100644 >> --- a/doc/guides/rel_notes/release_17_11.rst >> +++ b/doc/guides/rel_notes/release_17_11.rst >> @@ -41,6 +41,18 @@ New Features >> Also, make sure to start the actual text at the margin. >> = >> >> +* **Added the Generic Segmentation Offload Library.** >> + >> + Added the Generic Segmentation Offload (GSO) library to enable >> + applications to split large packets (e.g. MTU is 64KB) into small >> + ones (e.g. MTU is 1500B). Supported packet types are: >> + >> + * TCP/IPv4 packets, which may include a single VLAN tag. >> + >> + The GSO library doesn't check if the input packets have correct >> + checksums, and doesn't update checksums for output packets. >> + Additionally, the GSO library doesn't process IP fragmented packets. >> + >> >> Resolved Issues >> --- >> diff --git a/lib/librte_eal/common/include/rte_log.h >b/lib/librte_eal/common/include/rte_log.h >> index ec8dba7..2fa1199 100644 >> --- a/lib/librte_eal/common/include/rte_log.h >> +++ b/lib/librte_eal/common/include/rte_log.h >> @@ -87,6 +87,7 @@ struct rte_logs { >> #define RTE_LOGTYPE_CRYPTODEV 17 /**< Log related to cryptodev. */ >> #define RTE_LOGTYPE_EFD 18 /**< Log related to EFD. */ >> #define RTE_LOGTYPE_EVENTDEV 19 /**< Log related to eventdev. */ >> +#define RTE_LOGTYPE_GSO 20 /**< Log related to GSO. */ >> >> /* these log types can be used in an application */ >> #define RTE_LOGTYPE_USER1 24 /**< User-defined log type 1. */ >> diff --git a/lib/librte_gso/Makefile b/lib/librte_gso/Makefile >> index aeaacbc..2be64d1 100644 >> --- a/lib/librte_gso/Makefile >> +++ b/lib/librte_gso/Makefile >> @@ -42,6 +42,8 @@ LIBABIVER := 1 >> >> #source files >> SRCS-$(CONFIG_RTE_LIBRTE_GSO) += rte_gso.c >
Re: [dpdk-dev] [PATCH v6 3/6] gso: add VxLAN GSO support
>-Original Message- >From: Ananyev, Konstantin >Sent: Wednesday, October 4, 2017 3:12 PM >To: Kavanagh, Mark B ; dev@dpdk.org >Cc: Hu, Jiayu ; Tan, Jianfeng ; >Yigit, Ferruh ; tho...@monjalon.net >Subject: RE: [PATCH v6 3/6] gso: add VxLAN GSO support > > > >> -Original Message- >> From: Kavanagh, Mark B >> Sent: Monday, October 2, 2017 5:46 PM >> To: dev@dpdk.org >> Cc: Hu, Jiayu ; Tan, Jianfeng ; >Ananyev, Konstantin ; Yigit, >> Ferruh ; tho...@monjalon.net; Kavanagh, Mark B > >> Subject: [PATCH v6 3/6] gso: add VxLAN GSO support >> >> This patch adds a framework that allows GSO on tunneled packets. >> Furthermore, it leverages that framework to provide GSO support for >> VxLAN-encapsulated packets. >> >> Supported VxLAN packets must have an outer IPv4 header (prepended by an >> optional VLAN tag), and contain an inner TCP/IPv4 packet (with an optional >> inner VLAN tag). >> >> VxLAN GSO doesn't check if input packets have correct checksums and >> doesn't update checksums for output packets. Additionally, it doesn't >> process IP fragmented packets. >> >> As with TCP/IPv4 GSO, VxLAN GSO uses a two-segment MBUF to organize each >> output packet, which mandates support for multi-segment mbufs in the TX >> functions of the NIC driver. Also, if a packet is GSOed, VxLAN GSO >> reduces its MBUF refcnt by 1. As a result, when all of its GSO'd segments >> are freed, the packet is freed automatically. >> >> Signed-off-by: Mark Kavanagh >> Signed-off-by: Jiayu Hu >> --- >> doc/guides/rel_notes/release_17_11.rst | 3 + >> lib/librte_gso/Makefile| 1 + >> lib/librte_gso/gso_common.h| 25 +++ >> lib/librte_gso/gso_tunnel_tcp4.c | 123 >+ >> lib/librte_gso/gso_tunnel_tcp4.h | 75 >> lib/librte_gso/rte_gso.c | 13 +++- >> 6 files changed, 237 insertions(+), 3 deletions(-) >> create mode 100644 lib/librte_gso/gso_tunnel_tcp4.c >> create mode 100644 lib/librte_gso/gso_tunnel_tcp4.h >> >> diff --git a/doc/guides/rel_notes/release_17_11.rst >b/doc/guides/rel_notes/release_17_11.rst >> index c414f73..25b8a78 100644 >> --- a/doc/guides/rel_notes/release_17_11.rst >> +++ b/doc/guides/rel_notes/release_17_11.rst >> @@ -48,6 +48,9 @@ New Features >>ones (e.g. MTU is 1500B). Supported packet types are: >> >>* TCP/IPv4 packets, which may include a single VLAN tag. >> + * VxLAN packets, which must have an outer IPv4 header (prepended by >> +an optional VLAN tag), and contain an inner TCP/IPv4 packet (with >> +an optional VLAN tag). >> >>The GSO library doesn't check if the input packets have correct >>checksums, and doesn't update checksums for output packets. >> diff --git a/lib/librte_gso/Makefile b/lib/librte_gso/Makefile >> index 2be64d1..e6d41df 100644 >> --- a/lib/librte_gso/Makefile >> +++ b/lib/librte_gso/Makefile >> @@ -44,6 +44,7 @@ LIBABIVER := 1 >> SRCS-$(CONFIG_RTE_LIBRTE_GSO) += rte_gso.c >> SRCS-$(CONFIG_RTE_LIBRTE_GSO) += gso_common.c >> SRCS-$(CONFIG_RTE_LIBRTE_GSO) += gso_tcp4.c >> +SRCS-$(CONFIG_RTE_LIBRTE_GSO) += gso_tunnel_tcp4.c >> >> # install this header file >> SYMLINK-$(CONFIG_RTE_LIBRTE_GSO)-include += rte_gso.h >> diff --git a/lib/librte_gso/gso_common.h b/lib/librte_gso/gso_common.h >> index 8d9b94e..c051295 100644 >> --- a/lib/librte_gso/gso_common.h >> +++ b/lib/librte_gso/gso_common.h >> @@ -39,6 +39,7 @@ >> #include >> #include >> #include >> +#include >> >> #define IS_FRAGMENTED(frag_off) (((frag_off) & IPV4_HDR_OFFSET_MASK) != 0 \ >> || ((frag_off) & IPV4_HDR_MF_FLAG) == IPV4_HDR_MF_FLAG) >> @@ -49,6 +50,30 @@ >> #define IS_IPV4_TCP(flag) (((flag) & (PKT_TX_TCP_SEG | PKT_TX_IPV4)) == \ >> (PKT_TX_TCP_SEG | PKT_TX_IPV4)) >> >> +#define IS_IPV4_VXLAN_TCP4(flag) (((flag) & (PKT_TX_TCP_SEG | PKT_TX_IPV4 | >\ >> +PKT_TX_OUTER_IPV4 | PKT_TX_TUNNEL_VXLAN)) == \ >> +(PKT_TX_TCP_SEG | PKT_TX_IPV4 | PKT_TX_OUTER_IPV4 | \ >> + PKT_TX_TUNNEL_VXLAN)) >> + >> +/** >> + * Internal function which updates the UDP header of a packet, following >> + * segmentation. This is required to update the header's datagram length >field. >> + * >> + * @param pkt >> + * The packet containing the UDP header. >> + * @param udp_o
Re: [dpdk-dev] [PATCH v6 4/6] gso: add GRE GSO support
>-Original Message- >From: Ananyev, Konstantin >Sent: Wednesday, October 4, 2017 3:16 PM >To: Kavanagh, Mark B ; dev@dpdk.org >Cc: Hu, Jiayu ; Tan, Jianfeng ; >Yigit, Ferruh ; tho...@monjalon.net >Subject: RE: [PATCH v6 4/6] gso: add GRE GSO support > > >> diff --git a/lib/librte_gso/rte_gso.c b/lib/librte_gso/rte_gso.c >> index 6095689..b748ab1 100644 >> --- a/lib/librte_gso/rte_gso.c >> +++ b/lib/librte_gso/rte_gso.c >> @@ -60,8 +60,9 @@ >> >> if ((gso_ctx->gso_size >= pkt->pkt_len) || (gso_ctx->gso_types & >> (DEV_TX_OFFLOAD_TCP_TSO | >> - DEV_TX_OFFLOAD_VXLAN_TNL_TSO)) != >> -gso_ctx->gso_types) { >> + DEV_TX_OFFLOAD_VXLAN_TNL_TSO | >> + DEV_TX_OFFLOAD_GRE_TNL_TSO)) != >> + gso_ctx->gso_types) { >> pkt->ol_flags &= (~PKT_TX_TCP_SEG); >> pkts_out[0] = pkt; >> return 1; >> @@ -73,7 +74,8 @@ >> ipid_delta = (gso_ctx->ipid_flag != RTE_GSO_IPID_FIXED); >> ol_flags = pkt->ol_flags; >> >> -if (IS_IPV4_VXLAN_TCP4(pkt->ol_flags)) { >> +if (IS_IPV4_VXLAN_TCP4(pkt->ol_flags) || >> +IS_IPV4_GRE_TCP4(pkt->ol_flags)) { > >Same comment as for previous patch: user might want that ctx to >Segment vxlan packets and not segment gro packets. >Konstantin Thanks Konstantin - I'll update appropriately. > >> pkt->ol_flags &= (~PKT_TX_TCP_SEG); >> ret = gso_tunnel_tcp4_segment(pkt, gso_size, ipid_delta, >> direct_pool, indirect_pool, >> -- >> 1.9.3
Re: [dpdk-dev] [PATCH v6 2/6] gso: add TCP/IPv4 GSO support
>From: Ananyev, Konstantin >Sent: Wednesday, October 4, 2017 3:49 PM >To: Kavanagh, Mark B ; dev@dpdk.org >Cc: Hu, Jiayu ; Tan, Jianfeng ; >Yigit, Ferruh ; tho...@monjalon.net >Subject: RE: [PATCH v6 2/6] gso: add TCP/IPv4 GSO support > >> >> int >> >> rte_gso_segment(struct rte_mbuf *pkt, >> >> @@ -41,12 +46,53 @@ >> >> struct rte_mbuf **pkts_out, >> >> uint16_t nb_pkts_out) >> >> { >> >> + struct rte_mempool *direct_pool, *indirect_pool; >> >> + struct rte_mbuf *pkt_seg; >> >> + uint64_t ol_flags; >> >> + uint16_t gso_size; >> >> + uint8_t ipid_delta; >> >> + int ret = 1; >> >> + >> >> if (pkt == NULL || pkts_out == NULL || gso_ctx == NULL || >> >> nb_pkts_out < 1) >> >> return -EINVAL; >> >> >> >> - pkt->ol_flags &= (~PKT_TX_TCP_SEG); >> >> - pkts_out[0] = pkt; >> >> + if ((gso_ctx->gso_size >= pkt->pkt_len) || (gso_ctx->gso_types & >> >> + DEV_TX_OFFLOAD_TCP_TSO) != >> >> + gso_ctx->gso_types) { >> >> + pkt->ol_flags &= (~PKT_TX_TCP_SEG); >> >> + pkts_out[0] = pkt; >> >> + return 1; >> >> + } >> >> + >> >> + direct_pool = gso_ctx->direct_pool; >> >> + indirect_pool = gso_ctx->indirect_pool; >> >> + gso_size = gso_ctx->gso_size; >> >> + ipid_delta = (gso_ctx->ipid_flag != RTE_GSO_IPID_FIXED); >> >> + ol_flags = pkt->ol_flags; >> >> + >> >> + if (IS_IPV4_TCP(pkt->ol_flags)) { >> >> + pkt->ol_flags &= (~PKT_TX_TCP_SEG); >> >> + ret = gso_tcp4_segment(pkt, gso_size, ipid_delta, >> >> + direct_pool, indirect_pool, >> >> + pkts_out, nb_pkts_out); >> >> + } else { >> >> + pkt->ol_flags &= (~PKT_TX_TCP_SEG); >> > >> >Not sure why do you clean this flag if you don't support that packet type >> >and no action was perfomed? >> >Suppose you have a mix ipv4 and ipv6 packets - gso lib would do ipv4 and >> >someone else >> >(HW?) can do ipv4 segmentation. >> >> I can't say for definite, since I didn't implement this change. However, I >can only presume that the assumption here is that since >> segmentation is being done in S/W that the underlying H/W does not support >TSO. >> Since the underlying HW can't segment the packet in HW, we should clear the >flag; otherwise, if an mbuf marked for TCP segmentation is >> passed to the driver of a NIC that does not support/understand that feature, >the behavior is undefined. >> Is this a fair assumption in your opinion, or is it the case that the packet >would simply be transmitted un-segmented in that case, and so we >> shouldn't clear the flag? > >Yes, I think if we shouldn't clear the flag if we didn't do any segmentation >(we just encounter a packet type that we don't support). >Konstantin Okay, thanks for clarifying - I'll update the code accordingly. -Mark > >> >> Thanks again, >> Mark >> >> >BTW, did you notice that building of shared target fails? >> >Konstantin >> >> I didn't, but I'll take a look right now - thanks for the catch! >> >> > >> > >> >> + pkts_out[0] = pkt; >> >> + RTE_LOG(WARNING, GSO, "Unsupported packet type\n"); >> >> + return 1; >> >> + } >> >> + >> >> + if (ret > 1) { >> >> + pkt_seg = pkt; >> >> + while (pkt_seg) { >> >> + rte_mbuf_refcnt_update(pkt_seg, -1); >> >> + pkt_seg = pkt_seg->next; >> >> + } >> >> + } else if (ret < 0) { >> >> + /* Revert the ol_flags in the event of failure. */ >> >> + pkt->ol_flags = ol_flags; >> >> + } >> >> >> >> - return 1; >> >> + return ret; >> >> } >> >> -- >> >> 1.9.3
Re: [dpdk-dev] [PATCH v6 3/6] gso: add VxLAN GSO support
>-Original Message- >From: Ananyev, Konstantin >Sent: Wednesday, October 4, 2017 3:12 PM >To: Kavanagh, Mark B ; dev@dpdk.org >Cc: Hu, Jiayu ; Tan, Jianfeng ; >Yigit, Ferruh ; tho...@monjalon.net >Subject: RE: [PATCH v6 3/6] gso: add VxLAN GSO support > > > >> -Original Message- >> From: Kavanagh, Mark B >> Sent: Monday, October 2, 2017 5:46 PM >> To: dev@dpdk.org >> Cc: Hu, Jiayu ; Tan, Jianfeng ; >Ananyev, Konstantin ; Yigit, >> Ferruh ; tho...@monjalon.net; Kavanagh, Mark B > >> Subject: [PATCH v6 3/6] gso: add VxLAN GSO support >> >> This patch adds a framework that allows GSO on tunneled packets. >> Furthermore, it leverages that framework to provide GSO support for >> VxLAN-encapsulated packets. >> >> Supported VxLAN packets must have an outer IPv4 header (prepended by an >> optional VLAN tag), and contain an inner TCP/IPv4 packet (with an optional >> inner VLAN tag). >> >> VxLAN GSO doesn't check if input packets have correct checksums and >> doesn't update checksums for output packets. Additionally, it doesn't >> process IP fragmented packets. >> >> As with TCP/IPv4 GSO, VxLAN GSO uses a two-segment MBUF to organize each >> output packet, which mandates support for multi-segment mbufs in the TX >> functions of the NIC driver. Also, if a packet is GSOed, VxLAN GSO >> reduces its MBUF refcnt by 1. As a result, when all of its GSO'd segments >> are freed, the packet is freed automatically. >> >> Signed-off-by: Mark Kavanagh >> Signed-off-by: Jiayu Hu >> --- >> doc/guides/rel_notes/release_17_11.rst | 3 + >> lib/librte_gso/Makefile| 1 + >> lib/librte_gso/gso_common.h| 25 +++ >> lib/librte_gso/gso_tunnel_tcp4.c | 123 >+ >> lib/librte_gso/gso_tunnel_tcp4.h | 75 >> lib/librte_gso/rte_gso.c | 13 +++- >> 6 files changed, 237 insertions(+), 3 deletions(-) >> create mode 100644 lib/librte_gso/gso_tunnel_tcp4.c >> create mode 100644 lib/librte_gso/gso_tunnel_tcp4.h >> >> diff --git a/doc/guides/rel_notes/release_17_11.rst >b/doc/guides/rel_notes/release_17_11.rst >> index c414f73..25b8a78 100644 >> --- a/doc/guides/rel_notes/release_17_11.rst >> +++ b/doc/guides/rel_notes/release_17_11.rst >> @@ -48,6 +48,9 @@ New Features >>ones (e.g. MTU is 1500B). Supported packet types are: >> >>* TCP/IPv4 packets, which may include a single VLAN tag. >> + * VxLAN packets, which must have an outer IPv4 header (prepended by >> +an optional VLAN tag), and contain an inner TCP/IPv4 packet (with >> +an optional VLAN tag). >> >>The GSO library doesn't check if the input packets have correct >>checksums, and doesn't update checksums for output packets. >> diff --git a/lib/librte_gso/Makefile b/lib/librte_gso/Makefile >> index 2be64d1..e6d41df 100644 >> --- a/lib/librte_gso/Makefile >> +++ b/lib/librte_gso/Makefile >> @@ -44,6 +44,7 @@ LIBABIVER := 1 >> SRCS-$(CONFIG_RTE_LIBRTE_GSO) += rte_gso.c >> SRCS-$(CONFIG_RTE_LIBRTE_GSO) += gso_common.c >> SRCS-$(CONFIG_RTE_LIBRTE_GSO) += gso_tcp4.c >> +SRCS-$(CONFIG_RTE_LIBRTE_GSO) += gso_tunnel_tcp4.c >> >> # install this header file >> SYMLINK-$(CONFIG_RTE_LIBRTE_GSO)-include += rte_gso.h >> diff --git a/lib/librte_gso/gso_common.h b/lib/librte_gso/gso_common.h >> index 8d9b94e..c051295 100644 >> --- a/lib/librte_gso/gso_common.h >> +++ b/lib/librte_gso/gso_common.h >> @@ -39,6 +39,7 @@ >> #include >> #include >> #include >> +#include >> >> #define IS_FRAGMENTED(frag_off) (((frag_off) & IPV4_HDR_OFFSET_MASK) != 0 \ >> || ((frag_off) & IPV4_HDR_MF_FLAG) == IPV4_HDR_MF_FLAG) >> @@ -49,6 +50,30 @@ >> #define IS_IPV4_TCP(flag) (((flag) & (PKT_TX_TCP_SEG | PKT_TX_IPV4)) == \ >> (PKT_TX_TCP_SEG | PKT_TX_IPV4)) >> >> +#define IS_IPV4_VXLAN_TCP4(flag) (((flag) & (PKT_TX_TCP_SEG | PKT_TX_IPV4 | >\ >> +PKT_TX_OUTER_IPV4 | PKT_TX_TUNNEL_VXLAN)) == \ >> +(PKT_TX_TCP_SEG | PKT_TX_IPV4 | PKT_TX_OUTER_IPV4 | \ >> + PKT_TX_TUNNEL_VXLAN)) >> + >> +/** >> + * Internal function which updates the UDP header of a packet, following >> + * segmentation. This is required to update the header's datagram length >field. >> + * >> + * @param pkt >> + * The packet containing the UDP header. >> + * @param udp_o
Re: [dpdk-dev] [PATCH v6 5/6] app/testpmd: enable TCP/IPv4, VxLAN and GRE GSO
>-Original Message- >From: Ananyev, Konstantin >Sent: Wednesday, October 4, 2017 4:09 PM >To: Kavanagh, Mark B ; dev@dpdk.org >Cc: Hu, Jiayu ; Tan, Jianfeng ; >Yigit, Ferruh ; tho...@monjalon.net >Subject: RE: [PATCH v6 5/6] app/testpmd: enable TCP/IPv4, VxLAN and GRE GSO > > > >> -Original Message- >> From: Kavanagh, Mark B >> Sent: Monday, October 2, 2017 5:46 PM >> To: dev@dpdk.org >> Cc: Hu, Jiayu ; Tan, Jianfeng ; >Ananyev, Konstantin ; Yigit, >> Ferruh ; tho...@monjalon.net; Kavanagh, Mark B > >> Subject: [PATCH v6 5/6] app/testpmd: enable TCP/IPv4, VxLAN and GRE GSO >> >> From: Jiayu Hu >> >> This patch adds GSO support to the csum forwarding engine. Oversized >> packets transmitted over a GSO-enabled port will undergo segmentation >> (with the exception of packet-types unsupported by the GSO library). >> GSO support is disabled by default. >> >> GSO support may be toggled on a per-port basis, using the command: >> >> "set port gso on|off" >> >> The maximum packet length (including the packet header and payload) for >> GSO segments may be set with the command: >> >> "set gso segsz " >> >> Show GSO configuration for a given port with the command: >> >> "show port gso" >> >> Signed-off-by: Jiayu Hu >> Signed-off-by: Mark Kavanagh >> --- >> app/test-pmd/cmdline.c | 178 > >> app/test-pmd/config.c | 24 >> app/test-pmd/csumonly.c | 69 ++- >> app/test-pmd/testpmd.c | 13 ++ >> app/test-pmd/testpmd.h | 10 ++ >> doc/guides/testpmd_app_ug/testpmd_funcs.rst | 46 +++ >> 6 files changed, 335 insertions(+), 5 deletions(-) >> >> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c >> index ccdf239..05b0ce8 100644 >> --- a/app/test-pmd/cmdline.c >> +++ b/app/test-pmd/cmdline.c >> @@ -431,6 +431,17 @@ static void cmd_help_long_parsed(void *parsed_result, >> "Set max flow number and max packet number per-flow" >> " for GRO.\n\n" >> >> +"set port (port_id) gso (on|off)" >> +"Enable or disable Generic Segmentation Offload in" >> +" csum forwarding engine.\n\n" >> + >> +"set gso segsz (length)\n" >> +"Set max packet length for output GSO segments," >> +" including packet header and payload.\n\n" > >Probably a good future improvement would be to allow user to specify gso_type >too. Would you like to see that change implemented in time for the 17.11 release? > >> + >> +"show port (port_id) gso\n" >> +"Show GSO configuration.\n\n" >> + >> "set fwd (%s)\n" >> "Set packet forwarding mode.\n\n" >> >> @@ -3967,6 +3978,170 @@ struct cmd_gro_set_result { >> }, >> }; >> >> +/* *** ENABLE/DISABLE GSO *** */ >> +struct cmd_gso_enable_result { >> +cmdline_fixed_string_t cmd_set; >> +cmdline_fixed_string_t cmd_port; >> +cmdline_fixed_string_t cmd_keyword; >> +cmdline_fixed_string_t cmd_mode; >> +uint8_t cmd_pid; >> +}; >> + >> +static void >> +cmd_gso_enable_parsed(void *parsed_result, >> +__attribute__((unused)) struct cmdline *cl, >> +__attribute__((unused)) void *data) >> +{ >> +struct cmd_gso_enable_result *res; >> + >> +res = parsed_result; >> +if (!strcmp(res->cmd_keyword, "gso")) >> +setup_gso(res->cmd_mode, res->cmd_pid); >> +} >> + >> +cmdline_parse_token_string_t cmd_gso_enable_set = >> +TOKEN_STRING_INITIALIZER(struct cmd_gso_enable_result, >> +cmd_set, "set"); >> +cmdline_parse_token_string_t cmd_gso_enable_port = >> +TOKEN_STRING_INITIALIZER(struct cmd_gso_enable_result, >> +cmd_port, "port"); >> +cmdline_parse_token_string_t cmd_gso_enable_keyword = >> +TOKEN_STRING_INITIALIZER(struct cmd_gso_enable_result, >> +cmd_keyword, "gso"); >> +cmdline_parse_token_string_t cmd_gso_enable_mode = >>
Re: [dpdk-dev] [PATCH v6 5/6] app/testpmd: enable TCP/IPv4, VxLAN and GRE GSO
>-Original Message- >From: Ananyev, Konstantin >Sent: Wednesday, October 4, 2017 5:27 PM >To: Kavanagh, Mark B ; dev@dpdk.org >Cc: Hu, Jiayu ; Tan, Jianfeng ; >Yigit, Ferruh ; tho...@monjalon.net >Subject: RE: [PATCH v6 5/6] app/testpmd: enable TCP/IPv4, VxLAN and GRE GSO > > > >> -Original Message- >> From: Kavanagh, Mark B >> Sent: Wednesday, October 4, 2017 5:23 PM >> To: Ananyev, Konstantin ; dev@dpdk.org >> Cc: Hu, Jiayu ; Tan, Jianfeng ; >Yigit, Ferruh ; tho...@monjalon.net >> Subject: RE: [PATCH v6 5/6] app/testpmd: enable TCP/IPv4, VxLAN and GRE GSO >> >> >> >> >-Original Message----- >> >From: Ananyev, Konstantin >> >Sent: Wednesday, October 4, 2017 4:09 PM >> >To: Kavanagh, Mark B ; dev@dpdk.org >> >Cc: Hu, Jiayu ; Tan, Jianfeng ; >> >Yigit, Ferruh ; tho...@monjalon.net >> >Subject: RE: [PATCH v6 5/6] app/testpmd: enable TCP/IPv4, VxLAN and GRE GSO >> > >> > >> > >> >> -Original Message- >> >> From: Kavanagh, Mark B >> >> Sent: Monday, October 2, 2017 5:46 PM >> >> To: dev@dpdk.org >> >> Cc: Hu, Jiayu ; Tan, Jianfeng >; >> >Ananyev, Konstantin ; Yigit, >> >> Ferruh ; tho...@monjalon.net; Kavanagh, Mark B >> > >> >> Subject: [PATCH v6 5/6] app/testpmd: enable TCP/IPv4, VxLAN and GRE GSO >> >> >> >> From: Jiayu Hu >> >> >> >> This patch adds GSO support to the csum forwarding engine. Oversized >> >> packets transmitted over a GSO-enabled port will undergo segmentation >> >> (with the exception of packet-types unsupported by the GSO library). >> >> GSO support is disabled by default. >> >> >> >> GSO support may be toggled on a per-port basis, using the command: >> >> >> >> "set port gso on|off" >> >> >> >> The maximum packet length (including the packet header and payload) for >> >> GSO segments may be set with the command: >> >> >> >> "set gso segsz " >> >> >> >> Show GSO configuration for a given port with the command: >> >> >> >> "show port gso" >> >> >> >> Signed-off-by: Jiayu Hu >> >> Signed-off-by: Mark Kavanagh >> >> --- >> >> app/test-pmd/cmdline.c | 178 >> > >> >> app/test-pmd/config.c | 24 >> >> app/test-pmd/csumonly.c | 69 ++- >> >> app/test-pmd/testpmd.c | 13 ++ >> >> app/test-pmd/testpmd.h | 10 ++ >> >> doc/guides/testpmd_app_ug/testpmd_funcs.rst | 46 +++ >> >> 6 files changed, 335 insertions(+), 5 deletions(-) >> >> >> >> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c >> >> index ccdf239..05b0ce8 100644 >> >> --- a/app/test-pmd/cmdline.c >> >> +++ b/app/test-pmd/cmdline.c >> >> @@ -431,6 +431,17 @@ static void cmd_help_long_parsed(void >*parsed_result, >> >> "Set max flow number and max packet number per-flow" >> >> " for GRO.\n\n" >> >> >> >> + "set port (port_id) gso (on|off)" >> >> + "Enable or disable Generic Segmentation Offload in" >> >> + " csum forwarding engine.\n\n" >> >> + >> >> + "set gso segsz (length)\n" >> >> + "Set max packet length for output GSO segments," >> >> + " including packet header and payload.\n\n" >> > >> >Probably a good future improvement would be to allow user to specify >gso_type >> >too. >> >> Would you like to see that change implemented in time for the 17.11 release? > >I think that's too late for such change in 17.11. >My thought was about 18.02 here. >Konstantin No problem - thanks Konstantin.
Re: [dpdk-dev] [PATCH v7 0/6] Support TCP/IPv4, VxLAN, and GRE GSO in DPDK
>From: Ananyev, Konstantin >Sent: Thursday, October 5, 2017 2:23 PM >To: Kavanagh, Mark B ; dev@dpdk.org >Cc: Hu, Jiayu ; Tan, Jianfeng ; >Yigit, Ferruh ; tho...@monjalon.net >Subject: RE: [PATCH v7 0/6] Support TCP/IPv4, VxLAN, and GRE GSO in DPDK > >Hi Mark, > >> >> Generic Segmentation Offload (GSO) is a SW technique to split large >> packets into small ones. Akin to TSO, GSO enables applications to >> operate on large packets, thus reducing per-packet processing overhead. >> >> To enable more flexibility to applications, DPDK GSO is implemented >> as a standalone library. Applications explicitly use the GSO library >> to segment packets. This patch adds GSO support to DPDK for specific >> packet types: specifically, TCP/IPv4, VxLAN, and GRE. >> >> The first patch introduces the GSO API framework. The second patch >> adds GSO support for TCP/IPv4 packets (containing an optional VLAN >> tag). The third patch adds GSO support for VxLAN packets that contain >> outer IPv4, and inner TCP/IPv4 headers (plus optional inner and/or >> outer VLAN tags). The fourth patch adds GSO support for GRE packets >> that contain outer IPv4, and inner TCP/IPv4 headers (with optional >> outer VLAN tag). The fifth patch in the series enables TCP/IPv4, VxLAN, >> and GRE GSO in testpmd's checksum forwarding engine. The final patch >> in the series adds GSO documentation to the programmer's guide. >> >> Performance Testing >> === >> The performance of TCP/IPv4 GSO on a 10Gbps link is demonstrated using >> iperf. Setup for the test is described as follows: >> >> a. Connect 2 x 10Gbps physical ports (P0, P1), which are in the same >>machine, together physically. >> b. Launch testpmd with P0 and a vhost-user port, and use csum >>forwarding engine with "retry". >> c. Select IP and TCP HW checksum calculation for P0; select TCP HW >>checksum calculation for vhost-user port. >> d. Launch a VM with csum and tso offloading enabled. >> e. Run iperf-client on virtio-net port in the VM to send TCP packets. >>With enabling csum and tso, the VM can send large TCP/IPv4 packets >>(mss is up to 64KB). >> f. P1 is assigned to linux kernel and enabled kernel GRO. Run >>iperf-server on P1. >> >> We conduct three iperf tests: >> >> test-1: enable GSO for P0 in testpmd, and set max GSO segment length >> to 1518B. Run two iperf-client in the VM. >> test-2: enable TSO for P0 in testpmd, and set TSO segsz to 1518B. Run >> two iperf-client in the VM. >> test-3: disable GSO and TSO in testpmd. Run two iperf-client in the VM. >> >> Throughput of the above three tests: >> >> test-1: 9.4Gbps >> test-2: 9.5Gbps >> test-3: 3Mbps >> >> Functional Testing >> == >> Unlike TCP packets, VMs can't send large VxLAN or GRE packets. The max >> length of tunneled packets from VMs is 1514B. So current experiment >> method can't be used to measure VxLAN and GRE GSO performance, but simply >> test the functionality via setting small GSO segment length (e.g. 500B). >> >> VxLAN >> - >> To test VxLAN GSO functionality, we use the following setup: >> >> a. Connect 2 x 10Gbps physical ports (P0, P1), which are in the same >>machine, together physically. >> b. Launch testpmd with P0 and a vhost-user port, and use csum forwarding >>engine with "retry". >> c. Testpmd commands: >> - csum parse_tunnel on "P0" >> - csum parse_tunnel on "vhost-user port" >> - csum set outer-ip hw "P0" >> - csum set ip hw "P0" >> - csum set tcp hw "P0" >> - csum set tcp hw "vhost-user port" >> - set port "P0" gso on >> - set gso segsz 500 >> d. Launch a VM with csum and tso offloading enabled. >> e. Create a vxlan port for the virtio-net port in the VM. Run iperf-client >>on the VxLAN port, so TCP packets are VxLAN encapsulated. However, the >>max packet length is 1514B. >> f. P1 is assigned to linux kernel and kernel GRO is disabled. Similarly, >>create a VxLAN port for P1, and run iperf-server on the VxLAN port. >> >> In testpmd, we can see the length of all packets sent from P0 is smaller >> than or equal to 500B. Additionally, the packets arriving in P1 is >> encapsulated and is smaller than or equal to 500B. >> >> GRE >> --- >> The same process may be used to test GRE functionality, with the exception >that &g
Re: [dpdk-dev] [PATCH v8 0/6] Support TCP/IPv4, VxLAN, and GRE GSO in DPDK
>-Original Message- >From: Ananyev, Konstantin >Sent: Thursday, October 5, 2017 6:12 PM >To: Kavanagh, Mark B ; dev@dpdk.org >Cc: Hu, Jiayu ; Tan, Jianfeng ; >Yigit, Ferruh ; tho...@monjalon.net >Subject: RE: [PATCH v8 0/6] Support TCP/IPv4, VxLAN, and GRE GSO in DPDK > >Hi Mark, > >> -Original Message- >> From: Kavanagh, Mark B >> Sent: Thursday, October 5, 2017 4:44 PM >> To: dev@dpdk.org >> Cc: Hu, Jiayu ; Tan, Jianfeng ; >Ananyev, Konstantin ; Yigit, >> Ferruh ; tho...@monjalon.net; Kavanagh, Mark B > >> Subject: [PATCH v8 0/6] Support TCP/IPv4, VxLAN, and GRE GSO in DPDK >> >> Generic Segmentation Offload (GSO) is a SW technique to split large >> packets into small ones. Akin to TSO, GSO enables applications to >> operate on large packets, thus reducing per-packet processing overhead. >> >> To enable more flexibility to applications, DPDK GSO is implemented >> as a standalone library. Applications explicitly use the GSO library >> to segment packets. This patch adds GSO support to DPDK for specific >> packet types: specifically, TCP/IPv4, VxLAN, and GRE. >> >> The first patch introduces the GSO API framework. The second patch >> adds GSO support for TCP/IPv4 packets (containing an optional VLAN >> tag). The third patch adds GSO support for VxLAN packets that contain >> outer IPv4, and inner TCP/IPv4 headers (plus optional inner and/or >> outer VLAN tags). The fourth patch adds GSO support for GRE packets >> that contain outer IPv4, and inner TCP/IPv4 headers (with optional >> outer VLAN tag). The fifth patch in the series enables TCP/IPv4, VxLAN, >> and GRE GSO in testpmd's checksum forwarding engine. The final patch >> in the series adds GSO documentation to the programmer's guide. >> >> Performance Testing >> === >> The performance of TCP/IPv4 GSO on a 10Gbps link is demonstrated using >> iperf. Setup for the test is described as follows: >> >> a. Connect 2 x 10Gbps physical ports (P0, P1), which are in the same >>machine, together physically. >> b. Launch testpmd with P0 and a vhost-user port, and use csum >>forwarding engine with "retry". >> c. Select IP and TCP HW checksum calculation for P0; select TCP HW >>checksum calculation for vhost-user port. >> d. Launch a VM with csum and tso offloading enabled. >> e. Run iperf-client on virtio-net port in the VM to send TCP packets. >>With enabling csum and tso, the VM can send large TCP/IPv4 packets >>(mss is up to 64KB). >> f. P1 is assigned to linux kernel and enabled kernel GRO. Run >>iperf-server on P1. >> >> We conduct three iperf tests: >> >> test-1: enable GSO for P0 in testpmd, and set max GSO segment length >> to 1518B. Run two iperf-client in the VM. >> test-2: enable TSO for P0 in testpmd, and set TSO segsz to 1518B. Run >> two iperf-client in the VM. >> test-3: disable GSO and TSO in testpmd. Run two iperf-client in the VM. >> >> Throughput of the above three tests: >> >> test-1: 9.4Gbps >> test-2: 9.5Gbps >> test-3: 3Mbps >> >> Functional Testing >> == >> Unlike TCP packets, VMs can't send large VxLAN or GRE packets. The max >> length of tunneled packets from VMs is 1514B. So current experiment >> method can't be used to measure VxLAN and GRE GSO performance, but simply >> test the functionality via setting small GSO segment length (e.g. 500B). >> >> VxLAN >> - >> To test VxLAN GSO functionality, we use the following setup: >> >> a. Connect 2 x 10Gbps physical ports (P0, P1), which are in the same >>machine, together physically. >> b. Launch testpmd with P0 and a vhost-user port, and use csum forwarding >>engine with "retry". >> c. Testpmd commands: >> - csum parse_tunnel on "P0" >> - csum parse_tunnel on "vhost-user port" >> - csum set outer-ip hw "P0" >> - csum set ip hw "P0" >> - csum set tcp hw "P0" >> - csum set tcp hw "vhost-user port" >> - set port "P0" gso on >> - set gso segsz 500 >> d. Launch a VM with csum and tso offloading enabled. >> e. Create a vxlan port for the virtio-net port in the VM. Run iperf-client >>on the VxLAN port, so TCP packets are VxLAN encapsulated. However, the >>max packet length is 1514B. >> f. P1 is assigned to linux kernel and kernel GRO is disabled. Similarly, >>create a VxLAN port for P1, an
[dpdk-dev] FW: [PATCH v9 0/6] Support TCP/IPv4, VxLAN, and GRE GSO in DPDK
FYI - GSO series has been acked, so it will be included in DPDK v17.11 release. -Mark >-Original Message- >From: Ananyev, Konstantin >Sent: Thursday, October 5, 2017 11:24 PM >To: Kavanagh, Mark B ; dev@dpdk.org >Cc: Hu, Jiayu ; Tan, Jianfeng ; >Yigit, Ferruh ; tho...@monjalon.net >Subject: RE: [PATCH v9 0/6] Support TCP/IPv4, VxLAN, and GRE GSO in DPDK > > > >> -Original Message- >> From: Kavanagh, Mark B >> Sent: Thursday, October 5, 2017 9:37 PM >> To: dev@dpdk.org >> Cc: Hu, Jiayu ; Tan, Jianfeng ; >Ananyev, Konstantin ; Yigit, >> Ferruh ; tho...@monjalon.net; Kavanagh, Mark B > >> Subject: [PATCH v9 0/6] Support TCP/IPv4, VxLAN, and GRE GSO in DPDK >> >> Generic Segmentation Offload (GSO) is a SW technique to split large >> packets into small ones. Akin to TSO, GSO enables applications to >> operate on large packets, thus reducing per-packet processing overhead. >> >> To enable more flexibility to applications, DPDK GSO is implemented >> as a standalone library. Applications explicitly use the GSO library >> to segment packets. This patch adds GSO support to DPDK for specific >> packet types: specifically, TCP/IPv4, VxLAN, and GRE. >> >> The first patch introduces the GSO API framework. The second patch >> adds GSO support for TCP/IPv4 packets (containing an optional VLAN >> tag). The third patch adds GSO support for VxLAN packets that contain >> outer IPv4, and inner TCP/IPv4 headers (plus optional inner and/or >> outer VLAN tags). The fourth patch adds GSO support for GRE packets >> that contain outer IPv4, and inner TCP/IPv4 headers (with optional >> outer VLAN tag). The fifth patch in the series enables TCP/IPv4, VxLAN, >> and GRE GSO in testpmd's checksum forwarding engine. The final patch >> in the series adds GSO documentation to the programmer's guide. >> >> Performance Testing >> === >> The performance of TCP/IPv4 GSO on a 10Gbps link is demonstrated using >> iperf. Setup for the test is described as follows: >> >> a. Connect 2 x 10Gbps physical ports (P0, P1), which are in the same >>machine, together physically. >> b. Launch testpmd with P0 and a vhost-user port, and use csum >>forwarding engine with "retry". >> c. Select IP and TCP HW checksum calculation for P0; select TCP HW >>checksum calculation for vhost-user port. >> d. Launch a VM with csum and tso offloading enabled. >> e. Run iperf-client on virtio-net port in the VM to send TCP packets. >>With enabling csum and tso, the VM can send large TCP/IPv4 packets >>(mss is up to 64KB). >> f. P1 is assigned to linux kernel and enabled kernel GRO. Run >>iperf-server on P1. >> >> We conduct three iperf tests: >> >> test-1: enable GSO for P0 in testpmd, and set max GSO segment length >> to 1518B. Run two iperf-client in the VM. >> test-2: enable TSO for P0 in testpmd, and set TSO segsz to 1518B. Run >> two iperf-client in the VM. >> test-3: disable GSO and TSO in testpmd. Run two iperf-client in the VM. >> >> Throughput of the above three tests: >> >> test-1: 9.4Gbps >> test-2: 9.5Gbps >> test-3: 3Mbps >> >> Functional Testing >> == >> Unlike TCP packets, VMs can't send large VxLAN or GRE packets. The max >> length of tunneled packets from VMs is 1514B. So current experiment >> method can't be used to measure VxLAN and GRE GSO performance, but simply >> test the functionality via setting small GSO segment length (e.g. 500B). >> >> VxLAN >> - >> To test VxLAN GSO functionality, we use the following setup: >> >> a. Connect 2 x 10Gbps physical ports (P0, P1), which are in the same >>machine, together physically. >> b. Launch testpmd with P0 and a vhost-user port, and use csum forwarding >>engine with "retry". >> c. Testpmd commands: >> - csum parse_tunnel on "P0" >> - csum parse_tunnel on "vhost-user port" >> - csum set outer-ip hw "P0" >> - csum set ip hw "P0" >> - csum set tcp hw "P0" >> - csum set tcp hw "vhost-user port" >> - set port "P0" gso on >> - set gso segsz 500 >> d. Launch a VM with csum and tso offloading enabled. >> e. Create a vxlan port for the virtio-net port in the VM. Run iperf-client >>on the VxLAN port, so TCP packets are VxLAN encapsulated. However, the >>max packet length is 1514B. >> f. P1 is assigned to linux kernel and
Re: [dpdk-dev] [PATCH v9 0/6] Support TCP/IPv4, VxLAN, and GRE GSO in DPDK
>-Original Message- >From: Ananyev, Konstantin >Sent: Thursday, October 5, 2017 11:24 PM >To: Kavanagh, Mark B ; dev@dpdk.org >Cc: Hu, Jiayu ; Tan, Jianfeng ; >Yigit, Ferruh ; tho...@monjalon.net >Subject: RE: [PATCH v9 0/6] Support TCP/IPv4, VxLAN, and GRE GSO in DPDK > > > >> -Original Message- >> From: Kavanagh, Mark B >> Sent: Thursday, October 5, 2017 9:37 PM >> To: dev@dpdk.org >> Cc: Hu, Jiayu ; Tan, Jianfeng ; >Ananyev, Konstantin ; Yigit, >> Ferruh ; tho...@monjalon.net; Kavanagh, Mark B > >> Subject: [PATCH v9 0/6] Support TCP/IPv4, VxLAN, and GRE GSO in DPDK >> >> Generic Segmentation Offload (GSO) is a SW technique to split large >> packets into small ones. Akin to TSO, GSO enables applications to >> operate on large packets, thus reducing per-packet processing overhead. >> >> To enable more flexibility to applications, DPDK GSO is implemented >> as a standalone library. Applications explicitly use the GSO library >> to segment packets. This patch adds GSO support to DPDK for specific >> packet types: specifically, TCP/IPv4, VxLAN, and GRE. >> >> The first patch introduces the GSO API framework. The second patch >> adds GSO support for TCP/IPv4 packets (containing an optional VLAN >> tag). The third patch adds GSO support for VxLAN packets that contain >> outer IPv4, and inner TCP/IPv4 headers (plus optional inner and/or >> outer VLAN tags). The fourth patch adds GSO support for GRE packets >> that contain outer IPv4, and inner TCP/IPv4 headers (with optional >> outer VLAN tag). The fifth patch in the series enables TCP/IPv4, VxLAN, >> and GRE GSO in testpmd's checksum forwarding engine. The final patch >> in the series adds GSO documentation to the programmer's guide. >> >> Performance Testing >> === >> The performance of TCP/IPv4 GSO on a 10Gbps link is demonstrated using >> iperf. Setup for the test is described as follows: >> >> a. Connect 2 x 10Gbps physical ports (P0, P1), which are in the same >>machine, together physically. >> b. Launch testpmd with P0 and a vhost-user port, and use csum >>forwarding engine with "retry". >> c. Select IP and TCP HW checksum calculation for P0; select TCP HW >>checksum calculation for vhost-user port. >> d. Launch a VM with csum and tso offloading enabled. >> e. Run iperf-client on virtio-net port in the VM to send TCP packets. >>With enabling csum and tso, the VM can send large TCP/IPv4 packets >>(mss is up to 64KB). >> f. P1 is assigned to linux kernel and enabled kernel GRO. Run >>iperf-server on P1. >> >> We conduct three iperf tests: >> >> test-1: enable GSO for P0 in testpmd, and set max GSO segment length >> to 1518B. Run two iperf-client in the VM. >> test-2: enable TSO for P0 in testpmd, and set TSO segsz to 1518B. Run >> two iperf-client in the VM. >> test-3: disable GSO and TSO in testpmd. Run two iperf-client in the VM. >> >> Throughput of the above three tests: >> >> test-1: 9.4Gbps >> test-2: 9.5Gbps >> test-3: 3Mbps >> >> Functional Testing >> == >> Unlike TCP packets, VMs can't send large VxLAN or GRE packets. The max >> length of tunneled packets from VMs is 1514B. So current experiment >> method can't be used to measure VxLAN and GRE GSO performance, but simply >> test the functionality via setting small GSO segment length (e.g. 500B). >> >> VxLAN >> - >> To test VxLAN GSO functionality, we use the following setup: >> >> a. Connect 2 x 10Gbps physical ports (P0, P1), which are in the same >>machine, together physically. >> b. Launch testpmd with P0 and a vhost-user port, and use csum forwarding >>engine with "retry". >> c. Testpmd commands: >> - csum parse_tunnel on "P0" >> - csum parse_tunnel on "vhost-user port" >> - csum set outer-ip hw "P0" >> - csum set ip hw "P0" >> - csum set tcp hw "P0" >> - csum set tcp hw "vhost-user port" >> - set port "P0" gso on >> - set gso segsz 500 >> d. Launch a VM with csum and tso offloading enabled. >> e. Create a vxlan port for the virtio-net port in the VM. Run iperf-client >>on the VxLAN port, so TCP packets are VxLAN encapsulated. However, the >>max packet length is 1514B. >> f. P1 is assigned to linux kernel and kernel GRO is disabled. Similarly, >>create a VxLAN port for P1, and run iperf-s
Re: [dpdk-dev] [PATCH v9 6/6] doc: add GSO programmer's guide
>-Original Message- >From: Mcnamara, John >Sent: Friday, October 6, 2017 2:35 PM >To: Kavanagh, Mark B ; dev@dpdk.org >Cc: Hu, Jiayu ; Tan, Jianfeng ; >Ananyev, Konstantin ; Yigit, Ferruh >; tho...@monjalon.net; Kavanagh, Mark B > >Subject: RE: [dpdk-dev] [PATCH v9 6/6] doc: add GSO programmer's guide > > > >> -Original Message- >> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Mark Kavanagh >> Sent: Thursday, October 5, 2017 9:37 PM >> To: dev@dpdk.org >> Cc: Hu, Jiayu ; Tan, Jianfeng >> ; Ananyev, Konstantin >> ; Yigit, Ferruh ; >> tho...@monjalon.net; Kavanagh, Mark B >> Subject: [dpdk-dev] [PATCH v9 6/6] doc: add GSO programmer's guide >> >> Add programmer's guide doc to explain the design and use of the >> GSO library. >> >> Signed-off-by: Mark Kavanagh >> Signed-off-by: Jiayu Hu > > >Hi Mark, > >If the docs (or another part of the patchset) were previously acked but >not changed in a new patchset then it is okay to include the previous >ack line. Of course - apologies John. > >Acked-by: John McNamara
Re: [dpdk-dev] [dpdk-stable] [PATCH] net/bnxt: fix compilation
>From: De Lara Guarch, Pablo >Sent: Monday, October 9, 2017 3:15 PM >To: Kavanagh, Mark B ; dev@dpdk.org >Cc: sta...@dpdk.org; Yigit, Ferruh ; >ajit.khapa...@broadcom.com; Kavanagh, Mark B >Subject: RE: [dpdk-stable] [PATCH] net/bnxt: fix compilation > > > >> -Original Message- >> From: stable [mailto:stable-boun...@dpdk.org] On Behalf Of Mark >> Kavanagh >> Sent: Monday, October 9, 2017 3:00 PM >> To: dev@dpdk.org >> Cc: sta...@dpdk.org; Yigit, Ferruh ; >> ajit.khapa...@broadcom.com; Kavanagh, Mark B >> >> Subject: [dpdk-stable] [PATCH] net/bnxt: fix compilation >> >> As of 5ef3b79fdfe6f, compilation of DPDK fails with the following error >> message: >> "bnxt_filter.c:960:117: error: ‘vnic’ may be used uninitialized in this >> function [-Werror=maybe-uninitialized]". >> >> Resolve this by initializing 'vnic' to NULL; >> >> Fixes: 5ef3b79fdfe6f ("net/bnxt: support flow filter ops") >> CC: sta...@dpdk.org > >Hi Mark, > >You don't need to cc stable, since you are fixing code that was sent in this >release. >Only fixes for patches sent in previous DPDK versions need to be applied to >the stable repo. Thanks for the clarification Pablo - makes sense. Removing sta...@dpdk.org from this thread. Cheers, Mark > >Regards, >Pablo
Re: [dpdk-dev] [PATCH] net/bnxt: fix compilation
>From: Yigit, Ferruh >Sent: Monday, October 9, 2017 6:34 PM >To: Kavanagh, Mark B ; dev@dpdk.org >Cc: ajit.khapa...@broadcom.com >Subject: Re: [PATCH] net/bnxt: fix compilation > >On 10/9/2017 2:59 PM, Mark Kavanagh wrote: >> As of 5ef3b79fdfe6f, compilation of DPDK fails with the following >> error message: >> "bnxt_filter.c:960:117: error: ‘vnic’ may be used uninitialized in this >> function [-Werror=maybe-uninitialized]". >> >> Resolve this by initializing 'vnic' to NULL; >> >> Fixes: 5ef3b79fdfe6f ("net/bnxt: support flow filter ops") > >> >> Signed-off-by: Mark Kavanagh > >Applied to dpdk-next-net/master, thanks. > >Can you please provide compiler and version information? >I get the commit for now and will update it later with that information. Thanks Ferruh - compiler info is as follows: gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7) Best, Mark
Re: [dpdk-dev] [PATCH v7 5/6] igb_uio: use kernel functions for masking MSI-X
Hi, This commit renders igb_uio unusable on Fedora 20, kernel version 3.19.8-100.fc20.x86_64. During the build (make -j 20), a warning is issued for igb_uio regarding a missing symbol (pci_msi_unmask): WARNING: "pci_msi_unmask_irq" [/home//x86_64-native-linuxapp-gcc/build/lib/librte_eal/linuxapp/igb_uio/igb_uio.ko] undefined! WARNING: "pci_msi_mask_irq" [/home//x86_64-native-linuxapp-gcc/build/lib/librte_eal/linuxapp/igb_uio/igb_uio.ko] undefined! Subsequently, the module is may not be loaded on account of same. (from dmesg): [673425.712110] igb_uio: Unknown symbol pci_msi_unmask_irq (err 0) [673425.712124] igb_uio: Unknown symbol pci_msi_mask_irq (err 0) Thanks, Mark >-Original Message- >From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Markus Theil >Sent: Tuesday, September 5, 2017 1:04 PM >To: dev@dpdk.org >Cc: Yigit, Ferruh ; step...@networkplumber.org; Markus >Theil >Subject: [dpdk-dev] [PATCH v7 5/6] igb_uio: use kernel functions for masking >MSI-X > >This patch removes the custom MSI-X mask/unmask code and >uses already existing kernel functions. > >Signed-off-by: Markus Theil >--- > lib/librte_eal/linuxapp/igb_uio/compat.h | 26 +--- > lib/librte_eal/linuxapp/igb_uio/igb_uio.c | 51 -- >- > 2 files changed, 28 insertions(+), 49 deletions(-) > >diff --git a/lib/librte_eal/linuxapp/igb_uio/compat.h >b/lib/librte_eal/linuxapp/igb_uio/compat.h >index 3825933..67a7ab3 100644 >--- a/lib/librte_eal/linuxapp/igb_uio/compat.h >+++ b/lib/librte_eal/linuxapp/igb_uio/compat.h >@@ -15,24 +15,6 @@ > #define HAVE_PTE_MASK_PAGE_IOMAP > #endif > >-#ifndef PCI_MSIX_ENTRY_SIZE >-#define PCI_MSIX_ENTRY_SIZE 16 >-#define PCI_MSIX_ENTRY_LOWER_ADDR 0 >-#define PCI_MSIX_ENTRY_UPPER_ADDR 4 >-#define PCI_MSIX_ENTRY_DATA8 >-#define PCI_MSIX_ENTRY_VECTOR_CTRL 12 >-#define PCI_MSIX_ENTRY_CTRL_MASKBIT 1 >-#endif >- >-/* >- * for kernels < 2.6.38 and backported patch that moves MSI-X entry >definition >- * to pci_regs.h Those kernels has PCI_MSIX_ENTRY_SIZE defined but not >- * PCI_MSIX_ENTRY_CTRL_MASKBIT >- */ >-#ifndef PCI_MSIX_ENTRY_CTRL_MASKBIT >-#define PCI_MSIX_ENTRY_CTRL_MASKBIT1 >-#endif >- > #if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 34) && \ > (!(defined(RHEL_RELEASE_CODE) && \ >RHEL_RELEASE_CODE >= RHEL_RELEASE_VERSION(5, 9))) >@@ -127,3 +109,11 @@ static bool pci_check_and_mask_intx(struct pci_dev *pdev) > #if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 8, 0) > #define HAVE_ALLOC_IRQ_VECTORS 1 > #endif >+ >+#if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 19, 0) >+#define HAVE_PCI_MSI_MASK_IRQ 1 >+#endif >+ >+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 37) >+#define HAVE_IRQ_DATA 1 >+#endif >diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c >b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c >index c570eed..e4ef817 100644 >--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c >+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c >@@ -91,27 +91,6 @@ static struct attribute *dev_attrs[] = { > static const struct attribute_group dev_attr_grp = { > .attrs = dev_attrs, > }; >-/* >- * It masks the msix on/off of generating MSI-X messages. >- */ >-static void >-igbuio_msix_mask_irq(struct msi_desc *desc, int32_t state) >-{ >- u32 mask_bits = desc->masked; >- unsigned offset = desc->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE + >- PCI_MSIX_ENTRY_VECTOR_CTRL; >- >- if (state != 0) >- mask_bits &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT; >- else >- mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT; >- >- if (mask_bits != desc->masked) { >- writel(mask_bits, desc->mask_base + offset); >- readl(desc->mask_base); >- desc->masked = mask_bits; >- } >-} > > /** > * This is the irqcontrol callback to be registered to uio_info. >@@ -132,21 +111,31 @@ igbuio_pci_irqcontrol(struct uio_info *info, s32 >irq_state) > struct rte_uio_pci_dev *udev = info->priv; > struct pci_dev *pdev = udev->pdev; > >- pci_cfg_access_lock(pdev); >- if (udev->mode == RTE_INTR_MODE_LEGACY) >- pci_intx(pdev, !!irq_state); >+#ifdef HAVE_IRQ_DATA >+ struct irq_data *irq = irq_get_irq_data(udev->info.irq); >+#else >+ unsigned int irq = udev->info.irq; >+#endif > >- else if (udev->mode == RTE_INTR_MODE_MSIX) { >- struct msi_desc *desc; >+ pci_cfg_access_lock(pdev); > >-#if (LINUX_VERSION_CODE < KERNEL_VERSION(4, 3, 0)) >- list_for_each_entry(desc, &pdev->msi_list, list) >- igbuio_msix_mask_irq(desc, irq_state); >+ if (udev->mode == RTE_INTR_MODE_MSIX) { >+#ifdef HAVE_PCI_MSI_MASK_IRQ >+ if (irq_state == 1) >+ pci_msi_unmask_irq(irq); >+ else >+ pci_msi_mask_irq(irq); > #else >- list_for_each_entry(desc, &pdev->dev.ms
Re: [dpdk-dev] [PATCH] doc: minor fixes for GSO prog_guide
>From: Mcnamara, John >Sent: Thursday, October 12, 2017 4:29 PM >To: Kavanagh, Mark B ; dev@dpdk.org >Cc: Hu, Jiayu ; Tan, Jianfeng ; >Ananyev, Konstantin ; Yigit, Ferruh >; tho...@monjalon.net; Kavanagh, Mark B > >Subject: RE: [dpdk-dev] [PATCH] doc: minor fixes for GSO prog_guide > > > >> -Original Message- >> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Mark Kavanagh >> Sent: Thursday, October 12, 2017 2:46 PM >> To: dev@dpdk.org >> Cc: Hu, Jiayu ; Tan, Jianfeng >> ; Ananyev, Konstantin >> ; Yigit, Ferruh ; >> tho...@monjalon.net; Kavanagh, Mark B >> Subject: [dpdk-dev] [PATCH] doc: minor fixes for GSO prog_guide >> >> Correct two minor issues in the GSO programmer's guide: >> - a note is rendered incorrectly in the middle of an unordered list; >> this results in the remainder of the list appearing inside the note. >> Move note to end of the list to resolve same. >> - two minor visual artifacts are present in the 'three-part-output- >> segment' >> diagram. Remove same. >> >> Fixes: f6010c7 ("doc: add GSO programmer's guide") >> >> Signed-off-by: Mark Kavanagh > > > >> +.. _note: >> +.. note:: >> + >> + An application may use the same pool for both direct and indirect >> + buffers. However, since indirect mbufs simply store a pointer, the > >Hi Mark, Hey John, Thanks for the feedback - responses inline. I'll post v2 now, and carry forward your Ack. Cheers, Mark > >The _note target should be more specific since the these targets are used >across all the documentation. So something like _gso_note would be better. >Or alternatively leave out the target and just say something in the text >like "See the note below.", without the link, since the note is quite close >to the referring text. I've moved the note back to its original position, but corrected the indentation, so this target can be removed. > >Also, the note isn't indented to the level of the bullet point. I don't know >if that is intentional or not but I would say there should be an additional >3 spaces indentation from ..note:: to the end of the note. As above (the syntax highlighting of my editor shows indented 'note' tags as incorrect, which is why indentation was incorrect in the first place!). > >With or without these changes: > >Acked-by: John McNamara > >
Re: [dpdk-dev] [PATCH v7 5/6] igb_uio: use kernel functions for masking MSI-X
>From: Yigit, Ferruh >Sent: Friday, October 13, 2017 3:50 AM >To: Patrick MacArthur ; Markus Theil >; dev@dpdk.org; Kavanagh, Mark B > >Cc: step...@networkplumber.org; Bob Noseworthy ; Patrick >MacArthur ; O'Driscoll, Tim >Subject: Re: [dpdk-dev] [PATCH v7 5/6] igb_uio: use kernel functions for >masking MSI-X > >On 10/12/2017 6:04 PM, Ferruh Yigit wrote: >> On 10/9/2017 10:56 PM, Patrick MacArthur wrote: >>> Hi, Markus, >>> >>> This commit appears to cause a regression on CentOS 7.4 with the in-box >>> Linux 3.10.0-639-2.2.el7.x86_64 kernel. Although the kernel module >>> appears to build correctly, when I attempt to load the module with >>> insmod, it fails and I see the following errors in dmesg: >>> >>>> [620323.805125] igb_uio: Unknown symbol unmask_msi_irq (err 0) >>>> [620323.805163] igb_uio: Unknown symbol mask_msi_irq (err 0) >>> >> >> Hi Patrick, Mark, >> >> Thanks for reporting, I will check it. > >Can you please test http://dpdk.org/dev/patchwork/patch/30325/ Hey Ferruh, I can confirm that this patch resolves the build issues on Fedora 20, kernel 3.19.8-100.fc20.x86_64 - thanks! Incidentally, there is a minor warning reported when applying the patch: Applying: igb_uio: fix unknown symbols /home//.git/rebase-apply/patch:47: new blank line at EOF. + Thanks again, Mark > >> >> >>> It also fails with the same dmesg errors if I copy it into >>> /lib/modules/$(uname -r)/extra, run depmod, and try to modprobe it. >>> >>> Running git bisect points to this commit as the root cause. >>> >>> This issue was identified as part of setting up the DPDK performance >>> test lab CI environment at the University of New Hampshire >>> InterOperability Laboratory. >>> >>> Thanks, >>> Patrick >>> >> <...> >>
Re: [dpdk-dev] [PATCH v3 01/19] Revert "vhost: workaround MQ fails to startup"
Hi Maxime, First off, apologies for the lateness of this reply - I realize that this patch has already been upstreamed. Unfortunately, during OvS-DPDK regression testing for DPDK v17.11-rc2 just today, a regression involving vHost multiq was detected, and pinpointed to this patch. Version info for the components involved during the aforementioned testing is as follows: DPDK: v17.11-rc2 OvS:af2e40c ("sparse: eliminate "duplicate initialization") + DPDK v17.11 upgrade patch QEMU: v2.7.0 The regression may be reproduced as follows: - Set up OvS-DPDK as normal, and add one vHostUser client port: $OVS_DIR/utilities/ovs-vsctl add-port br0 vhost-user0 -- set Interface vhost-user0 type=dpdkvhostuserclient - Start QEMU, specifying 2 ports for the guest interface" $QEMU_DIR/x86_64-softmmu/qemu-system-x86_64 \ -cpu host -enable-kvm -m 4096M \ -object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on \ -numa node,memdev=mem -mem-prealloc \ -drive file=$VM_IMAGE \ -chardev socket,id=char0,path=/tmp/sock0,server \ -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce,queues=2 \ -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=off,mq=on,vectors=6 \ -nographic" - The guest subsequently starts as normal, but then hangs completely, rendering it completely unusable. The last lines of ovs-vswitchd.log read as follows: |00051|dpdk|INFO|VHOST_CONFIG: read message VHOST_USER_SET_VRING_ENABLE |00052|dpdk|INFO|VHOST_CONFIG: set queue enable: 1 to qp idx: 1 |00053|dpdk|INFO|VHOST_CONFIG: read message VHOST_USER_SET_FEATURES Moving from QEMU v2.7.0 to v2.10.0 resolves the issue. However, herein lies the issue: QEMU v2.10.0 was only released in August of this year; anecdotally, we know that many OvS-DPDK customers use older versions of QEMU (typically, v2.7.0), and are likely un[able|willing] to move. With this patch, a hard dependency on QEMU v2.10 is created for users who want to use the vHU multiq feature in DPDK v17.11 (and subsequently, the upcoming OvS v2.9.0), which IMO will likely be unacceptable for many. One potential solution to this problem is to introduce a compile-time option that would allow the user to [dis|en]able the VHOST_USER_PROTOCOL_F_REPLY_ACK feature - is that something that would be acceptable to you Maxime? Thanks in advance, Mark >From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Maxime Coquelin >Sent: Thursday, October 5, 2017 9:36 AM >To: dev@dpdk.org; Horton, Remy ; Bie, Tiwei >; y...@fridaylinux.org >Cc: m...@redhat.com; jfrei...@redhat.com; vkapl...@redhat.com; >jasow...@redhat.com; Maxime Coquelin >Subject: [dpdk-dev] [PATCH v3 01/19] Revert "vhost: workaround MQ fails to >startup" > >This reverts commit 04d81227960b5c1cf2f11f492100979ead20c526. > >As agreed when this workaround was introduced, it can be reverted >as Qemu v2.10 that fixes the issue is now out. > >The reply-ack feature is required for vhost-user IOMMU support. > >Signed-off-by: Maxime Coquelin >--- > lib/librte_vhost/vhost_user.h | 6 +- > 1 file changed, 1 insertion(+), 5 deletions(-) > >diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h >index 35ebd7190..2ba22dbb0 100644 >--- a/lib/librte_vhost/vhost_user.h >+++ b/lib/librte_vhost/vhost_user.h >@@ -49,14 +49,10 @@ > #define VHOST_USER_PROTOCOL_F_REPLY_ACK 3 > #define VHOST_USER_PROTOCOL_F_NET_MTU 4 > >-/* >- * disable REPLY_ACK feature to workaround the buggy QEMU implementation. >- * Proved buggy QEMU includes v2.7 - v2.9. >- */ > #define VHOST_USER_PROTOCOL_FEATURES ((1ULL << VHOST_USER_PROTOCOL_F_MQ) | \ >(1ULL << > VHOST_USER_PROTOCOL_F_LOG_SHMFD) |\ >(1ULL << VHOST_USER_PROTOCOL_F_RARP) | > \ >- (0ULL << >VHOST_USER_PROTOCOL_F_REPLY_ACK) | \ >+ (1ULL << >VHOST_USER_PROTOCOL_F_REPLY_ACK) | \ >(1ULL << > VHOST_USER_PROTOCOL_F_NET_MTU)) > > typedef enum VhostUserRequest { >-- >2.13.6
Re: [dpdk-dev] [PATCH v3 01/19] Revert "vhost: workaround MQ fails to startup"
>From: Thomas Monjalon [mailto:tho...@monjalon.net] >Sent: Friday, November 3, 2017 3:35 PM >To: Kavanagh, Mark B ; y...@fridaylinux.org >Cc: dev@dpdk.org; Maxime Coquelin ; Horton, Remy >; Bie, Tiwei ; m...@redhat.com; >jfrei...@redhat.com; vkapl...@redhat.com; jasow...@redhat.com; Mcnamara, John >; Loftus, Ciara ; Stokes, Ian > >Subject: Re: [dpdk-dev] [PATCH v3 01/19] Revert "vhost: workaround MQ fails to >startup" > >02/11/2017 10:40, Maxime Coquelin: >> Hi Mark, >> >> On 11/01/2017 06:11 PM, Kavanagh, Mark B wrote: >> > Hi Maxime, >> > >> > First off, apologies for the lateness of this reply - I realize that this >patch has already been upstreamed. >> >> No worries, great to see DPDK integration being tested before v17.11 is >> released. Is the v17.11 upgrade patch available somewhere? >> >> > Unfortunately, during OvS-DPDK regression testing for DPDK v17.11-rc2 just >today, a regression involving vHost multiq was detected, and pinpointed to >this patch. >> > >> > Version info for the components involved during the aforementioned testing >is as follows: >> > DPDK: v17.11-rc2 >> > OvS: af2e40c ("sparse: eliminate "duplicate initialization") + DPDK >v17.11 upgrade patch >> > QEMU: v2.7.0 >[...] >> > Moving from QEMU v2.7.0 to v2.10.0 resolves the issue. However, herein >lies the issue: QEMU v2.10.0 was only released in August of this year; >anecdotally, we know that many OvS-DPDK customers use older versions of QEMU >(typically, v2.7.0), and are likely un[able|willing] to move. With this patch, >a hard dependency on QEMU v2.10 is created for users who want to use the vHU >multiq feature in DPDK v17.11 (and subsequently, the upcoming OvS v2.9.0), >which IMO will likely be unacceptable for many. >> >> Do you mean that upstream Qemu v2.7.0 is used in production? >> I would expect the customers to use a distro Qemu which should contain >> relevant fixes, or follow upstream's stable branches. > To be honest, I don't have hard data to back this up, apart from anecdotal reports that "some customers use 'older' versions of QEMU". I understand that this is not the most solid foundation upon which to build an argument. >Me too, I would expect they integrate the fixes. > >> FYI, Qemu v2.9.1 contains a backport of the fix. > >But you know, some users do not want to upgrade anything in production, >as in the old time of hardware networking equipments. >Curiously, the case considered here seems to be users sticked to old Qemu >while willing to consider the switch as an upgradable software. >It is really really strange, but let's consider such constraint. > >If I remember well, we have integrated the vhost multiqueue feature >as soon as a Qemu release was almost ready. >For the record, it was Qemu 2.5.0 (released in Dec 2015): > https://lists.nongnu.org/archive/html/qemu-devel/2015-12/msg02731.html > https://wiki.qemu.org/ChangeLog/2.5#virtio >And it was supported in DPDK 2.2.0 (released one day before): > http://dpdk.org/ml/archives/announce/2015-December/73.html > http://dpdk.org/doc/guides/rel_notes/release_2_2.html Hmm, that's an interesting data point. > >Nowadays, Qemu 2.10 is ready to let us enable IOMMU support. >But you ask to wait more. How much time should we wait? >Is there a policy to agree or are we just waiting for someone to approve? I'm afraid that I don't have an answer for this. My concern here was purely proactive - however, without concrete data to back it up, and in the light of Thomas' point regarding 2.5.0/DPDK 2.2.0, perhaps my concerns are unfounded. It may be sufficient therefore, to simply document compatible versions of QEMU as part of the OvS documentation. Thanks to all involved for your input, Mark
Re: [dpdk-dev] [PATCH] vhost: disable reply-ack protocol feature if iommu feature disabled
>From: Maxime Coquelin [mailto:maxime.coque...@redhat.com] >Sent: Friday, November 3, 2017 5:58 PM >To: dev@dpdk.org; y...@fridaylinux.org; Kavanagh, Mark B >; tho...@monjalon.net; ktray...@redhat.com >Cc: Maxime Coquelin >Subject: [PATCH] vhost: disable reply-ack protocol feature if iommu feature >disabled > >If the application has disabled VIRTIO_F_IOMMU_PLATFORM, disable >VHOST_USER_PROTOCOL_F_REPLY_ACK protocol feature that is only >mandatory with IOMMU for now. > >This is done to provide a way for the application to support >multiqueue with old Qemu versions (v2.7.0 to v2.9.0) that have >reply-ack feature broken. > >Signed-off-by: Maxime Coquelin >--- > >This is an alternative to my proposition of adding a new flag at >vhost register time. Advantage of this solution is that it does >not bring API change. > > lib/librte_vhost/vhost_user.c | 24 ++-- > 1 file changed, 22 insertions(+), 2 deletions(-) > >diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c >index 1f6cba4b9..e35218688 100644 >--- a/lib/librte_vhost/vhost_user.c >+++ b/lib/librte_vhost/vhost_user.c >@@ -878,6 +878,27 @@ vhost_user_set_vring_enable(struct virtio_net **pdev, > } > > static void >+vhost_user_get_protocol_features(struct virtio_net *dev, >+ struct VhostUserMsg *msg) >+{ >+ uint64_t features, protocol_features = VHOST_USER_PROTOCOL_FEATURES; >+ >+ rte_vhost_driver_get_features(dev->ifname, &features); >+ >+ /* >+ * REPLY_ACK protocol feature is only mandatory for now >+ * for IOMMU feature. If IOMMU is explicitly disabled by the >+ * application, disable also REPLY_ACK feature for older buggy >+ * Qemu versions (from v2.7.0 to v2.9.0). >+ */ >+ if (!(features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))) Hi Maxime, Thanks for this patch - I like this approach, as it maintains API compatibility. Now for the gotchas, unfortunately: - VIRTIO_F_IOMMU_PLATFORM is defined in vhost.h, as opposed to rte_vhost.h, so it is not exposed to OvS :( - Currently, we use other similar macros in OvS (VIRTIO_NET_F_CSUM, VIRTIO_NET_F_HOST_TSO[4|6]); however, we obtain the definition of these from a kernel header file, as opposed to a DPDK header (linux/virtio_net.h) -- We could adopt the same approach for VIRTIO_F_IOMMU_PLATFORM, and include linux/virtio_config.h; unfortunately, the VIRTIO_F_IOMMU_PLATFORM macro is only defined from kernel 4.8, which creates another problem entirely. One potential solution is to move the VIRTIO_* definitions to rte_vhost.h, but at this stage in the DPDK release cycle, that's probably a tall ask. Any thoughts? Thanks again, Mark >+ protocol_features &= ~(1ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK); >+ >+ msg->payload.u64 = protocol_features; >+ msg->size = sizeof(msg->payload.u64); >+} >+ >+static void > vhost_user_set_protocol_features(struct virtio_net *dev, >uint64_t protocol_features) > { >@@ -1248,8 +1269,7 @@ vhost_user_msg_handler(int vid, int fd) > break; > > case VHOST_USER_GET_PROTOCOL_FEATURES: >- msg.payload.u64 = VHOST_USER_PROTOCOL_FEATURES; >- msg.size = sizeof(msg.payload.u64); >+ vhost_user_get_protocol_features(dev, &msg); > send_vhost_reply(fd, &msg); > break; > case VHOST_USER_SET_PROTOCOL_FEATURES: >-- >2.13.6
Re: [dpdk-dev] [PATCH v2 0/3] vhost: disable iommu support by default
>From: Maxime Coquelin [mailto:maxime.coque...@redhat.com] >Sent: Monday, November 6, 2017 8:38 PM >To: dev@dpdk.org; y...@fridaylinux.org; Kavanagh, Mark B >; tho...@monjalon.net; ktray...@redhat.com >Cc: Maxime Coquelin >Subject: [PATCH v2 0/3] vhost: disable iommu support by default > >This series disables IOMMU feature by default, and introduce >a new flag passed at vhost device registration time to enable >it explicitly. > >When disabled, patch 1 also disables reply-ack protocol feature >to avoid Qemu v2.7.0-v2.9.0 reply-ack bug with multiqueue. > >Last patch adds a Vhost PMD "iommu-support" parameter to enable >the IOMMU feature. Hi Maxime, I'm happy to confirm that this patchset resolves the vhost user mutltiq issue for OvS-DPDK, with QEMU v2.7.1. Additionally, all of the individual patches look good - thanks for all of your efforts on this! Tested-by: Mark Kavanagh Acked-by: Mark Kavanagh Cheers, Mark > >Maxime Coquelin (3): > vhost: disable reply-ack protocol feature if iommu feature disabled > vhost: add flag to enable iommu support > net: vhost: add iommu-support parameter to enable IOMMU feature > > doc/guides/nics/vhost.rst | 5 + > doc/guides/prog_guide/vhost_lib.rst| 14 ++ > doc/guides/rel_notes/release_17_11.rst | 3 ++- > drivers/net/vhost/rte_eth_vhost.c | 13 + > lib/librte_vhost/rte_vhost.h | 1 + > lib/librte_vhost/socket.c | 6 ++ > lib/librte_vhost/vhost_user.c | 24 ++-- > 7 files changed, 63 insertions(+), 3 deletions(-) > >-- >2.13.6
Re: [dpdk-dev] [PATCH v2 0/3] vhost: disable iommu support by default
>From: Maxime Coquelin [mailto:maxime.coque...@redhat.com] >Sent: Tuesday, November 7, 2017 11:05 AM >To: Kavanagh, Mark B ; dev@dpdk.org; >y...@fridaylinux.org; tho...@monjalon.net; ktray...@redhat.com >Subject: Re: [PATCH v2 0/3] vhost: disable iommu support by default > >Hi Mark, > >On 11/07/2017 11:56 AM, Kavanagh, Mark B wrote: >>> From: Maxime Coquelin [mailto:maxime.coque...@redhat.com] >>> Sent: Monday, November 6, 2017 8:38 PM >>> To: dev@dpdk.org; y...@fridaylinux.org; Kavanagh, Mark B >>> ; tho...@monjalon.net; ktray...@redhat.com >>> Cc: Maxime Coquelin >>> Subject: [PATCH v2 0/3] vhost: disable iommu support by default >>> >>> This series disables IOMMU feature by default, and introduce >>> a new flag passed at vhost device registration time to enable >>> it explicitly. >>> >>> When disabled, patch 1 also disables reply-ack protocol feature >>> to avoid Qemu v2.7.0-v2.9.0 reply-ack bug with multiqueue. >>> >>> Last patch adds a Vhost PMD "iommu-support" parameter to enable >>> the IOMMU feature. >> >> Hi Maxime, >> >> I'm happy to confirm that this patchset resolves the vhost user mutltiq >issue for OvS-DPDK, with QEMU v2.7.1. > >Thanks for the testing. > >> Additionally, all of the individual patches look good - thanks for all of >your efforts on this! > >Great. >Now, what is required on OVS side is the introduction of a new vhost >port option to enable IOMMU support, so that management layer has a way >to enable it when VM has an iommu placed in front of the virtio device. Sounds good - I can add this as part of the DPDK v17.11 upgrade patch :) Thanks again, Mark > >Note that OVS can set the flag even if no IOMMU is present, as Virtio >feature negotiation will manage this. > >> Tested-by: Mark Kavanagh >> Acked-by: Mark Kavanagh > >Thanks, >Maxime > >> Cheers, >> Mark >> >>> >>> Maxime Coquelin (3): >>> vhost: disable reply-ack protocol feature if iommu feature disabled >>> vhost: add flag to enable iommu support >>> net: vhost: add iommu-support parameter to enable IOMMU feature >>> >>> doc/guides/nics/vhost.rst | 5 + >>> doc/guides/prog_guide/vhost_lib.rst| 14 ++ >>> doc/guides/rel_notes/release_17_11.rst | 3 ++- >>> drivers/net/vhost/rte_eth_vhost.c | 13 + >>> lib/librte_vhost/rte_vhost.h | 1 + >>> lib/librte_vhost/socket.c | 6 ++ >>> lib/librte_vhost/vhost_user.c | 24 ++-- >>> 7 files changed, 63 insertions(+), 3 deletions(-) >>> >>> -- >>> 2.13.6 >>
[dpdk-dev] mbuf: how to set data to NULL?
Hi Bruce, As a follow-on to my previous question: I suppose what I'm really getting at is trying to understand the implications of removing the data pointer, and determine if it's possible to replicate behavior observed in DPDK 1.7 (which we need in our use case). Take this situation for example: DPDK 1.7: I want to set an mbuf's data to NULL: => buf.data = NULL; Then, when I subsequently attempt to access the mbuf' data section, rte_pktmbuf_mtod(buf) returns NULL DPDK 1.8: I want to set an mbuf's data to NULL: => buf.data_off = 0; (is this correct?) Then, if I attempt to access the mbuf's data, instead of NULL, rte_pktmbuf_mtod(buf) returns buf_addr, not NULL. Is it possible in DPDK 1.8 to replicate the same behavior observed in 1.7? Btw, in our use case a data_len of 0 doesn't necessarily indicate a data value of NULL. Thanks, Mark > -Original Message- > From: Richardson, Bruce > Sent: Wednesday, December 17, 2014 4:50 PM > To: Kavanagh, Mark B > Cc: dev at dpdk.org > Subject: Re: [dpdk-dev] mbuf: how to set data to NULL? > > On Wed, Dec 17, 2014 at 04:44:15PM +, Kavanagh, Mark B wrote: > > Hi, > > > > DPDK 1.8.0 removes the data pointer from the mbuf structure, such that the > > start of the > data in the segment buffer must be calculated (i.e. buf_addr + data_off = > 'data'). > > > > Given this, what is the best approach to set the mbuf data to NULL > > (previously mbuf.data > = NULL)? > > > > As I see it, given an initialized mbuf, such that buf_addr is non-null, and > > data_off > =RTE_PKTMBUF_HEADROOM, is it fair to say that the best solution is to memset > to 0 from > location (buf_addr + data_off) for a length of (data_len - data_off)? > > > > Thanks in advance, > > Mark > > Why not just set data_len = 0 to indicate an empty mbuf?
[dpdk-dev] mbuf: how to set data to NULL?
Hi Bruce, I figured as much, thanks for confirming. We'll probably go with a flag. Thanks again, Mark > -Original Message- > From: Richardson, Bruce > Sent: Monday, February 9, 2015 12:59 PM > To: Kavanagh, Mark B > Cc: dev at dpdk.org > Subject: Re: [dpdk-dev] mbuf: how to set data to NULL? > > On Mon, Feb 09, 2015 at 10:51:36AM +, Kavanagh, Mark B wrote: > > Hi Bruce, > > > > As a follow-on to my previous question: I suppose what I'm really getting > > at is trying > to understand the implications of removing the data pointer, and determine if > it's > possible to replicate behavior observed in DPDK 1.7 (which we need in our use > case). > > > > Take this situation for example: > > > > DPDK 1.7: I want to set an mbuf's data to NULL: > > => buf.data = NULL; > > Then, when I subsequently attempt to access the mbuf' data section, > rte_pktmbuf_mtod(buf) returns NULL > > > > DPDK 1.8: I want to set an mbuf's data to NULL: > > => buf.data_off = 0; (is this correct?) > > Then, if I attempt to access the mbuf's data, instead of NULL, > rte_pktmbuf_mtod(buf) returns buf_addr, not NULL. > > > > Is it possible in DPDK 1.8 to replicate the same behavior observed in 1.7? > > > > Btw, in our use case a data_len of 0 doesn't necessarily indicate a data > > value of NULL. > > > > Thanks, > > Mark > > > > I don't think there is any way to replicate this behaviour exactly with the > new mbuf > structure. Memsetting zero may do what you want, but depending upon what the > meaning of an mbuf with NULL data is there may still be better ways to > indicate > such a thing e.g. a flag value in another field, or setting data_len to -1? > > /Bruce > > > > > > -Original Message- > > > From: Richardson, Bruce > > > Sent: Wednesday, December 17, 2014 4:50 PM > > > To: Kavanagh, Mark B > > > Cc: dev at dpdk.org > > > Subject: Re: [dpdk-dev] mbuf: how to set data to NULL? > > > > > > On Wed, Dec 17, 2014 at 04:44:15PM +, Kavanagh, Mark B wrote: > > > > Hi, > > > > > > > > DPDK 1.8.0 removes the data pointer from the mbuf structure, such that > > > > the start of > the > > > data in the segment buffer must be calculated (i.e. buf_addr + data_off = > > > 'data'). > > > > > > > > Given this, what is the best approach to set the mbuf data to NULL > > > > (previously > mbuf.data > > > = NULL)? > > > > > > > > As I see it, given an initialized mbuf, such that buf_addr is non-null, > > > > and data_off > > > =RTE_PKTMBUF_HEADROOM, is it fair to say that the best solution is to > > > memset to 0 from > > > location (buf_addr + data_off) for a length of (data_len - data_off)? > > > > > > > > Thanks in advance, > > > > Mark > > > > > > Why not just set data_len = 0 to indicate an empty mbuf?
[dpdk-dev] rte_memcpy.h: additional cflags required with OVS
Hi, Compilation of Open vSwitch fails when linked against current HEAD of DPDK (f2552cd5). The source of this issue appears to be commit ID 9144d6b: "eal/x86: optimize memcpy for SSE and AVX", and can be resolved by passing an additional argument to OVS when building same (CFLAGS="-march=native"). It seems that without this flag, OVS doesn't pick up one or more SSE #defines in DPDK, and doesn't include a relevant intrinsic header (emmintrin.h), leading to an 'implicit declaration' error for instrinsic '_mm_storeu_si128'. Has anyone else observed this behavior? Thanks, Mark
[dpdk-dev] rte_memcpy.h: additional cflags required with OVS
>-Original Message- >From: Mcnamara, John >Sent: Tuesday, March 10, 2015 12:57 AM >To: Mcnamara, John; Kavanagh, Mark B; dev at dpdk.org >Subject: RE: rte_memcpy.h: additional cflags required with OVS > >> -Original Message- >> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Mcnamara, John >> Sent: Monday, March 9, 2015 5:51 PM >> To: Kavanagh, Mark B; dev at dpdk.org >> Subject: Re: [dpdk-dev] rte_memcpy.h: additional cflags required with OVS > > >>> In the meantime the following might work for OVS: >>> >>> $ ./configure CFLAGS='-Wno-bad-function-cast -march=native' --with- >>> dpdk=$DPDK_BUILD >>> $ make >>> >>> > > >Hi, > >It will also need a patch like the following to netdev-dpdk to account for >changes in the >RSS flags: > Thanks John, both myself and Panu had already caught this :) >$ git diff >diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c >index 1ba8310..90dd06f 100644 >--- a/lib/netdev-dpdk.c >+++ b/lib/netdev-dpdk.c >@@ -97,8 +97,7 @@ static const struct rte_eth_conf port_conf = { > .rx_adv_conf = { > .rss_conf = { > .rss_key = NULL, >-.rss_hf = ETH_RSS_IPV4_TCP | ETH_RSS_IPV4 | ETH_RSS_IPV6 >-| ETH_RSS_IPV4_UDP | ETH_RSS_IPV6_TCP | ETH_RSS_IPV6_UDP, >+.rss_hf = ETH_RSS_IP | ETH_RSS_UDP | ETH_RSS_TCP, > }, > }, > .txmode = { > >
[dpdk-dev] rte_memcpy.h: additional cflags required with OVS
>-Original Message- >From: Mcnamara, John >Sent: Tuesday, March 10, 2015 8:27 AM >To: Qiu, Michael; Kavanagh, Mark B; dev at dpdk.org; Panu Matilainen >Subject: RE: [dpdk-dev] rte_memcpy.h: additional cflags required with OVS > >> -Original Message- >> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Qiu, Michael >> Sent: Tuesday, March 10, 2015 3:05 AM >> To: Kavanagh, Mark B; dev at dpdk.org >> Subject: Re: [dpdk-dev] rte_memcpy.h: additional cflags required with OVS >> > >> What's your gcc version? this should be an issue with old version gcc, and >> I'm working on this to solve this issue now. > > >Hi Michael, > >I see the issue with gcc 4.7.2 but not with 4.9.2. I'm using gcc v4.8.3. Just to clarify my initial post, there are two issues related to gcc intrinsic headers emmintrin.h, and tmmintrin.h: - in former, a difference in parameter types for _mm_storeu_si128 is the issue. This is the primary issue observed. - in tmmintrin.h, when __OPTIMIZE__ is not defined, function _mm_alignr_epi8 is also not defined, leading to an 'implicit definition of function' error. I've only noticed this intermittently (even though I compile OVS with -O2 CFLAGS) > >John
[dpdk-dev] [PATCH v2 1/4] mk: Remove combined library and related options
>--- > config/common_bsdapp| 6 -- > config/common_linuxapp | 6 -- > config/defconfig_ppc_64-power8-linuxapp-gcc | 2 - > lib/Makefile| 1 - > mk/rte.app.mk | 12 > mk/rte.lib.mk | 35 -- > mk/rte.sdkbuild.mk | 3 - > mk/rte.sharelib.mk | 101 > mk/rte.vars.mk | 9 --- > 9 files changed, 175 deletions(-) > delete mode 100644 mk/rte.sharelib.mk > >diff --git a/config/common_bsdapp b/config/common_bsdapp >index 8ff4dc2..7ee5ecf 100644 >--- a/config/common_bsdapp >+++ b/config/common_bsdapp >@@ -79,12 +79,6 @@ CONFIG_RTE_FORCE_INTRINSICS=n > CONFIG_RTE_BUILD_SHARED_LIB=n > > # >-# Combine to one single library >-# >-CONFIG_RTE_BUILD_COMBINE_LIBS=n >-CONFIG_RTE_LIBNAME=intel_dpdk Hi Sergio, Removing these options breaks compatibility with OVS. While it may be feasible to link to individual static libraries, in our experience, a single combined library provides a much more convenient way of linking. Thanks, Mark >- >-# > # Compile Environment Abstraction Layer > # > CONFIG_RTE_LIBRTE_EAL=y >diff --git a/config/common_linuxapp b/config/common_linuxapp >index 97f1c9e..ae13805 100644 >--- a/config/common_linuxapp >+++ b/config/common_linuxapp >@@ -79,12 +79,6 @@ CONFIG_RTE_FORCE_INTRINSICS=n > CONFIG_RTE_BUILD_SHARED_LIB=n > > # >-# Combine to one single library >-# >-CONFIG_RTE_BUILD_COMBINE_LIBS=n >-CONFIG_RTE_LIBNAME="intel_dpdk" >- >-# > # Compile Environment Abstraction Layer > # > CONFIG_RTE_LIBRTE_EAL=y >diff --git a/config/defconfig_ppc_64-power8-linuxapp-gcc >b/config/defconfig_ppc_64-power8- >linuxapp-gcc >index d97a885..f1af518 100644 >--- a/config/defconfig_ppc_64-power8-linuxapp-gcc >+++ b/config/defconfig_ppc_64-power8-linuxapp-gcc >@@ -39,8 +39,6 @@ CONFIG_RTE_ARCH_64=y > CONFIG_RTE_TOOLCHAIN="gcc" > CONFIG_RTE_TOOLCHAIN_GCC=y > >-CONFIG_RTE_LIBNAME="powerpc_dpdk" >- > # Note: Power doesn't have this support > CONFIG_RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT=n > >diff --git a/lib/Makefile b/lib/Makefile >index d94355d..c34cf2f 100644 >--- a/lib/Makefile >+++ b/lib/Makefile >@@ -77,5 +77,4 @@ DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni > DIRS-$(CONFIG_RTE_LIBRTE_IVSHMEM) += librte_ivshmem > endif > >-include $(RTE_SDK)/mk/rte.sharelib.mk > include $(RTE_SDK)/mk/rte.subdir.mk >diff --git a/mk/rte.app.mk b/mk/rte.app.mk >index 63a41e2..e2baa49 100644 >--- a/mk/rte.app.mk >+++ b/mk/rte.app.mk >@@ -61,12 +61,6 @@ ifeq ($(NO_AUTOLIBS),) > > LDLIBS += --whole-archive > >-ifeq ($(CONFIG_RTE_BUILD_COMBINE_LIBS),y) >-LDLIBS += -l$(RTE_LIBNAME) >-endif >- >-ifeq ($(CONFIG_RTE_BUILD_COMBINE_LIBS),n) >- > ifeq ($(CONFIG_RTE_LIBRTE_DISTRIBUTOR),y) > LDLIBS += -lrte_distributor > endif >@@ -137,8 +131,6 @@ ifeq ($(CONFIG_RTE_LIBRTE_VHOST), y) > LDLIBS += -lrte_vhost > endif > >-endif # ! CONFIG_RTE_BUILD_COMBINE_LIBS >- > ifeq ($(CONFIG_RTE_LIBRTE_PMD_PCAP),y) > LDLIBS += -lpcap > endif >@@ -153,8 +145,6 @@ endif > > LDLIBS += --start-group > >-ifeq ($(CONFIG_RTE_BUILD_COMBINE_LIBS),n) >- > ifeq ($(CONFIG_RTE_LIBRTE_KVARGS),y) > LDLIBS += -lrte_kvargs > endif >@@ -253,8 +243,6 @@ endif > > endif # plugins > >-endif # ! CONFIG_RTE_BUILD_COMBINE_LIBS >- > LDLIBS += $(EXECENV_LDLIBS) > > LDLIBS += --end-group >diff --git a/mk/rte.lib.mk b/mk/rte.lib.mk >index 0d7482d..d96101a 100644 >--- a/mk/rte.lib.mk >+++ b/mk/rte.lib.mk >@@ -87,24 +87,6 @@ O_TO_S_DO = @set -e; \ > $(O_TO_S) && \ > echo $(O_TO_S_CMD) > $(call exe2cmd,$(@)) > >-ifeq ($(RTE_BUILD_SHARED_LIB),n) >-O_TO_C = $(AR) crus $(LIB_ONE) $(OBJS-y) >-O_TO_C_STR = $(subst ','\'',$(O_TO_C)) #'# fix syntax highlight >-O_TO_C_DISP = $(if $(V),"$(O_TO_C_STR)"," AR_C $(@)") >-O_TO_C_DO = @set -e; \ >- $(lib_dir) \ >- $(copy_obj) >-else >-O_TO_C = $(LD) -shared $(OBJS-y) -o $(LIB_ONE) >-O_TO_C_STR = $(subst ','\'',$(O_TO_C)) #'# fix syntax highlight >-O_TO_C_DISP = $(if $(V),"$(O_TO_C_STR)"," LD_C $(@)") >-O_TO_C_DO = @set -e; \ >- $(lib_dir) \ >- $(copy_obj) >-endif >- >-copy_obj = cp -f $(OBJS-y) $(RTE_OUTPUT)/build/lib; >-lib_dir = [ -d $(RTE_OUTPUT)/lib ] || mkdir -p $(RTE_OUTPUT)/lib; > -include .$(LIB).cmd > > # >@@ -129,15 +111,6 @@ endif > $(depfile_missing),\ > $(depfile_newer)),\ > $(O_TO_S_DO)) >- >-ifeq ($(RTE_BUILD_COMBINE_LIBS),y) >- $(if $(or \ >-$(file_missing),\ >-$(call cmdline_changed,$(O_TO_C_STR)),\ >-$(depfile_missing),\ >-$(depfile_newer)),\ >-$(O_TO_C_DO)) >-endif > else > $(LIB): $(OBJS-y) $(DEP_$(LIB)) FORCE > @[ -d $(dir $@) ] || mkdir -p $(dir $@) >@@ -153,14 +126,6 @@ $(LIB): $(OBJS-y) $(DEP_$(LIB)) FORCE > $(depfile_missing),\ > $(depfile_newer)),\ > $(O_TO_A_DO)) >-ifeq ($(RTE_BUILD_COMBINE_LIBS),y) >- $(if $(or \ >
[dpdk-dev] [PATCH v2 1/4] mk: Remove combined library and related options
>On 13/03/2015 10:49, Kavanagh, Mark B wrote: >>> --- >>> config/common_bsdapp| 6 -- >>> config/common_linuxapp | 6 -- >>> config/defconfig_ppc_64-power8-linuxapp-gcc | 2 - >>> lib/Makefile| 1 - >>> mk/rte.app.mk | 12 >>> mk/rte.lib.mk | 35 -- >>> mk/rte.sdkbuild.mk | 3 - >>> mk/rte.sharelib.mk | 101 >>> >>> mk/rte.vars.mk | 9 --- >>> 9 files changed, 175 deletions(-) >>> delete mode 100644 mk/rte.sharelib.mk >>> >>> diff --git a/config/common_bsdapp b/config/common_bsdapp >>> index 8ff4dc2..7ee5ecf 100644 >>> --- a/config/common_bsdapp >>> +++ b/config/common_bsdapp >>> @@ -79,12 +79,6 @@ CONFIG_RTE_FORCE_INTRINSICS=n >>> CONFIG_RTE_BUILD_SHARED_LIB=n >>> >>> # >>> -# Combine to one single library >>> -# >>> -CONFIG_RTE_BUILD_COMBINE_LIBS=n >>> -CONFIG_RTE_LIBNAME=intel_dpdk >> Hi Sergio, >> >> Removing these options breaks compatibility with OVS. While it may be >> feasible to link >to individual static libraries, in our experience, a single combined library >provides a >much more convenient way of linking. >> >> Thanks, >> Mark >> >>> - (snip) >>> -endif >>> - >>> -RTE_LIBNAME := $(CONFIG_RTE_LIBNAME:"%"=%) >>> -ifeq ($(RTE_LIBNAME),) >>> -RTE_LIBNAME := intel_dpdk >>> endif >>> >>> # RTE_TARGET is deducted from config when we are building the SDK. >>> -- >>> 1.9.3 >Hi Mark, > >How does this patch break compatibility with OVS? > >Thanks, >Sergio Hey Sergio, We use the CONFIG_RTE_BUILD_COMBINE_LIBS and CONFIG_RTE_LINBNAME flags to build a single static DPDK library, named 'libintel_dpdk.a', which OVS links against. Removing the combined library option breaks compatibility with any application that links against the combined DPDK library. Is there a strong technical motivation for removing these options? Thanks, Mark
[dpdk-dev] [PATCH v3] ABI: Add abi checking utility
>-Original Message- >From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Neil Horman >Sent: Wednesday, March 4, 2015 4:27 PM >To: dev at dpdk.org >Subject: [dpdk-dev] [PATCH v3] ABI: Add abi checking utility > >There was a request for an abi validation utilty for the ongoing ABI stability >work. As it turns out there is a abi compliance checker in development that >seems to be under active development and provides fairly detailed ABI >compliance >reports. Its not yet intellegent enough to understand symbol versioning, but >it >does provide the ability to identify symbols which have changed between >releases, along with details of the change, and offers developers the >opportunity to identify which symbols then need versioning and validation for a >given update via manual testing. > >This script automates the use of the compliance checker between two arbitrarily >specified tags within the dpdk tree. To execute enter the $RTE_SDK directory >and run: > >./scripts/validate_abi.sh $GIT_TAG1 $GIT_TAG2 $CONFIG > >where $GIT_TAG1 and 2 are git tags and $CONFIG is a config specification >suitable for passing as the T= variable in the make config command. > >Note the upstream source for the abi compliance checker is here: >http://ispras.linuxbase.org/index.php/ABI_compliance_checker > >It generates a report for each DSO built from the requested tags that >developers >can review to find ABI compliance issues. > >Signed-off-by: Neil Horman > >--- > >Change Notes: > >v2) Fixed some typos as requested by Thomas > >v3) Fixed some additional typos Thomas requested >Improved script to work from detached state >Added some documentation to the changelog >Added some comments to the scripts >--- > scripts/validate_abi.sh | 248 > 1 file changed, 248 insertions(+) > create mode 100755 scripts/validate_abi.sh > >diff --git a/scripts/validate_abi.sh b/scripts/validate_abi.sh >new file mode 100755 >index 000..899cf5f >--- /dev/null >+++ b/scripts/validate_abi.sh >@@ -0,0 +1,248 @@ >+#!/bin/sh >+# BSD LICENSE >+# >+# Copyright(c) 2015 Neil Horman. All rights reserved. >+# All rights reserved. >+# >+# Redistribution and use in source and binary forms, with or without >+# modification, are permitted provided that the following conditions >+# are met: >+# >+# * Redistributions of source code must retain the above copyright >+# notice, this list of conditions and the following disclaimer. >+# * Redistributions in binary form must reproduce the above copyright >+# notice, this list of conditions and the following disclaimer in >+# the documentation and/or other materials provided with the >+# distribution. >+# >+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS >+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT >+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR >+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT >+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, >+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT >+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, >+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY >+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT >+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE >+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. >+ >+TAG1=$1 >+TAG2=$2 >+TARGET=$3 >+ABI_DIR=`mktemp -d -p /tmp ABI.XX` >+ >+usage() { >+ echo "$0 " >+} >+ >+log() { >+ local level=$1 >+ shift >+ echo "$*" >+} >+ >+validate_tags() { >+ git tag -l | grep -q "$TAG1" >+ if [ $? -ne 0 ] >+ then >+ echo "$TAG1 is invalid" >+ return >+ fi >+ git tag -l | grep -q "$TAG2" >+ if [ $? -ne 0 ] >+ then >+ echo "$TAG2 is invalid" >+ return >+ fi >+} >+ >+validate_args() { >+ if [ -z "$TAG1" ] >+ then >+ echo "Must Specify TAG1" >+ return >+ fi >+ if [ -z "$TAG2" ] >+ then >+ echo "Must Specify TAG2" >+ return >+ fi >+ if [ -z "$TARGET" ] >+ then >+ echo "Must Specify a build target" >+ fi >+} >+ >+ >+cleanup_and_exit() { >+ rm -rf $ABI_DIR >+ exit $1 >+} >+ >+### >+#START >+ >+ >+#trap on ctrl-c to clean up >+trap cleanup_and_exit SIGINT >+ >+#Save the current branch >+CURRENT_BRANCH=`git branch | grep \* | cut -d' ' -f2` >+ >+if [ -z "$CURRENT_BRANCH" ] >+then >+ CURRENT_BRANCH=`git log --pretty=format:%H HEAD~1..HEAD` >+fi >+ >+if [ -n "$VERBOSE" ] >+then >+ export VERBOSE=/dev/stdout >+else >+ export VERBOSE=/dev/null >+fi >+
[dpdk-dev] [PATCH v2 1/4] mk: Remove combined library and related options
>-Original Message- >From: Gonzalez Monroy, Sergio >Sent: Friday, March 13, 2015 11:49 AM >To: Kavanagh, Mark B >Cc: dev at dpdk.org >Subject: Re: [dpdk-dev] [PATCH v2 1/4] mk: Remove combined library and related >options > >On 13/03/2015 11:34, Kavanagh, Mark B wrote: >>> On 13/03/2015 10:49, Kavanagh, Mark B wrote: >>>>> --- >>>>> config/common_bsdapp| 6 -- >>>>> config/common_linuxapp | 6 -- >>>>> config/defconfig_ppc_64-power8-linuxapp-gcc | 2 - >>>>> lib/Makefile| 1 - >>>>> mk/rte.app.mk | 12 >>>>> mk/rte.lib.mk | 35 -- >>>>> mk/rte.sdkbuild.mk | 3 - >>>>> mk/rte.sharelib.mk | 101 >>>>> >>>>> mk/rte.vars.mk | 9 --- >>>>> 9 files changed, 175 deletions(-) >>>>> delete mode 100644 mk/rte.sharelib.mk >>>>> >>>>> diff --git a/config/common_bsdapp b/config/common_bsdapp >>>>> index 8ff4dc2..7ee5ecf 100644 >>>>> --- a/config/common_bsdapp >>>>> +++ b/config/common_bsdapp >>>>> @@ -79,12 +79,6 @@ CONFIG_RTE_FORCE_INTRINSICS=n >>>>> CONFIG_RTE_BUILD_SHARED_LIB=n >>>>> >>>>> # >>>>> -# Combine to one single library >>>>> -# >>>>> -CONFIG_RTE_BUILD_COMBINE_LIBS=n >>>>> -CONFIG_RTE_LIBNAME=intel_dpdk >>>> Hi Sergio, >>>> >>>> Removing these options breaks compatibility with OVS. While it may be >>>> feasible to link >>> to individual static libraries, in our experience, a single combined >>> library provides a >>> much more convenient way of linking. >>>> Thanks, >>>> Mark >>>> >>>>> - >> >> (snip) >> >> >>>>> -endif >>>>> - >>>>> -RTE_LIBNAME := $(CONFIG_RTE_LIBNAME:"%"=%) >>>>> -ifeq ($(RTE_LIBNAME),) >>>>> -RTE_LIBNAME := intel_dpdk >>>>> endif >>>>> >>>>> # RTE_TARGET is deducted from config when we are building the SDK. >>>>> -- >>>>> 1.9.3 >>> Hi Mark, >>> >>> How does this patch break compatibility with OVS? >>> >>> Thanks, >>> Sergio >> Hey Sergio, >> >> We use the CONFIG_RTE_BUILD_COMBINE_LIBS and CONFIG_RTE_LINBNAME flags to >> build a single >static DPDK library, named 'libintel_dpdk.a', which OVS links against. >Removing the >combined library option breaks compatibility with any application that links >against the >combined DPDK library. >> >> Is there a strong technical motivation for removing these options? >> >> Thanks, >> Mark > From a shared library point of view, it just does not make sense to >have applications linked against a 'combined' library that may have >different features built in it. > For OVS, we don't build DPDK as a set of shared libraries, but rather an individual static library, due to the performance penalties inherent in using shared libraries. >Removing these options, aside from the obvious 'less build config >option', it simplifies maintenance of makefiles as we currently have a >separated makefile with specific rules just for combined library. > >It is pretty straight forward to build a single combined archive out of >multiple archives, would it be acceptable to have a script to do this? > This seems a bit 'hacky' to me and I'm not sure that it would be amenable to the OVS maintainers. Unless I'm overlooking something here, I'd prefer to maintain the status quo. >Thanks, >Sergio
[dpdk-dev] [PATCH v3] ABI: Add abi checking utility
>On Fri, Mar 13, 2015 at 11:56:59AM +0000, Kavanagh, Mark B wrote: >> >> >> >-Original Message- >> >From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Neil Horman >> >Sent: Wednesday, March 4, 2015 4:27 PM >> >To: dev at dpdk.org >> >Subject: [dpdk-dev] [PATCH v3] ABI: Add abi checking utility >> > (snip) >> >+log "INFO" "Building DPDK $TAG1. This might take a moment" >> >+make O=$TARGET > $VERBOSE 2>&1 >> >+ >> >+if [ $? -ne 0 ] >> >+then >> >+ log "INFO" "THE BUILD FAILED. ABORTING" >> >> If the build fails while TAG1 is checked out, the user must check out their >> original >local branch manually. I'd prefer it if the script checked out $CURRENT_BRANCH >in the >'cleanup_and_exit' function. >> >Sure, its in V4. Cool. > >> Same applies to TAG2, if the user CTRL-C's out of the script, and to any >> other command >that might fail when a particular branch/tag is checked out (for example, the >'sed' >commands fail when I run the script; however, they work when I run them on the >command >line - I'm investigating this currently). >> >What does the log say? Please post it here. If it helps add a set -x to the >top of the script for additional verbosity. > Hey Neil - this is the error, but it's not a problem with the script; presumably I'd cleaned my DPDK installation directory, so 'sed' couldn't find the defconfig file: "sed: can't read config/defconfig_x86_64-ivshmem-linuxapp-gcc/: Not a directory" Thanks, Mark >Neil
[dpdk-dev] [PATCH v3] ABI: Add abi checking utility
>-Original Message- >From: Neil Horman [mailto:nhorman at tuxdriver.com] >Sent: Friday, March 13, 2015 2:59 PM >To: Kavanagh, Mark B >Cc: dev at dpdk.org >Subject: Re: [dpdk-dev] [PATCH v3] ABI: Add abi checking utility > >On Fri, Mar 13, 2015 at 02:25:17PM +, Kavanagh, Mark B wrote: >> >On Fri, Mar 13, 2015 at 11:56:59AM +, Kavanagh, Mark B wrote: >> >> >> >> >> >> >-Original Message- >> >> >From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Neil Horman >> >> >Sent: Wednesday, March 4, 2015 4:27 PM >> >> >To: dev at dpdk.org >> >> >Subject: [dpdk-dev] [PATCH v3] ABI: Add abi checking utility >> >> > >> >> >> (snip) >> >> >> >+log "INFO" "Building DPDK $TAG1. This might take a moment" >> >> >+make O=$TARGET > $VERBOSE 2>&1 >> >> >+ >> >> >+if [ $? -ne 0 ] >> >> >+then >> >> >+log "INFO" "THE BUILD FAILED. ABORTING" >> >> >> >> If the build fails while TAG1 is checked out, the user must check out >> >> their original >> >local branch manually. I'd prefer it if the script checked out >> >$CURRENT_BRANCH in the >> >'cleanup_and_exit' function. >> >> >> >Sure, its in V4. >> >> Cool. >> >> > >> >> Same applies to TAG2, if the user CTRL-C's out of the script, and to any >> >> other >command >> >that might fail when a particular branch/tag is checked out (for example, >> >the 'sed' >> >commands fail when I run the script; however, they work when I run them on >> >the command >> >line - I'm investigating this currently). >> >> >> >What does the log say? Please post it here. If it helps add a set -x to >> >the >> >top of the script for additional verbosity. >> > >> >> Hey Neil - this is the error, but it's not a problem with the script; >> presumably I'd >cleaned my DPDK installation directory, so 'sed' couldn't find the defconfig >file: >> "sed: can't read config/defconfig_x86_64-ivshmem-linuxapp-gcc/: Not a >> directory" >> >Actually, it looks to me like you added a trailing "/" to the end of the third >argument on the script command line, so sed bombs when it tries to modify a >directory instead of a file. Try specifying: >x86_64-ivshmem-linuxapp-gcc >instead of >x86_64-ivshmem-linuxapp-gcc/ > >Neil Nice catch - thanks! > >> Thanks, >> Mark >> >> >Neil >> >>
[dpdk-dev] [PATCH] doc: fix vhost setup in tep-termination app guide
Please disregard this patch - I'll push it again, this time with the correct subject line format. Apologies - Mark. > >- Fix vhost setup flags >- Add minor edits to improve readability and consistency > >Signed-off-by: Mark Kavanagh >--- > doc/guides/sample_app_ug/tep_termination.rst | 8 > 1 file changed, 4 insertions(+), 4 deletions(-) > mode change 100644 => 100755 doc/guides/sample_app_ug/tep_termination.rst > >diff --git a/doc/guides/sample_app_ug/tep_termination.rst >b/doc/guides/sample_app_ug/tep_termination.rst >old mode 100644 >new mode 100755 >index 2d86a03..c3d1e97 >--- a/doc/guides/sample_app_ug/tep_termination.rst >+++ b/doc/guides/sample_app_ug/tep_termination.rst >@@ -59,8 +59,8 @@ This allows network isolation, QOS, etc to be provided on a >per client >basis. > In a typical setup, the network overlay tunnel is terminated at the > Virtual/Tunnel End Point >(VEP/TEP). > The TEP is normally located at the physical host level ideally in the > software switch. > Due to processing constraints and the inevitable bottleneck that the switch >-becomes the ability to offload overlay support features becomes an important >requirement. >-Intel? XL710 10/40 G Ethernet network card provides hardware filtering >+becomes, the ability to offload overlay support features becomes an important >requirement. >+Intel? XL710 10/40 Gigabit Ethernet network card provides hardware filtering > and offload capabilities to support overlay networks implementations such as > MAC in UDP and >MAC in GRE. > > Sample Code Overview >@@ -131,14 +131,14 @@ Compiling the Sample Code > > .. code-block:: console > >-CONFIG_RTE_LIBRTE_VHOST=n >+CONFIG_RTE_LIBRTE_VHOST=y > > vhost user is turned on by default in the configure file > config/common_linuxapp. > To enable vhost cuse, disable vhost user. > > .. code-block:: console > >-CONFIG_RTE_LIBRTE_VHOST_USER=y >+CONFIG_RTE_LIBRTE_VHOST_USER=n > > After vhost is enabled and the implementation is selected, build the > vhost library. > >-- >1.9.3
[dpdk-dev] [PATCH] doc: fix vhost setup in tep-termination app guide
> >Hi Mark, Hi Jianfeng, Thanks for your comments - I've responded inline. Cheers, Mark > >> -Original Message- >> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Mark Kavanagh >> Sent: Tuesday, July 19, 2016 11:32 PM >> To: dev at dpdk.org >> Subject: [dpdk-dev] [PATCH] doc: fix vhost setup in tep-termination app >> guide >> >> - Fix vhost setup flags >> - Add minor edits to improve readability and consistency >> >> Signed-off-by: Mark Kavanagh >> --- >> doc/guides/sample_app_ug/tep_termination.rst | 8 >> 1 file changed, 4 insertions(+), 4 deletions(-) >> mode change 100644 => 100755 >> doc/guides/sample_app_ug/tep_termination.rst >> >> diff --git a/doc/guides/sample_app_ug/tep_termination.rst >> b/doc/guides/sample_app_ug/tep_termination.rst >> old mode 100644 >> new mode 100755 >> index 2d86a03..c3d1e97 >> --- a/doc/guides/sample_app_ug/tep_termination.rst >> +++ b/doc/guides/sample_app_ug/tep_termination.rst >> @@ -59,8 +59,8 @@ This allows network isolation, QOS, etc to be provided >> on a per client basis. >> In a typical setup, the network overlay tunnel is terminated at the >> Virtual/Tunnel End Point (VEP/TEP). >> The TEP is normally located at the physical host level ideally in the >> software >> switch. >> Due to processing constraints and the inevitable bottleneck that the switch >> -becomes the ability to offload overlay support features becomes an >> important requirement. >> -Intel? XL710 10/40 G Ethernet network card provides hardware filtering >> +becomes, the ability to offload overlay support features becomes an >> important requirement. >> +Intel? XL710 10/40 Gigabit Ethernet network card provides hardware >> filtering >> and offload capabilities to support overlay networks implementations such >> as MAC in UDP and MAC in GRE. >> >> Sample Code Overview >> @@ -131,14 +131,14 @@ Compiling the Sample Code >> >> .. code-block:: console >> >> -CONFIG_RTE_LIBRTE_VHOST=n >> +CONFIG_RTE_LIBRTE_VHOST=y >> >> vhost user is turned on by default in the configure file >> config/common_linuxapp. >> To enable vhost cuse, disable vhost user. >> >> .. code-block:: console >> >> -CONFIG_RTE_LIBRTE_VHOST_USER=y >> +CONFIG_RTE_LIBRTE_VHOST_USER=n > >Actually, this example does not necessarily disable vhost-user. It can work >with vhost-user. Okay; however, the instruction that I modified relates to the case where vhost cuse is the preferred option (or at least that's how it comes across upon reading). With that in mind, CONFIG_RTE_LIBRTE_VHOST_USER should be set to 'n', and not 'y', as previously described. >I also notice that there are many comments inside this example to indicate it >works with >vhost-cuse. I think we need to correct this. Do you mean to say that the example app does not work with vhost cuse? What should be corrected? >By the way, vhost-cuse could be deprecated in >the future, please refer here, >http://dpdk.org/ml/archives/dev/2016-July/044080.html Thanks Jianfeng, I'm aware of this; however, as that will not be the case in the v16.07 release, I think that the cuse instructions should still be updated. > >Thanks, >Jianfeng > >> >> After vhost is enabled and the implementation is selected, build the >> vhost >> library. >> >> -- >> 1.9.3
[dpdk-dev] [PATCH V2] doc: fix vhost setup in tep-termination app guide
Please disregard - correct version of patch to follow. Cheers, Mark >-Original Message- >From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Mark Kavanagh >Sent: Thursday, July 21, 2016 2:04 PM >To: dev at dpdk.org >Subject: [dpdk-dev] [PATCH V2] doc: fix vhost setup in tep-termination app >guide > >- Fix vhost setup flags >- Add minor edits to improve readability and consistency > >--- > >v2: - revert file mode changes made erroneously in v1 > >Signed-off-by: Mark Kavanagh >--- > doc/guides/sample_app_ug/tep_termination.rst | 8 > 1 file changed, 4 insertions(+), 4 deletions(-) > mode change 100644 => 100755 doc/guides/sample_app_ug/tep_termination.rst > >diff --git a/doc/guides/sample_app_ug/tep_termination.rst >b/doc/guides/sample_app_ug/tep_termination.rst >old mode 100644 >new mode 100755 >index 2d86a03..c3d1e97 >--- a/doc/guides/sample_app_ug/tep_termination.rst >+++ b/doc/guides/sample_app_ug/tep_termination.rst >@@ -59,8 +59,8 @@ This allows network isolation, QOS, etc to be provided on a >per client >basis. > In a typical setup, the network overlay tunnel is terminated at the > Virtual/Tunnel End Point >(VEP/TEP). > The TEP is normally located at the physical host level ideally in the > software switch. > Due to processing constraints and the inevitable bottleneck that the switch >-becomes the ability to offload overlay support features becomes an important >requirement. >-Intel? XL710 10/40 G Ethernet network card provides hardware filtering >+becomes, the ability to offload overlay support features becomes an important >requirement. >+Intel? XL710 10/40 Gigabit Ethernet network card provides hardware filtering > and offload capabilities to support overlay networks implementations such as > MAC in UDP and >MAC in GRE. > > Sample Code Overview >@@ -131,14 +131,14 @@ Compiling the Sample Code > > .. code-block:: console > >-CONFIG_RTE_LIBRTE_VHOST=n >+CONFIG_RTE_LIBRTE_VHOST=y > > vhost user is turned on by default in the configure file > config/common_linuxapp. > To enable vhost cuse, disable vhost user. > > .. code-block:: console > >-CONFIG_RTE_LIBRTE_VHOST_USER=y >+CONFIG_RTE_LIBRTE_VHOST_USER=n > > After vhost is enabled and the implementation is selected, build the > vhost library. > >-- >1.9.3
[dpdk-dev] [PATCH 1/7] ofproto: Consider datapath_type when looking for internal ports.
Please disregard this patchset - it was sent erroneously to the incorrect mailing list. I've already removed the related patches from Patchwork. Cheers, Mark >-Original Message- >From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Mark Kavanagh >Sent: Friday, August 5, 2016 3:30 PM >To: dev at dpdk.org; diproiettod at vmware.com >Cc: i.maximets at samsung.com >Subject: [dpdk-dev] [PATCH 1/7] ofproto: Consider datapath_type when looking >for internal >ports. > >From: Daniele Di Proietto > >Interfaces with type "internal" end up having a netdev with type "tap" >in the dpif-netdev datapath, so a strcmp will fail to match internal >interfaces. > >We can translate the types with ofproto_port_open_type() before calling >strcmp to fix this. > >This fixes a minor issue where internal interfaces are considered >non-internal in the userspace datapath for the purpose of adjusting the >MTU. > >Signed-off-by: Daniele Di Proietto >--- > ofproto/ofproto.c | 16 +--- > 1 file changed, 9 insertions(+), 7 deletions(-) > >diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c >index 8e59c69..088f91a 100644 >--- a/ofproto/ofproto.c >+++ b/ofproto/ofproto.c >@@ -220,7 +220,8 @@ static void learned_cookies_flush(struct ofproto *, struct >ovs_list >*dead_cookie > /* ofport. */ > static void ofport_destroy__(struct ofport *) OVS_EXCLUDED(ofproto_mutex); > static void ofport_destroy(struct ofport *, bool del); >-static inline bool ofport_is_internal(const struct ofport *); >+static inline bool ofport_is_internal(const struct ofproto *, >+ const struct ofport *); > > static int update_port(struct ofproto *, const char *devname); > static int init_ports(struct ofproto *); >@@ -2465,7 +2466,7 @@ static void > ofport_remove(struct ofport *ofport) > { > struct ofproto *p = ofport->ofproto; >-bool is_internal = ofport_is_internal(ofport); >+bool is_internal = ofport_is_internal(p, ofport); > > connmgr_send_port_status(ofport->ofproto->connmgr, NULL, &ofport->pp, > OFPPR_DELETE); >@@ -2751,9 +2752,10 @@ init_ports(struct ofproto *p) > } > > static inline bool >-ofport_is_internal(const struct ofport *port) >+ofport_is_internal(const struct ofproto *p, const struct ofport *port) > { >-return !strcmp(netdev_get_type(port->netdev), "internal"); >+return !strcmp(netdev_get_type(port->netdev), >+ ofproto_port_open_type(p->type, "internal")); > } > > /* Find the minimum MTU of all non-datapath devices attached to 'p'. >@@ -2770,7 +2772,7 @@ find_min_mtu(struct ofproto *p) > > /* Skip any internal ports, since that's what we're trying to > * set. */ >-if (ofport_is_internal(ofport)) { >+if (ofport_is_internal(p, ofport)) { > continue; > } > >@@ -2797,7 +2799,7 @@ update_mtu(struct ofproto *p, struct ofport *port) > port->mtu = 0; > return; > } >-if (ofport_is_internal(port)) { >+if (ofport_is_internal(p, port)) { > if (dev_mtu > p->min_mtu) { >if (!netdev_set_mtu(port->netdev, p->min_mtu)) { >dev_mtu = p->min_mtu; >@@ -2827,7 +2829,7 @@ update_mtu_ofproto(struct ofproto *p) > HMAP_FOR_EACH (ofport, hmap_node, &p->ports) { > struct netdev *netdev = ofport->netdev; > >-if (ofport_is_internal(ofport)) { >+if (ofport_is_internal(p, ofport)) { > if (!netdev_set_mtu(netdev, p->min_mtu)) { > ofport->mtu = p->min_mtu; > } >-- >1.9.3
[dpdk-dev] dpdk/vpp and cross-version migration for vhost
> >On 11/24/2016 12:47 PM, Maxime Coquelin wrote: >> >> >> On 11/24/2016 01:33 PM, Yuanhan Liu wrote: >>> On Thu, Nov 24, 2016 at 09:30:49AM +, Kevin Traynor wrote: > On 11/24/2016 06:31 AM, Yuanhan Liu wrote: > > > On Tue, Nov 22, 2016 at 04:53:05PM +0200, Michael S. Tsirkin wrote: > You keep assuming that you have the VM started first and > figure out things afterwards, but this does not work. > > Think about a cluster of machines. You want to start a VM in > a way that will ensure compatibility with all hosts > in a cluster. >>> > >>> >>> > >>> I see. I was more considering about the case when the dst >>> > >>> host (including the qemu and dpdk combo) is given, and >>> > >>> then determine whether it will be a successfull migration >>> > >>> or not. >>> > >>> >>> > >>> And you are asking that we need to know which host could >>> > >>> be a good candidate before starting the migration. In such >>> > >>> case, we indeed need some inputs from both the qemu and >>> > >>> vhost-user backend. >>> > >>> >>> > >>> For DPDK, I think it could be simple, just as you said, it >>> > >>> could be either a tiny script, or even a macro defined in >>> > >>> the source code file (we extend it every time we add a >>> > >>> new feature) to let the libvirt to read it. Or something >>> > >>> else. >> > >> >> > >> There's the issue of APIs that tweak features as Maxime >> > >> suggested. > > > > > > Yes, it's a good point. > > > >> > >> Maybe the only thing to do is to deprecate it, > > > > > > Looks like so. > > > >> > >> but I feel some way for application to pass info into >> > >> guest might be benefitial. > > > > > > The two APIs are just for tweaking feature bits DPDK supports > before > > > any device got connected. It's another way to disable some features > > > (the another obvious way is to through QEMU command lines). > > > > > > IMO, it's bit handy only in a case like: we have bunch of VMs. > Instead > > > of disabling something though qemu one by one, we could disable it > > > once in DPDK. > > > > > > But I doubt the useful of it. It's only used in DPDK's vhost > example > > > after all. Nor is it used in vhost pmd, neither is it used in OVS. > > rte_vhost_feature_disable() is currently used in OVS, lib/netdev-dpdk.c >>> Hmmm. I must have checked very old code ... > > netdev_dpdk_vhost_class_init(void) > { > static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER; > > /* This function can be called for different classes. The > initialization > * needs to be done only once */ > if (ovsthread_once_start(&once)) { > rte_vhost_driver_callback_register(&virtio_net_device_ops); > rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_HOST_TSO4 > | 1ULL << VIRTIO_NET_F_HOST_TSO6 > | 1ULL << VIRTIO_NET_F_CSUM); >>> I saw the commit introduced such change, but it tells no reason why >>> it was added. >> >> I'm also interested to know the reason. > >I can't remember off hand, added Mark K or Michal W who should be able >to shed some light on it. DPDK v16.04 added support for vHost User TSO; as such, by default, TSO is advertised to guest devices as an available feature during feature negotiation with QEMU. However, while the vHost user backend sets up the majority of the mbuf fields that are required for TSO, there is still a reliance on the associated DPDK application (i.e. in this case OvS-DPDK) to set the remaining flags and/or offsets. Since OvS-DPDK doesn't currently provide that functionality, it is necessary to explicitly disable TSO; otherwise, undefined behaviour will ensue. > >> In any case, I think this is something that can/should be managed by >> the management tool, which should disable it in cmd parameters. >> >> Kevin, do you agree? > >I think best to find out the reason first. Because if no reason to >disable in the code, then no need to debate! > >> >> Cheers, >> Maxime
[dpdk-dev] [PATCH v7] net/virtio: add set_mtu in virtio
>Hi All, > Is there any further comments or modifications required for this patch, > or what next >steps do you guys suggest here ? Hi Souvik, Some minor comments inline. Thanks, Mark > >-- >Regards, >Souvik > >-Original Message- >From: Dey, Souvik >Sent: Saturday, October 1, 2016 10:09 AM >To: mark.b.kavanagh at intel.com; yuanhan.liu at linux.intel.com; stephen at >networkplumber.org; >dev at dpdk.org >Subject: RE: [PATCH v7] net/virtio: add set_mtu in virtio > >Hi Liu/Stephen/Mark, > > I have submitted Version 7 of this patch. Do let me know if this looks > proper. > >-- >Regards, >Souvik > >-Original Message- >From: Dey, Souvik >Sent: Thursday, September 29, 2016 4:32 PM >To: mark.b.kavanagh at intel.com; yuanhan.liu at linux.intel.com; stephen at >networkplumber.org; >dev at dpdk.org >Cc: Dey, Souvik >Subject: [PATCH v7] net/virtio: add set_mtu in virtio > > >Virtio interfaces do not currently allow the user to specify a particular >Maximum Transmission Unit (MTU).Consequently, the MTU of Virtio interfaces >is typically set to the Ethernet default value of 1500. >This is problematic in the case of cloud deployments, in which a specific >(and potentially non-standard) MTU needs to be set by a DHCP server, which >needs to be honored by all interfaces across the traffic path.To achieve >this Virtio interfaces should support setting of MTU. >In case when GRE/VXLAN tunneling is used for internal communication, there >will be an overhead added by the infrastructure in the packet over and >above the ETHER MTU of 1518. So to take care of this overhead in these >cases the DHCP server corrects the L3 MTU to 1454. But since virtio >interfaces was not having the MTU set functionality that MTU sent by the >DHCP server was ignored and the instance will still send packets with 1500 >MTU which after encapsulation will become more than 1518 and eventually >gets dropped in the infrastructure. >By adding an additional 'set_mtu' function to the Virtio driver, we can >honor the MTU sent by the DHCP server. The dhcp server/controller can >then leverage this 'set_mtu' functionality to resolve the above >mentioned issue of packets getting dropped due to incorrect size. > > >Signed-off-by: Souvik Dey > >--- >Changes in v7: >- Replaced the CRC_LEN with the merge rx buf header length. >- Changed the frame_len max validation to VIRTIO_MAX_RX_PKTLEN. >Changes in v6: >- Description of change corrected >- Corrected the identations >- Corrected the subject line too >- The From line was also not correct >- Re-submitting as the below patch was not proper >Changes in v5: >- Fix log message for out-of-bounds MTU parameter in virtio_mtu_set >- Calculate frame size, based on 'mtu' parameter >- Corrected the upper bound and lower bound checks in virtio_mtu_set >Changes in v4: Incorporated review comments. >Changes in v3: Corrected few style errors as reported by sys-stv. >Changes in v2: Incorporated review comments. > > drivers/net/virtio/virtio_ethdev.c | 16 > 1 file changed, 16 insertions(+) > >diff --git a/drivers/net/virtio/virtio_ethdev.c >b/drivers/net/virtio/virtio_ethdev.c >index 423c597..1dbfea6 100644 >--- a/drivers/net/virtio/virtio_ethdev.c >+++ b/drivers/net/virtio/virtio_ethdev.c >@@ -653,12 +653,20 @@ virtio_dev_allmulticast_disable(struct rte_eth_dev *dev) >PMD_INIT_LOG(ERR, "Failed to disable allmulticast"); > } > >+#define VLAN_TAG_LEN 4/* 802.3ac tag (not DMA'd) */ There should be a blank line between the #define and the function prototype beneath. >+static int virtio_mtu_set(struct rte_eth_dev *dev, uint16_t mtu) >+{ >+ struct virtio_hw *hw = dev->data->dev_private; >+ uint32_t ether_hdr_len = ETHER_HDR_LEN + VLAN_TAG_LEN + >+ hw->vtnet_hdr_size; I'll rely on Stephen and Yuanhan's judgment for this. >+ uint32_t frame_size = mtu + ether_hdr_len; >+ >+ if (mtu < ETHER_MIN_MTU || frame_size > VIRTIO_MAX_RX_PKTLEN) { >+ PMD_INIT_LOG(ERR, "MTU should be between %d and %d\n", >+ ETHER_MIN_MTU, VIRTIO_MAX_RX_PKTLEN); Shouldn't last format print parameter should be (VIRTIO_MAX_RX_PKTLEN - ether_hdr_len)? i.e PMD_INIT_LOG(ERR, "MTU should%d\n", ETHER_MIN_MTU, (VIRTIO_MAX_RX_PKTLEN - ether_hdr_len)); >+ return -EINVAL; >+ } >+ return 0; >+} > > /* > * dev_ops for virtio, bare necessities for basic operation > */ >@@ -677,7 +685,6 @@ static const struct eth_dev_ops virtio_eth_dev_ops = { > .allmulticast_enable = virtio_dev_allmulticast_enable, > .allmulticast_disable= virtio_dev_allmulticast_disable, >+ .mtu_set = virtio_mtu_set, > .dev_infos_get = virtio_dev_info_get, > .stats_get = virtio_dev_stats_get, > .xstats_get = virtio_dev_xstats_get, >-- >2.9.3.windows.1
[dpdk-dev] [PATCH v7] net/virtio: add set_mtu in virtio
> >Hi Stephen/Liu, > Any other comments or suggestions for this. If the below patch looks > fine then please >do suggest the next steps . Hi Souvik, Just to let you know that Yuanhan is out of office on account of public holidays in PRC. AFAIA he should be back online from next week. Thanks, Mark > >-- >Regards, >Souvik > >-Original Message- >From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Dey, Souvik >Sent: Wednesday, October 5, 2016 10:05 AM >To: Kavanagh, Mark B ; yuanhan.liu at >linux.intel.com; >stephen at networkplumber.org >Cc: dev at dpdk.org >Subject: Re: [dpdk-dev] [PATCH v7] net/virtio: add set_mtu in virtio > >Yes Mark, I have modified the patch with the below comments. > >drivers/net/virtio/virtio_ethdev.c | 17 + > 1 file changed, 17 insertions(+) > >diff --git a/drivers/net/virtio/virtio_ethdev.c >b/drivers/net/virtio/virtio_ethdev.c >index 423c597..1dbfea6 100644 >--- a/drivers/net/virtio/virtio_ethdev.c >+++ b/drivers/net/virtio/virtio_ethdev.c >@@ -653,12 +653,20 @@ virtio_dev_allmulticast_disable(struct rte_eth_dev *dev) >PMD_INIT_LOG(ERR, "Failed to disable allmulticast"); } > >+#define VLAN_TAG_LEN 4/* 802.3ac tag (not DMA'd) */ >+ >+static int virtio_mtu_set(struct rte_eth_dev *dev, uint16_t mtu) { >+ struct virtio_hw *hw = dev->data->dev_private; >+ uint32_t ether_hdr_len = ETHER_HDR_LEN + VLAN_TAG_LEN + >+ hw->vtnet_hdr_size; >+ uint32_t frame_size = mtu + ether_hdr_len; >+ >+ if (mtu < ETHER_MIN_MTU || frame_size > VIRTIO_MAX_RX_PKTLEN) { >+ PMD_INIT_LOG(ERR, "MTU should be between %d and %d\n", >+ ETHER_MIN_MTU, (VIRTIO_MAX_RX_PKTLEN - ether_hdr_len)); >+ return -EINVAL; >+ } >+ return 0; >+} > >Let mem know if this looks good or we have few more comments. > >-- >Regards, >Souvik > >-Original Message- >From: Kavanagh, Mark B [mailto:mark.b.kavanagh at intel.com] >Sent: Wednesday, October 5, 2016 4:16 AM >To: Dey, Souvik ; yuanhan.liu at linux.intel.com; >stephen at networkplumber.org >Cc: dev at dpdk.org >Subject: RE: [PATCH v7] net/virtio: add set_mtu in virtio > >>Hi All, >> Is there any further comments or modifications required for this >>patch, or what next steps do you guys suggest here ? > >Hi Souvik, > >Some minor comments inline. > >Thanks, >Mark > >> >>-- >>Regards, >>Souvik >> >>-Original Message- >>From: Dey, Souvik >>Sent: Saturday, October 1, 2016 10:09 AM >>To: mark.b.kavanagh at intel.com; yuanhan.liu at linux.intel.com; >>stephen at networkplumber.org; dev at dpdk.org >>Subject: RE: [PATCH v7] net/virtio: add set_mtu in virtio >> >>Hi Liu/Stephen/Mark, >> >> I have submitted Version 7 of this patch. Do let me know if this looks >> proper. >> >>-- >>Regards, >>Souvik >> >>-Original Message- >>From: Dey, Souvik >>Sent: Thursday, September 29, 2016 4:32 PM >>To: mark.b.kavanagh at intel.com; yuanhan.liu at linux.intel.com; >>stephen at networkplumber.org; dev at dpdk.org >>Cc: Dey, Souvik >>Subject: [PATCH v7] net/virtio: add set_mtu in virtio >> >> >>Virtio interfaces do not currently allow the user to specify a >>particular Maximum Transmission Unit (MTU).Consequently, the MTU of >>Virtio interfaces is typically set to the Ethernet default value of 1500. >>This is problematic in the case of cloud deployments, in which a >>specific (and potentially non-standard) MTU needs to be set by a DHCP >>server, which needs to be honored by all interfaces across the traffic >>path.To achieve this Virtio interfaces should support setting of MTU. >>In case when GRE/VXLAN tunneling is used for internal communication, >>there will be an overhead added by the infrastructure in the packet >>over and above the ETHER MTU of 1518. So to take care of this overhead >>in these cases the DHCP server corrects the L3 MTU to 1454. But since >>virtio interfaces was not having the MTU set functionality that MTU >>sent by the DHCP server was ignored and the instance will still send >>packets with 1500 MTU which after encapsulation will become more than >>1518 and eventually gets dropped in the infrastructure. >>By adding an additional 'set_mtu' function to the Virtio driver, we can >>honor the MTU sent by the DHCP server. The dhcp server/controller can >>then leverage this 'set_mtu' functionality to resolve the above >>m
[dpdk-dev] [PATCH v7] net/virtio: add set_mtu in virtio
> >On Thu, Sep 29, 2016 at 04:31:30PM -0400, Dey wrote: >> /* >> * dev_ops for virtio, bare necessities for basic operation >> */ >> @@ -677,7 +685,6 @@ static const struct eth_dev_ops virtio_eth_dev_ops = { >> .allmulticast_enable = virtio_dev_allmulticast_enable, >> .allmulticast_disable= virtio_dev_allmulticast_disable, >> +.mtu_set = virtio_mtu_set, >> .dev_infos_get = virtio_dev_info_get, >> .stats_get = virtio_dev_stats_get, >> .xstats_get = virtio_dev_xstats_get, >> -- >> 2.9.3.windows.1 > >Your patch is malformed: I got an error while trying to apply it. > >patch: malformed patch at line 167: * dev_ops for virtio, >bare necessities for basic operation > >Seems like the way you were generating the patch is wrong. > >Anyway, I applied it manually, with the "- frame_size" fix as well >as few more minor coding style fixes. > >So applied to dpdk-next-virtio. Hi Yuanhan/Souvik, Given my contributions to this patch (and in particular since comments from here - http://dpdk.org/ml/archives/dev/2016-September/047208.html - were copied directly into the commit message), I think that I should have been added as co-author of the patch? Let me know if you think that I am mistaken. Thanks in advance, Mark > > --yliu
[dpdk-dev] [PATCH v7] net/virtio: add set_mtu in virtio
>-Original Message- >From: Yuanhan Liu [mailto:yuanhan.liu at linux.intel.com] >Sent: Tuesday, October 11, 2016 5:01 AM >To: Kavanagh, Mark B >Cc: Dey ; stephen at networkplumber.org; dev at dpdk.org >Subject: Re: [PATCH v7] net/virtio: add set_mtu in virtio > >On Mon, Oct 10, 2016 at 10:49:22AM +, Kavanagh, Mark B wrote: >> > >> >On Thu, Sep 29, 2016 at 04:31:30PM -0400, Dey wrote: >> >> /* >> >> * dev_ops for virtio, bare necessities for basic operation >> >> */ >> >> @@ -677,7 +685,6 @@ static const struct eth_dev_ops virtio_eth_dev_ops = { >> >> .allmulticast_enable = virtio_dev_allmulticast_enable, >> >> .allmulticast_disable= virtio_dev_allmulticast_disable, >> >> + .mtu_set = virtio_mtu_set, >> >> .dev_infos_get = virtio_dev_info_get, >> >> .stats_get = virtio_dev_stats_get, >> >> .xstats_get = virtio_dev_xstats_get, >> >> -- >> >> 2.9.3.windows.1 >> > >> >Your patch is malformed: I got an error while trying to apply it. >> > >> >patch: malformed patch at line 167: * dev_ops for virtio, >> >bare necessities for basic operation >> > >> >Seems like the way you were generating the patch is wrong. >> > >> >Anyway, I applied it manually, with the "- frame_size" fix as well >> >as few more minor coding style fixes. >> > >> >So applied to dpdk-next-virtio. >> >> Hi Yuanhan/Souvik, >> >> Given my contributions to this patch (and in particular since comments from >> here - >http://dpdk.org/ml/archives/dev/2016-September/047208.html - were copied >directly into the >commit message), I think that I should have been added as co-author of the >patch? > >Mark, > >I appreciate your contributions here. But for this case, I think it >might not be enough for adding you as the co-author: you don't co-write >the code with Souvik after all. > >However, I'd suggest you to add your "Reviewed-by" if a patch looks to >you after your review effort. This is another way to recognize your >contributions to a patch. No problem Yuanhan - thanks for clarifying. Reviewed-by: Mark Kavanagh > >Thanks. > > --yliu > > >> >> Let me know if you think that I am mistaken. >> >> Thanks in advance, >> Mark >> > >> >--yliu