Hi, Mark.

On 03.08.2016 15:14, Kavanagh, Mark B wrote:
>>
>> Hi Daniele. Thanks for posting this.
>
> Hi Ilya,
>
> I actually implemented this patch as part of Daniele's MTU patchset, based on
> my earlier patch - Daniele mainly rebased it to head of master :)
>
> Thanks for your feedback - I've responded inline.
>
> Cheers,
> Mark
>
>> I have almost the same patch in my local branch.
>>
>> I haven't tested this with physical DPDK NICs yet, but I have a few
>> high-level comments:
>>
>> 1. Have you thought about renaming 'mtu_request' inside netdev-dpdk
>>    to 'requested_mtu'? I think this would be clearer and more
>>    consistent with other configurable parameters (n_rxq, n_txq, ...).
>
> 'mtu_request' was the name suggested by Daniele, following a discussion with
> colleagues.
> I don't have strong feelings either way, so I'll leave Daniele to comment.
I meant only renaming 'netdev_dpdk->mtu_request' to
'netdev_dpdk->requested_mtu'. The database column should remain
'mtu_request' as it is now.

>>
>> 2. I'd prefer not to fail reconfiguration if there is not enough memory
>>    for the new mempool. I think it will be a common situation that we
>>    request more memory than we have. Failure leads to destruction
>>    of the port and an inability to reconnect to a vhost-user port after
>>    re-creation if vhost is in server mode. We can just keep the old
>>    mempool and inform the user via VLOG_ERR.
>>
> Agreed - I'll modify V2 accordingly.
>
>
>> 3. Minor issues inline.
>
> Comments on these inline also.
>
>>
>> What do you think?
>>
>> Best regards, Ilya Maximets.
>>
>> On 30.07.2016 04:22, Daniele Di Proietto wrote:
>>> From: Mark Kavanagh <mark.b.kavan...@intel.com>
>>>
>>> Add support for Jumbo Frames to DPDK-enabled port types,
>>> using single-segment-mbufs.
>>>
>>> Using this approach, the amount of memory allocated to each mbuf
>>> to store frame data is increased to a value greater than 1518B
>>> (typical Ethernet maximum frame length). The increased space
>>> available in the mbuf means that an entire Jumbo Frame of a specific
>>> size can be carried in a single mbuf, as opposed to partitioning
>>> it across multiple mbuf segments.
>>>
>>> The amount of space allocated to each mbuf to hold frame data is
>>> defined dynamically by the user with ovs-vsctl, via the 'mtu_request'
>>> parameter.
>>>
>>> Signed-off-by: Mark Kavanagh <mark.b.kavan...@intel.com>
>>> [diproiet...@vmware.com rebased]
>>> Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
>>> ---
>>>  INSTALL.DPDK-ADVANCED.md |  59 +++++++++++++++++-
>>>  INSTALL.DPDK.md          |   1 -
>>>  NEWS                     |   1 +
>>>  lib/netdev-dpdk.c        | 151 +++++++++++++++++++++++++++++++++++++++--------
>>>  4 files changed, 185 insertions(+), 27 deletions(-)
>>>
>>> diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md
>>> index 191e69e..5cd64bf 100755
>>> --- a/INSTALL.DPDK-ADVANCED.md
>>> +++ b/INSTALL.DPDK-ADVANCED.md
>>> @@ -1,5 +1,5 @@
>>>  OVS DPDK ADVANCED INSTALL GUIDE
>>> -=================================
>>> +===============================
>>>
>>>  ## Contents
>>>
>>> @@ -12,7 +12,8 @@ OVS DPDK ADVANCED INSTALL GUIDE
>>>  7. [QOS](#qos)
>>>  8. [Rate Limiting](#rl)
>>>  9. [Flow Control](#fc)
>>> -10. [Vsperf](#vsperf)
>>> +10. [Jumbo Frames](#jumbo)
>>> +11. [Vsperf](#vsperf)
>>>
>>>  ## <a name="overview"></a> 1. Overview
>>>
>>> @@ -862,7 +863,59 @@ respective parameter. To disable the flow control at tx side,
>>>
>>>  `ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=false`
>>>
>>> -## <a name="vsperf"></a> 10. Vsperf
>>> +## <a name="jumbo"></a> 10. Jumbo Frames
>>> +
>>> +By default, DPDK ports are configured with standard Ethernet MTU (1500B). To
>>> +enable Jumbo Frames support for a DPDK port, change the Interface's `mtu_request`
>>> +attribute to a sufficiently large value.
>>> +
>>> +e.g. Add a DPDK Phy port with MTU of 9000:
>>> +
>>> +`ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk -- set Interface dpdk0 mtu_request=9000`
>>> +
>>> +e.g. Change the MTU of an existing port to 6200:
>>> +
>>> +`ovs-vsctl set Interface dpdk0 mtu_request=6200`
>>> +
>>> +When Jumbo Frames are enabled, the size of a DPDK port's mbuf segments are
>>> +increased, such that a full Jumbo Frame of a specific size may be accommodated
>>> +within a single mbuf segment.
>>> +
>>> +Jumbo frame support has been validated against 9728B frames (largest frame size
>>> +supported by Fortville NIC), using the DPDK `i40e` driver, but larger frames
>>> +(particularly in use cases involving East-West traffic only), and other DPDK NIC
>>> +drivers may be supported.
>>> +
>>> +### 9.1 vHost Ports and Jumbo Frames
>>> +
>>> +Some additional configuration is needed to take advantage of jumbo frames with
>>> +vhost ports:
>>> +
>>> +  1. `mergeable buffers` must be enabled for vHost ports, as demonstrated in
>>> +     the QEMU command line snippet below:
>>> +
>>> +     ```
>>> +     '-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \'
>>> +     '-device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on'
>>> +     ```
>>> +
>>> +  2. Where virtio devices are bound to the Linux kernel driver in a guest
>>> +     environment (i.e. interfaces are not bound to an in-guest DPDK driver),
>>> +     the MTU of those logical network interfaces must also be increased to a
>>> +     sufficiently large value. This avoids segmentation of Jumbo Frames
>>> +     received in the guest. Note that 'MTU' refers to the length of the IP
>>> +     packet only, and not that of the entire frame.
>>> +
>>> +     To calculate the exact MTU of a standard IPv4 frame, subtract the L2
>>> +     header and CRC lengths (i.e. 18B) from the max supported frame size.
>>> +     So, to set the MTU for a 9018B Jumbo Frame:
>>> +
>>> +     ```
>>> +     ifconfig eth1 mtu 9000
>>> +     ```
>>> +>>>>>>> 5ec921d... netdev-dpdk: add support for Jumbo Frames
>>
>> Looks like a rebasing artefact.
>
> Yup - I'll remove in V2.
>
>>
>>> +
>>> +## <a name="vsperf"></a> 11. Vsperf
>>>
>>>  Vsperf project goal is to develop vSwitch test framework that can be used to
>>>  validate the suitability of different vSwitch implementations in a Telco
>>>  deployment
>>> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
>>> index 7609aa7..25c79de 100644
>>> --- a/INSTALL.DPDK.md
>>> +++ b/INSTALL.DPDK.md
>>> @@ -590,7 +590,6 @@ can be found in [Vhost Walkthrough].
>>>
>>>  ## <a name="ovslimits"></a> 6. Limitations
>>>
>>> -  - Supports MTU size 1500, MTU setting for DPDK netdevs will be in future OVS release.
>>>   - Currently DPDK ports does not use HW offload functionality.
>>>   - Network Interface Firmware requirements:
>>>     Each release of DPDK is validated against a specific firmware version for
>>> diff --git a/NEWS b/NEWS
>>> index 0ff5616..c004e5f 100644
>>> --- a/NEWS
>>> +++ b/NEWS
>>> @@ -68,6 +68,7 @@ Post-v2.5.0
>>>        is enabled in DPDK.
>>>      * Basic connection tracking for the userspace datapath (no ALG,
>>>        fragmentation or NAT support yet)
>>> +    * Jumbo frame support
>>>    - Increase number of registers to 16.
>>>    - ovs-benchmark: This utility has been removed due to lack of use and
>>>      bitrot.
>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>>> index 0b6e410..68639ae 100644
>>> --- a/lib/netdev-dpdk.c
>>> +++ b/lib/netdev-dpdk.c
>>> @@ -82,6 +82,7 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
>>>      + sizeof(struct dp_packet)    \
>>>      + RTE_PKTMBUF_HEADROOM)
>>>  #define NETDEV_DPDK_MBUF_ALIGN      1024
>>> +#define NETDEV_DPDK_MAX_PKT_LEN     9728
>>>
>>>  /* Max and min number of packets in the mempool.
>>>  OVS tries to allocate a
>>>   * mempool with MAX_NB_MBUF: if this fails (because the system doesn't have
>>> @@ -336,6 +337,7 @@ struct netdev_dpdk {
>>>      struct ovs_mutex mutex OVS_ACQ_AFTER(dpdk_mutex);
>>>
>>>      struct dpdk_mp *dpdk_mp;
>>> +    int mtu_request;
>>>      int mtu;
>>>      int socket_id;
>>>      int buf_size;
>>> @@ -474,10 +476,19 @@ dpdk_mp_get(int socket_id, int mtu) OVS_REQUIRES(dpdk_mutex)
>>>      dmp->mtu = mtu;
>>>      dmp->refcount = 1;
>>>      mbp_priv.mbuf_data_room_size = MBUF_SIZE(mtu) - sizeof(struct dp_packet);
>>> -    mbp_priv.mbuf_priv_size = sizeof (struct dp_packet) -
>>> -                              sizeof (struct rte_mbuf);
>>> +    mbp_priv.mbuf_priv_size = sizeof (struct dp_packet)
>>> +                              - sizeof (struct rte_mbuf);
>>> +    /* XXX: this is a really rough method of provisioning memory.
>>> +     * It's impossible to determine what the exact memory requirements are when
>>> +     * the number of ports and rxqs that utilize a particular mempool can change
>>> +     * dynamically at runtime. For the moment, use this rough heuristic.
>>> +     */
>>> +    if (mtu >= ETHER_MTU) {
>>> +        mp_size = MAX_NB_MBUF;
>>> +    } else {
>>> +        mp_size = MIN_NB_MBUF;
>>> +    }
>>>
>>> -    mp_size = MAX_NB_MBUF;
>>>      do {
>>>          if (snprintf(mp_name, RTE_MEMPOOL_NAMESIZE, "ovs_mp_%d_%d_%u",
>>>                       dmp->mtu, dmp->socket_id, mp_size) < 0) {
>>> @@ -522,6 +533,32 @@ dpdk_mp_put(struct dpdk_mp *dmp)
>>>  #endif
>>>  }
>>>
>>> +static int
>>> +dpdk_mp_configure(struct netdev_dpdk *dev)
>>> +    OVS_REQUIRES(dpdk_mutex)
>>> +    OVS_REQUIRES(dev->mutex)
>>> +{
>>> +    uint32_t buf_size = dpdk_buf_size(dev->mtu);
>>> +    struct dpdk_mp *mp_old, *mp;
>>> +
>>> +    mp_old = dev->dpdk_mp;
>>
>> Do we need this variable? We can just put the old mempool before
>> assigning the new one.
>
> Agreed - will remove in V2.
>
>>
>>> +
>>> +    mp = dpdk_mp_get(dev->socket_id, FRAME_LEN_TO_MTU(buf_size));
>>> +    if (!mp) {
>>> +        VLOG_ERR("Insufficient memory to create memory pool for netdev %s\n",
>>> +                  dev->up.name);
>>
>> +1 space character.
>
> Sure - will fix.
>
>>
>>> +        return ENOMEM;
>>> +    }
>>> +
>>> +    dev->dpdk_mp = mp;
>>> +    dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
>>> +
>>> +    dpdk_mp_put(mp_old);
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +
>>>  static void
>>>  check_link_status(struct netdev_dpdk *dev)
>>>  {
>>> @@ -573,7 +610,15 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
>>>  {
>>>      int diag = 0;
>>>      int i;
>>> +    struct rte_eth_conf conf = port_conf;
>>>
>>> +    if (dev->mtu > ETHER_MTU) {
>>> +        conf.rxmode.jumbo_frame = 1;
>>> +        conf.rxmode.max_rx_pkt_len = dev->max_packet_len;
>>> +    } else {
>>> +        conf.rxmode.jumbo_frame = 0;
>>> +        conf.rxmode.max_rx_pkt_len = 0;
>>
>> I know it was implemented this way in the original patch, but I'm
>> not sure that all DPDK drivers will handle a zero value of
>> 'max_rx_pkt_len' in the right way.
>
> Out of interest, can you point to any examples of DPDK drivers handling
> max_rx_pkt_len = 0 with unintended behaviour?
> This field should only be of relevance if rxmode.jumbo_frame = 1, and 0 is
> its default value; if it is an issue though,

I don't know exactly. One place in the DPDK i40e driver confuses me a
little: the function 'i40evf_rxq_init()' in
drivers/net/i40e/i40e_ethdev_vf.c. It checks the value of
'rxq->max_pkt_len' even if 'dev_conf.rxmode.jumbo_frame == 0'.

> I can just default instead to ETHER_MAX_LEN.

Will VLANs be handled properly in this case?

_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev