>Hi Mark,
>
>Thanks for the patch! I've not done a very thorough review of this, but
>my first-blush comments are inline.
Thanks for the review Aaron! Any additional comments are welcome - my initial responses are inline, below. Thanks, Mark > >Mark Kavanagh <mark.b.kavan...@intel.com> writes: >> Add support for Jumbo Frames to DPDK-enabled port types, >> using single-segment-mbufs. >> >> Using this approach, the amount of memory allocated for each mbuf >> to store frame data is increased to a value greater than 1518B >> (typical Ethernet maximum frame length). The increased space >> available in the mbuf means that an entire Jumbo Frame can be carried >> in a single mbuf, as opposed to partitioning it across multiple mbuf >> segments. >> >> The amount of space allocated to each mbuf to hold frame data is >> defined dynamically by the user when adding a DPDK port to a bridge. >> If an MTU value is not supplied, or the user-supplied value is invalid, >> the MTU for the port defaults to standard Ethernet MTU (i.e. 1500B). >> >> Signed-off-by: Mark Kavanagh <mark.b.kavan...@intel.com> >> --- > >This is a new feature which has user-visible impact, so should be >announced in NEWS (plus, it's great news). No problem - I'll add an entry. > >> INSTALL.DPDK.md | 59 ++++++++- >> lib/netdev-dpdk.c | 356 >> +++++++++++++++++++++++++++++++++++++++++------------- >> 2 files changed, 328 insertions(+), 87 deletions(-) >> >> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md >> index 96b686c..2f23e27 100644 >> --- a/INSTALL.DPDK.md >> +++ b/INSTALL.DPDK.md >> @@ -859,10 +859,61 @@ by adding the following string: >> to <interface> sections of all network devices used by DPDK. Parameter 'N' >> determines how many queues can be used by the guest. >> >> + >> +Jumbo Frames >> +------------ >> + >> +Support for Jumbo Frames may be enabled at run-time for DPDK-type ports. >> + >> +To avail of Jumbo Frame support, add the '--dpdk-mtu' option to the >> ovs-vsctl >> +'add-port' command-line, along with the required MTU for the port. >> +e.g. > >This text and the example do not match. 
I think '--dpdk-mtu' is not valid. Good catch, thanks. > >> + >> + ``` >> + ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk >> options:dpdk-mtu=9000 >> + ``` >> + >> +When Jumbo Frames are enabled, the size of a DPDK port's mbuf segments are >> +increased, such that a full Jumbo Frame may be accommodated inside a single >> +mbuf segment. Once set, the MTU for a DPDK port is immutable. > >Why no support? DPDK supports changing the mtu. I guess my rationale here is that an MTU change can't be triggered via OVS command-line, nor can it be triggered programmatically via DPDK (apart from an explicit call to rte_eth_dev_set_mtu). So, while technically it's possible, from a user's point of view, there's no way to configure it, outside of modifying the code directly. If I've missed something here, please let me know. > >> +Jumbo frame support has been validated against 13312B frames, using the >> +DPDK `igb_uio` driver, but larger frames and other DPDK NIC drivers may >> +theoretically be supported. Supported port types excludes vHost-Cuse ports, >> as >> +this feature is pending deprecation. >> + >> + >> +vHost Ports and Jumbo Frames >> +---------------------------- >> +Jumbo frame support is available for DPDK vHost-User ports only. Some >> additional >> +configuration is needed to take advantage of this feature: >> + >> + 1. `mergeable buffers` must be enabled for vHost ports, as demonstrated in >> + the QEMU command line snippet below: >> + >> + ``` >> + '-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \' >> + '-device >> virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on' >> + ``` >> + >> + 2. Where virtio devices are bound to the Linux kernel driver in a guest >> + environment (i.e. interfaces are not bound to an in-guest DPDK >> driver), the >> + MTU of those logical network interfaces must also be increased. This >> + avoids segmentation of Jumbo Frames in the guest. 
Note that 'MTU' >> refers >> + to the length of the IP packet only, and not that of the entire frame. >> + >> + e.g. To calculate the exact MTU of a standard IPv4 frame, subtract the >> L2 >> + header and CRC lengths (i.e. 18B) from the max supported frame size. >> + So, to set the MTU for a 13312B Jumbo Frame: >> + >> + ``` >> + ifconfig eth1 mtu 13294 >> + ``` >> + >> + >> Restrictions: >> ------------- >> >> - - Work with 1500 MTU, needs few changes in DPDK lib to fix this issue. >> - Currently DPDK port does not make use any offload functionality. >> - DPDK-vHost support works with 1G huge pages. >> >> @@ -903,6 +954,12 @@ Restrictions: >> the next release of DPDK (which includes the above patch) is available >> and >> integrated into OVS. >> >> + Jumbo Frames: >> + - `virtio-pmd`: DPDK apps in the guest do not exit gracefully. The source >> of >> + this issue is currently being investigated. >> + - vHost-Cuse: Jumbo Frame support is not available for vHost Cuse ports. >> + >> + >> Bug Reporting: >> -------------- >> >> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c >> index de7e488..5f23e28 100644 >> --- a/lib/netdev-dpdk.c >> +++ b/lib/netdev-dpdk.c >> @@ -62,20 +62,25 @@ static struct vlog_rate_limit rl = >> VLOG_RATE_LIMIT_INIT(5, 20); >> #define OVS_CACHE_LINE_SIZE CACHE_LINE_SIZE >> #define OVS_VPORT_DPDK "ovs_dpdk" >> >> +#define NETDEV_DPDK_JUMBO_FRAME_ENABLED 1 >> +#define NETDEV_DPDK_DEFAULT_RX_BUFSIZE 1024 >> + >> /* >> * need to reserve tons of extra space in the mbufs so we can align the >> * DMA addresses to 4KB. >> * The minimum mbuf size is limited to avoid scatter behaviour and drop in >> * performance for standard Ethernet MTU. 
>> */ >> -#define MTU_TO_MAX_LEN(mtu) ((mtu) + ETHER_HDR_LEN + ETHER_CRC_LEN) >> -#define MBUF_SIZE_MTU(mtu) (MTU_TO_MAX_LEN(mtu) \ >> - + sizeof(struct dp_packet) \ >> - + RTE_PKTMBUF_HEADROOM) >> -#define MBUF_SIZE_DRIVER (2048 \ >> - + sizeof (struct rte_mbuf) \ >> - + RTE_PKTMBUF_HEADROOM) >> -#define MBUF_SIZE(mtu) MAX(MBUF_SIZE_MTU(mtu), MBUF_SIZE_DRIVER) >> +#define MTU_TO_FRAME_LEN(mtu) ((mtu) + ETHER_HDR_LEN + ETHER_CRC_LEN) >> +#define FRAME_LEN_TO_MTU(frame_len) ((frame_len)- ETHER_HDR_LEN - >> ETHER_CRC_LEN) >> +#define MBUF_SEGMENT_SIZE(mtu) ( MTU_TO_FRAME_LEN(mtu) \ >> + + sizeof(struct dp_packet) \ >> + + RTE_PKTMBUF_HEADROOM) >> + >> +/* This value should be specified as a multiple of the DPDK NIC driver's >> + * 'min_rx_bufsize' attribute (currently 1024B for 'igb_uio'). >> + */ >> +#define NETDEV_DPDK_MAX_FRAME_LEN 13312 >> >> /* Max and min number of packets in the mempool. OVS tries to allocate a >> * mempool with MAX_NB_MBUF: if this fails (because the system doesn't have >> @@ -86,6 +91,8 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, >> 20); >> #define MIN_NB_MBUF (4096 * 4) >> #define MP_CACHE_SZ RTE_MEMPOOL_CACHE_MAX_SIZE >> >> +#define DPDK_VLAN_TAG_LEN 4 >> + >> /* MAX_NB_MBUF can be divided by 2 many times, until MIN_NB_MBUF */ >> BUILD_ASSERT_DECL(MAX_NB_MBUF % ROUND_DOWN_POW2(MAX_NB_MBUF/MIN_NB_MBUF) == >> 0); >> >> @@ -114,7 +121,6 @@ static const struct rte_eth_conf port_conf = { >> .header_split = 0, /* Header Split disabled */ >> .hw_ip_checksum = 0, /* IP checksum offload disabled */ >> .hw_vlan_filter = 0, /* VLAN filtering disabled */ >> - .jumbo_frame = 0, /* Jumbo Frame Support disabled */ >> .hw_strip_crc = 0, >> }, >> .rx_adv_conf = { >> @@ -254,6 +260,39 @@ is_dpdk_class(const struct netdev_class *class) >> return class->construct == netdev_dpdk_construct; >> } >> >> +/* DPDK NIC drivers allocate RX buffers at a particular granularity >> + * (specified by rte_eth_dev_info.min_rx_bufsize - currently 1K for >> 
igb_uio). >> + * If 'frame_len' is not a multiple of this value, insufficient buffers are >> + * allocated to accomodate the packet in its entirety. Furthermore, the >> igb_uio >> + * driver needs to ensure that there is also sufficient space in the Rx >> buffer >> + * to accommodate two VLAN tags (for QinQ frames). If the RX buffer is too >> + * small, then the driver enables scatter RX behaviour, which reduces >> + * performance. To prevent this, use a buffer size that is closest to >> + * 'frame_len', but which satisfies the aforementioned criteria. >> + */ >> +static uint32_t >> +dpdk_buf_size(struct netdev_dpdk *netdev, int frame_len) >> +{ >> + struct rte_eth_dev_info info; >> + uint32_t buf_size; >> + uint32_t len = frame_len + (2 * DPDK_VLAN_TAG_LEN); >> + >> + if(netdev->type == DPDK_DEV_ETH) { >> + rte_eth_dev_info_get(netdev->port_id, &info); >> + buf_size = (info.min_rx_bufsize == 0) ? >> + NETDEV_DPDK_DEFAULT_RX_BUFSIZE : >> + info.min_rx_bufsize; > >Why not use the rte_eth_dev_get_mtu call to get the port configured mtu? I'm not sure if I follow - the MTU isn't the issue here, but rather the buffer size used when creating the rte_mempool, which must be a multiple of the driver's min_rx_bufsize. > >> + } else { >> + buf_size = NETDEV_DPDK_DEFAULT_RX_BUFSIZE; >> + } >> + >> + if(len % buf_size) { >> + len = buf_size * ((len/buf_size) + 1); >> + } >> + >> + return len; >> +} >> + >> /* XXX: use dpdk malloc for entire OVS. in fact huge page should be used >> * for all other segments data, bss and text. 
*/ >> >> @@ -280,31 +319,70 @@ free_dpdk_buf(struct dp_packet *p) >> } >> >> static void >> -__rte_pktmbuf_init(struct rte_mempool *mp, >> - void *opaque_arg OVS_UNUSED, >> - void *_m, >> - unsigned i OVS_UNUSED) >> +ovs_rte_pktmbuf_pool_init(struct rte_mempool *mp, void *opaque_arg) >> { >> - struct rte_mbuf *m = _m; >> - uint32_t buf_len = mp->elt_size - sizeof(struct dp_packet); >> + struct rte_pktmbuf_pool_private *user_mbp_priv, *mbp_priv; >> + struct rte_pktmbuf_pool_private default_mbp_priv; >> + uint16_t roomsz; >> >> RTE_MBUF_ASSERT(mp->elt_size >= sizeof(struct dp_packet)); >> >> - memset(m, 0, mp->elt_size); >> + /* if no structure is provided, assume no mbuf private area */ >> + >> + user_mbp_priv = opaque_arg; >> + if (user_mbp_priv == NULL) { >> + default_mbp_priv.mbuf_priv_size = 0; >> + if (mp->elt_size > sizeof(struct dp_packet)) { >> + roomsz = mp->elt_size - sizeof(struct dp_packet); >> + } else { >> + roomsz = 0; >> + } >> + default_mbp_priv.mbuf_data_room_size = roomsz; >> + user_mbp_priv = &default_mbp_priv; >> + } >> + >> + RTE_MBUF_ASSERT(mp->elt_size >= sizeof(struct dp_packet) + >> + user_mbp_priv->mbuf_data_room_size + >> + user_mbp_priv->mbuf_priv_size); >> + >> + mbp_priv = rte_mempool_get_priv(mp); >> + memcpy(mbp_priv, user_mbp_priv, sizeof(*mbp_priv)); >> +} >> + >> +/* Initialise some fields in the mbuf structure that are not modified by the >> + * user once created (origin pool, buffer start address, etc.*/ >> +static void >> +__ovs_rte_pktmbuf_init(struct rte_mempool *mp, >> + void *opaque_arg OVS_UNUSED, >> + void *_m, >> + unsigned i OVS_UNUSED) >> +{ >> + struct rte_mbuf *m = _m; >> + uint32_t buf_size, buf_len, priv_size; >> + >> + priv_size = rte_pktmbuf_priv_size(mp); >> + buf_size = sizeof(struct dp_packet) + priv_size; >> + buf_len = rte_pktmbuf_data_room_size(mp); >> >> - /* start of buffer is just after mbuf structure */ >> - m->buf_addr = (char *)m + sizeof(struct dp_packet); >> - m->buf_physaddr = 
rte_mempool_virt2phy(mp, m) + >> - sizeof(struct dp_packet); >> - m->buf_len = (uint16_t)buf_len; >> + RTE_MBUF_ASSERT(RTE_ALIGN(priv_size, RTE_MBUF_PRIV_ALIGN) == priv_size); >> + RTE_MBUF_ASSERT(mp->elt_size >= buf_size); >> + RTE_MBUF_ASSERT(buf_len <= UINT16_MAX); >> >> - /* keep some headroom between start of buffer and data */ >> - m->data_off = RTE_MIN(RTE_PKTMBUF_HEADROOM, m->buf_len); >> + memset(m, 0, mp->elt_size); >> >> - /* init some constant fields */ >> - m->pool = mp; >> - m->nb_segs = 1; >> - m->port = 0xff; >> + /* start of buffer is after dp_packet structure and priv data */ >> + m->priv_size = priv_size; >> + m->buf_addr = (char *)m + buf_size; >> + m->buf_physaddr = rte_mempool_virt2phy(mp, m) + buf_size; >> + m->buf_len = (uint16_t)buf_len; >> + >> + /* keep some headroom between start of buffer and data */ >> + m->data_off = RTE_MIN(RTE_PKTMBUF_HEADROOM, (uint16_t)m->buf_len); >> + >> + /* init some constant fields */ >> + m->pool = mp; >> + m->nb_segs = 1; >> + m->port = 0xff; > >Please don't mix tabs and spaces in the file. Apologies - this was a copy/paste from a DPDK file, which uses tabs. 
> >> } >> >> static void >> @@ -315,7 +393,7 @@ ovs_rte_pktmbuf_init(struct rte_mempool *mp, >> { >> struct rte_mbuf *m = _m; >> >> - __rte_pktmbuf_init(mp, opaque_arg, _m, i); >> + __ovs_rte_pktmbuf_init(mp, opaque_arg, m, i); >> >> dp_packet_init_dpdk((struct dp_packet *) m, m->buf_len); >> } >> @@ -326,6 +404,7 @@ dpdk_mp_get(int socket_id, int mtu) >> OVS_REQUIRES(dpdk_mutex) >> struct dpdk_mp *dmp = NULL; >> char mp_name[RTE_MEMPOOL_NAMESIZE]; >> unsigned mp_size; >> + struct rte_pktmbuf_pool_private mbp_priv; >> >> LIST_FOR_EACH (dmp, list_node, &dpdk_mp_list) { >> if (dmp->socket_id == socket_id && dmp->mtu == mtu) { >> @@ -338,6 +417,8 @@ dpdk_mp_get(int socket_id, int mtu) >> OVS_REQUIRES(dpdk_mutex) >> dmp->socket_id = socket_id; >> dmp->mtu = mtu; >> dmp->refcount = 1; >> + mbp_priv.mbuf_data_room_size = MTU_TO_FRAME_LEN(mtu) + >> RTE_PKTMBUF_HEADROOM; >> + mbp_priv.mbuf_priv_size = 0; >> >> mp_size = MAX_NB_MBUF; >> do { >> @@ -346,10 +427,10 @@ dpdk_mp_get(int socket_id, int mtu) >> OVS_REQUIRES(dpdk_mutex) >> return NULL; >> } >> >> - dmp->mp = rte_mempool_create(mp_name, mp_size, MBUF_SIZE(mtu), >> + dmp->mp = rte_mempool_create(mp_name, mp_size, >> MBUF_SEGMENT_SIZE(mtu), >> MP_CACHE_SZ, >> sizeof(struct >> rte_pktmbuf_pool_private), >> - rte_pktmbuf_pool_init, NULL, >> + ovs_rte_pktmbuf_pool_init, &mbp_priv, >> ovs_rte_pktmbuf_init, NULL, >> socket_id, 0); >> } while (!dmp->mp && rte_errno == ENOMEM && (mp_size /= 2) >= >> MIN_NB_MBUF); >> @@ -433,6 +514,7 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int >> n_rxq, int n_txq) >> { >> int diag = 0; >> int i; >> + struct rte_eth_conf conf = port_conf; >> >> /* A device may report more queues than it makes available (this has >> * been observed for Intel xl710, which reserves some of them for >> @@ -444,7 +526,12 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int >> n_rxq, int >n_txq) >> VLOG_INFO("Retrying setup with (rxq:%d txq:%d)", n_rxq, n_txq); >> } >> >> - diag = 
rte_eth_dev_configure(dev->port_id, n_rxq, n_txq, >> &port_conf); >> + if(OVS_UNLIKELY(dev->mtu > ETHER_MTU)) { >> + conf.rxmode.jumbo_frame = NETDEV_DPDK_JUMBO_FRAME_ENABLED; >> + conf.rxmode.max_rx_pkt_len = MTU_TO_FRAME_LEN(dev->mtu); >> + } >> + >> + diag = rte_eth_dev_configure(dev->port_id, n_rxq, n_txq, &conf); >> if (diag) { >> break; >> } >> @@ -586,6 +673,7 @@ netdev_dpdk_init(struct netdev *netdev_, unsigned int >> port_no, >> struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_); >> int sid; >> int err = 0; >> + uint32_t buf_size; >> >> ovs_mutex_init(&netdev->mutex); >> ovs_mutex_lock(&netdev->mutex); >> @@ -605,10 +693,16 @@ netdev_dpdk_init(struct netdev *netdev_, unsigned int >> port_no, >> netdev->port_id = port_no; >> netdev->type = type; >> netdev->flags = 0; >> + >> + /* Initialize port's MTU and frame len to the default Ethernet values. >> + * Larger, user-specified (jumbo) frame buffers are accommodated in >> + * netdev_dpdk_set_config. >> + */ >> + netdev->max_packet_len = ETHER_MAX_LEN; >> netdev->mtu = ETHER_MTU; >> - netdev->max_packet_len = MTU_TO_MAX_LEN(netdev->mtu); >> >> - netdev->dpdk_mp = dpdk_mp_get(netdev->socket_id, netdev->mtu); >> + buf_size = dpdk_buf_size(netdev, ETHER_MAX_LEN); >> + netdev->dpdk_mp = dpdk_mp_get(netdev->socket_id, >> FRAME_LEN_TO_MTU(buf_size)); >> if (!netdev->dpdk_mp) { >> err = ENOMEM; >> goto unlock; >> @@ -651,6 +745,24 @@ dpdk_dev_parse_name(const char dev_name[], const char >> prefix[], >> return 0; >> } >> >> +static void >> +dpdk_dev_parse_mtu(const struct smap *args, int *mtu) >> +{ >> + const char *mtu_str = smap_get(args, "dpdk-mtu"); >> + int local_mtu; >> + >> + if(mtu_str) { >> + local_mtu = atoi(mtu_str); > >Please use strtol or strtoul here, and detect errors in the string by >checking endptr. That way, we can be sure that random garbage on the >stack won't accidentally become the mtu. Will do, thanks. 
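For reference, a minimal sketch of the strtol-based parsing suggested above. This is not the code from the patch or from a later revision: the helper name and the SKETCH_* macros are hypothetical, and the bounds are simply the values the patch already uses (ETHER_MTU, i.e. 1500, and FRAME_LEN_TO_MTU(NETDEV_DPDK_MAX_FRAME_LEN), i.e. 13312 - 14 - 4 = 13294). It rejects empty strings, trailing garbage, and out-of-range values, and leaves the output untouched on failure so the caller can fall back to the default.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Assumed bounds, mirroring the patch: ETHER_MTU and
 * FRAME_LEN_TO_MTU(NETDEV_DPDK_MAX_FRAME_LEN) = 13312 - 14 - 4. */
#define SKETCH_MIN_MTU 1500
#define SKETCH_MAX_MTU 13294

/* Hypothetical helper (not part of the patch): parses 'mtu_str' as a
 * base-10 integer.  Returns true and stores the value in '*mtu' only if
 * the whole string is a valid number within the supported range;
 * otherwise returns false and leaves '*mtu' unmodified. */
static bool
sketch_parse_mtu(const char *mtu_str, int *mtu)
{
    char *endptr;
    long value;

    if (!mtu_str || *mtu_str == '\0') {
        return false;
    }

    errno = 0;
    value = strtol(mtu_str, &endptr, 10);

    /* Reject overflow, strings with no digits, and trailing characters. */
    if (errno != 0 || endptr == mtu_str || *endptr != '\0') {
        return false;
    }

    if (value < SKETCH_MIN_MTU || value > SKETCH_MAX_MTU) {
        return false;
    }

    *mtu = (int) value;
    return true;
}
```

With something along these lines, dpdk_dev_parse_mtu could warn and default to ETHER_MTU whenever the helper returns false, rather than relying on atoi, which cannot report a malformed string.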
> >> + } >> + if(!mtu_str || local_mtu < ETHER_MTU || >> + local_mtu > FRAME_LEN_TO_MTU(NETDEV_DPDK_MAX_FRAME_LEN)) >> { >> + local_mtu = ETHER_MTU; >> + VLOG_WARN("Invalid or missing dpdk-mtu parameter - defaulting to >> %d.\n", >> + local_mtu); >> + } >> + *mtu = local_mtu; >> +} >> + >> static int >> vhost_construct_helper(struct netdev *netdev_) OVS_REQUIRES(dpdk_mutex) >> { >> @@ -777,11 +889,77 @@ netdev_dpdk_get_config(const struct netdev *netdev_, >> struct smap >*args) >> smap_add_format(args, "configured_rx_queues", "%d", netdev_->n_rxq); >> smap_add_format(args, "requested_tx_queues", "%d", netdev_->n_txq); >> smap_add_format(args, "configured_tx_queues", "%d", dev->real_n_txq); >> + smap_add_format(args, "mtu", "%d", dev->mtu); >> ovs_mutex_unlock(&dev->mutex); >> >> return 0; >> } >> >> +/* Set the mtu of DPDK_DEV_ETH ports */ >> +static int >> +netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu) >> +{ >> + struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); >> + int old_mtu, err; >> + uint32_t buf_size; >> + int dpdk_mtu; >> + struct dpdk_mp *old_mp; >> + struct dpdk_mp *mp; >> + >> + ovs_mutex_lock(&dpdk_mutex); >> + ovs_mutex_lock(&dev->mutex); >> + if (dev->mtu == mtu) { >> + err = 0; >> + goto out; >> + } >> + >> + buf_size = dpdk_buf_size(dev, MTU_TO_FRAME_LEN(mtu)); >> + dpdk_mtu = FRAME_LEN_TO_MTU(buf_size); >> + >> + mp = dpdk_mp_get(dev->socket_id, dpdk_mtu); >> + if (!mp) { >> + err = ENOMEM; >> + goto out; >> + } >> + >> + rte_eth_dev_stop(dev->port_id); >> + >> + old_mtu = dev->mtu; >> + old_mp = dev->dpdk_mp; >> + dev->dpdk_mp = mp; >> + dev->mtu = mtu; >> + dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu); >> + >> + err = dpdk_eth_dev_init(dev); > >Why call the dpdk_eth_dev_init here? Since you are directly calling >rte_eth_dev_X functions here - wouldn't it make sense to use the >rte_eth_dev_start() instead? This is legacy code, but I can refactor if needs be. 
> >> + if (err) { >> + VLOG_WARN("Unable to set MTU '%d' for '%s'; reverting to last known >> " >> + "good value '%d'\n", mtu, dev->up.name, old_mtu); >> + dpdk_mp_put(mp); >> + dev->mtu = old_mtu; >> + dev->dpdk_mp = old_mp; >> + dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu); >> + err = dpdk_eth_dev_init(dev); > >What happens if this returns a failure? By using dpdk_eth_dev_init, you >will cause a queue reconfiguration action, which is not needed, and can >error. > As above. >If you are intending to set MTU, I would have expected calls to >rte_eth_dev_set_mtu. However, I don't see any. Is there a reason? I >recognize that much of this is legacy, but since you are touching it :) I actually had a number of issues with rte_eth_dev_set_mtu early on in the patch; I can take a second look now that the code is more stable though. > >> + goto out; >> + } >> + >> + dpdk_mp_put(old_mp); >> + netdev_change_seq_changed(netdev); >> +out: >> + ovs_mutex_unlock(&dev->mutex); >> + ovs_mutex_unlock(&dpdk_mutex); >> + return err; >> +} >> + >> +static int >> +netdev_dpdk_set_config(struct netdev *netdev_, const struct smap *args) >> +{ >> + int mtu; >> + >> + dpdk_dev_parse_mtu(args, &mtu); >> + >> + return netdev_dpdk_set_mtu(netdev_, mtu); >> +} >> + >> static int >> netdev_dpdk_get_numa_id(const struct netdev *netdev_) >> { >> @@ -1358,54 +1536,6 @@ netdev_dpdk_get_mtu(const struct netdev *netdev, int >> *mtup) >> >> return 0; >> } >> - >> -static int >> -netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu) >> -{ >> - struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); >> - int old_mtu, err; >> - struct dpdk_mp *old_mp; >> - struct dpdk_mp *mp; >> - >> - ovs_mutex_lock(&dpdk_mutex); >> - ovs_mutex_lock(&dev->mutex); >> - if (dev->mtu == mtu) { >> - err = 0; >> - goto out; >> - } >> - >> - mp = dpdk_mp_get(dev->socket_id, dev->mtu); >> - if (!mp) { >> - err = ENOMEM; >> - goto out; >> - } >> - >> - rte_eth_dev_stop(dev->port_id); >> - >> - old_mtu = dev->mtu; >> - old_mp 
= dev->dpdk_mp; >> - dev->dpdk_mp = mp; >> - dev->mtu = mtu; >> - dev->max_packet_len = MTU_TO_MAX_LEN(dev->mtu); >> - >> - err = dpdk_eth_dev_init(dev); >> - if (err) { >> - dpdk_mp_put(mp); >> - dev->mtu = old_mtu; >> - dev->dpdk_mp = old_mp; >> - dev->max_packet_len = MTU_TO_MAX_LEN(dev->mtu); >> - dpdk_eth_dev_init(dev); >> - goto out; >> - } >> - >> - dpdk_mp_put(old_mp); >> - netdev_change_seq_changed(netdev); >> -out: >> - ovs_mutex_unlock(&dev->mutex); >> - ovs_mutex_unlock(&dpdk_mutex); >> - return err; >> -} >> - >> static int >> netdev_dpdk_get_carrier(const struct netdev *netdev_, bool *carrier); >> >> @@ -1682,7 +1812,7 @@ netdev_dpdk_get_status(const struct netdev *netdev_, >> struct smap >*args) >> smap_add_format(args, "numa_id", "%d", >> rte_eth_dev_socket_id(dev->port_id)); >> smap_add_format(args, "driver_name", "%s", dev_info.driver_name); >> smap_add_format(args, "min_rx_bufsize", "%u", dev_info.min_rx_bufsize); >> - smap_add_format(args, "max_rx_pktlen", "%u", dev_info.max_rx_pktlen); >> + smap_add_format(args, "max_rx_pktlen", "%u", dev->max_packet_len); >> smap_add_format(args, "max_rx_queues", "%u", dev_info.max_rx_queues); >> smap_add_format(args, "max_tx_queues", "%u", dev_info.max_tx_queues); >> smap_add_format(args, "max_mac_addrs", "%u", dev_info.max_mac_addrs); >> @@ -1904,6 +2034,51 @@ dpdk_vhost_user_class_init(void) >> return 0; >> } >> >> +/* Set the mtu of DPDK_DEV_VHOST ports */ >> +static int >> +netdev_dpdk_vhost_set_mtu(const struct netdev *netdev, int mtu) >> +{ >> + struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); >> + int err = 0; >> + struct dpdk_mp *old_mp; >> + struct dpdk_mp *mp; >> + >> + ovs_mutex_lock(&dpdk_mutex); >> + ovs_mutex_lock(&dev->mutex); >> + if (dev->mtu == mtu) { >> + err = 0; >> + goto out; >> + } >> + >> + mp = dpdk_mp_get(dev->socket_id, mtu); >> + if (!mp) { >> + err = ENOMEM; >> + goto out; >> + } >> + >> + old_mp = dev->dpdk_mp; >> + dev->dpdk_mp = mp; >> + dev->mtu = mtu; >> + 
dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu); >> + >> + dpdk_mp_put(old_mp); >> + netdev_change_seq_changed(netdev); >> +out: >> + ovs_mutex_unlock(&dev->mutex); >> + ovs_mutex_unlock(&dpdk_mutex); >> + return err; >> +} >> + >> +static int >> +netdev_dpdk_vhost_set_config(struct netdev *netdev_, const struct smap >> *args) >> +{ >> + int mtu; >> + >> + dpdk_dev_parse_mtu(args, &mtu); >> + >> + return netdev_dpdk_vhost_set_mtu(netdev_, mtu); >> +} >> + >> static void >> dpdk_common_init(void) >> { >> @@ -2040,8 +2215,9 @@ unlock_dpdk: >> return err; >> } >> >> -#define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, MULTIQ, SEND, \ >> - GET_CARRIER, GET_STATS, GET_FEATURES, GET_STATUS, RXQ_RECV) \ >> +#define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, SET_CONFIG, \ >> + MULTIQ, SEND, SET_MTU, GET_CARRIER, GET_STATS, GET_FEATURES, \ >> + GET_STATUS, RXQ_RECV) \ >> { \ >> NAME, \ >> INIT, /* init */ \ >> @@ -2053,7 +2229,7 @@ unlock_dpdk: >> DESTRUCT, \ >> netdev_dpdk_dealloc, \ >> netdev_dpdk_get_config, \ >> - NULL, /* netdev_dpdk_set_config */ \ >> + SET_CONFIG, \ >> NULL, /* get_tunnel_config */ \ >> NULL, /* build header */ \ >> NULL, /* push header */ \ >> @@ -2067,7 +2243,7 @@ unlock_dpdk: >> netdev_dpdk_set_etheraddr, \ >> netdev_dpdk_get_etheraddr, \ >> netdev_dpdk_get_mtu, \ >> - netdev_dpdk_set_mtu, \ >> + SET_MTU, \ >> netdev_dpdk_get_ifindex, \ >> GET_CARRIER, \ >> netdev_dpdk_get_carrier_resets, \ >> @@ -2213,8 +2389,10 @@ static const struct netdev_class dpdk_class = >> NULL, >> netdev_dpdk_construct, >> netdev_dpdk_destruct, >> + netdev_dpdk_set_config, >> netdev_dpdk_set_multiq, >> netdev_dpdk_eth_send, >> + netdev_dpdk_set_mtu, >> netdev_dpdk_get_carrier, >> netdev_dpdk_get_stats, >> netdev_dpdk_get_features, >> @@ -2227,8 +2405,10 @@ static const struct netdev_class dpdk_ring_class = >> NULL, >> netdev_dpdk_ring_construct, >> netdev_dpdk_destruct, >> + netdev_dpdk_set_config, >> netdev_dpdk_set_multiq, >> netdev_dpdk_ring_send, >> + 
netdev_dpdk_set_mtu, >> netdev_dpdk_get_carrier, >> netdev_dpdk_get_stats, >> netdev_dpdk_get_features, >> @@ -2241,8 +2421,10 @@ static const struct netdev_class OVS_UNUSED >> dpdk_vhost_cuse_class = >> dpdk_vhost_cuse_class_init, >> netdev_dpdk_vhost_cuse_construct, >> netdev_dpdk_vhost_destruct, >> + NULL, >> netdev_dpdk_vhost_set_multiq, >> netdev_dpdk_vhost_send, >> + NULL, >> netdev_dpdk_vhost_get_carrier, >> netdev_dpdk_vhost_get_stats, >> NULL, >> @@ -2255,8 +2437,10 @@ static const struct netdev_class OVS_UNUSED >> dpdk_vhost_user_class = >> dpdk_vhost_user_class_init, >> netdev_dpdk_vhost_user_construct, >> netdev_dpdk_vhost_destruct, >> + netdev_dpdk_vhost_set_config, >> netdev_dpdk_vhost_set_multiq, >> netdev_dpdk_vhost_send, >> + netdev_dpdk_vhost_set_mtu, >> netdev_dpdk_vhost_get_carrier, >> netdev_dpdk_vhost_get_stats, >> NULL,