Hi Ilya,

Thanks for the comments, my replies are in-line.
On 17/10/2018 17:30, Ilya Maximets wrote:
> On 10.10.2018 19:22, Tiago Lam wrote:
>> From: Mark Kavanagh <mark.b.kavan...@intel.com>
>>
>> There are numerous factors that must be considered when calculating
>> the size of an mbuf:
>> - the data portion of the mbuf must be sized in accordance with Rx
>>   buffer alignment (typically 1024B). So, for example, in order to
>>   successfully receive and capture a 1500B packet, mbufs with a
>>   data portion of size 2048B must be used.
>> - in OvS, the elements that comprise an mbuf are:
>>   * the dp_packet, which includes a struct rte_mbuf (704B)
>>   * RTE_PKTMBUF_HEADROOM (128B)
>>   * packet data (aligned to 1k, as previously described)
>>   * RTE_PKTMBUF_TAILROOM (typically 0)
>>
>> Some PMDs require that the total mbuf size (i.e. the total sum of all
>> of the above-listed components' lengths) is cache-aligned. To satisfy
>> this requirement, it may be necessary to round up the total mbuf size
>> with respect to cacheline size. In doing so, it's possible that the
>> dp_packet's data portion is inadvertently increased in size, such that
>> it no longer adheres to Rx buffer alignment. Consequently, the
>> following property of the mbuf no longer holds true:
>>
>>     mbuf.data_len == mbuf.buf_len - mbuf.data_off
>>
>> This creates a problem in the case of multi-segment mbufs, where that
>> property is assumed to hold for all but the final segment in an
>> mbuf chain. Resolve this issue by adjusting the size of the mbuf's
>> private data portion, as opposed to the packet data portion, when
>> aligning mbuf size to cachelines.
>>
>> Fixes: 4be4d22 ("netdev-dpdk: clean up mbuf initialization")
>> Fixes: 31b88c9 ("netdev-dpdk: round up mbuf_size to cache_line_size")
>> CC: Santosh Shukla <santosh.shu...@caviumnetworks.com>
>> Signed-off-by: Mark Kavanagh <mark.b.kavan...@intel.com>
>> Acked-by: Santosh Shukla <santosh.shu...@caviumnetworks.com>
>> Acked-by: Eelco Chaudron <echau...@redhat.com>
>> ---
>>  lib/netdev-dpdk.c | 56 +++++++++++++++++++++++++++++++++++++------------------
>>  1 file changed, 38 insertions(+), 18 deletions(-)
>>
>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>> index f91aa27..11659eb 100644
>> --- a/lib/netdev-dpdk.c
>> +++ b/lib/netdev-dpdk.c
>> @@ -88,10 +88,6 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
>>  #define MTU_TO_MAX_FRAME_LEN(mtu)   ((mtu) + ETHER_HDR_MAX_LEN)
>>  #define FRAME_LEN_TO_MTU(frame_len) ((frame_len)                    \
>>                                       - ETHER_HDR_LEN - ETHER_CRC_LEN)
>> -#define MBUF_SIZE(mtu)              ROUND_UP((MTU_TO_MAX_FRAME_LEN(mtu) \
>> -                                             + sizeof(struct dp_packet) \
>> -                                             + RTE_PKTMBUF_HEADROOM),   \
>> -                                             RTE_CACHE_LINE_SIZE)
>>  #define NETDEV_DPDK_MBUF_ALIGN      1024
>>  #define NETDEV_DPDK_MAX_PKT_LEN     9728
>>
>> @@ -637,7 +633,11 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
>>      char mp_name[RTE_MEMPOOL_NAMESIZE];
>>      const char *netdev_name = netdev_get_name(&dev->up);
>>      int socket_id = dev->requested_socket_id;
>> -    uint32_t n_mbufs;
>> +    uint32_t n_mbufs = 0;
>> +    uint32_t mbuf_size = 0;
>> +    uint32_t aligned_mbuf_size = 0;
>> +    uint32_t mbuf_priv_data_len = 0;
>> +    uint32_t pkt_size = 0;
>>      uint32_t hash = hash_string(netdev_name, 0);
>>      struct dpdk_mp *dmp = NULL;
>>      int ret;
>>
>> @@ -650,6 +650,9 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
>>      dmp->mtu = mtu;
>>      dmp->refcount = 1;
>>
>> +    /* Get the size of each mbuf, based on the MTU */
>> +    mbuf_size = dpdk_buf_size(dev->requested_mtu);
>
> 1.
> You're using 'requested_mtu' here directly, which is different to
> 'mtu' passed to this function. This looks strange and misleading
> for the reader. Could you unify this?

I see your point. The `mtu` passed to dpdk_mp_create() is a result of
using the FRAME_LEN_TO_MTU macro on the buf_size already (in
netdev_dpdk_mempool_configure()). So, I can use the reverse,
MTU_TO_FRAME_LEN, to get back the mbuf_size. How does that sound?

> 2. Almost the same as the previous concern from Flavio:
> 'mbuf_size' accounts for RTE_PKTMBUF_HEADROOM here and is rounded up.

Thanks, I've replied below.

>
>> +
>>      n_mbufs = dpdk_calculate_mbufs(dev, mtu, per_port_mp);
>>
>>      do {
>> @@ -661,8 +664,8 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
>>           * so this is not an issue for tasks such as debugging. */
>>          ret = snprintf(mp_name, RTE_MEMPOOL_NAMESIZE,
>> -                      "ovs%08x%02d%05d%07u",
>> -                      hash, socket_id, mtu, n_mbufs);
>> +                       "ovs%08x%02d%05d%07u",
>> +                       hash, socket_id, mtu, n_mbufs);
>>          if (ret < 0 || ret >= RTE_MEMPOOL_NAMESIZE) {
>>              VLOG_DBG("snprintf returned %d. "
>>                       "Failed to generate a mempool name for \"%s\". "
>> @@ -671,17 +674,34 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
>>              break;
>>          }
>>
>> -    VLOG_DBG("Port %s: Requesting a mempool of %u mbufs "
>> -             "on socket %d for %d Rx and %d Tx queues.",
>> -             netdev_name, n_mbufs, socket_id,
>> -             dev->requested_n_rxq, dev->requested_n_txq);
>> -
>> -    dmp->mp = rte_pktmbuf_pool_create(mp_name, n_mbufs,
>> -                                      MP_CACHE_SZ,
>> -                                      sizeof (struct dp_packet)
>> -                                      - sizeof (struct rte_mbuf),
>> -                                      MBUF_SIZE(mtu)
>> -                                      - sizeof(struct dp_packet),
>> +    VLOG_DBG("Port %s: Requesting a mempool of %u mbufs of size %u "
>> +             "on socket %d for %d Rx and %d Tx queues, "
>> +             "cache line size of %u",
>> +             netdev_name, n_mbufs, mbuf_size, socket_id,
>> +             dev->requested_n_rxq, dev->requested_n_txq,
>> +             RTE_CACHE_LINE_SIZE);
>> +
>> +    mbuf_priv_data_len = sizeof(struct dp_packet) -
>> +                         sizeof(struct rte_mbuf);
>> +    /* The size of the entire dp_packet. */
>> +    pkt_size = sizeof(struct dp_packet) + mbuf_size + RTE_PKTMBUF_HEADROOM;
>
> 2*. Here RTE_PKTMBUF_HEADROOM is accounted again. That is unclear.

This had already been removed locally; I only noticed after sending the
patchset.

Tiago.

>
>> +    /* mbuf size, rounded up to cacheline size. */
>> +    aligned_mbuf_size = ROUND_UP(pkt_size, RTE_CACHE_LINE_SIZE);
>> +    /* If there is a size discrepancy, add padding to mbuf_priv_data_len.
>> +     * This maintains mbuf size cache alignment, while also honoring RX
>> +     * buffer alignment in the data portion of the mbuf. If this adjustment
>> +     * is not made, there is a possibility later on that for an element of
>> +     * the mempool, buf, buf->data_len < (buf->buf_len - buf->data_off).
>> +     * This is problematic in the case of multi-segment mbufs, particularly
>> +     * when an mbuf segment needs to be resized (when [push|pop]ping a VLAN
>> +     * header, for example).
>> +     */
>> +    mbuf_priv_data_len += (aligned_mbuf_size - pkt_size);
>> +
>> +    dmp->mp = rte_pktmbuf_pool_create(mp_name, n_mbufs, MP_CACHE_SZ,
>> +                                      mbuf_priv_data_len,
>> +                                      mbuf_size,
>>                                        socket_id);
>>
>>      if (dmp->mp) {

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev