> -----Original Message----- > From: Richardson, Bruce > Sent: Friday, December 05, 2014 12:11 PM > To: Ananyev, Konstantin > Cc: Jean-Mickael Guerin; dev at dpdk.org > Subject: Re: [PATCH v2] ixgbe: don't override mbuf buffer length > > On Fri, Dec 05, 2014 at 11:59:50AM +0000, Ananyev, Konstantin wrote: > > > > > > > -----Original Message----- > > > From: Richardson, Bruce > > > Sent: Friday, December 05, 2014 11:29 AM > > > To: Ananyev, Konstantin > > > Cc: Jean-Mickael Guerin; dev at dpdk.org > > > Subject: Re: [PATCH v2] ixgbe: don't override mbuf buffer length > > > > > > On Fri, Dec 05, 2014 at 11:19:11AM +0000, Ananyev, Konstantin wrote: > > > > > > > > > > > > > -----Original Message----- > > > > > From: Richardson, Bruce > > > > > Sent: Friday, December 05, 2014 10:39 AM > > > > > To: Ananyev, Konstantin > > > > > Cc: Jean-Mickael Guerin; dev at dpdk.org > > > > > Subject: Re: [PATCH v2] ixgbe: don't override mbuf buffer length > > > > > > > > > > On Fri, Dec 05, 2014 at 01:10:28AM +0000, Ananyev, Konstantin wrote: > > > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > > From: Jean-Mickael Guerin [mailto:jean-mickael.guerin at > > > > > > > 6wind.com] > > > > > > > Sent: Thursday, December 04, 2014 6:09 PM > > > > > > > To: dev at dpdk.org > > > > > > > Cc: Richardson, Bruce; Ananyev, Konstantin > > > > > > > Subject: [PATCH v2] ixgbe: don't override mbuf buffer length > > > > > > > > > > > > > > The template mbuf_initializer is hard coded with a buflen which > > > > > > > might have been set differently by the application at the time of > > > > > > > mbuf pool creation. > > > > > > > > > > > > > > Switch to a mbuf allocation, to fetch the correct default values. > > > > > > > There is no performance impact because this is not a data-plane > > > > > > > API. > > > > > > > > > > > > > > Signed-off-by: Jean-Mickael Guerin <jean-mickael.guerin at > > > > > > > 6wind.com> > > > > > > > Acked-by: David Marchand <david.marchand at 6wind.com> > > > > > > > Fixes: 0ff3324da2 ("ixgbe: rework vector pmd following mbuf > > > > > > > changes") > > > > > > > --- > > > > > > > > > > > > > > v2: check returned value of ixgbe_rxq_vec_setup > > > > > > > > > > > > > > lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 5 ++++- > > > > > > > lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c | 19 ++++++++++++------- > > > > > > > 2 files changed, 16 insertions(+), 8 deletions(-) > > > > > > > > > > > > > > diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c > > > > > > > b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c > > > > > > > index 5c36bff..7994da1 100644 > > > > > > > --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c > > > > > > > +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c > > > > > > > @@ -2244,7 +2244,10 @@ ixgbe_dev_rx_queue_setup(struct > > > > > > > rte_eth_dev *dev, > > > > > > > use_def_burst_func = > > > > > > > check_rx_burst_bulk_alloc_preconditions(rxq); > > > > > > > > > > > > > > #ifdef RTE_IXGBE_INC_VECTOR > > > > > > > - ixgbe_rxq_vec_setup(rxq); > > > > > > > + if (ixgbe_rxq_vec_setup(rxq) < 0) { > > > > > > > + ixgbe_rx_queue_release(rxq); > > > > > > > + return (-ENOMEM); > > > > > > > + } > > > > > > > #endif > > > > > > > /* Check if pre-conditions are satisfied, and no Scattered Rx */ > > > > > > > if (!use_def_burst_func && !dev->data->scattered_rx) { > > > > > > > diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c > > > > > > > b/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c > > > > > > > index c1b5a78..f7b02f5 100644 > > > > > > > --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c > > > > > > > +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c > > > > > > > @@ -732,17 +732,22 @@ static struct ixgbe_txq_ops vec_txq_ops = { > > > > > > > int > > > > > > > ixgbe_rxq_vec_setup(struct igb_rx_queue *rxq) > > > > > > > { > > > > > > > - struct rte_mbuf mb_def = { .buf_addr = 0 }; /* zeroed mbuf */ > > > > > > > + struct rte_mbuf *mb_def; > > > > > > > > > > > > > > - mb_def.nb_segs = 1; > > > > > > > - mb_def.data_off = RTE_PKTMBUF_HEADROOM; > > > > > > > - mb_def.buf_len = rxq->mb_pool->elt_size - sizeof(struct > > > > > > > rte_mbuf); > > > > > > > - mb_def.port = rxq->port_id; > > > > > > > - rte_mbuf_refcnt_set(&mb_def, 1); > > > > > > > + mb_def = rte_pktmbuf_alloc(rxq->mb_pool); > > > > > > > + if (mb_def == NULL) { > > > > > > > + PMD_INIT_LOG(ERR, "ixgbe_rxq_vec_setup: could not > > > > > > > allocate one mbuf"); > > > > > > > + return -1; > > > > > > > + } > > > > > > > + /* nb_segs, refcnt, data_off and buf_len are already set */ > > > > > > > + mb_def->port = rxq->port_id; > > > > > > > > > > > > > > /* prevent compiler reordering: rearm_data covers previous > > > > > > > fields */ > > > > > > > rte_compiler_barrier(); > > > > > > > - rxq->mbuf_initializer = *((uint64_t *)&mb_def.rearm_data); > > > > > > > + rxq->mbuf_initializer = *((uint64_t *)&mb_def->rearm_data); > > > > > > > + > > > > > > > + rte_pktmbuf_free(mb_def); > > > > > > > + > > > > > > > return 0; > > > > > > > } > > > > > > > > > > > > > > -- > > > > > > > 2.1.3 > > > > > > > > > > > > As I said in another mail, I don't think it is a proper fix. > > > > > > What we did here - just changed one assumption to another. > > > > > > Current assumption - all mbuf obj_init() would setup buf_len in > > > > > > exactly the same manner as rte_pktmbuf_init() does: > > > > > > buf_len = mp->elt_size - sizeof(struct rte_mbuf); > > > > > > New assumption - all mbuf obj_init() would setup buf_len for all > > > > > > mbufs in the pool to the same value. > > > > > > Both assumptions, I believe, are not always correct. > > > > > > Though, probably the new one would be true more often. > > > > > > > > > > > > I still think the proper fix is not to update mbuf's buf_len field > > > > > > at ixgbe_rxq_rearm() at all. > > > > > > We should just leave the original value unmodified. > > > > > > Actually, while looking at ixgbe_rxq_rearm(), I don't see any > > > > > > reason why we need to update buf_len field. > > > > > > It is not the data that need to be rearmed. > > > > > > The fields that need to be rearmed are: > > > > > > uint16_t data_off; > > > > > > uint16_t refcnt > > > > > > uint8_t nb_segs; > > > > > > uint8_t port; > > > > > > > > > > > > 6B in total. > > > > > > We probably would like to keep rearming as one 64bit load/store. > > > > > > Though straight below them we have: > > > > > > uint64_t ol_flags; > > > > > > > > > > > > As RX fully override ol_flags anyway, we can safely overwrite first > > > > > > 2B of it. > > > > > > That would allow us to still read/write whole 64bits and avoid > > > > > > overwriting buf_len. > > > > > > I am talking about something like patch below. > > > > > > I admit that it looks not so pretty, but I think it is much safer > > > > > > and correct. > > > > > > Konstantin > > > > > > > > > > I just don't see this as worthwhile doing. We are looking here at an > > > > > mbuf pool > > > > > which is to be used for packet buffers for RX. If the packet buffers > > > > > in that > > > > > pool are all of different sizes then we need to go back and look at > > > > > other places > > > > > throughout the code too. For instance, we query the buffer length of > > > > > the mbuf > > > > > pool when initializing the RX queues to determine if we need to > > > > > enable scattered > > > > > RX. If the mbufs in the pool can potentially be of different sizes, > > > > > we need to > > > > > turn off the no-scattered-packets optimization and always use the > > > > > scatter > > > > > packets code path - because the assumption that buffers don't get > > > > > resized could > > > > > also be false. Similarly here, if the mbufs are going to be of > > > > > different > > > > > sizes then the user should disable the vector PMD, and use RX code > > > > > that doesn't > > > > > override the buf_len each time. > > > > > > > > Yes, all buffers in the pool supposed to be not smaller than some > > > > threshold. > > > > But it doesn't mean that buffer can't be bigger. > > > > Let say our usual case - buf_len = 2K. > > > > Why it should be prohibited to use for RX mbuf with buf_len == 4K? > > > > There is absolutely nothing wrong with it. > > > > > > > > > > > > > > Furthermore, we still don't have an actual use-case where the user > > > > > would want to > > > > > have different size mbufs in an mbuf pool used for RX. They can still > > > > > have > > > > > variable-sized mbufs in other pools, but having all buffers the same > > > > > size in the > > > > > pool used to receive packets seems a perfectly fair restriction to > > > > > have. If > > > > > someone has an app that they are creating that needs this > > > > > functionality, I'll > > > > > reconsider this opinion, but right now this is all theoretical. > > > > > > > > Well, the question is why do we need to override buf_len at all? > > > > At first place, it doesn't look correct. > > > > Second - what advantage we are gaining from it? > > > > Performance? > > > > I tried with the changes below and didn't see any performance > > > > difference at all. > > > > So what is the point in keeping that potential point of failure? > > > > > > > > About use cases - I think I gave two examples in my previous mail. > > > > Not sure why you consider them artificial. > > > > From other side, do we have right now any actual use-case where people > > > > setup buf_len to something > > > > different then: mp->elt_size - sizeof(struct rte_mbuf)? > > > > > > > > Konstantin > > > > > > > Ok, thanks for the info Konstantin - I assume there is a current case > > > where the > > > mbuf is being initialized to something other than the default, on the > > > basis that > > > this patch was proposed as a fix for it. > > > > > > However, rather than argue this further, I'll defer to whatever the rest > > > of the > > > community decides. Thomas, Jean-Mickael, David - what are your opinions > > > on this? > > > > That's ok, but I still wonder why do we need to keep buf_len under > > rearm_data marker and overwrite it for each RX? > > Honestly, probably there exist some obvious reason, that I am missing? > > To keep code tidy and easier to read? > > Or to keep whole (uint64_t)rearm_data on 64bits boundary? > > That are all good reasons, but I suppose data integrity is more important, > > right? > > > Yes, those are the reasons it was put there (easy to read, aligned), but also > because it was viewed as being constant for a queue, i.e. the case where you > would have different sized mbufs in a pool was never considered. [The first > line > of our documentation for the mempools is "A memory pool is an allocator of a > fixed-sized object", and everything is based around that idea of fixed size > object pools.] > When doing the mbuf redesign, I pulled together what were the fields that were > constant for each queue, and grouped them to a single 8-byte write to give > faster (and shorter in terms of code) mbuf initialization. >
Ok, thanks for explanation. I'll submit the fix I am proposing as RFC patch. Konstantin