Hi Robert,

On 08/23/2016 10:01 PM, Sanford, Robert wrote:
> Hi Olivier,
>
> On 8/23/16, 11:09 AM, "Olivier MATZ" <olivier.matz at 6wind.com> wrote:
>
> Hi Robert,
>
> On 08/01/2016 10:42 PM, Robert Sanford wrote:
> > The following log message may appear after a slave is idle (or nearly
> > idle) for a few minutes: "PMD: Failed to allocate LACP packet from
> > pool".
> >
> > Problem: All mbufs from a slave's private pool (used exclusively for
> > transmitting LACPDUs) have been allocated and are still sitting in
> > the device's tx descriptor ring and other cores' mempool caches.
> >
> > Solution: Ensure that each slave's tx (LACPDU) mempool owns more than
> > n-tx-queues * (n-tx-descriptors + per-core-mempool-flush-threshold)
> > mbufs.
> >
> > Note that the LACP tx machine function is the only code that allocates
> > from a slave's private pool. It runs in the context of the interrupt
> > thread, and thus it has no mempool cache of its own.
> >
> > Signed-off-by: Robert Sanford <rsanford at akamai.com>
> > ---
> >  drivers/net/bonding/rte_eth_bond_8023ad.c | 10 +++++++---
> >  1 files changed, 7 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/net/bonding/rte_eth_bond_8023ad.c b/drivers/net/bonding/rte_eth_bond_8023ad.c
> > index 2f7ae70..1207896 100644
> > --- a/drivers/net/bonding/rte_eth_bond_8023ad.c
> > +++ b/drivers/net/bonding/rte_eth_bond_8023ad.c
> > @@ -854,6 +854,8 @@ bond_mode_8023ad_activate_slave(struct rte_eth_dev *bond_dev, uint8_t slave_id)
> >      char mem_name[RTE_ETH_NAME_MAX_LEN];
> >      int socket_id;
> >      unsigned element_size;
> > +    unsigned cache_size;
> > +    unsigned cache_flushthresh;
> >      uint32_t total_tx_desc;
> >      struct bond_tx_queue *bd_tx_q;
> >      uint16_t q_id;
> > @@ -890,19 +892,21 @@ bond_mode_8023ad_activate_slave(struct rte_eth_dev *bond_dev, uint8_t slave_id)
> >
> >      element_size = sizeof(struct slow_protocol_frame) + sizeof(struct rte_mbuf)
> >              + RTE_PKTMBUF_HEADROOM;
> > +    cache_size = RTE_MEMPOOL_CACHE_MAX_SIZE >= 32 ?
> > +        32 : RTE_MEMPOOL_CACHE_MAX_SIZE;
> > +    cache_flushthresh = RTE_MEMPOOL_CALC_CACHE_FLUSHTHRESH(cache_size);
> >
> >      /* The size of the mempool should be at least:
> >       * the sum of the TX descriptors + BOND_MODE_8023AX_SLAVE_TX_PKTS */
> >      total_tx_desc = BOND_MODE_8023AX_SLAVE_TX_PKTS;
> >      for (q_id = 0; q_id < bond_dev->data->nb_tx_queues; q_id++) {
> >          bd_tx_q = (struct bond_tx_queue*)bond_dev->data->tx_queues[q_id];
> > -        total_tx_desc += bd_tx_q->nb_tx_desc;
> > +        total_tx_desc += bd_tx_q->nb_tx_desc + cache_flushthresh;
> >      }
> >
> >      snprintf(mem_name, RTE_DIM(mem_name), "slave_port%u_pool", slave_id);
> >      port->mbuf_pool = rte_mempool_create(mem_name,
> > -        total_tx_desc, element_size,
> > -        RTE_MEMPOOL_CACHE_MAX_SIZE >= 32 ? 32 : RTE_MEMPOOL_CACHE_MAX_SIZE,
> > +        total_tx_desc, element_size, cache_size,
> >          sizeof(struct rte_pktmbuf_pool_private), rte_pktmbuf_pool_init,
> >          NULL, rte_pktmbuf_init, NULL, socket_id, MEMPOOL_F_NO_SPREAD);
> >
>
> I'm not very familiar with bonding code, so maybe your patch is correct.
> I think the size of the mempool should be:
>
>   BOND_MODE_8023AX_SLAVE_TX_PKTS +
>     n_cores * RTE_MEMPOOL_CALC_CACHE_FLUSHTHRESH(cache_size)
>
> With n_cores = number of cores that can dequeue from the mempool.
>
> The safest thing to do would be to have n_cores = RTE_MAX_LCORE. I don't
> know if bond_dev->data->nb_tx_queue corresponds to this definition, if
> yes you can ignore my comment ;)
>
> Regards,
> Olivier
> --
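(For readers following the arithmetic, here is a rough, standalone rendering
of the sizing the patch computes. The queue count, ring depth, and the
stand-in for BOND_MODE_8023AX_SLAVE_TX_PKTS are made-up example values, and
the 1.5x flush-threshold multiplier is an assumption about the mempool cache
behaviour, not something taken from the patch itself.)

/* Toy arithmetic only -- no DPDK headers or APIs are used here. */
#include <stdio.h>

int main(void)
{
    /* Hypothetical example values. */
    const unsigned nb_tx_queues  = 2;
    const unsigned nb_tx_desc    = 512; /* per TX queue */
    const unsigned slave_tx_pkts = 8;   /* stand-in for BOND_MODE_8023AX_SLAVE_TX_PKTS */
    const unsigned cache_size    = 32;
    /* Assumed to mirror the mempool flush threshold: cache_size * 1.5. */
    const unsigned flush_thresh  = (unsigned)(cache_size * 1.5);
    unsigned total_tx_desc, q;

    /* Per TX queue, leave headroom for a full descriptor ring plus a
     * per-core cache filled up to its flush threshold. */
    total_tx_desc = slave_tx_pkts;
    for (q = 0; q < nb_tx_queues; q++)
        total_tx_desc += nb_tx_desc + flush_thresh;

    printf("LACPDU pool size: %u mbufs\n", total_tx_desc);
    return 0;
}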
> Only one, non-EAL thread dequeues from a per-slave, private (LACPDU) mempool.
> The thread calls mbuf-alloc in the context of an rte_eal_alarm_set()
> callback function, and enqueues the mbuf to a multi-consumer ring.
>
> The consumer(s) of the mbufs can be any core(s) that invoke the bond PMD's
> tx-burst function. As we know, mbufs can sit around on TX descriptor rings
> long after we transmit them, often until we need to reuse the descriptors for
> subsequent TX packets. When the TX cores finally free some of the mbufs, some
> of the mbufs get left in a pool's per-core cache. So, the maximum number of
> mbufs that may be lying around doing nothing is: the "done" mbufs in the TX
> desc ring and in the per-core cache, times the number of TX cores, or
> approximately nb_tx_queues * (nb_tx_desc + CACHE_FLUSHTHRESH(cache_size)). (I
> probably should also update the comment above the for-loop that calculates
> total_tx_desc.)
>
> When we include normal TX mbufs and private LACPDU mbufs displacing each
> other in the TX desc ring, there are many (if not infinite) possibilities. I
> wrote a program to simulate all this interaction, trying different values for
> cache_size, nb_tx_desc, etc., because it took a long time to run out of
> LACPDU mbufs (or it never happened) just using long-interval pings.
>
> Anyway, I hope that this is a nice, long way of saying that I will ignore
> your comment. ;-)
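(For context, the producer side described above looks roughly like the sketch
below: a periodic EAL alarm callback that allocates from the slave's private
pool and enqueues to a multi-consumer ring. The struct, names, and period are
hypothetical, not the actual bonding driver code, and an already-initialized
EAL plus an existing pool and ring are assumed.)

#include <errno.h>
#include <rte_alarm.h>
#include <rte_mbuf.h>
#include <rte_ring.h>

#define LACP_PERIOD_US (100 * 1000) /* hypothetical 100 ms period */

/* Hypothetical context; in the bonding PMD the pool and ring live in the
 * per-slave 802.3ad state. */
struct lacp_ctx {
    struct rte_mempool *lacpdu_pool; /* slave's private LACPDU pool */
    struct rte_ring *tx_ring;        /* MC ring drained by the tx-burst path */
};

/* Runs in the EAL interrupt thread, which owns no mempool cache. */
static void
lacp_periodic_cb(void *arg)
{
    struct lacp_ctx *ctx = arg;
    struct rte_mbuf *m;

    m = rte_pktmbuf_alloc(ctx->lacpdu_pool);
    if (m != NULL) {
        /* ... build the LACPDU in m here ... */
        if (rte_ring_enqueue(ctx->tx_ring, m) == -ENOBUFS)
            rte_pktmbuf_free(m); /* ring full: return the mbuf to the pool */
    }

    /* Re-arm so the periodic state machine keeps running. */
    rte_eal_alarm_set(LACP_PERIOD_US, lacp_periodic_cb, ctx);
}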
OK, thanks for the detailed explanation.

> Another idea, related to mempools (but not this LACP patch) ...
> Back when I first investigated the LACP pool problem, I got an idea related
> to the mempool per-core caches. It seems like a waste that in some usage
> patterns (like above), a core may "put" objects to a pool, but rarely/never
> "get" objects from the pool, the result being that it rarely/never reuses
> objects stuck in the per-core cache. We could recognize this kind of behavior
> by adding a small counter to the rte_mempool_cache struct. During a put, we
> increment the counter, and during a get, we set the counter to zero. If,
> during a put that is going to flush some of the objects to the global part of
> the pool, the counter is at or above some threshold (8, 16, 32, or whatever),
> then we flush ALL of the objects, since this core hasn't been using them.
> What do you think?

While I understand the need, it looks a bit complicated to me.

When I read your first patch, I was thinking about automatically increasing
the size of the pool in mempool_create(), according to the number of cores
and the size of the cache. But it would have too many drawbacks:

- waste of memory in many cases, especially for large objects
- it would not take the external caches (a new 16.07 feature) into account
- behavior change
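(As a closing illustration, the put/get counter heuristic described above
could look something like the toy model below. The structure and names are
simplified stand-ins, not the real rte_mempool_cache fields, and the
thresholds are just the example values mentioned in the thread; it only shows
that a core which keeps putting without getting eventually flushes its whole
cache back to the common pool.)

#include <stdio.h>
#include <string.h>

#define CACHE_SIZE     32
#define FLUSH_THRESH   48 /* roughly 1.5 * CACHE_SIZE, as assumed earlier */
#define PUT_STREAK_MAX 16 /* flush everything after this many puts with no gets */

/* Simplified stand-in for a per-core cache; not the real rte_mempool_cache. */
struct toy_cache {
    unsigned len;        /* objects currently cached */
    unsigned put_streak; /* consecutive puts since the last get */
    void *objs[FLUSH_THRESH];
};

static unsigned returned_to_pool; /* objects pushed back to the "global" pool */

static void
toy_cache_put(struct toy_cache *c, void *obj)
{
    c->objs[c->len++] = obj;
    c->put_streak++;

    if (c->len >= FLUSH_THRESH) {
        /* Normally only the excess above CACHE_SIZE would be flushed; if
         * this core has not allocated for a while, flush everything. */
        unsigned keep = (c->put_streak >= PUT_STREAK_MAX) ? 0 : CACHE_SIZE;

        returned_to_pool += c->len - keep;
        memmove(c->objs, c->objs + (c->len - keep), keep * sizeof(void *));
        c->len = keep;
    }
}

static void *
toy_cache_get(struct toy_cache *c)
{
    c->put_streak = 0; /* the core is allocating again */
    return c->len ? c->objs[--c->len] : NULL; /* global-pool refill omitted */
}

int main(void)
{
    struct toy_cache c = {0};
    int dummy;
    int i;

    /* A core that only ever frees: after enough consecutive puts, its cache
     * stops hoarding objects and returns them all to the global pool. */
    for (i = 0; i < 200; i++)
        toy_cache_put(&c, &dummy);

    printf("cached=%u returned=%u\n", c.len, returned_to_pool);
    (void)toy_cache_get(&c);
    return 0;
}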