Hi Robert,
On 08/23/2016 10:01 PM, Sanford, Robert wrote:
> Hi Olivier,
>
> On 8/23/16, 11:09 AM, "Olivier MATZ" <olivier.matz at 6wind.com> wrote:
>
> Hi Robert,
>
> On 08/01/2016 10:42 PM, Robert Sanford wrote:
> > The following log message may appear after a slave is idle (or nearly
> > idle) for a few minutes: "PMD: Failed to allocate LACP packet from
> > pool".
> >
> > Problem: All mbufs from a slave's private pool (used exclusively for
> > transmitting LACPDUs) have been allocated and are still sitting in
> > the device's tx descriptor ring and other cores' mempool caches.
> >
> > Solution: Ensure that each slave's tx (LACPDU) mempool owns more than
> > n-tx-queues * (n-tx-descriptors + per-core-mempool-flush-threshold)
> > mbufs.
> >
> > Note that the LACP tx machine function is the only code that allocates
> > from a slave's private pool. It runs in the context of the interrupt
> > thread, and thus it has no mempool cache of its own.
> >
> > Signed-off-by: Robert Sanford <rsanford at akamai.com>
> > ---
> > drivers/net/bonding/rte_eth_bond_8023ad.c | 10 +++++++---
> > 1 files changed, 7 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/net/bonding/rte_eth_bond_8023ad.c b/drivers/net/bonding/rte_eth_bond_8023ad.c
> > index 2f7ae70..1207896 100644
> > --- a/drivers/net/bonding/rte_eth_bond_8023ad.c
> > +++ b/drivers/net/bonding/rte_eth_bond_8023ad.c
> > @@ -854,6 +854,8 @@ bond_mode_8023ad_activate_slave(struct rte_eth_dev *bond_dev, uint8_t slave_id)
> >          char mem_name[RTE_ETH_NAME_MAX_LEN];
> >          int socket_id;
> >          unsigned element_size;
> > +        unsigned cache_size;
> > +        unsigned cache_flushthresh;
> >          uint32_t total_tx_desc;
> >          struct bond_tx_queue *bd_tx_q;
> >          uint16_t q_id;
> > @@ -890,19 +892,21 @@ bond_mode_8023ad_activate_slave(struct rte_eth_dev *bond_dev, uint8_t slave_id)
> >
> >          element_size = sizeof(struct slow_protocol_frame) + sizeof(struct rte_mbuf)
> >                          + RTE_PKTMBUF_HEADROOM;
> > +        cache_size = RTE_MEMPOOL_CACHE_MAX_SIZE >= 32 ?
> > +                32 : RTE_MEMPOOL_CACHE_MAX_SIZE;
> > +        cache_flushthresh = RTE_MEMPOOL_CALC_CACHE_FLUSHTHRESH(cache_size);
> >
> >          /* The size of the mempool should be at least:
> >           * the sum of the TX descriptors + BOND_MODE_8023AX_SLAVE_TX_PKTS */
> >          total_tx_desc = BOND_MODE_8023AX_SLAVE_TX_PKTS;
> >          for (q_id = 0; q_id < bond_dev->data->nb_tx_queues; q_id++) {
> >                  bd_tx_q = (struct bond_tx_queue*)bond_dev->data->tx_queues[q_id];
> > -                total_tx_desc += bd_tx_q->nb_tx_desc;
> > +                total_tx_desc += bd_tx_q->nb_tx_desc + cache_flushthresh;
> >          }
> >
> >          snprintf(mem_name, RTE_DIM(mem_name), "slave_port%u_pool", slave_id);
> >          port->mbuf_pool = rte_mempool_create(mem_name,
> > -                total_tx_desc, element_size,
> > -                RTE_MEMPOOL_CACHE_MAX_SIZE >= 32 ? 32 : RTE_MEMPOOL_CACHE_MAX_SIZE,
> > +                total_tx_desc, element_size, cache_size,
> >                  sizeof(struct rte_pktmbuf_pool_private), rte_pktmbuf_pool_init,
> >                  NULL, rte_pktmbuf_init, NULL, socket_id, MEMPOOL_F_NO_SPREAD);
> >
> >
>
> I'm not very familiar with bonding code, so maybe your patch is correct.
> I think the size of the mempool should be:
>
> BOND_MODE_8023AX_SLAVE_TX_PKTS +
> n_cores * RTE_MEMPOOL_CALC_CACHE_FLUSHTHRESH(cache_size)
>
> With n_cores = number of cores that can dequeue from the mempool.
>
> The safest thing to do would be to have n_cores = RTE_MAX_LCORE. I don't
> know if bond_dev->data->nb_tx_queues corresponds to this definition, if
> yes you can ignore my comment ;)
>
>
> Regards,
> Olivier
>
> --
>
> Only one, non-EAL thread dequeues from a per-slave, private (LACPDU) mempool.
> The thread calls mbuf-alloc in the context of an rte_eal_alarm_set( )
> callback function, and enqueues the mbuf to a multi-consumer ring.
>
> The consumer(s) of the mbufs can be any core(s) that invoke the bond PMD's
> tx-burst function. As we know, mbufs can sit around on TX descriptor rings
> long after we transmit them, often until we need to reuse the descriptors for
> subsequent TX packets. When the TX cores finally free some of the mbufs, some
> of the mbufs get left in a pool's per-core cache. So, the maximum number of
> mbufs that may be lying around doing nothing is: the "done" mbufs in the TX
> desc ring and in the per-core cache, times the number of TX cores, or
> approximately nb_tx_queues * (nb_tx_desc + CACHE_FLUSHTHRESH(cache_size)). (I
> probably should also update the comment above the for-loop that calculates
> total_tx_desc.)
>
> When we include normal TX mbufs and private LACPDU mbufs displacing each
> other in the TX desc ring, there are many (if not infinite) possibilities. I
> wrote a program to simulate all this interaction, trying different values for
> cache_size, nb_tx_desc, etc., because it took a long time to run out of
> LACPDU mbufs (or it never happened) just using long-interval pings.
>
> Anyway, I hope that this is a nice, long way of saying that I will ignore
> your comment. ;-)
OK, thanks for the detailed explanation.
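Just to put numbers on the worst case you describe: with, say, cache_size = 32
(so a flush threshold of about 1.5 * 32 = 48, if I remember the multiplier
correctly), 4 TX queues and 512 descriptors per queue, the sizing from your
patch works out to something like the sketch below (figures are only an
illustration, not taken from a real configuration):

        /* each TX core can strand up to nb_tx_desc mbufs in its descriptor
         * ring plus up to the flush threshold in its per-core cache */
        total_tx_desc = BOND_MODE_8023AX_SLAVE_TX_PKTS; /* for the LACP tx machine */
        total_tx_desc += 4 * (512 + 48);                /* 2240 extra mbufs */

so the LACP tx machine can still allocate even when every ring and every
per-core cache is full of "done" LACPDU mbufs at the same time.
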
> Another idea, related to mempools (but not this LACP patch) ...
> Back when I first investigated the LACP pool problem, I got an idea related
> to the mempool per-core caches. It seems like a waste that in some usage
> patterns (like above), a core may "put" objects to a pool, but rarely/never
> "get" objects from the pool, the result being that it rarely/never reuses
> objects stuck in the per-core cache. We could recognize this kind of behavior
> by adding a small counter to the rte_mempool_cache struct. During a put, we
> increment the counter, and during a get, we set the counter to zero. If,
> during a put that is going to flush some of the objects to the global part of
> the pool, the counter is at or above some threshold (8, 16, 32, or whatever),
> then we flush ALL of the objects, since this core hasn't been using them.
> What do you think?
While I understand the need, it looks a bit complicated to me.
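If I understand the proposal correctly, the generic put path would become
something like the sketch below (untested and simplified; the put_streak
field and the threshold of 16 are invented for the example):

        /* hypothetical field added to struct rte_mempool_cache: uint32_t put_streak; */
        static inline void
        sketch_cache_put(struct rte_mempool *mp, struct rte_mempool_cache *cache,
                         void * const *obj_table, unsigned n)
        {
                rte_memcpy(&cache->objs[cache->len], obj_table, sizeof(void *) * n);
                cache->len += n;
                cache->put_streak++;    /* the get path would reset this to 0 */

                if (cache->len >= cache->flushthresh) {
                        /* a core that only puts flushes everything;
                         * otherwise keep cache->size objects, as today */
                        uint32_t keep = (cache->put_streak >= 16) ? 0 : cache->size;

                        rte_mempool_ops_enqueue_bulk(mp, &cache->objs[keep],
                                                     cache->len - keep);
                        cache->len = keep;
                }
        }
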
When I read your first patch, I was thinking about automatically
increasing the size of the pool in mempool_create(), according to the
number of cores and the size of the cache. But it would have too many
drawbacks:
- waste of memory in many cases, especially for large objects
- would not take the external caches (new 16.07 feature) into account
- behavior change
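
For reference, what I had in mind was more or less the one-liner below in
rte_mempool_create(), reusing the same flush threshold macro as your patch
(just a sketch, not something I intend to submit given the drawbacks above):

        /* pad the pool so that objects stranded in per-core caches can
         * never exhaust it; clearly wasteful for large objects */
        if (cache_size > 0)
                n += RTE_MAX_LCORE * RTE_MEMPOOL_CALC_CACHE_FLUSHTHRESH(cache_size);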