Hi Robert,
On 08/23/2016 10:01 PM, Sanford, Robert wrote:
> Hi Olivier,
>
> On 8/23/16, 11:09 AM, "Olivier MATZ" <olivier.matz at 6wind.com> wrote:
>
> Hi Robert,
>
> On 08/01/2016 10:42 PM, Robert Sanford wrote:
> > The following log message may appear after a slave is idle (or nearly
> > idle) for a few minutes: "PMD: Failed to allocate LACP packet from
> > pool".
> >
> > Problem: All mbufs from a slave's private pool (used exclusively for
> > transmitting LACPDUs) have been allocated and are still sitting in
> > the device's tx descriptor ring and other cores' mempool caches.
> >
> > Solution: Ensure that each slave's tx (LACPDU) mempool owns more than
> > n-tx-queues * (n-tx-descriptors + per-core-mempool-flush-threshold)
> > mbufs.
> >
> > Note that the LACP tx machine function is the only code that allocates
> > from a slave's private pool. It runs in the context of the interrupt
> > thread, and thus it has no mempool cache of its own.
> >
> > Signed-off-by: Robert Sanford <rsanford at akamai.com>
> > ---
> > drivers/net/bonding/rte_eth_bond_8023ad.c | 10 +++++++---
> > 1 files changed, 7 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/net/bonding/rte_eth_bond_8023ad.c b/drivers/net/bonding/rte_eth_bond_8023ad.c
> > index 2f7ae70..1207896 100644
> > --- a/drivers/net/bonding/rte_eth_bond_8023ad.c
> > +++ b/drivers/net/bonding/rte_eth_bond_8023ad.c
> > @@ -854,6 +854,8 @@ bond_mode_8023ad_activate_slave(struct rte_eth_dev *bond_dev, uint8_t slave_id)
> >          char mem_name[RTE_ETH_NAME_MAX_LEN];
> >          int socket_id;
> >          unsigned element_size;
> > +        unsigned cache_size;
> > +        unsigned cache_flushthresh;
> >          uint32_t total_tx_desc;
> >          struct bond_tx_queue *bd_tx_q;
> >          uint16_t q_id;
> > @@ -890,19 +892,21 @@ bond_mode_8023ad_activate_slave(struct rte_eth_dev *bond_dev, uint8_t slave_id)
> >
> >          element_size = sizeof(struct slow_protocol_frame) + sizeof(struct rte_mbuf)
> >                          + RTE_PKTMBUF_HEADROOM;
> > +        cache_size = RTE_MEMPOOL_CACHE_MAX_SIZE >= 32 ?
> > +                32 : RTE_MEMPOOL_CACHE_MAX_SIZE;
> > +        cache_flushthresh = RTE_MEMPOOL_CALC_CACHE_FLUSHTHRESH(cache_size);
> >
> >          /* The size of the mempool should be at least:
> >           * the sum of the TX descriptors + BOND_MODE_8023AX_SLAVE_TX_PKTS */
> >          total_tx_desc = BOND_MODE_8023AX_SLAVE_TX_PKTS;
> >          for (q_id = 0; q_id < bond_dev->data->nb_tx_queues; q_id++) {
> >                  bd_tx_q = (struct bond_tx_queue*)bond_dev->data->tx_queues[q_id];
> > -                total_tx_desc += bd_tx_q->nb_tx_desc;
> > +                total_tx_desc += bd_tx_q->nb_tx_desc + cache_flushthresh;
> >          }
> >
> >          snprintf(mem_name, RTE_DIM(mem_name), "slave_port%u_pool", slave_id);
> >          port->mbuf_pool = rte_mempool_create(mem_name,
> > -                total_tx_desc, element_size,
> > -                RTE_MEMPOOL_CACHE_MAX_SIZE >= 32 ? 32 : RTE_MEMPOOL_CACHE_MAX_SIZE,
> > +                total_tx_desc, element_size, cache_size,
> >                  sizeof(struct rte_pktmbuf_pool_private), rte_pktmbuf_pool_init,
> >                  NULL, rte_pktmbuf_init, NULL, socket_id, MEMPOOL_F_NO_SPREAD);
> >
> >
>
> I'm not very familiar with bonding code, so maybe your patch is correct.
> I think the size of the mempool should be:
>
> BOND_MODE_8023AX_SLAVE_TX_PKTS +
> n_cores * RTE_MEMPOOL_CALC_CACHE_FLUSHTHRESH(cache_size)
>
> With n_cores = number of cores that can dequeue from the mempool.
>
> The safest thing to do would be to have n_cores = RTE_MAX_LCORE. I don't
> know if bond_dev->data->nb_tx_queues corresponds to this definition, if
> yes you can ignore my comment ;)
>
>
> Regards,
> Olivier
>
> --
>
> Only one, non-EAL thread dequeues from a per-slave, private (LACPDU) mempool.
> The thread calls mbuf-alloc in the context of an rte_eal_alarm_set( )
> callback function, and enqueues the mbuf to a multi-consumer ring.
>
> The consumer(s) of the mbufs can be any core(s) that invoke the bond PMD's
> tx-burst function. As we know, mbufs can sit around on TX descriptor rings
> long after we transmit them, often until we need to reuse the descriptors for
> subsequent TX packets. When the TX cores finally free some of the mbufs, some
> of the mbufs get left in a pool's per-core cache. So, the maximum number of
> mbufs that may be lying around doing nothing is: the "done" mbufs in the TX
> desc ring and in the per-core cache, times the number of TX cores, or
> approximately nb_tx_queues * (nb_tx_desc + CACHE_FLUSHTHRESH(cache_size)). (I
> probably should also update the comment above the for-loop that calculates
> total_tx_desc.)
>
> When we include normal TX mbufs and private LACPDU mbufs displacing each
> other in the TX desc ring, there are many (if not infinite) possibilities. I
> wrote a program to simulate all this interaction, trying different values for
> cache_size, nb_tx_desc, etc., because it took a long time to run out of
> LACPDU mbufs (or it never happened) just using long-interval pings.
>
> Anyway, I hope that this is a nice, long way of saying that I will ignore
> your comment. ;-)
OK, thanks for the detailed explanation.
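Just to put numbers on the worst case you describe: with, say, cache_size = 32
(so a flush threshold of about 1.5 * 32 = 48, if I remember the multiplier
correctly), 4 TX queues and 512 descriptors per queue, the sizing from your
patch works out to something like the sketch below (figures are only an
illustration, not taken from a real configuration):

        /* each TX core can strand up to nb_tx_desc mbufs in its descriptor
         * ring plus up to the flush threshold in its per-core cache */
        total_tx_desc = BOND_MODE_8023AX_SLAVE_TX_PKTS; /* for the LACP tx machine */
        total_tx_desc += 4 * (512 + 48);                /* 2240 extra mbufs */

so the LACP tx machine can still allocate even when every ring and every
per-core cache is full of "done" LACPDU mbufs at the same time.
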
> Another idea, related to mempools (but not this LACP patch) ...
> Back when I first investigated the LACP pool problem, I got an idea related
> to the mempool per-core caches. It seems like a waste that in some usage
> patterns (like above), a core may "put" objects to a pool, but rarely/never
> "get" objects from the pool, the result being that it rarely/never reuses
> objects stuck in the per-core cache. We could recognize this kind of behavior
> by adding a small counter to the rte_mempool_cache struct. During a put, we
> increment the counter, and during a get, we set the counter to zero. If,
> during a put that is going to flush some of the objects to the global part of
> the pool, the counter is at or above some threshold (8, 16, 32, or whatever),
> then we flush ALL of the objects, since this core hasn't been using them.
> What do you think?
While I understand the need, it looks a bit complicated to me.
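If I understand the proposal correctly, the generic put path would become
something like the sketch below (untested and simplified; the put_streak
field and the threshold of 16 are invented for the example):

        /* hypothetical field added to struct rte_mempool_cache: uint32_t put_streak; */
        static inline void
        sketch_cache_put(struct rte_mempool *mp, struct rte_mempool_cache *cache,
                         void * const *obj_table, unsigned n)
        {
                rte_memcpy(&cache->objs[cache->len], obj_table, sizeof(void *) * n);
                cache->len += n;
                cache->put_streak++;    /* the get path would reset this to 0 */

                if (cache->len >= cache->flushthresh) {
                        /* a core that only puts flushes everything;
                         * otherwise keep cache->size objects, as today */
                        uint32_t keep = (cache->put_streak >= 16) ? 0 : cache->size;

                        rte_mempool_ops_enqueue_bulk(mp, &cache->objs[keep],
                                                     cache->len - keep);
                        cache->len = keep;
                }
        }
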
When I read your first patch, I was thinking about automatically
increasing the size of the pool in mempool_create(), according to the
number of cores and the size of the cache. But it would have too many
drawbacks:
- waste of memory in many cases, especially for large objects
- would not take the external caches (new 16.07 feature) into account
- behavior change
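
For reference, what I had in mind was more or less the one-liner below in
rte_mempool_create(), reusing the same flush threshold macro as your patch
(just a sketch, not something I intend to submit given the drawbacks above):

        /* pad the pool so that objects stranded in per-core caches can
         * never exhaust it; clearly wasteful for large objects */
        if (cache_size > 0)
                n += RTE_MAX_LCORE * RTE_MEMPOOL_CALC_CACHE_FLUSHTHRESH(cache_size);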