Some more observations

Regards,

Pradeep

> -----Original Message-----    
> From: ovs-discuss-boun...@openvswitch.org [mailto:ovs-discuss-
> boun...@openvswitch.org] On Behalf Of Venkatesan Pradeep
> Sent: Friday, January 26, 2018 11:04 PM
> To: Jan Scheurich <jan.scheur...@ericsson.com>; Stokes, Ian
> <ian.sto...@intel.com>; ovs-discuss@openvswitch.org
> Cc: Ilya Maximets (i.maxim...@samsung.com) <i.maxim...@samsung.com>;
> Flavio Leitner <f...@redhat.com>
> Subject: Re: [ovs-discuss] Mempool issue for OVS 2.9
> 
> Response marked [Pradeep]
> 
> Thanks,
> 
> Pradeep
> 
> -----Original Message-----
> From: Jan Scheurich
> Sent: Friday, January 26, 2018 10:26 PM
> To: Stokes, Ian <ian.sto...@intel.com>; ovs-discuss@openvswitch.org
> Cc: Kevin Traynor <ktray...@redhat.com>; Flavio Leitner <f...@redhat.com>;
> Ilya Maximets (i.maxim...@samsung.com) <i.maxim...@samsung.com>;
> Loftus, Ciara <ciara.lof...@intel.com>; Kavanagh, Mark B
> <mark.b.kavan...@intel.com>; Ben Pfaff (b...@ovn.org) <b...@ovn.org>;
> acon...@redhat.com; Venkatesan Pradeep
> <venkatesan.prad...@ericsson.com>
> Subject: RE: Mempool issue for OVS 2.9
> 
> > -----Original Message-----
> > From: Stokes, Ian [mailto:ian.sto...@intel.com]
> > Sent: Friday, 26 January, 2018 13:01
> > To: ovs-discuss@openvswitch.org
> > Cc: Kevin Traynor <ktray...@redhat.com>; Flavio Leitner
> > <f...@redhat.com>; Ilya Maximets (i.maxim...@samsung.com)
> > <i.maxim...@samsung.com>; Loftus, Ciara <ciara.lof...@intel.com>;
> > Kavanagh, Mark B <mark.b.kavan...@intel.com>; Jan Scheurich
> > <jan.scheur...@ericsson.com>; Ben Pfaff (b...@ovn.org) <b...@ovn.org>;
> > acon...@redhat.com; Venkatesan Pradeep
> > <venkatesan.prad...@ericsson.com>
> > Subject: Mempool issue for OVS 2.9
> >
> > Hi All,
> >
> > Recently an issue was raised regarding the move from a single shared
> > mempool model that was in place up to OVS 2.8, to a mempool per port
> > model introduced in 2.9.
> >
> > https://mail.openvswitch.org/pipermail/ovs-discuss/2018-January/046021.html
> >
> > The per port mempool model was introduced on September 5th with commit
> > d555d9bded to allow fine-grained control of memory usage on a per-port basis.
> >
> > In the 'common/shared mempool' model, ports sharing a similar MTU would
> > all share the same buffer mempool (the most common example being that all
> > ports are by default created with a 1500B MTU and as such share the same
> > mbuf mempool).
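For readers not familiar with the old scheme: the sharing was essentially a
lookup keyed on (NUMA socket, MTU), as I understand it. A simplified sketch of
the idea, with illustrative names and sizes rather than the exact OVS code:

#include <stdio.h>
#include <stdlib.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>

/* Simplified illustration of the pre-2.9 shared model: one mempool per
 * (NUMA socket, MTU) pair, reused by every port that matches. */
struct shared_mp {
    struct shared_mp *next;
    int socket_id;
    int mtu;
    int refcount;
    struct rte_mempool *mp;
};

static struct shared_mp *shared_mp_list;

static struct rte_mempool *
shared_mp_get(int socket_id, int mtu, unsigned n_mbufs, uint16_t mbuf_size)
{
    struct shared_mp *smp;
    char name[RTE_MEMPOOL_NAMESIZE];

    for (smp = shared_mp_list; smp; smp = smp->next) {
        if (smp->socket_id == socket_id && smp->mtu == mtu) {
            smp->refcount++;                  /* Reuse the existing pool. */
            return smp->mp;
        }
    }

    smp = calloc(1, sizeof *smp);
    if (!smp) {
        return NULL;
    }
    snprintf(name, sizeof name, "ovs_mp_%d_%d", socket_id, mtu);
    smp->socket_id = socket_id;
    smp->mtu = mtu;
    smp->refcount = 1;
    smp->mp = rte_pktmbuf_pool_create(name, n_mbufs, 256 /* cache */,
                                      0 /* priv */, mbuf_size, socket_id);
    if (!smp->mp) {
        free(smp);
        return NULL;
    }
    smp->next = shared_mp_list;
    shared_mp_list = smp;
    return smp->mp;
}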
> >
> > This approach had some drawbacks, however. For example, with the shared
> > memory pool model a user could exhaust the shared memory pool (for
> > instance by requesting a large number of RXQs for a port), which would
> > cause the vSwitch to crash as any remaining ports would not have the
> > required memory to function. This bug was discovered and reported to the
> > community in late 2016:
> > https://mail.openvswitch.org/pipermail/ovs-discuss/2016-September/042560.html
> >
> > The per port mempool patch aimed to avoid such issues by allocating a
> > separate buffer mempool to each port.
> >
> > An issue has been flagged on ovs-discuss, whereby memory dimensions
> > provided for a given number of ports on OvS 2.6-2.8 may be
> > insufficient to support the same number of ports in OvS 2.9 under the
> > per-port mempool model, unless extra memory is provisioned. The effect
> > of this is use case dependent (number of ports, RXQs, MTU settings,
> > number of PMDs, etc.). The previous 'common-pool' model was rudimentary
> > in estimating the mempool size and was marked as something to be
> > improved upon. The memory allocation calculation for the per-port model
> > was modified to take the configuration factors mentioned above into
> > account.
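For reference, the per-port estimate in 2.9 is, as I read the mempool code in
lib/netdev-dpdk.c, roughly a full ring's worth of mbufs per rxq and per txq,
plus a burst per polling core, plus a fixed floor. A simplified sketch
(constants and rounding approximated, not the exact code):

/* Rough shape of the 2.9 per-port mbuf estimate (simplified; see
 * lib/netdev-dpdk.c for the real calculation). */
#define SKETCH_MAX_BURST  32      /* NETDEV_MAX_BURST */
#define SKETCH_MIN_MBUFS  16384   /* fixed floor added to every pool */

static unsigned
per_port_mbuf_estimate(unsigned n_rxq, unsigned rxq_size,
                       unsigned n_txq, unsigned txq_size,
                       unsigned n_polling_cores)
{
    return n_rxq * rxq_size                      /* fill the rx rings */
           + n_txq * txq_size                    /* worst-case full tx rings */
           + n_polling_cores * SKETCH_MAX_BURST  /* in-flight bursts */
           + SKETCH_MIN_MBUFS;                   /* headroom/corner cases */
}

With default ring sizes (2048) and a handful of queues this is already roughly
20k+ mbufs per port before a single packet has been forwarded.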
> >
> > It's unfortunate that this came to light so close to the release code
> > freeze - but better late than never, as it is a valid problem to be
> > resolved.
> >
> > I wanted to highlight some options to the community as I don't think
> > the next steps should be taken in isolation due to the impact this
> > feature has.
> >
> > There are a number of possibilities for the 2.9 release.
> >
> > (i) Revert the mempool per port patches and return to the shared
> > mempool model. There are a number of features and refactorings in place
> > on top of the change, so this will not be a simple revert. I'm currently
> > investigating what exactly is involved.
> 
> The shared mempool concept has been working fairly well in our NFVI systems
> for a long time. One can argue that it is too simplistic (as it does not take
> the number of ports and queue lengths into account at all) and may be
> insufficient in specific scenarios, but at least it can operate with a
> reasonable amount of memory.
> 
> Out of the very many vhostuser ports, only very few actually carry a
> significant amount of traffic. Reserving the theoretical worst-case amount
> of buffers for every port separately is complete overkill.
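To put rough numbers on that: with a 1500B MTU each mbuf takes roughly 3 KB of
hugepage memory once headroom, the mbuf struct and mempool overhead are
included, and the per-port estimate sketched earlier lands above 20k mbufs per
port, i.e. something like 60-70 MB per port. A hundred vhostuser ports would
then consume several GB of hugepages for mempools alone, whereas a single
shared pool of, say, 256k mbufs stays under 1 GB. (Back-of-envelope figures
only, not measurements.)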
> 
> > (ii) Leave the per port mempool implementation as is and flag to users
> > that memory requirements have increased. Extra memory may have to be
> > provided on a per-use-case basis.
> 
> [Jan] From my point of view this is a blocker for roll-out of OVS 2.9 in
> existing NFVI cloud systems. On these systems the amount of huge pages
> allocated to OVS is fixed by zero-day configuration and there is no easy way
> to change that memory allocation as part of a SW upgrade procedure.
> 
> As such servers host a significant number of vhostuser ports and rx queues
> (on the order of 100 or more), the new per-port mempool scheme will likely
> cause OVS to fail after a SW upgrade.
> 
> > (iii) Reduce the amount of memory allocated per mempool per port. An
> > RFC to this effect was submitted by Kevin, but on follow-up the feeling
> > is that it does not resolve the issue adequately.
> 
> If we can't get the shared mempool back into 2.9 from day one, this might be
> an emergency measure to support a reasonable number of ports with typically
> deployed huge-page assignments.
> 
> [Pradeep] The per-port partitioning scheme uses a port's queue configuration
> to decide how many mbufs to allocate. That is fine so long as the buffers are
> consumed by that port only. However, as discussed on the original thread,
> mbufs of one port may end up sitting on the queues of other ports, and if we
> were to account for that the estimate would bloat up. Also, since the
> calculation would then depend on the configuration of other ports, the
> requirements could increase long after the port is created. This defeats the
> original intent behind the per-port allocation scheme.
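To make that cross-port point concrete, here is a minimal DPDK-level
illustration (not OVS code) of how a buffer from port A's pool ends up parked
on port B's txq:

#include <rte_ethdev.h>
#include <rte_mbuf.h>

void
forward_once(uint16_t port_a, uint16_t port_b)
{
    struct rte_mbuf *pkts[32];
    /* Buffers come from the mempool attached to port A's rx queue. */
    uint16_t n = rte_eth_rx_burst(port_a, 0, pkts, 32);

    /* After this call the same mbufs sit on port B's tx ring.  They are
     * returned to port A's pool only when B's driver cleans its ring,
     * which some drivers do lazily (in batches, on later tx calls). */
    uint16_t sent = rte_eth_tx_burst(port_b, 0, pkts, n);

    /* Anything the tx ring could not accept is still ours to free. */
    for (uint16_t i = sent; i < n; i++) {
        rte_pktmbuf_free(pkts[i]);
    }
}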
> 
> If retaining the new scheme is the only option for 2.9, perhaps we can
> consider the following, based on what was discussed in the other thread:
> - the worst-case dimensioning assumes that tx queues will be full. Unless
> the queues are drained slowly compared to the enqueue rate (due to flow
> control, mismatched speeds, etc.) the queue occupancy is likely to be low.
> Instead of using txq_size (default: 2048) we could use a lower value to
> calculate the number of mbufs allocated and used by the port.

[Pradeep] After looking at the driver code, it looks like this may not be
possible. Some of the drivers seem to take a fairly relaxed and lazy approach
to freeing mbufs on the txq.

This brings me to a related question. When a mempool is freed as a result of a
port deletion or a port configuration change, and mbufs from that pool are
sitting on the txqs of other ports, how is it ensured that those mbufs are
freed before the mempool itself is freed? I couldn't find any code that does
this, but if this is indeed not done, why isn't it required? I don't think it
was done even with the shared pool model either, but there the situation would
arise only if the last reference to the pool went away, and that case may be
less common.
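For what it's worth, DPDK does provide pieces that could be used to wait for
outstanding mbufs before releasing a pool. A sketch of one possible approach
(illustrative only; as noted above I could not find OVS doing anything like
this, and a real version would need a timeout and handling for drivers that
don't support tx cleanup):

#include <rte_ethdev.h>
#include <rte_mempool.h>
#include <rte_pause.h>

/* Wait until every mbuf has returned to 'mp' before freeing it, nudging the
 * given ports to clean their tx rings.  Sketch only: rte_eth_tx_done_cleanup()
 * is optional per driver (-ENOTSUP), only queue 0 is cleaned here, and
 * per-lcore mempool caches are ignored. */
static void
mempool_free_when_quiescent(struct rte_mempool *mp,
                            const uint16_t *port_ids, uint16_t n_ports)
{
    while (!rte_mempool_full(mp)) {
        for (uint16_t i = 0; i < n_ports; i++) {
            rte_eth_tx_done_cleanup(port_ids[i], 0 /* queue */, 0 /* all */);
        }
        rte_pause();
    }
    rte_mempool_free(mp);
}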

> - since zero-copy for vhostuser ports is not yet enabled (or at least not by
> default) and we always end up copying packets, the occupancy is effectively
> only 1. We could allocate a lower number of mbufs for vhostuser ports.
> 
> To account for stolen buffers (i.e. mbufs placed on other ports' queues) we
> can add a fixed value. Since the number of dpdk ports would be much lower
> compared to the number of vhostuser ports, and since zero-copy is not enabled
> for vhostuser ports, this fixed value need not be very high.
> 
> I don't think this will necessarily address all use cases, but if the queue
> occupancy and the fixed value (based on the max expected number of dpdk
> ports/queues) are chosen properly it should, hopefully, cover most common
> deployments.
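A rough sketch of how that sizing could look, with placeholder names and
numbers purely to show the shape of the calculation (not a proposed patch):

#include <stdbool.h>

/* Assume tx queues are mostly drained, give vhostuser ports no txq term
 * since packets are copied, and add a fixed allowance for mbufs parked on
 * other ports' queues. */
#define SKETCH_TXQ_OCCUPANCY  128   /* assumed typical occupancy, not 2048 */
#define SKETCH_STOLEN_MBUFS  4096   /* allowance for buffers on other ports */

static unsigned
proposed_mbuf_estimate(bool is_vhost, unsigned n_rxq, unsigned rxq_size,
                       unsigned n_txq)
{
    /* Buffers for this port's own rx rings plus the fixed allowance. */
    unsigned n = n_rxq * rxq_size + SKETCH_STOLEN_MBUFS;

    if (!is_vhost) {
        /* Physical dpdk ports can hold on to buffers on their tx rings,
         * but typically far fewer than a full ring's worth. */
        n += n_txq * SKETCH_TXQ_OCCUPANCY;
    }
    /* Without zero-copy, vhostuser tx copies into the guest's virtio ring
     * and frees the mbuf immediately, so no txq term is added for it. */
    return n;
}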
> 
> > (iv) Introduce a feature to allow users to configure the mempool as
> > shared or on a per-port basis. This would be the best of both worlds,
> > but given the proximity to the 2.9 freeze I don't think it's feasible
> > by the end of January.
> 
> Yes, too late now. But it is something that should be backported to 2.9 as
> soon as we have it on master.
> 
> I think we should aim for the combination of adaptive shared mempools as the
> default and explicitly configurable per-port mempools where needed.
> 
> Regards, Jan
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
