Re: [ovs-discuss] [ovs-dev] mbuf pool sizing

2018-01-24 Thread Venkatesan Pradeep
Hi Kevin,

My primary concern is that someone upgrading to OVS2.9 may find that 
configurations that were previously working fine no longer do, because the 
memory dimensioned for OVS may no longer be sufficient. It could be argued 
that, since the shared mempool method allocates a fixed number of buffers, it 
may not be enough in all cases, but the fact remains that existing deployments 
that are working just fine may have issues after upgrading, and that needs to 
be addressed.

Even with the per-port allocation scheme we can only have a rough estimate. RxQ 
buffer sizing is adequate, but TxQ buffer sizing is not, for the following 
reasons:

1) The estimate should consider the possibility of packets from one port 
being stuck on all other ports' txqs, so the *worst case* TxQ buffer sizing 
for stolen packets should really be the sum of (dev->requested_n_txq * 
dev->requested_txq_size) over every other port (see the sketch after this 
list). This will bloat up the pool size. Also, when a new port is added or an 
existing port's queue attributes are changed, every other port's mempool has 
to be resized, and that may fail. A high value for MIN_NB_MBUF is likely 
helping to cover the shortfall.
2) Currently, in the case of tx to vhostuser queues, packets are always 
copied, so in the above calculation we need to consider only physical dpdk 
ports. I haven't looked closely at the proposed zero-copy change, but I assume 
that if it is enabled we would have to take the queue size of vhostuser ports 
into account as well.
3) For cloned packets, (dev->requested_n_txq * dev->requested_txq_size) would 
suffice.
4) Tx batching would add a bit more to the estimate.
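
To make point 1 concrete, here is a minimal C sketch (not OVS code; the struct 
name and port array are hypothetical stand-ins for the netdev-dpdk structures) 
of what a worst-case per-port estimate would look like, summing the txq 
capacity of every other port and skipping vhostuser ports unless zero-copy is 
enabled:

    #include <stdint.h>
    #include <stdbool.h>

    #define NETDEV_MAX_BURST 32          /* per-rx-burst headroom, as in OVS */
    #define MIN_NB_MBUF      (4096 * 4)  /* current lower limit */

    struct port_cfg {                    /* hypothetical stand-in */
        uint32_t n_rxq, rxq_size;
        uint32_t n_txq, txq_size;
        bool is_vhost;                   /* tx to vhost copies the packet */
    };

    uint32_t
    worst_case_n_mbufs(const struct port_cfg *ports, int n_ports,
                       int self, bool vhost_zero_copy)
    {
        uint32_t n = ports[self].n_rxq * ports[self].rxq_size
                     + ports[self].n_rxq * NETDEV_MAX_BURST
                     + MIN_NB_MBUF;

        /* Worst case: this port's mbufs sit on every other port's txqs.
         * Vhost txqs only hold our mbufs if zero-copy is in use. */
        for (int i = 0; i < n_ports; i++) {
            if (i != self && (!ports[i].is_vhost || vhost_zero_copy)) {
                n += ports[i].n_txq * ports[i].txq_size;
            }
        }
        return n;
    }

Sizing every port's pool this way multiplies quickly: adding one port grows 
the estimate for all the others.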

That said, unless the TxQs are drained slowly in comparison to the rate at 
which packets are enqueued, the queue occupancy may never be high enough to 
justify the worst-case allocation estimate, and a lot of memory will be wasted.

Shared mempools do have an advantage, since they allow more efficient sharing 
of the mbufs, but yes, a one-size-fits-all approach won't work in all cases. 
Even when different MTUs are involved, if the values are close enough the 
associated ports will share a memory pool, and we may only need a small 
number of memory pools. Perhaps making the size configurable, or even having 
the pools grow dynamically when usage runs high, would be something to consider?
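
As an illustration of the MTU-based sharing mentioned above, rounding each 
port's MTU up to a bucket boundary would let ports with close MTUs land on the 
same pool; a rough sketch (hypothetical bucket size and function name, not the 
actual OVS implementation):

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical granularity: ports whose MTUs round up to the same
     * multiple of MTU_BUCKET would share one mempool. */
    #define MTU_BUCKET 1024

    static uint32_t
    mempool_mtu_bucket(uint32_t mtu)
    {
        return (mtu + MTU_BUCKET - 1) / MTU_BUCKET * MTU_BUCKET;
    }

    int main(void)
    {
        /* 1500 and 1600 both map to the 2048 bucket and would share a
         * pool; 9000 maps to 9216 and gets its own. */
        printf("%u %u %u\n", mempool_mtu_bucket(1500),
               mempool_mtu_bucket(1600), mempool_mtu_bucket(9000));
        return 0;
    }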

Regards,

Pradeep


-----Original Message-----
From: Kevin Traynor [mailto:ktray...@redhat.com]
Sent: Wednesday, January 24, 2018 12:15 AM
To: Venkatesan Pradeep; ovs-...@openvswitch.org; ovs-discuss@openvswitch.org; 
Robert Wojciechowicz; Ian Stokes; Ilya Maximets; Kavanagh, Mark B
Subject: Re: [ovs-dev] mbuf pool sizing

On 01/23/2018 11:42 AM, Kevin Traynor wrote:
> On 01/17/2018 07:48 PM, Venkatesan Pradeep wrote:
>> Hi,
>>
>> Assuming that all ports use the same MTU, in OVS2.8 and earlier, a 
>> single mempool of 256K buffers (MAX_NB_MBUF = 4096 * 64) will be 
>> created and shared by all the ports.
>>
>> With the OVS2.9 mempool patches, we have port specific allocation and the 
>> number of mbufs created for each port is based on the following formula 
>> (with a lower limit of MIN_NB_MBUF = 4096*4)
>>n_mbufs = dev->requested_n_rxq * dev->requested_rxq_size
>>   + dev->requested_n_txq * dev->requested_txq_size
>>   + MIN(RTE_MAX_LCORE, dev->requested_n_rxq) * NETDEV_MAX_BURST
>>   + MIN_NB_MBUF;
>>
>> Using the minimal value (1) for n_rxq and n_txq and the default value (2048) for 
>> requested_rxq_size and requested_txq_size, the above translates to
>>   n_mbufs = 1*2048 + 1*2048 + 1*32 + 4096*4  = 20512
>>
>> Assuming all ports have the same MTU, this means that approximately 13 ports 
>> in OVS2.9 will consume as much memory as the single mempool shared by all 
>> ports in OVS2.8 (256*1024 / 20512) .
>>
>> When a node is upgraded from OVS2.8 to OVS2.9 it is quite possible that the 
>> memory set aside for OVS may be insufficient. I'm not sure if this aspect 
>> has been discussed previously and wanted to bring this up for discussion.
>>
> 
> Hi Pradeep, I don't think it has been discussed. I guess the thinking 
> was that with a giant shared mempool, it was over-provisioning when 
> there were a few ports, and in the case where there were a lot of ports 
> there could be some starvation at run time. It also meant that if you had a 
> mix of different MTUs you had multiple giant shared mempools and 
> could run out of memory very quickly at config or run time as well.
> 
> So I can see the argument for having a mempool per port, as it is more 
> fine-grained, and if you are going to run short of memory, it will at 
> least be at config time. The problem is that if you give some 
> over-provisioning to each port and you have a lot of ports, you hit the 
> situation you are seeing.
> 
> I think some amount of over-provisioning per port is needed because you 
> don't want to be cutting it so fine that you run into memory issues at 
> run time, e.g. local mbuf caches on cores running out, or even if 
> someone used dpdk rings to send the mbufs somewhere else for a time. 
> There may be other corner cases too. Perhaps as a compromise the min 
> size could be reduced from 4096*4 to 4096*2 or 4096.
> 
> Thoughts?
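
For concreteness, the per-port sizing formula quoted above can be checked with 
a standalone C sketch (the constants are stand-ins for the DPDK/OVS values; 
RTE_MAX_LCORE is taken as 128 purely for illustration):

    #include <stdio.h>
    #include <stdint.h>

    #define RTE_MAX_LCORE    128         /* illustrative build-time DPDK value */
    #define NETDEV_MAX_BURST 32
    #define MIN_NB_MBUF      (4096 * 4)
    #define MIN(a, b)        ((a) < (b) ? (a) : (b))

    int main(void)
    {
        uint32_t n_rxq = 1, rxq_size = 2048;   /* minimal port config */
        uint32_t n_txq = 1, txq_size = 2048;

        uint32_t n_mbufs = n_rxq * rxq_size
                           + n_txq * txq_size
                           + MIN(RTE_MAX_LCORE, n_rxq) * NETDEV_MAX_BURST
                           + MIN_NB_MBUF;

        /* Prints 20512; 256 * 1024 / 20512 is roughly 12.8, i.e. about
         * 13 such ports consume as much as the old shared 256K pool. */
        printf("n_mbufs = %u\n", n_mbufs);
        return 0;
    }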

Re: [ovs-discuss] [ovs-dev] mbuf pool sizing

2018-01-23 Thread Kevin Traynor
On 01/23/2018 11:42 AM, Kevin Traynor wrote:
> On 01/17/2018 07:48 PM, Venkatesan Pradeep wrote:
>> Hi,
>>
>> Assuming that all ports use the same MTU, in OVS2.8 and earlier, a single 
>> mempool of 256K buffers (MAX_NB_MBUF = 4096 * 64) will be created and shared 
>> by all the ports.
>>
>> With the OVS2.9 mempool patches, we have port specific allocation and the 
>> number of mbufs created for each port is based on the following formula 
>> (with a lower limit of MIN_NB_MBUF = 4096*4)
>>n_mbufs = dev->requested_n_rxq * dev->requested_rxq_size
>>   + dev->requested_n_txq * dev->requested_txq_size
>>   + MIN(RTE_MAX_LCORE, dev->requested_n_rxq) * NETDEV_MAX_BURST
>>   + MIN_NB_MBUF;
>>
>> Using the minimal value (1) for n_rxq and n_txq and the default value (2048) for 
>> requested_rxq_size and requested_txq_size, the above translates to
>>   n_mbufs = 1*2048 + 1*2048 + 1*32 + 4096*4  = 20512
>>
>> Assuming all ports have the same MTU, this means that approximately 13 ports 
>> in OVS2.9 will consume as much memory as the single mempool shared by all 
>> ports in OVS2.8 (256*1024 / 20512) .
>>
>> When a node is upgraded from OVS2.8 to OVS2.9 it is quite possible that the 
>> memory set aside for OVS may be insufficient. I'm not sure if this aspect 
>> has been discussed previously and wanted to bring this up for discussion.
>>
> 
> Hi Pradeep, I don't think it has been discussed. I guess the thinking
> was that with a giant shared mempool, it was over-provisioning when
> there were a few ports, and in the case where there were a lot of ports
> there could be some starvation at run time. It also meant that if you had a
> mix of different MTUs you had multiple giant shared mempools and could
> run out of memory very quickly at config or run time as well.
> 
> So I can see the argument for having a mempool per port, as it is more
> fine-grained, and if you are going to run short of memory, it will at
> least be at config time. The problem is that if you give some
> over-provisioning to each port and you have a lot of ports, you hit the
> situation you are seeing.
> 
> I think some amount of over-provisioning per port is needed because you
> don't want to be cutting it so fine that you run into memory issues at
> run time, e.g. local mbuf caches on cores running out, or even if
> someone used dpdk rings to send the mbufs somewhere else for a time.
> There may be other corner cases too. Perhaps as a compromise the min
> size could be reduced from 4096*4 to 4096*2 or 4096.
> 
> Thoughts?
> 

I just sent a compile-tested-only RFC here:
https://mail.openvswitch.org/pipermail/ovs-dev/2018-January/343581.html

> Kevin.
> 
>> Regards,
>>
>> Pradeep



Re: [ovs-discuss] [ovs-dev] mbuf pool sizing

2018-01-23 Thread Kevin Traynor
On 01/17/2018 07:48 PM, Venkatesan Pradeep wrote:
> Hi,
> 
> Assuming that all ports use the same MTU, in OVS2.8 and earlier, a single 
> mempool of 256K buffers (MAX_NB_MBUF = 4096 * 64) will be created and shared 
> by all the ports.
> 
> With the OVS2.9 mempool patches, we have port specific allocation and the 
> number of mbufs created for each port is based on the following formula (with 
> a lower limit of MIN_NB_MBUF = 4096*4)
>n_mbufs = dev->requested_n_rxq * dev->requested_rxq_size
>   + dev->requested_n_txq * dev->requested_txq_size
>   + MIN(RTE_MAX_LCORE, dev->requested_n_rxq) * NETDEV_MAX_BURST
>   + MIN_NB_MBUF;
> 
> Using the minimal value (1) for n_rxq and n_txq and the default value (2048) for 
> requested_rxq_size and requested_txq_size, the above translates to
>   n_mbufs = 1*2048 + 1*2048 + 1*32 + 4096*4  = 20512
> 
> Assuming all ports have the same MTU, this means that approximately 13 ports 
> in OVS2.9 will consume as much memory as the single mempool shared by all 
> ports in OVS2.8 (256*1024 / 20512) .
> 
> When a node is upgraded from OVS2.8 to OVS2.9 it is quite possible that the 
> memory set aside for OVS may be insufficient. I'm not sure if this aspect has 
> been discussed previously and wanted to bring this up for discussion.
> 

Hi Pradeep, I don't think it has been discussed. I guess the thinking
was that with a giant shared mempool, it was over-provisioning when
there were a few ports, and in the case where there were a lot of ports
there could be some starvation at run time. It also meant that if you had a
mix of different MTUs you had multiple giant shared mempools and could
run out of memory very quickly at config or run time as well.

So I can see the argument for having a mempool per port, as it is more
fine-grained, and if you are going to run short of memory, it will at
least be at config time. The problem is that if you give some
over-provisioning to each port and you have a lot of ports, you hit the
situation you are seeing.

I think some amount of over-provisioning per port is needed because you
don't want to be cutting it so fine that you run into memory issues at
run time, e.g. local mbuf caches on cores running out, or even if
someone used dpdk rings to send the mbufs somewhere else for a time.
There may be other corner cases too. Perhaps as a compromise the min
size could be reduced from 4096*4 to 4096*2 or 4096.
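
(For scale, taking the minimal one-rxq/one-txq example from earlier in the
thread, the candidate minimums work out as:

    n_mbufs = 2048 + 2048 + 32 + 4096*4 = 20512   -> ~13 ports per 256K mbufs
    n_mbufs = 2048 + 2048 + 32 + 4096*2 = 12320   -> ~21 ports per 256K mbufs
    n_mbufs = 2048 + 2048 + 32 + 4096   =  8224   -> ~32 ports per 256K mbufs

so the per-port footprint shrinks accordingly.)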

Thoughts?

Kevin.

> Regards,
> 
> Pradeep
> 
