Thanks @joshw...@google.com, @Maxime Coquelin for the inputs.
@Maxime Coquelin
I did code bisecting and, through testpmd runs, was able to pinpoint that this issue appears from DPDK 21.11 onwards; up to DPDK 21.08 the issue is not seen.
To recap the issue: the amount of traffic actually serviced by the hypervisor remains almost the same between the two versions (implying the underlying GCP hypervisor can only handle that much), but with >=dpdk-21.11 the virtio PMD pushes almost 20x more traffic than dpdk-21.08. This enormous traffic rate leads to high packet-drop rates, since the underlying hypervisor can at most handle the same load it was servicing with <=dpdk-21.08.
The same pattern can be seen even if we run traffic for a longer
duration.
Example:
Testpmd traffic run (packet size = 1518) for the exact same 15-second interval:
>=21.11 DPDK version
---------------------- Forward statistics for port 0 ----------------------
RX-packets: 2         RX-dropped: 0              RX-total: 2
TX-packets: 19497570  TX-dropped: 364674686      TX-total: 384172256
----------------------------------------------------------------------------
Up to 21.08 DPDK version
---------------------- Forward statistics for port 0 ----------------------
RX-packets: 3         RX-dropped: 0              RX-total: 3
TX-packets: 19480319  TX-dropped: 0              TX-total: 19480319
----------------------------------------------------------------------------
As you can see:
>=dpdk-21.11: packets generated: ~384 million, packets serviced: ~19.5 million, Tx-dropped: ~364 million
<=dpdk-21.08: packets generated: ~19.5 million, packets serviced: ~19.5 million, Tx-dropped: 0
==========================================================================
@Maxime Coquelin
I have gone through all the commits made by the virtio team between DPDK 21.08 and DPDK 21.11, as per the commit log available at https://git.dpdk.org/dpdk/log/drivers/net/virtio
I even tried undoing all the potentially relevant commits I could think of on a dpdk-21.11 workspace and then re-running testpmd, in order to track down which commit introduced this regression, but with no luck.
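A scripted bisect between the two releases might converge faster than reverting commits by hand. A rough sketch, assuming the usual meson/ninja build and a hypothetical test_tx_drops.sh wrapper that runs the 15-second testpmd flow above and exits non-zero when TX-dropped is non-zero:

# v21.11 is known bad, v21.08 known good; limit the bisect to the virtio PMD
git bisect start v21.11 v21.08 -- drivers/net/virtio
git bisect run sh -c '
    rm -rf build && meson setup build && ninja -C build &&
    ./test_tx_drops.sh   # hypothetical wrapper around the testpmd run above
'

If the offending change turns out to live outside drivers/net/virtio, dropping the pathspec and bisecting the whole tree would be the fallback.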
I need your inputs: if you could glance through the commits made between these releases and let us know whether there is any particular commit of interest that you think could cause the behavior seen above (or any relevant commit not captured in the above git link; perhaps a change outside the virtio PMD code?), it would help a lot.
Thanks,
Mukul
On Mon, Dec 9, 2024 at 9:54 PM Joshua Washington <joshw...@google.com> wrote:
Hello,
Based on your VM shape (8-vCPU VM) and packet size (1518B packets), what you are seeing is exactly expected. 8-vCPU Gen 2 VMs have a default egress cap of 16 Gbps. This equates to roughly 1.3 Mpps when using 1518B packets, including IFG. Over the course of 15 seconds, 19.5 million packets should be sent, which matches both cases. The difference here seems to be in what happens in DPDK, not GCP. I don't believe that packet drops on the host NIC are captured in DPDK stats; likely the descriptor ring just filled up because the egress bandwidth cap was hit and queue servicing was throttled. This would cause a TX burst to return fewer packets than the burst size. The difference between 20.05 and 22.11 might have to do with this reporting, or with a change in testpmd logic for when to send new bursts of traffic.
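A rough sanity check of those numbers, assuming the usual 20 bytes of preamble + inter-frame gap per frame on the wire:

16 Gbps / ((1518 B + 20 B) * 8 bits/B) ≈ 1.30 Mpps
1.30 Mpps * 15 s                       ≈ 19.5 M packets  (matches TX-packets in both runs)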
Best,
Josh
On Mon, Dec 9, 2024, 07:39 Mukul Sinha <mukul.si...@broadcom.com> wrote:
GCP-dev team
@jeroe...@google.com @rush...@google.com @joshw...@google.com
Can you please check the following email and get back?
On Fri, Dec 6, 2024 at 2:04 AM Mukul Sinha <mukul.si...@broadcom.com> wrote:
Hi GCP & Virtio-PMD dev teams,
We are from the VMware NSX Advanced Load Balancer team. In GCP cloud (custom-8-8192 VM instance type, 8 cores / 8 GB) we are triaging a TCP-profile application throughput issue: with a single dispatcher core and a single Rx/Tx queue (queue depth 2048), the throughput we get with the dpdk-22.11 virtio PMD is significantly degraded compared to the dpdk-20.05 PMD.
We see the Tx packet drop counter on the virtio NIC incrementing heavily, pointing to the GCP hypervisor side being unable to drain the packets fast enough (no drops are seen on the Rx side).
The behavior is as follows:
Using dpdk-22.11: already at ~75% CPU usage we start seeing a huge number of Tx packet drops reported (no Rx drops), causing TCP retransmissions and eventually bringing down the effective throughput numbers.
Using dpdk-20.05: even at ~95% CPU usage, without any packet drops (neither Rx nor Tx), we are able to get much better throughput.
To improve the dpdk-22.11 numbers we tried increasing the queue depth to 4096, but that didn't help. If, with dpdk-22.11, we move from a single core with Rx/Tx queue=1 to a single core with Rx/Tx queue=2, we get slightly better numbers (but still not matching the numbers obtained with dpdk-20.05, single core, Rx/Tx queue=1). This again corroborates that the GCP hypervisor is the bottleneck here.
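Roughly the testpmd equivalents of these two experiments (core mask and paths are from our setup, so treat the exact arguments as an approximation):

# deeper rings, still one queue pair:
./app/dpdk-testpmd -l 0-1 -n 1 -- -i --nb-cores=1 --txd=4096 --rxd=4096 --rxq=1 --txq=1 --portmask=0x3
# two Rx/Tx queue pairs on the same single forwarding core:
./app/dpdk-testpmd -l 0-1 -n 1 -- -i --nb-cores=1 --txd=2048 --rxd=2048 --rxq=2 --txq=2 --portmask=0x3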
To root-cause this issue, we were able to replicate the behavior using native DPDK testpmd, with the commands used shown below.

# hugepage size: 2 MB
./app/dpdk-testpmd -l 0-1 -n 1 -- -i --nb-cores=1 --txd=2048 --rxd=2048 --rxq=1 --txq=1 --portmask=0x3

# testpmd console commands (the later "set fwd flowgen" overrides "set fwd mac"):
set fwd mac
set fwd flowgen
set txpkts 1518
start
stop
Testpmd traffic run (packet size = 1518) for the exact same 15-second interval:
22.11
---------------------- Forward statistics for port 0 ----------------------
RX-packets: 2         RX-dropped: 0              RX-total: 2
TX-packets: 19497570  TX-dropped: 364674686      TX-total: 384172256
----------------------------------------------------------------------------
20.05
---------------------- Forward statistics for port 0 ----------------------
RX-packets: 3         RX-dropped: 0              RX-total: 3
TX-packets: 19480319  TX-dropped: 0              TX-total: 19480319
----------------------------------------------------------------------------
As you can see:
dpdk-22.11: packets generated: ~384 million, packets serviced: ~19.5 million, Tx-dropped: ~364 million
dpdk-20.05: packets generated: ~19.5 million, packets serviced: ~19.5 million, Tx-dropped: 0
The actual serviced traffic remains almost the same between the two versions (implying the underlying GCP hypervisor can only handle that much), but with dpdk-22.11 the PMD pushes almost 20x more traffic than dpdk-20.05.
The same pattern can be seen even if we run traffic for a
longer duration.
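In case it helps localize where the drops are being counted, we can also capture the port counters from the testpmd console in both versions (assuming the virtio PMD exposes extended stats on this platform); for example:

show port stats 0
show port xstats 0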
===============================================================================================
Following are our queries:
@ Virtio-dev team
1. Why, in dpdk-22.11 with the virtio PMD, is the testpmd application able to pump 20 times the Tx traffic towards the hypervisor compared to dpdk-20.05?
What has changed, either in the virtio PMD or in the communication between the virtio PMD and the underlying hypervisor, to cause this behavior?
The traffic actually serviced by the hypervisor remains almost on par with dpdk-20.05; it is the huge packet-drop count that can be detrimental to any DPDK application running a TCP traffic profile.
Is there a way to slow down the number of packets sent towards the hypervisor (through either a code change in the virtio PMD or a config setting) and make it on par with dpdk-20.05 performance?
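One knob we could try ourselves is shrinking the testpmd burst size, although we are not sure it would reproduce the dpdk-20.05 pacing; for example, from the testpmd console:

set burst 8    (default is 32; smaller bursts as a crude way to throttle Tx)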
2. In the published virtio performance report for release 22.11 we see no qualification of throughput numbers on GCP cloud. Do you have any internal performance benchmark numbers for GCP cloud, and if so, could you please share them with us so we can check whether there are any configs/knobs/settings you used to get optimum performance?
@ GCP-cloud dev team
As we can see, any traffic beyond what the GCP hypervisor can successfully service is being dropped, so we need help from your side to reproduce this issue in your in-house setup, preferably using the same VM instance type as highlighted above.
We need further investigation from the GCP host side to check parameters such as running out of Tx buffers, queue-full conditions for the virtio NIC, or the number of NIC Rx/Tx kernel threads, to understand what is preventing the hypervisor from keeping up with the traffic load pumped by dpdk-22.11.
Based on your debugging, we would additionally need inputs on what can be tweaked, or which knobs/settings can be configured at the GCP-VM level, to get better performance numbers.
Please feel free to reach out to us for any further queries.
Additional outputs for debugging:
lspci | grep Eth
00:06.0 Ethernet controller: Red Hat, Inc. Virtio network
device
root@dcg15-se-ecmyw:/home/admin/dpdk/build# ethtool -i eth0
driver: virtio_net
version: 1.0.0
firmware-version:
expansion-rom-version:
bus-info: 0000:00:06.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no
testpmd> show port info all
********************* Infos for port 0 *********************
MAC address: 42:01:0A:98:A0:0F
Device name: 0000:00:06.0
Driver name: net_virtio
Firmware-version: not available
Connect to socket: 0
memory allocation on the socket: 0
Link status: up
Link speed: Unknown
Link duplex: full-duplex
Autoneg status: On
MTU: 1500
Promiscuous mode: disabled
Allmulticast mode: disabled
Maximum number of MAC addresses: 64
Maximum number of MAC addresses of hash filtering: 0
VLAN offload:
strip off, filter off, extend off, qinq strip off
No RSS offload flow type is supported.
Minimum size of RX buffer: 64
Maximum configurable length of RX packet: 9728
Maximum configurable size of LRO aggregated packet: 0
Current number of RX queues: 1
Max possible RX queues: 2
Max possible number of RXDs per queue: 32768
Min possible number of RXDs per queue: 32
RXDs number alignment: 1
Current number of TX queues: 1
Max possible TX queues: 2
Max possible number of TXDs per queue: 32768
Min possible number of TXDs per queue: 32
TXDs number alignment: 1
Max segment number per packet: 65535
Max segment number per MTU/TSO: 65535
Device capabilities: 0x0( )
Device error handling mode: none