On Mon, Jun 17, 2024 at 5:15 PM Jiri Pirko <j...@resnulli.us> wrote:
>
> Fri, Jun 14, 2024 at 11:54:04AM CEST, kerneljasonx...@gmail.com wrote:
> >Hello Jiri,
> >
> >On Thu, Jun 13, 2024 at 1:08 AM Jiri Pirko <j...@resnulli.us> wrote:
> >>
> >> From: Jiri Pirko <j...@nvidia.com>
> >>
> >> Add support for Byte Queue Limits (BQL).
> >>
> >> Tested on qemu emulated virtio_net device with 1, 2 and 4 queues.
> >> Tested with fq_codel and pfifo_fast. Super netperf with 50 threads is
> >> running in background. Netperf TCP_RR results:
> >>
> >> NOBQL FQC 1q:  159.56  159.33  158.50  154.31    agv: 157.925
> >> NOBQL FQC 2q:  184.64  184.96  174.73  174.15    agv: 179.62
> >> NOBQL FQC 4q:  994.46  441.96  416.50  499.56    agv: 588.12
> >> NOBQL PFF 1q:  148.68  148.92  145.95  149.48    agv: 148.2575
> >> NOBQL PFF 2q:  171.86  171.20  170.42  169.42    agv: 170.725
> >> NOBQL PFF 4q: 1505.23 1137.23 2488.70 3507.99    agv: 2159.7875
> >>   BQL FQC 1q: 1332.80 1297.97 1351.41 1147.57    agv: 1282.4375
> >>   BQL FQC 2q:  768.30  817.72  864.43  974.40    agv: 856.2125
> >>   BQL FQC 4q:  945.66  942.68  878.51  822.82    agv: 897.4175
> >>   BQL PFF 1q:  149.69  151.49  149.40  147.47    agv: 149.5125
> >>   BQL PFF 2q: 2059.32  798.74 1844.12  381.80    agv: 1270.995
> >>   BQL PFF 4q: 1871.98 4420.02 4916.59 13268.16   agv: 6119.1875
> >
> >I cannot get such a huge improvement when I was doing multiple tests
> >between two VMs. I'm pretty sure the BQL feature is working, but the
> >numbers look the same with/without BQL.
> >
> >VM 1 (client):
> >16 cpus, x86_64, 4 queues, the latest net-next kernel with/without
> >this patch, pfifo_fast, napi_tx=true, napi_weight=128
> >
> >VM 2 (server):
> >16 cpus, aarch64, 4 queues, the latest net-next kernel without this
> >patch, pfifo_fast
> >
> >What the 'ping' command shows to me between two VMs is: rtt
> >min/avg/max/mdev = 0.233/0.257/0.300/0.024 ms
> >
> >I started 50 netperfs to communicate the other side with the following
> >command:
> >
> >#!/bin/bash
> >
> >for i in $(seq 5000 5050);
> >do
> >netperf -p $i -H [ip addr] -l 60 -t TCP_RR -- -r 64,64 > /dev/null 2>&1 &
> >done
> >
> >The results are around 30423.62 txkB/s. If I remove '-r 64 64', they
> >are still the same/similar.
>
> You have to stress the line by parallel TCP_STREAM instances (50 in my
> case). For consistent results, use -p portnum,locport to specify the
> local port.
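Below is a rough sketch of a run along those lines; the server address and
the local-port offset are placeholders, and it assumes netserver instances
are already listening on the control ports used.

#!/bin/bash
SERVER=192.0.2.1        # placeholder; replace with the server's address

# Keep the line saturated with 50 parallel TCP_STREAM instances
# (assumes a netserver listening on the default control port).
for i in $(seq 1 50); do
        netperf -H $SERVER -l 60 -t TCP_STREAM > /dev/null 2>&1 &
done

# TCP_RR instances; -p portnum,locport also pins the local control port
# (assumes netserver instances listening on ports 5000-5050, as in the
# original script above).
for i in $(seq 5000 5050); do
        netperf -p $i,$((i + 10000)) -H $SERVER -l 60 -t TCP_RR -- -r 64,64 > /dev/null 2>&1 &
done

wait

Spreading the control ports across 5000-5050 mirrors the original script,
so each netperf instance talks to its own netserver.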
Thanks. Even though the results of TCP_RR mode vary sometimes, I can see
a big improvement in the total value of those results under such
circumstances.

With BQL, the throughput is 2159.17.
Without BQL, it's 1099.33.

Please feel free to add the tag:
Tested-by: Jason Xing <kerneljasonx...@gmail.com>

Thanks,
Jason