Re: [PATCH net-next 0/3] basic busy polling support for vhost_net
On Sun, Nov 29, 2015 at 10:31:10PM -0500, David Miller wrote:
> From: Jason Wang
> Date: Wed, 25 Nov 2015 15:11:26 +0800
>
> > This series tries to add basic busy polling for vhost net. The idea is
> > simple: at the end of tx/rx processing, busy poll for newly added tx
> > descriptors and on the rx socket for a while. The maximum time (in us)
> > that can be spent on busy polling is specified via an ioctl.
> >
> > Test A was done with:
> >
> > - 50 us as busy loop timeout
> > - Netperf 2.6
> > - Two machines with back-to-back connected ixgbe
> > - Guest with 1 vcpu and 1 queue
> >
> > Results:
> > - For the stream workload, ioexits were reduced dramatically for
> >   medium tx sizes (1024-2048, at most -43%) and for almost all rx
> >   sizes (at most -84%) as a result of polling. This more or less
> >   compensates for the possibly wasted cpu cycles, which is probably
> >   why we can still see some increase in normalized throughput in
> >   some cases.
> > - Tx throughput increased (at most +50%) except for the huge write
> >   (16384), and we can send more packets in that case (+tpkts
> >   increased).
> > - Very minor rx regression in some cases.
> > - Improvement on TCP_RR (at most +17%).
>
> Michael are you going to take this?  It's touching vhost core as
> much as it is the vhost_net driver.

There's a minor bug there, but once it's fixed - I agree, it belongs
in the vhost tree.

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 0/3] basic busy polling support for vhost_net
From: Jason Wang
Date: Wed, 25 Nov 2015 15:11:26 +0800

> This series tries to add basic busy polling for vhost net. The idea is
> simple: at the end of tx/rx processing, busy poll for newly added tx
> descriptors and on the rx socket for a while. The maximum time (in us)
> that can be spent on busy polling is specified via an ioctl.
>
> Test A was done with:
>
> - 50 us as busy loop timeout
> - Netperf 2.6
> - Two machines with back-to-back connected ixgbe
> - Guest with 1 vcpu and 1 queue
>
> Results:
> - For the stream workload, ioexits were reduced dramatically for
>   medium tx sizes (1024-2048, at most -43%) and for almost all rx
>   sizes (at most -84%) as a result of polling. This more or less
>   compensates for the possibly wasted cpu cycles, which is probably
>   why we can still see some increase in normalized throughput in
>   some cases.
> - Tx throughput increased (at most +50%) except for the huge write
>   (16384), and we can send more packets in that case (+tpkts
>   increased).
> - Very minor rx regression in some cases.
> - Improvement on TCP_RR (at most +17%).

Michael are you going to take this?  It's touching vhost core as
much as it is the vhost_net driver.
[PATCH net-next 0/3] basic busy polling support for vhost_net
Hi all:

This series tries to add basic busy polling for vhost net. The idea is
simple: at the end of tx/rx processing, busy poll for newly added tx
descriptors and on the rx socket for a while. The maximum time (in us)
that can be spent on busy polling is specified via an ioctl.

Test A was done with:

- 50 us as busy loop timeout
- Netperf 2.6
- Two machines with back-to-back connected ixgbe
- Guest with 1 vcpu and 1 queue

Results:
- For the stream workload, ioexits were reduced dramatically for medium
  tx sizes (1024-2048, at most -43%) and for almost all rx sizes (at
  most -84%) as a result of polling. This more or less compensates for
  the possibly wasted cpu cycles, which is probably why we can still
  see some increase in normalized throughput in some cases.
- Tx throughput increased (at most +50%) except for the huge write
  (16384), and we can send more packets in that case (+tpkts
  increased).
- Very minor rx regression in some cases.
- Improvement on TCP_RR (at most +17%).

Guest TX:
size/session/+thu%/+normalize%/+tpkts%/+rpkts%/+ioexits%/
   64/ 1/ +18%/ -10%/  +7%/ +11%/   0%
   64/ 2/ +14%/ -13%/  +7%/ +10%/   0%
   64/ 4/  +8%/ -17%/  +7%/  +9%/   0%
   64/ 8/ +11%/ -15%/  +7%/ +10%/   0%
  256/ 1/ +35%/  +9%/ +21%/ +12%/ -11%
  256/ 2/ +26%/  +2%/ +20%/  +9%/ -10%
  256/ 4/ +23%/   0%/ +21%/ +10%/  -9%
  256/ 8/ +23%/   0%/ +21%/  +9%/  -9%
  512/ 1/ +31%/  +9%/ +23%/ +18%/ -12%
  512/ 2/ +30%/  +8%/ +24%/ +15%/ -10%
  512/ 4/ +26%/  +5%/ +24%/ +14%/ -11%
  512/ 8/ +32%/  +9%/ +23%/ +15%/ -11%
 1024/ 1/ +39%/ +16%/ +29%/ +22%/ -26%
 1024/ 2/ +35%/ +14%/ +30%/ +21%/ -22%
 1024/ 4/ +34%/ +13%/ +32%/ +21%/ -25%
 1024/ 8/ +36%/ +14%/ +32%/ +19%/ -26%
 2048/ 1/ +50%/ +27%/ +34%/ +26%/ -42%
 2048/ 2/ +43%/ +21%/ +36%/ +25%/ -43%
 2048/ 4/ +41%/ +20%/ +37%/ +27%/ -43%
 2048/ 8/ +40%/ +18%/ +35%/ +25%/ -42%
16384/ 1/   0%/ -12%/  -1%/  +8%/ +15%
16384/ 2/   0%/ -10%/  +1%/  +4%/  +5%
16384/ 4/   0%/ -11%/  -3%/   0%/  +3%
16384/ 8/   0%/ -10%/  -4%/   0%/  +1%

Guest RX:
size/session/+thu%/+normalize%/+tpkts%/+rpkts%/+ioexits%/
   64/ 1/  -2%/ -21%/  +1%/  +2%/ -75%
   64/ 2/  +1%/  -9%/ +12%/   0%/ -55%
   64/ 4/   0%/  -6%/  +5%/  -1%/ -44%
   64/ 8/  -5%/  -5%/  +7%/ -23%/ -50%
  256/ 1/  -8%/ -18%/ +16%/ +15%/ -63%
  256/ 2/   0%/  -8%/  +9%/  -2%/ -26%
  256/ 4/   0%/  -7%/  -8%/ +20%/ -41%
  256/ 8/  -8%/ -11%/  -9%/ -24%/ -78%
  512/ 1/  -6%/ -19%/ +20%/ +18%/ -29%
  512/ 2/   0%/ -10%/ -14%/  -8%/ -31%
  512/ 4/  -1%/  -5%/ -11%/  -9%/ -38%
  512/ 8/  -7%/  -9%/ -17%/ -22%/ -81%
 1024/ 1/   0%/ -16%/ +12%/  +9%/ -11%
 1024/ 2/   0%/ -11%/   0%/  +3%/ -30%
 1024/ 4/   0%/  -4%/  +2%/  +6%/ -15%
 1024/ 8/  -3%/  -4%/  -8%/  -8%/ -70%
 2048/ 1/  -8%/ -23%/ +36%/ +22%/ -11%
 2048/ 2/   0%/ -12%/  +1%/  +3%/ -29%
 2048/ 4/   0%/  -3%/ -17%/ -15%/ -84%
 2048/ 8/   0%/  -3%/  +1%/  -3%/ +10%
16384/ 1/   0%/ -11%/  +4%/  +7%/ -22%
16384/ 2/   0%/  -7%/  +4%/  +4%/ -33%
16384/ 4/   0%/  -2%/  -2%/  -4%/ -23%
16384/ 8/  -1%/  -2%/  +1%/ -22%/ -40%

TCP_RR:
size/session/+thu%/+normalize%/+tpkts%/+rpkts%/+ioexits%/
  1/   1/ +11%/ -26%/ +11%/ +11%/ +10%
  1/  25/ +11%/ -15%/ +11%/ +11%/   0%
  1/  50/  +9%/ -16%/ +10%/ +10%/   0%
  1/ 100/  +9%/ -15%/  +9%/  +9%/   0%
 64/   1/ +11%/ -31%/ +11%/ +11%/ +11%
 64/  25/ +12%/ -14%/ +12%/ +12%/   0%
 64/  50/ +11%/ -14%/ +12%/ +12%/   0%
 64/ 100/ +11%/ -15%/ +11%/ +11%/   0%
256/   1/ +11%/ -27%/ +11%/ +11%/ +10%
256/  25/ +17%/ -11%/ +16%/ +16%/  -1%
256/  50/ +16%/ -11%/ +17%/ +17%/  +1%
256/ 100/ +17%/ -11%/ +18%/ +18%/  +1%

Test B was done with:

- 50 us as busy loop timeout
- Netperf 2.6
- Two machines with back-to-back connected ixgbe
- Two guests, each with 1 vcpu and 1 queue
- The two vhost threads pinned to the same cpu on the host to simulate
  cpu contention

Results:
- Even in this radical case, we can still get at most a 14% improvement
  on TCP_RR.
- For the guest tx stream, minor improvement, with at most a 5%
  regression in the one byte case. For the guest rx stream, at most a
  5% regression was seen.

Guest TX:
size /-+%   /
1    /-5.55%/
64   /+1.11%/
256  /+2.33%/
512  /-0.03%/
1024 /+1.14%/
4096 /+0.00%/
16384/+0.00%/

Guest RX:
size /-+%   /
1    /-5.11%/
64   /-0.55%/
256  /-2.35%/
512  /-3.39%/
1024 /+6.8% /
4096 /-0.01%/
16384/+0.00%/

TCP_RR:
size /-+%    /
1    /+9.79% /
64   /+4.51% /
256  /+6.47% /
512  /-3.37% /
1024 /+6.15% /
4096 /+14.88%/
16384/-2.23% /

Changes from RFC V3:
- small tweak on the code to avoid mul