Re: [Qemu-devel] [QUESTION] How to reduce network latency to improve netperf TCP_RR drastically?
On 2019/6/10 11:55 PM, Michael S. Tsirkin wrote:
> On Tue, Jun 04, 2019 at 03:10:43PM +0800, Like Xu wrote:
>> Hi Michael,
>>
>> At https://www.linux-kvm.org/page/NetworkingTodo, there is an entry for
>> network latency saying:
>>
>> ---
>> reduce networking latency:
>>  allow handling short packets from softirq or VCPU context
>>  Plan:
>>    We are going through the scheduler 3 times
>>    (could be up to 5 if softirqd is involved)
>>    Consider RX: host irq -> io thread -> VCPU thread ->
>>    guest irq -> guest thread.
>>    This adds a lot of latency.
>>    We can cut it by some 1.5x if we do a bit of work
>>    either in the VCPU or softirq context.
>>  Testing: netperf TCP RR - should be improved drastically
>>           netperf TCP STREAM guest to host - no regression
>>  Contact: MST
>> ---
>>
>> I am trying to make some contributions to improving netperf TCP_RR.
>> Could you please share more ideas or plans or implemental details to make it
>> happen?
>>
>> Thanks,
>> Like Xu
>
> So some of this did happen. netif_receive_skb is now called
> directly from tun_get_user.
>
> Question is about the rx/tun_put_user path now.
>
> If the vhost thread is idle, there's a single packet
> outstanding then maybe we can forward the packet to userspace
> directly from BH without waking up the thread.
>
> For this to work we need to map some userspace memory into kernel
> ahead of the time. For example, maybe it can happen when
> guest adds RX buffers?
>
> Copying Jason who's looking into memory mapping matters.

Btw, I wonder whether it's time to bring the TODO wiki up to date.

Thanks
Re: [Qemu-devel] [QUESTION] How to reduce network latency to improve netperf TCP_RR drastically?
On 2019/6/10 11:55 PM, Michael S. Tsirkin wrote:
> On Tue, Jun 04, 2019 at 03:10:43PM +0800, Like Xu wrote:
>> Hi Michael,
>>
>> At https://www.linux-kvm.org/page/NetworkingTodo, there is an entry for
>> network latency saying:
>>
>> ---
>> reduce networking latency:
>>  allow handling short packets from softirq or VCPU context
>>  Plan:
>>    We are going through the scheduler 3 times
>>    (could be up to 5 if softirqd is involved)
>>    Consider RX: host irq -> io thread -> VCPU thread ->
>>    guest irq -> guest thread.
>>    This adds a lot of latency.
>>    We can cut it by some 1.5x if we do a bit of work
>>    either in the VCPU or softirq context.
>>  Testing: netperf TCP RR - should be improved drastically
>>           netperf TCP STREAM guest to host - no regression
>>  Contact: MST
>> ---
>>
>> I am trying to make some contributions to improving netperf TCP_RR.
>> Could you please share more ideas or plans or implemental details to make it
>> happen?
>>
>> Thanks,
>> Like Xu
>
> So some of this did happen. netif_receive_skb is now called
> directly from tun_get_user.
>
> Question is about the rx/tun_put_user path now.
>
> If the vhost thread is idle, there's a single packet
> outstanding then maybe we can forward the packet to userspace
> directly from BH without waking up the thread.

After the batch dequeue, it's pretty hard to determine from tun itself
whether any packet is still outstanding.

> For this to work we need to map some userspace memory into kernel
> ahead of the time. For example, maybe it can happen when
> guest adds RX buffers?
>
> Copying Jason who's looking into memory mapping matters.

We need to go over the RX queue and pin the pages, then use MMU
notifiers to unpin them if necessary. We also need to consider a way
to make this work with batch dequeue.

Thanks
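
As a rough illustration of the fast-path condition discussed above, here is a toy userspace model in Python. It is not kernel code and does not reflect how tun or vhost are actually implemented: the `RxQueue` class, its fields, and the `"direct"`/`"queued"` results are all hypothetical names made up for this sketch. It only shows the decision being proposed: deliver straight into a pre-mapped guest buffer when the worker is idle and nothing else is outstanding, otherwise queue the packet and wake the worker.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class RxQueue:
    """Toy stand-in for an RX queue served by a worker thread (hypothetical)."""
    worker_idle: bool = True
    # Guest RX buffers assumed to be mapped/pinned ahead of time.
    pinned_buffers: deque = field(default_factory=deque)
    # Packets waiting for the worker thread.
    pending: deque = field(default_factory=deque)
    wakeups: int = 0

    def add_rx_buffer(self, buf: bytearray) -> None:
        # Models "map the memory when the guest adds RX buffers".
        self.pinned_buffers.append(buf)

    def receive(self, packet: bytes) -> str:
        # Fast path only if the worker is idle, nothing is outstanding,
        # and a pre-mapped buffer large enough for the packet exists.
        can_fast_path = (
            self.worker_idle
            and not self.pending
            and len(self.pinned_buffers) > 0
            and len(packet) <= len(self.pinned_buffers[0])
        )
        if can_fast_path:
            buf = self.pinned_buffers.popleft()
            buf[:len(packet)] = packet  # copy without waking the worker
            return "direct"
        # Slow path: queue the packet and wake the worker if it was idle.
        self.pending.append(packet)
        if self.worker_idle:
            self.worker_idle = False
            self.wakeups += 1
        return "queued"
```

In this model a single short packet arriving at an idle queue never touches the scheduler, while anything arriving with work already outstanding takes the usual wakeup path. The hard part raised above, knowing reliably that nothing is outstanding once batch dequeue is in play, is deliberately reduced here to a simple `pending` check.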
Re: [Qemu-devel] [QUESTION] How to reduce network latency to improve netperf TCP_RR drastically?
On Tue, Jun 04, 2019 at 03:10:43PM +0800, Like Xu wrote:
> Hi Michael,
>
> At https://www.linux-kvm.org/page/NetworkingTodo, there is an entry for
> network latency saying:
>
> ---
> reduce networking latency:
>  allow handling short packets from softirq or VCPU context
>  Plan:
>    We are going through the scheduler 3 times
>    (could be up to 5 if softirqd is involved)
>    Consider RX: host irq -> io thread -> VCPU thread ->
>    guest irq -> guest thread.
>    This adds a lot of latency.
>    We can cut it by some 1.5x if we do a bit of work
>    either in the VCPU or softirq context.
>  Testing: netperf TCP RR - should be improved drastically
>           netperf TCP STREAM guest to host - no regression
>  Contact: MST
> ---
>
> I am trying to make some contributions to improving netperf TCP_RR.
> Could you please share more ideas or plans or implemental details to make it
> happen?
>
> Thanks,
> Like Xu

So some of this did happen. netif_receive_skb is now called
directly from tun_get_user.

Question is about the rx/tun_put_user path now.

If the vhost thread is idle, there's a single packet
outstanding then maybe we can forward the packet to userspace
directly from BH without waking up the thread.

For this to work we need to map some userspace memory into kernel
ahead of the time. For example, maybe it can happen when
guest adds RX buffers?

Copying Jason who's looking into memory mapping matters.

--
MST
[Qemu-devel] [QUESTION] How to reduce network latency to improve netperf TCP_RR drastically?
Hi Michael,

At https://www.linux-kvm.org/page/NetworkingTodo, there is an entry for
network latency saying:

---
reduce networking latency:
 allow handling short packets from softirq or VCPU context
 Plan:
   We are going through the scheduler 3 times
   (could be up to 5 if softirqd is involved)
   Consider RX: host irq -> io thread -> VCPU thread ->
   guest irq -> guest thread.
   This adds a lot of latency.
   We can cut it by some 1.5x if we do a bit of work
   either in the VCPU or softirq context.
 Testing: netperf TCP RR - should be improved drastically
          netperf TCP STREAM guest to host - no regression
 Contact: MST
---

I am trying to make some contributions to improving netperf TCP_RR.
Could you please share more ideas or plans or implemental details to make it
happen?

Thanks,
Like Xu