Re: [Qemu-devel] [QUESTION] How to reduce network latency to improve netperf TCP_RR drastically?

2019-06-11 Thread Jason Wang



On 2019/6/10 11:55 PM, Michael S. Tsirkin wrote:

> On Tue, Jun 04, 2019 at 03:10:43PM +0800, Like Xu wrote:
>> Hi Michael,
>>
>> At https://www.linux-kvm.org/page/NetworkingTodo, there is an entry for
>> network latency saying:
>>
>> ---
>> reduce networking latency:
>>  allow handling short packets from softirq or VCPU context
>>  Plan:
>>    We are going through the scheduler 3 times
>>    (could be up to 5 if softirqd is involved)
>>    Consider RX: host irq -> io thread -> VCPU thread ->
>>    guest irq -> guest thread.
>>    This adds a lot of latency.
>>    We can cut it by some 1.5x if we do a bit of work
>>    either in the VCPU or softirq context.
>>  Testing: netperf TCP RR - should be improved drastically
>>   netperf TCP STREAM guest to host - no regression
>>  Contact: MST
>> ---
>>
>> I am trying to make some contributions to improving netperf TCP_RR.
>> Could you please share more ideas, plans, or implementation details to
>> make it happen?
>>
>> Thanks,
>> Like Xu
>
> So some of this did happen. netif_receive_skb is now called
> directly from tun_get_user.
>
> The question is now about the rx/tun_put_user path.
>
> If the vhost thread is idle and there's a single packet
> outstanding, then maybe we can forward the packet to userspace
> directly from BH without waking up the thread.
>
> For this to work we need to map some userspace memory into kernel
> ahead of time. For example, maybe it can happen when
> guest adds RX buffers? Copying Jason who's looking into
> memory mapping matters.



Btw, I wonder if it's time to bring the TODO wiki up to date.

Thanks




Re: [Qemu-devel] [QUESTION] How to reduce network latency to improve netperf TCP_RR drastically?

2019-06-11 Thread Jason Wang



On 2019/6/10 11:55 PM, Michael S. Tsirkin wrote:

> On Tue, Jun 04, 2019 at 03:10:43PM +0800, Like Xu wrote:
>> Hi Michael,
>>
>> At https://www.linux-kvm.org/page/NetworkingTodo, there is an entry for
>> network latency saying:
>>
>> ---
>> reduce networking latency:
>>  allow handling short packets from softirq or VCPU context
>>  Plan:
>>    We are going through the scheduler 3 times
>>    (could be up to 5 if softirqd is involved)
>>    Consider RX: host irq -> io thread -> VCPU thread ->
>>    guest irq -> guest thread.
>>    This adds a lot of latency.
>>    We can cut it by some 1.5x if we do a bit of work
>>    either in the VCPU or softirq context.
>>  Testing: netperf TCP RR - should be improved drastically
>>   netperf TCP STREAM guest to host - no regression
>>  Contact: MST
>> ---
>>
>> I am trying to make some contributions to improving netperf TCP_RR.
>> Could you please share more ideas, plans, or implementation details to
>> make it happen?
>>
>> Thanks,
>> Like Xu
>
> So some of this did happen. netif_receive_skb is now called
> directly from tun_get_user.
>
> The question is now about the rx/tun_put_user path.
>
> If the vhost thread is idle and there's a single packet
> outstanding, then maybe we can forward the packet to userspace
> directly from BH without waking up the thread.



Since batch dequeue was introduced, it's pretty hard to determine just
from tun itself whether any packet is still outstanding.





> For this to work we need to map some userspace memory into kernel
> ahead of time. For example, maybe it can happen when
> guest adds RX buffers? Copying Jason who's looking into
> memory mapping matters.



We would need to go over the RX queue and pin the pages, then use MMU
notifiers to unpin them when necessary. We would also need to find a way
to make this work with batch dequeue.
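
The pin-then-unpin idea can be pictured with a toy model. This is a minimal sketch in Python, not kernel code: PinnedRxCache, pin_buffer, invalidate_range and can_fast_path are hypothetical names made up for illustration, and the 4096-byte page size is an assumption. It only shows the bookkeeping: pages backing RX buffers are pinned when the buffers are queued, an invalidation callback (standing in for an MMU notifier) drops them, and the direct path is allowed only while every page of a buffer is still pinned.

```python
PAGE_SIZE = 4096  # assumed page size for this toy model


def pages_covering(start, length):
    """Return the page numbers touched by bytes [start, start + length)."""
    if length <= 0:
        return range(0)
    return range(start // PAGE_SIZE, (start + length - 1) // PAGE_SIZE + 1)


class PinnedRxCache:
    """Hypothetical bookkeeping for pinned RX buffer pages (not a kernel API)."""

    def __init__(self):
        self.pinned = set()  # page numbers currently pinned

    def pin_buffer(self, addr, length):
        """Pin every page backing an RX buffer the guest has just queued."""
        self.pinned.update(pages_covering(addr, length))

    def invalidate_range(self, start, end):
        """Stand-in for an MMU notifier callback: unpin pages in [start, end)."""
        self.pinned.difference_update(pages_covering(start, end - start))

    def can_fast_path(self, addr, length):
        """The direct copy is only safe while all pages of the buffer are pinned."""
        pages = pages_covering(addr, length)
        return len(pages) > 0 and all(p in self.pinned for p in pages)
```

A buffer that loses even one page to an invalidation falls back to the normal path until it is pinned again; how that interacts with batch dequeue is left open here.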


Thanks




Re: [Qemu-devel] [QUESTION] How to reduce network latency to improve netperf TCP_RR drastically?

2019-06-10 Thread Michael S. Tsirkin
On Tue, Jun 04, 2019 at 03:10:43PM +0800, Like Xu wrote:
> Hi Michael,
> 
> At https://www.linux-kvm.org/page/NetworkingTodo, there is an entry for
> network latency saying:
> 
> ---
> reduce networking latency:
>  allow handling short packets from softirq or VCPU context
>  Plan:
>    We are going through the scheduler 3 times
>    (could be up to 5 if softirqd is involved)
>    Consider RX: host irq -> io thread -> VCPU thread ->
>    guest irq -> guest thread.
>    This adds a lot of latency.
>    We can cut it by some 1.5x if we do a bit of work
>    either in the VCPU or softirq context.
>  Testing: netperf TCP RR - should be improved drastically
>   netperf TCP STREAM guest to host - no regression
>  Contact: MST
> ---
> 
> I am trying to make some contributions to improving netperf TCP_RR.
> Could you please share more ideas, plans, or implementation details to
> make it happen?
> 
> Thanks,
> Like Xu


So some of this did happen. netif_receive_skb is now called
directly from tun_get_user.

The question is now about the rx/tun_put_user path.

If the vhost thread is idle and there's a single packet
outstanding, then maybe we can forward the packet to userspace
directly from BH without waking up the thread.

For this to work we need to map some userspace memory into kernel
ahead of time. For example, maybe it can happen when
guest adds RX buffers? Copying Jason who's looking into
memory mapping matters.
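
To make the decision concrete, here is a toy model of it. This is a minimal sketch in Python rather than kernel code; ToyRxPath, its fields and the premapped_buffers counter are hypothetical names for illustration and do not correspond to the tun or vhost code. It assumes the direct path is taken only when the worker is idle, nothing is queued, and a buffer was mapped ahead of time; in every other case the packet is queued and the worker is woken once.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyRxPath:
    """Hypothetical model of the RX decision; not the tun/vhost implementation."""
    premapped_buffers: int = 0   # buffers assumed to be mapped ahead of time
    worker_idle: bool = True     # stands in for "the vhost thread is idle"
    backlog: deque = field(default_factory=deque)
    delivered: list = field(default_factory=list)
    wakeups: int = 0

    def bh_receive(self, pkt):
        """Called from the simulated bottom half for each incoming packet."""
        if self.worker_idle and not self.backlog and self.premapped_buffers > 0:
            # This is the only outstanding packet: copy it directly, no wakeup.
            self.premapped_buffers -= 1
            self.delivered.append(pkt)
            return "direct"
        # Slow path: queue the packet and wake the worker if it is sleeping.
        self.backlog.append(pkt)
        if self.worker_idle:
            self.worker_idle = False
            self.wakeups += 1
        return "queued"

    def worker_run(self):
        """The simulated worker drains the backlog in order, then goes idle."""
        while self.backlog:
            self.delivered.append(self.backlog.popleft())
        self.worker_idle = True
```

In this model a request/response workload with one packet in flight never wakes the worker as long as buffers stay mapped, while a burst falls back to the queued path and keeps packet order.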

-- 
MST



[Qemu-devel] [QUESTION] How to reduce network latency to improve netperf TCP_RR drastically?

2019-06-04 Thread Like Xu

Hi Michael,

At https://www.linux-kvm.org/page/NetworkingTodo, there is an entry for 
network latency saying:


---
reduce networking latency:
 allow handling short packets from softirq or VCPU context
 Plan:
   We are going through the scheduler 3 times
   (could be up to 5 if softirqd is involved)
   Consider RX: host irq -> io thread -> VCPU thread ->
   guest irq -> guest thread.
   This adds a lot of latency.
   We can cut it by some 1.5x if we do a bit of work
   either in the VCPU or softirq context.
 Testing: netperf TCP RR - should be improved drastically
  netperf TCP STREAM guest to host - no regression
 Contact: MST
---

I am trying to make some contributions to improving netperf TCP_RR.
Could you please share more ideas, plans, or implementation details to
make it happen?
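
For context, a TCP_RR-style test measures back-to-back request/response transactions on one connection, so every extra scheduler hop shows up directly in the result. The following is a minimal sketch of that kind of measurement in Python over loopback, assuming 1-byte requests and responses; tcp_rr and echo_server are illustrative names, and this is not netperf itself nor a substitute for running it between guest and host.

```python
import socket
import threading
import time


def echo_server(listener):
    """Accept one connection and echo each 1-byte request back."""
    conn, _ = listener.accept()
    with conn:
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        while True:
            data = conn.recv(1)
            if not data:  # client closed the connection
                break
            conn.sendall(data)


def tcp_rr(transactions=1000):
    """Return (transactions per second, mean seconds per transaction)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # let the OS pick a free port
    listener.listen(1)
    server = threading.Thread(target=echo_server, args=(listener,), daemon=True)
    server.start()
    with socket.create_connection(listener.getsockname()) as client:
        client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        start = time.perf_counter()
        for _ in range(transactions):
            client.sendall(b"x")          # request
            if client.recv(1) != b"x":    # wait for the response
                raise RuntimeError("unexpected response")
        elapsed = time.perf_counter() - start
    server.join(timeout=1)
    listener.close()
    return transactions / elapsed, elapsed / transactions


if __name__ == "__main__":
    rate, latency = tcp_rr()
    print(f"{rate:.0f} transactions/s, {latency * 1e6:.1f} us per transaction")
```

Because only one transaction is in flight at a time, the rate is the inverse of the round-trip latency, which is why this kind of test is sensitive to wakeup overhead.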


Thanks,
Like Xu