Re: [RFC] virtio: orphan skbs if we're relying on timer to free them

2009-05-18 Thread David Miller
From: Rusty Russell 
Date: Mon, 18 May 2009 22:18:47 +0930

> We check for finished xmit skbs on every xmit, or on a timer (unless
> the host promises to force an interrupt when the xmit ring is empty).
> This can penalize userspace tasks which fill their sockbuf.  Not much
> difference with TSO, but measurable with large numbers of packets.
> 
> There are a finite number of packets which can be in the transmission
> queue.  We could fire the timer more than every 100ms, but that would
> just hurt performance for a corner case.  This seems neatest.
 ...
> Signed-off-by: Rusty Russell 

If this is so great for virtio it would also be a great idea
universally, but we don't do it.

What you're doing by orphaning is creating a situation where a single
UDP socket can loop doing sends and monopolize the TX queue of a
device.  The only control we have over a sender for fairness in
datagram protocols is that send buffer allocation.
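
For reference, skb_orphan() is exactly what gives up that control: it
runs the skb's destructor (normally sock_wfree()), which returns the
charged bytes to the sender's send buffer before the packet has actually
left the device.  A minimal sketch, close to the 2009-era
include/linux/skbuff.h (paraphrased, comments added):

	static inline void skb_orphan(struct sk_buff *skb)
	{
		/* The destructor (e.g. sock_wfree) uncharges skb->truesize
		 * from sk->sk_wmem_alloc and wakes any writer blocked on
		 * the socket's send buffer limit. */
		if (skb->destructor)
			skb->destructor(skb);
		skb->destructor = NULL;
		skb->sk = NULL;	/* no longer counted against any socket */
	}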

I'm guilty of doing this too in the NIU driver, also because there I
lack a "TX queue empty" interrupt, and orphaning keeps TCP sockets
from getting stuck.

I think we need a generic solution to this issue because it is getting
quite common to see cases where the packets in the TX queue of a
device can sit there indefinitely.


[RFC] virtio: orphan skbs if we're relying on timer to free them

2009-05-18 Thread Rusty Russell
We check for finished xmit skbs on every xmit, or on a timer (unless
the host promises to force an interrupt when the xmit ring is empty).
This can penalize userspace tasks which fill their sockbuf.  Not much
difference with TSO, but measurable with large numbers of packets.

There are a finite number of packets which can be in the transmission
queue.  We could fire the timer more than every 100ms, but that would
just hurt performance for a corner case.  This seems neatest.
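
For context, the timer path this refers to looked roughly like the
following in the 2.6.30-era virtio_net.c (a sketch from memory; exact
details may differ).  The "every 100ms" above is the HZ/10 rearm below:

	static void xmit_free(unsigned long data)
	{
		struct virtnet_info *vi = (void *)data;

		netif_tx_lock(vi->dev);
		/* Reap skbs the host has finished transmitting. */
		free_old_xmit_skbs(vi);
		/* Still packets outstanding?  Check again in 100ms. */
		if (!skb_queue_empty(&vi->send))
			mod_timer(&vi->xmit_free_timer, jiffies + (HZ / 10));
		netif_tx_unlock(vi->dev);
	}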

With interrupt when Tx ring empty:
                        Seconds  TxPkts  TxIRQs
 1G TCP Guest->Host:    3.76     32833   32758
 1M normal pings:       111      108     997463
 1M 1k pings (-l 120):  55       107     488920

Without interrupt, without orphaning:
                        Seconds  TxPkts  TxIRQs
 1G TCP Guest->Host:    3.64     32806   1
 1M normal pings:       106      108     1
 1M 1k pings (-l 120):  68       105     1

With orphaning:
                        Seconds  TxPkts  TxIRQs
 1G TCP Guest->Host:    3.86     32821   1
 1M normal pings:       102      107     1
 1M 1k pings (-l 120):  43       105     1

Signed-off-by: Rusty Russell 
---
 drivers/net/virtio_net.c |5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -522,6 +522,11 @@ static int start_xmit(struct sk_buff *sk
 {
struct virtnet_info *vi = netdev_priv(dev);
 
+   /* We queue a limited number; don't let that delay writers if
+* we are slow in getting tx interrupt. */
+   if (!vi->free_in_tasklet)
+   skb_orphan(skb);
+
 again:
/* Free up any pending old buffers before queueing new ones. */
free_old_xmit_skbs(vi);
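
For readers without the tree handy, free_old_xmit_skbs() pops every
buffer the host has finished with and frees it.  Roughly, per the
driver of that era (exact field names may differ):

	static void free_old_xmit_skbs(struct virtnet_info *vi)
	{
		struct sk_buff *skb;
		unsigned int len;

		while ((skb = vi->svq->vq_ops->get_buf(vi->svq, &len)) != NULL) {
			__skb_unlink(skb, &vi->send);
			vi->dev->stats.tx_bytes += skb->len;
			vi->dev->stats.tx_packets++;
			/* With the orphaning above, the socket was already
			 * uncharged; this just releases the memory. */
			kfree_skb(skb);
		}
	}

With the skb already orphaned, this reclaim can be arbitrarily late
without stalling the sending socket.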



Re: [PATCHv6 4/4] virtio_pci: optional MSI-X support

2009-05-18 Thread Michael S. Tsirkin
On Mon, May 18, 2009 at 12:30:49AM +0300, Avi Kivity wrote:
> Michael S. Tsirkin wrote:
>> This implements optional MSI-X support in virtio_pci.
>> MSI-X is used whenever the host supports at least 2 MSI-X
>> vectors: 1 for configuration changes and 1 for virtqueues.
>> Per-virtqueue vectors are allocated if enough vectors
>> are available.
> 
>
> I'm not sure I understand how the vq -> msi mapping works.  Do we  
> actually support an arbitrary mapping, or just either linear or n:1?

Arbitrary mapping.

> I don't mind the driver being limited, but the device interface should  
> be flexible.  We'll want to deal with limited vector availability soon.

I agree.

The code in qemu lets you specify, for each queue, which MSIX vector you
want to use, or a special value if you don't want signalling. You also
specify which MSIX vector you want to use for config change
notifications, or a special value if you want to e.g. poll.

I think that's as flexible as it gets.
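
Concretely, the legacy virtio-pci layout exposes this through two 16-bit
registers plus a sentinel (the register constants are from the spec; the
guest-side snippet is only illustrative):

	#define VIRTIO_MSI_CONFIG_VECTOR  20      /* r/w: vector for config changes */
	#define VIRTIO_MSI_QUEUE_VECTOR   22      /* r/w: vector for selected queue */
	#define VIRTIO_MSI_NO_VECTOR      0xffff  /* no signalling for this source */

	/* Select a queue, request a vector, and read back to see
	 * whether the device accepted it. */
	iowrite16(queue_index, ioaddr + VIRTIO_PCI_QUEUE_SEL);
	iowrite16(vector, ioaddr + VIRTIO_MSI_QUEUE_VECTOR);
	if (ioread16(ioaddr + VIRTIO_MSI_QUEUE_VECTOR) == VIRTIO_MSI_NO_VECTOR)
		vector = VIRTIO_MSI_NO_VECTOR;  /* allocation failed: poll or share */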

The API within the guest is much simpler, but it does not need to be stable.

-- 
MST