Re: [PATCH v4 00/12] virtio: s4 support

2011-12-07 Thread Rusty Russell
On Wed, 7 Dec 2011 13:14:56 +0530, Amit Shah  wrote:
> On (Wed) 07 Dec 2011 [17:54:29], Rusty Russell wrote:
> > I figure there's a reason, but it seems a bit weird :)
> 
> Well, there is one reason right now: migrating storage along with
> VMs.  The guest needs to sync all data to the disk before the target
> host accesses the image file.

I don't see why.  Sure, if qemu (or whatever) is doing buffering, it
needs to flush that.  It doesn't, last I looked, it just maps to
read/write.

What am I missing?

Thanks,
Rusty.


Re: [PATCH] macvtap: Fix macvtap_get_queue to use rxhash first

2011-12-07 Thread David Miller
From: "Michael S. Tsirkin" 
Date: Wed, 7 Dec 2011 18:10:02 +0200

> On Fri, Nov 25, 2011 at 01:35:52AM -0500, David Miller wrote:
>> From: Krishna Kumar2 
>> Date: Fri, 25 Nov 2011 09:39:11 +0530
>> 
>> > Jason Wang  wrote on 11/25/2011 08:51:57 AM:
>> >>
>> >> My description is not clear again :(
>> >> I mean the same vhost thread:
>> >>
>> >> vhost thread #0 transmits packets of flow A on processor M
>> >> ...
>> >> vhost thread #0 moves to another processor N and starts to transmit packets
>> >> of flow A
>> > 
>> > Thanks for clarifying. Yes, binding vhosts to CPU's
>> > makes the incoming packet go to the same vhost each
>> > time. BTW, are you doing any binding and/or irqbalance
>> > when you run your tests? I am not running either at
>> > this time, but thought both might be useful.
>> 
>> So are we going with this patch or are we saying that vhost binding
>> is a requirement?
> 
> OK we didn't come to a conclusion so I would be inclined
> to merge this patch as is for 3.2, and revisit later.
> One question though: do these changes affect userspace
> in any way? For example, will this commit us to
> ensure that a single flow gets a unique hash even
> for strange configurations that transmit the same flow
> from multiple cpus?

Once you sort this out, reply with an Acked-by: for me, thanks.


Re: [net-next RFC PATCH 0/5] Series short description

2011-12-07 Thread Ben Hutchings
On Wed, 2011-12-07 at 19:31 +0800, Jason Wang wrote:
> On 12/07/2011 03:30 PM, Rusty Russell wrote:
> > On Mon, 05 Dec 2011 16:58:37 +0800, Jason Wang  wrote:
> >> multiple queue virtio-net: flow steering through host/guest cooperation
> >>
> >> Hello all:
> >>
> >> This is a rough series adds the guest/host cooperation of flow
> >> steering support based on Krish Kumar's multiple queue virtio-net
> >> driver patch 3/3 (http://lwn.net/Articles/467283/).
> > Is there a real (physical) device which does this kind of thing?  How do
> > they do it?  Can we copy them?
> >
> > Cheers,
> > Rusty.
> As far as I can see, ixgbe and sfc have similar but much more
> sophisticated mechanisms.
> 
> The idea was originally suggested by Ben and it was just borrowed from
> those real physical nic cards that can dispatch packets based on their
> hash. All of these cards can filter a flow based on the hash of the
> L2/L3/L4 header, and the stack tells the card which queue this flow
> should go to.

Solarflare controllers (sfc driver) have 8192 perfect filters for
TCP/IPv4 and UDP/IPv4 which can be used for flow steering.  (The filters
are organised as a hash table, but matched based on 5-tuples.)  I
implemented the 'accelerated RFS' interface in this driver.

I believe the Intel 82599 controllers (ixgbe driver) have both
hash-based and perfect filter modes and the driver can be configured to
use one or the other.  The driver has its own independent mechanism for
steering RX and TX flows which predates RFS; I don't know whether it
uses hash-based or perfect filters.

Most multi-queue controllers could support a kind of hash-based
filtering for TCP/IP by adjusting the RSS indirection table.  However,
this table is usually quite small (64-256 entries).  This means that
hash collisions will be quite common and this can result in reordering.
The same applies to the small table Jason has proposed for virtio-net.
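
For readers who have not seen an RSS indirection table, here is a minimal
sketch of the lookup it implies (names and size are hypothetical, not taken
from any driver); with only 64-256 slots, distinct flows frequently collide,
which is the reordering problem described above:

/* RSS-style steering: the flow hash indexes a small table whose entries
 * name an RX queue.  Flows whose hashes collide modulo the table size
 * always share a queue, and rewriting an entry while a flow is active
 * can reorder its packets. */
#define RSS_TABLE_SIZE 128              /* assumed; typically 64-256 */

static u16 rss_indirection[RSS_TABLE_SIZE];

static u16 rss_pick_rx_queue(u32 rxhash)
{
        return rss_indirection[rxhash % RSS_TABLE_SIZE];
}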

> So in the host, a simple hash-to-queue table was introduced in tap/macvtap,
> and in the guest, the guest driver tells the desired queue of a flow
> by changing this table.

I don't think accelerated RFS can work well without the use of perfect
filtering or hash-based filtering with a very low rate of collisions.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



Re: [PATCH RFC V3 2/4] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks

2011-12-07 Thread Raghavendra K T

On 12/07/2011 08:22 PM, Avi Kivity wrote:

On 12/07/2011 03:39 PM, Marcelo Tosatti wrote:

Also I think we can keep the kicked flag in vcpu->requests, no need for
new storage.


Was going to suggest it but it violates the currently organized
processing of entries at the beginning of vcpu_enter_guest.

That is, this "kicked" flag is different enough from vcpu->requests
processing that a separate variable seems worthwhile (even more
different with conversion to MP_STATE at KVM_GET_MP_STATE).


IMO, it's similar to KVM_REQ_EVENT (which can also cause mpstate to
change due to apic re-evaluation).



OK, so what I understand is we have to either:
1. retain the current kick flag as-is but make it migration
friendly [I still have to get more familiar with the migration side]

or
2. introduce a notion similar to KVM_REQ_PVLOCK_KICK (??) as part of
vcpu->requests.

So what would be better? Please let me know.
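
As an illustration only (the request name and its placement are assumptions,
not something from the posted patches), option 2 would look roughly like:

/* Sketch: express the kick as a request bit instead of a separate field.
 * KVM_REQ_PVLOCK_KICK is a made-up request number. */
#define KVM_REQ_PVLOCK_KICK 20

/* kicker side, e.g. in the KVM_HC_KICK_CPU hypercall handler */
static void pv_kick_vcpu(struct kvm_vcpu *vcpu)
{
        kvm_make_request(KVM_REQ_PVLOCK_KICK, vcpu);
        kvm_vcpu_kick(vcpu);
}

/* kickee side, in the request processing before entering the guest */
static void pv_check_kick(struct kvm_vcpu *vcpu)
{
        if (kvm_check_request(KVM_REQ_PVLOCK_KICK, vcpu))
                vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
}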



Re: [PATCH RFC V3 2/4] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks

2011-12-07 Thread Raghavendra K T

On 12/07/2011 06:03 PM, Marcelo Tosatti wrote:

On Wed, Dec 07, 2011 at 05:24:59PM +0530, Raghavendra K T wrote:

On 12/07/2011 04:18 PM, Marcelo Tosatti wrote:
Yes, you are right. It was potentially racy and it was harmful too!
I had observed that it was stalling the CPU before I introduced the
kicked flag.

But now,

vcpu->kicked = 1  ==>  kvm_make_request(KVM_REQ_UNHALT, vcpu); ==>


Ok, please use a more descriptive name, such as "pvlock_kicked" or
something.



Yes, pvlock_kicked seems good. I'll use the same unless something else
comes to mind.



__vcpu_run() ==>  kvm_check_request(KVM_REQ_UNHALT, vcpu) ==>

vcpuN->mp_state = KVM_MP_STATE_RUNNABLE; so eventually we will end up
in RUNNABLE.

Also Avi pointed out that, logically, kvm_arch_vcpu_ioctl_set_mpstate should
be called only in the vcpu thread, so after further debugging, I noticed
that setting vcpuN->mp_state = KVM_MP_STATE_RUNNABLE; is not
necessary.
I'll remove that in the next patch. Thanks for pointing it out.


In fact you don't need kvm_arch_vcpu_ioctl_set_mpstate either, only the
new "kicked" flag.



True indeed, I meant the same.



Re: [PATCH] macvtap: Fix macvtap_get_queue to use rxhash first

2011-12-07 Thread Michael S. Tsirkin
On Fri, Nov 25, 2011 at 01:35:52AM -0500, David Miller wrote:
> From: Krishna Kumar2 
> Date: Fri, 25 Nov 2011 09:39:11 +0530
> 
> > Jason Wang  wrote on 11/25/2011 08:51:57 AM:
> >>
> >> My description is not clear again :(
> >> I mean the same vhost thread:
> >>
> >> vhost thread #0 transmits packets of flow A on processor M
> >> ...
> >> vhost thread #0 moves to another processor N and starts to transmit packets
> >> of flow A
> > 
> > Thanks for clarifying. Yes, binding vhosts to CPU's
> > makes the incoming packet go to the same vhost each
> > time. BTW, are you doing any binding and/or irqbalance
> > when you run your tests? I am not running either at
> > this time, but thought both might be useful.
> 
> So are we going with this patch or are we saying that vhost binding
> is a requirement?

OK we didn't come to a conclusion so I would be inclined
to merge this patch as is for 3.2, and revisit later.
One question though: do these changes affect userspace
in any way? For example, will this commit us to
ensure that a single flow gets a unique hash even
for strange configurations that transmit the same flow
from multiple cpus?

-- 
MST


Re: [PATCH v4 06/12] virtio: blk: Add freeze, restore handlers to support S4

2011-12-07 Thread Michael S. Tsirkin
On Wed, Dec 07, 2011 at 04:26:47PM +0530, Amit Shah wrote:
> On (Wed) 07 Dec 2011 [12:37:02], Michael S. Tsirkin wrote:
> > On Wed, Dec 07, 2011 at 01:18:44AM +0530, Amit Shah wrote:
> > > Delete the vq and flush any pending requests from the block queue on the
> > > freeze callback to prepare for hibernation.
> > > 
> > > Re-create the vq in the restore callback to resume normal function.
> > > 
> > > Signed-off-by: Amit Shah 
> > > ---
> > >  drivers/block/virtio_blk.c |   38 ++
> > >  1 files changed, 38 insertions(+), 0 deletions(-)
> > > 
> > > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> > > index 467f218..a9147a6 100644
> > > --- a/drivers/block/virtio_blk.c
> > > +++ b/drivers/block/virtio_blk.c
> > > @@ -568,6 +568,40 @@ static void __devexit virtblk_remove(struct 
> > > virtio_device *vdev)
> > >   ida_simple_remove(&vd_index_ida, index);
> > >  }
> > >  
> > > +#ifdef CONFIG_PM
> > > +static int virtblk_freeze(struct virtio_device *vdev)
> > > +{
> > > + struct virtio_blk *vblk = vdev->priv;
> > > +
> > > + /* Ensure we don't receive any more interrupts */
> > > + vdev->config->reset(vdev);
> > > +
> > > + flush_work(&vblk->config_work);
> > 
> > It bothers me that config work can be running
> > after reset here. If it does, it will not get sane
> > values from reading config.
> 
> Why so?
> 
> The reset only ensures the host doesn't write anything more, isn't it?
> Why would the values be affected?

Generally, not only that. Reset also clears the configuration to the
reset value :) And since accesses are done byte
by byte, you might get a value that is different from
*both* the old and the new one as a result.

But that is a general comment, specifically for block,
I don't know if there is a problem with this.

Same for console.
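
To make the torn-read concern concrete, a small sketch (illustrative only,
not driver code) of a config field being assembled from byte-wide accesses:

#include <stdint.h>

/* If the 8 bytes of the field change between individual accesses (for
 * example because a reset rewrites them), the assembled value can match
 * neither the old nor the new contents. */
static uint64_t read_config_u64(const volatile uint8_t *cfg)
{
        uint64_t v = 0;
        int i;

        for (i = 0; i < 8; i++)         /* one byte per access */
                v |= (uint64_t)cfg[i] << (8 * i);
        return v;
}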

> > Also, can there be stuff in the reqs list?
> > If yes is this a problem?
> 
> Should be all cleared by the two commands below.  At least that's my
> expectation.  If not, let me know!
> 
> > > + spin_lock_irq(vblk->disk->queue->queue_lock);
> > > + blk_stop_queue(vblk->disk->queue);
> > > + spin_unlock_irq(vblk->disk->queue->queue_lock);
> > > + blk_sync_queue(vblk->disk->queue);
> > > +
> > > + vdev->config->del_vqs(vdev);
> > > + return 0;
> > > +}
> > > +
> > 
> > Thinking about it, looks like there's a bug in
> > virtblk_remove: if we get a config change after
> > flush_work we schedule another work.
> > That's a problem for sure as structure is removed.
> 
> Yep, it is a potential issue.
> 
>   Amit

Sent a patch.


Re: [PATCH v4 04/12] virtio: console: Add freeze and restore handlers to support S4

2011-12-07 Thread Michael S. Tsirkin
On Wed, Dec 07, 2011 at 04:29:55PM +0530, Amit Shah wrote:
> On (Wed) 07 Dec 2011 [12:43:24], Michael S. Tsirkin wrote:
> 
> > > diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
> > > index e14f5aa..fd2fd6f 100644
> > > --- a/drivers/char/virtio_console.c
> > > +++ b/drivers/char/virtio_console.c
> > > @@ -1844,6 +1844,60 @@ static unsigned int features[] = {
> > >   VIRTIO_CONSOLE_F_MULTIPORT,
> > >  };
> > >  
> > > +#ifdef CONFIG_PM
> > > +static int virtcons_freeze(struct virtio_device *vdev)
> > > +{
> > > + struct ports_device *portdev;
> > > + struct port *port;
> > > +
> > > + portdev = vdev->priv;
> > > +
> > > + vdev->config->reset(vdev);
> > 
> > 
> > So here, cancel_work_sync might still be running.
> > If it does run, might it try to access the device
> > config?  Could not determine this quickly, if yes it's a problem.
> 
> Similar to the other comment: I don't see why just resetting device
> can cause config queue access to go bad.
> 
>   Amit

Hmm, not sure anymore. Sorry.



Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors

2011-12-07 Thread Michael S. Tsirkin
On Wed, Dec 07, 2011 at 04:02:45PM +0200, Sasha Levin wrote:
> On Sun, 2011-12-04 at 20:23 +0200, Sasha Levin wrote:
> 
> [snip]
> 
> Rusty, Michael, does the below looks a reasonable optimization for you?

OK overall but a bit hard to say for sure as it looks pretty incomplete ...

> should I send it as a patch?

What's the performance gain?

> > Something like the following patch:
> > 
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index c7a2c20..3166ca0 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -82,6 +82,7 @@ struct vring_virtqueue
> >  
> > /* Host supports indirect buffers */
> > bool indirect;
> > +   struct kmem_cache *indirect_cache;
> >  
> > /* Host publishes avail event idx */
> > bool event;
> > @@ -110,6 +111,9 @@ struct vring_virtqueue
> >  
> >  #define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
> >  
> > +static unsigned int ind_alloc_thresh = 0;
> > +module_param(ind_alloc_thresh, uint, S_IRUGO);

0 will have no effect?

> > +
> >  /* Set up an indirect table of descriptors and add it to the queue. */
> >  static int vring_add_indirect(struct vring_virtqueue *vq,
> >   struct scatterlist sg[],

A global parameter is OK for testing but likely not what we want
in real life. This needs to be different per device.
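
One possible shape for a per-device threshold, purely as a sketch (the field
and function names are invented and not part of the posted patch):

/* The transport/device would supply the threshold when the virtqueue is
 * created, instead of every queue sharing one module parameter. */
struct vq_indirect_cfg {
        unsigned int thresh;            /* 0 = never use the cache */
        struct kmem_cache *cache;       /* sized for 'thresh' descriptors */
};

static bool vq_use_indirect_cache(const struct vq_indirect_cfg *cfg,
                                  unsigned int out, unsigned int in)
{
        return cfg->cache && (out + in) <= cfg->thresh;
}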



> > @@ -121,7 +125,10 @@ static int vring_add_indirect(struct vring_virtqueue 
> > *vq,
> > unsigned head;
> > int i;
> >  
> > -   desc = kmalloc((out + in) * sizeof(struct vring_desc), gfp);
> > +   if ((out + in) <= ind_alloc_thresh)
> > +   desc = kmem_cache_alloc(vq->indirect_cache, gfp);
> > +   else
> > +   desc = kmalloc((out + in) * sizeof(struct vring_desc), gfp);
> > if (!desc)
> > return -ENOMEM;
> >  

free unaffected?

> > @@ -479,6 +486,9 @@ struct virtqueue *vring_new_virtqueue(unsigned int num,
> > vq->broken = false;
> > vq->last_used_idx = 0;
> > vq->num_added = 0;
> > +   if (ind_alloc_thresh)
> > +   vq->indirect_cache = KMEM_CACHE(vring_desc[ind_alloc_thresh], 
> > 0);

and need to cleanup too?

> > list_add_tail(&vq->vq.list, &vdev->vqs);
> >  #ifdef DEBUG
> > vq->in_use = false;
> > 
> 
> -- 
> 
> Sasha.
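
Regarding the "free unaffected?" and cleanup questions above, a rough sketch
of what the matching paths might look like (assumed helper names, not part of
the posted patch):

/* Descriptors allocated from the cache must be returned to it, so the
 * free path needs to know how large the table was. */
static void vring_free_indirect(struct vring_virtqueue *vq,
                                struct vring_desc *desc, unsigned int num)
{
        if (vq->indirect_cache && num <= ind_alloc_thresh)
                kmem_cache_free(vq->indirect_cache, desc);
        else
                kfree(desc);
}

/* The cache itself would be destroyed when the virtqueue is deleted. */
static void vring_destroy_indirect_cache(struct vring_virtqueue *vq)
{
        if (vq->indirect_cache)
                kmem_cache_destroy(vq->indirect_cache);
}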


[PATCH RFC] virtio_blk: fix config handler race

2011-12-07 Thread Michael S. Tsirkin
Fix a theoretical race related to config work
handler: a config interrupt might happen
after we flush config work but before we
reset the device. It will then cause the
config work to run during or after reset.

Two problems with this:
- if this runs after device is gone we will get use after free
- access of config while reset is in progress is racy
(as layout is changing).

As a solution
1. flush after reset when we know there will be no more interrupts
2. add a flag to disable config access before reset

Signed-off-by: Michael S. Tsirkin 
---

RFC only as it's untested.
Bugfix so 3.2 material? Comments?

 drivers/block/virtio_blk.c |   22 +-
 1 files changed, 21 insertions(+), 1 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 4d0b70a..34633f3 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -4,6 +4,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -36,6 +37,12 @@ struct virtio_blk
/* Process context for config space updates */
struct work_struct config_work;
 
+   /* Lock for config space updates */
+   struct mutex config_lock;
+
+   /* enable config space updates */
+   bool config_enable;
+
/* What host tells us, plus 2 for header & tailer. */
unsigned int sg_elems;
 
@@ -318,6 +325,10 @@ static void virtblk_config_changed_work(struct work_struct 
*work)
char cap_str_2[10], cap_str_10[10];
u64 capacity, size;
 
+   mutex_lock(&vblk->config_lock);
+   if (!vblk->config_enable)
+   goto done;
+
/* Host must always specify the capacity. */
vdev->config->get(vdev, offsetof(struct virtio_blk_config, capacity),
  &capacity, sizeof(capacity));
@@ -340,6 +351,8 @@ static void virtblk_config_changed_work(struct work_struct 
*work)
  cap_str_10, cap_str_2);
 
set_capacity(vblk->disk, capacity);
+done:
+   mutex_unlock(&vblk->config_lock);
 }
 
 static void virtblk_config_changed(struct virtio_device *vdev)
@@ -388,7 +401,9 @@ static int __devinit virtblk_probe(struct virtio_device 
*vdev)
vblk->vdev = vdev;
vblk->sg_elems = sg_elems;
sg_init_table(vblk->sg, vblk->sg_elems);
+   mutex_init(&vblk->config_lock);
INIT_WORK(&vblk->config_work, virtblk_config_changed_work);
+   vblk->config_enable = true;
 
/* We expect one virtqueue, for output. */
vblk->vq = virtio_find_single_vq(vdev, blk_done, "requests");
@@ -542,7 +557,10 @@ static void __devexit virtblk_remove(struct virtio_device 
*vdev)
struct virtio_blk *vblk = vdev->priv;
int index = vblk->index;
 
-   flush_work(&vblk->config_work);
+   /* Prevent config work handler from accessing the device. */
+   mutex_lock(&vblk->config_lock);
+   vblk->config_enable = false;
+   mutex_unlock(&vblk->config_lock);
 
/* Nothing should be pending. */
BUG_ON(!list_empty(&vblk->reqs));
@@ -550,6 +568,8 @@ static void __devexit virtblk_remove(struct virtio_device 
*vdev)
/* Stop all the virtqueues. */
vdev->config->reset(vdev);
 
+   flush_work(&vblk->config_work);
+
del_gendisk(vblk->disk);
blk_cleanup_queue(vblk->disk->queue);
put_disk(vblk->disk);
-- 
1.7.5.53.gc233e


[PATCH RFC] virtio_net: fix refill related races

2011-12-07 Thread Michael S. Tsirkin
Fix theoretical races related to refill work:
1. After napi is disabled by ndo_stop, refill work
   can run and re-enable it.
2. Refill can reschedule itself, if this happens
   it can run after cancel_delayed_work_sync,
   and will access device after it is destroyed.

As a solution, add flags to track napi state and
to disable refill, and toggle them on start, stop
and remove; check these flags on refill.

refill work structure and new fields aren't used
on data path, so put them together near the end of
struct virtnet_info.

Signed-off-by: Michael S. Tsirkin 
---

RFC only since it's untested at this point.
A bugfix so 3.2 material?
Comments?


 drivers/net/virtio_net.c |   57 +-
 1 files changed, 46 insertions(+), 11 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index f5b3f19..39eb24c 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -68,15 +69,24 @@ struct virtnet_info {
/* Active statistics */
struct virtnet_stats __percpu *stats;
 
-   /* Work struct for refilling if we run low on memory. */
-   struct delayed_work refill;
-
/* Chain pages by the private ptr. */
struct page *pages;
 
/* fragments + linear part + virtio header */
struct scatterlist rx_sg[MAX_SKB_FRAGS + 2];
struct scatterlist tx_sg[MAX_SKB_FRAGS + 2];
+
+   /* Work struct for refilling if we run low on memory. */
+   struct delayed_work refill;
+
+   /* Set flag to allow delayed refill work, protected by a refill_lock. */
+   bool refill_enable;
+
+   /* Whether napi is enabled, protected by a refill_lock. */
+   bool napi_enable;
+
+   /* Lock to protect refill and napi enable/disable operations. */
+   struct mutex refill_lock;
 };
 
 struct skb_vnet_hdr {
@@ -477,20 +487,35 @@ static void virtnet_napi_enable(struct virtnet_info *vi)
}
 }
 
+static void virtnet_refill_enable(struct virtnet_info *vi, bool enable)
+{
+   mutex_lock(&vi->refill_lock);
+   vi->refill_enable = enable;
+   mutex_unlock(&vi->refill_lock);
+}
+
 static void refill_work(struct work_struct *work)
 {
struct virtnet_info *vi;
bool still_empty;
 
vi = container_of(work, struct virtnet_info, refill.work);
-   napi_disable(&vi->napi);
-   still_empty = !try_fill_recv(vi, GFP_KERNEL);
-   virtnet_napi_enable(vi);
 
-   /* In theory, this can happen: if we don't get any buffers in
-* we will *never* try to fill again. */
-   if (still_empty)
-   schedule_delayed_work(&vi->refill, HZ/2);
+   mutex_lock(&vi->refill_lock);
+   if (vi->refill_enable) {
+   if (vi->napi_enable)
+   napi_disable(&vi->napi);
+   still_empty = !try_fill_recv(vi, GFP_KERNEL);
+   if (vi->napi_enable)
+   virtnet_napi_enable(vi);
+
+   /* In theory, this can happen: if we don't get any buffers in
+* we will *never* try to fill again. */
+   if (still_empty)
+   schedule_delayed_work(&vi->refill, HZ/2);
+
+   }
+   mutex_unlock(&vi->refill_lock);
 }
 
 static int virtnet_poll(struct napi_struct *napi, int budget)
@@ -708,7 +733,10 @@ static int virtnet_open(struct net_device *dev)
 {
struct virtnet_info *vi = netdev_priv(dev);
 
+   mutex_lock(&vi->refill_lock);
+   vi->napi_enable = true;
virtnet_napi_enable(vi);
+   mutex_unlock(&vi->refill_lock);
return 0;
 }
 
@@ -761,7 +789,10 @@ static int virtnet_close(struct net_device *dev)
 {
struct virtnet_info *vi = netdev_priv(dev);
 
+   mutex_lock(&vi->refill_lock);
+   vi->napi_enable = false;
napi_disable(&vi->napi);
+   mutex_unlock(&vi->refill_lock);
 
return 0;
 }
@@ -1010,6 +1041,7 @@ static int virtnet_probe(struct virtio_device *vdev)
if (vi->stats == NULL)
goto free;
 
+   mutex_init(&vi->refill_lock);
INIT_DELAYED_WORK(&vi->refill, refill_work);
sg_init_table(vi->rx_sg, ARRAY_SIZE(vi->rx_sg));
sg_init_table(vi->tx_sg, ARRAY_SIZE(vi->tx_sg));
@@ -1047,6 +1079,8 @@ static int virtnet_probe(struct virtio_device *vdev)
goto free_vqs;
}
 
+   virtnet_refill_enable(vi, true);
+
/* Last of all, set up some receive buffers. */
try_fill_recv(vi, GFP_KERNEL);
 
@@ -1107,10 +1141,11 @@ static void __devexit virtnet_remove(struct 
virtio_device *vdev)
 {
struct virtnet_info *vi = vdev->priv;
 
+   virtnet_refill_enable(vi, false);
+
/* Stop all the virtqueues. */
vdev->config->reset(vdev);
 
-
unregister_netdev(vi->dev);
cancel_delayed_work_sync(&vi->refill);
 
-- 
1.7.5.53.gc233e

Re: [net-next RFC PATCH 5/5] virtio-net: flow director support

2011-12-07 Thread Stefan Hajnoczi
On Wed, Dec 7, 2011 at 12:10 PM, Jason Wang  wrote:
> On 12/07/2011 05:08 PM, Stefan Hajnoczi wrote:
> [...]
>
>>> >  Consider the complexity of the host nic each with their own steering
>>> >  features,  this series make the first step with minimal effort to try
>>> > to let
>>> >  guest driver and host tap/macvtap co-operate like what physical nic
>>> > does.
>>> >  There may be other method, but performance numbers is also needed to
>>> > give
>>> >  the answer.
>>
>> I agree that performance results for this need to be shown.
>>
>> My original point is really that it's not a good idea to take
>> individual steps without a good big picture because this will change
>> the virtio-net device specification.  If this turns out to be a dead
>> end then hosts will need to continue to support the interface forever
>> (legacy guests could still try to use it).  So please first explain
>> what the full stack picture is going to look like and how you think it
>> will lead to better performance.  You don't need to have all the code
>> or evidence, but just enough explanation so we see where this is all
>> going.
>
> I think I mentioned too little in the cover message.
>
> There are not many changes from Krishna's series except the method of
> choosing an rx virtqueue. Since the original series uses different hash methods in
> the host (rxhash) and the guest (txhash), a different virtqueue was chosen for a
> flow, which could lead packets of a flow to be handled by a different vhost
> thread and vcpu. This may damage the performance.
>
> This series tries to let one vhost thread process the packets of a flow
> and also lets the packets be sent directly to a vcpu local to the thread
> processing the data. This is done by letting the guest tell the desired queue from
> which it wants to receive the packets of a dedicated flow.
>
> So passing the hash from host to guest is needed to get the same hash on the
> two sides. Then a guest-programmable hash-to-queue table was introduced, and the
> guest co-operates with the host through accelerated RFS in the guest.

Thanks for the context, that helps me understand it a little better.

Stefan


Re: [PATCH RFC V3 2/4] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks

2011-12-07 Thread Avi Kivity
On 12/07/2011 03:39 PM, Marcelo Tosatti wrote:
> > Also I think we can keep the kicked flag in vcpu->requests, no need for
> > new storage.
>
> Was going to suggest it but it violates the currently organized
> processing of entries at the beginning of vcpu_enter_guest.
>
> That is, this "kicked" flag is different enough from vcpu->requests
> processing that a separate variable seems worthwhile (even more
> different with conversion to MP_STATE at KVM_GET_MP_STATE).

IMO, it's similar to KVM_REQ_EVENT (which can also cause mpstate to
change due to apic re-evaluation).

-- 
error compiling committee.c: too many arguments to function



Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors

2011-12-07 Thread Sasha Levin
On Sun, 2011-12-04 at 20:23 +0200, Sasha Levin wrote:

[snip]

Rusty, Michael, does the below looks a reasonable optimization for you?
should I send it as a patch?

> Something like the following patch:
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index c7a2c20..3166ca0 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -82,6 +82,7 @@ struct vring_virtqueue
>  
>   /* Host supports indirect buffers */
>   bool indirect;
> + struct kmem_cache *indirect_cache;
>  
>   /* Host publishes avail event idx */
>   bool event;
> @@ -110,6 +111,9 @@ struct vring_virtqueue
>  
>  #define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
>  
> +static unsigned int ind_alloc_thresh = 0;
> +module_param(ind_alloc_thresh, uint, S_IRUGO);
> +
>  /* Set up an indirect table of descriptors and add it to the queue. */
>  static int vring_add_indirect(struct vring_virtqueue *vq,
> struct scatterlist sg[],
> @@ -121,7 +125,10 @@ static int vring_add_indirect(struct vring_virtqueue *vq,
>   unsigned head;
>   int i;
>  
> - desc = kmalloc((out + in) * sizeof(struct vring_desc), gfp);
> + if ((out + in) <= ind_alloc_thresh)
> + desc = kmem_cache_alloc(vq->indirect_cache, gfp);
> + else
> + desc = kmalloc((out + in) * sizeof(struct vring_desc), gfp);
>   if (!desc)
>   return -ENOMEM;
>  
> @@ -479,6 +486,9 @@ struct virtqueue *vring_new_virtqueue(unsigned int num,
>   vq->broken = false;
>   vq->last_used_idx = 0;
>   vq->num_added = 0;
> + if (ind_alloc_thresh)
> + vq->indirect_cache = KMEM_CACHE(vring_desc[ind_alloc_thresh], 
> 0);
>   list_add_tail(&vq->vq.list, &vdev->vqs);
>  #ifdef DEBUG
>   vq->in_use = false;
> 

-- 

Sasha.



Re: [PATCH v4 00/12] virtio: s4 support

2011-12-07 Thread Dor Laor

On 12/07/2011 12:44 PM, Gleb Natapov wrote:

On Wed, Dec 07, 2011 at 05:54:29PM +1030, Rusty Russell wrote:

On Wed,  7 Dec 2011 01:18:38 +0530, Amit Shah  wrote:

Hi,

These patches add support for S4 to virtio (pci) and all drivers.


Dumb meta-question: why do we want to hibernate virtual machines?


For instance you want to hibernate your laptop while it is running a virtual
machine. You can pause them of course, but then time will be incorrect
inside a guest after resume.


Exactly. Today we do that through live migration into a file; we call
it external hibernation. The problem is that it is transparent to the
guest and thus the guest loses timekeeping. Even NTP can't handle it,
since if it has more than 0.5% delay it stops syncing.


The solution is to do it through S4, where the guest gets a notification
when it resumes.






I figure there's a reason, but it seems a bit weird :)

Thanks,
Rusty.


--
Gleb.




Re: [PATCH RFC V3 2/4] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks

2011-12-07 Thread Marcelo Tosatti
On Wed, Dec 07, 2011 at 02:47:05PM +0200, Avi Kivity wrote:
> On 12/07/2011 02:33 PM, Marcelo Tosatti wrote:
> > > 
> > > Also Avi pointed that, logically kvm_arch_vcpu_ioctl_set_mpstate should
> > > be called only in vcpu thread, so after further debugging, I noticed
> > > that,  setting vcpuN->mp_state = KVM_MP_STATE_RUNNABLE; is not
> > > necessary.
> > > I 'll remove that in the next patch. Thanks for pointing.
> >
> > In fact you don't need kvm_arch_vcpu_ioctl_set_mpstate either, only the
> > new "kicked" flag.
> 
> If we have a kicked flag, it becomes necessary to live migrate it.
> 
> Maybe we can change KVM_GET_MP_STATE to fold the kicked flag into the mp
> state (converting HALTED into RUNNABLE).

Yep, that works.

> Also I think we can keep the kicked flag in vcpu->requests, no need for
> new storage.

Was going to suggest it but it violates the currently organized
processing of entries at the beginning of vcpu_enter_guest.

That is, this "kicked" flag is different enough from vcpu->requests
processing that a separate variable seems worthwhile (even more
different with conversion to MP_STATE at KVM_GET_MP_STATE).
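
For what the fold at KVM_GET_MP_STATE could look like, a rough sketch (the
"kicked" field is the flag under discussion; its name and placement are
assumptions):

int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
                                    struct kvm_mp_state *mp_state)
{
        mp_state->mp_state = vcpu->arch.mp_state;
        /* A pending kick means the halt is already over from userspace's
         * point of view, so report RUNNABLE; no extra state to migrate. */
        if (vcpu->kicked && mp_state->mp_state == KVM_MP_STATE_HALTED)
                mp_state->mp_state = KVM_MP_STATE_RUNNABLE;
        return 0;
}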



Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors

2011-12-07 Thread Avi Kivity
On 12/06/2011 02:03 PM, Rusty Russell wrote:
> On Tue, 06 Dec 2011 11:58:21 +0200, Avi Kivity  wrote:
> > On 12/06/2011 07:07 AM, Rusty Russell wrote:
> > > Yes, but the hypervisor/trusted party would simply have to do the copy;
> > > the rings themselves would be shared A would say "copy this to/from B's
> > > ring entry N" and you know that A can't have changed B's entry.
> > 
> > Sorry, I don't follow.  How can the rings be shared?  If A puts a gpa in
> > A's address space into the ring, there's no way B can do anything with
> > it, it's an opaque number.  Xen solves this with an extra layer of
> > indirection (grant table handles) that cost extra hypercalls to map or
> > copy.
>
> It's not symmetric.  B can see the desc and avail pages R/O, and the
> used page R/W.  It needs to ask the something to copy in/out of
> descriptors, though, because they're an opaque number, and it doesn't
> have access.  ie. the existence of the descriptor in the ring *implies*
> a grant.
>
> Perhaps this could be generalized further into a "connect these two
> rings", but I'm not sure.  Descriptors with both read and write parts
> are tricky.

Okay, I was using a wrong mental model of how this works.  B must be
aware of the translation from A's address space into B.  Both qemu and
the kernel can do this on their own, but if B is another guest, then it
cannot do this except by calling H.

vhost-copy cannot work fully transparently, because you need some memory
to copy into.  Maybe we can have a pci device with a large BAR that
contains buffers for copying, and also a translation from A addresses
into B addresses.  It would work something like this:

  A prepares a request with both out and in buffers
  vhost-copy allocates memory in B's virtio-copy BAR, copies (using a
DMA engine) the out buffers into it, and rewrites the out descriptors to
contain B addresses
  B services the request, and updates the in addresses in the
descriptors to point at B memory
  vhost-copy copies (using a DMA engine) the in buffers into A memory

> > > I'm just not sure how the host would even know to hint.
> > 
> > For JBOD storage, a good rule of thumb is (number of spindles) x 3. 
> > With less, you might leave an idle spindle; with more, you're just
> > adding latency.  This assumes you're using indirects so ring entry ==
> > request.  The picture is muddier with massive battery-backed RAID
> > controllers or flash.
> > 
> > For networking, you want (ring size) * min(expected packet size, page
> > size) / (link bandwidth) to be something that doesn't get the
> > bufferbloat people after your blood.
>
> OK, so while neither side knows, the host knows slightly more.
>
> Now I think about it, from a spec POV, saying it's a "hint" is useless,
> as it doesn't tell the driver what to do with it.  I'll say it's a
> maximum, which keeps it simple.
>

Those rules of thumb always have exceptions, I'd say it's the default
that the guest can override.
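
As a rough worked example of the networking rule of thumb quoted above (all
numbers assumed for illustration): a 256-entry ring of ~1500-byte packets
holds 256 * 1500 B, about 3 Mbit, which drains in roughly 0.3 ms on a
10 Gbit/s link but sits for about 30 ms on a 100 Mbit/s link, which is where
the bufferbloat concern starts to bite.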

-- 
error compiling committee.c: too many arguments to function



Re: [PATCH RFC V3 2/4] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks

2011-12-07 Thread Avi Kivity
On 12/07/2011 02:33 PM, Marcelo Tosatti wrote:
> > 
> > Also Avi pointed that, logically kvm_arch_vcpu_ioctl_set_mpstate should
> > be called only in vcpu thread, so after further debugging, I noticed
> > that,  setting vcpuN->mp_state = KVM_MP_STATE_RUNNABLE; is not
> > necessary.
> > I 'll remove that in the next patch. Thanks for pointing.
>
> In fact you don't need kvm_arch_vcpu_ioctl_set_mpstate either, only the
> new "kicked" flag.

If we have a kicked flag, it becomes necessary to live migrate it.

Maybe we can change KVM_GET_MP_STATE to fold the kicked flag into the mp
state (converting HALTED into RUNNABLE).

Also I think we can keep the kicked flag in vcpu->requests, no need for
new storage.

-- 
error compiling committee.c: too many arguments to function



Re: [PATCH RFC V3 2/4] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks

2011-12-07 Thread Marcelo Tosatti
On Wed, Dec 07, 2011 at 05:24:59PM +0530, Raghavendra K T wrote:
> On 12/07/2011 04:18 PM, Marcelo Tosatti wrote:
> >On Wed, Nov 30, 2011 at 02:29:59PM +0530, Raghavendra K T wrote:
> >>
> >>+/*
> >>+ * kvm_pv_kick_cpu_op:  Kick a vcpu.
> >>+ *
> >>+ * @cpu - vcpu to be kicked.
> >>+ */
> >>+static void kvm_pv_kick_cpu_op(struct kvm *kvm, int cpu)
> >>+{
> >>+   struct kvm_vcpu *vcpu = kvm_get_vcpu(kvm, cpu);
> >>+   struct kvm_mp_state mp_state;
> >>+
> >>+   mp_state.mp_state = KVM_MP_STATE_RUNNABLE;
> >
> >Since vcpu->mp_state is not protected by a lock, this is potentially racy. 
> >For example:
> >
> >CPU0                                        CPU1
> >kvm_pv_kick_cpu_op                          running vcpuN
> >vcpuN->mp_state = KVM_MP_STATE_RUNNABLE;
> >                                            kvm_emulate_halt
> >                                            vcpuN->mp_state = KVM_MP_STATE_HALTED
> >
> >Is it harmless to lose a kick?
> >
> 
> Yes you are right. It was potentially racy and it was harmful too!.
> I had observed that it was stalling the CPU before I introduced
> kicked flag.
> 
> But now,
> 
> vcpu->kicked = 1  ==> kvm_make_request(KVM_REQ_UNHALT, vcpu); ==>

Ok, please use a more descriptive name, such as "pvlock_kicked" or
something.

> 
> __vcpu_run() ==> kvm_check_request(KVM_REQ_UNHALT, vcpu) ==>
> 
> vcpuN->mp_state = KVM_MP_STATE_RUNNABLE; so eventually we will end up
> in RUNNABLE.
> 
> Also Avi pointed that, logically kvm_arch_vcpu_ioctl_set_mpstate should
> be called only in vcpu thread, so after further debugging, I noticed
> that,  setting vcpuN->mp_state = KVM_MP_STATE_RUNNABLE; is not
> necessary.
> I 'll remove that in the next patch. Thanks for pointing.

In fact you don't need kvm_arch_vcpu_ioctl_set_mpstate either, only the
new "kicked" flag.

> 
> 
>   


Re: [net-next RFC PATCH 5/5] virtio-net: flow director support

2011-12-07 Thread Jason Wang

On 12/07/2011 05:08 PM, Stefan Hajnoczi wrote:
[...]

>  Consider the complexity of the host nic each with their own steering
>  features,  this series make the first step with minimal effort to try to let
>  guest driver and host tap/macvtap co-operate like what physical nic does.
>  There may be other method, but performance numbers is also needed to give
>  the answer.

I agree that performance results for this need to be shown.

My original point is really that it's not a good idea to take
individual steps without a good big picture because this will change
the virtio-net device specification.  If this turns out to be a dead
end then hosts will need to continue to support the interface forever
(legacy guests could still try to use it).  So please first explain
what the full stack picture is going to look like and how you think it
will lead to better performance.  You don't need to have all the code
or evidence, but just enough explanation so we see where this is all
going.

I think I mentioned too little in the cover message.

There are not many changes from Krishna's series except the method of
choosing an rx virtqueue. Since the original series uses different hash
methods in the host (rxhash) and the guest (txhash), a different virtqueue
was chosen for a flow, which could lead packets of a flow to be handled by
a different vhost thread and vcpu. This may damage the performance.


This series tries to let one vhost thread process the packets of a
flow and also lets the packets be sent directly to a vcpu local to the
thread processing the data. This is done by letting the guest tell the
desired queue from which it wants to receive the packets of a dedicated flow.


So passing the hash from host to guest is needed to get the same hash on
the two sides. Then a guest-programmable hash-to-queue table was
introduced, and the guest co-operates with the host through accelerated
RFS in the guest.
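
A minimal sketch of the table in question (size and names are illustrative
only, not taken from the RFC patches):

/* A table indexed by flow hash whose entries name the RX virtqueue the
 * guest wants that flow delivered to.  Guest aRFS updates an entry; host
 * tap/macvtap consults it when queuing a received packet. */
#define FLOW_DIR_TABLE_SIZE 256                 /* assumed size */

static u16 flow_to_rxq[FLOW_DIR_TABLE_SIZE];

/* guest side: steer a flow towards the queue local to the consuming vcpu */
static void guest_set_flow_queue(u32 rxhash, u16 rxq)
{
        flow_to_rxq[rxhash % FLOW_DIR_TABLE_SIZE] = rxq;
}

/* host side: choose the rx queue for a packet carrying this hash */
static u16 host_pick_rx_queue(u32 rxhash)
{
        return flow_to_rxq[rxhash % FLOW_DIR_TABLE_SIZE];
}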







Re: [PATCH RFC V3 2/4] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks

2011-12-07 Thread Raghavendra K T

On 12/07/2011 04:18 PM, Marcelo Tosatti wrote:

On Wed, Nov 30, 2011 at 02:29:59PM +0530, Raghavendra K T wrote:


+/*
+ * kvm_pv_kick_cpu_op:  Kick a vcpu.
+ *
+ * @cpu - vcpu to be kicked.
+ */
+static void kvm_pv_kick_cpu_op(struct kvm *kvm, int cpu)
+{
+   struct kvm_vcpu *vcpu = kvm_get_vcpu(kvm, cpu);
+   struct kvm_mp_state mp_state;
+
+   mp_state.mp_state = KVM_MP_STATE_RUNNABLE;


Since vcpu->mp_state is not protected by a lock, this is potentially racy. For 
example:

CPU0                                        CPU1
kvm_pv_kick_cpu_op                          running vcpuN
vcpuN->mp_state = KVM_MP_STATE_RUNNABLE;
                                            kvm_emulate_halt
                                            vcpuN->mp_state = KVM_MP_STATE_HALTED

Is it harmless to lose a kick?



Yes, you are right. It was potentially racy and it was harmful too! I
had observed that it was stalling the CPU before I introduced the kicked flag.


But now,

vcpu->kicked = 1  ==> kvm_make_request(KVM_REQ_UNHALT, vcpu); ==>

__vcpu_run() ==> kvm_check_request(KVM_REQ_UNHALT, vcpu) ==>

vcpuN->mp_state = KVM_MP_STATE_RUNNABLE; so eventually we will end up
in RUNNABLE.

Also Avi pointed out that, logically, kvm_arch_vcpu_ioctl_set_mpstate should
be called only in the vcpu thread, so after further debugging, I noticed
that setting vcpuN->mp_state = KVM_MP_STATE_RUNNABLE; is not
necessary.
I'll remove that in the next patch. Thanks for pointing it out.






Re: [PATCH v4 09/12] virtio: net: Add freeze, restore handlers to support S4

2011-12-07 Thread Michael S. Tsirkin
On Wed, Dec 07, 2011 at 04:05:29PM +0530, Amit Shah wrote:
> On (Wed) 07 Dec 2011 [12:28:24], Michael S. Tsirkin wrote:
> > On Wed, Dec 07, 2011 at 01:18:47AM +0530, Amit Shah wrote:
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index 697a0fc..1378f3c 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -1151,6 +1151,38 @@ static void __devexit virtnet_remove(struct 
> > > virtio_device *vdev)
> > >   free_netdev(vi->dev);
> > >  }
> > >  
> > > +#ifdef CONFIG_PM
> > > +static int virtnet_freeze(struct virtio_device *vdev)
> > > +{
> > > + struct virtnet_info *vi = vdev->priv;
> > > +
> > > + netif_device_detach(vi->dev);
> > > + if (netif_running(vi->dev))
> > > + napi_disable(&vi->napi);
> > > +
> > 
> > Could refill_work still be running at this point?
> 
> Yes, it could.  So moving the cancel_delayed_work_sync() before
> disabling napi would work fine?

No, because napi poll can schedule that.
Further, refill can reschedule itself.

It also looks like we have a bug in virtio net cleanup now:
cancel_delayed_work_sync is called after unregister, so
it will be calling napi API on an invalid device.
And, if it schedules itself it will run after device is gone.

I think we need some locking to fix this.

>  Anything else that might similar treatment?
> 
>   Amit



-- 
MST


Re: [net-next RFC PATCH 0/5] Series short description

2011-12-07 Thread Jason Wang

On 12/07/2011 03:30 PM, Rusty Russell wrote:

On Mon, 05 Dec 2011 16:58:37 +0800, Jason Wang  wrote:

multiple queue virtio-net: flow steering through host/guest cooperation

Hello all:

This is a rough series adds the guest/host cooperation of flow
steering support based on Krish Kumar's multiple queue virtio-net
driver patch 3/3 (http://lwn.net/Articles/467283/).

Is there a real (physical) device which does this kind of thing?  How do
they do it?  Can we copy them?

Cheers,
Rusty.
As far as I can see, ixgbe and sfc have similar but much more
sophisticated mechanisms.


The idea was originally suggested by Ben and it was just borrowed from
those real physical nic cards that can dispatch packets based on their
hash. All of these cards can filter a flow based on the hash of the
L2/L3/L4 header, and the stack tells the card which queue this flow
should go to.


So in the host, a simple hash-to-queue table was introduced in tap/macvtap,
and in the guest, the guest driver tells the desired queue of a flow
by changing this table.





Re: [net-next RFC PATCH 5/5] virtio-net: flow director support

2011-12-07 Thread Jason Wang

On 12/07/2011 07:10 AM, Sridhar Samudrala wrote:

On 12/6/2011 8:14 AM, Michael S. Tsirkin wrote:

On Tue, Dec 06, 2011 at 07:42:54AM -0800, Sridhar Samudrala wrote:

On 12/6/2011 5:15 AM, Stefan Hajnoczi wrote:
On Tue, Dec 6, 2011 at 10:21 AM, Jason Wang   
wrote:

On 12/06/2011 05:18 PM, Stefan Hajnoczi wrote:
On Tue, Dec 6, 2011 at 6:33 AM, Jason 
Wang wrote:

On 12/05/2011 06:55 PM, Stefan Hajnoczi wrote:

On Mon, Dec 5, 2011 at 8:59 AM, Jason Wang
  wrote:

The vcpus are just threads and may not be bound to physical CPUs, so
what is the big picture here?  Is the guest even in the position to
set the best queue mappings today?
Not sure it could publish the best mapping, but the idea is to make
sure the packets of a flow are handled by the same guest vcpu, and maybe
the same vhost thread, in order to eliminate packet reordering and lock
contention. But this assumption does not take into account the bouncing
of vhost or vcpu threads, which would also affect the result.

Okay, this is why I'd like to know what the big picture here is.  What
solution are you proposing?  How are we going to have everything from
guest application, guest kernel, host threads, and host NIC driver
play along so we get the right steering up the entire stack.  I think
there needs to be an answer to that before changing virtio-net to add
any steering mechanism.



Yes. Also the current model of a vhost thread per VM's interface
doesn't help with packet steering all the way from the guest to the
host physical NIC.

I think we need to have vhost thread(s) per-CPU that can handle
packets to/from the physical NIC's TX/RX queues. Currently we have a
single vhost thread for a VM's i/f that handles all the packets from
various flows coming from a multi-queue physical NIC.

Thanks
Sridhar

It's not hard to try that:
1. revert c23f3445e68e1db0e74099f264bc5ff5d55ebdeb
this will convert our thread to a workqueue
2. convert the workqueue to a per-cpu one

It didn't work that well in the past, but YMMV
Yes. I tried this before we went ahead with per-interface vhost 
threading model.

At that time, per-cpu vhost  showed a regression with a single-VM and
per-vq vhost showed good performance improvements upto 8 VMs.

So  just making it per-cpu would not be enough. I think we may need a way
to schedule vcpu threads on the same cpu-socket as vhost.

Another aspect we need to look into is the splitting of vhost thread 
into separate
threads for TX and RX. Shirley is doing some work in this area and she 
is seeing
perf. improvements as long as TX and RX threads are on the same 
cpu-socket.


I emulated this through my multi-queue series in the past; it looks like
it damages the performance of a single stream, especially guest tx.


On the surface I'd say a single thread makes some sense
as long as guest uses a single queue.

But this may not be scalable long term when we want to support a large
number of VMs each having multiple virtio-net interfaces with multiple
queues.

Thanks
Sridhar





Re: [net-next RFC PATCH 5/5] virtio-net: flow director support

2011-12-07 Thread Jason Wang

On 12/06/2011 11:42 PM, Sridhar Samudrala wrote:

On 12/6/2011 5:15 AM, Stefan Hajnoczi wrote:

On Tue, Dec 6, 2011 at 10:21 AM, Jason Wang  wrote:

On 12/06/2011 05:18 PM, Stefan Hajnoczi wrote:
On Tue, Dec 6, 2011 at 6:33 AM, Jason Wang
wrote:

On 12/05/2011 06:55 PM, Stefan Hajnoczi wrote:

On Mon, Dec 5, 2011 at 8:59 AM, Jason Wang
  wrote:

The vcpus are just threads and may not be bound to physical CPUs, so
what is the big picture here?  Is the guest even in the position to
set the best queue mappings today?


Not sure it could publish the best mapping, but the idea is to make
sure the packets of a flow are handled by the same guest vcpu, and maybe
the same vhost thread, in order to eliminate packet reordering and lock
contention. But this assumption does not take into account the bouncing
of vhost or vcpu threads, which would also affect the result.

Okay, this is why I'd like to know what the big picture here is.  What
solution are you proposing?  How are we going to have everything from
guest application, guest kernel, host threads, and host NIC driver
play along so we get the right steering up the entire stack.  I think
there needs to be an answer to that before changing virtio-net to add
any steering mechanism.


Yes. Also the current model of a vhost thread per VM's interface
doesn't help with packet steering all the way from the guest to the
host physical NIC.

I think we need to have vhost thread(s) per-CPU that can handle
packets to/from the physical NIC's TX/RX queues. Currently we have a
single vhost thread for a VM's i/f that handles all the packets from
various flows coming from a multi-queue physical NIC.


Even if we have per-cpu work threads, only one socket is used to queue the
packets then, so a multiple-queue (multiple-socket) tap/macvtap is still needed.


Thanks
Sridhar





Re: [PATCH v4 04/12] virtio: console: Add freeze and restore handlers to support S4

2011-12-07 Thread Amit Shah
On (Wed) 07 Dec 2011 [12:43:24], Michael S. Tsirkin wrote:

> > diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
> > index e14f5aa..fd2fd6f 100644
> > --- a/drivers/char/virtio_console.c
> > +++ b/drivers/char/virtio_console.c
> > @@ -1844,6 +1844,60 @@ static unsigned int features[] = {
> > VIRTIO_CONSOLE_F_MULTIPORT,
> >  };
> >  
> > +#ifdef CONFIG_PM
> > +static int virtcons_freeze(struct virtio_device *vdev)
> > +{
> > +   struct ports_device *portdev;
> > +   struct port *port;
> > +
> > +   portdev = vdev->priv;
> > +
> > +   vdev->config->reset(vdev);
> 
> 
> So here, cancel_work_sync might still be running.
> If it does run, might it try to access the device
> config?  Could not determine this quickly, if yes it's a problem.

Similar to the other comment: I don't see why just resetting device
can cause config queue access to go bad.

Amit


Re: [PATCH v4 06/12] virtio: blk: Add freeze, restore handlers to support S4

2011-12-07 Thread Amit Shah
On (Wed) 07 Dec 2011 [12:37:02], Michael S. Tsirkin wrote:
> On Wed, Dec 07, 2011 at 01:18:44AM +0530, Amit Shah wrote:
> > Delete the vq and flush any pending requests from the block queue on the
> > freeze callback to prepare for hibernation.
> > 
> > Re-create the vq in the restore callback to resume normal function.
> > 
> > Signed-off-by: Amit Shah 
> > ---
> >  drivers/block/virtio_blk.c |   38 ++
> >  1 files changed, 38 insertions(+), 0 deletions(-)
> > 
> > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> > index 467f218..a9147a6 100644
> > --- a/drivers/block/virtio_blk.c
> > +++ b/drivers/block/virtio_blk.c
> > @@ -568,6 +568,40 @@ static void __devexit virtblk_remove(struct 
> > virtio_device *vdev)
> > ida_simple_remove(&vd_index_ida, index);
> >  }
> >  
> > +#ifdef CONFIG_PM
> > +static int virtblk_freeze(struct virtio_device *vdev)
> > +{
> > +   struct virtio_blk *vblk = vdev->priv;
> > +
> > +   /* Ensure we don't receive any more interrupts */
> > +   vdev->config->reset(vdev);
> > +
> > +   flush_work(&vblk->config_work);
> 
> It bothers me that config work can be running
> after reset here. If it does, it will not get sane
> values from reading config.

Why so?

The reset only ensures the host doesn't write anything more, isn't it?
Why would the values be affected?

> Also, can there be stuff in the reqs list?
> If yes is this a problem?

Should be all cleared by the two commands below.  At least that's my
expectation.  If not, let me know!

> > +   spin_lock_irq(vblk->disk->queue->queue_lock);
> > +   blk_stop_queue(vblk->disk->queue);
> > +   spin_unlock_irq(vblk->disk->queue->queue_lock);
> > +   blk_sync_queue(vblk->disk->queue);
> > +
> > +   vdev->config->del_vqs(vdev);
> > +   return 0;
> > +}
> > +
> 
> Thinking about it, looks like there's a bug in
> virtblk_remove: if we get a config change after
> flush_work we schedule another work.
> That's a problem for sure as structure is removed.

Yep, it is a potential issue.

Amit


Re: [Xen-devel] [PATCH RFC V3 2/4] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks

2011-12-07 Thread Avi Kivity
On 12/06/2011 06:49 PM, Konrad Rzeszutek Wilk wrote:
> On Sun, Dec 04, 2011 at 11:36:58PM +0530, Raghavendra K T wrote:
> > On 12/02/2011 01:20 AM, Raghavendra K T wrote:
> > >>Have you tested it on AMD machines? There are some differences in the
> > >>hypercall infrastructure there.
> > >
> > >Yes. 'll test on AMD machine and update on that.
> > >
> > 
> > I tested the code on 64 bit Dual-Core AMD Opteron machine, and it is 
> > working.
>
> I am not that familiar with how KVM does migration, but do you need any
> special flags in QEMU to when migrating a KVM-pv-spinlock enabled guest
> to another machine?

I don't think so.  Sleeping is an ordinary HLT state which we already
migrate.  There are no further states.

-- 
error compiling committee.c: too many arguments to function



Re: [PATCH RFC V3 2/4] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks

2011-12-07 Thread Marcelo Tosatti
On Wed, Nov 30, 2011 at 02:29:59PM +0530, Raghavendra K T wrote:
> Add a hypercall to KVM hypervisor to support pv-ticketlocks 
> 
> KVM_HC_KICK_CPU allows the calling vcpu to kick another vcpu out of halt 
> state.
> 
> The presence of these hypercalls is indicated to guest via
> KVM_FEATURE_KICK_VCPU/KVM_CAP_KICK_VCPU.
> 
> Qemu needs a corresponding patch to pass up the presence of this feature to 
> guest via cpuid. Patch to qemu will be sent separately.
> 
> There is no Xen/KVM hypercall interface to await kick from.
> 
> Signed-off-by: Srivatsa Vaddagiri 
> Signed-off-by: Suzuki Poulose 
> Signed-off-by: Raghavendra K T 
> ---
> diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
> index 734c376..8b1d65d 100644
> --- a/arch/x86/include/asm/kvm_para.h
> +++ b/arch/x86/include/asm/kvm_para.h
> @@ -16,12 +16,14 @@
>  #define KVM_FEATURE_CLOCKSOURCE  0
>  #define KVM_FEATURE_NOP_IO_DELAY 1
>  #define KVM_FEATURE_MMU_OP   2
> +
>  /* This indicates that the new set of kvmclock msrs
>   * are available. The use of 0x11 and 0x12 is deprecated
>   */
>  #define KVM_FEATURE_CLOCKSOURCE23
>  #define KVM_FEATURE_ASYNC_PF 4
>  #define KVM_FEATURE_STEAL_TIME   5
> +#define KVM_FEATURE_KICK_VCPU6
>  
>  /* The last 8 bits are used to indicate how to interpret the flags field
>   * in pvclock structure. If no bits are set, all flags are ignored.
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c38efd7..6e1c8b4 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2103,6 +2103,7 @@ int kvm_dev_ioctl_check_extension(long ext)
>   case KVM_CAP_XSAVE:
>   case KVM_CAP_ASYNC_PF:
>   case KVM_CAP_GET_TSC_KHZ:
> + case KVM_CAP_KICK_VCPU:
>   r = 1;
>   break;
>   case KVM_CAP_COALESCED_MMIO:
> @@ -2577,7 +2578,8 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 
> *entry, u32 function,
>(1 << KVM_FEATURE_NOP_IO_DELAY) |
>(1 << KVM_FEATURE_CLOCKSOURCE2) |
>(1 << KVM_FEATURE_ASYNC_PF) |
> -  (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
> +  (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
> +  (1 << KVM_FEATURE_KICK_VCPU);
>  
>   if (sched_info_on())
>   entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
> @@ -5305,6 +5307,26 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
>   return 1;
>  }
>  
> +/*
> + * kvm_pv_kick_cpu_op:  Kick a vcpu.
> + *
> + * @cpu - vcpu to be kicked.
> + */
> +static void kvm_pv_kick_cpu_op(struct kvm *kvm, int cpu)
> +{
> + struct kvm_vcpu *vcpu = kvm_get_vcpu(kvm, cpu);
> + struct kvm_mp_state mp_state;
> +
> + mp_state.mp_state = KVM_MP_STATE_RUNNABLE;

Since vcpu->mp_state is not protected by a lock, this is potentially racy. For 
example:

CPU0                                            CPU1
kvm_pv_kick_cpu_op                              running vcpuN
vcpuN->mp_state = KVM_MP_STATE_RUNNABLE;
                                                kvm_emulate_halt
                                                vcpuN->mp_state = KVM_MP_STATE_HALTED

Is it harmless to lose a kick?
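
If it is not, one rough way to close that window -- illustration only, the
per-vcpu pv_halt_lock below is hypothetical and not part of the posted
patch -- would be to serialize the state update against kvm_emulate_halt():

	static void kvm_pv_kick_cpu_op(struct kvm *kvm, int cpu)
	{
		struct kvm_vcpu *vcpu = kvm_get_vcpu(kvm, cpu);

		if (!vcpu)
			return;

		/* hypothetical lock, also taken around the mp_state
		 * update in kvm_emulate_halt() */
		spin_lock(&vcpu->arch.pv_halt_lock);
		vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
		spin_unlock(&vcpu->arch.pv_halt_lock);

		kvm_vcpu_kick(vcpu);
	}

On the guest side, the kick would presumably be issued with something like
kvm_hypercall1(KVM_HC_KICK_CPU, cpu) from the ticket-unlock slow path.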

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v4 00/12] virtio: s4 support

2011-12-07 Thread Gleb Natapov
On Wed, Dec 07, 2011 at 05:54:29PM +1030, Rusty Russell wrote:
> On Wed,  7 Dec 2011 01:18:38 +0530, Amit Shah  wrote:
> > Hi,
> > 
> > These patches add support for S4 to virtio (pci) and all drivers.
> 
> Dumb meta-question: why do we want to hibernate virtual machines?
> 
For instance, you want to hibernate your laptop while it is running a virtual
machine. You can pause it, of course, but then the time will be incorrect
inside the guest after resume.

> I figure there's a reason, but it seems a bit weird :)
> 
> Thanks,
> Rusty.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

--
Gleb.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v4 04/12] virtio: console: Add freeze and restore handlers to support S4

2011-12-07 Thread Michael S. Tsirkin
On Wed, Dec 07, 2011 at 01:18:42AM +0530, Amit Shah wrote:
> Remove all vqs and associated buffers in the freeze callback which
> prepares us to go into hibernation state.  On restore, re-create all the
> vqs and populate the input vqs with buffers to get to the pre-hibernate
> state.
> 
> Note: Any outstanding unconsumed buffers are discarded, which means
> there's a possibility of data loss in case the host or the guest didn't
> consume any data already present in the vqs.  This can be addressed in a
> later patch series, perhaps in virtio common code.
> 
> Signed-off-by: Amit Shah 
> ---
>  drivers/char/virtio_console.c |   58 
> +
>  1 files changed, 58 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
> index e14f5aa..fd2fd6f 100644
> --- a/drivers/char/virtio_console.c
> +++ b/drivers/char/virtio_console.c
> @@ -1844,6 +1844,60 @@ static unsigned int features[] = {
>   VIRTIO_CONSOLE_F_MULTIPORT,
>  };
>  
> +#ifdef CONFIG_PM
> +static int virtcons_freeze(struct virtio_device *vdev)
> +{
> + struct ports_device *portdev;
> + struct port *port;
> +
> + portdev = vdev->priv;
> +
> + vdev->config->reset(vdev);


So here, cancel_work_sync might still be running.
If it does run, might it try to access the device
config?  I could not determine this quickly; if it can, it's a problem.
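
For the record, a sketch (not from the patch) of the ordering being
questioned, i.e. quiescing the work item before the reset so it can no
longer touch the soon-to-be-reset device; this assumes control_work is the
only deferred path that reads the config space during freeze:

	cancel_work_sync(&portdev->control_work);
	vdev->config->reset(vdev);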

> +
> + cancel_work_sync(&portdev->control_work);
> + remove_controlq_data(portdev);
> +
> + list_for_each_entry(port, &portdev->ports, list) {
> + /*
> +  * We'll ask the host later if the new invocation has
> +  * the port opened or closed.
> +  */
> + port->host_connected = false;
> + remove_port_data(port);
> + }
> + remove_vqs(portdev);
> +
> + return 0;
> +}
> +
> +static int virtcons_restore(struct virtio_device *vdev)
> +{
> + struct ports_device *portdev;
> + struct port *port;
> + int ret;
> +
> + portdev = vdev->priv;
> +
> + ret = init_vqs(portdev);
> + if (ret)
> + return ret;
> +
> + if (use_multiport(portdev))
> + fill_queue(portdev->c_ivq, &portdev->cvq_lock);
> +
> + list_for_each_entry(port, &portdev->ports, list) {
> + port->in_vq = portdev->in_vqs[port->id];
> + port->out_vq = portdev->out_vqs[port->id];
> +
> + fill_queue(port->in_vq, &port->inbuf_lock);
> +
> + /* Get port open/close status on the host */
> + send_control_msg(port, VIRTIO_CONSOLE_PORT_READY, 1);
> + }
> + return 0;
> +}
> +#endif
> +
>  static struct virtio_driver virtio_console = {
>   .feature_table = features,
>   .feature_table_size = ARRAY_SIZE(features),
> @@ -1853,6 +1907,10 @@ static struct virtio_driver virtio_console = {
>   .probe =virtcons_probe,
>   .remove =   virtcons_remove,
>   .config_changed = config_intr,
> +#ifdef CONFIG_PM
> + .freeze =   virtcons_freeze,
> + .restore =  virtcons_restore,
> +#endif
>  };
>  
>  static int __init init(void)
> -- 
> 1.7.7.3
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: (resend) [PATCH v4 12/12] virtio: balloon: Add freeze, restore handlers to support S4

2011-12-07 Thread Amit Shah
On (Wed) 07 Dec 2011 [12:34:48], Michael S. Tsirkin wrote:
> On Wed, Dec 07, 2011 at 01:18:50AM +0530, Amit Shah wrote:
> > Now to not race with a host issuing ballooning requests while we are in
> > the process of freezing, we just exit from the vballoon kthread when the
> > processes are asked to freeze.  Upon thaw and restore, we re-start the
> > thread.
> 
> ...
> 
> > ---
> >  drivers/virtio/virtio_balloon.c |   79 
> > ++-
> >  1 files changed, 78 insertions(+), 1 deletions(-)
> > 
> > diff --git a/drivers/virtio/virtio_balloon.c 
> > b/drivers/virtio/virtio_balloon.c
> > index 8bf99be..10ec638 100644
> > --- a/drivers/virtio/virtio_balloon.c
> > +++ b/drivers/virtio/virtio_balloon.c
> > @@ -258,7 +258,13 @@ static int balloon(void *_vballoon)
> > while (!kthread_should_stop()) {
> > s64 diff;
> >  
> > -   try_to_freeze();
> > +   /*
> > +* On suspend, we want to exit this thread.  We will
> > +* start a new thread on resume.
> > +*/
> > +   if (freezing(current))
> > +   break;
> > +
> > wait_event_interruptible(vb->config_change,
> >  (diff = towards_target(vb)) != 0
> >  || vb->need_stats_update
> 
> ...
> 
> Note: this relies on kthreads being frozen before devices.
> Looking at kernel/power/hibernate.c this is the case,
> but I think we should add a comment to note this.

Yes, it does.  And for that reason, I mentioned that stopping the
thread doesn't buy us anything; I'll revert this change in the next
submission.
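
For what it's worth, a sketch of the kind of comment being asked for,
assuming the loop goes back to a plain try_to_freeze() as mentioned above:

	/*
	 * Note: the freezer freezes kthreads before it freezes devices
	 * (see kernel/power/hibernate.c), so by the time the balloon
	 * driver's freeze() callback runs, this thread is already parked
	 * here and cannot race with it.
	 */
	try_to_freeze();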

> Also Cc linux-pm crowd in case I got it wrong.

Amit
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [net-next RFC PATCH 0/5] Series short description

2011-12-07 Thread Rusty Russell
On Mon, 05 Dec 2011 16:58:37 +0800, Jason Wang  wrote:
> multiple queue virtio-net: flow steering through host/guest cooperation
> 
> Hello all:
> 
> This is a rough series adds the guest/host cooperation of flow
> steering support based on Krish Kumar's multiple queue virtio-net
> driver patch 3/3 (http://lwn.net/Articles/467283/).

Is there a real (physical) device which does this kind of thing?  How do
they do it?  Can we copy them?

Cheers,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v4 09/12] virtio: net: Add freeze, restore handlers to support S4

2011-12-07 Thread Amit Shah
On (Wed) 07 Dec 2011 [12:28:24], Michael S. Tsirkin wrote:
> On Wed, Dec 07, 2011 at 01:18:47AM +0530, Amit Shah wrote:
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 697a0fc..1378f3c 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -1151,6 +1151,38 @@ static void __devexit virtnet_remove(struct 
> > virtio_device *vdev)
> > free_netdev(vi->dev);
> >  }
> >  
> > +#ifdef CONFIG_PM
> > +static int virtnet_freeze(struct virtio_device *vdev)
> > +{
> > +   struct virtnet_info *vi = vdev->priv;
> > +
> > +   netif_device_detach(vi->dev);
> > +   if (netif_running(vi->dev))
> > +   napi_disable(&vi->napi);
> > +
> 
> Could refill_work still be running at this point?

Yes, it could.  So moving the cancel_delayed_work_sync() before
disabling napi would work fine?  Anything else that might need similar
treatment?
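
Roughly the following, as an untested sketch of the reordering being asked
about; it assumes vi->refill is the delayed work behind refill_work:

	static int virtnet_freeze(struct virtio_device *vdev)
	{
		struct virtnet_info *vi = vdev->priv;

		netif_device_detach(vi->dev);

		/* Stop the refill work before napi_disable() so it cannot
		 * re-enable napi behind us; whether anything can still
		 * re-queue it after this point is the open question. */
		cancel_delayed_work_sync(&vi->refill);

		if (netif_running(vi->dev))
			napi_disable(&vi->napi);

		remove_vq_common(vi);
		return 0;
	}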

Amit
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v4 06/12] virtio: blk: Add freeze, restore handlers to support S4

2011-12-07 Thread Michael S. Tsirkin
On Wed, Dec 07, 2011 at 01:18:44AM +0530, Amit Shah wrote:
> Delete the vq and flush any pending requests from the block queue on the
> freeze callback to prepare for hibernation.
> 
> Re-create the vq in the restore callback to resume normal function.
> 
> Signed-off-by: Amit Shah 
> ---
>  drivers/block/virtio_blk.c |   38 ++
>  1 files changed, 38 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 467f218..a9147a6 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -568,6 +568,40 @@ static void __devexit virtblk_remove(struct 
> virtio_device *vdev)
>   ida_simple_remove(&vd_index_ida, index);
>  }
>  
> +#ifdef CONFIG_PM
> +static int virtblk_freeze(struct virtio_device *vdev)
> +{
> + struct virtio_blk *vblk = vdev->priv;
> +
> + /* Ensure we don't receive any more interrupts */
> + vdev->config->reset(vdev);
> +
> + flush_work(&vblk->config_work);

It bothers me that config work can be running
after reset here. If it does, it will not get sane
values from reading config.


Also, can there be stuff in the reqs list?
If yes, is this a problem?

> +
> + spin_lock_irq(vblk->disk->queue->queue_lock);
> + blk_stop_queue(vblk->disk->queue);
> + spin_unlock_irq(vblk->disk->queue->queue_lock);
> + blk_sync_queue(vblk->disk->queue);
> +
> + vdev->config->del_vqs(vdev);
> + return 0;
> +}
> +

Thinking about it, it looks like there's a bug in
virtblk_remove: if we get a config change after
flush_work, we schedule another work item.
That's a problem for sure, as the structure is being removed.
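
One possible shape for a fix, as an illustration only -- the config_enable
flag and config_lock mutex below are hypothetical, not in the posted patch
-- is to have the config-changed handler check a flag that remove (and
freeze) clear before flushing:

	static void virtblk_config_changed(struct virtio_device *vdev)
	{
		struct virtio_blk *vblk = vdev->priv;

		mutex_lock(&vblk->config_lock);
		if (vblk->config_enable)
			queue_work(virtblk_wq, &vblk->config_work);
		mutex_unlock(&vblk->config_lock);
	}

	/* in virtblk_remove()/virtblk_freeze(), before flush_work(): */
	mutex_lock(&vblk->config_lock);
	vblk->config_enable = false;
	mutex_unlock(&vblk->config_lock);
	flush_work(&vblk->config_work);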


> +static int virtblk_restore(struct virtio_device *vdev)
> +{
> + struct virtio_blk *vblk = vdev->priv;
> + int ret;
> +
> + ret = init_vq(vdev->priv);
> + if (!ret) {
> + spin_lock_irq(vblk->disk->queue->queue_lock);
> + blk_start_queue(vblk->disk->queue);
> + spin_unlock_irq(vblk->disk->queue->queue_lock);
> + }
> + return ret;
> +}
> +#endif
> +
>  static const struct virtio_device_id id_table[] = {
>   { VIRTIO_ID_BLOCK, VIRTIO_DEV_ANY_ID },
>   { 0 },
> @@ -593,6 +627,10 @@ static struct virtio_driver __refdata virtio_blk = {
>   .probe  = virtblk_probe,
>   .remove = __devexit_p(virtblk_remove),
>   .config_changed = virtblk_config_changed,
> +#ifdef CONFIG_PM
> + .freeze = virtblk_freeze,
> + .restore= virtblk_restore,
> +#endif
>  };
>  
>  static int __init init(void)
> -- 
> 1.7.7.3
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: (resend) [PATCH v4 12/12] virtio: balloon: Add freeze, restore handlers to support S4

2011-12-07 Thread Michael S. Tsirkin
On Wed, Dec 07, 2011 at 01:18:50AM +0530, Amit Shah wrote:
> Now to not race with a host issuing ballooning requests while we are in
> the process of freezing, we just exit from the vballoon kthread when the
> processes are asked to freeze.  Upon thaw and restore, we re-start the
> thread.

...

> ---
>  drivers/virtio/virtio_balloon.c |   79 
> ++-
>  1 files changed, 78 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index 8bf99be..10ec638 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -258,7 +258,13 @@ static int balloon(void *_vballoon)
>   while (!kthread_should_stop()) {
>   s64 diff;
>  
> - try_to_freeze();
> + /*
> +  * On suspend, we want to exit this thread.  We will
> +  * start a new thread on resume.
> +  */
> + if (freezing(current))
> + break;
> +
>   wait_event_interruptible(vb->config_change,
>(diff = towards_target(vb)) != 0
>|| vb->need_stats_update

...

Note: this relies on kthreads being frozen before devices.
Looking at kernel/power/hibernate.c this is the case,
but I think we should add a comment to note this.

Also Cc linux-pm crowd in case I got it wrong.


---

Resending due to corrupted headers. Sorry about the noise.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v4 09/12] virtio: net: Add freeze, restore handlers to support S4

2011-12-07 Thread Michael S. Tsirkin
On Wed, Dec 07, 2011 at 01:18:47AM +0530, Amit Shah wrote:
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 697a0fc..1378f3c 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1151,6 +1151,38 @@ static void __devexit virtnet_remove(struct 
> virtio_device *vdev)
>   free_netdev(vi->dev);
>  }
>  
> +#ifdef CONFIG_PM
> +static int virtnet_freeze(struct virtio_device *vdev)
> +{
> + struct virtnet_info *vi = vdev->priv;
> +
> + netif_device_detach(vi->dev);
> + if (netif_running(vi->dev))
> + napi_disable(&vi->napi);
> +

Could refill_work still be running at this point?
If yes, it can re-enable napi and cause other kinds of trouble.

> + remove_vq_common(vi);
> +
> + return 0;
> +}
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v4 01/12] virtio: pci: switch to new PM API

2011-12-07 Thread Rafael J. Wysocki
On Wednesday, December 07, 2011, Amit Shah wrote:
> On (Wed) 07 Dec 2011 [10:48:24], Rafael J. Wysocki wrote:
> > On Wednesday, December 07, 2011, Amit Shah wrote:
> > > Hi Rafael,
> > > 
> > > On (Tue) 06 Dec 2011 [23:12:36], Rafael J. Wysocki wrote:
> > > > Hi,
> > > > 
> > > > On Tuesday, December 06, 2011, Amit Shah wrote:
> > > > > The older PM API doesn't have a way to get notifications on hibernate
> > > > > events.  Switch to the newer one that gives us those notifications.
> > > > > 
> > > > > Signed-off-by: Amit Shah 
> > > > > ---
> > > > >  drivers/virtio/virtio_pci.c |   16 
> > > > >  1 files changed, 12 insertions(+), 4 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
> > > > > index 03d1984..23e1532 100644
> > > > > --- a/drivers/virtio/virtio_pci.c
> > > > > +++ b/drivers/virtio/virtio_pci.c
> > > > > @@ -708,19 +708,28 @@ static void __devexit virtio_pci_remove(struct 
> > > > > pci_dev *pci_dev)
> > > > >  }
> > > > >  
> > > > >  #ifdef CONFIG_PM
> > > > > -static int virtio_pci_suspend(struct pci_dev *pci_dev, pm_message_t 
> > > > > state)
> > > > > +static int virtio_pci_suspend(struct device *dev)
> > > > >  {
> > > > > + struct pci_dev *pci_dev = to_pci_dev(dev);
> > > > > +
> > > > >   pci_save_state(pci_dev);
> > > > >   pci_set_power_state(pci_dev, PCI_D3hot);
> > > > >   return 0;
> > > > >  }
> > > > >  
> > > > > -static int virtio_pci_resume(struct pci_dev *pci_dev)
> > > > > +static int virtio_pci_resume(struct device *dev)
> > > > >  {
> > > > > + struct pci_dev *pci_dev = to_pci_dev(dev);
> > > > > +
> > > > >   pci_restore_state(pci_dev);
> > > > >   pci_set_power_state(pci_dev, PCI_D0);
> > > > >   return 0;
> > > > >  }
> > > > > +
> > > > > +static const struct dev_pm_ops virtio_pci_pm_ops = {
> > > > > + .suspend = virtio_pci_suspend,
> > > > > + .resume  = virtio_pci_resume,
> > > > > +};
> > > > >  #endif
> > > > 
> > > > You seem to have forgotten about hibernation callbacks.
> > > 
> > > This patch just moves to the new API keeping everything else the same.
> > > The hibernation callbacks come in patch 2.
> > 
> > OK, but then hibernation will be broken between the two patches, so perhaps
> > it's better to merge them into one?
> 
> The hibernation support doesn't exist now.  It's added in the later
> patches of the series (3 onwards).

OK then, sorry for the noise.

Rafael
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v4 01/12] virtio: pci: switch to new PM API

2011-12-07 Thread Amit Shah
On (Wed) 07 Dec 2011 [10:48:24], Rafael J. Wysocki wrote:
> On Wednesday, December 07, 2011, Amit Shah wrote:
> > Hi Rafael,
> > 
> > On (Tue) 06 Dec 2011 [23:12:36], Rafael J. Wysocki wrote:
> > > Hi,
> > > 
> > > On Tuesday, December 06, 2011, Amit Shah wrote:
> > > > The older PM API doesn't have a way to get notifications on hibernate
> > > > events.  Switch to the newer one that gives us those notifications.
> > > > 
> > > > Signed-off-by: Amit Shah 
> > > > ---
> > > >  drivers/virtio/virtio_pci.c |   16 
> > > >  1 files changed, 12 insertions(+), 4 deletions(-)
> > > > 
> > > > diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
> > > > index 03d1984..23e1532 100644
> > > > --- a/drivers/virtio/virtio_pci.c
> > > > +++ b/drivers/virtio/virtio_pci.c
> > > > @@ -708,19 +708,28 @@ static void __devexit virtio_pci_remove(struct 
> > > > pci_dev *pci_dev)
> > > >  }
> > > >  
> > > >  #ifdef CONFIG_PM
> > > > -static int virtio_pci_suspend(struct pci_dev *pci_dev, pm_message_t 
> > > > state)
> > > > +static int virtio_pci_suspend(struct device *dev)
> > > >  {
> > > > +   struct pci_dev *pci_dev = to_pci_dev(dev);
> > > > +
> > > > pci_save_state(pci_dev);
> > > > pci_set_power_state(pci_dev, PCI_D3hot);
> > > > return 0;
> > > >  }
> > > >  
> > > > -static int virtio_pci_resume(struct pci_dev *pci_dev)
> > > > +static int virtio_pci_resume(struct device *dev)
> > > >  {
> > > > +   struct pci_dev *pci_dev = to_pci_dev(dev);
> > > > +
> > > > pci_restore_state(pci_dev);
> > > > pci_set_power_state(pci_dev, PCI_D0);
> > > > return 0;
> > > >  }
> > > > +
> > > > +static const struct dev_pm_ops virtio_pci_pm_ops = {
> > > > +   .suspend = virtio_pci_suspend,
> > > > +   .resume  = virtio_pci_resume,
> > > > +};
> > > >  #endif
> > > 
> > > You seem to have forgotten about hibernation callbacks.
> > 
> > This patch just moves to the new API keeping everything else the same.
> > The hibernation callbacks come in patch 2.
> 
> OK, but then hibernation will be broken between the two patches, so perhaps
> it's better to merge them into one?

The hibernation support doesn't exist now.  It's added in the later
patches of the series (3 onwards).

Amit
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v4 01/12] virtio: pci: switch to new PM API

2011-12-07 Thread Rafael J. Wysocki
On Wednesday, December 07, 2011, Amit Shah wrote:
> Hi Rafael,
> 
> On (Tue) 06 Dec 2011 [23:12:36], Rafael J. Wysocki wrote:
> > Hi,
> > 
> > On Tuesday, December 06, 2011, Amit Shah wrote:
> > > The older PM API doesn't have a way to get notifications on hibernate
> > > events.  Switch to the newer one that gives us those notifications.
> > > 
> > > Signed-off-by: Amit Shah 
> > > ---
> > >  drivers/virtio/virtio_pci.c |   16 
> > >  1 files changed, 12 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
> > > index 03d1984..23e1532 100644
> > > --- a/drivers/virtio/virtio_pci.c
> > > +++ b/drivers/virtio/virtio_pci.c
> > > @@ -708,19 +708,28 @@ static void __devexit virtio_pci_remove(struct 
> > > pci_dev *pci_dev)
> > >  }
> > >  
> > >  #ifdef CONFIG_PM
> > > -static int virtio_pci_suspend(struct pci_dev *pci_dev, pm_message_t 
> > > state)
> > > +static int virtio_pci_suspend(struct device *dev)
> > >  {
> > > + struct pci_dev *pci_dev = to_pci_dev(dev);
> > > +
> > >   pci_save_state(pci_dev);
> > >   pci_set_power_state(pci_dev, PCI_D3hot);
> > >   return 0;
> > >  }
> > >  
> > > -static int virtio_pci_resume(struct pci_dev *pci_dev)
> > > +static int virtio_pci_resume(struct device *dev)
> > >  {
> > > + struct pci_dev *pci_dev = to_pci_dev(dev);
> > > +
> > >   pci_restore_state(pci_dev);
> > >   pci_set_power_state(pci_dev, PCI_D0);
> > >   return 0;
> > >  }
> > > +
> > > +static const struct dev_pm_ops virtio_pci_pm_ops = {
> > > + .suspend = virtio_pci_suspend,
> > > + .resume  = virtio_pci_resume,
> > > +};
> > >  #endif
> > 
> > You seem to have forgotten about hibernation callbacks.
> 
> This patch just moves to the new API keeping everything else the same.
> The hibernation callbacks come in patch 2.

OK, but then hibernation will be broken between the two patches, so perhaps
it's better to merge them into one?

> >  Please use
> > one of the macros defined in include/linux/pm.h if you want to use the same
> > callback routines for hibernation.
> 
> No, they're different functions, so I don't use the macros.

OK
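
For the archive, a rough sketch of what a table with separate hibernation
callbacks could look like; the freeze/thaw/restore handler names here are
hypothetical, since patch 2 isn't quoted in this thread:

	static const struct dev_pm_ops virtio_pci_pm_ops = {
		.suspend = virtio_pci_suspend,
		.resume  = virtio_pci_resume,
		.freeze  = virtio_pci_freeze,
		.thaw    = virtio_pci_thaw,
		.restore = virtio_pci_restore,
	};

With different functions for each phase, helper macros such as
SIMPLE_DEV_PM_OPS (which reuse one suspend/resume pair for all the
callbacks) indeed would not apply.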

Thanks,
Rafael
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [net-next RFC PATCH 5/5] virtio-net: flow director support

2011-12-07 Thread Stefan Hajnoczi
On Wed, Dec 7, 2011 at 3:03 AM, Jason Wang  wrote:
> On 12/06/2011 09:15 PM, Stefan Hajnoczi wrote:
>>
>> On Tue, Dec 6, 2011 at 10:21 AM, Jason Wang  wrote:
>>>
>>> On 12/06/2011 05:18 PM, Stefan Hajnoczi wrote:

 On Tue, Dec 6, 2011 at 6:33 AM, Jason Wang
  wrote:
>
> On 12/05/2011 06:55 PM, Stefan Hajnoczi wrote:
>>
>> On Mon, Dec 5, 2011 at 8:59 AM, Jason Wang
>>  wrote:

 The vcpus are just threads and may not be bound to physical CPUs, so
 what is the big picture here?  Is the guest even in the position to
 set the best queue mappings today?
>>>
>>>
>>> Not sure it could publish the best mapping, but the idea is to make sure
>>> the packets of a flow are handled by the same guest vcpu, and maybe the
>>> same vhost thread, in order to eliminate packet reordering and lock
>>> contention. But this assumption does not take into account the bouncing
>>> of vhost or vcpu threads, which would also affect the result.
>>
>> Okay, this is why I'd like to know what the big picture here is.  What
>> solution are you proposing?  How are we going to have everything from
>> guest application, guest kernel, host threads, and host NIC driver
>> play along so we get the right steering up the entire stack.  I think
>> there needs to be an answer to that before changing virtio-net to add
>> any steering mechanism.
>
>
> Considering the complexity of host nics, each with their own steering
> features, this series makes a first step with minimal effort to let the
> guest driver and host tap/macvtap co-operate the way a physical nic does.
> There may be other methods, but performance numbers are also needed to give
> the answer.

I agree that performance results for this need to be shown.

My original point is really that it's not a good idea to take
individual steps without a good big picture because this will change
the virtio-net device specification.  If this turns out to be a dead
end then hosts will need to continue to support the interface forever
(legacy guests could still try to use it).  So please first explain
what the full stack picture is going to look like and how you think it
will lead to better performance.  You don't need to have all the code
or evidence, but just enough explanation so we see where this is all
going.

Stefan
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization