[PATCH] virtio: harsher barriers for virtio-mmio.

2011-12-20 Thread Rusty Russell
We were cheating with our barriers; using the smp ones rather than the
real device ones.  That was fine, until virtio-mmio came along, which
could be talking to a real device (a non-SMP CPU).

Unfortunately, just putting back the real barriers (reverting
d57ed95d) causes a performance regression on virtio-pci.  In
particular, Amos reports netbench's TCP_RR over virtio_net CPU
utilization increased up to 35% while throughput went down by up to
14%.

By comparison, this branch is in the noise.

Reference: https://lkml.org/lkml/2011/12/11/22

Signed-off-by: Rusty Russell 
---
 drivers/lguest/lguest_device.c |   10 ++
 drivers/s390/kvm/kvm_virtio.c  |2 +-
 drivers/virtio/virtio_mmio.c   |7 ---
 drivers/virtio/virtio_pci.c|4 ++--
 drivers/virtio/virtio_ring.c   |   34 +-
 include/linux/virtio_ring.h|1 +
 tools/virtio/linux/virtio.h|1 +
 tools/virtio/virtio_test.c |3 ++-
 8 files changed, 38 insertions(+), 24 deletions(-)

diff --git a/drivers/lguest/lguest_device.c b/drivers/lguest/lguest_device.c
--- a/drivers/lguest/lguest_device.c
+++ b/drivers/lguest/lguest_device.c
@@ -291,11 +291,13 @@ static struct virtqueue *lg_find_vq(stru
}
 
/*
-* OK, tell virtio_ring.c to set up a virtqueue now we know its size
-* and we've got a pointer to its pages.
+* OK, tell virtio_ring.c to set up a virtqueue now we know its size
+* and we've got a pointer to its pages.  Note that we set weak_barriers
+* to 'true': the host is just a(nother) SMP CPU, so we only need inter-cpu
+* barriers.
 */
-   vq = vring_new_virtqueue(lvq->config.num, LGUEST_VRING_ALIGN,
-vdev, lvq->pages, lg_notify, callback, name);
+   vq = vring_new_virtqueue(lvq->config.num, LGUEST_VRING_ALIGN, vdev,
+true, lvq->pages, lg_notify, callback, name);
if (!vq) {
err = -ENOMEM;
goto unmap;
diff --git a/drivers/s390/kvm/kvm_virtio.c b/drivers/s390/kvm/kvm_virtio.c
--- a/drivers/s390/kvm/kvm_virtio.c
+++ b/drivers/s390/kvm/kvm_virtio.c
@@ -198,7 +198,7 @@ static struct virtqueue *kvm_find_vq(str
goto out;
 
vq = vring_new_virtqueue(config->num, KVM_S390_VIRTIO_RING_ALIGN,
-vdev, (void *) config->address,
+vdev, true, (void *) config->address,
 kvm_notify, callback, name);
if (!vq) {
err = -ENOMEM;
diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -309,9 +309,10 @@ static struct virtqueue *vm_setup_vq(str
writel(virt_to_phys(info->queue) >> PAGE_SHIFT,
vm_dev->base + VIRTIO_MMIO_QUEUE_PFN);
 
-   /* Create the vring */
-   vq = vring_new_virtqueue(info->num, VIRTIO_MMIO_VRING_ALIGN,
-vdev, info->queue, vm_notify, callback, name);
+   /* Create the vring: no weak barriers, since the other side could
+    * be an independent "device". */
+   vq = vring_new_virtqueue(info->num, VIRTIO_MMIO_VRING_ALIGN, vdev,
+false, info->queue, vm_notify, callback, name);
if (!vq) {
err = -ENOMEM;
goto error_new_virtqueue;
diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -414,8 +414,8 @@ static struct virtqueue *setup_vq(struct
  vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
 
/* create the vring */
-   vq = vring_new_virtqueue(info->num, VIRTIO_PCI_VRING_ALIGN,
-vdev, info->queue, vp_notify, callback, name);
+   vq = vring_new_virtqueue(info->num, VIRTIO_PCI_VRING_ALIGN, vdev,
+true, info->queue, vp_notify, callback, name);
if (!vq) {
err = -ENOMEM;
goto out_activate_queue;
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -28,17 +28,20 @@
 #ifdef CONFIG_SMP
 /* Where possible, use SMP barriers which are more lightweight than mandatory
  * barriers, because mandatory barriers control MMIO effects on accesses
- * through relaxed memory I/O windows (which virtio does not use). */
-#define virtio_mb() smp_mb()
-#define virtio_rmb() smp_rmb()
-#define virtio_wmb() smp_wmb()
+ * through relaxed memory I/O windows (which virtio-pci does not use). */
+#define virtio_mb(vq) \
+   do { if ((vq)->weak_barriers) smp_mb(); else mb(); } while(0)
+#define virtio_rmb(vq) \
+   do { if ((vq)->weak_barriers) smp_rmb(); else rmb(); } while(0)
+#define virtio_wmb(vq) \
+   do { if ((vq)->weak_barriers) smp_wmb(); else wmb(); } while(0)

Re: [PATCH RFC] virtio_net: fix refill related races

2011-12-20 Thread Rusty Russell
On Tue, 20 Dec 2011 21:45:19 +0200, "Michael S. Tsirkin"  
wrote:
> On Tue, Dec 20, 2011 at 11:31:54AM -0800, Tejun Heo wrote:
> > On Tue, Dec 20, 2011 at 09:30:55PM +0200, Michael S. Tsirkin wrote:
> > > Hmm, in that case it looks like a nasty race could get
> > > triggered, with try_fill_recv run on multiple CPUs in parallel,
> > > corrupting the linked list within the vq.
> > > 
> > > Using the mutex as my patch did will fix that naturally, as well.
> > 
> > Don't know the code but just use nrt wq.  There's even a system one
> > called system_nrt_wq.
> > 
> > Thanks.
> 
> We can, but we need the mutex for other reasons, anyway.

Well, here's the alternate approach.  What do you think?

Finding two wq issues makes you justifiably cautious, but it almost
feels like giving up to simply wrap it in a lock.  The APIs are designed
to let us do it without a lock; I was just using them wrong.

Two patches in one mail.  It's gauche, but it's an RFC only (untested).

Cheers,
Rusty.

virtio_net: set/cancel work on ndo_open/ndo_stop

Michael S. Tsirkin noticed that we could run the refill work after
ndo_close, which can re-enable napi - we don't disable it until
virtnet_remove.  This is clearly wrong, so move the workqueue control
to ndo_open and ndo_stop (aka. virtnet_open and virtnet_close).

One subtle point: virtnet_probe() could simply fail if it couldn't
allocate a receive buffer, but that's less polite in virtnet_open() so
we schedule a refill as we do in the normal receive path if we run out
of memory.

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -719,6 +719,10 @@ static int virtnet_open(struct net_devic
 {
struct virtnet_info *vi = netdev_priv(dev);
 
+   /* Make sure we have some buffers; if oom, use wq. */
+   if (!try_fill_recv(vi, GFP_KERNEL))
+   schedule_delayed_work(&vi->refill, 0);
+
virtnet_napi_enable(vi);
return 0;
 }
@@ -772,6 +776,8 @@ static int virtnet_close(struct net_devi
 {
struct virtnet_info *vi = netdev_priv(dev);
 
+   /* Make sure refill_work doesn't re-enable napi! */
+   cancel_delayed_work_sync(&vi->refill);
napi_disable(&vi->napi);
 
return 0;
@@ -1082,7 +1088,6 @@ static int virtnet_probe(struct virtio_d
 
 unregister:
unregister_netdev(dev);
-   cancel_delayed_work_sync(&vi->refill);
 free_vqs:
vdev->config->del_vqs(vdev);
 free_stats:
@@ -1121,9 +1126,7 @@ static void __devexit virtnet_remove(str
/* Stop all the virtqueues. */
vdev->config->reset(vdev);
 
-
unregister_netdev(vi->dev);
-   cancel_delayed_work_sync(&vi->refill);
 
/* Free unused buffers in both send and recv, if any. */
free_unused_bufs(vi);


virtio_net: use non-reentrant workqueue.

Michael S. Tsirkin also noticed that we could run the refill work
on multiple CPUs: if we kick off a refill on one CPU and then on another,
they would both manipulate the queue at the same time (they use
napi_disable to avoid racing against the receive handler itself).

Tejun points out that this is what the WQ_NON_REENTRANT flag is for,
and that there is a convenient system kthread we can use.
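
As an aside, a sketch only (not part of either patch; the queue name and
error handling are illustrative): a driver that wanted its own
non-reentrant queue instead of system_nrt_wq could do:

    /* One instance at a time across all CPUs, via WQ_NON_REENTRANT. */
    struct workqueue_struct *refill_wq;

    refill_wq = alloc_workqueue("virtnet_refill", WQ_NON_REENTRANT, 0);
    if (!refill_wq)
        return -ENOMEM;
    queue_delayed_work(refill_wq, &vi->refill, 0);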

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -501,7 +501,7 @@ static void refill_work(struct work_stru
/* In theory, this can happen: if we don't get any buffers in
 * we will *never* try to fill again. */
if (still_empty)
-   schedule_delayed_work(&vi->refill, HZ/2);
+   queue_delayed_work(system_nrt_wq, &vi->refill, HZ/2);
 }
 
 static int virtnet_poll(struct napi_struct *napi, int budget)
@@ -520,7 +520,7 @@ again:
 
if (vi->num < vi->max / 2) {
if (!try_fill_recv(vi, GFP_ATOMIC))
-   schedule_delayed_work(&vi->refill, 0);
+   queue_delayed_work(system_nrt_wq, &vi->refill, 0);
}
 
/* Out of packets? */
@@ -721,7 +721,7 @@ static int virtnet_open(struct net_devic
 
/* Make sure we have some buffers; if oom, use wq. */
if (!try_fill_recv(vi, GFP_KERNEL))
-   schedule_delayed_work(&vi->refill, 0);
+   queue_delayed_work(system_nrt_wq, &vi->refill, 0);
 
virtnet_napi_enable(vi);
return 0;


Re: [PATCH RFC] virtio_blk: fix config handler race

2011-12-20 Thread Rusty Russell
On Wed, 7 Dec 2011 17:39:02 +0200, "Michael S. Tsirkin"  wrote:
> Fix a theoretical race related to config work
> handler: a config interrupt might happen
> after we flush config work but before we
> reset the device. It will then cause the
> config work to run during or after reset.

Thanks for the reminder.  Apologies that this slipped through :(

> As a solution
> 1. flush after reset when we know there will be no more interrupts
> 2. add a flag to disable config access before reset

It's crude, but it works.  Applied.

If this happens again, do we want a "__virtio_device_disable" which
stops any interrupts from being delivered?  That would make this neater,
though the implementation might be a bit nasty.
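
If we did grow such a helper, it might look something like this sketch
(entirely hypothetical; the 'disable_intrs' transport op doesn't exist
today, and each transport would have to implement it):

    /* Stop the transport delivering config/vq interrupts. */
    static inline void __virtio_device_disable(struct virtio_device *vdev)
    {
        if (vdev->config->disable_intrs)
            vdev->config->disable_intrs(vdev);  /* hypothetical op */
    }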

Thanks,
Rusty.


Re: [RFC 7/11] virtio_pci: new, capability-aware driver.

2011-12-20 Thread Rusty Russell
On Tue, 20 Dec 2011 13:37:18 +0200, "Michael S. Tsirkin"  
wrote:
> On Mon, Dec 19, 2011 at 09:38:42PM +1030, Rusty Russell wrote:
> > On Mon, 19 Dec 2011 11:13:26 +0200, "Michael S. Tsirkin"  
> > wrote:
> > 
> > Actually, the host doesn't need anything, since it can always lock out
> > the guest while it updates the area.
> > It's the guest which can't do atomic updates.
> 
> There are 2 cases
> - updates by guest accesses by host
> - accesses by guest updates by host
> 
> Both are problematic because the guest accesses are split.
> Consider the example I gave at the beginning, with capacity read
> by the guest. The host cannot solve it without guest changes, right?

Indeed, my brain fart.  Let's pretend I didn't say that, and you didn't
have to explain it to me in baby words :)

> > > The counter can be in guest memory, right? So we don't pay extra
> > > exits.
> > 
> > Could be, but I'm not delighted about the design.
> 
> OK, so this is an argument for always using a control vq, right?

Yes.  The idea that we can alter fields in the device-specific config
area is flawed.  There may be cases where it doesn't matter, but as an
idea it was holed to begin with.

We can reduce probability by doing a double read to check, but there are
still cases where it will fail.
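
For concreteness, here's a minimal user-space sketch of the double read
combined with the seqlock-style generation counter mentioned earlier in
the thread (names are hypothetical, and real code would also need
memory barriers):

    #include <stdint.h>

    /* An odd generation means the other side is mid-update. */
    struct dev_config {
        volatile uint32_t generation;
        volatile uint64_t capacity;     /* reads may split in half */
    };

    static uint64_t read_capacity(struct dev_config *cfg)
    {
        uint32_t g1, g2;
        uint64_t cap;

        do {
            g1 = cfg->generation;
            cap = cfg->capacity;
            g2 = cfg->generation;
        } while ((g1 & 1) || g1 != g2); /* retry on torn read */

        return cap;
    }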

> > What does the host do
> > if the guest screws things up?  How long do you wait for them to
> > complete the seqlock?  Or does it save the old version for use in the
> > duration?
> 
> Yes, it will have to only apply the change when seqlock is dropped.

If the seqlock is in normal memory, how does it get notified?  It would
have to poll.  That's annoying, since you don't know when to give up and
declare the device terminally broken.

> Don't have == not reported as observed in the field?
> It seems clear from code that we do have a race, correct?

Yes, and yes.

Cheers,
Rusty.


Call for Workshops at IEEE eScience 2012

2011-12-20 Thread Ioan Raicu

CALL FOR WORKSHOPS

8th IEEE International Conference on eScience
http://www.ci.uchicago.edu/escience2012/
October 8-12, 2012
Chicago, IL, USA

The 8th IEEE eScience conference (e-Science 2012), sponsored by the IEEE 
Computer Society's Technical
Committee for Scalable Computing (TCSC), will be held in Chicago, Illinois, from
8-12th October 2012.
The eScience 2012 conference is designed to bring together leading
international and interdisciplinary
research communities, developers, and users of eScience applications and 
enabling IT technologies.

Multiple e-Science 2012 Workshops will be held on Monday and Tuesday, 8th and 
9th October, co-
located with the main conference.

Workshops are an important part of the conference in providing opportunity for 
researchers to present
their work in a more focused way than the conference itself and to have 
discussion of particular topics
of interest to the community. We cordially invite you to submit workshop 
proposals on any eScience
related topic to the Workshop Chair.

To make their purpose and scope clear, workshop proposals
should include:
• A description of the workshop, its focus, goals, and outcome
• A draft call for papers
• Names and affiliations of the organizers and tentative composition of 
the committees
• Expected numbers of submissions and accepted papers
• Prior history of this workshop, if any. Please include: number of 
submissions, number of
accepted papers, and attendee count.

Workshop organizers are responsible for establishing a program committee, 
collecting and evaluating
submissions, notifying authors of acceptance or rejection in due time, ensuring 
a transparent and fair
selection process, organizing selected papers into sessions, and assigning 
session chairs. Proposals will
be selected that show clear focus and objectives in emerging or developing
areas that are likely to generate significant interest in the community.

Once accepted, the workshop should establish its own paper submission system. 
For each paper
selected for publication, an author must be registered for eScience 2012. Each 
paper must be presented
in person by at least one of the authors. It is expected that the proceedings 
of the eScience 2012
workshops will be published by the IEEE Computer Society Press, USA and will be 
made available online
through the IEEE Digital Library.

SUBMISSION PROCESS

Workshop proposals should be emailed to escience2012-worksh...@fnal.gov

IMPORTANT DATES

Workshop submissions due: 23rd January 2012
Notification of workshop acceptance: 6th February 2012

While it is up to the workshop organizers to work with the authors of any
papers to be published from the workshop, note that information about these
papers will be needed by 27th August 2012, and final camera-ready papers are
needed by 17th September 2012.

Workshops: 8-9 October 2012

--
=
Ioan Raicu, Ph.D.
Assistant Professor, Illinois Institute of Technology (IIT)
Guest Research Faculty, Argonne National Laboratory (ANL)
=
Data-Intensive Distributed Systems Laboratory, CS/IIT
Distributed Systems Laboratory, MCS/ANL
=
Cell:   1-847-722-0876
Office: 1-312-567-5704
Email:  ira...@cs.iit.edu
Web:http://www.cs.iit.edu/~iraicu/
Web:http://datasys.cs.iit.edu/
=
=




CFP: 8th IEEE Int. Conf. on eScience 2012, Chicago IL USA

2011-12-20 Thread Ioan Raicu

CALL FOR PAPERS

8th IEEE International Conference on eScience
http://www.ci.uchicago.edu/escience2012/
October 8-12, 2012
Chicago, IL, USA

Researchers in all disciplines are increasingly adopting digital tools, 
techniques and practices, often in
communities and projects that span disciplines, laboratories, organizations, 
and national boundaries.
The eScience 2012 conference is designed to bring together leading 
international and interdisciplinary
research communities, developers, and users of eScience applications and 
enabling IT technologies. The
conference serves as a forum to present the results of the latest applications 
research and product/tool
developments and to highlight related activities from around the world. Also, 
we are now entering the
second decade of eScience and the 2012 conference gives an opportunity to take 
stock of what has
been achieved so far and look forward to the challenges and opportunities the 
next decade will bring.

A special emphasis of the 2012 conference is on advances in the application of 
technology in a
particular discipline. Accordingly, significant advances in applications 
science and technology will be
considered as important as the development of new technologies themselves. 
Further, we welcome
contributions in educational activities under any of these disciplines.

As a result, the conference will be structured around two e-Science tracks:

• eScience Algorithms and Applications
  • eScience application areas, including:
    • Physical sciences
    • Biomedical sciences
    • Social sciences and humanities
  • Data-oriented approaches and applications
  • Compute-oriented approaches and applications
  • Extreme scale approaches and applications
• Cyberinfrastructure to support eScience
  • Novel hardware
  • Novel uses of production infrastructure
  • Software and services
  • Tools
The conference proceedings will be published by the IEEE Computer Society 
Press, USA and will be made
available online through the IEEE Digital Library. Authors of selected papers
will be invited to submit extended versions to a special issue of the
Future Generation Computer Systems (FGCS) journal.

SUBMISSION PROCESS
Authors are invited to submit papers with unpublished, original work of not 
more than 8 pages of
double column text using single spaced 10 point size on 8.5 x 11 inch pages, as 
per IEEE 8.5 x 11
manuscript guidelines. (Up to 2 additional pages may be purchased for 
US$150/page)

Templates are available
from 
http://www.ieee.org/conferences_events/conferences/publishing/templates.html.

Authors should submit a PDF file that will print on a PostScript printer
to https://www.easychair.org/conferences/?conf=escience2012

(Note that paper submitters also must submit an abstract in advance of the 
paper deadline. This should
be done through the same site where papers are submitted.)

It is a requirement that at least one author of each accepted paper attend the 
conference.

IMPORTANT DATES

Abstract submission (required): 4 July 2012
Paper submission: 11 July 2012
Paper author notification: 22 August 2012
Camera-ready papers due: 10 September 2012
Conference: 8-12 October 2012

CONFERENCE ORGANIZATION

General Chair
• Ian Foster, University of Chicago & Argonne National Laboratory, USA
Program Co-Chairs
• Daniel S. Katz, University of Chicago & Argonne National Laboratory,
USA
• Heinz Stockinger, SIB Swiss Institute of Bioinformatics, Switzerland
Program Vice Co-Chairs
• eScience Algorithms and Applications Track
• David Abramson, Monash University, Australia
• Gabrielle Allen, Louisiana State University, USA
• Cyberinfrastructure to support eScience Track
• Rosa M. Badia, Barcelona Supercomputing Center / CSIC, Spain
• Geoffrey Fox, Indiana University, USA
Early Results and Works-in-Progress Posters Chair
• Roger Barga, Microsoft, USA
Workshops Chair
• Ruth Pordes, FNAL, USA
Sponsorship Chair
• Charlie Catlett, Argonne National Laboratory, USA
Conference Manager and Finance Chair
• Julie Wulf-Knoerzer, University of Chicago & Argonne National
Laboratory, USA
Publicity Chairs
• Kento Aida, National Institute of Informatics, Japan
• Ioan Raicu, Illinois Institute of Technology, USA
• David Wallom, Oxford e-Research Centre, UK
Local Organizing Committee
• Ninfa Mayorga, University of Chicago, USA
• Evelyn Rayburn, University of Chicago, USA
• Lynn Valentini, Argonne National Laboratory, USA
Program Committee
• eScience Algorithms and Applications Track
• Srinivas Aluru, Iowa State University, USA
• Ashiq Anjum, University of Derby, UK
• David A. Bade

[PATCH] virtio_net: fix refill related races

2011-12-20 Thread Michael S. Tsirkin
Fix theoretical races related to refill work:
1. After napi is disabled by ndo_stop, refill work
   can run and re-enable it.
2. Refill can get scheduled on many CPUs in parallel;
   if this happens it will corrupt the vq linked list,
   as there's no locking.
3. Refill work is cancelled after unregister netdev.
   For small bufs this means it can alloc an skb
   for a device which is unregistered, which is
   not a good idea.

As a solution, add a flag to track napi state,
toggle it on start/stop, and check it on refill.
Add a mutex to protect the flag, as well as to
serialize refills.

Move refill cancellation to before unregister.

The refill work structure and the new fields aren't
used on the data path, so put them together near the
end of struct virtnet_info.

TODO: consider using system_nrt_wq as suggested by
Tejun Heo. Probably a good idea, but not a must for
correctness.

Lightly tested.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/net/virtio_net.c |   34 +++---
 1 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 6ee8410..452f186 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -68,15 +69,21 @@ struct virtnet_info {
/* Active statistics */
struct virtnet_stats __percpu *stats;
 
-   /* Work struct for refilling if we run low on memory. */
-   struct delayed_work refill;
-
/* Chain pages by the private ptr. */
struct page *pages;
 
/* fragments + linear part + virtio header */
struct scatterlist rx_sg[MAX_SKB_FRAGS + 2];
struct scatterlist tx_sg[MAX_SKB_FRAGS + 2];
+
+   /* Work struct for refilling if we run low on memory. */
+   struct delayed_work refill;
+
+   /* Whether napi is enabled; protected by refill_lock. */
+   bool napi_enable;
+
+   /* Lock to protect refill and napi enable/disable operations. */
+   struct mutex refill_lock;
 };
 
 struct skb_vnet_hdr {
@@ -494,14 +501,20 @@ static void refill_work(struct work_struct *work)
bool still_empty;
 
vi = container_of(work, struct virtnet_info, refill.work);
-   napi_disable(&vi->napi);
+
+   mutex_lock(&vi->refill_lock);
+   if (vi->napi_enable)
+   napi_disable(&vi->napi);
still_empty = !try_fill_recv(vi, GFP_KERNEL);
-   virtnet_napi_enable(vi);
+   if (vi->napi_enable)
+   virtnet_napi_enable(vi);
 
/* In theory, this can happen: if we don't get any buffers in
 * we will *never* try to fill again. */
if (still_empty)
schedule_delayed_work(&vi->refill, HZ/2);
+
+   mutex_unlock(&vi->refill_lock);
 }
 
 static int virtnet_poll(struct napi_struct *napi, int budget)
@@ -719,7 +732,10 @@ static int virtnet_open(struct net_device *dev)
 {
struct virtnet_info *vi = netdev_priv(dev);
 
+   mutex_lock(&vi->refill_lock);
+   vi->napi_enable = true;
virtnet_napi_enable(vi);
+   mutex_unlock(&vi->refill_lock);
return 0;
 }
 
@@ -772,7 +788,10 @@ static int virtnet_close(struct net_device *dev)
 {
struct virtnet_info *vi = netdev_priv(dev);
 
+   mutex_lock(&vi->refill_lock);
+   vi->napi_enable = false;
napi_disable(&vi->napi);
+   mutex_unlock(&vi->refill_lock);
 
return 0;
 }
@@ -1021,6 +1040,7 @@ static int virtnet_probe(struct virtio_device *vdev)
if (vi->stats == NULL)
goto free;
 
+   mutex_init(&vi->refill_lock);
INIT_DELAYED_WORK(&vi->refill, refill_work);
sg_init_table(vi->rx_sg, ARRAY_SIZE(vi->rx_sg));
sg_init_table(vi->tx_sg, ARRAY_SIZE(vi->tx_sg));
@@ -1081,8 +1101,8 @@ static int virtnet_probe(struct virtio_device *vdev)
return 0;
 
 unregister:
-   unregister_netdev(dev);
cancel_delayed_work_sync(&vi->refill);
+   unregister_netdev(dev);
 free_vqs:
vdev->config->del_vqs(vdev);
 free_stats:
@@ -1121,9 +1141,9 @@ static void __devexit virtnet_remove(struct virtio_device 
*vdev)
/* Stop all the virtqueues. */
vdev->config->reset(vdev);
 
+   cancel_delayed_work_sync(&vi->refill);
 
unregister_netdev(vi->dev);
-   cancel_delayed_work_sync(&vi->refill);
 
/* Free unused buffers in both send and recv, if any. */
free_unused_bufs(vi);
-- 
1.7.5.53.gc233e
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] virtio_net: fix refill related races

2011-12-20 Thread Michael S. Tsirkin
On Tue, Dec 20, 2011 at 11:31:54AM -0800, Tejun Heo wrote:
> On Tue, Dec 20, 2011 at 09:30:55PM +0200, Michael S. Tsirkin wrote:
> > Hmm, in that case it looks like a nasty race could get
> > triggered, with try_fill_recv run on multiple CPUs in parallel,
> > corrupting the linked list within the vq.
> > 
> > Using the mutex as my patch did will fix that naturally, as well.
> 
> Don't know the code but just use nrt wq.  There's even a system one
> called system_nrt_wq.
> 
> Thanks.

We can, but we need the mutex for other reasons, anyway.

> -- 
> tejun


Re: [PATCH RFC] virtio_blk: fix config handler race

2011-12-20 Thread Michael S. Tsirkin
On Wed, Dec 07, 2011 at 05:39:02PM +0200, Michael S. Tsirkin wrote:
> Fix a theoretical race related to config work
> handler: a config interrupt might happen
> after we flush config work but before we
> reset the device. It will then cause the
> config work to run during or after reset.
> 
> Two problems with this:
> - if this runs after device is gone we will get use after free
> - access of config while reset is in progress is racy
> (as layout is changing).
> 
> As a solution
> 1. flush after reset when we know there will be no more interrupts
> 2. add a flag to disable config access before reset
> 
> Signed-off-by: Michael S. Tsirkin 
> ---
> 
> RFC only as it's untested.
> Bugfix so 3.2 material? Comments?

Works fine for me. Comments?

> 
>  drivers/block/virtio_blk.c |   22 +-
>  1 files changed, 21 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 4d0b70a..34633f3 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -4,6 +4,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -36,6 +37,12 @@ struct virtio_blk
>   /* Process context for config space updates */
>   struct work_struct config_work;
>  
> + /* Lock for config space updates */
> + struct mutex config_lock;
> +
> + /* enable config space updates */
> + bool config_enable;
> +
>   /* What host tells us, plus 2 for header & tailer. */
>   unsigned int sg_elems;
>  
> @@ -318,6 +325,10 @@ static void virtblk_config_changed_work(struct 
> work_struct *work)
>   char cap_str_2[10], cap_str_10[10];
>   u64 capacity, size;
>  
> + mutex_lock(&vblk->config_lock);
> + if (!vblk->config_enable)
> + goto done;
> +
>   /* Host must always specify the capacity. */
>   vdev->config->get(vdev, offsetof(struct virtio_blk_config, capacity),
> &capacity, sizeof(capacity));
> @@ -340,6 +351,8 @@ static void virtblk_config_changed_work(struct 
> work_struct *work)
> cap_str_10, cap_str_2);
>  
>   set_capacity(vblk->disk, capacity);
> +done:
> + mutex_unlock(&vblk->config_lock);
>  }
>  
>  static void virtblk_config_changed(struct virtio_device *vdev)
> @@ -388,7 +401,9 @@ static int __devinit virtblk_probe(struct virtio_device 
> *vdev)
>   vblk->vdev = vdev;
>   vblk->sg_elems = sg_elems;
>   sg_init_table(vblk->sg, vblk->sg_elems);
> + mutex_init(&vblk->config_lock);
>   INIT_WORK(&vblk->config_work, virtblk_config_changed_work);
> + vblk->config_enable = true;
>  
>   /* We expect one virtqueue, for output. */
>   vblk->vq = virtio_find_single_vq(vdev, blk_done, "requests");
> @@ -542,7 +557,10 @@ static void __devexit virtblk_remove(struct 
> virtio_device *vdev)
>   struct virtio_blk *vblk = vdev->priv;
>   int index = vblk->index;
>  
> - flush_work(&vblk->config_work);
> + /* Prevent config work handler from accessing the device. */
> + mutex_lock(&vblk->config_lock);
> + vblk->config_enable = false;
> + mutex_unlock(&vblk->config_lock);
>  
>   /* Nothing should be pending. */
>   BUG_ON(!list_empty(&vblk->reqs));
> @@ -550,6 +568,8 @@ static void __devexit virtblk_remove(struct virtio_device 
> *vdev)
>   /* Stop all the virtqueues. */
>   vdev->config->reset(vdev);
>  
> + flush_work(&vblk->config_work);
> +
>   del_gendisk(vblk->disk);
>   blk_cleanup_queue(vblk->disk->queue);
>   put_disk(vblk->disk);
> -- 
> 1.7.5.53.gc233e


Re: [PATCH RFC] virtio_net: fix refill related races

2011-12-20 Thread Tejun Heo
On Tue, Dec 20, 2011 at 09:30:55PM +0200, Michael S. Tsirkin wrote:
> Hmm, in that case it looks like a nasty race could get
> triggered, with try_fill_recv run on multiple CPUs in parallel,
> corrupting the linked list within the vq.
> 
> Using the mutex as my patch did will fix that naturally, as well.

Don't know the code but just use nrt wq.  There's even a system one
called system_nrt_wq.

Thanks.

-- 
tejun


Re: [PATCH RFC] virtio_net: fix refill related races

2011-12-20 Thread Michael S. Tsirkin
On Tue, Dec 20, 2011 at 11:09:46AM -0800, Tejun Heo wrote:
> Hello, Michael.
> 
> On Tue, Dec 20, 2011 at 09:09:08PM +0200, Michael S. Tsirkin wrote:
> > Another question, wanted to make sure:
> > virtnet_poll does schedule_delayed_work(&vi->refill, 0);
> > separately refill work itself also does
> > schedule_delayed_work(&vi->refill, HZ/2);
> > If two such events happen twice, on different CPUs, we are still guaranteed
> > the work will only run once, right?
> 
> No, it's not.  Normal workqueues only guarantee non-reentrance on
> local CPU.  If you want to guarantee that only one instance of a given
> item is executing across all CPUs, you need to use the nrt workqueue.
> 
> Thanks.

Hmm, in that case it looks like a nasty race could get
triggered, with try_fill_recv run on multiple CPUs in parallel,
corrupting the linked list within the vq.

Using the mutex as my patch did will fix that naturally, as well.

Rusty, am I missing something?

> -- 
> tejun


Re: [PATCH RFC] virtio_net: fix refill related races

2011-12-20 Thread Tejun Heo
Hello, Michael.

On Tue, Dec 20, 2011 at 09:09:08PM +0200, Michael S. Tsirkin wrote:
> Another question, wanted to make sure:
> virtnet_poll does schedule_delayed_work(&vi->refill, 0);
> separately refill work itself also does
> schedule_delayed_work(&vi->refill, HZ/2);
> If two such events happen twice, on different CPUs, we are still guaranteed
> the work will only run once, right?

No, it's not.  Normal workqueues only guarantee non-reentrance on
local CPU.  If you want to guarantee that only one instance of a given
item is executing across all CPUs, you need to use the nrt workqueue.

Thanks.

-- 
tejun


Re: [PATCH RFC] virtio_net: fix refill related races

2011-12-20 Thread Michael S. Tsirkin
On Tue, Dec 13, 2011 at 01:05:11PM +1030, Rusty Russell wrote:
> On Mon, 12 Dec 2011 13:54:06 +0200, "Michael S. Tsirkin"  
> wrote:
> > On Mon, Dec 12, 2011 at 09:25:07AM +1030, Rusty Russell wrote:
> > > Orthogonally, the refill-stop code is still buggy, as you noted.
> > 
> > Sorry I don't understand how it's still buggy.
> 
> Both places where we call:
> 
>   cancel_delayed_work_sync(&vi->refill);
> 
> Do not actually guarantee that vi->refill isn't running, because it
> can requeue itself.  A 'bool no_more_refill' field seems like the
> simplest fix for this, but I don't think it's sufficient.
> 
> Tejun, is this correct?  What's the correct way to synchronously stop a
> delayed_work which can "schedule_delayed_work(&vi->refill, HZ/2);" on
> itself?
> 
> Thanks,
> Rusty.

Another question, wanted to make sure:
virtnet_poll does schedule_delayed_work(&vi->refill, 0);
separately refill work itself also does
schedule_delayed_work(&vi->refill, HZ/2);
If two such events happen twice, on different CPUs, we are still guaranteed
the work will only run once, right?
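
A minimal sketch of the 'bool no_more_refill' idea quoted above (field
and helper names hypothetical): the work item re-checks the flag before
re-arming, so cancel_delayed_work_sync() cannot race with a re-queue.

    static void refill_work(struct work_struct *w)
    {
        struct virtnet_info *vi =
            container_of(w, struct virtnet_info, refill.work);
        bool still_empty;

        mutex_lock(&vi->refill_lock);
        still_empty = !try_fill_recv(vi, GFP_KERNEL);
        if (still_empty && !vi->no_more_refill)
            schedule_delayed_work(&vi->refill, HZ/2);
        mutex_unlock(&vi->refill_lock);
    }

    static void refill_stop(struct virtnet_info *vi)
    {
        mutex_lock(&vi->refill_lock);
        vi->no_more_refill = true;      /* forbid re-arming */
        mutex_unlock(&vi->refill_lock);
        cancel_delayed_work_sync(&vi->refill);
    }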



Re: [PATCH] macvtap: Fix macvtap_get_queue to use rxhash first

2011-12-20 Thread David Miller
From: "Michael S. Tsirkin" 
Date: Tue, 20 Dec 2011 13:15:12 +0200

> On Wed, Dec 07, 2011 at 01:52:35PM -0500, David Miller wrote:
>> Once you sort this out, reply with an Acked-by: for me, thanks.
> 
> Acked-by: Michael S. Tsirkin 

Applied.


Re: [RFC 7/11] virtio_pci: new, capability-aware driver.

2011-12-20 Thread Michael S. Tsirkin
On Mon, Dec 19, 2011 at 09:38:42PM +1030, Rusty Russell wrote:
> On Mon, 19 Dec 2011 11:13:26 +0200, "Michael S. Tsirkin"  
> wrote:
> > On Mon, Dec 19, 2011 at 04:36:38PM +1030, Rusty Russell wrote:
> > > On Sun, 18 Dec 2011 12:18:32 +0200, "Michael S. Tsirkin" 
> > >  wrote:
> > > > On Fri, Dec 16, 2011 at 12:20:08PM +1030, Rusty Russell wrote:
> > > > > Perhaps a new feature VIRTIO_F_UNSTABLE?  Which (unlike other 
> > > > > features)
> > > > > appears and vanishes around config writes by either side?  Kind of a
> > > > > hack though...
> > > > 
> > > > Not sure how this can work in such a setup: when would guest
> > > > check this bit to avoid races?
> > > > A separate registers also seems nicer than a flag.
> > > > 
> > > > Some other possible design choices:
> > > > - a flag to signal config accesses in progress by guest
> > > >   host would need to buffer changes and apply them in one go
> > > >   when flag is cleared
> > > > - a register to make host get/set config in guest memory
> > > > - use a control vq for all devices
> > > 
> > > - seqlock-style generation count register(s)?
> > >   Has the advantage of
> > >   being a noop if things never change.
> 
> Actually, the host doesn't need anything, since it can always lock out
> the guest while it updates the area.
> It's the guest which can't do atomic updates.

There are 2 cases
- updates by guest accesses by host
- accesses by guest updates by host

Both are problematic because the guest accesses are split.
Consider the example I gave at the beginning, with capacity read
by the guest. The host cannot solve it without guest changes, right?
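
To make the split concrete, a tiny sketch (layout hypothetical): a
64-bit capacity field read as two 32-bit accesses can return a torn
value if the host updates it between the two reads.

    #include <stdint.h>

    uint64_t read_capacity_split(volatile uint32_t *cfg)
    {
        uint32_t lo = cfg[0];   /* host may update capacity here... */
        uint32_t hi = cfg[1];   /* ...so hi and lo can disagree */
        return ((uint64_t)hi << 32) | lo;
    }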

> > The counter can be in guest memory, right? So we don't pay extra
> > exits.
> 
> Could be, but I'm not delighted about the design.

OK, so this is an argument for always using a control vq, right?

> What does the host do
> if the guest screws things up?  How long do you wait for them to
> complete the seqlock?  Or does it save the old version for use in the
> duration?

Yes, it will have to only apply the change when seqlock is dropped.

> And we don't have any atomic guest write problems that I know of.  We
> can solve it in future (by specifying a config queue).

Don't have == not reported as observed in the field?
It seems clear from code that we do have a race, correct?

> > > - continue to ignore it ;)
> > 
> > Since you decided on a config layout redesign it seems a good time to
> > fix architectural problems ...
> 
> Yes, indeed.
> 
> Cheers,
> Rusty.


Re: [PATCH] macvtap: Fix macvtap_get_queue to use rxhash first

2011-12-20 Thread Michael S. Tsirkin
On Wed, Dec 07, 2011 at 01:52:35PM -0500, David Miller wrote:
> From: "Michael S. Tsirkin" 
> Date: Wed, 7 Dec 2011 18:10:02 +0200
> 
> > On Fri, Nov 25, 2011 at 01:35:52AM -0500, David Miller wrote:
> >> From: Krishna Kumar2 
> >> Date: Fri, 25 Nov 2011 09:39:11 +0530
> >> 
> >> > Jason Wang  wrote on 11/25/2011 08:51:57 AM:
> >> >>
> >> >> My description is not clear again :(
> >> >> I mean the same vhost thead:
> >> >>
> >> >> vhost thread #0 transmits packets of flow A on processor M
> >> >> ...
> >> >> vhost thread #0 moves to another processor N and starts to transmit packets
> >> >> of flow A
> >> > 
> >> > Thanks for clarifying. Yes, binding vhosts to CPUs
> >> > makes the incoming packet go to the same vhost each
> >> > time. BTW, are you doing any binding and/or irqbalance
> >> > when you run your tests? I am not running either at
> >> > this time, but thought both might be useful.
> >> 
> >> So are we going with this patch or are we saying that vhost binding
> >> is a requirement?
> > 
> > OK we didn't come to a conclusion so I would be inclined
> > to merge this patch as is for 3.2, and revisit later.
> > One question though: do these changes affect userspace
> > in any way? For example, will this commit us to
> > ensure that a single flow gets a unique hash even
> > for strange configurations that transmit the same flow
> > from multiple cpus?
> 
> Once you sort this out, reply with an Acked-by: for me, thanks.

Acked-by: Michael S. Tsirkin 
