[PATCH 1/1] virtio_ccw: introduce device_lost in virtio_ccw_device

2014-04-25 Thread Heinz Graalfs
When a device is lost, the common I/O layer calls the notification
handler with CIO_GONE: In that event, flag device_lost as true.

In case the device had been flagged as lost when the remove/offline callbacks
are called, call the new virtio_break_device() function prior to invoking
device_unregister(). This avoids hangs of I/O triggered via the device
unregistration callbacks.

Signed-off-by: Heinz Graalfs 
Reviewed-by: Cornelia Huck 
---
 drivers/s390/kvm/virtio_ccw.c | 49 ---
 1 file changed, 37 insertions(+), 12 deletions(-)

diff --git a/drivers/s390/kvm/virtio_ccw.c b/drivers/s390/kvm/virtio_ccw.c
index 1e1fc67..d2c0b44 100644
--- a/drivers/s390/kvm/virtio_ccw.c
+++ b/drivers/s390/kvm/virtio_ccw.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -62,6 +63,7 @@ struct virtio_ccw_device {
struct vq_config_block *config_block;
bool is_thinint;
bool going_away;
+   bool device_lost;
void *airq_info;
 };
 
@@ -1010,11 +1012,14 @@ static void virtio_ccw_remove(struct ccw_device *cdev)
unsigned long flags;
struct virtio_ccw_device *vcdev = virtio_grab_drvdata(cdev);
 
-   if (vcdev && cdev->online)
+   if (vcdev && cdev->online) {
+   if (vcdev->device_lost)
+   virtio_break_device(&vcdev->vdev);
unregister_virtio_device(&vcdev->vdev);
-   spin_lock_irqsave(get_ccwdev_lock(cdev), flags);
-   dev_set_drvdata(&cdev->dev, NULL);
-   spin_unlock_irqrestore(get_ccwdev_lock(cdev), flags);
+   spin_lock_irqsave(get_ccwdev_lock(cdev), flags);
+   dev_set_drvdata(&cdev->dev, NULL);
+   spin_unlock_irqrestore(get_ccwdev_lock(cdev), flags);
+   }
cdev->handler = NULL;
 }
 
@@ -1023,12 +1028,14 @@ static int virtio_ccw_offline(struct ccw_device *cdev)
unsigned long flags;
struct virtio_ccw_device *vcdev = virtio_grab_drvdata(cdev);
 
-   if (vcdev) {
-   unregister_virtio_device(&vcdev->vdev);
-   spin_lock_irqsave(get_ccwdev_lock(cdev), flags);
-   dev_set_drvdata(&cdev->dev, NULL);
-   spin_unlock_irqrestore(get_ccwdev_lock(cdev), flags);
-   }
+   if (!vcdev)
+   return 0;
+   if (vcdev->device_lost)
+   virtio_break_device(&vcdev->vdev);
+   unregister_virtio_device(&vcdev->vdev);
+   spin_lock_irqsave(get_ccwdev_lock(cdev), flags);
+   dev_set_drvdata(&cdev->dev, NULL);
+   spin_unlock_irqrestore(get_ccwdev_lock(cdev), flags);
return 0;
 }
 
@@ -1096,8 +1103,26 @@ out_free:
 
 static int virtio_ccw_cio_notify(struct ccw_device *cdev, int event)
 {
-   /* TODO: Check whether we need special handling here. */
-   return 0;
+   int rc;
+   struct virtio_ccw_device *vcdev = dev_get_drvdata(&cdev->dev);
+
+   /*
+* Make sure vcdev is set
+* i.e. set_offline/remove callback not already running
+*/
+   if (!vcdev)
+   return NOTIFY_DONE;
+
+   switch (event) {
+   case CIO_GONE:
+   vcdev->device_lost = true;
+   rc = NOTIFY_DONE;
+   break;
+   default:
+   rc = NOTIFY_DONE;
+   break;
+   }
+   return rc;
 }
 
 static struct ccw_device_id virtio_ids[] = {
-- 
1.8.3.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 0/1] virtio_ccw: exploit virtio_break_device()

2014-04-25 Thread Heinz Graalfs
Rusty,

this patch exploits the new function virtio_break_device() as of your patch
set dated January 15th on linux-ker...@vger.kernel.org.

The patch avoids hang situations during device unregister, when a (block)
device with active IO is hot-unplugged.

Heinz Graalfs (1):
  virtio_ccw: introduce device_lost in virtio_ccw_device

 drivers/s390/kvm/virtio_ccw.c | 49 ---
 1 file changed, 37 insertions(+), 12 deletions(-)

-- 
1.8.3.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


virtio: virtio_break_device() to mark all virtqueues broken.

2014-04-02 Thread Heinz Graalfs

Hello Rusty,

the subject patch was part of your original patch series on January 
15th. When is that planned to go upstream?


Heinz

On 01/04/14 04:58, Rusty Russell wrote:

The following changes since commit 33807f4f0daec3b00565c2932d95f614f5833adf:

   Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6 (2014-03-11 
11:53:42 -0700)

are available in the git repository at:


   git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux.git 
tags/virtio-next-for-linus

for you to fetch changes up to fc4324b4597c4eb8907207e82f9a6acec84dd335:

   virtio-blk: base queue-depth on virtqueue ringsize or module param 
(2014-03-24 12:20:18 +1030)


Nothing exciting: virtio-blk users might see a bit of a boost from the
doubling of the default queue length though.

Cheers,
Rusty.


Alexander Gordeev (1):
   virtio: Use pci_enable_msix_exact() instead of pci_enable_msix()

Joel Stanley (3):
   tools/virtio: update internal copies of headers
   tools/virtio: fix missing kmemleak_ignore symbol
   tools/virtio: add a missing )

Randy Dunlap (1):
   MAINTAINERS: virtio-dev is subscribers only

Rusty Russell (8):
   virtio_balloon: don't softlockup on huge balloon changes.
   virtio_net: don't crash if virtqueue is broken.
   virtio_blk: don't crash, report error if virtqueue is broken.
   virtio_balloon: don't crash if virtqueue is broken.
   virtio-rng: don't crash if virtqueue is broken.
   virtio: fail adding buffer on broken queues.
   Revert a02bbb1ccfe8: MAINTAINERS: add virtio-dev ML for virtio
   virtio-blk: base queue-depth on virtqueue ringsize or module param

  MAINTAINERS |  3 ---
  drivers/block/virtio_blk.c  | 20 +---
  drivers/char/hw_random/virtio-rng.c |  3 +--
  drivers/net/virtio_net.c|  2 +-
  drivers/virtio/virtio_balloon.c | 14 +-
  drivers/virtio/virtio_pci.c |  6 ++
  drivers/virtio/virtio_ring.c| 12 +---
  tools/virtio/linux/kmemleak.h   |  3 +++
  tools/virtio/linux/virtio.h |  4 ++--
  tools/virtio/virtio_test.c  |  2 +-
  10 files changed, 45 insertions(+), 24 deletions(-)
  create mode 100644 tools/virtio/linux/kmemleak.h



___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v4 RFC 0/3] virtio: add 'device_lost' to virtio_device

2014-02-20 Thread Heinz Graalfs

On 20/02/14 09:03, Rusty Russell wrote:

Heinz Graalfs  writes:

On 29/01/14 07:31, Rusty Russell wrote:

Heinz Graalfs  writes:

On 23/01/14 05:51, Rusty Russell wrote:

Heinz Graalfs  writes:

Hi, here is my v4 patch-set update to the v3 RFC submitted on Nov 27th.


Hi Heinz,

   I didn't get a response on my 'break all the virtqueues' patch
series.  Could your System Z code work with this?

Rusty.




Sorry Rusty, I'm back as of today.

I applied your patch series and did some testing...

Removing a disk while reading from it mostly still ends up
in hangs as of below:


OK, we still have the problem of in-flight requests.

I think the correct answer is to drop all requests if the virtqueue
is broken:

   -blk_cleanup_queue(vblk->disk->queue);
   +if (virtqueue_is_broken(vblk->vq))
   +  /* Don't wait for completion, just drop queue. */
   +  blk_abandon_queue(vblk->disk->queue);

Rusty,

but blk_abandon_queue() would not solve the incomplete in-flight
requests, would it? I suppose it would avoid additional in-flight
requests similar to __blk_request_all() and passing -EIO.

Ending of asynchronous in-flight requests still cause other problems
in the host. Such problems should be handled/avoided there, I suppose.


The device is going away (or gone away!), so it shouldn't be completing
requests, right?


well, the device is gone and blk_cleanup_queue() should avoid synching 
data to disk. That is the approach of my original patch-set.


I'll try to find an alternative solution exploiting the new
virtio_break_device().

Heinz



If the device is actually broken, well, there's not much we can do.  We
could try to leak memory I suppose.

Cheers,
Rusty.



___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v4 RFC 0/3] virtio: add 'device_lost' to virtio_device

2014-02-18 Thread Heinz Graalfs


On 29/01/14 07:31, Rusty Russell wrote:

Heinz Graalfs  writes:

On 23/01/14 05:51, Rusty Russell wrote:

Heinz Graalfs  writes:

Hi, here is my v4 patch-set update to the v3 RFC submitted on Nov 27th.


Hi Heinz,

  I didn't get a response on my 'break all the virtqueues' patch
series.  Could your System Z code work with this?

Rusty.




Sorry Rusty, I'm back as of today.

I applied your patch series and did some testing...

Removing a disk while reading from it mostly still ends up
in hangs as of below:


OK, we still have the problem of in-flight requests.

I think the correct answer is to drop all requests if the virtqueue
is broken:

  - blk_cleanup_queue(vblk->disk->queue);
  + if (virtqueue_is_broken(vblk->vq))
  +  /* Don't wait for completion, just drop queue. */
  +  blk_abandon_queue(vblk->disk->queue);

Rusty,

but blk_abandon_queue() would not solve the incomplete in-flight
requests, would it? I suppose it would avoid additional in-flight
requests similar to __blk_request_all() and passing -EIO.

Ending of asynchronous in-flight requests still cause other problems
in the host. Such problems should be handled/avoided there, I suppose.

Heinz


  +  else
  + blk_cleanup_queue(vblk->disk->queue);
  +

Unfortunately blk_abandon_queue() doesn't exist.  Your previous patch
did nothing in that path, which I suspect may leak memory.  That may be
acceptable given that this Shouldn't Happen (often).

At this point, I ask Jens :)


waiting for requests (e.g. in-flight requests) in blk_cleanup_queue() that

Cheers,
Rusty.



PID: 13 TASK: 163f8000  CPU: 0   COMMAND: "kworker/u128:1"
   #0 [163f72e0] __schedule at 6aa22c
   #1 [163f7428] io_schedule at 6aab6c
   #2 [163f7448] sleep_on_page at 22cbb2
   #3 [163f7460] __wait_on_bit at 6ab394
   #4 [163f74b0] wait_on_page_bit at 22cef4
   #5 [163f7508] filemap_fdatawait_range at 22d0a6
   #6 [163f75e8] filemap_write_and_wait at 22de62
   #7 [163f7618] fsync_bdev at 2dc5d8
   #8 [163f7640] invalidate_partition at 407ba8
   #9 [163f7668] del_gendisk at 408a4c
#10 [163f76c0] virtblk_remove at 46f81e
#11 [163f76f8] virtio_dev_remove at 43d302
#12 [163f7730] __device_release_driver at 4604c4

waiting for requests (e.g. in-flight requests) in blk_cleanup_queue() that

#13 [163f7758] device_release_driver at 46057c
#14 [163f7780] bus_remove_device at 45ff74
#15 [163f77b0] device_del at 45cf54
#16 [163f77e8] device_unregister at 45d00e
#17 [163f7808] unregister_virtio_device at 43d5ba
#18 [163f7828] virtio_ccw_remove at 55156c
#19 [163f7850] ccw_device_remove at 4d7e22
#20 [163f78d8] __device_release_driver at 4604c4
#21 [163f7900] device_release_driver at 46057c
#22 [163f7928] bus_remove_device at 45ff74
#23 [163f7958] device_del at 45cf54
#24 [163f7990] ccw_device_unregister at 4d86a0
#25 [163f79b0] io_subchannel_remove at 4d8d1a
#26 [163f79e8] css_remove at 4d2856
#27 [163f7a08] __device_release_driver at 4604c4
#28 [163f7a30] device_release_driver at 46057c
#29 [163f7a58] bus_remove_device at 45ff74
#30 [163f7a88] device_del at 45cf54
#31 [163f7ac0] device_unregister at 45d00e
#32 [163f7ae0] css_sch_device_unregister at 4d29d4
#33 [163f7b08] io_subchannel_sch_event at 4daad6
#34 [163f7b80] css_evaluate_known_subchannel at 4d2cc0
#35 [163f7be0] slow_eval_known_fn at 4d3cea
#36 [163f7c10] bus_for_each_dev at 45ea56
#37 [163f7c50] for_each_subchannel_staged at 4d337e
#38 [163f7c98] css_slow_path_func at 4d3450
#39 [163f7cc0] process_one_work at 164ff4
#40 [163f7d60] worker_thread at 166500
#41 [163f7da8] kthread at 16e67c
#42 [163f7eb0] kernel_thread_starter at 6b0a5e

Removing a disk while writing to it now ends up mostly
with errors (which is new behavior and good).
However, the detached device is still listed under /dev, and a
subsequent umount ends up in a hang. Latter also occurred with
my approach, sometimes.

Sometimes everything ends up in QEMU crashes, which is, however, not
reproducible. I will investigate on this.

Heinz


When an active virtio block device is hot-unplugged from a KVM guest,
affected guest user applications are not aware of any errors that occur
due to the lost device. This patch-set adds code to avoid further request
queueing when a lost block device is detected, resulting in appropriate
error info. Additionally a potential hang situation can be avoided by not
waiting for requests (e.g. in-flight requests) in blk_cleanup_queue() that
will never complete.

On System z there exists no handshake mechanism between host and guest
when a device is hot-unplugged. The device is removed and no further I/O
is possible.

When an online channel device disappears on System z the kernel's CIO layer
informs the driver (virtio_ccw) about the lost device.

Here are some more error details:

For a particular block device virtio's request function virtblk_request()
is called by the block layer to queue reques

Re: [PATCH 0/3] tools/virtio: build fixes for virtio_test

2014-02-13 Thread Heinz Graalfs

On 13/02/14 05:40, Rusty Russell wrote:

"Michael S. Tsirkin"  writes:


On Tue, Feb 11, 2014 at 04:58:17PM +1030, Joel Stanley wrote:

Recent changes to drivers/virtio broke compilation for the tests in
tools/virtio. The following patches are build fixes for those changes, as well
as a fix for a typo that would have never built.

The changes were tested on my amd64 system against 3.14-rc2.


Ah, thanks a lot for posting this.
I had this fixed in my tree but forgot to send.
For the series:

Acked-by: Michael S. Tsirkin 

Rusty, will you pick this up?


Hmm, add added:

Fixes: 53c18c9906441 (virtio_test: verify if virtqueue_kick() succeeded)
Cc: Heinz Graalfs 

Heinz, you didn't compile test your patch.

Insert angry-Torvalds-face here,
Rusty.


Rusty, yes, I missed to compile the test module.

I'm sorry.

Heinz

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v4 RFC 0/3] virtio: add 'device_lost' to virtio_device

2014-01-28 Thread Heinz Graalfs

On 23/01/14 05:51, Rusty Russell wrote:

Heinz Graalfs  writes:

Hi, here is my v4 patch-set update to the v3 RFC submitted on Nov 27th.


Hi Heinz,

 I didn't get a response on my 'break all the virtqueues' patch
series.  Could your System Z code work with this?

Rusty.




Sorry Rusty, I'm back as of today.

I applied your patch series and did some testing...

Removing a disk while reading from it mostly still ends up
in hangs as of below:

PID: 13 TASK: 163f8000  CPU: 0   COMMAND: "kworker/u128:1"
 #0 [163f72e0] __schedule at 6aa22c
 #1 [163f7428] io_schedule at 6aab6c
 #2 [163f7448] sleep_on_page at 22cbb2
 #3 [163f7460] __wait_on_bit at 6ab394
 #4 [163f74b0] wait_on_page_bit at 22cef4
 #5 [163f7508] filemap_fdatawait_range at 22d0a6
 #6 [163f75e8] filemap_write_and_wait at 22de62
 #7 [163f7618] fsync_bdev at 2dc5d8
 #8 [163f7640] invalidate_partition at 407ba8
 #9 [163f7668] del_gendisk at 408a4c
#10 [163f76c0] virtblk_remove at 46f81e
#11 [163f76f8] virtio_dev_remove at 43d302
#12 [163f7730] __device_release_driver at 4604c4
#13 [163f7758] device_release_driver at 46057c
#14 [163f7780] bus_remove_device at 45ff74
#15 [163f77b0] device_del at 45cf54
#16 [163f77e8] device_unregister at 45d00e
#17 [163f7808] unregister_virtio_device at 43d5ba
#18 [163f7828] virtio_ccw_remove at 55156c
#19 [163f7850] ccw_device_remove at 4d7e22
#20 [163f78d8] __device_release_driver at 4604c4
#21 [163f7900] device_release_driver at 46057c
#22 [163f7928] bus_remove_device at 45ff74
#23 [163f7958] device_del at 45cf54
#24 [163f7990] ccw_device_unregister at 4d86a0
#25 [163f79b0] io_subchannel_remove at 4d8d1a
#26 [163f79e8] css_remove at 4d2856
#27 [163f7a08] __device_release_driver at 4604c4
#28 [163f7a30] device_release_driver at 46057c
#29 [163f7a58] bus_remove_device at 45ff74
#30 [163f7a88] device_del at 45cf54
#31 [163f7ac0] device_unregister at 45d00e
#32 [163f7ae0] css_sch_device_unregister at 4d29d4
#33 [163f7b08] io_subchannel_sch_event at 4daad6
#34 [163f7b80] css_evaluate_known_subchannel at 4d2cc0
#35 [163f7be0] slow_eval_known_fn at 4d3cea
#36 [163f7c10] bus_for_each_dev at 45ea56
#37 [163f7c50] for_each_subchannel_staged at 4d337e
#38 [163f7c98] css_slow_path_func at 4d3450
#39 [163f7cc0] process_one_work at 164ff4
#40 [163f7d60] worker_thread at 166500
#41 [163f7da8] kthread at 16e67c
#42 [163f7eb0] kernel_thread_starter at 6b0a5e

Removing a disk while writing to it now ends up mostly
with errors (which is new behavior and good).
However, the detached device is still listed under /dev, and a 
subsequent umount ends up in a hang. Latter also occurred with

my approach, sometimes.

Sometimes everything ends up in QEMU crashes, which is, however, not 
reproducible. I will investigate on this.


Heinz


When an active virtio block device is hot-unplugged from a KVM guest,
affected guest user applications are not aware of any errors that occur
due to the lost device. This patch-set adds code to avoid further request
queueing when a lost block device is detected, resulting in appropriate
error info. Additionally a potential hang situation can be avoided by not
waiting for requests (e.g. in-flight requests) in blk_cleanup_queue() that
will never complete.

On System z there exists no handshake mechanism between host and guest
when a device is hot-unplugged. The device is removed and no further I/O
is possible.

When an online channel device disappears on System z the kernel's CIO layer
informs the driver (virtio_ccw) about the lost device.

Here are some more error details:

For a particular block device virtio's request function virtblk_request()
is called by the block layer to queue requests to be handled by the host.
In case of a lost device requests can still be queued, but an appropriate
subsequent host kick usually fails. This leads to situations where no error
feedback is shown.

In order to prevent request queueing for lost devices appropriate settings
in the block layer should be made. Exploiting System z's CIO notify handler
callback, and passing on device loss information via the surprize_removal
flag to the remove callback of the backend driver, can solve this task.

v3->v4 changes:
  - patch 1: solves some vcdev pointer handling issues in the virtio_ccw driver
(e.g. locked vcdev pointer reset/query; serialize remove()/set_offline()
callback processing).
  - patch 2: introduces 'device_lost' atomic in virtio_device and use in
backend driver virtio_blk accordingly (original 3 patches merged).
  - patch 3: the notify() callback is now serialized with remove()/set_offline()
callbacks. The notification is ignored if the vcdev pointer has been cleared
already (by remove() or set_offline()).

v2->v3 changes:
  - remove virtio_driver's notify callback (and appropriate code) introduced
in my v1 RFC
  - introduce 'surprize_removal' in struct virtio_device
  - change virtio_blk's rem

Fwd: Re: [PATCH v4 RFC 0/3] virtio: add 'device_lost' to virtio_device

2014-01-13 Thread Heinz Graalfs


Rusty, a Happy New Year!

how about my answer? What am I missing?

 Original Message 
Subject: Re: [PATCH v4 RFC 0/3] virtio: add 'device_lost' to virtio_device
Date: Mon, 23 Dec 2013 09:39:13 +0100
From: Heinz Graalfs 
To: Rusty Russell , m...@redhat.com, 
virtualization@lists.linux-foundation.org

CC: borntrae...@de.ibm.com

On 19/12/13 01:19, Rusty Russell wrote:

Heinz Graalfs  writes:

On 17/12/13 04:42, Rusty Russell wrote:

Heinz Graalfs  writes:

Hi, here is my v4 patch-set update to the v3 RFC submitted on Nov 27th.

When an active virtio block device is hot-unplugged from a KVM guest,
affected guest user applications are not aware of any errors that occur
due to the lost device. This patch-set adds code to avoid further request
queueing when a lost block device is detected, resulting in appropriate
error info. Additionally a potential hang situation can be avoided by not
waiting for requests (e.g. in-flight requests) in blk_cleanup_queue() that
will never complete.

On System z there exists no handshake mechanism between host and guest
when a device is hot-unplugged. The device is removed and no further I/O
is possible.



Hi Heinz,

  If you simply mark every virtqueue as broken when this
unexpected unplug happens, does that not Just Work?

I think I've asked this before...
Rusty.


Hi Rusty,

setting the (one) virtqueue, vblk is currently using, as broken doesn't
solve the problems.

In that case virtblk_request()s still succeed  - like this one...


No, you set *all* virtqueues broken.  Which is accurate, right?

Cheers,
Rusty.



I'm sorry, but I don't get this.
The vblk involved has only 1 virtqueue.
What do you mean by all "*all* virtqueues ?

Heinz



([<00112b28>] show_trace+0xf8/0x154)
   [<00112bde>] show_stack+0x5a/0xdc
   [<0045eb56>] virtblk_request+0x25a/0x2b8
   [<003e749c>] __blk_run_queue+0x50/0x64
   [<003edb54>] blk_queue_bio+0x358/0x3f0
   [<003eb446>] generic_make_request+0xea/0x130
   [<003eb536>] submit_bio+0xaa/0x1a8
   [<002c95e8>] _submit_bh+0x1c4/0x2f4
   [<003a25e4>] journal_write_superblock+0xa0/0x1fc
   [<003a3ed4>] journal_update_sb_log_tail+0x48/0x7c
   [<0039e742>] journal_commit_transaction+0x1586/0x1aa0
   [<003a2a0e>] kjournald+0xfe/0x2a0
   [<001786fc>] kthread+0xd8/0xe0
   [<00698fee>] kernel_thread_starter+0x6/0xc
2 locks held by kjournald/1984:

... and end up in hang situations ...

PID: 13 TASK: 1e3f8000  CPU: 0   COMMAND: "kworker/u128:1"
   #0 [1e2033e0] __schedule at 695ff2
   #1 [1e203530] log_wait_commit at 3a28a6
   #2 [1e2035a0] ext3_sync_fs at 328dea
   #3 [1e2035d8] sync_filesystem at 2c785c
   #4 [1e203600] fsync_bdev at 2d4650
   #5 [1e203628] invalidate_partition at 3f80c8
   #6 [1e203650] del_gendisk at 3f8f5c
   #7 [1e2036c8] virtblk_remove at 45e60e
   #8 [1e203700] virtio_dev_remove at 42d72e
   #9 [1e203738] __device_release_driver at 44f0b0
#10 [1e203760] device_release_driver at 44f16c
#11 [1e203788] bus_remove_device at 44ea92
#12 [1e2037b8] device_del at 44bb40
#13 [1e2037f0] device_unregister at 44bbfa
#14 [1e203810] unregister_virtio_device at 42d9e6
#15 [1e203830] virtio_ccw_remove at 53b834
#16 [1e203850] ccw_device_remove at 4c5bf6
#17 [1e2038d8] __device_release_driver at 44f0b0
#18 [1e203900] device_release_driver at 44f16c
#19 [1e203928] bus_remove_device at 44ea92
#20 [1e203958] device_del at 44bb40
#21 [1e203990] ccw_device_unregister at 4c645c
#22 [1e2039b0] io_subchannel_remove at 4c6b1a
#23 [1e2039e8] css_remove at 4c054e
#24 [1e203a08] __device_release_driver at 44f0b0
#25 [1e203a30] device_release_driver at 44f16c
#26 [1e203a58] bus_remove_device at 44ea92
#27 [1e203a88] device_del at 44bb40
#28 [1e203ac0] device_unregister at 44bbfa
#29 [1e203ae0] css_sch_device_unregister at 4c06cc
#30 [1e203b08] io_subchannel_sch_event at 4c8c3a
#31 [1e203b80] css_evaluate_known_subchannel at 4c09bc
#32 [1e203be0] slow_eval_known_fn at 4c19a6
#33 [1e203c10] bus_for_each_dev at 44d50e
#34 [1e203c50] for_each_subchannel_staged at 4c1066
#35 [1e203c98] css_slow_path_func at 4c1124
#36 [1e203cc0] process_one_work at 16c7f6
#37 [1e203d60] worker_thread at 16dce4
#38 [1e203da8] kthread at 1786fc
#39 [1e203eb0] kernel_thread_starter at 698fee

Heinz





When an online channel device disappears on System z the kernel's CIO layer
informs the driver (virtio_ccw) about the lost device.

Here are some more error details:

For a particular block device virtio's request function virtblk_request()
is called by the block layer to queue requests to be handled by the host.
In case of a lost device requests can still be queued, but an appropriate
subsequent host kick usually fails. This leads to situations where no error
feedback is shown.

In order to prevent reque

Re: [PATCH v4 RFC 0/3] virtio: add 'device_lost' to virtio_device

2013-12-23 Thread Heinz Graalfs

On 19/12/13 01:19, Rusty Russell wrote:

Heinz Graalfs  writes:

On 17/12/13 04:42, Rusty Russell wrote:

Heinz Graalfs  writes:

Hi, here is my v4 patch-set update to the v3 RFC submitted on Nov 27th.

When an active virtio block device is hot-unplugged from a KVM guest,
affected guest user applications are not aware of any errors that occur
due to the lost device. This patch-set adds code to avoid further request
queueing when a lost block device is detected, resulting in appropriate
error info. Additionally a potential hang situation can be avoided by not
waiting for requests (e.g. in-flight requests) in blk_cleanup_queue() that
will never complete.

On System z there exists no handshake mechanism between host and guest
when a device is hot-unplugged. The device is removed and no further I/O
is possible.



Hi Heinz,

  If you simply mark every virtqueue as broken when this
unexpected unplug happens, does that not Just Work?

I think I've asked this before...
Rusty.


Hi Rusty,

setting the (one) virtqueue, vblk is currently using, as broken doesn't
solve the problems.

In that case virtblk_request()s still succeed  - like this one...


No, you set *all* virtqueues broken.  Which is accurate, right?

Cheers,
Rusty.



I'm sorry, but I don't get this.
The vblk involved has only 1 virtqueue.
What do you mean by all "*all* virtqueues ?

Heinz



([<00112b28>] show_trace+0xf8/0x154)
   [<00112bde>] show_stack+0x5a/0xdc
   [<0045eb56>] virtblk_request+0x25a/0x2b8
   [<003e749c>] __blk_run_queue+0x50/0x64
   [<003edb54>] blk_queue_bio+0x358/0x3f0
   [<003eb446>] generic_make_request+0xea/0x130
   [<003eb536>] submit_bio+0xaa/0x1a8
   [<002c95e8>] _submit_bh+0x1c4/0x2f4
   [<003a25e4>] journal_write_superblock+0xa0/0x1fc
   [<003a3ed4>] journal_update_sb_log_tail+0x48/0x7c
   [<0039e742>] journal_commit_transaction+0x1586/0x1aa0
   [<003a2a0e>] kjournald+0xfe/0x2a0
   [<001786fc>] kthread+0xd8/0xe0
   [<00698fee>] kernel_thread_starter+0x6/0xc
2 locks held by kjournald/1984:

... and end up in hang situations ...

PID: 13 TASK: 1e3f8000  CPU: 0   COMMAND: "kworker/u128:1"
   #0 [1e2033e0] __schedule at 695ff2
   #1 [1e203530] log_wait_commit at 3a28a6
   #2 [1e2035a0] ext3_sync_fs at 328dea
   #3 [1e2035d8] sync_filesystem at 2c785c
   #4 [1e203600] fsync_bdev at 2d4650
   #5 [1e203628] invalidate_partition at 3f80c8
   #6 [1e203650] del_gendisk at 3f8f5c
   #7 [1e2036c8] virtblk_remove at 45e60e
   #8 [1e203700] virtio_dev_remove at 42d72e
   #9 [1e203738] __device_release_driver at 44f0b0
#10 [1e203760] device_release_driver at 44f16c
#11 [1e203788] bus_remove_device at 44ea92
#12 [1e2037b8] device_del at 44bb40
#13 [1e2037f0] device_unregister at 44bbfa
#14 [1e203810] unregister_virtio_device at 42d9e6
#15 [1e203830] virtio_ccw_remove at 53b834
#16 [1e203850] ccw_device_remove at 4c5bf6
#17 [1e2038d8] __device_release_driver at 44f0b0
#18 [1e203900] device_release_driver at 44f16c
#19 [1e203928] bus_remove_device at 44ea92
#20 [1e203958] device_del at 44bb40
#21 [1e203990] ccw_device_unregister at 4c645c
#22 [1e2039b0] io_subchannel_remove at 4c6b1a
#23 [1e2039e8] css_remove at 4c054e
#24 [1e203a08] __device_release_driver at 44f0b0
#25 [1e203a30] device_release_driver at 44f16c
#26 [1e203a58] bus_remove_device at 44ea92
#27 [1e203a88] device_del at 44bb40
#28 [1e203ac0] device_unregister at 44bbfa
#29 [1e203ae0] css_sch_device_unregister at 4c06cc
#30 [1e203b08] io_subchannel_sch_event at 4c8c3a
#31 [1e203b80] css_evaluate_known_subchannel at 4c09bc
#32 [1e203be0] slow_eval_known_fn at 4c19a6
#33 [1e203c10] bus_for_each_dev at 44d50e
#34 [1e203c50] for_each_subchannel_staged at 4c1066
#35 [1e203c98] css_slow_path_func at 4c1124
#36 [1e203cc0] process_one_work at 16c7f6
#37 [1e203d60] worker_thread at 16dce4
#38 [1e203da8] kthread at 1786fc
#39 [1e203eb0] kernel_thread_starter at 698fee

Heinz





When an online channel device disappears on System z the kernel's CIO layer
informs the driver (virtio_ccw) about the lost device.

Here are some more error details:

For a particular block device virtio's request function virtblk_request()
is called by the block layer to queue requests to be handled by the host.
In case of a lost device requests can still be queued, but an appropriate
subsequent host kick usually fails. This leads to situations where no error
feedback is shown.

In order to prevent request queueing for lost devices appropriate settings
in the block layer should be made. Exploiting System z's CIO notify handler
callback, and passing on device loss information via the surprize_removal
flag to the remove callback of the backend driver, can solve this task.

v3->v4 changes:
   - patch 1: solves some vcdev pointer handling issues in 

Re: [PATCH v4 RFC 0/3] virtio: add 'device_lost' to virtio_device

2013-12-17 Thread Heinz Graalfs

On 17/12/13 04:42, Rusty Russell wrote:

Heinz Graalfs  writes:

Hi, here is my v4 patch-set update to the v3 RFC submitted on Nov 27th.

When an active virtio block device is hot-unplugged from a KVM guest,
affected guest user applications are not aware of any errors that occur
due to the lost device. This patch-set adds code to avoid further request
queueing when a lost block device is detected, resulting in appropriate
error info. Additionally a potential hang situation can be avoided by not
waiting for requests (e.g. in-flight requests) in blk_cleanup_queue() that
will never complete.

On System z there exists no handshake mechanism between host and guest
when a device is hot-unplugged. The device is removed and no further I/O
is possible.



Hi Heinz,

 If you simply mark every virtqueue as broken when this
unexpected unplug happens, does that not Just Work?

I think I've asked this before...
Rusty.


Hi Rusty,

setting the (one) virtqueue, vblk is currently using, as broken doesn't 
solve the problems.


In that case virtblk_request()s still succeed  - like this one...

([<00112b28>] show_trace+0xf8/0x154)
 [<00112bde>] show_stack+0x5a/0xdc
 [<0045eb56>] virtblk_request+0x25a/0x2b8
 [<003e749c>] __blk_run_queue+0x50/0x64
 [<003edb54>] blk_queue_bio+0x358/0x3f0
 [<003eb446>] generic_make_request+0xea/0x130
 [<003eb536>] submit_bio+0xaa/0x1a8
 [<002c95e8>] _submit_bh+0x1c4/0x2f4
 [<003a25e4>] journal_write_superblock+0xa0/0x1fc
 [<003a3ed4>] journal_update_sb_log_tail+0x48/0x7c
 [<0039e742>] journal_commit_transaction+0x1586/0x1aa0
 [<003a2a0e>] kjournald+0xfe/0x2a0
 [<001786fc>] kthread+0xd8/0xe0
 [<00698fee>] kernel_thread_starter+0x6/0xc
2 locks held by kjournald/1984:

... and end up in hang situations ...

PID: 13 TASK: 1e3f8000  CPU: 0   COMMAND: "kworker/u128:1"
 #0 [1e2033e0] __schedule at 695ff2
 #1 [1e203530] log_wait_commit at 3a28a6
 #2 [1e2035a0] ext3_sync_fs at 328dea
 #3 [1e2035d8] sync_filesystem at 2c785c
 #4 [1e203600] fsync_bdev at 2d4650
 #5 [1e203628] invalidate_partition at 3f80c8
 #6 [1e203650] del_gendisk at 3f8f5c
 #7 [1e2036c8] virtblk_remove at 45e60e
 #8 [1e203700] virtio_dev_remove at 42d72e
 #9 [1e203738] __device_release_driver at 44f0b0
#10 [1e203760] device_release_driver at 44f16c
#11 [1e203788] bus_remove_device at 44ea92
#12 [1e2037b8] device_del at 44bb40
#13 [1e2037f0] device_unregister at 44bbfa
#14 [1e203810] unregister_virtio_device at 42d9e6
#15 [1e203830] virtio_ccw_remove at 53b834
#16 [1e203850] ccw_device_remove at 4c5bf6
#17 [1e2038d8] __device_release_driver at 44f0b0
#18 [1e203900] device_release_driver at 44f16c
#19 [1e203928] bus_remove_device at 44ea92
#20 [1e203958] device_del at 44bb40
#21 [1e203990] ccw_device_unregister at 4c645c
#22 [1e2039b0] io_subchannel_remove at 4c6b1a
#23 [1e2039e8] css_remove at 4c054e
#24 [1e203a08] __device_release_driver at 44f0b0
#25 [1e203a30] device_release_driver at 44f16c
#26 [1e203a58] bus_remove_device at 44ea92
#27 [1e203a88] device_del at 44bb40
#28 [1e203ac0] device_unregister at 44bbfa
#29 [1e203ae0] css_sch_device_unregister at 4c06cc
#30 [1e203b08] io_subchannel_sch_event at 4c8c3a
#31 [1e203b80] css_evaluate_known_subchannel at 4c09bc
#32 [1e203be0] slow_eval_known_fn at 4c19a6
#33 [1e203c10] bus_for_each_dev at 44d50e
#34 [1e203c50] for_each_subchannel_staged at 4c1066
#35 [1e203c98] css_slow_path_func at 4c1124
#36 [1e203cc0] process_one_work at 16c7f6
#37 [1e203d60] worker_thread at 16dce4
#38 [1e203da8] kthread at 1786fc
#39 [1e203eb0] kernel_thread_starter at 698fee

Heinz





When an online channel device disappears on System z the kernel's CIO layer
informs the driver (virtio_ccw) about the lost device.

Here are some more error details:

For a particular block device virtio's request function virtblk_request()
is called by the block layer to queue requests to be handled by the host.
In case of a lost device requests can still be queued, but an appropriate
subsequent host kick usually fails. This leads to situations where no error
feedback is shown.

In order to prevent request queueing for lost devices appropriate settings
in the block layer should be made. Exploiting System z's CIO notify handler
callback, and passing on device loss information via the surprize_removal
flag to the remove callback of the backend driver, can solve this task.

v3->v4 changes:
  - patch 1: solves some vcdev pointer handling issues in the virtio_ccw driver
(e.g. locked vcdev pointer reset/query; serialize remove()/set_offline()
callback processing).
  - patch 2: introduces 'device_lost' atomic in virtio_device and use in
backend driver virtio_blk accordingly (original 3 patches merged).
  - patch 3: the notify() callback is now serialized with

[PATCH v4 RFC 3/3] virtio_ccw: set 'device_lost' on CIO_GONE notification

2013-12-13 Thread Heinz Graalfs
When a CIO_GONE notification is received the device_lost flag is
set in the virtio_device. This flag should be tested by a backend
in order to be able to prevent triggering final I/O to a device that
is not reachable any more.

The notification is ignored in case remove or set_offline is already
running. The virtio_device pointer might point to freed memory in that
case.

Signed-off-by: Heinz Graalfs 
Acked-by: Cornelia Huck 
---
 drivers/s390/kvm/virtio_ccw.c | 23 +--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/s390/kvm/virtio_ccw.c b/drivers/s390/kvm/virtio_ccw.c
index b939a7f..a468b9c 100644
--- a/drivers/s390/kvm/virtio_ccw.c
+++ b/drivers/s390/kvm/virtio_ccw.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1085,8 +1086,26 @@ out_free:
 
 static int virtio_ccw_cio_notify(struct ccw_device *cdev, int event)
 {
-   /* TODO: Check whether we need special handling here. */
-   return 0;
+   int rc;
+   struct virtio_ccw_device *vcdev = dev_get_drvdata(&cdev->dev);
+
+   /*
+* Make sure vcdev is set
+* i.e. set_offline/remove callback not already running
+*/
+   if (!vcdev)
+   return NOTIFY_DONE;
+
+   switch (event) {
+   case CIO_GONE:
+   atomic_inc(&vcdev->vdev.device_lost);
+   rc = NOTIFY_DONE;
+   break;
+   default:
+   rc = NOTIFY_DONE;
+   break;
+   }
+   return rc;
 }
 
 static struct ccw_device_id virtio_ids[] = {
-- 
1.8.3.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v4 RFC 0/3] virtio: add 'device_lost' to virtio_device

2013-12-13 Thread Heinz Graalfs
Hi, here is my v4 patch-set update to the v3 RFC submitted on Nov 27th. 

When an active virtio block device is hot-unplugged from a KVM guest,
affected guest user applications are not aware of any errors that occur
due to the lost device. This patch-set adds code to avoid further request
queueing when a lost block device is detected, resulting in appropriate
error info. Additionally a potential hang situation can be avoided by not
waiting for requests (e.g. in-flight requests) in blk_cleanup_queue() that
will never complete.

On System z there exists no handshake mechanism between host and guest
when a device is hot-unplugged. The device is removed and no further I/O
is possible.

When an online channel device disappears on System z the kernel's CIO layer
informs the driver (virtio_ccw) about the lost device.

Here are some more error details:

For a particular block device virtio's request function virtblk_request()
is called by the block layer to queue requests to be handled by the host.
In case of a lost device requests can still be queued, but an appropriate
subsequent host kick usually fails. This leads to situations where no error
feedback is shown.

In order to prevent request queueing for lost devices appropriate settings
in the block layer should be made. Exploiting System z's CIO notify handler
callback, and passing on device loss information via the surprize_removal
flag to the remove callback of the backend driver, can solve this task.

v3->v4 changes:
 - patch 1: solves some vcdev pointer handling issues in the virtio_ccw driver
   (e.g. locked vcdev pointer reset/query; serialize remove()/set_offline()
   callback processing).
 - patch 2: introduces 'device_lost' atomic in virtio_device and use in
   backend driver virtio_blk accordingly (original 3 patches merged).
 - patch 3: the notify() callback is now serialized with remove()/set_offline()
   callbacks. The notification is ignored if the vcdev pointer has been cleared
   already (by remove() or set_offline()).

v2->v3 changes:
 - remove virtio_driver's notify callback (and appropriate code) introduced
   in my v1 RFC
 - introduce 'surprize_removal' in struct virtio_device
 - change virtio_blk's remove callback to perform special actions when the
   surprize_removal flag is set
   - avoid final I/O by preventing further request queueing
   - avoid hangs in blk_cleanup_queue() due to waits on 'in-flight' requests
 - set surprize_removal in virtio_ccw's notify callback when a device is lost

v1->v2 changes:
 - add include of linux/notifier.h (I also added it to the 3rd patch)
 - get queue lock in order to be able to use safe queue_flag_set() functions
   in virtblk_notify() handler


Heinz Graalfs (3):
  virtio_ccw: fix vcdev pointer handling issues
  virtio: introduce 'device_lost' flag in virtio_device
  virtio_ccw: set 'device_lost' on CIO_GONE notification

 drivers/block/virtio_blk.c| 14 ++-
 drivers/s390/kvm/virtio_ccw.c | 58 ---
 include/linux/virtio.h|  2 ++
 3 files changed, 64 insertions(+), 10 deletions(-)

-- 
1.8.3.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v4 RFC 1/3] virtio_ccw: fix vcdev pointer handling issues

2013-12-13 Thread Heinz Graalfs
The interrupt handler virtio_ccw_int_handler() using the vcdev pointer
is protected by the ccw_device lock. Resetting the pointer within the
ccw_device structure should be done when holding this lock.

Also resetting the vcdev pointer (under the ccw_device lock) prior to
freeing the vcdev pointer memory removes a critical path.

Signed-off-by: Heinz Graalfs 
Acked-by: Cornelia Huck 
---
 drivers/s390/kvm/virtio_ccw.c | 35 ---
 1 file changed, 28 insertions(+), 7 deletions(-)

diff --git a/drivers/s390/kvm/virtio_ccw.c b/drivers/s390/kvm/virtio_ccw.c
index 35b9aaa..b939a7f 100644
--- a/drivers/s390/kvm/virtio_ccw.c
+++ b/drivers/s390/kvm/virtio_ccw.c
@@ -886,6 +886,8 @@ static void virtio_ccw_int_handler(struct ccw_device *cdev,
struct virtqueue *vq;
struct virtio_driver *drv;
 
+   if (!vcdev)
+   return;
/* Check if it's a notification from the host. */
if ((intparm == 0) &&
(scsw_stctl(&irb->scsw) ==
@@ -985,23 +987,37 @@ static int virtio_ccw_probe(struct ccw_device *cdev)
return 0;
 }
 
+static struct virtio_ccw_device *virtio_grab_drvdata(struct ccw_device *cdev)
+{
+   unsigned long flags;
+   struct virtio_ccw_device *vcdev;
+
+   spin_lock_irqsave(get_ccwdev_lock(cdev), flags);
+   vcdev = dev_get_drvdata(&cdev->dev);
+   if (!vcdev) {
+   spin_unlock_irqrestore(get_ccwdev_lock(cdev), flags);
+   return NULL;
+   }
+   dev_set_drvdata(&cdev->dev, NULL);
+   spin_unlock_irqrestore(get_ccwdev_lock(cdev), flags);
+   return vcdev;
+}
+
 static void virtio_ccw_remove(struct ccw_device *cdev)
 {
-   struct virtio_ccw_device *vcdev = dev_get_drvdata(&cdev->dev);
+   struct virtio_ccw_device *vcdev = virtio_grab_drvdata(cdev);
 
-   if (cdev->online) {
+   if (vcdev && cdev->online)
unregister_virtio_device(&vcdev->vdev);
-   dev_set_drvdata(&cdev->dev, NULL);
-   }
cdev->handler = NULL;
 }
 
 static int virtio_ccw_offline(struct ccw_device *cdev)
 {
-   struct virtio_ccw_device *vcdev = dev_get_drvdata(&cdev->dev);
+   struct virtio_ccw_device *vcdev = virtio_grab_drvdata(cdev);
 
-   unregister_virtio_device(&vcdev->vdev);
-   dev_set_drvdata(&cdev->dev, NULL);
+   if (vcdev)
+   unregister_virtio_device(&vcdev->vdev);
return 0;
 }
 
@@ -1010,6 +1026,7 @@ static int virtio_ccw_online(struct ccw_device *cdev)
 {
int ret;
struct virtio_ccw_device *vcdev;
+   unsigned long flags;
 
vcdev = kzalloc(sizeof(*vcdev), GFP_KERNEL);
if (!vcdev) {
@@ -1039,7 +1056,9 @@ static int virtio_ccw_online(struct ccw_device *cdev)
INIT_LIST_HEAD(&vcdev->virtqueues);
spin_lock_init(&vcdev->lock);
 
+   spin_lock_irqsave(get_ccwdev_lock(cdev), flags);
dev_set_drvdata(&cdev->dev, vcdev);
+   spin_unlock_irqrestore(get_ccwdev_lock(cdev), flags);
vcdev->vdev.id.vendor = cdev->id.cu_type;
vcdev->vdev.id.device = cdev->id.cu_model;
ret = register_virtio_device(&vcdev->vdev);
@@ -1050,7 +1069,9 @@ static int virtio_ccw_online(struct ccw_device *cdev)
}
return 0;
 out_put:
+   spin_lock_irqsave(get_ccwdev_lock(cdev), flags);
dev_set_drvdata(&cdev->dev, NULL);
+   spin_unlock_irqrestore(get_ccwdev_lock(cdev), flags);
put_device(&vcdev->vdev.dev);
return ret;
 out_free:
-- 
1.8.3.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v4 RFC 2/3] virtio: introduce 'device_lost' flag in virtio_device

2013-12-13 Thread Heinz Graalfs
This flag should be set by a virtio transport driver, when it was
notified about a lost device, before the remove callback of a
backend driver is triggered.

A backend driver can test this flag in order to perform specific
actions that might be appropriate wrt the device loss.

In case of a device loss further request queueing should be prevented
by setting appropriate queue flags prior to invoking del_gendisk().
Blocking of request queueing leads to appropriate I/O errors when data
are tried to be synched. Trying to synch data to a lost block device
doesn't make too much sense.

Calling blk_cleanup_queue() when the device_lost flag is set due to a
disappeared device. It avoid hangs due to incomplete requests
(e.g. in-flight requests). Such requests must be considered as lost.

Signed-off-by: Heinz Graalfs 
Acked-by: Cornelia Huck 
---
 drivers/block/virtio_blk.c | 14 +-
 include/linux/virtio.h |  2 ++
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 2d43be4..e5b4947 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -876,14 +876,26 @@ static void virtblk_remove(struct virtio_device *vdev)
struct virtio_blk *vblk = vdev->priv;
int index = vblk->index;
int refc;
+   int device_lost;
+   unsigned long flags;
 
/* Prevent config work handler from accessing the device. */
mutex_lock(&vblk->config_lock);
vblk->config_enable = false;
mutex_unlock(&vblk->config_lock);
 
+   device_lost = atomic_read(&vdev->device_lost);
+   if (device_lost) {
+   spin_lock_irqsave(vblk->disk->queue->queue_lock, flags);
+   queue_flag_set(QUEUE_FLAG_DYING, vblk->disk->queue);
+   queue_flag_set(QUEUE_FLAG_NOMERGES, vblk->disk->queue);
+   queue_flag_set(QUEUE_FLAG_NOXMERGES, vblk->disk->queue);
+   spin_unlock_irqrestore(vblk->disk->queue->queue_lock, flags);
+   }
+
del_gendisk(vblk->disk);
-   blk_cleanup_queue(vblk->disk->queue);
+   if (!device_lost)
+   blk_cleanup_queue(vblk->disk->queue);
 
/* Stop all the virtqueues. */
vdev->config->reset(vdev);
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index f15f6e7..c18db21 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -87,6 +87,7 @@ bool virtqueue_is_broken(struct virtqueue *vq);
  * @vringh_config: configuration ops for host vrings.
  * @vqs: the list of virtqueues for this device.
  * @features: the features supported by both driver and device.
+ * @device_lost: to flag a device loss.
  * @priv: private pointer for the driver's use.
  */
 struct virtio_device {
@@ -98,6 +99,7 @@ struct virtio_device {
struct list_head vqs;
/* Note that this is a Linux set_bit-style bitmap. */
unsigned long features[1];
+   atomic_t device_lost;
void *priv;
 };
 
-- 
1.8.3.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3 RFC 3/4] virtio_blk: avoid calling blk_cleanup_queue() on device loss

2013-11-27 Thread Heinz Graalfs

On 27/11/13 13:49, Michael S. Tsirkin wrote:

On Wed, Nov 27, 2013 at 12:37:02PM +0100, Heinz Graalfs wrote:

On 27/11/13 11:47, Michael S. Tsirkin wrote:

On Wed, Nov 27, 2013 at 11:32:39AM +0100, Heinz Graalfs wrote:

Code is added to avoid calling blk_cleanup_queue() when the surprize_removal
flag is set due to a disappeared device. It avoid hangs due to incomplete
requests (e.g. in-flight requests). Such requests must be considered as lost.


Ugh. Can't we complete these immediately using detach_unused_buf? If not why?


OK, I will try


I tried virtqueue_detach_unused_buf(). It doesn't seem to solve the 
problem. Would that affect block layer in-flight requests anyway? The 
function comment also says it should not be used on an active queue.


Isn't there a mechanism to end vring requests for which a 
vring_interrupt() is missing? (simulate virtblk_done() with an error)?

At least that's it what would help, I suppose.






If the current remove callback was triggered due to an unregister driver,
and the surprize_removal is not already set (although the actual device
is already gone, e.g. virsh detach), blk_cleanup_queue() would be triggered
resulting in a possible hang. This hang is caused by e.g. 'in-flight' requests
that will never complete. This is a weird situation, and most likely not
'serializable'.


Hmm interesting. Implement some timeout and probe device to make sure
it's still alive?


This patch doesn't try to solve any weird races.
It avoids triggering the block queue cleanup, with potential for a hang, 
IFF a device is gone.




but there is always some race, isn't it?


To clarify, why this might not be very elegant, a timer-based
solution for surprise removal during driver cleanup
might be easier than trying to build robust interfaces
to address this esoteric case.

But what worries me is that it's not clear to me that ccw won't
invoke notify in parallel with remove callback.
If this happens there will be use after free.


OK, I agree, calling remove twice or working on freed stuff must not happen.










___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3 RFC 3/4] virtio_blk: avoid calling blk_cleanup_queue() on device loss

2013-11-27 Thread Heinz Graalfs

On 27/11/13 11:47, Michael S. Tsirkin wrote:

On Wed, Nov 27, 2013 at 11:32:39AM +0100, Heinz Graalfs wrote:

Code is added to avoid calling blk_cleanup_queue() when the surprize_removal
flag is set due to a disappeared device. It avoid hangs due to incomplete
requests (e.g. in-flight requests). Such requests must be considered as lost.


Ugh. Can't we complete these immediately using detach_unused_buf? If not why?


OK, I will try




If the current remove callback was triggered due to an unregister driver,
and the surprize_removal is not already set (although the actual device
is already gone, e.g. virsh detach), blk_cleanup_queue() would be triggered
resulting in a possible hang. This hang is caused by e.g. 'in-flight' requests
that will never complete. This is a weird situation, and most likely not
'serializable'.


Hmm interesting. Implement some timeout and probe device to make sure
it's still alive?


but there is always some race, isn't it?




Signed-off-by: Heinz Graalfs 
---
  drivers/block/virtio_blk.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 0f64282..8c05001 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -892,7 +892,8 @@ static void virtblk_remove(struct virtio_device *vdev)
}

del_gendisk(vblk->disk);
-   blk_cleanup_queue(vblk->disk->queue);
+   if (!vdev->surprize_removal)
+   blk_cleanup_queue(vblk->disk->queue);

/* Stop all the virtqueues. */
vdev->config->reset(vdev);
--
1.8.3.1




___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v3 RFC 3/4] virtio_blk: avoid calling blk_cleanup_queue() on device loss

2013-11-27 Thread Heinz Graalfs
Code is added to avoid calling blk_cleanup_queue() when the surprize_removal
flag is set due to a disappeared device. It avoid hangs due to incomplete
requests (e.g. in-flight requests). Such requests must be considered as lost.

If the current remove callback was triggered due to an unregister driver,
and the surprize_removal is not already set (although the actual device
is already gone, e.g. virsh detach), blk_cleanup_queue() would be triggered
resulting in a possible hang. This hang is caused by e.g. 'in-flight' requests
that will never complete. This is a weird situation, and most likely not
'serializable'.

Signed-off-by: Heinz Graalfs 
---
 drivers/block/virtio_blk.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 0f64282..8c05001 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -892,7 +892,8 @@ static void virtblk_remove(struct virtio_device *vdev)
}
 
del_gendisk(vblk->disk);
-   blk_cleanup_queue(vblk->disk->queue);
+   if (!vdev->surprize_removal)
+   blk_cleanup_queue(vblk->disk->queue);
 
/* Stop all the virtqueues. */
vdev->config->reset(vdev);
-- 
1.8.3.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v3 RFC 1/4] virtio: add surprize_removal to virtio_device

2013-11-27 Thread Heinz Graalfs
Add new entry surprize_removal to struct virtio_device.

When a virtio transport driver is notified about a lost device
it should set surprize_removal to true.

A backend driver can test this flag in order to perform specific
actions that might be appropriate wrt the device loss.

Signed-off-by: Heinz Graalfs 
---
 drivers/virtio/virtio.c | 1 +
 include/linux/virtio.h  | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index ee59b74..290d1e2 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -115,6 +115,7 @@ static int virtio_dev_probe(struct device *_d)
 
/* We have a driver! */
add_status(dev, VIRTIO_CONFIG_S_DRIVER);
+   dev->surprize_removal = false;
 
/* Figure out what features the device supports. */
device_features = dev->config->get_features(dev);
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index f15f6e7..131404a 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -98,6 +98,7 @@ struct virtio_device {
struct list_head vqs;
/* Note that this is a Linux set_bit-style bitmap. */
unsigned long features[1];
+   bool surprize_removal;
void *priv;
 };
 
-- 
1.8.3.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v3 RFC 0/4] virtio: add 'surprize_removal' to virtio_device

2013-11-27 Thread Heinz Graalfs
Hi, here is an updated patch-set to my v2 RFC
   virtio: add new notify() callback to virtio_driver
This RFC introduces a new virtio_device entry 'surprize_removal' instead
of a new 'notify' callback in struct virtio_driver.

When an active virtio block device is hot-unplugged from a KVM guest,
affected guest user applications are not aware of any errors that occur
due to the lost device. This patch-set adds code to avoid further request
queueing when a lost block device is detected, resulting in appropriate
error info. Additionally a potential hang situation can be avoided by not
waiting for requests (e.g. in-flight requests) in blk_cleanup_queue() that
will never complete.

On System z there exists no handshake mechanism between host and guest
when a device is hot-unplugged. The device is removed and no further I/O
is possible.

When an online channel device disappears on System z the kernel's CIO layer
informs the driver (virtio_ccw) about the lost device.

Here are some more error details:

For a particular block device virtio's request function virtblk_request()
is called by the block layer to queue requests to be handled by the host.
In case of a lost device requests can still be queued, but an appropriate
subsequent host kick usually fails. This leads to situations where no error
feedback is shown.

In order to prevent request queueing for lost devices appropriate settings
in the block layer should be made. Exploiting System z's CIO notify handler
callback, and passing on device loss information via the surprize_removal
flag to the remove callback of the backend driver, can solve this task.

v2->v3 changes:
 - remove virtio_driver's notify callback (and appropriate code) introduced
   in my v1 RFC
 - introduce 'surprize_removal' in struct virtio_device
 - change virtio_blk's remove callback to perform special actions when the
   surprize_removal flag is set
   - avoid final I/O by preventing further request queueing
   - avoid hangs in blk_cleanup_queue() due to waits on 'in-flight' requests
 - set surprize_removal in virtio_ccw's notify callback when a device is lost

v1->v2 changes:
 - add include of linux/notifier.h (I also added it to the 3rd patch)
 - get queue lock in order to be able to use safe queue_flag_set() functions
   in virtblk_notify() handler


Heinz Graalfs (4):
  virtio: add surprize_removal to virtio_device
  virtio_blk: avoid further request queueing on device loss
  virtio_blk: avoid calling blk_cleanup_queue() on device loss
  virtio_ccw: set surprize_removal in virtio_device if a device was lost

 drivers/block/virtio_blk.c| 12 +++-
 drivers/s390/kvm/virtio_ccw.c | 16 ++--
 drivers/virtio/virtio.c   |  1 +
 include/linux/virtio.h|  1 +
 4 files changed, 27 insertions(+), 3 deletions(-)

-- 
1.8.3.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v3 RFC 2/4] virtio_blk: avoid further request queueing on device loss

2013-11-27 Thread Heinz Graalfs
Code is added to the remove callback to verify if a device was lost.

In case of a device loss further request queueing should be prevented
by setting appropriate queue flags prior to invoking del_gendisk().
Blocking of request queueing leads to appropriate I/O errors when data
are tried to be synched. Trying to synch data to a lost block device
doesn't make too much sense.

If the current remove callback was triggered due to an unregister driver,
and the surprize_removal is not already set (although the actual device
is already gone, e.g. virsh detach), del_gendisk() would be triggered
resulting in a hang. This is a weird situation, and most likely not
'serializable'.

Signed-off-by: Heinz Graalfs 
---
 drivers/block/virtio_blk.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 2d43be4..0f64282 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -875,6 +875,7 @@ static void virtblk_remove(struct virtio_device *vdev)
 {
struct virtio_blk *vblk = vdev->priv;
int index = vblk->index;
+   unsigned long flags;
int refc;
 
/* Prevent config work handler from accessing the device. */
@@ -882,6 +883,14 @@ static void virtblk_remove(struct virtio_device *vdev)
vblk->config_enable = false;
mutex_unlock(&vblk->config_lock);
 
+   if (vdev->surprize_removal) {
+   spin_lock_irqsave(vblk->disk->queue->queue_lock, flags);
+   queue_flag_set(QUEUE_FLAG_DYING, vblk->disk->queue);
+   queue_flag_set(QUEUE_FLAG_NOMERGES, vblk->disk->queue);
+   queue_flag_set(QUEUE_FLAG_NOXMERGES, vblk->disk->queue);
+   spin_unlock_irqrestore(vblk->disk->queue->queue_lock, flags);
+   }
+
del_gendisk(vblk->disk);
blk_cleanup_queue(vblk->disk->queue);
 
-- 
1.8.3.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v3 RFC 4/4] virtio_ccw: set surprize_removal in virtio_device if a device was lost

2013-11-27 Thread Heinz Graalfs
Code is added to the notify handler to set the 'surprize_removal' flag
in virtio_device in case a CIO_GONE notification occurs. The remove
callback of the backend driver must check this flag in order to perform
special processing for a lost device.

Signed-off-by: Heinz Graalfs 
---
 drivers/s390/kvm/virtio_ccw.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/s390/kvm/virtio_ccw.c b/drivers/s390/kvm/virtio_ccw.c
index 35b9aaa..8f6c74a 100644
--- a/drivers/s390/kvm/virtio_ccw.c
+++ b/drivers/s390/kvm/virtio_ccw.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1064,8 +1065,19 @@ out_free:
 
 static int virtio_ccw_cio_notify(struct ccw_device *cdev, int event)
 {
-   /* TODO: Check whether we need special handling here. */
-   return 0;
+   int rc;
+   struct virtio_ccw_device *vcdev = dev_get_drvdata(&cdev->dev);
+
+   switch (event) {
+   case CIO_GONE:
+   vcdev->vdev.surprize_removal = true;
+   rc = NOTIFY_DONE;
+   break;
+   default:
+   rc = NOTIFY_DONE;
+   break;
+   }
+   return rc;
 }
 
 static struct ccw_device_id virtio_ids[] = {
-- 
1.8.3.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2 RFC 3/3] virtio_ccw: invoke virtio_driver's notify() on CIO_GONE notification

2013-11-27 Thread Heinz Graalfs

On 21/11/13 19:31, Michael S. Tsirkin wrote:

On Thu, Nov 21, 2013 at 06:12:21PM +0100, Heinz Graalfs wrote:

On 21/11/13 16:15, Michael S. Tsirkin wrote:

On Thu, Nov 21, 2013 at 03:45:33PM +0100, Heinz Graalfs wrote:

virtio_ccw's notify() callback for the common IO layer invokes
virtio_driver's notify() callback to pass-on information to a
backend driver if an online device disappeared.

Signed-off-by: Heinz Graalfs 


simple question: is this serialized with device removal?
If this races with removal we have a problem.



notify and remove callbacks are not serialized.

Additional processing in the notify handler is now locked by the queue lock.

If remove is already active (e.g. driver unregister) and performing
its cleanup (via virtblk_remove), we should definitely perform the
(locked) request queue processing in notify (little later), because
if a notify comes in, the device is GONE (not going away). Earlier
triggered cleanup processing is about to fail anyway.


I don't get it. It's possible nothing is in queue
or everything timed out.
Then this notify will address freed memory.

I'm starting to think the right solution is to pass
a flag to remove: bool surprize_removal.

Then you will cleanly set flags before removing the device.

No new callbacks and no races.


OK, I will send a v3 RFC.

However, I suppose we will always have some kind of 'race' wrt this
weird scenario, ending up in different kind of hang situations.




---
  drivers/s390/kvm/virtio_ccw.c | 15 +--
  1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/s390/kvm/virtio_ccw.c b/drivers/s390/kvm/virtio_ccw.c
index 35b9aaa..682f688 100644
--- a/drivers/s390/kvm/virtio_ccw.c
+++ b/drivers/s390/kvm/virtio_ccw.c
@@ -27,6 +27,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -1064,8 +1065,18 @@ out_free:

  static int virtio_ccw_cio_notify(struct ccw_device *cdev, int event)
  {
-   /* TODO: Check whether we need special handling here. */
-   return 0;
+   int rc;
+   struct virtio_ccw_device *vcdev = dev_get_drvdata(&cdev->dev);
+
+   switch (event) {
+   case CIO_GONE:
+   rc = notify_virtio_device(&vcdev->vdev, VDEV_GONE);
+   break;
+   default:
+   rc = NOTIFY_DONE;
+   break;
+   }
+   return rc;
  }

  static struct ccw_device_id virtio_ids[] = {
--
1.8.3.1






___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC 0/3] virtio: add new notify() callback to virtio_driver

2013-11-21 Thread Heinz Graalfs

On 21/11/13 15:58, Michael S. Tsirkin wrote:

On Thu, Nov 21, 2013 at 03:43:51PM +0100, Heinz Graalfs wrote:

On 21/11/13 07:47, Michael S. Tsirkin wrote:

On Wed, Nov 20, 2013 at 04:22:00PM +0100, Heinz Graalfs wrote:

Hi,

when an active virtio block device is hot-unplugged from a KVM guest, running
affected guest user applications are not aware of any errors that occur due
to the lost device. This patch-set adds code to avoid further request queueing
when a lost block device is detected, resulting in appropriate error info.

On System z there exists no handshake mechanism between host and guest
when a device is hot-unplugged. The device is removed and no further I/O
is possible.

When an online channel device disappears on System z the kernel's CIO layer
informs the driver (virtio_ccw) about the lost device.

It's just the block device drivers that care to provide a notify
callback.

Here are some more error details:

For a particular block device virtio's request function virtblk_request()
is called by the block layer to queue requests to be handled by the host.
In case of a lost device requests can still be queued, but an appropriate
subsequent host kick usually fails. This leads to situations where no error
feedback is shown.

In order to prevent request queueing for lost devices appropriate settings
in the block layer should be made. Exploiting System z's CIO notify handler
callback, and adding a corresponding new virtio_driver notify() handler to
'inform' the block layer, solve this task.

Patch 1 adds an optional notify() callback to virtio_driver.

Patch 2 adds a new notify() callback for the virtio_blk driver. When called
for a lost device settings are made to prevent future request queueing.

Patch 3 modifies the CIO notify handler in virtio_ccw's transport layer to pass
on the lost device info to virtio's backend driver virtio_blk.


Question: I guess remove callback is invoked eventually?
Could you please clarify why isn't this sufficient?



yes, the remove callback is invoked lateron, and it could be done
there. However, it should be done conditionally, and prior to
invoking del_gendisk() (which triggers final I/O). We would still
have the need for such notification information. The remove callback
is also invoked when a device is set offline, and in that case we
don't want a queue to reject further requests. The way it is done
right doesn't affect the remove callback. The weird situation,
however, is solved by the new notify callback.
Doing it in blk_cleanup_queue() (also triggered from
virtblk_remove()) is too late for this scenario of a lost device.
One wouldn't see any errors, but experience a 'hang' due to
inclomplete I/O. The invocation of virtblk_request() indirectly
caused by del_gendisk() would accept requests, the subsequent host
notification, however, would fail. (This is probably another
'window' that should be closed.)


I see. All this makes sense.

So it's really important that the event is sent
*before* device is removed.


well, if this event comes in all device related I/O will fail,
so we better don't trigger any further I/O.



Maybe it's a good idea to rename event GONE->GOING_AWAY ?


if this event comes in the device is GONE, it's not like 'going away'









Heinz Graalfs (3):
   virtio: add notify() callback to virtio_driver
   virtio_blk: add virtblk_notify() as virtio_driver's notify() callback
   virtio_ccw: invoke virtio_driver's notify() on CIO_GONE notification

  drivers/block/virtio_blk.c| 14 ++
  drivers/s390/kvm/virtio_ccw.c | 14 --
  drivers/virtio/virtio.c   |  8 
  include/linux/virtio.h| 10 ++
  4 files changed, 44 insertions(+), 2 deletions(-)

--
1.8.3.1






___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v2 RFC 0/3] virtio: add new notify() callback to virtio_driver

2013-11-21 Thread Heinz Graalfs
Hi, here is an updated patch-set with changes as suggested by Michael Tsirkin.

When an active virtio block device is hot-unplugged from a KVM guest, running
affected guest user applications are not aware of any errors that occur due
to the lost device. This patch-set adds code to avoid further request queueing
when a lost block device is detected, resulting in appropriate error info.

On System z there exists no handshake mechanism between host and guest
when a device is hot-unplugged. The device is removed and no further I/O
is possible.

When an online channel device disappears on System z the kernel's CIO layer
informs the driver (virtio_ccw) about the lost device.

It's just the block device drivers that care to provide a notify
callback.

Here are some more error details:

For a particular block device virtio's request function virtblk_request()
is called by the block layer to queue requests to be handled by the host.
In case of a lost device requests can still be queued, but an appropriate
subsequent host kick usually fails. This leads to situations where no error
feedback is shown.

In order to prevent request queueing for lost devices appropriate settings
in the block layer should be made. Exploiting System z's CIO notify handler
callback, and adding a corresponding new virtio_driver notify() handler to
'inform' the block layer, solve this task.


v1->v2 changes:
 - add include of linux/notifier.h (I also added it to the 3rd patch)
 - get queue lock in order to be able to use safe queue_flag_set() functions
   in virtblk_notify() handler


Patch 1 adds an optional notify() callback to virtio_driver.

Patch 2 adds a new notify() callback for the virtio_blk driver. When called
for a lost device settings are made to prevent future request queueing.

Patch 3 modifies the CIO notify handler in virtio_ccw's transport layer to pass
on the lost device info to virtio's backend driver virtio_blk.

Heinz Graalfs (3):
  virtio: add notify() callback to virtio_driver
  virtio_blk: add virtblk_notify() as virtio_driver's notify() callback
  virtio_ccw: invoke virtio_driver's notify() on CIO_GONE notification

 drivers/block/virtio_blk.c| 17 +
 drivers/s390/kvm/virtio_ccw.c | 15 +--
 drivers/virtio/virtio.c   |  9 +
 include/linux/virtio.h| 10 ++
 4 files changed, 49 insertions(+), 2 deletions(-)

-- 
1.8.3.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v2 RFC 1/3] virtio: add notify() callback to virtio_driver

2013-11-21 Thread Heinz Graalfs
Add an optional notify() callback to virtio_driver. A backend
driver can provide this callback to perform actions for a lost
device.

notify() event values are inherited from virtio_ccw's notify()
callback. We might want to support even more of them lateron.

notify() return values are defined in include/linux/notifier.h.

Signed-off-by: Heinz Graalfs 
---
 drivers/virtio/virtio.c |  9 +
 include/linux/virtio.h  | 10 ++
 2 files changed, 19 insertions(+)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index ee59b74..a7f4302 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -3,6 +3,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* Unique numbering for virtio devices. */
 static DEFINE_IDA(virtio_index_ida);
@@ -186,6 +187,14 @@ void unregister_virtio_driver(struct virtio_driver *driver)
 }
 EXPORT_SYMBOL_GPL(unregister_virtio_driver);
 
+int notify_virtio_device(struct virtio_device *vdev, int event)
+{
+   struct virtio_driver *drv = drv_to_virtio(vdev->dev.driver);
+
+   return drv->notify ? drv->notify(vdev, event) : NOTIFY_DONE;
+}
+EXPORT_SYMBOL_GPL(notify_virtio_device);
+
 int register_virtio_device(struct virtio_device *dev)
 {
int err;
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index f15f6e7..da18e9a 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -110,6 +110,15 @@ int register_virtio_device(struct virtio_device *dev);
 void unregister_virtio_device(struct virtio_device *dev);
 
 /**
+ * notify event values
+ * @VDEV_GONE: device gone
+ */
+enum {
+   VDEV_GONE   = 1,
+};
+int notify_virtio_device(struct virtio_device *dev, int event);
+
+/**
  * virtio_driver - operations for a virtio I/O driver
  * @driver: underlying device driver (populate name and owner).
  * @id_table: the ids serviced by this driver.
@@ -129,6 +138,7 @@ struct virtio_driver {
void (*scan)(struct virtio_device *dev);
void (*remove)(struct virtio_device *dev);
void (*config_changed)(struct virtio_device *dev);
+   int (*notify)(struct virtio_device *dev, int event);
 #ifdef CONFIG_PM
int (*freeze)(struct virtio_device *dev);
int (*restore)(struct virtio_device *dev);
-- 
1.8.3.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC 0/3] virtio: add new notify() callback to virtio_driver

2013-11-21 Thread Heinz Graalfs

On 21/11/13 07:47, Michael S. Tsirkin wrote:

On Wed, Nov 20, 2013 at 04:22:00PM +0100, Heinz Graalfs wrote:

Hi,

when an active virtio block device is hot-unplugged from a KVM guest, running
affected guest user applications are not aware of any errors that occur due
to the lost device. This patch-set adds code to avoid further request queueing
when a lost block device is detected, resulting in appropriate error info.

On System z there exists no handshake mechanism between host and guest
when a device is hot-unplugged. The device is removed and no further I/O
is possible.

When an online channel device disappears on System z the kernel's CIO layer
informs the driver (virtio_ccw) about the lost device.

It's just the block device drivers that care to provide a notify
callback.

Here are some more error details:

For a particular block device virtio's request function virtblk_request()
is called by the block layer to queue requests to be handled by the host.
In case of a lost device requests can still be queued, but an appropriate
subsequent host kick usually fails. This leads to situations where no error
feedback is shown.

In order to prevent request queueing for lost devices appropriate settings
in the block layer should be made. Exploiting System z's CIO notify handler
callback, and adding a corresponding new virtio_driver notify() handler to
'inform' the block layer, solve this task.

Patch 1 adds an optional notify() callback to virtio_driver.

Patch 2 adds a new notify() callback for the virtio_blk driver. When called
for a lost device settings are made to prevent future request queueing.

Patch 3 modifies the CIO notify handler in virtio_ccw's transport layer to pass
on the lost device info to virtio's backend driver virtio_blk.


Question: I guess remove callback is invoked eventually?
Could you please clarify why isn't this sufficient?



yes, the remove callback is invoked lateron, and it could be done there. 
However, it should be done conditionally, and prior to invoking 
del_gendisk() (which triggers final I/O). We would still have the need 
for such notification information. The remove callback is also invoked 
when a device is set offline, and in that case we don't want a queue to 
reject further requests. The way it is done right doesn't affect the 
remove callback. The weird situation, however, is solved by the new 
notify callback.


Doing it in blk_cleanup_queue() (also triggered from virtblk_remove()) 
is too late for this scenario of a lost device. One wouldn't see any 
errors, but experience a 'hang' due to inclomplete I/O. The invocation 
of virtblk_request() indirectly caused by del_gendisk() would accept 
requests, the subsequent host notification, however, would fail. (This 
is probably another 'window' that should be closed.)






Heinz Graalfs (3):
   virtio: add notify() callback to virtio_driver
   virtio_blk: add virtblk_notify() as virtio_driver's notify() callback
   virtio_ccw: invoke virtio_driver's notify() on CIO_GONE notification

  drivers/block/virtio_blk.c| 14 ++
  drivers/s390/kvm/virtio_ccw.c | 14 --
  drivers/virtio/virtio.c   |  8 
  include/linux/virtio.h| 10 ++
  4 files changed, 44 insertions(+), 2 deletions(-)

--
1.8.3.1




___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v2 RFC 2/3] virtio_blk: add virtblk_notify() as virtio_driver's notify() callback

2013-11-21 Thread Heinz Graalfs
Add virtblk_notify() as virtio_driver's notify() callback.

When a transport driver is notified that a device disappeared it
should invoke this callback to prevent further request queueing.

Subsequent block layer calls of virtio_blk's request function will
fail, resulting in appropriate I/O errors.

Signed-off-by: Heinz Graalfs 
---
 drivers/block/virtio_blk.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 2d43be4..04fd635 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define PART_BITS 4
 
@@ -901,6 +902,21 @@ static void virtblk_remove(struct virtio_device *vdev)
ida_simple_remove(&vd_index_ida, index);
 }
 
+static int virtblk_notify(struct virtio_device *vdev, int event)
+{
+   struct virtio_blk *vblk = vdev->priv;
+   unsigned long flags;
+
+   if (event == VDEV_GONE) {
+   spin_lock_irqsave(vblk->disk->queue->queue_lock, flags);
+   queue_flag_set(QUEUE_FLAG_DYING, vblk->disk->queue);
+   queue_flag_set(QUEUE_FLAG_NOMERGES, vblk->disk->queue);
+   queue_flag_set(QUEUE_FLAG_NOXMERGES, vblk->disk->queue);
+   spin_unlock_irqrestore(vblk->disk->queue->queue_lock, flags);
+   }
+   return NOTIFY_DONE;
+}
+
 #ifdef CONFIG_PM
 static int virtblk_freeze(struct virtio_device *vdev)
 {
@@ -961,6 +977,7 @@ static struct virtio_driver virtio_blk = {
.probe  = virtblk_probe,
.remove = virtblk_remove,
.config_changed = virtblk_config_changed,
+   .notify = virtblk_notify,
 #ifdef CONFIG_PM
.freeze = virtblk_freeze,
.restore= virtblk_restore,
-- 
1.8.3.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v2 RFC 3/3] virtio_ccw: invoke virtio_driver's notify() on CIO_GONE notification

2013-11-21 Thread Heinz Graalfs
virtio_ccw's notify() callback for the common IO layer invokes
virtio_driver's notify() callback to pass-on information to a
backend driver if an online device disappeared.

Signed-off-by: Heinz Graalfs 
---
 drivers/s390/kvm/virtio_ccw.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/s390/kvm/virtio_ccw.c b/drivers/s390/kvm/virtio_ccw.c
index 35b9aaa..682f688 100644
--- a/drivers/s390/kvm/virtio_ccw.c
+++ b/drivers/s390/kvm/virtio_ccw.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1064,8 +1065,18 @@ out_free:
 
 static int virtio_ccw_cio_notify(struct ccw_device *cdev, int event)
 {
-   /* TODO: Check whether we need special handling here. */
-   return 0;
+   int rc;
+   struct virtio_ccw_device *vcdev = dev_get_drvdata(&cdev->dev);
+
+   switch (event) {
+   case CIO_GONE:
+   rc = notify_virtio_device(&vcdev->vdev, VDEV_GONE);
+   break;
+   default:
+   rc = NOTIFY_DONE;
+   break;
+   }
+   return rc;
 }
 
 static struct ccw_device_id virtio_ids[] = {
-- 
1.8.3.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2 RFC 3/3] virtio_ccw: invoke virtio_driver's notify() on CIO_GONE notification

2013-11-21 Thread Heinz Graalfs

On 21/11/13 16:15, Michael S. Tsirkin wrote:

On Thu, Nov 21, 2013 at 03:45:33PM +0100, Heinz Graalfs wrote:

virtio_ccw's notify() callback for the common IO layer invokes
virtio_driver's notify() callback to pass-on information to a
backend driver if an online device disappeared.

Signed-off-by: Heinz Graalfs 


simple question: is this serialized with device removal?
If this races with removal we have a problem.



notify and remove callbacks are not serialized.

Additional processing in the notify handler is now locked by the queue lock.

If remove is already active (e.g. driver unregister) and performing its 
cleanup (via virtblk_remove), we should definitely perform the (locked) 
request queue processing in notify (little later), because if a notify 
comes in, the device is GONE (not going away). Earlier triggered cleanup 
processing is about to fail anyway.



---
  drivers/s390/kvm/virtio_ccw.c | 15 +--
  1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/s390/kvm/virtio_ccw.c b/drivers/s390/kvm/virtio_ccw.c
index 35b9aaa..682f688 100644
--- a/drivers/s390/kvm/virtio_ccw.c
+++ b/drivers/s390/kvm/virtio_ccw.c
@@ -27,6 +27,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -1064,8 +1065,18 @@ out_free:

  static int virtio_ccw_cio_notify(struct ccw_device *cdev, int event)
  {
-   /* TODO: Check whether we need special handling here. */
-   return 0;
+   int rc;
+   struct virtio_ccw_device *vcdev = dev_get_drvdata(&cdev->dev);
+
+   switch (event) {
+   case CIO_GONE:
+   rc = notify_virtio_device(&vcdev->vdev, VDEV_GONE);
+   break;
+   default:
+   rc = NOTIFY_DONE;
+   break;
+   }
+   return rc;
  }

  static struct ccw_device_id virtio_ids[] = {
--
1.8.3.1




___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH RFC 3/3] virtio_ccw: invoke virtio_driver's notify() on CIO_GONE notification

2013-11-20 Thread Heinz Graalfs
virtio_ccw's notify() callback for the common IO layer invokes
virtio_driver's notify() callback to pass-on information to a
backend driver if an online device disappeared.

Signed-off-by: Heinz Graalfs 
---
 drivers/s390/kvm/virtio_ccw.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/s390/kvm/virtio_ccw.c b/drivers/s390/kvm/virtio_ccw.c
index 35b9aaa..420010d 100644
--- a/drivers/s390/kvm/virtio_ccw.c
+++ b/drivers/s390/kvm/virtio_ccw.c
@@ -1064,8 +1064,18 @@ out_free:
 
 static int virtio_ccw_cio_notify(struct ccw_device *cdev, int event)
 {
-   /* TODO: Check whether we need special handling here. */
-   return 0;
+   int rc;
+   struct virtio_ccw_device *vcdev = dev_get_drvdata(&cdev->dev);
+
+   switch (event) {
+   case CIO_GONE:
+   rc = notify_virtio_device(&vcdev->vdev, VDEV_GONE);
+   break;
+   default:
+   rc = NOTIFY_DONE;
+   break;
+   }
+   return rc;
 }
 
 static struct ccw_device_id virtio_ids[] = {
-- 
1.8.3.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH RFC 2/3] virtio_blk: add virtblk_notify() as virtio_driver's notify() callback

2013-11-20 Thread Heinz Graalfs
Add virtblk_notify() as virtio_driver's notify() callback.

When a transport driver is notified that a device disappeared it
should invoke this callback to prevent further request queueing.

Subsequent block layer calls of virtio_blk's request function will
fail, resulting in appropriate I/O errors.

Signed-off-by: Heinz Graalfs 
---
 drivers/block/virtio_blk.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 2d43be4..7fc1d62 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -901,6 +901,19 @@ static void virtblk_remove(struct virtio_device *vdev)
ida_simple_remove(&vd_index_ida, index);
 }
 
+static int virtblk_notify(struct virtio_device *vdev, int event)
+{
+   struct virtio_blk *vblk = vdev->priv;
+
+   if (event == VDEV_GONE) {
+   queue_flag_set_unlocked(QUEUE_FLAG_DYING, vblk->disk->queue);
+   queue_flag_set_unlocked(QUEUE_FLAG_NOMERGES, vblk->disk->queue);
+   queue_flag_set_unlocked(QUEUE_FLAG_NOXMERGES,
+   vblk->disk->queue);
+   }
+   return NOTIFY_DONE;
+}
+
 #ifdef CONFIG_PM
 static int virtblk_freeze(struct virtio_device *vdev)
 {
@@ -961,6 +974,7 @@ static struct virtio_driver virtio_blk = {
.probe  = virtblk_probe,
.remove = virtblk_remove,
.config_changed = virtblk_config_changed,
+   .notify = virtblk_notify,
 #ifdef CONFIG_PM
.freeze = virtblk_freeze,
.restore= virtblk_restore,
-- 
1.8.3.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH RFC 0/3] virtio: add new notify() callback to virtio_driver

2013-11-20 Thread Heinz Graalfs
Hi,

when an active virtio block device is hot-unplugged from a KVM guest, running
affected guest user applications are not aware of any errors that occur due
to the lost device. This patch-set adds code to avoid further request queueing
when a lost block device is detected, resulting in appropriate error info.

On System z there exists no handshake mechanism between host and guest
when a device is hot-unplugged. The device is removed and no further I/O
is possible.

When an online channel device disappears on System z the kernel's CIO layer
informs the driver (virtio_ccw) about the lost device.

It's just the block device drivers that care to provide a notify
callback.

Here are some more error details:

For a particular block device virtio's request function virtblk_request()
is called by the block layer to queue requests to be handled by the host.
In case of a lost device requests can still be queued, but an appropriate
subsequent host kick usually fails. This leads to situations where no error
feedback is shown.

In order to prevent request queueing for lost devices appropriate settings
in the block layer should be made. Exploiting System z's CIO notify handler
callback, and adding a corresponding new virtio_driver notify() handler to
'inform' the block layer, solve this task.

Patch 1 adds an optional notify() callback to virtio_driver.

Patch 2 adds a new notify() callback for the virtio_blk driver. When called
for a lost device settings are made to prevent future request queueing.

Patch 3 modifies the CIO notify handler in virtio_ccw's transport layer to pass
on the lost device info to virtio's backend driver virtio_blk.

Heinz Graalfs (3):
  virtio: add notify() callback to virtio_driver
  virtio_blk: add virtblk_notify() as virtio_driver's notify() callback
  virtio_ccw: invoke virtio_driver's notify() on CIO_GONE notification

 drivers/block/virtio_blk.c| 14 ++
 drivers/s390/kvm/virtio_ccw.c | 14 --
 drivers/virtio/virtio.c   |  8 
 include/linux/virtio.h| 10 ++
 4 files changed, 44 insertions(+), 2 deletions(-)

-- 
1.8.3.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH RFC 1/3] virtio: add notify() callback to virtio_driver

2013-11-20 Thread Heinz Graalfs
Add an optional notify() callback to virtio_driver. A backend
driver can provide this callback to perform actions for a lost
device.

notify() event values are inherited from virtio_ccw's notify()
callback. We might want to support even more of them lateron.

notify() return values are defined in include/linux/notifier.h.

Signed-off-by: Heinz Graalfs 
---
 drivers/virtio/virtio.c |  8 
 include/linux/virtio.h  | 10 ++
 2 files changed, 18 insertions(+)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index ee59b74..a09abb4 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -186,6 +186,14 @@ void unregister_virtio_driver(struct virtio_driver *driver)
 }
 EXPORT_SYMBOL_GPL(unregister_virtio_driver);
 
+int notify_virtio_device(struct virtio_device *vdev, int event)
+{
+   struct virtio_driver *drv = drv_to_virtio(vdev->dev.driver);
+
+   return drv->notify ? drv->notify(vdev, event) : NOTIFY_DONE;
+}
+EXPORT_SYMBOL_GPL(notify_virtio_device);
+
 int register_virtio_device(struct virtio_device *dev)
 {
int err;
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index f15f6e7..da18e9a 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -110,6 +110,15 @@ int register_virtio_device(struct virtio_device *dev);
 void unregister_virtio_device(struct virtio_device *dev);
 
 /**
+ * notify event values
+ * @VDEV_GONE: device gone
+ */
+enum {
+   VDEV_GONE   = 1,
+};
+int notify_virtio_device(struct virtio_device *dev, int event);
+
+/**
  * virtio_driver - operations for a virtio I/O driver
  * @driver: underlying device driver (populate name and owner).
  * @id_table: the ids serviced by this driver.
@@ -129,6 +138,7 @@ struct virtio_driver {
void (*scan)(struct virtio_device *dev);
void (*remove)(struct virtio_device *dev);
void (*config_changed)(struct virtio_device *dev);
+   int (*notify)(struct virtio_device *dev, int event);
 #ifdef CONFIG_PM
int (*freeze)(struct virtio_device *dev);
int (*restore)(struct virtio_device *dev);
-- 
1.8.3.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH V2 RFC 9/9] virtio_scsi: verify if queue is broken after virtqueue_get_buf()

2013-11-04 Thread Heinz Graalfs
Rusty,

and here is patch 9 as you suggested.

Thanks.

Heinz Graalfs (9):
  virtio_ring: change host notification API
  virtio_ring: let virtqueue_{kick()/notify()} return a bool
  virtio_net: verify if virtqueue_kick() succeeded
  virtio_test: verify if virtqueue_kick() succeeded
  virtio_ring: add new function virtqueue_is_broken()
  virtio_blk: verify if queue is broken after virtqueue_get_buf()
  virtio_console: verify if queue is broken after virtqueue_get_buf()
  virtio_net: verify if queue is broken after virtqueue_get_buf()
  virtio_scsi: verify if queue is broken after virtqueue_get_buf()

 drivers/block/virtio_blk.c |  2 ++
 drivers/char/virtio_console.c  |  6 --
 drivers/lguest/lguest_device.c |  3 ++-
 drivers/net/virtio_net.c   | 12 +++-
 drivers/remoteproc/remoteproc_virtio.c |  3 ++-
 drivers/s390/kvm/kvm_virtio.c  |  8 ++--
 drivers/s390/kvm/virtio_ccw.c  |  5 -
 drivers/scsi/virtio_scsi.c |  3 +++
 drivers/virtio/virtio_mmio.c   |  3 ++-
 drivers/virtio/virtio_pci.c|  3 ++-
 drivers/virtio/virtio_ring.c   | 32 ++--
 include/linux/virtio.h |  6 --
 include/linux/virtio_ring.h|  2 +-
 tools/virtio/virtio_test.c |  6 --
 tools/virtio/vringh_test.c | 13 +
 15 files changed, 78 insertions(+), 29 deletions(-)

-- 
1.8.3.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH V2 RFC 9/9] virtio_scsi: verify if queue is broken after virtqueue_get_buf()

2013-11-04 Thread Heinz Graalfs
If virtqueue_get_buf() returned with a NULL pointer avoid a possibly
endless loop by checking for a broken virtqueue.

Signed-off-by: Heinz Graalfs 
---
 drivers/scsi/virtio_scsi.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 74b88ef..aa25aab 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -224,6 +224,9 @@ static void virtscsi_vq_done(struct virtio_scsi *vscsi,
virtqueue_disable_cb(vq);
while ((buf = virtqueue_get_buf(vq, &len)) != NULL)
fn(vscsi, buf);
+
+   if (unlikely(virtqueue_is_broken(vq)))
+   break;
} while (!virtqueue_enable_cb(vq));
spin_unlock_irqrestore(&virtscsi_vq->vq_lock, flags);
 }
-- 
1.8.3.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH V2 RFC 1/9] virtio_ring: change host notification API

2013-10-28 Thread Heinz Graalfs
Currently a host kick error is silently ignored and not reflected in
the virtqueue of a particular virtio device.

Changing the notify API for guest->host notification seems to be one
prerequisite in order to be able to handle such errors in the context
where the kick is triggered.

This patch changes the notify API. The notify function must return a
bool return value. It returns false if the host notification failed.

Signed-off-by: Heinz Graalfs 
---
 drivers/lguest/lguest_device.c |  3 ++-
 drivers/remoteproc/remoteproc_virtio.c |  3 ++-
 drivers/s390/kvm/kvm_virtio.c  |  8 ++--
 drivers/s390/kvm/virtio_ccw.c  |  5 -
 drivers/virtio/virtio_mmio.c   |  3 ++-
 drivers/virtio/virtio_pci.c|  3 ++-
 drivers/virtio/virtio_ring.c   |  4 ++--
 include/linux/virtio_ring.h|  2 +-
 tools/virtio/virtio_test.c |  3 ++-
 tools/virtio/vringh_test.c | 13 +
 10 files changed, 32 insertions(+), 15 deletions(-)

diff --git a/drivers/lguest/lguest_device.c b/drivers/lguest/lguest_device.c
index b3256ff..d0a1d8a 100644
--- a/drivers/lguest/lguest_device.c
+++ b/drivers/lguest/lguest_device.c
@@ -229,7 +229,7 @@ struct lguest_vq_info {
  * make a hypercall.  We hand the physical address of the virtqueue so the Host
  * knows which virtqueue we're talking about.
  */
-static void lg_notify(struct virtqueue *vq)
+static bool lg_notify(struct virtqueue *vq)
 {
/*
 * We store our virtqueue information in the "priv" pointer of the
@@ -238,6 +238,7 @@ static void lg_notify(struct virtqueue *vq)
struct lguest_vq_info *lvq = vq->priv;
 
hcall(LHCALL_NOTIFY, lvq->config.pfn << PAGE_SHIFT, 0, 0, 0);
+   return true;
 }
 
 /* An extern declaration inside a C file is bad form.  Don't do it. */
diff --git a/drivers/remoteproc/remoteproc_virtio.c 
b/drivers/remoteproc/remoteproc_virtio.c
index b09c75c..a34b506 100644
--- a/drivers/remoteproc/remoteproc_virtio.c
+++ b/drivers/remoteproc/remoteproc_virtio.c
@@ -30,7 +30,7 @@
 #include "remoteproc_internal.h"
 
 /* kick the remote processor, and let it know which virtqueue to poke at */
-static void rproc_virtio_notify(struct virtqueue *vq)
+static bool rproc_virtio_notify(struct virtqueue *vq)
 {
struct rproc_vring *rvring = vq->priv;
struct rproc *rproc = rvring->rvdev->rproc;
@@ -39,6 +39,7 @@ static void rproc_virtio_notify(struct virtqueue *vq)
dev_dbg(&rproc->dev, "kicking vq index: %d\n", notifyid);
 
rproc->ops->kick(rproc, notifyid);
+   return true;
 }
 
 /**
diff --git a/drivers/s390/kvm/kvm_virtio.c b/drivers/s390/kvm/kvm_virtio.c
index 2ea6165..0b51fa7 100644
--- a/drivers/s390/kvm/kvm_virtio.c
+++ b/drivers/s390/kvm/kvm_virtio.c
@@ -166,11 +166,15 @@ static void kvm_reset(struct virtio_device *vdev)
  * make a hypercall.  We hand the address  of the virtqueue so the Host
  * knows which virtqueue we're talking about.
  */
-static void kvm_notify(struct virtqueue *vq)
+static bool kvm_notify(struct virtqueue *vq)
 {
+   long rc;
struct kvm_vqconfig *config = vq->priv;
 
-   kvm_hypercall1(KVM_S390_VIRTIO_NOTIFY, config->address);
+   rc = kvm_hypercall1(KVM_S390_VIRTIO_NOTIFY, config->address);
+   if (rc < 0)
+   return false;
+   return true;
 }
 
 /*
diff --git a/drivers/s390/kvm/virtio_ccw.c b/drivers/s390/kvm/virtio_ccw.c
index 6a410d4..be2ed27 100644
--- a/drivers/s390/kvm/virtio_ccw.c
+++ b/drivers/s390/kvm/virtio_ccw.c
@@ -375,7 +375,7 @@ static inline long do_kvm_notify(struct subchannel_id schid,
return __rc;
 }
 
-static void virtio_ccw_kvm_notify(struct virtqueue *vq)
+static bool virtio_ccw_kvm_notify(struct virtqueue *vq)
 {
struct virtio_ccw_vq_info *info = vq->priv;
struct virtio_ccw_device *vcdev;
@@ -384,6 +384,9 @@ static void virtio_ccw_kvm_notify(struct virtqueue *vq)
vcdev = to_vc_device(info->vq->vdev);
ccw_device_get_schid(vcdev->cdev, &schid);
info->cookie = do_kvm_notify(schid, vq->index, info->cookie);
+   if (info->cookie < 0)
+   return false;
+   return true;
 }
 
 static int virtio_ccw_read_vq_conf(struct virtio_ccw_device *vcdev,
diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
index 1ba0d68..e9fdeb8 100644
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -219,13 +219,14 @@ static void vm_reset(struct virtio_device *vdev)
 /* Transport interface */
 
 /* the notify function used when creating a virt queue */
-static void vm_notify(struct virtqueue *vq)
+static bool vm_notify(struct virtqueue *vq)
 {
struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vq->vdev);
 
/* We write the queue's selector into the notification register to
 * signal the other

[PATCH V2 RFC 1/9] virtio_ring: change host notification API

2013-10-28 Thread Heinz Graalfs
Rusty,

here is just patch 1 (using bool as return value in notify API).

Thanks.

Heinz Graalfs (9):
  virtio_ring: change host notification API
  virtio_ring: let virtqueue_{kick()/notify()} return a bool
  virtio_net: verify if virtqueue_kick() succeeded
  virtio_test: verify if virtqueue_kick() succeeded
  virtio_ring: add new function virtqueue_is_broken()
  virtio_blk: verify if queue is broken after virtqueue_get_buf()
  virtio_console: verify if queue is broken after virtqueue_get_buf()
  virtio_net: verify if queue is broken after virtqueue_get_buf()
  virtio_scsi: verify if queue is broken after virtqueue_get_buf()

 drivers/block/virtio_blk.c |  2 ++
 drivers/char/virtio_console.c  |  6 --
 drivers/lguest/lguest_device.c |  3 ++-
 drivers/net/virtio_net.c   | 12 +++-
 drivers/remoteproc/remoteproc_virtio.c |  3 ++-
 drivers/s390/kvm/kvm_virtio.c  |  8 ++--
 drivers/s390/kvm/virtio_ccw.c  |  5 -
 drivers/scsi/virtio_scsi.c |  3 ++-
 drivers/virtio/virtio_mmio.c   |  3 ++-
 drivers/virtio/virtio_pci.c|  3 ++-
 drivers/virtio/virtio_ring.c   | 32 ++--
 include/linux/virtio.h |  6 --
 include/linux/virtio_ring.h|  2 +-
 tools/virtio/virtio_test.c |  6 --
 tools/virtio/vringh_test.c | 13 +
 15 files changed, 77 insertions(+), 30 deletions(-)

-- 
1.8.3.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH V2 RFC 9/9] virtio_scsi: verify if queue is broken after virtqueue_get_buf()

2013-10-24 Thread Heinz Graalfs
If virtqueue_get_buf() returned with a NULL pointer avoid a possibly
endless loop by checking for a broken virtqueue.

Signed-off-by: Heinz Graalfs 
---
 drivers/scsi/virtio_scsi.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 74b88ef..45bcdb5 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -224,7 +224,8 @@ static void virtscsi_vq_done(struct virtio_scsi *vscsi,
virtqueue_disable_cb(vq);
while ((buf = virtqueue_get_buf(vq, &len)) != NULL)
fn(vscsi, buf);
-   } while (!virtqueue_enable_cb(vq));
+   } while (unlikely(!virtqueue_is_broken(vq)) &&
+!virtqueue_enable_cb(vq));
spin_unlock_irqrestore(&virtscsi_vq->vq_lock, flags);
 }
 
-- 
1.8.3.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH V2 RFC 6/9] virtio_blk: verify if queue is broken after virtqueue_get_buf()

2013-10-24 Thread Heinz Graalfs
In case virtqueue_get_buf() returned with a NULL pointer verify if the
virtqueue is broken in order to leave while loop.

Signed-off-by: Heinz Graalfs 
---
 drivers/block/virtio_blk.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 6472395..2d43be4 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -292,6 +292,8 @@ static void virtblk_done(struct virtqueue *vq)
req_done = true;
}
}
+   if (unlikely(virtqueue_is_broken(vq)))
+   break;
} while (!virtqueue_enable_cb(vq));
/* In case queue is stopped waiting for more buffers. */
if (req_done)
-- 
1.8.3.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH V2 RFC 5/9] virtio_ring: add new function virtqueue_is_broken()

2013-10-24 Thread Heinz Graalfs
Add new function virtqueue_is_broken(). Callers of virtqueue_get_buf()
should check for a broken queue.

Signed-off-by: Heinz Graalfs 
---
 drivers/virtio/virtio_ring.c | 8 
 include/linux/virtio.h   | 2 ++
 2 files changed, 10 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index a7e0eec..7eb844b 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -848,4 +848,12 @@ unsigned int virtqueue_get_vring_size(struct virtqueue 
*_vq)
 }
 EXPORT_SYMBOL_GPL(virtqueue_get_vring_size);
 
+bool virtqueue_is_broken(struct virtqueue *_vq)
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+
+   return vq->broken;
+}
+EXPORT_SYMBOL_GPL(virtqueue_is_broken);
+
 MODULE_LICENSE("GPL");
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 4557e8a..f15f6e7 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -76,6 +76,8 @@ void *virtqueue_detach_unused_buf(struct virtqueue *vq);
 
 unsigned int virtqueue_get_vring_size(struct virtqueue *vq);
 
+bool virtqueue_is_broken(struct virtqueue *vq);
+
 /**
  * virtio_device - representation of a device using virtio
  * @index: unique position on the virtio bus
-- 
1.8.3.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH V2 RFC 7/9] virtio_console: verify if queue is broken after virtqueue_get_buf()

2013-10-24 Thread Heinz Graalfs
If virtqueue_get_buf() returns with a NULL pointer it should be verified
if the virtqueue is broken, in order to avoid loop calling cpu_relax().

Signed-off-by: Heinz Graalfs 
---
 drivers/char/virtio_console.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index 1b456fe..9dce2f0 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -574,7 +574,8 @@ static ssize_t __send_control_msg(struct ports_device 
*portdev, u32 port_id,
spin_lock(&portdev->c_ovq_lock);
if (virtqueue_add_outbuf(vq, sg, 1, &cpkt, GFP_ATOMIC) == 0) {
virtqueue_kick(vq);
-   while (!virtqueue_get_buf(vq, &len))
+   while (!virtqueue_get_buf(vq, &len)
+   && !virtqueue_is_broken(vq))
cpu_relax();
}
spin_unlock(&portdev->c_ovq_lock);
@@ -647,7 +648,8 @@ static ssize_t __send_to_port(struct port *port, struct 
scatterlist *sg,
 * we need to kmalloc a GFP_ATOMIC buffer each time the
 * console driver writes something out.
 */
-   while (!virtqueue_get_buf(out_vq, &len))
+   while (!virtqueue_get_buf(out_vq, &len)
+   && !virtqueue_is_broken(out_vq))
cpu_relax();
 done:
spin_unlock_irqrestore(&port->outvq_lock, flags);
-- 
1.8.3.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH V2 RFC 2/9] virtio_ring: let virtqueue_{kick()/notify()} return a bool

2013-10-24 Thread Heinz Graalfs
virtqueue_{kick()/notify()} should exploit the new host notification API.
If the notify call returned with a negative value the host kick failed
(e.g. a kick triggered after a device was hot-unplugged). In this case
the virtqueue is set to 'broken' and false is returned, otherwise true.

Signed-off-by: Heinz Graalfs 
---
 drivers/virtio/virtio_ring.c | 20 
 include/linux/virtio.h   |  4 ++--
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index aa40f6c..a7e0eec 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -459,13 +459,22 @@ EXPORT_SYMBOL_GPL(virtqueue_kick_prepare);
  * @vq: the struct virtqueue
  *
  * This does not need to be serialized.
+ *
+ * Returns false if host notify failed or queue is broken, otherwise true.
  */
-void virtqueue_notify(struct virtqueue *_vq)
+bool virtqueue_notify(struct virtqueue *_vq)
 {
struct vring_virtqueue *vq = to_vvq(_vq);
 
+   if (unlikely(vq->broken))
+   return false;
+
/* Prod other side to tell it about changes. */
-   vq->notify(_vq);
+   if (vq->notify(_vq) < 0) {
+   vq->broken = true;
+   return false;
+   }
+   return true;
 }
 EXPORT_SYMBOL_GPL(virtqueue_notify);
 
@@ -478,11 +487,14 @@ EXPORT_SYMBOL_GPL(virtqueue_notify);
  *
  * Caller must ensure we don't call this with other virtqueue
  * operations at the same time (except where noted).
+ *
+ * Returns false if kick failed, otherwise true.
  */
-void virtqueue_kick(struct virtqueue *vq)
+bool virtqueue_kick(struct virtqueue *vq)
 {
if (virtqueue_kick_prepare(vq))
-   virtqueue_notify(vq);
+   return virtqueue_notify(vq);
+   return true;
 }
 EXPORT_SYMBOL_GPL(virtqueue_kick);
 
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 9ff8645..4557e8a 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -58,11 +58,11 @@ int virtqueue_add_sgs(struct virtqueue *vq,
  void *data,
  gfp_t gfp);
 
-void virtqueue_kick(struct virtqueue *vq);
+bool virtqueue_kick(struct virtqueue *vq);
 
 bool virtqueue_kick_prepare(struct virtqueue *vq);
 
-void virtqueue_notify(struct virtqueue *vq);
+bool virtqueue_notify(struct virtqueue *vq);
 
 void *virtqueue_get_buf(struct virtqueue *vq, unsigned int *len);
 
-- 
1.8.3.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH V2 RFC 1/9] virtio_ring: change host notification API

2013-10-24 Thread Heinz Graalfs
Currently a host kick error is silently ignored and not reflected in
the virtqueue of a particular virtio device.

Changing the notify API for guest->host notification seems to be one
prerequisite in order to be able to handle such errors in the context
where the kick is triggered.

This patch changes the notify API. The notify function must return a
negative int return value in case the host notification failed.

Signed-off-by: Heinz Graalfs 
---
 drivers/lguest/lguest_device.c |  3 ++-
 drivers/remoteproc/remoteproc_virtio.c |  3 ++-
 drivers/s390/kvm/kvm_virtio.c  |  8 ++--
 drivers/s390/kvm/virtio_ccw.c  |  5 -
 drivers/virtio/virtio_mmio.c   |  3 ++-
 drivers/virtio/virtio_pci.c|  3 ++-
 drivers/virtio/virtio_ring.c   |  4 ++--
 include/linux/virtio_ring.h|  2 +-
 tools/virtio/virtio_test.c |  3 ++-
 tools/virtio/vringh_test.c | 13 +
 10 files changed, 32 insertions(+), 15 deletions(-)

diff --git a/drivers/lguest/lguest_device.c b/drivers/lguest/lguest_device.c
index b3256ff..1275f18 100644
--- a/drivers/lguest/lguest_device.c
+++ b/drivers/lguest/lguest_device.c
@@ -229,7 +229,7 @@ struct lguest_vq_info {
  * make a hypercall.  We hand the physical address of the virtqueue so the Host
  * knows which virtqueue we're talking about.
  */
-static void lg_notify(struct virtqueue *vq)
+static int lg_notify(struct virtqueue *vq)
 {
/*
 * We store our virtqueue information in the "priv" pointer of the
@@ -238,6 +238,7 @@ static void lg_notify(struct virtqueue *vq)
struct lguest_vq_info *lvq = vq->priv;
 
hcall(LHCALL_NOTIFY, lvq->config.pfn << PAGE_SHIFT, 0, 0, 0);
+   return 0;
 }
 
 /* An extern declaration inside a C file is bad form.  Don't do it. */
diff --git a/drivers/remoteproc/remoteproc_virtio.c 
b/drivers/remoteproc/remoteproc_virtio.c
index b09c75c..39723ee 100644
--- a/drivers/remoteproc/remoteproc_virtio.c
+++ b/drivers/remoteproc/remoteproc_virtio.c
@@ -30,7 +30,7 @@
 #include "remoteproc_internal.h"
 
 /* kick the remote processor, and let it know which virtqueue to poke at */
-static void rproc_virtio_notify(struct virtqueue *vq)
+static int rproc_virtio_notify(struct virtqueue *vq)
 {
struct rproc_vring *rvring = vq->priv;
struct rproc *rproc = rvring->rvdev->rproc;
@@ -39,6 +39,7 @@ static void rproc_virtio_notify(struct virtqueue *vq)
dev_dbg(&rproc->dev, "kicking vq index: %d\n", notifyid);
 
rproc->ops->kick(rproc, notifyid);
+   return 0;
 }
 
 /**
diff --git a/drivers/s390/kvm/kvm_virtio.c b/drivers/s390/kvm/kvm_virtio.c
index 2ea6165..4097a46 100644
--- a/drivers/s390/kvm/kvm_virtio.c
+++ b/drivers/s390/kvm/kvm_virtio.c
@@ -166,11 +166,15 @@ static void kvm_reset(struct virtio_device *vdev)
  * make a hypercall.  We hand the address  of the virtqueue so the Host
  * knows which virtqueue we're talking about.
  */
-static void kvm_notify(struct virtqueue *vq)
+static int kvm_notify(struct virtqueue *vq)
 {
+   long rc;
struct kvm_vqconfig *config = vq->priv;
 
-   kvm_hypercall1(KVM_S390_VIRTIO_NOTIFY, config->address);
+   rc = kvm_hypercall1(KVM_S390_VIRTIO_NOTIFY, config->address);
+   if (rc < 0)
+   return rc;
+   return 0;
 }
 
 /*
diff --git a/drivers/s390/kvm/virtio_ccw.c b/drivers/s390/kvm/virtio_ccw.c
index 6a410d4..c640ecd 100644
--- a/drivers/s390/kvm/virtio_ccw.c
+++ b/drivers/s390/kvm/virtio_ccw.c
@@ -375,7 +375,7 @@ static inline long do_kvm_notify(struct subchannel_id schid,
return __rc;
 }
 
-static void virtio_ccw_kvm_notify(struct virtqueue *vq)
+static int virtio_ccw_kvm_notify(struct virtqueue *vq)
 {
struct virtio_ccw_vq_info *info = vq->priv;
struct virtio_ccw_device *vcdev;
@@ -384,6 +384,9 @@ static void virtio_ccw_kvm_notify(struct virtqueue *vq)
vcdev = to_vc_device(info->vq->vdev);
ccw_device_get_schid(vcdev->cdev, &schid);
info->cookie = do_kvm_notify(schid, vq->index, info->cookie);
+   if (info->cookie < 0)
+   return info->cookie;
+   return 0;
 }
 
 static int virtio_ccw_read_vq_conf(struct virtio_ccw_device *vcdev,
diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
index 1ba0d68..e213991 100644
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -219,13 +219,14 @@ static void vm_reset(struct virtio_device *vdev)
 /* Transport interface */
 
 /* the notify function used when creating a virt queue */
-static void vm_notify(struct virtqueue *vq)
+static int vm_notify(struct virtqueue *vq)
 {
struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vq->vdev);
 
/* We write the queue's selector into the notification register to
 * signal the other end */
write

[PATCH V2 RFC 3/9] virtio_net: verify if virtqueue_kick() succeeded

2013-10-24 Thread Heinz Graalfs
Verify if a host kick succeeded by checking return value of virtqueue_kick().

Signed-off-by: Heinz Graalfs 
---
 drivers/net/virtio_net.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index ab2e5d0..6584e3a 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -542,7 +542,8 @@ static bool try_fill_recv(struct receive_queue *rq, gfp_t 
gfp)
} while (rq->vq->num_free);
if (unlikely(rq->num > rq->max))
rq->max = rq->num;
-   virtqueue_kick(rq->vq);
+   if (unlikely(!virtqueue_kick(rq->vq)))
+   return false;
return !oom;
 }
 
@@ -728,7 +729,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct 
net_device *dev)
err = xmit_skb(sq, skb);
 
/* This should not happen! */
-   if (unlikely(err)) {
+   if (unlikely(err) || unlikely(!virtqueue_kick(sq->vq))) {
dev->stats.tx_fifo_errors++;
if (net_ratelimit())
dev_warn(&dev->dev,
@@ -737,7 +738,6 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct 
net_device *dev)
kfree_skb(skb);
return NETDEV_TX_OK;
}
-   virtqueue_kick(sq->vq);
 
/* Don't wait up for transmitted skbs to be freed. */
skb_orphan(skb);
@@ -796,7 +796,8 @@ static bool virtnet_send_command(struct virtnet_info *vi, 
u8 class, u8 cmd,
BUG_ON(virtqueue_add_sgs(vi->cvq, sgs, out_num, in_num, vi, GFP_ATOMIC)
   < 0);
 
-   virtqueue_kick(vi->cvq);
+   if (unlikely(!virtqueue_kick(vi->cvq)))
+   return status == VIRTIO_NET_OK;
 
/* Spin for a response, the kick causes an ioport write, trapping
 * into the hypervisor, so the request should be handled immediately.
-- 
1.8.3.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH V2 RFC 0/9] virtio: fix hang(loop) after hot-unplug vlan

2013-10-24 Thread Heinz Graalfs
Hi,

this patch-set solves a hang situation when a vlan network device is
hot-unplugged from a KVM guest.

On System z there exists no handshake mechanism between host and guest
when a device is hot-unplugged. The device is removed and no further I/O
is possible.

The guest is notified about the hard removal with a CRW machine check.
As per architecture, the host must repond to any I/O operation for the
removed device with an error condition as if the device had never been
there.

During machine check handling in the guest, virtio exit functions try to
perform cleanup operations by triggering final I/O, including appropriate
host kicks. These operations fail, or do not complete, and lead to several
kinds of hang situations. In particular, virtio-ccw guest->host notification
on an unplugged device will receive an error; this is, however, not reflected
back to the affected virtqueues.

Here are the details of the error.

A hang (loop) occurs when a machine check is handled on System z due to a
vlan device removal. A loop spinning for a response for final IO in
virtnet_send_command() will never complete successfully because of a previous
unsuccessfull host kick operation (virtqueue_kick()).


Patch [1] changes the guest->host notification API. A potential error returned
by the host during notify() should not be ignored, but used in order to reflect
the error back to the affected virtqueue.

Patch [2] changes virtqueue_kick() and virtqueue_notify() to return a bool
depending on the result of the host notification operation. If the host
kick failed the current virtqueue is now flagged as 'broken'.

Patches [3,4] add code to verify host kicks by testing the return value of
virtqueue_kick() in order to avoid potential loops.

Patch [5] adds a new function virtqueue_is_broken(). This function should be
used to verify the state of a virtqueue when a previous virtqueue_get_buf()
returned a NULL pointer.

Patch [6,7,8,9] add virtqueue_is_broken() calls to handle potential errors
when a virtqueue_bet_buf() doesn't deliver any more buffers.


Heinz Graalfs (9):
  virtio_ring: change host notification API
  virtio_ring: let virtqueue_{kick()/notify()} return a bool
  virtio_net: verify if virtqueue_kick() succeeded
  virtio_test: verify if virtqueue_kick() succeeded
  virtio_ring: add new function virtqueue_is_broken()
  virtio_blk: verify if queue is broken after virtqueue_get_buf()
  virtio_console: verify if queue is broken after virtqueue_get_buf()
  virtio_net: verify if queue is broken after virtqueue_get_buf()
  virtio_scsi: verify if queue is broken after virtqueue_get_buf()

 drivers/block/virtio_blk.c |  2 ++
 drivers/char/virtio_console.c  |  6 --
 drivers/lguest/lguest_device.c |  3 ++-
 drivers/net/virtio_net.c   | 12 +++-
 drivers/remoteproc/remoteproc_virtio.c |  3 ++-
 drivers/s390/kvm/kvm_virtio.c  |  8 ++--
 drivers/s390/kvm/virtio_ccw.c  |  5 -
 drivers/scsi/virtio_scsi.c |  3 ++-
 drivers/virtio/virtio_mmio.c   |  3 ++-
 drivers/virtio/virtio_pci.c|  3 ++-
 drivers/virtio/virtio_ring.c   | 32 ++--
 include/linux/virtio.h |  6 --
 include/linux/virtio_ring.h|  2 +-
 tools/virtio/virtio_test.c |  6 --
 tools/virtio/vringh_test.c | 13 +
 15 files changed, 77 insertions(+), 30 deletions(-)

-- 
1.8.3.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH V2 RFC 4/9] virtio_test: verify if virtqueue_kick() succeeded

2013-10-24 Thread Heinz Graalfs
Verify if a host kick succeeded by checking return value of virtqueue_kick().

Signed-off-by: Heinz Graalfs 
---
 tools/virtio/virtio_test.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/virtio/virtio_test.c b/tools/virtio/virtio_test.c
index 8e3e432..8691317 100644
--- a/tools/virtio/virtio_test.c
+++ b/tools/virtio/virtio_test.c
@@ -172,7 +172,8 @@ static void run_test(struct vdev_info *dev, struct vq_info 
*vq,
 GFP_ATOMIC);
if (likely(r == 0)) {
++started;
-   virtqueue_kick(vq->vq);
+   if (unlikely(!virtqueue_kick(vq->vq))
+   r = -1;
}
} else
r = -1;
-- 
1.8.3.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH V2 RFC 8/9] virtio_net: verify if queue is broken after virtqueue_get_buf()

2013-10-24 Thread Heinz Graalfs
If a virtqueue_get_buf() call returns a NULL pointer a possibly endless while
loop should be avoided by checking for a broken virtqueue.

Signed-off-by: Heinz Graalfs 
---
 drivers/net/virtio_net.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 6584e3a..f092b9d 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -802,7 +802,8 @@ static bool virtnet_send_command(struct virtnet_info *vi, 
u8 class, u8 cmd,
/* Spin for a response, the kick causes an ioport write, trapping
 * into the hypervisor, so the request should be handled immediately.
 */
-   while (!virtqueue_get_buf(vi->cvq, &tmp))
+   while (!virtqueue_get_buf(vi->cvq, &tmp) &&
+  !virtqueue_is_broken(vi->cvq))
cpu_relax();
 
return status == VIRTIO_NET_OK;
-- 
1.8.3.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH RFC 3/7] virtio_net: avoid cpu_relax() call loop in case virtqueue is broken

2013-10-22 Thread Heinz Graalfs
A virtqueue_kick() call to notify a host might fail in case the network
device was hot-unplugged.
Spinning for a response for a VIRTIO_NET_CTRL_VLAN_DEL command response
will end up in a never ending loop waiting for a response.
This patch avoids the cpu_relax() loop in case the virtqueue is flagged
as broken.

Signed-off-by: Heinz Graalfs 
---
 drivers/net/virtio_net.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index ab2e5d0..57f5f13 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -800,8 +800,10 @@ static bool virtnet_send_command(struct virtnet_info *vi, 
u8 class, u8 cmd,
 
/* Spin for a response, the kick causes an ioport write, trapping
 * into the hypervisor, so the request should be handled immediately.
+* Do not wait for a response in case the virtqueue is 'broken'.
 */
-   while (!virtqueue_get_buf(vi->cvq, &tmp))
+   while (!virtqueue_get_buf(vi->cvq, &tmp)
+   && !virtqueue_is_broken(vi->cvq))
cpu_relax();
 
return status == VIRTIO_NET_OK;
-- 
1.8.3.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH RFC 2/7] s390/virtio_ccw: set virtqueue as broken if host notify failed

2013-10-22 Thread Heinz Graalfs
Set the current virtqueue as broken if the appropriate host kick failed
(e.g. if the device was hot-unplugged via host means).
This error info can be exploited at various other places where a host
kick is triggered.

Signed-off-by: Heinz Graalfs 
---
 drivers/s390/kvm/virtio_ccw.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/s390/kvm/virtio_ccw.c b/drivers/s390/kvm/virtio_ccw.c
index 6a410d4..ac0ace6 100644
--- a/drivers/s390/kvm/virtio_ccw.c
+++ b/drivers/s390/kvm/virtio_ccw.c
@@ -384,6 +384,8 @@ static void virtio_ccw_kvm_notify(struct virtqueue *vq)
vcdev = to_vc_device(info->vq->vdev);
ccw_device_get_schid(vcdev->cdev, &schid);
info->cookie = do_kvm_notify(schid, vq->index, info->cookie);
+   if (info->cookie < 0)
+   virtqueue_set_broken(vq);
 }
 
 static int virtio_ccw_read_vq_conf(struct virtio_ccw_device *vcdev,
-- 
1.8.3.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH RFC 1/7] virtio_ring: add new functions virtqueue{_set_broken()/_is_broken()}

2013-10-22 Thread Heinz Graalfs
This patch adds 2 new functions:

virtqueue_set_broken(): to be called when a virtqueue kick operation fails.

virtqueue_is_broken(): can be called to query the virtqueue state after a host
was kicked.

Signed-off-by: Heinz Graalfs 
---
 drivers/virtio/virtio_ring.c | 16 
 include/linux/virtio.h   |  4 
 2 files changed, 20 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 5217baf..930ee39 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -836,4 +836,20 @@ unsigned int virtqueue_get_vring_size(struct virtqueue 
*_vq)
 }
 EXPORT_SYMBOL_GPL(virtqueue_get_vring_size);
 
+void virtqueue_set_broken(struct virtqueue *_vq)
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+
+   vq->broken = true;
+}
+EXPORT_SYMBOL_GPL(virtqueue_set_broken);
+
+bool virtqueue_is_broken(struct virtqueue *_vq)
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+
+   return vq->broken;
+}
+EXPORT_SYMBOL_GPL(virtqueue_is_broken);
+
 MODULE_LICENSE("GPL");
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 9ff8645..c09ae2b 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -76,6 +76,10 @@ void *virtqueue_detach_unused_buf(struct virtqueue *vq);
 
 unsigned int virtqueue_get_vring_size(struct virtqueue *vq);
 
+void virtqueue_set_broken(struct virtqueue *vq);
+
+bool virtqueue_is_broken(struct virtqueue *vq);
+
 /**
  * virtio_device - representation of a device using virtio
  * @index: unique position on the virtio bus
-- 
1.8.3.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH RFC 6/7] virtio_blk: set request queue as dying in case virtqueue is broken

2013-10-22 Thread Heinz Graalfs
The request queue should be flagged as QUEUE_FLAG_DYING in case the host
kick failed for a new virtqueue request.

Signed-off-by: Heinz Graalfs 
---
 drivers/block/virtio_blk.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index a787e6e..01b5d3a 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -377,8 +377,14 @@ static void virtblk_request(struct request_queue *q)
issued++;
}
 
-   if (issued)
+   if (issued) {
virtqueue_kick(vblk->vq);
+   if (virtqueue_is_broken(vblk->vq)) {
+   mutex_lock(&q->sysfs_lock);
+   queue_flag_set_unlocked(QUEUE_FLAG_DYING, q);
+   mutex_unlock(&q->sysfs_lock);
+   }
+   }
 }
 
 static void virtblk_make_request(struct request_queue *q, struct bio *bio)
-- 
1.8.3.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH RFC 7/7] virtio_blk: trigger IO errors in case virtqueue is broken

2013-10-22 Thread Heinz Graalfs
In case the virtqueue is flagged as broken, IO errors are triggered
for current request queue entries.

Signed-off-by: Heinz Graalfs 
---
 drivers/block/virtio_blk.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 01b5d3a..8eb91be 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -875,6 +875,20 @@ out:
return err;
 }
 
+static void virtblk_flush_request_queue(struct request_queue *q)
+{
+   spinlock_t *lock = q->queue_lock;
+   struct request *req;
+
+   if (!q)
+   return;
+
+   spin_lock_irq(lock);
+   while ((req = blk_fetch_request(q)))
+   __blk_end_request_all(req, -EIO);
+   spin_unlock_irq(lock);
+}
+
 static void virtblk_remove(struct virtio_device *vdev)
 {
struct virtio_blk *vblk = vdev->priv;
@@ -890,6 +904,7 @@ static void virtblk_remove(struct virtio_device *vdev)
virtqueue_notify(vblk->vq);
if (virtqueue_is_broken(vblk->vq)) {
queue_broken = true;
+   virtblk_flush_request_queue(vblk->disk->queue);
blk_cleanup_queue(vblk->disk->queue);
del_gendisk(vblk->disk);
} else {
-- 
1.8.3.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH RFC 0/7] virtio: avoid various hang situations during hot-unplug

2013-10-22 Thread Heinz Graalfs
Hi,

this patch-set tries to solve various hang situations when virtio devices
(network or block) are hot-unplugged from a KVM guest.

On System z there exists no handshake mechanism between host and guest
when a device is hot-unplugged. The device is removed and no further I/O
is possible.

The guest is notified about the hard removal with a CRW machine check.
As per architecture, the host must repond to any I/O operation for the
removed device with an error condition as if the device had never been
there.

During machine check handling in the guest, virtio exit functions try to
perform cleanup operations by triggering final I/O, including appropriate
host kicks. These operations fail, or do not complete, and lead to several
kinds of hang situations. In particular, virtio-ccw guest->host notification
on an unplugged device will receive an error; this is, however, not reflected
back to the affected virtqueues.

Here are the details for some of the errors.

In the network case a hang (loop) occurs when a machine check is handled
on System z due to a vlan device removal. A loop spinning for a response
for final IO in virtnet_send_command() will never complete successfully
because of a previous unsuccessfull host kick operation (virtqueue_kick()).

The below patches [1,2] flag the virtqueue as 'broken' when a host kick failure
is detected. Patch [3] exploits this error info to avoid an endless invocation
of cpu_relax() when waiting for the command to complete.

Hang situations also occur when a block device is hot-unplugged.

Several different errors occur when a block device with mounted file-system(s)
is hot-unplugged. Asynchronous writeback functions, as well as page cache read
or write operations end up in never ending wait situations. Hang situations
occur during device removal when virtblk_remove() invokes del_gendisk() to
synch dirty inode pages (invalidate_partition()).

The below patches [4,5,6,7] also exploit a 'broken' virtqueue in order to
trigger IO errors as well as to prevent final hanging IO operations.


Heinz Graalfs (7):
  virtio_ring: add new functions virtqueue{_set_broken()/_is_broken()}
  s390/virtio_ccw: set virtqueue as broken if host notify failed
  virtio_net: avoid cpu_relax() call loop in case virtqueue is broken
  virtio_blk: use dummy virtqueue_notify() to detect host kick error
  virtio_blk: do not free device id if virtqueue is broken
  virtio_blk: set request queue as dying in case virtqueue is broken
  virtio_blk: trigger IO errors in case virtqueue is broken

 drivers/block/virtio_blk.c| 41 -
 drivers/net/virtio_net.c  |  4 +++-
 drivers/s390/kvm/virtio_ccw.c |  2 ++
 drivers/virtio/virtio_ring.c  | 16 
 include/linux/virtio.h|  4 
 5 files changed, 61 insertions(+), 6 deletions(-)

-- 
1.8.3.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH RFC 5/7] virtio_blk: do not free device id if virtqueue is broken

2013-10-22 Thread Heinz Graalfs
Re-using the same device id when a device is re-added after it was
previously hot-unplugged should be avoided.
An additional test is added to check a potential broken virtqueue when
freeing the device id.

Signed-off-by: Heinz Graalfs 
---
 drivers/block/virtio_blk.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 98f081a..a787e6e 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -874,6 +874,7 @@ static void virtblk_remove(struct virtio_device *vdev)
struct virtio_blk *vblk = vdev->priv;
int index = vblk->index;
int refc;
+   bool queue_broken = false;
 
/* Prevent config work handler from accessing the device. */
mutex_lock(&vblk->config_lock);
@@ -882,6 +883,7 @@ static void virtblk_remove(struct virtio_device *vdev)
 
virtqueue_notify(vblk->vq);
if (virtqueue_is_broken(vblk->vq)) {
+   queue_broken = true;
blk_cleanup_queue(vblk->disk->queue);
del_gendisk(vblk->disk);
} else {
@@ -900,8 +902,10 @@ static void virtblk_remove(struct virtio_device *vdev)
vdev->config->del_vqs(vdev);
kfree(vblk);
 
-   /* Only free device id if we don't have any users */
-   if (refc == 1)
+   /* Only free device id if we don't have any users
+* and virtqueue is not broken due to a hot-unplugged device
+*/
+   if (refc == 1 && !queue_broken)
ida_simple_remove(&vd_index_ida, index);
 }
 
-- 
1.8.3.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH RFC 4/7] virtio_blk: use dummy virtqueue_notify() to detect host kick error

2013-10-22 Thread Heinz Graalfs
Deleting the disk and partitions in virtblk_remove() via del_gendisk() causes
never ending waits when trying to synch dirty inode pages.

A dummy virtqueue_notify() in virtblk_remove() is used to detect a host
notification error, latter occurs when block device was hot-unplugged.
When the dummy host kick failed blk_cleanup_queue() should be invoked
prior to del_gendisk().

Signed-off-by: Heinz Graalfs 
---
 drivers/block/virtio_blk.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 6472395..98f081a 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -880,8 +880,14 @@ static void virtblk_remove(struct virtio_device *vdev)
vblk->config_enable = false;
mutex_unlock(&vblk->config_lock);
 
-   del_gendisk(vblk->disk);
-   blk_cleanup_queue(vblk->disk->queue);
+   virtqueue_notify(vblk->vq);
+   if (virtqueue_is_broken(vblk->vq)) {
+   blk_cleanup_queue(vblk->disk->queue);
+   del_gendisk(vblk->disk);
+   } else {
+   del_gendisk(vblk->disk);
+   blk_cleanup_queue(vblk->disk->queue);
+   }
 
/* Stop all the virtqueues. */
vdev->config->reset(vdev);
-- 
1.8.3.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization