[RFC PATCH 0/3] scsi-generic and BLKSECTGET

2017-01-16 Thread Eric Farman
(cc'ing linux-scsi for the cover-letter; patches only to QEMU lists.)

In the Linux kernel, I see two (three) places where the BLKSECTGET ioctl is
handled:

(1) block/(compat_)ioctl.c -- (compat_)blkdev_ioctl
(2) drivers/scsi/sg.c -- sg_ioctl

The former has been around forever[1], and returns a short value measured in
sectors.  A sector is generally assumed to be 512 bytes.

The latter has been around for slightly less than forever[2], and returns an
int that measures the value in bytes.  A change to return the block count
was brought up a few years ago[3] and nacked.

As a convenient example, if I use the blockdev tool to drive the ioctl to a
SCSI disk and its scsi-generic equivalent, I get different results:

  # lsscsi -g
  [0:0:8:1077166114]disk    IBM      2107900          .217  /dev/sda   /dev/sg0
  # blockdev --getmaxsect /dev/sda
  2560
  # blockdev --getmaxsect /dev/sg0
  20

Now, the value for /dev/sda looks "correct" to me.

  # cd /sys/devices/css0/0.0.0125/0.0.1f69/host0/rport-0\:0-8/
  # cd target0\:0\:8/0\:0\:8\:1077166114/
  # cat block/sda/queue/max_sectors_kb
  1280
  # cat block/sda/queue/hw_sector_size
  512

And the math checks out:

  max_sectors_kb * 1024 / hw_sector_size == getmaxsect
  -OR-
  1280 * 1024 / 512 = 2560

For /dev/sg0, the answer appears to come from the sg_ioctl result, which is
already multiplied by the block size; only the upper half (a short) of the
returned big-endian fullword is then examined:

  (1280 * 1024 / 512) * 512 = 1310720 = x00140000 => x0014 = 20
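
The truncation above can be reproduced on any host with a small model. This is only a sketch: the helper names are invented for illustration, and the big-endian layout is simulated with a byte array rather than taken from the real ioctl path.

```c
#include <stdint.h>

/* Lay out a 32-bit value the way a big-endian machine stores it. */
static void store_be32(uint32_t value, unsigned char out[4])
{
    out[0] = (unsigned char)(value >> 24);
    out[1] = (unsigned char)(value >> 16);
    out[2] = (unsigned char)(value >> 8);
    out[3] = (unsigned char)value;
}

/* Read only the first two bytes, as a caller expecting a short would. */
static uint16_t load_be16(const unsigned char in[2])
{
    return (uint16_t)((in[0] << 8) | in[1]);
}

/*
 * Hypothetical helper: what a short-sized reader sees when sg hands back
 * a byte count in an int on a big-endian system.
 */
static uint16_t sg_value_seen_as_short(uint32_t max_sectors_kb,
                                       uint32_t sector_size)
{
    uint32_t sectors = max_sectors_kb * 1024u / sector_size;
    uint32_t bytes = sectors * sector_size;
    unsigned char word[4];

    store_be32(bytes, word);
    return load_be16(word);
}
```

With max_sectors_kb = 1280 and a 512-byte sector, the byte count is 1310720 (x00140000), and the first two bytes read back as 20, matching the blockdev output for /dev/sg0.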

The reason for all this?  Well, QEMU recently added a BLKSECTGET ioctl
call[4] which we see during guest boot.  This code presumes the value is in
blocks/sectors, and converts it to bytes[5].  Not that this matters, because
the short/int discrepancy gives us "zero" on s390x.

Also, that code doesn't execute for scsi-generic devices, so the conversion
to bytes is correct, but I'd like to extend this code to interrogate
scsi-generic devices as well.  This is important because libvirt converts
a specified virtio-scsi device to its /dev/sgX address for the QEMU
commandline.

So, do I have to code around the different field sizes (int vs short) as
well as scaling (bytes vs blocks)?  Obviously doable, but looking at the
resulting commits, I find myself feeling a little ill.
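
To make the question concrete, here is a rough sketch of what "coding around" both differences could look like in user space. It is not the code from the patches in this series. It assumes that any character device passed in is a scsi-generic node and that a sector is 512 bytes; the function names are invented for illustration.

```c
#include <linux/fs.h>      /* BLKSECTGET */
#include <sys/ioctl.h>
#include <sys/stat.h>

#define ASSUMED_SECTOR_SIZE 512LL   /* assumption: 512-byte sectors */

/* Block devices report a short measured in sectors. */
static long long sectors_to_bytes(unsigned short sectors)
{
    return (long long)sectors * ASSUMED_SECTOR_SIZE;
}

/*
 * Hypothetical helper: return the maximum transfer size in bytes,
 * or -1 on error or for an unsupported file type.
 */
static long long query_max_transfer_bytes(int fd)
{
    struct stat st;

    if (fstat(fd, &st) < 0)
        return -1;

    if (S_ISBLK(st.st_mode)) {
        unsigned short sectors = 0;   /* short-sized target */

        if (ioctl(fd, BLKSECTGET, &sectors) < 0)
            return -1;
        return sectors_to_bytes(sectors);
    }

    if (S_ISCHR(st.st_mode)) {
        int bytes = 0;                /* sg returns an int, in bytes */

        if (ioctl(fd, BLKSECTGET, &bytes) < 0)
            return -1;
        return bytes;
    }

    return -1;
}
```

Using a correctly sized target variable for each device type avoids the short/int overlap on big-endian systems, and the scaling is applied only on the block-device path.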

[1] The initial kernel git commit
[2] kernel commit 44ec95425c1d9dce6e4638c29e4362cfb44814e7
[3] https://lkml.org/lkml/2012/6/27/78
[4] qemu commit 6f6071745bd0366221f5a0160ed7d18d0e38b9f7
[5] qemu commit 5def6b80e1eca696c1fc6099e7f4d36729686402

Eric Farman (3):
  hw/scsi: Fix debug message of cdb structure in scsi-generic
  block: Fix target variable of BLKSECTGET ioctl
  block: get max_transfer limit for char (scsi-generic) devices

 block/file-posix.c     | 16 +++++++++++++---
 hw/scsi/scsi-generic.c |  5 +++--
 2 files changed, 16 insertions(+), 5 deletions(-)

-- 
2.8.4

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] virtio_scsi: Reject commands when virtqueue is broken

2017-01-13 Thread Eric Farman
In the case of a graceful set of detaches, where the virtio-scsi-ccw
disk is removed from the guest prior to the controller, the guest
behaves quite normally.  Specifically, the detach gets us into
sd_sync_cache to issue a Synchronize Cache(10) command, which
immediately fails (and is retried a couple of times) because the
device has been removed.  Later, the removal of the controller
sees two CRWs presented, but there's no further indication of the
removal from the guest viewpoint.

 [   17.217458] sd 0:0:0:0: [sda] Synchronizing SCSI cache
 [   17.219257] sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
 [   21.449400] crw_info : CRW reports slct=0, oflw=0, chn=1, rsc=3, anc=0, erc=4, rsid=2
 [   21.449406] crw_info : CRW reports slct=0, oflw=0, chn=0, rsc=3, anc=0, erc=4, rsid=0

However, on s390, the SCSI disks can be removed "by surprise" when
an entire controller (host) is removed and all associated disks
are removed via the loop in scsi_forget_host.  The same call to
sd_sync_cache is made, but because the controller has already
been removed, the Synchronize Cache(10) command is neither issued
(and then failed) nor rejected.

That the I/O isn't returned means the guest cannot have other devices
added or removed, and other tasks (such as shutdown or reboot) issued
by the guest will not complete either.  The virtio ring has already
been marked as broken (via virtio_break_device in virtio_ccw_remove),
but we still attempt to queue the command only to have it remain there.
The calling sequence provides a bit of distinction for us:

  virtscsi_queuecommand()
   -> virtscsi_kick_cmd()
    -> virtscsi_add_cmd()
     -> virtqueue_add_sgs()
      -> virtqueue_add()
         if success
           return 0
         elseif vq->broken or vring_mapping_error()
           return -EIO
         else
           return -ENOSPC

A return of ENOSPC is generally a temporary condition, so returning
"host busy" from virtscsi_queuecommand makes sense here, to have it
redriven in a moment or two.  But the EIO return code is more of a
permanent error and so it would be wise to return the I/O itself and
allow the calling thread to finish gracefully.  The result is these
four kernel messages in the guest (the fourth one does not occur
prior to this patch):

 [   22.921562] crw_info : CRW reports slct=0, oflw=0, chn=1, rsc=3, anc=0, erc=4, rsid=2
 [   22.921580] crw_info : CRW reports slct=0, oflw=0, chn=0, rsc=3, anc=0, erc=4, rsid=0
 [   22.921978] sd 0:0:0:0: [sda] Synchronizing SCSI cache
 [   22.921993] sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK

I opted to fill in the same response data that is returned from the
more graceful device detach, where the disk device is removed prior
to the controller device.
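
The return-code policy described above can be summarized in a small user-space model. This is a sketch for illustration only, not the driver code; the enum and function names are invented here.

```c
#include <errno.h>

/* Hypothetical outcomes for a queuecommand-style caller. */
enum queue_action {
    QUEUE_OK,               /* command was placed on the virtqueue */
    QUEUE_RETRY_LATER,      /* report "host busy" so it is redriven */
    QUEUE_COMPLETE_FAILED   /* complete the command now with an error */
};

/*
 * Classify the result of the add-to-virtqueue step: -EIO is treated as
 * permanent (broken virtqueue), any other failure such as -ENOSPC as
 * temporary.
 */
static enum queue_action classify_kick_result(int ret)
{
    if (ret == 0)
        return QUEUE_OK;
    if (ret == -EIO)
        return QUEUE_COMPLETE_FAILED;
    return QUEUE_RETRY_LATER;
}
```

Only the -EIO case changes behavior relative to the old code, which treated every non-zero return as "host busy".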

Signed-off-by: Eric Farman <far...@linux.vnet.ibm.com>
---
 drivers/scsi/virtio_scsi.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index ec91bd0..c680d76 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -534,7 +534,9 @@ static int virtscsi_queuecommand(struct virtio_scsi *vscsi,
 {
 	struct Scsi_Host *shost = virtio_scsi_host(vscsi->vdev);
 	struct virtio_scsi_cmd *cmd = scsi_cmd_priv(sc);
+	unsigned long flags;
 	int req_size;
+	int ret;
 
 	BUG_ON(scsi_sg_count(sc) > shost->sg_tablesize);
 
@@ -562,8 +564,15 @@ static int virtscsi_queuecommand(struct virtio_scsi *vscsi,
 		req_size = sizeof(cmd->req.cmd);
 	}
 
-	if (virtscsi_kick_cmd(req_vq, cmd, req_size, sizeof(cmd->resp.cmd)) != 0)
+	ret = virtscsi_kick_cmd(req_vq, cmd, req_size, sizeof(cmd->resp.cmd));
+	if (ret == -EIO) {
+		cmd->resp.cmd.response = VIRTIO_SCSI_S_BAD_TARGET;
+		spin_lock_irqsave(&req_vq->vq_lock, flags);
+		virtscsi_complete_cmd(vscsi, cmd);
+		spin_unlock_irqrestore(&req_vq->vq_lock, flags);
+	} else if (ret != 0) {
 		return SCSI_MLQUEUE_HOST_BUSY;
+	}
 	return 0;
 }
 
-- 
1.9.1



[PATCH v2] virtio_scsi: Reject commands when virtqueue is broken

2017-01-13 Thread Eric Farman
While doing some disruptive testing with QEMU/KVM, I have encountered some
guest problems during hot unplug of virtio-scsi devices, depending on the
order in which the unplug operations are performed.  The following notes
describe my setup (s390x), and how I'm able to reproduce the error and
test the attached fix.

In both the "working" and "failing" cases, the detaches appear to work
just fine.  Signs of a problem only begin to appear later, depending on
other actions I perform, such as powering off the guest system.

Host:
 # lsscsi -g | grep sg6
 [6:0:6:1074151456]disk    IBM      2107900          .217  /dev/sdg   /dev/sg6

QEMU:
 - Include the following parameters
     -device virtio-scsi-ccw,id=scsi0
     -drive file=/dev/sg6,if=none,id=drive0,format=raw
     -device scsi-generic,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive0,id=hostdev6
 - QMP commands (working)
     - device_del hostdev6
     - device_del scsi0
 - QMP commands (failing)
     - device_del scsi0
Libvirt:
 - Note: A preventative fix went into Libvirt 2.5.0
   (libvirt commit 655429a0d4a5 ("qemu: Prevent detaching SCSI controller used by hostdev"))
 - Include the following XML
# cat scsicontroller.xml 

# cat scsihostdev.xml 

  


  

 - virsh commands (working)
     - virsh detach-device guest scsihostdev.xml
     - virsh detach-device guest scsicontroller.xml
 - virsh commands (failing)
     - virsh detach-device guest scsicontroller.xml

v1->v2:
 - Hold vq_lock across virtscsi_complete_cmd call (Fam Zheng)

Eric Farman (1):
  virtio_scsi: Reject commands when virtqueue is broken

 drivers/scsi/virtio_scsi.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

-- 
1.9.1



Re: [PATCH] virtio_scsi: Reject commands when virtqueue is broken

2017-01-12 Thread Eric Farman



On 01/12/2017 08:45 AM, Fam Zheng wrote:

On Thu, 01/12 08:28, Eric Farman wrote:

-	if (virtscsi_kick_cmd(req_vq, cmd, req_size, sizeof(cmd->resp.cmd)) != 0)
+	ret = virtscsi_kick_cmd(req_vq, cmd, req_size, sizeof(cmd->resp.cmd));
+	if (ret == -EIO) {
+		cmd->resp.cmd.response = VIRTIO_SCSI_S_BAD_TARGET;
+		virtscsi_complete_cmd(vscsi, cmd);


Is this safe? Calling virtscsi_complete_cmd requires vq_lock but we don't seem
to have it here.


Hrm...  Didn't notice that, and can't speak to its safety.  I had a bit of
an I/O workload going to other disks, and things seemed okay, but it was by
no means an exhaustive test.

I can't use virtscsi_vq_done, which normally handles that acquire/release.
Before calling virtscsi_complete_cmd, it calls virtqueue_get_buf, which
returns NULL because the virtqueue is broken.  Thus, no call to
virtscsi_complete_cmd.

Can I mock up a wrapping routine that only handles the lock and complete_cmd
call, and ignore the virtqueue components that virtscsi_vq_done does?


That sounds good to me, taking the vq_lock here around the call to
virtscsi_complete_cmd, just like virtscsi_kick_cmd().
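
The wrapper discussed above (take the lock, complete the command, release the lock) might look roughly like the following user-space model. It is only a sketch: a pthread mutex stands in for vq_lock and its spin_lock_irqsave handling, and every name and value here is invented for illustration rather than taken from the driver.

```c
#include <pthread.h>

#define MODEL_BAD_TARGET 3   /* arbitrary placeholder response code */

struct model_vq {
    pthread_mutex_t lock;    /* stands in for vq_lock */
    int completed;           /* number of commands completed so far */
};

struct model_cmd {
    int response;
    int done;
};

/* Must be called with vq->lock held, mirroring the rule raised in review. */
static void model_complete_cmd(struct model_vq *vq, struct model_cmd *cmd)
{
    cmd->done = 1;
    vq->completed++;
}

/* Fail a command that could not be queued, holding the lock across completion. */
static void model_fail_cmd_locked(struct model_vq *vq, struct model_cmd *cmd)
{
    cmd->response = MODEL_BAD_TARGET;
    pthread_mutex_lock(&vq->lock);
    model_complete_cmd(vq, cmd);
    pthread_mutex_unlock(&vq->lock);
}
```

The point of the shape is that the completion routine never runs without the lock, even though the normal virtqueue "done" path is skipped.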


Okay, working up a v2 today.  Thanks!

Eric



Fam



Eric



Fam


+	} else if (ret != 0) {
 		return SCSI_MLQUEUE_HOST_BUSY;
+	}
 	return 0;
 }










Re: [PATCH] virtio_scsi: Reject commands when virtqueue is broken

2017-01-12 Thread Eric Farman



On 01/11/2017 10:11 PM, Fam Zheng wrote:

On Wed, 01/11 17:02, Eric Farman wrote:

In the case of a graceful set of detaches, where the virtio-scsi-ccw
disk is removed from the guest prior to the controller, the guest
behaves quite normally.  Specifically, the detach gets us into
sd_sync_cache to issue a Synchronize Cache(10) command, which
immediately fails (and is retried a couple of times) because the
device has been removed.  Later, the removal of the controller
sees two CRWs presented, but there's no further indication of the
removal from the guest viewpoint.

 [   17.217458] sd 0:0:0:0: [sda] Synchronizing SCSI cache
 [   17.219257] sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
 [   21.449400] crw_info : CRW reports slct=0, oflw=0, chn=1, rsc=3, anc=0, erc=4, rsid=2
 [   21.449406] crw_info : CRW reports slct=0, oflw=0, chn=0, rsc=3, anc=0, erc=4, rsid=0

However, on s390, the SCSI disks can be removed "by surprise" when
an entire controller (host) is removed and all associated disks
are removed via the loop in scsi_forget_host.  The same call to
sd_sync_cache is made, but because the controller has already
been removed, the Synchronize Cache(10) command is neither issued
(and then failed) nor rejected.

That the I/O isn't returned means the guest cannot have other devices
added or removed, and other tasks (such as shutdown or reboot) issued
by the guest will not complete either.  The virtio ring has already
been marked as broken (via virtio_break_device in virtio_ccw_remove),
but we still attempt to queue the command only to have it remain there.
The calling sequence provides a bit of distinction for us:

  virtscsi_queuecommand()
   -> virtscsi_kick_cmd()
    -> virtscsi_add_cmd()
     -> virtqueue_add_sgs()
      -> virtqueue_add()
         if success
           return 0
         elseif vq->broken or vring_mapping_error()
           return -EIO
         else
           return -ENOSPC

A return of ENOSPC is generally a temporary condition, so returning
"host busy" from virtscsi_queuecommand makes sense here, to have it
redriven in a moment or two.  But the EIO return code is more of a
permanent error and so it would be wise to return the I/O itself and
allow the calling thread to finish gracefully.  The result is these
four kernel messages in the guest (the fourth one does not occur
prior to this patch):

 [   22.921562] crw_info : CRW reports slct=0, oflw=0, chn=1, rsc=3, anc=0, erc=4, rsid=2
 [   22.921580] crw_info : CRW reports slct=0, oflw=0, chn=0, rsc=3, anc=0, erc=4, rsid=0
 [   22.921978] sd 0:0:0:0: [sda] Synchronizing SCSI cache
 [   22.921993] sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK

I opted to fill in the same response data that is returned from the
more graceful device detach, where the disk device is removed prior
to the controller device.

Signed-off-by: Eric Farman <far...@linux.vnet.ibm.com>
---
 drivers/scsi/virtio_scsi.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index ec91bd0..78d50ca 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -535,6 +535,7 @@ static int virtscsi_queuecommand(struct virtio_scsi *vscsi,
 	struct Scsi_Host *shost = virtio_scsi_host(vscsi->vdev);
 	struct virtio_scsi_cmd *cmd = scsi_cmd_priv(sc);
 	int req_size;
+	int ret;
 
 	BUG_ON(scsi_sg_count(sc) > shost->sg_tablesize);
 
@@ -562,8 +563,13 @@ static int virtscsi_queuecommand(struct virtio_scsi *vscsi,
 		req_size = sizeof(cmd->req.cmd);
 	}
 
-	if (virtscsi_kick_cmd(req_vq, cmd, req_size, sizeof(cmd->resp.cmd)) != 0)
+	ret = virtscsi_kick_cmd(req_vq, cmd, req_size, sizeof(cmd->resp.cmd));
+	if (ret == -EIO) {
+		cmd->resp.cmd.response = VIRTIO_SCSI_S_BAD_TARGET;
+		virtscsi_complete_cmd(vscsi, cmd);


Is this safe? Calling virtscsi_complete_cmd requires vq_lock but we don't seem
to have it here.


Hrm...  Didn't notice that, and can't speak to its safety.  I had a bit 
of an I/O workload going to other disks, and things seemed okay, but it 
was by no means an exhaustive test.


I can't use virtscsi_vq_done, which normally handles that
acquire/release.  Before calling virtscsi_complete_cmd, it calls
virtqueue_get_buf, which returns NULL because the virtqueue is
broken.  Thus, no call to virtscsi_complete_cmd.


Can I mock up a wrapping routine that only handles the lock and 
complete_cmd call, and ignore the virtqueue components that 
virtscsi_vq_done does?  Or do I need to consider that somehow despite it 
all being broken?


Eric



Fam


+	} else if (ret != 0) {
 		return SCSI_MLQUEUE_HOST_BUSY;
+	}
 	return 0;
 }





[PATCH] virtio_scsi: Reject commands when virtqueue is broken

2017-01-11 Thread Eric Farman
In the case of a graceful set of detaches, where the virtio-scsi-ccw
disk is removed from the guest prior to the controller, the guest
behaves quite normally.  Specifically, the detach gets us into
sd_sync_cache to issue a Synchronize Cache(10) command, which
immediately fails (and is retried a couple of times) because the
device has been removed.  Later, the removal of the controller
sees two CRWs presented, but there's no further indication of the
removal from the guest viewpoint.

 [   17.217458] sd 0:0:0:0: [sda] Synchronizing SCSI cache
 [   17.219257] sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
 [   21.449400] crw_info : CRW reports slct=0, oflw=0, chn=1, rsc=3, anc=0, erc=4, rsid=2
 [   21.449406] crw_info : CRW reports slct=0, oflw=0, chn=0, rsc=3, anc=0, erc=4, rsid=0

However, on s390, the SCSI disks can be removed "by surprise" when
an entire controller (host) is removed and all associated disks
are removed via the loop in scsi_forget_host.  The same call to
sd_sync_cache is made, but because the controller has already
been removed, the Synchronize Cache(10) command is neither issued
(and then failed) nor rejected.

That the I/O isn't returned means the guest cannot have other devices
added or removed, and other tasks (such as shutdown or reboot) issued
by the guest will not complete either.  The virtio ring has already
been marked as broken (via virtio_break_device in virtio_ccw_remove),
but we still attempt to queue the command only to have it remain there.
The calling sequence provides a bit of distinction for us:

  virtscsi_queuecommand()
   -> virtscsi_kick_cmd()
    -> virtscsi_add_cmd()
     -> virtqueue_add_sgs()
      -> virtqueue_add()
         if success
           return 0
         elseif vq->broken or vring_mapping_error()
           return -EIO
         else
           return -ENOSPC

A return of ENOSPC is generally a temporary condition, so returning
"host busy" from virtscsi_queuecommand makes sense here, to have it
redriven in a moment or two.  But the EIO return code is more of a
permanent error and so it would be wise to return the I/O itself and
allow the calling thread to finish gracefully.  The result is these
four kernel messages in the guest (the fourth one does not occur
prior to this patch):

 [   22.921562] crw_info : CRW reports slct=0, oflw=0, chn=1, rsc=3, anc=0, erc=4, rsid=2
 [   22.921580] crw_info : CRW reports slct=0, oflw=0, chn=0, rsc=3, anc=0, erc=4, rsid=0
 [   22.921978] sd 0:0:0:0: [sda] Synchronizing SCSI cache
 [   22.921993] sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK

I opted to fill in the same response data that is returned from the
more graceful device detach, where the disk device is removed prior
to the controller device.

Signed-off-by: Eric Farman <far...@linux.vnet.ibm.com>
---
 drivers/scsi/virtio_scsi.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index ec91bd0..78d50ca 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -535,6 +535,7 @@ static int virtscsi_queuecommand(struct virtio_scsi *vscsi,
 	struct Scsi_Host *shost = virtio_scsi_host(vscsi->vdev);
 	struct virtio_scsi_cmd *cmd = scsi_cmd_priv(sc);
 	int req_size;
+	int ret;
 
 	BUG_ON(scsi_sg_count(sc) > shost->sg_tablesize);
 
@@ -562,8 +563,13 @@ static int virtscsi_queuecommand(struct virtio_scsi *vscsi,
 		req_size = sizeof(cmd->req.cmd);
 	}
 
-	if (virtscsi_kick_cmd(req_vq, cmd, req_size, sizeof(cmd->resp.cmd)) != 0)
+	ret = virtscsi_kick_cmd(req_vq, cmd, req_size, sizeof(cmd->resp.cmd));
+	if (ret == -EIO) {
+		cmd->resp.cmd.response = VIRTIO_SCSI_S_BAD_TARGET;
+		virtscsi_complete_cmd(vscsi, cmd);
+	} else if (ret != 0) {
 		return SCSI_MLQUEUE_HOST_BUSY;
+	}
 	return 0;
 }
 
-- 
1.9.1



[PATCH] virtio_scsi: Reject commands when virtqueue is broken

2017-01-11 Thread Eric Farman
While doing some disruptive testing with QEMU/KVM, I have encountered some
guest problems during hot unplug of virtio-scsi devices, depending on the
order in which the unplug operations are performed.  The following notes
describe my setup (s390x), and how I'm able to reproduce the error and
test the attached fix.

In both the "working" and "failing" cases, the detaches appear to work
just fine.  Signs of a problem only begin to appear later, depending on
other actions I perform, such as powering off the guest system.

Host:
 # lsscsi -g | grep sg6
 [6:0:6:1074151456]disk    IBM      2107900          .217  /dev/sdg   /dev/sg6

QEMU:
 - Include the following parameters
     -device virtio-scsi-ccw,id=scsi0
     -drive file=/dev/sg6,if=none,id=drive0,format=raw
     -device scsi-generic,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive0,id=hostdev6
 - QMP commands (working)
     - device_del hostdev6
     - device_del scsi0
 - QMP commands (failing)
     - device_del scsi0

Libvirt:
 - Note: A preventative fix went into Libvirt 2.5.0
   (libvirt commit 655429a0d4a5 ("qemu: Prevent detaching SCSI controller used by hostdev"))
 - Include the following XML
# cat scsicontroller.xml 

# cat scsihostdev.xml 

  


  

 - virsh commands (working)
     - virsh detach-device guest scsihostdev.xml
     - virsh detach-device guest scsicontroller.xml
 - virsh commands (failing)
     - virsh detach-device guest scsicontroller.xml

Eric Farman (1):
  virtio_scsi: Reject commands when virtqueue is broken

 drivers/scsi/virtio_scsi.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

-- 
1.9.1
