Re: [Qemu-devel] [PATCH 00/16] QEMU vhost-scsi support

2012-04-24 Thread Paolo Bonzini
Il 24/04/2012 06:21, ronnie sahlberg ha scritto:
 Hi Stefan,
 
 A little bit off-topic but
 
 When you design the proper place and API to plug virt-scsi into an
 external SCSI parser outside of qemu like the target in the kernel ...
 
 It would be very nice if one could also plug virt-scsi into libiscsi
 and pass the CDBs straight to the remote iSCSI target too.
 Keep some thoughts on virt-scsi + libiscsi integration.

Yes, that makes a lot of sense.  It's a bit harder than scsi-generic but
we do want to get there.

Paolo



Re: [Qemu-devel] [PATCH 00/16] QEMU vhost-scsi support

2012-04-24 Thread Stefan Hajnoczi
On Tue, Apr 24, 2012 at 8:05 AM, Paolo Bonzini pbonz...@redhat.com wrote:
 Il 24/04/2012 06:21, ronnie sahlberg ha scritto:
 Hi Stefan,

 A little bit off-topic but

 When you design the proper place and API to plug virt-scsi into an
 external SCSI parser outside of qemu like the target in the kernel ...

 It would be very nice if one could also plug virt-scsi into libiscsi
 and pass the CDBs straight to the remote iSCSI target too.
 Keep some thoughts on virt-scsi + libiscsi integration.

 Yes, that makes a lot of sense.  It's a bit harder than scsi-generic but
 we do want to get there.

Yep.  I think previously there was discussion about a libiscsi
SCSIDevice so that guest SCSI commands can be sent to libiscsi LUNs
without going through the QEMU block layer.  (Another way to pass
arbitrary SCSI commands to libiscsi is by hooking up .bdrv_aio_ioctl()
with SG_IO scsi-generic compatible code in block/iscsi.c.)
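(For illustration, a rough sketch of the translation step such a hook would
need, assuming libiscsi's scsi_create_task()/iscsi_scsi_command_async() API;
forward_sg_io() and the callback are made-up names, not code from
block/iscsi.c:)

/* Hypothetical sketch only: forward one SG_IO request to a remote LUN
 * via libiscsi.  Error handling and data-in/sense copying are omitted. */
#include <scsi/sg.h>
#include <iscsi/iscsi.h>
#include <iscsi/scsi-lowlevel.h>

static void sg_io_cb(struct iscsi_context *iscsi, int status,
                     void *command_data, void *private_data)
{
    struct sg_io_hdr *hdr = private_data;
    /* command_data is the completed scsi_task; a real implementation
     * would copy task->datain into hdr->dxferp for reads and fill
     * hdr->sbp with sense data. */
    hdr->status = status;
}

static int forward_sg_io(struct iscsi_context *iscsi, int lun,
                         struct sg_io_hdr *hdr)
{
    struct iscsi_data data = { .size = hdr->dxfer_len,
                               .data = hdr->dxferp };
    int dir = hdr->dxfer_direction == SG_DXFER_TO_DEV   ? SCSI_XFER_WRITE :
              hdr->dxfer_direction == SG_DXFER_FROM_DEV ? SCSI_XFER_READ :
                                                          SCSI_XFER_NONE;
    /* Wrap the guest CDB without interpreting it locally. */
    struct scsi_task *task = scsi_create_task(hdr->cmd_len, hdr->cmdp,
                                              dir, hdr->dxfer_len);
    if (task == NULL) {
        return -1;
    }
    /* Ship the CDB to the remote iSCSI target; data goes out only for writes. */
    return iscsi_scsi_command_async(iscsi, lun, task, sg_io_cb,
                                    dir == SCSI_XFER_WRITE ? &data : NULL,
                                    hdr);
}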

I think what you're describing now is one level higher: the libiscsi
target *is* the virtual SCSI target and virtio-scsi just ships off
commands to it?  This would be like a SCSIBus that is handled by the
libiscsi target instead of being emulated in QEMU.

Stefan



Re: [Qemu-devel] [PATCH 00/16] QEMU vhost-scsi support

2012-04-24 Thread ronnie sahlberg
On Tue, Apr 24, 2012 at 7:13 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Tue, Apr 24, 2012 at 8:05 AM, Paolo Bonzini pbonz...@redhat.com wrote:
 Il 24/04/2012 06:21, ronnie sahlberg ha scritto:
 Hi Stefan,

 A little bit off-topic but

 When you design the proper place and API to plug virt-scsi into an
 external SCSI parser outside of qemu like the target in the kernel ...

 It would be very nice if one could also plug virt-scsi into libiscsi
 and pass the CDBs straight to the remote iSCSI target too.
 Keep some thoughts on virt-scsi + libiscsi integration.

 Yes, that makes a lot of sense.  It's a bit harder than scsi-generic but
 we do want to get there.

 Yep.  I think previously there was discussion about a libiscsi
 SCSIDevice so that guest SCSI commands can be sent to libiscsi LUNs
 without going through the QEMU block layer.  (Another way to pass
 arbitrary SCSI commands to libiscsi is by hooking up .bdrv_aio_ioctl()
 with SG_IO scsi-generic compatible code in block/iscsi.c.)

bdrv_aio_ioctl() and SG_IO would mean #ifdef __linux__

So maybe it would be better to instead create a new hw/scsi-scsi.c
that calls straight into block/iscsi.c ?
That would be a lot more work than emulating SG_IO but would work on
all platforms.


Comments? How important is !linux support?

regards
ronnie sahlberg



Re: [Qemu-devel] [PATCH 00/16] QEMU vhost-scsi support

2012-04-23 Thread Stefan Hajnoczi
On Sat, Apr 21, 2012 at 9:51 AM, Nicholas A. Bellinger
n...@linux-iscsi.org wrote:
 On Fri, 2012-04-20 at 12:09 +0100, Stefan Hajnoczi wrote:
 On Fri, Apr 20, 2012 at 8:46 AM, Paolo Bonzini pbonz...@redhat.com wrote:
  Il 20/04/2012 09:00, Nicholas A. Bellinger ha scritto:

 SNIP

  - no support for migration (there can be pending SCSI requests at
  migration time, that need to be restarted on the destination)

 Yes and it hasn't been thought through by me at least ;-).  So
 migration is indeed a challenge that needs to be worked through.

  - no support for non-raw images (fix: use NBD on a Unix socket? perhaps
  add an NBD backend to lio)

 For me this is the biggest issue with kernel-level storage for virtual
 machines.  We have NBD today but it goes through the network stack
 using a limited protocol and probably can't do zero-copy.

 The most promising option I found was dm-userspace
 (http://wiki.xensource.com/xenwiki/DmUserspace), which implements a
 device-mapper target with an in-kernel MMU-like lookup mechanism that
 calls out to userspace when block addresses need to be translated.
 It's not anywhere near to upstream and hasn't been pushed for several
 years.  On the plus side we could also write a userspace
 implementation of this so that QEMU image formats continue to be
 portable to other host OSes without duplicating code.

 If tcm_vhost only works with raw images then I don't see it as a
 realistic option given the effort it will require to complete and
 maintain.


 So there has been interest in the past for creating a TCM backend that
 allows for a userspace passthrough, but so far the code to do this has
 not materialized yet..

 There are pieces of logic from STGT that provide an interface for doing
 something similar that still exist in the upstream kernel.  Allowing
 different QEMU formats to be processed (in userspace) through a hybrid
 TCM backend driver that fits into the existing HBA/DEV layout in
 /sys/kernel/config/target/$HBA/$DEV/ is what would be required to really
 do this properly..

Could you point to the existing upstream code?

I think the hybrid TCM backend driver means a way for a userspace
process to execute SCSI Tasks from the target - implementing a subset
of se_subsystem_api in userspace?

If we solve the problem at the block driver level instead of inside
the SCSI target then it's also possible for the host to inspect VM
disk images similar to loopback devices (mount -o loop).  Do you think
putting the userspace interface into the SCSI target has advantages
over the block driver or device-mapper level?
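(Just to illustrate the host-inspection point for a raw image; the path is an
example, and a partitioned image would additionally need kpartx or an
offset= option:)

mount -o loop /var/lib/libvirt/images/guest.img /mnt   # raw image, whole-disk fs
ls /mnt
umount /mnt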

Stefan



Re: [Qemu-devel] [PATCH 00/16] QEMU vhost-scsi support

2012-04-23 Thread ronnie sahlberg
On Mon, Apr 23, 2012 at 7:33 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Sat, Apr 21, 2012 at 9:51 AM, Nicholas A. Bellinger
 n...@linux-iscsi.org wrote:
 On Fri, 2012-04-20 at 12:09 +0100, Stefan Hajnoczi wrote:
 On Fri, Apr 20, 2012 at 8:46 AM, Paolo Bonzini pbonz...@redhat.com wrote:
  Il 20/04/2012 09:00, Nicholas A. Bellinger ha scritto:

 SNIP

  - no support for migration (there can be pending SCSI requests at
  migration time, that need to be restarted on the destination)

 Yes and it hasn't been thought through by me at least ;-).  So
 migration is indeed a challenge that needs to be worked through.

  - no support for non-raw images (fix: use NBD on a Unix socket? perhaps
  add an NBD backend to lio)

 For me this is the biggest issue with kernel-level storage for virtual
 machines.  We have NBD today but it goes through the network stack
 using a limited protocol and probably can't do zero-copy.

 The most promising option I found was dm-userspace
 (http://wiki.xensource.com/xenwiki/DmUserspace), which implements a
 device-mapper target with an in-kernel MMU-like lookup mechanism that
 calls out to userspace when block addresses need to be translated.
 It's not anywhere near to upstream and hasn't been pushed for several
 years.  On the plus side we could also write a userspace
 implementation of this so that QEMU image formats continue to be
 portable to other host OSes without duplicating code.

 If tcm_vhost only works with raw images then I don't see it as a
 realistic option given the effort it will require to complete and
 maintain.


 So there has been interest in the past for creating a TCM backend that
 allows for a userspace passthrough, but so far the code to do this has
 not materialized yet..

 There are pieces of logic from STGT that provide an interface for doing
 something similar that still exist in the upstream kernel.  Allowing
 different QEMU formats to be processed (in userspace) through a hybrid
 TCM backend driver that fits into the existing HBA/DEV layout in
 /sys/kernel/config/target/$HBA/$DEV/ is what would be required to really
 do this properly..

 Could you point to the existing upstream code?

 I think the hybrid TCM backend driver means a way for a userspace
 process to execute SCSI Tasks from the target - implementing a subset
 of se_subsystem_api in userspace?

 If we solve the problem at the block driver level instead of inside
 the SCSI target then it's also possible for the host to inspect VM
 disk images similar to loopback devices (mount -o loop).  Do you think
 putting the userspace interface into the SCSI target has advantages
 over the block driver or device-mapper level?


Hi Stefan,

A little bit off-topic but

When you design the proper place and API to plug virt-scsi into an
external SCSI parser outside of qemu like the target in the kernel ...

It would be very nice if one could also plug virt-scsi into libiscsi
and pass the CDBs straight to the remote iSCSI target too.
Keep some thoughts on virt-scsi + libiscsi integration.


regards
ronnie sahlberg



Re: [Qemu-devel] [PATCH 00/16] QEMU vhost-scsi support

2012-04-21 Thread Nicholas A. Bellinger
On Fri, 2012-04-20 at 12:09 +0100, Stefan Hajnoczi wrote:
 On Fri, Apr 20, 2012 at 8:46 AM, Paolo Bonzini pbonz...@redhat.com wrote:
  Il 20/04/2012 09:00, Nicholas A. Bellinger ha scritto:

SNIP

  - no support for migration (there can be pending SCSI requests at
  migration time, that need to be restarted on the destination)
 
 Yes and it hasn't been thought through by me at least ;-).  So
 migration is indeed a challenge that needs to be worked through.
 
  - no support for non-raw images (fix: use NBD on a Unix socket? perhaps
  add an NBD backend to lio)
 
 For me this is the biggest issue with kernel-level storage for virtual
 machines.  We have NBD today but it goes through the network stack
 using a limited protocol and probably can't do zero-copy.
 
 The most promising option I found was dm-userspace
 (http://wiki.xensource.com/xenwiki/DmUserspace), which implements a
 device-mapper target with an in-kernel MMU-like lookup mechanism that
 calls out to userspace when block addresses need to be translated.
 It's not anywhere near to upstream and hasn't been pushed for several
 years.  On the plus side we could also write a userspace
 implementation of this so that QEMU image formats continue to be
 portable to other host OSes without duplicating code.
 
 If tcm_vhost only works with raw images then I don't see it as a
 realistic option given the effort it will require to complete and
 maintain.
 

So there has been interest in the past for creating a TCM backend that
allows for a userspace passthrough, but so far the code to do this has
not materialized yet..

There are pieces of logic from STGT that provide an interface for doing
something similar that still exist in the upstream kernel.  Allowing
different QEMU formats to be processed (in userspace) through a hybrid
TCM backend driver that fits into the existing HBA/DEV layout in
/sys/kernel/config/target/$HBA/$DEV/ is what would be required to really
do this properly..

  In order for QEMU userspace to support this, Linux would need to expose
  a method to userspace for issuing DIF protected CDBs.  This userspace
  API currently does not exist AFAIK, so a kernel-level approach is the
  currently the only option when it comes to supporting end-to-end block
  protection information originating from within Linux guests.
 
  I think it would be worthwhile to have this in userspace too.
 
  (Note this is going to involve a virtio-scsi spec rev as well)
 
  Yes.  By the way, another possible modification could be to tell the
  guest what is its (initiator) WWPN.
 
 Going back to ALUA, I'd like to understand ALUA multipathing a bit
 better.  I've never played with multipath, hence my questions:
 
 I have a SAN with multiple controllers and ALUA support - so ALUA
 multipathing is possible.  Now I want my KVM guests to take advantage
 of multipath themselves.  Since the LIO target virtualizes the SCSI
 bus (the host admin defines LUNs, target ports, and ACLs that do not
 have to map 1:1 to the SAN) we also have to implement ALUA in the
 virtio-scsi target.  The same would be true for QEMU SCSI emulation.
 

virtio-scsi (as a SCSI LLD in the guest) uses the scsi_dh_alua device
handler just like any other SCSI driver does (i.e., ALUA is a
fabric-independent feature).

That means there are no special requirements for initiator LLDs to be
able to use scsi_dh_alua, other than the target supporting ALUA
primitives + NAA IEEE extended registered naming to identify the backend
device across multiple paths.

This also currently requires explicit multipathd.conf setup (in the
guest) if the target LUN's vendor/product strings do not match the
default supported ALUA array list in the upstream scsi_dh_alua.c code.
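(As an illustration, a guest-side /etc/multipath.conf stanza for LIO-backed
LUNs might look roughly like the following; "LIO-ORG" is the vendor string
LIO reports by default, and directive names can vary slightly between
multipath-tools versions:)

devices {
    device {
        vendor               "LIO-ORG"
        product              "*"
        path_grouping_policy group_by_prio
        prio                 alua
        hardware_handler     "1 alua"
        path_checker         tur
        failback             immediate
    }
}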

 How would we configure LIO's ALUA in this case?  We really want to
 reflect the port attributes (available/offline,
 optimized/non-optimized) that the external SAN fabric reports.  Is
 this supported by LIO?
 

Absolutely.  The ability to set ALUA primary access state comes for free
with all fabric modules using TCM + virtual backends (BLOCK+FILEIO).  

The ALUA status appears as a set of attributes under each endpoint LUN:

   
/sys/kernel/config/target/vhost/naa.60014050088ae39a/tpgt_1/lun/lun_0/alua_tg_pt_*

The 'alua_tg_pt_gp' attr is used to optionally set the fabric LUN ALUA
target port group membership.

Each fabric target LUN is (by default) associated with an alua_tg_pt_gp
that is specific to the exported device backend.  Each backend device
can have any number of ALUA tg_pt_gps that exist in a configfs group
under /sys/kernel/config/target/$HBA/$DEV/alua/$TG_PT_GP_NAME.
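(A rough sketch of creating and wiring up a second target port group; the
group name is made up, and the exact write formats may differ between kernel
versions:)

GP=/sys/kernel/config/target/core/iblock_0/mpt_fusion/alua/remote_san_gp
mkdir $GP
echo 16 > $GP/tg_pt_gp_id          # give the new target port group an ID
echo 1  > $GP/alua_access_state    # e.g. 0=Active/Optimized, 1=Active/NonOptimized
# then associate a fabric LUN with the group via its alua_tg_pt_gp attribute:
echo remote_san_gp > \
  /sys/kernel/config/target/vhost/naa.60014050088ae39a/tpgt_1/lun/lun_0/alua_tg_pt_gp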

Here is a quick idea of how a 'default_tg_pt_gp' looks for an
IBLOCK device with multiple fabric exports (iscsi, loopback, vhost):

# head /sys/kernel/config/target/core/iblock_0/mpt_fusion/alua/default_tg_pt_gp/*
==> alua_access_state <==
0

==> alua_access_status <==
None

==> alua_access_type <==
Implict and Explict

==> alua_write_metadata <==
1

==> members <==

Re: [Qemu-devel] [PATCH 00/16] QEMU vhost-scsi support

2012-04-20 Thread Nicholas A. Bellinger
On Thu, 2012-04-19 at 07:30 -0500, Anthony Liguori wrote:
 Hi,
 

Hi Anthony,

 As I've mentioned before in the past, I will not apply vhost-* without an
 extremely compelling argument for it.
 
 The reason we ultimately settled on vhost-net is that in the absence of a 
 fundamental change in the userspace networking interface (with something like 
 VJ 
 channels), it's extremely unlikely that we would ever get the right 
 interfaces 
 to do zero-copy transmit/receive in userspace.
 
 However, for storage, be it scsi or direct access, the same problem really 
 doesn't exist.  There isn't an obvious benefit to being in the kernel.
 

In the modern Linux v3.x tree, it was decided there is an obvious
benefit to fabric driver developers in going ahead and putting proper
SCSI target logic directly into the kernel..  ;)

 There are many downsides though.  It's a big security risk.  SCSI is a 
 complex 
 protocol and I'd rather the piece of code that's parsing and emulating SCSI 
 cdbs 
 was unprivileged and sandboxed than running within the kernel context.
 

It has historically been a security risk doing raw SCSI passthrough of
complex CDBs to underlying SCSI LLD code, because SCSI CDB emulation
support within specific LLDs had an easy and sure-fire chance to get
said complex CDB emulation wrong..  (eg: no generic library in the
kernel for LLDs until libata was created)

With Linux v3.x hosts we now have universal target mode support in the
upstream kernel for BLOCK+FILEIO backends with full SPC-3 (NAA IEEE
Registered Extended WWN naming, Explicit/Implicit ALUA multipath,
Persistent Reservations) using a control plane with ConfigFS
representing objects at the VFS layer for parent/child and intra-module
kernel data structure relationships.

We also have userspace rtslib (supported in downstream distros) that
exposes the ConfigFS layout of tcm_vhost fabric endpoints as easily
scriptable python library objects into higher level application code.

 So before we get too deep in patches, we need some solid justification first.
 

So the potential performance benefit is one thing that will be in favor
of vhost-scsi.  I also think the ability to utilize the underlying TCM
fabric and run concurrent ALUA multipath using multiple virtio-scsi LUNs
to the same /sys/kernel/config/target/core/$HBA/$DEV/ backend can
potentially give us some nice flexibility when dynamically managing
paths into the virtio-scsi guest.

Also, since client-side ALUA has been supported across pretty much all
server class SCSI clients in recent years, it does end up getting a lot
of usage in the SCSI world.  It's a client-side SCSI multipath feature
that is fabric independent, and that we know is already supported for
free across all Linux flavours + other modern server class guests.

--nab




Re: [Qemu-devel] [PATCH 00/16] QEMU vhost-scsi support

2012-04-20 Thread Zhi Yong Wu
I have pushed the rebased QEMU code and tcm_vhost code to my git trees.
Perhaps someone is interested in playing with them.

1.) my kernel git
g...@github.com:wuzhy/linux.git tcm_vhost
https://github.com/wuzhy/linux/commits/tcm_vhost
2.) my QEMU git
g...@github.com:wuzhy/qemu.git vhost-scsi
https://github.com/wuzhy/qemu/commits/vhost-scsi

By the way, it is based on latest qemu.git/master.

On Thu, 2012-04-19 at 10:38 +0800, zwu.ker...@gmail.com wrote:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com
 
 The patchset was developed originally by Stefan about one year ago. I now 
 rebase it to latest qemu.git/master and fixed some issues to make it work 
 against tcm_vhost and virtio_scsi driver. But there are still some issues to 
 fix. Let us make more effort later.
 
 Stefan Hajnoczi (13):
   virtio-scsi: Add wwpn and tgpt properties
   vhost: Pass device path to vhost_dev_init()
   virtio-scsi: Add vhost_vring_target ioctl struct
   virtio-scsi: Fix tgpt typo to tpgt and use uint16_t
   virtio-scsi: Build virtio-scsi.o against vhost.o
   virtio-scsi: Open and initialize /dev/vhost-scsi
   virtio-scsi: Start/stop vhost
   notifier: add validity check and notify function
   virtio-pci: support host notifiers in TCG mode
   virtio-pci: check that event notification worked
   vhost-scsi: add -vhost-scsi host device
   virtio-scsi: use the vhost-scsi host device
   virtio-scsi: WIP VHOST_SCSI_SET_ENDPOINT call
 
 Zhi Yong Wu (3):
   vhost-scsi: enable vhost notifiers for multiple queues
   vhost-scsi: move some definitions to its header file
   vhost-scsi: clear endpoint on stopped
 
  Makefile.target  |2 +-
  configure|2 +
  event_notifier.c |   21 +++
  event_notifier.h |4 +
  hw/qdev-properties.c |   32 ++
  hw/qdev.h|3 +
  hw/vhost-scsi.c  |  156 ++
  hw/vhost-scsi.h  |   39 +
  hw/vhost.c   |5 +-
  hw/vhost.h   |3 +-
  hw/vhost_net.c   |2 +-
  hw/virtio-pci.c  |   28 -
  hw/virtio-scsi.c |   52 +
  hw/virtio-scsi.h |1 +
  hw/virtio.c  |7 ++
  qemu-common.h|1 +
  qemu-config.c|   16 +
  qemu-options.hx  |4 +
  vl.c |   18 ++
  19 files changed, 388 insertions(+), 8 deletions(-)
  create mode 100644 hw/vhost-scsi.c
  create mode 100644 hw/vhost-scsi.h
 

-- 
Regards,

Zhi Yong Wu




Re: [Qemu-devel] [PATCH 00/16] QEMU vhost-scsi support

2012-04-20 Thread Nicholas A. Bellinger
On Thu, 2012-04-19 at 19:20 -0500, Anthony Liguori wrote:
 Hi Nicholas,
 
 On 04/19/2012 06:53 PM, Nicholas A. Bellinger wrote:
  On Thu, 2012-04-19 at 07:30 -0500, Anthony Liguori wrote:
  However, for storage, be it scsi or direct access, the same problem really
  doesn't exist.  There isn't an obvious benefit to being in the kernel.
 
 
  In the modern Linux v3.x tree, it was decided there is an obvious
  benefit to fabric drivers developers for going ahead and putting proper
  SCSI target logic directly into the kernel..  ;)
 
 I'm sure there are obvious benefits to having the kernel have SCSI target 
 logic. 
   I'm not claiming that there isn't.
 
 But there is not an obvious benefit to doing SCSI emulation *for virtual 
 machine* guests in the kernel.
 
 Guests are unconditionally hostile.  There is no qualification here.  Public 
 clouds are the obvious example of this.
 
 TCM runs in the absolute most privileged context possible.  When you're 
 dealing 
 with extremely hostile input, it's pretty obvious that you want to run it in 
 the 
 lowest privileged context as humanly possible.
 

The argument that a SCSI target for virtual machines is so complex that
it can't possibly be implemented properly in the kernel is a bunch of
nonsense.

 Good or bad, QEMU runs as a very unprivileged user confined by SELinux and 
 very 
 soon, sandboxed with seccomp.  There's an obvious benefit to putting complex 
 code into an environment like this.
 

Being able to identify which virtio-scsi guests can actually connect via
vhost-scsi into individual tcm_vhost endpoints is step one here.

tcm_vhost (as well as its older sibling tcm_loop) is currently using a
virtual initiator WWPN that is set via configfs before attaching the
virtual machine to a tcm_vhost fabric endpoint + LUNs.
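(Roughly, the configfs plumbing looks like the following; the WWPN is the one
from the example path later in this thread, the backstore name and device are
examples, and rtslib normally scripts these steps:)

# create an IBLOCK backstore on top of a host block device
mkdir -p /sys/kernel/config/target/core/iblock_0/vm_disk
echo "udev_path=/dev/sdb" > /sys/kernel/config/target/core/iblock_0/vm_disk/control
echo 1 > /sys/kernel/config/target/core/iblock_0/vm_disk/enable
# create the tcm_vhost endpoint + LUN and link it to the backstore
mkdir -p /sys/kernel/config/target/vhost/naa.60014050088ae39a/tpgt_1/lun/lun_0
ln -s /sys/kernel/config/target/core/iblock_0/vm_disk \
      /sys/kernel/config/target/vhost/naa.60014050088ae39a/tpgt_1/lun/lun_0/virtual_port_0
# the virtual initiator WWPN (the nexus) is also written under tpgt_1
# before the guest is attached.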

Using vhost-scsi initiator WWPNs to enforce what clients can connect to
individual tcm_vhost endpoints is one option for restricting access.  We
are already doing something similar with iscsi-target and tcm_fc(FCoE)
endpoints to restrict fabric login access from remote SCSI initiator
ports..

SNIP

 
  So before we get too deep in patches, we need some solid justification 
  first.
 
 
  So the potential performance benefit is one thing that will be in favor
  of vhost-scsi,
 
 Why?  Why would vhost-scsi be any faster than doing target emulation in 
 userspace and then doing O_DIRECT with linux-aio to a raw device?
 

Well, using a raw device from userspace there is still going to be a
SG-IO memcpy going on here between user <-> kernel in current code,
yes..?

Being able to deliver interrupts and SGL memory directly into tcm_vhost
cmwq kernel context for backend device execution w/o QEMU userspace
involvement or extra SGL memcpy is the perceived performance benefit
here.

How much benefit will this actually provide across single port and multi
port tcm_vhost LUNs into a single guest..?  That still remains to be
demonstrated with performance+throughput benchmarks..

  I think the ability to utilize the underlying TCM fabric
  and run concurrent ALUA multipath using multiple virtio-scsi LUNs to the
  same /sys/kernel/config/target/core/$HBA/$DEV/ backend can potentially
  give us some nice flexibility when dynamically managing paths into the
  virtio-scsi guest.
 
 The thing is, you can always setup this kind of infrastructure and expose a 
 raw 
 block device to userspace and then have QEMU emulate a target and turn that 
 into 
 O_DIRECT + linux-aio.
 
 We can also use SG_IO to inject SCSI commands if we really need to.  I'd 
 rather 
 we optimize this path.  If nothing else, QEMU should be filtering SCSI 
 requests 
 before the kernel sees them.  If something is going to SEGV, it's far better 
 that it's QEMU than the kernel.
 

QEMU SG-IO and BSG drivers are fine for tcm_loop SCSI LUNs with QEMU HBA
emulation, but they still aren't tied directly to an individual guest
instance.

That is, the raw devices being passed into SG-IO / BSG are still locally
accessible on the host as SCSI devices w/o guest access restrictions,
while a tcm_vhost endpoint does not expose any host-accessible block
device and can also restrict access to an authorized list of virtio-scsi
clients.

 We cannot avoid doing SCSI emulation in QEMU.  SCSI is too fundamental to far 
 too many devices.  So the prospect of not having good SCSI emulation in QEMU 
 is 
 not realistic.
 

I'm certainly not advocating for a lack of decent SCSI emulation in
QEMU.  Being able to support this across all host platforms is something
QEMU certainly needs to take seriously.

Quite the opposite, I think virtio-scsi <-> vhost-scsi is a mechanism by
which it will be (eventually) possible to support T10 DIF protection for
storage blocks directly between Linux KVM guest <-> host.

In order for QEMU userspace to support this, Linux would need to expose
a method to userspace for issuing DIF protected CDBs.  This userspace
API currently does not exist AFAIK, so a kernel-level approach is
currently the only option when it comes to supporting end-to-end block
protection information originating from within Linux guests.

Re: [Qemu-devel] [PATCH 00/16] QEMU vhost-scsi support

2012-04-20 Thread ronnie sahlberg
On Fri, Apr 20, 2012 at 5:00 PM, Nicholas A. Bellinger
n...@linux-iscsi.org wrote:
 On Thu, 2012-04-19 at 19:20 -0500, Anthony Liguori wrote:
 Hi Nicholas,

 On 04/19/2012 06:53 PM, Nicholas A. Bellinger wrote:
  On Thu, 2012-04-19 at 07:30 -0500, Anthony Liguori wrote:
  However, for storage, be it scsi or direct access, the same problem really
  doesn't exist.  There isn't an obvious benefit to being in the kernel.
 
 
  In the modern Linux v3.x tree, it was decided there is an obvious
  benefit to fabric drivers developers for going ahead and putting proper
  SCSI target logic directly into the kernel..  ;)

 I'm sure there are obvious benefits to having the kernel have SCSI target 
 logic.
   I'm not claiming that there isn't.

 But there is not an obvious benefit to doing SCSI emulation *for virtual
 machine* guests in the kernel.

 Guests are unconditionally hostile.  There is no qualification here.  Public
 clouds are the obvious example of this.

 TCM runs in the absolute most privileged context possible.  When you're 
 dealing
 with extremely hostile input, it's pretty obvious that you want to run it in 
 the
 lowest privileged context as humanly possible.


 The argument that a SCSI target for virtual machines is so complex that
 it can't possibly be implemented properly in the kernel is a bunch of
 non-sense.

There are also other benefits to NOT implementing SCSI emulation in the
kernel, aside from the security aspect of running large amounts of
code in kernel context vs within a restricted userspace context.

I am very happy to be able to add emulation of new opcodes or new
features to tgtd WITHOUT having to recompile my kernel.


regards
ronnie sahlberg



Re: [Qemu-devel] [PATCH 00/16] QEMU vhost-scsi support

2012-04-20 Thread Nicholas A. Bellinger
On Fri, 2012-04-20 at 17:50 +1000, ronnie sahlberg wrote:
 On Fri, Apr 20, 2012 at 5:00 PM, Nicholas A. Bellinger
 n...@linux-iscsi.org wrote:
  On Thu, 2012-04-19 at 19:20 -0500, Anthony Liguori wrote:
  Hi Nicholas,
 

SNIP

  The argument that a SCSI target for virtual machines is so complex that
  it can't possibly be implemented properly in the kernel is a bunch of
  non-sense.
 
 There are also other benefits to NOT implement scsi emulation in the
 kernel, aside from the security aspect of running large amounts of
 code inside kernel context vs within restricted userspace context.
 
 I am very happy to be able to add emulation of new opcodes or new
 features to tgtd WITHOUT having to recompile my kernel.
 
 

We can certainly add new emulation to an in-kernel target w/o having to
actually rebuild vmlinuz..  You just have to ensure that target_core_mod
is being built as a loadable module.

;)

--nab





Re: [Qemu-devel] [PATCH 00/16] QEMU vhost-scsi support

2012-04-20 Thread Paolo Bonzini
Il 20/04/2012 09:00, Nicholas A. Bellinger ha scritto:
 On Thu, 2012-04-19 at 19:20 -0500, Anthony Liguori wrote:
 TCM runs in the absolute most privileged context possible.  When you're 
 dealing 
 with extremely hostile input, it's pretty obvious that you want to run it in 
 the 
 lowest privileged context as humanly possible.
 
 The argument that a SCSI target for virtual machines is so complex that
 it can't possibly be implemented properly in the kernel is a bunch of
 non-sense.

I agree.  A VM is not any more hostile than another iSCSI initiator.
lio must _always_ assume it operates in a hostile environment.

 Being able to identify which virtio-scsi guests can actually connect via
 vhost-scsi into individual tcm_vhost endpoints is step one here.

Yes, the ACL system in lio is quite good for this.

 Well, using a raw device from userspace there is still going to be a
 SG-IO memcpy going on here between user <-> kernel in current code,
 yes..?
 
 Being able to deliver interrupts and SGL memory directly into tcm_vhost
 cmwq kernel context for backend device execution w/o QEMU userspace
 involvement or extra SGL memcpy is the perceived performance benefit
 here.
 
 How much benefit will this actually provide across single port and multi
 port tcm_vhost LUNs into a single guest..?  That still remains to be
 demonstrated with performance+throughput benchmarks..

Yes, this is the key.

The problems I have with vhost-scsi are, from easiest to hardest:

- completely different configuration mechanism with respect to the
in-QEMU target (fix: need to integrate configfs into scsi-{disk,generic}).

- no support for migration (there can be pending SCSI requests at
migration time, that need to be restarted on the destination)

- no support for non-raw images (fix: use NBD on a Unix socket? perhaps
add an NBD backend to lio)

- cannot migrate from lio-target to QEMU-target and vice versa.

The first three are definitely fixable.

 In order for QEMU userspace to support this, Linux would need to expose
 a method to userspace for issuing DIF protected CDBs.  This userspace
 API currently does not exist AFAIK, so a kernel-level approach is the
 currently the only option when it comes to supporting end-to-end block
 protection information originating from within Linux guests.

I think it would be worthwhile to have this in userspace too.

 (Note this is going to involve a virtio-scsi spec rev as well)

Yes.  By the way, another possible modification could be to tell the
guest what is its (initiator) WWPN.

Paolo



Re: [Qemu-devel] [PATCH 00/16] QEMU vhost-scsi support

2012-04-20 Thread Stefan Hajnoczi
On Fri, Apr 20, 2012 at 8:46 AM, Paolo Bonzini pbonz...@redhat.com wrote:
 Il 20/04/2012 09:00, Nicholas A. Bellinger ha scritto:
 On Thu, 2012-04-19 at 19:20 -0500, Anthony Liguori wrote:
 TCM runs in the absolute most privileged context possible.  When you're 
 dealing
 with extremely hostile input, it's pretty obvious that you want to run it 
 in the
 lowest privileged context as humanly possible.

 The argument that a SCSI target for virtual machines is so complex that
 it can't possibly be implemented properly in the kernel is a bunch of
 non-sense.

 I agree.  A VM is not any more hostile than another iSCSI initiator.
 lio must _always_ assume it operates in a hostile environment.

 Being able to identify which virtio-scsi guests can actually connect via
 vhost-scsi into individual tcm_vhost endpoints is step one here.

 Yes, the ACL system in lio is quite good for this.

 Well, using a raw device from userspace there is still going to be a
 SG-IO memcpy going on here between user <-> kernel in current code,
 yes..?

 Being able to deliver interrupts and SGL memory directly into tcm_vhost
 cmwq kernel context for backend device execution w/o QEMU userspace
 involvement or extra SGL memcpy is the perceived performance benefit
 here.

 How much benefit will this actually provide across single port and multi
 port tcm_vhost LUNs into a single guest..?  That still remains to be
 demonstrated with performance+throughput benchmarks..

 Yes, this is the key.

The overall goal is for virtio-scsi to compete with or be faster than
virtio-blk, whether we go the tcm_vhost or the QEMU SCSI emulation
route.  So Cong and I discussed the details of such a benchmark
yesterday.  The results will be presented to the QEMU community when
they have been collected - maybe a topic for the KVM community call.
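(For example, a run along these lines inside the guest against the
virtio-scsi LUN; the device name and parameters are only illustrative:)

fio --name=randread --filename=/dev/sdb --direct=1 --ioengine=libaio \
    --rw=randread --bs=4k --iodepth=32 --runtime=60 --time_based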

 The problems I have with vhost-scsi are, from easiest to hardest:

 - completely different configuration mechanism with respect to the
 in-QEMU target (fix: need to integrate configfs into scsi-{disk,generic}).

Why is this a problem?  The target is a lot richer than QEMU's SCSI
emulation.  All the ACLs and other configuration should be done using
RTSadmin or configfs.  I don't think it makes sense to duplicate that
into QEMU.

 - no support for migration (there can be pending SCSI requests at
 migration time, that need to be restarted on the destination)

Yes and it hasn't been thought through by me at least ;-).  So
migration is indeed a challenge that needs to be worked through.

 - no support for non-raw images (fix: use NBD on a Unix socket? perhaps
 add an NBD backend to lio)

For me this is the biggest issue with kernel-level storage for virtual
machines.  We have NBD today but it goes through the network stack
using a limited protocol and probably can't do zero-copy.
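(For comparison, this is roughly what the NBD detour looks like today when a
qcow2 image has to be handed to an in-kernel target; paths are examples:)

modprobe nbd
qemu-nbd -c /dev/nbd0 /var/lib/libvirt/images/guest.qcow2   # expose qcow2 as raw
# /dev/nbd0 can now back an IBLOCK backstore in lio
qemu-nbd -d /dev/nbd0                                       # tear it down again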

The most promising option I found was dm-userspace
(http://wiki.xensource.com/xenwiki/DmUserspace), which implements a
device-mapper target with an in-kernel MMU-like lookup mechanism that
calls out to userspace when block addresses need to be translated.
It's not anywhere near to upstream and hasn't been pushed for several
years.  On the plus side we could also write a userspace
implementation of this so that QEMU image formats continue to be
portable to other host OSes without duplicating code.

If tcm_vhost only works with raw images then I don't see it as a
realistic option given the effort it will require to complete and
maintain.

 In order for QEMU userspace to support this, Linux would need to expose
 a method to userspace for issuing DIF protected CDBs.  This userspace
 API currently does not exist AFAIK, so a kernel-level approach is the
 currently the only option when it comes to supporting end-to-end block
 protection information originating from within Linux guests.

 I think it would be worthwhile to have this in userspace too.

 (Note this is going to involve a virtio-scsi spec rev as well)

 Yes.  By the way, another possible modification could be to tell the
 guest what is its (initiator) WWPN.

Going back to ALUA, I'd like to understand ALUA multipathing a bit
better.  I've never played with multipath, hence my questions:

I have a SAN with multiple controllers and ALUA support - so ALUA
multipathing is possible.  Now I want my KVM guests to take advantage
of multipath themselves.  Since the LIO target virtualizes the SCSI
bus (the host admin defines LUNs, target ports, and ACLs that do not
have to map 1:1 to the SAN) we also have to implement ALUA in the
virtio-scsi target.  The same would be true for QEMU SCSI emulation.

How would we configure LIO's ALUA in this case?  We really want to
reflect the port attributes (available/offline,
optimized/non-optimized) that the external SAN fabric reports.  Is
this supported by LIO?

Does it even make sense to pass the multipathing up into the guest?
If we terminate it on the host using Linux's ALUA support, we can hide
multipath entirely from the guest.  Do we lose an obvious advantage 

Re: [Qemu-devel] [PATCH 00/16] QEMU vhost-scsi support

2012-04-19 Thread Paolo Bonzini
Il 19/04/2012 04:38, zwu.ker...@gmail.com ha scritto:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com
 
 The patchset was developed originally by Stefan about one year ago. I
 now rebase it to latest qemu.git/master and fixed some issues to make
 it work against tcm_vhost and virtio_scsi driver. But there are still
 some issues to fix. Let us make more effort later.

Zhi Yong, I'll warn you in advance that this message is going to be a
bit harsh.

You need to understand that once people have patches that work, that can
be only a minor part of the work for getting them ready for submission.
 This is especially true of experienced people like Stefan.  It is quite
common to work briefly on something to build a proof of concept, and
then pick up the work again months later.

Taking patches from other people and completing them can be much harder
than working on new code, because you need to get up to speed with
incomplete code and that is difficult.  You need to understand where the
work was left and what comes next.  You also need to ask the right
person, namely the guy who did the work, Stefan in this case.  Anything else
will lead to frustration for you and everyone involved, and delay the
project indefinitely.

To put it very frankly I don't know where to start listing mistakes in
this series.  For example:

1) The series is missing any information on how you configure and use
vhost-scsi.  Or on how you build it for that matter.

2) The series uses vhost-scsi unconditionally, as far as I can tell.  So
it breaks command-lines that build scsi-{disk,block} devices for use
with a virtio-scsi-pci device.

Even if it actually worked, you need to mention what happens if you
configure SCSI devices on the same virtio-scsi driver that uses vhost.

3) The cover letter is missing any performance numbers or examples of
why vhost-scsi is preferable to scsi-{disk,block}.  I know some of them,
but it needs to be documented elsewhere.

Documentation is also needed on how/whether migration works with
vhost-scsi, and between the QEMU and kernel targets (I know the answer
to the latter is "it doesn't"; I don't know about the former).

4) The series is not properly split, for example a patch like 4/16 needs
to be squashed above.

5) The series is not up-to-date, for example CONFIG_VIRTIO_SCSI does not
exist anymore.

6) You sent this series without even an RFC tag in the subject.

7) You sent this series, but you didn't even reply on target-devel to my
review of the kernel-side changes.

8) Last but not least, I count two pending patch series from you (NetDev
QOM conversion and network hubs) that you dropped on the list while
hardly following up on the comments that were made.  So this gives me very
little incentive to look at your series.

To put it very frankly, I seriously think that you need either more
discipline, or very good mentoring to contribute complex patch series to
QEMU.  The list can offer mentoring, but you must expect criticism like
this one (believe me, this is nothing compared to what you get on the
linux kernel mailing list).

I strongly suggest that you work on one series at a time, for example
the network hub conversion seems to be the most promising.

Paolo



Re: [Qemu-devel] [PATCH 00/16] QEMU vhost-scsi support

2012-04-19 Thread Zhi Yong Wu
On Thu, Apr 19, 2012 at 3:16 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 Il 19/04/2012 04:38, zwu.ker...@gmail.com ha scritto:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

 The patchset was developed originally by Stefan about one year ago. I
 now rebase it to latest qemu.git/master and fixed some issues to make
 it work against tcm_vhost and virtio_scsi driver. But there are still
 some issues to fix. Let us make more effort later.

 Zhi Yong, I'll warn you in advance that this message is going to be a
 bit harsh.
Thanks for your reminder.

 You need to understand that once people have patches that work, that can
 be only a minor part of the work for getting them ready for submission.
  This is especially true of experienced people like Stefan.  It is quite
 common to work briefly on something to build a proof of concept, and
 then pick up the work again months later.

 Taking patches from other people and completing them can be much harder
 than working on new code, because you need to get up to speed with
 incomplete code and that is difficult.  You need to understand where the
 work was left and what comes next.  You also need to ask the right
 person to the guy who did the work, Stefan in this case.  Anything else
 will lead to frustration for you and everyone involved, and delay the
 project indefinitely.

 To put it very frankly I don't know where to start listing mistakes in
 this series.  For example:

 1) The series is missing any information on how you configure and use
 vhost-scsi.  Or on how you build it for that matter.
OK, I will send out how to build & configure it in the next version if
everything is good enough.

 2) The series uses vhost-scsi unconditionally, as far as I can tell.  So
Perhaps there is some cleanup work to do, but it uses vhost-scsi
*conditionally*. In virtio_scsi_init(), you can see the code below:
    if (s->vhost_scsi) {
        s->vdev.set_status = virtio_scsi_set_status;
    }
As you can see, virtio_scsi_set_status will be used to trigger the
vhost_scsi function.
 it breaks command-lines that build scsi-{disk,block} devices for use
 with a virtio-scsi-pci device.

 Even if it actually worked, you need to mention what happens if you
 configure SCSI devices on the same virtio-scsi driver that uses vhost.
OK, i will do this in next version.


 3) The cover letter is missing any performance numbers or examples of
 why vhost-scsi is preferable to scsi-{disk,block}.  I know some of them,
 but it needs to be documented elsewhere.
Sorry, I will put more effort into this later.

 Documentation is also needed on how/whether migration works with
 vhost-scsi, and between the QEMU and kernel targets (I know the answer
 to the latter is it doesn't; I don't know about the former.
ditto

 4) The series is not properly split, for example a patch like 4/16 needs
 to be squashed above.
Ah, I thought I only needed to carefully split them once this
patchset is basically accepted.

 5) The series is not up-to-date, for example CONFIG_VIRTIO_SCSI does not
 exist anymore.
Let me check this.

 6) You sent this series without even an RFC tag in the subject.
OK, will add it in next version.

 7) You sent this series, but you didn't even reply on target-devel to my
 review of the kernel-side changes.
Sorry, I have indeed been busy recently. I will reply to it soon.

 8) Last but not least, I count two pending patch series from you (NetDev
 QOM conversion and network hubs) that you dropped on the list without
You observe so carefully. :)  Yeah, but I have dropped them, and Stefan
usually pushed me to do them and I am working on them. :)
In fact, the NetDev QOM conversion depends on some of your pending
patchset; I hoped to work on it after your patchset was merged into
UPSTREAM. But everything is changing, so I have to work on it now.
 hardly following up to comments that were made.  So this gives me very
 little incentive to look at your series.
Sorry, please don't be discouraged; these tasks have to be done by me
this year. Many thanks for your comments; they have given me a lot of
useful skills.

 To put it very frankly, I seriously think that you need either more
 discipline, or very good mentoring to contribute complex patch series to
 QEMU.  The list can offer mentoring, but you must expect criticism like
 this one (believe me, this is nothing compared to what you get on the
 linux kernel mailing list).
:), welcome.

 I strongly suggest that you work on one series at a time, for example
 the network hub conversion seems to be the most promising.
OK, thanks.

 Paolo



-- 
Regards,

Zhi Yong Wu



Re: [Qemu-devel] [PATCH 00/16] QEMU vhost-scsi support

2012-04-19 Thread Zhi Yong Wu
On Thu, Apr 19, 2012 at 4:19 PM, Zhi Yong Wu zwu.ker...@gmail.com wrote:
 On Thu, Apr 19, 2012 at 3:16 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 Il 19/04/2012 04:38, zwu.ker...@gmail.com ha scritto:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

 The patchset was developed originally by Stefan about one year ago. I
 now rebase it to latest qemu.git/master and fixed some issues to make
 it work against tcm_vhost and virtio_scsi driver. But there are still
 some issues to fix. Let us make more effort later.

 Zhi Yong, I'll warn you in advance that this message is going to be a
 bit harsh.
 thanks for you reminder.

 You need to understand that once people have patches that work, that can
 be only a minor part of the work for getting them ready for submission.
  This is especially true of experienced people like Stefan.  It is quite
 common to work briefly on something to build a proof of concept, and
 then pick up the work again months later.

 Taking patches from other people and completing them can be much harder
 than working on new code, because you need to get up to speed with
 incomplete code and that is difficult.  You need to understand where the
 work was left and what comes next.  You also need to ask the right
 person to the guy who did the work, Stefan in this case.  Anything else
 will lead to frustration for you and everyone involved, and delay the
 project indefinitely.

 To put it very frankly I don't know where to start listing mistakes in
 this series.  For example:

 1) The series is missing any information on how you configure and use
 vhost-scsi.  Or on how you build it for that matter.
 OK, i will send out how to build & configure in next version if
 everything is good enough.

 2) The series uses vhost-scsi unconditionally, as far as I can tell.  So
 Perhaps there're some cleanup work to do, but it use vhost-scsi
 *conditionally*. In virtio_scsi_init(), you can see the code as below:
     if (s->vhost_scsi) {
         s->vdev.set_status = virtio_scsi_set_status;
     }
 As you have seen, virtio_scsi_set_status will be used to trigger
 vhost_scsi function.
 it breaks command-lines that build scsi-{disk,block} devices for use
 with a virtio-scsi-pci device.

 Even if it actually worked, you need to mention what happens if you
 configure SCSI devices on the same virtio-scsi driver that uses vhost.
 OK, i will do this in next version.


 3) The cover letter is missing any performance numbers or examples of
 why vhost-scsi is preferable to scsi-{disk,block}.  I know some of them,
 but it needs to be documented elsewhere.
 Sorry, i will do more effort about this later.

 Documentation is also needed on how/whether migration works with
 vhost-scsi, and between the QEMU and kernel targets (I know the answer
 to the latter is it doesn't; I don't know about the former.
 ditto

 4) The series is not properly split, for example a patch like 4/16 needs
 to be squashed above.
 ah, i thought that i can carefully split them only before this
 patchset are basically accepted.

 5) The series is not up-to-date, for example CONFIG_VIRTIO_SCSI does not
 exist anymore.
 Let me check this.

 6) You sent this series without even an RFC tag in the subject.
 OK, will add it in next version.

 7) You sent this series, but you didn't even reply on target-devel to my
 review of the kernel-side changes.
 Sorry, recently i indeed am busy. i will reply it soon.

 8) Last but not least, I count two pending patch series from you (NetDev
 QOM conversion and network hubs) that you dropped on the list without
 Observe so carefully.:), Yeah, but i have dropped them, and Stefan
s/have/haven't/g
 usually pushed me do them and i am working on them.:)
 In fact, the NetDev QOM depends on some of your push, push patchset,
 i hoped to work on it after your patchset is merged into UPSTREAM. But
 everything is changing, so i have to work on it now.
 hardly following up to comments that were made.  So this gives me very
 little incentive to look at your series.
 sorry, please don't discourage, these tasks have to be done by me this
 year. Very thinks for your comments, and they let me get a lot of
 useful skills.

 To put it very frankly, I seriously think that you need either more
 discipline, or very good mentoring to contribute complex patch series to
 QEMU.  The list can offer mentoring, but you must expect criticism like
 this one (believe me, this is nothing compared to what you get on the
 linux kernel mailing list).
 :), welcome.

 I strongly suggest that you work on one series at a time, for example
 the network hub conversion seems to be the most promising.
 OK, thanks.

 Paolo



 --
 Regards,

 Zhi Yong Wu



-- 
Regards,

Zhi Yong Wu



Re: [Qemu-devel] [PATCH 00/16] QEMU vhost-scsi support

2012-04-19 Thread Paolo Bonzini
Il 19/04/2012 10:19, Zhi Yong Wu ha scritto:
 
  2) The series uses vhost-scsi unconditionally, as far as I can tell.  So
 Perhaps there're some cleanup work to do, but it use vhost-scsi
 *conditionally*. In virtio_scsi_init(), you can see the code as below:
 if (s->vhost_scsi) {
 s->vdev.set_status = virtio_scsi_set_status;
 }
 As you have seen, virtio_scsi_set_status will be used to trigger
 vhost_scsi function.

Ok, I was not sure if that was enough.

 Even if it actually worked, you need to mention what happens if you
 configure SCSI devices on the same virtio-scsi driver that uses vhost.
 OK, i will do this in next version.

You probably need to stop exposing the SCSI bus altogether if vhost is
chosen.

Paolo



Re: [Qemu-devel] [PATCH 00/16] QEMU vhost-scsi support

2012-04-19 Thread Zhi Yong Wu
On Thu, Apr 19, 2012 at 4:43 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 Il 19/04/2012 10:19, Zhi Yong Wu ha scritto:
 
  2) The series uses vhost-scsi unconditionally, as far as I can tell.  So
 Perhaps there're some cleanup work to do, but it use vhost-scsi
 *conditionally*. In virtio_scsi_init(), you can see the code as below:
     if (s->vhost_scsi) {
         s->vdev.set_status = virtio_scsi_set_status;
     }
 As you have seen, virtio_scsi_set_status will be used to trigger
 vhost_scsi function.

 Ok, I was not sure if that was enough.

 Even if it actually worked, you need to mention what happens if you
 configure SCSI devices on the same virtio-scsi driver that uses vhost.
 OK, i will do this in next version.

 You probably need to stop exposing the SCSI bus altogether if vhost is
 chosen.
There is still some cleanup work to do about how vhost-scsi coexists
with virtio-scsi. I will carefully consider this, thanks.

 Paolo



-- 
Regards,

Zhi Yong Wu



Re: [Qemu-devel] [PATCH 00/16] QEMU vhost-scsi support

2012-04-19 Thread Zhi Yong Wu
On Thu, Apr 19, 2012 at 4:43 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 Il 19/04/2012 10:19, Zhi Yong Wu ha scritto:
 
  2) The series uses vhost-scsi unconditionally, as far as I can tell.  So
 Perhaps there're some cleanup work to do, but it use vhost-scsi
 *conditionally*. In virtio_scsi_init(), you can see the code as below:
     if (s->vhost_scsi) {
         s->vdev.set_status = virtio_scsi_set_status;
     }
 As you have seen, virtio_scsi_set_status will be used to trigger
 vhost_scsi function.

 Ok, I was not sure if that was enough.
From current testing, it can be used to identify if vhost will be
used. I planned to add one flag.


 Even if it actually worked, you need to mention what happens if you
 configure SCSI devices on the same virtio-scsi driver that uses vhost.
 OK, i will do this in next version.

 You probably need to stop exposing the SCSI bus altogether if vhost is
 chosen.

 Paolo



-- 
Regards,

Zhi Yong Wu



Re: [Qemu-devel] [PATCH 00/16] QEMU vhost-scsi support

2012-04-19 Thread Anthony Liguori

Hi,

As I've mentioned before in the past, I will not apply vhost-* without an extremely
compelling argument for it.


The reason we ultimately settled on vhost-net is that in the absence of a 
fundamental change in the userspace networking interface (with something like VJ 
channels), it's extremely unlikely that we would ever get the right interfaces 
to do zero-copy transmit/receive in userspace.


However, for storage, be it scsi or direct access, the same problem really 
doesn't exist.  There isn't an obvious benefit to being in the kernel.


There are many downsides though.  It's a big security risk.  SCSI is a complex 
protocol and I'd rather the piece of code that's parsing and emulating SCSI cdbs 
was unprivileged and sandboxed than running within the kernel context.


So before we get too deep in patches, we need some solid justification first.

Regards,

Anthony Liguori


On 04/18/2012 09:38 PM, zwu.ker...@gmail.com wrote:

From: Zhi Yong Wu wu...@linux.vnet.ibm.com

The patchset was developed originally by Stefan about one year ago. I now 
rebase it to latest qemu.git/master and fixed some issues to make it work 
against tcm_vhost and virtio_scsi driver. But there are still some issues to 
fix. Let us make more effort later.

Stefan Hajnoczi (13):
   virtio-scsi: Add wwpn and tgpt properties
   vhost: Pass device path to vhost_dev_init()
   virtio-scsi: Add vhost_vring_target ioctl struct
   virtio-scsi: Fix tgpt typo to tpgt and use uint16_t
   virtio-scsi: Build virtio-scsi.o against vhost.o
   virtio-scsi: Open and initialize /dev/vhost-scsi
   virtio-scsi: Start/stop vhost
   notifier: add validity check and notify function
   virtio-pci: support host notifiers in TCG mode
   virtio-pci: check that event notification worked
   vhost-scsi: add -vhost-scsi host device
   virtio-scsi: use the vhost-scsi host device
   virtio-scsi: WIP VHOST_SCSI_SET_ENDPOINT call

Zhi Yong Wu (3):
   vhost-scsi: enable vhost notifiers for multiple queues
   vhost-scsi: move some definitions to its header file
   vhost-scsi: clear endpoint on stopped

  Makefile.target  |2 +-
  configure|2 +
  event_notifier.c |   21 +++
  event_notifier.h |4 +
  hw/qdev-properties.c |   32 ++
  hw/qdev.h|3 +
  hw/vhost-scsi.c  |  156 ++
  hw/vhost-scsi.h  |   39 +
  hw/vhost.c   |5 +-
  hw/vhost.h   |3 +-
  hw/vhost_net.c   |2 +-
  hw/virtio-pci.c  |   28 -
  hw/virtio-scsi.c |   52 +
  hw/virtio-scsi.h |1 +
  hw/virtio.c  |7 ++
  qemu-common.h|1 +
  qemu-config.c|   16 +
  qemu-options.hx  |4 +
  vl.c |   18 ++
  19 files changed, 388 insertions(+), 8 deletions(-)
  create mode 100644 hw/vhost-scsi.c
  create mode 100644 hw/vhost-scsi.h






Re: [Qemu-devel] [PATCH 00/16] QEMU vhost-scsi support

2012-04-19 Thread Anthony Liguori

Hi Nicholas,

On 04/19/2012 06:53 PM, Nicholas A. Bellinger wrote:

On Thu, 2012-04-19 at 07:30 -0500, Anthony Liguori wrote:

However, for storage, be it scsi or direct access, the same problem really
doesn't exist.  There isn't an obvious benefit to being in the kernel.



In the modern Linux v3.x tree, it was decided there is an obvious
benefit to fabric drivers developers for going ahead and putting proper
SCSI target logic directly into the kernel..  ;)


I'm sure there are obvious benefits to having the kernel have SCSI target logic. 
 I'm not claiming that there isn't.


But there is not an obvious benefit to doing SCSI emulation *for virtual 
machine* guests in the kernel.


Guests are unconditionally hostile.  There is no qualification here.  Public 
clouds are the obvious example of this.


TCM runs in the absolute most privileged context possible.  When you're dealing 
with extremely hostile input, it's pretty obvious that you want to run it in the 
lowest privileged context as humanly possible.


Good or bad, QEMU runs as a very unprivileged user confined by SELinux and very 
soon, sandboxed with seccomp.  There's an obvious benefit to putting complex 
code into an environment like this.





There are many downsides though.  It's a big security risk.  SCSI is a complex
protocol and I'd rather the piece of code that's parsing and emulating SCSI cdbs
was unprivileged and sandboxed than running within the kernel context.



It has historically been a security risk doing raw SCSI passthrough of
complex CDBs to underlying SCSI LLD code, because SCSI CDB emulation
support within specific LLDs had an easy and sure-fire chance to get
said complex CDB emulation wrong..  (eg: no generic library in the
kernel for LLDs until libata was created)


That's a perfectly good argument for having TCM in the kernel


With Linux v3.x hosts we now have universal target mode support in the
upstream kernel for BLOCK+FILEIO backends with full SPC-3 (NAA IEEE
 Registered Extended WWN naming, Explicit/Implicit ALUA multipath,
Persistent Reservations) using a control plane with ConfigFS
representing objects at the VFS layer for parent/child and intra-module
kernel data structure relationships.

We also have userspace rtslib (supported in downstream distros) that
exposes the ConfigFS layout of tcm_vhost fabric endpoints as easily
scriptable python library objects into higher level application code.


And that's all goodness :-)




So before we get too deep in patches, we need some solid justification first.



So the potential performance benefit is one thing that will be in favor
of vhost-scsi,


Why?  Why would vhost-scsi be any faster than doing target emulation in 
userspace and then doing O_DIRECT with linux-aio to a raw device?


I think the ability to utilize the underlying TCM fabric

and run concurrent ALUA multipath using multiple virtio-scsi LUNs to the
same /sys/kernel/config/target/core/$HBA/$DEV/ backend can potentially
give us some nice flexibility when dynamically managing paths into the
virtio-scsi guest.


The thing is, you can always setup this kind of infrastructure and expose a raw 
block device to userspace and then have QEMU emulate a target and turn that into 
O_DIRECT + linux-aio.
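(A minimal sketch of that O_DIRECT + linux-aio path, with error handling
trimmed and the device path as an example:)

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>
#include <libaio.h>

int main(void)
{
    io_context_t ctx = 0;
    struct iocb cb, *cbs[1] = { &cb };
    struct io_event ev;
    void *buf;
    int fd;

    fd = open("/dev/vdb", O_RDWR | O_DIRECT);  /* raw device, bypass page cache */
    posix_memalign(&buf, 512, 4096);           /* O_DIRECT requires aligned buffers */
    io_setup(128, &ctx);                       /* allow up to 128 in-flight requests */

    io_prep_pread(&cb, fd, buf, 4096, 0);      /* read 4 KiB at offset 0 */
    io_submit(ctx, 1, cbs);                    /* submit without blocking the caller */
    io_getevents(ctx, 1, 1, &ev, NULL);        /* reap the completion */

    io_destroy(ctx);
    close(fd);
    free(buf);
    return 0;
}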


We can also use SG_IO to inject SCSI commands if we really need to.  I'd rather 
we optimize this path.  If nothing else, QEMU should be filtering SCSI requests 
before the kernel sees them.  If something is going to SEGV, it's far better 
that it's QEMU than the kernel.


We cannot avoid doing SCSI emulation in QEMU.  SCSI is too fundamental to far 
too many devices.  So the prospect of not having good SCSI emulation in QEMU is 
not realistic.


Regards,

Anthony Liguori


Also, since client-side ALUA has been supported across pretty much all server
class SCSI clients in recent years, it does end up getting a lot of
usage in the SCSI world.  It's a client-side SCSI multipath feature that
is fabric independent, and that we know is already supported for free
across all Linux flavours + other modern server class guests.






--nab






[Qemu-devel] [PATCH 00/16] QEMU vhost-scsi support

2012-04-18 Thread zwu . kernel
From: Zhi Yong Wu wu...@linux.vnet.ibm.com

The patchset was developed originally by Stefan about one year ago. I have now
rebased it to the latest qemu.git/master and fixed some issues to make it work
against the tcm_vhost and virtio_scsi drivers. But there are still some issues to
fix; we will make more effort on them later.

Stefan Hajnoczi (13):
  virtio-scsi: Add wwpn and tgpt properties
  vhost: Pass device path to vhost_dev_init()
  virtio-scsi: Add vhost_vring_target ioctl struct
  virtio-scsi: Fix tgpt typo to tpgt and use uint16_t
  virtio-scsi: Build virtio-scsi.o against vhost.o
  virtio-scsi: Open and initialize /dev/vhost-scsi
  virtio-scsi: Start/stop vhost
  notifier: add validity check and notify function
  virtio-pci: support host notifiers in TCG mode
  virtio-pci: check that event notification worked
  vhost-scsi: add -vhost-scsi host device
  virtio-scsi: use the vhost-scsi host device
  virtio-scsi: WIP VHOST_SCSI_SET_ENDPOINT call

Zhi Yong Wu (3):
  vhost-scsi: enable vhost notifiers for multiple queues
  vhost-scsi: move some definitions to its header file
  vhost-scsi: clear endpoint on stopped

 Makefile.target  |2 +-
 configure|2 +
 event_notifier.c |   21 +++
 event_notifier.h |4 +
 hw/qdev-properties.c |   32 ++
 hw/qdev.h|3 +
 hw/vhost-scsi.c  |  156 ++
 hw/vhost-scsi.h  |   39 +
 hw/vhost.c   |5 +-
 hw/vhost.h   |3 +-
 hw/vhost_net.c   |2 +-
 hw/virtio-pci.c  |   28 -
 hw/virtio-scsi.c |   52 +
 hw/virtio-scsi.h |1 +
 hw/virtio.c  |7 ++
 qemu-common.h|1 +
 qemu-config.c|   16 +
 qemu-options.hx  |4 +
 vl.c |   18 ++
 19 files changed, 388 insertions(+), 8 deletions(-)
 create mode 100644 hw/vhost-scsi.c
 create mode 100644 hw/vhost-scsi.h

-- 
1.7.6