Re: [Qemu-block] [PATCH 06/10] scsi, file-posix: add support for persistent reservation management

2017-08-30 Thread Stefan Hajnoczi
On Tue, Aug 22, 2017 at 03:18:28PM +0200, Paolo Bonzini wrote:
> +#ifdef CONFIG_LINUX
> +PRManager *pr_manager_lookup(const char *id, Error **errp);
> +#else
> +static inline PRManager *pr_manager_lookup(const char *id,
> +  Error **errp)

Indentation

> diff --git a/scsi/pr-manager.c b/scsi/pr-manager.c
> new file mode 100644
> index 00..e80f8d9b31
> --- /dev/null
> +++ b/scsi/pr-manager.c
> @@ -0,0 +1,109 @@
> +/*
> + * Persistent reservation manager abstract class
> + *
> + * Copyright (c) 2017 Red Hat, Inc.
> + *
> + * Author: Paolo Bonzini 
> + *
> + * This code is licensed under the LGPL.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qapi/error.h"
> +#include "block/aio.h"
> +#include "block/thread-pool.h"
> +#include "scsi/pr-manager.h"
> +#include "scsi/trace.h"
> +
> +#include 

HACKING "1.2. Include directives" defines the order of #includes.  It
should be "qemu/osdep.h", , followed by all other QEMU
headers.



Re: [Qemu-block] [PATCH 06/10] scsi, file-posix: add support for persistent reservation management

2017-08-23 Thread Paolo Bonzini
On 23/08/2017 06:13, Fam Zheng wrote:
> On Tue, 08/22 15:18, Paolo Bonzini wrote:
>> diff --git a/docs/pr-manager.rst b/docs/pr-manager.rst
> 
> Is docs/interop/persistent-reservation-manager.rst better? (Move to interop/ 
> and
> de-abbreviate) ...
> 
>> new file mode 100644
>> index 00..b6089fb57c
>> --- /dev/null
>> +++ b/docs/pr-manager.rst
>> @@ -0,0 +1,51 @@
>> +==
>> +Persistent reservation managers
>> +==
>> +
>> +SCSI persistent Reservations allow restricting access to block devices
>> +to specific initiators in a shared storage setup.  When implementing
>> +clustering of virtual machines, it is a common requirement for virtual
>> +machines to send persistent reservation SCSI commands.  However,
>> +the operating system restricts sending these commands to unprivileged
>> +programs because incorrect usage can disrupt regular operation of the
>> +storage fabric.
>> +
>> +For this reason, QEMU's SCSI passthrough devices, ``scsi-block``
>> +and ``scsi-generic`` (both are only available on Linux) can delegate
>> +implementation of persistent reservations to a separate object,
>> +the "persistent reservation manager".  Only PERSISTENT RESERVE OUT and
>> +PERSISTENT RESERVE IN commands are passed to the persistent reservation
>> +manager object; other commands are processed by QEMU as usual.
>> +
>> +-
>> +Defining a persistent reservation manager
>> +-
>> +
>> +A persistent reservation manager is an instance of a subclass of the
>> +"pr-manager" QOM class.
> 
> Or is this abstraction class the reason it is not under interop? Why not just
> define the protocol?

It is not under interop because this is user documentation.  The
protocol documentation is under interop because the protocol is public,
and if someone else wants to talk to qemu-pr-helper they can.

(If the protocol was private, the protocol documentation would have been
under docs/devel).

Paolo

>> +
>> +Right now only one subclass is defined, ``pr-manager-helper``, which
>> +forwards the commands to an external privileged helper program
>> +over Unix sockets.  The helper program only allows sending persistent
>> +reservation commands to devices for which QEMU has a file descriptor,
>> +so that QEMU will not be able to effect persistent reservations
>> +unless it has access to both the socket and the device.
>> +
>> +``pr-manager-helper`` has a single string property, ``path``, which
>> +accepts the path to the helper program's Unix socket.  For example,
>> +the following command line defines a ``pr-manager-helper`` object and
>> +attaches it to a SCSI passthrough device::
>> +
>> +  $ qemu-system-x86_64
>> +  -device virtio-scsi \
>> +  -object 
>> pr-manager-helper,id=helper0,path=/var/run/qemu-pr-helper.sock
>> +  -drive 
>> if=none,id=hd,driver=raw,file.filename=/dev/sdb,file.pr-manager=helper0
>> +  -device scsi-block,drive=hd
>> +
>> +Alternatively, using ``-blockdev``::
>> +
>> +  $ qemu-system-x86_64
>> +  -device virtio-scsi \
>> +  -object 
>> pr-manager-helper,id=helper0,path=/var/run/qemu-pr-helper.sock
>> +  -blockdev 
>> node-name=hd,driver=raw,file.driver=host_device,file.filename=/dev/sdb,file.pr-manager=helper0
>> +  -device scsi-block,drive=hd
> 
> Fam
> 




Re: [Qemu-block] [PATCH 06/10] scsi, file-posix: add support for persistent reservation management

2017-08-22 Thread Fam Zheng
On Tue, 08/22 15:18, Paolo Bonzini wrote:
> diff --git a/docs/pr-manager.rst b/docs/pr-manager.rst

Is docs/interop/persistent-reservation-manager.rst better? (Move to interop/ and
de-abbreviate) ...

> new file mode 100644
> index 00..b6089fb57c
> --- /dev/null
> +++ b/docs/pr-manager.rst
> @@ -0,0 +1,51 @@
> +==
> +Persistent reservation managers
> +==
> +
> +SCSI persistent Reservations allow restricting access to block devices
> +to specific initiators in a shared storage setup.  When implementing
> +clustering of virtual machines, it is a common requirement for virtual
> +machines to send persistent reservation SCSI commands.  However,
> +the operating system restricts sending these commands to unprivileged
> +programs because incorrect usage can disrupt regular operation of the
> +storage fabric.
> +
> +For this reason, QEMU's SCSI passthrough devices, ``scsi-block``
> +and ``scsi-generic`` (both are only available on Linux) can delegate
> +implementation of persistent reservations to a separate object,
> +the "persistent reservation manager".  Only PERSISTENT RESERVE OUT and
> +PERSISTENT RESERVE IN commands are passed to the persistent reservation
> +manager object; other commands are processed by QEMU as usual.
> +
> +-
> +Defining a persistent reservation manager
> +-
> +
> +A persistent reservation manager is an instance of a subclass of the
> +"pr-manager" QOM class.

Or is this abstraction class the reason it is not under interop? Why not just
define the protocol?

> +
> +Right now only one subclass is defined, ``pr-manager-helper``, which
> +forwards the commands to an external privileged helper program
> +over Unix sockets.  The helper program only allows sending persistent
> +reservation commands to devices for which QEMU has a file descriptor,
> +so that QEMU will not be able to effect persistent reservations
> +unless it has access to both the socket and the device.
> +
> +``pr-manager-helper`` has a single string property, ``path``, which
> +accepts the path to the helper program's Unix socket.  For example,
> +the following command line defines a ``pr-manager-helper`` object and
> +attaches it to a SCSI passthrough device::
> +
> +  $ qemu-system-x86_64
> +  -device virtio-scsi \
> +  -object 
> pr-manager-helper,id=helper0,path=/var/run/qemu-pr-helper.sock
> +  -drive 
> if=none,id=hd,driver=raw,file.filename=/dev/sdb,file.pr-manager=helper0
> +  -device scsi-block,drive=hd
> +
> +Alternatively, using ``-blockdev``::
> +
> +  $ qemu-system-x86_64
> +  -device virtio-scsi \
> +  -object 
> pr-manager-helper,id=helper0,path=/var/run/qemu-pr-helper.sock
> +  -blockdev 
> node-name=hd,driver=raw,file.driver=host_device,file.filename=/dev/sdb,file.pr-manager=helper0
> +  -device scsi-block,drive=hd

Fam



[Qemu-block] [PATCH 06/10] scsi, file-posix: add support for persistent reservation management

2017-08-22 Thread Paolo Bonzini
It is a common requirement for virtual machine to send persistent
reservations, but this currently requires either running QEMU with
CAP_SYS_RAWIO, or using out-of-tree patches that let an unprivileged
QEMU bypass Linux's filter on SG_IO commands.

As an alternative mechanism, the next patches will introduce a
privileged helper to run persistent reservation commands without
expanding QEMU's attack surface unnecessarily.

The helper is invoked through a "pr-manager" QOM object, to which
file-posix.c passes SG_IO requests for PERSISTENT RESERVE OUT and
PERSISTENT RESERVE IN commands.  For example:

  $ qemu-system-x86_64
  -device virtio-scsi \
  -object pr-manager-helper,id=helper0,path=/var/run/qemu-pr-helper.sock
  -drive 
if=none,id=hd,driver=raw,file.filename=/dev/sdb,file.pr-manager=helper0
  -device scsi-block,drive=hd

or:

  $ qemu-system-x86_64
  -device virtio-scsi \
  -object pr-manager-helper,id=helper0,path=/var/run/qemu-pr-helper.sock
  -blockdev 
node-name=hd,driver=raw,file.driver=host_device,file.filename=/dev/sdb,file.pr-manager=helper0
  -device scsi-block,drive=hd

Multiple pr-manager implementations are conceivable and possible, though
only one is implemented right now.  For example, a pr-manager could:

- talk directly to the multipath daemon from a privileged QEMU
  (i.e. QEMU links to libmpathpersist); this makes reservation work
  properly with multipath, but still requires CAP_SYS_RAWIO

- use the Linux IOC_PR_* ioctls (they require CAP_SYS_ADMIN though)

- more interestingly, implement reservations directly in QEMU
  through file system locks or a shared database (e.g. sqlite)

Signed-off-by: Paolo Bonzini 
---
 Makefile.objs |   1 +
 block/file-posix.c|  30 +
 docs/pr-manager.rst   |  51 ++
 include/scsi/pr-manager.h |  57 
 qapi/block-core.json  |   4 ++
 scsi/Makefile.objs|   2 +
 scsi/pr-manager.c | 109 ++
 vl.c  |   3 +-
 8 files changed, 256 insertions(+), 1 deletion(-)
 create mode 100644 docs/pr-manager.rst
 create mode 100644 include/scsi/pr-manager.h
 create mode 100644 scsi/pr-manager.c

diff --git a/Makefile.objs b/Makefile.objs
index f68aa3b60d..64bebd05db 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -168,6 +168,7 @@ trace-events-subdirs += qapi
 trace-events-subdirs += accel/tcg
 trace-events-subdirs += accel/kvm
 trace-events-subdirs += nbd
+trace-events-subdirs += scsi
 
 trace-events-files = $(SRC_PATH)/trace-events 
$(trace-events-subdirs:%=$(SRC_PATH)/%/trace-events)
 
diff --git a/block/file-posix.c b/block/file-posix.c
index f4de022ae0..47aadbf45d 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -34,6 +34,9 @@
 #include "qapi/util.h"
 #include "qapi/qmp/qstring.h"
 
+#include "scsi/pr-manager.h"
+#include "scsi/constants.h"
+
 #if defined(__APPLE__) && (__MACH__)
 #include 
 #include 
@@ -156,6 +159,8 @@ typedef struct BDRVRawState {
 bool page_cache_inconsistent:1;
 bool has_fallocate;
 bool needs_alignment;
+
+PRManager *pr_mgr;
 } BDRVRawState;
 
 typedef struct BDRVRawReopenState {
@@ -403,6 +408,11 @@ static QemuOptsList raw_runtime_opts = {
 .type = QEMU_OPT_STRING,
 .help = "file locking mode (on/off/auto, default: auto)",
 },
+{
+.name = "pr-manager",
+.type = QEMU_OPT_STRING,
+.help = "id of persistent reservation manager object (default: 
none)",
+},
 { /* end of list */ }
 },
 };
@@ -414,6 +424,7 @@ static int raw_open_common(BlockDriverState *bs, QDict 
*options,
 QemuOpts *opts;
 Error *local_err = NULL;
 const char *filename = NULL;
+const char *str;
 BlockdevAioOptions aio, aio_default;
 int fd, ret;
 struct stat st;
@@ -478,6 +489,16 @@ static int raw_open_common(BlockDriverState *bs, QDict 
*options,
 abort();
 }
 
+str = qemu_opt_get(opts, "pr-manager");
+if (str) {
+s->pr_mgr = pr_manager_lookup(str, _err);
+if (local_err) {
+error_propagate(errp, local_err);
+ret = -EINVAL;
+goto fail;
+}
+}
+
 s->open_flags = open_flags;
 raw_parse_flags(bdrv_flags, >open_flags);
 
@@ -2600,6 +2621,15 @@ static BlockAIOCB *hdev_aio_ioctl(BlockDriverState *bs,
 if (fd_open(bs) < 0)
 return NULL;
 
+if (req == SG_IO && s->pr_mgr) {
+struct sg_io_hdr *io_hdr = buf;
+if (io_hdr->cmdp[0] == PERSISTENT_RESERVE_OUT ||
+io_hdr->cmdp[0] == PERSISTENT_RESERVE_IN) {
+return pr_manager_execute(s->pr_mgr, bdrv_get_aio_context(bs),
+  s->fd, io_hdr, cb, opaque);
+}
+}
+
 acb = g_new(RawPosixAIOData, 1);
 acb->bs = bs;
 acb->aio_type = QEMU_AIO_IOCTL;
diff --git a/docs/pr-manager.rst