Re: [PATCH 1/2] file-posix: Use OFD lock only if the filesystem supports the lock

2021-02-10 Thread Masayoshi Mizuma
On Fri, Nov 20, 2020 at 04:42:28PM +0100, Kevin Wolf wrote:
> Am 20.11.2020 um 00:56 hat Masayoshi Mizuma geschrieben:
> > On Thu, Nov 19, 2020 at 11:44:42AM +0100, Kevin Wolf wrote:
> > > Am 18.11.2020 um 20:48 hat Masayoshi Mizuma geschrieben:
> > > > On Wed, Nov 18, 2020 at 02:10:36PM -0500, Masayoshi Mizuma wrote:
> > > > > On Wed, Nov 18, 2020 at 04:42:47PM +0100, Kevin Wolf wrote:
> > > > > > The logic looks fine to me, at least assuming that EINVAL is really 
> > > > > > what
> > > > > > we will consistently get from the kernel if OFD locks are not 
> > > > > > supported.
> > > > > > Is this documented anywhere? The fcntl manpage doesn't seem to 
> > > > > > mention
> > > > > > this case.
> > > > 
> > > > The man page of fcntl(2) says:
> > > > 
> > > >EINVAL The value specified in cmd is not recognized by this 
> > > > kernel.
> > > > 
> > > > So I think EINVAL is good enough to check whether the filesystem 
> > > > supports
> > > > OFD locks or not...
> > > 
> > > A kernel not knowing the cmd at all is a somewhat different case (and
> > > certainly a different code path) than a filesystem not supporting it.
> > > 
> > > I just had a look at the kernel code, and to me it seems that the
> > > difference between POSIX locks and OFD locks is handled entirely in
> > > filesystem independent code. A filesystem driver would in theory have
> > > ways to distinguish both, but I don't see any driver in the kernel tree
> > > that actually does this (and there is probably little reason for a
> > > driver to do so).
> > > 
> > > So now I wonder what filesystem you are using? I'm curious what I
> > > missed.
> > 
> > I'm using a proprietary filesystem, which isn't in the linux kernel.
> > The filesystem supports posix lock only, doesn't support OFD lock...
> 
> Do you know why that proprietary filesystem driver makes a difference
> between POSIX locks and OFD locks? The main difference between both
> types is when they are released automatically, and this is handled by
> generic kernel code and not the filesystem driver.
> 
> From a filesystem perspective, I don't see any reason to even
> distuingish. So unless there are good reasons for making the
> distinction, I'm currently inclined to view this as a filesystem
> driver bug.
> 
> It makes handling this case hard because when the case isn't even
> supposed to exist, of course there won't be a defined error code.

Hi Kevin,

The filesystem team found a locking issue in the filesystem.
Your comments were very helpful! I really appriciate it.

Thanks,
Masa



Re: To start multiple KVM guests from one qcow2 image with transient disk option

2021-01-17 Thread Masayoshi Mizuma
On Fri, Jan 15, 2021 at 01:49:54PM +0100, Peter Krempa wrote:
> On Thu, Jan 07, 2021 at 16:32:59 -0500, Masayoshi Mizuma wrote:
> > On Thu, Jan 07, 2021 at 09:05:42AM +0100, Peter Krempa wrote:
> > > On Tue, Jan 05, 2021 at 15:12:55 +0100, Peter Krempa wrote:
> > > > On Mon, Jan 04, 2021 at 15:30:19 -0500, Masayoshi Mizuma wrote:
> > > > > On Sat, Dec 19, 2020 at 11:30:39PM -0500, Masayoshi Mizuma wrote:
> 
> [...]
> 
> > Guest0 QMP:
> > 
> >   
> > {"execute":"blockdev-add","arguments":{"driver":"file","filename":"/var/lib/libvirt/images/guest.TRANSIENT-Guest0","node-name":"storage2","auto-read-only":true,"discard":"unmap"}}
> > 
> > Guest1 QMP:
> > 
> >   
> > {"execute":"blockdev-add","arguments":{"driver":"file","filename":"/var/lib/libvirt/images/guest.TRANSIENT-Guest1","node-name":"storage2","auto-read-only":true,"discard":"unmap"}}
> 
> Anyways, if you want, you can try implement this using the
> frontend-hotplug approach for 'virtio' and 'scsi' disks, but I can't
> guarantee that it will be accepted upstream in cases when the
> implementation will turn out to be too hairy.

Thanks.
I'm trying to implement this. I will submit the patches as RFC or WIP.

- Masa



Re: To start multiple KVM guests from one qcow2 image with transient disk option

2021-01-07 Thread Masayoshi Mizuma
On Thu, Jan 07, 2021 at 09:05:42AM +0100, Peter Krempa wrote:
> On Tue, Jan 05, 2021 at 15:12:55 +0100, Peter Krempa wrote:
> > On Mon, Jan 04, 2021 at 15:30:19 -0500, Masayoshi Mizuma wrote:
> > > On Sat, Dec 19, 2020 at 11:30:39PM -0500, Masayoshi Mizuma wrote:
>  
>  [...]
>  
>  >   {"execute":"cont"}
> > 
> > So that is a no-go. Some disk bus-es such as IDE don't support hotplug:
> > 
> > https://gitlab.com/libvirt/libvirt/-/blob/master/src/qemu/qemu_hotplug.c#L1074
> > 
> > You could try to just instantiate the backend of the disk as read-only,
> > and then create a writable overlay. You just need to make sure that the
> > disk will be writable and that it works even for IDE/SATA which doesn't
> > support read-only:
> > 
> > https://gitlab.com/libvirt/libvirt/-/blob/master/src/qemu/qemu_validate.c#L2634
> 
> Okay, I've actually tried implementing my suggestion and that doesn't
> work. With scsi disks qemu crashes on an assertion failure when I try
> writing to the disk after setting it up as suggested and virtio disk
> returns an I/O error as it remembers that it was read-only when
> starting.
> 
> I'm considering whether it's worth implementing this via hotplug then.
> 
> Arguably simple deployments don't need it and complex ones have some
> kind of orchestration which can do it externally.
> 
> This specifically comes with a caveat of the above, as currently the
> overlay used to discard writes is created under the same path as the
> image and can't be controlled, which might be a problem for complex
> deployments.
> 
> Also the above algorithm with a constant suffix added to the image
> prevents to use it with multiple VMs anyways, since the overlay file
> name will collide (since it's generated based on the same rules).
> 
> Didn't you run into the collision?

Yes, I needed to set the different filename for each guests with blockdev-add 
QMP command.
Is it good idea to add the guest name as the suffix to the filename? Like as:

Guest0 QMP:

  
{"execute":"blockdev-add","arguments":{"driver":"file","filename":"/var/lib/libvirt/images/guest.TRANSIENT-Guest0","node-name":"storage2","auto-read-only":true,"discard":"unmap"}}

Guest1 QMP:

  
{"execute":"blockdev-add","arguments":{"driver":"file","filename":"/var/lib/libvirt/images/guest.TRANSIENT-Guest1","node-name":"storage2","auto-read-only":true,"discard":"unmap"}}

Thanks,
Masa



Re: To start multiple KVM guests from one qcow2 image with transient disk option

2021-01-04 Thread Masayoshi Mizuma
On Sat, Dec 19, 2020 at 11:30:39PM -0500, Masayoshi Mizuma wrote:
> On Thu, Dec 17, 2020 at 10:39:36AM +0100, Peter Krempa wrote:
> > On Mon, Dec 14, 2020 at 22:49:03 -0500, Masayoshi Mizuma wrote:
> > > On Sat, Dec 12, 2020 at 11:57:15AM +0100, Peter Krempa wrote:
> > > > On Fri, Dec 11, 2020 at 20:58:48 -0500, Masayoshi Mizuma wrote:
> > > > > Hello,
> > > > > 
> > > > > I would like to start multiple KVM guests from one qcow2 image, and
> > > > > discard the changes which the KVM guests done.
> > > > > 
> > > > > transient disk option is useful for discarding the changes, however,
> > > > > we cannot start multiple KVM guest from one qcow2 image because the
> > > > > image is write-locked by the first guest to be started.
> > > > > 
> > > > > I suppose the disk which transient option is enabled don't need to
> > > > > get the write lock because any changes go to the overlay image, and
> > > > > the overlay image is removed when the guest shutdown.
> > > > > 
> > > > > qemu has 'locking' option and the write lock is disabled when 
> > > > > locking=off.
> > > > > To implement that, I have two ideas. I would appreciate it if you 
> > > > > could
> > > > > give me the ideas which way is better (or another way).
> > > > 
> > > > There are two aspects of this.
> > > > 
> > > > 1) Controlling the locking property of qemu may be worth in many cases,
> > > > so by itself this would be a worthwhile patchset to add control of
> > > > qemu locking for a per-backing store basis.
> > > > 
> > > > 2) Disabling locking to achieve this is wrong though. The qemu image
> > > > locking infrastructure is justified and prevents many problems which
> > > > users might get into by wrong configuration.
> > > > 
> > > > For  disks, we should rather instantiate the top level
> > > > format node as 'readonly' and then put a read-write overlay on top of
> > > > it. This still prevents from using the file which became a backing file
> > > > in read-write mode in another VM.
> > > 
> > > Thank you for the idea!
> > > I just tried to change the original image to read-only (changed the 
> > > "read-only":false
> > > to "read-only":true) simply, then created a read-write overlay.
> > > qemu stopped with following assertion:
> > > 
> > >   qemu-system-x86_64: ../block/io.c:1874: bdrv_co_write_req_prepare: 
> > > Assertion `child->perm & BLK_PERM_WRITE' failed.
> > > 
> > > It looks like the qemu hit the assertion because the permission of the 
> > > overlay image
> > > is same as the original image.
> > > Probably I'm missing something... I'll dive into that...
> > 
> > Could you please post the command line and the QMP commands? Maybe
> > something isn't configured right in libvirt. Alternatively qemu might
> > need modification.
> 
> Sure, here is the qemu command line options and QMP commands:
> 
> qemu command line options:
> 
> qemu-system-x86_64 \
>   -M q35,accel=kvm,usb=off,vmport=off,smm=on,dump-guest-core=off \
>   -smp 4 \
>   -m 4096 \
>   -blockdev 
> '{"driver":"file","filename":"/var/lib/libvirt/images/guest.qcow2","node-name":"storage1","auto-read-only":true,"discard":"unmap"}'
>  \
>   -blockdev 
> '{"node-name":"format1","read-only":true,"driver":"qcow2","file":"storage1","backing":null}'
>  \
>   -device virtio-blk-pci,drive=format1,id=virtio-disk0,bootindex=1 \
>   -nographic \
>   -nodefaults \
>   -no-user-config \
>   -serial telnet::1235,server,nowait \
>   -qmp tcp::1335,server,nowait \
>   -S
> 
> QMP commands:
> (Before running following QMP commands, I create 
> '/var/lib/libvirt/images/guest.qcow2.TRANSIENT'
>  by touch command)
> 
>{"execute":"qmp_capabilities"}
>
> {"execute":"blockdev-add","arguments":{"driver":"file","filename":"/var/lib/libvirt/images/guest.qcow2.TRANSIENT","node-name":"storage2","auto-read-only":true,"discard":"unmap"}}
>
> {"execute":"blockdev-create",&qu

Re: [PATCH 1/2] file-posix: Use OFD lock only if the filesystem supports the lock

2020-11-19 Thread Masayoshi Mizuma
On Thu, Nov 19, 2020 at 11:44:42AM +0100, Kevin Wolf wrote:
> Am 18.11.2020 um 20:48 hat Masayoshi Mizuma geschrieben:
> > On Wed, Nov 18, 2020 at 02:10:36PM -0500, Masayoshi Mizuma wrote:
> > > On Wed, Nov 18, 2020 at 04:42:47PM +0100, Kevin Wolf wrote:
> > > > Am 06.11.2020 um 05:01 hat Masayoshi Mizuma geschrieben:
> > > > > From: Masayoshi Mizuma 
> > > > > 
> > > > > locking=auto doesn't work if the filesystem doesn't support OFD lock.
> > > > > In that situation, following error happens:
> > > > > 
> > > > >   qemu-system-x86_64: -blockdev 
> > > > > driver=qcow2,node-name=disk,file.driver=file,file.filename=/mnt/guest.qcow2,file.locking=auto:
> > > > >  Failed to lock byte 100
> > > > > 
> > > > > qemu_probe_lock_ops() judges whether qemu can use OFD lock
> > > > > or not with doing fcntl(F_OFD_GETLK) to /dev/null. So the
> > > > > error happens if /dev/null supports OFD lock, but the filesystem
> > > > > doesn't support the lock.
> > > > > 
> > > > > Lock the actual file, not /dev/null, using F_OFD_SETLK and if that
> > > > > fails, then fallback to F_SETLK.
> > > > > 
> > > > > Signed-off-by: Masayoshi Mizuma 
> 
> > > > > -bool qemu_has_ofd_lock(void)
> > > > > -{
> > > > > -qemu_probe_lock_ops();
> > > > >  #ifdef F_OFD_SETLK
> > > > > -return fcntl_op_setlk == F_OFD_SETLK;
> > > > > +static int _qemu_lock_fcntl(int fd, struct flock *fl)
> > > > > +{
> > > > > +int ret;
> > > > > +bool ofd_lock = true;
> > > > > +
> > > > > +do {
> > > > > +if (ofd_lock) {
> > > > > +ret = fcntl(fd, F_OFD_SETLK, fl);
> > > > > +if ((ret == -1) && (errno == EINVAL)) {
> > > > > +ofd_lock = false;
> > > > > +}
> > > > > +}
> > > > > +
> > > > > +if (!ofd_lock) {
> > > > > +/* Fallback to POSIX lock */
> > > > > +ret = fcntl(fd, F_SETLK, fl);
> > > > > +}
> > > > > +} while (ret == -1 && errno == EINTR);
> > > > > +
> > > > > +return ret == -1 ? -errno : 0;
> > > > > +}
> > > > >  #else
> > > > > -return false;
> > > > > -#endif
> > > > > +static int _qemu_lock_fcntl(int fd, struct flock *fl)
> > > > > +{
> > > > > +int ret;
> > > > > +
> > > > > +do {
> > > > > +ret = fcntl(fd, F_SETLK, fl);
> > > > > +} while (ret == -1 && errno == EINTR);
> > > > > +
> > > > > +return ret == -1 ? -errno : 0;
> > > > >  }
> > > > > +#endif
> > > > 
> > > > The logic looks fine to me, at least assuming that EINVAL is really what
> > > > we will consistently get from the kernel if OFD locks are not supported.
> > > > Is this documented anywhere? The fcntl manpage doesn't seem to mention
> > > > this case.
> > 
> > The man page of fcntl(2) says:
> > 
> >EINVAL The value specified in cmd is not recognized by this kernel.
> > 
> > So I think EINVAL is good enough to check whether the filesystem supports
> > OFD locks or not...
> 
> A kernel not knowing the cmd at all is a somewhat different case (and
> certainly a different code path) than a filesystem not supporting it.
> 
> I just had a look at the kernel code, and to me it seems that the
> difference between POSIX locks and OFD locks is handled entirely in
> filesystem independent code. A filesystem driver would in theory have
> ways to distinguish both, but I don't see any driver in the kernel tree
> that actually does this (and there is probably little reason for a
> driver to do so).
> 
> So now I wonder what filesystem you are using? I'm curious what I
> missed.

I'm using a proprietary filesystem, which isn't in the linux kernel.
The filesystem supports posix lock only, doesn't support OFD lock...

Thanks,
Masa



[PATCH v2] file-posix: Use OFD lock only if the filesystem supports the lock

2020-11-18 Thread Masayoshi Mizuma
From: Masayoshi Mizuma 

locking=auto doesn't work if the filesystem doesn't support OFD lock.
In that situation, following error happens:

  qemu-system-x86_64: -blockdev 
driver=qcow2,node-name=disk,file.driver=file,file.filename=/mnt/guest.qcow2,file.locking=auto:
 Failed to lock byte 100

qemu_probe_lock_ops() judges whether qemu can use OFD lock
or not with doing fcntl(F_OFD_GETLK) to /dev/null. So the
error happens if /dev/null supports OFD lock, but the filesystem
doesn't support the lock.

Lock the actual file, not /dev/null, using F_OFD_SETLK and if that
fails, then fallback to F_SETLK.

Fixes: ca749954b0 ("osdep: Add runtime OFD lock detection")
Signed-off-by: Masayoshi Mizuma 
---
 block/file-posix.c |  56 -
 include/qemu/osdep.h   |   2 +-
 tests/test-image-locking.c |  13 +++-
 util/osdep.c   | 125 +
 4 files changed, 111 insertions(+), 85 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index d5fd1dbcd2..ff8a2804c5 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -584,34 +584,6 @@ static int raw_open_common(BlockDriverState *bs, QDict 
*options,
 s->use_linux_io_uring = (aio == BLOCKDEV_AIO_OPTIONS_IO_URING);
 #endif
 
-locking = qapi_enum_parse(_lookup,
-  qemu_opt_get(opts, "locking"),
-  ON_OFF_AUTO_AUTO, _err);
-if (local_err) {
-error_propagate(errp, local_err);
-ret = -EINVAL;
-goto fail;
-}
-switch (locking) {
-case ON_OFF_AUTO_ON:
-s->use_lock = true;
-if (!qemu_has_ofd_lock()) {
-warn_report("File lock requested but OFD locking syscall is "
-"unavailable, falling back to POSIX file locks");
-error_printf("Due to the implementation, locks can be lost "
- "unexpectedly.\n");
-}
-break;
-case ON_OFF_AUTO_OFF:
-s->use_lock = false;
-break;
-case ON_OFF_AUTO_AUTO:
-s->use_lock = qemu_has_ofd_lock();
-break;
-default:
-abort();
-}
-
 str = qemu_opt_get(opts, "pr-manager");
 if (str) {
 s->pr_mgr = pr_manager_lookup(str, _err);
@@ -641,6 +613,34 @@ static int raw_open_common(BlockDriverState *bs, QDict 
*options,
 }
 s->fd = fd;
 
+locking = qapi_enum_parse(_lookup,
+  qemu_opt_get(opts, "locking"),
+  ON_OFF_AUTO_AUTO, _err);
+if (local_err) {
+error_propagate(errp, local_err);
+ret = -EINVAL;
+goto fail;
+}
+switch (locking) {
+case ON_OFF_AUTO_ON:
+s->use_lock = true;
+if (!qemu_has_ofd_lock(s->fd)) {
+warn_report("File lock requested but OFD locking syscall is "
+"unavailable, falling back to POSIX file locks");
+error_printf("Due to the implementation, locks can be lost "
+ "unexpectedly.\n");
+}
+break;
+case ON_OFF_AUTO_OFF:
+s->use_lock = false;
+break;
+case ON_OFF_AUTO_AUTO:
+s->use_lock = qemu_has_ofd_lock(s->fd);
+break;
+default:
+abort();
+}
+
 /* Check s->open_flags rather than bdrv_flags due to auto-read-only */
 if (s->open_flags & O_RDWR) {
 ret = check_hdev_writable(s->fd);
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index f9ec8c84e9..222138a81a 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -512,7 +512,7 @@ int qemu_dup(int fd);
 int qemu_lock_fd(int fd, int64_t start, int64_t len, bool exclusive);
 int qemu_unlock_fd(int fd, int64_t start, int64_t len);
 int qemu_lock_fd_test(int fd, int64_t start, int64_t len, bool exclusive);
-bool qemu_has_ofd_lock(void);
+bool qemu_has_ofd_lock(int fd);
 #endif
 
 #if defined(__HAIKU__) && defined(__i386__)
diff --git a/tests/test-image-locking.c b/tests/test-image-locking.c
index ba057bd66c..7a28d9d043 100644
--- a/tests/test-image-locking.c
+++ b/tests/test-image-locking.c
@@ -144,14 +144,21 @@ static void test_set_perm_abort(void)
 
 int main(int argc, char **argv)
 {
+int fd;
+
 bdrv_init();
 qemu_init_main_loop(_abort);
 
 g_test_init(, , NULL);
 
-if (qemu_has_ofd_lock()) {
-g_test_add_func("/image-locking/basic", test_image_locking_basic);
-g_test_add_func("/image-locking/set-perm-abort", test_set_perm_abort);
+fd = open("/dev/null", O_RDONLY);
+
+if (fd != -1 && qemu_has_ofd_lock(fd)) {
+if (qemu_has_ofd_lock(fd)) {
+g_test_add_func("/image-locking/basic", test_image_locking_basic);
+g_test_add_func("/image-locking/set-perm-abort",
+  

Re: [PATCH 1/2] file-posix: Use OFD lock only if the filesystem supports the lock

2020-11-18 Thread Masayoshi Mizuma
On Wed, Nov 18, 2020 at 02:10:36PM -0500, Masayoshi Mizuma wrote:
> On Wed, Nov 18, 2020 at 04:42:47PM +0100, Kevin Wolf wrote:
> > Am 06.11.2020 um 05:01 hat Masayoshi Mizuma geschrieben:
> > > From: Masayoshi Mizuma 
> > > 
> > > locking=auto doesn't work if the filesystem doesn't support OFD lock.
> > > In that situation, following error happens:
> > > 
> > >   qemu-system-x86_64: -blockdev 
> > > driver=qcow2,node-name=disk,file.driver=file,file.filename=/mnt/guest.qcow2,file.locking=auto:
> > >  Failed to lock byte 100
> > > 
> > > qemu_probe_lock_ops() judges whether qemu can use OFD lock
> > > or not with doing fcntl(F_OFD_GETLK) to /dev/null. So the
> > > error happens if /dev/null supports OFD lock, but the filesystem
> > > doesn't support the lock.
> > > 
> > > Lock the actual file, not /dev/null, using F_OFD_SETLK and if that
> > > fails, then fallback to F_SETLK.
> > > 
> > > Signed-off-by: Masayoshi Mizuma 
> > 
> > CCing qemu-block, which is the relevant mailing list. You can use the
> > scripts/get_maintainer.pl script to find out who should be CCed on your
> > patches.
> > 
> > As qemu-devel itself is a very high traffic list, it's easy for a patch
> > to get lost if it's only sent there.
> 
> Thank you for letting me know.
> I'll do scripts/get_maintainer.pl to get the mailing list before posting 
> patches.
> 
> > 
> > > diff --git a/util/osdep.c b/util/osdep.c
> > > index 66d01b9160..454e8ef9f4 100644
> > > --- a/util/osdep.c
> > > +++ b/util/osdep.c
> > > @@ -117,9 +117,6 @@ int qemu_mprotect_none(void *addr, size_t size)
> > >  
> > >  #ifndef _WIN32
> > >  
> > > -static int fcntl_op_setlk = -1;
> > > -static int fcntl_op_getlk = -1;
> > > -
> > >  /*
> > >   * Dups an fd and sets the flags
> > >   */
> > > @@ -187,68 +184,87 @@ static int qemu_parse_fdset(const char *param)
> > >  return qemu_parse_fd(param);
> > >  }
> > >  
> > > -static void qemu_probe_lock_ops(void)
> > > +bool qemu_has_ofd_lock(int orig_fd)
> > >  {
> > > -if (fcntl_op_setlk == -1) {
> > >  #ifdef F_OFD_SETLK
> > > -int fd;
> > > -int ret;
> > > -struct flock fl = {
> > > -.l_whence = SEEK_SET,
> > > -.l_start  = 0,
> > > -.l_len= 0,
> > > -.l_type   = F_WRLCK,
> > > -};
> > > -
> > > -fd = open("/dev/null", O_RDWR);
> > > -if (fd < 0) {
> > > +int fd;
> > > +int ret;
> > > +struct flock fl = {
> > > +.l_whence = SEEK_SET,
> > > +.l_start  = 0,
> > > +.l_len= 0,
> > > +.l_type   = F_RDLCK,
> > > +};
> > > +
> > > +fd = qemu_dup(orig_fd);
> > > +if (fd >= 0) {
> > > +ret = fcntl_setfl(fd, O_RDONLY);
> > 
> > I don't understand this part. Why are you trying to reopen the file
> > descriptor read-only? Shouldn't the test work fine with a read-write
> > file descriptor? /dev/null was opened O_RDWR in the old code.
> > 
> > > +if (ret) {
> > >  fprintf(stderr,
> > > -"Failed to open /dev/null for OFD lock probing: 
> > > %s\n",
> > > -strerror(errno));
> > > -fcntl_op_setlk = F_SETLK;
> > > -fcntl_op_getlk = F_GETLK;
> > > -return;
> > > -}
> > > -ret = fcntl(fd, F_OFD_GETLK, );
> > > -close(fd);
> > > -if (!ret) {
> > > -fcntl_op_setlk = F_OFD_SETLK;
> > > -fcntl_op_getlk = F_OFD_GETLK;
> > > -} else {
> > > -fcntl_op_setlk = F_SETLK;
> > > -fcntl_op_getlk = F_GETLK;
> > > +"Failed to fcntl for OFD lock probing.\n");
> > > +qemu_close(fd);
> > > +return false;
> > >  }
> > > +}
> > > +
> > > +ret = fcntl(fd, F_OFD_GETLK, );
> > > +qemu_close(fd);
> > 
> > F_OFD_GETLK doesn't modify the state, so it seems to me that even the
> > qemu_dup() is unnecessary and we could just directly try F_OFD_

Re: [PATCH 1/2] file-posix: Use OFD lock only if the filesystem supports the lock

2020-11-18 Thread Masayoshi Mizuma
On Wed, Nov 18, 2020 at 04:42:47PM +0100, Kevin Wolf wrote:
> Am 06.11.2020 um 05:01 hat Masayoshi Mizuma geschrieben:
> > From: Masayoshi Mizuma 
> > 
> > locking=auto doesn't work if the filesystem doesn't support OFD lock.
> > In that situation, following error happens:
> > 
> >   qemu-system-x86_64: -blockdev 
> > driver=qcow2,node-name=disk,file.driver=file,file.filename=/mnt/guest.qcow2,file.locking=auto:
> >  Failed to lock byte 100
> > 
> > qemu_probe_lock_ops() judges whether qemu can use OFD lock
> > or not with doing fcntl(F_OFD_GETLK) to /dev/null. So the
> > error happens if /dev/null supports OFD lock, but the filesystem
> > doesn't support the lock.
> > 
> > Lock the actual file, not /dev/null, using F_OFD_SETLK and if that
> > fails, then fallback to F_SETLK.
> > 
> > Signed-off-by: Masayoshi Mizuma 
> 
> CCing qemu-block, which is the relevant mailing list. You can use the
> scripts/get_maintainer.pl script to find out who should be CCed on your
> patches.
> 
> As qemu-devel itself is a very high traffic list, it's easy for a patch
> to get lost if it's only sent there.

Thank you for letting me know.
I'll do scripts/get_maintainer.pl to get the mailing list before posting 
patches.

> 
> > diff --git a/util/osdep.c b/util/osdep.c
> > index 66d01b9160..454e8ef9f4 100644
> > --- a/util/osdep.c
> > +++ b/util/osdep.c
> > @@ -117,9 +117,6 @@ int qemu_mprotect_none(void *addr, size_t size)
> >  
> >  #ifndef _WIN32
> >  
> > -static int fcntl_op_setlk = -1;
> > -static int fcntl_op_getlk = -1;
> > -
> >  /*
> >   * Dups an fd and sets the flags
> >   */
> > @@ -187,68 +184,87 @@ static int qemu_parse_fdset(const char *param)
> >  return qemu_parse_fd(param);
> >  }
> >  
> > -static void qemu_probe_lock_ops(void)
> > +bool qemu_has_ofd_lock(int orig_fd)
> >  {
> > -if (fcntl_op_setlk == -1) {
> >  #ifdef F_OFD_SETLK
> > -int fd;
> > -int ret;
> > -struct flock fl = {
> > -.l_whence = SEEK_SET,
> > -.l_start  = 0,
> > -.l_len= 0,
> > -.l_type   = F_WRLCK,
> > -};
> > -
> > -fd = open("/dev/null", O_RDWR);
> > -if (fd < 0) {
> > +int fd;
> > +int ret;
> > +struct flock fl = {
> > +.l_whence = SEEK_SET,
> > +.l_start  = 0,
> > +.l_len= 0,
> > +.l_type   = F_RDLCK,
> > +};
> > +
> > +fd = qemu_dup(orig_fd);
> > +if (fd >= 0) {
> > +ret = fcntl_setfl(fd, O_RDONLY);
> 
> I don't understand this part. Why are you trying to reopen the file
> descriptor read-only? Shouldn't the test work fine with a read-write
> file descriptor? /dev/null was opened O_RDWR in the old code.
> 
> > +if (ret) {
> >  fprintf(stderr,
> > -"Failed to open /dev/null for OFD lock probing: %s\n",
> > -strerror(errno));
> > -fcntl_op_setlk = F_SETLK;
> > -fcntl_op_getlk = F_GETLK;
> > -return;
> > -}
> > -ret = fcntl(fd, F_OFD_GETLK, );
> > -close(fd);
> > -if (!ret) {
> > -fcntl_op_setlk = F_OFD_SETLK;
> > -fcntl_op_getlk = F_OFD_GETLK;
> > -} else {
> > -fcntl_op_setlk = F_SETLK;
> > -fcntl_op_getlk = F_GETLK;
> > +"Failed to fcntl for OFD lock probing.\n");
> > +qemu_close(fd);
> > +return false;
> >  }
> > +}
> > +
> > +ret = fcntl(fd, F_OFD_GETLK, );
> > +qemu_close(fd);
> 
> F_OFD_GETLK doesn't modify the state, so it seems to me that even the
> qemu_dup() is unnecessary and we could just directly try F_OFD_GETLK on
> the passed file descriptor (orig_fd).

OK, I'll change to try F_OFD_GETLK of orig_fd directly.

> 
> > +
> > +if (ret == 0) {
> > +return true;
> > +} else {
> > +return false;
> > +}
> 
> This should be written shorter as return ret == 0;
> 
> >  #else
> > -fcntl_op_setlk = F_SETLK;
> > -fcntl_op_getlk = F_GETLK;
> > +return false;
> >  #endif
> > -}
> >  }
> >  
> > -bool qemu_has_ofd_lock(void)
> > -{
> > -qemu_probe_lock_ops();
> >  #ifdef F_OFD_SETLK
> > -return fcntl_op_setlk == F_OFD_

Re: [PATCH v3 1/6] qemu: Block blockjobs when transient disk option is enabled

2020-09-19 Thread Masayoshi Mizuma
On Fri, Sep 18, 2020 at 02:57:08PM +0200, Peter Krempa wrote:
> On Fri, Sep 18, 2020 at 14:53:53 +0200, Peter Krempa wrote:
> > On Thu, Sep 17, 2020 at 09:30:40 -0400, Masayoshi Mizuma wrote:
> > > From: Masayoshi Mizuma 
> > > 
> > > Signed-off-by: Masayoshi Mizuma 
> > > ---
> > >  src/qemu/qemu_domain.c | 7 +++
> > >  1 file changed, 7 insertions(+)
> > 
> > Reviewed-by: Peter Krempa 
> 
> Actually, I'd prefer some explanation in the commit message. No need to
> re-send for now as I'll probably be fixing some things in other patches
> as well.

Got it, thank you so much for your help!

- Masa



[PATCH v3 5/6] qemu: Introduce to handle transient disk

2020-09-17 Thread Masayoshi Mizuma
From: Masayoshi Mizuma 

Here is the implementation of transient option for qcow2 and raw format
disk. This gets available  directive in domain xml file
like as:


  
  
  
  


When the qemu command line options are built, a new qcow2 image is
created with backing qcow2 by using blockdev-snapshot command.
The backing image is the qcow2 file which is set as .
The filename of the new qcow2 image is original-source-file.TRANSIENT.

Signed-off-by: Masayoshi Mizuma 
---
 src/qemu/qemu_snapshot.c  | 103 +++---
 src/qemu/qemu_snapshot.h  |   5 ++
 src/util/virstoragefile.c |   2 +
 src/util/virstoragefile.h |   4 ++
 4 files changed, 107 insertions(+), 7 deletions(-)

diff --git a/src/qemu/qemu_snapshot.c b/src/qemu/qemu_snapshot.c
index 1e8ea80b22..67fdc488e0 100644
--- a/src/qemu/qemu_snapshot.c
+++ b/src/qemu/qemu_snapshot.c
@@ -1158,7 +1158,8 @@ qemuSnapshotCreateDiskActive(virQEMUDriverPtr driver,
  virHashTablePtr blockNamedNodeData,
  unsigned int flags,
  virQEMUDriverConfigPtr cfg,
- qemuDomainAsyncJob asyncJob)
+ qemuDomainAsyncJob asyncJob,
+ bool domSave)
 {
 qemuDomainObjPrivatePtr priv = vm->privateData;
 g_autoptr(virJSONValue) actions = NULL;
@@ -1201,17 +1202,26 @@ qemuSnapshotCreateDiskActive(virQEMUDriverPtr driver,
 
 virDomainAuditDisk(vm, dd->disk->src, dd->src, "snapshot", rc >= 0);
 
-if (rc == 0)
+if (rc == 0) {
 qemuSnapshotDiskUpdateSource(driver, vm, dd, blockdev);
+
+if (dd->disk->transient) {
+/* Mark the transient working is completed to make sure we can 
*/
+/* remove the transient disk when the guest is shutdown. */
+dd->disk->src->transientEstablished = true;
+}
+}
 }
 
 if (rc < 0)
 goto cleanup;
 
-if (virDomainObjSave(vm, driver->xmlopt, cfg->stateDir) < 0 ||
-(vm->newDef && virDomainDefSave(vm->newDef, driver->xmlopt,
-cfg->configDir) < 0))
-goto cleanup;
+if (domSave) {
+if (virDomainObjSave(vm, driver->xmlopt, cfg->stateDir) < 0 ||
+(vm->newDef && virDomainDefSave(vm->newDef, driver->xmlopt,
+cfg->configDir) < 0))
+goto cleanup;
+}
 
 ret = 0;
 
@@ -1349,7 +1359,7 @@ qemuSnapshotCreateActiveExternal(virQEMUDriverPtr driver,
 
 if ((ret = qemuSnapshotCreateDiskActive(driver, vm, snap,
 blockNamedNodeData, flags, cfg,
-QEMU_ASYNC_JOB_SNAPSHOT)) < 0)
+QEMU_ASYNC_JOB_SNAPSHOT, true)) < 
0)
 goto cleanup;
 
 /* the snapshot is complete now */
@@ -2264,3 +2274,82 @@ qemuSnapshotDelete(virDomainObjPtr vm,
  cleanup:
 return ret;
 }
+
+
+static int
+qemuSnapshotSetupTransientDisk(virDomainSnapshotDiskDefPtr def,
+   virStorageSourcePtr src)
+{
+g_autoptr(virStorageSource) trans = NULL;
+
+if (!(trans = virStorageSourceNew()))
+return -1;
+
+trans->path = g_strdup_printf("%s.TRANSIENT", src->path);
+if (virFileExists(trans->path)) {
+virReportError(VIR_ERR_INVALID_ARG,
+   _("Transient disk '%s' for '%s' exists"),
+   trans->path, src->path);
+return -1;
+}
+
+trans->type = VIR_STORAGE_TYPE_FILE;
+trans->format = VIR_STORAGE_FILE_QCOW2;
+
+def->src = g_steal_pointer();
+
+return 0;
+}
+
+
+int
+qemuSnapshotCreateTransientDisk(virQEMUDriverPtr driver,
+virDomainObjPtr vm,
+int asyncJob)
+{
+int rc;
+size_t i;
+virDomainMomentObjPtr snap = NULL;
+g_autoptr(virDomainSnapshotDef) snapdef = NULL;
+g_autoptr(virHashTable) blockNamedNodeData = NULL;
+qemuDomainObjPrivatePtr priv = vm->privateData;
+g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(priv->driver);
+
+if (!(snapdef = virDomainSnapshotDefNew()))
+return -1;
+
+snapdef->parent.name = g_strdup_printf("transient");
+
+snapdef->ndisks = vm->def->ndisks;
+if (VIR_ALLOC_N(snapdef->disks, snapdef->ndisks) < 0)
+return -1;
+
+for (i = 0; i < vm->def->ndisks; i++) {
+virDomainDiskDefPtr disk = vm->def->disks[i];
+virDomainSnapshotDiskDefPtr snapdisk = &(snapdef->disks[i]);
+
+if (disk->transient) {
+if ((rc = qemuSnapshotSetupTransientDisk(snapdisk, disk->src)) < 0

[PATCH v3 0/6] qemu: implementation of transient disk option

2020-09-17 Thread Masayoshi Mizuma
This patchset tries to implement transient option for qcow2 and raw
format disk.

It gets user available to set  to the domain xml file like as:


  
  
  
  


Any changes which the Guest does to the disk is dropped when the Guest
is shutdowned.

There are some limitations for transient disk option so far:

- Supported disk format is qcow2 and raw
- blockdev capability is required for qemu
- Following features are blocked with transient disk option
  - blockjobs 
  - Migration
  - Disk hotplug

Masayoshi Mizuma (6):
  qemu: Block blockjobs when transient disk option is enabled
  qemu: Block disk hotplug when transient disk option is enabled
  qemu: Block migration when transient disk option is enabled
  qemu: update the validation for transient disk option
  qemu: Introduce to handle transient disk
  qemu: Add transient disk handler to start and stop the guest

 src/qemu/qemu_domain.c|   7 +++
 src/qemu/qemu_hotplug.c   |   6 +++
 src/qemu/qemu_migration.c |  10 
 src/qemu/qemu_process.c   |  10 
 src/qemu/qemu_snapshot.c  | 103 +++---
 src/qemu/qemu_snapshot.h  |   5 ++
 src/qemu/qemu_validate.c  |  25 -
 src/util/virstoragefile.c |   2 +
 src/util/virstoragefile.h |   4 ++
 9 files changed, 164 insertions(+), 8 deletions(-)

-- 
2.27.0




[PATCH v3 3/6] qemu: Block migration when transient disk option is enabled

2020-09-17 Thread Masayoshi Mizuma
From: Masayoshi Mizuma 

Block migration when transient disk option is enabled because migration
requires some blockjobs.

Signed-off-by: Masayoshi Mizuma 
---
 src/qemu/qemu_migration.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
index a530c17582..7316d74677 100644
--- a/src/qemu/qemu_migration.c
+++ b/src/qemu/qemu_migration.c
@@ -1397,6 +1397,16 @@ qemuMigrationSrcIsAllowed(virQEMUDriverPtr driver,
_("cannot migrate this domain without dbus-vmstate 
support"));
 return false;
 }
+
+for (i = 0; i < vm->def->ndisks; i++) {
+virDomainDiskDefPtr disk = vm->def->disks[i];
+
+if (disk->transient) {
+virReportError(VIR_ERR_OPERATION_INVALID, "%s",
+   _("migration with transient disk is not 
supported"));
+return false;
+}
+}
 }
 
 return true;
-- 
2.27.0




[PATCH v3 4/6] qemu: update the validation for transient disk option

2020-09-17 Thread Masayoshi Mizuma
From: Masayoshi Mizuma 

Update validation of transient disk option. The option for qemu is supported
with under condistions.

- qemu has blockdev feature 
- the type is file and the format is qcow2 and raw
- writable disk

Signed-off-by: Masayoshi Mizuma 
---
 src/qemu/qemu_validate.c | 25 -
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/src/qemu/qemu_validate.c b/src/qemu/qemu_validate.c
index 070f1c962b..9cf78ca0c9 100644
--- a/src/qemu/qemu_validate.c
+++ b/src/qemu/qemu_validate.c
@@ -2002,6 +2002,28 @@ qemuValidateDomainDeviceDefDiskSerial(const char *value)
 }
 
 
+static int
+qemuValidateDomainDeviceDefDiskTransient(const virDomainDiskDef *disk,
+ virQEMUCapsPtr qemuCaps)
+{
+if ((!qemuCaps) || !virQEMUCapsGet(qemuCaps, QEMU_CAPS_BLOCKDEV))
+return false;
+
+if (disk->src->readonly)
+return false;
+
+if ((disk->src->format != VIR_STORAGE_FILE_QCOW2) &&
+(disk->src->format != VIR_STORAGE_FILE_RAW) &&
+(disk->src->type != VIR_STORAGE_TYPE_FILE)) {
+return false;
+}
+
+if (virStorageSourceIsEmpty(disk->src))
+return false;
+
+return true;
+}
+
 static int
 qemuValidateDomainDeviceDefDiskFrontend(const virDomainDiskDef *disk,
 virQEMUCapsPtr qemuCaps)
@@ -2186,7 +2208,8 @@ qemuValidateDomainDeviceDefDiskFrontend(const 
virDomainDiskDef *disk,
 }
 }
 
-if (disk->transient) {
+if ((disk->transient) &&
+!qemuValidateDomainDeviceDefDiskTransient(disk, qemuCaps)) {
 virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
_("transient disks not supported yet"));
 return -1;
-- 
2.27.0




[PATCH v3 2/6] qemu: Block disk hotplug when transient disk option is enabled

2020-09-17 Thread Masayoshi Mizuma
From: Masayoshi Mizuma 

Signed-off-by: Masayoshi Mizuma 
---
 src/qemu/qemu_hotplug.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c
index e2c6e14c2e..20dcec7b65 100644
--- a/src/qemu/qemu_hotplug.c
+++ b/src/qemu/qemu_hotplug.c
@@ -1031,6 +1031,12 @@ qemuDomainAttachDeviceDiskLiveInternal(virQEMUDriverPtr 
driver,
 return -1;
 }
 
+if (disk->transient) {
+virReportError(VIR_ERR_OPERATION_UNSUPPORTED, "%s",
+   _("transient disk hotplug isn't supported"));
+return -1;
+}
+
 if (virDomainDiskTranslateSourcePool(disk) < 0)
 goto cleanup;
 
-- 
2.27.0




[PATCH v3 6/6] qemu: Add transient disk handler to start and stop the guest

2020-09-17 Thread Masayoshi Mizuma
From: Masayoshi Mizuma 

The transient disk is attached before the guest starts.
Remove the transient disk when the guest does shutdown.

Signed-off-by: Masayoshi Mizuma 
---
 src/qemu/qemu_process.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index b1af35b933..387b8071f4 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -60,6 +60,7 @@
 #include "qemu_firmware.h"
 #include "qemu_backup.h"
 #include "qemu_dbus.h"
+#include "qemu_snapshot.h"
 
 #include "cpu/cpu.h"
 #include "cpu/cpu_x86.h"
@@ -7053,6 +7054,10 @@ qemuProcessLaunch(virConnectPtr conn,
 qemuProcessAutoDestroyAdd(driver, vm, conn) < 0)
 goto cleanup;
 
+VIR_DEBUG("Setting up transient disk");
+if (qemuSnapshotCreateTransientDisk(driver, vm, asyncJob) < 0)
+goto cleanup;
+
 ret = 0;
 
  cleanup:
@@ -7689,6 +7694,11 @@ void qemuProcessStop(virQEMUDriverPtr driver,
 }
 
 qemuBlockRemoveImageMetadata(driver, vm, disk->dst, disk->src);
+
+if ((disk->transient) && (disk->src->transientEstablished)) {
+VIR_DEBUG("unlink transient disk: %s", disk->src->path);
+unlink(disk->src->path);
+}
 }
 }
 
-- 
2.27.0




[PATCH v3 1/6] qemu: Block blockjobs when transient disk option is enabled

2020-09-17 Thread Masayoshi Mizuma
From: Masayoshi Mizuma 

Signed-off-by: Masayoshi Mizuma 
---
 src/qemu/qemu_domain.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 785cee6f18..1ce0f70971 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -10794,6 +10794,13 @@ qemuDomainDiskBlockJobIsSupported(virDomainObjPtr vm,
 return false;
 }
 
+if (disk->transient) {
+virReportError(VIR_ERR_OPERATION_UNSUPPORTED,
+   _("block jobs are not supported on transient disk 
'%s'"),
+   disk->dst);
+return false;
+}
+
 return true;
 }
 
-- 
2.27.0




Re: [PATCH v2 5/7] qemu: Block migration when transient disk option is enabled

2020-09-13 Thread Masayoshi Mizuma
On Tue, Sep 08, 2020 at 03:17:47PM +0200, Peter Krempa wrote:
> On Fri, Aug 28, 2020 at 10:08:35 -0400, Masayoshi Mizuma wrote:
> > From: Masayoshi Mizuma 
> > 
> > Block migration when transient disk option is enabled because migration
> > requires some blockjobs.
> > 
> > Signed-off-by: Masayoshi Mizuma 
> > ---
> >  src/qemu/qemu_migration.c | 22 ++
> >  1 file changed, 22 insertions(+)
> > 
> > diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
> > index 0f2f92b211..6fcf5a3a07 100644
> > --- a/src/qemu/qemu_migration.c
> > +++ b/src/qemu/qemu_migration.c
> > @@ -2949,6 +2949,22 @@ qemuMigrationDstPrepareDirect(virQEMUDriverPtr 
> > driver,
> >  }
> >  
> >  
> > +static bool
> > +qemuMigrationTransientDiskExists(virDomainDefPtr def)
> > +{
> > +size_t i;
> > +
> > +for (i = 0; i < def->ndisks; i++) {
> > +   virDomainDiskDefPtr disk = def->disks[i];
> > +
> > +   if (disk->transient)
> > +return true;
> > +}
> > +
> > +return false;
> > +}
> > +
> > +
> >  virDomainDefPtr
> >  qemuMigrationAnyPrepareDef(virQEMUDriverPtr driver,
> > virQEMUCapsPtr qemuCaps,
> > @@ -2971,6 +2987,12 @@ qemuMigrationAnyPrepareDef(virQEMUDriverPtr driver,
> >  
> > VIR_DOMAIN_DEF_PARSE_SKIP_VALIDATE)))
> >  goto cleanup;
> >  
> > +/*
> > + * transient disk option is a blocker for migration
> > + */
> > +if (qemuMigrationTransientDiskExists(def))
> > +   goto cleanup;
> 
> This should really be placed into qemuMigrationSrcIsAllowed() rather
> than open-coded. Migration is used in other places too and doesn't use
> this API to trigger it.

OK, I'll add the following to block the migration.

diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
index a530c17582..7316d74677 100644
--- a/src/qemu/qemu_migration.c
+++ b/src/qemu/qemu_migration.c
@@ -1397,6 +1397,16 @@ qemuMigrationSrcIsAllowed(virQEMUDriverPtr driver,
_("cannot migrate this domain without dbus-vmstate 
support"));
 return false;
 }
+
+for (i = 0; i < vm->def->ndisks; i++) {
+virDomainDiskDefPtr disk = vm->def->disks[i];
+
+if (disk->transient) {
+virReportError(VIR_ERR_OPERATION_INVALID, "%s",
+   _("migration with transient disk is not 
supported"));
+return false;
+}
+}
 }
 
 return true;



Re: [PATCH v2 4/7] qemu: Transient option gets avaiable for qcow2 and raw format disk

2020-09-13 Thread Masayoshi Mizuma
On Tue, Sep 08, 2020 at 03:14:42PM +0200, Peter Krempa wrote:
> On Fri, Aug 28, 2020 at 10:08:33 -0400, Masayoshi Mizuma wrote:
> > From: Masayoshi Mizuma 
> > 
> > Signed-off-by: Masayoshi Mizuma 
> > ---
> >  src/qemu/qemu_validate.c | 9 ++---
> >  1 file changed, 6 insertions(+), 3 deletions(-)
> > 
> > diff --git a/src/qemu/qemu_validate.c b/src/qemu/qemu_validate.c
> > index 488f258d00..82818a4fdc 100644
> > --- a/src/qemu/qemu_validate.c
> > +++ b/src/qemu/qemu_validate.c
> > @@ -2166,9 +2166,12 @@ qemuValidateDomainDeviceDefDiskFrontend(const 
> > virDomainDiskDef *disk,
> >  }
> >  
> >  if (disk->transient) {
> > -virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
> > -   _("transient disks not supported yet"));
> > -return -1;
> > +if ((disk->src->format != VIR_STORAGE_FILE_QCOW2) &&
> > +(disk->src->format != VIR_STORAGE_FILE_RAW)) {
> 
> This needs a check that QEMU_CAPS_BLOCKDEV is available as it won't
> really work properly in pre-blockdev config. Also here you
> should reject any other unsupported configuration rather than in the
> code.

Thanks. I'll add the following validation.

diff --git a/src/qemu/qemu_validate.c b/src/qemu/qemu_validate.c
index 070f1c962b..9cf78ca0c9 100644
--- a/src/qemu/qemu_validate.c
+++ b/src/qemu/qemu_validate.c
@@ -2002,6 +2002,28 @@ qemuValidateDomainDeviceDefDiskSerial(const char *value)
 }
 
 
+static int
+qemuValidateDomainDeviceDefDiskTransient(const virDomainDiskDef *disk,
+ virQEMUCapsPtr qemuCaps)
+{
+if ((!qemuCaps) || !virQEMUCapsGet(qemuCaps, QEMU_CAPS_BLOCKDEV))
+return false;
+
+if (disk->src->readonly)
+return false;
+
+if ((disk->src->format != VIR_STORAGE_FILE_QCOW2) &&
+(disk->src->format != VIR_STORAGE_FILE_RAW) &&
+(disk->src->type != VIR_STORAGE_TYPE_FILE)) {
+return false;
+}
+
+if (virStorageSourceIsEmpty(disk->src))
+return false;
+
+return true;
+}
+
 static int
 qemuValidateDomainDeviceDefDiskFrontend(const virDomainDiskDef *disk,
 virQEMUCapsPtr qemuCaps)
@@ -2186,7 +2208,8 @@ qemuValidateDomainDeviceDefDiskFrontend(const 
virDomainDiskDef *disk,
 }
 }
 
-if (disk->transient) {
+if ((disk->transient) &&
+!qemuValidateDomainDeviceDefDiskTransient(disk, qemuCaps)) {
 virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
_("transient disks not supported yet"));
 return -1;




[PATCH v2 5/7] qemu: Block migration when transient disk option is enabled

2020-08-28 Thread Masayoshi Mizuma
From: Masayoshi Mizuma 

Block migration when transient disk option is enabled because migration
requires some blockjobs.

Signed-off-by: Masayoshi Mizuma 
---
 src/qemu/qemu_migration.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
index 0f2f92b211..6fcf5a3a07 100644
--- a/src/qemu/qemu_migration.c
+++ b/src/qemu/qemu_migration.c
@@ -2949,6 +2949,22 @@ qemuMigrationDstPrepareDirect(virQEMUDriverPtr driver,
 }
 
 
+static bool
+qemuMigrationTransientDiskExists(virDomainDefPtr def)
+{
+size_t i;
+
+for (i = 0; i < def->ndisks; i++) {
+   virDomainDiskDefPtr disk = def->disks[i];
+
+   if (disk->transient)
+return true;
+}
+
+return false;
+}
+
+
 virDomainDefPtr
 qemuMigrationAnyPrepareDef(virQEMUDriverPtr driver,
virQEMUCapsPtr qemuCaps,
@@ -2971,6 +2987,12 @@ qemuMigrationAnyPrepareDef(virQEMUDriverPtr driver,
 VIR_DOMAIN_DEF_PARSE_SKIP_VALIDATE)))
 goto cleanup;
 
+/*
+ * transient disk option is a blocker for migration
+ */
+if (qemuMigrationTransientDiskExists(def))
+   goto cleanup;
+
 if (dname) {
 name = def->name;
 def->name = g_strdup(dname);
-- 
2.27.0




[PATCH v2 6/7] qemu: Block disk hotplug when transient disk option is enabled

2020-08-28 Thread Masayoshi Mizuma
From: Masayoshi Mizuma 

Block disk hotplug when transient disk option is enabled so far.

Signed-off-by: Masayoshi Mizuma 
---
 src/qemu/qemu_hotplug.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c
index 2c6c30ce03..1c1b6c3acf 100644
--- a/src/qemu/qemu_hotplug.c
+++ b/src/qemu/qemu_hotplug.c
@@ -1031,6 +1031,12 @@ qemuDomainAttachDeviceDiskLiveInternal(virQEMUDriverPtr 
driver,
 return -1;
 }
 
+if (disk->transient) {
+virReportError(VIR_ERR_OPERATION_UNSUPPORTED, "%s",
+   _("transient disk hotplug isn't supported"));
+return -1;
+}
+
 if (virDomainDiskTranslateSourcePool(disk) < 0)
 goto cleanup;
 
-- 
2.27.0




[PATCH v2 7/7] qemu: Block blockjobs when transient disk option is enabled

2020-08-28 Thread Masayoshi Mizuma
From: Masayoshi Mizuma 

Block blockjobs when transient disk option is enabled so far.

Signed-off-by: Masayoshi Mizuma 
---
 src/qemu/qemu_domain.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index e28f704dba..98a52e5476 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -10678,6 +10678,13 @@ qemuDomainDiskBlockJobIsSupported(virDomainObjPtr vm,
 return false;
 }
 
+if (disk->transient) {
+virReportError(VIR_ERR_OPERATION_UNSUPPORTED,
+   _("block jobs are not supported on transient disk 
'%s'"),
+   disk->dst);
+return false;
+}
+
 return true;
 }
 
-- 
2.27.0




[PATCH v2 1/7] qemuSnapshotDiskPrepareOne: Get available even if snapdisk is NULL

2020-08-28 Thread Masayoshi Mizuma
From: Masayoshi Mizuma 

Get available even if snapdisk argument is NULL at qemuSnapshotDiskPrepareOne()
so that the caller can setup dd->src.

Signed-off-by: Masayoshi Mizuma 
---
 src/qemu/qemu_snapshot.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/qemu/qemu_snapshot.c b/src/qemu/qemu_snapshot.c
index 1e8ea80b22..d310e6ff02 100644
--- a/src/qemu/qemu_snapshot.c
+++ b/src/qemu/qemu_snapshot.c
@@ -953,8 +953,9 @@ qemuSnapshotDiskPrepareOne(virQEMUDriverPtr driver,
 if (qemuDomainStorageSourceValidateDepth(disk->src, 1, disk->dst) < 0)
 return -1;
 
-if (!(dd->src = virStorageSourceCopy(snapdisk->src, false)))
-return -1;
+if (snapdisk)
+if (!(dd->src = virStorageSourceCopy(snapdisk->src, false)))
+return -1;
 
 if (virStorageSourceInitChainElement(dd->src, dd->disk->src, false) < 0)
 return -1;
-- 
2.27.0




[PATCH v2 0/7] qemu: implementation of transient disk option

2020-08-28 Thread Masayoshi Mizuma
This patchset tries to implement transient option for qcow2 and raw
format disk. This uses the snapshot cleanup codes:
https://www.redhat.com/archives/libvir-list/2020-August/msg00299.html

It gets user available to set  to the domain xml file like as:


  
  
  
  


Any changes which the Guest does to the disk is dropped when the Guest
is shutdowned.

There are some limitations for transient disk option so far:

- Supported disk format is qcow2 and raw
- blockdev capability is required for qemu
- Following features are blocked with transient disk option
  - blockjobs 
  - Migration
  - Disk hotplug

Masayoshi Mizuma (7):
  qemuSnapshotDiskPrepareOne: Get available even if snapdisk is NULL
  qemu: Introduce functions to handle transient disk
  qemu: Add transient disk handler to start and stop the guest
  qemu: Transient option gets avaiable for qcow2 and raw format disk
  qemu: Block blockjobs when transient disk option is enabled
  qemu: Block migration when transient disk option is enabled
  qemu: Block disk hotplug when transient disk option is enabled

 src/qemu/qemu_domain.c|   7 ++
 src/qemu/qemu_hotplug.c   |   6 ++
 src/qemu/qemu_migration.c |  22 ++
 src/qemu/qemu_process.c   |   8 +++
 src/qemu/qemu_snapshot.c  | 139 +-
 src/qemu/qemu_snapshot.h  |   8 +++
 src/qemu/qemu_validate.c  |   9 ++-
 src/util/virstoragefile.h |   2 +
 8 files changed, 196 insertions(+), 5 deletions(-)

-- 
2.27.0




[PATCH v2 3/7] qemu: Add transient disk handler to start and stop the guest

2020-08-28 Thread Masayoshi Mizuma
From: Masayoshi Mizuma 

The transient disk is attached before the guest starts.
Remove the transient disk when the guest does shutdown.

Signed-off-by: Masayoshi Mizuma 
---
 src/qemu/qemu_process.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 126fabf5ef..5753258135 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -60,6 +60,7 @@
 #include "qemu_firmware.h"
 #include "qemu_backup.h"
 #include "qemu_dbus.h"
+#include "qemu_snapshot.h"
 
 #include "cpu/cpu.h"
 #include "cpu/cpu_x86.h"
@@ -7000,6 +7001,10 @@ qemuProcessLaunch(virConnectPtr conn,
 qemuProcessAutoDestroyAdd(driver, vm, conn) < 0)
 goto cleanup;
 
+VIR_DEBUG("Setting up transient disk");
+if (qemuTransientCreatetDisk(driver, vm, asyncJob) < 0)
+goto cleanup;
+
 ret = 0;
 
  cleanup:
@@ -7636,6 +7641,9 @@ void qemuProcessStop(virQEMUDriverPtr driver,
 }
 
 qemuBlockRemoveImageMetadata(driver, vm, disk->dst, disk->src);
+
+if (disk->transient)
+qemuTransientRemoveDisk(disk);
 }
 }
 
-- 
2.27.0




[PATCH v2 2/7] qemu: Introduce functions to handle transient disk

2020-08-28 Thread Masayoshi Mizuma
From: Masayoshi Mizuma 

Here is the implementation of transient option for qcow2 and raw format
disk. This gets available  directive in domain xml file
like as:


  
  
  
  


When the qemu command line options are built, a new qcow2 image is
created with backing qcow2 by using blockdev-snapshot command.
The backing image is the qcow2 file which is set as .
The filename of the new qcow2 image is original-source-file.TRANSIENT.

Signed-off-by: Masayoshi Mizuma 
---
 src/qemu/qemu_snapshot.c  | 134 ++
 src/qemu/qemu_snapshot.h  |   8 +++
 src/util/virstoragefile.h |   2 +
 3 files changed, 144 insertions(+)

diff --git a/src/qemu/qemu_snapshot.c b/src/qemu/qemu_snapshot.c
index d310e6ff02..5c61d19f26 100644
--- a/src/qemu/qemu_snapshot.c
+++ b/src/qemu/qemu_snapshot.c
@@ -2265,3 +2265,137 @@ qemuSnapshotDelete(virDomainObjPtr vm,
  cleanup:
 return ret;
 }
+
+static int
+qemuTransientDiskPrepareOne(virQEMUDriverPtr driver,
+virDomainObjPtr vm,
+qemuSnapshotDiskDataPtr data,
+virHashTablePtr blockNamedNodeData,
+int asyncJob,
+virJSONValuePtr actions)
+{
+int rc = -1;
+virStorageSourcePtr dest;
+virStorageSourcePtr src = data->disk->src;
+qemuDomainObjPrivatePtr priv = vm->privateData;
+g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(priv->driver);
+
+if (!strlen(src->path))
+return rc;
+
+if (!(dest = virStorageSourceNew()))
+return rc;
+
+dest->path = g_strdup_printf("%s.TRANSIENT", src->path);
+
+if (virFileExists(dest->path)) {
+virReportError(VIR_ERR_INVALID_ARG,
+   _("Transient disk '%s' for '%s' exists"),
+   dest->path, src->path);
+goto cleanup;
+}
+
+dest->type = VIR_STORAGE_TYPE_FILE;
+dest->format = VIR_STORAGE_FILE_QCOW2;
+data->src = dest;
+
+if (qemuSnapshotDiskPrepareOne(driver, vm, cfg, data->disk,
+ NULL, data, blockNamedNodeData,
+ false, true, asyncJob, actions) < 0)
+goto cleanup;
+
+rc = 0;
+ cleanup:
+if (rc < 0)
+g_free(dest->path);
+
+return rc;
+}
+
+static int
+qemuWaitTransaction(virQEMUDriverPtr driver,
+virDomainObjPtr vm,
+int asyncJob,
+virJSONValuePtr *actions)
+{
+qemuDomainObjPrivatePtr priv = vm->privateData;
+
+if (qemuDomainObjEnterMonitorAsync(driver, vm, asyncJob) < 0)
+return -1;
+
+if (qemuMonitorTransaction(priv->mon, actions) < 0)
+return -1;
+
+if (qemuDomainObjExitMonitor(driver, vm) < 0)
+return -1;
+
+return 0;
+}
+
+int
+qemuTransientCreatetDisk(virQEMUDriverPtr driver,
+ virDomainObjPtr vm,
+ int asyncJob)
+{
+size_t i;
+int rc = -1;
+size_t ndata = 0;
+qemuSnapshotDiskDataPtr data = NULL;
+g_autoptr(virJSONValue) actions = NULL;
+qemuDomainObjPrivatePtr priv = vm->privateData;
+g_autoptr(virHashTable) blockNamedNodeData = NULL;
+bool blockdev =  virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_BLOCKDEV);
+
+if (!blockdev)
+return rc;
+
+if (VIR_ALLOC_N(data, vm->def->ndisks) < 0)
+return rc;
+
+if (!(blockNamedNodeData = qemuBlockGetNamedNodeData(vm, asyncJob)))
+goto cleanup;
+
+if (!(actions = virJSONValueNewArray()))
+goto cleanup;
+
+for (i = 0; i < vm->def->ndisks; i++) {
+virDomainDiskDefPtr disk = vm->def->disks[i];
+
+if (disk->src->readonly)
+continue;
+
+if (disk->transient) {
+data[ndata].disk = disk;
+if (qemuTransientDiskPrepareOne(driver, vm, [ndata], 
blockNamedNodeData,
+ asyncJob, actions) < 0)
+goto cleanup;
+ndata++;
+}
+}
+
+if (qemuWaitTransaction(driver, vm, asyncJob, ) < 0)
+goto cleanup;
+
+for (i = 0; i < ndata; i++) {
+qemuSnapshotDiskDataPtr dd = [i];
+
+qemuSnapshotDiskUpdateSource(driver, vm, dd, blockdev);
+dd->disk->src->transientEstablished = true;
+}
+
+VIR_FREE(data);
+rc = 0;
+ cleanup:
+qemuSnapshotDiskCleanup(data, vm->def->ndisks, driver, vm, asyncJob);
+
+return rc;
+}
+
+void
+qemuTransientRemoveDisk(virDomainDiskDefPtr disk)
+{
+if (disk->src->transientEstablished) {
+VIR_DEBUG("unlink transient disk: %s", disk->src->path);
+unlink(disk->src->path);
+}
+}
diff --git a/src/qemu/qemu_snapshot.h b/src/qemu/qemu_snapshot.h
index 8b3ebe87b1..aecb17

[PATCH v2 4/7] qemu: Transient option gets avaiable for qcow2 and raw format disk

2020-08-28 Thread Masayoshi Mizuma
From: Masayoshi Mizuma 

Signed-off-by: Masayoshi Mizuma 
---
 src/qemu/qemu_validate.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/src/qemu/qemu_validate.c b/src/qemu/qemu_validate.c
index 488f258d00..82818a4fdc 100644
--- a/src/qemu/qemu_validate.c
+++ b/src/qemu/qemu_validate.c
@@ -2166,9 +2166,12 @@ qemuValidateDomainDeviceDefDiskFrontend(const 
virDomainDiskDef *disk,
 }
 
 if (disk->transient) {
-virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
-   _("transient disks not supported yet"));
-return -1;
+if ((disk->src->format != VIR_STORAGE_FILE_QCOW2) &&
+(disk->src->format != VIR_STORAGE_FILE_RAW)) {
+virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+   _("transient disks not supported yet"));
+return -1;
+}
 }
 
 if (disk->iomode == VIR_DOMAIN_DISK_IO_NATIVE &&
-- 
2.27.0




Re: [PATCH 1/3] qemu: implementation of transient option for qcow2 file

2020-08-06 Thread Masayoshi Mizuma
On Thu, Aug 06, 2020 at 12:09:20PM +0200, Peter Krempa wrote:
> On Sun, Jul 19, 2020 at 22:56:50 -0400, Masayoshi Mizuma wrote:
> > On Sat, Jul 18, 2020 at 08:06:00AM +0200, Peter Krempa wrote:
> > > On Thu, Jul 16, 2020 at 20:55:29 -0400, Masayoshi Mizuma wrote:
> > > > Thank you for your review.
> > > > 
> > > > On Tue, Jul 07, 2020 at 06:36:23AM -0500, Eric Blake wrote:
> > > > > On 7/7/20 2:12 AM, Peter Krempa wrote:
> > > > > 
> > > > > > You can install a qcow2 overlay on top of a raw file too. IMO the
> > > > > > implications of using  allow that.
> > > > > > 
> > > > > > As said above I'd strongly prefer if the overlay is created in qemu
> > > > > > using the blockdev-create blockjob (there is already infrastructure 
> > > > > > in
> > > > > > libvirt to achieve that).
> > > > > 
> > > > > Agreed.  At this point, any time we call out to qemu-img as a separate
> > > > > process, we are probably doing it wrong.
> > > > 
> > > > Got it. I'm thinking about the procedure such as followings.
> > > > Does that make sense?
> > > > 
> > > >   1) Open the monitor with qemuProcessQMPNew()/qemuProcessQMPStart(), 
> > > >  and connect it.
> > > 
> > > Starting a new qemu process just to format an image is extreme overkill
> > > and definitely not what we want to do.
> > 
> > I see, thanks.
> > 
> > > 
> > > >   2) Setup the transient disk with 
> > > > qemuDomainPrepareStorageSourceBlockdev(),
> > > >  qemuBlockStorageSourceAttachApplyStorage(), 
> > > > qemuBlockStorageSourceCreateGetFormatProps()
> > > >  and something...
> > > > 
> > > >   3) Run blockdev-create command with qemuMonitorBlockdevCreate(), then
> > > >  close the monitor.
> > > 
> > > These two steps should be exectued in the qemu process which already
> > > will run the VM prior to starting the guest CPUs.
> > > 
> > > >   4) Switch the original disk to the transient disk.
> > > > 
> > > >   5) Build the blockdev argument for qemu.
> > > 
> > > And instead of this step, you use the external snapshot infrastructure
> > > to install the overlays via 'blockdev-snapshot' QMP command
> > 
> > OK. I suppose qemuDomainSnapshotDiskPrepare() and
> > qemuDomainSnapshotDiskUpdateSource() maybe helpful to implement the
> > setup steps of transient disk.
> > 
> > > 
> > > > 
> > > >   6) Run qemu
> > > 
> > > And instead of this the VM cpus will be started.
> > 
> > Got it, I think the appropriate place is after virCommandRun() at 
> > qemuProcessLaunch(),
> > and before qemuProcessFinishStartup().
> > 
> > > 
> > > 
> > > The above steps require factoring out snapshot code a bit. I have a few
> > > patches in that direction so I'll try posting them next week hopefully.
> 
> Sorry this took longer than expected, but we were dealing with the build
> system change.
> 
> The refactor is here:
> 
> https://www.redhat.com/archives/libvir-list/2020-August/msg00299.html
> 
> You can now create an equivalent of 'qemuSnapshotDiskPrepare' which will
> go through the disks and find the 'transient' ones. It will then create
> snapshot data by a call to 'qemuSnapshotDiskPrepareOne' with a faked
> snapshot disk object.
> 
> 'qemuSnapshotDiskPrepareOne' ensures that the files are created and
> formatted properly.
> 
> You then use same algorithm as 'qemuSnapshotCreateDiskActive'
> (e.g. by extracting the common internals (basically everything except
> call to 'qemuSnapshotDiskPrepare') into a separate function) and reuse
> it when starting the VM as you've described above.
> 
> Note that all of the above can work only when QEMU_CAPS_BLOCKDEV is
> supported.
> 
> The caveats/limitations with blockjobs and snapshots still apply though.
> It depends on how you approach it. It's okay to limit/block the features
> if transient disk is used. Alternatively you can implement some form of
> private data to mark which image was transient and allow those
> operations.

Thank you for the helpful advice and the patches!
I'll try to implement the transient disk procedure with the refactor.

Thanks!
Masa



Re: [PATCH 1/3] qemu: implementation of transient option for qcow2 file

2020-07-19 Thread Masayoshi Mizuma
On Sat, Jul 18, 2020 at 08:06:00AM +0200, Peter Krempa wrote:
> On Thu, Jul 16, 2020 at 20:55:29 -0400, Masayoshi Mizuma wrote:
> > Thank you for your review.
> > 
> > On Tue, Jul 07, 2020 at 06:36:23AM -0500, Eric Blake wrote:
> > > On 7/7/20 2:12 AM, Peter Krempa wrote:
> > > 
> > > > You can install a qcow2 overlay on top of a raw file too. IMO the
> > > > implications of using  allow that.
> > > > 
> > > > As said above I'd strongly prefer if the overlay is created in qemu
> > > > using the blockdev-create blockjob (there is already infrastructure in
> > > > libvirt to achieve that).
> > > 
> > > Agreed.  At this point, any time we call out to qemu-img as a separate
> > > process, we are probably doing it wrong.
> > 
> > Got it. I'm thinking about the procedure such as followings.
> > Does that make sense?
> > 
> >   1) Open the monitor with qemuProcessQMPNew()/qemuProcessQMPStart(), 
> >  and connect it.
> 
> Starting a new qemu process just to format an image is extreme overkill
> and definitely not what we want to do.

I see, thanks.

> 
> >   2) Setup the transient disk with qemuDomainPrepareStorageSourceBlockdev(),
> >  qemuBlockStorageSourceAttachApplyStorage(), 
> > qemuBlockStorageSourceCreateGetFormatProps()
> >  and something...
> > 
> >   3) Run blockdev-create command with qemuMonitorBlockdevCreate(), then
> >  close the monitor.
> 
> These two steps should be exectued in the qemu process which already
> will run the VM prior to starting the guest CPUs.
> 
> >   4) Switch the original disk to the transient disk.
> > 
> >   5) Build the blockdev argument for qemu.
> 
> And instead of this step, you use the external snapshot infrastructure
> to install the overlays via 'blockdev-snapshot' QMP command

OK. I suppose qemuDomainSnapshotDiskPrepare() and
qemuDomainSnapshotDiskUpdateSource() maybe helpful to implement the
setup steps of transient disk.

> 
> > 
> >   6) Run qemu
> 
> And instead of this the VM cpus will be started.

Got it, I think the appropriate place is after virCommandRun() at 
qemuProcessLaunch(),
and before qemuProcessFinishStartup().

> 
> 
> The above steps require factoring out snapshot code a bit. I have a few
> patches in that direction so I'll try posting them next week hopefully.

Great, thanks!

- Masa



Re: [PATCH 1/3] qemu: implementation of transient option for qcow2 file

2020-07-16 Thread Masayoshi Mizuma
Thank you for your review.

On Tue, Jul 07, 2020 at 06:36:23AM -0500, Eric Blake wrote:
> On 7/7/20 2:12 AM, Peter Krempa wrote:
> 
> > You can install a qcow2 overlay on top of a raw file too. IMO the
> > implications of using  allow that.
> > 
> > As said above I'd strongly prefer if the overlay is created in qemu
> > using the blockdev-create blockjob (there is already infrastructure in
> > libvirt to achieve that).
> 
> Agreed.  At this point, any time we call out to qemu-img as a separate
> process, we are probably doing it wrong.

Got it. I'm thinking about the procedure such as followings.
Does that make sense?

  1) Open the monitor with qemuProcessQMPNew()/qemuProcessQMPStart(), 
 and connect it.

  2) Setup the transient disk with qemuDomainPrepareStorageSourceBlockdev(),
 qemuBlockStorageSourceAttachApplyStorage(), 
qemuBlockStorageSourceCreateGetFormatProps()
 and something...

  3) Run blockdev-create command with qemuMonitorBlockdevCreate(), then
 close the monitor.
 
  4) Switch the original disk to the transient disk.

  5) Build the blockdev argument for qemu.

  6) Run qemu

And I suppose qemuBlockStorageSourceCreate() maybe useful for the hotplug...

Thanks,
Masa

> 
> > 
> > Additionally there are corner cases which this patch doesn't deal with:
> > 
> > 1) the virDomainBlockCopy operation flattens the backing chain into the
> > top level only. This means that  must be stripped or the
> > operation rejected, as otherwise shutting down the VM would end up
> > removing the disk image completely.
> 
> If you have marked a disk transient, does it still make sense to allow
> copying that disk?  Rejecting the operation is easiest, as permitting it
> implies that even though you already said you don't care about changes to
> your disk, you still want to be able to back up that disk.
> 
> > 
> > 2) the same as above is used also for non-shared-storage migration where
> > we use block-copy internally to transport the disks, same as above
> > applies. Here again it requires careful consideration of the semantics,
> > e.g whether to reject it or e.g. copy it into the original filename and
> > strip  (we can't currently copy multiple layers).
> 
> The easiest solution is to make a transient disk a migration-blocker. Of
> course, this may annoy people, so migration properly creating a transient
> destination on top of the original base, to preserve the fact that when the
> migrated guest shuts down it reverts to original contents, is a nicer (but
> more complex) goal.
> 
> > 
> > 3) active-layer virDomainBlockCommit moves the data from the transient
> > overlay into the original (now backing image). The  flag is
> > stored in the disk struct though, so that would mean that the original
> > disk source would be removed after stopping the VM. block commit must
> > clear the  flag.
> 
> Why should commit be permitted, when you declared that disk contents
> shouldn't change?  For that matter, external snapshots should be blocked if
> there is a transient disk.
> 
> > 
> > One nice-to-have but not required modification would be to allow
> > configuration of the transient disk's overlay path.
> > 
> > 



[PATCH 0/3] qemu: implementation of transient option for qcow2 file

2020-07-06 Thread Masayoshi Mizuma
Hello,

This patchset tries to implement transient option for qcow2 file.

It gets user available to set  to the domain xml file like as:


  
  
  
  


Any changes which the Guest does to the disk is dropped when the Guest
is shutdowned.

Masayoshi Mizuma (3):
  qemu: implementation of transient option for qcow2 file
  testutilsqemu: Assign qemu-img path to driver->qemuImgBinary
  qemublocktest: add test of transient option for qcow2 file

 src/qemu/qemu_block.c | 71 +++
 src/qemu/qemu_block.h |  7 ++
 src/qemu/qemu_domain.c|  4 ++
 src/qemu/qemu_process.c   |  3 +
 src/qemu/qemu_validate.c  |  2 +-
 tests/qemublocktest.c | 10 +++
 .../xml2json/qcow2-transient-srconly.json |  9 +++
 .../xml2json/qcow2-transient.json | 13 
 .../xml2json/qcow2-transient.xml  | 13 
 tests/testutilsqemu.c |  6 +-
 10 files changed, 136 insertions(+), 2 deletions(-)
 create mode 100644 
tests/qemublocktestdata/xml2json/qcow2-transient-srconly.json
 create mode 100644 tests/qemublocktestdata/xml2json/qcow2-transient.json
 create mode 100644 tests/qemublocktestdata/xml2json/qcow2-transient.xml

-- 
2.27.0




[PATCH 2/3] testutilsqemu: Assign qemu-img path to driver->qemuImgBinary

2020-07-06 Thread Masayoshi Mizuma
From: Masayoshi Mizuma 

Assign qemu-img command file path to driver->qemuImgBinary
so that the unit tests can use qemu-img command.

Signed-off-by: Masayoshi Mizuma 
---
 tests/testutilsqemu.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tests/testutilsqemu.c b/tests/testutilsqemu.c
index 4dcc308..8517f31 100644
--- a/tests/testutilsqemu.c
+++ b/tests/testutilsqemu.c
@@ -104,7 +104,8 @@ char *
 virFindFileInPath(const char *file)
 {
 if (g_str_has_prefix(file, "qemu-system") ||
-g_str_equal(file, "qemu-kvm")) {
+g_str_equal(file, "qemu-kvm") ||
+g_str_equal(file, "qemu-img")) {
 return g_strdup_printf("/usr/bin/%s", file);
 }
 
@@ -317,6 +318,7 @@ void qemuTestDriverFree(virQEMUDriver *driver)
 virObjectUnref(driver->caps);
 virObjectUnref(driver->config);
 virObjectUnref(driver->securityManager);
+virObjectUnref(driver->qemuImgBinary);
 }
 
 int qemuTestCapsCacheInsert(virFileCachePtr cache,
@@ -447,6 +449,8 @@ int qemuTestDriverInit(virQEMUDriver *driver)
 
 qemuTestSetHostCPU(driver, driver->hostarch, NULL);
 
+driver->qemuImgBinary = virFindFileInPath("qemu-img");
+
 return 0;
 
  error:
-- 
2.27.0




[PATCH 1/3] qemu: implementation of transient option for qcow2 file

2020-07-06 Thread Masayoshi Mizuma
From: Masayoshi Mizuma 

Here is the implementation of transient option for qcow2 file.
This gets available  directive in domain xml file
like as:


  
  
  
  


The internal procedure is as follows.
When the qemu command line options are built, a new qcow2 image is created
with backing qcow2 image by using qemu-img command. The backing image is the
qcow2 file which is set as .
The filename of the new qcow2 image is original-source-file.TRANSIENT.
The qemu-img will be:

qemu-img create -f qcow2 -F qcow2 \
-b /var/lib/libvirt/images/guest.qcow2 \
/var/lib/libvirt/images/guest.qcow2.TRANSIENT

Then, it switches the disk path, virStorageSourcePtr src->path, to
the new qcow2 image. The new image and the backing image is handled and
the blockdev option of qemu will be built like as:

-blockdev 
'{"driver":"file","filename":"/var/lib/libvirt/images/guest.qcow2",

"node-name":"libvirt-2-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev 
'{"node-name":"libvirt-2-format","read-only":true,"driver":"qcow2",
"file":"libvirt-2-storage","backing":null}' \
-blockdev 
'{"driver":"file","filename":"/var/lib/libvirt/images/guest.qcow2.TRANSIENT",

"node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev 
'{"node-name":"libvirt-1-format","read-only":false,"driver":"qcow2",
"file":"libvirt-1-storage","backing":"libvirt-2-format"}'

The new qcow2 image is removed when the Guest is shutdowned, 

Signed-off-by: Masayoshi Mizuma 
---
 src/qemu/qemu_block.c| 71 
 src/qemu/qemu_block.h|  7 
 src/qemu/qemu_domain.c   |  4 +++
 src/qemu/qemu_process.c  |  3 ++
 src/qemu/qemu_validate.c |  2 +-
 5 files changed, 86 insertions(+), 1 deletion(-)

diff --git a/src/qemu/qemu_block.c b/src/qemu/qemu_block.c
index 6f9c707..5eb0225 100644
--- a/src/qemu/qemu_block.c
+++ b/src/qemu/qemu_block.c
@@ -27,6 +27,7 @@
 #include "viralloc.h"
 #include "virstring.h"
 #include "virlog.h"
+#include "virqemu.h"
 
 #define VIR_FROM_THIS VIR_FROM_QEMU
 
@@ -3438,3 +3439,73 @@ qemuBlockUpdateRelativeBacking(virDomainObjPtr vm,
 
 return 0;
 }
+
+int
+qemuBlockCreateTransientDisk(virStorageSourcePtr src,
+ qemuDomainObjPrivatePtr priv)
+{
+virBuffer buf = VIR_BUFFER_INITIALIZER;
+virCommandPtr cmd = NULL;
+g_autofree char *filename = NULL;
+g_autofree char *dirname = NULL;
+const char *qemuImgPath;
+char *newpath;
+int err = -1;
+
+if ((src->format != VIR_STORAGE_FILE_QCOW2)) {
+virReportError(VIR_ERR_OPERATION_INVALID,
+   "%s", _("transient option supports qcow2"));
+return -1;
+}
+
+if (!(qemuImgPath = qemuFindQemuImgBinary(priv->driver)))
+return -1;
+
+newpath = g_strdup_printf("%s.TRANSIENT", src->path);
+
+if (virFileExists(newpath)) {
+virReportError(VIR_ERR_INVALID_ARG,
+   _(" '%s' is already exists. Please remove it."), 
newpath);
+goto cleanup;
+}
+
+if (!(cmd = virCommandNewArgList(qemuImgPath,
+ "create",
+ "-f",
+ "qcow2",
+ "-F",
+ "qcow2",
+ "-b",
+ NULL)))
+goto cleanup;
+
+virQEMUBuildBufferEscapeComma(, src->path);
+virCommandAddArgBuffer(cmd, );
+
+virQEMUBuildBufferEscapeComma(, newpath);
+virCommandAddArgBuffer(cmd, );
+
+if (virCommandRun(cmd, NULL) < 0)
+goto cleanup;
+
+VIR_DEBUG("Original disk: %s Transient disk: %s", src->path, newpath);
+
+g_free(src->path);
+src->path = newpath;
+
+err = 0;
+ cleanup:
+virBufferFreeAndReset();
+virCommandFree(cmd);
+if (err)
+g_free(newpath);
+
+return err;
+}
+
+void
+qemuBlockRemoveTransientDisk(virStorageSourcePtr src)
+{
+VIR_DEBUG("unlink transient disk: %s ", src->path);
+unlink(src->path);
+}
diff --git a/src/qemu/qemu_block.h b/src/qemu/qemu_block.h
index 2ad2ce1..60c6898 100644
--- a/src/qemu/qemu_block.h
+++ b/src/qemu/qemu_block.h
@@ -266,3 +266,10 @@ int
 qemuBlockUpdateRelativeBacking(virDomainObjPtr vm,

[PATCH 3/3] qemublocktest: add test of transient option for qcow2 disk

2020-07-06 Thread Masayoshi Mizuma
From: Masayoshi Mizuma 

Add a unit test for transient option for qcow2 file.

Signed-off-by: Masayoshi Mizuma 
---
 tests/qemublocktest.c   | 10 ++
 .../xml2json/qcow2-transient-srconly.json   |  9 +
 .../qemublocktestdata/xml2json/qcow2-transient.json | 13 +
 .../qemublocktestdata/xml2json/qcow2-transient.xml  | 13 +
 4 files changed, 45 insertions(+)
 create mode 100644 
tests/qemublocktestdata/xml2json/qcow2-transient-srconly.json
 create mode 100644 tests/qemublocktestdata/xml2json/qcow2-transient.json
 create mode 100644 tests/qemublocktestdata/xml2json/qcow2-transient.xml

diff --git a/tests/qemublocktest.c b/tests/qemublocktest.c
index 0cdedb9..1294c18 100644
--- a/tests/qemublocktest.c
+++ b/tests/qemublocktest.c
@@ -266,6 +266,7 @@ testQemuDiskXMLToProps(const void *opaque)
 g_autoptr(virJSONValue) formatProps = NULL;
 g_autoptr(virJSONValue) storageProps = NULL;
 g_autoptr(virJSONValue) storageSrcOnlyProps = NULL;
+qemuDomainObjPrivate priv;
 g_autofree char *xmlpath = NULL;
 g_autofree char *xmlstr = NULL;
 
@@ -288,6 +289,13 @@ testQemuDiskXMLToProps(const void *opaque)
 return -1;
 }
 
+if (disk->transient) {
+priv.driver = data->driver;
+if (qemuBlockCreateTransientDisk(disk->src, ) < 0)
+return EXIT_AM_SKIP;
+unlink(disk->src->path);
+}
+
 for (n = disk->src; virStorageSourceIsBacking(n); n = n->backingStore) {
 g_autofree char *backingstore = NULL;
 
@@ -1226,6 +1234,8 @@ mymain(void)
 
 TEST_DISK_TO_JSON("nvme-raw-noopts");
 
+TEST_DISK_TO_JSON("qcow2-transient");
+
 #define TEST_JSON_TO_JSON(nme) \
 do { \
 jsontojsondata.name = nme; \
diff --git a/tests/qemublocktestdata/xml2json/qcow2-transient-srconly.json 
b/tests/qemublocktestdata/xml2json/qcow2-transient-srconly.json
new file mode 100644
index 000..3c2de91
--- /dev/null
+++ b/tests/qemublocktestdata/xml2json/qcow2-transient-srconly.json
@@ -0,0 +1,9 @@
+(
+  source only properties:
+  {
+"driver": "file",
+"filename": "/var/lib/libvirt/images/transient.qcow2.TRANSIENT"
+  }
+  backing store string:
+  /var/lib/libvirt/images/transient.qcow2.TRANSIENT
+)
diff --git a/tests/qemublocktestdata/xml2json/qcow2-transient.json 
b/tests/qemublocktestdata/xml2json/qcow2-transient.json
new file mode 100644
index 000..57463e1
--- /dev/null
+++ b/tests/qemublocktestdata/xml2json/qcow2-transient.json
@@ -0,0 +1,13 @@
+{
+  "node-name": "1234567890",
+  "read-only": false,
+  "driver": "qcow2",
+  "file": "1234567890"
+}
+{
+  "driver": "file",
+  "filename": "/var/lib/libvirt/images/transient.qcow2.TRANSIENT",
+  "node-name": "1234567890",
+  "auto-read-only": true,
+  "discard": "unmap"
+}
diff --git a/tests/qemublocktestdata/xml2json/qcow2-transient.xml 
b/tests/qemublocktestdata/xml2json/qcow2-transient.xml
new file mode 100644
index 000..d2e3919
--- /dev/null
+++ b/tests/qemublocktestdata/xml2json/qcow2-transient.xml
@@ -0,0 +1,13 @@
+
+  
+  
+
+  
+
+
+  
+
+  
+  
+  
+
-- 
2.27.0




Re: Question: How do I discard any changes for the device which is set by blockdev option?

2020-05-19 Thread Masayoshi Mizuma
On Tue, May 19, 2020 at 01:41:08PM -0500, Eric Blake wrote:
> On 5/19/20 12:56 PM, Masayoshi Mizuma wrote:
> > Hello,
> > 
> > I would like to discard any changes while the qemu guest OS is done.
> > I can do that with snapshot and drive option.
> > However, snapshot option doesn't work for the device which set by
> > blockdev option like as:
> > 
> > $QEMU --enable-kvm \
> >-m 1024 \
> >-nographic \
> >-serial mon:stdio \
> >-blockdev driver=file,node-name=mydisk,filename=/mnt/fedora.qcow2 \
> >-blockdev driver=qcow2,node-name=vda,file=mydisk \
> >-device virtio-blk-pci,drive=vda,bootindex=1 \
> >-snapshot
> > 
> > I would like to use blockdev option to set the device because
> > libvirt uses blockdev option for disk element.
> > 
> > If there's no way to do so, does that make sense to get available
> > snapshot option to blockdev as well? If that makes sense, I'll try to
> > implement that.
> > 
> > As for qcow2, I think we can do such things to use qemu-img snapshot
> > command, for example save the original image and restore the image
> > after the qemu guest OS is shutdowned. However, it may be complecated
> > for user. I would like the simple way like as snapshot/drive option...
> > 
> > If I'm missing something, let me know.
> > 
> 
> Sounds like a repeat of this thread:
> https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg06144.html
> 
> where the consensus is yes, -blockdev and -snapshot are incompatible,
> libvirt has plans to use the  tag to behave the same as what
> -snapshot does (but no one has implemented it yet), and in the meantime, it
> is possible to force libvirt to avoid -blockdev if you still need to supply
> -snapshot behind libvirt's back.

Thank you for the info! I didn't notice the thread.
I got we should implement that feature for libvirt side, not qemu.

Thanks!
Masa

> 
> -- 
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.   +1-919-301-3226
> Virtualization:  qemu.org | libvirt.org
> 



Question: How do I discard any changes for the device which is set by blockdev option?

2020-05-19 Thread Masayoshi Mizuma
Hello,

I would like to discard any changes while the qemu guest OS is done.
I can do that with snapshot and drive option.
However, snapshot option doesn't work for the device which set by
blockdev option like as:

$QEMU --enable-kvm \
  -m 1024 \
  -nographic \
  -serial mon:stdio \
  -blockdev driver=file,node-name=mydisk,filename=/mnt/fedora.qcow2 \
  -blockdev driver=qcow2,node-name=vda,file=mydisk \
  -device virtio-blk-pci,drive=vda,bootindex=1 \
  -snapshot

I would like to use blockdev option to set the device because
libvirt uses blockdev option for disk element.

If there's no way to do so, does that make sense to get available
snapshot option to blockdev as well? If that makes sense, I'll try to
implement that.

As for qcow2, I think we can do such things to use qemu-img snapshot
command, for example save the original image and restore the image
after the qemu guest OS is shutdowned. However, it may be complecated
for user. I would like the simple way like as snapshot/drive option...

If I'm missing something, let me know.

Thanks!
Masa