Re: [PATCH 2/2] nbd: disable signals and forking on Windows builds

2020-08-24 Thread Thomas Huth

On 24/08/2020 19.02, Daniel P. Berrangé wrote:

Disabling these parts are sufficient to get the qemu-nbd program
compiling in a Windows build.


Maybe add:
"This also enables compilation of qemu-nbd on macOS again (which got 
disabled by accident during the conversion to the meson build system)"



Signed-off-by: Daniel P. Berrangé 
---
  meson.build |  7 ++-
  qemu-nbd.c  | 10 +-
  2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/meson.build b/meson.build
index df5bf728b5..1071871605 100644
--- a/meson.build
+++ b/meson.build
@@ -1074,12 +1074,9 @@ if have_tools
   dependencies: [authz, block, crypto, io, qom, qemuutil], 
install: true)
qemu_io = executable('qemu-io', files('qemu-io.c'),
   dependencies: [block, qemuutil], install: true)
-  qemu_block_tools += [qemu_img, qemu_io]
-  if targetos == 'linux' or targetos == 'sunos' or targetos.endswith('bsd')
-qemu_nbd = executable('qemu-nbd', files('qemu-nbd.c'),
+  qemu_nbd = executable('qemu-nbd', files('qemu-nbd.c'),
 dependencies: [block, qemuutil], install: true)
-qemu_block_tools += [qemu_nbd]
-  endif
+  qemu_block_tools += [qemu_img, qemu_io, qemu_nbd]
  
subdir('storage-daemon')

subdir('contrib/rdmacm-mux')
diff --git a/qemu-nbd.c b/qemu-nbd.c
index b102874f0f..c6fd6524d3 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -155,12 +155,13 @@ QEMU_COPYRIGHT "\n"
  , name);
  }
  
+#ifndef WIN32

  static void termsig_handler(int signum)
  {
  atomic_cmpxchg(, RUNNING, TERMINATE);
  qemu_notify_event();
  }
-
+#endif
  
  static int qemu_nbd_client_list(SocketAddress *saddr, QCryptoTLSCreds *tls,

  const char *hostname)
@@ -587,6 +588,7 @@ int main(int argc, char **argv)
  unsigned socket_activation;
  const char *pid_file_name = NULL;
  
+#ifndef WIN32

  /* The client thread uses SIGTERM to interrupt the server.  A signal
   * handler ensures that "qemu-nbd -v -c" exits with a nice status code.
   */
@@ -594,6 +596,7 @@ int main(int argc, char **argv)
  memset(_sigterm, 0, sizeof(sa_sigterm));
  sa_sigterm.sa_handler = termsig_handler;
  sigaction(SIGTERM, _sigterm, NULL);
+#endif
  
  #ifdef CONFIG_POSIX


I wonder why the CONFIG_POSIX does not simply start earlier here ... I 
think you could replace your #ifndef WIN32 with #ifdef CONFIG_POSIX that 
way?



  signal(SIGPIPE, SIG_IGN);
@@ -896,6 +899,7 @@ int main(int argc, char **argv)
  #endif
  
  if ((device && !verbose) || fork_process) {

+#ifndef WIN32
  int stderr_fd[2];
  pid_t pid;
  int ret;
@@ -959,6 +963,10 @@ int main(int argc, char **argv)
   */
  exit(errors);
  }
+#else /* WIN32 */
+error_report("Unable to fork into background on Windows hosts");
+exit(EXIT_FAILURE);
+#endif /* WIN32 */
  }
  
  if (device != NULL && sockpath == NULL) {




 Thomas




Re: [PATCH 0/1] qcow2: Skip copy-on-write when allocating a zero cluster

2020-08-24 Thread Alberto Garcia
On Sun 23 Aug 2020 11:59:07 PM CEST, Dave Chinner wrote:
>> >> Option 4 is described above as initial file preallocation whereas
>> >> option 1 is per 64k cluster prealloc. Prealloc mode mixup aside, Berto
>> >> is reporting that the initial file preallocation mode is slower than
>> >> the per cluster prealloc mode. Berto, am I following that right?
>> 
>> After looking more closely at the data I can see that there is a peak of
>> ~30K IOPS during the first 5 or 6 seconds and then it suddenly drops to
>> ~7K for the rest of the test.
>
> How big is the filesystem, how big is the log? (xfs_info output,
> please!)

The size of the filesystem is 126GB and here's the output of xfs_info:

meta-data=/dev/vg/test   isize=512agcount=4, agsize=8248576 blks
 =   sectsz=512   attr=2, projid32bit=1
 =   crc=1finobt=1, sparse=1, rmapbt=0
 =   reflink=0
data =   bsize=4096   blocks=32994304, imaxpct=25
 =   sunit=0  swidth=0 blks
naming   =version 2  bsize=4096   ascii-ci=0, ftype=1
log  =internal log   bsize=4096   blocks=16110, version=2
 =   sectsz=512   sunit=0 blks, lazy-count=1
realtime =none   extsz=4096   blocks=0, rtextents=0

>> I was running fio with --ramp_time=5 which ignores the first 5 seconds
>> of data in order to let performance settle, but if I remove that I can
>> see the effect more clearly. I can observe it with raw files (in 'off'
>> and 'prealloc' modes) and qcow2 files in 'prealloc' mode. With qcow2 and
>> preallocation=off the performance is stable during the whole test.
>
> What does "preallocation=off" mean again? Is that using
> fallocate(ZERO_RANGE) prior to the data write rather than
> preallocating the metadata/entire file?

Exactly, it means that. One fallocate() call before each data write
(unless the area has been allocated by a previous write).

> If so, I would expect the limiting factor is the rate at which IO can
> be issued because of the fallocate() triggered pipeline bubbles. That
> leaves idle device time so you're not pushing the limits of the
> hardware and hence none of the behaviours above will be evident...

The thing is that with raw (i.e. non-qcow2) images the number of IOPS is
similar, but in that case there are no fallocate() calls, only the data
writes.

Berto



Re: [PATCH v5 07/10] block: introduce preallocate filter

2020-08-24 Thread Vladimir Sementsov-Ogievskiy

21.08.2020 17:11, Vladimir Sementsov-Ogievskiy wrote:

It's intended to be inserted between format and protocol nodes to
preallocate additional space (expanding protocol file) on writes
crossing EOF. It improves performance for file-systems with slow
allocation.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  docs/system/qemu-block-drivers.rst.inc |  26 +++
  qapi/block-core.json   |  20 +-
  block/preallocate.c| 291 +
  block/Makefile.objs|   1 +
  4 files changed, 337 insertions(+), 1 deletion(-)
  create mode 100644 block/preallocate.c

diff --git a/docs/system/qemu-block-drivers.rst.inc 
b/docs/system/qemu-block-drivers.rst.inc
index b052a6d14e..5e8a35c571 100644
--- a/docs/system/qemu-block-drivers.rst.inc
+++ b/docs/system/qemu-block-drivers.rst.inc
@@ -952,3 +952,29 @@ on host and see if there are locks held by the QEMU 
process on the image file.
  More than one byte could be locked by the QEMU instance, each byte of which
  reflects a particular permission that is acquired or protected by the running
  block driver.
+


[..]


+
+static coroutine_fn int preallocate_co_preadv_part(
+BlockDriverState *bs, uint64_t offset, uint64_t bytes,
+QEMUIOVector *qiov, size_t qiov_offset, int flags)
+{
+return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, qiov_offset,
+   flags);
+}
+
+static int coroutine_fn preallocate_co_pdiscard(BlockDriverState *bs,
+   int64_t offset, int bytes)
+{
+return bdrv_co_pdiscard(bs->file, offset, bytes);
+}
+
+static bool coroutine_fn do_preallocate(BlockDriverState *bs, int64_t offset,
+   int64_t bytes, bool write_zero)
+{
+BDRVPreallocateState *s = bs->opaque;
+int64_t len, start, end;


int64_t old_data_end = s->data_end;


+
+if (!s->active) {
+return false;
+}
+
+if (s->data_end >= 0) {
+s->data_end = MAX(s->data_end,
+  QEMU_ALIGN_UP(offset + bytes, BDRV_SECTOR_SIZE));
+}
+
+len = bdrv_getlength(bs->file->bs);
+if (len < 0) {
+return false;


return old_data_end >= 0 && offset >= old_data_end;


And with this small improvement we make the following test 20% faster (ssd, 
ext4):

./qemu-img create -f qcow2 $img 16G; ./qemu-img bench -c 15 -d 64 -f qcow2  
-s 16k -t none -n -w $img;

(assume additional patch which inserts the filter).

Great! So, preallocation is generally good idea, not only for vstorage.

What about inserting the filter by default?

I'll come tomorrow with more complete test results.



--
Best regards,
Vladimir



Re: Suspicious QOM types without instance/class size

2020-08-24 Thread Eduardo Habkost
On Mon, Aug 24, 2020 at 08:06:42PM +0300, Roman Bolshakov wrote:
> On Mon, Aug 24, 2020 at 12:45:52PM -0400, Eduardo Habkost wrote:
> > On Mon, Aug 24, 2020 at 07:41:34PM +0300, Roman Bolshakov wrote:
> > > On Fri, Aug 21, 2020 at 01:48:02PM -0400, Eduardo Habkost wrote:
> > > > Signed-off-by: Eduardo Habkost 
> > > > ---
> > > > diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
> > > > index d81f569aed..81d1662d06 100644
> > > > --- a/target/i386/hvf/hvf.c
> > > > +++ b/target/i386/hvf/hvf.c
> > > > @@ -878,13 +878,11 @@ static int hvf_accel_init(MachineState *ms)
> > > >  {
> > > >  int x;
> > > >  hv_return_t ret;
> > > > -HVFState *s;
> > > > +HVFState *s = HVF_STATE(ms->accelerator);
> > > 
> > > The file also needs definition of MachineState:
> > > #include "hw/boards.h"
> > > 
> > > >  
> > > >  ret = hv_vm_create(HV_VM_DEFAULT);
> > > >  assert_hvf_ok(ret);
> > > >  
> > > > -s = g_new0(HVFState, 1);
> > > > - 
> > > >  s->num_slots = 32;
> > > >  for (x = 0; x < s->num_slots; ++x) {
> > > >  s->slots[x].size = 0;
> > > > @@ -908,6 +906,7 @@ static void hvf_accel_class_init(ObjectClass *oc, 
> > > > void *data)
> > > >  static const TypeInfo hvf_accel_type = {
> > > >  .name = TYPE_HVF_ACCEL,
> > > >  .parent = TYPE_ACCEL,
> > > > +.instance_size = sizeof(HVFState),
> > > >  .class_init = hvf_accel_class_init,
> > > >  };
> > > >  
> > > >  
> > 
> > However, the hvf patch above shouldn't require it.  You should be
> > able to apply and test it on top of qemu.git master.
> > 
> 
> Yeah, that's correct, thanks.
> 
> With the include fix for hw/boards.h, the patch works:
> Reviewed-By: Roman Bolshakov 
> Tested-By: Roman Bolshakov 
> 
> BTW, am I expected to see the accel in "info qtree" (or qom-tree)? It's
> not there for a reason.

I don't know if you are expect to see it.  I don't think there's
explicit code to attach the accel object to the user-visible QOM
tree.

-- 
Eduardo




Re: [PATCH 2/2] nbd: disable signals and forking on Windows builds

2020-08-24 Thread Eric Blake

On 8/24/20 12:02 PM, Daniel P. Berrangé wrote:

Disabling these parts are sufficient to get the qemu-nbd program
compiling in a Windows build.

Signed-off-by: Daniel P. Berrangé 
---
  meson.build |  7 ++-
  qemu-nbd.c  | 10 +-
  2 files changed, 11 insertions(+), 6 deletions(-)


Feels a bit hacky at what it supports, but certainly better than nothing ;)



diff --git a/meson.build b/meson.build
index df5bf728b5..1071871605 100644
--- a/meson.build
+++ b/meson.build
@@ -1074,12 +1074,9 @@ if have_tools
   dependencies: [authz, block, crypto, io, qom, qemuutil], 
install: true)
qemu_io = executable('qemu-io', files('qemu-io.c'),
   dependencies: [block, qemuutil], install: true)
-  qemu_block_tools += [qemu_img, qemu_io]
-  if targetos == 'linux' or targetos == 'sunos' or targetos.endswith('bsd')
-qemu_nbd = executable('qemu-nbd', files('qemu-nbd.c'),
+  qemu_nbd = executable('qemu-nbd', files('qemu-nbd.c'),
 dependencies: [block, qemuutil], install: true)


Conflicts with this patch:
https://lists.gnu.org/archive/html/qemu-devel/2020-08/msg05546.html

but this one gets rid of the need for that one.


-qemu_block_tools += [qemu_nbd]
-  endif
+  qemu_block_tools += [qemu_img, qemu_io, qemu_nbd]
  
subdir('storage-daemon')

subdir('contrib/rdmacm-mux')
diff --git a/qemu-nbd.c b/qemu-nbd.c
index b102874f0f..c6fd6524d3 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -155,12 +155,13 @@ QEMU_COPYRIGHT "\n"
  , name);
  }
  
+#ifndef WIN32

  static void termsig_handler(int signum)
  {
  atomic_cmpxchg(, RUNNING, TERMINATE);
  qemu_notify_event();
  }
-
+#endif


How does one terminate a long-running server on Windows if there is no 
SIGTERM handler?  I guess Ctrl-C does something, but without the state 
notification from a signal handler, you are getting less-clean 
shutdowns, which may explain the hangs you were seeing in testing?  But 
incremental progress is fine, and I see no reason to not take this patch 
as-is.


Reviewed-by: Eric Blake 

I'm happy to queue this series through my NBD tree.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH 1/2] block: add missing socket_init() calls to tools

2020-08-24 Thread Eric Blake

On 8/24/20 12:02 PM, Daniel P. Berrangé wrote:

Any tool that uses sockets needs to call socket_init() in order to work
on the Windows platform.

Signed-off-by: Daniel P. Berrangé 
---
  qemu-img.c | 2 ++
  qemu-io.c  | 2 ++
  qemu-nbd.c | 1 +
  3 files changed, 5 insertions(+)



Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: Suspicious QOM types without instance/class size

2020-08-24 Thread Roman Bolshakov
On Mon, Aug 24, 2020 at 12:45:52PM -0400, Eduardo Habkost wrote:
> On Mon, Aug 24, 2020 at 07:41:34PM +0300, Roman Bolshakov wrote:
> > On Fri, Aug 21, 2020 at 01:48:02PM -0400, Eduardo Habkost wrote:
> > > Signed-off-by: Eduardo Habkost 
> > > ---
> > > diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
> > > index d81f569aed..81d1662d06 100644
> > > --- a/target/i386/hvf/hvf.c
> > > +++ b/target/i386/hvf/hvf.c
> > > @@ -878,13 +878,11 @@ static int hvf_accel_init(MachineState *ms)
> > >  {
> > >  int x;
> > >  hv_return_t ret;
> > > -HVFState *s;
> > > +HVFState *s = HVF_STATE(ms->accelerator);
> > 
> > The file also needs definition of MachineState:
> > #include "hw/boards.h"
> > 
> > >  
> > >  ret = hv_vm_create(HV_VM_DEFAULT);
> > >  assert_hvf_ok(ret);
> > >  
> > > -s = g_new0(HVFState, 1);
> > > - 
> > >  s->num_slots = 32;
> > >  for (x = 0; x < s->num_slots; ++x) {
> > >  s->slots[x].size = 0;
> > > @@ -908,6 +906,7 @@ static void hvf_accel_class_init(ObjectClass *oc, 
> > > void *data)
> > >  static const TypeInfo hvf_accel_type = {
> > >  .name = TYPE_HVF_ACCEL,
> > >  .parent = TYPE_ACCEL,
> > > +.instance_size = sizeof(HVFState),
> > >  .class_init = hvf_accel_class_init,
> > >  };
> > >  
> > >  
> 
> However, the hvf patch above shouldn't require it.  You should be
> able to apply and test it on top of qemu.git master.
> 

Yeah, that's correct, thanks.

With the include fix for hw/boards.h, the patch works:
Reviewed-By: Roman Bolshakov 
Tested-By: Roman Bolshakov 

BTW, am I expected to see the accel in "info qtree" (or qom-tree)? It's
not there for a reason.

Regards,
Roman



[PATCH 0/2] nbd: build qemu-nbd on Windows

2020-08-24 Thread Daniel P . Berrangé
We are already building the NBD client and server on Windows when it is
used via the main system emulator binaries. This demonstrates there is
no fundamental blocker to buildig the qemu-nbd binary too.


In testing this I used:

   wine qemu-nbd.exe -t -p 9000 demo.img 

and

   wine qemu-img.exe info nbd:localhost:9000

In fact I tested the full matrix of native vs windows client and native
vs windows server.

I did notice that native qemu-img will sometimes hang when talking to
the windows qemu-nbd.exe binary, and the windows qemu-img will almost
always hang.

The hang happens in the "blk_unref" call in collect_image_info_list().
This suggests something related to the socket teardown / cleanup in
the NBD code.

While we should obviously investigate and fix that, I didn't consider
it a blocker for enabling build of qemu-nbd.exe, since we're already
building the same (buggy) NBD code in the system emulators on Windows.

Daniel P. Berrangé (2):
  block: add missing socket_init() calls to tools
  nbd: disable signals and forking on Windows builds

 meson.build |  7 ++-
 qemu-img.c  |  2 ++
 qemu-io.c   |  2 ++
 qemu-nbd.c  | 11 ++-
 4 files changed, 16 insertions(+), 6 deletions(-)

-- 
2.26.2





[PATCH 2/2] nbd: disable signals and forking on Windows builds

2020-08-24 Thread Daniel P . Berrangé
Disabling these parts are sufficient to get the qemu-nbd program
compiling in a Windows build.

Signed-off-by: Daniel P. Berrangé 
---
 meson.build |  7 ++-
 qemu-nbd.c  | 10 +-
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/meson.build b/meson.build
index df5bf728b5..1071871605 100644
--- a/meson.build
+++ b/meson.build
@@ -1074,12 +1074,9 @@ if have_tools
  dependencies: [authz, block, crypto, io, qom, qemuutil], install: 
true)
   qemu_io = executable('qemu-io', files('qemu-io.c'),
  dependencies: [block, qemuutil], install: true)
-  qemu_block_tools += [qemu_img, qemu_io]
-  if targetos == 'linux' or targetos == 'sunos' or targetos.endswith('bsd')
-qemu_nbd = executable('qemu-nbd', files('qemu-nbd.c'),
+  qemu_nbd = executable('qemu-nbd', files('qemu-nbd.c'),
dependencies: [block, qemuutil], install: true)
-qemu_block_tools += [qemu_nbd]
-  endif
+  qemu_block_tools += [qemu_img, qemu_io, qemu_nbd]
 
   subdir('storage-daemon')
   subdir('contrib/rdmacm-mux')
diff --git a/qemu-nbd.c b/qemu-nbd.c
index b102874f0f..c6fd6524d3 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -155,12 +155,13 @@ QEMU_COPYRIGHT "\n"
 , name);
 }
 
+#ifndef WIN32
 static void termsig_handler(int signum)
 {
 atomic_cmpxchg(, RUNNING, TERMINATE);
 qemu_notify_event();
 }
-
+#endif
 
 static int qemu_nbd_client_list(SocketAddress *saddr, QCryptoTLSCreds *tls,
 const char *hostname)
@@ -587,6 +588,7 @@ int main(int argc, char **argv)
 unsigned socket_activation;
 const char *pid_file_name = NULL;
 
+#ifndef WIN32
 /* The client thread uses SIGTERM to interrupt the server.  A signal
  * handler ensures that "qemu-nbd -v -c" exits with a nice status code.
  */
@@ -594,6 +596,7 @@ int main(int argc, char **argv)
 memset(_sigterm, 0, sizeof(sa_sigterm));
 sa_sigterm.sa_handler = termsig_handler;
 sigaction(SIGTERM, _sigterm, NULL);
+#endif
 
 #ifdef CONFIG_POSIX
 signal(SIGPIPE, SIG_IGN);
@@ -896,6 +899,7 @@ int main(int argc, char **argv)
 #endif
 
 if ((device && !verbose) || fork_process) {
+#ifndef WIN32
 int stderr_fd[2];
 pid_t pid;
 int ret;
@@ -959,6 +963,10 @@ int main(int argc, char **argv)
  */
 exit(errors);
 }
+#else /* WIN32 */
+error_report("Unable to fork into background on Windows hosts");
+exit(EXIT_FAILURE);
+#endif /* WIN32 */
 }
 
 if (device != NULL && sockpath == NULL) {
-- 
2.26.2




[PATCH 1/2] block: add missing socket_init() calls to tools

2020-08-24 Thread Daniel P . Berrangé
Any tool that uses sockets needs to call socket_init() in order to work
on the Windows platform.

Signed-off-by: Daniel P. Berrangé 
---
 qemu-img.c | 2 ++
 qemu-io.c  | 2 ++
 qemu-nbd.c | 1 +
 3 files changed, 5 insertions(+)

diff --git a/qemu-img.c b/qemu-img.c
index 5308773811..eb2fc1f862 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -41,6 +41,7 @@
 #include "qemu/log.h"
 #include "qemu/main-loop.h"
 #include "qemu/module.h"
+#include "qemu/sockets.h"
 #include "qemu/units.h"
 #include "qom/object_interfaces.h"
 #include "sysemu/block-backend.h"
@@ -5410,6 +5411,7 @@ int main(int argc, char **argv)
 signal(SIGPIPE, SIG_IGN);
 #endif
 
+socket_init();
 error_init(argv[0]);
 module_call_init(MODULE_INIT_TRACE);
 qemu_init_exec_dir(argv[0]);
diff --git a/qemu-io.c b/qemu-io.c
index 3adc5a7d0d..7cc832b3d6 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -25,6 +25,7 @@
 #include "qemu/config-file.h"
 #include "qemu/readline.h"
 #include "qemu/log.h"
+#include "qemu/sockets.h"
 #include "qapi/qmp/qstring.h"
 #include "qapi/qmp/qdict.h"
 #include "qom/object_interfaces.h"
@@ -542,6 +543,7 @@ int main(int argc, char **argv)
 signal(SIGPIPE, SIG_IGN);
 #endif
 
+socket_init();
 error_init(argv[0]);
 module_call_init(MODULE_INIT_TRACE);
 qemu_init_exec_dir(argv[0]);
diff --git a/qemu-nbd.c b/qemu-nbd.c
index d2657b8db5..b102874f0f 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -599,6 +599,7 @@ int main(int argc, char **argv)
 signal(SIGPIPE, SIG_IGN);
 #endif
 
+socket_init();
 error_init(argv[0]);
 module_call_init(MODULE_INIT_TRACE);
 qcrypto_init(_fatal);
-- 
2.26.2




Re: Suspicious QOM types without instance/class size

2020-08-24 Thread Eduardo Habkost
On Mon, Aug 24, 2020 at 07:41:34PM +0300, Roman Bolshakov wrote:
> On Fri, Aug 21, 2020 at 01:48:02PM -0400, Eduardo Habkost wrote:
> > On Fri, Aug 21, 2020 at 01:29:38PM -0400, Eduardo Habkost wrote:
> > > On Fri, Aug 21, 2020 at 01:53:52PM +0300, Roman Bolshakov wrote:
> > > > On Thu, Aug 20, 2020 at 05:55:29PM -0400, Eduardo Habkost wrote:
> > > > > While trying to convert TypeInfo declarations to the new
> > > > > OBJECT_DECLARE* macros, I've stumbled on a few suspicious cases
> > > > > where instance_size or class_size is not set, despite having type
> > > > > checker macros that use a specific type.
> > > > > 
> > > > > The ones with "WARNING" are abstract types (maybe not serious if
> > > > > subclasses set the appropriate sizes).  The ones with "ERROR"
> > > > > don't seem to be abstract types.
> > > > > 
> > > > 
> > > > > ERROR: target/i386/hvf/hvf.c:908:1: instance_size should be set to 
> > > > > sizeof(HVFState)?
> > > > 
> > > 
> > > > BTW, the object definition for hvf seems different from KVM (and perhaps
> > > > wrong?), e.g. HVFState is allocated within init_machine handler and then
> > > > assigned to a global variable:
> > > 
> > > Interesting.  It looks like hvf_state is _not_ the actual QOM
> > > object instance.  The actual TYPE_HVF_ACCEL instance is created
> > > by do_configure_accelerator().  That would explain why the lack
> > > of instance_init never caused any problems.
> > > 
> > > Luckily, no code ever used the HVF_STATE macro.  If
> > > HVF_STATE(hvf_state) got called, it would crash because of
> > > uninitialized object instance data.  If HVF_STATE(machine->accel)
> > > got called, it would return an invalid HVFState pointer (not
> > > hvf_state).
> > > 
> > > I believe the simplest short term solution here is to just delete
> > > the HVF_STATE macro and HVFState::parent field.  We can worry
> > > about actually moving hvf_state to the machine->accel QOM object
> > > later.
> > 
> > Actually, it might be easier to do the full QOM conversion in a
> > single patch instead of deleting the incomplete code.
> > 
> 
> I agree full QOM conversion is better, perhaps we can later move
> certains bits to accel/hvf.c like it's done for kvm/tcg/qtest.
> 
> > Can you check if the following patch works?  I don't have a host
> > where I can test it.
> > 
> 
> Sure, thanks :)
> 
> > Signed-off-by: Eduardo Habkost 
> > ---
> > diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
> > index d81f569aed..81d1662d06 100644
> > --- a/target/i386/hvf/hvf.c
> > +++ b/target/i386/hvf/hvf.c
> > @@ -878,13 +878,11 @@ static int hvf_accel_init(MachineState *ms)
> >  {
> >  int x;
> >  hv_return_t ret;
> > -HVFState *s;
> > +HVFState *s = HVF_STATE(ms->accelerator);
> 
> The file also needs definition of MachineState:
> #include "hw/boards.h"
> 
> >  
> >  ret = hv_vm_create(HV_VM_DEFAULT);
> >  assert_hvf_ok(ret);
> >  
> > -s = g_new0(HVFState, 1);
> > - 
> >  s->num_slots = 32;
> >  for (x = 0; x < s->num_slots; ++x) {
> >  s->slots[x].size = 0;
> > @@ -908,6 +906,7 @@ static void hvf_accel_class_init(ObjectClass *oc, void 
> > *data)
> >  static const TypeInfo hvf_accel_type = {
> >  .name = TYPE_HVF_ACCEL,
> >  .parent = TYPE_ACCEL,
> > +.instance_size = sizeof(HVFState),
> >  .class_init = hvf_accel_class_init,
> >  };
> >  
> >  
> 
> Unfortunately it fails to start (even without the HVF patch):
> ERROR:../qom/object.c:314:type_initialize: assertion failed: 
> (parent->class_size <= ti->class_size)
> Bail out! ERROR:../qom/object.c:314:type_initialize: assertion failed: 
> (parent->class_size <= ti->class_size)
[...]
> It doesn't seem related to HVF QOM changes 樂
> 
> Bisected it to:
> 
> b717702de21461138ac0e1d6775da0bd0482c013 is the first bad commit
> commit b717702de21461138ac0e1d6775da0bd0482c013
> Author: Daniel P. Berrangé 
> Date:   Wed Aug 19 20:12:35 2020 -0400
> 
> crypto: use QOM macros for declaration/definition of secret types
> 
> This introduces the use of the OBJECT_DEFINE and OBJECT_DECLARE macro
> families in the secret types, in order to eliminate boilerplate code.
> 
> Signed-off-by: Daniel P. Berrangé 
> Message-Id: <20200723181410.3145233-4-berra...@redhat.com>
> [ehabkost: rebase, update to pass additional arguments to macro]
> Signed-off-by: Eduardo Habkost 
> Message-Id: <20200820001236.1284548-58-ehabk...@redhat.com>

Oh, that's a bug in my QOM series.  Thanks for debugging it!  I
will fix it in v3.

However, the hvf patch above shouldn't require it.  You should be
able to apply and test it on top of qemu.git master.

-- 
Eduardo




Re: Suspicious QOM types without instance/class size

2020-08-24 Thread Roman Bolshakov
On Fri, Aug 21, 2020 at 01:48:02PM -0400, Eduardo Habkost wrote:
> On Fri, Aug 21, 2020 at 01:29:38PM -0400, Eduardo Habkost wrote:
> > On Fri, Aug 21, 2020 at 01:53:52PM +0300, Roman Bolshakov wrote:
> > > On Thu, Aug 20, 2020 at 05:55:29PM -0400, Eduardo Habkost wrote:
> > > > While trying to convert TypeInfo declarations to the new
> > > > OBJECT_DECLARE* macros, I've stumbled on a few suspicious cases
> > > > where instance_size or class_size is not set, despite having type
> > > > checker macros that use a specific type.
> > > > 
> > > > The ones with "WARNING" are abstract types (maybe not serious if
> > > > subclasses set the appropriate sizes).  The ones with "ERROR"
> > > > don't seem to be abstract types.
> > > > 
> > > 
> > > > ERROR: target/i386/hvf/hvf.c:908:1: instance_size should be set to 
> > > > sizeof(HVFState)?
> > > 
> > 
> > > BTW, the object definition for hvf seems different from KVM (and perhaps
> > > wrong?), e.g. HVFState is allocated within init_machine handler and then
> > > assigned to a global variable:
> > 
> > Interesting.  It looks like hvf_state is _not_ the actual QOM
> > object instance.  The actual TYPE_HVF_ACCEL instance is created
> > by do_configure_accelerator().  That would explain why the lack
> > of instance_init never caused any problems.
> > 
> > Luckily, no code ever used the HVF_STATE macro.  If
> > HVF_STATE(hvf_state) got called, it would crash because of
> > uninitialized object instance data.  If HVF_STATE(machine->accel)
> > got called, it would return an invalid HVFState pointer (not
> > hvf_state).
> > 
> > I believe the simplest short term solution here is to just delete
> > the HVF_STATE macro and HVFState::parent field.  We can worry
> > about actually moving hvf_state to the machine->accel QOM object
> > later.
> 
> Actually, it might be easier to do the full QOM conversion in a
> single patch instead of deleting the incomplete code.
> 

I agree full QOM conversion is better, perhaps we can later move
certains bits to accel/hvf.c like it's done for kvm/tcg/qtest.

> Can you check if the following patch works?  I don't have a host
> where I can test it.
> 

Sure, thanks :)

> Signed-off-by: Eduardo Habkost 
> ---
> diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
> index d81f569aed..81d1662d06 100644
> --- a/target/i386/hvf/hvf.c
> +++ b/target/i386/hvf/hvf.c
> @@ -878,13 +878,11 @@ static int hvf_accel_init(MachineState *ms)
>  {
>  int x;
>  hv_return_t ret;
> -HVFState *s;
> +HVFState *s = HVF_STATE(ms->accelerator);

The file also needs definition of MachineState:
#include "hw/boards.h"

>  
>  ret = hv_vm_create(HV_VM_DEFAULT);
>  assert_hvf_ok(ret);
>  
> -s = g_new0(HVFState, 1);
> - 
>  s->num_slots = 32;
>  for (x = 0; x < s->num_slots; ++x) {
>  s->slots[x].size = 0;
> @@ -908,6 +906,7 @@ static void hvf_accel_class_init(ObjectClass *oc, void 
> *data)
>  static const TypeInfo hvf_accel_type = {
>  .name = TYPE_HVF_ACCEL,
>  .parent = TYPE_ACCEL,
> +.instance_size = sizeof(HVFState),
>  .class_init = hvf_accel_class_init,
>  };
>  
>  

Unfortunately it fails to start (even without the HVF patch):
ERROR:../qom/object.c:314:type_initialize: assertion failed: 
(parent->class_size <= ti->class_size)
Bail out! ERROR:../qom/object.c:314:type_initialize: assertion failed: 
(parent->class_size <= ti->class_size)

(lldb) bt
* thread #3, stop reason = signal SIGABRT
  * frame #0: 0x7fff6a75e33a libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x7fff6a81ae60 libsystem_pthread.dylib`pthread_kill + 430
frame #2: 0x7fff6a6e5808 libsystem_c.dylib`abort + 120
frame #3: 0x000101289c36 libglib-2.0.0.dylib`g_assertion_message + 406
frame #4: 0x000101289c9e libglib-2.0.0.dylib`g_assertion_message_expr + 
94
frame #5: 0x000100353c00 
qemu-system-x86_64`type_initialize(ti=0x0001032403e0) at object.c:314:9 
[opt]
frame #6: 0x00010035378b 
qemu-system-x86_64`type_initialize(ti=0x00010323fd70) at object.c:310:9 
[opt]
frame #7: 0x000100353de8 
qemu-system-x86_64`object_class_foreach_tramp(key=, 
value=0x00010323fd70, opaque=0x75cb8d98) at object.c:1030:5 [opt]
frame #8: 0x00010124b83d libglib-2.0.0.dylib`g_hash_table_foreach + 125
frame #9: 0x000100354079 qemu-system-x86_64`object_class_get_list 
[inlined] object_class_foreach(fn=, implements_type=, 
include_abstract=, opaque=0x75cb8d90) at object.c:1052:5 
[opt]
frame #10: 0x00010035401e 
qemu-system-x86_64`object_class_get_list(implements_type=, 
include_abstract=) at object.c:1109 [opt]
frame #11: 0x00010030875d qemu-system-x86_64`qemu_init [inlined] 
select_machine at vl.c:2438:24 [opt]
frame #12: 0x00010030874c 
qemu-system-x86_64`qemu_init(argc=, argv=, 
envp=) at vl.c:3842 [opt]
frame #13: 0x00018ef9 
qemu-system-x86_64`qemu_main(argc=, argv=, 
envp=) at main.c:49:5 [opt]
frame #14: 

Re: [PATCH] hw/sd/allwinner-sdhost: Use AddressSpace for DMA transfers

2020-08-24 Thread Peter Maydell
On Fri, 14 Aug 2020 at 12:01, Philippe Mathieu-Daudé  wrote:
>
> Allow the device to execute the DMA transfers in a different
> AddressSpace.
>
> The A10 and H3 SoC keep using the system_memory address space,
> but via the proper dma_memory_access() API.
>
> Signed-off-by: Philippe Mathieu-Daudé 



Applied to target-arm.next, thanks.

-- PMM



Re: [RFC qemu 0/6] mirror: implement incremental and bitmap modes

2020-08-24 Thread John Snow

On 8/21/20 9:03 AM, Max Reitz wrote:

On 18.02.20 11:07, Fabian Grünbichler wrote:

[Sorry :/]


picking up on John's in-progress patch series from last summer, this is
a stab at rebasing and adding test cases for the low-hanging fruits:

- bitmap mirror mode with always/on-success/never bitmap sync mode
- incremental mirror mode as sugar for bitmap + on-success

Fabian Grünbichler (4):
   mirror: add check for bitmap-mode without bitmap
   mirror: switch to bdrv_dirty_bitmap_merge_internal
   iotests: add test for bitmap mirror
   mirror: move some checks to QMP

John Snow (2):
   drive-mirror: add support for sync=bitmap mode=never
   drive-mirror: add support for conditional and always bitmap sync modes


Looks reasonable to me.  I would indeed merge patches 2 through 4 into a
single one, and perhaps switch patches 5 and 6.

Also, we still need an S-o-b from John on patch 2.



Whoops! Will do.


I have one question: When the mirror job completes successfully (or is
cancelled “successfully”), the bitmap is always fully cleared when the
job completes, right?  (Unless in “never” mode.)



I don't remember personally, it's been a minute ...


Not that I think we should change the current implementation of “clear
sync_bitmap; merge dirty_bitmap into sync_bitmap;”.  Just a question for
understanding.


Soo...  What’s the plan?

Max






Re: [PATCH v7 00/47] block: Deal with filters

2020-08-24 Thread Kevin Wolf
Am 25.06.2020 um 17:21 hat Max Reitz geschrieben:
> v6: https://lists.nongnu.org/archive/html/qemu-devel/2019-08/msg01715.html
> 
> Branch: https://github.com/XanClic/qemu.git child-access-functions-v7
> Branch: https://git.xanclic.moe/XanClic/qemu.git child-access-functions-v7

Okay, finally made it through the series. Sorry for taking so long. You
can add my Reviewed-by to all patches that I didn't comment on. (Yes,
I'm just too lazy to make the list myself. :-))

Kevin




Re: [PATCH v7 39/47] blockdev: Fix active commit choice

2020-08-24 Thread Kevin Wolf
Am 24.08.2020 um 16:41 hat Max Reitz geschrieben:
> On 24.08.20 16:07, Kevin Wolf wrote:
> > Am 24.08.2020 um 15:18 hat Max Reitz geschrieben:
> >> On 21.08.20 17:50, Kevin Wolf wrote:
> >>> Am 25.06.2020 um 17:22 hat Max Reitz geschrieben:
>  We have to perform an active commit whenever the top node has a parent
>  that has taken the WRITE permission on it.
> 
>  Signed-off-by: Max Reitz 
>  Reviewed-by: Vladimir Sementsov-Ogievskiy 
>  ---
>   blockdev.c | 24 +---
>   1 file changed, 21 insertions(+), 3 deletions(-)
> 
>  diff --git a/blockdev.c b/blockdev.c
>  index 402f1d1df1..237fffbe53 100644
>  --- a/blockdev.c
>  +++ b/blockdev.c
>  @@ -2589,6 +2589,7 @@ void qmp_block_commit(bool has_job_id, const char 
>  *job_id, const char *device,
>   AioContext *aio_context;
>   Error *local_err = NULL;
>   int job_flags = JOB_DEFAULT;
>  +uint64_t top_perm, top_shared;
>   
>   if (!has_speed) {
>   speed = 0;
>  @@ -2704,14 +2705,31 @@ void qmp_block_commit(bool has_job_id, const 
>  char *job_id, const char *device,
>   goto out;
>   }
>   
>  -if (top_bs == bs) {
>  +/*
>  + * Active commit is required if and only if someone has taken a
>  + * WRITE permission on the top node.
> >>>
> >>> ...or if someone wants to take a WRITE permission while the job is
> >>> running.
> >>>
> >>> Future intentions of the user is something that we can't know, so maybe
> >>> this should become an option in the future (not in this series, of
> >>> course).
> >>>
> Historically, we have always
>  + * used active commit for top nodes, so continue that practice.
>  + * (Active commit is never really wrong.)
>  + */
> >>>
> >>> Changing the practice would break compatibility with clients that start
> >>> an active commit job and then attach it to a read-write device, so we
> >>> must continue the practice. I think the comment should be clearer about
> >>> this, it sounds more like "no reason, but why not".
> >>
> >> I think that’s what I meant by “historically”.  Is “legacily” a word?
> >>
> >> But sure, I can make it more explicit.
> >>
> >>> This is even more problematic because the commit job doesn't unshare
> >>> BLK_PERM_WRITE yet, so it would lead to silent corruption rather than an
> >>> error.
> >>>
>  +bdrv_get_cumulative_perm(top_bs, _perm, _shared);
>  +if (top_perm & BLK_PERM_WRITE ||
>  +bdrv_skip_filters(top_bs) == bdrv_skip_filters(bs))
>  +{
>   if (has_backing_file) {
>   error_setg(errp, "'backing-file' specified,"
>    " but 'top' is the active layer");
> >>>
> >>> Hm, this error message isn't accurate any more.
> >>>
> >>> In fact, the implementation isn't consistent with the QAPI documentation
> >>> any more, because backing-file is only an error for the top level.
> >>
> >> Hm.  I wanted to agree, and then I wanted to come up with a QAPI
> >> documentation that fits the new behavior (because I think it makes more
> >> sense to change the QAPI documentation along with the behavior change,
> >> rather than to force us to allow backing-file for anything that isn’t on
> >> the top layer).
> >>
> >> But in the process of coming up with a better description, I noticed
> >> that this doesn’t say “is a root node”, it says “is the active layer”.
> >> I would say a node in the active layer is a node that has some parent
> >> that has taken a WRITE permission on it.  So actually I think that the
> >> documentation is right, and this code only now fits.
> > 
> > Then you may have not only "the" active layer, but multiple active
> > layers. I find this a bit counterintuitive.
> 
> Depends on what you count as a layer.  I don’t think that’s a clearly
> defined term, is it?  I only know of “active layer”, “format layer”,
> “protocol layer”, and you can at least have multiple format layers above
> each other.  So I don’t find it counterintuitive.
> 
> But perhaps it’d be best to just get away from the term “active layer”,
> as you propose below.

Hm, if I needed to describe what a layer is for me intuitively, I guess
it would be something like each non-filter node on a node chain with all
of the filters directly on top of it?

Depending on which link you follow, you get different sets of layers:
For bs->file, you get the format/protocol layer distinction. For
bs->backing, you get essentially what bdrv_backing_chain_next()
iterates.

In this context (which is talking about COW overlays), I expected the
bs->backing link to apply.

The active layer is then the COW layer that is directly referenced by a
guest device, block job or block export.

> > There is a simple reason why backing-file is an error for a root node:
> > It doesn't have overlays, so a value to 

Re: [PATCH v7 39/47] blockdev: Fix active commit choice

2020-08-24 Thread Max Reitz
On 24.08.20 16:07, Kevin Wolf wrote:
> Am 24.08.2020 um 15:18 hat Max Reitz geschrieben:
>> On 21.08.20 17:50, Kevin Wolf wrote:
>>> Am 25.06.2020 um 17:22 hat Max Reitz geschrieben:
 We have to perform an active commit whenever the top node has a parent
 that has taken the WRITE permission on it.

 Signed-off-by: Max Reitz 
 Reviewed-by: Vladimir Sementsov-Ogievskiy 
 ---
  blockdev.c | 24 +---
  1 file changed, 21 insertions(+), 3 deletions(-)

 diff --git a/blockdev.c b/blockdev.c
 index 402f1d1df1..237fffbe53 100644
 --- a/blockdev.c
 +++ b/blockdev.c
 @@ -2589,6 +2589,7 @@ void qmp_block_commit(bool has_job_id, const char 
 *job_id, const char *device,
  AioContext *aio_context;
  Error *local_err = NULL;
  int job_flags = JOB_DEFAULT;
 +uint64_t top_perm, top_shared;
  
  if (!has_speed) {
  speed = 0;
 @@ -2704,14 +2705,31 @@ void qmp_block_commit(bool has_job_id, const char 
 *job_id, const char *device,
  goto out;
  }
  
 -if (top_bs == bs) {
 +/*
 + * Active commit is required if and only if someone has taken a
 + * WRITE permission on the top node.
>>>
>>> ...or if someone wants to take a WRITE permission while the job is
>>> running.
>>>
>>> Future intentions of the user is something that we can't know, so maybe
>>> this should become an option in the future (not in this series, of
>>> course).
>>>
Historically, we have always
 + * used active commit for top nodes, so continue that practice.
 + * (Active commit is never really wrong.)
 + */
>>>
>>> Changing the practice would break compatibility with clients that start
>>> an active commit job and then attach it to a read-write device, so we
>>> must continue the practice. I think the comment should be clearer about
>>> this, it sounds more like "no reason, but why not".
>>
>> I think that’s what I meant by “historically”.  Is “legacily” a word?
>>
>> But sure, I can make it more explicit.
>>
>>> This is even more problematic because the commit job doesn't unshare
>>> BLK_PERM_WRITE yet, so it would lead to silent corruption rather than an
>>> error.
>>>
 +bdrv_get_cumulative_perm(top_bs, _perm, _shared);
 +if (top_perm & BLK_PERM_WRITE ||
 +bdrv_skip_filters(top_bs) == bdrv_skip_filters(bs))
 +{
  if (has_backing_file) {
  error_setg(errp, "'backing-file' specified,"
   " but 'top' is the active layer");
>>>
>>> Hm, this error message isn't accurate any more.
>>>
>>> In fact, the implementation isn't consistent with the QAPI documentation
>>> any more, because backing-file is only an error for the top level.
>>
>> Hm.  I wanted to agree, and then I wanted to come up with a QAPI
>> documentation that fits the new behavior (because I think it makes more
>> sense to change the QAPI documentation along with the behavior change,
>> rather than to force us to allow backing-file for anything that isn’t on
>> the top layer).
>>
>> But in the process of coming up with a better description, I noticed
>> that this doesn’t say “is a root node”, it says “is the active layer”.
>> I would say a node in the active layer is a node that has some parent
>> that has taken a WRITE permission on it.  So actually I think that the
>> documentation is right, and this code only now fits.
> 
> Then you may have not only "the" active layer, but multiple active
> layers. I find this a bit counterintuitive.

Depends on what you count as a layer.  I don’t think that’s a clearly
defined term, is it?  I only know of “active layer”, “format layer”,
“protocol layer”, and you can at least have multiple format layers above
each other.  So I don’t find it counterintuitive.

But perhaps it’d be best to just get away from the term “active layer”,
as you propose below.

> There is a simple reason why backing-file is an error for a root node:
> It doesn't have overlays, so a value to write to the header of overlay
> images just doesn't make sense.

Ah, yeah...

> The same reasoning doesn't apply for writable images that do have
> overlays. Forbidding backing-file is a more arbitrary restriction there.
> I'm not saying that we can't make arbitrary restrictions where allowing
> an option is not worth the effort, but I feel they should be spelt out
> more explicitly instead of twisting words like "active layer" until they
> fit the code.

I’m all for spelling it out more explicitly.  I just noticed that I
couldn’t clearly distinguish “active layer” from “other” cases of nodes
with writers on them, which is why I noted that “active” to me means the
post-patch behavior already.

You’re right that there is no semantic reason for making it an error.
So I just want it to be an error to be lazy.  I hope you let me do that.
 (I 

Re: [PATCH v7 41/47] block: Leave BDS.backing_file constant

2020-08-24 Thread Max Reitz
On 24.08.20 15:14, Kevin Wolf wrote:
> Am 25.06.2020 um 17:22 hat Max Reitz geschrieben:
>> Parts of the block layer treat BDS.backing_file as if it were whatever
>> the image header says (i.e., if it is a relative path, it is relative to
>> the overlay), other parts treat it like a cache for
>> bs->backing->bs->filename (relative paths are relative to the CWD).
>> Considering bs->backing->bs->filename exists, let us make it mean the
>> former.
>>
>> Among other things, this now allows the user to specify a base when
>> using qemu-img to commit an image file in a directory that is not the
>> CWD (assuming, everything uses relative filenames).
>>
>> Before this patch:
>>
>> $ ./qemu-img create -f qcow2 foo/bot.qcow2 1M
>> $ ./qemu-img create -f qcow2 -b bot.qcow2 foo/mid.qcow2
>> $ ./qemu-img create -f qcow2 -b mid.qcow2 foo/top.qcow2
>> $ ./qemu-img commit -b mid.qcow2 foo/top.qcow2
>> qemu-img: Did not find 'mid.qcow2' in the backing chain of 'foo/top.qcow2'
>> $ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2
>> qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of 
>> 'foo/top.qcow2'
>> $ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2
>> qemu-img: Did not find '[...]/foo/mid.qcow2' in the backing chain of 
>> 'foo/top.qcow2'
>>
>> After this patch:
>>
>> $ ./qemu-img commit -b mid.qcow2 foo/top.qcow2
>> Image committed.
>> $ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2
>> qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of 
>> 'foo/top.qcow2'
>> $ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2
>> Image committed.
>>
>> With this change, bdrv_find_backing_image() must look at whether the
>> user has overridden a BDS's backing file.  If so, it can no longer use
>> bs->backing_file, but must instead compare the given filename against
>> the backing node's filename directly.
>>
>> Note that this changes the QAPI output for a node's backing_file.  We
>> had very inconsistent output there (sometimes what the image header
>> said, sometimes the actual filename of the backing image).  This
>> inconsistent output was effectively useless, so we have to decide one
>> way or the other.  Considering that bs->backing_file usually at runtime
>> contained the path to the image relative to qemu's CWD (or absolute),
>> this patch changes QAPI's backing_file to always report the
>> bs->backing->bs->filename from now on.  If you want to receive the image
>> header information, you have to refer to full-backing-filename.
>>
>> This necessitates a change to iotest 228.  The interesting information
>> it really wanted is the image header, and it can get that now, but it
>> has to use full-backing-filename instead of backing_file.  Because of
>> this patch's changes to bs->backing_file's behavior, we also need some
>> reference output changes.
>>
>> Along with the changes to bs->backing_file, stop updating
>> BDS.backing_format in bdrv_backing_attach() as well.  In order not to
>> change our externally visible behavior (incompatibly), we have to let
>> bdrv_query_image_info() try to get the image format from bs->backing if
>> bs->backing_format is unset.  (The QAPI schema describes
>> backing-filename-format as "the format of the backing file", so it is
>> not necessarily what the image header says, but just the format of the
>> file referenced by backing-filename (if known).)
> 
> Why is it okay to change backing-filename incompatibly, but not
> backing-filename-format?

I hope you’re asking the reverse, i.e. why I don’t change
backing-filename-format, too.  The answer to that is yeah, why not. :)

> I would find it much more consistent if
> ImageInfo reported the value from the header in both fields, and
> BlockDeviceInfo reported the values actually in use.
> 
> The QAPI schema described ImageInfo as "Information about a QEMU image
> file" and runtime state really isn't information about an image file.
> 
> If you want to know the probed image format, you can still look at
> backing-image.format. I don't think this change is much different from
> what you described above for BlockDeviceInfo.backing_file.

Well, OK then.

Max



signature.asc
Description: OpenPGP digital signature


Re: [RFC PATCH 5/9] block/curl: Allow the blocksize to be specified by the user

2020-08-24 Thread David Edmondson
On Monday, 2020-08-24 at 15:19:24 +02, Markus Armbruster wrote:

> David Edmondson  writes:
>
>> Rather than a fixed 256kB blocksize, allow the user to specify the
>> size used. It must be a non-zero power of two, defaulting to 256kB.
>
> Nitpick: any power of two is non-zero.  Scratch "non-zero".

Indeed.

>> Signed-off-by: David Edmondson 
>> ---
> [...]
>> diff --git a/docs/system/device-url-syntax.rst.inc 
>> b/docs/system/device-url-syntax.rst.inc
>> index bc38b9df38..ee504ee41a 100644
>> --- a/docs/system/device-url-syntax.rst.inc
>> +++ b/docs/system/device-url-syntax.rst.inc
>> @@ -194,6 +194,13 @@ These are specified using a special URL syntax.
>>Add an offset, in bytes, to all range requests sent to the
>>remote server.
>>  
>> +   ``blocksize``
>> +  The size of all IO requests sent to the remote server. This
>> +  value may optionally have the suffix 'T', 'G', 'M', 'K', 'k' or
>> +  'b'. If it does not have a suffix, it will be assumed to be in
>> +  bytes. The value must be a non-zero power of two.  It defaults
>> +  to 256kB.
>> +
>> Note that when passing options to qemu explicitly, ``driver`` is the
>> value of .
>>  
>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>> index d6f5e91cc3..cd16197e1e 100644
>> --- a/qapi/block-core.json
>> +++ b/qapi/block-core.json
>> @@ -3764,10 +3764,14 @@
>>  # @proxy-password-secret: ID of a QCryptoSecret object providing a password
>>  # for proxy authentication (defaults to no password)
>>  #
>> +# @blocksize: Size of all IO requests sent to the remote server; must
>> +# be a non-zero power of two (defaults to 1 256kB)
>
> Scratch "non-zero".
>
> "(defaults to 1 256kB)" confuses me.  Do you mean "(defaults to 256kB)"?

Yes, thanks for catching it.

> Please add "(since 5.2)".

Will do.

>> +#
>>  # Since: 2.9
>>  ##
>>  { 'struct': 'BlockdevOptionsCurlBase',
>>'data': { 'url': 'str',
>> +'*blocksize': 'int',
>
> Should we use 'size' rather than 'int'?

Yes.

>>  '*timeout': 'int',
>>  '*username': 'str',
>>  '*password-secret': 'str',

dme.
-- 
But are you safe Miss Gradenko?



Re: [PATCH v7 39/47] blockdev: Fix active commit choice

2020-08-24 Thread Kevin Wolf
Am 24.08.2020 um 15:18 hat Max Reitz geschrieben:
> On 21.08.20 17:50, Kevin Wolf wrote:
> > Am 25.06.2020 um 17:22 hat Max Reitz geschrieben:
> >> We have to perform an active commit whenever the top node has a parent
> >> that has taken the WRITE permission on it.
> >>
> >> Signed-off-by: Max Reitz 
> >> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> >> ---
> >>  blockdev.c | 24 +---
> >>  1 file changed, 21 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/blockdev.c b/blockdev.c
> >> index 402f1d1df1..237fffbe53 100644
> >> --- a/blockdev.c
> >> +++ b/blockdev.c
> >> @@ -2589,6 +2589,7 @@ void qmp_block_commit(bool has_job_id, const char 
> >> *job_id, const char *device,
> >>  AioContext *aio_context;
> >>  Error *local_err = NULL;
> >>  int job_flags = JOB_DEFAULT;
> >> +uint64_t top_perm, top_shared;
> >>  
> >>  if (!has_speed) {
> >>  speed = 0;
> >> @@ -2704,14 +2705,31 @@ void qmp_block_commit(bool has_job_id, const char 
> >> *job_id, const char *device,
> >>  goto out;
> >>  }
> >>  
> >> -if (top_bs == bs) {
> >> +/*
> >> + * Active commit is required if and only if someone has taken a
> >> + * WRITE permission on the top node.
> > 
> > ...or if someone wants to take a WRITE permission while the job is
> > running.
> > 
> > Future intentions of the user is something that we can't know, so maybe
> > this should become an option in the future (not in this series, of
> > course).
> > 
> >>Historically, we have always
> >> + * used active commit for top nodes, so continue that practice.
> >> + * (Active commit is never really wrong.)
> >> + */
> > 
> > Changing the practice would break compatibility with clients that start
> > an active commit job and then attach it to a read-write device, so we
> > must continue the practice. I think the comment should be clearer about
> > this, it sounds more like "no reason, but why not".
> 
> I think that’s what I meant by “historically”.  Is “legacily” a word?
> 
> But sure, I can make it more explicit.
> 
> > This is even more problematic because the commit job doesn't unshare
> > BLK_PERM_WRITE yet, so it would lead to silent corruption rather than an
> > error.
> > 
> >> +bdrv_get_cumulative_perm(top_bs, _perm, _shared);
> >> +if (top_perm & BLK_PERM_WRITE ||
> >> +bdrv_skip_filters(top_bs) == bdrv_skip_filters(bs))
> >> +{
> >>  if (has_backing_file) {
> >>  error_setg(errp, "'backing-file' specified,"
> >>   " but 'top' is the active layer");
> > 
> > Hm, this error message isn't accurate any more.
> > 
> > In fact, the implementation isn't consistent with the QAPI documentation
> > any more, because backing-file is only an error for the top level.
> 
> Hm.  I wanted to agree, and then I wanted to come up with a QAPI
> documentation that fits the new behavior (because I think it makes more
> sense to change the QAPI documentation along with the behavior change,
> rather than to force us to allow backing-file for anything that isn’t on
> the top layer).
> 
> But in the process of coming up with a better description, I noticed
> that this doesn’t say “is a root node”, it says “is the active layer”.
> I would say a node in the active layer is a node that has some parent
> that has taken a WRITE permission on it.  So actually I think that the
> documentation is right, and this code only now fits.

Then you may have not only "the" active layer, but multiple active
layers. I find this a bit counterintuitive.

There is a simple reason why backing-file is an error for a root node:
It doesn't have overlays, so a value to write to the header of overlay
images just doesn't make sense.

The same reasoning doesn't apply for writable images that do have
overlays. Forbidding backing-file is a more arbitrary restriction there.
I'm not saying that we can't make arbitrary restrictions where allowing
an option is not worth the effort, but I feel they should be spelt out
more explicitly instead of twisting words like "active layer" until they
fit the code.

> Though I do think this wants for some clarification.  Perhaps “If 'top'
> is the active layer (i.e., is a node that may be written to), specifying
> a backing [...]”?

"If 'top' doesn't have an overlay image or is in use by a writer..."?

> There’s more wrong with the specification, namely the whole part under
> @backing-file past the “(Since 2.1)”, starting with “If top == base”.  I
> think all of that should go to the top level.  (And “If top == active”
> should be changed to “If top is active (i.e., may be written to)”.)

At least the latter only becomes wrong with this patch, so I think it
needs to be changed by this patch.

Kevin


signature.asc
Description: PGP signature


Re: [PATCH v2 0/7] vhost-user-blk: fix the migration issue and enhance qtests

2020-08-24 Thread no-reply
Patchew URL: 
https://patchew.org/QEMU/cover.1598257838.git.dimas...@yandex-team.ru/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

C linker for the host machine: cc ld.bfd 2.27-43
Host machine cpu family: x86_64
Host machine cpu: x86_64
../src/meson.build:9: WARNING: Module unstable-keyval has no backwards or 
forwards compatibility and might not exist in future releases.
Program sh found: YES
Program python3 found: YES (/usr/bin/python3)
C++ compiler for the host machine: c++ (gcc 4.8.5 "c++ (GCC) 4.8.5 20150623 
(Red Hat 4.8.5-39)")
---
 Linux keyring: YES

Found ninjatool-1.8 at /tmp/qemu-test/build/ninjatool
WARNING: custom_target 'shared QAPI source files' has more than one output! 
Using the first one.
WARNING: custom_target 'QGA QAPI files' has more than one output! Using the 
first one.
WARNING: custom_target 'QAPI files for qemu-storage-daemon' has more than one 
output! Using the first one.
WARNING: custom_target 'QAPI doc' has more than one output! Using the first one.
WARNING: custom_target 'dbus-vmstate description' has more than one output! 
Using the first one.
Command line for building ['libcommon.fa'] is long, using a response file
Command line for building ['libqemu-aarch64-softmmu.fa'] is long, using a 
response file
Command line for building ['qemu-system-aarch64'] is long, using a response file
---
Linking static target hw/core/libhwcore.fa
Linking static target chardev/libchardev.fa
../src/tests/qtest/vhost-user-test.c: In function 'test_reconnect':
../src/tests/qtest/vhost-user-test.c:935:9: error: 'nq' may be used 
uninitialized in this function [-Werror=maybe-uninitialized]
 int nq;
 ^
cc1: all warnings being treated as errors
make: *** [tests/qtest/qos-test.p/vhost-user-test.c.o] Error 1
make: *** Waiting for unfinished jobs
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 709, in 
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=b3addb190ca74c36acdbb7205792264a', '-u', 
'1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', 
'-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 
'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
'/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-b6ixlpeu/src/docker-src.2020-08-24-09.41.48.11658:/var/tmp/qemu:z,ro',
 'qemu/centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=b3addb190ca74c36acdbb7205792264a
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-b6ixlpeu/src'
make: *** [docker-run-test-quick@centos7] Error 2

real4m29.821s
user0m21.159s


The full log is available at
http://patchew.org/logs/cover.1598257838.git.dimas...@yandex-team.ru/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH v6 3/4] qapi: add filter-node-name to block-stream

2020-08-24 Thread Markus Armbruster
Andrey Shinkevich  writes:

> Provide the possibility to pass the 'filter-node-name' parameter to the
> block-stream job as it is done for the commit block job. That will be
> needed for further iotests implementations.
>
> Signed-off-by: Andrey Shinkevich 
[...]
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 0b8ccd3..1db6ce1 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -2524,6 +2524,11 @@
>  #'stop' and 'enospc' can only be used if the block device
>  #supports io-status (see BlockInfo).  Since 1.3.
>  #
> +# @filter-node-name: the node name that should be assigned to the
> +#filter driver that the stream job inserts into the graph
> +#above @device. If this option is not given, a node name 
> is
> +#autogenerated. (Since: 5.1)

It's (Since: 5.2) now.

> +#
>  # @auto-finalize: When false, this job will wait in a PENDING state after it 
> has
>  # finished its work, waiting for @block-job-finalize before
>  # making any block graph changes.
> @@ -2554,6 +2559,7 @@
>'data': { '*job-id': 'str', 'device': 'str', '*base': 'str',
>  '*base-node': 'str', '*backing-file': 'str', '*speed': 'int',
>  '*on-error': 'BlockdevOnError',
> +'*filter-node-name': 'str',
>  '*auto-finalize': 'bool', '*auto-dismiss': 'bool' } }
>  
>  ##




Re: [RFC PATCH 5/9] block/curl: Allow the blocksize to be specified by the user

2020-08-24 Thread Markus Armbruster
David Edmondson  writes:

> Rather than a fixed 256kB blocksize, allow the user to specify the
> size used. It must be a non-zero power of two, defaulting to 256kB.

Nitpick: any power of two is non-zero.  Scratch "non-zero".

> Signed-off-by: David Edmondson 
> ---
[...]
> diff --git a/docs/system/device-url-syntax.rst.inc 
> b/docs/system/device-url-syntax.rst.inc
> index bc38b9df38..ee504ee41a 100644
> --- a/docs/system/device-url-syntax.rst.inc
> +++ b/docs/system/device-url-syntax.rst.inc
> @@ -194,6 +194,13 @@ These are specified using a special URL syntax.
>Add an offset, in bytes, to all range requests sent to the
>remote server.
>  
> +   ``blocksize``
> +  The size of all IO requests sent to the remote server. This
> +  value may optionally have the suffix 'T', 'G', 'M', 'K', 'k' or
> +  'b'. If it does not have a suffix, it will be assumed to be in
> +  bytes. The value must be a non-zero power of two.  It defaults
> +  to 256kB.
> +
> Note that when passing options to qemu explicitly, ``driver`` is the
> value of .
>  
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index d6f5e91cc3..cd16197e1e 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -3764,10 +3764,14 @@
>  # @proxy-password-secret: ID of a QCryptoSecret object providing a password
>  # for proxy authentication (defaults to no password)
>  #
> +# @blocksize: Size of all IO requests sent to the remote server; must
> +# be a non-zero power of two (defaults to 1 256kB)

Scratch "non-zero".

"(defaults to 1 256kB)" confuses me.  Do you mean "(defaults to 256kB)"?

Please add "(since 5.2)".

> +#
>  # Since: 2.9
>  ##
>  { 'struct': 'BlockdevOptionsCurlBase',
>'data': { 'url': 'str',
> +'*blocksize': 'int',

Should we use 'size' rather than 'int'?

>  '*timeout': 'int',
>  '*username': 'str',
>  '*password-secret': 'str',




Re: [PATCH v7 39/47] blockdev: Fix active commit choice

2020-08-24 Thread Max Reitz
On 21.08.20 17:50, Kevin Wolf wrote:
> Am 25.06.2020 um 17:22 hat Max Reitz geschrieben:
>> We have to perform an active commit whenever the top node has a parent
>> that has taken the WRITE permission on it.
>>
>> Signed-off-by: Max Reitz 
>> Reviewed-by: Vladimir Sementsov-Ogievskiy 
>> ---
>>  blockdev.c | 24 +---
>>  1 file changed, 21 insertions(+), 3 deletions(-)
>>
>> diff --git a/blockdev.c b/blockdev.c
>> index 402f1d1df1..237fffbe53 100644
>> --- a/blockdev.c
>> +++ b/blockdev.c
>> @@ -2589,6 +2589,7 @@ void qmp_block_commit(bool has_job_id, const char 
>> *job_id, const char *device,
>>  AioContext *aio_context;
>>  Error *local_err = NULL;
>>  int job_flags = JOB_DEFAULT;
>> +uint64_t top_perm, top_shared;
>>  
>>  if (!has_speed) {
>>  speed = 0;
>> @@ -2704,14 +2705,31 @@ void qmp_block_commit(bool has_job_id, const char 
>> *job_id, const char *device,
>>  goto out;
>>  }
>>  
>> -if (top_bs == bs) {
>> +/*
>> + * Active commit is required if and only if someone has taken a
>> + * WRITE permission on the top node.
> 
> ...or if someone wants to take a WRITE permission while the job is
> running.
> 
> Future intentions of the user is something that we can't know, so maybe
> this should become an option in the future (not in this series, of
> course).
> 
>>Historically, we have always
>> + * used active commit for top nodes, so continue that practice.
>> + * (Active commit is never really wrong.)
>> + */
> 
> Changing the practice would break compatibility with clients that start
> an active commit job and then attach it to a read-write device, so we
> must continue the practice. I think the comment should be clearer about
> this, it sounds more like "no reason, but why not".

I think that’s what I meant by “historically”.  Is “legacily” a word?

But sure, I can make it more explicit.

> This is even more problematic because the commit job doesn't unshare
> BLK_PERM_WRITE yet, so it would lead to silent corruption rather than an
> error.
> 
>> +bdrv_get_cumulative_perm(top_bs, _perm, _shared);
>> +if (top_perm & BLK_PERM_WRITE ||
>> +bdrv_skip_filters(top_bs) == bdrv_skip_filters(bs))
>> +{
>>  if (has_backing_file) {
>>  error_setg(errp, "'backing-file' specified,"
>>   " but 'top' is the active layer");
> 
> Hm, this error message isn't accurate any more.
> 
> In fact, the implementation isn't consistent with the QAPI documentation
> any more, because backing-file is only an error for the top level.

Hm.  I wanted to agree, and then I wanted to come up with a QAPI
documentation that fits the new behavior (because I think it makes more
sense to change the QAPI documentation along with the behavior change,
rather than to force us to allow backing-file for anything that isn’t on
the top layer).

But in the process of coming up with a better description, I noticed
that this doesn’t say “is a root node”, it says “is the active layer”.
I would say a node in the active layer is a node that has some parent
that has taken a WRITE permission on it.  So actually I think that the
documentation is right, and this code only now fits.

Though I do think this wants for some clarification.  Perhaps “If 'top'
is the active layer (i.e., is a node that may be written to), specifying
a backing [...]”?

There’s more wrong with the specification, namely the whole part under
@backing-file past the “(Since 2.1)”, starting with “If top == base”.  I
think all of that should go to the top level.  (And “If top == active”
should be changed to “If top is active (i.e., may be written to)”.)

Max



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v7 41/47] block: Leave BDS.backing_file constant

2020-08-24 Thread Kevin Wolf
Am 25.06.2020 um 17:22 hat Max Reitz geschrieben:
> Parts of the block layer treat BDS.backing_file as if it were whatever
> the image header says (i.e., if it is a relative path, it is relative to
> the overlay), other parts treat it like a cache for
> bs->backing->bs->filename (relative paths are relative to the CWD).
> Considering bs->backing->bs->filename exists, let us make it mean the
> former.
> 
> Among other things, this now allows the user to specify a base when
> using qemu-img to commit an image file in a directory that is not the
> CWD (assuming, everything uses relative filenames).
> 
> Before this patch:
> 
> $ ./qemu-img create -f qcow2 foo/bot.qcow2 1M
> $ ./qemu-img create -f qcow2 -b bot.qcow2 foo/mid.qcow2
> $ ./qemu-img create -f qcow2 -b mid.qcow2 foo/top.qcow2
> $ ./qemu-img commit -b mid.qcow2 foo/top.qcow2
> qemu-img: Did not find 'mid.qcow2' in the backing chain of 'foo/top.qcow2'
> $ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2
> qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'
> $ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2
> qemu-img: Did not find '[...]/foo/mid.qcow2' in the backing chain of 
> 'foo/top.qcow2'
> 
> After this patch:
> 
> $ ./qemu-img commit -b mid.qcow2 foo/top.qcow2
> Image committed.
> $ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2
> qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'
> $ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2
> Image committed.
> 
> With this change, bdrv_find_backing_image() must look at whether the
> user has overridden a BDS's backing file.  If so, it can no longer use
> bs->backing_file, but must instead compare the given filename against
> the backing node's filename directly.
> 
> Note that this changes the QAPI output for a node's backing_file.  We
> had very inconsistent output there (sometimes what the image header
> said, sometimes the actual filename of the backing image).  This
> inconsistent output was effectively useless, so we have to decide one
> way or the other.  Considering that bs->backing_file usually at runtime
> contained the path to the image relative to qemu's CWD (or absolute),
> this patch changes QAPI's backing_file to always report the
> bs->backing->bs->filename from now on.  If you want to receive the image
> header information, you have to refer to full-backing-filename.
> 
> This necessitates a change to iotest 228.  The interesting information
> it really wanted is the image header, and it can get that now, but it
> has to use full-backing-filename instead of backing_file.  Because of
> this patch's changes to bs->backing_file's behavior, we also need some
> reference output changes.
> 
> Along with the changes to bs->backing_file, stop updating
> BDS.backing_format in bdrv_backing_attach() as well.  In order not to
> change our externally visible behavior (incompatibly), we have to let
> bdrv_query_image_info() try to get the image format from bs->backing if
> bs->backing_format is unset.  (The QAPI schema describes
> backing-filename-format as "the format of the backing file", so it is
> not necessarily what the image header says, but just the format of the
> file referenced by backing-filename (if known).)

Why is it okay to change backing-filename incompatibly, but not
backing-filename-format? I would find it much more consistent if
ImageInfo reported the value from the header in both fields, and
BlockDeviceInfo reported the values actually in use.

The QAPI schema described ImageInfo as "Information about a QEMU image
file" and runtime state really isn't information about an image file.

If you want to know the probed image format, you can still look at
backing-image.format. I don't think this change is much different from
what you described above for BlockDeviceInfo.backing_file.

> iotest 245 changes in behavior: With the backing node no longer
> overriding the parent node's backing_file string, you can now omit the
> @backing option when reopening a node with neither a default nor a
> current backing file even if it used to have a backing node at some
> point.
> 
> Signed-off-by: Max Reitz 

Kevin




Re: [PATCH v7 37/47] qemu-img: Use child access functions

2020-08-24 Thread Max Reitz
On 21.08.20 17:29, Kevin Wolf wrote:
> Am 25.06.2020 um 17:22 hat Max Reitz geschrieben:
>> This changes iotest 204's output, because blkdebug on top of a COW node
>> used to make qemu-img map disregard the rest of the backing chain (the
>> backing chain was broken by the filter).  With this patch, the
>> allocation in the base image is reported correctly.
>>
>> Signed-off-by: Max Reitz 
> 
>> @@ -3437,6 +3441,7 @@ static int img_rebase(int argc, char **argv)
>>  uint8_t *buf_old = NULL;
>>  uint8_t *buf_new = NULL;
>>  BlockDriverState *bs = NULL, *prefix_chain_bs = NULL;
>> +BlockDriverState *unfiltered_bs;
>>  char *filename;
>>  const char *fmt, *cache, *src_cache, *out_basefmt, *out_baseimg;
>>  int c, flags, src_flags, ret;
>> @@ -3571,6 +3576,8 @@ static int img_rebase(int argc, char **argv)
>>  }
>>  bs = blk_bs(blk);
>>  
>> +unfiltered_bs = bdrv_skip_filters(bs);
>> +
>>  if (out_basefmt != NULL) {
>>  if (bdrv_find_format(out_basefmt) == NULL) {
>>  error_report("Invalid format name: '%s'", out_basefmt);
>> @@ -3582,7 +3589,7 @@ static int img_rebase(int argc, char **argv)
>>  /* For safe rebasing we need to compare old and new backing file */
>>  if (!unsafe) {
>>  QDict *options = NULL;
>> -BlockDriverState *base_bs = backing_bs(bs);
>> +BlockDriverState *base_bs = bdrv_cow_bs(unfiltered_bs);
>>  
>>  if (base_bs) {
>>  blk_old_backing = blk_new(qemu_get_aio_context(),
>> @@ -3738,8 +3745,9 @@ static int img_rebase(int argc, char **argv)
>>   * If cluster wasn't changed since prefix_chain, we don't 
>> need
>>   * to take action
>>   */
>> -ret = bdrv_is_allocated_above(backing_bs(bs), 
>> prefix_chain_bs,
>> -  false, offset, n, );
>> +ret = bdrv_is_allocated_above(bdrv_cow_bs(unfiltered_bs),
>> +  prefix_chain_bs, false,
>> +  offset, n, );
>>  if (ret < 0) {
>>  error_report("error while reading image metadata: %s",
>>   strerror(-ret));
> 
> img_rebase() has these additional calls:
> 
> /* If the cluster is allocated, we don't need to take action */
> ret = bdrv_is_allocated(bs, offset, n, );
> 
> And:
> 
> if (out_baseimg && *out_baseimg) {
> ret = bdrv_change_backing_file(bs, out_baseimg, out_basefmt);
> } else {
> ret = bdrv_change_backing_file(bs, NULL, NULL);
> }
> 
> Shouldn't they use unfiltered_bs?

Oh, yes, the second one definitely.

As for the first one, I don’t think there’s a difference.  But why not,
we really want to query unfiltered_bs, so it’s better to do so
explicitly than through the implicit fall-through behavior of block_status.

Max



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v7 4/4] block: apply COR-filter to block-stream jobs

2020-08-24 Thread Vladimir Sementsov-Ogievskiy

24.08.2020 11:31, Andrey Shinkevich wrote:

This patch completes the series with the COR-filter insertion for
block-stream operations. Adding the filter makes it possible for copied
regions to be discarded in backing files during the block-stream job,
what will reduce the disk overuse.
The COR-filter insertion incurs changes in the iotests case
245:test_block_stream_4 that reopens the backing chain during a
block-stream job. There are changes in the iotests #030 as well.
The iotests case 030:test_stream_parallel was deleted due to multiple
conflicts between the concurrent job operations over the same backing
chain. The base backing node for one job is the top node for another
job. It may change due to the filter node inserted into the backing
chain while both jobs are running. Another issue is that the parts of
the backing chain are being frozen by the running job and may not be
changed by the concurrent job when needed. The concept of the parallel
jobs with common nodes is considered vital no more.

Signed-off-by: Andrey Shinkevich 
---
  block/stream.c | 58 +++---
  tests/qemu-iotests/030 | 50 ---
  tests/qemu-iotests/030.out |  4 ++--
  tests/qemu-iotests/245 | 19 +++
  4 files changed, 65 insertions(+), 66 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index 8bf6b6d..e927fed 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -19,6 +19,7 @@
  #include "qapi/qmp/qerror.h"
  #include "qemu/ratelimit.h"
  #include "sysemu/block-backend.h"
+#include "block/copy-on-read.h"
  
  enum {

  /*
@@ -33,6 +34,8 @@ typedef struct StreamBlockJob {
  BlockJob common;
  BlockDriverState *base_overlay; /* COW overlay (stream from this) */
  BlockDriverState *above_base;   /* Node directly above the base */
+BlockDriverState *cor_filter_bs;
+BlockDriverState *target_bs;
  BlockdevOnError on_error;
  char *backing_file_str;
  bool bs_read_only;
@@ -53,22 +56,20 @@ static void stream_abort(Job *job)
  StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
  
  if (s->chain_frozen) {

-BlockJob *bjob = >common;
-bdrv_unfreeze_backing_chain(blk_bs(bjob->blk), s->above_base);
+bdrv_unfreeze_backing_chain(s->cor_filter_bs, s->above_base);
  }
  }
  
  static int stream_prepare(Job *job)

  {
  StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
-BlockJob *bjob = >common;
-BlockDriverState *bs = blk_bs(bjob->blk);
+BlockDriverState *bs = s->target_bs;
  BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs);


variable bs is used only here, we can directly use s->target_bs instead


  BlockDriverState *base = bdrv_filter_or_cow_bs(s->above_base);
  Error *local_err = NULL;
  int ret = 0;
  
-bdrv_unfreeze_backing_chain(bs, s->above_base);

+bdrv_unfreeze_backing_chain(s->cor_filter_bs, s->above_base);
  s->chain_frozen = false;
  
  if (bdrv_cow_child(unfiltered_bs)) {

@@ -94,7 +95,9 @@ static void stream_clean(Job *job)
  {
  StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
  BlockJob *bjob = >common;
-BlockDriverState *bs = blk_bs(bjob->blk);
+BlockDriverState *bs = s->target_bs;


Here as well, I'd drop extra local bs variable


+
+bdrv_cor_filter_drop(s->cor_filter_bs);
  
  /* Reopen the image back in read-only mode if necessary */

  if (s->bs_read_only) {
@@ -110,7 +113,7 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
  {
  StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
  BlockBackend *blk = s->common.blk;
-BlockDriverState *bs = blk_bs(blk);
+BlockDriverState *bs = s->target_bs;


Now we have both bdrv_enable_copy_on_read(bs) and cor-filter, which doesn't seem correct. 
Also, we'll need "base" parameter for cor filter to not copy-on-read extra 
chunks.


  BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs);
  bool enable_cor = !bdrv_cow_child(s->base_overlay);
  int64_t len;
@@ -231,6 +234,7 @@ void stream_start(const char *job_id, BlockDriverState *bs,
  int basic_flags = BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED;
  BlockDriverState *base_overlay = bdrv_find_overlay(bs, base);
  BlockDriverState *above_base;
+BlockDriverState *cor_filter_bs = NULL;
  
  if (!base_overlay) {

  error_setg(errp, "'%s' is not in the backing chain of '%s'",
@@ -264,17 +268,36 @@ void stream_start(const char *job_id, BlockDriverState 
*bs,
  }
  }
  
-/* Prevent concurrent jobs trying to modify the graph structure here, we

- * already have our own plans. Also don't allow resize as the image size is
- * queried only at the job start and then cached. */
-s = block_job_create(job_id, _job_driver, NULL, bs,
- basic_flags | BLK_PERM_GRAPH_MOD,
- 

Re: Suspicious QOM types without instance/class size

2020-08-24 Thread Cornelia Huck
On Fri, 21 Aug 2020 17:01:49 -0400
Eduardo Habkost  wrote:

> On Fri, Aug 21, 2020 at 11:43:35AM +0200, Cornelia Huck wrote:
> > On Thu, 20 Aug 2020 17:55:29 -0400
> > Eduardo Habkost  wrote:
> >   
> > > While trying to convert TypeInfo declarations to the new
> > > OBJECT_DECLARE* macros, I've stumbled on a few suspicious cases
> > > where instance_size or class_size is not set, despite having type
> > > checker macros that use a specific type.
> > > 
> > > The ones with "WARNING" are abstract types (maybe not serious if
> > > subclasses set the appropriate sizes).  The ones with "ERROR"
> > > don't seem to be abstract types.
> > > 
> > > ERROR: hw/s390x/virtio-ccw.c:1237:1: class_size should be set to 
> > > sizeof(VirtioCcwBusClass)?
> > > ERROR: hw/virtio/virtio-pci.c:2101:1: class_size should be set to 
> > > sizeof(VirtioPCIBusClass)?  
> > 
> > VirtioCcwBusClass and VirtioPCIBusClass are both simple typedefs of
> > VirtioBusClass (it's likely that I copied the ccw definition from the
> > pci one). virtio-mmio instead uses VirtioBusClass directly in its
> > checker macros.
> > 
> > I don't see a real reason for the typedefs, maybe ccw and pci should
> > use the mmio approach as well?  
> 
> I think it's OK to keep the typedefs if the code is consistent
> (i.e. we set instance_size and class_size just in case the
> typedefs are replaced by a real struct one day).

AFAIU, VirtioBusClass is providing functionality needed for all virtio
buses, and they should not need to add anything on top. We can however
try to make it safe, as it is only a line of code for both pci and ccw.

> 
> I'm not sure about the TYPE_VIRTIO_MMIO_BUS approach.  If the
> code just needs VirtioBusState or VirtioBusClass pointers, it can
> already use the VIRTIO_BUS* macros.

We could go ahead and ditch the bus-specific macros if there's no real
need for it. At least, I don't see a real need for *Class. OTOH, having
types and macros defined everywhere makes it more symmetric.

> 
> The OBJECT_DECLARE_TYPE macro Daniel sent expects each QOM type
> to have a separate struct being defined, which isn't true in many
> cases.  I'm considering removing the "typedef struct Foo Foo"
> lines from OBJECT_DECLARE_TYPE(), to make initial conversion
> easier.

Would be interesting to figure out the individual reasons why there's
no separate struct, just to make sure we're not operating from wrong
assumptions.




Re: [PATCH 2/3] block: add logging facility for long standing IO requests

2020-08-24 Thread Denis V. Lunev
On 8/20/20 11:03 AM, David Edmondson wrote:
> On Monday, 2020-08-10 at 13:14:46 +03, Denis V. Lunev wrote:
>
>> +strftime(buf, sizeof(buf), "%m-%d %H:%M:%S",
> "%F %T" would include the year, which can be useful.
ok

>
>> + localtime_r(_time_host_s, ));
>> +
>> +bs = blk_bs(blk_stats2blk(stats));
>> +qemu_log("long %s[%ld] IO request: %d.03%d since %s.%03d bs: %s(%s, 
>> %s)\n",
>> + block_account_type(cookie->type), cookie->bytes,
>> + (int)(latency_ms / 1000), (int)(latency_ms % 1000), buf,
>> + (int)((cookie->start_time_ns / 100) % 1000),
> Is there a reason not to use %u rather than casting?
no, we could use unsigned.

Will resend shortly.



Re: [PATCH v6 4/4] block: apply COR-filter to block-stream jobs

2020-08-24 Thread Andrey Shinkevich

On 24.08.2020 11:20, Vladimir Sementsov-Ogievskiy wrote:

23.08.2020 22:28, Andrey Shinkevich wrote:

On 19.08.2020 13:46, Vladimir Sementsov-Ogievskiy wrote:

19.08.2020 00:24, Andrey Shinkevich wrote:

The patch completes the series with the COR-filter insertion to any
block-stream operation. It also makes changes to the iotests 030.
The test case 'test_stream_parallel' was deleted due to multiple
errors.



...




Signed-off-by: Andrey Shinkevich 
---
  block/stream.c | 76 
--

  tests/qemu-iotests/030 | 50 +++---
  tests/qemu-iotests/030.out |  4 +--
  3 files changed, 61 insertions(+), 69 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index 8bf6b6d..0b11979 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -19,6 +19,7 @@
  #include "qapi/qmp/qerror.h"
  #include "qemu/ratelimit.h"
  #include "sysemu/block-backend.h"
+#include "block/copy-on-read.h"
    enum {
  /*
@@ -33,8 +34,11 @@ typedef struct StreamBlockJob {
  BlockJob common;
  BlockDriverState *base_overlay; /* COW overlay (stream from 
this) */
  BlockDriverState *above_base;   /* Node directly above the 
base */

+    BlockDriverState *cor_filter_bs;
+    BlockDriverState *target_bs;
  BlockdevOnError on_error;
  char *backing_file_str;
+    char *base_fmt;
  bool bs_read_only;
  bool chain_frozen;
  } StreamBlockJob;
@@ -53,34 +57,26 @@ static void stream_abort(Job *job)
  StreamBlockJob *s = container_of(job, StreamBlockJob, 
common.job);

    if (s->chain_frozen) {
-    BlockJob *bjob = >common;
-    bdrv_unfreeze_backing_chain(blk_bs(bjob->blk), 
s->above_base);

+    bdrv_unfreeze_backing_chain(s->cor_filter_bs, s->above_base);
  }
  }
    static int stream_prepare(Job *job)
  {
  StreamBlockJob *s = container_of(job, StreamBlockJob, 
common.job);

-    BlockJob *bjob = >common;
-    BlockDriverState *bs = blk_bs(bjob->blk);
+    BlockDriverState *bs = s->target_bs;
  BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs);
  BlockDriverState *base = bdrv_filter_or_cow_bs(s->above_base);
  Error *local_err = NULL;
  int ret = 0;
  -    bdrv_unfreeze_backing_chain(bs, s->above_base);
+    bdrv_unfreeze_backing_chain(s->cor_filter_bs, s->above_base);
  s->chain_frozen = false;
    if (bdrv_cow_child(unfiltered_bs)) {
-    const char *base_id = NULL, *base_fmt = NULL;
-    if (base) {
-    base_id = s->backing_file_str;
-    if (base->drv) {
-    base_fmt = base->drv->format_name;
-    }
-    }
  bdrv_set_backing_hd(unfiltered_bs, base, _err);
-    ret = bdrv_change_backing_file(unfiltered_bs, base_id, 
base_fmt);
+    ret = bdrv_change_backing_file(unfiltered_bs, 
s->backing_file_str,

+   s->base_fmt);
  if (local_err) {
  error_report_err(local_err);
  return -EPERM;
@@ -94,7 +90,9 @@ static void stream_clean(Job *job)
  {
  StreamBlockJob *s = container_of(job, StreamBlockJob, 
common.job);

  BlockJob *bjob = >common;
-    BlockDriverState *bs = blk_bs(bjob->blk);
+    BlockDriverState *bs = s->target_bs;
+
+    bdrv_cor_filter_drop(s->cor_filter_bs);
    /* Reopen the image back in read-only mode if necessary */
  if (s->bs_read_only) {
@@ -104,13 +102,14 @@ static void stream_clean(Job *job)
  }
    g_free(s->backing_file_str);
+    g_free(s->base_fmt);
  }
    static int coroutine_fn stream_run(Job *job, Error **errp)
  {
  StreamBlockJob *s = container_of(job, StreamBlockJob, 
common.job);

  BlockBackend *blk = s->common.blk;
-    BlockDriverState *bs = blk_bs(blk);
+    BlockDriverState *bs = s->target_bs;
  BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs);
  bool enable_cor = !bdrv_cow_child(s->base_overlay);
  int64_t len;
@@ -231,6 +230,12 @@ void stream_start(const char *job_id, 
BlockDriverState *bs,
  int basic_flags = BLK_PERM_CONSISTENT_READ | 
BLK_PERM_WRITE_UNCHANGED;

  BlockDriverState *base_overlay = bdrv_find_overlay(bs, base);
  BlockDriverState *above_base;
+    BlockDriverState *cor_filter_bs = NULL;
+    char *base_fmt = NULL;
+
+    if (base && base->drv) {
+    base_fmt = g_strdup(base->drv->format_name);
+    }
    if (!base_overlay) {
  error_setg(errp, "'%s' is not in the backing chain of '%s'",
@@ -264,17 +269,36 @@ void stream_start(const char *job_id, 
BlockDriverState *bs,

  }
  }
  -    /* Prevent concurrent jobs trying to modify the graph 
structure here, we
- * already have our own plans. Also don't allow resize as the 
image size is

- * queried only at the job start and then cached. */
-    s = block_job_create(job_id, _job_driver, NULL, bs,
- basic_flags | BLK_PERM_GRAPH_MOD,
- basic_flags | BLK_PERM_WRITE,
+    cor_filter_bs = 

[PATCH v2 4/7] tests/qtest/libqos/virtio-blk: add support for vhost-user-blk

2020-08-24 Thread Dima Stepanov
Add support for the vhost-user-blk-pci device. This node can be used by
the vhost-user-blk tests. Tests for the vhost-user-blk device are added
in the following patches.

Signed-off-by: Dima Stepanov 
---
 tests/qtest/libqos/virtio-blk.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/tests/qtest/libqos/virtio-blk.c b/tests/qtest/libqos/virtio-blk.c
index 5da0259..959c5dc 100644
--- a/tests/qtest/libqos/virtio-blk.c
+++ b/tests/qtest/libqos/virtio-blk.c
@@ -36,6 +36,9 @@ static void *qvirtio_blk_get_driver(QVirtioBlk *v_blk,
 if (!g_strcmp0(interface, "virtio")) {
 return v_blk->vdev;
 }
+if (!g_strcmp0(interface, "vhost-user-blk")) {
+return v_blk;
+}
 
 fprintf(stderr, "%s not present in virtio-blk-device\n", interface);
 g_assert_not_reached();
@@ -120,6 +123,17 @@ static void virtio_blk_register_nodes(void)
 qos_node_produces("virtio-blk-pci", "virtio-blk");
 
 g_free(arg);
+
+/* vhost-user-blk-pci */
+arg = g_strdup_printf("id=drv0,chardev=chdev0,addr=%x.%x",
+PCI_SLOT, PCI_FN);
+opts.extra_device_opts = arg;
+add_qpci_address(, );
+qos_node_create_driver("vhost-user-blk-pci", virtio_blk_pci_create);
+qos_node_consumes("vhost-user-blk-pci", "pci-bus", );
+qos_node_produces("vhost-user-blk-pci", "vhost-user-blk");
+
+g_free(arg);
 }
 
 libqos_init(virtio_blk_register_nodes);
-- 
2.7.4




[PATCH v2 7/7] tests/qtest/vhost-user-test: enable the reconnect tests

2020-08-24 Thread Dima Stepanov
For now a QTEST_VHOST_USER_FIXME environment variable is used to
separate reconnect tests for the vhost-user-net device. Looks like the
reconnect functionality is pretty stable, so this separation is
deprecated.
Remove it and enable these tests for the default run.

Signed-off-by: Dima Stepanov 
---
 tests/qtest/vhost-user-test.c | 25 +++--
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/tests/qtest/vhost-user-test.c b/tests/qtest/vhost-user-test.c
index 6b4c147..5fbfa02 100644
--- a/tests/qtest/vhost-user-test.c
+++ b/tests/qtest/vhost-user-test.c
@@ -1141,20 +1141,17 @@ static void register_vhost_user_test(void)
  "virtio-net",
  test_migrate, );
 
-/* keeps failing on build-system since Aug 15 2017 */
-if (getenv("QTEST_VHOST_USER_FIXME")) {
-opts.before = vhost_user_test_setup_reconnect;
-qos_add_test("vhost-user/reconnect", "virtio-net",
- test_reconnect, );
-
-opts.before = vhost_user_test_setup_connect_fail;
-qos_add_test("vhost-user/connect-fail", "virtio-net",
- test_vhost_user_started, );
-
-opts.before = vhost_user_test_setup_flags_mismatch;
-qos_add_test("vhost-user/flags-mismatch", "virtio-net",
- test_vhost_user_started, );
-}
+opts.before = vhost_user_test_setup_reconnect;
+qos_add_test("vhost-user/reconnect", "virtio-net",
+ test_reconnect, );
+
+opts.before = vhost_user_test_setup_connect_fail;
+qos_add_test("vhost-user/connect-fail", "virtio-net",
+ test_vhost_user_started, );
+
+opts.before = vhost_user_test_setup_flags_mismatch;
+qos_add_test("vhost-user/flags-mismatch", "virtio-net",
+ test_vhost_user_started, );
 
 opts.before = vhost_user_test_setup_multiqueue;
 opts.edge.extra_device_opts = "mq=on";
-- 
2.7.4




[PATCH v2 6/7] tests/qtest/vhost-user-test: add migrate_reconnect test

2020-08-24 Thread Dima Stepanov
Add new migrate_reconnect test for the vhost-user-blk device. Perform a
disconnect after sending response for the VHOST_USER_SET_LOG_BASE
command.

Signed-off-by: Dima Stepanov 
---
 tests/qtest/vhost-user-test.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/tests/qtest/vhost-user-test.c b/tests/qtest/vhost-user-test.c
index 9b6e202..6b4c147 100644
--- a/tests/qtest/vhost-user-test.c
+++ b/tests/qtest/vhost-user-test.c
@@ -146,6 +146,7 @@ static VhostUserMsg m __attribute__ ((unused));
 enum {
 TEST_FLAGS_OK,
 TEST_FLAGS_DISCONNECT,
+TEST_FLAGS_MIGRATE_DISCONNECT,
 TEST_FLAGS_BAD,
 TEST_FLAGS_END,
 };
@@ -436,6 +437,15 @@ static void chr_read(void *opaque, const uint8_t *buf, int 
size)
 qemu_chr_fe_write_all(chr, p, VHOST_USER_HDR_SIZE);
 
 g_cond_broadcast(>data_cond);
+/*
+ * Perform disconnect after sending a response. In this
+ * case the next write command on the QEMU side (for now
+ * it is SET_FEATURES will return -1, because of disconnect.
+ */
+if (s->test_flags == TEST_FLAGS_MIGRATE_DISCONNECT) {
+qemu_chr_fe_disconnect(chr);
+s->test_flags = TEST_FLAGS_BAD;
+}
 break;
 
 case VHOST_USER_SET_VRING_BASE:
@@ -737,6 +747,17 @@ static void *vhost_user_test_setup_memfd(GString 
*cmd_line, void *arg)
 return server;
 }
 
+static void *vhost_user_test_setup_migrate_reconnect(GString *cmd_line,
+void *arg)
+{
+TestServer *server;
+
+server = vhost_user_test_setup_memfd(cmd_line, arg);
+server->test_flags = TEST_FLAGS_MIGRATE_DISCONNECT;
+
+return server;
+}
+
 static void test_read_guest_mem(void *obj, void *arg, QGuestAllocator *alloc)
 {
 TestServer *server = arg;
@@ -1151,5 +1172,9 @@ static void register_vhost_user_test(void)
 opts.before = vhost_user_test_setup_memfd;
 qos_add_test("migrate", "vhost-user-blk",
  test_migrate, );
+
+opts.before = vhost_user_test_setup_migrate_reconnect;
+qos_add_test("migrate_reconnect", "vhost-user-blk",
+ test_migrate, );
 }
 libqos_init(register_vhost_user_test);
-- 
2.7.4




[PATCH v2 2/7] vhost: check queue state in the vhost_dev_set_log routine

2020-08-24 Thread Dima Stepanov
If the vhost-user-blk daemon provides only one virtqueue, but device was
added with several queues, then QEMU will send more VHOST-USER command
than expected by daemon side. The vhost_virtqueue_start() routine
handles such case by checking the return value from the
virtio_queue_get_desc_addr() function call. Add the same check to the
vhost_dev_set_log() routine.

Signed-off-by: Dima Stepanov 
---
 hw/virtio/vhost.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index ffef7ab..a33ffd4 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -825,12 +825,24 @@ static int vhost_dev_set_features(struct vhost_dev *dev,
 static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
 {
 int r, i, idx;
+hwaddr addr;
+
 r = vhost_dev_set_features(dev, enable_log);
 if (r < 0) {
 goto err_features;
 }
 for (i = 0; i < dev->nvqs; ++i) {
 idx = dev->vhost_ops->vhost_get_vq_index(dev, dev->vq_index + i);
+addr = virtio_queue_get_desc_addr(dev->vdev, idx);
+if (!addr) {
+/*
+ * The queue might not be ready for start. If this
+ * is the case there is no reason to continue the process.
+ * The similar logic is used by the vhost_virtqueue_start()
+ * routine.
+ */
+break;
+}
 r = vhost_virtqueue_set_addr(dev, dev->vqs + i, idx,
  enable_log);
 if (r < 0) {
-- 
2.7.4




[PATCH v2 3/7] tests/qtest/vhost-user-test: prepare the tests for adding new dev class

2020-08-24 Thread Dima Stepanov
For now only vhost-user-net device is supported by the test. Other
vhost-user devices are not tested. As a first step make source code
refactoring so new devices can reuse the same test routines. To make
this provide a new vhost_user_ops structure with the methods to
initialize device, its command line or make a proper vhost-user
responses.

Signed-off-by: Dima Stepanov 
---
 tests/qtest/vhost-user-test.c | 105 ++
 1 file changed, 76 insertions(+), 29 deletions(-)

diff --git a/tests/qtest/vhost-user-test.c b/tests/qtest/vhost-user-test.c
index 9ee0f1e..3df5322 100644
--- a/tests/qtest/vhost-user-test.c
+++ b/tests/qtest/vhost-user-test.c
@@ -135,6 +135,10 @@ enum {
 TEST_FLAGS_END,
 };
 
+enum {
+VHOST_USER_NET,
+};
+
 typedef struct TestServer {
 gchar *socket_path;
 gchar *mig_path;
@@ -154,10 +158,25 @@ typedef struct TestServer {
 bool test_fail;
 int test_flags;
 int queues;
+struct vhost_user_ops *vu_ops;
 } TestServer;
 
+struct vhost_user_ops {
+/* Device types. */
+int type;
+void (*append_opts)(TestServer *s, GString *cmd_line,
+const char *chr_opts);
+
+/* VHOST-USER commands. */
+void (*set_features)(TestServer *s, CharBackend *chr,
+VhostUserMsg *msg);
+void (*get_protocol_features)(TestServer *s,
+CharBackend *chr, VhostUserMsg *msg);
+};
+
 static const char *init_hugepagefs(void);
-static TestServer *test_server_new(const gchar *name);
+static TestServer *test_server_new(const gchar *name,
+struct vhost_user_ops *ops);
 static void test_server_free(TestServer *server);
 static void test_server_listen(TestServer *server);
 
@@ -167,7 +186,7 @@ enum test_memfd {
 TEST_MEMFD_NO,
 };
 
-static void append_vhost_opts(TestServer *s, GString *cmd_line,
+static void append_vhost_net_opts(TestServer *s, GString *cmd_line,
  const char *chr_opts)
 {
 g_string_append_printf(cmd_line, QEMU_CMD_CHR QEMU_CMD_NETDEV,
@@ -332,25 +351,15 @@ static void chr_read(void *opaque, const uint8_t *buf, 
int size)
 break;
 
 case VHOST_USER_SET_FEATURES:
-g_assert_cmpint(msg.payload.u64 & (0x1ULL << 
VHOST_USER_F_PROTOCOL_FEATURES),
-!=, 0ULL);
-if (s->test_flags == TEST_FLAGS_DISCONNECT) {
-qemu_chr_fe_disconnect(chr);
-s->test_flags = TEST_FLAGS_BAD;
+if (s->vu_ops->set_features) {
+s->vu_ops->set_features(s, chr, );
 }
 break;
 
 case VHOST_USER_GET_PROTOCOL_FEATURES:
-/* send back features to qemu */
-msg.flags |= VHOST_USER_REPLY_MASK;
-msg.size = sizeof(m.payload.u64);
-msg.payload.u64 = 1 << VHOST_USER_PROTOCOL_F_LOG_SHMFD;
-msg.payload.u64 |= 1 << VHOST_USER_PROTOCOL_F_CROSS_ENDIAN;
-if (s->queues > 1) {
-msg.payload.u64 |= 1 << VHOST_USER_PROTOCOL_F_MQ;
+if (s->vu_ops->get_protocol_features) {
+s->vu_ops->get_protocol_features(s, chr, );
 }
-p = (uint8_t *) 
-qemu_chr_fe_write_all(chr, p, VHOST_USER_HDR_SIZE + msg.size);
 break;
 
 case VHOST_USER_GET_VRING_BASE:
@@ -467,7 +476,8 @@ static const char *init_hugepagefs(void)
 #endif
 }
 
-static TestServer *test_server_new(const gchar *name)
+static TestServer *test_server_new(const gchar *name,
+struct vhost_user_ops *ops)
 {
 TestServer *server = g_new0(TestServer, 1);
 char template[] = "/tmp/vhost-test-XX";
@@ -495,6 +505,7 @@ static TestServer *test_server_new(const gchar *name)
 
 server->log_fd = -1;
 server->queues = 1;
+server->vu_ops = ops;
 
 return server;
 }
@@ -669,11 +680,11 @@ static void vhost_user_test_cleanup(void *s)
 
 static void *vhost_user_test_setup(GString *cmd_line, void *arg)
 {
-TestServer *server = test_server_new("vhost-user-test");
+TestServer *server = test_server_new("vhost-user-test", arg);
 test_server_listen(server);
 
 append_mem_opts(server, cmd_line, 256, TEST_MEMFD_AUTO);
-append_vhost_opts(server, cmd_line, "");
+server->vu_ops->append_opts(server, cmd_line, "");
 
 g_test_queue_destroy(vhost_user_test_cleanup, server);
 
@@ -682,11 +693,11 @@ static void *vhost_user_test_setup(GString *cmd_line, 
void *arg)
 
 static void *vhost_user_test_setup_memfd(GString *cmd_line, void *arg)
 {
-TestServer *server = test_server_new("vhost-user-test");
+TestServer *server = test_server_new("vhost-user-test", arg);
 test_server_listen(server);
 
 append_mem_opts(server, cmd_line, 256, TEST_MEMFD_YES);
-append_vhost_opts(server, cmd_line, "");
+server->vu_ops->append_opts(server, cmd_line, "");
 
 g_test_queue_destroy(vhost_user_test_cleanup, server);
 
@@ -720,7 +731,7 @@ static void test_migrate(void *obj, void *arg, 
QGuestAllocator *alloc)
 return;
 }
 
-dest = test_server_new("dest");
+dest = 

[PATCH v2 1/7] vhost: recheck dev state in the vhost_migration_log routine

2020-08-24 Thread Dima Stepanov
vhost-user devices can get a disconnect in the middle of the VHOST-USER
handshake on the migration start. If disconnect event happened right
before sending next VHOST-USER command, then the vhost_dev_set_log()
call in the vhost_migration_log() function will return error. This error
will lead to the assert() and close the QEMU migration source process.
For the vhost-user devices the disconnect event should not break the
migration process, because:
  - the device will be in the stopped state, so it will not be changed
during migration
  - if reconnect will be made the migration log will be reinitialized as
part of reconnect/init process:
#0  vhost_log_global_start (listener=0x563989cf7be0)
at hw/virtio/vhost.c:920
#1  0x56398603d8bc in listener_add_address_space 
(listener=0x563989cf7be0,
as=0x563986ea4340 )
at softmmu/memory.c:2664
#2  0x56398603dd30 in memory_listener_register (listener=0x563989cf7be0,
as=0x563986ea4340 )
at softmmu/memory.c:2740
#3  0x563985fd6956 in vhost_dev_init (hdev=0x563989cf7bd8,
opaque=0x563989cf7e30, backend_type=VHOST_BACKEND_TYPE_USER,
busyloop_timeout=0)
at hw/virtio/vhost.c:1385
#4  0x563985f7d0b8 in vhost_user_blk_connect (dev=0x563989cf7990)
at hw/block/vhost-user-blk.c:315
#5  0x563985f7d3f6 in vhost_user_blk_event (opaque=0x563989cf7990,
event=CHR_EVENT_OPENED)
at hw/block/vhost-user-blk.c:379
Update the vhost-user-blk device with the internal started field which
will be used for initialization and clean up. The disconnect event will
set the overall VHOST device to the stopped state, so it can be used by
the vhost_migration_log routine.
Such approach could be propogated to the other vhost-user devices, but
better idea is just to make the same connect/disconnect code for all the
vhost-user devices.

This migration issue was slightly discussed earlier:
  - https://lists.gnu.org/archive/html/qemu-devel/2020-05/msg01509.html
  - https://lists.gnu.org/archive/html/qemu-devel/2020-05/msg05241.html

Signed-off-by: Dima Stepanov 
---
 hw/block/vhost-user-blk.c  | 19 ---
 hw/virtio/vhost.c  | 27 ---
 include/hw/virtio/vhost-user-blk.h | 10 ++
 3 files changed, 50 insertions(+), 6 deletions(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index a00b854..5573e89 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -150,6 +150,7 @@ static int vhost_user_blk_start(VirtIODevice *vdev)
 error_report("Error starting vhost: %d", -ret);
 goto err_guest_notifiers;
 }
+s->started = true;
 
 /* guest_notifier_mask/pending not used yet, so just unmask
  * everything here. virtio-pci will do the right thing by
@@ -175,6 +176,11 @@ static void vhost_user_blk_stop(VirtIODevice *vdev)
 VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
 int ret;
 
+if (!s->started) {
+return;
+}
+s->started = false;
+
 if (!k->set_guest_notifiers) {
 return;
 }
@@ -341,9 +347,7 @@ static void vhost_user_blk_disconnect(DeviceState *dev)
 }
 s->connected = false;
 
-if (s->dev.started) {
-vhost_user_blk_stop(vdev);
-}
+vhost_user_blk_stop(vdev);
 
 vhost_dev_cleanup(>dev);
 }
@@ -399,6 +403,15 @@ static void vhost_user_blk_event(void *opaque, 
QEMUChrEvent event)
 NULL, NULL, false);
 aio_bh_schedule_oneshot(ctx, vhost_user_blk_chr_closed_bh, opaque);
 }
+
+/*
+ * Move vhost device to the stopped state. The vhost-user device
+ * will be clean up and disconnected in BH. This can be useful in
+ * the vhost migration code. If disconnect was caught there is an
+ * option for the general vhost code to get the dev state without
+ * knowing its type (in this case vhost-user).
+ */
+s->dev.started = false;
 break;
 case CHR_EVENT_BREAK:
 case CHR_EVENT_MUX_IN:
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 1a1384e..ffef7ab 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -861,21 +861,42 @@ static int vhost_migration_log(MemoryListener *listener, 
bool enable)
 dev->log_enabled = enable;
 return 0;
 }
+
+r = 0;
 if (!enable) {
 r = vhost_dev_set_log(dev, false);
 if (r < 0) {
-return r;
+goto check_dev_state;
 }
 vhost_log_put(dev, false);
 } else {
 vhost_dev_log_resize(dev, vhost_get_log_size(dev));
 r = vhost_dev_set_log(dev, true);
 if (r < 0) {
-return r;
+goto check_dev_state;
 }
 }
+
+check_dev_state:
 dev->log_enabled = enable;
-return 0;
+/*
+ * vhost-user-* devices could change their state during log
+ * initialization due to disconnect. So check dev state after
+ * vhost 

[PATCH v2 5/7] tests/qtest/vhost-user-test: add support for the vhost-user-blk device

2020-08-24 Thread Dima Stepanov
Add vhost_user_ops structure for the vhost-user-blk device class. Add
the test_reconnect and test_migrate tests for this device.

Signed-off-by: Dima Stepanov 
---
 tests/qtest/vhost-user-test.c | 140 +-
 1 file changed, 138 insertions(+), 2 deletions(-)

diff --git a/tests/qtest/vhost-user-test.c b/tests/qtest/vhost-user-test.c
index 3df5322..9b6e202 100644
--- a/tests/qtest/vhost-user-test.c
+++ b/tests/qtest/vhost-user-test.c
@@ -24,6 +24,7 @@
 #include "libqos/libqos.h"
 #include "libqos/pci-pc.h"
 #include "libqos/virtio-pci.h"
+#include "libqos/virtio-blk.h"
 
 #include "libqos/malloc-pc.h"
 #include "hw/virtio/virtio-net.h"
@@ -31,6 +32,7 @@
 #include "standard-headers/linux/vhost_types.h"
 #include "standard-headers/linux/virtio_ids.h"
 #include "standard-headers/linux/virtio_net.h"
+#include "standard-headers/linux/virtio_blk.h"
 
 #ifdef CONFIG_LINUX
 #include 
@@ -43,6 +45,7 @@
 " -numa node,memdev=mem"
 #define QEMU_CMD_CHR" -chardev socket,id=%s,path=%s%s"
 #define QEMU_CMD_NETDEV " -netdev vhost-user,id=hs0,chardev=%s,vhostforce"
+#define QEMU_CMD_BLKCHR " -chardev socket,id=chdev0,path=%s%s"
 
 #define HUGETLBFS_MAGIC   0x958458f6
 
@@ -55,6 +58,7 @@
 #define VHOST_USER_PROTOCOL_F_MQ 0
 #define VHOST_USER_PROTOCOL_F_LOG_SHMFD 1
 #define VHOST_USER_PROTOCOL_F_CROSS_ENDIAN   6
+#define VHOST_USER_PROTOCOL_F_CONFIG 9
 
 #define VHOST_LOG_PAGE 0x1000
 
@@ -78,6 +82,8 @@ typedef enum VhostUserRequest {
 VHOST_USER_SET_PROTOCOL_FEATURES = 16,
 VHOST_USER_GET_QUEUE_NUM = 17,
 VHOST_USER_SET_VRING_ENABLE = 18,
+VHOST_USER_GET_CONFIG = 24,
+VHOST_USER_SET_CONFIG = 25,
 VHOST_USER_MAX
 } VhostUserRequest;
 
@@ -99,6 +105,14 @@ typedef struct VhostUserLog {
 uint64_t mmap_offset;
 } VhostUserLog;
 
+#define VHOST_USER_MAX_CONFIG_SIZE 256
+typedef struct VhostUserConfig {
+uint32_t offset;
+uint32_t size;
+uint32_t flags;
+uint8_t region[VHOST_USER_MAX_CONFIG_SIZE];
+} VhostUserConfig;
+
 typedef struct VhostUserMsg {
 VhostUserRequest request;
 
@@ -114,6 +128,7 @@ typedef struct VhostUserMsg {
 struct vhost_vring_addr addr;
 VhostUserMemory memory;
 VhostUserLog log;
+VhostUserConfig config;
 } payload;
 } QEMU_PACKED VhostUserMsg;
 
@@ -137,6 +152,7 @@ enum {
 
 enum {
 VHOST_USER_NET,
+VHOST_USER_BLK,
 };
 
 typedef struct TestServer {
@@ -166,12 +182,15 @@ struct vhost_user_ops {
 int type;
 void (*append_opts)(TestServer *s, GString *cmd_line,
 const char *chr_opts);
+void (*driver_init)(void *obj, QGuestAllocator *alloc);
 
 /* VHOST-USER commands. */
 void (*set_features)(TestServer *s, CharBackend *chr,
 VhostUserMsg *msg);
 void (*get_protocol_features)(TestServer *s,
 CharBackend *chr, VhostUserMsg *msg);
+void (*get_config)(TestServer *s, CharBackend *chr,
+VhostUserMsg *msg);
 };
 
 static const char *init_hugepagefs(void);
@@ -194,6 +213,14 @@ static void append_vhost_net_opts(TestServer *s, GString 
*cmd_line,
chr_opts, s->chr_name);
 }
 
+static void append_vhost_blk_opts(TestServer *s, GString *cmd_line,
+ const char *chr_opts)
+{
+g_string_append_printf(cmd_line, QEMU_CMD_BLKCHR,
+   s->socket_path,
+   chr_opts);
+}
+
 static void append_mem_opts(TestServer *server, GString *cmd_line,
 int size, enum test_memfd memfd)
 {
@@ -425,6 +452,12 @@ static void chr_read(void *opaque, const uint8_t *buf, int 
size)
 qemu_chr_fe_write_all(chr, p, VHOST_USER_HDR_SIZE + msg.size);
 break;
 
+case VHOST_USER_GET_CONFIG:
+if (s->vu_ops->get_config) {
+s->vu_ops->get_config(s, chr, );
+}
+break;
+
 default:
 break;
 }
@@ -727,6 +760,9 @@ static void test_migrate(void *obj, void *arg, 
QGuestAllocator *alloc)
 guint8 *log;
 guint64 size;
 
+if (s->vu_ops->driver_init) {
+s->vu_ops->driver_init(obj, alloc);
+}
 if (!wait_for_fds(s)) {
 return;
 }
@@ -796,6 +832,24 @@ static void test_migrate(void *obj, void *arg, 
QGuestAllocator *alloc)
 g_string_free(dest_cmdline, true);
 }
 
+static void vu_blk_driver_init(void *obj, QGuestAllocator *alloc)
+{
+QVirtioBlk *blk_if;
+QVirtioDevice *dev;
+QVirtQueue *vq;
+uint64_t features;
+
+blk_if = obj;
+dev = blk_if->vdev;
+features = qvirtio_get_features(dev);
+qvirtio_set_features(dev, features);
+
+vq = qvirtqueue_setup(dev, alloc, 0);
+g_assert(vq);
+
+qvirtio_set_driver_ok(dev);
+}
+
 static void wait_for_rings_started(TestServer *s, size_t count)
 {
 gint64 end_time;
@@ -857,12 +911,22 @@ static void test_reconnect(void *obj, void *arg, 
QGuestAllocator *alloc)
 {
 TestServer *s = arg;
 GSource *src;
+int nq;
 
+   

[PATCH v2 0/7] vhost-user-blk: fix the migration issue and enhance qtests

2020-08-24 Thread Dima Stepanov
v1 -> v2:
  - add comments to connected/started fields in the header file
  - move the "s->started" logic from the vhost_user_blk_disconnect
routine to the vhost_user_blk_stop routine

Reference e-mail threads:
  - https://lists.gnu.org/archive/html/qemu-devel/2020-05/msg01509.html
  - https://lists.gnu.org/archive/html/qemu-devel/2020-05/msg05241.html

If vhost-user daemon is used as a backend for the vhost device, then we
should consider a possibility of disconnect at any moment. There was a general
question here: should we consider it as an error or okay state for the 
vhost-user
devices during migration process?
I think the disconnect event for the vhost-user devices should not break the
migration process, because:
  - the device will be in the stopped state, so it will not be changed
during migration
  - if reconnect will be made the migration log will be reinitialized as
part of reconnect/init process:
#0  vhost_log_global_start (listener=0x563989cf7be0)
at hw/virtio/vhost.c:920
#1  0x56398603d8bc in listener_add_address_space 
(listener=0x563989cf7be0,
as=0x563986ea4340 )
at softmmu/memory.c:2664
#2  0x56398603dd30 in memory_listener_register (listener=0x563989cf7be0,
as=0x563986ea4340 )
at softmmu/memory.c:2740
#3  0x563985fd6956 in vhost_dev_init (hdev=0x563989cf7bd8,
opaque=0x563989cf7e30, backend_type=VHOST_BACKEND_TYPE_USER,
busyloop_timeout=0)
at hw/virtio/vhost.c:1385
#4  0x563985f7d0b8 in vhost_user_blk_connect (dev=0x563989cf7990)
at hw/block/vhost-user-blk.c:315
#5  0x563985f7d3f6 in vhost_user_blk_event (opaque=0x563989cf7990,
event=CHR_EVENT_OPENED)
at hw/block/vhost-user-blk.c:379
The first patch in the patchset fixes this issue by setting vhost device to the
stopped state in the disconnect handler and check it the vhost_migration_log()
routine before returning from the function.
qtest framework was updated to test vhost-user-blk functionality. The
vhost-user-blk/vhost-user-blk-tests/migrate_reconnect test was added to 
reproduce
the original issue found.

Dima Stepanov (7):
  vhost: recheck dev state in the vhost_migration_log routine
  vhost: check queue state in the vhost_dev_set_log routine
  tests/qtest/vhost-user-test: prepare the tests for adding new dev
class
  tests/qtest/libqos/virtio-blk: add support for vhost-user-blk
  tests/qtest/vhost-user-test: add support for the vhost-user-blk device
  tests/qtest/vhost-user-test: add migrate_reconnect test
  tests/qtest/vhost-user-test: enable the reconnect tests

 hw/block/vhost-user-blk.c  |  19 ++-
 hw/virtio/vhost.c  |  39 -
 include/hw/virtio/vhost-user-blk.h |  10 ++
 tests/qtest/libqos/virtio-blk.c|  14 ++
 tests/qtest/vhost-user-test.c  | 291 +++--
 5 files changed, 324 insertions(+), 49 deletions(-)

-- 
2.7.4




[PATCH v7 2/4] copy-on-read: add filter append/drop functions

2020-08-24 Thread Andrey Shinkevich
Provide API for the COR-filter insertion/removal.
Also, drop the filter child permissions for an inactive state when the
filter node is being removed.

Signed-off-by: Andrey Shinkevich 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/copy-on-read.c | 104 +++
 block/copy-on-read.h |  35 +
 2 files changed, 139 insertions(+)
 create mode 100644 block/copy-on-read.h

diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index cb03e0f..0ede7aa 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -23,11 +23,21 @@
 #include "qemu/osdep.h"
 #include "block/block_int.h"
 #include "qemu/module.h"
+#include "qapi/error.h"
+#include "qapi/qmp/qdict.h"
+#include "block/copy-on-read.h"
+
+
+typedef struct BDRVStateCOR {
+bool active;
+} BDRVStateCOR;
 
 
 static int cor_open(BlockDriverState *bs, QDict *options, int flags,
 Error **errp)
 {
+BDRVStateCOR *state = bs->opaque;
+
 bs->file = bdrv_open_child(NULL, options, "file", bs, _of_bds,
BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
false, errp);
@@ -42,6 +52,13 @@ static int cor_open(BlockDriverState *bs, QDict *options, 
int flags,
 ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
 bs->file->bs->supported_zero_flags);
 
+state->active = true;
+
+/*
+ * We don't need to call bdrv_child_refresh_perms() now as the permissions
+ * will be updated later when the filter node gets its parent.
+ */
+
 return 0;
 }
 
@@ -57,6 +74,17 @@ static void cor_child_perm(BlockDriverState *bs, BdrvChild 
*c,
uint64_t perm, uint64_t shared,
uint64_t *nperm, uint64_t *nshared)
 {
+BDRVStateCOR *s = bs->opaque;
+
+if (!s->active) {
+/*
+ * While the filter is being removed
+ */
+*nperm = 0;
+*nshared = BLK_PERM_ALL;
+return;
+}
+
 *nperm = perm & PERM_PASSTHROUGH;
 *nshared = (shared & PERM_PASSTHROUGH) | PERM_UNCHANGED;
 
@@ -135,6 +163,7 @@ static void cor_lock_medium(BlockDriverState *bs, bool 
locked)
 
 static BlockDriver bdrv_copy_on_read = {
 .format_name= "copy-on-read",
+.instance_size  = sizeof(BDRVStateCOR),
 
 .bdrv_open  = cor_open,
 .bdrv_child_perm= cor_child_perm,
@@ -159,4 +188,79 @@ static void bdrv_copy_on_read_init(void)
 bdrv_register(_copy_on_read);
 }
 
+
+static BlockDriverState *create_filter_node(BlockDriverState *bs,
+const char *filter_node_name,
+Error **errp)
+{
+QDict *opts = qdict_new();
+
+qdict_put_str(opts, "driver", "copy-on-read");
+qdict_put_str(opts, "file", bdrv_get_node_name(bs));
+if (filter_node_name) {
+qdict_put_str(opts, "node-name", filter_node_name);
+}
+
+return bdrv_open(NULL, NULL, opts, BDRV_O_RDWR, errp);
+}
+
+
+BlockDriverState *bdrv_cor_filter_append(BlockDriverState *bs,
+ const char *filter_node_name,
+ Error **errp)
+{
+BlockDriverState *cor_filter_bs;
+Error *local_err = NULL;
+
+cor_filter_bs = create_filter_node(bs, filter_node_name, errp);
+if (cor_filter_bs == NULL) {
+error_prepend(errp, "Could not create filter node: ");
+return NULL;
+}
+
+if (!filter_node_name) {
+cor_filter_bs->implicit = true;
+}
+
+bdrv_drained_begin(bs);
+bdrv_replace_node(bs, cor_filter_bs, _err);
+bdrv_drained_end(bs);
+
+if (local_err) {
+bdrv_unref(cor_filter_bs);
+error_propagate(errp, local_err);
+return NULL;
+}
+
+return cor_filter_bs;
+}
+
+
+void bdrv_cor_filter_drop(BlockDriverState *cor_filter_bs)
+{
+BdrvChild *child;
+BlockDriverState *bs;
+BDRVStateCOR *s = cor_filter_bs->opaque;
+
+child = bdrv_filter_child(cor_filter_bs);
+if (!child) {
+return;
+}
+bs = child->bs;
+
+/* Retain the BDS until we complete the graph change. */
+bdrv_ref(bs);
+/* Hold a guest back from writing while permissions are being reset. */
+bdrv_drained_begin(bs);
+/* Drop permissions before the graph change. */
+s->active = false;
+bdrv_child_refresh_perms(cor_filter_bs, child, _abort);
+bdrv_replace_node(cor_filter_bs, bs, _abort);
+
+bdrv_drained_end(bs);
+bdrv_unref(bs);
+bdrv_unref(cor_filter_bs);
+}
+
+
 block_init(bdrv_copy_on_read_init);
diff --git a/block/copy-on-read.h b/block/copy-on-read.h
new file mode 100644
index 000..1686e4e
--- /dev/null
+++ b/block/copy-on-read.h
@@ -0,0 +1,35 @@
+/*
+ * Copy-on-read filter block driver
+ *
+ * The filter driver performs Copy-On-Read (COR) operations
+ *
+ * Copyright (c) 

[PATCH v7 4/4] block: apply COR-filter to block-stream jobs

2020-08-24 Thread Andrey Shinkevich
This patch completes the series with the COR-filter insertion for
block-stream operations. Adding the filter makes it possible for copied
regions to be discarded in backing files during the block-stream job,
what will reduce the disk overuse.
The COR-filter insertion incurs changes in the iotests case
245:test_block_stream_4 that reopens the backing chain during a
block-stream job. There are changes in the iotests #030 as well.
The iotests case 030:test_stream_parallel was deleted due to multiple
conflicts between the concurrent job operations over the same backing
chain. The base backing node for one job is the top node for another
job. It may change due to the filter node inserted into the backing
chain while both jobs are running. Another issue is that the parts of
the backing chain are being frozen by the running job and may not be
changed by the concurrent job when needed. The concept of the parallel
jobs with common nodes is considered vital no more.

Signed-off-by: Andrey Shinkevich 
---
 block/stream.c | 58 +++---
 tests/qemu-iotests/030 | 50 ---
 tests/qemu-iotests/030.out |  4 ++--
 tests/qemu-iotests/245 | 19 +++
 4 files changed, 65 insertions(+), 66 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index 8bf6b6d..e927fed 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -19,6 +19,7 @@
 #include "qapi/qmp/qerror.h"
 #include "qemu/ratelimit.h"
 #include "sysemu/block-backend.h"
+#include "block/copy-on-read.h"
 
 enum {
 /*
@@ -33,6 +34,8 @@ typedef struct StreamBlockJob {
 BlockJob common;
 BlockDriverState *base_overlay; /* COW overlay (stream from this) */
 BlockDriverState *above_base;   /* Node directly above the base */
+BlockDriverState *cor_filter_bs;
+BlockDriverState *target_bs;
 BlockdevOnError on_error;
 char *backing_file_str;
 bool bs_read_only;
@@ -53,22 +56,20 @@ static void stream_abort(Job *job)
 StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
 
 if (s->chain_frozen) {
-BlockJob *bjob = >common;
-bdrv_unfreeze_backing_chain(blk_bs(bjob->blk), s->above_base);
+bdrv_unfreeze_backing_chain(s->cor_filter_bs, s->above_base);
 }
 }
 
 static int stream_prepare(Job *job)
 {
 StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
-BlockJob *bjob = >common;
-BlockDriverState *bs = blk_bs(bjob->blk);
+BlockDriverState *bs = s->target_bs;
 BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs);
 BlockDriverState *base = bdrv_filter_or_cow_bs(s->above_base);
 Error *local_err = NULL;
 int ret = 0;
 
-bdrv_unfreeze_backing_chain(bs, s->above_base);
+bdrv_unfreeze_backing_chain(s->cor_filter_bs, s->above_base);
 s->chain_frozen = false;
 
 if (bdrv_cow_child(unfiltered_bs)) {
@@ -94,7 +95,9 @@ static void stream_clean(Job *job)
 {
 StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
 BlockJob *bjob = >common;
-BlockDriverState *bs = blk_bs(bjob->blk);
+BlockDriverState *bs = s->target_bs;
+
+bdrv_cor_filter_drop(s->cor_filter_bs);
 
 /* Reopen the image back in read-only mode if necessary */
 if (s->bs_read_only) {
@@ -110,7 +113,7 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
 {
 StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
 BlockBackend *blk = s->common.blk;
-BlockDriverState *bs = blk_bs(blk);
+BlockDriverState *bs = s->target_bs;
 BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs);
 bool enable_cor = !bdrv_cow_child(s->base_overlay);
 int64_t len;
@@ -231,6 +234,7 @@ void stream_start(const char *job_id, BlockDriverState *bs,
 int basic_flags = BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED;
 BlockDriverState *base_overlay = bdrv_find_overlay(bs, base);
 BlockDriverState *above_base;
+BlockDriverState *cor_filter_bs = NULL;
 
 if (!base_overlay) {
 error_setg(errp, "'%s' is not in the backing chain of '%s'",
@@ -264,17 +268,36 @@ void stream_start(const char *job_id, BlockDriverState 
*bs,
 }
 }
 
-/* Prevent concurrent jobs trying to modify the graph structure here, we
- * already have our own plans. Also don't allow resize as the image size is
- * queried only at the job start and then cached. */
-s = block_job_create(job_id, _job_driver, NULL, bs,
- basic_flags | BLK_PERM_GRAPH_MOD,
- basic_flags | BLK_PERM_WRITE,
+cor_filter_bs = bdrv_cor_filter_append(bs, filter_node_name, errp);
+if (cor_filter_bs == NULL) {
+goto fail;
+}
+
+if (bdrv_freeze_backing_chain(cor_filter_bs, bs, errp) < 0) {
+bdrv_cor_filter_drop(cor_filter_bs);
+cor_filter_bs = NULL;
+goto fail;
+}
+
+s = block_job_create(job_id, _job_driver, NULL, cor_filter_bs,
+  

[PATCH v7 3/4] qapi: add filter-node-name to block-stream

2020-08-24 Thread Andrey Shinkevich
Provide the possibility to pass the 'filter-node-name' parameter to the
block-stream job as it is done for the commit block job.

Signed-off-by: Andrey Shinkevich 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/monitor/block-hmp-cmds.c | 4 ++--
 block/stream.c | 4 +++-
 blockdev.c | 4 +++-
 include/block/block_int.h  | 7 ++-
 qapi/block-core.json   | 6 ++
 5 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index 4d3db5e..4e66775 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -507,8 +507,8 @@ void hmp_block_stream(Monitor *mon, const QDict *qdict)
 
 qmp_block_stream(true, device, device, base != NULL, base, false, NULL,
  false, NULL, qdict_haskey(qdict, "speed"), speed, true,
- BLOCKDEV_ON_ERROR_REPORT, false, false, false, false,
- );
+ BLOCKDEV_ON_ERROR_REPORT, false, NULL, false, false, 
false,
+ false, );
 
 hmp_handle_error(mon, error);
 }
diff --git a/block/stream.c b/block/stream.c
index b9c1141..8bf6b6d 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -221,7 +221,9 @@ static const BlockJobDriver stream_job_driver = {
 void stream_start(const char *job_id, BlockDriverState *bs,
   BlockDriverState *base, const char *backing_file_str,
   int creation_flags, int64_t speed,
-  BlockdevOnError on_error, Error **errp)
+  BlockdevOnError on_error,
+  const char *filter_node_name,
+  Error **errp)
 {
 StreamBlockJob *s;
 BlockDriverState *iter;
diff --git a/blockdev.c b/blockdev.c
index 237fffb..cc531cb 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2476,6 +2476,7 @@ void qmp_block_stream(bool has_job_id, const char 
*job_id, const char *device,
   bool has_backing_file, const char *backing_file,
   bool has_speed, int64_t speed,
   bool has_on_error, BlockdevOnError on_error,
+  bool has_filter_node_name, const char *filter_node_name,
   bool has_auto_finalize, bool auto_finalize,
   bool has_auto_dismiss, bool auto_dismiss,
   Error **errp)
@@ -2558,7 +2559,8 @@ void qmp_block_stream(bool has_job_id, const char 
*job_id, const char *device,
 }
 
 stream_start(has_job_id ? job_id : NULL, bs, base_bs, base_name,
- job_flags, has_speed ? speed : 0, on_error, _err);
+ job_flags, has_speed ? speed : 0, on_error,
+ filter_node_name, _err);
 if (local_err) {
 error_propagate(errp, local_err);
 goto out;
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 465a601..3efde33 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1122,6 +1122,9 @@ int is_windows_drive(const char *filename);
  *  See @BlockJobCreateFlags
  * @speed: The maximum speed, in bytes per second, or 0 for unlimited.
  * @on_error: The action to take upon error.
+ * @filter_node_name: The node name that should be assigned to the filter
+ * driver that the commit job inserts into the graph above @bs. NULL means
+ * that a node name should be autogenerated.
  * @errp: Error object.
  *
  * Start a streaming operation on @bs.  Clusters that are unallocated
@@ -1134,7 +1137,9 @@ int is_windows_drive(const char *filename);
 void stream_start(const char *job_id, BlockDriverState *bs,
   BlockDriverState *base, const char *backing_file_str,
   int creation_flags, int64_t speed,
-  BlockdevOnError on_error, Error **errp);
+  BlockdevOnError on_error,
+  const char *filter_node_name,
+  Error **errp);
 
 /**
  * commit_start:
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 0b8ccd3..1db6ce1 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2524,6 +2524,11 @@
 #'stop' and 'enospc' can only be used if the block device
 #supports io-status (see BlockInfo).  Since 1.3.
 #
+# @filter-node-name: the node name that should be assigned to the
+#filter driver that the stream job inserts into the graph
+#above @device. If this option is not given, a node name is
+#autogenerated. (Since: 5.1)
+#
 # @auto-finalize: When false, this job will wait in a PENDING state after it 
has
 # finished its work, waiting for @block-job-finalize before
 # making any block graph changes.
@@ -2554,6 +2559,7 @@
   'data': { '*job-id': 'str', 'device': 'str', '*base': 'str',
 '*base-node': 'str', '*backing-file': 'str', '*speed': 'int',
 '*on-error': 'BlockdevOnError',
+ 

[PATCH v7 0/4] Apply COR-filter to the block-stream permanently

2020-08-24 Thread Andrey Shinkevich
Note: this series is based on the another one "block: Deal with filters"
  by Max Reitz that could be found in the branches:
  https://git.xanclic.moe/XanClic/qemu child-access-functions-v6
  https://github.com/XanClic/qemu child-access-functions-v6

v7:
  01: A complete reversion of the commit c624b015bf14f "block/stream:
  introduce a bottom node" doesn't make a sense as the patch
  "stream: Deal with filters" replaces the concept of the
  'bottom node' with the one of the 'base_overlay' and the
  'above_base', introduced by Max in the series "block: Deal with
  filters".
  02: #include "block/block-copy.h" was removed.
  The extra "state->active = true" was dropped.
  The comment about the bdrv_child_refresh_perms() supplied.
  03: "filter_node_name = NULL" dropped in the qmp_block_stream().
  04: The comment to the commit was extended with an explanation of
  why the test case 'test_stream_parallel' was removed.
  Due to the v6 review, the initial "char *base_fmt" is kept for
  the final backing file change no more. I urge to keep it.

The v6 Message-Id:
<1597785880-431103-1-git-send-email-andrey.shinkev...@virtuozzo.com>

Andrey Shinkevich (4):
  copy-on-read: Support preadv/pwritev_part functions
  copy-on-read: add filter append/drop functions
  qapi: add filter-node-name to block-stream
  block: apply COR-filter to block-stream jobs

 block/copy-on-read.c   | 132 +
 block/copy-on-read.h   |  35 +++
 block/monitor/block-hmp-cmds.c |   4 +-
 block/stream.c |  62 ++-
 blockdev.c |   4 +-
 include/block/block_int.h  |   7 ++-
 qapi/block-core.json   |   6 ++
 tests/qemu-iotests/030 |  50 ++--
 tests/qemu-iotests/030.out |   4 +-
 tests/qemu-iotests/245 |  19 --
 10 files changed, 240 insertions(+), 83 deletions(-)
 create mode 100644 block/copy-on-read.h

-- 
1.8.3.1




[PATCH v7 1/4] copy-on-read: Support preadv/pwritev_part functions

2020-08-24 Thread Andrey Shinkevich
Add support for the recently introduced functions
bdrv_co_preadv_part()
and
bdrv_co_pwritev_part()
to the COR-filter driver.

Signed-off-by: Andrey Shinkevich 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/copy-on-read.c | 28 
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index 2816e61..cb03e0f 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -74,21 +74,25 @@ static int64_t cor_getlength(BlockDriverState *bs)
 }
 
 
-static int coroutine_fn cor_co_preadv(BlockDriverState *bs,
-  uint64_t offset, uint64_t bytes,
-  QEMUIOVector *qiov, int flags)
+static int coroutine_fn cor_co_preadv_part(BlockDriverState *bs,
+   uint64_t offset, uint64_t bytes,
+   QEMUIOVector *qiov,
+   size_t qiov_offset,
+   int flags)
 {
-return bdrv_co_preadv(bs->file, offset, bytes, qiov,
-  flags | BDRV_REQ_COPY_ON_READ);
+return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, qiov_offset,
+   flags | BDRV_REQ_COPY_ON_READ);
 }
 
 
-static int coroutine_fn cor_co_pwritev(BlockDriverState *bs,
-   uint64_t offset, uint64_t bytes,
-   QEMUIOVector *qiov, int flags)
+static int coroutine_fn cor_co_pwritev_part(BlockDriverState *bs,
+uint64_t offset,
+uint64_t bytes,
+QEMUIOVector *qiov,
+size_t qiov_offset, int flags)
 {
-
-return bdrv_co_pwritev(bs->file, offset, bytes, qiov, flags);
+return bdrv_co_pwritev_part(bs->file, offset, bytes, qiov, qiov_offset,
+flags);
 }
 
 
@@ -137,8 +141,8 @@ static BlockDriver bdrv_copy_on_read = {
 
 .bdrv_getlength = cor_getlength,
 
-.bdrv_co_preadv = cor_co_preadv,
-.bdrv_co_pwritev= cor_co_pwritev,
+.bdrv_co_preadv_part= cor_co_preadv_part,
+.bdrv_co_pwritev_part   = cor_co_pwritev_part,
 .bdrv_co_pwrite_zeroes  = cor_co_pwrite_zeroes,
 .bdrv_co_pdiscard   = cor_co_pdiscard,
 .bdrv_co_pwritev_compressed = cor_co_pwritev_compressed,
-- 
1.8.3.1




Re: [PATCH v6 4/4] block: apply COR-filter to block-stream jobs

2020-08-24 Thread Vladimir Sementsov-Ogievskiy

23.08.2020 22:28, Andrey Shinkevich wrote:

On 19.08.2020 13:46, Vladimir Sementsov-Ogievskiy wrote:

19.08.2020 00:24, Andrey Shinkevich wrote:

The patch completes the series with the COR-filter insertion to any
block-stream operation. It also makes changes to the iotests 030.
The test case 'test_stream_parallel' was deleted due to multiple
errors.



...




Signed-off-by: Andrey Shinkevich 
---
  block/stream.c | 76 --
  tests/qemu-iotests/030 | 50 +++---
  tests/qemu-iotests/030.out |  4 +--
  3 files changed, 61 insertions(+), 69 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index 8bf6b6d..0b11979 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -19,6 +19,7 @@
  #include "qapi/qmp/qerror.h"
  #include "qemu/ratelimit.h"
  #include "sysemu/block-backend.h"
+#include "block/copy-on-read.h"
    enum {
  /*
@@ -33,8 +34,11 @@ typedef struct StreamBlockJob {
  BlockJob common;
  BlockDriverState *base_overlay; /* COW overlay (stream from this) */
  BlockDriverState *above_base;   /* Node directly above the base */
+    BlockDriverState *cor_filter_bs;
+    BlockDriverState *target_bs;
  BlockdevOnError on_error;
  char *backing_file_str;
+    char *base_fmt;
  bool bs_read_only;
  bool chain_frozen;
  } StreamBlockJob;
@@ -53,34 +57,26 @@ static void stream_abort(Job *job)
  StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
    if (s->chain_frozen) {
-    BlockJob *bjob = >common;
-    bdrv_unfreeze_backing_chain(blk_bs(bjob->blk), s->above_base);
+    bdrv_unfreeze_backing_chain(s->cor_filter_bs, s->above_base);
  }
  }
    static int stream_prepare(Job *job)
  {
  StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
-    BlockJob *bjob = >common;
-    BlockDriverState *bs = blk_bs(bjob->blk);
+    BlockDriverState *bs = s->target_bs;
  BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs);
  BlockDriverState *base = bdrv_filter_or_cow_bs(s->above_base);
  Error *local_err = NULL;
  int ret = 0;
  -    bdrv_unfreeze_backing_chain(bs, s->above_base);
+    bdrv_unfreeze_backing_chain(s->cor_filter_bs, s->above_base);
  s->chain_frozen = false;
    if (bdrv_cow_child(unfiltered_bs)) {
-    const char *base_id = NULL, *base_fmt = NULL;
-    if (base) {
-    base_id = s->backing_file_str;
-    if (base->drv) {
-    base_fmt = base->drv->format_name;
-    }
-    }
  bdrv_set_backing_hd(unfiltered_bs, base, _err);
-    ret = bdrv_change_backing_file(unfiltered_bs, base_id, base_fmt);
+    ret = bdrv_change_backing_file(unfiltered_bs, s->backing_file_str,
+   s->base_fmt);
  if (local_err) {
  error_report_err(local_err);
  return -EPERM;
@@ -94,7 +90,9 @@ static void stream_clean(Job *job)
  {
  StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
  BlockJob *bjob = >common;
-    BlockDriverState *bs = blk_bs(bjob->blk);
+    BlockDriverState *bs = s->target_bs;
+
+    bdrv_cor_filter_drop(s->cor_filter_bs);
    /* Reopen the image back in read-only mode if necessary */
  if (s->bs_read_only) {
@@ -104,13 +102,14 @@ static void stream_clean(Job *job)
  }
    g_free(s->backing_file_str);
+    g_free(s->base_fmt);
  }
    static int coroutine_fn stream_run(Job *job, Error **errp)
  {
  StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
  BlockBackend *blk = s->common.blk;
-    BlockDriverState *bs = blk_bs(blk);
+    BlockDriverState *bs = s->target_bs;
  BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs);
  bool enable_cor = !bdrv_cow_child(s->base_overlay);
  int64_t len;
@@ -231,6 +230,12 @@ void stream_start(const char *job_id, BlockDriverState *bs,
  int basic_flags = BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED;
  BlockDriverState *base_overlay = bdrv_find_overlay(bs, base);
  BlockDriverState *above_base;
+    BlockDriverState *cor_filter_bs = NULL;
+    char *base_fmt = NULL;
+
+    if (base && base->drv) {
+    base_fmt = g_strdup(base->drv->format_name);
+    }
    if (!base_overlay) {
  error_setg(errp, "'%s' is not in the backing chain of '%s'",
@@ -264,17 +269,36 @@ void stream_start(const char *job_id, BlockDriverState 
*bs,
  }
  }
  -    /* Prevent concurrent jobs trying to modify the graph structure here, we
- * already have our own plans. Also don't allow resize as the image size is
- * queried only at the job start and then cached. */
-    s = block_job_create(job_id, _job_driver, NULL, bs,
- basic_flags | BLK_PERM_GRAPH_MOD,
- basic_flags | BLK_PERM_WRITE,
+    cor_filter_bs = bdrv_cor_filter_append(bs, filter_node_name, errp);
+    if (cor_filter_bs == NULL) {
+