date:20210506

Re: [PATCH] tests/qtest/ahci-test.c: Calculate iso_size with 64-bit arithmetic

2021-05-06 Thread John Snow


On 5/6/21 3:43 PM, Peter Maydell wrote:

Coverity notes that when calculating the 64-bit iso_size value in
ahci_test_cdrom() we actually only do it with 32-bit arithmetic.
This doesn't matter for the current test code because nsectors is
always small; but adding the cast avoids the coverity complaints.

Fixes: Coverity CID 1432343
Signed-off-by: Peter Maydell 
---
  tests/qtest/ahci-test.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tests/qtest/ahci-test.c b/tests/qtest/ahci-test.c
index 5e1954852e7..8073ccc2052 100644
--- a/tests/qtest/ahci-test.c
+++ b/tests/qtest/ahci-test.c
@@ -1491,14 +1491,14 @@ static void ahci_test_cdrom(int nsectors, bool dma, 
uint8_t cmd,
  char *iso;
  int fd;
  AHCIOpts opts = {
-.size = (ATAPI_SECTOR_SIZE * nsectors),
+.size = ((uint64_t)ATAPI_SECTOR_SIZE * nsectors),
  .atapi = true,
  .atapi_dma = dma,
  .post_cb = ahci_cb_cmp_buff,
  .set_bcl = override_bcl,
  .bcl = bcl,
  };
-uint64_t iso_size = ATAPI_SECTOR_SIZE * (nsectors + 1);
+uint64_t iso_size = (uint64_t)ATAPI_SECTOR_SIZE * (nsectors + 1);
  
  /* Prepare ISO and fill 'tx' buffer */

  fd = prepare_iso(iso_size, , );



Whupps, thank you, Coverity.

Reviewed-by: John Snow

Re: [PATCH] virtio-blk: Convert QEMUBH callback to "bitops.h" API

2021-05-06 Thread Philippe Mathieu-Daudé

On 5/6/21 9:42 PM, Philippe Mathieu-Daudé wrote:
> On 5/6/21 8:22 PM, Richard Henderson wrote:
>> On 5/6/21 8:54 AM, Philippe Mathieu-Daudé wrote:
>>>   static void notify_guest_bh(void *opaque)
>>>   {
>>>   VirtIOBlockDataPlane *s = opaque;
>>> -    unsigned nvqs = s->conf->num_queues;
>>> -    unsigned long bitmap[BITS_TO_LONGS(nvqs)];
>>> -    unsigned j;
>>>   -    memcpy(bitmap, s->batch_notify_vqs, sizeof(bitmap));
>>> -    memset(s->batch_notify_vqs, 0, sizeof(bitmap));
>>> -
>>> -    for (j = 0; j < nvqs; j += BITS_PER_LONG) {
>>> -    unsigned long bits = bitmap[j / BITS_PER_LONG];
>>> -
>>> -    while (bits != 0) {
>>> -    unsigned i = j + ctzl(bits);
>>> +    for (unsigned i = 0; i < s->conf->num_queues; i++) {
>>
>> Is this bitmap dense enough that you want to iterate by index,

The max is 1Kb:

#define VIRTIO_QUEUE_MAX 1024

> 
> By 'iterate by index' do you mean the actual iteration with 'j'?
> 
>> or is it
>> sparse enough to iterate via find_first_bit/find_next_bit?
> 
> I looked at find_first_bit/find_next_bit() but they seemed to do
> a lot more than test_and_clear_bit(). As Stefan said this is hot
> path, I thought this would be cheaper, but haven't profiled the
> performance.
> 
>> In either case, leave the copy of  s->conf->num_queues to a local variable.
> 
> That is sensible to do :)
>

Re: [PATCH] tests/qtest/ahci-test.c: Calculate iso_size with 64-bit arithmetic

2021-05-06 Thread Philippe Mathieu-Daudé

On 5/6/21 9:43 PM, Peter Maydell wrote:
> Coverity notes that when calculating the 64-bit iso_size value in
> ahci_test_cdrom() we actually only do it with 32-bit arithmetic.
> This doesn't matter for the current test code because nsectors is
> always small; but adding the cast avoids the coverity complaints.
> 
> Fixes: Coverity CID 1432343
> Signed-off-by: Peter Maydell 
> ---
>  tests/qtest/ahci-test.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé

[PATCH] tests/qtest/ahci-test.c: Calculate iso_size with 64-bit arithmetic

2021-05-06 Thread Peter Maydell

Coverity notes that when calculating the 64-bit iso_size value in
ahci_test_cdrom() we actually only do it with 32-bit arithmetic.
This doesn't matter for the current test code because nsectors is
always small; but adding the cast avoids the coverity complaints.

Fixes: Coverity CID 1432343
Signed-off-by: Peter Maydell 
---
 tests/qtest/ahci-test.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tests/qtest/ahci-test.c b/tests/qtest/ahci-test.c
index 5e1954852e7..8073ccc2052 100644
--- a/tests/qtest/ahci-test.c
+++ b/tests/qtest/ahci-test.c
@@ -1491,14 +1491,14 @@ static void ahci_test_cdrom(int nsectors, bool dma, 
uint8_t cmd,
 char *iso;
 int fd;
 AHCIOpts opts = {
-.size = (ATAPI_SECTOR_SIZE * nsectors),
+.size = ((uint64_t)ATAPI_SECTOR_SIZE * nsectors),
 .atapi = true,
 .atapi_dma = dma,
 .post_cb = ahci_cb_cmp_buff,
 .set_bcl = override_bcl,
 .bcl = bcl,
 };
-uint64_t iso_size = ATAPI_SECTOR_SIZE * (nsectors + 1);
+uint64_t iso_size = (uint64_t)ATAPI_SECTOR_SIZE * (nsectors + 1);
 
 /* Prepare ISO and fill 'tx' buffer */
 fd = prepare_iso(iso_size, , );
-- 
2.20.1

Re: [PATCH] virtio-blk: Convert QEMUBH callback to "bitops.h" API

2021-05-06 Thread Philippe Mathieu-Daudé

On 5/6/21 8:22 PM, Richard Henderson wrote:
> On 5/6/21 8:54 AM, Philippe Mathieu-Daudé wrote:
>>   static void notify_guest_bh(void *opaque)
>>   {
>>   VirtIOBlockDataPlane *s = opaque;
>> -    unsigned nvqs = s->conf->num_queues;
>> -    unsigned long bitmap[BITS_TO_LONGS(nvqs)];
>> -    unsigned j;
>>   -    memcpy(bitmap, s->batch_notify_vqs, sizeof(bitmap));
>> -    memset(s->batch_notify_vqs, 0, sizeof(bitmap));
>> -
>> -    for (j = 0; j < nvqs; j += BITS_PER_LONG) {
>> -    unsigned long bits = bitmap[j / BITS_PER_LONG];
>> -
>> -    while (bits != 0) {
>> -    unsigned i = j + ctzl(bits);
>> +    for (unsigned i = 0; i < s->conf->num_queues; i++) {
> 
> Is this bitmap dense enough that you want to iterate by index,

By 'iterate by index' do you mean the actual iteration with 'j'?

> or is it
> sparse enough to iterate via find_first_bit/find_next_bit?

I looked at find_first_bit/find_next_bit() but they seemed to do
a lot more than test_and_clear_bit(). As Stefan said this is hot
path, I thought this would be cheaper, but haven't profiled the
performance.

> In either case, leave the copy of  s->conf->num_queues to a local variable.

That is sensible to do :)

[PATCH] block/copy-on-read: use bdrv_drop_filter() and drop s->active

2021-05-06 Thread Vladimir Sementsov-Ogievskiy

Now, after huge update of block graph permission update algorithm, we
don't need this workaround with active state of the filter. Drop it and
use new smart bdrv_drop_filter() function.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/copy-on-read.c | 33 +
 1 file changed, 1 insertion(+), 32 deletions(-)

diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index 9cad9e1b8c..c428682272 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -29,7 +29,6 @@
 
 
 typedef struct BDRVStateCOR {
-bool active;
 BlockDriverState *bottom_bs;
 bool chain_frozen;
 } BDRVStateCOR;
@@ -89,7 +88,6 @@ static int cor_open(BlockDriverState *bs, QDict *options, int 
flags,
  */
 bdrv_ref(bottom_bs);
 }
-state->active = true;
 state->bottom_bs = bottom_bs;
 
 /*
@@ -112,17 +110,6 @@ static void cor_child_perm(BlockDriverState *bs, BdrvChild 
*c,
uint64_t perm, uint64_t shared,
uint64_t *nperm, uint64_t *nshared)
 {
-BDRVStateCOR *s = bs->opaque;
-
-if (!s->active) {
-/*
- * While the filter is being removed
- */
-*nperm = 0;
-*nshared = BLK_PERM_ALL;
-return;
-}
-
 *nperm = perm & PERM_PASSTHROUGH;
 *nshared = (shared & PERM_PASSTHROUGH) | PERM_UNCHANGED;
 
@@ -280,32 +267,14 @@ static BlockDriver bdrv_copy_on_read = {
 
 void bdrv_cor_filter_drop(BlockDriverState *cor_filter_bs)
 {
-BdrvChild *child;
-BlockDriverState *bs;
 BDRVStateCOR *s = cor_filter_bs->opaque;
 
-child = bdrv_filter_child(cor_filter_bs);
-if (!child) {
-return;
-}
-bs = child->bs;
-
-/* Retain the BDS until we complete the graph change. */
-bdrv_ref(bs);
-/* Hold a guest back from writing while permissions are being reset. */
-bdrv_drained_begin(bs);
-/* Drop permissions before the graph change. */
-s->active = false;
 /* unfreeze, as otherwise bdrv_replace_node() will fail */
 if (s->chain_frozen) {
 s->chain_frozen = false;
 bdrv_unfreeze_backing_chain(cor_filter_bs, s->bottom_bs);
 }
-bdrv_child_refresh_perms(cor_filter_bs, child, _abort);
-bdrv_replace_node(cor_filter_bs, bs, _abort);
-
-bdrv_drained_end(bs);
-bdrv_unref(bs);
+bdrv_drop_filter(cor_filter_bs, _abort);
 bdrv_unref(cor_filter_bs);
 }
 
-- 
2.29.2

Re: [PATCH] Add missing coroutine_fn function signature to functions

2021-05-06 Thread cennedee

Thank you for the feedback. Here is an updated patch.
Added more functions related to do_sgio() as suggested. Also found one related 
to prh_co_entry()

I have removed (edits on) two files as there are many more functions I found in 
those. So I think another thread might be more appropriate (where I'll send 
patches flie by file). Focusing on two files here now.

For reference the path of calls that end up on a coroutine_fn below

do_pr_in() --> do_sgio()
do_pr_out() --> do_sgio()
mpath_reconstruct_sense() --> do_sgio()
multipath_pr_out() --> mpath_reconstruct_sense() --> do_sgio()
multipath_pr_in() --> mpath_reconstruct_sense() --> do_sgio()
accept_client() --> prh_co_entry()



>From 79f787c2ef5f713546b38a76367d273ff65742d6 Mon Sep 17 00:00:00 2001
From: Cenne Dee 
Date: Fri, 30 Apr 2021 15:52:28 -0400
Subject: [PATCH] Add missing coroutine_fn function signature to some _co()
 functions

Patch adds the signature for relevant functions ending with _co
or those that use them.

Signed-off-by: Cenne Dee 
---
 scsi/qemu-pr-helper.c | 14 +++---
 tests/unit/test-thread-pool.c |  2 +-
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/scsi/qemu-pr-helper.c b/scsi/qemu-pr-helper.c
index 7b9389b..695539b 100644
--- a/scsi/qemu-pr-helper.c
+++ b/scsi/qemu-pr-helper.c
@@ -175,7 +175,7 @@ static int do_sgio_worker(void *opaque)
 return status;
 }

-static int do_sgio(int fd, const uint8_t *cdb, uint8_t *sense,
+static int coroutine_fn do_sgio(int fd, const uint8_t *cdb, uint8_t *sense,
 uint8_t *buf, int *sz, int dir)
 {
 ThreadPool *pool = aio_get_thread_pool(qemu_get_aio_context());
@@ -318,7 +318,7 @@ static SCSISense mpath_generic_sense(int r)
 }
 }

-static int mpath_reconstruct_sense(int fd, int r, uint8_t *sense)
+static int coroutine_fn mpath_reconstruct_sense(int fd, int r, uint8_t *sense)
 {
 switch (r) {
 case MPATH_PR_SUCCESS:
@@ -370,7 +370,7 @@ static int mpath_reconstruct_sense(int fd, int r, uint8_t 
*sense)
 }
 }

-static int multipath_pr_in(int fd, const uint8_t *cdb, uint8_t *sense,
+static int coroutine_fn multipath_pr_in(int fd, const uint8_t *cdb, uint8_t 
*sense,
uint8_t *data, int sz)
 {
 int rq_servact = cdb[1];
@@ -425,7 +425,7 @@ static int multipath_pr_in(int fd, const uint8_t *cdb, 
uint8_t *sense,
 return mpath_reconstruct_sense(fd, r, sense);
 }

-static int multipath_pr_out(int fd, const uint8_t *cdb, uint8_t *sense,
+static int coroutine_fn multipath_pr_out(int fd, const uint8_t *cdb, uint8_t 
*sense,
 const uint8_t *param, int sz)
 {
 int rq_servact = cdb[1];
@@ -543,7 +543,7 @@ static int multipath_pr_out(int fd, const uint8_t *cdb, 
uint8_t *sense,
 }
 #endif

-static int do_pr_in(int fd, const uint8_t *cdb, uint8_t *sense,
+static int coroutine_fn do_pr_in(int fd, const uint8_t *cdb, uint8_t *sense,
 uint8_t *data, int *resp_sz)
 {
 #ifdef CONFIG_MPATH
@@ -561,7 +561,7 @@ static int do_pr_in(int fd, const uint8_t *cdb, uint8_t 
*sense,
SG_DXFER_FROM_DEV);
 }

-static int do_pr_out(int fd, const uint8_t *cdb, uint8_t *sense,
+static int coroutine_fn do_pr_out(int fd, const uint8_t *cdb, uint8_t *sense,
  const uint8_t *param, int sz)
 {
 int resp_sz;
@@ -804,7 +804,7 @@ out:
 g_free(client);
 }

-static gboolean accept_client(QIOChannel *ioc, GIOCondition cond, gpointer 
opaque)
+static gboolean coroutine_fn accept_client(QIOChannel *ioc, GIOCondition cond, 
gpointer opaque)
 {
 QIOChannelSocket *cioc;
 PRHelperClient *prh;
diff --git a/tests/unit/test-thread-pool.c b/tests/unit/test-thread-pool.c
index 70dc631..dbaf72c 100644
--- a/tests/unit/test-thread-pool.c
+++ b/tests/unit/test-thread-pool.c
@@ -72,7 +72,7 @@ static void test_submit_aio(void)
 g_assert_cmpint(data.ret, ==, 0);
 }

-static void co_test_cb(void *opaque)
+static void coroutine_fn co_test_cb(void *opaque)
 {
 WorkerTestData *data = opaque;

--
2.31.1

Re: [PATCH] virtio-blk: Convert QEMUBH callback to "bitops.h" API

2021-05-06 Thread Richard Henderson


On 5/6/21 8:54 AM, Philippe Mathieu-Daudé wrote:

  static void notify_guest_bh(void *opaque)
  {
  VirtIOBlockDataPlane *s = opaque;
-unsigned nvqs = s->conf->num_queues;
-unsigned long bitmap[BITS_TO_LONGS(nvqs)];
-unsigned j;
  
-memcpy(bitmap, s->batch_notify_vqs, sizeof(bitmap));

-memset(s->batch_notify_vqs, 0, sizeof(bitmap));
-
-for (j = 0; j < nvqs; j += BITS_PER_LONG) {
-unsigned long bits = bitmap[j / BITS_PER_LONG];
-
-while (bits != 0) {
-unsigned i = j + ctzl(bits);
+for (unsigned i = 0; i < s->conf->num_queues; i++) {


Is this bitmap dense enough that you want to iterate by index, or is it sparse 
enough to iterate via find_first_bit/find_next_bit?


In either case, leave the copy of  s->conf->num_queues to a local variable.


r~

Re: [PATCH 7/7] virtiofsd: use GDateTime for formatting timestamp for debug messages

2021-05-06 Thread Dr. David Alan Gilbert

* Dr. David Alan Gilbert (dgilb...@redhat.com) wrote:
> * Dr. David Alan Gilbert (dgilb...@redhat.com) wrote:
> > * Daniel P. Berrangé (berra...@redhat.com) wrote:
> > > The GDateTime APIs provided by GLib avoid portability pitfalls, such
> > > as some platforms where 'struct timeval.tv_sec' field is still 'long'
> > > instead of 'time_t'. When combined with automatic cleanup, GDateTime
> > > often results in simpler code too.
> > > 
> > > Signed-off-by: Daniel P. Berrangé 
> > 
> > Reviewed-by: Dr. David Alan Gilbert 
> 
> Queued this 7/7 via virtiofsd

Hi Dan,
  I've had to drop this because it triggers seccomp:

#0  0x7f0afec4d21b in readlink () from /lib64/libc.so.6
#1  0x7f0afeecff24 in g_file_read_link () from /lib64/libglib-2.0.so.0
#2  0x7f0afef16390 in g_time_zone_new_identifier () from 
/lib64/libglib-2.0.so.0
#3  0x7f0afef16fe8 in g_time_zone_new_local () from /lib64/libglib-2.0.so.0
#4  0x7f0afeec9520 in g_date_time_new_now_local () from 
/lib64/libglib-2.0.so.0
#5  0x55a6bd6bc27a in log_func (level=FUSE_LOG_INFO, fmt=0x55a6bd6e3cda 
"%s: Entry\n", ap=0x7ffece4b2f50) at ../tools/virtiofsd/passthrough_ll.c:3578
#6  0x55a6bd6b3dd5 in fuse_log (level=level@entry=FUSE_LOG_INFO, 
fmt=fmt@entry=0x55a6bd6e3cda "%s: Entry\n") at ../tools/virtiofsd/fuse_log.c:37
#7  0x55a6bd6ba9fd in virtio_loop (se=se@entry=0x55a6bee29a30) at 
../tools/virtiofsd/fuse_virtio.c:878
#8  0x55a6bd6b2017 in main (argc=, argv=) at 
../tools/virtiofsd/passthrough_ll.c:3868


so you either need to add seccomp rules and/or persuade it not to moan;
maybe dropping down to UTC would work.

Dave

> > > ---
> > >  tools/virtiofsd/passthrough_ll.c | 25 -
> > >  1 file changed, 4 insertions(+), 21 deletions(-)
> > > 
> > > diff --git a/tools/virtiofsd/passthrough_ll.c 
> > > b/tools/virtiofsd/passthrough_ll.c
> > > index 1553d2ef45..cdd224918c 100644
> > > --- a/tools/virtiofsd/passthrough_ll.c
> > > +++ b/tools/virtiofsd/passthrough_ll.c
> > > @@ -3558,10 +3558,6 @@ static void setup_nofile_rlimit(unsigned long 
> > > rlimit_nofile)
> > >  static void log_func(enum fuse_log_level level, const char *fmt, va_list 
> > > ap)
> > >  {
> > >  g_autofree char *localfmt = NULL;
> > > -struct timespec ts;
> > > -struct tm tm;
> > > -char sec_fmt[sizeof "2020-12-07 18:17:54"];
> > > -char zone_fmt[sizeof "+0100"];
> > >  
> > >  if (current_log_level < level) {
> > >  return;
> > > @@ -3573,23 +3569,10 @@ static void log_func(enum fuse_log_level level, 
> > > const char *fmt, va_list ap)
> > >  localfmt = g_strdup_printf("[ID: %08ld] %s", 
> > > syscall(__NR_gettid),
> > > fmt);
> > >  } else {
> > > -/* try formatting a broken-down timestamp */
> > > -if (clock_gettime(CLOCK_REALTIME, ) != -1 &&
> > > -localtime_r(_sec, ) != NULL &&
> > > -strftime(sec_fmt, sizeof sec_fmt, "%Y-%m-%d %H:%M:%S",
> > > - ) != 0 &&
> > > -strftime(zone_fmt, sizeof zone_fmt, "%z", ) != 0) {
> > > -localfmt = g_strdup_printf("[%s.%02ld%s] [ID: %08ld] %s",
> > > -   sec_fmt,
> > > -   ts.tv_nsec / (10L * 1000 * 
> > > 1000),
> > > -   zone_fmt, 
> > > syscall(__NR_gettid),
> > > -   fmt);
> > > -} else {
> > > -/* fall back to a flat timestamp */
> > > -localfmt = g_strdup_printf("[%" PRId64 "] [ID: %08ld] 
> > > %s",
> > > -   get_clock(), 
> > > syscall(__NR_gettid),
> > > -   fmt);
> > > -}
> > > +g_autoptr(GDateTime) now = g_date_time_new_now_local();
> > > +g_autofree char *nowstr = g_date_time_format(now, "%Y-%m-%d 
> > > %H:%M:%S.%f%z");
> > > +localfmt = g_strdup_printf("[%s] [ID: %08ld] %s",
> > > +   nowstr, syscall(__NR_gettid), 
> > > fmt);
> > >  }
> > >  fmt = localfmt;
> > >  }
> > > -- 
> > > 2.31.1
> > > 
> > -- 
> > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> > 
> > 
> -- 
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [PATCH] block/snapshot: Clarify goto fallback behavior

2021-05-06 Thread Vladimir Sementsov-Ogievskiy


05.05.2021 23:37, Vladimir Sementsov-Ogievskiy wrote:

05.05.2021 19:25, Max Reitz wrote:

On 05.05.21 17:05, Vladimir Sementsov-Ogievskiy wrote:

03.05.2021 12:54, Max Reitz wrote:

In the bdrv_snapshot_goto() fallback code, we work with a pointer to
either bs->file or bs->backing. 



We close that child,


Do we?


We *detach it.


close the node
(with .bdrv_close()), apply the snapshot on the child node, and then
re-open the node (with .bdrv_open()).

In order for .bdrv_open() to attach the same child node that we had
before, we pass "file={child-node}" or "backing={child-node}" to it.
Therefore, when .bdrv_open() has returned success, we can assume that
bs->file or bs->backing (respectively) points to our original child
again.  This is verified by an assertion.

All of this is not immediately clear from a quick glance at the code,
so add a comment to the assertion what it is for, and why it is valid.
It certainly confused Coverity.

Reported-by: Coverity (CID 1452774)
Signed-off-by: Max Reitz 
---
  block/snapshot.c | 14 +-
  1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/block/snapshot.c b/block/snapshot.c
index e8ae9a28c1..cce5e35b35 100644
--- a/block/snapshot.c
+++ b/block/snapshot.c
@@ -275,13 +275,16 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
  qobject_unref(file_options);
  g_free(subqdict_prefix);
+    /* Force .bdrv_open() below to re-attach fallback_bs on *fallback_ptr 
*/
  qdict_put_str(options, (*fallback_ptr)->name,
    bdrv_get_node_name(fallback_bs));
+    /* Now close bs, apply the snapshot on fallback_bs, and re-open bs */
  if (drv->bdrv_close) {
  drv->bdrv_close(bs);
  }
+    /* .bdrv_open() will re-attach it */
  bdrv_unref_child(bs, *fallback_ptr);
  *fallback_ptr = NULL;
@@ -296,7 +299,16 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
  return ret < 0 ? ret : open_ret;
  }
-    assert(fallback_bs == (*fallback_ptr)->bs);
+    /*
+ * fallback_ptr is >file or >backing.  *fallback_ptr
+ * was closed above and set to NULL, but the .bdrv_open() call
+ * has opened it again, because we set the respective option
+ * (with the qdict_put_str() call above).
+ * Assert that .bdrv_open() has attached some child on
+ * *fallback_ptr, and that it has attached the one we wanted
+ * it to (i.e., fallback_bs).
+ */
+    assert(*fallback_ptr && fallback_bs == (*fallback_ptr)->bs);
  bdrv_unref(fallback_bs);
  return ret;
  }



Reviewed-by: Vladimir Sementsov-Ogievskiy 

===

I still think that this fallback has more problems.. Nothing guarantee that all 
format drivers (even restricted to those who have only one child) support such 
logic.

For example, .bdrv_open() may rely on state structure were zeroed and not 
initialize some fields. And .bdrv_close() may just g_free() some things, not 
setting them to zero.. So we probably should call bdrv_open()/bdrv_close() 
generic functions. But we can't: at least bdrv_close() requires that node has 
no parents.

Not saying about that we lose everything on failure (when actually, it's better 
to restore original state on failure).

This feature should instead be done with help of bdrv_reopen_multiple(), and 
even with it it's not obvious how to do it properly.

The feature were done long ago in 2010 with commit 
7cdb1f6d305e1000b5f882257cbee71b8bb08ef5 by Morita Kazutaka.

Note also, that the only protocol driver that support snapshots is rbd, and 
snapshot support was added to it in 2012 with commit 
bd6032470631d8d5de6db84ecb55398b70d9d2f3 by Gregory Farnum.

Other two drivers with support are sheepdog (which is deprecated) and qcow2 (I 
doubt that it should be used as protocol driver for some other format).


Do we really need this fallback? Is it used somewhere? May be, we can just 
deprecate it, and look will someone complain?


Maybe? :)



:) I'll send a patch later.




Or not.. We need first some clean snapshot qmp api. And blockdev-reopen. And 
then just deprecate old hmp snapshot api.


--
Best regards,
Vladimir

Re: [PATCH 7/7] virtiofsd: use GDateTime for formatting timestamp for debug messages

2021-05-06 Thread Dr. David Alan Gilbert

* Dr. David Alan Gilbert (dgilb...@redhat.com) wrote:
> * Daniel P. Berrangé (berra...@redhat.com) wrote:
> > The GDateTime APIs provided by GLib avoid portability pitfalls, such
> > as some platforms where 'struct timeval.tv_sec' field is still 'long'
> > instead of 'time_t'. When combined with automatic cleanup, GDateTime
> > often results in simpler code too.
> > 
> > Signed-off-by: Daniel P. Berrangé 
> 
> Reviewed-by: Dr. David Alan Gilbert 

Queued this 7/7 via virtiofsd

> > ---
> >  tools/virtiofsd/passthrough_ll.c | 25 -
> >  1 file changed, 4 insertions(+), 21 deletions(-)
> > 
> > diff --git a/tools/virtiofsd/passthrough_ll.c 
> > b/tools/virtiofsd/passthrough_ll.c
> > index 1553d2ef45..cdd224918c 100644
> > --- a/tools/virtiofsd/passthrough_ll.c
> > +++ b/tools/virtiofsd/passthrough_ll.c
> > @@ -3558,10 +3558,6 @@ static void setup_nofile_rlimit(unsigned long 
> > rlimit_nofile)
> >  static void log_func(enum fuse_log_level level, const char *fmt, va_list 
> > ap)
> >  {
> >  g_autofree char *localfmt = NULL;
> > -struct timespec ts;
> > -struct tm tm;
> > -char sec_fmt[sizeof "2020-12-07 18:17:54"];
> > -char zone_fmt[sizeof "+0100"];
> >  
> >  if (current_log_level < level) {
> >  return;
> > @@ -3573,23 +3569,10 @@ static void log_func(enum fuse_log_level level, 
> > const char *fmt, va_list ap)
> >  localfmt = g_strdup_printf("[ID: %08ld] %s", 
> > syscall(__NR_gettid),
> > fmt);
> >  } else {
> > -/* try formatting a broken-down timestamp */
> > -if (clock_gettime(CLOCK_REALTIME, ) != -1 &&
> > -localtime_r(_sec, ) != NULL &&
> > -strftime(sec_fmt, sizeof sec_fmt, "%Y-%m-%d %H:%M:%S",
> > - ) != 0 &&
> > -strftime(zone_fmt, sizeof zone_fmt, "%z", ) != 0) {
> > -localfmt = g_strdup_printf("[%s.%02ld%s] [ID: %08ld] %s",
> > -   sec_fmt,
> > -   ts.tv_nsec / (10L * 1000 * 
> > 1000),
> > -   zone_fmt, syscall(__NR_gettid),
> > -   fmt);
> > -} else {
> > -/* fall back to a flat timestamp */
> > -localfmt = g_strdup_printf("[%" PRId64 "] [ID: %08ld] %s",
> > -   get_clock(), 
> > syscall(__NR_gettid),
> > -   fmt);
> > -}
> > +g_autoptr(GDateTime) now = g_date_time_new_now_local();
> > +g_autofree char *nowstr = g_date_time_format(now, "%Y-%m-%d 
> > %H:%M:%S.%f%z");
> > +localfmt = g_strdup_printf("[%s] [ID: %08ld] %s",
> > +   nowstr, syscall(__NR_gettid), fmt);
> >  }
> >  fmt = localfmt;
> >  }
> > -- 
> > 2.31.1
> > 
> -- 
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> 
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

[PATCH] virtio-blk: Convert QEMUBH callback to "bitops.h" API

2021-05-06 Thread Philippe Mathieu-Daudé

By directly using test_and_clear_bit() from the "bitops.h" API,
we can remove the bitmap[] variable-length array copy on the stack
and the complex manual bit testing/clearing logic.

Suggested-by: Stefan Hajnoczi 
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/block/dataplane/virtio-blk.c | 15 ++-
 1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index e9050c8987e..25d9b10c02b 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -59,23 +59,12 @@ void virtio_blk_data_plane_notify(VirtIOBlockDataPlane *s, 
VirtQueue *vq)
 static void notify_guest_bh(void *opaque)
 {
 VirtIOBlockDataPlane *s = opaque;
-unsigned nvqs = s->conf->num_queues;
-unsigned long bitmap[BITS_TO_LONGS(nvqs)];
-unsigned j;
 
-memcpy(bitmap, s->batch_notify_vqs, sizeof(bitmap));
-memset(s->batch_notify_vqs, 0, sizeof(bitmap));
-
-for (j = 0; j < nvqs; j += BITS_PER_LONG) {
-unsigned long bits = bitmap[j / BITS_PER_LONG];
-
-while (bits != 0) {
-unsigned i = j + ctzl(bits);
+for (unsigned i = 0; i < s->conf->num_queues; i++) {
+if (test_and_clear_bit(i, s->batch_notify_vqs)) {
 VirtQueue *vq = virtio_get_queue(s->vdev, i);
 
 virtio_notify_irqfd(s->vdev, vq);
-
-bits &= bits - 1; /* clear right-most bit */
 }
 }
 }
-- 
2.26.3

Re: [PATCH 06/23] hw/block/dataplane/virtio-blk: Avoid dynamic stack allocation

2021-05-06 Thread Philippe Mathieu-Daudé

On 5/6/21 4:47 PM, Stefan Hajnoczi wrote:
> On Thu, May 06, 2021 at 11:01:47AM +0200, Philippe Mathieu-Daudé wrote:
>> On 5/6/21 10:53 AM, Stefan Hajnoczi wrote:
>>> On Wed, May 05, 2021 at 11:10:30PM +0200, Philippe Mathieu-Daudé wrote:
 Use autofree heap allocation instead of variable-length
 array on the stack.

 Signed-off-by: Philippe Mathieu-Daudé 
 ---
  hw/block/dataplane/virtio-blk.c | 7 ---
  1 file changed, 4 insertions(+), 3 deletions(-)
>>>
>>> Why?
>>
>> The motivation behind removing all variable-length allocations
>> (and adding CPPFLAG+=-Wvla at the end) is to avoid security
>> vulnerabilities such CVE-2021-3527.
> 
> I see. Please mention it in the commit description. There could be other
> reasons for this change, like minimizing stack usage, so I wasn't sure
> why.

Fair enough.

>>> This is a performance-critical code path and BITS_TO_LONGS(nvqs) is
>>> small.
>>
>> OK, having looked better at nvqs, I suppose this is preferred:
>>
>> -- >8 --
>> @@ -60,7 +60,7 @@ static void notify_guest_bh(void *opaque)
>>  {
>>  VirtIOBlockDataPlane *s = opaque;
>>  unsigned nvqs = s->conf->num_queues;
>> -unsigned long bitmap[BITS_TO_LONGS(nvqs)];
>> +unsigned long bitmap[BITS_TO_LONGS(VIRTIO_QUEUE_MAX)];
>>  unsigned j;
>>
>>  memcpy(bitmap, s->batch_notify_vqs, sizeof(bitmap));
>> ---
>>
>> Would that work for you?
> 
> It's a little risky since s->batch_notify_vqs does not have
> sizeof(bitmap). That makes uninitialized data and buffer overflows more
> likely. Your example has the bug:
> 
>   memcpy(bitmap, s->batch_notify_vqs, sizeof(bitmap));
>   ^^
>   Accesses beyond the end of s->batch_notify_vqs[].

Doh I missed that :/

> Can we eliminate bitmap[] entirely by using bitops.h APIs on
> s->batch_notify_vqs instead?

You meant the bitmap.h API I assume, OK, better idea!

Thanks,

Phil.

Re: [PATCH 06/23] hw/block/dataplane/virtio-blk: Avoid dynamic stack allocation

2021-05-06 Thread Stefan Hajnoczi

On Thu, May 06, 2021 at 11:01:47AM +0200, Philippe Mathieu-Daudé wrote:
> On 5/6/21 10:53 AM, Stefan Hajnoczi wrote:
> > On Wed, May 05, 2021 at 11:10:30PM +0200, Philippe Mathieu-Daudé wrote:
> >> Use autofree heap allocation instead of variable-length
> >> array on the stack.
> >>
> >> Signed-off-by: Philippe Mathieu-Daudé 
> >> ---
> >>  hw/block/dataplane/virtio-blk.c | 7 ---
> >>  1 file changed, 4 insertions(+), 3 deletions(-)
> > 
> > Why?
> 
> The motivation behind removing all variable-length allocations
> (and adding CPPFLAG+=-Wvla at the end) is to avoid security
> vulnerabilities such CVE-2021-3527.

I see. Please mention it in the commit description. There could be other
reasons for this change, like minimizing stack usage, so I wasn't sure
why.

> > This is a performance-critical code path and BITS_TO_LONGS(nvqs) is
> > small.
> 
> OK, having looked better at nvqs, I suppose this is preferred:
> 
> -- >8 --
> @@ -60,7 +60,7 @@ static void notify_guest_bh(void *opaque)
>  {
>  VirtIOBlockDataPlane *s = opaque;
>  unsigned nvqs = s->conf->num_queues;
> -unsigned long bitmap[BITS_TO_LONGS(nvqs)];
> +unsigned long bitmap[BITS_TO_LONGS(VIRTIO_QUEUE_MAX)];
>  unsigned j;
> 
>  memcpy(bitmap, s->batch_notify_vqs, sizeof(bitmap));
> ---
> 
> Would that work for you?

It's a little risky since s->batch_notify_vqs does not have
sizeof(bitmap). That makes uninitialized data and buffer overflows more
likely. Your example has the bug:

  memcpy(bitmap, s->batch_notify_vqs, sizeof(bitmap));
  ^^
  Accesses beyond the end of s->batch_notify_vqs[].

Can we eliminate bitmap[] entirely by using bitops.h APIs on
s->batch_notify_vqs instead?

Stefan


signature.asc
Description: PGP signature

[PATCH 3/5] block/stream: add own blk

2021-05-06 Thread Vladimir Sementsov-Ogievskiy

block-stream is the only block-job, that reasonably use BlockJob.blk.
We are going to drop BlockJob.blk soon. So, let block-stream have own
blk.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/stream.c | 24 ++--
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index 97bee482dc..01a8afc48c 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -33,6 +33,7 @@ enum {
 
 typedef struct StreamBlockJob {
 BlockJob common;
+BlockBackend *blk;
 BlockDriverState *base_overlay; /* COW overlay (stream from this) */
 BlockDriverState *above_base;   /* Node directly above the base */
 BlockDriverState *cor_filter_bs;
@@ -85,17 +86,18 @@ static int stream_prepare(Job *job)
 static void stream_clean(Job *job)
 {
 StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
-BlockJob *bjob = >common;
 
 if (s->cor_filter_bs) {
 bdrv_cor_filter_drop(s->cor_filter_bs);
 s->cor_filter_bs = NULL;
 }
 
+blk_unref(s->blk);
+s->blk = NULL;
+
 /* Reopen the image back in read-only mode if necessary */
 if (s->bs_read_only) {
 /* Give up write permissions before making it read-only */
-blk_set_perm(bjob->blk, 0, BLK_PERM_ALL, _abort);
 bdrv_reopen_set_read_only(s->target_bs, true, NULL);
 }
 
@@ -105,7 +107,6 @@ static void stream_clean(Job *job)
 static int coroutine_fn stream_run(Job *job, Error **errp)
 {
 StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
-BlockBackend *blk = s->common.blk;
 BlockDriverState *unfiltered_bs = bdrv_skip_filters(s->target_bs);
 int64_t len;
 int64_t offset = 0;
@@ -156,7 +157,7 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
 }
 trace_stream_one_iteration(s, offset, n, ret);
 if (copy) {
-ret = stream_populate(blk, offset, n);
+ret = stream_populate(s->blk, offset, n);
 }
 if (ret < 0) {
 BlockErrorAction action =
@@ -291,13 +292,24 @@ void stream_start(const char *job_id, BlockDriverState 
*bs,
 }
 
 s = block_job_create(job_id, _job_driver, NULL, cor_filter_bs,
- BLK_PERM_CONSISTENT_READ,
- basic_flags | BLK_PERM_WRITE,
+ 0, BLK_PERM_ALL,
  speed, creation_flags, NULL, NULL, errp);
 if (!s) {
 goto fail;
 }
 
+s->blk = blk_new_with_bs(cor_filter_bs, BLK_PERM_CONSISTENT_READ,
+ basic_flags | BLK_PERM_WRITE, errp);
+if (!s->blk) {
+goto fail;
+}
+/*
+ * Disable request queuing in the BlockBackend to avoid deadlocks on drain:
+ * The job reports that it's busy until it reaches a pause point.
+ */
+blk_set_disable_request_queuing(s->blk, true);
+blk_set_allow_aio_context_change(s->blk, true);
+
 /*
  * Prevent concurrent jobs trying to modify the graph structure here, we
  * already have our own plans. Also don't allow resize as the image size is
-- 
2.29.2

[PATCH 2/5] test-blockjob-txn: don't abuse job->blk

2021-05-06 Thread Vladimir Sementsov-Ogievskiy

Here we use job->blk to drop our own reference in job cleanup. Let's do
simpler: drop our reference immediately after job creation.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 tests/unit/test-blockjob-txn.c | 10 +-
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/tests/unit/test-blockjob-txn.c b/tests/unit/test-blockjob-txn.c
index 8bd13b9949..c69028b450 100644
--- a/tests/unit/test-blockjob-txn.c
+++ b/tests/unit/test-blockjob-txn.c
@@ -25,14 +25,6 @@ typedef struct {
 int *result;
 } TestBlockJob;
 
-static void test_block_job_clean(Job *job)
-{
-BlockJob *bjob = container_of(job, BlockJob, job);
-BlockDriverState *bs = blk_bs(bjob->blk);
-
-bdrv_unref(bs);
-}
-
 static int coroutine_fn test_block_job_run(Job *job, Error **errp)
 {
 TestBlockJob *s = container_of(job, TestBlockJob, common.job);
@@ -73,7 +65,6 @@ static const BlockJobDriver test_block_job_driver = {
 .free  = block_job_free,
 .user_resume   = block_job_user_resume,
 .run   = test_block_job_run,
-.clean = test_block_job_clean,
 },
 };
 
@@ -105,6 +96,7 @@ static BlockJob *test_block_job_start(unsigned int 
iterations,
 s = block_job_create(job_id, _block_job_driver, txn, bs,
  0, BLK_PERM_ALL, 0, JOB_DEFAULT,
  test_block_job_cb, data, _abort);
+bdrv_unref(bs); /* referenced by job now */
 s->iterations = iterations;
 s->use_timer = use_timer;
 s->rc = rc;
-- 
2.29.2

[PATCH 5/5] blockjob: drop BlockJob.blk field

2021-05-06 Thread Vladimir Sementsov-Ogievskiy

It's unused now (except for permission handling)[*]. The only reasonable
user of it was block-stream job, recently updated to use own blk. And
other block jobs prefer to use own source node related objects.

So, the arguments of dropping the field are:

 - block jobs prefer not to use it
 - block jobs usually has more then one node to operate on, and better
   to operate symmetrically (for example has both source and target
   blk's in specific block-job state structure)

*: BlockJob.blk is used to keep some permissions. We simply move
permissions to block-job child created in block_job_create() together
with blk.

In mirror, we just should not care anymore about restoring state of
blk. Most probably this code could be dropped long ago, after dropping
bs->job pointer. Now it finally goes away together with BlockJob.blk
itself.

iotest 141 output is updated, as "bdrv_has_blk(bs)" check in
qmp_blockdev_del() doesn't fail (we don't have blk now). Still, new
error message looks even better.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/blockjob.h   |  3 ---
 block/mirror.c |  7 ---
 blockjob.c | 19 ++-
 tests/qemu-iotests/141.out |  2 +-
 4 files changed, 3 insertions(+), 28 deletions(-)

diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index 3b84805140..87fbb3985f 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -43,9 +43,6 @@ typedef struct BlockJob {
 /** Data belonging to the generic Job infrastructure */
 Job job;
 
-/** The block device on which the job is operating.  */
-BlockBackend *blk;
-
 /** Status that is published by the query-block-jobs QMP API */
 BlockDeviceIoStatus iostatus;
 
diff --git a/block/mirror.c b/block/mirror.c
index 840b8e8c15..cbebcf8798 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -754,13 +754,6 @@ static int mirror_exit_common(Job *job)
 block_job_remove_all_bdrv(bjob);
 bdrv_replace_node(mirror_top_bs, mirror_top_bs->backing->bs, _abort);
 
-/* We just changed the BDS the job BB refers to (with either or both of the
- * bdrv_replace_node() calls), so switch the BB back so the cleanup does
- * the right thing. We don't need any permissions any more now. */
-blk_remove_bs(bjob->blk);
-blk_set_perm(bjob->blk, 0, BLK_PERM_ALL, _abort);
-blk_insert_bs(bjob->blk, mirror_top_bs, _abort);
-
 bs_opaque->job = NULL;
 
 bdrv_drained_end(src);
diff --git a/blockjob.c b/blockjob.c
index 2c168b93b0..70f3220ffd 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -86,7 +86,6 @@ void block_job_free(Job *job)
 BlockJob *bjob = container_of(job, BlockJob, job);
 
 block_job_remove_all_bdrv(bjob);
-blk_unref(bjob->blk);
 error_free(bjob->blocker);
 }
 
@@ -419,22 +418,15 @@ void *block_job_create(const char *job_id, const 
BlockJobDriver *driver,
uint64_t shared_perm, int64_t speed, int flags,
BlockCompletionFunc *cb, void *opaque, Error **errp)
 {
-BlockBackend *blk;
 BlockJob *job;
 
 if (job_id == NULL && !(flags & JOB_INTERNAL)) {
 job_id = bdrv_get_device_name(bs);
 }
 
-blk = blk_new_with_bs(bs, perm, shared_perm, errp);
-if (!blk) {
-return NULL;
-}
-
-job = job_create(job_id, >job_driver, txn, 
blk_get_aio_context(blk),
+job = job_create(job_id, >job_driver, txn, 
bdrv_get_aio_context(bs),
  flags, cb, opaque, errp);
 if (job == NULL) {
-blk_unref(blk);
 return NULL;
 }
 
@@ -442,8 +434,6 @@ void *block_job_create(const char *job_id, const 
BlockJobDriver *driver,
 assert(job->job.driver->free == _job_free);
 assert(job->job.driver->user_resume == _job_user_resume);
 
-job->blk = blk;
-
 job->finalize_cancelled_notifier.notify = block_job_event_cancelled;
 job->finalize_completed_notifier.notify = block_job_event_completed;
 job->pending_notifier.notify = block_job_event_pending;
@@ -460,15 +450,10 @@ void *block_job_create(const char *job_id, const 
BlockJobDriver *driver,
 
 error_setg(>blocker, "block device is in use by block job: %s",
job_type_str(>job));
-block_job_add_bdrv(job, "main node", bs, 0, BLK_PERM_ALL, _abort);
+block_job_add_bdrv(job, "main node", bs, perm, shared_perm, _abort);
 
 bdrv_op_unblock(bs, BLOCK_OP_TYPE_DATAPLANE, job->blocker);
 
-/* Disable request queuing in the BlockBackend to avoid deadlocks on drain:
- * The job reports that it's busy until it reaches a pause point. */
-blk_set_disable_request_queuing(blk, true);
-blk_set_allow_aio_context_change(blk, true);
-
 /* Only set speed when necessary to avoid NotSupported error */
 if (speed != 0) {
 if (!block_job_set_speed(job, speed, errp)) {
diff --git a/tests/qemu-iotests/141.out b/tests/qemu-iotests/141.out
index c4c15fb275..63203d9944 100644
--- a/tests/qemu-iotests/141.out
+++

[PATCH 4/5] test-bdrv-drain: don't use BlockJob.blk

2021-05-06 Thread Vladimir Sementsov-Ogievskiy

We are going to drop BlockJob.blk in further commit. For tests it's
enough to simply pass bs pointer.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 tests/unit/test-bdrv-drain.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
index 892f7f47d8..d7b155d13a 100644
--- a/tests/unit/test-bdrv-drain.c
+++ b/tests/unit/test-bdrv-drain.c
@@ -770,6 +770,7 @@ static void test_iothread_drain_subtree(void)
 
 typedef struct TestBlockJob {
 BlockJob common;
+BlockDriverState *bs;
 int run_ret;
 int prepare_ret;
 bool running;
@@ -781,7 +782,7 @@ static int test_job_prepare(Job *job)
 TestBlockJob *s = container_of(job, TestBlockJob, common.job);
 
 /* Provoke an AIO_WAIT_WHILE() call to verify there is no deadlock */
-blk_flush(s->common.blk);
+bdrv_flush(s->bs);
 return s->prepare_ret;
 }
 
@@ -790,7 +791,7 @@ static void test_job_commit(Job *job)
 TestBlockJob *s = container_of(job, TestBlockJob, common.job);
 
 /* Provoke an AIO_WAIT_WHILE() call to verify there is no deadlock */
-blk_flush(s->common.blk);
+bdrv_flush(s->bs);
 }
 
 static void test_job_abort(Job *job)
@@ -798,7 +799,7 @@ static void test_job_abort(Job *job)
 TestBlockJob *s = container_of(job, TestBlockJob, common.job);
 
 /* Provoke an AIO_WAIT_WHILE() call to verify there is no deadlock */
-blk_flush(s->common.blk);
+bdrv_flush(s->bs);
 }
 
 static int coroutine_fn test_job_run(Job *job, Error **errp)
@@ -913,6 +914,7 @@ static void test_blockjob_common_drain_node(enum drain_type 
drain_type,
 tjob = block_job_create("job0", _job_driver, NULL, src,
 0, BLK_PERM_ALL,
 0, 0, NULL, NULL, _abort);
+tjob->bs = src;
 job = >common;
 block_job_add_bdrv(job, "target", target, 0, BLK_PERM_ALL, _abort);
 
@@ -1535,6 +1537,7 @@ typedef struct TestDropBackingBlockJob {
 bool should_complete;
 bool *did_complete;
 BlockDriverState *detach_also;
+BlockDriverState *bs;
 } TestDropBackingBlockJob;
 
 static int coroutine_fn test_drop_backing_job_run(Job *job, Error **errp)
@@ -1554,7 +1557,7 @@ static void test_drop_backing_job_commit(Job *job)
 TestDropBackingBlockJob *s =
 container_of(job, TestDropBackingBlockJob, common.job);
 
-bdrv_set_backing_hd(blk_bs(s->common.blk), NULL, _abort);
+bdrv_set_backing_hd(s->bs, NULL, _abort);
 bdrv_set_backing_hd(s->detach_also, NULL, _abort);
 
 *s->did_complete = true;
@@ -1654,6 +1657,7 @@ static void test_blockjob_commit_by_drained_end(void)
 job = block_job_create("job", _drop_backing_job_driver, NULL,
bs_parents[2], 0, BLK_PERM_ALL, 0, 0, NULL, NULL,
_abort);
+job->bs = bs_parents[2];
 
 job->detach_also = bs_parents[0];
 job->did_complete = _has_completed;
-- 
2.29.2

[PATCH 1/5] blockjob: implement and use block_job_get_aio_context

2021-05-06 Thread Vladimir Sementsov-Ogievskiy

We are going to drop BlockJob.blk. So let's retrieve block job context
from underlying job instead of main node.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/blockjob.h | 7 +++
 blockdev.c   | 6 +++---
 blockjob.c   | 5 +
 qemu-img.c   | 2 +-
 4 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index d200f33c10..3b84805140 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -173,4 +173,11 @@ bool block_job_is_internal(BlockJob *job);
  */
 const BlockJobDriver *block_job_driver(BlockJob *job);
 
+/*
+ * block_job_get_aio_context:
+ *
+ * Returns aio context associated with a block job.
+ */
+AioContext *block_job_get_aio_context(BlockJob *job);
+
 #endif
diff --git a/blockdev.c b/blockdev.c
index 834c2304a1..18ed8cd03f 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3328,7 +3328,7 @@ static BlockJob *find_block_job(const char *id, 
AioContext **aio_context,
 return NULL;
 }
 
-*aio_context = blk_get_aio_context(job->blk);
+*aio_context = block_job_get_aio_context(job);
 aio_context_acquire(*aio_context);
 
 return job;
@@ -3433,7 +3433,7 @@ void qmp_block_job_finalize(const char *id, Error **errp)
  * automatically acquires the new one), so make sure we release the correct
  * one.
  */
-aio_context = blk_get_aio_context(job->blk);
+aio_context = block_job_get_aio_context(job);
 job_unref(>job);
 aio_context_release(aio_context);
 }
@@ -3710,7 +3710,7 @@ BlockJobInfoList *qmp_query_block_jobs(Error **errp)
 if (block_job_is_internal(job)) {
 continue;
 }
-aio_context = blk_get_aio_context(job->blk);
+aio_context = block_job_get_aio_context(job);
 aio_context_acquire(aio_context);
 value = block_job_query(job, errp);
 aio_context_release(aio_context);
diff --git a/blockjob.c b/blockjob.c
index 2fe1d788ba..2c168b93b0 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -534,3 +534,8 @@ BlockErrorAction block_job_error_action(BlockJob *job, 
BlockdevOnError on_err,
 }
 return action;
 }
+
+AioContext *block_job_get_aio_context(BlockJob *job)
+{
+return job->job.aio_context;
+}
diff --git a/qemu-img.c b/qemu-img.c
index a5993682aa..b690d3ae2b 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -900,7 +900,7 @@ static void common_block_job_cb(void *opaque, int ret)
 
 static void run_block_job(BlockJob *job, Error **errp)
 {
-AioContext *aio_context = blk_get_aio_context(job->blk);
+AioContext *aio_context = block_job_get_aio_context(job);
 int ret = 0;
 
 aio_context_acquire(aio_context);
-- 
2.29.2

[PATCH 0/5] block-job: drop BlockJob.blk

2021-05-06 Thread Vladimir Sementsov-Ogievskiy

Hi all!

Block jobs usually operate with several block nodes, and better to
handle them symmetrically, than use one from s->common.blk and one from
s->target (or something like this). Moreover, generic blockjob layer has
no use of BlockJob.blk. And more-moreover, most of block-jobs don't
really use this blk. Actually only block-stream use it.

I've started this thing (unbinding block-job and its main node) long
ago. First step was removing bs->job pointer in b23c580c946644b. Then
block_job_drain was dropped in bb0c94099382b5273.

Now let's finally drop job->blk pointer.

Vladimir Sementsov-Ogievskiy (5):
  blockjob: implement and use block_job_get_aio_context
  test-blockjob-txn: don't abuse job->blk
  block/stream: add own blk
  test-bdrv-drain: don't use BlockJob.blk
  blockjob: drop BlockJob.blk field

 include/block/blockjob.h   | 10 +++---
 block/mirror.c |  7 ---
 block/stream.c | 24 ++--
 blockdev.c |  6 +++---
 blockjob.c | 24 +++-
 qemu-img.c |  2 +-
 tests/unit/test-bdrv-drain.c   | 12 
 tests/unit/test-blockjob-txn.c | 10 +-
 tests/qemu-iotests/141.out |  2 +-
 9 files changed, 46 insertions(+), 51 deletions(-)

-- 
2.29.2

Re: [ANNOUNCE] libblkio v0.1.0 preview release

2021-05-06 Thread Stefan Hajnoczi

On Thu, May 06, 2021 at 12:33:24PM +0200, Kevin Wolf wrote:
> Am 06.05.2021 um 10:46 hat Stefan Hajnoczi geschrieben:
> > On Wed, May 05, 2021 at 06:46:36PM +0200, Kevin Wolf wrote:
> > > Am 05.05.2021 um 18:19 hat Stefan Hajnoczi geschrieben:
> > > > On Tue, May 04, 2021 at 03:44:23PM +0200, Kevin Wolf wrote:
> > > > > Am 30.04.2021 um 17:49 hat Stefan Hajnoczi geschrieben:
> > > > > > On Thu, Apr 29, 2021 at 05:51:16PM +0200, Kevin Wolf wrote:
> > > > > > > Am 29.04.2021 um 16:05 hat Stefan Hajnoczi geschrieben:
> > > > > There is one more thing I'm wondering right now: Why do we have 
> > > > > separate
> > > > > states for connecting to the backend (created) and configuring it
> > > > > (initialized)? The property setters check for the right state, but 
> > > > > they
> > > > > don't really do anything that wouldn't be possible in the other state.
> > > > > A state machine exposed as two boolean rather than a tristate enum 
> > > > > feels
> > > > > a bit awkward, too, but even more so if two states could possibly be
> > > > > enough.
> > > > > 
> > > > > The reason why I'm asking is that in order to address my points, it
> > > > > would be best to have separate property accessors for each state, and
> > > > > two pairs of accessors would make property declarations more managable
> > > > > than three pairs.
> > > > 
> > > > There is no need to have separate boolean properties, it's just how I
> > > > implemented it. There could be a single state property. For example, a
> > > > string that is "uninitialized", "initialized", and "started".
> > > 
> > > Right. I think this would make the way the API works a bit more obvious,
> > > though it's really only a small detail.
> > > 
> > > However, my real question was why we even need those three distinct
> > > states. From what I could see so far in the io_uring implemtation,
> > > "uninitialized" and "started" would be enough. Everything that can be
> > > configured in "initialized", could just as well be configured in
> > > "uninitialized" and the state transition would both open the image file
> > > and apply the configuration in a single step.
> > > 
> > > Do you have intentions to add features in the future where the three
> > > separate states are actually needed because the user needs to be able to
> > > do something while the image file is already opened, but queue
> > > processing hasn't started yet?
> > 
> > Yes, three states will be needed. Think of a virtio-blk driver where a
> > VFIO PCI bus address or a vhost-user-blk UNIX domain socket path are
> > needed to connect to the device. After connection completes you then
> > have access to the block queue limits, the number of queues, etc. Once
> > those things are configured the device can be started.
> 
> You mean, the user of the library would first query some limits before
> deciding on the right values to use for some properties?
> 
> When the values come directly from the command line, this won't be the
> case anyway, but I agree there may be cases where not the user, but the
> application decides and then the separation would be helpful.

Yes, for example the number of queues. After attaching to the device the
application can determine the maximum number of queues. It then has a
chance to lower the number of queues if it wants.

> > > > > > - So many functions! This makes understanding the API harder.
> > > > > > 
> > > > > > - Very verbose. The function and type names get long and there is a 
> > > > > > lot
> > > > > >   of repetition in the API.
> > > > > 
> > > > > I think it wouldn't be too bad if all drivers exposed the same
> > > > > properties, but you're explicitly expecting driver-specific 
> > > > > properties.
> > > > > If drivers add an external APIs that just fail for other drivers, it
> > > > > would indeed make understanding the API much harder.
> > > > > 
> > > > > We could consider a mix where you would first create a configuration
> > > > > object, then use the generic property functions to set options for it
> > > > > and finally have a separate blkio_initialize() function where you turn
> > > > > that config into a struct blkio that is needed to actually do I/O (and
> > > > > also supports generic property functions for runtime option updates).
> > > > > 
> > > > > I'm not sure it provides much except making the state machine more
> > > > > prominent than just two random bool properties.
> > > > 
> > > > I prefer to keep the configuration public API as it is. We can change
> > > > the properties.rs implementation however we want though.
> > > > 
> > > > Do you think the public API should be a typestate API instead with
> > > > struct blkio_init_info, struct blkio_start_info, and struct blkio
> > > > expressing the 3 states instead?
> > > 
> > > I just mentioned it as a theoretical option for a middle ground. I'm not
> > > sure if it's a good idea, and probably not worth the effort to change
> > > it.
> > > 
> > > Maybe I would give the state transitions a separate

Re: [PATCH 11/23] hw/intc/xics: Avoid dynamic stack allocation

2021-05-06 Thread Philippe Mathieu-Daudé

On 5/6/21 10:22 AM, Greg Kurz wrote:
> On Wed,  5 May 2021 23:10:35 +0200
> Philippe Mathieu-Daudé  wrote:
> 
>> Use autofree heap allocation instead of variable-length
>> array on the stack.
>>
>> Signed-off-by: Philippe Mathieu-Daudé 
>> ---
>>  hw/intc/xics.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)

>> +g_autofree uint8_t *flags = g_malloc(ics->nr_irqs);
> 
> I would have made it g_new(uint8_t, ics->nr_irqs) so that changes
> in the type of 'flags' that could potentially change the allocated
> size are safely detected.

OK, will update.

> This is unlikely though, so:
> 
> Reviewed-by: Greg Kurz 

Thanks!

Re: [ANNOUNCE] libblkio v0.1.0 preview release

2021-05-06 Thread Kevin Wolf

Am 06.05.2021 um 10:46 hat Stefan Hajnoczi geschrieben:
> On Wed, May 05, 2021 at 06:46:36PM +0200, Kevin Wolf wrote:
> > Am 05.05.2021 um 18:19 hat Stefan Hajnoczi geschrieben:
> > > On Tue, May 04, 2021 at 03:44:23PM +0200, Kevin Wolf wrote:
> > > > Am 30.04.2021 um 17:49 hat Stefan Hajnoczi geschrieben:
> > > > > On Thu, Apr 29, 2021 at 05:51:16PM +0200, Kevin Wolf wrote:
> > > > > > Am 29.04.2021 um 16:05 hat Stefan Hajnoczi geschrieben:
> > > > There is one more thing I'm wondering right now: Why do we have separate
> > > > states for connecting to the backend (created) and configuring it
> > > > (initialized)? The property setters check for the right state, but they
> > > > don't really do anything that wouldn't be possible in the other state.
> > > > A state machine exposed as two boolean rather than a tristate enum feels
> > > > a bit awkward, too, but even more so if two states could possibly be
> > > > enough.
> > > > 
> > > > The reason why I'm asking is that in order to address my points, it
> > > > would be best to have separate property accessors for each state, and
> > > > two pairs of accessors would make property declarations more managable
> > > > than three pairs.
> > > 
> > > There is no need to have separate boolean properties, it's just how I
> > > implemented it. There could be a single state property. For example, a
> > > string that is "uninitialized", "initialized", and "started".
> > 
> > Right. I think this would make the way the API works a bit more obvious,
> > though it's really only a small detail.
> > 
> > However, my real question was why we even need those three distinct
> > states. From what I could see so far in the io_uring implemtation,
> > "uninitialized" and "started" would be enough. Everything that can be
> > configured in "initialized", could just as well be configured in
> > "uninitialized" and the state transition would both open the image file
> > and apply the configuration in a single step.
> > 
> > Do you have intentions to add features in the future where the three
> > separate states are actually needed because the user needs to be able to
> > do something while the image file is already opened, but queue
> > processing hasn't started yet?
> 
> Yes, three states will be needed. Think of a virtio-blk driver where a
> VFIO PCI bus address or a vhost-user-blk UNIX domain socket path are
> needed to connect to the device. After connection completes you then
> have access to the block queue limits, the number of queues, etc. Once
> those things are configured the device can be started.

You mean, the user of the library would first query some limits before
deciding on the right values to use for some properties?

When the values come directly from the command line, this won't be the
case anyway, but I agree there may be cases where not the user, but the
application decides and then the separation would be helpful.

> > > > > > Alternatives in QEMU are qdev properties (which are internally QOM
> > > > > > properties, but provide default implementations and are at least
> > > > > > automatically read-only after realize, avoiding that whole class of
> > > > > > bugs) and QAPI.
> > > > > > If this was QEMU code, I would of course go for QAPI, but a library 
> > > > > > is
> > > > > > something different and adding the code generator would probably be 
> > > > > > a
> > > > > > bit too much anyway. But the idea in the resulting code would be 
> > > > > > dealing
> > > > > > with native structs instead of a bunch of function calls. This would
> > > > > > probably be the least error prone way for the implementation, but of
> > > > > > course, it would make binary compatibility a bit harder when adding 
> > > > > > new
> > > > > > properties.
> > > > > 
> > > > > An alternative I considered was the typestate and builder patterns:
> > > > > 
> > > > >   /* Create a new io_uring driver in the uninitialized state */
> > > > >   struct blkio_iou_uninit *blkio_new_io_uring(void);
> > > > > 
> > > > >   /* Uninitialized state property setters */
> > > > >   int blkio_iou_uninit_set_path(struct blkio_iou_uninit *u,
> > > > > const char *path);
> > > > >   int blkio_iou_uninit_set_direct(struct blkio_iou_uninit *u,
> > > > >   bool o_direct);
> > > > >   ...
> > > > > 
> > > > >   /* Transition to initialized state. Frees u on success. */
> > > > >   struct blkio_iou_init *blkio_iou_init(struct blkio_iou_uninit *u);
> > > > > 
> > > > >   /* Initialized state property setters/getters */
> > > > >   int blkio_iou_init_get_capacity(struct blkio_iou_init *i,
> > > > >   uint64_t *capacity);
> > > > >   ...
> > > > > 
> > > > >   /* Transition to started state. Frees i on success. */
> > > > >   struct blkio_iou_started *blkio_iou_start(struct blkio_iou_init *i);
> > > > > 
> > > > >   ...
> > > > > 
> > > > >   /* Transition back to initialized state. Frees s on success. */
> > > > >

Re: [PATCH v2 3/3] qapi: deprecate drive-backup

2021-05-06 Thread Kashyap Chamarthy

On Wed, May 05, 2021 at 04:58:03PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Modern way is using blockdev-add + blockdev-backup, which provides a
> lot more control on how target is opened.
> 
> As example of drive-backup problems consider the following:
> 
> User of drive-backup expects that target will be opened in the same
> cache and aio mode as source. Corresponding logic is in
> drive_backup_prepare(), where we take bs->open_flags of source.
> 
> It works rather bad if source was added by blockdev-add. Assume source
> is qcow2 image. On blockdev-add we should specify aio and cache options
> for file child of qcow2 node. What happens next:
> 
> drive_backup_prepare() looks at bs->open_flags of qcow2 source node.
> But there no BDRV_O_NOCAHE neither BDRV_O_NATIVE_AIO: BDRV_O_NOCAHE is
> places in bs->file->bs->open_flags, and BDRV_O_NATIVE_AIO is nowhere,
> as file-posix parse options and simply set s->use_linux_aio.
> 
> The documentation is updated in a minimal way, so that drive-backup is
> noted only as a deprecated command, and blockdev-backup used in most of
> places.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
> 
> TODO: We also need to deprecate drive-backup transaction action..
> But union members in QAPI doesn't support 'deprecated' feature. I tried
> to dig a bit, but failed :/ Markus, could you please help with it? At
> least by advice?
> 
>  docs/interop/live-block-operations.rst | 47 +-
>  docs/system/deprecated.rst | 11 ++
>  qapi/block-core.json   |  5 ++-

The core changes itself looks good; I have some minor nit-picks below,
hope that's not annoying. :-)

With those addressed:

Reviewed-by: Kashyap Chamarthy 

>  3 files changed, 46 insertions(+), 17 deletions(-)
> 
> diff --git a/docs/interop/live-block-operations.rst 
> b/docs/interop/live-block-operations.rst
> index 1073b930dc..f71f79ae2a 100644
> --- a/docs/interop/live-block-operations.rst
> +++ b/docs/interop/live-block-operations.rst
> @@ -116,8 +116,8 @@ QEMU block layer supports.
>  (3) ``drive-mirror`` (and ``blockdev-mirror``): Synchronize a running
>  disk to another image.
>  
> -(4) ``drive-backup`` (and ``blockdev-backup``): Point-in-time (live) copy
> -of a block device to a destination.
> +(4) ``blockdev-backup`` (and deprecated ``drive-backup``): Point-in-time
> +(live) copy of a block device to a destination.

nit: s/deprecated ``drive-backup``/the deprecated ``drive-backup``/  

>  
>  .. _`Interacting with a QEMU instance`:
> @@ -553,13 +553,14 @@ Currently, there are four different kinds:
>  
>  (3) ``none`` -- Synchronize only the new writes from this point on.
>  
> -.. note:: In the case of ``drive-backup`` (or ``blockdev-backup``),
> -  the behavior of ``none`` synchronization mode is different.
> -  Normally, a ``backup`` job consists of two parts: Anything
> -  that is overwritten by the guest is first copied out to
> -  the backup, and in the background the whole image is
> -  copied from start to end. With ``sync=none``, it's only
> -  the first part.
> +.. note:: In the case of ``blockdev-backup`` (or deprecated
> +  ``drive-backup``), the behavior of ``none``
> +  synchronization mode is different.  Normally, a
> +  ``backup`` job consists of two parts: Anything that is
> +  overwritten by the guest is first copied out to the
> +  backup, and in the background the whole image is copied
> +  from start to end. With ``sync=none``, it's only the
> +  first part.
>  
>  (4) ``incremental`` -- Synchronize content that is described by the
>  dirty bitmap
> @@ -924,19 +925,22 @@ Shutdown the guest, by issuing the ``quit`` QMP 
> command::
>  }
>  
>  
> -Live disk backup --- ``drive-backup`` and ``blockdev-backup``
> --
> +Live disk backup --- ``blockdev-backup`` and deprecated``drive-backup``
> +---

Here too, missing the article "the": "the deprecated".

> -The ``drive-backup`` (and its newer equivalent ``blockdev-backup``) allows
> +The ``blockdev-backup`` (and deprecated ``drive-backup``) allows
>  you to create a point-in-time snapshot.
>  
> -In this case, the point-in-time is when you *start* the ``drive-backup``
> -(or its newer equivalent ``blockdev-backup``) command.
> +In this case, the point-in-time is when you *start* the
> +``blockdev-backup`` (or deprecated ``drive-backup``) command.
>  
>  
>  QMP invocation for ``drive-backup``
>  ~~~
>  
> +Note that ``drive-backup`` command is deprecated since Qemu 6.1 and
> +will be removed in future.

nit: Let's consistently spell QEMU in all caps, please: s/Qemu/QEMU/

>  Yet again, starting afresh with our example disk image chain::
>

Re: [PATCH v2 2/3] docs/interop/bitmaps: use blockdev-backup

2021-05-06 Thread Kashyap Chamarthy

On Wed, May 05, 2021 at 04:58:02PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> We are going to deprecate drive-backup, so use modern interface here.
> In examples where target image creation is shown, show blockdev-add as
> well. If target creation omitted, omit blockdev-add as well.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  docs/interop/bitmaps.rst | 285 +--
>  1 file changed, 215 insertions(+), 70 deletions(-)

Looks fine to me.  I have a couple of nits below, perhaps whoever is
applying the patch can tweak them.  FWIW:

Reviewed-by: Kashyap Chamarthy 

> diff --git a/docs/interop/bitmaps.rst b/docs/interop/bitmaps.rst
> index 059ad67929..ef95090c81 100644
> --- a/docs/interop/bitmaps.rst
> +++ b/docs/interop/bitmaps.rst
> @@ -539,12 +539,11 @@ other partial disk images on top of a base image to 
> reconstruct a full backup
>  from the point in time at which the incremental backup was issued.
>  
>  The "Push Model" here references the fact that QEMU is "pushing" the modified
> -blocks out to a destination. We will be using the `drive-backup
> -`_ and `blockdev-backup
> -`_ QMP commands to create both
> +blocks out to a destination. We will be using the  `blockdev-backup
> +`_ QMP command to create both
>  full and incremental backups.
>  
> -Both of these commands are jobs, which have their own QMP API for querying 
> and
> +The command is a job, which has its own QMP API for querying and

nit: s/job/background job/

>  management documented in `Background jobs
>  `_.
>  
> @@ -557,6 +556,10 @@ create a new incremental backup chain attached to a 
> drive.
>  This example creates a new, full backup of "drive0" and accompanies it with a
>  new, empty bitmap that records writes from this point in time forward.
>  
> +The target may be created with help of `blockdev-add

It is less ambiguous (at least to my eyes) to replace "may" with "can".

nit: s/with help of/with the help of/

> +`_ or `blockdev-create
> +`_ command.
> +
>  .. note:: Any new writes that happen after this command is issued, even while
>the backup job runs, will be written locally and not to the backup
>destination. These writes will be recorded in the bitmap
  
[...]

-- 
/kashyap

Re: [PATCH v4 4/6] block: Support multiple reopening with x-blockdev-reopen

2021-05-06 Thread Vladimir Sementsov-Ogievskiy


05.05.2021 19:18, Kevin Wolf wrote:

And then, bdrv_reopen_multiple() is called with no aio context
acquired, and no drained section.. And it shoud be refactored to
properly operate with acquiring and realeasing the contexts and
drained sections when needed...

The drained section shouldn't be a problem, we can keep this. In fact,
we need this, it is a documented requirement of bdrv_reopen_multiple().
We just need to drop the AioContext lock after drained_begin.

bdrv_reopen_multiple() should really document its requirements regarding
AioContext locks. It probably doesn't really care, but requires that
it be called from the main thread.


(note preexisting problem of reopen, that during reopen the whole tree
may be moved to another aio context, but we continue operations with
acquired old aio context which is wrong).

Do we even acquire the old AioContext?



I think, I mean aio_context_acquire(ctx);  in current qmp_x_blockdev_reopen()..

And I remember, that you said about that problem. Still, may be I misunderstood 
(or misremember). Doesn't worth digging, I'd prefer just review v5 with a fresh 
eye when it comes.


--
Best regards,
Vladimir

Re: [PATCH v2 1/8] block: prepare write threshold code for thread safety

2021-05-06 Thread Vladimir Sementsov-Ogievskiy


06.05.2021 12:04, Stefan Hajnoczi wrote:

On Wed, May 05, 2021 at 12:54:09PM +0300, Vladimir Sementsov-Ogievskiy wrote:

Hi Stefan!

Note my "[PATCH v2 0/9] block: refactor write threshold", it's a kind of 
counter-proposal for first half of this series.


Thanks! Emanuele mentioned he will drop his patches.

I took a look at your series and that approach looks good to me.



Note, I've sent "[PATCH v3 0/8] block: refactor write threshold" a moment ago.


--
Best regards,
Vladimir

[PATCH v3 8/8] write-threshold: deal with includes

2021-05-06 Thread Vladimir Sementsov-Ogievskiy

"qemu/typedefs.h" is enough for include/block/write-threshold.h header
with forward declaration of BlockDriverState. Also drop extra includes
from block/write-threshold.c and tests/unit/test-write-threshold.c

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/write-threshold.h   | 2 +-
 block/write-threshold.c   | 2 --
 tests/unit/test-write-threshold.c | 1 -
 3 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/include/block/write-threshold.h b/include/block/write-threshold.h
index 28c35a1c05..e3acbc249d 100644
--- a/include/block/write-threshold.h
+++ b/include/block/write-threshold.h
@@ -13,7 +13,7 @@
 #ifndef BLOCK_WRITE_THRESHOLD_H
 #define BLOCK_WRITE_THRESHOLD_H
 
-#include "block/block_int.h"
+#include "qemu/typedefs.h"
 
 /*
  * bdrv_write_threshold_set:
diff --git a/block/write-threshold.c b/block/write-threshold.c
index 8b46bb9a75..7578c0599a 100644
--- a/block/write-threshold.c
+++ b/block/write-threshold.c
@@ -13,10 +13,8 @@
 
 #include "qemu/osdep.h"
 #include "block/block_int.h"
-#include "qemu/coroutine.h"
 #include "block/write-threshold.h"
 #include "qemu/atomic.h"
-#include "qemu/notify.h"
 #include "qapi/error.h"
 #include "qapi/qapi-commands-block-core.h"
 #include "qapi/qapi-events-block-core.h"
diff --git a/tests/unit/test-write-threshold.c 
b/tests/unit/test-write-threshold.c
index 49b1ef7a20..0158e4637a 100644
--- a/tests/unit/test-write-threshold.c
+++ b/tests/unit/test-write-threshold.c
@@ -7,7 +7,6 @@
  */
 
 #include "qemu/osdep.h"
-#include "qapi/error.h"
 #include "block/block_int.h"
 #include "block/write-threshold.h"
 
-- 
2.29.2

[PATCH v3 4/8] block/write-threshold: drop extra APIs

2021-05-06 Thread Vladimir Sementsov-Ogievskiy

bdrv_write_threshold_exceeded() is unused at all.

bdrv_write_threshold_is_set() is used only to double check the value of
bs->write_threshold_offset in tests. No real sense in it (both tests do
check real value with help of bdrv_write_threshold_get())

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
 include/block/write-threshold.h   | 24 
 block/write-threshold.c   | 19 ---
 tests/unit/test-write-threshold.c |  4 
 3 files changed, 47 deletions(-)

diff --git a/include/block/write-threshold.h b/include/block/write-threshold.h
index 555afd0de6..c60b9954cd 100644
--- a/include/block/write-threshold.h
+++ b/include/block/write-threshold.h
@@ -35,30 +35,6 @@ void bdrv_write_threshold_set(BlockDriverState *bs, uint64_t 
threshold_bytes);
  */
 uint64_t bdrv_write_threshold_get(const BlockDriverState *bs);
 
-/*
- * bdrv_write_threshold_is_set
- *
- * Tell if a write threshold is set for a given BDS.
- */
-bool bdrv_write_threshold_is_set(const BlockDriverState *bs);
-
-/*
- * bdrv_write_threshold_exceeded
- *
- * Return the extent of a write request that exceeded the threshold,
- * or zero if the request is below the threshold.
- * Return zero also if the threshold was not set.
- *
- * NOTE: here we assume the following holds for each request this code
- * deals with:
- *
- * assert((req->offset + req->bytes) <= UINT64_MAX)
- *
- * Please not there is *not* an actual C assert().
- */
-uint64_t bdrv_write_threshold_exceeded(const BlockDriverState *bs,
-   const BdrvTrackedRequest *req);
-
 /*
  * bdrv_write_threshold_check_write
  *
diff --git a/block/write-threshold.c b/block/write-threshold.c
index 71df3c434f..65a6acd142 100644
--- a/block/write-threshold.c
+++ b/block/write-threshold.c
@@ -24,25 +24,6 @@ uint64_t bdrv_write_threshold_get(const BlockDriverState *bs)
 return bs->write_threshold_offset;
 }
 
-bool bdrv_write_threshold_is_set(const BlockDriverState *bs)
-{
-return bs->write_threshold_offset > 0;
-}
-
-uint64_t bdrv_write_threshold_exceeded(const BlockDriverState *bs,
-   const BdrvTrackedRequest *req)
-{
-if (bdrv_write_threshold_is_set(bs)) {
-if (req->offset > bs->write_threshold_offset) {
-return (req->offset - bs->write_threshold_offset) + req->bytes;
-}
-if ((req->offset + req->bytes) > bs->write_threshold_offset) {
-return (req->offset + req->bytes) - bs->write_threshold_offset;
-}
-}
-return 0;
-}
-
 void bdrv_write_threshold_set(BlockDriverState *bs, uint64_t threshold_bytes)
 {
 bs->write_threshold_offset = threshold_bytes;
diff --git a/tests/unit/test-write-threshold.c 
b/tests/unit/test-write-threshold.c
index fd40a815b8..bb5c1a5217 100644
--- a/tests/unit/test-write-threshold.c
+++ b/tests/unit/test-write-threshold.c
@@ -18,8 +18,6 @@ static void test_threshold_not_set_on_init(void)
 BlockDriverState bs;
 memset(, 0, sizeof(bs));
 
-g_assert(!bdrv_write_threshold_is_set());
-
 res = bdrv_write_threshold_get();
 g_assert_cmpint(res, ==, 0);
 }
@@ -33,8 +31,6 @@ static void test_threshold_set_get(void)
 
 bdrv_write_threshold_set(, threshold);
 
-g_assert(bdrv_write_threshold_is_set());
-
 res = bdrv_write_threshold_get();
 g_assert_cmpint(res, ==, threshold);
 }
-- 
2.29.2

[PATCH v3 7/8] test-write-threshold: drop extra TestStruct structure

2021-05-06 Thread Vladimir Sementsov-Ogievskiy

We don't need this extra logic: it doesn't make code simpler.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
 tests/unit/test-write-threshold.c | 20 +++-
 1 file changed, 3 insertions(+), 17 deletions(-)

diff --git a/tests/unit/test-write-threshold.c 
b/tests/unit/test-write-threshold.c
index 9e9986aefc..49b1ef7a20 100644
--- a/tests/unit/test-write-threshold.c
+++ b/tests/unit/test-write-threshold.c
@@ -37,26 +37,12 @@ static void test_threshold_trigger(void)
 g_assert_cmpuint(bdrv_write_threshold_get(), ==, 0);
 }
 
-typedef struct TestStruct {
-const char *name;
-void (*func)(void);
-} TestStruct;
-
 
 int main(int argc, char **argv)
 {
-size_t i;
-TestStruct tests[] = {
-{ "/write-threshold/not-trigger",
-  test_threshold_not_trigger },
-{ "/write-threshold/trigger",
-  test_threshold_trigger },
-{ NULL, NULL }
-};
-
 g_test_init(, , NULL);
-for (i = 0; tests[i].name != NULL; i++) {
-g_test_add_func(tests[i].name, tests[i].func);
-}
+g_test_add_func("/write-threshold/not-trigger", 
test_threshold_not_trigger);
+g_test_add_func("/write-threshold/trigger", test_threshold_trigger);
+
 return g_test_run();
 }
-- 
2.29.2

[PATCH v3 1/8] block/write-threshold: don't use write notifiers

2021-05-06 Thread Vladimir Sementsov-Ogievskiy

write-notifiers are used only for write-threshold. New code for such
purpose should create filters.

Let's better special-case write-threshold and drop write notifiers at
all. (Actually, write-threshold is special-cased anyway, as the only
user of write-notifiers)

So, create a new direct interface for bdrv_co_write_req_prepare() and
drop all write-notifier related logic from write-threshold.c.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
 include/block/block_int.h   |  1 -
 include/block/write-threshold.h |  9 +
 block/io.c  |  5 ++-
 block/write-threshold.c | 70 +++--
 4 files changed, 27 insertions(+), 58 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index c823f5b1b3..51f51286a5 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -959,7 +959,6 @@ struct BlockDriverState {
 
 /* threshold limit for writes, in bytes. "High water mark". */
 uint64_t write_threshold_offset;
-NotifierWithReturn write_threshold_notifier;
 
 /* Writing to the list requires the BQL _and_ the dirty_bitmap_mutex.
  * Reading from the list can be done with either the BQL or the
diff --git a/include/block/write-threshold.h b/include/block/write-threshold.h
index c646f267a4..555afd0de6 100644
--- a/include/block/write-threshold.h
+++ b/include/block/write-threshold.h
@@ -59,4 +59,13 @@ bool bdrv_write_threshold_is_set(const BlockDriverState *bs);
 uint64_t bdrv_write_threshold_exceeded(const BlockDriverState *bs,
const BdrvTrackedRequest *req);
 
+/*
+ * bdrv_write_threshold_check_write
+ *
+ * Check whether the specified request exceeds the write threshold.
+ * If it is, send corresponding event and disable write threshold checking.
+ */
+void bdrv_write_threshold_check_write(BlockDriverState *bs, int64_t offset,
+  int64_t bytes);
+
 #endif
diff --git a/block/io.c b/block/io.c
index 35b6c56efc..3520de51bb 100644
--- a/block/io.c
+++ b/block/io.c
@@ -30,6 +30,7 @@
 #include "block/blockjob_int.h"
 #include "block/block_int.h"
 #include "block/coroutines.h"
+#include "block/write-threshold.h"
 #include "qemu/cutils.h"
 #include "qapi/error.h"
 #include "qemu/error-report.h"
@@ -2008,8 +2009,8 @@ bdrv_co_write_req_prepare(BdrvChild *child, int64_t 
offset, int64_t bytes,
 } else {
 assert(child->perm & BLK_PERM_WRITE);
 }
-return notifier_with_return_list_notify(>before_write_notifiers,
-req);
+bdrv_write_threshold_check_write(bs, offset, bytes);
+return 0;
 case BDRV_TRACKED_TRUNCATE:
 assert(child->perm & BLK_PERM_RESIZE);
 return 0;
diff --git a/block/write-threshold.c b/block/write-threshold.c
index 85b78dc2a9..71df3c434f 100644
--- a/block/write-threshold.c
+++ b/block/write-threshold.c
@@ -29,14 +29,6 @@ bool bdrv_write_threshold_is_set(const BlockDriverState *bs)
 return bs->write_threshold_offset > 0;
 }
 
-static void write_threshold_disable(BlockDriverState *bs)
-{
-if (bdrv_write_threshold_is_set(bs)) {
-notifier_with_return_remove(>write_threshold_notifier);
-bs->write_threshold_offset = 0;
-}
-}
-
 uint64_t bdrv_write_threshold_exceeded(const BlockDriverState *bs,
const BdrvTrackedRequest *req)
 {
@@ -51,55 +43,9 @@ uint64_t bdrv_write_threshold_exceeded(const 
BlockDriverState *bs,
 return 0;
 }
 
-static int coroutine_fn before_write_notify(NotifierWithReturn *notifier,
-void *opaque)
-{
-BdrvTrackedRequest *req = opaque;
-BlockDriverState *bs = req->bs;
-uint64_t amount = 0;
-
-amount = bdrv_write_threshold_exceeded(bs, req);
-if (amount > 0) {
-qapi_event_send_block_write_threshold(
-bs->node_name,
-amount,
-bs->write_threshold_offset);
-
-/* autodisable to avoid flooding the monitor */
-write_threshold_disable(bs);
-}
-
-return 0; /* should always let other notifiers run */
-}
-
-static void write_threshold_register_notifier(BlockDriverState *bs)
-{
-bs->write_threshold_notifier.notify = before_write_notify;
-bdrv_add_before_write_notifier(bs, >write_threshold_notifier);
-}
-
-static void write_threshold_update(BlockDriverState *bs,
-   int64_t threshold_bytes)
-{
-bs->write_threshold_offset = threshold_bytes;
-}
-
 void bdrv_write_threshold_set(BlockDriverState *bs, uint64_t threshold_bytes)
 {
-if (bdrv_write_threshold_is_set(bs)) {
-if (threshold_bytes > 0) {
-write_threshold_update(bs, threshold_bytes);
-} else {
-write_threshold_disable(bs);
-}
-} else {
-if (threshold_bytes > 0) {
-/* avoid multiple registration */
-

[PATCH v3 5/8] block/write-threshold: don't use aio context lock

2021-05-06 Thread Vladimir Sementsov-Ogievskiy

Instead of relying on aio context lock, let's make use of atomic
operations.

The tricky place is bdrv_write_threshold_check_write(): we want
atomically unset bs->write_threshold_offset iff
  offset + bytes > bs->write_threshold_offset
We don't have such atomic operation, so let's go in a loop:

1. fetch wtr atomically
2. if condition satisfied, try cmpxchg (if not satisfied, we are done,
   don't send event)
3. if cmpxchg succeeded, we are done (send event), else go to [1]

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/block_int.h   |  6 +-
 include/block/write-threshold.h |  6 ++
 block/write-threshold.c | 34 +
 3 files changed, 29 insertions(+), 17 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index eab352f363..e3f3d79f5b 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -954,7 +954,11 @@ struct BlockDriverState {
  */
 int64_t total_sectors;
 
-/* threshold limit for writes, in bytes. "High water mark". */
+/*
+ * Threshold limit for writes, in bytes. "High water mark".
+ * Don't access directly, use bdrw_write_threshold* interface.
+ * Protected by atomic access, no lock is needed.
+ */
 uint64_t write_threshold_offset;
 
 /* Writing to the list requires the BQL _and_ the dirty_bitmap_mutex.
diff --git a/include/block/write-threshold.h b/include/block/write-threshold.h
index c60b9954cd..28c35a1c05 100644
--- a/include/block/write-threshold.h
+++ b/include/block/write-threshold.h
@@ -24,6 +24,8 @@
  * To be used with thin-provisioned block devices.
  *
  * Use threshold_bytes == 0 to disable.
+ *
+ * Function is thread-safe, no lock is needed.
  */
 void bdrv_write_threshold_set(BlockDriverState *bs, uint64_t threshold_bytes);
 
@@ -32,6 +34,8 @@ void bdrv_write_threshold_set(BlockDriverState *bs, uint64_t 
threshold_bytes);
  *
  * Get the configured write threshold, in bytes.
  * Zero means no threshold configured.
+ *
+ * Function is thread-safe, no lock is needed.
  */
 uint64_t bdrv_write_threshold_get(const BlockDriverState *bs);
 
@@ -40,6 +44,8 @@ uint64_t bdrv_write_threshold_get(const BlockDriverState *bs);
  *
  * Check whether the specified request exceeds the write threshold.
  * If it is, send corresponding event and disable write threshold checking.
+ *
+ * Function is thread-safe, no lock is needed.
  */
 void bdrv_write_threshold_check_write(BlockDriverState *bs, int64_t offset,
   int64_t bytes);
diff --git a/block/write-threshold.c b/block/write-threshold.c
index 65a6acd142..8b46bb9a75 100644
--- a/block/write-threshold.c
+++ b/block/write-threshold.c
@@ -2,6 +2,7 @@
  * QEMU System Emulator block write threshold notification
  *
  * Copyright Red Hat, Inc. 2014
+ * Copyright (c) 2021 Virtuozzo International GmbH.
  *
  * Authors:
  *  Francesco Romani 
@@ -14,6 +15,7 @@
 #include "block/block_int.h"
 #include "qemu/coroutine.h"
 #include "block/write-threshold.h"
+#include "qemu/atomic.h"
 #include "qemu/notify.h"
 #include "qapi/error.h"
 #include "qapi/qapi-commands-block-core.h"
@@ -21,45 +23,45 @@
 
 uint64_t bdrv_write_threshold_get(const BlockDriverState *bs)
 {
-return bs->write_threshold_offset;
+return qatomic_read(>write_threshold_offset);
 }
 
 void bdrv_write_threshold_set(BlockDriverState *bs, uint64_t threshold_bytes)
 {
-bs->write_threshold_offset = threshold_bytes;
+qatomic_set(>write_threshold_offset, threshold_bytes);
 }
 
 void qmp_block_set_write_threshold(const char *node_name,
uint64_t threshold_bytes,
Error **errp)
 {
-BlockDriverState *bs;
-AioContext *aio_context;
-
-bs = bdrv_find_node(node_name);
+BlockDriverState *bs = bdrv_find_node(node_name);
 if (!bs) {
 error_setg(errp, "Device '%s' not found", node_name);
 return;
 }
 
-aio_context = bdrv_get_aio_context(bs);
-aio_context_acquire(aio_context);
-
 bdrv_write_threshold_set(bs, threshold_bytes);
-
-aio_context_release(aio_context);
 }
 
 void bdrv_write_threshold_check_write(BlockDriverState *bs, int64_t offset,
   int64_t bytes)
 {
 int64_t end = offset + bytes;
-uint64_t wtr = bs->write_threshold_offset;
+uint64_t wtr;
 
-if (wtr > 0 && end > wtr) {
-qapi_event_send_block_write_threshold(bs->node_name, end - wtr, wtr);
+retry:
+wtr = bdrv_write_threshold_get(bs);
+if (wtr == 0 || wtr >= end) {
+return;
+}
 
-/* autodisable to avoid flooding the monitor */
-bdrv_write_threshold_set(bs, 0);
+/* autodisable to avoid flooding the monitor */
+if (qatomic_cmpxchg(>write_threshold_offset, wtr, 0) != wtr) {
+/* bs->write_threshold_offset changed in parallel */
+goto retry;
 }
+
+/* We have cleared bs->write_threshold_offset, so let's send

[PATCH v3 3/8] test-write-threshold: rewrite test_threshold_(not_)trigger tests

2021-05-06 Thread Vladimir Sementsov-Ogievskiy

These tests use bdrv_write_threshold_exceeded() API, which is used only
for test (since pre-previous commit). Better is testing real API, which
is used in block.c as well.

So, let's call bdrv_write_threshold_check_write(), and check is
bs->write_threshold_offset cleared or not (it's cleared iff threshold
triggered).

Also we get rid of BdrvTrackedRequest use here. Note, that paranoiac
bdrv_check_request() calls were added in 8b1170012b1 to protect
BdrvTrackedRequest. Drop them now.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
 tests/unit/test-write-threshold.c | 22 --
 1 file changed, 4 insertions(+), 18 deletions(-)

diff --git a/tests/unit/test-write-threshold.c 
b/tests/unit/test-write-threshold.c
index fc1c45a2eb..fd40a815b8 100644
--- a/tests/unit/test-write-threshold.c
+++ b/tests/unit/test-write-threshold.c
@@ -55,41 +55,27 @@ static void test_threshold_multi_set_get(void)
 
 static void test_threshold_not_trigger(void)
 {
-uint64_t amount = 0;
 uint64_t threshold = 4 * 1024 * 1024;
 BlockDriverState bs;
-BdrvTrackedRequest req;
 
 memset(, 0, sizeof(bs));
-memset(, 0, sizeof(req));
-req.offset = 1024;
-req.bytes = 1024;
-
-bdrv_check_request(req.offset, req.bytes, _abort);
 
 bdrv_write_threshold_set(, threshold);
-amount = bdrv_write_threshold_exceeded(, );
-g_assert_cmpuint(amount, ==, 0);
+bdrv_write_threshold_check_write(, 1024, 1024);
+g_assert_cmpuint(bdrv_write_threshold_get(), ==, threshold);
 }
 
 
 static void test_threshold_trigger(void)
 {
-uint64_t amount = 0;
 uint64_t threshold = 4 * 1024 * 1024;
 BlockDriverState bs;
-BdrvTrackedRequest req;
 
 memset(, 0, sizeof(bs));
-memset(, 0, sizeof(req));
-req.offset = (4 * 1024 * 1024) - 1024;
-req.bytes = 2 * 1024;
-
-bdrv_check_request(req.offset, req.bytes, _abort);
 
 bdrv_write_threshold_set(, threshold);
-amount = bdrv_write_threshold_exceeded(, );
-g_assert_cmpuint(amount, >=, 1024);
+bdrv_write_threshold_check_write(, threshold - 1024, 2 * 1024);
+g_assert_cmpuint(bdrv_write_threshold_get(), ==, 0);
 }
 
 typedef struct TestStruct {
-- 
2.29.2

[PATCH v3 6/8] test-write-threshold: drop extra tests

2021-05-06 Thread Vladimir Sementsov-Ogievskiy

Testing set/get of one 64bit variable doesn't seem necessary. We have a
lot of such variables. Also remaining tests do test set/get anyway.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
 tests/unit/test-write-threshold.c | 43 ---
 1 file changed, 43 deletions(-)

diff --git a/tests/unit/test-write-threshold.c 
b/tests/unit/test-write-threshold.c
index bb5c1a5217..9e9986aefc 100644
--- a/tests/unit/test-write-threshold.c
+++ b/tests/unit/test-write-threshold.c
@@ -12,43 +12,6 @@
 #include "block/write-threshold.h"
 
 
-static void test_threshold_not_set_on_init(void)
-{
-uint64_t res;
-BlockDriverState bs;
-memset(, 0, sizeof(bs));
-
-res = bdrv_write_threshold_get();
-g_assert_cmpint(res, ==, 0);
-}
-
-static void test_threshold_set_get(void)
-{
-uint64_t threshold = 4 * 1024 * 1024;
-uint64_t res;
-BlockDriverState bs;
-memset(, 0, sizeof(bs));
-
-bdrv_write_threshold_set(, threshold);
-
-res = bdrv_write_threshold_get();
-g_assert_cmpint(res, ==, threshold);
-}
-
-static void test_threshold_multi_set_get(void)
-{
-uint64_t threshold1 = 4 * 1024 * 1024;
-uint64_t threshold2 = 15 * 1024 * 1024;
-uint64_t res;
-BlockDriverState bs;
-memset(, 0, sizeof(bs));
-
-bdrv_write_threshold_set(, threshold1);
-bdrv_write_threshold_set(, threshold2);
-res = bdrv_write_threshold_get();
-g_assert_cmpint(res, ==, threshold2);
-}
-
 static void test_threshold_not_trigger(void)
 {
 uint64_t threshold = 4 * 1024 * 1024;
@@ -84,12 +47,6 @@ int main(int argc, char **argv)
 {
 size_t i;
 TestStruct tests[] = {
-{ "/write-threshold/not-set-on-init",
-  test_threshold_not_set_on_init },
-{ "/write-threshold/set-get",
-  test_threshold_set_get },
-{ "/write-threshold/multi-set-get",
-  test_threshold_multi_set_get },
 { "/write-threshold/not-trigger",
   test_threshold_not_trigger },
 { "/write-threshold/trigger",
-- 
2.29.2

[PATCH v3 0/8] block: refactor write threshold

2021-05-06 Thread Vladimir Sementsov-Ogievskiy

Hi all!

v3:
01-04,06,07: add Max's r-b.
01: improve commit msg and comments
03: improve commit msg
05: add more comments and qemu/atomic.h include
08: new, replacement for v2:08,09

Vladimir Sementsov-Ogievskiy (8):
  block/write-threshold: don't use write notifiers
  block: drop write notifiers
  test-write-threshold: rewrite test_threshold_(not_)trigger tests
  block/write-threshold: drop extra APIs
  block/write-threshold: don't use aio context lock
  test-write-threshold: drop extra tests
  test-write-threshold: drop extra TestStruct structure
  write-threshold: deal with includes

 include/block/block_int.h |  19 ++---
 include/block/write-threshold.h   |  31 +++--
 block.c   |   1 -
 block/io.c|  11 +--
 block/write-threshold.c   | 111 +++---
 tests/unit/test-write-threshold.c |  90 ++--
 6 files changed, 52 insertions(+), 211 deletions(-)

-- 
2.29.2

[PATCH v3 2/8] block: drop write notifiers

2021-05-06 Thread Vladimir Sementsov-Ogievskiy

They are unused now.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
 include/block/block_int.h | 12 
 block.c   |  1 -
 block/io.c|  6 --
 3 files changed, 19 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 51f51286a5..eab352f363 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -954,9 +954,6 @@ struct BlockDriverState {
  */
 int64_t total_sectors;
 
-/* Callback before write request is processed */
-NotifierWithReturnList before_write_notifiers;
-
 /* threshold limit for writes, in bytes. "High water mark". */
 uint64_t write_threshold_offset;
 
@@ -1083,15 +1080,6 @@ void bdrv_parse_filename_strip_prefix(const char 
*filename, const char *prefix,
 bool bdrv_backing_overridden(BlockDriverState *bs);
 
 
-/**
- * bdrv_add_before_write_notifier:
- *
- * Register a callback that is invoked before write requests are processed but
- * after any throttling or waiting for overlapping requests.
- */
-void bdrv_add_before_write_notifier(BlockDriverState *bs,
-NotifierWithReturn *notifier);
-
 /**
  * bdrv_add_aio_context_notifier:
  *
diff --git a/block.c b/block.c
index 9ad725d205..75a82af641 100644
--- a/block.c
+++ b/block.c
@@ -400,7 +400,6 @@ BlockDriverState *bdrv_new(void)
 for (i = 0; i < BLOCK_OP_TYPE_MAX; i++) {
 QLIST_INIT(>op_blockers[i]);
 }
-notifier_with_return_list_init(>before_write_notifiers);
 qemu_co_mutex_init(>reqs_lock);
 qemu_mutex_init(>dirty_bitmap_mutex);
 bs->refcnt = 1;
diff --git a/block/io.c b/block/io.c
index 3520de51bb..1e826ba9e8 100644
--- a/block/io.c
+++ b/block/io.c
@@ -3165,12 +3165,6 @@ bool bdrv_qiov_is_aligned(BlockDriverState *bs, 
QEMUIOVector *qiov)
 return true;
 }
 
-void bdrv_add_before_write_notifier(BlockDriverState *bs,
-NotifierWithReturn *notifier)
-{
-notifier_with_return_list_add(>before_write_notifiers, notifier);
-}
-
 void bdrv_io_plug(BlockDriverState *bs)
 {
 BdrvChild *child;
-- 
2.29.2

Re: [PATCH v2 1/8] block: prepare write threshold code for thread safety

2021-05-06 Thread Stefan Hajnoczi

On Wed, May 05, 2021 at 12:54:09PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Hi Stefan!
> 
> Note my "[PATCH v2 0/9] block: refactor write threshold", it's a kind of 
> counter-proposal for first half of this series.

Thanks! Emanuele mentioned he will drop his patches.

I took a look at your series and that approach looks good to me.

Stefan


signature.asc
Description: PGP signature

Re: [PATCH 06/23] hw/block/dataplane/virtio-blk: Avoid dynamic stack allocation

2021-05-06 Thread Philippe Mathieu-Daudé

On 5/6/21 10:53 AM, Stefan Hajnoczi wrote:
> On Wed, May 05, 2021 at 11:10:30PM +0200, Philippe Mathieu-Daudé wrote:
>> Use autofree heap allocation instead of variable-length
>> array on the stack.
>>
>> Signed-off-by: Philippe Mathieu-Daudé 
>> ---
>>  hw/block/dataplane/virtio-blk.c | 7 ---
>>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> Why?

The motivation behind removing all variable-length allocations
(and adding CPPFLAG+=-Wvla at the end) is to avoid security
vulnerabilities such CVE-2021-3527.

> This is a performance-critical code path and BITS_TO_LONGS(nvqs) is
> small.

OK, having looked better at nvqs, I suppose this is preferred:

-- >8 --
@@ -60,7 +60,7 @@ static void notify_guest_bh(void *opaque)
 {
 VirtIOBlockDataPlane *s = opaque;
 unsigned nvqs = s->conf->num_queues;
-unsigned long bitmap[BITS_TO_LONGS(nvqs)];
+unsigned long bitmap[BITS_TO_LONGS(VIRTIO_QUEUE_MAX)];
 unsigned j;

 memcpy(bitmap, s->batch_notify_vqs, sizeof(bitmap));
---

Would that work for you?

> 
> Stefan
>

Re: [PATCH 06/23] hw/block/dataplane/virtio-blk: Avoid dynamic stack allocation

2021-05-06 Thread Stefan Hajnoczi

On Wed, May 05, 2021 at 11:10:30PM +0200, Philippe Mathieu-Daudé wrote:
> Use autofree heap allocation instead of variable-length
> array on the stack.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  hw/block/dataplane/virtio-blk.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)

Why? This is a performance-critical code path and BITS_TO_LONGS(nvqs) is
small.

Stefan


signature.asc
Description: PGP signature

[PATCH] qemu-iotests: fix pylint 2.8 consider-using-with error

2021-05-06 Thread Emanuele Giuseppe Esposito

pylint 2.8 introduces consider-using-with error, suggesting
to use the 'with' block statement when possible.
http://pylint.pycqa.org/en/latest/whatsnew/2.8.html

Modify all subprocess.Popen calls to use the 'with' statement,
except one in __init__ of QemuIoInteractive class, since
the return value is assigned to a class field and used in other methods.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 tests/qemu-iotests/iotests.py| 63 
 tests/qemu-iotests/testrunner.py | 22 +--
 2 files changed, 42 insertions(+), 43 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 5af0182895..e1117e8ae8 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -113,15 +113,14 @@ def qemu_tool_pipe_and_status(tool: str, args: 
Sequence[str],
 Run a tool and return both its output and its exit code
 """
 stderr = subprocess.STDOUT if connect_stderr else None
-subp = subprocess.Popen(args,
-stdout=subprocess.PIPE,
-stderr=stderr,
-universal_newlines=True)
-output = subp.communicate()[0]
-if subp.returncode < 0:
-cmd = ' '.join(args)
-sys.stderr.write(f'{tool} received signal {-subp.returncode}: {cmd}\n')
-return (output, subp.returncode)
+with subprocess.Popen(args, stdout=subprocess.PIPE,
+stderr=stderr, universal_newlines=True) as subp:
+output = subp.communicate()[0]
+if subp.returncode < 0:
+cmd = ' '.join(args)
+sys.stderr.write(f'{tool} received signal \
+   {-subp.returncode}: {cmd}\n')
+return (output, subp.returncode)
 
 def qemu_img_pipe_and_status(*args: str) -> Tuple[str, int]:
 """
@@ -237,6 +236,7 @@ def qemu_io_silent_check(*args):
 class QemuIoInteractive:
 def __init__(self, *args):
 self.args = qemu_io_args_no_fmt + list(args)
+# pylint: disable=consider-using-with
 self._p = subprocess.Popen(self.args, stdin=subprocess.PIPE,
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT,
@@ -310,22 +310,22 @@ def qemu_nbd_popen(*args):
 cmd.extend(args)
 
 log('Start NBD server')
-p = subprocess.Popen(cmd)
-try:
-while not os.path.exists(pid_file):
-if p.poll() is not None:
-raise RuntimeError(
-"qemu-nbd terminated with exit code {}: {}"
-.format(p.returncode, ' '.join(cmd)))
-
-time.sleep(0.01)
-yield
-finally:
-if os.path.exists(pid_file):
-os.remove(pid_file)
-log('Kill NBD server')
-p.kill()
-p.wait()
+with subprocess.Popen(cmd) as p:
+try:
+while not os.path.exists(pid_file):
+if p.poll() is not None:
+raise RuntimeError(
+"qemu-nbd terminated with exit code {}: {}"
+.format(p.returncode, ' '.join(cmd)))
+
+time.sleep(0.01)
+yield
+finally:
+if os.path.exists(pid_file):
+os.remove(pid_file)
+log('Kill NBD server')
+p.kill()
+p.wait()
 
 def compare_images(img1, img2, fmt1=imgfmt, fmt2=imgfmt):
 '''Return True if two image files are identical'''
@@ -334,13 +334,12 @@ def compare_images(img1, img2, fmt1=imgfmt, fmt2=imgfmt):
 
 def create_image(name, size):
 '''Create a fully-allocated raw image with sector markers'''
-file = open(name, 'wb')
-i = 0
-while i < size:
-sector = struct.pack('>l504xl', i // 512, i // 512)
-file.write(sector)
-i = i + 512
-file.close()
+with open(name, 'wb') as file:
+i = 0
+while i < size:
+sector = struct.pack('>l504xl', i // 512, i // 512)
+file.write(sector)
+i = i + 512
 
 def image_size(img):
 '''Return image's virtual size'''
diff --git a/tests/qemu-iotests/testrunner.py b/tests/qemu-iotests/testrunner.py
index 1fc61fcaa3..729fe9ee3b 100644
--- a/tests/qemu-iotests/testrunner.py
+++ b/tests/qemu-iotests/testrunner.py
@@ -258,17 +258,17 @@ def do_run_test(self, test: str) -> TestResult:
 
 t0 = time.time()
 with f_bad.open('w', encoding="utf-8") as f:
-proc = subprocess.Popen(args, cwd=str(f_test.parent), env=env,
-stdout=f, stderr=subprocess.STDOUT)
-try:
-proc.wait()
-except KeyboardInterrupt:
-proc.terminate()
-proc.wait()
-return TestResult(status='not run',
-  description='Interrupted by user',
-  interrupted=True)
-ret = proc.returncode
+

Re: [ANNOUNCE] libblkio v0.1.0 preview release

2021-05-06 Thread Stefan Hajnoczi

On Wed, May 05, 2021 at 06:46:36PM +0200, Kevin Wolf wrote:
> Am 05.05.2021 um 18:19 hat Stefan Hajnoczi geschrieben:
> > On Tue, May 04, 2021 at 03:44:23PM +0200, Kevin Wolf wrote:
> > > Am 30.04.2021 um 17:49 hat Stefan Hajnoczi geschrieben:
> > > > On Thu, Apr 29, 2021 at 05:51:16PM +0200, Kevin Wolf wrote:
> > > > > Am 29.04.2021 um 16:05 hat Stefan Hajnoczi geschrieben:
> > > There is one more thing I'm wondering right now: Why do we have separate
> > > states for connecting to the backend (created) and configuring it
> > > (initialized)? The property setters check for the right state, but they
> > > don't really do anything that wouldn't be possible in the other state.
> > > A state machine exposed as two boolean rather than a tristate enum feels
> > > a bit awkward, too, but even more so if two states could possibly be
> > > enough.
> > > 
> > > The reason why I'm asking is that in order to address my points, it
> > > would be best to have separate property accessors for each state, and
> > > two pairs of accessors would make property declarations more managable
> > > than three pairs.
> > 
> > There is no need to have separate boolean properties, it's just how I
> > implemented it. There could be a single state property. For example, a
> > string that is "uninitialized", "initialized", and "started".
> 
> Right. I think this would make the way the API works a bit more obvious,
> though it's really only a small detail.
> 
> However, my real question was why we even need those three distinct
> states. From what I could see so far in the io_uring implemtation,
> "uninitialized" and "started" would be enough. Everything that can be
> configured in "initialized", could just as well be configured in
> "uninitialized" and the state transition would both open the image file
> and apply the configuration in a single step.
> 
> Do you have intentions to add features in the future where the three
> separate states are actually needed because the user needs to be able to
> do something while the image file is already opened, but queue
> processing hasn't started yet?

Yes, three states will be needed. Think of a virtio-blk driver where a
VFIO PCI bus address or a vhost-user-blk UNIX domain socket path are
needed to connect to the device. After connection completes you then
have access to the block queue limits, the number of queues, etc. Once
those things are configured the device can be started.

> > > > > Alternatives in QEMU are qdev properties (which are internally QOM
> > > > > properties, but provide default implementations and are at least
> > > > > automatically read-only after realize, avoiding that whole class of
> > > > > bugs) and QAPI.
> > > > > If this was QEMU code, I would of course go for QAPI, but a library is
> > > > > something different and adding the code generator would probably be a
> > > > > bit too much anyway. But the idea in the resulting code would be 
> > > > > dealing
> > > > > with native structs instead of a bunch of function calls. This would
> > > > > probably be the least error prone way for the implementation, but of
> > > > > course, it would make binary compatibility a bit harder when adding 
> > > > > new
> > > > > properties.
> > > > 
> > > > An alternative I considered was the typestate and builder patterns:
> > > > 
> > > >   /* Create a new io_uring driver in the uninitialized state */
> > > >   struct blkio_iou_uninit *blkio_new_io_uring(void);
> > > > 
> > > >   /* Uninitialized state property setters */
> > > >   int blkio_iou_uninit_set_path(struct blkio_iou_uninit *u,
> > > > const char *path);
> > > >   int blkio_iou_uninit_set_direct(struct blkio_iou_uninit *u,
> > > >   bool o_direct);
> > > >   ...
> > > > 
> > > >   /* Transition to initialized state. Frees u on success. */
> > > >   struct blkio_iou_init *blkio_iou_init(struct blkio_iou_uninit *u);
> > > > 
> > > >   /* Initialized state property setters/getters */
> > > >   int blkio_iou_init_get_capacity(struct blkio_iou_init *i,
> > > >   uint64_t *capacity);
> > > >   ...
> > > > 
> > > >   /* Transition to started state. Frees i on success. */
> > > >   struct blkio_iou_started *blkio_iou_start(struct blkio_iou_init *i);
> > > > 
> > > >   ...
> > > > 
> > > >   /* Transition back to initialized state. Frees s on success. */
> > > >   struct blkio_iou_init *blkio_iou_stop(struct blkio_iou_started *s);
> > > > 
> > > > On the plus side:
> > > > 
> > > > - No state checks are needed because an API won't even exist if it's
> > > >   unavailable in a given state (uninitialized/initialized/started).
> > > > 
> > > > - State structs come with pre-initialized default values, so the caller
> > > >   only needs to set non-default values. For example O_DIRECT is false by
> > > >   default and callers happy with that don't need to set the property.
> > > > 
> > > > - ABI compatibility is easy since the state structs

Re: [PATCH 05/23] io/channel-websock: Replace strlen(const_str) by sizeof(const_str) - 1

2021-05-06 Thread Daniel P . Berrangé

On Wed, May 05, 2021 at 11:10:29PM +0200, Philippe Mathieu-Daudé wrote:
> The combined_key[... QIO_CHANNEL_WEBSOCK_GUID_LEN ...] array in
> qio_channel_websock_handshake_send_res_ok() expands to a call
> to strlen(QIO_CHANNEL_WEBSOCK_GUID), and the compiler doesn't
> realize the string is const, so consider combined_key[] being
> a variable-length array.
> 
> To remove the variable-length array, we provide it a hint to
> the compiler by using sizeof() - 1 instead of strlen().
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  io/channel-websock.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Daniel P. Berrangé 


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH 11/23] hw/intc/xics: Avoid dynamic stack allocation

2021-05-06 Thread Greg Kurz

On Wed,  5 May 2021 23:10:35 +0200
Philippe Mathieu-Daudé  wrote:

> Use autofree heap allocation instead of variable-length
> array on the stack.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  hw/intc/xics.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/intc/xics.c b/hw/intc/xics.c
> index 68f9d44feb4..c293d00d5c4 100644
> --- a/hw/intc/xics.c
> +++ b/hw/intc/xics.c
> @@ -566,8 +566,8 @@ static void ics_reset_irq(ICSIRQState *irq)
>  static void ics_reset(DeviceState *dev)
>  {
>  ICSState *ics = ICS(dev);
> +g_autofree uint8_t *flags = g_malloc(ics->nr_irqs);

I would have made it g_new(uint8_t, ics->nr_irqs) so that changes
in the type of 'flags' that could potentially change the allocated
size are safely detected.

This is unlikely though, so:

Reviewed-by: Greg Kurz 

>  int i;
> -uint8_t flags[ics->nr_irqs];
>  
>  for (i = 0; i < ics->nr_irqs; i++) {
>  flags[i] = ics->irqs[i].flags;

Re: [for-6.1 2/4] virtio-blk: Configure all host notifiers in a single MR transaction

2021-05-06 Thread Michael S. Tsirkin

On Wed, May 05, 2021 at 01:15:51PM +0200, Greg Kurz wrote:
> On Wed, 5 May 2021 11:14:23 +0100
> Stefan Hajnoczi  wrote:
> 
> > On Wed, Apr 07, 2021 at 04:34:59PM +0200, Greg Kurz wrote:
> > > This allows the virtio-blk-pci device to batch the setup of all its
> > > host notifiers. This significantly improves boot time of VMs with a
> > > high number of vCPUs, e.g. from 3m26.186s down to 0m58.023s for a
> > > pseries machine with 384 vCPUs.
> > > 
> > > Note that memory_region_transaction_commit() must be called before
> > > virtio_bus_cleanup_host_notifier() because the latter might close
> > > ioeventfds that the transaction still assumes to be around when it
> > > commits.
> > > 
> > > Signed-off-by: Greg Kurz 
> > > ---
> > >  hw/block/dataplane/virtio-blk.c | 25 +
> > >  1 file changed, 25 insertions(+)
> > > 
> > > diff --git a/hw/block/dataplane/virtio-blk.c 
> > > b/hw/block/dataplane/virtio-blk.c
> > > index d7b5c95d26d9..cd81893d1d01 100644
> > > --- a/hw/block/dataplane/virtio-blk.c
> > > +++ b/hw/block/dataplane/virtio-blk.c
> > > @@ -198,19 +198,30 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
> > >  goto fail_guest_notifiers;
> > >  }
> > >  
> > > +memory_region_transaction_begin();
> > 
> > This call is optional and an optimization. It would be nice to have a
> > comment here explaining that - otherwise people may wonder why this is
> > necessary.
> > 
> 
> Indeed. Same goes for patch 4 actually.
> 
> Michael,
> 
> Should I re-post an updated version of this series or can the
> comments be added in a followup ?


An updated version is better. Stefan did not ack anyway.

> > > +
> > >  /* Set up virtqueue notify */
> > >  for (i = 0; i < nvqs; i++) {
> > >  r = virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, true);
> > >  if (r != 0) {
> > > +int j = i;
> > > +
> > >  fprintf(stderr, "virtio-blk failed to set host notifier 
> > > (%d)\n", r);
> > >  while (i--) {
> > >  virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false);
> > > +}
> > > +
> > > +memory_region_transaction_commit();
> > > +
> > > +while (j--) {
> > >  virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), i);
> > >  }
> > >  goto fail_host_notifiers;
> > >  }
> > >  }
> > >  
> > > +memory_region_transaction_commit();
> > > +
> > >  s->starting = false;
> > >  vblk->dataplane_started = true;
> > >  trace_virtio_blk_data_plane_start(s);
> > > @@ -246,8 +257,15 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
> > >  return 0;
> > >  
> > >fail_aio_context:
> > > +memory_region_transaction_begin();
> > 
> > Probably unnecessary since this is an error code path. We don't need to
> > optimize it.
> > 
> > Doesn't hurt though.
> 
> True. I can drop this if I repost.

Re: [PATCH v2 9/9] block/write-threshold: drop extra includes

2021-05-06 Thread Max Reitz


On 05.05.21 22:34, Vladimir Sementsov-Ogievskiy wrote:

05.05.2021 19:23, Max Reitz wrote:

On 04.05.21 10:25, Vladimir Sementsov-Ogievskiy wrote:

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/write-threshold.c | 3 ---
  1 file changed, 3 deletions(-)

diff --git a/block/write-threshold.c b/block/write-threshold.c
index fbf4e6f5c4..db271c5537 100644
--- a/block/write-threshold.c
+++ b/block/write-threshold.c
@@ -12,10 +12,7 @@
   */
  #include "qemu/osdep.h"
-#include "block/block_int.h"


We need this (for bs->write_threshold_offset), but it’s included 
through block/block_int.h.  I’m not sure whether we should drop it 
from the includes.


Perhaps we should instead drop block_int.h from write-threshold.h? 
Shouldn’t including qemu/osdep.h (which includes qemu/typedefs.h, 
which forward-declares BDS) be sufficient there?



-#include "qemu/coroutine.h"
  #include "block/write-threshold.h"
-#include "qemu/notify.h"
  #include "qapi/error.h"
  #include "qapi/qapi-commands-block-core.h"
  #include "qapi/qapi-events-block-core.h"


Btw, where does qemu/atomic.h come in?  Looks like it comes through 
block_int.h.  I think we should include it explicitly.




I'm not sure. If something is included through another include, why to 
include it explicitly?


Because the other include may change.  I’d include something if you need 
the feature, and we need atomics here.


And block_int.h doesn’t even provide atomic.h, it comes through various 
of its includes (which are probably not included to provide atomics). 
So this is already indirect and basically just incidental; block_int.h 
doesn’t guarantee atomic.h.


Another thing: I see that other fields in BDS that are to be accessed 
with atomic ops have a comment saying so.  I think 
write_threshold_offset should have, too.


Max

For me, if statement can be removed with no effect, it's an extra 
statement.

Re: [PATCH 15/23] net: Avoid dynamic stack allocation

2021-05-06 Thread Jason Wang




在 2021/5/6 上午5:10, Philippe Mathieu-Daudé 写道:

Use autofree heap allocation instead of variable-length
array on the stack.

Signed-off-by: Philippe Mathieu-Daudé 



Acked-by: Jason Wang 



---
  hw/net/fsl_etsec/rings.c  | 9 -
  hw/net/rocker/rocker_of_dpa.c | 2 +-
  net/dump.c| 2 +-
  net/tap.c | 2 +-
  4 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/hw/net/fsl_etsec/rings.c b/hw/net/fsl_etsec/rings.c
index 8f084464155..1abdcb5a29c 100644
--- a/hw/net/fsl_etsec/rings.c
+++ b/hw/net/fsl_etsec/rings.c
@@ -381,8 +381,6 @@ static void fill_rx_bd(eTSEC  *etsec,
  uint16_t to_write;
  hwaddr   bufptr = bd->bufptr +
  ((hwaddr)(etsec->regs[TBDBPH].value & 0xF) << 32);
-uint8_t  padd[etsec->rx_padding];
-uint8_t  rem;
  
  RING_DEBUG("eTSEC fill Rx buffer @ 0x%016" HWADDR_PRIx

 " size:%zu(padding + crc:%u) + fcb:%u\n",
@@ -423,11 +421,12 @@ static void fill_rx_bd(eTSEC  *etsec,
  /* The remaining bytes are only for padding which is not actually
   * allocated in the data buffer.
   */
-
-rem = MIN(etsec->regs[MRBLR].value - bd->length, etsec->rx_padding);
+uint8_t  rem = MIN(etsec->regs[MRBLR].value - bd->length,
+   etsec->rx_padding);
  
  if (rem > 0) {

-memset(padd, 0x0, sizeof(padd));
+g_autofree uint8_t *padd = g_malloc0(etsec->rx_padding);
+
  etsec->rx_padding -= rem;
  *size -= rem;
  bd->length+= rem;
diff --git a/hw/net/rocker/rocker_of_dpa.c b/hw/net/rocker/rocker_of_dpa.c
index b3b8c5bb6d4..3e400ceaef7 100644
--- a/hw/net/rocker/rocker_of_dpa.c
+++ b/hw/net/rocker/rocker_of_dpa.c
@@ -1043,7 +1043,7 @@ static void of_dpa_flow_ig_tbl(OfDpaFlowContext *fc, 
uint32_t tbl_id)
  static ssize_t of_dpa_ig(World *world, uint32_t pport,
   const struct iovec *iov, int iovcnt)
  {
-struct iovec iov_copy[iovcnt + 2];
+g_autofree struct iovec *iov_copy = g_new(struct iovec, iovcnt + 2);
  OfDpaFlowContext fc = {
  .of_dpa = world_private(world),
  .in_pport = pport,
diff --git a/net/dump.c b/net/dump.c
index 4d538d82a69..b830302e27d 100644
--- a/net/dump.c
+++ b/net/dump.c
@@ -68,7 +68,7 @@ static ssize_t dump_receive_iov(DumpState *s, const struct 
iovec *iov, int cnt)
  int64_t ts;
  int caplen;
  size_t size = iov_size(iov, cnt);
-struct iovec dumpiov[cnt + 1];
+g_autofree struct iovec *dumpiov = g_new(struct iovec, cnt + 1);
  
  /* Early return in case of previous error. */

  if (s->fd < 0) {
diff --git a/net/tap.c b/net/tap.c
index bae895e2874..2b9ed8a2cd8 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -120,7 +120,7 @@ static ssize_t tap_receive_iov(NetClientState *nc, const 
struct iovec *iov,
  {
  TAPState *s = DO_UPCAST(TAPState, nc, nc);
  const struct iovec *iovp = iov;
-struct iovec iov_copy[iovcnt + 1];
+g_autofree struct iovec *iov_copy = g_new(struct iovec, iovcnt + 1);
  struct virtio_net_hdr_mrg_rxbuf hdr = { };
  
  if (s->host_vnet_hdr_len && !s->using_vnet_hdr) {

Re: [PATCH 08/23] hw/block/nvme: Avoid dynamic stack allocation

2021-05-06 Thread Klaus Jensen


On May  5 23:10, Philippe Mathieu-Daudé wrote:

Use autofree heap allocation instead of variable-length
array on the stack.

Signed-off-by: Philippe Mathieu-Daudé 
---
hw/block/nvme.c | 15 ---
1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 2f6d4925826..905c4bb57af 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -652,7 +652,7 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, NvmeSg *sg, 
uint64_t prp1,
len -= trans_len;
if (len) {
if (len > n->page_size) {
-uint64_t prp_list[n->max_prp_ents];
+g_autofree uint64_t *prp_list = NULL;
uint32_t nents, prp_trans;
int i = 0;

@@ -662,8 +662,10 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, NvmeSg *sg, 
uint64_t prp1,
 * that offset.
 */
nents = (n->page_size - (prp2 & (n->page_size - 1))) >> 3;
-prp_trans = MIN(n->max_prp_ents, nents) * sizeof(uint64_t);
-ret = nvme_addr_read(n, prp2, (void *)prp_list, prp_trans);
+prp_trans = MIN(n->max_prp_ents, nents);
+prp_list = g_new(uint64_t, prp_trans);
+ret = nvme_addr_read(n, prp2, (void *)prp_list,
+ prp_trans * sizeof(uint64_t));


prp_trans determines how much we must transfer, not the size of the 
prp_list. Subsequent PRP lists may contain more than nents PRPs, so this 
may now go out of bounds.


Just do the allocation when prp_list is declared:

g_autofree uint64_t *prp_list = g_new(uint64_t, n->max_prp_ents);


if (ret) {
trace_pci_nvme_err_addr_read(prp2);
status = NVME_DATA_TRAS_ERROR;
@@ -682,9 +684,8 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, NvmeSg *sg, 
uint64_t prp1,
i = 0;
nents = (len + n->page_size - 1) >> n->page_bits;
nents = MIN(nents, n->max_prp_ents);
-prp_trans = nents * sizeof(uint64_t);
ret = nvme_addr_read(n, prp_ent, (void *)prp_list,
- prp_trans);
+ nents * sizeof(uint64_t));
if (ret) {
trace_pci_nvme_err_addr_read(prp_ent);
status = NVME_DATA_TRAS_ERROR;
@@ -2510,10 +2511,10 @@ static uint16_t nvme_dsm(NvmeCtrl *n, NvmeRequest *req)
if (attr & NVME_DSMGMT_AD) {
int64_t offset;
size_t len;
-NvmeDsmRange range[nr];
+g_autofree NvmeDsmRange *range = g_new(NvmeDsmRange, nr);
uintptr_t *discards = (uintptr_t *)>opaque;

-status = nvme_h2c(n, (uint8_t *)range, sizeof(range), req);
+status = nvme_h2c(n, (uint8_t *)range, sizeof(*range) * nr, req);
if (status) {
return status;
}


DSM change LGTM.


signature.asc
Description: PGP signature

Re: [PATCH 07/23] hw/block/nvme: Use definition to avoid dynamic stack allocation

2021-05-06 Thread Philippe Mathieu-Daudé

On 5/6/21 4:15 AM, Keith Busch wrote:
> On Wed, May 05, 2021 at 06:09:10PM -0500, Eric Blake wrote:
>> On 5/5/21 5:07 PM, Philippe Mathieu-Daudé wrote:
>>> +Eric
>>>
>>> On 5/5/21 11:22 PM, Keith Busch wrote:
 On Wed, May 05, 2021 at 11:10:31PM +0200, Philippe Mathieu-Daudé wrote:
> The compiler isn't clever enough to figure 'SEG_CHUNK_SIZE' is
> a constant! Help it by using a definitions instead.

 I don't understand.
>>>
>>> Neither do I TBH...
>>>
 It's labeled 'const', so any reasonable compiler
 will place it in the 'text' segment of the executable rather than on the
 stack. While that's compiler specific, is there really a compiler doing
 something bad with this? If not, I do prefer the 'const' here if only
 because it limits the symbol scope ('static const' is also preferred if
 that helps).
>>>
>>> Using: gcc version 10.2.1 20201125 (Red Hat 10.2.1-9) (GCC)
>>>
>>> Both static+const / const trigger:
>>>
>>> hw/block/nvme.c: In function ‘nvme_map_sgl’:
>>> hw/block/nvme.c:818:5: error: ISO C90 forbids variable length array
>>> ‘segment’ [-Werror=vla]
>>>   818 | NvmeSglDescriptor segment[SEG_CHUNK_SIZE], *sgld, *last_sgld;
>>>   | ^
>>> cc1: all warnings being treated as errors
>>
>> C99 6.7.5.2 paragraph 4:
>> "If the size is an integer constant expression and the element type has
>> a known constant size, the array type is not a variable length array
>> type; otherwise, the array type is a variable length array type."
>>
>> 6.6 paragraph 6:
>> "An integer constant expression shall have integer type and shall only
>> have operands that are integer constants, enumeration constants,
>> character constants, sizeof expressions whose results are integer
>> constants, and floating constants that are the immediate operands of
>> casts. Cast operators in an integer constant expression shall only
>> convert arithmetic types to integer types, except as part of an operand
>> to the sizeof operator."
>>
>> Notably absent from that list are 'const int' variables, which even
>> though they act as constants (in that the name always represents the
>> same value), do not actually qualify as such under C99 rules.  Yes, it's
>> a pain.  What's more, 6.6 paragraph 10:
>>
>> "An implementation may accept other forms of constant expressions."
>>
>> which means it _should_ be possible for the compiler to do what we want.
>>  But just because it is permitted does not make it actually work. :(
> 
> Thank you, that makes sense. In this case, we are talking about an
> integer constant expression for the value of a 'const' symbol. That
> seems like perfect fodder for a compiler to optimize. I understand it's
> not obligated to do that, but I assumed it would.
> 
> Anyway, Philippe, I have no objection if you want to push forward with
> this series.

Thanks both. I'll amend Eric explanation in the commit description.

Regards,

Phil.

Re: [PATCH 07/23] hw/block/nvme: Use definition to avoid dynamic stack allocation

2021-05-06 Thread Klaus Jensen


On May  5 23:10, Philippe Mathieu-Daudé wrote:

The compiler isn't clever enough to figure 'SEG_CHUNK_SIZE' is
a constant! Help it by using a definitions instead.

Signed-off-by: Philippe Mathieu-Daudé 
---
hw/block/nvme.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 5fe082ec34c..2f6d4925826 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -812,7 +812,7 @@ static uint16_t nvme_map_sgl(NvmeCtrl *n, NvmeSg *sg, 
NvmeSglDescriptor sgl,
 * descriptors and segment chain) than the command transfer size, so it is
 * not bounded by MDTS.
 */
-const int SEG_CHUNK_SIZE = 256;
+#define SEG_CHUNK_SIZE 256

NvmeSglDescriptor segment[SEG_CHUNK_SIZE], *sgld, *last_sgld;
uint64_t nsgld;
--
2.26.3




Reviewed-by: Klaus Jensen 


signature.asc
Description: PGP signature

49 matches

Mail list logo