Re: [PATCH v5 28/31] block.c: assert BQL lock held in bdrv_co_invalidate_cache

2022-01-21 Thread Emanuele Giuseppe Esposito




On 20/01/2022 14:48, Kevin Wolf wrote:

Am 20.01.2022 um 14:22 hat Paolo Bonzini geschrieben:

On 1/19/22 19:34, Kevin Wolf wrote:

So if we go back to a bdrv_invalidate_cache() that does all the graph
manipulations (and asserts that we're in the main loop) and then have a
much smaller bdrv_co_invalidate_cache() that basically just calls into
the driver, would that solve the problem?


I was going to suggest something similar, and even name the former
bdrv_activate().  Then bdrv_activate() is a graph manipulation function,
while bdrv_co_invalidate_cache() is an I/O function.


I like this. The naming inconsistency between inactivate and
invalidate_cache has always bothered me.



Ok, I am going to apply this. Thank you for the suggestion!


I did look further and couldn’t find anything else interesting.
So I think(TM) that this blk_exp_add() is the only thing that needs fixing.


Once the logic is split between bdrv_activate and
bdrv_co_invalidate_cache, there is no longer any need to fix blk_exp_add :)


I am going to send v6 by the end of today.

Thank you,
Emanuele




Re: [PATCH v5 28/31] block.c: assert BQL lock held in bdrv_co_invalidate_cache

2022-01-20 Thread Kevin Wolf
Am 20.01.2022 um 14:22 hat Paolo Bonzini geschrieben:
> On 1/19/22 19:34, Kevin Wolf wrote:
> > So if we go back to a bdrv_invalidate_cache() that does all the graph
> > manipulations (and asserts that we're in the main loop) and then have a
> > much smaller bdrv_co_invalidate_cache() that basically just calls into
> > the driver, would that solve the problem?
> 
> I was going to suggest something similar, and even name the former
> bdrv_activate().  Then bdrv_activate() is a graph manipulation function,
> while bdrv_co_invalidate_cache() is an I/O function.

I like this. The naming inconsistency between inactivate and
invalidate_cache has always bothered me.

Kevin




Re: [PATCH v5 28/31] block.c: assert BQL lock held in bdrv_co_invalidate_cache

2022-01-20 Thread Paolo Bonzini

On 1/19/22 19:34, Kevin Wolf wrote:

So if we go back to a bdrv_invalidate_cache() that does all the graph
manipulations (and asserts that we're in the main loop) and then have a
much smaller bdrv_co_invalidate_cache() that basically just calls into
the driver, would that solve the problem?


I was going to suggest something similar, and even name the former 
bdrv_activate().  Then bdrv_activate() is a graph manipulation function, 
while bdrv_co_invalidate_cache() is an I/O function.


Paolo



Re: [PATCH v5 28/31] block.c: assert BQL lock held in bdrv_co_invalidate_cache

2022-01-19 Thread Kevin Wolf
Am 20.12.2021 um 13:20 hat Emanuele Giuseppe Esposito geschrieben:
> 
> 
> On 17/12/2021 17:38, Emanuele Giuseppe Esposito wrote:
> > 
> > 
> > On 17/12/2021 12:04, Hanna Reitz wrote:
> > > On 24.11.21 07:44, Emanuele Giuseppe Esposito wrote:
> > > > bdrv_co_invalidate_cache is special: it is an I/O function,
> > > 
> > > I still don’t believe it is, but well.
> > > 
> > > (Yes, it is called by a test in an iothread, but I believe we’ve
> > > seen that the tests simply sometimes test things that shouldn’t be
> > > allowed.)
> > > 
> > > > but uses the block layer permission API, which is GS.
> > > > 
> > > > Because of this, we can assert that either the function is
> > > > being called with BQL held, and thus can use the permission API,
> > > > or make sure that the permission API is not used, by ensuring that
> > > > bs (and parents) .open_flags does not contain BDRV_O_INACTIVE.
> > > > 
> > > > Signed-off-by: Emanuele Giuseppe Esposito 
> > > > ---
> > > >   block.c | 26 ++
> > > >   1 file changed, 26 insertions(+)
> > > > 
> > > > diff --git a/block.c b/block.c
> > > > index a0309f827d..805974676b 100644
> > > > --- a/block.c
> > > > +++ b/block.c
> > > > @@ -6574,6 +6574,26 @@ void bdrv_init_with_whitelist(void)
> > > >   bdrv_init();
> > > >   }
> > > > +static bool bdrv_is_active(BlockDriverState *bs)
> > > > +{
> > > > +    BdrvChild *parent;
> > > > +
> > > > +    if (bs->open_flags & BDRV_O_INACTIVE) {
> > > > +    return false;
> > > > +    }
> > > > +
> > > > +    QLIST_FOREACH(parent, &bs->parents, next_parent) {
> > > > +    if (parent->klass->parent_is_bds) {
> > > > +    BlockDriverState *parent_bs = parent->opaque;
> > > 
> > > This looks like a really bad hack to me.  We purposefully have made
> > > the parent link opaque so that a BDS cannot easily reach its
> > > parents.  All accesses should go through BdrvChildClass methods.
> > > 
> > > I also don’t understand why we need to query parents at all.  The
> > > only fact that determines whether the current BDS will have its
> > > permissions changed is whether the BDS itself is active or
> > > inactive.  Sure, we’ll invoke bdrv_co_invalidate_cache() on the
> > > parents, too, but then we could simply let the assertion fail there.
> > > 
> > > > +    if (!bdrv_is_active(parent_bs)) {
> > > > +    return false;
> > > > +    }
> > > > +    }
> > > > +    }
> > > > +
> > > > +   return true;
> > > > +}
> > > > +
> > > >   int coroutine_fn bdrv_co_invalidate_cache(BlockDriverState
> > > > *bs, Error **errp)
> > > >   {
> > > >   BdrvChild *child, *parent;
> > > > @@ -6585,6 +6605,12 @@ int coroutine_fn
> > > > bdrv_co_invalidate_cache(BlockDriverState *bs, Error **errp)
> > > >   return -ENOMEDIUM;
> > > >   }
> > > > +    /*
> > > > + * No need to muck with permissions if bs is active.
> > > > + * TODO: should activation be a separate function?
> > > > + */
> > > > +    assert(qemu_in_main_thread() || bdrv_is_active(bs));
> > > > +
> > > 
> > > I don’t understand this, really.  It looks to me like “if you don’t
> > > call this in the main thread, this better be a no-op”, i.e., you
> > > must never call this function in an I/O thread if you really want to
> > > use it.  I.e. what I’d classify as a GS function.
> > > 
> > > It sounds like this is just a special case for said test, and
> > > special-casing code for tests sounds like a bad idea.
> > 
> > Ok, but trying to leave just the qemu_in_main_thread() assertion makes
> > test 307 (./check 307) fail.
> > I am actually not sure why it fails, but I am sure it is because of
> > the assertion, since without it the test passes.
> > 
> > I tried with gdb (./check -gdb 307 in one terminal and
> > gdb -iex "target remote localhost:12345"
> > in another), but it points me to the backtrace below, which I think is
> > the nbd server getting its socket closed (because the other side
> > crashed), and not the actual error.
> > 
> > 
> > Thread 1 "qemu-system-x86" received signal SIGPIPE, Broken pipe.
> > 0x768af54d in sendmsg () from target:/lib64/libc.so.6
> > (gdb) bt
> > #0  0x768af54d in sendmsg () from target:/lib64/libc.so.6
> > #1  0x55c13cc9 in qio_channel_socket_writev (ioc=<optimized out>,
> > iov=0x569a4870, niov=1, fds=0x0, nfds=<optimized out>, errp=0x0)
> >      at ../io/channel-socket.c:561
> > #2  0x55c19b18 in qio_channel_writev_full_all
> > (ioc=0x5763b800, iov=iov@entry=0x7fffe8dffd80, niov=niov@entry=1,
> > fds=fds@entry=0x0,
> >      nfds=nfds@entry=0, errp=errp@entry=0x0) at ../io/channel.c:240
> > #3  0x55c19bd2 in qio_channel_writev_all (errp=0x0, niov=1,
> > iov=0x7fffe8dffd80, ioc=<optimized out>) at ../io/channel.c:220
> > #4  qio_channel_write_all (ioc=<optimized out>,
> > buf=buf@entry=0x7fffe8dffdd0 "", buflen=buflen@entry=20,
> > errp=errp@entry=0x0) at ../io/channel.c:330
> > #5  0x55c27e75 in nbd_write (errp=0x0, size=20,
> > buffer=0x7fffe8dffdd0, ioc=<optimized out>) at ../nbd/nbd-in

Re: [PATCH v5 28/31] block.c: assert BQL lock held in bdrv_co_invalidate_cache

2022-01-19 Thread Hanna Reitz

On 19.01.22 16:57, Hanna Reitz wrote:

On 23.12.21 18:11, Hanna Reitz wrote:

On 20.12.21 13:20, Emanuele Giuseppe Esposito wrote:



On 17/12/2021 17:38, Emanuele Giuseppe Esposito wrote:



On 17/12/2021 12:04, Hanna Reitz wrote:

On 24.11.21 07:44, Emanuele Giuseppe Esposito wrote:

bdrv_co_invalidate_cache is special: it is an I/O function,


I still don’t believe it is, but well.

(Yes, it is called by a test in an iothread, but I believe we’ve 
seen that the tests simply sometimes test things that shouldn’t be 
allowed.)



but uses the block layer permission API, which is GS.

Because of this, we can assert that either the function is
being called with BQL held, and thus can use the permission API,
or make sure that the permission API is not used, by ensuring that
bs (and parents) .open_flags does not contain BDRV_O_INACTIVE.

Signed-off-by: Emanuele Giuseppe Esposito 
---
  block.c | 26 ++
  1 file changed, 26 insertions(+)

diff --git a/block.c b/block.c
index a0309f827d..805974676b 100644
--- a/block.c
+++ b/block.c
@@ -6574,6 +6574,26 @@ void bdrv_init_with_whitelist(void)
  bdrv_init();
  }
+static bool bdrv_is_active(BlockDriverState *bs)
+{
+    BdrvChild *parent;
+
+    if (bs->open_flags & BDRV_O_INACTIVE) {
+    return false;
+    }
+
+    QLIST_FOREACH(parent, &bs->parents, next_parent) {
+    if (parent->klass->parent_is_bds) {
+    BlockDriverState *parent_bs = parent->opaque;


This looks like a really bad hack to me.  We purposefully have 
made the parent link opaque so that a BDS cannot easily reach its 
parents.  All accesses should go through BdrvChildClass methods.


I also don’t understand why we need to query parents at all. The 
only fact that determines whether the current BDS will have its 
permissions changed is whether the BDS itself is active or 
inactive.  Sure, we’ll invoke bdrv_co_invalidate_cache() on the 
parents, too, but then we could simply let the assertion fail there.



+    if (!bdrv_is_active(parent_bs)) {
+    return false;
+    }
+    }
+    }
+
+   return true;
+}
+
  int coroutine_fn bdrv_co_invalidate_cache(BlockDriverState *bs, 
Error **errp)

  {
  BdrvChild *child, *parent;
@@ -6585,6 +6605,12 @@ int coroutine_fn 
bdrv_co_invalidate_cache(BlockDriverState *bs, Error **errp)

  return -ENOMEDIUM;
  }
+    /*
+ * No need to muck with permissions if bs is active.
+ * TODO: should activation be a separate function?
+ */
+    assert(qemu_in_main_thread() || bdrv_is_active(bs));
+


I don’t understand this, really.  It looks to me like “if you 
don’t call this in the main thread, this better be a no-op”, i.e., 
you must never call this function in an I/O thread if you really 
want to use it.  I.e. what I’d classify as a GS function.


It sounds like this is just a special case for said test, and 
special-casing code for tests sounds like a bad idea.


Ok, but trying to leave just the qemu_in_main_thread() assertion
makes test 307 (./check 307) fail.
I am actually not sure why it fails, but I am sure it is because
of the assertion, since without it the test passes.


I tried with gdb (./check -gdb 307 in one terminal and
gdb -iex "target remote localhost:12345"
in another), but it points me to the backtrace below, which I think is
the nbd server getting its socket closed (because the other side
crashed), and not the actual error.



Thread 1 "qemu-system-x86" received signal SIGPIPE, Broken pipe.
0x768af54d in sendmsg () from target:/lib64/libc.so.6
(gdb) bt
#0  0x768af54d in sendmsg () from target:/lib64/libc.so.6
#1  0x55c13cc9 in qio_channel_socket_writev (ioc=<optimized out>, iov=0x569a4870, niov=1, fds=0x0, nfds=<optimized out>, errp=0x0)
    at ../io/channel-socket.c:561
#2  0x55c19b18 in qio_channel_writev_full_all (ioc=0x5763b800, iov=iov@entry=0x7fffe8dffd80, niov=niov@entry=1, fds=fds@entry=0x0,
    nfds=nfds@entry=0, errp=errp@entry=0x0) at ../io/channel.c:240
#3  0x55c19bd2 in qio_channel_writev_all (errp=0x0, niov=1, iov=0x7fffe8dffd80, ioc=<optimized out>) at ../io/channel.c:220
#4  qio_channel_write_all (ioc=<optimized out>, buf=buf@entry=0x7fffe8dffdd0 "", buflen=buflen@entry=20, errp=errp@entry=0x0) at ../io/channel.c:330
#5  0x55c27e75 in nbd_write (errp=0x0, size=20, buffer=0x7fffe8dffdd0, ioc=<optimized out>) at ../nbd/nbd-internal.h:71
#6  nbd_negotiate_send_rep_len (client=client@entry=0x56f60930, type=type@entry=1, len=len@entry=0, errp=errp@entry=0x0) at ../nbd/server.c:203
#7  0x55c29db1 in nbd_negotiate_send_rep (errp=0x0, type=1, client=0x56f60930) at ../nbd/server.c:211
--Type <RET> for more, q to quit, c to continue without paging--
#8  nbd_negotiate_options (errp=0x7fffe8dffe88, client=<optimized out>) at ../nbd/server.c:1224
#9  nbd_negotiate (errp=0x7fffe8dffe88, client=<optimized out>) at ../nbd/server.c:1340
#10 nbd_co_client_start (opaque=<optimized out>) at ../nbd/server.c:2715
#11 0x55d70423 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at ../util/cor

Re: [PATCH v5 28/31] block.c: assert BQL lock held in bdrv_co_invalidate_cache

2022-01-19 Thread Hanna Reitz

On 23.12.21 18:11, Hanna Reitz wrote:

On 20.12.21 13:20, Emanuele Giuseppe Esposito wrote:



On 17/12/2021 17:38, Emanuele Giuseppe Esposito wrote:



On 17/12/2021 12:04, Hanna Reitz wrote:

On 24.11.21 07:44, Emanuele Giuseppe Esposito wrote:

bdrv_co_invalidate_cache is special: it is an I/O function,


I still don’t believe it is, but well.

(Yes, it is called by a test in an iothread, but I believe we’ve 
seen that the tests simply sometimes test things that shouldn’t be 
allowed.)



but uses the block layer permission API, which is GS.

Because of this, we can assert that either the function is
being called with BQL held, and thus can use the permission API,
or make sure that the permission API is not used, by ensuring that
bs (and parents) .open_flags does not contain BDRV_O_INACTIVE.

Signed-off-by: Emanuele Giuseppe Esposito 
---
  block.c | 26 ++
  1 file changed, 26 insertions(+)

diff --git a/block.c b/block.c
index a0309f827d..805974676b 100644
--- a/block.c
+++ b/block.c
@@ -6574,6 +6574,26 @@ void bdrv_init_with_whitelist(void)
  bdrv_init();
  }
+static bool bdrv_is_active(BlockDriverState *bs)
+{
+    BdrvChild *parent;
+
+    if (bs->open_flags & BDRV_O_INACTIVE) {
+    return false;
+    }
+
+    QLIST_FOREACH(parent, &bs->parents, next_parent) {
+    if (parent->klass->parent_is_bds) {
+    BlockDriverState *parent_bs = parent->opaque;


This looks like a really bad hack to me.  We purposefully have made 
the parent link opaque so that a BDS cannot easily reach its 
parents.  All accesses should go through BdrvChildClass methods.


I also don’t understand why we need to query parents at all. The 
only fact that determines whether the current BDS will have its 
permissions changed is whether the BDS itself is active or 
inactive.  Sure, we’ll invoke bdrv_co_invalidate_cache() on the 
parents, too, but then we could simply let the assertion fail there.



+    if (!bdrv_is_active(parent_bs)) {
+    return false;
+    }
+    }
+    }
+
+   return true;
+}
+
  int coroutine_fn bdrv_co_invalidate_cache(BlockDriverState *bs, 
Error **errp)

  {
  BdrvChild *child, *parent;
@@ -6585,6 +6605,12 @@ int coroutine_fn 
bdrv_co_invalidate_cache(BlockDriverState *bs, Error **errp)

  return -ENOMEDIUM;
  }
+    /*
+ * No need to muck with permissions if bs is active.
+ * TODO: should activation be a separate function?
+ */
+    assert(qemu_in_main_thread() || bdrv_is_active(bs));
+


I don’t understand this, really.  It looks to me like “if you don’t 
call this in the main thread, this better be a no-op”, i.e., you 
must never call this function in an I/O thread if you really want 
to use it.  I.e. what I’d classify as a GS function.


It sounds like this is just a special case for said test, and 
special-casing code for tests sounds like a bad idea.


Ok, but trying to leave just the qemu_in_main_thread() assertion
makes test 307 (./check 307) fail.
I am actually not sure why it fails, but I am sure it is because
of the assertion, since without it the test passes.


I tried with gdb (./check -gdb 307 in one terminal and
gdb -iex "target remote localhost:12345"
in another), but it points me to the backtrace below, which I think is
the nbd server getting its socket closed (because the other side
crashed), and not the actual error.



Thread 1 "qemu-system-x86" received signal SIGPIPE, Broken pipe.
0x768af54d in sendmsg () from target:/lib64/libc.so.6
(gdb) bt
#0  0x768af54d in sendmsg () from target:/lib64/libc.so.6
#1  0x55c13cc9 in qio_channel_socket_writev (ioc=<optimized out>, iov=0x569a4870, niov=1, fds=0x0, nfds=<optimized out>, errp=0x0)
    at ../io/channel-socket.c:561
#2  0x55c19b18 in qio_channel_writev_full_all (ioc=0x5763b800, iov=iov@entry=0x7fffe8dffd80, niov=niov@entry=1, fds=fds@entry=0x0,
    nfds=nfds@entry=0, errp=errp@entry=0x0) at ../io/channel.c:240
#3  0x55c19bd2 in qio_channel_writev_all (errp=0x0, niov=1, iov=0x7fffe8dffd80, ioc=<optimized out>) at ../io/channel.c:220
#4  qio_channel_write_all (ioc=<optimized out>, buf=buf@entry=0x7fffe8dffdd0 "", buflen=buflen@entry=20, errp=errp@entry=0x0) at ../io/channel.c:330
#5  0x55c27e75 in nbd_write (errp=0x0, size=20, buffer=0x7fffe8dffdd0, ioc=<optimized out>) at ../nbd/nbd-internal.h:71
#6  nbd_negotiate_send_rep_len (client=client@entry=0x56f60930, type=type@entry=1, len=len@entry=0, errp=errp@entry=0x0) at ../nbd/server.c:203
#7  0x55c29db1 in nbd_negotiate_send_rep (errp=0x0, type=1, client=0x56f60930) at ../nbd/server.c:211
--Type <RET> for more, q to quit, c to continue without paging--
#8  nbd_negotiate_options (errp=0x7fffe8dffe88, client=<optimized out>) at ../nbd/server.c:1224
#9  nbd_negotiate (errp=0x7fffe8dffe88, client=<optimized out>) at ../nbd/server.c:1340
#10 nbd_co_client_start (opaque=<optimized out>) at ../nbd/server.c:2715
#11 0x55d70423 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at ../util/coroutine-ucontext.c:173
#12 0x767

Re: [PATCH v5 28/31] block.c: assert BQL lock held in bdrv_co_invalidate_cache

2021-12-23 Thread Hanna Reitz

On 20.12.21 13:20, Emanuele Giuseppe Esposito wrote:



On 17/12/2021 17:38, Emanuele Giuseppe Esposito wrote:



On 17/12/2021 12:04, Hanna Reitz wrote:

On 24.11.21 07:44, Emanuele Giuseppe Esposito wrote:

bdrv_co_invalidate_cache is special: it is an I/O function,


I still don’t believe it is, but well.

(Yes, it is called by a test in an iothread, but I believe we’ve 
seen that the tests simply sometimes test things that shouldn’t be 
allowed.)



but uses the block layer permission API, which is GS.

Because of this, we can assert that either the function is
being called with BQL held, and thus can use the permission API,
or make sure that the permission API is not used, by ensuring that
bs (and parents) .open_flags does not contain BDRV_O_INACTIVE.

Signed-off-by: Emanuele Giuseppe Esposito 
---
  block.c | 26 ++
  1 file changed, 26 insertions(+)

diff --git a/block.c b/block.c
index a0309f827d..805974676b 100644
--- a/block.c
+++ b/block.c
@@ -6574,6 +6574,26 @@ void bdrv_init_with_whitelist(void)
  bdrv_init();
  }
+static bool bdrv_is_active(BlockDriverState *bs)
+{
+    BdrvChild *parent;
+
+    if (bs->open_flags & BDRV_O_INACTIVE) {
+    return false;
+    }
+
+    QLIST_FOREACH(parent, &bs->parents, next_parent) {
+    if (parent->klass->parent_is_bds) {
+    BlockDriverState *parent_bs = parent->opaque;


This looks like a really bad hack to me.  We purposefully have made 
the parent link opaque so that a BDS cannot easily reach its 
parents.  All accesses should go through BdrvChildClass methods.


I also don’t understand why we need to query parents at all. The 
only fact that determines whether the current BDS will have its 
permissions changed is whether the BDS itself is active or 
inactive.  Sure, we’ll invoke bdrv_co_invalidate_cache() on the 
parents, too, but then we could simply let the assertion fail there.



+    if (!bdrv_is_active(parent_bs)) {
+    return false;
+    }
+    }
+    }
+
+   return true;
+}
+
  int coroutine_fn bdrv_co_invalidate_cache(BlockDriverState *bs, 
Error **errp)

  {
  BdrvChild *child, *parent;
@@ -6585,6 +6605,12 @@ int coroutine_fn 
bdrv_co_invalidate_cache(BlockDriverState *bs, Error **errp)

  return -ENOMEDIUM;
  }
+    /*
+ * No need to muck with permissions if bs is active.
+ * TODO: should activation be a separate function?
+ */
+    assert(qemu_in_main_thread() || bdrv_is_active(bs));
+


I don’t understand this, really.  It looks to me like “if you don’t 
call this in the main thread, this better be a no-op”, i.e., you 
must never call this function in an I/O thread if you really want to 
use it.  I.e. what I’d classify as a GS function.


It sounds like this is just a special case for said test, and 
special-casing code for tests sounds like a bad idea.


Ok, but trying to leave just the qemu_in_main_thread() assertion
makes test 307 (./check 307) fail.
I am actually not sure why it fails, but I am sure it is because
of the assertion, since without it the test passes.


I tried with gdb (./check -gdb 307 in one terminal and
gdb -iex "target remote localhost:12345"
in another), but it points me to the backtrace below, which I think is
the nbd server getting its socket closed (because the other side
crashed), and not the actual error.



Thread 1 "qemu-system-x86" received signal SIGPIPE, Broken pipe.
0x768af54d in sendmsg () from target:/lib64/libc.so.6
(gdb) bt
#0  0x768af54d in sendmsg () from target:/lib64/libc.so.6
#1  0x55c13cc9 in qio_channel_socket_writev (ioc=<optimized out>, iov=0x569a4870, niov=1, fds=0x0, nfds=<optimized out>, errp=0x0)
    at ../io/channel-socket.c:561
#2  0x55c19b18 in qio_channel_writev_full_all (ioc=0x5763b800, iov=iov@entry=0x7fffe8dffd80, niov=niov@entry=1, fds=fds@entry=0x0,
    nfds=nfds@entry=0, errp=errp@entry=0x0) at ../io/channel.c:240
#3  0x55c19bd2 in qio_channel_writev_all (errp=0x0, niov=1, iov=0x7fffe8dffd80, ioc=<optimized out>) at ../io/channel.c:220
#4  qio_channel_write_all (ioc=<optimized out>, buf=buf@entry=0x7fffe8dffdd0 "", buflen=buflen@entry=20, errp=errp@entry=0x0) at ../io/channel.c:330
#5  0x55c27e75 in nbd_write (errp=0x0, size=20, buffer=0x7fffe8dffdd0, ioc=<optimized out>) at ../nbd/nbd-internal.h:71
#6  nbd_negotiate_send_rep_len (client=client@entry=0x56f60930, type=type@entry=1, len=len@entry=0, errp=errp@entry=0x0) at ../nbd/server.c:203
#7  0x55c29db1 in nbd_negotiate_send_rep (errp=0x0, type=1, client=0x56f60930) at ../nbd/server.c:211
--Type <RET> for more, q to quit, c to continue without paging--
#8  nbd_negotiate_options (errp=0x7fffe8dffe88, client=<optimized out>) at ../nbd/server.c:1224
#9  nbd_negotiate (errp=0x7fffe8dffe88, client=<optimized out>) at ../nbd/server.c:1340
#10 nbd_co_client_start (opaque=<optimized out>) at ../nbd/server.c:2715
#11 0x55d70423 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at ../util/coroutine-ucontext.c:173
#12 0x767f3820 in ?? () from target:/lib64/libc.

Re: [PATCH v5 28/31] block.c: assert BQL lock held in bdrv_co_invalidate_cache

2021-12-20 Thread Emanuele Giuseppe Esposito




On 17/12/2021 17:38, Emanuele Giuseppe Esposito wrote:



On 17/12/2021 12:04, Hanna Reitz wrote:

On 24.11.21 07:44, Emanuele Giuseppe Esposito wrote:

bdrv_co_invalidate_cache is special: it is an I/O function,


I still don’t believe it is, but well.

(Yes, it is called by a test in an iothread, but I believe we’ve seen 
that the tests simply sometimes test things that shouldn’t be allowed.)



but uses the block layer permission API, which is GS.

Because of this, we can assert that either the function is
being called with BQL held, and thus can use the permission API,
or make sure that the permission API is not used, by ensuring that
bs (and parents) .open_flags does not contain BDRV_O_INACTIVE.

Signed-off-by: Emanuele Giuseppe Esposito 
---
  block.c | 26 ++
  1 file changed, 26 insertions(+)

diff --git a/block.c b/block.c
index a0309f827d..805974676b 100644
--- a/block.c
+++ b/block.c
@@ -6574,6 +6574,26 @@ void bdrv_init_with_whitelist(void)
  bdrv_init();
  }
+static bool bdrv_is_active(BlockDriverState *bs)
+{
+    BdrvChild *parent;
+
+    if (bs->open_flags & BDRV_O_INACTIVE) {
+    return false;
+    }
+
+    QLIST_FOREACH(parent, &bs->parents, next_parent) {
+    if (parent->klass->parent_is_bds) {
+    BlockDriverState *parent_bs = parent->opaque;


This looks like a really bad hack to me.  We purposefully have made 
the parent link opaque so that a BDS cannot easily reach its parents.  
All accesses should go through BdrvChildClass methods.


I also don’t understand why we need to query parents at all.  The only 
fact that determines whether the current BDS will have its permissions 
changed is whether the BDS itself is active or inactive.  Sure, we’ll 
invoke bdrv_co_invalidate_cache() on the parents, too, but then we 
could simply let the assertion fail there.



+    if (!bdrv_is_active(parent_bs)) {
+    return false;
+    }
+    }
+    }
+
+   return true;
+}
+
  int coroutine_fn bdrv_co_invalidate_cache(BlockDriverState *bs, 
Error **errp)

  {
  BdrvChild *child, *parent;
@@ -6585,6 +6605,12 @@ int coroutine_fn 
bdrv_co_invalidate_cache(BlockDriverState *bs, Error **errp)

  return -ENOMEDIUM;
  }
+    /*
+ * No need to muck with permissions if bs is active.
+ * TODO: should activation be a separate function?
+ */
+    assert(qemu_in_main_thread() || bdrv_is_active(bs));
+


I don’t understand this, really.  It looks to me like “if you don’t 
call this in the main thread, this better be a no-op”, i.e., you must 
never call this function in an I/O thread if you really want to use 
it.  I.e. what I’d classify as a GS function.


It sounds like this is just a special case for said test, and 
special-casing code for tests sounds like a bad idea.


Ok, but trying to leave just the qemu_in_main_thread() assertion makes
test 307 (./check 307) fail.
I am actually not sure why it fails, but I am sure it is because of
the assertion, since without it the test passes.


I tried with gdb (./check -gdb 307 in one terminal and
gdb -iex "target remote localhost:12345"
in another), but it points me to the backtrace below, which I think is
the nbd server getting its socket closed (because the other side
crashed), and not the actual error.
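[Editorial aside, not part of the original mail: since the SIGPIPE stop above is only a symptom of the peer closing the socket, gdb can be told to deliver SIGPIPE to the program without stopping, which usually lets the session run on until the assertion failure that is the real error:]

```
(gdb) handle SIGPIPE nostop noprint pass
(gdb) continue
```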



Thread 1 "qemu-system-x86" received signal SIGPIPE, Broken pipe.
0x768af54d in sendmsg () from target:/lib64/libc.so.6
(gdb) bt
#0  0x768af54d in sendmsg () from target:/lib64/libc.so.6
#1  0x55c13cc9 in qio_channel_socket_writev (ioc=<optimized out>, iov=0x569a4870, niov=1, fds=0x0, nfds=<optimized out>, errp=0x0)
    at ../io/channel-socket.c:561
#2  0x55c19b18 in qio_channel_writev_full_all (ioc=0x5763b800, iov=iov@entry=0x7fffe8dffd80, niov=niov@entry=1, fds=fds@entry=0x0,
    nfds=nfds@entry=0, errp=errp@entry=0x0) at ../io/channel.c:240
#3  0x55c19bd2 in qio_channel_writev_all (errp=0x0, niov=1, iov=0x7fffe8dffd80, ioc=<optimized out>) at ../io/channel.c:220
#4  qio_channel_write_all (ioc=<optimized out>, buf=buf@entry=0x7fffe8dffdd0 "", buflen=buflen@entry=20, errp=errp@entry=0x0) at ../io/channel.c:330
#5  0x55c27e75 in nbd_write (errp=0x0, size=20, buffer=0x7fffe8dffdd0, ioc=<optimized out>) at ../nbd/nbd-internal.h:71
#6  nbd_negotiate_send_rep_len (client=client@entry=0x56f60930, type=type@entry=1, len=len@entry=0, errp=errp@entry=0x0) at ../nbd/server.c:203
#7  0x55c29db1 in nbd_negotiate_send_rep (errp=0x0, type=1, client=0x56f60930) at ../nbd/server.c:211
--Type <RET> for more, q to quit, c to continue without paging--
#8  nbd_negotiate_options (errp=0x7fffe8dffe88, client=<optimized out>) at ../nbd/server.c:1224
#9  nbd_negotiate (errp=0x7fffe8dffe88, client=<optimized out>) at ../nbd/server.c:1340
#10 nbd_co_client_start (opaque=<optimized out>) at ../nbd/server.c:2715
#11 0x55d70423 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at ../util/coroutine-ucontext.c:173
#12 0x767f3820 in ?? () from target:/lib64/libc.so.6
#13 0x7fffca80 in ?? ()



Ok the reason for

Re: [PATCH v5 28/31] block.c: assert BQL lock held in bdrv_co_invalidate_cache

2021-12-17 Thread Emanuele Giuseppe Esposito




On 17/12/2021 12:04, Hanna Reitz wrote:

On 24.11.21 07:44, Emanuele Giuseppe Esposito wrote:

bdrv_co_invalidate_cache is special: it is an I/O function,


I still don’t believe it is, but well.

(Yes, it is called by a test in an iothread, but I believe we’ve seen 
that the tests simply sometimes test things that shouldn’t be allowed.)



but uses the block layer permission API, which is GS.

Because of this, we can assert that either the function is
being called with BQL held, and thus can use the permission API,
or make sure that the permission API is not used, by ensuring that
bs (and parents) .open_flags does not contain BDRV_O_INACTIVE.

Signed-off-by: Emanuele Giuseppe Esposito 
---
  block.c | 26 ++
  1 file changed, 26 insertions(+)

diff --git a/block.c b/block.c
index a0309f827d..805974676b 100644
--- a/block.c
+++ b/block.c
@@ -6574,6 +6574,26 @@ void bdrv_init_with_whitelist(void)
  bdrv_init();
  }
+static bool bdrv_is_active(BlockDriverState *bs)
+{
+    BdrvChild *parent;
+
+    if (bs->open_flags & BDRV_O_INACTIVE) {
+    return false;
+    }
+
+    QLIST_FOREACH(parent, &bs->parents, next_parent) {
+    if (parent->klass->parent_is_bds) {
+    BlockDriverState *parent_bs = parent->opaque;


This looks like a really bad hack to me.  We purposefully have made the 
parent link opaque so that a BDS cannot easily reach its parents.  All 
accesses should go through BdrvChildClass methods.


I also don’t understand why we need to query parents at all.  The only 
fact that determines whether the current BDS will have its permissions 
changed is whether the BDS itself is active or inactive.  Sure, we’ll 
invoke bdrv_co_invalidate_cache() on the parents, too, but then we could 
simply let the assertion fail there.



+    if (!bdrv_is_active(parent_bs)) {
+    return false;
+    }
+    }
+    }
+
+   return true;
+}
+
  int coroutine_fn bdrv_co_invalidate_cache(BlockDriverState *bs, 
Error **errp)

  {
  BdrvChild *child, *parent;
@@ -6585,6 +6605,12 @@ int coroutine_fn 
bdrv_co_invalidate_cache(BlockDriverState *bs, Error **errp)

  return -ENOMEDIUM;
  }
+    /*
+ * No need to muck with permissions if bs is active.
+ * TODO: should activation be a separate function?
+ */
+    assert(qemu_in_main_thread() || bdrv_is_active(bs));
+


I don’t understand this, really.  It looks to me like “if you don’t call 
this in the main thread, this better be a no-op”, i.e., you must never 
call this function in an I/O thread if you really want to use it.  I.e. 
what I’d classify as a GS function.


It sounds like this is just a special case for said test, and 
special-casing code for tests sounds like a bad idea.


Ok, but trying to leave just the qemu_in_main_thread() assertion makes
test 307 (./check 307) fail.
I am actually not sure why it fails, but I am sure it is because of
the assertion, since without it the test passes.


I tried with gdb (./check -gdb 307 in one terminal and
gdb -iex "target remote localhost:12345"
in another), but it points me to the backtrace below, which I think is
the nbd server getting its socket closed (because the other side
crashed), and not the actual error.



Thread 1 "qemu-system-x86" received signal SIGPIPE, Broken pipe.
0x768af54d in sendmsg () from target:/lib64/libc.so.6
(gdb) bt
#0  0x768af54d in sendmsg () from target:/lib64/libc.so.6
#1  0x55c13cc9 in qio_channel_socket_writev (ioc=<optimized out>, iov=0x569a4870, niov=1, fds=0x0, nfds=<optimized out>, errp=0x0)
    at ../io/channel-socket.c:561
#2  0x55c19b18 in qio_channel_writev_full_all (ioc=0x5763b800, iov=iov@entry=0x7fffe8dffd80, niov=niov@entry=1, fds=fds@entry=0x0,
    nfds=nfds@entry=0, errp=errp@entry=0x0) at ../io/channel.c:240
#3  0x55c19bd2 in qio_channel_writev_all (errp=0x0, niov=1, iov=0x7fffe8dffd80, ioc=<optimized out>) at ../io/channel.c:220
#4  qio_channel_write_all (ioc=<optimized out>, buf=buf@entry=0x7fffe8dffdd0 "", buflen=buflen@entry=20, errp=errp@entry=0x0) at ../io/channel.c:330
#5  0x55c27e75 in nbd_write (errp=0x0, size=20, buffer=0x7fffe8dffdd0, ioc=<optimized out>) at ../nbd/nbd-internal.h:71
#6  nbd_negotiate_send_rep_len (client=client@entry=0x56f60930, type=type@entry=1, len=len@entry=0, errp=errp@entry=0x0) at ../nbd/server.c:203
#7  0x55c29db1 in nbd_negotiate_send_rep (errp=0x0, type=1, client=0x56f60930) at ../nbd/server.c:211
--Type <RET> for more, q to quit, c to continue without paging--
#8  nbd_negotiate_options (errp=0x7fffe8dffe88, client=<optimized out>) at ../nbd/server.c:1224
#9  nbd_negotiate (errp=0x7fffe8dffe88, client=<optimized out>) at ../nbd/server.c:1340
#10 nbd_co_client_start (opaque=<optimized out>) at ../nbd/server.c:2715
#11 0x55d70423 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at ../util/coroutine-ucontext.c:173
#12 0x767f3820 in ?? () from target:/lib64/libc.so.6
#13 0x7fffca80 in ?? ()

Emanuele


Hanna


  QLIST_FOREACH(child, &bs->children, next) {
  

Re: [PATCH v5 28/31] block.c: assert BQL lock held in bdrv_co_invalidate_cache

2021-12-17 Thread Hanna Reitz

On 24.11.21 07:44, Emanuele Giuseppe Esposito wrote:

bdrv_co_invalidate_cache is special: it is an I/O function,


I still don’t believe it is, but well.

(Yes, it is called by a test in an iothread, but I believe we’ve seen 
that the tests simply sometimes test things that shouldn’t be allowed.)



but uses the block layer permission API, which is GS.

Because of this, we can assert that either the function is
being called with BQL held, and thus can use the permission API,
or make sure that the permission API is not used, by ensuring that
bs (and parents) .open_flags does not contain BDRV_O_INACTIVE.

Signed-off-by: Emanuele Giuseppe Esposito 
---
  block.c | 26 ++
  1 file changed, 26 insertions(+)

diff --git a/block.c b/block.c
index a0309f827d..805974676b 100644
--- a/block.c
+++ b/block.c
@@ -6574,6 +6574,26 @@ void bdrv_init_with_whitelist(void)
  bdrv_init();
  }
  
+static bool bdrv_is_active(BlockDriverState *bs)

+{
+BdrvChild *parent;
+
+if (bs->open_flags & BDRV_O_INACTIVE) {
+return false;
+}
+
+QLIST_FOREACH(parent, &bs->parents, next_parent) {
+if (parent->klass->parent_is_bds) {
+BlockDriverState *parent_bs = parent->opaque;


This looks like a really bad hack to me.  We purposefully have made the 
parent link opaque so that a BDS cannot easily reach its parents.  All 
accesses should go through BdrvChildClass methods.


I also don’t understand why we need to query parents at all.  The only 
fact that determines whether the current BDS will have its permissions 
changed is whether the BDS itself is active or inactive.  Sure, we’ll 
invoke bdrv_co_invalidate_cache() on the parents, too, but then we could 
simply let the assertion fail there.



+if (!bdrv_is_active(parent_bs)) {
+return false;
+}
+}
+}
+
+   return true;
+}
+
  int coroutine_fn bdrv_co_invalidate_cache(BlockDriverState *bs, Error **errp)
  {
  BdrvChild *child, *parent;
@@ -6585,6 +6605,12 @@ int coroutine_fn 
bdrv_co_invalidate_cache(BlockDriverState *bs, Error **errp)
  return -ENOMEDIUM;
  }
  
+/*

+ * No need to muck with permissions if bs is active.
+ * TODO: should activation be a separate function?
+ */
+assert(qemu_in_main_thread() || bdrv_is_active(bs));
+


I don’t understand this, really.  It looks to me like “if you don’t call 
this in the main thread, this better be a no-op”, i.e., you must never 
call this function in an I/O thread if you really want to use it.  I.e. 
what I’d classify as a GS function.


It sounds like this is just a special case for said test, and 
special-casing code for tests sounds like a bad idea.


Hanna


  QLIST_FOREACH(child, &bs->children, next) {
  bdrv_co_invalidate_cache(child->bs, &local_err);
  if (local_err) {