Re: [PATCH-for-9.0? 3/3] hw/block/nand: Fix out-of-bound access in NAND block buffer

2024-04-08 Thread Philippe Mathieu-Daudé

On 8/4/24 18:39, Richard Henderson wrote:

On 4/7/24 22:36, Philippe Mathieu-Daudé wrote:

nand_command() and nand_getio() don't check that @offset points
into the block, nor that the available data length (s->iolen) is
non-negative.

In order to fix:

- check the offset is in range in nand_blk_load_NAND_PAGE_SIZE(),
- do not set @iolen if blk_load() failed.


Do not set, or do not set to non-zero?  I had been wondering if the 


Oh, "do not set to non-zero", thanks :)


final assignment to s->iolen should go into nand_load_block as well...


For the next tag I'd rather keep it this way, which seems more
explicit to me.


Reviewed-by: Richard Henderson 


Thanks!




Re: [PATCH-for-9.0] hw/sd/sdhci: Discard excess of data written to Buffer Data Port register

2024-04-08 Thread Peter Maydell
On Mon, 8 Apr 2024 at 13:34, Peter Maydell  wrote:
>
> On Thu, 4 Apr 2024 at 09:56, Philippe Mathieu-Daudé  wrote:
> > diff --git a/hw/sd/sdhci.c b/hw/sd/sdhci.c
> > index c5e0bc018b..2dd88fa139 100644
> > --- a/hw/sd/sdhci.c
> > +++ b/hw/sd/sdhci.c
> > @@ -552,7 +552,7 @@ static void sdhci_write_block_to_card(SDHCIState *s)
> >   * register */
> >  static void sdhci_write_dataport(SDHCIState *s, uint32_t value, unsigned 
> > size)
> >  {
> > -unsigned i;
> > +unsigned i, available;
> >
> >  /* Check that there is free space left in a buffer */
> >  if (!(s->prnsts & SDHC_SPACE_AVAILABLE)) {
> > @@ -560,6 +560,14 @@ static void sdhci_write_dataport(SDHCIState *s, 
> > uint32_t value, unsigned size)
> >  return;
> >  }
> >
> > +available = s->buf_maxsz - s->data_count;
> > +if (size > available) {
> > +qemu_log_mask(LOG_GUEST_ERROR, "SDHC buffer data full (size: 
> > %"PRIu32")"
> > +   " discarding %u byte%s\n",
> > +   s->buf_maxsz, size - available,
> > +   size - available > 1 ? "s" : "");
> > +size = available; /* Excess data of the last write is ignored. */
> > +}
> >  for (i = 0; i < size; i++) {
> >  s->fifo_buffer[s->data_count] = value & 0xFF;
> >  s->data_count++;
>
> So, this will definitely avoid the buffer overrun, and the
> quoted text also suggests that we should not be doing the
> "if sdhci_write_block_to_card() writes the data then keep
> going with the rest of the bytes in the value for the start
> of the new block". (With this change we could move the
> "if (s->data_count >= (s->blksize & BLOCK_SIZE_MASK)) ..."
> out of the for() loop and down to the bottom of the function.)
>
> But I'm not sure it fixes the underlying cause of the problem,
> because the repro case isn't writing a multi-byte value, it's
> only writing a single byte.
>
> It looks from the code like if there's no space in the
> buffer then SDHC_SPACE_AVAILABLE should be clear in the
> present-status register, but that has somehow got out of sync.
> The way the repro from the fuzzer toggles the device in and
> out of DMA mode looks suspicious about how that out-of-sync
> situation might have come about.

It looks like the spec says that TRNMOD writes are supposed to
be ignored while the Command Inhibit (DAT) bit is set, which
would forbid this kind of "turn DMA off in the middle of a
DMA transfer" silliness:

diff --git a/hw/sd/sdhci.c b/hw/sd/sdhci.c
index c5e0bc018b0..9cd887b7f30 100644
--- a/hw/sd/sdhci.c
+++ b/hw/sd/sdhci.c
@@ -1208,6 +1208,12 @@ sdhci_write(void *opaque, hwaddr offset,
uint64_t val, unsigned size)
 if (!(s->capareg & R_SDHC_CAPAB_SDMA_MASK)) {
 value &= ~SDHC_TRNS_DMA;
 }
+
+/* TRNMOD writes are inhibited while Command Inhibit (DAT) is true */
+if (s->prnsts & SDHC_DATA_INHIBIT) {
+mask |= 0xffff;
+}
+
 MASKED_WRITE(s->trnmod, mask, value & SDHC_TRNMOD_MASK);
 MASKED_WRITE(s->cmdreg, mask >> 16, value >> 16);

And if you forbid that TRNMOD write that turns off the DMA
then the write to BDATA is no longer an overrun.

So another approach here would be to do that, plus an assert()
at the point of writing into fifo_buffer[] in case there are
other ways to get there with a bogus SPACE_AVAILABLE setting.
(That way any further bug is only "badly behaved code makes
QEMU assert" rather than an overrun.)

The spec is annoyingly vague about the exact conditions
when writing to the used-in-dma registers like TRNMOD
and the block-size register. Currently we use the
TRANSFERRING_DATA() condition to block writes to BLKSIZE,
and you might have thought TRNMOD would be the same, but
apparently not.

thanks
-- PMM



Re: [PATCH-for-9.0? 3/3] hw/block/nand: Fix out-of-bound access in NAND block buffer

2024-04-08 Thread Richard Henderson

On 4/7/24 22:36, Philippe Mathieu-Daudé wrote:

nand_command() and nand_getio() don't check that @offset points
into the block, nor that the available data length (s->iolen) is
non-negative.

In order to fix:

- check the offset is in range in nand_blk_load_NAND_PAGE_SIZE(),
- do not set @iolen if blk_load() failed.


Do not set, or do not set to non-zero?  I had been wondering if the final assignment to 
s->iolen should go into nand_load_block as well...



diff --git a/hw/block/nand.c b/hw/block/nand.c
index 3627c799b5..d90dc965a1 100644
--- a/hw/block/nand.c
+++ b/hw/block/nand.c
@@ -255,7 +255,9 @@ static int nand_load_block(NANDFlashState *s, int offset)
  {
  int iolen;
  
-s->blk_load(s, s->addr, offset);
+if (!s->blk_load(s, s->addr, offset)) {
+return 0;
+}
  
  iolen = (1 << s->page_shift) - offset;

  if (s->gnd) {
@@ -780,6 +782,10 @@ static bool glue(nand_blk_load_, 
NAND_PAGE_SIZE)(NANDFlashState *s,
  return false;
  }
  
+if (offset > NAND_PAGE_SIZE + OOB_SIZE) {
+return false;
+}
+


Reviewed-by: Richard Henderson 


r~




Re: [PATCH-for-9.0? 2/3] hw/block/nand: Have blk_load() return boolean indicating success

2024-04-08 Thread Richard Henderson

On 4/7/24 22:36, Philippe Mathieu-Daudé wrote:

Signed-off-by: Philippe Mathieu-Daudé
---
  hw/block/nand.c | 12 +---
  1 file changed, 9 insertions(+), 3 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH-for-9.0? 1/3] hw/block/nand: Factor nand_load_iolen() method out

2024-04-08 Thread Richard Henderson

On 4/7/24 22:36, Philippe Mathieu-Daudé wrote:

Signed-off-by: Philippe Mathieu-Daudé
---
  hw/block/nand.c | 32 +++-
  1 file changed, 19 insertions(+), 13 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling

2024-04-08 Thread Peter Xu
On Mon, Apr 08, 2024 at 04:07:20PM +0200, Jinpu Wang wrote:
> Hi Peter,

Jinpu,

Thanks for joining the discussion.

> 
> On Tue, Apr 2, 2024 at 11:24 PM Peter Xu  wrote:
> >
> > On Mon, Apr 01, 2024 at 11:26:25PM +0200, Yu Zhang wrote:
> > > Hello Peter und Zhjian,
> > >
> > > Thank you so much for letting me know about this. I'm also a bit 
> > > surprised at
> > > the plan for deprecating the RDMA migration subsystem.
> >
> > It's not too late, since it looks like we do have users not yet notified
> > from this, we'll redo the deprecation procedure even if it'll be the final
> > plan, and it'll be 2 releases after this.
> >
> > >
> > > > IMHO it's more important to know whether there are still users and 
> > > > whether
> > > > they would still like to see it around.
> > >
> > > > I admit RDMA migration was lacking testing (unit/CI tests), which led
> > > > to a few obvious bugs being noticed too late.
> > >
> > > Yes, we are a user of this subsystem. I was unaware of the lack of test 
> > > coverage
> > > for this part. As soon as 8.2 was released, I saw that many of the
> > > migration test
> > > cases failed and came to realize that there might be a bug between 8.1
> > > and 8.2, but
> > > was unable to confirm and report it quickly to you.
> > >
> > > The maintenance of this part could be too costly or difficult from
> > > your point of view.
> >
> > It may or may not be too costly, it's just that we need real users of RDMA
> > taking some care of it.  Having it broken easily for >1 releases definitely
> > is a sign of lack of users.  It is an implication to the community that we
> > should consider dropping some features so that we can get the best use of
> > the community resources for the things that may have a broader audience.
> >
> > One thing majorly missing is a RDMA tester to guard all the merges to not
> > break RDMA paths, hopefully in CI.  That should not rely on RDMA hardwares
> > but just to sanity check the migration+rdma code running all fine.  RDMA
> > taught us the lesson so we're requesting CI coverage for all other new
> > features that will be merged at least for migration subsystem, so that we
> > plan to not merge anything that is not covered by CI unless extremely
> > necessary in the future.
> >
> > For sure CI is not the only missing part, but I'd say we should start with
> > it, then someone should also take care of the code even if only in
> > maintenance mode (no new feature to add on top).
> >
> > >
> > > My concern is, this plan will force a few QEMU users (not sure how
> > > many) like us
> > > either to stick to the RDMA migration by using an increasingly older
> > > version of QEMU,
> > > or to abandon the currently used RDMA migration.
> >
> > RDMA doesn't get new features anyway, if there's specific use case for RDMA
> > migrations, would it work if such a scenario uses the old binary?  Is it
> > possible to switch to the TCP protocol with some good NICs?
> We have used rdma migration with HCA from Nvidia for years, our
> experience is RDMA migration works better than tcp (over ipoib).

Please bear with me, as I know little about rdma stuff.

I'm actually pretty confused (and have been since a long time ago) about why
we need to operate with rdma contexts when ipoib seems to provide all the
tcp layers.  I mean, can it work with the current "tcp:" protocol with ipoib
even if there's rdma/ib hardware underneath?  Is it because of performance
improvements that we must use a separate path compared to the generic
"tcp:" protocol here?

> 
> Switching back to TCP will lead us to the old problems which was
> solved by RDMA migration.

Can you elaborate on the problems, and why tcp won't work in this case?  They
may not be directly relevant to the issue we're discussing, but I'm happy
to learn more.

What NICs were you testing with before?  Was the test carried out with
modern ones (50Gbps-200Gbps NICs), or was it done when such hardware
was not common?

Per my recent knowledge of the new Intel hardware, at least the parts that
support QPL, it's easy to achieve 50Gbps+ on a single core.

https://lore.kernel.org/r/ph7pr11mb5941a91ac1e514bcc32896a6a3...@ph7pr11mb5941.namprd11.prod.outlook.com

Quote from Yuan:

  Yes, I use iperf3 to check the bandwidth for one core, the bandwidth is 60Gbps.
  [ ID] Interval   Transfer Bitrate Retr  Cwnd
  [  5]   0.00-1.00   sec  7.00 GBytes  60.1 Gbits/sec0   2.87 MBytes
  [  5]   1.00-2.00   sec  7.05 GBytes  60.6 Gbits/sec0   2.87 Mbytes

  And in the live migration test, a multifd thread's CPU utilization is almost 100%

It boils down to what old problems were there with tcp first, though.

> 
> >
> > Per our best knowledge, RDMA users are rare, and please let anyone know if
> > you are aware of such users.  IIUC the major reason why RDMA stopped being
> > the trend is because the network is not like ten years ago; I don't think I
> > have good knowledge in RDMA at all nor network, but my 

[PATCH v5 1/2] nbd/server: do not poll within a coroutine context

2024-04-08 Thread Eric Blake
From: Zhu Yangyang 

Coroutines are not supposed to block. Instead, they should yield.

The client performs TLS upgrade outside of an AIOContext, during
synchronous handshake; this still requires g_main_loop.  But the
server responds to TLS upgrade inside a coroutine, so a nested
g_main_loop is wrong.  Since the two callbacks no longer share more
than the setting of data.complete and data.error, it's just as easy to
use static helpers instead of trying to share a common code path.  It
is also possible to add assertions that no other code is interfering
with the eventual path to qio reaching the callback, whether or not it
required a yield or main loop.

Fixes: f95910f ("nbd: implement TLS support in the protocol negotiation")
Signed-off-by: Zhu Yangyang 
[eblake: move callbacks to their use point, add assertions]
Signed-off-by: Eric Blake 
---
 nbd/nbd-internal.h | 10 --
 nbd/client.c   | 28 
 nbd/common.c   | 11 ---
 nbd/server.c   | 28 +++-
 4 files changed, 47 insertions(+), 30 deletions(-)

diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
index dfa02f77ee4..91895106a95 100644
--- a/nbd/nbd-internal.h
+++ b/nbd/nbd-internal.h
@@ -72,16 +72,6 @@ static inline int nbd_write(QIOChannel *ioc, const void 
*buffer, size_t size,
 return qio_channel_write_all(ioc, buffer, size, errp) < 0 ? -EIO : 0;
 }

-struct NBDTLSHandshakeData {
-GMainLoop *loop;
-bool complete;
-Error *error;
-};
-
-
-void nbd_tls_handshake(QIOTask *task,
-   void *opaque);
-
 int nbd_drop(QIOChannel *ioc, size_t size, Error **errp);

 #endif
diff --git a/nbd/client.c b/nbd/client.c
index 29ffc609a4b..c89c7504673 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -596,13 +596,31 @@ static int nbd_request_simple_option(QIOChannel *ioc, int 
opt, bool strict,
 return 1;
 }

+/* Callback to learn when QIO TLS upgrade is complete */
+struct NBDTLSClientHandshakeData {
+bool complete;
+Error *error;
+GMainLoop *loop;
+};
+
+static void nbd_client_tls_handshake(QIOTask *task, void *opaque)
+{
+struct NBDTLSClientHandshakeData *data = opaque;
+
+qio_task_propagate_error(task, &data->error);
+data->complete = true;
+if (data->loop) {
+g_main_loop_quit(data->loop);
+}
+}
+
 static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
 QCryptoTLSCreds *tlscreds,
 const char *hostname, Error **errp)
 {
 int ret;
 QIOChannelTLS *tioc;
-struct NBDTLSHandshakeData data = { 0 };
+struct NBDTLSClientHandshakeData data = { 0 };

 ret = nbd_request_simple_option(ioc, NBD_OPT_STARTTLS, true, errp);
 if (ret <= 0) {
@@ -619,18 +637,20 @@ static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
 return NULL;
 }
 qio_channel_set_name(QIO_CHANNEL(tioc), "nbd-client-tls");
-data.loop = g_main_loop_new(g_main_context_default(), FALSE);
 trace_nbd_receive_starttls_tls_handshake();
 qio_channel_tls_handshake(tioc,
-  nbd_tls_handshake,
+  nbd_client_tls_handshake,
  &data,
   NULL,
   NULL);

 if (!data.complete) {
+data.loop = g_main_loop_new(g_main_context_default(), FALSE);
 g_main_loop_run(data.loop);
+assert(data.complete);
+g_main_loop_unref(data.loop);
 }
-g_main_loop_unref(data.loop);
+
 if (data.error) {
 error_propagate(errp, data.error);
 object_unref(OBJECT(tioc));
diff --git a/nbd/common.c b/nbd/common.c
index 3247c1d618a..589a748cfe6 100644
--- a/nbd/common.c
+++ b/nbd/common.c
@@ -47,17 +47,6 @@ int nbd_drop(QIOChannel *ioc, size_t size, Error **errp)
 }


-void nbd_tls_handshake(QIOTask *task,
-   void *opaque)
-{
-struct NBDTLSHandshakeData *data = opaque;
-
-qio_task_propagate_error(task, &data->error);
-data->complete = true;
-g_main_loop_quit(data->loop);
-}
-
-
 const char *nbd_opt_lookup(uint32_t opt)
 {
 switch (opt) {
diff --git a/nbd/server.c b/nbd/server.c
index c3484cc1ebc..98ae0e16326 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -748,6 +748,23 @@ static int nbd_negotiate_handle_info(NBDClient *client, 
Error **errp)
 return rc;
 }

+/* Callback to learn when QIO TLS upgrade is complete */
+struct NBDTLSServerHandshakeData {
+bool complete;
+Error *error;
+Coroutine *co;
+};
+
+static void nbd_server_tls_handshake(QIOTask *task, void *opaque)
+{
+struct NBDTLSServerHandshakeData *data = opaque;
+
+qio_task_propagate_error(task, &data->error);
+data->complete = true;
+if (!qemu_coroutine_entered(data->co)) {
+aio_co_wake(data->co);
+}
+}

 /* Handle NBD_OPT_STARTTLS. Return NULL to drop connection, or else the
  * new channel for all further (now-encrypted) communication. */
@@ -756,7 +773,7 

[PATCH v5 2/2] nbd/server: Mark negotiation functions as coroutine_fn

2024-04-08 Thread Eric Blake
nbd_negotiate() is already marked coroutine_fn.  And given the fix in
the previous patch to have nbd_negotiate_handle_starttls not create
and wait on a g_main_loop (as that would violate coroutine
constraints), it is worth marking the rest of the related static
functions reachable only during option negotiation as also being
coroutine_fn.

Suggested-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Eric Blake 
---
 nbd/server.c | 102 +--
 1 file changed, 59 insertions(+), 43 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index 98ae0e16326..1857fba51c1 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -195,8 +195,9 @@ static inline void set_be_option_rep(NBDOptionReply *rep, 
uint32_t option,

 /* Send a reply header, including length, but no payload.
  * Return -errno on error, 0 on success. */
-static int nbd_negotiate_send_rep_len(NBDClient *client, uint32_t type,
-  uint32_t len, Error **errp)
+static coroutine_fn int
+nbd_negotiate_send_rep_len(NBDClient *client, uint32_t type,
+   uint32_t len, Error **errp)
 {
 NBDOptionReply rep;

@@ -211,15 +212,15 @@ static int nbd_negotiate_send_rep_len(NBDClient *client, 
uint32_t type,

 /* Send a reply header with default 0 length.
  * Return -errno on error, 0 on success. */
-static int nbd_negotiate_send_rep(NBDClient *client, uint32_t type,
-  Error **errp)
+static coroutine_fn int
+nbd_negotiate_send_rep(NBDClient *client, uint32_t type, Error **errp)
 {
 return nbd_negotiate_send_rep_len(client, type, 0, errp);
 }

 /* Send an error reply.
  * Return -errno on error, 0 on success. */
-static int G_GNUC_PRINTF(4, 0)
+static coroutine_fn int G_GNUC_PRINTF(4, 0)
 nbd_negotiate_send_rep_verr(NBDClient *client, uint32_t type,
 Error **errp, const char *fmt, va_list va)
 {
@@ -259,7 +260,7 @@ nbd_sanitize_name(const char *name)

 /* Send an error reply.
  * Return -errno on error, 0 on success. */
-static int G_GNUC_PRINTF(4, 5)
+static coroutine_fn int G_GNUC_PRINTF(4, 5)
 nbd_negotiate_send_rep_err(NBDClient *client, uint32_t type,
Error **errp, const char *fmt, ...)
 {
@@ -275,7 +276,7 @@ nbd_negotiate_send_rep_err(NBDClient *client, uint32_t type,
 /* Drop remainder of the current option, and send a reply with the
  * given error type and message. Return -errno on read or write
  * failure; or 0 if connection is still live. */
-static int G_GNUC_PRINTF(4, 0)
+static coroutine_fn int G_GNUC_PRINTF(4, 0)
 nbd_opt_vdrop(NBDClient *client, uint32_t type, Error **errp,
   const char *fmt, va_list va)
 {
@@ -288,7 +289,7 @@ nbd_opt_vdrop(NBDClient *client, uint32_t type, Error 
**errp,
 return ret;
 }

-static int G_GNUC_PRINTF(4, 5)
+static coroutine_fn int G_GNUC_PRINTF(4, 5)
 nbd_opt_drop(NBDClient *client, uint32_t type, Error **errp,
  const char *fmt, ...)
 {
@@ -302,7 +303,7 @@ nbd_opt_drop(NBDClient *client, uint32_t type, Error **errp,
 return ret;
 }

-static int G_GNUC_PRINTF(3, 4)
+static coroutine_fn int G_GNUC_PRINTF(3, 4)
 nbd_opt_invalid(NBDClient *client, Error **errp, const char *fmt, ...)
 {
 int ret;
@@ -319,8 +320,9 @@ nbd_opt_invalid(NBDClient *client, Error **errp, const char 
*fmt, ...)
  * If @check_nul, require that no NUL bytes appear in buffer.
  * Return -errno on I/O error, 0 if option was completely handled by
  * sending a reply about inconsistent lengths, or 1 on success. */
-static int nbd_opt_read(NBDClient *client, void *buffer, size_t size,
-bool check_nul, Error **errp)
+static coroutine_fn int
+nbd_opt_read(NBDClient *client, void *buffer, size_t size,
+ bool check_nul, Error **errp)
 {
 if (size > client->optlen) {
 return nbd_opt_invalid(client, errp,
@@ -343,7 +345,8 @@ static int nbd_opt_read(NBDClient *client, void *buffer, 
size_t size,
 /* Drop size bytes from the unparsed payload of the current option.
  * Return -errno on I/O error, 0 if option was completely handled by
  * sending a reply about inconsistent lengths, or 1 on success. */
-static int nbd_opt_skip(NBDClient *client, size_t size, Error **errp)
+static coroutine_fn int
+nbd_opt_skip(NBDClient *client, size_t size, Error **errp)
 {
 if (size > client->optlen) {
 return nbd_opt_invalid(client, errp,
@@ -366,8 +369,9 @@ static int nbd_opt_skip(NBDClient *client, size_t size, 
Error **errp)
  * Return -errno on I/O error, 0 if option was completely handled by
  * sending a reply about inconsistent lengths, or 1 on success.
  */
-static int nbd_opt_read_name(NBDClient *client, char **name, uint32_t *length,
- Error **errp)
+static coroutine_fn int
+nbd_opt_read_name(NBDClient *client, char **name, uint32_t *length,
+  Error **errp)
 {
 int ret;
 uint32_t len;
@@ -402,8 +406,8 @@ static int 

[PATCH v5 for-9.0? 0/2] Fix NBD TLS poll in coroutine

2024-04-08 Thread Eric Blake
v4 was here:
https://lists.gnu.org/archive/html/qemu-devel/2024-04/msg00624.html

Since then: add some asserts [Vladimir], add second patch with more
coroutine_fn annotations [Vladimir]

Eric Blake (1):
  nbd/server: Mark negotiation functions as coroutine_fn

Zhu Yangyang (1):
  nbd/server: do not poll within a coroutine context

 nbd/nbd-internal.h |  10 
 nbd/client.c   |  28 --
 nbd/common.c   |  11 
 nbd/server.c   | 128 -
 4 files changed, 105 insertions(+), 72 deletions(-)

-- 
2.44.0




Re: [PATCH-for-9.0? 0/3] hw/block/nand: Fix out-of-bound access in NAND block buffer

2024-04-08 Thread Mauro Matteo Cascella
On Mon, Apr 8, 2024 at 10:36 AM Philippe Mathieu-Daudé
 wrote:
>
> Fix for https://gitlab.com/qemu-project/qemu/-/issues/1446

Does hw/block/nand meet the security requirements for CVE assignment?

=> https://www.qemu.org/docs/master/system/security.html

> Philippe Mathieu-Daudé (3):
>   hw/block/nand: Factor nand_load_iolen() method out
>   hw/block/nand: Have blk_load() return boolean indicating success
>   hw/block/nand: Fix out-of-bound access in NAND block buffer
>
>  hw/block/nand.c | 50 +
>  1 file changed, 34 insertions(+), 16 deletions(-)
>
> --
> 2.41.0
>

-- 
Mauro Matteo Cascella
Red Hat Product Security
PGP-Key ID: BB3410B0




[RFC v3 5/6] vhost,vhost-user: Add VIRTIO_F_IN_ORDER to vhost feature bits

2024-04-08 Thread Jonah Palmer
Add support for the VIRTIO_F_IN_ORDER feature across a variety of vhost
devices.

The inclusion of VIRTIO_F_IN_ORDER in the feature bits arrays for these
devices ensures that the backend is capable of offering and providing
support for this feature, and that it can be disabled if the backend
does not support it.

Acked-by: Eugenio Pérez 
Signed-off-by: Jonah Palmer 
---
 hw/block/vhost-user-blk.c| 1 +
 hw/net/vhost_net.c   | 2 ++
 hw/scsi/vhost-scsi.c | 1 +
 hw/scsi/vhost-user-scsi.c| 1 +
 hw/virtio/vhost-user-fs.c| 1 +
 hw/virtio/vhost-user-vsock.c | 1 +
 net/vhost-vdpa.c | 1 +
 7 files changed, 8 insertions(+)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 6a856ad51a..d176ed857e 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -51,6 +51,7 @@ static const int user_feature_bits[] = {
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_IOMMU_PLATFORM,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_IN_ORDER,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index fd1a93701a..eb0b1c06e5 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -48,6 +48,7 @@ static const int kernel_feature_bits[] = {
 VIRTIO_F_IOMMU_PLATFORM,
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_IN_ORDER,
 VIRTIO_NET_F_HASH_REPORT,
 VHOST_INVALID_FEATURE_BIT
 };
@@ -76,6 +77,7 @@ static const int user_feature_bits[] = {
 VIRTIO_F_IOMMU_PLATFORM,
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_IN_ORDER,
 VIRTIO_NET_F_RSS,
 VIRTIO_NET_F_HASH_REPORT,
 VIRTIO_NET_F_GUEST_USO4,
diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index ae26bc19a4..40e7630191 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -38,6 +38,7 @@ static const int kernel_feature_bits[] = {
 VIRTIO_RING_F_EVENT_IDX,
 VIRTIO_SCSI_F_HOTPLUG,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_IN_ORDER,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index a63b1f4948..1d59951ab7 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -36,6 +36,7 @@ static const int user_feature_bits[] = {
 VIRTIO_RING_F_EVENT_IDX,
 VIRTIO_SCSI_F_HOTPLUG,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_IN_ORDER,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index cca2cd41be..9243dbb128 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -33,6 +33,7 @@ static const int user_feature_bits[] = {
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_IOMMU_PLATFORM,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_IN_ORDER,
 
 VHOST_INVALID_FEATURE_BIT
 };
diff --git a/hw/virtio/vhost-user-vsock.c b/hw/virtio/vhost-user-vsock.c
index 9431b9792c..cc7e4e47b4 100644
--- a/hw/virtio/vhost-user-vsock.c
+++ b/hw/virtio/vhost-user-vsock.c
@@ -21,6 +21,7 @@ static const int user_feature_bits[] = {
 VIRTIO_RING_F_INDIRECT_DESC,
 VIRTIO_RING_F_EVENT_IDX,
 VIRTIO_F_NOTIFY_ON_EMPTY,
+VIRTIO_F_IN_ORDER,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 85e73dd6a7..ed3185acfa 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -62,6 +62,7 @@ const int vdpa_feature_bits[] = {
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_RING_RESET,
 VIRTIO_F_VERSION_1,
+VIRTIO_F_IN_ORDER,
 VIRTIO_NET_F_CSUM,
 VIRTIO_NET_F_CTRL_GUEST_OFFLOADS,
 VIRTIO_NET_F_CTRL_MAC_ADDR,
-- 
2.39.3




[RFC v3 4/6] virtio: virtqueue_ordered_flush - VIRTIO_F_IN_ORDER support

2024-04-08 Thread Jonah Palmer
Add VIRTIO_F_IN_ORDER feature support for virtqueue_flush operations.

The goal of the virtqueue_flush operation when the VIRTIO_F_IN_ORDER
feature has been negotiated is to write elements to the used/descriptor
ring in-order and then update used_idx.

The function iterates through the VirtQueueElement used_elems array
in-order starting at vq->used_idx. If the element is valid (filled), the
element is written to the used/descriptor ring. This process continues
until we find an invalid (not filled) element.

If any elements were written, the used_idx is updated.
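The walk described above can be modeled in isolation (toy types and names; the real code also writes the used/descriptor ring entries and handles the packed layout):

```c
#include <assert.h>
#include <stdbool.h>

#define QSIZE 8

typedef struct {
    unsigned ndescs;  /* descriptors consumed by this element */
    bool filled;      /* set by the fill step when the element completes */
} UsedElem;

typedef struct {
    UsedElem used_elems[QSIZE];
    unsigned used_idx;  /* next in-order element expected to complete */
} VQ;

/* Flush consecutive filled elements starting at used_idx; stop at the
 * first unfilled one, then publish the new used_idx once. */
static unsigned ordered_flush(VQ *vq)
{
    unsigned i = vq->used_idx, flushed = 0;

    while (vq->used_elems[i].filled) {
        unsigned n = vq->used_elems[i].ndescs;
        vq->used_elems[i].filled = false;
        /* real code: write element i to the used/descriptor ring here */
        flushed += n;
        i = (i + n) % QSIZE;   /* advance with ring wraparound */
    }
    if (flushed) {
        vq->used_idx = i;      /* single used_idx update after the batch */
    }
    return flushed;
}
```

The single used_idx update at the end mirrors the batching described in the commit message: out-of-order completions stay parked until the in-order head catches up.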

Signed-off-by: Jonah Palmer 
---
 hw/virtio/virtio.c | 75 +-
 1 file changed, 74 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 0730f26f74..13451d0cae 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -997,6 +997,77 @@ static void virtqueue_packed_flush(VirtQueue *vq, unsigned 
int count)
 }
 }
 
+static void virtqueue_ordered_flush(VirtQueue *vq)
+{
+unsigned int i = vq->used_idx;
+unsigned int ndescs = 0;
+uint16_t old = vq->used_idx;
+bool packed;
+VRingUsedElem uelem;
+
+packed = virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED);
+
+if (packed) {
+if (unlikely(!vq->vring.desc)) {
+return;
+}
+} else if (unlikely(!vq->vring.used)) {
+return;
+}
+
+/* First expected in-order element isn't ready, nothing to do */
+if (!vq->used_elems[i].filled) {
+return;
+}
+
+/* Write first expected in-order element to used ring (split VQs) */
+if (!packed) {
+uelem.id = vq->used_elems[i].index;
+uelem.len = vq->used_elems[i].len;
+vring_used_write(vq, &uelem, i);
+}
+
+ndescs += vq->used_elems[i].ndescs;
+i += ndescs;
+if (i >= vq->vring.num) {
+i -= vq->vring.num;
+}
+
+/* Search for more filled elements in-order */
+while (vq->used_elems[i].filled) {
+if (packed) {
+virtqueue_packed_fill_desc(vq, &vq->used_elems[i], ndescs, false);
+} else {
+uelem.id = vq->used_elems[i].index;
+uelem.len = vq->used_elems[i].len;
+vring_used_write(vq, &uelem, i);
+}
+
+vq->used_elems[i].filled = false;
+ndescs += vq->used_elems[i].ndescs;
+i += ndescs;
+if (i >= vq->vring.num) {
+i -= vq->vring.num;
+}
+}
+
+if (packed) {
+virtqueue_packed_fill_desc(vq, &vq->used_elems[vq->used_idx], 0, true);
+vq->used_idx += ndescs;
+if (vq->used_idx >= vq->vring.num) {
+vq->used_idx -= vq->vring.num;
+vq->used_wrap_counter ^= 1;
+vq->signalled_used_valid = false;
+}
+} else {
+vring_used_idx_set(vq, i);
+if (unlikely((int16_t)(i - vq->signalled_used) < (uint16_t)(i - old))) {
+vq->signalled_used_valid = false;
+}
+}
+vq->inuse -= ndescs;
+}
+
 void virtqueue_flush(VirtQueue *vq, unsigned int count)
 {
 if (virtio_device_disabled(vq->vdev)) {
@@ -1004,7 +1075,9 @@ void virtqueue_flush(VirtQueue *vq, unsigned int count)
 return;
 }
 
-if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
+if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_IN_ORDER)) {
+virtqueue_ordered_flush(vq);
+} else if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
 virtqueue_packed_flush(vq, count);
 } else {
 virtqueue_split_flush(vq, count);
-- 
2.39.3




[RFC v3 6/6] virtio: Add VIRTIO_F_IN_ORDER property definition

2024-04-08 Thread Jonah Palmer
Extend the virtio device property definitions to include the
VIRTIO_F_IN_ORDER feature.

The default state of this feature is disabled, allowing it to be
explicitly enabled where it's supported.

Acked-by: Eugenio Pérez 
Signed-off-by: Jonah Palmer 
---
 include/hw/virtio/virtio.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 9034719f1d..43ea738e65 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -385,7 +385,9 @@ typedef struct VirtIORNGConf VirtIORNGConf;
 DEFINE_PROP_BIT64("packed", _state, _field, \
   VIRTIO_F_RING_PACKED, false), \
 DEFINE_PROP_BIT64("queue_reset", _state, _field, \
-  VIRTIO_F_RING_RESET, true)
+  VIRTIO_F_RING_RESET, true), \
+DEFINE_PROP_BIT64("in_order", _state, _field, \
+  VIRTIO_F_IN_ORDER, false)
 
 hwaddr virtio_queue_get_desc_addr(VirtIODevice *vdev, int n);
 bool virtio_queue_enabled_legacy(VirtIODevice *vdev, int n);
-- 
2.39.3




[RFC v3 2/6] virtio: virtqueue_pop - VIRTIO_F_IN_ORDER support

2024-04-08 Thread Jonah Palmer
Add VIRTIO_F_IN_ORDER feature support in virtqueue_split_pop and
virtqueue_packed_pop.

VirtQueueElements popped from the available/descriptor ring are added to
the VirtQueue's used_elems array in-order, in the same fashion as they
would be added to the used and descriptor rings, respectively.

This will allow us to keep track of the current order, what elements
have been written, as well as an element's essential data after being
processed.

Signed-off-by: Jonah Palmer 
---
 hw/virtio/virtio.c | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index fb6b4ccd83..3ad58100b2 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -1497,7 +1497,7 @@ static void *virtqueue_alloc_element(size_t sz, unsigned 
out_num, unsigned in_nu
 
 static void *virtqueue_split_pop(VirtQueue *vq, size_t sz)
 {
-unsigned int i, head, max;
+unsigned int i, j, head, max;
 VRingMemoryRegionCaches *caches;
 MemoryRegionCache indirect_desc_cache;
 MemoryRegionCache *desc_cache;
@@ -1530,6 +1530,8 @@ static void *virtqueue_split_pop(VirtQueue *vq, size_t sz)
 goto done;
 }
 
+j = vq->last_avail_idx;
+
 if (!virtqueue_get_head(vq, vq->last_avail_idx++, &head)) {
 goto done;
 }
@@ -1621,6 +1623,12 @@ static void *virtqueue_split_pop(VirtQueue *vq, size_t 
sz)
 elem->in_sg[i] = iov[out_num + i];
 }
 
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
+vq->used_elems[j].index = elem->index;
+vq->used_elems[j].len = elem->len;
+vq->used_elems[j].ndescs = elem->ndescs;
+}
+
 vq->inuse++;
 
 trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
@@ -1749,6 +1757,13 @@ static void *virtqueue_packed_pop(VirtQueue *vq, size_t 
sz)
 
 elem->index = id;
 elem->ndescs = (desc_cache == &indirect_desc_cache) ? 1 : elem_entries;
+
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
+vq->used_elems[vq->last_avail_idx].index = elem->index;
+vq->used_elems[vq->last_avail_idx].len = elem->len;
+vq->used_elems[vq->last_avail_idx].ndescs = elem->ndescs;
+}
+
 vq->last_avail_idx += elem->ndescs;
 vq->inuse += elem->ndescs;
 
-- 
2.39.3




[RFC v3 3/6] virtio: virtqueue_ordered_fill - VIRTIO_F_IN_ORDER support

2024-04-08 Thread Jonah Palmer
Add VIRTIO_F_IN_ORDER feature support for virtqueue_fill operations.

The goal of the virtqueue_fill operation, when the VIRTIO_F_IN_ORDER
feature has been negotiated, is to search for the now-used element,
set its length, and mark the element as filled in the VirtQueue's
used_elems array.

Marking the element as filled indicates that it is ready to be
flushed, so long as the element is in-order.

Signed-off-by: Jonah Palmer 
---
 hw/virtio/virtio.c | 26 +-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 3ad58100b2..0730f26f74 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -872,6 +872,28 @@ static void virtqueue_packed_fill(VirtQueue *vq, const 
VirtQueueElement *elem,
 vq->used_elems[idx].ndescs = elem->ndescs;
 }
 
+static void virtqueue_ordered_fill(VirtQueue *vq, const VirtQueueElement *elem,
+   unsigned int len)
+{
+unsigned int i = vq->used_idx;
+
+/* Search for element in vq->used_elems */
+while (i != vq->last_avail_idx) {
+/* Found element, set length and mark as filled */
+if (vq->used_elems[i].index == elem->index) {
+vq->used_elems[i].len = len;
+vq->used_elems[i].filled = true;
+break;
+}
+
+i += vq->used_elems[i].ndescs;
+
+if (i >= vq->vring.num) {
+i -= vq->vring.num;
+}
+}
+}
+
 static void virtqueue_packed_fill_desc(VirtQueue *vq,
const VirtQueueElement *elem,
unsigned int idx,
@@ -922,7 +944,9 @@ void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem,
 return;
 }
 
-if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
+if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_IN_ORDER)) {
+virtqueue_ordered_fill(vq, elem, len);
+} else if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
 virtqueue_packed_fill(vq, elem, len, idx);
 } else {
 virtqueue_split_fill(vq, elem, len, idx);
-- 
2.39.3




[RFC v3 1/6] virtio: Add bool to VirtQueueElement

2024-04-08 Thread Jonah Palmer
Add the boolean 'filled' member to the VirtQueueElement structure. The
use of this boolean will signify if the element has been written to the
used / descriptor ring or not. This boolean is used to support the
VIRTIO_F_IN_ORDER feature.

Signed-off-by: Jonah Palmer 
---
 include/hw/virtio/virtio.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index b3c74a1bca..9034719f1d 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -71,6 +71,7 @@ typedef struct VirtQueueElement
 unsigned int ndescs;
 unsigned int out_num;
 unsigned int in_num;
+bool filled;
 hwaddr *in_addr;
 hwaddr *out_addr;
 struct iovec *in_sg;
-- 
2.39.3




[RFC v3 0/6] virtio,vhost: Add VIRTIO_F_IN_ORDER support

2024-04-08 Thread Jonah Palmer
The goal of these patches is to add support to a variety of virtio and
vhost devices for the VIRTIO_F_IN_ORDER transport feature. This feature
indicates that all buffers are used by the device in the same order in
which they were made available by the driver.

These patches attempt to implement a generalized, non-device-specific
solution to support this feature.

The core feature behind this solution is a buffer mechanism in the form
of a VirtQueue's used_elems VirtQueueElement array. This allows devices
that always use buffers in-order by default to have a minimal overhead
impact. Devices that may not always use buffers in-order will likely
experience a performance hit. How large that performance hit is will
depend on how frequently elements are completed out of order.

A VirtQueue whose device uses this feature will use its used_elems
VirtQueueElement array to hold used VirtQueueElements. The index at
which used elements are placed in used_elems is the same index on the
used/descriptor ring that would satisfy the in-order requirement. In
other words, used elements are placed in their in-order locations on
used_elems and are only written to the used/descriptor ring once the
elements on used_elems are able to continue their expected order.

To differentiate between a "used" and "unused" element on the used_elems
array (a "used" element being an element that has returned from
processing and an "unused" element being an element that has not yet
been processed), we added a boolean 'filled' member to the
VirtQueueElement struct. This flag is set to true when the element comes
back from processing (virtqueue_ordered_fill) and then set back to false
once it's been written to the used/descriptor ring
(virtqueue_ordered_flush).

---
v3: Add elements to used_elems during virtqueue_split/packed_pop
Replace current_seq_idx usage with vq->last_avail_idx
Remove used_seq_idx, leverage used_idx and last_avail_idx for
searching used_elems
Remove seq_idx in VirtQueueElement
Add boolean to VirtQueueElement to signal element status
Add virtqueue_ordered_fill/flush functions for ordering

v2: Use a VirtQueue's used_elems array as a buffer mechanism

v1: Implement custom GLib GHashTable as a buffer mechanism

Jonah Palmer (6):
  virtio: Add bool to VirtQueueElement
  virtio: virtqueue_pop - VIRTIO_F_IN_ORDER support
  virtio: virtqueue_ordered_fill - VIRTIO_F_IN_ORDER support
  virtio: virtqueue_ordered_flush - VIRTIO_F_IN_ORDER support
  vhost,vhost-user: Add VIRTIO_F_IN_ORDER to vhost feature bits
  virtio: Add VIRTIO_F_IN_ORDER property definition

 hw/block/vhost-user-blk.c|   1 +
 hw/net/vhost_net.c   |   2 +
 hw/scsi/vhost-scsi.c |   1 +
 hw/scsi/vhost-user-scsi.c|   1 +
 hw/virtio/vhost-user-fs.c|   1 +
 hw/virtio/vhost-user-vsock.c |   1 +
 hw/virtio/virtio.c   | 118 ++-
 include/hw/virtio/virtio.h   |   5 +-
 net/vhost-vdpa.c |   1 +
 9 files changed, 127 insertions(+), 4 deletions(-)

-- 
2.39.3




Re: [PATCH-for-9.1 2/2] hw/sd/sdcard: Assert @data_offset is in range

2024-04-08 Thread Peter Maydell
On Mon, 8 Apr 2024 at 15:18, Philippe Mathieu-Daudé  wrote:
>
> Prevent out-of-bounds accesses with assertions.
>
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  hw/sd/sd.c | 14 ++
>  1 file changed, 14 insertions(+)
>
> diff --git a/hw/sd/sd.c b/hw/sd/sd.c
> index 16d8d52a78..c081211582 100644
> --- a/hw/sd/sd.c
> +++ b/hw/sd/sd.c
> @@ -1875,6 +1875,7 @@ void sd_write_byte(SDState *sd, uint8_t value)
>  sd->current_cmd, value);
>  switch (sd->current_cmd) {
>  case 24:  /* CMD24:  WRITE_SINGLE_BLOCK */
> +assert(sd->data_offset < sizeof(sd->data));
>  sd->data[sd->data_offset ++] = value;

Abstract out functions

static void append_sd_data_byte(SDState *sd, uint8_t value)
{
    assert(sd->data_offset < sizeof(sd->data));
    sd->data[sd->data_offset++] = value;
}

static uint8_t read_sd_data_byte(SDState *sd)
{
    assert(sd->data_offset < sizeof(sd->data));
    return sd->data[sd->data_offset++];
}

(etc for read_sd_status_byte() etc) ?

(sadly I don't think there's a verb that is the equivalent
of "prepend/append" but for removing elements.)


>  case 22:  /* ACMD22: SEND_NUM_WR_BLOCKS */
> +assert(sd->data_offset < sizeof(sd->sd_status));
>  ret = sd->data[sd->data_offset ++];

Checking against the size of a different array from
the one we're reading from.

thanks
-- PMM



[RFC PATCH-for-9.0? 1/2] hw/sd/sdcard: Avoid OOB in sd_read_byte() during unexpected CMD switch

2024-04-08 Thread Philippe Mathieu-Daudé
For multi-byte commands, our implementation uses the @data_start
and @data_offset fields to track byte accesses. We initialize the
command start/offset in the buffer once. A malicious guest might
abuse this by switching commands while staying in the 'transfer'
state and changing the command buffer size, so that our
implementation accesses out of the buffer boundary. For example,
CMD17 (READ_SINGLE_BLOCK) allows reading up to 512 bytes, and CMD13
(SEND_STATUS) up to 64 bytes. By switching from CMD17 to CMD13 (see
reproducer below), bytes [64-511] are out of the 'status' buffer.

Our implementation returns the R0 status code for unexpected
commands. Such an in-transaction command switch is unexpected, so
this is a good place to reset the start/offset fields to avoid
malicious accesses.

Can be reproduced running:

  $ export UBSAN_OPTIONS=print_stacktrace=1:halt_on_error=1
  $ cat << EOF | qemu-system-i386 \
 -display none -nographic \
 -machine accel=qtest -m 512M \
 -nodefaults \
 -device sdhci-pci,sd-spec-version=3 \
 -device sd-card,drive=mydrive \
 -drive if=none,index=0,file=null-co://,format=raw,id=mydrive \
 -qtest stdio -trace sd\* -trace -sdbus_read
  outl 0xcf8 0x80001010
  outl 0xcfc 0xe000
  outl 0xcf8 0x80001004
  outw 0xcfc 0x02
  write 0xe02c 0x1 0x05
  write 0xe00f 0x1 0x37
  write 0xe00a 0x1 0x01
  write 0xe00f 0x1 0x29
  write 0xe00f 0x1 0x02
  write 0xe00f 0x1 0x03
  write 0xe00c 0x1 0x32
  write 0xe00f 0x1 0x06
  write 0xe005 0x1 0x01
  write 0xe007 0x1 0x01
  write 0xe003 0x1 0x00
  write 0xe00f 0x1 0x11
  write 0xe02a 0x1 0x01
  write 0xe02a 0x1 0x02
  write 0xe00f 0x1 0x0d
  write 0xe02a 0x1 0x01
  write 0xe02a 0x1 0x02
  EOF
  hw/sd/sd.c:1984:15: runtime error: index 256 out of bounds for type 'uint8_t [64]'
  #0 sd_read_byte hw/sd/sd.c:1984:15
  #1 sdbus_read_data hw/sd/core.c:157:23
  #2 sdhci_read_block_from_card hw/sd/sdhci.c:423:9
  #3 sdhci_blkgap_write hw/sd/sdhci.c:1074:13
  #4 sdhci_write hw/sd/sdhci.c:1195:13
  #5 memory_region_write_accessor softmmu/memory.c:492:5
  #6 access_with_adjusted_size softmmu/memory.c:554:18
  #7 memory_region_dispatch_write softmmu/memory.c
  #8 flatview_write_continue softmmu/physmem.c:2778:23
  #9 flatview_write softmmu/physmem.c:2818:14
  #10 address_space_write softmmu/physmem.c:2910:18
  SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior hw/sd/sd.c:1984:15

Reported-by: Alexander Bulekov 
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/487
Buglink: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=36240
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/sd/sd.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/hw/sd/sd.c b/hw/sd/sd.c
index 807b5d3de3..16d8d52a78 100644
--- a/hw/sd/sd.c
+++ b/hw/sd/sd.c
@@ -1826,6 +1826,12 @@ send_response:
 break;
 
 case sd_r0:
+/*
+ * Invalid state transition, reset implementation
+ * fields to avoid OOB abuse.
+ */
+sd->data_start = 0;
+sd->data_offset = 0;
 case sd_illegal:
 rsplen = 0;
 break;
-- 
2.41.0




[PATCH-for-9.1 2/2] hw/sd/sdcard: Assert @data_offset is in range

2024-04-08 Thread Philippe Mathieu-Daudé
Prevent out-of-bounds accesses with assertions.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/sd/sd.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/hw/sd/sd.c b/hw/sd/sd.c
index 16d8d52a78..c081211582 100644
--- a/hw/sd/sd.c
+++ b/hw/sd/sd.c
@@ -1875,6 +1875,7 @@ void sd_write_byte(SDState *sd, uint8_t value)
 sd->current_cmd, value);
 switch (sd->current_cmd) {
 case 24:  /* CMD24:  WRITE_SINGLE_BLOCK */
+assert(sd->data_offset < sizeof(sd->data));
 sd->data[sd->data_offset ++] = value;
 if (sd->data_offset >= sd->blk_len) {
 /* TODO: Check CRC before committing */
@@ -1901,6 +1902,7 @@ void sd_write_byte(SDState *sd, uint8_t value)
 }
 }
 }
+assert(sd->data_offset < sizeof(sd->data));
 sd->data[sd->data_offset++] = value;
 if (sd->data_offset >= sd->blk_len) {
 /* TODO: Check CRC before committing */
@@ -1925,6 +1927,7 @@ void sd_write_byte(SDState *sd, uint8_t value)
 break;
 
 case 26:  /* CMD26:  PROGRAM_CID */
+assert(sd->data_offset < sizeof(sd->data));
 sd->data[sd->data_offset ++] = value;
 if (sd->data_offset >= sizeof(sd->cid)) {
 /* TODO: Check CRC before committing */
@@ -1944,6 +1947,7 @@ void sd_write_byte(SDState *sd, uint8_t value)
 break;
 
 case 27:  /* CMD27:  PROGRAM_CSD */
+assert(sd->data_offset < sizeof(sd->data));
 sd->data[sd->data_offset ++] = value;
 if (sd->data_offset >= sizeof(sd->csd)) {
 /* TODO: Check CRC before committing */
@@ -1968,6 +1972,7 @@ void sd_write_byte(SDState *sd, uint8_t value)
 break;
 
 case 42:  /* CMD42:  LOCK_UNLOCK */
+assert(sd->data_offset < sizeof(sd->data));
 sd->data[sd->data_offset ++] = value;
 if (sd->data_offset >= sd->blk_len) {
 /* TODO: Check CRC before committing */
@@ -1979,6 +1984,7 @@ void sd_write_byte(SDState *sd, uint8_t value)
 break;
 
 case 56:  /* CMD56:  GEN_CMD */
+assert(sd->data_offset < sizeof(sd->data));
 sd->data[sd->data_offset ++] = value;
 if (sd->data_offset >= sd->blk_len) {
 APP_WRITE_BLOCK(sd->data_start, sd->data_offset);
@@ -2046,6 +2052,7 @@ uint8_t sd_read_byte(SDState *sd)
 break;
 
 case 13:  /* ACMD13: SD_STATUS */
+assert(sd->data_offset < sizeof(sd->sd_status));
 ret = sd->sd_status[sd->data_offset ++];
 
 if (sd->data_offset >= sizeof(sd->sd_status))
@@ -2055,6 +2062,7 @@ uint8_t sd_read_byte(SDState *sd)
 case 17:  /* CMD17:  READ_SINGLE_BLOCK */
 if (sd->data_offset == 0)
 BLK_READ_BLOCK(sd->data_start, io_len);
+assert(sd->data_offset < sizeof(sd->data));
 ret = sd->data[sd->data_offset ++];
 
 if (sd->data_offset >= io_len)
@@ -2069,6 +2077,7 @@ uint8_t sd_read_byte(SDState *sd)
 }
 BLK_READ_BLOCK(sd->data_start, io_len);
 }
+assert(sd->data_offset < sizeof(sd->data));
 ret = sd->data[sd->data_offset ++];
 
 if (sd->data_offset >= io_len) {
@@ -2089,10 +2098,12 @@ uint8_t sd_read_byte(SDState *sd)
 if (sd->data_offset >= SD_TUNING_BLOCK_SIZE - 1) {
 sd->state = sd_transfer_state;
 }
+assert(sd->data_offset < sizeof(sd_tuning_block_pattern));
 ret = sd_tuning_block_pattern[sd->data_offset++];
 break;
 
 case 22:  /* ACMD22: SEND_NUM_WR_BLOCKS */
+assert(sd->data_offset < sizeof(sd->sd_status));
 ret = sd->data[sd->data_offset ++];
 
 if (sd->data_offset >= 4)
@@ -2100,6 +2111,7 @@ uint8_t sd_read_byte(SDState *sd)
 break;
 
 case 30:  /* CMD30:  SEND_WRITE_PROT */
+assert(sd->data_offset < sizeof(sd->data));
 ret = sd->data[sd->data_offset ++];
 
 if (sd->data_offset >= 4)
@@ -2107,6 +2119,7 @@ uint8_t sd_read_byte(SDState *sd)
 break;
 
 case 51:  /* ACMD51: SEND_SCR */
+assert(sd->data_offset < sizeof(sd->scr));
 ret = sd->scr[sd->data_offset ++];
 
 if (sd->data_offset >= sizeof(sd->scr))
@@ -2116,6 +2129,7 @@ uint8_t sd_read_byte(SDState *sd)
 case 56:  /* CMD56:  GEN_CMD */
 if (sd->data_offset == 0)
 APP_READ_BLOCK(sd->data_start, sd->blk_len);
+assert(sd->data_offset < sizeof(sd->data));
 ret = sd->data[sd->data_offset ++];
 
 if (sd->data_offset >= sd->blk_len)
-- 
2.41.0




[PATCH-for-9.0? 0/2] hw/sd/sdcard: Avoid OOB in sd_read_byte()

2024-04-08 Thread Philippe Mathieu-Daudé
Since this is Fix day, I went over this old bug:
https://gitlab.com/qemu-project/qemu/-/issues/487
It happens to be a QEMU implementation detail not
really related to the spec.

Philippe Mathieu-Daudé (2):
  hw/sd/sdcard: Avoid OOB in sd_read_byte() during unexpected CMD switch
  hw/sd/sdcard: Assert @data_offset is in range

 hw/sd/sd.c | 20 
 1 file changed, 20 insertions(+)

-- 
2.41.0




Re: [PATCH v4] nbd/server: do not poll within a coroutine context

2024-04-08 Thread Eric Blake
On Mon, Apr 08, 2024 at 11:46:39AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 05.04.24 20:44, Eric Blake wrote:
> > From: Zhu Yangyang 
> > 
> > Coroutines are not supposed to block. Instead, they should yield.
> > 
> > The client performs TLS upgrade outside of an AIOContext, during
> > synchronous handshake; this still requires g_main_loop.  But the
> > server responds to TLS upgrade inside a coroutine, so a nested
> > g_main_loop is wrong.  Since the two callbacks no longer share more
> > than the setting of data.complete and data.error, it's just as easy to
> > use static helpers instead of trying to share a common code path.
> > 
> > Fixes: f95910f ("nbd: implement TLS support in the protocol negotiation")
> > Signed-off-by: Zhu Yangyang 
> > [eblake: move callbacks to their use point]
> > Signed-off-by: Eric Blake 
> 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 

I'm debating whether it is worth trying to shove this into 9.0; -rc3
is very late, and the problem is pre-existing, so I'm leaning towards
no.  At which point, it's better to get this right.

> 
> still, some notes below
> 
> > ---
> > 
> > v3: https://lists.gnu.org/archive/html/qemu-devel/2024-04/msg00375.html
> > 
> > in v4, factor even the struct to the .c files, avoiding a union [Vladimir]
> > 
> >   nbd/nbd-internal.h | 10 --
> >   nbd/client.c   | 27 +++
> >   nbd/common.c   | 11 ---
> >   nbd/server.c   | 29 +++--
> >   4 files changed, 46 insertions(+), 31 deletions(-)
> > 

> > +++ b/nbd/client.c
> > @@ -596,13 +596,31 @@ static int nbd_request_simple_option(QIOChannel *ioc, int opt, bool strict,
> >   return 1;
> >   }
> > 
> > +/* Callback to learn when QIO TLS upgrade is complete */
> > +struct NBDTLSClientHandshakeData {
> > +bool complete;
> > +Error *error;
> > +GMainLoop *loop;
> > +};
> > +
> > +static void nbd_client_tls_handshake(QIOTask *task, void *opaque)
> > +{
> > +struct NBDTLSClientHandshakeData *data = opaque;
> > +
> > +qio_task_propagate_error(task, &data->error);
> > +data->complete = true;
> > +if (data->loop) {
> > +g_main_loop_quit(data->loop);
> > +}
> > +}
> > +
> >   static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
> >   QCryptoTLSCreds *tlscreds,
> >   const char *hostname, Error **errp)
> >   {
> >   int ret;
> >   QIOChannelTLS *tioc;
> > -struct NBDTLSHandshakeData data = { 0 };
> > +struct NBDTLSClientHandshakeData data = { 0 };
> > 
> >   ret = nbd_request_simple_option(ioc, NBD_OPT_STARTTLS, true, errp);
> >   if (ret <= 0) {
> > @@ -619,18 +637,19 @@ static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
> >   return NULL;
> >   }
> >   qio_channel_set_name(QIO_CHANNEL(tioc), "nbd-client-tls");
> > -data.loop = g_main_loop_new(g_main_context_default(), FALSE);
> >   trace_nbd_receive_starttls_tls_handshake();
> >   qio_channel_tls_handshake(tioc,
> > -  nbd_tls_handshake,
> > +  nbd_client_tls_handshake,
> >   &data,
> > NULL,
> > NULL);
> > 
> >   if (!data.complete) {
> > +data.loop = g_main_loop_new(g_main_context_default(), FALSE);
> >   g_main_loop_run(data.loop);
> > +g_main_loop_unref(data.loop);
> 
> probably good to assert(data.complete);

Seems reasonable.

> > +++ b/nbd/server.c
> > @@ -748,6 +748,23 @@ static int nbd_negotiate_handle_info(NBDClient *client, Error **errp)
> >   return rc;
> >   }
> > 
> > +/* Callback to learn when QIO TLS upgrade is complete */
> > +struct NBDTLSServerHandshakeData {
> > +bool complete;
> > +Error *error;
> > +Coroutine *co;
> > +};
> > +
> > +static void nbd_server_tls_handshake(QIOTask *task, void *opaque)
> > +{
> > +struct NBDTLSServerHandshakeData *data = opaque;
> > +
> > +qio_task_propagate_error(task, &data->error);
> > +data->complete = true;
> > +if (!qemu_coroutine_entered(data->co)) {
> > +aio_co_wake(data->co);
> > +}
> > +}
> > 
> >   /* Handle NBD_OPT_STARTTLS. Return NULL to drop connection, or else the
> >* new channel for all further (now-encrypted) communication. */
> > @@ -756,7 +773,7 @@ static QIOChannel *nbd_negotiate_handle_starttls(NBDClient *client,
> >   {
> >   QIOChannel *ioc;
> >   QIOChannelTLS *tioc;
> > -struct NBDTLSHandshakeData data = { 0 };
> > +struct NBDTLSServerHandshakeData data = { 0 };
> > 
> >   assert(client->opt == NBD_OPT_STARTTLS);
> > 
> > @@ -777,17 +794,17 @@ static QIOChannel *nbd_negotiate_handle_starttls(NBDClient *client,
> 
> preexisting: lack coroutine_fn, as well as caller nbd_negotiate_options()

Indeed, it would not hurt to add them now that the callback is no
longer shared.

Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling

2024-04-08 Thread Jinpu Wang
Hi Peter,

On Tue, Apr 2, 2024 at 11:24 PM Peter Xu  wrote:
>
> On Mon, Apr 01, 2024 at 11:26:25PM +0200, Yu Zhang wrote:
> > Hello Peter und Zhjian,
> >
> > Thank you so much for letting me know about this. I'm also a bit surprised 
> > at
> > the plan for deprecating the RDMA migration subsystem.
>
> It's not too late, since it looks like we do have users not yet notified
> from this, we'll redo the deprecation procedure even if it'll be the final
> plan, and it'll be 2 releases after this.
>
> >
> > > IMHO it's more important to know whether there are still users and whether
> > > they would still like to see it around.
> >
> > > I admit RDMA migration lacked testing (unit/CI tests), which led
> > > to a few obvious bugs being noticed too late.
> >
> > Yes, we are a user of this subsystem. I was unaware of the lack of test 
> > coverage
> > for this part. As soon as 8.2 was released, I saw that many of the
> > migration test
> > cases failed and came to realize that there might be a bug between 8.1
> > and 8.2, but
> > was unable to confirm and report it quickly to you.
> >
> > The maintenance of this part could be too costly or difficult from
> > your point of view.
>
> It may or may not be too costly, it's just that we need real users of RDMA
> taking some care of it.  Having it broken easily for >1 releases definitely
> is a sign of lack of users.  It is an implication to the community that we
> should consider dropping some features so that we can get the best use of
> the community resources for the things that may have a broader audience.
>
> One thing majorly missing is a RDMA tester to guard all the merges to not
> break RDMA paths, hopefully in CI.  That should not rely on RDMA hardwares
> but just to sanity check the migration+rdma code running all fine.  RDMA
> taught us the lesson so we're requesting CI coverage for all other new
> features that will be merged at least for migration subsystem, so that we
> plan to not merge anything that is not covered by CI unless extremely
> necessary in the future.
>
> For sure CI is not the only missing part, but I'd say we should start with
> it, then someone should also take care of the code even if only in
> maintenance mode (no new feature to add on top).
>
> >
> > My concern is, this plan will forces a few QEMU users (not sure how
> > many) like us
> > either to stick to the RDMA migration by using an increasingly older
> > version of QEMU,
> > or to abandon the currently used RDMA migration.
>
> RDMA doesn't get new features anyway, if there's specific use case for RDMA
> migrations, would it work if such a scenario uses the old binary?  Is it
> possible to switch to the TCP protocol with some good NICs?
We have used RDMA migration with HCAs from Nvidia for years; our
experience is that RDMA migration works better than TCP (over IPoIB).

Switching back to TCP would lead us back to the old problems that
were solved by RDMA migration.

>
> Per our best knowledge, RDMA users are rare, and please let anyone know if
> you are aware of such users.  IIUC the major reason why RDMA stopped being
> the trend is because the network is not like ten years ago; I don't think I
> have good knowledge in RDMA at all nor network, but my understanding is
> it's pretty easy to fetch modern NIC to outperform RDMAs, then it may make
> little sense to maintain multiple protocols, considering RDMA migration
> code is so special so that it has the most custom code comparing to other
> protocols.
+cc some guys from Huawei.

I'm surprised RDMA users are rare; I guess many are just
working with a different code base.
>
> Thanks,
>
> --
> Peter Xu

Thx!
Jinpu Wang
>



Re: [PATCH-for-9.0] hw/sd/sdhci: Discard excess of data written to Buffer Data Port register

2024-04-08 Thread Peter Maydell
On Thu, 4 Apr 2024 at 09:56, Philippe Mathieu-Daudé  wrote:
>
> Per "SD Host Controller Standard Specification Version 3.00":
>
>   * 1.7 Buffer Control
>
>   - 1.7.1 Control of Buffer Pointer
>
> (3) Buffer Control with Block Size
>
> In case of write operation, the buffer accumulates the data
> written through the Buffer Data Port register. When the buffer
> pointer reaches the block size, Buffer Write Enable in the
> Present State register changes 1 to 0. It means no more data
> can be written to the buffer. Excess data of the last write is
> ignored. For example, if just lower 2 bytes data can be written
> to the buffer and a 32-bit (4-byte) block of data is written to
> the Buffer Data Port register, the lower 2 bytes of data is
> written to the buffer and the upper 2 bytes is ignored.
>
> Discard the excess of data to avoid overflow reported by fuzzer:
>
>   $ cat << EOF | qemu-system-i386 \
>  -display none -nodefaults \
>  -machine accel=qtest -m 512M \
>  -device sdhci-pci,sd-spec-version=3 \
>  -device sd-card,drive=mydrive \
>  -drive if=none,index=0,file=null-co://,format=raw,id=mydrive -nographic \
>  -qtest stdio
>   outl 0xcf8 0x80001013
>   outl 0xcfc 0x91
>   outl 0xcf8 0x80001001
>   outl 0xcfc 0x0600
>   write 0x912c 0x1 0x05
>   write 0x9158 0x1 0x16
>   write 0x9105 0x1 0x04
>   write 0x9128 0x1 0x08
>   write 0x16 0x1 0x21
>   write 0x19 0x1 0x20
>   write 0x910c 0x1 0x01
>   write 0x910e 0x1 0x20
>   write 0x910f 0x1 0x00
>   write 0x910c 0x1 0x00
>   write 0x9120 0x1 0x00
>   EOF

> diff --git a/hw/sd/sdhci.c b/hw/sd/sdhci.c
> index c5e0bc018b..2dd88fa139 100644
> --- a/hw/sd/sdhci.c
> +++ b/hw/sd/sdhci.c
> @@ -552,7 +552,7 @@ static void sdhci_write_block_to_card(SDHCIState *s)
>   * register */
>  static void sdhci_write_dataport(SDHCIState *s, uint32_t value, unsigned size)
>  {
> -unsigned i;
> +unsigned i, available;
>
>  /* Check that there is free space left in a buffer */
>  if (!(s->prnsts & SDHC_SPACE_AVAILABLE)) {
> @@ -560,6 +560,14 @@ static void sdhci_write_dataport(SDHCIState *s, uint32_t value, unsigned size)
>  return;
>  }
>
> +available = s->buf_maxsz - s->data_count;
> +if (size > available) {
> +qemu_log_mask(LOG_GUEST_ERROR, "SDHC buffer data full (size: %"PRIu32")"
> +   " discarding %u byte%s\n",
> +   s->buf_maxsz, size - available,
> +   size - available > 1 ? "s" : "");
> +size = available; /* Excess data of the last write is ignored. */
> +}
>  for (i = 0; i < size; i++) {
>  s->fifo_buffer[s->data_count] = value & 0xFF;
>  s->data_count++;

So, this will definitely avoid the buffer overrun, and the
quoted text also suggests that we should not be doing the
"if sdhci_write_block_to_card() writes the data then keep
going with the rest of the bytes in the value for the start
of the new block". (With this change we could move the
"if (s->data_count >= (s->blksize & BLOCK_SIZE_MASK)) ..."
out of the for() loop and down to the bottom of the function.)

But I'm not sure it fixes the underlying cause of the problem,
because the repro case isn't writing a multi-byte value, it's
only writing a single byte.

It looks from the code like if there's no space in the
buffer then SDHC_SPACE_AVAILABLE should be clear in the
present-status register, but that has somehow got out of sync.
The way the repro from the fuzzer toggles the device in and
out of DMA mode looks suspicious about how that out-of-sync
situation might have come about.

thanks
-- PMM



Re: [PATCH v4] nbd/server: do not poll within a coroutine context

2024-04-08 Thread Vladimir Sementsov-Ogievskiy

On 05.04.24 20:44, Eric Blake wrote:

From: Zhu Yangyang 

Coroutines are not supposed to block. Instead, they should yield.

The client performs TLS upgrade outside of an AIOContext, during
synchronous handshake; this still requires g_main_loop.  But the
server responds to TLS upgrade inside a coroutine, so a nested
g_main_loop is wrong.  Since the two callbacks no longer share more
than the setting of data.complete and data.error, it's just as easy to
use static helpers instead of trying to share a common code path.

Fixes: f95910f ("nbd: implement TLS support in the protocol negotiation")
Signed-off-by: Zhu Yangyang 
[eblake: move callbacks to their use point]
Signed-off-by: Eric Blake 


Reviewed-by: Vladimir Sementsov-Ogievskiy 

still, some notes below


---

v3: https://lists.gnu.org/archive/html/qemu-devel/2024-04/msg00375.html

in v4, factor even the struct to the .c files, avoiding a union [Vladimir]

  nbd/nbd-internal.h | 10 --
  nbd/client.c   | 27 +++
  nbd/common.c   | 11 ---
  nbd/server.c   | 29 +++--
  4 files changed, 46 insertions(+), 31 deletions(-)

diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
index dfa02f77ee4..91895106a95 100644
--- a/nbd/nbd-internal.h
+++ b/nbd/nbd-internal.h
@@ -72,16 +72,6 @@ static inline int nbd_write(QIOChannel *ioc, const void *buffer, size_t size,
  return qio_channel_write_all(ioc, buffer, size, errp) < 0 ? -EIO : 0;
  }

-struct NBDTLSHandshakeData {
-GMainLoop *loop;
-bool complete;
-Error *error;
-};
-
-
-void nbd_tls_handshake(QIOTask *task,
-   void *opaque);
-
  int nbd_drop(QIOChannel *ioc, size_t size, Error **errp);

  #endif
diff --git a/nbd/client.c b/nbd/client.c
index 29ffc609a4b..c7141d7a098 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -596,13 +596,31 @@ static int nbd_request_simple_option(QIOChannel *ioc, int opt, bool strict,
  return 1;
  }

+/* Callback to learn when QIO TLS upgrade is complete */
+struct NBDTLSClientHandshakeData {
+bool complete;
+Error *error;
+GMainLoop *loop;
+};
+
+static void nbd_client_tls_handshake(QIOTask *task, void *opaque)
+{
+struct NBDTLSClientHandshakeData *data = opaque;
+
+qio_task_propagate_error(task, &data->error);
+data->complete = true;
+if (data->loop) {
+g_main_loop_quit(data->loop);
+}
+}
+
  static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
  QCryptoTLSCreds *tlscreds,
  const char *hostname, Error **errp)
  {
  int ret;
  QIOChannelTLS *tioc;
-struct NBDTLSHandshakeData data = { 0 };
+struct NBDTLSClientHandshakeData data = { 0 };

  ret = nbd_request_simple_option(ioc, NBD_OPT_STARTTLS, true, errp);
  if (ret <= 0) {
@@ -619,18 +637,19 @@ static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
  return NULL;
  }
  qio_channel_set_name(QIO_CHANNEL(tioc), "nbd-client-tls");
-data.loop = g_main_loop_new(g_main_context_default(), FALSE);
  trace_nbd_receive_starttls_tls_handshake();
  qio_channel_tls_handshake(tioc,
-  nbd_tls_handshake,
+  nbd_client_tls_handshake,
  &data,
NULL,
NULL);

  if (!data.complete) {
+data.loop = g_main_loop_new(g_main_context_default(), FALSE);
  g_main_loop_run(data.loop);
+g_main_loop_unref(data.loop);


probably good to assert(data.complete);


  }
-g_main_loop_unref(data.loop);
+
  if (data.error) {
  error_propagate(errp, data.error);
  object_unref(OBJECT(tioc));
diff --git a/nbd/common.c b/nbd/common.c
index 3247c1d618a..589a748cfe6 100644
--- a/nbd/common.c
+++ b/nbd/common.c
@@ -47,17 +47,6 @@ int nbd_drop(QIOChannel *ioc, size_t size, Error **errp)
  }


-void nbd_tls_handshake(QIOTask *task,
-   void *opaque)
-{
-struct NBDTLSHandshakeData *data = opaque;
-
-qio_task_propagate_error(task, &data->error);
-data->complete = true;
-g_main_loop_quit(data->loop);
-}
-
-
  const char *nbd_opt_lookup(uint32_t opt)
  {
  switch (opt) {
diff --git a/nbd/server.c b/nbd/server.c
index c3484cc1ebc..ea13cf0e766 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -748,6 +748,23 @@ static int nbd_negotiate_handle_info(NBDClient *client, Error **errp)
  return rc;
  }

+/* Callback to learn when QIO TLS upgrade is complete */
+struct NBDTLSServerHandshakeData {
+bool complete;
+Error *error;
+Coroutine *co;
+};
+
+static void nbd_server_tls_handshake(QIOTask *task, void *opaque)
+{
+struct NBDTLSServerHandshakeData *data = opaque;
+
+qio_task_propagate_error(task, &data->error);
+data->complete = true;
+if (!qemu_coroutine_entered(data->co)) {
+aio_co_wake(data->co);
+}
+}

  /* Handle NBD_OPT_STARTTLS. 

Re: [PATCH-for-9.0? 3/3] hw/block/nand: Fix out-of-bound access in NAND block buffer

2024-04-08 Thread Philippe Mathieu-Daudé

On 8/4/24 10:36, Philippe Mathieu-Daudé wrote:

nand_command() and nand_getio() don't check that @offset points
into the block, nor that the available data length (s->iolen) is
non-negative.

In order to fix this:

- check the offset is in range in nand_blk_load_NAND_PAGE_SIZE(),
- do not set @iolen to non-zero if blk_load() failed.

Reproducer:

   $ cat << EOF | qemu-system-arm -machine tosa \
  -monitor none -serial none \
  -display none -qtest stdio
   write 0x1111 0x1 0xca
   write 0x1104 0x1 0x47
   write 0x1000ca04 0x1 0xd7
   write 0x1000ca01 0x1 0xe0
   write 0x1000ca04 0x1 0x71
   write 0x1000ca00 0x1 0x50
   write 0x1000ca04 0x1 0xd7
   read 0x1000ca02 0x1
   write 0x1000ca01 0x1 0x10
   EOF

=
==15750==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x61f00de0 at pc 0x560e61557210 bp 0x7ffcfc4a59f0 sp 0x7ffcfc4a59e8
READ of size 1 at 0x61f00de0 thread T0
 #0 0x560e6155720f in mem_and hw/block/nand.c:101:20
 #1 0x560e6155ac9c in nand_blk_write_512 hw/block/nand.c:663:9
 #2 0x560e61544200 in nand_command hw/block/nand.c:293:13
 #3 0x560e6153cc83 in nand_setio hw/block/nand.c:520:13
 #4 0x560e61a0a69e in tc6393xb_nand_writeb hw/display/tc6393xb.c:380:13
 #5 0x560e619f9bf7 in tc6393xb_writeb hw/display/tc6393xb.c:524:9
 #6 0x560e647c7d03 in memory_region_write_accessor softmmu/memory.c:492:5
 #7 0x560e647c7641 in access_with_adjusted_size softmmu/memory.c:554:18
 #8 0x560e647c5f66 in memory_region_dispatch_write softmmu/memory.c:1514:16
 #9 0x560e6485409e in flatview_write_continue softmmu/physmem.c:2825:23
 #10 0x560e648421eb in flatview_write softmmu/physmem.c:2867:12
 #11 0x560e64841ca8 in address_space_write softmmu/physmem.c:2963:18
 #12 0x560e61170162 in qemu_writeb tests/qtest/videzzo/videzzo_qemu.c:1080:5
 #13 0x560e6116eef7 in dispatch_mmio_write 
tests/qtest/videzzo/videzzo_qemu.c:1227:28

0x61f00de0 is located 0 bytes to the right of 3424-byte region 
[0x61f00080,0x61f00de0)
allocated by thread T0 here:
 #0 0x560e611276cf in malloc 
/root/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:145:3
 #1 0x7f7959a87e98 in g_malloc 
(/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x57e98)
 #2 0x560e64b98871 in object_new qom/object.c:749:12
 #3 0x560e64b5d1a1 in qdev_new hw/core/qdev.c:153:19
 #4 0x560e61547ea5 in nand_init hw/block/nand.c:639:11
 #5 0x560e619f8772 in tc6393xb_init hw/display/tc6393xb.c:558:16
 #6 0x560e6390bad2 in tosa_init hw/arm/tosa.c:250:12

SUMMARY: AddressSanitizer: heap-buffer-overflow hw/block/nand.c:101:20 in 
mem_and
==15750==ABORTING

Broken since introduction in commit 3e3d5815cb ("NAND Flash memory
emulation and ECC calculation helpers for use by NAND controllers").

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1446


Also:

  Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1445


Reported-by: Qiang Liu 
Signed-off-by: Philippe Mathieu-Daudé 
---
  hw/block/nand.c | 8 +++-
  1 file changed, 7 insertions(+), 1 deletion(-)




[PATCH-for-9.0? 2/3] hw/block/nand: Have blk_load() return boolean indicating success

2024-04-08 Thread Philippe Mathieu-Daudé
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/block/nand.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/hw/block/nand.c b/hw/block/nand.c
index 6fa9038bb5..3627c799b5 100644
--- a/hw/block/nand.c
+++ b/hw/block/nand.c
@@ -84,7 +84,11 @@ struct NANDFlashState {
 
 void (*blk_write)(NANDFlashState *s);
 void (*blk_erase)(NANDFlashState *s);
-void (*blk_load)(NANDFlashState *s, uint64_t addr, int offset);
+/*
+ * Returns %true when block containing (@addr + @offset) is
+ * successfully loaded, otherwise %false.
+ */
+bool (*blk_load)(NANDFlashState *s, uint64_t addr, int offset);
 
 uint32_t ioaddr_vmstate;
 };
@@ -769,11 +773,11 @@ static void glue(nand_blk_erase_, 
NAND_PAGE_SIZE)(NANDFlashState *s)
 }
 }
 
-static void glue(nand_blk_load_, NAND_PAGE_SIZE)(NANDFlashState *s,
+static bool glue(nand_blk_load_, NAND_PAGE_SIZE)(NANDFlashState *s,
 uint64_t addr, int offset)
 {
 if (PAGE(addr) >= s->pages) {
-return;
+return false;
 }
 
 if (s->blk) {
@@ -801,6 +805,8 @@ static void glue(nand_blk_load_, 
NAND_PAGE_SIZE)(NANDFlashState *s,
 offset, NAND_PAGE_SIZE + OOB_SIZE - offset);
 s->ioaddr = s->io;
 }
+
+return true;
 }
 
 static void glue(nand_init_, NAND_PAGE_SIZE)(NANDFlashState *s)
-- 
2.41.0




[PATCH-for-9.0? 1/3] hw/block/nand: Factor nand_load_iolen() method out

2024-04-08 Thread Philippe Mathieu-Daudé
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/block/nand.c | 32 +++-
 1 file changed, 19 insertions(+), 13 deletions(-)

diff --git a/hw/block/nand.c b/hw/block/nand.c
index d1435f2207..6fa9038bb5 100644
--- a/hw/block/nand.c
+++ b/hw/block/nand.c
@@ -243,9 +243,25 @@ static inline void nand_pushio_byte(NANDFlashState *s, 
uint8_t value)
 }
 }
 
+/*
+ * nand_load_block: Load block containing (s->addr + @offset).
+ * Returns length of data available at @offset in this block.
+ */
+static int nand_load_block(NANDFlashState *s, int offset)
+{
+int iolen;
+
+s->blk_load(s, s->addr, offset);
+
+iolen = (1 << s->page_shift) - offset;
+if (s->gnd) {
+iolen += 1 << s->oob_shift;
+}
+return iolen;
+}
+
 static void nand_command(NANDFlashState *s)
 {
-unsigned int offset;
 switch (s->cmd) {
 case NAND_CMD_READ0:
 s->iolen = 0;
@@ -271,12 +287,7 @@ static void nand_command(NANDFlashState *s)
 case NAND_CMD_NOSERIALREAD2:
 if (!(nand_flash_ids[s->chip_id].options & NAND_SAMSUNG_LP))
 break;
-offset = s->addr & ((1 << s->addr_shift) - 1);
-s->blk_load(s, s->addr, offset);
-if (s->gnd)
-s->iolen = (1 << s->page_shift) - offset;
-else
-s->iolen = (1 << s->page_shift) + (1 << s->oob_shift) - offset;
+s->iolen = nand_load_block(s, s->addr & ((1 << s->addr_shift) - 1));
 break;
 
 case NAND_CMD_RESET:
@@ -597,12 +608,7 @@ uint32_t nand_getio(DeviceState *dev)
 if (!s->iolen && s->cmd == NAND_CMD_READ0) {
 offset = (int) (s->addr & ((1 << s->addr_shift) - 1)) + s->offset;
 s->offset = 0;
-
-s->blk_load(s, s->addr, offset);
-if (s->gnd)
-s->iolen = (1 << s->page_shift) - offset;
-else
-s->iolen = (1 << s->page_shift) + (1 << s->oob_shift) - offset;
+s->iolen = nand_load_block(s, offset);
 }
 
 if (s->ce || s->iolen <= 0) {
-- 
2.41.0




[PATCH-for-9.0? 0/3] hw/block/nand: Fix out-of-bound access in NAND block buffer

2024-04-08 Thread Philippe Mathieu-Daudé
Fix for https://gitlab.com/qemu-project/qemu/-/issues/1446

Philippe Mathieu-Daudé (3):
  hw/block/nand: Factor nand_load_iolen() method out
  hw/block/nand: Have blk_load() return boolean indicating success
  hw/block/nand: Fix out-of-bound access in NAND block buffer

 hw/block/nand.c | 50 +
 1 file changed, 34 insertions(+), 16 deletions(-)

-- 
2.41.0




[PATCH-for-9.0? 3/3] hw/block/nand: Fix out-of-bound access in NAND block buffer

2024-04-08 Thread Philippe Mathieu-Daudé
nand_command() and nand_getio() don't check that @offset points
into the block, nor that the available data length (s->iolen) is
non-negative.

In order to fix:

- check the offset is in range in nand_blk_load_NAND_PAGE_SIZE(),
- do not set @iolen if blk_load() failed.

Reproducer:

  $ cat << EOF | qemu-system-arm -machine tosa \
 -monitor none -serial none \
 -display none -qtest stdio
  write 0x1111 0x1 0xca
  write 0x1104 0x1 0x47
  write 0x1000ca04 0x1 0xd7
  write 0x1000ca01 0x1 0xe0
  write 0x1000ca04 0x1 0x71
  write 0x1000ca00 0x1 0x50
  write 0x1000ca04 0x1 0xd7
  read 0x1000ca02 0x1
  write 0x1000ca01 0x1 0x10
  EOF

=================================================================
==15750==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x61f00de0
 at pc 0x560e61557210 bp 0x7ffcfc4a59f0 sp 0x7ffcfc4a59e8
READ of size 1 at 0x61f00de0 thread T0
#0 0x560e6155720f in mem_and hw/block/nand.c:101:20
#1 0x560e6155ac9c in nand_blk_write_512 hw/block/nand.c:663:9
#2 0x560e61544200 in nand_command hw/block/nand.c:293:13
#3 0x560e6153cc83 in nand_setio hw/block/nand.c:520:13
#4 0x560e61a0a69e in tc6393xb_nand_writeb hw/display/tc6393xb.c:380:13
#5 0x560e619f9bf7 in tc6393xb_writeb hw/display/tc6393xb.c:524:9
#6 0x560e647c7d03 in memory_region_write_accessor softmmu/memory.c:492:5
#7 0x560e647c7641 in access_with_adjusted_size softmmu/memory.c:554:18
#8 0x560e647c5f66 in memory_region_dispatch_write softmmu/memory.c:1514:16
#9 0x560e6485409e in flatview_write_continue softmmu/physmem.c:2825:23
#10 0x560e648421eb in flatview_write softmmu/physmem.c:2867:12
#11 0x560e64841ca8 in address_space_write softmmu/physmem.c:2963:18
#12 0x560e61170162 in qemu_writeb tests/qtest/videzzo/videzzo_qemu.c:1080:5
#13 0x560e6116eef7 in dispatch_mmio_write 
tests/qtest/videzzo/videzzo_qemu.c:1227:28

0x61f00de0 is located 0 bytes to the right of 3424-byte region 
[0x61f00080,0x61f00de0)
allocated by thread T0 here:
#0 0x560e611276cf in malloc 
/root/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:145:3
#1 0x7f7959a87e98 in g_malloc 
(/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x57e98)
#2 0x560e64b98871 in object_new qom/object.c:749:12
#3 0x560e64b5d1a1 in qdev_new hw/core/qdev.c:153:19
#4 0x560e61547ea5 in nand_init hw/block/nand.c:639:11
#5 0x560e619f8772 in tc6393xb_init hw/display/tc6393xb.c:558:16
#6 0x560e6390bad2 in tosa_init hw/arm/tosa.c:250:12

SUMMARY: AddressSanitizer: heap-buffer-overflow hw/block/nand.c:101:20 in 
mem_and
==15750==ABORTING

Broken since introduction in commit 3e3d5815cb ("NAND Flash memory
emulation and ECC calculation helpers for use by NAND controllers").

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1446
Reported-by: Qiang Liu 
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/block/nand.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/block/nand.c b/hw/block/nand.c
index 3627c799b5..d90dc965a1 100644
--- a/hw/block/nand.c
+++ b/hw/block/nand.c
@@ -255,7 +255,9 @@ static int nand_load_block(NANDFlashState *s, int offset)
 {
 int iolen;
 
-s->blk_load(s, s->addr, offset);
+if (!s->blk_load(s, s->addr, offset)) {
+return 0;
+}
 
 iolen = (1 << s->page_shift) - offset;
 if (s->gnd) {
@@ -780,6 +782,10 @@ static bool glue(nand_blk_load_, 
NAND_PAGE_SIZE)(NANDFlashState *s,
 return false;
 }
 
+if (offset > NAND_PAGE_SIZE + OOB_SIZE) {
+return false;
+}
+
 if (s->blk) {
 if (s->mem_oob) {
 if (blk_pread(s->blk, SECTOR(addr) << BDRV_SECTOR_BITS,
-- 
2.41.0
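The combined effect of the three patches — blk_load() reporting failure, the offset range check, and nand_load_block() keeping s->iolen at zero when nothing was loaded — can be sketched in plain C. The struct, constants, and function names below are simplified stand-ins for illustration, not the actual QEMU NANDFlashState code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified device state (not QEMU's NANDFlashState). */
enum { PAGE_SIZE = 2048, OOB_SIZE = 64, PAGE_SHIFT = 11 };

typedef struct {
    uint32_t pages;   /* number of pages on the device */
    bool gnd;         /* wiring flag, as in the QEMU model */
} NandSketch;

/* Mirrors the fixed nand_blk_load_*(): reject out-of-range pages/offsets. */
static bool blk_load(const NandSketch *s, uint64_t page, int offset)
{
    if (page >= s->pages) {
        return false;
    }
    if (offset > PAGE_SIZE + OOB_SIZE) {
        return false;
    }
    /* ... read the page (and OOB) into the I/O buffer here ... */
    return true;
}

/* Mirrors nand_load_block(): returns 0 when nothing was loaded. */
static int load_block(const NandSketch *s, uint64_t page, int offset)
{
    int iolen;

    if (!blk_load(s, page, offset)) {
        return 0;   /* caller's s->iolen stays zero, no OOB access */
    }
    iolen = (1 << PAGE_SHIFT) - offset;
    if (s->gnd) {
        iolen += OOB_SIZE;
    }
    return iolen;
}
```

With this shape, the reproducer's out-of-range access leaves the I/O length at zero instead of producing a negative or stale value that later indexes past the buffer.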




Re: [PATCH for-9.1 v3 09/11] hostmem: add a new memory backend based on POSIX shm_open()

2024-04-08 Thread Stefano Garzarella

On Mon, Apr 08, 2024 at 10:03:15AM +0200, David Hildenbrand wrote:

On 08.04.24 09:58, Stefano Garzarella wrote:

On Thu, Apr 04, 2024 at 04:09:34PM +0200, David Hildenbrand wrote:

On 04.04.24 14:23, Stefano Garzarella wrote:

shm_open() creates and opens a new POSIX shared memory object.
A POSIX shared memory object allows creating a memory backend with an
associated file descriptor that can be shared with external processes
(e.g. vhost-user).

The new `memory-backend-shm` can be used as an alternative when
`memory-backend-memfd` is not available (Linux only), since shm_open()
should be provided by any POSIX-compliant operating system.

This backend mimics memfd, allocating memory that is practically
anonymous. In theory shm_open() requires a name, but this is allocated
for a short time interval and shm_unlink() is called right after
shm_open(). After that, only fd is shared with external processes
(e.g., vhost-user) as if it were associated with anonymous memory.

In the future we may also allow the user to specify the name to be
passed to shm_open(), but for now we keep the backend simple, mimicking
anonymous memory such as memfd.

Signed-off-by: Stefano Garzarella 
---
v3
- enriched commit message and documentation to highlight that we
  want to mimic memfd (David)
---
 docs/system/devices/vhost-user.rst |   5 +-
 qapi/qom.json  |  17 +
 backends/hostmem-shm.c | 118 +
 backends/meson.build   |   1 +
 qemu-options.hx|  11 +++
 5 files changed, 150 insertions(+), 2 deletions(-)
 create mode 100644 backends/hostmem-shm.c

diff --git a/docs/system/devices/vhost-user.rst 
b/docs/system/devices/vhost-user.rst
index 9b2da106ce..35259d8ec7 100644
--- a/docs/system/devices/vhost-user.rst
+++ b/docs/system/devices/vhost-user.rst
@@ -98,8 +98,9 @@ Shared memory object
 In order for the daemon to access the VirtIO queues to process the
 requests it needs access to the guest's address space. This is
-achieved via the ``memory-backend-file`` or ``memory-backend-memfd``
-objects. A reference to a file-descriptor which can access this object
+achieved via the ``memory-backend-file``, ``memory-backend-memfd``, or
+``memory-backend-shm`` objects.
+A reference to a file-descriptor which can access this object
 will be passed via the socket as part of the protocol negotiation.
 Currently the shared memory object needs to match the size of the main
diff --git a/qapi/qom.json b/qapi/qom.json
index 85e6b4f84a..5252ec69e3 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -721,6 +721,19 @@
 '*hugetlbsize': 'size',
 '*seal': 'bool' } }
+##
+# @MemoryBackendShmProperties:
+#
+# Properties for memory-backend-shm objects.
+#
+# The @share boolean option is true by default with shm.
+#
+# Since: 9.1
+##
+{ 'struct': 'MemoryBackendShmProperties',
+  'base': 'MemoryBackendProperties',
+  'data': { } }
+


Acked-by: David Hildenbrand 

One comment: we should maybe just forbid setting share=off. It doesn't
make any sense and it can even result in an unexpected double memory
consumption. We missed doing that for memfd, unfortunately.


Good point!

IIUC the `share` property is defined by the parent `hostmem`, so I
should find a way to override the property here and disable the setter,
or add an option to `hostmem` to make the property non-writable.


Right, or simply fail later when you would find "share=off" in 
shm_backend_memory_alloc().


This seems like the simplest and cleanest approach, I'll go in this 
direction!




Whenever we support named shm_open(), it could make sense for VM 
snapshotting. Right now it doesn't really make any sense.


Yeah, I see.

Thanks,
Stefano




Re: [PATCH for-9.1 v3 09/11] hostmem: add a new memory backend based on POSIX shm_open()

2024-04-08 Thread David Hildenbrand

On 08.04.24 09:58, Stefano Garzarella wrote:

On Thu, Apr 04, 2024 at 04:09:34PM +0200, David Hildenbrand wrote:

On 04.04.24 14:23, Stefano Garzarella wrote:

shm_open() creates and opens a new POSIX shared memory object.
A POSIX shared memory object allows creating a memory backend with an
associated file descriptor that can be shared with external processes
(e.g. vhost-user).

The new `memory-backend-shm` can be used as an alternative when
`memory-backend-memfd` is not available (Linux only), since shm_open()
should be provided by any POSIX-compliant operating system.

This backend mimics memfd, allocating memory that is practically
anonymous. In theory shm_open() requires a name, but this is allocated
for a short time interval and shm_unlink() is called right after
shm_open(). After that, only fd is shared with external processes
(e.g., vhost-user) as if it were associated with anonymous memory.

In the future we may also allow the user to specify the name to be
passed to shm_open(), but for now we keep the backend simple, mimicking
anonymous memory such as memfd.

Signed-off-by: Stefano Garzarella 
---
v3
- enriched commit message and documentation to highlight that we
   want to mimic memfd (David)
---
  docs/system/devices/vhost-user.rst |   5 +-
  qapi/qom.json  |  17 +
  backends/hostmem-shm.c | 118 +
  backends/meson.build   |   1 +
  qemu-options.hx|  11 +++
  5 files changed, 150 insertions(+), 2 deletions(-)
  create mode 100644 backends/hostmem-shm.c

diff --git a/docs/system/devices/vhost-user.rst 
b/docs/system/devices/vhost-user.rst
index 9b2da106ce..35259d8ec7 100644
--- a/docs/system/devices/vhost-user.rst
+++ b/docs/system/devices/vhost-user.rst
@@ -98,8 +98,9 @@ Shared memory object
  In order for the daemon to access the VirtIO queues to process the
  requests it needs access to the guest's address space. This is
-achieved via the ``memory-backend-file`` or ``memory-backend-memfd``
-objects. A reference to a file-descriptor which can access this object
+achieved via the ``memory-backend-file``, ``memory-backend-memfd``, or
+``memory-backend-shm`` objects.
+A reference to a file-descriptor which can access this object
  will be passed via the socket as part of the protocol negotiation.
  Currently the shared memory object needs to match the size of the main
diff --git a/qapi/qom.json b/qapi/qom.json
index 85e6b4f84a..5252ec69e3 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -721,6 +721,19 @@
  '*hugetlbsize': 'size',
  '*seal': 'bool' } }
+##
+# @MemoryBackendShmProperties:
+#
+# Properties for memory-backend-shm objects.
+#
+# The @share boolean option is true by default with shm.
+#
+# Since: 9.1
+##
+{ 'struct': 'MemoryBackendShmProperties',
+  'base': 'MemoryBackendProperties',
+  'data': { } }
+


Acked-by: David Hildenbrand 

One comment: we should maybe just forbid setting share=off. It doesn't
make any sense and it can even result in an unexpected double memory
consumption. We missed doing that for memfd, unfortunately.


Good point!

IIUC the `share` property is defined by the parent `hostmem`, so I
should find a way to override the property here and disable the setter,
or add an option to `hostmem` to make the property non-writable.


Right, or simply fail later when you would find "share=off" in 
shm_backend_memory_alloc().


Whenever we support named shm_open(), it could make sense for VM 
snapshotting. Right now it doesn't really make any sense.


--
Cheers,

David / dhildenb




Re: [PATCH for-9.1 v3 00/11] vhost-user: support any POSIX system (tested on macOS, FreeBSD, OpenBSD)

2024-04-08 Thread Stefano Garzarella

FYI I'll be on PTO till May 2nd, I'll send the v4 when I'm back ASAP.

Thanks,
Stefano

On Thu, Apr 04, 2024 at 02:23:19PM +0200, Stefano Garzarella wrote:

v1: https://patchew.org/QEMU/20240228114759.44758-1-sgarz...@redhat.com/
v2: https://patchew.org/QEMU/20240326133936.125332-1-sgarz...@redhat.com/
v3:
 - rebased on v9.0.0-rc2
 - patch 4: avoiding setting fd non-blocking for messages where we
   have memory fd (Eric)
 - patch 9: enriched commit message and documentation to highlight that we
   want to mimic memfd (David)

The vhost-user protocol is not really Linux-specific, so let's try to support
QEMU's frontends and backends (including libvhost-user) in any POSIX system
with this series. The main use case is to be able to use virtio devices that
we don't have built-in in QEMU (e.g. virtiofsd, vhost-user-vsock, etc.) even
in non-Linux systems.

The first 5 patches are more like fixes discovered at runtime on macOS or
FreeBSD that could go even independently of this series.

Patches 6, 7, and 8 enable building of frontends and backends (including
libvhost-user) with associated code changes to succeed in compilation.

Patch 9 adds `memory-backend-shm` that uses the POSIX shm_open() API to
create shared memory which is identified by an fd that can be shared with
vhost-user backends. This is useful on those systems (like macOS) where
we don't have memfd_create() or special filesystems like "/dev/shm".

Patches 10 and 11 use `memory-backend-shm` in some vhost-user tests.

Maybe the first 5 patches can go separately, but I only discovered those
problems after testing patches 6 - 9, so I have included them in this series
for now. Please let me know if you prefer that I send them separately.

I tested this series using vhost-user-blk and QSD on macOS Sonoma 14.4
(aarch64), FreeBSD 14 (x86_64), OpenBSD 7.4 (x86_64), and Fedora 39 (x86_64)
in this way:

- Start vhost-user-blk or QSD (same commands for all systems)

 vhost-user-blk -s /tmp/vhost.socket \
   -b Fedora-Cloud-Base-39-1.5.x86_64.raw

 qemu-storage-daemon \
   --blockdev 
file,filename=Fedora-Cloud-Base-39-1.5.x86_64.qcow2,node-name=file \
   --blockdev qcow2,file=file,node-name=qcow2 \
   --export 
vhost-user-blk,addr.type=unix,addr.path=/tmp/vhost.socket,id=vub,num-queues=1,node-name=qcow2,writable=on

- macOS (aarch64): start QEMU (using hvf accelerator)

 qemu-system-aarch64 -smp 2 -cpu host -M virt,accel=hvf,memory-backend=mem \
   -drive 
file=./build/pc-bios/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on \
   -device virtio-net-device,netdev=net0 -netdev user,id=net0 \
   -device ramfb -device usb-ehci -device usb-kbd \
   -object memory-backend-shm,id=mem,size=512M \
   -device vhost-user-blk-pci,num-queues=1,disable-legacy=on,chardev=char0 \
   -chardev socket,id=char0,path=/tmp/vhost.socket

- FreeBSD/OpenBSD (x86_64): start QEMU (no accelerators available)

 qemu-system-x86_64 -smp 2 -M q35,memory-backend=mem \
   -object memory-backend-shm,id=mem,size="512M" \
   -device vhost-user-blk-pci,num-queues=1,chardev=char0 \
   -chardev socket,id=char0,path=/tmp/vhost.socket

- Fedora (x86_64): start QEMU (using kvm accelerator)

 qemu-system-x86_64 -smp 2 -M q35,accel=kvm,memory-backend=mem \
   -object memory-backend-shm,size="512M" \
   -device vhost-user-blk-pci,num-queues=1,chardev=char0 \
   -chardev socket,id=char0,path=/tmp/vhost.socket

Branch pushed (and CI started) at 
https://gitlab.com/sgarzarella/qemu/-/tree/macos-vhost-user?ref_type=heads

Thanks,
Stefano

Stefano Garzarella (11):
 libvhost-user: set msg.msg_control to NULL when it is empty
 libvhost-user: fail vu_message_write() if sendmsg() is failing
 libvhost-user: mask F_INFLIGHT_SHMFD if memfd is not supported
 vhost-user-server: do not set memory fd non-blocking
 contrib/vhost-user-blk: fix bind() using the right size of the address
 vhost-user: enable frontends on any POSIX system
 libvhost-user: enable it on any POSIX system
 contrib/vhost-user-blk: enable it on any POSIX system
 hostmem: add a new memory backend based on POSIX shm_open()
 tests/qtest/vhost-user-blk-test: use memory-backend-shm
 tests/qtest/vhost-user-test: add a test case for memory-backend-shm

docs/system/devices/vhost-user.rst|   5 +-
meson.build   |   5 +-
qapi/qom.json |  17 
subprojects/libvhost-user/libvhost-user.h |   2 +-
backends/hostmem-shm.c| 118 ++
contrib/vhost-user-blk/vhost-user-blk.c   |  23 -
hw/net/vhost_net.c|   5 +
subprojects/libvhost-user/libvhost-user.c |  76 +-
tests/qtest/vhost-user-blk-test.c |   2 +-
tests/qtest/vhost-user-test.c |  23 +
util/vhost-user-server.c  |  12 +++
backends/meson.build  |   1 +
hw/block/Kconfig  |   2 +-
qemu-options.hx   |  11 ++
util/meson.build  |   4 +-
15 files 

Re: [PATCH for-9.1 v3 09/11] hostmem: add a new memory backend based on POSIX shm_open()

2024-04-08 Thread Stefano Garzarella

On Thu, Apr 04, 2024 at 04:09:34PM +0200, David Hildenbrand wrote:

On 04.04.24 14:23, Stefano Garzarella wrote:

shm_open() creates and opens a new POSIX shared memory object.
A POSIX shared memory object allows creating a memory backend with an
associated file descriptor that can be shared with external processes
(e.g. vhost-user).

The new `memory-backend-shm` can be used as an alternative when
`memory-backend-memfd` is not available (Linux only), since shm_open()
should be provided by any POSIX-compliant operating system.

This backend mimics memfd, allocating memory that is practically
anonymous. In theory shm_open() requires a name, but this is allocated
for a short time interval and shm_unlink() is called right after
shm_open(). After that, only fd is shared with external processes
(e.g., vhost-user) as if it were associated with anonymous memory.

In the future we may also allow the user to specify the name to be
passed to shm_open(), but for now we keep the backend simple, mimicking
anonymous memory such as memfd.

Signed-off-by: Stefano Garzarella 
---
v3
- enriched commit message and documentation to highlight that we
  want to mimic memfd (David)
---
 docs/system/devices/vhost-user.rst |   5 +-
 qapi/qom.json  |  17 +
 backends/hostmem-shm.c | 118 +
 backends/meson.build   |   1 +
 qemu-options.hx|  11 +++
 5 files changed, 150 insertions(+), 2 deletions(-)
 create mode 100644 backends/hostmem-shm.c

diff --git a/docs/system/devices/vhost-user.rst 
b/docs/system/devices/vhost-user.rst
index 9b2da106ce..35259d8ec7 100644
--- a/docs/system/devices/vhost-user.rst
+++ b/docs/system/devices/vhost-user.rst
@@ -98,8 +98,9 @@ Shared memory object
 In order for the daemon to access the VirtIO queues to process the
 requests it needs access to the guest's address space. This is
-achieved via the ``memory-backend-file`` or ``memory-backend-memfd``
-objects. A reference to a file-descriptor which can access this object
+achieved via the ``memory-backend-file``, ``memory-backend-memfd``, or
+``memory-backend-shm`` objects.
+A reference to a file-descriptor which can access this object
 will be passed via the socket as part of the protocol negotiation.
 Currently the shared memory object needs to match the size of the main
diff --git a/qapi/qom.json b/qapi/qom.json
index 85e6b4f84a..5252ec69e3 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -721,6 +721,19 @@
 '*hugetlbsize': 'size',
 '*seal': 'bool' } }
+##
+# @MemoryBackendShmProperties:
+#
+# Properties for memory-backend-shm objects.
+#
+# The @share boolean option is true by default with shm.
+#
+# Since: 9.1
+##
+{ 'struct': 'MemoryBackendShmProperties',
+  'base': 'MemoryBackendProperties',
+  'data': { } }
+


Acked-by: David Hildenbrand 

One comment: we should maybe just forbid setting share=off. It doesn't 
make any sense and it can even result in an unexpected double memory 
consumption. We missed doing that for memfd, unfortunately.


Good point!

IIUC the `share` property is defined by the parent `hostmem`, so I 
should find a way to override the property here and disable the setter, 
or add an option to `hostmem` to make the property non-writable.


Thanks,
Stefano
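As a rough illustration, the anonymous shm_open()/shm_unlink() pattern the commit message describes — create a named object, unlink the name immediately, keep only the fd — can be sketched as below. The name format and error handling are illustrative assumptions, not the actual backend code:

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Create a POSIX shm object that is effectively anonymous: the name
 * only exists for the duration of this call, after which only the fd
 * remains (and can be shared with e.g. a vhost-user backend).
 */
static int create_anon_shm(size_t size)
{
    char name[64];
    int fd;

    /* Illustrative name; QEMU would pick its own short-lived name. */
    snprintf(name, sizeof(name), "/sketch-shm-%d", (int)getpid());

    fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd < 0) {
        return -1;
    }
    shm_unlink(name);            /* the fd stays valid; the name is gone */

    if (ftruncate(fd, size) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

On older glibc the program must be linked with -lrt for shm_open()/shm_unlink().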




Re: [PATCH-for-9.0] hw/sd/sdhci: Discard excess of data written to Buffer Data Port register

2024-04-08 Thread Mauro Matteo Cascella
On Thu, Apr 4, 2024 at 10:55 AM Philippe Mathieu-Daudé
 wrote:
>
> Per "SD Host Controller Standard Specification Version 3.00":
>
>   * 1.7 Buffer Control
>
>   - 1.7.1 Control of Buffer Pointer
>
> (3) Buffer Control with Block Size
>
> In case of write operation, the buffer accumulates the data
> written through the Buffer Data Port register. When the buffer
> pointer reaches the block size, Buffer Write Enable in the
> Present State register changes 1 to 0. It means no more data
> can be written to the buffer. Excess data of the last write is
> ignored. For example, if just lower 2 bytes data can be written
> to the buffer and a 32-bit (4-byte) block of data is written to
> the Buffer Data Port register, the lower 2 bytes of data is
> written to the buffer and the upper 2 bytes is ignored.
>
> Discard the excess of data to avoid overflow reported by fuzzer:
>
>   $ cat << EOF | qemu-system-i386 \
>  -display none -nodefaults \
>  -machine accel=qtest -m 512M \
>  -device sdhci-pci,sd-spec-version=3 \
>  -device sd-card,drive=mydrive \
>  -drive 
> if=none,index=0,file=null-co://,format=raw,id=mydrive -nographic \
>  -qtest stdio
>   outl 0xcf8 0x80001013
>   outl 0xcfc 0x91
>   outl 0xcf8 0x80001001
>   outl 0xcfc 0x0600
>   write 0x912c 0x1 0x05
>   write 0x9158 0x1 0x16
>   write 0x9105 0x1 0x04
>   write 0x9128 0x1 0x08
>   write 0x16 0x1 0x21
>   write 0x19 0x1 0x20
>   write 0x910c 0x1 0x01
>   write 0x910e 0x1 0x20
>   write 0x910f 0x1 0x00
>   write 0x910c 0x1 0x00
>   write 0x9120 0x1 0x00
>   EOF
>
> Stack trace (part):
> =================================================================
> ==89993==ERROR: AddressSanitizer: heap-buffer-overflow on address
> 0x61529900 at pc 0x55d5f885700d bp 0x7ffc1e1e9470 sp 0x7ffc1e1e9468
> WRITE of size 1 at 0x61529900 thread T0
> #0 0x55d5f885700c in sdhci_write_dataport hw/sd/sdhci.c:564:39
> #1 0x55d5f8849150 in sdhci_write hw/sd/sdhci.c:1223:13
> #2 0x55d5fa01db63 in memory_region_write_accessor system/memory.c:497:5
> #3 0x55d5fa01d245 in access_with_adjusted_size system/memory.c:573:18
> #4 0x55d5fa01b1a9 in memory_region_dispatch_write system/memory.c:1521:16
> #5 0x55d5fa09f5c9 in flatview_write_continue system/physmem.c:2711:23
> #6 0x55d5fa08f78b in flatview_write system/physmem.c:2753:12
> #7 0x55d5fa08f258 in address_space_write system/physmem.c:2860:18
> ...
> 0x61529900 is located 0 bytes to the right of 512-byte region
> [0x61529700,0x61529900) allocated by thread T0 here:
> #0 0x55d5f7237b27 in __interceptor_calloc
> #1 0x7f9e36dd4c50 in g_malloc0
> #2 0x55d5f88672f7 in sdhci_pci_realize hw/sd/sdhci-pci.c:36:5
> #3 0x55d5f844b582 in pci_qdev_realize hw/pci/pci.c:2092:9
> #4 0x55d5fa2ee74b in device_set_realized hw/core/qdev.c:510:13
> #5 0x55d5fa325bfb in property_set_bool qom/object.c:2358:5
> #6 0x55d5fa31ea45 in object_property_set qom/object.c:1472:5
> #7 0x55d5fa332509 in object_property_set_qobject om/qom-qobject.c:28:10
> #8 0x55d5fa31f6ed in object_property_set_bool qom/object.c:1541:15
> #9 0x55d5fa2e2948 in qdev_realize hw/core/qdev.c:292:12
> #10 0x55d5f8eed3f1 in qdev_device_add_from_qdict 
> system/qdev-monitor.c:719:10
> #11 0x55d5f8eef7ff in qdev_device_add system/qdev-monitor.c:738:11
> #12 0x55d5f8f211f0 in device_init_func system/vl.c:1200:11
> #13 0x55d5fad0877d in qemu_opts_foreach util/qemu-option.c:1135:14
> #14 0x55d5f8f0df9c in qemu_create_cli_devices system/vl.c:2638:5
> #15 0x55d5f8f0db24 in qmp_x_exit_preconfig system/vl.c:2706:5
> #16 0x55d5f8f14dc0 in qemu_init system/vl.c:3737:9
> ...
> SUMMARY: AddressSanitizer: heap-buffer-overflow hw/sd/sdhci.c:564:39
> in sdhci_write_dataport
>
> Cc: qemu-sta...@nongnu.org
> Fixes: d7dfca0807 ("hw/sdhci: introduce standard SD host controller")
> Buglink: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=58813
> Reported-by: Alexander Bulekov 
> Reported-by: Chuhong Yuan 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  hw/sd/sdhci.c | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/hw/sd/sdhci.c b/hw/sd/sdhci.c
> index c5e0bc018b..2dd88fa139 100644
> --- a/hw/sd/sdhci.c
> +++ b/hw/sd/sdhci.c
> @@ -552,7 +552,7 @@ static void sdhci_write_block_to_card(SDHCIState *s)
>   * register */
>  static void sdhci_write_dataport(SDHCIState *s, uint32_t value, unsigned 
> size)
>  {
> -unsigned i;
> +unsigned i, available;
>
>  /* Check that there is free space left in a buffer */
>  if (!(s->prnsts & SDHC_SPACE_AVAILABLE)) {
> @@ -560,6 +560,14 @@ static void sdhci_write_dataport(SDHCIState *s, uint32_t 
> value, unsigned size)
>  return;
>  }
>
> +available = s->buf_maxsz - s->data_count;
> +
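The clamping rule the (truncated) hunk above introduces — "excess data of the last write is ignored" — can be sketched as a pure function. The names below are simplified stand-ins, not the actual SDHCIState fields:

```c
/*
 * Given the buffer capacity, the current fill level, and the size of an
 * incoming Buffer Data Port write, return how many bytes may actually be
 * stored. Assumes data_count <= buf_maxsz, as the device model maintains.
 */
static unsigned clamp_dataport_write(unsigned buf_maxsz,
                                     unsigned data_count,
                                     unsigned size)
{
    unsigned available = buf_maxsz - data_count;

    return size > available ? available : size;
}
```

In the patch, the bytes beyond `available` are discarded with a LOG_GUEST_ERROR message rather than written past the end of the heap buffer.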

Re: [PATCH for-9.1 v3 08/11] contrib/vhost-user-blk: enable it on any POSIX system

2024-04-08 Thread Stefano Garzarella

On Thu, Apr 04, 2024 at 04:00:38PM +0200, Philippe Mathieu-Daudé wrote:

Hi Stefano,


Hi Phil!



On 4/4/24 14:23, Stefano Garzarella wrote:

Let's make the code more portable by using the "qemu/bswap.h" API
and adding defines from block/file-posix.c to support O_DIRECT in
other systems (e.g. macOS).

vhost-user-server.c is a dependency, let's enable it for any POSIX
system.

Signed-off-by: Stefano Garzarella 
---
 meson.build |  2 --
 contrib/vhost-user-blk/vhost-user-blk.c | 19 +--
 util/meson.build|  4 +++-
 3 files changed, 20 insertions(+), 5 deletions(-)




diff --git a/contrib/vhost-user-blk/vhost-user-blk.c 
b/contrib/vhost-user-blk/vhost-user-blk.c
index a8ab9269a2..462e584857 100644
--- a/contrib/vhost-user-blk/vhost-user-blk.c
+++ b/contrib/vhost-user-blk/vhost-user-blk.c
@@ -16,6 +16,7 @@
  */
 #include "qemu/osdep.h"
+#include "qemu/bswap.h"
 #include "standard-headers/linux/virtio_blk.h"
 #include "libvhost-user-glib.h"




@@ -267,13 +282,13 @@ static int vub_virtio_process_req(VubDev *vdev_blk,
 req->in = (struct virtio_blk_inhdr *)elem->in_sg[in_num - 1].iov_base;
 in_num--;
-type = le32toh(req->out->type);
+type = le32_to_cpu(req->out->type);
 switch (type & ~VIRTIO_BLK_T_BARRIER) {
 case VIRTIO_BLK_T_IN:
 case VIRTIO_BLK_T_OUT: {
 ssize_t ret = 0;
 bool is_write = type & VIRTIO_BLK_T_OUT;
-req->sector_num = le64toh(req->out->sector);
+req->sector_num = le64_to_cpu(req->out->sector);
 if (is_write) {
 ret  = vub_writev(req, >out_sg[1], out_num);
 } else {

Can we switch to the bswap API in a preliminary patch,


Sure, I tried to minimize the patches because it's already big,
but I can split this.


converting all the source files?



What do you mean with "all the source files"?

"le64toh" is used here and in some subprojects (e.g. libvduse,
libvhost-user), where IIUC we can't use QEMU's bswap.h because we
don't want to put a dependency with the QEMU code.

BTW I'll check for other *toh() usage in QEMU code and change in the
preliminary patch you suggested to add.

Thanks for the review,
Stefano
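For illustration, what le64toh()/le32_to_cpu() accomplish can be written portably without platform-specific endian.h helpers: assemble the value byte by byte, so the result is identical on big- and little-endian hosts. This is a sketch of the idea, not QEMU's actual "qemu/bswap.h" implementation:

```c
#include <stdint.h>

/* Read a 32-bit little-endian value from a byte buffer, host-independent. */
static uint32_t le32_read(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

Subprojects such as libvhost-user, which cannot depend on QEMU headers, need either this byte-assembly style or their own endian shims instead of the glibc-specific *toh() family.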