Re: [RFC 0/3] Add Generic SPI GPIO model

2022-07-28 Thread Peter Delevoryas
On Thu, Jul 28, 2022 at 04:23:19PM -0700, Iris Chen wrote:
> Hey everyone,
> 
> I have been working on a project to add support for SPI-based TPMs in QEMU.
> Currently, most of our vboot platforms using a SPI-based TPM use the Linux 
> SPI-GPIO driver to "bit-bang" the SPI protocol. This is because the Aspeed 
> SPI controller (modeled in QEMU under hw/ssi/aspeed_smc.c) has an 
> implementation 
> deficiency that prevents bi-directional operations. Thus, in order to connect 
> a TPM to this bus, my patch implements a QEMU SPI-GPIO driver (as the QEMU
> counterpart of the Linux SPI-GPIO driver).
> 
> As we use SPI-based TPMs on many of our BMCs for the secure-boot 
> implementation,  
> I have already tested this implementation locally with our Yosemite-v3.5 
> platform 
> and Facebook-OpenBMC. This model was tested by connecting a generic SPI-NOR 
> (m25p80 
> for example) to the Yosemite-v3.5 SPI bus containing the TPM.
> 
> This patch is an RFC because I have several questions about design. Although 
> the
> model is working, I understand there are many areas where the design decision
> is not deal (ie. abstracting hard coded GPIO values). Below are some details 
> of the 
> patch and specific areas where I would appreciate feedback on how to make 
> this better:
>  
> hw/arm/aspeed.c: 
> I am not sure where the best place is to instantiate the spi_gpio besides the
> aspeed_machine_init. Could we add the ability to instantiate it on the 
> command line?

Yes, hypothetically I think it would be something like this:

-device spi-gpio,miso=aspeed.gpio.gpioX4,mosi=aspeed.gpio.gpioX5,id=foo
-device n25q00,bus=foo,drive=bar
-drive file=bar.mtd,format=raw,if=mtd,id=bar

Something like that? I haven't really looked into the details. I think
it requires work in several other places though.

> 
> m25p80_transfer8_ex in hw/block/m25p80.c: 
> Existing SPI model assumed that the output byte is fully formed, can be 
> passed to 
> the SPI device, and the input byte can be returned, all in one operation. 
> With 
> SPI-GPIO we don't have the input byte until all 8 bits of the output have 
> been 
> shifted out, so we have to prime the MISO with bogus values (0xFF).

Perhaps the Aspeed FMC model needs to support asynchronous transfers?
(Is the M25P80 model not asynchronous already?) I'm still skeptical that
the m25p80_transfer8_ex solution is appropriate.

> 
> MOSI pin in spi_gpio: the mosi pin is not included and we poll the realtime 
> value
> of the gpio for input bits to prevent bugs with caching the mosi value. It 
> was discovered 
> during testing that when using the mosi pin as the input pin, the mosi value 
> was not 
> being updated due to a kernel and aspeed_gpio model optimization. Thus, here 
> we are 
> reading the value directly from the gpio controller instead of waiting for 
> the push.
> 
> I realize there are Aspeed specifics in the spi_gpio model. To make this 
> extensible, 
> is it preferred to make this into a base class and have our Aspeed SPI GPIO 
> extend 
> this or we could set up params to pass in the constructor?

Actually, I would hope that there won't be any inheritance here. The
kernel driver doesn't have some kind of inheritance implementation, for
example.

> 
> Thanks for your review and any direction here would be helpful :) 
> 
> Iris Chen (3):
>   hw: m25p80: add prereading ability in transfer8
>   hw: spi_gpio: add spi gpio model
>   hw: aspeed: hook up the spi gpio model
> 
>  hw/arm/Kconfig|   1 +
>  hw/arm/aspeed.c   |   5 ++
>  hw/block/m25p80.c |  29 ++-
>  hw/ssi/Kconfig|   4 +
>  hw/ssi/meson.build|   1 +
>  hw/ssi/spi_gpio.c | 166 ++
>  hw/ssi/ssi.c  |   4 -
>  include/hw/ssi/spi_gpio.h |  38 +
>  include/hw/ssi/ssi.h  |   5 ++
>  9 files changed, 248 insertions(+), 5 deletions(-)
>  create mode 100644 hw/ssi/spi_gpio.c
>  create mode 100644 include/hw/ssi/spi_gpio.h
> 
> -- 
> 2.30.2
> 



[RFC 2/3] hw: spi_gpio: add spi gpio model

2022-07-28 Thread Iris Chen
Signed-off-by: Iris Chen 
---
 hw/ssi/spi_gpio.c | 166 ++
 include/hw/ssi/spi_gpio.h |  38 +
 2 files changed, 204 insertions(+)
 create mode 100644 hw/ssi/spi_gpio.c
 create mode 100644 include/hw/ssi/spi_gpio.h

diff --git a/hw/ssi/spi_gpio.c b/hw/ssi/spi_gpio.c
new file mode 100644
index 00..1e99c55933
--- /dev/null
+++ b/hw/ssi/spi_gpio.c
@@ -0,0 +1,166 @@
+/*
+ * Copyright (c) Meta Platforms, Inc. and affiliates. (http://www.meta.com)
+ *
+ * This code is licensed under the GPL version 2 or later. See the COPYING
+ * file in the top-level directory.
+ */
+
+#include "hw/ssi/spi_gpio.h"
+#include "hw/irq.h"
+
+#define SPI_CPHA BIT(0) /* clock phase (1 = SPI_CLOCK_PHASE_SECOND) */
+#define SPI_CPOL BIT(1) /* clock polarity (1 = SPI_POLARITY_HIGH) */
+
+static void do_leading_edge(SpiGpioState *s)
+{
+if (!s->CPHA) {
+s->input_bits |= object_property_get_bool(OBJECT(s->controller_state),
+  "gpioX4", NULL);
+/*
+ * According to SPI protocol:
+ * CPHA=0 leading half clock cycle is sampling phase
+ * We technically should not drive out miso
+ * However, when the kernel bitbang driver is setting the clk pin,
+ * it will overwrite the miso value, so we are driving out miso in
+ * the sampling half clock cycle as well to workaround this issue
+ */
+s->miso = !!(s->output_bits & 0x80);
+object_property_set_bool(OBJECT(s->controller_state), "gpioX5", 
s->miso,
+ NULL);
+}
+}
+
+static void do_trailing_edge(SpiGpioState *s)
+{
+if (s->CPHA) {
+s->input_bits |= object_property_get_bool(OBJECT(s->controller_state),
+  "gpioX4", NULL);
+/*
+ * According to SPI protocol:
+ * CPHA=1 trailing half clock cycle is sampling phase
+ * We technically should not drive out miso
+ * However, when the kernel bitbang driver is setting the clk pin,
+ * it will overwrite the miso value, so we are driving out miso in
+ * the sampling half clock cycle as well to workaround this issue
+ */
+s->miso = !!(s->output_bits & 0x80);
+object_property_set_bool(OBJECT(s->controller_state), "gpioX5", 
s->miso,
+ NULL);
+}
+}
+
+static void cs_set_level(void *opaque, int n, int level)
+{
+SpiGpioState *s = SPI_GPIO(opaque);
+s->cs = !!level;
+
+/* relay the CS value to the CS output pin */
+qemu_set_irq(s->cs_output_pin, s->cs);
+
+s->miso = !!(s->output_bits & 0x80);
+object_property_set_bool(OBJECT(s->controller_state),
+ "gpioX5", s->miso, NULL);
+
+s->clk = !!(s->mode & SPI_CPOL);
+}
+
+static void clk_set_level(void *opaque, int n, int level)
+{
+SpiGpioState *s = SPI_GPIO(opaque);
+
+bool cur = !!level;
+
+/* CS# is high/not selected, do nothing */
+if (s->cs) {
+return;
+}
+
+/* When the lock has not changed, do nothing */
+if (s->clk == cur) {
+return;
+}
+
+s->clk = cur;
+
+/* Leading edge */
+if (s->clk != s->CIDLE) {
+do_leading_edge(s);
+}
+
+/* Trailing edge */
+if (s->clk == s->CIDLE) {
+do_trailing_edge(s);
+s->clk_counter++;
+
+/*
+ * Deliver the input to and
+ * get the next output byte
+ * from the SPI device
+ */
+if (s->clk_counter == 8) {
+s->output_bits = ssi_transfer(s->spi, s->input_bits);
+s->clk_counter = 0;
+s->input_bits = 0;
+ } else {
+s->input_bits <<= 1;
+s->output_bits <<= 1;
+ }
+}
+}
+
+static void spi_gpio_realize(DeviceState *dev, Error **errp)
+{
+SpiGpioState *s = SPI_GPIO(dev);
+
+s->spi = ssi_create_bus(dev, "spi");
+s->spi->preread = true;
+
+s->mode = 0;
+s->clk_counter = 0;
+
+s->cs = true;
+s->clk = true;
+
+/* Assuming the first output byte is 0 */
+s->output_bits = 0;
+s->CIDLE = !!(s->mode & SPI_CPOL);
+s->CPHA = !!(s->mode & SPI_CPHA);
+
+/* init the input GPIO lines */
+/* SPI_CS_in connects to the Aspeed GPIO */
+qdev_init_gpio_in_named(dev, cs_set_level, "SPI_CS_in", 1);
+qdev_init_gpio_in_named(dev, clk_set_level, "SPI_CLK", 1);
+
+/* init the output GPIO lines */
+/* SPI_CS_out connects to the SSI_GPIO_CS */
+qdev_init_gpio_out_named(dev, >cs_output_pin, "SPI_CS_out", 1);
+
+qdev_connect_gpio_out_named(s->controller_state, "sysbus-irq",
+AST_GPIO_IRQ_X0_NUM, qdev_get_gpio_in_named(
+DEVICE(s), "SPI_CS_in", 0));
+qdev_connect_gpio_out_named(s->controller_state, "sysbus-irq",
+AST_GPIO_IRQ_X3_NUM, qdev_get_gpio_in_named(
+   

[RFC 1/3] hw: m25p80: add prereading ability in transfer8

2022-07-28 Thread Iris Chen
With SPI-GPIO we don't have the input bits until
all 8 bits of the output have been shifted out,
so we have to prime the MISO with bogus values (0xFF).

Signed-off-by: Iris Chen 
---
 hw/block/m25p80.c| 29 -
 hw/ssi/ssi.c |  4 
 include/hw/ssi/ssi.h |  5 +
 3 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c
index a8d2519141..9b26bb96e5 100644
--- a/hw/block/m25p80.c
+++ b/hw/block/m25p80.c
@@ -1462,7 +1462,7 @@ static int m25p80_cs(SSIPeripheral *ss, bool select)
 return 0;
 }
 
-static uint32_t m25p80_transfer8(SSIPeripheral *ss, uint32_t tx)
+static uint32_t m25p80_transfer8_ex(SSIPeripheral *ss, uint32_t tx)
 {
 Flash *s = M25P80(ss);
 uint32_t r = 0;
@@ -1548,6 +1548,33 @@ static uint32_t m25p80_transfer8(SSIPeripheral *ss, 
uint32_t tx)
 return r;
 }
 
+/*
+ * Pre-reading logic for m25p80_transfer8:
+ * The existing SPI model assumes the output byte is fully formed,
+ * can be passed to the SPI device, and the input byte can be returned,
+ * all in one operation. With SPI-GPIO, we don't have the input byte
+ * until all 8 bits of the output have been shifted out, so we have
+ * to prime the MISO with bogus values (0xFF).
+ */
+static uint32_t m25p80_transfer8(SSIPeripheral *ss, uint32_t tx)
+{
+Flash *s = M25P80(ss);
+SSIBus *ssibus = (SSIBus *)qdev_get_parent_bus(DEVICE(s));
+
+uint8_t prev_state = s->state;
+uint32_t r = m25p80_transfer8_ex(ss, tx);
+uint8_t curr_state = s->state;
+
+if (ssibus->preread &&
+   /* cmd state has changed into DATA reading state */
+   (!(prev_state == STATE_READ || prev_state == STATE_READING_DATA) &&
+   (curr_state == STATE_READ || curr_state == STATE_READING_DATA))) {
+r = m25p80_transfer8_ex(ss, 0xFF);
+}
+
+return r;
+}
+
 static void m25p80_write_protect_pin_irq_handler(void *opaque, int n, int 
level)
 {
 Flash *s = M25P80(opaque);
diff --git a/hw/ssi/ssi.c b/hw/ssi/ssi.c
index 003931fb50..21570fe25f 100644
--- a/hw/ssi/ssi.c
+++ b/hw/ssi/ssi.c
@@ -19,10 +19,6 @@
 #include "qapi/error.h"
 #include "qom/object.h"
 
-struct SSIBus {
-BusState parent_obj;
-};
-
 #define TYPE_SSI_BUS "SSI"
 OBJECT_DECLARE_SIMPLE_TYPE(SSIBus, SSI_BUS)
 
diff --git a/include/hw/ssi/ssi.h b/include/hw/ssi/ssi.h
index f411858ab0..e54073d822 100644
--- a/include/hw/ssi/ssi.h
+++ b/include/hw/ssi/ssi.h
@@ -30,6 +30,11 @@ enum SSICSMode {
 SSI_CS_HIGH,
 };
 
+struct SSIBus {
+BusState parent_obj;
+bool preread;
+};
+
 /* Peripherals.  */
 struct SSIPeripheralClass {
 DeviceClass parent_class;
-- 
2.30.2




[RFC 0/3] Add Generic SPI GPIO model

2022-07-28 Thread Iris Chen
Hey everyone,

I have been working on a project to add support for SPI-based TPMs in QEMU.
Currently, most of our vboot platforms using a SPI-based TPM use the Linux 
SPI-GPIO driver to "bit-bang" the SPI protocol. This is because the Aspeed 
SPI controller (modeled in QEMU under hw/ssi/aspeed_smc.c) has an 
implementation 
deficiency that prevents bi-directional operations. Thus, in order to connect 
a TPM to this bus, my patch implements a QEMU SPI-GPIO driver (as the QEMU
counterpart of the Linux SPI-GPIO driver).

As we use SPI-based TPMs on many of our BMCs for the secure-boot 
implementation,  
I have already tested this implementation locally with our Yosemite-v3.5 
platform 
and Facebook-OpenBMC. This model was tested by connecting a generic SPI-NOR 
(m25p80 
for example) to the Yosemite-v3.5 SPI bus containing the TPM.

This patch is an RFC because I have several questions about design. Although the
model is working, I understand there are many areas where the design decision
is not deal (ie. abstracting hard coded GPIO values). Below are some details of 
the 
patch and specific areas where I would appreciate feedback on how to make this 
better:
 
hw/arm/aspeed.c: 
I am not sure where the best place is to instantiate the spi_gpio besides the
aspeed_machine_init. Could we add the ability to instantiate it on the command 
line?

m25p80_transfer8_ex in hw/block/m25p80.c: 
Existing SPI model assumed that the output byte is fully formed, can be passed 
to 
the SPI device, and the input byte can be returned, all in one operation. With 
SPI-GPIO we don't have the input byte until all 8 bits of the output have been 
shifted out, so we have to prime the MISO with bogus values (0xFF).

MOSI pin in spi_gpio: the mosi pin is not included and we poll the realtime 
value
of the gpio for input bits to prevent bugs with caching the mosi value. It was 
discovered 
during testing that when using the mosi pin as the input pin, the mosi value 
was not 
being updated due to a kernel and aspeed_gpio model optimization. Thus, here we 
are 
reading the value directly from the gpio controller instead of waiting for the 
push.

I realize there are Aspeed specifics in the spi_gpio model. To make this 
extensible, 
is it preferred to make this into a base class and have our Aspeed SPI GPIO 
extend 
this or we could set up params to pass in the constructor?

Thanks for your review and any direction here would be helpful :) 

Iris Chen (3):
  hw: m25p80: add prereading ability in transfer8
  hw: spi_gpio: add spi gpio model
  hw: aspeed: hook up the spi gpio model

 hw/arm/Kconfig|   1 +
 hw/arm/aspeed.c   |   5 ++
 hw/block/m25p80.c |  29 ++-
 hw/ssi/Kconfig|   4 +
 hw/ssi/meson.build|   1 +
 hw/ssi/spi_gpio.c | 166 ++
 hw/ssi/ssi.c  |   4 -
 include/hw/ssi/spi_gpio.h |  38 +
 include/hw/ssi/ssi.h  |   5 ++
 9 files changed, 248 insertions(+), 5 deletions(-)
 create mode 100644 hw/ssi/spi_gpio.c
 create mode 100644 include/hw/ssi/spi_gpio.h

-- 
2.30.2




Re: [PATCH 1/1] block: add missed block_acct_setup with new block device init procedure

2022-07-28 Thread Denis V. Lunev

On 28.07.2022 16:42, Vladimir Sementsov-Ogievskiy wrote:

On 7/11/22 14:07, Denis V. Lunev wrote:

Commit 5f76a7aac156ca75680dad5df4a385fd0b58f6b1 is looking harmless from
the first glance, but it has changed things a lot. 'libvirt' uses it to
detect that it should follow new initialization way and this changes
things considerably. With this procedure followed, blockdev_init() is
not called anymore and thus block_acct_setup() helper is not called.


I'm not sure that 5f76a7aac156ca is really the corner stone.. But yes,
libvirt moved to "blockdev era", which means that we don't use old 
-drive,
instead block nodes are created by -blockdev / qmp: blockdev-add, and 
attached

to block devices by node-name.


git bisected, thus I am sure here


And if I understand correctly blockdev_init() is called only on -drive 
path.


I have some questions:

1. After this patch, don't we call block_acct_setup() twice on old 
path with -drive? That seems safe as block_acct_setup just assign 
fields of BlockAcctStats.. But that's doesn't look good.



hmmm

2. Do we really need these options? Could we instead just enable 
accounting invalid and failed ops unconditionally? I doubt that 
someone will learn that these new options appeared and will use them 
to disable the failed/invalid accounting again.



I can move assignment of these fields to true int
block_acct_init() and forget about "configurable"
items in new path. I do not think that somebody
ever has these options set.

The real question in this patch is that this initialization
was a precondition for old good "long IO" report
configuration, which should be "enableable".

But  we could move this option to "tracked request"
layer only and this will solve my puzzle. So, I'll move
"long IO report" to tracked request level only and will
create an option for it on bdrv_ level and will avoid
it on blk_ accounting.

What do you think?

Den







This means in particular that defaults for block accounting statistics
are changed and account_invalid/account_failed are actually initialized
as false instead of true originally.

This commit changes things to match original world. It adds
account_invalid/account_failed properties to BlockConf and setups them
accordingly.

Signed-off-by: Denis V. Lunev 
CC: Peter Krempa 
CC: Markus Armbruster 
CC: John Snow 
CC: Kevin Wolf 
CC: Hanna Reitz 
---
  hw/block/block.c   |  2 +
  include/hw/block/block.h   |  7 +++-
  tests/qemu-iotests/172.out | 76 ++
  3 files changed, 84 insertions(+), 1 deletion(-)

diff --git a/hw/block/block.c b/hw/block/block.c
index 25f45df723..53b100cdc3 100644
--- a/hw/block/block.c
+++ b/hw/block/block.c
@@ -205,6 +205,8 @@ bool blkconf_apply_backend_options(BlockConf 
*conf, bool readonly,

 BlockdevOnError rerror;

  blk_set_enable_write_cache(blk, wce);
  blk_set_on_error(blk, rerror, werror);
  +    block_acct_setup(blk_get_stats(blk), conf->account_invalid,
+ conf->account_failed);
  return true;
  }
  diff --git a/include/hw/block/block.h b/include/hw/block/block.h
index 5902c0440a..ffd439fc83 100644
--- a/include/hw/block/block.h
+++ b/include/hw/block/block.h
@@ -31,6 +31,7 @@ typedef struct BlockConf {
  uint32_t lcyls, lheads, lsecs;
  OnOffAuto wce;
  bool share_rw;
+    bool account_invalid, account_failed;
  BlockdevOnError rerror;
  BlockdevOnError werror;
  } BlockConf;
@@ -61,7 +62,11 @@ static inline unsigned int 
get_physical_block_exp(BlockConf *conf)
 _conf.discard_granularity, 
-1),  \
  DEFINE_PROP_ON_OFF_AUTO("write-cache", _state, 
_conf.wce,   \

ON_OFF_AUTO_AUTO),  \
-    DEFINE_PROP_BOOL("share-rw", _state, _conf.share_rw, false)
+    DEFINE_PROP_BOOL("share-rw", _state, _conf.share_rw, 
false),    \
+    DEFINE_PROP_BOOL("account-invalid", 
_state, \
+ _conf.account_invalid, 
true),  \
+    DEFINE_PROP_BOOL("account-failed", 
_state,  \

+ _conf.account_failed, true)
    #define DEFINE_BLOCK_PROPERTIES(_state, 
_conf)  \
  DEFINE_PROP_DRIVE("drive", _state, 
_conf.blk),  \

diff --git a/tests/qemu-iotests/172.out b/tests/qemu-iotests/172.out
index 9479b92185..a6c451e098 100644
--- a/tests/qemu-iotests/172.out
+++ b/tests/qemu-iotests/172.out
@@ -28,6 +28,8 @@ Formatting 'TEST_DIR/t.IMGFMT.3', fmt=IMGFMT 
size=737280

  discard_granularity = 4294967295 (4 GiB)
  write-cache = "auto"
  share-rw = false
+    account-invalid = true
+    account-failed = true
  drive-type = "288"
    @@ -55,6 +57,8 @@ Testing: -fda TEST_DIR/t.qcow2
  discard_granularity = 4294967295 (4 GiB)
  write-cache = "auto"
  share-rw = false
+ 

Re: [PATCH v5 5/8] blockdev: Add a new IF type IF_OTHER

2022-07-28 Thread Kevin Wolf
Am 28.07.2022 um 16:50 hat Markus Armbruster geschrieben:
> Kevin Wolf  writes:
> 
> > Am 28.07.2022 um 11:46 hat Peter Maydell geschrieben:
> >> On Wed, 27 Jul 2022 at 20:03, Kevin Wolf  wrote:
> >> >
> >> > Am 18.07.2022 um 11:49 hat Markus Armbruster geschrieben:
> >> > > An OTP device isn't really a parallel flash, and neither are eFuses.
> >> > > More fast-and-lose use of IF_PFLASH may exist in the tree, and maybe of
> >> > > other interface types, too.
> >> > >
> >> > > This patch introduces IF_OTHER.  The patch after next uses it for an
> >> > > EEPROM device.
> >> > >
> >> > > Do we want IF_OTHER?
> >> >
> >> > What would the semantics even be? Any block device that doesn't pick up
> >> > a different category may pick up IF_OTHER backends?
> >> >
> >> > It certainly feels like a strange interface to ask for "other" disk and
> >> > then getting as surprise what this other thing might be. It's
> >> > essentially the same as having an explicit '-device other', and I
> >> > suppose most people would find that strange.
> >> >
> >> > > If no, I guess we get to abuse IF_PFLASH some more.
> >> > >
> >> > > If yes, I guess we should use IF_PFLASH only for actual parallel flash
> >> > > memory going forward.  Cleaning up existing abuse of IF_PFLASH may not
> >> > > be worth the trouble, though.
> >> > >
> >> > > Thoughts?
> >> >
> >> > If the existing types aren't good enough (I don't have an opinion on
> >> > whether IF_PFLASH is a good match), let's add a new one. But a specific
> >> > new one, not just "other".
> >> 
> >> I think the common thread is "this isn't what anybody actually thinks
> >> of as being a 'disk', but we would like to back it with a block device
> >> anyway". That can cover a fair range of possibilities...
> >
> > How confident are we that no board will ever have two devices of this
> > kind?
> >
> > As long as every board has at most one, if=other is a bad user interface
> > in terms of descriptiveness, but still more or less workable as long as
> > you know what it means for the specific board you use.
> >
> > But if you have more than one device, it becomes hard to predict which
> > device gets which backend - it depends on the initialisation order in
> > the code then,
> 
> Really?  Board code should use IF_OTHER devices just like it uses the
> other interface types, namely connecting each frontend device to a
> backend device with a well-known and fixed interface type and index (or
> bus and unit instead, where appropriate).
> 
> >and I'm pretty sure that this isn't something that should
> > have significance in external interfaces and therefore become a stable
> > API.
> 
> I agree that "implied by execution order" is a bad idea: commit
> 95fd260f0a "blockdev: Drop unused drive_get_next()".

Ah good, I was indeed thinking of something drive_get_next()-like.

In case the board works with explicit indices, the situation is not as
bad as I was afraid. It will certainly be workable (even if not obvious)
for any boards that have a fixed number of devices with block backends,
which should cover everything we're intending to cover here.

I still consider if=other a bad user interface because what it means is
completely opaque, but if that's okay for you in your board, who am I to
object.

(Of course, the real solution would be having a generic way to set qdev
properties for on-board devices. I'm not expecting that we're getting
this anytime soon, though.)

Kevin




Re: [RFC patch 0/1] block: vhost-blk backend

2022-07-28 Thread Stefano Garzarella
On Thu, Jul 28, 2022 at 7:28 AM Andrey Zhadchenko 
 wrote:
> On 7/27/22 16:06, Stefano Garzarella wrote:
> > On Tue, Jul 26, 2022 at 04:15:48PM +0200, Denis V. Lunev wrote:
> >> On 26.07.2022 15:51, Michael S. Tsirkin wrote:
> >>> On Mon, Jul 25, 2022 at 11:55:26PM +0300, Andrey Zhadchenko wrote:
>  Although QEMU virtio-blk is quite fast, there is still some room for
>  improvements. Disk latency can be reduced if we handle virito-blk
>  requests
>  in host kernel so we avoid a lot of syscalls and context switches.
> 
>  The biggest disadvantage of this vhost-blk flavor is raw format.
>  Luckily Kirill Thai proposed device mapper driver for QCOW2 format
>  to attach
>  files as block devices:
>  https://www.spinics.net/lists/kernel/msg4292965.html
> >>> That one seems stalled. Do you plan to work on that too?
> >> We have too. The difference in numbers, as you seen below is quite too
> >> much. We have waited for this patch to be sent to keep pushing.
> >>
> >> It should be noted that may be talk on OSS this year could also push a
> >> bit.
> >
> > Cool, the results are similar of what I saw when I compared vhost-blk
> > and io_uring passthrough with NVMe (Slide 7 here: [1]).
> >
> > About QEMU block layer support, we recently started to work on libblkio
> > [2]. Stefan also sent an RFC [3] to implement the QEMU BlockDriver.
> > Currently it supports virtio-blk devices using vhost-vdpa and vhost-user.
> > We could add support for vhost (kernel) as well, though, we were
> > thinking of leveraging vDPA to implement in-kernel software device as well.
> >
> > That way we could reuse a lot of the code to support both hardware and
> > software accelerators.
> >
> > In the talk [1] I describe the idea a little bit, and a few months ago I
> > did a PoC (unsubmitted RFC) to see if it was feasible and the numbers
> > were in line with vhost-blk.
> >
> > Do you think we could join forces and just have an in-kernel vdpa-blk
> > software device?
>
> This seems worth trying. Why double the efforts to do the same. Yet I
> would like to play a bit with your vdpa-blk PoC beforehand.

Great :-)

> Can you send it to me with some instructions how to run it?

Yep, sure!

The PoC is available here: 
https://gitlab.com/sgarzarella/linux/-/tree/vdpa-sw-blk-poc

The tree was based on Linux v5.16, but I had some issues to rebuild with 
new gcc, so I rebased on v5.16.20 (not tested), configs needed:
CONFIG_VDPA_SW_BLOCK=m + CONFIG_VHOST_VDPA=m + dependencies.

It contains:
  - patches required for QEMU generic vhost-vdpa support
  - patches to support blk_mq_ops->poll() (to use io_uring iopoll) in
the guest virtio-blk driver (I used the same kernel on guest and
host)
  - some improvements for vringh (not completed, it could be a
bottleneck)
  - vdpa-sw and vdpa-sw-blk patches (and hacks)

It is based on the vDPA simulator framework already merged upstream. The 
idea is to generalize the simulator to share the code between both 
software devices and simulators. The code needs a lot of work, I was 
focusing just on a working virtio-blk device emulation, but more focus 
on the generic part should be done.
In the code there are a couple of defines to control polling.

About the vdpa-blk device, you need iproute2's vdpa tool available 
upstream:
  https://wiki.linuxfoundation.org/networking/iproute2

Once the device is instantiated (see instructions later), the backend 
(raw file or device) can be set through a device attribute (not robust, 
but it was a PoC): /sys/bus/vdpa/devices/$dev_name/backend_fd

I wrote a simple python script available here: 
https://github.com/stefano-garzarella/vm-build/blob/main/vm-tools/vdpa_set_backend_fd.py

For QEMU, we are working on libblkio to support both slow path (when 
QEMU block layer is needed) and fast path (vqs passed directly to the 
device). For now libblkio supports only slow path, so to test the fast 
path you can use Longpeng's patches (not yet merged upstream) with 
generic vhost-vdpa support: 
https://lore.kernel.org/qemu-devel/20220514041107.1980-1-longpe...@huawei.com/

Steps:
  # load vDPA block in-kernel sw device module
  modprobe vdpa_sw_blk

  # load nvme module with poll_queues set if you want to use iopoll
  modprobe nvme poll_queues=15

  # instantiate a new vdpa-blk device
  vdpa dev add mgmtdev vdpasw_blk name blk0

  # set backend (/dev/nvme0n1)
  vdpa_set_backend_fd.py -b /dev/nvme0n1 blk0

  # load vhost vDPA bus ...
  modprobe vhost_vdpa

  # ... and vhost-vdpa device will appear
  ls -l /dev/vhost-vdpa-0
  crw---. 1 root root 510, 0 Jul 28 17:06 /dev/vhost-vdpa-0

  # start QEMU patched with generic vhost-vdpa
  qemu-system-x86_64 ... \
  -device vhost-vdpa-device-pci,vhostdev=/dev/vhost-vdpa-0

I haven't tested it recently, so I'm not sure it all works, but in the 
next few days I'll try. For anything else, feel free to reach me here or 
on IRC (sgarzare on #qemu).

Thanks,
Stefano




Re: [PATCH for-7.1 0/3] hw/nvme: misc ioeventfd fixes

2022-07-28 Thread Jinhao Fan
at 4:25 PM, Klaus Jensen  wrote:

> From: Klaus Jensen 
> 
> A set of fixes/changes to the ioeventfd support.
> 
> Klaus Jensen (3):
>  hw/nvme: skip queue processing if notifier is cleared
>  hw/nvme: unregister the event notifier handler on the main loop
>  hw/nvme: do not enable ioeventfd by default
> 
> hw/nvme/ctrl.c | 12 +---
> 1 file changed, 9 insertions(+), 3 deletions(-)
> 
> -- 
> 2.36.1

Looks good to me as well.

Reviewed-by: Jinhao Fan 




Re: [PATCH for-7.1 0/3] hw/nvme: misc ioeventfd fixes

2022-07-28 Thread Keith Busch
On Thu, Jul 28, 2022 at 10:25:19AM +0200, Klaus Jensen wrote:
> From: Klaus Jensen 
> 
> A set of fixes/changes to the ioeventfd support.

Series looks good.

Reviewed-by: Keith Busch 



Re: [PATCH v5 5/8] blockdev: Add a new IF type IF_OTHER

2022-07-28 Thread Markus Armbruster
Kevin Wolf  writes:

> Am 28.07.2022 um 11:46 hat Peter Maydell geschrieben:
>> On Wed, 27 Jul 2022 at 20:03, Kevin Wolf  wrote:
>> >
>> > Am 18.07.2022 um 11:49 hat Markus Armbruster geschrieben:
>> > > An OTP device isn't really a parallel flash, and neither are eFuses.
>> > > More fast-and-lose use of IF_PFLASH may exist in the tree, and maybe of
>> > > other interface types, too.
>> > >
>> > > This patch introduces IF_OTHER.  The patch after next uses it for an
>> > > EEPROM device.
>> > >
>> > > Do we want IF_OTHER?
>> >
>> > What would the semantics even be? Any block device that doesn't pick up
>> > a different category may pick up IF_OTHER backends?
>> >
>> > It certainly feels like a strange interface to ask for "other" disk and
>> > then getting as surprise what this other thing might be. It's
>> > essentially the same as having an explicit '-device other', and I
>> > suppose most people would find that strange.
>> >
>> > > If no, I guess we get to abuse IF_PFLASH some more.
>> > >
>> > > If yes, I guess we should use IF_PFLASH only for actual parallel flash
>> > > memory going forward.  Cleaning up existing abuse of IF_PFLASH may not
>> > > be worth the trouble, though.
>> > >
>> > > Thoughts?
>> >
>> > If the existing types aren't good enough (I don't have an opinion on
>> > whether IF_PFLASH is a good match), let's add a new one. But a specific
>> > new one, not just "other".
>> 
>> I think the common thread is "this isn't what anybody actually thinks
>> of as being a 'disk', but we would like to back it with a block device
>> anyway". That can cover a fair range of possibilities...
>
> How confident are we that no board will ever have two devices of this
> kind?
>
> As long as every board has at most one, if=other is a bad user interface
> in terms of descriptiveness, but still more or less workable as long as
> you know what it means for the specific board you use.
>
> But if you have more than one device, it becomes hard to predict which
> device gets which backend - it depends on the initialisation order in
> the code then,

Really?  Board code should use IF_OTHER devices just like it uses the
other interface types, namely connecting each frontend device to a
backend device with a well-known and fixed interface type and index (or
bus and unit instead, where appropriate).

>and I'm pretty sure that this isn't something that should
> have significance in external interfaces and therefore become a stable
> API.

I agree that "implied by execution order" is a bad idea: commit
95fd260f0a "blockdev: Drop unused drive_get_next()".




Re: [PATCH v5 5/8] blockdev: Add a new IF type IF_OTHER

2022-07-28 Thread Peter Maydell
On Thu, 28 Jul 2022 at 15:50, Markus Armbruster  wrote:
> Kevin Wolf  writes:
> >
> > But if you have more than one device, it becomes hard to predict which
> > device gets which backend - it depends on the initialisation order in
> > the code then,
>
> Really?  Board code should use IF_OTHER devices just like it uses the
> other interface types, namely connecting each frontend device to a
> backend device with a well-known and fixed interface type and index (or
> bus and unit instead, where appropriate).

I think part of the problem is that unlike the typical disk
interface, where there is some idea of bus-and-unit-number or
index number that it makes sense to expose to users, these
"miscellaneous storage" devices don't have any particular index
concept -- in the real hardware there are just a random set of
devices that are connected in various places. So you're requiring
users to look up the documentation for "index 0 is this eeprom,
index 1 is that other eeprom, index 2 is ...".

thanks
-- PMM



Re: [PATCH 1/1] block: add missed block_acct_setup with new block device init procedure

2022-07-28 Thread Vladimir Sementsov-Ogievskiy

On 7/11/22 14:07, Denis V. Lunev wrote:

Commit 5f76a7aac156ca75680dad5df4a385fd0b58f6b1 is looking harmless from
the first glance, but it has changed things a lot. 'libvirt' uses it to
detect that it should follow new initialization way and this changes
things considerably. With this procedure followed, blockdev_init() is
not called anymore and thus block_acct_setup() helper is not called.


I'm not sure that 5f76a7aac156ca is really the corner stone.. But yes,
libvirt moved to "blockdev era", which means that we don't use old -drive,
instead block nodes are created by -blockdev / qmp: blockdev-add, and attached
to block devices by node-name.

And if I understand correctly blockdev_init() is called only on -drive path.

I have some questions:

1. After this patch, don't we call block_acct_setup() twice on old path with 
-drive? That seems safe as block_acct_setup just assign fields of 
BlockAcctStats.. But that's doesn't look good.

2. Do we really need these options? Could we instead just enable accounting 
invalid and failed ops unconditionally? I doubt that someone will learn that 
these new options appeared and will use them to disable the failed/invalid 
accounting again.




This means in particular that defaults for block accounting statistics
are changed and account_invalid/account_failed are actually initialized
as false instead of true originally.

This commit changes things to match original world. It adds
account_invalid/account_failed properties to BlockConf and setups them
accordingly.

Signed-off-by: Denis V. Lunev 
CC: Peter Krempa 
CC: Markus Armbruster 
CC: John Snow 
CC: Kevin Wolf 
CC: Hanna Reitz 
---
  hw/block/block.c   |  2 +
  include/hw/block/block.h   |  7 +++-
  tests/qemu-iotests/172.out | 76 ++
  3 files changed, 84 insertions(+), 1 deletion(-)

diff --git a/hw/block/block.c b/hw/block/block.c
index 25f45df723..53b100cdc3 100644
--- a/hw/block/block.c
+++ b/hw/block/block.c
@@ -205,6 +205,8 @@ bool blkconf_apply_backend_options(BlockConf *conf, bool 
readonly,

 BlockdevOnError rerror;

  blk_set_enable_write_cache(blk, wce);
  blk_set_on_error(blk, rerror, werror);
  
+block_acct_setup(blk_get_stats(blk), conf->account_invalid,

+ conf->account_failed);
  return true;
  }
  
diff --git a/include/hw/block/block.h b/include/hw/block/block.h

index 5902c0440a..ffd439fc83 100644
--- a/include/hw/block/block.h
+++ b/include/hw/block/block.h
@@ -31,6 +31,7 @@ typedef struct BlockConf {
  uint32_t lcyls, lheads, lsecs;
  OnOffAuto wce;
  bool share_rw;
+bool account_invalid, account_failed;
  BlockdevOnError rerror;
  BlockdevOnError werror;
  } BlockConf;
@@ -61,7 +62,11 @@ static inline unsigned int get_physical_block_exp(BlockConf 
*conf)
 _conf.discard_granularity, -1),  \
  DEFINE_PROP_ON_OFF_AUTO("write-cache", _state, _conf.wce,   \
  ON_OFF_AUTO_AUTO),  \
-DEFINE_PROP_BOOL("share-rw", _state, _conf.share_rw, false)
+DEFINE_PROP_BOOL("share-rw", _state, _conf.share_rw, false),\
+DEFINE_PROP_BOOL("account-invalid", _state, \
+ _conf.account_invalid, true),  \
+DEFINE_PROP_BOOL("account-failed", _state,  \
+ _conf.account_failed, true)
  
  #define DEFINE_BLOCK_PROPERTIES(_state, _conf)  \

  DEFINE_PROP_DRIVE("drive", _state, _conf.blk),  \
diff --git a/tests/qemu-iotests/172.out b/tests/qemu-iotests/172.out
index 9479b92185..a6c451e098 100644
--- a/tests/qemu-iotests/172.out
+++ b/tests/qemu-iotests/172.out
@@ -28,6 +28,8 @@ Formatting 'TEST_DIR/t.IMGFMT.3', fmt=IMGFMT size=737280
  discard_granularity = 4294967295 (4 GiB)
  write-cache = "auto"
  share-rw = false
+account-invalid = true
+account-failed = true
  drive-type = "288"
  
  
@@ -55,6 +57,8 @@ Testing: -fda TEST_DIR/t.qcow2

  discard_granularity = 4294967295 (4 GiB)
  write-cache = "auto"
  share-rw = false
+account-invalid = true
+account-failed = true
  drive-type = "144"
  floppy0 (NODE_NAME): TEST_DIR/t.qcow2 (qcow2)
  Attached to:  /machine/unattached/device[N]
@@ -92,6 +96,8 @@ Testing: -fdb TEST_DIR/t.qcow2
  discard_granularity = 4294967295 (4 GiB)
  write-cache = "auto"
  share-rw = false
+account-invalid = true
+account-failed = true
  drive-type = "144"
dev: floppy, id ""
  unit = 0 (0x0)
@@ -104,6 +110,8 @@ Testing: -fdb TEST_DIR/t.qcow2
  discard_granularity = 4294967295 (4 

Re: [PATCH v5 5/8] blockdev: Add a new IF type IF_OTHER

2022-07-28 Thread Cédric Le Goater

On 7/28/22 15:29, Kevin Wolf wrote:

Am 28.07.2022 um 11:46 hat Peter Maydell geschrieben:

On Wed, 27 Jul 2022 at 20:03, Kevin Wolf  wrote:


Am 18.07.2022 um 11:49 hat Markus Armbruster geschrieben:

An OTP device isn't really a parallel flash, and neither are eFuses.
More fast-and-lose use of IF_PFLASH may exist in the tree, and maybe of
other interface types, too.

This patch introduces IF_OTHER.  The patch after next uses it for an
EEPROM device.

Do we want IF_OTHER?


What would the semantics even be? Any block device that doesn't pick up
a different category may pick up IF_OTHER backends?

It certainly feels like a strange interface to ask for "other" disk and
then getting as surprise what this other thing might be. It's
essentially the same as having an explicit '-device other', and I
suppose most people would find that strange.


If no, I guess we get to abuse IF_PFLASH some more.

If yes, I guess we should use IF_PFLASH only for actual parallel flash
memory going forward.  Cleaning up existing abuse of IF_PFLASH may not
be worth the trouble, though.

Thoughts?


If the existing types aren't good enough (I don't have an opinion on
whether IF_PFLASH is a good match), let's add a new one. But a specific
new one, not just "other".


I think the common thread is "this isn't what anybody actually thinks
of as being a 'disk', but we would like to back it with a block device
anyway". That can cover a fair range of possibilities...


How confident are we that no board will ever have two devices of this
kind?


The BMC machines have a lot of eeproms.



As long as every board has at most one, if=other is a bad user interface
in terms of descriptiveness, but still more or less workable as long as
you know what it means for the specific board you use.

But if you have more than one device, it becomes hard to predict which
device gets which backend - it depends on the initialisation order in
the code then, and I'm pretty sure that this isn't something that should
have significance in external interfaces and therefore become a stable
API.


Can't we use the drive index ?


There has been various attempts to solve this problem for the Aspeed
machines also. See below. May be we should introduce and IF_EEPROM for
the purpose.

Thanks,

C.

[PATCH v2] aspeed: qcom: add block backed FRU devices
https://www.mail-archive.com/qemu-devel@nongnu.org/msg900485.html

[PATCH] aspeed: Enable backend file for eeprom
http://patchwork.ozlabs.org/project/qemu-devel/patch/20220728061228.152704-1-wangzhiqian...@inspur.com/



Re: [PATCH v5 5/8] blockdev: Add a new IF type IF_OTHER

2022-07-28 Thread Peter Maydell
On Thu, 28 Jul 2022 at 14:30, Kevin Wolf  wrote:
>
> Am 28.07.2022 um 11:46 hat Peter Maydell geschrieben:
> > On Wed, 27 Jul 2022 at 20:03, Kevin Wolf  wrote:
> > > If the existing types aren't good enough (I don't have an opinion on
> > > whether IF_PFLASH is a good match), let's add a new one. But a specific
> > > new one, not just "other".
> >
> > I think the common thread is "this isn't what anybody actually thinks
> > of as being a 'disk', but we would like to back it with a block device
> > anyway". That can cover a fair range of possibilities...
>
> How confident are we that no board will ever have two devices of this
> kind?
>
> As long as every board has at most one, if=other is a bad user interface
> in terms of descriptiveness, but still more or less workable as long as
> you know what it means for the specific board you use.

Extremely non-confident, but that holds for all these things,
unless we add new IF_* for every possible new thing:
 IF_PFLASH
 IF_EEPROM
 IF_EFUSE
 IF_BBRAM
 ...
?

thanks
-- PMM



Re: [PATCH v5 5/8] blockdev: Add a new IF type IF_OTHER

2022-07-28 Thread Kevin Wolf
Am 28.07.2022 um 11:46 hat Peter Maydell geschrieben:
> On Wed, 27 Jul 2022 at 20:03, Kevin Wolf  wrote:
> >
> > Am 18.07.2022 um 11:49 hat Markus Armbruster geschrieben:
> > > An OTP device isn't really a parallel flash, and neither are eFuses.
> > > More fast-and-lose use of IF_PFLASH may exist in the tree, and maybe of
> > > other interface types, too.
> > >
> > > This patch introduces IF_OTHER.  The patch after next uses it for an
> > > EEPROM device.
> > >
> > > Do we want IF_OTHER?
> >
> > What would the semantics even be? Any block device that doesn't pick up
> > a different category may pick up IF_OTHER backends?
> >
> > It certainly feels like a strange interface to ask for "other" disk and
> > then getting as surprise what this other thing might be. It's
> > essentially the same as having an explicit '-device other', and I
> > suppose most people would find that strange.
> >
> > > If no, I guess we get to abuse IF_PFLASH some more.
> > >
> > > If yes, I guess we should use IF_PFLASH only for actual parallel flash
> > > memory going forward.  Cleaning up existing abuse of IF_PFLASH may not
> > > be worth the trouble, though.
> > >
> > > Thoughts?
> >
> > If the existing types aren't good enough (I don't have an opinion on
> > whether IF_PFLASH is a good match), let's add a new one. But a specific
> > new one, not just "other".
> 
> I think the common thread is "this isn't what anybody actually thinks
> of as being a 'disk', but we would like to back it with a block device
> anyway". That can cover a fair range of possibilities...

How confident are we that no board will ever have two devices of this
kind?

As long as every board has at most one, if=other is a bad user interface
in terms of descriptiveness, but still more or less workable as long as
you know what it means for the specific board you use.

But if you have more than one device, it becomes hard to predict which
device gets which backend - it depends on the initialisation order in
the code then, and I'm pretty sure that this isn't something that should
have significance in external interfaces and therefore become a stable
API.

Kevin




Re: [PULL 0/2] block: fix parallels block driver

2022-07-28 Thread Richard Henderson

On 7/27/22 12:06, Vladimir Sementsov-Ogievskiy wrote:

The following changes since commit f6cce6bcb2ef959cdd4da0e368f7c72045f21d6d:

   Merge tag 'pull-target-arm-20220726' of 
https://git.linaro.org/people/pmaydell/qemu-arm into staging (2022-07-26 
08:32:01 -0700)

are available in the Git repository at:

   https://gitlab.com/vsementsov/qemu.git tags/pull-block-2022-07-27

for you to fetch changes up to 0c2cb3827e46dc30cd41eeb38f8e318eb665e6a4:

   iotests/131: Add parallels regression test (2022-07-26 22:05:20 +0300)


Block: fix parallels block driver


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/7.1 as 
appropriate.


r~






Hanna Reitz (2):
   block/parallels: Fix buffer-based write call
   iotests/131: Add parallels regression test

  block/parallels.c  |  4 ++--
  tests/qemu-iotests/131 | 35 ++-
  tests/qemu-iotests/131.out | 13 +
  3 files changed, 49 insertions(+), 3 deletions(-)






Re: [PATCH v3 05/21] block/vhost-user-blk-server: don't expose VHOST_USER_F_PROTOCOL_FEATURES

2022-07-28 Thread Alex Bennée


Jason Wang  writes:

> 在 2022/7/27 03:21, Alex Bennée 写道:
>> This bit is unused in actual VirtIO feature negotiation and should
>> only appear in the vhost-user messages between master and slave.
>
>
> I might be wrong, but this is actually used between master and slave
> not the device and driver?

Argh you may be right. However got confused with grepping:

  static const VuDevIface vu_blk_iface = {
  .get_features  = vu_blk_get_features,
  .queue_set_started = vu_blk_queue_set_started,
  .get_protocol_features = vu_blk_get_protocol_features,
  .get_config= vu_blk_get_config,
  .set_config= vu_blk_set_config,
  .process_msg   = vu_blk_process_msg,
  };

and this is all part of libvhostuser but vhost-user-blk-server is in the
main tree. I guess it isn't moved into tools/ because it also re-uses
bits of the block layer?

Anyway I shall drop this patch.

>
> Thanks
>
>
>>
>> [AJB: experiment, this doesn't break the tests but I'm not super
>> confident of the range of tests]
>>
>> Signed-off-by: Alex Bennée 
>> ---
>>   block/export/vhost-user-blk-server.c | 3 +--
>>   1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/block/export/vhost-user-blk-server.c 
>> b/block/export/vhost-user-blk-server.c
>> index 3409d9e02e..d31436006d 100644
>> --- a/block/export/vhost-user-blk-server.c
>> +++ b/block/export/vhost-user-blk-server.c
>> @@ -125,8 +125,7 @@ static uint64_t vu_blk_get_features(VuDev *dev)
>>  1ull << VIRTIO_BLK_F_MQ |
>>  1ull << VIRTIO_F_VERSION_1 |
>>  1ull << VIRTIO_RING_F_INDIRECT_DESC |
>> -   1ull << VIRTIO_RING_F_EVENT_IDX |
>> -   1ull << VHOST_USER_F_PROTOCOL_FEATURES;
>> +   1ull << VIRTIO_RING_F_EVENT_IDX ;
>> if (!vexp->handler.writable) {
>>   features |= 1ull << VIRTIO_BLK_F_RO;


-- 
Alex Bennée



Re: [PATCH v1 1/5] block/vhost-user-blk-server: don't expose VHOST_USER_F_PROTOCOL_FEATURES

2022-07-28 Thread Alex Bennée


Kevin Wolf  writes:

> Am 27.07.2022 um 17:56 hat Alex Bennée geschrieben:
>> This bit is unused in actual VirtIO feature negotiation and should
>> only appear in the vhost-user messages between master and slave.
>> 
>> [AJB: experiment, this doesn't break the tests but I'm not super
>> confident of the range of tests]
>> 
>> Signed-off-by: Alex Bennée 
>> Message-Id: <20220726192150.2435175-6-alex.ben...@linaro.org>
>> ---
>>  block/export/vhost-user-blk-server.c | 3 +--
>>  1 file changed, 1 insertion(+), 2 deletions(-)
>> 
>> diff --git a/block/export/vhost-user-blk-server.c 
>> b/block/export/vhost-user-blk-server.c
>> index 3409d9e02e..d31436006d 100644
>> --- a/block/export/vhost-user-blk-server.c
>> +++ b/block/export/vhost-user-blk-server.c
>> @@ -125,8 +125,7 @@ static uint64_t vu_blk_get_features(VuDev *dev)
>> 1ull << VIRTIO_BLK_F_MQ |
>> 1ull << VIRTIO_F_VERSION_1 |
>> 1ull << VIRTIO_RING_F_INDIRECT_DESC |
>> -   1ull << VIRTIO_RING_F_EVENT_IDX |
>> -   1ull << VHOST_USER_F_PROTOCOL_FEATURES;
>> +   1ull << VIRTIO_RING_F_EVENT_IDX ;
>
> I didn't see this series yet when I replied to the other series this is
> split off from, but of course, my comments are still relevant for this
> one.
>
> I asked for a changed commit message (the "experiment" part should
> probably go away if we're merging it; instead, it should explain that
> in vu_get_features_exec(), libvhost-user adds the vhost-user protocol
> level VHOST_USER_F_PROTOCOL_FEATURES flag anyway and the device is the
> wrong layer to add it, but the behaviour doesn't change with this patch)
> and noted the extra space before the semicolon.

OK - mst do you want me to respin or can you tweak the commit?

>
> Kevin


-- 
Alex Bennée



Re: [PATCH v5 5/8] blockdev: Add a new IF type IF_OTHER

2022-07-28 Thread Peter Maydell
On Wed, 27 Jul 2022 at 20:03, Kevin Wolf  wrote:
>
> Am 18.07.2022 um 11:49 hat Markus Armbruster geschrieben:
> > An OTP device isn't really a parallel flash, and neither are eFuses.
> > More fast-and-lose use of IF_PFLASH may exist in the tree, and maybe of
> > other interface types, too.
> >
> > This patch introduces IF_OTHER.  The patch after next uses it for an
> > EEPROM device.
> >
> > Do we want IF_OTHER?
>
> What would the semantics even be? Any block device that doesn't pick up
> a different category may pick up IF_OTHER backends?
>
> It certainly feels like a strange interface to ask for "other" disk and
> then getting as surprise what this other thing might be. It's
> essentially the same as having an explicit '-device other', and I
> suppose most people would find that strange.
>
> > If no, I guess we get to abuse IF_PFLASH some more.
> >
> > If yes, I guess we should use IF_PFLASH only for actual parallel flash
> > memory going forward.  Cleaning up existing abuse of IF_PFLASH may not
> > be worth the trouble, though.
> >
> > Thoughts?
>
> If the existing types aren't good enough (I don't have an opinion on
> whether IF_PFLASH is a good match), let's add a new one. But a specific
> new one, not just "other".

I think the common thread is "this isn't what anybody actually thinks
of as being a 'disk', but we would like to back it with a block device
anyway". That can cover a fair range of possibilities...

thanks
-- PMM



Re: [PATCH v1 1/5] block/vhost-user-blk-server: don't expose VHOST_USER_F_PROTOCOL_FEATURES

2022-07-28 Thread Kevin Wolf
Am 27.07.2022 um 17:56 hat Alex Bennée geschrieben:
> This bit is unused in actual VirtIO feature negotiation and should
> only appear in the vhost-user messages between master and slave.
> 
> [AJB: experiment, this doesn't break the tests but I'm not super
> confident of the range of tests]
> 
> Signed-off-by: Alex Bennée 
> Message-Id: <20220726192150.2435175-6-alex.ben...@linaro.org>
> ---
>  block/export/vhost-user-blk-server.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/block/export/vhost-user-blk-server.c 
> b/block/export/vhost-user-blk-server.c
> index 3409d9e02e..d31436006d 100644
> --- a/block/export/vhost-user-blk-server.c
> +++ b/block/export/vhost-user-blk-server.c
> @@ -125,8 +125,7 @@ static uint64_t vu_blk_get_features(VuDev *dev)
> 1ull << VIRTIO_BLK_F_MQ |
> 1ull << VIRTIO_F_VERSION_1 |
> 1ull << VIRTIO_RING_F_INDIRECT_DESC |
> -   1ull << VIRTIO_RING_F_EVENT_IDX |
> -   1ull << VHOST_USER_F_PROTOCOL_FEATURES;
> +   1ull << VIRTIO_RING_F_EVENT_IDX ;

I didn't see this series yet when I replied to the other series this is
split off from, but of course, my comments are still relevant for this
one.

I asked for a changed commit message (the "experiment" part should
probably go away if we're merging it; instead, it should explain that
in vu_get_features_exec(), libvhost-user adds the vhost-user protocol
level VHOST_USER_F_PROTOCOL_FEATURES flag anyway and the device is the
wrong layer to add it, but the behaviour doesn't change with this patch)
and noted the extra space before the semicolon.

Kevin




[PATCH for-7.1 3/3] hw/nvme: do not enable ioeventfd by default

2022-07-28 Thread Klaus Jensen
From: Klaus Jensen 

Do not enable ioeventfd by default. Let the feature mature a bit before
we consider enabling it by default.

Fixes: 2e53b0b45024 ("hw/nvme: Use ioeventfd to handle doorbell updates")
Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 70b454eedbd8..87aeba056499 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -7670,7 +7670,7 @@ static Property nvme_props[] = {
 DEFINE_PROP_UINT8("vsl", NvmeCtrl, params.vsl, 7),
 DEFINE_PROP_BOOL("use-intel-id", NvmeCtrl, params.use_intel_id, false),
 DEFINE_PROP_BOOL("legacy-cmb", NvmeCtrl, params.legacy_cmb, false),
-DEFINE_PROP_BOOL("ioeventfd", NvmeCtrl, params.ioeventfd, true),
+DEFINE_PROP_BOOL("ioeventfd", NvmeCtrl, params.ioeventfd, false),
 DEFINE_PROP_UINT8("zoned.zasl", NvmeCtrl, params.zasl, 0),
 DEFINE_PROP_BOOL("zoned.auto_transition", NvmeCtrl,
  params.auto_transition_zones, true),
-- 
2.36.1




[PATCH for-7.1 2/3] hw/nvme: unregister the event notifier handler on the main loop

2022-07-28 Thread Klaus Jensen
From: Klaus Jensen 

Make sure the notifier handler is unregistered in the main loop prior to
cleaning it up.

Fixes: 2e53b0b45024 ("hw/nvme: Use ioeventfd to handle doorbell updates")
Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 8aa73b048d51..70b454eedbd8 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -4311,6 +4311,7 @@ static void nvme_free_sq(NvmeSQueue *sq, NvmeCtrl *n)
 if (sq->ioeventfd_enabled) {
 memory_region_del_eventfd(>iomem,
   0x1000 + offset, 4, false, 0, >notifier);
+event_notifier_set_handler(>notifier, NULL);
 event_notifier_cleanup(>notifier);
 }
 g_free(sq->io_req);
@@ -4701,6 +4702,7 @@ static void nvme_free_cq(NvmeCQueue *cq, NvmeCtrl *n)
 if (cq->ioeventfd_enabled) {
 memory_region_del_eventfd(>iomem,
   0x1000 + offset, 4, false, 0, >notifier);
+event_notifier_set_handler(>notifier, NULL);
 event_notifier_cleanup(>notifier);
 }
 if (msix_enabled(>parent_obj)) {
-- 
2.36.1




[PATCH for-7.1 0/3] hw/nvme: misc ioeventfd fixes

2022-07-28 Thread Klaus Jensen
From: Klaus Jensen 

A set of fixes/changes to the ioeventfd support.

Klaus Jensen (3):
  hw/nvme: skip queue processing if notifier is cleared
  hw/nvme: unregister the event notifier handler on the main loop
  hw/nvme: do not enable ioeventfd by default

 hw/nvme/ctrl.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

-- 
2.36.1




[PATCH for-7.1 1/3] hw/nvme: skip queue processing if notifier is cleared

2022-07-28 Thread Klaus Jensen
From: Klaus Jensen 

While it is safe to process the queues when they are empty, skip it if
the event notifier callback was invoked spuriously.

Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 533ad14e7a61..8aa73b048d51 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -4238,7 +4238,9 @@ static void nvme_cq_notifier(EventNotifier *e)
 NvmeCQueue *cq = container_of(e, NvmeCQueue, notifier);
 NvmeCtrl *n = cq->ctrl;
 
-event_notifier_test_and_clear(>notifier);
+if (!event_notifier_test_and_clear(e)) {
+return;
+}
 
 nvme_update_cq_head(cq);
 
@@ -4275,7 +4277,9 @@ static void nvme_sq_notifier(EventNotifier *e)
 {
 NvmeSQueue *sq = container_of(e, NvmeSQueue, notifier);
 
-event_notifier_test_and_clear(>notifier);
+if (!event_notifier_test_and_clear(e)) {
+return;
+}
 
 nvme_process_sq(sq);
 }
-- 
2.36.1




[PATCH] hw/nvme: Add helper functions for qid-db conversion

2022-07-28 Thread Jinhao Fan
With the introduction of shadow doorbell and ioeventfd, we need to do
frequent conversion between qid and its doorbell offset. The original
hard-coded calculation is confusing and error-prone. Add several helper
functions to do this task.

Signed-off-by: Jinhao Fan 
---
 hw/nvme/ctrl.c | 61 --
 1 file changed, 39 insertions(+), 22 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 533ad14e7a..6116c0e660 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -487,6 +487,29 @@ static int nvme_check_cqid(NvmeCtrl *n, uint16_t cqid)
 {
 return cqid < n->conf_ioqpairs + 1 && n->cq[cqid] != NULL ? 0 : -1;
 }
+static inline bool nvme_db_offset_is_cq(NvmeCtrl *n, hwaddr offset)
+{
+hwaddr stride = 4 << NVME_CAP_DSTRD(ldq_le_p(>bar.cap));
+return (offset / stride) & 1;
+}
+
+static inline uint16_t nvme_db_offset_to_qid(NvmeCtrl *n, hwaddr offset)
+{
+hwaddr stride = 4 << NVME_CAP_DSTRD(ldq_le_p(>bar.cap));
+return offset / (2 * stride);
+}
+
+static inline hwaddr nvme_cqid_to_db_offset(NvmeCtrl *n, uint16_t cqid)
+{
+hwaddr stride = 4 << NVME_CAP_DSTRD(ldq_le_p(>bar.cap));
+return stride * (cqid * 2 + 1);
+}
+
+static inline hwaddr nvme_sqid_to_db_offset(NvmeCtrl *n, uint16_t sqid)
+{
+hwaddr stride = 4 << NVME_CAP_DSTRD(ldq_le_p(>bar.cap));
+return stride * sqid * 2;
+}
 
 static void nvme_inc_cq_tail(NvmeCQueue *cq)
 {
@@ -4256,7 +4279,7 @@ static void nvme_cq_notifier(EventNotifier *e)
 static int nvme_init_cq_ioeventfd(NvmeCQueue *cq)
 {
 NvmeCtrl *n = cq->ctrl;
-uint16_t offset = (cq->cqid << 3) + (1 << 2);
+uint16_t offset = nvme_cqid_to_db_offset(n, cq->cqid);
 int ret;
 
 ret = event_notifier_init(>notifier, 0);
@@ -4283,7 +4306,7 @@ static void nvme_sq_notifier(EventNotifier *e)
 static int nvme_init_sq_ioeventfd(NvmeSQueue *sq)
 {
 NvmeCtrl *n = sq->ctrl;
-uint16_t offset = sq->sqid << 3;
+uint16_t offset = nvme_sqid_to_db_offset(n, sq->sqid);
 int ret;
 
 ret = event_notifier_init(>notifier, 0);
@@ -4300,7 +4323,7 @@ static int nvme_init_sq_ioeventfd(NvmeSQueue *sq)
 
 static void nvme_free_sq(NvmeSQueue *sq, NvmeCtrl *n)
 {
-uint16_t offset = sq->sqid << 3;
+uint16_t offset = nvme_sqid_to_db_offset(n, sq->sqid);
 
 n->sq[sq->sqid] = NULL;
 timer_free(sq->timer);
@@ -4379,8 +4402,8 @@ static void nvme_init_sq(NvmeSQueue *sq, NvmeCtrl *n, 
uint64_t dma_addr,
 sq->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, nvme_process_sq, sq);
 
 if (n->dbbuf_enabled) {
-sq->db_addr = n->dbbuf_dbs + (sqid << 3);
-sq->ei_addr = n->dbbuf_eis + (sqid << 3);
+sq->db_addr = n->dbbuf_dbs + nvme_sqid_to_db_offset(n, sqid);
+sq->ei_addr = n->dbbuf_eis + nvme_sqid_to_db_offset(n, sqid);
 
 if (n->params.ioeventfd && sq->sqid != 0) {
 if (!nvme_init_sq_ioeventfd(sq)) {
@@ -4690,8 +4713,8 @@ static uint16_t nvme_get_log(NvmeCtrl *n, NvmeRequest 
*req)
 
 static void nvme_free_cq(NvmeCQueue *cq, NvmeCtrl *n)
 {
-uint16_t offset = (cq->cqid << 3) + (1 << 2);
-
+uint16_t offset = nvme_cqid_to_db_offset(n, cq->cqid);
+
 n->cq[cq->cqid] = NULL;
 timer_free(cq->timer);
 if (cq->ioeventfd_enabled) {
@@ -4755,8 +4778,8 @@ static void nvme_init_cq(NvmeCQueue *cq, NvmeCtrl *n, 
uint64_t dma_addr,
 QTAILQ_INIT(>req_list);
 QTAILQ_INIT(>sq_list);
 if (n->dbbuf_enabled) {
-cq->db_addr = n->dbbuf_dbs + (cqid << 3) + (1 << 2);
-cq->ei_addr = n->dbbuf_eis + (cqid << 3) + (1 << 2);
+cq->db_addr = n->dbbuf_dbs + nvme_cqid_to_db_offset(n, cqid);
+cq->ei_addr = n->dbbuf_eis + nvme_cqid_to_db_offset(n, cqid);
 
 if (n->params.ioeventfd && cqid != 0) {
 if (!nvme_init_cq_ioeventfd(cq)) {
@@ -6128,13 +6151,8 @@ static uint16_t nvme_dbbuf_config(NvmeCtrl *n, const 
NvmeRequest *req)
 NvmeCQueue *cq = n->cq[i];
 
 if (sq) {
-/*
- * CAP.DSTRD is 0, so offset of ith sq db_addr is (i<<3)
- * nvme_process_db() uses this hard-coded way to calculate
- * doorbell offsets. Be consistent with that here.
- */
-sq->db_addr = dbs_addr + (i << 3);
-sq->ei_addr = eis_addr + (i << 3);
+sq->db_addr = dbs_addr + nvme_sqid_to_db_offset(n, i);
+sq->ei_addr = eis_addr + nvme_sqid_to_db_offset(n, i);
 pci_dma_write(>parent_obj, sq->db_addr, >tail,
 sizeof(sq->tail));
 
@@ -6146,9 +6164,8 @@ static uint16_t nvme_dbbuf_config(NvmeCtrl *n, const 
NvmeRequest *req)
 }
 
 if (cq) {
-/* CAP.DSTRD is 0, so offset of ith cq db_addr is (i<<3)+(1<<2) */
-cq->db_addr = dbs_addr + (i << 3) + (1 << 2);
-cq->ei_addr = eis_addr + (i << 3) + (1 << 2);
+cq->db_addr = dbs_addr + nvme_cqid_to_db_offset(n, i);
+cq->ei_addr = eis_addr + nvme_cqid_to_db_offset(n, i);
 

Re: [PATCH v3 05/21] block/vhost-user-blk-server: don't expose VHOST_USER_F_PROTOCOL_FEATURES

2022-07-28 Thread Jason Wang



在 2022/7/27 03:21, Alex Bennée 写道:

This bit is unused in actual VirtIO feature negotiation and should
only appear in the vhost-user messages between master and slave.



I might be wrong, but this is actually used between master and slave not 
the device and driver?


Thanks




[AJB: experiment, this doesn't break the tests but I'm not super
confident of the range of tests]

Signed-off-by: Alex Bennée 
---
  block/export/vhost-user-blk-server.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/block/export/vhost-user-blk-server.c 
b/block/export/vhost-user-blk-server.c
index 3409d9e02e..d31436006d 100644
--- a/block/export/vhost-user-blk-server.c
+++ b/block/export/vhost-user-blk-server.c
@@ -125,8 +125,7 @@ static uint64_t vu_blk_get_features(VuDev *dev)
 1ull << VIRTIO_BLK_F_MQ |
 1ull << VIRTIO_F_VERSION_1 |
 1ull << VIRTIO_RING_F_INDIRECT_DESC |
-   1ull << VIRTIO_RING_F_EVENT_IDX |
-   1ull << VHOST_USER_F_PROTOCOL_FEATURES;
+   1ull << VIRTIO_RING_F_EVENT_IDX ;
  
  if (!vexp->handler.writable) {

  features |= 1ull << VIRTIO_BLK_F_RO;