Re: [PATCH] aspeed/smc: Fix DMA support for AST2600

2020-03-19 Thread Joel Stanley
On Fri, 20 Mar 2020 at 05:39, Cédric Le Goater  wrote:
>
> Recent firmwares uses SPI DMA transfers in U-Boot to load the
> different images (kernel, initrd, dtb) in the SoC DRAM. The AST2600
> FMC model is missing the masks to be applied on the DMA registers
> which resulted in incorrect values. Fix that and wire the SPI
> controllers which have DMA support on the AST2600.
>
> Fixes: bcaa8ddd081c ("aspeed/smc: Add AST2600 support")
> Signed-off-by: Cédric Le Goater 

I gave this a spin with the Tacoma machine and it resolved the issue I
saw. Thanks for fixing it.

Reviewed-by: Joel Stanley 


> ---
>  hw/arm/aspeed_ast2600.c |  6 ++
>  hw/ssi/aspeed_smc.c | 15 +--
>  hw/ssi/trace-events |  1 +
>  3 files changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/hw/arm/aspeed_ast2600.c b/hw/arm/aspeed_ast2600.c
> index 446b44d31cf1..1a869e09b96a 100644
> --- a/hw/arm/aspeed_ast2600.c
> +++ b/hw/arm/aspeed_ast2600.c
> @@ -411,6 +411,12 @@ static void aspeed_soc_ast2600_realize(DeviceState *dev, 
> Error **errp)
>
>  /* SPI */
>  for (i = 0; i < sc->spis_num; i++) {
> +object_property_set_link(OBJECT(>spi[i]), OBJECT(s->dram_mr),
> + "dram", );
> +if (err) {
> +error_propagate(errp, err);
> +return;
> +}
>  object_property_set_int(OBJECT(>spi[i]), 1, "num-cs", );
>  object_property_set_bool(OBJECT(>spi[i]), true, "realized",
>   _err);
> diff --git a/hw/ssi/aspeed_smc.c b/hw/ssi/aspeed_smc.c
> index 9d5c696d5a17..2edccef2d54d 100644
> --- a/hw/ssi/aspeed_smc.c
> +++ b/hw/ssi/aspeed_smc.c
> @@ -364,6 +364,8 @@ static const AspeedSMCController controllers[] = {
>  .flash_window_base = ASPEED26_SOC_FMC_FLASH_BASE,
>  .flash_window_size = 0x1000,
>  .has_dma   = true,
> +.dma_flash_mask= 0x0FFC,
> +.dma_dram_mask = 0x3FFC,
>  .nregs = ASPEED_SMC_R_MAX,
>  .segment_to_reg= aspeed_2600_smc_segment_to_reg,
>  .reg_to_segment= aspeed_2600_smc_reg_to_segment,
> @@ -379,7 +381,9 @@ static const AspeedSMCController controllers[] = {
>  .segments  = aspeed_segments_ast2600_spi1,
>  .flash_window_base = ASPEED26_SOC_SPI_FLASH_BASE,
>  .flash_window_size = 0x1000,
> -.has_dma   = false,
> +.has_dma   = true,
> +.dma_flash_mask= 0x0FFC,
> +.dma_dram_mask = 0x3FFC,
>  .nregs = ASPEED_SMC_R_MAX,
>  .segment_to_reg= aspeed_2600_smc_segment_to_reg,
>  .reg_to_segment= aspeed_2600_smc_reg_to_segment,
> @@ -395,7 +399,9 @@ static const AspeedSMCController controllers[] = {
>  .segments  = aspeed_segments_ast2600_spi2,
>  .flash_window_base = ASPEED26_SOC_SPI2_FLASH_BASE,
>  .flash_window_size = 0x1000,
> -.has_dma   = false,
> +.has_dma   = true,
> +.dma_flash_mask= 0x0FFC,
> +.dma_dram_mask = 0x3FFC,
>  .nregs = ASPEED_SMC_R_MAX,
>  .segment_to_reg= aspeed_2600_smc_segment_to_reg,
>  .reg_to_segment= aspeed_2600_smc_reg_to_segment,
> @@ -1135,6 +1141,11 @@ static void aspeed_smc_dma_rw(AspeedSMCState *s)
>  MemTxResult result;
>  uint32_t data;
>
> +trace_aspeed_smc_dma_rw(s->regs[R_DMA_CTRL] & DMA_CTRL_WRITE ?
> +"write" : "read",
> +s->regs[R_DMA_FLASH_ADDR],
> +s->regs[R_DMA_DRAM_ADDR],
> +s->regs[R_DMA_LEN]);
>  while (s->regs[R_DMA_LEN]) {
>  if (s->regs[R_DMA_CTRL] & DMA_CTRL_WRITE) {
>  data = address_space_ldl_le(>dram_as, 
> s->regs[R_DMA_DRAM_ADDR],
> diff --git a/hw/ssi/trace-events b/hw/ssi/trace-events
> index 0a70629801a9..0ea498de910b 100644
> --- a/hw/ssi/trace-events
> +++ b/hw/ssi/trace-events
> @@ -6,5 +6,6 @@ aspeed_smc_do_snoop(int cs, int index, int dummies, int data) 
> "CS%d index:0x%x d
>  aspeed_smc_flash_write(int cs, uint64_t addr,  uint32_t size, uint64_t data, 
> int mode) "CS%d @0x%" PRIx64 " size %u: 0x%" PRIx64" mode:%d"
>  aspeed_smc_read(uint64_t addr,  uint32_t size, uint64_t data) "@0x%" PRIx64 
> " size %u: 0x%" PRIx64
>  aspeed_smc_dma_checksum(uint32_t addr, uint32_t data) "0x%08x: 0x%08x"
> +aspeed_smc_dma_rw(const char *dir, uint32_t flash_addr, uint32_t dram_addr, 
> uint32_t size) "%s flash:@0x%08x dram:@0x%08x size:0x%08x"
>  aspeed_smc_write(uint64_t addr,  uint32_t size, uint64_t data) "@0x%" PRIx64 
> " size %u: 0x%" PRIx64
>  aspeed_smc_flash_select(int cs, const char *prefix) "CS%d %sselect"
> --
> 2.21.1
>



[PATCH] aspeed/smc: Fix DMA support for AST2600

2020-03-19 Thread Cédric Le Goater
Recent firmwares uses SPI DMA transfers in U-Boot to load the
different images (kernel, initrd, dtb) in the SoC DRAM. The AST2600
FMC model is missing the masks to be applied on the DMA registers
which resulted in incorrect values. Fix that and wire the SPI
controllers which have DMA support on the AST2600.

Fixes: bcaa8ddd081c ("aspeed/smc: Add AST2600 support")
Signed-off-by: Cédric Le Goater 
---
 hw/arm/aspeed_ast2600.c |  6 ++
 hw/ssi/aspeed_smc.c | 15 +--
 hw/ssi/trace-events |  1 +
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/hw/arm/aspeed_ast2600.c b/hw/arm/aspeed_ast2600.c
index 446b44d31cf1..1a869e09b96a 100644
--- a/hw/arm/aspeed_ast2600.c
+++ b/hw/arm/aspeed_ast2600.c
@@ -411,6 +411,12 @@ static void aspeed_soc_ast2600_realize(DeviceState *dev, 
Error **errp)
 
 /* SPI */
 for (i = 0; i < sc->spis_num; i++) {
+object_property_set_link(OBJECT(>spi[i]), OBJECT(s->dram_mr),
+ "dram", );
+if (err) {
+error_propagate(errp, err);
+return;
+}
 object_property_set_int(OBJECT(>spi[i]), 1, "num-cs", );
 object_property_set_bool(OBJECT(>spi[i]), true, "realized",
  _err);
diff --git a/hw/ssi/aspeed_smc.c b/hw/ssi/aspeed_smc.c
index 9d5c696d5a17..2edccef2d54d 100644
--- a/hw/ssi/aspeed_smc.c
+++ b/hw/ssi/aspeed_smc.c
@@ -364,6 +364,8 @@ static const AspeedSMCController controllers[] = {
 .flash_window_base = ASPEED26_SOC_FMC_FLASH_BASE,
 .flash_window_size = 0x1000,
 .has_dma   = true,
+.dma_flash_mask= 0x0FFC,
+.dma_dram_mask = 0x3FFC,
 .nregs = ASPEED_SMC_R_MAX,
 .segment_to_reg= aspeed_2600_smc_segment_to_reg,
 .reg_to_segment= aspeed_2600_smc_reg_to_segment,
@@ -379,7 +381,9 @@ static const AspeedSMCController controllers[] = {
 .segments  = aspeed_segments_ast2600_spi1,
 .flash_window_base = ASPEED26_SOC_SPI_FLASH_BASE,
 .flash_window_size = 0x1000,
-.has_dma   = false,
+.has_dma   = true,
+.dma_flash_mask= 0x0FFC,
+.dma_dram_mask = 0x3FFC,
 .nregs = ASPEED_SMC_R_MAX,
 .segment_to_reg= aspeed_2600_smc_segment_to_reg,
 .reg_to_segment= aspeed_2600_smc_reg_to_segment,
@@ -395,7 +399,9 @@ static const AspeedSMCController controllers[] = {
 .segments  = aspeed_segments_ast2600_spi2,
 .flash_window_base = ASPEED26_SOC_SPI2_FLASH_BASE,
 .flash_window_size = 0x1000,
-.has_dma   = false,
+.has_dma   = true,
+.dma_flash_mask= 0x0FFC,
+.dma_dram_mask = 0x3FFC,
 .nregs = ASPEED_SMC_R_MAX,
 .segment_to_reg= aspeed_2600_smc_segment_to_reg,
 .reg_to_segment= aspeed_2600_smc_reg_to_segment,
@@ -1135,6 +1141,11 @@ static void aspeed_smc_dma_rw(AspeedSMCState *s)
 MemTxResult result;
 uint32_t data;
 
+trace_aspeed_smc_dma_rw(s->regs[R_DMA_CTRL] & DMA_CTRL_WRITE ?
+"write" : "read",
+s->regs[R_DMA_FLASH_ADDR],
+s->regs[R_DMA_DRAM_ADDR],
+s->regs[R_DMA_LEN]);
 while (s->regs[R_DMA_LEN]) {
 if (s->regs[R_DMA_CTRL] & DMA_CTRL_WRITE) {
 data = address_space_ldl_le(>dram_as, s->regs[R_DMA_DRAM_ADDR],
diff --git a/hw/ssi/trace-events b/hw/ssi/trace-events
index 0a70629801a9..0ea498de910b 100644
--- a/hw/ssi/trace-events
+++ b/hw/ssi/trace-events
@@ -6,5 +6,6 @@ aspeed_smc_do_snoop(int cs, int index, int dummies, int data) 
"CS%d index:0x%x d
 aspeed_smc_flash_write(int cs, uint64_t addr,  uint32_t size, uint64_t data, 
int mode) "CS%d @0x%" PRIx64 " size %u: 0x%" PRIx64" mode:%d"
 aspeed_smc_read(uint64_t addr,  uint32_t size, uint64_t data) "@0x%" PRIx64 " 
size %u: 0x%" PRIx64
 aspeed_smc_dma_checksum(uint32_t addr, uint32_t data) "0x%08x: 0x%08x"
+aspeed_smc_dma_rw(const char *dir, uint32_t flash_addr, uint32_t dram_addr, 
uint32_t size) "%s flash:@0x%08x dram:@0x%08x size:0x%08x"
 aspeed_smc_write(uint64_t addr,  uint32_t size, uint64_t data) "@0x%" PRIx64 " 
size %u: 0x%" PRIx64
 aspeed_smc_flash_select(int cs, const char *prefix) "CS%d %sselect"
-- 
2.21.1




Re: [PATCH v14 Kernel 1/7] vfio: KABI for migration interface for device state

2020-03-19 Thread Yan Zhao
On Fri, Mar 20, 2020 at 12:09:18PM +0800, Alex Williamson wrote:
> On Thu, 19 Mar 2020 23:06:56 -0400
> Yan Zhao  wrote:
> 
> > On Fri, Mar 20, 2020 at 10:34:40AM +0800, Alex Williamson wrote:
> > > On Thu, 19 Mar 2020 21:30:39 -0400
> > > Yan Zhao  wrote:
> > >   
> > > > On Thu, Mar 19, 2020 at 09:09:21PM +0800, Alex Williamson wrote:  
> > > > > On Thu, 19 Mar 2020 01:05:54 -0400
> > > > > Yan Zhao  wrote:
> > > > > 
> > > > > > On Thu, Mar 19, 2020 at 11:49:26AM +0800, Alex Williamson wrote:
> > > > > > > On Wed, 18 Mar 2020 21:17:03 -0400
> > > > > > > Yan Zhao  wrote:
> > > > > > >   
> > > > > > > > On Thu, Mar 19, 2020 at 03:41:08AM +0800, Kirti Wankhede wrote: 
> > > > > > > >  
> > > > > > > > > - Defined MIGRATION region type and sub-type.
> > > > > > > > > 
> > > > > > > > > - Defined vfio_device_migration_info structure which will be 
> > > > > > > > > placed at the
> > > > > > > > >   0th offset of migration region to get/set VFIO device 
> > > > > > > > > related
> > > > > > > > >   information. Defined members of structure and usage on 
> > > > > > > > > read/write access.
> > > > > > > > > 
> > > > > > > > > - Defined device states and state transition details.
> > > > > > > > > 
> > > > > > > > > - Defined sequence to be followed while saving and resuming 
> > > > > > > > > VFIO device.
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Kirti Wankhede 
> > > > > > > > > Reviewed-by: Neo Jia 
> > > > > > > > > ---
> > > > > > > > >  include/uapi/linux/vfio.h | 227 
> > > > > > > > > ++
> > > > > > > > >  1 file changed, 227 insertions(+)
> > > > > > > > > 
> > > > > > > > > diff --git a/include/uapi/linux/vfio.h 
> > > > > > > > > b/include/uapi/linux/vfio.h
> > > > > > > > > index 9e843a147ead..d0021467af53 100644
> > > > > > > > > --- a/include/uapi/linux/vfio.h
> > > > > > > > > +++ b/include/uapi/linux/vfio.h
> > > > > > > > > @@ -305,6 +305,7 @@ struct vfio_region_info_cap_type {
> > > > > > > > >  #define VFIO_REGION_TYPE_PCI_VENDOR_MASK (0x)
> > > > > > > > >  #define VFIO_REGION_TYPE_GFX(1)
> > > > > > > > >  #define VFIO_REGION_TYPE_CCW (2)
> > > > > > > > > +#define VFIO_REGION_TYPE_MIGRATION  (3)
> > > > > > > > >  
> > > > > > > > >  /* sub-types for VFIO_REGION_TYPE_PCI_* */
> > > > > > > > >  
> > > > > > > > > @@ -379,6 +380,232 @@ struct vfio_region_gfx_edid {
> > > > > > > > >  /* sub-types for VFIO_REGION_TYPE_CCW */
> > > > > > > > >  #define VFIO_REGION_SUBTYPE_CCW_ASYNC_CMD(1)
> > > > > > > > >  
> > > > > > > > > +/* sub-types for VFIO_REGION_TYPE_MIGRATION */
> > > > > > > > > +#define VFIO_REGION_SUBTYPE_MIGRATION   (1)
> > > > > > > > > +
> > > > > > > > > +/*
> > > > > > > > > + * The structure vfio_device_migration_info is placed at the 
> > > > > > > > > 0th offset of
> > > > > > > > > + * the VFIO_REGION_SUBTYPE_MIGRATION region to get and set 
> > > > > > > > > VFIO device related
> > > > > > > > > + * migration information. Field accesses from this structure 
> > > > > > > > > are only supported
> > > > > > > > > + * at their native width and alignment. Otherwise, the 
> > > > > > > > > result is undefined and
> > > > > > > > > + * vendor drivers should return an error.
> > > > > > > > > + *
> > > > > > > > > + * device_state: (read/write)
> > > > > > > > > + *  - The user application writes to this field to 
> > > > > > > > > inform the vendor driver
> > > > > > > > > + *about the device state to be transitioned to.
> > > > > > > > > + *  - The vendor driver should take the necessary 
> > > > > > > > > actions to change the
> > > > > > > > > + *device state. After successful transition to a 
> > > > > > > > > given state, the
> > > > > > > > > + *vendor driver should return success on 
> > > > > > > > > write(device_state, state)
> > > > > > > > > + *system call. If the device state transition fails, 
> > > > > > > > > the vendor driver
> > > > > > > > > + *should return an appropriate -errno for the fault 
> > > > > > > > > condition.
> > > > > > > > > + *  - On the user application side, if the device state 
> > > > > > > > > transition fails,
> > > > > > > > > + * that is, if write(device_state, state) returns an 
> > > > > > > > > error, read
> > > > > > > > > + * device_state again to determine the current state of 
> > > > > > > > > the device from
> > > > > > > > > + * the vendor driver.
> > > > > > > > > + *  - The vendor driver should return previous state of 
> > > > > > > > > the device unless
> > > > > > > > > + *the vendor driver has encountered an internal 
> > > > > > > > > error, in which case
> > > > > > > > > + *the vendor driver may report the device_state 
> > > > > > > > > VFIO_DEVICE_STATE_ERROR.
> > > > > > > > > + *  - The user application must use the device reset 
> > > > > > > > > ioctl to recover the
> > > > > > > > 

Re: [PATCH v14 Kernel 1/7] vfio: KABI for migration interface for device state

2020-03-19 Thread Alex Williamson
On Thu, 19 Mar 2020 23:06:56 -0400
Yan Zhao  wrote:

> On Fri, Mar 20, 2020 at 10:34:40AM +0800, Alex Williamson wrote:
> > On Thu, 19 Mar 2020 21:30:39 -0400
> > Yan Zhao  wrote:
> >   
> > > On Thu, Mar 19, 2020 at 09:09:21PM +0800, Alex Williamson wrote:  
> > > > On Thu, 19 Mar 2020 01:05:54 -0400
> > > > Yan Zhao  wrote:
> > > > 
> > > > > On Thu, Mar 19, 2020 at 11:49:26AM +0800, Alex Williamson wrote:
> > > > > > On Wed, 18 Mar 2020 21:17:03 -0400
> > > > > > Yan Zhao  wrote:
> > > > > >   
> > > > > > > On Thu, Mar 19, 2020 at 03:41:08AM +0800, Kirti Wankhede wrote:   
> > > > > > >
> > > > > > > > - Defined MIGRATION region type and sub-type.
> > > > > > > > 
> > > > > > > > - Defined vfio_device_migration_info structure which will be 
> > > > > > > > placed at the
> > > > > > > >   0th offset of migration region to get/set VFIO device related
> > > > > > > >   information. Defined members of structure and usage on 
> > > > > > > > read/write access.
> > > > > > > > 
> > > > > > > > - Defined device states and state transition details.
> > > > > > > > 
> > > > > > > > - Defined sequence to be followed while saving and resuming 
> > > > > > > > VFIO device.
> > > > > > > > 
> > > > > > > > Signed-off-by: Kirti Wankhede 
> > > > > > > > Reviewed-by: Neo Jia 
> > > > > > > > ---
> > > > > > > >  include/uapi/linux/vfio.h | 227 
> > > > > > > > ++
> > > > > > > >  1 file changed, 227 insertions(+)
> > > > > > > > 
> > > > > > > > diff --git a/include/uapi/linux/vfio.h 
> > > > > > > > b/include/uapi/linux/vfio.h
> > > > > > > > index 9e843a147ead..d0021467af53 100644
> > > > > > > > --- a/include/uapi/linux/vfio.h
> > > > > > > > +++ b/include/uapi/linux/vfio.h
> > > > > > > > @@ -305,6 +305,7 @@ struct vfio_region_info_cap_type {
> > > > > > > >  #define VFIO_REGION_TYPE_PCI_VENDOR_MASK   (0x)
> > > > > > > >  #define VFIO_REGION_TYPE_GFX(1)
> > > > > > > >  #define VFIO_REGION_TYPE_CCW   (2)
> > > > > > > > +#define VFIO_REGION_TYPE_MIGRATION  (3)
> > > > > > > >  
> > > > > > > >  /* sub-types for VFIO_REGION_TYPE_PCI_* */
> > > > > > > >  
> > > > > > > > @@ -379,6 +380,232 @@ struct vfio_region_gfx_edid {
> > > > > > > >  /* sub-types for VFIO_REGION_TYPE_CCW */
> > > > > > > >  #define VFIO_REGION_SUBTYPE_CCW_ASYNC_CMD  (1)
> > > > > > > >  
> > > > > > > > +/* sub-types for VFIO_REGION_TYPE_MIGRATION */
> > > > > > > > +#define VFIO_REGION_SUBTYPE_MIGRATION   (1)
> > > > > > > > +
> > > > > > > > +/*
> > > > > > > > + * The structure vfio_device_migration_info is placed at the 
> > > > > > > > 0th offset of
> > > > > > > > + * the VFIO_REGION_SUBTYPE_MIGRATION region to get and set 
> > > > > > > > VFIO device related
> > > > > > > > + * migration information. Field accesses from this structure 
> > > > > > > > are only supported
> > > > > > > > + * at their native width and alignment. Otherwise, the result 
> > > > > > > > is undefined and
> > > > > > > > + * vendor drivers should return an error.
> > > > > > > > + *
> > > > > > > > + * device_state: (read/write)
> > > > > > > > + *  - The user application writes to this field to inform 
> > > > > > > > the vendor driver
> > > > > > > > + *about the device state to be transitioned to.
> > > > > > > > + *  - The vendor driver should take the necessary actions 
> > > > > > > > to change the
> > > > > > > > + *device state. After successful transition to a given 
> > > > > > > > state, the
> > > > > > > > + *vendor driver should return success on 
> > > > > > > > write(device_state, state)
> > > > > > > > + *system call. If the device state transition fails, 
> > > > > > > > the vendor driver
> > > > > > > > + *should return an appropriate -errno for the fault 
> > > > > > > > condition.
> > > > > > > > + *  - On the user application side, if the device state 
> > > > > > > > transition fails,
> > > > > > > > + *   that is, if write(device_state, state) returns an 
> > > > > > > > error, read
> > > > > > > > + *   device_state again to determine the current state of 
> > > > > > > > the device from
> > > > > > > > + *   the vendor driver.
> > > > > > > > + *  - The vendor driver should return previous state of 
> > > > > > > > the device unless
> > > > > > > > + *the vendor driver has encountered an internal error, 
> > > > > > > > in which case
> > > > > > > > + *the vendor driver may report the device_state 
> > > > > > > > VFIO_DEVICE_STATE_ERROR.
> > > > > > > > + *  - The user application must use the device reset ioctl 
> > > > > > > > to recover the
> > > > > > > > + *device from VFIO_DEVICE_STATE_ERROR state. If the 
> > > > > > > > device is
> > > > > > > > + *indicated to be in a valid device state by reading 
> > > > > > > > device_state, the
> > > > > > > > + *user application may attempt to 

Re: Qemu API documentation

2020-03-19 Thread Priyamvad Acharya
Thanks Alex, I will check it out.
Have you look at below issue which I mention in my previous email?


>>> *>> qemu-system-arm: Unknown device 'soc' for default sysbusAborted
>>> (core>> dumped)**
>>>
>>

On Thu, 19 Mar 2020 at 20:09, Alex Bennée  wrote:

>
> Priyamvad Acharya  writes:
>
> > Thanks John and Peter for guiding me, but still it will be hard to
> > understand from source code for a newbie.
> >
> > I basically want to implement a trivial device for arm architecture which
> > basically contains register for read/write operation with a program.So
> what
> > are the references?
>
> I would look at hw/misc/unimp.c as a useful template for implementing a
> new device. Many boards instantiate the unimp devices for areas of SoC's
> that are not yet implemented ;-)
>
> >
> > I am providing pointers about my device which I am trying to implement:
> >  - I am implementing a device which will be attached to *versatilepb*
> > board, that board has* ARM926 CPU*.
> > - My device name is "*soc*" , whose description is in
> *qemu/hw/misc/soc.c*
> > file attached below.
> > - I have written below line to make my device available to qemu in
> > *qemu/hw/misc/Makefile.objs*.
> >
> >> *$ common-obj-$(CONFIG_SOC) += soc.o *
> >>
> > - I added following lines in *qemu/hw/arm/versatilepb.c* to attach my
> > device to board.
> >
> >>
> >> *#define DES_BASEADDR0x101f5000*
> >>
> >>
> >>
> >> *soc=qdev_create(NULL, "soc");// +qdev_init_nofail(soc);// +
> >> sysbus_mmio_map(SYS_BUS_DEVICE(soc), 0, DES_BASEADDR);// +*
> >>
> >
> > - Run below commands to build my device
> >
> >> *$ make distclean*
> >> *$ make -j8 -C build *
> >>
> >
> > - Run below command to run a bare metal program on device.
> >
> >> *$ ./qemu-system-arm -M versatilepb -nographic -kernel
> >> /lhome/priyamvad/debian_qemu_arm32/c_application/DES/des_demo.elf*
> >>
> >
> > -I get following output in terminal shown below
> >
> >>
> >>
> >> *[priyamvad@predator arm-softmmu]$ ./qemu-system-arm -M versatilepb
> >> -nographic -kernel
> >> /lhome/priyamvad/debian_qemu_arm32/c_application/DES/des_demo.elf
> >> qemu-system-arm: Unknown device 'soc' for default sysbusAborted (core
> >> dumped)*
> >>
> >
> > -Here des_demo.elf is our *bare metal program* executable for
> *arm(926ej-s)*
> > processor.
> >
> > So how to resolve below issue to run executable
> >
> >>
> >>
> >> *[priyamvad@predator arm-softmmu]$ ./qemu-system-arm -M versatilepb
> >> -nographic -kernel
> >> /lhome/priyamvad/debian_qemu_arm32/c_application/DES/des_demo.elf
> >> qemu-system-arm: Unknown device 'soc' for default sysbusAborted (core
> >> dumped)*
> >>
> >
> > test.s,test.ld,startup.S,Makefile,des_demo.c are files required for bare
> >> metal program
> >>
> >
> > References:
> >
> >
> https://devkail.wordpress.com/2014/12/16/emulation-of-des-encryption-device-in-qemu/
> >
> > Thanks,
> > Priyamvad
> >
> > On Thu, 19 Mar 2020 at 01:19, John Snow  wrote:
> >
> >>
> >>
> >> On 3/18/20 7:09 AM, Peter Maydell wrote:
> >> > On Wed, 18 Mar 2020 at 09:55, Priyamvad Acharya
> >> >  wrote:
> >> >>
> >> >> Hello developer community,
> >> >>
> >> >> I am working on implementing a custom device in Qemu, so to implement
> >> it I need documentation of functions which are used to emulate a
> hardware
> >> model in Qemu.
> >> >>
> >> >> What are the references to get it ?
> >> >
> >> > QEMU has very little documentation of its internals;
> >> > the usual practice is to figure things out by
> >> > reading the source code. What we do have is in
> >> > docs/devel. There are also often documentation comments
> >> > for specific functions in the include files where
> >> > those functions are declared, which form the API
> >> > documentation for them.
> >> >
> >>
> >> ^ Unfortunately true. One thing you can do is try to pick an existing
> >> device that's close to yours -- some donor PCI, USB etc device and start
> >> using that as a reference.
> >>
> >> If you can share (broad) details of what device you are trying to
> >> implement, we might be able to point you to relevant examples to use as
> >> a reference.
> >>
> >> --js
> >>
> >>
>
>
> --
> Alex Bennée
>


Re: [PATCH v14 Kernel 1/7] vfio: KABI for migration interface for device state

2020-03-19 Thread Yan Zhao
On Fri, Mar 20, 2020 at 10:34:40AM +0800, Alex Williamson wrote:
> On Thu, 19 Mar 2020 21:30:39 -0400
> Yan Zhao  wrote:
> 
> > On Thu, Mar 19, 2020 at 09:09:21PM +0800, Alex Williamson wrote:
> > > On Thu, 19 Mar 2020 01:05:54 -0400
> > > Yan Zhao  wrote:
> > >   
> > > > On Thu, Mar 19, 2020 at 11:49:26AM +0800, Alex Williamson wrote:  
> > > > > On Wed, 18 Mar 2020 21:17:03 -0400
> > > > > Yan Zhao  wrote:
> > > > > 
> > > > > > On Thu, Mar 19, 2020 at 03:41:08AM +0800, Kirti Wankhede wrote:
> > > > > > > - Defined MIGRATION region type and sub-type.
> > > > > > > 
> > > > > > > - Defined vfio_device_migration_info structure which will be 
> > > > > > > placed at the
> > > > > > >   0th offset of migration region to get/set VFIO device related
> > > > > > >   information. Defined members of structure and usage on 
> > > > > > > read/write access.
> > > > > > > 
> > > > > > > - Defined device states and state transition details.
> > > > > > > 
> > > > > > > - Defined sequence to be followed while saving and resuming VFIO 
> > > > > > > device.
> > > > > > > 
> > > > > > > Signed-off-by: Kirti Wankhede 
> > > > > > > Reviewed-by: Neo Jia 
> > > > > > > ---
> > > > > > >  include/uapi/linux/vfio.h | 227 
> > > > > > > ++
> > > > > > >  1 file changed, 227 insertions(+)
> > > > > > > 
> > > > > > > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > > > > > > index 9e843a147ead..d0021467af53 100644
> > > > > > > --- a/include/uapi/linux/vfio.h
> > > > > > > +++ b/include/uapi/linux/vfio.h
> > > > > > > @@ -305,6 +305,7 @@ struct vfio_region_info_cap_type {
> > > > > > >  #define VFIO_REGION_TYPE_PCI_VENDOR_MASK (0x)
> > > > > > >  #define VFIO_REGION_TYPE_GFX(1)
> > > > > > >  #define VFIO_REGION_TYPE_CCW (2)
> > > > > > > +#define VFIO_REGION_TYPE_MIGRATION  (3)
> > > > > > >  
> > > > > > >  /* sub-types for VFIO_REGION_TYPE_PCI_* */
> > > > > > >  
> > > > > > > @@ -379,6 +380,232 @@ struct vfio_region_gfx_edid {
> > > > > > >  /* sub-types for VFIO_REGION_TYPE_CCW */
> > > > > > >  #define VFIO_REGION_SUBTYPE_CCW_ASYNC_CMD(1)
> > > > > > >  
> > > > > > > +/* sub-types for VFIO_REGION_TYPE_MIGRATION */
> > > > > > > +#define VFIO_REGION_SUBTYPE_MIGRATION   (1)
> > > > > > > +
> > > > > > > +/*
> > > > > > > + * The structure vfio_device_migration_info is placed at the 0th 
> > > > > > > offset of
> > > > > > > + * the VFIO_REGION_SUBTYPE_MIGRATION region to get and set VFIO 
> > > > > > > device related
> > > > > > > + * migration information. Field accesses from this structure are 
> > > > > > > only supported
> > > > > > > + * at their native width and alignment. Otherwise, the result is 
> > > > > > > undefined and
> > > > > > > + * vendor drivers should return an error.
> > > > > > > + *
> > > > > > > + * device_state: (read/write)
> > > > > > > + *  - The user application writes to this field to inform 
> > > > > > > the vendor driver
> > > > > > > + *about the device state to be transitioned to.
> > > > > > > + *  - The vendor driver should take the necessary actions to 
> > > > > > > change the
> > > > > > > + *device state. After successful transition to a given 
> > > > > > > state, the
> > > > > > > + *vendor driver should return success on 
> > > > > > > write(device_state, state)
> > > > > > > + *system call. If the device state transition fails, the 
> > > > > > > vendor driver
> > > > > > > + *should return an appropriate -errno for the fault 
> > > > > > > condition.
> > > > > > > + *  - On the user application side, if the device state 
> > > > > > > transition fails,
> > > > > > > + * that is, if write(device_state, state) returns an 
> > > > > > > error, read
> > > > > > > + * device_state again to determine the current state of 
> > > > > > > the device from
> > > > > > > + * the vendor driver.
> > > > > > > + *  - The vendor driver should return previous state of the 
> > > > > > > device unless
> > > > > > > + *the vendor driver has encountered an internal error, 
> > > > > > > in which case
> > > > > > > + *the vendor driver may report the device_state 
> > > > > > > VFIO_DEVICE_STATE_ERROR.
> > > > > > > + *  - The user application must use the device reset ioctl 
> > > > > > > to recover the
> > > > > > > + *device from VFIO_DEVICE_STATE_ERROR state. If the 
> > > > > > > device is
> > > > > > > + *indicated to be in a valid device state by reading 
> > > > > > > device_state, the
> > > > > > > + *user application may attempt to transition the device 
> > > > > > > to any valid
> > > > > > > + *state reachable from the current state or terminate 
> > > > > > > itself.
> > > > > > > + *
> > > > > > > + *  device_state consists of 3 bits:
> > > > > > > + *  - If bit 0 is set, it indicates the _RUNNING 

Re: [PATCH v14 Kernel 1/7] vfio: KABI for migration interface for device state

2020-03-19 Thread Alex Williamson
On Thu, 19 Mar 2020 21:30:39 -0400
Yan Zhao  wrote:

> On Thu, Mar 19, 2020 at 09:09:21PM +0800, Alex Williamson wrote:
> > On Thu, 19 Mar 2020 01:05:54 -0400
> > Yan Zhao  wrote:
> >   
> > > On Thu, Mar 19, 2020 at 11:49:26AM +0800, Alex Williamson wrote:  
> > > > On Wed, 18 Mar 2020 21:17:03 -0400
> > > > Yan Zhao  wrote:
> > > > 
> > > > > On Thu, Mar 19, 2020 at 03:41:08AM +0800, Kirti Wankhede wrote:
> > > > > > - Defined MIGRATION region type and sub-type.
> > > > > > 
> > > > > > - Defined vfio_device_migration_info structure which will be placed 
> > > > > > at the
> > > > > >   0th offset of migration region to get/set VFIO device related
> > > > > >   information. Defined members of structure and usage on read/write 
> > > > > > access.
> > > > > > 
> > > > > > - Defined device states and state transition details.
> > > > > > 
> > > > > > - Defined sequence to be followed while saving and resuming VFIO 
> > > > > > device.
> > > > > > 
> > > > > > Signed-off-by: Kirti Wankhede 
> > > > > > Reviewed-by: Neo Jia 
> > > > > > ---
> > > > > >  include/uapi/linux/vfio.h | 227 
> > > > > > ++
> > > > > >  1 file changed, 227 insertions(+)
> > > > > > 
> > > > > > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > > > > > index 9e843a147ead..d0021467af53 100644
> > > > > > --- a/include/uapi/linux/vfio.h
> > > > > > +++ b/include/uapi/linux/vfio.h
> > > > > > @@ -305,6 +305,7 @@ struct vfio_region_info_cap_type {
> > > > > >  #define VFIO_REGION_TYPE_PCI_VENDOR_MASK   (0x)
> > > > > >  #define VFIO_REGION_TYPE_GFX(1)
> > > > > >  #define VFIO_REGION_TYPE_CCW   (2)
> > > > > > +#define VFIO_REGION_TYPE_MIGRATION  (3)
> > > > > >  
> > > > > >  /* sub-types for VFIO_REGION_TYPE_PCI_* */
> > > > > >  
> > > > > > @@ -379,6 +380,232 @@ struct vfio_region_gfx_edid {
> > > > > >  /* sub-types for VFIO_REGION_TYPE_CCW */
> > > > > >  #define VFIO_REGION_SUBTYPE_CCW_ASYNC_CMD  (1)
> > > > > >  
> > > > > > +/* sub-types for VFIO_REGION_TYPE_MIGRATION */
> > > > > > +#define VFIO_REGION_SUBTYPE_MIGRATION   (1)
> > > > > > +
> > > > > > +/*
> > > > > > + * The structure vfio_device_migration_info is placed at the 0th 
> > > > > > offset of
> > > > > > + * the VFIO_REGION_SUBTYPE_MIGRATION region to get and set VFIO 
> > > > > > device related
> > > > > > + * migration information. Field accesses from this structure are 
> > > > > > only supported
> > > > > > + * at their native width and alignment. Otherwise, the result is 
> > > > > > undefined and
> > > > > > + * vendor drivers should return an error.
> > > > > > + *
> > > > > > + * device_state: (read/write)
> > > > > > + *  - The user application writes to this field to inform the 
> > > > > > vendor driver
> > > > > > + *about the device state to be transitioned to.
> > > > > > + *  - The vendor driver should take the necessary actions to 
> > > > > > change the
> > > > > > + *device state. After successful transition to a given 
> > > > > > state, the
> > > > > > + *vendor driver should return success on 
> > > > > > write(device_state, state)
> > > > > > + *system call. If the device state transition fails, the 
> > > > > > vendor driver
> > > > > > + *should return an appropriate -errno for the fault 
> > > > > > condition.
> > > > > > + *  - On the user application side, if the device state 
> > > > > > transition fails,
> > > > > > + *   that is, if write(device_state, state) returns an error, read
> > > > > > + *   device_state again to determine the current state of the 
> > > > > > device from
> > > > > > + *   the vendor driver.
> > > > > > + *  - The vendor driver should return previous state of the 
> > > > > > device unless
> > > > > > + *the vendor driver has encountered an internal error, in 
> > > > > > which case
> > > > > > + *the vendor driver may report the device_state 
> > > > > > VFIO_DEVICE_STATE_ERROR.
> > > > > > + *  - The user application must use the device reset ioctl to 
> > > > > > recover the
> > > > > > + *device from VFIO_DEVICE_STATE_ERROR state. If the device 
> > > > > > is
> > > > > > + *indicated to be in a valid device state by reading 
> > > > > > device_state, the
> > > > > > + *user application may attempt to transition the device to 
> > > > > > any valid
> > > > > > + *state reachable from the current state or terminate 
> > > > > > itself.
> > > > > > + *
> > > > > > + *  device_state consists of 3 bits:
> > > > > > + *  - If bit 0 is set, it indicates the _RUNNING state. If bit 
> > > > > > 0 is clear,
> > > > > > + *it indicates the _STOP state. When the device state is 
> > > > > > changed to
> > > > > > + *_STOP, driver should stop the device before write() 
> > > > > > returns.
> > > > > > + *  - If bit 1 is set, it indicates the _SAVING 

RE: [PATCH v3] block/iscsi:use the flags in iscsi_open() prevent Clang warning

2020-03-19 Thread Chenqun (kuhn)
Gentle ping.

Any other suggestions about this?

Thanks.

>-Original Message-
>From: Chenqun (kuhn)
>Sent: Wednesday, March 11, 2020 11:29 AM
>To: qemu-devel@nongnu.org; qemu-triv...@nongnu.org
>Cc: Zhanghailiang ; Chenqun (kuhn)
>; Euler Robot ;
>Kevin Wolf ; Ronnie Sahlberg
>; Paolo Bonzini ; Peter
>Lieven ; Max Reitz ; Laurent Vivier
>
>Subject: [PATCH v3] block/iscsi:use the flags in iscsi_open() prevent Clang
>warning
>
>Clang static code analyzer show warning:
>  block/iscsi.c:1920:9: warning: Value stored to 'flags' is never read
>flags &= ~BDRV_O_RDWR;
>^
>
>In iscsi_allocmap_init() only checks BDRV_O_NOCACHE, which is the same in
>both of flags and bs->open_flags.
>We can use the flags instead bs->open_flags to prevent Clang warning.
>
>Reported-by: Euler Robot 
>Signed-off-by: Chen Qun 
>Reviewed-by: Kevin Wolf 
>---
>Cc: Ronnie Sahlberg 
>Cc: Paolo Bonzini 
>Cc: Peter Lieven 
>Cc: Kevin Wolf 
>Cc: Max Reitz 
>Cc: Laurent Vivier 
>
>v1->v2:
> Keep the 'flags' then use it(Base on Kevin's comments).
>
>v2->v3:
> Modify subject and commit messages(Base on Kevin's and Laurent's
>comments).
>---
> block/iscsi.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/block/iscsi.c b/block/iscsi.c index 682abd8e09..50bae51700 100644
>--- a/block/iscsi.c
>+++ b/block/iscsi.c
>@@ -2002,7 +2002,7 @@ static int iscsi_open(BlockDriverState *bs, QDict
>*options, int flags,
> iscsilun->cluster_size = iscsilun->bl.opt_unmap_gran *
> iscsilun->block_size;
> if (iscsilun->lbprz) {
>-ret = iscsi_allocmap_init(iscsilun, bs->open_flags);
>+ret = iscsi_allocmap_init(iscsilun, flags);
> }
> }
>
>--
>2.23.0
>



Re: [PATCH v14 Kernel 1/7] vfio: KABI for migration interface for device state

2020-03-19 Thread Yan Zhao
On Thu, Mar 19, 2020 at 09:09:21PM +0800, Alex Williamson wrote:
> On Thu, 19 Mar 2020 01:05:54 -0400
> Yan Zhao  wrote:
> 
> > On Thu, Mar 19, 2020 at 11:49:26AM +0800, Alex Williamson wrote:
> > > On Wed, 18 Mar 2020 21:17:03 -0400
> > > Yan Zhao  wrote:
> > >   
> > > > On Thu, Mar 19, 2020 at 03:41:08AM +0800, Kirti Wankhede wrote:  
> > > > > - Defined MIGRATION region type and sub-type.
> > > > > 
> > > > > - Defined vfio_device_migration_info structure which will be placed 
> > > > > at the
> > > > >   0th offset of migration region to get/set VFIO device related
> > > > >   information. Defined members of structure and usage on read/write 
> > > > > access.
> > > > > 
> > > > > - Defined device states and state transition details.
> > > > > 
> > > > > - Defined sequence to be followed while saving and resuming VFIO 
> > > > > device.
> > > > > 
> > > > > Signed-off-by: Kirti Wankhede 
> > > > > Reviewed-by: Neo Jia 
> > > > > ---
> > > > >  include/uapi/linux/vfio.h | 227 
> > > > > ++
> > > > >  1 file changed, 227 insertions(+)
> > > > > 
> > > > > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > > > > index 9e843a147ead..d0021467af53 100644
> > > > > --- a/include/uapi/linux/vfio.h
> > > > > +++ b/include/uapi/linux/vfio.h
> > > > > @@ -305,6 +305,7 @@ struct vfio_region_info_cap_type {
> > > > >  #define VFIO_REGION_TYPE_PCI_VENDOR_MASK (0x)
> > > > >  #define VFIO_REGION_TYPE_GFX(1)
> > > > >  #define VFIO_REGION_TYPE_CCW (2)
> > > > > +#define VFIO_REGION_TYPE_MIGRATION  (3)
> > > > >  
> > > > >  /* sub-types for VFIO_REGION_TYPE_PCI_* */
> > > > >  
> > > > > @@ -379,6 +380,232 @@ struct vfio_region_gfx_edid {
> > > > >  /* sub-types for VFIO_REGION_TYPE_CCW */
> > > > >  #define VFIO_REGION_SUBTYPE_CCW_ASYNC_CMD(1)
> > > > >  
> > > > > +/* sub-types for VFIO_REGION_TYPE_MIGRATION */
> > > > > +#define VFIO_REGION_SUBTYPE_MIGRATION   (1)
> > > > > +
> > > > > +/*
> > > > > + * The structure vfio_device_migration_info is placed at the 0th 
> > > > > offset of
> > > > > + * the VFIO_REGION_SUBTYPE_MIGRATION region to get and set VFIO 
> > > > > device related
> > > > > + * migration information. Field accesses from this structure are 
> > > > > only supported
> > > > > + * at their native width and alignment. Otherwise, the result is 
> > > > > undefined and
> > > > > + * vendor drivers should return an error.
> > > > > + *
> > > > > + * device_state: (read/write)
> > > > > + *  - The user application writes to this field to inform the 
> > > > > vendor driver
> > > > > + *about the device state to be transitioned to.
> > > > > + *  - The vendor driver should take the necessary actions to 
> > > > > change the
> > > > > + *device state. After successful transition to a given 
> > > > > state, the
> > > > > + *vendor driver should return success on write(device_state, 
> > > > > state)
> > > > > + *system call. If the device state transition fails, the 
> > > > > vendor driver
> > > > > + *should return an appropriate -errno for the fault 
> > > > > condition.
> > > > > + *  - On the user application side, if the device state 
> > > > > transition fails,
> > > > > + * that is, if write(device_state, state) returns an error, read
> > > > > + * device_state again to determine the current state of the 
> > > > > device from
> > > > > + * the vendor driver.
> > > > > + *  - The vendor driver should return previous state of the 
> > > > > device unless
> > > > > + *the vendor driver has encountered an internal error, in 
> > > > > which case
> > > > > + *the vendor driver may report the device_state 
> > > > > VFIO_DEVICE_STATE_ERROR.
> > > > > + *  - The user application must use the device reset ioctl to 
> > > > > recover the
> > > > > + *device from VFIO_DEVICE_STATE_ERROR state. If the device is
> > > > > + *indicated to be in a valid device state by reading 
> > > > > device_state, the
> > > > > + *user application may attempt to transition the device to 
> > > > > any valid
> > > > > + *state reachable from the current state or terminate itself.
> > > > > + *
> > > > > + *  device_state consists of 3 bits:
> > > > > + *  - If bit 0 is set, it indicates the _RUNNING state. If bit 0 
> > > > > is clear,
> > > > > + *it indicates the _STOP state. When the device state is 
> > > > > changed to
> > > > > + *_STOP, driver should stop the device before write() 
> > > > > returns.
> > > > > + *  - If bit 1 is set, it indicates the _SAVING state, which 
> > > > > means that the
> > > > > + *driver should start gathering device state information 
> > > > > that will be
> > > > > + *provided to the VFIO user application to save the device's 
> > > > > state.
> > > > > + *  - If bit 2 is set, it indicates 

Re: [PATCH v4 0/2] add new options to set smbios type 4 fields

2020-03-19 Thread Heyi Guo



On 2020/3/19 22:46, Igor Mammedov wrote:

On Wed, 18 Mar 2020 14:48:18 +0800
Heyi Guo  wrote:


Common VM users sometimes care about CPU speed, so we add two new
options to allow VM vendors to present CPU speed to their users.
Normally these information can be fetched from host smbios.

it's probably too late for this series due to soft-freeze,
pls repost once 5.0 is released


Ah, I didn't pay enough attention to the merge window.

When will the soft-freeze be ended? Will it be announced in the mailing 
list?


Thanks,

Heyi




v3 -> v4:
- Fix the default value when not specifying "-smbios type=4" option;
   it would be 0 instead of 2000 in previous versions
- Use uint64_t type to check value overflow
- Add test case to check smbios type 4 CPU speed

Cc: "Michael S. Tsirkin" 
Cc: Igor Mammedov 
Cc: Philippe Mathieu-Daudé 
Cc: Thomas Huth 
Cc: Laurent Vivier 
Cc: Paolo Bonzini 

Heyi Guo (2):
   hw/smbios: add options for type 4 max-speed and current-speed
   tests/bios-tables-test: add smbios cpu speed test

  hw/smbios/smbios.c | 36 +
  qemu-options.hx|  3 ++-
  tests/qtest/bios-tables-test.c | 42 ++
  3 files changed, 76 insertions(+), 5 deletions(-)



.





Re: [PATCH v14 Kernel 4/7] vfio iommu: Implementation of ioctl for dirty pages tracking.

2020-03-19 Thread Yan Zhao
On Fri, Mar 20, 2020 at 12:57:30AM +0800, Kirti Wankhede wrote:
> 
> 
> On 3/19/2020 6:36 PM, Alex Williamson wrote:
> > On Thu, 19 Mar 2020 02:15:34 -0400
> > Yan Zhao  wrote:
> > 
> >> On Thu, Mar 19, 2020 at 12:40:53PM +0800, Alex Williamson wrote:
> >>> On Thu, 19 Mar 2020 00:15:33 -0400
> >>> Yan Zhao  wrote:
> >>>
>  On Thu, Mar 19, 2020 at 12:01:00PM +0800, Alex Williamson wrote:
> > On Wed, 18 Mar 2020 23:06:39 -0400
> > Yan Zhao  wrote:
> >  
> >> On Thu, Mar 19, 2020 at 03:41:11AM +0800, Kirti Wankhede wrote:
> >>> VFIO_IOMMU_DIRTY_PAGES ioctl performs three operations:
> >>> - Start dirty pages tracking while migration is active
> >>> - Stop dirty pages tracking.
> >>> - Get dirty pages bitmap. Its user space application's responsibility 
> >>> to
> >>>copy content of dirty pages from source to destination during 
> >>> migration.
> >>>
> >>> To prevent DoS attack, memory for bitmap is allocated per vfio_dma
> >>> structure. Bitmap size is calculated considering smallest supported 
> >>> page
> >>> size. Bitmap is allocated for all vfio_dmas when dirty logging is 
> >>> enabled
> >>>
> >>> Bitmap is populated for already pinned pages when bitmap is allocated 
> >>> for
> >>> a vfio_dma with the smallest supported page size. Update bitmap from
> >>> pinning functions when tracking is enabled. When user application 
> >>> queries
> >>> bitmap, check if requested page size is same as page size used to
> >>> populated bitmap. If it is equal, copy bitmap, but if not equal, 
> >>> return
> >>> error.
> >>>
> >>> Signed-off-by: Kirti Wankhede 
> >>> Reviewed-by: Neo Jia 
> >>> ---
> >>>   drivers/vfio/vfio_iommu_type1.c | 205 
> >>> +++-
> >>>   1 file changed, 203 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/vfio/vfio_iommu_type1.c 
> >>> b/drivers/vfio/vfio_iommu_type1.c
> >>> index 70aeab921d0f..d6417fb02174 100644
> >>> --- a/drivers/vfio/vfio_iommu_type1.c
> >>> +++ b/drivers/vfio/vfio_iommu_type1.c
> >>> @@ -71,6 +71,7 @@ struct vfio_iommu {
> >>>   unsigned intdma_avail;
> >>>   boolv2;
> >>>   boolnesting;
> >>> + booldirty_page_tracking;
> >>>   };
> >>>   
> >>>   struct vfio_domain {
> >>> @@ -91,6 +92,7 @@ struct vfio_dma {
> >>>   boollock_cap;   /* 
> >>> capable(CAP_IPC_LOCK) */
> >>>   struct task_struct  *task;
> >>>   struct rb_root  pfn_list;   /* Ex-user pinned pfn 
> >>> list */
> >>> + unsigned long   *bitmap;
> >>>   };
> >>>   
> >>>   struct vfio_group {
> >>> @@ -125,7 +127,10 @@ struct vfio_regions {
> >>>   #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu) \
> >>>   
> >>> (!list_empty(>domain_list))
> >>>   
> >>> +#define DIRTY_BITMAP_BYTES(n)(ALIGN(n, BITS_PER_TYPE(u64)) / 
> >>> BITS_PER_BYTE)
> >>> +
> >>>   static int put_pfn(unsigned long pfn, int prot);
> >>> +static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu);
> >>>   
> >>>   /*
> >>>* This code handles mapping and unmapping of user data buffers
> >>> @@ -175,6 +180,55 @@ static void vfio_unlink_dma(struct vfio_iommu 
> >>> *iommu, struct vfio_dma *old)
> >>>   rb_erase(>node, >dma_list);
> >>>   }
> >>>   
> >>> +static int vfio_dma_bitmap_alloc(struct vfio_iommu *iommu, uint64_t 
> >>> pgsize)
> >>> +{
> >>> + struct rb_node *n = rb_first(>dma_list);
> >>> +
> >>> + for (; n; n = rb_next(n)) {
> >>> + struct vfio_dma *dma = rb_entry(n, struct vfio_dma, 
> >>> node);
> >>> + struct rb_node *p;
> >>> + unsigned long npages = dma->size / pgsize;
> >>> +
> >>> + dma->bitmap = kvzalloc(DIRTY_BITMAP_BYTES(npages), 
> >>> GFP_KERNEL);
> >>> + if (!dma->bitmap) {
> >>> + struct rb_node *p = rb_prev(n);
> >>> +
> >>> + for (; p; p = rb_prev(p)) {
> >>> + struct vfio_dma *dma = rb_entry(n,
> >>> + struct 
> >>> vfio_dma, node);
> >>> +
> >>> + kfree(dma->bitmap);
> >>> + dma->bitmap = NULL;
> >>> + }
> >>> + return -ENOMEM;
> >>> + }
> >>> +
> >>> + if (RB_EMPTY_ROOT(>pfn_list))
> >>> + continue;
> >>> +
> >>> + for (p = rb_first(>pfn_list); p; p = rb_next(p)) {
> >>> +   

Re: [PATCH 0/5] QEMU Gating CI

2020-03-19 Thread Cleber Rosa
On Thu, Mar 19, 2020 at 05:33:01PM +0100, Markus Armbruster wrote:
> Peter Maydell  writes:
> 
> > On Tue, 17 Mar 2020 at 14:13, Cleber Rosa  wrote:
> >>
> >> On Tue, Mar 17, 2020 at 09:29:32AM +, Peter Maydell wrote:
> >> > Ah, I see. My assumption was that this was all stuff that you were
> >> > working on, so that I would then be able to test that it worked 
> >> > correctly,
> >> > not that I would need to do configuration of the gitlab.com setup.
> >
> >> So, I had to use temporary hardware resources to set this up (and set
> >> it up countless times TBH).  I had the understanding based on the list
> >> of machines you documented[1] that at least some of them would be used
> >> for the permanent setup.
> >
> > Well, some of them will be (eg the s390 box), but some of them
> > are my personal ones that can't be reused easily. I'd assumed
> > in any case that gitlab would have at least support for x86 hosts:
> > we are definitely not going to continue to use my desktop machine
> > for running CI builds! Also IIRC RedHat said they'd be able to
> > provide some machines for runners.

While GitLab let's you run x86 code for free with the "Linux Shared
Runners"[1], I don't think it would be suitable to what we're trying
to achieve.  It's limited to a single OS (CoreOS), single architecture
and really geared towards running containers.  BTW, if it isn't clear,
this is the approach being used today for the jobs defined on
".gitlab-ci.yml".

IMO we can leverage and still expand on the use of the "Linux Shared
Runners", but to really get a grasp oh how well this model can work
for QEMU, we'll need "Specific Runners", because we're validating
how/if we can depend on it for OS/architectures they don't support on
shared runners (and sometimes not even for the gitlab-runner agent).

> 
> Correct!  As discussed at the QEMU summit, we'll gladly chip in runners
> to test the stuff we care about, but to match the coverage of your
> private zoo of machines, others will have to chip in, too.
>

I'm sorry I missed the original discussions, and I'm even more sorry
if that led to any misunderstandings here.

> >> OK, I see it, now it makes more sense.  So we're "only" missing the
> >> setup for the machines we'll use for the more permanent setup.  Would
> >> you like to do a staged setup/migration using one or some of the
> >> machines you documented?  I'm 100% onboard to help with this, meaning
> >> that I can assist you with instructions, or do "pair setup" of the
> >> machines if needed.  I think a good part of the evaluation here comes
> >> down to how manageable/reproducible the setup is, so it'd make sense
> >> for one to be part of the setup itself.
> >
> > I think we should start by getting the gitlab setup working
> > for the basic "x86 configs" first. Then we can try adding
> > a runner for s390 (that one's logistically easiest because
> > it is a project machine, not one owned by me personally or
> > by Linaro) once the basic framework is working, and expand
> > from there.
> 
> Makes sense to me.
> 
> Next steps to get this off the ground:
> 
> * Red Hat provides runner(s) for x86 stuff we care about.
> 
> * If that doesn't cover 'basic "x86 configs" in your judgement, we
>   fill the gaps as described below under "Expand from there".
> 
> * Add an s390 runner using the project machine you mentioned.
> 
> * Expand from there: identify the remaining gaps, map them to people /
>   organizations interested in them, and solicit contributions from these
>   guys.
> 
> A note on contributions: we need both hardware and people.  By people I
> mean maintainers for the infrastructure, the tools and all the runners.
> Cleber & team are willing to serve for the infrastructure, the tools and
> the Red Hat runners.

Right, while we've tried to streamline the process of setting up the
machines, there will be plenty of changes to improve the automation.

More importantly, maintaining the machines is very important to the
super important goal of catching code regressions only, and not facing
other failures.  Mundane tasks such as making sure enough disk space
is always available can be completely change the perception of the
usefulness of a CI environment.  And for this maintenance, we need
help from people "owning" those machines.

> 
> Does this sound workable?
> 
> > But to a large degree I really don't want to have to get
> > into the details of how gitlab works or setting up runners
> > myself if I can avoid it. We're going through this migration
> > because I want to be able to hand off the CI stuff to other
> > people, not to retain control of it.
> 
> Understand.  We need contributions to gating CI, but the whole point of
> this exercise is to make people other than *you* contribute to our
> gating CI :)
> 
> Let me use this opportunity to say thank you for all your integration
> work!
> 
>

^ THIS.  I have to say that I'm still amazed as to how Peter has
managed to automate, integrate and run all those tests in 

[PATCH v2 0/2] Replaced locks with lock guard macros

2020-03-19 Thread dnbrdsky
From: Daniel Brodsky 

This patch set adds:
- a fix for lock guard macros so they can be used multiple times in
the same function
- replacement of locks with lock guards where appropriate

v1 -> v2:
- fixed whitespace churn
- added cover letter so patch set referenced correctly

Daniel Brodsky (2):
  lockable: fix __COUNTER__ macro to be referenced properly
  lockable: replaced locks with lock guard macros where appropriate

 block/iscsi.c   | 11 +++--
 block/nfs.c | 51 +++--
 cpus-common.c   | 13 ---
 hw/display/qxl.c| 43 --
 hw/vfio/platform.c  |  4 +---
 include/qemu/lockable.h |  2 +-
 include/qemu/rcu.h  |  2 +-
 migration/migration.c   |  3 +--
 migration/multifd.c |  8 +++
 migration/ram.c |  3 +--
 monitor/misc.c  |  4 +---
 ui/spice-display.c  | 14 +--
 util/log.c  |  4 ++--
 util/qemu-timer.c   | 17 +++---
 util/rcu.c  |  8 +++
 util/thread-pool.c  |  3 +--
 util/vfio-helpers.c |  4 ++--
 17 files changed, 85 insertions(+), 109 deletions(-)

-- 
2.25.1




[PATCH v2 2/2] lockable: replaced locks with lock guard macros where appropriate

2020-03-19 Thread dnbrdsky
From: Daniel Brodsky 

- ran regexp "qemu_mutex_lock\(.*\).*\n.*if" to find targets
- replaced result with QEMU_LOCK_GUARD if all unlocks at function end
- replaced result with WITH_QEMU_LOCK_GUARD if unlock not at end

Signed-off-by: Daniel Brodsky 
---
 block/iscsi.c | 11 +++---
 block/nfs.c   | 51 ---
 cpus-common.c | 13 +--
 hw/display/qxl.c  | 43 +---
 hw/vfio/platform.c|  4 +---
 migration/migration.c |  3 +--
 migration/multifd.c   |  8 +++
 migration/ram.c   |  3 +--
 monitor/misc.c|  4 +---
 ui/spice-display.c| 14 ++--
 util/log.c|  4 ++--
 util/qemu-timer.c | 17 +++
 util/rcu.c|  8 +++
 util/thread-pool.c|  3 +--
 util/vfio-helpers.c   |  4 ++--
 15 files changed, 83 insertions(+), 107 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 682abd8e09..89f8a656a4 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -1086,7 +1086,7 @@ static BlockAIOCB *iscsi_aio_ioctl(BlockDriverState *bs,
 acb->task->expxferlen = acb->ioh->dxfer_len;
 
 data.size = 0;
-qemu_mutex_lock(>mutex);
+QEMU_LOCK_GUARD(>mutex);
 if (acb->task->xfer_dir == SCSI_XFER_WRITE) {
 if (acb->ioh->iovec_count == 0) {
 data.data = acb->ioh->dxferp;
@@ -1102,7 +1102,6 @@ static BlockAIOCB *iscsi_aio_ioctl(BlockDriverState *bs,
  iscsi_aio_ioctl_cb,
  (data.size > 0) ?  : NULL,
  acb) != 0) {
-qemu_mutex_unlock(>mutex);
 scsi_free_scsi_task(acb->task);
 qemu_aio_unref(acb);
 return NULL;
@@ -1122,7 +1121,6 @@ static BlockAIOCB *iscsi_aio_ioctl(BlockDriverState *bs,
 }
 
 iscsi_set_events(iscsilun);
-qemu_mutex_unlock(>mutex);
 
 return >common;
 }
@@ -1395,20 +1393,17 @@ static void iscsi_nop_timed_event(void *opaque)
 {
 IscsiLun *iscsilun = opaque;
 
-qemu_mutex_lock(>mutex);
+QEMU_LOCK_GUARD(>mutex);
 if (iscsi_get_nops_in_flight(iscsilun->iscsi) >= MAX_NOP_FAILURES) {
 error_report("iSCSI: NOP timeout. Reconnecting...");
 iscsilun->request_timed_out = true;
 } else if (iscsi_nop_out_async(iscsilun->iscsi, NULL, NULL, 0, NULL) != 0) 
{
 error_report("iSCSI: failed to sent NOP-Out. Disabling NOP messages.");
-goto out;
+return;
 }
 
 timer_mod(iscsilun->nop_timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + 
NOP_INTERVAL);
 iscsi_set_events(iscsilun);
-
-out:
-qemu_mutex_unlock(>mutex);
 }
 
 static void iscsi_readcapacity_sync(IscsiLun *iscsilun, Error **errp)
diff --git a/block/nfs.c b/block/nfs.c
index 9a6311e270..09a78aede8 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -273,15 +273,14 @@ static int coroutine_fn nfs_co_preadv(BlockDriverState 
*bs, uint64_t offset,
 nfs_co_init_task(bs, );
 task.iov = iov;
 
-qemu_mutex_lock(>mutex);
-if (nfs_pread_async(client->context, client->fh,
-offset, bytes, nfs_co_generic_cb, ) != 0) {
-qemu_mutex_unlock(>mutex);
-return -ENOMEM;
-}
+WITH_QEMU_LOCK_GUARD(>mutex) {
+if (nfs_pread_async(client->context, client->fh,
+offset, bytes, nfs_co_generic_cb, ) != 0) {
+return -ENOMEM;
+}
 
-nfs_set_events(client);
-qemu_mutex_unlock(>mutex);
+nfs_set_events(client);
+}
 while (!task.complete) {
 qemu_coroutine_yield();
 }
@@ -320,19 +319,18 @@ static int coroutine_fn nfs_co_pwritev(BlockDriverState 
*bs, uint64_t offset,
 buf = iov->iov[0].iov_base;
 }
 
-qemu_mutex_lock(>mutex);
-if (nfs_pwrite_async(client->context, client->fh,
- offset, bytes, buf,
- nfs_co_generic_cb, ) != 0) {
-qemu_mutex_unlock(>mutex);
-if (my_buffer) {
-g_free(buf);
+WITH_QEMU_LOCK_GUARD(>mutex) {
+if (nfs_pwrite_async(client->context, client->fh,
+ offset, bytes, buf,
+ nfs_co_generic_cb, ) != 0) {
+if (my_buffer) {
+g_free(buf);
+}
+return -ENOMEM;
 }
-return -ENOMEM;
-}
 
-nfs_set_events(client);
-qemu_mutex_unlock(>mutex);
+nfs_set_events(client);
+}
 while (!task.complete) {
 qemu_coroutine_yield();
 }
@@ -355,15 +353,14 @@ static int coroutine_fn nfs_co_flush(BlockDriverState *bs)
 
 nfs_co_init_task(bs, );
 
-qemu_mutex_lock(>mutex);
-if (nfs_fsync_async(client->context, client->fh, nfs_co_generic_cb,
-) != 0) {
-qemu_mutex_unlock(>mutex);
-return -ENOMEM;
-}
+WITH_QEMU_LOCK_GUARD(>mutex) {
+if (nfs_fsync_async(client->context, client->fh, nfs_co_generic_cb,
+  

[PATCH v2 1/2] lockable: fix __COUNTER__ macro to be referenced properly

2020-03-19 Thread dnbrdsky
From: Daniel Brodsky 

- __COUNTER__ doesn't work with ## concat
- replaced ## with glue() macro so __COUNTER__ is evaluated

Fixes: 3284c3ddc4

Signed-off-by: Daniel Brodsky 
---
 include/qemu/lockable.h | 2 +-
 include/qemu/rcu.h  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/qemu/lockable.h b/include/qemu/lockable.h
index 1aeb2cb1a6..a9258f2c2c 100644
--- a/include/qemu/lockable.h
+++ b/include/qemu/lockable.h
@@ -170,7 +170,7 @@ G_DEFINE_AUTOPTR_CLEANUP_FUNC(QemuLockable, 
qemu_lockable_auto_unlock)
  *   }
  */
 #define QEMU_LOCK_GUARD(x) \
-g_autoptr(QemuLockable) qemu_lockable_auto##__COUNTER__ = \
+g_autoptr(QemuLockable) glue(qemu_lockable_auto, __COUNTER__) = \
 qemu_lockable_auto_lock(QEMU_MAKE_LOCKABLE((x)))
 
 #endif
diff --git a/include/qemu/rcu.h b/include/qemu/rcu.h
index 9c82683e37..570aa603eb 100644
--- a/include/qemu/rcu.h
+++ b/include/qemu/rcu.h
@@ -170,7 +170,7 @@ static inline void rcu_read_auto_unlock(RCUReadAuto *r)
 G_DEFINE_AUTOPTR_CLEANUP_FUNC(RCUReadAuto, rcu_read_auto_unlock)
 
 #define WITH_RCU_READ_LOCK_GUARD() \
-WITH_RCU_READ_LOCK_GUARD_(_rcu_read_auto##__COUNTER__)
+WITH_RCU_READ_LOCK_GUARD_(glue(_rcu_read_auto, __COUNTER__))
 
 #define WITH_RCU_READ_LOCK_GUARD_(var) \
 for (g_autoptr(RCUReadAuto) var = rcu_read_auto_lock(); \
-- 
2.25.1




Re: [PATCH v15 Kernel 4/7] vfio iommu: Implementation of ioctl for dirty pages tracking.

2020-03-19 Thread Alex Williamson
On Fri, 20 Mar 2020 01:46:41 +0530
Kirti Wankhede  wrote:

> VFIO_IOMMU_DIRTY_PAGES ioctl performs three operations:
> - Start dirty pages tracking while migration is active
> - Stop dirty pages tracking.
> - Get dirty pages bitmap. Its user space application's responsibility to
>   copy content of dirty pages from source to destination during migration.
> 
> To prevent DoS attack, memory for bitmap is allocated per vfio_dma
> structure. Bitmap size is calculated considering smallest supported page
> size. Bitmap is allocated for all vfio_dmas when dirty logging is enabled
> 
> Bitmap is populated for already pinned pages when bitmap is allocated for
> a vfio_dma with the smallest supported page size. Update bitmap from
> pinning functions when tracking is enabled. When user application queries
> bitmap, check if requested page size is same as page size used to
> populated bitmap. If it is equal, copy bitmap, but if not equal, return
> error.
> 
> Signed-off-by: Kirti Wankhede 
> Reviewed-by: Neo Jia 
> ---
>  drivers/vfio/vfio_iommu_type1.c | 242 
> +++-
>  1 file changed, 236 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 70aeab921d0f..239f61764d03 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -71,6 +71,7 @@ struct vfio_iommu {
>   unsigned intdma_avail;
>   boolv2;
>   boolnesting;
> + booldirty_page_tracking;
>  };
>  
>  struct vfio_domain {
> @@ -91,6 +92,7 @@ struct vfio_dma {
>   boollock_cap;   /* capable(CAP_IPC_LOCK) */
>   struct task_struct  *task;
>   struct rb_root  pfn_list;   /* Ex-user pinned pfn list */
> + unsigned long   *bitmap;
>  };
>  
>  struct vfio_group {
> @@ -125,7 +127,21 @@ struct vfio_regions {
>  #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)  \
>   (!list_empty(>domain_list))
>  
> +#define DIRTY_BITMAP_BYTES(n)(ALIGN(n, BITS_PER_TYPE(u64)) / 
> BITS_PER_BYTE)
> +
> +/*
> + * Input argument of number of bits to bitmap_set() is unsigned integer, 
> which
> + * further casts to signed integer for unaligned multi-bit operation,
> + * __bitmap_set().
> + * Then maximum bitmap size supported is 2^31 bits divided by 2^3 bits/byte,
> + * that is 2^28 (256 MB) which maps to 2^31 * 2^12 = 2^43 (8TB) on 4K page
> + * system.
> + */
> +#define DIRTY_BITMAP_PAGES_MAX   ((1UL << 31) - 1)
> +#define DIRTY_BITMAP_SIZE_MAX 
> DIRTY_BITMAP_BYTES(DIRTY_BITMAP_PAGES_MAX)
> +
>  static int put_pfn(unsigned long pfn, int prot);
> +static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu);
>  
>  /*
>   * This code handles mapping and unmapping of user data buffers
> @@ -175,6 +191,67 @@ static void vfio_unlink_dma(struct vfio_iommu *iommu, 
> struct vfio_dma *old)
>   rb_erase(>node, >dma_list);
>  }
>  
> +
> +static int vfio_dma_bitmap_alloc(struct vfio_dma *dma, uint64_t pgsize)
> +{
> + uint64_t npages = dma->size / pgsize;
> +

Shouldn't we test this against one of the MAX macros defined above?  It
would be bad if we could enabled dirty tracking but not allow the user
to retrieve it.

> + dma->bitmap = kvzalloc(DIRTY_BITMAP_BYTES(npages), GFP_KERNEL);
> + if (!dma->bitmap)
> + return -ENOMEM;
> +
> + return 0;
> +}
> +
> +static int vfio_dma_bitmap_alloc_all(struct vfio_iommu *iommu, uint64_t 
> pgsize)
> +{
> + struct rb_node *n = rb_first(>dma_list);
> +
> + for (; n; n = rb_next(n)) {
> + struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
> + struct rb_node *p;
> + int ret;
> +
> + ret = vfio_dma_bitmap_alloc(dma, pgsize);
> + if (ret) {
> + struct rb_node *p = rb_prev(n);
> +
> + for (; p; p = rb_prev(p)) {
> + struct vfio_dma *dma = rb_entry(n,
> + struct vfio_dma, node);
> +
> + kfree(dma->bitmap);
> + dma->bitmap = NULL;
> + }
> + return ret;
> + }
> +
> + if (RB_EMPTY_ROOT(>pfn_list))
> + continue;
> +
> + for (p = rb_first(>pfn_list); p; p = rb_next(p)) {
> + struct vfio_pfn *vpfn = rb_entry(p, struct vfio_pfn,
> +  node);
> +
> + bitmap_set(dma->bitmap,
> +(vpfn->iova - dma->iova) / pgsize, 1);
> + }
> + }
> + return 0;
> +}
> +
> +static void vfio_dma_bitmap_free_all(struct vfio_iommu *iommu)
> +{
> + struct rb_node *n = rb_first(>dma_list);
> +
> + for (; n; n = rb_next(n)) {
> +

Re: [PATCH] ext4: Give 32bit personalities 32bit hashes

2020-03-19 Thread Linus Walleij
On Thu, Mar 19, 2020 at 4:25 PM Peter Maydell  wrote:
> On Thu, 19 Mar 2020 at 15:13, Linus Walleij  wrote:
> > On Tue, Mar 17, 2020 at 12:58 PM Peter Maydell  
> > wrote:
> > > What in particular does this personality setting affect?
> > > My copy of the personality(2) manpage just says:
> > >
> > >PER_LINUX32 (since Linux 2.2)
> > >   [To be documented.]
> > >
> > > which isn't very informative.
> >
> > It's not a POSIX thing (not part of the Single Unix Specification)
> > so as with most Linux things it has some fuzzy semantics
> > defined by the community...
> >
> > I usually just go to the source.
>
> If we're going to decide that this is the way to say
> "give me 32-bit semantics" we need to actually document
> that and define in at least broad terms what we mean
> by it, so that when new things are added that might or
> might not check against the setting there is a reference
> defining whether they should or not, and so that
> userspace knows what it's opting into by setting the flag.
> The kernel loves undocumented APIs but userspace
> consumers of them are not so enamoured :-)

OK I guess we can at least take this opportunity to add
some kerneldoc to the include file.

> As a concrete example, should "give me 32-bit semantics
> via PER_LINUX32" mean "mmap should always return addresses
> within 4GB" ? That would seem like it would make sense --

Incidentally that thing in particular has its own personality
flag (personalities are additive, it's a bit schizophrenic)
so PER_LINUX_32BIT is defined as:
PER_LINUX_32BIT =   0x | ADDR_LIMIT_32BIT,
and that is specifically for limiting the address space to
32bit.

There is also PER_LINUX32_3GB for a 3GB lowmem
limit.

Since the personality is kind of additive, if
we want a flag *specifically* for indicating that we want
32bit hashes from the file system, there are bits left so we
can provide that.

Is this what we want to do? I just think we shouldn't
decide on that lightly as we will be using up personality
bug bits, but sometimes you have to use them.

PER_LINUX32 as it stands means 32bit personality
but very specifically does not include memory range
limitations since that has its own flags.

Yours,
Linus Walleij



Re: [PATCH] lockable: replaced locks with lock guard macros where appropriate

2020-03-19 Thread Daniel Brodsky
On Thu, Mar 19, 2020 at 1:53 PM Eric Blake  wrote:

>
> Hmm. This one is a different failure than the other patchew warnings
> about variable redefinition; but is still evidence that it is missing
> your "[PATCH] misc: fix __COUNTER__ macro to be referenced properly".
> At any rate, the fact that we have a compiler warning about an unused
> variable (when in reality it IS used by the auto-cleanup attribute) is
> annoying; we may have to further tweak QEMU_LOCK_GUARD to add an
> __attribute__((unused)) to shut up this particular compiler false positive.
>
> This might fix itself once I revise the patch to properly reference the
prior patch
before this one. If not then I can add another patch to get rid of the
false positive.


Re: [PATCH] lockable: replaced locks with lock guard macros where appropriate

2020-03-19 Thread Daniel Brodsky
On Thu, Mar 19, 2020 at 1:48 PM Eric Blake  wrote:
>
> On 3/19/20 11:19 AM, dnbrd...@gmail.com wrote:
> > From: danbrodsky 
> >
> > - ran regexp "qemu_mutex_lock\(.*\).*\n.*if" to find targets
> > - replaced result with QEMU_LOCK_GUARD if all unlocks at function end
> > - replaced result with WITH_QEMU_LOCK_GUARD if unlock not at end
> >
> > Signed-off-by: danbrodsky 
> > ---
> >   block/iscsi.c | 23 +++
> >   block/nfs.c   | 53 ---
> >   cpus-common.c | 13 ---
> >   hw/display/qxl.c  | 44 +--
> >   hw/vfio/platform.c|  4 +---
> >   migration/migration.c |  3 +--
> >   migration/multifd.c   |  8 +++
> >   migration/ram.c   |  3 +--
> >   monitor/misc.c|  4 +---
> >   ui/spice-display.c| 14 ++--
> >   util/log.c|  4 ++--
> >   util/qemu-timer.c | 17 +++---
> >   util/rcu.c|  8 +++
> >   util/thread-pool.c|  3 +--
> >   util/vfio-helpers.c   |  4 ++--
> >   15 files changed, 90 insertions(+), 115 deletions(-)
>
> That's a rather big patch touching multiple areas of code at once; I'm
> not sure if it would be easier to review if you were to break it up into
> a series of smaller patches each touching a smaller group of related
> files.  For example, I don't mind reviwing block/, but tend to shy away
> from migration/ code.

Is this necessary for a series of fairly basic changes? Most files are only
modified on 1 or 2 lines.
>
> >
> > diff --git a/block/iscsi.c b/block/iscsi.c
> > index 682abd8e09..df73bde114 100644
> > --- a/block/iscsi.c
> > +++ b/block/iscsi.c
> > @@ -1086,23 +1086,21 @@ static BlockAIOCB
*iscsi_aio_ioctl(BlockDriverState *bs,
> >   acb->task->expxferlen = acb->ioh->dxfer_len;
> >
> >   data.size = 0;
> > -qemu_mutex_lock(>mutex);
> > +QEMU_LOCK_GUARD(>mutex);
> >   if (acb->task->xfer_dir == SCSI_XFER_WRITE) {
> >   if (acb->ioh->iovec_count == 0) {
> >   data.data = acb->ioh->dxferp;
> >   data.size = acb->ioh->dxfer_len;
> >   } else {
> >   scsi_task_set_iov_out(acb->task,
> > - (struct scsi_iovec *)
acb->ioh->dxferp,
> > - acb->ioh->iovec_count);
> > +  (struct scsi_iovec
*)acb->ioh->dxferp,
> > +  acb->ioh->iovec_count);
>
> This looks like a spurious whitespace change.  Why is it part of the
patch?
>

Sorry, it looks like my editor was autoformatting some areas of the text.
I'll remove
those changes in the next version.

> >   }
> >   }
> >
> >   if (iscsi_scsi_command_async(iscsi, iscsilun->lun, acb->task,
> >iscsi_aio_ioctl_cb,
> > - (data.size > 0) ?  : NULL,
> > - acb) != 0) {
> > -qemu_mutex_unlock(>mutex);
> > + (data.size > 0) ?  : NULL, acb)
!= 0) {
> >   scsi_free_scsi_task(acb->task);
>
> Unwrapping the line fit in 80 columns, but again, why are you mixing
> whitespace changes in rather than focusing on the cleanup of mutex
> actions?  Did you create this patch mechanically with a tool like
> Coccinelle, as the source of your reflowing of lines?  If so, what was
> the input to Coccinelle; if it was some other automated tool, can you
> include the formula so that someone else could reproduce your changes
> (whitespace and all)?  If it was not automated, that's also okay, but
> then I would not expect as much whitespace churn.
>

Should I not be including changes that fix warnings in code check? I'll
correct
the mistakes and submit a new version without all the whitespace churn.


Re: [PATCH v14 Kernel 4/7] vfio iommu: Implementation of ioctl for dirty pages tracking.

2020-03-19 Thread Alex Williamson
On Fri, 20 Mar 2020 01:55:10 +0530
Kirti Wankhede  wrote:

> On 3/19/2020 9:52 PM, Alex Williamson wrote:
> > On Thu, 19 Mar 2020 20:22:41 +0530
> > Kirti Wankhede  wrote:
> >   
> >> On 3/19/2020 9:15 AM, Alex Williamson wrote:  
> >>> On Thu, 19 Mar 2020 01:11:11 +0530
> >>> Kirti Wankhede  wrote:
> >>>  
> 
> 
> 
>  +
>  +static int verify_bitmap_size(uint64_t npages, uint64_t bitmap_size)
>  +{
>  +uint64_t bsize;
>  +
>  +if (!npages || !bitmap_size || bitmap_size > UINT_MAX)  
> >>>
> >>> As commented previously, how do we derive this UINT_MAX limitation?
> >>>  
> >>
> >> Sorry, I missed that earlier
> >>  
> >>   > UINT_MAX seems arbitrary, is this specified in our API?  The size of a
> >>   > vfio_dma is limited to what the user is able to pin, and therefore
> >>   > their locked memory limit, but do we have an explicit limit elsewhere
> >>   > that results in this limit here.  I think a 4GB bitmap would track
> >>   > something like 2^47 bytes of memory, that's pretty excessive, but still
> >>   > an arbitrary limit.  
> >>
> >> There has to be some upper limit check. In core KVM, in
> >> virt/kvm/kvm_main.c there is max number of pages check:
> >>
> >> if (new.npages > KVM_MEM_MAX_NR_PAGES)
> >>
> >> Where
> >> /*
> >>* Some of the bitops functions do not support too long bitmaps.
> >>* This number must be determined not to exceed such limits.
> >>*/
> >> #define KVM_MEM_MAX_NR_PAGES ((1UL << 31) - 1)
> >>
> >> Though I don't know which bitops functions do not support long bitmaps.
> >>
> >> Something similar as above can be done or same as you also mentioned of
> >> 4GB bitmap limit? that is U32_MAX instead of UINT_MAX?  
> > 
> > Let's see, we use bitmap_set():
> > 
> > void bitmap_set(unsigned long *map, unsigned int start, unsigned int nbits)
> > 
> > So we're limited to an unsigned int number of bits, but for an
> > unaligned, multi-bit operation this will call __bitmap_set():
> > 
> > void __bitmap_set(unsigned long *map, unsigned int start, int len)
> > 
> > So we're down to a signed int number of bits (seems like an API bug in
> > bitops there), so it makes sense that KVM is testing against MAX_INT
> > number of pages, ie. number of bits.  But that still suggests a bitmap
> > size of MAX_UINT is off by a factor of 16.  So we can have 2^31 bits
> > divided by 2^3 bits/byte yields a maximum bitmap size of 2^28 (ie.
> > 256MB), which maps 2^31 * 2^12 = 2^43 (8TB) on a 4K system.
> > 
> > Let's fix the limit check and put a nice comment explaining it.  Thanks,
> >   
> 
> Agreed. Adding DIRTY_BITMAP_SIZE_MAX macro and comment as below.
> 
> /*
>   * Input argument of number of bits to bitmap_set() is unsigned 
> integer, which
>   * further casts to signed integer for unaligned multi-bit operation,
>   * __bitmap_set().
>   * Then maximum bitmap size supported is 2^31 bits divided by 2^3 
> bits/byte,
>   * that is 2^28 (256 MB) which maps to 2^31 * 2^12 = 2^43 (8TB) on 4K page
>   * system.
>   */
> #define DIRTY_BITMAP_PAGES_MAX  ((1UL << 31) - 1)

nit, can we just use INT_MAX here?

> #define DIRTY_BITMAP_SIZE_MAX \
>   DIRTY_BITMAP_BYTES(DIRTY_BITMAP_PAGES_MAX)
> 
> 
> Thanks,
> Kirti
> 




[PATCH v15 Kernel 6/7] vfio iommu: Adds flag to indicate dirty pages tracking capability support

2020-03-19 Thread Kirti Wankhede
Flag VFIO_IOMMU_INFO_DIRTY_PGS in VFIO_IOMMU_GET_INFO indicates that driver
support dirty pages tracking.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 drivers/vfio/vfio_iommu_type1.c | 3 ++-
 include/uapi/linux/vfio.h   | 5 +++--
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index e79c1ff6fb41..dce0a3e1e8b7 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -2368,7 +2368,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
info.cap_offset = 0; /* output, no-recopy necessary */
}
 
-   info.flags = VFIO_IOMMU_INFO_PGSIZES;
+   info.flags = VFIO_IOMMU_INFO_PGSIZES |
+VFIO_IOMMU_INFO_DIRTY_PGS;
 
info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
 
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 2780a5742c04..4a886ff84c92 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -947,8 +947,9 @@ struct vfio_device_ioeventfd {
 struct vfio_iommu_type1_info {
__u32   argsz;
__u32   flags;
-#define VFIO_IOMMU_INFO_PGSIZES (1 << 0)   /* supported page sizes info */
-#define VFIO_IOMMU_INFO_CAPS   (1 << 1)/* Info supports caps */
+#define VFIO_IOMMU_INFO_PGSIZES   (1 << 0) /* supported page sizes info */
+#define VFIO_IOMMU_INFO_CAPS  (1 << 1) /* Info supports caps */
+#define VFIO_IOMMU_INFO_DIRTY_PGS (1 << 2) /* supports dirty page tracking */
__u64   iova_pgsizes;   /* Bitmap of supported page sizes */
__u32   cap_offset; /* Offset within info struct of first cap */
 };
-- 
2.7.0




[PATCH v15 Kernel 3/7] vfio iommu: Add ioctl definition for dirty pages tracking.

2020-03-19 Thread Kirti Wankhede
IOMMU container maintains a list of all pages pinned by vfio_pin_pages API.
All pages pinned by vendor driver through this API should be considered as
dirty during migration. When container consists of IOMMU capable device and
all pages are pinned and mapped, then all pages are marked dirty.
Added support to start/stop dirtied pages tracking and to get bitmap of all
dirtied pages for requested IO virtual address range.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 include/uapi/linux/vfio.h | 55 +++
 1 file changed, 55 insertions(+)

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index d0021467af53..8138f94cac15 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -995,6 +995,12 @@ struct vfio_iommu_type1_dma_map {
 
 #define VFIO_IOMMU_MAP_DMA _IO(VFIO_TYPE, VFIO_BASE + 13)
 
+struct vfio_bitmap {
+   __u64pgsize;/* page size for bitmap */
+   __u64size;  /* in bytes */
+   __u64 __user *data; /* one bit per page */
+};
+
 /**
  * VFIO_IOMMU_UNMAP_DMA - _IOWR(VFIO_TYPE, VFIO_BASE + 14,
  * struct vfio_dma_unmap)
@@ -1021,6 +1027,55 @@ struct vfio_iommu_type1_dma_unmap {
 #define VFIO_IOMMU_ENABLE  _IO(VFIO_TYPE, VFIO_BASE + 15)
 #define VFIO_IOMMU_DISABLE _IO(VFIO_TYPE, VFIO_BASE + 16)
 
+/**
+ * VFIO_IOMMU_DIRTY_PAGES - _IOWR(VFIO_TYPE, VFIO_BASE + 17,
+ * struct vfio_iommu_type1_dirty_bitmap)
+ * IOCTL is used for dirty pages tracking. Caller sets argsz, which is size of
+ * struct vfio_iommu_type1_dirty_bitmap. Caller set flag depend on which
+ * operation to perform, details as below:
+ *
+ * When IOCTL is called with VFIO_IOMMU_DIRTY_PAGES_FLAG_START set, indicates
+ * migration is active and IOMMU module should track pages which are dirtied or
+ * potentially dirtied by device.
+ * Dirty pages are tracked until tracking is stopped by user application by
+ * setting VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP flag.
+ *
+ * When IOCTL is called with VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP set, indicates
+ * IOMMU should stop tracking dirtied pages.
+ *
+ * When IOCTL is called with VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP flag set,
+ * IOCTL returns dirty pages bitmap for IOMMU container during migration for
+ * given IOVA range. User must provide data[] as the structure
+ * vfio_iommu_type1_dirty_bitmap_get through which user provides IOVA range and
+ * pgsize. This interface supports to get bitmap of smallest supported pgsize
+ * only and can be modified in future to get bitmap of specified pgsize.
+ * User must allocate memory for bitmap, zero the bitmap memory and set size
+ * of allocated memory in bitmap_size field. One bit is used to represent one
+ * page consecutively starting from iova offset. User should provide page size
+ * in 'pgsize'. Bit set in bitmap indicates page at that offset from iova is
+ * dirty. Caller must set argsz including size of structure
+ * vfio_iommu_type1_dirty_bitmap_get.
+ *
+ * Only one of the flags _START, STOP and _GET may be specified at a time.
+ *
+ */
+struct vfio_iommu_type1_dirty_bitmap {
+   __u32argsz;
+   __u32flags;
+#define VFIO_IOMMU_DIRTY_PAGES_FLAG_START  (1 << 0)
+#define VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP   (1 << 1)
+#define VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP (1 << 2)
+   __u8 data[];
+};
+
+struct vfio_iommu_type1_dirty_bitmap_get {
+   __u64  iova;/* IO virtual address */
+   __u64  size;/* Size of iova range */
+   struct vfio_bitmap bitmap;
+};
+
+#define VFIO_IOMMU_DIRTY_PAGES _IO(VFIO_TYPE, VFIO_BASE + 17)
+
 /*  Additional API for SPAPR TCE (Server POWERPC) IOMMU  */
 
 /*
-- 
2.7.0




[PATCH v15 Kernel 7/7] vfio: Selective dirty page tracking if IOMMU backed device pins pages

2020-03-19 Thread Kirti Wankhede
Added a check such that only singleton IOMMU groups can pin pages.
>From the point when vendor driver pins any pages, consider IOMMU group
dirty page scope to be limited to pinned pages.

To optimize to avoid walking list often, added flag
pinned_page_dirty_scope to indicate if all of the vfio_groups for each
vfio_domain in the domain_list dirty page scope is limited to pinned
pages. This flag is updated on first pinned pages request for that IOMMU
group and on attaching/detaching group.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 drivers/vfio/vfio.c | 13 --
 drivers/vfio/vfio_iommu_type1.c | 94 +++--
 include/linux/vfio.h|  4 +-
 3 files changed, 104 insertions(+), 7 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 210fcf426643..311b5e4e111e 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -85,6 +85,7 @@ struct vfio_group {
atomic_topened;
wait_queue_head_t   container_q;
boolnoiommu;
+   unsigned intdev_counter;
struct kvm  *kvm;
struct blocking_notifier_head   notifier;
 };
@@ -555,6 +556,7 @@ struct vfio_device *vfio_group_create_device(struct 
vfio_group *group,
 
mutex_lock(>device_lock);
list_add(>group_next, >device_list);
+   group->dev_counter++;
mutex_unlock(>device_lock);
 
return device;
@@ -567,6 +569,7 @@ static void vfio_device_release(struct kref *kref)
struct vfio_group *group = device->group;
 
list_del(>group_next);
+   group->dev_counter--;
mutex_unlock(>device_lock);
 
dev_set_drvdata(device->dev, NULL);
@@ -1933,6 +1936,9 @@ int vfio_pin_pages(struct device *dev, unsigned long 
*user_pfn, int npage,
if (!group)
return -ENODEV;
 
+   if (group->dev_counter > 1)
+   return -EINVAL;
+
ret = vfio_group_add_container_user(group);
if (ret)
goto err_pin_pages;
@@ -1940,7 +1946,8 @@ int vfio_pin_pages(struct device *dev, unsigned long 
*user_pfn, int npage,
container = group->container;
driver = container->iommu_driver;
if (likely(driver && driver->ops->pin_pages))
-   ret = driver->ops->pin_pages(container->iommu_data, user_pfn,
+   ret = driver->ops->pin_pages(container->iommu_data,
+group->iommu_group, user_pfn,
 npage, prot, phys_pfn);
else
ret = -ENOTTY;
@@ -2038,8 +2045,8 @@ int vfio_group_pin_pages(struct vfio_group *group,
driver = container->iommu_driver;
if (likely(driver && driver->ops->pin_pages))
ret = driver->ops->pin_pages(container->iommu_data,
-user_iova_pfn, npage,
-prot, phys_pfn);
+group->iommu_group, user_iova_pfn,
+npage, prot, phys_pfn);
else
ret = -ENOTTY;
 
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index dce0a3e1e8b7..881abfc04f0a 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -72,6 +72,7 @@ struct vfio_iommu {
boolv2;
boolnesting;
booldirty_page_tracking;
+   boolpinned_page_dirty_scope;
 };
 
 struct vfio_domain {
@@ -99,6 +100,7 @@ struct vfio_group {
struct iommu_group  *iommu_group;
struct list_headnext;
boolmdev_group; /* An mdev group */
+   boolpinned_page_dirty_scope;
 };
 
 struct vfio_iova {
@@ -143,6 +145,10 @@ struct vfio_regions {
 static int put_pfn(unsigned long pfn, int prot);
 static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu);
 
+static struct vfio_group *vfio_iommu_find_iommu_group(struct vfio_iommu *iommu,
+  struct iommu_group *iommu_group);
+
+static void update_pinned_page_dirty_scope(struct vfio_iommu *iommu);
 /*
  * This code handles mapping and unmapping of user data buffers
  * into DMA'ble space using the IOMMU
@@ -579,11 +585,13 @@ static int vfio_unpin_page_external(struct vfio_dma *dma, 
dma_addr_t iova,
 }
 
 static int vfio_iommu_type1_pin_pages(void *iommu_data,
+ struct iommu_group *iommu_group,
  unsigned long *user_pfn,
  int npage, int prot,
  unsigned long *phys_pfn)
 {
struct vfio_iommu *iommu = iommu_data;
+   struct vfio_group *group;
int i, j, ret;
  

Re: [PATCH] lockable: replaced locks with lock guard macros where appropriate

2020-03-19 Thread Eric Blake

On 3/19/20 2:31 PM, no-re...@patchew.org wrote:

Patchew URL: 
https://patchew.org/QEMU/20200319161925.1818377-2-dnbrd...@gmail.com/



Hi,

This series failed the asan build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-debug@fedora TARGET_LIST=x86_64-softmmu J=14 NETWORK=1
=== TEST SCRIPT END ===

   CC  trace-root.o
   CC  accel/kvm/trace.o
   CC  accel/tcg/trace.o
/tmp/qemu-test/src/util/thread-pool.c:213:5: error: unused variable 
'qemu_lockable_auto__COUNTER__' [-Werror,-Wunused-variable]
 QEMU_LOCK_GUARD(>lock);
 ^
/tmp/qemu-test/src/include/qemu/lockable.h:173:29: note: expanded from macro 
'QEMU_LOCK_GUARD'


Hmm. This one is a different failure than the other patchew warnings 
about variable redefinition; but is still evidence that it is missing 
your "[PATCH] misc: fix __COUNTER__ macro to be referenced properly". 
At any rate, the fact that we have a compiler warning about an unused 
variable (when in reality it IS used by the auto-cleanup attribute) is 
annoying; we may have to further tweak QEMU_LOCK_GUARD to add an 
__attribute__((unused)) to shut up this particular compiler false positive.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




[PATCH v15 Kernel 5/7] vfio iommu: Update UNMAP_DMA ioctl to get dirty bitmap before unmap

2020-03-19 Thread Kirti Wankhede
DMA mapped pages, including those pinned by mdev vendor drivers, might
get unpinned and unmapped while migration is active and device is still
running. For example, in pre-copy phase while guest driver could access
those pages, host device or vendor driver can dirty these mapped pages.
Such pages should be marked dirty so as to maintain memory consistency
for a user making use of dirty page tracking.

To get bitmap during unmap, user should allocate memory for bitmap, set
size of allocated memory, set page size to be considered for bitmap and
set flag VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 drivers/vfio/vfio_iommu_type1.c | 54 ++---
 include/uapi/linux/vfio.h   | 10 
 2 files changed, 60 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 239f61764d03..e79c1ff6fb41 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -962,7 +962,8 @@ static int verify_bitmap_size(uint64_t npages, uint64_t 
bitmap_size)
 }
 
 static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
-struct vfio_iommu_type1_dma_unmap *unmap)
+struct vfio_iommu_type1_dma_unmap *unmap,
+struct vfio_bitmap *bitmap)
 {
uint64_t mask;
struct vfio_dma *dma, *dma_last = NULL;
@@ -1013,6 +1014,10 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
 * will be returned if these conditions are not met.  The v2 interface
 * will only return success and a size of zero if there were no
 * mappings within the range.
+*
+* When VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP flag is set, unmap request
+* must be for single mapping. Multiple mappings with this flag set is
+* not supported.
 */
if (iommu->v2) {
dma = vfio_find_dma(iommu, unmap->iova, 1);
@@ -1020,6 +1025,13 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
ret = -EINVAL;
goto unlock;
}
+
+   if ((unmap->flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) &&
+   (dma->iova != unmap->iova || dma->size != unmap->size)) {
+   ret = -EINVAL;
+   goto unlock;
+   }
+
dma = vfio_find_dma(iommu, unmap->iova + unmap->size - 1, 0);
if (dma && dma->iova + dma->size != unmap->iova + unmap->size) {
ret = -EINVAL;
@@ -1037,6 +1049,11 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
if (dma->task->mm != current->mm)
break;
 
+   if ((unmap->flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) &&
+iommu->dirty_page_tracking)
+   vfio_iova_dirty_bitmap(iommu, dma->iova, dma->size,
+   bitmap->pgsize, bitmap->data);
+
if (!RB_EMPTY_ROOT(>pfn_list)) {
struct vfio_iommu_type1_dma_unmap nb_unmap;
 
@@ -2398,17 +2415,46 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 
} else if (cmd == VFIO_IOMMU_UNMAP_DMA) {
struct vfio_iommu_type1_dma_unmap unmap;
-   long ret;
+   struct vfio_bitmap bitmap = { 0 };
+   int ret;
 
minsz = offsetofend(struct vfio_iommu_type1_dma_unmap, size);
 
if (copy_from_user(, (void __user *)arg, minsz))
return -EFAULT;
 
-   if (unmap.argsz < minsz || unmap.flags)
+   if (unmap.argsz < minsz ||
+   unmap.flags & ~VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP)
return -EINVAL;
 
-   ret = vfio_dma_do_unmap(iommu, );
+   if (unmap.flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) {
+   unsigned long pgshift;
+   uint64_t iommu_pgsize =
+1 << __ffs(vfio_pgsize_bitmap(iommu));
+
+   if (unmap.argsz < (minsz + sizeof(bitmap)))
+   return -EINVAL;
+
+   if (copy_from_user(,
+  (void __user *)(arg + minsz),
+  sizeof(bitmap)))
+   return -EFAULT;
+
+   /* allow only min supported pgsize */
+   if (bitmap.pgsize != iommu_pgsize)
+   return -EINVAL;
+   if (!access_ok((void __user *)bitmap.data, bitmap.size))
+   return -EINVAL;
+
+   pgshift = __ffs(bitmap.pgsize);
+   ret = verify_bitmap_size(unmap.size >> pgshift,
+  

[PATCH v15 Kernel 4/7] vfio iommu: Implementation of ioctl for dirty pages tracking.

2020-03-19 Thread Kirti Wankhede
VFIO_IOMMU_DIRTY_PAGES ioctl performs three operations:
- Start dirty pages tracking while migration is active
- Stop dirty pages tracking.
- Get dirty pages bitmap. Its user space application's responsibility to
  copy content of dirty pages from source to destination during migration.

To prevent DoS attack, memory for bitmap is allocated per vfio_dma
structure. Bitmap size is calculated considering smallest supported page
size. Bitmap is allocated for all vfio_dmas when dirty logging is enabled

Bitmap is populated for already pinned pages when bitmap is allocated for
a vfio_dma with the smallest supported page size. Update bitmap from
pinning functions when tracking is enabled. When user application queries
bitmap, check if requested page size is same as page size used to
populated bitmap. If it is equal, copy bitmap, but if not equal, return
error.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 drivers/vfio/vfio_iommu_type1.c | 242 +++-
 1 file changed, 236 insertions(+), 6 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 70aeab921d0f..239f61764d03 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -71,6 +71,7 @@ struct vfio_iommu {
unsigned intdma_avail;
boolv2;
boolnesting;
+   booldirty_page_tracking;
 };
 
 struct vfio_domain {
@@ -91,6 +92,7 @@ struct vfio_dma {
boollock_cap;   /* capable(CAP_IPC_LOCK) */
struct task_struct  *task;
struct rb_root  pfn_list;   /* Ex-user pinned pfn list */
+   unsigned long   *bitmap;
 };
 
 struct vfio_group {
@@ -125,7 +127,21 @@ struct vfio_regions {
 #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)\
(!list_empty(>domain_list))
 
+#define DIRTY_BITMAP_BYTES(n)  (ALIGN(n, BITS_PER_TYPE(u64)) / BITS_PER_BYTE)
+
+/*
+ * Input argument of number of bits to bitmap_set() is unsigned integer, which
+ * further casts to signed integer for unaligned multi-bit operation,
+ * __bitmap_set().
+ * Then maximum bitmap size supported is 2^31 bits divided by 2^3 bits/byte,
+ * that is 2^28 (256 MB) which maps to 2^31 * 2^12 = 2^43 (8TB) on 4K page
+ * system.
+ */
+#define DIRTY_BITMAP_PAGES_MAX ((1UL << 31) - 1)
+#define DIRTY_BITMAP_SIZE_MAX   DIRTY_BITMAP_BYTES(DIRTY_BITMAP_PAGES_MAX)
+
 static int put_pfn(unsigned long pfn, int prot);
+static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu);
 
 /*
  * This code handles mapping and unmapping of user data buffers
@@ -175,6 +191,67 @@ static void vfio_unlink_dma(struct vfio_iommu *iommu, 
struct vfio_dma *old)
rb_erase(>node, >dma_list);
 }
 
+
+static int vfio_dma_bitmap_alloc(struct vfio_dma *dma, uint64_t pgsize)
+{
+   uint64_t npages = dma->size / pgsize;
+
+   dma->bitmap = kvzalloc(DIRTY_BITMAP_BYTES(npages), GFP_KERNEL);
+   if (!dma->bitmap)
+   return -ENOMEM;
+
+   return 0;
+}
+
+static int vfio_dma_bitmap_alloc_all(struct vfio_iommu *iommu, uint64_t pgsize)
+{
+   struct rb_node *n = rb_first(>dma_list);
+
+   for (; n; n = rb_next(n)) {
+   struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
+   struct rb_node *p;
+   int ret;
+
+   ret = vfio_dma_bitmap_alloc(dma, pgsize);
+   if (ret) {
+   struct rb_node *p = rb_prev(n);
+
+   for (; p; p = rb_prev(p)) {
+   struct vfio_dma *dma = rb_entry(n,
+   struct vfio_dma, node);
+
+   kfree(dma->bitmap);
+   dma->bitmap = NULL;
+   }
+   return ret;
+   }
+
+   if (RB_EMPTY_ROOT(>pfn_list))
+   continue;
+
+   for (p = rb_first(>pfn_list); p; p = rb_next(p)) {
+   struct vfio_pfn *vpfn = rb_entry(p, struct vfio_pfn,
+node);
+
+   bitmap_set(dma->bitmap,
+  (vpfn->iova - dma->iova) / pgsize, 1);
+   }
+   }
+   return 0;
+}
+
+static void vfio_dma_bitmap_free_all(struct vfio_iommu *iommu)
+{
+   struct rb_node *n = rb_first(>dma_list);
+
+   for (; n; n = rb_next(n)) {
+   struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
+
+   kfree(dma->bitmap);
+   dma->bitmap = NULL;
+   }
+}
+
 /*
  * Helper Functions for host iova-pfn list
  */
@@ -567,6 +644,14 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
vfio_unpin_page_external(dma, iova, do_accounting);
goto pin_unwind;
 

[PATCH v15 Kernel 1/7] vfio: KABI for migration interface for device state

2020-03-19 Thread Kirti Wankhede
- Defined MIGRATION region type and sub-type.

- Defined vfio_device_migration_info structure which will be placed at the
  0th offset of migration region to get/set VFIO device related
  information. Defined members of structure and usage on read/write access.

- Defined device states and state transition details.

- Defined sequence to be followed while saving and resuming VFIO device.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 include/uapi/linux/vfio.h | 227 ++
 1 file changed, 227 insertions(+)

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 9e843a147ead..d0021467af53 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -305,6 +305,7 @@ struct vfio_region_info_cap_type {
 #define VFIO_REGION_TYPE_PCI_VENDOR_MASK   (0x)
 #define VFIO_REGION_TYPE_GFX(1)
 #define VFIO_REGION_TYPE_CCW   (2)
+#define VFIO_REGION_TYPE_MIGRATION  (3)
 
 /* sub-types for VFIO_REGION_TYPE_PCI_* */
 
@@ -379,6 +380,232 @@ struct vfio_region_gfx_edid {
 /* sub-types for VFIO_REGION_TYPE_CCW */
 #define VFIO_REGION_SUBTYPE_CCW_ASYNC_CMD  (1)
 
+/* sub-types for VFIO_REGION_TYPE_MIGRATION */
+#define VFIO_REGION_SUBTYPE_MIGRATION   (1)
+
+/*
+ * The structure vfio_device_migration_info is placed at the 0th offset of
+ * the VFIO_REGION_SUBTYPE_MIGRATION region to get and set VFIO device related
+ * migration information. Field accesses from this structure are only supported
+ * at their native width and alignment. Otherwise, the result is undefined and
+ * vendor drivers should return an error.
+ *
+ * device_state: (read/write)
+ *  - The user application writes to this field to inform the vendor driver
+ *about the device state to be transitioned to.
+ *  - The vendor driver should take the necessary actions to change the
+ *device state. After successful transition to a given state, the
+ *vendor driver should return success on write(device_state, state)
+ *system call. If the device state transition fails, the vendor driver
+ *should return an appropriate -errno for the fault condition.
+ *  - On the user application side, if the device state transition fails,
+ *   that is, if write(device_state, state) returns an error, read
+ *   device_state again to determine the current state of the device from
+ *   the vendor driver.
+ *  - The vendor driver should return previous state of the device unless
+ *the vendor driver has encountered an internal error, in which case
+ *the vendor driver may report the device_state 
VFIO_DEVICE_STATE_ERROR.
+ *  - The user application must use the device reset ioctl to recover the
+ *device from VFIO_DEVICE_STATE_ERROR state. If the device is
+ *indicated to be in a valid device state by reading device_state, the
+ *user application may attempt to transition the device to any valid
+ *state reachable from the current state or terminate itself.
+ *
+ *  device_state consists of 3 bits:
+ *  - If bit 0 is set, it indicates the _RUNNING state. If bit 0 is clear,
+ *it indicates the _STOP state. When the device state is changed to
+ *_STOP, driver should stop the device before write() returns.
+ *  - If bit 1 is set, it indicates the _SAVING state, which means that the
+ *driver should start gathering device state information that will be
+ *provided to the VFIO user application to save the device's state.
+ *  - If bit 2 is set, it indicates the _RESUMING state, which means that
+ *the driver should prepare to resume the device. Data provided through
+ *the migration region should be used to resume the device.
+ *  Bits 3 - 31 are reserved for future use. To preserve them, the user
+ *  application should perform a read-modify-write operation on this
+ *  field when modifying the specified bits.
+ *
+ *  +--- _RESUMING
+ *  |+-- _SAVING
+ *  ||+- _RUNNING
+ *  |||
+ *  000b => Device Stopped, not saving or resuming
+ *  001b => Device running, which is the default state
+ *  010b => Stop the device & save the device state, stop-and-copy state
+ *  011b => Device running and save the device state, pre-copy state
+ *  100b => Device stopped and the device state is resuming
+ *  101b => Invalid state
+ *  110b => Error state
+ *  111b => Invalid state
+ *
+ * State transitions:
+ *
+ *  _RESUMING  _RUNNINGPre-copyStop-and-copy   _STOP
+ *(100b) (001b) (011b)(010b)   (000b)
+ * 0. Running or default state
+ * |
+ *
+ * 1. Normal Shutdown (optional)
+ * |->|
+ *
+ * 2. Save the state or suspend
+ * |->|-->|
+ *
+ * 3. 

[PATCH v15 Kernel 0/7] KABIs to support migration for VFIO devices

2020-03-19 Thread Kirti Wankhede
Hi,

This patch set adds:
* New IOCTL VFIO_IOMMU_DIRTY_PAGES to get dirty pages bitmap with
  respect to IOMMU container rather than per device. All pages pinned by
  vendor driver through vfio_pin_pages external API has to be marked as
  dirty during  migration. When IOMMU capable device is present in the
  container and all pages are pinned and mapped, then all pages are marked
  dirty.
  When there are CPU writes, CPU dirty page tracking can identify dirtied
  pages, but any page pinned by vendor driver can also be written by
  device. As of now there is no device which has hardware support for
  dirty page tracking. So all pages which are pinned should be considered
  as dirty.
  This ioctl is also used to start/stop dirty pages tracking for pinned and
  unpinned pages while migration is active.

* Updated IOCTL VFIO_IOMMU_UNMAP_DMA to get dirty pages bitmap before
  unmapping IO virtual address range.
  With vIOMMU, during pre-copy phase of migration, while CPUs are still
  running, IO virtual address unmap can happen while device still keeping
  reference of guest pfns. Those pages should be reported as dirty before
  unmap, so that VFIO user space application can copy content of those
  pages from source to destination.

* Patch 7 detect if IOMMU capable device driver is smart to report pages
  to be marked dirty by pinning pages using vfio_pin_pages() API.


Yet TODO:
Since there is no device which has hardware support for system memmory
dirty bitmap tracking, right now there is no other API from vendor driver
to VFIO IOMMU module to report dirty pages. In future, when such hardware
support will be implemented, an API will be required such that vendor
driver could report dirty pages to VFIO module during migration phases.

Adding revision history from previous QEMU patch set to understand KABI
changes done till now

v14 -> v15
- Minor edits and nit picks.
- In the verification of user allocated bitmap memory, added check of
   maximum size.
- Patches are on tag: next-20200318 and 1-3 patches from Yan's series
  https://lkml.org/lkml/2020/3/12/1255

v13 -> v14
- Added struct vfio_bitmap to kabi. updated structure
  vfio_iommu_type1_dirty_bitmap_get and vfio_iommu_type1_dma_unmap.
- All small changes suggested by Alex.
- Patches are on tag: next-20200318 and 1-3 patches from Yan's series
  https://lkml.org/lkml/2020/3/12/1255

v12 -> v13
- Changed bitmap allocation in vfio_iommu_type1 to per vfio_dma
- Changed VFIO_IOMMU_DIRTY_PAGES ioctl behaviour to be per vfio_dma range.
- Changed vfio_iommu_type1_dirty_bitmap structure to have separate data
  field.

v11 -> v12
- Changed bitmap allocation in vfio_iommu_type1.
- Remove atomicity of ref_count.
- Updated comments for migration device state structure about error
  reporting.
- Nit picks from v11 reviews

v10 -> v11
- Fix pin pages API to free vpfn if it is marked as unpinned tracking page.
- Added proposal to detect if IOMMU capable device calls external pin pages
  API to mark pages dirty.
- Nit picks from v10 reviews

v9 -> v10:
- Updated existing VFIO_IOMMU_UNMAP_DMA ioctl to get dirty pages bitmap
  during unmap while migration is active
- Added flag in VFIO_IOMMU_GET_INFO to indicate driver support dirty page
  tracking.
- If iommu_mapped, mark all pages dirty.
- Added unpinned pages tracking while migration is active.
- Updated comments for migration device state structure with bit
  combination table and state transition details.

v8 -> v9:
- Split patch set in 2 sets, Kernel and QEMU.
- Dirty pages bitmap is queried from IOMMU container rather than from
  vendor driver for per device. Added 2 ioctls to achieve this.

v7 -> v8:
- Updated comments for KABI
- Added BAR address validation check during PCI device's config space load
  as suggested by Dr. David Alan Gilbert.
- Changed vfio_migration_set_state() to set or clear device state flags.
- Some nit fixes.

v6 -> v7:
- Fix build failures.

v5 -> v6:
- Fix build failure.

v4 -> v5:
- Added decriptive comment about the sequence of access of members of
  structure vfio_device_migration_info to be followed based on Alex's
  suggestion
- Updated get dirty pages sequence.
- As per Cornelia Huck's suggestion, added callbacks to VFIODeviceOps to
  get_object, save_config and load_config.
- Fixed multiple nit picks.
- Tested live migration with multiple vfio device assigned to a VM.

v3 -> v4:
- Added one more bit for _RESUMING flag to be set explicitly.
- data_offset field is read-only for user space application.
- data_size is read for every iteration before reading data from migration,
  that is removed assumption that data will be till end of migration
  region.
- If vendor driver supports mappable sparsed region, map those region
  during setup state of save/load, similarly unmap those from cleanup
  routines.
- Handles race condition that causes data corruption in migration region
  during save device state by adding mutex and serialiaing save_buffer and
  get_dirty_pages routines.
- Skip 

Re: [PATCH] lockable: replaced locks with lock guard macros where appropriate

2020-03-19 Thread Eric Blake

On 3/19/20 2:39 PM, no-re...@patchew.org wrote:

Patchew URL: 
https://patchew.org/QEMU/20200319161925.1818377-2-dnbrd...@gmail.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

   CC  hw/i2c/trace.o
In file included from /tmp/qemu-test/src/util/rcu.c:34:0:
/tmp/qemu-test/src/util/rcu.c: In function 'synchronize_rcu':
/tmp/qemu-test/src/include/qemu/lockable.h:173:29: error: redefinition of 
'qemu_lockable_auto__COUNTER__'
  g_autoptr(QemuLockable) qemu_lockable_auto##__COUNTER__ = \


Patchew was confused: you sent this message in-reply-to "[PATCH] misc: 
fix __COUNTER__ macro to be referenced properly" which addresses this 
complaint, but as you did not thread the messages 0/2, 1/2, 2/2, patchew 
applied this one out of sequence.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




[PATCH v15 Kernel 2/7] vfio iommu: Remove atomicity of ref_count of pinned pages

2020-03-19 Thread Kirti Wankhede
vfio_pfn.ref_count is always updated by holding iommu->lock, using atomic
variable is overkill.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 drivers/vfio/vfio_iommu_type1.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 9fdfae1cb17a..70aeab921d0f 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -112,7 +112,7 @@ struct vfio_pfn {
struct rb_node  node;
dma_addr_t  iova;   /* Device address */
unsigned long   pfn;/* Host pfn */
-   atomic_tref_count;
+   unsigned intref_count;
 };
 
 struct vfio_regions {
@@ -233,7 +233,7 @@ static int vfio_add_to_pfn_list(struct vfio_dma *dma, 
dma_addr_t iova,
 
vpfn->iova = iova;
vpfn->pfn = pfn;
-   atomic_set(>ref_count, 1);
+   vpfn->ref_count = 1;
vfio_link_pfn(dma, vpfn);
return 0;
 }
@@ -251,7 +251,7 @@ static struct vfio_pfn *vfio_iova_get_vfio_pfn(struct 
vfio_dma *dma,
struct vfio_pfn *vpfn = vfio_find_vpfn(dma, iova);
 
if (vpfn)
-   atomic_inc(>ref_count);
+   vpfn->ref_count++;
return vpfn;
 }
 
@@ -259,7 +259,8 @@ static int vfio_iova_put_vfio_pfn(struct vfio_dma *dma, 
struct vfio_pfn *vpfn)
 {
int ret = 0;
 
-   if (atomic_dec_and_test(>ref_count)) {
+   vpfn->ref_count--;
+   if (!vpfn->ref_count) {
ret = put_pfn(vpfn->pfn, dma->prot);
vfio_remove_from_pfn_list(dma, vpfn);
}
-- 
2.7.0




Re: [PATCH v2 6/8] target/ppc: allow ppc_cpu_do_system_reset to take an alternate vector

2020-03-19 Thread Programmingkid


> On Mar 17, 2020, at 6:46 PM, David Gibson  wrote:
> 
> On Tue, Mar 17, 2020 at 11:06:15AM -0400, Programmingkid wrote:
>> 
>>> On Mar 17, 2020, at 7:01 AM, qemu-ppc-requ...@nongnu.org wrote:
>>> 
>>> Message: 3
>>> Date: Tue, 17 Mar 2020 11:47:32 +0100
>>> From: Cédric Le Goater 
>>> To: David Gibson , Nicholas Piggin
>>> 
>>> Cc: qemu-...@nongnu.org, Aravinda Prasad ,
>>> Ganesh Goudar , Greg Kurz ,
>>> qemu-devel@nongnu.org
>>> Subject: Re: [PATCH v2 6/8] target/ppc: allow ppc_cpu_do_system_reset
>>> to take an alternate vector
>>> Message-ID: <097148e5-78be-a294-236d-160fb5c29...@kaod.org>
>>> Content-Type: text/plain; charset=windows-1252
>>> 
>>> On 3/17/20 12:34 AM, David Gibson wrote:
 On Tue, Mar 17, 2020 at 09:28:24AM +1000, Nicholas Piggin wrote:
> Cédric Le Goater's on March 17, 2020 4:15 am:
>> On 3/16/20 3:26 PM, Nicholas Piggin wrote:
>>> Provide for an alternate delivery location, -1 defaults to the
>>> architected address.
>> 
>> I don't know what is the best approach, to override the vector addr
>> computed by powerpc_excp() or use a machine class handler with 
>> cpu->vhyp.
> 
> Yeah it's getting a bit ad hoc and inconsistent with machine check
> etc, I just figured get something minimal in there now. The whole
> exception delivery needs a spring clean though.
 
>> 
>> Currently Mac OS 9 will not restart. When someone goes to restart it
>> the screen will turn black and stay that way. Could this patch solve
>> this problem?
> 
> No.  It's unlikely to be related, and at this stage is used
> exclusively to implement the FWNMI stuff for the pseries machine.
> 
> -- 
> David Gibson  | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au| minimalist, thank you.  NOT _the_ 
> _other_
>   | _way_ _around_!
> http://www.ozlabs.org/~dgibson

Ok. Thank you.




Re: [PATCH] lockable: replaced locks with lock guard macros where appropriate

2020-03-19 Thread Eric Blake

On 3/19/20 11:19 AM, dnbrd...@gmail.com wrote:

From: danbrodsky 

- ran regexp "qemu_mutex_lock\(.*\).*\n.*if" to find targets
- replaced result with QEMU_LOCK_GUARD if all unlocks at function end
- replaced result with WITH_QEMU_LOCK_GUARD if unlock not at end

Signed-off-by: danbrodsky 
---
  block/iscsi.c | 23 +++
  block/nfs.c   | 53 ---
  cpus-common.c | 13 ---
  hw/display/qxl.c  | 44 +--
  hw/vfio/platform.c|  4 +---
  migration/migration.c |  3 +--
  migration/multifd.c   |  8 +++
  migration/ram.c   |  3 +--
  monitor/misc.c|  4 +---
  ui/spice-display.c| 14 ++--
  util/log.c|  4 ++--
  util/qemu-timer.c | 17 +++---
  util/rcu.c|  8 +++
  util/thread-pool.c|  3 +--
  util/vfio-helpers.c   |  4 ++--
  15 files changed, 90 insertions(+), 115 deletions(-)


That's a rather big patch touching multiple areas of code at once; I'm 
not sure if it would be easier to review if you were to break it up into 
a series of smaller patches each touching a smaller group of related 
files.  For example, I don't mind reviwing block/, but tend to shy away 
from migration/ code.




diff --git a/block/iscsi.c b/block/iscsi.c
index 682abd8e09..df73bde114 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -1086,23 +1086,21 @@ static BlockAIOCB *iscsi_aio_ioctl(BlockDriverState *bs,
  acb->task->expxferlen = acb->ioh->dxfer_len;
  
  data.size = 0;

-qemu_mutex_lock(>mutex);
+QEMU_LOCK_GUARD(>mutex);
  if (acb->task->xfer_dir == SCSI_XFER_WRITE) {
  if (acb->ioh->iovec_count == 0) {
  data.data = acb->ioh->dxferp;
  data.size = acb->ioh->dxfer_len;
  } else {
  scsi_task_set_iov_out(acb->task,
- (struct scsi_iovec *) acb->ioh->dxferp,
- acb->ioh->iovec_count);
+  (struct scsi_iovec *)acb->ioh->dxferp,
+  acb->ioh->iovec_count);


This looks like a spurious whitespace change.  Why is it part of the patch?


  }
  }
  
  if (iscsi_scsi_command_async(iscsi, iscsilun->lun, acb->task,

   iscsi_aio_ioctl_cb,
- (data.size > 0) ?  : NULL,
- acb) != 0) {
-qemu_mutex_unlock(>mutex);
+ (data.size > 0) ?  : NULL, acb) != 0) {
  scsi_free_scsi_task(acb->task);


Unwrapping the line fit in 80 columns, but again, why are you mixing 
whitespace changes in rather than focusing on the cleanup of mutex 
actions?  Did you create this patch mechanically with a tool like 
Coccinelle, as the source of your reflowing of lines?  If so, what was 
the input to Coccinelle; if it was some other automated tool, can you 
include the formula so that someone else could reproduce your changes 
(whitespace and all)?  If it was not automated, that's also okay, but 
then I would not expect as much whitespace churn.



@@ -,18 +1109,16 @@ static BlockAIOCB *iscsi_aio_ioctl(BlockDriverState *bs,
  /* tell libiscsi to read straight into the buffer we got from ioctl */
  if (acb->task->xfer_dir == SCSI_XFER_READ) {
  if (acb->ioh->iovec_count == 0) {
-scsi_task_add_data_in_buffer(acb->task,
- acb->ioh->dxfer_len,
+scsi_task_add_data_in_buffer(acb->task, acb->ioh->dxfer_len,
   acb->ioh->dxferp);
  } else {
  scsi_task_set_iov_in(acb->task,
- (struct scsi_iovec *) acb->ioh->dxferp,
+ (struct scsi_iovec *)acb->ioh->dxferp,
   acb->ioh->iovec_count);


Again, spurious whitespace changes.


  }
  }
  
  iscsi_set_events(iscsilun);

-qemu_mutex_unlock(>mutex);
  
  return >common;

  }
@@ -1395,20 +1391,17 @@ static void iscsi_nop_timed_event(void *opaque)
  {
  IscsiLun *iscsilun = opaque;
  
-qemu_mutex_lock(>mutex);

+QEMU_LOCK_GUARD(>mutex);
  if (iscsi_get_nops_in_flight(iscsilun->iscsi) >= MAX_NOP_FAILURES) {
  error_report("iSCSI: NOP timeout. Reconnecting...");
  iscsilun->request_timed_out = true;
  } else if (iscsi_nop_out_async(iscsilun->iscsi, NULL, NULL, 0, NULL) != 
0) {
  error_report("iSCSI: failed to sent NOP-Out. Disabling NOP 
messages.");
-goto out;
+return;
  }
  
  timer_mod(iscsilun->nop_timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + NOP_INTERVAL);

  iscsi_set_events(iscsilun);
-
-out:
-qemu_mutex_unlock(>mutex);
  }


But the cleanup itself is functionally correct in this file.

  
  static void iscsi_readcapacity_sync(IscsiLun *iscsilun, Error **errp)


Re: [PULL v3 00/16] Linux user for 5.0 patches

2020-03-19 Thread Peter Maydell
On Thu, 19 Mar 2020 at 09:29, Laurent Vivier  wrote:
>
> The following changes since commit 373c7068dd610e97f0b551b5a6d0a27cd6da4506:
>
>   qemu.nsi: Install Sphinx documentation (2020-03-09 16:45:00 +)
>
> are available in the Git repository at:
>
>   git://github.com/vivier/qemu.git tags/linux-user-for-5.0-pull-request
>
> for you to fetch changes up to c91518bb0649f09e2c636790603907ef93ea95d4:
>
>   linux-user, openrisc: sync syscall numbers with kernel v5.5 (2020-03-19 
> 09:22:21 +0100)
>
> 
> update syscall numbers to linux 5.5 (with scripts)
> add futex_time64/clock_gettime64/clock_settime64
> add AT_EXECFN
> Emulate x86_64 vsyscalls
>
> v3: remove syscall.tbl series
> v2: guard copy_to_user_timezone() with TARGET_NR_gettimeofday
> remove "Support futex_time64" patch
> guard sys_futex with TARGET_NR_exit
>
> 

Still fails:

/home/petmay01/linaro/qemu-for-merges/build/all-linux-static/x86_64-linux-user/qemu-x86_64
-L ./gnemul/qemu-x86_64 x86_64/ls -l dummyfile
qemu: 0x40008117e9: unhandled CPU exception 0x101 - aborting
RAX=003f RBX=6e34 RCX=004000800b18
RDX=004000813180
RSI=0064 RDI=004000800670 RBP=6f40
RSP=004000800668
R8 = R9 =004000800b45 R10=004000801a18
R11=004000801260
R12=0040008008c0 R13=0008 R14=00400040
R15=0040008032d0
RIP=0040008117e9 RFL=0246 [---Z-P-] CPL=3 II=0 A20=1 SMM=0 HLT=0
ES =   
CS =0033   00effb00 DPL=3 CS64 [-RA]
SS =002b   00cff300 DPL=3 DS   [-WA]
DS =   
FS =   
GS =   
LDT=   8200 DPL=0 LDT
TR =   8b00 DPL=0 TSS64-busy
GDT= 00400091a000 007f
IDT= 004000919000 01ff
CR0=80010001 CR2= CR3= CR4=0220
DR0= DR1= DR2=
DR3=
DR6=0ff0 DR7=0400
EFER=0500
Makefile:6: recipe for target 'test' failed
make: *** [test] Error 127

thanks
-- PMM



[Bug 1866892] Re: guest OS catches a page fault bug when running dotnet

2020-03-19 Thread Peter Maydell
Thanks, that's pretty clear. I expect you'll find the bug is just that
QEMU doesn't get the semantics of an iret from userspace correct. The
helper_iret_protected() function is probably a good place to look.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1866892

Title:
  guest OS catches a page  fault bug when running dotnet

Status in QEMU:
  New

Bug description:
  The linux guest OS catches a page fault bug when running the dotnet
  application.

  host = metal = x86_64
  host OS = ubuntu 19.10
  qemu emulation, without KVM, with "tiny code generator" tcg; no plugins; 
built from head/master
  guest emulation = x86_64
  guest OS = ubuntu 19.10
  guest app = dotnet, running any program

  qemu sha=7bc4d1980f95387c4cc921d7a066217ff4e42b70 (head/master Mar 10,
  2020)

  qemu invocation is:

  qemu/build/x86_64-softmmu/qemu-system-x86_64 \
-m size=4096 \
-smp cpus=1 \
-machine type=pc-i440fx-5.0,accel=tcg \
-cpu Skylake-Server-v1 \
-nographic \
-bios OVMF-pure-efi.fd \
-drive if=none,id=hd0,file=ubuntu-19.10-server-cloudimg-amd64.img \
-device virtio-blk,drive=hd0 \
-drive if=none,id=cloud,file=linux_cloud_config.img \
-device virtio-blk,drive=cloud \
-netdev user,id=user0,hostfwd=tcp::2223-:22 \
-device virtio-net,netdev=user0

  
  Here's the guest kernel console output:

  
  [ 2834.005449] BUG: unable to handle page fault for address: 7fffc2c0
  [ 2834.009895] #PF: supervisor read access in user mode
  [ 2834.013872] #PF: error_code(0x0001) - permissions violation
  [ 2834.018025] IDT: 0xfe00 (limit=0xfff) GDT: 0xfe001000 
(limit=0x7f)
  [ 2834.022242] LDTR: NULL
  [ 2834.026306] TR: 0x40 -- base=0xfe003000 limit=0x206f
  [ 2834.030395] PGD 8000360d0067 P4D 8000360d0067 PUD 36105067 PMD 
36193067 PTE 800076d8e867
  [ 2834.038672] Oops: 0001 [#4] SMP PTI
  [ 2834.042707] CPU: 0 PID: 13537 Comm: dotnet Tainted: G  D   
5.3.0-29-generic #31-Ubuntu
  [ 2834.050591] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
0.0.0 02/06/2015
  [ 2834.054785] RIP: 0033:0x147eaeda
  [ 2834.059017] Code: d0 00 00 00 4c 8b a7 d8 00 00 00 4c 8b af e0 00 00 00 4c 
8b b7 e8 00 00 00 4c 8b bf f0 00 00 00 48 8b bf b0 00 00 00 9d 74 02 <48> cf 48 
8d 64 24 30 5d c3 90 cc c3 66 90 55 4c 8b a7 d8 00 00 00
  [ 2834.072103] RSP: 002b:7fffc2c0 EFLAGS: 0202
  [ 2834.076507] RAX:  RBX: 1554b401af38 RCX: 
0001
  [ 2834.080832] RDX:  RSI:  RDI: 
7fffcfb0
  [ 2834.085010] RBP: 7fffd730 R08:  R09: 
7fffd1b0
  [ 2834.089184] R10: 15331dd5 R11: 153ad8d0 R12: 
0002
  [ 2834.093350] R13: 0001 R14: 0001 R15: 
1554b401d388
  [ 2834.097309] FS:  14fa5740 GS:  
  [ 2834.101131] Modules linked in: isofs nls_iso8859_1 dm_multipath 
scsi_dh_rdac scsi_dh_emc scsi_dh_alua ppdev input_leds serio_raw parport_pc 
parport sch_fq_codel ip_tables x_tables autofs4 btrfs zstd_compress raid10 
raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq 
libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul 
ghash_clmulni_intel aesni_intel aes_x86_64 crypto_simd cryptd glue_helper 
virtio_net psmouse net_failover failover virtio_blk floppy
  [ 2834.122539] CR2: 7fffc2c0
  [ 2834.126867] ---[ end trace dfae51f1d9432708 ]---
  [ 2834.131239] RIP: 0033:0x14d793262eda
  [ 2834.135715] Code: Bad RIP value.
  [ 2834.140243] RSP: 002b:7ffddb4e2980 EFLAGS: 0202
  [ 2834.144615] RAX:  RBX: 14d6f402acb8 RCX: 
0002
  [ 2834.148943] RDX: 01cd6950 RSI:  RDI: 
7ffddb4e3670
  [ 2834.153335] RBP: 7ffddb4e3df0 R08: 0001 R09: 
7ffddb4e3870
  [ 2834.157774] R10: 14d793da9dd5 R11: 14d793e258d0 R12: 
0002
  [ 2834.162132] R13: 0001 R14: 0001 R15: 
14d6f402d040
  [ 2834.166239] FS:  14fa5740() GS:97213ba0() 
knlGS:
  [ 2834.170529] CS:  0033 DS:  ES:  CR0: 80050033
  [ 2834.174751] CR2: 14d793262eb0 CR3: 3613 CR4: 
007406f0
  [ 2834.178892] PKRU: 5554

  I run the application from a shell with `ulimit -s unlimited`
  (unlimited stack to size).

  The application creates a number of threads, and those threads make a
  lot of calls to sigaltstack() and mprotect(); see the relevant source
  for dotnet here
  
https://github.com/dotnet/runtime/blob/15ec69e47b4dc56098e6058a11ccb6ae4d5d4fa1/src/coreclr/src/pal/src/thread/thread.cpp#L2467

  using strace -f on the app shows that no alt stacks come anywhere near
  the failing address; all alt stacks are in the heap, as expected.
  None of the mmap/mprotect/munmap 

Re: [PATCH] misc: fix __COUNTER__ macro to be referenced properly

2020-03-19 Thread Eric Blake

On 3/19/20 11:19 AM, dnbrd...@gmail.com wrote:

From: danbrodsky 

- __COUNTER__ doesn't work with ## concat
- replaced ## with glue() macro so __COUNTER__ is evaluated

Signed-off-by: danbrodsky 


Thanks - this appears to be your first contribution to qemu.

Typically, the S-o-b should match how you would spell your legal name, 
rather than being a single-word computer user name.


It looks like you threaded another message to this one:
Message-Id: <20200319161925.1818377-2-dnbrd...@gmail.com>
Subject: [PATCH] lockable: replaced locks with lock guard macros where
 appropriate
but without a 0/2 cover letter, or even a 1/2 or 2/2 indicator on the 
individual patches.  This makes it more likely that the second patch may 
be overlooked by our CI tools.


Since this patch is fixing an issue that just went into the tree 
recently, it would be useful to add mention of that in the commit message:

Fixes: 3284c3ddc4

In fact, using 'lockable:' rather than 'misc:' as your subject prefix 
makes it more obvious that you are fixing an issue in the same area as 
where it was introduced.


More patch submission hints at https://wiki.qemu.org/Contribute/SubmitAPatch


---
  include/qemu/lockable.h | 2 +-
  include/qemu/rcu.h  | 2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/qemu/lockable.h b/include/qemu/lockable.h
index 1aeb2cb1a6..a9258f2c2c 100644
--- a/include/qemu/lockable.h
+++ b/include/qemu/lockable.h
@@ -170,7 +170,7 @@ G_DEFINE_AUTOPTR_CLEANUP_FUNC(QemuLockable, 
qemu_lockable_auto_unlock)
   *   }
   */
  #define QEMU_LOCK_GUARD(x) \
-g_autoptr(QemuLockable) qemu_lockable_auto##__COUNTER__ = \
+g_autoptr(QemuLockable) glue(qemu_lockable_auto, __COUNTER__) = \


That said, the patch itself is correct.
Reviewed-by: Eric Blake 

I'll leave it up to the maintainer for this file whether they can 
improve your commit message (although the hardest part of that would be 
knowing a full proper name to use in place of your username), or if you 
will need to send a v2.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PULL 3/4] s390/ipl: sync back loadparm

2020-03-19 Thread Peter Maydell
On Tue, 10 Mar 2020 at 15:09, Christian Borntraeger
 wrote:
>
> From: Halil Pasic 
>
> We expose loadparm as a r/w machine property, but if loadparm is set by
> the guest via DIAG 308, we don't update the property. Having a
> disconnect between the guest view and the QEMU property is not nice in
> itself, but things get even worse for SCSI, where under certain
> circumstances (see 789b5a401b "s390: Ensure IPL from SCSI works as
> expected" for details) we call s390_gen_initial_iplb() on resets
> effectively overwriting the guest/user supplied loadparm with the stale
> value.

Hi; Coverity points out (CID 1421966) that you have a buffer overrun here:

> +static void update_machine_ipl_properties(IplParameterBlock *iplb)
> +{
> +Object *machine = qdev_get_machine();
> +Error *err = NULL;
> +
> +/* Sync loadparm */
> +if (iplb->flags & DIAG308_FLAGS_LP_VALID) {
> +uint8_t *ebcdic_loadparm = iplb->loadparm;
> +char ascii_loadparm[8];

This array is 8 bytes...

> +int i;
> +
> +for (i = 0; i < 8 && ebcdic_loadparm[i]; i++) {
> +ascii_loadparm[i] = ebcdic2ascii[(uint8_t) ebcdic_loadparm[i]];
> +}
> +ascii_loadparm[i] = 0;

...but you can write 9 bytes into it (8 from the guest-controlled
iplb_loadparm buffer plus one for the trailing NUL).

> +object_property_set_str(machine, ascii_loadparm, "loadparm", );
> +} else {
> +object_property_set_str(machine, "", "loadparm", );
> +}
> +if (err) {
> +warn_report_err(err);
> +}
> +}

thanks
-- PMM



Re: [PATCH v14 Kernel 4/7] vfio iommu: Implementation of ioctl for dirty pages tracking.

2020-03-19 Thread Kirti Wankhede




On 3/19/2020 9:52 PM, Alex Williamson wrote:

On Thu, 19 Mar 2020 20:22:41 +0530
Kirti Wankhede  wrote:


On 3/19/2020 9:15 AM, Alex Williamson wrote:

On Thu, 19 Mar 2020 01:11:11 +0530
Kirti Wankhede  wrote:
   





+
+static int verify_bitmap_size(uint64_t npages, uint64_t bitmap_size)
+{
+   uint64_t bsize;
+
+   if (!npages || !bitmap_size || bitmap_size > UINT_MAX)


As commented previously, how do we derive this UINT_MAX limitation?
   


Sorry, I missed that earlier

  > UINT_MAX seems arbitrary, is this specified in our API?  The size of a
  > vfio_dma is limited to what the user is able to pin, and therefore
  > their locked memory limit, but do we have an explicit limit elsewhere
  > that results in this limit here.  I think a 4GB bitmap would track
  > something like 2^47 bytes of memory, that's pretty excessive, but still
  > an arbitrary limit.

There has to be some upper limit check. In core KVM, in
virt/kvm/kvm_main.c there is max number of pages check:

if (new.npages > KVM_MEM_MAX_NR_PAGES)

Where
/*
   * Some of the bitops functions do not support too long bitmaps.
   * This number must be determined not to exceed such limits.
   */
#define KVM_MEM_MAX_NR_PAGES ((1UL << 31) - 1)

Though I don't know which bitops functions do not support long bitmaps.

Something similar as above can be done or same as you also mentioned of
4GB bitmap limit? that is U32_MAX instead of UINT_MAX?


Let's see, we use bitmap_set():

void bitmap_set(unsigned long *map, unsigned int start, unsigned int nbits)

So we're limited to an unsigned int number of bits, but for an
unaligned, multi-bit operation this will call __bitmap_set():

void __bitmap_set(unsigned long *map, unsigned int start, int len)

So we're down to a signed int number of bits (seems like an API bug in
bitops there), so it makes sense that KVM is testing against MAX_INT
number of pages, ie. number of bits.  But that still suggests a bitmap
size of MAX_UINT is off by a factor of 16.  So we can have 2^31 bits
divided by 2^3 bits/byte yields a maximum bitmap size of 2^28 (ie.
256MB), which maps 2^31 * 2^12 = 2^43 (8TB) on a 4K system.

Let's fix the limit check and put a nice comment explaining it.  Thanks,



Agreed. Adding DIRTY_BITMAP_SIZE_MAX macro and comment as below.

/*
 * Input argument of number of bits to bitmap_set() is unsigned 
integer, which

 * further casts to signed integer for unaligned multi-bit operation,
 * __bitmap_set().
 * Then maximum bitmap size supported is 2^31 bits divided by 2^3 
bits/byte,

 * that is 2^28 (256 MB) which maps to 2^31 * 2^12 = 2^43 (8TB) on 4K page
 * system.
 */
#define DIRTY_BITMAP_PAGES_MAX  ((1UL << 31) - 1)
#define DIRTY_BITMAP_SIZE_MAX   \
DIRTY_BITMAP_BYTES(DIRTY_BITMAP_PAGES_MAX)


Thanks,
Kirti



[Bug 1866892] Re: guest OS catches a page fault bug when running dotnet

2020-03-19 Thread Robert Henry
yes, it is intentional.  I don't yet understand why, but am talking to
those who do.
https://github.com/dotnet/runtime/blob/1b02665be501b695b9c22c1ebd69148c07a225f6/src/coreclr/src/pal/src/arch/amd64/context2.S#L183

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1866892

Title:
  guest OS catches a page  fault bug when running dotnet

Status in QEMU:
  New

Bug description:
  The linux guest OS catches a page fault bug when running the dotnet
  application.

  host = metal = x86_64
  host OS = ubuntu 19.10
  qemu emulation, without KVM, with "tiny code generator" tcg; no plugins; 
built from head/master
  guest emulation = x86_64
  guest OS = ubuntu 19.10
  guest app = dotnet, running any program

  qemu sha=7bc4d1980f95387c4cc921d7a066217ff4e42b70 (head/master Mar 10,
  2020)

  qemu invocation is:

  qemu/build/x86_64-softmmu/qemu-system-x86_64 \
-m size=4096 \
-smp cpus=1 \
-machine type=pc-i440fx-5.0,accel=tcg \
-cpu Skylake-Server-v1 \
-nographic \
-bios OVMF-pure-efi.fd \
-drive if=none,id=hd0,file=ubuntu-19.10-server-cloudimg-amd64.img \
-device virtio-blk,drive=hd0 \
-drive if=none,id=cloud,file=linux_cloud_config.img \
-device virtio-blk,drive=cloud \
-netdev user,id=user0,hostfwd=tcp::2223-:22 \
-device virtio-net,netdev=user0

  
  Here's the guest kernel console output:

  
  [ 2834.005449] BUG: unable to handle page fault for address: 7fffc2c0
  [ 2834.009895] #PF: supervisor read access in user mode
  [ 2834.013872] #PF: error_code(0x0001) - permissions violation
  [ 2834.018025] IDT: 0xfe00 (limit=0xfff) GDT: 0xfe001000 
(limit=0x7f)
  [ 2834.022242] LDTR: NULL
  [ 2834.026306] TR: 0x40 -- base=0xfe003000 limit=0x206f
  [ 2834.030395] PGD 8000360d0067 P4D 8000360d0067 PUD 36105067 PMD 
36193067 PTE 800076d8e867
  [ 2834.038672] Oops: 0001 [#4] SMP PTI
  [ 2834.042707] CPU: 0 PID: 13537 Comm: dotnet Tainted: G  D   
5.3.0-29-generic #31-Ubuntu
  [ 2834.050591] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
0.0.0 02/06/2015
  [ 2834.054785] RIP: 0033:0x147eaeda
  [ 2834.059017] Code: d0 00 00 00 4c 8b a7 d8 00 00 00 4c 8b af e0 00 00 00 4c 
8b b7 e8 00 00 00 4c 8b bf f0 00 00 00 48 8b bf b0 00 00 00 9d 74 02 <48> cf 48 
8d 64 24 30 5d c3 90 cc c3 66 90 55 4c 8b a7 d8 00 00 00
  [ 2834.072103] RSP: 002b:7fffc2c0 EFLAGS: 0202
  [ 2834.076507] RAX:  RBX: 1554b401af38 RCX: 
0001
  [ 2834.080832] RDX:  RSI:  RDI: 
7fffcfb0
  [ 2834.085010] RBP: 7fffd730 R08:  R09: 
7fffd1b0
  [ 2834.089184] R10: 15331dd5 R11: 153ad8d0 R12: 
0002
  [ 2834.093350] R13: 0001 R14: 0001 R15: 
1554b401d388
  [ 2834.097309] FS:  14fa5740 GS:  
  [ 2834.101131] Modules linked in: isofs nls_iso8859_1 dm_multipath 
scsi_dh_rdac scsi_dh_emc scsi_dh_alua ppdev input_leds serio_raw parport_pc 
parport sch_fq_codel ip_tables x_tables autofs4 btrfs zstd_compress raid10 
raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq 
libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul 
ghash_clmulni_intel aesni_intel aes_x86_64 crypto_simd cryptd glue_helper 
virtio_net psmouse net_failover failover virtio_blk floppy
  [ 2834.122539] CR2: 7fffc2c0
  [ 2834.126867] ---[ end trace dfae51f1d9432708 ]---
  [ 2834.131239] RIP: 0033:0x14d793262eda
  [ 2834.135715] Code: Bad RIP value.
  [ 2834.140243] RSP: 002b:7ffddb4e2980 EFLAGS: 0202
  [ 2834.144615] RAX:  RBX: 14d6f402acb8 RCX: 
0002
  [ 2834.148943] RDX: 01cd6950 RSI:  RDI: 
7ffddb4e3670
  [ 2834.153335] RBP: 7ffddb4e3df0 R08: 0001 R09: 
7ffddb4e3870
  [ 2834.157774] R10: 14d793da9dd5 R11: 14d793e258d0 R12: 
0002
  [ 2834.162132] R13: 0001 R14: 0001 R15: 
14d6f402d040
  [ 2834.166239] FS:  14fa5740() GS:97213ba0() 
knlGS:
  [ 2834.170529] CS:  0033 DS:  ES:  CR0: 80050033
  [ 2834.174751] CR2: 14d793262eb0 CR3: 3613 CR4: 
007406f0
  [ 2834.178892] PKRU: 5554

  I run the application from a shell with `ulimit -s unlimited`
  (unlimited stack to size).

  The application creates a number of threads, and those threads make a
  lot of calls to sigaltstack() and mprotect(); see the relevant source
  for dotnet here
  
https://github.com/dotnet/runtime/blob/15ec69e47b4dc56098e6058a11ccb6ae4d5d4fa1/src/coreclr/src/pal/src/thread/thread.cpp#L2467

  using strace -f on the app shows that no alt stacks come anywhere near
  the failing address; all alt stacks are in the heap, as expected.
  None of the mmap/mprotect/munmap 

Re: [PATCH v6 14/61] target/riscv: vector single-width bit shift instructions

2020-03-19 Thread Alistair Francis
On Tue, Mar 17, 2020 at 8:35 AM LIU Zhiwei  wrote:
>
> Signed-off-by: LIU Zhiwei 
> Reviewed-by: Richard Henderson 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/helper.h   | 25 
>  target/riscv/insn32.decode  |  9 +++
>  target/riscv/insn_trans/trans_rvv.inc.c | 54 
>  target/riscv/vector_helper.c| 85 +
>  4 files changed, 173 insertions(+)
>
> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> index 4373e9e8c2..47284c7476 100644
> --- a/target/riscv/helper.h
> +++ b/target/riscv/helper.h
> @@ -397,3 +397,28 @@ DEF_HELPER_6(vxor_vx_b, void, ptr, ptr, tl, ptr, env, 
> i32)
>  DEF_HELPER_6(vxor_vx_h, void, ptr, ptr, tl, ptr, env, i32)
>  DEF_HELPER_6(vxor_vx_w, void, ptr, ptr, tl, ptr, env, i32)
>  DEF_HELPER_6(vxor_vx_d, void, ptr, ptr, tl, ptr, env, i32)
> +
> +DEF_HELPER_6(vsll_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vsll_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vsll_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vsll_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vsrl_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vsrl_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vsrl_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vsrl_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vsra_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vsra_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vsra_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vsra_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vsll_vx_b, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vsll_vx_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vsll_vx_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vsll_vx_d, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vsrl_vx_b, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vsrl_vx_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vsrl_vx_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vsrl_vx_d, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vsra_vx_b, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vsra_vx_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vsra_vx_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vsra_vx_d, void, ptr, ptr, tl, ptr, env, i32)
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index 3ad6724632..f6d0f5aec5 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -320,6 +320,15 @@ vor_vi  001010 . . . 011 . 1010111 
> @r_vm
>  vxor_vv 001011 . . . 000 . 1010111 @r_vm
>  vxor_vx 001011 . . . 100 . 1010111 @r_vm
>  vxor_vi 001011 . . . 011 . 1010111 @r_vm
> +vsll_vv 100101 . . . 000 . 1010111 @r_vm
> +vsll_vx 100101 . . . 100 . 1010111 @r_vm
> +vsll_vi 100101 . . . 011 . 1010111 @r_vm
> +vsrl_vv 101000 . . . 000 . 1010111 @r_vm
> +vsrl_vx 101000 . . . 100 . 1010111 @r_vm
> +vsrl_vi 101000 . . . 011 . 1010111 @r_vm
> +vsra_vv 101001 . . . 000 . 1010111 @r_vm
> +vsra_vx 101001 . . . 100 . 1010111 @r_vm
> +vsra_vi 101001 . . . 011 . 1010111 @r_vm
>
>  vsetvli 0 ... . 111 . 1010111  @r2_zimm
>  vsetvl  100 . . 111 . 1010111  @r
> diff --git a/target/riscv/insn_trans/trans_rvv.inc.c 
> b/target/riscv/insn_trans/trans_rvv.inc.c
> index b4ba6d83f3..6ed2466e75 100644
> --- a/target/riscv/insn_trans/trans_rvv.inc.c
> +++ b/target/riscv/insn_trans/trans_rvv.inc.c
> @@ -1258,3 +1258,57 @@ GEN_OPIVX_GVEC_TRANS(vxor_vx, xors)
>  GEN_OPIVI_GVEC_TRANS(vand_vi, 0, vand_vx, andi)
>  GEN_OPIVI_GVEC_TRANS(vor_vi, 0, vor_vx,  ori)
>  GEN_OPIVI_GVEC_TRANS(vxor_vi, 0, vxor_vx, xori)
> +
> +/* Vector Single-Width Bit Shift Instructions */
> +GEN_OPIVV_GVEC_TRANS(vsll_vv,  shlv)
> +GEN_OPIVV_GVEC_TRANS(vsrl_vv,  shrv)
> +GEN_OPIVV_GVEC_TRANS(vsra_vv,  sarv)
> +
> +typedef void GVecGen2sFn32(unsigned, uint32_t, uint32_t, TCGv_i32,
> +   uint32_t, uint32_t);
> +
> +static inline bool
> +do_opivx_gvec_shift(DisasContext *s, arg_rmrr *a, GVecGen2sFn32 *gvec_fn,
> +gen_helper_opivx *fn)
> +{
> +if (!opivx_check(s, a)) {
> +return false;
> +}
> +
> +if (a->vm && s->vl_eq_vlmax) {
> +TCGv_i32 src1 = tcg_temp_new_i32();
> +TCGv tmp = tcg_temp_new();
> +
> +gen_get_gpr(tmp, a->rs1);
> +tcg_gen_trunc_tl_i32(src1, tmp);
> +tcg_gen_extract_i32(src1, src1, 0, s->sew + 3);
> +gvec_fn(s->sew, vreg_ofs(s, a->rd), vreg_ofs(s, a->rs2),
> +src1, MAXSZ(s), MAXSZ(s));
> +
> +tcg_temp_free_i32(src1);
> +tcg_temp_free(tmp);
> +return true;

[Bug 1866892] Re: guest OS catches a page fault bug when running dotnet

2020-03-19 Thread Peter Maydell
Is dotnet intentionally doing an iret? It seems like an odd thing for a
userspace program to do, given it's basically "return from interrupt".

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1866892

Title:
  guest OS catches a page  fault bug when running dotnet

Status in QEMU:
  New

Bug description:
  The linux guest OS catches a page fault bug when running the dotnet
  application.

  host = metal = x86_64
  host OS = ubuntu 19.10
  qemu emulation, without KVM, with "tiny code generator" tcg; no plugins; 
built from head/master
  guest emulation = x86_64
  guest OS = ubuntu 19.10
  guest app = dotnet, running any program

  qemu sha=7bc4d1980f95387c4cc921d7a066217ff4e42b70 (head/master Mar 10,
  2020)

  qemu invocation is:

  qemu/build/x86_64-softmmu/qemu-system-x86_64 \
-m size=4096 \
-smp cpus=1 \
-machine type=pc-i440fx-5.0,accel=tcg \
-cpu Skylake-Server-v1 \
-nographic \
-bios OVMF-pure-efi.fd \
-drive if=none,id=hd0,file=ubuntu-19.10-server-cloudimg-amd64.img \
-device virtio-blk,drive=hd0 \
-drive if=none,id=cloud,file=linux_cloud_config.img \
-device virtio-blk,drive=cloud \
-netdev user,id=user0,hostfwd=tcp::2223-:22 \
-device virtio-net,netdev=user0

  
  Here's the guest kernel console output:

  
  [ 2834.005449] BUG: unable to handle page fault for address: 7fffc2c0
  [ 2834.009895] #PF: supervisor read access in user mode
  [ 2834.013872] #PF: error_code(0x0001) - permissions violation
  [ 2834.018025] IDT: 0xfe00 (limit=0xfff) GDT: 0xfe001000 
(limit=0x7f)
  [ 2834.022242] LDTR: NULL
  [ 2834.026306] TR: 0x40 -- base=0xfe003000 limit=0x206f
  [ 2834.030395] PGD 8000360d0067 P4D 8000360d0067 PUD 36105067 PMD 
36193067 PTE 800076d8e867
  [ 2834.038672] Oops: 0001 [#4] SMP PTI
  [ 2834.042707] CPU: 0 PID: 13537 Comm: dotnet Tainted: G  D   
5.3.0-29-generic #31-Ubuntu
  [ 2834.050591] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
0.0.0 02/06/2015
  [ 2834.054785] RIP: 0033:0x147eaeda
  [ 2834.059017] Code: d0 00 00 00 4c 8b a7 d8 00 00 00 4c 8b af e0 00 00 00 4c 
8b b7 e8 00 00 00 4c 8b bf f0 00 00 00 48 8b bf b0 00 00 00 9d 74 02 <48> cf 48 
8d 64 24 30 5d c3 90 cc c3 66 90 55 4c 8b a7 d8 00 00 00
  [ 2834.072103] RSP: 002b:7fffc2c0 EFLAGS: 0202
  [ 2834.076507] RAX:  RBX: 1554b401af38 RCX: 
0001
  [ 2834.080832] RDX:  RSI:  RDI: 
7fffcfb0
  [ 2834.085010] RBP: 7fffd730 R08:  R09: 
7fffd1b0
  [ 2834.089184] R10: 15331dd5 R11: 153ad8d0 R12: 
0002
  [ 2834.093350] R13: 0001 R14: 0001 R15: 
1554b401d388
  [ 2834.097309] FS:  14fa5740 GS:  
  [ 2834.101131] Modules linked in: isofs nls_iso8859_1 dm_multipath 
scsi_dh_rdac scsi_dh_emc scsi_dh_alua ppdev input_leds serio_raw parport_pc 
parport sch_fq_codel ip_tables x_tables autofs4 btrfs zstd_compress raid10 
raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq 
libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul 
ghash_clmulni_intel aesni_intel aes_x86_64 crypto_simd cryptd glue_helper 
virtio_net psmouse net_failover failover virtio_blk floppy
  [ 2834.122539] CR2: 7fffc2c0
  [ 2834.126867] ---[ end trace dfae51f1d9432708 ]---
  [ 2834.131239] RIP: 0033:0x14d793262eda
  [ 2834.135715] Code: Bad RIP value.
  [ 2834.140243] RSP: 002b:7ffddb4e2980 EFLAGS: 0202
  [ 2834.144615] RAX:  RBX: 14d6f402acb8 RCX: 
0002
  [ 2834.148943] RDX: 01cd6950 RSI:  RDI: 
7ffddb4e3670
  [ 2834.153335] RBP: 7ffddb4e3df0 R08: 0001 R09: 
7ffddb4e3870
  [ 2834.157774] R10: 14d793da9dd5 R11: 14d793e258d0 R12: 
0002
  [ 2834.162132] R13: 0001 R14: 0001 R15: 
14d6f402d040
  [ 2834.166239] FS:  14fa5740() GS:97213ba0() 
knlGS:
  [ 2834.170529] CS:  0033 DS:  ES:  CR0: 80050033
  [ 2834.174751] CR2: 14d793262eb0 CR3: 3613 CR4: 
007406f0
  [ 2834.178892] PKRU: 5554

  I run the application from a shell with `ulimit -s unlimited`
  (unlimited stack to size).

  The application creates a number of threads, and those threads make a
  lot of calls to sigaltstack() and mprotect(); see the relevant source
  for dotnet here
  
https://github.com/dotnet/runtime/blob/15ec69e47b4dc56098e6058a11ccb6ae4d5d4fa1/src/coreclr/src/pal/src/thread/thread.cpp#L2467

  using strace -f on the app shows that no alt stacks come anywhere near
  the failing address; all alt stacks are in the heap, as expected.
  None of the mmap/mprotect/munmap syscalls were given arguments in the
  high memory 0x7fff and 

Re: [PATCH] lockable: replaced locks with lock guard macros where appropriate

2020-03-19 Thread no-reply
Patchew URL: 
https://patchew.org/QEMU/20200319161925.1818377-2-dnbrd...@gmail.com/



Hi,

This series failed the docker-mingw@fedora build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#! /bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-mingw@fedora J=14 NETWORK=1
=== TEST SCRIPT END ===

  CC  trace-root.o
In file included from /tmp/qemu-test/src/util/rcu.c:34:
/tmp/qemu-test/src/util/rcu.c: In function 'synchronize_rcu':
/tmp/qemu-test/src/include/qemu/lockable.h:173:29: error: redefinition of 
'qemu_lockable_auto__COUNTER__'
 g_autoptr(QemuLockable) qemu_lockable_auto##__COUNTER__ = \
 ^~
/tmp/qemu-test/src/util/rcu.c:152:5: note: in expansion of macro 
'QEMU_LOCK_GUARD'
---
/tmp/qemu-test/src/util/rcu.c:145:5: note: in expansion of macro 
'QEMU_LOCK_GUARD'
 QEMU_LOCK_GUARD(_sync_lock);
 ^~~
make: *** [/tmp/qemu-test/src/rules.mak:69: util/rcu.o] Error 1
make: *** Waiting for unfinished jobs
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 664, in 
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=4b63ffd2ab5f4987a0a55d2a52a064c7', '-u', 
'1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', 
'-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 
'SHOW_ENV=', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
'/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-5ena2idk/src/docker-src.2020-03-19-15.39.39.23586:/var/tmp/qemu:z,ro',
 'qemu:fedora', '/var/tmp/qemu/run', 'test-mingw']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=4b63ffd2ab5f4987a0a55d2a52a064c7
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-5ena2idk/src'
make: *** [docker-run-test-mingw@fedora] Error 2

real1m46.752s
user0m8.628s


The full log is available at
http://patchew.org/logs/20200319161925.1818377-2-dnbrd...@gmail.com/testing.docker-mingw@fedora/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH] lockable: replaced locks with lock guard macros where appropriate

2020-03-19 Thread no-reply
Patchew URL: 
https://patchew.org/QEMU/20200319161925.1818377-2-dnbrd...@gmail.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

  CC  hw/i2c/trace.o
In file included from /tmp/qemu-test/src/util/rcu.c:34:0:
/tmp/qemu-test/src/util/rcu.c: In function 'synchronize_rcu':
/tmp/qemu-test/src/include/qemu/lockable.h:173:29: error: redefinition of 
'qemu_lockable_auto__COUNTER__'
 g_autoptr(QemuLockable) qemu_lockable_auto##__COUNTER__ = \
 ^
/tmp/qemu-test/src/util/rcu.c:152:5: note: in expansion of macro 
'QEMU_LOCK_GUARD'
---
/tmp/qemu-test/src/util/rcu.c:145:5: note: in expansion of macro 
'QEMU_LOCK_GUARD'
 QEMU_LOCK_GUARD(_sync_lock);
 ^
make: *** [util/rcu.o] Error 1
make: *** Waiting for unfinished jobs
  CC  hw/i386/trace.o
Traceback (most recent call last):
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=a933fe13ee2642d4bc80a6fa2e811043', '-u', 
'1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', 
'-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 
'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
'/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-uea2fvv2/src/docker-src.2020-03-19-15.37.40.16923:/var/tmp/qemu:z,ro',
 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=a933fe13ee2642d4bc80a6fa2e811043
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-uea2fvv2/src'
make: *** [docker-run-test-quick@centos7] Error 2

real1m40.963s
user0m8.233s


The full log is available at
http://patchew.org/logs/20200319161925.1818377-2-dnbrd...@gmail.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

[PATCH v2 6/6] scripts/coverity-scan: Add Docker support

2020-03-19 Thread Peter Maydell
Add support for running the Coverity Scan tools inside a Docker
container rather than directly on the host system.

Signed-off-by: Peter Maydell 
---
v1->v2:
 * various bug fixes
 * added --src-tarball rather than putting the whole source
   tree in the 'secrets' directory
 * docker file package list updated
---
 scripts/coverity-scan/coverity-scan.docker | 131 +
 scripts/coverity-scan/run-coverity-scan|  90 ++
 2 files changed, 221 insertions(+)
 create mode 100644 scripts/coverity-scan/coverity-scan.docker

diff --git a/scripts/coverity-scan/coverity-scan.docker 
b/scripts/coverity-scan/coverity-scan.docker
new file mode 100644
index 000..a4f64d12834
--- /dev/null
+++ b/scripts/coverity-scan/coverity-scan.docker
@@ -0,0 +1,131 @@
+# syntax=docker/dockerfile:1.0.0-experimental
+#
+# Docker setup for running the "Coverity Scan" tools over the source
+# tree and uploading them to the website, as per
+# https://scan.coverity.com/projects/qemu/builds/new
+# We do this on a fixed config (currently Fedora 30 with a known
+# set of dependencies and a configure command that enables a specific
+# set of options) so that random changes don't result in our accidentally
+# dropping some files from the scan.
+#
+# We don't build on top of the fedora.docker file because we don't
+# want to accidentally change or break the scan config when that
+# is updated.
+
+# The work of actually doing the build is handled by the
+# run-coverity-scan script.
+
+FROM fedora:30
+ENV PACKAGES \
+alsa-lib-devel \
+bc \
+bison \
+brlapi-devel \
+bzip2 \
+bzip2-devel \
+ccache \
+clang \
+curl \
+cyrus-sasl-devel \
+dbus-daemon \
+device-mapper-multipath-devel \
+findutils \
+flex \
+gcc \
+gcc-c++ \
+gettext \
+git \
+glib2-devel \
+glusterfs-api-devel \
+gnutls-devel \
+gtk3-devel \
+hostname \
+libaio-devel \
+libasan \
+libattr-devel \
+libblockdev-mpath-devel \
+libcap-devel \
+libcap-ng-devel \
+libcurl-devel \
+libepoxy-devel \
+libfdt-devel \
+libgbm-devel \
+libiscsi-devel \
+libjpeg-devel \
+libpmem-devel \
+libnfs-devel \
+libpng-devel \
+librbd-devel \
+libseccomp-devel \
+libssh-devel \
+libubsan \
+libudev-devel \
+libusbx-devel \
+libxml2-devel \
+libzstd-devel \
+llvm \
+lzo-devel \
+make \
+mingw32-bzip2 \
+mingw32-curl \
+mingw32-glib2 \
+mingw32-gmp \
+mingw32-gnutls \
+mingw32-gtk3 \
+mingw32-libjpeg-turbo \
+mingw32-libpng \
+mingw32-libtasn1 \
+mingw32-nettle \
+mingw32-nsis \
+mingw32-pixman \
+mingw32-pkg-config \
+mingw32-SDL2 \
+mingw64-bzip2 \
+mingw64-curl \
+mingw64-glib2 \
+mingw64-gmp \
+mingw64-gnutls \
+mingw64-gtk3 \
+mingw64-libjpeg-turbo \
+mingw64-libpng \
+mingw64-libtasn1 \
+mingw64-nettle \
+mingw64-pixman \
+mingw64-pkg-config \
+mingw64-SDL2 \
+ncurses-devel \
+nettle-devel \
+nss-devel \
+numactl-devel \
+perl \
+perl-Test-Harness \
+pixman-devel \
+pulseaudio-libs-devel \
+python3 \
+python3-sphinx \
+PyYAML \
+rdma-core-devel \
+SDL2-devel \
+snappy-devel \
+sparse \
+spice-server-devel \
+systemd-devel \
+systemtap-sdt-devel \
+tar \
+texinfo \
+usbredir-devel \
+virglrenderer-devel \
+vte291-devel \
+wget \
+which \
+xen-devel \
+xfsprogs-devel \
+zlib-devel
+ENV QEMU_CONFIGURE_OPTS --python=/usr/bin/python3
+
+RUN dnf install -y $PACKAGES
+RUN rpm -q $PACKAGES | sort > /packages.txt
+ENV PATH $PATH:/usr/libexec/python3-sphinx/
+ENV COVERITY_TOOL_BASE=/coverity-tools
+COPY run-coverity-scan run-coverity-scan
+RUN --mount=type=secret,id=coverity.token,required ./run-coverity-scan 
--update-tools-only --tokenfile /run/secrets/coverity.token
diff --git a/scripts/coverity-scan/run-coverity-scan 
b/scripts/coverity-scan/run-coverity-scan
index d40b51969fa..2e067ef5cfc 100755
--- a/scripts/coverity-scan/run-coverity-scan
+++ b/scripts/coverity-scan/run-coverity-scan
@@ -29,6 +29,7 @@
 
 # Command line options:
 #   --dry-run : run the tools, but don't actually do the upload
+#   --docker : create and work inside a docker container
 #   --update-tools-only : update the cached copy of the tools, but don't run 
them
 #   --tokenfile : file to read Coverity token from
 #   --version ver : specify version being analyzed (default: ask git)
@@ -36,6 +37,8 @@
 #   --srcdir : QEMU source tree to analyze (default: current working dir)
 #   --results-tarball : path to copy the results tarball to (default: don't
 #   copy it anywhere, just upload it)
+#   --src-tarball : tarball to untar into src dir (default: none); this
+#   is intended mainly for internal use by the Docker support
 #
 # User-specifiable environment 

[PATCH v2 3/6] thread.h: Remove trailing semicolons from Coverity qemu_mutex_lock() etc

2020-03-19 Thread Peter Maydell
All the Coverity-specific definitions of qemu_mutex_lock() and friends
have a trailing semicolon. This works fine almost everywhere because
of QEMU's mandatory-braces coding style and because most callsites are
simple, but target/s390x/sigp.c has a use of qemu_mutex_trylock() as
an if() statement, which makes the ';' a syntax error:
"../target/s390x/sigp.c", line 461: warning #18: expected a ")"
  if (qemu_mutex_trylock(_sigp_mutex)) {
  ^

Remove the bogus semicolons from the macro definitions.

Signed-off-by: Peter Maydell 
---
 include/qemu/thread.h | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/qemu/thread.h b/include/qemu/thread.h
index 10262c63f58..d22848138ea 100644
--- a/include/qemu/thread.h
+++ b/include/qemu/thread.h
@@ -57,17 +57,17 @@ extern QemuCondTimedWaitFunc qemu_cond_timedwait_func;
  * hide them.
  */
 #define qemu_mutex_lock(m)  \
-qemu_mutex_lock_impl(m, __FILE__, __LINE__);
+qemu_mutex_lock_impl(m, __FILE__, __LINE__)
 #define qemu_mutex_trylock(m)   \
-qemu_mutex_trylock_impl(m, __FILE__, __LINE__);
+qemu_mutex_trylock_impl(m, __FILE__, __LINE__)
 #define qemu_rec_mutex_lock(m)  \
-qemu_rec_mutex_lock_impl(m, __FILE__, __LINE__);
+qemu_rec_mutex_lock_impl(m, __FILE__, __LINE__)
 #define qemu_rec_mutex_trylock(m)   \
-qemu_rec_mutex_trylock_impl(m, __FILE__, __LINE__);
+qemu_rec_mutex_trylock_impl(m, __FILE__, __LINE__)
 #define qemu_cond_wait(c, m)\
-qemu_cond_wait_impl(c, m, __FILE__, __LINE__);
+qemu_cond_wait_impl(c, m, __FILE__, __LINE__)
 #define qemu_cond_timedwait(c, m, ms)   \
-qemu_cond_timedwait_impl(c, m, ms, __FILE__, __LINE__);
+qemu_cond_timedwait_impl(c, m, ms, __FILE__, __LINE__)
 #else
 #define qemu_mutex_lock(m) ({   \
 QemuMutexLockFunc _f = atomic_read(_mutex_lock_func);  \
-- 
2.20.1




[PATCH v2 5/6] scripts/run-coverity-scan: Script to run Coverity Scan build

2020-03-19 Thread Peter Maydell
Add a new script to automate the process of running the Coverity
Scan build tools and uploading the resulting tarball to the
website.

This is intended eventually to be driven from Travis,
but it can be run locally, if you are a maintainer of the
QEMU project on the Coverity Scan website and have the secret
upload token.

The script must be run on a Fedora 30 system.  Support for using a
Docker container is added in a following commit.

Signed-off-by: Peter Maydell 
---
changes v1->v2:
 * fix sense of DRYRUN test in check_upload_permissions
 * use nproc rather than hardcoding -j8
 * use $PWD rather than $(pwd)
 * minor tweaks to configure line
 * new --results-tarball option
---
 MAINTAINERS |   5 +
 scripts/coverity-scan/run-coverity-scan | 311 
 2 files changed, 316 insertions(+)
 create mode 100755 scripts/coverity-scan/run-coverity-scan

diff --git a/MAINTAINERS b/MAINTAINERS
index 7364af0d8b0..395534522b6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2003,6 +2003,11 @@ M: Markus Armbruster 
 S: Supported
 F: scripts/coverity-model.c
 
+Coverity Scan integration
+M: Peter Maydell 
+S: Maintained
+F: scripts/coverity-scan/
+
 Device Tree
 M: Alistair Francis 
 R: David Gibson 
diff --git a/scripts/coverity-scan/run-coverity-scan 
b/scripts/coverity-scan/run-coverity-scan
new file mode 100755
index 000..d40b51969fa
--- /dev/null
+++ b/scripts/coverity-scan/run-coverity-scan
@@ -0,0 +1,311 @@
+#!/bin/sh -e
+
+# Upload a created tarball to Coverity Scan, as per
+# https://scan.coverity.com/projects/qemu/builds/new
+
+# This work is licensed under the terms of the GNU GPL version 2,
+# or (at your option) any later version.
+# See the COPYING file in the top-level directory.
+#
+# Copyright (c) 2017-2020 Linaro Limited
+# Written by Peter Maydell
+
+# Note that this script will automatically download and
+# run the (closed-source) coverity build tools, so don't
+# use it if you don't trust them!
+
+# This script assumes that you're running it from a QEMU source
+# tree, and that tree is a fresh clean one, because we do an in-tree
+# build. (This is necessary so that the filenames that the Coverity
+# Scan server sees are relative paths that match up with the component
+# regular expressions it uses; an out-of-tree build won't work for this.)
+# The host machine should have as many of QEMU's dependencies
+# installed as possible, for maximum coverity coverage.
+
+# To do an upload you need to be a maintainer in the Coverity online
+# service, and you will need to know the "Coverity token", which is a
+# secret 8 digit hex string. You can find that from the web UI in the
+# project settings, if you have maintainer access there.
+
+# Command line options:
+#   --dry-run : run the tools, but don't actually do the upload
+#   --update-tools-only : update the cached copy of the tools, but don't run 
them
+#   --tokenfile : file to read Coverity token from
+#   --version ver : specify version being analyzed (default: ask git)
+#   --description desc : specify description of this version (default: ask git)
+#   --srcdir : QEMU source tree to analyze (default: current working dir)
+#   --results-tarball : path to copy the results tarball to (default: don't
+#   copy it anywhere, just upload it)
+#
+# User-specifiable environment variables:
+#  COVERITY_TOKEN -- Coverity token
+#  COVERITY_EMAIL -- the email address to use for uploads (default:
+#looks at your git user.email config)
+#  COVERITY_BUILD_CMD -- make command (default: 'make -jN' where N is
+#number of CPUs as determined by 'nproc')
+#  COVERITY_TOOL_BASE -- set to directory to put coverity tools
+#(default: /tmp/coverity-tools)
+#
+# You must specify the token, either by environment variable or by
+# putting it in a file and using --tokenfile. Everything else has
+# a reasonable default if this is run from a git tree.
+
+check_upload_permissions() {
+# Check whether we can do an upload to the server; will exit the script
+# with status 1 if the check failed (usually a bad token);
+# will exit the script with status 0 if the check indicated that we
+# can't upload yet (ie we are at quota)
+# Assumes that PROJTOKEN, PROJNAME and DRYRUN have been initialized.
+
+echo "Checking upload permissions..."
+
+if ! up_perm="$(wget https://scan.coverity.com/api/upload_permitted 
--post-data "token=$PROJTOKEN=$PROJNAME" -q -O -)"; then
+echo "Coverity Scan API access denied: bad token?"
+exit 1
+fi
+
+# Really up_perm is a JSON response with either
+# {upload_permitted:true} or {next_upload_permitted_at:}
+# We do some hacky string parsing instead of properly parsing it.
+case "$up_perm" in
+*upload_permitted*true*)
+echo "Coverity Scan: upload permitted"
+;;
+*next_upload_permitted_at*)
+if [ "$DRYRUN" = yes 

[PATCH v2 0/6] Automation of Coverity Scan uploads (via Docker)

2020-03-19 Thread Peter Maydell
v1 of this series was over a year ago:
https://patchew.org/QEMU/20181113184641.4492-1-peter.mayd...@linaro.org/

I dusted it off and fixed some stuff because Paolo reports that the
machine he was previously using for uploads can't run the Coverity
tools any more.

The first four patches are fixes for problems that cause the Coverity
tool not to be able to scan everything.  The first one in particular
meant that every compilation unit failed, which would block uploads. 
The other 3 would reduce the scan coverage but weren't fatal.  (The
only remaining warnings in the log are where Coverity complains about
asm intrinsics system headers.)

With these scripts you can do an upload with
COVERITY_TOKEN=n ./scripts/coverity-scan/run-coverity-scan --docker
(where  is the project's secret token which admins can
get from the Coverity web UI).

I did in fact do an upload to test it, so the currently visible
results on the website are the result of a scan on ce73691e258 plus
this series.

The new upload has +112 defects, which is quite a lot, but I don't
think it's so many that it is "defects we rejected as false positives
coming back again"; my guess is a combination of the fixes in the
first 4 patches increasing coverage plus we haven't run a test in a
while plus maybe the script has some more config options enabled that
Paolo's box did not.  (In the web UI defects that were dismissed as
FPs seem still to be considered present-but-dismissed, so I think
that's OK.)

Not much has changed since v1; I didn't get very much feedback
the first time around[*]. Docker still seems to do the "download
the Coverity tools" part more often than I expect. On the other
hand "actually automated with a script in the tree" beats "not
automated and currently broken" so maybe this patchset as it
stands is good enough, given that basically 1 or 2 people ever
will be running the script ?

[*] Eric will note that yes, the script still uses set -e.

(Like v1 this doesn't try to tie it into Travis, but we could
in theory do that some day, or have some other automated once
a week run of the script.)

thanks
-- PMM

Peter Maydell (6):
  osdep.h: Drop no-longer-needed Coverity workarounds
  thread.h: Fix Coverity version of qemu_cond_timedwait()
  thread.h: Remove trailing semicolons from Coverity qemu_mutex_lock()
etc
  linux-user/flatload.c: Use "" for include of QEMU header target_flat.h
  scripts/run-coverity-scan: Script to run Coverity Scan build
  scripts/coverity-scan: Add Docker support

 include/qemu/osdep.h   |  14 -
 include/qemu/thread.h  |  12 +-
 linux-user/flatload.c  |   2 +-
 MAINTAINERS|   5 +
 scripts/coverity-scan/coverity-scan.docker | 131 +++
 scripts/coverity-scan/run-coverity-scan| 401 +
 6 files changed, 544 insertions(+), 21 deletions(-)
 create mode 100644 scripts/coverity-scan/coverity-scan.docker
 create mode 100755 scripts/coverity-scan/run-coverity-scan

-- 
2.20.1



[PATCH v2 4/6] linux-user/flatload.c: Use "" for include of QEMU header target_flat.h

2020-03-19 Thread Peter Maydell
The target_flat.h file is a QEMU header, so we should include it using
quotes, not angle brackets.

Coverity otherwise is unable to find the header:

"../linux-user/flatload.c", line 40: error #1712: cannot open source file
  "target_flat.h"
  #include 
  ^

because the relevant directory is only on the -iquote path, not the -I path.

Signed-off-by: Peter Maydell 
---
I don't know why Coverity in particular has trouble here but
real compilers don't. Still, the "" is the right thing.
---
 linux-user/flatload.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/linux-user/flatload.c b/linux-user/flatload.c
index 0122ab3afe6..66901f39cc5 100644
--- a/linux-user/flatload.c
+++ b/linux-user/flatload.c
@@ -37,7 +37,7 @@
 
 #include "qemu.h"
 #include "flat.h"
-#include 
+#include "target_flat.h"
 
 //#define DEBUG
 
-- 
2.20.1




[PATCH v2 2/6] thread.h: Fix Coverity version of qemu_cond_timedwait()

2020-03-19 Thread Peter Maydell
For Coverity's benefit, we provide simpler versions of functions like
qemu_mutex_lock(), qemu_cond_wait() and qemu_cond_timedwait().  When
we added qemu_cond_timedwait() in commit 3dcc9c6ec4ea, a cut and
paste error meant that the Coverity version of qemu_cond_timedwait()
was using the wrong _impl function, which makes the Coverity parser
complain:

"/qemu/include/qemu/thread.h", line 159: warning #140: too many arguments in
  function call
  return qemu_cond_timedwait(cond, mutex, ms);
 ^

"/qemu/include/qemu/thread.h", line 159: warning #120: return value type does
  not match the function type
  return qemu_cond_timedwait(cond, mutex, ms);
 ^

"/qemu/include/qemu/thread.h", line 156: warning #1563: function
  "qemu_cond_timedwait" not emitted, consider modeling it or review
  parse diagnostics to improve fidelity
  static inline bool (qemu_cond_timedwait)(QemuCond *cond, QemuMutex *mutex,
  ^

These aren't fatal, but reduce the scope of the analysis. Fix the error.

Signed-off-by: Peter Maydell 
---
 include/qemu/thread.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/qemu/thread.h b/include/qemu/thread.h
index 047db0307e7..10262c63f58 100644
--- a/include/qemu/thread.h
+++ b/include/qemu/thread.h
@@ -67,7 +67,7 @@ extern QemuCondTimedWaitFunc qemu_cond_timedwait_func;
 #define qemu_cond_wait(c, m)\
 qemu_cond_wait_impl(c, m, __FILE__, __LINE__);
 #define qemu_cond_timedwait(c, m, ms)   \
-qemu_cond_wait_impl(c, m, ms, __FILE__, __LINE__);
+qemu_cond_timedwait_impl(c, m, ms, __FILE__, __LINE__);
 #else
 #define qemu_mutex_lock(m) ({   \
 QemuMutexLockFunc _f = atomic_read(_mutex_lock_func);  \
-- 
2.20.1




[PATCH v2 1/6] osdep.h: Drop no-longer-needed Coverity workarounds

2020-03-19 Thread Peter Maydell
In commit a1a98357e3fd in 2018 we added some workarounds for Coverity
not being able to handle the _Float* types introduced by recent
glibc.  Newer versions of the Coverity scan tools have support for
these types, and will fail with errors about duplicate typedefs if we
have our workaround.  Remove our copy of the typedefs.

Signed-off-by: Peter Maydell 
---
Sadly I don't think there's any way to tell whether we should
or should not provide the typedefs, so anybody with an older
Coverity will presumably find this breaks them.
---
 include/qemu/osdep.h | 14 --
 1 file changed, 14 deletions(-)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 9bd3dcfd136..20f5c5f197d 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -33,20 +33,6 @@
 #else
 #include "exec/poison.h"
 #endif
-#ifdef __COVERITY__
-/* Coverity does not like the new _Float* types that are used by
- * recent glibc, and croaks on every single file that includes
- * stdlib.h.  These typedefs are enough to please it.
- *
- * Note that these fix parse errors so they cannot be placed in
- * scripts/coverity-model.c.
- */
-typedef float _Float32;
-typedef double _Float32x;
-typedef double _Float64;
-typedef __float80 _Float64x;
-typedef __float128 _Float128;
-#endif
 
 #include "qemu/compiler.h"
 
-- 
2.20.1




Re: [PATCH] lockable: replaced locks with lock guard macros where appropriate

2020-03-19 Thread no-reply
Patchew URL: 
https://patchew.org/QEMU/20200319161925.1818377-2-dnbrd...@gmail.com/



Hi,

This series failed the asan build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-debug@fedora TARGET_LIST=x86_64-softmmu J=14 NETWORK=1
=== TEST SCRIPT END ===

  CC  trace-root.o
  CC  accel/kvm/trace.o
  CC  accel/tcg/trace.o
/tmp/qemu-test/src/util/thread-pool.c:213:5: error: unused variable 
'qemu_lockable_auto__COUNTER__' [-Werror,-Wunused-variable]
QEMU_LOCK_GUARD(>lock);
^
/tmp/qemu-test/src/include/qemu/lockable.h:173:29: note: expanded from macro 
'QEMU_LOCK_GUARD'
---
qemu_lockable_auto__COUNTER__
^
1 error generated.
make: *** [/tmp/qemu-test/src/rules.mak:69: util/thread-pool.o] Error 1
make: *** Waiting for unfinished jobs
  CC  backends/trace.o
/tmp/qemu-test/src/util/rcu.c:152:5: error: redefinition of 
'qemu_lockable_auto__COUNTER__'
QEMU_LOCK_GUARD(_registry_lock);
^
/tmp/qemu-test/src/include/qemu/lockable.h:173:29: note: expanded from macro 
'QEMU_LOCK_GUARD'
---
qemu_lockable_auto__COUNTER__
^
1 error generated.
make: *** [/tmp/qemu-test/src/rules.mak:69: util/rcu.o] Error 1
/tmp/qemu-test/src/util/vfio-helpers.c:671:5: error: unused variable 
'qemu_lockable_auto__COUNTER__' [-Werror,-Wunused-variable]
QEMU_LOCK_GUARD(>lock);
^
/tmp/qemu-test/src/include/qemu/lockable.h:173:29: note: expanded from macro 
'QEMU_LOCK_GUARD'
---
qemu_lockable_auto__COUNTER__
^
1 error generated.
make: *** [/tmp/qemu-test/src/rules.mak:69: util/vfio-helpers.o] Error 1
/tmp/qemu-test/src/util/log.c:98:5: error: unused variable 
'qemu_lockable_auto__COUNTER__' [-Werror,-Wunused-variable]
QEMU_LOCK_GUARD(_logfile_mutex);
^
/tmp/qemu-test/src/include/qemu/lockable.h:173:29: note: expanded from macro 
'QEMU_LOCK_GUARD'
---
qemu_lockable_auto__COUNTER__
^
1 error generated.
make: *** [/tmp/qemu-test/src/rules.mak:69: util/log.o] Error 1
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 664, in 
sys.exit(main())
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=45b667b7aa8a4261818bebb78971bc11', '-u', 
'1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 
'TARGET_LIST=x86_64-softmmu', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 
'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=', '-e', 'CCACHE_DIR=/var/tmp/ccache', 
'-v', '/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-p_chzwm4/src/docker-src.2020-03-19-15.28.32.10188:/var/tmp/qemu:z,ro',
 'qemu:fedora', '/var/tmp/qemu/run', 'test-debug']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=45b667b7aa8a4261818bebb78971bc11
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-p_chzwm4/src'
make: *** [docker-run-test-debug@fedora] Error 2

real2m54.468s
user0m7.639s


The full log is available at
http://patchew.org/logs/20200319161925.1818377-2-dnbrd...@gmail.com/testing.asan/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH v5 2/3] net: tulip: add .can_receive routine

2020-03-19 Thread Philippe Mathieu-Daudé

On 3/19/20 6:40 PM, P J P wrote:

From: Prasad J Pandit 

Define .can_receive routine to do sanity checks before receiving
packet data.

Signed-off-by: Prasad J Pandit 
---
  hw/net/tulip.c | 15 ++-
  1 file changed, 14 insertions(+), 1 deletion(-)

Update v3: define .can_receive routine
   -> https://lists.gnu.org/archive/html/qemu-devel/2020-02/msg06275.html

Update v5: fix a typo in commit log message
   -> https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg06209.html

diff --git a/hw/net/tulip.c b/hw/net/tulip.c
index fbe40095da..757f12c710 100644
--- a/hw/net/tulip.c
+++ b/hw/net/tulip.c
@@ -229,6 +229,18 @@ static bool tulip_filter_address(TULIPState *s, const 
uint8_t *addr)
  return ret;
  }
  
+static int

+tulip_can_receive(NetClientState *nc)
+{
+TULIPState *s = qemu_get_nic_opaque(nc);
+
+if (s->rx_frame_len || tulip_rx_stopped(s)) {
+return false;
+}
+
+return true;
+}
+
  static ssize_t tulip_receive(TULIPState *s, const uint8_t *buf, size_t size)
  {
  struct tulip_descriptor desc;
@@ -236,7 +248,7 @@ static ssize_t tulip_receive(TULIPState *s, const uint8_t 
*buf, size_t size)
  trace_tulip_receive(buf, size);
  
  if (size < 14 || size > sizeof(s->rx_frame) - 4

-|| s->rx_frame_len || tulip_rx_stopped(s)) {
+|| !tulip_can_receive(s->nic->ncs)) {
  return 0;
  }
  
@@ -288,6 +300,7 @@ static NetClientInfo net_tulip_info = {

  .type = NET_CLIENT_DRIVER_NIC,
  .size = sizeof(NICState),
  .receive = tulip_receive_nc,
+.can_receive = tulip_can_receive,
  };
  
  static const char *tulip_reg_name(const hwaddr addr)




Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH 0/4] linux-user: Fix some issues in termbits.h files

2020-03-19 Thread Aleksandar Markovic
On Thursday, March 19, 2020, Laurent Vivier  wrote:

> Le 19/03/2020 à 17:24, Aleksandar Markovic a écrit :
> >> I think we should first introduce a linux-user/generic/termbits.h as we
> >> have an asm-generic/termbits.h in the kernel and use it with all the
> >> targets except alpha, mips, hppa, sparc and xtensa.
> >>
> >> I think this linux-user/generic/termbits.h could be copied from
> >> linux-user/openrisc/termbits.h (without the ioctl definitions)
> >>
> >> Then you could update the remaining ones.
> >>
> >
> > I agree with you, Laurent, that would be the cleanest
> > implementation.
> >
> > However, I think it requires at least several days of meticulous
> > dev work, that I can't afford at this moment. May I ask you to
> > accept this series as is for 5.0, as a sort of bridge towards
> > the implementation you described? It certainly fixes a majority
> > of termbits-related bugs, many of them remained latent just
> > by fact that XXX and TARGET_XXX were identical. The most
> > affected targets, xtensa, mips and alpha should be cleaned up
> > by this series wrt termbits, and for great majority of issues
> > are cleaned up for all platforms.
> >
> > I just don't have enough time resources to additionally
> > devote to this problem.
>
> ok, but I need to review and test them, I don't know if I will have
> enough time for that. I will try...
>
>
OK, thanks!

Aleksandar



> Thanks,
> Laurent
>


Re: [PATCH v14 Kernel 4/7] vfio iommu: Implementation of ioctl for dirty pages tracking.

2020-03-19 Thread Kirti Wankhede




On 3/19/2020 9:15 AM, Alex Williamson wrote:

On Thu, 19 Mar 2020 01:11:11 +0530
Kirti Wankhede  wrote:


VFIO_IOMMU_DIRTY_PAGES ioctl performs three operations:
- Start dirty pages tracking while migration is active
- Stop dirty pages tracking.
- Get dirty pages bitmap. Its user space application's responsibility to
   copy content of dirty pages from source to destination during migration.

To prevent DoS attack, memory for bitmap is allocated per vfio_dma
structure. Bitmap size is calculated considering smallest supported page
size. Bitmap is allocated for all vfio_dmas when dirty logging is enabled

Bitmap is populated for already pinned pages when bitmap is allocated for
a vfio_dma with the smallest supported page size. Update bitmap from
pinning functions when tracking is enabled. When user application queries
bitmap, check if requested page size is same as page size used to
populated bitmap. If it is equal, copy bitmap, but if not equal, return
error.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
  drivers/vfio/vfio_iommu_type1.c | 205 +++-
  1 file changed, 203 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 70aeab921d0f..d6417fb02174 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -71,6 +71,7 @@ struct vfio_iommu {
unsigned intdma_avail;
boolv2;
boolnesting;
+   booldirty_page_tracking;
  };
  
  struct vfio_domain {

@@ -91,6 +92,7 @@ struct vfio_dma {
boollock_cap;   /* capable(CAP_IPC_LOCK) */
struct task_struct  *task;
struct rb_root  pfn_list;   /* Ex-user pinned pfn list */
+   unsigned long   *bitmap;


We've made the bitmap a width invariant u64 else, should be here as
well.



Changing to u64 causes compile time warnings as below. Keeping 'unsigned 
long *'


drivers/vfio/vfio_iommu_type1.c: In function ‘vfio_dma_bitmap_alloc_all’:
drivers/vfio/vfio_iommu_type1.c:232:8: warning: passing argument 1 of 
‘bitmap_set’ from incompatible pointer type [enabled by default]

(vpfn->iova - dma->iova) / pgsize, 1);
^
In file included from ./include/linux/cpumask.h:12:0,
 from ./arch/x86/include/asm/cpumask.h:5,
 from ./arch/x86/include/asm/msr.h:11,
 from ./arch/x86/include/asm/processor.h:22,
 from ./arch/x86/include/asm/cpufeature.h:5,
 from ./arch/x86/include/asm/thread_info.h:53,
 from ./include/linux/thread_info.h:38,
 from ./arch/x86/include/asm/preempt.h:7,
 from ./include/linux/preempt.h:78,
 from ./include/linux/spinlock.h:51,
 from ./include/linux/seqlock.h:36,
 from ./include/linux/time.h:6,
 from ./include/linux/compat.h:10,
 from drivers/vfio/vfio_iommu_type1.c:24:
./include/linux/bitmap.h:405:29: note: expected ‘long unsigned int *’ 
but argument is of type ‘u64 *’
 static __always_inline void bitmap_set(unsigned long *map, unsigned 
int start,


Thanks,
Kirti



[Bug 1866892] Re: guest OS catches a page fault bug when running dotnet

2020-03-19 Thread Robert Henry
I have confirmed that the dotnet guest application is executing a
"iretq" instruction when this guest kernel bug is hit. A first round of
analysis shows nothing unreasonable at the point the iretq is executed.
The $rsp points into the middle of a mapped in page, the returned-to
$rip looks reasonable, etc. We continue our analysis of qemu and the
dotnet runtime.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1866892

Title:
  guest OS catches a page  fault bug when running dotnet

Status in QEMU:
  New

Bug description:
  The linux guest OS catches a page fault bug when running the dotnet
  application.

  host = metal = x86_64
  host OS = ubuntu 19.10
  qemu emulation, without KVM, with "tiny code generator" tcg; no plugins; 
built from head/master
  guest emulation = x86_64
  guest OS = ubuntu 19.10
  guest app = dotnet, running any program

  qemu sha=7bc4d1980f95387c4cc921d7a066217ff4e42b70 (head/master Mar 10,
  2020)

  qemu invocation is:

  qemu/build/x86_64-softmmu/qemu-system-x86_64 \
-m size=4096 \
-smp cpus=1 \
-machine type=pc-i440fx-5.0,accel=tcg \
-cpu Skylake-Server-v1 \
-nographic \
-bios OVMF-pure-efi.fd \
-drive if=none,id=hd0,file=ubuntu-19.10-server-cloudimg-amd64.img \
-device virtio-blk,drive=hd0 \
-drive if=none,id=cloud,file=linux_cloud_config.img \
-device virtio-blk,drive=cloud \
-netdev user,id=user0,hostfwd=tcp::2223-:22 \
-device virtio-net,netdev=user0

  
  Here's the guest kernel console output:

  
  [ 2834.005449] BUG: unable to handle page fault for address: 7fffc2c0
  [ 2834.009895] #PF: supervisor read access in user mode
  [ 2834.013872] #PF: error_code(0x0001) - permissions violation
  [ 2834.018025] IDT: 0xfe00 (limit=0xfff) GDT: 0xfe001000 
(limit=0x7f)
  [ 2834.022242] LDTR: NULL
  [ 2834.026306] TR: 0x40 -- base=0xfe003000 limit=0x206f
  [ 2834.030395] PGD 8000360d0067 P4D 8000360d0067 PUD 36105067 PMD 
36193067 PTE 800076d8e867
  [ 2834.038672] Oops: 0001 [#4] SMP PTI
  [ 2834.042707] CPU: 0 PID: 13537 Comm: dotnet Tainted: G  D   
5.3.0-29-generic #31-Ubuntu
  [ 2834.050591] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
0.0.0 02/06/2015
  [ 2834.054785] RIP: 0033:0x147eaeda
  [ 2834.059017] Code: d0 00 00 00 4c 8b a7 d8 00 00 00 4c 8b af e0 00 00 00 4c 
8b b7 e8 00 00 00 4c 8b bf f0 00 00 00 48 8b bf b0 00 00 00 9d 74 02 <48> cf 48 
8d 64 24 30 5d c3 90 cc c3 66 90 55 4c 8b a7 d8 00 00 00
  [ 2834.072103] RSP: 002b:7fffc2c0 EFLAGS: 0202
  [ 2834.076507] RAX:  RBX: 1554b401af38 RCX: 
0001
  [ 2834.080832] RDX:  RSI:  RDI: 
7fffcfb0
  [ 2834.085010] RBP: 7fffd730 R08:  R09: 
7fffd1b0
  [ 2834.089184] R10: 15331dd5 R11: 153ad8d0 R12: 
0002
  [ 2834.093350] R13: 0001 R14: 0001 R15: 
1554b401d388
  [ 2834.097309] FS:  14fa5740 GS:  
  [ 2834.101131] Modules linked in: isofs nls_iso8859_1 dm_multipath 
scsi_dh_rdac scsi_dh_emc scsi_dh_alua ppdev input_leds serio_raw parport_pc 
parport sch_fq_codel ip_tables x_tables autofs4 btrfs zstd_compress raid10 
raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq 
libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul 
ghash_clmulni_intel aesni_intel aes_x86_64 crypto_simd cryptd glue_helper 
virtio_net psmouse net_failover failover virtio_blk floppy
  [ 2834.122539] CR2: 7fffc2c0
  [ 2834.126867] ---[ end trace dfae51f1d9432708 ]---
  [ 2834.131239] RIP: 0033:0x14d793262eda
  [ 2834.135715] Code: Bad RIP value.
  [ 2834.140243] RSP: 002b:7ffddb4e2980 EFLAGS: 0202
  [ 2834.144615] RAX:  RBX: 14d6f402acb8 RCX: 
0002
  [ 2834.148943] RDX: 01cd6950 RSI:  RDI: 
7ffddb4e3670
  [ 2834.153335] RBP: 7ffddb4e3df0 R08: 0001 R09: 
7ffddb4e3870
  [ 2834.157774] R10: 14d793da9dd5 R11: 14d793e258d0 R12: 
0002
  [ 2834.162132] R13: 0001 R14: 0001 R15: 
14d6f402d040
  [ 2834.166239] FS:  14fa5740() GS:97213ba0() 
knlGS:
  [ 2834.170529] CS:  0033 DS:  ES:  CR0: 80050033
  [ 2834.174751] CR2: 14d793262eb0 CR3: 3613 CR4: 
007406f0
  [ 2834.178892] PKRU: 5554

  I run the application from a shell with `ulimit -s unlimited`
  (unlimited stack to size).

  The application creates a number of threads, and those threads make a
  lot of calls to sigaltstack() and mprotect(); see the relevant source
  for dotnet here
  
https://github.com/dotnet/runtime/blob/15ec69e47b4dc56098e6058a11ccb6ae4d5d4fa1/src/coreclr/src/pal/src/thread/thread.cpp#L2467

  using strace -f on the 

Re: [PATCH v10 01/16] s390x: Move diagnose 308 subcodes and rcs into ipl.h

2020-03-19 Thread Claudio Imbrenda
On Wed, 18 Mar 2020 10:30:32 -0400
Janosch Frank  wrote:

> They are part of the IPL process, so let's put them into the ipl
> header.
> 
> Signed-off-by: Janosch Frank 


Reviewed-by: Claudio Imbrenda 


> ---
>  hw/s390x/ipl.h  | 11 +++
>  target/s390x/diag.c | 11 ---
>  2 files changed, 11 insertions(+), 11 deletions(-)
> 
> diff --git a/hw/s390x/ipl.h b/hw/s390x/ipl.h
> index 3e44abe1c651d8a0..a5665e6bfde2e8cf 100644
> --- a/hw/s390x/ipl.h
> +++ b/hw/s390x/ipl.h
> @@ -159,6 +159,17 @@ struct S390IPLState {
>  typedef struct S390IPLState S390IPLState;
>  QEMU_BUILD_BUG_MSG(offsetof(S390IPLState, iplb) & 3, "alignment of
> iplb wrong"); 
> +#define DIAG_308_RC_OK  0x0001
> +#define DIAG_308_RC_NO_CONF 0x0102
> +#define DIAG_308_RC_INVALID 0x0402
> +
> +#define DIAG308_RESET_MOD_CLR   0
> +#define DIAG308_RESET_LOAD_NORM 1
> +#define DIAG308_LOAD_CLEAR  3
> +#define DIAG308_LOAD_NORMAL_DUMP4
> +#define DIAG308_SET 5
> +#define DIAG308_STORE   6
> +
>  #define S390_IPL_TYPE_FCP 0x00
>  #define S390_IPL_TYPE_CCW 0x02
>  #define S390_IPL_TYPE_QEMU_SCSI 0xff
> diff --git a/target/s390x/diag.c b/target/s390x/diag.c
> index 54e5670b3fd6d960..8aba6341f94848e1 100644
> --- a/target/s390x/diag.c
> +++ b/target/s390x/diag.c
> @@ -49,17 +49,6 @@ int handle_diag_288(CPUS390XState *env, uint64_t
> r1, uint64_t r3) return diag288_class->handle_timer(diag288, func,
> timeout); }
>  
> -#define DIAG_308_RC_OK  0x0001
> -#define DIAG_308_RC_NO_CONF 0x0102
> -#define DIAG_308_RC_INVALID 0x0402
> -
> -#define DIAG308_RESET_MOD_CLR   0
> -#define DIAG308_RESET_LOAD_NORM 1
> -#define DIAG308_LOAD_CLEAR  3
> -#define DIAG308_LOAD_NORMAL_DUMP4
> -#define DIAG308_SET 5
> -#define DIAG308_STORE   6
> -
>  static int diag308_parm_check(CPUS390XState *env, uint64_t r1,
> uint64_t addr, uintptr_t ra, bool write)
>  {




Re: [PATCH v3] MAINTAINERS: Add an entry for the HVF accelerator

2020-03-19 Thread Paolo Bonzini
On 19/03/20 19:07, Roman Bolshakov wrote:
>> From the other thread discussions, I'd keep you at least listed as
>> designated reviewer:
>>
>> R: Roman Bolshakov 
>>
> Sounds good to me, thanks.

I'll add you back.  Anyway as long as it's me sending pull requests, M
vs. R doesn't change much.

Thanks,

Paolo




Re: [PATCH v5 7/7] virtio-net: add migration support for RSS and hash report

2020-03-19 Thread Dr. David Alan Gilbert
* Yuri Benditovich (yuri.benditov...@daynix.com) wrote:
> On Wed, Mar 18, 2020 at 12:48 PM Dr. David Alan Gilbert 
> wrote:
> 
> > * Yuri Benditovich (yuri.benditov...@daynix.com) wrote:
> > > Save and restore RSS/hash report configuration.
> > >
> > > Signed-off-by: Yuri Benditovich 
> > > ---
> > >  hw/net/virtio-net.c | 26 ++
> > >  1 file changed, 26 insertions(+)
> > >
> > > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > > index a0614ad4e6..0b058aae9f 100644
> > > --- a/hw/net/virtio-net.c
> > > +++ b/hw/net/virtio-net.c
> > > @@ -2842,6 +2842,13 @@ static int virtio_net_post_load_device(void
> > *opaque, int version_id)
> > >  }
> > >  }
> > >
> > > +if (n->rss_data.enabled) {
> > > +trace_virtio_net_rss_enable(n->rss_data.hash_types,
> > > +n->rss_data.indirections_len,
> > > +sizeof(n->rss_data.key));
> > > +} else {
> > > +trace_virtio_net_rss_disable();
> > > +}
> > >  return 0;
> > >  }
> > >
> > > @@ -3019,6 +3026,24 @@ static const VMStateDescription
> > vmstate_virtio_net_has_vnet = {
> > >  },
> > >  };
> > >
> > > +static const VMStateDescription vmstate_rss = {
> > > +.name  = "vmstate_rss",
> >
> > You need to do something to avoid breaking migration compatibility
> > from/to old QEMU's and from/to QEMU's on hosts without the new virtio
> > features.
> > Probably adding a .needed =   here pointing to a function that
> > checks 'enabled' might do it.
> >
> > Does VMSTATE_STRUCT_TEST(..,..,checker_procedure,...) result the same
> thing?
> 
> Another question about migration support:
> What is expected/required behavior?
> Possible cases:
> old qemu -> new qemu
> new qemu (new feature off) -> old qemu

Just works.

Also be careful about the definition of 'new feature off'; normally we
tie these things to machine types, so that with the old machine type the
guest doesn't even see the feature; it can't turn it on.

> new qemu (new feature on) -> old qemu

Fails; hopefully nicely.

Dave

> 
> 
> > Dave
> >
> >
> > > +.fields = (VMStateField[]) {
> > > +VMSTATE_BOOL(enabled, VirtioNetRssData),
> > > +VMSTATE_BOOL(redirect, VirtioNetRssData),
> > > +VMSTATE_BOOL(populate_hash, VirtioNetRssData),
> > > +VMSTATE_UINT32(hash_types, VirtioNetRssData),
> > > +VMSTATE_UINT16(indirections_len, VirtioNetRssData),
> > > +VMSTATE_UINT16(default_queue, VirtioNetRssData),
> > > +VMSTATE_UINT8_ARRAY(key, VirtioNetRssData,
> > > +VIRTIO_NET_RSS_MAX_KEY_SIZE),
> > > +VMSTATE_VARRAY_UINT16_ALLOC(indirections_table,
> > VirtioNetRssData,
> > > +indirections_len, 0,
> > > +vmstate_info_uint16, uint16_t),
> > > +VMSTATE_END_OF_LIST()
> > > +},
> > > +};
> > > +
> > >  static const VMStateDescription vmstate_virtio_net_device = {
> > >  .name = "virtio-net-device",
> > >  .version_id = VIRTIO_NET_VM_VERSION,
> > > @@ -3067,6 +3092,7 @@ static const VMStateDescription
> > vmstate_virtio_net_device = {
> > >   vmstate_virtio_net_tx_waiting),
> > >  VMSTATE_UINT64_TEST(curr_guest_offloads, VirtIONet,
> > >  has_ctrl_guest_offloads),
> > > +VMSTATE_STRUCT(rss_data, VirtIONet, 1, vmstate_rss,
> > VirtioNetRssData),
> > >  VMSTATE_END_OF_LIST()
> > > },
> > >  };
> > > --
> > > 2.17.1
> > >
> > --
> > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> >
> >
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [PATCH v3] MAINTAINERS: Add an entry for the HVF accelerator

2020-03-19 Thread Roman Bolshakov
On Thu, Mar 19, 2020 at 06:06:15PM +0100, Philippe Mathieu-Daudé wrote:
> On 3/19/20 2:55 PM, Roman Bolshakov wrote:
> > Cameron signed up for taking HVF ownership.
> > 
> > Cc: Cameron Esfahani 
> > Cc: Nikita Leshenko 
> > Cc: Sergio Andres Gomez Del Real 
> > Cc: Patrick Colp 
> > Cc: Liran Alon 
> > Cc: Heiher 
> > 
> > Signed-off-by: Roman Bolshakov 
> > ---
> > Changes since v2:
> >Removed myself from the list of maintainers, added Cameron from Apple.
> >Status is changed to Supported again.
> > Changes since v1:
> >Status is changed to Maintained instead of Supported.
> > 
> >   MAINTAINERS | 7 +++
> >   1 file changed, 7 insertions(+)
> > 
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 7364af0d8b..ab4dc2816c 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -406,6 +406,13 @@ S: Supported
> >   F: target/i386/kvm.c
> >   F: scripts/kvm/vmxcap
> > +X86 HVF CPUs
> > +M: Cameron Esfahani 
> 
> From the other thread discussions, I'd keep you at least listed as
> designated reviewer:
> 
> R: Roman Bolshakov 
> 

Sounds good to me, thanks.

Roman



Re: [PULL v2 00/11] Bitmaps patches

2020-03-19 Thread John Snow



On 3/19/20 1:57 PM, Peter Maydell wrote:
> On Wed, 18 Mar 2020 at 20:24, John Snow  wrote:
>>
>> The following changes since commit d649689a8ecb2e276cc20d3af6d416e3c299cb17:
>>
>>   Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into 
>> staging (2020-03-17 18:33:05 +)
>>
>> are available in the Git repository at:
>>
>>   https://github.com/jnsnow/qemu.git tags/bitmaps-pull-request
>>
>> for you to fetch changes up to 2d00cbd8e222a4adc08f415c399e84590ee8ff9a:
>>
>>   block/qcow2-bitmap: use bdrv_dirty_bitmap_next_dirty (2020-03-18 14:03:46 
>> -0400)
>>
>> 
>> Pull request
>>
>> 
> 
> 
> Applied, thanks.
> 

Wonderful, thanks!

> Please update the changelog at https://wiki.qemu.org/ChangeLog/5.0
> for any user-visible changes.
> 
> -- PMM

Will do.




Re: [PULL 00/20] Ide patches

2020-03-19 Thread John Snow



On 3/19/20 8:33 AM, Peter Maydell wrote:
> On Tue, 17 Mar 2020 at 23:23, John Snow  wrote:
>>
>> The following changes since commit 373c7068dd610e97f0b551b5a6d0a27cd6da4506:
>>
>>   qemu.nsi: Install Sphinx documentation (2020-03-09 16:45:00 +)
>>
>> are available in the Git repository at:
>>
>>   https://github.com/jnsnow/qemu.git tags/ide-pull-request
>>
>> for you to fetch changes up to 7d0776ca7f853d466b6174d96daa5c8afc43d1a4:
>>
>>   hw/ide: Remove unneeded inclusion of hw/ide.h (2020-03-17 12:22:36 -0400)
>>
>> 
>> Pull request
>>
>> 
>>
> 
> 
> Applied, thanks.
> 
> Please update the changelog at https://wiki.qemu.org/ChangeLog/5.0
> for any user-visible changes.
> 
> -- PMM
> 

Mark, I'm sorry to foist this on you, but would you mind updating the
changelog?

--js




Re: [PATCH v5 7/7] virtio-net: add migration support for RSS and hash report

2020-03-19 Thread Juan Quintela


Hi Yuri

Yuri Benditovich  wrote:
> On Wed, Mar 18, 2020 at 12:48 PM Dr. David Alan Gilbert 
> wrote:
>
>  * Yuri Benditovich (yuri.benditov...@daynix.com) wrote:
>  > Save and restore RSS/hash report configuration.
>  > 
>  > Signed-off-by: Yuri Benditovich 
>  > ---
>  >  hw/net/virtio-net.c | 26 ++
>  >  1 file changed, 26 insertions(+)
>  > 
>  > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>  > index a0614ad4e6..0b058aae9f 100644
>  > --- a/hw/net/virtio-net.c
>  > +++ b/hw/net/virtio-net.c
>  > @@ -2842,6 +2842,13 @@ static int virtio_net_post_load_device(void
>  *opaque, int version_id)
>  >  }
>  >  }
>  >  
>  > +if (n->rss_data.enabled) {
>  > +trace_virtio_net_rss_enable(n->rss_data.hash_types,
>  > +n->rss_data.indirections_len,
>  > +sizeof(n->rss_data.key));
>  > +} else {
>  > +trace_virtio_net_rss_disable();
>  > +}

This is the bigger "abuser" that I have ever seen for a post_load
function.  Just to add a trace depending on a value O:-)

>  >  return 0;
>  >  }
>  >  
>  > @@ -3019,6 +3026,24 @@ static const VMStateDescription
>  vmstate_virtio_net_has_vnet = {
>  >  },
>  >  };
>  >  
>  > +static const VMStateDescription vmstate_rss = {
>  > +.name  = "vmstate_rss",
>
>  You need to do something to avoid breaking migration compatibility
>  from/to old QEMU's and from/to QEMU's on hosts without the new virtio
>  features.
>  Probably adding a .needed =   here pointing to a function that
>  checks 'enabled' might do it.
>
> Does VMSTATE_STRUCT_TEST(..,..,checker_procedure,...) result the same thing?

It is just a similar thing, not the same.
If you add a new field, you need to increase the version number.  And
that make backward compatibility really annoying.
With subsections, you can make it work correctly with old versions
always that you don't use rss.

> Another question about migration support:
> What is expected/required behavior?
> Possible cases:
> old qemu -> new qemu

That should always work.
If you use an optional subsection this works for free.  Old qemu has no
rss subsection.

> new qemu (new feature off) -> old qemu

This is desirable.  And with the optional subsection it just works, no
chang eneeded.

> new qemu (new feature on) -> old qemu

This obviosly will not work, and we are fine.  There will appear a new
subsection that old qemu don't undertand.  Destination will give one
error and give up.

For one example, just look at something like:

hw/virtio/virtio.c:: vmstate_virtio

There are lots of subscitnsios there.

Later, Juan.




Re: [PULL v2 00/11] Bitmaps patches

2020-03-19 Thread Peter Maydell
On Wed, 18 Mar 2020 at 20:24, John Snow  wrote:
>
> The following changes since commit d649689a8ecb2e276cc20d3af6d416e3c299cb17:
>
>   Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into 
> staging (2020-03-17 18:33:05 +)
>
> are available in the Git repository at:
>
>   https://github.com/jnsnow/qemu.git tags/bitmaps-pull-request
>
> for you to fetch changes up to 2d00cbd8e222a4adc08f415c399e84590ee8ff9a:
>
>   block/qcow2-bitmap: use bdrv_dirty_bitmap_next_dirty (2020-03-18 14:03:46 
> -0400)
>
> 
> Pull request
>
> 


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/5.0
for any user-visible changes.

-- PMM



Re: [PATCH v5 6/7] vmstate.h: provide VMSTATE_VARRAY_UINT16_ALLOC macro

2020-03-19 Thread Michael S. Tsirkin
On Thu, Mar 19, 2020 at 07:12:20PM +0200, Yuri Benditovich wrote:
> 
> 
> On Wed, Mar 18, 2020 at 11:42 AM Michael S. Tsirkin  wrote:
> 
> On Wed, Mar 18, 2020 at 11:15:24AM +0200, Yuri Benditovich wrote:
> > Similar to VMSTATE_VARRAY_UINT32_ALLOC, but the size is
> > 16-bit field.
> >
> > Signed-off-by: Yuri Benditovich 
> 
> Hmm this is exactly my patch isn't it? If yes pls fix up attribution
> (if this is not reposted, then when applying):
> 
> 
> Of course, it is similar to the one you wrote inline.
> Unlike one you wrote inline this patch does not fail on checkpatch.

If you feel you modified it significantly enough, you can write
"based on a patch by mst".

> But the idea is the same, hard to invent something.
> Please just let me know what exactly should I do: resubmit or not and whether
> it is possible to fix it without resubmission.
>  
> 
> 
> From: Michael S. Tsirkin 
> Signed-off-by: Michael S. Tsirkin 
> 
> 
> > ---
> >  include/migration/vmstate.h | 10 ++
> >  1 file changed, 10 insertions(+)
> >
> > diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> > index 30667631bc..baaefb6b9b 100644
> > --- a/include/migration/vmstate.h
> > +++ b/include/migration/vmstate.h
> > @@ -432,6 +432,16 @@ extern const VMStateInfo vmstate_info_qlist;
> >      .offset     = vmstate_offset_pointer(_state, _field, _type),     \
> >  }
> > 
> > +#define VMSTATE_VARRAY_UINT16_ALLOC(_field, _state, _field_num,
> _version, _info, _type) {\
> > +    .name       = (stringify(_field)),                               \
> > +    .version_id = (_version),                                        \
> > +    .num_offset = vmstate_offset_value(_state, _field_num, uint16_t),\
> > +    .info       = &(_info),                                          \
> > +    .size       = sizeof(_type),                                     \
> > +    .flags      = VMS_VARRAY_UINT16 | VMS_POINTER | VMS_ALLOC,       \
> > +    .offset     = vmstate_offset_pointer(_state, _field, _type),     \
> > +}
> > +
> >  #define VMSTATE_VARRAY_UINT16_UNSAFE(_field, _state, _field_num,
> _version, _info, _type) {\
> >      .name       = (stringify(_field)),                               \
> >      .version_id = (_version),                                        \
> > --
> > 2.17.1
> 
> 




Re: [PATCH v3 02/10] fw_cfg: Migrate ACPI table mr sizes separately

2020-03-19 Thread Michael S. Tsirkin
On Thu, Mar 12, 2020 at 09:27:32AM +, Shameerali Kolothum Thodi wrote:
> 
> 
> > -Original Message-
> > From: Michael S. Tsirkin [mailto:m...@redhat.com]
> > Sent: 11 March 2020 21:10
> > To: Shameerali Kolothum Thodi 
> > Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org;
> > eric.au...@redhat.com; imamm...@redhat.com; peter.mayd...@linaro.org;
> > shannon.zha...@gmail.com; xiaoguangrong.e...@gmail.com;
> > da...@redhat.com; xuwei (O) ; ler...@redhat.com;
> > Linuxarm 
> > Subject: Re: [PATCH v3 02/10] fw_cfg: Migrate ACPI table mr sizes separately
> > 
> > On Wed, Mar 11, 2020 at 05:20:06PM +, Shameer Kolothum wrote:
> > > Any sub-page size update to ACPI table MRs will be lost during
> > > migration, as we use aligned size in ram_load_precopy() ->
> > > qemu_ram_resize() path. This will result in inconsistency in sizes
> > > between source and destination. In order to avoid this, save and
> > > restore them separately during migration.
> 
> 
> > Is there a reason this is part of nvdimm patchset?
> 
> Not really. But this problem is more visible if we have nvdimm hotplug
> support added to arm/virt. On x86, both acpi table and linker MRs are already
> aligned and I don't know a use case where you can change RSDP MR size(See 
> below).
> 
> >
> > Hmm but for old machine types we still have a problem right?
> > How about aligning size on source for them?
> > Then there won't be an inconsistency across migration.
> > Wastes some boot time/memory but maybe that's better
> > than a chance of not booting ...
> 
> Right. That was considered. On x86, except RSDP MR, both the LINKER and ACPI
> TABLE MRs are already aligned/padded. And we cannot make RSDP mr aligned
> as it will break the seabios based boot.

Hmm. So right now if we migrate just before RSDP is read, there's
a failure?

> So a generic solution based on alignment 
> is not possible unless we guarantee that RSDP is not going to be modified.
> 
> What we could do for Arm/virt is just follow the x86 way and add padding for
> table and linker MRs. But this was discussed before and IIRC, was not well
> received.
> 
> Thanks,
> Shameer
> 
> > > Suggested-by: David Hildenbrand 
> > > Signed-off-by: Shameer Kolothum 
> > > ---
> > > Please find the discussion here,
> > > https://patchwork.kernel.org/patch/11339591/
> > > ---
> > >  hw/core/machine.c |  1 +
> > >  hw/nvram/fw_cfg.c | 86
> > ++-
> > >  include/hw/nvram/fw_cfg.h |  6 +++
> > >  3 files changed, 92 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/hw/core/machine.c b/hw/core/machine.c
> > > index 9e8c06036f..6d960bd47f 100644
> > > --- a/hw/core/machine.c
> > > +++ b/hw/core/machine.c
> > > @@ -39,6 +39,7 @@ GlobalProperty hw_compat_4_2[] = {
> > >  { "usb-redir", "suppress-remote-wake", "off" },
> > >  { "qxl", "revision", "4" },
> > >  { "qxl-vga", "revision", "4" },
> > > +{ "fw_cfg", "acpi-mr-restore", "false" },
> > >  };
> > >  const size_t hw_compat_4_2_len = G_N_ELEMENTS(hw_compat_4_2);
> > >
> > > diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
> > > index 179b302f01..36d1e32f83 100644
> > > --- a/hw/nvram/fw_cfg.c
> > > +++ b/hw/nvram/fw_cfg.c
> > > @@ -39,6 +39,7 @@
> > >  #include "qemu/config-file.h"
> > >  #include "qemu/cutils.h"
> > >  #include "qapi/error.h"
> > > +#include "hw/acpi/aml-build.h"
> > >
> > >  #define FW_CFG_FILE_SLOTS_DFLT 0x20
> > >
> > > @@ -610,6 +611,50 @@ bool fw_cfg_dma_enabled(void *opaque)
> > >  return s->dma_enabled;
> > >  }
> > >
> > > +static bool fw_cfg_acpi_mr_restore(void *opaque)
> > > +{
> > > +FWCfgState *s = opaque;
> > > +return s->acpi_mr_restore;
> > > +}
> > > +
> > > +static void fw_cfg_update_mr(FWCfgState *s, uint16_t key, size_t size)
> > > +{
> > > +MemoryRegion *mr;
> > > +ram_addr_t offset;
> > > +int arch = !!(key & FW_CFG_ARCH_LOCAL);
> > > +void *ptr;
> > > +
> > > +key &= FW_CFG_ENTRY_MASK;
> > > +assert(key < fw_cfg_max_entry(s));
> > > +
> > > +ptr = s->entries[arch][key].data;
> > > +mr = memory_region_from_host(ptr, );
> > > +
> > > +memory_region_ram_resize(mr, size, _abort);
> > > +}
> > > +
> > > +static int fw_cfg_acpi_mr_restore_post_load(void *opaque, int version_id)
> > > +{
> > > +FWCfgState *s = opaque;
> > > +int i, index;
> > > +
> > > +assert(s->files);
> > > +
> > > +index = be32_to_cpu(s->files->count);
> > > +
> > > +for (i = 0; i < index; i++) {
> > > +if (!strcmp(s->files->f[i].name, ACPI_BUILD_TABLE_FILE)) {
> > > +fw_cfg_update_mr(s, FW_CFG_FILE_FIRST + i,
> > s->table_mr_size);
> > > +} else if (!strcmp(s->files->f[i].name, ACPI_BUILD_LOADER_FILE)) 
> > > {
> > > +fw_cfg_update_mr(s, FW_CFG_FILE_FIRST + i,
> > s->linker_mr_size);
> > > +} else if (!strcmp(s->files->f[i].name, ACPI_BUILD_RSDP_FILE)) {
> > > +fw_cfg_update_mr(s, FW_CFG_FILE_FIRST + i,
> > s->rsdp_mr_size);
> > > +}
> > 

[PATCH] misc: fix __COUNTER__ macro to be referenced properly

2020-03-19 Thread dnbrdsky
From: danbrodsky 

- __COUNTER__ doesn't work with ## concat
- replaced ## with glue() macro so __COUNTER__ is evaluated

Signed-off-by: danbrodsky 
---
 include/qemu/lockable.h | 2 +-
 include/qemu/rcu.h  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/qemu/lockable.h b/include/qemu/lockable.h
index 1aeb2cb1a6..a9258f2c2c 100644
--- a/include/qemu/lockable.h
+++ b/include/qemu/lockable.h
@@ -170,7 +170,7 @@ G_DEFINE_AUTOPTR_CLEANUP_FUNC(QemuLockable, 
qemu_lockable_auto_unlock)
  *   }
  */
 #define QEMU_LOCK_GUARD(x) \
-g_autoptr(QemuLockable) qemu_lockable_auto##__COUNTER__ = \
+g_autoptr(QemuLockable) glue(qemu_lockable_auto, __COUNTER__) = \
 qemu_lockable_auto_lock(QEMU_MAKE_LOCKABLE((x)))
 
 #endif
diff --git a/include/qemu/rcu.h b/include/qemu/rcu.h
index 9c82683e37..570aa603eb 100644
--- a/include/qemu/rcu.h
+++ b/include/qemu/rcu.h
@@ -170,7 +170,7 @@ static inline void rcu_read_auto_unlock(RCUReadAuto *r)
 G_DEFINE_AUTOPTR_CLEANUP_FUNC(RCUReadAuto, rcu_read_auto_unlock)
 
 #define WITH_RCU_READ_LOCK_GUARD() \
-WITH_RCU_READ_LOCK_GUARD_(_rcu_read_auto##__COUNTER__)
+WITH_RCU_READ_LOCK_GUARD_(glue(_rcu_read_auto, __COUNTER__))
 
 #define WITH_RCU_READ_LOCK_GUARD_(var) \
 for (g_autoptr(RCUReadAuto) var = rcu_read_auto_lock(); \
-- 
2.25.1




[PATCH] lockable: replaced locks with lock guard macros where appropriate

2020-03-19 Thread dnbrdsky
From: danbrodsky 

- ran regexp "qemu_mutex_lock\(.*\).*\n.*if" to find targets
- replaced result with QEMU_LOCK_GUARD if all unlocks at function end
- replaced result with WITH_QEMU_LOCK_GUARD if unlock not at end

Signed-off-by: danbrodsky 
---
 block/iscsi.c | 23 +++
 block/nfs.c   | 53 ---
 cpus-common.c | 13 ---
 hw/display/qxl.c  | 44 +--
 hw/vfio/platform.c|  4 +---
 migration/migration.c |  3 +--
 migration/multifd.c   |  8 +++
 migration/ram.c   |  3 +--
 monitor/misc.c|  4 +---
 ui/spice-display.c| 14 ++--
 util/log.c|  4 ++--
 util/qemu-timer.c | 17 +++---
 util/rcu.c|  8 +++
 util/thread-pool.c|  3 +--
 util/vfio-helpers.c   |  4 ++--
 15 files changed, 90 insertions(+), 115 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 682abd8e09..df73bde114 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -1086,23 +1086,21 @@ static BlockAIOCB *iscsi_aio_ioctl(BlockDriverState *bs,
 acb->task->expxferlen = acb->ioh->dxfer_len;
 
 data.size = 0;
-qemu_mutex_lock(>mutex);
+QEMU_LOCK_GUARD(>mutex);
 if (acb->task->xfer_dir == SCSI_XFER_WRITE) {
 if (acb->ioh->iovec_count == 0) {
 data.data = acb->ioh->dxferp;
 data.size = acb->ioh->dxfer_len;
 } else {
 scsi_task_set_iov_out(acb->task,
- (struct scsi_iovec *) acb->ioh->dxferp,
- acb->ioh->iovec_count);
+  (struct scsi_iovec *)acb->ioh->dxferp,
+  acb->ioh->iovec_count);
 }
 }
 
 if (iscsi_scsi_command_async(iscsi, iscsilun->lun, acb->task,
  iscsi_aio_ioctl_cb,
- (data.size > 0) ?  : NULL,
- acb) != 0) {
-qemu_mutex_unlock(>mutex);
+ (data.size > 0) ?  : NULL, acb) != 0) {
 scsi_free_scsi_task(acb->task);
 qemu_aio_unref(acb);
 return NULL;
@@ -,18 +1109,16 @@ static BlockAIOCB *iscsi_aio_ioctl(BlockDriverState *bs,
 /* tell libiscsi to read straight into the buffer we got from ioctl */
 if (acb->task->xfer_dir == SCSI_XFER_READ) {
 if (acb->ioh->iovec_count == 0) {
-scsi_task_add_data_in_buffer(acb->task,
- acb->ioh->dxfer_len,
+scsi_task_add_data_in_buffer(acb->task, acb->ioh->dxfer_len,
  acb->ioh->dxferp);
 } else {
 scsi_task_set_iov_in(acb->task,
- (struct scsi_iovec *) acb->ioh->dxferp,
+ (struct scsi_iovec *)acb->ioh->dxferp,
  acb->ioh->iovec_count);
 }
 }
 
 iscsi_set_events(iscsilun);
-qemu_mutex_unlock(>mutex);
 
 return >common;
 }
@@ -1395,20 +1391,17 @@ static void iscsi_nop_timed_event(void *opaque)
 {
 IscsiLun *iscsilun = opaque;
 
-qemu_mutex_lock(>mutex);
+QEMU_LOCK_GUARD(>mutex);
 if (iscsi_get_nops_in_flight(iscsilun->iscsi) >= MAX_NOP_FAILURES) {
 error_report("iSCSI: NOP timeout. Reconnecting...");
 iscsilun->request_timed_out = true;
 } else if (iscsi_nop_out_async(iscsilun->iscsi, NULL, NULL, 0, NULL) != 0) 
{
 error_report("iSCSI: failed to sent NOP-Out. Disabling NOP messages.");
-goto out;
+return;
 }
 
 timer_mod(iscsilun->nop_timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + 
NOP_INTERVAL);
 iscsi_set_events(iscsilun);
-
-out:
-qemu_mutex_unlock(>mutex);
 }
 
 static void iscsi_readcapacity_sync(IscsiLun *iscsilun, Error **errp)
diff --git a/block/nfs.c b/block/nfs.c
index 9a6311e270..37e8b82731 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -273,15 +273,14 @@ static int coroutine_fn nfs_co_preadv(BlockDriverState 
*bs, uint64_t offset,
 nfs_co_init_task(bs, );
 task.iov = iov;
 
-qemu_mutex_lock(>mutex);
-if (nfs_pread_async(client->context, client->fh,
-offset, bytes, nfs_co_generic_cb, ) != 0) {
-qemu_mutex_unlock(>mutex);
-return -ENOMEM;
-}
+WITH_QEMU_LOCK_GUARD(>mutex) {
+if (nfs_pread_async(client->context, client->fh,
+offset, bytes, nfs_co_generic_cb, ) != 0) {
+return -ENOMEM;
+}
 
-nfs_set_events(client);
-qemu_mutex_unlock(>mutex);
+nfs_set_events(client);
+}
 while (!task.complete) {
 qemu_coroutine_yield();
 }
@@ -290,7 +289,7 @@ static int coroutine_fn nfs_co_preadv(BlockDriverState *bs, 
uint64_t offset,
 return task.ret;
 }
 
-/* zero pad short reads */
+/* zero pad short reads */
 if (task.ret < iov->size) {
 

Re: [PATCH v4 2/3] net: tulip: add .can_recieve routine

2020-03-19 Thread P J P
+-- On Thu, 19 Mar 2020, Philippe Mathieu-Daudé wrote --+
| Typo "can_recieve" -> "can_receive" in subject.

Oops! Fixed it, sent revised patch v5.

Thank you.
--
Prasad J Pandit / Red Hat Product Security Team
8685 545E B54C 486B C6EB 271E E285 8B5A F050 DE8D

Re: [PATCH v5 07/18] s390x: protvirt: Inhibit balloon when switching to protected mode

2020-03-19 Thread Michael S. Tsirkin
On Thu, Mar 19, 2020 at 02:54:11PM +0100, David Hildenbrand wrote:
> Why does the balloon driver not support VIRTIO_F_IOMMU_PLATFORM? It is
> absolutely not clear to me. The introducing commit mentioned that it
> "bypasses DMA". I fail to see that.

Well sure one can put the balloon behind an IOMMU.  If will shuffle PFN
lists through a shared page.  Problem is, you can't run an untrusted
driver with it since if you do it can corrupt guest memory.
And VIRTIO_F_IOMMU_PLATFORM so far meant that you can run
a userspace driver.

Maybe we need a separate feature bit for this kind of thing where you
assume the driver is trusted? Such a bit - unlike
VIRTIO_F_IOMMU_PLATFORM - would allow legacy guests ...



-- 
MST




[PATCH v5 3/3] net: tulip: flush queued packets post receive

2020-03-19 Thread P J P
From: Prasad J Pandit 

Call qemu_flush_queued_packets to flush queued packets once they
are read in tulip_receive().

Suggested-by: Jason Wang 
Signed-off-by: Prasad J Pandit 
---
 hw/net/tulip.c | 2 ++
 1 file changed, 2 insertions(+)

Update v4: call qemu_flush_queued_packets()
  -> https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg05868.html

diff --git a/hw/net/tulip.c b/hw/net/tulip.c
index 757f12c710..8d8c9519e7 100644
--- a/hw/net/tulip.c
+++ b/hw/net/tulip.c
@@ -287,6 +287,8 @@ static ssize_t tulip_receive(TULIPState *s, const uint8_t 
*buf, size_t size)
 tulip_desc_write(s, s->current_rx_desc, );
 tulip_next_rx_descriptor(s, );
 } while (s->rx_frame_len);
+
+qemu_flush_queued_packets(qemu_get_queue(s->nic));
 return size;
 }
 
-- 
2.25.1




[PATCH v5 1/3] net: tulip: check frame size and r/w data length

2020-03-19 Thread P J P
From: Prasad J Pandit 

Tulip network driver while copying tx/rx buffers does not check
frame size against r/w data length. This may lead to OOB buffer
access. Add check to avoid it.

Limit iterations over descriptors to avoid potential infinite
loop issue in tulip_xmit_list_update.

Reported-by: Li Qiang 
Reported-by: Ziming Zhang 
Reported-by: Jason Wang 
Signed-off-by: Prasad J Pandit 
---
 hw/net/tulip.c | 36 +++-
 1 file changed, 27 insertions(+), 9 deletions(-)

Update v3: return a value from tulip_copy_tx_buffers() and avoid infinite loop
  -> https://lists.gnu.org/archive/html/qemu-devel/2020-02/msg06275.html

diff --git a/hw/net/tulip.c b/hw/net/tulip.c
index cfac2719d3..fbe40095da 100644
--- a/hw/net/tulip.c
+++ b/hw/net/tulip.c
@@ -170,6 +170,10 @@ static void tulip_copy_rx_bytes(TULIPState *s, struct 
tulip_descriptor *desc)
 } else {
 len = s->rx_frame_len;
 }
+
+if (s->rx_frame_len + len >= sizeof(s->rx_frame)) {
+return;
+}
 pci_dma_write(>dev, desc->buf_addr1, s->rx_frame +
 (s->rx_frame_size - s->rx_frame_len), len);
 s->rx_frame_len -= len;
@@ -181,6 +185,10 @@ static void tulip_copy_rx_bytes(TULIPState *s, struct 
tulip_descriptor *desc)
 } else {
 len = s->rx_frame_len;
 }
+
+if (s->rx_frame_len + len >= sizeof(s->rx_frame)) {
+return;
+}
 pci_dma_write(>dev, desc->buf_addr2, s->rx_frame +
 (s->rx_frame_size - s->rx_frame_len), len);
 s->rx_frame_len -= len;
@@ -227,7 +235,8 @@ static ssize_t tulip_receive(TULIPState *s, const uint8_t 
*buf, size_t size)
 
 trace_tulip_receive(buf, size);
 
-if (size < 14 || size > 2048 || s->rx_frame_len || tulip_rx_stopped(s)) {
+if (size < 14 || size > sizeof(s->rx_frame) - 4
+|| s->rx_frame_len || tulip_rx_stopped(s)) {
 return 0;
 }
 
@@ -275,7 +284,6 @@ static ssize_t tulip_receive_nc(NetClientState *nc,
 return tulip_receive(qemu_get_nic_opaque(nc), buf, size);
 }
 
-
 static NetClientInfo net_tulip_info = {
 .type = NET_CLIENT_DRIVER_NIC,
 .size = sizeof(NICState),
@@ -558,7 +566,7 @@ static void tulip_tx(TULIPState *s, struct tulip_descriptor 
*desc)
 if ((s->csr[6] >> CSR6_OM_SHIFT) & CSR6_OM_MASK) {
 /* Internal or external Loopback */
 tulip_receive(s, s->tx_frame, s->tx_frame_len);
-} else {
+} else if (s->tx_frame_len <= sizeof(s->tx_frame)) {
 qemu_send_packet(qemu_get_queue(s->nic),
 s->tx_frame, s->tx_frame_len);
 }
@@ -570,23 +578,31 @@ static void tulip_tx(TULIPState *s, struct 
tulip_descriptor *desc)
 }
 }
 
-static void tulip_copy_tx_buffers(TULIPState *s, struct tulip_descriptor *desc)
+static int tulip_copy_tx_buffers(TULIPState *s, struct tulip_descriptor *desc)
 {
 int len1 = (desc->control >> TDES1_BUF1_SIZE_SHIFT) & TDES1_BUF1_SIZE_MASK;
 int len2 = (desc->control >> TDES1_BUF2_SIZE_SHIFT) & TDES1_BUF2_SIZE_MASK;
 
+if (s->tx_frame_len + len1 >= sizeof(s->tx_frame)) {
+return -1;
+}
 if (len1) {
 pci_dma_read(>dev, desc->buf_addr1,
 s->tx_frame + s->tx_frame_len, len1);
 s->tx_frame_len += len1;
 }
 
+if (s->tx_frame_len + len2 >= sizeof(s->tx_frame)) {
+return -1;
+}
 if (len2) {
 pci_dma_read(>dev, desc->buf_addr2,
 s->tx_frame + s->tx_frame_len, len2);
 s->tx_frame_len += len2;
 }
 desc->status = (len1 + len2) ? 0 : 0x7fff;
+
+return 0;
 }
 
 static void tulip_setup_filter_addr(TULIPState *s, uint8_t *buf, int n)
@@ -651,13 +667,15 @@ static uint32_t tulip_ts(TULIPState *s)
 
 static void tulip_xmit_list_update(TULIPState *s)
 {
+#define TULIP_DESC_MAX 128
+uint8_t i = 0;
 struct tulip_descriptor desc;
 
 if (tulip_ts(s) != CSR5_TS_SUSPENDED) {
 return;
 }
 
-for (;;) {
+for (i = 0; i < TULIP_DESC_MAX; i++) {
 tulip_desc_read(s, s->current_tx_desc, );
 tulip_dump_tx_descriptor(s, );
 
@@ -675,10 +693,10 @@ static void tulip_xmit_list_update(TULIPState *s)
 s->tx_frame_len = 0;
 }
 
-tulip_copy_tx_buffers(s, );
-
-if (desc.control & TDES1_LS) {
-tulip_tx(s, );
+if (!tulip_copy_tx_buffers(s, )) {
+if (desc.control & TDES1_LS) {
+tulip_tx(s, );
+}
 }
 }
 tulip_desc_write(s, s->current_tx_desc, );
-- 
2.25.1




[PATCH v5 2/3] net: tulip: add .can_receive routine

2020-03-19 Thread P J P
From: Prasad J Pandit 

Define .can_receive routine to do sanity checks before receiving
packet data.

Signed-off-by: Prasad J Pandit 
---
 hw/net/tulip.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

Update v3: define .can_receive routine
  -> https://lists.gnu.org/archive/html/qemu-devel/2020-02/msg06275.html

Update v5: fix a typo in commit log message
  -> https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg06209.html

diff --git a/hw/net/tulip.c b/hw/net/tulip.c
index fbe40095da..757f12c710 100644
--- a/hw/net/tulip.c
+++ b/hw/net/tulip.c
@@ -229,6 +229,18 @@ static bool tulip_filter_address(TULIPState *s, const 
uint8_t *addr)
 return ret;
 }
 
+static int
+tulip_can_receive(NetClientState *nc)
+{
+TULIPState *s = qemu_get_nic_opaque(nc);
+
+if (s->rx_frame_len || tulip_rx_stopped(s)) {
+return false;
+}
+
+return true;
+}
+
 static ssize_t tulip_receive(TULIPState *s, const uint8_t *buf, size_t size)
 {
 struct tulip_descriptor desc;
@@ -236,7 +248,7 @@ static ssize_t tulip_receive(TULIPState *s, const uint8_t 
*buf, size_t size)
 trace_tulip_receive(buf, size);
 
 if (size < 14 || size > sizeof(s->rx_frame) - 4
-|| s->rx_frame_len || tulip_rx_stopped(s)) {
+|| !tulip_can_receive(s->nic->ncs)) {
 return 0;
 }
 
@@ -288,6 +300,7 @@ static NetClientInfo net_tulip_info = {
 .type = NET_CLIENT_DRIVER_NIC,
 .size = sizeof(NICState),
 .receive = tulip_receive_nc,
+.can_receive = tulip_can_receive,
 };
 
 static const char *tulip_reg_name(const hwaddr addr)
-- 
2.25.1




[PATCH v5 0/3] net: tulip: add checks to avoid OOB access

2020-03-19 Thread P J P
From: Prasad J Pandit 

Hello,

* This series adds checks to avoid potential OOB access and infinite loop
  issues while processing rx/tx data.

* Tulip tx descriptors are capped at 128 to avoid infinite loop in
  tulip_xmit_list_update(), wrt Tulip kernel driver
  -> 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/dec/tulip/tulip.h#n319

* Update v3: add .can_receive routine
  -> https://lists.gnu.org/archive/html/qemu-devel/2020-02/msg06275.html

* Update v4: flush queued packets once they are received
  -> https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg05868.html

* Update v5: fixed a typo in patch commit message
  -> https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg06209.html

Thank you.
--
Prasad J Pandit (3):
  net: tulip: check frame size and r/w data length
  net: tulip: add .can_receive routine
  net: tulip: flush queued packets post receive

 hw/net/tulip.c | 51 +-
 1 file changed, 42 insertions(+), 9 deletions(-)

--
2.25.1




Re: [PATCH v6 12/61] target/riscv: vector integer add-with-carry / subtract-with-borrow instructions

2020-03-19 Thread Alistair Francis
On Tue, Mar 17, 2020 at 8:31 AM LIU Zhiwei  wrote:
>
> Signed-off-by: LIU Zhiwei 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/helper.h   |  33 ++
>  target/riscv/insn32.decode  |  11 ++
>  target/riscv/insn_trans/trans_rvv.inc.c |  78 +
>  target/riscv/vector_helper.c| 148 
>  4 files changed, 270 insertions(+)
>
> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> index 1256defb6c..72c733bf49 100644
> --- a/target/riscv/helper.h
> +++ b/target/riscv/helper.h
> @@ -339,3 +339,36 @@ DEF_HELPER_6(vwadd_wx_w, void, ptr, ptr, tl, ptr, env, 
> i32)
>  DEF_HELPER_6(vwsub_wx_b, void, ptr, ptr, tl, ptr, env, i32)
>  DEF_HELPER_6(vwsub_wx_h, void, ptr, ptr, tl, ptr, env, i32)
>  DEF_HELPER_6(vwsub_wx_w, void, ptr, ptr, tl, ptr, env, i32)
> +
> +DEF_HELPER_6(vadc_vvm_b, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vadc_vvm_h, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vadc_vvm_w, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vadc_vvm_d, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vsbc_vvm_b, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vsbc_vvm_h, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vsbc_vvm_w, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vsbc_vvm_d, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vmadc_vvm_b, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vmadc_vvm_h, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vmadc_vvm_w, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vmadc_vvm_d, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vmsbc_vvm_b, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vmsbc_vvm_h, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vmsbc_vvm_w, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vmsbc_vvm_d, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vadc_vxm_b, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vadc_vxm_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vadc_vxm_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vadc_vxm_d, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vsbc_vxm_b, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vsbc_vxm_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vsbc_vxm_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vsbc_vxm_d, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vmadc_vxm_b, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vmadc_vxm_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vmadc_vxm_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vmadc_vxm_d, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vmsbc_vxm_b, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vmsbc_vxm_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vmsbc_vxm_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vmsbc_vxm_d, void, ptr, ptr, tl, ptr, env, i32)
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index 4bdbfd16fa..022c8ea18b 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -70,6 +70,7 @@
>  @r2_nfvm ... ... vm:1 . . ... . ...  %nf %rs1 %rd
>  @r_nfvm  ... ... vm:1 . . ... . ...  %nf %rs2 %rs1 %rd
>  @r_vm.. vm:1 . . ... . ...  %rs2 %rs1 %rd
> +@r_vm_1  .. . . . ... . ... vm=1 %rs2 %rs1 %rd
>  @r_wdvm  . wd:1 vm:1 . . ... . ...  %rs2 %rs1 %rd
>  @r2_zimm . zimm:11  . ... . ... %rs1 %rd
>
> @@ -300,6 +301,16 @@ vwsubu_wv   110110 . . . 010 . 1010111 
> @r_vm
>  vwsubu_wx   110110 . . . 110 . 1010111 @r_vm
>  vwsub_wv110111 . . . 010 . 1010111 @r_vm
>  vwsub_wx110111 . . . 110 . 1010111 @r_vm
> +vadc_vvm01 1 . . 000 . 1010111 @r_vm_1
> +vadc_vxm01 1 . . 100 . 1010111 @r_vm_1
> +vadc_vim01 1 . . 011 . 1010111 @r_vm_1
> +vmadc_vvm   010001 1 . . 000 . 1010111 @r_vm_1
> +vmadc_vxm   010001 1 . . 100 . 1010111 @r_vm_1
> +vmadc_vim   010001 1 . . 011 . 1010111 @r_vm_1
> +vsbc_vvm010010 1 . . 000 . 1010111 @r_vm_1
> +vsbc_vxm010010 1 . . 100 . 1010111 @r_vm_1
> +vmsbc_vvm   010011 1 . . 000 . 1010111 @r_vm_1
> +vmsbc_vxm   010011 1 . . 100 . 1010111 @r_vm_1
>
>  vsetvli 0 ... . 111 . 1010111  @r2_zimm
>  vsetvl  100 . . 111 . 1010111  @r
> diff --git a/target/riscv/insn_trans/trans_rvv.inc.c 
> b/target/riscv/insn_trans/trans_rvv.inc.c
> index 8f17faa3f3..4562d5f14f 100644
> --- a/target/riscv/insn_trans/trans_rvv.inc.c
> +++ b/target/riscv/insn_trans/trans_rvv.inc.c
> @@ -1169,3 +1169,81 @@ GEN_OPIWX_WIDEN_TRANS(vwaddu_wx)
>  GEN_OPIWX_WIDEN_TRANS(vwadd_wx)
>  GEN_OPIWX_WIDEN_TRANS(vwsubu_wx)
>  GEN_OPIWX_WIDEN_TRANS(vwsub_wx)
> +
> +/* 

Re: [PATCH v2] MAINTAINERS: Add an entry for the HVF accelerator

2020-03-19 Thread Paolo Bonzini
On 19/03/20 14:43, Roman Bolshakov wrote:
> On Wed, Mar 18, 2020 at 11:47:15AM +0100, Paolo Bonzini wrote:
>>
>> Queued, thanks.
>>
> 
> Hi Paolo,
> 
> I'm going to send v3 shortly to include Cameron as maintainer.

Okay!

Paolo




Re: [PATCH 00/13] microvm: add acpi support

2020-03-19 Thread Paolo Bonzini
On 19/03/20 14:40, Gerd Hoffmann wrote:
>> Also, can you confirm that it builds without CONFIG_I440FX and
>> CONFIG_Q35?  You probably need to add "imply ACPI" and possibly some
>> '#include "config-devices.h"' and '#ifdef CONFIG_ACPI' here and there.
> Hmm, is there some way to do this without modifying
> default-configs/i386-softmmu.mak in the source tree?  So I can have two
> build trees with different confugurations?  Also to reduce the risk that
> I commit default-config changes by mistake?

No, there is no way yet.

Paolo




Re: [PATCH v5 07/18] s390x: protvirt: Inhibit balloon when switching to protected mode

2020-03-19 Thread David Hildenbrand
[...]

>>
>> I asked this question already to Michael (cc) via a different channel,
>> but hare is it again:
>>
>> Why does the balloon driver not support VIRTIO_F_IOMMU_PLATFORM? It is
>> absolutely not clear to me. The introducing commit mentioned that it
>> "bypasses DMA". I fail to see that.
>>
>> At least the communication via the SG mechanism should work perfectly
>> fine with an IOMMU enabled. So I assume it boils down to the pages that
>> we inflate/deflate not being referenced via IOVA?
> 
> AFAIU the IOVA/GPA stuff is not the problem here. You have said it
> yourself, the SG mechanism would work for balloon out of the box, as it
> does for the other virtio devices. 
> 
> But VIRTIO_F_ACCESS_PLATFORM (aka VIRTIO_F_IOMMU_PLATFORM)  not presented
> means according to Michael that the device has full access to the entire
> guest RAM. If VIRTIO_F_ACCESS_PLATFORM is negotiated this may or may not
> be the case.

So you say

"The virtio specification tells that the device is to present
VIRTIO_F_ACCESS_PLATFORM (a.k.a. VIRTIO_F_IOMMU_PLATFORM) when the
device "can only access certain memory addresses with said access
specified and/or granted by the platform"."

So, AFAIU, *any* virtio device (hypervisor side) has to present this
flag when PV is enabled. In that regard, your patch makes perfect sense
(although I am not sure it's a good idea to overwrite these feature bits
- maybe they should be activated on the cmdline permanently instead when
PV is to be used? (or enable )).

> 
> The actual problem is that the pages denoted by the buffer transmitted
> via the virtqueue are normally not shared pages. I.e. the hypervisor
> can not reuse them (what is the point of balloon inflate). To make this
> work, the guest would need to share the pages before saying 'host these
> are in my balloon, so you can use them'. This is a piece of logic we

What exactly would have to be done in the hypervisor to support it?

Assume we have to trigger sharing/unsharing - this sounds like a very
architecture specific thing? Or is this e.g., doing a map/unmap
operation like mapping/unmapping the SG?

Right now it sounds to me "we have to do $ARCHSPECIFIC when
inflating/deflating in the guest", which feels wrong.

> need only if the host/the device does not have full access to the
> guest RAM. That is in my opinion why the balloon driver fences
> VIRTIO_F_ACCESS_PLATFORM.> Does that make sense?

Yeah, I understood the "device has to set VIRTIO_F_ACCESS_PLATFORM"
part. Struggling with the "what can the guest driver actually do" part.

> 
>>
>> I don't think they have to be IOVA addresses. We're neither reading nor
>> writing these pages. We really speak about "physical memory in the
>> system" when ballooning. Everything else doesn't really make sense.
>> There is no need to map/unmap pages we inflate/deflate AFAIKs.
>>
>> IMHO, we should not try to piggy-back on VIRTIO_F_IOMMU_PLATFORM here,
>> but instead explicitly disable it either in the hypervisor or the guest.
>>
> 
> We need a feature bit here. We can say fencing VIRTIO_F_ACCESS_PLATFORM
> was a bug, fix that bug, and then invent another 'the guest RAM is
> somehow different' feature bit specific to the balloon, and then create
> arch hooks in the driver that get active if this feature is negotiated.
> 
> I assumed the fact that the balloon driver fences
> VIRTIO_F_ACCESS_PLATFORM is not a bug.
> 
>> I hope someone can clarify what the real issue with an IOMMU and
>> ballooning is, because I'll be having the same "issue" with
> virtio-mem.
>>
> 
> The issue is not with the IOMMU, the issue is with restricted access
> to guest RAM. The definition of VIRTIO_F_ACCESS_PLATFORM is such that we
> pretty much know what's up when VIRTIO_F_ACCESS_PLATFORM is not
> presented, but VIRTIO_F_ACCESS_PLATFORM presented can mean a couple of
> things.

Understood.

-- 
Thanks,

David / dhildenb




Re: [PATCH v5 7/7] virtio-net: add migration support for RSS and hash report

2020-03-19 Thread Michael S. Tsirkin
On Thu, Mar 19, 2020 at 07:19:26PM +0200, Yuri Benditovich wrote:
> 
> 
> On Wed, Mar 18, 2020 at 12:48 PM Dr. David Alan Gilbert 
> wrote:
> 
> * Yuri Benditovich (yuri.benditov...@daynix.com) wrote:
> > Save and restore RSS/hash report configuration.
> >
> > Signed-off-by: Yuri Benditovich 
> > ---
> >  hw/net/virtio-net.c | 26 ++
> >  1 file changed, 26 insertions(+)
> >
> > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > index a0614ad4e6..0b058aae9f 100644
> > --- a/hw/net/virtio-net.c
> > +++ b/hw/net/virtio-net.c
> > @@ -2842,6 +2842,13 @@ static int virtio_net_post_load_device(void
> *opaque, int version_id)
> >          }
> >      }
> > 
> > +    if (n->rss_data.enabled) {
> > +        trace_virtio_net_rss_enable(n->rss_data.hash_types,
> > +                                    n->rss_data.indirections_len,
> > +                                    sizeof(n->rss_data.key));
> > +    } else {
> > +        trace_virtio_net_rss_disable();
> > +    }
> >      return 0;
> >  }
> > 
> > @@ -3019,6 +3026,24 @@ static const VMStateDescription
> vmstate_virtio_net_has_vnet = {
> >      },
> >  };
> > 
> > +static const VMStateDescription vmstate_rss = {
> > +    .name      = "vmstate_rss",
> 
> You need to do something to avoid breaking migration compatibility
> from/to old QEMU's and from/to QEMU's on hosts without the new virtio
> features.
> Probably adding a .needed =   here pointing to a function that
> checks 'enabled' might do it.
> 
> 
> Does VMSTATE_STRUCT_TEST(..,..,checker_procedure,...) result the same thing?
> 
> Another question about migration support:
> What is expected/required behavior?
> Possible cases:
> old qemu -> new qemu
> new qemu (new feature off) -> old qemu

works

> new qemu (new feature on) -> old qemu
>

fails gracefully

> 
> Dave
> 
> 
> > +    .fields = (VMStateField[]) {
> > +        VMSTATE_BOOL(enabled, VirtioNetRssData),
> > +        VMSTATE_BOOL(redirect, VirtioNetRssData),
> > +        VMSTATE_BOOL(populate_hash, VirtioNetRssData),
> > +        VMSTATE_UINT32(hash_types, VirtioNetRssData),
> > +        VMSTATE_UINT16(indirections_len, VirtioNetRssData),
> > +        VMSTATE_UINT16(default_queue, VirtioNetRssData),
> > +        VMSTATE_UINT8_ARRAY(key, VirtioNetRssData,
> > +                            VIRTIO_NET_RSS_MAX_KEY_SIZE),
> > +        VMSTATE_VARRAY_UINT16_ALLOC(indirections_table,
> VirtioNetRssData,
> > +                                    indirections_len, 0,
> > +                                    vmstate_info_uint16, uint16_t),
> > +        VMSTATE_END_OF_LIST()
> > +    },
> > +};
> > +
> >  static const VMStateDescription vmstate_virtio_net_device = {
> >      .name = "virtio-net-device",
> >      .version_id = VIRTIO_NET_VM_VERSION,
> > @@ -3067,6 +3092,7 @@ static const VMStateDescription
> vmstate_virtio_net_device = {
> >                           vmstate_virtio_net_tx_waiting),
> >          VMSTATE_UINT64_TEST(curr_guest_offloads, VirtIONet,
> >                              has_ctrl_guest_offloads),
> > +        VMSTATE_STRUCT(rss_data, VirtIONet, 1, vmstate_rss,
> VirtioNetRssData),
> >          VMSTATE_END_OF_LIST()
> >     },
> >  };
> > --
> > 2.17.1
> >
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> 
> 




Re: [PATCH v5 7/7] virtio-net: add migration support for RSS and hash report

2020-03-19 Thread Yuri Benditovich
On Wed, Mar 18, 2020 at 12:48 PM Dr. David Alan Gilbert 
wrote:

> * Yuri Benditovich (yuri.benditov...@daynix.com) wrote:
> > Save and restore RSS/hash report configuration.
> >
> > Signed-off-by: Yuri Benditovich 
> > ---
> >  hw/net/virtio-net.c | 26 ++
> >  1 file changed, 26 insertions(+)
> >
> > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > index a0614ad4e6..0b058aae9f 100644
> > --- a/hw/net/virtio-net.c
> > +++ b/hw/net/virtio-net.c
> > @@ -2842,6 +2842,13 @@ static int virtio_net_post_load_device(void
> *opaque, int version_id)
> >  }
> >  }
> >
> > +if (n->rss_data.enabled) {
> > +trace_virtio_net_rss_enable(n->rss_data.hash_types,
> > +n->rss_data.indirections_len,
> > +sizeof(n->rss_data.key));
> > +} else {
> > +trace_virtio_net_rss_disable();
> > +}
> >  return 0;
> >  }
> >
> > @@ -3019,6 +3026,24 @@ static const VMStateDescription
> vmstate_virtio_net_has_vnet = {
> >  },
> >  };
> >
> > +static const VMStateDescription vmstate_rss = {
> > +.name  = "vmstate_rss",
>
> You need to do something to avoid breaking migration compatibility
> from/to old QEMU's and from/to QEMU's on hosts without the new virtio
> features.
> Probably adding a .needed =   here pointing to a function that
> checks 'enabled' might do it.
>
> Does VMSTATE_STRUCT_TEST(..,..,checker_procedure,...) result the same
thing?

Another question about migration support:
What is expected/required behavior?
Possible cases:
old qemu -> new qemu
new qemu (new feature off) -> old qemu
new qemu (new feature on) -> old qemu


> Dave
>
>
> > +.fields = (VMStateField[]) {
> > +VMSTATE_BOOL(enabled, VirtioNetRssData),
> > +VMSTATE_BOOL(redirect, VirtioNetRssData),
> > +VMSTATE_BOOL(populate_hash, VirtioNetRssData),
> > +VMSTATE_UINT32(hash_types, VirtioNetRssData),
> > +VMSTATE_UINT16(indirections_len, VirtioNetRssData),
> > +VMSTATE_UINT16(default_queue, VirtioNetRssData),
> > +VMSTATE_UINT8_ARRAY(key, VirtioNetRssData,
> > +VIRTIO_NET_RSS_MAX_KEY_SIZE),
> > +VMSTATE_VARRAY_UINT16_ALLOC(indirections_table,
> VirtioNetRssData,
> > +indirections_len, 0,
> > +vmstate_info_uint16, uint16_t),
> > +VMSTATE_END_OF_LIST()
> > +},
> > +};
> > +
> >  static const VMStateDescription vmstate_virtio_net_device = {
> >  .name = "virtio-net-device",
> >  .version_id = VIRTIO_NET_VM_VERSION,
> > @@ -3067,6 +3092,7 @@ static const VMStateDescription
> vmstate_virtio_net_device = {
> >   vmstate_virtio_net_tx_waiting),
> >  VMSTATE_UINT64_TEST(curr_guest_offloads, VirtIONet,
> >  has_ctrl_guest_offloads),
> > +VMSTATE_STRUCT(rss_data, VirtIONet, 1, vmstate_rss,
> VirtioNetRssData),
> >  VMSTATE_END_OF_LIST()
> > },
> >  };
> > --
> > 2.17.1
> >
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
>
>


Re: [PATCH 1/4] tests/vm: write raw console log

2020-03-19 Thread Alex Bennée


Cleber Rosa  writes:

> On Mon, Mar 16, 2020 at 03:22:07PM +0100, Philippe Mathieu-Daudé wrote:
>> On 3/16/20 3:16 PM, Alex Bennée wrote:
>> > 
>> > Gerd Hoffmann  writes:
>> > 
>> > > Run "tail -f /var/tmp/*/qemu*console.raw" in another terminal
>> > > to watch the install console.
>> > > 
>> > > Signed-off-by: Gerd Hoffmann 
>> > 
>> > I suspect this is what's breaking "make check-acceptance" so I've
>> > dropped the series from testing/next for now.
>> >
>> >2020-03-11 12:12:30,546 stacktrace   L0039 ERROR|
>> >2020-03-11 12:12:30,546 stacktrace   L0042 ERROR| Reproduced 
>> > traceback from: 
>> > /home/alex.bennee/lsrc/qemu.git/builds/all/tests/venv/lib/python3.6/site-packages/avocado/c\
>> >ore/test.py:860
>> >2020-03-11 12:12:30,547 stacktrace   L0045 ERROR| Traceback (most 
>> > recent call last):
>> >2020-03-11 12:12:30,547 stacktrace   L0045 ERROR|   File 
>> > "/home/alex.bennee/lsrc/qemu.git/builds/all/tests/venv/lib/python3.6/site-packages/avocado/core/test.py",
>> >  line \
>> >1456, in test
>> >2020-03-11 12:12:30,547 stacktrace   L0045 ERROR| 
>> > self.error(self.exception)
>> >2020-03-11 12:12:30,547 stacktrace   L0045 ERROR|   File 
>> > "/home/alex.bennee/lsrc/qemu.git/builds/all/tests/venv/lib/python3.6/site-packages/avocado/core/test.py",
>> >  line \
>> >1064, in error
>> >2020-03-11 12:12:30,547 stacktrace   L0045 ERROR| raise 
>> > exceptions.TestError(message)
>> >2020-03-11 12:12:30,547 stacktrace   L0045 ERROR| 
>> > avocado.core.exceptions.TestError: Traceback (most recent call last):
>> >2020-03-11 12:12:30,547 stacktrace   L0045 ERROR|   File 
>> > "/usr/lib/python3.6/imp.py", line 235, in load_module
>> >2020-03-11 12:12:30,547 stacktrace   L0045 ERROR| return 
>> > load_source(name, filename, file)
>> >2020-03-11 12:12:30,547 stacktrace   L0045 ERROR|   File 
>> > "/usr/lib/python3.6/imp.py", line 172, in load_source
>> >2020-03-11 12:12:30,547 stacktrace   L0045 ERROR| module = 
>> > _load(spec)
>> >2020-03-11 12:12:30,547 stacktrace   L0045 ERROR|   File "> > importlib._bootstrap>", line 684, in _load
>> >2020-03-11 12:12:30,547 stacktrace   L0045 ERROR|   File "> > importlib._bootstrap>", line 665, in _load_unlocked
>> >2020-03-11 12:12:30,547 stacktrace   L0045 ERROR|   File "> > importlib._bootstrap_external>", line 678, in exec_module
>> >2020-03-11 12:12:30,547 stacktrace   L0045 ERROR|   File "> > importlib._bootstrap>", line 219, in _call_with_frames_removed
>> >2020-03-11 12:12:30,547 stacktrace   L0045 ERROR|   File 
>> > "/home/alex.bennee/lsrc/qemu.git/builds/all/tests/acceptance/machine_mips_malta.py",
>> >  line 15, in 
>> >2020-03-11 12:12:30,547 stacktrace   L0045 ERROR| from 
>> > avocado_qemu import Test
>> >2020-03-11 12:12:30,547 stacktrace   L0045 ERROR|   File 
>> > "/home/alex.bennee/lsrc/qemu.git/builds/all/tests/acceptance/avocado_qemu/__init__.py",
>> >  line 22, in 
>> >2020-03-11 12:12:30,547 stacktrace   L0045 ERROR| from 
>> > qemu.machine import QEMUMachine
>> >2020-03-11 12:12:30,547 stacktrace   L0045 ERROR|   File 
>> > "/home/alex.bennee/lsrc/qemu.git/builds/all/tests/acceptance/avocado_qemu/../../../python/qemu/machine.py",
>> >  lin\
>> >e 27, in 
>> >2020-03-11 12:12:30,547 stacktrace   L0045 ERROR| from 
>> > qemu.console_socket import ConsoleSocket
>> >2020-03-11 12:12:30,547 stacktrace   L0045 ERROR| 
>> > ModuleNotFoundError: No module named 'qemu.console_socket'
>> 
>> Cc'ing Wainer/Cleber in case...
>>
>
> I've applied the "[PATCH v4 00/10] tests/vm: Add support for aarch64
> VMs" series and this patch (on top of d649689a8) and could not
> replicate this issue with "make check-acceptance".
>
> Maybe I'm missing some other patch?
>
> - Cleber.
>
>> >2020-03-11 12:12:30,547 stacktrace   L0045 ERROR|
>> >2020-03-11 12:12:30,547 stacktrace   L0046 ERROR|
>> >2020-03-11 12:12:30,548 test L0865 DEBUG| Local variables:
>> >2020-03-11 12:12:30,561 test L0868 DEBUG|  -> self > > 'avocado.core.test.TestError'>: 
>> > 1-./tests/acceptance/machine_mips_malta.py:MaltaMachineFramebuffer.tes\
>> >t_mips_malta_i6400_framebuffer_logo_1core
>> > 
>> > 
>> > > ---
>> > >   tests/vm/basevm.py | 6 ++
>> > >   1 file changed, 6 insertions(+)
>> > > 
>> > > diff --git a/tests/vm/basevm.py b/tests/vm/basevm.py
>> > > index 8400b0e07f65..c53fd354d955 100644
>> > > --- a/tests/vm/basevm.py
>> > > +++ b/tests/vm/basevm.py
>> > > @@ -213,6 +213,9 @@ class BaseVM(object):
>> > >   def console_init(self, timeout = 120):
>> > >   vm = self._guest
>> > >   vm.console_socket.settimeout(timeout)
>> > > +self.console_raw_path = os.path.join(vm._temp_dir,
>> > > + vm._name + "-console.raw")
>> > > +

Re: [PATCH v5 6/7] vmstate.h: provide VMSTATE_VARRAY_UINT16_ALLOC macro

2020-03-19 Thread Yuri Benditovich
On Wed, Mar 18, 2020 at 11:42 AM Michael S. Tsirkin  wrote:

> On Wed, Mar 18, 2020 at 11:15:24AM +0200, Yuri Benditovich wrote:
> > Similar to VMSTATE_VARRAY_UINT32_ALLOC, but the size is
> > 16-bit field.
> >
> > Signed-off-by: Yuri Benditovich 
>
> Hmm this is exactly my patch isn't it? If yes pls fix up attribution
> (if this is not reposted, then when applying):
>

Of course, it is similar to the one you wrote inline.
Unlike one you wrote inline this patch does not fail on checkpatch.
But the idea is the same, hard to invent something.
Please just let me know what exactly should I do: resubmit or not and
whether it is possible to fix it without resubmission.


>
> From: Michael S. Tsirkin 
> Signed-off-by: Michael S. Tsirkin 
>
>
> > ---
> >  include/migration/vmstate.h | 10 ++
> >  1 file changed, 10 insertions(+)
> >
> > diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> > index 30667631bc..baaefb6b9b 100644
> > --- a/include/migration/vmstate.h
> > +++ b/include/migration/vmstate.h
> > @@ -432,6 +432,16 @@ extern const VMStateInfo vmstate_info_qlist;
> >  .offset = vmstate_offset_pointer(_state, _field, _type), \
> >  }
> >
> > +#define VMSTATE_VARRAY_UINT16_ALLOC(_field, _state, _field_num,
> _version, _info, _type) {\
> > +.name   = (stringify(_field)),   \
> > +.version_id = (_version),\
> > +.num_offset = vmstate_offset_value(_state, _field_num, uint16_t),\
> > +.info   = &(_info),  \
> > +.size   = sizeof(_type), \
> > +.flags  = VMS_VARRAY_UINT16 | VMS_POINTER | VMS_ALLOC,   \
> > +.offset = vmstate_offset_pointer(_state, _field, _type), \
> > +}
> > +
> >  #define VMSTATE_VARRAY_UINT16_UNSAFE(_field, _state, _field_num,
> _version, _info, _type) {\
> >  .name   = (stringify(_field)),   \
> >  .version_id = (_version),\
> > --
> > 2.17.1
>
>


Re: [PATCH v1] configure: record sphinx output

2020-03-19 Thread Philippe Mathieu-Daudé

On 3/19/20 3:39 PM, Olaf Hering wrote:

If configure fails to run due to errors in the expected sphinx
environment no helpful message is recorded. Write all of the output to
config.log to assist with debugging.

Signed-off-by: Olaf Hering 
---
  configure | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configure b/configure
index 12dbb0c76b..55086b0280 100755
--- a/configure
+++ b/configure
@@ -4908,7 +4908,7 @@ has_sphinx_build() {
  # sphinx-build doesn't exist at all or if it is too old.
  mkdir -p "$TMPDIR1/sphinx"
  touch "$TMPDIR1/sphinx/index.rst"
-"$sphinx_build" -c "$source_path/docs" -b html "$TMPDIR1/sphinx" 
"$TMPDIR1/sphinx/out" >/dev/null 2>&1
+"$sphinx_build" -c "$source_path/docs" -b html "$TMPDIR1/sphinx" "$TMPDIR1/sphinx/out" 
>> config.log 2>&1
  }
  
  # Check if tools are available to build documentation.



Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v6 09/61] target/riscv: add vector amo operations

2020-03-19 Thread Alistair Francis
On Tue, Mar 17, 2020 at 8:25 AM LIU Zhiwei  wrote:
>
> Vector AMOs operate as if aq and rl bits were zero on each element
> with regard to ordering relative to other instructions in the same hart.
> Vector AMOs provide no ordering guarantee between element operations
> in the same vector AMO instruction
>
> Signed-off-by: LIU Zhiwei 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/helper.h   |  29 +
>  target/riscv/insn32-64.decode   |  11 ++
>  target/riscv/insn32.decode  |  13 +++
>  target/riscv/insn_trans/trans_rvv.inc.c | 134 ++
>  target/riscv/internals.h|   1 +
>  target/riscv/vector_helper.c| 143 
>  6 files changed, 331 insertions(+)
>
> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> index 72ba4d9bdb..70a4b05f75 100644
> --- a/target/riscv/helper.h
> +++ b/target/riscv/helper.h
> @@ -240,3 +240,32 @@ DEF_HELPER_5(vlhuff_v_w, void, ptr, ptr, tl, env, i32)
>  DEF_HELPER_5(vlhuff_v_d, void, ptr, ptr, tl, env, i32)
>  DEF_HELPER_5(vlwuff_v_w, void, ptr, ptr, tl, env, i32)
>  DEF_HELPER_5(vlwuff_v_d, void, ptr, ptr, tl, env, i32)
> +#ifdef TARGET_RISCV64
> +DEF_HELPER_6(vamoswapw_v_d, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vamoswapd_v_d, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vamoaddw_v_d,  void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vamoaddd_v_d,  void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vamoxorw_v_d,  void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vamoxord_v_d,  void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vamoandw_v_d,  void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vamoandd_v_d,  void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vamoorw_v_d,   void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vamoord_v_d,   void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vamominw_v_d,  void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vamomind_v_d,  void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vamomaxw_v_d,  void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vamomaxd_v_d,  void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vamominuw_v_d, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vamominud_v_d, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vamomaxuw_v_d, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vamomaxud_v_d, void, ptr, ptr, tl, ptr, env, i32)
> +#endif
> +DEF_HELPER_6(vamoswapw_v_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vamoaddw_v_w,  void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vamoxorw_v_w,  void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vamoandw_v_w,  void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vamoorw_v_w,   void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vamominw_v_w,  void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vamomaxw_v_w,  void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vamominuw_v_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vamomaxuw_v_w, void, ptr, ptr, tl, ptr, env, i32)
> diff --git a/target/riscv/insn32-64.decode b/target/riscv/insn32-64.decode
> index 380bf791bc..86153d93fa 100644
> --- a/target/riscv/insn32-64.decode
> +++ b/target/riscv/insn32-64.decode
> @@ -57,6 +57,17 @@ amomax_d   10100 . . . . 011 . 010 @atom_st
>  amominu_d  11000 . . . . 011 . 010 @atom_st
>  amomaxu_d  11100 . . . . 011 . 010 @atom_st
>
> +#*** Vector AMO operations (in addition to Zvamo) ***
> +vamoswapd_v 1 . . . . 111 . 010 @r_wdvm
> +vamoaddd_v  0 . . . . 111 . 010 @r_wdvm
> +vamoxord_v  00100 . . . . 111 . 010 @r_wdvm
> +vamoandd_v  01100 . . . . 111 . 010 @r_wdvm
> +vamoord_v   01000 . . . . 111 . 010 @r_wdvm
> +vamomind_v  1 . . . . 111 . 010 @r_wdvm
> +vamomaxd_v  10100 . . . . 111 . 010 @r_wdvm
> +vamominud_v 11000 . . . . 111 . 010 @r_wdvm
> +vamomaxud_v 11100 . . . . 111 . 010 @r_wdvm
> +
>  # *** RV64F Standard Extension (in addition to RV32F) ***
>  fcvt_l_s   110  00010 . ... . 1010011 @r2_rm
>  fcvt_lu_s  110  00011 . ... . 1010011 @r2_rm
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index b76c09c8c0..1330703720 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -44,6 +44,7 @@
>  imm rd
>   shamt rs1 rd
>  aq rl rs2 rs1 rd
> + vm wd rd rs1 rs2
>  vm rd rs1 nf
>   vm rd rs1 rs2 nf
>
> @@ -67,6 +68,7 @@
>  @r2  ...   . . ... . ... %rs1 %rd
>  @r2_nfvm ... ... vm:1 . . ... . ...  %nf %rs1 %rd
>  @r_nfvm  ... ... vm:1 . . ... . ...  %nf %rs2 %rs1 %rd
> +@r_wdvm  . wd:1 vm:1 . . ... . ...  %rs2 %rs1 %rd
>  @r2_zimm . zimm:11  . ... . ... %rs1 %rd
>
>  @hfence_gvma ... . .   ... . ... 

Re: [PATCH v3] MAINTAINERS: Add an entry for the HVF accelerator

2020-03-19 Thread Philippe Mathieu-Daudé

On 3/19/20 2:55 PM, Roman Bolshakov wrote:

Cameron signed up for taking HVF ownership.

Cc: Cameron Esfahani 
Cc: Nikita Leshenko 
Cc: Sergio Andres Gomez Del Real 
Cc: Patrick Colp 
Cc: Liran Alon 
Cc: Heiher 

Signed-off-by: Roman Bolshakov 
---
Changes since v2:
   Removed myself from the list of maintainers, added Cameron from Apple.
   Status is changed to Supported again.
Changes since v1:
   Status is changed to Maintained instead of Supported.

  MAINTAINERS | 7 +++
  1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 7364af0d8b..ab4dc2816c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -406,6 +406,13 @@ S: Supported
  F: target/i386/kvm.c
  F: scripts/kvm/vmxcap
  
+X86 HVF CPUs

+M: Cameron Esfahani 


From the other thread discussions, I'd keep you at least listed as 
designated reviewer:


R: Roman Bolshakov 


+S: Supported
+F: accel/stubs/hvf-stub.c
+F: target/i386/hvf/
+F: include/sysemu/hvf.h
+
  WHPX CPUs
  M: Sunil Muthuswamy 
  S: Supported






Re: [PATCH v14 Kernel 4/7] vfio iommu: Implementation of ioctl for dirty pages tracking.

2020-03-19 Thread Kirti Wankhede




On 3/19/2020 6:36 PM, Alex Williamson wrote:

On Thu, 19 Mar 2020 02:15:34 -0400
Yan Zhao  wrote:


On Thu, Mar 19, 2020 at 12:40:53PM +0800, Alex Williamson wrote:

On Thu, 19 Mar 2020 00:15:33 -0400
Yan Zhao  wrote:
   

On Thu, Mar 19, 2020 at 12:01:00PM +0800, Alex Williamson wrote:

On Wed, 18 Mar 2020 23:06:39 -0400
Yan Zhao  wrote:
 

On Thu, Mar 19, 2020 at 03:41:11AM +0800, Kirti Wankhede wrote:

VFIO_IOMMU_DIRTY_PAGES ioctl performs three operations:
- Start dirty pages tracking while migration is active
- Stop dirty pages tracking.
- Get dirty pages bitmap. Its user space application's responsibility to
   copy content of dirty pages from source to destination during migration.

To prevent DoS attack, memory for bitmap is allocated per vfio_dma
structure. Bitmap size is calculated considering smallest supported page
size. Bitmap is allocated for all vfio_dmas when dirty logging is enabled

Bitmap is populated for already pinned pages when bitmap is allocated for
a vfio_dma with the smallest supported page size. Update bitmap from
pinning functions when tracking is enabled. When user application queries
bitmap, check if requested page size is same as page size used to
populated bitmap. If it is equal, copy bitmap, but if not equal, return
error.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
  drivers/vfio/vfio_iommu_type1.c | 205 +++-
  1 file changed, 203 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 70aeab921d0f..d6417fb02174 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -71,6 +71,7 @@ struct vfio_iommu {
unsigned intdma_avail;
boolv2;
boolnesting;
+   booldirty_page_tracking;
  };
  
  struct vfio_domain {

@@ -91,6 +92,7 @@ struct vfio_dma {
boollock_cap;   /* capable(CAP_IPC_LOCK) */
struct task_struct  *task;
struct rb_root  pfn_list;   /* Ex-user pinned pfn list */
+   unsigned long   *bitmap;
  };
  
  struct vfio_group {

@@ -125,7 +127,10 @@ struct vfio_regions {
  #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)   \
(!list_empty(>domain_list))
  
+#define DIRTY_BITMAP_BYTES(n)	(ALIGN(n, BITS_PER_TYPE(u64)) / BITS_PER_BYTE)

+
  static int put_pfn(unsigned long pfn, int prot);
+static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu);
  
  /*

   * This code handles mapping and unmapping of user data buffers
@@ -175,6 +180,55 @@ static void vfio_unlink_dma(struct vfio_iommu *iommu, 
struct vfio_dma *old)
rb_erase(>node, >dma_list);
  }
  
+static int vfio_dma_bitmap_alloc(struct vfio_iommu *iommu, uint64_t pgsize)

+{
+   struct rb_node *n = rb_first(>dma_list);
+
+   for (; n; n = rb_next(n)) {
+   struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
+   struct rb_node *p;
+   unsigned long npages = dma->size / pgsize;
+
+   dma->bitmap = kvzalloc(DIRTY_BITMAP_BYTES(npages), GFP_KERNEL);
+   if (!dma->bitmap) {
+   struct rb_node *p = rb_prev(n);
+
+   for (; p; p = rb_prev(p)) {
+   struct vfio_dma *dma = rb_entry(n,
+   struct vfio_dma, node);
+
+   kfree(dma->bitmap);
+   dma->bitmap = NULL;
+   }
+   return -ENOMEM;
+   }
+
+   if (RB_EMPTY_ROOT(>pfn_list))
+   continue;
+
+   for (p = rb_first(>pfn_list); p; p = rb_next(p)) {
+   struct vfio_pfn *vpfn = rb_entry(p, struct vfio_pfn,
+node);
+
+   bitmap_set(dma->bitmap,
+   (vpfn->iova - dma->iova) / pgsize, 1);
+   }
+   }
+   return 0;
+}
+
+static void vfio_dma_bitmap_free(struct vfio_iommu *iommu)
+{
+   struct rb_node *n = rb_first(>dma_list);
+
+   for (; n; n = rb_next(n)) {
+   struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
+
+   kfree(dma->bitmap);
+   dma->bitmap = NULL;
+   }
+}
+
  /*
   * Helper Functions for host iova-pfn list
   */
@@ -567,6 +621,14 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
vfio_unpin_page_external(dma, iova, do_accounting);
goto pin_unwind;
}
+
+   if (iommu->dirty_page_tracking) {
+   unsigned long pgshift =
+__ffs(vfio_pgsize_bitmap(iommu));
+
+   bitmap_set(dma->bitmap,
+  

[PATCH] aio-posix: fix io_uring with external events

2020-03-19 Thread Stefan Hajnoczi
When external event sources are disabled fdmon-io_uring falls back to
fdmon-poll.  The ->need_wait() callback needs to watch for this so it
can return true when external event sources are disabled.

It is also necessary to call ->wait() when AioHandlers have changed
because io_uring is asynchronous and we must submit new sqes.

Both of these changes to ->need_wait() together fix tests/test-aio -p
/aio/external-client, which failed with:

  test-aio: tests/test-aio.c:404: test_aio_external_client: Assertion 
`aio_poll(ctx, false)' failed.

Reported-by: Julia Suvorova 
Signed-off-by: Stefan Hajnoczi 
---
 util/fdmon-io_uring.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/util/fdmon-io_uring.c b/util/fdmon-io_uring.c
index 893b79b622..7e143ef515 100644
--- a/util/fdmon-io_uring.c
+++ b/util/fdmon-io_uring.c
@@ -290,7 +290,18 @@ static int fdmon_io_uring_wait(AioContext *ctx, 
AioHandlerList *ready_list,
 
 static bool fdmon_io_uring_need_wait(AioContext *ctx)
 {
-return io_uring_cq_ready(>fdmon_io_uring);
+/* Have io_uring events completed? */
+if (io_uring_cq_ready(>fdmon_io_uring)) {
+return true;
+}
+
+/* Do we need to submit new io_uring sqes? */
+if (!QSLIST_EMPTY_RCU(>submit_list)) {
+return true;
+}
+
+/* Are we falling back to fdmon-poll? */
+return atomic_read(>external_disable_cnt);
 }
 
 static const FDMonOps fdmon_io_uring_ops = {
-- 
2.24.1



Re: [PATCH v6 11/61] target/riscv: vector widening integer add and subtract

2020-03-19 Thread Alistair Francis
On Tue, Mar 17, 2020 at 8:29 AM LIU Zhiwei  wrote:
>
> Reviewed-by: Richard Henderson 
> Signed-off-by: LIU Zhiwei 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/helper.h   |  49 +++
>  target/riscv/insn32.decode  |  16 +++
>  target/riscv/insn_trans/trans_rvv.inc.c | 178 
>  target/riscv/vector_helper.c| 111 +++
>  4 files changed, 354 insertions(+)
>
> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> index e73701d4bb..1256defb6c 100644
> --- a/target/riscv/helper.h
> +++ b/target/riscv/helper.h
> @@ -290,3 +290,52 @@ DEF_HELPER_6(vrsub_vx_b, void, ptr, ptr, tl, ptr, env, 
> i32)
>  DEF_HELPER_6(vrsub_vx_h, void, ptr, ptr, tl, ptr, env, i32)
>  DEF_HELPER_6(vrsub_vx_w, void, ptr, ptr, tl, ptr, env, i32)
>  DEF_HELPER_6(vrsub_vx_d, void, ptr, ptr, tl, ptr, env, i32)
> +
> +DEF_HELPER_6(vwaddu_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vwaddu_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vwaddu_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vwsubu_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vwsubu_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vwsubu_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vwadd_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vwadd_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vwadd_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vwsub_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vwsub_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vwsub_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vwaddu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vwaddu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vwaddu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vwsubu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vwsubu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vwsubu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vwadd_vx_b, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vwadd_vx_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vwadd_vx_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vwsub_vx_b, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vwsub_vx_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vwsub_vx_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vwaddu_wv_b, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vwaddu_wv_h, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vwaddu_wv_w, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vwsubu_wv_b, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vwsubu_wv_h, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vwsubu_wv_w, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vwadd_wv_b, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vwadd_wv_h, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vwadd_wv_w, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vwsub_wv_b, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vwsub_wv_h, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vwsub_wv_w, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vwaddu_wx_b, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vwaddu_wx_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vwaddu_wx_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vwsubu_wx_b, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vwsubu_wx_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vwsubu_wx_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vwadd_wx_b, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vwadd_wx_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vwadd_wx_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vwsub_wx_b, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vwsub_wx_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vwsub_wx_w, void, ptr, ptr, tl, ptr, env, i32)
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index d1034a0e61..4bdbfd16fa 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -284,6 +284,22 @@ vsub_vv 10 . . . 000 . 1010111 
> @r_vm
>  vsub_vx 10 . . . 100 . 1010111 @r_vm
>  vrsub_vx11 . . . 100 . 1010111 @r_vm
>  vrsub_vi11 . . . 011 . 1010111 @r_vm
> +vwaddu_vv   11 . . . 010 . 1010111 @r_vm
> +vwaddu_vx   11 . . . 110 . 1010111 @r_vm
> +vwadd_vv110001 . . . 010 . 1010111 @r_vm
> +vwadd_vx110001 . . . 110 . 1010111 @r_vm
> +vwsubu_vv   110010 . . . 010 . 1010111 @r_vm
> +vwsubu_vx   110010 . . . 110 . 1010111 @r_vm
> +vwsub_vv110011 . . . 010 . 1010111 @r_vm
> +vwsub_vx110011 . . . 110 . 1010111 @r_vm
> +vwaddu_wv   110100 . . . 010 . 1010111 @r_vm

Re: [PATCH 0/4] linux-user: Fix some issues in termbits.h files

2020-03-19 Thread Laurent Vivier
Le 19/03/2020 à 17:24, Aleksandar Markovic a écrit :
>> I think we should first introduce a linux-user/generic/termbits.h as we
>> have an asm-generic/termbits.h in the kernel and use it with all the
>> targets except alpha, mips, hppa, sparc and xtensa.
>>
>> I think this linux-user/generic/termbits.h could be copied from
>> linux-user/openrisc/termbits.h (without the ioctl definitions)
>>
>> Then you could update the remaining ones.
>>
> 
> I agree with you, Laurent, that would be the cleanest
> implementation.
> 
> However, I think it requires at least several days of meticulous
> dev work, that I can't afford at this moment. May I ask you to
> accept this series as is for 5.0, as a sort of bridge towards
> the implementation you described? It certainly fixes a majority
> of termbits-related bugs, many of them remained latent just
> by fact that XXX and TARGET_XXX were identical. The most
> affected targets, xtensa, mips and alpha should be cleaned up
> by this series wrt termbits, and for great majority of issues
> are cleaned up for all platforms.
> 
> I just don't have enough time resources to additionally
> devote to this problem.

ok, but I need to review and test them, I don't know if I will have
enough time for that. I will try...

Thanks,
Laurent



Re: [PATCH 0/5] QEMU Gating CI

2020-03-19 Thread Markus Armbruster
Peter Maydell  writes:

> On Tue, 17 Mar 2020 at 14:13, Cleber Rosa  wrote:
>>
>> On Tue, Mar 17, 2020 at 09:29:32AM +, Peter Maydell wrote:
>> > Ah, I see. My assumption was that this was all stuff that you were
>> > working on, so that I would then be able to test that it worked correctly,
>> > not that I would need to do configuration of the gitlab.com setup.
>
>> So, I had to use temporary hardware resources to set this up (and set
>> it up countless times TBH).  I had the understanding based on the list
>> of machines you documented[1] that at least some of them would be used
>> for the permanent setup.
>
> Well, some of them will be (eg the s390 box), but some of them
> are my personal ones that can't be reused easily. I'd assumed
> in any case that gitlab would have at least support for x86 hosts:
> we are definitely not going to continue to use my desktop machine
> for running CI builds! Also IIRC RedHat said they'd be able to
> provide some machines for runners.

Correct!  As discussed at the QEMU summit, we'll gladly chip in runners
to test the stuff we care about, but to match the coverage of your
private zoo of machines, others will have to chip in, too.

>> OK, I see it, now it makes more sense.  So we're "only" missing the
>> setup for the machines we'll use for the more permanent setup.  Would
>> you like to do a staged setup/migration using one or some of the
>> machines you documented?  I'm 100% onboard to help with this, meaning
>> that I can assist you with instructions, or do "pair setup" of the
>> machines if needed.  I think a good part of the evaluation here comes
>> down to how manageable/reproducible the setup is, so it'd make sense
>> for one to be part of the setup itself.
>
> I think we should start by getting the gitlab setup working
> for the basic "x86 configs" first. Then we can try adding
> a runner for s390 (that one's logistically easiest because
> it is a project machine, not one owned by me personally or
> by Linaro) once the basic framework is working, and expand
> from there.

Makes sense to me.

Next steps to get this off the ground:

* Red Hat provides runner(s) for x86 stuff we care about.

* If that doesn't cover 'basic "x86 configs" in your judgement, we
  fill the gaps as described below under "Expand from there".

* Add an s390 runner using the project machine you mentioned.

* Expand from there: identify the remaining gaps, map them to people /
  organizations interested in them, and solicit contributions from these
  guys.

A note on contributions: we need both hardware and people.  By people I
mean maintainers for the infrastructure, the tools and all the runners.
Cleber & team are willing to serve for the infrastructure, the tools and
the Red Hat runners.

Does this sound workable?

> But to a large degree I really don't want to have to get
> into the details of how gitlab works or setting up runners
> myself if I can avoid it. We're going through this migration
> because I want to be able to hand off the CI stuff to other
> people, not to retain control of it.

Understand.  We need contributions to gating CI, but the whole point of
this exercise is to make people other than *you* contribute to our
gating CI :)

Let me use this opportunity to say thank you for all your integration
work!




Re: [EXTERNAL] [PATCH v2] target/ppc: Fix ISA v3.0 (POWER9) slbia implementation

2020-03-19 Thread Cédric Le Goater
On 3/19/20 7:44 AM, Nicholas Piggin wrote:
> The new ISA v3.0 slbia variants have not been implemented for TCG,
> which can lead to crashing when a POWER9 machine boots Linux using
> the hash MMU, for example ("disable_radix" kernel command line).
> 
> Add them.
> 
> Signed-off-by: Nicholas Piggin 

Reviewed-by: Cédric Le Goater 

> ---
> Changes in v2:
> - Rewrite changelog.
> - Remove stray slbie hunk that crept in
> 
> I don't think the slbie invalidation is necessary, as explained on the
> list.
> 
>  target/ppc/helper.h |  2 +-
>  target/ppc/mmu-hash64.c | 56 +++--
>  target/ppc/translate.c  |  5 +++-
>  3 files changed, 54 insertions(+), 9 deletions(-)
> 
> diff --git a/target/ppc/helper.h b/target/ppc/helper.h
> index ee1498050d..2dfa1c6942 100644
> --- a/target/ppc/helper.h
> +++ b/target/ppc/helper.h
> @@ -615,7 +615,7 @@ DEF_HELPER_FLAGS_3(store_slb, TCG_CALL_NO_RWG, void, env, 
> tl, tl)
>  DEF_HELPER_2(load_slb_esid, tl, env, tl)
>  DEF_HELPER_2(load_slb_vsid, tl, env, tl)
>  DEF_HELPER_2(find_slb_vsid, tl, env, tl)
> -DEF_HELPER_FLAGS_1(slbia, TCG_CALL_NO_RWG, void, env)
> +DEF_HELPER_FLAGS_2(slbia, TCG_CALL_NO_RWG, void, env, i32)
>  DEF_HELPER_FLAGS_2(slbie, TCG_CALL_NO_RWG, void, env, tl)
>  DEF_HELPER_FLAGS_2(slbieg, TCG_CALL_NO_RWG, void, env, tl)
>  #endif
> diff --git a/target/ppc/mmu-hash64.c b/target/ppc/mmu-hash64.c
> index 373d44de74..e5baabf0e1 100644
> --- a/target/ppc/mmu-hash64.c
> +++ b/target/ppc/mmu-hash64.c
> @@ -95,9 +95,10 @@ void dump_slb(PowerPCCPU *cpu)
>  }
>  }
> 
> -void helper_slbia(CPUPPCState *env)
> +void helper_slbia(CPUPPCState *env, uint32_t ih)
>  {
>  PowerPCCPU *cpu = env_archcpu(env);
> +int starting_entry;
>  int n;
> 
>  /*
> @@ -111,18 +112,59 @@ void helper_slbia(CPUPPCState *env)
>   * expected that slbmte is more common than slbia, and slbia is usually
>   * going to evict valid SLB entries, so that tradeoff is unlikely to be a
>   * good one.
> + *
> + * ISA v2.05 introduced IH field with values 0,1,2,6. These all 
> invalidate
> + * the same SLB entries (everything but entry 0), but differ in what
> + * "lookaside information" is invalidated. TCG can ignore this and flush
> + * everything.
> + *
> + * ISA v3.0 introduced additional values 3,4,7, which change what SLBs 
> are
> + * invalidated.
>   */
> 
> -/* XXX: Warning: slbia never invalidates the first segment */
> -for (n = 1; n < cpu->hash64_opts->slb_size; n++) {
> -ppc_slb_t *slb = >slb[n];
> +env->tlb_need_flush |= TLB_NEED_LOCAL_FLUSH;
> +
> +starting_entry = 1; /* default for IH=0,1,2,6 */
> +
> +if (env->mmu_model == POWERPC_MMU_3_00) {
> +switch (ih) {
> +case 0x7:
> +/* invalidate no SLBs, but all lookaside information */
> +return;
> 
> -if (slb->esid & SLB_ESID_V) {
> -slb->esid &= ~SLB_ESID_V;
> +case 0x3:
> +case 0x4:
> +/* also considers SLB entry 0 */
> +starting_entry = 0;
> +break;
> +
> +case 0x5:
> +/* treat undefined values as ih==0, and warn */
> +qemu_log_mask(LOG_GUEST_ERROR,
> +  "slbia undefined IH field %u.\n", ih);
> +break;
> +
> +default:
> +/* 0,1,2,6 */
> +break;
>  }
>  }
> 
> -env->tlb_need_flush |= TLB_NEED_LOCAL_FLUSH;
> +for (n = starting_entry; n < cpu->hash64_opts->slb_size; n++) {
> +ppc_slb_t *slb = >slb[n];
> +
> +if (!(slb->esid & SLB_ESID_V)) {
> +continue;
> +}
> +if (env->mmu_model == POWERPC_MMU_3_00) {
> +if (ih == 0x3 && (slb->vsid & SLB_VSID_C) == 0) {
> +/* preserves entries with a class value of 0 */
> +continue;
> +}
> +}
> +
> +slb->esid &= ~SLB_ESID_V;
> +}
>  }
> 
>  static void __helper_slbie(CPUPPCState *env, target_ulong addr,
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index eb0ddba850..e514732a09 100644
> --- a/target/ppc/translate.c
> +++ b/target/ppc/translate.c
> @@ -5027,12 +5027,15 @@ static void gen_tlbsync(DisasContext *ctx)
>  /* slbia */
>  static void gen_slbia(DisasContext *ctx)
>  {
> +uint32_t ih = (ctx->opcode >> 21) & 0x7;
> +TCGv_i32 t0 = tcg_const_i32(ih);
> +
>  #if defined(CONFIG_USER_ONLY)
>  GEN_PRIV;
>  #else
>  CHK_SV;
> 
> -gen_helper_slbia(cpu_env);
> +gen_helper_slbia(cpu_env, t0);
>  #endif /* defined(CONFIG_USER_ONLY) */
>  }
> 




Re: [PATCH 0/4] linux-user: Fix some issues in termbits.h files

2020-03-19 Thread Aleksandar Markovic
> I think we should first introduce a linux-user/generic/termbits.h as we
> have an asm-generic/termbits.h in the kernel and use it with all the
> targets except alpha, mips, hppa, sparc and xtensa.
>
> I think this linux-user/generic/termbits.h could be copied from
> linux-user/openrisc/termbits.h (without the ioctl definitions)
>
> Then you could update the remaining ones.
>

I agree with you, Laurent, that would be the cleanest
implementation.

However, I think it requires at least several days of meticulous
dev work, that I can't afford at this moment. May I ask you to
accept this series as is for 5.0, as a sort of bridge towards
the implementation you described? It certainly fixes a majority
of termbits-related bugs, many of them remained latent just
by fact that XXX and TARGET_XXX were identical. The most
affected targets, xtensa, mips and alpha should be cleaned up
by this series wrt termbits, and for great majority of issues
are cleaned up for all platforms.

I just don't have enough time resources to additionally
devote to this problem.

Sincerely,
Aleksandar

> Thanks,
> Laurent
>



Re: [PATCH v14 Kernel 4/7] vfio iommu: Implementation of ioctl for dirty pages tracking.

2020-03-19 Thread Alex Williamson
On Thu, 19 Mar 2020 20:22:41 +0530
Kirti Wankhede  wrote:

> On 3/19/2020 9:15 AM, Alex Williamson wrote:
> > On Thu, 19 Mar 2020 01:11:11 +0530
> > Kirti Wankhede  wrote:
> >   
> >> VFIO_IOMMU_DIRTY_PAGES ioctl performs three operations:
> >> - Start dirty pages tracking while migration is active
> >> - Stop dirty pages tracking.
> >> - Get dirty pages bitmap. Its user space application's responsibility to
> >>copy content of dirty pages from source to destination during migration.
> >>
> >> To prevent DoS attack, memory for bitmap is allocated per vfio_dma
> >> structure. Bitmap size is calculated considering smallest supported page
> >> size. Bitmap is allocated for all vfio_dmas when dirty logging is enabled
> >>
> >> Bitmap is populated for already pinned pages when bitmap is allocated for
> >> a vfio_dma with the smallest supported page size. Update bitmap from
> >> pinning functions when tracking is enabled. When user application queries
> >> bitmap, check if requested page size is same as page size used to
> >> populated bitmap. If it is equal, copy bitmap, but if not equal, return
> >> error.
> >>
> >> Signed-off-by: Kirti Wankhede 
> >> Reviewed-by: Neo Jia 
> >> ---
> >>   drivers/vfio/vfio_iommu_type1.c | 205 
> >> +++-
> >>   1 file changed, 203 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/vfio/vfio_iommu_type1.c 
> >> b/drivers/vfio/vfio_iommu_type1.c
> >> index 70aeab921d0f..d6417fb02174 100644
> >> --- a/drivers/vfio/vfio_iommu_type1.c
> >> +++ b/drivers/vfio/vfio_iommu_type1.c
> >> @@ -71,6 +71,7 @@ struct vfio_iommu {
> >>unsigned intdma_avail;
> >>boolv2;
> >>boolnesting;
> >> +  booldirty_page_tracking;
> >>   };
> >>   
> >>   struct vfio_domain {
> >> @@ -91,6 +92,7 @@ struct vfio_dma {
> >>boollock_cap;   /* capable(CAP_IPC_LOCK) */
> >>struct task_struct  *task;
> >>struct rb_root  pfn_list;   /* Ex-user pinned pfn list */
> >> +  unsigned long   *bitmap;  
> > 
> > We've made the bitmap a width invariant u64 else, should be here as
> > well.
> >   
> >>   };
> >>   
> >>   struct vfio_group {
> >> @@ -125,7 +127,10 @@ struct vfio_regions {
> >>   #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)  \
> >>(!list_empty(>domain_list))
> >>   
> >> +#define DIRTY_BITMAP_BYTES(n) (ALIGN(n, BITS_PER_TYPE(u64)) / 
> >> BITS_PER_BYTE)
> >> +
> >>   static int put_pfn(unsigned long pfn, int prot);
> >> +static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu);
> >>   
> >>   /*
> >>* This code handles mapping and unmapping of user data buffers
> >> @@ -175,6 +180,55 @@ static void vfio_unlink_dma(struct vfio_iommu *iommu, 
> >> struct vfio_dma *old)
> >>rb_erase(>node, >dma_list);
> >>   }
> >>   
> >> +static int vfio_dma_bitmap_alloc(struct vfio_iommu *iommu, uint64_t 
> >> pgsize)
> >> +{
> >> +  struct rb_node *n = rb_first(>dma_list);
> >> +
> >> +  for (; n; n = rb_next(n)) {
> >> +  struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
> >> +  struct rb_node *p;
> >> +  unsigned long npages = dma->size / pgsize;
> >> +
> >> +  dma->bitmap = kvzalloc(DIRTY_BITMAP_BYTES(npages), GFP_KERNEL);
> >> +  if (!dma->bitmap) {
> >> +  struct rb_node *p = rb_prev(n);
> >> +
> >> +  for (; p; p = rb_prev(p)) {
> >> +  struct vfio_dma *dma = rb_entry(n,
> >> +  struct vfio_dma, node);
> >> +
> >> +  kfree(dma->bitmap);
> >> +  dma->bitmap = NULL;
> >> +  }
> >> +  return -ENOMEM;
> >> +  }
> >> +
> >> +  if (RB_EMPTY_ROOT(>pfn_list))
> >> +  continue;
> >> +
> >> +  for (p = rb_first(>pfn_list); p; p = rb_next(p)) {
> >> +  struct vfio_pfn *vpfn = rb_entry(p, struct vfio_pfn,
> >> +   node);
> >> +
> >> +  bitmap_set(dma->bitmap,
> >> +  (vpfn->iova - dma->iova) / pgsize, 1);
> >> +  }
> >> +  }
> >> +  return 0;
> >> +}
> >> +
> >> +static void vfio_dma_bitmap_free(struct vfio_iommu *iommu)
> >> +{
> >> +  struct rb_node *n = rb_first(>dma_list);
> >> +
> >> +  for (; n; n = rb_next(n)) {
> >> +  struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
> >> +
> >> +  kfree(dma->bitmap);
> >> +  dma->bitmap = NULL;
> >> +  }
> >> +}
> >> +
> >>   /*
> >>* Helper Functions for host iova-pfn list
> >>*/
> >> @@ -567,6 +621,14 @@ static int vfio_iommu_type1_pin_pages(void 
> >> *iommu_data,
> >>vfio_unpin_page_external(dma, iova, do_accounting);
> >>goto pin_unwind;
> >>}
> >> +
> >> 

Re: [PATCH 1/1] conf: qemu 9pfs: add 'multidevs' option

2020-03-19 Thread Ján Tomko

On a Thursday in 2020, Christian Schoenebeck wrote:

On Donnerstag, 19. März 2020 14:10:26 CET Ján Tomko wrote:

On a Tuesday in 2020, Christian Schoenebeck wrote:
>Introduce new 'multidevs' option for filesystem.
>
>  

I don't like the 'multidevs' name, but cannot think of anything
beter.

'collisions' maybe?


Not sure if 'collisions' is better, e.g. collisions='remap' sounds scary. :)
And which collision would that be? At least IMO 'multidevs' is less ambigious.
I have no problem though to change it to whatever name you might come up with.
Just keep the resulting key-value pair set in mind:

multidevs='default'
multidevs='remap'
multidevs='forbid'
multidevs='warn'

vs.

collisions='default'
collisions='remap' <- probably misleading what 'remap' means in this case
collisions='forbid'
collisions='warn' <- wrong, it warns about multiple devices, not about file ID
collisions.

So different key-name might also require different value-names.

Another option would be the long form 'multi-devices=...'


Okay, let's leave it at 'multidevs'.




>
>
>
>  
>
>This option prevents misbheaviours on guest if a 9pfs export
>contains multiple devices, due to the potential file ID collisions
>this otherwise may cause.
>
>Signed-off-by: Christian Schoenebeck 
>---
>
> docs/formatdomain.html.in | 47 ++-
> docs/schemas/domaincommon.rng | 10 
> src/conf/domain_conf.c| 30 ++
> src/conf/domain_conf.h| 13 ++
> src/qemu/qemu_command.c   |  7 ++
> 5 files changed, 106 insertions(+), 1 deletion(-)

Please split the XML changes from the qemu driver changes.


Ok


Also missing:
* qemu_capabilities addition


AFAICS the common procedure is to add new capabilities always to the end of
the enum list. So I guess I will do that as well.


* qemuDomainDeviceDefValidateFS in qemu_domain.c - check for the capability,
reject this setting for virtiofs


Good to know where that check is supposed to go to, thanks!


* qemuxml2xmltest addition
* qemuxml2argvtest addition


Ok, I have to read up how those tests work. AFAICS I need to add xml files to
their data subdirs.

Separate patches required for those 2 tests?


Usually xml2xmltest is in the same test as the XML parser/formatter
and xml2argvtest is a part of the qemu driver patch.

Jano


signature.asc
Description: PGP signature


Re: [PATCH v1 1/1] target/riscv: Don't set write permissions on dirty PTEs

2020-03-19 Thread Alistair Francis
On Wed, Mar 18, 2020 at 9:52 PM Palmer Dabbelt  wrote:
>
> On Tue, 03 Mar 2020 17:16:59 PST (-0800), Alistair Francis wrote:
> > The RISC-V spec specifies that when a write happens and the D bit is
> > clear the implementation will set the bit in the PTE. It does not
> > describe that the PTE being dirty means that we should provide write
> > access. This patch removes the write access granted to pages when the
> > dirty bit is set.
> >
> > Following the prot variable we can see that it affects all of these
> > functions:
> >  riscv_cpu_tlb_fill()
> >tlb_set_page()
> >  tlb_set_page_with_attrs()
> >address_space_translate_for_iotlb()
> >
> > Looking at the cputlb code (tlb_set_page_with_attrs() and
> > address_space_translate_for_iotlb()) it looks like the main affect of
> > setting write permissions is that the page can be marked as TLB_NOTDIRTY.
> >
> > I don't see any other impacts (related to the dirty bit) for giving a
> > page write permissions.
> >
> > Setting write permission on dirty PTEs results in userspace inside a
> > Hypervisor guest (VU) becoming corrupted. This appears to be because it
> > ends up with write permission in the second stage translation in cases
> > where we aren't doing a store.
> >
> > Signed-off-by: Alistair Francis 
> > Reviewed-by: Bin Meng 
> > ---
> >  target/riscv/cpu_helper.c | 6 ++
> >  1 file changed, 2 insertions(+), 4 deletions(-)
> >
> > diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
> > index 5ea5d133aa..cc9f20b471 100644
> > --- a/target/riscv/cpu_helper.c
> > +++ b/target/riscv/cpu_helper.c
> > @@ -572,10 +572,8 @@ restart:
> >  if ((pte & PTE_X)) {
> >  *prot |= PAGE_EXEC;
> >  }
> > -/* add write permission on stores or if the page is already 
> > dirty,
> > -   so that we TLB miss on later writes to update the dirty bit 
> > */
> > -if ((pte & PTE_W) &&
> > -(access_type == MMU_DATA_STORE || (pte & PTE_D))) {
> > +/* add write permission on stores */
> > +if ((pte & PTE_W) && (access_type == MMU_DATA_STORE)) {
> >  *prot |= PAGE_WRITE;
> >  }
> >  return TRANSLATE_SUCCESS;
>
> I remember having seen this patch before and having some objections, but I 
> feel
> like I mistakenly had this backwards before or something because it makes 
> sense
> now.

Ha, we have come full circle because I think this is wrong.

This is an optimisation which from what I can tell (and talking to
Richard) is correct.

In saying that this patch is the only thing that I have found that
fixes Hypervisor guest userspace. It shouldn't be applied though.

Alistair

>
> Thanks!



Re: [PATCH 1/1] conf: qemu 9pfs: add 'multidevs' option

2020-03-19 Thread Daniel P . Berrangé
On Thu, Mar 19, 2020 at 04:57:41PM +0100, Christian Schoenebeck wrote:
> On Donnerstag, 19. März 2020 14:10:26 CET Ján Tomko wrote:
> > On a Tuesday in 2020, Christian Schoenebeck wrote:
> > >Introduce new 'multidevs' option for filesystem.
> > >
> > >  
> > 
> > I don't like the 'multidevs' name, but cannot think of anything
> > beter.
> > 
> > 'collisions' maybe?
> 
> Not sure if 'collisions' is better, e.g. collisions='remap' sounds scary. :)
> And which collision would that be? At least IMO 'multidevs' is less ambigious.
> I have no problem though to change it to whatever name you might come up 
> with. 
> Just keep the resulting key-value pair set in mind:
> 
> multidevs='default'
> multidevs='remap'
> multidevs='forbid'
> multidevs='warn'
> 
> vs.
> 
> collisions='default'
> collisions='remap' <- probably misleading what 'remap' means in this case
> collisions='forbid'
> collisions='warn' <- wrong, it warns about multiple devices, not about file 
> ID 
> collisions.
> 
> So different key-name might also require different value-names.
> 
> Another option would be the long form 'multi-devices=...'

I tried to come up with names when this was posted to QEMU, but didn't
think of much better than multidevs, so I think that's acceptable for
libvirt usage.

"collisions" isn't better enough to justify different naming from QEMU


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH 1/1] conf: qemu 9pfs: add 'multidevs' option

2020-03-19 Thread Christian Schoenebeck
On Donnerstag, 19. März 2020 14:10:26 CET Ján Tomko wrote:
> On a Tuesday in 2020, Christian Schoenebeck wrote:
> >Introduce new 'multidevs' option for filesystem.
> >
> >  
> 
> I don't like the 'multidevs' name, but cannot think of anything
> beter.
> 
> 'collisions' maybe?

Not sure if 'collisions' is better, e.g. collisions='remap' sounds scary. :)
And which collision would that be? At least IMO 'multidevs' is less ambigious.
I have no problem though to change it to whatever name you might come up with. 
Just keep the resulting key-value pair set in mind:

multidevs='default'
multidevs='remap'
multidevs='forbid'
multidevs='warn'

vs.

collisions='default'
collisions='remap' <- probably misleading what 'remap' means in this case
collisions='forbid'
collisions='warn' <- wrong, it warns about multiple devices, not about file ID 
collisions.

So different key-name might also require different value-names.

Another option would be the long form 'multi-devices=...'

> >
> >
> >  
> >  
> >
> >This option prevents misbheaviours on guest if a 9pfs export
> >contains multiple devices, due to the potential file ID collisions
> >this otherwise may cause.
> >
> >Signed-off-by: Christian Schoenebeck 
> >---
> >
> > docs/formatdomain.html.in | 47 ++-
> > docs/schemas/domaincommon.rng | 10 
> > src/conf/domain_conf.c| 30 ++
> > src/conf/domain_conf.h| 13 ++
> > src/qemu/qemu_command.c   |  7 ++
> > 5 files changed, 106 insertions(+), 1 deletion(-)
> 
> Please split the XML changes from the qemu driver changes.

Ok

> Also missing:
> * qemu_capabilities addition

AFAICS the common procedure is to add new capabilities always to the end of 
the enum list. So I guess I will do that as well.

> * qemuDomainDeviceDefValidateFS in qemu_domain.c - check for the capability,
> reject this setting for virtiofs

Good to know where that check is supposed to go to, thanks!

> * qemuxml2xmltest addition
> * qemuxml2argvtest addition

Ok, I have to read up how those tests work. AFAICS I need to add xml files to 
their data subdirs.

Separate patches required for those 2 tests?

> (no changes required for virschematest - it checks all the XML files in
> the directories used by the above tests against the schema)
> 
> >diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
> >index 594146009d..13c506988b 100644
> >--- a/docs/formatdomain.html.in
> >+++ b/docs/formatdomain.html.in
> >@@ -3967,7 +3967,7 @@
> >
> > source name='my-vm-template'/
> > target dir='/'/
> >   
> >   /filesystem
> >
> >-  filesystem type='mount' accessmode='passthrough'
> >+  filesystem type='mount' accessmode='passthrough'
> >multidevs='remap'>
> > driver type='path' wrpolicy='immediate'/
> > source dir='/export/to/guest'/
> > target dir='/import/from/host'/
> >
> >@@ -4084,13 +4084,58 @@
> >
> > 
> > 
> >
> >+  
> >
> >   Since 5.2.0, the filesystem element
> >   has an optional attribute model with supported values
> >   "virtio-transitional", "virtio-non-transitional", or "virtio".
> >   See Virtio transitional
> >   devices
> >   for more details.
> >
> >+  
> >+
> 
> Unrelated change that can be split out.

Ok, I'll make that the 1st preparatory patch then including ...

> >+  
> >+  The filesystem element has an optional attribute
> >multidevs +  which specifies how to deal with a
> >filesystem export containing more than +  one device, in order to
> >avoid file ID collisions on guest when using 9pfs +  ( >class="since">since 6.2.0, requires QEMU 4.2). +  This
> >attribute is not available for virtiofs. The possible values are: + 
> >
> >+
> >+
> >+default
> >+
> >+Use QEMU's default setting (which currently is warn).
> >+
> >+remap
> >+
> >+This setting allows guest to access multiple devices per export
> >without +encountering misbehaviours. Inode numbers from host are
> >automatically +remapped on guest to actively prevent file ID
> >collisions if guest +accesses one export containing multiple
> >devices.
> >+
> >+forbid
> >+
> >+Only allow to access one device per export by guest. Attempts to
> >access +additional devices on the same export will cause the
> >individual +filesystem access by guest to fail with an error and
> >being logged (once) +as error on host side.
> >+
> >+warn
> >+
> >+This setting resembles the behaviour of 9pfs prior to QEMU 4.2,
> >that is +no action is performed to prevent any potential file ID
> >collisions if an +export contains multiple devices, with the only
> >exception: a warning is +logged (once) on host side now. This
> >setting may lead to misbehaviours +on guest side if more than one
> >device is 

Re: [PATCH v5 07/18] s390x: protvirt: Inhibit balloon when switching to protected mode

2020-03-19 Thread Halil Pasic
On Thu, 19 Mar 2020 14:54:11 +0100
David Hildenbrand  wrote:

> On 27.02.20 13:24, Halil Pasic wrote:
> > On Wed, 26 Feb 2020 16:11:03 +0100
> > Janosch Frank  wrote:
> > 
> >> On 2/26/20 3:59 PM, David Hildenbrand wrote:
> >>> On 26.02.20 13:20, Janosch Frank wrote:
>  Ballooning in protected VMs can only be done when the guest shares the
>  pages it gives to the host. Hence, until we have a solution for this
>  in the guest kernel, we inhibit ballooning when switching into
>  protected mode and reverse that once we move out of it.
> >>>
> >>> I don't understand what you mean here, sorry. zapping a page will mean
> >>> that a fresh one will be faulted in when accessed. And AFAIK, that means
> >>> it will be encrypted again when needed.
> >>
> >> Yes, as soon as the host alters non-shared memory we'll run into
> >> integrity issues.
> >>
> >>
> >> I've been talking to Halil after I sent this out and it looks like we'll
> >> rather try to automatically enable the IOMMU for all devices when
> >> switching into protected mode. He said that if the IOMMU is set the
> >> balloon code will do an early exit on feature negotiation.
> >>
> > 
> > I have a discussion starter RFC for you.
> > 
> > --8<--
> > From: Halil Pasic 
> > Date: Wed, 26 Feb 2020 16:48:21 +0100
> > Subject: [RFC PATCH 1/1] virtio-ccw: auto-manage VIRTIO_F_IOMMU_PLATFORM
> > 
> > The virtio specification tells that the device is to present
> > VIRTIO_F_ACCESS_PLATFORM (a.k.a. VIRTIO_F_IOMMU_PLATFORM) when the
> > device "can only access certain memory addresses with said access
> > specified and/or granted by the platform". This is the case for a
> > protected VM, as the device can access only memory addresses that are in
> > pages that are currently shared (only the guest can share/unsare its
> > page).
> > 
> > No VM however starts out as a protected VM, but some VMs may be
> > converted to protected VMs if the guest decides so.
> > 
> > Making the end user explicitly manage the VIRTIO_F_ACCESS_PLATFORM via
> > the property iommu_on is a minor disaster. If the correctness of the
> > paravirtualized virtio devices depends (and thus in a sense the
> > correctness of the hypervisor), then the hypervisor should have the last
> > word about whether VIRTIO_F_ACCESS_PLATFORM is to be presented or not.
> > 
> > Let's manage the VIRTIO_F_ACCESS_PLATFORM virtio feature automatically
> > for virtio-ccw devices, so that we set it before we start the protected
> > configuration, and unset it when our VM is not protected.
> > 
> > Signed-off-by: Halil Pasic 
> > ---
> > NOTES:
> > * I wanted to have a discussion starter fast, there are multiple open
> > questions.
> > 
> > * Doing more than one system_resets() is hackish.  We
> > should look into this.
> > 
> > * The user interface implications of this patch are also an ugly can of
> > worms. Needs to be discussed.
> > 
> > * We should consider keeping the original value if !pv and restoring it
> > on pv --> !pv, instead of forcing to unset when leaving pv, and actually
> > not forcing unset when !pv.
> > 
> > * It might make sense to do something like this in virtio core, but I
> >   decided start the discussion with a ccw-only change.
> > 
> > * Maybe we need a machine option that enables this sort of behavior,
> > especially if we decide not to do the conserving/restoring.
> > 
> > * Lightly tested.
> > ---
> >  hw/s390x/s390-virtio-ccw.c |  4 ++--
> >  hw/s390x/virtio-ccw.c  | 13 +
> >  2 files changed, 15 insertions(+), 2 deletions(-)
> > 
> > diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
> > index 0f4455d1df..996124f152 100644
> > --- a/hw/s390x/s390-virtio-ccw.c
> > +++ b/hw/s390x/s390-virtio-ccw.c
> > @@ -337,7 +337,7 @@ static void s390_machine_unprotect(S390CcwMachineState 
> > *ms)
> >  ms->pv = false;
> >  }
> >  migrate_del_blocker(pv_mig_blocker);
> > -qemu_balloon_inhibit(false);
> > +subsystem_reset();
> >  }
> >  
> >  static int s390_machine_protect(S390CcwMachineState *ms)
> > @@ -346,7 +346,6 @@ static int s390_machine_protect(S390CcwMachineState *ms)
> >  CPUState *t;
> >  int rc;
> >  
> > -qemu_balloon_inhibit(true);
> >  if (!pv_mig_blocker) {
> >  error_setg(_mig_blocker,
> > "protected VMs are currently not migrateable.");
> > @@ -384,6 +383,7 @@ static int s390_machine_protect(S390CcwMachineState *ms)
> >  if (rc) {
> >  goto out_err;
> >  }
> > +subsystem_reset();
> >  return rc;
> >  
> >  out_err:
> > diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
> > index 13f57e7b67..48bb54f68e 100644
> > --- a/hw/s390x/virtio-ccw.c
> > +++ b/hw/s390x/virtio-ccw.c
> > @@ -869,12 +869,24 @@ static void virtio_ccw_notify(DeviceState *d, 
> > uint16_t vector)
> >  }
> >  }
> >  
> > +static inline void virtio_ccw_pv_enforce_features(VirtIODevice *vdev)
> > +{
> > +

Re: [PULL 00/13] x86 and machine queue for 5.0 soft freeze

2020-03-19 Thread Peter Maydell
On Wed, 18 Mar 2020 at 01:17, Eduardo Habkost  wrote:
>
> The following changes since commit d649689a8ecb2e276cc20d3af6d416e3c299cb17:
>
>   Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into 
> staging (2020-03-17 18:33:05 +)
>
> are available in the Git repository at:
>
>   git://github.com/ehabkost/qemu.git tags/x86-and-machine-pull-request
>
> for you to fetch changes up to 3c6712eca07255803b61ca3d632f61a65c078c36:
>
>   hw/i386: Rename apicid_from_topo_ids to x86_apicid_from_topo_ids 
> (2020-03-17 19:48:10 -0400)
>
> 
> x86 and machine queue for 5.0 soft freeze
>
> Bug fixes:
> * memory encryption: Disable mem merge
>   (Dr. David Alan Gilbert)
>
> Features:
> * New EPYC CPU definitions (Babu Moger)
> * Denventon-v2 CPU model (Tao Xu)
> * New 'note' field on versioned CPU models (Tao Xu)
>
> Cleanups:
> * x86 CPU topology cleanups (Babu Moger)
> * cpu: Use DeviceClass reset instead of a special CPUClass reset
>   (Peter Maydell)


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/5.0
for any user-visible changes.

-- PMM



Re: [PATCH] ext4: Give 32bit personalities 32bit hashes

2020-03-19 Thread Peter Maydell
On Thu, 19 Mar 2020 at 15:13, Linus Walleij  wrote:
> On Tue, Mar 17, 2020 at 12:58 PM Peter Maydell  
> wrote:
> > What in particular does this personality setting affect?
> > My copy of the personality(2) manpage just says:
> >
> >PER_LINUX32 (since Linux 2.2)
> >   [To be documented.]
> >
> > which isn't very informative.
>
> It's not a POSIX thing (not part of the Single Unix Specification)
> so as with most Linux things it has some fuzzy semantics
> defined by the community...
>
> I usually just go to the source.

If we're going to decide that this is the way to say
"give me 32-bit semantics" we need to actually document
that and define in at least broad terms what we mean
by it, so that when new things are added that might or
might not check against the setting there is a reference
defining whether they should or not, and so that
userspace knows what it's opting into by setting the flag.
The kernel loves undocumented APIs but userspace
consumers of them are not so enamoured :-)

As a concrete example, should "give me 32-bit semantics
via PER_LINUX32" mean "mmap should always return addresses
within 4GB" ? That would seem like it would make sense --
but on the other hand it would make it absolutely unusable
for QEMU's purposes, because we want to be able to
do full 64-bit mmap() for our own internal allocations.
(This is a specific example of the general case that
I'm dubious about having this be a process-wide switch,
because QEMU is a single process which sometimes
makes syscalls on its own behalf and sometimes makes
syscalls on behalf of the guest program it is emulating.
We want 32-bit semantics for the latter but if we
also get them for the former then there might be
unintended side effects.)

> I would not be surprised if running say i586 on x86_64
> has the same problem, just that noone has reported
> it yet. But what do I know. If they have the same problem
> they can use the same solution. Hm QEMU supports
> emulating i586 or even i386 ... maybe you could test?

Native i586 code on x86-64 should be fine, because it
will be using the compat syscalls, which ext4 already
ensures get the 32-bit sized hash (see hash2pos() and
friends).

thanks
-- PMM



Re: [PATCH] ext4: Give 32bit personalities 32bit hashes

2020-03-19 Thread Linus Walleij
On Tue, Mar 17, 2020 at 11:29 PM Andreas Dilger  wrote:

> That said, I'd think it would be preferable for ease of use and
> compatibility that applications didn't have to be modified
> (e.g. have QEMU or glibc internally set PER_LINUX32 for this
> process before the 32-bit syscall is called, given that it knows
> whether it is emulating a 32-bit runtime or not).

I think the plan is that QEMU set this, either directly when
you run qemu-user or through the binfmt handler.
https://www.kernel.org/doc/html/latest/admin-guide/binfmt-misc.html

IIUC the binfmt handler is invoked when you try to
execute ELF so-and-so-for-arch-so-and-so when you
are not that arch yourself. So that can set up this
personality.

Not that I know how the binfmt handler works, I am just
assuming that thing is some program that can issue
syscalls. It JustWorks for me after installing the QEMU
packages...

The problem still stands that userspace need to somehow
inform kernelspace that 32bit is going on, and this was the
best I could think of.

Yours,
Linus Walleij



Re: [PATCH] ext4: Give 32bit personalities 32bit hashes

2020-03-19 Thread Linus Walleij
On Tue, Mar 17, 2020 at 12:58 PM Peter Maydell  wrote:
> On Tue, 17 Mar 2020 at 11:31, Linus Walleij  wrote:
> >
> > It was brought to my attention that this bug from 2018 was
> > still unresolved: 32 bit emulators like QEMU were given
> > 64 bit hashes when running 32 bit emulation on 64 bit systems.
> >
> > The personality(2) system call supports to let processes
> > indicate that they are 32 bit Linux to the kernel. This
> > was suggested by Teo in the original thread, so I just wired
> > it up and it solves the problem.
>
> Thanks for having a look at this. I'm not sure this is what
> QEMU needs, though. When QEMU runs, it is not a 32-bit
> process, it's a 64-bit process. Some of the syscalls
> it makes are on behalf of the guest and would need 32-bit
> semantics (including this one of wanting 32-bit hash sizes
> in directory reads). But some syscalls it makes for itself
> (either directly, or via libraries it's linked against
> including glibc and glib) -- those would still want the
> usual 64-bit semantics, I would have thought.

The "personality" thing is a largely unused facility that
a process can use to say it has this generic behaviour.
In this case we say we have the PER_LINUX32 personality
so it will make the process evoke 32bit behaviours inside
the kernel when dealing with this process.

Which I (loosely) appreciate that we want to achieve.

> > Programs that need the 32 bit hash only need to issue the
> > personality(PER_LINUX32) call and things start working.
>
> What in particular does this personality setting affect?
> My copy of the personality(2) manpage just says:
>
>PER_LINUX32 (since Linux 2.2)
>   [To be documented.]
>
> which isn't very informative.

It's not a POSIX thing (not part of the Single Unix Specification)
so as with most Linux things it has some fuzzy semantics
defined by the community...

I usually just go to the source.

If you grep the kernel what turns up is a bunch of
architecture-specific behaviors on arm64, ia64, mips, parisc,
powerpc, s390, sparc. They are very minor.

On x86_64 the semantic effect is
none so this would work for any kind of 32bit userspace
on x86_64. It would be the first time this flag has any
effect at all on x86_64, but arguably a useful one.

I would not be surprised if running say i586 on x86_64
has the same problem, just that noone has reported
it yet. But what do I know. If they have the same problem
they can use the same solution. Hm QEMU supports
emulating i586 or even i386 ... maybe you could test?
Or tell me how to test? We might be solving a bigger
issue here.

Yours,
Linus Walleij



  1   2   3   >