date:20191120

Re: [PATCH for-5.0 v5 11/23] ppc/pnv: Introduce a pnv_xive_is_cpu_enabled() helper

2019-11-20 Thread Greg Kurz

On Wed, 20 Nov 2019 22:40:31 +0100
Cédric Le Goater  wrote:

> On 20/11/2019 18:26, Greg Kurz wrote:
> > On Fri, 15 Nov 2019 17:24:24 +0100
> > Cédric Le Goater  wrote:
> > 
> >> and use this helper to exclude CPUs which are not enabled in the XIVE
> >> controller.
> >>
> >> Signed-off-by: Cédric Le Goater 
> >> ---
> >>  hw/intc/pnv_xive.c | 18 ++
> >>  1 file changed, 18 insertions(+)
> >>
> >> diff --git a/hw/intc/pnv_xive.c b/hw/intc/pnv_xive.c
> >> index 71ca4961b6b1..4c8c6e51c20f 100644
> >> --- a/hw/intc/pnv_xive.c
> >> +++ b/hw/intc/pnv_xive.c
> >> @@ -372,6 +372,20 @@ static int pnv_xive_get_eas(XiveRouter *xrtr, uint8_t blk, uint32_t idx,
> >>  return pnv_xive_vst_read(xive, VST_TSEL_IVT, blk, idx, eas);
> >>  }
> >>  
> >> +static int cpu_pir(PowerPCCPU *cpu)
> >> +{
> >> +CPUPPCState *env = &cpu->env;
> >> +return env->spr_cb[SPR_PIR].default_value;
> >> +}
> >> +
> >> +static bool pnv_xive_is_cpu_enabled(PnvXive *xive, PowerPCCPU *cpu)
> >> +{
> >> +int pir = cpu_pir(cpu);
> >> +int thrd_id = pir & 0x7f;
> >> +
> >> +return xive->regs[PC_THREAD_EN_REG0 >> 3] & PPC_BIT(thrd_id);
> > 
> > A similar check is open-coded in pnv_xive_get_indirect_tctx() :
> > 
> > /* Check that HW thread is XIVE enabled */
> > if (!(xive->regs[PC_THREAD_EN_REG0 >> 3] & PPC_BIT(pir & 0x3f))) {
> > xive_error(xive, "IC: CPU %x is not enabled", pir);
> > }
> > 
> > The thread id is only the 6 lower bits of the PIR there, which also seems
> > to be what the skiboot sources indicate:
> > 
> > /* Get bit in register */
> > bit = c->pir & 0x3f;
> 
> skiboot uses 0x3f when enabling the TCTXT of a CPU because register
> INT_TCTXT_EN0 covers cores 0-15 (normal) and 0-7 (fused) and 
> register INT_TCTXT_EN1 covers cores 16-23 (normal) and 8-11 (fused). 
> The encoding in the registers is a bit different.
> 
> > Why make it pir & 0x7f here ? 
> 
> See pnv_chip_core_pir_p9 comments for some details on the CPU ID 
> layout.
> 

*   57:61  Core number
*   62:63  Thread ID

OK, so the CPU ID within the socket is 7 bits, i.e. pir & 0x7f.
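A minimal sketch of that decoding, assuming the low seven bits of a POWER9 PIR hold the core number (IBM bits 57:61) and the thread ID (IBM bits 62:63), as quoted from the pnv_chip_core_pir_p9 comments. The helpers pir_make(), pir_core(), pir_thread() and pir_cpu_id() are hypothetical, not QEMU functions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of the low 7 bits of a POWER9 PIR (IBM bit numbering):
 *   57:61  core number  -> LSB-0 bits 6..2
 *   62:63  thread ID    -> LSB-0 bits 1..0 */
static inline uint32_t pir_make(unsigned core, unsigned thread)
{
    assert(core < 32 && thread < 4);
    return (core << 2) | thread;
}

static inline unsigned pir_core(uint32_t pir)   { return (pir >> 2) & 0x1f; }
static inline unsigned pir_thread(uint32_t pir) { return pir & 0x3; }

/* CPU ID within the socket: core and thread together, i.e. pir & 0x7f.
 * Higher PIR bits (chip ID) are masked off. */
static inline unsigned pir_cpu_id(uint32_t pir) { return pir & 0x7f; }
```

For example, core 23, thread 2 gives a CPU ID of 0x5e, which needs all seven bits; a 0x3f mask would fold it onto 0x1e.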

> > If it should actually be 0x3f, 
> but yes, we should fix the mask in the register setting. 
> 
> > maybe also use the helper in pnv_xive_get_indirect_tctx().
> 
> This is getting changed later on, so I'd rather not.
> 

I don't see any later change there, either in this series or in
your powernv-4.2 on GitHub, but never mind, this patch is
good enough for the purpose of CAM line matching.

Reviewed-by: Greg Kurz 

> C.
> 
> > 
> >> +}
> >> +
> >>  static int pnv_xive_match_nvt(XivePresenter *xptr, uint8_t format,
> >>uint8_t nvt_blk, uint32_t nvt_idx,
> >>bool cam_ignore, uint8_t priority,
> >> @@ -393,6 +407,10 @@ static int pnv_xive_match_nvt(XivePresenter *xptr, uint8_t format,
> >>  XiveTCTX *tctx;
> >>  int ring;
> >>  
> >> +if (!pnv_xive_is_cpu_enabled(xive, cpu)) {
> >> +continue;
> >> +}
> >> +
> >>  tctx = XIVE_TCTX(pnv_cpu_state(cpu)->intc);
> >>  
> >>  /*
> > 
>
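The mask question matters beyond picking the right bit: PPC_BIT() shifts a 64-bit constant, so PPC_BIT(pir & 0x7f) would shift by 64 or more for CPU IDs 64-127, which is undefined behaviour in C. Below is a sketch, under stated assumptions, of one way to reconcile both views: keep the 7-bit CPU ID, use its top bit to select the thread-enable register and its low six bits (pir & 0x3f) as the bit within that register. The two-register model, the XiveThreadEnable type and thread_is_enabled() are hypothetical; only the PPC_BIT() definition mirrors QEMU's, and fused-core layouts are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IBM (MSB-0) bit numbering, as in QEMU's PPC_BIT(): bit 0 is the MSB.
 * Only valid for bit < 64. */
#define PPC_BIT(bit) (0x8000000000000000ULL >> (bit))

/* Hypothetical model of two 64-bit thread-enable registers:
 * thread_en[0] covers CPU IDs 0-63 (normal cores 0-15),
 * thread_en[1] covers CPU IDs 64-127 (normal cores 16-23 use 64-95). */
typedef struct {
    uint64_t thread_en[2];
} XiveThreadEnable;

static bool thread_is_enabled(const XiveThreadEnable *te, uint32_t pir)
{
    uint32_t cpu_id = pir & 0x7f;   /* 7-bit CPU ID within the socket */
    uint32_t reg = cpu_id >> 6;     /* which register: 0 or 1 */
    uint32_t bit = cpu_id & 0x3f;   /* bit inside that register, always < 64 */

    assert(reg < 2);
    return (te->thread_en[reg] & PPC_BIT(bit)) != 0;
}
```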

Re: [PATCH for-5.0 v5 15/23] ppc/xive: Use the XiveFabric and XivePresenter interfaces

2019-11-20 Thread Cédric Le Goater

On 21/11/2019 08:30, Greg Kurz wrote:
> On Thu, 21 Nov 2019 08:01:44 +0100
> Cédric Le Goater  wrote:
> 
>> On 20/11/2019 19:30, Greg Kurz wrote:
>>> On Fri, 15 Nov 2019 17:24:28 +0100
>>> Cédric Le Goater  wrote:
>>>
 Now that the machines have handlers implementing the XiveFabric and
 XivePresenter interfaces, remove xive_presenter_match() and make use
 of the 'match_nvt' handler of the machine.

 Signed-off-by: Cédric Le Goater 
 ---
  hw/intc/xive.c | 48 +---
  1 file changed, 17 insertions(+), 31 deletions(-)

>>>
>>> Nice diffstat :)
>>>
 diff --git a/hw/intc/xive.c b/hw/intc/xive.c
 index 1c9e58f8deac..ab62bda85788 100644
 --- a/hw/intc/xive.c
 +++ b/hw/intc/xive.c
 @@ -1423,30 +1423,6 @@ int xive_presenter_tctx_match(XivePresenter *xptr, XiveTCTX *tctx,
  return -1;
  }
  
 -static bool xive_presenter_match(XiveRouter *xrtr, uint8_t format,
 - uint8_t nvt_blk, uint32_t nvt_idx,
 - bool cam_ignore, uint8_t priority,
 - uint32_t logic_serv, XiveTCTXMatch *match)
 -{
 -XivePresenter *xptr = XIVE_PRESENTER(xrtr);
 -XivePresenterClass *xpc = XIVE_PRESENTER_GET_CLASS(xptr);
 -int count;
 -
 -count = xpc->match_nvt(xptr, format, nvt_blk, nvt_idx, cam_ignore,
 -   priority, logic_serv, match);
 -if (count < 0) {
 -return false;
 -}
 -
 -if (!match->tctx) {
 -qemu_log_mask(LOG_UNIMP, "XIVE: NVT %x/%x is not dispatched\n",
 -  nvt_blk, nvt_idx);
>>>
>>> Maybe keep this trace...
>>
>> It's in spapr_xive_match_nvt() now.
>>
> 
> Not really... spapr_xive_match_nvt() has a trace for the opposite case of
> duplicate matches:

Not that one. The one in spapr.c... Yes, I need to change the name.

C.

> 
> if (match->tctx) {
> qemu_log_mask(LOG_GUEST_ERROR, "XIVE: already found a thread "
>   "context NVT %x/%x\n", nvt_blk, nvt_idx);
> return -1;
> }
> 
>>>
 -return false;
 -}
 -
 -return true;
 -}
 -
  /*
   * This is our simple Xive Presenter Engine model. It is merged in the
   * Router as it does not require an extra object.
 @@ -1462,22 +1438,32 @@ static bool xive_presenter_match(XiveRouter *xrtr, uint8_t format,
   *
   * The parameters represent what is sent on the PowerBus
   */
 -static bool xive_presenter_notify(XiveRouter *xrtr, uint8_t format,
 +static bool xive_presenter_notify(uint8_t format,
uint8_t nvt_blk, uint32_t nvt_idx,
bool cam_ignore, uint8_t priority,
uint32_t logic_serv)
  {
 +XiveFabric *xfb = XIVE_FABRIC(qdev_get_machine());
 +XiveFabricClass *xfc = XIVE_FABRIC_GET_CLASS(xfb);
  XiveTCTXMatch match = { .tctx = NULL, .ring = 0 };
 -bool found;
 +int count;
  
 -found = xive_presenter_match(xrtr, format, nvt_blk, nvt_idx, cam_ignore,
 - priority, logic_serv, &match);
 -if (found) {
 +/*
 + * Ask the machine to scan the interrupt controllers for a match
 + */
 +count = xfc->match_nvt(xfb, format, nvt_blk, nvt_idx, cam_ignore,
 +   priority, logic_serv, &match);
 +if (count < 0) {
 +return false;
 +}
 +
 +/* handle CPU exception delivery */
 +if (count) {
  ipb_update(&match.tctx->regs[match.ring], priority);
  xive_tctx_notify(match.tctx, match.ring);
  }
>>>
>>> ... in an else block here ^^ ?
>>>
  
 -return found;
 +return count;
>>>
>>> Implicit cast is ok I guess, but !!count would ensure no paranoid
>>> compiler ever complains.
>>
>> yes. 
>>
>> Thanks,
>>
>> C.
>>
>>
>>>
  }
  
  /*
 @@ -1590,7 +1576,7 @@ static void xive_router_end_notify(XiveRouter *xrtr, uint8_t end_blk,
  return;
  }
  
 -found = xive_presenter_notify(xrtr, format, nvt_blk, nvt_idx,
 +found = xive_presenter_notify(format, nvt_blk, nvt_idx,
xive_get_field32(END_W7_F0_IGNORE, end.w7),
priority,
   xive_get_field32(END_W7_F1_LOG_SERVER_ID, end.w7));
>>>
>>
>
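A self-contained model of the return convention used by the new match_nvt handlers (negative on error, otherwise the number of matching thread contexts), with the "not dispatched" trace placed in an else branch as Greg suggests and the explicit !!count conversion. ToyMatch, XiveFabricOps, ToyBackend and toy_match_nvt() are hypothetical stand-ins for the QOM interface, not QEMU code; in the real delivery path the match branch calls ipb_update() and xive_tctx_notify().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for XiveTCTXMatch */
typedef struct {
    void *tctx;     /* matching thread context, NULL if none */
    uint8_t ring;
} ToyMatch;

/* Hypothetical match_nvt handler type:
 * returns < 0 on error, otherwise the number of matching contexts. */
typedef int (*MatchNvtFn)(void *opaque, uint8_t nvt_blk, uint32_t nvt_idx,
                          ToyMatch *match);

typedef struct {
    MatchNvtFn match_nvt;
    void *opaque;
} XiveFabricOps;

/* Toy backend: one dispatched NVT; 'duplicate' simulates a second match */
typedef struct {
    uint32_t dispatched_idx;
    bool duplicate;
} ToyBackend;

static int toy_match_nvt(void *opaque, uint8_t nvt_blk, uint32_t nvt_idx,
                         ToyMatch *match)
{
    ToyBackend *b = opaque;

    (void)nvt_blk;
    if (nvt_idx != b->dispatched_idx) {
        return 0;
    }
    if (b->duplicate) {
        return -1;  /* like "already found a thread context" */
    }
    match->tctx = b;
    return 1;
}

static bool presenter_notify(const XiveFabricOps *fab, uint8_t nvt_blk,
                             uint32_t nvt_idx)
{
    ToyMatch match = { .tctx = NULL, .ring = 0 };
    int count = fab->match_nvt(fab->opaque, nvt_blk, nvt_idx, &match);

    if (count < 0) {
        return false;
    }
    if (count) {
        /* CPU exception delivery on match.tctx / match.ring goes here */
        assert(match.tctx != NULL);
    } else {
        fprintf(stderr, "XIVE: NVT %x/%x is not dispatched\n",
                (unsigned)nvt_blk, (unsigned)nvt_idx);
    }
    return !!count;  /* explicit int -> bool conversion */
}
```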

Re: [PATCH for-5.0 v5 14/23] ppc/spapr: Implement the XiveFabric interface

2019-11-20 Thread Cédric Le Goater

On 21/11/2019 08:24, Greg Kurz wrote:
> On Thu, 21 Nov 2019 07:56:32 +0100
> Cédric Le Goater  wrote:
> 
>> On 20/11/2019 18:53, Greg Kurz wrote:
>>> On Fri, 15 Nov 2019 17:24:27 +0100
>>> Cédric Le Goater  wrote:
>>>
 The CAM line matching sequence in the pseries machine does not change
 much apart from the use of the new QOM interfaces. There is an extra
 indirection because of the sPAPR IRQ backend of the machine. Only the
 XIVE backend implements the new 'match_nvt' handler.

>>>
>>> The changelog needs an update since you dropped the indirection you had
>>> in v4.
>>
>> Indeed.
>>
>>>
 Signed-off-by: Cédric Le Goater 
 ---
  hw/ppc/spapr.c | 36 
  1 file changed, 36 insertions(+)

 diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
 index 94f9d27096af..a8f5850f65bb 100644
 --- a/hw/ppc/spapr.c
 +++ b/hw/ppc/spapr.c
 @@ -4270,6 +4270,39 @@ static void spapr_pic_print_info(InterruptStatsProvider *obj,
 kvm_irqchip_in_kernel() ? "in-kernel" : "emulated");
  }
  
 +static int spapr_xive_match_nvt(XiveFabric *xfb, uint8_t format,
 +uint8_t nvt_blk, uint32_t nvt_idx,
 +bool cam_ignore, uint8_t priority,
 +uint32_t logic_serv, XiveTCTXMatch *match)
 +{
 +SpaprMachineState *spapr = SPAPR_MACHINE(xfb);
 +XivePresenter *xptr = XIVE_PRESENTER(spapr->xive);
 +XivePresenterClass *xpc = XIVE_PRESENTER_GET_CLASS(xptr);
 +int count;
 +
>>>
>>> As suggested by David, you should probably assert() that XIVE is in use
>>> for extra paran^Wsafety.
>>
>> I don't see the need. The call stack is clear enough IMO. It can only be
>> reached from the XiveRouter.
>>
> 
> Hmm... the assert() proposal isn't about this getting called by some
> other code, it is about ensuring XIVE is the active IC in case the
> machine was started with ic-mode=dual. But if you're confident enough
> it can never ever happen, no matter what subsequent changes may be made
> to the code, then don't add it :)

If XIVE mode is not selected, the XIVE ESB pages are not mapped in the
machine address space, and you cannot reach the Router without them.

C.

> 
>> Thanks,
>>
>> C. 
>>
>>> With these fixed,
>>>
>>> Reviewed-by: Greg Kurz 
>>>
 +count = xpc->match_nvt(xptr, format, nvt_blk, nvt_idx, cam_ignore,
 +   priority, logic_serv, match);
 +if (count < 0) {
 +return count;
 +}
 +
 +/*
 + * When we implement the save and restore of the thread interrupt
 + * contexts in the enter/exit CPU handlers of the machine and the
 + * escalations in QEMU, we should be able to handle non dispatched
 + * vCPUs.
 + *
 + * Until this is done, the sPAPR machine should find at least one
 + * matching context always.
 + */
 +if (count == 0) {
 +qemu_log_mask(LOG_GUEST_ERROR, "XIVE: NVT %x/%x is not dispatched\n",
 +  nvt_blk, nvt_idx);
 +}
 +
 +return count;
 +}
 +
  int spapr_get_vcpu_id(PowerPCCPU *cpu)
  {
  return cpu->vcpu_id;
 @@ -4366,6 +4399,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
  PPCVirtualHypervisorClass *vhc = PPC_VIRTUAL_HYPERVISOR_CLASS(oc);
  XICSFabricClass *xic = XICS_FABRIC_CLASS(oc);
 InterruptStatsProviderClass *ispc = INTERRUPT_STATS_PROVIDER_CLASS(oc);
 +XiveFabricClass *xfc = XIVE_FABRIC_CLASS(oc);
  
  mc->desc = "pSeries Logical Partition (PAPR compliant)";
  mc->ignore_boot_device_suffixes = true;
 @@ -4442,6 +4476,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
  smc->linux_pci_probe = true;
  smc->smp_threads_vsmt = true;
  smc->nr_xirqs = SPAPR_NR_XIRQS;
 +xfc->match_nvt = spapr_xive_match_nvt;
  }
  
  static const TypeInfo spapr_machine_info = {
 @@ -4460,6 +4495,7 @@ static const TypeInfo spapr_machine_info = {
  { TYPE_PPC_VIRTUAL_HYPERVISOR },
  { TYPE_XICS_FABRIC },
  { TYPE_INTERRUPT_STATS_PROVIDER },
 +{ TYPE_XIVE_FABRIC },
  { }
  },
  };
>>>
>>
>
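A minimal sketch of the defensive check discussed in this thread, assuming that a machine started with ic-mode=dual records which interrupt controller the guest finally selected. ToyActiveIC, ToyMachine and toy_xive_match_nvt() are hypothetical; the sPAPR machine keeps its own state for this, which is not reproduced here.

```c
#include <assert.h>

/* Hypothetical: with ic-mode=dual both controllers exist and the guest's
 * choice decides which one is active. */
typedef enum { TOY_IC_XICS, TOY_IC_XIVE } ToyActiveIC;

typedef struct {
    ToyActiveIC active_ic;
} ToyMachine;

static int toy_xive_match_nvt(const ToyMachine *m)
{
    /* Catch a XIVE lookup while XICS is the active controller */
    assert(m->active_ic == TOY_IC_XIVE);
    return 1;  /* placeholder: pretend one thread context matched */
}
```

Cédric's counterpoint is that with XICS selected the XIVE ESB pages are unmapped, so this path cannot be reached; the assert only guards against future changes breaking that assumption.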

Re: [PATCH for-5.0 v5 15/23] ppc/xive: Use the XiveFabric and XivePresenter interfaces

2019-11-20 Thread Greg Kurz

On Thu, 21 Nov 2019 08:01:44 +0100
Cédric Le Goater  wrote:

> On 20/11/2019 19:30, Greg Kurz wrote:
> > On Fri, 15 Nov 2019 17:24:28 +0100
> > Cédric Le Goater  wrote:
> > 
> >> Now that the machines have handlers implementing the XiveFabric and
> >> XivePresenter interfaces, remove xive_presenter_match() and make use
> >> of the 'match_nvt' handler of the machine.
> >>
> >> Signed-off-by: Cédric Le Goater 
> >> ---
> >>  hw/intc/xive.c | 48 +---
> >>  1 file changed, 17 insertions(+), 31 deletions(-)
> >>
> > 
> > Nice diffstat :)
> > 
> >> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> >> index 1c9e58f8deac..ab62bda85788 100644
> >> --- a/hw/intc/xive.c
> >> +++ b/hw/intc/xive.c
> >> @@ -1423,30 +1423,6 @@ int xive_presenter_tctx_match(XivePresenter *xptr, XiveTCTX *tctx,
> >>  return -1;
> >>  }
> >>  
> >> -static bool xive_presenter_match(XiveRouter *xrtr, uint8_t format,
> >> - uint8_t nvt_blk, uint32_t nvt_idx,
> >> - bool cam_ignore, uint8_t priority,
> >> - uint32_t logic_serv, XiveTCTXMatch *match)
> >> -{
> >> -XivePresenter *xptr = XIVE_PRESENTER(xrtr);
> >> -XivePresenterClass *xpc = XIVE_PRESENTER_GET_CLASS(xptr);
> >> -int count;
> >> -
> >> -count = xpc->match_nvt(xptr, format, nvt_blk, nvt_idx, cam_ignore,
> >> -   priority, logic_serv, match);
> >> -if (count < 0) {
> >> -return false;
> >> -}
> >> -
> >> -if (!match->tctx) {
> >> -qemu_log_mask(LOG_UNIMP, "XIVE: NVT %x/%x is not dispatched\n",
> >> -  nvt_blk, nvt_idx);
> > 
> > Maybe keep this trace...
> 
> It's in spapr_xive_match_nvt() now.
> 

Not really... spapr_xive_match_nvt() has a trace for the opposite case of
duplicate matches:

if (match->tctx) {
qemu_log_mask(LOG_GUEST_ERROR, "XIVE: already found a thread "
  "context NVT %x/%x\n", nvt_blk, nvt_idx);
return -1;
}

> > 
> >> -return false;
> >> -}
> >> -
> >> -return true;
> >> -}
> >> -
> >>  /*
> >>   * This is our simple Xive Presenter Engine model. It is merged in the
> >>   * Router as it does not require an extra object.
> >> @@ -1462,22 +1438,32 @@ static bool xive_presenter_match(XiveRouter *xrtr, uint8_t format,
> >>   *
> >>   * The parameters represent what is sent on the PowerBus
> >>   */
> >> -static bool xive_presenter_notify(XiveRouter *xrtr, uint8_t format,
> >> +static bool xive_presenter_notify(uint8_t format,
> >>uint8_t nvt_blk, uint32_t nvt_idx,
> >>bool cam_ignore, uint8_t priority,
> >>uint32_t logic_serv)
> >>  {
> >> +XiveFabric *xfb = XIVE_FABRIC(qdev_get_machine());
> >> +XiveFabricClass *xfc = XIVE_FABRIC_GET_CLASS(xfb);
> >>  XiveTCTXMatch match = { .tctx = NULL, .ring = 0 };
> >> -bool found;
> >> +int count;
> >>  
> >> -found = xive_presenter_match(xrtr, format, nvt_blk, nvt_idx, cam_ignore,
> >> - priority, logic_serv, &match);
> >> -if (found) {
> >> +/*
> >> + * Ask the machine to scan the interrupt controllers for a match
> >> + */
> >> +count = xfc->match_nvt(xfb, format, nvt_blk, nvt_idx, cam_ignore,
> >> +   priority, logic_serv, &match);
> >> +if (count < 0) {
> >> +return false;
> >> +}
> >> +
> >> +/* handle CPU exception delivery */
> >> +if (count) {
> >>  ipb_update(&match.tctx->regs[match.ring], priority);
> >>  xive_tctx_notify(match.tctx, match.ring);
> >>  }
> > 
> > ... in an else block here ^^ ?
> > 
> >>  
> >> -return found;
> >> +return count;
> > 
> > Implicit cast is ok I guess, but !!count would ensure no paranoid
> > compiler ever complains.
> 
> yes. 
> 
> Thanks,
> 
> C.
> 
> 
> > 
> >>  }
> >>  
> >>  /*
> >> @@ -1590,7 +1576,7 @@ static void xive_router_end_notify(XiveRouter *xrtr, uint8_t end_blk,
> >>  return;
> >>  }
> >>  
> >> -found = xive_presenter_notify(xrtr, format, nvt_blk, nvt_idx,
> >> +found = xive_presenter_notify(format, nvt_blk, nvt_idx,
> >>xive_get_field32(END_W7_F0_IGNORE, end.w7),
> >>priority,
> >>xive_get_field32(END_W7_F1_LOG_SERVER_ID, end.w7));
> > 
>

Re: [PATCH for-5.0 v5 14/23] ppc/spapr: Implement the XiveFabric interface

2019-11-20 Thread Greg Kurz

On Thu, 21 Nov 2019 07:56:32 +0100
Cédric Le Goater  wrote:

> On 20/11/2019 18:53, Greg Kurz wrote:
> > On Fri, 15 Nov 2019 17:24:27 +0100
> > Cédric Le Goater  wrote:
> > 
> >> The CAM line matching sequence in the pseries machine does not change
> >> much apart from the use of the new QOM interfaces. There is an extra
> >> indirection because of the sPAPR IRQ backend of the machine. Only the
> >> XIVE backend implements the new 'match_nvt' handler.
> >>
> > 
> > The changelog needs an update since you dropped the indirection you had
> > in v4.
> 
> Indeed.
> 
> > 
> >> Signed-off-by: Cédric Le Goater 
> >> ---
> >>  hw/ppc/spapr.c | 36 
> >>  1 file changed, 36 insertions(+)
> >>
> >> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >> index 94f9d27096af..a8f5850f65bb 100644
> >> --- a/hw/ppc/spapr.c
> >> +++ b/hw/ppc/spapr.c
> >> @@ -4270,6 +4270,39 @@ static void spapr_pic_print_info(InterruptStatsProvider *obj,
> >> kvm_irqchip_in_kernel() ? "in-kernel" : "emulated");
> >>  }
> >>  
> >> +static int spapr_xive_match_nvt(XiveFabric *xfb, uint8_t format,
> >> +uint8_t nvt_blk, uint32_t nvt_idx,
> >> +bool cam_ignore, uint8_t priority,
> >> +uint32_t logic_serv, XiveTCTXMatch *match)
> >> +{
> >> +SpaprMachineState *spapr = SPAPR_MACHINE(xfb);
> >> +XivePresenter *xptr = XIVE_PRESENTER(spapr->xive);
> >> +XivePresenterClass *xpc = XIVE_PRESENTER_GET_CLASS(xptr);
> >> +int count;
> >> +
> > 
> > As suggested by David, you should probably assert() that XIVE is in use
> > for extra paran^Wsafety.
> 
> I don't see the need. The call stack is clear enough IMO. It can only be
> reached from the XiveRouter.
> 

Hmm... the assert() proposal isn't about this getting called by some
other code, it is about ensuring XIVE is the active IC in case the
machine was started with ic-mode=dual. But if you're confident enough
it can never ever happen, no matter what subsequent changes may be made
to the code, then don't add it :)

> Thanks,
> 
> C. 
> 
> > With these fixed,
> > 
> > Reviewed-by: Greg Kurz 
> > 
> >> +count = xpc->match_nvt(xptr, format, nvt_blk, nvt_idx, cam_ignore,
> >> +   priority, logic_serv, match);
> >> +if (count < 0) {
> >> +return count;
> >> +}
> >> +
> >> +/*
> >> + * When we implement the save and restore of the thread interrupt
> >> + * contexts in the enter/exit CPU handlers of the machine and the
> >> + * escalations in QEMU, we should be able to handle non dispatched
> >> + * vCPUs.
> >> + *
> >> + * Until this is done, the sPAPR machine should find at least one
> >> + * matching context always.
> >> + */
> >> +if (count == 0) {
> >> +qemu_log_mask(LOG_GUEST_ERROR, "XIVE: NVT %x/%x is not dispatched\n",
> >> +  nvt_blk, nvt_idx);
> >> +}
> >> +
> >> +return count;
> >> +}
> >> +
> >>  int spapr_get_vcpu_id(PowerPCCPU *cpu)
> >>  {
> >>  return cpu->vcpu_id;
> >> @@ -4366,6 +4399,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
> >>  PPCVirtualHypervisorClass *vhc = PPC_VIRTUAL_HYPERVISOR_CLASS(oc);
> >>  XICSFabricClass *xic = XICS_FABRIC_CLASS(oc);
> >>  InterruptStatsProviderClass *ispc = INTERRUPT_STATS_PROVIDER_CLASS(oc);
> >> +XiveFabricClass *xfc = XIVE_FABRIC_CLASS(oc);
> >>  
> >>  mc->desc = "pSeries Logical Partition (PAPR compliant)";
> >>  mc->ignore_boot_device_suffixes = true;
> >> @@ -4442,6 +4476,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
> >>  smc->linux_pci_probe = true;
> >>  smc->smp_threads_vsmt = true;
> >>  smc->nr_xirqs = SPAPR_NR_XIRQS;
> >> +xfc->match_nvt = spapr_xive_match_nvt;
> >>  }
> >>  
> >>  static const TypeInfo spapr_machine_info = {
> >> @@ -4460,6 +4495,7 @@ static const TypeInfo spapr_machine_info = {
> >>  { TYPE_PPC_VIRTUAL_HYPERVISOR },
> >>  { TYPE_XICS_FABRIC },
> >>  { TYPE_INTERRUPT_STATS_PROVIDER },
> >> +{ TYPE_XIVE_FABRIC },
> >>  { }
> >>  },
> >>  };
> > 
>

Re: [PATCH] pseries: disable migration-test if /dev/kvm cannot be used

2019-11-20 Thread Juan Quintela

Laurent Vivier  wrote:
> On ppc64, migration-test only works with kvm_hv, and we already
> have a check to verify the module is loaded.
>
> kvm_hv module can be loaded in memory and /sys/module/kvm_hv exists,
> but on some systems (like build systems) /dev/kvm can be missing
> (by the administrator's choice).
>
> And as kvm_hv exists, migration-test is started, but QEMU falls back to
> TCG because it cannot be used:
>
> Could not access KVM kernel module: No such file or directory
> failed to initialize KVM: No such file or directory
> Back to tcg accelerator
>
> And as the test is done with TCG, it fails.
>
> As for s390x, we must check for the existence and the access rights
> of /dev/kvm.
>
> Reported-by: Cole Robinson 
> Signed-off-by: Laurent Vivier 

Reviewed-by: Juan Quintela 

Oh, why is it so difficult!!!

Thanks, Juan.
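The check described in the commit message can be expressed with access(2): the sysfs entry only proves that the module is loaded, while the device node must also exist and be readable and writable by the test. This is a sketch; kvm_hv_usable() is a hypothetical helper and may differ from what the patch actually adds to the test.

```c
#include <assert.h>
#include <stdbool.h>
#include <unistd.h>

/* Usable only if the module is loaded (sysfs entry exists) AND the
 * device node exists with read/write access for the current user. */
static bool kvm_hv_usable(const char *sysfs_module, const char *dev_node)
{
    assert(sysfs_module != NULL && dev_node != NULL);
    return access(sysfs_module, F_OK) == 0 &&
           access(dev_node, R_OK | W_OK) == 0;
}
```

A ppc64 test would call it as kvm_hv_usable("/sys/module/kvm_hv", "/dev/kvm") and skip the migration tests when it returns false, instead of silently running them under TCG.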

Re: [PATCH for-5.0 v5 15/23] ppc/xive: Use the XiveFabric and XivePresenter interfaces

2019-11-20 Thread Cédric Le Goater

On 20/11/2019 19:30, Greg Kurz wrote:
> On Fri, 15 Nov 2019 17:24:28 +0100
> Cédric Le Goater  wrote:
> 
>> Now that the machines have handlers implementing the XiveFabric and
>> XivePresenter interfaces, remove xive_presenter_match() and make use
>> of the 'match_nvt' handler of the machine.
>>
>> Signed-off-by: Cédric Le Goater 
>> ---
>>  hw/intc/xive.c | 48 +---
>>  1 file changed, 17 insertions(+), 31 deletions(-)
>>
> 
> Nice diffstat :)
> 
>> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
>> index 1c9e58f8deac..ab62bda85788 100644
>> --- a/hw/intc/xive.c
>> +++ b/hw/intc/xive.c
>> @@ -1423,30 +1423,6 @@ int xive_presenter_tctx_match(XivePresenter *xptr, XiveTCTX *tctx,
>>  return -1;
>>  }
>>  
>> -static bool xive_presenter_match(XiveRouter *xrtr, uint8_t format,
>> - uint8_t nvt_blk, uint32_t nvt_idx,
>> - bool cam_ignore, uint8_t priority,
>> - uint32_t logic_serv, XiveTCTXMatch *match)
>> -{
>> -XivePresenter *xptr = XIVE_PRESENTER(xrtr);
>> -XivePresenterClass *xpc = XIVE_PRESENTER_GET_CLASS(xptr);
>> -int count;
>> -
>> -count = xpc->match_nvt(xptr, format, nvt_blk, nvt_idx, cam_ignore,
>> -   priority, logic_serv, match);
>> -if (count < 0) {
>> -return false;
>> -}
>> -
>> -if (!match->tctx) {
>> -qemu_log_mask(LOG_UNIMP, "XIVE: NVT %x/%x is not dispatched\n",
>> -  nvt_blk, nvt_idx);
> 
> Maybe keep this trace...

It's in spapr_xive_match_nvt() now.

> 
>> -return false;
>> -}
>> -
>> -return true;
>> -}
>> -
>>  /*
>>   * This is our simple Xive Presenter Engine model. It is merged in the
>>   * Router as it does not require an extra object.
>> @@ -1462,22 +1438,32 @@ static bool xive_presenter_match(XiveRouter *xrtr, uint8_t format,
>>   *
>>   * The parameters represent what is sent on the PowerBus
>>   */
>> -static bool xive_presenter_notify(XiveRouter *xrtr, uint8_t format,
>> +static bool xive_presenter_notify(uint8_t format,
>>uint8_t nvt_blk, uint32_t nvt_idx,
>>bool cam_ignore, uint8_t priority,
>>uint32_t logic_serv)
>>  {
>> +XiveFabric *xfb = XIVE_FABRIC(qdev_get_machine());
>> +XiveFabricClass *xfc = XIVE_FABRIC_GET_CLASS(xfb);
>>  XiveTCTXMatch match = { .tctx = NULL, .ring = 0 };
>> -bool found;
>> +int count;
>>  
>> -found = xive_presenter_match(xrtr, format, nvt_blk, nvt_idx, cam_ignore,
>> - priority, logic_serv, &match);
>> -if (found) {
>> +/*
>> + * Ask the machine to scan the interrupt controllers for a match
>> + */
>> +count = xfc->match_nvt(xfb, format, nvt_blk, nvt_idx, cam_ignore,
>> +   priority, logic_serv, &match);
>> +if (count < 0) {
>> +return false;
>> +}
>> +
>> +/* handle CPU exception delivery */
>> +if (count) {
>>  ipb_update(&match.tctx->regs[match.ring], priority);
>>  xive_tctx_notify(match.tctx, match.ring);
>>  }
> 
> ... in an else block here ^^ ?
> 
>>  
>> -return found;
>> +return count;
> 
> Implicit cast is ok I guess, but !!count would ensure no paranoid
> compiler ever complains.

yes. 

Thanks,

C.


> 
>>  }
>>  
>>  /*
>> @@ -1590,7 +1576,7 @@ static void xive_router_end_notify(XiveRouter *xrtr, uint8_t end_blk,
>>  return;
>>  }
>>  
>> -found = xive_presenter_notify(xrtr, format, nvt_blk, nvt_idx,
>> +found = xive_presenter_notify(format, nvt_blk, nvt_idx,
>>xive_get_field32(END_W7_F0_IGNORE, end.w7),
>>priority,
>>xive_get_field32(END_W7_F1_LOG_SERVER_ID, end.w7));
>

Re: [PATCH for-5.0 v5 14/23] ppc/spapr: Implement the XiveFabric interface

2019-11-20 Thread Cédric Le Goater

On 20/11/2019 18:53, Greg Kurz wrote:
> On Fri, 15 Nov 2019 17:24:27 +0100
> Cédric Le Goater  wrote:
> 
>> The CAM line matching sequence in the pseries machine does not change
>> much apart from the use of the new QOM interfaces. There is an extra
>> indirection because of the sPAPR IRQ backend of the machine. Only the
>> XIVE backend implements the new 'match_nvt' handler.
>>
> 
> The changelog needs an update since you dropped the indirection you had
> in v4.

Indeed.

> 
>> Signed-off-by: Cédric Le Goater 
>> ---
>>  hw/ppc/spapr.c | 36 
>>  1 file changed, 36 insertions(+)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 94f9d27096af..a8f5850f65bb 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -4270,6 +4270,39 @@ static void spapr_pic_print_info(InterruptStatsProvider *obj,
>> kvm_irqchip_in_kernel() ? "in-kernel" : "emulated");
>>  }
>>  
>> +static int spapr_xive_match_nvt(XiveFabric *xfb, uint8_t format,
>> +uint8_t nvt_blk, uint32_t nvt_idx,
>> +bool cam_ignore, uint8_t priority,
>> +uint32_t logic_serv, XiveTCTXMatch *match)
>> +{
>> +SpaprMachineState *spapr = SPAPR_MACHINE(xfb);
>> +XivePresenter *xptr = XIVE_PRESENTER(spapr->xive);
>> +XivePresenterClass *xpc = XIVE_PRESENTER_GET_CLASS(xptr);
>> +int count;
>> +
> 
> As suggested by David, you should probably assert() that XIVE is in use
> for extra paran^Wsafety.

I don't see the need. The call stack is clear enough IMO. It can only be
reached from the XiveRouter.

Thanks,

C. 

> With these fixed,
> 
> Reviewed-by: Greg Kurz 
> 
>> +count = xpc->match_nvt(xptr, format, nvt_blk, nvt_idx, cam_ignore,
>> +   priority, logic_serv, match);
>> +if (count < 0) {
>> +return count;
>> +}
>> +
>> +/*
>> + * When we implement the save and restore of the thread interrupt
>> + * contexts in the enter/exit CPU handlers of the machine and the
>> + * escalations in QEMU, we should be able to handle non dispatched
>> + * vCPUs.
>> + *
>> + * Until this is done, the sPAPR machine should find at least one
>> + * matching context always.
>> + */
>> +if (count == 0) {
>> +qemu_log_mask(LOG_GUEST_ERROR, "XIVE: NVT %x/%x is not dispatched\n",
>> +  nvt_blk, nvt_idx);
>> +}
>> +
>> +return count;
>> +}
>> +
>>  int spapr_get_vcpu_id(PowerPCCPU *cpu)
>>  {
>>  return cpu->vcpu_id;
>> @@ -4366,6 +4399,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>>  PPCVirtualHypervisorClass *vhc = PPC_VIRTUAL_HYPERVISOR_CLASS(oc);
>>  XICSFabricClass *xic = XICS_FABRIC_CLASS(oc);
>>  InterruptStatsProviderClass *ispc = INTERRUPT_STATS_PROVIDER_CLASS(oc);
>> +XiveFabricClass *xfc = XIVE_FABRIC_CLASS(oc);
>>  
>>  mc->desc = "pSeries Logical Partition (PAPR compliant)";
>>  mc->ignore_boot_device_suffixes = true;
>> @@ -4442,6 +4476,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>>  smc->linux_pci_probe = true;
>>  smc->smp_threads_vsmt = true;
>>  smc->nr_xirqs = SPAPR_NR_XIRQS;
>> +xfc->match_nvt = spapr_xive_match_nvt;
>>  }
>>  
>>  static const TypeInfo spapr_machine_info = {
>> @@ -4460,6 +4495,7 @@ static const TypeInfo spapr_machine_info = {
>>  { TYPE_PPC_VIRTUAL_HYPERVISOR },
>>  { TYPE_XICS_FABRIC },
>>  { TYPE_INTERRUPT_STATS_PROVIDER },
>> +{ TYPE_XIVE_FABRIC },
>>  { }
>>  },
>>  };
>

Re: [PATCH v3 07/33] serial: register vmsd with DeviceClass

2019-11-20 Thread Marc-André Lureau

Hi

On Wed, Nov 20, 2019 at 10:54 PM Dr. David Alan Gilbert
 wrote:
>
> * Marc-André Lureau (marcandre.lur...@gmail.com) wrote:
> > On Tue, Nov 19, 2019 at 2:35 PM Peter Maydell  
> > wrote:
> > >
> > > On Tue, 19 Nov 2019 at 10:23, Marc-André Lureau
> > >  wrote:
> > > > On Mon, Nov 18, 2019 at 6:22 PM Peter Maydell 
> > > >  wrote:
> > > > > Did you test whether migration still works from a QEMU
> > > > > version without this patch to one with it? (The migration
> > > >
> > > > Yes, I thought I did test correctly, but I realized testing with x86
> > > > isn't correct.
> > > >
> > > > So with arm/musicpal for ex, I can migrate from before->after, however
> > > > after->before won't work. Is that ok?
> > >
> > > Broadly speaking, the only case where we care about not
> > > breaking cross-version migration is where we have a versioned
> > > machine type. So musicpal doesn't matter too much. Beyond
> > > that, yes, generally before->after is more important than
> > > after->before. I have a feeling Red Hat downstream cares about
> > > after->before migration at least for x86 but you or your colleagues
> > > would know that better than me :-)
> > >
> > > > > vmstate code is too complicated for me to be able to figure
> > > > > out whether passing the 'dev' pointer makes a difference
> > > > > to what it names the state sections and whether the
> > > > > 'qdev_set_legacy_instance_id' suffices to avoid problems.)
> > > >
> > > > I don't see a way to fix after->before, because the instance id is
> > > > initially 0 with the new code, and the old code expect a different
> > > > value.
> > >
> > > Can you explain how the instance ID stuff works? I was
> > > expecting that the result of setting the legacy instance ID
> > > would just be that the new version would always have
> > > the older setting, so if it works for old->new it would also
> > > work for new->old. But as I say I don't understand this bit
> > > of the migration code.
> >
> > From what I understand, the alias_id is only used in
> > savevm.c:find_se(), and thus can only be used to match against
> > "legacy" instance id values. On new code, instance_id is generated
> > incrementally from 0 with calculate_new_instance_id(), based on
> > "qdev-path/vmsd-name".
>
> I think there are cases where there's no qdev path that's viable;
> e.g. for ISA devices, the ID is set to the ISA IO base:
>
> hw/char/serial-isa.c
> 79:qdev_set_legacy_instance_id(dev, isa->iobase, 3);
>
> (In serial_isa_realizefn )
>
> but to be honest I'd have to trace this out and see what values the
> devices are actually using to be sure.

There is no qdev path, because the ISA bus doesn't implement
get_dev_path(), for some reason.

However, vmstate_register_with_alias_id() will use
calculate_new_instance_id(se->idstr) in this case.

>
> (And yes, please don't break backwards migration; otherwise I'll
> end up having to figure out a fix).

My understanding is that qdev_set_legacy_instance_id() always broke
backward migration.

To keep backward migration working, we would need a mechanism to
"force" the use of the legacy instance id.

Would it be acceptable to have a patch that does that when the
original VM state uses a legacy instance id?
If you start the VM with the new code path and try to migrate to the
old/legacy version, it would fail. But migrating an existing old VM back
and forth between old/new would work.
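A simplified model of the matching described above, showing why the legacy alias makes old->new migration work but not new->old: the destination accepts an incoming section if the idstr matches and the instance id equals either its own instance id or its alias. ToySE and toy_se_matches() are hypothetical and leave out the compat handling of the real find_se(); the 0x3f8 value merely stands in for an ISA iobase used as a legacy id.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified savevm handler entry */
typedef struct {
    const char *idstr;     /* section name, e.g. "serial" */
    uint32_t instance_id;  /* id this QEMU puts in the stream */
    int alias_id;          /* extra id it accepts on load, or -1 */
} ToySE;

/* Destination side: does an incoming (idstr, instance_id) match 'se'? */
static bool toy_se_matches(const ToySE *se, const char *idstr,
                           uint32_t instance_id)
{
    return strcmp(se->idstr, idstr) == 0 &&
           (instance_id == se->instance_id ||
            (se->alias_id >= 0 && instance_id == (uint32_t)se->alias_id));
}
```

With an old entry { "serial", 0x3f8, -1 } and a new entry { "serial", 0, 0x3f8 } (instance id counted from 0, legacy id kept as alias), the new destination accepts the old stream through the alias, while the old destination has no way to accept instance id 0.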





-- 
Marc-André Lureau

Re: [PATCH 0/6] qapi: Module fixes and cleanups

2019-11-20 Thread Markus Armbruster

Markus Armbruster  writes:

> Kevin recently posted a minimally invasive fix for empty QAPI
> modules[*].  This is my attempt at a fix that also addresses the
> design weakness that led to the bug.

[*] Subject: [RFC PATCH 15/18] qapi: Support empty modules
Message-Id: <20191017130204.16131-16-kw...@redhat.com>

Re: [PATCH] Add minimal Hexagon target - First in a series of patches - linux-user changes + linux-user/hexagon + skeleton of target/hexagon - Files in target/hexagon/imported are from another project

2019-11-20 Thread Aleksandar Markovic

On Wed, Nov 20, 2019 at 8:49 AM Richard Henderson
 wrote:
>

> How's that?  He has been asked to split the linux-user stuff from the target
> skeleton stuff.

...

> This argument would make more sense if there were more present here than a
> skeleton.

Speaking about anatomy, I am opposed to upstreaming any "skeletons".
The other month, another community was dead serious wanting to
upstream code based on "proposal of the draft" (or was it "draft of
the proposal"), and now we want to upstream "skeletons"??

And even that "skeleton" can't be regularly built stage by stage, but
must resort to "enable configure at the end" cheap tricks?

What happened to QEMU upstream?

If this is really just a skeleton that can't be organized in a decent
patch series that actually builds, my recommendation to Taylor is
simply to postpone upstreaming until the skeleton is made stronger,
all bones are in their right place, and the full body is ready - what
is the point/purpose of having such a skeleton in QEMU upstream?

I am slightly disappointed that after a slick presentation on KVM
Forum, we now talk about a "skeleton".

Yours,
Aleksandar

Re: guest / host buffer sharing ...

2019-11-20 Thread Tomasz Figa

On Thu, Nov 21, 2019 at 6:41 AM Geoffrey McRae  wrote:
>
>
>
> On 2019-11-20 23:13, Tomasz Figa wrote:
> > Hi Geoffrey,
> >
> > On Thu, Nov 7, 2019 at 7:28 AM Geoffrey McRae 
> > wrote:
> >>
> >>
> >>
> >> On 2019-11-06 23:41, Gerd Hoffmann wrote:
> >> > On Wed, Nov 06, 2019 at 05:36:22PM +0900, David Stevens wrote:
> >> >> > (1) The virtio device
> >> >> > =
> >> >> >
> >> >> > Has a single virtio queue, so the guest can send commands to register
> >> >> > and unregister buffers.  Buffers are allocated in guest ram.  Each 
> >> >> > buffer
> >> >> > has a list of memory ranges for the data. Each buffer also has some
> >> >>
> >> >> Allocating from guest ram would work most of the time, but I think
> >> >> it's insufficient for many use cases. It doesn't really support things
> >> >> such as contiguous allocations, allocations from carveouts or <4GB,
> >> >> protected buffers, etc.
> >> >
> >> > If there are additional constraints (due to gpu hardware I guess)
> >> > I think it is better to leave the buffer allocation to virtio-gpu.
> >>
> >> The entire point of this for our purposes is due to the fact that we
> >> cannot allocate the buffer; it's either provided by the GPU driver or
> >> DirectX. If virtio-gpu were to allocate the buffer we might as well
> >> forget all this and continue using the ivshmem device.
> >
> > I don't understand why virtio-gpu couldn't allocate those buffers.
> > Allocation doesn't necessarily mean creating new memory. Since the
> > virtio-gpu device on the host talks to the GPU driver (or DirectX?),
> > why couldn't it return one of the buffers provided by those if
> > BIND_SCANOUT is requested?
> >
>
> Because in our application we are a user-mode application in Windows
> that is provided with buffers that were allocated by the video stack in
> Windows. We are not using a virtual GPU but a physical GPU via vfio
> passthrough and as such we are limited in what we can do. Unless I have
> completely missed what virtio-gpu does, from what I understand it's
> attempting to be a virtual GPU in its own right, which is not at all
> suitable for our requirements.

Not necessarily. virtio-gpu in its basic shape is an interface for
allocating frame buffers and sending them to the host to display.

It sounds to me like a PRIME-based setup similar to how integrated +
discrete GPUs are handled on regular systems could work for you. The
virtio-gpu device would be used like the integrated GPU that basically
just drives the virtual screen. The guest component that controls the
display of the guest (typically some sort of a compositor) would
allocate the frame buffers using virtio-gpu and then import those to
the vfio GPU when using it for compositing the parts of the screen.
The parts of the screen themselves would be rendered beforehand by
applications into local buffers managed fully by the vfio GPU, so
there wouldn't be any need to involve virtio-gpu there. Only the
compositor would have to be aware of it.

Of course if your guest is not Linux, I have no idea if that can be
handled in any reasonable way. I know those integrated + discrete GPU
setups do work on Windows, but things are obviously 100% proprietary,
so I don't know if one could make them work with virtio-gpu as the
integrated GPU.

>
> This discussion seems to have moved away completely from the original
> simple feature we need, which is to share a random block of guest
> allocated ram with the host. While it would be nice if it's contiguous
> ram, it's not an issue if it's not, and with udmabuf (now I understand
> it) it can be made to appear contiguous if it is so desired anyway.
>
> vhost-user could be used for this if it is fixed to allow dynamic
> remapping, all the other bells and whistles that are virtio-gpu are
> useless to us.
>

As far as I followed the thread, my impression is that we don't want
to have an ad-hoc interface just for sending memory to the host. The
thread was started to look for a way to create identifiers for guest
memory, which proper virtio devices could use to refer to the memory
within requests sent to the host.

That said, I'm not really sure if there is any benefit of making it
anything other than just the specific virtio protocol accepting
scatterlist of guest pages directly.
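To make the idea of a virtio protocol that accepts a scatterlist of guest pages more concrete, here is a minimal sketch of what such a "register buffer" request might carry. It is modelled on the quoted device description (a buffer id plus a list of guest memory ranges). Every name and field here (sketch_register_buffer, gpa, len) is hypothetical and is not taken from any virtio specification. The helpers only show that the host can compute the buffer size from the list whether or not the ranges happen to be contiguous.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical: one chunk of guest RAM backing part of a buffer. */
struct sketch_mem_range {
    uint64_t gpa;   /* guest-physical start address */
    uint64_t len;   /* length in bytes */
};

/* Hypothetical "register buffer" request: an id plus a scatter list. */
struct sketch_register_buffer {
    uint32_t buffer_id;
    uint32_t nr_ranges;
    struct sketch_mem_range ranges[];   /* nr_ranges entries follow */
};

/* Allocate a zeroed request with room for nr_ranges entries (NULL on failure). */
static struct sketch_register_buffer *sketch_cmd_new(uint32_t id, uint32_t nr_ranges)
{
    struct sketch_register_buffer *cmd =
        calloc(1, sizeof(*cmd) + nr_ranges * sizeof(cmd->ranges[0]));
    if (cmd == NULL) {
        return NULL;
    }
    cmd->buffer_id = id;
    cmd->nr_ranges = nr_ranges;
    return cmd;
}

/* Total number of bytes described by the scatter list. */
static uint64_t sketch_total_len(const struct sketch_register_buffer *cmd)
{
    uint64_t total = 0;

    assert(cmd != NULL);
    for (uint32_t i = 0; i < cmd->nr_ranges; i++) {
        total += cmd->ranges[i].len;
    }
    return total;
}

/* True if every range starts exactly where the previous one ended. */
static bool sketch_is_contiguous(const struct sketch_register_buffer *cmd)
{
    for (uint32_t i = 1; i < cmd->nr_ranges; i++) {
        const struct sketch_mem_range *prev = &cmd->ranges[i - 1];
        if (cmd->ranges[i].gpa != prev->gpa + prev->len) {
            return false;
        }
    }
    return true;
}
```

For example, two 4 KiB ranges at 0x1000 and 0x2000 describe 0x2000 bytes and are contiguous. Moving the second range to 0x8000 keeps the total but breaks contiguity, which is the case where a host-side mechanism such as udmabuf would be needed to present a single linear mapping.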

Putting aside the ability to obtain the shared memory itself, how do you
trigger a copy from the guest frame buffer to the shared memory?

> >>
> >> Our use case is niche, and the state of things may change if vendors
> >> like AMD follow through with their promises and give us SR-IOV on
> >> consumer GPUs, but even then we would still need their support to
> >> achieve the same results as the same issue would still be present.
> >>
> >> Also don't forget that QEMU already has a non virtio generic device
> >> (IVSHMEM). The only difference is, this device doesn't allow us to
> >> attain zero-copy transfers.
> >>
> >> Currently IVSHMEM is used by two projects that I am aware of, Looki

Re: [PATCH v3 0/4] net/virtio: fixes for failover

2019-11-20 Thread Jason Wang




On 2019/11/20 11:56 PM, Michael S. Tsirkin wrote:

On Wed, Nov 20, 2019 at 04:49:47PM +0100, Jens Freimann wrote:

This series fixes bugs found by coverity and one reported by David
Gilbert.


Looks good.  Jason can you merge this pls?



Yes, applied.

Thanks





v2->v3:
  * change patch description and subject of patch 3/4

Jens Freimann (4):
   net/virtio: fix dev_unplug_pending
   net/virtio: return early when failover primary alread added
   net/virtio: fix re-plugging of primary device
   net/virtio: return error when device_opts arg is NULL

  hw/net/virtio-net.c | 58 +
  migration/savevm.c  |  3 ++-
  2 files changed, 40 insertions(+), 21 deletions(-)

--
2.21.0

Re: [PATCH] spapr: Fix VSMT mode when it is not supported by the kernel

2019-11-20 Thread David Gibson

On Wed, Nov 20, 2019 at 03:28:19PM +0100, Laurent Vivier wrote:
> On 20/11/2019 12:41, David Gibson wrote:
> > On Wed, Nov 20, 2019 at 12:28:19PM +0100, Laurent Vivier wrote:
> >> On 20/11/2019 10:00, Laurent Vivier wrote:
> >>> On 20/11/2019 05:36, David Gibson wrote:
>  On Tue, Nov 19, 2019 at 04:45:26PM +0100, Greg Kurz wrote:
> > On Tue, 19 Nov 2019 15:06:51 +0100
> > Laurent Vivier  wrote:
> >
> >> On 19/11/2019 02:00, David Gibson wrote:
> >>> On Fri, Nov 08, 2019 at 05:47:59PM +0100, Greg Kurz wrote:
>  On Fri,  8 Nov 2019 16:40:35 +0100
>  Laurent Vivier  wrote:
> 
> > Commit 29cb4187497d sets by default the VSMT to smp_threads,
> > but older kernels (< 4.13) don't support that.
> >
> > We can reasonably restore previous behavior with this kernel
> > to allow to run QEMU as before.
> >
> > If VSMT is not supported, VSMT will be set to MAX(8, smp_threads)
> > as it is done for previous machine types (< pseries-4.2)
> >
> 
>  It is usually _bad_ to base the machine behavior on host capabilities.
>  What happens if we migrate between an older kernel and a recent one ?
> >>>
> >>> Right.  We're really trying to remove instances of such behaviour.  I'd
> >>> prefer to completely revert Greg's original patch than to re-introduce
> >>> host configuration dependency into the guest configuration..
> >>>
>  I understand this is to fix tests/migration-test on older kernels.
>  Couldn't this be achieved with migration-test doing some introspection
>  and maybe pass vsmt=8 on the QEMU command line ?
> >>>
> >>> ..adjusting the test case like this might be a better idea, though.
> >>>
> >>> What's the test setup where we're using the old kernel?  I really only
> >>> applied the original patch on the guess that we didn't really care
> >>> about kernels that old.  The fact you've hit this in practice makes me
> >>> doubt that assumption.
> >>>
> >>
> >> The way to fix the tests is to add "-smp threads=8" on the command line
> >> (for all tests, so basically in qtest_init_without_qmp_handshake(), and
> >> it will impact all the machine types), and we have to check if it is
> >
> > Ohhh... it isn't possible to initialize Qtest with machine specific
> > properties ? That's a bit unfortunate :-\
> 
>  Uhh... I don't see why we can't.  Couldn't we just put either -machine
>  vsmt=8 or -smp 8 into the cmd_src / cmd_dst printfs() in the
>  strcmp(arch, "ppc64") case?
> >>>
> >>> Yes, but we need to do that to all other tests that fail. test-migration
> >>> is not the only one impacted by the problem (we have also pxe-test), so
> >>> it's why I thought to fix the problem in a generic place.
> >>>
> >>> But it seems there are only this couple of tests that are impacted so I
> >>> can modify both instead. I think only tests that really start CPU have
> >>> the problem.
> >>>
> >>> I'm going to send a patch to fix that.
> >>
> >> And again, it's a little bit more complicated than expected: setting
> >> vsmt to 8 works only with kvm_hv, but breaks in case of TCG or kvm_pr.
> >> So the test must check what is in use...
> > 
> > Ugh, yeah, that's getting too ugly.  I think the feasible options are
> > either to revert the patch, or just say that upstream qemu no longer
> > supports a RHEL7 host.
> 
> I was mistakenly using "-smp threads=8"; with "-M vsmt=8" it works
> with TCG and KVM PR (with a warning).

Ah, yes, that's not so bad.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
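The default-VSMT behaviour discussed in this thread can be summarised as a small function. This is a hypothetical sketch (sketch_default_vsmt() is not a QEMU function) that only encodes what the messages state: machine types before pseries-4.2 default to MAX(8, smp_threads), while pseries-4.2 after commit 29cb4187497d defaults to smp_threads, which kernels older than 4.13 do not support. Note that the result depends only on the machine type and command line, never on the host kernel.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not QEMU code): the default VSMT value a pseries
 * machine ends up with, as described in this thread.
 */
static int sketch_default_vsmt(bool pre_4_2_machine, int smp_threads)
{
    assert(smp_threads > 0);
    if (pre_4_2_machine) {
        /* older machine types: MAX(8, smp_threads) */
        return smp_threads > 8 ? smp_threads : 8;
    }
    /* pseries-4.2 after commit 29cb4187497d: follow smp_threads */
    return smp_threads;
}
```

For a guest started without -smp, smp_threads is 1, so the new default is 1 while the old one was 8. Passing -machine vsmt=8, as the test fix does, restores the old value for those tests without reintroducing a dependency on host capabilities.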



[PATCH v6] Implement backend program convention command for vhost-user-blk

2019-11-20 Thread Micky Yun Chan

From: michan 

This patch adds the standard command-line options defined in
docs/interop/vhost-user.rst for vhost-user-* programs.

Signed-off-by: Micky Yun Chan (michiboo) 
---
 contrib/vhost-user-blk/vhost-user-blk.c | 108 ++--
 docs/interop/vhost-user.json|  31 +++
 2 files changed, 95 insertions(+), 44 deletions(-)

diff --git a/contrib/vhost-user-blk/vhost-user-blk.c b/contrib/vhost-user-blk/vhost-user-blk.c
index ae61034656..6fd91c7e99 100644
--- a/contrib/vhost-user-blk/vhost-user-blk.c
+++ b/contrib/vhost-user-blk/vhost-user-blk.c
@@ -576,70 +576,90 @@ vub_new(char *blk_file)
 return vdev_blk;
 }
 
+static int opt_fdnum = -1;
+static char *opt_socket_path;
+static char *opt_blk_file;
+static gboolean opt_print_caps;
+static gboolean opt_read_only;
+
+static GOptionEntry entries[] = {
+{ "print-capabilities", 'c', 0, G_OPTION_ARG_NONE, &opt_print_caps,
+  "Print capabilities", NULL },
+{ "fd", 'f', 0, G_OPTION_ARG_INT, &opt_fdnum,
+  "Use inherited fd socket", "FDNUM" },
+{ "socket-path", 's', 0, G_OPTION_ARG_FILENAME, &opt_socket_path,
+  "Use UNIX socket path", "PATH" },
+{"blk-file", 'b', 0, G_OPTION_ARG_FILENAME, &opt_blk_file,
+ "block device or file path", "PATH"},
+{ "read-only", 'r', 0, G_OPTION_ARG_NONE, &opt_read_only,
+  "Enable read-only", NULL }
+};
+
 int main(int argc, char **argv)
 {
-int opt;
-char *unix_socket = NULL;
-char *blk_file = NULL;
-bool enable_ro = false;
 int lsock = -1, csock = -1;
 VubDev *vdev_blk = NULL;
+GError *error = NULL;
+GOptionContext *context;
 
-while ((opt = getopt(argc, argv, "b:rs:h")) != -1) {
-switch (opt) {
-case 'b':
-blk_file = g_strdup(optarg);
-break;
-case 's':
-unix_socket = g_strdup(optarg);
-break;
-case 'r':
-enable_ro = true;
-break;
-case 'h':
-default:
-printf("Usage: %s [ -b block device or file, -s UNIX domain socket"
-   " | -r Enable read-only ] | [ -h ]\n", argv[0]);
-return 0;
+context = g_option_context_new(NULL);
+g_option_context_add_main_entries(context, entries, NULL);
+if (!g_option_context_parse(context, &argc, &argv, &error)) {
+g_printerr("Option parsing failed: %s\n", error->message);
+exit(EXIT_FAILURE);
+}
+if (opt_print_caps) {
+g_print("{\n");
+g_print("  \"type\": \"block\",\n");
+g_print("  \"features\": [\n");
+g_print("\"read-only\",\n");
+g_print("\"blk-file\"\n");
+g_print("  ]\n");
+g_print("}\n");
+exit(EXIT_SUCCESS);
+}
+
+if (!opt_blk_file) {
+g_print("%s\n", g_option_context_get_help(context, true, NULL));
+exit(EXIT_FAILURE);
+}
+
+if (opt_socket_path) {
+lsock = unix_sock_new(opt_socket_path);
+if (lsock < 0) {
+exit(EXIT_FAILURE);
 }
+} else if (opt_fdnum < 0) {
+g_print("%s\n", g_option_context_get_help(context, true, NULL));
+exit(EXIT_FAILURE);
+} else {
+lsock = opt_fdnum;
 }
 
-if (!unix_socket || !blk_file) {
-printf("Usage: %s [ -b block device or file, -s UNIX domain socket"
-   " | -r Enable read-only ] | [ -h ]\n", argv[0]);
-return -1;
-}
-
-lsock = unix_sock_new(unix_socket);
-if (lsock < 0) {
-goto err;
-}
-
-csock = accept(lsock, (void *)0, (void *)0);
+csock = accept(lsock, NULL, NULL);
 if (csock < 0) {
-fprintf(stderr, "Accept error %s\n", strerror(errno));
-goto err;
+g_printerr("Accept error %s\n", strerror(errno));
+exit(EXIT_FAILURE);
 }
 
-vdev_blk = vub_new(blk_file);
+vdev_blk = vub_new(opt_blk_file);
 if (!vdev_blk) {
-goto err;
+exit(EXIT_FAILURE);
 }
-if (enable_ro) {
+if (opt_read_only) {
 vdev_blk->enable_ro = true;
 }
 
 if (!vug_init(&vdev_blk->parent, VHOST_USER_BLK_MAX_QUEUES, csock,
   vub_panic_cb, &vub_iface)) {
-fprintf(stderr, "Failed to initialized libvhost-user-glib\n");
-goto err;
+g_printerr("Failed to initialize libvhost-user-glib\n");
+exit(EXIT_FAILURE);
 }
 
 g_main_loop_run(vdev_blk->loop);
-
+g_main_loop_unref(vdev_blk->loop);
+g_option_context_free(context);
 vug_deinit(&vdev_blk->parent);
-
-err:
 vub_free(vdev_blk);
 if (csock >= 0) {
 close(csock);
@@ -647,8 +667,8 @@ err:
 if (lsock >= 0) {
 close(lsock);
 }
-g_free(unix_socket);
-g_free(blk_file);
+g_free(opt_socket_path);
+g_free(opt_blk_file);
 
 return 0;
 }
diff --git a/docs/interop/vhost-user.json b/docs/interop/vhost-user.json
index da6aaf51c8..d25c3a957f 100644
--- a/docs/interop/vhost-user.json
+++ b/docs/interop/vhost-user.json
@@ -54,6 +54,37 @@
   ]
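GLib documents g_option_context_add_main_entries() as taking a NULL-terminated array of GOptionEntry, so an option table like entries[] is expected to end with a { NULL } sentinel entry. The sketch below illustrates that pattern in plain C. OptEntry is a simplified, hypothetical stand-in for GOptionEntry, not the GLib type. The consumer walks the table until the first entry whose long name is NULL; without that sentinel, it would read past the end of the array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for GOptionEntry (not the GLib type). */
typedef struct {
    const char *long_name;
    char short_name;
} OptEntry;

static const OptEntry sketch_entries[] = {
    { "print-capabilities", 'c' },
    { "fd", 'f' },
    { "socket-path", 's' },
    { "blk-file", 'b' },
    { "read-only", 'r' },
    { NULL, 0 }   /* sentinel: consumers stop at the first NULL long_name */
};

/* Count entries the way a NULL-terminated consumer sees them. */
static size_t opt_count(const OptEntry *e)
{
    size_t n = 0;
    while (e[n].long_name != NULL) {
        n++;
    }
    return n;
}

/* Look up an entry by long name; returns NULL if it is not present. */
static const OptEntry *opt_find(const OptEntry *e, const char *name)
{
    assert(name != NULL);
    for (; e->long_name != NULL; e++) {
        if (strcmp(e->long_name, name) == 0) {
            return e;
        }
    }
    return NULL;
}
```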

Open ISA (RISC-V, OpenPOWER, etc) Miniconf at LCA 2020

2019-11-20 Thread Alistair Francis

Hi All,

We’re pleased to announce the first Open ISA miniconf at linux.conf.au

Open Instruction Set Architectures like RISC-V, OpenPOWER and others are
the next step in the evolution of Open Hardware.  The mini-conference
will commence with a brief overview of Open ISAs in general.

It will then introduce the two most common open ISAs, RISC-V and
OpenPOWER. This will include an overview of the ISAs, how they are
supported and why people should use them. Finally we will delve straight
into a set of curated presentations from across the Open ISA ecosystem.

The miniconf organisers, Alistair Francis and Hugh Blemings, along with
the LCA 2020 team, invite proposals for sessions in the Open ISA miniconf
of either 15 or 30 minutes duration.  Suggested topics include Linux
kernel support for open ISAs, RISC-V, OpenPOWER and any other Open ISA
related topics.  As befits LCA, sessions should have a strong technical
emphasis rather than marketing/sales focus.

Places are limited in the miniconf schedule and early submission of your
proposal will assist our planning.  The Call for Sessions formally
closes on December 8th.

For attendees, the miniconf promises a great deal of technical depth
and breadth across this relatively new aspect of the open technical
commons.

As an extra plus it's on the beautiful Gold Coast in the middle of Summer :)

Registration and details are available at:
https://linux.conf.au/programme/miniconfs/open-isa/

And don't forget there's a bunch of other awesome miniconfs scheduled
for LCA 2020 - the complete list is available here -
https://linux.conf.au/programme/miniconfs/

We look forward to seeing you!

Regards,
Hugh Blemings & Alistair Francis

Re: [PATCH] exynos4210_gic: Suppress gcc9 format-truncation warnings

2019-11-20 Thread David Gibson

On Wed, Nov 20, 2019 at 10:31:48AM +, Peter Maydell wrote:
> On Wed, 20 Nov 2019 at 05:27, David Gibson  
> wrote:
> >
> > On Mon, Oct 14, 2019 at 01:51:39PM +0100, Peter Maydell wrote:
> > > If we assert() that num_cpu is always <= EXYNOS4210_NCPUS
> > > is that sufficient to clue gcc in that the buffer can't overflow?
> >
> > Interestingly, assert(s->num_cpu <= EXYNOS4210_NCPUS) is *not*
> > sufficient, but assert(i <= EXYNOS4210_NCPUS) within the loop *is*
> > enough.  I've updated my patch accordingly.
> >
> > This isn't 4.2 material, obviously.  Should I just sit on it until 5.0
> > opens, or does one of you have someplace to stage the patch in the
> > meanwhile?
> 
> Easy fixes for compiler warnings aren't inherently out of scope
> for 4.2. I'm also collecting stuff for 5.0 anyway so I suggest you
> just send the patch.

Ok, done.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [PATCH] pseries: fix migration-test and pxe-test

2019-11-20 Thread David Gibson

On Wed, Nov 20, 2019 at 03:25:39PM +0100, Laurent Vivier wrote:
> Commit 29cb4187497d ("spapr: Set VSMT to smp_threads by default")
> has introduced a new default value for VSMT that is not supported
> by old kernels (before 4.13 kernel) and this breaks "make check"
> on these kernels.
> 
> To fix that, explicitly set in the involved tests the value that was
> used as the default value before the change.
> 
> Cc: Greg Kurz 
> Signed-off-by: Laurent Vivier 

Applied to ppc-for-4.2, thanks.

> ---
>  tests/migration-test.c | 4 ++--
>  tests/pxe-test.c   | 6 +++---
>  2 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/tests/migration-test.c b/tests/migration-test.c
> index ac780dffdaad..ebd77a581aff 100644
> --- a/tests/migration-test.c
> +++ b/tests/migration-test.c
> @@ -614,7 +614,7 @@ static int test_migrate_start(QTestState **from, QTestState **to,
>  end_address = S390_TEST_MEM_END;
>  } else if (strcmp(arch, "ppc64") == 0) {
>  extra_opts = use_shmem ? get_shmem_opts("256M", shmem_path) : NULL;
> -cmd_src = g_strdup_printf("-machine accel=%s -m 256M -nodefaults"
> +cmd_src = g_strdup_printf("-machine accel=%s,vsmt=8 -m 256M -nodefaults"
>" -name source,debug-threads=on"
>" -serial file:%s/src_serial"
>" -prom-env 'use-nvramrc?=true' -prom-env "
> @@ -623,7 +623,7 @@ static int test_migrate_start(QTestState **from, QTestState **to,
>"until' %s %s",  accel, tmpfs, end_address,
>start_address, extra_opts ? extra_opts : "",
>opts_src);
> -cmd_dst = g_strdup_printf("-machine accel=%s -m 256M"
> +cmd_dst = g_strdup_printf("-machine accel=%s,vsmt=8 -m 256M"
>" -name target,debug-threads=on"
>" -serial file:%s/dest_serial"
>" -incoming %s %s %s",
> diff --git a/tests/pxe-test.c b/tests/pxe-test.c
> index 948b0fbdc727..aaae54f7550d 100644
> --- a/tests/pxe-test.c
> +++ b/tests/pxe-test.c
> @@ -46,15 +46,15 @@ static testdef_t x86_tests_slow[] = {
>  
>  static testdef_t ppc64_tests[] = {
>  { "pseries", "spapr-vlan",
> -  "-machine cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken" },
> +  "-machine cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken,vsmt=8" },
>  { "pseries", "virtio-net-pci",
> -  "-machine cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken" },
> +  "-machine cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken,vsmt=8" },
>  { NULL },
>  };
>  
>  static testdef_t ppc64_tests_slow[] = {
>  { "pseries", "e1000",
> -  "-machine cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken" },
> +  "-machine cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken,vsmt=8" },
>  { NULL },
>  };
>  

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



[PATCH v2] exynos4210_gic: Suppress gcc9 format-truncation warnings

2019-11-20 Thread David Gibson

exynos4210_gic_realize() prints the number of cpus into some temporary
buffers, but it only allows 3 bytes space for it.  That's plenty:
existing machines will only ever set this value to EXYNOS4210_NCPUS
(2).  But the compiler can't always figure that out, so some[*] gcc9
versions emit -Wformat-truncation warnings.

We can fix that by hinting the constraint to the compiler with a
suitably placed assert().

[*] The bizarre thing here is that I've long gotten these warnings when
compiling in a 32-bit x86 container as host (Fedora 30 with
gcc-9.2.1-1.fc30.i686), but it compiles just fine on my normal
x86_64 host (Fedora 30 with gcc-9.2.1-1.fc30.x86_64).

Signed-off-by: David Gibson 

Changes since v1:
 * Used an assert to hint the compiler, instead of increasing the
   buffer size.

---
 hw/intc/exynos4210_gic.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/hw/intc/exynos4210_gic.c b/hw/intc/exynos4210_gic.c
index a1b699b6ba..ed4d8482e3 100644
--- a/hw/intc/exynos4210_gic.c
+++ b/hw/intc/exynos4210_gic.c
@@ -314,6 +314,14 @@ static void exynos4210_gic_realize(DeviceState *dev, Error **errp)
 EXYNOS4210_EXT_GIC_DIST_REGION_SIZE);
 
 for (i = 0; i < s->num_cpu; i++) {
+/*
+ * This clues in gcc that our on-stack buffers do, in fact,
+ * have enough room for the cpu numbers.  gcc 9.2.1 on 32-bit
+ * x86 doesn't figure this out otherwise, and gives spurious
+ * warnings.
+ */
+assert(i <= EXYNOS4210_NCPUS);
+
 /* Map CPU interface per SMP Core */
 sprintf(cpu_alias_name, "%s%x", cpu_prefix, i);
 memory_region_init_alias(&s->cpu_alias[i], obj,
-- 
2.23.0
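The same hinting technique can be shown in isolation. This is a minimal sketch, not the exynos4210_gic code. SKETCH_NCPUS stands in for EXYNOS4210_NCPUS, the prefix string is made up, and snprintf() is used instead of sprintf(). The buffer holds the prefix plus 3 spare bytes for the hex CPU index, and the assert states the bound on the index right where the formatting happens. As the thread shows, whether a given gcc version actually uses such a hint depends on where the assert is placed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_NCPUS 2   /* stands in for EXYNOS4210_NCPUS */

static const char sketch_prefix[] = "gic-alias_cpu";   /* made-up prefix */

/* Prefix plus 3 spare bytes for the hex index (sizeof already counts the NUL). */
typedef struct {
    char name[sizeof(sketch_prefix) + 3];
} SketchAliasName;

/* Format "<prefix><i in hex>" into out->name and return the string length. */
static size_t sketch_alias_format(SketchAliasName *out, int i)
{
    /* State the bound where the formatting happens, as the patch does. */
    assert(i >= 0 && i <= SKETCH_NCPUS);
    snprintf(out->name, sizeof(out->name), "%s%x", sketch_prefix, (unsigned)i);
    return strlen(out->name);
}
```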

Re: [PATCH v16 11/14] hmat acpi: Build System Locality Latency and Bandwidth Information Structure(s)

2019-11-20 Thread Tao Xu


On 11/20/2019 6:09 PM, Igor Mammedov wrote:

On Fri, 15 Nov 2019 15:53:49 +0800
Tao Xu  wrote:


From: Liu Jingqi 

This structure describes the memory access latency and bandwidth
information from various memory access initiator proximity domains.
The latency and bandwidth numbers represented in this structure
correspond to rated latency and bandwidth for the platform.
Software can use this information as a hint for optimization.

Signed-off-by: Liu Jingqi 
Signed-off-by: Tao Xu 
---

Changes in v16:
 - Add more description for lb_length (Igor)
 - Drop entry_list and calculate entries in this patch (Igor)

Changes in v13:
 - Calculate the entries in a new patch.
---
  hw/acpi/hmat.c | 105 -
  1 file changed, 104 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/hmat.c b/hw/acpi/hmat.c
index 9ff79308a4..ed19ebed2f 100644
--- a/hw/acpi/hmat.c
+++ b/hw/acpi/hmat.c
@@ -25,8 +25,10 @@
   */
  
  #include "qemu/osdep.h"

+#include "qemu/units.h"
  #include "sysemu/numa.h"
  #include "hw/acpi/hmat.h"
+#include "qemu/error-report.h"


do you really need this header in this patch?



I will drop this header in the next version.


modulo above nit, patch looks good so
with above fixed (if necessary)

Reviewed-by: Igor Mammedov 

  
  /*

   * ACPI 6.3:
@@ -67,11 +69,89 @@ static void build_hmat_mpda(GArray *table_data, uint16_t flags,
  build_append_int_noprefix(table_data, 0, 8);
  }
  
+/*

+ * ACPI 6.3: 5.2.27.4 System Locality Latency and Bandwidth Information
+ * Structure: Table 5-146
+ */
+static void build_hmat_lb(GArray *table_data, HMAT_LB_Info *hmat_lb,
+  uint32_t num_initiator, uint32_t num_target,
+  uint32_t *initiator_list)
+{
+int i, index;
+HMAT_LB_Data *lb_data;
+uint16_t *entry_list;
+uint32_t base;
+/* Length in bytes for entire structure */
+uint32_t lb_length
+= 32 /* Table length up to and including Entry Base Unit */
++ 4 * num_initiator /* Initiator Proximity Domain List */
++ 4 * num_target /* Target Proximity Domain List */
++ 2 * num_initiator * num_target; /* Latency or Bandwidth Entries */
+
+/* Type */
+build_append_int_noprefix(table_data, 1, 2);
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 2);
+/* Length */
+build_append_int_noprefix(table_data, lb_length, 4);
+/* Flags: Bits [3:0] Memory Hierarchy, Bits[7:4] Reserved */
+assert(!(hmat_lb->hierarchy >> 4));
+build_append_int_noprefix(table_data, hmat_lb->hierarchy, 1);
+/* Data Type */
+build_append_int_noprefix(table_data, hmat_lb->data_type, 1);
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 2);
+/* Number of Initiator Proximity Domains (s) */
+build_append_int_noprefix(table_data, num_initiator, 4);
+/* Number of Target Proximity Domains (t) */
+build_append_int_noprefix(table_data, num_target, 4);
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 4);
+
+/* Entry Base Unit */
+if (hmat_lb->data_type <= HMAT_LB_DATA_WRITE_LATENCY) {
+/* Convert latency base from nanoseconds to picoseconds */
+base = hmat_lb->base * 1000;
+} else {
+/* Convert bandwidth base from bytes to megabytes (MiB) */
+base = hmat_lb->base / MiB;
+}
+build_append_int_noprefix(table_data, base, 8);
+
+/* Initiator Proximity Domain List */
+for (i = 0; i < num_initiator; i++) {
+build_append_int_noprefix(table_data, initiator_list[i], 4);
+}
+
+/* Target Proximity Domain List */
+for (i = 0; i < num_target; i++) {
+build_append_int_noprefix(table_data, i, 4);
+}
+
+/* Latency or Bandwidth Entries */
+entry_list = g_malloc0(num_initiator * num_target * sizeof(uint16_t));
+for (i = 0; i < hmat_lb->list->len; i++) {
+lb_data = &g_array_index(hmat_lb->list, HMAT_LB_Data, i);
+index = lb_data->initiator * num_target + lb_data->target;
+
+entry_list[index] = (uint16_t)(lb_data->data / hmat_lb->base);
+}
+
+for (i = 0; i < num_initiator * num_target; i++) {
+build_append_int_noprefix(table_data, entry_list[i], 2);
+}
+
+g_free(entry_list);
+}
+
  /* Build HMAT sub table structures */
  static void hmat_build_table_structs(GArray *table_data, NumaState *numa_state)
  {
  uint16_t flags;
-int i;
+uint32_t num_initiator = 0;
+uint32_t initiator_list[MAX_NODES];
+int i, hierarchy, type;
+HMAT_LB_Info *hmat_lb;
  
  for (i = 0; i < numa_state->num_nodes; i++) {

  flags = 0;
@@ -82,6 +162,29 @@ static void hmat_build_table_structs(GArray *table_data, NumaState *numa_state)
  
  build_hmat_mpda(table_data, flags, numa_state->nodes[i].initiator, i);

  }
+
+for (i = 0; i < numa_state->num_nodes; i++) {
+if (numa_state->nodes[i].has_cpu) {
+initiator_list[num_initiator++] = i;
+}
+}
+
+
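The length and unit arithmetic in build_hmat_lb() can be checked by hand with a small standalone sketch. The helper names and LB_* constants below are hypothetical. The constant values assume the ACPI 6.3 data-type encoding (0-2 latencies, 3-5 bandwidths) that the patch's `data_type <= HMAT_LB_DATA_WRITE_LATENCY` test relies on. With 2 initiators and 3 targets, for instance, the structure is 32 + 8 + 12 + 12 = 64 bytes.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MiB (UINT64_C(1) << 20)

/* Assumed ACPI 6.3 "Data Type" encoding: 0-2 latencies, 3-5 bandwidths. */
enum {
    LB_ACCESS_LATENCY = 0,
    LB_READ_LATENCY = 1,
    LB_WRITE_LATENCY = 2,
    LB_ACCESS_BANDWIDTH = 3,
    LB_READ_BANDWIDTH = 4,
    LB_WRITE_BANDWIDTH = 5,
};

/* Bytes in one latency/bandwidth structure with s initiators, t targets. */
static uint32_t lb_length(uint32_t s, uint32_t t)
{
    return 32            /* header up to and including Entry Base Unit */
         + 4 * s         /* Initiator Proximity Domain List */
         + 4 * t         /* Target Proximity Domain List */
         + 2 * s * t;    /* one 16-bit entry per (initiator, target) pair */
}

/* Entry Base Unit as the patch writes it: ns -> ps, bytes -> MiB. */
static uint64_t lb_entry_base_unit(int data_type, uint64_t base)
{
    if (data_type <= LB_WRITE_LATENCY) {
        return base * 1000;
    }
    return base / SKETCH_MiB;
}

/* Position of the (initiator, target) entry in the row-major matrix. */
static uint32_t lb_entry_index(uint32_t initiator, uint32_t target,
                               uint32_t num_target)
{
    assert(target < num_target);
    return initiator * num_target + target;
}
```

Because lb_entry_index() ranges over num_initiator * num_target slots, any buffer that collects the entries before they are appended needs that many 16-bit elements.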

Re: [PATCH v16 08/14] numa: Extend CLI to provide memory latency and bandwidth information

2019-11-20 Thread Tao Xu


On 11/20/2019 8:56 PM, Igor Mammedov wrote:

On Wed, 20 Nov 2019 15:55:04 +0800
Tao Xu  wrote:


On 11/19/2019 7:03 PM, Igor Mammedov wrote:

On Fri, 15 Nov 2019 15:53:46 +0800
Tao Xu  wrote:
   

From: Liu Jingqi 

Add -numa hmat-lb option to provide System Locality Latency and
Bandwidth Information. These memory attributes help to build
System Locality Latency and Bandwidth Information Structure(s)
in ACPI Heterogeneous Memory Attribute Table (HMAT).

Signed-off-by: Liu Jingqi 
Signed-off-by: Tao Xu 


looks good to me, so

Reviewed-by: Igor Mammedov 


PS:
also see question below
   

[...]

+
+hmat_lb->range_bitmap |= node->bandwidth;
+first_bit = ctz64(hmat_lb->range_bitmap);
+hmat_lb->base = UINT64_C(1) << first_bit;
+max_entry = node->bandwidth / hmat_lb->base;
+last_bit = 64 - clz64(hmat_lb->range_bitmap);
+
+/*
+ * For bandwidth, first_bit record the base unit of bandwidth bits,
+ * last_bit record the last bit of the max bandwidth. The max compressed
+ * bandwidth should be less than 0xFFFF (UINT16_MAX)
+ */
+if ((last_bit - first_bit) > UINT16_BITS || max_entry >= UINT16_MAX) {

 ^^^
what bandwidth combination is going to trigger above condition?
   

If we only used (last_bit - first_bit) > UINT16_BITS, we couldn't trigger
an error when the max compressed bandwidth is 0xFFFF, because in that case
"last_bit - first_bit == UINT16_BITS". So I added "max_entry >=
UINT16_MAX" to catch 0xFFFF. For example:

Combination 1 (Error):
bandwidth1   = ... 1111 1111 1111 1110 ... (max_entry 32767)
range_bitmap = ... 1111 1111 1111 1110 ... (range is 15 bits)
bandwidth2   = ... 1111 1111 1111 1111 ... (max_entry 65535)
range_bitmap = ... 1111 1111 1111 1111 ... (range is 16 bits)

Combination 2 (Error):
bandwidth1   = ... 1111 1111 1111 1110 ... (max_entry 32767)
range_bitmap = ... 1111 1111 1111 1110 ... (range is 15 bits)
bandwidth2   = ...0001 1111 1111 1111 1110 ... (max_entry 65535)
range_bitmap = ...0001 1111 1111 1111 1110 ... (range is 16 bits)

Combination 3 (OK, because bandwidth1 will be compressed to 65534):
bandwidth1   = ... 1111 1111 1111 1110 ... (max_entry 32767)
range_bitmap = ... 1111 1111 1111 1110 ... (range is 15 bits)
bandwidth2   = ... 0111 1111 1111 1111 ... (max_entry 32767)
range_bitmap = ... 1111 1111 1111 1111 ... (range is 16 bits)

Combination 4 (Error):
bandwidth1   = ... 1111 1111 1111 1111 ... (max_entry 65535)
range_bitmap = ... 1111 1111 1111 1111 ... (range is 16 bits)


ok, I'd use max/min possible values in bios-tables-test,
to make sure that we are testing whole range and would be able
to detect a error in case the valid ranges regressed (shrink)
and x-fail tests I've asked for in QMP test should detect
error other way around.


OK I will add these tests.
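Tao's four combinations can be replayed with a standalone version of the check. This sketch is not the patch itself: the BwState struct and bw_add() are hypothetical, GCC/Clang's __builtin_ctzll()/__builtin_clzll() stand in for QEMU's ctz64()/clz64(), and UINT16_BITS is assumed to be 16.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define UINT16_BITS 16   /* assumed value of the constant used in the patch */

/* Hypothetical stand-in for the bandwidth state kept in HMAT_LB_Info. */
typedef struct {
    uint64_t range_bitmap;  /* OR of every bandwidth value added so far */
    uint64_t base;          /* lowest set bit of range_bitmap */
} BwState;

/*
 * Add one bandwidth value and report whether it can still be encoded as a
 * 16-bit multiple of the common base (same condition as the patch).
 */
static bool bw_add(BwState *s, uint64_t bandwidth)
{
    int first_bit, last_bit;
    uint64_t max_entry;

    assert(bandwidth != 0);   /* ctz/clz are undefined for 0 */
    s->range_bitmap |= bandwidth;
    first_bit = __builtin_ctzll(s->range_bitmap);
    s->base = UINT64_C(1) << first_bit;
    max_entry = bandwidth / s->base;
    last_bit = 64 - __builtin_clzll(s->range_bitmap);

    return !((last_bit - first_bit) > UINT16_BITS || max_entry >= UINT16_MAX);
}
```

With the values aligned to bit 0, Combination 1 is bw_add(0xFFFE) followed by bw_add(0xFFFF). The first call succeeds (max_entry 32767), and the second fails because max_entry reaches 65535 even though the range is exactly 16 bits. Combination 3, bw_add(0xFFFE) then bw_add(0x7FFF), succeeds with base 1.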

[PATCH 2/5] vfio/pci: Split vfio_intx_update()

2019-11-20 Thread David Gibson

This splits the vfio_intx_update() function into two parts:
vfio_intx_update() itself, which now takes the new routing as an argument
and does the actual reconnection with the KVM irqchip, and
vfio_intx_routing_notifier(), which handles calls from the PCI device's
INTx routing notifier and calls vfio_intx_update() when necessary.  This
will make adding support for the irqchip change notifier easier.

Cc: Alex Williamson 
Cc: Alexey Kardashevskiy 

Signed-off-by: David Gibson 
---
 hw/vfio/pci.c | 39 ++-
 1 file changed, 22 insertions(+), 17 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 0c55883bba..521289aa7d 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -216,30 +216,18 @@ static void vfio_intx_disable_kvm(VFIOPCIDevice *vdev)
 #endif
 }
 
-static void vfio_intx_update(PCIDevice *pdev)
+static void vfio_intx_update(VFIOPCIDevice *vdev, PCIINTxRoute *route)
 {
-VFIOPCIDevice *vdev = PCI_VFIO(pdev);
-PCIINTxRoute route;
 Error *err = NULL;
 
-if (vdev->interrupt != VFIO_INT_INTx) {
-return;
-}
-
-route = pci_device_route_intx_to_irq(&vdev->pdev, vdev->intx.pin);
-
-if (!pci_intx_route_changed(&vdev->intx.route, &route)) {
-return; /* Nothing changed */
-}
-
 trace_vfio_intx_update(vdev->vbasedev.name,
-   vdev->intx.route.irq, route.irq);
+   vdev->intx.route.irq, route->irq);
 
 vfio_intx_disable_kvm(vdev);
 
-vdev->intx.route = route;
+vdev->intx.route = *route;
 
-if (route.mode != PCI_INTX_ENABLED) {
+if (route->mode != PCI_INTX_ENABLED) {
 return;
 }
 
@@ -252,6 +240,22 @@ static void vfio_intx_update(PCIDevice *pdev)
 vfio_intx_eoi(&vdev->vbasedev);
 }
 
+static void vfio_intx_routing_notifier(PCIDevice *pdev)
+{
+VFIOPCIDevice *vdev = PCI_VFIO(pdev);
+PCIINTxRoute route;
+
+if (vdev->interrupt != VFIO_INT_INTx) {
+return;
+}
+
+route = pci_device_route_intx_to_irq(&vdev->pdev, vdev->intx.pin);
+
+if (pci_intx_route_changed(&vdev->intx.route, &route)) {
+vfio_intx_update(vdev, &route);
+}
+}
+
 static int vfio_intx_enable(VFIOPCIDevice *vdev, Error **errp)
 {
 uint8_t pin = vfio_pci_read_config(&vdev->pdev, PCI_INTERRUPT_PIN, 1);
@@ -2967,7 +2971,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 if (vfio_pci_read_config(&vdev->pdev, PCI_INTERRUPT_PIN, 1)) {
 vdev->intx.mmap_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL,
   vfio_intx_mmap_enable, vdev);
-pci_device_set_intx_routing_notifier(&vdev->pdev, vfio_intx_update);
+pci_device_set_intx_routing_notifier(&vdev->pdev,
+ vfio_intx_routing_notifier);
 ret = vfio_intx_enable(vdev, errp);
 if (ret) {
 goto out_teardown;
-- 
2.23.0
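The shape of this split can be shown with simplified stand-ins. In this hypothetical sketch, Route and Dev replace PCIINTxRoute and VFIOPCIDevice, and the update function only records the new route instead of reconnecting to the KVM irqchip. The point is the division of labour: the routing notifier filters and compares, while the update function acts unconditionally. That lets a second caller, such as an irqchip change notifier, reuse it with the current route.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for PCIINTxRoute and VFIOPCIDevice. */
typedef struct {
    int mode;   /* e.g. enabled/disabled */
    int irq;
} Route;

typedef struct {
    bool intx_active;   /* plays the role of interrupt == VFIO_INT_INTx */
    Route route;        /* currently connected route */
    int updates;        /* how many times we reconnected */
} Dev;

static bool route_changed(const Route *old, const Route *nr)
{
    return old->mode != nr->mode || old->irq != nr->irq;
}

/* Unconditional reconnection; here it only records the route and counts. */
static void intx_update(Dev *d, const Route *route)
{
    assert(d != NULL && route != NULL);
    d->route = *route;
    d->updates++;
}

/* Routing notifier: ignore if INTx is not in use, update only on change. */
static void intx_routing_notifier(Dev *d, Route new_route)
{
    if (!d->intx_active) {
        return;
    }
    if (route_changed(&d->route, &new_route)) {
        intx_update(d, &new_route);
    }
}

/* A second, hypothetical caller: the irqchip changed but routing did not. */
static void intx_irqchip_changed(Dev *d)
{
    if (d->intx_active) {
        intx_update(d, &d->route);
    }
}
```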

[PATCH 1/5] kvm: Introduce KVM irqchip change notifier

2019-11-20 Thread David Gibson

Awareness of an in-kernel irqchip is usually local to the machine and its
top-level interrupt controller.  However, in a few cases other things need
to know about it.  In particular, vfio devices need this in order to
accelerate interrupt delivery.

If interrupt routing is changed, such devices may need to readjust their
connection to the KVM irqchip.  pci_bus_fire_intx_routing_notifier() exists
to do just this.

However, for the pseries machine type we have a situation where the routing
remains constant but the top-level irq chip itself is changed.  This occurs
because of PAPR feature negotiation which allows the guest to decide
between the older XICS and newer XIVE irq chip models (both of which are
paravirtualized).

To allow devices like vfio to adjust to this change, introduce a new
notifier for this purpose, kvm_irqchip_change_notify().
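As a rough model of the mechanism, the sketch below implements a tiny notifier list in plain C. It is a simplified, hypothetical stand-in for QEMU's Notifier/NotifierList (all names prefixed sketch_), not the util/notify.c code. Because it uses a singly linked list, removal needs the list head, unlike notifier_remove(). SketchDevice plays the role of a vfio device that wants to hear about irqchip changes.

```c
#include <assert.h>
#include <stddef.h>

typedef struct SketchNotifier SketchNotifier;
struct SketchNotifier {
    void (*notify)(SketchNotifier *n, void *data);
    SketchNotifier *next;
};

typedef struct {
    SketchNotifier *head;
} SketchNotifierList;

static void sketch_list_add(SketchNotifierList *l, SketchNotifier *n)
{
    assert(n->notify != NULL);
    n->next = l->head;            /* insert at the head */
    l->head = n;
}

static void sketch_list_remove(SketchNotifierList *l, SketchNotifier *n)
{
    for (SketchNotifier **p = &l->head; *p != NULL; p = &(*p)->next) {
        if (*p == n) {
            *p = n->next;
            n->next = NULL;
            return;
        }
    }
}

/* Call every registered notifier; read next first so a callback may remove itself. */
static void sketch_list_notify(SketchNotifierList *l, void *data)
{
    SketchNotifier *n = l->head;
    while (n != NULL) {
        SketchNotifier *next = n->next;
        n->notify(n, data);
        n = next;
    }
}

/* Example listener: the notifier is the first member, so the cast is valid. */
typedef struct {
    SketchNotifier irqchip_change;
    int changes;
} SketchDevice;

static void sketch_device_irqchip_changed(SketchNotifier *n, void *data)
{
    (void)data;
    ((SketchDevice *)n)->changes++;
}
```

In this model, registering a device corresponds to kvm_irqchip_add_change_notifier(). A machine switching between XICS and XIVE would call the equivalent of kvm_irqchip_change_notify(), which, like the patch, passes NULL as the data argument.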

Cc: Alex Williamson 
Cc: Alexey Kardashevskiy 

Signed-off-by: David Gibson 
---
 accel/kvm/kvm-all.c| 18 ++
 accel/stubs/kvm-stub.c | 12 
 include/sysemu/kvm.h   |  5 +
 3 files changed, 35 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 140b0bd8f6..ca00daa2f5 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -149,6 +149,9 @@ static const KVMCapabilityInfo kvm_required_capabilites[] = {
 KVM_CAP_LAST_INFO
 };
 
+static NotifierList kvm_irqchip_change_notifiers =
+NOTIFIER_LIST_INITIALIZER(kvm_irqchip_change_notifiers);
+
 #define kvm_slots_lock(kml)  qemu_mutex_lock(&(kml)->slots_lock)
 #define kvm_slots_unlock(kml)qemu_mutex_unlock(&(kml)->slots_lock)
 
@@ -1396,6 +1399,21 @@ void kvm_irqchip_release_virq(KVMState *s, int virq)
 trace_kvm_irqchip_release_virq(virq);
 }
 
+void kvm_irqchip_add_change_notifier(Notifier *n)
+{
+    notifier_list_add(&kvm_irqchip_change_notifiers, n);
+}
+
+void kvm_irqchip_remove_change_notifier(Notifier *n)
+{
+    notifier_remove(n);
+}
+
+void kvm_irqchip_change_notify(void)
+{
+    notifier_list_notify(&kvm_irqchip_change_notifiers, NULL);
+}
+
 static unsigned int kvm_hash_msi(uint32_t data)
 {
 /* This is optimized for IA32 MSI layout. However, no other arch shall
diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
index 6feb66ed80..82f118d2df 100644
--- a/accel/stubs/kvm-stub.c
+++ b/accel/stubs/kvm-stub.c
@@ -138,6 +138,18 @@ void kvm_irqchip_commit_routes(KVMState *s)
 {
 }
 
+void kvm_irqchip_add_change_notifier(Notifier *n)
+{
+}
+
+void kvm_irqchip_remove_change_notifier(Notifier *n)
+{
+}
+
+void kvm_irqchip_change_notify(void)
+{
+}
+
 int kvm_irqchip_add_adapter_route(KVMState *s, AdapterInfo *adapter)
 {
 return -ENOSYS;
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 9d143282bc..9fe233b9bf 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -201,6 +201,7 @@ typedef struct KVMCapabilityInfo {
 struct KVMState;
 typedef struct KVMState KVMState;
 extern KVMState *kvm_state;
+typedef struct Notifier Notifier;
 
 /* external API */
 
@@ -401,6 +402,10 @@ int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg);
 
 void kvm_irqchip_add_irq_route(KVMState *s, int gsi, int irqchip, int pin);
 
+void kvm_irqchip_add_change_notifier(Notifier *n);
+void kvm_irqchip_remove_change_notifier(Notifier *n);
+void kvm_irqchip_change_notify(void);
+
 void kvm_get_apic_state(DeviceState *d, struct kvm_lapic_state *kapic);
 
 struct kvm_guest_debug;
-- 
2.23.0

[PATCH 4/5] spapr: Handle irq backend changes with VFIO PCI devices

2019-11-20 Thread David Gibson

The pseries machine type can have one of two different interrupt controllers in
use depending on feature negotiation with the guest.  Usually this is
invisible to devices, because they route to a common set of qemu_irqs which
in turn dispatch to the correct back end.

VFIO passthrough devices, however, wire themselves up directly to the KVM
irqchip for performance, which means they are affected by this change in
interrupt controller.  To get them to adjust correctly for the change in
irqchip, we need to fire the kvm irqchip change notifier.

Cc: Alex Williamson 
Cc: Alexey Kardashevskiy 

Signed-off-by: David Gibson 
---
 hw/ppc/spapr_irq.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index 168044be85..1d27034962 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -508,6 +508,12 @@ static void set_active_intc(SpaprMachineState *spapr,
 }
 
 spapr->active_intc = new_intc;
+
+/*
+ * We've changed the kernel irqchip, let VFIO devices know they
+ * need to readjust.
+ */
+kvm_irqchip_change_notify();
 }
 
 void spapr_irq_update_active_intc(SpaprMachineState *spapr)
-- 
2.23.0

[PATCH 3/5] vfio/pci: Respond to KVM irqchip change notifier

2019-11-20 Thread David Gibson

VFIO PCI devices already respond to the PCI INTx routing notifier, in order
to update kernel irqchip mappings when routing is updated.  However, this
won't handle the case where the irqchip itself is replaced by a different
model while retaining the same routing.  This case can happen on
the pseries machine type due to PAPR feature negotiation.

To handle that case, add a handler for the irqchip change notifier, which
does much the same thing as the routing notifier, but is unconditional,
rather than being a no-op when the routing hasn't changed.
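The new handler recovers its `VFIOPCIDevice` from the embedded `Notifier` with `container_of()`. Here is a minimal, self-contained sketch of that idiom. `FakeVFIODevice` and `fake_irqchip_change()` are hypothetical. The `container_of` shown is a simplified version that lacks the compile-time type check a `typeof`-based macro can provide.

```c
#include <assert.h>
#include <stddef.h>

/* Recover the enclosing struct from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

typedef struct Notifier {
    void (*notify)(struct Notifier *n, void *data);
} Notifier;

/* Hypothetical device holding only what the illustration needs. */
typedef struct FakeVFIODevice {
    int intx_route_irq;               /* stand-in for vdev->intx.route */
    int applied_irq;                  /* route last pushed to the irqchip */
    int updates;                      /* how often INTx was re-wired */
    Notifier irqchip_change_notifier; /* embedded, not a pointer */
} FakeVFIODevice;

static void fake_irqchip_change(Notifier *notify, void *data)
{
    FakeVFIODevice *vdev = container_of(notify, FakeVFIODevice,
                                        irqchip_change_notifier);
    (void)data;
    /* Unconditional: the route is unchanged but the irqchip is new. */
    vdev->applied_irq = vdev->intx_route_irq;
    vdev->updates++;
}
```

Embedding the `Notifier` in the device means no extra allocation or back-pointer is needed. The callback gets to its device purely from the member's offset.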

Cc: Alex Williamson 
Cc: Alexey Kardashevskiy 

Signed-off-by: David Gibson 
---
 hw/vfio/pci.c | 23 ++-
 hw/vfio/pci.h |  1 +
 2 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 521289aa7d..95478c2c55 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -256,6 +256,14 @@ static void vfio_intx_routing_notifier(PCIDevice *pdev)
 }
 }
 
+static void vfio_irqchip_change(Notifier *notify, void *data)
+{
+    VFIOPCIDevice *vdev = container_of(notify, VFIOPCIDevice,
+                                       irqchip_change_notifier);
+
+    vfio_intx_update(vdev, &vdev->intx.route);
+}
+
 static int vfio_intx_enable(VFIOPCIDevice *vdev, Error **errp)
 {
 uint8_t pin = vfio_pci_read_config(&vdev->pdev, PCI_INTERRUPT_PIN, 1);
@@ -2973,16 +2981,18 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
   vfio_intx_mmap_enable, vdev);
 pci_device_set_intx_routing_notifier(&vdev->pdev,
  vfio_intx_routing_notifier);
+vdev->irqchip_change_notifier.notify = vfio_irqchip_change;
+kvm_irqchip_add_change_notifier(&vdev->irqchip_change_notifier);
 ret = vfio_intx_enable(vdev, errp);
 if (ret) {
-goto out_teardown;
+goto out_deregister;
 }
 }
 
 if (vdev->display != ON_OFF_AUTO_OFF) {
 ret = vfio_display_probe(vdev, errp);
 if (ret) {
-goto out_teardown;
+goto out_deregister;
 }
 }
 if (vdev->enable_ramfb && vdev->dpy == NULL) {
@@ -2992,11 +3002,11 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 if (vdev->display_xres || vdev->display_yres) {
 if (vdev->dpy == NULL) {
 error_setg(errp, "xres and yres properties require display=on");
-goto out_teardown;
+goto out_deregister;
 }
 if (vdev->dpy->edid_regs == NULL) {
 error_setg(errp, "xres and yres properties need edid support");
-goto out_teardown;
+goto out_deregister;
 }
 }
 
@@ -3020,8 +3030,10 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 
 return;
 
-out_teardown:
+out_deregister:
 pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
+kvm_irqchip_remove_change_notifier(&vdev->irqchip_change_notifier);
+out_teardown:
 vfio_teardown_msi(vdev);
 vfio_bars_exit(vdev);
 error:
@@ -3064,6 +3076,7 @@ static void vfio_exitfn(PCIDevice *pdev)
 vfio_unregister_req_notifier(vdev);
 vfio_unregister_err_notifier(vdev);
 pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
+kvm_irqchip_remove_change_notifier(&vdev->irqchip_change_notifier);
 vfio_disable_interrupts(vdev);
 if (vdev->intx.mmap_timer) {
 timer_free(vdev->intx.mmap_timer);
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index b329d50338..35626cd63e 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -169,6 +169,7 @@ typedef struct VFIOPCIDevice {
 bool enable_ramfb;
 VFIODisplay *dpy;
 Error *migration_blocker;
+Notifier irqchip_change_notifier;
 } VFIOPCIDevice;
 
 uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
-- 
2.23.0

[PATCH 0/5] vfio/spapr: Handle changes of master irq chip for VFIO devices

2019-11-20 Thread David Gibson

Due to the way feature negotiation works in PAPR (which is a
paravirtualized platform), we can end up changing the global irq chip
at runtime, including its KVM-accelerated model.  That causes
complications for VFIO devices with INTx, which wire themselves up
directly to the KVM irqchip for performance.

This series introduces a new notifier to let VFIO devices (and
anything else that needs to in the future) know about changes to the
master irqchip.  It modifies VFIO to respond to the notifier,
reconnecting itself to the new KVM irqchip as necessary.

In particular this removes a misleading (though not wholly inaccurate)
warning that occurs when using VFIO devices on a pseries machine type
guest.

Open question: should this go into qemu-4.2 or wait until 5.0?  It
has medium complexity / intrusiveness, but it *is* a bugfix that I
can't see a simpler way to fix.  It's effectively a regression from
qemu-4.0 to qemu-4.1 (because that release enabled XIVE support by
default), although not from 4.1 to 4.2.

Changes since RFC:
 * Fixed some incorrect error paths pointed out by aw in 3/5
 * 5/5 had some problems previously, but they have been obsoleted by
   other changes merged in the meantime
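The error-path fix in 3/5 relies on ordered cleanup labels. Failures after the notifiers are registered jump to `out_deregister`, which then falls through to `out_teardown`. Here is a hypothetical, self-contained sketch of that pattern. All names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags recording which setup/cleanup steps ran. */
static bool registered, deregister_ran, teardown_ran;

static void register_notifiers(void)   { registered = true; }
static void deregister_notifiers(void) { registered = false; deregister_ran = true; }
static void teardown(void)             { teardown_ran = true; }

/* fail_at: 0 = succeed, 1 = fail before registering, 2 = fail after. */
static int realize(int fail_at)
{
    registered = deregister_ran = teardown_ran = false;

    if (fail_at == 1) {
        goto out_teardown;          /* nothing registered yet */
    }
    register_notifiers();
    if (fail_at == 2) {
        goto out_deregister;        /* registration must be undone too */
    }
    return 0;

out_deregister:
    deregister_notifiers();
    /* fall through: each label undoes one more earlier setup step */
out_teardown:
    teardown();
    return -1;
}
```

Jumping to the earlier label from a failure before registration would call the deregister step on something never registered. Ordering the labels in reverse setup order avoids that.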

David Gibson (5):
  kvm: Introduce KVM irqchip change notifier
  vfio/pci: Split vfio_intx_update()
  vfio/pci: Respond to KVM irqchip change notifier
  spapr: Handle irq backend changes with VFIO PCI devices
  spapr: Work around spurious warnings from vfio INTx initialization

 accel/kvm/kvm-all.c| 18 
 accel/stubs/kvm-stub.c | 12 
 hw/ppc/spapr_irq.c | 17 +++-
 hw/vfio/pci.c  | 62 +++---
 hw/vfio/pci.h  |  1 +
 include/sysemu/kvm.h   |  5 
 6 files changed, 92 insertions(+), 23 deletions(-)

-- 
2.23.0

[PATCH 5/5] spapr: Work around spurious warnings from vfio INTx initialization

2019-11-20 Thread David Gibson

Traditional PCI INTx for vfio devices can only perform well if using
an in-kernel irqchip.  Therefore, vfio_intx_update() issues a warning
if an in-kernel irqchip is not available.

We usually do have an in-kernel irqchip available for pseries machines
on POWER hosts.  However, because the platform allows feature
negotiation of what interrupt controller model to use, we don't
currently initialize it until machine reset.  vfio_intx_update() is
called (first) from vfio_realize() before that, so it can issue a
spurious warning, even if we will have an in-kernel irqchip by the
time we need it.

To work around this, make a call to spapr_irq_update_active_intc() from
spapr_irq_init(), which is called at machine realize time, before the
vfio realize.  This call is largely superseded by the later
call at reset time, but it serves to suppress the spurious warning
from VFIO.

Cc: Alex Williamson 
Cc: Alexey Kardashevskiy 

Signed-off-by: David Gibson 
---
 hw/ppc/spapr_irq.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index 1d27034962..d6bb7fd2d6 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -373,6 +373,14 @@ void spapr_irq_init(SpaprMachineState *spapr, Error **errp)
 
 spapr->qirqs = qemu_allocate_irqs(spapr_set_irq, spapr,
   smc->nr_xirqs + SPAPR_XIRQ_BASE);
+
+/*
+ * Mostly we don't actually need this until reset, except that not
+ * having this set up can cause VFIO devices to issue a
+ * false-positive warning during realize(), because they don't yet
+ * have an in-kernel irq chip.
+ */
+spapr_irq_update_active_intc(spapr);
 }
 
 int spapr_irq_claim(SpaprMachineState *spapr, int irq, bool lsi, Error **errp)
@@ -528,7 +536,8 @@ void spapr_irq_update_active_intc(SpaprMachineState *spapr)
  * this.
  */
 new_intc = SPAPR_INTC(spapr->xive);
-} else if (spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+} else if (spapr->ov5_cas
+   && spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
 new_intc = SPAPR_INTC(spapr->xive);
 } else {
 new_intc = SPAPR_INTC(spapr->ics);
-- 
2.23.0

Re: [PATCH v16 13/14] tests/numa: Add case for QMP build HMAT

2019-11-20 Thread Tao Xu


On 11/20/2019 8:32 PM, Igor Mammedov wrote:

On Fri, 15 Nov 2019 15:53:51 +0800
Tao Xu  wrote:


Check configuring HMAT usecase

Suggested-by: Igor Mammedov 
Signed-off-by: Tao Xu 
---

New patch in v16.
---
  tests/numa-test.c | 51 +++
  1 file changed, 51 insertions(+)


I'd also add X-FAIL variants here, to test fail conditions.
Taking in account that QMP interface returns error without
affecting QEMU state, you can do it within one test case without
restarting it on every fail scenario.
(just add appropriate comments so reader would know that you are
testing this and that failure path)

So I'd first test x-fail variants and then finish test with
valid configuration.



Thank you for your suggestion. I will add it in next version.

[PATCH v3 4/4] target/arm: Add support for DC CVAP & DC CVADP ins

2019-11-20 Thread Beata Michalska

ARMv8.2 introduced support for Data Cache Clean instructions
to PoP (point-of-persistence), DC CVAP, and to PoDP (point-of-deep-persistence),
DC CVADP. Both specify conceptual points in a memory system where all writes
that reach them are considered persistent.
This implementation treats both as the same point, so there is no
distinction between the two. If no such point is available (there is no backing
store for the given memory), both result in a Data Cache Clean to the point of
coherency. Otherwise, the specified range is synced with the backing store.
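The cache-line arithmetic used by `dccvap_writefn()` below can be checked in isolation. CTR_EL0.DminLine (bits [19:16]) is log2 of the number of 4-byte words in the smallest data cache line, so the line size in bytes is `4 << DminLine`. The target address is then rounded down to a line boundary. The following is a standalone sketch, with helper names that are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest data cache line size in bytes, from a CTR_EL0 value. */
static uint64_t dcache_line_size(uint64_t ctr)
{
    return UINT64_C(4) << ((ctr >> 16) & 0xF);
}

/* Round an address down to the start of its cache line
 * (line_size must be a power of two). */
static uint64_t align_to_line(uint64_t vaddr, uint64_t line_size)
{
    return vaddr & ~(line_size - 1);
}
```

For example, a CTR_EL0 value with DminLine = 4 gives 64-byte lines, and address 0x1234 then falls in the line starting at 0x1200.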

Signed-off-by: Beata Michalska 
Reviewed-by: Richard Henderson 
---
 linux-user/elfload.c |  2 ++
 target/arm/cpu.h | 10 ++
 target/arm/cpu64.c   |  1 +
 target/arm/helper.c  | 56 
 4 files changed, 69 insertions(+)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index f6693e5..07b16cc 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -656,6 +656,7 @@ static uint32_t get_elf_hwcap(void)
 GET_FEATURE_ID(aa64_jscvt, ARM_HWCAP_A64_JSCVT);
 GET_FEATURE_ID(aa64_sb, ARM_HWCAP_A64_SB);
 GET_FEATURE_ID(aa64_condm_4, ARM_HWCAP_A64_FLAGM);
+GET_FEATURE_ID(aa64_dcpop, ARM_HWCAP_A64_DCPOP);
 
 return hwcaps;
 }
@@ -665,6 +666,7 @@ static uint32_t get_elf_hwcap2(void)
 ARMCPU *cpu = ARM_CPU(thread_cpu);
 uint32_t hwcaps = 0;
 
+GET_FEATURE_ID(aa64_dcpodp, ARM_HWCAP2_A64_DCPODP);
 GET_FEATURE_ID(aa64_condm_5, ARM_HWCAP2_A64_FLAGM2);
 GET_FEATURE_ID(aa64_frint, ARM_HWCAP2_A64_FRINT);
 
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 83a809d..c3c0bf5 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -3616,6 +3616,16 @@ static inline bool isar_feature_aa64_frint(const ARMISARegisters *id)
 return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, FRINTTS) != 0;
 }
 
+static inline bool isar_feature_aa64_dcpop(const ARMISARegisters *id)
+{
+return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, DPB) != 0;
+}
+
+static inline bool isar_feature_aa64_dcpodp(const ARMISARegisters *id)
+{
+return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, DPB) >= 2;
+}
+
 static inline bool isar_feature_aa64_fp16(const ARMISARegisters *id)
 {
 /* We always set the AdvSIMD and FP fields identically wrt FP16.  */
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index a39d6fc..61fd0ad 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -646,6 +646,7 @@ static void aarch64_max_initfn(Object *obj)
 cpu->isar.id_aa64isar0 = t;
 
 t = cpu->isar.id_aa64isar1;
+t = FIELD_DP64(t, ID_AA64ISAR1, DPB, 2);
 t = FIELD_DP64(t, ID_AA64ISAR1, JSCVT, 1);
 t = FIELD_DP64(t, ID_AA64ISAR1, FCMA, 1);
 t = FIELD_DP64(t, ID_AA64ISAR1, APA, 1); /* PAuth, architected only */
diff --git a/target/arm/helper.c b/target/arm/helper.c
index a089fb5..f90f3ec 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -5929,6 +5929,52 @@ static const ARMCPRegInfo rndr_reginfo[] = {
   .access = PL0_R, .readfn = rndr_readfn },
 REGINFO_SENTINEL
 };
+
+#ifndef CONFIG_USER_ONLY
+static void dccvap_writefn(CPUARMState *env, const ARMCPRegInfo *opaque,
+                           uint64_t value)
+{
+    ARMCPU *cpu = env_archcpu(env);
+    /* CTR_EL0 System register -> DminLine, bits [19:16] */
+    uint64_t dline_size = 4 << ((cpu->ctr >> 16) & 0xF);
+    uint64_t vaddr_in = (uint64_t) value;
+    uint64_t vaddr = vaddr_in & ~(dline_size - 1);
+    void *haddr;
+    int mem_idx = cpu_mmu_index(env, false);
+
+    /* This won't be crossing page boundaries */
+    haddr = probe_read(env, vaddr, dline_size, mem_idx, GETPC());
+    if (haddr) {
+
+        ram_addr_t offset;
+        MemoryRegion *mr;
+
+        /* RCU lock is already being held */
+        mr = memory_region_from_host(haddr, &offset);
+
+        if (mr) {
+            memory_region_do_writeback(mr, offset, dline_size);
+        }
+    }
+}
+
+static const ARMCPRegInfo dcpop_reg[] = {
+{ .name = "DC_CVAP", .state = ARM_CP_STATE_AA64,
+  .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 12, .opc2 = 1,
+  .access = PL0_W, .type = ARM_CP_NO_RAW | ARM_CP_SUPPRESS_TB_END,
+  .accessfn = aa64_cacheop_access, .writefn = dccvap_writefn },
+REGINFO_SENTINEL
+};
+
+static const ARMCPRegInfo dcpodp_reg[] = {
+{ .name = "DC_CVADP", .state = ARM_CP_STATE_AA64,
+  .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 13, .opc2 = 1,
+  .access = PL0_W, .type = ARM_CP_NO_RAW | ARM_CP_SUPPRESS_TB_END,
+  .accessfn = aa64_cacheop_access, .writefn = dccvap_writefn },
+REGINFO_SENTINEL
+};
+#endif /*CONFIG_USER_ONLY*/
+
 #endif
 
 static CPAccessResult access_predinv(CPUARMState *env, const ARMCPRegInfo *ri,
@@ -6889,6 +6935,16 @@ void register_cp_regs_for_features(ARMCPU *cpu)
 if (cpu_isar_feature(aa64_rndr, cpu)) {
 define_arm_cp_regs(cpu, rndr_reginfo);
 }
+#ifndef CONFIG_USER_ONLY
+/* Data Cache clean instructions up to PoP */
+if (cpu_

[PATCH v3 3/4] migration: ram: Switch to ram block writeback

2019-11-20 Thread Beata Michalska

Switch to ram block writeback for pmem migration.

Signed-off-by: Beata Michalska 
Reviewed-by: Richard Henderson 
Reviewed-by: Alex Bennée 
Acked-by: Dr. David Alan Gilbert 
---
 migration/ram.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 5078f94..38070f1 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -33,7 +33,6 @@
 #include "qemu/bitops.h"
 #include "qemu/bitmap.h"
 #include "qemu/main-loop.h"
-#include "qemu/pmem.h"
 #include "xbzrle.h"
 #include "ram.h"
 #include "migration.h"
@@ -3981,9 +3980,7 @@ static int ram_load_cleanup(void *opaque)
 RAMBlock *rb;
 
 RAMBLOCK_FOREACH_NOT_IGNORED(rb) {
-if (ramblock_is_pmem(rb)) {
-pmem_persist(rb->host, rb->used_length);
-}
+qemu_ram_block_writeback(rb);
 }
 
 xbzrle_load_cleanup();
-- 
2.7.4

[PATCH v3 2/4] Memory: Enable writeback for given memory region

2019-11-20 Thread Beata Michalska

Add an option to trigger memory writeback to sync a given memory region
with the corresponding backing store, if one is available.
This extends the support for persistent memory, allowing on-demand syncing.
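The `msync()` fallback used below, for file-backed RAM without libpmem, can be illustrated with plain POSIX calls. This is a hypothetical helper for illustration, not QEMU's `qemu_msync()`. It writes a pattern into a shared, file-backed mapping and forces it back to the file with `MS_SYNC`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 0 if the pattern was written, synced and is visible in the file. */
static int write_and_sync(size_t len)
{
    char path[] = "/tmp/writeback-XXXXXX";
    int fd = mkstemp(path);
    int ret = -1;

    if (fd < 0) {
        return -1;
    }
    unlink(path);  /* the file is freed once the descriptor is closed */

    if (ftruncate(fd, (off_t)len) == 0) {
        unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                MAP_SHARED, fd, 0);
        if (p != MAP_FAILED) {
            memset(p, 0xab, len);
            /* mmap() returns a page-aligned address, as msync() requires */
            if (msync(p, len, MS_SYNC) == 0) {
                unsigned char c = 0;
                /* sanity check: the file itself holds the pattern */
                if (pread(fd, &c, 1, (off_t)(len - 1)) == 1 && c == 0xab) {
                    ret = 0;
                }
            }
            munmap(p, len);
        }
    }
    close(fd);
    return ret;
}
```

An arbitrary guest range (as in `qemu_ram_writeback()`) need not start on a page boundary. A real wrapper would therefore have to align the start address down before calling `msync()`.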

Signed-off-by: Beata Michalska 
---
 exec.c  | 36 
 include/exec/memory.h   |  6 ++
 include/exec/ram_addr.h |  8 
 include/qemu/cutils.h   |  1 +
 memory.c| 12 
 util/cutils.c   | 38 ++
 6 files changed, 101 insertions(+)

diff --git a/exec.c b/exec.c
index ffdb518..a34c348 100644
--- a/exec.c
+++ b/exec.c
@@ -65,6 +65,8 @@
 #include "exec/ram_addr.h"
 #include "exec/log.h"
 
+#include "qemu/pmem.h"
+
 #include "migration/vmstate.h"
 
 #include "qemu/range.h"
@@ -2156,6 +2158,40 @@ int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp)
 return 0;
 }
 
+/*
+ * Trigger sync on the given ram block for range [start, start + length]
+ * with the backing store if one is available.
+ * Otherwise no-op.
+ * @Note: this is supposed to be a synchronous op.
+ */
+void qemu_ram_writeback(RAMBlock *block, ram_addr_t start, ram_addr_t length)
+{
+    void *addr = ramblock_ptr(block, start);
+
+    /* The requested range should fit within the block range */
+    g_assert((start + length) <= block->used_length);
+
+#ifdef CONFIG_LIBPMEM
+    /* The lack of support for pmem should not block the sync */
+    if (ramblock_is_pmem(block)) {
+        pmem_persist(addr, length);
+        return;
+    }
+#endif
+    if (block->fd >= 0) {
+        /**
+         * If there is no support for PMEM, or the memory has not been
+         * specified as persistent (or is not persistent), use msync().
+         * Less optimal, but it still achieves the same goal.
+         */
+        if (qemu_msync(addr, length, block->fd)) {
+            warn_report("%s: failed to sync memory range: start: "
+                        RAM_ADDR_FMT " length: " RAM_ADDR_FMT,
+                        __func__, start, length);
+        }
+    }
+}
+
 /* Called with ram_list.mutex held */
 static void dirty_memory_extend(ram_addr_t old_ram_size,
 ram_addr_t new_ram_size)
diff --git a/include/exec/memory.h b/include/exec/memory.h
index e499dc2..27a84e0 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1265,6 +1265,12 @@ void *memory_region_get_ram_ptr(MemoryRegion *mr);
  */
 void memory_region_ram_resize(MemoryRegion *mr, ram_addr_t newsize,
   Error **errp);
+/**
+ * memory_region_do_writeback: Trigger writeback for selected address range
+ * [addr, addr + size]
+ *
+ */
+void memory_region_do_writeback(MemoryRegion *mr, hwaddr addr, hwaddr size);
 
 /**
  * memory_region_set_log: Turn dirty logging on or off for a region.
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index bed0554..5adebb0 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -174,6 +174,14 @@ void qemu_ram_free(RAMBlock *block);
 
 int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp);
 
+void qemu_ram_writeback(RAMBlock *block, ram_addr_t start, ram_addr_t length);
+
+/* Write back the whole block of memory */
+static inline void qemu_ram_block_writeback(RAMBlock *block)
+{
+    qemu_ram_writeback(block, 0, block->used_length);
+}
+
 #define DIRTY_CLIENTS_ALL ((1 << DIRTY_MEMORY_NUM) - 1)
 #define DIRTY_CLIENTS_NOCODE  (DIRTY_CLIENTS_ALL & ~(1 << DIRTY_MEMORY_CODE))
 
diff --git a/include/qemu/cutils.h b/include/qemu/cutils.h
index b54c847..eb59852 100644
--- a/include/qemu/cutils.h
+++ b/include/qemu/cutils.h
@@ -130,6 +130,7 @@ const char *qemu_strchrnul(const char *s, int c);
 #endif
 time_t mktimegm(struct tm *tm);
 int qemu_fdatasync(int fd);
+int qemu_msync(void *addr, size_t length, int fd);
 int fcntl_setfl(int fd, int flag);
 int qemu_parse_fd(const char *param);
 int qemu_strtoi(const char *nptr, const char **endptr, int base,
diff --git a/memory.c b/memory.c
index 06484c2..0228cad 100644
--- a/memory.c
+++ b/memory.c
@@ -2207,6 +2207,18 @@ void memory_region_ram_resize(MemoryRegion *mr, ram_addr_t newsize, Error **errp
 qemu_ram_resize(mr->ram_block, newsize, errp);
 }
 
+
+void memory_region_do_writeback(MemoryRegion *mr, hwaddr addr, hwaddr size)
+{
+    /*
+     * Might be extended, if needed, to cover
+     * different types of memory regions
+     */
+    if (mr->ram_block && mr->dirty_log_mask) {
+        qemu_ram_writeback(mr->ram_block, addr, size);
+    }
+}
+
 /*
  * Call proper memory listeners about the change on the newly
  * added/removed CoalescedMemoryRange.
diff --git a/util/cutils.c b/util/cutils.c
index fd591ca..c76ed88 100644
--- a/util/cutils.c
+++ b/util/cutils.c
@@ -164,6 +164,44 @@ int qemu_fdatasync(int fd)
 #endif
 }
 
+/**
+ * Sync changes made to the memory mapped file back to the backing
+ * storage. For POSIX compliant systems this will fallback
+ * to regular msync call. Otherwis

[PATCH v3 1/4] tcg: cputlb: Add probe_read

2019-11-20 Thread Beata Michalska

Add probe_read alongside the write probing equivalent.

Signed-off-by: Beata Michalska 
Reviewed-by: Alex Bennée 
---
 include/exec/exec-all.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index d85e610..350c4b4 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -339,6 +339,12 @@ static inline void *probe_write(CPUArchState *env, target_ulong addr, int size,
 return probe_access(env, addr, size, MMU_DATA_STORE, mmu_idx, retaddr);
 }
 
+static inline void *probe_read(CPUArchState *env, target_ulong addr, int size,
+                               int mmu_idx, uintptr_t retaddr)
+{
+    return probe_access(env, addr, size, MMU_DATA_LOAD, mmu_idx, retaddr);
+}
+
 #define CODE_GEN_ALIGN   16 /* must be >= of the size of a icache line */
 
 /* Estimated block size for TB allocation.  */
-- 
2.7.4

[PATCH v3 0/4] target/arm: Support for Data Cache Clean up to PoP

2019-11-20 Thread Beata Michalska

ARMv8.2 introduced support for Data Cache Clean instructions to PoP
(point-of-persistence) and PoDP (point-of-deep-persistence):
ARMv8.2-DCCVAP & ARMv8.2-DCCVADP respectively.
This patch set adds support for emulating both, though there is no
distinction between the two points: the PoDP is assumed to represent
the same point of persistence as PoP. If there is no such point specified
for the considered memory system, both fall back to the DC CVAC instruction
(clean to the point of coherency).
The changes include adding probe_read for validating read memory
access, to allow verification of the mandatory read access for both cache
clean instructions, along with support for writeback of requested memory
regions through msync if it is available, falling back to fdatasync otherwise.
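Feature detection in patch 4/4 keys off the DPB field of ID_AA64ISAR1_EL1, bits [3:0]. A value of 1 advertises DC CVAP, and a value of 2 additionally advertises DC CVADP. Below is a standalone sketch of the same checks as `isar_feature_aa64_dcpop()`/`_dcpodp()`, operating on a raw register value. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ID_AA64ISAR1_EL1.DPB lives in bits [3:0]. */
static unsigned isar1_dpb(uint64_t id_aa64isar1)
{
    return (unsigned)(id_aa64isar1 & 0xF);
}

static bool has_dc_cvap(uint64_t isar1)  { return isar1_dpb(isar1) != 0; }
static bool has_dc_cvadp(uint64_t isar1) { return isar1_dpb(isar1) >= 2; }
```

Setting DPB to 2 in `aarch64_max_initfn()` therefore exposes both instructions, which is what the linux-user hwcap changes report as DCPOP and DCPODP.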

As the virt platform currently lacks support for NVDIMM,
the changes have been tested with [1] & [2]


[1] https://patchwork.kernel.org/cover/10830237/
[2] https://patchwork.kernel.org/project/qemu-devel/list/?series=159441

v3:
- Assert on invalid sync range for ram block
- Drop alignment handling from qemu_msync

v2:
- Moved the msync into a qemu wrapper with
  CONFIG_POSIX switch + additional comments
- Fixed length alignment
- Dropped treating the DC CVAP/CVADP as special case
  and moved those to conditional registration
- Dropped needless locking for grabbing mem region


Beata Michalska (4):
  tcg: cputlb: Add probe_read
  Memory: Enable writeback for given memory region
  migration: ram: Switch to ram block writeback
  target/arm: Add support for DC CVAP & DC CVADP ins

 exec.c  | 36 +++
 include/exec/exec-all.h |  6 ++
 include/exec/memory.h   |  6 ++
 include/exec/ram_addr.h |  8 +++
 include/qemu/cutils.h   |  1 +
 linux-user/elfload.c|  2 ++
 memory.c| 12 +++
 migration/ram.c |  5 +
 target/arm/cpu.h| 10 +
 target/arm/cpu64.c  |  1 +
 target/arm/helper.c | 56 +
 util/cutils.c   | 38 +
 12 files changed, 177 insertions(+), 4 deletions(-)

-- 
2.7.4

Re: [PATCH 0/6] qapi: Module fixes and cleanups

2019-11-20 Thread no-reply

Patchew URL: https://patchew.org/QEMU/20191120182551.23795-1-arm...@redhat.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

  TESTiotest-qcow2: 268
Failures: 192
Failed 1 of 108 iotests
make: *** [check-tests/check-block.sh] Error 1
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 662, in 
sys.exit(main())
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=7206e3a304c44c7187ff2b4aa7fff8d6', '-u', 
'1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', 
'-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 
'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
'/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-wk8w7tot/src/docker-src.2019-11-20-18.46.04.18266:/var/tmp/qemu:z,ro',
 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=7206e3a304c44c7187ff2b4aa7fff8d6
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-wk8w7tot/src'
make: *** [docker-run-test-quick@centos7] Error 2

real12m12.040s
user0m8.724s


The full log is available at
http://patchew.org/logs/20191120182551.23795-1-arm...@redhat.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: 9p: requests efficiency

2019-11-20 Thread Christian Schoenebeck

On Freitag, 15. November 2019 14:26:56 CET Greg Kurz wrote:
> > However when there are a large number of (i.e. small) 9p requests, no
> > matter what the actual request type is, then I am encountering severe
> > performance issues with 9pfs and I try to understand whether this could
> > be improved with reasonable effort.
> 
> Thanks for doing that. This is typically the kind of effort I never
> dared starting on my own.

If you don't mind, I'll still ask a few more questions, in case you can
answer them off the top of your head.

> > If I understand it correctly, each incoming request (T message) is
> > dispatched to its own qemu coroutine queue. So individual requests should
> > already be processed in parallel, right?
> 
> Sort of but not exactly. The real parallelization, ie. doing parallel
> processing with concurrent threads, doesn't take place on a per-request
> basis. 

Ok I see, I was just reading that each request causes this call sequence:

handle_9p_output() -> pdu_submit() -> qemu_co_queue_init(&pdu->complete)

and I was misinterpreting that latter call specifically as an implied
thread creation, because that's what happens with other, somewhat similar
collaborative thread synchronization frameworks like "Grand Central Dispatch"
or std::async.

But now I realize the entire QEMU coroutine framework is really just managing
coroutine stacks, not actually anything about threads per se. The QEMU docs often
use the term "threads", which is IMO misleading for what it really does.

> A typical request is broken down into several calls to the backend
> which may block because the backend itself calls a syscall that may block
> in the kernel. Each backend call is thus handled by its own thread from the
> mainloop thread pool (see hw/9pfs/coth.[ch] for details). The rest of the
> 9p code, basically everything in 9p.c, is serialized in the mainloop thread.
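The split Greg describes is blocking backend calls on pool threads, with everything else serialized on the main loop. A coarse, generic pthreads analogue is below; it is not QEMU code. QEMU's real mechanism lives in hw/9pfs/coth.[ch] and is coroutine-based. This sketch simply polls a completion flag. `Job`, `run_in_worker()` and `slow_square()` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

typedef struct {
    int (*fn)(int);      /* the potentially blocking backend call */
    int arg;
    int result;
    atomic_bool done;    /* set by the worker once result is valid */
} Job;

static void sleep_us(long us)
{
    struct timespec ts = { .tv_sec = 0, .tv_nsec = us * 1000L };
    nanosleep(&ts, NULL);
}

static void *worker(void *opaque)
{
    Job *job = opaque;
    job->result = job->fn(job->arg);
    atomic_store(&job->done, true);   /* publish the result */
    return NULL;
}

static int slow_square(int x)
{
    sleep_us(1000);                   /* stands in for a blocking syscall */
    return x * x;
}

/* Runs fn(arg) on a worker; *other_work counts main-loop iterations. */
static int run_in_worker(int (*fn)(int), int arg, long *other_work)
{
    Job job;
    pthread_t tid;

    job.fn = fn;
    job.arg = arg;
    job.result = 0;
    atomic_init(&job.done, false);
    *other_work = 0;

    if (pthread_create(&tid, NULL, worker, &job) != 0) {
        return fn(arg);               /* no thread: run the call inline */
    }
    while (!atomic_load(&job.done)) {
        (*other_work)++;              /* main loop would serve other requests */
        sleep_us(100);
    }
    pthread_join(tid, NULL);
    return job.result;
}
```

Only the blocking call leaves the main thread. The request-handling logic around it stays serialized, which matches the description above.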

So the precise parallelism fork points in 9pfs (where tasks are dispatched to
other threads) are the *_co_*() functions, specifically where they use
v9fs_co_run_in_worker( X ), correct? Or are there more fork points than those?

If so, I haven't yet understood how exactly v9fs_co_run_in_worker() works. I
mean, I now understand how QEMU coroutines work, and that the idea of
v9fs_co_run_in_worker() is to dispatch the passed code block to a worker
thread, but immediately return to the main thread and continue there with
other coroutines until the worker thread's dispatched coroutine has finished.
But how precisely that happens in v9fs_co_run_in_worker() is not yet clear to me.

Also where are the worker threads spawned actually?

Best regards,
Christian Schoenebeck

Re: [PATCH v4 25/37] dp8393x: replace PROP_PTR with PROP_LINK

2019-11-20 Thread Laurent Vivier

On 20/11/2019 at 16:24, Marc-André Lureau wrote:
> Link property is the correct way to pass a MemoryRegion to a device
> for DMA purposes.
> 
> Sidenote: as a sysbus device, this remains non-usercreatable
> even though we can drop the specific flag here.
> 
> Signed-off-by: Marc-André Lureau 
> Reviewed-by: Peter Maydell 
> Reviewed-by: Philippe Mathieu-Daudé 
> ---
>  hw/m68k/q800.c  | 3 ++-
>  hw/mips/mips_jazz.c | 3 ++-
>  hw/net/dp8393x.c| 7 +++
>  3 files changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/m68k/q800.c b/hw/m68k/q800.c
> index 4ca8678007..8f3eb6bfe7 100644
> --- a/hw/m68k/q800.c
> +++ b/hw/m68k/q800.c
> @@ -239,7 +239,8 @@ static void q800_init(MachineState *machine)
>      qdev_set_nic_properties(dev, &nd_table[0]);
>      qdev_prop_set_uint8(dev, "it_shift", 2);
>      qdev_prop_set_bit(dev, "big_endian", true);
> -    qdev_prop_set_ptr(dev, "dma_mr", get_system_memory());
> +    object_property_set_link(OBJECT(dev), OBJECT(get_system_memory()),
> +                             "dma_mr", &error_abort);
>      qdev_init_nofail(dev);
>      sysbus = SYS_BUS_DEVICE(dev);
>      sysbus_mmio_map(sysbus, 0, SONIC_BASE);
> diff --git a/hw/mips/mips_jazz.c b/hw/mips/mips_jazz.c
> index d978bb64a0..1518eb5e55 100644
> --- a/hw/mips/mips_jazz.c
> +++ b/hw/mips/mips_jazz.c
> @@ -284,7 +284,8 @@ static void mips_jazz_init(MachineState *machine,
>      dev = qdev_create(NULL, "dp8393x");
>      qdev_set_nic_properties(dev, nd);
>      qdev_prop_set_uint8(dev, "it_shift", 2);
> -    qdev_prop_set_ptr(dev, "dma_mr", rc4030_dma_mr);
> +    object_property_set_link(OBJECT(dev), OBJECT(rc4030_dma_mr),
> +                             "dma_mr", &error_abort);
>      qdev_init_nofail(dev);
>      sysbus = SYS_BUS_DEVICE(dev);
>      sysbus_mmio_map(sysbus, 0, 0x80001000);
> diff --git a/hw/net/dp8393x.c b/hw/net/dp8393x.c
> index 3d991af163..cdc2631c0c 100644
> --- a/hw/net/dp8393x.c
> +++ b/hw/net/dp8393x.c
> @@ -175,7 +175,7 @@ typedef struct dp8393xState {
>      int loopback_packet;
>  
>      /* Memory access */
> -    void *dma_mr;
> +    MemoryRegion *dma_mr;
>      AddressSpace as;
>  } dp8393xState;
>  
> @@ -948,7 +948,8 @@ static const VMStateDescription vmstate_dp8393x = {
>  
>  static Property dp8393x_properties[] = {
>      DEFINE_NIC_PROPERTIES(dp8393xState, conf),
> -    DEFINE_PROP_PTR("dma_mr", dp8393xState, dma_mr),
> +    DEFINE_PROP_LINK("dma_mr", dp8393xState, dma_mr,
> +                     TYPE_MEMORY_REGION, MemoryRegion *),
>      DEFINE_PROP_UINT8("it_shift", dp8393xState, it_shift, 0),
>      DEFINE_PROP_BOOL("big_endian", dp8393xState, big_endian, false),
>      DEFINE_PROP_END_OF_LIST(),
> @@ -963,8 +964,6 @@ static void dp8393x_class_init(ObjectClass *klass, void *data)
>      dc->reset = dp8393x_reset;
>      dc->vmsd = &vmstate_dp8393x;
>      dc->props = dp8393x_properties;
> -    /* Reason: dma_mr property can't be set */
> -    dc->user_creatable = false;
>  }
>  
>  static const TypeInfo dp8393x_info = {
> 

Reviewed-by: Laurent Vivier 
Tested-by: Laurent Vivier

Re: [PATCH v2] linux-user/strace: Improve output of various syscalls

2019-11-20 Thread Aleksandar Markovic

On Wed, Nov 20, 2019 at 10:13 PM Aleksandar Markovic
 wrote:
>
> On Wed, Nov 20, 2019 at 3:58 PM Helge Deller  wrote:
> >
> > Improve strace output of various syscalls which either have none
> > or only int-type parameters.
> >
> > Signed-off-by: Helge Deller 
> >
>
> It would be nice if you included a history of the patch (after the line
> "---", as is customary for single patch submission). You changed
> only ioctl() in v2, right?
>
> I missed your v2, but responded with several hints to v1.
>

userfaultfd(), membarrier(), mlock2()... - all could be included into
your patch.

The table https://blog.rchapman.org/posts/Linux_System_Call_Table_for_x86_64/
would make your job almost effortless.

> Yours,
> Aleksandar
>
> > diff --git a/linux-user/strace.list b/linux-user/strace.list
> > index 1de4319dcf..add53b1734 100644
> > --- a/linux-user/strace.list
> > +++ b/linux-user/strace.list
> > @@ -26,7 +26,7 @@
> >  { TARGET_NR_afs_syscall, "afs_syscall" , NULL, NULL, NULL },
> >  #endif
> >  #ifdef TARGET_NR_alarm
> > -{ TARGET_NR_alarm, "alarm" , NULL, NULL, NULL },
> > +{ TARGET_NR_alarm, "alarm" , "%s(%d)", NULL, NULL },
> >  #endif
> >  #ifdef TARGET_NR_aplib
> >  { TARGET_NR_aplib, "aplib" , NULL, NULL, NULL },
> > @@ -116,13 +116,13 @@
> >  { TARGET_NR_dipc, "dipc" , NULL, NULL, NULL },
> >  #endif
> >  #ifdef TARGET_NR_dup
> > -{ TARGET_NR_dup, "dup" , NULL, NULL, NULL },
> > +{ TARGET_NR_dup, "dup" , "%s(%d)", NULL, NULL },
> >  #endif
> >  #ifdef TARGET_NR_dup2
> > -{ TARGET_NR_dup2, "dup2" , NULL, NULL, NULL },
> > +{ TARGET_NR_dup2, "dup2" , "%s(%d,%d)", NULL, NULL },
> >  #endif
> >  #ifdef TARGET_NR_dup3
> > -{ TARGET_NR_dup3, "dup3" , NULL, NULL, NULL },
> > +{ TARGET_NR_dup3, "dup3" , "%s(%d,%d,%d)", NULL, NULL },
> >  #endif
> >  #ifdef TARGET_NR_epoll_create
> >  { TARGET_NR_epoll_create, "epoll_create" , NULL, NULL, NULL },
> > @@ -191,7 +191,7 @@
> >  { TARGET_NR_fanotify_mark, "fanotify_mark" , NULL, NULL, NULL },
> >  #endif
> >  #ifdef TARGET_NR_fchdir
> > -{ TARGET_NR_fchdir, "fchdir" , NULL, NULL, NULL },
> > +{ TARGET_NR_fchdir, "fchdir" , "%s(%d)", NULL, NULL },
> >  #endif
> >  #ifdef TARGET_NR_fchmod
> >  { TARGET_NR_fchmod, "fchmod" , "%s(%d,%#o)", NULL, NULL },
> > @@ -287,7 +287,7 @@
> >  { TARGET_NR_getdtablesize, "getdtablesize" , NULL, NULL, NULL },
> >  #endif
> >  #ifdef TARGET_NR_getegid
> > -{ TARGET_NR_getegid, "getegid" , NULL, NULL, NULL },
> > +{ TARGET_NR_getegid, "getegid" , "%s()", NULL, NULL },
> >  #endif
> >  #ifdef TARGET_NR_getegid32
> >  { TARGET_NR_getegid32, "getegid32" , NULL, NULL, NULL },
> > @@ -299,7 +299,7 @@
> >  { TARGET_NR_geteuid32, "geteuid32" , NULL, NULL, NULL },
> >  #endif
> >  #ifdef TARGET_NR_getgid
> > -{ TARGET_NR_getgid, "getgid" , NULL, NULL, NULL },
> > +{ TARGET_NR_getgid, "getgid" , "%s()", NULL, NULL },
> >  #endif
> >  #ifdef TARGET_NR_getgid32
> >  { TARGET_NR_getgid32, "getgid32" , NULL, NULL, NULL },
> > @@ -329,10 +329,10 @@
> >  { TARGET_NR_getpeername, "getpeername" , NULL, NULL, NULL },
> >  #endif
> >  #ifdef TARGET_NR_getpgid
> > -{ TARGET_NR_getpgid, "getpgid" , NULL, NULL, NULL },
> > +{ TARGET_NR_getpgid, "getpgid" , "%s(%u)", NULL, NULL },
> >  #endif
> >  #ifdef TARGET_NR_getpgrp
> > -{ TARGET_NR_getpgrp, "getpgrp" , NULL, NULL, NULL },
> > +{ TARGET_NR_getpgrp, "getpgrp" , "%s()", NULL, NULL },
> >  #endif
> >  #ifdef TARGET_NR_getpid
> >  { TARGET_NR_getpid, "getpid" , "%s()", NULL, NULL },
> > @@ -432,7 +432,7 @@
> >  { TARGET_NR_io_cancel, "io_cancel" , NULL, NULL, NULL },
> >  #endif
> >  #ifdef TARGET_NR_ioctl
> > -{ TARGET_NR_ioctl, "ioctl" , NULL, NULL, NULL },
> > +{ TARGET_NR_ioctl, "ioctl" , "%s(%d,%#x,%#x)", NULL, NULL },
> >  #endif
> >  #ifdef TARGET_NR_io_destroy
> >  { TARGET_NR_io_destroy, "io_destroy" , NULL, NULL, NULL },
> > @@ -1257,22 +1257,22 @@
> >  { TARGET_NR_setdomainname, "setdomainname" , NULL, NULL, NULL },
> >  #endif
> >  #ifdef TARGET_NR_setfsgid
> > -{ TARGET_NR_setfsgid, "setfsgid" , NULL, NULL, NULL },
> > +{ TARGET_NR_setfsgid, "setfsgid" , "%s(%u)", NULL, NULL },
> >  #endif
> >  #ifdef TARGET_NR_setfsgid32
> > -{ TARGET_NR_setfsgid32, "setfsgid32" , NULL, NULL, NULL },
> > +{ TARGET_NR_setfsgid32, "setfsgid32" , "%s(%u)" , NULL, NULL },
> >  #endif
> >  #ifdef TARGET_NR_setfsuid
> > -{ TARGET_NR_setfsuid, "setfsuid" , NULL, NULL, NULL },
> > +{ TARGET_NR_setfsuid, "setfsuid" , "%s(%u)" , NULL, NULL },
> >  #endif
> >  #ifdef TARGET_NR_setfsuid32
> >  { TARGET_NR_setfsuid32, "setfsuid32" , NULL, NULL, NULL },
> >  #endif
> >  #ifdef TARGET_NR_setgid
> > -{ TARGET_NR_setgid, "setgid" , NULL, NULL, NULL },
> > +{ TARGET_NR_setgid, "setgid" , "%s(%u)", NULL, NULL },
> >  #endif
> >  #ifdef TARGET_NR_setgid32
> > -{ TARGET_NR_setgid32, "setgid32" , NULL, NULL, NULL },
> > +{ TARGET_NR_setgid32, "setgid32" , "%s(%u)", NULL, NULL },
> >  #endif
> >  #ifdef TARGET_NR_setgroups
> >  { TARGET_NR_setgroups, "setgroups" , NULL, NULL, NULL },
> > @@ -1296,7 +1296,7 @@
> >  { TARGET_NR_setns

Re: [PATCH] target/i386: add VMX features to named CPU models

2019-11-20 Thread Paolo Bonzini

On 20/11/19 22:15, Eduardo Habkost wrote:
> 
> For how long was this broken?  Jiri, was libvirt including +vmx
> in mode=host-model for a long time, or is this something recent?

Could that be related to making nested=1 the default in the kernel?  KVM has

static void vmx_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
{
	if (func == 1 && nested)
		entry->ecx |= bit(X86_FEATURE_VMX);
}

which would date the change to Linux 4.20 (December 2018).

Paolo

Re: [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion

2019-11-20 Thread Scott Cheloha

On Mon, Oct 21, 2019 at 09:14:44AM +0100, Dr. David Alan Gilbert wrote:
> * David Gibson (da...@gibson.dropbear.id.au) wrote:
> > On Fri, Oct 18, 2019 at 10:43:52AM +0100, Dr. David Alan Gilbert wrote:
> > > * Laurent Vivier (lviv...@redhat.com) wrote:
> > > > On 18/10/2019 10:16, Dr. David Alan Gilbert wrote:
> > > > > * Scott Cheloha (chel...@linux.vnet.ibm.com) wrote:
> > > > >> savevm_state's SaveStateEntry TAILQ is a priority queue.  Priority
> > > > >> sorting is maintained by searching from head to tail for a suitable
> > > > >> insertion spot.  Insertion is thus an O(n) operation.
> > > > >>
> > > > >> If we instead keep track of the head of each priority's subqueue
> > > > >> within that larger queue we can reduce this operation to O(1) time.
> > > > >>
> > > > >> savevm_state_handler_remove() becomes slightly more complex to
> > > > >> accommodate these gains: we need to replace the head of a priority's
> > > > >> subqueue when removing it.
> > > > >>
> > > > >> With O(1) insertion, booting VMs with many SaveStateEntry objects is
> > > > >> more plausible.  For example, a ppc64 VM with maxmem=8T has 4 such
> > > > >> objects to insert.
> > > > > 
> > > > > Separate from reviewing this patch, I'd like to understand why you've
> > > > > got 4 objects.  This feels very very wrong and is likely to cause
> > > > > problems to random other bits of qemu as well.
> > > > 
> > > > I think the 4 objects are the "dr-connectors" that are used to plug
> > > > peripherals (memory, pci card, cpus, ...).
> > > 
> > > Yes, Scott confirmed that in the reply to the previous version.
> > > IMHO nothing in qemu is designed to deal with that many devices/objects
> > > - I'm sure that something other than the migration code is going to
> > > get upset.
> > 
> > It kind of did.  Particularly when there was n^2 and n^3
> > behaviour in the property stuff we had some ludicrously long startup
> > times (hours) with large maxmem values.
> > 
> > Fwiw, the DRCs for PCI slots, DRCs and PHBs aren't really a problem.
> > The problem is the memory DRCs, there's one for each LMB - each 256MiB
> > chunk of memory (or possible memory).
> > 
> > > Is perhaps the structure wrong somewhere - should there be a single DRC
> > > device that knows about all DRCs?
> > 
> > Maybe.  The tricky bit is how to get there from here without breaking
> > migration or something else along the way.
> 
> Switch on the next machine type version - it doesn't matter if migration
> is incompatible then.

1mo bump.

Is there anything I need to do with this patch in particular to make it suitable
for merging?
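The patch describes constant-time insertion that uses a plain doubly linked list plus one head pointer per priority. The sketch below models that idea on its own terms. It is not the QEMU code: `struct entry`, `struct pqueue`, `PRI_MAX` and the helper names are hypothetical. It also assumes, as we understand the existing savevm insertion to do, that higher priorities sit nearer the head of the queue.

```c
#include <assert.h>
#include <stddef.h>

#define PRI_MAX 4  /* hypothetical highest priority level */

struct entry {
    int priority;               /* 0..PRI_MAX; higher priorities come first */
    struct entry *prev, *next;
};

struct pqueue {
    struct entry *head, *tail;
    struct entry *pri_head[PRI_MAX + 1]; /* first entry of each priority */
};

/* Link e immediately before pos, or append it if pos is NULL. */
static void link_before(struct pqueue *q, struct entry *e, struct entry *pos)
{
    e->next = pos;
    e->prev = pos ? pos->prev : q->tail;
    if (e->prev) {
        e->prev->next = e;
    } else {
        q->head = e;
    }
    if (pos) {
        pos->prev = e;
    } else {
        q->tail = e;
    }
}

/* O(PRI_MAX), i.e. constant time: insert before the head of the nearest
 * lower, non-empty priority; this keeps FIFO order within a priority. */
static void pq_insert(struct pqueue *q, struct entry *e)
{
    struct entry *pos = NULL;

    assert(e->priority >= 0 && e->priority <= PRI_MAX);
    for (int p = e->priority - 1; p >= 0 && !pos; p--) {
        pos = q->pri_head[p];
    }
    link_before(q, e, pos);
    if (!q->pri_head[e->priority]) {
        q->pri_head[e->priority] = e;
    }
}

/* Removing a subqueue head hands that role to the next entry of the same
 * priority, if there is one. */
static void pq_remove(struct pqueue *q, struct entry *e)
{
    if (q->pri_head[e->priority] == e) {
        struct entry *n = e->next;
        q->pri_head[e->priority] =
            (n && n->priority == e->priority) ? n : NULL;
    }
    if (e->prev) {
        e->prev->next = e->next;
    } else {
        q->head = e->next;
    }
    if (e->next) {
        e->next->prev = e->prev;
    } else {
        q->tail = e->prev;
    }
    e->prev = e->next = NULL;
}
```

Insertion scans only the fixed set of priority levels and never the entries themselves. That is where the O(n) to O(1) gain comes from. The extra bookkeeping moves into removal, which has to pass a subqueue head to its successor.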

Re: guest / host buffer sharing ...

2019-11-20 Thread Geoffrey McRae

On 2019-11-20 23:13, Tomasz Figa wrote:

Hi Geoffrey,

On Thu, Nov 7, 2019 at 7:28 AM Geoffrey McRae  
wrote:

On 2019-11-06 23:41, Gerd Hoffmann wrote:
> On Wed, Nov 06, 2019 at 05:36:22PM +0900, David Stevens wrote:
>> > (1) The virtio device
>> > =
>> >
>> > Has a single virtio queue, so the guest can send commands to register
>> > and unregister buffers.  Buffers are allocated in guest ram.  Each buffer
>> > has a list of memory ranges for the data. Each buffer also has some
>>
>> Allocating from guest ram would work most of the time, but I think
>> it's insufficient for many use cases. It doesn't really support things
>> such as contiguous allocations, allocations from carveouts or <4GB,
>> protected buffers, etc.
>
> If there are additional constraints (due to gpu hardware I guess)
> I think it is better to leave the buffer allocation to virtio-gpu.

The entire point of this for our purposes is due to the fact that we
cannot allocate the buffer; it's either provided by the GPU driver or
DirectX. If virtio-gpu were to allocate the buffer we might as well
forget all this and continue using the ivshmem device.

I don't understand why virtio-gpu couldn't allocate those buffers.
Allocation doesn't necessarily mean creating new memory. Since the
virtio-gpu device on the host talks to the GPU driver (or DirectX?),
why couldn't it return one of the buffers provided by those if
BIND_SCANOUT is requested?

Because in our application we are a user-mode application in windows
that is provided with buffers that were allocated by the video stack in
windows. We are not using a virtual GPU but a physical GPU via vfio
passthrough and as such we are limited in what we can do. Unless I have
completely missed what virtio-gpu does, from what I understand it's
attempting to be a virtual GPU in its own right, which is not at all
suitable for our requirements.

This discussion seems to have moved away completely from the original
simple feature we need, which is to share a random block of guest
allocated ram with the host. While it would be nice if it's contiguous
ram, it's not an issue if it's not, and with udmabuf (now I understand
it) it can be made to appear contiguous if it is so desired anyway.
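Here is one way to picture how scattered guest RAM can still be treated as contiguous. Suppose a buffer is described as a list of guest memory ranges, as in the proposed virtio device. Then any byte offset in the logical buffer can be resolved by walking that list. This is a hypothetical sketch: `struct mem_range` and `buf_offset_to_gpa()` are illustrative names and are not part of virtio, udmabuf or QEMU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one guest memory range backing a buffer. */
struct mem_range {
    uint64_t gpa;   /* guest physical address where the range starts */
    uint64_t len;   /* length of the range in bytes */
};

/* Translate a byte offset within the logical, contiguous-looking buffer
 * into a guest physical address. Returns 0 on success, -1 if the offset
 * lies beyond the end of the buffer. */
static int buf_offset_to_gpa(const struct mem_range *ranges, size_t n,
                             uint64_t off, uint64_t *gpa)
{
    assert(gpa != NULL);
    for (size_t i = 0; i < n; i++) {
        if (off < ranges[i].len) {
            *gpa = ranges[i].gpa + off;
            return 0;
        }
        off -= ranges[i].len;   /* skip this range, continue in the next */
    }
    return -1;
}
```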

vhost-user could be used for this if it were fixed to allow dynamic
remapping; all the other bells and whistles of virtio-gpu are
useless to us.

Our use case is niche, and the state of things may change if vendors
like AMD follow through with their promises and give us SR-IOV on
consumer GPUs, but even then we would still need their support to
achieve the same results, as the same issue would still be present.

Also don't forget that QEMU already has a non-virtio generic device
(IVSHMEM). The only difference is that this device doesn't allow us to
attain zero-copy transfers.

Currently IVSHMEM is used by two projects that I am aware of, Looking
Glass and SCREAM. While Looking Glass is solving a problem that is out
of scope for QEMU, SCREAM is working around audio problems in QEMU
that have been present for years now.

While I don't agree with SCREAM being used this way (we really need a
virtio-sound device, and/or intel-hda needs to be fixed), it is again
an example of working around bugs/faults/limitations in QEMU by those
of us who are unable to fix them ourselves, and that seem to have low
priority for the QEMU project.

What we are trying to attain is freedom from dual-boot Linux/Windows
systems, not migratable enterprise VPS configurations. The Looking
Glass project has brought attention to several other bugs/problems in
QEMU, some of which were fixed as a direct result of this project
(i8042 race, AMD NPT).

Unless there is another solution to getting the guest GPU's
framebuffer back to the host, a device like this will always be
required. Since the landscape could change at any moment, this device
should not be an LG-specific device, but rather a generic device to
allow for other workarounds like LG to be developed in the future
should they be required.

Is it optimal? no
Is there a better solution? not that I am aware of

>
> virtio-gpu can't do that right now, but we have to improve virtio-gpu
> memory management for vulkan support anyway.
>
>> > properties to carry metadata, some fixed (id, size, application), but
>>
>> What exactly do you mean by application?
>
> Basically some way to group buffers.  A wayland proxy for example would
> add a "application=wayland-proxy" tag to the buffers it creates in the
> guest, and the host side part of the proxy could ask qemu (or another
> vmm) to notify about all buffers with that tag.  So in case multiple
> applications are using the device in parallel they don't interfere with
> each other.
>
>> > also allow free form (name = value, framebuffers would have
>> > width/height/stride/format for example).
>>
>> Is this approach expected to handle allocating buffers with
>> hardware-specific constraints such as stride/height alignment or

Re: [PATCH for-5.0 v5 11/23] ppc/pnv: Introduce a pnv_xive_is_cpu_enabled() helper

2019-11-20 Thread Cédric Le Goater

On 20/11/2019 18:26, Greg Kurz wrote:
> On Fri, 15 Nov 2019 17:24:24 +0100
> Cédric Le Goater  wrote:
> 
>> and use this helper to exclude CPUs which are not enabled in the XIVE
>> controller.
>>
>> Signed-off-by: Cédric Le Goater 
>> ---
>>  hw/intc/pnv_xive.c | 18 ++
>>  1 file changed, 18 insertions(+)
>>
>> diff --git a/hw/intc/pnv_xive.c b/hw/intc/pnv_xive.c
>> index 71ca4961b6b1..4c8c6e51c20f 100644
>> --- a/hw/intc/pnv_xive.c
>> +++ b/hw/intc/pnv_xive.c
>> @@ -372,6 +372,20 @@ static int pnv_xive_get_eas(XiveRouter *xrtr, uint8_t blk, uint32_t idx,
>>      return pnv_xive_vst_read(xive, VST_TSEL_IVT, blk, idx, eas);
>>  }
>>  
>> +static int cpu_pir(PowerPCCPU *cpu)
>> +{
>> +    CPUPPCState *env = &cpu->env;
>> +    return env->spr_cb[SPR_PIR].default_value;
>> +}
>> +
>> +static bool pnv_xive_is_cpu_enabled(PnvXive *xive, PowerPCCPU *cpu)
>> +{
>> +    int pir = cpu_pir(cpu);
>> +    int thrd_id = pir & 0x7f;
>> +
>> +    return xive->regs[PC_THREAD_EN_REG0 >> 3] & PPC_BIT(thrd_id);
> 
> A similar check is open-coded in pnv_xive_get_indirect_tctx() :
> 
>     /* Check that HW thread is XIVE enabled */
>     if (!(xive->regs[PC_THREAD_EN_REG0 >> 3] & PPC_BIT(pir & 0x3f))) {
>         xive_error(xive, "IC: CPU %x is not enabled", pir);
>     }
> 
> The thread id is only the 6 lower bits of the PIR there, which also seems
> to be what the skiboot sources indicate:
> 
>     /* Get bit in register */
>     bit = c->pir & 0x3f;

skiboot uses 0x3f when enabling the TCTXT of a CPU because register
INT_TCTXT_EN0 covers cores 0-15 (normal) and 0-7 (fused) and 
register INT_TCTXT_EN1 covers cores 16-23 (normal) and 8-11 (fused). 
The encoding in the registers is a bit different.

> Why make it pir & 0x7f here ? 

See pnv_chip_core_pir_p9 comments for some details on the CPU ID 
layout.

> If it should actually be 0x3f, 
but yes, we should fix the mask in the register setting. 

> maybe also use the helper in pnv_xive_get_indirect_tctx().

This is getting changed later on, so I'd rather not.

C.

> 
>> +}
>> +
>>  static int pnv_xive_match_nvt(XivePresenter *xptr, uint8_t format,
>>                                uint8_t nvt_blk, uint32_t nvt_idx,
>>                                bool cam_ignore, uint8_t priority,
>> @@ -393,6 +407,10 @@ static int pnv_xive_match_nvt(XivePresenter *xptr, uint8_t format,
>>          XiveTCTX *tctx;
>>          int ring;
>>  
>> +        if (!pnv_xive_is_cpu_enabled(xive, cpu)) {
>> +            continue;
>> +        }
>> +
>>          tctx = XIVE_TCTX(pnv_cpu_state(cpu)->intc);
>>  
>>          /*
>
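One detail about the two masks is worth noting. With `pir & 0x7f`, any thread index of 64 or more is passed straight to `PPC_BIT()`. That shifts a 64-bit constant by 64 or more bits, which is undefined behaviour in C.

The sketch below avoids this by splitting the index into a register selector and a 6-bit bit number. It rests on three assumptions:
- `PPC_BIT()` uses MSB-0 bit numbering, as QEMU's definition does.
- Cores are normal (non-fused) SMT4 cores whose low 7 PIR bits encode core << 2 | thread.
- The thread-enable bits are split over two registers the way Cédric describes for skiboot's INT_TCTXT_EN0/EN1: cores 0-15 in the first, 16-23 in the second.

`struct thread_en_regs` and `thread_enabled()` are hypothetical names, not QEMU's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MSB-0 ("IBM") numbering: bit 0 is the most significant bit. */
#define PPC_BIT(bit) (0x8000000000000000ULL >> (bit))

/* Hypothetical model: reg[0] stands for PC_THREAD_EN_REG0, reg[1] for
 * PC_THREAD_EN_REG1. */
struct thread_en_regs {
    uint64_t reg[2];
};

static bool thread_enabled(const struct thread_en_regs *r, uint32_t pir)
{
    uint32_t tid = pir & 0x7f;   /* chip-relative thread index */
    uint32_t bit = tid & 0x3f;   /* bit number inside one 64-bit register */

    assert(tid < 96);            /* assumed: 24 cores x 4 threads */
    return r->reg[tid >> 6] & PPC_BIT(bit);  /* tid >> 6 picks REG0/REG1 */
}
```

For example, PIR 0x105 (chip bits set, thread index 5) checks bit 5 of the first register, while thread index 64 checks bit 0 of the second.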

Re: [PATCH v2 6/6] iotests: Test committing to short backing file

2019-11-20 Thread Eric Blake


On 11/20/19 12:45 PM, Kevin Wolf wrote:

Signed-off-by: Kevin Wolf 
---
  tests/qemu-iotests/274| 141 +
  tests/qemu-iotests/274.out| 227 ++
  tests/qemu-iotests/group  |   1 +
  tests/qemu-iotests/iotests.py |   2 +-
  4 files changed, 370 insertions(+), 1 deletion(-)
  create mode 100755 tests/qemu-iotests/274
  create mode 100644 tests/qemu-iotests/274.out





+iotests.log('=== Testing QMP active commit (top -> mid) ===')
+
+create_chain()
+with create_vm() as vm:
+    vm.launch()
+    vm.qmp_log('block-commit', device='top', base_node='mid',
+               job_id='job0', auto_dismiss=False)
+    vm.run_job('job0', wait=5)
+
+iotests.img_info_log(mid)
+iotests.qemu_io_log('-c', 'read -P 1 0 %d' % size_short, mid)
+iotests.qemu_io_log('-c', 'read -P 0 %d %d' % (size_short, size_diff), mid)
+


Would it also be worth testing a commit of mid -> base, and showing that 
top still sees the same contents afterwards?


Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH v2 2/6] block: truncate: Don't make backing file data visible

2019-11-20 Thread Eric Blake


On 11/20/19 12:44 PM, Kevin Wolf wrote:

When extending the size of an image that has a backing file larger than
its old size, make sure that the backing file data doesn't become
visible in the guest, but the added area is properly zeroed out.

Consider the following scenario where the overlay is shorter than its
backing file:

 base.qcow2:     AAAAAAAA
 overlay.qcow2:  BBBB

When resizing (extending) overlay.qcow2, the new blocks should not stay
unallocated and make the additional As from base.qcow2 visible like
before this patch, but zeros should be read.

A similar case happens with the various variants of a commit job when an
intermediate file is short (- for unallocated):

 base.qcow2: A-A-AAAA
 mid.qcow2:  BB-B
 top.qcow2:  C--C--C-

After commit top.qcow2 to mid.qcow2, the following happens:

 mid.qcow2:  CB-C00C0 (correct result)
 mid.qcow2:  CB-C--C- (before this fix)

Without the fix, blocks that previously read as zeros on top.qcow2
suddenly turn into A.

Signed-off-by: Kevin Wolf 
---
  block/io.c | 32 
  1 file changed, 32 insertions(+)




+if (new_bytes && bs->backing && prealloc == PREALLOC_MODE_OFF) {
+int64_t backing_len;
+
+backing_len = bdrv_getlength(backing_bs(bs));
+if (backing_len < 0) {
+ret = backing_len;
+goto out;
+}
+
+if (backing_len > old_size) {
+ret = bdrv_co_do_pwrite_zeroes(
+bs, old_size, MIN(new_bytes, backing_len - old_size),
+BDRV_REQ_ZERO_WRITE | BDRV_REQ_MAY_UNMAP);
+if (ret < 0) {
+goto out;
+}
+}
+}


Note that if writing zeroes is not fast, and it turns out that we copy a 
lot of data rather than unallocated sections from the image being 
committed, this can actually slow things down (doing a bulk 
pre-zero doubles up data I/O unless it is fast, which is why we added 
BDRV_REQ_NO_FALLBACK to avoid slow pre-zeroing).  However, the 
complication of zeroing only the unallocated clusters rather than a bulk 
pre-zeroing for something that is an unlikely corner case (how often do 
you create an overlay shorter than the backing file?) is not worth the 
extra code maintenance (unlike in the 'qemu-img convert' case where it 
was worth the optimization). So I'm fine with how you fixed it here.
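Kevin's commit example can be replayed with a toy model in which each character of an image is one cluster. The sketch assumes, as the commit message describes, that committing top into mid first grows mid to top's length and then copies top's allocated clusters. `struct toy_image` and the `toy_*` helpers are hypothetical. They only mimic the zeroing range of the hunk quoted in this mail, not QEMU's block layer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CLUSTERS 16

/* Toy image: one char per cluster. '-' = unallocated, '0' = explicit
 * zeroes, any other letter = data. Reads past an image's end are zero. */
struct toy_image {
    char data[MAX_CLUSTERS + 1];
    const struct toy_image *backing;   /* NULL for the base image */
};

/* What the guest reads at cluster idx through the backing chain. */
static char toy_read(const struct toy_image *img, size_t idx)
{
    for (; img; img = img->backing) {
        if (idx >= strlen(img->data)) {
            return '0';                /* beyond this image's end */
        }
        if (img->data[idx] != '-') {
            return img->data[idx];
        }
    }
    return '0';
}

/* Grow img to new_len clusters. With zero_fix, also zero the part of the
 * new area that a longer backing file would otherwise show through:
 * MIN(new_bytes, backing_len - old_size) clusters starting at old_size. */
static void toy_grow(struct toy_image *img, size_t new_len, int zero_fix)
{
    size_t old_len = strlen(img->data);

    assert(old_len <= new_len && new_len <= MAX_CLUSTERS);
    memset(img->data + old_len, '-', new_len - old_len);
    img->data[new_len] = '\0';

    if (zero_fix && img->backing) {
        size_t backing_len = strlen(img->backing->data);
        if (backing_len > old_len) {
            size_t n = backing_len - old_len;
            if (n > new_len - old_len) {
                n = new_len - old_len;
            }
            memset(img->data + old_len, '0', n);
        }
    }
}

/* Commit top into mid: grow mid if needed, then copy allocated clusters. */
static void toy_commit(const struct toy_image *top, struct toy_image *mid,
                       int zero_fix)
{
    size_t len = strlen(top->data);

    if (len > strlen(mid->data)) {
        toy_grow(mid, len, zero_fix);
    }
    for (size_t i = 0; i < len; i++) {
        if (top->data[i] != '-') {
            mid->data[i] = top->data[i];
        }
    }
}
```

With `zero_fix` set, mid ends up as `CB-C00C0`. Without it, mid is `CB-C--C-` and cluster 4 reads `A` from base, which matches the two results in the commit message.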


Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH] target/i386: add VMX features to named CPU models

2019-11-20 Thread Eduardo Habkost

On Wed, Nov 20, 2019 at 09:49:35PM +0100, Paolo Bonzini wrote:
> On 20/11/19 19:45, Eduardo Habkost wrote:
> > On Wed, Nov 20, 2019 at 06:37:53PM +0100, Paolo Bonzini wrote:
> >> This allows using "-cpu Haswell,+vmx", which we did not really want to
> >> support in QEMU but was produced by Libvirt when using the "host-model"
> >> CPU model.
> > 
> > I understand guest ABI compatibility is not a concern, but I
> > don't remember how we guarantee it won't break by accident if
> > somebody tries to live migrate a VM.
> 
> I'm not sure I understand the question, but I can answer the second part:
> 
> > What is supposed to happen today if trying to live migrate a VM
> > using "-cpu Haswell,+vmx"?
> 
> Before 4.2: same guest ABI compatibility as "-cpu host".

Oh, I forgot that FEAT_KVM_* is recent and is not in QEMU 4.1.
If host-independent guest ABI was never guaranteed before, we're
not making it worse.

For how long was this broken?  Jiri, was libvirt including +vmx
in mode=host-model for a long time, or is this something recent?

> 
> 4.2+: ABI compatibility is preserved, because each named CPU model can
> be given a precise set of features that are matched against the host
> (and are subject to check/enforce).

Good.

> 
> 4.1->4.2: the ABI *should* be preserved if you're running "-cpu
> SandyBridge,+vmx" on an actual Sandy Bridge, but some VMX features will
> disappear after live migration if e.g. you're running "-cpu
> SandyBridge,+vmx" on a Haswell.  Host-model should be fine though.

That was the case I was worried about.

So, host-model should be fine in theory, because the CPU model
chosen by libvirt is supposed to match the host CPU.  Good.

All the other cases already had an unpredictable host-dependent
guest ABI, and can't be fixed.  Bad, but I don't think we can do
anything about it.

So:

Reviewed-by: Eduardo Habkost 

It would be nice if we at least printed a warning when using +vmx
with pc-*-4.1 and older, so people know their configuration is
likely to be broken.

-- 
Eduardo

Re: [PATCH v2] linux-user/strace: Improve output of various syscalls

2019-11-20 Thread Aleksandar Markovic

On Wed, Nov 20, 2019 at 3:58 PM Helge Deller  wrote:
>
> Improve strace output of various syscalls which either have none
> or only int-type parameters.
>
> Signed-off-by: Helge Deller 
>

It would be nice if you included a history of the patch (after the line
"---", as is customary for single patch submission). You changed
only ioctl() in v2, right?

I missed your v2, but responded with several hints to v1.

Yours,
Aleksandar

> diff --git a/linux-user/strace.list b/linux-user/strace.list
> index 1de4319dcf..add53b1734 100644
> --- a/linux-user/strace.list
> +++ b/linux-user/strace.list
> @@ -26,7 +26,7 @@
>  { TARGET_NR_afs_syscall, "afs_syscall" , NULL, NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_alarm
> -{ TARGET_NR_alarm, "alarm" , NULL, NULL, NULL },
> +{ TARGET_NR_alarm, "alarm" , "%s(%d)", NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_aplib
>  { TARGET_NR_aplib, "aplib" , NULL, NULL, NULL },
> @@ -116,13 +116,13 @@
>  { TARGET_NR_dipc, "dipc" , NULL, NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_dup
> -{ TARGET_NR_dup, "dup" , NULL, NULL, NULL },
> +{ TARGET_NR_dup, "dup" , "%s(%d)", NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_dup2
> -{ TARGET_NR_dup2, "dup2" , NULL, NULL, NULL },
> +{ TARGET_NR_dup2, "dup2" , "%s(%d,%d)", NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_dup3
> -{ TARGET_NR_dup3, "dup3" , NULL, NULL, NULL },
> +{ TARGET_NR_dup3, "dup3" , "%s(%d,%d,%d)", NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_epoll_create
>  { TARGET_NR_epoll_create, "epoll_create" , NULL, NULL, NULL },
> @@ -191,7 +191,7 @@
>  { TARGET_NR_fanotify_mark, "fanotify_mark" , NULL, NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_fchdir
> -{ TARGET_NR_fchdir, "fchdir" , NULL, NULL, NULL },
> +{ TARGET_NR_fchdir, "fchdir" , "%s(%d)", NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_fchmod
>  { TARGET_NR_fchmod, "fchmod" , "%s(%d,%#o)", NULL, NULL },
> @@ -287,7 +287,7 @@
>  { TARGET_NR_getdtablesize, "getdtablesize" , NULL, NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_getegid
> -{ TARGET_NR_getegid, "getegid" , NULL, NULL, NULL },
> +{ TARGET_NR_getegid, "getegid" , "%s()", NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_getegid32
>  { TARGET_NR_getegid32, "getegid32" , NULL, NULL, NULL },
> @@ -299,7 +299,7 @@
>  { TARGET_NR_geteuid32, "geteuid32" , NULL, NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_getgid
> -{ TARGET_NR_getgid, "getgid" , NULL, NULL, NULL },
> +{ TARGET_NR_getgid, "getgid" , "%s()", NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_getgid32
>  { TARGET_NR_getgid32, "getgid32" , NULL, NULL, NULL },
> @@ -329,10 +329,10 @@
>  { TARGET_NR_getpeername, "getpeername" , NULL, NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_getpgid
> -{ TARGET_NR_getpgid, "getpgid" , NULL, NULL, NULL },
> +{ TARGET_NR_getpgid, "getpgid" , "%s(%u)", NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_getpgrp
> -{ TARGET_NR_getpgrp, "getpgrp" , NULL, NULL, NULL },
> +{ TARGET_NR_getpgrp, "getpgrp" , "%s()", NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_getpid
>  { TARGET_NR_getpid, "getpid" , "%s()", NULL, NULL },
> @@ -432,7 +432,7 @@
>  { TARGET_NR_io_cancel, "io_cancel" , NULL, NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_ioctl
> -{ TARGET_NR_ioctl, "ioctl" , NULL, NULL, NULL },
> +{ TARGET_NR_ioctl, "ioctl" , "%s(%d,%#x,%#x)", NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_io_destroy
>  { TARGET_NR_io_destroy, "io_destroy" , NULL, NULL, NULL },
> @@ -1257,22 +1257,22 @@
>  { TARGET_NR_setdomainname, "setdomainname" , NULL, NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_setfsgid
> -{ TARGET_NR_setfsgid, "setfsgid" , NULL, NULL, NULL },
> +{ TARGET_NR_setfsgid, "setfsgid" , "%s(%u)", NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_setfsgid32
> -{ TARGET_NR_setfsgid32, "setfsgid32" , NULL, NULL, NULL },
> +{ TARGET_NR_setfsgid32, "setfsgid32" , "%s(%u)" , NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_setfsuid
> -{ TARGET_NR_setfsuid, "setfsuid" , NULL, NULL, NULL },
> +{ TARGET_NR_setfsuid, "setfsuid" , "%s(%u)" , NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_setfsuid32
>  { TARGET_NR_setfsuid32, "setfsuid32" , NULL, NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_setgid
> -{ TARGET_NR_setgid, "setgid" , NULL, NULL, NULL },
> +{ TARGET_NR_setgid, "setgid" , "%s(%u)", NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_setgid32
> -{ TARGET_NR_setgid32, "setgid32" , NULL, NULL, NULL },
> +{ TARGET_NR_setgid32, "setgid32" , "%s(%u)", NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_setgroups
>  { TARGET_NR_setgroups, "setgroups" , NULL, NULL, NULL },
> @@ -1296,7 +1296,7 @@
>  { TARGET_NR_setns, "setns" , NULL, NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_setpgid
> -{ TARGET_NR_setpgid, "setpgid" , NULL, NULL, NULL },
> +{ TARGET_NR_setpgid, "setpgid" , "%s(%u,%u)", NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_setpgrp
>  { TARGET_NR_setpgrp, "setpgrp" , NULL, NULL, NULL },
> @@ -1311,22 +1311,22 @@
>  { TARGET_NR_setregid32, "setregid32" , NULL, NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_setresgid
> -{ TARGET_NR_setresgid, "setresgid" , NULL, NULL, NULL },
> +{ TARGET_NR_setresgid, "setresgid" , "%

Re: [PATCH] configure: Use lld --image-base for --disable-pie user mode binaries

2019-11-20 Thread Fangrui Song


On 2019-11-15, Fangrui Song wrote:

For lld, --image-base is the preferred way to set the base address.
lld does not actually implement -Ttext-segment, but treats it as an alias for
-Ttext. -Ttext-segment=0x6000 combined with --no-rosegment can
create a 1.6GB executable.

Fix the problem by using --image-base for lld. GNU ld and gold will
still get -Ttext-segment. Also delete the ld --verbose fallback introduced
in 2013, which is no longer relevant or correct (the default linker
script has changed).

Signed-off-by: Fangrui Song 
---
 configure | 33 -
 1 file changed, 12 insertions(+), 21 deletions(-)

diff --git a/configure b/configure
index 6099be1d84..2d45af0d09 100755
--- a/configure
+++ b/configure
@@ -6336,43 +6336,34 @@ fi

 # Probe for the need for relocating the user-only binary.
 if ( [ "$linux_user" = yes ] || [ "$bsd_user" = yes ] ) && [ "$pie" = no ]; then
-  textseg_addr=
+  image_base=
   case "$cpu" in
     arm | i386 | ppc* | s390* | sparc* | x86_64 | x32)
-      # ??? Rationale for choosing this address
-      textseg_addr=0x6000
+      # An arbitrary address that makes it unlikely to collide with user
+      # programs.
+      image_base=0x6000
       ;;
     mips)
       # A 256M aligned address, high in the address space, with enough
       # room for the code_gen_buffer above it before the stack.
-      textseg_addr=0x6000
+      image_base=0x6000
       ;;
   esac
-  if [ -n "$textseg_addr" ]; then
+  if [ -n "$image_base" ]; then
 cat > $TMPC &1; then
+    image_base_ldflags="-Wl,--image-base=$image_base"
+    if ! compile_prog "" "$image_base_ldflags"; then
+      image_base_ldflags="-Wl,-Ttext-segment=$image_base"
+      if ! compile_prog "" "$image_base_ldflags"; then
 error_exit \
 "We need to link the QEMU user mode binaries at a" \
 "specific text address. Unfortunately your linker" \
-"doesn't support either the -Ttext-segment option or" \
-"printing the default linker script with --verbose." \
+"supports neither --image-base nor -Ttext-segment. " \
 "If you don't want the user mode binaries, pass the" \
 "--disable-user option to configure."
   fi
-
-  $ld --verbose | sed \
--e '1,/==/d' \
--e '/==/,$d' \
--e "s/[.] = [0-9a-fx]* [+] SIZEOF_HEADERS/. = $textseg_addr + 
SIZEOF_HEADERS/" \
--e "s/__executable_start = [0-9a-fx]*/__executable_start = 
$textseg_addr/" > config-host.ld
-  textseg_ldflags="-Wl,-T../config-host.ld"
 fi
   fi
 fi
@@ -7945,7 +7936,7 @@ if test "$gprof" = "yes" ; then
 fi

 if test "$target_linux_user" = "yes" || test "$target_bsd_user" = "yes" ; then
-  ldflags="$ldflags $textseg_ldflags"
+  ldflags="$ldflags $image_base_ldflags"
 fi

 # Newer kernels on s390 check for an S390_PGSTE program header and
--
2.24.0



Ping :)

Re: [PATCH] linux-user/strace: Improve output of various syscalls

2019-11-20 Thread Aleksandar Markovic

On Wed, Nov 20, 2019 at 9:57 PM Aleksandar Markovic
 wrote:
>
> On Tue, Nov 19, 2019 at 8:05 PM Helge Deller  wrote:
> >
> > Improve strace output of various syscalls which either have none
> > or only int-type parameters.
> >
> > Signed-off-by: Helge Deller 
> >
>
> A very good patch!
>
> It would be even better if it covered ALL syscalls either without
> parameters or with all int-like parameters.
>
> I believe this table can be very useful to you, for the purpose
> of identifying such syscalls, and completing your patch:
>
> https://blog.rchapman.org/posts/Linux_System_Call_Table_for_x86_64/
>

You can even add items in strace.list for, let's say:

sys_sched_get_priority_max()
sys_sched_get_priority_min()

Both have "int policy" as a sole argument.
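For reference, an entry's format string is consumed with the syscall name first and the arguments after it. So `"%s(%d)"` covers a single-int call such as sched_get_priority_max(). The stand-alone sketch below is not the code in linux-user/strace.c. `struct syscall_name`, `entries[]` and `format_call()` are hypothetical, and the arguments are plain `int`s for simplicity.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified strace.list-style entry: just the name and the
 * format string. The first %s receives the syscall name; the remaining
 * conversions receive the arguments. */
struct syscall_name {
    const char *name;
    const char *format;
};

static const struct syscall_name entries[] = {
    { "sched_get_priority_max", "%s(%d)" },
    { "dup2",                   "%s(%d,%d)" },
    { "ioctl",                  "%s(%d,%#x,%#x)" },
};

/* Render one call into buf. Arguments the format does not consume are
 * evaluated but ignored, so every entry can be printed the same way. */
static int format_call(char *buf, size_t len, const struct syscall_name *sc,
                       int a1, int a2, int a3)
{
    assert(sc && sc->format);
    return snprintf(buf, len, sc->format, sc->name, a1, a2, a3);
}
```

Passing a fixed number of arguments works because the C printf family ignores arguments that the format does not consume. Also note that `%#x` prints a zero argument as `0`, not `0x0`. With policy 1 (SCHED_FIFO on Linux), the first entry renders as `sched_get_priority_max(1)`.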

> Regards,
> Aleksandar
>
>
> > diff --git a/linux-user/strace.list b/linux-user/strace.list
> > index 1de4319dcf..5163717087 100644
> > --- a/linux-user/strace.list
> > +++ b/linux-user/strace.list
> > @@ -26,7 +26,7 @@
> >  { TARGET_NR_afs_syscall, "afs_syscall" , NULL, NULL, NULL },
> >  #endif
> >  #ifdef TARGET_NR_alarm
> > -{ TARGET_NR_alarm, "alarm" , NULL, NULL, NULL },
> > +{ TARGET_NR_alarm, "alarm" , "%s(%d)", NULL, NULL },
> >  #endif
> >  #ifdef TARGET_NR_aplib
> >  { TARGET_NR_aplib, "aplib" , NULL, NULL, NULL },
> > @@ -116,13 +116,13 @@
> >  { TARGET_NR_dipc, "dipc" , NULL, NULL, NULL },
> >  #endif
> >  #ifdef TARGET_NR_dup
> > -{ TARGET_NR_dup, "dup" , NULL, NULL, NULL },
> > +{ TARGET_NR_dup, "dup" , "%s(%d)", NULL, NULL },
> >  #endif
> >  #ifdef TARGET_NR_dup2
> > -{ TARGET_NR_dup2, "dup2" , NULL, NULL, NULL },
> > +{ TARGET_NR_dup2, "dup2" , "%s(%d,%d)", NULL, NULL },
> >  #endif
> >  #ifdef TARGET_NR_dup3
> > -{ TARGET_NR_dup3, "dup3" , NULL, NULL, NULL },
> > +{ TARGET_NR_dup3, "dup3" , "%s(%d,%d,%d)", NULL, NULL },
> >  #endif
> >  #ifdef TARGET_NR_epoll_create
> >  { TARGET_NR_epoll_create, "epoll_create" , NULL, NULL, NULL },
> > @@ -191,7 +191,7 @@
> >  { TARGET_NR_fanotify_mark, "fanotify_mark" , NULL, NULL, NULL },
> >  #endif
> >  #ifdef TARGET_NR_fchdir
> > -{ TARGET_NR_fchdir, "fchdir" , NULL, NULL, NULL },
> > +{ TARGET_NR_fchdir, "fchdir" , "%s(%d)", NULL, NULL },
> >  #endif
> >  #ifdef TARGET_NR_fchmod
> >  { TARGET_NR_fchmod, "fchmod" , "%s(%d,%#o)", NULL, NULL },
> > @@ -287,7 +287,7 @@
> >  { TARGET_NR_getdtablesize, "getdtablesize" , NULL, NULL, NULL },
> >  #endif
> >  #ifdef TARGET_NR_getegid
> > -{ TARGET_NR_getegid, "getegid" , NULL, NULL, NULL },
> > +{ TARGET_NR_getegid, "getegid" , "%s()", NULL, NULL },
> >  #endif
> >  #ifdef TARGET_NR_getegid32
> >  { TARGET_NR_getegid32, "getegid32" , NULL, NULL, NULL },
> > @@ -299,7 +299,7 @@
> >  { TARGET_NR_geteuid32, "geteuid32" , NULL, NULL, NULL },
> >  #endif
> >  #ifdef TARGET_NR_getgid
> > -{ TARGET_NR_getgid, "getgid" , NULL, NULL, NULL },
> > +{ TARGET_NR_getgid, "getgid" , "%s()", NULL, NULL },
> >  #endif
> >  #ifdef TARGET_NR_getgid32
> >  { TARGET_NR_getgid32, "getgid32" , NULL, NULL, NULL },
> > @@ -329,10 +329,10 @@
> >  { TARGET_NR_getpeername, "getpeername" , NULL, NULL, NULL },
> >  #endif
> >  #ifdef TARGET_NR_getpgid
> > -{ TARGET_NR_getpgid, "getpgid" , NULL, NULL, NULL },
> > +{ TARGET_NR_getpgid, "getpgid" , "%s(%u)", NULL, NULL },
> >  #endif
> >  #ifdef TARGET_NR_getpgrp
> > -{ TARGET_NR_getpgrp, "getpgrp" , NULL, NULL, NULL },
> > +{ TARGET_NR_getpgrp, "getpgrp" , "%s()", NULL, NULL },
> >  #endif
> >  #ifdef TARGET_NR_getpid
> >  { TARGET_NR_getpid, "getpid" , "%s()", NULL, NULL },
> > @@ -432,7 +432,7 @@
> >  { TARGET_NR_io_cancel, "io_cancel" , NULL, NULL, NULL },
> >  #endif
> >  #ifdef TARGET_NR_ioctl
> > -{ TARGET_NR_ioctl, "ioctl" , NULL, NULL, NULL },
> > +{ TARGET_NR_ioctl, "ioctl" , "%s(%d,%#x,%#x,%#x,%#x,%#x)", NULL, NULL },
> >  #endif
> >  #ifdef TARGET_NR_io_destroy
> >  { TARGET_NR_io_destroy, "io_destroy" , NULL, NULL, NULL },
> > @@ -1257,22 +1257,22 @@
> >  { TARGET_NR_setdomainname, "setdomainname" , NULL, NULL, NULL },
> >  #endif
> >  #ifdef TARGET_NR_setfsgid
> > -{ TARGET_NR_setfsgid, "setfsgid" , NULL, NULL, NULL },
> > +{ TARGET_NR_setfsgid, "setfsgid" , "%s(%u)", NULL, NULL },
> >  #endif
> >  #ifdef TARGET_NR_setfsgid32
> > -{ TARGET_NR_setfsgid32, "setfsgid32" , NULL, NULL, NULL },
> > +{ TARGET_NR_setfsgid32, "setfsgid32" , "%s(%u)" , NULL, NULL },
> >  #endif
> >  #ifdef TARGET_NR_setfsuid
> > -{ TARGET_NR_setfsuid, "setfsuid" , NULL, NULL, NULL },
> > +{ TARGET_NR_setfsuid, "setfsuid" , "%s(%u)" , NULL, NULL },
> >  #endif
> >  #ifdef TARGET_NR_setfsuid32
> >  { TARGET_NR_setfsuid32, "setfsuid32" , NULL, NULL, NULL },
> >  #endif
> >  #ifdef TARGET_NR_setgid
> > -{ TARGET_NR_setgid, "setgid" , NULL, NULL, NULL },
> > +{ TARGET_NR_setgid, "setgid" , "%s(%u)", NULL, NULL },
> >  #endif
> >  #ifdef TARGET_NR_setgid32
> > -{ TARGET_NR_setgid32, "setgid32" , NULL, NULL, NULL },
> > +{ TARGET_NR_setgid32, "setgid32" , "%s(%u)", NULL, NULL },
> >  #endif
> >  #ifdef TARGET_NR_setgroups
> >  { TARGET_NR_setgroups, "setgroups

Re: [PATCH] linux-user/strace: Improve output of various syscalls

2019-11-20 Thread Aleksandar Markovic

On Tue, Nov 19, 2019 at 8:05 PM Helge Deller  wrote:
>
> Improve strace output of various syscalls which either have none
> or only int-type parameters.
>
> Signed-off-by: Helge Deller 
>

A very good patch!

It would be even better if it covered ALL syscalls either without
parameter or with all int-like parameters.

I believe this table can be very useful to you, for the purpose
of identifying such syscalls, and completing your patch:

https://blog.rchapman.org/posts/Linux_System_Call_Table_for_x86_64/

Regards,
Aleksandar


> diff --git a/linux-user/strace.list b/linux-user/strace.list
> index 1de4319dcf..5163717087 100644
> --- a/linux-user/strace.list
> +++ b/linux-user/strace.list
> @@ -26,7 +26,7 @@
>  { TARGET_NR_afs_syscall, "afs_syscall" , NULL, NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_alarm
> -{ TARGET_NR_alarm, "alarm" , NULL, NULL, NULL },
> +{ TARGET_NR_alarm, "alarm" , "%s(%d)", NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_aplib
>  { TARGET_NR_aplib, "aplib" , NULL, NULL, NULL },
> @@ -116,13 +116,13 @@
>  { TARGET_NR_dipc, "dipc" , NULL, NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_dup
> -{ TARGET_NR_dup, "dup" , NULL, NULL, NULL },
> +{ TARGET_NR_dup, "dup" , "%s(%d)", NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_dup2
> -{ TARGET_NR_dup2, "dup2" , NULL, NULL, NULL },
> +{ TARGET_NR_dup2, "dup2" , "%s(%d,%d)", NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_dup3
> -{ TARGET_NR_dup3, "dup3" , NULL, NULL, NULL },
> +{ TARGET_NR_dup3, "dup3" , "%s(%d,%d,%d)", NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_epoll_create
>  { TARGET_NR_epoll_create, "epoll_create" , NULL, NULL, NULL },
> @@ -191,7 +191,7 @@
>  { TARGET_NR_fanotify_mark, "fanotify_mark" , NULL, NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_fchdir
> -{ TARGET_NR_fchdir, "fchdir" , NULL, NULL, NULL },
> +{ TARGET_NR_fchdir, "fchdir" , "%s(%d)", NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_fchmod
>  { TARGET_NR_fchmod, "fchmod" , "%s(%d,%#o)", NULL, NULL },
> @@ -287,7 +287,7 @@
>  { TARGET_NR_getdtablesize, "getdtablesize" , NULL, NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_getegid
> -{ TARGET_NR_getegid, "getegid" , NULL, NULL, NULL },
> +{ TARGET_NR_getegid, "getegid" , "%s()", NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_getegid32
>  { TARGET_NR_getegid32, "getegid32" , NULL, NULL, NULL },
> @@ -299,7 +299,7 @@
>  { TARGET_NR_geteuid32, "geteuid32" , NULL, NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_getgid
> -{ TARGET_NR_getgid, "getgid" , NULL, NULL, NULL },
> +{ TARGET_NR_getgid, "getgid" , "%s()", NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_getgid32
>  { TARGET_NR_getgid32, "getgid32" , NULL, NULL, NULL },
> @@ -329,10 +329,10 @@
>  { TARGET_NR_getpeername, "getpeername" , NULL, NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_getpgid
> -{ TARGET_NR_getpgid, "getpgid" , NULL, NULL, NULL },
> +{ TARGET_NR_getpgid, "getpgid" , "%s(%u)", NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_getpgrp
> -{ TARGET_NR_getpgrp, "getpgrp" , NULL, NULL, NULL },
> +{ TARGET_NR_getpgrp, "getpgrp" , "%s()", NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_getpid
>  { TARGET_NR_getpid, "getpid" , "%s()", NULL, NULL },
> @@ -432,7 +432,7 @@
>  { TARGET_NR_io_cancel, "io_cancel" , NULL, NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_ioctl
> -{ TARGET_NR_ioctl, "ioctl" , NULL, NULL, NULL },
> +{ TARGET_NR_ioctl, "ioctl" , "%s(%d,%#x,%#x,%#x,%#x,%#x)", NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_io_destroy
>  { TARGET_NR_io_destroy, "io_destroy" , NULL, NULL, NULL },
> @@ -1257,22 +1257,22 @@
>  { TARGET_NR_setdomainname, "setdomainname" , NULL, NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_setfsgid
> -{ TARGET_NR_setfsgid, "setfsgid" , NULL, NULL, NULL },
> +{ TARGET_NR_setfsgid, "setfsgid" , "%s(%u)", NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_setfsgid32
> -{ TARGET_NR_setfsgid32, "setfsgid32" , NULL, NULL, NULL },
> +{ TARGET_NR_setfsgid32, "setfsgid32" , "%s(%u)" , NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_setfsuid
> -{ TARGET_NR_setfsuid, "setfsuid" , NULL, NULL, NULL },
> +{ TARGET_NR_setfsuid, "setfsuid" , "%s(%u)" , NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_setfsuid32
>  { TARGET_NR_setfsuid32, "setfsuid32" , NULL, NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_setgid
> -{ TARGET_NR_setgid, "setgid" , NULL, NULL, NULL },
> +{ TARGET_NR_setgid, "setgid" , "%s(%u)", NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_setgid32
> -{ TARGET_NR_setgid32, "setgid32" , NULL, NULL, NULL },
> +{ TARGET_NR_setgid32, "setgid32" , "%s(%u)", NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_setgroups
>  { TARGET_NR_setgroups, "setgroups" , NULL, NULL, NULL },
> @@ -1296,7 +1296,7 @@
>  { TARGET_NR_setns, "setns" , NULL, NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_setpgid
> -{ TARGET_NR_setpgid, "setpgid" , NULL, NULL, NULL },
> +{ TARGET_NR_setpgid, "setpgid" , "%s(%u,%u)", NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_setpgrp
>  { TARGET_NR_setpgrp, "setpgrp" , NULL, NULL, NULL },
> @@ -1311,22 +1311,22 @@
>  { TARGET_NR_setregid32, "setregid32" , NULL, NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_set

Re: [PATCH 6/6] qapi: Simplify QAPISchemaModularCVisitor

2019-11-20 Thread Eric Blake


On 11/20/19 12:25 PM, Markus Armbruster wrote:

Since the previous commit, QAPISchemaVisitor.visit_module() is called
just once.  Simplify QAPISchemaModularCVisitor accordingly.

Signed-off-by: Markus Armbruster 
---


Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH] target/i386: add VMX features to named CPU models

2019-11-20 Thread Paolo Bonzini

On 20/11/19 19:45, Eduardo Habkost wrote:
> On Wed, Nov 20, 2019 at 06:37:53PM +0100, Paolo Bonzini wrote:
>> This allows using "-cpu Haswell,+vmx", which we did not really want to
>> support in QEMU but was produced by Libvirt when using the "host-model"
>> CPU model.
> 
> I understand guest ABI compatibility is not a concern, but I
> don't remember how we guarantee it won't break by accident if
> somebody tries to live migrate a VM.

I'm not sure I understand the question, but I can answer the second part:

> What is supposed to happen today if trying to live migrate a VM
> using "-cpu Haswell,+vmx"?

Before 4.2: same guest ABI compatibility as "-cpu host".

4.2+: ABI compatibility is preserved, because each named CPU model can
be given a precise set of features that are matched against the host
(and are subject to check/enforce).

4.1->4.2: the ABI *should* be preserved if you're running "-cpu
SandyBridge,+vmx" on an actual Sandy Bridge, but some VMX features will
disappear after live migration if e.g. you're running "-cpu
SandyBridge,+vmx" on a Haswell.  Host-model should be fine though.

Paolo

Re: [PATCH] linux-user/strace: Improve output of various syscalls

2019-11-20 Thread Aleksandar Markovic

> @@ -26,7 +26,7 @@
>  { TARGET_NR_afs_syscall, "afs_syscall" , NULL, NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_alarm
> -{ TARGET_NR_alarm, "alarm" , NULL, NULL, NULL },
> +{ TARGET_NR_alarm, "alarm" , "%s(%d)", NULL, NULL },
>  #endif

Man page says:

unsigned int alarm(unsigned int seconds)

The sole argument is unsigned int - therefore "%d" should be "%u",
shouldn't it?

--

This is not a part of your changes, but appeared in your patch diff:

>  #ifdef TARGET_NR_epoll_create
>  { TARGET_NR_epoll_create, "epoll_create" , NULL, NULL, NULL },

From man pages:

int epoll_create(int size);

So, this also belongs to the category "has only int-type parameters",
and "%s(%d)" should be used, no?

---

>  #ifdef TARGET_NR_setresgid
> -{ TARGET_NR_setresgid, "setresgid" , NULL, NULL, NULL },
> +{ TARGET_NR_setresgid, "setresgid" , "%s(%u,%u,%u)", NULL, NULL },
>  #endif
>  #ifdef TARGET_NR_setresgid32
>  { TARGET_NR_setresgid32, "setresgid32" , NULL, NULL, NULL },
>  #endif

Why are you here correcting setresgid(), but leaving setresgid32()
intact, even though they have the same argument type pattern?

--

I have these objections; however, in general, I salute the patch and
your efforts to improve QEMU linux-user strace. It is quite a useful
debug tool, so thanks for doing this! :)

Yours,
Aleksandar

Re: [PATCH 5/6] qapi: Fix code generation for empty modules

2019-11-20 Thread Eric Blake


On 11/20/19 12:25 PM, Markus Armbruster wrote:

When a sub-module doesn't contain any definitions, we don't generate
code for it, but we do generate the #include.

We generate code only for modules that get visited.
QAPISchema.visit() visits only modules that have definitions.  It can
visit modules multiple times.

Clean this up as follows.  Collect entities in their QAPISchemaModule.
Have QAPISchema.visit() call QAPISchemaModule.visit() for each module.
Have QAPISchemaModule.visit() call .visit_module() for itself, and
QAPISchemaEntity.visit() for each of its entities.  This way, we visit
each module exactly once.

Signed-off-by: Markus Armbruster 
---


Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH 4/6] qapi: Proper intermediate representation for modules

2019-11-20 Thread Eric Blake


On 11/20/19 12:25 PM, Markus Armbruster wrote:

Modules are represented only by their names so far.  Introduce class
QAPISchemaModule.  So far, it merely wraps the name.  The next patch
will put it to more interesting use.

Once again, arrays spice up the patch a bit.  For any other type,
@info points to the definition, which lets us map from @info to
module.  For arrays, there is no definition, and @info points to the
first use instead.  We have to use the element type's module instead,
which is only available after .check().

Signed-off-by: Markus Armbruster 
---
  scripts/qapi/schema.py | 63 --
  1 file changed, 43 insertions(+), 20 deletions(-)



Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH] target/i386: add support for MSR_IA32_TSX_CTRL

2019-11-20 Thread Eduardo Habkost

On Wed, Nov 20, 2019 at 01:19:22PM +0100, Paolo Bonzini wrote:
> The MSR_IA32_TSX_CTRL MSR can be used to hide TSX (also known as the
> Trusty Side-channel Extension).  By virtualizing the MSR, KVM guests
> can disable TSX and avoid paying the price of mitigating TSX-based
> attacks on microarchitectural side channels.
> 
> Signed-off-by: Paolo Bonzini 

Reviewed-by: Eduardo Habkost 

Minor suggestion, though: replacing the tabs below with spaces:

[...]
> +#define ARCH_CAP_TSX_CTRL_MSR(1<<7)
[...]
> +#define MSR_IA32_TSX_CTRL0x122

-- 
Eduardo

Re: [PATCH 0/6] qapi: Module fixes and cleanups

2019-11-20 Thread Markus Armbruster

Fat-fingered Kevin's e-mail address...

Markus Armbruster  writes:

> Kevin recently posted a minimally invasive fix for empty QAPI
> modules[*].  This is my attempt at a fix that also addresses the
> design weakness that led to the bug.
>
> Markus Armbruster (6):
>   qapi: Tweak "command returns a nice type" check for clarity
>   tests/Makefile.include: Fix missing test-qapi-emit-events.[ch]
>   qapi: Generate command registration stuff into separate files
>   qapi: Proper intermediate representation for modules
>   qapi: Fix code generation for empty modules
>   qapi: Simplify QAPISchemaModularCVisitor
>
>  docs/devel/qapi-code-gen.txt | 19 -
>  Makefile |  4 +-
>  monitor/misc.c   |  7 +-
>  qga/main.c   |  2 +-
>  tests/test-qmp-cmds.c|  1 +
>  .gitignore   |  1 +
>  qapi/Makefile.objs   |  1 +
>  qga/Makefile.objs|  1 +
>  scripts/qapi/commands.py | 17 +++--
>  scripts/qapi/events.py   |  2 +-
>  scripts/qapi/gen.py  | 28 
>  scripts/qapi/schema.py   | 92 +++-
>  scripts/qapi/types.py|  5 +-
>  scripts/qapi/visit.py|  8 +--
>  tests/.gitignore |  1 +
>  tests/Makefile.include   |  9 ++-
>  tests/qapi-schema/empty.out  |  1 +
>  tests/qapi-schema/include-repetition.out |  6 +-
>  tests/qapi-schema/qapi-schema-test.out   | 24 +++
>  19 files changed, 144 insertions(+), 85 deletions(-)

Re: [PATCH 3/6] qapi: Generate command registration stuff into separate files

2019-11-20 Thread Eric Blake


On 11/20/19 12:25 PM, Markus Armbruster wrote:

Having to include qapi-commands.h just for qmp_init_marshal() is
suboptimal.  Generate it into separate files.  This lets
monitor/misc.c, qga/main.c, and the generated qapi-commands-FOO.h
include less.

Signed-off-by: Markus Armbruster 
---



+++ b/docs/devel/qapi-code-gen.txt
@@ -1493,6 +1493,10 @@ $(prefix)qapi-commands.c: Command marshal/dispatch functions for each
  $(prefix)qapi-commands.h: Function prototypes for the QMP commands
specified in the schema
  
+$(prefix)qapi-init-commands.h - Command initialization prototype

+
+$(prefix)qapi-init-commands.h - Command initialization code


Looks like you meant s/h/c/



+#endif /* EXAMPLE_QAPI_INIT_COMMANDS_H */
+$ cat qapi-generated/example-qapi-init-commands.
+[Uninteresting stuff omitted...]


missing a 'c'


+++ b/Makefile


  
-QGALIB_GEN=$(addprefix qga/qapi-generated/, qga-qapi-types.h qga-qapi-visit.h qga-qapi-commands.h)

+QGALIB_GEN=$(addprefix qga/qapi-generated/, qga-qapi-types.h qga-qapi-visit.h qga-qapi-commands.h qga-qapi-init-commands.h)


Worth using \ for line-wrapping?

With those addressed,
Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH 2/6] tests/Makefile.include: Fix missing test-qapi-emit-events.[ch]

2019-11-20 Thread Eric Blake


On 11/20/19 12:25 PM, Markus Armbruster wrote:

Commit 5d75648b56 "qapi: Generate QAPIEvent stuff into separate files"
added tests/test-qapi-emit-events.[ch] to the set of generated files,
but neglected to update tests/.gitignore and tests/Makefile.include.
Commit a0af8cee3c "tests/.gitignore: ignore test-qapi-emit-events.[ch]
for in-tree builds" fixed the former.  Now fix the latter.

Signed-off-by: Markus Armbruster 
---
  tests/Makefile.include | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)



Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

[Bug 1852781] Re: qemu s390x on focal - applications breaking

2019-11-20 Thread Frank Heimes

** Also affects: ubuntu-z-systems
   Importance: Undecided
   Status: New

** Changed in: ubuntu-z-systems
   Status: New => Incomplete

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1852781

Title:
  qemu s390x on focal - applications breaking

Status in QEMU:
  Incomplete
Status in Ubuntu on IBM z Systems:
  Incomplete

Bug description:
  Running qemu-system-s390x (1:4.0+dfsg-0ubuntu10) on an x86-64 Focal
  host with an upgrade of a Eoan s390x VM to a Focal s390x is triggering
  random breakage, for example:

  sudo apt-get update && sudo apt-get dist-upgrade

  ...
  ...

  Unpacking debianutils (4.9) over (4.8.6.3) ...
  Setting up debianutils (4.9) ...
  Use of uninitialized value $ARGV[0] in string ne at /usr/sbin/update-mime line 43.
  (Reading database ... 83640 files and directories currently installed.)
  Preparing to unpack .../bash_5.0-5ubuntu1_s390x.deb ...
  Unpacking bash (5.0-5ubuntu1) over (5.0-4ubuntu1) ...
  Setting up bash (5.0-5ubuntu1) ...
  [12124.788618] User process fault: interruption code 0007 ilc:3 in bash[2aa3d78+149000]
  dpkg: error processing package bash (--configure):
   installed bash package post-installation script subprocess was killed by signal (Floating point exception), core dumped
  Errors were encountered while processing:
   bash
  E: Sub-process /usr/bin/dpkg returned an error code (1)

  And now bash is completely broken:

  cking@eoan-s390x:~$ bash
  [12676.204389] User process fault: interruption code 0007 ilc:3 in bash[2aa1478+149000]

  Floating point exception (core dumped)

  The upgrade works OK on a s390x, so I'm assuming it's something to do
  with the qemu emulation.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1852781/+subscriptions

[PATCH 9/9] monitor/hmp: Prefer to use hmp_handle_error for error reporting in block hmp commands

2019-11-20 Thread Maxim Levitsky

This way they will all be prefixed with 'Error:', which some parsers
(e.g. libvirt) need.


Signed-off-by: Maxim Levitsky 
---
 blockdev-hmp-cmds.c | 35 +--
 1 file changed, 21 insertions(+), 14 deletions(-)

diff --git a/blockdev-hmp-cmds.c b/blockdev-hmp-cmds.c
index c943dccd03..197994716f 100644
--- a/blockdev-hmp-cmds.c
+++ b/blockdev-hmp-cmds.c
@@ -59,7 +59,6 @@ void hmp_drive_add(Monitor *mon, const QDict *qdict)
 mc = MACHINE_GET_CLASS(current_machine);
 dinfo = drive_new(opts, mc->block_default_type, &err);
 if (err) {
-error_report_err(err);
 qemu_opts_del(opts);
 goto err;
 }
@@ -73,7 +72,7 @@ void hmp_drive_add(Monitor *mon, const QDict *qdict)
 monitor_printf(mon, "OK\n");
 break;
 default:
-monitor_printf(mon, "Can't hot-add drive to type %d\n", dinfo->type);
+error_setg(&err, "Can't hot-add drive to type %d", dinfo->type);
 goto err;
 }
 return;
@@ -84,6 +83,7 @@ err:
 monitor_remove_blk(blk);
 blk_unref(blk);
 }
+hmp_handle_error(mon, &err);
 }
 
 void hmp_drive_del(Monitor *mon, const QDict *qdict)
@@ -105,14 +105,14 @@ void hmp_drive_del(Monitor *mon, const QDict *qdict)
 
 blk = blk_by_name(id);
 if (!blk) {
-error_report("Device '%s' not found", id);
-return;
+error_setg(&local_err, "Device '%s' not found", id);
+goto err;
 }
 
 if (!blk_legacy_dinfo(blk)) {
-error_report("Deleting device added with blockdev-add"
- " is not supported");
-return;
+error_setg(&local_err,
+   "Deleting device added with blockdev-add is not supported");
+goto err;
 }
 
 aio_context = blk_get_aio_context(blk);
@@ -121,9 +121,8 @@ void hmp_drive_del(Monitor *mon, const QDict *qdict)
 bs = blk_bs(blk);
 if (bs) {
 if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_DRIVE_DEL, &local_err)) {
-error_report_err(local_err);
 aio_context_release(aio_context);
-return;
+goto err;
 }
 
 blk_remove_bs(blk);
@@ -144,12 +143,15 @@ void hmp_drive_del(Monitor *mon, const QDict *qdict)
 }
 
 aio_context_release(aio_context);
+err:
+hmp_handle_error(mon, &local_err);
 }
 
 void hmp_commit(Monitor *mon, const QDict *qdict)
 {
 const char *device = qdict_get_str(qdict, "device");
 BlockBackend *blk;
+Error *local_err = NULL;
 int ret;
 
 if (!strcmp(device, "all")) {
@@ -160,12 +162,12 @@ void hmp_commit(Monitor *mon, const QDict *qdict)
 
 blk = blk_by_name(device);
 if (!blk) {
-error_report("Device '%s' not found", device);
-return;
+error_setg(&local_err, "Device '%s' not found", device);
+goto err;
 }
 if (!blk_is_available(blk)) {
-error_report("Device '%s' has no medium", device);
-return;
+error_setg(&local_err, "Device '%s' has no medium", device);
+goto err;
 }
 
 bs = blk_bs(blk);
@@ -177,8 +179,13 @@ void hmp_commit(Monitor *mon, const QDict *qdict)
 aio_context_release(aio_context);
 }
 if (ret < 0) {
-error_report("'commit' error for '%s': %s", device, strerror(-ret));
+error_setg(&local_err,
+   "'commit' error for '%s': %s", device, strerror(-ret));
+goto err;
 }
+return;
+err:
+hmp_handle_error(mon, &local_err);
 }
 
 void hmp_drive_mirror(Monitor *mon, const QDict *qdict)
-- 
2.17.2

Re: [PATCH 1/6] qapi: Tweak "command returns a nice type" check for clarity

2019-11-20 Thread Eric Blake


On 11/20/19 12:25 PM, Markus Armbruster wrote:

Signed-off-by: Markus Armbruster 
---
  scripts/qapi/schema.py | 9 +
  1 file changed, 5 insertions(+), 4 deletions(-)


Reviewed-by: Eric Blake 



diff --git a/scripts/qapi/schema.py b/scripts/qapi/schema.py
index cf0045f34e..cfb574c85d 100644
--- a/scripts/qapi/schema.py
+++ b/scripts/qapi/schema.py
@@ -711,10 +711,11 @@ class QAPISchemaCommand(QAPISchemaEntity):
  self.ret_type = schema.resolve_type(
  self._ret_type_name, self.info, "command's 'returns'")
  if self.name not in self.info.pragma.returns_whitelist:
-if not (isinstance(self.ret_type, QAPISchemaObjectType)
-or (isinstance(self.ret_type, QAPISchemaArrayType)
-and isinstance(self.ret_type.element_type,
-   QAPISchemaObjectType))):
+typ = self.ret_type
+if isinstance(typ, QAPISchemaArrayType):
+typ = self.ret_type.element_type
+assert typ
+if not isinstance(typ, QAPISchemaObjectType):
  raise QAPISemError(
  self.info,
  "command's 'returns' cannot take %s"



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

[PATCH 8/9] monitor: move hmp_info_block* to blockdev-hmp-cmds.c

2019-11-20 Thread Maxim Levitsky

Signed-off-by: Maxim Levitsky 
---
 blockdev-hmp-cmds.c | 247 
 monitor/hmp-cmds.c  | 245 ---
 2 files changed, 247 insertions(+), 245 deletions(-)

diff --git a/blockdev-hmp-cmds.c b/blockdev-hmp-cmds.c
index 76951352b1..c943dccd03 100644
--- a/blockdev-hmp-cmds.c
+++ b/blockdev-hmp-cmds.c
@@ -33,6 +33,7 @@
 #include "sysemu/sysemu.h"
 #include "monitor/monitor.h"
 #include "block/block_int.h"
+#include "block/qapi.h"
 #include "qapi/qapi-commands-block.h"
 #include "qapi/qmp/qerror.h"
 #include "monitor/hmp.h"
@@ -400,3 +401,249 @@ void hmp_block_set_io_throttle(Monitor *mon, const QDict *qdict)
 qmp_block_set_io_throttle(&throttle, &err);
 hmp_handle_error(mon, &err);
 }
+
+static void print_block_info(Monitor *mon, BlockInfo *info,
+ BlockDeviceInfo *inserted, bool verbose)
+{
+ImageInfo *image_info;
+
+assert(!info || !info->has_inserted || info->inserted == inserted);
+
+if (info && *info->device) {
+monitor_printf(mon, "%s", info->device);
+if (inserted && inserted->has_node_name) {
+monitor_printf(mon, " (%s)", inserted->node_name);
+}
+} else {
+assert(info || inserted);
+monitor_printf(mon, "%s",
+   inserted && inserted->has_node_name ? inserted->node_name
+   : info && info->has_qdev ? info->qdev
+   : "");
+}
+
+if (inserted) {
+monitor_printf(mon, ": %s (%s%s%s)\n",
+   inserted->file,
+   inserted->drv,
+   inserted->ro ? ", read-only" : "",
+   inserted->encrypted ? ", encrypted" : "");
+} else {
+monitor_printf(mon, ": [not inserted]\n");
+}
+
+if (info) {
+if (info->has_qdev) {
+monitor_printf(mon, "Attached to:  %s\n", info->qdev);
+}
+if (info->has_io_status && info->io_status != BLOCK_DEVICE_IO_STATUS_OK) {
+monitor_printf(mon, "I/O status:   %s\n",
+   BlockDeviceIoStatus_str(info->io_status));
+}
+
+if (info->removable) {
+monitor_printf(mon, "Removable device: %slocked, tray %s\n",
+   info->locked ? "" : "not ",
+   info->tray_open ? "open" : "closed");
+}
+}
+
+
+if (!inserted) {
+return;
+}
+
+monitor_printf(mon, "Cache mode:   %s%s%s\n",
+   inserted->cache->writeback ? "writeback" : "writethrough",
+   inserted->cache->direct ? ", direct" : "",
+   inserted->cache->no_flush ? ", ignore flushes" : "");
+
+if (inserted->has_backing_file) {
+monitor_printf(mon,
+   "Backing file: %s "
+   "(chain depth: %" PRId64 ")\n",
+   inserted->backing_file,
+   inserted->backing_file_depth);
+}
+
+if (inserted->detect_zeroes != BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF) {
+monitor_printf(mon, "Detect zeroes:%s\n",
+BlockdevDetectZeroesOptions_str(inserted->detect_zeroes));
+}
+
+if (inserted->bps  || inserted->bps_rd  || inserted->bps_wr  ||
+inserted->iops || inserted->iops_rd || inserted->iops_wr)
+{
+monitor_printf(mon, "I/O throttling:   bps=%" PRId64
+" bps_rd=%" PRId64  " bps_wr=%" PRId64
+" bps_max=%" PRId64
+" bps_rd_max=%" PRId64
+" bps_wr_max=%" PRId64
+" iops=%" PRId64 " iops_rd=%" PRId64
+" iops_wr=%" PRId64
+" iops_max=%" PRId64
+" iops_rd_max=%" PRId64
+" iops_wr_max=%" PRId64
+" iops_size=%" PRId64
+" group=%s\n",
+inserted->bps,
+inserted->bps_rd,
+inserted->bps_wr,
+inserted->bps_max,
+inserted->bps_rd_max,
+inserted->bps_wr_max,
+inserted->iops,
+inserted->iops_rd,
+inserted->iops_wr,
+inserted->iops_max,
+inserted->iops_rd_max,
+inserted->iops_wr_max,
+inserted->iops_size,
+inserted->group);
+}
+
+if (verbose) {
+monitor_printf(mon, "\nImages:\n");
+image_info = inserted->image;
+while (1) {
+bdrv_image_info_dump(image_info);
+if (image_info->has_backing_image) {
+image_info = image_info->backing_image;
+} else {
+

[PATCH 7/9] monitor: move remaining hmp_block* functions to blockdev-hmp-cmds.c

2019-11-20 Thread Maxim Levitsky

Signed-off-by: Maxim Levitsky 
---
 blockdev-hmp-cmds.c | 63 
 monitor/hmp-cmds.c  | 64 -
 2 files changed, 63 insertions(+), 64 deletions(-)

diff --git a/blockdev-hmp-cmds.c b/blockdev-hmp-cmds.c
index f3d22c7dd3..76951352b1 100644
--- a/blockdev-hmp-cmds.c
+++ b/blockdev-hmp-cmds.c
@@ -337,3 +337,66 @@ void hmp_snapshot_delete_blkdev_internal(Monitor *mon, const QDict *qdict)
true, name, &err);
 hmp_handle_error(mon, &err);
 }
+
+void hmp_block_resize(Monitor *mon, const QDict *qdict)
+{
+const char *device = qdict_get_str(qdict, "device");
+int64_t size = qdict_get_int(qdict, "size");
+Error *err = NULL;
+
+qmp_block_resize(true, device, false, NULL, size, &err);
+hmp_handle_error(mon, &err);
+}
+
+void hmp_block_stream(Monitor *mon, const QDict *qdict)
+{
+Error *error = NULL;
+const char *device = qdict_get_str(qdict, "device");
+const char *base = qdict_get_try_str(qdict, "base");
+int64_t speed = qdict_get_try_int(qdict, "speed", 0);
+
+qmp_block_stream(true, device, device, base != NULL, base, false, NULL,
+ false, NULL, qdict_haskey(qdict, "speed"), speed, true,
+ BLOCKDEV_ON_ERROR_REPORT, false, false, false, false,
+ &error);
+
+hmp_handle_error(mon, &error);
+}
+
+void hmp_block_passwd(Monitor *mon, const QDict *qdict)
+{
+const char *device = qdict_get_str(qdict, "device");
+const char *password = qdict_get_str(qdict, "password");
+Error *err = NULL;
+
+qmp_block_passwd(true, device, false, NULL, password, &err);
+hmp_handle_error(mon, &err);
+}
+
+void hmp_block_set_io_throttle(Monitor *mon, const QDict *qdict)
+{
+Error *err = NULL;
+char *device = (char *) qdict_get_str(qdict, "device");
+BlockIOThrottle throttle = {
+.bps = qdict_get_int(qdict, "bps"),
+.bps_rd = qdict_get_int(qdict, "bps_rd"),
+.bps_wr = qdict_get_int(qdict, "bps_wr"),
+.iops = qdict_get_int(qdict, "iops"),
+.iops_rd = qdict_get_int(qdict, "iops_rd"),
+.iops_wr = qdict_get_int(qdict, "iops_wr"),
+};
+
+/* qmp_block_set_io_throttle has separate parameters for the
+ * (deprecated) block device name and the qdev ID but the HMP
+ * version has only one, so we must decide which one to pass. */
+if (blk_by_name(device)) {
+throttle.has_device = true;
+throttle.device = device;
+} else {
+throttle.has_id = true;
+throttle.id = device;
+}
+
+qmp_block_set_io_throttle(&throttle, &err);
+hmp_handle_error(mon, &err);
+}
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 2acdcd6e1e..8be48e0af6 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -1309,16 +1309,6 @@ void hmp_set_link(Monitor *mon, const QDict *qdict)
 hmp_handle_error(mon, &err);
 }
 
-void hmp_block_passwd(Monitor *mon, const QDict *qdict)
-{
-const char *device = qdict_get_str(qdict, "device");
-const char *password = qdict_get_str(qdict, "password");
-Error *err = NULL;
-
-qmp_block_passwd(true, device, false, NULL, password, &err);
-hmp_handle_error(mon, &err);
-}
-
 void hmp_balloon(Monitor *mon, const QDict *qdict)
 {
 int64_t value = qdict_get_int(qdict, "value");
@@ -1328,17 +1318,6 @@ void hmp_balloon(Monitor *mon, const QDict *qdict)
 hmp_handle_error(mon, &err);
 }
 
-void hmp_block_resize(Monitor *mon, const QDict *qdict)
-{
-const char *device = qdict_get_str(qdict, "device");
-int64_t size = qdict_get_int(qdict, "size");
-Error *err = NULL;
-
-qmp_block_resize(true, device, false, NULL, size, &err);
-hmp_handle_error(mon, &err);
-}
-
-
 void hmp_loadvm(Monitor *mon, const QDict *qdict)
 {
 int saved_vm_running  = runstate_is_running();
@@ -1887,49 +1866,6 @@ void hmp_change(Monitor *mon, const QDict *qdict)
 hmp_handle_error(mon, &err);
 }
 
-void hmp_block_set_io_throttle(Monitor *mon, const QDict *qdict)
-{
-Error *err = NULL;
-char *device = (char *) qdict_get_str(qdict, "device");
-BlockIOThrottle throttle = {
-.bps = qdict_get_int(qdict, "bps"),
-.bps_rd = qdict_get_int(qdict, "bps_rd"),
-.bps_wr = qdict_get_int(qdict, "bps_wr"),
-.iops = qdict_get_int(qdict, "iops"),
-.iops_rd = qdict_get_int(qdict, "iops_rd"),
-.iops_wr = qdict_get_int(qdict, "iops_wr"),
-};
-
-/* qmp_block_set_io_throttle has separate parameters for the
- * (deprecated) block device name and the qdev ID but the HMP
- * version has only one, so we must decide which one to pass. */
-if (blk_by_name(device)) {
-throttle.has_device = true;
-throttle.device = device;
-} else {
-throttle.has_id = true;
-throttle.id = device;
-}
-
-qmp_block_set_io_throttle(&throttle, &err);
-hmp_handle_error(mon, &er
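The HMP command takes a single name, but the QMP command accepts either a (deprecated) block device name or a qdev ID, so the handler has to pick one. Here is a minimal Python sketch of that choice. `known_backends` is a hypothetical stand-in for the lookup that `blk_by_name()` performs, and the returned dict only mimics the `device`/`id` members of `BlockIOThrottle`:

```python
def throttle_selector(name, known_backends):
    """Pick the QMP selector the way hmp_block_set_io_throttle() does.

    known_backends is a hypothetical set of BlockBackend names; in QEMU
    the check is whether blk_by_name(name) returns non-NULL.
    """
    if name in known_backends:
        # The argument names a BlockBackend: pass it as 'device'.
        return {"device": name}
    # Otherwise assume the argument is a qdev ID.
    return {"id": name}
```

A name that matches no BlockBackend is passed through as an ID, so a lookup failure is then reported by the QMP command itself.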

[PATCH 1/9] monitor: uninline add_init_drive

2019-11-20 Thread Maxim Levitsky

add_init_drive is only used by hmp_drive_add, so merge it into its caller.
The code is a bit shorter this way.

No functional changes.

Signed-off-by: Maxim Levitsky 
---
 device-hotplug.c | 33 +
 1 file changed, 13 insertions(+), 20 deletions(-)

diff --git a/device-hotplug.c b/device-hotplug.c
index f01d53774b..5ce73f0cff 100644
--- a/device-hotplug.c
+++ b/device-hotplug.c
@@ -34,42 +34,35 @@
 #include "monitor/monitor.h"
 #include "block/block_int.h"
 
-static DriveInfo *add_init_drive(const char *optstr)
+
+void hmp_drive_add(Monitor *mon, const QDict *qdict)
 {
 Error *err = NULL;
-DriveInfo *dinfo;
+DriveInfo *dinfo = NULL;
 QemuOpts *opts;
 MachineClass *mc;
+const char *optstr = qdict_get_str(qdict, "opts");
+bool node = qdict_get_try_bool(qdict, "node", false);
+
+if (node) {
+hmp_drive_add_node(mon, optstr);
+return;
+}
 
 opts = drive_def(optstr);
 if (!opts)
-return NULL;
+return;
 
 mc = MACHINE_GET_CLASS(current_machine);
 dinfo = drive_new(opts, mc->block_default_type, &err);
 if (err) {
 error_report_err(err);
 qemu_opts_del(opts);
-return NULL;
-}
-
-return dinfo;
-}
-
-void hmp_drive_add(Monitor *mon, const QDict *qdict)
-{
-DriveInfo *dinfo = NULL;
-const char *opts = qdict_get_str(qdict, "opts");
-bool node = qdict_get_try_bool(qdict, "node", false);
-
-if (node) {
-hmp_drive_add_node(mon, opts);
-return;
+goto err;
 }
 
-dinfo = add_init_drive(opts);
 if (!dinfo) {
-goto err;
+return;
 }
 
 switch (dinfo->type) {
-- 
2.17.2

[PATCH 5/9] monitor: move hmp_block_job* to blockdev-hmp-cmds.c

2019-11-20 Thread Maxim Levitsky

Signed-off-by: Maxim Levitsky 
---
 blockdev-hmp-cmds.c | 52 +
 monitor/hmp-cmds.c  | 52 -
 2 files changed, 52 insertions(+), 52 deletions(-)

diff --git a/blockdev-hmp-cmds.c b/blockdev-hmp-cmds.c
index 5ae899a324..e333de27b1 100644
--- a/blockdev-hmp-cmds.c
+++ b/blockdev-hmp-cmds.c
@@ -238,3 +238,55 @@ void hmp_drive_backup(Monitor *mon, const QDict *qdict)
 hmp_handle_error(mon, &err);
 }
 
+
+void hmp_block_job_set_speed(Monitor *mon, const QDict *qdict)
+{
+Error *error = NULL;
+const char *device = qdict_get_str(qdict, "device");
+int64_t value = qdict_get_int(qdict, "speed");
+
+qmp_block_job_set_speed(device, value, &error);
+
+hmp_handle_error(mon, &error);
+}
+
+void hmp_block_job_cancel(Monitor *mon, const QDict *qdict)
+{
+Error *error = NULL;
+const char *device = qdict_get_str(qdict, "device");
+bool force = qdict_get_try_bool(qdict, "force", false);
+
+qmp_block_job_cancel(device, true, force, &error);
+
+hmp_handle_error(mon, &error);
+}
+
+void hmp_block_job_pause(Monitor *mon, const QDict *qdict)
+{
+Error *error = NULL;
+const char *device = qdict_get_str(qdict, "device");
+
+qmp_block_job_pause(device, &error);
+
+hmp_handle_error(mon, &error);
+}
+
+void hmp_block_job_resume(Monitor *mon, const QDict *qdict)
+{
+Error *error = NULL;
+const char *device = qdict_get_str(qdict, "device");
+
+qmp_block_job_resume(device, &error);
+
+hmp_handle_error(mon, &error);
+}
+
+void hmp_block_job_complete(Monitor *mon, const QDict *qdict)
+{
+Error *error = NULL;
+const char *device = qdict_get_str(qdict, "device");
+
+qmp_block_job_complete(device, &error);
+
+hmp_handle_error(mon, &error);
+}
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index aa94a15d74..326276cced 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -1976,58 +1976,6 @@ void hmp_block_stream(Monitor *mon, const QDict *qdict)
 hmp_handle_error(mon, &error);
 }
 
-void hmp_block_job_set_speed(Monitor *mon, const QDict *qdict)
-{
-Error *error = NULL;
-const char *device = qdict_get_str(qdict, "device");
-int64_t value = qdict_get_int(qdict, "speed");
-
-qmp_block_job_set_speed(device, value, &error);
-
-hmp_handle_error(mon, &error);
-}
-
-void hmp_block_job_cancel(Monitor *mon, const QDict *qdict)
-{
-Error *error = NULL;
-const char *device = qdict_get_str(qdict, "device");
-bool force = qdict_get_try_bool(qdict, "force", false);
-
-qmp_block_job_cancel(device, true, force, &error);
-
-hmp_handle_error(mon, &error);
-}
-
-void hmp_block_job_pause(Monitor *mon, const QDict *qdict)
-{
-Error *error = NULL;
-const char *device = qdict_get_str(qdict, "device");
-
-qmp_block_job_pause(device, &error);
-
-hmp_handle_error(mon, &error);
-}
-
-void hmp_block_job_resume(Monitor *mon, const QDict *qdict)
-{
-Error *error = NULL;
-const char *device = qdict_get_str(qdict, "device");
-
-qmp_block_job_resume(device, &error);
-
-hmp_handle_error(mon, &error);
-}
-
-void hmp_block_job_complete(Monitor *mon, const QDict *qdict)
-{
-Error *error = NULL;
-const char *device = qdict_get_str(qdict, "device");
-
-qmp_block_job_complete(device, &error);
-
-hmp_handle_error(mon, &error);
-}
-
 typedef struct HMPMigrationStatus
 {
 QEMUTimer *timer;
-- 
2.17.2

[PATCH 4/9] monitor: move hmp_drive_mirror and hmp_drive_backup to blockdev-hmp-cmds.c

2019-11-20 Thread Maxim Levitsky

Signed-off-by: Maxim Levitsky 
---
 blockdev-hmp-cmds.c | 61 +
 monitor/hmp-cmds.c  | 58 --
 2 files changed, 61 insertions(+), 58 deletions(-)

diff --git a/blockdev-hmp-cmds.c b/blockdev-hmp-cmds.c
index 8884618238..5ae899a324 100644
--- a/blockdev-hmp-cmds.c
+++ b/blockdev-hmp-cmds.c
@@ -34,6 +34,8 @@
 #include "monitor/monitor.h"
 #include "block/block_int.h"
 #include "qapi/qapi-commands-block.h"
+#include "qapi/qmp/qerror.h"
+#include "monitor/hmp.h"
 
 void hmp_drive_add(Monitor *mon, const QDict *qdict)
 {
@@ -177,3 +179,62 @@ void hmp_commit(Monitor *mon, const QDict *qdict)
 error_report("'commit' error for '%s': %s", device, strerror(-ret));
 }
 }
+
+void hmp_drive_mirror(Monitor *mon, const QDict *qdict)
+{
+const char *filename = qdict_get_str(qdict, "target");
+const char *format = qdict_get_try_str(qdict, "format");
+bool reuse = qdict_get_try_bool(qdict, "reuse", false);
+bool full = qdict_get_try_bool(qdict, "full", false);
+Error *err = NULL;
+DriveMirror mirror = {
+.device = (char *)qdict_get_str(qdict, "device"),
+.target = (char *)filename,
+.has_format = !!format,
+.format = (char *)format,
+.sync = full ? MIRROR_SYNC_MODE_FULL : MIRROR_SYNC_MODE_TOP,
+.has_mode = true,
+.mode = reuse ? NEW_IMAGE_MODE_EXISTING : NEW_IMAGE_MODE_ABSOLUTE_PATHS,
+.unmap = true,
+};
+
+if (!filename) {
+error_setg(&err, QERR_MISSING_PARAMETER, "target");
+hmp_handle_error(mon, &err);
+return;
+}
+qmp_drive_mirror(&mirror, &err);
+hmp_handle_error(mon, &err);
+}
+
+void hmp_drive_backup(Monitor *mon, const QDict *qdict)
+{
+const char *device = qdict_get_str(qdict, "device");
+const char *filename = qdict_get_str(qdict, "target");
+const char *format = qdict_get_try_str(qdict, "format");
+bool reuse = qdict_get_try_bool(qdict, "reuse", false);
+bool full = qdict_get_try_bool(qdict, "full", false);
+bool compress = qdict_get_try_bool(qdict, "compress", false);
+Error *err = NULL;
+DriveBackup backup = {
+.device = (char *)device,
+.target = (char *)filename,
+.has_format = !!format,
+.format = (char *)format,
+.sync = full ? MIRROR_SYNC_MODE_FULL : MIRROR_SYNC_MODE_TOP,
+.has_mode = true,
+.mode = reuse ? NEW_IMAGE_MODE_EXISTING : NEW_IMAGE_MODE_ABSOLUTE_PATHS,
+.has_compress = !!compress,
+.compress = compress,
+};
+
+if (!filename) {
+error_setg(&err, QERR_MISSING_PARAMETER, "target");
+hmp_handle_error(mon, &err);
+return;
+}
+
+qmp_drive_backup(&backup, &err);
+hmp_handle_error(mon, &err);
+}
+
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index b2551c16d1..aa94a15d74 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -1338,64 +1338,6 @@ void hmp_block_resize(Monitor *mon, const QDict *qdict)
 hmp_handle_error(mon, &err);
 }
 
-void hmp_drive_mirror(Monitor *mon, const QDict *qdict)
-{
-const char *filename = qdict_get_str(qdict, "target");
-const char *format = qdict_get_try_str(qdict, "format");
-bool reuse = qdict_get_try_bool(qdict, "reuse", false);
-bool full = qdict_get_try_bool(qdict, "full", false);
-Error *err = NULL;
-DriveMirror mirror = {
-.device = (char *)qdict_get_str(qdict, "device"),
-.target = (char *)filename,
-.has_format = !!format,
-.format = (char *)format,
-.sync = full ? MIRROR_SYNC_MODE_FULL : MIRROR_SYNC_MODE_TOP,
-.has_mode = true,
-.mode = reuse ? NEW_IMAGE_MODE_EXISTING : NEW_IMAGE_MODE_ABSOLUTE_PATHS,
-.unmap = true,
-};
-
-if (!filename) {
-error_setg(&err, QERR_MISSING_PARAMETER, "target");
-hmp_handle_error(mon, &err);
-return;
-}
-qmp_drive_mirror(&mirror, &err);
-hmp_handle_error(mon, &err);
-}
-
-void hmp_drive_backup(Monitor *mon, const QDict *qdict)
-{
-const char *device = qdict_get_str(qdict, "device");
-const char *filename = qdict_get_str(qdict, "target");
-const char *format = qdict_get_try_str(qdict, "format");
-bool reuse = qdict_get_try_bool(qdict, "reuse", false);
-bool full = qdict_get_try_bool(qdict, "full", false);
-bool compress = qdict_get_try_bool(qdict, "compress", false);
-Error *err = NULL;
-DriveBackup backup = {
-.device = (char *)device,
-.target = (char *)filename,
-.has_format = !!format,
-.format = (char *)format,
-.sync = full ? MIRROR_SYNC_MODE_FULL : MIRROR_SYNC_MODE_TOP,
-.has_mode = true,
-.mode = reuse ? NEW_IMAGE_MODE_EXISTING : NEW_IMAGE_MODE_ABSOLUTE_PATHS,
-.has_compress = !!compress,
-.compress = compress,
-};
-
-if (!filename) {
-error_setg(&err, QERR_MISSING_PARAMETER, "target");
-
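`hmp_drive_mirror` is a thin translation layer. The `reuse` and `full` options become the `mode` and `sync` members of `DriveMirror`, `format` is sent only when given (`has_format = !!format`), and `unmap` is always set. Below is a rough Python sketch of the equivalent QMP `drive-mirror` arguments. It assumes the usual QAPI wire spellings (`top`/`full`, `existing`/`absolute-paths`), and the function name is hypothetical:

```python
def hmp_drive_mirror_args(device, target, fmt=None, reuse=False, full=False):
    """Build QMP drive-mirror arguments the way hmp_drive_mirror() fills DriveMirror."""
    if not target:
        # Mirrors the QERR_MISSING_PARAMETER check in the C handler.
        raise ValueError("Parameter 'target' is missing")
    args = {
        "device": device,
        "target": target,
        "sync": "full" if full else "top",
        "mode": "existing" if reuse else "absolute-paths",
        "unmap": True,
    }
    if fmt is not None:
        # Optional member: only present when the user supplied a format.
        args["format"] = fmt
    return args
```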

[PATCH 6/9] monitor: move hmp_snapshot_* to blockdev-hmp-cmds.c

2019-11-20 Thread Maxim Levitsky

Signed-off-by: Maxim Levitsky 
---
 blockdev-hmp-cmds.c | 47 +
 monitor/hmp-cmds.c  | 46 
 2 files changed, 47 insertions(+), 46 deletions(-)

diff --git a/blockdev-hmp-cmds.c b/blockdev-hmp-cmds.c
index e333de27b1..f3d22c7dd3 100644
--- a/blockdev-hmp-cmds.c
+++ b/blockdev-hmp-cmds.c
@@ -290,3 +290,50 @@ void hmp_block_job_complete(Monitor *mon, const QDict *qdict)
 
 hmp_handle_error(mon, &error);
 }
+
+void hmp_snapshot_blkdev(Monitor *mon, const QDict *qdict)
+{
+const char *device = qdict_get_str(qdict, "device");
+const char *filename = qdict_get_try_str(qdict, "snapshot-file");
+const char *format = qdict_get_try_str(qdict, "format");
+bool reuse = qdict_get_try_bool(qdict, "reuse", false);
+enum NewImageMode mode;
+Error *err = NULL;
+
+if (!filename) {
+/* In the future, if 'snapshot-file' is not specified, the snapshot
+   will be taken internally. Today it's actually required. */
+error_setg(&err, QERR_MISSING_PARAMETER, "snapshot-file");
+hmp_handle_error(mon, &err);
+return;
+}
+
+mode = reuse ? NEW_IMAGE_MODE_EXISTING : NEW_IMAGE_MODE_ABSOLUTE_PATHS;
+qmp_blockdev_snapshot_sync(true, device, false, NULL,
+   filename, false, NULL,
+   !!format, format,
+   true, mode, &err);
+hmp_handle_error(mon, &err);
+}
+
+void hmp_snapshot_blkdev_internal(Monitor *mon, const QDict *qdict)
+{
+const char *device = qdict_get_str(qdict, "device");
+const char *name = qdict_get_str(qdict, "name");
+Error *err = NULL;
+
+qmp_blockdev_snapshot_internal_sync(device, name, &err);
+hmp_handle_error(mon, &err);
+}
+
+void hmp_snapshot_delete_blkdev_internal(Monitor *mon, const QDict *qdict)
+{
+const char *device = qdict_get_str(qdict, "device");
+const char *name = qdict_get_str(qdict, "name");
+const char *id = qdict_get_try_str(qdict, "id");
+Error *err = NULL;
+
+qmp_blockdev_snapshot_delete_internal_sync(device, !!id, id,
+   true, name, &err);
+hmp_handle_error(mon, &err);
+}
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 326276cced..2acdcd6e1e 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -1338,52 +1338,6 @@ void hmp_block_resize(Monitor *mon, const QDict *qdict)
 hmp_handle_error(mon, &err);
 }
 
-void hmp_snapshot_blkdev(Monitor *mon, const QDict *qdict)
-{
-const char *device = qdict_get_str(qdict, "device");
-const char *filename = qdict_get_try_str(qdict, "snapshot-file");
-const char *format = qdict_get_try_str(qdict, "format");
-bool reuse = qdict_get_try_bool(qdict, "reuse", false);
-enum NewImageMode mode;
-Error *err = NULL;
-
-if (!filename) {
-/* In the future, if 'snapshot-file' is not specified, the snapshot
-   will be taken internally. Today it's actually required. */
-error_setg(&err, QERR_MISSING_PARAMETER, "snapshot-file");
-hmp_handle_error(mon, &err);
-return;
-}
-
-mode = reuse ? NEW_IMAGE_MODE_EXISTING : NEW_IMAGE_MODE_ABSOLUTE_PATHS;
-qmp_blockdev_snapshot_sync(true, device, false, NULL,
-   filename, false, NULL,
-   !!format, format,
-   true, mode, &err);
-hmp_handle_error(mon, &err);
-}
-
-void hmp_snapshot_blkdev_internal(Monitor *mon, const QDict *qdict)
-{
-const char *device = qdict_get_str(qdict, "device");
-const char *name = qdict_get_str(qdict, "name");
-Error *err = NULL;
-
-qmp_blockdev_snapshot_internal_sync(device, name, &err);
-hmp_handle_error(mon, &err);
-}
-
-void hmp_snapshot_delete_blkdev_internal(Monitor *mon, const QDict *qdict)
-{
-const char *device = qdict_get_str(qdict, "device");
-const char *name = qdict_get_str(qdict, "name");
-const char *id = qdict_get_try_str(qdict, "id");
-Error *err = NULL;
-
-qmp_blockdev_snapshot_delete_internal_sync(device, !!id, id,
-   true, name, &err);
-hmp_handle_error(mon, &err);
-}
 
 void hmp_loadvm(Monitor *mon, const QDict *qdict)
 {
-- 
2.17.2

[PATCH 3/9] monitor: move hmp_drive_del and hmp_commit to blockdev-hmp-cmds.c

2019-11-20 Thread Maxim Levitsky

Signed-off-by: Maxim Levitsky 
---
 blockdev-hmp-cmds.c | 97 -
 blockdev.c  | 95 
 2 files changed, 96 insertions(+), 96 deletions(-)

diff --git a/blockdev-hmp-cmds.c b/blockdev-hmp-cmds.c
index 21ff6fa9a9..8884618238 100644
--- a/blockdev-hmp-cmds.c
+++ b/blockdev-hmp-cmds.c
@@ -33,7 +33,7 @@
 #include "sysemu/sysemu.h"
 #include "monitor/monitor.h"
 #include "block/block_int.h"
-
+#include "qapi/qapi-commands-block.h"
 
 void hmp_drive_add(Monitor *mon, const QDict *qdict)
 {
@@ -82,3 +82,98 @@ err:
 blk_unref(blk);
 }
 }
+
+void hmp_drive_del(Monitor *mon, const QDict *qdict)
+{
+const char *id = qdict_get_str(qdict, "id");
+BlockBackend *blk;
+BlockDriverState *bs;
+AioContext *aio_context;
+Error *local_err = NULL;
+
+bs = bdrv_find_node(id);
+if (bs) {
+qmp_blockdev_del(id, &local_err);
+if (local_err) {
+error_report_err(local_err);
+}
+return;
+}
+
+blk = blk_by_name(id);
+if (!blk) {
+error_report("Device '%s' not found", id);
+return;
+}
+
+if (!blk_legacy_dinfo(blk)) {
+error_report("Deleting device added with blockdev-add"
+ " is not supported");
+return;
+}
+
+aio_context = blk_get_aio_context(blk);
+aio_context_acquire(aio_context);
+
+bs = blk_bs(blk);
+if (bs) {
+if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_DRIVE_DEL, &local_err)) {
+error_report_err(local_err);
+aio_context_release(aio_context);
+return;
+}
+
+blk_remove_bs(blk);
+}
+
+/* Make the BlockBackend and the attached BlockDriverState anonymous */
+monitor_remove_blk(blk);
+
+/* If this BlockBackend has a device attached to it, its refcount will be
+ * decremented when the device is removed; otherwise we have to do so here.
+ */
+if (blk_get_attached_dev(blk)) {
+/* Further I/O must not pause the guest */
+blk_set_on_error(blk, BLOCKDEV_ON_ERROR_REPORT,
+ BLOCKDEV_ON_ERROR_REPORT);
+} else {
+blk_unref(blk);
+}
+
+aio_context_release(aio_context);
+}
+
+void hmp_commit(Monitor *mon, const QDict *qdict)
+{
+const char *device = qdict_get_str(qdict, "device");
+BlockBackend *blk;
+int ret;
+
+if (!strcmp(device, "all")) {
+ret = blk_commit_all();
+} else {
+BlockDriverState *bs;
+AioContext *aio_context;
+
+blk = blk_by_name(device);
+if (!blk) {
+error_report("Device '%s' not found", device);
+return;
+}
+if (!blk_is_available(blk)) {
+error_report("Device '%s' has no medium", device);
+return;
+}
+
+bs = blk_bs(blk);
+aio_context = bdrv_get_aio_context(bs);
+aio_context_acquire(aio_context);
+
+ret = bdrv_commit(bs);
+
+aio_context_release(aio_context);
+}
+if (ret < 0) {
+error_report("'commit' error for '%s': %s", device, strerror(-ret));
+}
+}
diff --git a/blockdev.c b/blockdev.c
index 8e029e9c01..df43e0aaef 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1074,41 +1074,6 @@ static BlockBackend *qmp_get_blk(const char *blk_name, const char *qdev_id,
 return blk;
 }
 
-void hmp_commit(Monitor *mon, const QDict *qdict)
-{
-const char *device = qdict_get_str(qdict, "device");
-BlockBackend *blk;
-int ret;
-
-if (!strcmp(device, "all")) {
-ret = blk_commit_all();
-} else {
-BlockDriverState *bs;
-AioContext *aio_context;
-
-blk = blk_by_name(device);
-if (!blk) {
-error_report("Device '%s' not found", device);
-return;
-}
-if (!blk_is_available(blk)) {
-error_report("Device '%s' has no medium", device);
-return;
-}
-
-bs = blk_bs(blk);
-aio_context = bdrv_get_aio_context(bs);
-aio_context_acquire(aio_context);
-
-ret = bdrv_commit(bs);
-
-aio_context_release(aio_context);
-}
-if (ret < 0) {
-error_report("'commit' error for '%s': %s", device, strerror(-ret));
-}
-}
-
 static void blockdev_do_action(TransactionAction *action, Error **errp)
 {
 TransactionActionList list;
@@ -3101,66 +3066,6 @@ BlockDirtyBitmapSha256 *qmp_x_debug_block_dirty_bitmap_sha256(const char *node,
 return ret;
 }
 
-void hmp_drive_del(Monitor *mon, const QDict *qdict)
-{
-const char *id = qdict_get_str(qdict, "id");
-BlockBackend *blk;
-BlockDriverState *bs;
-AioContext *aio_context;
-Error *local_err = NULL;
-
-bs = bdrv_find_node(id);
-if (bs) {
-qmp_blockdev_del(id, &local_err);
-if (local_err) {
-error_report_err(local_err);
-}
-return;
-}
-
-blk = blk_by_name(id);
-if (!blk)
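`hmp_commit`, which this patch moves unchanged, follows the block layer's negative-errno convention. On failure, `bdrv_commit()` and `blk_commit_all()` return a negative value, and the handler negates it again before passing it to `strerror()`. Here is a small Python sketch of the resulting message. `commit_error_message` is a hypothetical helper used only for illustration:

```python
import errno
import os


def commit_error_message(device, ret):
    """Format a commit failure like hmp_commit(); ret is 0 or a negative errno."""
    if ret >= 0:
        return None  # success: hmp_commit() prints nothing
    return "'commit' error for '%s': %s" % (device, os.strerror(-ret))


example = commit_error_message("drive0", -errno.EBUSY)
```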

[PATCH 2/9] monitor: rename device-hotplug.c to blockdev-hmp-cmds.c

2019-11-20 Thread Maxim Levitsky

These days device-hotplug.c contains only hmp_drive_add.
In the next patch, the rest of the hmp_drive* functions will be
moved there.

Signed-off-by: Maxim Levitsky 
---
 MAINTAINERS | 1 +
 Makefile.objs   | 4 ++--
 device-hotplug.c => blockdev-hmp-cmds.c | 2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)
 rename device-hotplug.c => blockdev-hmp-cmds.c (98%)

diff --git a/MAINTAINERS b/MAINTAINERS
index dfb7932608..658c38edf4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1855,6 +1855,7 @@ Block QAPI, monitor, command line
 M: Markus Armbruster 
 S: Supported
 F: blockdev.c
+F: blockdev-hmp-cmds.c
 F: block/qapi.c
 F: qapi/block*.json
 F: qapi/transaction.json
diff --git a/Makefile.objs b/Makefile.objs
index 11ba1a36bd..cb33c1873c 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -43,13 +43,13 @@ io-obj-y = io/
 # single QEMU executable should support all CPUs and machines.
 
 ifeq ($(CONFIG_SOFTMMU),y)
-common-obj-y = blockdev.o blockdev-nbd.o block/
+common-obj-y = blockdev.o blockdev-nbd.o blockdev-hmp-cmds.o block/
 common-obj-y += bootdevice.o iothread.o
 common-obj-y += dump/
 common-obj-y += job-qmp.o
 common-obj-y += monitor/
 common-obj-y += net/
-common-obj-y += qdev-monitor.o device-hotplug.o
+common-obj-y += qdev-monitor.o
 common-obj-$(CONFIG_WIN32) += os-win32.o
 common-obj-$(CONFIG_POSIX) += os-posix.o
 
diff --git a/device-hotplug.c b/blockdev-hmp-cmds.c
similarity index 98%
rename from device-hotplug.c
rename to blockdev-hmp-cmds.c
index 5ce73f0cff..21ff6fa9a9 100644
--- a/device-hotplug.c
+++ b/blockdev-hmp-cmds.c
@@ -1,5 +1,5 @@
 /*
- * QEMU device hotplug helpers
+ * Blockdev HMP commands
  *
  * Copyright (c) 2004 Fabrice Bellard
  *
-- 
2.17.2

[PATCH 0/9] RFC: [for 5.0]: HMP monitor handlers cleanups

2019-11-20 Thread Maxim Levitsky

This patch series is a bunch of cleanups
to the HMP monitor code.

This series only touches blockdev-related HMP handlers.

No functional changes are expected, other than
slight error message changes in the last patch.

This was inspired by this bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=1719169

Basically, some users still parse HMP error messages,
and they would like them to be prefixed with 'Error:'.

In commit 66363e9a43f649360a3f74d2805c9f864da027eb we added
hmp_handle_error, which does exactly that, but some HMP handlers
don't use it.
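A uniform prefix reduces failure detection to a simple string check on the command output, for example on the string returned by QMP's `human-monitor-command`. Below is a minimal sketch of such a client-side check. `hmp_output_is_error` is hypothetical, and it only works if every failing handler reports through `hmp_handle_error`, which is the goal of this series:

```python
def hmp_output_is_error(output):
    """Return True if any line of HMP output starts with 'Error:'."""
    return any(line.startswith("Error:") for line in output.splitlines())
```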

In this patch series, I moved all the block-related HMP handlers
into blockdev-hmp-cmds.c and then made them use this function
to report errors.

I hope I didn't change too much code. Since I was touching this
code anyway, I felt I could also make these handlers, which were
scattered over 3 different files, easier to find.

Best regards,
Maxim Levitsky

Maxim Levitsky (9):
  monitor: uninline add_init_drive
  monitor: rename device-hotplug.c to blockdev-hmp-cmds.c
  monitor: move hmp_drive_del and hmp_commit to blockdev-hmp-cmds.c
  monitor: move hmp_drive_mirror and hmp_drive_backup to
blockdev-hmp-cmds.c
  monitor: move hmp_block_job* to blockdev-hmp-cmds.c
  monitor: move hmp_snapshot_* to blockdev-hmp-cmds.c
  monitor: move remaining hmp_block* functions to blockdev-hmp-cmds.c
  monitor: move hmp_info_block* to blockdev-hmp-cmds.c
  monitor/hmp: Prefer to use hmp_handle_error for error reporting in
block hmp commands

 MAINTAINERS |   1 +
 Makefile.objs   |   4 +-
 blockdev-hmp-cmds.c | 656 
 blockdev.c  |  95 ---
 device-hotplug.c|  91 --
 monitor/hmp-cmds.c  | 465 ---
 6 files changed, 659 insertions(+), 653 deletions(-)
 create mode 100644 blockdev-hmp-cmds.c
 delete mode 100644 device-hotplug.c

-- 
2.17.2

Re: [PATCH v3 07/33] serial: register vmsd with DeviceClass

2019-11-20 Thread Dr. David Alan Gilbert

* Marc-André Lureau (marcandre.lur...@gmail.com) wrote:
> On Tue, Nov 19, 2019 at 2:35 PM Peter Maydell wrote:
> >
> > On Tue, 19 Nov 2019 at 10:23, Marc-André Lureau wrote:
> > > On Mon, Nov 18, 2019 at 6:22 PM Peter Maydell wrote:
> > > > Did you test whether migration still works from a QEMU
> > > > version without this patch to one with it? (The migration
> > >
> > > Yes, I thought I did test correctly, but I realized testing with x86
> > > isn't correct.
> > >
> > > So with arm/musicpal for ex, I can migrate from before->after, however
> > > after->before won't work. Is that ok?
> >
> > Broadly speaking, the only case where we care about not
> > breaking cross-version migration is where we have a versioned
> > machine type. So musicpal doesn't matter too much. Beyond
> > that, yes, generally before->after is more important than
> > after->before. I have a feeling Red Hat downstream cares about
> > after->before migration at least for x86 but you or your colleagues
> > would know that better than me :-)
> >
> > > > vmstate code is too complicated for me to be able to figure
> > > > out whether passing the 'dev' pointer makes a difference
> > > > to what it names the state sections and whether the
> > > > 'qdev_set_legacy_instance_id' suffices to avoid problems.)
> > >
> > > I don't see a way to fix after->before, because the instance id is
> > > initially 0 with the new code, and the old code expects a different
> > > value.
> >
> > Can you explain how the instance ID stuff works? I was
> > expecting that the result of setting the legacy instance ID
> > would just be that the new version would always have
> > the older setting, so if it works for old->new it would also
> > work for new->old. But as I say I don't understand this bit
> > of the migration code.
> 
> From what I understand, the alias_id is only used in
> savevm.c:find_se(), and thus can only be used to match against
> "legacy" instance id values. On new code, instance_id is generated
> incrementally from 0 with calculate_new_instance_id(), based on
> "qdev-path/vmsd-name".

I think there are cases where no qdev path is viable;
e.g. for ISA devices, the ID is set to the ISA I/O base:

hw/char/serial-isa.c
79:qdev_set_legacy_instance_id(dev, isa->iobase, 3);

(in serial_isa_realizefn)

but to be honest I'd have to trace this out and see what values the
devices are actually using to be sure.

(And yes, please don't break backwards migration; otherwise I'll
end up having to figure out a fix).

Dave

> 
> 
> -- 
> Marc-André Lureau
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
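Here is a simplified Python model of the matching described in this thread. It considers only the instance-ID part. `find_se()` is assumed to accept a section whose name matches and whose incoming ID equals either the entry's `instance_id` or its `alias_id`, and `calculate_new_instance_id()` counts up from 0. The section name `serial` and the I/O base 0x3f8 are illustrative. The qdev-path naming and compat handling in savevm.c are left out:

```python
class SaveStateEntry:
    """Simplified model of savevm.c's SaveStateEntry (matching fields only)."""

    def __init__(self, idstr, instance_id, alias_id=-1):
        self.idstr = idstr
        self.instance_id = instance_id
        self.alias_id = alias_id


def calculate_new_instance_id(handlers, idstr):
    """Next free instance ID for idstr, counting up from 0."""
    instance_id = 0
    for se in handlers:
        if se.idstr == idstr and instance_id <= se.instance_id:
            instance_id = se.instance_id + 1
    return instance_id


def find_se(handlers, idstr, instance_id):
    """Accept a section if the name matches and the ID is instance_id or alias_id."""
    for se in handlers:
        if se.idstr == idstr and instance_id in (se.instance_id, se.alias_id):
            return se
    return None


# New QEMU: ID allocated from 0, the legacy ISA I/O base kept only as an alias.
new_side = []
new_id = calculate_new_instance_id(new_side, "serial")
new_side.append(SaveStateEntry("serial", new_id, alias_id=0x3f8))

# Old QEMU: the legacy value is the instance ID itself, with no alias.
old_side = [SaveStateEntry("serial", 0x3f8)]

old_to_new = find_se(new_side, "serial", 0x3f8)  # matches via alias_id
new_to_old = find_se(old_side, "serial", 0)      # nothing matches ID 0
```

In this model the asymmetry follows directly. The alias lets the new code accept the legacy ID, but nothing on the old side knows about ID 0.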

[PATCH v2 6/6] iotests: Test committing to short backing file

2019-11-20 Thread Kevin Wolf

Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/274| 141 +
 tests/qemu-iotests/274.out| 227 ++
 tests/qemu-iotests/group  |   1 +
 tests/qemu-iotests/iotests.py |   2 +-
 4 files changed, 370 insertions(+), 1 deletion(-)
 create mode 100755 tests/qemu-iotests/274
 create mode 100644 tests/qemu-iotests/274.out

diff --git a/tests/qemu-iotests/274 b/tests/qemu-iotests/274
new file mode 100755
index 0000000000..0f2ef87327
--- /dev/null
+++ b/tests/qemu-iotests/274
@@ -0,0 +1,141 @@
+#!/usr/bin/env python
+#
+# Copyright (C) 2019 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+# Creator/Owner: Kevin Wolf 
+#
+# Some tests for short backing files and short overlays
+
+import iotests
+import os
+
+iotests.verify_image_format(supported_fmts=['qcow2'])
+iotests.verify_platform(['linux'])
+
+size_short = 1 * 1024 * 1024
+size_long = 2 * 1024 * 1024
+size_diff = size_long - size_short
+
+def create_chain():
+    iotests.qemu_img_log('create', '-f', iotests.imgfmt, base,
+                         str(size_long))
+    iotests.qemu_img_log('create', '-f', iotests.imgfmt, '-b', base, mid,
+                         str(size_short))
+    iotests.qemu_img_log('create', '-f', iotests.imgfmt, '-b', mid, top,
+                         str(size_long))
+
+    iotests.qemu_io_log('-c', 'write -P 1 0 %d' % size_long, base)
+
+def create_vm():
+    vm = iotests.VM()
+    vm.add_blockdev('file,filename=%s,node-name=base-file' % (base))
+    vm.add_blockdev('%s,file=base-file,node-name=base' % (iotests.imgfmt))
+    vm.add_blockdev('file,filename=%s,node-name=mid-file' % (mid))
+    vm.add_blockdev('%s,file=mid-file,node-name=mid,backing=base' % (iotests.imgfmt))
+    vm.add_drive(top, 'backing=mid,node-name=top')
+    return vm
+
+with iotests.FilePath('base') as base, \
+     iotests.FilePath('mid') as mid, \
+     iotests.FilePath('top') as top:
+
+    iotests.log('== Commit tests ==')
+
+    create_chain()
+
+    iotests.log('=== Check visible data ===')
+
+    iotests.qemu_io_log('-c', 'read -P 1 0 %d' % size_short, top)
+    iotests.qemu_io_log('-c', 'read -P 0 %d %d' % (size_short, size_diff), top)
+
+    iotests.log('=== Checking allocation status ===')
+
+    iotests.qemu_io_log('-c', 'alloc 0 %d' % size_short,
+                        '-c', 'alloc %d %d' % (size_short, size_diff),
+                        base)
+
+    iotests.qemu_io_log('-c', 'alloc 0 %d' % size_short,
+                        '-c', 'alloc %d %d' % (size_short, size_diff),
+                        mid)
+
+    iotests.qemu_io_log('-c', 'alloc 0 %d' % size_short,
+                        '-c', 'alloc %d %d' % (size_short, size_diff),
+                        top)
+
+    iotests.log('=== Checking map ===')
+
+    iotests.qemu_img_log('map', '--output=json', base)
+    iotests.qemu_img_log('map', '--output=human', base)
+    iotests.qemu_img_log('map', '--output=json', mid)
+    iotests.qemu_img_log('map', '--output=human', mid)
+    iotests.qemu_img_log('map', '--output=json', top)
+    iotests.qemu_img_log('map', '--output=human', top)
+
+    iotests.log('=== Testing qemu-img commit (top -> mid) ===')
+
+    iotests.qemu_img_log('commit', top)
+    iotests.img_info_log(mid)
+    iotests.qemu_io_log('-c', 'read -P 1 0 %d' % size_short, mid)
+    iotests.qemu_io_log('-c', 'read -P 0 %d %d' % (size_short, size_diff), mid)
+
+    iotests.log('=== Testing HMP commit (top -> mid) ===')
+
+    create_chain()
+    with create_vm() as vm:
+        vm.launch()
+        vm.qmp_log('human-monitor-command', command_line='commit drive0')
+
+    iotests.img_info_log(mid)
+    iotests.qemu_io_log('-c', 'read -P 1 0 %d' % size_short, mid)
+    iotests.qemu_io_log('-c', 'read -P 0 %d %d' % (size_short, size_diff), mid)
+
+    iotests.log('=== Testing QMP active commit (top -> mid) ===')
+
+    create_chain()
+    with create_vm() as vm:
+        vm.launch()
+        vm.qmp_log('block-commit', device='top', base_node='mid',
+                   job_id='job0', auto_dismiss=False)
+        vm.run_job('job0', wait=5)
+
+    iotests.img_info_log(mid)
+    iotests.qemu_io_log('-c', 'read -P 1 0 %d' % size_short, mid)
+    iotests.qemu_io_log('-c', 'read -P 0 %d %d' % (size_short, size_diff), mid)
+
+
+    iotests.log('== Resize tests ==')
+
+    for prealloc in ['off', 'metadata', 'falloc', 'full']:
+
+

Re: [PATCH] target/i386: add VMX features to named CPU models

2019-11-20 Thread Eduardo Habkost

On Wed, Nov 20, 2019 at 06:37:53PM +0100, Paolo Bonzini wrote:
> This allows using "-cpu Haswell,+vmx", which we did not really want to
> support in QEMU but was produced by Libvirt when using the "host-model"
> CPU model.

I understand guest ABI compatibility is not a concern, but I
don't remember how we guarantee it won't break by accident if
somebody tries to live migrate a VM.

What is supposed to happen today if trying to live migrate a VM
using "-cpu Haswell,+vmx"?

-- 
Eduardo

[PATCH v2 3/6] iotests: Add qemu_io_log()

2019-11-20 Thread Kevin Wolf

Add a function that runs qemu-io and logs the output with the
appropriate filters applied.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 tests/qemu-iotests/iotests.py | 5 +
 1 file changed, 5 insertions(+)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 6a248472b9..330681ad02 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -157,6 +157,11 @@ def qemu_io(*args):
         sys.stderr.write('qemu-io received signal %i: %s\n' % (-exitcode, ' '.join(args)))
     return subp.communicate()[0]
 
+def qemu_io_log(*args):
+    result = qemu_io(*args)
+    log(result, filters=[filter_testfiles, filter_qemu_io])
+    return result
+
 def qemu_io_silent(*args):
     '''Run qemu-io and return the exit code, suppressing stdout'''
     args = qemu_io_args + list(args)
-- 
2.20.1

[PATCH v2 4/6] iotests: Fix timeout in run_job()

2019-11-20 Thread Kevin Wolf

run_job() accepts a wait parameter for a timeout, but it doesn't
actually use it. The only thing that is missing is passing it to
events_wait(), so do that now.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 tests/qemu-iotests/iotests.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 330681ad02..b409198e47 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -604,7 +604,7 @@ class VM(qtest.QEMUQtestMachine):
         ]
         error = None
         while True:
-            ev = filter_qmp_event(self.events_wait(events))
+            ev = filter_qmp_event(self.events_wait(events, timeout=wait))
             if ev['event'] != 'JOB_STATUS_CHANGE':
                 if use_log:
                     log(ev)
-- 
2.20.1

[PATCH v2 2/6] block: truncate: Don't make backing file data visible

2019-11-20 Thread Kevin Wolf

When extending the size of an image that has a backing file larger than
its old size, make sure that the backing file data doesn't become
visible in the guest, but the added area is properly zeroed out.

Consider the following scenario where the overlay is shorter than its
backing file:

    base.qcow2:     AAAAAAAA
    overlay.qcow2:  BBBB

When resizing (extending) overlay.qcow2, the new blocks should not stay
unallocated and expose the additional As from base.qcow2, as they did
before this patch; zeros should be read instead.

A similar case happens with the various variants of a commit job when an
intermediate file is short (- for unallocated):

    base.qcow2:     A-A-AAAA
    mid.qcow2:      BB-B
    top.qcow2:      C--C--C-

After committing top.qcow2 to mid.qcow2, the following happens:

    mid.qcow2:      CB-C00C0 (correct result)
    mid.qcow2:      CB-C--C- (before this fix)

Without the fix, blocks that previously read as zeros on top.qcow2
suddenly turn into A.
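The commit diagram can be checked with a toy model that treats each image as a string of clusters. This is a sketch, not QEMU code. In it, `-` is unallocated, `0` is an explicit zero cluster, and a read past an image's end returns zeroes. It assumes base.qcow2 is eight clusters long with data in its second half (`A-A-AAAA`), which the correct result requires:

```python
def commit(top, mid, base, zero_fill_on_extend):
    """Commit top into mid after growing mid to top's size.

    zero_fill_on_extend models the fix: the grown area is written as
    zeroes wherever the backing file (base) would otherwise show through.
    """
    result = list(mid)
    for pos in range(len(mid), len(top)):
        if zero_fill_on_extend and pos < len(base):
            result.append("0")
        else:
            result.append("-")
    # Copy every allocated cluster of top down into mid.
    for pos, cluster in enumerate(top):
        if cluster != "-":
            result[pos] = cluster
    return "".join(result)


def visible(image, backing):
    """What a reader sees through image, falling back to its backing file."""
    out = []
    for pos, cluster in enumerate(image):
        if cluster != "-":
            out.append(cluster)
        elif pos < len(backing) and backing[pos] != "-":
            out.append(backing[pos])
        else:
            out.append("0")
    return "".join(out)


fixed = commit("C--C--C-", "BB-B", "A-A-AAAA", zero_fill_on_extend=True)
unfixed = commit("C--C--C-", "BB-B", "A-A-AAAA", zero_fill_on_extend=False)
```

Before the commit, clusters 4, 5 and 7 read as zeroes through top.qcow2 because mid.qcow2 ends at cluster 3. After the commit, `visible(unfixed, ...)` turns them into `A`, while the zero-filled variant keeps them as `0`.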

Signed-off-by: Kevin Wolf 
---
 block/io.c | 32 
 1 file changed, 32 insertions(+)

diff --git a/block/io.c b/block/io.c
index 003f4ea38c..6a5144f8d2 100644
--- a/block/io.c
+++ b/block/io.c
@@ -3385,12 +3385,44 @@ int coroutine_fn bdrv_co_truncate(BdrvChild *child, int64_t offset, bool exact,
 ret = refresh_total_sectors(bs, offset >> BDRV_SECTOR_BITS);
 if (ret < 0) {
 error_setg_errno(errp, -ret, "Could not refresh total sector count");
+goto fail_refresh_total_sectors;
 } else {
 offset = bs->total_sectors * BDRV_SECTOR_SIZE;
 }
+
+/*
+ * If the image has a backing file that is large enough that it would
+ * provide data for the new area, we cannot leave it unallocated because
+ * then the backing file content would become visible. Instead, zero-fill
+ * the area where backing file and new area overlap.
+ *
+ * Note that if the image has a backing file, but was opened without the
+ * backing file, taking care of keeping things consistent with that backing
+ * file is the user's responsibility.
+ */
+if (new_bytes && bs->backing && prealloc == PREALLOC_MODE_OFF) {
+int64_t backing_len;
+
+backing_len = bdrv_getlength(backing_bs(bs));
+if (backing_len < 0) {
+ret = backing_len;
+goto out;
+}
+
+if (backing_len > old_size) {
+ret = bdrv_co_do_pwrite_zeroes(
+bs, old_size, MIN(new_bytes, backing_len - old_size),
+BDRV_REQ_ZERO_WRITE | BDRV_REQ_MAY_UNMAP);
+if (ret < 0) {
+goto out;
+}
+}
+}
+
 /* It's possible that truncation succeeded but refresh_total_sectors
  * failed, but the latter doesn't affect how we should finish the request.
  * Pass 0 as the last parameter so that dirty bitmaps etc. are handled. */
+fail_refresh_total_sectors:
 bdrv_co_write_req_finish(child, offset - new_bytes, new_bytes, &req, 0);
 
 out:
-- 
2.20.1

[PATCH v2 1/6] block: bdrv_co_do_pwrite_zeroes: 64 bit 'bytes' parameter

2019-11-20 Thread Kevin Wolf

bdrv_co_do_pwrite_zeroes() can already cope with maximum request sizes
by calling the driver in a loop until everything is done. Make the small
remaining change that is necessary to let it accept a 64 bit byte count.
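The loop that makes a 64-bit count safe can be modelled as follows: each iteration clamps the remaining byte count to a per-request maximum, as the `MIN(bytes, BDRV_REQUEST_MAX_BYTES)` change does. This is a simplified Python sketch; the constant's value is an assumption (INT_MAX rounded down to a 512-byte sector), and the alignment and `max_write_zeroes` handling of the real function are left out.

```python
# Assumed stand-in for BDRV_REQUEST_MAX_BYTES: INT_MAX rounded down to a
# multiple of the 512-byte sector size. Illustrative, not QEMU's header.
REQUEST_MAX_BYTES = ((2**31 - 1) >> 9) << 9


def split_write_zeroes(offset, nbytes, max_chunk=REQUEST_MAX_BYTES):
    """Split one 64-bit zero-write request into int-sized (offset, num) pieces."""
    chunks = []
    while nbytes > 0:
        num = min(nbytes, max_chunk)  # always fits in a C int
        chunks.append((offset, num))
        offset += num
        nbytes -= num
    return chunks


# A 5 GiB request would overflow a 32-bit int, but splits into three pieces.
print(len(split_write_zeroes(0, 5 * 2**30)))  # 3
```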

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Alberto Garcia 
---
 block/io.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/block/io.c b/block/io.c
index f75777f5ea..003f4ea38c 100644
--- a/block/io.c
+++ b/block/io.c
@@ -42,7 +42,7 @@
 
 static void bdrv_parent_cb_resize(BlockDriverState *bs);
 static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
-int64_t offset, int bytes, BdrvRequestFlags flags);
+int64_t offset, int64_t bytes, BdrvRequestFlags flags);
 
 static void bdrv_parent_drained_begin(BlockDriverState *bs, BdrvChild *ignore,
   bool ignore_bds_parents)
@@ -1730,7 +1730,7 @@ int coroutine_fn bdrv_co_preadv_part(BdrvChild *child,
 }
 
 static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
-int64_t offset, int bytes, BdrvRequestFlags flags)
+int64_t offset, int64_t bytes, BdrvRequestFlags flags)
 {
 BlockDriver *drv = bs->drv;
 QEMUIOVector qiov;
@@ -1760,7 +1760,7 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
 assert(max_write_zeroes >= bs->bl.request_alignment);
 
 while (bytes > 0 && !ret) {
-int num = bytes;
+int num = MIN(bytes, BDRV_REQUEST_MAX_BYTES);
 
 /* Align request.  Block drivers can expect the "bulk" of the request
  * to be aligned, and that unaligned requests do not cross cluster
-- 
2.20.1

[PATCH v2 5/6] iotests: Support job-complete in run_job()

2019-11-20 Thread Kevin Wolf

Automatically complete jobs that have a 'ready' state and need an
explicit job-complete. Without this, run_job() would hang for such
jobs.
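A simplified model of the status handling in `run_job()` after this patch. The function name and the status sequence are illustrative, and only the two branches visible in the hunk are modelled; the point is that a job sitting in 'ready' now gets a `job-complete` instead of waiting forever.

```python
def job_commands(statuses, auto_finalize=True):
    """Return the QMP commands a run_job()-like loop would send."""
    commands = []
    for status in statuses:
        if status == 'ready':
            # New branch: jobs that need an explicit completion get one.
            commands.append('job-complete')
        elif status == 'pending' and not auto_finalize:
            commands.append('job-finalize')
    return commands


# Illustrative status sequence of a job that stops in 'ready'.
print(job_commands(['created', 'running', 'ready', 'waiting',
                    'pending', 'concluded', 'null']))  # ['job-complete']
```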

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 tests/qemu-iotests/iotests.py | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index b409198e47..c4063ef6bb 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -617,6 +617,8 @@ class VM(qtest.QEMUQtestMachine):
 error = j['error']
 if use_log:
 log('Job failed: %s' % (j['error']))
+elif status == 'ready':
+self.qmp_log('job-complete', id=job)
 elif status == 'pending' and not auto_finalize:
 if pre_finalize:
 pre_finalize()
-- 
2.20.1

[PATCH for-4.2? v2 0/6] block: Fix resize (extending) of short overlays

2019-11-20 Thread Kevin Wolf

See patch 2 for the description of the bug fixed.

v2:
- Switched order of bs->total_sectors update and zero write [Vladimir]
- Fixed coding style [Vladimir]
- Changed the commit message to contain what was in the cover letter
- Test all preallocation modes
- Test allocation status with qemu-io 'map' [Vladimir]

Kevin Wolf (6):
  block: bdrv_co_do_pwrite_zeroes: 64 bit 'bytes' parameter
  block: truncate: Don't make backing file data visible
  iotests: Add qemu_io_log()
  iotests: Fix timeout in run_job()
  iotests: Support job-complete in run_job()
  iotests: Test committing to short backing file

 block/io.c|  38 +-
 tests/qemu-iotests/274| 141 +
 tests/qemu-iotests/274.out| 227 ++
 tests/qemu-iotests/group  |   1 +
 tests/qemu-iotests/iotests.py |  11 +-
 5 files changed, 413 insertions(+), 5 deletions(-)
 create mode 100755 tests/qemu-iotests/274
 create mode 100644 tests/qemu-iotests/274.out

-- 
2.20.1

Re: [PATCH for-4.2 1/2] i386: Add new versions of Skylake/Cascadelake/Icelake without TSX

2019-11-20 Thread Eduardo Habkost

On Wed, Nov 20, 2019 at 06:40:06PM +0100, Paolo Bonzini wrote:
> On 20/11/19 17:49, Eduardo Habkost wrote:
> > One of the mitigation methods for TAA[1] is to disable TSX
> > support on the host system.  Linux added a mechanism to disable
> > TSX globally through the kernel command line, and many Linux
> > distributions now default to tsx=off.  This makes existing CPU
> > models that have HLE and RTM enabled not usable anymore.
> > 
> > Add new versions of all CPU models that have the HLE and RTM
> > features enabled, with those features removed, so that they can be
> > used when TSX is disabled in the host system.
> 
> What is the effect of this when using "-cpu CascadeLake-Server" and
> upgrading QEMU?  Would it automatically switch to the new version?  If
> so, would it be better to include a duplicate of the models (and if so,
> that would conflict with my VMX features patch, which is also for 4.2).

It won't, because PCMachineClass::default_cpu_version==1 for all
versioned PC machine-types, currently.

The plan is to set default_cpu_version=CPU_VERSION_LATEST on
pc-*-5.0 (or, more likely, 5.1).  But this will happen only after
libvirt starts resolving CPU model versions.  See the
"Runnability guarantee of CPU models" section in
qemu-deprecated.texi.
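The reasoning can be sketched as a tiny resolver: an unversioned model name maps to the machine type's default version, so adding v2/v3 changes nothing while that default stays at 1. The names and the -1 sentinel are hypothetical stand-ins, not QEMU's actual definitions.

```python
CPU_VERSION_LATEST = -1  # hypothetical sentinel meaning "newest available"


def resolve_cpu_version(available_versions, default_cpu_version):
    """Pick the version an unversioned '-cpu Model' would get."""
    if default_cpu_version == CPU_VERSION_LATEST:
        return max(available_versions)
    return default_cpu_version


# Adding a TSX-less v2 does not move existing machine types off v1.
print(resolve_cpu_version([1, 2], default_cpu_version=1))  # 1
print(resolve_cpu_version([1, 2], CPU_VERSION_LATEST))     # 2
```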

-- 
Eduardo

Re: [PATCH] pseries: disable migration-test if /dev/kvm cannot be used

2019-11-20 Thread Greg Kurz

On Wed, 20 Nov 2019 18:09:55 +0100
Laurent Vivier  wrote:

> On ppc64, migration-test only works with kvm_hv, and we already
> have a check to verify the module is loaded.
> 
> The kvm_hv module can be loaded and /sys/module/kvm_hv can exist,
> but on some systems (like build systems) /dev/kvm can be missing
> (by the administrator's choice).
> 
> Because kvm_hv exists, migration-test is started, but QEMU falls back
> to TCG because KVM cannot be used:
> 
> Could not access KVM kernel module: No such file or directory
> failed to initialize KVM: No such file or directory
> Back to tcg accelerator
> 
> Since the test then runs with TCG, it fails.
> 
> As is already done for s390x, we must check both that /dev/kvm exists
> and that we have access rights to it.
> 
> Reported-by: Cole Robinson 
> Signed-off-by: Laurent Vivier 
> ---

Reviewed-by: Greg Kurz 

>  tests/migration-test.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/tests/migration-test.c b/tests/migration-test.c
> index ac780dffdaad..2b25ba6d77f6 100644
> --- a/tests/migration-test.c
> +++ b/tests/migration-test.c
> @@ -1349,7 +1349,8 @@ int main(int argc, char **argv)
>   * some reason)
>   */
>  if (g_str_equal(qtest_get_arch(), "ppc64") &&
> -access("/sys/module/kvm_hv", F_OK)) {
> +(access("/sys/module/kvm_hv", F_OK) ||
> + access("/dev/kvm", R_OK | W_OK))) {
>  g_test_message("Skipping test: kvm_hv not available");
>  return g_test_run();
>  }
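For scripts that need the same skip condition outside the C test, a Python equivalent might look like this. It is a sketch, not part of the patch; like the C code it requires both that the kvm_hv module directory exists and that /dev/kvm is readable and writable. Note that `os.access()` returns True on success, whereas C `access()` returns 0, so the condition is inverted.

```python
import os


def ppc64_kvm_usable(module_path="/sys/module/kvm_hv", dev_path="/dev/kvm"):
    """True when KVM-HV can actually be used, not merely loaded."""
    module_loaded = os.access(module_path, os.F_OK)
    dev_usable = os.access(dev_path, os.R_OK | os.W_OK)
    return module_loaded and dev_usable


if not ppc64_kvm_usable():
    print("Skipping test: kvm_hv not available")
```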

Re: [PATCH v9 QEMU 12/15] vfio: Add load state functions to SaveVMHandlers

2019-11-20 Thread Dr. David Alan Gilbert

* Kirti Wankhede (kwankh...@nvidia.com) wrote:
> Sequence during _RESUMING device state:
> While data for this device is available, repeat the steps below:
> a. read data_offset, the offset at which the user application should write data.
> b. write data_size bytes of data to the migration region at data_offset.
> c. write data_size, which tells the vendor driver that data has been written
>    to the staging buffer.
> 
> For the user, the data is opaque. The user should write data in the same
> order as it was received.
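The ordering requirement can be illustrated with a small Python reader for a simplified stream: a big-endian 64-bit flag, then, for data records, a 64-bit size and that many bytes, repeated until an end-of-state flag. The flag values and the record layout are assumptions for illustration, not the exact VFIO_MIG_FLAG_* encoding from the patch; the `write_chunk` callback stands in for steps a-c.

```python
import struct

# Hypothetical flag values; the real VFIO_MIG_FLAG_* constants differ.
FLAG_END_OF_STATE = 0x1
FLAG_DATA_STATE = 0x2


def load_device_data(stream, write_chunk):
    """Hand each data record to write_chunk in stream order.

    Returns the number of bytes consumed, including the end-of-state flag.
    """
    pos = 0

    def get_be64():
        nonlocal pos
        (value,) = struct.unpack_from(">Q", stream, pos)
        pos += 8
        return value

    flag = get_be64()
    while flag != FLAG_END_OF_STATE:
        if flag != FLAG_DATA_STATE:
            raise ValueError("unexpected flag 0x%x" % flag)
        size = get_be64()
        if size:
            # Steps a-c: fetch data_offset, copy the data, write data_size.
            write_chunk(stream[pos:pos + size])
            pos += size
        flag = get_be64()
    return pos


def data_record(payload):
    """Encode one hypothetical data record: flag, size, payload."""
    return struct.pack(">QQ", FLAG_DATA_STATE, len(payload)) + payload


stream = (data_record(b"abc") + data_record(b"") + data_record(b"de")
          + struct.pack(">Q", FLAG_END_OF_STATE))
chunks = []
load_device_data(stream, chunks.append)
print(chunks)  # [b'abc', b'de']
```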
> 
> Signed-off-by: Kirti Wankhede 
> Reviewed-by: Neo Jia 
> ---
>  hw/vfio/migration.c  | 170 +++
>  hw/vfio/trace-events |   3 +
>  2 files changed, 173 insertions(+)
> 
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index f890e864e174..16e12586fe8b 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -251,6 +251,33 @@ static int vfio_save_device_config_state(QEMUFile *f, void *opaque)
>  return qemu_file_get_error(f);
>  }
>  
> +static int vfio_load_device_config_state(QEMUFile *f, void *opaque)
> +{
> +VFIODevice *vbasedev = opaque;
> +uint64_t data;
> +
> +if (vbasedev->ops && vbasedev->ops->vfio_load_config) {
> +int ret;
> +
> +ret = vbasedev->ops->vfio_load_config(vbasedev, f);
> +if (ret) {
> +error_report("%s: Failed to load device config space",
> + vbasedev->name);
> +return ret;
> +}
> +}
> +
> +data = qemu_get_be64(f);
> +if (data != VFIO_MIG_FLAG_END_OF_STATE) {
> +error_report("%s: Failed loading device config space, "
> + "end flag incorrect 0x%"PRIx64, vbasedev->name, data);
> +return -EINVAL;
> +}
> +
> +trace_vfio_load_device_config_state(vbasedev->name);
> +return qemu_file_get_error(f);
> +}
> +
>  /* -- */
>  
>  static int vfio_save_setup(QEMUFile *f, void *opaque)
> @@ -410,12 +437,155 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
>  return ret;
>  }
>  
> +static int vfio_load_setup(QEMUFile *f, void *opaque)
> +{
> +VFIODevice *vbasedev = opaque;
> +VFIOMigration *migration = vbasedev->migration;
> +int ret = 0;
> +
> +if (migration->region.mmaps) {
> +ret = vfio_region_mmap(&migration->region);
> +if (ret) {
> +error_report("%s: Failed to mmap VFIO migration region %d: %s",
> + vbasedev->name, migration->region.nr,
> + strerror(-ret));
> +return ret;
> +}
> +}
> +
> +ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RESUMING, 0);
> +if (ret) {
> +error_report("%s: Failed to set state RESUMING", vbasedev->name);
> +}
> +return ret;
> +}
> +
> +static int vfio_load_cleanup(void *opaque)
> +{
> +vfio_save_cleanup(opaque);
> +return 0;
> +}
> +
> +static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
> +{
> +VFIODevice *vbasedev = opaque;
> +VFIOMigration *migration = vbasedev->migration;
> +int ret = 0;
> +uint64_t data, data_size;
> +
> +data = qemu_get_be64(f);
> +while (data != VFIO_MIG_FLAG_END_OF_STATE) {
> +
> +trace_vfio_load_state(vbasedev->name, data);
> +
> +switch (data) {
> +case VFIO_MIG_FLAG_DEV_CONFIG_STATE:
> +{
> +ret = vfio_load_device_config_state(f, opaque);
> +if (ret) {
> +return ret;
> +}
> +break;
> +}
> +case VFIO_MIG_FLAG_DEV_SETUP_STATE:
> +{
> +data = qemu_get_be64(f);
> +if (data == VFIO_MIG_FLAG_END_OF_STATE) {
> +return ret;
> +} else {
> +error_report("%s: SETUP STATE: EOS not found 0x%"PRIx64,
> + vbasedev->name, data);
> +return -EINVAL;
> +}
> +break;
> +}
> +case VFIO_MIG_FLAG_DEV_DATA_STATE:
> +{
> +VFIORegion *region = &migration->region;
> +void *buf = NULL;
> +bool buffer_mmaped = false;
> +uint64_t data_offset = 0;
> +
> +data_size = qemu_get_be64(f);
> +if (data_size == 0) {
> +break;
> +}
> +
> +ret = pread(vbasedev->fd, &data_offset, sizeof(data_offset),
> +region->fd_offset +
> +offsetof(struct vfio_device_migration_info,
> +data_offset));
> +if (ret != sizeof(data_offset)) {
> +error_report("%s:Failed to get migration buffer data offset %d",
> + vbasedev->name, ret);
> +return -EINVAL;
> +}
> +
> +if (region->mmaps) {
> +buf = find_data_region(region, data_offset, data_size);
> +

Re: [PATCH for-5.0 v5 15/23] ppc/xive: Use the XiveFabric and XivePresenter interfaces

2019-11-20 Thread Greg Kurz

On Fri, 15 Nov 2019 17:24:28 +0100
Cédric Le Goater  wrote:

> Now that the machines have handlers implementing the XiveFabric and
> XivePresenter interfaces, remove xive_presenter_match() and make use
> of the 'match_nvt' handler of the machine.
> 
> Signed-off-by: Cédric Le Goater 
> ---
>  hw/intc/xive.c | 48 +---
>  1 file changed, 17 insertions(+), 31 deletions(-)
> 

Nice diffstat :)

> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> index 1c9e58f8deac..ab62bda85788 100644
> --- a/hw/intc/xive.c
> +++ b/hw/intc/xive.c
> @@ -1423,30 +1423,6 @@ int xive_presenter_tctx_match(XivePresenter *xptr, XiveTCTX *tctx,
>  return -1;
>  }
>  
> -static bool xive_presenter_match(XiveRouter *xrtr, uint8_t format,
> - uint8_t nvt_blk, uint32_t nvt_idx,
> - bool cam_ignore, uint8_t priority,
> - uint32_t logic_serv, XiveTCTXMatch *match)
> -{
> -XivePresenter *xptr = XIVE_PRESENTER(xrtr);
> -XivePresenterClass *xpc = XIVE_PRESENTER_GET_CLASS(xptr);
> -int count;
> -
> -count = xpc->match_nvt(xptr, format, nvt_blk, nvt_idx, cam_ignore,
> -   priority, logic_serv, match);
> -if (count < 0) {
> -return false;
> -}
> -
> -if (!match->tctx) {
> -qemu_log_mask(LOG_UNIMP, "XIVE: NVT %x/%x is not dispatched\n",
> -  nvt_blk, nvt_idx);

Maybe keep this trace...

> -return false;
> -}
> -
> -return true;
> -}
> -
>  /*
>   * This is our simple Xive Presenter Engine model. It is merged in the
>   * Router as it does not require an extra object.
> @@ -1462,22 +1438,32 @@ static bool xive_presenter_match(XiveRouter *xrtr, uint8_t format,
>   *
>   * The parameters represent what is sent on the PowerBus
>   */
> -static bool xive_presenter_notify(XiveRouter *xrtr, uint8_t format,
> +static bool xive_presenter_notify(uint8_t format,
>uint8_t nvt_blk, uint32_t nvt_idx,
>bool cam_ignore, uint8_t priority,
>uint32_t logic_serv)
>  {
> +XiveFabric *xfb = XIVE_FABRIC(qdev_get_machine());
> +XiveFabricClass *xfc = XIVE_FABRIC_GET_CLASS(xfb);
>  XiveTCTXMatch match = { .tctx = NULL, .ring = 0 };
> -bool found;
> +int count;
>  
> -found = xive_presenter_match(xrtr, format, nvt_blk, nvt_idx, cam_ignore,
> - priority, logic_serv, &match);
> -if (found) {
> +/*
> + * Ask the machine to scan the interrupt controllers for a match
> + */
> +count = xfc->match_nvt(xfb, format, nvt_blk, nvt_idx, cam_ignore,
> +   priority, logic_serv, &match);
> +if (count < 0) {
> +return false;
> +}
> +
> +/* handle CPU exception delivery */
> +if (count) {
>  ipb_update(&match.tctx->regs[match.ring], priority);
>  xive_tctx_notify(match.tctx, match.ring);
>  }

... in an else block here ^^ ?

>  
> -return found;
> +return count;

Implicit cast is ok I guess, but !!count would ensure no paranoid
compiler ever complains.

>  }
>  
>  /*
> @@ -1590,7 +1576,7 @@ static void xive_router_end_notify(XiveRouter *xrtr, uint8_t end_blk,
>  return;
>  }
>  
> -found = xive_presenter_notify(xrtr, format, nvt_blk, nvt_idx,
> +found = xive_presenter_notify(format, nvt_blk, nvt_idx,
>xive_get_field32(END_W7_F0_IGNORE, end.w7),
>priority,
>xive_get_field32(END_W7_F1_LOG_SERVER_ID, end.w7));

[PATCH 5/6] qapi: Fix code generation for empty modules

2019-11-20 Thread Markus Armbruster

When a sub-module doesn't contain any definitions, we don't generate
code for it, but we do generate the #include.

We generate code only for modules that get visited.
QAPISchema.visit() visits only modules that have definitions.  It can
visit modules multiple times.

Clean this up as follows.  Collect entities in their QAPISchemaModule.
Have QAPISchema.visit() call QAPISchemaModule.visit() for each module.
Have QAPISchemaModule.visit() call .visit_module() for itself, and
QAPISchemaEntity.visit() for each of its entities.  This way, we visit
each module exactly once.
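The new visiting scheme as a stripped-down, runnable model: entities are collected per module, and the schema visits each module once, so even an empty module gets its `visit_module()` call. Class names are shortened, `visit_entity()` is a hypothetical stand-in for `entity.visit(visitor)`, and the `visit_needed()` filter of the real code is omitted.

```python
class Module:
    def __init__(self, name):
        self.name = name
        self._entity_list = []

    def add_entity(self, ent):
        self._entity_list.append(ent)

    def visit(self, visitor):
        visitor.visit_module(self.name)
        for ent in self._entity_list:
            visitor.visit_entity(ent)


class Schema:
    def __init__(self):
        self._module_dict = {}  # insertion-ordered (Python 3.7+)

    def module(self, name):
        if name not in self._module_dict:
            self._module_dict[name] = Module(name)
        return self._module_dict[name]

    def visit(self, visitor):
        for mod in self._module_dict.values():
            mod.visit(visitor)


class Recorder:
    """Visitor that just logs what it sees."""

    def __init__(self):
        self.log = []

    def visit_module(self, name):
        self.log.append('module %s' % name)

    def visit_entity(self, ent):
        self.log.append(ent)


schema = Schema()
schema.module('main.json').add_entity('include sub.json')
schema.module('sub.json')  # a module without any definitions
schema.module('main.json').add_entity('enum Status')
rec = Recorder()
schema.visit(rec)
print(rec.log)
# ['module main.json', 'include sub.json', 'enum Status', 'module sub.json']
```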

Signed-off-by: Markus Armbruster 
---
 scripts/qapi/schema.py   | 24 +---
 tests/qapi-schema/empty.out  |  1 +
 tests/qapi-schema/include-repetition.out |  6 ++
 tests/qapi-schema/qapi-schema-test.out   | 24 ++--
 4 files changed, 26 insertions(+), 29 deletions(-)

diff --git a/scripts/qapi/schema.py b/scripts/qapi/schema.py
index 0f2e0dcfce..0bfc5256fb 100644
--- a/scripts/qapi/schema.py
+++ b/scripts/qapi/schema.py
@@ -68,6 +68,7 @@ class QAPISchemaEntity(object):
 def _set_module(self, schema, info):
 assert self._checked
 self._module = schema.module_by_fname(info and info.fname)
+self._module.add_entity(self)
 
 def set_module(self, schema):
 self._set_module(schema, self.info)
@@ -77,11 +78,6 @@ class QAPISchemaEntity(object):
 assert self._checked
 return self._ifcond
 
-@property
-def module(self):
-assert self._module or not self.info
-return self._module
-
 def is_implicit(self):
 return not self.info
 
@@ -142,6 +138,16 @@ class QAPISchemaVisitor(object):
 class QAPISchemaModule(object):
 def __init__(self, name):
 self.name = name
+self._entity_list = []
+
+def add_entity(self, ent):
+self._entity_list.append(ent)
+
+def visit(self, visitor):
+visitor.visit_module(self.name)
+for entity in self._entity_list:
+if visitor.visit_needed(entity):
+entity.visit(visitor)
 
 
 class QAPISchemaInclude(QAPISchemaEntity):
@@ -1093,10 +1099,6 @@ class QAPISchema(object):
 def visit(self, visitor):
 visitor.visit_begin(self)
 module = None
-for entity in self._entity_list:
-if visitor.visit_needed(entity):
-if entity.module != module:
-module = entity.module
-visitor.visit_module(module.name)
-entity.visit(visitor)
+for mod in self._module_dict.values():
+mod.visit(visitor)
 visitor.visit_end()
diff --git a/tests/qapi-schema/empty.out b/tests/qapi-schema/empty.out
index 5b53d00702..69666c39ad 100644
--- a/tests/qapi-schema/empty.out
+++ b/tests/qapi-schema/empty.out
@@ -9,3 +9,4 @@ enum QType
 member qdict
 member qlist
 member qbool
+module empty.json
diff --git a/tests/qapi-schema/include-repetition.out b/tests/qapi-schema/include-repetition.out
index 5423983239..0b654ddebb 100644
--- a/tests/qapi-schema/include-repetition.out
+++ b/tests/qapi-schema/include-repetition.out
@@ -11,15 +11,13 @@ enum QType
 member qbool
 module include-repetition.json
 include comments.json
+include include-repetition-sub.json
+include comments.json
 module comments.json
 enum Status
 member good
 member bad
 member ugly
-module include-repetition.json
-include include-repetition-sub.json
 module include-repetition-sub.json
 include comments.json
 include comments.json
-module include-repetition.json
-include comments.json
diff --git a/tests/qapi-schema/qapi-schema-test.out b/tests/qapi-schema/qapi-schema-test.out
index 3660e75a48..9bd3c4a490 100644
--- a/tests/qapi-schema/qapi-schema-test.out
+++ b/tests/qapi-schema/qapi-schema-test.out
@@ -153,9 +153,6 @@ object q_obj_sizeList-wrapper
 member data: sizeList optional=False
 object q_obj_anyList-wrapper
 member data: anyList optional=False
-module sub-sub-module.json
-array StatusList Status
-module qapi-schema-test.json
 object q_obj_StatusList-wrapper
 member data: StatusList optional=False
 enum UserDefListUnionKind
@@ -193,17 +190,6 @@ object UserDefListUnion
 case any: q_obj_anyList-wrapper
 case user: q_obj_StatusList-wrapper
 include include/sub-module.json
-module include/sub-module.json
-include sub-sub-module.json
-module sub-sub-module.json
-enum Status
-member good
-member bad
-member ugly
-module include/sub-module.json
-object SecondArrayRef
-member s: StatusList optional=False
-module qapi-schema-test.json
 command user_def_cmd None -> None
 gen=True success_response=True boxed=False oob=False preconfig=False
 object q_obj_user_def_cmd1-arg
@@ -435,3 +421,13 @@ command test-command-cond-features3 None -> None
 gen=True success_response=True boxed=False oob=False preconfig=False
 feature feature1
 if ['defined(TEST_IF_COND_1)', 'define

[PATCH 1/6] qapi: Tweak "command returns a nice type" check for clarity

2019-11-20 Thread Markus Armbruster

Signed-off-by: Markus Armbruster 
---
 scripts/qapi/schema.py | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/scripts/qapi/schema.py b/scripts/qapi/schema.py
index cf0045f34e..cfb574c85d 100644
--- a/scripts/qapi/schema.py
+++ b/scripts/qapi/schema.py
@@ -711,10 +711,11 @@ class QAPISchemaCommand(QAPISchemaEntity):
 self.ret_type = schema.resolve_type(
 self._ret_type_name, self.info, "command's 'returns'")
 if self.name not in self.info.pragma.returns_whitelist:
-if not (isinstance(self.ret_type, QAPISchemaObjectType)
-or (isinstance(self.ret_type, QAPISchemaArrayType)
-and isinstance(self.ret_type.element_type,
-   QAPISchemaObjectType))):
+typ = self.ret_type
+if isinstance(typ, QAPISchemaArrayType):
+typ = self.ret_type.element_type
+assert typ
+if not isinstance(typ, QAPISchemaObjectType):
 raise QAPISemError(
 self.info,
 "command's 'returns' cannot take %s"
-- 
2.21.0

[PATCH 0/6] qapi: Module fixes and cleanups

2019-11-20 Thread Markus Armbruster

Kevin recently posted a minimally invasive fix for empty QAPI
modules[*].  This is my attempt at a fix that also addresses the
design weakness that led to the bug.

Markus Armbruster (6):
  qapi: Tweak "command returns a nice type" check for clarity
  tests/Makefile.include: Fix missing test-qapi-emit-events.[ch]
  qapi: Generate command registration stuff into separate files
  qapi: Proper intermediate representation for modules
  qapi: Fix code generation for empty modules
  qapi: Simplify QAPISchemaModularCVisitor

 docs/devel/qapi-code-gen.txt | 19 -
 Makefile |  4 +-
 monitor/misc.c   |  7 +-
 qga/main.c   |  2 +-
 tests/test-qmp-cmds.c|  1 +
 .gitignore   |  1 +
 qapi/Makefile.objs   |  1 +
 qga/Makefile.objs|  1 +
 scripts/qapi/commands.py | 17 +++--
 scripts/qapi/events.py   |  2 +-
 scripts/qapi/gen.py  | 28 
 scripts/qapi/schema.py   | 92 +++-
 scripts/qapi/types.py|  5 +-
 scripts/qapi/visit.py|  8 +--
 tests/.gitignore |  1 +
 tests/Makefile.include   |  9 ++-
 tests/qapi-schema/empty.out  |  1 +
 tests/qapi-schema/include-repetition.out |  6 +-
 tests/qapi-schema/qapi-schema-test.out   | 24 +++
 19 files changed, 144 insertions(+), 85 deletions(-)

-- 
2.21.0

[PATCH 3/6] qapi: Generate command registration stuff into separate files

2019-11-20 Thread Markus Armbruster

Having to include qapi-commands.h just for qmp_init_marshal() is
suboptimal.  Generate it into separate files.  This lets
monitor/misc.c, qga/main.c, and the generated qapi-commands-FOO.h
include less.

Signed-off-by: Markus Armbruster 
---
 docs/devel/qapi-code-gen.txt | 19 ---
 Makefile |  4 +++-
 monitor/misc.c   |  7 ++-
 qga/main.c   |  2 +-
 tests/test-qmp-cmds.c|  1 +
 .gitignore   |  1 +
 qapi/Makefile.objs   |  1 +
 qga/Makefile.objs|  1 +
 scripts/qapi/commands.py | 15 +++
 tests/.gitignore |  1 +
 tests/Makefile.include   |  5 -
 11 files changed, 46 insertions(+), 11 deletions(-)

diff --git a/docs/devel/qapi-code-gen.txt b/docs/devel/qapi-code-gen.txt
index 45c93a43cc..3f37339d16 100644
--- a/docs/devel/qapi-code-gen.txt
+++ b/docs/devel/qapi-code-gen.txt
@@ -1493,6 +1493,10 @@ $(prefix)qapi-commands.c: Command marshal/dispatch functions for each
 $(prefix)qapi-commands.h: Function prototypes for the QMP commands
   specified in the schema
 
+$(prefix)qapi-init-commands.h - Command initialization prototype
+
+$(prefix)qapi-init-commands.c - Command initialization code
+
 Example:
 
 $ cat qapi-generated/example-qapi-commands.h
@@ -1502,11 +1506,9 @@ Example:
 #define EXAMPLE_QAPI_COMMANDS_H
 
 #include "example-qapi-types.h"
-#include "qapi/qmp/dispatch.h"
 
 UserDefOne *qmp_my_command(UserDefOneList *arg1, Error **errp);
 void qmp_marshal_my_command(QDict *args, QObject **ret, Error **errp);
-void example_qmp_init_marshal(QmpCommandList *cmds);
 
 #endif /* EXAMPLE_QAPI_COMMANDS_H */
 $ cat qapi-generated/example-qapi-commands.c
@@ -1566,7 +1568,19 @@ Example:
 visit_end_struct(v, NULL);
 visit_free(v);
 }
+[Uninteresting stuff omitted...]
+$ cat qapi-generated/example-qapi-init-commands.h
+[Uninteresting stuff omitted...]
+#ifndef EXAMPLE_QAPI_INIT_COMMANDS_H
+#define EXAMPLE_QAPI_INIT_COMMANDS_H
 
+#include "qapi/qmp/dispatch.h"
+
+void example_qmp_init_marshal(QmpCommandList *cmds);
+
+#endif /* EXAMPLE_QAPI_INIT_COMMANDS_H */
+$ cat qapi-generated/example-qapi-init-commands.
+[Uninteresting stuff omitted...]
 void example_qmp_init_marshal(QmpCommandList *cmds)
 {
 QTAILQ_INIT(cmds);
@@ -1574,7 +1588,6 @@ Example:
 qmp_register_command(cmds, "my-command",
  qmp_marshal_my_command, QCO_NO_OPTIONS);
 }
-
 [Uninteresting stuff omitted...]
 
 For a modular QAPI schema (see section Include directives), code for
diff --git a/Makefile b/Makefile
index b437a346d7..8dad949483 100644
--- a/Makefile
+++ b/Makefile
@@ -117,6 +117,7 @@ GENERATED_QAPI_FILES += qapi/qapi-builtin-visit.h qapi/qapi-builtin-visit.c
 GENERATED_QAPI_FILES += qapi/qapi-visit.h qapi/qapi-visit.c
 GENERATED_QAPI_FILES += $(QAPI_MODULES:%=qapi/qapi-visit-%.h)
 GENERATED_QAPI_FILES += $(QAPI_MODULES:%=qapi/qapi-visit-%.c)
+GENERATED_QAPI_FILES += qapi/qapi-init-commands.h qapi/qapi-init-commands.c
 GENERATED_QAPI_FILES += qapi/qapi-commands.h qapi/qapi-commands.c
 GENERATED_QAPI_FILES += $(QAPI_MODULES:%=qapi/qapi-commands-%.h)
 GENERATED_QAPI_FILES += $(QAPI_MODULES:%=qapi/qapi-commands-%.c)
@@ -610,6 +611,7 @@ $(SRC_PATH)/scripts/qapi-gen.py
 qga/qapi-generated/qga-qapi-types.c qga/qapi-generated/qga-qapi-types.h \
 qga/qapi-generated/qga-qapi-visit.c qga/qapi-generated/qga-qapi-visit.h \
 qga/qapi-generated/qga-qapi-commands.h qga/qapi-generated/qga-qapi-commands.c \
+qga/qapi-generated/qga-qapi-init-commands.h qga/qapi-generated/qga-qapi-init-commands.c \
 qga/qapi-generated/qga-qapi-doc.texi: \
 qga/qapi-generated/qapi-gen-timestamp ;
 qga/qapi-generated/qapi-gen-timestamp: $(SRC_PATH)/qga/qapi-schema.json $(qapi-py)
@@ -628,7 +630,7 @@ qapi-gen-timestamp: $(qapi-modules) $(qapi-py)
"GEN","$(@:%-timestamp=%)")
@>$@
 
-QGALIB_GEN=$(addprefix qga/qapi-generated/, qga-qapi-types.h qga-qapi-visit.h qga-qapi-commands.h)
+QGALIB_GEN=$(addprefix qga/qapi-generated/, qga-qapi-types.h qga-qapi-visit.h qga-qapi-commands.h qga-qapi-init-commands.h)
 $(qga-obj-y): $(QGALIB_GEN)
 
 qemu-ga$(EXESUF): $(qga-obj-y) $(COMMON_LDADDS)
diff --git a/monitor/misc.c b/monitor/misc.c
index 3baa15f3bf..7c4b599342 100644
--- a/monitor/misc.c
+++ b/monitor/misc.c
@@ -66,8 +66,13 @@
 #include "qemu/option.h"
 #include "qemu/thread.h"
 #include "block/qapi.h"
-#include "qapi/qapi-commands.h"
+#include "qapi/qapi-commands-char.h"
+#include "qapi/qapi-commands-migration.h"
+#include "qapi/qapi-commands-misc.h"
+#include "qapi/qapi-commands-qom.h"
+#include "qapi/qapi-commands-trace.h"
 #include "qapi/qapi-emit-events.h"
+#include "qapi/qapi-init-commands.h"
 #include "qapi/error.h"
 #include "qapi/qmp-event.h"
 #include "qapi/qapi-introspect.h"
diff --git a/qga/main.c b/qga/main.c
index c35c2a2120..e5c39c189a 100644
---

[PATCH 4/6] qapi: Proper intermediate representation for modules

2019-11-20 Thread Markus Armbruster

Modules are represented only by their names so far.  Introduce class
QAPISchemaModule.  So far, it merely wraps the name.  The next patch
will put it to more interesting use.

Once again, arrays spice up the patch a bit.  For any other type,
@info points to the definition, which lets us map from @info to
module.  For arrays, there is no definition, and @info points to the
first use instead.  We have to use the element type's module instead,
which is only available after .check().
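How a file name becomes a module, and why arrays need special handling, in a compact sketch modeled on `_module_name()` and `module_by_fname()`: names are paths relative to the main schema's directory, None is reserved for built-ins, and an array type takes the module of its element type's definition rather than the file where it was first used. `ModuleTable`, `module_for()` and the file names are hypothetical, and modules are reduced to plain name strings.

```python
import os


class ModuleTable:
    """Map schema file names to module names (modules reduced to strings)."""

    def __init__(self, main_fname):
        self._schema_dir = os.path.dirname(main_fname)
        self._names = set()
        self.make(None)  # built-ins live in the None module
        self.make(main_fname)

    def _module_name(self, fname):
        if fname is None:
            return None
        return os.path.relpath(fname, self._schema_dir)

    def make(self, fname):
        name = self._module_name(fname)
        self._names.add(name)
        return name

    def by_fname(self, fname):
        name = self._module_name(fname)
        assert name in self._names, name
        return name


def module_for(table, info_fname, is_array=False, element_info_fname=None):
    """Arrays use the element type's file; built-in elements give None."""
    fname = element_info_fname if is_array else info_fname
    return table.by_fname(fname)


table = ModuleTable("qapi/qapi-schema.json")
table.make("qapi/include/sub-module.json")  # from an include directive
# StatusList first used in the main file, Status defined in the sub-module:
print(module_for(table, "qapi/qapi-schema.json", True,
                 "qapi/include/sub-module.json"))  # include/sub-module.json
```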

Signed-off-by: Markus Armbruster 
---
 scripts/qapi/schema.py | 63 --
 1 file changed, 43 insertions(+), 20 deletions(-)

diff --git a/scripts/qapi/schema.py b/scripts/qapi/schema.py
index cfb574c85d..0f2e0dcfce 100644
--- a/scripts/qapi/schema.py
+++ b/scripts/qapi/schema.py
@@ -50,9 +50,6 @@ class QAPISchemaEntity(object):
 
 def check(self, schema):
 assert not self._checked
-if self.info:
-self._module = os.path.relpath(self.info.fname,
-   os.path.dirname(schema.fname))
 seen = {}
 for f in self.features:
 f.check_clash(self.info, seen)
@@ -68,6 +65,13 @@ class QAPISchemaEntity(object):
 if self.doc:
 self.doc.check()
 
+def _set_module(self, schema, info):
+assert self._checked
+self._module = schema.module_by_fname(info and info.fname)
+
+def set_module(self, schema):
+self._set_module(schema, self.info)
+
 @property
 def ifcond(self):
 assert self._checked
@@ -75,7 +79,7 @@ class QAPISchemaEntity(object):
 
 @property
 def module(self):
-assert self._checked
+assert self._module or not self.info
 return self._module
 
 def is_implicit(self):
@@ -135,15 +139,19 @@ class QAPISchemaVisitor(object):
 pass
 
 
+class QAPISchemaModule(object):
+def __init__(self, name):
+self.name = name
+
+
 class QAPISchemaInclude(QAPISchemaEntity):
-
-def __init__(self, fname, info):
+def __init__(self, sub_module, info):
 QAPISchemaEntity.__init__(self, None, info, None)
-self.fname = fname
+self._sub_module = sub_module
 
 def visit(self, visitor):
 QAPISchemaEntity.visit(self, visitor)
-visitor.visit_include(self.fname, self.info)
+visitor.visit_include(self._sub_module.name, self.info)
 
 
 class QAPISchemaType(QAPISchemaEntity):
@@ -276,16 +284,14 @@ class QAPISchemaArrayType(QAPISchemaType):
 self.info and self.info.defn_meta)
 assert not isinstance(self.element_type, QAPISchemaArrayType)
 
+def set_module(self, schema):
+self._set_module(schema, self.element_type.info)
+
 @property
 def ifcond(self):
 assert self._checked
 return self.element_type.ifcond
 
-@property
-def module(self):
-assert self._checked
-return self.element_type.module
-
 def is_implicit(self):
 return True
 
@@ -783,6 +789,10 @@ class QAPISchema(object):
 self.docs = parser.docs
 self._entity_list = []
 self._entity_dict = {}
+self._module_dict = {}
+self._schema_dir = os.path.dirname(fname)
+self._make_module(None) # built-ins
+self._make_module(fname)
 self._predefining = True
 self._def_predefineds()
 self._predefining = False
@@ -826,14 +836,26 @@ class QAPISchema(object):
 info, "%s uses unknown type '%s'" % (what, name))
 return typ
 
+def _module_name(self, fname):
+if fname is None:
+return None
+return os.path.relpath(fname, self._schema_dir)
+
+def _make_module(self, fname):
+name = self._module_name(fname)
+if not name in self._module_dict:
+self._module_dict[name] = QAPISchemaModule(name)
+return self._module_dict[name]
+
+def module_by_fname(self, fname):
+name = self._module_name(fname)
+assert name in self._module_dict
+return self._module_dict[name]
+
 def _def_include(self, expr, info, doc):
 include = expr['include']
 assert doc is None
-main_info = info
-while main_info.parent:
-main_info = main_info.parent
-fname = os.path.relpath(include, os.path.dirname(main_info.fname))
-self._def_entity(QAPISchemaInclude(fname, info))
+self._def_entity(QAPISchemaInclude(self._make_module(include), info))
 
 def _def_builtin_type(self, name, json_type, c_type):
 self._def_entity(QAPISchemaBuiltinType(name, json_type, c_type))
@@ -1065,15 +1087,16 @@ class QAPISchema(object):
 ent.check(self)
 ent.connect_doc()
 ent.check_doc()
+for ent in self._entity_list:
+ent.set_module(self)
 
 def visit(self, visitor):
 visitor.visit_begin(self)
 module = None
-visitor.visit_module(module)

[PATCH 6/6] qapi: Simplify QAPISchemaModularCVisitor

2019-11-20 Thread Markus Armbruster

Since the previous commit, QAPISchemaVisitor.visit_module() is called
just once.  Simplify QAPISchemaModularCVisitor accordingly.

Signed-off-by: Markus Armbruster 
---
 scripts/qapi/commands.py |  2 +-
 scripts/qapi/events.py   |  2 +-
 scripts/qapi/gen.py  | 28 ++--
 scripts/qapi/types.py|  5 +++--
 scripts/qapi/visit.py|  8 
 5 files changed, 23 insertions(+), 22 deletions(-)

diff --git a/scripts/qapi/commands.py b/scripts/qapi/commands.py
index 47f4a18cfe..afa55b055c 100644
--- a/scripts/qapi/commands.py
+++ b/scripts/qapi/commands.py
@@ -239,7 +239,7 @@ class QAPISchemaGenCommandVisitor(QAPISchemaModularCVisitor):
 def __init__(self, prefix):
 QAPISchemaModularCVisitor.__init__(
 self, prefix, 'qapi-commands',
-' * Schema-defined QAPI/QMP commands', __doc__)
+' * Schema-defined QAPI/QMP commands', None, __doc__)
 self._regy = QAPIGenCCode(None)
 self._visited_ret_types = {}
 
diff --git a/scripts/qapi/events.py b/scripts/qapi/events.py
index 10fc509fa9..2bde3e6128 100644
--- a/scripts/qapi/events.py
+++ b/scripts/qapi/events.py
@@ -140,7 +140,7 @@ class QAPISchemaGenEventVisitor(QAPISchemaModularCVisitor):
 def __init__(self, prefix):
 QAPISchemaModularCVisitor.__init__(
 self, prefix, 'qapi-events',
-' * Schema-defined QAPI/QMP events', __doc__)
+' * Schema-defined QAPI/QMP events', None, __doc__)
 self._event_enum_name = c_name(prefix + 'QAPIEvent', protect=False)
 self._event_enum_members = []
 self._event_emit_name = c_name(prefix + 'qapi_event_emit')
diff --git a/scripts/qapi/gen.py b/scripts/qapi/gen.py
index 112b6d94c5..95afae0615 100644
--- a/scripts/qapi/gen.py
+++ b/scripts/qapi/gen.py
@@ -201,10 +201,11 @@ class QAPISchemaMonolithicCVisitor(QAPISchemaVisitor):
 
 class QAPISchemaModularCVisitor(QAPISchemaVisitor):
 
-def __init__(self, prefix, what, blurb, pydoc):
+def __init__(self, prefix, what, user_blurb, builtin_blurb, pydoc):
 self._prefix = prefix
 self._what = what
-self._blurb = blurb
+self._user_blurb = user_blurb
+self._builtin_blurb = builtin_blurb
 self._pydoc = pydoc
 self._genc = None
 self._genh = None
@@ -245,7 +246,7 @@ class QAPISchemaModularCVisitor(QAPISchemaVisitor):
 genc = QAPIGenC(basename + '.c', blurb, self._pydoc)
 genh = QAPIGenH(basename + '.h', blurb, self._pydoc)
 self._module[name] = (genc, genh)
-self._set_module(name)
+self._genc, self._genh = self._module[name]
 
 def _add_user_module(self, name, blurb):
 assert self._is_user_module(name)
@@ -256,9 +257,6 @@ class QAPISchemaModularCVisitor(QAPISchemaVisitor):
 def _add_system_module(self, name, blurb):
 self._add_module(name and './' + name, blurb)
 
-def _set_module(self, name):
-self._genc, self._genh = self._module[name]
-
 def write(self, output_dir, opt_builtins=False):
 for name in self._module:
 if self._is_builtin_module(name) and not opt_builtins:
@@ -271,15 +269,17 @@ class QAPISchemaModularCVisitor(QAPISchemaVisitor):
 pass
 
 def visit_module(self, name):
-if name in self._module:
-self._set_module(name)
-elif self._is_builtin_module(name):
-# The built-in module has not been created.  No code may
-# be generated.
-self._genc = None
-self._genh = None
+if name is None:
+if self._builtin_blurb:
+self._add_system_module(None, self._builtin_blurb)
+self._begin_system_module(name)
+else:
+# The built-in module has not been created.  No code may
+# be generated.
+self._genc = None
+self._genh = None
 else:
-self._add_user_module(name, self._blurb)
+self._add_user_module(name, self._user_blurb)
 self._begin_user_module(name)
 
 def visit_include(self, name, info):
diff --git a/scripts/qapi/types.py b/scripts/qapi/types.py
index d8751daa04..99dcaf7074 100644
--- a/scripts/qapi/types.py
+++ b/scripts/qapi/types.py
@@ -243,8 +243,9 @@ class QAPISchemaGenTypeVisitor(QAPISchemaModularCVisitor):
 def __init__(self, prefix):
 QAPISchemaModularCVisitor.__init__(
 self, prefix, 'qapi-types', ' * Schema-defined QAPI types',
-__doc__)
-self._add_system_module(None, ' * Built-in QAPI types')
+' * Built-in QAPI types', __doc__)
+
+def _begin_system_module(self, name):
 self._genc.preamble_add(mcgen('''
 #include "qemu/osdep.h"
 #include "qapi/dealloc-visitor.h"
diff --git a/scripts/qapi/visit.py b/scripts/qapi/visit.py
index c72f2bc5c0..4efce62b0c 100644
--- a/scripts/qapi/visit.py
+++ b/scripts/qapi/visit.py
@@ -285,8 +285,9 @@ class QAPI
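Because visit_module() is now called once per module, the visitor can create each module's output on the spot instead of looking it up later. The Python sketch below models only the dispatch in the new visit_module() from the gen.py hunk. ModularVisitorSketch is hypothetical, plain strings stand in for QAPIGenC/QAPIGenH, and the _begin_system_module()/_begin_user_module() hooks are omitted.

```python
class ModularVisitorSketch:
    """Hypothetical, stripped-down model of QAPISchemaModularCVisitor.

    Plain strings stand in for the QAPIGenC/QAPIGenH output objects.
    """

    def __init__(self, user_blurb, builtin_blurb):
        self._user_blurb = user_blurb
        self._builtin_blurb = builtin_blurb  # None: no built-in module
        self._module = {}
        self._genc = None
        self._genh = None

    def _add_module(self, name, blurb):
        self._module[name] = ('c:' + blurb, 'h:' + blurb)
        self._genc, self._genh = self._module[name]

    def visit_module(self, name):
        # Called once per module, so the module can be created right here.
        if name is None:
            if self._builtin_blurb:
                self._add_module(None, self._builtin_blurb)
            else:
                # This generator has no built-in module: emit nothing.
                self._genc = None
                self._genh = None
        else:
            self._add_module(name, self._user_blurb)
```

Passing None as the built-in blurb, as commands.py and events.py do in this patch, means no built-in module is ever created for that generator.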

[PATCH 2/6] tests/Makefile.include: Fix missing test-qapi-emit-events.[ch]

2019-11-20 Thread Markus Armbruster

Commit 5d75648b56 "qapi: Generate QAPIEvent stuff into separate files"
added tests/test-qapi-emit-events.[ch] to the set of generated files,
but neglected to update tests/.gitignore and tests/Makefile.include.
Commit a0af8cee3c "tests/.gitignore: ignore test-qapi-emit-events.[ch]
for in-tree builds" fixed the former.  Now fix the latter.

Signed-off-by: Markus Armbruster 
---
 tests/Makefile.include | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tests/Makefile.include b/tests/Makefile.include
index 8566f5f119..75b377d1a9 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -503,6 +503,7 @@ generated-files-y += tests/test-qapi-visit-sub-sub-module.h
 generated-files-y += tests/test-qapi-commands.h
 generated-files-y += tests/include/test-qapi-commands-sub-module.h
 generated-files-y += tests/test-qapi-commands-sub-sub-module.h
+generated-files-y += tests/test-qapi-emit-events.h
 generated-files-y += tests/test-qapi-events.h
 generated-files-y += tests/include/test-qapi-events-sub-module.h
 generated-files-y += tests/test-qapi-events-sub-sub-module.h
@@ -610,6 +611,7 @@ tests/include/test-qapi-commands-sub-module.h \
 tests/include/test-qapi-commands-sub-module.c \
 tests/test-qapi-commands-sub-sub-module.h \
 tests/test-qapi-commands-sub-sub-module.c \
+tests/test-qapi-emit-events.c tests/test-qapi-emit-events.h \
 tests/test-qapi-events.c tests/test-qapi-events.h \
 tests/include/test-qapi-events-sub-module.c \
 tests/include/test-qapi-events-sub-module.h \
@@ -637,7 +639,7 @@ tests/qapi-schema/doc-good.test.texi: $(SRC_PATH)/tests/qapi-schema/doc-good.jso
 
 tests/test-string-output-visitor$(EXESUF): tests/test-string-output-visitor.o $(test-qapi-obj-y)
 tests/test-string-input-visitor$(EXESUF): tests/test-string-input-visitor.o $(test-qapi-obj-y)
-tests/test-qmp-event$(EXESUF): tests/test-qmp-event.o $(test-qapi-obj-y) tests/test-qapi-events.o
+tests/test-qmp-event$(EXESUF): tests/test-qmp-event.o $(test-qapi-obj-y) tests/test-qapi-emit-events.o tests/test-qapi-events.o
 tests/test-qobject-output-visitor$(EXESUF): tests/test-qobject-output-visitor.o $(test-qapi-obj-y)
 tests/test-clone-visitor$(EXESUF): tests/test-clone-visitor.o $(test-qapi-obj-y)
 tests/test-qobject-input-visitor$(EXESUF): tests/test-qobject-input-visitor.o $(test-qapi-obj-y)
-- 
2.21.0

Re: [PATCH] target/i386: add VMX features to named CPU models

2019-11-20 Thread Kashyap Chamarthy

On Wed, Nov 20, 2019 at 06:57:20PM +0100, Paolo Bonzini wrote:
> On 20/11/19 18:50, Daniel P. Berrangé wrote:
> > On Wed, Nov 20, 2019 at 06:37:53PM +0100, Paolo Bonzini wrote:
> >> This allows using "-cpu Haswell,+vmx", which we did not really want to
> >> support in QEMU but was produced by Libvirt when using the "host-model"
> >> CPU model.
> > Can you say what is currently broken ?  If I launch my current QEMU (I have
> > 4.1.1 on Fedora 31):
> > 
> >  qemu-system-x86_64 -cpu Haswell,+vmx
> > 
> > ... I don't get any reported errors.
> > 
> 
> KVM does not load in the guest, though?

Indeed it doesn't:

$> ./min-qemu.sh
qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]

[...]
cirros login: cirros
Password: 
[cirros] $ cat /proc/cpuinfo | grep vmx
[cirros] $ echo $?
1

Where `cat min-qemu.sh` is:

#!/usr/bin/env bash

args=(
 -display none
 -cpu Haswell,+vmx
 -no-user-config
 -machine q35,accel=kvm
 -nodefaults
 -m 2048
 -serial stdio
 -drive file=/export/vm1.qcow2,format=qcow2,if=virtio
)

~/build/qemu/x86_64-softmmu/qemu-system-x86_64 "${args[@]}"
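Inside the guest, the `grep vmx` check can also be done programmatically. The Python sketch below assumes the usual x86 /proc/cpuinfo layout, where each processor has a `flags : ...` line. The function name cpu_has_flag is hypothetical. Unlike grep, it matches whole flag names only.

```python
def cpu_has_flag(cpuinfo_text, flag):
    """Return True if any 'flags' line in the given cpuinfo text lists flag."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(':')
        # Compare whole tokens so 'vmx' does not match e.g. 'vmxfoo'.
        if sep and key.strip() == 'flags' and flag in value.split():
            return True
    return False
```

In the cirros guest above, `cpu_has_flag(open('/proc/cpuinfo').read(), 'vmx')` would return False, matching the empty grep output and exit status 1.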


[...]

-- 
/kashyap

Re: [RFC][PATCH 2/3] docs/specs: Add specification of ivshmem device revision 2

2019-11-20 Thread Jan Kiszka


On 12.11.19 09:04, Michael S. Tsirkin wrote:

On Mon, Nov 11, 2019 at 05:38:29PM +0100, Jan Kiszka wrote:

On 11.11.19 17:11, Michael S. Tsirkin wrote:

On Mon, Nov 11, 2019 at 03:27:43PM +, Daniel P. Berrangé wrote:

On Mon, Nov 11, 2019 at 10:08:20AM -0500, Michael S. Tsirkin wrote:

On Mon, Nov 11, 2019 at 02:59:07PM +0100, Jan Kiszka wrote:

On 11.11.19 14:45, Michael S. Tsirkin wrote:

On Mon, Nov 11, 2019 at 01:57:11PM +0100, Jan Kiszka wrote:

+| Offset | Register  | Content |
+|-------:|:----------|:--------|
+|    00h | Vendor ID | 1AF4h   |
+|    02h | Device ID | 1110h   |


Given it's a virtio vendor ID, please reserve a device ID
with the virtio TC.


Yeah, QEMU's IVSHMEM was always using that. I'm happy to make this finally
official.



And I guess we will just mark it reserved or something right?
Since at least IVSHMEM 1 isn't a virtio device.
And will you be reusing same ID for IVSHMEM 2 or a new one?


1110h isn't under either of the virtio PCI device ID allowed ranges
according to the spec:

"Any PCI device with PCI Vendor ID 0x1AF4, and PCI Device
 ID 0x1000 through 0x107F inclusive is a virtio device.
 ...
 Additionally, devices MAY utilize a Transitional PCI Device
 ID range, 0x1000 to 0x103F depending on the device type. "

So there's no need to reserve 0x1110 from the virtio spec POV.
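The two ranges in the quoted spec text can be encoded directly. The Python sketch below covers only those ranges. It does not model the experimental device numbers discussed below, and the function names are hypothetical.

```python
VIRTIO_PCI_VENDOR_ID = 0x1AF4


def is_virtio_pci_id(vendor_id, device_id):
    """True if (vendor, device) is in the virtio range quoted from the spec."""
    return vendor_id == VIRTIO_PCI_VENDOR_ID and 0x1000 <= device_id <= 0x107F


def in_transitional_range(device_id):
    """True if device_id lies in the transitional range 0x1000..0x103F."""
    return 0x1000 <= device_id <= 0x103F
```

By this check, ivshmem's 1AF4h:1110h lies outside the virtio range, and the later 110Ah:4106h pair does not involve the virtio vendor ID at all.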


Well we do have:

B.3
What Device Number?
Device numbers can be reserved by the OASIS committee: email
virtio-...@lists.oasis-open.org to secure a unique one.
Meanwhile for experimental drivers, use 65535 and work backwards.

So it seems it can in theory conflict, at least with experimental virtio
devices.

Really it's messy that people are reusing the virtio vendor ID for
random stuff - getting a vendor ID is only hard for a hobbyist, any big
company already has an ID - but if it is a hobbyist and they at least
register, then it doesn't cause much harm.


Note that ivshmem came from a research environment. I don't know whether there
was a check for the IDs at the point the code was merged.

That said, I may get a device ID here as well, provided I can explain that
not a single "product" will own it, but rather an open specification.

Jan


OK, up to you - if you decide you want an ID reserved, pls let us know.



Turned out to be much simpler than expected:

I reserved device ID 4106h under the vendor ID 110Ah (Siemens AG) for
the purpose of specifying a shared memory device via the VIRTIO TC. I will
update this "detail" in the next revision of the patches and also reset
the device revision ID to 0, as we no longer need to tell the new device
apart from the current implementation this way.


Jan

--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux

Re: [PATCH] target/i386: add VMX features to named CPU models

2019-11-20 Thread Daniel P . Berrangé

On Wed, Nov 20, 2019 at 06:57:20PM +0100, Paolo Bonzini wrote:
> On 20/11/19 18:50, Daniel P. Berrangé wrote:
> > On Wed, Nov 20, 2019 at 06:37:53PM +0100, Paolo Bonzini wrote:
> >> This allows using "-cpu Haswell,+vmx", which we did not really want to
> >> support in QEMU but was produced by Libvirt when using the "host-model"
> >> CPU model.
> > Can you say what is currently broken ?  If I launch my current QEMU (I have
> > 4.1.1 on Fedora 31):
> > 
> >  qemu-system-x86_64 -cpu Haswell,+vmx
> > 
> > ... I don't get any reported errors.
> > 
> 
> KVM does not load in the guest, though?

Ah ok, thanks. Can you just put something in the commit message to that
effect.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH 2/6] block: truncate: Don't make backing file data visible

2019-11-20 Thread Vladimir Sementsov-Ogievskiy

20.11.2019 17:03, Kevin Wolf wrote:
> When extending the size of an image that has a backing file larger than
> its old size, make sure that the backing file data doesn't become
> visible in the guest, but the added area is properly zeroed out.
> 
> The old behaviour made a difference in 'block_resize' (where showing the
> backing file data from an old snapshot rather than zeros is
> questionable) as well as in commit block jobs (both from active and
> intermediate nodes) and HMP 'commit', where committing to a short
> backing file would incorrectly omit writing zeroes for unallocated
> blocks on the top layer after the EOF of the short backing file.
> 
> Signed-off-by: Kevin Wolf 
> ---
>   block/io.c | 25 +
>   1 file changed, 25 insertions(+)
> 
> diff --git a/block/io.c b/block/io.c
> index 003f4ea38c..8683f7a4bd 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -3382,6 +3382,31 @@ int coroutine_fn bdrv_co_truncate(BdrvChild *child, int64_t offset, bool exact,
>   goto out;
>   }
>   
> +/*
> + * If the image has a backing file that is large enough that it would
> + * provide data for the new area, we cannot leave it unallocated because
> + * then the backing file content would become visible. Instead, zero-fill
> + * the area where backing file and new area overlap.
> + */
> +if (new_bytes && bs->backing && prealloc == PREALLOC_MODE_OFF) {
> +int64_t backing_len;
> +
> +backing_len = bdrv_getlength(backing_bs(bs));
> +if (backing_len < 0) {
> +ret = backing_len;
> +goto out;
> +}
> +
> +if (backing_len > old_size) {
> +ret = bdrv_co_do_pwrite_zeroes(bs, old_size,
> +   MIN(new_bytes, backing_len - old_size),
> +   BDRV_REQ_ZERO_WRITE | BDRV_REQ_MAY_UNMAP);
> +if (ret < 0) {
> +goto out;
> +}
> +}
> +}
> +
>   ret = refresh_total_sectors(bs, offset >> BDRV_SECTOR_BITS);
>   if (ret < 0) {
>   error_setg_errno(errp, -ret, "Could not refresh total sector count");
> 
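The area that the new block in bdrv_co_truncate() zero-fills can be computed on its own. The Python sketch below assumes what the patch's condition requires: a backing file is present and preallocation is off. It also treats new_bytes as the non-negative growth of the image. The function name is hypothetical.

```python
def truncate_zero_range(old_size, new_size, backing_len):
    """Return (offset, length) of the area to zero-fill, or None.

    Models only the case the patch handles: a backing file is present and
    preallocation is off.
    """
    new_bytes = max(new_size - old_size, 0)  # growth; 0 when shrinking
    if new_bytes and backing_len > old_size:
        # Zero only where the new area overlaps data the backing file has.
        return old_size, min(new_bytes, backing_len - old_size)
    return None
```

For example, growing a 1 MiB image to 4 MiB over a 2 MiB backing file zeroes only the 1 MiB between the old EOF and the backing file's EOF. Beyond that point the backing file provides no data, so the new area already reads as zeros.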


Hmm. I think that for commit we should also zero the truncated area if
!bdrv_has_zero_init_truncate(bs).
But we should not do it here, as it should not be done if we are just
resizing the disk.

Which formats are that bad?

-- 
Best regards,
Vladimir

Re: [PATCH] target/i386: add VMX features to named CPU models

2019-11-20 Thread Daniel P . Berrangé

On Wed, Nov 20, 2019 at 06:37:53PM +0100, Paolo Bonzini wrote:
> This allows using "-cpu Haswell,+vmx", which we did not really want to
> support in QEMU but was produced by Libvirt when using the "host-model"
> CPU model.

Can you say what is currently broken ?  If I launch my current QEMU (I have
4.1.1 on Fedora 31):

 qemu-system-x86_64 -cpu Haswell,+vmx

... I don't get any reported errors.

> 
> This was produced from the output of scripts/kvm/vmxcap using the following
> very ugly Python script:
> 
> bits = {
> 'INS/OUTS instruction information': ['FEAT_VMX_BASIC', 'MSR_VMX_BASIC_INS_OUTS'],
> 'IA32_VMX_TRUE_*_CTLS support': ['FEAT_VMX_BASIC', 'MSR_VMX_BASIC_TRUE_CTLS'],
> 'External interrupt exiting': ['FEAT_VMX_PINBASED_CTLS', 'VMX_PIN_BASED_EXT_INTR_MASK'],
> 'NMI exiting': ['FEAT_VMX_PINBASED_CTLS', 'VMX_PIN_BASED_NMI_EXITING'],
> 'Virtual NMIs': ['FEAT_VMX_PINBASED_CTLS', 'VMX_PIN_BASED_VIRTUAL_NMIS'],
> 'Activate VMX-preemption timer': ['FEAT_VMX_PINBASED_CTLS', 'VMX_PIN_BASED_VMX_PREEMPTION_TIMER'],
> 'Process posted interrupts': ['FEAT_VMX_PINBASED_CTLS', 'VMX_PIN_BASED_POSTED_INTR'],
> 'Interrupt window exiting': ['FEAT_VMX_PROCBASED_CTLS', 'VMX_CPU_BASED_VIRTUAL_INTR_PENDING'],
> 'Use TSC offsetting': ['FEAT_VMX_PROCBASED_CTLS', 'VMX_CPU_BASED_USE_TSC_OFFSETING'],
> 'HLT exiting': ['FEAT_VMX_PROCBASED_CTLS', 'VMX_CPU_BASED_HLT_EXITING'],
> 'INVLPG exiting': ['FEAT_VMX_PROCBASED_CTLS', 'VMX_CPU_BASED_INVLPG_EXITING'],
> 'MWAIT exiting': ['FEAT_VMX_PROCBASED_CTLS', 'VMX_CPU_BASED_MWAIT_EXITING'],
> 'RDPMC exiting': ['FEAT_VMX_PROCBASED_CTLS', 'VMX_CPU_BASED_RDPMC_EXITING'],
> 'RDTSC exiting': ['FEAT_VMX_PROCBASED_CTLS', 'VMX_CPU_BASED_RDTSC_EXITING'],
> 'CR3-load exiting': ['FEAT_VMX_PROCBASED_CTLS', 'VMX_CPU_BASED_CR3_LOAD_EXITING'],
> 'CR3-store exiting': ['FEAT_VMX_PROCBASED_CTLS', 'VMX_CPU_BASED_CR3_STORE_EXITING'],
> 'CR8-load exiting': ['FEAT_VMX_PROCBASED_CTLS', 'VMX_CPU_BASED_CR8_LOAD_EXITING'],
> 'CR8-store exiting': ['FEAT_VMX_PROCBASED_CTLS', 'VMX_CPU_BASED_CR8_STORE_EXITING'],
> 'Use TPR shadow': ['FEAT_VMX_PROCBASED_CTLS', 'VMX_CPU_BASED_TPR_SHADOW'],
> 'NMI-window exiting': ['FEAT_VMX_PROCBASED_CTLS', 'VMX_CPU_BASED_VIRTUAL_NMI_PENDING'],
> 'MOV-DR exiting': ['FEAT_VMX_PROCBASED_CTLS', 'VMX_CPU_BASED_MOV_DR_EXITING'],
> 'Unconditional I/O exiting': ['FEAT_VMX_PROCBASED_CTLS', 'VMX_CPU_BASED_UNCOND_IO_EXITING'],
> 'Use I/O bitmaps': ['FEAT_VMX_PROCBASED_CTLS', 'VMX_CPU_BASED_USE_IO_BITMAPS'],
> 'Monitor trap flag': ['FEAT_VMX_PROCBASED_CTLS', 'VMX_CPU_BASED_MONITOR_TRAP_FLAG'],
> 'Use MSR bitmaps': ['FEAT_VMX_PROCBASED_CTLS', 'VMX_CPU_BASED_USE_MSR_BITMAPS'],
> 'MONITOR exiting': ['FEAT_VMX_PROCBASED_CTLS', 'VMX_CPU_BASED_MONITOR_EXITING'],
> 'PAUSE exiting': ['FEAT_VMX_PROCBASED_CTLS', 'VMX_CPU_BASED_PAUSE_EXITING'],
> 'Activate secondary control': ['FEAT_VMX_PROCBASED_CTLS', 'VMX_CPU_BASED_ACTIVATE_SECONDARY_CONTROLS'],
> 'Virtualize APIC accesses': ['FEAT_VMX_SECONDARY_CTLS', 'VMX_SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES'],
> 'Enable EPT': ['FEAT_VMX_SECONDARY_CTLS', 'VMX_SECONDARY_EXEC_ENABLE_EPT'],
> 'Descriptor-table exiting': ['FEAT_VMX_SECONDARY_CTLS', 'VMX_SECONDARY_EXEC_DESC'],
> 'Enable RDTSCP': ['FEAT_VMX_SECONDARY_CTLS', 'VMX_SECONDARY_EXEC_RDTSCP'],
> 'Virtualize x2APIC mode': ['FEAT_VMX_SECONDARY_CTLS', 'VMX_SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE'],
> 'Enable VPID': ['FEAT_VMX_SECONDARY_CTLS', 'VMX_SECONDARY_EXEC_ENABLE_VPID'],
> 'WBINVD exiting': ['FEAT_VMX_SECONDARY_CTLS', 'VMX_SECONDARY_EXEC_WBINVD_EXITING'],
> 'Unrestricted guest': ['FEAT_VMX_SECONDARY_CTLS', 'VMX_SECONDARY_EXEC_UNRESTRICTED_GUEST'],
> 'APIC register emulation': ['FEAT_VMX_SECONDARY_CTLS', 'VMX_SECONDARY_EXEC_APIC_REGISTER_VIRT'],
> 'Virtual interrupt delivery': ['FEAT_VMX_SECONDARY_CTLS', 'VMX_SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY'],
> 'PAUSE-loop exiting': ['FEAT_VMX_SECONDARY_CTLS', 'VMX_SECONDARY_EXEC_PAUSE_LOOP_EXITING'],
> 'RDRAND exiting': ['FEAT_VMX_SECONDARY_CTLS', 'VMX_SECONDARY_EXEC_RDRAND_EXITING'],
> 'Enable INVPCID': ['FEAT_VMX_SECONDARY_CTLS', 'VMX_SECONDARY_EXEC_ENABLE_INVPCID'],
> 'Enable VM functions': ['FEAT_VMX_SECONDARY_CTLS', 'VMX_SECONDARY_EXEC_ENABLE_VMFUNC'],
> 'VMCS shadowing': ['FEAT_VMX_SECONDARY_CTLS', 'VMX_SECONDARY_EXEC_SHADOW_VMCS'],
> 'RDSEED exiting': ['FEAT_VMX_SECONDARY_CTLS', 'VMX_SECONDARY_EXEC_RDSEED_EXITING
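The quoted script is cut off in this archive. To illustrate roughly how a description-to-bit mapping like `bits` could be applied to vmxcap output, here is a Python sketch. It assumes each relevant vmxcap line ends with its value (`yes` or `no`). The function name and the output shape (one C-style OR expression per feature word) are hypothetical and do not describe what Paolo's script actually does.

```python
def vmx_feature_words(bits, vmxcap_lines):
    """Group capabilities reported as 'yes' by QEMU feature word.

    bits maps a vmxcap description to [feature_word, bit_name], like the
    dict quoted above.  Each vmxcap line is assumed to end with its value,
    e.g. '  NMI exiting                 yes'.
    """
    words = {}
    for line in vmxcap_lines:
        desc, _, value = line.strip().rpartition(' ')
        desc = desc.strip()
        if value == 'yes' and desc in bits:
            word, bit = bits[desc]
            words.setdefault(word, []).append(bit)
    # Render each feature word as a C-style OR of its bit names.
    return {word: ' | '.join(names) for word, names in words.items()}
```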

Re: [PATCH] target/i386: add VMX features to named CPU models

2019-11-20 Thread Paolo Bonzini

On 20/11/19 18:50, Daniel P. Berrangé wrote:
> On Wed, Nov 20, 2019 at 06:37:53PM +0100, Paolo Bonzini wrote:
>> This allows using "-cpu Haswell,+vmx", which we did not really want to
>> support in QEMU but was produced by Libvirt when using the "host-model"
>> CPU model.
> Can you say what is currently broken ?  If I launch my current QEMU (I have
> 4.1.1 on Fedora 31):
> 
>  qemu-system-x86_64 -cpu Haswell,+vmx
> 
> ... I don't get any reported errors.
> 

KVM does not load in the guest, though?

Paolo

Re: [PATCH for-5.0 v5 14/23] ppc/spapr: Implement the XiveFabric interface

2019-11-20 Thread Greg Kurz

On Fri, 15 Nov 2019 17:24:27 +0100
Cédric Le Goater  wrote:

> The CAM line matching sequence in the pseries machine does not change
> much apart from the use of the new QOM interfaces. There is an extra
> indirection because of the sPAPR IRQ backend of the machine. Only the
> XIVE backend implements the new 'match_nvt' handler.
> 

The changelog needs an update since you dropped the indirection you had
in v4.

> Signed-off-by: Cédric Le Goater 
> ---
>  hw/ppc/spapr.c | 36 
>  1 file changed, 36 insertions(+)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 94f9d27096af..a8f5850f65bb 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -4270,6 +4270,39 @@ static void spapr_pic_print_info(InterruptStatsProvider *obj,
> kvm_irqchip_in_kernel() ? "in-kernel" : "emulated");
>  }
>  
> +static int spapr_xive_match_nvt(XiveFabric *xfb, uint8_t format,
> +uint8_t nvt_blk, uint32_t nvt_idx,
> +bool cam_ignore, uint8_t priority,
> +uint32_t logic_serv, XiveTCTXMatch *match)
> +{
> +SpaprMachineState *spapr = SPAPR_MACHINE(xfb);
> +XivePresenter *xptr = XIVE_PRESENTER(spapr->xive);
> +XivePresenterClass *xpc = XIVE_PRESENTER_GET_CLASS(xptr);
> +int count;
> +

As suggested by David, you should probably assert() that XIVE is in use
for extra paran^Wsafety.

With these fixed,

Reviewed-by: Greg Kurz 

> +count = xpc->match_nvt(xptr, format, nvt_blk, nvt_idx, cam_ignore,
> +   priority, logic_serv, match);
> +if (count < 0) {
> +return count;
> +}
> +
> +/*
> + * When we implement the save and restore of the thread interrupt
> + * contexts in the enter/exit CPU handlers of the machine and the
> + * escalations in QEMU, we should be able to handle non dispatched
> + * vCPUs.
> + *
> + * Until this is done, the sPAPR machine should find at least one
> + * matching context always.
> + */
> +if (count == 0) {
> +qemu_log_mask(LOG_GUEST_ERROR, "XIVE: NVT %x/%x is not dispatched\n",
> +  nvt_blk, nvt_idx);
> +}
> +
> +return count;
> +}
> +
>  int spapr_get_vcpu_id(PowerPCCPU *cpu)
>  {
>  return cpu->vcpu_id;
> @@ -4366,6 +4399,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>  PPCVirtualHypervisorClass *vhc = PPC_VIRTUAL_HYPERVISOR_CLASS(oc);
>  XICSFabricClass *xic = XICS_FABRIC_CLASS(oc);
>  InterruptStatsProviderClass *ispc = INTERRUPT_STATS_PROVIDER_CLASS(oc);
> +XiveFabricClass *xfc = XIVE_FABRIC_CLASS(oc);
>  
>  mc->desc = "pSeries Logical Partition (PAPR compliant)";
>  mc->ignore_boot_device_suffixes = true;
> @@ -4442,6 +4476,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>  smc->linux_pci_probe = true;
>  smc->smp_threads_vsmt = true;
>  smc->nr_xirqs = SPAPR_NR_XIRQS;
> +xfc->match_nvt = spapr_xive_match_nvt;
>  }
>  
>  static const TypeInfo spapr_machine_info = {
> @@ -4460,6 +4495,7 @@ static const TypeInfo spapr_machine_info = {
>  { TYPE_PPC_VIRTUAL_HYPERVISOR },
>  { TYPE_XICS_FABRIC },
>  { TYPE_INTERRUPT_STATS_PROVIDER },
> +{ TYPE_XIVE_FABRIC },
>  { }
>  },
>  };

Re: [PATCH for 4.2?] tests/vm/centos: fix centos build target

2019-11-20 Thread Wainer dos Santos Moschetta




On 11/20/19 2:14 PM, Alex Bennée wrote:

To be able to run the docker tests on the CentOS VM we have here, we have
to install python3 as well as the basic tools.

Signed-off-by: Alex Bennée 
---
  tests/vm/centos | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)



Reviewed-by: Wainer dos Santos Moschetta 




diff --git a/tests/vm/centos b/tests/vm/centos
index 53976f1c4c9..b9e851f2d33 100755
--- a/tests/vm/centos
+++ b/tests/vm/centos
@@ -73,7 +73,7 @@ class CentosVM(basevm.BaseVM):
  self.wait_ssh()
  self.ssh_root_check("touch /etc/cloud/cloud-init.disabled")
  self.ssh_root_check("yum update -y")
-self.ssh_root_check("yum install -y docker make git")
+self.ssh_root_check("yum install -y docker make git python3")
  self.ssh_root_check("systemctl enable docker")
  self.ssh_root("poweroff")
  self.wait()

Re: qcow2 preallocation and backing files

2019-11-20 Thread Vladimir Sementsov-Ogievskiy

20.11.2019 19:46, Kevin Wolf wrote:
> Am 20.11.2019 um 16:58 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> 20.11.2019 18:18, Alberto Garcia wrote:
>>> On Wed 20 Nov 2019 01:27:53 PM CET, Vladimir Sementsov-Ogievskiy wrote:
>>>
 3. Also, the latter way is inconsistent with discard. Discarded
 regions return zeroes, not clusters from backing. I think discard and
 truncate should behave in the same safe zero way.
>>>
>>> But then PREALLOC_MODE_OFF implies that the L2 metadata should be
>>> preallocated (all clusters should be QCOW2_CLUSTER_ZERO_PLAIN), at least
>>> when there is a backing file.
>>>
>>> Or maybe we just forbid PREALLOC_MODE_OFF during resize if there is a
>>> backing file ?
>>>
>>
>> Kevin proposed a fix that alters PREALLOC_MODE_OFF behavior if there is
>> a backing file, to allocate L2 metadata with ZERO clusters.
>>
>> I don't think that it's the best thing to do, but it's already done,
>> it works and seems appropriate for rc3.
>>
>> I see now that changing PREALLOC_MODE_OFF behavior may break things,
>> first of all qemu-img create, which has been creating UNALLOCATED qcow2
>> images by default for years.
> 
> And it still does, because the backing file is added only after giving
> the qcow2 image the right size.
> 
> But you're right, this is more accidental than by design. I wonder if
> there are other problematic cases (and whether merging something like
> this in -rc3 isn't rather risky).
> 
>> Still, I think that it would be safer to always ZERO the expanded part
>> of qcow2, regardless of the backing file.
>>
>> We may add PREALLOC_MODE_ZERO, and use it in mirror, commit, and some
>> other calls to bdrv_truncate, except for qcow2 image creation of
>> course.
> 
> What do we do with image formats that don't support zero clusters and
> therefore can't provide PREALLOC_MODE_ZERO? Will commit just fail for
> them?


Hmm, consider committing to raw:

                       x                 y
qcow2 [----------------------------------]  - full of unallocated clusters
raw   [2987957285235298]                    - full of data, but file is short


Before commit, data from [x,y] reads as zero. Therefore, we should zero
the expanded part of the base.

And this holds for a base of any format: [x,y] must be zero after commit.
So, if the format can't do fast zeroing, it should fall back to writing
real zeros.
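The fallback described here (use the format's fast zeroing when available, otherwise write real zeros) can be modelled on an in-memory image. This Python sketch is a toy model. The bytearray image, the fast_zero callback and the chunk size are hypothetical stand-ins, not QEMU APIs.

```python
def zero_range(image, offset, length, fast_zero=None, chunk=4096):
    """Make image[offset:offset + length] read as zeros.

    image is a bytearray standing in for the base image.  fast_zero is a
    hypothetical callback for a format-level fast path; it returns True if
    it handled the request and False if the format cannot do it.
    """
    if fast_zero is not None and fast_zero(offset, length):
        return 'fast'
    end = offset + length
    if len(image) < end:
        # A short base (like the raw file above) is extended with zeros.
        image.extend(bytes(end - len(image)))
    # Fallback: write real zero bytes, one chunk at a time.
    pos = offset
    while pos < end:
        n = min(chunk, end - pos)
        image[pos:pos + n] = bytes(n)
        pos += n
    return 'slow'
```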

===

Hmm, actually after your patch all formats partly support PREALLOC_MODE_ZERO,
which in the worst case is done by writing real zeros.

> 
>> Then, to improve the handling of this mode in qcow2 so that not all L2
>> tables have to be allocated, we may add a "zero" bit to the L1 table entry.
> 
> This would be an incompatible image format change that needs to be
> explicitly enabled by the user. This might limit its usefulness a bit.
> 

Yes, I understand this. Still it may make sense.


-- 
Best regards,
Vladimir

Re: [PATCH for-5.0 v5 13/23] ppc/pnv: Implement the XiveFabric interface

2019-11-20 Thread Greg Kurz

On Fri, 15 Nov 2019 17:24:26 +0100
Cédric Le Goater  wrote:

> The CAM line matching on the PowerNV machine now scans all chips of
> the system and all CPUs of a chip to find a dispatched NVT in the
> thread contexts.
> 
> Signed-off-by: Cédric Le Goater 
> ---

Reviewed-by: Greg Kurz 

>  hw/ppc/pnv.c | 35 +++
>  1 file changed, 35 insertions(+)
> 
> diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
> index 94c9f536413f..207a5cf2c650 100644
> --- a/hw/ppc/pnv.c
> +++ b/hw/ppc/pnv.c
> @@ -1446,6 +1446,35 @@ static void pnv_pic_print_info(InterruptStatsProvider *obj,
>  }
>  }
>  
> +static int pnv_xive_match_nvt(XiveFabric *xfb, uint8_t format,
> +   uint8_t nvt_blk, uint32_t nvt_idx,
> +   bool cam_ignore, uint8_t priority,
> +   uint32_t logic_serv,
> +   XiveTCTXMatch *match)
> +{
> +PnvMachineState *pnv = PNV_MACHINE(xfb);
> +int total_count = 0;
> +int i;
> +
> +for (i = 0; i < pnv->num_chips; i++) {
> +Pnv9Chip *chip9 = PNV9_CHIP(pnv->chips[i]);
> +XivePresenter *xptr = XIVE_PRESENTER(&chip9->xive);
> +XivePresenterClass *xpc = XIVE_PRESENTER_GET_CLASS(xptr);
> +int count;
> +
> +count = xpc->match_nvt(xptr, format, nvt_blk, nvt_idx, cam_ignore,
> +   priority, logic_serv, match);
> +
> +if (count < 0) {
> +return count;
> +}
> +
> +total_count += count;
> +}
> +
> +return total_count;
> +}
> +
>  static void pnv_get_num_chips(Object *obj, Visitor *v, const char *name,
>void *opaque, Error **errp)
>  {
> @@ -1509,9 +1538,11 @@ static void pnv_machine_power8_class_init(ObjectClass *oc, void *data)
>  static void pnv_machine_power9_class_init(ObjectClass *oc, void *data)
>  {
>  MachineClass *mc = MACHINE_CLASS(oc);
> +XiveFabricClass *xfc = XIVE_FABRIC_CLASS(oc);
>  
>  mc->desc = "IBM PowerNV (Non-Virtualized) POWER9";
>  mc->default_cpu_type = POWERPC_CPU_TYPE_NAME("power9_v2.0");
> +xfc->match_nvt = pnv_xive_match_nvt;
>  
>  mc->alias = "powernv";
>  }
> @@ -1558,6 +1589,10 @@ static const TypeInfo types[] = {
>  .name  = MACHINE_TYPE_NAME("powernv9"),
>  .parent= TYPE_PNV_MACHINE,
>  .class_init= pnv_machine_power9_class_init,
> +.interfaces = (InterfaceInfo[]) {
> +{ TYPE_XIVE_FABRIC },
> +{ },
> +},
>  },
>  {
>  .name  = MACHINE_TYPE_NAME("powernv8"),
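The loop in pnv_xive_match_nvt() sums the per-chip match counts and stops at the first negative (error) return. The Python sketch below reproduces that control flow. Plain callables stand in for each chip's XivePresenterClass match_nvt handler, and the function name is hypothetical.

```python
def match_nvt_all_chips(presenters, *args):
    """Sum per-chip match counts; a negative count aborts the scan.

    presenters is a list of callables standing in (hypothetically) for
    each chip's match_nvt handler; args are passed through unchanged.
    """
    total_count = 0
    for match_nvt in presenters:
        count = match_nvt(*args)
        if count < 0:
            return count  # propagate the error code as-is
        total_count += count
    return total_count
```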

Re: [PATCH] pseries: disable migration-test if /dev/kvm cannot be used

2019-11-20 Thread Thomas Huth

On 20/11/2019 18.09, Laurent Vivier wrote:
> On ppc64, migration-test only works with kvm_hv, and we already
> have a check to verify the module is loaded.
> 
> The kvm_hv module can be loaded in memory and /sys/module/kvm_hv can exist,
> but on some systems (like build systems) /dev/kvm can be missing
> (by the administrator's choice).
> 
> And as kvm_hv exists, migration-test is started, but QEMU falls back to
> TCG because KVM cannot be used:
> 
> Could not access KVM kernel module: No such file or directory
> failed to initialize KVM: No such file or directory
> Back to tcg accelerator
> 
> And as the test is done with TCG, it fails.
> 
> As for s390x, we must check for the existence and the access rights
> of /dev/kvm.
> 
> Reported-by: Cole Robinson 
> Signed-off-by: Laurent Vivier 
> ---
>  tests/migration-test.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/tests/migration-test.c b/tests/migration-test.c
> index ac780dffdaad..2b25ba6d77f6 100644
> --- a/tests/migration-test.c
> +++ b/tests/migration-test.c
> @@ -1349,7 +1349,8 @@ int main(int argc, char **argv)
>   * some reason)
>   */
>  if (g_str_equal(qtest_get_arch(), "ppc64") &&
> -access("/sys/module/kvm_hv", F_OK)) {
> +(access("/sys/module/kvm_hv", F_OK) ||
> + access("/dev/kvm", R_OK | W_OK))) {
>  g_test_message("Skipping test: kvm_hv not available");
>  return g_test_run();
>  }
> 
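C's access() returns 0 on success, so the patched condition skips the test when either call fails. The Python sketch below writes out the inverse condition using os.access(), which returns True on success. The default paths are the ones from the patch, and the function name is hypothetical.

```python
import os


def ppc64_kvm_hv_usable(sys_module='/sys/module/kvm_hv', dev_kvm='/dev/kvm'):
    """True when the patched migration-test would NOT skip on ppc64.

    Requires the kvm_hv module directory to exist and /dev/kvm to be both
    readable and writable by the current user.
    """
    return (os.access(sys_module, os.F_OK) and
            os.access(dev_kvm, os.R_OK | os.W_OK))
```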

Reviewed-by: Thomas Huth
