Re: [RFC] vfio/migration: reduce the msix virq setup cost in resume phase

2021-08-17 Thread Longpeng (Mike, Cloud Infrastructure Service Product Dept.)



On 2021/8/18 11:50, Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
wrote:
> 
> 
>> On 2021/8/18 4:26, Alex Williamson wrote:
>> On Fri, 13 Aug 2021 12:06:14 +0800
>> "Longpeng(Mike)"  wrote:
>>
>>> In the migration resume phase, all unmasked MSI-X vectors need to be
>>> set up when the VF state is loaded. However, the setup takes longer
>>> the more unmasked vectors the VF has.
>>>
>>> In our case, the VF has 65 vectors and each one spends 0.8ms on the
>>> setup operation (vfio_add_kvm_msi_virq -> kvm_irqchip_commit_routes),
>>> so the total cost for the VF is more than 40ms. Even worse, the VM
>>> has 8 VFs, so the downtime increases by more than 320ms.
>>>
>>> vfio_pci_load_config
>>>   vfio_msix_enable
>>> msix_set_vector_notifiers
>>>   for (vector = 0; vector < dev->msix_entries_nr; vector++) {
>>> vfio_msix_vector_do_use
>>>   vfio_add_kvm_msi_virq
>>> kvm_irqchip_commit_routes <-- 0.8ms
>>>   }
>>>
>>> Originally, we tried to batch all routes and commit just once
>>> outside the loop, but it's not easy to fall back to the QEMU
>>> interrupt if one of them fails.
>>
>> I'm not sure I follow here, once we setup the virq, what's the failure
>> path?  kvm_irqchip_commit_routes() returns void.  Were you looking at
>> adding a "defer" arg to kvm_irqchip_add_msi_route() that skips the
>> commit, which vfio_add_kvm_msi_virq() might set based on the migration
>> state and vfio_pci_load_config() could then trigger the commit?
> 
> Yes, my implementation is almost exactly what you describe here and it
> works, but there's a semantic problem that makes me hesitant.
> 
> The calltrace in vfio_add_kvm_msi_virq is:
> vfio_add_kvm_msi_virq
>   kvm_irqchip_add_msi_route
> kvm_irqchip_commit_routes
>   kvm_vm_ioctl(KVM_SET_GSI_ROUTING)
>   kvm_irqchip_add_irqfd_notifier_gsi
> kvm_irqchip_assign_irqfd
>   kvm_vm_ioctl(KVM_IRQFD)
> 
> I looked at some other places that need to assign irqfds; the assignment
> is always done after the virq is committed.
> The KVM API doc does not declare a dependency between them, but the
> existing code seems to imply the order. The "defer" approach makes them
> out of order; it works at the moment, but I'm not sure whether KVM might
> change in the future.
> 
> I prefer "defer" too if we can make sure it's OK for the assignment and
> commit to be out of order. I hope we can reach agreement on this point
> first, and then I'll continue replying to the comments below if still
> necessary.
> 
> So do you think we should keep the order of assignment and commit?
> 
>> There's more overhead that can be removed if VFIO_DEVICE_SET_IRQS could
>> be called once rather than per vector.
> 
> Yes, I had already optimized this overhead in our production code. I saw
> that upstream also did so this year (commit ecebe53fe), so I'll backport
> the upstream solution in order to keep pace with the community.
> 

Oh, commit ecebe53fe cannot eliminate the VFIO_DEVICE_SET_IRQS operations; the
ioctl still needs to be issued (in vfio_set_irq_signaling) for each unmasked
vector during the resume phase.

In my implementation, the per-vector VFIO_DEVICE_SET_IRQS is skipped entirely
and vfio_enable_vectors is called only once outside the loop. I'll send this
optimization together in the next version.
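
As a rough sketch, the batched enable could look like this (the flag name and
exact placement are assumptions for illustration, not the actual patch;
vfio_enable_vectors() is the existing helper in hw/vfio/pci.c):

    /*
     * Sketch: once every unmasked vector has been registered with its KVM
     * route deferred, commit the routing table and enable all MSI-X
     * vectors with a single VFIO_DEVICE_SET_IRQS ioctl, instead of one
     * vfio_set_irq_signaling() call per vector.
     */
    kvm_irqchip_commit_routes(kvm_state);
    if (vfio_enable_vectors(vdev, true)) {
        error_report("vfio: failed to enable MSI-X vectors");
    }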

> [ stop here ]
> 
>>
>>> So this patch tries to defer the KVM interrupt setup: an unmasked
>>> vector will use the QEMU interrupt by default and switch to the KVM
>>> interrupt once it fires.
>>>
>>> Signed-off-by: Longpeng(Mike) 
>>> ---
>>>  hw/vfio/pci.c | 39 ++-
>>>  hw/vfio/pci.h |  2 ++
>>>  2 files changed, 40 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>>> index e1ea1d8..dd35170 100644
>>> --- a/hw/vfio/pci.c
>>> +++ b/hw/vfio/pci.c
>>> @@ -47,6 +47,8 @@
>>>  
>>>  static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
>>>  static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
>>> +static void vfio_add_kvm_msix_virq(VFIOPCIDevice *vdev,
>>> +   VFIOMSIVector *vector, int nr);
>>>  
>>>  /*
>>>   * Disabling BAR mmaping can be slow, but toggling it around INTx can
>>> @@ -347,6 +349,11 @@ static void vfio_msi_interrupt(void *opaque)
>>>  get_msg = msix_get_message;
>>>  notify = msix_notify;
>>>  
>>> +if (unlikely(vector->need_switch)) {
>>> +vfio_add_kvm_msix_virq(vdev, vector, nr);
>>> +vector->need_switch = false;
>>> +}
>>> +
>>
>> A better name might be "vector->kvm_routing_deferred".  Essentially this
>> is just a lazy setup of KVM routes; we could always do this, or we could
>> do this based on a device option.  I wonder if there are any affinity
>> benefits in the VM to deferring the KVM route.
>>
>>>  /* A masked vector firing needs to use the PBA, enable it */
>>>  if (msix_is_masked(&vdev->pdev, nr)) {
>>>  set_bit(nr, vdev->msix->pending);
>>> @@ -438,6 +445,25 @@ static void 

[Bug 1858814] Re: 'make -C roms efi' does not update edk2 submodules

2021-08-17 Thread Launchpad Bug Tracker
[Expired for QEMU because there has been no activity for 60 days.]

** Changed in: qemu
   Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1858814

Title:
  'make -C roms efi' does not update edk2 submodules

Status in QEMU:
  Expired

Bug description:
  On a fresh clone, 'make -C roms efi' fails because submodule is not
  initialized [1]:

  
/builds/philmd/qemu/roms/edk2/CryptoPkg/Library/OpensslLib/OpensslLibCrypto.inf(-1):
 error 000E: File/directory not found in workspace
  /builds/philmd/qemu/roms/edk2/CryptoPkg/Library/OpensslLib/openssl/e_os.h
  - Failed -

  Laszlo suggested [2] it is possibly a regression from commit f3e330e3c319:
  "roms/Makefile.edk2: don't pull in submodules when building from tarball"

  [1] https://gitlab.com/philmd/qemu/-/jobs/395644357#L436
  [2] https://www.mail-archive.com/qemu-devel@nongnu.org/msg668929.html

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1858814/+subscriptions




[Bug 1880763] Re: Missing page crossing check in use_goto_tb() for rx target

2021-08-17 Thread Launchpad Bug Tracker
[Expired for QEMU because there has been no activity for 60 days.]

** Changed in: qemu
   Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1880763

Title:
  Missing page crossing check in use_goto_tb() for rx target

Status in QEMU:
  Expired

Bug description:
  Currently the rx target doesn't have the page crossing check in its 
  use_goto_tb() function. 
  This is a required feature for stable system mode emulations that all 
  other targets implement.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1880763/+subscriptions




Re: [RFC] vfio/migration: reduce the msix virq setup cost in resume phase

2021-08-17 Thread Longpeng (Mike, Cloud Infrastructure Service Product Dept.)



On 2021/8/18 4:26, Alex Williamson wrote:
> On Fri, 13 Aug 2021 12:06:14 +0800
> "Longpeng(Mike)"  wrote:
> 
>> In the migration resume phase, all unmasked MSI-X vectors need to be
>> set up when the VF state is loaded. However, the setup takes longer
>> the more unmasked vectors the VF has.
>>
>> In our case, the VF has 65 vectors and each one spends 0.8ms on the
>> setup operation (vfio_add_kvm_msi_virq -> kvm_irqchip_commit_routes),
>> so the total cost for the VF is more than 40ms. Even worse, the VM
>> has 8 VFs, so the downtime increases by more than 320ms.
>>
>> vfio_pci_load_config
>>   vfio_msix_enable
>> msix_set_vector_notifiers
>>   for (vector = 0; vector < dev->msix_entries_nr; vector++) {
>> vfio_msix_vector_do_use
>>   vfio_add_kvm_msi_virq
>> kvm_irqchip_commit_routes <-- 0.8ms
>>   }
>>
>> Originally, we tried to batch all routes and commit just once
>> outside the loop, but it's not easy to fall back to the QEMU
>> interrupt if one of them fails.
> 
> I'm not sure I follow here, once we setup the virq, what's the failure
> path?  kvm_irqchip_commit_routes() returns void.  Were you looking at
> adding a "defer" arg to kvm_irqchip_add_msi_route() that skips the
> commit, which vfio_add_kvm_msi_virq() might set based on the migration
> state and vfio_pci_load_config() could then trigger the commit?

Yes, my implementation is almost exactly what you describe here and it works,
but there's a semantic problem that makes me hesitant.

The calltrace in vfio_add_kvm_msi_virq is:
vfio_add_kvm_msi_virq
  kvm_irqchip_add_msi_route
kvm_irqchip_commit_routes
  kvm_vm_ioctl(KVM_SET_GSI_ROUTING)
  kvm_irqchip_add_irqfd_notifier_gsi
kvm_irqchip_assign_irqfd
  kvm_vm_ioctl(KVM_IRQFD)

I looked at some other places that need to assign irqfds; the assignment is
always done after the virq is committed.
The KVM API doc does not declare a dependency between them, but the existing
code seems to imply the order. The "defer" approach makes them out of order;
it works at the moment, but I'm not sure whether KVM might change in the future.

I prefer "defer" too if we can make sure it's OK for the assignment and commit
to be out of order. I hope we can reach agreement on this point first, and then
I'll continue replying to the comments below if still necessary.

So do you think we should keep the order of assignment and commit?
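
For reference, a minimal sketch of the "defer" idea being discussed (the extra
parameter is hypothetical; today kvm_irqchip_add_msi_route() commits the
routing table internally):

    /* In vfio_pci_load_config(): add routes and irqfds, skip the commit. */
    for (nr = 0; nr < nr_vectors; nr++) {
        int virq = kvm_irqchip_add_msi_route(kvm_state, nr,
                                             &vdev->pdev /*, defer */);
        kvm_irqchip_add_irqfd_notifier_gsi(kvm_state,
                                           &vdev->msi_vectors[nr].kvm_interrupt,
                                           NULL, virq);   /* KVM_IRQFD */
    }
    /* Then issue a single KVM_SET_GSI_ROUTING for all vectors. */
    kvm_irqchip_commit_routes(kvm_state);

The open question is exactly this: whether the KVM_IRQFD calls may legally
precede the single KVM_SET_GSI_ROUTING commit.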

> There's more overhead that can be removed if VFIO_DEVICE_SET_IRQS could
> be called once rather than per vector.

Yes, I had already optimized this overhead in our production code. I saw that
upstream also did so this year (commit ecebe53fe), so I'll backport the
upstream solution in order to keep pace with the community.

[ stop here ]

> 
>> So this patch tries to defer the KVM interrupt setup: an unmasked
>> vector will use the QEMU interrupt by default and switch to the KVM
>> interrupt once it fires.
>>
>> Signed-off-by: Longpeng(Mike) 
>> ---
>>  hw/vfio/pci.c | 39 ++-
>>  hw/vfio/pci.h |  2 ++
>>  2 files changed, 40 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index e1ea1d8..dd35170 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -47,6 +47,8 @@
>>  
>>  static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
>>  static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
>> +static void vfio_add_kvm_msix_virq(VFIOPCIDevice *vdev,
>> +   VFIOMSIVector *vector, int nr);
>>  
>>  /*
>>   * Disabling BAR mmaping can be slow, but toggling it around INTx can
>> @@ -347,6 +349,11 @@ static void vfio_msi_interrupt(void *opaque)
>>  get_msg = msix_get_message;
>>  notify = msix_notify;
>>  
>> +if (unlikely(vector->need_switch)) {
>> +vfio_add_kvm_msix_virq(vdev, vector, nr);
>> +vector->need_switch = false;
>> +}
>> +
> 
> A better name might be "vector->kvm_routing_deferred".  Essentially this
> is just a lazy setup of KVM routes; we could always do this, or we could
> do this based on a device option.  I wonder if there are any affinity
> benefits in the VM to deferring the KVM route.
> 
>>  /* A masked vector firing needs to use the PBA, enable it */
>>  if (msix_is_masked(&vdev->pdev, nr)) {
>>  set_bit(nr, vdev->msix->pending);
>> @@ -438,6 +445,25 @@ static void vfio_add_kvm_msi_virq(VFIOPCIDevice *vdev, 
>> VFIOMSIVector *vector,
>>  vector->virq = virq;
>>  }
>>  
>> +static void
>> +vfio_add_kvm_msix_virq(VFIOPCIDevice *vdev, VFIOMSIVector *vector, int nr)
>> +{
>> +Error *err = NULL;
>> +int fd;
>> +
>> +vfio_add_kvm_msi_virq(vdev, vector, nr, true);
>> +if (vector->virq < 0) {
>> +return;
>> +}
>> +
>> +fd = event_notifier_get_fd(&vector->kvm_interrupt);
>> +if (vfio_set_irq_signaling(&vdev->vbasedev,
>> +   VFIO_PCI_MSIX_IRQ_INDEX, nr,
>> +   VFIO_IRQ_SET_ACTION_TRIGGER, fd, &err)) {
>> +

Re: [RFC PATCH 00/13] Add support for Mirror VM.

2021-08-17 Thread James Bottomley
On Tue, 2021-08-17 at 16:10 -0700, Steve Rutherford wrote:
> On Tue, Aug 17, 2021 at 3:57 PM James Bottomley 
> wrote:
> > Realistically, migration is becoming a royal pain, not just for
> > confidential computing, but for virtual functions in general.  I
> > really think we should look at S3 suspend, where we shut down the
> > drivers and then reattach on S3 resume as the potential pathway to
> > getting migration working both for virtual functions and this use
> > case.
> 
> This type of migration seems a little bit less "live", which makes me
> concerned about its performance characteristics.

Well, there are too many scenarios we just fail at migration today.  We
need help from the guest to quiesce or shut down the interior devices,
and S3 suspend seems to be the machine signal for that.  I think in
most clouds guests would accept some loss of "liveness" for a gain in
reliability as long as we keep them within the SLA ... which is 5
minutes a year for 5 nines.  Most failed migrations also instantly fail
SLAs because of the recovery times involved so I don't see what's to be
achieved by keeping the current "we can migrate sometimes" approach.

James





[PATCH v4] hw/dma/pl330: Add memory region to replace default

2021-08-17 Thread Wen, Jianxian
Add a configurable "memory" property: a memory-region link that can be
connected to an IOMMU region to support SMMU translation.

Signed-off-by: Jianxian Wen 
---
v3 -> v4 (after review of Philippe Mathieu-Daudé):
 - Avoid adding an unnecessary AS; only add one when connected to an IOMMU region.
v2 -> v3 (after review of Philippe Mathieu-Daudé):
 - Refine code to comply with code style, update error message if memory link 
is not set.
v1 -> v2 (after review of Peter Maydell):
 - Data access use dma_memory_read/write functions, update function 
AddressSpace* parameter.

 hw/dma/pl330.c | 21 +
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/hw/dma/pl330.c b/hw/dma/pl330.c
index 944ba29..8ae56c7 100644
--- a/hw/dma/pl330.c
+++ b/hw/dma/pl330.c
@@ -269,6 +269,9 @@ struct PL330State {
 uint8_t num_faulting;
 uint8_t periph_busy[PL330_PERIPH_NUM];
 
+/* Memory region that DMA operations access */
+MemoryRegion *mem_mr;
+AddressSpace *mem_as;
 };
 
 #define TYPE_PL330 "pl330"
@@ -1108,7 +1111,7 @@ static inline const PL330InsnDesc *pl330_fetch_insn(PL330Chan *ch)
 uint8_t opcode;
 int i;
 
-dma_memory_read(&address_space_memory, ch->pc, &opcode, 1);
+dma_memory_read(ch->parent->mem_as, ch->pc, &opcode, 1);
 for (i = 0; insn_desc[i].size; i++) {
 if ((opcode & insn_desc[i].opmask) == insn_desc[i].opcode) {
return &insn_desc[i];
@@ -1122,7 +1125,7 @@ static inline void pl330_exec_insn(PL330Chan *ch, const 
PL330InsnDesc *insn)
 uint8_t buf[PL330_INSN_MAXSIZE];
 
 assert(insn->size <= PL330_INSN_MAXSIZE);
-dma_memory_read(&address_space_memory, ch->pc, buf, insn->size);
+dma_memory_read(ch->parent->mem_as, ch->pc, buf, insn->size);
 insn->exec(ch, buf[0], [1], insn->size - 1);
 }
 
@@ -1186,7 +1189,7 @@ static int pl330_exec_cycle(PL330Chan *channel)
if (q != NULL && q->len <= pl330_fifo_num_free(&s->fifo)) {
 int len = q->len - (q->addr & (q->len - 1));
 
-dma_memory_read(&address_space_memory, q->addr, buf, len);
+dma_memory_read(s->mem_as, q->addr, buf, len);
 trace_pl330_exec_cycle(q->addr, len);
 if (trace_event_get_state_backends(TRACE_PL330_HEXDUMP)) {
 pl330_hexdump(buf, len);
@@ -1217,7 +1220,7 @@ static int pl330_exec_cycle(PL330Chan *channel)
fifo_res = pl330_fifo_get(&s->fifo, buf, len, q->tag);
 }
 if (fifo_res == PL330_FIFO_OK || q->z) {
-dma_memory_write(&address_space_memory, q->addr, buf, len);
+dma_memory_write(s->mem_as, q->addr, buf, len);
 trace_pl330_exec_cycle(q->addr, len);
 if (trace_event_get_state_backends(TRACE_PL330_HEXDUMP)) {
 pl330_hexdump(buf, len);
@@ -1562,6 +1565,13 @@ static void pl330_realize(DeviceState *dev, Error **errp)
   "dma", PL330_IOMEM_SIZE);
sysbus_init_mmio(SYS_BUS_DEVICE(dev), &s->iomem);
 
+if (s->mem_mr && (s->mem_mr != get_system_memory())) {
+s->mem_as = g_malloc0(sizeof(AddressSpace));
+address_space_init(s->mem_as, s->mem_mr, s->mem_mr->name);
+} else {
+s->mem_as = &address_space_memory;
+}
+
 s->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, pl330_exec_cycle_timer, s);
 
 s->cfg[0] = (s->mgr_ns_at_rst ? 0x4 : 0) |
@@ -1656,6 +1666,9 @@ static Property pl330_properties[] = {
 DEFINE_PROP_UINT8("rd_q_dep", PL330State, rd_q_dep, 16),
 DEFINE_PROP_UINT16("data_buffer_dep", PL330State, data_buffer_dep, 256),
 
+DEFINE_PROP_LINK("memory", PL330State, mem_mr,
+ TYPE_MEMORY_REGION, MemoryRegion *),
+
 DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
2.7.4
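
For context, a board model would connect the new "memory" property roughly
like this (a sketch, assuming an existing IOMMU memory region iommu_mr; this
is not part of the patch):

    DeviceState *dev = qdev_new(TYPE_PL330);

    /* Route the controller's DMA accesses through the IOMMU region. */
    object_property_set_link(OBJECT(dev), "memory", OBJECT(iommu_mr),
                             &error_fatal);
    sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal);

If the link is left unset (or points at system memory), pl330_realize() falls
back to address_space_memory, so existing boards keep their behaviour.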



[PATCH 1/4] target/arm: Take an exception if PSTATE.IL is set

2021-08-17 Thread Richard Henderson
From: Peter Maydell 

In v8A, the PSTATE.IL bit is set for various kinds of illegal
exception return or mode-change attempts.  We already set PSTATE.IL
(or its AArch32 equivalent CPSR.IL) in all those cases, but we
weren't implementing the part of the behaviour where attempting to
execute an instruction with PSTATE.IL takes an immediate exception
with an appropriate syndrome value.

Add a new TB flags bit tracking PSTATE.IL/CPSR.IL, and generate code
to take an exception instead of whatever the instruction would have
been.

PSTATE.IL and CPSR.IL change only on exception entry, attempted
exception exit, and various AArch32 mode changes via cpsr_write().
These places generally already rebuild the hflags, so the only place
we need an extra rebuild_hflags call is in the illegal-return
codepath of the AArch64 exception_return helper.

Signed-off-by: Peter Maydell 
Message-Id: <20210817162118.24319-1-peter.mayd...@linaro.org>
Reviewed-by: Richard Henderson 
[rth: Added missing returns.]
Signed-off-by: Richard Henderson 
---
 target/arm/cpu.h   |  1 +
 target/arm/syndrome.h  |  5 +
 target/arm/translate.h |  2 ++
 target/arm/helper-a64.c|  1 +
 target/arm/helper.c|  8 
 target/arm/translate-a64.c | 11 +++
 target/arm/translate.c | 21 +
 7 files changed, 49 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 9f0a5f84d5..be557bf5d8 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -3441,6 +3441,7 @@ FIELD(TBFLAG_ANY, FPEXC_EL, 8, 2)
 FIELD(TBFLAG_ANY, DEBUG_TARGET_EL, 10, 2)
 /* Memory operations require alignment: SCTLR_ELx.A or CCR.UNALIGN_TRP */
 FIELD(TBFLAG_ANY, ALIGN_MEM, 12, 1)
+FIELD(TBFLAG_ANY, PSTATE__IL, 13, 1)
 
 /*
  * Bit usage when in AArch32 state, both A- and M-profile.
diff --git a/target/arm/syndrome.h b/target/arm/syndrome.h
index 39a31260f2..c590a109da 100644
--- a/target/arm/syndrome.h
+++ b/target/arm/syndrome.h
@@ -270,4 +270,9 @@ static inline uint32_t syn_wfx(int cv, int cond, int ti, 
bool is_16bit)
(cv << 24) | (cond << 20) | ti;
 }
 
+static inline uint32_t syn_illegalstate(void)
+{
+return EC_ILLEGALSTATE << ARM_EL_EC_SHIFT;
+}
+
 #endif /* TARGET_ARM_SYNDROME_H */
diff --git a/target/arm/translate.h b/target/arm/translate.h
index 241596c5bd..af1b6fa03c 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -98,6 +98,8 @@ typedef struct DisasContext {
 bool hstr_active;
 /* True if memory operations require alignment */
 bool align_mem;
+/* True if PSTATE.IL is set */
+bool pstate_il;
 /*
  * >= 0, a copy of PSTATE.BTYPE, which will be 0 without v8.5-BTI.
  *  < 0, set by the current instruction.
diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
index 26f79f9141..19445b3c94 100644
--- a/target/arm/helper-a64.c
+++ b/target/arm/helper-a64.c
@@ -1071,6 +1071,7 @@ illegal_return:
 if (!arm_singlestep_active(env)) {
 env->pstate &= ~PSTATE_SS;
 }
+helper_rebuild_hflags_a64(env, cur_el);
 qemu_log_mask(LOG_GUEST_ERROR, "Illegal exception return at EL%d: "
   "resuming execution at 0x%" PRIx64 "\n", cur_el, env->pc);
 }
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 155d8bf239..201ecf8c67 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -13408,6 +13408,10 @@ static CPUARMTBFlags rebuild_hflags_a32(CPUARMState 
*env, int fp_el,
 DP_TBFLAG_A32(flags, HSTR_ACTIVE, 1);
 }
 
+if (env->uncached_cpsr & CPSR_IL) {
+DP_TBFLAG_ANY(flags, PSTATE__IL, 1);
+}
+
 return rebuild_hflags_common_32(env, fp_el, mmu_idx, flags);
 }
 
@@ -13502,6 +13506,10 @@ static CPUARMTBFlags rebuild_hflags_a64(CPUARMState 
*env, int el, int fp_el,
 }
 }
 
+if (env->pstate & PSTATE_IL) {
+DP_TBFLAG_ANY(flags, PSTATE__IL, 1);
+}
+
 if (cpu_isar_feature(aa64_mte, env_archcpu(env))) {
 /*
  * Set MTE_ACTIVE if any access may be Checked, and leave clear
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 422e2ac0c9..230cc8d83b 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -14662,6 +14662,16 @@ static void disas_a64_insn(CPUARMState *env, 
DisasContext *s)
 s->fp_access_checked = false;
 s->sve_access_checked = false;
 
+if (s->pstate_il) {
+/*
+ * Illegal execution state. This has priority over BTI
+ * exceptions, but comes after instruction abort exceptions.
+ */
+gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
+   syn_illegalstate(), default_exception_el(s));
+return;
+}
+
 if (dc_isar_feature(aa64_bti, s)) {
 if (s->base.num_insns == 1) {
 /*
@@ -14780,6 +14790,7 @@ static void 
aarch64_tr_init_disas_context(DisasContextBase *dcbase,
 #endif
 dc->fp_excp_el = EX_TBFLAG_ANY(tb_flags, FPEXC_EL);
 dc->align_mem = EX_TBFLAG_ANY(tb_flags, ALIGN_MEM);
+ 

[PATCH 4/4] target/arm: Suppress bp for exceptions with more priority

2021-08-17 Thread Richard Henderson
Both single-step and pc alignment faults have priority over
breakpoint exceptions.

Signed-off-by: Richard Henderson 
---
 target/arm/debug_helper.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/target/arm/debug_helper.c b/target/arm/debug_helper.c
index 2983e36dd3..32f3caec23 100644
--- a/target/arm/debug_helper.c
+++ b/target/arm/debug_helper.c
@@ -220,6 +220,7 @@ bool arm_debug_check_breakpoint(CPUState *cs)
 {
 ARMCPU *cpu = ARM_CPU(cs);
CPUARMState *env = &cpu->env;
+target_ulong pc;
 int n;
 
 /*
@@ -231,6 +232,28 @@ bool arm_debug_check_breakpoint(CPUState *cs)
 return false;
 }
 
+/*
+ * Single-step exceptions have priority over breakpoint exceptions.
+ * If single-step state is active-pending, suppress the bp.
+ */
+if (arm_singlestep_active(env) && !(env->pstate & PSTATE_SS)) {
+return false;
+}
+
+/*
+ * PC alignment faults have priority over breakpoint exceptions.
+ */
+pc = is_a64(env) ? env->pc : env->regs[15];
+if ((is_a64(env) || !env->thumb) && (pc & 3) != 0) {
+return false;
+}
+
+/*
+ * Instruction aborts have priority over breakpoint exceptions.
+ * TODO: We would need to look up the page for PC and verify that
+ * it is present and executable.
+ */
+
 for (n = 0; n < ARRAY_SIZE(env->cpu_breakpoint); n++) {
 if (bp_wp_matches(cpu, n, false)) {
 return true;
-- 
2.25.1




[PATCH 3/4] target/arm: Take an exception if PC is misaligned

2021-08-17 Thread Richard Henderson
For A64, any input to an indirect branch can cause this.

For A32, many indirect branch paths force the branch to be aligned,
but BXWritePC does not.  This includes the BX instruction but also
other interworking changes to PC.  Prior to v8, this case is UNDEFINED.
With v8, this is CONSTRAINED UNDEFINED and may either raise an
exception or force align the PC.

We choose to raise an exception because we have the infrastructure,
it makes the generated code for gen_bx simpler, and it has the
possibility of catching more guest bugs.

Signed-off-by: Richard Henderson 
---
 target/arm/syndrome.h  |  5 
 target/arm/translate-a64.c | 12 +
 target/arm/translate.c | 50 +++---
 3 files changed, 53 insertions(+), 14 deletions(-)

diff --git a/target/arm/syndrome.h b/target/arm/syndrome.h
index c590a109da..569b0c1115 100644
--- a/target/arm/syndrome.h
+++ b/target/arm/syndrome.h
@@ -275,4 +275,9 @@ static inline uint32_t syn_illegalstate(void)
 return EC_ILLEGALSTATE << ARM_EL_EC_SHIFT;
 }
 
+static inline uint32_t syn_pcalignment(void)
+{
+return EC_PCALIGNMENT << ARM_EL_EC_SHIFT;
+}
+
 #endif /* TARGET_ARM_SYNDROME_H */
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 333bc836b2..c394bddac6 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -14754,6 +14754,7 @@ static void aarch64_tr_translate_insn(DisasContextBase 
*dcbase, CPUState *cpu)
 CPUARMState *env = cpu->env_ptr;
 uint32_t insn;
 
+/* Singlestep exceptions have the highest priority. */
 if (s->ss_active && !s->pstate_ss) {
 /* Singlestep state is Active-pending.
  * If we're in this state at the start of a TB then either
@@ -14771,6 +14772,17 @@ static void aarch64_tr_translate_insn(DisasContextBase 
*dcbase, CPUState *cpu)
 return;
 }
 
+if (s->base.pc_next & 3) {
+/*
+ * PC alignment fault.  This has priority over the instruction abort
+ * that we would receive from a translation fault via arm_ldl_code.
+ */
+gen_exception_insn(s, s->base.pc_next, EXCP_UDEF,
+   syn_pcalignment(), default_exception_el(s));
+s->base.pc_next = QEMU_ALIGN_UP(s->base.pc_next, 4);
+return;
+}
+
 s->pc_curr = s->base.pc_next;
 insn = arm_ldl_code(env, s->base.pc_next, s->sctlr_b);
 s->insn = insn;
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 5e0fc8a0a0..00ddd4879c 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -9452,19 +9452,8 @@ static void arm_tr_insn_start(DisasContextBase *dcbase, 
CPUState *cpu)
 dc->insn_start = tcg_last_op();
 }
 
-static bool arm_pre_translate_insn(DisasContext *dc)
+static bool arm_check_ss_active(DisasContext *dc)
 {
-#ifdef CONFIG_USER_ONLY
-/* Intercept jump to the magic kernel page.  */
if (dc->base.pc_next >= 0xffff0000) {
-/* We always get here via a jump, so know we are not in a
-   conditional execution block.  */
-gen_exception_internal(EXCP_KERNEL_TRAP);
-dc->base.is_jmp = DISAS_NORETURN;
-return true;
-}
-#endif
-
 if (dc->ss_active && !dc->pstate_ss) {
 /* Singlestep state is Active-pending.
  * If we're in this state at the start of a TB then either
@@ -9485,6 +9474,21 @@ static bool arm_pre_translate_insn(DisasContext *dc)
 return false;
 }
 
+static bool arm_check_kernelpage(DisasContext *dc)
+{
+#ifdef CONFIG_USER_ONLY
+/* Intercept jump to the magic kernel page.  */
if (dc->base.pc_next >= 0xffff0000) {
+/* We always get here via a jump, so know we are not in a
+   conditional execution block.  */
+gen_exception_internal(EXCP_KERNEL_TRAP);
+dc->base.is_jmp = DISAS_NORETURN;
+return true;
+}
+#endif
+return false;
+}
+
 static void arm_post_translate_insn(DisasContext *dc)
 {
 if (dc->condjmp && !dc->base.is_jmp) {
@@ -9500,7 +9504,25 @@ static void arm_tr_translate_insn(DisasContextBase 
*dcbase, CPUState *cpu)
 CPUARMState *env = cpu->env_ptr;
 unsigned int insn;
 
-if (arm_pre_translate_insn(dc)) {
+/* Singlestep exceptions have the highest priority. */
+if (arm_check_ss_active(dc)) {
+dc->base.pc_next += 4;
+return;
+}
+
+if (dc->base.pc_next & 3) {
+/*
+ * PC alignment fault.  This has priority over the instruction abort
+ * that we would receive from a translation fault via arm_ldl_code
+ * (or the execution of the kernelpage entrypoint).
+ */
+gen_exception_insn(dc, dc->base.pc_next, EXCP_UDEF,
+   syn_pcalignment(), default_exception_el(dc));
+dc->base.pc_next = QEMU_ALIGN_UP(dc->base.pc_next, 4);
+return;
+}
+
+if (arm_check_kernelpage(dc)) {
 dc->base.pc_next += 4;
 return;
 }
@@ -9570,7 +9592,7 @@ static void 

[PATCH 2/4] target/arm: Merge disas_a64_insn into aarch64_tr_translate_insn

2021-08-17 Thread Richard Henderson
It is confusing to have different exits from translation
for various conditions in separate functions.

Merge disas_a64_insn into its only caller.  Standardize
on the "s" name for the DisasContext, as the code from
disas_a64_insn had more instances.

Signed-off-by: Richard Henderson 
---
 target/arm/translate-a64.c | 224 ++---
 1 file changed, 109 insertions(+), 115 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 230cc8d83b..333bc836b2 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -14649,113 +14649,6 @@ static bool btype_destination_ok(uint32_t insn, bool 
bt, int btype)
 return false;
 }
 
-/* C3.1 A64 instruction index by encoding */
-static void disas_a64_insn(CPUARMState *env, DisasContext *s)
-{
-uint32_t insn;
-
-s->pc_curr = s->base.pc_next;
-insn = arm_ldl_code(env, s->base.pc_next, s->sctlr_b);
-s->insn = insn;
-s->base.pc_next += 4;
-
-s->fp_access_checked = false;
-s->sve_access_checked = false;
-
-if (s->pstate_il) {
-/*
- * Illegal execution state. This has priority over BTI
- * exceptions, but comes after instruction abort exceptions.
- */
-gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
-   syn_illegalstate(), default_exception_el(s));
-return;
-}
-
-if (dc_isar_feature(aa64_bti, s)) {
-if (s->base.num_insns == 1) {
-/*
- * At the first insn of the TB, compute s->guarded_page.
- * We delayed computing this until successfully reading
- * the first insn of the TB, above.  This (mostly) ensures
- * that the softmmu tlb entry has been populated, and the
- * page table GP bit is available.
- *
- * Note that we need to compute this even if btype == 0,
- * because this value is used for BR instructions later
- * where ENV is not available.
- */
-s->guarded_page = is_guarded_page(env, s);
-
-/* First insn can have btype set to non-zero.  */
-tcg_debug_assert(s->btype >= 0);
-
-/*
- * Note that the Branch Target Exception has fairly high
- * priority -- below debugging exceptions but above most
- * everything else.  This allows us to handle this now
- * instead of waiting until the insn is otherwise decoded.
- */
-if (s->btype != 0
-&& s->guarded_page
-&& !btype_destination_ok(insn, s->bt, s->btype)) {
-gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
-   syn_btitrap(s->btype),
-   default_exception_el(s));
-return;
-}
-} else {
-/* Not the first insn: btype must be 0.  */
-tcg_debug_assert(s->btype == 0);
-}
-}
-
-switch (extract32(insn, 25, 4)) {
-case 0x0: case 0x1: case 0x3: /* UNALLOCATED */
-unallocated_encoding(s);
-break;
-case 0x2:
-if (!dc_isar_feature(aa64_sve, s) || !disas_sve(s, insn)) {
-unallocated_encoding(s);
-}
-break;
-case 0x8: case 0x9: /* Data processing - immediate */
-disas_data_proc_imm(s, insn);
-break;
-case 0xa: case 0xb: /* Branch, exception generation and system insns */
-disas_b_exc_sys(s, insn);
-break;
-case 0x4:
-case 0x6:
-case 0xc:
-case 0xe:  /* Loads and stores */
-disas_ldst(s, insn);
-break;
-case 0x5:
-case 0xd:  /* Data processing - register */
-disas_data_proc_reg(s, insn);
-break;
-case 0x7:
-case 0xf:  /* Data processing - SIMD and floating point */
-disas_data_proc_simd_fp(s, insn);
-break;
-default:
-assert(FALSE); /* all 15 cases should be handled above */
-break;
-}
-
-/* if we allocated any temporaries, free them here */
-free_tmp_a64(s);
-
-/*
- * After execution of most insns, btype is reset to 0.
- * Note that we set btype == -1 when the insn sets btype.
- */
-if (s->btype > 0 && s->base.is_jmp != DISAS_NORETURN) {
-reset_btype(s);
-}
-}
-
 static void aarch64_tr_init_disas_context(DisasContextBase *dcbase,
   CPUState *cpu)
 {
@@ -14857,10 +14750,11 @@ static void aarch64_tr_insn_start(DisasContextBase 
*dcbase, CPUState *cpu)
 
 static void aarch64_tr_translate_insn(DisasContextBase *dcbase, CPUState *cpu)
 {
-DisasContext *dc = container_of(dcbase, DisasContext, base);
+DisasContext *s = container_of(dcbase, DisasContext, base);
 CPUARMState *env = cpu->env_ptr;
+uint32_t insn;
 
-if (dc->ss_active && !dc->pstate_ss) {
+if (s->ss_active && !s->pstate_ss) {
 /* Singlestep 

[PATCH 0/4] target/arm: Fix insn exception priorities

2021-08-17 Thread Richard Henderson
As discussed earlier today at
https://lists.gnu.org/archive/html/qemu-devel/2021-08/msg02686.html

Raise pc alignment faults.
Fix single-step and pc-align priority over breakpoints.
Not yet fixing insn abort priority over breakpoints.


r~


Peter Maydell (1):
  target/arm: Take an exception if PSTATE.IL is set

Richard Henderson (3):
  target/arm: Merge disas_a64_insn into aarch64_tr_translate_insn
  target/arm: Take an exception if PC is misaligned
  target/arm: Suppress bp for exceptions with more priority

 target/arm/cpu.h   |   1 +
 target/arm/syndrome.h  |  10 ++
 target/arm/translate.h |   2 +
 target/arm/debug_helper.c  |  23 
 target/arm/helper-a64.c|   1 +
 target/arm/helper.c|   8 ++
 target/arm/translate-a64.c | 267 -
 target/arm/translate.c |  71 --
 8 files changed, 244 insertions(+), 139 deletions(-)

-- 
2.25.1




Re: [PATCH v3 09/10] virtiofsd: Optionally fill lo_inode.fhandle

2021-08-17 Thread Vivek Goyal
On Tue, Aug 17, 2021 at 03:45:19PM -0400, Vivek Goyal wrote:
> On Tue, Aug 17, 2021 at 10:27:16AM +0200, Hanna Reitz wrote:
> > On 16.08.21 21:44, Vivek Goyal wrote:
> > > On Wed, Aug 11, 2021 at 08:41:18AM +0200, Hanna Reitz wrote:
> > > 
> > > [..]
> > > > > > But given the inotify complications, there’s really a good reason 
> > > > > > we should
> > > > > > use mountinfo.
> > > > > > 
> > > > > > > > It’s a bit tricky because our sandboxing prevents easy access 
> > > > > > > > to mountinfo,
> > > > > > > > but if that’s the only way...
> > > > > > > yes. We already have lo->proc_self_fd. Maybe we need to keep
> > > > > > > /proc/self/mountinfo open in lo->proc_self_mountinfo. I am 
> > > > > > > assuming
> > > > > > > that any mount table changes will still be visible despite the 
> > > > > > > fact
> > > > > > > I have fd open (and don't have to open new fd to notice new 
> > > > > > > mount/unmount
> > > > > > > changes).
> > > > > > Well, yes, that was my idea.  Unfortunately, I wasn’t quite 
> > > > > > successful yet;
> > > > > > when I tried keeping the fd open, reading from it would just return 0
> > > > > > bytes.  Perhaps that’s because we bind-mount /proc/self/fd to /proc 
> > > > > > so that
> > > > > > nothing else in /proc is visible. Perhaps we need to bind-mount
> > > > > > /proc/self/mountinfo into /proc/self/fd before that...
> > > > > Or perhaps open /proc/self/mountinfo and save fd in lo->proc_mountinfo
> > > > > before /proc/self/fd is bind mounted on /proc?
> > > > Yes, I tried that, and then reading would just return 0 bytes.
> > > Hi Hanna,
> > > 
> > > I tried this simple patch and I can read /proc/self/mountinfo before
> > > bind mounting /proc/self/fd and after bind mounting /proc/self/fd. Am
> > > I missing something.
> > 
> > Yes, but I tried reading it in the main loop (where we’d actually need it). 
> > It looks like the umount2(".", MNT_DETACH) in setup_mounts() breaks it.
> 
> Good point. I modified my code and noticed too that after umount2() it
> always reads 0 bytes. I can understand that all the other mount points
> could go away, but the new rootfs mount point of virtiofsd should still
> be visible, IIUC. I don't understand why.
> 
> Anyway, I tried re-opening the /proc/self/mountinfo file after umount2(".",
> MNT_DETACH), and that seems to work: it shows the root mount point. I
> created a bind mount and it shows that too.
> 
> So it looks like a quick fix could be to re-open /proc/self/mountinfo. But
> that means we can't bind /proc/self/fd on /proc. We could bind-mount
> /proc/self on /proc instead; not sure whether that is safe enough.

Or may be I can do this.

- Open an O_PATH fd for /proc/self:
  proc_self = open("/proc/self", O_PATH);
- Bind mount /proc/self/fd on /proc
- pivot_root() and umount() stuff
- openat(proc_self, "mountinfo")
- close(proc_self)

If this works, then we don't have the security issue and we managed
to open mountinfo after pivot_root() and umount(). Will give it a
try and see if it works tomorrow.
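
In code, that sequence would look roughly like this (a sketch; error handling
omitted):

    /* 1. Grab an O_PATH fd for /proc/self before /proc gets shadowed. */
    int proc_self = open("/proc/self", O_PATH);

    /* 2. Bind-mount /proc/self/fd over /proc, pivot_root(), umount2(). */

    /* 3. An O_PATH fd is still usable as the directory fd for openat(). */
    int mountinfo_fd = openat(proc_self, "mountinfo", O_RDONLY);
    close(proc_self);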

Vivek




Re: [RFC PATCH 00/13] Add support for Mirror VM.

2021-08-17 Thread Paolo Bonzini
On Tue, Aug 17, 2021 at 10:51 PM Tobin Feldman-Fitzthum
 wrote:
> This is essentially what we do in our prototype, although we have an
> even simpler approach. We have a 1:1 mapping that maps an address to
> itself with the cbit set. During Migration QEMU asks the migration
> handler to import/export encrypted pages and provides the GPA for said
> page. Since the migration handler only exports/imports encrypted pages,
> we can have the cbit set for every page in our mapping. We can still use
> OVMF functions with these mappings because they are on encrypted pages.
> The MH does need to use a few shared pages (to communicate with QEMU,
> for instance), so we have another mapping without the cbit that is at a
> large offset.
>
> I think this is basically equivalent to what you suggest. As you point
> out above, this approach does require that any page that will be
> exported/imported by the MH is mapped in the guest. Is this a bad
> assumption?

It should work well enough in the common case; and with SNP it looks
like it is a necessary assumption anyway. *shrug*

It would be a bit ugly because QEMU has to constantly convert
ram_addr_t's to guest physical addresses and back. But that's not a
performance problem.

The only important bit is that the encryption status bitmap be indexed
by ram_addr_t. This lets QEMU detect the problem of a ram_addr_t that
is marked encrypted but is not currently mapped, and abort migration.

> The Migration Handler in OVMF is not a contiguous region of memory. The
> MH uses OVMF helper functions that are allocated in various regions of
> runtime memory. I guess I can see how separating the memory of the MH
> and the guest OS could be positive. On the other hand, since the MH is
> in OVMF, it is fundamentally designed to coexist with the guest OS.

IIRC runtime services are not SMP-safe, so the migration helper cannot
coexist with the guest OS without extra care. I checked quickly and
CryptoPkg/Library/BaseCryptLib/SysCall/RuntimeMemAllocation.c does not
use any lock, so it would be bad if both OS-invoked runtime services
and the MH invoked the CryptoPkg malloc at the same time.

Paolo




Re: [RFC PATCH 00/13] Add support for Mirror VM.

2021-08-17 Thread Steve Rutherford
On Tue, Aug 17, 2021 at 3:57 PM James Bottomley  wrote:
> Realistically, migration is becoming a royal pain, not just for
> confidential computing, but for virtual functions in general.  I really
> think we should look at S3 suspend, where we shut down the drivers and
> then reattach on S3 resume as the potential pathway to getting
> migration working both for virtual functions and this use case.

This type of migration seems a little bit less "live", which makes me
concerned about its performance characteristics.

Steve



Re: [PATCH RFC v2 01/16] vfio-user: introduce vfio-user protocol specification

2021-08-17 Thread Alex Williamson
On Mon, 16 Aug 2021 09:42:34 -0700
Elena Ufimtseva  wrote:
> +Authentication
> +--
> +
> +For ``AF_UNIX``, we rely on OS mandatory access controls on the socket files,
> +therefore it is up to the management layer to set up the socket as required.
> +Socket types than span guests or hosts will require a proper authentication

s/than/that/

...
> +``VFIO_USER_DMA_UNMAP``
> +---
> +
> +This command message is sent by the client to the server to inform it that a
> +DMA region, previously made available via a ``VFIO_USER_DMA_MAP`` command
> +message, is no longer available for DMA. It typically occurs when memory is
> +subtracted from the client or if the client uses a vIOMMU. The DMA region is
> +described by the following structure:
> +
> +Request
> +^^^
> +
> +The request payload for this message is a structure of the following format:
> +
> ++----------+--------+---------------------------------+
> +| Name     | Offset | Size                            |
> ++==========+========+=================================+
> +| argsz    | 0      | 4                               |
> ++----------+--------+---------------------------------+
> +| flags    | 4      | 4                               |
> ++----------+--------+---------------------------------+
> +|          | +-----+-----------------------+          |
> +|          | | Bit | Definition            |          |
> +|          | +=====+=======================+          |
> +|          | | 0   | get dirty page bitmap |          |
> +|          | +-----+-----------------------+          |
> ++----------+--------+---------------------------------+
> +| address  | 8      | 8                               |
> ++----------+--------+---------------------------------+
> +| size     | 16     | 8                               |
> ++----------+--------+---------------------------------+
> +
> +* *argsz* is the maximum size of the reply payload.
> +* *flags* contains the following DMA region attributes:
> +
> +  * *get dirty page bitmap* indicates that a dirty page bitmap must be
> +populated before unmapping the DMA region. The client must provide a
> +`VFIO Bitmap`_ structure, explained below, immediately following this
> +entry.
> +
> +* *address* is the base DMA address of the DMA region.
> +* *size* is the size of the DMA region.
> +
> +The address and size of the DMA region being unmapped must match exactly a
> +previous mapping. The size of request message depends on whether or not the
> +*get dirty page bitmap* bit is set in Flags:
> +
> +* If not set, the size of the total request message is: 16 + 24.
> +
> +* If set, the size of the total request message is: 16 + 24 + 16.
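
In C terms, the fixed part of the payload described by the table corresponds
to a layout like the following (the struct name is illustrative; field names
and offsets are taken from the table above):

    struct vfio_user_dma_unmap {
        uint32_t argsz;    /* offset 0:  max size of the reply payload  */
        uint32_t flags;    /* offset 4:  bit 0 = get dirty page bitmap  */
        uint64_t address;  /* offset 8:  base DMA address of the region */
        uint64_t size;     /* offset 16: size of the DMA region         */
    };                     /* 24 bytes, following the 16-byte header    */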

The address/size paradigm falls into the same issues as the vfio kernel
interface where we can't map or unmap the entire 64-bit address space,
ie. size is limited to 2^64 - 1.  The kernel interface also requires
PAGE_SIZE granularity for the DMA, which means the practical limit is
2^64 - PAGE_SIZE.  If we had a redo on the kernel interface we'd use
start/end so we can express a size of (end - start + 1).
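
A short illustration of the difference (64-bit fields assumed):

    /* address/size: the largest expressible region is 2^64 - 1 bytes. */
    unmap.address = 0;  unmap.size = UINT64_MAX;
    /* start/end: start = 0, end = UINT64_MAX covers the full 2^64 bytes,
     * because size = end - start + 1. */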

Is following the vfio kernel interface sufficiently worthwhile for
compatibility to incur this same limitation?  I don't recall if we've
already discussed this, but perhaps worth a note in this design doc if
similarity to the kernel interface is being favored here.  See for
example QEMU commit 1b296c3def4b ("vfio: Don't issue full 2^64 unmap").
Thanks,

Alex




Re: [RFC PATCH 00/13] Add support for Mirror VM.

2021-08-17 Thread James Bottomley
On Wed, 2021-08-18 at 00:37 +0200, Paolo Bonzini wrote:
> On Tue, Aug 17, 2021 at 11:54 PM Steve Rutherford
>  wrote:
> > > 1) the easy one: the bottom 4G of guest memory are mapped in the
> > > mirror
> > > VM 1:1.  The ram_addr_t-based addresses are shifted by either 4G
> > > or a
> > > huge value such as 2^42 (MAXPHYADDR - physical address reduction
> > > - 1).
> > > This even lets the migration helper reuse the OVMF runtime
> > > services
> > > memory map (but be careful about thread safety...).
> > 
> > If I understand what you are proposing, this would only work for
> > SEV/SEV-ES, since the RMP prevents these remapping games. This
> > makes
> > me less enthusiastic about this (but I suspect that's why you call
> > this less future proof).
> 
> I called it less future proof because it allows the migration helper
> to rely more on OVMF details, but those may not apply in the future.
> 
> However you're right about SNP; the same page cannot be mapped twice
> at different GPAs by a single ASID (which includes the VM and the
> migration helper). :( That does throw a wrench in the idea of mapping
> pages by ram_addr_t(*), and this applies to both schemes.

Right, but in the current IBM approach, since we use the same mapping
for guest and mirror, we have the same GPA in both and it should work
with SEV-SNP.

> Migrating RAM in PCI BARs is a mess anyway for SNP, because PCI BARs
> can be moved and every time they do the migration helper needs to
> wait for validation to happen. :(

Realistically, migration is becoming a royal pain, not just for
confidential computing, but for virtual functions in general.  I really
think we should look at S3 suspend, where we shut down the drivers and
then reattach on S3 resume as the potential pathway to getting
migration working both for virtual functions and this use case.

James





Re: [RFC PATCH 00/13] Add support for Mirror VM.

2021-08-17 Thread Paolo Bonzini
On Tue, Aug 17, 2021 at 11:54 PM Steve Rutherford
 wrote:
> > 1) the easy one: the bottom 4G of guest memory are mapped in the mirror
> > VM 1:1.  The ram_addr_t-based addresses are shifted by either 4G or a
> > huge value such as 2^42 (MAXPHYADDR - physical address reduction - 1).
> > This even lets the migration helper reuse the OVMF runtime services
> > memory map (but be careful about thread safety...).
>
> If I understand what you are proposing, this would only work for
> SEV/SEV-ES, since the RMP prevents these remapping games. This makes
> me less enthusiastic about this (but I suspect that's why you call
> this less future proof).

I called it less future proof because it allows the migration helper
to rely more on OVMF details, but those may not apply in the future.

However you're right about SNP; the same page cannot be mapped twice
at different GPAs by a single ASID (which includes the VM and the
migration helper). :( That does throw a wrench in the idea of mapping
pages by ram_addr_t(*), and this applies to both schemes.

Migrating RAM in PCI BARs is a mess anyway for SNP, because PCI BARs
can be moved and every time they do the migration helper needs to wait
for validation to happen. :(

Paolo

(*) ram_addr_t is not a GPA; it is constant throughout the life of the
guest and independent of e.g. PCI BARs. Internally, when QEMU
retrieves the dirty page bitmap from KVM it stores the bits indexed by
ram_addr_t (shifted right by PAGE_SHIFT).
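
Roughly, that bookkeeping amounts to the following (a sketch of the indexing,
not actual QEMU code):

    /* Dirty/encrypted page state is tracked per ram_addr_t page, not per
     * GPA, so it survives BAR moves and stays defined for unmapped RAM. */
    bool dirty = test_bit(ram_addr >> TARGET_PAGE_BITS, dirty_bitmap);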

> > 2) the more future-proof one.  Here, the migration helper tells QEMU
> > which area to copy from the guest to the mirror VM, as a (main GPA,
> > length, mirror GPA) tuple.  This could happen for example the first time
> > the guest writes 1 to MSR_KVM_MIGRATION_CONTROL.  When migration starts,
> > QEMU uses this information to issue KVM_SET_USER_MEMORY_REGION
> > accordingly.  The page tables are built for this (usually very high)
> > mirror GPA and the migration helper operates in a completely separate
> > address space.  However, the backing memory would still be shared
> > between the main and mirror VMs.  I am saying this is more future proof
> > because we have more flexibility in setting up the physical address
> > space of the mirror VM.
>
> My intuition for this leans more on the host, but matches some of the
> bits you've mentioned in (2)/(3). My intuition would be to put the
> migration helper incredibly high in gPA space, so that it does not
> collide with the rest of the guest (and can then stay in the same
> place for a fairly long period of time without needing to poke a hole
> in the guest). Then you can leave the ram_addr_t-based addresses
> mapped normally (without the offsetting). All this together allows the
> migration helper to be orthogonal to the normal guest and normal
> firmware.
>
> In this case, since the migration helper has a somewhat stable base
> address, you can have a prebaked entry point and page tables
> (determined at build time). The shared communication pages can come
> from neighboring high-memory. The migration helper can support a
> straightforward halt loop (or PIO loop, or whatever) where it reads
> from a predefined page to find what work needs to be done (perhaps
> with that page depending on which CPU it is, so you can support
> multithreading of the migration helper). Additionally, having it high
> in memory makes it quite easy to assess who owns which addresses: high
> mem is under the purview of the migration helper and does not need to
> be dirty tracked. Only "low" memory can and needs to be encrypted for
> transport to the target side.
>
> --Steve
> >
> > Paolo
> >
>




Re: [PATCH v3] block/file-win32: add reopen handlers

2021-08-17 Thread Philippe Mathieu-Daudé
On 8/17/21 10:21 PM, Viktor Prutyanov wrote:
> Make 'qemu-img commit' work on Windows.
> 
> Command 'commit' requires reopening backing file in RW mode. So,
> add reopen prepare/commit/abort handlers and change dwShareMode
> for CreateFile call in order to allow further read/write reopening.
> 
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/418
> Suggested-by: Hanna Reitz 
> Signed-off-by: Viktor Prutyanov 
> ---
>  v2:
> - fix indentation in raw_reopen_prepare
> - free rs if raw_reopen_prepare fails
>  v3:
> - restore suggested-by field missed in v2
> 
>  block/file-win32.c | 90 +-
>  1 file changed, 89 insertions(+), 1 deletion(-)

LGTM, asked Helge Konetzka for testing (on the gitlab issue).

Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v2 20/21] target/riscv: Tidy trans_rvh.c.inc

2021-08-17 Thread Philippe Mathieu-Daudé
On 8/17/21 11:18 PM, Richard Henderson wrote:
> Exit early if check_access fails.
> Split out do_hlv, do_hsv, do_hlvx subroutines.
> Use dest_gpr, get_gpr in the new subroutines.
> 
> Signed-off-by: Richard Henderson 
> ---
>  target/riscv/insn32.decode  |   1 +
>  target/riscv/insn_trans/trans_rvh.c.inc | 266 +---
>  2 files changed, 57 insertions(+), 210 deletions(-)

Nice diffstat ;)

Reviewed-by: Philippe Mathieu-Daudé 



Re: [PATCH v2 09/21] target/riscv: Move gen_* helpers for RVB

2021-08-17 Thread Philippe Mathieu-Daudé
On 8/17/21 11:17 PM, Richard Henderson wrote:
> Move these helpers near their use by the trans_*
> functions within insn_trans/trans_rvb.c.inc.
> 
> Signed-off-by: Richard Henderson 
> ---
>  target/riscv/translate.c| 233 ---
>  target/riscv/insn_trans/trans_rvb.c.inc | 234 
>  2 files changed, 234 insertions(+), 233 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé 



Re: [PATCH v2 08/21] target/riscv: Move gen_* helpers for RVM

2021-08-17 Thread Philippe Mathieu-Daudé
On 8/17/21 11:17 PM, Richard Henderson wrote:
> Move these helpers near their use by the trans_*
> functions within insn_trans/trans_rvm.c.inc.
> 
> Signed-off-by: Richard Henderson 
> ---
>  target/riscv/translate.c| 112 
>  target/riscv/insn_trans/trans_rvm.c.inc | 112 
>  2 files changed, 112 insertions(+), 112 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé 



Re: [PATCH 0/2] target/arm: Implement remaining HSTR functionality

2021-08-17 Thread Richard Henderson

On 8/16/21 8:03 AM, Peter Maydell wrote:

Peter Maydell (2):
   target/arm: Implement HSTR.TTEE
   target/arm: Implement HSTR.TJDBX


Reviewed-by: Richard Henderson 


r~



Re: [PATCH v2 03/21] target/riscv: Add DisasContext to gen_get_gpr, gen_set_gpr

2021-08-17 Thread Philippe Mathieu-Daudé
On 8/17/21 11:17 PM, Richard Henderson wrote:
> Signed-off-by: Richard Henderson 
> ---
>  target/riscv/translate.c| 58 -
>  target/riscv/insn_trans/trans_rva.c.inc | 18 
>  target/riscv/insn_trans/trans_rvb.c.inc |  4 +-
>  target/riscv/insn_trans/trans_rvd.c.inc | 32 +++---
>  target/riscv/insn_trans/trans_rvf.c.inc | 32 +++---
>  target/riscv/insn_trans/trans_rvh.c.inc | 52 +++---
>  target/riscv/insn_trans/trans_rvi.c.inc | 44 +--
>  target/riscv/insn_trans/trans_rvm.c.inc | 12 ++---
>  target/riscv/insn_trans/trans_rvv.c.inc | 36 +++
>  9 files changed, 144 insertions(+), 144 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé 



[ANNOUNCE] QEMU 6.1.0-rc4 is now available

2021-08-17 Thread Michael Roth
Hello,

On behalf of the QEMU Team, I'd like to announce the availability of the
fifth release candidate for the QEMU 6.1 release.  This release is meant
for testing purposes and should not be used in a production environment.

  http://download.qemu-project.org/qemu-6.1.0-rc4.tar.xz
  http://download.qemu-project.org/qemu-6.1.0-rc4.tar.xz.sig

A note from the maintainer:

  There are a comparatively large number of commits here for an rc4,
  but they are mostly changes only to documentation or the test suite;
  there are only a few, relatively minor, bug fixes. Hopefully
  the final 6.1.0 release will come out on Tuesday 24th.

You can help improve the quality of the QEMU 6.1 release by testing this
release and reporting bugs using our GitLab issue tracker:

  https://gitlab.com/qemu-project/qemu/-/issues

The release plan, as well as documented known issues for release
candidates, are available at:

  http://wiki.qemu.org/Planning/6.1

Please add entries to the ChangeLog for the 6.1 release below:

  http://wiki.qemu.org/ChangeLog/6.1

Thank you to everyone involved!

Changes since rc3:

ecf2706e27: Update version for v6.1.0-rc4 release (Peter Maydell)
1c4c685936: softmmu/physmem: fix wrong assertion in qemu_ram_alloc_internal() 
(David Hildenbrand)
0572edc55b: qapi/machine.json: Remove zero value reference from 
SMPConfiguration documentation (Andrew Jones)
ea0aa1752c: hw/core: fix error checking in smp_parse (Daniel P. Berrangé)
0b46318170: hw/core: Add missing return on error (Philippe Mathieu-Daudé)
24d84c7e48: target/i386: Fixed size of constant for Windows (Lara Lazier)
a7686d5d85: Hexagon (disas/hexagon.c) fix memory leak for early exit cases 
(Taylor Simpson)
36b508993c: docs/about/removed-features: Document removed machines from older 
QEMU versions (Thomas Huth)
5643fcdd42: docs/about/removed-features: Document removed devices from older 
QEMU versions (Thomas Huth)
5d82c10160: docs/about/removed-features: Document removed HMP commands from 
QEMU v2.12 (Thomas Huth)
29e0447551: docs/about/removed-features: Document removed CLI options from QEMU 
v3.1 (Thomas Huth)
8cc461c185: docs/about/removed-features: Document removed CLI options from QEMU 
v3.0 (Thomas Huth)
3d9c7ec955: docs/about/removed-features: Document removed CLI options from QEMU 
v2.12 (Thomas Huth)
3973e7ae63: fuzz: avoid building twice, when running on gitlab (Alexander 
Bulekov)
b063c290f3: tests/qtest/vhost-user-blk-test: Check whether qemu-storage-daemon 
is available (Thomas Huth)
cc1838c25d: storage-daemon: Add missing build dependency to the 
vhost-user-blk-test (Thomas Huth)
a1f0f36838: gitlab: skip many more targets in windows cross builds (Daniel P. 
Berrangé)
a4de5e8a06: MAINTAINERS: update virtio-gpu entry. (Gerd Hoffmann)
1e2edb9866: MAINTAINERS: update virtio-input entry. (Gerd Hoffmann)
cd02c965c4: MAINTAINERS: update usb entries. (Gerd Hoffmann)
227b1638ba: MAINTAINERS: update spice entry. (Gerd Hoffmann)
6bc915f31a: MAINTAINERS: update audio entry. (Gerd Hoffmann)
8f6259055a: MAINTAINERS: update sockets entry. (Gerd Hoffmann)
f492bdf4ab: MAINTAINERS: update edk2 entry. (Gerd Hoffmann)
a62354915b: gitlab: exclude sparc-softmmu and riscv32-softmmu from cross builds 
(Daniel P. Berrangé)



Re: [PULL 24/27] accel/tcg: Move breakpoint recognition outside translation

2021-08-17 Thread Richard Henderson

On 8/17/21 5:39 AM, Richard Henderson wrote:

Hmm, you're correct that we get this wrong.


We probably didn't do these in the right priority
order before this series, though, and I dunno whether
we get the insn-abort vs swstep ordering right either...


And you're correct that we got it wrong beforehand.  The reorg did not alter the 
recognized ordering of the exceptions.


I'm a bit surprised that insn-abort comes higher than breakpoint.


That would be because I mis-remembered the language.

Going back to the list,

  4 - software step
  6 - pc alignment fault
  7 - instruction abort (address translation!)
  8 - breakpoint exceptions
  9 - illegal execution state
 10 - software breakpoint (brk)
 11 - BTI exceptions
 12 - el2 traps
 13 - undefined exceptions

I thought "insn-abort" was #13, but it's really #7.

Well, behaviour is unchanged, since we check for address match before calling 
arm_ldl_code, before and after the reorg.


So: we need to suppress breakpoints (#8) for any higher-priority condition.  For #7, that 
might require some extra generic work.  I should have a look at the other targets that use 
architectural breakpoints to see what happens for a breakpoint on an unmapped page.


I'll work something out.


r~



Re: [RFC PATCH 00/13] Add support for Mirror VM.

2021-08-17 Thread Steve Rutherford
On Tue, Aug 17, 2021 at 1:50 PM Tobin Feldman-Fitzthum
 wrote:
>
>
> On 8/17/21 12:32 PM, Paolo Bonzini wrote:
> > There's three possibilities for this:
> >
> > 1) the easy one: the bottom 4G of guest memory are mapped in the
> > mirror VM 1:1.  The ram_addr_t-based addresses are shifted by either
> > 4G or a huge value such as 2^42 (MAXPHYADDR - physical address
> > reduction - 1). This even lets the migration helper reuse the OVMF
> > runtime services memory map (but be careful about thread safety...).
>
> This is essentially what we do in our prototype, although we have an
> even simpler approach. We have a 1:1 mapping that maps an address to
> itself with the cbit set. During Migration QEMU asks the migration
> handler to import/export encrypted pages and provides the GPA for said
> page. Since the migration handler only exports/imports encrypted pages,
> we can have the cbit set for every page in our mapping. We can still use
> OVMF functions with these mappings because they are on encrypted pages.
> The MH does need to use a few shared pages (to communicate with QEMU,
> for instance), so we have another mapping without the cbit that is at a
> large offset.
>
> I think this is basically equivalent to what you suggest. As you point
> out above, this approach does require that any page that will be
> exported/imported by the MH is mapped in the guest. Is this a bad
> assumption? The VMSA for SEV-ES is one example of a region that is
> encrypted but not mapped in the guest (the PSP handles it directly). We
> have been planning to map the VMSA into the guest to support migration
> with SEV-ES (along with other changes).

Ahh, it sounds like you are looking into sidestepping the existing
AMD-SP flows for migration. I assume the idea is to spin up a VM on
the target side, and have the two VMs attest to each other. How do the
two sides know if the other is legitimate? I take it that the source
is directing the LAUNCH flows?


--Steve



Re: [RFC PATCH 00/13] Add support for Mirror VM.

2021-08-17 Thread Steve Rutherford
On Tue, Aug 17, 2021 at 9:32 AM Paolo Bonzini  wrote:
>
> On 17/08/21 01:53, Steve Rutherford wrote:
> > Separately, I'm a little wary of leaving the migration helper mapped
> > into the shared address space as writable.
>
> A related question here is what the API should be for how the migration
> helper sees the memory in both physical and virtual address.
>
> First of all, I would like the addresses passed to and from the
> migration helper to *not* be guest physical addresses (this is what I
> referred to as QEMU's ram_addr_t in other messages).  The reason is that
> some unmapped memory regions, such as virtio-mem hotplugged memory,
> would still have to be transferred and could be encrypted.  While the
> guest->host hypercall interface uses guest physical addresses to
> communicate which pages are encrypted, the host can do the
> GPA->ram_addr_t conversion and remember the encryption status of
> currently-unmapped regions.
>
> This poses a problem, in that the guest needs to prepare the page tables
> for the migration helper and those need to use the migration helper's
> physical address space.
>
> There's three possibilities for this:
>
> 1) the easy one: the bottom 4G of guest memory are mapped in the mirror
> VM 1:1.  The ram_addr_t-based addresses are shifted by either 4G or a
> huge value such as 2^42 (MAXPHYADDR - physical address reduction - 1).
> This even lets the migration helper reuse the OVMF runtime services
> memory map (but be careful about thread safety...).

If I understand what you are proposing, this would only work for
SEV/SEV-ES, since the RMP prevents these remapping games. This makes
me less enthusiastic about this (but I suspect that's why you call
this less future proof).
>
> 2) the more future-proof one.  Here, the migration helper tells QEMU
> which area to copy from the guest to the mirror VM, as a (main GPA,
> length, mirror GPA) tuple.  This could happen for example the first time
> the guest writes 1 to MSR_KVM_MIGRATION_CONTROL.  When migration starts,
> QEMU uses this information to issue KVM_SET_USER_MEMORY_REGION
> accordingly.  The page tables are built for this (usually very high)
> mirror GPA and the migration helper operates in a completely separate
> address space.  However, the backing memory would still be shared
> between the main and mirror VMs.  I am saying this is more future proof
> because we have more flexibility in setting up the physical address
> space of the mirror VM.
>
> 3) the paranoid one, which I think is what you hint at above: this is an
> extension of (2), where userspace invokes the PSP send/receive API to
> copy the small requested area of the main VM into the mirror VM.  The
> mirror VM code and data are completely separate from the main VM.  All
> that the mirror VM shares is the ram_addr_t data.  Though I am not even
> sure it is possible to use the send/receive API this way...

More so, what I was hinting at was treating the MH's code and data the
way firmware is treated, i.e. initializing it via LAUNCH_UPDATE_DATA.
Getting the guest to trust host-supplied code (i.e. firmware) needs to
happen regardless.

>
> What do you think?

My intuition for this leans more on the host, but matches some of the
bits you've mentioned in (2)/(3). My intuition would be to put the
migration helper incredibly high in gPA space, so that it does not
collide with the rest of the guest (and can then stay in the same
place for a fairly long period of time without needing to poke a hole
in the guest). Then you can leave the ram_addr_t-based addresses
mapped normally (without the offsetting). All this together allows the
migration helper to be orthogonal to the normal guest and normal
firmware.

In this case, since the migration helper has a somewhat stable base
address, you can have a prebaked entry point and page tables
(determined at build time). The shared communication pages can come
from neighboring high-memory. The migration helper can support a
straightforward halt loop (or PIO loop, or whatever) where it reads
from a predefined page to find what work needs to be done (perhaps
with that page depending on which CPU it is, so you can support
multithreading of the migration helper). Additionally, having it high
in memory makes it quite easy to assess who owns which addresses: high
mem is under the purview of the migration helper and does not need to
be dirty tracked. Only "low" memory can and needs to be encrypted for
transport to the target side.
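
(To make that concrete, a minimal sketch of the loop shape I mean -- every
structure and field name here is invented for illustration:)

    #include <stdint.h>

    /* Hypothetical per-vCPU mailbox page at a prebaked high GPA. */
    struct mh_mailbox {
        volatile uint32_t cmd;   /* 0 = idle, 1 = export page, 2 = import page */
        volatile uint64_t gpa;   /* "low" guest page to operate on */
        volatile uint32_t done;  /* set by the helper when cmd completes */
    };

    static void mh_main(struct mh_mailbox *mb)
    {
        for (;;) {
            while (mb->cmd == 0) {
                /* halt / PIO / pause until QEMU posts work */
            }
            switch (mb->cmd) {
            case 1:
                /* encrypt the page at mb->gpa and copy it out for transport */
                break;
            case 2:
                /* copy in and decrypt the page at mb->gpa */
                break;
            }
            mb->done = 1;
            mb->cmd = 0;
        }
    }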

--Steve
>
> Paolo
>



[PATCH v2 20/21] target/riscv: Tidy trans_rvh.c.inc

2021-08-17 Thread Richard Henderson
Exit early if check_access fails.
Split out do_hlv, do_hsv, do_hlvx subroutines.
Use dest_gpr, get_gpr in the new subroutines.

Signed-off-by: Richard Henderson 
---
 target/riscv/insn32.decode  |   1 +
 target/riscv/insn_trans/trans_rvh.c.inc | 266 +---
 2 files changed, 57 insertions(+), 210 deletions(-)

diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index f09f8d5faf..2cd921d51c 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -42,6 +42,7 @@
 &j    imm rd
 &r    rd rs1 rs2
 &r2   rd rs1
+&r2_s rs1 rs2
 &s    imm rs1 rs2
 &u    imm rd
 &shift  shamt rs1 rd
diff --git a/target/riscv/insn_trans/trans_rvh.c.inc b/target/riscv/insn_trans/trans_rvh.c.inc
index 585eb1d87e..ecbf77ff9c 100644
--- a/target/riscv/insn_trans/trans_rvh.c.inc
+++ b/target/riscv/insn_trans/trans_rvh.c.inc
@@ -17,281 +17,139 @@
  */
 
 #ifndef CONFIG_USER_ONLY
-static void check_access(DisasContext *ctx) {
+static bool check_access(DisasContext *ctx)
+{
 if (!ctx->hlsx) {
 if (ctx->virt_enabled) {
 generate_exception(ctx, RISCV_EXCP_VIRT_INSTRUCTION_FAULT);
 } else {
 generate_exception(ctx, RISCV_EXCP_ILLEGAL_INST);
 }
+return false;
 }
+return true;
 }
 #endif
 
+static bool do_hlv(DisasContext *ctx, arg_r2 *a, MemOp mop)
+{
+#ifdef CONFIG_USER_ONLY
+return false;
+#else
+if (check_access(ctx)) {
+TCGv dest = dest_gpr(ctx, a->rd);
+TCGv addr = get_gpr(ctx, a->rs1, EXT_NONE);
+int mem_idx = ctx->mem_idx | TB_FLAGS_PRIV_HYP_ACCESS_MASK;
+tcg_gen_qemu_ld_tl(dest, addr, mem_idx, mop);
+gen_set_gpr(ctx, a->rd, dest);
+}
+return true;
+#endif
+}
+
 static bool trans_hlv_b(DisasContext *ctx, arg_hlv_b *a)
 {
 REQUIRE_EXT(ctx, RVH);
-#ifndef CONFIG_USER_ONLY
-TCGv t0 = tcg_temp_new();
-TCGv t1 = tcg_temp_new();
-
-check_access(ctx);
-
-gen_get_gpr(ctx, t0, a->rs1);
-
-tcg_gen_qemu_ld_tl(t1, t0, ctx->mem_idx | TB_FLAGS_PRIV_HYP_ACCESS_MASK, MO_SB);
-gen_set_gpr(ctx, a->rd, t1);
-
-tcg_temp_free(t0);
-tcg_temp_free(t1);
-return true;
-#else
-return false;
-#endif
+return do_hlv(ctx, a, MO_SB);
 }
 
 static bool trans_hlv_h(DisasContext *ctx, arg_hlv_h *a)
 {
 REQUIRE_EXT(ctx, RVH);
-#ifndef CONFIG_USER_ONLY
-TCGv t0 = tcg_temp_new();
-TCGv t1 = tcg_temp_new();
-
-check_access(ctx);
-
-gen_get_gpr(ctx, t0, a->rs1);
-
-tcg_gen_qemu_ld_tl(t1, t0, ctx->mem_idx | TB_FLAGS_PRIV_HYP_ACCESS_MASK, MO_TESW);
-gen_set_gpr(ctx, a->rd, t1);
-
-tcg_temp_free(t0);
-tcg_temp_free(t1);
-return true;
-#else
-return false;
-#endif
+return do_hlv(ctx, a, MO_TESW);
 }
 
 static bool trans_hlv_w(DisasContext *ctx, arg_hlv_w *a)
 {
 REQUIRE_EXT(ctx, RVH);
-#ifndef CONFIG_USER_ONLY
-TCGv t0 = tcg_temp_new();
-TCGv t1 = tcg_temp_new();
-
-check_access(ctx);
-
-gen_get_gpr(ctx, t0, a->rs1);
-
-tcg_gen_qemu_ld_tl(t1, t0, ctx->mem_idx | TB_FLAGS_PRIV_HYP_ACCESS_MASK, MO_TESL);
-gen_set_gpr(ctx, a->rd, t1);
-
-tcg_temp_free(t0);
-tcg_temp_free(t1);
-return true;
-#else
-return false;
-#endif
+return do_hlv(ctx, a, MO_TESL);
 }
 
 static bool trans_hlv_bu(DisasContext *ctx, arg_hlv_bu *a)
 {
 REQUIRE_EXT(ctx, RVH);
-#ifndef CONFIG_USER_ONLY
-TCGv t0 = tcg_temp_new();
-TCGv t1 = tcg_temp_new();
-
-check_access(ctx);
-
-gen_get_gpr(ctx, t0, a->rs1);
-
-tcg_gen_qemu_ld_tl(t1, t0, ctx->mem_idx | TB_FLAGS_PRIV_HYP_ACCESS_MASK, MO_UB);
-gen_set_gpr(ctx, a->rd, t1);
-
-tcg_temp_free(t0);
-tcg_temp_free(t1);
-return true;
-#else
-return false;
-#endif
+return do_hlv(ctx, a, MO_UB);
 }
 
 static bool trans_hlv_hu(DisasContext *ctx, arg_hlv_hu *a)
 {
 REQUIRE_EXT(ctx, RVH);
-#ifndef CONFIG_USER_ONLY
-TCGv t0 = tcg_temp_new();
-TCGv t1 = tcg_temp_new();
+return do_hlv(ctx, a, MO_TEUW);
+}
 
-check_access(ctx);
-
-gen_get_gpr(ctx, t0, a->rs1);
-tcg_gen_qemu_ld_tl(t1, t0, ctx->mem_idx | TB_FLAGS_PRIV_HYP_ACCESS_MASK, MO_TEUW);
-gen_set_gpr(ctx, a->rd, t1);
-
-tcg_temp_free(t0);
-tcg_temp_free(t1);
-return true;
-#else
+static bool do_hsv(DisasContext *ctx, arg_r2_s *a, MemOp mop)
+{
+#ifdef CONFIG_USER_ONLY
 return false;
+#else
+if (check_access(ctx)) {
+TCGv addr = get_gpr(ctx, a->rs1, EXT_NONE);
+TCGv data = get_gpr(ctx, a->rs2, EXT_NONE);
+int mem_idx = ctx->mem_idx | TB_FLAGS_PRIV_HYP_ACCESS_MASK;
+tcg_gen_qemu_st_tl(data, addr, mem_idx, mop);
+}
+return true;
 #endif
 }
 
 static bool trans_hsv_b(DisasContext *ctx, arg_hsv_b *a)
 {
 REQUIRE_EXT(ctx, RVH);
-#ifndef CONFIG_USER_ONLY
-TCGv t0 = tcg_temp_new();
-TCGv dat = tcg_temp_new();
-
-check_access(ctx);
-
-gen_get_gpr(ctx, t0, a->rs1);
-gen_get_gpr(ctx, dat, a->rs2);
-
-tcg_gen_qemu_st_tl(dat, t0, ctx->mem_idx | 

[PATCH v2 19/21] target/riscv: Use {get,dest}_gpr for RVD

2021-08-17 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/riscv/insn_trans/trans_rvd.c.inc | 125 
 1 file changed, 60 insertions(+), 65 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvd.c.inc b/target/riscv/insn_trans/trans_rvd.c.inc
index 11b9b3f90b..db9ae15755 100644
--- a/target/riscv/insn_trans/trans_rvd.c.inc
+++ b/target/riscv/insn_trans/trans_rvd.c.inc
@@ -20,30 +20,40 @@
 
 static bool trans_fld(DisasContext *ctx, arg_fld *a)
 {
+TCGv addr;
+
 REQUIRE_FPU;
 REQUIRE_EXT(ctx, RVD);
-TCGv t0 = tcg_temp_new();
-gen_get_gpr(ctx, t0, a->rs1);
-tcg_gen_addi_tl(t0, t0, a->imm);
 
-tcg_gen_qemu_ld_i64(cpu_fpr[a->rd], t0, ctx->mem_idx, MO_TEQ);
+addr = get_gpr(ctx, a->rs1, EXT_NONE);
+if (a->imm) {
+TCGv temp = temp_new(ctx);
+tcg_gen_addi_tl(temp, addr, a->imm);
+addr = temp;
+}
+
+tcg_gen_qemu_ld_i64(cpu_fpr[a->rd], addr, ctx->mem_idx, MO_TEQ);
 
 mark_fs_dirty(ctx);
-tcg_temp_free(t0);
 return true;
 }
 
 static bool trans_fsd(DisasContext *ctx, arg_fsd *a)
 {
+TCGv addr;
+
 REQUIRE_FPU;
 REQUIRE_EXT(ctx, RVD);
-TCGv t0 = tcg_temp_new();
-gen_get_gpr(ctx, t0, a->rs1);
-tcg_gen_addi_tl(t0, t0, a->imm);
 
-tcg_gen_qemu_st_i64(cpu_fpr[a->rs2], t0, ctx->mem_idx, MO_TEQ);
+addr = get_gpr(ctx, a->rs1, EXT_NONE);
+if (a->imm) {
+TCGv temp = temp_new(ctx);
+tcg_gen_addi_tl(temp, addr, a->imm);
+addr = temp;
+}
+
+tcg_gen_qemu_st_i64(cpu_fpr[a->rs2], addr, ctx->mem_idx, MO_TEQ);
 
-tcg_temp_free(t0);
 return true;
 }
 
@@ -252,11 +262,10 @@ static bool trans_feq_d(DisasContext *ctx, arg_feq_d *a)
 REQUIRE_FPU;
 REQUIRE_EXT(ctx, RVD);
 
-TCGv t0 = tcg_temp_new();
-gen_helper_feq_d(t0, cpu_env, cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
-gen_set_gpr(ctx, a->rd, t0);
-tcg_temp_free(t0);
+TCGv dest = dest_gpr(ctx, a->rd);
 
+gen_helper_feq_d(dest, cpu_env, cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
+gen_set_gpr(ctx, a->rd, dest);
 return true;
 }
 
@@ -265,11 +274,10 @@ static bool trans_flt_d(DisasContext *ctx, arg_flt_d *a)
 REQUIRE_FPU;
 REQUIRE_EXT(ctx, RVD);
 
-TCGv t0 = tcg_temp_new();
-gen_helper_flt_d(t0, cpu_env, cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
-gen_set_gpr(ctx, a->rd, t0);
-tcg_temp_free(t0);
+TCGv dest = dest_gpr(ctx, a->rd);
 
+gen_helper_flt_d(dest, cpu_env, cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
+gen_set_gpr(ctx, a->rd, dest);
 return true;
 }
 
@@ -278,11 +286,10 @@ static bool trans_fle_d(DisasContext *ctx, arg_fle_d *a)
 REQUIRE_FPU;
 REQUIRE_EXT(ctx, RVD);
 
-TCGv t0 = tcg_temp_new();
-gen_helper_fle_d(t0, cpu_env, cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
-gen_set_gpr(ctx, a->rd, t0);
-tcg_temp_free(t0);
+TCGv dest = dest_gpr(ctx, a->rd);
 
+gen_helper_fle_d(dest, cpu_env, cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
+gen_set_gpr(ctx, a->rd, dest);
 return true;
 }
 
@@ -291,10 +298,10 @@ static bool trans_fclass_d(DisasContext *ctx, arg_fclass_d *a)
 REQUIRE_FPU;
 REQUIRE_EXT(ctx, RVD);
 
-TCGv t0 = tcg_temp_new();
-gen_helper_fclass_d(t0, cpu_fpr[a->rs1]);
-gen_set_gpr(ctx, a->rd, t0);
-tcg_temp_free(t0);
+TCGv dest = dest_gpr(ctx, a->rd);
+
+gen_helper_fclass_d(dest, cpu_fpr[a->rs1]);
+gen_set_gpr(ctx, a->rd, dest);
 return true;
 }
 
@@ -303,12 +310,11 @@ static bool trans_fcvt_w_d(DisasContext *ctx, arg_fcvt_w_d *a)
 REQUIRE_FPU;
 REQUIRE_EXT(ctx, RVD);
 
-TCGv t0 = tcg_temp_new();
-gen_set_rm(ctx, a->rm);
-gen_helper_fcvt_w_d(t0, cpu_env, cpu_fpr[a->rs1]);
-gen_set_gpr(ctx, a->rd, t0);
-tcg_temp_free(t0);
+TCGv dest = dest_gpr(ctx, a->rd);
 
+gen_set_rm(ctx, a->rm);
+gen_helper_fcvt_w_d(dest, cpu_env, cpu_fpr[a->rs1]);
+gen_set_gpr(ctx, a->rd, dest);
 return true;
 }
 
@@ -317,12 +323,11 @@ static bool trans_fcvt_wu_d(DisasContext *ctx, arg_fcvt_wu_d *a)
 REQUIRE_FPU;
 REQUIRE_EXT(ctx, RVD);
 
-TCGv t0 = tcg_temp_new();
-gen_set_rm(ctx, a->rm);
-gen_helper_fcvt_wu_d(t0, cpu_env, cpu_fpr[a->rs1]);
-gen_set_gpr(ctx, a->rd, t0);
-tcg_temp_free(t0);
+TCGv dest = dest_gpr(ctx, a->rd);
 
+gen_set_rm(ctx, a->rm);
+gen_helper_fcvt_wu_d(dest, cpu_env, cpu_fpr[a->rs1]);
+gen_set_gpr(ctx, a->rd, dest);
 return true;
 }
 
@@ -331,12 +336,10 @@ static bool trans_fcvt_d_w(DisasContext *ctx, arg_fcvt_d_w *a)
 REQUIRE_FPU;
 REQUIRE_EXT(ctx, RVD);
 
-TCGv t0 = tcg_temp_new();
-gen_get_gpr(ctx, t0, a->rs1);
+TCGv src = get_gpr(ctx, a->rs1, EXT_SIGN);
 
 gen_set_rm(ctx, a->rm);
-gen_helper_fcvt_d_w(cpu_fpr[a->rd], cpu_env, t0);
-tcg_temp_free(t0);
+gen_helper_fcvt_d_w(cpu_fpr[a->rd], cpu_env, src);
 
 mark_fs_dirty(ctx);
 return true;
@@ -347,12 +350,10 @@ static bool trans_fcvt_d_wu(DisasContext *ctx, arg_fcvt_d_wu *a)
 REQUIRE_FPU;
 REQUIRE_EXT(ctx, RVD);
 
- 

[PATCH v2 18/21] target/riscv: Use {get,dest}_gpr for RVF

2021-08-17 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/riscv/insn_trans/trans_rvf.c.inc | 146 
 1 file changed, 70 insertions(+), 76 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvf.c.inc b/target/riscv/insn_trans/trans_rvf.c.inc
index fb9f7f9c00..bddbd418d9 100644
--- a/target/riscv/insn_trans/trans_rvf.c.inc
+++ b/target/riscv/insn_trans/trans_rvf.c.inc
@@ -25,32 +25,43 @@
 
 static bool trans_flw(DisasContext *ctx, arg_flw *a)
 {
+TCGv_i64 dest;
+TCGv addr;
+
 REQUIRE_FPU;
 REQUIRE_EXT(ctx, RVF);
-TCGv t0 = tcg_temp_new();
-gen_get_gpr(ctx, t0, a->rs1);
-tcg_gen_addi_tl(t0, t0, a->imm);
 
-tcg_gen_qemu_ld_i64(cpu_fpr[a->rd], t0, ctx->mem_idx, MO_TEUL);
-gen_nanbox_s(cpu_fpr[a->rd], cpu_fpr[a->rd]);
+addr = get_gpr(ctx, a->rs1, EXT_NONE);
+if (a->imm) {
+TCGv temp = temp_new(ctx);
+tcg_gen_addi_tl(temp, addr, a->imm);
+addr = temp;
+}
+
+dest = cpu_fpr[a->rd];
+tcg_gen_qemu_ld_i64(dest, addr, ctx->mem_idx, MO_TEUL);
+gen_nanbox_s(dest, dest);
 
-tcg_temp_free(t0);
 mark_fs_dirty(ctx);
 return true;
 }
 
 static bool trans_fsw(DisasContext *ctx, arg_fsw *a)
 {
+TCGv addr;
+
 REQUIRE_FPU;
 REQUIRE_EXT(ctx, RVF);
-TCGv t0 = tcg_temp_new();
-gen_get_gpr(ctx, t0, a->rs1);
 
-tcg_gen_addi_tl(t0, t0, a->imm);
+addr = get_gpr(ctx, a->rs1, EXT_NONE);
+if (a->imm) {
+TCGv temp = tcg_temp_new();
+tcg_gen_addi_tl(temp, addr, a->imm);
+addr = temp;
+}
 
-tcg_gen_qemu_st_i64(cpu_fpr[a->rs2], t0, ctx->mem_idx, MO_TEUL);
+tcg_gen_qemu_st_i64(cpu_fpr[a->rs2], addr, ctx->mem_idx, MO_TEUL);
 
-tcg_temp_free(t0);
 return true;
 }
 
@@ -271,12 +282,11 @@ static bool trans_fcvt_w_s(DisasContext *ctx, arg_fcvt_w_s *a)
 REQUIRE_FPU;
 REQUIRE_EXT(ctx, RVF);
 
-TCGv t0 = tcg_temp_new();
-gen_set_rm(ctx, a->rm);
-gen_helper_fcvt_w_s(t0, cpu_env, cpu_fpr[a->rs1]);
-gen_set_gpr(ctx, a->rd, t0);
-tcg_temp_free(t0);
+TCGv dest = dest_gpr(ctx, a->rd);
 
+gen_set_rm(ctx, a->rm);
+gen_helper_fcvt_w_s(dest, cpu_env, cpu_fpr[a->rs1]);
+gen_set_gpr(ctx, a->rd, dest);
 return true;
 }
 
@@ -285,12 +295,11 @@ static bool trans_fcvt_wu_s(DisasContext *ctx, arg_fcvt_wu_s *a)
 REQUIRE_FPU;
 REQUIRE_EXT(ctx, RVF);
 
-TCGv t0 = tcg_temp_new();
-gen_set_rm(ctx, a->rm);
-gen_helper_fcvt_wu_s(t0, cpu_env, cpu_fpr[a->rs1]);
-gen_set_gpr(ctx, a->rd, t0);
-tcg_temp_free(t0);
+TCGv dest = dest_gpr(ctx, a->rd);
 
+gen_set_rm(ctx, a->rm);
+gen_helper_fcvt_wu_s(dest, cpu_env, cpu_fpr[a->rs1]);
+gen_set_gpr(ctx, a->rd, dest);
 return true;
 }
 
@@ -300,17 +309,15 @@ static bool trans_fmv_x_w(DisasContext *ctx, arg_fmv_x_w *a)
 REQUIRE_FPU;
 REQUIRE_EXT(ctx, RVF);
 
-TCGv t0 = tcg_temp_new();
+TCGv dest = dest_gpr(ctx, a->rd);
 
 #if defined(TARGET_RISCV64)
-tcg_gen_ext32s_tl(t0, cpu_fpr[a->rs1]);
+tcg_gen_ext32s_tl(dest, cpu_fpr[a->rs1]);
 #else
-tcg_gen_extrl_i64_i32(t0, cpu_fpr[a->rs1]);
+tcg_gen_extrl_i64_i32(dest, cpu_fpr[a->rs1]);
 #endif
 
-gen_set_gpr(ctx, a->rd, t0);
-tcg_temp_free(t0);
-
+gen_set_gpr(ctx, a->rd, dest);
 return true;
 }
 
@@ -318,10 +325,11 @@ static bool trans_feq_s(DisasContext *ctx, arg_feq_s *a)
 {
 REQUIRE_FPU;
 REQUIRE_EXT(ctx, RVF);
-TCGv t0 = tcg_temp_new();
-gen_helper_feq_s(t0, cpu_env, cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
-gen_set_gpr(ctx, a->rd, t0);
-tcg_temp_free(t0);
+
+TCGv dest = dest_gpr(ctx, a->rd);
+
+gen_helper_feq_s(dest, cpu_env, cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
+gen_set_gpr(ctx, a->rd, dest);
 return true;
 }
 
@@ -329,10 +337,11 @@ static bool trans_flt_s(DisasContext *ctx, arg_flt_s *a)
 {
 REQUIRE_FPU;
 REQUIRE_EXT(ctx, RVF);
-TCGv t0 = tcg_temp_new();
-gen_helper_flt_s(t0, cpu_env, cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
-gen_set_gpr(ctx, a->rd, t0);
-tcg_temp_free(t0);
+
+TCGv dest = dest_gpr(ctx, a->rd);
+
+gen_helper_flt_s(dest, cpu_env, cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
+gen_set_gpr(ctx, a->rd, dest);
 return true;
 }
 
@@ -340,10 +349,11 @@ static bool trans_fle_s(DisasContext *ctx, arg_fle_s *a)
 {
 REQUIRE_FPU;
 REQUIRE_EXT(ctx, RVF);
-TCGv t0 = tcg_temp_new();
-gen_helper_fle_s(t0, cpu_env, cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
-gen_set_gpr(ctx, a->rd, t0);
-tcg_temp_free(t0);
+
+TCGv dest = dest_gpr(ctx, a->rd);
+
+gen_helper_fle_s(dest, cpu_env, cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
+gen_set_gpr(ctx, a->rd, dest);
 return true;
 }
 
@@ -352,13 +362,10 @@ static bool trans_fclass_s(DisasContext *ctx, arg_fclass_s *a)
 REQUIRE_FPU;
 REQUIRE_EXT(ctx, RVF);
 
-TCGv t0 = tcg_temp_new();
-
-gen_helper_fclass_s(t0, cpu_fpr[a->rs1]);
-
-gen_set_gpr(ctx, a->rd, t0);
-tcg_temp_free(t0);
+TCGv dest = dest_gpr(ctx, a->rd);
 
+

[PATCH v2 14/21] target/riscv: Use {get,dest}_gpr for integer load/store

2021-08-17 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/riscv/insn_trans/trans_rvi.c.inc | 36 +
 1 file changed, 19 insertions(+), 17 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvi.c.inc b/target/riscv/insn_trans/trans_rvi.c.inc
index af3e0bc0e6..f616a26c82 100644
--- a/target/riscv/insn_trans/trans_rvi.c.inc
+++ b/target/riscv/insn_trans/trans_rvi.c.inc
@@ -138,15 +138,17 @@ static bool trans_bgeu(DisasContext *ctx, arg_bgeu *a)
 
 static bool gen_load(DisasContext *ctx, arg_lb *a, MemOp memop)
 {
-TCGv t0 = tcg_temp_new();
-TCGv t1 = tcg_temp_new();
-gen_get_gpr(ctx, t0, a->rs1);
-tcg_gen_addi_tl(t0, t0, a->imm);
+TCGv dest = dest_gpr(ctx, a->rd);
+TCGv addr = get_gpr(ctx, a->rs1, EXT_NONE);
 
-tcg_gen_qemu_ld_tl(t1, t0, ctx->mem_idx, memop);
-gen_set_gpr(ctx, a->rd, t1);
-tcg_temp_free(t0);
-tcg_temp_free(t1);
+if (a->imm) {
+TCGv temp = temp_new(ctx);
+tcg_gen_addi_tl(temp, addr, a->imm);
+addr = temp;
+}
+
+tcg_gen_qemu_ld_tl(dest, addr, ctx->mem_idx, memop);
+gen_set_gpr(ctx, a->rd, dest);
 return true;
 }
 
@@ -177,19 +179,19 @@ static bool trans_lhu(DisasContext *ctx, arg_lhu *a)
 
 static bool gen_store(DisasContext *ctx, arg_sb *a, MemOp memop)
 {
-TCGv t0 = tcg_temp_new();
-TCGv dat = tcg_temp_new();
-gen_get_gpr(ctx, t0, a->rs1);
-tcg_gen_addi_tl(t0, t0, a->imm);
-gen_get_gpr(ctx, dat, a->rs2);
+TCGv addr = get_gpr(ctx, a->rs1, EXT_NONE);
+TCGv data = get_gpr(ctx, a->rs2, EXT_NONE);
 
-tcg_gen_qemu_st_tl(dat, t0, ctx->mem_idx, memop);
-tcg_temp_free(t0);
-tcg_temp_free(dat);
+if (a->imm) {
+TCGv temp = temp_new(ctx);
+tcg_gen_addi_tl(temp, addr, a->imm);
+addr = temp;
+}
+
+tcg_gen_qemu_st_tl(data, addr, ctx->mem_idx, memop);
 return true;
 }
 
-
 static bool trans_sb(DisasContext *ctx, arg_sb *a)
 {
 return gen_store(ctx, a, MO_SB);
-- 
2.25.1




[PATCH v2 17/21] target/riscv: Use gen_shift_imm_fn for slli_uw

2021-08-17 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/riscv/insn_trans/trans_rvb.c.inc | 23 ++-
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvb.c.inc b/target/riscv/insn_trans/trans_rvb.c.inc
index af7694ed29..d5a036b1f3 100644
--- a/target/riscv/insn_trans/trans_rvb.c.inc
+++ b/target/riscv/insn_trans/trans_rvb.c.inc
@@ -647,21 +647,18 @@ static bool trans_add_uw(DisasContext *ctx, arg_add_uw *a)
 return gen_arith(ctx, a, EXT_NONE, gen_add_uw);
 }
 
+static void gen_slli_uw(TCGv dest, TCGv src, target_long shamt)
+{
+if (shamt < 32) {
+tcg_gen_deposit_z_tl(dest, src, shamt, 32);
+} else {
+tcg_gen_shli_tl(dest, src, shamt);
+}
+}
+
 static bool trans_slli_uw(DisasContext *ctx, arg_slli_uw *a)
 {
 REQUIRE_64BIT(ctx);
 REQUIRE_EXT(ctx, RVB);
-
-TCGv source1 = tcg_temp_new();
-gen_get_gpr(ctx, source1, a->rs1);
-
-if (a->shamt < 32) {
-tcg_gen_deposit_z_tl(source1, source1, a->shamt, 32);
-} else {
-tcg_gen_shli_tl(source1, source1, a->shamt);
-}
-
-gen_set_gpr(ctx, a->rd, source1);
-tcg_temp_free(source1);
-return true;
+return gen_shift_imm_fn(ctx, a, EXT_NONE, gen_slli_uw);
 }
-- 
2.25.1




[PATCH v2 15/21] target/riscv: Reorg csr instructions

2021-08-17 Thread Richard Henderson
Introduce csrr and csrw helpers, for read-only and write-only insns.

Note that we do not properly implement this in riscv_csrrw, in that
we cannot distinguish true read-only (rs1 == 0) from a zero write_mask
produced by another source register -- the latter should still raise an
exception for read-only registers.

Only issue gen_io_start for CF_USE_ICOUNT.
Use ctx->zero for csrrc.
Use get_gpr and dest_gpr.
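
To make the read-only ambiguity above concrete, an illustrative comment
(not part of the patch):

    /*
     * csrrs rd, csr, x0   -- decodes as a pure read; must never trap
     *                        on a read-only CSR.
     * csrrs rd, csr, rs1  -- a write with write_mask = gpr[rs1]; if that
     *                        value happens to be 0 at run time, riscv_csrrw
     *                        sees the same zero write_mask as the pure
     *                        read, yet architecturally it should still
     *                        trap on a read-only CSR.
     */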

Signed-off-by: Richard Henderson 
---
 target/riscv/helper.h   |   6 +-
 target/riscv/op_helper.c|  18 +--
 target/riscv/insn_trans/trans_rvi.c.inc | 172 +---
 3 files changed, 131 insertions(+), 65 deletions(-)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 415e37bc37..460eee9988 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -65,9 +65,9 @@ DEF_HELPER_FLAGS_2(gorc, TCG_CALL_NO_RWG_SE, tl, tl, tl)
 DEF_HELPER_FLAGS_2(gorcw, TCG_CALL_NO_RWG_SE, tl, tl, tl)
 
 /* Special functions */
-DEF_HELPER_3(csrrw, tl, env, tl, tl)
-DEF_HELPER_4(csrrs, tl, env, tl, tl, tl)
-DEF_HELPER_4(csrrc, tl, env, tl, tl, tl)
+DEF_HELPER_2(csrr, tl, env, int)
+DEF_HELPER_3(csrw, void, env, int, tl)
+DEF_HELPER_4(csrrw, tl, env, int, tl, tl)
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_2(sret, tl, env, tl)
 DEF_HELPER_2(mret, tl, env, tl)
diff --git a/target/riscv/op_helper.c b/target/riscv/op_helper.c
index 3c48e739ac..ee7c24efe7 100644
--- a/target/riscv/op_helper.c
+++ b/target/riscv/op_helper.c
@@ -37,11 +37,10 @@ void helper_raise_exception(CPURISCVState *env, uint32_t exception)
 riscv_raise_exception(env, exception, 0);
 }
 
-target_ulong helper_csrrw(CPURISCVState *env, target_ulong src,
-target_ulong csr)
+target_ulong helper_csrr(CPURISCVState *env, int csr)
 {
 target_ulong val = 0;
-RISCVException ret = riscv_csrrw(env, csr, &val, src, -1);
+RISCVException ret = riscv_csrrw(env, csr, , 0, 0);
 
 if (ret != RISCV_EXCP_NONE) {
 riscv_raise_exception(env, ret, GETPC());
@@ -49,23 +48,20 @@ target_ulong helper_csrrw(CPURISCVState *env, target_ulong 
src,
 return val;
 }
 
-target_ulong helper_csrrs(CPURISCVState *env, target_ulong src,
-target_ulong csr, target_ulong rs1_pass)
+void helper_csrw(CPURISCVState *env, int csr, target_ulong src)
 {
-target_ulong val = 0;
-RISCVException ret = riscv_csrrw(env, csr, &val, -1, rs1_pass ? src : 0);
+RISCVException ret = riscv_csrrw(env, csr, NULL, src, -1);
 
 if (ret != RISCV_EXCP_NONE) {
 riscv_raise_exception(env, ret, GETPC());
 }
-return val;
 }
 
-target_ulong helper_csrrc(CPURISCVState *env, target_ulong src,
-target_ulong csr, target_ulong rs1_pass)
+target_ulong helper_csrrw(CPURISCVState *env, int csr,
+  target_ulong src, target_ulong write_mask)
 {
 target_ulong val = 0;
-RISCVException ret = riscv_csrrw(env, csr, &val, 0, rs1_pass ? src : 0);
+RISCVException ret = riscv_csrrw(env, csr, &val, src, write_mask);
 
 if (ret != RISCV_EXCP_NONE) {
 riscv_raise_exception(env, ret, GETPC());
diff --git a/target/riscv/insn_trans/trans_rvi.c.inc b/target/riscv/insn_trans/trans_rvi.c.inc
index f616a26c82..688cb6a6ad 100644
--- a/target/riscv/insn_trans/trans_rvi.c.inc
+++ b/target/riscv/insn_trans/trans_rvi.c.inc
@@ -416,80 +416,150 @@ static bool trans_fence_i(DisasContext *ctx, arg_fence_i *a)
 return true;
 }
 
-#define RISCV_OP_CSR_PRE do {\
-source1 = tcg_temp_new(); \
-csr_store = tcg_temp_new(); \
-dest = tcg_temp_new(); \
-rs1_pass = tcg_temp_new(); \
-gen_get_gpr(ctx, source1, a->rs1); \
-tcg_gen_movi_tl(cpu_pc, ctx->base.pc_next); \
-tcg_gen_movi_tl(rs1_pass, a->rs1); \
-tcg_gen_movi_tl(csr_store, a->csr); \
-gen_io_start();\
-} while (0)
+static bool do_csr_post(DisasContext *ctx)
+{
+/* We may have changed important cpu state -- exit to main loop. */
+tcg_gen_movi_tl(cpu_pc, ctx->pc_succ_insn);
+exit_tb(ctx);
+ctx->base.is_jmp = DISAS_NORETURN;
+return true;
+}
 
-#define RISCV_OP_CSR_POST do {\
-gen_set_gpr(ctx, a->rd, dest); \
-tcg_gen_movi_tl(cpu_pc, ctx->pc_succ_insn); \
-exit_tb(ctx); \
-ctx->base.is_jmp = DISAS_NORETURN; \
-tcg_temp_free(source1); \
-tcg_temp_free(csr_store); \
-tcg_temp_free(dest); \
-tcg_temp_free(rs1_pass); \
-} while (0)
+static bool do_csrr(DisasContext *ctx, int rd, int rc)
+{
+TCGv dest = dest_gpr(ctx, rd);
+TCGv_i32 csr = tcg_constant_i32(rc);
 
+if (tb_cflags(ctx->base.tb) & CF_USE_ICOUNT) {
+gen_io_start();
+}
+gen_helper_csrr(dest, cpu_env, csr);
+gen_set_gpr(ctx, rd, dest);
+return do_csr_post(ctx);
+}
+
+static bool do_csrw(DisasContext *ctx, int rc, TCGv src)
+{
+TCGv_i32 csr = tcg_constant_i32(rc);
+
+if (tb_cflags(ctx->base.tb) & CF_USE_ICOUNT) {
+gen_io_start();
+}
+gen_helper_csrw(cpu_env, csr, src);
+return do_csr_post(ctx);
+}
+
+static bool do_csrrw(DisasContext *ctx, int rd, 

[PATCH v2 21/21] target/riscv: Use {get,dest}_gpr for RVV

2021-08-17 Thread Richard Henderson
Remove gen_get_gpr, as the function becomes unused.

Signed-off-by: Richard Henderson 
---
 target/riscv/translate.c| 13 ++---
 target/riscv/insn_trans/trans_rvv.c.inc | 74 +++--
 2 files changed, 26 insertions(+), 61 deletions(-)

diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 75e83fb41f..056d474faa 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -231,11 +231,6 @@ static TCGv get_gpr(DisasContext *ctx, int reg_num, DisasExtend ext)
 g_assert_not_reached();
 }
 
-static void gen_get_gpr(DisasContext *ctx, TCGv t, int reg_num)
-{
-tcg_gen_mov_tl(t, get_gpr(ctx, reg_num, EXT_NONE));
-}
-
 static TCGv dest_gpr(DisasContext *ctx, int reg_num)
 {
 if (reg_num == 0 || ctx->w) {
@@ -634,9 +629,11 @@ void riscv_translate_init(void)
 {
 int i;
 
-/* cpu_gpr[0] is a placeholder for the zero register. Do not use it. */
-/* Use the gen_set_gpr and gen_get_gpr helper functions when accessing */
-/* registers, unless you specifically block reads/writes to reg 0 */
+/*
+ * cpu_gpr[0] is a placeholder for the zero register. Do not use it.
+ * Use the gen_set_gpr and get_gpr helper functions when accessing regs,
+ * unless you specifically block reads/writes to reg 0.
+ */
 cpu_gpr[0] = NULL;
 
 for (i = 1; i < 32; i++) {
diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b/target/riscv/insn_trans/trans_rvv.c.inc
index de580c493c..fa451938f1 100644
--- a/target/riscv/insn_trans/trans_rvv.c.inc
+++ b/target/riscv/insn_trans/trans_rvv.c.inc
@@ -27,27 +27,22 @@ static bool trans_vsetvl(DisasContext *ctx, arg_vsetvl *a)
 return false;
 }
 
-s2 = tcg_temp_new();
-dst = tcg_temp_new();
+s2 = get_gpr(ctx, a->rs2, EXT_ZERO);
+dst = dest_gpr(ctx, a->rd);
 
 /* Using x0 as the rs1 register specifier, encodes an infinite AVL */
 if (a->rs1 == 0) {
 /* As the mask is at least one bit, RV_VLEN_MAX is >= VLMAX */
 s1 = tcg_constant_tl(RV_VLEN_MAX);
 } else {
-s1 = tcg_temp_new();
-gen_get_gpr(ctx, s1, a->rs1);
+s1 = get_gpr(ctx, a->rs1, EXT_ZERO);
 }
-gen_get_gpr(ctx, s2, a->rs2);
 gen_helper_vsetvl(dst, cpu_env, s1, s2);
 gen_set_gpr(ctx, a->rd, dst);
+
 tcg_gen_movi_tl(cpu_pc, ctx->pc_succ_insn);
 lookup_and_goto_ptr(ctx);
 ctx->base.is_jmp = DISAS_NORETURN;
-
-tcg_temp_free(s1);
-tcg_temp_free(s2);
-tcg_temp_free(dst);
 return true;
 }
 
@@ -60,23 +55,20 @@ static bool trans_vsetvli(DisasContext *ctx, arg_vsetvli *a)
 }
 
 s2 = tcg_constant_tl(a->zimm);
-dst = tcg_temp_new();
+dst = dest_gpr(ctx, a->rd);
 
 /* Using x0 as the rs1 register specifier, encodes an infinite AVL */
 if (a->rs1 == 0) {
 /* As the mask is at least one bit, RV_VLEN_MAX is >= VLMAX */
 s1 = tcg_constant_tl(RV_VLEN_MAX);
 } else {
-s1 = tcg_temp_new();
-gen_get_gpr(ctx, s1, a->rs1);
+s1 = get_gpr(ctx, a->rs1, EXT_ZERO);
 }
 gen_helper_vsetvl(dst, cpu_env, s1, s2);
 gen_set_gpr(ctx, a->rd, dst);
+
 gen_goto_tb(ctx, 0, ctx->pc_succ_insn);
 ctx->base.is_jmp = DISAS_NORETURN;
-
-tcg_temp_free(s1);
-tcg_temp_free(dst);
 return true;
 }
 
@@ -173,7 +165,7 @@ static bool ldst_us_trans(uint32_t vd, uint32_t rs1, uint32_t data,
 
 dest = tcg_temp_new_ptr();
 mask = tcg_temp_new_ptr();
-base = tcg_temp_new();
+base = get_gpr(s, rs1, EXT_NONE);
 
 /*
  * As simd_desc supports at most 256 bytes, and in this implementation,
@@ -184,7 +176,6 @@ static bool ldst_us_trans(uint32_t vd, uint32_t rs1, uint32_t data,
  */
 desc = tcg_constant_i32(simd_desc(s->vlen / 8, s->vlen / 8, data));
 
-gen_get_gpr(s, base, rs1);
 tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, vd));
 tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
 
@@ -192,7 +183,6 @@ static bool ldst_us_trans(uint32_t vd, uint32_t rs1, uint32_t data,
 
 tcg_temp_free_ptr(dest);
 tcg_temp_free_ptr(mask);
-tcg_temp_free(base);
 gen_set_label(over);
 return true;
 }
@@ -330,12 +320,10 @@ static bool ldst_stride_trans(uint32_t vd, uint32_t rs1, uint32_t rs2,
 
 dest = tcg_temp_new_ptr();
 mask = tcg_temp_new_ptr();
-base = tcg_temp_new();
-stride = tcg_temp_new();
+base = get_gpr(s, rs1, EXT_NONE);
+stride = get_gpr(s, rs2, EXT_NONE);
 desc = tcg_constant_i32(simd_desc(s->vlen / 8, s->vlen / 8, data));
 
-gen_get_gpr(s, base, rs1);
-gen_get_gpr(s, stride, rs2);
 tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, vd));
 tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
 
@@ -343,8 +331,6 @@ static bool ldst_stride_trans(uint32_t vd, uint32_t rs1, uint32_t rs2,
 
 tcg_temp_free_ptr(dest);
 tcg_temp_free_ptr(mask);
-tcg_temp_free(base);
-tcg_temp_free(stride);
 gen_set_label(over);
 return true;
 }
@@ -458,10 +444,9 @@ static bool 

[PATCH v2 13/21] target/riscv: Use get_gpr in branches

2021-08-17 Thread Richard Henderson
Narrow the scope of t0 in trans_jalr.

Signed-off-by: Richard Henderson 
---
 target/riscv/insn_trans/trans_rvi.c.inc | 25 ++---
 1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvi.c.inc b/target/riscv/insn_trans/trans_rvi.c.inc
index e25f64c45a..af3e0bc0e6 100644
--- a/target/riscv/insn_trans/trans_rvi.c.inc
+++ b/target/riscv/insn_trans/trans_rvi.c.inc
@@ -54,24 +54,25 @@ static bool trans_jal(DisasContext *ctx, arg_jal *a)
 
 static bool trans_jalr(DisasContext *ctx, arg_jalr *a)
 {
-/* no chaining with JALR */
 TCGLabel *misaligned = NULL;
-TCGv t0 = tcg_temp_new();
 
-
-gen_get_gpr(ctx, cpu_pc, a->rs1);
-tcg_gen_addi_tl(cpu_pc, cpu_pc, a->imm);
+tcg_gen_addi_tl(cpu_pc, get_gpr(ctx, a->rs1, EXT_NONE), a->imm);
 tcg_gen_andi_tl(cpu_pc, cpu_pc, (target_ulong)-2);
 
 if (!has_ext(ctx, RVC)) {
+TCGv t0 = tcg_temp_new();
+
 misaligned = gen_new_label();
 tcg_gen_andi_tl(t0, cpu_pc, 0x2);
 tcg_gen_brcondi_tl(TCG_COND_NE, t0, 0x0, misaligned);
+tcg_temp_free(t0);
 }
 
 if (a->rd != 0) {
 tcg_gen_movi_tl(cpu_gpr[a->rd], ctx->pc_succ_insn);
 }
+
+/* No chaining with JALR. */
 lookup_and_goto_ptr(ctx);
 
 if (misaligned) {
@@ -80,21 +81,18 @@ static bool trans_jalr(DisasContext *ctx, arg_jalr *a)
 }
 ctx->base.is_jmp = DISAS_NORETURN;
 
-tcg_temp_free(t0);
 return true;
 }
 
 static bool gen_branch(DisasContext *ctx, arg_b *a, TCGCond cond)
 {
 TCGLabel *l = gen_new_label();
-TCGv source1, source2;
-source1 = tcg_temp_new();
-source2 = tcg_temp_new();
-gen_get_gpr(ctx, source1, a->rs1);
-gen_get_gpr(ctx, source2, a->rs2);
+TCGv src1 = get_gpr(ctx, a->rs1, EXT_SIGN);
+TCGv src2 = get_gpr(ctx, a->rs2, EXT_SIGN);
 
-tcg_gen_brcond_tl(cond, source1, source2, l);
+tcg_gen_brcond_tl(cond, src1, src2, l);
 gen_goto_tb(ctx, 1, ctx->pc_succ_insn);
+
 gen_set_label(l); /* branch taken */
 
 if (!has_ext(ctx, RVC) && ((ctx->base.pc_next + a->imm) & 0x3)) {
@@ -105,9 +103,6 @@ static bool gen_branch(DisasContext *ctx, arg_b *a, TCGCond cond)
 }
 ctx->base.is_jmp = DISAS_NORETURN;
 
-tcg_temp_free(source1);
-tcg_temp_free(source2);
-
 return true;
 }
 
-- 
2.25.1




[PATCH v2 12/21] target/riscv: Add gen_greviw

2021-08-17 Thread Richard Henderson
Replicate the bswap special case from gen_grevi for
the word-sized operation.

Signed-off-by: Richard Henderson 
---
 target/riscv/insn_trans/trans_rvb.c.inc | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/target/riscv/insn_trans/trans_rvb.c.inc b/target/riscv/insn_trans/trans_rvb.c.inc
index b97c3ca5da..af7694ed29 100644
--- a/target/riscv/insn_trans/trans_rvb.c.inc
+++ b/target/riscv/insn_trans/trans_rvb.c.inc
@@ -568,12 +568,24 @@ static bool trans_grevw(DisasContext *ctx, arg_grevw *a)
 return gen_shift(ctx, a, EXT_ZERO, gen_helper_grev);
 }
 
+static void gen_greviw(TCGv dest, TCGv src, target_long shamt)
+{
+#if TARGET_LONG_BITS == 64
+if (shamt == 32 - 8) {
+/* rev4, byte swaps */
+tcg_gen_bswap32_i64(dest, src, TCG_BSWAP_IZ | TCG_BSWAP_OS);
+return;
+}
+#endif
+gen_helper_grev(dest, src, tcg_constant_tl(shamt));
+}
+
 static bool trans_greviw(DisasContext *ctx, arg_greviw *a)
 {
 REQUIRE_64BIT(ctx);
 REQUIRE_EXT(ctx, RVB);
 ctx->w = true;
-return gen_shift_imm_tl(ctx, a, EXT_ZERO, gen_helper_grev);
+return gen_shift_imm_fn(ctx, a, EXT_ZERO, gen_greviw);
 }
 
 static bool trans_gorcw(DisasContext *ctx, arg_gorcw *a)
-- 
2.25.1




[PATCH v2 11/21] target/riscv: Use DisasExtend in shift operations

2021-08-17 Thread Richard Henderson
These operations are greatly simplified by ctx->w, which allows
us to fold gen_shiftw into gen_shift.  Split gen_shifti into
gen_shift_imm_{fn,tl} like we do for gen_arith_imm_{fn,tl}.
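
(For illustration, the shape a word-sized shift takes after this patch --
a reconstruction, since the diff below is cut off before the trans_*
functions:)

    static bool trans_sllw(DisasContext *ctx, arg_sllw *a)
    {
        REQUIRE_64BIT(ctx);
        ctx->w = true;   /* mask shamt to 31; sign-extend on writeback */
        return gen_shift(ctx, a, EXT_NONE, tcg_gen_shl_tl);
    }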

Signed-off-by: Richard Henderson 
---
 target/riscv/translate.c| 110 +---
 target/riscv/insn_trans/trans_rvb.c.inc | 129 +++-
 target/riscv/insn_trans/trans_rvi.c.inc |  88 
 3 files changed, 125 insertions(+), 202 deletions(-)

diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 178d317976..75e83fb41f 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -98,6 +98,13 @@ static inline bool is_32bit(DisasContext *ctx)
 }
 #endif
 
+/* The word size for this operation. */
+static inline int oper_len(DisasContext *ctx)
+{
+return ctx->w ? 32 : TARGET_LONG_BITS;
+}
+
+
 /*
  * RISC-V requires NaN-boxing of narrower width floating point values.
  * This applies when a 32-bit value is assigned to a 64-bit FP register.
@@ -392,88 +399,58 @@ static bool gen_arith(DisasContext *ctx, arg_r *a, DisasExtend ext,
 return true;
 }
 
-static bool gen_shift(DisasContext *ctx, arg_r *a,
-void(*func)(TCGv, TCGv, TCGv))
+static bool gen_shift_imm_fn(DisasContext *ctx, arg_shift *a, DisasExtend ext,
+ void (*func)(TCGv, TCGv, target_long))
 {
-TCGv source1 = tcg_temp_new();
-TCGv source2 = tcg_temp_new();
+TCGv dest, src1;
+int max_len = oper_len(ctx);
 
-gen_get_gpr(ctx, source1, a->rs1);
-gen_get_gpr(ctx, source2, a->rs2);
-
-tcg_gen_andi_tl(source2, source2, TARGET_LONG_BITS - 1);
-(*func)(source1, source1, source2);
-
-gen_set_gpr(ctx, a->rd, source1);
-tcg_temp_free(source1);
-tcg_temp_free(source2);
-return true;
-}
-
-static uint32_t opcode_at(DisasContextBase *dcbase, target_ulong pc)
-{
-DisasContext *ctx = container_of(dcbase, DisasContext, base);
-CPUState *cpu = ctx->cs;
-CPURISCVState *env = cpu->env_ptr;
-
-return cpu_ldl_code(env, pc);
-}
-
-static bool gen_shifti(DisasContext *ctx, arg_shift *a,
-   void(*func)(TCGv, TCGv, TCGv))
-{
-if (a->shamt >= TARGET_LONG_BITS) {
+if (a->shamt >= max_len) {
 return false;
 }
 
-TCGv source1 = tcg_temp_new();
-TCGv source2 = tcg_temp_new();
+dest = dest_gpr(ctx, a->rd);
+src1 = get_gpr(ctx, a->rs1, ext);
 
-gen_get_gpr(ctx, source1, a->rs1);
+func(dest, src1, a->shamt);
 
-tcg_gen_movi_tl(source2, a->shamt);
-(*func)(source1, source1, source2);
-
-gen_set_gpr(ctx, a->rd, source1);
-tcg_temp_free(source1);
-tcg_temp_free(source2);
+gen_set_gpr(ctx, a->rd, dest);
 return true;
 }
 
-static bool gen_shiftw(DisasContext *ctx, arg_r *a,
-   void(*func)(TCGv, TCGv, TCGv))
+static bool gen_shift_imm_tl(DisasContext *ctx, arg_shift *a, DisasExtend ext,
+ void (*func)(TCGv, TCGv, TCGv))
 {
-TCGv source1 = tcg_temp_new();
-TCGv source2 = tcg_temp_new();
+TCGv dest, src1, src2;
+int max_len = oper_len(ctx);
 
-gen_get_gpr(ctx, source1, a->rs1);
-gen_get_gpr(ctx, source2, a->rs2);
+if (a->shamt >= max_len) {
+return false;
+}
 
-tcg_gen_andi_tl(source2, source2, 31);
-(*func)(source1, source1, source2);
-tcg_gen_ext32s_tl(source1, source1);
+dest = dest_gpr(ctx, a->rd);
+src1 = get_gpr(ctx, a->rs1, ext);
+src2 = tcg_constant_tl(a->shamt);
 
-gen_set_gpr(ctx, a->rd, source1);
-tcg_temp_free(source1);
-tcg_temp_free(source2);
+func(dest, src1, src2);
+
+gen_set_gpr(ctx, a->rd, dest);
 return true;
 }
 
-static bool gen_shiftiw(DisasContext *ctx, arg_shift *a,
-void(*func)(TCGv, TCGv, TCGv))
+static bool gen_shift(DisasContext *ctx, arg_r *a, DisasExtend ext,
+  void (*func)(TCGv, TCGv, TCGv))
 {
-TCGv source1 = tcg_temp_new();
-TCGv source2 = tcg_temp_new();
+TCGv dest = dest_gpr(ctx, a->rd);
+TCGv src1 = get_gpr(ctx, a->rs1, ext);
+TCGv src2 = get_gpr(ctx, a->rs2, EXT_NONE);
+TCGv ext2 = tcg_temp_new();
 
-gen_get_gpr(ctx, source1, a->rs1);
-tcg_gen_movi_tl(source2, a->shamt);
+tcg_gen_andi_tl(ext2, src2, oper_len(ctx) - 1);
+func(dest, src1, ext2);
 
-(*func)(source1, source1, source2);
-tcg_gen_ext32s_tl(source1, source1);
-
-gen_set_gpr(ctx, a->rd, source1);
-tcg_temp_free(source1);
-tcg_temp_free(source2);
+gen_set_gpr(ctx, a->rd, dest);
+tcg_temp_free(ext2);
 return true;
 }
 
@@ -489,6 +466,15 @@ static bool gen_unary(DisasContext *ctx, arg_r2 *a, DisasExtend ext,
 return true;
 }
 
+static uint32_t opcode_at(DisasContextBase *dcbase, target_ulong pc)
+{
+DisasContext *ctx = container_of(dcbase, DisasContext, base);
+CPUState *cpu = ctx->cs;
+CPURISCVState *env = cpu->env_ptr;
+
+return cpu_ldl_code(env, pc);
+}

[PATCH v2 16/21] target/riscv: Use {get,dest}_gpr for RVA

2021-08-17 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/riscv/insn_trans/trans_rva.c.inc | 47 ++---
 1 file changed, 19 insertions(+), 28 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rva.c.inc b/target/riscv/insn_trans/trans_rva.c.inc
index 3cc3c3b073..6ea07d89b0 100644
--- a/target/riscv/insn_trans/trans_rva.c.inc
+++ b/target/riscv/insn_trans/trans_rva.c.inc
@@ -18,11 +18,10 @@
 * this program.  If not, see <http://www.gnu.org/licenses/>.
  */
 
-static inline bool gen_lr(DisasContext *ctx, arg_atomic *a, MemOp mop)
+static bool gen_lr(DisasContext *ctx, arg_atomic *a, MemOp mop)
 {
-TCGv src1 = tcg_temp_new();
-/* Put addr in load_res, data in load_val.  */
-gen_get_gpr(ctx, src1, a->rs1);
+TCGv src1 = get_gpr(ctx, a->rs1, EXT_ZERO);
+
 if (a->rl) {
 tcg_gen_mb(TCG_MO_ALL | TCG_BAR_STRL);
 }
@@ -30,33 +29,33 @@ static inline bool gen_lr(DisasContext *ctx, arg_atomic *a, MemOp mop)
 if (a->aq) {
 tcg_gen_mb(TCG_MO_ALL | TCG_BAR_LDAQ);
 }
+
+/* Put addr in load_res, data in load_val.  */
 tcg_gen_mov_tl(load_res, src1);
 gen_set_gpr(ctx, a->rd, load_val);
 
-tcg_temp_free(src1);
 return true;
 }
 
-static inline bool gen_sc(DisasContext *ctx, arg_atomic *a, MemOp mop)
+static bool gen_sc(DisasContext *ctx, arg_atomic *a, MemOp mop)
 {
-TCGv src1 = tcg_temp_new();
-TCGv src2 = tcg_temp_new();
-TCGv dat = tcg_temp_new();
+TCGv dest, src1, src2;
 TCGLabel *l1 = gen_new_label();
 TCGLabel *l2 = gen_new_label();
 
-gen_get_gpr(ctx, src1, a->rs1);
+src1 = get_gpr(ctx, a->rs1, EXT_ZERO);
 tcg_gen_brcond_tl(TCG_COND_NE, load_res, src1, l1);
 
-gen_get_gpr(ctx, src2, a->rs2);
 /*
  * Note that the TCG atomic primitives are SC,
  * so we can ignore AQ/RL along this path.
  */
-tcg_gen_atomic_cmpxchg_tl(src1, load_res, load_val, src2,
+dest = dest_gpr(ctx, a->rd);
+src2 = get_gpr(ctx, a->rs2, EXT_NONE);
+tcg_gen_atomic_cmpxchg_tl(dest, load_res, load_val, src2,
   ctx->mem_idx, mop);
-tcg_gen_setcond_tl(TCG_COND_NE, dat, src1, load_val);
-gen_set_gpr(ctx, a->rd, dat);
+tcg_gen_setcond_tl(TCG_COND_NE, dest, dest, load_val);
+gen_set_gpr(ctx, a->rd, dest);
 tcg_gen_br(l2);
 
 gen_set_label(l1);
@@ -65,8 +64,7 @@ static inline bool gen_sc(DisasContext *ctx, arg_atomic *a, MemOp mop)
  * provide the memory barrier implied by AQ/RL.
  */
 tcg_gen_mb(TCG_MO_ALL + a->aq * TCG_BAR_LDAQ + a->rl * TCG_BAR_STRL);
-tcg_gen_movi_tl(dat, 1);
-gen_set_gpr(ctx, a->rd, dat);
+gen_set_gpr(ctx, a->rd, tcg_constant_tl(1));
 
 gen_set_label(l2);
 /*
@@ -75,9 +73,6 @@ static inline bool gen_sc(DisasContext *ctx, arg_atomic *a, MemOp mop)
  */
 tcg_gen_movi_tl(load_res, -1);
 
-tcg_temp_free(dat);
-tcg_temp_free(src1);
-tcg_temp_free(src2);
 return true;
 }
 
@@ -85,17 +80,13 @@ static bool gen_amo(DisasContext *ctx, arg_atomic *a,
 void(*func)(TCGv, TCGv, TCGv, TCGArg, MemOp),
 MemOp mop)
 {
-TCGv src1 = tcg_temp_new();
-TCGv src2 = tcg_temp_new();
+TCGv dest = dest_gpr(ctx, a->rd);
+TCGv src1 = get_gpr(ctx, a->rs1, EXT_NONE);
+TCGv src2 = get_gpr(ctx, a->rs2, EXT_NONE);
 
-gen_get_gpr(ctx, src1, a->rs1);
-gen_get_gpr(ctx, src2, a->rs2);
+func(dest, src1, src2, ctx->mem_idx, mop);
 
-(*func)(src2, src1, src2, ctx->mem_idx, mop);
-
-gen_set_gpr(ctx, a->rd, src2);
-tcg_temp_free(src1);
-tcg_temp_free(src2);
+gen_set_gpr(ctx, a->rd, dest);
 return true;
 }
 
-- 
2.25.1




[PATCH v2 10/21] target/riscv: Add DisasExtend to gen_unary

2021-08-17 Thread Richard Henderson
Use ctx->w for ctpopw, which is the only one that can
re-use the generic algorithm for the narrow operation.

Signed-off-by: Richard Henderson 
---
 target/riscv/translate.c| 14 ++
 target/riscv/insn_trans/trans_rvb.c.inc | 24 +---
 2 files changed, 15 insertions(+), 23 deletions(-)

diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 8d96e70abb..178d317976 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -477,17 +477,15 @@ static bool gen_shiftiw(DisasContext *ctx, arg_shift *a,
 return true;
 }
 
-static bool gen_unary(DisasContext *ctx, arg_r2 *a,
-  void(*func)(TCGv, TCGv))
+static bool gen_unary(DisasContext *ctx, arg_r2 *a, DisasExtend ext,
+  void (*func)(TCGv, TCGv))
 {
-TCGv source = tcg_temp_new();
+TCGv dest = dest_gpr(ctx, a->rd);
+TCGv src1 = get_gpr(ctx, a->rs1, ext);
 
-gen_get_gpr(ctx, source, a->rs1);
+func(dest, src1);
 
-(*func)(source, source);
-
-gen_set_gpr(ctx, a->rd, source);
-tcg_temp_free(source);
+gen_set_gpr(ctx, a->rd, dest);
 return true;
 }
 
diff --git a/target/riscv/insn_trans/trans_rvb.c.inc b/target/riscv/insn_trans/trans_rvb.c.inc
index 73f088be23..e255678fff 100644
--- a/target/riscv/insn_trans/trans_rvb.c.inc
+++ b/target/riscv/insn_trans/trans_rvb.c.inc
@@ -26,7 +26,7 @@ static void gen_clz(TCGv ret, TCGv arg1)
 static bool trans_clz(DisasContext *ctx, arg_clz *a)
 {
 REQUIRE_EXT(ctx, RVB);
-return gen_unary(ctx, a, gen_clz);
+return gen_unary(ctx, a, EXT_ZERO, gen_clz);
 }
 
 static void gen_ctz(TCGv ret, TCGv arg1)
@@ -37,13 +37,13 @@ static void gen_ctz(TCGv ret, TCGv arg1)
 static bool trans_ctz(DisasContext *ctx, arg_ctz *a)
 {
 REQUIRE_EXT(ctx, RVB);
-return gen_unary(ctx, a, gen_ctz);
+return gen_unary(ctx, a, EXT_ZERO, gen_ctz);
 }
 
 static bool trans_cpop(DisasContext *ctx, arg_cpop *a)
 {
 REQUIRE_EXT(ctx, RVB);
-return gen_unary(ctx, a, tcg_gen_ctpop_tl);
+return gen_unary(ctx, a, EXT_ZERO, tcg_gen_ctpop_tl);
 }
 
 static bool trans_andn(DisasContext *ctx, arg_andn *a)
@@ -132,13 +132,13 @@ static bool trans_maxu(DisasContext *ctx, arg_maxu *a)
 static bool trans_sext_b(DisasContext *ctx, arg_sext_b *a)
 {
 REQUIRE_EXT(ctx, RVB);
-return gen_unary(ctx, a, tcg_gen_ext8s_tl);
+return gen_unary(ctx, a, EXT_NONE, tcg_gen_ext8s_tl);
 }
 
 static bool trans_sext_h(DisasContext *ctx, arg_sext_h *a)
 {
 REQUIRE_EXT(ctx, RVB);
-return gen_unary(ctx, a, tcg_gen_ext16s_tl);
+return gen_unary(ctx, a, EXT_NONE, tcg_gen_ext16s_tl);
 }
 
 static void gen_sbop_mask(TCGv ret, TCGv shamt)
@@ -366,7 +366,6 @@ GEN_TRANS_SHADD(3)
 
 static void gen_clzw(TCGv ret, TCGv arg1)
 {
-tcg_gen_ext32u_tl(ret, arg1);
 tcg_gen_clzi_tl(ret, ret, 64);
 tcg_gen_subi_tl(ret, ret, 32);
 }
@@ -375,7 +374,7 @@ static bool trans_clzw(DisasContext *ctx, arg_clzw *a)
 {
 REQUIRE_64BIT(ctx);
 REQUIRE_EXT(ctx, RVB);
-return gen_unary(ctx, a, gen_clzw);
+return gen_unary(ctx, a, EXT_ZERO, gen_clzw);
 }
 
 static void gen_ctzw(TCGv ret, TCGv arg1)
@@ -388,20 +387,15 @@ static bool trans_ctzw(DisasContext *ctx, arg_ctzw *a)
 {
 REQUIRE_64BIT(ctx);
 REQUIRE_EXT(ctx, RVB);
-return gen_unary(ctx, a, gen_ctzw);
-}
-
-static void gen_cpopw(TCGv ret, TCGv arg1)
-{
-tcg_gen_ext32u_tl(arg1, arg1);
-tcg_gen_ctpop_tl(ret, arg1);
+return gen_unary(ctx, a, EXT_NONE, gen_ctzw);
 }
 
 static bool trans_cpopw(DisasContext *ctx, arg_cpopw *a)
 {
 REQUIRE_64BIT(ctx);
 REQUIRE_EXT(ctx, RVB);
-return gen_unary(ctx, a, gen_cpopw);
+ctx->w = true;
+return gen_unary(ctx, a, EXT_ZERO, tcg_gen_ctpop_tl);
 }
 
 static void gen_packw(TCGv ret, TCGv arg1, TCGv arg2)
-- 
2.25.1




[PATCH v2 05/21] target/riscv: Add DisasExtend to gen_arith*

2021-08-17 Thread Richard Henderson
Most arithmetic does not require extending the inputs.
Exceptions include division, comparison and minmax.

Begin using ctx->w, which allows elimination of gen_addw,
gen_subw, gen_mulw.
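
(For illustration, the shape that replaces gen_addw -- a reconstruction,
since the diff below is cut off before the trans_* changes:)

    static bool trans_addw(DisasContext *ctx, arg_addw *a)
    {
        REQUIRE_64BIT(ctx);
        ctx->w = true;   /* gen_set_gpr sign-extends the 32-bit result */
        return gen_arith(ctx, a, EXT_NONE, tcg_gen_add_tl);
    }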

Signed-off-by: Richard Henderson 
---
 target/riscv/translate.c| 69 +++--
 target/riscv/insn_trans/trans_rvb.c.inc | 30 +--
 target/riscv/insn_trans/trans_rvi.c.inc | 39 --
 target/riscv/insn_trans/trans_rvm.c.inc | 16 +++---
 4 files changed, 64 insertions(+), 90 deletions(-)

diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index d5cf5e5826..4819682bf1 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -229,7 +229,7 @@ static void gen_get_gpr(DisasContext *ctx, TCGv t, int reg_num)
 tcg_gen_mov_tl(t, get_gpr(ctx, reg_num, EXT_NONE));
 }
 
-static TCGv __attribute__((unused)) dest_gpr(DisasContext *ctx, int reg_num)
+static TCGv dest_gpr(DisasContext *ctx, int reg_num)
 {
 if (reg_num == 0 || ctx->w) {
 return temp_new(ctx);
@@ -466,57 +466,31 @@ static int ex_rvc_shifti(DisasContext *ctx, int imm)
 /* Include the auto-generated decoder for 32 bit insn */
 #include "decode-insn32.c.inc"
 
-static bool gen_arith_imm_fn(DisasContext *ctx, arg_i *a,
+static bool gen_arith_imm_fn(DisasContext *ctx, arg_i *a, DisasExtend ext,
  void (*func)(TCGv, TCGv, target_long))
 {
-TCGv source1;
-source1 = tcg_temp_new();
+TCGv dest = dest_gpr(ctx, a->rd);
+TCGv src1 = get_gpr(ctx, a->rs1, ext);
 
-gen_get_gpr(ctx, source1, a->rs1);
+func(dest, src1, a->imm);
 
-(*func)(source1, source1, a->imm);
-
-gen_set_gpr(ctx, a->rd, source1);
-tcg_temp_free(source1);
+gen_set_gpr(ctx, a->rd, dest);
 return true;
 }
 
-static bool gen_arith_imm_tl(DisasContext *ctx, arg_i *a,
+static bool gen_arith_imm_tl(DisasContext *ctx, arg_i *a, DisasExtend ext,
  void (*func)(TCGv, TCGv, TCGv))
 {
-TCGv source1, source2;
-source1 = tcg_temp_new();
-source2 = tcg_temp_new();
+TCGv dest = dest_gpr(ctx, a->rd);
+TCGv src1 = get_gpr(ctx, a->rs1, ext);
+TCGv src2 = tcg_constant_tl(a->imm);
 
-gen_get_gpr(ctx, source1, a->rs1);
-tcg_gen_movi_tl(source2, a->imm);
+func(dest, src1, src2);
 
-(*func)(source1, source1, source2);
-
-gen_set_gpr(ctx, a->rd, source1);
-tcg_temp_free(source1);
-tcg_temp_free(source2);
+gen_set_gpr(ctx, a->rd, dest);
 return true;
 }
 
-static void gen_addw(TCGv ret, TCGv arg1, TCGv arg2)
-{
-tcg_gen_add_tl(ret, arg1, arg2);
-tcg_gen_ext32s_tl(ret, ret);
-}
-
-static void gen_subw(TCGv ret, TCGv arg1, TCGv arg2)
-{
-tcg_gen_sub_tl(ret, arg1, arg2);
-tcg_gen_ext32s_tl(ret, ret);
-}
-
-static void gen_mulw(TCGv ret, TCGv arg1, TCGv arg2)
-{
-tcg_gen_mul_tl(ret, arg1, arg2);
-tcg_gen_ext32s_tl(ret, ret);
-}
-
 static bool gen_arith_div_w(DisasContext *ctx, arg_r *a,
 void(*func)(TCGv, TCGv, TCGv))
 {
@@ -782,21 +756,16 @@ static void gen_add_uw(TCGv ret, TCGv arg1, TCGv arg2)
 tcg_gen_add_tl(ret, arg1, arg2);
 }
 
-static bool gen_arith(DisasContext *ctx, arg_r *a,
-  void(*func)(TCGv, TCGv, TCGv))
+static bool gen_arith(DisasContext *ctx, arg_r *a, DisasExtend ext,
+  void (*func)(TCGv, TCGv, TCGv))
 {
-TCGv source1, source2;
-source1 = tcg_temp_new();
-source2 = tcg_temp_new();
+TCGv dest = dest_gpr(ctx, a->rd);
+TCGv src1 = get_gpr(ctx, a->rs1, ext);
+TCGv src2 = get_gpr(ctx, a->rs2, ext);
 
-gen_get_gpr(ctx, source1, a->rs1);
-gen_get_gpr(ctx, source2, a->rs2);
+func(dest, src1, src2);
 
-(*func)(source1, source1, source2);
-
-gen_set_gpr(ctx, a->rd, source1);
-tcg_temp_free(source1);
-tcg_temp_free(source2);
+gen_set_gpr(ctx, a->rd, dest);
 return true;
 }
 
diff --git a/target/riscv/insn_trans/trans_rvb.c.inc b/target/riscv/insn_trans/trans_rvb.c.inc
index 260e15b47d..217a7d1f26 100644
--- a/target/riscv/insn_trans/trans_rvb.c.inc
+++ b/target/riscv/insn_trans/trans_rvb.c.inc
@@ -38,61 +38,61 @@ static bool trans_cpop(DisasContext *ctx, arg_cpop *a)
 static bool trans_andn(DisasContext *ctx, arg_andn *a)
 {
 REQUIRE_EXT(ctx, RVB);
-return gen_arith(ctx, a, tcg_gen_andc_tl);
+return gen_arith(ctx, a, EXT_NONE, tcg_gen_andc_tl);
 }
 
 static bool trans_orn(DisasContext *ctx, arg_orn *a)
 {
 REQUIRE_EXT(ctx, RVB);
-return gen_arith(ctx, a, tcg_gen_orc_tl);
+return gen_arith(ctx, a, EXT_NONE, tcg_gen_orc_tl);
 }
 
 static bool trans_xnor(DisasContext *ctx, arg_xnor *a)
 {
 REQUIRE_EXT(ctx, RVB);
-return gen_arith(ctx, a, tcg_gen_eqv_tl);
+return gen_arith(ctx, a, EXT_NONE, tcg_gen_eqv_tl);
 }
 
 static bool trans_pack(DisasContext *ctx, arg_pack *a)
 {
 REQUIRE_EXT(ctx, RVB);
-return gen_arith(ctx, a, gen_pack);
+return gen_arith(ctx, a, EXT_NONE, gen_pack);
 }
 
 static bool 

[PATCH v2 07/21] target/riscv: Use gen_arith for mulh and mulhu

2021-08-17 Thread Richard Henderson
Split out gen_mulh and gen_mulhu and use the common helper.

Signed-off-by: Richard Henderson 
---
 target/riscv/insn_trans/trans_rvm.c.inc | 40 +++--
 1 file changed, 18 insertions(+), 22 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvm.c.inc b/target/riscv/insn_trans/trans_rvm.c.inc
index 3d93b24c25..80552be7a3 100644
--- a/target/riscv/insn_trans/trans_rvm.c.inc
+++ b/target/riscv/insn_trans/trans_rvm.c.inc
@@ -25,20 +25,18 @@ static bool trans_mul(DisasContext *ctx, arg_mul *a)
 return gen_arith(ctx, a, EXT_NONE, tcg_gen_mul_tl);
 }
 
+static void gen_mulh(TCGv ret, TCGv s1, TCGv s2)
+{
+TCGv discard = tcg_temp_new();
+
+tcg_gen_muls2_tl(discard, ret, s1, s2);
+tcg_temp_free(discard);
+}
+
 static bool trans_mulh(DisasContext *ctx, arg_mulh *a)
 {
 REQUIRE_EXT(ctx, RVM);
-TCGv source1 = tcg_temp_new();
-TCGv source2 = tcg_temp_new();
-gen_get_gpr(ctx, source1, a->rs1);
-gen_get_gpr(ctx, source2, a->rs2);
-
-tcg_gen_muls2_tl(source2, source1, source1, source2);
-
-gen_set_gpr(ctx, a->rd, source1);
-tcg_temp_free(source1);
-tcg_temp_free(source2);
-return true;
+return gen_arith(ctx, a, EXT_NONE, gen_mulh);
 }
 
 static bool trans_mulhsu(DisasContext *ctx, arg_mulhsu *a)
@@ -47,20 +45,18 @@ static bool trans_mulhsu(DisasContext *ctx, arg_mulhsu *a)
 return gen_arith(ctx, a, EXT_NONE, gen_mulhsu);
 }
 
+static void gen_mulhu(TCGv ret, TCGv s1, TCGv s2)
+{
+TCGv discard = tcg_temp_new();
+
+tcg_gen_mulu2_tl(discard, ret, s1, s2);
+tcg_temp_free(discard);
+}
+
 static bool trans_mulhu(DisasContext *ctx, arg_mulhu *a)
 {
 REQUIRE_EXT(ctx, RVM);
-TCGv source1 = tcg_temp_new();
-TCGv source2 = tcg_temp_new();
-gen_get_gpr(ctx, source1, a->rs1);
-gen_get_gpr(ctx, source2, a->rs2);
-
-tcg_gen_mulu2_tl(source2, source1, source1, source2);
-
-gen_set_gpr(ctx, a->rd, source1);
-tcg_temp_free(source1);
-tcg_temp_free(source2);
-return true;
+return gen_arith(ctx, a, EXT_NONE, gen_mulhu);
 }
 
 static bool trans_div(DisasContext *ctx, arg_div *a)
-- 
2.25.1




[PATCH v2 03/21] target/riscv: Add DisasContext to gen_get_gpr, gen_set_gpr

2021-08-17 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/riscv/translate.c| 58 -
 target/riscv/insn_trans/trans_rva.c.inc | 18 
 target/riscv/insn_trans/trans_rvb.c.inc |  4 +-
 target/riscv/insn_trans/trans_rvd.c.inc | 32 +++---
 target/riscv/insn_trans/trans_rvf.c.inc | 32 +++---
 target/riscv/insn_trans/trans_rvh.c.inc | 52 +++---
 target/riscv/insn_trans/trans_rvi.c.inc | 44 +--
 target/riscv/insn_trans/trans_rvm.c.inc | 12 ++---
 target/riscv/insn_trans/trans_rvv.c.inc | 36 +++
 9 files changed, 144 insertions(+), 144 deletions(-)

diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 6ae7e140d0..d540c85a1a 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -175,7 +175,7 @@ static void gen_goto_tb(DisasContext *ctx, int n, target_ulong dest)
 /* Wrapper for getting reg values - need to check of reg is zero since
  * cpu_gpr[0] is not actually allocated
  */
-static inline void gen_get_gpr(TCGv t, int reg_num)
+static void gen_get_gpr(DisasContext *ctx, TCGv t, int reg_num)
 {
 if (reg_num == 0) {
 tcg_gen_movi_tl(t, 0);
@@ -189,7 +189,7 @@ static inline void gen_get_gpr(TCGv t, int reg_num)
  * since we usually avoid calling the OP_TYPE_gen function if we see a write to
  * $zero
  */
-static inline void gen_set_gpr(int reg_num_dst, TCGv t)
+static void gen_set_gpr(DisasContext *ctx, int reg_num_dst, TCGv t)
 {
 if (reg_num_dst != 0) {
 tcg_gen_mov_tl(cpu_gpr[reg_num_dst], t);
@@ -420,11 +420,11 @@ static bool gen_arith_imm_fn(DisasContext *ctx, arg_i *a,
 TCGv source1;
 source1 = tcg_temp_new();
 
-gen_get_gpr(source1, a->rs1);
+gen_get_gpr(ctx, source1, a->rs1);
 
 (*func)(source1, source1, a->imm);
 
-gen_set_gpr(a->rd, source1);
+gen_set_gpr(ctx, a->rd, source1);
 tcg_temp_free(source1);
 return true;
 }
@@ -436,12 +436,12 @@ static bool gen_arith_imm_tl(DisasContext *ctx, arg_i *a,
 source1 = tcg_temp_new();
 source2 = tcg_temp_new();
 
-gen_get_gpr(source1, a->rs1);
+gen_get_gpr(ctx, source1, a->rs1);
 tcg_gen_movi_tl(source2, a->imm);
 
 (*func)(source1, source1, source2);
 
-gen_set_gpr(a->rd, source1);
+gen_set_gpr(ctx, a->rd, source1);
 tcg_temp_free(source1);
 tcg_temp_free(source2);
 return true;
@@ -472,15 +472,15 @@ static bool gen_arith_div_w(DisasContext *ctx, arg_r *a,
 source1 = tcg_temp_new();
 source2 = tcg_temp_new();
 
-gen_get_gpr(source1, a->rs1);
-gen_get_gpr(source2, a->rs2);
+gen_get_gpr(ctx, source1, a->rs1);
+gen_get_gpr(ctx, source2, a->rs2);
 tcg_gen_ext32s_tl(source1, source1);
 tcg_gen_ext32s_tl(source2, source2);
 
 (*func)(source1, source1, source2);
 
 tcg_gen_ext32s_tl(source1, source1);
-gen_set_gpr(a->rd, source1);
+gen_set_gpr(ctx, a->rd, source1);
 tcg_temp_free(source1);
 tcg_temp_free(source2);
 return true;
@@ -493,15 +493,15 @@ static bool gen_arith_div_uw(DisasContext *ctx, arg_r *a,
 source1 = tcg_temp_new();
 source2 = tcg_temp_new();
 
-gen_get_gpr(source1, a->rs1);
-gen_get_gpr(source2, a->rs2);
+gen_get_gpr(ctx, source1, a->rs1);
+gen_get_gpr(ctx, source2, a->rs2);
 tcg_gen_ext32u_tl(source1, source1);
 tcg_gen_ext32u_tl(source2, source2);
 
 (*func)(source1, source1, source2);
 
 tcg_gen_ext32s_tl(source1, source1);
-gen_set_gpr(a->rd, source1);
+gen_set_gpr(ctx, a->rd, source1);
 tcg_temp_free(source1);
 tcg_temp_free(source2);
 return true;
@@ -591,7 +591,7 @@ static bool gen_grevi(DisasContext *ctx, arg_grevi *a)
 TCGv source1 = tcg_temp_new();
 TCGv source2;
 
-gen_get_gpr(source1, a->rs1);
+gen_get_gpr(ctx, source1, a->rs1);
 
 if (a->shamt == (TARGET_LONG_BITS - 8)) {
 /* rev8, byte swaps */
@@ -603,7 +603,7 @@ static bool gen_grevi(DisasContext *ctx, arg_grevi *a)
 tcg_temp_free(source2);
 }
 
-gen_set_gpr(a->rd, source1);
+gen_set_gpr(ctx, a->rd, source1);
 tcg_temp_free(source1);
 return true;
 }
@@ -737,12 +737,12 @@ static bool gen_arith(DisasContext *ctx, arg_r *a,
 source1 = tcg_temp_new();
 source2 = tcg_temp_new();
 
-gen_get_gpr(source1, a->rs1);
-gen_get_gpr(source2, a->rs2);
+gen_get_gpr(ctx, source1, a->rs1);
+gen_get_gpr(ctx, source2, a->rs2);
 
 (*func)(source1, source1, source2);
 
-gen_set_gpr(a->rd, source1);
+gen_set_gpr(ctx, a->rd, source1);
 tcg_temp_free(source1);
 tcg_temp_free(source2);
 return true;
@@ -754,13 +754,13 @@ static bool gen_shift(DisasContext *ctx, arg_r *a,
 TCGv source1 = tcg_temp_new();
 TCGv source2 = tcg_temp_new();
 
-gen_get_gpr(source1, a->rs1);
-gen_get_gpr(source2, a->rs2);
+gen_get_gpr(ctx, source1, a->rs1);
+gen_get_gpr(ctx, source2, a->rs2);
 
 tcg_gen_andi_tl(source2, source2, TARGET_LONG_BITS - 1);
 

[PATCH v2 04/21] target/riscv: Introduce DisasExtend and new helpers

2021-08-17 Thread Richard Henderson
Introduce get_gpr, dest_gpr, temp_new -- new helpers that do not force
tcg globals into temps, returning a constant 0 for $zero as source and
a new temp for $zero as destination.

Introduce ctx->w for simplifying word operations, such as addw.
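
As an illustration (not part of the patch), a trans_* function written
against these helpers might look as follows; trans_add is only a sketch
here, and per the cover letter the gen_set_gpr call is still required:

static bool trans_add(DisasContext *ctx, arg_r *a)
{
    TCGv src1 = get_gpr(ctx, a->rs1, EXT_NONE); /* constant zero if rs1 == 0 */
    TCGv src2 = get_gpr(ctx, a->rs2, EXT_NONE);
    TCGv dest = dest_gpr(ctx, a->rd);           /* scratch temp if rd == 0 */

    tcg_gen_add_tl(dest, src1, src2);
    gen_set_gpr(ctx, a->rd, dest);              /* sign-extends when ctx->w */
    return true;
}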

Signed-off-by: Richard Henderson 
---
 target/riscv/translate.c | 102 +++
 1 file changed, 82 insertions(+), 20 deletions(-)

diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index d540c85a1a..d5cf5e5826 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -39,15 +39,25 @@ static TCGv load_val;
 
 #include "exec/gen-icount.h"
 
+/*
+ * If an operation is being performed on less than TARGET_LONG_BITS,
+ * it may require the inputs to be sign- or zero-extended; which will
+ * depend on the exact operation being performed.
+ */
+typedef enum {
+EXT_NONE,
+EXT_SIGN,
+EXT_ZERO,
+} DisasExtend;
+
 typedef struct DisasContext {
 DisasContextBase base;
 /* pc_succ_insn points to the instruction following base.pc_next */
 target_ulong pc_succ_insn;
 target_ulong priv_ver;
-bool virt_enabled;
+target_ulong misa;
 uint32_t opcode;
 uint32_t mstatus_fs;
-target_ulong misa;
 uint32_t mem_idx;
 /* Remember the rounding mode encoded in the previous fp instruction,
which we have already installed into env->fp_status.  Or -1 for
@@ -55,6 +65,8 @@ typedef struct DisasContext {
to any system register, which includes CSR_FRM, so we do not have
to reset this known value.  */
 int frm;
+bool w;
+bool virt_enabled;
 bool ext_ifencei;
 bool hlsx;
 /* vector extension */
@@ -64,7 +76,10 @@ typedef struct DisasContext {
 uint16_t vlen;
 uint16_t mlen;
 bool vl_eq_vlmax;
+uint8_t ntemp;
 CPUState *cs;
+TCGv zero;
+TCGv temp[4];
 } DisasContext;
 
 static inline bool has_ext(DisasContext *ctx, uint32_t ext)
@@ -172,27 +187,64 @@ static void gen_goto_tb(DisasContext *ctx, int n, target_ulong dest)
 }
 }
 
-/* Wrapper for getting reg values - need to check of reg is zero since
- * cpu_gpr[0] is not actually allocated
+/*
+ * Wrappers for getting reg values.
+ *
+ * The $zero register does not have cpu_gpr[0] allocated -- we supply the
+ * constant zero as a source, and an uninitialized sink as destination.
+ *
+ * Further, we may provide an extension for word operations.
  */
-static void gen_get_gpr(DisasContext *ctx, TCGv t, int reg_num)
+static TCGv temp_new(DisasContext *ctx)
 {
-if (reg_num == 0) {
-tcg_gen_movi_tl(t, 0);
-} else {
-tcg_gen_mov_tl(t, cpu_gpr[reg_num]);
-}
+assert(ctx->ntemp < ARRAY_SIZE(ctx->temp));
+return ctx->temp[ctx->ntemp++] = tcg_temp_new();
 }
 
-/* Wrapper for setting reg values - need to check of reg is zero since
- * cpu_gpr[0] is not actually allocated. this is more for safety purposes,
- * since we usually avoid calling the OP_TYPE_gen function if we see a write to
- * $zero
- */
-static void gen_set_gpr(DisasContext *ctx, int reg_num_dst, TCGv t)
+static TCGv get_gpr(DisasContext *ctx, int reg_num, DisasExtend ext)
 {
-if (reg_num_dst != 0) {
-tcg_gen_mov_tl(cpu_gpr[reg_num_dst], t);
+TCGv t;
+
+if (reg_num == 0) {
+return ctx->zero;
+}
+
+switch (ctx->w ? ext : EXT_NONE) {
+case EXT_NONE:
+return cpu_gpr[reg_num];
+case EXT_SIGN:
+t = temp_new(ctx);
+tcg_gen_ext32s_tl(t, cpu_gpr[reg_num]);
+return t;
+case EXT_ZERO:
+t = temp_new(ctx);
+tcg_gen_ext32u_tl(t, cpu_gpr[reg_num]);
+return t;
+}
+g_assert_not_reached();
+}
+
+static void gen_get_gpr(DisasContext *ctx, TCGv t, int reg_num)
+{
+tcg_gen_mov_tl(t, get_gpr(ctx, reg_num, EXT_NONE));
+}
+
+static TCGv __attribute__((unused)) dest_gpr(DisasContext *ctx, int reg_num)
+{
+if (reg_num == 0 || ctx->w) {
+return temp_new(ctx);
+}
+return cpu_gpr[reg_num];
+}
+
+static void gen_set_gpr(DisasContext *ctx, int reg_num, TCGv t)
+{
+if (reg_num != 0) {
+if (ctx->w) {
+tcg_gen_ext32s_tl(cpu_gpr[reg_num], t);
+} else {
+tcg_gen_mov_tl(cpu_gpr[reg_num], t);
+}
 }
 }
 
@@ -927,8 +979,11 @@ static void riscv_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
 ctx->cs = cs;
 }
 
-static void riscv_tr_tb_start(DisasContextBase *db, CPUState *cpu)
+static void riscv_tr_tb_start(DisasContextBase *dcbase, CPUState *cpu)
 {
+DisasContext *ctx = container_of(dcbase, DisasContext, base);
+
+ctx->zero = tcg_constant_tl(0);
 }
 
 static void riscv_tr_insn_start(DisasContextBase *dcbase, CPUState *cpu)
@@ -946,6 +1001,13 @@ static void riscv_tr_translate_insn(DisasContextBase *dcbase, CPUState *cpu)
 
 decode_opc(env, ctx, opcode16);
 ctx->base.pc_next = ctx->pc_succ_insn;
+ctx->w = false;
+
+for (int i = ctx->ntemp - 1; i >= 0; --i) {
+

[PATCH v2 09/21] target/riscv: Move gen_* helpers for RVB

2021-08-17 Thread Richard Henderson
Move these helpers near their use by the trans_*
functions within insn_trans/trans_rvb.c.inc.

Signed-off-by: Richard Henderson 
---
 target/riscv/translate.c| 233 ---
 target/riscv/insn_trans/trans_rvb.c.inc | 234 
 2 files changed, 234 insertions(+), 233 deletions(-)

diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 168274934d..8d96e70abb 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -379,229 +379,6 @@ static bool gen_arith_imm_tl(DisasContext *ctx, arg_i *a, DisasExtend ext,
 return true;
 }
 
-static void gen_pack(TCGv ret, TCGv arg1, TCGv arg2)
-{
-tcg_gen_deposit_tl(ret, arg1, arg2,
-   TARGET_LONG_BITS / 2,
-   TARGET_LONG_BITS / 2);
-}
-
-static void gen_packu(TCGv ret, TCGv arg1, TCGv arg2)
-{
-TCGv t = tcg_temp_new();
-tcg_gen_shri_tl(t, arg1, TARGET_LONG_BITS / 2);
-tcg_gen_deposit_tl(ret, arg2, t, 0, TARGET_LONG_BITS / 2);
-tcg_temp_free(t);
-}
-
-static void gen_packh(TCGv ret, TCGv arg1, TCGv arg2)
-{
-TCGv t = tcg_temp_new();
-tcg_gen_ext8u_tl(t, arg2);
-tcg_gen_deposit_tl(ret, arg1, t, 8, TARGET_LONG_BITS - 8);
-tcg_temp_free(t);
-}
-
-static void gen_sbop_mask(TCGv ret, TCGv shamt)
-{
-tcg_gen_movi_tl(ret, 1);
-tcg_gen_shl_tl(ret, ret, shamt);
-}
-
-static void gen_bset(TCGv ret, TCGv arg1, TCGv shamt)
-{
-TCGv t = tcg_temp_new();
-
-gen_sbop_mask(t, shamt);
-tcg_gen_or_tl(ret, arg1, t);
-
-tcg_temp_free(t);
-}
-
-static void gen_bclr(TCGv ret, TCGv arg1, TCGv shamt)
-{
-TCGv t = tcg_temp_new();
-
-gen_sbop_mask(t, shamt);
-tcg_gen_andc_tl(ret, arg1, t);
-
-tcg_temp_free(t);
-}
-
-static void gen_binv(TCGv ret, TCGv arg1, TCGv shamt)
-{
-TCGv t = tcg_temp_new();
-
-gen_sbop_mask(t, shamt);
-tcg_gen_xor_tl(ret, arg1, t);
-
-tcg_temp_free(t);
-}
-
-static void gen_bext(TCGv ret, TCGv arg1, TCGv shamt)
-{
-tcg_gen_shr_tl(ret, arg1, shamt);
-tcg_gen_andi_tl(ret, ret, 1);
-}
-
-static void gen_slo(TCGv ret, TCGv arg1, TCGv arg2)
-{
-tcg_gen_not_tl(ret, arg1);
-tcg_gen_shl_tl(ret, ret, arg2);
-tcg_gen_not_tl(ret, ret);
-}
-
-static void gen_sro(TCGv ret, TCGv arg1, TCGv arg2)
-{
-tcg_gen_not_tl(ret, arg1);
-tcg_gen_shr_tl(ret, ret, arg2);
-tcg_gen_not_tl(ret, ret);
-}
-
-static bool gen_grevi(DisasContext *ctx, arg_grevi *a)
-{
-TCGv source1 = tcg_temp_new();
-TCGv source2;
-
-gen_get_gpr(ctx, source1, a->rs1);
-
-if (a->shamt == (TARGET_LONG_BITS - 8)) {
-/* rev8, byte swaps */
-tcg_gen_bswap_tl(source1, source1);
-} else {
-source2 = tcg_temp_new();
-tcg_gen_movi_tl(source2, a->shamt);
-gen_helper_grev(source1, source1, source2);
-tcg_temp_free(source2);
-}
-
-gen_set_gpr(ctx, a->rd, source1);
-tcg_temp_free(source1);
-return true;
-}
-
-#define GEN_SHADD(SHAMT)   \
-static void gen_sh##SHAMT##add(TCGv ret, TCGv arg1, TCGv arg2) \
-{  \
-TCGv t = tcg_temp_new();   \
-   \
-tcg_gen_shli_tl(t, arg1, SHAMT);   \
-tcg_gen_add_tl(ret, t, arg2);  \
-   \
-tcg_temp_free(t);  \
-}
-
-GEN_SHADD(1)
-GEN_SHADD(2)
-GEN_SHADD(3)
-
-static void gen_ctzw(TCGv ret, TCGv arg1)
-{
-tcg_gen_ori_tl(ret, arg1, (target_ulong)MAKE_64BIT_MASK(32, 32));
-tcg_gen_ctzi_tl(ret, ret, 64);
-}
-
-static void gen_clzw(TCGv ret, TCGv arg1)
-{
-tcg_gen_ext32u_tl(ret, arg1);
-tcg_gen_clzi_tl(ret, ret, 64);
-tcg_gen_subi_tl(ret, ret, 32);
-}
-
-static void gen_cpopw(TCGv ret, TCGv arg1)
-{
-tcg_gen_ext32u_tl(arg1, arg1);
-tcg_gen_ctpop_tl(ret, arg1);
-}
-
-static void gen_packw(TCGv ret, TCGv arg1, TCGv arg2)
-{
-TCGv t = tcg_temp_new();
-tcg_gen_ext16s_tl(t, arg2);
-tcg_gen_deposit_tl(ret, arg1, t, 16, 48);
-tcg_temp_free(t);
-}
-
-static void gen_packuw(TCGv ret, TCGv arg1, TCGv arg2)
-{
-TCGv t = tcg_temp_new();
-tcg_gen_shri_tl(t, arg1, 16);
-tcg_gen_deposit_tl(ret, arg2, t, 0, 16);
-tcg_gen_ext32s_tl(ret, ret);
-tcg_temp_free(t);
-}
-
-static void gen_rorw(TCGv ret, TCGv arg1, TCGv arg2)
-{
-TCGv_i32 t1 = tcg_temp_new_i32();
-TCGv_i32 t2 = tcg_temp_new_i32();
-
-/* truncate to 32-bits */
-tcg_gen_trunc_tl_i32(t1, arg1);
-tcg_gen_trunc_tl_i32(t2, arg2);
-
-tcg_gen_rotr_i32(t1, t1, t2);
-
-/* sign-extend 64-bits */
-tcg_gen_ext_i32_tl(ret, t1);
-
-tcg_temp_free_i32(t1);
-tcg_temp_free_i32(t2);
-}
-
-static void gen_rolw(TCGv ret, TCGv arg1, TCGv arg2)
-{
-TCGv_i32 t1 = tcg_temp_new_i32();
-TCGv_i32 t2 = 

[PATCH v2 06/21] target/riscv: Remove gen_arith_div*

2021-08-17 Thread Richard Henderson
Use ctx->w and the enhanced gen_arith function.

Signed-off-by: Richard Henderson 
---
 target/riscv/translate.c| 42 -
 target/riscv/insn_trans/trans_rvm.c.inc | 16 +-
 2 files changed, 8 insertions(+), 50 deletions(-)

diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 4819682bf1..e337dca01b 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -491,48 +491,6 @@ static bool gen_arith_imm_tl(DisasContext *ctx, arg_i *a, DisasExtend ext,
 return true;
 }
 
-static bool gen_arith_div_w(DisasContext *ctx, arg_r *a,
-void(*func)(TCGv, TCGv, TCGv))
-{
-TCGv source1, source2;
-source1 = tcg_temp_new();
-source2 = tcg_temp_new();
-
-gen_get_gpr(ctx, source1, a->rs1);
-gen_get_gpr(ctx, source2, a->rs2);
-tcg_gen_ext32s_tl(source1, source1);
-tcg_gen_ext32s_tl(source2, source2);
-
-(*func)(source1, source1, source2);
-
-tcg_gen_ext32s_tl(source1, source1);
-gen_set_gpr(ctx, a->rd, source1);
-tcg_temp_free(source1);
-tcg_temp_free(source2);
-return true;
-}
-
-static bool gen_arith_div_uw(DisasContext *ctx, arg_r *a,
-void(*func)(TCGv, TCGv, TCGv))
-{
-TCGv source1, source2;
-source1 = tcg_temp_new();
-source2 = tcg_temp_new();
-
-gen_get_gpr(ctx, source1, a->rs1);
-gen_get_gpr(ctx, source2, a->rs2);
-tcg_gen_ext32u_tl(source1, source1);
-tcg_gen_ext32u_tl(source2, source2);
-
-(*func)(source1, source1, source2);
-
-tcg_gen_ext32s_tl(source1, source1);
-gen_set_gpr(ctx, a->rd, source1);
-tcg_temp_free(source1);
-tcg_temp_free(source2);
-return true;
-}
-
 static void gen_pack(TCGv ret, TCGv arg1, TCGv arg2)
 {
 tcg_gen_deposit_tl(ret, arg1, arg2,
diff --git a/target/riscv/insn_trans/trans_rvm.c.inc b/target/riscv/insn_trans/trans_rvm.c.inc
index 013b3f7009..3d93b24c25 100644
--- a/target/riscv/insn_trans/trans_rvm.c.inc
+++ b/target/riscv/insn_trans/trans_rvm.c.inc
@@ -99,30 +99,30 @@ static bool trans_divw(DisasContext *ctx, arg_divw *a)
 {
 REQUIRE_64BIT(ctx);
 REQUIRE_EXT(ctx, RVM);
-
-return gen_arith_div_w(ctx, a, &gen_div);
+ctx->w = true;
+return gen_arith(ctx, a, EXT_SIGN, gen_div);
 }
 
 static bool trans_divuw(DisasContext *ctx, arg_divuw *a)
 {
 REQUIRE_64BIT(ctx);
 REQUIRE_EXT(ctx, RVM);
-
-return gen_arith_div_uw(ctx, a, &gen_divu);
+ctx->w = true;
+return gen_arith(ctx, a, EXT_ZERO, gen_divu);
 }
 
 static bool trans_remw(DisasContext *ctx, arg_remw *a)
 {
 REQUIRE_64BIT(ctx);
 REQUIRE_EXT(ctx, RVM);
-
-return gen_arith_div_w(ctx, a, &gen_rem);
+ctx->w = true;
+return gen_arith(ctx, a, EXT_SIGN, gen_rem);
 }
 
 static bool trans_remuw(DisasContext *ctx, arg_remuw *a)
 {
 REQUIRE_64BIT(ctx);
 REQUIRE_EXT(ctx, RVM);
-
-return gen_arith_div_uw(ctx, a, &gen_remu);
+ctx->w = true;
+return gen_arith(ctx, a, EXT_ZERO, gen_remu);
 }
-- 
2.25.1




[PATCH v2 08/21] target/riscv: Move gen_* helpers for RVM

2021-08-17 Thread Richard Henderson
Move these helpers near their use by the trans_*
functions within insn_trans/trans_rvm.c.inc.

Signed-off-by: Richard Henderson 
---
 target/riscv/translate.c| 112 
 target/riscv/insn_trans/trans_rvm.c.inc | 112 
 2 files changed, 112 insertions(+), 112 deletions(-)

diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index e337dca01b..168274934d 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -248,118 +248,6 @@ static void gen_set_gpr(DisasContext *ctx, int reg_num, TCGv t)
 }
 }
 
-static void gen_mulhsu(TCGv ret, TCGv arg1, TCGv arg2)
-{
-TCGv rl = tcg_temp_new();
-TCGv rh = tcg_temp_new();
-
-tcg_gen_mulu2_tl(rl, rh, arg1, arg2);
-/* fix up for one negative */
-tcg_gen_sari_tl(rl, arg1, TARGET_LONG_BITS - 1);
-tcg_gen_and_tl(rl, rl, arg2);
-tcg_gen_sub_tl(ret, rh, rl);
-
-tcg_temp_free(rl);
-tcg_temp_free(rh);
-}
-
-static void gen_div(TCGv ret, TCGv source1, TCGv source2)
-{
-TCGv temp1, temp2, zero, one, mone, min;
-
-/*
- * Handle by altering args to tcg_gen_div to produce req'd results:
- * For overflow: want source1 in temp1 and 1 in temp2
- * For div by zero: want -1 in temp1 and 1 in temp2 -> -1 result
- */
-temp1 = tcg_temp_new();
-temp2 = tcg_temp_new();
-zero = tcg_constant_tl(0);
-one = tcg_constant_tl(1);
-mone = tcg_constant_tl(-1);
-min = tcg_constant_tl(1ull << (TARGET_LONG_BITS - 1));
-
-tcg_gen_setcond_tl(TCG_COND_EQ, temp2, source2, mone);
-tcg_gen_setcond_tl(TCG_COND_EQ, temp1, source1, min);
-tcg_gen_and_tl(temp1, temp1, temp2); /* temp1 = overflow */
-tcg_gen_setcond_tl(TCG_COND_EQ, temp2, source2, zero); /* temp2 = div0 */
-tcg_gen_or_tl(temp2, temp2, temp1);  /* temp2 = overflow | div0 */
-
-/* if div by zero, set source1 to -1, otherwise don't change */
-tcg_gen_movcond_tl(TCG_COND_NE, temp1, source2, zero, source1, mone);
-
-/* if overflow or div by zero, set source2 to 1, else don't change */
-tcg_gen_movcond_tl(TCG_COND_EQ, temp2, temp2, zero, source2, one);
-
-tcg_gen_div_tl(ret, temp1, temp2);
-
-tcg_temp_free(temp1);
-tcg_temp_free(temp2);
-}
-
-static void gen_divu(TCGv ret, TCGv source1, TCGv source2)
-{
-TCGv temp1, temp2, zero, one, mone;
-
-temp1 = tcg_temp_new();
-temp2 = tcg_temp_new();
-zero = tcg_constant_tl(0);
-one = tcg_constant_tl(1);
-mone = tcg_constant_tl(-1);
-
-tcg_gen_movcond_tl(TCG_COND_NE, temp1, source2, zero, source1, mone);
-tcg_gen_movcond_tl(TCG_COND_NE, temp2, source2, zero, source2, one);
-tcg_gen_divu_tl(ret, temp1, temp2);
-
-tcg_temp_free(temp1);
-tcg_temp_free(temp2);
-}
-
-static void gen_rem(TCGv ret, TCGv source1, TCGv source2)
-{
-TCGv temp1, temp2, zero, one, mone, min;
-
-temp1 = tcg_temp_new();
-temp2 = tcg_temp_new();
-zero = tcg_constant_tl(0);
-one = tcg_constant_tl(1);
-mone = tcg_constant_tl(-1);
-min = tcg_constant_tl(1ull << (TARGET_LONG_BITS - 1));
-
-tcg_gen_setcond_tl(TCG_COND_EQ, temp2, source2, mone);
-tcg_gen_setcond_tl(TCG_COND_EQ, temp1, source1, min);
-tcg_gen_and_tl(temp1, temp1, temp2); /* temp1 = overflow */
-tcg_gen_setcondi_tl(TCG_COND_EQ, temp2, source2, 0); /* temp2 = div0 */
-tcg_gen_or_tl(temp2, temp2, temp1);  /* temp2 = overflow | div0 */
-
-/* if overflow or div by zero, set source2 to 1, else don't change */
-tcg_gen_movcond_tl(TCG_COND_EQ, temp2, temp2, zero, source2, one);
-tcg_gen_rem_tl(temp1, temp1, temp2);
-
-/* if div by zero, just return the original dividend */
-tcg_gen_movcond_tl(TCG_COND_NE, ret, source2, zero, temp1, source1);
-
-tcg_temp_free(temp1);
-tcg_temp_free(temp2);
-}
-
-static void gen_remu(TCGv ret, TCGv source1, TCGv source2)
-{
-TCGv temp2, zero, one;
-
-temp2 = tcg_temp_new();
-zero = tcg_constant_tl(0);
-one = tcg_constant_tl(1);
-
-tcg_gen_movcond_tl(TCG_COND_EQ, temp2, source2, zero, source2, one);
-tcg_gen_remu_tl(temp2, source1, temp2);
-
-/* if div by zero, just return the original dividend */
-tcg_gen_movcond_tl(TCG_COND_NE, ret, source2, zero, temp2, source1);
-
-tcg_temp_free(temp2);
-}
-
 static void gen_jal(DisasContext *ctx, int rd, target_ulong imm)
 {
 target_ulong next_pc;
diff --git a/target/riscv/insn_trans/trans_rvm.c.inc b/target/riscv/insn_trans/trans_rvm.c.inc
index 80552be7a3..ca665b96b1 100644
--- a/target/riscv/insn_trans/trans_rvm.c.inc
+++ b/target/riscv/insn_trans/trans_rvm.c.inc
@@ -39,6 +39,21 @@ static bool trans_mulh(DisasContext *ctx, arg_mulh *a)
 return gen_arith(ctx, a, EXT_NONE, gen_mulh);
 }
 
+static void gen_mulhsu(TCGv ret, TCGv arg1, TCGv arg2)
+{
+TCGv rl = tcg_temp_new();
+TCGv rh = tcg_temp_new();
+
+tcg_gen_mulu2_tl(rl, rh, arg1, arg2);
+/* fix up for one negative */
+tcg_gen_sari_tl(rl, arg1, TARGET_LONG_BITS - 1);
+

[PATCH v2 02/21] target/riscv: Clean up division helpers

2021-08-17 Thread Richard Henderson
Utilize the condition in the movcond more; this allows some of
the setcond that were feeding into movcond to be removed.
Do not write into source1 and source2.  Re-name "condN" to "tempN"
and use the temporaries for more than holding conditions.
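
For reference, these TCG sequences implement the RISC-V M-extension corner
cases branchlessly; in C terms the semantics are roughly (a sketch for
orientation only, with MIN standing for the most negative target_long):

/* div */
if (divisor == 0) {
    result = -1;                      /* division by zero yields all bits set */
} else if (dividend == MIN && divisor == -1) {
    result = MIN;                     /* signed overflow returns the dividend */
} else {
    result = dividend / divisor;
}
/* rem: by zero returns the dividend; on signed overflow returns 0.
 * divu by zero returns all-ones; remu by zero returns the dividend. */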

Signed-off-by: Richard Henderson 
---
 target/riscv/translate.c | 137 +++
 1 file changed, 65 insertions(+), 72 deletions(-)

diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 20a55c92fb..6ae7e140d0 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -213,106 +213,99 @@ static void gen_mulhsu(TCGv ret, TCGv arg1, TCGv arg2)
 
 static void gen_div(TCGv ret, TCGv source1, TCGv source2)
 {
-TCGv cond1, cond2, zeroreg, resultopt1;
+TCGv temp1, temp2, zero, one, mone, min;
+
 /*
  * Handle by altering args to tcg_gen_div to produce req'd results:
- * For overflow: want source1 in source1 and 1 in source2
- * For div by zero: want -1 in source1 and 1 in source2 -> -1 result
+ * For overflow: want source1 in temp1 and 1 in temp2
+ * For div by zero: want -1 in temp1 and 1 in temp2 -> -1 result
  */
-cond1 = tcg_temp_new();
-cond2 = tcg_temp_new();
-zeroreg = tcg_constant_tl(0);
-resultopt1 = tcg_temp_new();
+temp1 = tcg_temp_new();
+temp2 = tcg_temp_new();
+zero = tcg_constant_tl(0);
+one = tcg_constant_tl(1);
+mone = tcg_constant_tl(-1);
+min = tcg_constant_tl(1ull << (TARGET_LONG_BITS - 1));
+
+tcg_gen_setcond_tl(TCG_COND_EQ, temp2, source2, mone);
+tcg_gen_setcond_tl(TCG_COND_EQ, temp1, source1, min);
+tcg_gen_and_tl(temp1, temp1, temp2); /* temp1 = overflow */
+tcg_gen_setcond_tl(TCG_COND_EQ, temp2, source2, zero); /* temp2 = div0 */
+tcg_gen_or_tl(temp2, temp2, temp1);  /* temp2 = overflow | div0 */
 
-tcg_gen_movi_tl(resultopt1, (target_ulong)-1);
-tcg_gen_setcondi_tl(TCG_COND_EQ, cond2, source2, (target_ulong)(~0L));
-tcg_gen_setcondi_tl(TCG_COND_EQ, cond1, source1,
-((target_ulong)1) << (TARGET_LONG_BITS - 1));
-tcg_gen_and_tl(cond1, cond1, cond2); /* cond1 = overflow */
-tcg_gen_setcondi_tl(TCG_COND_EQ, cond2, source2, 0); /* cond2 = div 0 */
 /* if div by zero, set source1 to -1, otherwise don't change */
-tcg_gen_movcond_tl(TCG_COND_EQ, source1, cond2, zeroreg, source1,
-resultopt1);
-/* if overflow or div by zero, set source2 to 1, else don't change */
-tcg_gen_or_tl(cond1, cond1, cond2);
-tcg_gen_movi_tl(resultopt1, (target_ulong)1);
-tcg_gen_movcond_tl(TCG_COND_EQ, source2, cond1, zeroreg, source2,
-resultopt1);
-tcg_gen_div_tl(ret, source1, source2);
+tcg_gen_movcond_tl(TCG_COND_NE, temp1, source2, zero, source1, mone);
 
-tcg_temp_free(cond1);
-tcg_temp_free(cond2);
-tcg_temp_free(resultopt1);
+/* if overflow or div by zero, set source2 to 1, else don't change */
+tcg_gen_movcond_tl(TCG_COND_EQ, temp2, temp2, zero, source2, one);
+
+tcg_gen_div_tl(ret, temp1, temp2);
+
+tcg_temp_free(temp1);
+tcg_temp_free(temp2);
 }
 
 static void gen_divu(TCGv ret, TCGv source1, TCGv source2)
 {
-TCGv cond1, zeroreg, resultopt1;
-cond1 = tcg_temp_new();
+TCGv temp1, temp2, zero, one, mone;
 
-zeroreg = tcg_constant_tl(0);
-resultopt1 = tcg_temp_new();
+temp1 = tcg_temp_new();
+temp2 = tcg_temp_new();
+zero = tcg_constant_tl(0);
+one = tcg_constant_tl(1);
+mone = tcg_constant_tl(-1);
 
-tcg_gen_setcondi_tl(TCG_COND_EQ, cond1, source2, 0);
-tcg_gen_movi_tl(resultopt1, (target_ulong)-1);
-tcg_gen_movcond_tl(TCG_COND_EQ, source1, cond1, zeroreg, source1,
-resultopt1);
-tcg_gen_movi_tl(resultopt1, (target_ulong)1);
-tcg_gen_movcond_tl(TCG_COND_EQ, source2, cond1, zeroreg, source2,
-resultopt1);
-tcg_gen_divu_tl(ret, source1, source2);
+tcg_gen_movcond_tl(TCG_COND_NE, temp1, source2, zero, source1, mone);
+tcg_gen_movcond_tl(TCG_COND_NE, temp2, source2, zero, source2, one);
+tcg_gen_divu_tl(ret, temp1, temp2);
 
-tcg_temp_free(cond1);
-tcg_temp_free(resultopt1);
+tcg_temp_free(temp1);
+tcg_temp_free(temp2);
 }
 
 static void gen_rem(TCGv ret, TCGv source1, TCGv source2)
 {
-TCGv cond1, cond2, zeroreg, resultopt1;
+TCGv temp1, temp2, zero, one, mone, min;
 
-cond1 = tcg_temp_new();
-cond2 = tcg_temp_new();
-zeroreg = tcg_constant_tl(0);
-resultopt1 = tcg_temp_new();
+temp1 = tcg_temp_new();
+temp2 = tcg_temp_new();
+zero = tcg_constant_tl(0);
+one = tcg_constant_tl(1);
+mone = tcg_constant_tl(-1);
+min = tcg_constant_tl(1ull << (TARGET_LONG_BITS - 1));
+
+tcg_gen_setcond_tl(TCG_COND_EQ, temp2, source2, mone);
+tcg_gen_setcond_tl(TCG_COND_EQ, temp1, source1, min);
+tcg_gen_and_tl(temp1, temp1, temp2); /* temp1 = overflow */
+tcg_gen_setcondi_tl(TCG_COND_EQ, temp2, source2, 0); /* temp2 = div0 */
+

[PATCH v2 01/21] target/riscv: Use tcg_constant_*

2021-08-17 Thread Richard Henderson
Replace uses of tcg_const_*, where the allocate and free occur close
together, with direct use of the constant.

Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Alistair Francis 
Signed-off-by: Richard Henderson 
---
 target/riscv/translate.c| 36 --
 target/riscv/insn_trans/trans_rvf.c.inc |  3 +-
 target/riscv/insn_trans/trans_rvv.c.inc | 65 +
 3 files changed, 34 insertions(+), 70 deletions(-)

diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 6983be5723..20a55c92fb 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -104,20 +104,16 @@ static void gen_nanbox_s(TCGv_i64 out, TCGv_i64 in)
  */
 static void gen_check_nanbox_s(TCGv_i64 out, TCGv_i64 in)
 {
-TCGv_i64 t_max = tcg_const_i64(0xffffffff00000000ull);
-TCGv_i64 t_nan = tcg_const_i64(0xffffffff7fc00000ull);
+TCGv_i64 t_max = tcg_constant_i64(0xffffffff00000000ull);
+TCGv_i64 t_nan = tcg_constant_i64(0xffffffff7fc00000ull);
 
 tcg_gen_movcond_i64(TCG_COND_GEU, out, in, t_max, in, t_nan);
-tcg_temp_free_i64(t_max);
-tcg_temp_free_i64(t_nan);
 }
 
 static void generate_exception(DisasContext *ctx, int excp)
 {
 tcg_gen_movi_tl(cpu_pc, ctx->base.pc_next);
-TCGv_i32 helper_tmp = tcg_const_i32(excp);
-gen_helper_raise_exception(cpu_env, helper_tmp);
-tcg_temp_free_i32(helper_tmp);
+gen_helper_raise_exception(cpu_env, tcg_constant_i32(excp));
 ctx->base.is_jmp = DISAS_NORETURN;
 }
 
@@ -125,17 +121,13 @@ static void generate_exception_mtval(DisasContext *ctx, int excp)
 {
 tcg_gen_movi_tl(cpu_pc, ctx->base.pc_next);
 tcg_gen_st_tl(cpu_pc, cpu_env, offsetof(CPURISCVState, badaddr));
-TCGv_i32 helper_tmp = tcg_const_i32(excp);
-gen_helper_raise_exception(cpu_env, helper_tmp);
-tcg_temp_free_i32(helper_tmp);
+gen_helper_raise_exception(cpu_env, tcg_constant_i32(excp));
 ctx->base.is_jmp = DISAS_NORETURN;
 }
 
 static void gen_exception_debug(void)
 {
-TCGv_i32 helper_tmp = tcg_const_i32(EXCP_DEBUG);
-gen_helper_raise_exception(cpu_env, helper_tmp);
-tcg_temp_free_i32(helper_tmp);
+gen_helper_raise_exception(cpu_env, tcg_constant_i32(EXCP_DEBUG));
 }
 
 /* Wrapper around tcg_gen_exit_tb that handles single stepping */
@@ -229,7 +221,7 @@ static void gen_div(TCGv ret, TCGv source1, TCGv source2)
  */
 cond1 = tcg_temp_new();
 cond2 = tcg_temp_new();
-zeroreg = tcg_const_tl(0);
+zeroreg = tcg_constant_tl(0);
 resultopt1 = tcg_temp_new();
 
 tcg_gen_movi_tl(resultopt1, (target_ulong)-1);
@@ -250,7 +242,6 @@ static void gen_div(TCGv ret, TCGv source1, TCGv source2)
 
 tcg_temp_free(cond1);
 tcg_temp_free(cond2);
-tcg_temp_free(zeroreg);
 tcg_temp_free(resultopt1);
 }
 
@@ -259,7 +250,7 @@ static void gen_divu(TCGv ret, TCGv source1, TCGv source2)
 TCGv cond1, zeroreg, resultopt1;
 cond1 = tcg_temp_new();
 
-zeroreg = tcg_const_tl(0);
+zeroreg = tcg_constant_tl(0);
 resultopt1 = tcg_temp_new();
 
 tcg_gen_setcondi_tl(TCG_COND_EQ, cond1, source2, 0);
@@ -272,7 +263,6 @@ static void gen_divu(TCGv ret, TCGv source1, TCGv source2)
 tcg_gen_divu_tl(ret, source1, source2);
 
 tcg_temp_free(cond1);
-tcg_temp_free(zeroreg);
 tcg_temp_free(resultopt1);
 }
 
@@ -282,7 +272,7 @@ static void gen_rem(TCGv ret, TCGv source1, TCGv source2)
 
 cond1 = tcg_temp_new();
 cond2 = tcg_temp_new();
-zeroreg = tcg_const_tl(0);
+zeroreg = tcg_constant_tl(0);
 resultopt1 = tcg_temp_new();
 
 tcg_gen_movi_tl(resultopt1, 1L);
@@ -302,7 +292,6 @@ static void gen_rem(TCGv ret, TCGv source1, TCGv source2)
 
 tcg_temp_free(cond1);
 tcg_temp_free(cond2);
-tcg_temp_free(zeroreg);
 tcg_temp_free(resultopt1);
 }
 
@@ -310,7 +299,7 @@ static void gen_remu(TCGv ret, TCGv source1, TCGv source2)
 {
 TCGv cond1, zeroreg, resultopt1;
 cond1 = tcg_temp_new();
-zeroreg = tcg_const_tl(0);
+zeroreg = tcg_constant_tl(0);
 resultopt1 = tcg_temp_new();
 
 tcg_gen_movi_tl(resultopt1, (target_ulong)1);
@@ -323,7 +312,6 @@ static void gen_remu(TCGv ret, TCGv source1, TCGv source2)
 source1);
 
 tcg_temp_free(cond1);
-tcg_temp_free(zeroreg);
 tcg_temp_free(resultopt1);
 }
 
@@ -384,15 +372,11 @@ static inline void mark_fs_dirty(DisasContext *ctx) { }
 
 static void gen_set_rm(DisasContext *ctx, int rm)
 {
-TCGv_i32 t0;
-
 if (ctx->frm == rm) {
 return;
 }
 ctx->frm = rm;
-t0 = tcg_const_i32(rm);
-gen_helper_set_rounding_mode(cpu_env, t0);
-tcg_temp_free_i32(t0);
+gen_helper_set_rounding_mode(cpu_env, tcg_constant_i32(rm));
 }
 
 static int ex_plus_1(DisasContext *ctx, int nf)
diff --git a/target/riscv/insn_trans/trans_rvf.c.inc b/target/riscv/insn_trans/trans_rvf.c.inc
index db1c0c9974..89f78701e7 100644
--- a/target/riscv/insn_trans/trans_rvf.c.inc
+++ b/target/riscv/insn_trans/trans_rvf.c.inc
@@ -200,12 +200,11 @@ static bool trans_fsgnjn_s(DisasContext 

[PATCH v2 00/21] target/riscv: Use tcg_constant_*

2021-08-17 Thread Richard Henderson
Replace uses of tcg_const_*, which make a copy into a temp
that must be freed, with direct use of the constant.

Reorg handling of $zero, with different accessors for
source and destination.

Reorg handling of csrs, passing the actual write_mask
instead of a regno.

Use more helpers for RVH expansion.

Changes for v2:
  * Retain the requirement to call gen_set_gpr.

  * Add DisasExtend as an argument to get_gpr, and ctx->w as a member
of DisasContext.  This should help in implementing UXL, where we
should be able to set ctx->w for all insns, but there is certainly
more required for that.

Because of this, I've dropped most of the r-b from v1.


r~


Richard Henderson (21):
  target/riscv: Use tcg_constant_*
  target/riscv: Clean up division helpers
  target/riscv: Add DisasContext to gen_get_gpr, gen_set_gpr
  target/riscv: Introduce DisasExtend and new helpers
  target/riscv: Add DisasExtend to gen_arith*
  target/riscv: Remove gen_arith_div*
  target/riscv: Use gen_arith for mulh and mulhu
  target/riscv: Move gen_* helpers for RVM
  target/riscv: Move gen_* helpers for RVB
  target/riscv: Add DisasExtend to gen_unary
  target/riscv: Use DisasExtend in shift operations
  target/riscv: Add gen_greviw
  target/riscv: Use get_gpr in branches
  target/riscv: Use {get,dest}_gpr for integer load/store
  target/riscv: Reorg csr instructions
  target/riscv: Use {get,dest}_gpr for RVA
  target/riscv: Use gen_shift_imm_fn for slli_uw
  target/riscv: Use {get,dest}_gpr for RVF
  target/riscv: Use {get,dest}_gpr for RVD
  target/riscv: Tidy trans_rvh.c.inc
  target/riscv: Use {get,dest}_gpr for RVV

 target/riscv/helper.h   |   6 +-
 target/riscv/insn32.decode  |   1 +
 target/riscv/op_helper.c|  18 +-
 target/riscv/translate.c| 702 +---
 target/riscv/insn_trans/trans_rva.c.inc |  51 +-
 target/riscv/insn_trans/trans_rvb.c.inc | 382 ++---
 target/riscv/insn_trans/trans_rvd.c.inc | 127 ++---
 target/riscv/insn_trans/trans_rvf.c.inc | 149 +++--
 target/riscv/insn_trans/trans_rvh.c.inc | 266 ++---
 target/riscv/insn_trans/trans_rvi.c.inc | 360 ++--
 target/riscv/insn_trans/trans_rvm.c.inc | 176 --
 target/riscv/insn_trans/trans_rvv.c.inc | 151 ++---
 12 files changed, 1058 insertions(+), 1331 deletions(-)

-- 
2.25.1




Re: [RFC PATCH 00/13] Add support for Mirror VM.

2021-08-17 Thread Tobin Feldman-Fitzthum



On 8/17/21 12:32 PM, Paolo Bonzini wrote:

On 17/08/21 01:53, Steve Rutherford wrote:

Separately, I'm a little wary of leaving the migration helper mapped
into the shared address space as writable.


A related question here is what the API should be for how the 
migration helper sees the memory in both physical and virtual address spaces.


First of all, I would like the addresses passed to and from the 
migration helper to *not* be guest physical addresses (this is what I 
referred to as QEMU's ram_addr_t in other messages).  The reason is 
that some unmapped memory regions, such as virtio-mem hotplugged 
memory, would still have to be transferred and could be encrypted.  
While the guest->host hypercall interface uses guest physical 
addresses to communicate which pages are encrypted, the host can do 
the GPA->ram_addr_t conversion and remember the encryption status of 
currently-unmapped regions.


This poses a problem, in that the guest needs to prepare the page 
tables for the migration helper and those need to use the migration 
helper's physical address space.


There's three possibilities for this:

1) the easy one: the bottom 4G of guest memory are mapped in the 
mirror VM 1:1.  The ram_addr_t-based addresses are shifted by either 
4G or a huge value such as 2^42 (MAXPHYADDR - physical address 
reduction - 1). This even lets the migration helper reuse the OVMF 
runtime services memory map (but be careful about thread safety...).


This is essentially what we do in our prototype, although we have an 
even simpler approach. We have a 1:1 mapping that maps an address to 
itself with the cbit set. During Migration QEMU asks the migration 
handler to import/export encrypted pages and provides the GPA for said 
page. Since the migration handler only exports/imports encrypted pages, 
we can have the cbit set for every page in our mapping. We can still use 
OVMF functions with these mappings because they are on encrypted pages. 
The MH does need to use a few shared pages (to communicate with QEMU, 
for instance), so we have another mapping without the cbit that is at a 
large offset.
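
Concretely, the two translations could look something like this (a minimal
hypothetical sketch; CBIT_SHIFT and SHARED_BASE are illustrative names, not
our actual code):

/* encrypted pages: identity map, virtual address == GPA with the C-bit set */
static inline uint64_t mh_encrypted_va(uint64_t gpa)
{
    return gpa | (1ULL << CBIT_SHIFT);
}

/* shared pages (e.g. for talking to QEMU): large offset, C-bit clear */
static inline uint64_t mh_shared_va(uint64_t gpa)
{
    return SHARED_BASE + gpa;
}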


I think this is basically equivalent to what you suggest. As you point 
out above, this approach does require that any page that will be 
exported/imported by the MH is mapped in the guest. Is this a bad 
assumption? The VMSA for SEV-ES is one example of a region that is 
encrypted but not mapped in the guest (the PSP handles it directly). We 
have been planning to map the VMSA into the guest to support migration 
with SEV-ES (along with other changes).


2) the more future-proof one.  Here, the migration helper tells QEMU 
which area to copy from the guest to the mirror VM, as a (main GPA, 
length, mirror GPA) tuple.  This could happen for example the first 
time the guest writes 1 to MSR_KVM_MIGRATION_CONTROL.  When migration 
starts, QEMU uses this information to issue KVM_SET_USER_MEMORY_REGION 
accordingly.  The page tables are built for this (usually very high) 
mirror GPA and the migration helper operates in a completely separate 
address space.  However, the backing memory would still be shared 
between the main and mirror VMs.  I am saying this is more future 
proof because we have more flexibility in setting up the physical 
address space of the mirror VM.
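
In code, the registration data would be little more than the stated tuple
(hypothetical names, just to make it concrete):

struct mh_region {
    uint64_t main_gpa;   /* start of the area in the main VM */
    uint64_t length;
    uint64_t mirror_gpa; /* usually very high in the mirror VM */
};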


The Migration Handler in OVMF is not a contiguous region of memory. The 
MH uses OVMF helper functions that are allocated in various regions of 
runtime memory. I guess I can see how separating the memory of the MH 
and the guest OS could be positive. On the other hand, since the MH is 
in OVMF, it is fundamentally designed to coexist with the guest OS.


What do you envision in terms of future changes to the mirror address space?

3) the paranoid one, which I think is what you hint at above: this is 
an extension of (2), where userspace invokes the PSP send/receive API 
to copy the small requested area of the main VM into the mirror VM.  
The mirror VM code and data are completely separate from the main VM.  
All that the mirror VM shares is the ram_addr_t data. Though I am not 
even sure it is possible to use the send/receive API this way...


Yeah not sure if you could use the PSP for this.

-Tobin



What do you think?

Paolo






Re: [PATCH] target/arm: Do hflags rebuild in cpsr_write()

2021-08-17 Thread Richard Henderson

On 8/17/21 10:26 AM, Peter Maydell wrote:

On Tue, 17 Aug 2021 at 21:18, Peter Maydell  wrote:


Currently we rely on all the callsites of cpsr_write() to rebuild the
cached hflags if they change one of the CPSR bits which we use as a
TB flag and cache in hflags.  This is a bit awkward when we want to
change the set of CPSR bits that we cache, because it means we need
to re-audit all the cpsr_write() callsites to see which flags they
are writing and whether they now need to rebuild the hflags.

Switch instead to making cpsr_write() call arm_rebuild_hflags()
itself if one of the bits being changed is a cached bit.

We don't do the rebuild for the CPSRWriteRaw write type,


Doh. I said this, but then...


diff --git a/target/arm/helper.c b/target/arm/helper.c
index 201ecf8c67f..cdd6e0858fc 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -9215,6 +9215,7 @@ void cpsr_write(CPUARMState *env, uint32_t val, uint32_t mask,
  CPSRWriteType write_type)
  {
  uint32_t changed_daif;
+bool rebuild_hflags = mask & (CPSR_M | CPSR_E | CPSR_IL);


...forgot to actually check the write type.

Should be:

 bool rebuild_hflags = (write_type != CPSRWriteRaw) &&
 (mask & (CPSR_M | CPSR_E | CPSR_IL));


with the fix,
Reviewed-by: Richard Henderson 


r~



Re: [PATCH] target/arm: Take an exception if PSTATE.IL is set

2021-08-17 Thread Richard Henderson

On 8/17/21 6:21 AM, Peter Maydell wrote:

In v8A, the PSTATE.IL bit is set for various kinds of illegal
exception return or mode-change attempts.  We already set PSTATE.IL
(or its AArch32 equivalent CPSR.IL) in all those cases, but we
weren't implementing the part of the behaviour where attempting to
execute an instruction with PSTATE.IL takes an immediate exception
with an appropriate syndrome value.

Add a new TB flags bit tracking PSTATE.IL/CPSR.IL, and generate code
to take an exception instead of whatever the instruction would have
been.

PSTATE.IL and CPSR.IL change only on exception entry, attempted
exception exit, and various AArch32 mode changes via cpsr_write().
These places generally already rebuild the hflags, so the only place
we need an extra rebuild_hflags call is in the illegal-return
codepath of the AArch64 exception_return helper.

Signed-off-by: Peter Maydell 
---
Obviously correct guest code is never going to set PSTATE.IL, but
it's pretty confusing to debug bugs in guest OSes if we just plough
on executing code rather than taking the illegal-state exception.  We
had a user point this bug out to us earlier this year I think
(probably on IRC since I can't find anything about it in my email).


Reviewed-by: Richard Henderson 


+if (s->pstate_il) {
+/*
+ * Illegal execution state. This has priority over BTI
+ * exceptions, but comes after instruction abort exceptions.
+ */
+gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
+   syn_illegalstate(), default_exception_el(s));
+}


Missing return after exception.


@@ -9045,6 +9045,15 @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
  return;
  }
  
+if (s->pstate_il) {

+/*
+ * Illegal execution state. This has priority over BTI
+ * exceptions, but comes after instruction abort exceptions.
+ */
+gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
+   syn_illegalstate(), default_exception_el(s));
+}

...

@@ -9576,6 +9586,15 @@ static void thumb_tr_translate_insn(DisasContextBase *dcbase, CPUState *cpu)
  }
  dc->insn = insn;
  
+if (dc->pstate_il) {

+/*
+ * Illegal execution state. This has priority over BTI
+ * exceptions, but comes after instruction abort exceptions.
+ */
+gen_exception_insn(dc, dc->pc_curr, EXCP_UDEF,
+   syn_illegalstate(), default_exception_el(dc));
+}


Likewise.

Otherwise,
Reviewed-by: Richard Henderson 


r~



Re: [PATCH] target/arm: Do hflags rebuild in cpsr_write()

2021-08-17 Thread Peter Maydell
On Tue, 17 Aug 2021 at 21:18, Peter Maydell  wrote:
>
> Currently we rely on all the callsites of cpsr_write() to rebuild the
> cached hflags if they change one of the CPSR bits which we use as a
> TB flag and cache in hflags.  This is a bit awkward when we want to
> change the set of CPSR bits that we cache, because it means we need
> to re-audit all the cpsr_write() callsites to see which flags they
> are writing and whether they now need to rebuild the hflags.
>
> Switch instead to making cpsr_write() call arm_rebuild_hflags()
> itself if one of the bits being changed is a cached bit.
>
> We don't do the rebuild for the CPSRWriteRaw write type,

Doh. I said this, but then...

> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index 201ecf8c67f..cdd6e0858fc 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -9215,6 +9215,7 @@ void cpsr_write(CPUARMState *env, uint32_t val, uint32_t mask,
>  CPSRWriteType write_type)
>  {
>  uint32_t changed_daif;
> +bool rebuild_hflags = mask & (CPSR_M | CPSR_E | CPSR_IL);

...forgot to actually check the write type.

Should be:

bool rebuild_hflags = (write_type != CPSRWriteRaw) &&
(mask & (CPSR_M | CPSR_E | CPSR_IL));

>  if (mask & CPSR_NZCV) {
>  env->ZF = (~val) & CPSR_Z;
> @@ -9334,6 +9335,9 @@ void cpsr_write(CPUARMState *env, uint32_t val, uint32_t mask,
>  }
>  mask &= ~CACHED_CPSR_BITS;
>  env->uncached_cpsr = (env->uncached_cpsr & ~mask) | (val & mask);
> +if (rebuild_hflags) {
> +arm_rebuild_hflags(env);
> +}
>  }
>
>  /* Sign/zero extend */
> --

-- PMM



Re: [RFC] vfio/migration: reduce the msix virq setup cost in resume phase

2021-08-17 Thread Alex Williamson
On Fri, 13 Aug 2021 12:06:14 +0800
"Longpeng(Mike)"  wrote:

> In the migration resume phase, all unmasked MSI-X vectors need to be
> set up when loading the VF state. However, the setup operation takes
> longer if the VF has more unmasked vectors.
> 
> In our case, the VF has 65 vectors and each one spends 0.8ms on the
> setup operation (vfio_add_kvm_msi_virq -> kvm_irqchip_commit_routes), so
> the total cost for the VF is more than 40ms. Even worse, the VM has
> 8 VFs, so the downtime increases by more than 320ms.
> 
> vfio_pci_load_config
>   vfio_msix_enable
> msix_set_vector_notifiers
>   for (vector = 0; vector < dev->msix_entries_nr; vector++) {
> vfio_msix_vector_do_use
>   vfio_add_kvm_msi_virq
> kvm_irqchip_commit_routes <-- 0.8ms
>   }
> 
> Originally, we tried to batch all routes and just commit once
> outside the loop, but it's not easy to fall back to a QEMU interrupt
> if one of them fails.

I'm not sure I follow here, once we setup the virq, what's the failure
path?  kvm_irqchip_commit_routes() returns void.  Were you looking at
adding a "defer" arg to kvm_irqchip_add_msi_route() that skips the
commit, which vfio_add_kvm_msi_virq() might set based on the migration
state and vfio_pci_load_config() could then trigger the commit?
There's more overhead that can be removed if VFIO_DEVICE_SET_IRQS could
be called once rather than per vector.

> So this patch tries to defer the KVM interrupt setup; an unmasked
> vector will use a QEMU interrupt by default and switch to a KVM interrupt
> once it fires.
> 
> Signed-off-by: Longpeng(Mike) 
> ---
>  hw/vfio/pci.c | 39 ++-
>  hw/vfio/pci.h |  2 ++
>  2 files changed, 40 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index e1ea1d8..dd35170 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -47,6 +47,8 @@
>  
>  static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
>  static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
> +static void vfio_add_kvm_msix_virq(VFIOPCIDevice *vdev,
> +   VFIOMSIVector *vector, int nr);
>  
>  /*
>   * Disabling BAR mmaping can be slow, but toggling it around INTx can
> @@ -347,6 +349,11 @@ static void vfio_msi_interrupt(void *opaque)
>  get_msg = msix_get_message;
>  notify = msix_notify;
>  
> +if (unlikely(vector->need_switch)) {
> +vfio_add_kvm_msix_virq(vdev, vector, nr);
> +vector->need_switch = false;
> +}
> +

A better name might be "vector->kvm_routing_deferred".  Essentially this
is just a lazy setup of KVM routes; we could always do this, or we could
do this based on a device option.  I wonder if there are any affinity
benefits in the VM to defer the KVM route.

>  /* A masked vector firing needs to use the PBA, enable it */
>  if (msix_is_masked(&vdev->pdev, nr)) {
>  set_bit(nr, vdev->msix->pending);
> @@ -438,6 +445,25 @@ static void vfio_add_kvm_msi_virq(VFIOPCIDevice *vdev, VFIOMSIVector *vector,
>  vector->virq = virq;
>  }
>  
> +static void
> +vfio_add_kvm_msix_virq(VFIOPCIDevice *vdev, VFIOMSIVector *vector, int nr)
> +{
> +Error *err = NULL;
> +int fd;
> +
> +vfio_add_kvm_msi_virq(vdev, vector, nr, true);
> +if (vector->virq < 0) {
> +return;
> +}
> +
> +fd = event_notifier_get_fd(&vector->kvm_interrupt);
> +if (vfio_set_irq_signaling(&vdev->vbasedev,
> +   VFIO_PCI_MSIX_IRQ_INDEX, nr,
> +   VFIO_IRQ_SET_ACTION_TRIGGER, fd, &err)) {
> +error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
> +}
> +}
> +
>  static void vfio_remove_kvm_msi_virq(VFIOMSIVector *vector)
>  {
>  kvm_irqchip_remove_irqfd_notifier_gsi(kvm_state, &vector->kvm_interrupt,
> @@ -490,7 +516,11 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
>  }
>  } else {
>  if (msg) {
> -vfio_add_kvm_msi_virq(vdev, vector, nr, true);
> +if (unlikely(vdev->defer_set_virq)) {

Likewise this could be "vdev->defer_kvm_irq_routing" and we could apply
it to all IRQ types.

> +vector->need_switch = true;
> +} else {
> +vfio_add_kvm_msi_virq(vdev, vector, nr, true);
> +}
>  }
>  }
>  
> @@ -566,6 +596,11 @@ static void vfio_msix_vector_release(PCIDevice *pdev, unsigned int nr)
>  }
>  }
>  
> +static void inline vfio_msix_defer_set_virq(VFIOPCIDevice *vdev, bool defer)
> +{
> +vdev->defer_set_virq = defer;
> +}

A helper function seems excessive.

> +
>  static void vfio_msix_enable(VFIOPCIDevice *vdev)
>  {
>  PCIDevice *pdev = &vdev->pdev;
> @@ -2466,7 +2501,9 @@ static int vfio_pci_load_config(VFIODevice *vbasedev, QEMUFile *f)
>  if (msi_enabled(pdev)) {
>  vfio_msi_enable(vdev);
>  } else if (msix_enabled(pdev)) {
> +vfio_msix_defer_set_virq(vdev, true);
>  

[PATCH v3] block/file-win32: add reopen handlers

2021-08-17 Thread Viktor Prutyanov
Make 'qemu-img commit' work on Windows.

Command 'commit' requires reopening the backing file in RW mode. So,
add reopen prepare/commit/abort handlers and change dwShareMode
for the CreateFile call in order to allow subsequent read/write reopening.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/418
Suggested-by: Hanna Reitz 
Signed-off-by: Viktor Prutyanov 
---
 v2:
- fix indentation in raw_reopen_prepare
- free rs if raw_reopen_prepare fails
 v3:
- restore suggested-by field missed in v2

 block/file-win32.c | 90 +-
 1 file changed, 89 insertions(+), 1 deletion(-)

diff --git a/block/file-win32.c b/block/file-win32.c
index 2642088bd6..9dcbb2b53b 100644
--- a/block/file-win32.c
+++ b/block/file-win32.c
@@ -58,6 +58,10 @@ typedef struct BDRVRawState {
 QEMUWin32AIOState *aio;
 } BDRVRawState;
 
+typedef struct BDRVRawReopenState {
+HANDLE hfile;
+} BDRVRawReopenState;
+
 /*
  * Read/writes the data to/from a given linear buffer.
  *
@@ -392,7 +396,7 @@ static int raw_open(BlockDriverState *bs, QDict *options, int flags,
 }
 
 s->hfile = CreateFile(filename, access_flags,
-  FILE_SHARE_READ, NULL,
+  FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
   OPEN_EXISTING, overlapped, NULL);
 if (s->hfile == INVALID_HANDLE_VALUE) {
 int err = GetLastError();
@@ -634,6 +638,86 @@ static int coroutine_fn raw_co_create_opts(BlockDriver *drv,
 return raw_co_create(&options, errp);
 }
 
+static int raw_reopen_prepare(BDRVReopenState *state,
+  BlockReopenQueue *queue, Error **errp)
+{
+BDRVRawState *s = state->bs->opaque;
+BDRVRawReopenState *rs;
+int access_flags;
+DWORD overlapped;
+int ret = 0;
+
+rs = g_new0(BDRVRawReopenState, 1);
+
+raw_parse_flags(state->flags, s->aio != NULL, &access_flags, &overlapped);
+rs->hfile = CreateFile(state->bs->filename, access_flags,
+   FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
+   OPEN_EXISTING, overlapped, NULL);
+
+if (rs->hfile == INVALID_HANDLE_VALUE) {
+int err = GetLastError();
+
+error_setg_win32(errp, err, "Could not reopen '%s'",
+ state->bs->filename);
+if (err == ERROR_ACCESS_DENIED) {
+ret = -EACCES;
+} else {
+ret = -EINVAL;
+}
+goto fail;
+}
+
+if (s->aio) {
+ret = win32_aio_attach(s->aio, rs->hfile);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "Could not enable AIO");
+goto fail;
+}
+}
+
+state->opaque = rs;
+
+return 0;
+
+fail:
+g_free(rs);
+state->opaque = NULL;
+
+return ret;
+}
+
+static void raw_reopen_commit(BDRVReopenState *state)
+{
+BDRVRawState *s = state->bs->opaque;
+BDRVRawReopenState *rs = state->opaque;
+
+if (!rs) {
+return;
+}
+
+CloseHandle(s->hfile);
+s->hfile = rs->hfile;
+
+g_free(rs);
+state->opaque = NULL;
+}
+
+static void raw_reopen_abort(BDRVReopenState *state)
+{
+BDRVRawReopenState *rs = state->opaque;
+
+if (!rs) {
+return;
+}
+
+if (rs->hfile != INVALID_HANDLE_VALUE) {
+CloseHandle(rs->hfile);
+}
+
+g_free(rs);
+state->opaque = NULL;
+}
+
 static QemuOptsList raw_create_opts = {
 .name = "raw-create-opts",
 .head = QTAILQ_HEAD_INITIALIZER(raw_create_opts.head),
@@ -659,6 +743,10 @@ BlockDriver bdrv_file = {
 .bdrv_co_create_opts = raw_co_create_opts,
 .bdrv_has_zero_init = bdrv_has_zero_init_1,
 
+.bdrv_reopen_prepare = raw_reopen_prepare,
+.bdrv_reopen_commit  = raw_reopen_commit,
+.bdrv_reopen_abort   = raw_reopen_abort,
+
 .bdrv_aio_preadv= raw_aio_preadv,
 .bdrv_aio_pwritev   = raw_aio_pwritev,
 .bdrv_aio_flush = raw_aio_flush,
-- 
2.21.0




[PATCH] target/arm: Do hflags rebuild in cpsr_write()

2021-08-17 Thread Peter Maydell
Currently we rely on all the callsites of cpsr_write() to rebuild the
cached hflags if they change one of the CPSR bits which we use as a
TB flag and cache in hflags.  This is a bit awkward when we want to
change the set of CPSR bits that we cache, because it means we need
to re-audit all the cpsr_write() callsites to see which flags they
are writing and whether they now need to rebuild the hflags.

Switch instead to making cpsr_write() call arm_rebuild_hflags()
itself if one of the bits being changed is a cached bit.

We don't do the rebuild for the CPSRWriteRaw write type, because that
kind of write is generally doing something special anyway.  For the
CPSRWriteRaw callsites in the KVM code and inbound migration we
definitely don't want to recalculate the hflags; the callsites in
boot.c and arm-powerctl.c have to do a rebuild-hflags call themselves
anyway because of other CPU state changes they make.

This allows us to drop explicit arm_rebuild_hflags() calls in a
couple of places where the only reason we needed to call it was the
CPSR write.

This fixes a bug where we were incorrectly failing to rebuild hflags
in the code path for a gdbstub write to CPSR, which meant that you
could make QEMU assert by breaking into a running guest, altering the
CPSR to change the value of, for example, CPSR.E, and then
continuing.

Signed-off-by: Peter Maydell 
---
I've included CPSR.IL on the list of "cached bits" on the assumption
that my "honour PSTATE.IL" patch is going in, but this patch doesn't
strictly depend on that one. Just in case, though:
Based-on: id:20210817162118.24319-1-peter.mayd...@linaro.org
"[PATCH] target/arm: Take an exception if PSTATE.IL is set"

 target/arm/cpu.h| 10 --
 linux-user/arm/signal.c |  2 --
 target/arm/helper.c |  4 
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index be557bf5d83..4c0568f85a8 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1393,11 +1393,17 @@ uint32_t cpsr_read(CPUARMState *env);
 typedef enum CPSRWriteType {
 CPSRWriteByInstr = 0, /* from guest MSR or CPS */
 CPSRWriteExceptionReturn = 1, /* from guest exception return insn */
-CPSRWriteRaw = 2, /* trust values, do not switch reg banks */
+CPSRWriteRaw = 2, /* trust values, no reg bank switch, no hflags rebuild */
 CPSRWriteByGDBStub = 3,   /* from the GDB stub */
 } CPSRWriteType;
 
-/* Set the CPSR.  Note that some bits of mask must be all-set or all-clear.*/
+/*
+ * Set the CPSR.  Note that some bits of mask must be all-set or all-clear.
+ * This will do an arm_rebuild_hflags() if any of the bits in @mask
+ * correspond to TB flags bits cached in the hflags, unless @write_type
+ * is CPSRWriteRaw.
+ */
 void cpsr_write(CPUARMState *env, uint32_t val, uint32_t mask,
 CPSRWriteType write_type);
 
diff --git a/linux-user/arm/signal.c b/linux-user/arm/signal.c
index 32b68ee302b..1dfcfd2d57b 100644
--- a/linux-user/arm/signal.c
+++ b/linux-user/arm/signal.c
@@ -289,7 +289,6 @@ setup_return(CPUARMState *env, struct target_sigaction *ka,
 env->regs[14] = retcode;
 env->regs[15] = handler & (thumb ? ~1 : ~3);
 cpsr_write(env, cpsr, CPSR_IT | CPSR_T | CPSR_E, CPSRWriteByInstr);
-arm_rebuild_hflags(env);
 
 return 0;
 }
@@ -547,7 +546,6 @@ restore_sigcontext(CPUARMState *env, struct target_sigcontext *sc)
 __get_user(env->regs[15], &sc->arm_pc);
 __get_user(cpsr, &sc->arm_cpsr);
 cpsr_write(env, cpsr, CPSR_USER | CPSR_EXEC, CPSRWriteByInstr);
-arm_rebuild_hflags(env);
 
 err |= !valid_user_regs(env);
 
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 201ecf8c67f..cdd6e0858fc 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -9215,6 +9215,7 @@ void cpsr_write(CPUARMState *env, uint32_t val, uint32_t mask,
 CPSRWriteType write_type)
 {
 uint32_t changed_daif;
+bool rebuild_hflags = mask & (CPSR_M | CPSR_E | CPSR_IL);
 
 if (mask & CPSR_NZCV) {
 env->ZF = (~val) & CPSR_Z;
@@ -9334,6 +9335,9 @@ void cpsr_write(CPUARMState *env, uint32_t val, uint32_t mask,
 }
 mask &= ~CACHED_CPSR_BITS;
 env->uncached_cpsr = (env->uncached_cpsr & ~mask) | (val & mask);
+if (rebuild_hflags) {
+arm_rebuild_hflags(env);
+}
 }
 
 /* Sign/zero extend */
-- 
2.20.1




[PATCH v2] block/file-win32: add reopen handlers

2021-08-17 Thread Viktor Prutyanov
Make 'qemu-img commit' work on Windows.

Command 'commit' requires reopening the backing file in RW mode. So,
add reopen prepare/commit/abort handlers and change dwShareMode
for the CreateFile call in order to allow subsequent read/write reopening.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/418

Signed-off-by: Viktor Prutyanov 
---
 v2:
- fix indentation in raw_reopen_prepare
- free rs if raw_reopen_prepare fails

 block/file-win32.c | 90 +-
 1 file changed, 89 insertions(+), 1 deletion(-)

diff --git a/block/file-win32.c b/block/file-win32.c
index 2642088bd6..9dcbb2b53b 100644
--- a/block/file-win32.c
+++ b/block/file-win32.c
@@ -58,6 +58,10 @@ typedef struct BDRVRawState {
 QEMUWin32AIOState *aio;
 } BDRVRawState;
 
+typedef struct BDRVRawReopenState {
+HANDLE hfile;
+} BDRVRawReopenState;
+
 /*
  * Read/writes the data to/from a given linear buffer.
  *
@@ -392,7 +396,7 @@ static int raw_open(BlockDriverState *bs, QDict *options, int flags,
 }
 
 s->hfile = CreateFile(filename, access_flags,
-  FILE_SHARE_READ, NULL,
+  FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
   OPEN_EXISTING, overlapped, NULL);
 if (s->hfile == INVALID_HANDLE_VALUE) {
 int err = GetLastError();
@@ -634,6 +638,86 @@ static int coroutine_fn raw_co_create_opts(BlockDriver *drv,
 return raw_co_create(&options, errp);
 }
 
+static int raw_reopen_prepare(BDRVReopenState *state,
+  BlockReopenQueue *queue, Error **errp)
+{
+BDRVRawState *s = state->bs->opaque;
+BDRVRawReopenState *rs;
+int access_flags;
+DWORD overlapped;
+int ret = 0;
+
+rs = g_new0(BDRVRawReopenState, 1);
+
+raw_parse_flags(state->flags, s->aio != NULL, &access_flags, &overlapped);
+rs->hfile = CreateFile(state->bs->filename, access_flags,
+   FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
+   OPEN_EXISTING, overlapped, NULL);
+
+if (rs->hfile == INVALID_HANDLE_VALUE) {
+int err = GetLastError();
+
+error_setg_win32(errp, err, "Could not reopen '%s'",
+ state->bs->filename);
+if (err == ERROR_ACCESS_DENIED) {
+ret = -EACCES;
+} else {
+ret = -EINVAL;
+}
+goto fail;
+}
+
+if (s->aio) {
+ret = win32_aio_attach(s->aio, rs->hfile);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "Could not enable AIO");
+goto fail;
+}
+}
+
+state->opaque = rs;
+
+return 0;
+
+fail:
+g_free(rs);
+state->opaque = NULL;
+
+return ret;
+}
+
+static void raw_reopen_commit(BDRVReopenState *state)
+{
+BDRVRawState *s = state->bs->opaque;
+BDRVRawReopenState *rs = state->opaque;
+
+if (!rs) {
+return;
+}
+
+CloseHandle(s->hfile);
+s->hfile = rs->hfile;
+
+g_free(rs);
+state->opaque = NULL;
+}
+
+static void raw_reopen_abort(BDRVReopenState *state)
+{
+BDRVRawReopenState *rs = state->opaque;
+
+if (!rs) {
+return;
+}
+
+if (rs->hfile != INVALID_HANDLE_VALUE) {
+CloseHandle(rs->hfile);
+}
+
+g_free(rs);
+state->opaque = NULL;
+}
+
 static QemuOptsList raw_create_opts = {
 .name = "raw-create-opts",
 .head = QTAILQ_HEAD_INITIALIZER(raw_create_opts.head),
@@ -659,6 +743,10 @@ BlockDriver bdrv_file = {
 .bdrv_co_create_opts = raw_co_create_opts,
 .bdrv_has_zero_init = bdrv_has_zero_init_1,
 
+.bdrv_reopen_prepare = raw_reopen_prepare,
+.bdrv_reopen_commit  = raw_reopen_commit,
+.bdrv_reopen_abort   = raw_reopen_abort,
+
 .bdrv_aio_preadv= raw_aio_preadv,
 .bdrv_aio_pwritev   = raw_aio_pwritev,
 .bdrv_aio_flush = raw_aio_flush,
-- 
2.21.0




Re: [PATCH v3 13/25] python/aqmp: add QMP Message format

2021-08-17 Thread Eric Blake
On Tue, Aug 03, 2021 at 02:29:29PM -0400, John Snow wrote:
> The Message class is here primarily to serve as a solid type to use for
> mypy static typing for unambiguous annotation and documentation.
> 
> We can also stuff JSON serialization and deserialization into this class
> itself so it can be re-used even outside this infrastructure.
> 
> Signed-off-by: John Snow 
> ---
>  python/qemu/aqmp/__init__.py |   4 +-
>  python/qemu/aqmp/message.py  | 209 +++
>  2 files changed, 212 insertions(+), 1 deletion(-)
>  create mode 100644 python/qemu/aqmp/message.py

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v3 09/10] virtiofsd: Optionally fill lo_inode.fhandle

2021-08-17 Thread Vivek Goyal
On Tue, Aug 17, 2021 at 10:27:16AM +0200, Hanna Reitz wrote:
> On 16.08.21 21:44, Vivek Goyal wrote:
> > On Wed, Aug 11, 2021 at 08:41:18AM +0200, Hanna Reitz wrote:
> > 
> > [..]
> > > > > But given the inotify complications, there’s really a good reason we 
> > > > > should
> > > > > use mountinfo.
> > > > > 
> > > > > > > It’s a bit tricky because our sandboxing prevents easy access to 
> > > > > > > mountinfo,
> > > > > > > but if that’s the only way...
> > > > > > yes. We already have lo->proc_self_fd. Maybe we need to keep
> > > > > > /proc/self/mountinfo open in lo->proc_self_mountinfo. I am assuming
> > > > > > that any mount table changes will still be visible despite the fact
> > > > > > I have fd open (and don't have to open new fd to notice new 
> > > > > > mount/unmount
> > > > > > changes).
> > > > > Well, yes, that was my idea.  Unfortunately, I wasn’t quite 
> > > > > successful yet;
> > > > > when I tried keeping the fd open, reading from it would just return 0
> > > > > bytes.  Perhaps that’s because we bind-mount /proc/self/fd to /proc 
> > > > > so that
> > > > > nothing else in /proc is visible. Perhaps we need to bind-mount
> > > > > /proc/self/mountinfo into /proc/self/fd before that...
> > > > Or perhaps open /proc/self/mountinfo and save fd in lo->proc_mountinfo
> > > > before /proc/self/fd is bind mounted on /proc?
> > > Yes, I tried that, and then reading would just return 0 bytes.
> > Hi Hanna,
> > 
> > I tried this simple patch and I can read /proc/self/mountinfo before
> > bind mounting /proc/self/fd and after bind mounting /proc/self/fd. Am
> > I missing something.
> 
> Yes, but I tried reading it in the main loop (where we’d actually need it). 
> It looks like the umount2(".", MNT_DETACH) in setup_mounts() breaks it.

Good point. I modified my code and noticed too that after umount2() it
always reads 0 bytes. I can understand that all the other mount points
could go away, but the new rootfs mount point of virtiofsd should still be
visible, IIUC. I don't understand why.

Anyway, I tried re-opening the /proc/self/mountinfo file after umount2(".",
MNT_DETACH), and that seems to work: it shows the root mount point. I
created a bind mount and it shows that too.

So it looks like a quick fix can be to re-open /proc/self/mountinfo. But
that means we can't bind /proc/self/fd on /proc/. We could bind-mount
/proc/self on /proc instead. Not sure if that is safe enough.

Here is the debug patch I tried.


---
 tools/virtiofsd/passthrough_ll.c |  101 +--
 1 file changed, 96 insertions(+), 5 deletions(-)

Index: rhvgoyal-qemu/tools/virtiofsd/passthrough_ll.c
===
--- rhvgoyal-qemu.orig/tools/virtiofsd/passthrough_ll.c 2021-08-16 15:29:27.712223551 -0400
+++ rhvgoyal-qemu/tools/virtiofsd/passthrough_ll.c  2021-08-17 15:40:20.456811218 -0400
@@ -172,6 +172,8 @@ struct lo_data {
 
 /* An O_PATH file descriptor to /proc/self/fd/ */
 int proc_self_fd;
+int proc_mountinfo;
+int proc_self;
 int user_killpriv_v2, killpriv_v2;
 /* If set, virtiofsd is responsible for setting umask during creation */
 bool change_umask;
@@ -3403,12 +3405,56 @@ static void setup_wait_parent_capabiliti
 capng_apply(CAPNG_SELECT_BOTH);
 }
 
+static void read_mountinfo(struct lo_data *lo)
+{
+char buf[4096];
+ssize_t count, total_read = 0;
+int ret;
+
+ret = lseek(lo->proc_mountinfo, 0, SEEK_SET);
+if (ret == -1) {
+fuse_log(FUSE_LOG_ERR, "lseek(): %m\n");
+exit(1);
+}
+
+do {
+count = read(lo->proc_mountinfo, buf, 4095);
+if (count == -1) {
+fuse_log(FUSE_LOG_ERR, "read(/proc/self/mountinfo): %m\n");
+exit(1);
+}
+
+//fuse_log(FUSE_LOG_INFO, "read(%d) bytes\n", count);
+buf[count] = '\0';
+fuse_log(FUSE_LOG_INFO, "%s", buf);
+total_read += count;
+} while (count);
+
+fuse_log(FUSE_LOG_INFO, "read %zd bytes\n", total_read);
+}
+
+static void reopen_mountinfo(struct lo_data *lo)
+{
+int fd;
+
+close(lo->proc_mountinfo);
+
+fd = openat(lo->proc_self, "mountinfo", O_RDONLY);
+if (fd == -1) {
+fuse_log(FUSE_LOG_ERR, "open(/proc/self/mountinfo, O_RDONLY): %m\n");
+exit(1);
+}
+
+lo->proc_mountinfo = fd;
+}
+
 /*
  * Move to a new mount, net, and pid namespaces to isolate this process.
  */
 static void setup_namespaces(struct lo_data *lo, struct fuse_session *se)
 {
 pid_t child;
+int fd;
 
 /*
  * Create a new pid namespace for *child* processes.  We'll have to
@@ -3472,21 +3518,35 @@ static void setup_namespaces(struct lo_d
 exit(1);
 }
 
+fd = open("/proc/self/mountinfo", O_RDONLY);
+if (fd == -1) {
+fuse_log(FUSE_LOG_ERR, "open(/proc/self/mountinfo, O_RDONLY): %m\n");
+exit(1);
+}
+
+lo->proc_mountinfo = fd;
+
 /*
  * We only need /proc/self/fd. Prevent ".." from 

Re: [PATCH v3 12/25] python/aqmp: add AsyncProtocol._readline() method

2021-08-17 Thread Eric Blake
On Tue, Aug 03, 2021 at 02:29:28PM -0400, John Snow wrote:
> This is added as a courtesy: many protocols are line-based, including
> QMP. Putting it in AsyncProtocol lets us keep the QMP class
> implementation just a pinch more abstract.
> 
> (And, if we decide to add a QTEST implementation later, it will need
> this, too. (Yes, I have a QTEST implementation.))
> 
> Signed-off-by: John Snow 
> ---
>  python/qemu/aqmp/protocol.py | 29 +
>  1 file changed, 29 insertions(+)

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v3 11/25] python/aqmp: add _cb_inbound and _cb_outbound logging hooks

2021-08-17 Thread Eric Blake
On Tue, Aug 03, 2021 at 02:29:27PM -0400, John Snow wrote:
> Add hooks designed to log/filter incoming/outgoing messages. The primary
> intent for these is to be able to support iotests which may want to log
> messages with specific filters for reproducible output.
> 
> Another use is for plugging into Urwid frameworks; all messages in/out
> can be automatically added to a rendering list for the purposes of a
> qmp-shell like tool.
> 
> Signed-off-by: John Snow 
> ---
>  python/qemu/aqmp/protocol.py | 50 +---
>  1 file changed, 46 insertions(+), 4 deletions(-)
>

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v3 10/25] python/aqmp: add configurable read buffer limit

2021-08-17 Thread Eric Blake
On Tue, Aug 03, 2021 at 02:29:26PM -0400, John Snow wrote:
> QMP can transmit some pretty big messages, and the default limit of 64KB
> isn't sufficient. Make sure that we can configure it.
> 
> Reported-by: G S Niteesh Babu 
> Signed-off-by: John Snow 
> ---
>  python/qemu/aqmp/protocol.py | 18 --
>  1 file changed, 16 insertions(+), 2 deletions(-)

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v3 09/25] python/aqmp: add AsyncProtocol.accept() method

2021-08-17 Thread Eric Blake
On Tue, Aug 03, 2021 at 02:29:25PM -0400, John Snow wrote:
> It's a little messier than connect, because it wasn't designed to accept
> *precisely one* connection. Such is life.
> 
> Signed-off-by: John Snow 
> ---
>  python/qemu/aqmp/protocol.py | 89 ++--
>  1 file changed, 85 insertions(+), 4 deletions(-)
> 
> diff --git a/python/qemu/aqmp/protocol.py b/python/qemu/aqmp/protocol.py
> index 77b330627b3..7eca65aa265 100644
> --- a/python/qemu/aqmp/protocol.py
> +++ b/python/qemu/aqmp/protocol.py
> @@ -243,6 +243,24 @@ async def runstate_changed(self) -> Runstate:
>  await self._runstate_event.wait()
>  return self.runstate
>  
> +@upper_half
> +@require(Runstate.IDLE)
> +async def accept(self, address: Union[str, Tuple[str, int]],
> + ssl: Optional[SSLContext] = None) -> None:
> +"""
> +Accept a connection and begin processing message queues.
> +
> +If this call fails, `runstate` is guaranteed to be set back to 
> `IDLE`.
> +
> +:param address:
> +Address to listen to; UNIX socket path or TCP address/port.

Can't TCP use a well-known port name instead of an int?  But limiting
clients to just int port for now isn't fatal to the patch.

> +:param ssl: SSL context to use, if any.
> +
> +:raise StateError: When the `Runstate` is not `IDLE`.
> +:raise ConnectError: If a connection could not be accepted.
> +"""
> +await self._new_session(address, ssl, accept=True)
> +

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




[PATCH v4] vga: don't abort when adding a duplicate isa-vga device

2021-08-17 Thread Jose R. Ziviani
If users try to add an isa-vga device that was already registered,
even directly from the command line, QEMU will crash:

$ qemu-system-mips64el -M pica61 -device isa-vga
RAMBlock "vga.vram" already registered, abort!
Aborted (core dumped)

That particular board registers the device automatically, so it's
not obvious that a VGA device already exists. This patch changes
this behavior by displaying a message and exiting without crashing.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/44
Signed-off-by: Jose R. Ziviani 
---
v3 to v4: Used object_resolve_path_type instead of qemu_ram_block_by_name
  and copied the message from virgl, to give the same user experience.
v2 to v3: Improved error message
v1 to v2: Use error_setg instead of error_report

 hw/display/vga-isa.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/hw/display/vga-isa.c b/hw/display/vga-isa.c
index 90851e730b..8cea84f2be 100644
--- a/hw/display/vga-isa.c
+++ b/hw/display/vga-isa.c
@@ -33,6 +33,7 @@
 #include "hw/loader.h"
 #include "hw/qdev-properties.h"
 #include "qom/object.h"
+#include "qapi/error.h"
 
 #define TYPE_ISA_VGA "isa-vga"
 OBJECT_DECLARE_SIMPLE_TYPE(ISAVGAState, ISA_VGA)
@@ -61,6 +62,15 @@ static void vga_isa_realizefn(DeviceState *dev, Error **errp)
 MemoryRegion *vga_io_memory;
 const MemoryRegionPortio *vga_ports, *vbe_ports;
 
+/*
+ * make sure this device is not being added twice, if so
+ * exit without crashing qemu
+ */
+if (object_resolve_path_type("", TYPE_ISA_VGA, NULL)) {
+error_setg(errp, "at most one %s device is permitted", TYPE_ISA_VGA);
+return;
+}
+
 s->global_vmstate = true;
 vga_common_init(s, OBJECT(dev));
 s->legacy_address_space = isa_address_space(isadev);
-- 
2.32.0




Re: [PATCH v3 08/25] python/aqmp: add logging to AsyncProtocol

2021-08-17 Thread Eric Blake
On Tue, Aug 03, 2021 at 02:29:24PM -0400, John Snow wrote:
> Give the connection and the reader/writer tasks nicknames, and add
> logging statements throughout.
> 
> Signed-off-by: John Snow 
> ---
>  python/qemu/aqmp/protocol.py | 82 
>  1 file changed, 73 insertions(+), 9 deletions(-)
>

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v3 07/25] python/aqmp: Add logging utility helpers

2021-08-17 Thread Eric Blake
On Tue, Aug 03, 2021 at 02:29:23PM -0400, John Snow wrote:
> Signed-off-by: John Snow 
> ---
>  python/qemu/aqmp/util.py | 56 
>  1 file changed, 56 insertions(+)
>

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v3 09/13] python/aqmp-tui: Add QMP connection manager

2021-08-17 Thread Niteesh G. S.
On Tue, Aug 17, 2021 at 10:21 AM John Snow  wrote:

>
>
> On Fri, Jul 30, 2021 at 4:19 PM G S Niteesh Babu 
> wrote:
>
>> Instead of manually connecting and disconnecting from the
>> server, we now rely on the runstate to manage the QMP
>> connection.
>>
>> Along with this the ability to reconnect on certain exceptions
>> has also been added.
>>
>> Signed-off-by: G S Niteesh Babu 
>> ---
>>  python/qemu/aqmp/aqmp_tui.py | 109 ++-
>>  1 file changed, 94 insertions(+), 15 deletions(-)
>>
>> diff --git a/python/qemu/aqmp/aqmp_tui.py b/python/qemu/aqmp/aqmp_tui.py
>> index 0d5ec62cb7..ef91883fa5 100644
>> --- a/python/qemu/aqmp/aqmp_tui.py
>> +++ b/python/qemu/aqmp/aqmp_tui.py
>> @@ -25,8 +25,9 @@
>>  import urwid_readline
>>
>>  from ..qmp import QEMUMonitorProtocol, QMPBadPortError
>> +from .error import ProtocolError
>>  from .message import DeserializationError, Message, UnexpectedTypeError
>> -from .protocol import ConnectError
>> +from .protocol import ConnectError, Runstate
>>  from .qmp_client import ExecInterruptedError, QMPClient
>>  from .util import create_task, pretty_traceback
>>
>> @@ -67,12 +68,24 @@ def format_json(msg: str) -> str:
>>  return ' '.join(words)
>>
>>
>> +def type_name(mtype: Any) -> str:
>> +"""
>> +Returns the type name
>> +"""
>> +return type(mtype).__name__
>>
>
> This is a lot of lines for something that doesn't do very much -- do we
> really need it?
>
No. This has been removed in v4.

>
>
>> +
>> +
>>  class App(QMPClient):
>> -def __init__(self, address: Union[str, Tuple[str, int]]) -> None:
>> +def __init__(self, address: Union[str, Tuple[str, int]],
>> num_retries: int,
>> + retry_delay: Optional[int]) -> None:
>>  urwid.register_signal(type(self), UPDATE_MSG)
>>  self.window = Window(self)
>>  self.address = address
>>  self.aloop: Optional[Any] = None  # FIXME: Use more concrete
>> type.
>> +self.num_retries = num_retries
>> +self.retry_delay = retry_delay
>> +self.retry: bool = False
>> +self.disconnecting: bool = False
>>
>
> Why is this one needed again ? ...
>
A race condition occurs in protocol.py line 597.
The reason behind this is that there are two disconnect calls initiated. The
first one via kill_app
and the second one via manage_connection when the state is set to
disconnecting by the first call.
One of the calls sets the state to IDLE (protocol.py:584) after it has
finished disconnecting; meanwhile
the second call is still in the process of disconnecting and asserts that
the state is DISCONNECTING
in protocol.py:597, which it is not, since it has been set to IDLE by the
first call.

If I don't guard against the second call I get the following exception:
--
Traceback (most recent call last):
  File "/home/niteesh/development/qemu/python/.venv/bin/aqmp-tui", line 33, in <module>
    sys.exit(load_entry_point('qemu', 'console_scripts', 'aqmp-tui')())
  File "/home/niteesh/development/qemu/python/qemu/aqmp/aqmp_tui.py", line 695, in main
    app.run(args.asyncio_debug)
  File "/home/niteesh/development/qemu/python/qemu/aqmp/aqmp_tui.py", line 444, in run
    raise err
  File "/home/niteesh/development/qemu/python/qemu/aqmp/aqmp_tui.py", line 441, in run
    main_loop.run()
  File "/home/niteesh/development/qemu/python/.venv/lib/python3.6/site-packages/urwid/main_loop.py", line 287, in run
    self._run()
  File "/home/niteesh/development/qemu/python/.venv/lib/python3.6/site-packages/urwid/main_loop.py", line 385, in _run
    self.event_loop.run()
  File "/home/niteesh/development/qemu/python/.venv/lib/python3.6/site-packages/urwid/main_loop.py", line 1494, in run
    reraise(*exc_info)
  File "/home/niteesh/development/qemu/python/.venv/lib/python3.6/site-packages/urwid/compat.py", line 58, in reraise
    raise value
  File "/home/niteesh/development/qemu/python/qemu/aqmp/aqmp_tui.py", line 391, in manage_connection
    await self.disconnect()
  File "/home/niteesh/development/qemu/python/qemu/aqmp/aqmp_tui.py", line 312, in disconnect
    raise err
  File "/home/niteesh/development/qemu/python/qemu/aqmp/aqmp_tui.py", line 300, in disconnect
    await super().disconnect()
  File "/home/niteesh/development/qemu/python/qemu/aqmp/protocol.py", line 302, in disconnect
    await self._wait_disconnect()
  File "/home/niteesh/development/qemu/python/qemu/aqmp/protocol.py", line 583, in _wait_disconnect
    self._cleanup()
  File "/home/niteesh/development/qemu/python/qemu/aqmp/qmp_client.py", line 331, in _cleanup
    super()._cleanup()
  File "/home/niteesh/development/qemu/python/qemu/aqmp/protocol.py", line 597, in _cleanup
    assert self.runstate == Runstate.DISCONNECTING
AssertionError
---
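
For reference, the guard amounts to roughly the following. This is a
minimal self-contained sketch, not the actual patch; apart from the
`disconnecting` flag from the diff above, the names here are made up
for illustration:

import asyncio


class ExampleClient:
    """Stand-in for the TUI App; only the guard logic is shown."""

    def __init__(self) -> None:
        self.disconnecting = False

    async def _protocol_disconnect(self) -> None:
        # Stands in for the protocol-level disconnect, whose cleanup
        # asserts that the runstate is still DISCONNECTING.
        await asyncio.sleep(0)

    async def disconnect(self) -> None:
        # Only the first caller runs the teardown; a second concurrent
        # call would otherwise reach _cleanup() after the runstate has
        # already gone back to IDLE and trip the assertion.
        if self.disconnecting:
            return
        self.disconnecting = True
        await self._protocol_disconnect()


async def main() -> None:
    client = ExampleClient()
    # Two racing disconnects, as with kill_app and manage_connection:
    await asyncio.gather(client.disconnect(), client.disconnect())


if __name__ == '__main__':
    asyncio.run(main())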


>
>>  super().__init__()
>>
>>  def add_to_history(self, msg: 

Re: Using loadvm with snapshot

2021-08-17 Thread Dr. David Alan Gilbert
* Peter Maydell (peter.mayd...@linaro.org) wrote:
> On Tue, 17 Aug 2021 at 17:27, Dr. David Alan Gilbert
>  wrote:
> >
> > * Gabriel Southern (gsout...@qti.qualcomm.com) wrote:
> > > Hi,
> > >
> > > Are there plans to support using -loadvm with -snapshot?
> > >
> > > I saw some past discussion on mailing list including bug that was closed 
> > > earlier this year but no recent activity:
> > >
> > > https://lore.kernel.org/qemu-devel/162424905685.11837.7303570061857383641.mal...@loganberry.canonical.com/
> > >
> > > Testing with latest qemu it looks like -loadvm still does not work when 
> > > combined with -snapshot.
> > >
> > > I'm curious how complex it would be to implement this feature and if it 
> > > may show up on QEMU roadmap in the future. Or if there is alterative 
> > > command that can be used to save the state of a running VM and then use 
> > > the same qcow to run multiple QEMU instances loading this VM state 
> > > running in snapshot mode.
> 
> Do you know why -loadvm and -snapshot don't work together?
> It doesn't on the face of it seem like they would be incompatible...

No I don't; I've never actually used -loadvm - given that both the
snapshot mechanism and the load/savevm mechanism are fairly special
qcow2 hacks, it'll be a qcow2-level thing.

Dave

> -- PMM
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [PATCH v3] vga: don't abort when adding a duplicate isa-vga device

2021-08-17 Thread Jose R. Ziviani
On Tue, Aug 17, 2021 at 10:07:55AM +0200, Philippe Mathieu-Daudé wrote:
> On 8/17/21 9:36 AM, Mark Cave-Ayland wrote:
> > On 17/08/2021 08:25, Thomas Huth wrote:
> > 
> >> On 16/08/2021 15.55, Jose R. Ziviani wrote:
> >>> If users try to add an isa-vga device that was already registered,
> >>> still in command line, qemu will crash:
> >>>
> >>> $ qemu-system-mips64el -M pica61 -device isa-vga
> >>> RAMBlock "vga.vram" already registered, abort!
> >>> Aborted (core dumped)
> >>>
> >>> That particular board registers the device automatically, so it's
> >>> not obvious that a VGA device already exists. This patch changes
> >>> this behavior by displaying a message and exiting without crashing.
> >>>
> >>> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/44
> >>> Reviewed-by: Philippe Mathieu-Daudé 
> >>> Signed-off-by: Jose R. Ziviani 
> >>> ---
> >>> v2 to v3: Improved error message
> >>> v1 to v2: Use error_setg instead of error_report
> >>>
> >>>   hw/display/vga-isa.c | 10 ++
> >>>   1 file changed, 10 insertions(+)
> >>>
> >>> diff --git a/hw/display/vga-isa.c b/hw/display/vga-isa.c
> >>> index 90851e730b..30d55b41c3 100644
> >>> --- a/hw/display/vga-isa.c
> >>> +++ b/hw/display/vga-isa.c
> >>> @@ -33,6 +33,7 @@
> >>>   #include "hw/loader.h"
> >>>   #include "hw/qdev-properties.h"
> >>>   #include "qom/object.h"
> >>> +#include "qapi/error.h"
> >>>   #define TYPE_ISA_VGA "isa-vga"
> >>>   OBJECT_DECLARE_SIMPLE_TYPE(ISAVGAState, ISA_VGA)
> >>> @@ -61,6 +62,15 @@ static void vga_isa_realizefn(DeviceState *dev,
> >>> Error **errp)
> >>>   MemoryRegion *vga_io_memory;
> >>>   const MemoryRegionPortio *vga_ports, *vbe_ports;
> >>> +    /*
> >>> + * make sure this device is not being added twice, if so
> >>> + * exit without crashing qemu
> >>> + */
> >>> +    if (qemu_ram_block_by_name("vga.vram")) {
> >>> +    error_setg(errp, "'isa-vga' device already registered");
> >>> +    return;
> >>> +    }
> >>> +
> >>>   s->global_vmstate = true;
> >>>   vga_common_init(s, OBJECT(dev));
> >>>   s->legacy_address_space = isa_address_space(isadev);
> >>>
> >>
> >> Reviewed-by: Thomas Huth 
> > 
> > Instead of checking for the presence of the vga.vram block, would it be
> > better to use the standard object_resolve_path_type() method to check
> > for the presence of the existing isa-vga device instead? See
> > https://lists.gnu.org/archive/html/qemu-devel/2021-07/msg00717.html for
> > how this was done for virgl.
> 
> I remembered there was a nicer way but couldn't find it.
> If this patch were for 6.1, it would have been good enough. Now that
> it will be merged in 6.2, I prefer Mark's suggestion.
> Jose, do you mind a v4?
> 

Hello people! Thanks for reviewing it.

Sure, I'll send a v4. It's not for 6.1 anyway.

Thank you very much




Re: Using loadvm with snapshot

2021-08-17 Thread Peter Maydell
On Tue, 17 Aug 2021 at 17:27, Dr. David Alan Gilbert
 wrote:
>
> * Gabriel Southern (gsout...@qti.qualcomm.com) wrote:
> > Hi,
> >
> > Are there plans to support using -loadvm with -snapshot?
> >
> > I saw some past discussion on mailing list including bug that was closed 
> > earlier this year but no recent activity:
> >
> > https://lore.kernel.org/qemu-devel/162424905685.11837.7303570061857383641.mal...@loganberry.canonical.com/
> >
> > Testing with latest qemu it looks like -loadvm still does not work when 
> > combined with -snapshot.
> >
> > I'm curious how complex it would be to implement this feature and if it may 
> > show up on QEMU roadmap in the future. Or if there is alterative command 
> > that can be used to save the state of a running VM and then use the same 
> > qcow to run multiple QEMU instances loading this VM state running in 
> > snapshot mode.

Do you know why -loadvm and -snapshot don't work together?
It doesn't on the face of it seem like they would be incompatible...

-- PMM



[PATCH] migration: Don't sync dirty bitmap when init

2021-08-17 Thread Peter Xu
Drop migration_bitmap_sync_precopy() since the dirty bitmap is initialized to
all ones anyway, so there is no need to sync at the start.

While at it, clean the locks up a bit:

  - RCU lock is only needed to walk the ramblocks, move it into
ram_list_init_bitmaps().

  - The ram_list lock seems to be unnecessary now, drop it.

  - The bql should only be needed for memory_global_dirty_log_start(), move it
to only protect that.

Signed-off-by: Peter Xu 
---
 migration/ram.c | 19 +++
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 7a43bfd7af..189d6427ac 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2651,6 +2651,7 @@ static void ram_list_init_bitmaps(void)
 shift = CLEAR_BITMAP_SHIFT_MIN;
 }
 
+RCU_READ_LOCK_GUARD();
 RAMBLOCK_FOREACH_NOT_IGNORED(block) {
 pages = block->max_length >> TARGET_PAGE_BITS;
 /*
@@ -2672,20 +2673,14 @@ static void ram_list_init_bitmaps(void)
 
 static void ram_init_bitmaps(RAMState *rs)
 {
-/* For memory_global_dirty_log_start below.  */
-qemu_mutex_lock_iothread();
-qemu_mutex_lock_ramlist();
+ram_list_init_bitmaps();
 
-WITH_RCU_READ_LOCK_GUARD() {
-ram_list_init_bitmaps();
-/* We don't use dirty log with background snapshots */
-if (!migrate_background_snapshot()) {
-memory_global_dirty_log_start();
-migration_bitmap_sync_precopy(rs);
-}
+/* We don't use dirty log with background snapshots */
+if (!migrate_background_snapshot()) {
+qemu_mutex_lock_iothread();
+memory_global_dirty_log_start();
+qemu_mutex_unlock_iothread();
 }
-qemu_mutex_unlock_ramlist();
-qemu_mutex_unlock_iothread();
 }
 
 static int ram_init_all(RAMState **rsp)
-- 
2.31.1




Re: [PATCH 1/3] MAINTAINERS: Split Audio backends VS frontends

2021-08-17 Thread Philippe Mathieu-Daudé
On 8/17/21 6:12 PM, Christian Schoenebeck wrote:
> On Dienstag, 17. August 2021 14:41:27 CEST Gerd Hoffmann wrote:
>>   Hi,
>>
 +Overall Audio frontends
>>>
>>> I would call that "Audio Hardware Emulation" instead of "Overall Audio
>>> frontends".
>>>
 +Overall Audio backends
>>>
>>> Likewise I would call this section "Shared/common QEMU audio library/
>>> subsystem" or something like that instead of "Overall Audio backends".
>>
>> Well, frontend/backend is common qemu terminology, with "frontend" being
>> the emulated/virtual device as seen by the guest and "backend" being the
>> host-side wireup (i.e. -audiodev / -blockdev / -chardev / -netdev / ...)
>>
>> take care,
>>   Gerd
> 
> Yeah, I was seeing this (as usual) more from an external/new developer 
> perspective, where the semantics of "frontend"/"backend" are not that obvious 
> here.

"Audio Backends" is in the "Subsystems" meta-section, and
"Audio Frontends" in the "Devices" one.

Maybe using the === separator for meta-sections (like rST)
instead of --- would help to see the difference.

I don't want to use parenthesis in the descriptions because
this then breaks the get_maintainer.pl output when parsed
by (git) scripts.

> 
> But okay ...
> 
> Acked-by: Christian Schoenebeck 
> 
> Best regards,
> Christian Schoenebeck
> 
> 




Re: [PATCH] multifd: Implement yank for multifd send side

2021-08-17 Thread Leonardo Bras Soares Passos
On Mon, Aug 16, 2021 at 2:44 AM Leonardo Bras Soares Passos
 wrote:
>
> Hello Lukas,
>
> On Wed, Aug 4, 2021 at 4:27 PM Lukas Straub  wrote:
> >
> > When introducing yank functionality in the migration code I forgot
> > to cover the multifd send side.
> >
> > Signed-off-by: Lukas Straub 
> > ---
> >
> > @Leonardo Could you check if this fixes your issue?
>
> In the same scenario I was testing my previous patch,
> (http://patchwork.ozlabs.org/project/qemu-devel/patch/20210730074043.54260-1-leob...@redhat.com/)
> this patch also seems to fix the issue.
> (https://bugzilla.redhat.com/show_bug.cgi?id=1970337).

Regarding this single test:
Tested-by: Leonardo Bras 

I am by no means a yank or migration expert, but the change seems to
make sense based on what I could learn in the above BZ.

So, FWIW:
Reviewed-by: Leonardo Bras 

Although I think it would be great if a more experienced person could
also review.

Best regards,
Leonardo Bras



>
>
> >
> >  migration/multifd.c | 6 +-
> >  migration/multifd.h | 2 ++
> >  2 files changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/migration/multifd.c b/migration/multifd.c
> > index 377da78f5b..5a4f158f3c 100644
> > --- a/migration/multifd.c
> > +++ b/migration/multifd.c
> > @@ -546,6 +546,9 @@ void multifd_save_cleanup(void)
> >  MultiFDSendParams *p = &multifd_send_state->params[i];
> >  Error *local_err = NULL;
> >
> > +if (p->registered_yank) {
> > +migration_ioc_unregister_yank(p->c);
> > +}
> >  socket_send_channel_destroy(p->c);
> >  p->c = NULL;
> >  qemu_mutex_destroy(&p->mutex);
> > @@ -813,7 +816,8 @@ static bool multifd_channel_connect(MultiFDSendParams *p,
> >  return false;
> >  }
> >  } else {
> > -/* update for tls qio channel */
> > +migration_ioc_register_yank(ioc);
> > +p->registered_yank = true;
> >  p->c = ioc;
> > +qemu_thread_create(&p->thread, p->name, multifd_send_thread, p,
> > QEMU_THREAD_JOINABLE);
> > diff --git a/migration/multifd.h b/migration/multifd.h
> > index 8d6751f5ed..16c4d112d1 100644
> > --- a/migration/multifd.h
> > +++ b/migration/multifd.h
> > @@ -85,6 +85,8 @@ typedef struct {
> >  bool running;
> >  /* should this thread finish */
> >  bool quit;
> > +/* is the yank function registered */
> > +bool registered_yank;
> >  /* thread has work to do */
> >  int pending_job;
> >  /* array of pages to sent */
> > --
> > 2.32.0




Re: [RFC PATCH 00/13] Add support for Mirror VM.

2021-08-17 Thread Paolo Bonzini

On 17/08/21 01:53, Steve Rutherford wrote:

Separately, I'm a little wary of leaving the migration helper mapped
into the shared address space as writable.


A related question here is what the API should be for how the migration 
helper sees the memory in both physical and virtual address.


First of all, I would like the addresses passed to and from the 
migration helper to *not* be guest physical addresses (this is what I 
referred to as QEMU's ram_addr_t in other messages).  The reason is that 
some unmapped memory regions, such as virtio-mem hotplugged memory, 
would still have to be transferred and could be encrypted.  While the 
guest->host hypercall interface uses guest physical addresses to 
communicate which pages are encrypted, the host can do the 
GPA->ram_addr_t conversion and remember the encryption status of 
currently-unmapped regions.


This poses a problem, in that the guest needs to prepare the page tables 
for the migration helper and those need to use the migration helper's 
physical address space.


There's three possibilities for this:

1) the easy one: the bottom 4G of guest memory are mapped in the mirror 
VM 1:1.  The ram_addr_t-based addresses are shifted by either 4G or a 
huge value such as 2^42 (MAXPHYADDR - physical address reduction - 1). 
This even lets the migration helper reuse the OVMF runtime services 
memory map (but be careful about thread safety...).


2) the more future-proof one.  Here, the migration helper tells QEMU 
which area to copy from the guest to the mirror VM, as a (main GPA, 
length, mirror GPA) tuple.  This could happen for example the first time 
the guest writes 1 to MSR_KVM_MIGRATION_CONTROL.  When migration starts, 
QEMU uses this information to issue KVM_SET_USER_MEMORY_REGION 
accordingly.  The page tables are built for this (usually very high) 
mirror GPA and the migration helper operates in a completely separate 
address space.  However, the backing memory would still be shared 
between the main and mirror VMs.  I am saying this is more future proof 
because we have more flexibility in setting up the physical address 
space of the mirror VM.


3) the paranoid one, which I think is what you hint at above: this is an 
extension of (2), where userspace invokes the PSP send/receive API to 
copy the small requested area of the main VM into the mirror VM.  The 
mirror VM code and data are completely separate from the main VM.  All 
that the mirror VM shares is the ram_addr_t data.  Though I am not even 
sure it is possible to use the send/receive API this way...


What do you think?

Paolo




Re: [PATCH v4 1/3] target-arm: Add support for Fujitsu A64FX

2021-08-17 Thread Andrew Jones
On Tue, Aug 17, 2021 at 05:53:34AM -1000, Richard Henderson wrote:
> On 8/17/21 5:36 AM, Andrew Jones wrote:
> > On Tue, Aug 17, 2021 at 05:23:17AM -1000, Richard Henderson wrote:
> > > On 8/17/21 1:56 AM, Andrew Jones wrote:
> > > > I guess it's fine. You could easily create a new cpu_arm_set_sve_vq()
> > > > which would forbid changing the properties if you wanted to, but then
> > > > we need to answer Peter's question in order to see if there's a
> > > > precedent for that type of property.
> > > 
> > > I don't see the point in read-only properties.  If the user wants to set
> > > non-standard values on the command-line, let them.  What is most important
> > > is getting the correct default from '-cpu a64fx'.
> > > 
> > 
> > So maybe we should just go ahead and add all sve* properties, but then
> > make sure the default vq map is correct.
> 
> I think that's the right answer.
> 
> Presently we have a kvm_supported variable that's initialized by
> kvm_arm_sve_get_vls().  I think we want to rename that variable and provide
> a version of that function for tcg. Probably we should have done that
> before, with a trivial function for -cpu max to set all bits.
> 
> Then eliminate most of the other kvm_enabled() checks in
> arm_cpu_sve_finalize.  I think the only one we keep is the last, where we
> verify that the final sve_vq_map matches kvm_enabled exactly, modulo max_vq.
> 
> This should minimize the differences in behaviour between tcg and kvm.

That's a good idea. I'll send a patch with your suggested-by.

Thanks,
drew




Re: Using loadvm with snapshot

2021-08-17 Thread Dr. David Alan Gilbert
* Gabriel Southern (gsout...@qti.qualcomm.com) wrote:
> Hi,
> 
> Are there plans to support using -loadvm with -snapshot?
> 
> I saw some past discussion on mailing list including bug that was closed 
> earlier this year but no recent activity:
> 
> https://lore.kernel.org/qemu-devel/162424905685.11837.7303570061857383641.mal...@loganberry.canonical.com/
> 
> Testing with latest qemu it looks like -loadvm still does not work when 
> combined with -snapshot.
> 
> I'm curious how complex it would be to implement this feature and if it may 
> show up on QEMU roadmap in the future. Or if there is alterative command that 
> can be used to save the state of a running VM and then use the same qcow to 
> run multiple QEMU instances loading this VM state running in snapshot mode.

One thing that might work for you is migrate to a file;

 (qemu)  migrate_set_parameter max-bandwidth 100G
 (qemu) migrate "exec:cat > myfile.mig"
 (qemu) q


qemu -incoming "exec:cat myfile.mig"

Dave
> 
> Thanks,
> 
> -Gabriel
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




[PATCH] target/arm: Take an exception if PSTATE.IL is set

2021-08-17 Thread Peter Maydell
In v8A, the PSTATE.IL bit is set for various kinds of illegal
exception return or mode-change attempts.  We already set PSTATE.IL
(or its AArch32 equivalent CPSR.IL) in all those cases, but we
weren't implementing the part of the behaviour where attempting to
execute an instruction with PSTATE.IL takes an immediate exception
with an appropriate syndrome value.

Add a new TB flags bit tracking PSTATE.IL/CPSR.IL, and generate code
to take an exception instead of whatever the instruction would have
been.

PSTATE.IL and CPSR.IL change only on exception entry, attempted
exception exit, and various AArch32 mode changes via cpsr_write().
These places generally already rebuild the hflags, so the only place
we need an extra rebuild_hflags call is in the illegal-return
codepath of the AArch64 exception_return helper.

Signed-off-by: Peter Maydell 
---
Obviously correct guest code is never going to set PSTATE.IL, but
it's pretty confusing to debug bugs in guest OSes if we just plough
on executing code rather than taking the illegal-state exception.  We
had a user point this bug out to us earlier this year I think
(probably on IRC since I can't find anything about it in my email).
---
 target/arm/cpu.h   |  1 +
 target/arm/syndrome.h  |  5 +
 target/arm/translate.h |  2 ++
 target/arm/helper-a64.c|  1 +
 target/arm/helper.c|  8 
 target/arm/translate-a64.c | 10 ++
 target/arm/translate.c | 19 +++
 7 files changed, 46 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 9f0a5f84d50..be557bf5d83 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -3441,6 +3441,7 @@ FIELD(TBFLAG_ANY, FPEXC_EL, 8, 2)
 FIELD(TBFLAG_ANY, DEBUG_TARGET_EL, 10, 2)
 /* Memory operations require alignment: SCTLR_ELx.A or CCR.UNALIGN_TRP */
 FIELD(TBFLAG_ANY, ALIGN_MEM, 12, 1)
+FIELD(TBFLAG_ANY, PSTATE__IL, 13, 1)
 
 /*
  * Bit usage when in AArch32 state, both A- and M-profile.
diff --git a/target/arm/syndrome.h b/target/arm/syndrome.h
index 39a31260f2d..c590a109da9 100644
--- a/target/arm/syndrome.h
+++ b/target/arm/syndrome.h
@@ -270,4 +270,9 @@ static inline uint32_t syn_wfx(int cv, int cond, int ti, bool is_16bit)
(cv << 24) | (cond << 20) | ti;
 }
 
+static inline uint32_t syn_illegalstate(void)
+{
+return EC_ILLEGALSTATE << ARM_EL_EC_SHIFT;
+}
+
 #endif /* TARGET_ARM_SYNDROME_H */
diff --git a/target/arm/translate.h b/target/arm/translate.h
index 241596c5bda..af1b6fa03c9 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -98,6 +98,8 @@ typedef struct DisasContext {
 bool hstr_active;
 /* True if memory operations require alignment */
 bool align_mem;
+/* True if PSTATE.IL is set */
+bool pstate_il;
 /*
  * >= 0, a copy of PSTATE.BTYPE, which will be 0 without v8.5-BTI.
  *  < 0, set by the current instruction.
diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
index 26f79f9141a..19445b3c947 100644
--- a/target/arm/helper-a64.c
+++ b/target/arm/helper-a64.c
@@ -1071,6 +1071,7 @@ illegal_return:
 if (!arm_singlestep_active(env)) {
 env->pstate &= ~PSTATE_SS;
 }
+helper_rebuild_hflags_a64(env, cur_el);
 qemu_log_mask(LOG_GUEST_ERROR, "Illegal exception return at EL%d: "
   "resuming execution at 0x%" PRIx64 "\n", cur_el, env->pc);
 }
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 155d8bf2399..201ecf8c67f 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -13408,6 +13408,10 @@ static CPUARMTBFlags rebuild_hflags_a32(CPUARMState *env, int fp_el,
 DP_TBFLAG_A32(flags, HSTR_ACTIVE, 1);
 }
 
+if (env->uncached_cpsr & CPSR_IL) {
+DP_TBFLAG_ANY(flags, PSTATE__IL, 1);
+}
+
 return rebuild_hflags_common_32(env, fp_el, mmu_idx, flags);
 }
 
@@ -13502,6 +13506,10 @@ static CPUARMTBFlags rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
 }
 }
 
+if (env->pstate & PSTATE_IL) {
+DP_TBFLAG_ANY(flags, PSTATE__IL, 1);
+}
+
 if (cpu_isar_feature(aa64_mte, env_archcpu(env))) {
 /*
  * Set MTE_ACTIVE if any access may be Checked, and leave clear
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 422e2ac0c96..7ff922d4302 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -14662,6 +14662,15 @@ static void disas_a64_insn(CPUARMState *env, DisasContext *s)
 s->fp_access_checked = false;
 s->sve_access_checked = false;
 
+if (s->pstate_il) {
+/*
+ * Illegal execution state. This has priority over BTI
+ * exceptions, but comes after instruction abort exceptions.
+ */
+gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
+   syn_illegalstate(), default_exception_el(s));
+}
+
 if (dc_isar_feature(aa64_bti, s)) {
 if (s->base.num_insns == 1) {
 /*
@@ -14780,6 +14789,7 @@ static void 

Re: Bootloading within QEMU?

2021-08-17 Thread Peter Maydell
On Tue, 17 Aug 2021 at 16:57, Paolo Bonzini  wrote:
> On 17/08/21 16:31, Kenneth Adam Miller wrote:
> > I am trying to discover how to schedule QEMU to begin actual emulation
> > as currently my target correctly starts QEMU but only shows the shell,
> > and not even boot loading occurs within QEMU. I'm trying to learn from
> > example, and so will focus my questions only on X86.

x86 is the oldest of QEMU's target architectures and thus the
one most laden down with ancient "we wouldn't write it that way today"
code, backwards-compatibility cruft and other confusing encrustations.
It's not a good choice for trying to learn how a target architecture
should be structured, I'm afraid. Arm is good-in-parts but has a similar
amount of old code and back-compat junk (we have overhauled the translate.c
code massively, so that part is good, unlike the i386 translate.c which
is absolutely dreadful). You might try riscv, that's a lot newer.

> > I can see the
> > MachineClass and MachineState types, and I have tried to follow QEMU
> > with the debugger and found where QEMU calls qemu_init and
> > qemu_main_loop under qemu/softmmu/main.c, and even tried to follow
> > through from init to main loop to see where it would begin booting, but
> > I cannot see where the bootloader is scheduled or specified or started
> > from within the target.
>
> There are two possibilities:
>
> 1) QEMU loads a fixed firmware file, usually at a fixed address in
> memory so that the reset vector of the CPU is inside the firmware.  This
> is what happens for example on x86.  The firmware ultimately boots the
> machine (e.g. on x86 you have BIOS->GRUB->Linux or something like that).
>
> 2) QEMU loads a binary specified on the command line---typically with
> -kernel, which is stored in current_machine->kernel_filename---and
> somehow arranges for the guest to execute that file when it starts.  For
> example one possibility is to write a jump instruction at the CPU reset
> vector (see riscv_setup_rom_reset_vec for an example).  The functions
> you want to look at for the loading part are load_elf_ram*, and
> load_uimage_as and load_image_targphys_as.

For a new architecture I would strongly suggest not putting
any more magic into the "-kernel" handling than you can avoid.
You probably do want it to do "load a Linux kernel with whatever
the standard image format and boot protocol that implies", but
stick to exactly that (and if you can avoid it, don't even implement
that). Definitely don't overload it with "and if it's an ELF file then
load it like an ELF file too" or supporting 15 different kinds of file
format or other "do what I mean" handling.

You can do generic "load an ELF file" with the generic-loader
https://qemu-project.gitlab.io/qemu/system/generic-loader.html
which requires no architecture or board specific handling --
as the name suggests, it is generic ;-) . This makes it different
from the -kernel and -bios options, which both need at least
some handling in the board code.
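
For example (the machine, file name and address here are made up):

  qemu-system-riscv64 -M virt \
      -device loader,file=program.elf \
      -device loader,addr=0x80000000,data=0x1234,data-len=4

The first -device loader instance loads an ELF file at the addresses
it was linked for; the second pokes a 4-byte value into guest memory
at a fixed address. Neither needs any support from the board model.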

A general recommendation: to the extent that you can do so, avoid
implementing behaviour in QEMU which is not just "same thing the
real hardware does". When you're implementing "what the hardware
does" you have a concrete specification that defines what the
"right thing" is, people writing code for it hopefully already
know what that behaviour is, and you can generally point your users
at the h/w docs for specifics rather than having to write them
up in the QEMU docs. As soon as you wander off into the realms
of "it would be kind of convenient if QEMU directly booted this
file I had lying around" (which usually implies emulating some
behaviour that is not that of the hardware but of firmware or
a bootloader) things get a lot murkier and you can end up with
a bit of a mess, especially over time. Worse, that mess is hard
to back out of because we don't like to break backwards-compatibility
for user command lines that worked with previous QEMU versions.
target/arm's "let's just pretend we're a bootloader because
it's convenient" code was initially about 100 lines long;
today it is more than ten times that size...

-- PMM



Re: [PATCH for-6.2 21/25] hw/timer/armv7m_systick: Use clock inputs instead of system_clock_scale

2021-08-17 Thread Damien Hedde



On 8/17/21 5:59 PM, Peter Maydell wrote:
> On Tue, 17 Aug 2021 at 16:55, Damien Hedde  wrote:
>>
>>
>>
>> On 8/12/21 11:33 AM, Peter Maydell wrote:
>> According to
>> https://developer.arm.com/documentation/ddi0403/d/System-Level-Architecture/System-Address-Map/The-system-timer--SysTick/SysTick-Calibration-value-Register--SYST-CALIB
>> , the field is 24bits wide.
>>
>> Should we prevent an overflow into the reserved bits and other fields ?
>> by doing something like this:
>>val &= SYSCALIB_TENMS;
>> with the following #define with the other ones, above.
>> #define SYSCALIB_TENMS ((1U << 24) - 1)
>>
>> Note, the overflow would happen around ~1.68GHz refclk frequency, it is
>> probably a config that will never happen. I'm not sure if we should care
>> or do something if this happens because it is probably an error
>> somewhere else.
> 
> I guess we should do something, yes, though pretty much anything
> we do will not really provide the guest with sensible data...
> I suppose masking out the higher bits is no worse than anything else.
> 
> -- PMM
> 


Then, with the masking.
Reviewed-by: Damien Hedde 

--
Damien



Re: [PATCH 1/3] MAINTAINERS: Split Audio backends VS frontends

2021-08-17 Thread Christian Schoenebeck
On Dienstag, 17. August 2021 14:41:27 CEST Gerd Hoffmann wrote:
>   Hi,
> 
> > > +Overall Audio frontends
> > 
> > I would call that "Audio Hardware Emulation" instead of "Overall Audio
> > frontends".
> > 
> > > +Overall Audio backends
> > 
> > Likewise I would call this section "Shared/common QEMU audio library/
> > subsystem" or something like that instead of "Overall Audio backends".
> 
> Well, frontend/backend is common qemu terminology, with "frontend" being
> the emulated/virtual device as seen by the guest and "backend" being the
> host-side wireup (i.e. -audiodev / -blockdev / -chardev / -netdev / ...)
> 
> take care,
>   Gerd

Yeah, I was seeing this (as usual) more from an external/new developer 
perspective, where the semantics of "frontend"/"backend" are not that obvious 
here.

But okay ...

Acked-by: Christian Schoenebeck 

Best regards,
Christian Schoenebeck





Re: [PATCH RESEND 2/2] memory: Add tracepoint for dirty sync

2021-08-17 Thread Peter Xu
On Tue, Aug 17, 2021 at 09:25:56AM +0200, David Hildenbrand wrote:
> On 17.08.21 03:37, Peter Xu wrote:
> > Trace at memory_region_sync_dirty_bitmap() for log_sync() or 
> > global_log_sync()
> > on memory regions.  One trace line should suffice when it finishes, so as to
> > estimate the time used for each log sync process.
> 
> I wonder if a start/finish would be even nicer. At least it wouldn't really
> result in significantly more code changes :)

Note that the "name"s I added are not only about avoiding start/end; they're
about knowing which memory listener is slow.  Start/end won't achieve that if
we don't have a name for them.  So far I just wanted to identify mainly kvm,
vhost and kvm-smram; however, it'll always be good to notice when some
log_sync is missed while tracing.

I'm also wondering whether kvm-smram needs a whole bitmap, as I don't know
what RAM would be touched within system management mode (as I thought it
should only touch a very limited range that should be defined somewhere), but
that's off-topic.

If we want to make it start/end pair, I can do that too.  But the 1st patch
will still be wanted.

Thanks,

-- 
Peter Xu




Re: [PATCH RESEND 2/2] memory: Add tracepoint for dirty sync

2021-08-17 Thread David Hildenbrand

On 17.08.21 18:05, Peter Xu wrote:

On Tue, Aug 17, 2021 at 09:25:56AM +0200, David Hildenbrand wrote:

On 17.08.21 03:37, Peter Xu wrote:

Trace at memory_region_sync_dirty_bitmap() for log_sync() or global_log_sync()
on memory regions.  One trace line should suffice when it finishes, so as to
estimate the time used for each log sync process.


I wonder if a start/finish would be even nicer. At least it wouldn't really
result in significantly more code changes :)


Note that the "name"s I added is not only for not using start/end, it's about
knowing which memory listener is slow.  Start/end won't achieve that if we
don't have a name for them.  So far I just wanted to identify majorly kvm,
vhost and kvm-smram, however it'll always be good when some log_sync is missed
when tracing.

I'm also wondering whether kvm-smram needs a whole bitmap as I don't know what
RAM would be touched within system manager mode (as I thought it should only
touch a very limited range and should be defined somewhere), but that's
off-topic.

If we want to make it start/end pair, I can do that too.  But the 1st patch
will still be wanted.


Yeah, absolutely, not complaining about the name, it will be valuable to 
have!



--
Thanks,

David / dhildenb




Re: [PATCH for-6.2 23/25] hw/arm/stellaris: Split stellaris-gptm into its own file

2021-08-17 Thread Damien Hedde



On 8/12/21 11:33 AM, Peter Maydell wrote:
> The implementation of the Stellaris general purpose timer module
> device stellaris-gptm is currently in the same source file as the
> board model.  Split it out into its own source file in hw/timer.
> 
> Apart from the new file comment headers and the Kconfig and
> meson.build changes, this is just code movement.
> 
> Signed-off-by: Peter Maydell 

Reviewed-by: Damien Hedde 

--
Damien



Re: [PATCH for-6.2 21/25] hw/timer/armv7m_systick: Use clock inputs instead of system_clock_scale

2021-08-17 Thread Peter Maydell
On Tue, 17 Aug 2021 at 16:55, Damien Hedde  wrote:
>
>
>
> On 8/12/21 11:33 AM, Peter Maydell wrote:
> According to
> https://developer.arm.com/documentation/ddi0403/d/System-Level-Architecture/System-Address-Map/The-system-timer--SysTick/SysTick-Calibration-value-Register--SYST-CALIB
> , the field is 24bits wide.
>
> Should we prevent an overflow into the reserved bits and other fields ?
> by doing something like this:
>val &= SYSCALIB_TENMS;
> with the following #define with the other ones, above.
> #define SYSCALIB_TENMS ((1U << 24) - 1)
>
> Note, the overflow would happen around ~1.68GHz refclk frequency, it is
> probably a config that will never happen. I'm not sure if we should care
> or do something if this happens because it is probably an error
> somewhere else.

I guess we should do something, yes, though pretty much anything
we do will not really provide the guest with sensible data...
I suppose masking out the higher bits is no worse than anything else.

-- PMM



Re: [PATCH for-6.2 21/25] hw/timer/armv7m_systick: Use clock inputs instead of system_clock_scale

2021-08-17 Thread Damien Hedde



On 8/12/21 11:33 AM, Peter Maydell wrote:
> Now that all users of the systick devices wire up the clock inputs,
> use those instead of the system_clock_scale and the hardwired 1MHz
> value for the reference clock.
> 
> This will fix various board models where we were incorrectly
> providing a 1MHz reference clock instead of some other value or
> instead of providing no reference clock at all.
> 
> Signed-off-by: Peter Maydell 
> ---
>  hw/timer/armv7m_systick.c | 110 --
>  1 file changed, 82 insertions(+), 28 deletions(-)
> 
> diff --git a/hw/timer/armv7m_systick.c b/hw/timer/armv7m_systick.c
> index e43f74114e8..39cca206cfd 100644
> --- a/hw/timer/armv7m_systick.c
> +++ b/hw/timer/armv7m_systick.c
> @@ -18,25 +18,29 @@
>  #include "qemu/timer.h"
>  #include "qemu/log.h"
>  #include "qemu/module.h"
> +#include "qapi/error.h"
>  #include "trace.h"
>  
> -/* qemu timers run at 1GHz.   We want something closer to 1MHz.  */
> -#define SYSTICK_SCALE 1000ULL
> -
>  #define SYSTICK_ENABLE(1 << 0)
>  #define SYSTICK_TICKINT   (1 << 1)
>  #define SYSTICK_CLKSOURCE (1 << 2)
>  #define SYSTICK_COUNTFLAG (1 << 16)
>  
> +#define SYSCALIB_NOREF (1U << 31)
> +#define SYSCALIB_SKEW (1U << 30)
> +
>  int system_clock_scale;
>  
> -/* Conversion factor from qemu timer to SysTick frequencies.  */
> -static inline int64_t systick_scale(SysTickState *s)
> +static void systick_set_period_from_clock(SysTickState *s)
>  {
> +/*
> + * Set the ptimer period from whichever clock is selected.
> + * Must be called from within a ptimer transaction block.
> + */
>  if (s->control & SYSTICK_CLKSOURCE) {
> -return system_clock_scale;
> +ptimer_set_period_from_clock(s->ptimer, s->cpuclk, 1);
>  } else {
> -return 1000;
> +ptimer_set_period_from_clock(s->ptimer, s->refclk, 1);
>  }
>  }
>  
> @@ -83,7 +87,27 @@ static MemTxResult systick_read(void *opaque, hwaddr addr, uint64_t *data,
>  val = ptimer_get_count(s->ptimer);
>  break;
>  case 0xc: /* SysTick Calibration Value.  */
> -val = 1;
> +/*
> + * In real hardware it is possible to make this register report
> + * a different value from what the reference clock is actually
> + * running at. We don't model that (which usually happens due
> + * to integration errors in the real hardware) and instead always
> + * report the theoretical correct value as described in the
> + * knowledgebase article at
> + * https://developer.arm.com/documentation/ka001325/latest
> + * If necessary, we could implement an extra QOM property on this
> + * device to force the STCALIB value to something different from
> + * the "correct" value.
> + */
> +if (!clock_has_source(s->refclk)) {
> +val = SYSCALIB_NOREF;
> +break;
> +}
> +val = clock_ns_to_ticks(s->refclk, 10 * SCALE_MS) - 1;

According to
https://developer.arm.com/documentation/ddi0403/d/System-Level-Architecture/System-Address-Map/The-system-timer--SysTick/SysTick-Calibration-value-Register--SYST-CALIB
, the field is 24bits wide.

Should we prevent an overflow into the reserved bits and other fields ?
by doing something like this:
   val &= SYSCALIB_TENMS;
with the following #define with the other ones, above.
#define SYSCALIB_TENMS ((1U << 24) - 1)

Note, the overflow would happen around ~1.68GHz refclk frequency; it is
probably a config that will never happen. I'm not sure if we should care
or do something if this happens, because it is probably an error
somewhere else.
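
Something like this, maybe (untested sketch, reusing the names from
the patch plus the SYSCALIB_TENMS define suggested above):

    case 0xc: /* SysTick Calibration Value.  */
        if (!clock_has_source(s->refclk)) {
            val = SYSCALIB_NOREF;
            break;
        }
        val = clock_ns_to_ticks(s->refclk, 10 * SCALE_MS) - 1;
        val &= SYSCALIB_TENMS; /* TENMS is a 24-bit field */
        if (clock_ticks_to_ns(s->refclk, val + 1) != 10 * SCALE_MS) {
            /* the (possibly truncated) tick count does not yield
               exactly 10ms, so report skew */
            val |= SYSCALIB_SKEW;
        }
        break;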

> +if (clock_ticks_to_ns(s->refclk, val + 1) != 10 * SCALE_MS) {
> +/* report that tick count does not yield exactly 10ms */
> +val |= SYSCALIB_SKEW;
> +}
>  break;

Otherwise the patch looks great.

Thanks,
Damien




Re: Bootloading within QEMU?

2021-08-17 Thread Paolo Bonzini

On 17/08/21 16:31, Kenneth Adam Miller wrote:



I am trying to discover how to schedule QEMU to begin actual emulation 
as currently my target correctly starts QEMU but only shows the shell, 
and not even boot loading occurs within QEMU. I'm trying to learn from 
example, and so will focus my questions only on X86. I can see the 
MachineClass and MachineState types, and I have tried to follow QEMU 
with the debugger and found where QEMU calls qemu_init and 
qemu_main_loop under qemu/softmmu/main.c, and even tried to follow 
through from init to main loop to see where it would begin booting, but 
I cannot see where the bootloader is scheduled or specified or started 
from within the target.


There are two possibilities:

1) QEMU loads a fixed firmware file, usually at a fixed address in 
memory so that the reset vector of the CPU is inside the firmware.  This 
is what happens for example on x86.  The firmware ultimately boots the 
machine (e.g. on x86 you have BIOS->GRUB->Linux or something like that).


2) QEMU loads a binary specified on the command line---typically with 
-kernel, which is stored in current_machine->kernel_filename---and 
somehow arranges for the guest to execute that file when it starts.  For 
example one possibility is to write a jump instruction at the CPU reset 
vector (see riscv_setup_rom_reset_vec for an example).  The functions 
you want to look at for the loading part are load_elf_ram*, and 
load_uimage_as and load_image_targphys_as.
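
For instance, a board init function might end with something like this
(rough sketch; the load address is made up and error handling is
minimal):

    /* Load -kernel as a flat binary at a board-specific address. */
    if (machine->kernel_filename) {
        int size = load_image_targphys_as(machine->kernel_filename,
                                          0x80000000 /* RAM base */,
                                          machine->ram_size, NULL);
        if (size < 0) {
            error_report("could not load kernel '%s'",
                         machine->kernel_filename);
            exit(1);
        }
    }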


Note that on platforms that use a fixed firmware file there's still the 
possibility of using -kernel.  In that case, the firmware initializes 
the system, then places the binary in memory and jumps to it.  qboot 
(https://github.com/qemu/qboot) is a very small x86 firmware that is 
able to boot a Linux or multiboot kernel.


Paolo




Re: [PATCH v4 1/3] target-arm: Add support for Fujitsu A64FX

2021-08-17 Thread Richard Henderson

On 8/17/21 5:36 AM, Andrew Jones wrote:

On Tue, Aug 17, 2021 at 05:23:17AM -1000, Richard Henderson wrote:

On 8/17/21 1:56 AM, Andrew Jones wrote:

I guess it's fine. You could easily create a new cpu_arm_set_sve_vq()
which would forbid changing the properties if you wanted to, but then
we need to answer Peter's question in order to see if there's a
precedent for that type of property.


I don't see the point in read-only properties.  If the user wants to set
non-standard values on the command-line, let them.  What is most important
is getting the correct default from '-cpu a64fx'.



So maybe we should just go ahead and add all sve* properties, but then
make sure the default vq map is correct.


I think that's the right answer.

Presently we have a kvm_supported variable that's initialized by kvm_arm_sve_get_vls().  I 
think we want to rename that variable and provide a version of that function for tcg. 
Probably we should have done that before, with a trivial function for -cpu max to set all 
bits.


Then eliminate most of the other kvm_enabled() checks in arm_cpu_sve_finalize.  I think 
the only one we keep is the last, where we verify that the final sve_vq_map matches 
kvm_enabled exactly, modulo max_vq.


This should minimize the differences in behaviour between tcg and kvm.
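
E.g. the tcg version could be as trivial as (sketch; the function name
is made up):

    static uint32_t tcg_arm_sve_get_vls(ARMCPU *cpu)
    {
        /* For -cpu max, every vector length up to ARM_MAX_VQ is supported. */
        return MAKE_64BIT_MASK(0, ARM_MAX_VQ);
    }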

r~



Re: [PATCH-for-6.1 v2] softmmu/physmem: fix wrong assertion in qemu_ram_alloc_internal()

2021-08-17 Thread Peter Maydell
On Tue, 17 Aug 2021 at 08:14, David Hildenbrand  wrote:
>
> On 16.08.21 22:52, Peter Xu wrote:
> > On Thu, Aug 05, 2021 at 11:23:50AM +0200, David Hildenbrand wrote:
> >> When adding RAM_NORESERVE, we forgot to remove the old assertion when
> >> adding the updated one, most probably when reworking the patches or
> >> rebasing. We can easily crash QEMU by adding
> >>-object memory-backend-ram,id=mem0,size=500G,reserve=off
> >> to the QEMU cmdline:
> >>qemu-system-x86_64: ../softmmu/physmem.c:2146: qemu_ram_alloc_internal:
> >>Assertion `(ram_flags & ~(RAM_SHARED | RAM_RESIZEABLE | RAM_PREALLOC))
> >>== 0' failed.
> >>
> >> Fix it by removing the old assertion.
> >>
> >> Fixes: 8dbe22c6868b ("memory: Introduce RAM_NORESERVE and wire it up in 
> >> qemu_ram_mmap()")
> >> Reviewed-by: Peter Xu 
> >> Reviewed-by: Philippe Mathieu-Daudé 
> >> Tested-by: Philippe Mathieu-Daudé 
> >> Cc: Paolo Bonzini 
> >> Cc: Peter Xu 
> >> Cc: Philippe Mathieu-Daudé 
> >> Signed-off-by: David Hildenbrand 
> >> ---
> >>
> >> v1 -> v2:
> >> - Added rbs
> >> - Tagged for 6.1 inclusion
> >>
> >> ---
> >>   softmmu/physmem.c | 1 -
> >>   1 file changed, 1 deletion(-)
> >>
> >> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
> >> index 3c1912a1a0..2e18947598 100644
> >> --- a/softmmu/physmem.c
> >> +++ b/softmmu/physmem.c
> >> @@ -2143,7 +2143,6 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
> >>   RAMBlock *new_block;
> >>   Error *local_err = NULL;
> >>
> >> -assert((ram_flags & ~(RAM_SHARED | RAM_RESIZEABLE | RAM_PREALLOC)) == 0);
> >>   assert((ram_flags & ~(RAM_SHARED | RAM_RESIZEABLE | RAM_PREALLOC |
> >> RAM_NORESERVE)) == 0);
> >>   assert(!host ^ (ram_flags & RAM_PREALLOC));
> >> --
> >> 2.31.1
> >>
> >
> > Today I just noticed this patch is still missing for 6.1. How many users are
> > there with reserve=off?  Would it be worth rc4 or not?
> >
>
> Indeed, I forgot to follow up, thanks for bringing this up.
>
> Libvirt does not support virtio-mem yet and consequently doesn't support
> reserve=off yet. (there are use cases without virtio-mem, but I don't
> think anybody is using it yet)
>
> It's an easy way to crash QEMU, but we could also fix in the -stable
> tree instead.

It is a really obviously-right fix, though, so I'm going to apply
it for rc4 anyway.

thanks
-- PMM



Re: [PATCH] qapi/machine.json: Remove zero value reference from SMPConfiguration documentation

2021-08-17 Thread Peter Maydell
On Tue, 17 Aug 2021 at 13:56, Andrew Jones  wrote:
>
> Commit 1e63fe685804 ("machine: pass QAPI struct to mc->smp_parse")
> introduced documentation stating that a zero input value for an SMP
> parameter indicates that its value should be automatically configured.
> This is indeed how things work today, but we'd like to change that.
> Avoid documenting behaviors we want to leave undefined for the time
> being, giving us freedom to change it later.
>
> Fixes: 1e63fe685804 ("machine: pass QAPI struct to mc->smp_parse")
> Signed-off-by: Andrew Jones 
> ---
>  qapi/machine.json | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/6.1
for any user-visible changes.

-- PMM



Re: [PULL 24/27] accel/tcg: Move breakpoint recognition outside translation

2021-08-17 Thread Richard Henderson

On 8/17/21 3:33 AM, Peter Maydell wrote:
> On Wed, 21 Jul 2021 at 21:00, Richard Henderson wrote:
>> Trigger breakpoints before beginning translation of a TB
>> that would begin with a BP.  Thus we never generate code
>> for the BP at all.
>
> I happened to notice in the Arm ARM today a corner case that this
> implementation approach I think gets wrong: the priority ordering
> of exceptions is supposed to be (among others)
>   * (architectural) software step
>   * instruction abort
>   * (architectural) breakpoints
>
> I think that doing the bp check here means it is incorrectly
> hoisted up the priority order above both swstep and insn
> abort.

Hmm, you're correct that we get this wrong.

> We probably didn't do these in the right priority
> order before this series, though, and I dunno whether
> we get the insn-abort vs swstep ordering right either...

And you're correct that we got it wrong beforehand.  The reorg did not
alter the recognized ordering of the exceptions.


I'm a bit surprised that insn-abort comes higher than breakpoint.  Fixing this would mean 
performing the insn decode and only then recognizing the breakpoint.  One of the 
intermediate versions of the patch set would have allowed this sort of thing, but I didn't 
realize it was necessary.  And it would be a huge job to alter all of the trans_* functions.


Fixing the order of swstep and bp can be done via the arm_debug_check_breakpoint hook. 
Just return false if swstep is enabled.
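
A sketch of that suggestion (illustrative only, not the eventual patch;
bp_matches() stands in for the existing per-breakpoint match logic):

bool arm_debug_check_breakpoint(CPUState *cs)
{
    ARMCPU *cpu = ARM_CPU(cs);
    CPUARMState *env = &cpu->env;

    /* Software step outranks breakpoints in the architectural
     * exception priority order, so decline to report a breakpoint
     * while single-step is active and let the step exception win. */
    if (arm_singlestep_active(env)) {
        return false;
    }
    return bp_matches(cpu);  /* hypothetical existing HW-bp check */
}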



r~



Re: [PATCH v4 1/3] target-arm: Add support for Fujitsu A64FX

2021-08-17 Thread Andrew Jones
On Tue, Aug 17, 2021 at 05:23:17AM -1000, Richard Henderson wrote:
> On 8/17/21 1:56 AM, Andrew Jones wrote:
> > I guess it's fine. You could easily create a new cpu_arm_set_sve_vq()
> > which would forbid changing the properties if you wanted to, but then
> > we need to answer Peter's question in order to see if there's a
> > precedent for that type of property.
> 
> I don't see the point in read-only properties.  If the user wants to set
> non-standard values on the command-line, let them.  What is most important
> is getting the correct default from '-cpu a64fx'.
>

So maybe we should just go ahead and add all sve* properties, but then
make sure the default vq map is correct.
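
Something like this, perhaps (a sketch; the function name and the bitmap
field are assumptions following target/arm conventions, not the final
patch):

static void aarch64_a64fx_initfn(Object *obj)
{
    ARMCPU *cpu = ARM_CPU(obj);

    /* ... feature and ID register setup elided ... */

    /* The A64FX implements SVE vector lengths of 128, 256 and 512
     * bits; mark only those as supported so the default vq map is
     * correct out of the box. */
    bitmap_zero(cpu->sve_vq_supported, ARM_MAX_VQ);
    set_bit(0, cpu->sve_vq_supported);  /* vq 1: 128-bit */
    set_bit(1, cpu->sve_vq_supported);  /* vq 2: 256-bit */
    set_bit(3, cpu->sve_vq_supported);  /* vq 4: 512-bit */
}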

Thanks,
drew




Re: [PATCH v4 1/3] target-arm: Add support for Fujitsu A64FX

2021-08-17 Thread Richard Henderson

On 8/17/21 1:56 AM, Andrew Jones wrote:
> I guess it's fine. You could easily create a new cpu_arm_set_sve_vq()
> which would forbid changing the properties if you wanted to, but then
> we need to answer Peter's question in order to see if there's a
> precedent for that type of property.

I don't see the point in read-only properties.  If the user wants to set
non-standard values on the command-line, let them.  What is most
important is getting the correct default from '-cpu a64fx'.



r~



Re: [PATCH for-6.2 20/25] hw/arm/msf2-soc: Wire up refclk

2021-08-17 Thread Damien Hedde



On 8/12/21 11:33 AM, Peter Maydell wrote:
> Wire up the refclk for the msf2 SoC.  This SoC runs the refclk at a
> frequency which is programmably either /4, /8, /16 or /32 of the main
> CPU clock.  We don't currently model the register which allows the
> guest to set the divisor, so implement the refclk as a fixed /32 of
> the CPU clock (which is the value of the divisor at reset).
> 
> Signed-off-by: Peter Maydell 
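
In terms of the clock API, a fixed /32 refclk amounts to something like
the following (an illustrative sketch, not the patch body; m3clk here is
assumed to be the SoC's main CPU clock):

    /* refclk runs at 1/32 of the main CPU clock (the divisor's value
     * at reset), until the divider register is actually modelled. */
    clock_set_mul_div(s->refclk, 1, 32);
    clock_set_source(s->refclk, s->m3clk);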

Reviewed-by: Damien Hedde 



Re: [PATCH v3 3/6] block: Clarify that @bytes is no limit on *pnum

2021-08-17 Thread Eric Blake
On Thu, Aug 12, 2021 at 10:41:45AM +0200, Hanna Reitz wrote:
> .bdrv_co_block_status() implementations are free to return a *pnum that
> exceeds @bytes, because bdrv_co_block_status() in block/io.c will clamp
> *pnum as necessary.
> 
> On the other hand, if drivers' implementations return values for *pnum
> that are as large as possible, our recently introduced block-status
> cache will become more effective.
> 
> So, make a note in block_int.h that @bytes is no upper limit for *pnum.
> 
> Suggested-by: Eric Blake 
> Signed-off-by: Hanna Reitz 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> ---
>  include/block/block_int.h | 9 +
>  1 file changed, 9 insertions(+)
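
Paraphrasing the clamp in block/io.c that the commit message relies on
(a sketch of the idea, not the exact upstream code):

    ret = bs->drv->bdrv_co_block_status(bs, want_zero, aligned_offset,
                                        aligned_bytes, pnum, &local_map,
                                        &local_file);
    /* Drivers may report a run longer than requested; clamp it so
     * callers of bdrv_co_block_status() never see *pnum > bytes. */
    if (*pnum > bytes) {
        *pnum = bytes;
    }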

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH for-6.2 09/25] clock: Provide builtin multiplier/divider

2021-08-17 Thread Damien Hedde



On 8/17/21 12:46 PM, Peter Maydell wrote:
> On Tue, 17 Aug 2021 at 10:59, Damien Hedde  wrote:
>>
>>
>>
>> On 8/12/21 11:33 AM, Peter Maydell wrote:
> 
> >>> +void clock_set_mul_div(Clock *clk, uint32_t multiplier, uint32_t divider)
> >>> +{
> >>> +    assert(divider != 0);
> >>> +
> >>> +    clk->multiplier = multiplier;
> >>> +    clk->divider = divider;
> >>> +}
> >>> +
> >>>  static void clock_initfn(Object *obj)
> >>>  {
> >>>      Clock *clk = CLOCK(obj);
> >>>
> >>> +    clk->multiplier = 1;
> >>> +    clk->divider = 1;
> >>> +
> >>>      QLIST_INIT(&clk->children);
> >>>  }
>>>
>>>
>>
>> Regarding migration, you made the vmstate_muldiv subsection optional. I
>> suppose it is to keep backward migration working in case of simple
>> mul=1,div=1 clocks.
> 
> Yes -- we only send the subsection if the mul,div are
> something other than 1,1; an inbound migration stream without
> the subsection then means "use 1,1".
> 
>> Do we need to ensure multiplier and divider fields are set to 1 upon
>> receiving a state if the vmstate_muldiv subsection is absent? I
>> remember there are post_load() tricks to achieve that.
> 
> I was relying on "we set the default in clock_initfn()" (by analogy
> with being able to assume that fields in device state are at their
> reset values when an inbound migration happens, so if the migration
> doesn't set them then they stay at those reset values). But
> thinking about it a bit more I think you're right and we do have to
> have a pre_load function to set them to 1. Otherwise we would get
> wrong the case where a board/SoC/device sets a clock up on reset
> to have a non-1 multiplier, and then later the guest programs it to
> be 1, and then we migrate like that. (That is, the mul/div at point
> of migration will be the value-on-reset as set by the device etc
> code that created the clock, which might be different from what
> clock_initfn() set it to.)

Yes, I think we cannot expect the machine init or reset to keep all
clocks in div=1,mul=1 state.

> 
> So we need to add something like
> 
> static int clock_pre_load(void *opaque)
> {
>     Clock *clk = opaque;
>     /*
>      * The initial out-of-reset settings of the Clock might have been
>      * configured by the device to be different from what we set
>      * in clock_initfn(), so we must here set the default values to
>      * be used if they are not in the inbound migration state.
>      */
>     clk->multiplier = 1;
>     clk->divider = 1;
>     return 0;
> }
> 
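
For reference, a minimal sketch of how that callback and the optional
subsection could fit together in the clock vmstate (the field layout and
the needed() predicate are assumptions for illustration, not the actual
patch):

static bool muldiv_needed(void *opaque)
{
    Clock *clk = opaque;

    /* Only send the subsection when mul/div differ from 1,1. */
    return clk->multiplier != 1 || clk->divider != 1;
}

static const VMStateDescription vmstate_muldiv = {
    .name = "clock/muldiv",
    .version_id = 1,
    .minimum_version_id = 1,
    .needed = muldiv_needed,
    .fields = (VMStateField[]) {
        VMSTATE_UINT32(multiplier, Clock),
        VMSTATE_UINT32(divider, Clock),
        VMSTATE_END_OF_LIST()
    },
};

const VMStateDescription vmstate_clock = {
    .name = "clock",
    .version_id = 0,
    .minimum_version_id = 0,
    .pre_load = clock_pre_load,  /* defaults to 1,1 if subsection absent */
    .fields = (VMStateField[]) {
        VMSTATE_UINT64(period, Clock),
        VMSTATE_END_OF_LIST()
    },
    .subsections = (const VMStateDescription * []) {
        &vmstate_muldiv,
        NULL
    },
};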

With this callback registered in the main vmsd section,
Reviewed-by: Damien Hedde 

Thanks,
--
Damien



Re: [PATCH 0/6] virtio-iommu: Add ACPI support

2021-08-17 Thread Eric Auger
Hi Jean,

On 8/10/21 10:45 AM, Jean-Philippe Brucker wrote:
> Allow instantiating a virtio-iommu device on ACPI systems by adding a
> Virtual I/O Translation table (VIOT). Enable x86 support for VIOT.

Don't you need your other patch
"virtio-iommu: Default to bypass during boot"?

Without the latter, the guest fails to boot for me.

Thanks

Eric
>
> With a simple configuration the table contains a virtio-iommu-pci node
> and a pci-range node:
>
>   qemu-system-aarch64 -M virt -bios QEMU_EFI.fd
> -device virtio-iommu ...
>
>   $ iasl -d ...
>   [000h    4]Signature : "VIOT"
>
>   [024h 0036   2]   Node count : 0002
>   [026h 0038   2]  Node offset : 0030
>
>   [030h 0048   1] Type : 03 [VirtIO-PCI IOMMU]
>   [032h 0050   2]   Length : 0010
>   [034h 0052   2]  PCI Segment : 
>   [036h 0054   2]   PCI BDF number : 0030
>
>   [040h 0064   1] Type : 01 [PCI Range]
>   [042h 0066   2]   Length : 0018
>   [044h 0068   4]   Endpoint start : 
>   [048h 0072   2]PCI Segment start : 
>   [04Ah 0074   2]  PCI Segment end : 
>   [04Ch 0076   2]PCI BDF start : 
>   [04Eh 0078   2]  PCI BDF end : 00FF
>   [050h 0080   2]  Output node : 0030
>
> With a more complex topology multiple PCI Range nodes describe the system:
>
>   qemu-system-aarch64 -bios QEMU_EFI.fd -device virtio-iommu
> -M virt,default_bus_bypass_iommu=true
> -device pxb-pcie,bus_nr=0x10,id=pcie.1000,bus=pcie.0
> -device pxb-pcie,bus_nr=0x20,id=pcie.2000,bus=pcie.0,bypass_iommu=true
> -device pxb-pcie,bus_nr=0x30,id=pcie.3000,bus=pcie.0
>
>   [024h 0036   2]   Node count : 0003
>   [026h 0038   2]  Node offset : 0030
>
>   [030h 0048   1] Type : 03 [VirtIO-PCI IOMMU]
>   [032h 0050   2]   Length : 0010
>   [034h 0052   2]  PCI Segment : 
>   [036h 0054   2]   PCI BDF number : 0020
>
>   [040h 0064   1] Type : 01 [PCI Range]
>   [042h 0066   2]   Length : 0018
>   [044h 0068   4]   Endpoint start : 3000
>   [048h 0072   2]PCI Segment start : 
>   [04Ah 0074   2]  PCI Segment end : 
>   [04Ch 0076   2]PCI BDF start : 3000
>   [04Eh 0078   2]  PCI BDF end : 32FF
>   [050h 0080   2]  Output node : 0030
>
>   [058h 0088   1] Type : 01 [PCI Range]
>   [05Ah 0090   2]   Length : 0018
>   [05Ch 0092   4]   Endpoint start : 1000
>   [060h 0096   2]PCI Segment start : 
>   [062h 0098   2]  PCI Segment end : 
>   [064h 0100   2]PCI BDF start : 1000
>   [066h 0102   2]  PCI BDF end : 11FF
>   [068h 0104   2]  Output node : 0030
>
>
> The VIOT table description will be in the next release of ACPI.
> In the meantime you can find a description at
> https://jpbrucker.net/virtio-iommu/viot/viot-v9.pdf
> Linux support for VIOT was added in version 5.14
>
> Eric Auger (1):
>   pc: Allow instantiating a virtio-iommu device
>
> Jean-Philippe Brucker (5):
>   acpi: Add VIOT structure definitions
>   hw/acpi: Add VIOT table
>   hw/arm/virt-acpi-build: Add VIOT table for virtio-iommu
>   hw/arm/virt: Remove device tree restriction for virtio-iommu
>   pc: Add VIOT table for virtio-iommu
>
>  hw/acpi/viot.h   | 13 ++
>  include/hw/acpi/acpi-defs.h  | 60 ++
>  include/hw/i386/pc.h |  2 +
>  hw/acpi/viot.c   | 82 
>  hw/arm/virt-acpi-build.c |  7 +++
>  hw/arm/virt.c| 10 +
>  hw/i386/acpi-build.c |  5 +++
>  hw/i386/pc.c | 18 +++-
>  hw/virtio/virtio-iommu-pci.c |  7 ---
>  hw/acpi/Kconfig  |  4 ++
>  hw/acpi/meson.build  |  1 +
>  hw/arm/Kconfig   |  1 +
>  hw/i386/Kconfig  |  1 +
>  13 files changed, 195 insertions(+), 16 deletions(-)
>  create mode 100644 hw/acpi/viot.h
>  create mode 100644 hw/acpi/viot.c
>
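
For orientation, the node layouts implied by the offsets in the iasl
dumps above look roughly like this (reconstructed from the dump for
illustration; not the definitions from the series itself):

#include <stdint.h>

/* All fields little-endian, per ACPI convention. */
struct viot_virtio_pci_iommu_node {     /* Type 3, Length 0x10 */
    uint8_t  type;
    uint8_t  reserved;
    uint16_t length;
    uint16_t pci_segment;
    uint16_t pci_bdf;                   /* BDF of the virtio-iommu-pci dev */
    uint8_t  reserved2[8];
};

struct viot_pci_range_node {            /* Type 1, Length 0x18 */
    uint8_t  type;
    uint8_t  reserved;
    uint16_t length;
    uint32_t endpoint_start;
    uint16_t pci_segment_start;
    uint16_t pci_segment_end;
    uint16_t pci_bdf_start;
    uint16_t pci_bdf_end;
    uint16_t output_node;               /* offset of the IOMMU node, 0x30 */
    uint8_t  reserved2[6];
};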




Re: [PATCH 0/8] target/mips: Housekeeping in gen_helper() macros

2021-08-17 Thread Richard Henderson

On 8/16/21 10:50 AM, Philippe Mathieu-Daudé wrote:
> Trivial patches:
> - Remove unused macros
> - Use tcg_constant_i32()
> - Inline the macros when few uses
> - Move macro definitions in translate.h
>
> Philippe Mathieu-Daudé (8):
>   target/mips: Remove gen_helper_0e3i()
>   target/mips: Remove gen_helper_1e2i()
>   target/mips: Use tcg_constant_i32() in gen_helper_0e2i()
>   target/mips: Simplify gen_helper() macros by using tcg_constant_i32()
>   target/mips: Inline gen_helper_1e1i() call in op_ld_INSN() macros
>   target/mips: Inline gen_helper_0e0i()
>   target/mips: Use tcg_constant_i32() in generate_exception_err()
>   target/mips: Define gen_helper() macros in translate.h
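
The tcg_constant_i32() conversions follow the usual pattern (an
illustrative sketch; gen_helper_foo stands in for the real helpers):

    /* Before: a temporary constant must be allocated and freed. */
    TCGv_i32 t = tcg_const_i32(value);
    gen_helper_foo(cpu_env, t);
    tcg_temp_free_i32(t);

    /* After: tcg_constant_i32() returns a cached immutable value
     * owned by the TCG context, so there is nothing to free. */
    gen_helper_foo(cpu_env, tcg_constant_i32(value));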


Reviewed-by: Richard Henderson 

r~



Bootloading within QEMU?

2021-08-17 Thread Kenneth Adam Miller
Hello,

I am trying to discover how QEMU schedules the start of actual emulation:
currently my target starts QEMU correctly, but it only shows the shell,
and no boot loading occurs within QEMU. I'm trying to learn from example,
so I will focus my questions on x86 only. I can see the MachineClass and
MachineState types, and I have followed QEMU with the debugger to where
it calls qemu_init and qemu_main_loop in qemu/softmmu/main.c. I even
tried to trace through from init to the main loop to see where booting
would begin, but I cannot see where, within the target, the bootloader
is scheduled, specified, or started.

It's difficult because the surrounding QEMU APIs that each target must
use aren't documented at all. There's a little bit of abstract
documentation that can be generated, the same as what is publicly
available, but reading that only does so much good because there isn't
much of a guide on how to program against the surrounding QEMU
libraries.
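
For the x86 case, the rough call chain (reconstructed from the source
tree, not from this thread, so treat the exact paths as approximate):

    /* qemu_init()
     *   -> machine_run_board_init()
     *        -> mc->init()                    // e.g. pc_init1() for -M pc
     *             -> pc_memory_init()
     *                  -> x86_bios_rom_init() // maps bios.bin (SeaBIOS)
     *                                         // just below 4 GiB
     *   -> qemu_main_loop()
     *
     * Nothing "schedules" a bootloader explicitly: CPU reset leaves the
     * vCPU at the architectural reset vector (0xFFFFFFF0), which falls
     * inside the mapped firmware, and SeaBIOS then does the boot loading.
     */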


Re: [PATCH-for-6.1 v2] softmmu/physmem: fix wrong assertion in qemu_ram_alloc_internal()

2021-08-17 Thread Peter Xu
On Tue, Aug 17, 2021 at 09:14:32AM +0200, David Hildenbrand wrote:
> Libvirt does not support virtio-mem yet and consequently doesn't support
> reserve=off yet. (there are use cases without virtio-mem, but I don't think
> anybody is using it yet)
> 
> It's an easy way to crash QEMU, but we could also fix in the -stable tree
> instead.

I hit this when trying some extreme bitmap tests, and then I remembered
this patch; but that configuration shouldn't really be used by any real
customer, for sure.

Sounds good then, thanks.

-- 
Peter Xu




Re: [RFC PATCH 00/13] Add support for Mirror VM.

2021-08-17 Thread Ashish Kalra
Hello Dave, Steve,

On Tue, Aug 17, 2021 at 09:38:24AM +0100, Dr. David Alan Gilbert wrote:
> * Steve Rutherford (srutherf...@google.com) wrote:
> > On Mon, Aug 16, 2021 at 6:37 AM Ashish Kalra  wrote:
> > >
> > > From: Ashish Kalra 
> > >
> > > This is an RFC series for Mirror VMs, which are
> > > essentially secondary VMs sharing the encryption context
> > > (ASID) with a primary VM. The patch-set creates a new
> > > VM and shares the primary VM's encryption context
> > > with it using the KVM_CAP_VM_COPY_ENC_CONTEXT_FROM capability.
> > > The mirror VM uses a separate pair of VM + vCPU file
> > > descriptors and also uses a simplified KVM run loop,
> > > for example, it does not support any interrupt vmexits, etc.
> > > Currently the mirror VM shares the address space of the
> > > primary VM.
> > Sharing an address space is incompatible with post-copy migration via
> > UFFD on the target side. I'll be honest and say I'm not deeply
> > familiar with QEMU's implementation of post-copy, but I imagine there
> > must be a mapping of guest memory that doesn't fault: on the target
> > side (or on both sides), the migration helper will need to have its
> > view of guest memory go through that mapping, or a similar mapping.
> 
> Ignoring SEV, our postcopy currently has a single mapping which is
> guarded by UFFD. There is no 'no-fault' mapping.  We use the uffd ioctl
> to 'place' a page into that space when we receive it.
> But yes, I guess that can't work with SEV; as you say, if the helper
> has to do the write, it'll have to do it into a shadow that it can write
> to, even though the rest of the guest must UF on access.
> 
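
(For context, the "place" ioctl Dave refers to is UFFDIO_COPY; a minimal
sketch, with names illustrative:)

#include <stdint.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Atomically place one received page into the UFFD-guarded guest
 * mapping, waking any vCPU thread currently faulting on it. */
static int place_page(int uffd, void *guest_addr, void *src, size_t pagesize)
{
    struct uffdio_copy copy = {
        .dst = (uintptr_t)guest_addr, /* page-aligned addr in guest RAM */
        .src = (uintptr_t)src,        /* buffer holding the received page */
        .len = pagesize,
        .mode = 0,                    /* 0 => wake waiting faulters */
    };

    return ioctl(uffd, UFFDIO_COPY, &copy);
}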

I assume that the MH will be sharing the address space of the source VM;
this is compatible with host-based live migration, since the source VM
runs in the context of the qemu process (with all its vcpu threads and
the migration thread).

Surely sharing the address space on the target side will be incompatible
with post-copy migration, as post-copy migration needs to set up UFFD
mappings to start running the target VM while post-copy migration is
active.

But will the UFFD mappings only be set up while post-copy migration is
active? Won't the target VM end up with page mappings similar to the
source VM's once the migration process is complete?

Thanks,
Ashish

> > Separately, I'm a little wary of leaving the migration helper mapped
> > into the shared address space as writable. Since the migration threads
> > will be executing guest-owned code, the guest could use these threads
> > to do whatever it pleases (including getting free cycles).
> 
> Agreed.
> 
> > The migration helper's code needs to be trusted by both the host and
> > the guest.
> 
> 
> > Making it non-writable, sourced by the host, and attested by
> > the hardware would mitigate these concerns.
> 
> Some people worry about having the host supply the guest firmware,
> because they worry they'll be railroaded into using something they
> don't actually trust, and if their aim of using SEV etc is to avoid
> trusting the cloud owner, that breaks that.
> 
> So for them, I think they want the migration helper to be trusted by the
> guest and tolerated by the host.
> 
> > The host could also try to
> > monitor for malicious use of migration threads, but that would be
> > pretty finicky.  The host could competitively schedule the migration
> > helper vCPUs with the guest vCPUs, but I'd imagine that wouldn't be
> > the best for guest performance.
> 
> The CPU usage has to go somewhere.
> 
> Dave
> 
> > 
> > --Steve
> > 
> -- 
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> 


